Re: [AMBER] Amber 11 and Amber 12

From: David Winogradoff <dwino218.gmail.com>
Date: Wed, 13 Mar 2013 19:30:36 -0400

Hey Dan,

Everything passed for the 2 processor case, but there were a few
failed tests for the 8 processor case. One of the error messages read,

cd qmmm2/xcrd_build_test/ && ./Run.ortho_qmewald0

 * NB pairs 145 185645 exceeds capacity ( 185750) 3
     SIZE OF NONBOND LIST = 185750
 SANDER BOMB in subroutine nonbond_list
 Non bond list overflow!
 check MAXPR in locmem.f
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 3048 on
node login-2.deepthought.umd.edu exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).


This is similar to the error message I receive when running my simulation.
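
For reference, the run length implied by the mdin template quoted below
can be checked with a few lines of Python. This is only a sketch of the
arithmetic, not anything that ships with Amber; the variable names mirror
the namelist entries, and n_replicas is taken from the description in the
quoted message rather than from the mdin file itself.

```python
# Values copied from the &cntrl namelist in the quoted mdin template.
nstlim = 2500     # MD steps between exchange attempts
dt = 0.002        # time step, in ps
numexchg = 1000   # number of exchange attempts

# Taken from the prose description ("50 replicas"), not from the mdin file.
n_replicas = 50

ps_per_exchange = nstlim * dt                   # time between exchange attempts
total_ns = ps_per_exchange * numexchg / 1000.0  # simulated time per replica

print(f"{ps_per_exchange:.1f} ps between exchange attempts")
print(f"{total_ns:.1f} ns per replica")
print(f"{total_ns * n_replicas:.1f} ns aggregate over {n_replicas} replicas")
```

With these values the interval works out to the 5 ps between exchange
attempts described in the quoted message.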

-David

On Wed, Mar 13, 2013 at 2:57 PM, Daniel Roe <daniel.r.roe.gmail.com> wrote:
> Hi,
>
> Have you been able to successfully run the test cases in parallel?
>
> -Dan
>
> On Wednesday, March 13, 2013, David Winogradoff wrote:
>
>> Hey Jason,
>>
>> I followed the directions on your wiki for installing amber 12,
>> running ./patch_amber.py --update-tree until it returns that Amber12
>> and AmberTools12 are up to date.
>>
>> The error messages from the supercomputer I'm using are,
>>
>>
>> "[compute-f17-1.deepthought.umd.edu:21064] [[44012,1],39]
>> ORTE_ERROR_LOG: Data unpack would read past end of buffer in file
>> runtime/orte_init.c at line 132
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> ompi_mpi_init: orte_init failed
>> --> Returned "Data unpack would read past end of buffer" (-26)
>> instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [compute-f17-1.deepthought.umd.edu:21061] Abort before MPI_INIT
>> completed successfully; not able to guarantee that all other processes
>> were killed!"
>>
>>
>> which I thought were generic openmpi error messages. For your
>> reference, the version of openmpi I am using is openmpi16-gnu.
>>
>> I have also attempted to use the openmpi143-gnu version of openmpi,
>> obtaining the error message,
>>
>>
>> "[compute-f16-26.deepthought.umd.edu:30072] mca: base: component_find:
>> unable to open
>> /cell_root/software/openmpi/1.4.3/gnu46/sys/lib/openmpi/mca_btl_openib:
>> perhaps a missing symbol, or compiled for a different version of Open
>> MPI? (ignored)
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 14 in communicator MPI_COMM_WORLD
>> with errorcode 1.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them."
>>
>>
>> My REMD simulation is 50 replicas at different temperatures in NVT
>> ensembles attempting exchanges every 5 ps. The template for my mdin
>> files is
>>
>> replica exchange
>> &cntrl
>> imin=0, irest=1, ntx=5,
>> ntt=3, temp0=XXXXX, gamma_ln=2.0, ig=-1,
>> nstlim=2500, dt=0.002,
>> ntb=1, ntc=2, ntf=2,
>> ntr=0, ntp=0,
>> ntwr=100000, ntwx=2500, ntpr=2500,
>> cut=12,
>> numexchg=1000,
>> /
>> &wt type='END'
>> /
>>
>>
>>
>> Hopefully this answers your questions, thanks,
>> David
>>
>> On Mon, Mar 11, 2013 at 6:56 PM, Jason Swails <jason.swails.gmail.com>
>> wrote:
>> > On Mon, Mar 11, 2013 at 6:34 PM, David Winogradoff <dwino218.gmail.com>
>> > wrote:
>> >
>> >> I am currently working on a project using amber in parallel to conduct
>> >> a long REMD run. A few weeks ago, using the same input file that
>> >> worked before (modified to start from where the previous simulation
>> >> ended), my parallel run failed with a generic openmpi error message.
>> >> After some initial hesitation, I decided to delete my installation of
>> >> amber 12 and recompile and rebuild. I have run 'make test'
>> >> successfully with my new serial and parallel versions of amber 12, but
>> >> my REMD simulation still won't run. After many unsuccessful attempts
>> >> to run, I decided to look at some of my previous output files. To my
>> >> surprise, the older output files had amber 11 at the top. Even though
>> >> my new simulations won't run, they do write the header, which says
>> >> amber 12.
>> >>
>> >> I did see that it is possible to have some combination of amber 11 and
>> >> amber 12, but I never installed amber 11 and do not have the tar files
>> >> to do so. Is there any way to install amber 11 with amber 12 tar files
>> >> (accidentally)? My hypothesis is that some requirement, such as my
>> >> version of gnu, was out of date during my previous installation,
>> >> preventing the complete installation of amber 12. I have tried to
>> >> recreate the mistake, since I didn't think I could make files meant
>> >> for amber 11 to work with amber 12, without success. The only solution
>> >> I can think of right now is to purchase amber 11 and create a hybrid
>> >> form of amber 11 and amber 12 that will hopefully run my simulations
>> >> once more.
>> >>
>> >
>> > When Amber 12 was released, it actually still said "Amber 11" at the top.
>> > That was updated in bugfix.2 for Amber 12. Do you have all bug fixes
>> > applied?
>> >
>> > What error messages are you getting? And what exactly are you doing? The
>> > better way forward, IMO, is to fix Amber 12... What kind of REMD are you
>> > doing?
>> >
>> > All the best,
>> > Jason
>> >
>> > --
>> > Jason M. Swails
>> > Quantum Theory Project,
>> > University of Florida
>> > Ph.D. Candidate
>> > 352-392-4032
>> > _______________________________________________
>> > AMBER mailing list
>> > AMBER.ambermd.org
>> > http://lists.ambermd.org/mailman/listinfo/amber
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>
>
> --
> -------------------------
> Daniel R. Roe, PhD
> Department of Medicinal Chemistry
> University of Utah
> 30 South 2000 East, Room 201
> Salt Lake City, UT 84112-5820
> http://home.chpc.utah.edu/~cheatham/
> (801) 587-9652
> (801) 585-9119 (Fax)
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Mar 13 2013 - 17:00:02 PDT