Re: [AMBER] Amber 11 and Amber 12

From: Jason Swails <jason.swails.gmail.com>
Date: Fri, 15 Mar 2013 13:11:12 -0400

These are actually errors in the test case itself that show up when it is run
on more than 2 cores. The only way to fix them is to increase the MAXPR
variable in sander and recompile.
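
For what it's worth, a rough sketch of that workflow (assuming a standard
$AMBERHOME layout; the exact file name and configure invocation can differ
between versions and machines):

  # the SANDER BOMB message points at locmem; find where MAXPR is set
  cd $AMBERHOME/src/sander
  grep -n -i maxpr locmem.*     # locmem.f / locmem.F90 depending on the version

  # after raising the bound, rebuild the parallel binaries
  cd $AMBERHOME
  make clean
  ./configure -mpi gnu          # re-use whatever configure flags you built with before
  make install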

I've seen this error a number of times (all in the tests), but I don't
think it's particularly harmful since QM/MM simulations rarely scale out to
8 cores very well, anyway.

HTH,
Jason

On Fri, Mar 15, 2013 at 11:42 AM, Daniel Roe <daniel.r.roe.gmail.com> wrote:

> Hi,
>
> From this message it seems that you don't have enough memory to store
> all of the non-bonded pairs. How many atoms are in your system, and
> how much memory does each node have available? Are you running in
> explicit solvent?
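
As a rough back-of-envelope (my numbers, assuming a roughly water-like atom
density of about 0.1 atoms/A^3 and ignoring the pair-list skin), the list size
grows with the cube of the cutoff:

  # unique pairs per atom ~ (4/3) * pi * cut^3 * density / 2
  awk 'BEGIN { cut = 12.0; rho = 0.1;
               printf "pairs/atom ~ %.0f\n", (4.0/3.0) * 3.14159 * cut^3 * rho / 2 }'
  # prints roughly 360 pairs/atom at cut=12, i.e. ~360 * natom entries in the list
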
>
> -Dan
>
> On Wed, Mar 13, 2013 at 5:30 PM, David Winogradoff <dwino218.gmail.com> wrote:
> > Hey Dan,
> >
> > Everything passed for the 2-processor case, but there were a few
> > failed tests for the 8-processor case. One of the error messages read:
> >
> > cd qmmm2/xcrd_build_test/ && ./Run.ortho_qmewald0
> >
> > * NB pairs 145 185645 exceeds capacity ( 185750) 3
> > SIZE OF NONBOND LIST = 185750
> > SANDER BOMB in subroutine nonbond_list
> > Non bond list overflow!
> > check MAXPR in locmem.f
> >
> > --------------------------------------------------------------------------
> > MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
> > with errorcode 1.
> >
> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > You may or may not see output from other processes, depending on
> > exactly when Open MPI kills them.
> >
> > --------------------------------------------------------------------------
> >
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 3 with PID 3048 on
> > node login-2.deepthought.umd.edu exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> >
> >
> > This is similar to the error message I receive when running my
> > simulation.
> >
> > -David
> >
> > On Wed, Mar 13, 2013 at 2:57 PM, Daniel Roe <daniel.r.roe.gmail.com> wrote:
> >> Hi,
> >>
> >> Have you been able to successfully run the test cases in parallel?
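
For reference, the parallel tests are driven through DO_PARALLEL; a minimal
sketch, assuming a bash-like shell and that mpirun can be run interactively on
your cluster:

  export DO_PARALLEL="mpirun -np 8"     # use the core count you actually run with
  cd $AMBERHOME/test && make test       # the Run scripts pick up DO_PARALLEL

  # a single failing case can also be re-run directly, e.g. the one quoted
  # earlier in this thread:
  cd $AMBERHOME/test/qmmm2/xcrd_build_test && ./Run.ortho_qmewald0
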
> >>
> >> -Dan
> >>
> >> On Wednesday, March 13, 2013, David Winogradoff wrote:
> >>
> >>> Hey Jason,
> >>>
> >>> I followed the directions on your wiki for installing amber 12,
> >>> running ./patch_amber.py --update-tree until it returns that Amber12
> >>> and AmberTools12 are up to date.
> >>>
> >>> The error messages from the supercomputer I'm using are,
> >>>
> >>>
> >>> "[compute-f17-1.deepthought.umd.edu:21064] [[44012,1],39]
> >>> ORTE_ERROR_LOG: Data unpack would read past end of buffer in file
> >>> runtime/orte_init.c at line 132
> >>>
> >>> --------------------------------------------------------------------------
> >>> It looks like MPI_INIT failed for some reason; your parallel process is
> >>> likely to abort. There are many reasons that a parallel process can
> >>> fail during MPI_INIT; some of which are due to configuration or environment
> >>> problems. This failure appears to be an internal failure; here's some
> >>> additional information (which may only be relevant to an Open MPI
> >>> developer):
> >>>
> >>> ompi_mpi_init: orte_init failed
> >>> --> Returned "Data unpack would read past end of buffer" (-26)
> >>> instead of "Success" (0)
> >>>
> >>> --------------------------------------------------------------------------
> >>> *** The MPI_Init() function was called before MPI_INIT was invoked.
> >>> *** This is disallowed by the MPI standard.
> >>> *** Your MPI job will now abort.
> >>> [compute-f17-1.deepthought.umd.edu:21061] Abort before MPI_INIT
> >>> completed successfully; not able to guarantee that all other processes
> >>> were killed!"
> >>>
> >>>
> >>> which I thought were generic openmpi error messages. For your
> >>> reference, the version of openmpi I am using is openmpi16-gnu.
> >>>
> >>> I have also attempted to use the openmpi143-gnu version of openmpi,
> >>> obtaining the error message,
> >>>
> >>>
> >>> "[compute-f16-26.deepthought.umd.edu:30072] mca: base: component_find:
> >>> unable to open
> >>> /cell_root/software/openmpi/1.4.3/gnu46/sys/lib/openmpi/mca_btl_openib:
> >>> perhaps a missing symbol, or compiled for a different version of Open
> >>> MPI? (ignored)
> >>>
> >>> --------------------------------------------------------------------------
> >>> MPI_ABORT was invoked on rank 14 in communicator MPI_COMM_WORLD
> >>> with errorcode 1.
> >>>
> >>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> >>> You may or may not see output from other processes, depending on
> >>> exactly when Open MPI kills them."
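
That "compiled for a different version of Open MPI?" warning usually means the
mpirun and libraries found at run time are not the ones sander.MPI was built
against. A quick consistency check, assuming the standard Open MPI wrappers
are on your PATH:

  which mpirun mpif90                           # both should live in the same Open MPI install
  mpirun --version                              # Open MPI picked up at run time
  mpif90 --showme                               # compiler/libs the wrapper would link against
  ldd $AMBERHOME/bin/sander.MPI | grep -i mpi   # MPI libraries the binary actually loads
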
> >>>
> >>>
> >>> My REMD simulation is 50 replicas at different temperatures in NVT
> >>> ensembles attempting exchanges every 5 ps. The template for my mdin
> >>> files is
> >>>
> >>> replica exchange
> >>> &cntrl
> >>> imin=0, irest=1, ntx=5,
> >>> ntt=3, temp0=XXXXX, gamma_ln=2.0, ig=-1,
> >>> nstlim=2500, dt=0.002,
> >>> ntb=1, ntc=2, ntf=2,
> >>> ntr=0, ntp=0,
> >>> ntwr=100000, ntwx=2500, ntpr=2500,
> >>> cut=12,
> >>> numexchg=1000,
> >>> /
> >>> &wt type='END'
> >>> /
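
For completeness, a typical launch for a 50-replica temperature REMD run with
sander.MPI looks roughly like the following; the file names are placeholders,
and the MPI task count must be a multiple of the number of replicas:

  # remd.groupfile: one line per replica, e.g.
  #   -O -i mdin.001 -o remd.mdout.001 -p prmtop -c inpcrd.001 -r restrt.001 -x mdcrd.001
  mpirun -np 100 $AMBERHOME/bin/sander.MPI -ng 50 -groupfile remd.groupfile -rem 1
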
> >>>
> >>>
> >>>
> >>> Hopefully this answers your questions, thanks,
> >>> David
> >>>
> >>> On Mon, Mar 11, 2013 at 6:56 PM, Jason Swails <jason.swails.gmail.com> wrote:
> >>> > On Mon, Mar 11, 2013 at 6:34 PM, David Winogradoff <dwino218.gmail.com> wrote:
> >>> >
> >>> >> I am currently working on a project using amber in parallel to conduct
> >>> >> a long REMD run. A few weeks ago, using the same input file that
> >>> >> worked before (modified to start from where the previous simulation
> >>> >> ended), my parallel run failed with a generic openmpi error message.
> >>> >> After some initial hesitation, I decided to delete my installation of
> >>> >> amber 12 and recompile and rebuild. I have run 'make test'
> >>> >> successfully with my new serial and parallel versions of amber 12, but
> >>> >> my REMD simulation still won't run. After many unsuccessful attempts
> >>> >> to run, I decided to look at some of my previous output files. To my
> >>> >> surprise, the older output files had amber 11 at the top. Even though
> >>> >> my new simulations won't run, they do write the header, which says
> >>> >> amber 12.
> >>> >>
> >>> >> I did see that it is possible to have some combination of amber 11 and
> >>> >> amber 12, but I never installed amber 11 and do not have the tar files
> >>> >> to do so. Is there any way to install amber 11 with amber 12 tar files
> >>> >> (accidentally)? My hypothesis is that some requirement, such as my
> >>> >> version of gnu, was out of date during my previous installation,
> >>> >> preventing the complete installation of amber 12. I have tried,
> >>> >> without success, to recreate the mistake, since I didn't think files
> >>> >> meant for amber 11 could work with amber 12. The only solution I can
> >>> >> think of right now is to purchase amber 11 and create a hybrid form of
> >>> >> amber 11 and amber 12 that will hopefully run my simulations once more.
> >>> >>
> >>> >
> >>> > When Amber 12 was released, it actually still said "Amber 11" at the top.
> >>> > That was updated in bugfix.2 for Amber 12.  Do you have all bug fixes
> >>> > applied?
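
A quick way to see where a tree stands, using only the updater already
mentioned in this thread plus the mdout header itself (the mdout names below
are placeholders):

  cd $AMBERHOME
  ./patch_amber.py --update-tree                    # repeat until it reports everything up to date
  grep -m1 -i "amber" old_run.mdout new_run.mdout   # header should read Amber 12 once bugfix.2 is in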
> >>> >
> >>> > What error messages are you getting? And what exactly are you doing? The
> >>> > better way forward, IMO, is to fix Amber 12... What kind of REMD are you
> >>> > doing?
> >>> >
> >>> > All the best,
> >>> > Jason
> >>> >
> >>> > --
> >>> > Jason M. Swails
> >>> > Quantum Theory Project,
> >>> > University of Florida
> >>> > Ph.D. Candidate
> >>> > 352-392-4032
> >>> > _______________________________________________
> >>> > AMBER mailing list
> >>> > AMBER.ambermd.org
> >>> > http://lists.ambermd.org/mailman/listinfo/amber
> >>>
> >>> _______________________________________________
> >>> AMBER mailing list
> >>> AMBER.ambermd.org
> >>> http://lists.ambermd.org/mailman/listinfo/amber
> >>>
> >>
> >>
> >> --
> >> -------------------------
> >> Daniel R. Roe, PhD
> >> Department of Medicinal Chemistry
> >> University of Utah
> >> 30 South 2000 East, Room 201
> >> Salt Lake City, UT 84112-5820
> >> http://home.chpc.utah.edu/~cheatham/
> >> (801) 587-9652
> >> (801) 585-9119 (Fax)
> >> _______________________________________________
> >> AMBER mailing list
> >> AMBER.ambermd.org
> >> http://lists.ambermd.org/mailman/listinfo/amber
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
>
>
> --
> -------------------------
> Daniel R. Roe, PhD
> Department of Medicinal Chemistry
> University of Utah
> 30 South 2000 East, Room 201
> Salt Lake City, UT 84112-5820
> http://home.chpc.utah.edu/~cheatham/
> (801) 587-9652
> (801) 585-9119 (Fax)
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>



-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Mar 15 2013 - 10:30:05 PDT