Re: [AMBER] MPI_ABORT in pH-REMD

From: Elisa Pieri <elisa.pieri90.gmail.com>
Date: Mon, 21 Mar 2016 16:13:05 +0100

In this case the error is:

 Running multisander version of sander Amber14
    Total processors = 16
    Number of groups = 8


     Coordinate resetting (SHAKE) cannot be accomplished,
     deviation is too large
     NITER, NIT, LL, I and J are : 0 0 749 1497 1498

     Note: This is usually a symptom of some deeper
     problem with the energetics of the system.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 24567 on
node agachon exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[agachon:24563] 1 more process has sent help message help-mpi-api.txt /
mpi-abort
[agachon:24563] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages
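
The launch line here is the one from before with sander.MPI swapped in for
pmemd.MPI, i.e. something like:

    mpirun -n 16 sander.MPI -ng 8 -groupfile dio.grpfile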

So there is a SHAKE problem?

Elisa

On Mon, Mar 21, 2016 at 3:59 PM, Jason Swails <jason.swails.gmail.com>
wrote:

> Try sander.MPI. Does it give the same error?
>
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher
>
> > On Mar 21, 2016, at 5:46 AM, Elisa Pieri <elisa.pieri90.gmail.com> wrote:
> >
> > No, because replicas must be run in multisander/multipmemd mode!
> >
> >> On Mon, Mar 21, 2016 at 10:36 AM, Bin ZOU <zoubin1025.gmail.com> wrote:
> >>
> >> Hi Elisa,
> >>
> >> You could just try pmemd, not the MPI version, and maybe you will
> >> see what is wrong.
> >>
> >> On Mon, Mar 21, 2016 at 5:08 PM, Elisa Pieri <elisa.pieri90.gmail.com>
> >> wrote:
> >>
> >>> Ah yes, I forgot to mention it... the mdout files are empty, just like
> >>> the cpouts and the logfiles.
> >>>
> >>> Any idea?
> >>>
> >>> Thanks, Elisa
> >>>
> >>> On Fri, Mar 18, 2016 at 7:46 PM, Jason Swails <jason.swails.gmail.com>
> >>> wrote:
> >>>
> >>>> On Fri, Mar 18, 2016 at 6:05 AM, Elisa Pieri <elisa.pieri90.gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Dear all,
> >>>>>
> >>>>> I was perfectly able to run pH-REMD in explicit solvent, but I have
> >>>>> problems in implicit solvent. When I execute the command:
> >>>>>
> >>>>> mpirun -n 16 pmemd.MPI -ng 8 -groupfile dio.grpfile
> >>>>>
> >>>>> I get this error:
> >>>>>
> >>>>> Running multipmemd version of pmemd Amber12
> >>>>> Total processors = 16
> >>>>> Number of groups = 8
> >>>>> --------------------------------------------------------------------------
> >>>>> MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
> >>>>> with errorcode 1.
> >>>>>
> >>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> >>>>> You may or may not see output from other processes, depending on
> >>>>> exactly when Open MPI kills them.
> >>>>> --------------------------------------------------------------------------
> >>>>> --------------------------------------------------------------------------
> >>>>> mpirun has exited due to process rank 3 with PID 6576 on
> >>>>> node agachon exiting improperly. There are two reasons this could occur:
> >>>>>
> >>>>> 1. this process did not call "init" before exiting, but others in
> >>>>> the job did. This can cause a job to hang indefinitely while it waits
> >>>>> for all processes to call "init". By rule, if one process calls "init",
> >>>>> then ALL processes must call "init" prior to termination.
> >>>>>
> >>>>> 2. this process called "init", but exited without calling "finalize".
> >>>>> By rule, all processes that call "init" MUST call "finalize" prior to
> >>>>> exiting or it will be considered an "abnormal termination"
> >>>>>
> >>>>> This may have caused other processes in the application to be
> >>>>> terminated by signals sent by mpirun (as reported here).
> >>>>>
> >>>>> The groupfile has 8 items, each repeating the same entry:
> >>>>>
> >>>>> # pH 08
> >>>>> -O -i ph08.mdin -p 3lzt.parm7 -c 3lzt.equil.rst7 -cpin 3lzt.equil.cpin
> >>>>> -o 3lzt.ph08.mdout -cpout 3lzt.ph08.cpout -cprestrt 3lzt.ph08.cpin
> >>>>> -r 3lzt.ph08.rst7 -inf 3lzt.ph08.mdinfo -rem 4 -remlog rem.ph.log
> >>>>> -x 3lzt.ph08.nc
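> >>>>>
> >>>>> (In case it helps to see the pattern: the eight entries could be built
> >>>>> with a small loop along these lines. The pH values 1-8 below are only
> >>>>> placeholders; the pH 8 entry above is the only literal one.)
> >>>>>
> >>>>> for ph in 1 2 3 4 5 6 7 8; do
> >>>>>   p=$(printf "%02d" "$ph")
> >>>>>   echo "# pH $p"
> >>>>>   echo "-O -i ph$p.mdin -p 3lzt.parm7 -c 3lzt.equil.rst7 -cpin 3lzt.equil.cpin -o 3lzt.ph$p.mdout -cpout 3lzt.ph$p.cpout -cprestrt 3lzt.ph$p.cpin -r 3lzt.ph$p.rst7 -inf 3lzt.ph$p.mdinfo -rem 4 -remlog rem.ph.log -x 3lzt.ph$p.nc"
> >>>>> done > dio.grpfile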
> >>>>>
> >>>>> (of course, the pH changes from item to item). This is one of the
> >>>>> inputs:
> >>>>>
> >>>>> REM for CpH
> >>>>> &cntrl
> >>>>> icnstph=1, dt=0.002, ioutfm=1, ntxo=2,
> >>>>> nstlim=100, ig=-1, ntb=0, numexchg=10000,
> >>>>> ntwr=10000, ntwx=1000, irest=1,
> >>>>> cut=30, ntcnstph=5, ntpr=1000,
> >>>>> ntx=5, solvph=8, saltcon=0.1, ntt=3,
> >>>>> ntc=2, ntf=2, gamma_ln=5.0, igb=2,
> >>>>> tempi=300, temp0=300, nrespa=1,
> >>>>> tol=0.000001,
> >>>>> /
> >>>>>
> >>>>> I don't understand where this error comes from. Can you help me?
> >>>>
> >>>> The error message you sent is another one of those "something went
> >>>> wrong" error messages -- but it gives no details about what actually
> >>>> went wrong. Look for error messages in some of the mdout files -- those
> >>>> will be more informative.
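> >>>>
> >>>> For example, something along these lines (the file names just assume the
> >>>> naming pattern from your groupfile) would pull any error lines out of all
> >>>> of the mdout files at once:
> >>>>
> >>>>     grep -i error 3lzt.ph*.mdout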
> >>>>
> >>>> HTH,
> >>>> Jason
> >>>>
> >>>> --
> >>>> Jason M. Swails
> >>
> >>
> >>
> >> --
> >> Regards,
> >> ZOU, Bin
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Mar 21 2016 - 08:30:04 PDT