Re: [AMBER] error with NMA calculation

From: Jason Swails <jason.swails.gmail.com>
Date: Fri, 13 Nov 2015 22:05:41 -0500

On Fri, Nov 13, 2015 at 6:33 PM, Mohammad Salem <mohammad.alaraby.gmail.com>
wrote:

> Hi All,
>
> I get this error:
>
> Beginning quasi-harmonic calculations with
> /global/software/amber-14/bin/cpptraj
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>

Looks like the quasi-harmonic calculation is failing.



> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
>

This doesn't provide any information about why. It could be that you are
using too much memory -- depending on how many CPUs you are using it may be
trying to do too many concurrent normal mode calculations and the
covariance matrix diagonalization at the same time. Your error message
doesn't say anything about what might have gone wrong -- just that a
problem was detected and it quit.

Some basic debugging advice:

- Look for error messages in some of the output files beginning with
_MMPBSA_ -- in particular look for anything in the quasi-harmonic output

- Look for any clues in the standard error stream or standard output stream
(I presume this was a submitted job, and that the output and/or error was
sent to a file).

- Try some simpler calculations. Do not put entropy=1 and normal mode
calculations in the same input file. Try the normal mode approximation in
serial on a single frame to make sure it works. Try the quasi-harmonic
approximation in a separate calculation (again, in serial, but with many
frames) to make sure the two work independently.

- If it works, try scaling the calculation up to more processors. Start
small -- try 2, then 4, slowly working your way up to the number of cores
you want to use. You should see the maximum number of cores that are
supported for your calculation based on when it starts failing (just make
sure that you supply each CPU with a single frame).
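
For the first suggestion above, a rough sketch of how you might scan the
intermediate files for clues. This is only an illustration: the keyword
list is my guess at what a failure might print, not a list of messages
cpptraj or MMPBSA.py is known to emit, and find_error_lines is a made-up
helper name. It assumes the _MMPBSA_ files are in the directory you pass in.

```python
from pathlib import Path

# Assumed keywords; adjust to whatever your failing run actually prints.
KEYWORDS = ("error", "abort", "segmentation fault", "killed", "cannot allocate")


def find_error_lines(directory=".", prefix="_MMPBSA_", keywords=KEYWORDS):
    """Return (file name, line number, text) for each line containing a keyword."""
    hits = []
    for path in sorted(Path(directory).glob(prefix + "*")):
        if not path.is_file():
            continue
        # Some intermediate files may not be plain text, so decode leniently.
        with path.open("r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(word in lowered for word in keywords):
                    hits.append((path.name, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    for name, number, text in find_error_lines():
        print(f"{name}:{number}: {text}")
```

Run it from the directory where the job ran. The same function can be
pointed at the scheduler's stdout/stderr file by changing the prefix.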

Note that doing PB and quasi-harmonic calculations in the same input is
going to be *very* demanding. entropy=1 typically requires many thousands
of frames to even start to converge the total entropy, while it's typically
a waste of resources to carry out PB calculations on more than a couple
hundred (statistically independent) frames.
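
To put rough numbers on that mismatch, here is a minimal sketch of the
arithmetic for thinning a trajectory down to a PB-sized subset. The 5000
and 200 figures are illustrative only, and both helper names are made up.
The result would be used as the interval variable in the &general section
of a separate PB input file. Note that a stride by itself does not
guarantee the frames are statistically independent.

```python
import math


def interval_for_target(total_frames, target_frames):
    """Smallest stride that keeps at most target_frames out of total_frames."""
    if total_frames < 1 or target_frames < 1:
        raise ValueError("frame counts must be positive")
    return max(1, math.ceil(total_frames / target_frames))


def frames_used(total_frames, interval, startframe=1):
    """Number of frames visited when stepping from startframe by interval."""
    return len(range(startframe, total_frames + 1, interval))


if __name__ == "__main__":
    total = 5000   # hypothetical trajectory length
    target = 200   # "a couple hundred" frames for PB
    stride = interval_for_target(total, target)
    print(f"interval={stride} uses {frames_used(total, stride)} frames")
```

The quasi-harmonic run would then use the full trajectory (interval=1) in
its own input file, in line with the advice to keep the two separate.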

HTH,
Jason

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Nov 13 2015 - 19:30:03 PST