Hi Jason,
Thanks for your support. I looked at the error output (sorry, I had named the
job incorrectly, which is why I didn't see this at the beginning). It contains
many errors, starting with:
/var/spool/torque/mom_priv/jobs/9489199.yak.local.SC: line 9: purge: command not found
[n248:16483] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[n248:16483] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[n248:16483] 10 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
Line minimizer aborted: step at upper bound 0.0016720636
--------
(followed by many more "Line minimizer aborted" lines, which I have omitted)
--------
/var/spool/torque/mom_priv/jobs/9489199.yak.local.SC: line 9: purge: command not found
[n248:16483] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[n248:16483] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[n248:16483] 10 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
Line minimizer aborted: step at upper bound 0.0016720636
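Rather than skipping the repeated lines by hand, a short script can collapse identical messages and count them, so the distinct errors (such as "purge: command not found" from line 9 of the job script) stand out. This is a minimal Python sketch, not part of AMBER: it masks decimal numbers so that "Line minimizer aborted" lines with different step values are counted together. The files it reads are whatever you pass on the command line; falling back to the _MMPBSA_* files in the current directory is an assumption about where you run it.

```python
import collections
import glob
import re
import sys

# Decimal numbers such as "0.0016720636" vary from line to line; masking them
# lets otherwise identical messages be counted together.
_DECIMAL = re.compile(r"\d+\.\d+")


def read_lines(paths):
    """Yield every line from each file in paths, tolerating undecodable bytes."""
    for path in paths:
        with open(path, errors="replace") as handle:
            yield from handle


def tally_lines(lines):
    """Count distinct non-empty lines after masking decimal numbers."""
    counts = collections.Counter()
    for line in lines:
        text = _DECIMAL.sub("<num>", line.strip())
        if text:
            counts[text] += 1
    return counts


def summarize(counts, limit=20):
    """Format the `limit` most frequent lines as 'count  line' strings."""
    return [f"{n:6d}  {text}" for text, n in counts.most_common(limit)]


if __name__ == "__main__":
    # Pass the job's stderr file(s) as arguments; with none, fall back to
    # the _MMPBSA_* files in the current directory.
    files = sys.argv[1:] or sorted(glob.glob("_MMPBSA_*"))
    for row in summarize(tally_lines(read_lines(files))):
        print(row)
```

Run it with the stderr file written by Torque as the argument; the most frequent messages are listed first, with their counts.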
I am now trying your other debugging hints. Let me know if you have any
further suggestions (much appreciated).
Regards,
Mohammad
On 15-11-13 08:05 PM, Jason Swails wrote:
> On Fri, Nov 13, 2015 at 6:33 PM, Mohammad Salem <mohammad.alaraby.gmail.com>
> wrote:
>
>> Hi All,
>>
>> I get this error:
>>
>> Beginning quasi-harmonic calculations with
>> /global/software/amber-14/bin/cpptraj
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>
> Looks like the quasi-harmonic calculation is failing.
>
>> with errorcode 1.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>>
> This doesn't say why it failed, only that a problem was detected and the
> program quit. It could be that you are using too much memory: depending on
> how many CPUs you are using, it may be running too many concurrent normal
> mode calculations and the covariance matrix diagonalization at the same
> time.
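To put a rough number on the memory concern: both the quasi-harmonic covariance matrix and a normal-mode Hessian for N atoms have dimension 3N x 3N. This sketch assumes each is held as a dense matrix of 8-byte doubles, so one matrix needs 8 x (3N)^2 bytes, and that every process holds one at the same time. Real programs may store only half of a symmetric matrix or need extra workspace for diagonalization, so treat the result as an order-of-magnitude check; the atom counts and the 12 processes are placeholders.

```python
BYTES_PER_DOUBLE = 8  # IEEE-754 double precision


def dense_matrix_bytes(n_atoms):
    """Bytes for one dense 3N x 3N matrix of doubles (N = number of atoms)."""
    dim = 3 * n_atoms
    return dim * dim * BYTES_PER_DOUBLE


def concurrent_gib(n_atoms, n_processes):
    """Total GiB if each of n_processes holds one such matrix simultaneously."""
    return n_processes * dense_matrix_bytes(n_atoms) / 2**30


if __name__ == "__main__":
    # Placeholder system sizes and process count; substitute your own.
    for atoms in (2000, 5000, 10000):
        per_matrix = dense_matrix_bytes(atoms) / 2**30
        total = concurrent_gib(atoms, n_processes=12)
        print(f"{atoms:6d} atoms: {per_matrix:6.2f} GiB each, "
              f"{total:7.2f} GiB for 12 processes")
```

With the placeholder 5,000 atoms, one matrix is 1.8 x 10^9 bytes (about 1.7 GiB), so 12 concurrent copies come to roughly 20 GiB, which can easily exceed the memory of a single node.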
>
> Some basic debugging advice:
>
> - Look for error messages in the output files whose names begin with
> _MMPBSA_, in particular anything in the quasi-harmonic output.
>
> - Look for clues in the standard output and standard error streams
> (I presume this was a submitted job, and that these streams were
> redirected to files).
>
> - Try some simpler calculations. Do not put entropy=1 and normal mode
> calculations in the same input file. Try the normal mode approximation in
> serial on a single frame to make sure it works. Try the quasi-harmonic
> approximation in a separate calculation (again, in serial, but with many
> frames) to make sure the two work independently.
>
> - If those work, try scaling the calculation up to more processors. Start
> small (try 2, then 4) and slowly work your way up to the number of cores
> you want to use. The point at which it starts failing tells you the
> maximum number of cores your calculation supports (just make sure that
> you supply each CPU with a single frame).
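The step-up described above can be written out as a plan before submitting anything. This sketch doubles the core count from a serial run until it reaches the target, and pairs each count with a frame range that gives one frame per core. The target of 12 cores and the starting frame are placeholders, and mapping each range onto your MMPBSA.py input (for example startframe/endframe in &general) is left to you.

```python
def core_counts(target):
    """Core counts to try: 1 (serial), 2, 4, ... doubling, ending at target."""
    if target < 1:
        raise ValueError("target must be at least 1 core")
    counts = []
    n = 1
    while n < target:
        counts.append(n)
        n *= 2
    counts.append(target)
    return counts


def scaling_plan(target, first_frame=1):
    """(cores, first frame, last frame) for each step, one frame per core."""
    return [(n, first_frame, first_frame + n - 1) for n in core_counts(target)]


if __name__ == "__main__":
    # Placeholder target; the first step that fails bounds the usable core count.
    for cores, first, last in scaling_plan(12):
        print(f"{cores:3d} cores -> frames {first}-{last}")
```

Doubling keeps the number of test jobs small (five steps for 12 cores) while still bracketing the point of failure to within a factor of two.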
>
> Note that running PB and quasi-harmonic calculations in the same input is
> going to be *very* demanding. The entropy=1 option typically requires many
> thousands of frames before the total entropy even starts to converge, while
> it is usually a waste of resources to carry out PB calculations on more
> than a couple hundred (statistically independent) frames.
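Following that advice, the quasi-harmonic and PB calculations can go in two separate input files with very different frame sampling. This sketch writes both from Python, assuming a trajectory of 10,000 frames (a placeholder) and MMPBSA.py's &general variables startframe, endframe, interval and entropy. The &gb section with igb=5 in the entropy file is included on the assumption that MMPBSA.py expects at least one energy method alongside entropy=1. The file names qh.in and pb.in and the istrng value are hypothetical, and the frame count assumes frames are taken as startframe, startframe+interval, ... up to endframe.

```python
def frames_selected(start, end, interval):
    """Frames picked by start/end/interval stepping (end inclusive)."""
    return len(range(start, end + 1, interval))


def qh_input(total_frames):
    """Quasi-harmonic input: every frame, since entropy=1 needs thousands."""
    return (
        "Quasi-harmonic entropy, all frames\n"
        "&general\n"
        f"  startframe=1, endframe={total_frames}, interval=1, entropy=1,\n"
        "/\n"
        "&gb\n"
        "  igb=5,\n"
        "/\n"
    )


def pb_input(total_frames, target_frames=200):
    """PB input: spread about target_frames frames across the trajectory."""
    interval = max(1, total_frames // target_frames)
    return (
        "PB energies, sparse frames\n"
        "&general\n"
        f"  startframe=1, endframe={total_frames}, interval={interval},\n"
        "/\n"
        "&pb\n"
        "  istrng=0.100,\n"
        "/\n"
    )


if __name__ == "__main__":
    total = 10000  # placeholder trajectory length
    for name, text in (("qh.in", qh_input(total)), ("pb.in", pb_input(total))):
        with open(name, "w") as handle:
            handle.write(text)
    print("PB frames:", frames_selected(1, total, max(1, total // 200)))
```

With 10,000 frames, the PB input uses interval=50, which selects 200 frames, while the entropy input keeps all 10,000.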
>
> HTH,
> Jason
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Nov 16 2015 - 22:00:03 PST