Re: [AMBER] mpirun error

From: Jason Swails <jason.swails.gmail.com>
Date: Sun, 26 Apr 2015 19:19:05 -0400

On Sun, Apr 26, 2015 at 12:23 PM, shahab shariati <shahab.shariati.gmail.com
> wrote:

> Dear amber users
>
> After successfully installing the parallel version of Amber, I ran the energy
> minimization without problems. But in the heating step, I encountered the
> following error:
>
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> [compute-0-0.local][[56620,1],1][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 14580 on
> node compute-0-0.local exiting improperly. There are two reasons this could
> occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
> --------------------------------------------------------------------------------------------
>
> 1) What is the reason for this error?
>

No idea. This is a very generic error message -- all it says is
"something has gone wrong". With no details about what you did (i.e., the
*exact* command-line that you used along with the mdin file) or any
potential error messages printed in the mdout file, there is nothing we can
do to help.

I suggest looking in the mdout file for the error message and searching for
it online (chances are your problem has been encountered before). If you
still can't figure it out, reply with the command you used, the input file,
and the contents of the mdout file so we have enough information to
investigate the source of the problem.
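
If the mdout file is long, a small script can help pull out the likely
error lines. The following is only a rough sketch in Python: the keyword
list and the function names (find_suspect_lines, tail, report) are
illustrative assumptions, not anything defined by Amber, and the actual
wording of the message in your mdout may differ. Since a failed run often
prints its error near the end of the output, the sketch also shows the last
few lines of the file.

```python
from pathlib import Path

# Hypothetical keywords; adjust them to whatever your mdout actually contains.
KEYWORDS = ("error", "fatal", "unable", "could not", "not found")


def find_suspect_lines(text, keywords=KEYWORDS, context=2):
    """Return (line_number, surrounding_text) for lines matching any keyword."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        lowered = line.lower()
        if any(k in lowered for k in keywords):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, "\n".join(lines[start:end])))
    return hits


def tail(text, n=20):
    """Return the last n lines of the text as a single string."""
    return "\n".join(text.splitlines()[-n:])


def report(path):
    """Print keyword matches and the end of the given output file."""
    text = Path(path).read_text(errors="replace")
    for number, block in find_suspect_lines(text):
        print(f"--- match at line {number} ---")
        print(block)
    print("--- end of file ---")
    print(tail(text))
```

Calling report("heat.mdout") (the file name here is a placeholder for your
own output file) prints each matching line with two lines of context,
followed by the last 20 lines of the file.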

HTH,
Jason

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Apr 26 2015 - 16:30:02 PDT