[AMBER] job failed for REMD in cluster

From: Albert <mailmd2011.gmail.com>
Date: Mon, 15 Oct 2012 20:48:04 +0200

hello:

I am trying to submit REMD jobs on a cluster under Amber 12 with the command:

.
.
.

mpirun -np 64 $AMBERHOME/bin/pmemd.MPI -ng 11 -groupfile
equilibrate.groupfile
.
.
.

but it fails with the errors below. The minimization steps run fine.



[n385:15832] [[30377,0],3]-[[30377,1],53] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
[n388:04879] 63 more processes have sent help message help-mpi-api.txt /
mpi-abort
[n388:04879] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
setup_groups: MPI size is not a multiple of -ng
setup_groups: MPI size is not a multiple of -ng
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 4946 on
node n388 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
setup_groups: MPI size is not a multiple of -ng
[n388:04945] 3 more processes have sent help message help-mpi-api.txt /
mpi-abort
[n388:04945] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages
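The key line in the log is "setup_groups: MPI size is not a multiple of -ng": pmemd.MPI splits the total MPI rank count evenly across the replica groups, so -np must be an exact multiple of -ng (64 is not divisible by 11). A minimal sketch of that divisibility check, assuming the 64/11 values from the command above:

```python
def mpi_size_ok(np_procs: int, ng: int) -> bool:
    """True when the MPI size divides evenly among the -ng replica groups."""
    return np_procs % ng == 0

# The failing combination from the command above: 64 ranks, 11 groups.
print(mpi_size_ok(64, 11))  # False -> setup_groups aborts
# A multiple of 11 would be accepted, e.g. 66 ranks = 6 per group.
print(mpi_size_ok(66, 11))  # True
```

So either requesting a rank count that is a multiple of 11 (e.g. -np 66) or adjusting the number of groups should clear this particular error.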



thank you very much
Albert


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Oct 15 2012 - 12:00:03 PDT