[AMBER] job failed for REMD in cluster

From: Albert <mailmd2011.gmail.com>
Date: Mon, 15 Oct 2012 20:48:04 +0200


I am trying to submit REMD jobs on a cluster with Amber 12 using the command:


mpirun -np 64 $AMBERHOME/bin/pmemd.MPI -ng 11 -groupfile

but it fails with the following output. The minimization steps run fine.

[n385:15832] [[30377,0],3]-[[30377,1],53] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
[n388:04879] 63 more processes have sent help message help-mpi-api.txt /
[n388:04879] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
setup_groups: MPI size is not a multiple of -ng
setup_groups: MPI size is not a multiple of -ng
mpirun has exited due to process rank 0 with PID 4946 on
node n388 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
setup_groups: MPI size is not a multiple of -ng
[n388:04945] 3 more processes have sent help message help-mpi-api.txt /
[n388:04945] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages
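
The repeated setup_groups lines in the output above say that the MPI size is not a multiple of -ng, and with -np 64 and -ng 11 that is indeed the case (64 = 5 x 11 + 9). This appears to be what triggers the MPI_ABORT. As a minimal sketch of that arithmetic, the shell function below checks a rank count against a group count before submitting; check_groups is a hypothetical helper name, not part of Amber or Open MPI, and 66 is only an illustrative alternative rank count.

```shell
# Hypothetical pre-flight helper (not part of Amber or Open MPI):
# report whether the MPI rank count divides evenly among the groups.
check_groups() {
    np=$1   # total MPI ranks, as passed to mpirun -np
    ng=$2   # number of groups, as passed to -ng
    if [ $((np % ng)) -ne 0 ]; then
        echo "np=$np is not a multiple of ng=$ng (remainder $((np % ng)))"
    else
        echo "np=$np gives $((np / ng)) ranks per group"
    fi
}

check_groups 64 11   # the values from the command above
check_groups 66 11   # illustrative: a rank count that divides evenly
```

Running such a check in the job script before the mpirun line would catch the mismatch without waiting for the job to start on the cluster.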

Thank you very much.

AMBER mailing list
Received on Mon Oct 15 2012 - 12:00:03 PDT