So if I have 11 groups (-ng 11), I probably have to use a core count that is a multiple of 11, such as 66 or 77?
Does that mean I cannot run it with CUDA?
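
As a rough illustration of the constraint behind the "MPI size is not a multiple of -ng" message quoted below, here is a minimal shell sketch. It assumes only that the value given to -np must be divisible by the value given to -ng. nearest_multiples is a hypothetical helper written for this example, not an Amber tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amber): given the group count (-ng)
# and a requested MPI process count (-np), print the nearest process
# counts at or below and at or above the request that are multiples
# of the group count.
nearest_multiples() {
    ng=$1   # number of groups, as passed to -ng
    np=$2   # requested process count, as passed to mpirun -np
    lower=$(( np / ng * ng ))       # integer division rounds down
    if [ "$lower" -eq "$np" ]; then
        upper=$np                   # request is already a multiple
    else
        upper=$(( lower + ng ))
    fi
    echo "$lower $upper"
}

nearest_multiples 11 64   # prints "55 66"
```

With -ng 11, a request of 64 processes falls between the valid counts 55 and 66, which matches the error from the 64-process run.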
Thank you very much.
On 10/15/2012 08:48 PM, Albert wrote:
> hello:
>
> I am trying to submit REMD jobs on a cluster under Amber 12 with the command:
>
> .
> .
> .
>
> mpirun -np 64 $AMBERHOME/bin/pmemd.MPI -ng 11 -groupfile equilibrate.groupfile
> .
> .
> .
>
> but it fails with the following output. The minimization steps run fine.
>
>
>
> [n385:15832] [[30377,0],3]-[[30377,1],53] mca_oob_tcp_msg_recv: readv
> failed: Connection reset by peer (104)
> [n388:04879] 63 more processes have sent help message help-mpi-api.txt
> / mpi-abort
> [n388:04879] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
> --------------------------------------------------------------------------
>
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
>
> setup_groups: MPI size is not a multiple of -ng
> setup_groups: MPI size is not a multiple of -ng
> --------------------------------------------------------------------------
>
> mpirun has exited due to process rank 0 with PID 4946 on
> node n388 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
> setup_groups: MPI size is not a multiple of -ng
> [n388:04945] 3 more processes have sent help message help-mpi-api.txt
> / mpi-abort
> [n388:04945] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
>
>
>
> Thank you very much,
> Albert
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Oct 15 2012 - 12:00:03 PDT