Re: [AMBER] job failed for REMD in cluster

From: Carlos Simmerling <carlos.simmerling.gmail.com>
Date: Mon, 15 Oct 2012 15:01:47 -0400

Why 11?
On Oct 15, 2012 2:54 PM, "Albert" <mailmd2011.gmail.com> wrote:

> If I have 11 groups (-ng 11), I probably have to use a core count like 66 or 77?
> Does that mean I cannot run it with CUDA?
>
> thank you very much
>
>
> On 10/15/2012 08:48 PM, Albert wrote:
> > hello:
> >
> > I am trying to submit REMD jobs on a cluster under Amber 12 with the command:
> >
> > .
> > .
> > .
> >
> > mpirun -np 64 $AMBERHOME/bin/pmemd.MPI -ng 11 -groupfile equilibrate.groupfile
> > .
> > .
> > .
> >
> > but it failed with the following output. The minimization steps ran fine.
> >
> >
> >
> > [n385:15832] [[30377,0],3]-[[30377,1],53] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
> > [n388:04879] 63 more processes have sent help message help-mpi-api.txt / mpi-abort
> > [n388:04879] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> >
> --------------------------------------------------------------------------
> >
> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> > with errorcode 1.
> >
> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > You may or may not see output from other processes, depending on
> > exactly when Open MPI kills them.
> >
> --------------------------------------------------------------------------
> >
> > setup_groups: MPI size is not a multiple of -ng
> > setup_groups: MPI size is not a multiple of -ng
> >
> --------------------------------------------------------------------------
> >
> > mpirun has exited due to process rank 0 with PID 4946 on
> > node n388 exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> >
> --------------------------------------------------------------------------
> >
> > setup_groups: MPI size is not a multiple of -ng
> > [n388:04945] 3 more processes have sent help message help-mpi-api.txt / mpi-abort
> > [n388:04945] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> >
> >
> >
> > thank you very much
> > Albert
> >
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
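The "setup_groups: MPI size is not a multiple of -ng" message is a divisibility check: the total process count passed to `mpirun -np` must split evenly across the number of groups passed to `-ng`. With `-np 64` and `-ng 11` it cannot, since 64 = 5 × 11 + 9. A minimal Python sketch of that arithmetic follows, assuming the rule is exactly the one stated in the error message; the function names are hypothetical and are not part of Amber.

```python
def is_valid_layout(np_procs: int, ng: int) -> bool:
    """True if np_procs MPI ranks can be split evenly into ng groups.

    Assumption: this mirrors the "MPI size is not a multiple of -ng" check
    reported by setup_groups; it is not Amber's actual source code.
    """
    return ng > 0 and np_procs % ng == 0


def valid_np_values(ng: int, max_procs: int) -> list:
    """Every -np value up to max_procs that is a multiple of -ng."""
    return list(range(ng, max_procs + 1, ng))


if __name__ == "__main__":
    ng = 11  # -ng value from the failing command
    for np_procs in (64, 66):
        if is_valid_layout(np_procs, ng):
            print(f"-np {np_procs} -ng {ng}: OK, {np_procs // ng} ranks per group")
        else:
            print(f"-np {np_procs} -ng {ng}: not a multiple of -ng")
    print("candidate -np values:", valid_np_values(ng, 88))
```

With `-ng 11`, 66 processes would give each group 66 / 11 = 6 ranks, which matches the 66 or 77 guess above; whether that layout suits a particular system, or a CUDA build, is a separate question this thread does not settle.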
Received on Mon Oct 15 2012 - 12:30:05 PDT