Re: [AMBER] strange background job crash

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 18 Apr 2011 09:35:20 -0700

Hi Bala,

It is perfectly fine to use 25 CPUs for a run with PMEMD. I just thought it
was a strange number, since reaching 25 takes an unusual combination of node
count and cores per node. Are you using 5 nodes with 5 cores per node? It is
not normally a good idea to use different core counts per node, e.g. 6 nodes
where you use 4 cores on 5 of the nodes and 5 on the other. This leads to
imbalances in the way the interconnect is used and will probably hurt
performance.
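
The arithmetic behind the two layouts mentioned above can be checked with a
few lines of shell. This is only a sketch: `layout` is a hypothetical helper
name, not part of AMBER or any MPI distribution, and it assumes processes are
spread as evenly as possible across the nodes.

```shell
#!/bin/sh
# Hypothetical helper (not an AMBER or MPI tool): report how NP MPI
# processes spread over NODES nodes, assuming an as-even-as-possible split.
layout() {
    np=$1
    nodes=$2
    per_node=$((np / nodes))      # cores used on every node
    remainder=$((np % nodes))     # nodes that must take one extra process
    if [ "$remainder" -eq 0 ]; then
        echo "balanced: $nodes nodes x $per_node cores"
    else
        echo "unbalanced: $remainder node(s) get $((per_node + 1)) cores, $((nodes - remainder)) get $per_node"
    fi
}

layout 25 5
layout 25 6
```

With 25 processes, 5 nodes give an even split, while 6 nodes reproduce the
uneven 4/4/4/4/4/5 case described above.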

All the best
Ross
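
The stdin redirection suggested in the quoted message below can be tried
without MPI. The sketch uses `cat` and `sleep` purely as stand-ins (they are
assumptions for illustration, not the actual pmemd.MPI run): a program whose
stdin is /dev/null sees end-of-file immediately and so has no reason to read
from the terminal, which is presumably why the redirect keeps a backgrounded
mpirun from being disturbed by key presses.

```shell
#!/bin/sh
# 'cat' is a stand-in for any program that reads stdin.
# With stdin redirected from /dev/null it gets end-of-file at once.
count_stdin_bytes() {
    cat </dev/null | wc -c | tr -d ' '
}

# Same redirection applied to a background job; 'sleep 1' stands in for
# the long-running mpirun command line.
sleep 1 </dev/null >/dev/null 2>&1 &
bg_pid=$!
wait "$bg_pid"
bg_status=$?

echo "bytes read from stdin: $(count_stdin_bytes)"
echo "background exit status: $bg_status"
```

The same `</dev/null ... &` pattern is what the nohup mpirun line in the
quoted message applies to the real job.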

> -----Original Message-----
> From: Bala subramanian [mailto:bala.biophysics.gmail.com]
> Sent: Monday, April 18, 2011 9:25 AM
> To: AMBER Mailing List
> Subject: Re: [AMBER] strange background job crash
>
> Thank you. Now the job seems to be running. But why is it strange to use
> 25 CPUs for the process? Somewhere in the AMBER forum I once read that
> for sander the number of CPUs must be a power of 2, but for pmemd any
> number of CPUs can be given. Is this still the case?
>
> On Mon, Apr 18, 2011 at 6:11 PM, Ross Walker <ross.rosswalker.co.uk>
> wrote:
>
> > Hi Bala,
> >
> > Try redirecting stdin from /dev/null - some MPI implementations require
> > this. As an aside 25 MPI threads is a strange number to be using. You
> > really have 25 cores allocated to this job?
> >
> > nohup mpirun -np 25 pmemd.MPI -O -i md1 -o md1.out -r md1.rst -p ALL.top -c min1.rst -ref min1.rst </dev/null &
> >
> > All the best
> > Ross
> >
> > > -----Original Message-----
> > > From: Bala subramanian [mailto:bala.biophysics.gmail.com]
> > > Sent: Monday, April 18, 2011 8:44 AM
> > > To: AMBER Mailing List
> > > Subject: [AMBER] strange background job crash
> > >
> > > Friends,
> > > When I submit a job using pmemd.MPI (AMBER 11), it runs fine with the
> > > following syntax:
> > >
> > > mpirun -np 25 pmemd.MPI -O -i md1 -o md1.out -r md1.rst -p ALL.top -c min1.rst -ref min1.rst
> > >
> > > But when I submit the same job in the background with the & symbol at
> > > the end, and then press any key after submitting it, the job gets
> > > terminated. Could you please tell me what the problem might be? The
> > > following message is printed on termination:
> > >
> > > HYDU_sock_read (./utils/sock/sock.c:223): read errno (Input/output error)
> > > control_cb (./pm/pmiserv/pmiserv_cb.c:249): assert (!closed) failed
> > > HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> > > HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:206): error waiting for event
> > > main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion
> > >
> > > I am submitting the job on SUSE Linux Enterprise Server 11.
> > >
> > > Thanks,
> > > Bala
> > > _______________________________________________
> > > AMBER mailing list
> > > AMBER.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber


Received on Mon Apr 18 2011 - 10:00:03 PDT