Re: [AMBER] Sander.MPI parallel run

From: Jason Swails <jason.swails.gmail.com>
Date: Thu, 27 Oct 2011 16:21:21 -0400

On Thu, Oct 27, 2011 at 4:02 PM, Lianhu Wei <lianhu.wei.gmail.com> wrote:

> Hi Jason,
>
> On my cluster, I have two network interface, one ethernet and one
> Infiniband. How can I know if the data exchange is via ethernet or
> Infiniband?
>

Not sure -- that's a question to ask your sysadmin. Naively, I would say
switch to mvapich/mvapich2 and see if you get a big speedup. If so,
that's your answer. If not, you're still stuck with the same question.
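
One quick test from the MPI side, at least with OpenMPI (a sketch -- the
flag below is the OpenMPI 1.4-era syntax, so check the docs for your
version): tell mpirun to use only the InfiniBand transport, with no TCP
fallback, and see whether the job still launches.

    # disallow TCP; if MPI cannot reach the InfiniBand hardware, the
    # launch fails instead of silently falling back to ethernet
    mpirun --mca btl openib,self,sm -np 16 -machinefile $PBS_NODEFILE \
        pmemd.MPI -O -i ENZ_KP94_CnsP_4_6ns.in ...

You can also run ibstat (from the infiniband-diags package) on a node
and check that the IB ports show "State: Active".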

But I did mean to mention that you should not use nohup inside a
submission script, which Ross pointed out (that's what reminded me). The
last thing you want in an automated environment like this is to launch a
process that resists hang-up signals. nohup is for a personal machine,
when you don't want the process to be killed after you end your shell
session and go home.
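
Concretely, in your Qsub_T script below, the launch line would just lose
the nohup (everything else unchanged):

    # the batch system already detaches the job; no nohup needed
    mpirun -np 16 -machinefile $PBS_NODEFILE pmemd.MPI -O \
        -i ENZ_KP94_CnsP_4_6ns.in -o T_ENZ_KP94_CnsP_6ns.out ...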

HTH,
Jason


> Sorry for the system question.
> William
>
> On Thu, Oct 27, 2011 at 3:32 PM, Jason Swails <jason.swails.gmail.com>
> wrote:
>
> > On Thu, Oct 27, 2011 at 3:08 PM, Lianhu Wei <lianhu.wei.gmail.com>
> > wrote:
> >
> > > Hi Ross and other experts,
> > >
> > > This time, I used pmemd.MPI. Generally it runs a lot faster than
> > > sander.MPI, but I still have the same issue when I run on multiple
> > > nodes. I also looked carefully at my input options. All these runs
> > > used the same input file, just varying the number of nodes on the
> > > cluster. The interconnect among the nodes is InfiniBand. In my PBS
> > > scripts, I removed all the unnecessary options. The result is still
> > > that with 2 nodes I got the maximum performance (x1.65 compared to
> > > one node); on 4 nodes it was x0.3, and on 8 nodes it was x0.1 --
> > > much slower than running on one node. My simulation system is about
> > > 100K atoms. I checked all the distributed nodes; there were 8
> > > threads on each node.
> > >
> > > When pmemd is running on 1 or 2 nodes, most of the threads on the
> > > nodes are running. Running on 4 or 8 nodes, I saw many threads with
> > > status "S" (sleeping).
> > >
> > > I do not know whether I used the options improperly or pmemd did
> > > not distribute the calculation well enough. Please give suggestions.
> > >
> >
> > pmemd does a good job of load balancing, and it does it dynamically,
> > so the problem is unlikely to be workload distribution. I'm guessing
> > your issue is the interconnect. You say you have InfiniBand (but not
> > what speed of InfiniBand). Keep in mind that with 8 threads per node,
> > you're effectively cutting the bandwidth that each processor sees down
> > to 1/8 of the full speed (certain MPI implementations may be able to
> > deal with that more efficiently than others).
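> >
> > For example, if these are 4x DDR InfiniBand adapters (about 16 Gbit/s
> > of data bandwidth -- an assumption on my part, since you didn't say),
> > then 8 ranks sharing one adapter see roughly 16/8 = 2 Gbit/s each.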
> >
> > Another possibility is that your MPI is not actually taking advantage
> > of your InfiniBand hardware. Can you install a different MPI (like
> > mvapich2 or mvapich) and try with that? Those two packages are
> > tailored specifically for InfiniBand.
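> >
> > Roughly, the rebuild would look like this (a sketch only -- the
> > configure invocation and install path depend on your AMBER version
> > and your site, so treat these as placeholders):
> >
> >     # put MVAPICH2's compiler wrappers first in the PATH
> >     # (hypothetical install prefix -- adjust to your site)
> >     export PATH=/opt/mvapich2/bin:$PATH
> >     # reconfigure and rebuild the parallel binaries
> >     cd $AMBERHOME/AmberTools/src && ./configure -mpi gnu
> >     cd $AMBERHOME/src && make clean && make parallel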
> >
> > HTH,
> > Jason
> >
> >
> > > Thanks,
> > > WIlliam
> > >
> > > Here is my PBS scripts on 2 nodes:
> > >
> > > [william.speed]$ more Qsub_T
> > > #!/bin/bash
> > > #PBS -V
> > > #PBS -r n
> > > #PBS -N ENZ_T
> > > #PBS -l select=2:ncpus=8
> > > #PBS -l select=arch=linux
> > > #PBS -m abe
> > > #PBS -q verylong
> > > #
> > > export WRKDIR=/home/william/Work/PAD4/MD/KP94_ENZ_T
> > > cd $WRKDIR
> > >
> > > ulimit -a
> > >
> > > nohup mpirun -np 16 -machinefile $PBS_NODEFILE pmemd.MPI -O -i
> > > ENZ_KP94_CnsP_4_6ns.in \
> > > -o T_ENZ_KP94_CnsP_6ns.out \
> > > -p K94_ENZ.top \
> > > -c ENZ_KP94_CnsP_4ns.rst \
> > > -x T_ENZ_KP94_CnsP_6ns.mdcrd \
> > > -v T_ENZ_KP94_CnsP_6ns.mdvel \
> > > -e T_ENZ_KP94_CnsP_6ns.mden \
> > > -r T_ENZ_KP94_CnsP_6ns.rst \
> > > -inf T_ENZ_KP94_CnsP_6ns.mdinfo
> > >
> > > =========================
> > >
> > > This is my pmemd input file:
> > > &cntrl
> > > timlim = 999999,
> > > imin=0,
> > > nmropt = 0,
> > >
> > > ntx=7,
> > > irest=1,
> > >
> > > ntxo=1,
> > > ntpr=50,
> > > ntwr=50,
> > > iwrap=0,
> > > ntwx=500,
> > > ntwv=500,
> > > ntwe=500,
> > >
> > > ntf=2,
> > > ntb=2,
> > > dielc=1.0,
> > > igb=0,
> > > scnb=2.0,
> > > scee=1.2,
> > >
> > > nstlim=1000000,
> > > t=10.0,
> > > dt=0.002,
> > >
> > > temp0=300,
> > > tempi=300,
> > > heat=0.0,
> > > ntt=1,
> > > tautp=1.0,
> > > vlimit=0.0,
> > >
> > > ntp=1,
> > > pres0=1.0,
> > > comp=44.6,
> > > taup=1.0,
> > > npscal=1,
> > >
> > >
> > > ntc=2,
> > > tol=0.0005,
> > >
> > > cut=12.0,
> > > &end
> > >
> > > &ewald
> > > a = 104.8389167,
> > > b = 132.7362072,
> > > c = 72.5503008,
> > > alpha=90,
> > > beta=90,
> > > gamma=90,
> > > nfft1=100,
> > > nfft2=144,
> > > nfft3=81,
> > > order=4,
> > > ischrgd=0,
> > > verbose=1,
> > > ew_type=0,
> > > dsum_tol=0.00001,
> > > &end
> > >
> > > Here is the speed reported by pmemd in the mdinfo files:
> > >
> > > #PBS -l select=1:ncpus=8
> > > ...
> > > nohup mpirun -np 8 ...
> > >
> > > ==> T1_ENZ_KP94_CnsP_6ns.mdinfo <==
> > > | Elapsed(s) = 69.08 Per Step(ms) = 345.38
> > > | ns/day = 0.50 seconds/ns = 172688.77
> > > |
> > > | Average timings for all steps:
> > > | Elapsed(s) = 3215.09 Per Step(ms) = 347.58
> > > | ns/day = 0.50 seconds/ns = 173788.91
> > > |
> > > |
> > > | Estimated time remaining: 95.7 hours.
> > >
> > >
> >
> ------------------------------------------------------------------------------
> > >
> > > #PBS -l select=2:ncpus=8
> > > ...
> > > nohup mpirun -np 16
> > >
> > > ==> T_ENZ_KP94_CnsP_6ns.mdinfo <==
> > > | Elapsed(s) = 62.11 Per Step(ms) = 207.05
> > > | ns/day = 0.83 seconds/ns = 103524.14
> > > |
> > > | Average timings for all steps:
> > > | Elapsed(s) = 2305.82 Per Step(ms) = 206.80
> > > | ns/day = 0.84 seconds/ns = 103399.92
> > > |
> > > |
> > > | Estimated time remaining: 56.8 hours.
> > >
> > >
> >
> ------------------------------------------------------------------------------
> > >
> > > #PBS -l select=4:ncpus=8
> > > ...
> > > nohup mpirun -np 32
> > >
> > > ==> T4_ENZ_KP94_CnsP_6ns.mdinfo <==
> > > | Elapsed(s) = 147.48 Per Step(ms) = 1474.84
> > > | ns/day = 0.12 seconds/ns = 737418.58
> > > |
> > > | Average timings for all steps:
> > > | Elapsed(s) = 3122.34 Per Step(ms) = 1178.24
> > > | ns/day = 0.15 seconds/ns = 589120.22
> > > |
> > > |
> > > | Estimated time remaining: 326.4 hours.
> > >
> > >
> >
> ------------------------------------------------------------------------------
> > >
> > > #PBS -l select=8:ncpus=8
> > > ...
> > > nohup mpirun -np 64
> > >
> > > ==> T8_ENZ_KP94_CnsP_6ns.mdinfo <==
> > > | Elapsed(s) = 282.79 Per Step(ms) = 5655.89
> > > | ns/day = 0.03 seconds/ns = 2827947.20
> > > |
> > > | Average timings for all steps:
> > > | Elapsed(s) = 2922.61 Per Step(ms) = 3896.82
> > > | ns/day = 0.04 seconds/ns = 1948409.06
> > > |
> > > |
> > > | Estimated time remaining: 1081.6 hours.
> > >
> > >
> >
> ------------------------------------------------------------------------------
> > >
> > > Best,
> > > William
> > >
> > > On Thu, Oct 20, 2011 at 12:33 PM, Ross Walker <ross.rosswalker.co.uk>
> > > wrote:
> > >
> > > > Hi Lianhu,
> > > >
> > > > > I have been struggling with my MD simulation using sander.MPI
> > > > > for many days. I have tried many things, but still cannot figure
> > > > > out why my parallel runs do not speed up. Using 2 nodes is
> > > > > faster than using 1 node, but when I used 8 nodes the speed was
> > > > > similar to using 1 node. My system has 101,679 atoms. The
> > > > > following are the details of my tests.
> > > >
> > > > This is normal for sander. It generally won't scale much beyond
> > > > 32 cores or so, especially with these multicore boxes that pack a
> > > > large number of cores into a single box but do not have a
> > > > corresponding interconnect to match.
> > > >
> > > > You don't say what your interconnect is. If it is InfiniBand then
> > > > you are in with a shot. If it is something else then all bets are
> > > > off.
> > > >
> > > > A few suggestions:
> > > >
> > > > > #PBS -v LD_LIBRARY_PATH=/home/appmgr/Software/Openmpi/openmpi-
> > > > > 1.4.3/exe/lib
> > > >
> > > > Consider using MVAPICH instead of OpenMPI. It generally performs
> > > > better and is optimized for InfiniBand.
> > > >
> > > > > #PBS -l select=8:ncpus=8
> > > >
> > > > I assume you have 8 real cores per node and not 4 cores and 4
> > > > hyperthreads? Check this.
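> > > >
> > > > A quick way to check on a compute node (plain Linux, nothing
> > > > AMBER-specific): if "siblings" is twice "cpu cores",
> > > > hyperthreading is on.
> > > >
> > > >     grep -m1 "cpu cores" /proc/cpuinfo
> > > >     grep -m1 "siblings" /proc/cpuinfo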
> > > >
> > > > > #PBS -l select=arch=linux
> > > > > #PBS -l place=scatter
> > > >
> > > > I am not sure what 'scatter' implies here for thread placement.
> > > > It is probably better just to remove this line altogether and use
> > > > whatever the default placement is.
> > > >
> > > > > export OMP_NUM_THREADS=64
> > > > > ##unset OMP_NUM_THREADS
> > > >
> > > > This does absolutely nothing for sander.
> > > >
> > > > >
> > > > > mpirun -np 64 -machinefile $PBS_NODEFILE sander.MPI -O -i
> > > > > ENZ_KP94_CnsP_4_6ns.in \
> > > > > -o ENZ_KP94_CnsP_6ns.out \
> > > > > -p K94_ENZ.top \
> > > > > -c ENZ_KP94_CnsP_4ns.rst \
> > > > > -x ENZ_KP94_CnsP_6ns.mdcrd \
> > > > > -v ENZ_KP94_CnsP_6ns.mdvel \
> > > > > -e ENZ_KP94_CnsP_6ns.mden \
> > > > > -r ENZ_KP94_CnsP_6ns.rst \
> > > > > -inf ENZ_KP94_CnsP_6ns.mdinfo
> > > >
> > > > Consider using pmemd instead of sander. If your input options are
> > > > supported then pmemd.MPI will generally be much faster and scale
> > > > much better than sander.
> > > >
> > > > I would also consider manually checking that the MPI threads get
> > > > placed on the correct nodes, i.e., that you are not just ending up
> > > > with 64 threads running on the first node.
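> > > >
> > > > A minimal sketch of such a check (it assumes passwordless ssh
> > > > between the allocated nodes, which most clusters provide):
> > > >
> > > >     # count the MPI processes that actually landed on each node
> > > >     for h in $(sort -u $PBS_NODEFILE); do
> > > >         echo -n "$h: "
> > > >         ssh "$h" "ps -u $USER | grep -c sander.MPI"
> > > >     done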
> > > >
> > > > You can also try:
> > > >
> > > > #PBS -l select=8:ncpus=4
> > > >
> > > > mpirun -np 32 ...
> > > >
> > > > Often leaving cores on a node idle can actually give you higher
> > > > performance, since then the interconnect is not so overloaded.
> > > >
> > > > It would also be helpful to see your input file so we can offer
> > > > some suggestions on tweaking it for performance. I note you have
> > > > mdvel and mden specified above. Do you actually need these files?
> > > > Doing too much I/O can seriously hurt performance in parallel. I
> > > > would suggest turning off writing to mden and mdvel unless you
> > > > absolutely need the info in them.
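> > > >
> > > > In the &cntrl namelist that would be
> > > >
> > > >     ntwv=0,
> > > >     ntwe=0,
> > > >
> > > > (zero disables writing to mdvel and mden), and you can then drop
> > > > the -v and -e flags from the mpirun line.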
> > > >
> > > > The biggest improvement is likely to come from using pmemd.MPI
> though.
> > > >
> > > > All the best
> > > > Ross
> > > >
> > > > /\
> > > > \/
> > > > |\oss Walker
> > > >
> > > > ---------------------------------------------------------
> > > > | Assistant Research Professor |
> > > > | San Diego Supercomputer Center |
> > > > | Adjunct Assistant Professor |
> > > > | Dept. of Chemistry and Biochemistry |
> > > > | University of California San Diego |
> > > > | NVIDIA Fellow |
> > > > | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
> > > > | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> > > > ---------------------------------------------------------
> > > >
> > > > Note: Electronic Mail is not secure, has no guarantee of delivery,
> may
> > > not
> > > > be read every day, and should not be used for urgent or sensitive
> > issues.
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
>



-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Oct 27 2011 - 13:30:03 PDT