Hi Jason,
On my cluster, I have two network interfaces, one Ethernet and one
InfiniBand. How can I tell whether the data exchange goes over Ethernet or
InfiniBand?
Sorry if this is more of a system question.
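A minimal sketch of one way to check this, assuming Linux sysfs traffic counters are readable on the compute nodes: sample the transmit counters of the Ethernet and InfiniBand interfaces while the job runs and see which one grows. The paths (eth0, mlx4_0, port 1) are hypothetical placeholders; list the real names with ls /sys/class/net and ls /sys/class/infiniband. The function name counter_growth is made up for this example. Note that port_xmit_data counts 4-byte words rather than bytes, and on some older adapters it is a 32-bit counter that may saturate, so keep the interval short.

```shell
#!/bin/bash
# Hypothetical example paths; adjust to your interface and HCA names.
ETH_COUNTER=/sys/class/net/eth0/statistics/tx_bytes                      # bytes
IB_COUNTER=/sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data # 4-byte words

# Usage: counter_growth SECONDS FILE...
# Reads every counter file, sleeps, reads them again, and prints
# "file growth" for each one.
counter_growth() {
    local interval=$1
    shift
    local -a before=()
    local f after i=0
    for f in "$@"; do
        before+=("$(cat "$f")")          # first sample of each counter
    done
    sleep "$interval"
    for f in "$@"; do
        after=$(cat "$f")                # second sample
        echo "$f $(( after - before[i] ))"
        i=$(( i + 1 ))
    done
}
```

For example, on a compute node during a run: counter_growth 10 "$ETH_COUNTER" "$IB_COUNTER". If the Ethernet counter grows far more than the InfiniBand one, the MPI traffic is probably going over Ethernet; keep in mind that NFS and other file I/O may also use the Ethernet link.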
William
On Thu, Oct 27, 2011 at 3:32 PM, Jason Swails <jason.swails.gmail.com> wrote:
> On Thu, Oct 27, 2011 at 3:08 PM, Lianhu Wei <lianhu.wei.gmail.com> wrote:
>
> > Hi Ross and other experts,
> >
> > This time, I used pmemd.MPI. Generally it runs a lot faster than
> > sander.MPI, but I still have the same issue when I run on multiple nodes.
> > I also checked my input options carefully. All of these runs used the
> > same input file; only the number of nodes on the cluster changed. The
> > interconnect between the nodes is InfiniBand. In my PBS scripts, I
> > removed all the unnecessary options. The result is still the same: with
> > 2 nodes I got the maximum performance (1.65x compared to one node). On 4
> > nodes it was 0.3x, and on 8 nodes it was 0.1x, i.e. much slower than
> > running on one node. My simulation system is about 100K atoms. I checked
> > all of the nodes, and there were 8 threads on each node.
> >
> > When pmemd is running on 1 or 2 nodes, most of the threads on the nodes
> > are running. On 4 and 8 nodes, I saw many threads with status "S"
> > (sleeping).
> >
> > I do not know whether I am not using the options properly or pmemd is
> > not distributing the calculation well enough. Please give suggestions.
> >
>
> pmemd does a good job of load balancing, and it does it dynamically. The
> problem is unlikely to be workload distribution. I'm guessing your issue is
> the interconnect. You say you have InfiniBand (but not what speed of
> InfiniBand). Keep in mind that with 8 threads per node, you're effectively
> cutting the bandwidth that each processor sees down to 1/8 of the full
> speed (certain MPI implementations may be able to deal with that more
> efficiently than others).
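To put the 1/8 figure into numbers, a back-of-the-envelope sketch. The link rate is an assumption: a 4x QDR InfiniBand link signals at 40 Gb/s, which is about 32 Gb/s of data after 8b/10b encoding; substitute the speed of your own fabric. It assumes all ranks on a node use the network at the same time and ignores intra-node shared-memory traffic. The function name per_rank_gbps is hypothetical.

```shell
#!/bin/bash
# Share of one node's link bandwidth (in Gb/s) that each MPI rank gets
# when all ranks on the node communicate at once.
# Usage: per_rank_gbps LINK_GBPS RANKS_PER_NODE
per_rank_gbps() {
    awk -v link="$1" -v ranks="$2" 'BEGIN { printf "%.2f\n", link / ranks }'
}
```

With the assumed 32 Gb/s link, per_rank_gbps 32 8 prints 4.00 and per_rank_gbps 32 4 prints 8.00, which is the same reasoning behind running fewer ranks per node.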
>
> Another possibility is that your MPI is not actually taking advantage of
> your InfiniBand hardware. Can you install a different MPI (like mvapich2
> or mvapich) and try with that? Those two packages are tailored
> specifically for InfiniBand.
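If you stay with Open MPI for the moment, a sketch of how to check whether it can use InfiniBand at all, assuming Open MPI 1.4.x (the version in the LD_LIBRARY_PATH quoted further down; component names differ in later releases). ompi_info lists the compiled-in components, and an InfiniBand-capable 1.4 build shows an "openib" BTL. The helper name has_openib_btl is hypothetical.

```shell
#!/bin/bash
# Succeeds if the ompi_info output on stdin lists the openib BTL,
# i.e. this Open MPI build has InfiniBand (verbs) support compiled in.
has_openib_btl() {
    grep -q 'MCA btl: openib'
}

# Usage on a node (not run here):
#   ompi_info | has_openib_btl && echo "openib available" || echo "no openib"
#
# To keep a run from silently using Ethernet, restrict Open MPI to the
# InfiniBand (openib), shared-memory (sm) and loopback (self) BTLs. With
# tcp excluded, the job should fail to start if openib cannot be used:
#   mpirun --mca btl openib,sm,self -np 16 -machinefile "$PBS_NODEFILE" \
#       pmemd.MPI -O -i ...
```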
>
> HTH,
> Jason
>
>
> > Thanks,
> > William
> >
> > Here is my PBS script for 2 nodes:
> >
> > [william.speed]$ more Qsub_T
> > #!/bin/bash
> > #PBS -V
> > #PBS -r n
> > #PBS -N ENZ_T
> > #PBS -l select=2:ncpus=8
> > #PBS -l select=arch=linux
> > #PBS -m abe
> > #PBS -q verylong
> > #
> > export WRKDIR=/home/william/Work/PAD4/MD/KP94_ENZ_T
> > cd $WRKDIR
> >
> > ulimit -a
> >
> > nohup mpirun -np 16 -machinefile $PBS_NODEFILE pmemd.MPI -O \
> > -i ENZ_KP94_CnsP_4_6ns.in \
> > -o T_ENZ_KP94_CnsP_6ns.out \
> > -p K94_ENZ.top \
> > -c ENZ_KP94_CnsP_4ns.rst \
> > -x T_ENZ_KP94_CnsP_6ns.mdcrd \
> > -v T_ENZ_KP94_CnsP_6ns.mdvel \
> > -e T_ENZ_KP94_CnsP_6ns.mden \
> > -r T_ENZ_KP94_CnsP_6ns.rst \
> > -inf T_ENZ_KP94_CnsP_6ns.mdinfo
> >
> > =========================
> >
> > This is my pmemd input file:
> > &cntrl
> > timlim = 999999,
> > imin=0,
> > nmropt = 0,
> >
> > ntx=7,
> > irest=1,
> >
> > ntxo=1,
> > ntpr=50,
> > ntwr=50,
> > iwrap=0,
> > ntwx=500,
> > ntwv=500,
> > ntwe=500,
> >
> > ntf=2,
> > ntb=2,
> > dielc=1.0,
> > igb=0,
> > scnb=2.0,
> > scee=1.2,
> >
> > nstlim=1000000,
> > t=10.0,
> > dt=0.002,
> >
> > temp0=300,
> > tempi=300,
> > heat=0.0,
> > ntt=1,
> > tautp=1.0,
> > vlimit=0.0,
> >
> > ntp=1,
> > pres0=1.0,
> > comp=44.6,
> > taup=1.0,
> > npscal=1,
> >
> >
> > ntc=2,
> > tol=0.0005,
> >
> > cut=12.0,
> > &end
> >
> > &ewald
> > a = 104.8389167,
> > b = 132.7362072,
> > c = 72.5503008,
> > alpha=90,
> > beta=90,
> > gamma=90,
> > nfft1=100,
> > nfft2=144,
> > nfft3=81,
> > order=4,
> > ischrgd=0,
> > verbose=1,
> > ew_type=0,
> > dsum_tol=0.00001,
> > &end
> >
> > Here is the speed reported in the pmemd mdinfo files:
> >
> > #PBS -l select=1:ncpus=8
> > ...
> > nohup mpirun -np 8 ...
> >
> > ==> T1_ENZ_KP94_CnsP_6ns.mdinfo <==
> > | Elapsed(s) = 69.08 Per Step(ms) = 345.38
> > | ns/day = 0.50 seconds/ns = 172688.77
> > |
> > | Average timings for all steps:
> > | Elapsed(s) = 3215.09 Per Step(ms) = 347.58
> > | ns/day = 0.50 seconds/ns = 173788.91
> > |
> > |
> > | Estimated time remaining: 95.7 hours.
> >
> >
> > ------------------------------------------------------------------------------
> >
> > #PBS -l select=2:ncpus=8
> > ...
> > nohup mpirun -np 16
> >
> > ==> T_ENZ_KP94_CnsP_6ns.mdinfo <==
> > | Elapsed(s) = 62.11 Per Step(ms) = 207.05
> > | ns/day = 0.83 seconds/ns = 103524.14
> > |
> > | Average timings for all steps:
> > | Elapsed(s) = 2305.82 Per Step(ms) = 206.80
> > | ns/day = 0.84 seconds/ns = 103399.92
> > |
> > |
> > | Estimated time remaining: 56.8 hours.
> >
> >
> > ------------------------------------------------------------------------------
> >
> > #PBS -l select=4:ncpus=8
> > ...
> > nohup mpirun -np 32
> >
> > ==> T4_ENZ_KP94_CnsP_6ns.mdinfo <==
> > | Elapsed(s) = 147.48 Per Step(ms) = 1474.84
> > | ns/day = 0.12 seconds/ns = 737418.58
> > |
> > | Average timings for all steps:
> > | Elapsed(s) = 3122.34 Per Step(ms) = 1178.24
> > | ns/day = 0.15 seconds/ns = 589120.22
> > |
> > |
> > | Estimated time remaining: 326.4 hours.
> >
> >
> > ------------------------------------------------------------------------------
> >
> > #PBS -l select=8:ncpus=8
> > ...
> > nohup mpirun -np 64
> >
> > ==> T8_ENZ_KP94_CnsP_6ns.mdinfo <==
> > | Elapsed(s) = 282.79 Per Step(ms) = 5655.89
> > | ns/day = 0.03 seconds/ns = 2827947.20
> > |
> > | Average timings for all steps:
> > | Elapsed(s) = 2922.61 Per Step(ms) = 3896.82
> > | ns/day = 0.04 seconds/ns = 1948409.06
> > |
> > |
> > | Estimated time remaining: 1081.6 hours.
> >
> >
> > ------------------------------------------------------------------------------
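The speedups quoted earlier in this message can be recomputed from the mdinfo files with a small sketch like this one. It assumes, as in the excerpts above, that the last "ns/day =" line in each mdinfo file is the one under "Average timings for all steps". The function names avg_ns_per_day and scaling_table are hypothetical.

```shell
#!/bin/bash
# Print the last ns/day value found in a pmemd mdinfo file.
avg_ns_per_day() {
    awk '$2 == "ns/day" { v = $4 } END { print v }' "$1"
}

# Usage: scaling_table NODES FILE [NODES FILE]...
# The first NODES/FILE pair is the baseline for speedup and efficiency.
scaling_table() {
    local base_nodes=$1 base nodes file rate
    base=$(avg_ns_per_day "$2")
    while [ $# -ge 2 ]; do
        nodes=$1 file=$2
        rate=$(avg_ns_per_day "$file")
        awk -v n="$nodes" -v r="$rate" -v b="$base" -v bn="$base_nodes" 'BEGIN {
            s = r / b                  # speedup relative to the baseline
            e = 100 * s * bn / n       # parallel efficiency in percent
            printf "nodes=%d ns/day=%.2f speedup=%.2fx efficiency=%.0f%%\n", n, r, s, e
        }'
        shift 2
    done
}
```

For example: scaling_table 1 T1_ENZ_KP94_CnsP_6ns.mdinfo 2 T_ENZ_KP94_CnsP_6ns.mdinfo 4 T4_ENZ_KP94_CnsP_6ns.mdinfo 8 T8_ENZ_KP94_CnsP_6ns.mdinfo. With the averages shown above, 2 nodes come out at about 1.68x (84% efficiency) and 4 nodes at about 0.30x.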
> >
> > Best,
> > William
> >
> > On Thu, Oct 20, 2011 at 12:33 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> >
> > > Hi Lianhu,
> > >
> > > > I have been struggling with my MD simulation using sander.MPI for
> > > > many days. I have tried many things but still cannot figure out why
> > > > my parallel runs do not speed up. Using 2 nodes is faster than using
> > > > 1 node, but when I used 8 nodes, the speed was similar to using 1
> > > > node. My system has 101,679 atoms. The following are the details of
> > > > my tests.
> > >
> > > This is normal for sander. It generally won't scale much beyond 32 cores
> > > or so, especially with these multicore boxes that have a large number of
> > > cores in a single box but do not have a corresponding interconnect to
> > > match.
> > >
> > > You don't say what your interconnect is. If it is InfiniBand, then you
> > > are in with a shot. If it is something else, then all bets are off.
> > >
> > > A few suggestions:
> > >
> > > > #PBS -v LD_LIBRARY_PATH=/home/appmgr/Software/Openmpi/openmpi-1.4.3/exe/lib
> > >
> > > Consider using MVAPICH instead of Open MPI. It generally performs better
> > > and is optimized for InfiniBand.
> > >
> > > > #PBS -l select=8:ncpus=8
> > >
> > > I assume you have 8 real cores per node and not 4 cores and 4
> > > hyperthreads? Check this.
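A quick way to check this, sketched under the assumption of a Linux /proc/cpuinfo: compare the number of logical CPUs with the number of distinct (physical id, core id) pairs. If logical is twice physical, Hyper-Threading is on. The function name cpu_summary is hypothetical, and some virtual machines omit the physical id and core id fields, in which case physical comes out as 0.

```shell
#!/bin/bash
# Usage: cpu_summary [CPUINFO_FILE]   (defaults to /proc/cpuinfo)
# Prints "logical=N physical=M".
cpu_summary() {
    local f=${1:-/proc/cpuinfo} logical physical
    logical=$(grep -c '^processor' "$f")
    # Each physical core is a unique (physical id, core id) pair.
    physical=$(awk -F: '/^physical id/ { p = $2 } /^core id/ { print p ":" $2 }' "$f" \
        | sort -u | wc -l | tr -d ' ')
    echo "logical=$logical physical=$physical"
}
```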
> > >
> > > > #PBS -l select=arch=linux
> > > > #PBS -l place=scatter
> > >
> > > I am not sure what 'scatter' implies here for thread placement. It is
> > > probably better just to remove this line altogether and use whatever
> > > the default placement is.
> > >
> > > > export OMP_NUM_THREADS=64
> > > > ##unset OMP_NUM_THREADS
> > >
> > > This does absolutely nothing for sander.
> > >
> > > >
> > > > mpirun -np 64 -machinefile $PBS_NODEFILE sander.MPI -O \
> > > > -i ENZ_KP94_CnsP_4_6ns.in \
> > > > -o ENZ_KP94_CnsP_6ns.out \
> > > > -p K94_ENZ.top \
> > > > -c ENZ_KP94_CnsP_4ns.rst \
> > > > -x ENZ_KP94_CnsP_6ns.mdcrd \
> > > > -v ENZ_KP94_CnsP_6ns.mdvel \
> > > > -e ENZ_KP94_CnsP_6ns.mden \
> > > > -r ENZ_KP94_CnsP_6ns.rst \
> > > > -inf ENZ_KP94_CnsP_6ns.mdinfo
> > >
> > > Consider using pmemd instead of sander. If your input options are
> > > supported, then pmemd.MPI will generally be much faster and scale much
> > > better than sander.
> > >
> > > I would also consider manually checking that the MPI threads get placed
> > > on the correct nodes, i.e. that you are not just ending up with 64
> > > threads running on the first node.
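One way to do that check, sketched here: count how many times each host appears in the machinefile that mpirun is given ($PBS_NODEFILE), and, while the job runs, count pmemd.MPI processes on each node. The function name ranks_per_host is hypothetical, and the ssh loop in the comments assumes passwordless ssh between the nodes of the job.

```shell
#!/bin/bash
# How many MPI ranks a machinefile assigns to each host.
# Prints "host count", one host per line, sorted by host name.
ranks_per_host() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Usage inside the PBS job script, before mpirun:
#   ranks_per_host "$PBS_NODEFILE"
# While the job runs, the number of pmemd.MPI processes on each node:
#   for h in $(sort -u "$PBS_NODEFILE"); do
#       echo "$h $(ssh "$h" pgrep pmemd.MPI | wc -l)"
#   done
```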
> > >
> > > You can also try:
> > >
> > > #PBS -l select=8:ncpus=4
> > >
> > > mpirun -np 32 ...
> > >
> > > Often leaving cores on a node idle can actually give you higher
> > > performance, since then the interconnect is not so overloaded.
> > >
> > > It would also be helpful to see your input file so we can offer some
> > > suggestions on tweaking it for performance. I note you have mdvel and
> > > mden specified above. Do you actually need these files? Doing too much
> > > I/O can seriously hurt performance in parallel. I would suggest turning
> > > off writing to mden and mdvel unless you absolutely need the info in
> > > them.
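In Amber's &cntrl namelist, ntwv=0 means no velocity (mdvel) output and ntwe=0 means no energy (mden) output. A sketch that prints a copy of an input file with both switched off, assuming the one-entry-per-line layout of the input file posted earlier in this thread; the function name disable_vel_energy_output is hypothetical, and only the first ntwv/ntwe entry on a line is changed.

```shell
#!/bin/bash
# Print an Amber input file with velocity (ntwv) and energy (ntwe)
# archiving set to 0; all other lines pass through unchanged.
disable_vel_energy_output() {
    sed -E \
        -e 's/^([[:space:]]*ntwv[[:space:]]*=)[^,]*/\1 0/' \
        -e 's/^([[:space:]]*ntwe[[:space:]]*=)[^,]*/\1 0/' \
        "$1"
}
```

For example, disable_vel_energy_output ENZ_KP94_CnsP_4_6ns.in > new.in (new.in is a placeholder name); the -v and -e arguments on the mpirun line should then no longer be needed.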
> > >
> > > The biggest improvement is likely to come from using pmemd.MPI though.
> > >
> > > All the best
> > > Ross
> > >
> > > /\
> > > \/
> > > |\oss Walker
> > >
> > > ---------------------------------------------------------
> > > | Assistant Research Professor |
> > > | San Diego Supercomputer Center |
> > > | Adjunct Assistant Professor |
> > > | Dept. of Chemistry and Biochemistry |
> > > | University of California San Diego |
> > > | NVIDIA Fellow |
> > > | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
> > > | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> > > ---------------------------------------------------------
> > >
> > > Note: Electronic Mail is not secure, has no guarantee of delivery, may
> > > not be read every day, and should not be used for urgent or sensitive
> > > issues.
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > AMBER mailing list
> > > AMBER.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber
> > >
> >
>
>
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Candidate
> 352-392-4032
>
Received on Thu Oct 27 2011 - 13:30:02 PDT