Re: [AMBER] Sander.MPI parallel run

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 20 Oct 2011 09:33:14 -0700

Hi Lianhu,

> I have been struggling with my MD simulation using sander.MPI for many
> days. Tried many ways, but still can not figure out why my parallel
> running is not speed up. Using 2 nodes, is faster than using 1 node.
> But when I used 8 nodes, the speed is similar as using 1 node. My
> system have 101,679 atoms. The following is the detail of my tests.

This is normal for sander. It generally won't scale much beyond 32 cores or
so, especially on these multicore boxes that pack a large number of cores
into a single node but do not have a corresponding interconnect to match.

You don't say what your interconnect is. If it is InfiniBand then you are in
with a shot. If it is something else then all bets are off.

A few suggestions:
 
> #PBS -v LD_LIBRARY_PATH=/home/appmgr/Software/Openmpi/openmpi-1.4.3/exe/lib

Consider using MVAPICH instead of OpenMPI. It generally performs better and
is optimized for InfiniBand.
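
A sketch of what the launch might look like (the install prefix here is
hypothetical, MVAPICH2's mpirun_rsh launcher is assumed, and sander.MPI
would need to be recompiled against the new MPI):

export PATH=/path/to/mvapich2/bin:$PATH
export LD_LIBRARY_PATH=/path/to/mvapich2/lib:$LD_LIBRARY_PATH
mpirun_rsh -np 64 -hostfile $PBS_NODEFILE sander.MPI -O ... (same arguments as below)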

> #PBS -l select=8:ncpus=8

I assume you have 8 real cores per node and not 4 cores plus 4 hyperthreads?
Check this.
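
For example, on one of the compute nodes (standard Linux tools, assuming
lscpu is installed):

grep -c '^processor' /proc/cpuinfo    # logical CPUs the OS sees
lscpu | egrep 'Thread|Core|Socket'    # threads per core vs. physical cores/sockets

If "Thread(s) per core" comes back as 2 then half of those 8 "cpus" are
hyperthreads, not real cores.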

> #PBS -l select=arch=linux
> #PBS -l place=scatter

I am not sure what 'scatter' implies here for thread placement. It is
probably better just to remove this line altogether and use whatever the
default placement is.

> export OMP_NUM_THREADS=64
> ##unset OMP_NUM_THREADS

This does absolutely nothing for sander: sander.MPI is parallelized with
MPI, not OpenMP, so OMP_NUM_THREADS is simply ignored.

>
> mpirun -np 64 -machinefile $PBS_NODEFILE sander.MPI -O -i ENZ_KP94_CnsP_4_6ns.in \
> -o ENZ_KP94_CnsP_6ns.out \
> -p K94_ENZ.top \
> -c ENZ_KP94_CnsP_4ns.rst \
> -x ENZ_KP94_CnsP_6ns.mdcrd \
> -v ENZ_KP94_CnsP_6ns.mdvel \
> -e ENZ_KP94_CnsP_6ns.mden \
> -r ENZ_KP94_CnsP_6ns.rst \
> -inf ENZ_KP94_CnsP_6ns.mdinfo

Consider using pmemd instead of sander. If your input options are supported
then pmemd.MPI will generally be much faster and scale much better than
sander.
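
A minimal sketch of the swap, reusing the file names from your script (I
have left off -v and -e here; see the i/o comment below):

mpirun -np 64 -machinefile $PBS_NODEFILE pmemd.MPI -O \
  -i ENZ_KP94_CnsP_4_6ns.in -o ENZ_KP94_CnsP_6ns.out \
  -p K94_ENZ.top -c ENZ_KP94_CnsP_4ns.rst \
  -x ENZ_KP94_CnsP_6ns.mdcrd -r ENZ_KP94_CnsP_6ns.rst \
  -inf ENZ_KP94_CnsP_6ns.mdinfo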

I would also consider manually checking that the MPI threads get placed on
the correct nodes, i.e. that you are not just ending up with all 64 threads
running on the first node.
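
A quick way to check from inside the job script is to run the same mpirun
line with hostname in place of sander.MPI and count the ranks per node:

mpirun -np 64 -machinefile $PBS_NODEFILE hostname | sort | uniq -c

You want to see 8 ranks on each of 8 different node names, not 64 on one.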

You can also try:

#PBS -l select=8:ncpus=4

mpirun -np 32 ...

Often leaving cores on a node idle can actually give you higher performance,
since the interconnect is then not so overloaded.

It would also be helpful to see your input file so we can offer some
suggestions on tweaking it for performance. I note you have mdvel and mden
files specified above. Do you actually need them? Doing too much i/o can
seriously hurt performance in parallel, so I would suggest turning off
writing to mden and mdvel unless you absolutely need the information in them.
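
For example, drop -v and -e from the command line and make sure the &cntrl
namelist in your .in file is not requesting those outputs (ntwv=0 and
ntwe=0 are the defaults; the ntwx/ntpr values here are just placeholders to
illustrate writing less often):

 &cntrl
   ... your existing settings ...
   ntwx=5000, ntpr=5000,
   ntwv=0, ntwe=0,
 /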

The biggest improvement is likely to come from using pmemd.MPI though.

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.




_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Oct 20 2011 - 10:00:02 PDT