RE: AMBER: amber 10 parallel cpu utilization

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 15 Sep 2008 19:09:37 -0700

Hi Sohail,

This issue has been discussed many times previously on the list. Please take
a look at the archives http://archive.ambermd.org/ for some examples.

Essentially the problem is that gigabit ethernet is far too slow for running
in parallel on modern systems. The utilization below 100% that you see is
time the MPI processes spend waiting on communication, and that wait grows
with every node you add over a slow interconnect. In fact, I'm surprised you
see any speedup at all with a crossover cable; perhaps that is because you
aren't using multicore nodes, as for dual quad-core nodes it is next to
useless. Your only option, short of rewriting the laws of physics, is to
move to an interconnect designed for the low latency and high bandwidth
that parallel MD requires. Probably the best-known example right now is
InfiniBand.
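
If you want to see this for yourself, a quick MPI ping-pong test between
your two nodes makes the point. The sketch below is just an illustration
(it is not part of AMBER, and the file name pingpong.c is my own): it times
round trips between two ranks at increasing message sizes, so the small
messages show you latency and the large ones bandwidth. On gigabit ethernet
the small-message one-way latency is typically tens of microseconds, versus
a few microseconds on InfiniBand, and sander pays that cost on every one of
the many exchanges it performs per step.

  /* pingpong.c - minimal MPI latency/bandwidth sketch (illustration only).
   * Build and run, assuming MPICH2's compiler wrapper is on your path:
   *   mpicc pingpong.c -o pingpong
   *   mpiexec -np 2 ./pingpong
   */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int rank, size, bytes, i;
      const int reps = 1000;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      if (size != 2) {
          if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
          MPI_Finalize();
          return 1;
      }

      /* message sizes from 8 bytes (latency-bound) to 512 KB
       * (bandwidth-bound) */
      for (bytes = 8; bytes <= (1 << 20); bytes *= 4) {
          char *buf = malloc(bytes);
          double t0, dt;

          MPI_Barrier(MPI_COMM_WORLD);
          t0 = MPI_Wtime();
          for (i = 0; i < reps; i++) {
              if (rank == 0) {
                  MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                  MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                           MPI_STATUS_IGNORE);
              } else {
                  MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                           MPI_STATUS_IGNORE);
                  MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
              }
          }
          dt = MPI_Wtime() - t0;

          if (rank == 0) {
              /* each rep is one round trip: halve for one-way latency,
               * count 2*bytes moved per rep for bandwidth */
              printf("%8d bytes: %8.1f us one-way, %8.1f MB/s\n",
                     bytes,
                     dt / reps / 2.0 * 1e6,
                     2.0 * reps * bytes / dt / 1e6);
          }
          free(buf);
      }

      MPI_Finalize();
      return 0;
  }

If the small-message latency you measure there is tens of microseconds, no
compiler flag, MPI option, or scheduler will recover the utilization you are
losing; the processes simply sit waiting on the wire.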

Alternatively, if you are at a US university or national lab, you can apply
for time on NSF supercomputers (http://pops-submit.teragrid.org). Most other
countries run supercomputer centers as well, and you should contact them to
see if you can get time.

Unfortunately, the days of building clusters on gigabit ethernet to run MD
in parallel are long gone...

Good luck,
Ross

> -----Original Message-----
> From: owner-amber.scripps.edu [mailto:owner-amber.scripps.edu] On Behalf
> Of meandme meandme
> Sent: Monday, September 15, 2008 10:31 AM
> To: amber.scripps.edu
> Subject: AMBER: amber 10 parallel cpu utilization
>
> Dear amber users and makers,
>
> I have compiled amber 10 using gfortran and the -mpich2 option. The MPI
> library used is mpich2-1.0.7 and the OS is openSUSE 10.3 Linux x86.
>
> Now if I execute MD simulations in parallel using 2 nodes, the CPU
> utilization is 91% on each node. I have connected the 2 nodes through
> gigabit NICs and a crossover cable, without any switch/hub. My question is:
> why is the utilization not 99.9%? Furthermore, if I use a switch and add
> additional nodes, the CPU utilization drops further, to 70% on each node.
> Why is that? Is it because of a slow network connection (10/100 in my case,
> with 1 Gb NICs)? Or should I use additional switches/flags while compiling
> amber 10 or mpich2? I use this command to execute sander on 2 nodes:
>
> mpiexec -np 2 sander.MPI -O -i md... (etc.)
>
> Or is my problem due to not using a load balancer or scheduler like
> OpenPBS? I have also tried running parallel MD on local hard disks instead
> of an NFS file system, but there is no increase in speed. My 2-node cluster
> does 2000 steps of a 28000-atom system in 10 minutes. I will be thankful
> for suggestions on how to maximize the CPU utilization on each node of my
> cluster.
>
> best regards,
>
> ...................sohail. pakistan.
>
> /life is like an english grammar.....present tense past perfect\

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
      to majordomo.scripps.edu
Received on Wed Sep 17 2008 - 03:09:03 PDT