RE: AMBER: the parallel efficiency of my sander and pmemd descended rapidly with increasing CPUs from Ross Walker on 2006-05-14 (Amber Archive May 2006)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sun, 14 May 2006 22:13:01 -0700

Dear Zhihong,

> for factor_ix:
> pmemd:
> CPU   Ps/Day   Speedup   Parallel efficiency
> 1      77.14   1.00      100.00%
> 8     284.58   3.69       46.12%
>
> Compared with amber website's data (pmemd, Intel Xeon x86_64,
> 3.4 GHz, 1cpu, jac, 179 ps/day), I think my 159 ps/day is
> normal and acceptable, But compared with datas on
> "http://coffee.sdsc.edu/amber_sdsc/", my parallel

> BTW, for sander, my configure command is: ./configure -mpi -p4 ifort
> for pmemd, the command is: ./new_configure linux_p4

Are you absolutely sure you linked against the correct libraries, that is,
that your mpirun is definitely going over the Myrinet and not the gigabit
ethernet? This is a common error: the MPI libraries are installed
incorrectly, sander and PMEMD are built against the wrong libraries, and/or
the user's environment or batch queuing system is set up wrong, so the wrong
MPI libraries are accessed at runtime. The net result is that the MPI traffic
ends up going over the ethernet network via TCP/IP instead of Myrinet. Check
your environment when you run a job. Also check how many jobs are being
allocated to each node: is the batch scheduler allocating things correctly?
Take a look at the ethernet switch during a run: is there an inordinate
amount of traffic on it? Have you got any benchmarks or test cases that came
with the MPI installation? I would try these out and see if things look
good, e.g. ping-pong tests, bandwidth tests etc.
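
As a rough illustration of the environment check, here is a minimal sketch
that reports which mpirun a job would pick up and which directories are on
its library search path. The function name mpi_environment_report is made up
for this example, and it assumes a Linux-style setup where shared libraries
are found through LD_LIBRARY_PATH; it only shows what the job sees, it cannot
tell you which interconnect a given MPI build actually uses.

```python
import os
import shutil


def mpi_environment_report(env=None):
    """Summarise the MPI launcher and library directories visible to a job.

    Hypothetical helper for illustration; pass a dict to inspect a saved
    environment, or nothing to inspect the current one.
    """
    env = os.environ if env is None else env
    search_path = env.get("PATH", "")
    library_path = env.get("LD_LIBRARY_PATH", "")
    return {
        # First mpirun found on PATH, or None if there is none.
        "mpirun": shutil.which("mpirun", path=search_path),
        "path_dirs": [d for d in search_path.split(os.pathsep) if d],
        "library_dirs": [d for d in library_path.split(os.pathsep) if d],
    }


if __name__ == "__main__":
    report = mpi_environment_report()
    print("mpirun resolved to:", report["mpirun"])
    for directory in report["library_dirs"]:
        print("LD_LIBRARY_PATH entry:", directory)
```

Running this from inside the batch script, rather than from a login shell,
is the point: the two environments can differ, and it is the batch one that
decides which MPI installation is used at runtime.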

If you have dual-processor nodes, the best way to allocate a 16-way job is as
8x2. If your scheduler gives you 16x1 then you may be competing with
somebody else's job on the same node, which can cause big performance
problems. I would check this by having your run script echo the nodes that
were allocated to stdout so the list turns up in the job's log. You might also
want to get each node to do a 'ps -aux' and echo this to the job log as well,
to check that there are no competing processes on the nodes and that they are
not short of memory for some reason.
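
The node check can be sketched as follows. This is only an illustration: the
helper names ranks_per_node and unexpected_nodes are invented here, and the
script assumes a PBS-style scheduler that lists one hostname per allocated
processor in the file named by PBS_NODEFILE. Other schedulers expose the
allocation differently, so adjust that part to your system.

```python
import os
from collections import Counter


def ranks_per_node(hostnames):
    """Count how many processes the scheduler placed on each node."""
    return Counter(h.strip() for h in hostnames if h.strip())


def unexpected_nodes(hostnames, expected_per_node=2):
    """Return the nodes whose process count differs from the expected value."""
    counts = ranks_per_node(hostnames)
    return {node: n for node, n in counts.items() if n != expected_per_node}


if __name__ == "__main__":
    # Assumption: PBS-style node file, one hostname per allocated processor.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        with open(nodefile) as fh:
            hosts = fh.readlines()
        for node, n in sorted(ranks_per_node(hosts).items()):
            print(node, n)
        print("nodes not holding 2 processes:", unexpected_nodes(hosts))
    else:
        print("PBS_NODEFILE is not set; no allocation to report")
```

For a 16-way job on dual-processor nodes, an 8x2 allocation shows eight
hostnames with a count of 2 each and an empty dictionary on the last line.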

These are just a couple of suggestions; it is a little hard to offer much
more advice without more details about your system, setup, batch scheduler,
runtime environment etc. Are other people running calculations on the
system? Do they see the speedup they expect? You should end up getting
around 9.8x speedup for JAC and 9.1x for FactorIX on 16 cpus. Look at the
IA64 Teragrid column on http://coffee.sdsc.edu/amber_sdsc/ (this is a
Myrinet machine). Note that Myrinet is not great; InfiniBand is definitely
better, but Myrinet is definitely better than gigabit ethernet.
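
To compare your own runs against these targets, speedup is the N-cpu
throughput divided by the 1-cpu throughput, and parallel efficiency is the
speedup divided by the number of cpus. A minimal sketch follows, using the
FactorIX numbers you quoted and the 16-cpu speedups mentioned above; the
function names are just for illustration.

```python
def speedup(ps_per_day_n, ps_per_day_1):
    """Throughput on N cpus relative to throughput on 1 cpu."""
    return ps_per_day_n / ps_per_day_1


def parallel_efficiency(speedup_value, n_cpus):
    """Fraction of ideal linear scaling achieved on n_cpus."""
    return speedup_value / n_cpus


if __name__ == "__main__":
    # Measured FactorIX throughput from the quoted table (ps/day).
    s8 = speedup(284.58, 77.14)
    print(f"measured, 8 cpus: speedup {s8:.2f}, "
          f"efficiency {parallel_efficiency(s8, 8):.1%}")

    # Speedups one should roughly expect on 16 cpus over Myrinet.
    for name, s16 in (("JAC", 9.8), ("FactorIX", 9.1)):
        print(f"expected {name}, 16 cpus: speedup {s16:.1f}, "
              f"efficiency {parallel_efficiency(s16, 16):.1%}")
```

In efficiency terms, 9.8x and 9.1x on 16 cpus work out to roughly 61% and
57%, so a run that is already below 50% at 8 cpus is well short of what the
hardware should deliver.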

All the best
Ross

/\
\/
|\oss Walker

| HPC Consultant and Staff Scientist |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

Received on Wed May 17 2006 - 06:07:07 PDT