Re: AMBER: parallel AMBER/pmemd installation problem on Opteron

From: Lars Packschies <packschies.rrz.uni-koeln.de>
Date: Fri, 25 Feb 2005 00:58:25 +0100

--On Wednesday, February 23, 2005 12:27:13 -0500 Robert Duke
<rduke.email.unc.edu> wrote:

> Lars-
> If anything, I would say pmemd is more stable, or at least as stable, on
> platforms it has been tested on. However, the pgi compiler is a bit of
> an unknown quantity; apparently the pgi c compiler is known to be
> problematic. When you are seeing hangs, that is most likely associated
> with the network layer (mpi) somehow, and the only reason that pmemd may
> give more trouble in this area than sander is that it uses nonblocking
> primitives a lot more, and drives the net connection harder. I would be
> interested to hear exactly what you are observing, and on what exact
> setup (hw + sw), and it is possible that my work on the cray machines
> will shed some light on the problems, as it is a similar setup (opterons,
> mpich, pgi compilers).

Dear Bob,

  thanks for your reply. Please treat this as preliminary information,
especially regarding the compiler issues; I'm still trying a few things.

Let me start with the hardware and compilers. We use a Sun (v20z) system
with 128 dual-Opteron nodes, 2.2 GHz, 4 GB of memory per node. The
InfiniBand hardware is from Voltaire; all latency and bandwidth
measurements tested fine. We have had virtually no stability issues, it
just works - except for "cleaning up" problems when jobs crash or hang,
which seems to be more or less normal with mpich...?

There are other issues, but they are not related to Amber/Pmemd.

We use Rocks (www.rocksclusters.org) version 3.3.0 as the OS, with the IB
driver package "ibhost-hpc-2.2.0_10-4rhas3.k2.4.21_20" and kernel
2.4.21-20.ELsmp. On the compiler side we have the PGI CDK 5.2-2. As far as
I know, Rocks is moving to the 2.6 kernel soon. Furthermore, Voltaire has
just finished a new version of the ibhost package, which I am going to try
in a few days (on a small partition of the cluster).

I compiled Amber following the instructions I found here:
<http://www.pgroup.com/resources/amber/amber8_pgi52.htm>

I was able to compile Pmemd with your hotfix (renaming ierr to ierr_nmr in
two places).

If you wish, I can provide you with Amber 8 parallel benchmark results (hb,
factor ix, and jac) for up to 128 processors. It would be interesting to
see whether you and others observe comparable scaling behavior.
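
For comparison purposes I would report speedup and parallel efficiency
rather than raw wall times. A trivial sketch of the arithmetic in C (the
processor counts and timings below are made-up placeholders, not actual
measurements from our cluster):

/* Speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p for a
 * set of runs. The wall-clock times are made-up placeholders, not
 * actual benchmark results. */
#include <stdio.h>

int main(void)
{
    int    procs[] = { 1, 16, 32, 64, 128 };
    double times[] = { 6400.0, 460.0, 250.0, 145.0, 95.0 };
    int    i, n = (int)(sizeof procs / sizeof procs[0]);

    for (i = 0; i < n; i++) {
        double speedup = times[0] / times[i];
        double eff     = speedup / procs[i];
        printf("p=%3d  T=%7.1fs  S=%6.2f  E=%5.1f%%\n",
               procs[i], times[i], speedup, 100.0 * eff);
    }
    return 0;
}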

I have to test Pmemd some more and try to isolate more specific sources of
error; up to now the picture looks too diffuse.
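
Regarding your remark about nonblocking primitives driving the net
connection harder: the pattern I have in mind while testing is roughly the
following ring exchange (a minimal C/MPI sketch of my own; the buffer size
and neighbor layout are illustrative, not taken from the pmemd source):

/* Minimal sketch of a nonblocking ring exchange in MPI. Posting the
 * receive and send together, with computation overlapping the transfer,
 * keeps more operations outstanding on the interconnect than matched
 * blocking MPI_Send/MPI_Recv pairs would. */
#include <mpi.h>
#include <stdio.h>

#define N 1024

int main(int argc, char **argv)
{
    int rank, nprocs, left, right, i;
    double sendbuf[N], recvbuf[N];
    MPI_Request reqs[2];
    MPI_Status  stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    right = (rank + 1) % nprocs;
    left  = (rank + nprocs - 1) % nprocs;

    for (i = 0; i < N; i++)
        sendbuf[i] = (double) rank;

    /* Post both operations at once, then do other work. */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Computation that overlaps communication would go here. */

    MPI_Waitall(2, reqs, stats);

    printf("rank %d got data from rank %d\n", rank, left);
    MPI_Finalize();
    return 0;
}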

> One other thought. Early on last year, I
> attempted to run on an opteron workstation and had serious problems with
> heating; this would cause hangs of the entire system (i.e., the machine
> locks up), and the problem was worse with pmemd because it would drive
> the opteron fp unit about 50% harder. Any chance your opterons have
> cooling problems? (On a dual P4 with thermoregulated fans, I can hear the
> fans rev up as you go into a pmemd run - it sounds like a jet taxiing
> out.)

So far we have not had any problems with overheating. For example, two
weeks ago the cluster ran for 100 hours at full throttle without getting
too hot.

Sincerely,

  Lars

-- 
Dr. Lars Packschies, Computing Center, University of Cologne