Florian -
A quick reply: all kinds of weird things can happen on
Myrinet clusters with failing or intermittent hardware. At UNC we have seen
lots of grief with this sort of thing. I have never gotten the full story
from the system folks, but my current opinion is that such hardware
requires a fair bit of careful oversight and maintenance.
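As one illustration of that kind of oversight, a job that has silently
stopped writing output can be flagged by checking how long ago its output
file was last modified. The following is only a minimal Python sketch, not
something taken from pmemd or MPICH-GM; the function name is_stalled, the
file name "mdout", and the 30-minute threshold are all assumptions made for
the example.

```python
import os
import time


def is_stalled(path, max_idle_seconds, now=None):
    """Return True if `path` was last modified more than
    `max_idle_seconds` ago (hypothetical helper, not part of pmemd)."""
    if now is None:
        now = time.time()
    # os.path.getmtime returns the last-modification time in seconds
    # since the epoch; compare the file's idle time to the threshold.
    return (now - os.path.getmtime(path)) > max_idle_seconds


# Example use (assumed file name and threshold):
#   if is_stalled("mdout", 30 * 60):
#       print("mdout has not been updated in 30 minutes; check the job")
```

A check like this only catches the cases where output stops; it would not
by itself notice a job that keeps writing while running on N-1 CPUs.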
Regards - Bob Duke
----- Original Message -----
From: "Florian Barth" <bio_hazard.gmx.de>
To: <amber.scripps.edu>
Sent: Friday, August 13, 2004 2:52 PM
Subject: AMBER: pmemd problems
>
> Hello,
>
> In some cases (about 5-10% of the jobs), pmemd jobs stop using the
> assigned number of processors N and instead continue to run on N-1
> CPUs. This happens after a normal startup of the jobs and without any
> obvious cause. Neither pmemd nor mpiexec/mpirun returns an error message;
> the job continues to run normally, except that in some of these cases it
> stops writing output files. The same jobs run correctly after a fresh start.
> The pmemd version is 3.1, compiled with ifc 7.1 and mpich-gm 1.2.5..10 on
> a Linux cluster with dual Athlon nodes and a Myrinet interconnect.
> Has anybody observed similar behaviour in pmemd, or can anyone provide a
> solution to this problem?
>
> Regards
>
> Florian Barth
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber.scripps.edu
> To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>
Received on Fri Aug 13 2004 - 20:53:02 PDT