Hi Robert Duke:
Apologies in advance for what may be a prematurely posed question - but if
you have insight, it could save us a lot of time hunting down a problem.
I am minimizing a solvated protein/ligand system using PMEMD 3.03.
I am seeing reasonable, near-identical results on the following three
platforms:
A - pmemd run on my dual Xeon desktop (Debian Linux - intel ifc 7.1)
B - pmemd run on our SGI RS12000 x 8cpu cluster
C - pmemd run on two CPUs only (one board only) within our 16 cpu ( 8 dual
boards) PIII myrinet cluster (Linux intel ifc 7.1).
D - However, when I run 8 or 16 CPUs on our 16 cpu PIII myrinet cluster
(Linux), I get wildly divergent results - energies off by 10,000 and
100,000 kcal/mol compared to the other 3 platforms. Moreover, the .out
file states that a single solvent atom (which should be free to move in the
minimization) is continuously responsible for the highest positive
energy. So, I'm pretty sure that our multi-board myrinet run with PMEMD is
doing very bad things. But, I'm not getting any error messages from PMEMD
- just the disturbing variances in output.
Everything about the minimizations is identical except the varying mpirun
commands required on the different platforms. Between platforms C and D I
only change the "-np" parameter (2 for C; 8 or 16 for D).
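One way to narrow this down would be to find the first minimization step at which the multi-board run departs from a trusted run. Below is a minimal sketch in Python, assuming the per-step energies have already been pulled out of each .out file into plain lists. The function name, the tolerances, and the sample numbers are all illustrative assumptions, not anything taken from PMEMD.

```python
from typing import Optional, Sequence


def first_divergence(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-4,
    abs_tol: float = 1.0,
) -> Optional[int]:
    """Return the index of the first step whose energies disagree, or None.

    Two energies are treated as matching when they differ by no more than
    the larger of abs_tol and rel_tol * |reference energy|. Both tolerances
    are arbitrary starting points and should be tuned to the system.
    """
    for step, (e_ref, e_cand) in enumerate(zip(reference, candidate)):
        allowed = max(abs_tol, rel_tol * abs(e_ref))
        if abs(e_ref - e_cand) > allowed:
            return step
    return None


if __name__ == "__main__":
    # Made-up numbers for illustration only, not output from any real run.
    run_c = [-95000.0, -98000.0, -99500.0, -100200.0]
    run_d = [-95000.2, -97999.7, -89500.0, 12000.0]
    print(first_divergence(run_c, run_d))
```

If the two runs already disagree at the very first step, that would point at how the system is distributed across boards rather than at an error accumulating during the minimization.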
In short... any suggestions on how we might troubleshoot pmemd on
myrinet/linux would be greatly appreciated. (I don't personally maintain
the hardware here - so I'm looking for concrete ideas to forward to our
staff who do). If you'd like to look at any of the simulation files, I can
email them to you directly - but it is far too much to post out on the mail
reflector.
If you strongly suspect this is a hardware problem on our end, I suppose
running sander and looking for similar trouble would be a good next step.
Any advice appreciated.
Thanks as always
The AMBER Mail Reflector
Received on Wed Mar 24 2004 - 21:53:00 PST