Re: AMBER: launching a job works with sander.MPI and fail with pmemd.MPI from Robert Duke on 2008-12-16 (Amber Archive Dec 2008)

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 16 Dec 2008 08:55:47 -0500

Okay, this has been discussed a lot. PMEMD should replicate sander results
for a couple of hundred steps at least, unless you have an unbelievably bad
starting configuration with a couple of atoms on top of each other (in which
case some of the force gradients are huge and the simulation is bad anyway).

However, the thing with MD is that there are on the order of millions, if
not billions, of calculations per step, including additions, and the thing
about addition of floating point numbers on computers is that it is not
truly associative - the order in which the additions are performed DOES
matter, due to truncation in the floating point representation of the
number. So what this means is that if you have an algorithm that is
different AT ALL, even in logically insignificant ways, there will be a
difference in rounding error, and due to the nature of MD, this difference
will rather quickly grow.

The main sources of difference between pmemd and sander are
probably the following:

1) a different splining function for the erf() function in pmemd for some
   implementations (there is an optimization, and pmemd is actually more
   accurate than sander),
2) workload distribution differences running in parallel (which affect which
   force additions will occur within the limited precision of a 64 bit
   floating point number), and
3) differences in the order of force additions arising from differences in
   calculation and communication order.

The thing to note about rounding
error - we are talking about a loss in precision down around 1e-17 I
believe - rather small. Now, the erf() splining errors are probably closer
to 1e-11 - probably the lowest precision transcendental we have, but the
other transcendental functions are probably between these two numbers in
precision (rough guess, have not looked recently, and it will be
machine-dependent).

Now all this junk does not really matter, because your
calculation is probably off by at least 1e-5 (actually much worse) based on
precision of forcefield parameterization, the fact that Coulomb's law does
not really get electrostatics just right, the fact that (substitute here the
next force term generator) does not get it just right either, ... And the
standard justification
for not being disturbed by all this - the different errors just mean that
you sample different parts of phase space, and if you run long enough, you
will get it all (this last point is why I have labored so long to make pmemd
fast). Run your system on some other software and you will see some more
dramatic differences in phase space sampling... Heck, just change the
cutoffs a bit, the FFT grid densities, etc. etc. etc.

I have gone on and on
about this stuff for the last several years on the amber reflector (see
ambermd.org for links), probably hitting different high and low points -
perhaps worth going back to look over, if you want the complete discussion.
I always jump on these questions, but am sort of answering for Ross here
because I am 3 hrs closer to Europe and he is hopefully still asleep ;-)
Regards - Bob Duke
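
The point above about floating point addition not being associative can be
seen in a few lines. What follows is only a minimal sketch in Python, not
code from pmemd or sander: the helper names serial_sum and chunked_sum are
made up for illustration, the random numbers stand in for force
contributions, and splitting a list into chunks is a very loose stand-in for
dividing the workload among processes.

```python
import math
import random


def serial_sum(values):
    """Add values left to right, as a single process would."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Add values as n_chunks partial sums, then combine the partials.

    Hypothetical stand-in for each process summing its own share of the
    contributions before a final reduction.
    """
    size = max(1, math.ceil(len(values) / n_chunks))
    partials = [serial_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return serial_sum(partials)


# 1) Same three numbers, different grouping, different result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right, left, right)  # False 0.6000000000000001 0.6

# 2) Same list of numbers, summed with different "workload distributions".
random.seed(2008)
contributions = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
reference = math.fsum(contributions)  # correctly rounded sum, for comparison
for n in (1, 2, 4, 8):
    s = chunked_sum(contributions, n)
    print(n, repr(s), s - reference)
```

The sums printed in part 2 typically agree with each other to many digits
but not necessarily to the last bit, which is the kind of difference that
then grows over the course of an MD run.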

----- Original Message -----
From: "Thérèse Malliavin" <terez.pasteur.fr>
To: <amber.scripps.edu>
Sent: Tuesday, December 16, 2008 7:57 AM
Subject: RE: AMBER: launching a job works with sander.MPI and fail with
pmemd.MPI

Hi Ross,

Thank you for your mail. Finally, I tried to use AMBER 10 in place of
AMBER 9, and pmemd runs without any problem. Now, I have another naive
question. I already realized that pmemd runs significantly faster than
sander, even on 4 processors. But if I compare the results obtained
by sander and pmemd starting from the same system, for example the
total energy, the two runs do not seem to be closely correlated. So, I would
like to know whether we should expect pmemd and sander to produce
the same numbers if the runs start from the same system. The
differences observed probably come from the different architectures of the
two programs; could you please tell me a little bit more about that?

Thank you for your help,

Best regards,

Therese
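
One way to make the comparison described above concrete is to line up the
total energy from the two runs step by step and see where they start to
drift apart. The following is a rough sketch in Python, under the assumption
that both output files print lines of the form "NSTEP = ..." followed by
"Etot = ..." and that the end-of-run summary starts with a line containing
"A V E R A G E S". The function names read_etot and compare are made up
here, and the numbers in the sample text are invented purely to exercise the
parser; they are not results from any run.

```python
import re

NSTEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")
ETOT_RE = re.compile(r"Etot\s*=\s*(-?\d+\.\d+)")


def read_etot(lines):
    """Return {step: Etot} from the lines of an mdout-style output."""
    energies = {}
    step = None
    for line in lines:
        if "A V E R A G E S" in line:
            break  # assumed start of the summary, which repeats these fields
        m = NSTEP_RE.search(line)
        if m:
            step = int(m.group(1))
            continue
        m = ETOT_RE.search(line)
        if m and step is not None:
            energies[step] = float(m.group(1))
            step = None
    return energies


def compare(etot_a, etot_b):
    """Yield (step, Etot_a, Etot_b, difference) for steps found in both."""
    for step in sorted(set(etot_a) & set(etot_b)):
        yield step, etot_a[step], etot_b[step], etot_a[step] - etot_b[step]


# Invented sample text standing in for two output files.
run_a = """\
 NSTEP =        1   TIME(PS) =       0.002  TEMP(K) =   300.00  PRESS =     0.0
 Etot   =     -1000.0000  EKtot   =       200.0000  EPtot      =     -1200.0000
 NSTEP =        2   TIME(PS) =       0.004  TEMP(K) =   300.10  PRESS =     0.0
 Etot   =     -1000.5000  EKtot   =       200.2500  EPtot      =     -1200.7500
""".splitlines()
run_b = """\
 NSTEP =        1   TIME(PS) =       0.002  TEMP(K) =   300.00  PRESS =     0.0
 Etot   =     -1000.0000  EKtot   =       200.0000  EPtot      =     -1200.0000
 NSTEP =        2   TIME(PS) =       0.004  TEMP(K) =   300.10  PRESS =     0.0
 Etot   =     -1000.2500  EKtot   =       200.5000  EPtot      =     -1200.7500
""".splitlines()

for step, e_a, e_b, diff in compare(read_etot(run_a), read_etot(run_b)):
    print(step, e_a, e_b, diff)
```

With real files, the lists of lines would come from something like
open("mdr1.out").readlines() for each run.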

On Mon, 15 Dec 2008, Ross Walker wrote:

> Hi Therese,
>
> First thing to check. PMEMD when built in parallel (which I assume you did)
> is called pmemd, not pmemd.MPI. Hence you should be getting a file not found
> error - which in parallel may be masking itself as a lamboot failure.
>
> Also I would make sure you do the following to run cleanly in your script:
>
> export AMBERHOME=/foo/bar/amber10
> lamboot
> mpirun -np 4 $AMBERHOME/exe/pmemd -O -i ...
> lamhalt
>
> Then you can nohup the entire script. You should probably make sure you kill
> any existing lamboot or lamd instances on your machine first though, since
> some will probably be left over from earlier runs. You should also make sure
> that pmemd was built with the same version of lam as your mpirun refers to.
> Make sure you run the test cases:
>
> export DO_PARALLEL='mpirun -np 4'
> lamboot
> cd $AMBERHOME/test/
> make test.pmemd
> lamhalt
>
> Good luck,
> Ross
>
>> -----Original Message-----
>> From: owner-amber.scripps.edu [mailto:owner-amber.scripps.edu] On Behalf
>> Of Thérèse Malliavin
>> Sent: Monday, December 15, 2008 6:42 AM
>> To: amber.scripps.edu
>> Cc: terez.pasteur.fr
>> Subject: AMBER: launching a job works with sander.MPI and fail with
>> pmemd.MPI
>>
>> Dear AMBER Netters,
>>
>> I have a question about the use of PMEMD. It is probably a trivial
>> question, but, as I did not find an answer either on the Web pages
>> or in the manuals, I am asking you.
>>
>> I am doing the parallel calculations with sander.MPI using a lamd daemon
>> and the command nohup to launch the job, so I am doing:
>>
>> . /Bis/shared/centos-3_x86_64/etc/custom.d/amber9_intel8.1_lam-7.1.2_intel-8.1.sh
>> lamboot
>>
>> before starting the AMBER calculations. The typical command line for
>> sander.MPI is then:
>>
>> mpirun -np 4 ${AMBERHOME}/exe/sander.MPI -O -i mdr1.in -o mdr1.out -inf
>> mdr1.inf -x mdr1.crd -c eq7.rst -p prmtop -r mdr1.rst
>>
>> But, if I replace in the command line sander.MPI by pmemd.MPI:
>>
>> mpirun -np 4 ${AMBERHOME}/exe/pmemd.MPI -O -i mdr1.in -o mdr1.out -inf
>> mdr1.inf -x mdr1.crd -c eq7.rst -p prmtop -r mdr1.rst
>>
>> I get an error saying that lamboot was not started.
>>
>> I am trying to do these calculations on a 64-bit 8-proc Linux machine,
>> running under centos-3. The lam used is the version 7.1.2_intel-8.1.
>>
>> Also, I am only using features which should exist in PMEMD according to
>> the AMBER manual.
>>
>> Do you have any idea what I could check or where to find information to
>> fix this problem?
>>
>> Thank you in advance for your help,
>>
>> Therese Malliavin
>> Unite de Bioinformatique Structurale
>> Institut Pasteur, Paris
>> France
>>
>> -----------------------------------------------------------------------
>> The AMBER Mail Reflector
>> To post, send mail to amber.scripps.edu
>> To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
>> to majordomo.scripps.edu
>

Received on Wed Dec 17 2008 - 01:18:56 PST