Fw: AMBER: launching a job works with sander.MPI and fail with pmemd.MPI

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 16 Dec 2008 09:56:13 -0500

Sorry if this creates additional noise; I am resending this because I have
not seen it show up on the list after an hour. - Bob
----- Original Message -----
From: "Robert Duke" <rduke.email.unc.edu>
To: <amber.scripps.edu>
Sent: Tuesday, December 16, 2008 8:55 AM
Subject: Re: AMBER: launching a job works with sander.MPI and fail with
pmemd.MPI


> Okay, this has been discussed a lot. PMEMD should replicate sander
> results for a couple of hundred steps at least, unless you have an
> unbelievably bad starting configuration with a couple of atoms on top of
> each other (in which case some of the force gradients are huge and the
> simulation is bad anyway).
>
> However, the thing with MD is that there are on the order of millions,
> if not billions, of calculations per step, including additions, and the
> thing about addition of floating point numbers on computers is that it
> is not truly associative - the order in which the additions are
> performed DOES matter, due to truncation in the floating point
> representation of the number. So what this means is that if you have an
> algorithm that is different AT ALL, even in logically insignificant
> ways, there will be a rounding error, and due to the nature of MD, this
> rounding error will rather quickly grow.
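>
> To make the addition-order point concrete, here is a throwaway Python
> sketch (nothing to do with the actual AMBER source code - just an
> illustration of floating point round-off): summing the very same numbers
> in a different order changes the last bits of the result, and in MD
> those last-bit differences get amplified step after step.
>
>   import random
>
>   # Classic example: floating point addition is not associative.
>   print((0.1 + 0.2) + 0.3)   # 0.6000000000000001
>   print(0.1 + (0.2 + 0.3))   # 0.6
>
>   # Same idea with a "force accumulation": add identical terms in a
>   # serial order and in a chunked (parallel-reduction-like) order.
>   random.seed(0)
>   forces = [random.uniform(-1.0, 1.0) for _ in range(100000)]
>
>   serial_sum = sum(forces)
>
>   # Split the work over 4 "processors", sum each chunk, then combine
>   # the partial sums, the way a parallel run effectively does.
>   nproc = 4
>   chunk = len(forces) // nproc
>   partial = [sum(forces[i * chunk:(i + 1) * chunk]) for i in range(nproc)]
>   chunked_sum = sum(partial)
>
>   print(serial_sum)
>   print(chunked_sum)
>   print(serial_sum - chunked_sum)   # tiny, but usually not exactly zero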
> The main sources of difference between pmemd and sander are probably the
> following: 1) a different splining function for the erf() function in
> pmemd for some implementations (there is an optimization, and pmemd is
> actually more accurate than sander), 2) workload distribution
> differences running in parallel (which affect which force additions
> occur within the limited precision of a 64-bit floating point number),
> and 3) differences in the order of force additions arising from
> differences in calculation and communication order.
>
> The thing to note about rounding error - we are talking about a loss in
> precision down around 1e-17, I believe - rather small. Now, the erf()
> splining errors are probably closer to 1e-11 - probably the lowest
> precision transcendental we have, but the other transcendental functions
> are probably between these two numbers in precision (rough guess, have
> not looked recently, and it will be machine-dependent).
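>
> If you want a feel for the size of that splining error, here is a rough
> Python sketch (not the scheme pmemd actually uses - just a generic
> cubic-spline table for erf() built with scipy, to show the order of
> magnitude involved):
>
>   import numpy as np
>   from scipy.interpolate import CubicSpline
>   from math import erf
>
>   # Tabulate erf() on [0, 6] and fit a cubic spline through the table,
>   # which is roughly what a splined transcendental in an MD code is.
>   knots = np.linspace(0.0, 6.0, 1001)
>   table = np.array([erf(x) for x in knots])
>   spline = CubicSpline(knots, table)
>
>   # Compare the spline against the library erf() on a much finer grid.
>   xs = np.linspace(0.0, 6.0, 200001)
>   exact = np.array([erf(x) for x in xs])
>   print(np.max(np.abs(spline(xs) - exact)))
>
> With this table spacing the maximum error comes out somewhere around
> 1e-10 or below; a denser (or smarter) table pushes it further down.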
> Now all this junk does not really matter, because your calculation is
> probably off by at least 1e-5 (actually much worse) based on the
> precision of the forcefield parameterization, the fact that Coulomb's
> law does not really get the electrostatics just right, the fact that
> (substitute here the next force term) is not exactly right either, ...
> And the standard justification for not being disturbed by all this: the
> different errors just mean that you sample different parts of phase
> space, and if you run long enough, you will get it all (this last point
> is why I have labored so long to make pmemd fast). Run your system on
> some other software and you will see some more dramatic differences in
> phase space sampling... Heck, just change the cutoffs a bit, the fft
> grid densities, etc. etc. etc.
>
> I have gone on and on about this stuff for the last several years on the
> amber reflector (see ambermd.org for links), probably hitting different
> high and low points - perhaps worth going back to look over if you want
> the complete discussion. I always jump on these questions, but am sort
> of answering for Ross here because I am 3 hrs closer to Europe and he is
> hopefully still asleep ;-)
> Regards - Bob Duke
>
> ----- Original Message -----
> From: "Thérèse Malliavin" <terez.pasteur.fr>
> To: <amber.scripps.edu>
> Sent: Tuesday, December 16, 2008 7:57 AM
> Subject: RE: AMBER: launching a job works with sander.MPI and fail with
> pmemd.MPI
>
>
> Hi Ross,
>
> Thank you for your mail. In the end, I tried AMBER 10 in place of AMBER
> 9, and pmemd runs without any problem. Now I have another naive
> question. I have already noticed that pmemd runs significantly faster
> than sander, even on 4 processors. But if I compare the results obtained
> by sander and by pmemd starting from the same system - for example the
> total energy - the two runs do not seem very well correlated. So I would
> like to know whether we should expect pmemd and sander to produce the
> same numbers when the runs start from the same system. The differences
> observed probably come from the different architecture of the two
> programs; could you please tell me a little bit more about that?
>
> Thank you for your help,
>
> Best regards,
>
> Therese
>
> On Mon, 15 Dec 2008, Ross Walker wrote:
>
>> Hi Therese,
>>
>> First thing to check: PMEMD, when built in parallel (which I assume you
>> did), is called pmemd, not pmemd.MPI. Hence you should be getting a
>> "file not found" error - which in parallel may be masking itself as a
>> lamboot failure.
>>
>> Also I would make sure you do the following to run cleanly in your
>> script:
>>
>> export AMBERHOME=/foo/bar/amber10
>> lamboot
>> mpirun -np 4 $AMBERHOME/exe/pmemd -O -i ...
>> lamhalt
>>
>> Then you can nohup the entire script. You should probably make sure you
>> kill any existing lamboot or lamd instances on your machine first,
>> though, since some will probably be left over from earlier runs. You
>> should also make sure that pmemd was built with the same version of lam
>> as the one your mpirun refers to. Make sure you run the test cases:
>>
>> export DO_PARALLEL='mpirun -np 4'
>> lamboot
>> cd $AMBERHOME/test/
>> make test.pmemd
>> lamhalt
>>
>> Good luck,
>> Ross
>>
>>> -----Original Message-----
>>> From: owner-amber.scripps.edu [mailto:owner-amber.scripps.edu] On Behalf
>>> Of Thérèse Malliavin
>>> Sent: Monday, December 15, 2008 6:42 AM
>>> To: amber.scripps.edu
>>> Cc: terez.pasteur.fr
>>> Subject: AMBER: launching a job works with sander.MPI and fail with
>>> pmemd.MPI
>>>
>>> Dear AMBER Netters,
>>>
>>> I have a question about the use of PMEMD. It is probably a trivial
>>> question, but as I did not find an answer either on the Web pages or
>>> in the manuals, I am asking it here.
>>>
>>> I am running the parallel calculations with sander.MPI using a lamd
>>> daemon and the nohup command to launch the job, so I do:
>>>
>>> . /Bis/shared/centos-3_x86_64/etc/custom.d/amber9_intel8.1_lam-7.1.2_intel-8.1.sh
>>> lamboot
>>>
>>> before starting the AMBER calculations. The typical command line for
>>> sander.MPI is then:
>>>
>>> mpirun -np 4 ${AMBERHOME}/exe/sander.MPI -O -i mdr1.in -o mdr1.out -inf
>>> mdr1.inf -x mdr1.crd -c eq7.rst -p prmtop -r mdr1.rst
>>>
>>> But if I replace sander.MPI with pmemd.MPI in the command line:
>>>
>>> mpirun -np 4 ${AMBERHOME}/exe/pmemd.MPI -O -i mdr1.in -o mdr1.out -inf
>>> mdr1.inf -x mdr1.crd -c eq7.rst -p prmtop -r mdr1.rst
>>>
>>> I get an error saying that lamboot was not started.
>>>
>>> I am trying to run these calculations on a 64-bit, 8-processor Linux
>>> machine running CentOS 3. The lam used is version 7.1.2_intel-8.1.
>>>
>>> Also, I am only using features which should exist in PMEMD according to
>>> the AMBER manual.
>>>
>>> Do you have any idea what I could check, or where to find information
>>> to fix this problem?
>>>
>>> Thank you in advance for your help,
>>>
>>> Therese Malliavin
>>> Unite de Bioinformatique Structurale
>>> Institut Pasteur, Paris
>>> France
>>>
>>
>>
>

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
      to majordomo.scripps.edu
Received on Wed Dec 17 2008 - 01:18:46 PST