RE: AMBER: questions about previous posts

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 11 Aug 2008 23:18:49 -0700

Hi Naser,

I'll try to answer your questions as best I can, although please bear in
mind that in some cases these are research questions and/or open to
multiple interpretations, so please take my answers here as suggestions.
Others may want to comment more; these are certainly topics for discussion.

>1) http://amber.ch.ic.ac.uk/archive/200412/0026.html is a discussion about
>the difference between Langevin vs. Berendsen temperature control. The end
>conclusion is that Langevin corrupts the fast dynamics of the system, so it
>shouldn't be used during production runs, and if one wants to compute a time
>correlation function then one needs to avoid ntt=3. Can anyone elaborate on
>this? From the tutorial I was under the impression that the Berendsen
>algorithm had to be much more elaborate, so is my last step correct? My main
>focus is picking up vibrations occurring in a time frame of ~6 ps; is it a
>good idea to use ntt=3 at all for this type of simulation?

This really depends on what it is you are looking at. The issue here, as I
understand it, is that if, for example, you wanted to calculate an
autocorrelation function for a given property arising from your MD
simulation, then the result is going to depend on what you choose as the
parameters for the Langevin thermostat, hence the 'corrupts'. The Berendsen
thermostat is also going to affect things, but probably to a lesser extent,
especially if the relaxation time is set reasonably long, say 10 ps or so.
If you are looking at large-scale motions or some kind of ensemble average
then I believe you are okay with a Langevin thermostat; of course, you still
have a thermostat there, so you have to take this into account. Maybe others
can comment more here, but in your case, if you are interested in specific
vibrations, then you should probably consider running NVE for the production
phase. As with any simulation, the key is to show that your specific choices
do not affect the results. That is, if you vary gamma_ln, do the results
change over the time frame you are interested in? If so, you have an issue
you need to deal with; if not, the approximation of a thermostat is probably
reasonable.
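
As an aside, the gamma_ln sensitivity check is easy to script. Below is a
minimal Python/NumPy sketch (purely illustrative, not part of AMBER; the
synthetic data and array names are stand-ins for whatever property you
extract from your trajectories) that computes a normalized autocorrelation
function so you can compare runs with different gamma_ln values:

import numpy as np

def autocorr(x, max_lag):
    # Normalized autocorrelation C(tau) = <dx(t) dx(t+tau)> / <dx(t)^2>
    dx = x - x.mean()
    denom = np.dot(dx, dx)
    return np.array([np.dot(dx[:len(dx) - lag], dx[lag:]) / denom
                     for lag in range(max_lag)])

# Stand-ins for the same property extracted from runs at two different
# gamma_ln values (synthetic noise used here so the script runs on its own).
rng = np.random.default_rng(0)
series = {1.0: rng.standard_normal(10000),
          5.0: rng.standard_normal(10000)}

for gamma, x in series.items():
    c = autocorr(x, max_lag=500)
    # If the curves differ noticeably over the lag times you care about,
    # the thermostat is perturbing the dynamics of interest.
    print("gamma_ln=%.1f: C at lags 0,100,400 -> %.3f %.3f %.3f"
          % (gamma, c[0], c[100], c[400]))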
 
>TIP3PBOX w/ 8 angstrom water
>I get the following warning because I don't add Na+ ions: "WARNING: The
>unperturbed charge of the unit: -4.000000 is not zero". I neutralize DNA,
>but I am not sure if the guideline applies to everything (does it cause
>problems with periodic boundary conditions?)
 
With AMBER's PME this should not cause a problem since it applies a net
neutralizing plasma in cases where the net charge is not zero.

>I am moving the center of mass at every step based on someone else's
>example, and I am not sure if it makes a difference to do this during the
>simulation or in my ptraj. There is a discussion wrt "pca analysis" at
>http://amber.ch.ic.ac.uk/archive/200601/0124.html and I am not sure if it
>applies. Center of mass was also discussed here:
>http://amber.ch.ic.ac.uk/archive/200704/0202.html and it suggests that it
>is not necessary to do this if one is using ntt=1. Any hints?
 
I'm not sure you really want to be doing this, although it may not
ultimately make a difference (other than making your output file huge),
since the amount of COM movement/rotation removed on every step will be
minimal. Again, if you run an NVE production run from an equilibrated box
then you shouldn't need to remove the center of mass motion during
production, since there shouldn't be any. Note that COM movement is
generally more of an issue with implicit solvent simulations than with
explicit solvent simulations. The issue with ntt=1 and not removing COM
motion is that you can get the classic 'flying ice cube' artifact: all of
your kinetic energy ends up in rotation and translation of the box and your
actual internal dynamics are effectively frozen. The use of PME
significantly reduces this problem, however.
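
For what it's worth, the bookkeeping involved in COM removal is trivial to
reproduce at analysis time. Here is a minimal NumPy sketch (array shapes and
names are my own choosing, not AMBER's) that subtracts the mass-weighted COM
from every stored frame:

import numpy as np

def remove_com_translation(coords, masses):
    # coords: (n_frames, n_atoms, 3); masses: (n_atoms,)
    # COM per frame is sum_i m_i r_i / M, shape (n_frames, 3).
    com = np.einsum('fai,a->fi', coords, masses) / masses.sum()
    return coords - com[:, None, :]

# Toy demonstration: 5 frames of 3 atoms with made-up masses.
rng = np.random.default_rng(1)
traj = rng.standard_normal((5, 3, 3))
m = np.array([12.01, 1.008, 16.00])
centered = remove_com_translation(traj, m)
print(np.einsum('fai,a->fi', centered, m))   # ~0: COM now at the origin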

>2) http://amber.ch.ic.ac.uk/archive/200603/0110.html and
>http://www.rosswalker.co.uk/tutorials/amber_workshop/Tutorial_eight/ say
>that if one averages the results and the simulation is long enough, two
>machines or different numbers of processors will give the same result. What
>is averaged? And what is long enough? Is there a rule of thumb?
 
This is a generic comment. In principle, for any ensemble property, if you
run two independent simulations for an infinite time then the results should
converge to the same value. As for what is averaged, this could be anything
in principle: temperature, density, energy, H-bonds, etc., really anything
that should converge. As for a rule of thumb, this really depends on what it
is you want to measure. In reality I doubt any MD simulations that anyone
runs right now are converged; you can run microsecond-long MD simulations
and still show that many structural properties are not converged. The key,
though, is that for two independent simulations, started from different
structures and/or different random seeds, your results converge to the same
value, unless of course you are specifically trying to show some dependence
on the initial conditions.
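
One simple way to see this is to plot running averages of the property from
both runs on the same axes. A minimal Python sketch (the noisy 300 K series
below is a synthetic stand-in for real output such as a temperature or
density trace):

import numpy as np

def running_mean(x):
    # Cumulative average: mean of x[0..t] at each step t.
    return np.cumsum(x) / np.arange(1, len(x) + 1)

rng1, rng2 = np.random.default_rng(10), np.random.default_rng(20)
run_a = 300.0 + rng1.standard_normal(50000)   # seed 1
run_b = 300.0 + rng2.standard_normal(50000)   # seed 2

ra, rb = running_mean(run_a), running_mean(run_b)
# Converged when both curves flatten out AND agree with each other.
for n in (1000, 10000, 50000):
    print("after %5d samples: run A %.3f, run B %.3f"
          % (n, ra[n - 1], rb[n - 1]))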

>3) http://amber.ch.ic.ac.uk/archive/200805/0518.html is a discussion about
>doing multistep production runs (i.e. doing a simulation from 0 to 100 ps
>and feeding the .rst file to the input of another simulation from 100 to
>200 ps, or a 200 ps run in two 100 ps steps). The discussion points out
>that during the calculation higher precision is used than what is written
>to the .rst file. This truncation doesn't seem to shift the position of the
>absorption peaks for me. However, there is a significant difference between
>doing a 0 to 200 ps run vs. 0 to 100 ps fed into 100 to 200 ps. So is a
>multistep simulation less accurate?
 
No, since the precision is still significantly better than the property you
are trying to measure. The issue is that Newtonian dynamics, while
deterministic, is still chaotic (hence analytical solutions do not exist).
What this means is that while a single trajectory is in theory
deterministic, an infinitesimal change in the initial conditions will always
ultimately lead to uncorrelated trajectories. Hence what you see is that the
200 ps single run simply follows a different, but equally correct (or
incorrect ;-) ), region of phase space than the 2 x 100 ps run. Recently
there has been some work by D. E. Shaw Research where they have looked at
bitwise reversibility, coupled with fixed precision, which allows them to
reproduce the exact same trajectory across multiple runs or a single run.
This is really just a matter of improving precision, though, and as such
doesn't say anything about accuracy; it is important not to confuse the two.

Now, if you take the above to its logical conclusion and restart on every
time step, then I think you may have issues due to the precision, but I
suspect that beyond about 1,000 steps between restarts you would not be able
to show any differences in the various ensemble properties.
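
You can see the effect in miniature without running any MD at all. In the
toy Python script below the chaotic logistic map stands in for the dynamics
(named plainly: this is not MD), and rounding to 7 decimal places mimics,
very roughly, the truncation a formatted restart file introduces. The single
run and the restarted run end up in completely different places, yet both
are equally valid trajectories:

def run(x, steps):
    for _ in range(steps):
        x = 3.99 * x * (1.0 - x)   # logistic map in its chaotic regime
    return x

x0 = 0.3141592653589793
straight = run(x0, 200)                 # one continuous 200-step run
halfway = round(run(x0, 100), 7)        # 'write a restart' at step 100
restarted = run(halfway, 100)           # 'read it back', run 100 more

print("single run: %.12f" % straight)
print("restarted : %.12f" % restarted)
# The end points are uncorrelated, but an ensemble average over many
# such runs would converge to the same value either way.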

>4) http://archive.ambermd.org/200407/0148.html is a discussion about the
>regularity of writing to mdcrd during the production run. As I mentioned
>before, I am interested in the molecular vibrations within a ~6 ps time
>frame. Does this mean that at max. I can get away with writing to mdcrd
>every 3000 steps (ntwx=3000, dt=.002)? Also, does it make a difference what
>value one uses for vecs? I picked a value from an example but I am not sure
>what the preference is.
 
If you are looking at modes on the order of 6 ps then the maximum spacing
between mdcrd frames should be no more than 1.5 ps (750 steps), although
ideally you want about 10 samples per period of the fastest motion you are
interested in, so in your case around 300 steps or so. This is a consequence
of what is known as the Nyquist limit from sampling theory. It is probably a
good idea to read up on it; you will find a good discussion in any
introductory signal processing textbook.
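
Purely as arithmetic (the factor-of-two safety margin over the strict
Nyquist limit is my own reading of the numbers above):

dt = 0.002          # timestep, ps
period = 6.0        # slowest vibration of interest, ps

nyquist = period / 2        # strict theoretical limit: 3.0 ps
practical = nyquist / 2     # with a 2x safety margin: 1.5 ps
ideal = period / 10         # ~10 samples per period: 0.6 ps

for label, interval in (("strict Nyquist", nyquist),
                        ("practical max", practical),
                        ("ideal", ideal)):
    print("%-14s: write every %.1f ps -> ntwx = %d"
          % (label, interval, round(interval / dt)))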

>I can't seem to make the absorption coefficient vs. cm^-1 plot settle on
>any value, since the position of the absorption peaks changes with the
>number of processors, time duration, and steps in the simulation.

This suggests to me that your simulations are not long enough, since what
you are seeing is dependence on the initial conditions. You are only running
100 ps production runs here; for good convergence you probably need 100
times this (10 ns) or even more. Try plotting the convergence of the above
plot as a function of run length. That is, over your 100 ps, calculate what
you would get if you only used the first 5 ps of the data, then the first
10 ps, the first 15 ps, etc., and see if the results appear to converge. The
other thing to try is to take a sliding window of half your data, in your
case 50 ps, and see what you get for 0-50 ps, 10-60 ps, 20-70 ps, etc. If
the window is long enough that the data should be converged within the
window, then this will test how well equilibrated your initial production
phase structure was.
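
Both checks are only a few lines of post-processing. A minimal Python sketch
(the synthetic series below stands in for whatever per-frame quantity feeds
your spectrum):

import numpy as np

def prefix_means(x, chunk):
    # Mean of x[0:n] for n = chunk, 2*chunk, 3*chunk, ...
    return [(n, x[:n].mean()) for n in range(chunk, len(x) + 1, chunk)]

def window_means(x, width, stride):
    # Mean over sliding windows [i, i+width), stepped by stride.
    return [(i, x[i:i + width].mean())
            for i in range(0, len(x) - width + 1, stride)]

rng = np.random.default_rng(2)
prop = rng.standard_normal(1000)    # e.g. 100 ps at one frame per 0.1 ps

for n, m in prefix_means(prop, chunk=50):              # first 5, 10, ... ps
    print("first %4d frames: mean = %+.4f" % (n, m))
for i, m in window_means(prop, width=500, stride=100): # 50 ps windows
    print("frames %4d-%4d: mean = %+.4f" % (i, i + 499, m))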

Good luck,
Ross

/\
\/
|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.




-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
      to majordomo.scripps.edu
Received on Wed Aug 13 2008 - 06:07:35 PDT