Re: AMBER: WHAM: how frequently do I need data on the system coordinates? from John Chodera on 2008-04-10 (Amber Archive Apr 2008)

From: John Chodera <jchodera.gmail.com>
Date: Thu, 10 Apr 2008 10:47:57 -0700

Hi David,

It's important to note that WHAM can only be applied to *equilibrium*
data. In order to compute the potential of mean force (PMF) along
something like a pulling coordinate using WHAM, typical practice is to
conduct a number of umbrella sampling simulations where the positions
and spring constants of harmonic restraints are carefully chosen to
ensure good overlap in the distribution of sampled distances along the
pulling coordinate. Each simulation has to be long enough to reach
equilibrium and then sample enough uncorrelated configurations to give
a precise estimate. The initial equilibration period of each
simulation must be discarded, and correlation times of the production
period are analyzed to determine the number of uncorrelated
configurations in the dataset. Typically, you want something like 40
such uncorrelated configurations per simulation, but if you use a
method that gives you statistical uncertainties as well as your PMF,
this will let you know how large the statistical uncertainty is in
your final estimate, and may give you clues to where you need more
sampling.

Benoît Roux has an excellent paper on the theory behind applying WHAM
to umbrella sampling simulations:

M. SOUAILLE and B. ROUX, "Extension to the Weighted Histogram Analysis
Method: Combining Umbrella Sampling with Free Energy Calculations",
Comput. Phys. Comm. 135, 40-57 (2001).

I would, however, suggest you use an estimator that gives you
statistical uncertainties as well. Michael Shirts and I have recently
produced one called MBAR (for "Multistate Bennett acceptance ratio")
which is pretty much a drop-in replacement for WHAM:

http://arxiv.org/abs/0801.1426
http://www.simtk.org/home/pymbar

Finally, methods such as "reverse cumulative averaging" can help you
automatically judge when each simulation has reached equilibrium and
how many correlation times you have in your dataset:

http://dx.doi.org/10.1063/1.1638996

The protocol you describe sounds like it would continue to perturb the
Hamiltonian at a rate that would not allow the system to come to
equilibrium before the umbrella restraint is moved again. As such,
you cannot use WHAM to analyze this data, as it will be
nonequilibrium.

To rectify this, you can either (1) increase the time spent before the
umbrella restraint is moved again to be sufficiently large to allow
the system to come to equilibrium and many equilibrium samples to be
collected from each window, or (2) accumulate the total work done on
the system during each pushing run and use this collection of work
measurements (one per pushing simulation) with a nonequilibrium
estimator, such as one based on the Jarzynski relation, to recover the
equilibrium free energy difference as a function of pulling
coordinate.

For option (2), the extraction of the PMF along the pulling coordinate
is a little bit tricky, since it's not the same as the free energy of
the system as a function of the umbrella equilibrium position. See,
for example, this paper by David Minh and Artur Adib of the NIH for
how it is done for bidirectional pulling experiments (where you pull
from both directions at equal rate):

http://arxiv.org/abs/0802.0224

I'm not entirely sure what the best estimator to use for
unidirectional pulling experiments is, but I am sure we can dig that
up if you think that nonequilibrium pulling is the way you'd like to
go. You would probably need many, many more pulling simulations than
10-20 to get this to converge, however.

My own suggestion would be to stick with umbrella sampling, especially
since these simulations can be run in parallel. The umbrellas
probably don't need to be so close as 0.025 A between umbrella
centers. You can probably start with a crude spacing, run some
simulations, and then examine the distribution of pulling distances
sampled and fill in the gaps where needed.

Best,

John

--
Dr. John D. Chodera <jchodera.gmail.com>      | Mobile    : 415.867.7384
Postdoctoral researcher, Pande lab            | Lab phone : 650.723.1097
Department of Chemistry, Stanford University  | Lab fax   : 650.724.4021
http://www.dillgroup.ucsf.edu/~jchodera
On 10/04/2008, David Cerutti <dcerutti.mccammon.ucsd.edu> wrote:
> Hello,
>
>    I'm just starting into the wide world of WHAM, but I want to do a very
> thorough study.  Starting very soon, I intend to perform 10-20 simulations
> for pushing biotin out of strepatvidin.  My basic data collection is as
> follows:
>
>  1.) 15ns total pushing run, with 0.025A perturbations every 30ps at a 1.5fs
> time step (1000 steps = 1.5ps)
>
>  2.) Collect data on the restraining potential with:
>     &wt type='DUMPFREQ', istep1=10
>
>  3.) Write energy output data every 10 steps (0.015ps)
>
>  4.) Write coordinates every 1000 steps (1.5ps)
>
>  Is this a sufficent amount of data collection?  Primarily, I'm concerned
> whether my rate of coordinate output is all right, because the example in
> the AMBER9 manual doesn't seem to need coordinates output at a high rate.
> Since I'll have all the restart files, I could certainly go back on many
> processors and resample the conformations if I needed denser sampling, but
> it'd be much better to get enough readouts the first time.
>
>  Thanks,
>  Dave
> -----------------------------------------------------------------------
>  The AMBER Mail Reflector
>  To post, send mail to amber.scripps.edu
>  To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu
>
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber.scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo.scripps.edu

Received on Fri Apr 18 2008 - 21:18:25 PDT