Re: [AMBER] Multiple vs Continuous MD opinion from Ross Walker on 2014-03-05 (Amber Archive Mar 2014)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 05 Mar 2014 09:08:35 -0800

Ultimately, one long simulation and many short simulations are the same
thing. In other words, an infinitely long simulation is exactly the same as
an infinite number of one-step simulations. Of course, the caveat here is
that the starting conditions for the infinitely many short simulations must
be independent and, in this extreme example, represent a correctly weighted
set of accessible structures. The underlying issue is that most of our
simulations are biased by their starting conditions. Almost all of us start
simulations exclusively from the crystal structure and, for reasons I could
never understand, consider an RMSD that stays close to the crystal
structure to be good.
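In equation form, assuming ergodicity (the assumption Brian Radak raises further down this thread), with $A$ an observable, $x(t)$ the trajectory and $\rho_{\mathrm{eq}}$ the equilibrium distribution (notation introduced here for illustration):

```latex
\lim_{T\to\infty}\frac{1}{T}\int_{0}^{T} A\big(x(t)\big)\,dt
\;=\; \int A(x)\,\rho_{\mathrm{eq}}(x)\,dx
\;=\; \lim_{M\to\infty}\frac{1}{M}\sum_{i=1}^{M} A(x_i),
\qquad x_i \sim \rho_{\mathrm{eq}} \ \text{independently}
```

The left equality is the ergodic assumption; the right one is the law of large numbers for independent draws. The caveat above is the condition $x_i \sim \rho_{\mathrm{eq}}$: starting every run from the crystal structure does not satisfy it.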

I think the real problem is a lack of error estimation. Far too many MD
papers I see have little or no meaningful error estimation. For example,
one would run a single 50 ns simulation and report a binding free energy.
That's really not much use to anybody. Why 50 ns? Why not 5 ns? Why not
500 ns? When a reviewer asks for repeats, he or she is most likely using a
coded way of saying: "show me that you didn't just conveniently stop your
single simulation when you got the answer you wanted." Of course they
can't say that directly, so they ask you for repeats.

I think what is really needed in such papers is evidence of convergence
that allows a reader to decide how much to trust your results. For
example, if you are calculating a binding energy, why not produce a plot
showing the predicted binding energy as a function of simulation length,
essentially a cumulative average? It would allow someone to see how much
faith they can put in the final binding energy reported.
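A minimal sketch of that cumulative average, assuming you already have one binding-energy estimate per saved frame in a plain Python list. The names `cumulative_average` and `time_axis_ns` are hypothetical helpers for illustration, not part of AmberTools.

```python
from itertools import accumulate


def cumulative_average(values):
    """Running mean of a per-frame quantity.

    Element k is the mean of values[0] .. values[k], i.e. the estimate
    you would have reported had the run been stopped after frame k.
    """
    running_sums = accumulate(float(v) for v in values)
    return [total / (k + 1) for k, total in enumerate(running_sums)]


def time_axis_ns(n_frames, frame_interval_ns):
    """Simulation time of each frame, assuming frame 1 is written after one interval."""
    return [(k + 1) * frame_interval_ns for k in range(n_frames)]
```

For example, `cumulative_average([1.0, 3.0, 2.0])` returns `[1.0, 2.0, 2.0]`. Plotted against `time_axis_ns(...)` with any plotting tool, a curve that is still drifting at the end suggests the reported value has not converged.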

Now, what you really want to do is make your results bulletproof beyond
all reasonable doubt. So how could you do that? Well, you could run
another simulation from the same starting structure with a different
random seed and plot its cumulative average on the same plot. This gives
an idea of what the spread in your results might be, that is, how much to
trust them. Note this is VERY different from just adding error bars to
plots (most of which are wrong, by the way) based on the number and range
of points seen in a single run. Such error bars are of little use, since
you cannot estimate the error in your results from a single run that
tells you nothing about what you could have sampled but didn't. Too many
people report things like 5.0 kcal/mol +/- 0.5 kcal/mol, which is simply
wishful thinking and gives the unwary reader far more faith in the result
than they should have. It is up to you as the author of the work to
properly address uncertainty in the results. So a reviewer who asks for
multiple repeats is really doing the author a favor.
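The comparison described above can be sketched as follows, assuming each run (same starting structure, different random seed) has produced a per-frame series of the quantity of interest. `naive_sem` is the single-run error bar the text warns against; `spread_across_runs` takes the final value of each run's cumulative average (which is just that run's mean) and reports how much those values scatter. The AR(1) generator is a hypothetical stand-in for real, time-correlated MD data; all function names are illustrative.

```python
import math
import random
import statistics


def naive_sem(values):
    """Standard error of the mean treating every frame as independent.

    MD frames are time-correlated, so for a single run this usually
    understates the real uncertainty.
    """
    return statistics.stdev(values) / math.sqrt(len(values))


def spread_across_runs(runs):
    """Mean and sample standard deviation of the final averages of several runs.

    The final value of each run's cumulative average is simply its mean.
    """
    finals = [statistics.fmean(run) for run in runs]
    return statistics.fmean(finals), statistics.stdev(finals)


def synthetic_correlated_run(n_frames, phi, seed):
    """Hypothetical stand-in for MD data: an AR(1) series that is
    strongly correlated from frame to frame when phi is close to 1."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(n_frames):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


if __name__ == "__main__":
    runs = [synthetic_correlated_run(5000, 0.99, seed) for seed in range(10)]
    mean, spread = spread_across_runs(runs)
    print(f"single-run naive SEM : {naive_sem(runs[0]):.3f}")
    print(f"spread over 10 seeds : {spread:.3f} (mean of means {mean:.3f})")
```

With strongly correlated frames, the naive single-run SEM typically comes out several times smaller than the seed-to-seed spread, which is the gap between a "+/- 0.5 kcal/mol" error bar and the real uncertainty.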

Now, a separate argument is how to improve sampling. I don't have
concrete numbers, but what you need to ask is:

Does a single 500 ns simulation started from the crystal structure sample
phase space better than 10x 50 ns simulations started from the same
crystal structure? I would offer that these are likely to be very similar.
However, one should always extrapolate such things to their limits. So how
does it compare to 100x 5 ns? Or 1000x 0.5 ns? Or 10,000x 50 ps?
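All of these splits spend the same 500 ns of total simulation time; only the per-run length changes. A sketch that tabulates them and flags runs shorter than a relaxation time. `relaxation_ns` is a hypothetical, system-dependent parameter (roughly, how long a run needs to move meaningfully away from the shared crystal structure), and the 1 ns used below is purely illustrative, not a measured value.

```python
def budget_splits(total_ns, run_counts, relaxation_ns):
    """Map each run count to (per-run length in ns, long enough?).

    `relaxation_ns` is a hypothetical threshold: runs shorter than this
    are assumed unable to leave the region around the shared starting
    structure.
    """
    table = {}
    for n_runs in run_counts:
        length_ns = total_ns / n_runs
        table[n_runs] = (length_ns, length_ns >= relaxation_ns)
    return table


if __name__ == "__main__":
    splits = budget_splits(500.0, [1, 10, 100, 1000, 10000], relaxation_ns=1.0)
    for n_runs, (length_ns, long_enough) in splits.items():
        print(f"{n_runs:>6} x {length_ns:>8.2f} ns   long enough: {long_enough}")
```

With the illustrative 1 ns threshold, only the 1000x 0.5 ns and 10,000x 50 ps splits are flagged; where the real threshold lies is exactly the system-dependent sweet spot discussed next.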

I think most people would agree that the last one, possibly the last two,
would give you terrible sampling. So for multiple runs, where is the sweet
spot? That is the part that will be system dependent and a function of
what you are looking to calculate. That said, what one really wants to be
doing is running lots of simulations from an equilibrium set of starting
geometries. So what about running 500 ns, taking snapshots every 10 ns and
running each of these for another XXX ns? This may give better sampling
than starting everything from the same structure.
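A sketch of picking the restart frames for that protocol, assuming frames were written every `frame_interval_ps` and that frame 1 is the first frame written after one interval (so frame numbers are 1-based and there is no frame at t = 0). The 10 ps write interval in the example is an assumption, the helper name `seed_frames` is hypothetical, and the follow-on length ("XXX ns" above) is left open because it does not enter the calculation. Each restarted run would also get its own random seed for fresh velocities.

```python
def seed_frames(total_ns, snapshot_every_ns, frame_interval_ps):
    """1-based frame numbers of snapshots taken every `snapshot_every_ns`.

    Assumes frame k is written at k * frame_interval_ps and that the
    snapshot spacing is a whole number of frames.
    """
    frames_per_snapshot = round(snapshot_every_ns * 1000 / frame_interval_ps)
    total_frames = round(total_ns * 1000 / frame_interval_ps)
    n_snapshots = total_frames // frames_per_snapshot
    return [k * frames_per_snapshot for k in range(1, n_snapshots + 1)]


if __name__ == "__main__":
    frames = seed_frames(500, 10, 10)
    print(len(frames), frames[:3], frames[-1])
```

With these assumed settings the 500 ns run yields 50 starting structures, at frames 1000, 2000, ..., 50000.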

Ultimately, though, I think the underlying motivation for a reviewer to
ask for multiple simulations has nothing to do with questioning whether
your sampling was reasonable. They are just asking you to prove, beyond
reasonable doubt, that your conclusions are reliable and reproducible and
are not just based on some fortuitous one-off.

All the best
Ross

On 3/5/14, 7:58 AM, "Soumendranath Bhakat" <bhakatsoumendranath.gmail.com> wrote:

>Okay, say for example I have a ligand-bound protein. We demonstrated
>that a continuous MD run leads to a reasonable binding free energy close
>to experiment, or to statistically significant trends. This continuous
>MD simulation also identifies prominent motions related to
>conformation. Now some people say that running multiple MD simulations
>from an initial starting point with different random velocities and then
>combining all the trajectories will possibly cover more of the sampling
>space. So for a general protein-ligand complex, which leads to better
>overall sampling: a long continuous simulation or the multiple MD
>approach? If we go for conformational analysis, which technique will
>cover more of the sampling space, multiple MD or continuous MD?
>
>On 3/5/14, Brian Radak <radak004.umn.edu> wrote:
>> Back during my graduate preliminary exams I recall being (somewhat) gently
>> reminded that the validity of (nearly?) all statistical mechanical
>> estimators in use in MD analysis is predicated on the *assumption* of
>> ergodicity. That is, that the trajectory at hand is in fact really really
>> long and has therefore visited all *relevant* regions of phase space.
>>
>> Now I would argue that this depends on how one defines relevant, and that
>> this is the great advantage/disadvantage of simulations in general: the
>> complete control one has in defining the system/problem. The validity of
>> this definition will probably reduce to physical arguments based on
>> intuition and empirical knowledge of the problem at hand. Therefore, as
>> Carlos pointed out, which tools are appropriate and which compromises are
>> best is likely to always be a case-by-case challenge.
>>
>> Regards,
>> Brian
>>
>>
>> On Wed, Mar 5, 2014 at 9:46 AM, Carlos Simmerling <carlos.simmerling.gmail.com> wrote:
>>
>>> In my opinion this is like wondering whether one should do standard MD or
>>> free energy calculations, or explicit vs implicit solvent, or for that
>>> matter QM vs MM. Multiple MD and long continuous MD are just two different
>>> tools, and which one is the "right" tool depends completely on the problem
>>> you are trying to solve, and what sort of data it requires. The best answer
>>> is of course to do multiple very long MD, but I believe that the key to
>>> success in this area (or any other where the tools are not fully mature) is
>>> to recognize that compromises must often be made, and to carefully choose
>>> the ones that have the least impact on your specific goals for the project.
>>> For a reviewer to say that in all cases multiple short MD is better than
>>> long MD makes no sense to me. That being said, I am very skeptical of
>>> studies where there is no attempt to quantify precision.
>>> carlos
>>>
>>>
>>> On Wed, Mar 5, 2014 at 9:33 AM, Soumendranath Bhakat <bhakatsoumendranath.gmail.com> wrote:
>>>
>>> > Dear Amberists;
>>> >
>>> > We have reported long continuous MD simulations (50 ns) in many of our
>>> > research communications. But we observe that some journals and reviewers
>>> > are very critical of continuous MD simulations and ask for multiple MD
>>> > simulations.
>>> >
>>> > Recently, in a debate, many people put forward different views on
>>> > multiple MD simulations; in their view, multiple MD simulations do not
>>> > provide greater insight than continuous MD (50/100 ns sampling). Others
>>> > spoke in favor of multiple MD, saying that it covers a larger
>>> > conformational space.
>>> >
>>> > The majority of people agreed that long continuous MD with proper
>>> > post-dynamics analysis is enough to demonstrate most of the points
>>> > related to the motions of a biological system.
>>> >
>>> > As a continuous learner, my question to the AMBER community is: which
>>> > one is preferred, long continuous MD or the corresponding multiple MD
>>> > simulations?
>>> >
>>> > As there are numerous papers on continuous MD but very few multiple MD
>>> > papers on aspects like conformational analysis, which one is the best
>>> > to go with?
>>> >
>>> > Please give justification in support of your argument. We find that
>>> > some journals and reviewers always ask for multiple MD over continuous
>>> > MD simulation, but in most cases people accept long continuous MD.
>>> >
>>> > Thanks & Regards;
>>> > Soumendranath Bhakat
>>> > Co-Founder Open Source Drug Design and In Silico Molecules (
>>> > www.insilicomolecule.org)
>>> > UKZN, Durban
>>> > Past: Birla Institute of Technology,Mesra, India
>>> > _______________________________________________
>>> > AMBER mailing list
>>> > AMBER.ambermd.org
>>> > http://lists.ambermd.org/mailman/listinfo/amber
>>> >
>>>
>>
>>
>>
>> --
>> Brian Radak
>> PhD candidate - York Research Group
>> University of Minnesota - Twin Cities
>> Graduate Program in Chemical Physics
>> Department of Chemistry
>> radak004.umn.edu
>>
>> Current Address:
>> BioMaPS Institute for Quantitative Biology
>> Rutgers, The State University of New Jersey
>> Center for Integrative Proteomics Room 308
>> 174 Frelinghuysen Road, Piscataway, NJ 08854-8066
>> radakb.biomaps.rutgers.edu
>>
>> Sorry for the multiple e-mail addresses, just use the institute-appropriate
>> address.
>>
>
>
>--
>Thanks & Regards;
>Soumendranath Bhakat
>

Received on Wed Mar 05 2014 - 09:30:02 PST