Re: [AMBER] Multiple vs Continuous MD opinion

From: Carlos Simmerling <carlos.simmerling.gmail.com>
Date: Wed, 19 Mar 2014 10:50:58 -0400

I suppose I can comment on this since we've published a series of papers on
HIV-PR flap dynamics. As I said before, your comments seem to miss the fact
that the "correct" approach depends strongly on the type of data you want.
Are you looking to see just an instance of a rare event? In that case, a
large number of short simulations is going to work. What "large" and
"short" mean are connected, and this is the sort of thing that the
Folding@home people have addressed in detail (as Hai referenced). If you
had an infinite number of very short simulations, starting from the correct
ensemble of starting structures, then you would indeed see transitions.
Alternatively, do you want to look at rates, or populations, or data
related to averages? Do you want statistical confidence that the
probability of the events you observe is accurately related to a particular
timescale, or some changes related to sequence variation? These all impact
what type of simulation is "best". Clearly if one wants to learn about
thermodynamic binding affinity, one could do large numbers of simulations
(or very long ones), and try to partition the results into bound vs.
unbound states. However, statistical mechanics gives us alternate, more
efficient routes to these data by doing things like FEP calculations. I
would suggest that it might be helpful to follow up on Ross's email and do
some reading about the ergodic hypothesis and how it relates to using a
small number of simulations to compare to experiments that typically have
closer to Avogadro's number of molecules involved.
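
A minimal sketch of how "large" and "short" are connected, under a strong
simplifying assumption that is not taken from any specific system: the
waiting time for the event is exponentially distributed (single-rate
kinetics, no lag phase) and every run starts from the correct ensemble. The
rate and run lengths below are hypothetical placeholders. Under that
assumption only the aggregate simulated time matters:

```python
import math


def prob_at_least_one_event(rate_per_ns, n_runs, run_length_ns):
    """Probability of seeing at least one event across independent runs.

    Assumes an exponentially distributed waiting time with the given rate
    (an illustrative assumption, not a property of any real system).
    """
    total_time_ns = n_runs * run_length_ns
    return 1.0 - math.exp(-rate_per_ns * total_time_ns)


if __name__ == "__main__":
    # Hypothetical event with a 50 ns mean waiting time.
    rate = 1.0 / 50.0
    for n_runs, length_ns in [(1, 20.0), (4, 5.0), (100, 5.0)]:
        p = prob_at_least_one_event(rate, n_runs, length_ns)
        print(f"{n_runs:4d} x {length_ns:5.1f} ns -> P(at least one event) = {p:.3f}")
```

In this idealized model 1 x 20 ns and 4 x 5 ns give the same probability.
Real systems can deviate from it, and seeing one event says little about
rates or populations.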

In your email, it's not clear what you mean by "in continuous MD, we
always get an event". To me this seems true only in continuous MD of
infinite length. Certainly your "continuous" simulations and your
"multiple" simulations all have a finite length, and the dynamics that you
wish to explore also have a characteristic timescale. There is no real
separation between "continuous" and "multiple"; these are not 2 methods, in
my opinion. I believe that it is typically a good idea to do more than one
observation, as is usual with experiments. How you partition your
available computer resources into more simulations of shorter length vs.
fewer simulations of longer length depends on the needs of the project (and
often, on the specific computational resources that you have available, the
parallel scaling of the software, and so on). Based on that, I disagree
with your final statement "this further confirms...", because you have not
defined the relationship between the timescale of the dynamics of interest
and the length of your "long" or "multiple" MD trajectories, nor is
"multiple" well defined here.

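To illustrate why the run length has to be defined relative to the timescale
of the dynamics, here is a toy model (an assumption for illustration, not a
description of HIV-PR): an event cannot occur before a lag time, and after
that it occurs with a constant rate. The function names and all numbers are
hypothetical. A fixed budget is split into different numbers of equal-length
runs:

```python
import math


def p_event_single_run(run_length_ns, lag_ns, rate_per_ns):
    """P(event within one run) for a toy lag-plus-exponential model."""
    if run_length_ns <= lag_ns:
        return 0.0  # the run ends before an event is possible in this model
    return 1.0 - math.exp(-rate_per_ns * (run_length_ns - lag_ns))


def p_event_any_run(n_runs, run_length_ns, lag_ns, rate_per_ns):
    """P(at least one of n independent runs shows the event)."""
    p_single = p_event_single_run(run_length_ns, lag_ns, rate_per_ns)
    return 1.0 - (1.0 - p_single) ** n_runs


def compare_splits(budget_ns, run_counts, lag_ns, rate_per_ns):
    """Map each number of runs to P(at least one event) for a fixed budget."""
    return {
        n: p_event_any_run(n, budget_ns / n, lag_ns, rate_per_ns)
        for n in run_counts
    }


if __name__ == "__main__":
    # Hypothetical: 50 ns budget, 5 ns lag, 0.05 events/ns after the lag.
    for n, p in compare_splits(50.0, [1, 5, 10], 5.0, 0.05).items():
        print(f"{n:2d} x {50.0 / n:4.1f} ns -> P(at least one event) = {p:.3f}")
```

With no lag, every split of the budget gives the same probability in this
model. With a lag, runs shorter than the lag never show the event, however
many of them are run.
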
My point is that doing simulations requires compromises, since we can't do
infinite numbers of simulations, or infinitely long simulations to get
perfect thermodynamic ensembles. I don't believe that there is a
well-accepted simulation "method" that embodies the right compromise for
all cases. The questions that one asks about a system under study will
lead to a set of data that one would like to obtain to answer the
questions, and these data requirements lead to a simulation design that
will (hopefully) provide the data. Sometimes one can get comparable data
with a variety of methods, and the available resources will influence which
one you select for the project. Asking questions about which is the best
simulation method, or arguing about merits of various methods makes little
sense to me when done in the absence of a specific, clearly defined project
goal and an understanding of molecular behavior. In my opinion, success in
this field (and many others) depends very much on learning a variety of
tools, and applying the right tool in each case. It's not about looking for
which tool is "best".
just my opinion.
carlos


On Wed, Mar 19, 2014 at 3:30 AM, Soumendranath Bhakat <
bhakatsoumendranath.gmail.com> wrote:

> Okay, for example, I started a multiple MD from equil.rst with irest=0 to
> assign random initial velocities, running a 20 ns multiple MD divided into
> 4 x 5 ns. What is interesting is that, whether we take snapshots from the
> combined trajectories or from the individual trajectories, biological
> events like flap opening and closing, which have ample experimental
> support, are not reproduced in the multiple MD, while in continuous MD we
> always get an event, which is more obvious and experimentally validated.
> This further confirms (along with the very small number of papers on
> multiple MD) that using a long continuous MD is the best option to check
> the dynamic behaviour of a system (solvation box dynamics of a simple
> protein-ligand complex).
>
>
> On Tue, Mar 18, 2014 at 12:38 AM, Hai Nguyen <nhai.qn.gmail.com> wrote:
>
> > Hi,
> >
> > Multiple MD runs are widely used at Folding@home. The article below
> > discusses the probability of getting one event (such as one folding
> > event with a given MD length).
> >
> > Pande, Vijay S., et al. "Atomistic protein folding simulations on the
> > submillisecond time scale using worldwide distributed computing."
> > Biopolymers 68.1 (2003): 91-109.
> >
> > You can find more papers on their website
> > http://folding.stanford.edu/home/papers
> >
> > Hai Nguyen
> >
> > On Mon, Mar 17, 2014 at 2:40 AM, Soumendranath Bhakat
> > <bhakatsoumendranath.gmail.com> wrote:
> > > Dear Filip and Amberists;
> > >
> > > My opinion on this matter is as follows:
> > > 1. Whenever we go for a multiple MD, e.g. 5*100 or 100*5, we are never
> > > going to explore an event that happens on a longer nanosecond timeframe.
> > > Say, for example, that in HIV protease a flap opening and closing occurs
> > > after 50 ns of continuous MD. If we adopt a multiple MD approach, say
> > > 5*10 or 10*5 or whatever, it will never show a biological movement on a
> > > longer timescale, as in multiple MD all sub-MDs are independent.
> > > Phenomena such as flap opening and closing will never be possible in
> > > multiple MD, whereas in long continuous MD we can monitor certain very
> > > interesting biological events on a much longer timescale.
> > > 2. I agree that multiple MD with more sampling points, such as 5ns*10,
> > > will probably lead to more conclusive MMGBSA/MMPBSA scores than a long
> > > continuous MD.
> > >
> > > Let's hope that we might put together a solid article with substantial
> > > evidence to end this story of continuous vs. multiple MD.
> > > For me, to understand the dynamic behaviour of a biological system I
> > > always opt for continuous MD, but for binding free energy calculations
> > > I always opt for the multiple MD approach.
> > > Cheers!!
> > >
> > >
> > > On Sat, Mar 15, 2014 at 4:23 AM, Bill Ross <ross.cgl.ucsf.edu> wrote:
> > >
> > >> A tool I am wont to recommend is low-temperature vacuum dynamics - you
> > >> can simulate a long time to get a feel for the range of movement of
> > >> your system, possibly cherry-picking representative frames to solvate
> > >> and run in parallel.
> > >>
> > >> Bill
> > >>
> > >> Brian Radak <radak004.umn.edu> wrote:
> > >>
> > >> > Back during my graduate preliminary exams I recall being (somewhat)
> > >> > gently reminded that the validity of (nearly?) all statistical
> > >> > mechanical estimators in use in MD analysis is predicated on the
> > >> > *assumption* of ergodicity. That is, that the trajectory at hand is in
> > >> > fact really, really long and has therefore visited all *relevant*
> > >> > regions of phase space.
> > >> >
> > >> > Now I would argue that this depends on how one defines relevant, and
> > >> > that this is the great advantage/disadvantage of simulations in
> > >> > general: the complete control one has in defining the system/problem.
> > >> > The validity of this definition will probably reduce to physical
> > >> > arguments based on intuition and empirical knowledge of the problem at
> > >> > hand. Therefore, as Carlos pointed out, which tools are appropriate and
> > >> > which compromises are best is likely to always be a case-by-case
> > >> > challenge.
> > >> >
> > >> > Regards,
> > >> > Brian
> > >> >
> > >> >
> > >> > On Wed, Mar 5, 2014 at 9:46 AM, Carlos Simmerling <
> > >> > carlos.simmerling.gmail.com> wrote:
> > >> >
> > >> > > In my opinion this is like wondering whether one should do standard
> > >> > > MD or free energy calculations, or explicit vs implicit solvent, or
> > >> > > for that matter QM vs MM. Multiple MD and long continuous MD are just
> > >> > > two different tools, and which one is the "right" tool depends
> > >> > > completely on the problem you are trying to solve, and what sort of
> > >> > > data it requires. The best answer is of course to do multiple very
> > >> > > long MD, but I believe that the key to success in this area (or any
> > >> > > other where the tools are not fully mature) is to recognize that
> > >> > > compromises must often be made, and to carefully choose the ones that
> > >> > > have the least impact on your specific goals for the project. For a
> > >> > > reviewer to say that in all cases multiple short MD is better than
> > >> > > long MD makes no sense to me. That being said, I am very skeptical of
> > >> > > studies where there is no attempt to quantify precision.
> > >> > > carlos
> > >> > >
> > >> > >
> > >> > > On Wed, Mar 5, 2014 at 9:33 AM, Soumendranath Bhakat <
> > >> > > bhakatsoumendranath.gmail.com> wrote:
> > >> > >
> > >> > > > Dear Amberists;
> > >> > > >
> > >> > > > We have reported long-timescale continuous MD simulations (50 ns)
> > >> > > > in many of our research communications. But we observe that some
> > >> > > > journals and reviewers are very critical of continuous MD
> > >> > > > simulations and ask for multiple MD simulations.
> > >> > > >
> > >> > > > But recently in a debate many people put forward different views on
> > >> > > > multiple MD simulations, and in their view multiple MD simulation
> > >> > > > does not provide greater insight than continuous MD (50/100 ns
> > >> > > > sampling). Some people speak in favour of multiple MD, saying that
> > >> > > > it covers a large conformational space.
> > >> > > >
> > >> > > > The majority of people agreed that if you are doing long-timescale
> > >> > > > continuous MD and proper post-dynamics analysis, that is enough to
> > >> > > > demonstrate most points related to the motions of a biological
> > >> > > > system.
> > >> > > >
> > >> > > > As a continuous learner, my question to the AMBER community is:
> > >> > > > which one is preferred, a long-timescale continuous MD or the
> > >> > > > corresponding multiple MD simulation?
> > >> > > >
> > >> > > > There are numerous papers on continuous MD and only very few
> > >> > > > multiple MD papers on aspects like conformational analysis, so
> > >> > > > which one is the best to go with?
> > >> > > >
> > >> > > > Please give justification in support of your argument. In our
> > >> > > > experience, some journals and reviewers always ask for multiple MD
> > >> > > > over continuous MD simulation, but in most cases people accept
> > >> > > > long-timescale continuous MD.
> > >> > > >
> > >> > > > Thanks & Regards;
> > >> > > > Soumendranath Bhakat
> > >> > > > Co-Founder Open Source Drug Design and In Silico Molecules (
> > >> > > > www.insilicomolecule.org)
> > >> > > > UKZN, Durban
> > >> > > > Past: Birla Institute of Technology,Mesra, India
> > >> > > > _______________________________________________
> > >> > > > AMBER mailing list
> > >> > > > AMBER.ambermd.org
> > >> > > > http://lists.ambermd.org/mailman/listinfo/amber
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Brian Radak
> > >> > PhD candidate - York Research Group
> > >> > University of Minnesota - Twin Cities
> > >> > Graduate Program in Chemical Physics
> > >> > Department of Chemistry
> > >> > radak004.umn.edu
> > >> >
> > >> > Current Address:
> > >> > BioMaPS Institute for Quantitative Biology
> > >> > Rutgers, The State University of New Jersey
> > >> > Center for Integrative Proteomics Room 308
> > >> > 174 Frelinghuysen Road, Piscataway, NJ 08854-8066
> > >> > radakb.biomaps.rutgers.edu
> > >> >
> > >> > Sorry for the multiple e-mail addresses, just use the institute
> > >> > appropriate address.
> > >>
> > >>
> > >
> > >
> > >
> > > --
> > > Thanks & Regards;
> > > Soumendranath Bhakat
> >
> >
>
>
>
> --
> Thanks & Regards;
> Soumendranath Bhakat
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Mar 19 2014 - 08:00:03 PDT