Dear Amber developers,
I am dealing with some MMGBSA calculations and their statistical
uncertainties, and I came across a very useful and explanatory email
from J. Swails in the Amber archives.
However, I am wondering how the standard deviations of DeltaG_gas
and DeltaG_solv are handled by MMPBSA.py.
For instance, in one of my calculations I obtained the following results:
****************************************************************************************
Differences (Complex - Receptor - Ligand):
Energy Component            Average    Std. Dev.   Std. Err. of Mean
-------------------------------------------------------------------------------
VDWAALS                   -183.0991     11.5396              0.1632
EEL                      -1166.7511     55.0115              0.7779
EGB                       1213.1014     54.1049              0.7651
ESURF                      -25.3833      1.3550              0.0192

DELTA G gas              -1349.8502     62.1130              0.8783
DELTA G solv              1187.7181     53.0433              0.7501

DELTA G binding =         -162.1321 +/- 11.7599              0.1663
****************************************************************************************
If I understood correctly, the standard deviations of each contribution
(VDWAALS, EEL, EGB and ESURF) are worked out as Jason Swails explained.
But how, for instance, is the standard deviation of "DELTA G gas" derived?
Is it simply the standard deviation of the per-frame sum of VDWAALS and EEL?
And likewise, how is the standard deviation of "DELTA G binding" derived,
it being just the sum of DELTA G gas and DELTA G solv?
In this case the SD is drastically reduced: from ~62 and ~53 to ~12.
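[Editor's note: as a quick numerical sanity check, here is a sketch with synthetic per-frame data chosen to mimic the table above (this is not actual MMPBSA.py output or code). If the per-frame EEL and EGB values are strongly anticorrelated, the sample standard deviation of the per-frame total comes out far smaller than a naive uncorrelated combination of the component standard deviations, which is consistent with the drop from ~62/~53 to ~12:]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # hypothetical number of snapshots

# Synthetic per-frame components, loosely matching the table above.
# EGB is modeled as strongly anticorrelated with EEL, which is typical:
# stronger Coulomb attraction usually means a larger desolvation penalty.
eel = rng.normal(-1166.75, 55.0, n)
egb = -eel + rng.normal(46.35, 8.0, n)   # EGB ~ -EEL + small noise
vdw = rng.normal(-183.10, 11.5, n)
esurf = rng.normal(-25.38, 1.35, n)

binding = vdw + eel + egb + esurf

# Sample standard deviation of the per-frame sum: the EEL/EGB
# anticorrelation cancels frame by frame, leaving only the
# uncorrelated parts (here ~14, far below the ~55 of EEL alone).
print(np.std(binding, ddof=1))

# Naive uncorrelated propagation grossly overestimates it (here ~79).
naive = np.sqrt(np.std(eel, ddof=1)**2 + np.std(egb, ddof=1)**2
                + np.std(vdw, ddof=1)**2 + np.std(esurf, ddof=1)**2)
print(naive)
```

[So the ~12 figure would be consistent with the per-frame totals being summed first and the sample standard deviation taken afterwards.]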
Thanks a lot in advance.
Best regards,
From: Jason Swails <jason.swails.gmail.com>
Date: Tue, 14 Feb 2012 18:56:02 -0500
Hello,
I'll provide a rather detailed explanation of MM/PBSA statistics as I've
come to think about them. However, as my knowledge of MMPBSA
implementations in Amber is almost exclusively limited to the Python
version, I can't comment on implementation details of the perl version.
On Tue, Feb 14, 2012 at 4:14 AM, Urszula Uciechowska <
urszula.uciechowska.chem.umu.se> wrote:
>
> Dear Amber users,
>
> I ran MMPBSA calculations for a protein-peptide complex for 100 sps. The
> MD simulations were carried out for 10 ns; the complex was stable.
> Below I pasted the results that I received from the mmpbsa run. What I
> don't understand is this huge STD value. Why is that? Where does it come
> from?
> I have also checked the single sps and reran MMPBSA, but I have not
> seen such a big deviation.
> How should I understand PBTOT = -129.74 with STD = 75.51?
>
Standard deviations are difficult to calculate. The easiest way to
calculate them is to generate a sample of "DELTA" values, which is simple
to do for a single trajectory -- all you do is do "complex - receptor -
ligand" for each snapshot and take the standard deviation of that. This
perfectly accounts for the correlation between the complex, receptor, and
ligand data (which, being drawn from the same trajectory, are strongly
correlated). However, that correlation disappears once you use multiple
trajectories, since the three terms no longer come from the same snapshots.
With MMPBSA.py, we try to use the "sample standard deviation" approach when
we can (i.e. for a single trajectory), but fall back to assuming _no_
correlation and propagating the error as a series of uncorrelated, random
errors (so a square root of a sum of squares of the components). This is a
_maximum_ estimate of your standard deviation (assuming of course you don't
have more systematic error than random error), and will typically
drastically overestimate your standard deviations (but involves the
simplest coding).
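[Editor's note: the two approaches described above can be sketched as follows. The function names and the synthetic data are hypothetical illustrations, not MMPBSA.py internals:]

```python
import numpy as np

def delta_std_single_traj(com, rec, lig):
    """Single-trajectory case: per-frame deltas are available, so take
    the sample standard deviation of complex - receptor - ligand.
    This accounts for the correlation between the three data sets."""
    delta = np.asarray(com) - np.asarray(rec) - np.asarray(lig)
    return np.std(delta, ddof=1)

def delta_std_multi_traj(std_com, std_rec, std_lig):
    """Multi-trajectory fallback: assume no correlation and propagate
    the error as a square root of a sum of squares. This is a maximum
    estimate of the standard deviation (given purely random errors)."""
    return np.sqrt(std_com**2 + std_rec**2 + std_lig**2)

# With strongly correlated data, as in a single trajectory, the two
# estimates differ drastically:
rng = np.random.default_rng(1)
rec = rng.normal(-5000.0, 50.0, 2000)
lig = rng.normal(-300.0, 20.0, 2000)
com = rec + lig + rng.normal(-40.0, 5.0, 2000)  # complex tracks rec + lig

print(delta_std_single_traj(com, rec, lig))   # small, ~5
print(delta_std_multi_traj(np.std(com, ddof=1),
                           np.std(rec, ddof=1),
                           np.std(lig, ddof=1)))  # large, ~76
```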
I'm not sure what mm_pbsa.pl does when it calculates statistics, but it
appears to be drastically overestimating it here for the reason I indicate
above. The thing that makes me believe this is the case is the highlighted
line below:
> #        DELTA
> # -----------------------
> #     MEAN       STD
> # =======================
> ELE  -322.09    138.62
> VDW   -86.14     39.01
> INT     0.00     71.77
>
Your internal potential terms have an average value of 0 kcal/mol
difference between your complex and receptor/ligand out to 2 decimal
places(!). This complete cancellation is almost certainly due to the fact
that you are using a single trajectory, in which case _every_ snapshot
should have a value of exactly 0, leading to a standard deviation of 0 as
well, but your results clearly show a large standard deviation. This
indicates to me that your standard deviations are overestimated compared
with taking the sample standard deviation.
However, I'm of the opinion that MM/PBSA results using a single trajectory
are overly correlated (obviously, the ensembles do not overlap so
completely in reality), so a single trajectory will provide abnormally low
values for the standard deviation. It would be interesting to do more
thorough statistical analysis on the different methods of computing free
energies using MM/PB(GB)SA, and maybe others have done so in the
literature, but I'm not sure.
I'll also comment on what I think is a common misunderstanding of the
standard deviation. In addition to including random errors that take place
from measurement to measurement, standard deviations also reveal _natural_
fluctuations within a specific data set. Therefore, I would not expect the
standard deviation to fall just by drawing more snapshots from an
equilibrated simulation (note the term equilibrated). Unless there are
significant structural changes on the time scale of your entire simulation
that cause your data to shift during its course, I would expect that even
as few as 20 to 50 snapshots
are enough to give you a good estimate of what your final standard
deviation will be. Because fluctuations are natural to your system, the
standard deviation should NOT decrease past a certain point (that is, once
the standard deviation matches the natural fluctuations of a complete data
set).
If, on the other hand, you're looking for how confident you are that your
mean value is correct, then drawing more points should naturally improve
your confidence, so _this_ measure should decrease as more snapshots are
drawn. This value is called the standard error of the mean,
and it's related to the standard deviation:
http://en.wikipedia.org/wiki/Standard_error_(statistics). Basically what
it says is the more snapshots you draw, the more confident that you are
that you're sampling equally from both sides of the _true_ mean (which you
can never know). So consider carefully what value you really want -- the
standard deviation or the standard error of the mean (which is standard
deviation / sqrt(number data points) for ideal sampling).
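[Editor's note: a small synthetic illustration of the distinction. As more snapshots are drawn from a stationary distribution, the sample standard deviation levels off at the size of the natural fluctuations, while the standard error of the mean keeps shrinking roughly as 1/sqrt(N):]

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical per-frame delta G values from an equilibrated run,
# with natural fluctuations of ~75 kcal/mol.
population = rng.normal(-130.0, 75.0, 1_000_000)

for n in (20, 100, 1000, 10000):
    sample = population[:n]
    std = np.std(sample, ddof=1)
    sem = std / np.sqrt(n)  # standard error of the mean (ideal sampling)
    print(f"N={n:6d}  std={std:6.1f}  sem={sem:5.2f}")
# std fluctuates around 75 and does not trend downward;
# sem shrinks roughly as 1/sqrt(N)
```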
I think this issue deserves more attention than it is given.
But it's also a fairly difficult problem to analyze, so canned scripts
(e.g. mm_pbsa.pl and MMPBSA.py) kind of sweep it under the rug, since it's
easier that way.
MMPBSA.py will provide you with sample standard deviations for single
trajectories as well as standard errors of the mean (assuming ideal
sampling by using the equation std.dev/sqrt(num frames)), if you would like
to try that version as well. They both work by calling the same external
programs to get their data, with slightly different defaults, so the
results should be comparable for the averages (we checked this in the
initial implementation).
And so ends my saga. I hope it was (or will be) helpful to someone,
somewhere.
All the best,
Jason
--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
--
Dr Massimiliano Porrini
Institute for Condensed Matter and Complex Systems
School of Physics & Astronomy
The University of Edinburgh
James Clerk Maxwell Building
The King's Buildings
Mayfield Road
Edinburgh EH9 3JZ
Tel +44-(0)131-650-5229
E-mails : M.Porrini.ed.ac.uk
mozz76.gmail.com
maxp.iesl.forth.gr
Received on Mon Jul 09 2012 - 07:00:03 PDT