
From: Jesper Sørensen <lists.jsx.dk>

Date: Tue, 5 Apr 2011 19:06:58 +0200

Hi Jason,

Thanks a lot for the answer - it is all much clearer to me now.

Regarding the standard error, I have read that you can only use it if
your data points are uncorrelated; otherwise you have to do
block averaging, and then you can only divide by the square root of the new
number of data points. The test for this is called the statistical inefficiency.
Have you played around with this? I have a huge data set, but I want to make
sure the data are uncorrelated - otherwise the standard error of the mean may get
unphysically low. I guess otherwise you could just sample a lot, or more
frequently (to infinity), and report really small values (going to 0)...
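The block averaging described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `block_averaged_sem`, the block size, and the synthetic data are made up for the example and are not part of MMPBSA.py.

```python
import numpy as np

def block_averaged_sem(data, block_size):
    """Standard error of the mean estimated from non-overlapping block averages.

    Correlated samples are grouped into blocks; once the block size exceeds
    the correlation time, the block means are approximately independent and
    the usual SEM formula applies to them.
    """
    n_blocks = data.size // block_size
    trimmed = data[: n_blocks * block_size]
    block_means = trimmed.reshape(n_blocks, block_size).mean(axis=1)
    # Divide by the square root of the number of *blocks*, not of raw points.
    return block_means.std(ddof=1) / np.sqrt(n_blocks)

# Demo: 100 independent values, each repeated 100 times -> a strongly
# correlated series of 10,000 points.
rng = np.random.default_rng(0)
series = np.repeat(rng.normal(size=100), 100)

naive_sem = series.std(ddof=1) / np.sqrt(series.size)     # too small: treats all points as independent
blocked_sem = block_averaged_sem(series, block_size=100)  # honest: ~10x larger for this data
```

In practice one repeats this for increasing block sizes; the estimate plateaus once the blocks are longer than the correlation time, and the plateau value is the SEM to report.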

Best,

Jesper

-----Original Message-----

From: Jason Swails [mailto:jason.swails.gmail.com]

Sent: 5. april 2011 17:41

To: AMBER Mailing List

Subject: Re: [AMBER] MMPBSA.py ala-scanning output results

2011/4/5 Jesper Sørensen <lists.jsx.dk>

> Hi Jason,
>
> Thanks for the reply. I understand how MMPBSA.py comes up with that
> number now - but I am not sure that it makes sense to me why it is
> calculated that way. Isn't it more reasonable to calculate the
> standard deviation off of the column with the ddG per frame? I mean
> shouldn't that take advantage of cancellation of errors? And in any
> case from an ALA-scanning with a 1-trajectory approach, wouldn't you
> expect the numbers to be correlated?
>

Yes, you would. It was done that way for ease of implementation, but
I'll consider changing it for an upcoming release. You can certainly
use the smaller number as long as you report how it was calculated.

> The two ways of looking at it give huge differences, so I am just
> curious which number to use for publication purposes. I don't recall
> seeing people having reported numbers like the first below, but on the
> other hand if that is the more correct number then that is the one I
> should use...
> Is there a problem using the standard error of the mean instead of the
> standard deviation, since it is not also reported?
>

Be careful with the two numbers -- they mean different things. The standard

deviation represents the natural fluctuations of that variable that you

would expect in any data set. You will always get fluctuations in this kind

of data set. The standard error of the mean, on the other hand, is more of

a measure of your error. For an infinite data set, your standard deviation

will remain finite (in fact, probably close to what you've calculated now),

but your standard error of the mean will drop to 0 (since you have *all* of

the data and there can be no uncertainty as to what the true mean is).
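This behavior is easy to check numerically. The sketch below uses synthetic data; the mean 0.4 and spread 0.94 are chosen only to mimic the numbers quoted in this thread, not taken from any real run.

```python
import numpy as np

rng = np.random.default_rng(42)

results = {}
for n in (100, 10_000, 1_000_000):
    samples = rng.normal(loc=0.4, scale=0.94, size=n)  # synthetic per-frame ddG-like data
    std = samples.std(ddof=1)   # stays near the true spread (~0.94) however large n gets
    sem = std / np.sqrt(n)      # shrinks toward 0 as n grows
    results[n] = (std, sem)
    print(f"n={n:>9,}: std={std:.3f}  sem={sem:.5f}")
```

The standard deviation is a property of the distribution and converges to a finite value; the standard error of the mean measures uncertainty in the mean and falls off as 1/sqrt(n).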

Hope this helps,

Jason

> 0.3974 +/- 24.2708
> 0.397356 +/- 0.937653
>
> Best,
> Jesper
>

> -----Original Message-----
> From: Jason Swails [mailto:jason.swails.gmail.com]
> Sent: 4. april 2011 18:05
> To: AMBER Mailing List
> Subject: Re: [AMBER] MMPBSA.py ala-scanning output results
>
> 2011/4/4 Jesper Sørensen <lists.jsx.dk>
>

> > Hi,
> >
> > I have a question about the output produced by the MMPBSA.py script
> > from an alanine scanning.
> >
> > I think the value of the standard error is wrong - mostly because it
> > is really high.
> >
> > RESULT OF ALANINE SCANNING: (L12A MUTANT:) DELTA DELTA G binding =
> > 0.3974 +/- 24.2708
> >
> > I have taken the output files and run them through my own MATLAB script.
> > If I subtract the two columns with the "DELTA G binding" numbers I get:
> >
> > AVG -0.397356
> > STD 0.937653
> >
*

> This standard deviation appears (?) to be a population standard
> deviation; i.e. you have a list of numbers and calculate the stdev.
> MMPBSA.py calculates this standard deviation in a propagation of
> errors type of way (sqrt(std1**2 + std2**2 + ...)). The first
> approach takes into account all of the correlation in the data
> (which lowers the variance), whereas the latter approach assumes
> no correlation.
>
> Hope this helps,
> Jason
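The gap between the two estimates can be reproduced with synthetic data. In the sketch below, the two columns are fabricated stand-ins for the per-frame DELTA G values, built with a large shared fluctuation that cancels in the per-frame difference; none of the numbers come from an actual MMPBSA.py run.

```python
import numpy as np

rng = np.random.default_rng(7)
n_frames = 5000

common = rng.normal(scale=24.0, size=n_frames)                 # shared frame-to-frame fluctuation
dg_wt = common + rng.normal(scale=0.7, size=n_frames)          # per-frame DELTA G, wild type (fabricated)
dg_mut = common + rng.normal(scale=0.7, size=n_frames) + 0.4   # per-frame DELTA G, mutant (fabricated)

ddg = dg_mut - dg_wt  # per-frame DELTA DELTA G: the shared fluctuation cancels

std_direct = ddg.std(ddof=1)  # small (~1): accounts for the correlation
std_prop = np.sqrt(dg_wt.std(ddof=1) ** 2 + dg_mut.std(ddof=1) ** 2)  # large (~34): assumes no correlation

print(f"direct stdev of per-frame ddG: {std_direct:.3f}")
print(f"propagation-of-errors stdev:   {std_prop:.3f}")
```

With these (invented) parameters the propagation-of-errors estimate is dominated by the shared fluctuation, mirroring the ~24 vs ~0.94 discrepancy discussed in the thread.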

>
> > The average is the same, but the STD is very different. Does anybody
> > have a comment on this?
> > I have performed this calculation for a lot of residues in my complex
> > and they all produce +/- around 24 to 25, no matter the value of the
> > ddG.
> >
> > Best regards,
> > Jesper
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Candidate
> 352-392-4032

Received on Tue Apr 05 2011 - 10:30:04 PDT