
From: Jason Swails <jason.swails.gmail.com>

Date: Mon, 9 Mar 2015 08:52:07 -0400

On Mon, Mar 9, 2015 at 5:14 AM, abdennour braka <abdennour.braka.univ-orleans.fr> wrote:

> Dear Amber users,
>
> I would like to produce a PCA vectors representation such as in Tutorial
> 22, SECTION 3. Could you explain to me how I can add the energy landscape
> to the PCA vectors representation?
>
> Here is my cpptraj input script:
>

You need to process the PCA projections for each snapshot (which will be

in pca12-ca). The basic synopsis of what they did was to generate a set of

weights for each point (in order to 'reweight' those points back to their

standard, unbiased Boltzmann distribution), then do a 2-dimensional

(weighted) histogram in order to get 'populations' in every bin of the

2-dimensional surface (the two dimensions being the first two principal

components).

Then they just used standard practice to convert that 2-D histogram of

populations into a free energy -- they found the most populated bin and

divided every bin by the population of that bin, converting the populations

to probabilities relative to the most populous bin. Then they took

-kT*ln(bin) for every bin to convert those probabilities into relative free

energies (the act of dividing by the largest bin population makes the

minimum free energy on the surface 0).
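In numpy terms, that conversion might look something like the following (the 2x2 histogram here is just a toy stand-in for your real binned populations, and kT assumes 300 K in kcal/mol):

```python
import numpy as np

kT = 0.0019872041 * 300.0  # Boltzmann constant in kcal/(mol*K) times 300 K

# Toy 2-D histogram of (weighted) bin populations
populations = np.array([[10.0, 40.0],
                        [ 5.0, 20.0]])

# Probabilities relative to the most populated bin
probabilities = populations / populations.max()

# Relative free energy; the most populated bin becomes exactly 0
free_energy = -kT * np.log(probabilities)
```

Dividing by the largest bin first means the argument of the log is at most 1, so every free energy is non-negative and the global minimum of the surface sits at 0.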

They show two ways to do that -- one of which involves a kernel density

estimator, which is a bin-free alternative to representing a density of

states (the bin being replaced by the 'bandwidth' of the kernel).
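A weighted kernel density estimate might be sketched like this (the random projections and weights are placeholders for your real data; note the weights keyword of scipy.stats.gaussian_kde requires SciPy >= 1.2, which is newer than what was current when this was written):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Placeholder PC1/PC2 projections, shape (2, n_frames), and per-frame weights
pc = rng.normal(size=(2, 500))
weights = rng.uniform(0.5, 1.5, size=500)

# Weighted Gaussian KDE over the (PC1, PC2) plane
kde = gaussian_kde(pc, weights=weights)

# Evaluate the density on a 50x50 grid of (PC1, PC2) points
xs, ys = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)
```

The bandwidth (settable via the bw_method argument) plays the role that the bin width plays for a histogram.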

They *do* walk through the process of reweighting (and provide a useful

link to further explain and provide the scripts). You then have to use

these weights and take the data to generate a 2-D histogram from which you

can compute the free energy surface. I would use the histogram2d function
in numpy to do this, since it accepts an optional set of weights when
generating the density:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html

So the general workflow is:

1. Compute the PCA eigenvectors

2. Generate the time series of projections along the first two eigenvectors

3. Compute the time series of (re)weighting factors for each frame (which

should match up with the projections you calculated in 2)

4. Read the projections from (2) and the weights from (3) into numpy arrays

5. Call numpy.histogram2d with the two 1-D projection arrays and the 1-D
array of weights you read in (4)

6. Divide each bin by the value of the *largest* bin.

7. Compute -kT*ln(bin) for all bins.

8. Plot the result.
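Steps 4-8 might look something like this in numpy/matplotlib (the random arrays below are placeholders for the projections and weights you would np.loadtxt() from the cpptraj and reweighting output files, and kT assumes 300 K in kcal/mol):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render to file, no display needed
import matplotlib.pyplot as plt

kT = 0.0019872041 * 300.0  # kcal/mol at 300 K

# Placeholders: in practice, load the per-frame PC1/PC2 projections and
# the per-frame reweighting factors from your output files instead.
rng = np.random.default_rng(1)
pc1 = rng.normal(size=1000)
pc2 = rng.normal(size=1000)
weights = rng.uniform(0.5, 1.5, size=1000)

# Step 5: weighted 2-D histogram -> 'population' of every (PC1, PC2) bin
hist, xedges, yedges = np.histogram2d(pc1, pc2, bins=40, weights=weights)

# Steps 6-7: populations -> relative probabilities -> relative free energy
with np.errstate(divide='ignore'):
    free_energy = -kT * np.log(hist / hist.max())
free_energy[np.isinf(free_energy)] = np.nan  # leave empty bins blank

# Step 8: plot; histogram2d puts x along axis 0, hence the transpose
plt.pcolormesh(xedges, yedges, free_energy.T, shading='auto')
plt.colorbar(label='Free energy (kcal/mol)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.savefig('free_energy_surface.png')
```

Empty bins have zero population, so their free energy would be infinite; setting them to NaN just leaves them blank in the plot.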

This is fairly involved, so it might take some time to really understand

what is happening and what you are doing. But try to make sense of it at

each step. And hopefully by the end of it, you'll be able to explain the

process to somebody else much better than I explained it to you ;).

Of course, you don't *have* to use numpy and Python to process the data --

you could use R, Perl, Ruby, probably even awk or a spreadsheet like

Excel. I just picked Python because it's what I'm familiar with.

HTH,

Jason


--
Jason M. Swails
BioMaPS, Rutgers University
Postdoctoral Researcher

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Mon Mar 09 2015 - 06:00:04 PDT
