# Re: [AMBER] PCA_representation_in_Cpptraj

From: Jason Swails <jason.swails.gmail.com>
Date: Mon, 9 Mar 2015 08:52:07 -0400

On Mon, Mar 9, 2015 at 5:14 AM, abdennour braka <
abdennour.braka.univ-orleans.fr> wrote:

> Dear Amber users,
> I would like to produce a PCA vector representation such as the one in
> *Tutorial 22, Section 3*.
> Could you explain to me how I can add the energy landscape to the PCA
> vector representation?
> Here is my cpptraj input script:
>

You need to process the PCA projections for each snapshot (which will be
in pca12-ca). The basic synopsis of what they did was to generate a set of
weights for each point (in order to 'reweight' those points back to their
standard, unbiased Boltzmann distribution), then do a 2-dimensional
(weighted) histogram in order to get 'populations' in every bin of the
2-dimensional surface (the two dimensions being the first two principal
components).

Then they just used standard practice to convert that 2-D histogram of
populations into a free energy -- they found the most populated bin and
divided every bin by the population of that bin, converting the populations
to probabilities relative to the most populous bin. Then they took
-kT*ln(bin) for every bin to convert those probabilities into relative free
energies (the act of dividing by the largest bin population makes the
minimum free energy on the surface 0).
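As a minimal sketch of that conversion (the population counts here are made up for illustration, and kT is taken near 300 K in kcal/mol; use the value that matches your simulation):

```python
import numpy as np

kT = 0.5961  # kB*T in kcal/mol at ~300 K; adjust for your temperature

# hypothetical 2-D population histogram (counts per bin)
pop = np.array([[10.0, 50.0],
                [100.0, 40.0]])

# probabilities relative to the most populated bin
prob = pop / pop.max()

# relative free energy; the most populated bin becomes 0
free_energy = -kT * np.log(prob)
```

Dividing by the largest bin before taking the log is what pins the surface minimum at zero.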

They show two ways to do that -- one of which involves a kernel density
estimator, which is a bin-free alternative to representing a density of
states (the bin being replaced by the 'bandwidth' of the kernel).
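A weighted kernel density estimate can be sketched with scipy's gaussian_kde (its `weights` argument requires SciPy >= 1.2); the projections and weights below are synthetic stand-ins for the cpptraj output:

```python
import numpy as np
from scipy.stats import gaussian_kde

kT = 0.5961  # kcal/mol near 300 K

# synthetic stand-ins for the PC1/PC2 projections and per-frame weights
rng = np.random.default_rng(0)
pc1 = rng.normal(size=500)
pc2 = rng.normal(size=500)
weights = rng.uniform(0.5, 1.5, size=500)

# weighted KDE over the first two principal components
kde = gaussian_kde(np.vstack([pc1, pc2]), weights=weights)

# evaluate the density on a grid and convert to a free-energy surface
x, y = np.mgrid[-3:3:50j, -3:3:50j]
density = kde(np.vstack([x.ravel(), y.ravel()])).reshape(x.shape)
free_energy = -kT * np.log(density / density.max())
```

The kernel bandwidth plays the role of the bin width, so the surface comes out smooth rather than blocky.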

They *do* walk through the process of reweighting (and provide a useful
link to further explain and provide the scripts). You then have to use
these weights and take the data to generate a 2-D histogram from which you
can compute the free energy surface. I would use the histogram2d function
in numpy to do this, since it takes a set of optional weights to generate
the density:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html.

So the general workflow is:

1. Compute the PCA eigenvectors
2. Generate the time series of projections along the first two eigenvectors
3. Compute the time series of (re)weighting factors for each frame (which
should match up with the projections you calculated in 2)
4. Read the projections from (2) and the weights from (3) into numpy arrays
5. Call numpy.histogram2d with the 2-D array of projections and the 1-D
array of weights you read in step (4)
6. Divide each bin by the value of the *largest* bin.
7. Compute -kT*ln(bin) for all bins.
8. Plot the result.
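Steps 4 through 8 can be sketched as follows; synthetic arrays stand in for the projection and weight files (in practice you would read those with np.loadtxt), and kT is again assumed near 300 K:

```python
import numpy as np

kT = 0.5961  # kcal/mol at ~300 K; match your simulation temperature

# Step 4: in practice, load the projections and reweighting factors
# with np.loadtxt; synthetic data is used here so the sketch runs.
rng = np.random.default_rng(1)
pc1 = rng.normal(size=2000)
pc2 = rng.normal(size=2000)
weights = np.exp(rng.normal(scale=0.1, size=2000))  # per-frame weights

# Step 5: weighted 2-D histogram over the first two principal components
hist, xedges, yedges = np.histogram2d(pc1, pc2, bins=40, weights=weights)

# Steps 6-7: divide by the largest bin and take -kT*ln; empty bins
# become +inf, which plotting libraries typically render as blank.
with np.errstate(divide='ignore'):
    free_energy = -kT * np.log(hist / hist.max())

# Step 8: plot, e.g. with matplotlib:
# import matplotlib.pyplot as plt
# plt.pcolormesh(xedges, yedges, free_energy.T)
# plt.colorbar(label='Free energy (kcal/mol)')
# plt.show()
```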

This is fairly involved, so it might take some time to really understand
what is happening and what you are doing. But try to make sense of it at
each step. And hopefully by the end of it, you'll be able to explain the
process to somebody else much better than I explained it to you ;).

Of course, you don't *have* to use numpy and Python to process the data --
you could use R, Perl, Ruby, probably even awk or a spreadsheet like
Excel. I just picked Python because it's what I'm familiar with.

HTH,
Jason

--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Mar 09 2015 - 06:00:04 PDT