Re: [AMBER] PCA_representation_in_Cpptraj from Jason Swails on 2015-03-11 (Amber Archive Mar 2015)

From: Jason Swails <jason.swails.gmail.com>
Date: Wed, 11 Mar 2015 10:30:22 -0400

On Wed, 2015-03-11 at 12:52 +0000, Juan Eiros Zamora wrote:
> Hi,
>
> Thanks for your corrections on my script. I have one additional
> question, when you say
> > Jason explained all of this quite well. I will only add that as an
> > alternative to 'hist' in one dimension, you can create 1D histograms
> > in cpptraj with a Gaussian kernel density estimator using the 'kde'
> > command.
> >
> > Hope this helps,
> >
> > -Dan
> >
> Is there any way I can use the KDE to calculate the Free Energy in the
> PCA 2D plot? Looking in the Amber manual I've found that the only option
> for the hist command is with bins. Or a way where I could do the 1D
> histogram with kde command and then transfer it to the PCA projection
> file? I think it would be a better idea than selecting myself the bin
> size, as Jason mentioned that it is a better solution for this problem.

As far as I can tell (at least according to the cpptraj help), the
kernel density estimator (kde) only works on 1-dimensional data,
presumably because the kernel is only defined as a 1-dimensional
Gaussian function. In order to do a 2-D PMF with KDEs, you need a 2-D
KDE so you can map the density due to each point onto a grid (i.e., you
need the kernel to have "width" in every dimension of the space you are
histogramming).
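
To make the "width in every dimension" idea concrete, here is a minimal
sketch in plain Python. It is not what cpptraj does; it assumes a
product of two 1-D Gaussians as the 2-D kernel, and the function name,
the bandwidths hx and hy, the grid, and the 300 K default are all
choices made up for illustration.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def kde2d_free_energy(xs, ys, grid_x, grid_y, hx, hy, temperature=300.0):
    """Return F[i][j] in kcal/mol at the grid point (grid_x[i], grid_y[j]).

    Hypothetical helper: every sample (x, y) spreads its density over the
    grid through a product of two 1-D Gaussians of widths hx and hy.
    """
    n = len(xs)
    norm = 1.0 / (n * 2.0 * math.pi * hx * hy)
    density = []
    for gx in grid_x:
        row = []
        for gy in grid_y:
            total = 0.0
            for x, y in zip(xs, ys):
                dx = (gx - x) / hx
                dy = (gy - y) / hy
                total += math.exp(-0.5 * (dx * dx + dy * dy))
            row.append(norm * total)
        density.append(row)

    # F = -kT ln(p / p_max), so the most populated grid point sits at zero.
    p_max = max(max(row) for row in density)
    kt = KB_KCAL * temperature
    return [
        [-kt * math.log(p / p_max) if p > 0.0 else math.inf for p in row]
        for row in density
    ]
```

Here xs and ys would be the projections of each frame onto the first
and second principal components. The triple loop is slow for long
trajectories; it is only meant to show where the second kernel width
enters.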

There are other packages that provide a more flexible KDE framework that
allows multivariate kernels. Among them are R and scipy.stats.kde (the
gaussian_kde class) for Python. If you're a Python enthusiast, you can
use the pytraj extension to cpptraj (see https://github.com/pytraj/pytraj),
which is going to ship with the next release of AmberTools, to compute
the PCA vectors and load their projections into a numpy array that you
can feed to the kernel density estimator :). Very neat.

In this case, cpptraj powers the calculation of the covariance matrix,
diagonalizes it, gets the eigenvectors, projects the frames onto them,
and hands you back the projections as a data set. Then you can do with
that numpy array what you wish (e.g., feed it to scipy.stats.kde).
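
As a rough sketch of that last step, something like the following
should work once you have the projections in hand. It assumes the
projections are already in a numpy array of shape (2, n_frames), with
PC1 in the first row and PC2 in the second; how you get that array out
of pytraj is not shown. The function name, the grid size, and the 300 K
default are my own choices for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def pmf_from_projections(proj, temperature=300.0, n_grid=50):
    """Estimate a 2-D PMF (kcal/mol) from PC1/PC2 projections.

    Hypothetical helper; proj is assumed to have shape (2, n_frames).
    """
    proj = np.asarray(proj, dtype=float)

    # gaussian_kde expects (n_dimensions, n_points); with no bw_method
    # given it picks the bandwidth by Scott's rule.
    kde = gaussian_kde(proj)

    # Evaluate the density on a regular grid spanning the sampled range.
    x = np.linspace(proj[0].min(), proj[0].max(), n_grid)
    y = np.linspace(proj[1].min(), proj[1].max(), n_grid)
    xx, yy = np.meshgrid(x, y)
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

    # F = -kT ln(p), shifted so the global minimum is zero.
    pmf = -KB_KCAL * temperature * np.log(density)
    pmf -= pmf.min()
    return x, y, pmf
```

The returned pmf has shape (n_grid, n_grid), with rows following y and
columns following x (the default numpy meshgrid layout), so it can go
straight into a contour plot as contourf(x, y, pmf).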

As Dan said, though, with enough data, binning typically is not that
bad.

HTH,
Jason

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Wed Mar 11 2015 - 08:00:05 PDT