Re: [AMBER] PCA

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Fri, 8 Apr 2016 10:34:11 -0600

On Thu, Apr 7, 2016 at 12:59 PM, Sofia Vasilakaki <svasilak.chem.uoa.gr> wrote:
> I am using a common script for this:
> matrix covar name covmat out covmat.dat :1-704&!.H=
> diagmatrix covmat out evecs.dat vecs 57
> projection modes evecs.dat out pca.dat beg 1 end 57 :1-704&!.H=

I don't think this will do what you want. Projecting your coordinates
along eigenvectors should be run *after* the matrix has been generated
and eigenvectors calculated. Also, be aware that since you are not
rms-fitting your coordinates prior to generating the matrix your
resulting eigenmodes will include global rotational/translational
motion as well (unless the input trajectory has been fit previously).
See supporting info in the following articles for examples of
performing PCA (including projection) with cpptraj:

http://pubs.acs.org/doi/abs/10.1021/ct400862k
http://pubs.acs.org/doi/abs/10.1021/jp4125099

> 1. When the diagmatrix is performing the following warning appears. What
> does this mean? How the eigenvectors are generated from the covariance
> matrix?
>
> Warning: In matrix 'covmat', # of frames 1534 is less than # of columns
> 16998.
> Warning: The max # of non-zero eigenvalues will be 1534

In order to fully populate a covariance matrix you need at least as
many observations as the number of columns in your matrix; in your
case you have far fewer frames than columns, which is why cpptraj is
giving the warning. If you don't have a lot of frames consider
increasing the length of your simulation or using fewer atoms (e.g.
only CA atoms).

> 2.Is there any rule for choosing the number of vecs in the diagmatrix
> command? Should it be a multiple of 3 (x,y,z)?

This depends on how many eigenvectors you're interested in, but the
max for a Cartesian covariance matrix will be 3 times the number of
atoms in the mask used to generate the matrix.

> 3.I used "hist pca.dat:2 bins 200 .." which run smoothly while "hist
> pca.dat:2,*,*,*,200 out .." returned
> Warning: Data set 'pca.dat:2' not found.
> Error: Dataset pca.dat:2 not found.
> Error: Could not setup analysis [hist]
>
> Why? I used "readdata pca.dat" before "hist".

Difficult to say without seeing your full input/output. What version
of cpptraj are you using?

> 4.I am trying to plot gnu files which I got from "hist" and "kde" command,

The gnuplot output from cpptraj is really only useful for 2 or more
dimensions (or viewing multiple 1D plots together). Try grace or
standard column (.dat) output instead.

> but the graphs I am getting with splot are not right (a line appears in
> the middle of the graph). Could you please explain the format of the gnu
> files and maybe provide me with correct gnuplot command to plot them. For

You'll have to consult gnuplot documentation for this info.

> 5.How you calculate the % of total motion of one mode? Is there any way to
> calculate an entropy estimation for the top PCs?

The 'modes' analysis command with the 'eigenval' keyword may give you
what you want. See the Amber 15 manual for more details.

-Dan

-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Apr 08 2016 - 10:00:03 PDT
Custom Search