# Re: [AMBER] some questions about PCA

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Thu, 5 Sep 2013 10:25:46 -0600

Hi,

On Mon, Sep 2, 2013 at 8:55 PM, qiao xue <xueqiaoup.gmail.com> wrote:
> Dear Amber Users:
> I have some problems when performing PCA. The following is the script:
>
> trajin md1.mdcrd
> rms previous @CA,C,N time 2
> matrix covar @CA name cvmat out cvmat.dat
> analyze matrix cvmat name evecs out evecs.dat vecs 25
> projection modes evecs @CA out project.dat beg 1 end 2

Be aware that in a normal batch ptraj/cpptraj run all analysis happens
after the initial pass through trajectories is complete, so your
'projection' command will actually be processed *before* the 'analyze
matrix' command. You probably want to do the 'projection' in a
separate run using 'evecs.dat' generated with the 'analyze matrix'
command (note that in cpptraj you actually can do this in one run via
COORDS data sets).

> The first question is: How could I know the total variance? I know
> the largest eigenvalue, but I do not know the total variance. So I
> could not get the ratio. And I do not know whether PC1 and PC2 can
> explain the motion of the system.

If you want to know how much each eigenvector contributes to the total
fluctuation, you should calculate and print out all the eigenmodes
from the diagonalization of your coordinate covariance matrix (i.e.
all 3N modes, where N is the number of atoms selected by the mask '@CA'),
not just the first 25. Then you can sum all of the eigenvalues and
calculate the fraction each one contributes.
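As a rough illustration, assuming the eigenvalues have already been read from the 'analyze matrix' output into an array (parsing is not shown, and the helper name `variance_fractions` is hypothetical), each fraction is just that eigenvalue divided by the sum of all of them. Since the trace of a matrix equals the sum of its eigenvalues, the total variance is also simply the trace of the covariance matrix, which makes a handy cross-check.

```python
import numpy as np


def variance_fractions(eigenvalues):
    """Return (fraction, cumulative fraction) of total variance per mode.

    `eigenvalues` should hold ALL 3N eigenvalues of the covariance matrix;
    passing only the first few modes would overestimate every fraction.
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    total = ev.sum()                  # total variance = sum of all eigenvalues
    frac = ev / total
    return frac, np.cumsum(frac)


# Toy check with a random symmetric positive semi-definite "covariance" matrix
# (12 = 3N for a hypothetical N = 4 atoms).
rng = np.random.default_rng(0)
a = rng.normal(size=(12, 12))
cov = a @ a.T
evals = np.linalg.eigvalsh(cov)
frac, cum = variance_fractions(evals)
print("PC1 + PC2 explain %.1f%% of the variance" % (100 * cum[1]))
print("trace:", np.trace(cov), "sum of eigenvalues:", evals.sum())
```

With real data, `cum[1]` is the fraction of the total fluctuation captured by PC1 and PC2 together.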

> The second question is: When I got the projection data, I found
> that the data differ greatly from the tutorials and literature.
> In tutorials, when plotting the data in a scatter diagram (X axis: PC1;
> Y axis: PC2), the ranges of the x and y axes are -2 nm to 2 nm. However,
> in my scatter diagram, the data range from -300 to 300.

First, there is no reason that the range of your projections should
match those from calculations performed on different systems with
different numbers of frames, etc. It's difficult to say whether the
results are reasonable without knowing the size of your system, the
number of frames you are using, etc.

> And I do not know the unit of measurement. How could I get the correct

The units of a projection are Angstroms.
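To see why, here is a minimal numpy sketch of a coordinate projection, assuming a plain (not mass-weighted) covariance matrix built from already-fitted coordinates; it is not cpptraj's own implementation, and the random toy trajectory is hypothetical. Each eigenvector is a dimensionless unit vector, so the projection keeps the units of the input coordinates (Angstroms for Amber trajectories). The variance of the projection onto a mode equals that mode's eigenvalue, so sqrt(eigenvalue) gives the expected spread of that PC in Angstroms, a quick way to judge whether a given range is plausible. Values reported in nm need a factor of 10 (1 nm = 10 Angstroms) before comparing.

```python
import numpy as np


def pca_project(coords, n_modes=2):
    """Project fitted coordinates onto the top `n_modes` covariance eigenvectors.

    coords: array of shape (n_frames, n_atoms, 3), assumed already RMS-fitted
    and in Angstroms. Returns (projections, eigenvalues), largest mode first.
    """
    x = coords.reshape(len(coords), -1)         # (frames, 3N)
    dx = x - x.mean(axis=0)                     # fluctuations about the average
    cov = dx.T @ dx / len(x)                    # 3N x 3N covariance, Angstrom^2
    evals, evecs = np.linalg.eigh(cov)          # eigh returns ascending order
    order = np.argsort(evals)[::-1][:n_modes]   # pick the largest modes
    # Eigenvectors are unit vectors, so projections stay in Angstroms.
    return dx @ evecs[:, order], evals[order]


rng = np.random.default_rng(1)
traj = rng.normal(scale=1.5, size=(500, 10, 3))  # hypothetical toy trajectory
proj, lam = pca_project(traj)
print("projection std (Angstrom):", proj.std(axis=0))
print("sqrt(eigenvalue) (Angstrom):", np.sqrt(lam))
print("same spread in nm:", proj.std(axis=0) / 10.0)
```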

> The last question is: I have N atoms to calculate. But the
> covariance matrix dimension is 3N * 3N, this can not be used as the x,
> y axis. How could I get the N * N covariance matrix?

I'm not sure what you mean by "can not be used as the x, y axis". You
can reduce the dimensionality of covariance eigenvectors from 3N to N
with the 'reduce' keyword (see the AmberTools 13 manual for more
details). You can also calculate an NxN correlation matrix (aka a
DCCM) with 'matrix correl'.
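For the NxN case, a dynamic cross-correlation matrix normalizes the covariance of atomic displacement vectors, C_ij = <dr_i . dr_j> / sqrt(<|dr_i|^2> <|dr_j|^2>), so every element lies between -1 and 1. The sketch below computes that textbook definition with numpy, assuming RMS-fitted coordinates in an array of shape (frames, atoms, 3); the helper name `dccm` and the toy data are hypothetical, and it is meant to show the idea rather than reproduce 'matrix correl' output exactly.

```python
import numpy as np


def dccm(coords):
    """NxN dynamic cross-correlation matrix from (n_frames, n_atoms, 3) coords."""
    d = coords - coords.mean(axis=0)                # displacement vectors dr_i
    # <dr_i . dr_j>: dot product over x/y/z, averaged over frames.
    cov = np.einsum("fik,fjk->ij", d, d) / len(coords)
    norm = np.sqrt(np.diag(cov))                    # sqrt(<|dr_i|^2>)
    return cov / np.outer(norm, norm)


rng = np.random.default_rng(2)
traj = rng.normal(size=(200, 5, 3))                 # hypothetical toy data
traj[:, 1] = traj[:, 0] + 0.1 * rng.normal(size=(200, 3))  # atom 1 tracks atom 0
c = dccm(traj)
print(c.shape)  # (5, 5)
print("C[0, 1] =", round(float(c[0, 1]), 2))
```

Because atom 1 was built to follow atom 0, C[0, 1] comes out close to 1, while pairs of independent atoms stay near 0.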

Hope this helps,

-Dan

```
--
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 201
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-9119 (Fax)
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
```
Received on Thu Sep 05 2013 - 09:30:03 PDT