# Re: [AMBER] PCA: comparing PCs from different trajectories

From: Eiros Zamora, Juan <j.eiros-zamora14.imperial.ac.uk>
Date: Thu, 6 Aug 2015 16:32:48 +0000

Hey I’ve been doing some PCA myself so maybe I can help you
El 6/8/2015, a las 16:18, George Tzotzos <gtzotzos.me.com<mailto:gtzotzos.me.com>> escribió:

Thank you Daniel,

It worked. I have a couple of queries.

1. In my script
kde T1:1 kldiv T2:1 klout KL-PC.agr bins 400 name PC1-PC2-1
kde T1:2 kldiv T2:2 klout KL-PC.agr bins 400 name PC1-PC2-2
kde T1:3 kldiv T2:3 klout KL-PC.agr bins 400 name PC1-PC2-3
kde T1:4 kldiv T2:4 klout KL-PC.agr bins 400 name PC1-PC2-4
kde T1:5 kldiv T2:5 klout KL-PC.agr bins 400 name PC1-PC2-5

1. Does the produced histogram (see attached) represent the Kullback−Leibler divergence of the first 5 PCs between run 1 and 2? Is this correct?

I don’t see anything attached, but yeah I think that should work

2. In the affirmative, I guess I should do the same for run 1 and 3, 1 and 4, etc. Am I right?

True, I discussed this with Dan on an email in the list a day ago

3. Last but not least, I don’t see anywhere in the output how much of the total motion is represented in the 1st, 2nd and 3rd PC for each run. Is there a way to produce this type of output?

I don’t know if cpptraj can do this directly (at least I haven’t figured out how). But the way I’ve done it is outputting an evecs file when you run the diagmatrix command to diagonalize your covariance matrix (with the out flag). In that file you have all the vectors you’ve specified (to more the better accuracy on the % of motion) preceded by their eigenvalue. So what you do is get all those eigenvalues, sum them and then calculate what % each value is of that total sum. So that will tell you how much of the variance each eigenvector is responsible for. If everything goes well, you should see that each PC has a lower % than the previous one. It’s a bit of a workaround so if cpptraj has the option to do that and I haven’t seen it I’ll feel pretty dumb

Hope that helps,

Juan

Once again thank you for your kind help

Regards

George

On 5 Aug 2015, at 21:18, Daniel Roe <daniel.r.roe.gmail.com<mailto:daniel.r.roe.gmail.com>> wrote:

On Wed, Aug 5, 2015 at 11:39 AM, George Tzotzos <gtzotzos.me.com<mailto:gtzotzos.me.com> <mailto:gtzotzos.me.com>> wrote:
crdaction crd1 projection T1 modes evecs.dat beg 1 end 20 :1-125&!.H= \
crdframes 1, 2000 out T1.dat

Remove the space in your crdframes argument, e.g. 'crdframes 1,2000'.
This is the source of your error messages.

Warning: No actions/output trajectories specified.

This is because you only have analyses set up at the end (kde etc).
Can be safely ignored.

Warning: Size of Mode1 (10000) != size of Mode1 (8000)
Warning: Only using 8000 data points.

This is related to your malformed crdframes arguments above - each
projection start argument was read but not the end because of the
space. So the first projection T1 had 10000 frames, the second 8000,
and so on. This will be resolved after you fix the crdframes
arguments.

Hope this helps,

-Dan

On 5 Aug 2015, at 20:16, Daniel Roe <daniel.r.roe.gmail.com<mailto:daniel.r.roe.gmail.com>> wrote:

Hi,

This is by no means the only way, but we've done this kind of analysis
recently (see http://pubs.acs.org/doi/abs/10.1021/jp4125099). The idea
is to calculate principal components from a combined trajectory (e.g.
trajectories T1 + T2 + T3), then calculate projections along those
principal components separately (i.e. the projections for T1, T2, and
T3 along the combined PCs). The distributions for the individual
projections for PCs (particularly the low frequency ones) should
overlap pretty well, otherwise the trajectories are not converged -
note that overlap is a necessary but not sufficient condition of
convergence, ideally you will look at several different properties to
ascertain overall convergence. We used Kullback-Leibler divergence to
quantify the overlap of distributions, but that isn't the only way.
The SI in the given publication has some example scripts for CPPTRAJ

Hope this helps,

-Dan

On Wed, Aug 5, 2015 at 2:24 AM, George Tzotzos <gtzotzos.me.com<mailto:gtzotzos.me.com>> wrote:
I’ve run 5 independent MD simulations of the same system. I’m seeking advice on how to compare the Principal Components derived from each trajectory to assess convergence of sampling.

Regards

George
_______________________________________________
AMBER mailing list
AMBER.ambermd.org<mailto:AMBER.ambermd.org>
http://lists.ambermd.org/mailman/listinfo/amber

--
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

_______________________________________________
AMBER mailing list
AMBER.ambermd.org<mailto:AMBER.ambermd.org> <mailto:AMBER.ambermd.org>
http://lists.ambermd.org/mailman/listinfo/amber <http://lists.ambermd.org/mailman/listinfo/amber>

--
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/ <http://home.chpc.utah.edu/~cheatham/>
(801) 587-9652
(801) 585-6208 (Fax)

_______________________________________________
AMBER mailing list
AMBER.ambermd.org<mailto:AMBER.ambermd.org> <mailto:AMBER.ambermd.org>
http://lists.ambermd.org/mailman/listinfo/amber <http://lists.ambermd.org/mailman/listinfo/amber>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org<mailto:AMBER.ambermd.org>
http://lists.ambermd.org/mailman/listinfo/amber

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Aug 06 2015 - 10:00:02 PDT
Custom Search