Re: [AMBER] Questions about Principle Component Analysis from Thomas Cheatham on 2021-04-15 (Amber Archive Apr 2021)

From: Thomas Cheatham <tec3.utah.edu>
Date: Thu, 15 Apr 2021 20:17:00 +0000

To make sure the modes match between the independent trajectories, you need to use combined principal component analysis.

https://amberhub.chpc.utah.edu/introduction-to-principal-component-analysis/
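
A minimal NumPy sketch of why the combined approach helps. It uses synthetic data, not real trajectories, and the names `pca`, `traj_a`, and `traj_b` are illustrative, not cpptraj commands. When PCA is run separately on each trajectory, the leading eigenvectors may agree only up to an arbitrary sign, and each projection is measured from its own average. Diagonalizing one covariance matrix built from all frames gives a single set of modes and a single average, so the projections of each trajectory land on common axes.

```python
import numpy as np

def pca(coords):
    """Eigen-decompose the coordinate covariance matrix.

    coords has shape (n_frames, n_coords). Returns eigenvalues (largest
    first), the matching eigenvectors as columns, and the average coordinates.
    """
    mean = coords.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(coords, rowvar=False))  # ascending
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order], mean

rng = np.random.default_rng(0)
n_frames, n_coords = 500, 12          # e.g. 4 atoms x 3 components (synthetic)

# Two synthetic "trajectories" that share one dominant direction of motion;
# the second one is also displaced along that direction.
direction = rng.normal(size=n_coords)
direction /= np.linalg.norm(direction)
noise_a = 0.3 * rng.normal(size=(n_frames, n_coords))
noise_b = 0.3 * rng.normal(size=(n_frames, n_coords))
traj_a = rng.normal(size=(n_frames, 1)) * 3.0 * direction + noise_a
traj_b = rng.normal(size=(n_frames, 1)) * 3.0 * direction + noise_b + 2.0 * direction

# Independent PCA: the first modes are nearly parallel, but the sign of the
# overlap is arbitrary because -v is as valid an eigenvector as v.
_, vecs_a, _ = pca(traj_a)
_, vecs_b, _ = pca(traj_b)
overlap = float(vecs_a[:, 0] @ vecs_b[:, 0])

# Combined PCA: one covariance matrix, one set of modes, one average.
combined = np.vstack([traj_a, traj_b])
_, vecs_c, mean_c = pca(combined)
proj_a = (traj_a - mean_c) @ vecs_c[:, 0]
proj_b = (traj_b - mean_c) @ vecs_c[:, 0]

print("overlap of independent first modes:", overlap)
print("mean projection, trajectory A vs B:", proj_a.mean(), proj_b.mean())
```

Histograms of `proj_a` and `proj_b` can then be compared directly, since both are measured along the same mode from the same origin.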

If the trajectories use different topologies, you have to use the various parm options to load multiple topologies and then associate each one with the correct trajectories. For an example, see:

https://amberhub.chpc.utah.edu/comparing-the-structural-difference-between-a-series-of-structures/

You will need an equivalent one-to-one mapping of atoms between the different trajectories, for example the CA atoms. If there are gaps or additions, you will have to adjust your mask selections until the mapping is one-to-one.
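
As an illustration of the bookkeeping only, not of cpptraj mask syntax, here is a small NumPy sketch with made-up residue numbers. It assumes the two proteins are already numbered so that equal residue numbers mean equivalent positions, which in practice usually requires a sequence or structure alignment first. All array names are hypothetical.

```python
import numpy as np

# Hypothetical residue numbers of the CA atoms present in each system.
resnums_a = np.array([1, 2, 3, 4, 5, 6])
resnums_b = np.array([1, 2, 4, 5, 6, 7])   # residue 3 missing, residue 7 extra

# Residues present in both systems, plus their positions in each array.
common, idx_a, idx_b = np.intersect1d(resnums_a, resnums_b, return_indices=True)

# Synthetic CA coordinates for one frame of each system: (n_atoms, 3).
rng = np.random.default_rng(0)
ca_a = rng.normal(size=(resnums_a.size, 3))
ca_b = rng.normal(size=(resnums_b.size, 3))

# After selection both arrays have the same atoms in the same order, so
# frames from the two systems can go into one covariance matrix.
matched_a = ca_a[idx_a]
matched_b = ca_b[idx_b]
print(common, matched_a.shape, matched_b.shape)
```

The same index lists would then be turned into the atom masks used for each topology.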

You need many more frames than the minimum number required for a well-defined covariance matrix.
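
A short NumPy sketch, using random numbers in place of real coordinates, of where the minimum comes from. A covariance matrix estimated from n frames has rank at most n - 1, so with fewer frames than coordinates most of its eigenvalues are zero and the corresponding modes are undefined. Reaching full rank is only the lower bound; the modes still need enough sampling beyond that to be statistically meaningful. The helper name `covariance_rank` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_atoms = 10
n_coords = 3 * n_atoms      # rows/columns of the covariance matrix

def covariance_rank(n_frames):
    """Rank of the covariance matrix estimated from n_frames random frames."""
    frames = rng.normal(size=(n_frames, n_coords))
    cov = np.cov(frames, rowvar=False)
    return int(np.linalg.matrix_rank(cov))

rank_few = covariance_rank(8)        # fewer frames than coordinates
rank_many = covariance_rank(3000)    # many more frames than coordinates

print("coordinates:", n_coords)
print("rank with 8 frames:", rank_few)
print("rank with 3000 frames:", rank_many)
```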

For the units of the projections, I assume Angstroms, but I would check the manual.
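
A small NumPy sketch of the reasoning, under two stated assumptions: the eigenvectors are unit-normalized and the covariance matrix is not mass-weighted. In that case a projection is just the displacement from the average structure dotted with a dimensionless vector, so it carries the units of the input coordinates. With mass-weighted coordinates the units would pick up a factor of sqrt(mass). Whether cpptraj applies any further scaling is not checked here, so the manual remains the authority. The data and the `project` helper are synthetic and illustrative.

```python
import numpy as np

def project(coords, n_modes=2):
    """Project frames onto the n_modes largest-variance eigenvectors."""
    mean = coords.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(coords, rowvar=False))  # ascending
    modes = evecs[:, ::-1][:, :n_modes]      # unit-norm columns, largest first
    return (coords - mean) @ modes

rng = np.random.default_rng(2)
# Synthetic coordinates "in Angstroms" with clearly separated variances.
scales = np.array([5.0, 4.0, 3.0, 2.0, 1.5, 1.0, 0.8, 0.5, 0.3])
coords_angstrom = rng.normal(size=(2000, scales.size)) * scales
coords_nm = 0.1 * coords_angstrom            # same frames expressed in nm

proj_angstrom = project(coords_angstrom)
proj_nm = project(coords_nm)

# Up to the arbitrary sign of each eigenvector, the projections scale
# exactly like the coordinates do.
same_units = np.allclose(np.abs(proj_nm), 0.1 * np.abs(proj_angstrom))
print("projections scale with the coordinate units:", same_units)
```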

To see a real world use of comparing PCA modes among independent simulations, see:

https://www.sciencedirect.com/science/article/pii/S0304416514003092

--tec3

________________________________________
From: Li,Haoxi <hl2500.chem.ufl.edu>
Sent: Thursday, April 15, 2021 12:56:50 PM
To: AMBER Mailing List
Subject: [AMBER] Questions about Principle Component Analysis

Dear Amber Users,

I’ve recently been doing principal component analysis. I’m relatively new to this. Can I ask some questions?

1. If I would like to compare different MD trajectories by using PCA, should I combine the trajectories and then perform the analysis, as was done in the 2 DNA Amber tutorial? Or can I perform PCA on each trajectory individually and then compare the results? In the latter case, would the randomness of the sign of the eigenvectors cause inconsistency?

2. If I have to combine different trajectories, how can I compare different proteins with different numbers of atoms?

3. The PCA Amber tutorial says that we will need at least as many input frames to calculate the coordinate covariance matrix as we have rows/columns (i.e. 3 * # selected atoms). Can I ask why the number of frames should be at least the number of atom coordinates? It is not very obvious to me how this would influence the calculation of the covariance matrix.

4. I’m a little confused about the nmwizvecs calculated by the diagmatrix command. Do they have an actual meaning? From the .nmd output file, it seems the nmwizvecs are a set of vectors that point from the average coordinates to the coordinates + eigenvector, so they are just parts of the eigenvector, grouped 3 per atom for 3D visualization purposes? Is there a better way to think of this? Why do they start with the lowest frequency mode?

5. Is the unit of the projection on eigenvectors in Angstroms?

Thank you so much in advance!

Best wishes,
Haoxi

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Thu Apr 15 2021 - 13:30:02 PDT