Re: [AMBER] PCA and K-mean together scripting in CPPTRAJ from Christina Bergonzo on 2021-08-12 (Amber Archive Aug 2021)

From: Christina Bergonzo <cbergonzo.gmail.com>
Date: Thu, 12 Aug 2021 09:24:58 -0400

Hello,

Keep in mind that this is a more advanced analysis that really should be
done in combination with visualization of the trajectory; otherwise,
it will be hard to interpret your results.

First, if you haven't already, you will need to perform PCA and project
your coordinates onto the resulting eigenvectors. There is a tutorial
that introduces how to do this here:
https://amberhub.chpc.utah.edu/introduction-to-principal-component-analysis/
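The projection step is just a dot product. Each frame's coordinates, minus the average structure, are projected onto each eigenvector. Here is a minimal pure-Python sketch of that idea. It assumes the frames are already superimposed, the eigenvectors are unit-length vectors over the same flattened (x, y, z) coordinates, and no mass-weighting is applied. The function names and toy data are hypothetical; in practice the CPPTRAJ tutorial linked above does this step for you.

```python
from typing import List, Sequence

Vector = List[float]


def average_structure(frames: Sequence[Sequence[float]]) -> Vector:
    """Mean of flattened coordinate frames, coordinate by coordinate."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def project_frames(frames, eigenvectors, average=None) -> List[Vector]:
    """Project each frame onto each eigenvector: p_k(t) = v_k . (x(t) - <x>).

    Assumes (hypothetically) that frames are already fit to a common
    reference and that eigenvectors are unit-length; no mass-weighting.
    """
    if average is None:
        average = average_structure(frames)
    projections = []
    for frame in frames:
        # Displacement of this frame from the average structure.
        displacement = [x - m for x, m in zip(frame, average)]
        # One projection value per eigenvector (i.e. per PC).
        projections.append(
            [sum(d * v for d, v in zip(displacement, vec)) for vec in eigenvectors]
        )
    return projections


if __name__ == "__main__":
    # Toy data: three frames of a single "atom" (x, y, z) moving along x.
    frames = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    for row in project_frames(frames, modes):
        print(row)
```

Each row of the result corresponds to one frame, and each column is one PC. That is the kind of per-frame data that gets clustered later.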

Then, you should look at pseudo-trajectories (generated with the
'modes trajout' command) and at the relative contribution of each
eigenvector based on its eigenvalue (using 'modes eigenval'). Visualizing
the pseudo-trajectories will help you understand what motion each mode
describes and how much it contributes. It will also help you decide
whether clustering based on the modes is something you're really
interested in (versus some other clustering metric based on smaller
parts of the structure, for example). You will need to get a feel for
how many principal components you would like to cluster on. It can
help to look at what percentage of the total variance each mode
accounts for. For example, if mode 1 accounts for 99 %, you should
choose that one only. If modes 1 and 2 account for 45 % and 38 %, that
may be enough.
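The "percentage of the whole" for a mode is its eigenvalue divided by the sum of the eigenvalues. Below is a small sketch of that calculation plus one possible rule for picking how many PCs to keep. It assumes you have the eigenvalues as a plain list, for example copied from the modes output. The 80 % cumulative cutoff is a hypothetical, user-chosen threshold, not a CPPTRAJ default. Note that the fractions are only relative to the eigenvalues you pass in.

```python
from typing import List, Sequence


def variance_fractions(eigenvalues: Sequence[float]) -> List[float]:
    """Fraction of the (supplied) total variance carried by each mode."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def choose_n_pcs(eigenvalues: Sequence[float], cutoff: float = 0.8) -> int:
    """Smallest number of leading modes whose cumulative fraction reaches cutoff.

    `cutoff` is a hypothetical threshold. Eigenvalues are sorted largest
    first so that "leading modes" means the highest-variance ones.
    """
    fractions = variance_fractions(sorted(eigenvalues, reverse=True))
    cumulative = 0.0
    for n, frac in enumerate(fractions, start=1):
        cumulative += frac
        if cumulative >= cutoff:
            return n
    # Guard against floating-point round-off leaving cumulative just below 1.
    return len(fractions)


if __name__ == "__main__":
    # The two scenarios described above (illustrative percentages).
    print(choose_n_pcs([99.0, 0.6, 0.4]))
    print(choose_n_pcs([45.0, 38.0, 10.0, 7.0]))
```

With these inputs the first case keeps 1 PC and the second keeps 2. This matches the rule of thumb in the text, but the cutoff is a judgment call and should be checked against what the pseudo-trajectories show.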

Last, you can cluster on the PC projection data via the 'data'
keyword of the 'cluster' command. You will probably have to repeat the
clustering several times, using kmeans with a range of cluster counts,
to find the number of clusters that optimizes the clustering metrics
DBI and pSF (look for a number of clusters at which DBI is low and pSF
is high). Then you will have representative structures that reflect
the principal-component-based clustering results. The clustering input
will look something like this (top 3 PCs):

# Read in projection data
readdata ../P1.dat name P1
readdata ../P2.dat name P2
readdata ../P3.dat name P3
# Perform clustering: k-means with 10 clusters, using P1-P3 as the metric
cluster kmeans clusters 10 data P1,P2,P3 out cvt.10.dat summary summary.10.dat nocoords savepairdist info info.010.dat
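To see what the DBI/pSF scan is measuring, here is a minimal pure-Python sketch. It runs a plain Lloyd's k-means on PC projection values, then computes textbook versions of the Davies-Bouldin index (lower is better) and the pseudo-F statistic (higher is better) over several cluster counts. It is only an illustration. The random initialization, iteration cap, function names and toy data are hypothetical, and CPPTRAJ's kmeans and metric implementations may differ in detail. For real work, use the 'cluster' command above and read DBI and pSF from its output.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: Sequence[Point]) -> List[float]:
    return [sum(c) / len(points) for c in zip(*points)]


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means from k randomly sampled frames; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assign every frame to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


def davies_bouldin(points, labels, centroids) -> float:
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio; needs k >= 2."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        scatter.append(sum(dist(p, centroids[c]) for p in members) / len(members)
                       if members else 0.0)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k


def pseudo_f(points, labels, centroids) -> float:
    """(SSR / (k - 1)) / (SSE / (n - k)): between- vs within-cluster sum of squares."""
    n, k = len(points), len(centroids)
    grand = mean(points)
    sse = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    ssr = sum(labels.count(c) * dist(centroids[c], grand) ** 2 for c in range(k))
    return (ssr / (k - 1)) / (sse / (n - k))


if __name__ == "__main__":
    # Toy 2-PC projection data with three loose groups (illustrative only).
    rng = random.Random(1)
    data = [[cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)]
            for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(20)]
    for k in range(2, 6):
        labels, cents = kmeans(data, k)
        print(k, round(davies_bouldin(data, labels, cents), 3),
              round(pseudo_f(data, labels, cents), 1))
```

Because k-means can settle in a local minimum that depends on the starting frames, it is worth repeating each cluster count with different seeds before trusting a DBI/pSF trend.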

-Christina

-----------------------------------------------------------------
Christina Bergonzo
Research Chemist
Biomolecular Measurement Division, MML, NIST
-----------------------------------------------------------------

On Thu, Aug 12, 2021 at 5:15 AM Nisha Amarnath Jonniya <
phd1601271002.iiti.ac.in> wrote:

> Dear Amber users,
>
> I would appreciate any help with the following.
> I would like to use principal components as the input for K-means, to
> determine the number of possible clusters for the given principal
> components and to extract the corresponding PDB structure for each cluster.
> Is there any script available to use principal components as the input to
> K-means in CPPTRAJ?
>
> Thanks
>
> --
>
> Nisha Amarnath Jonniya
> PhD Research Scholar
> Biosciences and Biomedical Engineering
> Indian Institute of Technology, Indore
> India
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Thu Aug 12 2021 - 06:30:03 PDT