Re: [AMBER] PCA and K-mean together scripting in CPPTRAJ from Nisha Amarnath Jonniya on 2021-08-13 (Amber Archive Aug 2021)

From: Nisha Amarnath Jonniya <phd1601271002.iiti.ac.in>
Date: Fri, 13 Aug 2021 18:21:52 +0530

Thank you for the reply.
It helped me use the principal components as input for k-means.
My remaining question is how to decide on the number of clusters when
running k-means. How can I get a scree plot for choosing the number of clusters?

Thanks

On Thu, Aug 12, 2021 at 6:56 PM Christina Bergonzo <cbergonzo.gmail.com>
wrote:

> Hello,
>
> Keep in mind this is a more advanced analysis that really should be
> done in combination with visualization of the trajectory - otherwise,
> it will be hard to interpret your results.
>
> First, if you haven't already, you will need to perform PCA and project
> your coordinates along the resulting eigenvectors. There is a tutorial
> that introduces how to do this here:
>
> https://amberhub.chpc.utah.edu/introduction-to-principal-component-analysis/
>
> Then, you should look at pseudo-trajectories (generated with the
> 'modes trajout' command) and at the relative contribution of each
> eigenvector based on its eigenvalue (using 'modes eigenval'). This will
> help you understand what the structures along each mode do (by
> visualizing them) and how much each mode contributes to the motion, and
> whether clustering based on them is something you're really interested
> in (versus some other clustering metric based on smaller parts of the
> structure, for example). You will need to get a feel for how many
> principal components you would like to cluster on. It helps to look at
> what percentage of the total each mode accounts for: if mode 1 is 99 %,
> you should choose that one only; if modes 1 and 2 are 45 % and 38 %,
> those two may be enough.
>
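
The percentage check described above can be sketched in a few lines of Python. This is only an illustration: the eigenvalues below are made-up numbers chosen to match the 45 % / 38 % case, the function names are hypothetical, and the 80 % cutoff is an arbitrary assumption rather than a recommended value. The shares are only meaningful if the list holds all computed eigenvalues, sorted largest first.

```python
def explained_fraction(eigenvalues):
    """Return each eigenvalue's share of the sum of all eigenvalues."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [ev / total for ev in eigenvalues]


def modes_needed(eigenvalues, threshold=0.8):
    """Smallest number of leading modes whose cumulative share reaches threshold.

    The default threshold of 0.8 is an illustrative assumption.
    """
    cumulative = 0.0
    for n, frac in enumerate(explained_fraction(eigenvalues), start=1):
        cumulative += frac
        if cumulative >= threshold:
            return n
    return len(eigenvalues)


# Hypothetical eigenvalues (not from a real run), sorted largest first
example = [45.0, 38.0, 10.0, 5.0, 2.0]

for mode, frac in enumerate(explained_fraction(example), start=1):
    print(f"mode {mode}: {100 * frac:.1f} %")
print("modes needed:", modes_needed(example))
```

Plotting these shares against the mode number gives the usual scree plot for the principal components.
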
> Last, you can cluster using the PC projection data via the 'data'
> metric keyword of the 'cluster' command. You will probably have to
> repeat the clustering several times, running kmeans with a range of
> cluster counts, to get a sense of which count optimizes the clustering
> metrics DBI (Davies-Bouldin index) and pSF (pseudo-F statistic). DBI
> should be low and pSF should be high for the same number of clusters.
> Then, you will have representative structures that reflect the
> principal-component-based clustering. The clustering input will look
> something like this (top 3 PCs):
>
> # Read in projection data
> readdata ../P1.dat name P1
> readdata ../P2.dat name P2
> readdata ../P3.dat name P3
> # Perform clustering
> cluster kmeans clusters 10 data P1,P2,P3 out cvt.10.dat summary summary.10.dat nocoords savepairdist info info.010.dat
>
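
Repeating the clustering for several cluster counts, as suggested above, could be scripted roughly as follows. This is a hedged Python sketch, not an official workflow: the helper names and the output file naming (cvt.K.dat, summary.K.dat, info.K.dat) are assumptions modelled on the input shown above, and the parser assumes the info file contains lines of the form "#DBI: <value>" and "#pSF: <value>", which should be checked against the actual cpptraj output.

```python
import re


def cluster_input(k, datasets=("P1", "P2", "P3")):
    """Build cpptraj input text that clusters the projection data into k clusters."""
    lines = ["# Read in projection data"]
    for name in datasets:
        lines.append(f"readdata ../{name}.dat name {name}")
    lines.append("# Perform clustering")
    lines.append(
        f"cluster kmeans clusters {k} data {','.join(datasets)} "
        f"out cvt.{k}.dat summary summary.{k}.dat "
        f"nocoords savepairdist info info.{k}.dat"
    )
    return "\n".join(lines) + "\n"


def read_metrics(info_text):
    """Extract DBI and pSF from the text of a cluster info file.

    Assumes lines such as '#DBI: 0.75' and '#pSF: 120.5' (an assumption
    about the file layout; adjust the pattern if the output differs).
    """
    metrics = {}
    for key in ("DBI", "pSF"):
        match = re.search(rf"^#{key}:\s*(\S+)", info_text, flags=re.MULTILINE)
        if match:
            metrics[key] = float(match.group(1))
    return metrics


# Print the input for one cluster count; loop over e.g. range(2, 11),
# write each text to its own file, and run cpptraj on it.
print(cluster_input(5))

# Made-up info text, used only to demonstrate the parser
sample_info = "#Clustering: 5 clusters 100 frames\n#DBI: 0.75\n#pSF: 120.5\n"
print(read_metrics(sample_info))
```

Tabulating DBI and pSF against the number of clusters then shows where DBI is low while pSF is high.
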
> -Christina
>
> -----------------------------------------------------------------
> Christina Bergonzo
> Research Chemist
> Biomolecular Measurement Division, MML, NIST
> -----------------------------------------------------------------
>
> On Thu, Aug 12, 2021 at 5:15 AM Nisha Amarnath Jonniya <
> phd1601271002.iiti.ac.in> wrote:
>
> > Dear Amber users,
> >
> > I would appreciate any help with the following.
> > I would like to use principal components as input for k-means, to
> > determine the number of clusters for the given principal components and
> > to extract a representative PDB structure for each cluster.
> > Is there a script available for using principal components as input to
> > k-means in CPPTRAJ?
> >
> > Thanks
> >
> > --
> >
> > Nisha Amarnath Jonniya
> > PhD Research Scholar
> > Biosciences and Biomedical Engineering
> > Indian Institute of Technology, Indore
> > India
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >

-- 
Nisha Amarnath Jonniya
PhD Research Scholar
Biosciences and Biomedical Engineering
Indian Institute of Technology, Indore
India

Received on Fri Aug 13 2021 - 06:00:02 PDT