
From: Christina Bergonzo <cbergonzo.gmail.com>

Date: Thu, 12 Aug 2021 09:24:58 -0400

Hello,

Keep in mind that this is a more advanced analysis that really should
be done in combination with visualization of the trajectory; otherwise,
it will be hard to interpret your results.

First, if you haven't already, you will need to perform PCA and project
your coordinates along the resulting eigenvectors. There is a tutorial
that introduces how to do this here:
https://amberhub.chpc.utah.edu/introduction-to-principal-component-analysis/

Then, you should look at pseudo-trajectories (generated with the
'modes trajout' command) and at the relative contribution of each
eigenvector based on its eigenvalue (using 'modes eigenval').
Visualizing the structures along each mode will help you understand
what motion that mode describes and how much it contributes, and
whether clustering on the modes is really what you're interested in
(versus some other clustering metric based on smaller parts of the
structure, for example). You will need to get a feel for how many
principal components you would like to cluster on. It helps to look at
what percentage of the total each mode accounts for: for example, if
mode 1 is 99 %, you should choose that one only. If modes 1 and 2 are
45 % and 38 %, those two may be enough.
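
As a rough illustration of the percentage check described above, here is a minimal Python sketch. It assumes you have already copied the eigenvalues out of your own PCA output into a list; the function names and the example numbers are hypothetical and only mirror the 45 % / 38 % case mentioned in the text. Note that the percentages are relative to the eigenvalues you supply, so if only the first few modes were written out, the shares will be overestimated.

```python
def variance_fractions(eigenvalues):
    """Return each eigenvalue as a percentage of the sum of all supplied eigenvalues."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [100.0 * ev / total for ev in eigenvalues]


def modes_needed(eigenvalues, target_percent):
    """Smallest number of leading modes whose cumulative share reaches target_percent."""
    cumulative = 0.0
    for count, pct in enumerate(variance_fractions(eigenvalues), start=1):
        cumulative += pct
        if cumulative >= target_percent:
            return count
    # Target not reached (e.g. target above 100 %): fall back to all modes.
    return len(eigenvalues)


# Hypothetical eigenvalues, sorted largest first, for illustration only.
example = [45.0, 38.0, 10.0, 5.0, 2.0]

for mode, pct in enumerate(variance_fractions(example), start=1):
    print(f"mode {mode}: {pct:.1f} %")

print("modes needed to reach 80 %:", modes_needed(example, 80.0))
```

The 80 % target is an arbitrary choice for the example, not a recommended cutoff; the visual inspection of each mode still decides whether it is worth clustering on.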

Last, you can cluster using the PC projection data via the 'data'
metric keyword of the 'cluster' command. You will probably have to
repeat the clustering several times, using kmeans with a range of
cluster counts, to get a sense of which count optimizes the clustering
metrics DBI and pSF (DBI should be low and pSF should be high for the
same number of clusters). Then, you will have relevant structures that
reflect the principal-component-based cluster results. The clustering
input will look something like this (top 3 PCs):

# Read in projection data
readdata ../P1.dat name P1
readdata ../P2.dat name P2
readdata ../P3.dat name P3

# Perform clustering
cluster kmeans clusters 10 data P1,P2,P3 out cvt.10.dat summary summary.10.dat nocoords savepairdist info info.010.dat
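
To make the "low DBI, high pSF" guidance more concrete, here is a small Python sketch of the textbook definitions of the Davies-Bouldin index and the pseudo-F statistic, applied to a toy set of 2D points standing in for (PC1, PC2) projections. This is only an illustration: the points, labels, and function names are made up, and the values CPPTRAJ reports in its info file come from its own implementation, whose conventions may differ in detail.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def group_by_label(points, labels):
    """Collect points into one list per cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return list(clusters.values())


def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower means tighter, better separated clusters."""
    clusters = group_by_label(points, labels)
    cents = [centroid(members) for members in clusters]
    # Average distance of each cluster's members to that cluster's centroid.
    spread = [sum(math.dist(p, c) for p in members) / len(members)
              for members, c in zip(clusters, cents)]
    k = len(clusters)
    worst = []
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        worst.append(max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                         for j in range(k) if j != i))
    return sum(worst) / k


def pseudo_f(points, labels):
    """Pseudo-F: between-cluster over within-cluster variance; higher is better."""
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    sst = sum(math.dist(p, overall) ** 2 for p in points)
    sse = sum(math.dist(p, centroid(members)) ** 2
              for members in clusters for p in members)
    return ((sst - sse) / (k - 1)) / (sse / (n - k))


# Toy data (hypothetical): two tight groups far apart in the PC1/PC2 plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]   # matches the two groups
poor_labels = [0, 1, 0, 1, 0, 1, 0, 1]   # mixes the two groups

for name, labels in (("good", good_labels), ("poor", poor_labels)):
    print(name, "DBI:", davies_bouldin(points, labels),
          "pSF:", pseudo_f(points, labels))
```

The assignment that matches the two groups gives a much lower DBI and a much higher pSF than the mixed one, which is the pattern to look for when comparing runs with different cluster counts. Both functions assume at least two clusters with distinct centroids and nonzero within-cluster scatter.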

-Christina

-----------------------------------------------------------------

Christina Bergonzo

Research Chemist

Biomolecular Measurement Division, MML, NIST

-----------------------------------------------------------------

On Thu, Aug 12, 2021 at 5:15 AM Nisha Amarnath Jonniya <phd1601271002.iiti.ac.in> wrote:

> Dear Amber users,
>
> I would like to appreciate any help in the regard of my concern.
> I would like to take principal components as an input for the K-mean to
> define number of possible clusters for the given principal components and
> to extract the respective pdb structure for each cluster.
> Is there any script available to take principal components as an input in
> k-mean in CPPTRAJ?
>
> Thanks
>
> --
>
> Nisha Amarnath Jonniya
> PhD Research Scholar
> Biosciences and Biomedical Engineering
> Indian Institute of Technology, Indore
> India
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>

_______________________________________________

AMBER mailing list

AMBER.ambermd.org

http://lists.ambermd.org/mailman/listinfo/amber

Received on Thu Aug 12 2021 - 06:30:03 PDT
