Re: [AMBER] cpptraj cluster -representative structure from Thomas Cheatham on 2016-12-14 (Amber Archive Dec 2016)

From: Thomas Cheatham <tec3.utah.edu>
Date: Wed, 14 Dec 2016 11:38:55 -0700 (MST)

> Hi, I have ten best-fit structures of my enzyme extracted from long
> dynamics tracks; the structures are solvated and have ions for charge
> neutralization. I want to identify the most representative structure in
> the group of ten; what is the best way to do this with cpptraj?

This is a difficult question to directly answer since it depends on your
definition of "representative" and it also depends on how your 10
structures were chosen/determined.

I would approach this problem not by picking only ten structures, but by
clustering all the trajectory snapshots, creating both average and
representative structures from the individual clusters. One might then
argue that the largest cluster contains the best representative
structures; however, this assumes complete sampling. In the real world
this is tricky, so in fact a sparsely populated cluster (which took a long
time to find and perhaps was only found at the end of the MD runs) could
be the most representative. Therefore, you likely want to apply an
arsenal of analyses to get at what is hidden in the trajectory data,
likely using CPPTRAJ.

- RMSD
- 2D RMSD (to get a handle on how many states might be accessible and
whether particular conformations are being revisited)
- clustering
- MM-PBSA or other energetic analysis of sub-states/clusters
- PCA (to look at major modes of motion, perhaps in each cluster)
- if multiple trajectories, combined clustering/PCA across all
trajectories
- visualization of dynamics/movies to "see" what was happening
- grid analysis (on clusters) of ion/water major sites

In recent papers from my lab over the past couple of years, we have tried
to put CPPTRAJ scripts into the supporting information, which may be
helpful in addition to the CPPTRAJ tutorials/examples in AmberTools.

The specification of "representative" is subjective and subject to your
interpretation (what atoms to cluster on, what algorithm, sieving,
cluster count, ...) which you must defend as the data is presented. So
try to explore your data as much as possible; the analyses are likely the
most people-time-consuming part of any MD project.
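
As one illustration of how a particular definition of "representative"
plays out, here is a minimal Python sketch (again, not CPPTRAJ itself).
It assumes you already have a pairwise distance matrix (for example
RMSD) and a cluster assignment for every frame, and it picks for each
cluster the member with the lowest summed distance to the other members
(a medoid). The function name and the toy numbers are hypothetical, and
the medoid is only one of several defensible choices.

```python
from collections import defaultdict

def cluster_representatives(dist, labels):
    """Return {cluster_id: (population, representative_frame)}.

    dist:   symmetric pairwise distance matrix (list of lists)
    labels: cluster id for each frame, in the same order as dist
    The representative is the medoid: the member with the lowest summed
    distance to all members of its own cluster. Ties go to the earliest
    frame.
    """
    members = defaultdict(list)
    for frame, cluster in enumerate(labels):
        members[cluster].append(frame)

    result = {}
    for cluster, frames in members.items():
        medoid = min(frames, key=lambda i: sum(dist[i][j] for j in frames))
        result[cluster] = (len(frames), medoid)
    return result

# Toy data (hypothetical): five frames, the first three in cluster 0
# and the last two in cluster 1.
toy_dist = [
    [0.0, 1.0, 2.0, 6.0, 6.5],
    [1.0, 0.0, 1.0, 5.5, 6.0],
    [2.0, 1.0, 0.0, 5.0, 5.5],
    [6.0, 5.5, 5.0, 0.0, 0.8],
    [6.5, 6.0, 5.5, 0.8, 0.0],
]
toy_labels = [0, 0, 0, 1, 1]
reps = cluster_representatives(toy_dist, toy_labels)
```

Changing the atom selection behind the distance matrix, the clustering
algorithm, or the number of clusters changes the labels and therefore
the frame that comes out, which is exactly why the choice has to be
defended.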

--tec3

Received on Wed Dec 14 2016 - 11:00:04 PST