From: Thomas Cheatham <>
Date: Thu, 8 Jan 2015 10:05:58 -0700 (MST)

To add to what Dan said, ptraj development effectively ended before
the widespread adoption of netCDF files; think of the ptraj clustering
code as "tests" of various clustering algorithms.

In an ideal world, where the data are clearly disjoint, the clustering
algorithm does not matter. When transitions between clusters are smooth
and the boundary between one cluster and the next is somewhat arbitrary
(depending on your choice of metric, statistics, and definition of the
centroid or of "membership" in a cluster), the results do depend on the
clustering algorithm, and there is no single "right" way to do it. This is
usually the case with MD data. What we do is explore clustering choices
that provide the information we are trying to decipher. An excellent
example is the 2013 Henriksen et al. paper on the GACC tetranucleotide,
where Niel went through many tests/iterations to find a meaningful way to
partition the data.
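To make the point about smoothly varying data concrete, here is a toy sketch in Python. It compares two deliberately simple, hypothetical clustering rules on 1D values: a single-linkage rule that starts a new cluster wherever consecutive sorted values are more than a cutoff apart (`gap_clusters`), and a plain k-means (`kmeans_1d`). Both are illustrative stand-ins, not the ptraj or CPPTRAJ implementations, and the data are made up.

```python
def gap_clusters(values, cutoff):
    """1D single-linkage (hypothetical helper): sort the values and start a
    new cluster wherever the gap to the previous value exceeds `cutoff`."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > cutoff:
            clusters.append([])
        clusters[-1].append(cur)
    return clusters


def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's k-means in 1D (hypothetical helper, assumes k >= 2),
    seeded with k evenly spaced centroids between the min and max value."""
    ordered = sorted(values)
    lo, hi = ordered[0], ordered[-1]
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in ordered:
            # Assign each value to its nearest centroid (ties go to the lower index).
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Move each centroid to the mean of its members; keep empty ones in place.
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return [g for g in groups if g]


disjoint = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # two well-separated groups
smooth = [i * 0.5 for i in range(11)]       # evenly spaced, no natural break

print(len(gap_clusters(disjoint, 1.0)), len(kmeans_1d(disjoint, 2)))  # 2 2
print(len(gap_clusters(smooth, 1.0)), len(kmeans_1d(smooth, 2)))      # 1 2
```

On the disjoint data both rules recover the same two groups. On the evenly spaced data, the gap rule returns one cluster with a cutoff of 1.0 (and eleven singletons with a cutoff of 0.4), while 2-means splits the range into groups of 6 and 5. Neither answer is "wrong"; each reflects a different choice of metric and membership.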

To figure out how many clusters to use, I often make a 2D RMSD plot,
since it is easy to interpret visually.
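A 2D RMSD plot is the matrix of RMSD values between every pair of frames, drawn as a heat map; low-RMSD blocks along the diagonal hint at how many distinct states are visited. The sketch below assumes each frame is a plain NumPy array of shape (N_atoms, 3) rather than any AMBER file format, and computes best-fit RMSD with the Kabsch superposition. The helper names and the toy coordinates are hypothetical, not MD data.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate arrays after removing the centroid
    offset and applying the optimal rotation (Kabsch algorithm)."""
    p = a - a.mean(axis=0)
    q = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def rmsd_matrix(frames):
    """Symmetric frame-vs-frame RMSD matrix (the data behind a 2D RMSD plot)."""
    n = len(frames)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            m[i, j] = m[j, i] = kabsch_rmsd(frames[i], frames[j])
    return m


def rot_z(theta):
    """Rotation matrix about the z axis, used only to build the toy frames."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Toy "trajectory": three rigidly rotated/translated copies of conformation A,
# followed by three of conformation B (hypothetical coordinates).
rng = np.random.default_rng(0)
conf_a = rng.normal(size=(10, 3))
conf_b = conf_a + rng.normal(scale=1.0, size=(10, 3))
frames = [c @ rot_z(t).T + t for c in (conf_a, conf_b) for t in (0.2, 0.9, 1.7)]

m = rmsd_matrix(frames)
print(np.round(m, 2))
```

Passing `m` to matplotlib's `imshow` gives the 2D plot; here the two 3x3 near-zero blocks on the diagonal correspond to the two conformations, which would suggest two clusters. For a real trajectory, CPPTRAJ can compute this matrix directly.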

CPPTRAJ does not re-implement all of ptraj's clustering algorithms,
since we didn't think it was necessary...

Hope this helps, --tec3

Received on Thu Jan 08 2015 - 09:30:03 PST
