Re: [AMBER] How to do clustering analysis by ptraj? from Thomas Cheatham on 2013-09-20 (Amber Archive Sep 2013)

From: Thomas Cheatham <tec3.utah.edu>
Date: Fri, 20 Sep 2013 11:42:07 -0600 (Mountain Daylight Time)

> However, I still have some question about how to use option "epsilon" and
> "clusters".
> Normally, what case should we use "epsilon" or "clusters"?

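Epsilon specifies the distance cutoff at which clustering stops, while
"clusters" specifies the number of clusters at which to stop. If epsilon
is too small, you get a very large number of clusters, which is tricky
to analyze and understand.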

Neither option is inherently better; it is simply a preference.
Clustering results depend on the choices you make (clustering
algorithm, cutoff/epsilon, etc.), so you want to run multiple variants
of the clustering until you find a partitioning that helps explain
your data.
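
As a rough illustration of how the two stopping criteria behave, here is
a minimal Python sketch of average-linkage agglomerative clustering. It
is not ptraj's implementation: the function names (`agglomerate`,
`avg_linkage`) and the toy data are hypothetical, and the absolute
difference between 1D values stands in for the pairwise RMSD between
frames.

```python
from itertools import combinations


def avg_linkage(a, b, dist):
    """Average pairwise distance between the members of clusters a and b."""
    return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))


def agglomerate(points, dist, epsilon=None, clusters=None):
    """Bottom-up clustering that stops on a distance cutoff or a cluster count."""
    groups = [[p] for p in points]
    while len(groups) > 1:
        # "clusters"-style stop: the requested number of clusters is reached.
        if clusters is not None and len(groups) <= clusters:
            break
        # Find the closest pair of clusters.
        d, i, j = min(
            (avg_linkage(groups[i], groups[j], dist), i, j)
            for i, j in combinations(range(len(groups)), 2)
        )
        # "epsilon"-style stop: the closest pair is farther apart than the cutoff.
        if epsilon is not None and d > epsilon:
            break
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


# Toy 1D "frames" (assumption: |x - y| plays the role of pairwise RMSD).
frames = [0.0, 0.2, 0.4, 5.0, 5.3, 9.8, 10.0]
dist = lambda x, y: abs(x - y)

tight = agglomerate(frames, dist, epsilon=0.1)  # cutoff too small
loose = agglomerate(frames, dist, epsilon=1.0)  # more permissive cutoff
fixed = agglomerate(frames, dist, clusters=2)   # fixed number of clusters
print(len(tight), len(loose), len(fixed))
```

With this toy data the small cutoff leaves every frame in its own
cluster, which is the "too many clusters" situation described above,
while the larger cutoff and the fixed cluster count give coarser
partitionings of the same frames.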

> when I publish my result, is it necessary to analysis all frames?

Using the sieve doesn't miss frames; it just uses a subset of the frames
to do the initial clustering and then puts the skipped frames into the
existing clusters. As long as you do not miss a particular cluster with
the sieve, results should be comparable to clustering over all frames.
The way to check is to cluster multiple times (with different offsets,
sieve sizes, and/or random frame sieving) to see how the results compare.
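
The sieve idea and the suggested check can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: a simple
greedy "leader" clustering stands in for the real algorithm, the names
`sieve_cluster` and `nearest` are hypothetical, and 1D values with
absolute differences stand in for frames and RMSD.

```python
def nearest(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda k: abs(value - centers[k]))


def sieve_cluster(frames, sieve, offset, cutoff):
    """Cluster every `sieve`-th frame, then assign all frames to those clusters."""
    sampled = frames[offset::sieve]
    # Greedy leader clustering of the sieved subset (stand-in algorithm):
    # a frame starts a new cluster if it is farther than `cutoff` from all centers.
    centers = []
    for f in sampled:
        if not centers or abs(f - centers[nearest(f, centers)]) > cutoff:
            centers.append(f)
    # Every frame, sieved or skipped, goes into its closest existing cluster.
    labels = [nearest(f, centers) for f in frames]
    return centers, labels


# Toy data: two populated states near 0 and 5, plus one rare frame near 10.
frames = [0.1, 0.3, 5.1, 0.2, 4.9, 5.0, 0.0, 5.2, 9.9, 0.4, 5.3, 0.2]

# Repeat the clustering with different sieve offsets and compare.
counts = []
for offset in range(3):
    centers, labels = sieve_cluster(frames, sieve=3, offset=offset, cutoff=1.0)
    counts.append(len(centers))
    print("offset", offset, "clusters", len(centers), "frames assigned", len(labels))
```

In this deliberately contrived example the three offsets find 1, 2 and
3 clusters, so the sieve is clearly missing clusters for some offsets.
When repeated runs agree instead, that is some evidence the sieved
result is comparable to clustering over all frames.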

> I created the 2D-rms plot as your advice.
>
> > trajin mdcrd 1 32500 25
> > rms2d out 2drms.gnu :2-103@CA
>
> [image: Inline image 1]
> This 2D-rms plot how can help us?

Your plot is simply a 1D RMS and was likely calculated by ptraj, NOT
cpptraj. In ptraj, the command you want is 2drms.
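
To make the 1D versus 2D distinction concrete: a 1D RMS compares every
frame against a single reference, whereas a 2D RMS compares every frame
against every other frame and so gives an N x N matrix. The Python
sketch below is a hypothetical illustration, not what ptraj or cpptraj
does internally; in particular it assumes no superposition of the
frames before the RMSD is computed, which analysis programs usually
perform first.

```python
import math


def rmsd(a, b):
    """RMSD between two frames with identical atom ordering (no fitting)."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def rms2d(frames):
    """All-against-all RMSD matrix for a list of frames."""
    return [[rmsd(a, b) for b in frames] for a in frames]


# Three toy frames of two atoms each (made-up coordinates).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0)],
]

matrix = rms2d(frames)  # 2D RMS: N x N, symmetric, zero on the diagonal
rms1d = matrix[0]       # 1D RMS to the first frame is just one row of it

for row in matrix:
    print(" ".join(f"{v:6.3f}" for v in row))
```

Blocks of low values along the diagonal of such a matrix correspond to
groups of mutually similar frames, which is what makes the 2D plot a
useful guide when choosing a cutoff or a number of clusters.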

> In my case, "A snapshot may
> become a member of its closest cluster if the rmsd is smaller
> than a given cutoff (3 Å)." described at the paper,
> if I want to do the same analysis with the paper to give a cutoff value,
> should I set the epsilon option at the following input file ?

When you did this initially, you obtained more than 5000 clusters. If
that is what you want, great...

> Really, I did the clustering analysis used above input file, and compared
> best 3 most
> populated cluster's representative structure with the figure on the paper,
> there are very different.

Either your simulations are not converged or the simulations in the
published paper were not. I would compare the results from the different
replicas (not temperature sorted), which should be identical if
sufficient sampling was obtained.
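
One simple way to make that comparison, sketched here in Python, is to
cluster all replicas together and then compare the fraction of each
replica's frames that falls into each cluster. The cluster assignments
below are made-up numbers and the helper names (`populations`,
`max_difference`) are hypothetical; how large a difference is
acceptable is a judgment call not addressed here.

```python
from collections import Counter


def populations(labels, n_clusters):
    """Fraction of frames in each cluster, for one replica."""
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in range(n_clusters)]


def max_difference(pop_a, pop_b):
    """Largest absolute difference in cluster population between two replicas."""
    return max(abs(a - b) for a, b in zip(pop_a, pop_b))


# Made-up per-frame cluster assignments for three replicas.
replica_a = [0, 0, 1, 0, 2, 0, 1, 0]
replica_b = [0, 1, 1, 0, 2, 0, 0, 0]
replica_c = [2, 2, 2, 2, 1, 2, 2, 2]

pop_a = populations(replica_a, 3)
pop_b = populations(replica_b, 3)
pop_c = populations(replica_c, 3)

print("a vs b:", max_difference(pop_a, pop_b))  # same populations
print("a vs c:", max_difference(pop_a, pop_c))  # very different populations
```

Replicas a and b visit the clusters in the same proportions, as expected
for well-sampled replicas; replica c spends most of its time in a
cluster the others rarely visit, which would point to incomplete
sampling.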

> Maybe my initial extend structure at REMD simulation is not same with
> paper's structure. If the others parameter is same, is it possible to get
> the same result with published paper?

Yes, if both of you have demonstrated complete sampling (which is
exceedingly difficult, especially for a disordered or semi-ordered
protein).

--tec3

Received on Fri Sep 20 2013 - 11:00:03 PDT