Re: [AMBER] clustering places all frames in one cluster

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Thu, 25 Apr 2013 08:32:52 -0600

Hi,

One thing you may want to try (if you haven't already) is looking at a
2D-RMS plot of the trajectory you're trying to cluster. This will give
you an idea of how much the trajectory actually varies (at least in
RMSD-space), indicating how many clusters you can reasonably expect.
You may also want to try the new version of cpptraj (AmberTools13),
which supports clustering on data sets and has an implementation of a
density-based clustering algorithm, DBSCAN. Also, if you have a
multi-core machine (and configure for openmp) the pairwise calculation
in cluster is OpenMP parallelized.
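
To make the 2D-RMS and DBSCAN suggestions concrete, here is a rough Python sketch of both ideas. It is only an illustration and not how cpptraj implements either one: the function names (rmsd, pairwise_rmsd, dbscan) and the toy one-atom "frames" are made up for the example, and it assumes the frames have already been aligned to a common reference, so no best-fit is done before each RMSD.

```python
import math


def rmsd(a, b):
    """RMSD between two frames given as lists of (x, y, z); no fitting."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of atoms")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def pairwise_rmsd(frames):
    """Symmetric frame-by-frame RMSD matrix (the data behind a 2D-RMS plot)."""
    n = len(frames)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mat[i][j] = mat[j][i] = rmsd(frames[i], frames[j])
    return mat


def dbscan(dist, epsilon, min_points):
    """Density-based clustering on a precomputed distance matrix.

    Returns one label per frame: a cluster index, or -1 for noise.
    A frame counts itself among its own neighbours.
    """
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(n) if dist[i][j] <= epsilon]
        if len(neighbours) < min_points:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = [k for k in range(n) if dist[j][k] <= epsilon]
            if len(nb) >= min_points:  # j is a core point: keep expanding
                queue.extend(nb)
    return labels


# Toy data: seven one-atom frames forming two tight groups and one outlier.
frames = [[(x, 0.0, 0.0)] for x in (0.0, 1.0, 2.0, 50.0, 51.0, 52.0, 200.0)]
mat = pairwise_rmsd(frames)
labels = dbscan(mat, epsilon=1.5, min_points=2)
print(labels)
```

Looking at the spread of values in the matrix before clustering gives a feel for what range of epsilon is worth trying: an epsilon below most off-diagonal values tends to give many tiny clusters, and one above most of them tends to lump everything together.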

-Dan

On Thu, Apr 25, 2013 at 3:20 AM, Amparo Garcia Lopez
<Amparo.GarciaLopez.unige.ch> wrote:
> Thanks Dan. Yes, I did try all those epsilon values too. I'm still getting:
>
> "too many clusters, clusters beyond 62 will be represented using *"
>
> It gives me more than 200 clusters.
>
> I'll report any progress once I've tried the other clustering methods (different from average linkage), in case it might be of use to someone in the future. I'm feeling a bit lost here myself, as I'm still quite a beginner.
>
> Thanks again!
>
> Amparo Garcia-Lopez, Ph.D.
>
> Pharmaceutical Biochemistry
> School of Pharmaceutical Sciences
> University of Geneva
> Quai Ernest-Ansermet 30
> 1211 Genève 4 - Switzerland
>
> Tel: +41 (0)22 379 3376
> Fax: +41 (0)22 379 3360
>
> e-mail: Amparo.GarciaLopez.unige.ch
> ________________________________________
> From: Miguel Ortiz Lombardía [miguel.ortiz-lombardia.afmb.univ-mrs.fr]
> Sent: 22 April 2013 15:11
> To: amber.ambermd.org
> Subject: Re: [AMBER] clustering places all frames in one cluster
>
> I have a question regarding the "critical distance" (epsilon) metric.
> If I understand the Shao et al. paper correctly, this metric can be useful,
> among other things, to decide the number of clusters. However, when ptraj is
> run requiring a certain number of clusters, the critical distance is not
> reported (I see the average distance to the centroid within each cluster
> but not the distance between the cluster centroids). Or is it?
>
> Now, if we do it the other way round, that is, if we fix epsilon in
> ptraj, then we end up with the same problem we had at the beginning: we need
> to choose a certain number of clusters to decide what epsilon to apply,
> since there seems to be no clear-cut criterion for epsilon given a
> clustering method. Or have I misunderstood something (very possible, even
> probable)?
>
> I would appreciate insights from more clever and experienced people.
> Cheers,
>
> Miguel Ortiz Lombardía
>
> Architecture et Fonction des Macromolécules Biologiques (UMR7257)
> CNRS, Aix-Marseille Université
> Case 932, 163 Avenue de Luminy, 13288 Marseille cedex 9, France
> Tel: +33(0) 491 82 55 93
> Fax: +33(0) 491 26 67 20
> mailto:miguel.ortiz-lombardia.afmb.univ-mrs.fr
> http://www.afmb.univ-mrs.fr/Miguel-Ortiz-Lombardia
>
> On 22/04/13 07:20, Vaibhav Dixit wrote:
>> Hi,
>> Did you try other epsilon values like 1.9, 2.0, 2.1, 2.2 or 2.5, 2.6?
>>
>>
>> On Fri, Apr 19, 2013 at 1:44 PM, Amparo Garcia Lopez <
>> Amparo.GarciaLopez.unige.ch> wrote:
>>
>>> Hi Dan,
>>>
>>> thanks for your reply.
>>>
>>> So I tried using the sieve option, and I got the same thing: all frames in
>>> one cluster. I tried using epsilon (2.3) instead of cluster count and then
>>> it created ~200 representative PDBs!
>>>
>>> Yes, I read the paper you're mentioning back in the day. I'll have to read
>>> it again and try some other method, as average linkage isn't making
>>> much sense to me. Any suggestions on what to try first?
>>>
>>> Thanks very much,
>>> Amparo
>>>
>>>
>>> Amparo Garcia-Lopez, Ph.D.
>>>
>>> Pharmaceutical Biochemistry
>>> School of Pharmaceutical Sciences
>>> University of Geneva
>>> Quai Ernest-Ansermet 30
>>> 1211 Genève 4 - Switzerland
>>>
>>> Tel: +41 (0)22 379 3376
>>> Fax: +41 (0)22 379 3360
>>>
>>> e-mail: Amparo.GarciaLopez.unige.ch
>>> ________________________________________
>>> From: Daniel Roe [daniel.r.roe.gmail.com]
>>> Sent: 15 April 2013 16:09
>>> To: AMBER Mailing List
>>> Subject: Re: [AMBER] clustering places all frames in one cluster
>>>
>>> Hi,
>>>
>>> On Mon, Apr 15, 2013 at 2:26 AM, Amparo Garcia Lopez
>>> <Amparo.GarciaLopez.unige.ch> wrote:
>>>> What I get is 10 clusters in which 19,991 frames are in the first
>>>> cluster (c0), and the rest of the clusters have only one point in them.
>>>>
>>>> This is not possible; I've done clustering of this RNA before using different
>>>> settings (e.g., collecting 1 frame every 100 instead of every 20) and I got
>>>> clusters with nicely distributed occurrences.
>>>> What do you think I have done wrong? Maybe 10 clusters is too many,
>>>> maybe collecting 1 every 20 is not a good idea? We are talking about a
>>>> trajectory of 400 ns here.
>>>
>>> Clustering is very much an art form; there really is no
>>> one-size-fits-all procedure for clustering every trajectory. Settings
>>> that work well on one system may give poor results on another (as you
>>> have seen). Have you tried varying epsilon instead of cluster count,
>>> or trying some of the other clustering algorithms available in ptraj?
>>> You can also try the 'sieve' option, which will first cluster on a
>>> reduced set (similar to the offset method) then add frames back in
>>> based on similarity to cluster centroids. If you haven't yet, I highly
>>> recommend you read through the clustering journal article from Shao &
>>> Cheatham et al., JCTC (2007) v 3 p 2312-2334.
>>>
>>> -Dan
>>>
>>> --
>>> -------------------------
>>> Daniel R. Roe, PhD
>>> Department of Medicinal Chemistry
>>> University of Utah
>>> 30 South 2000 East, Room 201
>>> Salt Lake City, UT 84112-5820
>>> http://home.chpc.utah.edu/~cheatham/
>>> (801) 587-9652
>>> (801) 585-9119 (Fax)
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>
>>
>>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber



-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 201
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-9119 (Fax)
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Apr 25 2013 - 08:00:02 PDT