Re: [AMBER] clustering analysis in ptraj takes too long time

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Tue, 22 Nov 2011 13:07:44 -0500

Another option is to use the 'sieve' keyword to seed the initial
clusters with a smaller number of frames, after which a second pass is
made to add the rest of the frames. This speeds up clustering
considerably while providing comparable results. See the AmberTools
manual for more details. Also, I recommend checking out the following
paper if you haven't already:

Shao, J.; Tanner, S.W.; Thompson, N.; Cheatham, T.E. III. Clustering
molecular dynamics trajectories: 1. Characterizing the performance of
different clustering algorithms. J. Chem. Theory Comput., 2007, 3,
2312–2334.
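
For anyone who wants a feel for what a sieved, two-pass clustering does, here is a minimal Python sketch. It is not ptraj's implementation. The function names (`average_linkage`, `sieve_cluster`), the 1-D toy "frames", and the nearest-centroid rule used in the second pass are all assumptions made for illustration; check the AmberTools manual for what ptraj actually does.

```python
from itertools import combinations


def average_linkage(points, n_clusters):
    """Naive average-linkage agglomerative clustering on 1-D values.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average distance over all cross-cluster pairs.
            total = sum(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: b > a, so index a is unaffected
    return clusters


def sieve_cluster(frames, n_clusters, sieve):
    """Two-pass clustering: seed with every `sieve`-th frame, then add the rest."""
    # Pass 1: run the expensive clustering on the sieved subset only.
    seed_idx = list(range(0, len(frames), sieve))
    seed_clusters = average_linkage([frames[i] for i in seed_idx], n_clusters)
    clusters = [[seed_idx[i] for i in c] for c in seed_clusters]

    # Centroids are computed from the seed frames and kept fixed (assumption).
    centroids = [sum(frames[i] for i in c) / len(c) for c in clusters]

    # Pass 2: assign every skipped frame to the nearest seed centroid.
    seeded = set(seed_idx)
    for i, value in enumerate(frames):
        if i in seeded:
            continue
        nearest = min(range(len(centroids)),
                      key=lambda k: abs(value - centroids[k]))
        clusters[nearest].append(i)
    return [sorted(c) for c in clusters]
```

The point of the design is that the pairwise comparison only runs over the sieved subset, so a sieve of s cuts the number of pairs in the first pass by roughly a factor of s squared, while the second pass is linear in the number of remaining frames.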

-Dan

On Tue, Nov 22, 2011 at 8:08 AM, Alex Rodriguez <alexdepremia.gmail.com> wrote:
> Hi!
>
> I have never used ptraj's clustering facilities but, in general,
> clustering algorithms scale with the square of the number of elements
> involved. Try it with fewer snapshots, maybe not using
> all the snapshots of your trajectory but, say, only one in every five. I'm
> afraid that if you use all of them, it will take a really long time.
>
> Alex.
>
>
>
> On Tue, Nov 22, 2011 at 11:07 AM, liu junjun <ljjlp03.gmail.com> wrote:
>
>> Dear All,
>>
>> I have an MD trajectory consisting of ~22,000 snapshots for a small peptide,
>> which has only 537 atoms. I am trying to do a clustering analysis with ptraj,
>> but the clustering calculation has been running for more than 24 hours
>> without completing. This is very weird because the clustering analysis in
>> ptraj is much faster on a relatively small trajectory.
>>
>> ------------------ Some information that may be needed to diagnose the
>> problem ----------------------
>> ptraj version:            from AmberTools 1.5, all patches have been
>> applied
>> the clustering command:   cluster out testcluster representative pdb
>> average pdb averagelinkage clusters 5 rms @CA
>>
>> ------------------ test of a clustering analysis on a trajectory consisting
>> of 1881 snapshots -------------------------
>> the calculation time: 1.5 minutes
>> file size of PairwiseDistances: 14 MB
>> the last line of ClusterMerging.txt: -1877: -1735 -1875 7.723636 1.881472
>> 2697.106445
>>
>> ------------------ clustering analysis on the trajectory consisting of
>> ~22,000 snapshots -----------------------------
>> the calculation time: more than 24 hours and not finished yet
>> file size of PairwiseDistances: 2.0 GB
>> the last line of ClusterMerging.txt: -3930: 12251 -1653 1.057229
>> 352.6313
>>
>>
>> Any help is highly appreciated!
>>
>> Junjun
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>

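As a rough sanity check on the numbers quoted above, the following Python sketch extrapolates from the 1881-snapshot test to 22,000 snapshots. The helper name `pair_count` is made up for illustration, and the two timing estimates assume that run time scales purely with the square or the cube of the number of frames (naive hierarchical agglomerative clustering is cubic in the worst case). How ptraj's averagelinkage actually scales is not stated in this thread.

```python
def pair_count(n):
    """Number of unique frame pairs in an n-frame pairwise distance matrix."""
    return n * (n - 1) // 2


# Figures reported in the quoted message.
small_n, large_n = 1881, 22000
small_matrix_mb = 14.0       # PairwiseDistances size for the small test
small_minutes = 1.5          # run time for the small test

# Storage grows with the number of pairs, i.e. roughly quadratically.
pair_ratio = pair_count(large_n) / pair_count(small_n)
est_matrix_gb = small_matrix_mb * pair_ratio / 1024

# Run time under two assumed scaling laws.
frame_ratio = large_n / small_n
est_hours_quadratic = small_minutes * frame_ratio ** 2 / 60
est_hours_cubic = small_minutes * frame_ratio ** 3 / 60

print(f"pairs grow by a factor of about {pair_ratio:.0f}")
print(f"estimated PairwiseDistances size: about {est_matrix_gb:.1f} GB")
print(f"estimated run time, quadratic scaling: about {est_hours_quadratic:.1f} h")
print(f"estimated run time, cubic scaling: about {est_hours_cubic:.0f} h")
```

The estimated matrix size comes out close to the reported 2.0 GB file, which supports the quadratic growth in storage. The reported run time of more than 24 hours sits well above the quadratic estimate, so the merging step probably scales worse than quadratically here.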


-- 
-------------------------
Daniel R. Roe, PhD
Postdoctoral Associate
BioMaPS Institute, Rutgers University
610 Taylor Road
Piscataway, NJ   08854
Received on Tue Nov 22 2011 - 10:30:02 PST