Re: [AMBER] PTRAJ clustering from Jianyin Shao on 2011-06-07 (Amber Archive Jun 2011)

From: Jianyin Shao <jyshao2004.gmail.com>
Date: Tue, 7 Jun 2011 09:50:55 -0600

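I should say (the rms command needs the REFERENCE keyword so that frames are fitted to the structure loaded with the reference command):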

trajin Trajectory_file
reference cluster.rep.c0
rms REFERENCE out temp.txt mass @CA

Sorry for the inconvenience.

Jianyin Shao
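A minimal Python sketch of the last step, picking the frame whose RMS to the representative is (near) zero. It assumes temp.txt holds whitespace-separated columns with the frame number first and the RMS value second, and it skips header or comment lines. The function name and the 1e-3 Å tolerance are illustrative choices, not part of ptraj.

```python
def find_representative_frame(lines, tol=1e-3):
    """Return (frame, rms, is_match) for the row with the smallest RMS.

    Assumes each data row looks like "<frame> <rms> ..."; any row whose first
    two fields are not numeric (headers, comments, blanks) is skipped.
    """
    best = None
    for line in lines:
        fields = line.split()
        try:
            frame, rms = int(float(fields[0])), float(fields[1])
        except (ValueError, IndexError):
            continue  # blank, header, or comment line
        if best is None or rms < best[1]:
            best = (frame, rms)
    if best is None:
        raise ValueError("no numeric frame/rms rows found")
    return best[0], best[1], best[1] <= tol


if __name__ == "__main__":
    # Hypothetical excerpt standing in for temp.txt.
    example = ["#Frame   rms", "   1.00   1.2034", "   2.00   0.0000", "   3.00   0.8710"]
    print(find_representative_frame(example))  # → (2, 0.0, True)
    # With a real file: find_representative_frame(open("temp.txt"))
```

The returned frame is a read-in index, so it only equals the frame number in the original trajectory when every frame was read in.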

On Tue, Jun 7, 2011 at 9:44 AM, Jianyin Shao <jyshao2004.gmail.com> wrote:

> Hi Barbara,
>
> Yes, the structures are rotated and translated during clustering. So the
> coordinates will be changed. The best way to find out the ordinal number of
> a representative structure is to do an rms fit using the representative
> structure as a reference.
>
> trajin Trajectory_file
> reference cluster.rep.c0
> rms out temp.txt mass @CA
>
> Then check temp.txt to see which frame has an rms value of 0, or very
> close to 0. Also, I think cluster.c0 should contain all structures of the
> cluster, including the representative structure in cluster.rep.c0. The
> same applies to the other clusters.
>
> The reason we do not report the ordinal number of the representative
> structure lies in the way we read in the trajectory. Say you read in 500
> frames and do a clustering analysis on these 500 frames; you find that the
> second frame is the representative structure of cluster 0. However, that
> does not mean the second frame of your original trajectory file is the
> representative, unless you read in every frame with a plain "trajin
> Trajectory_file". You could read in one frame out of every ten, or you
> could read in multiple trajectory files, or a combination of the two. The
> clustering module has no way of knowing the details of the trajin
> commands, so we chose not to report the ordinal number of the
> representative structure.
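To make the point about trajin concrete, here is a small Python sketch that maps a read-in index back to (file, original frame). It assumes the ptraj form `trajin filename start stop offset` with 1-based, inclusive start/stop, and it needs explicit stop values because it does not open the files; the class, function, and file names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Trajin(NamedTuple):
    """One hypothetical trajin line: file plus 1-based, inclusive start/stop and a stride."""
    filename: str
    start: int
    stop: int
    offset: int = 1


def read_in_order(trajins: List[Trajin]) -> List[Tuple[str, int]]:
    """List (file, original frame number) in the order the frames are read in."""
    order: List[Tuple[str, int]] = []
    for t in trajins:
        order.extend((t.filename, f) for f in range(t.start, t.stop + 1, t.offset))
    return order


def original_frame(trajins: List[Trajin], read_index: int) -> Tuple[str, int]:
    """Map a 1-based read-in index (what the clustering sees) back to (file, frame)."""
    order = read_in_order(trajins)
    if not 1 <= read_index <= len(order):
        raise IndexError(f"read-in index {read_index} outside 1..{len(order)}")
    return order[read_index - 1]


if __name__ == "__main__":
    # Hypothetical input: every 10th frame of one file, then every 5th of another.
    trajins = [Trajin("run1.binpos", 1, 100, 10), Trajin("run2.binpos", 1, 50, 5)]
    print(original_frame(trajins, 2))   # → ('run1.binpos', 11)
    print(original_frame(trajins, 12))  # → ('run2.binpos', 6)
```

Here the "second frame" seen by the clustering is frame 11 of run1.binpos, which only the person who wrote the trajin lines can reconstruct.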
>
> Hope it helps.
>
> Best,
>
> Jianyin Shao
>
> On Tue, Jun 7, 2011 at 8:40 AM, Sander B. <bs3e09.soton.ac.uk> wrote:
>
>> Hi,
>>
>> I do two different forms of postprocessing with PTRAJ, namely just dumping
>> snapshots and clustering based on different criteria. For both tasks, I read
>> in the same set of 500 frames from one binpos file. So I dump 500 snapshots
>> (numbering 500, 510, 520 ... 5490) and do the clustering with the same 500
>> frames (I do that in two steps with two different ptraj.in files, both
>> reading in the same set of frames though). When clustering, two files are
>> produced (among others): firstly, the cluster.ci file which contains all
>> the members of one cluster (MODEL 1 to MODEL n-1, where n is the number of
>> cluster members) except the representative, and secondly, the
>> cluster.rep.ci file which contains the representative. In the
>> cluster.txt file, one can see which structure is in which cluster. So for
>> example:
>>
>> cluster0 XXXXXXXXXXXX...................................
>> cluster1 ............................XXXXXXXXXX...........
>> cluster2 ....................................................XXXXX
>>
>> and so on. In this example, cluster0 contains 12 structures, 11 of which
>> are in the cluster.c0 file and one of which is in the cluster.rep.c0 file.
>> Unfortunately, I could not find a statement anywhere saying which of the 12
>> structures is the rep, so that I am able to say snapshot 530 (or whichever
>> number) = rep of cluster0. The same problem occurs with the different MODELs
>> in the cluster.ci file. As they are numbered from 1 to n-1, it is not
>> clear (not stated anywhere) which MODEL corresponds to which frame read in.
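A Python sketch of the reading of cluster.txt described above: it collects the columns marked "X" for each cluster and converts them to snapshot numbers. It assumes, as this message does, that the i-th character of a row is the i-th frame read in, and that snapshots run 500, 510, ... in read-in order; a row wrapped onto the next line is re-joined. Function names are hypothetical.

```python
def parse_cluster_rows(lines):
    """Return {cluster_name: [1-based read-in indices marked 'X']}.

    Assumes each row is "<clusterN> <string of X/.>", where the i-th character
    is the i-th frame read in; a row wrapped onto following lines is re-joined.
    """
    rows = {}
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("cluster"):
            current = fields[0]
            rows[current] = "".join(fields[1:])
        elif current is not None:
            rows[current] += "".join(fields)  # continuation of a wrapped row
    return {name: [i + 1 for i, c in enumerate(s) if c == "X"]
            for name, s in rows.items()}


def snapshot_number(read_index, first=500, step=10):
    """Snapshot number of the read_index-th frame (500, 510, ..., as in this thread)."""
    return first + step * (read_index - 1)


if __name__ == "__main__":
    example = [
        "cluster0 " + "X" * 12 + "." * 8,
        "cluster1 " + "." * 12 + "X" * 4 + "." * 4,
        "cluster2",                  # row wrapped onto the next line
        "." * 16 + "X" * 4,
    ]
    members = parse_cluster_rows(example)
    print([snapshot_number(i) for i in members["cluster0"]])  # 500, 510, ..., 610
```

This recovers cluster membership only; which member is the representative still has to come from an rms fit against cluster.rep.c0.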
>>
>> In order to find out, I tried the following:
>>
>> I took the 500 snapshots previously generated with PTRAJ and read them in
>> like this:
>>
>> trajin snapshot500.pdb
>> trajin snapshot510.pdb
>> .
>> .
>> .
>> trajin snapshot5490.pdb
>>
>> and clustered them based on CA into 10 clusters.
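Typing 500 trajin lines by hand is error-prone; a short Python sketch can generate them in read-in order instead. The file-name pattern snapshot<N>.pdb and the 500 to 5490 range in steps of 10 come from this message; the function name is hypothetical.

```python
def trajin_lines(first=500, last=5490, step=10, pattern="snapshot{}.pdb"):
    """Build one 'trajin' line per snapshot file, in read-in order."""
    return [f"trajin {pattern.format(n)}" for n in range(first, last + 1, step)]


if __name__ == "__main__":
    # Redirect stdout into the ptraj input file, then append the clustering commands.
    print("\n".join(trajin_lines()))
```

Because the list is built in order, position k in it (counting from 1) is the k-th frame the clustering sees, so snapshot number = 500 + 10 * (k - 1).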
>>
>> Looking at the cluster.txt file, I was expecting the first "X" in the line
>> of cluster0 to be the first frame read in (snapshot 500), the second "X" the
>> second (snapshot 510) and so on. The last "X" in the line should, according
>> to this theory, be the twelfth frame read in (snapshot 610). So cluster0
>> should contain snapshots 500, 510, ..., 610. Eleven of these should be
>> MODELs 1 to 11 in the cluster.c0 file, one should be the cluster.rep.c0
>> file. But which MODEL is which snapshot and which one is the rep? To find
>> out, I took the X-coordinate of the first atom in the cluster.rep.c0 file
>> and searched the files of snapshots 500 to 610 for this number, but found
>> no matches. I did the same with all the 500 snapshot files I initially had
>> read in and again found no matches. To me it looks like during the
>> clustering process the structures are translated or rotated in some way. Is
>> that true? Why? But the main question is: When I read in 500 snapshots and
>> cluster them, is there a way of assigning the cluster MODELs and reps
>> exactly to the snapshots that were read in?
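Why a raw coordinate search fails after clustering can be shown with a structure and a rigidly moved copy of it. ptraj's rms command removes the rigid-body motion by least-squares superposition; the dependency-free Python sketch below instead uses the distance-matrix RMSD (dRMS), a different metric swapped in here only because it needs no SVD, but one that is likewise unchanged by rotation and translation. The coordinates and names are hypothetical.

```python
import math


def distance_rmsd(a, b):
    """RMS difference between the pairwise-distance matrices of two structures.

    a and b are equal-length sequences of (x, y, z) tuples in the same atom
    order. The value is unchanged by rotating or translating either structure.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two structures with the same number (>= 2) of atoms")
    total, pairs = 0.0, 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += diff * diff
            pairs += 1
    return math.sqrt(total / pairs)


if __name__ == "__main__":
    # Hypothetical 4-atom CA trace (Angstrom).
    ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
    # Same trace rotated 90 degrees about z, then translated.
    moved = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in ca]
    print(ca[0][0], moved[0][0])       # raw x-coordinates of atom 1 no longer match
    print(distance_rmsd(ca, moved))    # essentially 0: same structure, rigidly moved
```

A frame that differs from the representative only by rotation and translation gives a value near zero here, just as it gives an rms near zero after ptraj's fit.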
>>
>> I hope my description of the problem is understandable somehow...
>>
>> Many thanks in advance!
>> Barbara
>>
>>
>> _________________________
>> Barbara Sander, PhD Research Student
>> Jonathan W. Essex Group
>> School of Chemistry
>> University of Southampton
>> Highfield, Southampton SO17 1BJ
>> B27:2005
>> External: +44 2380595560
>> Internal: 25560
>> E-mail: bs3e09.soton.ac.uk
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>
>
Received on Tue Jun 07 2011 - 09:00:03 PDT