Re: [AMBER] PTRAJ clustering from Jianyin Shao on 2011-06-07 (Amber Archive Jun 2011)

From: Jianyin Shao <jyshao2004.gmail.com>
Date: Tue, 7 Jun 2011 09:44:21 -0600

Hi Barbara,

Yes, the structures are rotated and translated during clustering, so the
output coordinates will not match the input coordinates. The best way to
find the ordinal number of a representative structure is to do an rms fit
using the representative structure as a reference.

trajin Trajectory_file
reference cluster.rep.c0
rms reference out temp.txt mass @CA

Then check temp.txt to see which frame has an rms value of 0 or very close
to 0. Also, I think cluster.c0 should contain all structures of the cluster,
including the representative structure that is written to cluster.rep.c0.
The same applies to the other clusters.
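
If you would rather not scan temp.txt by eye, a small script can pick out the
frame for you. This is only a sketch: it assumes temp.txt has two
whitespace-separated columns (frame number, then rms value), and the function
name closest_frame is made up for this example.

```python
def closest_frame(rms_text):
    """Return (frame, rms) for the row with the smallest rms value.

    Assumes two whitespace-separated columns: frame number, rms value.
    Rows that cannot be parsed (headers, comments, blank lines) are skipped.
    """
    best = None
    for line in rms_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            # The frame column may be written as a float such as "2.00".
            frame = int(float(fields[0]))
            rms = float(fields[1])
        except ValueError:
            continue
        if best is None or rms < best[1]:
            best = (frame, rms)
    return best


# Made-up sample data standing in for the contents of temp.txt.
sample = """\
1 1.8342
2 0.0001
3 2.1077
"""
print(closest_frame(sample))  # prints (2, 0.0001)
```

For a real run you would pass open("temp.txt").read() instead of the sample
string.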

The reason we do not report the ordinal number of the representative
structure lies in the way the trajectory is read in. Say you read in 500
frames and do a clustering analysis on these 500 frames, and you find that
the second frame is the representative structure of cluster 0. That does
not mean the second frame of your original trajectory file is the
representative, unless you read in every frame of a single file with a plain
"trajin Trajectory_file". You could read in only one frame out of every ten,
or you could read in multiple trajectory files, or a combination of the two.
The clustering module has no access to the details of the trajin commands,
so we chose not to report the ordinal number of the representative
structure.
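
If you know exactly how the frames were read in, you can do this bookkeeping
yourself. The sketch below is not part of ptraj: it assumes each input was
given as "trajin filename start stop offset" with 1-based, inclusive bounds
and a stop value that does not exceed the real number of frames. The function
name and the file names are invented for illustration.

```python
def original_frame(ordinal, inputs):
    """Map a 1-based ordinal among the frames read in to (file, frame).

    inputs is a list of (filename, start, stop, offset) tuples, in the
    order of the trajin commands. The returned frame is the 1-based frame
    number inside that file.
    """
    if ordinal < 1:
        raise IndexError("ordinal must be 1 or larger")
    remaining = ordinal
    for name, start, stop, offset in inputs:
        # Number of frames this trajin line contributes.
        count = (stop - start) // offset + 1 if stop >= start else 0
        if remaining <= count:
            return name, start + (remaining - 1) * offset
        remaining -= count
    raise IndexError("ordinal exceeds the number of frames read in")


# Hypothetical setup: every 10th frame of one file, every 5th of another.
inputs = [("a.binpos", 1, 100, 10), ("b.binpos", 1, 50, 5)]
print(original_frame(2, inputs))   # prints ('a.binpos', 11)
print(original_frame(12, inputs))  # prints ('b.binpos', 6)
```

Here the second frame seen by the clustering is frame 11 of the first file,
not frame 2, which is exactly the ambiguity described above.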

Hope it helps.

Best,

Jianyin Shao

On Tue, Jun 7, 2011 at 8:40 AM, Sander B. <bs3e09.soton.ac.uk> wrote:

> Hi,
>
> I do two different forms of postprocessing with PTRAJ, namely just dumping
> snapshots and clustering based on different criteria. For both tasks, I read
> in the same set of 500 frames from one binpos file. So I dump 500 snapshots
> (numbering 500, 510, 520 ... 5490) and do the clustering with the same 500
> frames (I do that in two steps with two different ptraj.in files, both
> reading in the same set of frames though). When clustering, two files are
> produced (among others): firstly, the cluster.ci file which contains all
> the members of one cluster (MODEL 1 to MODEL n-1, where n is the number of
> cluster members) except the representative and secondly, the
> cluster.rep.ci file which contains the representative. In the cluster.txt
> file, one can see which structure is in which cluster. So for example:
>
> cluster0 XXXXXXXXXXXX...................................
> cluster1 ............................XXXXXXXXXX...........
> cluster2 ....................................................XXXXX
>
> and so on. In this example, cluster0 contains 12 structures, 11 of which
> are in the cluster.c0 file and one of which is in the cluster.rep.c0 file.
> Unfortunately, I could not find a statement anywhere saying which of the 12
> structures is the rep, so that I am able to say snapshot 530 (or whichever
> number) = rep of cluster0. The same problem occurs with the different MODELs
> in the cluster.ci file. As they are numbered from 1 to n-1, it is not
> clear (not stated anywhere) which MODEL is which frame read in.
>
> In order to find out, I tried the following:
>
> I took 500 snapshots previously generated with PTRAJ, read them in like
> this:
>
> trajin snapshot500.pdb
> trajin snapshot510.pdb
> .
> .
> .
> trajin snapshot5490.pdb
>
> and clustered them based on CA into 10 clusters.
>
> Looking at the cluster.txt file I was expecting the first "X" in the line
> of cluster0 to be the first frame read in (snapshot 500), the second "X" the
> second (snapshot 510) and so on. The last "X" in the line should, according
> to this theory, be the twelfth frame read in (snapshot 610). So cluster0
> should contain snapshots 500, 510, ..., 610. Eleven of these should be
> MODELs 1 to 11 in the cluster.c0 file, one should be the cluster.rep.c0
> file. But which MODEL is which snapshot and which one is the rep? To find
> out, I took the X-coordinate of the first atom in the cluster.rep.c0 file
> and looked in the files of snapshots 500 to 610 for this number. And found
> no matches. I did the same with all the 500 snapshot files I initially had
> read in and again found no matches. To me it looks like during the
> clustering process the structures are translated or rotated in some way. Is
> that true? Why? But the main question is: When I read in 500 snapshots and
> cluster them, is there a way of assigning the cluster MODELS and reps
> exactly to the snapshots that were read in?
>
> I hope my description of the problem is understandable somehow...
>
> Many thanks in advance!
> Barbara
>
>
> _________________________
> Barbara Sander, PhD Research Student
> Jonathan W. Essex Group
> School of Chemistry
> University of Southampton
> Highfield, Southampton SO17 1BJ
> B27:2005
> External: +44 2380595560
> Internal: 25560
> E-mail: bs3e09.soton.ac.uk
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Tue Jun 07 2011 - 09:00:02 PDT