[AMBER] PTRAJ clustering from Sander B. on 2011-06-07 (Amber Archive Jun 2011)

From: Sander B. <bs3e09.soton.ac.uk>
Date: Tue, 7 Jun 2011 15:40:12 +0100

Hi,

I do two different kinds of post-processing with PTRAJ: dumping snapshots, and clustering based on different criteria. For both tasks, I read in the same set of 500 frames from one binpos file. So I dump 500 snapshots (numbered 500, 510, 520, ..., 5490) and cluster the same 500 frames (in two separate steps with two different ptraj.in files, both reading in the same set of frames). Clustering produces, among other files, the cluster.ci files (cluster.c0, cluster.c1, ...) and the cluster.rep.ci files. Each cluster.ci file contains all members of one cluster except the representative (MODEL 1 to MODEL n-1, where n is the number of cluster members), and the corresponding cluster.rep.ci file contains the representative. The cluster.txt file shows which structure is in which cluster, for example:

cluster0 XXXXXXXXXXXX...................................
cluster1 ............................XXXXXXXXXX...........
cluster2 ....................................................XXXXX

and so on. In this example, cluster0 contains 12 structures: 11 of them are in the cluster.c0 file and one is in the cluster.rep.c0 file. Unfortunately, I could not find anything stating which of the 12 structures is the representative, so I cannot say, for example, that snapshot 530 is the representative of cluster0. The same problem occurs with the MODELs in the cluster.ci files: they are numbered from 1 to n-1, and nothing states which MODEL corresponds to which frame read in.
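The bookkeeping in question (which character of a cluster.txt line corresponds to which snapshot) can be made explicit with a short Python sketch. It assumes the layout shown in the example, a cluster label followed by one character per frame with "X" marking membership, and that the i-th character stands for the i-th frame read in; the post does not confirm either assumption from PTRAJ documentation. The first snapshot number (500) and the spacing (10) come from the post; the function names are hypothetical.

```python
# Assumed values taken from the post: first snapshot 500, spacing 10.
FIRST_SNAPSHOT = 500
STEP = 10


def snapshot_numbers(pattern, first=FIRST_SNAPSHOT, step=STEP):
    """Snapshot numbers for the 'X' positions in one membership string.

    Assumes character i (counting from 0) stands for the i-th frame read in.
    """
    return [first + i * step for i, ch in enumerate(pattern) if ch == "X"]


def parse_cluster_lines(lines, first=FIRST_SNAPSHOT, step=STEP):
    """Map each 'clusterN' label to the snapshot numbers of its members.

    Only lines of the form 'clusterN <pattern>' are used; others are skipped.
    """
    clusters = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("cluster"):
            clusters[parts[0]] = snapshot_numbers(parts[1], first, step)
    return clusters


if __name__ == "__main__":
    example = ["cluster0 " + "X" * 12 + "." * 488]
    for label, snaps in parse_cluster_lines(example).items():
        print(label, len(snaps), snaps[0], snaps[-1])  # cluster0 12 500 610
    # On real output (file name as in the post), something like:
    # with open("cluster.txt") as fh:
    #     clusters = parse_cluster_lines(fh)
```

This only recovers cluster membership under the stated assumption; it says nothing about which member is the representative or which MODEL is which frame.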

In order to find out, I tried the following:

I took 500 snapshots previously generated with PTRAJ, read them in like this:

trajin snapshot500.pdb
trajin snapshot510.pdb
.
.
.
trajin snapshot5490.pdb

and clustered them based on the CA atoms into 10 clusters.
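Rather than typing the 500 trajin lines by hand, they can be generated. A minimal Python sketch, assuming the snapshot<N>.pdb naming shown above with N running from 500 to 5490 in steps of 10; the helper name is hypothetical, and the printed lines would still have to be combined with the rest of the ptraj.in input, which is not reproduced here.

```python
def trajin_lines(first=500, last=5490, step=10, prefix="snapshot", suffix=".pdb"):
    """One 'trajin' line per snapshot file, in increasing snapshot order.

    Defaults follow the naming in the post (snapshot500.pdb ... snapshot5490.pdb).
    """
    return [f"trajin {prefix}{n}{suffix}" for n in range(first, last + 1, step)]


if __name__ == "__main__":
    lines = trajin_lines()
    # 500 lines: "trajin snapshot500.pdb" ... "trajin snapshot5490.pdb"
    print("\n".join(lines))
```

Generating the list in sorted order also makes the intended frame order explicit: if ptraj reads the files in the listed order, line i (counting from 1) is frame i.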

Looking at the cluster.txt file, I expected the first "X" in the cluster0 line to be the first frame read in (snapshot 500), the second "X" the second frame (snapshot 510), and so on. By this reasoning, the last "X" in the line should be the twelfth frame read in (snapshot 610), so cluster0 should contain snapshots 500, 510, ..., 610. Eleven of these should be MODELs 1 to 11 in the cluster.c0 file, and one should be the structure in the cluster.rep.c0 file. But which MODEL is which snapshot, and which one is the representative?

To find out, I took the X coordinate of the first atom in the cluster.rep.c0 file and searched for this number in the files of snapshots 500 to 610. I found no matches. I repeated this with all 500 snapshot files I had originally read in, and again found no matches. It looks to me as if the structures are translated or rotated in some way during clustering. Is that true, and if so, why?

But the main question is: when I read in 500 snapshots and cluster them, is there a way to assign the cluster MODELs and representatives exactly to the snapshots that were read in?
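If, as suspected, the cluster output is rotated and/or translated relative to the input frames, matching raw coordinates cannot work, but a comparison that ignores rigid-body motion can. The Python sketch below uses one such measure, the RMS difference of intramolecular pair distances (often called dRMS), which is unchanged by rotation and translation, to find the snapshot a given MODEL or representative matches. Assumptions: the snapshot and cluster files are PDB files listing atoms in the same order; only CA atoms are compared (as in the clustering above); all function names are hypothetical. It is a stand-in for a superposition-based RMSD (e.g. Kabsch fitting), chosen here because it needs no linear-algebra library.

```python
import math


def read_pdb_models(lines):
    """Split PDB text into models, each a list of (atom_name, x, y, z).

    MODEL/ENDMDL records separate models; a file without them is one model.
    Coordinates are read from the fixed PDB columns 31-54.
    """
    models, current = [], []
    for line in lines:
        if line.startswith("MODEL"):
            if current:  # tolerate a missing ENDMDL before the next MODEL
                models.append(current)
            current = []
        elif line.startswith("ENDMDL"):
            models.append(current)
            current = []
        elif line.startswith(("ATOM", "HETATM")):
            current.append((line[12:16].strip(), float(line[30:38]),
                            float(line[38:46]), float(line[46:54])))
    if current:
        models.append(current)
    return models


def select(model, atom_name="CA"):
    """Keep only the coordinates of atoms with the given name (default: CA)."""
    return [(x, y, z) for name, x, y, z in model if name == atom_name]


def drms(a, b):
    """RMS difference of all pairwise distances within each structure.

    Rotating or translating a structure leaves its internal distances
    unchanged, so this is ~0 when b is a rigid-body copy of a.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("structures need the same number (>= 2) of atoms")
    total, pairs = 0.0, 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            d = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += d * d
            pairs += 1
    return math.sqrt(total / pairs)


def closest_snapshot(target, snapshots):
    """Return (label, dRMS) of the snapshot most similar to target.

    snapshots maps a label (e.g. a snapshot number) to a coordinate list.
    """
    return min(((label, drms(target, coords)) for label, coords in snapshots.items()),
               key=lambda item: item[1])


def load_ca(path):
    """Hypothetical convenience wrapper: CA coordinates of every model in a PDB file."""
    with open(path) as fh:
        return [select(m) for m in read_pdb_models(fh)]


if __name__ == "__main__":
    # Self-check: a copy rotated 90 degrees about z and then shifted has
    # different raw coordinates but the same internal distances.
    a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]
    b = [(-y + 3.0, x - 2.0, z + 1.0) for x, y, z in a]
    c = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 2.0)]
    print(a[0] == b[0], drms(a, b) < 1e-9)  # False True
    print(closest_snapshot(a, {"copy": b, "other": c})[0])  # copy
    # On real files (names as in the post), something like:
    # rep = load_ca("cluster.rep.c0")[0]
    # snaps = {n: load_ca(f"snapshot{n}.pdb")[0] for n in range(500, 5500, 10)}
    # print(closest_snapshot(rep, snaps))
```

A value near zero (not exactly zero, because PDB coordinates are rounded) identifies the snapshot; repeating this for each MODEL in cluster.c0 gives the MODEL-to-snapshot mapping. If no snapshot comes close to zero, the representative may not be one of the input frames at all (for example an averaged structure), which the post does not settle. The cost grows with the square of the number of atoms compared, so pure Python stays manageable only for small selections such as CA atoms.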

I hope my description of the problem is understandable somehow...

Many thanks in advance!
Barbara

_________________________
Barbara Sander, PhD Research Student
Jonathan W. Essex Group
School of Chemistry
University of Southampton
Highfield, Southampton SO17 1BJ
B27:2005
External: +44 2380595560
Internal: 25560
E-mail: bs3e09.soton.ac.uk

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jun 07 2011 - 08:00:03 PDT