Re: [AMBER] PTRAJ clustering

From: Sander B.
Date: Wed, 8 Jun 2011 09:11:42 +0100

Hi Jianyin,

Many thanks for this; that was very helpful!

I understand what you say regarding the way frames are read in. But just to confirm: if I read in 500 single snapshots as described, is it correct that the first "X" is definitely the first snapshot read in, the second "X" is the second snapshot, and so on? So can I say that if

cluster0 XXX..............
cluster1 ......XX..........

snapshots 500, 510 and 520 are definitely in cluster 0?

Is the same true if I read in the frames like this:

trajin Trajectory_file 500 5490 10 ?
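
The mapping being asked about can be written down as a short Python sketch. It assumes, as the question itself does, that column i of a cluster.txt row corresponds to the i-th frame read in, and that frames were read with a single "trajin file start stop offset" line. The function name and default arguments are illustrative only and are not part of ptraj.

```python
def members_to_frames(cluster_line, start=500, offset=10):
    """Map the 'X' columns of one cluster.txt row to original frame numbers.

    Assumption: column i (0-based) is the (i+1)-th frame read in, and
    frames were read with a single 'trajin file start stop offset' line.
    """
    label, pattern = cluster_line.split(None, 1)
    pattern = pattern.strip()
    frames = [start + i * offset for i, ch in enumerate(pattern) if ch == "X"]
    return label, frames


label, frames = members_to_frames("cluster0 XXX..............")
print(label, frames)  # cluster0 [500, 510, 520]
```

If the read-in order assumption does not hold, the numbers produced this way are meaningless, so it is worth confirming them against an rms fit to the representative structure.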

Thank you again!

Barbara Sander, PhD Research Student
Jonathan W. Essex Group
School of Chemistry
University of Southampton
Highfield, Southampton SO17 1BJ
External: +44 2380595560
Internal: 25560

-----Original Message-----
From: Jianyin Shao
Sent: 07 June 2011 16:44
To: AMBER Mailing List
Subject: Re: [AMBER] PTRAJ clustering

Hi Barbara,

Yes, the structures are rotated and translated during clustering. So the coordinates will be changed. The best way to find out the ordinal number of a representative structure is to do an rms fit using the representative structure as a reference.

trajin Trajectory_file
reference cluster.rep.c0
rms reference out temp.txt mass @CA

Then check temp.txt to see which frame has an rms value of 0 or very close to 0. Also, I think cluster.c0 should contain all structures of the cluster, including the representative structure, cluster.rep.c0. The same applies to the other clusters.
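
Scanning temp.txt by eye gets tedious with hundreds of frames. Here is a minimal Python sketch for picking out the best-matching frame. It assumes the file has two whitespace-separated columns, the frame index (or time) followed by the rms value; that layout is an assumption and should be checked against your own output. The function name and the sample values are made up for illustration.

```python
def closest_frame(lines):
    """Return (index, rms) for the row with the smallest rms value.

    Assumption: each data row is '<frame index or time> <rms>'.
    Rows that do not parse as two numbers are skipped.
    """
    best = None
    for line in lines:
        fields = line.split()
        try:
            index, rms = float(fields[0]), float(fields[1])
        except (IndexError, ValueError):
            continue  # blank line, header or comment
        if best is None or rms < best[1]:
            best = (index, rms)
    return best


# Made-up sample rows standing in for the contents of temp.txt.
sample = ["1.00 2.31", "2.00 0.00", "3.00 1.87"]
print(closest_frame(sample))  # (2.0, 0.0)
```

For a real run, pass an open file handle instead of the sample list, for example closest_frame(open("temp.txt")). The index returned is the read-in ordinal, not necessarily the frame number in the original trajectory file.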

The reason we do not report the ordinal number of the representative structure lies in the way the trajectory is read in. Say you read in 500 frames and run a clustering analysis on them, and you find that the second frame is the representative structure of cluster 0. That does not mean the second frame of your original trajectory file is the representative, unless you read in every frame with a plain "trajin Trajectory_file". You could read in one frame out of every ten, or read in multiple trajectory files, or combine the two. The clustering module has no way of knowing the details of the trajin commands, so we chose not to report the ordinal number of the representative structure.
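
To illustrate why the read-in ordinal and the original frame number can differ, here is a small Python sketch that rebuilds the read-in order from a list of trajin specifications. The file names are hypothetical, and the sketch assumes each trajin line has the form "file start stop offset" with an inclusive stop that does not exceed the file length.

```python
def read_in_order(trajins):
    """List (filename, original_frame) pairs in the order frames are read in.

    trajins: (filename, start, stop, offset) tuples in script order.
    Assumption: stop is inclusive and within the length of the file.
    """
    order = []
    for name, start, stop, offset in trajins:
        for frame in range(start, stop + 1, offset):
            order.append((name, frame))
    return order


# Hypothetical input: every 10th frame of one file, every 5th of another.
order = read_in_order([("a.traj", 1, 100, 10), ("b.traj", 5, 50, 5)])
print(order[1])   # ('a.traj', 11)
print(order[10])  # ('b.traj', 5)
```

Here the second frame read in is frame 11 of the first file, and the eleventh frame read in comes from the second file, which is the bookkeeping the clustering module cannot do on its own.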

Hope it helps.


Jianyin Shao

On Tue, Jun 7, 2011 at 8:40 AM, Sander B. wrote:

> Hi,
> I do two different forms of postprocessing with PTRAJ, namely just
> dumping snapshots and clustering based on different criteria. For both
> tasks, I read in the same set of 500 frames from one binpos file. So I
> dump 500 snapshots (numbering 500, 510, 520 ... 5490) and do the
> clustering with the same 500 frames (I do that in two steps with two
> different files, both reading in the same set of frames
> though). When clustering, two files are produced (among others):
> firstly, the file which contains all the members of one
> cluster (MODEL 1 to MODEL n-1, where n is the number of cluster
> members) except the representative and secondly, the
> file which contains the representative. In the cluster.txt file, one
> can see which structure is in which cluster. So for example:
> cluster0 XXXXXXXXXXXX...................................
> cluster1 ............................XXXXXXXXXX...........
> cluster2 ....................................................XXXXX
> and so on. In this example, cluster0 contains 12 structures, 11 of
> which are in the cluster.c0 file and one of which is in the cluster.rep.c0 file.
> Unfortunately, I could not find a statement anywhere saying which of
> the 12 structures is the rep, so that I am able to say snapshot 530
> (or whichever number) = rep of cluster0. The same problem occurs with
> the different
> MODELs in the file. As they are numbered from 1 to n-1, it
> is not clear (not stated anywhere) which MODEL is which frame read in.
> In order to find out, I tried the following:
> I took 500 snapshots previously generated with PTRAJ, read them in
> like this:
> trajin snapshot500.pdb
> trajin snapshot510.pdb
> .
> .
> .
> trajin snapshot5490.pdb
> and clustered them based on CA into 10 clusters.
> Looking at the cluster.txt file I was expecting the first "X" in the
> line of cluster0 to be the first frame read in (snapshot 500), the
> second "X" the second (snapshot 510) and so on. The last "X" in the
> line should, according to this theory, be the twelfth frame read in
> (snapshot 610). So cluster0 should contain snapshots 500, 510, ...,
> 610. Eleven of these should be MODELs 1 to 11 in the cluster.c0 file,
> one should be the cluster.rep.c0 file. But which MODEL is which
> snapshot and which one is the rep? To find out, I took the
> X-coordinate of the first atom in the cluster.rep.c0 file and looked
> in the files of snapshots 500 to 610 for this number. And found no
> matches. I did the same with all the 500 snapshot files I initially
> had read in and again found no matches. To me it looks like during
> the clustering process the structures are translated or rotated in
> some way. Is that true? Why? But the main question is: When I read in
> 500 snapshots and cluster them, is there a way of assigning the
> cluster MODELs and reps exactly to the snapshots that were read in?
> I hope my description of the problem is understandable somehow...
> Many thanks in advance!
> Barbara
> _________________________
> Barbara Sander, PhD Research Student
> Jonathan W. Essex Group
> School of Chemistry
> University of Southampton
> Highfield, Southampton SO17 1BJ
> B27:2005
> External: +44 2380595560
> Internal: 25560
> _______________________________________________
> AMBER mailing list

Received on Wed Jun 08 2011 - 01:30:06 PDT