[AMBER] About dpeaks implementation in cpptraj from Casalini Tommaso on 2018-11-19 (Amber Archive Nov 2018)

From: Casalini Tommaso <tommaso.casalini.chem.ethz.ch>
Date: Mon, 19 Nov 2018 13:49:19 +0000

Dear Amber users and developers,
I have run TREMD simulations with the implicit solvent method previously described in a JACS paper by Nguyen et al. to study the conformations of a peptide of interest, and I would like to use the dpeaks clustering method of Rodriguez and Laio.

If I request representative structures with the "repout" keyword, the index of the output files runs from 0 to the number of clusters minus one, which corresponds (according to the printed text) to going from the lowest density value to the highest. In other words, from the least populated cluster to the most "crowded" one.
Indeed, if I have understood correctly, the most populated cluster is the one with the highest density.
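
To make the term "density" concrete, here is a minimal Python sketch of the two quantities used in the density-peaks approach with a discrete density: the number of neighbours within epsilon, and the distance to the nearest point of higher density. It is not cpptraj's implementation; the function name density_peaks and the one-dimensional toy data are assumptions made only for illustration.

```python
import math

def density_peaks(points, epsilon):
    """Return (rho, delta) for a list of coordinate tuples.

    rho[i]   = number of other points closer than epsilon to point i
    delta[i] = distance from point i to the nearest point with higher rho;
               for a point with no denser neighbour, the largest distance
               from it to any other point is used instead.
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < epsilon)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Toy data (hypothetical): a tight group of 5 points and a broader group of 8.
tight = [(0.4 * k,) for k in range(5)]
broad = [(10.0 + 0.9 * k,) for k in range(8)]
rho, delta = density_peaks(tight + broad, epsilon=1.0)

print("rho, tight group:", rho[:5])
print("rho, broad group:", rho[5:])
```

In this toy data the densest point (4 neighbours) sits in the group of 5 points, while no point in the group of 8 has more than 2 neighbours. The density of a cluster centre and the number of members of that cluster therefore do not have to rank clusters in the same order.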

In the summary file, however, the cluster with the lowest index appears to be the most populated one, since it contains the highest fraction of frames. There seems to be a discrepancy in how the indices are used. My question is: which one is right?

This is an example of the output:

ANALYSIS: Performing 1 analyses:
  0: [cluster C1 dpeaks epsilon 2.0 choosepoints manual distancecut 8.0 densitycut 50.0 dvdfile Distvsdens.dat rms @CA sieve 2 random info info.dat summary summary_1block.dat repout rep_1block repfmt pdb singlerepout singlerep_1block.nc singlerepfmt netcdf ]
        Starting clustering.
        Mask [@CA] corresponds to 46 atoms.
Random_Number: seed is <= 0, using wallclock time as seed (970000)
        Estimated pair-wise matrix memory usage: > 799.960 MB
        Pair-wise matrix set up with sieve, 40000 frames, 20000 sieved frames.
        Calculating pair-wise distances.
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% Complete.
        Memory used by pair-wise matrix and other cluster data: 800.200 MB
        Starting DPeaks clustering, discrete density calculation.
        Determining local density of each point.
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Complete.
        Finding closest neighbor point with higher density for each point.
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Complete.
        Point 19880 (frame 8206, density 84) selected as candidate for cluster 0
        Point 19940 (frame 23297, density 93) selected as candidate for cluster 1
        Point 19970 (frame 7606, density 98) selected as candidate for cluster 2
        Point 19999 (frame 17848, density 139) selected as candidate for cluster 3
        Identified 4 cluster centers from density vs distance peaks.
        Restoring sieved frames.
FIXME: Adding sieved frames not yet supported.
Warning: Within cluster average distance (AvgDist) does not include sieved frames.
        Writing 'singlerep_1block.nc' as Amber NetCDF
        Writing 'rep_1block.c0.pdb' as PDB
        Writing 'rep_1block.c1.pdb' as PDB
        Writing 'rep_1block.c2.pdb' as PDB
        Writing 'rep_1block.c3.pdb' as PDB

But the summary file says:

#Cluster  Frames   Frac  AvgDist  Stdev  Centroid  AvgCDist
       0    8034  0.201   11.962  3.492     35596     6.102
       1    7781  0.195   10.695  2.143     26770     5.093
       2    3446  0.086   10.723  3.092      8039     6.384
       3     739  0.018    8.296  2.512      8463     7.073
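
The two numberings can be put side by side with a short script. The sketch below only parses the excerpts quoted in this message; the regular expression and the variable names are illustrative, and it makes no claim about how cpptraj assigns the final cluster indices.

```python
import re

# Excerpts copied from the cpptraj log and the summary file quoted above.
log_excerpt = """\
Point 19880 (frame 8206, density 84) selected as candidate for cluster 0
Point 19940 (frame 23297, density 93) selected as candidate for cluster 1
Point 19970 (frame 7606, density 98) selected as candidate for cluster 2
Point 19999 (frame 17848, density 139) selected as candidate for cluster 3
"""

summary_excerpt = """\
#Cluster Frames Frac AvgDist Stdev Centroid AvgCDist
0 8034 0.201 11.962 3.492 35596 6.102
1 7781 0.195 10.695 2.143 26770 5.093
2 3446 0.086 10.723 3.092 8039 6.384
3 739 0.018 8.296 2.512 8463 7.073
"""

# Map candidate index -> density of the candidate point (from the log).
candidate_re = re.compile(r"density (\d+)\) selected as candidate for cluster (\d+)")
density = {int(idx): int(rho) for rho, idx in candidate_re.findall(log_excerpt)}

# Map summary index -> number of frames (from the summary file).
frames = {}
for line in summary_excerpt.splitlines():
    if line.startswith("#") or not line.strip():
        continue
    fields = line.split()
    frames[int(fields[0])] = int(fields[1])

indices = sorted(density)
pairs = list(zip(indices, indices[1:]))
density_ascending = all(density[a] < density[b] for a, b in pairs)
frames_descending = all(frames[a] > frames[b] for a, b in pairs)
total_frames = sum(frames.values())

for i in indices:
    print(f"index {i}: candidate density {density[i]:4d}   summary frames {frames[i]:5d}")
print("density increases with index:", density_ascending)
print("frames decrease with index:  ", frames_descending)
print("frames listed in summary:    ", total_frames)
```

On these excerpts the candidate density strictly increases with the index while the frame count strictly decreases. The Frames column also adds up to 20000, half of the 40000 frames reported in the log, which is consistent with the warning that sieved frames were not added back.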

I use cpptraj V17.00.

I also have a more general question: I would like to identify different populations using the 2N dihedral angles of the backbone. Is it possible to compute the distance between structures from these angles rather than from, e.g., the alpha carbon atoms?
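
To illustrate the kind of metric meant here, the Python sketch below computes a root-mean-square difference between two sets of dihedral angles, taking the 360-degree periodicity into account. This is only one possible definition, given as an assumption for illustration; the function name dihedral_distance is hypothetical and nothing here is taken from cpptraj.

```python
import math

def dihedral_distance(angles_a, angles_b):
    """RMS of the periodic differences between two lists of dihedrals (degrees)."""
    if len(angles_a) != len(angles_b):
        raise ValueError("both structures need the same number of dihedrals")
    total = 0.0
    for a, b in zip(angles_a, angles_b):
        d = abs(a - b) % 360.0
        d = min(d, 360.0 - d)   # shortest way around the circle
        total += d * d
    return math.sqrt(total / len(angles_a))

# 179 and -179 degrees are only 2 degrees apart on the circle.
print(dihedral_distance([179.0], [-179.0]))
# Two hypothetical (phi, psi) pairs differing in phi by 120 degrees.
print(dihedral_distance([-60.0, 140.0], [60.0, 140.0]))
```

The wrap-around step is the important part: a plain difference would treat 179 and -179 degrees as 358 degrees apart and would split what is really a single population.
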
I thank you in advance for your support!
Best regards,
Tommaso
Received on Mon Nov 19 2018 - 06:00:04 PST