Re: [AMBER] Cluster analysis

From: Elvis Martis <elvis.martis.bcp.edu.in>
Date: Wed, 28 Jun 2017 16:29:28 +0000

Hi,

It seems you have taken the epsilon and minpoints values straight from the tutorial. These parameters must be determined for your own system, by one of two methods:

1) by generating a K-dist plot with the kdist option (see the text from the manual that I have pasted below). This is the recommended method.

>>>> Hints for setting DBSCAN parameters with ’kdist’
It is not always obvious what parameters to set for DBSCAN. You can get a rough idea of what to set ’minpoints’ and ’epsilon’ to by generating a so-called "K-dist" plot with the ’kdist <k>’ option. The K-dist plot shows for each point (X axis) the Kth farthest distance (Y axis), sorted by decreasing distance. You supply the same distance metric and sieve parameters you want to use for the actual clustering, but nothing else. For example:
cluster C0 dbscan kdist 4 rms :1-4@CA sieve 10 loadpairdist pairdist CpptrajPairDist
(Use this command, changing the mask :1-4@CA to suit your system.)
The K-dist plot will be named <prefix>.<k>.dat, with the default prefix being ’Kdist’ (in this case the file name would be Kdist.4.dat). The K-dist plot usually looks like a curve with an initially steep slope that gradually decreases. Around where the initial part of the curve starts to flatten out (indicating an increase in density) is around where epsilon should be set; minpoints is set to whatever <k> was. It has been suggested that the shape of the K-dist curve doesn’t change too much after Kdist=4, but users are encouraged to experiment.

2) by trial and error, testing various minpoints and epsilon values.
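
As a rough illustration of method 1, the Python sketch below reads a K-dist file and suggests a starting epsilon. Several things here are assumptions rather than anything cpptraj provides: the function names (parse_kdist, estimate_epsilon, cluster_command) are hypothetical, the file is assumed to hold '#' comment lines followed by "index distance" rows, and the elbow is located with a simple chord heuristic (the point farthest below the straight line joining the first and last points of the curve). The generated cluster line uses only keywords that already appear in the input quoted in this thread. Treat the result as a first guess and confirm it by looking at the plot.

```python
def parse_kdist(text):
    """Return the distance column of a K-dist file passed in as text.

    Assumed layout (not verified against every cpptraj version):
    lines starting with '#' are comments, data rows are 'index distance'.
    """
    dists = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dists.append(float(line.split()[1]))
    return dists


def estimate_epsilon(dists):
    """Suggest epsilon with a chord heuristic (an assumption of this sketch).

    The curve is expected to be sorted by decreasing distance. Both axes
    are rescaled to [0, 1]; the chord then runs from (0, 1) to (1, 0), and
    the elbow is taken as the point lying farthest below that chord.
    """
    n = len(dists)
    if n < 3:
        raise ValueError("need at least 3 points to look for an elbow")
    y_last = dists[-1]
    y_span = (dists[0] - y_last) or 1.0  # avoid dividing by zero on a flat curve
    best_i, best_gap = 0, -1.0
    for i, d in enumerate(dists):
        x_norm = i / (n - 1)
        y_norm = (d - y_last) / y_span
        gap = 1.0 - x_norm - y_norm  # proportional to distance below the chord
        if gap > best_gap:
            best_i, best_gap = i, gap
    return dists[best_i]


def cluster_command(minpoints, epsilon, sieve=10):
    """Format a DBSCAN cluster line using keywords from the original input."""
    return (
        f"cluster C0 dbscan minpoints {minpoints} epsilon {epsilon:.2f} "
        f"rms sieve {sieve} summary summary.dat info info.dat"
    )
```

A possible use: eps = estimate_epsilon(parse_kdist(open("Kdist.4.dat").read())), then cluster_command(4, eps) gives a line to paste into the cpptraj input. For method 2, the same helper can be called for a few epsilon values on either side of that estimate.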



                   Best Regards


Elvis Martis
Ph.D. Student (Computational Chemistry)
 at Bombay College of Pharmacy


A Kalina, Santacruz [E], Mumbai 400098, INDIA
W www.elvismartis.in
Skype. adrian_elvis12

________________________________
From: Elisa Pieri <elisa.pieri90.gmail.com>
Sent: 28 June 2017 21:21:00
To: AMBER Mailing List
Subject: [AMBER] Cluster analysis

Hello,

I'm absolutely new to cluster analysis, so I'm having problems following
the tutorial http://www.amber.utah.edu/AMBER-workshop/London-2015/Cluster/ .
Here is my cpptraj input:

parm rat.parm7
trajin rat_ph5.nc
strip :WAT,Cl- outprefix pep
cluster C0 dbscan minpoints 25 epsilon 0.9 sievetoframe rms sieve 10 random \
  out cnumvtime.dat sil Sil summary summary.dat info info.dat cpopvtime \
  cpopvtime.agr normframe repout rep repfmt pdb singlerepout singlerep.nc \
  singlerepfmt netcdf avgout Avg avgfmt restart
run

And this is what I get after the "classic" trajectory processing:

ACTION OUTPUT:

ANALYSIS: Performing 1 analyses:
  0: [cluster C0 dbscan minpoints 25 epsilon 0.9 sievetoframe rms sieve 10
random out cnumvtime.dat sil Sil summary summary.dat info info.dat
cpopvtime cpopvtime.agr normframe repout rep repfmt pdb singlerepout
singlerep.nc singlerepfmt netcdf avgout Avg avgfmt restart]
    Starting clustering.
    Mask [*] corresponds to 354 atoms.
    Calculating pair-wise distances.
Random_Number: seed is <= 0, using wallclock time as seed (11931638)
    Estimated pair-wise matrix memory usage: > 36.594 MB
    Pair-wise matrix set up with sieve, 42771 frames, 4278 sieved frames.
 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% Complete.
    Memory used by pair-wise matrix and other cluster data: 36.808 MB
    Starting DBSCAN Clustering:
 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Complete.
    No clusters found.
    Cluster timing data:
TIME: Cluster Init. : 0.0000 s ( 0.00%)
TIME: Pairwise Calc.: 44.4838 s ( 99.73%)
TIME: Clustering : 0.1219 s ( 0.27%)
TIME: Cluster Post. : 0.0000 s ( 0.00%)
TIME: Total: 44.6057 s

TIME: Analyses took 44.6058 seconds.

DATASETS (2 total):
    _DEFAULTCRD_ "_DEFAULTCRD_" (coordinates), size is 42771 (182.718 MB)
Box Coords, 354 atoms
    C0 "C0" (integer), size is 0

DATAFILES (2 total):
  cnumvtime.dat (Standard Data File): C0
  cpopvtime.agr (Grace File):
Warning: Set 'C0' contains no data.
Warning: File 'cnumvtime.dat' has no sets containing data.
Warning: File 'cpopvtime.agr' has no sets containing data.

So, I guess the problem is in this C0, and I bet it's a trivial one, but I
have no clue. Can you help me?

Thanks!
Elisa
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jun 28 2017 - 09:30:03 PDT