Re: [AMBER] Kmeans-clustering : AvgDist query

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Fri, 20 Jan 2017 16:54:29 -0500

On Wed, Jan 18, 2017 at 5:51 AM, Bala subramanian
<bala.biophysics.gmail.com> wrote:
> Q1) From the rmsd matrix I calculated the AvgDist value of all the points
> in cluster #0 and #1,and I get the values 3.749 (for #0) and 3.918 (#1).
>
> But cpptraj reports a value of 2.756 and 1.205 (see above). From the rmsd
> matrix, it is easy to guess that the AvgDist is likely to be greater than
> 2.7 and 1.2. Am I missing something in understanding cpptraj AvgDist ?.

AvgDist is the average of all point-to-point distances *within a
single cluster*. Make sure you're not double-counting (i.e. make sure
you're not using both the 1-to-2 distance and the 2-to-1 distance).
Looking at the whole RMSD matrix can be misleading, since it also
contains distances between points in different clusters.
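
To make that concrete, here is a minimal Python sketch of the
calculation as described above: average each unordered pair once, and
only over pairs whose members are both in the same cluster. The
function name and the toy 4-frame matrix are made up for illustration
(they are not cpptraj code and not the data from this thread), and it
assumes the diagonal self-distances are left out of the average.

```python
from itertools import combinations


def avg_dist_within_cluster(dist, members):
    """Average of the unique pairwise distances among `members`.

    dist    : full symmetric matrix (list of lists); dist[i][j] is the
              RMSD between frames i and j
    members : frame indices assigned to one cluster
    """
    # combinations() yields each unordered pair (i, j) exactly once,
    # so 1-2 and 2-1 are never both counted and i == j never appears.
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0  # a single-member cluster has no pairs
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


# Toy symmetric matrix: frames 0,1 form one cluster, frames 2,3 another.
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.5, 5.5],
    [4.0, 4.5, 0.0, 2.0],
    [5.0, 5.5, 2.0, 0.0],
]

within_0 = avg_dist_within_cluster(dist, [0, 1])
within_1 = avg_dist_within_cluster(dist, [2, 3])
# Averaging over every frame mixes in the between-cluster distances.
all_frames = avg_dist_within_cluster(dist, [0, 1, 2, 3])
```

In this toy case the per-cluster averages are much smaller than the
average over all frames, because the latter also includes the large
between-cluster distances.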

> Q2) Is there a way (in cpptraj) to dump the ascii pairdist file. I

Use the GitHub version of cpptraj
(https://github.com/Amber-MD/cpptraj) and you can read in cluster
matrices and treat them like a data set. So something like:

readdata PAIRD-BIN.dat name PW
writedata test.dat PW

should work.

> converted the binary pairdist file to ascii format (using: hexdump -v -e
> '10/4 "%06f "' -e '"\n"' PAIRD-BIN.dat > test.dat) and I get something like
> pasted below. What do these trailing zeros mean ?

The binary format is well described in the Amber 16 manual. Note that
in the GitHub version you can also write cluster matrices in netcdf
format (extension .nccmatrix), which may be easier to interact with
(although the matrix is still stored as a 1D array).
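
If you need to get from a 1D array back to a square matrix, something
along these lines could work. This is only a sketch: it ASSUMES the
array holds the strict upper triangle (no diagonal) in row-major order,
i.e. (0,1), (0,2), ..., (0,n-1), (1,2), ... Please check the manual for
the actual layout before relying on it. The function names are
hypothetical and are not part of cpptraj.

```python
def tri_index(i, j, n):
    """1D position of pair (i, j) in a row-major strict upper triangle.

    ASSUMED layout: (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1).
    """
    if i == j:
        raise ValueError("diagonal elements are not stored")
    if i > j:
        i, j = j, i  # the matrix is symmetric, so order does not matter
    # Rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i) elements in total.
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def to_square(flat, n):
    """Expand the 1D triangle into a full symmetric n x n matrix."""
    if len(flat) != n * (n - 1) // 2:
        raise ValueError("array length does not match n*(n-1)/2")
    square = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            square[i][j] = square[j][i] = flat[tri_index(i, j, n)]
    return square


# Toy example with 4 frames, so 4*3/2 = 6 stored distances.
flat = [1.0, 4.0, 5.0, 4.5, 5.5, 2.0]
square = to_square(flat, 4)
```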

Hope this helps,

-Dan

>
> 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 3.333752
> 1.000338 1.891641 4.172873 1.580285 7.259061 5.991429 7.105317 6.895642
> 3.820035 3.972430 1.370469 3.610308 4.069707 2.705386 4.098798 3.704278
> 0.992006 4.522664 1.088604 7.753875 6.499359 7.547778 7.335218 4.575821
> 0.940676 7.760054 6.560380 7.492657 7.288205 4.467124 4.149098 2.685882
> 4.407049 3.787850 7.268991 6.144238 6.942125 6.818217 1.503415 0.979665
> 0.599390 1.988932 1.306563
> 0.851417
>
> Thanks,
> Bala
>
> --
> C. Balasubramanian
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>



-- 
-------------------------
Daniel R. Roe
Laboratory of Computational Biology
National Institutes of Health, NHLBI
5635 Fishers Ln, Rm T900
Rockville MD, 20852
https://www.lobos.nih.gov/lcb
Received on Fri Jan 20 2017 - 14:00:02 PST