Re: [AMBER] Clustering analysis

From: Debarati DasGupta <>
Date: Wed, 15 Apr 2020 14:38:37 +0000

Hi Daniel,

Could you let me know what inputs are actually needed to calculate the DB index (DBI)?
I googled a lot but did not find what I was looking for.


From: Daniel Roe<>
Sent: 25 March 2020 15:27
To: AMBER Mailing List<>
Subject: Re: [AMBER] Clustering analysis


Unfortunately there's no magic formula for getting results from
clustering. Like any other method, it requires careful scrutiny to
really have confidence in your results. Two things I can recommend:

1) Check your DBI and pseudo F values for the various cluster results
you have. In general you want a small DBI, high pseudo F. It also
helps to look at the cluster silhouettes.

2) I recommend this all the time and I'm sure people are tired of it
(but I also don't care): **read through the Shao & Cheatham et al.
clustering paper**, and specifically everything they do to validate
their results. I don't think you'll find a more comprehensive study on
clustering of MD trajectory data. It's where I go when I need new
ideas (or have to brush up on some old ones) on how to analyze
clustering data. If anyone has more recent recommendations please post
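
As a rough illustration of the two metrics in point 1, here is a minimal Python sketch. It assumes each frame has already been reduced to a numeric vector (for example, fitted coordinates or a few distances) and that you have one cluster label per frame; Euclidean distance to cluster centroids is used throughout. The function names are made up for this example, and this is not cpptraj's own implementation, which works with its own distance metric (e.g. RMSD) and may differ in detail.

```python
import math


def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return groups


def centroid(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def davies_bouldin(points, labels):
    """Davies-Bouldin index (needs at least two clusters); lower is better."""
    groups = group_by_label(points, labels)
    cents = {k: centroid(v) for k, v in groups.items()}
    # Scatter: mean distance of each member to its own centroid.
    scatter = {k: sum(math.dist(p, cents[k]) for p in v) / len(v)
               for k, v in groups.items()}
    worst = []
    for i in groups:
        # Worst (scatter_i + scatter_j) / centroid separation over all j != i.
        worst.append(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                         for j in groups if j != i))
    return sum(worst) / len(worst)


def pseudo_f(points, labels):
    """Pseudo-F (Calinski-Harabasz) statistic; higher is better.

    Undefined when every point sits exactly on its centroid (zero
    within-cluster sum of squares) or when there is only one cluster.
    """
    groups = group_by_label(points, labels)
    n, k = len(points), len(groups)
    grand = centroid(points)
    # Between-cluster and within-cluster sums of squared distances.
    ssb = sum(len(v) * math.dist(centroid(v), grand) ** 2 for v in groups.values())
    ssw = sum(math.dist(p, centroid(v)) ** 2 for v in groups.values() for p in v)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Toy data: two well-separated pairs of points.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = [0, 0, 1, 1]  # labels that follow the obvious grouping
bad = [0, 1, 0, 1]   # labels that split each pair
print(davies_bouldin(points, good), pseudo_f(points, good))
print(davies_bouldin(points, bad), pseudo_f(points, bad))
```

On this toy data the sensible labelling gives the smaller DBI and the larger pseudo-F, which is the trend to look for when comparing runs with different cluster counts or epsilon values. So the only inputs are the per-frame data (or pairwise distances) and the cluster assignment of each frame.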


On Wed, Mar 18, 2020 at 6:34 PM Debarati DasGupta
<> wrote:
> Hello Daniel,
> I have been trying to follow your advice: *clustering is an art form and requires various trial-and-error sessions.*
> My aims were simple.
> I am working on an NMR structure (10 conformers in the PDB file) of a Med25 ACID protein domain and trying to answer 2 questions:
> 1. How different are the 10 models from each other?
> 2. We ran some plain explicit-solvent simulations in TIP3P water (10 different runs on the 10 different NMR models).
> How do the 10 conformers behave during these explicit-solvent simulations?
> So I focused on k-means and hierarchical clustering methods just to learn the basics of setting up clustering in cpptraj.
> I played with the cluster number and also the epsilon value (distance between cluster points) and got a wide array of results!
> Now I am drowning in statistics and am literally lost.
> Can anyone suggest how to analyse my outputs and how to make sense of my results?
> What should I compare between the different outputs, and how should I analyse the average structures and the representative structures I got?
> Any help would be greatly appreciated.
> Regards
> Debarati
> _______________________________________________
> AMBER mailing list

Received on Wed Apr 15 2020 - 08:30:02 PDT