Re: [AMBER] Clustering analysis from Daniel Roe on 2020-03-25 (Amber Archive Mar 2020)

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Wed, 25 Mar 2020 15:27:27 -0400

Hi,

Unfortunately there's no magic formula to getting results from
clustering. Like any other method it requires careful scrutiny to
really have confidence in your results. Two things I can recommend:

1) Check your DBI (Davies-Bouldin index) and pseudo-F values for the
various cluster results you have. In general you want a small DBI and a
high pseudo-F. It also helps to look at the cluster silhouettes.

2) I recommend this all the time and I'm sure people are tired of it
(but I also don't care): **read through the Shao, Cheatham et al.
clustering paper**, and look specifically at everything they do to
validate their results. I don't think you'll find a more comprehensive study on
clustering of MD trajectory data. It's where I go when I need new
ideas (or have to brush up on some old ones) on how to analyze
clustering data. If anyone has more recent recommendations please post
them! https://pubs.acs.org/doi/10.1021/ct700119m
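For anyone wondering what the numbers in point 1 actually measure, here is a
minimal sketch of the textbook definitions of the DBI, pseudo-F and silhouette.
cpptraj already computes these for you (its implementation may differ in
details); this is only to illustrate them. It assumes plain Euclidean distance
on hypothetical 2-D toy points rather than the coordinate RMSD you would
normally cluster on, and all function names are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance (assumption: stands in for the real metric, e.g. RMSD)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a set of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def group(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, List[Point]]:
    """Map each cluster label to the points assigned to it."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def davies_bouldin(points, labels) -> float:
    """Davies-Bouldin index: lower means tighter, better-separated clusters."""
    clusters = group(points, labels)
    cents = {k: centroid(v) for k, v in clusters.items()}
    # Average distance of each cluster's members to its own centroid.
    scatter = {k: sum(dist(p, cents[k]) for p in v) / len(v) for k, v in clusters.items()}
    keys = list(clusters)
    total = 0.0
    for i in keys:
        # Worst-case (largest) scatter-to-separation ratio against any other cluster.
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in keys if j != i)
    return total / len(keys)


def pseudo_f(points, labels) -> float:
    """Pseudo-F (Calinski-Harabasz): higher means better-separated clusters."""
    clusters = group(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    ssr = 0.0  # between-cluster sum of squares
    sse = 0.0  # within-cluster sum of squares
    for members in clusters.values():
        c = centroid(members)
        ssr += len(members) * dist(c, overall) ** 2
        sse += sum(dist(p, c) ** 2 for p in members)
    return (ssr / (k - 1)) / (sse / (n - k))


def silhouettes(points, labels) -> List[float]:
    """Per-point silhouette in [-1, 1]; values near 1 mean well-clustered."""
    clusters = group(points, labels)
    out = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            out.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster (self-distance is 0).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        out.append((b - a) / max(a, b))
    return out


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated blobs, not real MD frames.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    good = [0, 0, 0, 1, 1, 1]  # labelling that matches the blobs
    bad = [0, 1, 0, 1, 0, 1]   # labelling that mixes the blobs
    for name, labs in (("good", good), ("bad", bad)):
        s = silhouettes(pts, labs)
        print(name, round(davies_bouldin(pts, labs), 3),
              round(pseudo_f(pts, labs), 1), round(sum(s) / len(s), 3))
```

With the toy data, the labelling that matches the two blobs gives the lower DBI,
the higher pseudo-F and the higher mean silhouette. Comparing those same three
numbers across your different cluster counts and epsilon values is the point of
the advice above.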

-Dan

On Wed, Mar 18, 2020 at 6:34 PM Debarati DasGupta
<debarati_dasgupta.hotmail.com> wrote:
>
> Hello Daniel,
>
> I have been trying to follow your advice that *clustering is an art form and requires many rounds of trial and error.*
>
> My aims were simple.
> I am working on an NMR structure (10 conformers in the PDB file) of the Med25 ACID protein domain and am trying to answer two questions:
>
> 1. How different are the 10 models from each other?
> 2. We ran plain explicit-solvent simulations in TIP3P water (10 runs, one on each of the 10 NMR models).
> I am trying to analyse the conformations sampled and how the 10 conformers behave during these explicit-solvent simulations.
>
> So I focused on k-means and hierarchical clustering methods just to learn the basics of setting up clustering in cpptraj.
> I varied the number of clusters and the epsilon value (the distance cutoff between clusters) and got a wide array of results!
> Now I am drowning in statistics and completely lost.
> Can anyone suggest how to analyse my outputs and make sense of my results?
> What should I compare between the different outputs, and how should I analyse the average and representative structures I got?
> Any help would be greatly appreciated.
>
> Regards
> Debarati
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

Received on Wed Mar 25 2020 - 12:30:03 PDT