# Re: [AMBER] [cpptraj] Kullback Leibler Divergence cutoff choice

Date: Wed, 5 Aug 2015 12:09:45 -0400

Hi Juan

If you look at the paper you mention, they use KLD in a very different
way than what you are trying to do. In their case, they KNOW the
correct, final distribution, obtained through immense amounts of
sampling. Then they ask how long it takes, using a different
sampling technique, to get a KLD < 0.02 VERSUS the correct, converged
distribution. To remind you, KLD = 0 means the distributions are identical.
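
For reference, the standard discrete form of the KLD between two histogrammed distributions P and Q over the same bins i is given below. The symbols p_i and q_i (normalized bin populations) are notation chosen here, not taken from the paper:

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i p_i \ln \frac{p_i}{q_i}
$$

Every term vanishes when p_i = q_i in all bins, which is why KLD = 0 corresponds to identical distributions. Swapping P and Q generally changes the value, so the KLD is not symmetric.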

Now, in your case, you are comparing different MD runs, NONE of which is
likely to be fully converged, correct? This means you should expect your KLD
to be higher than 0.02. Now, KLDs are comparisons between TWO distributions.
For some properties, such as radius of gyration for instance, I expect
that your KLD would be close to zero when comparing two dynamics, each
of 200 ns. PCAs are much trickier and take longer to converge.
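
As a rough illustration of comparing a scalar property between two runs, here is a minimal Python sketch. It is not how cpptraj computes KLD; the function names, the bin count, the pseudocount used to avoid empty bins, and the synthetic "radius of gyration" samples are all assumptions made for the example.

```python
import math
import random

def histogram(samples, lo, hi, nbins):
    """Count samples into nbins equal-width bins spanning [lo, hi]."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for x in samples:
        i = min(int((x - lo) / width), nbins - 1)  # put x == hi in the last bin
        counts[i] += 1
    return counts

def kld(samples_p, samples_q, nbins=50, pseudocount=1e-10):
    """Discrete D_KL(P||Q) from two sample lists binned on a shared grid.

    The pseudocount (an assumption of this sketch) keeps empty bins
    from producing log(0) or division by zero.
    """
    lo = min(min(samples_p), min(samples_q))
    hi = max(max(samples_p), max(samples_q))
    if hi == lo:  # every sample is the same value: distributions are identical
        return 0.0
    p = [c + pseudocount for c in histogram(samples_p, lo, hi, nbins)]
    q = [c + pseudocount for c in histogram(samples_q, lo, hi, nbins)]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))

# Synthetic stand-ins for a per-frame property from two independent runs.
rng = random.Random(0)
run_a = [rng.gauss(20.0, 1.0) for _ in range(5000)]
run_b = [rng.gauss(20.5, 1.5) for _ in range(5000)]

print(kld(run_a, run_a))                     # identical samples give 0.0
print(kld(run_a, run_b), kld(run_b, run_a))  # the two directions differ
```

Because the two directions give different values, N runs have N*(N-1) ordered pairs to compare, and the result also depends on the number of bins and on how empty bins are handled.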

When you compute PCAs, there are many details about how you overlapped

On 8/5/15 12:01 PM, Eiros Zamora, Juan wrote:
> Hi everyone,
>
> I’ve replicated the KLD analysis of this paper http://pubs.acs.org/doi/abs/10.1021/jp4125099 on my system and I have a couple of questions.
>
> A convergence cutoff of KLD < 0.02 is chosen because the slope of the KLD vs. time plot no longer changes once it's below this number. For my system, this appears to be happening as well, but for a KLD < 2.5.
>
> 1) Are the KLD values expected to be higher the more complex a system is? (i.e. in the paper the analysis is done on a tetranucleotide and I’m doing it on a 419-residue protein). I understand that this is a measure of the difference between two probability distribution functions, therefore it wouldn’t really matter how complex the system is when you do the PC projection on the trajectory and histogram it, but I was wondering if I’m missing something and that could be the explanation. Also, am I just wrong in assuming that this is converged by choosing a higher cutoff? I’m just picking this 2.5 value because it appears that for the last 200 ns the plot is stable, but if I were to pick 0.02 then it would not be that easy to say so.
>
> 2) Is there a reason not to do all the pairwise KLD comparisons between the independent runs? As in, if you have 10 runs you should be doing 90 KLDs, because the KLD is not symmetric. But I don’t know if that would make much sense in MD, or if it would give extra info at all. I’d like to have the opinion of the authors on this, because it looks to me like a tedious analysis with cpptraj that maybe isn’t really adding any insight.
>
>
> Juan
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
