From: Daniel Roe <daniel.r.roe.gmail.com>

Date: Wed, 5 Aug 2015 11:31:31 -0600

Hi,

On Wed, Aug 5, 2015 at 10:01 AM, Eiros Zamora, Juan <j.eiros-zamora14.imperial.ac.uk> wrote:

> I’ve replicated the KLD analysis of this paper http://pubs.acs.org/doi/abs/10.1021/jp4125099 on my system and I have a couple of questions.
>
> A cutoff of convergence of KLD < 0.02 is chosen because the slope of the KLD plot vs time no longer changes once it's below this number. For my system, this appears to be happening as well, but for a KLD < 2.5.

Note that since KL divergence is logarithmic, a value of 2.5 signifies that the overlap is orders of magnitude worse than 0.02. From personal experience I consider any value over 1.0 extremely poor overlap. You will really want to look at the distributions themselves - it's pretty easy to see whether the overlap is "good" or not. If you look at figure 3 of that publication showing the final distributions of projections of PC1 and compare them to the final values of KLD (figure 4), you can start to get an idea of how the value of KLD relates to actual overlap. Also note that a low value does not always mean "well converged". From the discussion of those figures:

"Note that, in the case of the DFC-HREMD runs, the initial KLD values for PCs 1 and 2 drop rapidly to ∼0.021 at around 48 ns before rising back up to ∼0.269. This illustrates a potential pitfall of using KLD as a metric of convergence when sampling is limited, such as at the beginning of an MD simulation: if two otherwise independent runs begin in the same conformational space of a certain PC, they may both initially explore the same limited subspace along that PC and the overlap of their PC projection histograms may actually be quite good. Once sampling increases and one or both of the runs can explore outside of their initial regions, the KLD better reflects the convergence between the two simulations. Therefore, the KLD of PC projection histograms is a good measure of convergence between two simulations only if at least one simulation has explored along most or all of a given PC. In other words, there is a minimum sampling time required before this metric can be considered viable."

In our case, since we were using enhanced sampling (HREMD and MREMD) as well as other measures of convergence (like combined clustering), we were confident that we had enough sampling to start to claim the distributions are well-converged. So KLD is a useful metric for quantifying overlap, but by itself does not necessarily indicate convergence.
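To give a feel for how the number relates to overlap, here is a minimal sketch of a histogram-based KLD estimate in Python. It is not the cpptraj implementation and not the code used in the paper. The function name `kl_divergence`, the bin count, the `floor` applied to empty bins, and the synthetic Gaussian "projections" are all assumptions made for illustration.

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=50, floor=1e-10):
    """Estimate D_KL(P || Q) in nats from two 1-D samples (e.g. PC projections)."""
    p_samples = np.asarray(p_samples, dtype=float)
    q_samples = np.asarray(q_samples, dtype=float)
    # Histogram both samples on the same grid so the bins are comparable.
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p_counts, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q_counts, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    p = p_counts / p_counts.sum()
    q = q_counts / q_counts.sum()
    # Assumption: bins that are empty in Q are floored rather than skipped, so
    # regions sampled by only one run are penalized heavily.
    q = np.maximum(q, floor)
    occupied = p > 0  # bins with p == 0 contribute nothing to the sum
    return float(np.sum(p[occupied] * np.log(p[occupied] / q[occupied])))

# Synthetic stand-ins for PC projections from independent runs.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 20000)
small_shift = rng.normal(0.2, 1.0, 20000)    # means differ by 0.2 sigma
large_shift = rng.normal(2.236, 1.0, 20000)  # means differ by ~2.2 sigma

print("small shift:", kl_divergence(reference, small_shift))
print("large shift:", kl_divergence(reference, large_shift))
```

For two Gaussians of equal width sigma, the exact KLD is (difference in means)^2 / (2 sigma^2), so 0.02 corresponds to peaks offset by about 0.2 sigma while 2.5 corresponds to an offset of about 2.2 sigma. The histogram estimate is noisy and depends on the binning, so treat the printed numbers as approximate.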

> 1) Are the KLD values expected to be higher the more complex a system is?

No - it's purely a measure of overlap between distributions and has nothing to do with system size.

> 2) Is there a reason to not do all the pairwise KLD comparisons between the independent runs?

You certainly can. In our case we only wanted to look at overlap between the two independent runs for each enhanced sampling method tried.
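If you do want every pairwise comparison, something along these lines would work. Again this is only a sketch under stated assumptions: `kl_divergence` and `pairwise_kld` are hypothetical helpers (not cpptraj commands), each run is assumed to be a 1-D array of projections along one PC, and the random arrays stand in for real data. Since KLD is not symmetric, both orderings of each pair are computed.

```python
import itertools
import numpy as np

def kl_divergence(p_samples, q_samples, bins=50, floor=1e-10):
    """Histogram estimate of D_KL(P || Q) in nats on a grid shared by both samples."""
    lo = min(np.min(p_samples), np.min(q_samples))
    hi = max(np.max(p_samples), np.max(q_samples))
    p_counts, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q_counts, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    p = p_counts / p_counts.sum()
    # Assumption: empty bins in Q are floored instead of being skipped.
    q = np.maximum(q_counts / q_counts.sum(), floor)
    occupied = p > 0
    return float(np.sum(p[occupied] * np.log(p[occupied] / q[occupied])))

def pairwise_kld(runs, bins=50):
    """Return an n x n matrix with entry [i, j] = D_KL(run i || run j)."""
    n = len(runs)
    matrix = np.zeros((n, n))
    # permutations (not combinations) because D_KL(P||Q) != D_KL(Q||P) in general.
    for i, j in itertools.permutations(range(n), 2):
        matrix[i, j] = kl_divergence(runs[i], runs[j], bins=bins)
    return matrix

# Synthetic stand-ins for PC1 projections from three independent runs.
rng = np.random.default_rng(1)
runs = [
    rng.normal(0.0, 1.0, 5000),
    rng.normal(0.1, 1.0, 5000),
    rng.normal(3.0, 1.0, 5000),  # a run that sampled a different region
]
kld_matrix = pairwise_kld(runs)
print(np.round(kld_matrix, 3))
```

Note that each pair is binned over its own combined range here, so values from different pairs sit on slightly different grids. A run that stands out with large values against all the others is worth inspecting directly in the histograms.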

Hope this helps answer some of your questions. Let me know if you have any more,

-Dan

-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Wed Aug 05 2015 - 11:00:03 PDT