Re: [AMBER] GIST: convergence of the translational entropy term from Tom Kurtzman on 2016-11-11 (Amber Archive Nov 2016)

From: Tom Kurtzman <simpleliquid.gmail.com>
Date: Fri, 11 Nov 2016 13:26:13 -0500

Sergey, Steve and I had a brief discussion about this, and there doesn't
seem to be anything concerning to us in this behavior. The behavior is
consistent with what we'd expect from sparse sampling and from how the code
handles sparse sampling.

The value of \rho \ln \rho, which is used to calculate the translational
entropy, is taken to be zero when \rho = 0, since \rho \ln \rho \to 0 as
\rho \to 0. In the code, when the value of \rho is below some threshold, we
just don't calculate it: the \ln \rho term approaches negative infinity,
which the computer just can't handle, and the overall contribution, even if
we did calculate it, is negligible.
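A minimal Python sketch of the thresholding described above. The cutoff `RHO_MIN` is a hypothetical value chosen for illustration; the actual threshold used in cpptraj's GIST code is not stated in this thread. The sketch shows why skipping tiny densities is safe: \rho \ln \rho shrinks toward zero as \rho does, while evaluating \ln 0 directly would fail.

```python
import math

# Hypothetical cutoff for illustration; not the value used in cpptraj.
RHO_MIN = 1e-12


def rho_ln_rho(rho: float, rho_min: float = RHO_MIN) -> float:
    """Return rho * ln(rho), treating densities below rho_min as zero.

    Skipping tiny rho avoids evaluating ln(0) (math.log(0.0) raises
    ValueError in Python), and the dropped term is negligible because
    rho * ln(rho) -> 0 as rho -> 0.
    """
    if rho < rho_min:
        return 0.0
    return rho * math.log(rho)


if __name__ == "__main__":
    # The magnitude of rho*ln(rho) shrinks steadily as rho decreases.
    for rho in (1e-1, 1e-3, 1e-6, 1e-9, 0.0):
        print(f"rho={rho:.0e}  rho*ln(rho)={rho_ln_rho(rho):.3e}")
```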

The Nearest Neighbor algorithm approximates the local density around a
particle (a water oxygen here) as 1 over the volume of the sphere whose
radius is the nearest-neighbor distance. For computational efficiency, to
find the nearest neighbor of a water oxygen we only search the voxel the
water oxygen is in and all neighboring voxels (27 voxels total). With the
default voxel size of 0.5 angstroms per side, the probability of finding a
particle in that 27-voxel search region in any one frame is about 1 in 10.
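The nearest-neighbor density estimate with the restricted 27-voxel search can be sketched as follows. This is a simplified illustration under stated assumptions, not cpptraj's implementation: points (for example water oxygens pooled over frames) are bucketed into cubic voxels of side 0.5 Å, only the query's own voxel and its 26 neighbors are searched, and the local density is one over the volume of a sphere whose radius is the nearest-neighbor distance. The helper names `voxel_index`, `build_grid` and `nn_density` are hypothetical. When no neighbor turns up in those 27 voxels the function returns 0, which is the zero density estimate that appears under sparse sampling.

```python
import math
from collections import defaultdict
from itertools import product

VOXEL = 0.5  # voxel edge length in angstroms (the default mentioned above)


def voxel_index(p, size=VOXEL):
    """Integer (i, j, k) index of the cubic voxel containing point p."""
    return tuple(math.floor(c / size) for c in p)


def build_grid(points, size=VOXEL):
    """Bucket points (e.g. water oxygens pooled over frames) by voxel index."""
    grid = defaultdict(list)
    for p in points:
        grid[voxel_index(p, size)].append(p)
    return grid


def nn_density(query, grid, size=VOXEL):
    """Estimate density as 1 / (4/3 * pi * d**3), d = nearest-neighbor distance.

    Only the query's own voxel and its 26 neighbors (27 voxels) are searched.
    If none of them holds another point, return 0.0: a neighbor may exist
    farther away, but this restricted search cannot see it.
    """
    i, j, k = voxel_index(query, size)
    best = math.inf
    for di, dj, dk in product((-1, 0, 1), repeat=3):
        for p in grid.get((i + di, j + dj, k + dk), ()):
            d = math.dist(query, p)
            if d > 0.0:  # skip the query itself (and any exact duplicate)
                best = min(best, d)
    if math.isinf(best):
        return 0.0
    return 1.0 / (4.0 / 3.0 * math.pi * best**3)


if __name__ == "__main__":
    grid = build_grid([(0.1, 0.1, 0.1), (0.6, 0.1, 0.1), (1.7, 0.1, 0.1)])
    # Neighbor 0.5 A away in an adjacent voxel: density = 6/pi, about 1.91 per A^3.
    print(nn_density((0.1, 0.1, 0.1), grid))
    # Nearest other point is 1.1 A away but two voxels over, so it is missed.
    print(nn_density((1.7, 0.1, 0.1), grid))
```

The restricted search trades completeness for speed: it is cheap per query, but with few frames many waters have no other point inside their 1.5 Å search cube and get a density of exactly zero.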

What this means is that at very sparse sampling (1 frame, for example) the
NN-estimated (or should I say mis-estimated) density in every voxel is zero
and the estimated entropy would be zero. If you only sample two frames
with independent configurations, about 90% of the waters would still have
an estimated local density of zero, and hence we still expect entropy
estimates much higher than the actual value. Even with 10 frames of
sampling, you are still quite likely not to find any neighbors. This is
all an artifact of extremely sparse sampling and of code that is designed
for efficiency at higher sampling. Once there is sufficient sampling for
the NN algorithm to work, the convergence of the method is really quite
outstanding. From your figure, I'd certainly not use fewer frames than the
minimum of that curve (100 frames?) for the translational density.
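A back-of-the-envelope check of these numbers, under assumptions that are not in the original post: bulk water at about 0.0334 oxygens per Å^3 (roughly 1 g/cm^3), water positions scattered independently (Poisson statistics), and statistically independent frames. The 27-voxel search block of 0.5 Å voxels is a 1.5 Å cube, and the chance that a given water finds no neighbor in the other N - 1 frames is modeled as (1 - p)^(N - 1), where p is the per-frame chance that the block is occupied. Regions near a solute are not bulk, so treat this as an order-of-magnitude sketch.

```python
import math

# Assumptions (not from the original post): bulk water number density,
# Poisson-distributed positions, and independent frames.
RHO_BULK = 0.0334  # water oxygens per cubic angstrom (about 1 g/cm^3)
VOXEL = 0.5        # voxel edge length in angstroms
N_SEARCH = 27      # voxels searched: own voxel plus 26 neighbors


def expected_count(n_voxels, voxel=VOXEL, rho=RHO_BULK):
    """Mean number of water oxygens in n_voxels cubic voxels in one frame."""
    return rho * n_voxels * voxel**3


def p_hit(n_voxels=N_SEARCH):
    """Probability that at least one oxygen sits in the region (Poisson)."""
    return 1.0 - math.exp(-expected_count(n_voxels))


def p_no_neighbor(n_frames, p):
    """Chance a given water finds no neighbor in the other n_frames - 1 frames."""
    return (1.0 - p) ** (n_frames - 1)


if __name__ == "__main__":
    print(f"single voxel:   {expected_count(1):.4f} oxygens/frame")
    print(f"27-voxel block: {expected_count(N_SEARCH):.3f} oxygens/frame, "
          f"P(hit) = {p_hit():.3f}")
    p = p_hit()
    for n in (1, 2, 10, 100, 1000):
        print(f"{n:5d} frames: P(no neighbor) = {p_no_neighbor(n, p):.3g}")
```

With these assumptions the 27-voxel block holds about 0.11 oxygens per frame (a single voxel only about 0.004), so p is close to 1 in 10, and a given water still has roughly a one-in-three chance of finding no neighbor after 10 frames, while by 100 frames that chance is negligible.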

Tom

On Fri, Nov 11, 2016 at 11:43 AM, Steven Ramsey <vpsramsey.gmail.com> wrote:

> Hi Sergey,
>
> I think the initial drop in entropy you're seeing in your analysis is
> indeed an artifact of low sampling. At very low frame counts (under
> 1000 or so), the nearest neighbor algorithm used to estimate entropies will
> give unreliable results because there are very few waters (and
> therefore neighbor distances) to consider.
>
> We recently evaluated GIST convergence rates in the cpptraj software
> release study (doi: 10.1002/jcc.24417) and found that entropies converge
> within 30000 frames (sampled every ps). This may be system-specific, but it
> is a reasonably good guess for most studies.
>
> Hope this helps, best of luck!
>
> --Steve
>
> On Fri, Nov 11, 2016 at 9:28 AM, Sergey Samsonov <sergeys.biotec.tu-dresden.de> wrote:
>
> > Dear AMBERs,
> >
> > I'm calibrating some GIST calculations. In particular, I'm checking how the
> > number of frames (equidistantly distributed through the equilibrated
> > simulation) taken for GIST calculations affects the values of the GIST
> > energy components. The reason is to find an optimal length for my
> > simulations (and a number of frames to analyze with GIST) for the system I
> > study, so that the values I obtain are converged. I found a common feature,
> > independent of the region, the size of the box used for GIST and the length
> > of the simulations within the ranges I'm working in: E(sw), E(ww) and
> > TS(orientational) converge very similarly. The values go down monotonically
> > as the number of frames taken into account for the calculations increases,
> > and they converge at several thousand frames (the exact number depends on
> > the system and simulation type). This is what one would expect. However,
> > TS(translational) behaves quite differently: its value drops as the number
> > of analyzed frames increases to ~200, and then it goes up again (one
> > example, for 10000 frames from a 100 ns long simulation, is attached).
> > What could be the reason for such non-monotonic behaviour? Is the observed
> > decrease simply an artifact of the calculation when the number of frames
> > is too low?
> >
> > Thank you very much and cheers,
> >
> > Sergey
> >
> > --
> > Sergey A. Samsonov
> > Postdoctoral researcher
> > Structural Bioinformatics
> > Biotechnology Center
> > Tatzberg 47-51
> > 01307 Dresden, Germany
> >
> > Tel: (+49) 351 463 400 83
> > Fax: (+49) 351 463 402 87
> > E-mail: sergey.samsonov.biotec.tu-dresden.de
> > Webpage: www.biotec.tu-dresden.de
> >
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
> >
>

-- 
************************************************
Tom Kurtzman, Ph.D.
Assistant Professor
Department of Chemistry
Lehman College, CUNY
250 Bedford Park Blvd. West
Bronx, New York 10468
718-960-8832
http://www.lehman.edu/faculty/tkurtzman/
************************************************

Received on Fri Nov 11 2016 - 10:30:02 PST