Re: [AMBER] Memory footprint goes on increasing: pytraj traj.iterchunk used, distances between bonds calculation

From: SHAILESH KUMAR <shaile27_sit.jnu.ac.in>
Date: Sun, 12 Mar 2017 22:13:51 +0100

Hi,

Thank you for the reply.


The problem does not seem to be with iterchunk, because earlier I tried to
emulate chunking by calling pytraj.iterload inside a loop with frame
slicing, and I ran into similar problems.

If pytraj.iterload is called inside a loop with a varying frame_slice,
combined with range-based extraction of frames, it should not have caused
any problem, but it did.
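
For reference, what I tried earlier looked roughly like this (a minimal
sketch; n_frames, chunk_size, trajfile, prmtopfile and bnd_list stand in
for my actual inputs):

import pytraj as pt

# emulate chunking: each iteration loads only one slice of frames
for start in range(0, n_frames, chunk_size):
    stop = min(start + chunk_size, n_frames)
    chunk = pt.iterload(trajfile, prmtopfile,
                        frame_slice=(start, stop, 1))
    bnd_vals = pt.distance(chunk, bnd_list, dtype='ndarray')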

I suspect pytraj.distance, pytraj.angles and pytraj.dihedrals are causing
the memory leaks. Apart from this, iterating frame by frame as you
suggested would be too costly in terms of performance, because the
trajectory has millions of frames and I am doing this to convert the
coordinates of the molecule from the Cartesian to the internal coordinate
system. I intend to write these bond, angle and dihedral sets as
NetCDF4/HDF5 files with chunking support, so that they can be read
efficiently dimension-wise (individual bonds/angles/torsions) for all the
frames.
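
Per chunk, the conversion itself is only the three pytraj calls; a minimal
sketch of what I mean (ang_list and dih_list are mask lists analogous to
bnd_list):

import pytraj as pt

traj = pt.iterload(trajfile, prmtopfile)
for chunk in traj.iterchunk(chunk_size):
    # each call should return an ndarray with one row per mask
    # and one column per frame in the chunk
    bnds = pt.distance(chunk, bnd_list, dtype='ndarray')
    angs = pt.angle(chunk, ang_list, dtype='ndarray')
    dihs = pt.dihedral(chunk, dih_list, dtype='ndarray')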

Additionally, I am using the netCDF4 package for writing the NetCDF4
trajectory in internal coordinates (currently disabled while spotting the
memory leaks). I would prefer to use pytraj itself if I could write NetCDF4
with chunking through it, and so reduce the list of dependencies.
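
For illustration, the layout I have in mind with netCDF4 is roughly this
(a sketch under my assumptions: the output file name, the 4096-frame chunk
size, and bnd_vals having shape (n_bonds, n_frames) are all placeholders):

import pytraj as pt
from netCDF4 import Dataset

traj = pt.iterload(trajfile, prmtopfile)

nc = Dataset('internal_coords.nc', 'w', format='NETCDF4')
nc.createDimension('frame', None)              # unlimited, grows per chunk
nc.createDimension('bond', len(bnd_list))
# chunk along frames so one bond can be read for all frames efficiently
bonds = nc.createVariable('bonds', 'f4', ('frame', 'bond'),
                          chunksizes=(4096, 1), zlib=True)

offset = 0
for chunk in traj.iterchunk(chunk_size):
    bnd_vals = pt.distance(chunk, bnd_list, dtype='ndarray')
    n = bnd_vals.shape[1]                      # assuming (n_bonds, n_frames)
    bonds[offset:offset + n, :] = bnd_vals.T
    offset += n
nc.close()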

On Sun, Mar 12, 2017 at 9:41 PM, Nhai <nhai.qn.gmail.com> wrote:

> Hi
>
> iterchunk is not well written and I don't find it very useful.
>
> can you try:
>
> for frame in traj:
>     dosomething(...)
>
> But I will have a look at the iterchunk stuff. Thanks.
>
> Hai
>
> > On Mar 12, 2017, at 4:17 PM, SHAILESH KUMAR <shaile27_sit.jnu.ac.in>
> wrote:
> >
> > Dear all,
> >
> > I am trying to process a big trajectory which cannot fit in my
> > computer's memory, so I tried iterating over the full trajectory using
> > the iterchunk method in pytraj. In each iteration, bond lengths are
> > calculated for the frames in the chunk; they would normally be written
> > to a file for further analysis, but writing is currently disabled
> > because I am profiling the growing memory footprint of the process.
> >
> > Its pseudocode is as follows:
> >
> > import gc
> > import pytraj as pt
> >
> > traj = pt.iterload(trajfile, prmtopfile, frame_slice=slice_info)
> >
> > for chunk in traj.iterchunk(chunk_size, start=0, stop=-1):
> >     bnd_vals = pt.distance(chunk, bnd_list, dtype='ndarray')
> >     # process bnd_vals here  ## currently disabled
> >     gc.collect()
> >
> > But memory profiling showed that the memory footprint keeps increasing
> > on every chunk iteration, which points to a memory leak in
> >
> > bnd_vals = pt.distance(chunk, bnd_list, dtype='ndarray')
> >
> > but why is not clear to me. Maybe I am doing something wrong and am not
> > able to spot it, or there is a memory leak somewhere (maybe in the API).
> >
> >
> > Now I ask for help: if anyone can help me sort this out, it would be a
> > great favor. For reproducibility I am attaching simplified code with the
> > necessary input files except the trajectory, which I can share via
> > Dropbox when needed. This test dataset corresponds to a small molecule;
> > the actual task is to do the same for real protein molecules.
> > <profile.log>
> > <dummy-code.py>
> > <INR.DFS.tree>
> > <INR.lig.gas.leap.prmtop>
> > <INR.pdb>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Mar 12 2017 - 14:30:02 PDT