Re: [AMBER] Memory footprint goes on increasing: pytraj traj.iterchunk used, distances between bonds calculation from Hai Nguyen on 2017-03-12 (Amber Archive Mar 2017)

From: Hai Nguyen <nhai.qn.gmail.com>
Date: Sun, 12 Mar 2017 17:34:49 -0400

By the way, can we move the discussion of this technical issue to the GitHub
issue here? (It has code highlighting, for example.)

Thanks

https://github.com/Amber-MD/pytraj/issues/1365

Hai

On Sun, Mar 12, 2017 at 5:30 PM, Hai Nguyen <nhai.qn.gmail.com> wrote:

> Hi,
>
> So I looked further at your profile.log and saw:
>
> 153    455.2 MiB    15.0 MiB    bonds_val = pt.distance(chunk, bnd_list, dtype='ndarray')
>
> Isn't that expected, since you store the distance values in bonds_val?
>
> Can you try replacing bonds_val = pt.distance(...)
> with
> bonds_val = some_equal_size_numpy_array
> and doing the profiling again?
>
> Hai
>
>
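Hai's control experiment (swap the distance call for a same-size array and profile again) can be sketched with the standard-library tracemalloc module. This is a swap-in, not the profiler that produced profile.log, and `measure_growth` is a hypothetical helper. One caveat matters here: tracemalloc only sees allocations reported to Python (NumPy reports its array buffers), so memory allocated directly by C/C++ code underneath pytraj would not show up; a process-level (RSS) profiler is still needed to rule that out.

```python
import tracemalloc

import numpy as np


def measure_growth(chunks, compute):
    """Call compute(chunk) for each chunk and record traced memory after each.

    Returns one sample (bytes currently traced) per chunk. Samples that climb
    steadily mean something holds on to memory between iterations; flat
    samples mean rebinding the result frees the previous one.
    Caveat: only allocations reported to tracemalloc are counted.
    """
    already_tracing = tracemalloc.is_tracing()
    if not already_tracing:
        tracemalloc.start()
    samples = []
    try:
        for chunk in chunks:
            # Keep the result alive while sampling, as the real loop would;
            # rebinding bonds_val releases the previous iteration's result.
            bonds_val = compute(chunk)
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        if not already_tracing:
            tracemalloc.stop()
    return samples


# Control run: a same-size stand-in array instead of pt.distance(...).
# The shape (50 bonds x 1000 frames) is a hypothetical example.
control = measure_growth(range(20), lambda _chunk: np.zeros((50, 1000)))
```

In the real script the measured call would be something like `lambda c: pt.distance(c, bnd_list, dtype='ndarray')` over `traj.iterchunk(chunk_size)`. If traced memory climbs in that run while the control stays flat, the growth comes from the distance call rather than from storing the result; if neither climbs but the process RSS does, the growth is outside Python's view.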
> On Sun, Mar 12, 2017 at 5:13 PM, SHAILESH KUMAR <shaile27_sit.jnu.ac.in> wrote:
>
>> Hi,
>>
>> Thank you for the reply.
>>
>>
>> The problem does not seem to be with iterchunk: earlier I tried to
>> emulate chunking by calling ptraj.iterload inside a loop with frame
>> slicing, and had similar problems.
>>
>> If ptraj.iterload is called inside a loop with a varying frame_slice,
>> combined with range-based extraction of frames, memory should not have
>> kept growing, but it did.
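The emulated chunking described above can be sketched as a small generator of (start, stop, step) tuples, one per chunk, each passed to a fresh iterload call. `chunk_slices` is a hypothetical helper; pytraj's documentation describes `frame_slice` as (start, stop, step) tuples, but that is worth checking against the installed version. The pytraj calls appear only in comments so the sketch runs without pytraj.

```python
def chunk_slices(n_frames, chunk_size, step=1):
    """Yield (start, stop, step) tuples that split range(n_frames) into chunks.

    Each chunk covers at most chunk_size frames of the strided selection.
    """
    if chunk_size <= 0 or step <= 0:
        raise ValueError("chunk_size and step must be positive")
    span = chunk_size * step  # frames of the original trajectory per chunk
    for start in range(0, n_frames, span):
        yield (start, min(start + span, n_frames), step)


# Hypothetical usage (trajfile, prmtopfile, bnd_list are the poster's names):
# for fs in chunk_slices(n_frames, 1000):
#     chunk = pt.iterload(trajfile, prmtopfile, frame_slice=fs)
#     bnd_vals = pt.distance(chunk, bnd_list, dtype='ndarray')
```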
>>
>> I suspect that pytraj.distance, pytraj.angles and pytraj.dihedrals are
>> causing memory leaks. Apart from this, iterating frame by frame as you
>> suggested would be too costly in terms of performance, because the
>> trajectory has millions of frames. What I am actually doing is converting
>> the molecule from Cartesian to internal coordinates. I intend to write
>> these sets of bonds, angles and dihedrals as NetCDF4/HDF5 files with
>> chunking support, so that they can be read efficiently along each
>> dimension (individual bonds/angles/torsions) across all frames.
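For the Cartesian-to-internal conversion itself, one option (a NumPy swap-in, not the pytraj functions the poster uses) is to compute bonds, angles and dihedrals directly from a chunk's coordinate array, so only one (n_frames, n_atoms, 3) block is held at a time. The function names are hypothetical, atom indices are 0-based, and for an in-memory pytraj chunk `chunk.xyz` is assumed to supply the coordinate array. The dihedral follows the IUPAC sign convention; compare a few values against pytraj's own output before relying on it.

```python
import numpy as np


def bond_lengths(xyz, pairs):
    """Distance for each (i, j) atom pair; returns (n_frames, n_pairs)."""
    xyz = np.asarray(xyz, dtype=float)
    p = np.asarray(pairs)
    d = xyz[:, p[:, 1]] - xyz[:, p[:, 0]]
    return np.linalg.norm(d, axis=-1)


def bond_angles(xyz, triples):
    """Angle i-j-k in degrees with j as vertex; returns (n_frames, n_triples)."""
    xyz = np.asarray(xyz, dtype=float)
    t = np.asarray(triples)
    a = xyz[:, t[:, 0]] - xyz[:, t[:, 1]]
    b = xyz[:, t[:, 2]] - xyz[:, t[:, 1]]
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
    # Clip guards against round-off pushing |cos| slightly above 1.
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def dihedrals(xyz, quads):
    """Torsion i-j-k-l in degrees, in [-180, 180]; returns (n_frames, n_quads)."""
    xyz = np.asarray(xyz, dtype=float)
    q = np.asarray(quads)
    p0, p1, p2, p3 = (xyz[:, q[:, n]] for n in range(4))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1, axis=-1, keepdims=True)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.sum(b0 * b1, axis=-1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=-1, keepdims=True) * b1
    x = np.sum(v * w, axis=-1)
    y = np.sum(np.cross(b1, v) * w, axis=-1)
    return np.degrees(np.arctan2(y, x))
```

Each function returns an (n_frames, n_terms) array, so consecutive chunks can be appended along the frame axis of whatever output file is used.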
>>
>> Additionally, I am using the netCDF4 package to write the
>> internal-coordinate trajectory as NetCDF4 (currently disabled while I look
>> for the memory leak). I would prefer to use pytraj itself, if it could
>> write NetCDF4 with chunking, and so reduce my list of dependencies.
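If the main requirement is fast per-bond reads across all frames, a NumPy-only stand-in for the NetCDF4/HDF5 output (not what the poster uses) is a disk-backed .npy file laid out as (n_bonds, n_frames), filled chunk by chunk through `numpy.lib.format.open_memmap`. `write_bonds_by_chunk` is hypothetical, and each chunk is assumed to arrive as an (n_chunk_frames, n_bonds) array.

```python
import numpy as np


def write_bonds_by_chunk(chunks, n_frames, n_bonds, path):
    """Write per-chunk bond values into an (n_bonds, n_frames) .npy file.

    Each item of `chunks` is an (n_chunk_frames, n_bonds) array. Rows are
    bonds, so in C order one bond's full time series is a contiguous read.
    Returns the number of frames written.
    """
    out = np.lib.format.open_memmap(path, mode='w+', dtype=np.float64,
                                    shape=(n_bonds, n_frames))
    start = 0
    for vals in chunks:
        vals = np.asarray(vals)
        stop = start + vals.shape[0]
        out[:, start:stop] = vals.T  # frames x bonds -> bonds x frames
        start = stop
    out.flush()
    del out  # drop the reference so the memory map is released
    return start


# Reading one bond back later touches a single contiguous row, e.g.:
# bond_7 = np.load(path, mmap_mode='r')[7]
```

The trade-off is on the write side: with bonds as rows, each chunk writes one short segment per row, scattered across the file. Chunked NetCDF4/HDF5 storage exists to balance exactly this read/write tension, so the memmap is mainly useful as a simple, dependency-light baseline.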
>>
>> On Sun, Mar 12, 2017 at 9:41 PM, Nhai <nhai.qn.gmail.com> wrote:
>>
>> > Hi
>> >
>> > iterchunk is not well written, and I don't find it very useful.
>> >
>> > Can you try:
>> >
>> > for frame in traj:
>> >     dosomething(...)
>> >
>> > But I will have a look at the iterchunk stuff. Thanks.
>> >
>> > Hai
>> >
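Hai's frame-by-frame suggestion can be sketched with the per-frame work filling a preallocated output, so memory stays bounded by one frame plus the result array. `stream_bond_lengths` is hypothetical; for pytraj, each item of `frames` is assumed to be a frame's (n_atoms, 3) coordinate array (e.g. `frame.xyz`), which is worth verifying.

```python
import numpy as np


def stream_bond_lengths(frames, pairs, n_frames):
    """Fill an (n_frames, n_bonds) array one frame at a time.

    `frames` yields (n_atoms, 3) coordinate arrays; `pairs` is an
    (n_bonds, 2) array of 0-based atom indices.
    """
    pairs = np.asarray(pairs)
    # Preallocate once; NaN marks any frame that is never filled.
    out = np.full((n_frames, len(pairs)), np.nan)
    for i, xyz in enumerate(frames):
        xyz = np.asarray(xyz, dtype=float)
        d = xyz[pairs[:, 1]] - xyz[pairs[:, 0]]  # (n_bonds, 3) bond vectors
        out[i] = np.sqrt(np.einsum('ij,ij->i', d, d))
    return out


# Hypothetical pytraj usage (not executed here; bnd_pairs is illustrative):
# lengths = stream_bond_lengths((f.xyz for f in traj), bnd_pairs, traj.n_frames)
```

For millions of frames the result array itself becomes large; a disk-backed array such as `numpy.memmap` could replace the in-memory one. Rows for frames that never arrive stay NaN, which makes a trajectory shorter than `n_frames` easy to spot.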
>> > > On Mar 12, 2017, at 4:17 PM, SHAILESH KUMAR <shaile27_sit.jnu.ac.in> wrote:
>> > >
>> > > Dear all,
>> > >
>> > > I am trying to process a big trajectory that cannot fit in my computer's
>> > > memory, so I tried iterating over the full trajectory with the iterchunk
>> > > method in pytraj. In each iteration, bond lengths are calculated for the
>> > > frames in the chunk and can be written to a file for further analysis.
>> > > Writing is currently disabled for memory profiling, because the memory
>> > > footprint of the process keeps on growing.
>> > >
>> > > Its pseudocode is as follows:
>> > >
>> > > import gc
>> > > import pytraj as pt
>> > >
>> > > # trajfile, prmtopfile, slice_info, chunk_size, bnd_list defined elsewhere
>> > > traj = pt.iterload(trajfile, prmtopfile, frame_slice=slice_info)
>> > >
>> > > for chunk in traj.iterchunk(chunk_size, start=0, stop=-1):
>> > >     bnd_vals = pt.distance(chunk, bnd_list, dtype='ndarray')
>> > >     # process bnd_vals here (currently disabled)
>> > >     gc.collect()
>> > >
>> > > But memory profiling showed that memory keeps increasing in every
>> > > iteration over chunks, which suggests a memory leak in
>> > >     bnd_vals = pt.distance(chunk, bnd_list, dtype='ndarray')
>> > > but why is not clear to me. Maybe I am doing something wrong and cannot
>> > > spot it, or there is a memory leak somewhere (maybe in the API).
>> > >
>> > >
>> > > If anyone can help me sort this out, it would be a great favor. For
>> > > reproducibility I am attaching simplified code with the necessary input
>> > > files, except the trajectory, which I can share via Dropbox if needed.
>> > > This test dataset corresponds to a small molecule; the actual problem is
>> > > to do the same for protein molecules.
>> > > <profile.log>
>> > > <dummy-code.py>
>> > > <INR.DFS.tree>
>> > > <INR.lig.gas.leap.prmtop>
>> > > <INR.pdb>
>> > > _______________________________________________
>> > > AMBER mailing list
>> > > AMBER.ambermd.org
>> > > http://lists.ambermd.org/mailman/listinfo/amber
>> >
>>
>
>
Received on Sun Mar 12 2017 - 15:00:03 PDT