Re: [AMBER] FW: clustering problem in ambertool14

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Tue, 6 Jan 2015 16:24:28 -0700

Hi,

On Tue, Jan 6, 2015 at 3:47 PM, Mahendra B Thapa <thapamb.mail.uc.edu> wrote:
> With the use of 'sieve 10', the 'cpptraj' command has been running
> without any complaint, but the analysis is very slow for my case (50000
> frames, 511 residues with 7584 atoms in each frame). A section of the
> screenshot *after 24 hours* is as follows:

This is an inherently time-consuming process, since you need to run
N*(N-1)/2 pairwise calculations (roughly 12.5 million for your 5000
sieved frames). Depending on your processor speed this can take a long
time, particularly if you have a big system. I typically use
OpenMP-compiled cpptraj for this, since the pairwise calculation is one
of the things that is parallelized. To give you an idea of the timings
I see: for 22084 sieved frames with 8 threads I can complete the
pairwise portion of the calculation (RMSD on 67 selected atoms) in 97
seconds (CPU is 2x Xeon X5660 @ 2.8 GHz).
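
To make the arithmetic concrete, here is a minimal Python sketch (not
part of cpptraj; the function names are made up for illustration). It
evaluates N*(N-1)/2 for the frame counts mentioned in this thread and,
assuming 4 bytes per stored distance as in the pairwise-matrix formula
quoted further down, the matching matrix size.

```python
def n_pairs(frames: int) -> int:
    """Number of unique frame pairs, N*(N-1)/2."""
    return frames * (frames - 1) // 2


def pair_matrix_mib(frames: int, bytes_per_value: int = 4) -> float:
    """Size of the pairwise distance matrix in MB (bytes / 1048576).

    Assumes one single-precision value (4 bytes) per pair.
    """
    return n_pairs(frames) * bytes_per_value / 1048576


if __name__ == "__main__":
    cases = [
        ("unsieved, 50000 frames", 50000),
        ("sieve 10, 5000 frames", 5000),
        ("benchmark, 22084 frames", 22084),
    ]
    for label, frames in cases:
        print(f"{label}: {n_pairs(frames)} pairs, "
              f"{pair_matrix_mib(frames):.1f} MB")
```

The point of the comparison is that sieving by 10 cuts the number of
pairs (and the matrix size) by roughly a factor of 100, not 10.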

Your best bet is to first use a very small number of frames (as a
test) to get an idea of how long things will take, and also to make
sure that when your clustering completes you get all the output you
are expecting. It's pretty awful when you do an expensive clustering
calc and realize you forgot you wanted cluster numbers vs time etc.
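
As a rough way to turn such a test run into an estimate, the sketch
below scales a measured time by the ratio of pair counts. It is only an
illustration: the function names and the 60-second test timing are
hypothetical, and it assumes the run is dominated by the pairwise
calculation with the same mask, metric, and thread count (it ignores
trajectory I/O and the clustering step itself).

```python
def n_pairs(frames: int) -> int:
    """Number of unique frame pairs, N*(N-1)/2."""
    return frames * (frames - 1) // 2


def extrapolate_seconds(test_frames: int, test_seconds: float,
                        target_frames: int) -> float:
    """Scale a measured pairwise time to a larger frame count.

    Assumes cost is proportional to the number of frame pairs.
    """
    if test_frames < 2:
        raise ValueError("need at least 2 frames to form a pair")
    return test_seconds * n_pairs(target_frames) / n_pairs(test_frames)


if __name__ == "__main__":
    # Hypothetical numbers: a 500-frame test that took 60 s,
    # extrapolated to the 5000 sieved frames of the full run.
    estimate = extrapolate_seconds(500, 60.0, 5000)
    print(f"estimated pairwise time: {estimate / 3600:.2f} hours")
```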

One thing that can help speed up subsequent clustering calcs is to use
the loadpairdist/savepairdist keywords to re-use calculated pairwise
distances. Some care must be taken when doing this though - everything
pertaining to the distance metric (sieve, mask, etc) MUST remain the
same or you will get bad results. Cpptraj does some checking for this
but can't always catch everything.

Hope this helps,

-Dan

>
> ANALYSIS: Performing 1 analyses:
> 0: [cluster crdset MYTRAJ :1-511@CA,N,C,O mass clusters 10 out
> cluster_out nofit averagelinkage summary summary_out info Cluster_info
> sieve 10 repout box2.rep repfmt pdb clusterout cluster.nc clusterfmt netcdf]
> Starting clustering.
> Mask [:1-511@CA,N,C,O] corresponds to 2044 atoms.
> Calculating pair-wise distances.
> Pair-wise matrix set up with sieve, 50000 frames, 5000 sieved frames.
> 0%
>
> Thank you for help,
> Mahendra Thapa
> University of Cincinnati,OH
>
>
> On Fri, Jan 2, 2015 at 4:35 PM, Thapa, Mahendra (thapamb) <
> thapamb.mail.uc.edu> wrote:
>
>>
>>
>>
>> ________________________________________
>> From: Daniel Roe
>> Sent: Friday, January 2, 2015 3:35:17 PM (UTC-06:00) Central America
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] clustering problem in ambertool14
>>
>> Hi,
>>
>> I suspect that if you are running out of memory even after using
>> 'loadtraj', the issue may be with the pairwise distance matrix.
>>
>> To reduce the amount of memory needed by the pairwise distance matrix use
>> the 'sieve' keyword. Try 'sieve 10' to start. Increase the sieve value as
>> necessary.
>>
>> -Dan
>>
>> On Friday, January 2, 2015, Mahendra B Thapa <thapamb.mail.uc.edu> wrote:
>>
>> > Dear Dr. Daniel
>> >
>> > Thank you for the suggestion; I successfully updated to 'CPPTRAJ:
>> > Trajectory Analysis. V14.22'
>> >
>> > But the previous error message (the one you addressed the first time)
>> > appeared again:
>> > terminate called after throwing an instance of 'std::bad_alloc'
>> > what(): std::bad_alloc
>> >
>> > Using the formula you gave me, the space requirement is 23GB but I have
>> > enough memory (400GB) in my external drive where I run the 'cpptraj'
>> > command.
>> >
>> > Thank you for help,
>> > Mahendra Thapa
>> > University of Cincinnati,OH
>> >
>> > On Fri, Jan 2, 2015 at 2:01 PM, Thapa, Mahendra (thapamb) <
>> > thapamb.mail.uc.edu> wrote:
>> >
>> > >
>> > >
>> > >
>> > > ________________________________________
>> > > From: Daniel Roe
>> > > Sent: Friday, January 2, 2015 1:00:45 PM (UTC-06:00) Central America
>> > > To: AMBER Mailing List
>> > > Subject: Re: [AMBER] FW: clustering problem in ambertool14
>> > >
>> > > Hi,
>> > >
>> > > According to your log output you haven't applied all updates. The very
>> > > first line of output is:
>> > >
>> > > CPPTRAJ: Trajectory Analysis. V14.00
>> > >
>> > > You need at least 14.17 for clustering with the traj data set to work
>> > > properly (and really you should have 14.22). After updates are applied
>> > > the code must be recompiled. Also if you are not using the full path
>> > > to cpptraj when executing make sure that the cpptraj you are actually
>> > > using is the up-to-date one (with e.g. the command 'which cpptraj').
>> > >
>> > > Hope this helps,
>> > >
>> > > -Dan
>> > >
>> > >
>> > > On Fri, Jan 2, 2015 at 9:08 AM, Mahendra B Thapa <thapamb.mail.uc.edu>
>> > > wrote:
>> > > > Dear Dr. Daniel
>> > > > Memory issues were solved when I followed the steps you suggested;
>> > > > thank you for that.
>> > > >
>> > > > A new problem appeared as seen in the screen:
>> > > >
>> > > > Internal Error: Metric is COORDS base but data set is not.
>> > > > Error: in Analysis # 0
>> > > > 1 errors encountered reading input.
>> > > >
>> > > > {{ Note: I have already fixed bugs for ambertool 14
>> > > > http://ambermd.org/bugfixes/AmberTools/14.0/update.17
>> > > > }}
>> > > >
>> > > > DATAFILES:
>> > > > cluster_out (Standard Data File): Cnum_00001
>> > > > Warning: Set 'Cnum_00001' contains no data.
>> > > > Warning: File 'cluster_out' has no sets containing data.
>> > > >
>> > > > Are these errors due to the large number of frames (250000) and
>> > > > number of atoms (7584 atoms)?
>> > > >
>> > > > In the previous post (http://archive.ambermd.org/201408/0214.html)
>> > > > there is some discussion, but I am assuming that I have been using the
>> > > > stripped topology file to run cpptraj. I have attached the screenshot
>> > > > (text file: TEST_LOG) with this email.
>> > > >
>> > > > Thank you for help,
>> > > > Mahendra Thapa
>> > > >
>> > > >
>> > > > On Tue, Dec 16, 2014 at 3:20 PM, Thapa, Mahendra (thapamb) <
>> > > > thapamb.mail.uc.edu> wrote:
>> > > >
>> > > >>
>> > > >>
>> > > >>
>> > > >> ________________________________________
>> > > >> From: Daniel Roe
>> > > >> Sent: Tuesday, December 16, 2014 2:19:51 PM (UTC-06:00) Central America
>> > > >> To: AMBER Mailing List
>> > > >> Subject: Re: [AMBER] clustering problem in ambertool14
>> > > >>
>> > > >> Hi,
>> > > >>
>> > > >> Usually when you get this error message during a command that uses a
>> > > >> COORDS data set (cluster, 2drms, crdfluct etc) it's because you ran
>> > > >> out of memory. Here is a formula to estimate the amount of memory you
>> > > >> will need to hold a COORDS data set:
>> > > >>
>> > > >> memory_in_bytes = (F * A * 3) * 4
>> > > >>
>> > > >> where F is the number of frames, A is the number of atoms (after
>> > > >> stripping in this case), the 3 is the number of coords per atom and 4
>> > > >> is bytes per coord (COORDS are single precision). Divide by 1048576 to
>> > > >> get the result in MB. If you have box coordinates, add 6 values per
>> > > >> frame, i.e. use (F * (A * 3 + 6)); double the total if you have
>> > > >> velocities as well.
>> > > >>
>> > > >> However, in place of a COORDS data set cpptraj also lets you use what
>> > > >> is called a TRAJ data set (which leaves data on-disk). The only issue
>> > > >> with this is that because it remains on disk you cannot modify a TRAJ
>> > > >> data set, so you will have to pre-process your trajectory (i.e.
>> > > >> strip/image) first. This is a good idea in general since it will
>> > > >> make subsequent analyses faster. Here is some input as an example.
>> > > >>
>> > > >> # Step 1 - Preprocess
>> > > >> parm myparm.parm7
>> > > >> trajin mytraj.nc
>> > > >> strip :Na+,WAT nobox outprefix strip
>> > > >> autoimage
>> > > >> rms first mass @C,CA,N
>> > > >> trajout strip.mytraj.nc nobox
>> > > >>
>> > > >> A few things to note here. First is that I put the 'strip' command
>> > > >> before everything else; this way subsequent commands will be faster
>> > > >> because there are fewer atoms to deal with. Also note in my 'strip'
>> > > >> command I'm writing out a stripped topology for use with my stripped
>> > > >> trajectory. Finally and most importantly, because you are rms-fitting
>> > > >> you will no longer be able to image anyway, so I'm getting rid of any
>> > > >> box coordinates.
>> > > >>
>> > > >> # Step 2 - Cluster
>> > > >> parm strip.myparm.parm7
>> > > >> trajin strip.mytraj.nc
>> > > >> loadtraj name MYTRAJ
>> > > >> cluster crdset MYTRAJ :1-291@CA,N,C,O mass clusters 10 out cluster_out \
>> > > >>   nofit averagelinkage \
>> > > >>   summary summary_out info Cluster_info repout box2.rep repfmt pdb \
>> > > >>   clusterout cluster.nc clusterfmt netcdf
>> > > >>
>> > > >> The 'loadtraj' command in this case is taking all loaded trajectories
>> > > >> from 'trajin' statements and putting them into a TRAJ data set named
>> > > >> MYTRAJ, which stays on-disk and can subsequently be used by the
>> > > >> 'cluster' command.
>> > > >>
>> > > >> One more thing to keep in mind is that even though the coordinates
>> > > >> will be kept on disk, you will still need enough memory to hold the
>> > > >> pairwise distance matrix:
>> > > >>
>> > > >> memory_in_bytes = ((F * (F-1)) / 2) * 4
>> > > >>
>> > > >> If you don't have enough memory to hold the pairwise distance matrix
>> > > >> try using the 'sieve' keyword to reduce the number of frames being
>> > > >> clustered in the first pass. This will also speed up the actual
>> > > >> clustering a bit. Last and most importantly make sure you are using
>> > > >> the most up-to-date version of cpptraj (14.22).
>> > > >>
>> > > >> Hope this helps,
>> > > >>
>> > > >> -Dan
>> > > >>
>> > > >> On Tue, Dec 16, 2014 at 11:28 AM, Mahendra B Thapa
>> > > >> <thapamb.mail.uc.edu> wrote:
>> > > >> > Dear Amber users
>> > > >> > I used the following command for clustering 50 ns of all-atom
>> > > >> > simulation data.
>> > > >> > cpptraj -i input_file -p para_top
>> > > >> > where 'input_file' consists of
>> > > >> >
>> > > >> > trajin mdcrd_files
>> > > >> > autoimage
>> > > >> > rms first mass @C,CA,N
>> > > >> > strip :Na+,WAT
>> > > >> > cluster :1-291@CA,N,C,O mass clusters 10 out cluster_out nofit \
>> > > >> >   averagelinkage \
>> > > >> >   summary summary_out info Cluster_info repout box2.rep repfmt pdb \
>> > > >> >   clusterout cluster.nc clusterfmt netcdf
>> > > >> > go
>> > > >> >
>> > > >> > After running the command, I got the following messages without any
>> > > >> > output files:
>> > > >> >
>> > > >> > 1]terminate called after throwing an instance of 'std::bad_alloc'
>> > > >> > what(): std::bad_alloc
>> > > >> > Aborted
>> > > >> >
>> > > >> > 2] Warning: One or more analyses requested creation of default COORDS
>> > > >> > DataSet.
>> > > >> > CREATECRD: Saving coordinates from Top to file to "_DEFAULTCRD_"
>> > > >> >
>> > > >> >
>> > > >> > 3]Warning: Coordinates are being rotated and box coordinates are present.
>> > > >> > Warning: Unit cell vectors are NOT rotated; imaging will not be possible
>> > > >> > Warning: after the RMS-fit is performed.
>> > > >> >
>> > > >> > Any comments and suggestion will be very useful.
>> > > >> >
>> > > >> > Thank you,
>> > > >> > Mahendra Thapa
>> > > >> > University of Cincinnati
>> > > >> > _______________________________________________
>> > > >> > AMBER mailing list
>> > > >> > AMBER.ambermd.org
>> > > >> > http://lists.ambermd.org/mailman/listinfo/amber
>> > > >>
>> > > >>
>> > > >>
>> > > >> --
>> > > >> -------------------------
>> > > >> Daniel R. Roe, PhD
>> > > >> Department of Medicinal Chemistry
>> > > >> University of Utah
>> > > >> 30 South 2000 East, Room 307
>> > > >> Salt Lake City, UT 84112-5820
>> > > >> http://home.chpc.utah.edu/~cheatham/
>> > > >> (801) 587-9652
>> > > >> (801) 585-6208 (Fax)
>> > > >>



-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jan 06 2015 - 15:30:02 PST