Re: [AMBER] FW: FW: FW: clustering problem in ambertool14

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Tue, 20 Jan 2015 10:59:42 -0700

Hi,

On Sun, Jan 18, 2015 at 9:01 AM, Mahendra B Thapa <thapamb.mail.uc.edu> wrote:
> When I tried to test 'cpptraj.OMP' with 5000 frames for clustering, I got
> the following error message:
> *Floating point exception*

It's much easier for us to help if you paste or attach your entire
input and output. A bare 'Floating point exception' message with no
context is impossible to debug.

> I had enough space (500GB) while running it and I did not get that error
> message while running 'cpptraj' in serial even with 250000 frames.

You have 500 GB of RAM? Again, remember it is RAM that matters for the
pairwise matrix calculation, not disk space. The OpenMP version will
use a bit more memory than the serial version but not a huge amount
more. At any rate I don't think memory is the issue here. Send me the
entire output off-list and I'll see what I can find.
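
To make the RAM versus disk distinction concrete, here is a minimal Python sketch. It is not part of cpptraj; the helper names are made up for illustration, and the `os.sysconf` keys are assumed to be available (they are on Linux, and the helper returns None where they are not).

```python
import os
import shutil


def physical_ram_bytes():
    """Return installed physical RAM in bytes, or None if the platform
    does not expose the needed sysconf keys (assumption: Linux-like OS)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def free_disk_bytes(path="."):
    """Return free disk space in bytes for the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def fits_in_ram(required_bytes, ram_bytes):
    """True if an in-memory structure of `required_bytes` fits in RAM."""
    return ram_bytes is not None and required_bytes <= ram_bytes


if __name__ == "__main__":
    ram = physical_ram_bytes()
    print("RAM :", "unknown" if ram is None else f"{ram / 1e9:.1f} GB")
    print("Disk:", f"{free_disk_bytes() / 1e9:.1f} GB free")
```

Only the first number is relevant to whether the pairwise matrix can be allocated.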

-Dan

>
> Thank you for help,
> Mahendra Thapa
> University of Cincinnati,OH
>
>
> On Sat, Jan 17, 2015 at 7:00 PM, Thapa, Mahendra (thapamb) <
> thapamb.mail.uc.edu> wrote:
>
>>
>>
>>
>> ________________________________________
>> From: Daniel Roe
>> Sent: Saturday, January 17, 2015 5:58:55 PM (UTC-06:00) Central America
>> To: AMBER Mailing List
>> Subject: Re: [AMBER] FW: FW: clustering problem in ambertool14
>>
>> Hi,
>>
>> The information in that post is unfortunately several years out of
>> date. The OpenMP-enabled cpptraj is built with the usual OpenMP
>> AmberTools build that you configure with the '-openmp' flag. This will
>> install cpptraj.OMP in the $AMBERHOME/bin directory. When you run it
>> you will see this at the top of the output:
>>
>> CPPTRAJ: Trajectory Analysis. V14.22 OpenMP
>>
>> -Dan
>>
>>
>> On Sat, Jan 17, 2015 at 9:01 AM, Mahendra B Thapa <thapamb.mail.uc.edu>
>> wrote:
>> > Dear Dr. Daniel,
>> >
>> > After compiling cpptraj in parallel, as described in the AMBER 14 manual and
>> > the previous post "http://dev-archive.ambermd.org/201107/0005.html", cpptraj
>> > appears to work without any error message (though the run has not completed
>> > yet for my system). How do I know that cpptraj is running in parallel
>> > rather than in serial? I issued the same command:
>> > cpptraj -i input_file -p test.top
>> >
>> > Thank you for help,
>> > Mahendra
>> >
>> >
>> >
>> > On Tue, Jan 6, 2015 at 6:25 PM, Thapa, Mahendra (thapamb) <
>> > thapamb.mail.uc.edu> wrote:
>> >
>> >>
>> >>
>> >>
>> >> ________________________________________
>> >> From: Daniel Roe
>> >> Sent: Tuesday, January 6, 2015 5:24:28 PM (UTC-06:00) Central America
>> >> To: AMBER Mailing List
>> >> Subject: Re: [AMBER] FW: clustering problem in ambertool14
>> >>
>> >> Hi,
>> >>
>> >> On Tue, Jan 6, 2015 at 3:47 PM, Mahendra B Thapa <thapamb.mail.uc.edu>
>> >> wrote:
>> >> > With 'sieve 10', the 'cpptraj' command has been running
>> >> > without any complaint, but the analysis is very slow for my case (50000
>> >> > frames, 511 residues with 7584 atoms in each frame). A section of the
>> >> > screen output *after 24 hours* is as follows:
>> >>
>> >> This is an inherently time-consuming process, since you need to run
>> >> N*(N-1)/2 pairwise calculations (roughly 12.5 M for your 5000 sieved
>> >> frames). Depending on your processor speed this can take a long time,
>> >> particularly if you have a big system. I typically use OpenMP-compiled
>> >> cpptraj for this, since the pairwise calc is one of the things that is
>> >> parallelized. To give you an idea of the timings I see: for 22084 sieved
>> >> frames with 8 threads I can complete the pairwise portion of the
>> >> calculation (RMSD selecting 67 atoms) in 97 seconds (CPU is 2x Xeon
>> >> X5660 @ 2.8 GHz).
>> >>
>> >> Your best bet is to first use a very small number of frames (as a
>> >> test) to get an idea of how long things will take, and also to make
>> >> sure that when your clustering completes you get all the output you
>> >> are expecting. It's pretty awful when you do an expensive clustering
>> >> calc and realize you forgot you wanted cluster numbers vs time etc.
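
One way to turn a small test run into a rough estimate is sketched below. It assumes the pairwise step dominates and that its cost scales with the number of frame pairs, with the same mask, hardware and thread count in both runs. The function names are illustrative and are not part of cpptraj.

```python
def n_pairs(n_frames):
    """Number of unique frame pairs, N*(N-1)/2."""
    return n_frames * (n_frames - 1) // 2


def estimate_pairwise_seconds(test_frames, test_seconds, full_frames):
    """Scale a measured test timing by the ratio of pair counts.

    Assumes the same mask, hardware and thread count for both runs.
    """
    return test_seconds * n_pairs(full_frames) / n_pairs(test_frames)


if __name__ == "__main__":
    print(n_pairs(5000))  # 12497500 pairs for 5000 sieved frames
    # Hypothetical timing: suppose 500 test frames took 10 s.
    print(f"{estimate_pairwise_seconds(500, 10.0, 5000):.0f} s")
```

Because the cost grows quadratically, ten times more frames means roughly a hundred times more work, so a test that takes seconds can still correspond to a long full run.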
>> >>
>> >> One thing that can help speed up subsequent clustering calcs is to use
>> >> the loadpairdist/savepairdist keywords to re-use calculated pairwise
>> >> distances. Some care must be taken when doing this though - everything
>> >> pertaining to the distance metric (sieve, mask, etc) MUST remain the
>> >> same or you will get bad results. Cpptraj does some checking for this
>> >> but can't always catch everything.
>> >>
>> >> Hope this helps,
>> >>
>> >> -Dan
>> >>
>> >> >
>> >> > ANALYSIS: Performing 1 analyses:
>> >> > 0: [cluster crdset MYTRAJ :1-511@CA,N,C,O mass clusters 10 out
>> >> > cluster_out nofit averagelinkage summary summary_out info Cluster_info
>> >> > sieve 10 repout box2.rep repfmt pdb clusterout cluster.nc clusterfmt
>> >> > netcdf]
>> >> > Starting clustering.
>> >> > Mask [:1-511@CA,N,C,O] corresponds to 2044 atoms.
>> >> > Calculating pair-wise distances.
>> >> > Pair-wise matrix set up with sieve, 50000 frames, 5000 sieved frames.
>> >> > 0%
>> >> >
>> >> > Thank you for help,
>> >> > Mahendra Thapa
>> >> > University of Cincinnati,OH
>> >> >
>> >> >
>> >> > On Fri, Jan 2, 2015 at 4:35 PM, Thapa, Mahendra (thapamb) <
>> >> > thapamb.mail.uc.edu> wrote:
>> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> ________________________________________
>> >> >> From: Daniel Roe
>> >> >> Sent: Friday, January 2, 2015 3:35:17 PM (UTC-06:00) Central America
>> >> >> To: AMBER Mailing List
>> >> >> Subject: Re: [AMBER] clustering problem in ambertool14
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I suspect that if you are running out of memory even after using
>> >> >> 'loadtraj', the issue may be with the pairwise distance matrix.
>> >> >>
>> >> >> To reduce the amount of memory needed by the pairwise distance matrix,
>> >> >> use the 'sieve' keyword. Try 'sieve 10' to start. Increase the sieve
>> >> >> value as necessary.
>> >> >>
>> >> >> -Dan
>> >> >>
>> >> >> On Friday, January 2, 2015, Mahendra B Thapa <thapamb.mail.uc.edu>
>> >> wrote:
>> >> >>
>> >> >> > Dear Dr. Daniel
>> >> >> >
>> >> >> > Thank you for the suggestion; I successfully updated to 'CPPTRAJ:
>> >> >> > Trajectory Analysis. V14.22'
>> >> >> >
>> >> >> > But the previous error message (the one you answered the first
>> >> >> > time) appeared again:
>> >> >> > terminate called after throwing an instance of 'std::bad_alloc'
>> >> >> > what(): std::bad_alloc
>> >> >> >
>> >> >> > Using the formula you gave me, the space requirement is 23GB, but I
>> >> >> > have enough memory (400GB) on my external drive, where I run the
>> >> >> > 'cpptraj' command.
>> >> >> >
>> >> >> > Thank you for help,
>> >> >> > Mahendra Thapa
>> >> >> > University of Cincinnati,OH
>> >> >> >
>> >> >> > On Fri, Jan 2, 2015 at 2:01 PM, Thapa, Mahendra (thapamb) <
>> >> >> > thapamb.mail.uc.edu <javascript:;>> wrote:
>> >> >> >
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > ________________________________________
>> >> >> > > From: Daniel Roe
>> >> >> > > Sent: Friday, January 2, 2015 1:00:45 PM (UTC-06:00) Central
>> America
>> >> >> > > To: AMBER Mailing List
>> >> >> > > Subject: Re: [AMBER] FW: clustering problem in ambertool14
>> >> >> > >
>> >> >> > > Hi,
>> >> >> > >
>> >> >> > > According to your log output you haven't applied all updates. The
>> >> >> > > very first line of output is:
>> >> >> > >
>> >> >> > > CPPTRAJ: Trajectory Analysis. V14.00
>> >> >> > >
>> >> >> > > You need at least 14.17 for clustering with the TRAJ data set to
>> >> >> > > work properly (and really you should have 14.22). After updates are
>> >> >> > > applied the code must be recompiled. Also, if you are not using the
>> >> >> > > full path to cpptraj when executing, make sure that the cpptraj you
>> >> >> > > are actually using is the up-to-date one (e.g. with the command
>> >> >> > > 'which cpptraj').
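
A small Python sketch of that version check is shown here. It assumes the banner always looks like the lines quoted in this thread ('CPPTRAJ: Trajectory Analysis. V14.00', 'CPPTRAJ: Trajectory Analysis. V14.22 OpenMP'). The function names are illustrative and not part of AmberTools.

```python
import re


def parse_cpptraj_banner(line):
    """Extract (major, minor, is_openmp) from a banner line such as
    'CPPTRAJ: Trajectory Analysis. V14.22 OpenMP'; None if no match."""
    match = re.search(r"CPPTRAJ: Trajectory Analysis\. V(\d+)\.(\d+)(.*)", line)
    if match is None:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    return major, minor, "OpenMP" in match.group(3)


def new_enough(line, minimum=(14, 17)):
    """True if the banner reports at least the given (major, minor) version."""
    parsed = parse_cpptraj_banner(line)
    return parsed is not None and parsed[:2] >= minimum


if __name__ == "__main__":
    print(new_enough("CPPTRAJ: Trajectory Analysis. V14.00"))  # False
    print(new_enough("CPPTRAJ: Trajectory Analysis. V14.22"))  # True
```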
>> >> >> > >
>> >> >> > > Hope this helps,
>> >> >> > >
>> >> >> > > -Dan
>> >> >> > >
>> >> >> > >
>> >> >> > > On Fri, Jan 2, 2015 at 9:08 AM, Mahendra B Thapa <
>> >> thapamb.mail.uc.edu
>> >> >> > <javascript:;>>
>> >> >> > > wrote:
>> >> >> > > > Dear Dr. Daniel
>> >> >> > > > Memory issues were solved when I followed the steps you
>> >> >> > > > suggested; thank you for that.
>> >> >> > > >
>> >> >> > > > A new problem appeared as seen in the screen:
>> >> >> > > >
>> >> >> > > > Internal Error: Metric is COORDS base but data set is not.
>> >> >> > > > Error: in Analysis # 0
>> >> >> > > > 1 errors encountered reading input.
>> >> >> > > >
>> >> >> > > > {{ Note: I have already fixed bugs for ambertool 14
>> >> >> > > > http://ambermd.org/bugfixes/AmberTools/14.0/update.17
>> >> >> > > > }}
>> >> >> > > >
>> >> >> > > > DATAFILES:
>> >> >> > > > cluster_out (Standard Data File): Cnum_00001
>> >> >> > > > Warning: Set 'Cnum_00001' contains no data.
>> >> >> > > > Warning: File 'cluster_out' has no sets containing data.
>> >> >> > > >
>> >> >> > > > Are these errors due to the large number of frames (250000) and
>> >> >> > > > number of atoms (7584)?
>> >> >> > > >
>> >> >> > > > There is some discussion in a previous post
>> >> >> > > > (http://archive.ambermd.org/201408/0214.html), but I am assuming
>> >> >> > > > that I have been using the stripped topology file to run cpptraj. I
>> >> >> > > > have attached the screen output (text file: TEST_LOG) to this email.
>> >> >> > > >
>> >> >> > > > Thank you for help,
>> >> >> > > > Mahendra Thapa
>> >> >> > > >
>> >> >> > > >
>> >> >> > > > On Tue, Dec 16, 2014 at 3:20 PM, Thapa, Mahendra (thapamb) <
>> >> >> > > > thapamb.mail.uc.edu <javascript:;>> wrote:
>> >> >> > > >
>> >> >> > > >>
>> >> >> > > >>
>> >> >> > > >>
>> >> >> > > >> ________________________________________
>> >> >> > > >> From: Daniel Roe
>> >> >> > > >> Sent: Tuesday, December 16, 2014 2:19:51 PM (UTC-06:00)
>> Central
>> >> >> > America
>> >> >> > > >> To: AMBER Mailing List
>> >> >> > > >> Subject: Re: [AMBER] clustering problem in ambertool14
>> >> >> > > >>
>> >> >> > > >> Hi,
>> >> >> > > >>
>> >> >> > > >> Usually when you get this error message during a command that
>> >> >> > > >> uses a COORDS data set (cluster, 2drms, crdfluct, etc.) it's
>> >> >> > > >> because you ran out of memory. Here is a formula to estimate the
>> >> >> > > >> amount of memory you will need to hold a COORDS data set:
>> >> >> > > >>
>> >> >> > > >> memory_in_bytes = (F * A * 3) * 4
>> >> >> > > >>
>> >> >> > > >> where F is the number of frames, A is the number of atoms (after
>> >> >> > > >> stripping in this case), 3 is the number of coordinates per atom,
>> >> >> > > >> and 4 is the number of bytes per coordinate (COORDS are single
>> >> >> > > >> precision). Divide by 1048576 to get the result in MB. Add 6 to
>> >> >> > > >> (F * A * 3) if you have box coordinates; double it if you have
>> >> >> > > >> velocities as well.
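
The formula above can be checked with a short Python sketch. The function names are illustrative, and the sketch covers only the plain-coordinate case (no box coordinates or velocities). The example uses the 250000 frames and 7584 atoms quoted elsewhere in this thread.

```python
def coords_memory_bytes(frames, atoms, bytes_per_value=4):
    """Memory for a COORDS data set: (F * A * 3) * 4 bytes (single precision)."""
    return frames * atoms * 3 * bytes_per_value


def to_mb(n_bytes):
    """Convert bytes to MB using the divisor from the text (1048576)."""
    return n_bytes / 1048576


if __name__ == "__main__":
    size = coords_memory_bytes(250000, 7584)
    print(size)                # 22752000000
    print(round(to_mb(size)))  # 21698
```

That is about 23 GB, which has to fit in RAM, not on disk.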
>> >> >> > > >>
>> >> >> > > >> However, in place of a COORDS data set cpptraj also lets you use
>> >> >> > > >> what is called a TRAJ data set (which leaves the data on disk).
>> >> >> > > >> The only issue is that, because it remains on disk, you cannot
>> >> >> > > >> modify a TRAJ data set, so you will have to pre-process your
>> >> >> > > >> trajectory (i.e. strip/image) first. This is a good idea in
>> >> >> > > >> general since it will make subsequent analyses faster. Here is
>> >> >> > > >> some input as an example.
>> >> >> > > >>
>> >> >> > > >> # Step 1 - Preprocess
>> >> >> > > >> parm myparm.parm7
>> >> >> > > >> trajin mytraj.nc
>> >> >> > > >> strip :Na+,WAT nobox outprefix strip
>> >> >> > > >> autoimage
>> >> >> > > >> rms first mass @C,CA,N
>> >> >> > > >> trajout strip.mytraj.nc nobox
>> >> >> > > >>
>> >> >> > > >> A few things to note here. First, I put the 'strip' command
>> >> >> > > >> before everything else; this way subsequent commands will be
>> >> >> > > >> faster because there are fewer atoms to deal with. Also note that
>> >> >> > > >> in my 'strip' command I'm writing out a stripped topology for use
>> >> >> > > >> with my stripped trajectory. Finally, and most importantly,
>> >> >> > > >> because you are rms-fitting you will no longer be able to image
>> >> >> > > >> anyway, so I'm getting rid of any box coordinates.
>> >> >> > > >>
>> >> >> > > >> # Step 2 - Cluster
>> >> >> > > >> parm strip.myparm.parm7
>> >> >> > > >> trajin strip.mytraj.nc
>> >> >> > > >> loadtraj name MYTRAJ
>> >> >> > > >> cluster crdset MYTRAJ :1-291@CA,N,C,O mass clusters 10 \
>> >> >> > > >>   out cluster_out nofit averagelinkage \
>> >> >> > > >>   summary summary_out info Cluster_info \
>> >> >> > > >>   repout box2.rep repfmt pdb \
>> >> >> > > >>   clusterout cluster.nc clusterfmt netcdf
>> >> >> > > >>
>> >> >> > > >> The 'loadtraj' command in this case takes all loaded trajectories
>> >> >> > > >> from 'trajin' statements and puts them into a TRAJ data set named
>> >> >> > > >> MYTRAJ, which stays on disk and can subsequently be used by the
>> >> >> > > >> 'cluster' command.
>> >> >> > > >>
>> >> >> > > >> One more thing to keep in mind is that even though the
>> >> >> > > >> coordinates will be kept on disk, you will still need enough
>> >> >> > > >> memory to hold the pairwise distance matrix:
>> >> >> > > >>
>> >> >> > > >> memory_in_bytes = ((F * (F-1)) / 2) * 4
>> >> >> > > >>
>> >> >> > > >> If you don't have enough memory to hold the pairwise distance
>> >> >> > > >> matrix, try using the 'sieve' keyword to reduce the number of
>> >> >> > > >> frames being clustered in the first pass. This will also speed up
>> >> >> > > >> the actual clustering a bit. Last, and most importantly, make
>> >> >> > > >> sure you are using the most up-to-date version of cpptraj (14.22).
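
To see how much 'sieve' helps, here is a minimal Python sketch of the pairwise matrix formula above. The function names are illustrative. The sieved frame count is assumed to be the total divided by the sieve value, rounded up, which matches the "50000 frames, 5000 sieved frames" line reported in this thread.

```python
def sieved_frames(frames, sieve=1):
    """Frames kept when every `sieve`-th frame is used (ceiling division).
    Assumption: regular sieving starting from the first frame."""
    return -(-frames // sieve)


def pairwise_matrix_bytes(frames, sieve=1, bytes_per_value=4):
    """Memory for the pairwise matrix: ((F * (F-1)) / 2) * 4 bytes."""
    f = sieved_frames(frames, sieve)
    return f * (f - 1) // 2 * bytes_per_value


if __name__ == "__main__":
    print(pairwise_matrix_bytes(250000))      # 124999500000 (~125 GB)
    print(pairwise_matrix_bytes(250000, 10))  # 1249950000   (~1.25 GB)
    print(pairwise_matrix_bytes(50000, 10))   # 49990000     (~50 MB)
```

Since the matrix grows with the square of the frame count, 'sieve 10' cuts the memory needed by roughly a factor of 100.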
>> >> >> > > >>
>> >> >> > > >> Hope this helps,
>> >> >> > > >>
>> >> >> > > >> -Dan
>> >> >> > > >>
>> >> >> > > >> On Tue, Dec 16, 2014 at 11:28 AM, Mahendra B Thapa <
>> >> >> > thapamb.mail.uc.edu <javascript:;>
>> >> >> > > >
>> >> >> > > >> wrote:
>> >> >> > > >> > Dear Amber users
>> >> >> > > >> > I used the following command for clustering 50ns of all-atom
>> >> >> > > >> > simulation data:
>> >> >> > > >> > cpptraj -i input_file -p para_top
>> >> >> > > >> > where 'input_file' consists of
>> >> >> > > >> >
>> >> >> > > >> > trajin mdcrd_files
>> >> >> > > >> > autoimage
>> >> >> > > >> > rms first mass @C,CA,N
>> >> >> > > >> > strip :Na+,WAT
>> >> >> > > >> > cluster :1-291@CA,N,C,O mass clusters 10 out cluster_out nofit averagelinkage \
>> >> >> > > >> >   summary summary_out info Cluster_info repout box2.rep repfmt pdb \
>> >> >> > > >> >   clusterout cluster.nc clusterfmt netcdf
>> >> >> > > >> > go
>> >> >> > > >> >
>> >> >> > > >> > After running the command, I got the following messages without
>> >> >> > > >> > any output files:
>> >> >> > > >> >
>> >> >> > > >> > 1] terminate called after throwing an instance of 'std::bad_alloc'
>> >> >> > > >> > what(): std::bad_alloc
>> >> >> > > >> > Aborted
>> >> >> > > >> >
>> >> >> > > >> > 2] Warning: One or more analyses requested creation of default COORDS DataSet.
>> >> >> > > >> > CREATECRD: Saving coordinates from Top to file to "_DEFAULTCRD_"
>> >> >> > > >> >
>> >> >> > > >> > 3] Warning: Coordinates are being rotated and box coordinates are present.
>> >> >> > > >> > Warning: Unit cell vectors are NOT rotated; imaging will not be possible
>> >> >> > > >> > Warning: after the RMS-fit is performed.
>> >> >> > > >> >
>> >> >> > > >> > Any comments and suggestion will be very useful.
>> >> >> > > >> >
>> >> >> > > >> > Thank you,
>> >> >> > > >> > Mahendra Thapa
>> >> >> > > >> > University of Cincinnati
>> >> >> > > >> > _______________________________________________
>> >> >> > > >> > AMBER mailing list
>> >> >> > > >> > AMBER.ambermd.org <javascript:;>
>> >> >> > > >> > http://lists.ambermd.org/mailman/listinfo/amber
>> >> >> > > >>
>> >> >> > > >>
>> >> >> > > >>
>> >> >> > > >> --
>> >> >> > > >> -------------------------
>> >> >> > > >> Daniel R. Roe, PhD
>> >> >> > > >> Department of Medicinal Chemistry
>> >> >> > > >> University of Utah
>> >> >> > > >> 30 South 2000 East, Room 307
>> >> >> > > >> Salt Lake City, UT 84112-5820
>> >> >> > > >> http://home.chpc.utah.edu/~cheatham/
>> >> >> > > >> (801) 587-9652
>> >> >> > > >> (801) 585-6208 (Fax)
>> >> >> > > >>
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>>
>>
>>



-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jan 20 2015 - 10:00:05 PST