Re: [AMBER] FW: clustering problem in ambertool14

From: Mahendra B Thapa <thapamb.mail.uc.edu>
Date: Fri, 2 Jan 2015 11:08:40 -0500

Dear Dr. Daniel,
Memory issues were solved when I followed the steps you suggested; thank
you for that.

A new problem appeared, as shown in the screen output:

Internal Error: Metric is COORDS base but data set is not.
Error: in Analysis # 0
    1 errors encountered reading input.

(Note: I have already applied the bug fixes for AmberTools 14:
http://ambermd.org/bugfixes/AmberTools/14.0/update.17)

DATAFILES:
  cluster_out (Standard Data File): Cnum_00001
Warning: Set 'Cnum_00001' contains no data.
Warning: File 'cluster_out' has no sets containing data.

Are these errors due to the large number of frames (250000) and the number
of atoms (7584)?

A previous post (http://archive.ambermd.org/201408/0214.html) has some
related discussion, but as far as I can tell I have been using the stripped
topology file to run cpptraj. I have attached the screen output (text
file: TEST_LOG) to this email.

Thank you for your help,
Mahendra Thapa


On Tue, Dec 16, 2014 at 3:20 PM, Thapa, Mahendra (thapamb) <
thapamb.mail.uc.edu> wrote:

>
>
>
> ________________________________________
> From: Daniel Roe
> Sent: Tuesday, December 16, 2014 2:19:51 PM (UTC-06:00) Central America
> To: AMBER Mailing List
> Subject: Re: [AMBER] clustering problem in ambertool14
>
> Hi,
>
> Usually when you get this error message during a command that uses a
> COORDS data set (cluster, 2drms, crdfluct etc) it's because you ran
> out of memory. Here is a formula to estimate the amount of memory you
> will need to hold a COORDS data set:
>
> memory_in_bytes = (F * A * 3) * 4
>
> where F is the number of frames, A is the number of atoms (after
> stripping in this case), the 3 is from # of coords per atom and 4 is
> bytes (COORDS are single precision). Divide by 1048576 to get the
> result in MB. Add 6 to (F * A * 3) if you have box coordinates, double
> if you have velocities as well.
>
> However, in place of a COORDS data set cpptraj also lets you use what
> is called a TRAJ data set (which leaves the data on disk). The only catch
> is that, because the data stays on disk, you cannot modify a TRAJ
> data set, so you will have to pre-process your trajectory (i.e.
> strip/image) first. This is a good idea in general since it will
> make subsequent analyses faster. Here is some example input.
>
> # Step 1 - Preprocess
> parm myparm.parm7
> trajin mytraj.nc
> strip :Na+,WAT nobox outprefix strip
> autoimage
> rms first mass @C,CA,N
> trajout strip.mytraj.nc nobox
>
> A few things to note here. First is that I put the 'strip' command
> before everything else; this way subsequent commands will be faster
> because there are fewer atoms to deal with. Also note in my 'strip'
> command I'm writing out a stripped topology for use with my stripped
> trajectory. Finally and most importantly, because you are rms-fitting
> you will no longer be able to image anyway, so I'm getting rid of any
> box coordinates.
>
> # Step 2 - Cluster
> parm strip.myparm.parm7
> trajin strip.mytraj.nc
> loadtraj name MYTRAJ
> cluster crdset MYTRAJ :1-291@CA,N,C,O mass clusters 10 out cluster_out \
>   nofit averagelinkage \
>   summary summary_out info Cluster_info repout box2.rep repfmt pdb \
>   clusterout cluster.nc clusterfmt netcdf
>
> The 'loadtraj' command in this case is taking all loaded trajectories
> from 'trajin' statements and putting them into a TRAJ data set named
> MYTRAJ, which stays on-disk and can subsequently be used by the
> 'cluster' command.
>
> One more thing to keep in mind is that even though the coordinates
> will be kept on disk, you will still need enough memory to hold the
> pairwise distance matrix:
>
> memory_in_bytes = ((F * (F-1)) / 2) * 4
>
> If you don't have enough memory to hold the pairwise distance matrix
> try using the 'sieve' keyword to reduce the number of frames being
> clustered in the first pass. This will also speed up the actual
> clustering a bit. Last, and most importantly, make sure you are using
> the most up-to-date version of cpptraj (14.22).
>
> Hope this helps,
>
> -Dan
>
> On Tue, Dec 16, 2014 at 11:28 AM, Mahendra B Thapa <thapamb.mail.uc.edu>
> wrote:
> > Dear Amber users
> > I used the following command to cluster 50 ns of all-atom simulation data:
> > cpptraj -i input_file -p para_top
> > where 'input_file' consists of
> >
> > trajin mdcrd_files
> > autoimage
> > rms first mass @C,CA,N
> > strip :Na+,WAT
> > cluster :1-291@CA,N,C,O mass clusters 10 out cluster_out nofit \
> > averagelinkage \
> > summary summary_out info Cluster_info repout box2.rep repfmt pdb \
> > clusterout cluster.nc clusterfmt netcdf
> > go
> >
> > After running the command, I got the following messages and no output
> > files:
> >
> > 1]terminate called after throwing an instance of 'std::bad_alloc'
> > what(): std::bad_alloc
> > Aborted
> >
> > 2] Warning: One or more analyses requested creation of default COORDS
> > DataSet.
> > CREATECRD: Saving coordinates from Top to file to "_DEFAULTCRD_"
> >
> >
> > 3]Warning: Coordinates are being rotated and box coordinates are present.
> > Warning: Unit cell vectors are NOT rotated; imaging will not be possible
> > Warning: after the RMS-fit is performed.
> >
> > Any comments and suggestions would be very helpful.
> >
> > Thank you,
> > Mahendra Thapa
> > University of Cincinnati
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
>
>
> --
> -------------------------
> Daniel R. Roe, PhD
> Department of Medicinal Chemistry
> University of Utah
> 30 South 2000 East, Room 307
> Salt Lake City, UT 84112-5820
> http://home.chpc.utah.edu/~cheatham/
> (801) 587-9652
> (801) 585-6208 (Fax)
>
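
As a rough check of the COORDS memory estimate quoted above, here is a minimal Python sketch applied to the numbers in this thread (250000 frames, 7584 atoms). It is not part of cpptraj, and the function name is made up for illustration. It assumes 7584 is the atom count after stripping, which the thread does not confirm, and it leaves out box and velocity data since the stripped trajectory is written with 'nobox'.

```python
BYTES_PER_FLOAT = 4      # COORDS data sets are single precision
BYTES_PER_MB = 1048576   # divisor given in the quoted message


def coords_memory_bytes(n_frames, n_atoms):
    """Bytes needed to hold a COORDS data set: (F * A * 3) * 4.

    Hypothetical helper for illustration only; box and velocity
    data are not counted.
    """
    return n_frames * n_atoms * 3 * BYTES_PER_FLOAT


if __name__ == "__main__":
    # Numbers from this thread; 7584 is assumed to be the stripped atom count.
    total = coords_memory_bytes(250000, 7584)
    print(f"{total} bytes = {total / BYTES_PER_MB:.0f} MB")
```

With these inputs the estimate comes to 22752000000 bytes, or about 21698 MB, which is more than many workstations have available and is consistent with the original std::bad_alloc.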
>
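
The pairwise distance matrix estimate and the effect of 'sieve' can be sketched the same way. This is a minimal Python example, not cpptraj code, and all function names are hypothetical. It assumes that a sieve value of N keeps roughly every Nth frame for the first pass, and the 8 GiB memory budget in the example is an arbitrary illustrative figure.

```python
import math

BYTES_PER_FLOAT = 4  # single-precision distances, per the quoted formula


def pairwise_matrix_bytes(n_frames):
    """Bytes for the pairwise distance matrix: ((F * (F-1)) / 2) * 4."""
    return (n_frames * (n_frames - 1) // 2) * BYTES_PER_FLOAT


def frames_after_sieve(n_frames, sieve):
    """Frames left for the first pass.

    Assumption: a sieve of N keeps every Nth frame, starting with the first.
    """
    return math.ceil(n_frames / sieve)


def min_sieve(n_frames, budget_bytes):
    """Smallest sieve value whose pairwise matrix fits in budget_bytes."""
    sieve = 1
    while pairwise_matrix_bytes(frames_after_sieve(n_frames, sieve)) > budget_bytes:
        sieve += 1
    return sieve


if __name__ == "__main__":
    n_frames = 250000        # frame count from this thread
    budget = 8 * 1024 ** 3   # hypothetical 8 GiB budget
    print("full matrix bytes:", pairwise_matrix_bytes(n_frames))
    print("smallest sieve that fits:", min_sieve(n_frames, budget))
```

For 250000 frames the unsieved matrix alone is 124999500000 bytes, so even with the coordinates kept on disk some sieving is likely to be needed on typical hardware.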



Received on Fri Jan 02 2015 - 08:30:02 PST