
From: Daniel Roe <daniel.r.roe.gmail.com>

Date: Tue, 28 May 2019 14:41:52 -0400

Hi,

To use less memory, don't store the 'fit' trajectory as a COORDS data
set. The only reason to save the coordinates to memory is so you
don't have to rms-fit twice, but really that's not much of a
bottleneck anyway (particularly if you're processing netcdf
trajectories). Just make sure that when you run the 'projection'
command you're manipulating the trajectory coordinates the same
way as when you are generating the covariance matrix. So for example:

# Input trajectory
trajin ../Trajectory/md1_1.cdf

# Generate average for rms fit
rms first !.H=
average crdset AVG
run

# Generate covariance matrix
rms ref AVG !.H=
matrix covar name MyMatrix !.H=
diagmatrix MyMatrix vecs 2 name MyEvecs
run

# Do the projection
rms ref AVG !.H=
projection evecs MyEvecs !.H= out project.dat beg 1 end 2
run
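
For anyone who wants to see why no frames need to stay in memory, here is a rough sketch of the same two-pass idea in plain Python. It is only an illustration under stated assumptions, not what cpptraj does internally: `read_frames` is a hypothetical stand-in for re-reading the trajectory from disk, `prepare` is a simplified stand-in for the rms fit (it only removes the centroid), and power iteration is swapped in for a real eigensolver. The point it shows is that pass 1 only accumulates sums, and pass 2 applies the same `prepare` step before projecting.

```python
import math
import random


def read_frames(n_frames=200, n_atoms=4, seed=1):
    """Hypothetical frame source: yields the same synthetic frames on every
    call, the way a trajectory file can be re-read from disk for each pass."""
    rng = random.Random(seed)
    base = [float(i) for i in range(3 * n_atoms)]
    for _ in range(n_frames):
        stretch = rng.gauss(0.0, 2.0)                    # one large collective motion
        frame = [b + rng.gauss(0.0, 0.1) for b in base]  # small noise on every coordinate
        frame[0] += stretch                              # atom 0 moves along +x
        frame[3] -= stretch                              # atom 1 moves along -x
        yield frame


def prepare(frame):
    """Simplified stand-in for the rms fit: remove the centroid from a flat
    [x1, y1, z1, x2, ...] frame. Must be applied identically in both passes."""
    n_atoms = len(frame) // 3
    centroid = [sum(frame[axis::3]) / n_atoms for axis in range(3)]
    return [value - centroid[i % 3] for i, value in enumerate(frame)]


def streaming_covariance(frames):
    """Pass 1: accumulate sum(x) and sum(x x^T) one frame at a time."""
    count, total, outer = 0, None, None
    for frame in frames:
        x = prepare(frame)
        if total is None:
            dim = len(x)
            total = [0.0] * dim
            outer = [[0.0] * dim for _ in range(dim)]
        count += 1
        for i in range(dim):
            total[i] += x[i]
            for j in range(dim):
                outer[i][j] += x[i] * x[j]
    mean = [t / count for t in total]
    cov = [[outer[i][j] / count - mean[i] * mean[j] for j in range(dim)]
           for i in range(dim)]
    return mean, cov


def top_eigenvectors(cov, n_vecs, n_iter=500):
    """Power iteration with deflation (an assumption made for brevity)."""
    dim = len(cov)
    work = [row[:] for row in cov]
    rng = random.Random(0)
    vecs = []
    for _ in range(n_vecs):
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(n_iter):
            w = [sum(work[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        eigval = sum(v[i] * sum(work[i][j] * v[j] for j in range(dim))
                     for i in range(dim))
        vecs.append(v)
        for i in range(dim):          # deflate: remove the component just found
            for j in range(dim):
                work[i][j] -= eigval * v[i] * v[j]
    return vecs


def project(frames, mean, vecs):
    """Pass 2: re-read the frames, prepare them the same way, and yield the
    projection of each mean-subtracted frame onto each eigenvector."""
    for frame in frames:
        x = prepare(frame)
        yield [sum((x[i] - mean[i]) * v[i] for i in range(len(x))) for v in vecs]


if __name__ == "__main__":
    mean, cov = streaming_covariance(read_frames())   # pass 1
    vecs = top_eigenvectors(cov, n_vecs=2)
    for index, (p1, p2) in enumerate(project(read_frames(), mean, vecs), start=1):
        if index <= 3:                                # pass 2, print a few frames
            print(index, p1, p2)
```

Memory use here scales with the square of the number of selected coordinates (the covariance matrix), not with the number of frames, which is the same trade the cpptraj input above makes by fitting twice instead of keeping a COORDS set.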

Hope this helps,

-Dan

On Sat, May 25, 2019 at 3:32 PM David Cerutti <dscerutti.gmail.com> wrote:

> Can anyone on the list help me do PCA with a really giant trajectory? I
> have 35GB now and will have more than 200GB by the time this is all done.
> The frames are being saved every 50ps and this is an implicit solvent
> trajectory, so it's not going to be much use to strip the coordinates or
> reduce the output rate if I want to keep gathering the relevant data. What
> I'm working with is a file that looks like this, taken from the cpptraj
> manual:
>
> trajin ../Trajectory/md1_1.cdf
> ( ... many more trajin commands ...)
> trajin ../Trajectory/md112_22.cdf
> rms first !.H=
> average crdset AVG
> run
> rms ref AVG !.H=
> matrix covar name MyMatrix !.H=
> createcrd CRD1
> run
> runanalysis diagmatrix MyMatrix vecs 2 name MyEvecs
> crdaction CRD1 projection evecs MyEvecs !.H= out project.dat beg 1 end 2
> go
>
> The problem, it seems is that createcrd CRD1 line. When doing that, it
> commits all coords to memory, limiting the amount of trajectory you can
> analyze to the amount of system RAM. Otherwise, it seems that I can
> compute the average positions AVG and compose the covariance matrix while
> reading each frame from disk, without storing the entire trajectory in RAM.
>
> I believe that if I had a way to store the matrix (which cpptraj provides)
> and then READ IT BACK IN, I could compose the covariance matrix for the
> entire trajectory and save the average coordinates. I could then read
> segments of the trajectory, read back the averaged coordinates, load the
> matrix and diagonalize it, align each frame from the trajectory segment to
> the average from the complete trajectory, and calculate each frame's
> projection onto the matrix eigenvectors.
>
> The only other alternative I could see here would be to use cpptraj to
> compute the averaged coordinates and save them along with the covariance
> matrix. Matlab could diagonalize the matrix and give me the eigenvectors.
> I could then proceed segment by segment, using cpptraj to align the
> trajectory coordinates to that average and write the aligned coordinates
> back to disk. A secondary jiffy program could then read the aligned
> coordinates and compare them to each eigenvector to calculate the
> projections and thus give me the PCA. It would be a duct-tape solution,
> but one that is possible if that's what I need to do.
>
> Cheers,
> Dave


_______________________________________________

AMBER mailing list

AMBER.ambermd.org

http://lists.ambermd.org/mailman/listinfo/amber

Received on Tue May 28 2019 - 12:00:01 PDT
