Re: [AMBER] PCA with a really giant trajectory

From: David Cerutti <dscerutti.gmail.com>
Date: Tue, 28 May 2019 14:47:56 -0400

I would just add that I'm very happy with the design structure of cpptraj
and how it can use RAM to save compute time, or provide the option of a
little more computation to keep RAM requirements down.

Dave

On Tue, May 28, 2019 at 2:42 PM Daniel Roe <daniel.r.roe.gmail.com> wrote:

> Hi,
>
> To use less memory, don't store the 'fit' trajectory as a COORDS data
> set. The only reason to save the coordinates to memory is just so you
> don't have to rms-fit twice, but really that's not that much of a
> bottleneck anyway (particularly if you're processing netcdf
> trajectories). Just make sure that when you run the 'projection'
> command that you're manipulating the trajectory coordinates the same
> way as when you are generating the covariance matrix. So for example:
>
> # Input trajectory
> trajin ../Trajectory/md1_1.cdf
> # Generate average for rms fit
> rms first !.H=
> average crdset AVG
> run
> # Generate covariance matrix
> rms ref AVG !.H=
> matrix covar name MyMatrix !.H=
> diagmatrix MyMatrix vecs 2 name MyEvecs
> run
> # Do the projection
> rms ref AVG !.H=
> projection evecs MyEvecs !.H= out project.dat beg 1 end 2
> run
>
> Hope this helps,
>
> -Dan
>
> On Sat, May 25, 2019 at 3:32 PM David Cerutti <dscerutti.gmail.com> wrote:
> >
> > Can anyone on the list help me do PCA with a really giant trajectory? I
> > have 35GB now and will have more than 200GB by the time this is all done.
> > The frames are being saved every 50ps and this is an implicit solvent
> > trajectory, so it's not going to be much use to strip the coordinates or
> > reduce the output rate if I want to keep gathering the relevant data. What
> > I'm working with is a file that looks like this, taken from the cpptraj
> > manual:
> >
> > trajin ../Trajectory/md1_1.cdf
> > ( ... many more trajin commands ...)
> > trajin ../Trajectory/md112_22.cdf
> > rms first !.H=
> > average crdset AVG
> > run
> > rms ref AVG !.H=
> > matrix covar name MyMatrix !.H=
> > createcrd CRD1
> > run
> > runanalysis diagmatrix MyMatrix vecs 2 name MyEvecs
> > crdaction CRD1 projection evecs MyEvecs !.H= out project.dat beg 1 end 2
> > go
> >
> > The problem, it seems, is that createcrd CRD1 line. When doing that, it
> > commits all coords to memory, limiting the amount of trajectory you can
> > analyze to the amount of system RAM. Otherwise, it seems that I can
> > compute the average positions AVG and compose the covariance matrix while
> > reading each frame from disk, without storing the entire trajectory in RAM.
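
[Editorial sketch, not part of the original message.] The point above, that the average and the covariance matrix can be built one frame at a time, can be illustrated with a few lines of plain Python. This is only a minimal sketch under stated assumptions: each frame arrives as a flat list of already-fitted coordinates, the covariance is normalized by the number of frames, and the function name streaming_covariance is hypothetical. It is not a description of how cpptraj implements 'matrix covar'.

```python
def streaming_covariance(frames):
    """Accumulate the mean and covariance from an iterable of frames.

    Each frame is a flat list of floats (x1, y1, z1, x2, ...), assumed to be
    already fitted to a common reference. Only running sums are kept, so the
    memory use depends on the number of coordinates, not the number of frames.
    """
    n = 0
    sum_x = None   # running sum of each coordinate
    sum_xx = None  # running sum of coordinate products
    for x in frames:
        if sum_x is None:
            dim = len(x)
            sum_x = [0.0] * dim
            sum_xx = [[0.0] * dim for _ in range(dim)]
        n += 1
        for i, xi in enumerate(x):
            sum_x[i] += xi
            row = sum_xx[i]
            for j, xj in enumerate(x):
                row[j] += xi * xj
    if n == 0:
        raise ValueError("no frames supplied")
    mean = [s / n for s in sum_x]
    # Assumption: population covariance, <x_i x_j> - <x_i><x_j>, divided by N.
    cov = [[sum_xx[i][j] / n - mean[i] * mean[j] for j in range(dim)]
           for i in range(dim)]
    return mean, cov
```

Because the function only iterates over its input once, it can be fed a generator that reads frames from disk, so no more than one frame needs to be held in memory at a time.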
> >
> > I believe that if I had a way to store the matrix (which cpptraj provides)
> > and then READ IT BACK IN, I could compose the covariance matrix for the
> > entire trajectory and save the average coordinates. I could then read
> > segments of the trajectory, read back the averaged coordinates, load the
> > matrix and diagonalize it, align each frame from the trajectory segment to
> > the average from the complete trajectory, and calculate each frame's
> > projection onto the matrix eigenvectors.
> >
> > The only other alternative I could see here would be to use cpptraj to
> > compute the averaged coordinates and save them along with the covariance
> > matrix. Matlab could diagonalize the matrix and give me the eigenvectors.
> > I could then proceed segment by segment, using cpptraj to align the
> > trajectory coordinates to that average and write the aligned coordinates
> > back to disk. A secondary jiffy program could then read the aligned
> > coordinates and compare them to each eigenvector to calculate the
> > projections and thus give me the PCA. It would be a duct-tape solution,
> > but one that is possible if that's what I need to do.
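
[Editorial sketch, not part of the original message.] The "secondary jiffy program" described above reduces to one dot product per frame and eigenvector. The following is a minimal sketch, assuming a plain (not mass-weighted) covariance matrix, unit-length eigenvectors, and frames that have already been aligned to the average structure with the same atom mask and ordering. The name project_frame and the flat-list data layout are hypothetical.

```python
def project_frame(frame, average, eigenvectors):
    """Project one aligned frame onto each eigenvector.

    frame, average: flat lists of floats (x1, y1, z1, x2, ...), same length.
    eigenvectors:   list of flat lists, one per principal component.
    Returns one projection value per eigenvector.
    """
    if len(frame) != len(average):
        raise ValueError("frame and average must have the same length")
    # Displacement of this frame from the average structure.
    disp = [x - a for x, a in zip(frame, average)]
    projections = []
    for evec in eigenvectors:
        if len(evec) != len(disp):
            raise ValueError("eigenvector length does not match the frame")
        # Assumption: no mass weighting, so the projection is a plain dot product.
        projections.append(sum(d * v for d, v in zip(disp, evec)))
    return projections
```

Calling this once per frame while stepping through trajectory segments yields the same kind of per-frame projection series that the thread writes to project.dat, without keeping the coordinates in memory.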
> >
> > Cheers,
> > Dave
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
Received on Tue May 28 2019 - 12:00:02 PDT