Re: [AMBER] PCA with a really giant trajectory

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Tue, 28 May 2019 14:41:52 -0400

Hi,

To use less memory, don't store the 'fit' trajectory as a COORDS data
set. The only reason to save the coordinates in memory is so you
don't have to rms-fit twice, and that's not much of a bottleneck
anyway (particularly if you're processing netcdf trajectories). Just
make sure that when you run the 'projection' command you're
manipulating the trajectory coordinates the same way as when you
generated the covariance matrix. For example:

# Input trajectory
trajin ../Trajectory/md1_1.cdf
# Generate average for rms fit
rms first !@H=
average crdset AVG
run
# Generate covariance matrix
rms ref AVG !@H=
matrix covar name MyMatrix !@H=
diagmatrix MyMatrix vecs 2 name MyEvecs
run
# Do the projection
rms ref AVG !@H=
projection evecs MyEvecs !@H= out project.dat beg 1 end 2
run
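
If it helps to see why this keeps memory flat, here is a rough NumPy
sketch of the same three-pass idea. It is only an illustration, not
what cpptraj does internally: read_frames and streaming_pca are
made-up names, the frames are synthetic and assumed to be already
fitted and flattened, and dividing the covariance by the number of
frames is an assumption of the sketch.

```python
import numpy as np


def read_frames(n_frames=200, n_coords=6, seed=0):
    """Hypothetical stand-in for re-reading a trajectory from disk.

    Yields one flattened, already-fitted frame at a time. The fixed seed
    means every pass sees identical frames, like re-opening the same files.
    """
    rng = np.random.default_rng(seed)
    scales = np.linspace(3.0, 0.5, n_coords)  # some directions vary more
    for _ in range(n_frames):
        yield rng.normal(size=n_coords) * scales


def streaming_pca(frames, n_vecs=2):
    """Three passes over frames(); only one frame is held at a time."""
    # Pass 1: average coordinates.
    n, total = 0, None
    for x in frames():
        total = x.copy() if total is None else total + x
        n += 1
    avg = total / n

    # Pass 2: covariance matrix about the average (normalized by n here).
    cov = np.zeros((avg.size, avg.size))
    for x in frames():
        d = x - avg
        cov += np.outer(d, d)
    cov /= n

    # Diagonalize and keep the n_vecs modes with the largest eigenvalues.
    evals, evecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(evals)[::-1][:n_vecs]
    evals, evecs = evals[order], evecs[:, order]

    # Pass 3: project each frame, again without storing the trajectory.
    def projections():
        for x in frames():
            yield (x - avg) @ evecs

    return avg, evals, evecs, projections
```

Calling streaming_pca(read_frames) and iterating over the returned
projections() gives one row of projection values per frame. The
frames are treated identically in the covariance pass and the
projection pass, which is the same requirement as above.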

Hope this helps,

-Dan

On Sat, May 25, 2019 at 3:32 PM David Cerutti <dscerutti.gmail.com> wrote:
>
> Can anyone on the list help me do PCA with a really giant trajectory? I
> have 35GB now and will have more than 200GB by the time this is all done.
> The frames are being saved every 50ps and this is an implicit solvent
> trajectory, so it's not going to be much use to strip the coordinates or
> reduce the output rate if I want to keep gathering the relevant data. What
> I'm working with is a file that looks like this, taken from the cpptraj
> manual:
>
> trajin ../Trajectory/md1_1.cdf
> ( ... many more trajin commands ...)
> trajin ../Trajectory/md112_22.cdf
> rms first !@H=
> average crdset AVG
> run
> rms ref AVG !@H=
> matrix covar name MyMatrix !@H=
> createcrd CRD1
> run
> runanalysis diagmatrix MyMatrix vecs 2 name MyEvecs
> crdaction CRD1 projection evecs MyEvecs !@H= out project.dat beg 1 end 2
> go
>
> The problem, it seems, is that createcrd CRD1 line. When doing that, it
> commits all coords to memory, limiting the amount of trajectory you can
> analyze to the amount of system RAM. Otherwise, it seems that I can
> compute the average positions AVG and compose the covariance matrix while
> reading each frame from disk, without storing the entire trajectory in RAM.
>
> I believe that if I had a way to store the matrix (which cpptraj provides)
> and then READ IT BACK IN, I could compose the covariance matrix for the
> entire trajectory and save the average coordinates. I could then read
> segments of the trajectory, read back the averaged coordinates, load the
> matrix and diagonalize it, align each frame from the trajectory segment to
> the average from the complete trajectory, and calculate each frame's
> projection onto the matrix eigenvectors.
>
> The only other alternative I could see here would be to use cpptraj to
> compute the averaged coordinates and save them along with the covariance
> matrix. Matlab could diagonalize the matrix and give me the eigenvectors.
> I could then proceed segment by segment, using cpptraj to align the
> trajectory coordinates to that average and write the aligned coordinates
> back to disk. A secondary jiffy program could then read the aligned
> coordinates and compare them to each eigenvector to calculate the
> projections and thus give me the PCA. It would be a duct-tape solution,
> but one that is possible if that's what I need to do.
>
> Cheers,
> Dave
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

Received on Tue May 28 2019 - 12:00:01 PDT