# [AMBER] PCA with a really giant trajectory

From: David Cerutti <dscerutti.gmail.com>
Date: Sat, 25 May 2019 15:31:54 -0400

Can anyone on the list help me do PCA with a really giant trajectory? I
have 35GB now and will have more than 200GB by the time this is all done.
The frames are being saved every 50ps and this is an implicit solvent
trajectory, so it's not going to be much use to strip the coordinates or
reduce the output rate if I want to keep gathering the relevant data. What
I'm working with is a file that looks like this, taken from the cpptraj
manual:

trajin ../Trajectory/md1_1.cdf
( ... many more trajin commands ...)
trajin ../Trajectory/md112_22.cdf
rms first !@H=
average crdset AVG
run
rms ref AVG !@H=
matrix covar name MyMatrix !@H=
createcrd CRD1
run
runanalysis diagmatrix MyMatrix vecs 2 name MyEvecs
crdaction CRD1 projection evecs MyEvecs !@H= out project.dat beg 1 end 2
go

The problem, it seems, is the createcrd CRD1 line. That command commits
all coordinates to memory, limiting the amount of trajectory you can
analyze to the amount of system RAM. Otherwise, it seems that I can
compute the average positions AVG and compose the covariance matrix while
reading each frame from disk, without storing the entire trajectory in RAM.
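To illustrate why the average and the covariance matrix do not need the whole trajectory in RAM, here is a minimal Python sketch of a one-pass (Welford-style) accumulation. It is not what cpptraj does internally; the function name streaming_covariance is hypothetical, each frame is assumed to be a flat list of already-aligned coordinates, and the division by the frame count (rather than count minus one) is an assumption.

```python
def streaming_covariance(frames):
    """Accumulate the mean and covariance of coordinate frames in one pass.

    `frames` is any iterable of equal-length lists of floats (for example a
    generator reading one frame at a time from disk), so only the running
    mean and a dim x dim matrix are held in memory.
    """
    n = 0
    mean = None
    m2 = None  # running sum of products of deviations
    for x in frames:
        if mean is None:
            dim = len(x)
            mean = [0.0] * dim
            m2 = [[0.0] * dim for _ in range(dim)]
        n += 1
        # Deviation from the mean before and after including this frame.
        delta_old = [xi - mi for xi, mi in zip(x, mean)]
        mean = [mi + di / n for mi, di in zip(mean, delta_old)]
        delta_new = [xi - mi for xi, mi in zip(x, mean)]
        for i in range(dim):
            row = m2[i]
            d_i = delta_old[i]
            for j in range(dim):
                row[j] += d_i * delta_new[j]
    if n == 0:
        raise ValueError("no frames supplied")
    # Assumption: normalize by the number of frames.
    cov = [[v / n for v in row] for row in m2]
    return n, mean, cov
```

The pure-Python double loop is only for clarity; with thousands of coordinates per frame an array library would be the practical choice.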

I believe that if I had a way to store the matrix (which cpptraj provides)
and then READ IT BACK IN, I could compose the covariance matrix for the
entire trajectory and save the average coordinates. I could then read the
matrix back in and diagonalize it, align each frame from each trajectory
segment to the average from the complete trajectory, and calculate each
frame's projection onto the matrix eigenvectors.
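The store-and-read-back idea amounts to combining statistics from separate trajectory segments. A minimal Python sketch of that merge follows, assuming every segment was aligned to the same reference and that each segment's covariance was normalized by its own frame count. The function name merge_stats and the (count, mean, covariance) tuple layout are hypothetical, not a cpptraj file format.

```python
def merge_stats(stats_a, stats_b):
    """Combine (n, mean, cov) from two trajectory segments into one.

    Each cov is assumed to be normalized by its own frame count n.
    The result equals what a single pass over both segments would give.
    """
    n_a, mean_a, cov_a = stats_a
    n_b, mean_b, cov_b = stats_b
    n = n_a + n_b
    dim = len(mean_a)
    # Shift between the two segment means.
    delta = [mb - ma for ma, mb in zip(mean_a, mean_b)]
    mean = [ma + d * n_b / n for ma, d in zip(mean_a, delta)]
    # Correction term for the difference between segment means.
    weight = n_a * n_b / n
    cov = [
        [
            (n_a * cov_a[i][j] + n_b * cov_b[i][j]
             + weight * delta[i] * delta[j]) / n
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return n, mean, cov
```

Merging segment results one after another in this way keeps memory use at a single matrix, regardless of how many segments the full trajectory has.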

The only other alternative I could see here would be to use cpptraj to
compute the averaged coordinates and save them along with the covariance
matrix. Matlab could diagonalize the matrix and give me the eigenvectors.
I could then proceed segment by segment, using cpptraj to align the
trajectory coordinates to that average and write the aligned coordinates
back to disk. A secondary jiffy program could then read the aligned
coordinates and compare them to each eigenvector to calculate the
projections and thus give me the PCA. It would be a duct-tape solution,
but one that is possible if that's what I need to do.
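The projection step of such a secondary program could be as small as the following Python sketch. It assumes the frames have already been aligned to the average structure, that frames, average, and eigenvectors all use the same atom selection and coordinate ordering, and that the eigenvectors are normalized. The function name project_frames is hypothetical.

```python
def project_frames(frames, average, eigenvectors):
    """Yield the projection of each aligned frame onto each eigenvector.

    `frames` may be a generator that reads one frame at a time, so the
    trajectory never has to fit in memory. Each projection is the dot
    product of (frame - average) with an eigenvector.
    """
    for x in frames:
        deviation = [xi - ai for xi, ai in zip(x, average)]
        yield [
            sum(d * e for d, e in zip(deviation, vec))
            for vec in eigenvectors
        ]
```

Because it is a generator, the output can be written to disk line by line as each segment of the trajectory is processed.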

Cheers,
Dave
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat May 25 2019 - 13:00:02 PDT