From: David Cerutti <dscerutti.gmail.com>

Date: Sat, 25 May 2019 15:31:54 -0400

Can anyone on the list help me do PCA with a really giant trajectory? I
have 35GB now and will have more than 200GB by the time this is all done.
The frames are being saved every 50ps and this is an implicit solvent
trajectory, so it's not going to be much use to strip the coordinates or
reduce the output rate if I want to keep gathering the relevant data. What
I'm working with is a file that looks like this, taken from the cpptraj
manual:

trajin ../Trajectory/md1_1.cdf
( ... many more trajin commands ...)
trajin ../Trajectory/md112_22.cdf
rms first !@H=
average crdset AVG
run
rms ref AVG !@H=
matrix covar name MyMatrix !@H=
createcrd CRD1
run
runanalysis diagmatrix MyMatrix vecs 2 name MyEvecs
crdaction CRD1 projection evecs MyEvecs !@H= out project.dat beg 1 end 2
go

The problem, it seems, is the createcrd CRD1 line. It commits all
coordinates to memory, which limits the amount of trajectory you can
analyze to the amount of system RAM. Otherwise, it seems that I can
compute the average positions AVG and compose the covariance matrix while
reading each frame from disk, without storing the entire trajectory in RAM.
I believe that if I had a way to store the matrix (which cpptraj provides)
and then READ IT BACK IN, I could compose the covariance matrix for the
entire trajectory and save the average coordinates. I could then read
segments of the trajectory, read back the averaged coordinates, load the
matrix and diagonalize it, align each frame from the trajectory segment to
the average from the complete trajectory, and calculate each frame's
projection onto the matrix eigenvectors.
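
To make the frame-by-frame accumulation concrete, here is a minimal sketch
in Python with NumPy. It is not what cpptraj does internally. It assumes
each frame has already been aligned to a common reference and flattened to
a vector of 3N coordinates, it ignores mass weighting, and the class name
StreamingCovariance and the random test data are made up for illustration.
Only running sums are kept in memory, so the trajectory can be fed in
segments of any size.

```python
import numpy as np


class StreamingCovariance:
    """Accumulate the mean and covariance of 3N-dimensional frames chunk by chunk."""

    def __init__(self, n_coords):
        self.n_frames = 0
        self.sum_x = np.zeros(n_coords)                 # running sum of x
        self.sum_xxT = np.zeros((n_coords, n_coords))   # running sum of x x^T

    def update(self, chunk):
        # chunk has shape (frames_in_chunk, n_coords); the frames are
        # assumed to be aligned to a common reference already.
        chunk = np.asarray(chunk, dtype=np.float64)
        self.n_frames += chunk.shape[0]
        self.sum_x += chunk.sum(axis=0)
        self.sum_xxT += chunk.T @ chunk

    def finalize(self):
        mean = self.sum_x / self.n_frames
        # <x x^T> - <x><x>^T, i.e. the covariance normalized by n_frames
        covar = self.sum_xxT / self.n_frames - np.outer(mean, mean)
        return mean, covar


# Hypothetical stand-in for a trajectory: 1000 frames of 4 atoms (12 coords).
rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 12))

acc = StreamingCovariance(n_coords=12)
for start in range(0, 1000, 100):        # pretend each slice is read from disk
    acc.update(frames[start:start + 100])
mean, covar = acc.finalize()
```

The one-pass formula subtracts two large, similar quantities, so double
precision is used throughout. The mean and the matrix could then be written
to disk (for example with np.save) and reloaded for the projection step.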

The only other alternative I could see here would be to use cpptraj to
compute the averaged coordinates and save them along with the covariance
matrix. Matlab could diagonalize the matrix and give me the eigenvectors.
I could then proceed segment by segment, using cpptraj to align the
trajectory coordinates to that average and write the aligned coordinates
back to disk. A secondary jiffy program could then read the aligned
coordinates and compare them to each eigenvector to calculate the
projections and thus give me the PCA. It would be a duct-tape solution,
but one that is possible if that's what I need to do.
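
The jiffy program for the last two steps might look roughly like the
following Python/NumPy sketch, with NumPy standing in for Matlab in the
diagonalization. The function names top_eigenvectors and project_chunk are
hypothetical, the random data merely stands in for aligned coordinates read
back from disk, and the covariance matrix is assumed to be symmetric and
not mass weighted.

```python
import numpy as np


def top_eigenvectors(covar, n_vecs=2):
    """Return the n_vecs largest eigenvalues and their eigenvectors (as columns)."""
    evals, evecs = np.linalg.eigh(covar)      # eigh is for symmetric matrices
    order = np.argsort(evals)[::-1][:n_vecs]  # indices of the largest eigenvalues
    return evals[order], evecs[:, order]


def project_chunk(chunk, mean, evecs):
    """Project aligned frames of shape (n_frames, n_coords) onto the eigenvectors."""
    return (np.asarray(chunk, dtype=np.float64) - mean) @ evecs


# Hypothetical stand-in data: 600 aligned frames of 3 atoms (9 coordinates),
# with larger fluctuations along the first two coordinates.
rng = np.random.default_rng(1)
scale = np.array([5.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
frames = rng.normal(size=(600, 9)) * scale

# In practice these two would be loaded from the files saved earlier.
mean = frames.mean(axis=0)
covar = np.cov(frames, rowvar=False, bias=True)

evals, evecs = top_eigenvectors(covar, n_vecs=2)

# Process the trajectory one segment at a time; only the projections are kept.
projections = np.vstack([
    project_chunk(frames[start:start + 200], mean, evecs)
    for start in range(0, 600, 200)
])
```

Each segment only needs the saved average and the eigenvectors, so the
memory footprint is set by the segment size rather than by the length of
the whole trajectory.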

Cheers,

Dave

