Re: [AMBER] PCA with a really giant trajectory

From: Chris Neale <>
Date: Mon, 27 May 2019 01:29:24 -0600

Dear David:

I realize that I am not directly answering your question, but do you really
need to run PCA on a trajectory that is saved every 50 ps? That seems
awfully frequent for a long trajectory. In many situations, I suspect that
the same correlated motions will be picked up with a larger step size
between frames. You might test this equivalence by running PCA on skip
1000, skip 100, and skip 10 versions (or whatever you can afford) and then
comparing the change in top-ranked eigenvectors as a function of dt. If
your argument is that you absolutely must see the result when saved every
50 ps, then what is the argument that you do not need to run PCA on frames
saved every 5 ps? Sorry that I do not have any direct suggestions for code
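
A minimal sketch of that skip comparison, assuming the aligned coordinates
have already been loaded into a NumPy array of shape (n_frames, 3*n_atoms).
The function names and the choice of root-mean-square inner product as the
similarity measure are illustrative assumptions, not anything cpptraj
provides:

```python
import numpy as np

def top_modes(coords, n_modes=2):
    """Leading covariance eigenvectors of aligned coords, one per column."""
    centered = coords - coords.mean(axis=0)
    cov = centered.T @ centered / (len(coords) - 1)
    _, evecs = np.linalg.eigh(cov)          # eigenvalues come back ascending
    return evecs[:, ::-1][:, :n_modes]      # reorder so the largest is first

def subspace_overlap(a, b):
    """Root-mean-square inner product of two orthonormal mode sets.

    1.0 means both sets span the same subspace, 0.0 means orthogonal.
    """
    return float(np.sqrt(np.sum((a.T @ b) ** 2) / a.shape[1]))

def stride_convergence(coords, strides=(1000, 100, 10), n_modes=2):
    """Overlap of each strided PCA with the finest stride in the list."""
    ref = top_modes(coords[::min(strides)], n_modes)
    return {s: subspace_overlap(ref, top_modes(coords[::s], n_modes))
            for s in strides}
```

If the overlap is already close to 1 at the coarser skips, the finer
sampling is probably not changing the top-ranked modes.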

Thank you,

On Sat, May 25, 2019 at 1:32 PM David Cerutti <> wrote:

> Can anyone on the list help me do PCA with a really giant trajectory? I
> have 35GB now and will have more than 200GB by the time this is all done.
> The frames are being saved every 50ps and this is an implicit solvent
> trajectory, so it's not going to be much use to strip the coordinates or
> reduce the output rate if I want to keep gathering the relevant data. What
> I'm working with is a file that looks like this, taken from the cpptraj
> manual:
> trajin ../Trajectory/md1_1.cdf
> ( ... many more trajin commands ...)
> trajin ../Trajectory/md112_22.cdf
> rms first !.H=
> average crdset AVG
> run
> rms ref AVG !.H=
> matrix covar name MyMatrix !.H=
> createcrd CRD1
> run
> runanalysis diagmatrix MyMatrix vecs 2 name MyEvecs
> crdaction CRD1 projection evecs MyEvecs !.H= out project.dat beg 1 end 2
> go
> The problem, it seems, is that createcrd CRD1 line. When doing that, it
> commits all coords to memory, limiting the amount of trajectory you can
> analyze to the amount of system RAM. Otherwise, it seems that I can
> compute the average positions AVG and compose the covariance matrix while
> reading each frame from disk, without storing the entire trajectory in RAM.
> I believe that if I had a way to store the matrix (which cpptraj provides)
> and then READ IT BACK IN, I could compose the covariance matrix for the
> entire trajectory and save the average coordinates. I could then read
> segments of the trajectory, read back the averaged coordinates, load the
> matrix and diagonalize it, align each frame from the trajectory segment to
> the average from the complete trajectory, and calculate each frame's
> projection onto the matrix eigenvectors.
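
The frame-by-frame accumulation described above can be sketched outside
cpptraj as follows. This is only an illustration under stated assumptions:
the frames are already aligned to a common reference, each frame is a flat
vector of 3*n_atoms coordinates, and the covariance is normalised by N
(whether that matches cpptraj's own normalisation would need checking).
The class name is hypothetical.

```python
import numpy as np

class RunningCovariance:
    """Accumulate mean and covariance chunk by chunk, never holding the
    whole trajectory in memory."""

    def __init__(self, n_coords):
        self.n = 0
        self.sum_x = np.zeros(n_coords)               # running sum of x
        self.sum_xx = np.zeros((n_coords, n_coords))  # running sum of x x^T

    def update(self, frames):
        """Add a chunk of frames with shape (n_frames, n_coords)."""
        frames = np.atleast_2d(np.asarray(frames, dtype=float))
        self.n += len(frames)
        self.sum_x += frames.sum(axis=0)
        self.sum_xx += frames.T @ frames

    def mean(self):
        return self.sum_x / self.n

    def covariance(self):
        """Population covariance, <x x^T> - <x><x>^T (divides by N)."""
        m = self.mean()
        return self.sum_xx / self.n - np.outer(m, m)
```

Memory use is fixed by the matrix size rather than the number of frames.
The sums-of-products form can lose precision when coordinates are large
relative to their fluctuations, so subtracting a rough reference structure
before calling update is a sensible precaution.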
> The only other alternative I could see here would be to use cpptraj to
> compute the averaged coordinates and save them along with the covariance
> matrix. Matlab could diagonalize the matrix and give me the eigenvectors.
> I could then proceed segment by segment, using cpptraj to align the
> trajectory coordinates to that average and write the aligned coordinates
> back to disk. A secondary jiffy program could then read the aligned
> coordinates and compare them to each eigenvector to calculate the
> projections and thus give me the PCA. It would be a duct-tape solution,
> but one that is possible if that's what I need to do.
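
The projection step of the scheme above amounts to very little code. A
minimal sketch, assuming the aligned segments, the average structure, and
the eigenvectors have all been read into NumPy arrays by some other means
(file reading is left out, the function names are hypothetical, and a plain
covariance without mass weighting is assumed):

```python
import numpy as np

def project_segment(aligned, average, evecs):
    """Project one aligned segment onto the eigenvectors.

    aligned: (n_frames, 3*n_atoms) coordinates already fit to the average
    average: (3*n_atoms,) average structure from the full trajectory
    evecs:   (3*n_atoms, n_modes) eigenvectors, one per column
    """
    return (aligned - average) @ evecs

def project_in_segments(segments, average, evecs):
    """Yield projections one segment at a time so that only a single
    segment is ever held in memory."""
    for segment in segments:
        yield project_segment(segment, average, evecs)
```

Because each frame is projected independently, the segment boundaries do
not affect the result; concatenating the per-segment output gives the same
projections as processing the whole trajectory at once.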
> Cheers,
> Dave
> _______________________________________________
> AMBER mailing list
Received on Mon May 27 2019 - 00:30:02 PDT