- Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]

From: Daniel Roe <daniel.r.roe.gmail.com>

Date: Wed, 2 Jun 2021 14:02:18 -0400

Hi,

First, if you haven't already I really recommend that you look at some

literature to familiarize yourself with what principal component

analysis actually does. It's important to understand what each part of

the analysis is for. I like the old "essential dynamics" papers, but a

very rough explanation is given in the cpptraj PCA tutorial:

https://amberhub.chpc.utah.edu/introduction-to-principal-component-analysis/

Also, read the cpptraj manual so you know what each command is actually doing.

That said, I now realize my examples in the manual and the tutorial

both rely on having the trajectories stored in memory. So I will give

a rough outline of how to do it all on-disk.

1) Create average coordinates.

parm noions.prmtop

trajin gamd2.nc

# Fit to the first frame to remove global rotation/translation

rms first :1-1530&!.H=

# Write out the restart

average gaccAvg.rst7 restart

2) Rms fit your trajectory to average coordinates, calculate the

covariance matrix, write out the fit trajectory, diagonalize the

matrix and write out the "modes" data (i.e. eigenvectors and

eigenvalues).

parm noions.prmtop

trajin gamd2.nc

# Read in average structure, tag as [avg]

reference gaccAvg.rst7 [avg]

# RMS-fit coordinates to average

rms ref [avg] :1-1530&!.H=

# Calculate coordinate covariance matrix

matrix covar :1-1530&!.H= name gaccCovar

# Write out the fit trajectory

trajout fit.nc

# Diagonalize coordinate covariance matrix for eigenmodes (only 2)

diagmatrix gaccCovar out evecs.dat vecs 2 nmwiz nmwizvecs 2 nmwizfile

cas.nmd nmwizmask :1-1530&!.H=

3) Read in the "modes" data, calculate principal component projections

from the fit trajectory, do the Kullback-Leibler divergence analysis.

parm noions.prmtop

trajin fit.nc

# Read in modes

readdata evecs.dat name MyModes

# Now create separate PC projections for each trajectory

projection T1 modes MyModes beg 1 end 2 :1-1530&!.H= start 1 stop

50000 out T1.dat

projection T2 modes MyModes beg 1 end 2 :1-1530&!.H= start 50001 stop

100000 out T2.dat

<other projection commands>

# Calculate Kullback-Leibler Divergence vs time for PC histograms

kde T1:1 kldiv T2:1 klout KL-PC.agr bins 400 name AMD-MREMD-1

<other kde commands and analyses>

Hope this helps,

-Dan

On Tue, Jun 1, 2021 at 3:23 PM Sruthi Sudhakar

<sruthisudhakarraji.gmail.com> wrote:

*>
*

*> The available memory in the beginning of cpptraj is shown as 32 Gb and
*

*> estimated memory usage is 82gb. I am really confused about this memory
*

*> allotment statistics. I am doing the process in a disk with more than 3TB
*

*> space. I am not well versed with this technicality. Could someone please
*

*> exaplain how to overcome the issue in this principal component analysis
*

*> part? I did understand that we have to separate the analysis into 3 phases
*

*> but not clear as to how the inputs should be changed. Kindly advise.
*

*>
*

*> On Tue, 1 Jun 2021 at 7:21 PM, Sruthi Sudhakar <sruthisudhakarraji.gmail.com>
*

*> wrote:
*

*>
*

*> >
*

*> > Thank you for the reply. Since I am doing this for the first time, I
*

*> > wanted to know if I am supposed to create 3 separate inputs to run in
*

*> > cpptraj to do the methodology you suggested.
*

*> >
*

*> > Regards,
*

*> > Sruthi
*

*> >
*

*> > On Tue, 1 Jun 2021 at 6:33 PM, Daniel Roe <daniel.r.roe.gmail.com> wrote:
*

*> >
*

*> >> Hi,
*

*> >>
*

*> >> You are likely running out of memory. This is why your problems during
*

*> >> clustering went away when you reduced the number of input frames by a
*

*> >> factor of 5. The solution is to do everything on disk. So instead of
*

*> >> loading all coordinates into memory, separate the principal component
*

*> >> analysis into three separate phases:
*

*> >>
*

*> >> 1) Create average coordinates.
*

*> >> 2) Rms fit your trajectory to average coordinates, calculate the
*

*> >> covariance matrix, write out the fit trajectory, diagonalize the
*

*> >> matrix and write out the "modes" data (i.e. eigenvectors and
*

*> >> eigenvalues).
*

*> >> 3) Read in the "modes" data, calculate principal component projections
*

*> >> from the fit trajectory, do the Kullback-Leibler divergence analysis.
*

*> >>
*

*> >> This way, the most memory you need is to store the covariance matrix,
*

*> >> modes, and other data of that type. Hope this helps,
*

*> >>
*

*> >> -Dan
*

*> >>
*

*> >> PS - Note that there appears to be a small error in the input you
*

*> >> posted (in pca.in). The 'nmwiz' keyword should be part of the
*

*> >> diagmatrix command, not on a separate line.
*

*> >>
*

*> >>
*

*> >> On Mon, May 31, 2021 at 1:22 PM Sruthi Sudhakar
*

*> >> <sruthisudhakarraji.gmail.com> wrote:
*

*> >> >
*

*> >> > Dear all,
*

*> >> >
*

*> >> > I have been doing pca analysis on an accelerated MD trajectory of 500ns
*

*> >> > (250,000 frames). I have attached the input file I have used for the
*

*> >> study.
*

*> >> > The job stops at the createcrd stage. Basically, the job gets killed at
*

*> >> > 30%. The same happened during the cluster analysis reading every frames.
*

*> >> > The clustering error was solved when I changed the input to read every
*

*> >> 5th
*

*> >> > frame. Now since this is repeating in pca analysis, kindly help
*

*> >> regarding
*

*> >> > the same.
*

*> >> >
*

*> >> >
*

*> >> > Regards,
*

*> >> > Sruthi Sudhakar
*

*> >> > _______________________________________________
*

*> >> > AMBER mailing list
*

*> >> > AMBER.ambermd.org
*

*> >> > http://lists.ambermd.org/mailman/listinfo/amber
*

*> >>
*

*> >> _______________________________________________
*

*> >> AMBER mailing list
*

*> >> AMBER.ambermd.org
*

*> >> http://lists.ambermd.org/mailman/listinfo/amber
*

*> >>
*

*> >
*

*> _______________________________________________
*

*> AMBER mailing list
*

*> AMBER.ambermd.org
*

*> http://lists.ambermd.org/mailman/listinfo/amber
*

_______________________________________________

AMBER mailing list

AMBER.ambermd.org

http://lists.ambermd.org/mailman/listinfo/amber

Received on Wed Jun 02 2021 - 11:30:03 PDT

Date: Wed, 2 Jun 2021 14:02:18 -0400

Hi,

First, if you haven't already I really recommend that you look at some

literature to familiarize yourself with what principal component

analysis actually does. It's important to understand what each part of

the analysis is for. I like the old "essential dynamics" papers, but a

very rough explanation is given in the cpptraj PCA tutorial:

https://amberhub.chpc.utah.edu/introduction-to-principal-component-analysis/

Also, read the cpptraj manual so you know what each command is actually doing.

That said, I now realize my examples in the manual and the tutorial

both rely on having the trajectories stored in memory. So I will give

a rough outline of how to do it all on-disk.

1) Create average coordinates.

parm noions.prmtop

trajin gamd2.nc

# Fit to the first frame to remove global rotation/translation

rms first :1-1530&!.H=

# Write out the restart

average gaccAvg.rst7 restart

2) Rms fit your trajectory to average coordinates, calculate the

covariance matrix, write out the fit trajectory, diagonalize the

matrix and write out the "modes" data (i.e. eigenvectors and

eigenvalues).

parm noions.prmtop

trajin gamd2.nc

# Read in average structure, tag as [avg]

reference gaccAvg.rst7 [avg]

# RMS-fit coordinates to average

rms ref [avg] :1-1530&!.H=

# Calculate coordinate covariance matrix

matrix covar :1-1530&!.H= name gaccCovar

# Write out the fit trajectory

trajout fit.nc

# Diagonalize coordinate covariance matrix for eigenmodes (only 2)

diagmatrix gaccCovar out evecs.dat vecs 2 nmwiz nmwizvecs 2 nmwizfile

cas.nmd nmwizmask :1-1530&!.H=

3) Read in the "modes" data, calculate principal component projections

from the fit trajectory, do the Kullback-Leibler divergence analysis.

parm noions.prmtop

trajin fit.nc

# Read in modes

readdata evecs.dat name MyModes

# Now create separate PC projections for each trajectory

projection T1 modes MyModes beg 1 end 2 :1-1530&!.H= start 1 stop

50000 out T1.dat

projection T2 modes MyModes beg 1 end 2 :1-1530&!.H= start 50001 stop

100000 out T2.dat

<other projection commands>

# Calculate Kullback-Leibler Divergence vs time for PC histograms

kde T1:1 kldiv T2:1 klout KL-PC.agr bins 400 name AMD-MREMD-1

<other kde commands and analyses>

Hope this helps,

-Dan

On Tue, Jun 1, 2021 at 3:23 PM Sruthi Sudhakar

<sruthisudhakarraji.gmail.com> wrote:

_______________________________________________

AMBER mailing list

AMBER.ambermd.org

http://lists.ambermd.org/mailman/listinfo/amber

Received on Wed Jun 02 2021 - 11:30:03 PDT

Custom Search