Re: [AMBER] Error in pca analysis

From: Sruthi Sudhakar <sruthisudhakarraji.gmail.com>
Date: Wed, 2 Jun 2021 23:58:57 +0530

Thank you so much for the input guidance along with the literature. I would
try to go through the terminologies first before fixing the issues.

Regards,
Sruthi

On Wed, 2 Jun 2021 at 11:32 PM, Daniel Roe <daniel.r.roe.gmail.com> wrote:

> Hi,
>
> First, if you haven't already I really recommend that you look at some
> literature to familiarize yourself with what principal component
> analysis actually does. It's important to understand what each part of
> the analysis is for. I like the old "essential dynamics" papers, but a
> very rough explanation is given in the cpptraj PCA tutorial:
>
> https://amberhub.chpc.utah.edu/introduction-to-principal-component-analysis/
>
> Also, read the cpptraj manual so you know what each command is actually
> doing.
>
> That said, I now realize my examples in the manual and the tutorial
> both rely on having the trajectories stored in memory. So I will give
> a rough outline of how to do it all on-disk.
>
> 1) Create average coordinates.
>
> parm noions.prmtop
> trajin gamd2.nc
> # Fit to the first frame to remove global rotation/translation
> rms first :1-1530&!.H=
> # Write out the restart
> average gaccAvg.rst7 restart
>
> 2) Rms fit your trajectory to average coordinates, calculate the
> covariance matrix, write out the fit trajectory, diagonalize the
> matrix and write out the "modes" data (i.e. eigenvectors and
> eigenvalues).
>
> parm noions.prmtop
> trajin gamd2.nc
> # Read in average structure, tag as [avg]
> reference gaccAvg.rst7 [avg]
> # RMS-fit coordinates to average
> rms ref [avg] :1-1530&!.H=
> # Calculate coordinate covariance matrix
> matrix covar :1-1530&!.H= name gaccCovar
> # Write out the fit trajectory
> trajout fit.nc
> # Diagonalize coordinate covariance matrix for eigenmodes (only 2)
> diagmatrix gaccCovar out evecs.dat vecs 2 nmwiz nmwizvecs 2 nmwizfile
> cas.nmd nmwizmask :1-1530&!.H=
>
>
> 3) Read in the "modes" data, calculate principal component projections
> from the fit trajectory, do the Kullback-Leibler divergence analysis.
>
> parm noions.prmtop
> trajin fit.nc
> # Read in modes
> readdata evecs.dat name MyModes
> # Now create separate PC projections for each trajectory
> projection T1 modes MyModes beg 1 end 2 :1-1530&!.H= start 1 stop
> 50000 out T1.dat
> projection T2 modes MyModes beg 1 end 2 :1-1530&!.H= start 50001 stop
> 100000 out T2.dat
> <other projection commands>
> # Calculate Kullback-Leibler Divergence vs time for PC histograms
> kde T1:1 kldiv T2:1 klout KL-PC.agr bins 400 name AMD-MREMD-1
> <other kde commands and analyses>
>
> Hope this helps,
>
> -Dan
>
> On Tue, Jun 1, 2021 at 3:23 PM Sruthi Sudhakar
> <sruthisudhakarraji.gmail.com> wrote:
> >
> > The available memory in the beginning of cpptraj is shown as 32 Gb and
> > estimated memory usage is 82gb. I am really confused about this memory
> > allotment statistics. I am doing the process in a disk with more than 3TB
> > space. I am not well versed with this technicality. Could someone please
> > exaplain how to overcome the issue in this principal component analysis
> > part? I did understand that we have to separate the analysis into 3
> phases
> > but not clear as to how the inputs should be changed. Kindly advise.
> >
> > On Tue, 1 Jun 2021 at 7:21 PM, Sruthi Sudhakar <
> sruthisudhakarraji.gmail.com>
> > wrote:
> >
> > >
> > > Thank you for the reply. Since I am doing this for the first time, I
> > > wanted to know if I am supposed to create 3 separate inputs to run in
> > > cpptraj to do the methodology you suggested.
> > >
> > > Regards,
> > > Sruthi
> > >
> > > On Tue, 1 Jun 2021 at 6:33 PM, Daniel Roe <daniel.r.roe.gmail.com>
> wrote:
> > >
> > >> Hi,
> > >>
> > >> You are likely running out of memory. This is why your problems during
> > >> clustering went away when you reduced the number of input frames by a
> > >> factor of 5. The solution is to do everything on disk. So instead of
> > >> loading all coordinates into memory, separate the principal component
> > >> analysis into three separate phases:
> > >>
> > >> 1) Create average coordinates.
> > >> 2) Rms fit your trajectory to average coordinates, calculate the
> > >> covariance matrix, write out the fit trajectory, diagonalize the
> > >> matrix and write out the "modes" data (i.e. eigenvectors and
> > >> eigenvalues).
> > >> 3) Read in the "modes" data, calculate principal component projections
> > >> from the fit trajectory, do the Kullback-Leibler divergence analysis.
> > >>
> > >> This way, the most memory you need is to store the covariance matrix,
> > >> modes, and other data of that type. Hope this helps,
> > >>
> > >> -Dan
> > >>
> > >> PS - Note that there appears to be a small error in the input you
> > >> posted (in pca.in). The 'nmwiz' keyword should be part of the
> > >> diagmatrix command, not on a separate line.
> > >>
> > >>
> > >> On Mon, May 31, 2021 at 1:22 PM Sruthi Sudhakar
> > >> <sruthisudhakarraji.gmail.com> wrote:
> > >> >
> > >> > Dear all,
> > >> >
> > >> > I have been doing pca analysis on an accelerated MD trajectory of
> 500ns
> > >> > (250,000 frames). I have attached the input file I have used for the
> > >> study.
> > >> > The job stops at the createcrd stage. Basically, the job gets
> killed at
> > >> > 30%. The same happened during the cluster analysis reading every
> frames.
> > >> > The clustering error was solved when I changed the input to read
> every
> > >> 5th
> > >> > frame. Now since this is repeating in pca analysis, kindly help
> > >> regarding
> > >> > the same.
> > >> >
> > >> >
> > >> > Regards,
> > >> > Sruthi Sudhakar
> > >> > _______________________________________________
> > >> > AMBER mailing list
> > >> > AMBER.ambermd.org
> > >> > http://lists.ambermd.org/mailman/listinfo/amber
> > >>
> > >> _______________________________________________
> > >> AMBER mailing list
> > >> AMBER.ambermd.org
> > >> http://lists.ambermd.org/mailman/listinfo/amber
> > >>
> > >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jun 02 2021 - 11:30:03 PDT
Custom Search