Re: [AMBER] Error in pca analysis

From: Sruthi Sudhakar <sruthisudhakarraji.gmail.com>
Date: Wed, 16 Jun 2021 09:27:22 +0530

Could someone please suggest on the above issue, I tried rms fitting the 2
trajectories onto one. But it is being difficult and I get the error as
instead of having total frames (traj1+ traj2) it has only total number of
frames as trajectory 1. A help in refining the script would be really
helpful.

Regards,
Sruthi Sudhakar


On Sun, Jun 13, 2021 at 2:46 AM Sruthi Sudhakar <
sruthisudhakarraji.gmail.com> wrote:

>
> This method of breaking the analysis into stages actually worked and I was
> able to complete the analysis overcoming the memory issues.
> An additional query: Incase we are using 2 different trajectories with
> different number of residues (1530 and 1526) to do a combined trajectory
> analysis, would it be enough to load gamd2.nc (1530 residues) and gamd3.nc (1526
> residues) together? Each has 250000 frames. And in that case do we need to
> fit into a single trajectory (fit.nc) or separately create 2 fit
> trajectory? I am doubtful about the third stage where we load the fit
> trajectory, should we load both the parm files in that case? Also would the
> difference in the total number of residues create any problem?
>
> My sample script following your guidelines would be
> pca1.in
> parm noionsa.prmtop [ag]
> parm noionsb.prmtop [bg]
> trajin gamd2.nc parm [ag]
> trajin gamd3.nc parm [bg]
> # Fit to the first frame to remove global rotation/translation
> rms first :1-1362.CA
> # Write out the restart
> average gaccAvg.rst7 restart
>
> pca2.in
> parm noionsa.prmtop [ag]
> parm noionsb.prmtop [bg]
> trajin gamd2.nc parm [ag]
> trajin gamd3.nc parm [bg]
> # Read in average structure, tag as [avg]
> reference gaccAvg.rst7 [avg]
> # RMS-fit coordinates to average
> rms ref [avg] :1-1362.CA
> # Calculate coordinate covariance matrix
> matrix covar :1-1362.CA name gaccCovar
> # Write out the fit trajectory
> trajout fit.nc
> # Diagonalize coordinate covariance matrix for eigenmodes (only 2)
> diagmatrix gaccCovar out evecs.dat vecs 20 nmwiz nmwizvecs 5
> nmwizfile prot.nmd nmwizmask :1-1362.CA
>
> pca3.in
>
> parm noionsa.prmtop
> parm noionsb.prmtop
> trajin fit.nc
> # Read in modes
> readdata evecs.dat name MyModes
> # Now create separate PC projections for each trajectory
> projection T1 modes MyModes beg 1 end 20 :1-1362.CA start 1 stop 250000
> out T1.dat
> projection T2 modes MyModes beg 1 end 20 :1-1362.CA start 250001 stop
> 500000 out T2.dat
> # Calculate Kullback-Leibler Divergence vs time for PC histograms
> kde T1:1 kldiv T2:1 klout KL-PC1.dat bins 200 name SG-DG-1
> kde T1:2 kldiv T2:2 klout KL-PC2.dat bins 200 name SG-DG-2
> kde T1:1 out kde1-PC1.dat bins 200 name KDE1-1
> kde T2:1 out kde2-PC1.dat bins 200 name KDE2-1
> hist T1:1 bins 200 out T1-hist1.dat normint name HIST1-1
> hist T1:2 bins 200 out T1-hist2.dat normint name HIST1-2
> hist T2:1 bins 200 out T2-hist1.dat normint name HIST2-1
> hist T2:2 bins 200 out T2-hist2.dat normint name HIST2-2
>
> Kindly suggest any corrections required.
> Regards,
> Sruthi Sudhakar
>
>
> On Sat, Jun 12, 2021 at 12:11 PM Sruthi Sudhakar <
> sruthisudhakarraji.gmail.com> wrote:
>
>> This method actually worked and I was able to complete the analysis. An
>> additional query: Incase we are using 2 different trajectories to do a
>> combined trajectory analysis, would it be enough to load gamd2.nc and
>> gamd3.nc together? And in that case do we need to fit into a single
>> trajectory (fit.nc) or separately create 2 fit trajectory?
>>
>> Regards,
>> Sruthi Sudhakar
>>
>>
>> On Wed, Jun 2, 2021 at 11:32 PM Daniel Roe <daniel.r.roe.gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> First, if you haven't already I really recommend that you look at some
>>> literature to familiarize yourself with what principal component
>>> analysis actually does. It's important to understand what each part of
>>> the analysis is for. I like the old "essential dynamics" papers, but a
>>> very rough explanation is given in the cpptraj PCA tutorial:
>>>
>>> https://amberhub.chpc.utah.edu/introduction-to-principal-component-analysis/
>>>
>>> Also, read the cpptraj manual so you know what each command is actually
>>> doing.
>>>
>>> That said, I now realize my examples in the manual and the tutorial
>>> both rely on having the trajectories stored in memory. So I will give
>>> a rough outline of how to do it all on-disk.
>>>
>>> 1) Create average coordinates.
>>>
>>> parm noions.prmtop
>>> trajin gamd2.nc
>>> # Fit to the first frame to remove global rotation/translation
>>> rms first :1-1530&!.H=
>>> # Write out the restart
>>> average gaccAvg.rst7 restart
>>>
>>> 2) Rms fit your trajectory to average coordinates, calculate the
>>> covariance matrix, write out the fit trajectory, diagonalize the
>>> matrix and write out the "modes" data (i.e. eigenvectors and
>>> eigenvalues).
>>>
>>> parm noions.prmtop
>>> trajin gamd2.nc
>>> # Read in average structure, tag as [avg]
>>> reference gaccAvg.rst7 [avg]
>>> # RMS-fit coordinates to average
>>> rms ref [avg] :1-1530&!.H=
>>> # Calculate coordinate covariance matrix
>>> matrix covar :1-1530&!.H= name gaccCovar
>>> # Write out the fit trajectory
>>> trajout fit.nc
>>> # Diagonalize coordinate covariance matrix for eigenmodes (only 2)
>>> diagmatrix gaccCovar out evecs.dat vecs 2 nmwiz nmwizvecs 2 nmwizfile
>>> cas.nmd nmwizmask :1-1530&!.H=
>>>
>>>
>>> 3) Read in the "modes" data, calculate principal component projections
>>> from the fit trajectory, do the Kullback-Leibler divergence analysis.
>>>
>>> parm noions.prmtop
>>> trajin fit.nc
>>> # Read in modes
>>> readdata evecs.dat name MyModes
>>> # Now create separate PC projections for each trajectory
>>> projection T1 modes MyModes beg 1 end 2 :1-1530&!.H= start 1 stop
>>> 50000 out T1.dat
>>> projection T2 modes MyModes beg 1 end 2 :1-1530&!.H= start 50001 stop
>>> 100000 out T2.dat
>>> <other projection commands>
>>> # Calculate Kullback-Leibler Divergence vs time for PC histograms
>>> kde T1:1 kldiv T2:1 klout KL-PC.agr bins 400 name AMD-MREMD-1
>>> <other kde commands and analyses>
>>>
>>> Hope this helps,
>>>
>>> -Dan
>>>
>>> On Tue, Jun 1, 2021 at 3:23 PM Sruthi Sudhakar
>>> <sruthisudhakarraji.gmail.com> wrote:
>>> >
>>> > The available memory in the beginning of cpptraj is shown as 32 Gb and
>>> > estimated memory usage is 82gb. I am really confused about this memory
>>> > allotment statistics. I am doing the process in a disk with more than
>>> 3TB
>>> > space. I am not well versed with this technicality. Could someone
>>> please
>>> > exaplain how to overcome the issue in this principal component analysis
>>> > part? I did understand that we have to separate the analysis into 3
>>> phases
>>> > but not clear as to how the inputs should be changed. Kindly advise.
>>> >
>>> > On Tue, 1 Jun 2021 at 7:21 PM, Sruthi Sudhakar <
>>> sruthisudhakarraji.gmail.com>
>>> > wrote:
>>> >
>>> > >
>>> > > Thank you for the reply. Since I am doing this for the first time, I
>>> > > wanted to know if I am supposed to create 3 separate inputs to run
>>> in
>>> > > cpptraj to do the methodology you suggested.
>>> > >
>>> > > Regards,
>>> > > Sruthi
>>> > >
>>> > > On Tue, 1 Jun 2021 at 6:33 PM, Daniel Roe <daniel.r.roe.gmail.com>
>>> wrote:
>>> > >
>>> > >> Hi,
>>> > >>
>>> > >> You are likely running out of memory. This is why your problems
>>> during
>>> > >> clustering went away when you reduced the number of input frames by
>>> a
>>> > >> factor of 5. The solution is to do everything on disk. So instead of
>>> > >> loading all coordinates into memory, separate the principal
>>> component
>>> > >> analysis into three separate phases:
>>> > >>
>>> > >> 1) Create average coordinates.
>>> > >> 2) Rms fit your trajectory to average coordinates, calculate the
>>> > >> covariance matrix, write out the fit trajectory, diagonalize the
>>> > >> matrix and write out the "modes" data (i.e. eigenvectors and
>>> > >> eigenvalues).
>>> > >> 3) Read in the "modes" data, calculate principal component
>>> projections
>>> > >> from the fit trajectory, do the Kullback-Leibler divergence
>>> analysis.
>>> > >>
>>> > >> This way, the most memory you need is to store the covariance
>>> matrix,
>>> > >> modes, and other data of that type. Hope this helps,
>>> > >>
>>> > >> -Dan
>>> > >>
>>> > >> PS - Note that there appears to be a small error in the input you
>>> > >> posted (in pca.in). The 'nmwiz' keyword should be part of the
>>> > >> diagmatrix command, not on a separate line.
>>> > >>
>>> > >>
>>> > >> On Mon, May 31, 2021 at 1:22 PM Sruthi Sudhakar
>>> > >> <sruthisudhakarraji.gmail.com> wrote:
>>> > >> >
>>> > >> > Dear all,
>>> > >> >
>>> > >> > I have been doing pca analysis on an accelerated MD trajectory of
>>> 500ns
>>> > >> > (250,000 frames). I have attached the input file I have used for
>>> the
>>> > >> study.
>>> > >> > The job stops at the createcrd stage. Basically, the job gets
>>> killed at
>>> > >> > 30%. The same happened during the cluster analysis reading every
>>> frames.
>>> > >> > The clustering error was solved when I changed the input to read
>>> every
>>> > >> 5th
>>> > >> > frame. Now since this is repeating in pca analysis, kindly help
>>> > >> regarding
>>> > >> > the same.
>>> > >> >
>>> > >> >
>>> > >> > Regards,
>>> > >> > Sruthi Sudhakar
>>> > >> > _______________________________________________
>>> > >> > AMBER mailing list
>>> > >> > AMBER.ambermd.org
>>> > >> > http://lists.ambermd.org/mailman/listinfo/amber
>>> > >>
>>> > >> _______________________________________________
>>> > >> AMBER mailing list
>>> > >> AMBER.ambermd.org
>>> > >> http://lists.ambermd.org/mailman/listinfo/amber
>>> > >>
>>> > >
>>> > _______________________________________________
>>> > AMBER mailing list
>>> > AMBER.ambermd.org
>>> > http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER.ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jun 15 2021 - 21:00:02 PDT
Custom Search