- Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]

From: Sruthi Sudhakar <sruthisudhakarraji.gmail.com>

Date: Sat, 19 Jun 2021 11:42:37 +0530

I have tried the approach 2 and got a rmsip.dat file. But it contained only

2 lines. Is it how the rmsip file look like? The result looked like this

#evecs2.dat_X_evecs3.dat

0.396164

Input:

parm parmsg.prmtop

trajin protsg.nc

trajin protdg.nc

readdata evecs2.dat

readdata evecs3.dat

modes rmsip out rmsip.dat name evecs2.dat name2 evecs3.dat beg 1 end 5

Regards,

Sruthi Sudhakar

On Thu, Jun 17, 2021 at 9:36 PM Sruthi Sudhakar <

sruthisudhakarraji.gmail.com> wrote:

*> Trajin is given Actually it was a typo. trajin fit1.nc was loaded and
*

*> like I said above T1.dat has 250000 frames but T2.dat has given the error.
*

*> I had attached the files in the mail I had directly send you. Is there any
*

*> other error in the pca1.in, pca2.in and pca3.in. If not I am not sure
*

*> what is happening to the T2.dat. Because almost all the kde and hist
*

*> commands seems to give odd results.
*

*> Regards,
*

*> Sruthi Sudhakar
*

*>
*

*>
*

*> On Thu, Jun 17, 2021 at 8:36 PM Daniel Roe <daniel.r.roe.gmail.com> wrote:
*

*>
*

*>> Hi,
*

*>>
*

*>> It looks like in your 'pca3.in' file you're missing the 'trajin'
*

*>> statements, i.e. there is nothing for the 'projection' commands to
*

*>> work with.
*

*>>
*

*>> -Dan
*

*>>
*

*>> On Thu, Jun 17, 2021 at 7:23 AM Sruthi Sudhakar
*

*>> <sruthisudhakarraji.gmail.com> wrote:
*

*>> >
*

*>> > Thanks Dr. Anselm for adding the comments. I had some more follow up
*

*>> > questions on the approach one , which I had directly mailed Dan. But for
*

*>> > beginners like me who might face similar issue, I would like to post the
*

*>> > question here. And I would really appreciate it if someone would just
*

*>> > verify if my script is right or not. Because it has been really a
*

*>> > tough task to find the loophole in this since I feel my input is right.
*

*>> > I have tried approach one, where I stripped off the nucleic acid part
*

*>> in
*

*>> > both (box info not removed) and used a common prmtop file. But in that
*

*>> > case, I do face some errors in the modes of the second trajectory. The
*

*>> > method I used is
*

*>> > 1. load both the stripped trajectories, rms fit to the first frame, and
*

*>> > write out the average file;
*

*>> > 2. the second stage loaded the 2 stripped trajectories, rms fit the
*

*>> > trajectories to the average, calculated covariance matrix and wrote out
*

*>> a
*

*>> > fit nc file (500000 frames - the sum of both trajectories) and used the
*

*>> > diagmatrix command;
*

*>> > 3. In the third stage where all the analysis were done I had faced the
*

*>> > error. In the third stage, the fit trajectory with 500000 frames was
*

*>> loaded
*

*>> > and I used the projection command to get the PC 1 and 2 for the
*

*>> > trajectories separately by dividing the frames which gave odd results.
*

*>> The
*

*>> > KL-PC1 (according to the script below, all inputs are attached too) was
*

*>> not
*

*>> > formed at all. T1.dat read 250000 frames but the T2.dat had 500000
*

*>> > coordinates. Projection of T2 was supposed to begin from the 250001th
*

*>> frame
*

*>> > and end in the last frame. The histogram of T1 gave a Gaussian
*

*>> distribution
*

*>> > but the histogram of T2 was not giving a Gaussian distribution. T2.dat
*

*>> > contained all values of mode 1 and mode 2 as zero for the first 250000
*

*>> > frames.
*

*>> >
*

*>> > Could you give any insights regarding why this might be happening? I
*

*>> > suspect it is some error in the script below (input.txt), since, in
*

*>> > independent analysis, only the loaded frames appeared in the modes,
*

*>> unlike
*

*>> > the T2 of this case. Also, would you recommend combining the
*

*>> trajectories
*

*>> > much before fitting the trajectory, where instead of loading 2 stripped
*

*>> > trajectories in the first stage and then creating a combined fit
*

*>> > trajectory, we could directly create a single trajectory in the very
*

*>> first
*

*>> > stage?
*

*>> > Regards,
*

*>> > Sruthi Sudhakar
*

*>> >
*

*>> > parm parmsg.prmtop
*

*>> > trajin protsg.nc
*

*>> > trajin protdg.nc
*

*>> > # Fit to the first frame to remove global rotation/translation
*

*>> > rms first :1-1362.CA
*

*>> > # Write out the restart
*

*>> > average gaccAvg1.rst7 restart
*

*>> >
*

*>> > pca2.in
*

*>> > parm parmsg.prmtop
*

*>> > trajin protsg.nc
*

*>> > trajin protdg.nc
*

*>> > # Read in average structure, tag as [avg]
*

*>> > reference gaccAvg1.rst7 [avg]
*

*>> > # RMS-fit coordinates to average
*

*>> > rms ref [avg] :1-1362.CA
*

*>> > # Calculate coordinate covariance matrix
*

*>> > matrix covar :1-1362.CA name gaccCovar
*

*>> > # Write out the fit trajectory
*

*>> > trajout fit1.nc
*

*>> > # Diagonalize coordinate covariance matrix for eigenmodes (only 2)
*

*>> > diagmatrix gaccCovar out evecs1.dat vecs 20 nmwiz nmwizvecs 5 nmwizfile
*

*>> > prot.nmd nmwizmask :1-1362.CA
*

*>> >
*

*>> > Pca3.in
*

*>> > parm parmsg.prmtoptrajin fit1.nc
*

*>> > # Read in modes
*

*>> > readdata evecs1.dat name MyModes
*

*>> > # Now create separate PC projections for each trajectory
*

*>> > projection T1 modes MyModes beg 1 end 2 :1-1362.CA start 1 stop 250000
*

*>> out
*

*>> > T1.dat
*

*>> > projection T2 modes MyModes beg 1 end 2 :1-1362.CA start 250001 stop
*

*>> 500000
*

*>> > out T2.dat
*

*>> > # Calculate Kullback-Leibler Divergence vs time for PC histogramskde
*

*>> T1:1
*

*>> > kldiv T2:1 klout KL-PC1.dat bins 200 name SG-DG-1
*

*>> > kde T1:2 kldiv T2:2 klout KL-PC2.dat bins 200 name SG-DG-2
*

*>> > kde T1:1 out kde1-PC1.dat bins 200 name KDE1-1
*

*>> > kde T2:1 out kde2-PC1.dat bins 200 name KDE2-1hist T1:1 bins 200 out
*

*>> > T1-hist1.dat normint name HIST1-1
*

*>> > hist T1:2 bins 200 out T1-hist2.dat normint name HIST1-2
*

*>> > hist T2:1 bins 200 out T2-hist1.dat normint name HIST2-1
*

*>> > hist T2:2 bins 200 out T2-hist2.dat normint name HIST2-2
*

*>> >
*

*>> >
*

*>> >
*

*>> > On Thu, Jun 17, 2021 at 2:52 PM Dr. Anselm Horn <anselm.horn.fau.de>
*

*>> wrote:
*

*>> >
*

*>> > > Hi,
*

*>> > >
*

*>> > > many thanks to Dan for his explaining comments regarding the PCA
*

*>> analysis!
*

*>> > >
*

*>> > > Since this issue (different systems) bothered me for many years and I
*

*>> > > had many discussions about that I'd like to add my thoughts here.
*

*>> > >
*

*>> > > I see some disadvantages of approach 2, i.e. performing isolated PCA
*

*>> > > analyses on the different systems. As Dan pointed out, the underlying
*

*>> > > eigenvectors will certainly not be the same. Although a pairwise
*

*>> > > comparison (e.g. via RMSIP) between the eigenvectors of the two
*

*>> systems
*

*>> > > is possible, in praxi there may arise problems:
*

*>> > > 1) the eigenvectors will have a different size, if you include all
*

*>> atoms
*

*>> > > of the full systems, or the vector entries may represent different
*

*>> > > atoms. Thus, a comparison may not be that straightforward.
*

*>> > > 2) from the pairwise comparison between the eigenvectors of the two
*

*>> > > systems you deduce a correspondence between them. Maybe in real
*

*>> systems
*

*>> > > the most important eigenvectors can clearly be matched, but in
*

*>> principle
*

*>> > > the matching may not be that distinct. And this problem may arise even
*

*>> > > if you restrict your analysis to atoms common on both system (e.g.
*

*>> > > backbone or CA).
*

*>> > > 3) The ease of application of isolated PCA-analysis may seduce the
*

*>> user
*

*>> > > to omit that matching step and just proceed to the well-known
*

*>> > > PCA1:PCA2-histogram plots.
*

*>> > >
*

*>> > > That said, I totally agree with Dan that you may use approach 2 in a
*

*>> > > very elegant way to obtain histogram plots of a similar pattern, if
*

*>> you
*

*>> > > put an extra efford into the comparison step and be aware that the
*

*>> basis
*

*>> > > eigenvectors resemble a similar basis, i.e. describe a similar overall
*

*>> > > motion, but are not the same.
*

*>> > >
*

*>> > > It's my feeling that approach 1, where you restrict you analysis to a
*

*>> > > subset of atoms occuring in all systems investigated (backbone or CA),
*

*>> > > is more "save" and thus should be considered first. In that way, you
*

*>> are
*

*>> > > sure that the basis of all your PCA histograms is the same, and
*

*>> > > differences and equalities in the plot between different systems can
*

*>> > > directly be compared and discussed.
*

*>> > > However, using approach 2 in addition can provide further insights
*

*>> into
*

*>> > > the systems' dynamics.
*

*>> > > And I also agree with Dan that visual inspection might be very
*

*>> helpful.
*

*>> > >
*

*>> > > Best regards,
*

*>> > >
*

*>> > > Anselm
*

*>> > >
*

*>> > >
*

*>> > >
*

*>> > > On 06/16/2021 03:18 PM, Daniel Roe wrote:
*

*>> > > > Hi,
*

*>> > > >
*

*>> > > > I know I just responded to you directly but I'll post the reply here
*

*>> > > > as well in case people search for it in the future.
*

*>> > > >
*

*>> > > > So when you have different systems and you want to perform principal
*

*>> > > > component analysis there are two ways you can go. You can either 1)
*

*>> > > > modify one or both of the systems so you have a common core (i.e.
*

*>> > > > remove the parts of each system via 'strip' which don't match the
*

*>> > > > other) and perform a combined PCA on that, or 2) perform PCA on each
*

*>> > > > system separately and then compare the resulting eigenvectors via
*

*>> > > > something like RMSIP (root mean square inner product).
*

*>> > > >
*

*>> > > > The advantage of approach 1 is the usual advantage of the combined
*

*>> PCA
*

*>> > > > approach - you're guaranteed that each eigenvector is the same for
*

*>> the
*

*>> > > > separate systems. This works best when the differences between the
*

*>> > > > systems are not too extreme, but if the systems are too different
*

*>> this
*

*>> > > > should show up as e.g. non-overlapping PC projection histograms. The
*

*>> > > > disadvantage is that if you have important contributions to the
*

*>> > > > overall motion from parts that you've removed this of course will
*

*>> not
*

*>> > > > show up in your final analysis.
*

*>> > > >
*

*>> > > > The advantage of approach 2 is that you can perform PCA on each full
*

*>> > > > system, so you won't miss any motion. The disadvantage is that
*

*>> you're
*

*>> > > > not guaranteed that the eigenvectors from each analysis will match
*

*>> > > > (and it's likely they won't), i.e. PC 1 in system A will not be the
*

*>> > > > same as PC 1 in system B. This is where you would use RMSIP to
*

*>> compare
*

*>> > > > the different PCs to see which (if any) match.
*

*>> > > >
*

*>> > > > My feeling is that both approaches should be tried. Approach 2 will
*

*>> > > > give you a sense for whether the motions in the systems are similar
*

*>> > > > enough for approach 1 to make sense. Also, really make use of the
*

*>> > > > 'modes trajout' analysis to generate pseudo trajectories (and also
*

*>> the
*

*>> > > > 'nmwiz' keyword for the 'diagmatrix' command for use with the VMD
*

*>> > > > nmwiz plugin) for the first 2-3 PCs for each system to get a sense
*

*>> for
*

*>> > > > what the motions actually look like. In my opinion there's really no
*

*>> > > > substitute for visualization when it comes to PCA.
*

*>> > > >
*

*>> > > > Hope this helps!
*

*>> > > >
*

*>> > > > -Dan
*

*>> > > >
*

*>> > > > On Sat, Jun 12, 2021 at 5:16 PM Sruthi Sudhakar
*

*>> > > > <sruthisudhakarraji.gmail.com> wrote:
*

*>> > > >>
*

*>> > > >> This method of breaking the analysis into stages actually worked
*

*>> and I
*

*>> > > was
*

*>> > > >> able to complete the analysis overcoming the memory issues.
*

*>> > > >> An additional query: Incase we are using 2 different trajectories
*

*>> with
*

*>> > > >> different number of residues (1530 and 1526) to do a combined
*

*>> trajectory
*

*>> > > >> analysis, would it be enough to load gamd2.nc (1530 residues) and
*

*>> > > >> gamd3.nc (1526
*

*>> > > >> residues) together? Each has 250000 frames. And in that case do we
*

*>> need
*

*>> > > to
*

*>> > > >> fit into a single trajectory (fit.nc) or separately create 2 fit
*

*>> > > >> trajectory? I am doubtful about the third stage where we load the
*

*>> fit
*

*>> > > >> trajectory, should we load both the parm files in that case? Also
*

*>> would
*

*>> > > the
*

*>> > > >> difference in the total number of residues create any problem?
*

*>> > > >>
*

*>> > > >> My sample script following your guidelines would be
*

*>> > > >> pca1.in
*

*>> > > >> parm noionsa.prmtop [ag]
*

*>> > > >> parm noionsb.prmtop [bg]
*

*>> > > >> trajin gamd2.nc parm [ag]
*

*>> > > >> trajin gamd3.nc parm [bg]
*

*>> > > >> # Fit to the first frame to remove global rotation/translation
*

*>> > > >> rms first :1-1362.CA
*

*>> > > >> # Write out the restart
*

*>> > > >> average gaccAvg.rst7 restart
*

*>> > > >>
*

*>> > > >> pca2.in
*

*>> > > >> parm noionsa.prmtop [ag]
*

*>> > > >> parm noionsb.prmtop [bg]
*

*>> > > >> trajin gamd2.nc parm [ag]
*

*>> > > >> trajin gamd3.nc parm [bg]
*

*>> > > >> # Read in average structure, tag as [avg]
*

*>> > > >> reference gaccAvg.rst7 [avg]
*

*>> > > >> # RMS-fit coordinates to average
*

*>> > > >> rms ref [avg] :1-1362.CA
*

*>> > > >> # Calculate coordinate covariance matrix
*

*>> > > >> matrix covar :1-1362.CA name gaccCovar
*

*>> > > >> # Write out the fit trajectory
*

*>> > > >> trajout fit.nc
*

*>> > > >> # Diagonalize coordinate covariance matrix for eigenmodes (only 2)
*

*>> > > >> diagmatrix gaccCovar out evecs.dat vecs 20 nmwiz nmwizvecs 5
*

*>> > > >> nmwizfile prot.nmd nmwizmask :1-1362.CA
*

*>> > > >>
*

*>> > > >> pca3.in
*

*>> > > >>
*

*>> > > >> parm noionsa.prmtop
*

*>> > > >> parm noionsb.prmtop
*

*>> > > >> trajin fit.nc
*

*>> > > >> # Read in modes
*

*>> > > >> readdata evecs.dat name MyModes
*

*>> > > >> # Now create separate PC projections for each trajectory
*

*>> > > >> projection T1 modes MyModes beg 1 end 20 :1-1362.CA start 1 stop
*

*>> > > 250000 out
*

*>> > > >> T1.dat
*

*>> > > >> projection T2 modes MyModes beg 1 end 20 :1-1362.CA start 250001
*

*>> stop
*

*>> > > >> 500000 out T2.dat
*

*>> > > >> # Calculate Kullback-Leibler Divergence vs time for PC histograms
*

*>> > > >> kde T1:1 kldiv T2:1 klout KL-PC1.dat bins 200 name SG-DG-1
*

*>> > > >> kde T1:2 kldiv T2:2 klout KL-PC2.dat bins 200 name SG-DG-2
*

*>> > > >> kde T1:1 out kde1-PC1.dat bins 200 name KDE1-1
*

*>> > > >> kde T2:1 out kde2-PC1.dat bins 200 name KDE2-1
*

*>> > > >> hist T1:1 bins 200 out T1-hist1.dat normint name HIST1-1
*

*>> > > >> hist T1:2 bins 200 out T1-hist2.dat normint name HIST1-2
*

*>> > > >> hist T2:1 bins 200 out T2-hist1.dat normint name HIST2-1
*

*>> > > >> hist T2:2 bins 200 out T2-hist2.dat normint name HIST2-2
*

*>> > > >>
*

*>> > > >> Kindly suggest any corrections required.
*

*>> > > >> Regards,
*

*>> > > >> Sruthi Sudhakar
*

*>> > > >>
*

*>> > > >>
*

*>> > > >> On Sat, Jun 12, 2021 at 12:11 PM Sruthi Sudhakar <
*

*>> > > >> sruthisudhakarraji.gmail.com> wrote:
*

*>> > > >>
*

*>> > > >>> This method actually worked and I was able to complete the
*

*>> analysis. An
*

*>> > > >>> additional query: Incase we are using 2 different trajectories to
*

*>> do a
*

*>> > > >>> combined trajectory analysis, would it be enough to load gamd2.nc
*

*>> and
*

*>> > > >>> gamd3.nc together? And in that case do we need to fit into a
*

*>> single
*

*>> > > >>> trajectory (fit.nc) or separately create 2 fit trajectory?
*

*>> > > >>>
*

*>> > > >>> Regards,
*

*>> > > >>> Sruthi Sudhakar
*

*>> > > >>>
*

*>> > > >>>
*

*>> > > >>> On Wed, Jun 2, 2021 at 11:32 PM Daniel Roe <
*

*>> daniel.r.roe.gmail.com>
*

*>> > > wrote:
*

*>> > > >>>
*

*>> > > >>>> Hi,
*

*>> > > >>>>
*

*>> > > >>>> First, if you haven't already I really recommend that you look
*

*>> at some
*

*>> > > >>>> literature to familiarize yourself with what principal component
*

*>> > > >>>> analysis actually does. It's important to understand what each
*

*>> part of
*

*>> > > >>>> the analysis is for. I like the old "essential dynamics" papers,
*

*>> but a
*

*>> > > >>>> very rough explanation is given in the cpptraj PCA tutorial:
*

*>> > > >>>>
*

*>> > > >>>>
*

*>> > >
*

*>> https://amberhub.chpc.utah.edu/introduction-to-principal-component-analysis/
*

*>> > > >>>>
*

*>> > > >>>> Also, read the cpptraj manual so you know what each command is
*

*>> > > actually
*

*>> > > >>>> doing.
*

*>> > > >>>>
*

*>> > > >>>> That said, I now realize my examples in the manual and the
*

*>> tutorial
*

*>> > > >>>> both rely on having the trajectories stored in memory. So I will
*

*>> give
*

*>> > > >>>> a rough outline of how to do it all on-disk.
*

*>> > > >>>>
*

*>> > > >>>> 1) Create average coordinates.
*

*>> > > >>>>
*

*>> > > >>>> parm noions.prmtop
*

*>> > > >>>> trajin gamd2.nc
*

*>> > > >>>> # Fit to the first frame to remove global rotation/translation
*

*>> > > >>>> rms first :1-1530&!.H=
*

*>> > > >>>> # Write out the restart
*

*>> > > >>>> average gaccAvg.rst7 restart
*

*>> > > >>>>
*

*>> > > >>>> 2) Rms fit your trajectory to average coordinates, calculate the
*

*>> > > >>>> covariance matrix, write out the fit trajectory, diagonalize the
*

*>> > > >>>> matrix and write out the "modes" data (i.e. eigenvectors and
*

*>> > > >>>> eigenvalues).
*

*>> > > >>>>
*

*>> > > >>>> parm noions.prmtop
*

*>> > > >>>> trajin gamd2.nc
*

*>> > > >>>> # Read in average structure, tag as [avg]
*

*>> > > >>>> reference gaccAvg.rst7 [avg]
*

*>> > > >>>> # RMS-fit coordinates to average
*

*>> > > >>>> rms ref [avg] :1-1530&!.H=
*

*>> > > >>>> # Calculate coordinate covariance matrix
*

*>> > > >>>> matrix covar :1-1530&!.H= name gaccCovar
*

*>> > > >>>> # Write out the fit trajectory
*

*>> > > >>>> trajout fit.nc
*

*>> > > >>>> # Diagonalize coordinate covariance matrix for eigenmodes (only
*

*>> 2)
*

*>> > > >>>> diagmatrix gaccCovar out evecs.dat vecs 2 nmwiz nmwizvecs 2
*

*>> nmwizfile
*

*>> > > >>>> cas.nmd nmwizmask :1-1530&!.H=
*

*>> > > >>>>
*

*>> > > >>>>
*

*>> > > >>>> 3) Read in the "modes" data, calculate principal component
*

*>> projections
*

*>> > > >>>> from the fit trajectory, do the Kullback-Leibler divergence
*

*>> analysis.
*

*>> > > >>>>
*

*>> > > >>>> parm noions.prmtop
*

*>> > > >>>> trajin fit.nc
*

*>> > > >>>> # Read in modes
*

*>> > > >>>> readdata evecs.dat name MyModes
*

*>> > > >>>> # Now create separate PC projections for each trajectory
*

*>> > > >>>> projection T1 modes MyModes beg 1 end 2 :1-1530&!.H= start 1 stop
*

*>> > > >>>> 50000 out T1.dat
*

*>> > > >>>> projection T2 modes MyModes beg 1 end 2 :1-1530&!.H= start 50001
*

*>> stop
*

*>> > > >>>> 100000 out T2.dat
*

*>> > > >>>> <other projection commands>
*

*>> > > >>>> # Calculate Kullback-Leibler Divergence vs time for PC histograms
*

*>> > > >>>> kde T1:1 kldiv T2:1 klout KL-PC.agr bins 400 name AMD-MREMD-1
*

*>> > > >>>> <other kde commands and analyses>
*

*>> > > >>>>
*

*>> > > >>>> Hope this helps,
*

*>> > > >>>>
*

*>> > > >>>> -Dan
*

*>> > > >>>>
*

*>> > > >>>> On Tue, Jun 1, 2021 at 3:23 PM Sruthi Sudhakar
*

*>> > > >>>> <sruthisudhakarraji.gmail.com> wrote:
*

*>> > > >>>>>
*

*>> > > >>>>> The available memory in the beginning of cpptraj is shown as 32
*

*>> Gb
*

*>> > > and
*

*>> > > >>>>> estimated memory usage is 82gb. I am really confused about this
*

*>> > > memory
*

*>> > > >>>>> allotment statistics. I am doing the process in a disk with
*

*>> more than
*

*>> > > >>>> 3TB
*

*>> > > >>>>> space. I am not well versed with this technicality. Could
*

*>> someone
*

*>> > > please
*

*>> > > >>>>> exaplain how to overcome the issue in this principal component
*

*>> > > analysis
*

*>> > > >>>>> part? I did understand that we have to separate the analysis
*

*>> into 3
*

*>> > > >>>> phases
*

*>> > > >>>>> but not clear as to how the inputs should be changed. Kindly
*

*>> advise.
*

*>> > > >>>>>
*

*>> > > >>>>> On Tue, 1 Jun 2021 at 7:21 PM, Sruthi Sudhakar <
*

*>> > > >>>> sruthisudhakarraji.gmail.com>
*

*>> > > >>>>> wrote:
*

*>> > > >>>>>
*

*>> > > >>>>>>
*

*>> > > >>>>>> Thank you for the reply. Since I am doing this for the first
*

*>> time, I
*

*>> > > >>>>>> wanted to know if I am supposed to create 3 separate inputs
*

*>> to run
*

*>> > > in
*

*>> > > >>>>>> cpptraj to do the methodology you suggested.
*

*>> > > >>>>>>
*

*>> > > >>>>>> Regards,
*

*>> > > >>>>>> Sruthi
*

*>> > > >>>>>>
*

*>> > > >>>>>> On Tue, 1 Jun 2021 at 6:33 PM, Daniel Roe <
*

*>> daniel.r.roe.gmail.com>
*

*>> > > >>>> wrote:
*

*>> > > >>>>>>
*

*>> > > >>>>>>> Hi,
*

*>> > > >>>>>>>
*

*>> > > >>>>>>> You are likely running out of memory. This is why your
*

*>> problems
*

*>> > > >>>> during
*

*>> > > >>>>>>> clustering went away when you reduced the number of input
*

*>> frames
*

*>> > > by a
*

*>> > > >>>>>>> factor of 5. The solution is to do everything on disk. So
*

*>> instead
*

*>> > > of
*

*>> > > >>>>>>> loading all coordinates into memory, separate the principal
*

*>> > > component
*

*>> > > >>>>>>> analysis into three separate phases:
*

*>> > > >>>>>>>
*

*>> > > >>>>>>> 1) Create average coordinates.
*

*>> > > >>>>>>> 2) Rms fit your trajectory to average coordinates, calculate
*

*>> the
*

*>> > > >>>>>>> covariance matrix, write out the fit trajectory, diagonalize
*

*>> the
*

*>> > > >>>>>>> matrix and write out the "modes" data (i.e. eigenvectors and
*

*>> > > >>>>>>> eigenvalues).
*

*>> > > >>>>>>> 3) Read in the "modes" data, calculate principal component
*

*>> > > >>>> projections
*

*>> > > >>>>>>> from the fit trajectory, do the Kullback-Leibler divergence
*

*>> > > analysis.
*

*>> > > >>>>>>>
*

*>> > > >>>>>>> This way, the most memory you need is to store the covariance
*

*>> > > matrix,
*

*>> > > >>>>>>> modes, and other data of that type. Hope this helps,
*

*>> > > >>>>>>>
*

*>> > > >>>>>>> -Dan
*

*>> > > >>>>>>>
*

*>> > > >>>>>>> PS - Note that there appears to be a small error in the input
*

*>> you
*

*>> > > >>>>>>> posted (in pca.in). The 'nmwiz' keyword should be part of the
*

*>> > > >>>>>>> diagmatrix command, not on a separate line.
*

*>> > > >>>>>>>
*

*>> > > >>>>>>>
*

*>> > > >>>>>>> On Mon, May 31, 2021 at 1:22 PM Sruthi Sudhakar
*

*>> > > >>>>>>> <sruthisudhakarraji.gmail.com> wrote:
*

*>> > > >>>>>>>>
*

*>> > > >>>>>>>> Dear all,
*

*>> > > >>>>>>>>
*

*>> > > >>>>>>>> I have been doing pca analysis on an accelerated MD
*

*>> trajectory of
*

*>> > > >>>> 500ns
*

*>> > > >>>>>>>> (250,000 frames). I have attached the input file I have used
*

*>> for
*

*>> > > >>>> the
*

*>> > > >>>>>>> study.
*

*>> > > >>>>>>>> The job stops at the createcrd stage. Basically, the job gets
*

*>> > > >>>> killed at
*

*>> > > >>>>>>>> 30%. The same happened during the cluster analysis reading
*

*>> every
*

*>> > > >>>> frames.
*

*>> > > >>>>>>>> The clustering error was solved when I changed the input to
*

*>> read
*

*>> > > >>>> every
*

*>> > > >>>>>>> 5th
*

*>> > > >>>>>>>> frame. Now since this is repeating in pca analysis, kindly
*

*>> help
*

*>> > > >>>>>>> regarding
*

*>> > > >>>>>>>> the same.
*

*>> > > >>>>>>>>
*

*>> > > >>>>>>>>
*

*>> > > >>>>>>>> Regards,
*

*>> > > >>>>>>>> Sruthi Sudhakar
*

*>> > > >>>>>>>> _______________________________________________
*

*>> > > >>>>>>>> AMBER mailing list
*

*>> > > >>>>>>>> AMBER.ambermd.org
*

*>> > > >>>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
*

*>> > > >>>>>>>
*

*>> > > >>>>>>> _______________________________________________
*

*>> > > >>>>>>> AMBER mailing list
*

*>> > > >>>>>>> AMBER.ambermd.org
*

*>> > > >>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
*

*>> > > >>>>>>>
*

*>> > > >>>>>>
*

*>> > > >>>>> _______________________________________________
*

*>> > > >>>>> AMBER mailing list
*

*>> > > >>>>> AMBER.ambermd.org
*

*>> > > >>>>> http://lists.ambermd.org/mailman/listinfo/amber
*

*>> > > >>>>
*

*>> > > >>>> _______________________________________________
*

*>> > > >>>> AMBER mailing list
*

*>> > > >>>> AMBER.ambermd.org
*

*>> > > >>>> http://lists.ambermd.org/mailman/listinfo/amber
*

*>> > > >>>>
*

*>> > > >>>
*

*>> > > >> _______________________________________________
*

*>> > > >> AMBER mailing list
*

*>> > > >> AMBER.ambermd.org
*

*>> > > >> http://lists.ambermd.org/mailman/listinfo/amber
*

*>> > > >
*

*>> > > > _______________________________________________
*

*>> > > > AMBER mailing list
*

*>> > > > AMBER.ambermd.org
*

*>> > > > http://lists.ambermd.org/mailman/listinfo/amber
*

*>> > > >
*

*>> > >
*

*>> > >
*

*>> > > _______________________________________________
*

*>> > > AMBER mailing list
*

*>> > > AMBER.ambermd.org
*

*>> > > http://lists.ambermd.org/mailman/listinfo/amber
*

*>> > >
*

*>> > _______________________________________________
*

*>> > AMBER mailing list
*

*>> > AMBER.ambermd.org
*

*>> > http://lists.ambermd.org/mailman/listinfo/amber
*

*>>
*

*>> _______________________________________________
*

*>> AMBER mailing list
*

*>> AMBER.ambermd.org
*

*>> http://lists.ambermd.org/mailman/listinfo/amber
*

*>>
*

*>
*

_______________________________________________

AMBER mailing list

AMBER.ambermd.org

http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri Jun 18 2021 - 23:30:02 PDT

Date: Sat, 19 Jun 2021 11:42:37 +0530

I have tried the approach 2 and got a rmsip.dat file. But it contained only

2 lines. Is it how the rmsip file look like? The result looked like this

#evecs2.dat_X_evecs3.dat

0.396164

Input:

parm parmsg.prmtop

trajin protsg.nc

trajin protdg.nc

readdata evecs2.dat

readdata evecs3.dat

modes rmsip out rmsip.dat name evecs2.dat name2 evecs3.dat beg 1 end 5

Regards,

Sruthi Sudhakar

On Thu, Jun 17, 2021 at 9:36 PM Sruthi Sudhakar <

sruthisudhakarraji.gmail.com> wrote:

_______________________________________________

AMBER mailing list

AMBER.ambermd.org

http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri Jun 18 2021 - 23:30:02 PDT

Custom Search