- Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]

From: Sruthi Sudhakar <sruthisudhakarraji.gmail.com>

Date: Sat, 26 Jun 2021 11:23:16 +0530

Dear all,

Sorry for making this thread particularlylong. But even using the above

inputs my T2.dat file contains 500,000 data points in place of the 250,000

frames I have given in the input. Though there are 500,000 points, the

first 250,000 are just zeros. and the kde command is not getting completed.

Regards,

Sruthi Sudhakar

On Mon, Jun 21, 2021 at 8:03 PM Daniel Roe <daniel.r.roe.gmail.com> wrote:

*> Hi,
*

*>
*

*> Sorry for the delay. This email thread has gotten long, and has also
*

*> been split up between the mailing list and directly so I may have
*

*> missed something, but I think your issue is that in step three you're
*

*> doing the projection on both trajectories. So with that in mind I will
*

*> try to just post a version of the example I gave previously, modified
*

*> for two separate systems. Note an additional first step where I create
*

*> a stripped "combined" trajectory that only keeps common elements from
*

*> each system:
*

*>
*

*> 1) Create a combined, stripped trajectory from 2 separate
*

*> trajectories. Keep in mind that the topology you get from the 'strip'
*

*> command will probably have different numbering (atom and residue
*

*> indexing in Amber always starts from 1 and goes up in integer
*

*> increments).
*

*>
*

*> parm system1.parm
*

*> trajin system1.nc parmindex 0
*

*> parm system2.parm
*

*> trajin system2.nc parmindex 1
*

*> strip <mask that removes anything different> parmout combined.parm7
*

*> trajout combined.nc
*

*>
*

*> 2) Create average coordinates from combined trajectory
*

*>
*

*> parm combined.parm7
*

*> trajin combined.nc
*

*> # Fit to the first frame to remove global rotation/translation
*

*> rms first <mask>
*

*> # Write out the restart
*

*> average Avg.rst7 restart
*

*>
*

*> 3) Rms fit your trajectory to average coordinates, calculate the
*

*> covariance matrix, write out the fit trajectory, diagonalize the
*

*> matrix and write out the "modes" data (i.e. eigenvectors and
*

*> eigenvalues).
*

*>
*

*> parm combined.parm7
*

*> trajin combined.nc
*

*> # Read in average structure, tag as [avg]
*

*> reference Avg.rst7 [avg]
*

*> # RMS-fit coordinates to average
*

*> rms ref [avg] <mask>
*

*> # Calculate coordinate covariance matrix
*

*> matrix covar <mask> name Covar
*

*> # Write out the fit trajectory
*

*> trajout fit.nc
*

*> # Diagonalize coordinate covariance matrix for eigenmodes (only 2)
*

*> diagmatrix Covar out evecs.dat vecs 2 nmwiz nmwizvecs 2 nmwizfile
*

*> cas.nmd nmwizmask <mask>
*

*>
*

*>
*

*> 4) Read in the "modes" data, calculate principal component projections
*

*> from the fit trajectory, do the Kullback-Leibler divergence analysis.
*

*>
*

*> parm combined.parm7
*

*> trajin combined.nc
*

*> # Read in modes
*

*> readdata evecs.dat name MyModes
*

*> # Now create separate PC projections for each trajectory
*

*> projection T1 modes MyModes beg 1 end 2 <mask> start 1 stop <# frames
*

*> in system1.nc> out T1.dat
*

*> projection T2 modes MyModes beg 1 end 2 <mask> start <# frames in
*

*> system1.nc + 1> stop <# frames in combined.nc> out T2.dat
*

*> <other projection commands>
*

*> # Calculate Kullback-Leibler Divergence vs time for PC histograms
*

*> kde T1:1 kldiv T2:1 klout KL-PC.agr bins 400 name AMD-MREMD-1
*

*> <other kde commands and analyses>
*

*>
*

*> Hope this helps,
*

*>
*

*> -Dan
*

*>
*

*> On Sat, Jun 19, 2021 at 2:13 AM Sruthi Sudhakar
*

*> <sruthisudhakarraji.gmail.com> wrote:
*

*> >
*

*> > I have tried the approach 2 and got a rmsip.dat file. But it contained
*

*> only
*

*> > 2 lines. Is it how the rmsip file look like? The result looked like this
*

*> > #evecs2.dat_X_evecs3.dat
*

*> > 0.396164
*

*> >
*

*> >
*

*> >
*

*> > Input:
*

*> > parm parmsg.prmtop
*

*> > trajin protsg.nc
*

*> > trajin protdg.nc
*

*> > readdata evecs2.dat
*

*> > readdata evecs3.dat
*

*> > modes rmsip out rmsip.dat name evecs2.dat name2 evecs3.dat beg 1 end 5
*

*> >
*

*> > Regards,
*

*> > Sruthi Sudhakar
*

*> >
*

*> >
*

*> > On Thu, Jun 17, 2021 at 9:36 PM Sruthi Sudhakar <
*

*> > sruthisudhakarraji.gmail.com> wrote:
*

*> >
*

*> > > Trajin is given Actually it was a typo. trajin fit1.nc was loaded and
*

*> > > like I said above T1.dat has 250000 frames but T2.dat has given the
*

*> error.
*

*> > > I had attached the files in the mail I had directly send you. Is
*

*> there any
*

*> > > other error in the pca1.in, pca2.in and pca3.in. If not I am not sure
*

*> > > what is happening to the T2.dat. Because almost all the kde and hist
*

*> > > commands seems to give odd results.
*

*> > > Regards,
*

*> > > Sruthi Sudhakar
*

*> > >
*

*> > >
*

*> > > On Thu, Jun 17, 2021 at 8:36 PM Daniel Roe <daniel.r.roe.gmail.com>
*

*> wrote:
*

*> > >
*

*> > >> Hi,
*

*> > >>
*

*> > >> It looks like in your 'pca3.in' file you're missing the 'trajin'
*

*> > >> statements, i.e. there is nothing for the 'projection' commands to
*

*> > >> work with.
*

*> > >>
*

*> > >> -Dan
*

*> > >>
*

*> > >> On Thu, Jun 17, 2021 at 7:23 AM Sruthi Sudhakar
*

*> > >> <sruthisudhakarraji.gmail.com> wrote:
*

*> > >> >
*

*> > >> > Thanks Dr. Anselm for adding the comments. I had some more follow up
*

*> > >> > questions on the approach one , which I had directly mailed Dan.
*

*> But for
*

*> > >> > beginners like me who might face similar issue, I would like to
*

*> post the
*

*> > >> > question here. And I would really appreciate it if someone would
*

*> just
*

*> > >> > verify if my script is right or not. Because it has been really a
*

*> > >> > tough task to find the loophole in this since I feel my input is
*

*> right.
*

*> > >> > I have tried approach one, where I stripped off the nucleic acid
*

*> part
*

*> > >> in
*

*> > >> > both (box info not removed) and used a common prmtop file. But in
*

*> that
*

*> > >> > case, I do face some errors in the modes of the second trajectory.
*

*> The
*

*> > >> > method I used is
*

*> > >> > 1. load both the stripped trajectories, rms fit to the first frame,
*

*> and
*

*> > >> > write out the average file;
*

*> > >> > 2. the second stage loaded the 2 stripped trajectories, rms fit the
*

*> > >> > trajectories to the average, calculated covariance matrix and wrote
*

*> out
*

*> > >> a
*

*> > >> > fit nc file (500000 frames - the sum of both trajectories) and used
*

*> the
*

*> > >> > diagmatrix command;
*

*> > >> > 3. In the third stage where all the analysis were done I had faced
*

*> the
*

*> > >> > error. In the third stage, the fit trajectory with 500000 frames was
*

*> > >> loaded
*

*> > >> > and I used the projection command to get the PC 1 and 2 for the
*

*> > >> > trajectories separately by dividing the frames which gave odd
*

*> results.
*

*> > >> The
*

*> > >> > KL-PC1 (according to the script below, all inputs are attached too)
*

*> was
*

*> > >> not
*

*> > >> > formed at all. T1.dat read 250000 frames but the T2.dat had 500000
*

*> > >> > coordinates. Projection of T2 was supposed to begin from the
*

*> 250001th
*

*> > >> frame
*

*> > >> > and end in the last frame. The histogram of T1 gave a Gaussian
*

*> > >> distribution
*

*> > >> > but the histogram of T2 was not giving a Gaussian distribution.
*

*> T2.dat
*

*> > >> > contained all values of mode 1 and mode 2 as zero for the first
*

*> 250000
*

*> > >> > frames.
*

*> > >> >
*

*> > >> > Could you give any insights regarding why this might be happening? I
*

*> > >> > suspect it is some error in the script below (input.txt), since, in
*

*> > >> > independent analysis, only the loaded frames appeared in the modes,
*

*> > >> unlike
*

*> > >> > the T2 of this case. Also, would you recommend combining the
*

*> > >> trajectories
*

*> > >> > much before fitting the trajectory, where instead of loading 2
*

*> stripped
*

*> > >> > trajectories in the first stage and then creating a combined fit
*

*> > >> > trajectory, we could directly create a single trajectory in the very
*

*> > >> first
*

*> > >> > stage?
*

*> > >> > Regards,
*

*> > >> > Sruthi Sudhakar
*

*> > >> >
*

*> > >> > parm parmsg.prmtop
*

*> > >> > trajin protsg.nc
*

*> > >> > trajin protdg.nc
*

*> > >> > # Fit to the first frame to remove global rotation/translation
*

*> > >> > rms first :1-1362.CA
*

*> > >> > # Write out the restart
*

*> > >> > average gaccAvg1.rst7 restart
*

*> > >> >
*

*> > >> > pca2.in
*

*> > >> > parm parmsg.prmtop
*

*> > >> > trajin protsg.nc
*

*> > >> > trajin protdg.nc
*

*> > >> > # Read in average structure, tag as [avg]
*

*> > >> > reference gaccAvg1.rst7 [avg]
*

*> > >> > # RMS-fit coordinates to average
*

*> > >> > rms ref [avg] :1-1362.CA
*

*> > >> > # Calculate coordinate covariance matrix
*

*> > >> > matrix covar :1-1362.CA name gaccCovar
*

*> > >> > # Write out the fit trajectory
*

*> > >> > trajout fit1.nc
*

*> > >> > # Diagonalize coordinate covariance matrix for eigenmodes (only 2)
*

*> > >> > diagmatrix gaccCovar out evecs1.dat vecs 20 nmwiz nmwizvecs 5
*

*> nmwizfile
*

*> > >> > prot.nmd nmwizmask :1-1362.CA
*

*> > >> >
*

*> > >> > Pca3.in
*

*> > >> > parm parmsg.prmtoptrajin fit1.nc
*

*> > >> > # Read in modes
*

*> > >> > readdata evecs1.dat name MyModes
*

*> > >> > # Now create separate PC projections for each trajectory
*

*> > >> > projection T1 modes MyModes beg 1 end 2 :1-1362.CA start 1 stop
*

*> 250000
*

*> > >> out
*

*> > >> > T1.dat
*

*> > >> > projection T2 modes MyModes beg 1 end 2 :1-1362.CA start 250001
*

*> stop
*

*> > >> 500000
*

*> > >> > out T2.dat
*

*> > >> > # Calculate Kullback-Leibler Divergence vs time for PC histogramskde
*

*> > >> T1:1
*

*> > >> > kldiv T2:1 klout KL-PC1.dat bins 200 name SG-DG-1
*

*> > >> > kde T1:2 kldiv T2:2 klout KL-PC2.dat bins 200 name SG-DG-2
*

*> > >> > kde T1:1 out kde1-PC1.dat bins 200 name KDE1-1
*

*> > >> > kde T2:1 out kde2-PC1.dat bins 200 name KDE2-1hist T1:1 bins 200 out
*

*> > >> > T1-hist1.dat normint name HIST1-1
*

*> > >> > hist T1:2 bins 200 out T1-hist2.dat normint name HIST1-2
*

*> > >> > hist T2:1 bins 200 out T2-hist1.dat normint name HIST2-1
*

*> > >> > hist T2:2 bins 200 out T2-hist2.dat normint name HIST2-2
*

*> > >> >
*

*> > >> >
*

*> > >> >
*

*> > >> > On Thu, Jun 17, 2021 at 2:52 PM Dr. Anselm Horn <anselm.horn.fau.de
*

*> >
*

*> > >> wrote:
*

*> > >> >
*

*> > >> > > Hi,
*

*> > >> > >
*

*> > >> > > many thanks to Dan for his explaining comments regarding the PCA
*

*> > >> analysis!
*

*> > >> > >
*

*> > >> > > Since this issue (different systems) bothered me for many years
*

*> and I
*

*> > >> > > had many discussions about that I'd like to add my thoughts here.
*

*> > >> > >
*

*> > >> > > I see some disadvantages of approach 2, i.e. performing isolated
*

*> PCA
*

*> > >> > > analyses on the different systems. As Dan pointed out, the
*

*> underlying
*

*> > >> > > eigenvectors will certainly not be the same. Although a pairwise
*

*> > >> > > comparison (e.g. via RMSIP) between the eigenvectors of the two
*

*> > >> systems
*

*> > >> > > is possible, in praxi there may arise problems:
*

*> > >> > > 1) the eigenvectors will have a different size, if you include all
*

*> > >> atoms
*

*> > >> > > of the full systems, or the vector entries may represent different
*

*> > >> > > atoms. Thus, a comparison may not be that straightforward.
*

*> > >> > > 2) from the pairwise comparison between the eigenvectors of the
*

*> two
*

*> > >> > > systems you deduce a correspondence between them. Maybe in real
*

*> > >> systems
*

*> > >> > > the most important eigenvectors can clearly be matched, but in
*

*> > >> principle
*

*> > >> > > the matching may not be that distinct. And this problem may arise
*

*> even
*

*> > >> > > if you restrict your analysis to atoms common on both system (e.g.
*

*> > >> > > backbone or CA).
*

*> > >> > > 3) The ease of application of isolated PCA-analysis may seduce the
*

*> > >> user
*

*> > >> > > to omit that matching step and just proceed to the well-known
*

*> > >> > > PCA1:PCA2-histogram plots.
*

*> > >> > >
*

*> > >> > > That said, I totally agree with Dan that you may use approach 2
*

*> in a
*

*> > >> > > very elegant way to obtain histogram plots of a similar pattern,
*

*> if
*

*> > >> you
*

*> > >> > > put an extra efford into the comparison step and be aware that the
*

*> > >> basis
*

*> > >> > > eigenvectors resemble a similar basis, i.e. describe a similar
*

*> overall
*

*> > >> > > motion, but are not the same.
*

*> > >> > >
*

*> > >> > > It's my feeling that approach 1, where you restrict you analysis
*

*> to a
*

*> > >> > > subset of atoms occuring in all systems investigated (backbone or
*

*> CA),
*

*> > >> > > is more "save" and thus should be considered first. In that way,
*

*> you
*

*> > >> are
*

*> > >> > > sure that the basis of all your PCA histograms is the same, and
*

*> > >> > > differences and equalities in the plot between different systems
*

*> can
*

*> > >> > > directly be compared and discussed.
*

*> > >> > > However, using approach 2 in addition can provide further insights
*

*> > >> into
*

*> > >> > > the systems' dynamics.
*

*> > >> > > And I also agree with Dan that visual inspection might be very
*

*> > >> helpful.
*

*> > >> > >
*

*> > >> > > Best regards,
*

*> > >> > >
*

*> > >> > > Anselm
*

*> > >> > >
*

*> > >> > >
*

*> > >> > >
*

*> > >> > > On 06/16/2021 03:18 PM, Daniel Roe wrote:
*

*> > >> > > > Hi,
*

*> > >> > > >
*

*> > >> > > > I know I just responded to you directly but I'll post the reply
*

*> here
*

*> > >> > > > as well in case people search for it in the future.
*

*> > >> > > >
*

*> > >> > > > So when you have different systems and you want to perform
*

*> principal
*

*> > >> > > > component analysis there are two ways you can go. You can
*

*> either 1)
*

*> > >> > > > modify one or both of the systems so you have a common core
*

*> (i.e.
*

*> > >> > > > remove the parts of each system via 'strip' which don't match
*

*> the
*

*> > >> > > > other) and perform a combined PCA on that, or 2) perform PCA on
*

*> each
*

*> > >> > > > system separately and then compare the resulting eigenvectors
*

*> via
*

*> > >> > > > something like RMSIP (root mean square inner product).
*

*> > >> > > >
*

*> > >> > > > The advantage of approach 1 is the usual advantage of the
*

*> combined
*

*> > >> PCA
*

*> > >> > > > approach - you're guaranteed that each eigenvector is the same
*

*> for
*

*> > >> the
*

*> > >> > > > separate systems. This works best when the differences between
*

*> the
*

*> > >> > > > systems are not too extreme, but if the systems are too
*

*> different
*

*> > >> this
*

*> > >> > > > should show up as e.g. non-overlapping PC projection
*

*> histograms. The
*

*> > >> > > > disadvantage is that if you have important contributions to the
*

*> > >> > > > overall motion from parts that you've removed this of course
*

*> will
*

*> > >> not
*

*> > >> > > > show up in your final analysis.
*

*> > >> > > >
*

*> > >> > > > The advantage of approach 2 is that you can perform PCA on each
*

*> full
*

*> > >> > > > system, so you won't miss any motion. The disadvantage is that
*

*> > >> you're
*

*> > >> > > > not guaranteed that the eigenvectors from each analysis will
*

*> match
*

*> > >> > > > (and it's likely they won't), i.e. PC 1 in system A will not be
*

*> the
*

*> > >> > > > same as PC 1 in system B. This is where you would use RMSIP to
*

*> > >> compare
*

*> > >> > > > the different PCs to see which (if any) match.
*

*> > >> > > >
*

*> > >> > > > My feeling is that both approaches should be tried. Approach 2
*

*> will
*

*> > >> > > > give you a sense for whether the motions in the systems are
*

*> similar
*

*> > >> > > > enough for approach 1 to make sense. Also, really make use of
*

*> the
*

*> > >> > > > 'modes trajout' analysis to generate pseudo trajectories (and
*

*> also
*

*> > >> the
*

*> > >> > > > 'nmwiz' keyword for the 'diagmatrix' command for use with the
*

*> VMD
*

*> > >> > > > nmwiz plugin) for the first 2-3 PCs for each system to get a
*

*> sense
*

*> > >> for
*

*> > >> > > > what the motions actually look like. In my opinion there's
*

*> really no
*

*> > >> > > > substitute for visualization when it comes to PCA.
*

*> > >> > > >
*

*> > >> > > > Hope this helps!
*

*> > >> > > >
*

*> > >> > > > -Dan
*

*> > >> > > >
*

*> > >> > > > On Sat, Jun 12, 2021 at 5:16 PM Sruthi Sudhakar
*

*> > >> > > > <sruthisudhakarraji.gmail.com> wrote:
*

*> > >> > > >>
*

*> > >> > > >> This method of breaking the analysis into stages actually
*

*> worked
*

*> > >> and I
*

*> > >> > > was
*

*> > >> > > >> able to complete the analysis overcoming the memory issues.
*

*> > >> > > >> An additional query: Incase we are using 2 different
*

*> trajectories
*

*> > >> with
*

*> > >> > > >> different number of residues (1530 and 1526) to do a combined
*

*> > >> trajectory
*

*> > >> > > >> analysis, would it be enough to load gamd2.nc (1530 residues)
*

*> and
*

*> > >> > > >> gamd3.nc (1526
*

*> > >> > > >> residues) together? Each has 250000 frames. And in that case
*

*> do we
*

*> > >> need
*

*> > >> > > to
*

*> > >> > > >> fit into a single trajectory (fit.nc) or separately create 2
*

*> fit
*

*> > >> > > >> trajectory? I am doubtful about the third stage where we load
*

*> the
*

*> > >> fit
*

*> > >> > > >> trajectory, should we load both the parm files in that case?
*

*> Also
*

*> > >> would
*

*> > >> > > the
*

*> > >> > > >> difference in the total number of residues create any problem?
*

*> > >> > > >>
*

*> > >> > > >> My sample script following your guidelines would be
*

*> > >> > > >> pca1.in
*

*> > >> > > >> parm noionsa.prmtop [ag]
*

*> > >> > > >> parm noionsb.prmtop [bg]
*

*> > >> > > >> trajin gamd2.nc parm [ag]
*

*> > >> > > >> trajin gamd3.nc parm [bg]
*

*> > >> > > >> # Fit to the first frame to remove global rotation/translation
*

*> > >> > > >> rms first :1-1362.CA
*

*> > >> > > >> # Write out the restart
*

*> > >> > > >> average gaccAvg.rst7 restart
*

*> > >> > > >>
*

*> > >> > > >> pca2.in
*

*> > >> > > >> parm noionsa.prmtop [ag]
*

*> > >> > > >> parm noionsb.prmtop [bg]
*

*> > >> > > >> trajin gamd2.nc parm [ag]
*

*> > >> > > >> trajin gamd3.nc parm [bg]
*

*> > >> > > >> # Read in average structure, tag as [avg]
*

*> > >> > > >> reference gaccAvg.rst7 [avg]
*

*> > >> > > >> # RMS-fit coordinates to average
*

*> > >> > > >> rms ref [avg] :1-1362.CA
*

*> > >> > > >> # Calculate coordinate covariance matrix
*

*> > >> > > >> matrix covar :1-1362.CA name gaccCovar
*

*> > >> > > >> # Write out the fit trajectory
*

*> > >> > > >> trajout fit.nc
*

*> > >> > > >> # Diagonalize coordinate covariance matrix for eigenmodes
*

*> (only 2)
*

*> > >> > > >> diagmatrix gaccCovar out evecs.dat vecs 20 nmwiz nmwizvecs 5
*

*> > >> > > >> nmwizfile prot.nmd nmwizmask :1-1362.CA
*

*> > >> > > >>
*

*> > >> > > >> pca3.in
*

*> > >> > > >>
*

*> > >> > > >> parm noionsa.prmtop
*

*> > >> > > >> parm noionsb.prmtop
*

*> > >> > > >> trajin fit.nc
*

*> > >> > > >> # Read in modes
*

*> > >> > > >> readdata evecs.dat name MyModes
*

*> > >> > > >> # Now create separate PC projections for each trajectory
*

*> > >> > > >> projection T1 modes MyModes beg 1 end 20 :1-1362.CA start 1
*

*> stop
*

*> > >> > > 250000 out
*

*> > >> > > >> T1.dat
*

*> > >> > > >> projection T2 modes MyModes beg 1 end 20 :1-1362.CA start
*

*> 250001
*

*> > >> stop
*

*> > >> > > >> 500000 out T2.dat
*

*> > >> > > >> # Calculate Kullback-Leibler Divergence vs time for PC
*

*> histograms
*

*> > >> > > >> kde T1:1 kldiv T2:1 klout KL-PC1.dat bins 200 name SG-DG-1
*

*> > >> > > >> kde T1:2 kldiv T2:2 klout KL-PC2.dat bins 200 name SG-DG-2
*

*> > >> > > >> kde T1:1 out kde1-PC1.dat bins 200 name KDE1-1
*

*> > >> > > >> kde T2:1 out kde2-PC1.dat bins 200 name KDE2-1
*

*> > >> > > >> hist T1:1 bins 200 out T1-hist1.dat normint name HIST1-1
*

*> > >> > > >> hist T1:2 bins 200 out T1-hist2.dat normint name HIST1-2
*

*> > >> > > >> hist T2:1 bins 200 out T2-hist1.dat normint name HIST2-1
*

*> > >> > > >> hist T2:2 bins 200 out T2-hist2.dat normint name HIST2-2
*

*> > >> > > >>
*

*> > >> > > >> Kindly suggest any corrections required.
*

*> > >> > > >> Regards,
*

*> > >> > > >> Sruthi Sudhakar
*

*> > >> > > >>
*

*> > >> > > >>
*

*> > >> > > >> On Sat, Jun 12, 2021 at 12:11 PM Sruthi Sudhakar <
*

*> > >> > > >> sruthisudhakarraji.gmail.com> wrote:
*

*> > >> > > >>
*

*> > >> > > >>> This method actually worked and I was able to complete the
*

*> > >> analysis. An
*

*> > >> > > >>> additional query: Incase we are using 2 different
*

*> trajectories to
*

*> > >> do a
*

*> > >> > > >>> combined trajectory analysis, would it be enough to load
*

*> gamd2.nc
*

*> > >> and
*

*> > >> > > >>> gamd3.nc together? And in that case do we need to fit into a
*

*> > >> single
*

*> > >> > > >>> trajectory (fit.nc) or separately create 2 fit trajectory?
*

*> > >> > > >>>
*

*> > >> > > >>> Regards,
*

*> > >> > > >>> Sruthi Sudhakar
*

*> > >> > > >>>
*

*> > >> > > >>>
*

*> > >> > > >>> On Wed, Jun 2, 2021 at 11:32 PM Daniel Roe <
*

*> > >> daniel.r.roe.gmail.com>
*

*> > >> > > wrote:
*

*> > >> > > >>>
*

*> > >> > > >>>> Hi,
*

*> > >> > > >>>>
*

*> > >> > > >>>> First, if you haven't already I really recommend that you
*

*> look
*

*> > >> at some
*

*> > >> > > >>>> literature to familiarize yourself with what principal
*

*> component
*

*> > >> > > >>>> analysis actually does. It's important to understand what
*

*> each
*

*> > >> part of
*

*> > >> > > >>>> the analysis is for. I like the old "essential dynamics"
*

*> papers,
*

*> > >> but a
*

*> > >> > > >>>> very rough explanation is given in the cpptraj PCA tutorial:
*

*> > >> > > >>>>
*

*> > >> > > >>>>
*

*> > >> > >
*

*> > >>
*

*> https://amberhub.chpc.utah.edu/introduction-to-principal-component-analysis/
*

*> > >> > > >>>>
*

*> > >> > > >>>> Also, read the cpptraj manual so you know what each command
*

*> is
*

*> > >> > > actually
*

*> > >> > > >>>> doing.
*

*> > >> > > >>>>
*

*> > >> > > >>>> That said, I now realize my examples in the manual and the
*

*> > >> tutorial
*

*> > >> > > >>>> both rely on having the trajectories stored in memory. So I
*

*> will
*

*> > >> give
*

*> > >> > > >>>> a rough outline of how to do it all on-disk.
*

*> > >> > > >>>>
*

*> > >> > > >>>> 1) Create average coordinates.
*

*> > >> > > >>>>
*

*> > >> > > >>>> parm noions.prmtop
*

*> > >> > > >>>> trajin gamd2.nc
*

*> > >> > > >>>> # Fit to the first frame to remove global
*

*> rotation/translation
*

*> > >> > > >>>> rms first :1-1530&!.H=
*

*> > >> > > >>>> # Write out the restart
*

*> > >> > > >>>> average gaccAvg.rst7 restart
*

*> > >> > > >>>>
*

*> > >> > > >>>> 2) Rms fit your trajectory to average coordinates, calculate
*

*> the
*

*> > >> > > >>>> covariance matrix, write out the fit trajectory, diagonalize
*

*> the
*

*> > >> > > >>>> matrix and write out the "modes" data (i.e. eigenvectors and
*

*> > >> > > >>>> eigenvalues).
*

*> > >> > > >>>>
*

*> > >> > > >>>> parm noions.prmtop
*

*> > >> > > >>>> trajin gamd2.nc
*

*> > >> > > >>>> # Read in average structure, tag as [avg]
*

*> > >> > > >>>> reference gaccAvg.rst7 [avg]
*

*> > >> > > >>>> # RMS-fit coordinates to average
*

*> > >> > > >>>> rms ref [avg] :1-1530&!.H=
*

*> > >> > > >>>> # Calculate coordinate covariance matrix
*

*> > >> > > >>>> matrix covar :1-1530&!.H= name gaccCovar
*

*> > >> > > >>>> # Write out the fit trajectory
*

*> > >> > > >>>> trajout fit.nc
*

*> > >> > > >>>> # Diagonalize coordinate covariance matrix for eigenmodes
*

*> (only
*

*> > >> 2)
*

*> > >> > > >>>> diagmatrix gaccCovar out evecs.dat vecs 2 nmwiz nmwizvecs 2
*

*> > >> nmwizfile
*

*> > >> > > >>>> cas.nmd nmwizmask :1-1530&!.H=
*

*> > >> > > >>>>
*

*> > >> > > >>>>
*

*> > >> > > >>>> 3) Read in the "modes" data, calculate principal component
*

*> > >> projections
*

*> > >> > > >>>> from the fit trajectory, do the Kullback-Leibler divergence
*

*> > >> analysis.
*

*> > >> > > >>>>
*

*> > >> > > >>>> parm noions.prmtop
*

*> > >> > > >>>> trajin fit.nc
*

*> > >> > > >>>> # Read in modes
*

*> > >> > > >>>> readdata evecs.dat name MyModes
*

*> > >> > > >>>> # Now create separate PC projections for each trajectory
*

*> > >> > > >>>> projection T1 modes MyModes beg 1 end 2 :1-1530&!.H= start 1
*

*> stop
*

*> > >> > > >>>> 50000 out T1.dat
*

*> > >> > > >>>> projection T2 modes MyModes beg 1 end 2 :1-1530&!.H= start
*

*> 50001
*

*> > >> stop
*

*> > >> > > >>>> 100000 out T2.dat
*

*> > >> > > >>>> <other projection commands>
*

*> > >> > > >>>> # Calculate Kullback-Leibler Divergence vs time for PC
*

*> histograms
*

*> > >> > > >>>> kde T1:1 kldiv T2:1 klout KL-PC.agr bins 400 name AMD-MREMD-1
*

*> > >> > > >>>> <other kde commands and analyses>
*

*> > >> > > >>>>
*

*> > >> > > >>>> Hope this helps,
*

*> > >> > > >>>>
*

*> > >> > > >>>> -Dan
*

*> > >> > > >>>>
*

*> > >> > > >>>> On Tue, Jun 1, 2021 at 3:23 PM Sruthi Sudhakar
*

*> > >> > > >>>> <sruthisudhakarraji.gmail.com> wrote:
*

*> > >> > > >>>>>
*

*> > >> > > >>>>> The available memory in the beginning of cpptraj is shown
*

*> as 32
*

*> > >> Gb
*

*> > >> > > and
*

*> > >> > > >>>>> estimated memory usage is 82gb. I am really confused about
*

*> this
*

*> > >> > > memory
*

*> > >> > > >>>>> allotment statistics. I am doing the process in a disk with
*

*> > >> more than
*

*> > >> > > >>>> 3TB
*

*> > >> > > >>>>> space. I am not well versed with this technicality. Could
*

*> > >> someone
*

*> > >> > > please
*

*> > >> > > >>>>> exaplain how to overcome the issue in this principal
*

*> component
*

*> > >> > > analysis
*

*> > >> > > >>>>> part? I did understand that we have to separate the
*

*> analysis
*

*> > >> into 3
*

*> > >> > > >>>> phases
*

*> > >> > > >>>>> but not clear as to how the inputs should be changed. Kindly
*

*> > >> advise.
*

*> > >> > > >>>>>
*

*> > >> > > >>>>> On Tue, 1 Jun 2021 at 7:21 PM, Sruthi Sudhakar <
*

*> > >> > > >>>> sruthisudhakarraji.gmail.com>
*

*> > >> > > >>>>> wrote:
*

*> > >> > > >>>>>
*

*> > >> > > >>>>>>
*

*> > >> > > >>>>>> Thank you for the reply. Since I am doing this for the
*

*> first
*

*> > >> time, I
*

*> > >> > > >>>>>> wanted to know if I am supposed to create 3 separate
*

*> inputs
*

*> > >> to run
*

*> > >> > > in
*

*> > >> > > >>>>>> cpptraj to do the methodology you suggested.
*

*> > >> > > >>>>>>
*

*> > >> > > >>>>>> Regards,
*

*> > >> > > >>>>>> Sruthi
*

*> > >> > > >>>>>>
*

*> > >> > > >>>>>> On Tue, 1 Jun 2021 at 6:33 PM, Daniel Roe <
*

*> > >> daniel.r.roe.gmail.com>
*

*> > >> > > >>>> wrote:
*

*> > >> > > >>>>>>
*

*> > >> > > >>>>>>> Hi,
*

*> > >> > > >>>>>>>
*

*> > >> > > >>>>>>> You are likely running out of memory. This is why your
*

*> > >> problems
*

*> > >> > > >>>> during
*

*> > >> > > >>>>>>> clustering went away when you reduced the number of input
*

*> > >> frames
*

*> > >> > > by a
*

*> > >> > > >>>>>>> factor of 5. The solution is to do everything on disk. So
*

*> > >> instead
*

*> > >> > > of
*

*> > >> > > >>>>>>> loading all coordinates into memory, separate the
*

*> principal
*

*> > >> > > component
*

*> > >> > > >>>>>>> analysis into three separate phases:
*

*> > >> > > >>>>>>>
*

*> > >> > > >>>>>>> 1) Create average coordinates.
*

*> > >> > > >>>>>>> 2) Rms fit your trajectory to average coordinates,
*

*> calculate
*

*> > >> the
*

*> > >> > > >>>>>>> covariance matrix, write out the fit trajectory,
*

*> diagonalize
*

*> > >> the
*

*> > >> > > >>>>>>> matrix and write out the "modes" data (i.e. eigenvectors
*

*> and
*

*> > >> > > >>>>>>> eigenvalues).
*

*> > >> > > >>>>>>> 3) Read in the "modes" data, calculate principal component
*

*> > >> > > >>>> projections
*

*> > >> > > >>>>>>> from the fit trajectory, do the Kullback-Leibler
*

*> divergence
*

*> > >> > > analysis.
*

*> > >> > > >>>>>>>
*

*> > >> > > >>>>>>> This way, the most memory you need is to store the
*

*> covariance
*

*> > >> > > matrix,
*

*> > >> > > >>>>>>> modes, and other data of that type. Hope this helps,
*

*> > >> > > >>>>>>>
*

*> > >> > > >>>>>>> -Dan
*

*> > >> > > >>>>>>>
*

*> > >> > > >>>>>>> PS - Note that there appears to be a small error in the
*

*> input
*

*> > >> you
*

*> > >> > > >>>>>>> posted (in pca.in). The 'nmwiz' keyword should be part
*

*> of the
*

*> > >> > > >>>>>>> diagmatrix command, not on a separate line.
*

*> > >> > > >>>>>>>
*

*> > >> > > >>>>>>>
*

*> > >> > > >>>>>>> On Mon, May 31, 2021 at 1:22 PM Sruthi Sudhakar
*

*> > >> > > >>>>>>> <sruthisudhakarraji.gmail.com> wrote:
*

*> > >> > > >>>>>>>>
*

*> > >> > > >>>>>>>> Dear all,
*

*> > >> > > >>>>>>>>
*

*> > >> > > >>>>>>>> I have been doing pca analysis on an accelerated MD
*

*> > >> trajectory of
*

*> > >> > > >>>> 500ns
*

*> > >> > > >>>>>>>> (250,000 frames). I have attached the input file I have
*

*> used
*

*> > >> for
*

*> > >> > > >>>> the
*

*> > >> > > >>>>>>> study.
*

*> > >> > > >>>>>>>> The job stops at the createcrd stage. Basically, the job
*

*> gets
*

*> > >> > > >>>> killed at
*

*> > >> > > >>>>>>>> 30%. The same happened during the cluster analysis
*

*> reading
*

*> > >> every
*

*> > >> > > >>>> frames.
*

*> > >> > > >>>>>>>> The clustering error was solved when I changed the input
*

*> to
*

*> > >> read
*

*> > >> > > >>>> every
*

*> > >> > > >>>>>>> 5th
*

*> > >> > > >>>>>>>> frame. Now since this is repeating in pca analysis,
*

*> kindly
*

*> > >> help
*

*> > >> > > >>>>>>> regarding
*

*> > >> > > >>>>>>>> the same.
*

*> > >> > > >>>>>>>>
*

*> > >> > > >>>>>>>>
*

*> > >> > > >>>>>>>> Regards,
*

*> > >> > > >>>>>>>> Sruthi Sudhakar
*

*> > >> > > >>>>>>>> _______________________________________________
*

*> > >> > > >>>>>>>> AMBER mailing list
*

*> > >> > > >>>>>>>> AMBER.ambermd.org
*

*> > >> > > >>>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
*

*> > >> > > >>>>>>>
*

*> > >> > > >>>>>>> _______________________________________________
*

*> > >> > > >>>>>>> AMBER mailing list
*

*> > >> > > >>>>>>> AMBER.ambermd.org
*

*> > >> > > >>>>>>> http://lists.ambermd.org/mailman/listinfo/amber
*

*> > >> > > >>>>>>>
*

*> > >> > > >>>>>>
*

*> > >> > > >>>>> _______________________________________________
*

*> > >> > > >>>>> AMBER mailing list
*

*> > >> > > >>>>> AMBER.ambermd.org
*

*> > >> > > >>>>> http://lists.ambermd.org/mailman/listinfo/amber
*

*> > >> > > >>>>
*

*> > >> > > >>>> _______________________________________________
*

*> > >> > > >>>> AMBER mailing list
*

*> > >> > > >>>> AMBER.ambermd.org
*

*> > >> > > >>>> http://lists.ambermd.org/mailman/listinfo/amber
*

*> > >> > > >>>>
*

*> > >> > > >>>
*

*> > >> > > >> _______________________________________________
*

*> > >> > > >> AMBER mailing list
*

*> > >> > > >> AMBER.ambermd.org
*

*> > >> > > >> http://lists.ambermd.org/mailman/listinfo/amber
*

*> > >> > > >
*

*> > >> > > > _______________________________________________
*

*> > >> > > > AMBER mailing list
*

*> > >> > > > AMBER.ambermd.org
*

*> > >> > > > http://lists.ambermd.org/mailman/listinfo/amber
*

*> > >> > > >
*

*> > >> > >
*

*> > >> > >
*

*> > >> > > _______________________________________________
*

*> > >> > > AMBER mailing list
*

*> > >> > > AMBER.ambermd.org
*

*> > >> > > http://lists.ambermd.org/mailman/listinfo/amber
*

*> > >> > >
*

*> > >> > _______________________________________________
*

*> > >> > AMBER mailing list
*

*> > >> > AMBER.ambermd.org
*

*> > >> > http://lists.ambermd.org/mailman/listinfo/amber
*

*> > >>
*

*> > >> _______________________________________________
*

*> > >> AMBER mailing list
*

*> > >> AMBER.ambermd.org
*

*> > >> http://lists.ambermd.org/mailman/listinfo/amber
*

*> > >>
*

*> > >
*

*> > _______________________________________________
*

*> > AMBER mailing list
*

*> > AMBER.ambermd.org
*

*> > http://lists.ambermd.org/mailman/listinfo/amber
*

*>
*

*> _______________________________________________
*

*> AMBER mailing list
*

*> AMBER.ambermd.org
*

*> http://lists.ambermd.org/mailman/listinfo/amber
*

*>
*

_______________________________________________

AMBER mailing list

AMBER.ambermd.org

http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri Jun 25 2021 - 23:00:02 PDT

Date: Sat, 26 Jun 2021 11:23:16 +0530

Dear all,

Sorry for making this thread particularlylong. But even using the above

inputs my T2.dat file contains 500,000 data points in place of the 250,000

frames I have given in the input. Though there are 500,000 points, the

first 250,000 are just zeros. and the kde command is not getting completed.

Regards,

Sruthi Sudhakar

On Mon, Jun 21, 2021 at 8:03 PM Daniel Roe <daniel.r.roe.gmail.com> wrote:

_______________________________________________

AMBER mailing list

AMBER.ambermd.org

http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri Jun 25 2021 - 23:00:02 PDT

Custom Search