Re: [AMBER] Regarding RMSD calculation and Clustering from Daniel Roe on 2020-08-04 (Amber Archive Aug 2020)

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Tue, 4 Aug 2020 10:54:54 -0400

Hi,

On Sun, Feb 9, 2020 at 12:10 AM Manish Kumar Mohanty
<mkmohanty.iiserb.ac.in> wrote:
>
> Now, if I want to do rmsd calculation for the DNA 12-mer, will the data
> present in nowater.nc be fine? Or should I use this and proceed?
> * rms ToFirst :1-24&!@H *

So the coordinates in the nowater.nc trajectory file 1) have been
imaged, 2) have been stripped of all WAT residues, and 3) have been
RMS-best fit to the first frame (default behavior when no reference is
specified) using atoms in residues 1-24. I'm not sure what you mean by
"fine" here. The 'rms ToFirst :1-24&!@H' command will calculate the
best-fit RMSD to the first structure, but not because of 'ToFirst':
that is only the name given to the resulting data set. You haven't
specified a reference structure, so the command defaults to the first
frame. This will be clear in the output from the command. Also, you
probably want :1-24&!@H= (notice the equals sign, which is equivalent
to an asterisk wildcard); @H would just exclude atoms named 'H'. You
could also use @/H to exclude all atoms of element H.
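
As a rough illustration of the mask point above, here is a toy Python
sketch (not cpptraj code, and not how cpptraj parses masks) of the
three selections: an exact atom name, a name with the '=' wildcard,
and an element. In Amber mask syntax the atom selector is written with
'@', i.e. @H, @H= and @/H. The atom list and the helper names
select_by_name and select_by_element are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical atoms, as (atom name, element) pairs. Illustration only.
ATOMS = [("P", "P"), ("C1'", "C"), ("H1'", "H"), ("H2''", "H"),
         ("N1", "N"), ("H", "H"), ("H61", "H"), ("O4'", "O")]


def select_by_name(pattern, atoms=ATOMS):
    """Toy stand-in for '@<name>': '=' is treated like the '*' wildcard."""
    glob = pattern.replace("=", "*")
    return [name for name, _ in atoms if fnmatchcase(name, glob)]


def select_by_element(element, atoms=ATOMS):
    """Toy stand-in for '@/<element>': match on element, not on name."""
    return [name for name, elem in atoms if elem == element]


print(select_by_name("H"))      # only the atom literally named 'H'
print(select_by_name("H="))     # every atom whose name starts with 'H'
print(select_by_element("H"))   # every hydrogen, whatever its name
```

Without the wildcard, only the single atom named 'H' is selected, so
negating that selection would leave almost every hydrogen in the fit.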

> If possible, can someone clarify the difference between the *rms fit* and
> *rms ToFirst* commands?

No difference; "fit" and "ToFirst" aren't keywords for the 'rms'
command. I highly recommend reading the manual entry for the 'rms'
command, or at least running cpptraj interactively and typing 'help
rms' to see recognized keywords.

> 2- I have a single 1 μs trajectory from a DNA duplex simulation. Is it
> reliable to cluster it after the RMSD calculation to sample all
> the possible conformations, or should clustering only be done for multiple
> trajectories from different starting structures? And is it then OK to do
> Markov State Modelling using the clusters obtained?

An entire microsecond trajectory likely has too many frames for
cluster analysis to complete in a reasonable amount of time (there may
be memory issues as well if the pairwise cache is stored in memory,
which is the default behavior). You'll likely want to do some sieving (via
'sieve <#>'); I recommend using 'random' in conjunction with 'sieve'
since regular sieving could be problematic if there are underlying
periodic motions in your molecule. Also, you'll probably want to
repeat the cluster analysis a few times with different settings for
your algorithm and see what gives you the "best" clusters (high
pseudo-F, low DBI, etc). Once you're confident you have reasonable
looking clusters, I think it's fine to use them in Markov State
modeling (I think that this is often what's done, although there are
other approaches to grouping for Markov state modeling; there's tons
of literature out there, see e.g. work by Chodera et al). No matter
what you do, before you do a lot of clustering I urge you to read the
manual entry, check out a tutorial (e.g.
https://amberhub.chpc.utah.edu/clustering-a-protein-trajectory/), and
try to cluster a small subset of your trajectory (no more than a
couple hundred frames tops) so you can get familiar with things before
you go all out. If you have a multi-core machine, you may want to use
OpenMP-enabled cpptraj (cpptraj.OMP) since the pairwise calculation
will be faster. Finally, I always recommend people read the wonderful
cluster analysis paper from Shao & Cheatham et al. - it's a classic:
https://doi.org/10.1021/ct700119m
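
To put rough numbers on the memory and sieving points above, here is a
small Python sketch (a toy model, not cpptraj code). It assumes the
pairwise cache holds one 4-byte value per unique pair of frames, a
100,000-frame trajectory, and a motion that repeats exactly every 10
frames; none of these numbers come from the original question, and the
helper names are hypothetical.

```python
import random


def pairwise_cache_gib(n_frames, bytes_per_value=4):
    """Size in GiB of one value per unique frame pair (assumed 4 bytes each)."""
    n_pairs = n_frames * (n_frames - 1) // 2
    return n_pairs * bytes_per_value / 2**30


def regular_sieve(n_frames, sieve):
    """Keep every sieve-th frame, starting from frame 0."""
    return list(range(0, n_frames, sieve))


def random_sieve(n_frames, sieve, seed=1):
    """Keep the same number of frames, drawn at random without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_frames), n_frames // sieve))


PERIOD = 10  # assumed: the toy motion repeats every 10 frames


def phase(frame):
    """Position of a frame within the assumed periodic motion."""
    return frame % PERIOD


n_frames = 100_000  # assumed frame count, for illustration only
print(f"all frames : {pairwise_cache_gib(n_frames):.2f} GiB")
print(f"sieve 10   : {pairwise_cache_gib(n_frames // 10):.2f} GiB")

reg_phases = {phase(f) for f in regular_sieve(n_frames, 10)}
rand_phases = {phase(f) for f in random_sieve(n_frames, 10)}
print("phases seen, regular sieve:", sorted(reg_phases))
print("phases seen, random sieve :", sorted(rand_phases))
```

Under these assumptions the full cache is on the order of 18.6 GiB and
sieving by 10 cuts it about a hundredfold. The regular sieve only ever
sees one phase of the periodic motion, while the random sieve sees all
of them.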

Hope this helps,

-Dan

>
> Thanks
> Manish
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

Received on Tue Aug 04 2020 - 08:00:03 PDT