# Re: [AMBER] Question about PCA tutorial

From: Thomas Cheatham <tec3.utah.edu>
Date: Wed, 29 Apr 2020 21:24:07 -0600 (MDT)

> I have a question on PCA. I did AMBER PCA tutorial and i have a
> question. How to decide how many vectors to use. The PCA tutorial uses 3.
> ( runanalysis diagmatrix cpu-gpu-covar out cpu-gpu-evecs.dat vecs 3 name
> myEvecs nmwiz nmwizvecs 3 nmwizfile dna.nmd nmwizmask :1-36&!.H= )

Well, no one has responded yet, so I will give it a shot, noting that
hopefully my people will correct me if I am wrong.

Like clustering, there is no right answer and it depends upon what you
want to learn. When I teach about PCA, I try to start from the concepts of
normal modes of motion. You do a QM minimization and frequency calculation
and you can visualize the normal modes. The first eigenvalues/eigenvectors
report on the slowest
collective modes of motion (the overall molecule bending / twisting with
many of the atoms moving). The later ones, the high frequency modes, are
just a couple of atoms moving, such as bond vibrations. If you haven't
ever viewed modes of motion in GaussView or equivalents, do...

The modes of motion from PCA are ordered from low frequency to high
(bonds). Bond vibrations do not tell you much about the dynamics or
function, or really much more than the fact that the bond is vibrating.
Often the most informative are the first few modes of motion, since they
give you a picture of the largest / collective motion. The first few modes
often account for most of the motion (on the order of 90%). So, back to
your question: how many modes you need to look at depends on what you are
trying to learn.
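
If you want to put a number on "how much of the motion" a given set of
modes covers, one common approach is to look at the cumulative fraction of
the total variance carried by the eigenvalues. The sketch below is only an
illustration: the function name and the eigenvalues are made up, and it
assumes you pass in ALL the eigenvalues (their sum is the trace of the
covariance matrix). With "vecs 3" you only get three eigenvalues back, so
you would need to request more vectors before fractions like this mean
anything.

```python
def modes_for_fraction(eigenvalues, target=0.90):
    """Return (n_modes, cumulative_fractions).

    n_modes is the smallest number of leading modes whose eigenvalues sum
    to at least `target` of the total. Hypothetical helper, not part of
    CPPTRAJ; assumes `eigenvalues` is the complete set.
    """
    vals = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(vals)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")

    cumulative = []
    running = 0.0
    for v in vals:
        running += v
        cumulative.append(running / total)

    for i, frac in enumerate(cumulative, start=1):
        if frac >= target:
            return i, cumulative
    return len(vals), cumulative


# Made-up eigenvalues, for illustration only.
example_eigenvalues = [50.0, 25.0, 12.0, 6.0, 4.0, 3.0]
n_modes, cumulative = modes_for_fraction(example_eigenvalues, target=0.90)
print(n_modes, [round(c, 2) for c in cumulative])
```

The threshold itself (90%, 95%, ...) is still your choice, which is the
point: the number only tells you what a given cutoff keeps, not which
cutoff answers your question.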

In the papers you reference from my lab, we were trying to demonstrate
reproducibility and convergence from independent simulations from
different initial conditions. If different simulations showed equivalence
for the first three modes of motion, probably good. If 15-20 modes agree
(between independent simulations), even better, since this gets to ~95-99%
of motion -- hard to claim independent simulations are not equivalent if
they reproduce 15-20 of the modes. Does this guarantee convergence? No,
since the independent simulations could have both missed important states
(i.e. locally converged, not necessarily globally converged). Yet, do we
need to compare all modes? Probably not (noise), but we could with CPPTRAJ
since it was designed to be flexible (allow you to investigate what you
want). CPPTRAJ being flexible means you should experiment; there is no one
"correct" way, and the rate-limiting step in simulation these days is not
running MD, but figuring out what the MD means.
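
As one possible way to quantify whether two independent simulations
"agree" on their first N modes, here is a minimal sketch of the root mean
square inner product (RMSIP) between two sets of eigenvectors. To be
clear, this is a generic measure and not necessarily what was used in the
papers mentioned above. The function names are hypothetical, and it
assumes both sets come from the same atom selection, are fit to the same
reference, and that the vectors within each set are mutually orthogonal
(as eigenvectors of a covariance matrix are).

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def rmsip(modes_a, modes_b):
    """Root mean square inner product between two sets of N modes.

    Returns 1.0 when the two sets span the same subspace and 0.0 when
    the subspaces are orthogonal. Squaring the dot products makes the
    result independent of the arbitrary sign of each eigenvector.
    """
    if len(modes_a) != len(modes_b):
        raise ValueError("both sets must contain the same number of modes")
    n = len(modes_a)
    a = [normalize(v) for v in modes_a]
    b = [normalize(v) for v in modes_b]
    total = sum(dot(u, v) ** 2 for u in a for v in b)
    return math.sqrt(total / n)


# Toy 3-dimensional "modes", for illustration only.
run1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
run2 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0]]  # same subspace, reordered, sign flipped
run3 = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # shares only one direction with run1

same_subspace = rmsip(run1, run2)
partial_overlap = rmsip(run1, run3)
print(same_subspace, partial_overlap)
```

A high value for the first few modes and again for 15-20 modes would be
in the spirit of the comparison described above, with the same caveat:
agreement between runs does not rule out both having missed a state.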

To reframe your question, it is not about the number of modes, it is what
question are you trying to answer with respect to the modes of motion?

--tec3

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Apr 29 2020 - 20:30:03 PDT