Re: [AMBER] # of Coordinates Mismatch in PCA

From: Jason Swails <jason.swails.gmail.com>
Date: Tue, 21 Aug 2018 18:15:07 -0400

It may help to look at another example of distortions resulting from
projecting onto a small set of PCs -- image processing.

http://scikit-learn.org/stable/modules/decomposition.html#pca-using-randomized-svd

You recover the original trajectory exactly if you project onto *all* PCs
(the full set spans a space of the same dimension as the original
coordinates), but if you keep only some of them, you lose information.
Things become "blurry".
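
As a rough illustration (a minimal sketch, not anything taken from cpptraj),
the toy NumPy script below builds a hypothetical two-atom, 2D "trajectory" in
which one atom swings around the other at a fixed bond length. The data, the
noise level, and the helper names (pca_reconstruct, bond_lengths) are all
assumptions made up for this example. Projecting onto all four PCs returns the
input to within round-off; projecting onto the first PC alone moves the atom
along a straight line, so the bond length is no longer conserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "trajectory": 500 frames of 2 atoms in 2D (4 coordinates).
# Atom A sits at the origin; atom B swings around it at bond length 1.0.
n_frames = 500
theta = rng.uniform(-np.pi / 3, np.pi / 3, size=n_frames)
traj = np.column_stack([
    np.zeros(n_frames), np.zeros(n_frames),  # atom A (x, y)
    np.cos(theta), np.sin(theta),            # atom B (x, y)
])
traj += rng.normal(scale=0.005, size=traj.shape)  # small positional noise


def pca_reconstruct(data, n_pcs):
    """Project mean-centered data onto the first n_pcs PCs and map back."""
    mean = data.mean(axis=0)
    centered = data - mean
    cov = np.cov(centered, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]         # largest variance first
    basis = evecs[:, order[:n_pcs]]         # columns are the selected PCs
    return centered @ basis @ basis.T + mean


def bond_lengths(data):
    """A-B distance in every frame."""
    return np.linalg.norm(data[:, 2:4] - data[:, 0:2], axis=1)


full = pca_reconstruct(traj, 4)  # all PCs
one = pca_reconstruct(traj, 1)   # first PC only

err_full = np.abs(full - traj).max()
err_one = np.abs(one - traj).max()
spread_orig = bond_lengths(traj).max() - bond_lengths(traj).min()
spread_one = bond_lengths(one).max() - bond_lengths(one).min()

print("max coordinate error, all PCs:", err_full)
print("max coordinate error, 1 PC   :", err_one)
print("bond-length spread, original :", spread_orig)
print("bond-length spread, 1 PC     :", spread_one)
```

In this toy case the bond-length spread along a single PC is much larger than
in the input, which is one plausible way to end up with bonds that look
stretched when a single PC is visualized in isolation.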

That example contains projections on the first 16 PCs. Clearly these 16
PCs contain more information than 16 random pixels in the original images.
But it still loses some data in the dimensionality reduction (as it
[almost] must).
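
To put a rough number on that, here is a small sketch that uses retained
variance as a stand-in for "information". The synthetic data set (64
correlated "pixels" generated from 8 hidden factors) is an assumption for
illustration only and is not the scikit-learn faces data. The sum of the k
largest eigenvalues of a covariance matrix is never smaller than the sum of
any k of its diagonal entries, so the first k PCs always keep at least as much
variance as k of the original coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for a set of images: 400 samples, 64 "pixels",
# built from 8 hidden factors so that the pixels are strongly correlated.
n_samples, n_pixels, n_factors, k = 400, 64, 8, 16
factors = rng.normal(size=(n_samples, n_factors))
mixing = rng.normal(size=(n_factors, n_pixels))
data = factors @ mixing + 0.1 * rng.normal(size=(n_samples, n_pixels))

cov = np.cov(data, rowvar=False)
total_var = np.trace(cov)

# Variance kept by the first k PCs = sum of the k largest eigenvalues.
evals = np.linalg.eigvalsh(cov)[::-1]
frac_pcs = evals[:k].sum() / total_var

# Variance kept by k randomly chosen original pixels = sum of their variances
# (the other pixels are simply dropped, not inferred from the kept ones).
picked = rng.choice(n_pixels, size=k, replace=False)
frac_pixels = np.diag(cov)[picked].sum() / total_var

print(f"fraction of variance, first {k} PCs    : {frac_pcs:.3f}")
print(f"fraction of variance, {k} random pixels: {frac_pixels:.3f}")
```

Variance is only a proxy here; it ignores whatever could be inferred about the
dropped pixels from the ones that were kept.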

HTH,
Jason

On Tue, Aug 21, 2018 at 12:57 PM, Robert Molt <rwmolt07.gmail.com> wrote:

> Ah-ha. I had assumed (erroneously, it seems) that the "bond stretch
> variance" (for lack of a better term) could not occur from random
> motion, but must reflect a genuine "mode" of high variance...but could
> not understand a "mode" separating atoms. If I understand properly, such
> a result is a negative result on finding a conformational change? That
> is, if a concerted motion existed such that the positional variance
> could all be lumped into one orthogonal eigenvector cleanly (i.e., like
> a real "mode"), it would have been found. However, since the positional
> variance is clearly spread through several eigenvectors to represent the
> true real-space movements, it is not a single-mode conformational
> change? I think this would rely on a uniqueness theorem for the
> orthogonalization process...I think this is true under the condition
> that one maximizes the first element, but that condition itself is
> arbitrary?
>
> Or is the answer more simplistically: I have learned nothing, since the
> highest positional variance has no direct mapping to any real-space
> movement because of the non-uniqueness of the "vectors"?
>
> I apologize for the constant quotation marks, but the language is
> arduous (for the extent to which a position principal component
> /genuinely/ represents a "mode" of vibration is unclear a priori).
>
>
> On 8/21/18 12:29 PM, Daniel Roe wrote:
> > On Tue, Aug 21, 2018 at 12:10 PM Robert Molt <rwmolt07.gmail.com> wrote:
> >> principal component. My last question is not AMBER-specific, I do not
> >> believe. When I examine the visualization of the "mode" of largest
> >> variance, I see an extreme lengthening of several bonds in VMD (i.e.,
> >> they look like they are "stretching" to an unphysical degree). I am very
> >> confused why this could happen, short of user error. Is it for some
> >> reason "normal" to see that the highest variance in positions between
> >> "bonded" atoms could be large, as in visually stretched a huge degree? I
> >> assume not, and that this is an error somewhere on my part, but I wanted
> >> to check if I have a misunderstanding before I go back to the drawing
> >> board.
> > This is in fact normal, and is a consequence of projecting your
> > original structure into "eigenspace". I will attempt to elaborate but
> > will probably do a poor job, so bear with me.
> >
> > As you allude to, the eigenvectors that you obtain from PCA represent
> > the axes which explain variance in your system (in the case of
> > Cartesian space, it's positional); the eigenvector corresponding to
> > the largest eigenvalue accounts for the largest variance, the next
> > eigenvector the second-largest variance, and so on. However, unless
> > your system is extremely simple, the eigenvectors by themselves do not
> > necessarily explain any "real" motion - any motion undergone by the
> > system is a combination of motion along several eigenvectors. This is
> > why when you isolate just one, things can look strange. You can even
> > think about this in terms of projecting along the "normal" Cartesian
> > XYZ axes. Say you have a bond that lies almost entirely along the X
> > axis. If you project it onto either the Y or Z axis, it would look
> > squished.
> >
> > Hopefully that helps a bit,
> >
> > -Dan
> >
> > PS - Another option for viewing motion along eigenvectors you may find
> > helpful is to use the 'nmwiz' keyword of 'diagmatrix' to generate
> > output which can be visualized with the 'nmwiz' plugin of VMD.
> >
> >>
> >> On 8/21/18 9:39 AM, Daniel Roe wrote:
> >>> Hi,
> >>>
> >>> On Mon, Aug 20, 2018 at 10:54 PM Jason Swails <jason.swails.gmail.com> wrote:
> >>>> You have stripped out residues 1-37 and 319-<total number of residues>.
> >>>> You now have 244 residues left (it seems that your system doesn't have 318
> >>>> full residues? I would have expected 281 left if it did). These are now
> >>>> residues 1-244 since cpptraj always numbers residues from 1.
> >>> Jason's explanation is spot-on (as usual).
> >>>
> >>> Note that in recent versions of CPPTRAJ you can now use the ':;'
> >>> (colon semicolon) token to use original residue numbers (even after
> >>> stripping the system), e.g.
> >>>
> >>> :;38-318&!.H=
> >>>
> >>> This may be more intuitive. Hope this helps,
> >>>
> >>> -Dan
> >>>
> >>> _______________________________________________
> >>> AMBER mailing list
> >>> AMBER.ambermd.org
> >>> http://lists.ambermd.org/mailman/listinfo/amber
> >> --
> >> Dr. Robert Molt Jr.
> >>
> >>
>
> --
> Dr. Robert Molt Jr.
>
>



-- 
Jason M. Swails
Received on Tue Aug 21 2018 - 15:30:03 PDT