Re: [AMBER] PCA: comparing PCs from different trajectories from George Tzotzos on 2015-08-05 (Amber Archive Aug 2015)

From: George Tzotzos <gtzotzos.me.com>
Date: Wed, 05 Aug 2015 20:39:05 +0300

Dan

thank you very much,

I guessed that much and I’ve already applied your suggestion. See my input script below. I throws some errors which I’m also adding at the end of this message. Nevertheless, the script produced a histogram a snapshot of which is attached.

I’d be most grateful for your comments, particularly as I’m a novice in this type of analysis.

Best regards

George
================
INPUT SCRIPT
parm 3n7h_de3_solv.prmtop
trajin combined_100ns.nc # 5 x 20ns trajectories combined
rms first
average av_100ns.nc
createcrd crd1
run
reference av_100ns.nc [ref1]
crdaction crd1 rms ref [ref1] :1-125&!.H=
crdaction crd1 matrix covar :1-125&!.H= name combinedCovar
runanalysis diagmatrix combinedCovar out evecs.dat vecs 20
crdaction crd1 projection T1 modes evecs.dat beg 1 end 20 :1-125&!.H= \
        crdframes 1, 2000 out T1.dat
crdaction crd1 projection T2 modes evecs.dat beg 1 end 20 :1-125&!.H= \
        crdframes 2001, 4000 out T2.dat
crdaction crd1 projection T3 modes evecs.dat beg 1 end 20 :1-125&!.H= \
        crdframes 4001, 6000 out T3.dat
crdaction crd1 projection T4 modes evecs.dat beg 1 end 20 :1-125&!.H= \
        crdframes 6001, 8000 out T4.dat
crdaction crd1 projection T5 modes evecs.dat beg 1 end 20 :1-125&!.H= \
        crdframes 8001, 10000 out T5.dat
kde T1:1 kldiv T2:1 klout KL-PC.agr bins 4000 name PC1-PC2-1
kde T1:2 kldiv T2:2 klout KL-PC.agr bins 4000 name PC1-PC2-2
kde T1:3 kldiv T2:3 klout KL-PC.agr bins 4000 name PC1-PC2-3
kde T1:4 kldiv T2:4 klout KL-PC.agr bins 4000 name PC1-PC2-4
kde T1:5 kldiv T2:5 klout KL-PC.agr bins 4000 name PC1-PC2-5
kde T1:1 out kde-PC.agr bins 400 name KDE1-1
hist T1:1,*,*,*,200 out pca.hist.agr normint name HIST1-1

ERRORS
[crdaction crd1 projection T1 modes evecs.dat beg 1 end 20 :1-125&!.H= crdframes 1, 2000 out T1.dat]
Error: [projection] Not all arguments handled: [ 2000 ]

[crdaction crd1 projection T2 modes evecs.dat beg 1 end 20 :1-125&!.H= crdframes 2001, 4000 out T2.dat]
Error: [projection] Not all arguments handled: [ 4000 ]

[crdaction crd1 projection T3 modes evecs.dat beg 1 end 20 :1-125&!.H= crdframes 4001, 6000 out T3.dat]
Error: [projection] Not all arguments handled: [ 6000 ]

[crdaction crd1 projection T4 modes evecs.dat beg 1 end 20 :1-125&!.H= crdframes 6001, 8000 out T4.dat]
Error: [projection] Not all arguments handled: [ 8000 ]

[crdaction crd1 projection T5 modes evecs.dat beg 1 end 20 :1-125&!.H= crdframes 8001, 10000 out T5.dat]
Error: [projection] Not all arguments handled: [ 10000 ]

WARNINGS
Several of those. Example. May be of diagnostic relevance
Warning: No actions/output trajectories specified.

ANALYSIS: Performing 7 analyses:
  0: [kde T1:1 kldiv T2:1 klout KL-PC.agr bins 4000 name PC1-PC2-1]
        No minimum specified, determining from input data.
        No maximum specified, determining from input data.
        Dim : -23.254726->30.371716, step 0.013407, 4000 bins.
        Determined bandwidth from normal distribution approximation: 2.201004
Warning: Size of Mode1 (10000) != size of Mode1 (8000)
Warning: Only using 8000 data points.

> On 5 Aug 2015, at 20:16, Daniel Roe <daniel.r.roe.gmail.com> wrote:
>
> Hi,
>
> This is by no means the only way, but we've done this kind of analysis
> recently (see http://pubs.acs.org/doi/abs/10.1021/jp4125099). The idea
> is to calculate principal components from a combined trajectory (e.g.
> trajectories T1 + T2 + T3), then calculate projections along those
> principal components separately (i.e. the projections for T1, T2, and
> T3 along the combined PCs). The distributions for the individual
> projections for PCs (particularly the low frequency ones) should
> overlap pretty well, otherwise the trajectories are not converged -
> note that overlap is a necessary but not sufficient condition of
> convergence, ideally you will look at several different properties to
> ascertain overall convergence. We used Kullback-Leibler divergence to
> quantify the overlap of distributions, but that isn't the only way.
> The SI in the given publication has some example scripts for CPPTRAJ
> you should be able to adapt for your use.
>
> Hope this helps,
>
> -Dan
>
> On Wed, Aug 5, 2015 at 2:24 AM, George Tzotzos <gtzotzos.me.com> wrote:
>> I’ve run 5 independent MD simulations of the same system. I’m seeking advice on how to compare the Principal Components derived from each trajectory to assess convergence of sampling.
>>
>> Thank in advance for any guidance and advice
>>
>> Regards
>>
>> George
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>
>
>
> --
> -------------------------
> Daniel R. Roe, PhD
> Department of Medicinal Chemistry
> University of Utah
> 30 South 2000 East, Room 307
> Salt Lake City, UT 84112-5820
> http://home.chpc.utah.edu/~cheatham/
> (801) 587-9652
> (801) 585-6208 (Fax)
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Aug 05 2015 - 11:00:04 PDT