Re: [AMBER] Error in pca analysis from Sruthi Sudhakar on 2021-06-01 (Amber Archive Jun 2021)

From: Sruthi Sudhakar <sruthisudhakarraji.gmail.com>
Date: Wed, 2 Jun 2021 00:52:59 +0530

At startup, cpptraj reports 32 GB of available memory and an estimated
memory usage of 82 GB. I am confused by these memory statistics, since I am
running the job on a disk with more than 3 TB of space. I am not well
versed in these technical details. Could someone please explain how to
overcome this issue in the principal component analysis? I understand that
the analysis has to be separated into 3 phases, but it is not clear to me
how the inputs should be changed. Kindly advise.
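
A rough way to read those two numbers: the 82 GB estimate most likely refers
to the RAM needed to hold every frame in memory at once, which is separate
from the 3 TB of disk space. The Python sketch below is only an illustration.
It assumes each coordinate value takes 4 bytes (single precision), and the
atom count of 27,000 is a hypothetical figure chosen because it lands near
the reported estimate, not a number taken from the actual system.

```python
# Rough RAM estimate for holding trajectory coordinates in memory.
# Assumptions (not from the original post): 4 bytes per coordinate value,
# and a hypothetical system of 27,000 atoms.

BYTES_PER_VALUE = 4  # assumed single-precision storage


def coords_memory_bytes(n_frames, n_atoms, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to keep x, y, z for every atom of every frame in RAM."""
    return n_frames * n_atoms * 3 * bytes_per_value


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_bytes / 1e9


if __name__ == "__main__":
    n_atoms = 27_000      # hypothetical atom count
    available_gb = 32     # available memory reported in the post

    for label, n_frames in [("all 250,000 frames", 250_000),
                            ("every 5th frame", 50_000)]:
        need_gb = to_gb(coords_memory_bytes(n_frames, n_atoms))
        verdict = "fits" if need_gb <= available_gb else "does not fit"
        print(f"{label}: ~{need_gb:.1f} GB, {verdict} in {available_gb} GB of RAM")
```

Under these assumptions, the full trajectory needs far more than 32 GB while
every 5th frame fits, which is consistent with the behaviour described in
this thread. Writing the fit trajectory to disk, as suggested below, avoids
holding the frames in RAM at all.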

On Tue, 1 Jun 2021 at 7:21 PM, Sruthi Sudhakar <sruthisudhakarraji.gmail.com>
wrote:

>
> Thank you for the reply. Since I am doing this for the first time, I
> wanted to know if I am supposed to create 3 separate inputs to run in
> cpptraj to do the methodology you suggested.
>
> Regards,
> Sruthi
>
> On Tue, 1 Jun 2021 at 6:33 PM, Daniel Roe <daniel.r.roe.gmail.com> wrote:
>
>> Hi,
>>
>> You are likely running out of memory. This is why your problems during
>> clustering went away when you reduced the number of input frames by a
>> factor of 5. The solution is to do everything on disk. So instead of
>> loading all coordinates into memory, separate the principal component
>> analysis into three separate phases:
>>
>> 1) Create average coordinates.
>> 2) RMS-fit your trajectory to the average coordinates, calculate the
>> covariance matrix, write out the fit trajectory, diagonalize the
>> matrix and write out the "modes" data (i.e. eigenvectors and
>> eigenvalues).
>> 3) Read in the "modes" data, calculate principal component projections
>> from the fit trajectory, do the Kullback-Leibler divergence analysis.
>>
>> This way, the most memory you need is to store the covariance matrix,
>> modes, and other data of that type. Hope this helps,
>>
>> -Dan
>>
>> PS - Note that there appears to be a small error in the input you
>> posted (in pca.in). The 'nmwiz' keyword should be part of the
>> diagmatrix command, not on a separate line.
>>
>>
>> On Mon, May 31, 2021 at 1:22 PM Sruthi Sudhakar
>> <sruthisudhakarraji.gmail.com> wrote:
>> >
>> > Dear all,
>> >
>> > I have been running a PCA on an accelerated MD trajectory of 500 ns
>> > (250,000 frames). I have attached the input file I used for the study.
>> > The job stops at the createcrd stage; it gets killed at 30%. The same
>> > thing happened during the cluster analysis when reading every frame.
>> > The clustering error was resolved when I changed the input to read
>> > every 5th frame. Since the problem now recurs in the PCA, kindly help
>> > with it.
>> >
>> >
>> > Regards,
>> > Sruthi Sudhakar
>> > _______________________________________________
>> > AMBER mailing list
>> > AMBER.ambermd.org
>> > http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jun 01 2021 - 12:30:03 PDT