Re: [AMBER] possible bug - SIGSEGV in cpptraj called from MMPBSA.py

From: Andrzej Dorobisz via AMBER <amber.ambermd.org>
Date: Fri, 9 Dec 2022 16:10:28 +0100

Dear Dan,
Thank you for investigating this bug. In our core dump we got exactly
the same values you pasted here (775042866, 3158064, ... at the beginning
of the Selected_ vector in the atom mask object).

I hope you will manage to find the cause of this memory corruption.

Andrzej


On 9.12.2022 14:56, Daniel Roe via AMBER wrote:
> OK - so I was able to reproduce the bug, and it does seem like it's a
> memory overwrite issue. I'm running an extensive valgrind memcheck to
> try to pinpoint the exact cause now.
>
> What is happening is that the selected atoms array (which contains the
> indices of each selected atom) in the atom mask in the RMS action is
> being corrupted somehow. Here you can see the first two elements are
> clearly incorrect (it should look like 0, 1, 2, 3...):
>
> (gdb) print tgtMask_.Selected_
> $12 = std::vector of length 9280, capacity 16384 = {775042866,
> 3158064, 2, 3, 4, 5, 6, 7, 8, 9,
>
> There is almost no way this could happen without some sort of memory
> corruption since the routine that sets up the selected array
> (Selected_) looks like this (AtomMask.cpp):
>
>   Selected_.clear();
>   for (int atom = 0; atom != Natom_; atom++) {
>     if (charmask[atom] == maskChar_)
>       Selected_.push_back( atom );
>   }
>
> When subsequent routines try to use this corrupted mask, they hit the
> huge first index, which is far out of range for a 9280-atom system.
> That out-of-range access is what triggers the segfault that actually
> stops execution.
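
The failure mode described above can be illustrated with a small range check. This is only a sketch under stated assumptions: `firstBadIndex` is a hypothetical helper written for this illustration, not part of cpptraj, and it assumes that a valid selected-atom array holds strictly increasing indices in the range [0, natom), as the setup loop quoted above would produce.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of cpptraj: returns the position of the first
// entry that cannot be a valid selected-atom index, or -1 if every entry passes.
// An entry is rejected if it lies outside [0, natom) or is not strictly greater
// than the entry before it (the setup loop appends atoms in ascending order).
long firstBadIndex(const std::vector<int>& selected, int natom) {
  for (std::size_t i = 0; i < selected.size(); ++i) {
    const int atom = selected[i];
    if (atom < 0 || atom >= natom) return static_cast<long>(i);
    if (i > 0 && atom <= selected[i - 1]) return static_cast<long>(i);
  }
  return -1;
}
```

Given the leading values from the gdb output (775042866, 3158064, 2, 3, ...) and natom = 9280, a check like this would flag position 0, whereas an intact array (0, 1, 2, 3, ...) would pass. Such a check only detects the symptom; it does not locate the write that corrupted the array.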
>
> Unfortunately one of the downsides to valgrind being thorough is that
> it is also slow. I've had the run going overnight and nothing has
> triggered yet. I'll keep you up to date with what I find.
>
> -Dan
>
> On Thu, Dec 8, 2022 at 10:17 AM Daniel Roe <daniel.r.roe.gmail.com> wrote:
>> Thanks, I'm downloading it now. I was able to run the given input with
>> cpptraj on the shorter trajectory you provided with no issues;
>> valgrind showed no memory errors. This is starting to feel like an
>> out-of-memory type issue, but I will keep digging.
>>
>> I'm already seeing some areas where quality-of-life improvements can
>> be made to cpptraj (e.g. not every frame needs to be printed to
>> stdout for 'onlyframes', etc.).
>>
>> I'll report when/if I find anything. Thanks for the files.
>>
>> -Dan
>>
>> On Thu, Dec 8, 2022 at 8:51 AM Andrzej Dorobisz via AMBER
>> <amber.ambermd.org> wrote:
>>> Hi,
>>> I just uploaded the input data (22 GB) so you can download and test
>>> cpptraj on it.
>>>
>>> - file E81A.nc
>>> https://s3.cloud.cyfronet.pl/share/amber-cpptraj-issue/E81A.nc?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=71M2J3OGZ6O5J6K1WAFP%2F20221208%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221208T134155Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=d01eba1bf5f637ccd55f1f28cdd0623ded099ad95a226a66108f8bc8cc1eeca9
>>>
>>> - all other files (E81A.top + input-cpptraj.txt)
>>> https://s3.cloud.cyfronet.pl/share/amber-cpptraj-issue/cpptraj-SIGSEGV-files.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=71M2J3OGZ6O5J6K1WAFP%2F20221208%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221208T134129Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=ca2814b488dbe69ed12261ad8b64b057870d155779cba7b147ce9d05af2f7f70
>>>
>>> The error occurs after about 1 hour and 30 minutes.
>>>
>>>
>>> Regards,
>>> Andrzej
>>>
>>>
>>> On 6.12.2022 16:43, Daniel Roe via AMBER wrote:
>>>> Hi,
>>>>
>>>> On Tue, Dec 6, 2022 at 10:21 AM Andrzej Dorobisz
>>>> <andrzej.dorobisz.cyfronet.krakow.pl> wrote:
>>>>> Thank you for the quick reply. I can send the topology file but I don't
>>>>> know how to extract frames from the trajectory file (../input/E81A.nc).
>>>> The input would be something like:
>>>>
>>>> parm E81A.top
>>>> trajin E81A.nc 1 10
>>>> trajout E81A.1-10.nc
>>>>
>>>> -Dan
>>>>
>>>> _______________________________________________
>>>> AMBER mailing list
>>>> AMBER.ambermd.org
>>>> http://lists.ambermd.org/mailman/listinfo/amber


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Dec 09 2022 - 07:30:03 PST