Re: [AMBER] possible bug - SIGSEGV in cpptraj called from MMPBSA.py

From: Daniel Roe via AMBER <amber.ambermd.org>
Date: Fri, 9 Dec 2022 08:56:28 -0500

OK - so I was able to reproduce the bug, and it does seem like it's a
memory overwrite issue. I'm running an extensive valgrind memcheck to
try to pinpoint the exact cause now.

What is happening is that the selected atoms array (which contains the
index of each selected atom) in the atom mask of the RMS action is
being corrupted somehow. Here you can see that the first two elements
are clearly incorrect (the array should begin 0, 1, 2, 3...):

(gdb) print tgtMask_.Selected_
$12 = std::vector of length 9280, capacity 16384 = {775042866,
3158064, 2, 3, 4, 5, 6, 7, 8, 9,
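
One way to look at corrupted values like these is to read them as raw
bytes rather than as integers. The sketch below is only an
illustration: bytesAsText is a hypothetical helper, not part of
cpptraj, and it assumes 32-bit ints on a little-endian machine. It
lists the bytes of each value, least significant first, which is the
order they would occupy in memory on such a machine.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not cpptraj code). Returns the bytes of each 32-bit
// value, least significant byte first, i.e. the in-memory order on a
// little-endian machine. A zero byte is written as the two characters "\0",
// any other non-printable byte as '?'.
std::string bytesAsText(const std::vector<std::uint32_t>& values) {
  std::string out;
  for (std::uint32_t v : values) {
    for (int shift = 0; shift < 32; shift += 8) {
      unsigned char byte = static_cast<unsigned char>((v >> shift) & 0xFFu);
      if (byte == 0)
        out += "\\0";
      else if (byte >= 0x20 && byte < 0x7F)
        out += static_cast<char>(byte);
      else
        out += '?';
    }
  }
  return out;
}
```

Called with the two bad elements, 775042866 and 3158064, this returns
272.000\0, i.e. the eight bytes are printable ASCII followed by a
terminating zero. That would be consistent with text having been
written over the start of the array, but it is only an observation
about the byte pattern, not a diagnosis.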

There is almost no way this could happen without some sort of memory
corruption since the routine that sets up the selected array
(Selected_) looks like this (AtomMask.cpp):

  Selected_.clear();
  for (int atom = 0; atom != Natom_; atom++) {
    if (charmask[atom] == maskChar_)
      Selected_.push_back( atom );
  }
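
To make the point concrete, here is a stand-alone sketch of the same
loop. buildSelected is a hypothetical free function that mirrors the
snippet above and is not the cpptraj source; the character 'T' used in
the example is likewise just an assumption for illustration. Every
value pushed comes from the loop counter, so the output can only
contain indices between 0 and the number of atoms minus one, in
increasing order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-alone version of the mask setup loop (not cpptraj code).
// charmask holds one character per atom; atoms whose character equals
// maskChar are selected.
std::vector<int> buildSelected(const std::vector<char>& charmask, char maskChar) {
  std::vector<int> selected;
  const int natom = static_cast<int>(charmask.size());
  for (int atom = 0; atom != natom; atom++) {
    if (charmask[atom] == maskChar)
      selected.push_back(atom);  // only ever stores the loop counter
  }
  return selected;
}
```

For a 9280 atom system with every atom selected, e.g.
buildSelected(std::vector<char>(9280, 'T'), 'T'), the result has 9280
elements starting 0, 1, 2, 3 and ending at 9279.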

When subsequent routines try to use this corrupted mask, they hit the
huge first index, which is far out of range for a 9280 atom system.
That out-of-range access is what triggers the segfault that actually
stops execution.
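
As a rough illustration of why the bad index is fatal, the sketch
below assumes coordinates are stored as a flat x, y, z array of
3 * natom doubles (an assumption made for the example, as is the
hypothetical function sumSelectedX). An unchecked read at index
775042866 * 3 would land far outside that array; using
std::vector::at turns the same access into a std::out_of_range
exception instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical example (not cpptraj code): sum the x coordinate of each
// selected atom from a flat xyz array laid out as x0,y0,z0,x1,y1,z1,...
double sumSelectedX(const std::vector<double>& xyz,
                    const std::vector<int>& selected) {
  double sum = 0.0;
  for (int idx : selected) {
    // at() is bounds checked: an out-of-range offset throws
    // std::out_of_range rather than reading arbitrary memory.
    sum += xyz.at(static_cast<std::size_t>(idx) * 3);
  }
  return sum;
}
```

With a valid mask such as {0, 1, 2} the function returns normally.
With the corrupted mask {775042866, 3158064, 2} it throws on the first
element, which is the checked equivalent of the crash described above.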

Unfortunately one of the downsides to valgrind being thorough is that
it is also slow. I've had the run going overnight and nothing has
triggered yet. I'll keep you up to date with what I find.

-Dan

On Thu, Dec 8, 2022 at 10:17 AM Daniel Roe <daniel.r.roe.gmail.com> wrote:
>
> Thanks, I'm downloading it now. I was able to run the given input with
> cpptraj on the shorter trajectory you provided with no issues;
> valgrind showed no memory errors. This is starting to feel like an
> out-of-memory type issue, but I will keep digging.
>
> I'm already seeing some areas where quality of life improvements can
> be made to cpptraj (e.g. every frame does not need to be printed to
> stdout for 'onlyframes' etc).
>
> I'll report when/if I find anything. Thanks for the files.
>
> -Dan
>
> On Thu, Dec 8, 2022 at 8:51 AM Andrzej Dorobisz via AMBER
> <amber.ambermd.org> wrote:
> >
> > Hi,
> > I just uploaded the input data (22 GB) so you can download and test
> > cpptraj on it.
> >
> > - file E81A.nc
> > https://s3.cloud.cyfronet.pl/share/amber-cpptraj-issue/E81A.nc?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=71M2J3OGZ6O5J6K1WAFP%2F20221208%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221208T134155Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=d01eba1bf5f637ccd55f1f28cdd0623ded099ad95a226a66108f8bc8cc1eeca9
> >
> > - all other files (E81A.top + input-cpptraj.txt)
> > https://s3.cloud.cyfronet.pl/share/amber-cpptraj-issue/cpptraj-SIGSEGV-files.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=71M2J3OGZ6O5J6K1WAFP%2F20221208%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221208T134129Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=ca2814b488dbe69ed12261ad8b64b057870d155779cba7b147ce9d05af2f7f70
> >
> > The error occurs after about 1 hour and 30 minutes.
> >
> >
> > Regards,
> > Andrzej
> >
> >
> > On 6.12.2022 16:43, Daniel Roe via AMBER wrote:
> > > Hi,
> > >
> > > On Tue, Dec 6, 2022 at 10:21 AM Andrzej Dorobisz
> > > <andrzej.dorobisz.cyfronet.krakow.pl> wrote:
> > >> Thank you for the quick reply. I can send the topology file but I don't
> > >> know how to extract frames from the trajectory file (../input/E81A.nc).
> > > The input would be something like:
> > >
> > > parm E81A.top
> > > trajin E81A.nc 1 10
> > > trajout E81A.1-10.nc
> > >
> > > -Dan
> > >
> > > _______________________________________________
> > > AMBER mailing list
> > > AMBER.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri Dec 09 2022 - 06:00:03 PST