Re: [AMBER] cpptraj error with Test_SPAM from Daniel Roe via AMBER on 2024-03-15 (Amber Archive Mar 2024)

From: Daniel Roe via AMBER <amber.ambermd.org>
Date: Fri, 15 Mar 2024 08:36:00 -0400

Hi,

Thanks for the report. I'm trying to get access to an ARM machine so I
can try to look at this myself.

Thanks also for your suggested fixes. Protecting FloatWidth() is a
good idea, but I'm also going to try to figure out why 0.0 is being
passed in as a width at all; that probably should be caught somewhere
in the SPAM action.
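
For reference, a minimal sketch of what such a guard might look like. This is not the actual cpptraj code: the name SafeFloatWidth and the fallback width of 1 are assumptions made for illustration. The point is only that log10(0.0) is -infinity, and converting an infinite (or NaN) double to int is undefined behavior in C++, so the input has to be checked before the cast.

```cpp
#include <cmath>

// Hypothetical guarded variant of a FloatWidth-style helper.
// Not the actual cpptraj implementation; the fallback value of 1 is an
// assumption chosen only so that the caller gets a small, sane width.
int SafeFloatWidth(double floatIn) {
  double mag = std::fabs(floatIn);
  // Reject zero, NaN and infinity: log10() of these is -inf, NaN or +inf,
  // none of which can be converted to int with defined behavior.
  if (!(mag > 0.0) || !std::isfinite(mag)) return 1;
  // For any finite, non-zero double the exponent is at most a few hundred,
  // so the cast below is always in range.
  double float_exponent = std::fabs(std::log10(mag)) + 1.0;
  return static_cast<int>(float_exponent); // cast truncates toward zero
}
```

With this sketch, SafeFloatWidth(0.0) returns 1 on every platform instead of a platform-dependent value, and SafeFloatWidth(100.0) returns 3.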

I've started an Issue on GitHub to keep track of this:
https://github.com/Amber-MD/cpptraj/issues/1070

-Dan

On Fri, Mar 15, 2024 at 2:14 AM Hirosuke Hotta (Fujitsu) via AMBER
<amber.ambermd.org> wrote:
>
> Dear cpptraj developers,
>
> This concerns a cpptraj error in the test "Test_SPAM".
> We found that Test_SPAM (RunTest.sh contains two tests; I am referring to the first one) terminated abnormally with a segmentation fault on the ARM version of RHEL8.8. This was not seen on the ARM versions of RHEL8.7 or older.
> We also found that similar symptoms have been observed on several Debian Linux architectures (arm64, riscv64, sparc64, and so on; see https://tracker.debian.org/pkg/cpptraj).
>
> After investigation, we found that the segmentation fault occurred at "vsprintf(linebuffer_,format,args);" in the function "CpptrajFile::Printf" (CpptrajFile.cpp).
> In that function,
> - The value of "format" was "%8.2147483647f" on the ARM Linux system, versus "%8.3f" on the Intel Linux system
> - The size of "linebuffer_" is, we think, 1024 or so, which is much smaller than 2147483647
> - On some ARM systems (like RHEL8.7), Test_SPAM runs, BUT summary.dat looks anomalous, i.e., we don't see any numbers in the "#Peak" column
> - On other ARM systems (like RHEL8.8), Test_SPAM fails with the segmentation fault.
> - As far as we know, this problem may occur with AmberTools 20, 21, 22 and 23.
>
> We also found the fundamental causes, as follows. In the function "FloatWidth" (StringRoutines.cpp),
> - the argument to log10() is not checked in the code, and unfortunately 0.0 is passed as the argument, and
> - the return value "(int)float_exponent" (= (int)(fabs(log10(0.0))+1)) differs between ARM and Intel:
>     on ARM: 2147483647 (max value of signed int)
>     on Intel: -2147483648 (min value of signed int)
>
> For our proposed modification and the details of what happens, please see the attached file.
>
> Regards,
> Hiro
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber

Received on Fri Mar 15 2024 - 06:00:02 PDT