Hi Zhen,
thank you so much for the investigation.
As for your 1st option - I'm unable to upgrade to Amber24 myself. I can
suggest it to the IT support of the HPC cluster that I'm using for the
simulations, but I'm not sure if it's possible.
For the 2nd option - I checked the output file and the cluster itself. I'm
using Amber22 on an A100 GPU (NVIDIA A100-SXM-64GB), but with a slightly
older CUDA version, 12.2, and it fails again.
Also, the error message I'm getting is different from yours; it's the
following:
Backtrace for this error:
#0  0x1517ea44ab4f in ???
#1  0x60e86d in gti_lj1264_nb_setup_
    at /dev/shm/propro01/spack-stage-amber-22-cz7v3y4nrcoxnjsgdwukvsexakhx2k5k/spack-src/src/pmemd/src/cuda/gti_f95.cpp:555
#2  0x49b92b in __extra_pnts_nb14_mod_MOD_nb14_setup
    at /dev/shm/propro01/spack-stage-amber-22-cz7v3y4nrcoxnjsgdwukvsexakhx2k5k/spack-src/src/pmemd/src/extra_pnts_nb14.F90:540
#3  0x505e30 in __pme_alltasks_setup_mod_MOD_pme_alltasks_setup
    at /dev/shm/propro01/spack-stage-amber-22-cz7v3y4nrcoxnjsgdwukvsexakhx2k5k/spack-src/src/pmemd/src/pme_alltasks_setup.F90:251
#4  0x4e23f9 in pmemd
    at /dev/shm/propro01/spack-stage-amber-22-cz7v3y4nrcoxnjsgdwukvsexakhx2k5k/spack-src/src/pmemd/src/pmemd.F90:518
#5  0x411fbc in main
    at /dev/shm/propro01/spack-stage-amber-22-cz7v3y4nrcoxnjsgdwukvsexakhx2k5k/spack-src/src/pmemd/src/pmemd.F90:77
/var/spool/slurmd/job18638520/slurm_script: line 30: 2685272 Segmentation fault $AMBERHOME/bin/pmemd.cuda -O -i relax03.in -p $TOP -c $CRD -ref $CRD -o ${NAME}_relax03.out -r ${NAME}_relax03.rst7 -x ${NAME}_relax03.nc
Best regards,
Karolina
On Thu, Aug 7, 2025 at 18:43 Li, Zhen <lizhen6.chemistry.msu.edu> wrote:
> Hi Karolina,
>
> Thank you for waiting. After thoroughly reviewing the code, I have
> identified several ways to avoid the segfault from my end. Hopefully, it
> will work for you.
>
>
> 1. Upgrade to AMBER24 if possible. There was an update to optimize the
> GPU memory allocation for the 1264 code. The old code defines the
> allocation factor by only checking whether the architecture is Pascal,
> Volta or Ampere, but now there are new GPU architectures like Hopper, etc.
> (Please see the source code below)
> 2. Try to stick to K80, V100, and A100 GPUs. I have tested the 1264
> code on those devices with CUDA 12.3, and it is working fine on my end.
>
>
> If it still does not work, please pass the error message to me. Does it
> look like "of length = 42Failed an illegal memory access was
> encountered", or like a segfault with several rows of addresses printed?
>
> Thank you again.
> Zhen
>
>
> //---------------------------------------------------------------------------------------------
> // ik_Build1264NBList:
> //
> // Arguments:
> //   gpu:  overarching data structure containing simulation information, here used
> //         for stream directions and kernel launch parameters
> //---------------------------------------------------------------------------------------------
> void ik_Build1264NBList(gpuContext gpu)
> {
>   if (gpu->sim.numberLJ1264Atoms == 0) {
>     return;
>   }
>   int nterms = gpu->sim.atoms / GRID;
>
>   unsigned threadsPerBlock = (isDPFP) ? 128 : ((gpu->major < 6) ? 768 : 256);
>   unsigned factor = 1;
>   unsigned blocksToUse = (isDPFP) ? gpu->blocks : min((nterms / threadsPerBlock) + 1,
>                                                       gpu->blocks * factor);
>
>   kgBuildSpecial2RestNBPreList_kernel<<<blocksToUse, threadsPerBlock, 0,
>                                         gpu->mainStream>>>(GTI_NB::LJ1264);
>   LAUNCHERROR("kgBuildSpecial2RestNBPreList");
>
>   nterms = gpu->sim.atoms;
>
>   threadsPerBlock = (isDPFP) ? 128 : 512;  // Tuned w/ M2000M
>   threadsPerBlock = min(threadsPerBlock, MAX_THREADS_PER_BLOCK);
>   factor = (PASCAL || VOLTA || AMPERE) ? 1 : 2;
>   blocksToUse = (isDPFP) ? gpu->blocks : min((nterms / threadsPerBlock) + 1,
>                                              gpu->blocks * factor);
>
>   kgBuildSpecial2RestNBList_kernel<<<blocksToUse, threadsPerBlock,
>                                      0, gpu->mainStream>>>(GTI_NB::LJ1264);
>   LAUNCHERROR("kgBuildSpecial2RestNBList");
>
>   nterms = gpu->sim.numberLJ1264Atoms * 400;
>   threadsPerBlock = (isDPFP) ? 128 : ((PASCAL || VOLTA || AMPERE) ? 64 : 1024);
>   factor = (PASCAL || VOLTA || AMPERE) ? 4 : 1;
>   blocksToUse = (isDPFP) ? gpu->blocks : min((nterms / threadsPerBlock) + 1,
>                                              gpu->blocks * factor);
>
>   kg1264NBListFillAttribute_kernel<<<blocksToUse, threadsPerBlock,
>                                      0, gpu->mainStream>>>();
>   LAUNCHERROR("kgBuild1264NBListFillAttribute");
> }
>
>
>
> _____________________
>
> Zhen Li <http://lizhen62017.wixsite.com/home>, Ph.D.,
>
> The Merz Research Group <http://merzgroup.org>,
>
> Michigan State University,
>
> Cleveland Clinic.
> ------------------------------
> *From:* Karolina Mitusińska (Markowska) <markowska.kar.gmail.com>
> *Sent:* Sunday, August 3, 2025 5:58 AM
> *To:* Li, Zhen <lizhen6.chemistry.msu.edu>; David A Case <
> dacase1.gmail.com>
> *Cc:* AMBER Mailing List <amber.ambermd.org>
> *Subject:* Re: [AMBER] Segmentation fault when running on GPU, working
> fine on CPU
>
> Dear prof. Case and Zhen,
>
> thank you for your hints.
> I tried to run minimization using sander.MPI, but with the same result
> - it goes well at first (I ran 2500 steps of sander.MPI minimization), but
> when I switch to pmemd.cuda, it crashes again with a segmentation fault.
> And of course with the non-1264 prmtop everything runs fine.
> So now I believe it must be related to the 12-6-4 params. But why? I'm
> definitely using Amber 22, I checked that again.
>
> I'm using a modified set of LJ parameters that I got for tests, and
> therefore I don't want to paste them here on the list, but will share them
> with the developers if needed. I managed to run them on CPU starting from
> minimization up to heating to 300 K, but now I would like to switch to GPU
> and it's impossible because of the seg fault. Are there any more general
> hints that I could use to try and run the simulations on GPU?
>
> Best,
> Karolina
>
> On Sun, Aug 3, 2025 at 00:51 Li, Zhen <lizhen6.chemistry.msu.edu> wrote:
>
> Hi Karolina,
>
> Dr. Case pointed out a very helpful way of debugging it. Could you
> double-check whether your AMBER version is 22 or 20? There is a known bug
> in AMBER20 GPU 1264 (see the red paragraph here
> <https://ambermd.org/tutorials/advanced/tutorial20/m1264.php>),
> where applying C4 to the last atom type results in a segfault because the
> code fails to update atom type indexing from [1, 2, 3,...] to [0, 1,
> 2,...]. It was later patched in AMBER22.
>
> Hope the debugging goes well. It would also help us developers if you
> could provide the printljmatrix output, as shown in both the 1264 and
> m1264 tutorials.
> Thank you very much!
>
> Best regards,
> Zhen.
>
> _____________________
>
> Zhen Li <http://lizhen62017.wixsite.com/home>, Ph.D.,
>
> The Merz Research Group <http://merzgroup.org>,
>
> Michigan State University,
>
> Cleveland Clinic.
> ------------------------------
> *From:* David A Case via AMBER <amber.ambermd.org>
> *Sent:* Saturday, August 2, 2025 5:26 PM
> *To:* Karolina Mitusińska (Markowska) <markowska.kar.gmail.com>; AMBER
> Mailing List <amber.ambermd.org>
> *Subject:* Re: [AMBER] Segmentation fault when running on GPU, working
> fine on CPU
>
> On Sat, Aug 02, 2025, Karolina Mitusińska (Markowska) via AMBER wrote:
> >
> >I'm facing an interesting issue with Amber22.
> >I want to use the 12-6-4 LJ parameters for my system, using the following
> >tutorial:
> https://ambermd.org/tutorials/advanced/tutorial20/12_6_4.php
> >I prepared the system using the frcmod.ions234lm_1264_tip3p for my system
> >solvated in TIP3P water model. I generated the .inpcrd and .prmtop files
> >without any errors using tLeaP.
> >Then I used parmed to add the C coefficient parameters to the system.
> >Parmed did not report any issues with the files.
> >
> >But when I tried to run minimization on the system, I'm seeing a
> >segmentation fault error and the output of the minimization ends at the
> >following line:
>
> We need to first figure out if the seg fault has anything to do with
> 12-6-4.
> Run a few steps of minimization with sander.MPI, say 25 steps with ntpr=1
> and ntmin=3.
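>
> [Editor's note: a debugging run along these lines might use an input file
> like the following. This is an illustrative sketch, not dac's exact input;
> settings beyond imin, maxcyc, ntpr, and ntmin are left at their defaults
> and should be adapted to your system.]

```
Short test minimization (25 steps, print every step)
 &cntrl
   imin=1, maxcyc=25, ntpr=1, ntmin=3,
 /
```

Run it with sander.MPI in place of pmemd.cuda, reusing the same topology and coordinate files from the failing job, and check whether the per-step energies in the output look sane before switching back to the GPU code.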
>
> Don't worry that it is slow: you generally only need to do a few hundred
> steps of minimization with sander. If you are lucky, you can then go back
> to pmemd.cuda and continue with more minimization or with MD.
>
> There are many strange failures that can happen with minimization with
> pmemd.cuda, which is why I am suggesting this. Of course, if the sander
> run also fails, there may be a 12-6-4-specific problem. But it might give
> you better error messages.
>
> ...good luck...dac
>
>
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
>
> http://lists.ambermd.org/mailman/listinfo/amber
>
>
Received on Fri Aug 08 2025 - 03:00:03 PDT