Re: [AMBER] about "...gti_controlVariable.i for GaMD-PPI ..."

From: Patricio Barletta via AMBER <amber.ambermd.org>
Date: Tue, 7 Oct 2025 01:33:13 +0000

Yeah, those limits are too high then.

I glanced over the kernels that, I assume, are the ones giving you issues:

kgBuildSpecial2RestNBPreList_kernel(specialType type)
kgBuildSpecial2RestNBList_kernel(specialType type)
kgBuildTI2TINBList_kernel()
kgBuildTI2TINBList_gamd2_kernel()

If you really need such a high atom count, then the easiest solution is to do away with the shared memory usage and use global memory:

E.g., in kgBuildSpecial2RestNBPreList_kernel():
```
  __shared__ volatile float coordTI[gti_simulationConst::MaxNumberTIAtom][3];  // REMOVE THIS
...
  while (th < listSize) {                                                      // REMOVE THIS WHOLE LOOP
    unsigned iatom = cSim.pImageAtomLookup[pList[th]];
    coordTI[th][0] = cSim.pImageX[iatom];
    coordTI[th][1] = cSim.pImageY[iatom];
    coordTI[th][2] = cSim.pImageZ[iatom];
    th += blockDim.x;
  }
...
        PMEFloat dx = coordTI[i][0] - x0;                                      // REPLACE WITH cSim.pImage(X,Y,Z)[iatom] (see sketch below)
        PMEFloat dy = coordTI[i][1] - y0;
        PMEFloat dz = coordTI[i][2] - z0;
```
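For concreteness, a rough sketch of what the replaced distance lines could look like. I'm assuming the inner loop index `i` walks `pList` the same way it indexed `coordTI` (I haven't traced the full loop, so treat `jatom` as a hypothetical local):

```
        // Sketch: resolve the image index per access instead of staging
        // coordinates in shared memory. "jatom" is a hypothetical local.
        unsigned jatom = cSim.pImageAtomLookup[pList[i]];
        PMEFloat dx = cSim.pImageX[jatom] - x0;
        PMEFloat dy = cSim.pImageY[jatom] - y0;
        PMEFloat dz = cSim.pImageZ[jatom] - z0;
```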

The performance hit on each kernel will be massive, but I have no idea how the overall performance will be impacted. I've never run these calculations.

The proper fix would be to determine the available shared memory for the target architecture and issue these kernels serially, in shared-memory-sized chunks, until all atoms are considered. Are you in contact with the GaMD developers? Perhaps this is something they'd be willing to collaborate on. I don't know them personally, but my PI (Darrin York) surely does.
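To illustrate the chunked idea, here is an in-kernel variant that bounds the buffer to a fixed size instead of MaxNumberTIAtom. TILE is a hypothetical chunk size, the pair-processing body is elided, and the names follow the snippet above:

```
  // Rough in-kernel sketch of the chunked approach (an alternative to serial
  // kernel launches): stage the TI coordinates through a fixed-size shared
  // buffer, one chunk at a time. TILE is an assumed chunk size picked to fit
  // the per-block shared-memory budget of the target architecture.
  constexpr int TILE = 1024;
  __shared__ volatile float coordTI[TILE][3];

  for (unsigned base = 0; base < listSize; base += TILE) {
    unsigned chunk = min(listSize - base, (unsigned)TILE);
    for (unsigned th = threadIdx.x; th < chunk; th += blockDim.x) {
      unsigned iatom = cSim.pImageAtomLookup[pList[base + th]];
      coordTI[th][0] = cSim.pImageX[iatom];
      coordTI[th][1] = cSim.pImageY[iatom];
      coordTI[th][2] = cSim.pImageZ[iatom];
    }
    __syncthreads();   // buffer is ready for this chunk
    // ... run the existing pair loop against coordTI[0..chunk) ...
    __syncthreads();   // everyone done reading before the next refill
  }
```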

Sorry I can't be of much help. I work on the alchemical side of things, so I try not to touch other features.

Best of luck.

________________________________
From: Christian Seitz <cgseitz.uchicago.edu>
Sent: Monday, October 6, 2025 6:41 PM
To: Patricio Barletta <pb777.iqb.rutgers.edu>; Carmen Al Masri Said via AMBER <amber.ambermd.org>
Subject: Re: about "...gti_controlVariable.i for GaMD-PPI ..."

Hi Patricio,

Many thanks for the suggestions! I am using NVIDIA A100s and have CUDA 11.8 through 12.6 available. I used a fresh tarball of Amber24 and edited cmake/CudaConfig.cmake from

elseif((${CUDA_VERSION} VERSION_GREATER_EQUAL 11.8) AND (${CUDA_VERSION} VERSION_LESS 12.7))
                        message(STATUS "Configuring for SM5.0, SM5.2, SM5.3, SM6.0, SM6.1, SM7.0, SM7.5, SM8.0, SM8.6, and SM9.0")
                        list(APPEND CUDA_NVCC_FLAGS ${SM50FLAGS} ${SM52FLAGS} ${SM53FLAGS} ${SM60FLAGS} ${SM61FLAGS} ${SM70FLAGS} ${SM75FLAGS} ${SM80FLAGS} ${SM86FLAGS} ${SM90FLAGS} -Wno-deprecated-gpu-targets -Wno-deprecated-declarations)

to

elseif((${CUDA_VERSION} VERSION_GREATER_EQUAL 11.8) AND (${CUDA_VERSION} VERSION_LESS 12.7))
                        message(STATUS "Configuring for SM5.0, SM5.2, SM5.3, SM6.0, SM6.1, SM7.0, SM7.5, SM8.0, SM8.6, and SM9.0")
                        list(APPEND CUDA_NVCC_FLAGS ${SM80FLAGS} -Wno-deprecated-gpu-targets -Wno-deprecated-declarations)

After running "make install" I see the same errors as before. I can send the full terminal output if necessary, but thankfully I do not see any errors related to global constant data - only to shared data. Let me know if I'm misunderstanding your suggestions, or if you have any other ideas. Thank you!

Best,
Christian

Christian Seitz
PhD, Dept. of Computer Science | UChicago
cgseitz.uchicago.edu
www.linkedin.com/in/christianseitz21
________________________________
From: Patricio Barletta <pb777.iqb.rutgers.edu>
Sent: Wednesday, October 1, 2025 12:25
To: Christian Seitz <cgseitz.uchicago.edu>
Subject: about "...gti_controlVariable.i for GaMD-PPI ..."

I only see errors in shared memory allocations, so that's good. You can also run into issues with the constant memory when doing these edits.

There's no way around the limit on constant memory, so if you also got errors like "... uses too much global constant data ...", you'll have to lower those atom counts, or zero out the sizes of variables belonging to features you don't use (e.g., MaxNumberRMSDAtom).
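Purely as an illustration of that kind of edit (the actual declarations in gti_controlVariable.i may be laid out differently), shrinking an unused feature's limit frees constant memory for the one you need:

```
// Hypothetical illustration only: if the limits are compile-time constants
// that size arrays in the constant-memory simulation struct, zeroing a
// feature you don't use makes room for a larger TI atom count.
static const unsigned MaxNumberRMSDAtom = 0;      // feature not used: zero it out
static const unsigned MaxNumberTIAtom   = 12288;  // example raised limit, not a recommendation
```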

About the shared memory: maybe your GPU can handle the increased usage, but since Amber is being built for multiple GPU targets, your compilation is breaking unnecessarily.

Check your CUDA version and your GPU's compute capability (https://developer.nvidia.com/cuda-gpus), and edit cmake/CudaConfig.cmake.

Say you have CUDA 12 and an RTX A4500; you would go to:

```
            elseif((${CUDA_VERSION} VERSION_GREATER_EQUAL 11.8) AND (${CUDA_VERSION} VERSION_LESS 12.7))
                  message(STATUS "Configuring for SM5.0, SM5.2, SM5.3, SM6.0, SM6.1, SM7.0, SM7.5, SM8.0, SM8.6, and SM9.0")
                  list(APPEND CUDA_NVCC_FLAGS ${SM50FLAGS} ${SM52FLAGS} ${SM53FLAGS} ${SM60FLAGS} ${SM61FLAGS} ${SM70FLAGS} ${SM75FLAGS} ${SM80FLAGS} ${SM86FLAGS} ${SM90FLAGS} -Wno-deprecated-gpu-targets -Wno-deprecated-declarations)

```
And delete all of those ${SM*FLAGS} entries, leaving only ${SM86FLAGS} (the RTX A4500 is compute capability 8.6).

Let me know how it goes.
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Oct 06 2025 - 19:00:02 PDT