On Thu, Jul 24, 2025, Umut Çağan Uçar via AMBER wrote:
>
>I realized that Amber24 uses more memory than Amber16. For instance,
>below are the memory usages of the same system with Amber16 and
>Amber24 in conventional MD simulations:
>
>Amber16:
>| KB of GPU memory in use: 557017
>
>Amber24:
>| KB of GPU memory in use: 601960
>
>This difference does not matter with conventional MD. However, with
>replica exchange molecular dynamics we run out of memory. We have some
>REMD jobs (with 40 replicas) that run on GPUs without problem using
>Amber16. However, we cannot run the same jobs (or restart them) using
>Amber24. The Amber output file ends right after the Ewald parameters
>are written, without any error message, while the slurm output ends
>with a message like "cudaMalloc Failed out of memory". The same
>systems run successfully if we discard several replicas (for example,
>REMD with 30 replicas runs successfully).
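The numbers quoted above already suggest the scale of the regression. A quick back-of-the-envelope check (a sketch only; it assumes the per-replica allocations simply add up on a card, which depends on how the replicas are distributed across GPUs in the REMD run):

```python
# Per-replica GPU memory reported by pmemd.cuda (KB), from the outputs above.
amber16_kb = 557017
amber24_kb = 601960

extra_per_replica_kb = amber24_kb - amber16_kb  # ~44 MB more per replica
replicas = 40

# If several replicas share one GPU, the extra usage accumulates,
# which is consistent with 40 replicas failing while 30 still fit.
extra_total_mb = extra_per_replica_kb * replicas / 1024
print(f"Extra per replica: {extra_per_replica_kb} KB")
print(f"Extra across {replicas} replicas: {extra_total_mb:.0f} MB")
```

So the overhead is roughly 1.7 GB across 40 replicas if they all land on the same device, which could plausibly push a run that was near the memory limit under Amber16 over the edge.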
I think you have correctly identified the problem. New features keep
getting added to the GPU code, with increases in memory use that have
not been considered as carefully as they might have been. Experts on the
list may have some ideas here, but I'm guessing that there is no simple fix.
Thanks for the report...regards....dac
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jul 24 2025 - 08:00:03 PDT