Hi ZhiWei,
I tracked this down to modifications made in the GTI code; it's recorded in the Amber bug tracker.
It looks like three different routines are responsible, listed here from worst offender to least:
1) gti_update_md_ene
2) gpu_pressure_scale
3) gpu_neighbor_list_setup
Here's the relevant valgrind output:
==452135== 94,504 (40 direct, 94,464 indirect) bytes in 1 blocks are definitely lost in loss record 2,038 of 2,120
==452135== at 0x81B08C3: operator new(unsigned long) (vg_replace_malloc.c:422)
==452135== by 0x74AEA2: gpu_neighbor_list_setup_ (gpu.cpp:6700)
==452135== by 0x625A95: __pme_setup_mod_MOD_final_pme_setup (pme_setup.F90:112)
==452135== by 0x5F670C: MAIN__ (pmemd.F90:625)
==452135== by 0x5F7964: main (pmemd.F90:77)
==452135==
==452135== 159,960 bytes in 19,995 blocks are definitely lost in loss record 2,050 of 2,120
==452135== at 0x81B004F: malloc (vg_replace_malloc.c:380)
==452135== by 0x81B50F2: realloc (vg_replace_malloc.c:1437)
==452135== by 0x27545232: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==452135== by 0x2754E0A0: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==452135== by 0x2754E360: __cxa_demangle (in /usr/lib64/libstdc++.so.6.0.25)
==452135== by 0x765D6E: GpuBuffer<NTPData>::Upload(NTPData*) (gpuBuffer.h:219)
==452135== by 0x758EFB: gpu_pressure_scale_ (gpu.cpp:9920)
==452135== by 0x5A1C5D: __runmd_mod_MOD_runmd (runmd.F90:2883)
==452135== by 0x5F705C: MAIN__ (pmemd.F90:863)
==452135== by 0x5F7964: main (pmemd.F90:77)
==452135==
==452135== 634,912 bytes in 19,841 blocks are definitely lost in loss record 2,104 of 2,120
==452135== at 0x81B004F: malloc (vg_replace_malloc.c:380)
==452135== by 0x81B50F2: realloc (vg_replace_malloc.c:1437)
==452135== by 0x27545232: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==452135== by 0x2754E0A0: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==452135== by 0x2754E360: __cxa_demangle (in /usr/lib64/libstdc++.so.6.0.25)
==452135== by 0x765FC2: GpuBuffer<unsigned long long>::Download(unsigned long long*) (gpuBuffer.h:253)
==452135== by 0x778DBB: icc_GetEnergyFromGPU(gti_gpuContext*, double*) (gti_gpu.cpp:69)
==452135== by 0x781F10: gti_update_md_ene_ (gti_f95.cpp:1630)
==452135== by 0x568075: __pme_force_mod_MOD_pme_force (pme_force.F90:3780)
==452135== by 0x59F714: __runmd_mod_MOD_runmd (runmd.F90:1592)
==452135== by 0x5F705C: MAIN__ (pmemd.F90:863)
==452135== by 0x5F7964: main (pmemd.F90:77)
==452135==
==452135== LEAK SUMMARY:
==452135== definitely lost: 806,176 bytes in 40,901 blocks
==452135== indirectly lost: 94,464 bytes in 1 blocks
==452135== possibly lost: 34,784 bytes in 283 blocks
==452135== still reachable: 99,412,077 bytes in 51,063 blocks
==452135== suppressed: 0 bytes in 0 blocks
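The block counts hint at the pattern: the gpu_pressure_scale_ and gti_update_md_ene_ records lose roughly one small block per call (19,995 and 19,841 blocks over the run), which looks like a host-side staging allocation made every MD step and never freed, while gpu_neighbor_list_setup_ loses a single block once at setup. Purely as an illustration - this is NOT the actual Amber GpuBuffer code, and the struct layout, names and signatures below are made up - the per-step pattern and the obvious shape of a fix look something like this:

// Illustrative sketch only - not the real GpuBuffer<T>::Upload/Download.
#include <cstdlib>
#include <cstring>
#include <vector>

struct NTPData { double data[128]; };   // stand-in for the real NTPData

// Leaky pattern: a host staging buffer is malloc'd on every call
// (i.e. every MD step) and never freed, so small "definitely lost"
// blocks pile up step by step, as in the records above.
void upload_leaky(const NTPData* src, std::size_t n)
{
    NTPData* staging =
        static_cast<NTPData*>(std::malloc(n * sizeof(NTPData)));
    std::memcpy(staging, src, n * sizeof(NTPData));
    // ... the device copy (e.g. cudaMemcpy) would go here ...
    // missing: std::free(staging);
}

// Fixed pattern: let an RAII container own the staging memory so it is
// released when the function returns, even on an early exit.
void upload_fixed(const NTPData* src, std::size_t n)
{
    std::vector<NTPData> staging(src, src + n);
    // ... the device copy would use staging.data() here ...
}   // staging is freed automatically here

Whether the real fix ends up being an RAII wrapper, a staging buffer cached on the GPU context, or simply the missing free is of course for the GTI authors to decide.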
Unfortunately I've not yet had a chance to look at it myself - it's really waiting on the authors of the GTI code to fix it and issue a patch.
All the best
Ross
> On May 26, 2023, at 11:46, ZhiWei Zhang via AMBER <amber.ambermd.org> wrote:
>
> Dear Amber experts,
>
>
> Since we switched to Amber22, running simulations with pmemd.cuda always leads to a memory leak: even a machine with 512 GB of memory eventually fills up and the job is killed. In the mailing list archive we found that Franz Waibl reported this issue in 2022.
>
>
> Is there any progress on this issue so far?
>
>
> Best regards,
> ZhiWei Zhang
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber