Re: [AMBER] pmemd.cuda.MPI run stops (amber24)

From: Dulal Mondal via AMBER <amber.ambermd.org>
Date: Sun, 21 Dec 2025 20:35:09 +0530

Could anyone please respond?

*With regards,*
*Dulal Mondal,*
*Research Scholar,*
*Department of Chemistry,*
*IIT Kharagpur, Kharagpur 721302.*

On Sun, 14 Dec, 2025, 12:19 pm Dulal Mondal, <babunmondal.chem.kgpian.iitkgp.ac.in> wrote:

> Thank you for your reply.
>
> The mdout file contains the following header:
>
> -------------------------------------------------------
> Amber 24 PMEMD 2024
> -------------------------------------------------------
>
> The pmemd.cuda executable works correctly *on the DGX cluster.*
> Additionally, the constant pH replica-exchange simulations run
> successfully using pmemd.cuda.MPI *on the DGX cluster.*
>
> However, the *REAF calculation does not work on the DGX cluster*,
> although the same REAF calculation runs successfully on another cluster.
>
> On Sat, Dec 13, 2025 at 9:53 PM David A Case <dacase1.gmail.com> wrote:
>
>> On Fri, Dec 12, 2025, Dulal Mondal via AMBER wrote:
>>
>> >I submitted a REAF job using pmemd.cuda.MPI, but it fails with this error:
>> > Primary job terminated normally, but 1 process returned
>> >a non-zero exit code. Per user-direction, the job has been aborted.
>>
>> >--------------------------------------------------------------------------
>>
>> >--------------------------------------------------------------------------
>> >mpirun detected that one or more processes exited with non-zero status, thus causing
>> >the job to be terminated. The first process to do so was:
>> >
>> > Process name: [[41136,1],2]
>> > Exit code: 255
>>
>> >--------------------------------------------------------------------------
>> >and
>> >*cudaMemcpyToSymbol: SetSim copy to cSim failed invalid device symbol*
>>
>> This message, and the MPI one, just indicate that some error occurred, but
>> offer no real clues as to why.
>>
>> Is there anything in the mdout file that looks suspicious? Does the code
>> work with the non-MPI version of pmemd.cuda? Is the fact that REAF is being
>> used relevant? (That is, do non-REAF jobs work OK?) Does that system work
>> OK with the CPU version of pmemd?
>>
>> I think you will have to do some trial and error debugging to try to
>> localize the source of the problem.
>>
>> >
>> >However, the Amber 24 installation, built with CUDA 11.7 and OpenMPI 4.1.2,
>> >completed successfully.
>>
>> Does this imply that you ran the test suite (e.g. 'make test.cuda.serial')
>> successfully?
>>
>> ...good luck...dac
>>
>
>
>
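The questions David Case raises earlier in the thread (non-MPI vs. MPI, REAF vs. non-REAF, GPU vs. CPU) amount to a small test matrix. Below is a minimal Python sketch of how such a matrix could be run and summarised by exit code. The case labels and the `run_matrix`/`failing` helpers are hypothetical, and the commands are stand-ins that only mimic exit codes, since the actual pmemd command lines are not given in this thread; each placeholder would need to be replaced by the real launch command for that case.

```python
import subprocess
import sys


def run_matrix(cases):
    """Run each labelled command and return a dict {label: exit_code}."""
    results = {}
    for label, cmd in cases.items():
        # capture_output keeps stdout/stderr from cluttering the summary
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[label] = proc.returncode
    return results


def failing(results):
    """Return the sorted labels of all cases that exited with a non-zero code."""
    return sorted(label for label, code in results.items() if code != 0)


# Placeholder commands (assumption): they only imitate a success (0) or a
# failure (255, the code mpirun reported). Substitute the real command lines.
ok = [sys.executable, "-c", "raise SystemExit(0)"]
bad = [sys.executable, "-c", "raise SystemExit(255)"]

cases = {
    "pmemd (CPU), same system": ok,
    "pmemd.cuda, same system": ok,
    "pmemd.cuda.MPI, non-REAF": ok,
    "pmemd.cuda.MPI, REAF": bad,
}

results = run_matrix(cases)
for label, code in results.items():
    print(f"{label}: exit code {code}")
print("failing cases:", failing(results))
```

Running the same matrix on both clusters and comparing the failing cases is one way to narrow down whether the problem is specific to REAF, to MPI, or to the DGX installation.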
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Dec 21 2025 - 07:30:02 PST