Re: [AMBER] Error in sander.quick.cuda

From: Goetz, Andreas via AMBER <amber.ambermd.org>
Date: Tue, 26 Aug 2025 02:33:32 +0000

Hi Kriti,

This is a failure of the Fock matrix diagonalization on the GPU during the SCF iterations at some time step of your QM/MM MD simulation.

I do not recall encountering this myself, and without additional details it is hard to say why it happened. It could be a hardware issue, or it could be a numerical problem (e.g. near-linear dependencies in the basis set you are using, combined with the current geometry).
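
To make the near-linear-dependence scenario concrete, here is a toy sketch. It is not what QUICK does internally. It assumes just two normalized s-type Gaussians with the same exponent, for which the overlap is exp(-alpha * r^2 / 2) and the 2x2 overlap matrix [[1, s], [s, 1]] has eigenvalues 1 - s and 1 + s. The exponent value and the function names are illustrative assumptions only.

```python
import math


def s_overlap(alpha, r):
    """Overlap of two normalized s-type Gaussians with the same exponent
    alpha (bohr^-2) whose centers are r bohr apart."""
    return math.exp(-0.5 * alpha * r * r)


def overlap_eigenvalues(s):
    """Eigenvalues of the 2x2 overlap matrix [[1, s], [s, 1]]."""
    return 1.0 - s, 1.0 + s


def condition_number(alpha, r):
    """Ratio of largest to smallest overlap eigenvalue (requires r > 0)."""
    low, high = overlap_eigenvalues(s_overlap(alpha, r))
    return high / low


if __name__ == "__main__":
    alpha = 0.05  # a diffuse exponent, chosen only for illustration
    for r in (4.0, 2.0, 1.0, 0.5):
        low, _ = overlap_eigenvalues(s_overlap(alpha, r))
        print(f"r = {r:4.1f} bohr  smallest eigenvalue = {low:.3e}  "
              f"condition number = {condition_number(alpha, r):.3e}")
```

As the two centers approach each other, the smallest eigenvalue tends to zero and the overlap matrix becomes ill-conditioned. That is the kind of geometry-dependent situation in which a diagonalization step can become numerically fragile.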

If the simulation otherwise proceeds as expected, I suggest ignoring this. If it happens repeatedly on the same GPU but not on other GPUs, then you may have a hardware issue. If it keeps happening on different GPUs, then we can investigate further.
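
The triage above can be written down as a small bookkeeping sketch. It assumes you record, for each run, a label identifying the physical GPU (for example a hypothetical "node07:0") and whether the run aborted. The function name, the labels, and the returned strings are illustrative and are not part of Amber or QUICK.

```python
from collections import defaultdict


def classify(runs):
    """Classify a failure pattern from (gpu_id, failed) pairs.

    gpu_id is any label that identifies one physical GPU;
    failed is True if that run aborted with the diagonalization error.
    """
    failures = defaultdict(int)
    successes = defaultdict(int)
    for gpu_id, failed in runs:
        if failed:
            failures[gpu_id] += 1
        else:
            successes[gpu_id] += 1

    failing = set(failures)
    if not failing:
        return "no failures"
    if len(failing) >= 2:
        # The error follows the job across GPUs.
        return "fails on several GPUs: worth investigating further"

    (gpu,) = failing
    other_gpus_ok = any(g != gpu for g in successes)
    if failures[gpu] >= 2 and other_gpus_ok:
        # Repeated failures on one GPU while other GPUs run cleanly.
        return "possible hardware issue"
    return "inconclusive"


if __name__ == "__main__":
    log = [("node07:0", True), ("node07:0", True), ("node03:1", False)]
    print(classify(log))
```

A single isolated failure is reported as inconclusive, which matches the advice to ignore it if the simulation otherwise proceeds as expected.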

All the best,
Andy


Dr. Andreas W. Goetz
Associate Research Scientist
San Diego Supercomputer Center
Tel: +1-858-822-4771
Email: agoetz.sdsc.edu
Web: www.awgoetz.de

On Aug 25, 2025, at 4:56 AM, Kriti Shukla via AMBER <amber.ambermd.org> wrote:

Hi community,
I have been trying to run a steered QM/MM MD simulation in Amber24, but I get
the following error:
sander.quick.cuda:
/home/23cy91f06/amber24_src/AmberTools/src/quick/src/cuda/cusolver/quick_cusolver.c:212:
cuda_diag_: Assertion `CUSOLVER_STATUS_SUCCESS == cusolver_status' failed.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0x7f31695823ff in ???
#1 0x7f3169582387 in ???
#2 0x7f3169583a77 in ???
#3 0x7f316957b1a5 in ???
#4 0x7f316957b251 in ???
#5 0x7f31724c5833 in ???
#6 0x7f3172303c9d in ???
#7 0x7f3172455cc8 in ???
#8 0x7f3172333716 in ???
#9 0x7f3172334c46 in ???
#10 0x7f31723359d6 in ???
#11 0x762b67 in ???
#12 0x707770 in ???
#13 0x5a1129 in ???
#14 0x63352e in ???
#15 0x5e77f0 in ???
#16 0x5e06ff in ???
#17 0x5e0755 in ???
#18 0x7f316956e554 in ???
#19 0x409cac in ???
#20 0xffffffffffffffff in ???
/var/share/slurm/d/job1154457/slurm_script: line 22: 216053 Aborted
       $ambexe -O -i smd_1.in -o smd_1.out -p 2ILI_solv.prmtop -c f4080.ncrst -r smd_1.ncrst -x smd_1.nc

However, this job runs without problems on another workstation, where Amber24
is installed with the same module versions. I am unable to interpret
the error.
Your guidance will be greatly appreciated.
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Aug 25 2025 - 20:00:02 PDT