[AMBER] AMBER-ORCA Interface

From: Ramdhan,Peter A via AMBER <amber.ambermd.org>
Date: Fri, 11 Oct 2024 13:08:47 +0000

Hi everyone,

I have a question about using QM/MM with ORCA as the external program for AMBER. When I run this calculation in serial it works fine; however, when I run it in parallel it fails after a couple of steps. Does anyone have experience with this?

Here is my mdin file:


&cntrl
 imin = 0, ! Perform MD, not minimization
 irest = 1, ! Restart simulation from previous run
 ntx = 5, ! Coordinates and velocities from the restart file
 nstlim = 100, ! Number of MD steps (100 x 0.5 fs = 0.05 ps)
 dt = 0.0005, ! Time step in picoseconds
 cut = 8.0, ! Non-bonded cutoff in angstroms

 ntr = 0, ! No positional restraints
 restraint_wt = 0.0, ! Weight of restraint (no restraints applied)

 ntb = 2, ! Constant pressure periodic boundary conditions
 ntp = 1, ! Isotropic position scaling (NPT ensemble)
 barostat = 1, ! Berendsen pressure control

 ntc = 2, ! SHAKE on bonds involving hydrogen
 ntf = 2, ! Bond interactions with hydrogens excluded

 ntt = 3, ! Langevin thermostat
 gamma_ln = 5.0, ! Collision frequency for Langevin dynamics
 tempi = 310, ! Initial temperature
 temp0 = 310, ! Target temperature

 ioutfm = 1, ! Write binary trajectory file
 ntpr = 1, ! Print energy information every step
 ntwx = 1, ! Write coordinates to the trajectory every step
 ntwr = 1, ! Write a restart file every step

 ifqnt=1, ! Turn on QM/MM
/

&qmmm
 qmmask=':CYP.SG,CB|:HEM.FE,O1,NA,NB,NC,ND,C1C,C2C,C3C,C4C,CHD,HHD,C1D,C2D,C3D,C4D,CHA,HHA,C1A,C2A,C3A,C4A,CHB,HHB,C1B,C2B,C3B,C4B,CHC,HHC', ! QM region: SG/CB of CYP plus the listed HEM atoms (residues 1 and 465)
 qmmm_int=1, ! Standard QM-MM interaction treatment
 qm_theory='EXTERN', ! Use an external QM program (ORCA)
 qmcharge=-2, ! Net charge of the QM region
 spin=4, ! Spin multiplicity of the QM region
 qmshake=0, ! No SHAKE on QM atoms
 qm_ewald = 0, ! No Ewald treatment of QM electrostatics
 qm_pme=0, ! No PME for the QM region
/
&orc
 method = 'bp86', ! DFT functional passed to ORCA
 basis = 'sv(p)', ! Basis set passed to ORCA
 num_threads=8, ! Number of parallel ORCA processes
 maxcore=2000, ! Memory per ORCA process in MB
/

&wt
 type='END'
&end
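
For reference, my understanding is that the &orc and &qmmm settings above amount to roughly the following standalone ORCA input, which could serve as a sanity check of the QM settings outside of sander (sketch only: I am assuming qmcharge/spin map to ORCA's charge and multiplicity, and qm_region.xyz is just a placeholder name for the extracted QM atoms):

# Sketch: standalone check of the QM settings (qm_region.xyz is a placeholder)
cat > orca_check.inp << 'EOF'
! BP86 SV(P) EnGrad
%pal nprocs 8 end
%maxcore 2000
* xyzfile -2 4 qm_region.xyz
EOF
/apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4/orca orca_check.inp > orca_check.out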



And here is my Slurm submission script:


#!/bin/bash
#SBATCH --job-name=clop_qm
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4GB
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1
#SBATCH --time=4-00:00:00
#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err

module purge
ml gcc
#ml openmpi/4.1.1
export PATH=/apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4:$PATH
export LD_LIBRARY_PATH=/apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4:$LD_LIBRARY_PATH
export PATH=/apps/mpi/gcc/12.2.0/openmpi/4.1.1/bin:$PATH
export LD_LIBRARY_PATH=/apps/mpi/gcc/12.2.0/openmpi/4.1.1/lib:$LD_LIBRARY_PATH
source $AMBERHOME/amber.sh

$AMBERHOME/bin/sander -O -i step8_qm.mdin -o step8_qm.out -p com.parm7 -c step6.ncrst -r step7_qm.ncrst -x step7_qm.nc -ref step6.ncrst -inf step7_qm.info
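
One workaround I am considering (untested, and assuming our UCX build reads UCX_TLS) is to steer OpenMPI/UCX off its shared-memory transports before the sander line, since those are what show up in the errors below:

# Untested sketch: restrict UCX to self + TCP, avoiding the posix/sysv
# shared-memory paths that fail in the log below
export UCX_TLS=self,tcp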


I am encountering this error after a couple of steps:


------------------------- --------------------
FINAL SINGLE POINT ENERGY -2762.899076990582
------------------------- --------------------

[1728651922.027067] [c0800a-s17:1471414:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.028197] [c0800a-s17:1471414:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.028735] [c0800a-s17:1471414:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145743) failed: Invalid argument
[1728651922.028742] [c0800a-s17:1471414:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x30000f: Shared memory error
[1728651922.025391] [c0800a-s17:1471415:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.026517] [c0800a-s17:1471415:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.027067] [c0800a-s17:1471415:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145743) failed: Invalid argument
[1728651922.027073] [c0800a-s17:1471415:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x30000f: Shared memory error
[1728651922.032593] [c0800a-s17:1471417:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.033773] [c0800a-s17:1471417:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.034341] [c0800a-s17:1471417:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145746) failed: Invalid argument
[1728651922.034347] [c0800a-s17:1471417:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x300012: Shared memory error
[1728651922.030778] [c0800a-s17:1471419:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.031930] [c0800a-s17:1471419:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.032472] [c0800a-s17:1471419:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145746) failed: Invalid argument
[1728651922.032479] [c0800a-s17:1471419:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x300012: Shared memory error
[1728651922.025267] [c0800a-s17:1471420:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.026408] [c0800a-s17:1471420:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.026952] [c0800a-s17:1471420:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145746) failed: Invalid argument
[1728651922.026959] [c0800a-s17:1471420:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x300012: Shared memory error
[1728651922.032055] [c0800a-s17:1471413:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.033186] [c0800a-s17:1471413:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.033720] [c0800a-s17:1471413:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145743) failed: Invalid argument
[1728651922.033727] [c0800a-s17:1471413:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x30000f: Shared memory error

ORCA finished by error termination in SCF gradient
Calling Command: mpirun -np 8 /apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4/orca_scfgrad_mpi orc_job.scfgrad.inp orc_job
[file orca_tools/qcmsg.cpp, line 465]:
  .... aborting the run


Am I not allocating enough memory? According to the job file, ORCA uses about 200-300 MB per step. If maxcore is the memory in MB allocated per core (ntasks is 8 and cpus-per-task is 1, so 8 cores total), then I figured maxcore = 2000 should be more than enough.
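
Spelling out the arithmetic I am assuming (treating maxcore as MB per ORCA process):

# Memory bookkeeping (assumption: maxcore is MB per ORCA process)
#   ORCA request : num_threads * maxcore = 8 * 2000 MB ~ 16 GB
#   SLURM grant  : ntasks * mem-per-cpu  = 8 * 4 GB    = 32 GB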

Sincerely,

Peter Ramdhan, PharmD

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Oct 11 2024 - 06:30:02 PDT