Hi Martin,
Thank you for the help! That did work; however, I am now running into another issue. I am trying to use ORCA 6.0 coupled with AMBER 24. Here is my input:
ASMD simulation
&cntrl
imin = 0, nstlim = 238, dt = 0.001,
ntx = 1, temp0 = 310,tempi=310,
ntt = 3, gamma_ln=5.0,
ntc = 2, ntf = 2, ntb =1,
ntwx = 5, ntwr = 5, ntpr = 5,
cut = 8.0, ig=-1, ioutfm=1,
irest = 0, jar=1,
ifqnt=1, ! Turn on QM/MM
/
&qmmm
qmmask=':MOL.C5,C6,C8,H8,C10,H9,S1|:CYP.SG,CB|:HEM.FE,O1,NA,NB,NC,ND,C1C,C2C,C3C,C4C,CHD,HHD,C1D,C2D,C3D,C4D,CHA,HHA,C1A,C2A,C3A,C4A,CHB,HHB,C1B,C2B,C3B,C4B,CHC,HHC', ! QM region, specifying residues 1 and 465
qmmm_int=1, !
qm_theory='EXTERN', !
qmcharge=-2,
spin=4,
qmshake=0,
qm_ewald = 0,
qm_pme=0,
/
&orc
method = 'b3lyp',
basis = '6-31G**',
num_threads=16,
maxcore=3000,
/
&wt type='DUMPFREQ', istep1=5 /
&wt type='END' /
DISANG=dist.RST.dat.1
DUMPAVE=asmd_24.work.dat.1
LISTIN=POUT
LISTOUT=POUT
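For completeness, dist.RST.dat.1 follows the usual Amber NMR/SMD restraint format; the atom indices, distances, and force constant below are placeholders rather than my actual values:
# hypothetical pulled distance for jar=1: r2/rk2 give the starting target
# distance (Angstrom) and force constant, r2a the final target reached
# over the nstlim steps
cat > dist.RST.dat.1 << 'EOF'
&rst iat=1234,5678, r2=2.80, rk2=7.20, r2a=8.00, /
EOF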
My slurm file:
#!/bin/bash
#SBATCH --job-name=ASMD_stage24
#SBATCH --mail-type=END,FAIL
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=4-00:00:00
echo "Running orca test calculation on a with 16 CPU cores"
echo "Date = $(date)"
echo "Hostname = $(hostname -s)"
echo "Working Directory = $(pwd)"
echo ""
echo "Number of Nodes Allocated = $SLURM_JOB_NUM_NODES"
echo "Number of Tasks Allocated = $SLURM_NTASKS"
echo "Number of Cores/Task Allocated = $SLURM_CPUS_PER_TASK"
echo ""
# Load required modules
module purge
module load cuda/12.4.1 gcc/12.2.0 openmpi/4.1.6 amber/24
# Set ORCA path and ensure mpirun is accessible
export orcadir=/blue/lic/pramdhan1/orca-6.0.0
export PATH=$orcadir:$PATH
export PATH=/apps/mpi/cuda/12.4.1/gcc/12.2.0/openmpi/4.1.6/bin:$PATH
export LD_LIBRARY_PATH=/apps/mpi/cuda/12.4.1/gcc/12.2.0/openmpi/4.1.6/lib:$LD_LIBRARY_PATH
# Suppress CUDA-aware support warning in OpenMPI
export OMPI_MCA_opal_warn_on_missing_libcuda=0
# Check ORCA and mpirun paths
which mpirun
which orca
echo $PATH
echo $LD_LIBRARY_PATH
# Run Amber with sander in the local scratch directory
$AMBERHOME/bin/sander -O -i asmd_24.1.mdin -o asmd_24.1.out \
-p "com.parm7" \
-c "readySMD.ncrst" \
-r asmd_24.1.ncrst -x asmd_24.1.nc \
-ref "readySMD.ncrst" \
-inf asmd_24.1.info
My error:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! FATAL ERROR ENCOUNTERED !!!
!!! ----------------------- !!!
!!! I/O OPERATION FAILED !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! FATAL ERROR ENCOUNTERED !!!
!!! ----------------------- !!!
!!! I/O OPERATION FAILED !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! FATAL ERROR ENCOUNTERED !!!
!!! ----------------------- !!!
!!! I/O OPERATION FAILED !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[36284,1],3]
Exit code: 64
--------------------------------------------------------------------------
[file orca_tools/qcmsg.cpp, line 394]:
.... aborting the run
It performs one ORCA calculation and then returns the error above. I'm not sure whether this is an ORCA 6.0 issue or an mpirun issue. Other parts of that same calculation use mpirun and work fine; it only fails towards the end, just before a new calculation would start.
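In case it helps to isolate this, one thing I can try is rerunning the ORCA input by hand under the same modules (this assumes the interface leaves its generated orc_job.inp in the run directory, as the earlier log suggests):
module load cuda/12.4.1 gcc/12.2.0 openmpi/4.1.6 amber/24
export PATH=/apps/mpi/cuda/12.4.1/gcc/12.2.0/openmpi/4.1.6/bin:$PATH
# ORCA needs to be called with its full path so it can launch its own mpirun processes
/blue/lic/pramdhan1/orca-6.0.0/orca orc_job.inp > orc_job_manual.out
If that standalone run also aborts, it would point to ORCA 6.0/OpenMPI rather than the sander interface.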
Sincerely,
Peter Ramdhan
________________________________
From: Martin Juhás <juhasm.faf.cuni.cz>
Sent: Friday, October 11, 2024 2:29 PM
To: Ramdhan,Peter A <pramdhan1.ufl.edu>; AMBER Mailing List <amber.ambermd.org>
Subject: Re: AMBER-ORCA Interface
[External Email]
Hi, this looks like a problem with the MPI and/or UCX library. Try running some test job with ORCA alone, using more than 1 CPU, and see whether that works.
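For example, something along these lines (the molecule, method, and basis are only placeholders; adjust the path to your ORCA installation):
cat > mpi_test.inp << 'EOF'
! BP86 def2-SVP TightSCF
%pal nprocs 8 end
%maxcore 2000
* xyz 0 1
O   0.000000   0.000000   0.000000
H   0.000000   0.757000   0.586000
H   0.000000  -0.757000   0.586000
*
EOF
# ORCA must be started with its full path for the parallel startup to work
/apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4/orca mpi_test.inp > mpi_test.out
If the same UCX errors show up there, the problem is in the MPI/UCX setup rather than in sander.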
Best,
Martin
Sent from Outlook for Android <https://aka.ms/AAb9ysg>
________________________________
From: Ramdhan,Peter A via AMBER <amber.ambermd.org>
Sent: Friday, October 11, 2024 3:08:47 PM
To: AMBER Mailing List <amber.ambermd.org>
Subject: [AMBER] AMBER-ORCA Interface
[EXTERNAL EMAIL]
Hi everyone,
I have a question about using QM/MM with ORCA as the external program for AMBER. When I perform this calculation in serial it works fine; however, when I run it in parallel it fails after a couple of steps. Does anyone have experience with this?
Here is my mdin file:
&cntrl
imin = 0, ! Perform MD, not minimization
irest = 1, ! Restart simulation from previous run
ntx = 5, ! Coordinates and velocities from the restart file
nstlim = 100, ! Number of MD steps (100 x 0.0005 ps = 0.05 ps)
dt = 0.0005, ! Time step in picoseconds
cut = 8.0, ! Non-bonded cutoff in angstroms
ntr = 0, ! No positional restraints
restraint_wt = 0.0, ! Weight of restraint (no restraints applied)
ntb = 2, ! Constant pressure periodic boundary conditions
ntp = 1, ! Isotropic position scaling (NPT ensemble)
barostat = 1, ! Berendsen pressure control
ntc = 2, ! SHAKE on bonds involving hydrogen
ntf = 2, ! Bond interactions with hydrogens excluded
ntt = 3, ! Langevin thermostat
gamma_ln = 5.0, ! Collision frequency for Langevin dynamics
tempi = 310, ! Initial temperature
temp0 = 310, ! Target temperature
ioutfm = 1, ! Write binary trajectory file
ntpr = 1, ! Print energy information every step
ntwx = 1, ! Write coordinates to trajectory file every step
ntwr = 1, ! Write restart file every step
ifqnt=1, ! Turn on QM/MM
/
&qmmm
qmmask=':CYP.SG,CB|:HEM.FE,O1,NA,NB,NC,ND,C1C,C2C,C3C,C4C,CHD,HHD,C1D,C2D,C3D,C4D,CHA,HHA,C1A,C2A,C3A,C4A,CHB,HHB,C1B,C2B,C3B,C4B,CHC,HHC', ! QM region, specifying residues 1 and 465
qmmm_int=1, !
qm_theory='EXTERN', !
qmcharge=-2,
spin=4,
qmshake=0,
qm_ewald = 0,
qm_pme=0,
/
&orc
method = 'bp86',
basis = 'sv(p)',
num_threads=8,
maxcore=2000,
/
&wt
type='END'
&end
And here is my slurm file:
#!/bin/bash
#SBATCH --job-name=clop_qm
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4GB
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1
#SBATCH --time=4-00:00:00
#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err
module purge
ml gcc
#ml openmpi/4.1.1
export PATH=/apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4:$PATH
export LD_LIBRARY_PATH=/apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4:$LD_LIBRARY_PATH
export PATH=/apps/mpi/gcc/12.2.0/openmpi/4.1.1/bin:$PATH
export LD_LIBRARY_PATH=/apps/mpi/gcc/12.2.0/openmpi/4.1.1/lib:$LD_LIBRARY_PATH
source $AMBERHOME/amber.sh
$AMBERHOME/bin/sander -O -i step8_qm.mdin -o step8_qm.out -p com.parm7 -c step6.ncrst -r step7_qm.ncrst -x step7_qm.nc -ref step6.ncrst -inf step7_qm.info
I am encountering this error after a couple of steps:
------------------------- --------------------
FINAL SINGLE POINT ENERGY -2762.899076990582
------------------------- --------------------
[1728651922.027067] [c0800a-s17:1471414:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.028197] [c0800a-s17:1471414:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.028735] [c0800a-s17:1471414:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145743) failed: Invalid argument
[1728651922.028742] [c0800a-s17:1471414:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x30000f: Shared memory error
[1728651922.025391] [c0800a-s17:1471415:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.026517] [c0800a-s17:1471415:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.027067] [c0800a-s17:1471415:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145743) failed: Invalid argument
[1728651922.027073] [c0800a-s17:1471415:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x30000f: Shared memory error
[1728651922.032593] [c0800a-s17:1471417:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.033773] [c0800a-s17:1471417:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.034341] [c0800a-s17:1471417:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145746) failed: Invalid argument
[1728651922.034347] [c0800a-s17:1471417:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x300012: Shared memory error
[1728651922.030778] [c0800a-s17:1471419:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.031930] [c0800a-s17:1471419:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.032472] [c0800a-s17:1471419:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145746) failed: Invalid argument
[1728651922.032479] [c0800a-s17:1471419:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x300012: Shared memory error
[1728651922.025267] [c0800a-s17:1471420:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.026408] [c0800a-s17:1471420:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.026952] [c0800a-s17:1471420:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145746) failed: Invalid argument
[1728651922.026959] [c0800a-s17:1471420:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x300012: Shared memory error
[1728651922.032055] [c0800a-s17:1471413:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.033186] [c0800a-s17:1471413:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.033720] [c0800a-s17:1471413:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145743) failed: Invalid argument
[1728651922.033727] [c0800a-s17:1471413:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x30000f: Shared memory error
ORCA finished by error termination in SCF gradient
Calling Command: mpirun -np 8 /apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4/orca_scfgrad_mpi orc_job.scfgrad.inp orc_job
[file orca_tools/qcmsg.cpp, line 465]:
.... aborting the run
Am I not allocating enough memory? According to the job output, it uses about 200-300 MB per step, and since maxcore is the memory in MB allocated per core (ntasks is 8 and cpus-per-task is 1, so 8 cores total), I figured maxcore = 2000 would be enough.
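My rough accounting, assuming maxcore acts as a soft per-core limit in MB that ORCA may exceed temporarily:
# ORCA soft limit:  8 cores x 2000 MB maxcore  = 16000 MB
# SLURM request:    8 tasks x 4 GB mem-per-cpu = 32768 MB
echo "ORCA soft limit: $((8*2000)) MB, SLURM allocation: $((8*4096)) MB"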
Sincerely,
Peter Ramdhan, PharmD
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Oct 28 2024 - 07:00:02 PDT