Re: [AMBER] AMBER-ORCA Interface

From: Martin Juhás via AMBER <amber.ambermd.org>
Date: Fri, 11 Oct 2024 18:29:38 +0000

Hi, this looks like a problem with the MPI and/or UCX library. Try running some test job with ORCA alone using more than one CPU and see whether that works.
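For example, a minimal standalone test could look like this (a sketch; the water geometry and file names are placeholders, and the ORCA path is taken from your batch script):

test.inp:
! BP86 SV(P) EnGrad
%pal nprocs 8 end
%maxcore 2000
* xyz 0 1
O   0.000000   0.000000   0.000000
H   0.000000   0.757000   0.586000
H   0.000000  -0.757000   0.586000
*

# parallel ORCA has to be started via the full path to the orca binary;
# it launches its own mpirun calls internally
/apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4/orca test.inp > test.out

If this already fails with the same UCX errors, the problem is in the ORCA/OpenMPI/UCX setup rather than in the AMBER interface.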

Best,

Martin

Sent from Outlook for Android<https://aka.ms/AAb9ysg>
________________________________
From: Ramdhan,Peter A via AMBER <amber.ambermd.org>
Sent: Friday, October 11, 2024 3:08:47 PM
To: AMBER Mailing List <amber.ambermd.org>
Subject: [AMBER] AMBER-ORCA Interface

[EXTERNAL EMAIL]


Hi everyone,

I have a question about using QM/MM with ORCA as the external program for AMBER. When I run this calculation in serial it works fine; however, when I run it in parallel it fails after a couple of steps. Does anyone have experience with this?

Here is my mdin file:


&cntrl
 imin = 0, ! Perform MD, not minimization
 irest = 1, ! Restart simulation from previous run
 ntx = 5, ! Coordinates and velocities from the restart file
 nstlim = 100, ! Number of MD steps (100 x 0.0005 ps = 0.05 ps)
 dt = 0.0005, ! Time step in picoseconds
 cut = 8.0, ! Non-bonded cutoff in angstroms

 ntr = 0, ! No positional restraints
 restraint_wt = 0.0, ! Weight of restraint (no restraints applied)

 ntb = 2, ! Constant pressure periodic boundary conditions
 ntp = 1, ! Isotropic position scaling (NPT ensemble)
 barostat = 1, ! Berendsen pressure control

 ntc = 2, ! SHAKE on bonds involving hydrogen
 ntf = 2, ! Bond interactions with hydrogens excluded

 ntt = 3, ! Langevin thermostat
 gamma_ln = 5.0, ! Collision frequency for Langevin dynamics
 tempi = 310, ! Initial temperature
 temp0 = 310, ! Target temperature

 ioutfm = 1, ! Write binary trajectory file
 ntpr = 1, ! Print energy information every step
 ntwx = 1, ! Write coordinates to trajectory file every step
 ntwr = 1, ! Write restart file every step

 ifqnt=1, ! Turn on QM/MM
/

&qmmm
 qmmask=':CYP.SG,CB|:HEM.FE,O1,NA,NB,NC,ND,C1C,C2C,C3C,C4C,CHD,HHD,C1D,C2D,C3D,C4D,CHA,HHA,C1A,C2A,C3A,C4A,CHB,HHB,C1B,C2B,C3B,C4B,CHC,HHC', ! QM region, specifying residues 1 and 465
 qmmm_int=1, !
 qm_theory='EXTERN', !
 qmcharge=-2,
 spin=4,
 qmshake=0,
 qm_ewald = 0,
 qm_pme=0,
/
&orc
 method = 'bp86',
 basis = 'sv(p)',
 num_threads=8,
 maxcore=2000,
/

&wt
 type='END'
&end
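For reference, with num_threads=8 and maxcore=2000 the interface should run ORCA in parallel and forward the memory setting, so the generated orc_job input would contain lines roughly like the following (a sketch; the exact keywords sander writes may differ):

! BP86 SV(P) EnGrad
%pal nprocs 8 end            # from num_threads=8
%maxcore 2000                # MB per core, from maxcore=2000
%pointcharges "orc_job.pc"   # MM point charges from the QM/MM coupling
* xyzfile -2 4 orc_job.xyz   # qmcharge=-2, multiplicity taken from spin=4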



And here is my slurm file:


#!/bin/bash
#SBATCH --job-name=clop_qm
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4GB
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1
#SBATCH --time=4-00:00:00
#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err

module purge
ml gcc
#ml openmpi/4.1.1
export PATH=/apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4:$PATH
export LD_LIBRARY_PATH=/apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4:$LD_LIBRARY_PATH
export PATH=/apps/mpi/gcc/12.2.0/openmpi/4.1.1/bin:$PATH
export LD_LIBRARY_PATH=/apps/mpi/gcc/12.2.0/openmpi/4.1.1/lib:$LD_LIBRARY_PATH
source $AMBERHOME/amber.sh

$AMBERHOME/bin/sander -O -i step8_qm.mdin -o step8_qm.out -p com.parm7 -c step6.ncrst -r step7_qm.ncrst -x step7_qm.nc -ref step6.ncrst -inf step7_qm.info
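For reference, a few sanity checks placed just before the sander command confirm which OpenMPI and ORCA builds the job actually resolves (optional; the binary names come from the ORCA 5.0.4 installation above):

which mpirun && mpirun --version | head -n 1
which orca && which orca_scfgrad_mpi
ldd "$(which orca_scfgrad_mpi)" | grep -iE 'mpi|ucx'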


I am encountering this error after a couple of steps:


------------------------- --------------------
FINAL SINGLE POINT ENERGY -2762.899076990582
------------------------- --------------------

[1728651922.027067] [c0800a-s17:1471414:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.028197] [c0800a-s17:1471414:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.028735] [c0800a-s17:1471414:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145743) failed: Invalid argument
[1728651922.028742] [c0800a-s17:1471414:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x30000f: Shared memory error
[1728651922.025391] [c0800a-s17:1471415:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.026517] [c0800a-s17:1471415:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.027067] [c0800a-s17:1471415:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145743) failed: Invalid argument
[1728651922.027073] [c0800a-s17:1471415:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x30000f: Shared memory error
[1728651922.032593] [c0800a-s17:1471417:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.033773] [c0800a-s17:1471417:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.034341] [c0800a-s17:1471417:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145746) failed: Invalid argument
[1728651922.034347] [c0800a-s17:1471417:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x300012: Shared memory error
[1728651922.030778] [c0800a-s17:1471419:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.031930] [c0800a-s17:1471419:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.032472] [c0800a-s17:1471419:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145746) failed: Invalid argument
[1728651922.032479] [c0800a-s17:1471419:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x300012: Shared memory error
[1728651922.025267] [c0800a-s17:1471420:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.026408] [c0800a-s17:1471420:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471426/fd/71 flags=0x0) failed: No such file or directory
[1728651922.026952] [c0800a-s17:1471420:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145746) failed: Invalid argument
[1728651922.026959] [c0800a-s17:1471420:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x300012: Shared memory error
[1728651922.032055] [c0800a-s17:1471413:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.033186] [c0800a-s17:1471413:0] mm_posix.c:234 UCX ERROR open(file_name=/proc/1471416/fd/71 flags=0x0) failed: No such file or directory
[1728651922.033720] [c0800a-s17:1471413:0] mm_sysv.c:59 UCX ERROR shmat(shmid=3145743) failed: Invalid argument
[1728651922.033727] [c0800a-s17:1471413:0] mm_ep.c:189 UCX ERROR mm ep failed to connect to remote FIFO id 0x30000f: Shared memory error

ORCA finished by error termination in SCF gradient
Calling Command: mpirun -np 8 /apps/gcc/12.2.0/openmpi/4.1.1/orca/5.0.4/orca_scfgrad_mpi orc_job.scfgrad.inp orc_job
[file orca_tools/qcmsg.cpp, line 465]:
  .... aborting the run


Am I not allocating enough memory? According to the job file, it uses about 200-300 MB per step, and if maxcore is the memory in MB allocated per CPU (ntasks is 8 and cpus-per-task is 1, so 8 CPUs in total), then I figured maxcore = 2000 would be enough.
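For reference, the numbers work out as follows (assuming maxcore applies per ORCA process and --mem-per-cpu per SLURM task):

ORCA memory request:  8 cores x 2000 MB (maxcore)     = 16000 MB (~16 GB)
SLURM allocation:     8 tasks x 4 GB (--mem-per-cpu)  = 32 GB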

Sincerely,

Peter Ramdhan, PharmD



Disclaimer: If not expressly stated otherwise, this e-mail message (including any attached files) is intended purely for informational purposes and does not represent a binding agreement on the part of Charles University. The text of this message and its attachments cannot be considered as a proposal to conclude a contract, nor the acceptance of a proposal to conclude a contract, nor any other legal act leading to concluding any contract; nor does it create any pre-contractual liability on the part of Charles University. If this e-mail or any of its attachments contains personal data, please be aware of data processing (particularly document management and archival policy) in accordance with Regulation (EU) 2016/679 of the European Parliament and of the Council on GDPR.


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Oct 11 2024 - 12:00:01 PDT