Dear Amber Mailing List Members,
I am writing to seek assistance with an issue I have encountered while running simulations with sander. The problem occurs specifically when I try to run the job in parallel using mpirun.
When I run a minimization with sander under mpirun, it fails with the following NetCDF error:
NetCDF error: NetCDF: Unknown file format
write_nc_restart(): Could not open restart
The same simulation completes successfully when I run sander serially (without mpirun).
Below are the details of my simulation:
The .in file content is:
Classical minimization: relax QM region only (MM)
&cntrl
imin=1,
maxcyc=8000,
ncyc=4000,
cut=8.0,
ntx=1,
ntb=1,
ntr=1,
restraint_wt=100.0,
restraintmask='(!@3342-3349,3381-3388,3420)&(!:226-229)',
ntpr=100,
ntwx=0,
/
The failing command (with mpirun on our LSF HPC system):
mpirun -np 72 sander -O -i in/min_fix_qm_adp.in -o out/min_fix_qm_adp.out -p param/adk_2adp_mg_4wat.prmtop -c rst/md1ns_adp.rst -r rst/min_fix_qm_adp.rst -ref rst/md1ns_adp.rst
The successful command (without mpirun):
sander -O -i in/min_fix_qm_adp.in -o out/min_fix_qm_adp.out -p param/adk_2adp_mg_4wat.prmtop -c rst/md1ns_adp.rst -r rst/min_fix_qm_adp.rst -ref rst/md1ns_adp.rst
The simulation proceeds normally until the end of the first step, where the error appears. The tail of the output file shows:
---------------------------------------------------
APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
using 5000.0 points per unit in tabled values
TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
| CHECK switch(x): max rel err = 0.2738E-14 at 2.422500
| CHECK d/dx switch(x): max rel err = 0.8332E-11 at 2.782960
---------------------------------------------------
| Local SIZE OF NONBOND LIST = 6599863
| TOTAL SIZE OF NONBOND LIST = 6599863
NSTEP ENERGY RMS GMAX NAME NUMBER
1 -1.0042E+05 1.4559E+01 1.2872E+02 C 2159
BOND = 701.3337 ANGLE = 1822.5120 DIHED = 1176.4719
VDWAALS = 11290.1670 EEL = -123447.7270 HBOND = 0.0000
1-4 VDW = 776.4365 1-4 EEL = 7066.9539 RESTRAINT = 0.0000
CMAP = 196.3039
NetCDF error: NetCDF: Unknown file format
write_nc_restart(): Could not open restart
The input restart file md1ns_adp.rst comes from an equilibration run.
The last step in the output file of that run (md1ns_adp) shows:
NSTEP = 100000 TIME(PS) = 400.000 TEMP(K) = 300.01 PRESS = -47.7
Etot = -81027.7443 EKtot = 19423.5200 EPtot = -100451.2643
BOND = 672.1893 ANGLE = 1819.0946 DIHED = 1181.4037
1-4 NB = 776.7944 1-4 EEL = 7079.8781 VDWAALS = 11289.1504
EELEC = -123465.6595 EHBOND = 0.0000 RESTRAINT = 0.0000
EKCMT = 8582.0969 VIRIAL = 8907.9399 VOLUME = 316654.4103
CMAP = 195.8847
Density = 1.0202
Ewald error estimate: 0.2461E-04
------------------------------------------------------------------------------
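In case it helps with diagnosis, below is a minimal sketch (a standalone helper, not part of Amber) for checking whether a file begins with one of the signatures that the NetCDF library recognizes. The function name restart_kind is hypothetical, and the two paths are the ones from the commands above. It only inspects the first four bytes, so it cannot tell whether the rest of the file is valid, and it does not by itself explain the parallel failure.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the formats the NetCDF library can open.
NETCDF_SIGNATURES = {
    b"CDF\x01": "NetCDF classic",
    b"CDF\x02": "NetCDF 64-bit offset",
    b"CDF\x05": "NetCDF 64-bit data (CDF-5)",
    b"\x89HDF": "HDF5 (NetCDF-4)",
}


def restart_kind(path):
    """Classify a file by its first four bytes (hypothetical helper)."""
    p = Path(path)
    if not p.exists():
        return "missing"
    with p.open("rb") as fh:
        magic = fh.read(4)
    if not magic:
        return "empty"
    # Anything without a known signature is reported as non-NetCDF,
    # which includes plain-text (ASCII) restart files.
    return NETCDF_SIGNATURES.get(magic, "not NetCDF (e.g. ASCII restart)")


if __name__ == "__main__":
    # Paths taken from the sander command lines in this post.
    for name in ("rst/md1ns_adp.rst", "rst/min_fix_qm_adp.rst"):
        print(name, "->", restart_kind(name))
```

Running it before and after the failing job would show whether a file already exists at the -r path and what kind of file it is.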
My questions are:
a) Could anyone help me understand why this NetCDF error occurs only in the parallel run, and how I might resolve it?
b) Additionally, I have observed a separate behavioral difference: during minimization, sander seems to use all assigned CPU cores. During equilibration simulations, however, CPU usage is often well below full, sometimes with only one core appearing active. Any insights into this discrepancy would also be appreciated.
Thank you very much for your time and expertise.
Best regards,
Zeyu Zhang
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Dec 21 2025 - 01:30:02 PST