[AMBER] NetCDF Format Error When Using mpirun with sander, and CPU Usage Query

From: Tan Rua via AMBER <amber.ambermd.org>
Date: Sun, 21 Dec 2025 09:19:10 +0000

Dear Amber Mailing List Members,
I am writing to seek assistance with an issue I have encountered while running simulations with sander. The problem occurs specifically when I try to run the job in parallel using mpirun.
When I run a minimization with sander under mpirun, it fails with the following NetCDF error:

NetCDF error: NetCDF: Unknown file format
write_nc_restart(): Could not open restart

The same simulation completes successfully when I run sander serially (without mpirun).
Below are the details of my simulation:

The .in file content is:

Classical minimization: relax QM region only (MM)
&cntrl
  imin=1,
  maxcyc=8000,
  ncyc=4000,
  cut=8.0,
  ntx=1,
  ntb=1,
  ntr=1,
  restraint_wt=100.0,
  restraintmask='(!@3342-3349,3381-3388,3420)&(!:226-229)',
  ntpr=100,
  ntwx=0,
/

The failing command (with mpirun on our LSF HPC system):

mpirun -np 72 sander -O -i in/min_fix_qm_adp.in -o out/min_fix_qm_adp.out -p param/adk_2adp_mg_4wat.prmtop -c rst/md1ns_adp.rst -r rst/min_fix_qm_adp.rst -ref rst/md1ns_adp.rst

The successful command (without mpirun):

sander -O -i in/min_fix_qm_adp.in -o out/min_fix_qm_adp.out -p param/adk_2adp_mg_4wat.prmtop -c rst/md1ns_adp.rst -r rst/min_fix_qm_adp.rst -ref rst/md1ns_adp.rst

The simulation proceeds normally until the end of the first step, where the error appears. The tail of the output file shows:

 ---------------------------------------------------
 APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
 using 5000.0 points per unit in tabled values
 TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
| CHECK switch(x): max rel err = 0.2738E-14 at 2.422500
| CHECK d/dx switch(x): max rel err = 0.8332E-11 at 2.782960
 ---------------------------------------------------
| Local SIZE OF NONBOND LIST = 6599863
| TOTAL SIZE OF NONBOND LIST = 6599863


   NSTEP ENERGY RMS GMAX NAME NUMBER
      1 -1.0042E+05 1.4559E+01 1.2872E+02 C 2159

 BOND = 701.3337 ANGLE = 1822.5120 DIHED = 1176.4719
 VDWAALS = 11290.1670 EEL = -123447.7270 HBOND = 0.0000
 1-4 VDW = 776.4365 1-4 EEL = 7066.9539 RESTRAINT = 0.0000
 CMAP = 196.3039
NetCDF error: NetCDF: Unknown file format
write_nc_restart(): Could not open restart

The input restart file, md1ns_adp.rst, was produced by an equilibration simulation.
The last step in that run's output file shows:

 NSTEP = 100000 TIME(PS) = 400.000 TEMP(K) = 300.01 PRESS = -47.7
 Etot = -81027.7443 EKtot = 19423.5200 EPtot = -100451.2643
 BOND = 672.1893 ANGLE = 1819.0946 DIHED = 1181.4037
 1-4 NB = 776.7944 1-4 EEL = 7079.8781 VDWAALS = 11289.1504
 EELEC = -123465.6595 EHBOND = 0.0000 RESTRAINT = 0.0000
 EKCMT = 8582.0969 VIRIAL = 8907.9399 VOLUME = 316654.4103
 CMAP = 195.8847
                                                    Density = 1.0202
 Ewald error estimate: 0.2461E-04
 ------------------------------------------------------------------------------
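
Since the error complains about an unknown file format, one quick check is to see which format the restart files involved are actually in by looking at their first bytes. NetCDF classic files begin with "CDF" followed by a version byte (1, 2, or 5), and NetCDF-4 files are HDF5 containers that begin with the 8-byte HDF5 signature. The following is only a minimal sketch under those assumptions; the function names are hypothetical helpers and are not part of Amber.

```python
# Hypothetical helper, not part of Amber: classify a restart file by its
# leading bytes. Anything that is not a NetCDF container (for example an
# ASCII restart) is reported as "not-netcdf".

NETCDF_CLASSIC_MAGICS = (b"CDF\x01", b"CDF\x02", b"CDF\x05")
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # NetCDF-4 files are HDF5 containers


def classify_restart(header: bytes) -> str:
    """Classify a file from its first few bytes."""
    if not header:
        return "empty"
    if header[:4] in NETCDF_CLASSIC_MAGICS:
        return "netcdf-classic"
    if header[:8] == HDF5_MAGIC:
        return "netcdf4-hdf5"
    return "not-netcdf"


def classify_restart_file(path: str) -> str:
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return classify_restart(fh.read(8))
```

For example, calling classify_restart_file("rst/md1ns_adp.rst") and classify_restart_file("rst/min_fix_qm_adp.rst") would show whether the input restart and any leftover output restart from an earlier run are NetCDF, empty, or something else.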

My questions are:

a) Could anyone help me understand why this NetCDF error occurs only in the parallel run, and how I might resolve it?

b) I have also observed a separate difference in behavior: during minimization, sander appears to use all assigned CPU cores, but during equilibration simulations CPU usage is often well below full, sometimes with only one core appearing active. Any insight into this discrepancy would also be appreciated.
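
To put numbers on the per-core observation in (b), per-core utilization can be estimated on a Linux compute node from two samples of /proc/stat. The sketch below is only an illustration under that assumption (Linux, readable /proc/stat); the function names are hypothetical, and it counts idle and iowait time as "not busy".

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} for per-core lines."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line
        # and unrelated lines such as "ctxt" or "btime".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Columns: user nice system idle iowait irq softirq steal
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        result[fields[0]] = (total - idle, total)
    return result


def core_utilization(before, after):
    """Fraction of time each core was busy between two parsed samples."""
    usage = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        usage[name] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage


def sample_utilization(interval=1.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as fh:
        before = parse_proc_stat(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_proc_stat(fh.read())
    return core_utilization(before, after)
```

Running sample_utilization(5.0) on the node while the job is active returns one value per core between 0.0 and 1.0, which makes it easier to report how many cores are actually busy during minimization versus equilibration.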


Thank you very much for your time and expertise.
Best regards,
Zeyu Zhang

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Dec 21 2025 - 01:30:02 PST