[AMBER] Inconsistent GPU Results

From: Matthew Guberman-Pfeffer via AMBER <amber.ambermd.org>
Date: Sun, 27 Nov 2022 09:44:50 -0500

Dear AMBER community,

I ran two identical calculations on (presumably different) GPUs and got completely different results. In the first run, the system blew up; in the second, everything looked fine. I did not change the input at all. How could this happen, and are any of my results trustworthy? The output files are attached.

Background: I’m running TI simulations at different lambda values sequentially (e.g., 0.00, then 0.01, etc.). For each window, I run a minimization and an equilibration. Because I’m using the smooth softcore potential, which is only implemented in pmemd.cuda, I’m running both the minimization and the equilibration on GPUs. I know there are numerical issues with running minimizations on GPUs. Should I skip the minimizations, or drop the smooth softcore potential for the minimizations so that I can run them on CPUs?

Sometimes the minimization on GPUs goes entirely awry:

   NSTEP ENERGY RMS GMAX NAME NUMBER
      1 6.1827E+08 4.4600E+03 4.3883E+05 H1 70188

 BOND = ************* ANGLE = 4854.3827 DIHED = 7029.8038
 VDWAALS = 93066.3839 EEL = -559385.5182 HBOND = 0.0000
 1-4 VDW = 2030.7940 1-4 EEL = 30573.6921 RESTRAINT = 0.0700
 EAMBER = *************
 DV/DL = 81.9596
 NMR restraints: Bond = 0.000 Angle = 0.000 Torsion = 0.070
===============================================================================
  Softcore part of the system: 25 atoms, TEMP(K) = 0.00
 SC_Etot= 0.0000 SC_EKtot= 0.0000 SC_EPtot = 79.4772
 SC_BOND= 1.7211 SC_ANGLE= 13.7642 SC_DIHED = 18.4959
 SC_14NB= 2.4757 SC_14EEL= 20.1833 SC_VDW = -0.0318
 SC_EEL = 22.8688
 SC_RES_DIST= 0.0000 SC_RES_ANG= 0.0000 SC_RES_TORS= 0.0000
 SC_EEL_DER= 56.6513 SC_VDW_DER= -1.0518 SC_DERIV = 55.5995
 ------------------------------------------------------------------------------

But other times, using exactly the same input, it works just fine:

   NSTEP ENERGY RMS GMAX NAME NUMBER
      1 -4.4889E+05 1.7372E+01 1.1381E+02 C 218

 BOND = 1666.4578 ANGLE = 4854.3626 DIHED = 7029.8072
 VDWAALS = 59875.1546 EEL = -554800.0321 HBOND = 0.0000
 1-4 VDW = 1906.7559 1-4 EEL = 30573.8373 RESTRAINT = 0.0700
 EAMBER = -448893.6567
 DV/DL = 79.2240
 NMR restraints: Bond = 0.000 Angle = 0.000 Torsion = 0.070
===============================================================================
  Softcore part of the system: 25 atoms, TEMP(K) = 0.00
 SC_Etot= 0.0000 SC_EKtot= 0.0000 SC_EPtot = 79.4772
 SC_BOND= 1.7211 SC_ANGLE= 13.7642 SC_DIHED = 18.4959
 SC_14NB= 2.4757 SC_14EEL= 20.1833 SC_VDW = -0.0318
 SC_EEL = 22.8688
 SC_RES_DIST= 0.0000 SC_RES_ANG= 0.0000 SC_RES_TORS= 0.0000
 SC_EEL_DER= 56.6513 SC_VDW_DER= -1.0518 SC_DERIV = 55.5995
 ------------------------------------------------------------------------------

Note that in both cases the minimization seemingly finishes, printing the usual timing information at the end of the output file.
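
Because both runs end with the usual timing information, a normal exit does not tell them apart. Below is a minimal sketch of an automated check on the energy block. It assumes the block keeps the "LABEL = value" layout shown above, and that a field of asterisks means the number did not fit its fixed-width output format. The function names and the 1.0 tolerance (in the energy units of the output) are illustrative choices, not part of AMBER.

```python
import re
from typing import Dict, List, Optional

# One "LABEL = value" pair. Labels may contain a space ("1-4 VDW") or a
# slash ("DV/DL"); a value is either a number or a run of asterisks.
FIELD = re.compile(
    r"([A-Za-z0-9_/-]+(?: [A-Za-z]+)?)\s*=\s*"
    r"(\*+|[-+]?\d+\.\d+(?:[eE][-+]?\d+)?)"
)


def parse_energy_terms(text: str) -> Dict[str, Optional[float]]:
    """Map each energy label to its value, or to None if it printed as '****'."""
    terms: Dict[str, Optional[float]] = {}
    for label, value in FIELD.findall(text):
        terms[label] = None if value.startswith("*") else float(value)
    return terms


def overflowed_terms(terms: Dict[str, Optional[float]]) -> List[str]:
    """Labels whose value overflowed the output field."""
    return [label for label, value in terms.items() if value is None]


def differing_terms(a, b, tol: float = 1.0) -> List[str]:
    """Labels present in both runs that overflowed or differ by more than tol."""
    out = []
    for label, va in a.items():
        if label not in b:
            continue
        vb = b[label]
        if va is None or vb is None or abs(va - vb) > tol:
            out.append(label)
    return out


# Energy blocks copied from the two step-1 outputs quoted in this message.
BAD_RUN = """
 BOND = ************* ANGLE = 4854.3827 DIHED = 7029.8038
 VDWAALS = 93066.3839 EEL = -559385.5182 HBOND = 0.0000
 1-4 VDW = 2030.7940 1-4 EEL = 30573.6921 RESTRAINT = 0.0700
 EAMBER = *************
 DV/DL = 81.9596
"""
GOOD_RUN = """
 BOND = 1666.4578 ANGLE = 4854.3626 DIHED = 7029.8072
 VDWAALS = 59875.1546 EEL = -554800.0321 HBOND = 0.0000
 1-4 VDW = 1906.7559 1-4 EEL = 30573.8373 RESTRAINT = 0.0700
 EAMBER = -448893.6567
 DV/DL = 79.2240
"""

bad = parse_energy_terms(BAD_RUN)
good = parse_energy_terms(GOOD_RUN)
print(overflowed_terms(bad))        # ['BOND', 'EAMBER']
print(differing_terms(bad, good))   # ['BOND', 'VDWAALS', 'EEL', '1-4 VDW', 'EAMBER', 'DV/DL']
```

On the two excerpts above, this flags BOND and EAMBER as overflowed in the first run. It also shows that ANGLE, DIHED, and 1-4 EEL agree between the runs to within the tolerance, while BOND, VDWAALS, EEL, and 1-4 VDW do not.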

How can two identical inputs give these very different outputs? Can I trust the results where the energies look reasonable?

Best,
Matthew







_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Sun Nov 27 2022 - 07:00:02 PST