[AMBER] sander14 MPI produces different results depending on number of used cores

From: David Abia <dabia.cbm.csic.es>
Date: Tue, 03 Jun 2014 14:00:53 +0200

Hi,

I'm testing sander.MPI from AmberTools 14 and I'm getting different
results depending on how many cores I use. This behaviour is independent
of the compiler (Intel or GNU) and of the MPI library (Intel MPI or
MVAPICH2). All 3 available patches have been applied to the source code,
and the tests were run on a server with 2 Xeon(R) E5645 CPUs and
CentOS 6 (kernel 3.10.30-1.el6.elrepo.x86_64). I'm testing with one of
the Amber benchmarks (Amber12_GPU_Benchmark_Suite.tar.gz), the
PME JAC_production_NPT case, using 4 or 8 cores. I've reduced the number
of steps to 4000 and I'm printing the status of the simulation every 500
steps. Small differences are already visible at the first printout,
after 500 steps.

These are the results using Intel composer_xe_2013_sp1.3.174, with MKL
and the Intel MPI library 4.1.3.049. The FFTW library is the one provided
with AmberTools:

After 500 steps:

4 cores:
  NSTEP = 500 TIME(PS) = 7.000 TEMP(K) = 294.55 PRESS = 37.1
  Etot = -58125.3527 EKtot = 14159.3567 EPtot = -72284.7093
  BOND = 453.0272 ANGLE = 1164.3293 DIHED = 970.1314
  1-4 NB = 553.6299 1-4 EEL = 6577.2786 VDWAALS = 8233.1300
  EELEC = -90236.2357 EHBOND = 0.0000 RESTRAINT = 0.0000
  EKCMT = 6313.9290 VIRIAL = 6121.0412 VOLUME = 240517.1837
                                       Density = 0.9978
  Ewald error estimate: 0.6831E-05

8 cores:
  NSTEP = 500 TIME(PS) = 7.000 TEMP(K) = 294.55 PRESS = 37.1
  Etot = -58125.3518 EKtot = 14159.3582 EPtot = -72284.7100
  BOND = 453.0272 ANGLE = 1164.3292 DIHED = 970.1314
  1-4 NB = 553.6299 1-4 EEL = 6577.2785 VDWAALS = 8233.1311
  EELEC = -90236.2374 EHBOND = 0.0000 RESTRAINT = 0.0000
  EKCMT = 6313.9290 VIRIAL = 6121.0274 VOLUME = 240517.1837
                                       Density = 0.9978
  Ewald error estimate: 0.7043E-05

After 4000 steps:

4 cores:
  NSTEP = 4000 TIME(PS) = 14.000 TEMP(K) = 299.64 PRESS = -368.2
  Etot = -58098.2269 EKtot = 14404.0825 EPtot = -72502.3095
  BOND = 459.1074 ANGLE = 1237.4373 DIHED = 992.3646
  1-4 NB = 555.0881 1-4 EEL = 6542.3732 VDWAALS = 8417.1990
  EELEC = -90705.8791 EHBOND = 0.0000 RESTRAINT = 0.0000
  EKCMT = 6346.4178 VIRIAL = 8237.9884 VOLUME = 237955.4669
                                       Density = 1.0085
  Ewald error estimate: 0.7797E-04

8 cores:
  NSTEP = 4000 TIME(PS) = 14.000 TEMP(K) = 297.79 PRESS = -510.4
  Etot = -58097.7496 EKtot = 14314.9066 EPtot = -72412.6562
  BOND = 444.4120 ANGLE = 1248.1361 DIHED = 994.2033
  1-4 NB = 535.1447 1-4 EEL = 6641.6835 VDWAALS = 8121.4087
  EELEC = -90397.6444 EHBOND = 0.0000 RESTRAINT = 0.0000
  EKCMT = 6280.7656 VIRIAL = 8904.3312 VOLUME = 238085.7473
                                       Density = 1.0079
  Ewald error estimate: 0.6962E-05

Compiling with gcc, gfortran (Red Hat 4.4.7-4) and mvapich2-2.0rc1-1,
something similar occurs:

4 cores:
  NSTEP = 4000 TIME(PS) = 14.000 TEMP(K) = 295.61 PRESS = -252.2
  Etot = -58090.6641 EKtot = 14210.1483 EPtot = -72300.8124
  BOND = 478.0411 ANGLE = 1246.9681 DIHED = 987.7397
  1-4 NB = 537.7395 1-4 EEL = 6546.0346 VDWAALS = 8426.7697
  EELEC = -90524.1051 EHBOND = 0.0000 RESTRAINT = 0.0000
  EKCMT = 6288.5345 VIRIAL = 7585.9111 VOLUME = 238235.8018
                                       Density = 1.0073
  Ewald error estimate: 0.1385E-04

8 cores:
  NSTEP = 4000 TIME(PS) = 14.000 TEMP(K) = 298.88 PRESS = -203.4
  Etot = -58085.9769 EKtot = 14367.4331 EPtot = -72453.4100
  BOND = 421.5745 ANGLE = 1231.9493 DIHED = 968.5337
  1-4 NB = 551.8598 1-4 EEL = 6565.7616 VDWAALS = 8185.8820
  EELEC = -90378.9710 EHBOND = 0.0000 RESTRAINT = 0.0000
  EKCMT = 6298.3416 VIRIAL = 7342.9989 VOLUME = 237879.9709
                                       Density = 1.0088
  Ewald error estimate: 0.1709E-03


There are no differences in consecutive runs with the same number of
cores, which produce exactly the same numbers.
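
To make the size of the early differences easier to see, here is a
minimal sketch of a script that diffs two energy records term by term.
It is only an illustration: the helper names (parse_record,
diff_records) and the regular expression are hypothetical and not part
of AmberTools, and the two records are the 500-step Intel outputs quoted
above, pasted in by hand.

```python
import re

# Energy records after 500 steps (Intel build), copied from the outputs above.
REC_4_CORES = """
NSTEP = 500 TIME(PS) = 7.000 TEMP(K) = 294.55 PRESS = 37.1
Etot = -58125.3527 EKtot = 14159.3567 EPtot = -72284.7093
BOND = 453.0272 ANGLE = 1164.3293 DIHED = 970.1314
1-4 NB = 553.6299 1-4 EEL = 6577.2786 VDWAALS = 8233.1300
EELEC = -90236.2357 EHBOND = 0.0000 RESTRAINT = 0.0000
EKCMT = 6313.9290 VIRIAL = 6121.0412 VOLUME = 240517.1837
Density = 0.9978
Ewald error estimate: 0.6831E-05
"""

REC_8_CORES = """
NSTEP = 500 TIME(PS) = 7.000 TEMP(K) = 294.55 PRESS = 37.1
Etot = -58125.3518 EKtot = 14159.3582 EPtot = -72284.7100
BOND = 453.0272 ANGLE = 1164.3292 DIHED = 970.1314
1-4 NB = 553.6299 1-4 EEL = 6577.2785 VDWAALS = 8233.1311
EELEC = -90236.2374 EHBOND = 0.0000 RESTRAINT = 0.0000
EKCMT = 6313.9290 VIRIAL = 6121.0274 VOLUME = 240517.1837
Density = 0.9978
Ewald error estimate: 0.7043E-05
"""

# Matches "LABEL = value" pairs, including two-word labels such as "1-4 NB".
# The colon-delimited "Ewald error estimate" line is deliberately not parsed.
FIELD = re.compile(r"([A-Za-z0-9()\-]+(?: [A-Za-z]+)?)\s*=\s*(-?\d+(?:\.\d+)?)")


def parse_record(text):
    """Return a {label: value} dict for one printed energy record."""
    return {label: float(value) for label, value in FIELD.findall(text)}


def diff_records(a, b):
    """Return b - a for every label present in both records."""
    return {label: b[label] - a[label] for label in a if label in b}


rec4 = parse_record(REC_4_CORES)
rec8 = parse_record(REC_8_CORES)
diffs = diff_records(rec4, rec8)

# Keep only the terms whose printed values differ between the two runs.
changed = {label: delta for label, delta in diffs.items() if delta != 0.0}

for label, delta in changed.items():
    print(f"{label:10s} {rec4[label]:14.4f} {rec8[label]:14.4f} {delta:+.4f}")
```

For these two records the script lists only the terms that differ, and
at 500 steps every difference is in the last printed digits.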

Any idea of what could be happening?

Best regards!



David Abia
Bioinformatics Unit CBM-SO





_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jun 03 2014 - 05:30:02 PDT