[AMBER] Wrong results in GTX TITAN, correct results on GTX580

From: iqtcub <iqtcub.gmail.com>
Date: Thu, 11 Jul 2013 09:35:47 +0200

Hi all,

First of all, I'm just a sysadmin, so my technical Amber knowledge is
very limited.

Here's the scenario:

We have a machine with a Gigabyte GTX580 (driver 319.32), SLES11 OS and
CUDA 5. We're using Amber 12 with AmberTools 13, updated with the latest
patches. The compiler used is Intel 11.1.072, but we've also tried the
GNU compilers that come with SLES11 (gcc version 4.3.4).

This machine works fine.

Now we've bought another machine with four EVGA GTX TITAN cards (driver
319.32), SLES11 OS and CUDA 5. Same Amber version and patches, compilers,
etc.

With the input I'm attaching, we're seeing wrong TEMP, Etot and EKtot
values. The first time, it happens after NSTEP 100000; if I kill the
job and start it again, it happens after NSTEP 50000 or so. It looks like
some of the overheating memory issues with the GTX TITAN that I've read
about on the list.

The same job gives correct values on the GTX580. The output there is as
follows:

#############################

   NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 301.25 PRESS = 0.0
  Etot = -112943.3112 EKtot = 47807.0625 EPtot = -160750.3737
  BOND = 24797.9158 ANGLE = 2543.7279 DIHED = 3079.0534
  1-4 NB = 1035.0690 1-4 EEL = 11250.6176 VDWAALS = 26954.3958
  EELEC = -230411.1531 EHBOND = 0.0000 RESTRAINT = 0.0000
  ------------------------------------------------------------------------------

check COM velocity, temp: 0.000004 0.00(Removed)
check COM velocity, temp: 0.000001 0.00(Removed)
check COM velocity, temp: 0.000003 0.00(Removed)
check COM velocity, temp: 0.000002 0.00(Removed)
check COM velocity, temp: 0.000002 0.00(Removed)
check COM velocity, temp: 0.000002 0.00(Removed)
check COM velocity, temp: 0.000001 0.00(Removed)
check COM velocity, temp: 0.000002 0.00(Removed)
check COM velocity, temp: 0.000002 0.00(Removed)
check COM velocity, temp: 0.000002 0.00(Removed)



#############################

The output on the GTX TITAN, by contrast, is:

#############################

  NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 619.32 PRESS = 0.0
  Etot = -61495.6874 EKtot = 98281.8984 EPtot = -159777.5858
  BOND = 29075.5960 ANGLE = 2397.2902 DIHED = 3028.8070
  1-4 NB = 1015.5372 1-4 EEL = 11294.9046 VDWAALS = 26864.1996
  EELEC = -233453.9203 EHBOND = 0.0000 RESTRAINT = 0.0000
  ------------------------------------------------------------------------------

check COM velocity, temp: 0.000004 0.00(Removed)
check COM velocity, temp: 0.000002 0.00(Removed)
check COM velocity, temp: 0.000002 0.00(Removed)
check COM velocity, temp: 0.000001 0.00(Removed)
check COM velocity, temp: 0.000002 0.00(Removed)
check COM velocity, temp: 0.000003 0.00(Removed)
check COM velocity, temp: 1112.636152*********(Removed)
check COM velocity, temp: 1286.464585*********(Removed)
check COM velocity, temp: 824.413845*********(Removed)
check COM velocity, temp: 1106.406956*********(Removed)


  NSTEP = 80000 TIME(PS) = 280.000 TEMP(K) =********* PRESS = 0.0
  Etot = ************** EKtot = ************** EPtot = **************
  BOND = 0.0000 ANGLE = 423951.1225 DIHED = 14356.6179
  1-4 NB = 0.0000 1-4 EEL = 0.0067 VDWAALS = **************
  EELEC = -188998.4079 EHBOND = 0.0000 RESTRAINT = 0.0000
  ------------------------------------------------------------------------------

check COM velocity, temp: 1163.680382*********(Removed)
check COM velocity, temp: 750.040734*********(Removed)
check COM velocity, temp: 629.104266*********(Removed)
check COM velocity, temp: 1465.801815*********(Removed)
check COM velocity, temp: 637.864373*********(Removed)
check COM velocity, temp: 1888.864547*********(Removed)
check COM velocity, temp: 1527.586226*********(Removed)
check COM velocity, temp: 1659.560655*********(Removed)
check COM velocity, temp: 953.381316*********(Removed)
check COM velocity, temp: 1613.977188*********(Removed)

#############################
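
To find where a run first goes wrong without reading the whole mdout by
eye, something like the following could be used. This is only a rough
sketch in Python, assuming the energy records look like the ones pasted
above (NSTEP and TEMP(K) on the same line). The function name
first_bad_step and the 400 K threshold are made up for illustration and
are not part of Amber.

```python
import re

# Matches "NSTEP = 70000 ... TEMP(K) = 301.25" as well as the overflowed
# form "TEMP(K) =*********" that appears once the run has blown up.
RECORD = re.compile(r"NSTEP\s*=\s*(\d+).*?TEMP\(K\)\s*=\s*(\S+)")


def first_bad_step(lines, max_temp=400.0):
    """Return (nstep, temp_field) for the first suspicious record, or None.

    max_temp is an arbitrary illustrative threshold, not an Amber default;
    pick a value that suits the target temperature of the simulation.
    """
    for line in lines:
        match = RECORD.search(line)
        if match is None:
            continue  # not an energy record header line
        nstep, field = int(match.group(1)), match.group(2)
        if "*" in field:
            # The value no longer fits the fixed-width output field.
            return nstep, field
        try:
            temp = float(field)
        except ValueError:
            return nstep, field  # unparseable value, treat as suspicious
        if not temp <= max_temp:  # written this way so NaN is flagged too
            return nstep, field
    return None
```

Passing an open mdout file handle (or any list of lines) to
first_bad_step gives the first step whose temperature is overflowed or
above the threshold. With the TITAN output above and the default
threshold, the NSTEP = 70000 record (619.32 K) would be the first one
flagged.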

Is this the same issue that other people are having with the GTX TITAN
and that's being investigated?

By the way, running either MemtestG80 or
cudagpumemtest (http://sourceforge.net/projects/cudagpumemtest/) after
the job starts giving those results returns 0 errors.

Thanks in advance!




_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Thu Jul 11 2013 - 01:00:03 PDT