GTX Titans are NOT currently supported - we are waiting on NVIDIA to fix
the cuFFT library. Until then there is unfortunately nothing that can be
done; the cards will just need to sit idle. Other codes have seen similar
problems, so for now all you can do is sit tight.
Or RMA the cards and get GTX680s instead (or 780s, although some people
have reportedly seen problems with these as well, which I have been unable
to reproduce).
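
For anyone who wants to check existing output files for this kind of
failure, here is a minimal sketch in Python. It is not part of Amber. The
function name first_bad_step, the 300 K target and the 50 K tolerance are
all assumptions chosen for illustration, and the sample lines are copied
from the output quoted below. It reports the first NSTEP whose TEMP(K)
field has either overflowed into asterisks or drifted far from the target.

```python
import math
import re

# Matches an energy record header line such as:
#   NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 301.25 PRESS = 0.0
# The TEMP field may be a run of '*' when the value overflowed its format.
RECORD = re.compile(r"NSTEP\s*=\s*(\d+).*?TEMP\(K\)\s*=\s*(\S+)")


def first_bad_step(mdout_text, target_temp=300.0, tolerance=50.0):
    """Return the first NSTEP with a suspicious TEMP(K), or None.

    target_temp and tolerance are illustrative assumptions; set them to
    match the thermostat settings of the actual run.
    """
    for line in mdout_text.splitlines():
        match = RECORD.search(line)
        if match is None:
            continue
        step = int(match.group(1))
        try:
            temp = float(match.group(2))
        except ValueError:
            # Field was '*********' or otherwise unparseable.
            return step
        if not math.isfinite(temp) or abs(temp - target_temp) > tolerance:
            return step
    return None


gtx580 = " NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 301.25 PRESS = 0.0\n"
titan = (
    " NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 619.32 PRESS = 0.0\n"
    " NSTEP = 80000 TIME(PS) = 280.000 TEMP(K) =********* PRESS = 0.0\n"
)

print(first_bad_step(gtx580))  # None
print(first_bad_step(titan))   # 70000
```

A check like this only flags the symptom; it says nothing about the cause.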
All the best
Ross
On 7/11/13 12:35 AM, "iqtcub" <iqtcub.gmail.com> wrote:
>Hi all,
>
>First of all, I'm just a sysadmin, so my technical Amber knowledge is
>very limited.
>
>Here's the scenario:
>
>We have a machine with a Gigabyte GTX580 (driver 319.32), SLES11 OS and
>CUDA5. We're using Amber 12 with AmberTools 13 updated with the latest
>patches. The compiler used is Intel 11.1.072, but we've also tried the
>GNU compilers that come with SLES11 (gcc version 4.3.4).
>
>This machine works fine.
>
>Now we've bought another machine with four EVGA GTX TITANs (driver
>319.32), SLES11 OS and CUDA5. Same Amber version and patches, compilers,
>etc.
>
>With the input I'm attaching, we're seeing wrong TEMP, Etot and EKtot
>values. The first time, it happened after NSTEP 100000; if I kill the
>job and start it again, it happens after NSTEP 50000 or so. This
>resembles the overheating or memory issues with the GTX TITAN that I've
>read about on the list.
>
>The same job gives correct values on the GTX580, where the output is as
>follows:
>
>#############################
>
> NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 301.25 PRESS = 0.0
> Etot = -112943.3112 EKtot = 47807.0625 EPtot = -160750.3737
> BOND = 24797.9158 ANGLE = 2543.7279 DIHED = 3079.0534
> 1-4 NB = 1035.0690 1-4 EEL = 11250.6176 VDWAALS = 26954.3958
> EELEC = -230411.1531 EHBOND = 0.0000 RESTRAINT = 0.0000
>
>------------------------------------------------------------------------------
>
>check COM velocity, temp: 0.000004 0.00(Removed)
>check COM velocity, temp: 0.000001 0.00(Removed)
>check COM velocity, temp: 0.000003 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000001 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>
>
> NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 301.25 PRESS = 0.0
> Etot = -112943.3112 EKtot = 47807.0625 EPtot = -160750.3737
> BOND = 24797.9158 ANGLE = 2543.7279 DIHED = 3079.0534
> 1-4 NB = 1035.0690 1-4 EEL = 11250.6176 VDWAALS = 26954.3958
> EELEC = -230411.1531 EHBOND = 0.0000 RESTRAINT = 0.0000
>
>------------------------------------------------------------------------------
>
>check COM velocity, temp: 0.000004 0.00(Removed)
>check COM velocity, temp: 0.000001 0.00(Removed)
>check COM velocity, temp: 0.000003 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000001 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>
>#############################
>
>The output on the GTX TITAN, by contrast, is:
>
>#############################
>
> NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 619.32 PRESS = 0.0
> Etot = -61495.6874 EKtot = 98281.8984 EPtot = -159777.5858
> BOND = 29075.5960 ANGLE = 2397.2902 DIHED = 3028.8070
> 1-4 NB = 1015.5372 1-4 EEL = 11294.9046 VDWAALS = 26864.1996
> EELEC = -233453.9203 EHBOND = 0.0000 RESTRAINT = 0.0000
>
>------------------------------------------------------------------------------
>
>check COM velocity, temp: 0.000004 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000001 0.00(Removed)
>check COM velocity, temp: 0.000002 0.00(Removed)
>check COM velocity, temp: 0.000003 0.00(Removed)
>check COM velocity, temp: 1112.636152*********(Removed)
>check COM velocity, temp: 1286.464585*********(Removed)
>check COM velocity, temp: 824.413845*********(Removed)
>check COM velocity, temp: 1106.406956*********(Removed)
>
>
> NSTEP = 80000 TIME(PS) = 280.000 TEMP(K) =********* PRESS = 0.0
> Etot = ************** EKtot = ************** EPtot = **************
> BOND = 0.0000 ANGLE = 423951.1225 DIHED = 14356.6179
> 1-4 NB = 0.0000 1-4 EEL = 0.0067 VDWAALS = **************
> EELEC = -188998.4079 EHBOND = 0.0000 RESTRAINT = 0.0000
>
>------------------------------------------------------------------------------
>
>check COM velocity, temp: 1163.680382*********(Removed)
>check COM velocity, temp: 750.040734*********(Removed)
>check COM velocity, temp: 629.104266*********(Removed)
>check COM velocity, temp: 1465.801815*********(Removed)
>check COM velocity, temp: 637.864373*********(Removed)
>check COM velocity, temp: 1888.864547*********(Removed)
>check COM velocity, temp: 1527.586226*********(Removed)
>check COM velocity, temp: 1659.560655*********(Removed)
>check COM velocity, temp: 953.381316*********(Removed)
>check COM velocity, temp: 1613.977188*********(Removed)
>
>#############################
>
>Is this the same issue that other people are having with the GTX TITAN
>and that is being investigated?
>
>By the way, running either memtest g80 or
>cudagpumemtest (http://sourceforge.net/projects/cudagpumemtest/) after
>the job starts giving those results returns 0 errors.
>
>Thanks in advance!
>
>
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jul 11 2013 - 20:00:03 PDT