Re: [AMBER] Wrong results in GTX TITAN, correct results on GTX580

From: iqtcub <iqtcub.gmail.com>
Date: Mon, 15 Jul 2013 09:08:59 +0200

Hi all,

All right. Thanks for confirming that the errors are "expected".

We'll try to exchange them for GTX680s; otherwise we'll have to wait.

Thanks
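
In case it helps anyone triaging similar runs, here is a minimal sketch of a
script that scans mdout text for the symptoms quoted below: a TEMP(K) field
that has overflowed into asterisks, or a temperature far above the target.
The function name and the 400 K threshold are my own assumptions for
illustration and are not part of Amber; adjust the threshold to your
thermostat setting.

```python
import re

# Matches the first line of an mdout energy record, e.g.
#  NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 301.25 PRESS = 0.0
# The TEMP(K) value is captured as raw text because an overflowed
# Fortran field is printed as a run of asterisks.
RECORD = re.compile(r"NSTEP\s*=\s*(\d+).*?TEMP\(K\)\s*=\s*(\S+)")


def find_suspect_steps(mdout_text, max_temp=400.0):
    """Return (nstep, reason) pairs for records that look corrupted.

    max_temp is a hypothetical sanity threshold in kelvin, not an
    Amber setting.
    """
    suspects = []
    for match in RECORD.finditer(mdout_text):
        nstep = int(match.group(1))
        field = match.group(2)
        if "*" in field:
            # Value did not fit the output format width.
            suspects.append((nstep, "overflow"))
            continue
        try:
            temp = float(field)
        except ValueError:
            suspects.append((nstep, "unparseable"))
            continue
        if temp > max_temp:
            suspects.append((nstep, "temperature %.2f K" % temp))
    return suspects
```

Calling find_suspect_steps(open("mdout").read()) on each card's output file
gives a quick list of the steps worth looking at; "mdout" here is just a
placeholder file name.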

On 07/12/2013 04:48 AM, Ross Walker wrote:
> GTX Titans are NOT currently supported - we are waiting on NVIDIA to fix
> the cuFFT library. Until then there is nothing that can be done,
> unfortunately; the cards will just need to sit idle. Other codes have also
> seen similar problems, so for now all you can do is sit tight.
>
> Or RMA the cards and get GTX680s instead (or 780s although some people
> have reportedly seen problems with these as well but I have been unable to
> repro).
>
> All the best
> Ross
>
>
>
> On 7/11/13 12:35 AM, "iqtcub" <iqtcub.gmail.com> wrote:
>
>> Hi all,
>>
>> First of all, I'm just a sysadmin, so my technical Amber knowledge is
>> very limited.
>>
>> Here's the scenario:
>>
>> We have a machine with a Gigabyte GTX580 (driver 319.32), SLES11 OS and
>> CUDA5. We're using Amber 12 with AmberTools 13 updated with the latest
>> patches. The compiler used is Intel 11.1.072, but we've also tried
>> the GNU compilers that come with SLES11 (gcc version 4.3.4).
>>
>> This machine works fine.
>>
>> Now we've bought another machine with four EVGA GTX TITANs (driver
>> 319.32), SLES11 OS and CUDA5. Same Amber version and patches, compilers,
>> etc.
>>
>> With the input I'm attaching, we're seeing wrong TEMP, Etot and EKtot
>> values. The first time, it happens after NSTEP 100000; if I kill the
>> job and start it again, it happens after NSTEP 50000 or so. This resembles
>> some of the overheating memory issues I've read about on the list that
>> happen with the GTX TITAN.
>>
>> The same job gives correct values on the GTX580. The
>> output is as follows:
>>
>> #############################
>>
>> NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 301.25 PRESS = 0.0
>> Etot = -112943.3112 EKtot = 47807.0625 EPtot = -160750.3737
>> BOND = 24797.9158 ANGLE = 2543.7279 DIHED = 3079.0534
>> 1-4 NB = 1035.0690 1-4 EEL = 11250.6176 VDWAALS = 26954.3958
>> EELEC = -230411.1531 EHBOND = 0.0000 RESTRAINT = 0.0000
>>
>> ------------------------------------------------------------------------------
>>
>> check COM velocity, temp: 0.000004 0.00(Removed)
>> check COM velocity, temp: 0.000001 0.00(Removed)
>> check COM velocity, temp: 0.000003 0.00(Removed)
>> check COM velocity, temp: 0.000002 0.00(Removed)
>> check COM velocity, temp: 0.000002 0.00(Removed)
>> check COM velocity, temp: 0.000002 0.00(Removed)
>> check COM velocity, temp: 0.000001 0.00(Removed)
>> check COM velocity, temp: 0.000002 0.00(Removed)
>> check COM velocity, temp: 0.000002 0.00(Removed)
>> check COM velocity, temp: 0.000002 0.00(Removed)
>>
>> #############################
>>
>> The output on the GTX TITAN, by contrast, is:
>>
>> #############################
>>
>> NSTEP = 70000 TIME(PS) = 270.000 TEMP(K) = 619.32 PRESS = 0.0
>> Etot = -61495.6874 EKtot = 98281.8984 EPtot = -159777.5858
>> BOND = 29075.5960 ANGLE = 2397.2902 DIHED = 3028.8070
>> 1-4 NB = 1015.5372 1-4 EEL = 11294.9046 VDWAALS = 26864.1996
>> EELEC = -233453.9203 EHBOND = 0.0000 RESTRAINT = 0.0000
>>
>> ------------------------------------------------------------------------------
>>
>> check COM velocity, temp: 0.000004 0.00(Removed)
>> check COM velocity, temp: 0.000002 0.00(Removed)
>> check COM velocity, temp: 0.000002 0.00(Removed)
>> check COM velocity, temp: 0.000001 0.00(Removed)
>> check COM velocity, temp: 0.000002 0.00(Removed)
>> check COM velocity, temp: 0.000003 0.00(Removed)
>> check COM velocity, temp: 1112.636152*********(Removed)
>> check COM velocity, temp: 1286.464585*********(Removed)
>> check COM velocity, temp: 824.413845*********(Removed)
>> check COM velocity, temp: 1106.406956*********(Removed)
>>
>>
>> NSTEP = 80000 TIME(PS) = 280.000 TEMP(K) =********* PRESS = 0.0
>> Etot = ************** EKtot = ************** EPtot = **************
>> BOND = 0.0000 ANGLE = 423951.1225 DIHED = 14356.6179
>> 1-4 NB = 0.0000 1-4 EEL = 0.0067 VDWAALS = **************
>> EELEC = -188998.4079 EHBOND = 0.0000 RESTRAINT = 0.0000
>>
>> ------------------------------------------------------------------------------
>>
>> check COM velocity, temp: 1163.680382*********(Removed)
>> check COM velocity, temp: 750.040734*********(Removed)
>> check COM velocity, temp: 629.104266*********(Removed)
>> check COM velocity, temp: 1465.801815*********(Removed)
>> check COM velocity, temp: 637.864373*********(Removed)
>> check COM velocity, temp: 1888.864547*********(Removed)
>> check COM velocity, temp: 1527.586226*********(Removed)
>> check COM velocity, temp: 1659.560655*********(Removed)
>> check COM velocity, temp: 953.381316*********(Removed)
>> check COM velocity, temp: 1613.977188*********(Removed)
>>
>> #############################
>>
>> Is this the same issue that other people are having with the GTX TITAN
>> and that's being investigated?
>>
>> By the way, running either memtestG80 or
>> cudagpumemtest (http://sourceforge.net/projects/cudagpumemtest/) after
>> the job starts giving those results returns 0 errors.
>>
>> Thanks in advance!
>>
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber

Received on Mon Jul 15 2013 - 00:30:02 PDT