Re: [AMBER] GTX670

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 24 Feb 2014 09:26:52 -0800

If you've already tried 331.20 or 331.38, then 331.49 is certainly not going to
help, so I'd skip that. Try one GPU at a time in the machine; that might
tell you whether it is a power supply issue. I've also found that simply
reseating the GPUs can sometimes help. Beyond that, it's a faulty GPU.


On 2/24/14, 8:30 AM, "Deák Robert" <kokumetto.gmail.com> wrote:

>Thanks Scott and Ross!
>
>I have 23950 atoms, and it happens at random points given the same input,
>as you wrote, Scott.
>Both cards have produced this error from the very first use.
>
>I have tried the 319.60, 331.20 and 331.38 drivers. Now I will try the 331.49
>driver and the test tool from Ross, then I will write again.
>
>All the best,
>Robert
>
>
>2014-02-24 17:12 GMT+01:00 Ross Walker <ross.rosswalker.co.uk>:
>
>> Hi Robert,
>>
>> This does indeed sound like faulty GPUs - do they both do it?
>>
>> First download the latest version of the NVIDIA driver and install it -
>> v331.49 from here: http://www.nvidia.com/object/unix.html
>>
>> See if that helps at all. Then try downloading my test suite:
>>
>>
>> https://dl.dropboxusercontent.com/u/708185/GPU_Validation_Test_2cards.tar.gz
>>
>> Run this and it should run for about 24 hours (you might have to tweak the
>> run script a little for your setup, paths, etc.). At the end, check the log
>> files: all 10 runs should give the same final energy, and both GPUs should
>> also match. If they don't (or you see crashes during the run), then it
>> means your GPUs are faulty.
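A minimal Python sketch of the log check described above, assuming the logs are pmemd.cuda mdout-style files whose per-step energy printouts contain lines like `Etot   =   -189632.4432` and whose end-of-run summary starts with an `A V E R A G E S` banner. The script name, the file paths, and the helpers `final_etot` and `check_runs` are hypothetical and are not part of Ross's test suite.

```python
import re
import sys
from pathlib import Path
from typing import Dict, Optional

# Assumption: logs are pmemd.cuda mdout-style files in which each energy
# printout contains a line like " Etot   =   -189632.4432  EKtot   = ...".
ETOT_RE = re.compile(r"\bEtot\s*=\s*(-?\d+\.\d+)")
# Assumption: the summary blocks at the end of mdout start with this banner;
# Etot values after it are averages/fluctuations, not the final step.
AVERAGES_BANNER = "A V E R A G E S"


def final_etot(log_text: str) -> Optional[str]:
    """Return the last per-step Etot exactly as printed, or None if absent."""
    per_step_part = log_text.split(AVERAGES_BANNER, 1)[0]
    matches = ETOT_RE.findall(per_step_part)
    return matches[-1] if matches else None


def check_runs(logs: Dict[str, str]) -> bool:
    """True only if every log has a final Etot and all of them are identical."""
    finals = {name: final_etot(text) for name, text in logs.items()}
    for name, value in sorted(finals.items()):
        print(f"{name}: {value if value is not None else 'no Etot found (crashed?)'}")
    values = set(finals.values())
    return bool(finals) and None not in values and len(values) == 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_energies.py GPU0/*.out GPU1/*.out
    logs = {p: Path(p).read_text() for p in sys.argv[1:]}
    ok = check_runs(logs)
    print("PASS: all final energies match" if ok else "FAIL: mismatch or crash")
    sys.exit(0 if ok else 1)
```

The values are compared as printed strings rather than as floats, mirroring the expectation that repeated runs give exactly the same final energy. A run that crashes part way (for example with the cudaMemcpy error) either has no Etot at all or stops at an earlier step, so its last value will normally differ and the check fails.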
>>
>> All the best
>> Ross
>>
>>
>>
>> On 2/24/14, 3:01 AM, "Deák Robert" <kokumetto.gmail.com> wrote:
>>
>> >Dear Amber users,
>> >
>> >Recently we bought 2 GTX 670 DC mini cards
>> >(http://www.asus.com/Graphics_Cards/GTX670DCMOC2GD5/), but with both of them
>> >I experienced the same error message after a random run time.
>> >
>> >The message is:
>> >*cudaMemcpy GpuBuffer::Download failed unspecified launch failure*
>> >
>> >With exactly the same input files and input settings, there are no error
>> >messages using a GTX TITAN or a TESLA card. I have tried the GTX 670 cards
>> >in the other machine, and also a TITAN card in this server, but the error
>> >is tied to the GTX 670 cards, independent of the server.
>> >
>> >My question is: does this type of error message mean a hardware failure?
>> >
>> >These are my input parameters:
>> > &cntrl
>> > imin = 0, irest = 0, ntx = 1,
>> > ntb = 2, pres0 = 1.0, ntp = 1,
>> > taup = 2.0,
>> > cut = 11.0, ntr = 0,
>> > ntc = 2, ntf = 2,
>> > tempi = 300.0, temp0 = 300.0,
>> > ntt = 3, gamma_ln = 0.1,
>> > nstlim = 100000000, dt = 0.002,
>> > ntpr = 1000, ntwx = 1000, ntwr = 1000,
>> > ig=1, nscm=1000
>> > /
>> >
>> >Thanks,
>> >Robert
>> >_______________________________________________
>> >AMBER mailing list
>> >AMBER.ambermd.org
>> >http://lists.ambermd.org/mailman/listinfo/amber
>>

Received on Mon Feb 24 2014 - 09:30:05 PST