Re: [AMBER] GTX670

From: Gould, Ian R <i.gould.imperial.ac.uk>
Date: Mon, 24 Feb 2014 16:33:59 +0000

Dear All,

A few anecdotes. I recently bought four GTX 780 Ti cards and, because the
PSU I wanted was out of stock, I temporarily ran them in older machines
with well-used PSUs; all the motherboards were PCIe 2.0, not 3.0. I ran my
own version of Ross's 24-hour validation tests. One card was completely
unreliable and would not report the same numbers on any rerun.
The other three reported exactly the same numbers over more than 20
repeats of the same run. However, one of them would occasionally fail
with the "cudaMemcpy GpuBuffer::Download failed unspecified launch
failure" error, and its occurrence seemed totally random.
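
The rerun comparison described above can be scripted. What follows is a
minimal Python sketch, not part of Ross's test suite: it assumes each
repeat writes an Amber mdout-style log containing records of the form
"Etot   =   <value>", and the directory layout (GPU_*/*.log) and function
names are hypothetical. It compares the last Etot record of each log as
text, since a reliable card should reproduce the printed digits exactly.
In a full mdout the last Etot record may belong to the averages section,
so adjust the pattern if you want a specific step.

```python
import re
from pathlib import Path

# Matches energy records such as "Etot   =    -58221.4134"
# (assumed mdout-style layout; adjust for your own logs).
ETOT_RE = re.compile(r"Etot\s*=\s*(-?\d+\.\d+)")

def final_etot(text):
    """Return the last Etot value found in a log as a string, or None."""
    matches = ETOT_RE.findall(text)
    return matches[-1] if matches else None

def check_runs(logs):
    """logs maps run name -> log text. Returns (consistent, values)."""
    values = {name: final_etot(text) for name, text in logs.items()}
    distinct = set(values.values())
    # Consistent only if every log yielded the same, non-missing value.
    consistent = len(distinct) == 1 and None not in distinct
    return consistent, values

if __name__ == "__main__":
    # Hypothetical layout: one log per repeat, e.g. GPU_0/run_01.log
    paths = sorted(Path(".").glob("GPU_*/*.log"))
    logs = {str(p): p.read_text() for p in paths}
    ok, values = check_runs(logs)
    for name, value in values.items():
        print(f"{name}: {value}")
    print("consistent" if ok else "MISMATCH or missing energy")
```

A mismatch between repeats on the same card, or between cards, is the
symptom described in this thread; a crash mid-run would show up here as a
missing or differing final energy.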

Recently I received the PSU, a 1350W unit guaranteed to deliver 108A on
the +12V rail, and installed all three cards in a big, i.e. well
ventilated, chassis with an ASRock PCIe 3.0 compliant board. Having rerun
the test jobs in a loop for 72 hours nonstop, I had absolutely no
failures. So my own personal conclusions are:
1) Run Ross's test suite on all new GPU cards and, if they fail, RMA them.
2) Get the biggest and best PSUs you can afford. I would contend that
850W is the minimum for 1 GPU, 1000W for 2 GPUs and 1350W for 3 GPUs.
3) I would not advise trying a "home-brew" 4-GPU GTX solution, as I really
don't think it is easy to get an off-the-shelf PSU rated at 1.5kW or
above.
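
As a quick check on the figures above, a rail's current rating converts
to power as volts times amps, and the per-GPU-count minimums can be kept
in a small lookup. This is a minimal Python sketch; the table and function
names are illustrative, and the wattages are only the rule of thumb stated
in this post, not vendor specifications.

```python
# Rule-of-thumb minimum PSU ratings from this post (watts), keyed by GPU count.
MIN_PSU_WATTS = {1: 850, 2: 1000, 3: 1350}

def rail_watts(amps, volts=12.0):
    """Power available on a rail: P = V * I."""
    return volts * amps

def suggested_min_psu(n_gpus):
    """Return the post's suggested minimum rating, or None beyond 3 GPUs."""
    return MIN_PSU_WATTS.get(n_gpus)

print(rail_watts(108))       # 1296.0 (watts on the +12V rail of the 1350W unit)
print(suggested_min_psu(2))  # 1000
print(suggested_min_psu(4))  # None (the post advises against a home-built 4-GPU box)
```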

Cheers
Ian
 

On 24/02/2014 16:12, "Ross Walker" <ross.rosswalker.co.uk> wrote:

>Hi Robert,
>
>This does indeed sound like faulty GPUs - do they both do it?
>
>First download the latest version of the NVIDIA driver and install it -
>v331.49 from here: http://www.nvidia.com/object/unix.html
>
>See if that helps at all. Then try downloading my test suite:
>
>https://dl.dropboxusercontent.com/u/708185/GPU_Validation_Test_2cards.tar.gz
>
>Run this and it should run for about 24 hours (you might have to tweak the
>run script a little for your setup, paths etc). At the end check the log
>files - all 10 runs should give the same final energy and both GPUs should
>also match. If they don't (or you see crashes during the run) then it
>means your GPUs are faulty.
>
>All the best
>Ross
>
>
>
>On 2/24/14, 3:01 AM, "Deák Robert" <kokumetto.gmail.com> wrote:
>
>>Dear Amber users,
>>
>>Recently we bought 2 GTX 670 DC mini cards (
>>http://www.asus.com/Graphics_Cards/GTX670DCMOC2GD5/), but with both of
>>them I experienced the same error message after a random run time.
>>
>>The message is:
>>*cudaMemcpy GpuBuffer::Download failed unspecified launch failure*
>>
>>With exactly the same input files and settings there are no error
>>messages using a GTX TITAN or a TESLA card. I have tried the GTX 670
>>cards in the other machine, and also a TITAN card in this server, but
>>the error follows the GTX 670 cards, independently of the server.
>>
>>My question is: does this type of error message mean a hardware failure?
>>
>>These are my input parameters:
>> &cntrl
>> imin = 0, irest = 0, ntx = 1,
>> ntb = 2, pres0 = 1.0, ntp = 1,
>> taup = 2.0,
>> cut = 11.0, ntr = 0,
>> ntc = 2, ntf = 2,
>> tempi = 300.0, temp0 = 300.0,
>> ntt = 3, gamma_ln = 0.1,
>> nstlim = 100000000, dt = 0.002,
>> ntpr = 1000, ntwx = 1000, ntwr = 1000,
>> ig=1, nscm=1000
>> /
>>
>>Thanks,
>>Robert
>>_______________________________________________
>>AMBER mailing list
>>AMBER.ambermd.org
>>http://lists.ambermd.org/mailman/listinfo/amber
>
>
>


Received on Mon Feb 24 2014 - 09:00:06 PST