OK, I will run the tests first and then, if necessary, try your suggestions.
Thanks,
Robert
2014-02-24 17:33 GMT+01:00 Gould, Ian R <i.gould.imperial.ac.uk>:
> Dear All,
>
> A few anecdotes. I recently bought 4 GTX 780 Ti cards and, because the PSU
> was out of stock, I ended up temporarily using them in older machines with
> well-used PSUs; all the motherboards were PCIe 2.0, not 3.0. I ran my
> own version of Ross's 24-hour validation tests. One card was completely
> unreliable and would not report the same numbers on any rerun.
> The other three would report exactly the same numbers over more than
> 20 repeats of the same run. However, one of them would
> occasionally give the "cudaMemcpy GpuBuffer::Download failed unspecified
> launch failure" error, and this seemed totally random in its occurrence.
>
> Recently I received the PSU, a 1350W unit guaranteed to deliver 108A on the
> +12V rail, and installed all three cards in a big, well-ventilated chassis
> with an ASRock PCIe 3.0 compliant board. Having rerun the test jobs in a
> loop so that they were tested for 72 hours nonstop, I had absolutely no
> failures. So my own personal conclusion is to:
> 1) run Ross's test suite on all new GPU cards and, if they fail, RMA them
> 2) get the biggest and best PSUs you can afford; I would contend that
> 850W is the minimum for 1 GPU, 1000W for 2 GPUs and 1350W for 3 GPUs
> 3) avoid a "home-brew" 4-GPU GTX solution, as I really
> don't think it is easy to get an off-the-shelf PSU rated at
> 1.5 kW and above
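
For reference, the PSU figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the rail power is simply volts times amps, and the wattage table restates Ian's rule of thumb from the list. The function and variable names are made up for illustration.

```python
RAIL_VOLTS = 12.0  # nominal voltage of the +12V rail


def rail_watts(amps, volts=RAIL_VOLTS):
    """Power available on a rail, from P = V * I."""
    return amps * volts


# Ian's suggested minimum PSU rating (watts) by number of GPUs.
# He gives no figure for 4 GPUs, so none is invented here.
MIN_PSU_WATTS = {1: 850, 2: 1000, 3: 1350}


def psu_meets_minimum(psu_watts, n_gpus):
    """True if the PSU rating meets Ian's suggested minimum for n_gpus.

    Raises KeyError for a GPU count he gives no figure for.
    """
    return psu_watts >= MIN_PSU_WATTS[n_gpus]
```

With 108A on the +12V rail, `rail_watts(108)` gives 1296 W, so almost all of the 1350W rating is available on the rail that feeds the GPUs.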
>
> Cheers
> Ian
>
>
> On 24/02/2014 16:12, "Ross Walker" <ross.rosswalker.co.uk> wrote:
>
> >Hi Robert,
> >
> >This does indeed sound like faulty GPUs - do they both do it?
> >
> >First download the latest version of the NVIDIA driver and install it -
> >v331.49 from here: http://www.nvidia.com/object/unix.html
> >
> >See if that helps at all. Then try downloading my test suite:
> >
> >https://dl.dropboxusercontent.com/u/708185/GPU_Validation_Test_2cards.tar.gz
> >
> >Run this and it should run for about 24 hours (you might have to tweak the
> >run script a little for your setup, paths etc). At the end check the log
> >files - all 10 runs should give the same final energy and both GPUs should
> >also match. If they don't (or you see crashes during the run) then it
> >means your GPUs are faulty.
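
The check Ross describes (every run ending on the same final energy) could be scripted roughly as below. This is a sketch under assumptions, not part of the test suite: it assumes each log is an Amber mdout-style file with energy lines of the form `Etot   =   -58212.1234`, and that any trailing "A V E R A G E S" section should be ignored. The function names are hypothetical. Energies are compared as printed strings, since the runs are expected to match exactly.

```python
import re
import sys
from collections import defaultdict

# Matches "Etot   =   -58212.1234"; does not match EKtot or EPtot.
ETOT_RE = re.compile(r"Etot\s*=\s*(-?\d+\.\d+)")


def final_etot(text):
    """Return the last Etot value (as printed) before any averages section."""
    last = None
    for line in text.splitlines():
        if "A V E R A G E S" in line:  # assumed start of the summary block
            break
        match = ETOT_RE.search(line)
        if match:
            last = match.group(1)
    return last


def group_by_energy(logs):
    """Map each final energy string to the sorted names of logs reporting it."""
    groups = defaultdict(list)
    for name, text in logs.items():
        groups[final_etot(text)].append(name)
    return {energy: sorted(names) for energy, names in groups.items()}


def all_match(logs):
    """True if every log has a final energy and they are all identical."""
    groups = group_by_energy(logs)
    return len(groups) == 1 and None not in groups


if __name__ == "__main__" and len(sys.argv) > 1:
    contents = {}
    for path in sys.argv[1:]:
        with open(path) as handle:
            contents[path] = handle.read()
    for energy, names in group_by_energy(contents).items():
        print(energy, names)
    print("all runs match" if all_match(contents) else "MISMATCH")
```

Passing all the log files from both GPUs on the command line prints one line per distinct final energy; more than one line, or a log with no energy at all, points to a problem.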
> >
> >All the best
> >Ross
> >
> >
> >
> >On 2/24/14, 3:01 AM, "Deák Robert" <kokumetto.gmail.com> wrote:
> >
> >>Dear Amber users,
> >>
> >>Recently we bought 2 GTX 670 DC mini cards (
> >>http://www.asus.com/Graphics_Cards/GTX670DCMOC2GD5/), but with both of
> >>them I got the same error message after a random run time.
> >>
> >>The message is:
> >>*cudaMemcpy GpuBuffer::Download failed unspecified launch failure*
> >>
> >>With exactly the same input files and input settings there are no error
> >>messages using a GTX TITAN or a TESLA card. I have tried the GTX 670
> >>cards in the other machine, and also a TITAN card in this server, but the
> >>error follows the GTX 670 cards, independently of the server.
> >>
> >>My question is: does this type of error message mean a hardware failure?
> >>
> >>These are my input parameters:
> >> &cntrl
> >> imin = 0, irest = 0, ntx = 1,
> >> ntb = 2, pres0 = 1.0, ntp = 1,
> >> taup = 2.0,
> >> cut = 11.0, ntr = 0,
> >> ntc = 2, ntf = 2,
> >> tempi = 300.0, temp0 = 300.0,
> >> ntt = 3, gamma_ln = 0.1,
> >> nstlim = 100000000, dt = 0.002,
> >> ntpr = 1000, ntwx = 1000, ntwr = 1000,
> >> ig=1, nscm=1000
> >> /
> >>
> >>Thanks,
> >>Robert
> >>_______________________________________________
> >>AMBER mailing list
> >>AMBER.ambermd.org
> >>http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Feb 24 2014 - 09:00:07 PST