Now it's clear to me that the GTX 670 cards are faulty.
I tried them in a system where they replaced well-working TESLA cards. A
TESLA card consumes 250 W, these GTX 670 cards "only" 150 W, so power is
not the issue in this case. In spite of this, only 3 out of 10 runs were
reproducible; the other runs ended in an error message or gave different
energy values.
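
A reproducibility check like the one described above could be sketched as follows. This is only an illustration, not the test suite's own script: it assumes each run's log prints energies in the form "Etot   =   -12345.6789", and the names final_etot, group_runs and reproducible are made up for this example.

```python
import re

# Assumption: energies appear in the log as "Etot   =   -12345.6789".
# The \b keeps the pattern from matching inside other labels.
ETOT = re.compile(r"\bEtot\s*=\s*(-?\d+\.\d+)")


def final_etot(log_text):
    """Return the last Etot value in the log text, or None if none was printed."""
    matches = ETOT.findall(log_text)
    return float(matches[-1]) if matches else None


def group_runs(logs):
    """Map each distinct final energy (None for a run without one) to run names."""
    groups = {}
    for name, text in logs.items():
        groups.setdefault(final_etot(text), []).append(name)
    return groups


def reproducible(logs):
    """True only if every run finished and all final energies are identical."""
    groups = group_runs(logs)
    return len(groups) == 1 and None not in groups
```

Here logs is a dict mapping a run name to the text of its log file. With identical input and a fixed random seed, more than one group, or a None group from a crashed run, would point to the kind of failure described above.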
During the test I monitored the temperature of the cards, and one of
them went as high as 103 degrees C.
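
For anyone wanting to log the temperatures during such a test, a minimal sketch is given below. It assumes nvidia-smi is on the PATH and supports the --query-gpu option; the function names and the 90 C default limit are arbitrary choices for illustration, not vendor-specified values.

```python
import subprocess


def parse_temperatures(csv_text):
    """Parse 'index, temperature' lines into a {gpu_index: degrees_C} dict."""
    readings = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        readings[int(index)] = float(temp)
    return readings


def read_temperatures():
    """Ask nvidia-smi for the current temperature of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temperatures(result.stdout)


def overheating(readings, limit_c=90.0):
    """Return the indices of GPUs at or above limit_c (an assumed threshold)."""
    return sorted(gpu for gpu, temp in readings.items() if temp >= limit_c)
```

Calling overheating(read_temperatures()) in a loop with a sleep between calls, while the simulation is running, would show whether a card climbs toward the values seen here.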
So I will ask the supplier to exchange the cards for a different model,
not the GTX 670.
Thanks for your help!
Regards,
Robert
2014-02-24 18:26 GMT+01:00 Ross Walker <ross.rosswalker.co.uk>:
> If you've already tried 331.20 or 38 then .49 is for sure not going to
> help so I'd skip that. Try one GPU at a time in the machine - that might
> tell you if it is a power supply issue. Also I've found that just
> reseating the GPUs can help sometimes. Beyond that it's a faulty GPU.
>
>
> On 2/24/14, 8:30 AM, "Deák Robert" <kokumetto.gmail.com> wrote:
>
> >Thanks Scott and Ross!
> >
> >I have 23950 atoms and it's happening at random places given the same
> >input as you wrote, Scott.
> >The cards (both of them) produce the mentioned error from the first use
> >...
> >
> >I have tried the 319.60, 331.20, 331.38 drivers. Now, I will try the
> >331.49 driver, and the test tool from Ross, then I will write again.
> >
> >All the best,
> >Robert
> >
> >
> >2014-02-24 17:12 GMT+01:00 Ross Walker <ross.rosswalker.co.uk>:
> >
> >> Hi Robert,
> >>
> >> This does indeed sound like faulty GPUs - do they both do it?
> >>
> >> First download the latest version of the NVIDIA driver and install it -
> >> v331.49 from here: http://www.nvidia.com/object/unix.html
> >>
> >> See if that helps at all. Then try downloading my test suite:
> >>
> >>
> >> https://dl.dropboxusercontent.com/u/708185/GPU_Validation_Test_2cards.tar.gz
> >>
> >> Run this and it should run for about 24 hours (you might have to tweak
> >> the run script a little for your setup, paths etc). At the end check the
> >> log files - all 10 runs should give the same final energy and both GPUs
> >> should also match. If they don't (or you see crashes during the run) then
> >> it means your GPUs are faulty.
> >>
> >> All the best
> >> Ross
> >>
> >>
> >>
> >> On 2/24/14, 3:01 AM, "Deák Robert" <kokumetto.gmail.com> wrote:
> >>
> >> >Dear Amber users,
> >> >
> >> >Recently we bought 2 GTX 670 DC mini (
> >> >http://www.asus.com/Graphics_Cards/GTX670DCMOC2GD5/) but with both of
> >> >them I experienced the same error message after random run time.
> >> >
> >> >The message is:
> >> >*cudaMemcpy GpuBuffer::Download failed unspecified launch failure*
> >> >
> >> >With exactly the same input files and input settings there are no error
> >> >messages using a GTX TITAN or a TESLA card. I have tried the GTX 670
> >> >cards in the other machine, and also a TITAN card in this server, but the
> >> >error is related to GTX 670 cards, independently from the server.
> >> >
> >> >My question is: does this type of error message mean a hardware failure?
> >> >
> >> >These are my input parameters:
> >> > &cntrl
> >> > imin = 0, irest = 0, ntx = 1,
> >> > ntb = 2, pres0 = 1.0, ntp = 1,
> >> > taup = 2.0,
> >> > cut = 11.0, ntr = 0,
> >> > ntc = 2, ntf = 2,
> >> > tempi = 300.0, temp0 = 300.0,
> >> > ntt = 3, gamma_ln = 0.1,
> >> > nstlim = 100000000, dt = 0.002,
> >> > ntpr = 1000, ntwx = 1000, ntwr = 1000,
> >> > ig=1, nscm=1000
> >> > /
> >> >
> >> >Thanks,
> >> >Robert
> >> >_______________________________________________
> >> >AMBER mailing list
> >> >AMBER.ambermd.org
> >> >http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Feb 27 2014 - 02:30:02 PST