Re: [AMBER] cuda launch time out error in Amber 11

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 27 Aug 2010 06:01:31 -0700

Hi Sasha,

> My guess is that your error message is an inherent problem with a GTX470
> (a consumer card, just like GTX480), since it's never happened to a
> Tesla C1060 (in my experience). I haven't tested the latest Tesla cards
> (C2050 series), but my guess is that random memory errors without ECC
> cause the calculations to bail out, while such events would go unnoticed
> in a gaming/video environment.
> In my case, I'm just using a workaround in the job script to catch the
> error output and rerun the job. And waiting to upgrade to Teslas.
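
For reference, the rerun workaround described in the quoted text could be
sketched roughly as below. This is only an illustration under stated
assumptions, not the script actually used: the error text matched
("launch timed out"), the retry cap, and the names run_once and
run_with_retry are all hypothetical, and the real message printed by the
job may differ.

```python
import subprocess
import sys

# Assumption: the failure shows up as this text in the job's output.
# The exact wording of the CUDA error is not given in this thread.
ERROR_MARKER = "launch timed out"
MAX_ATTEMPTS = 3  # arbitrary cap so a bad card cannot loop forever


def run_once(cmd):
    """Run cmd and return (exit code, combined stdout+stderr text)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


def run_with_retry(cmd, runner=run_once, max_attempts=MAX_ATTEMPTS):
    """Rerun cmd while its output contains ERROR_MARKER.

    Returns (attempts used, exit code to report).
    """
    code = 1
    for attempt in range(1, max_attempts + 1):
        code, output = runner(cmd)
        if ERROR_MARKER not in output:
            # Either the job finished or it failed for another reason;
            # in both cases rerunning is not the right response.
            return attempt, code
        print(f"attempt {attempt}: CUDA error seen, rerunning",
              file=sys.stderr)
    # Every attempt hit the error: report failure even if the last
    # exit code happened to be zero.
    return max_attempts, code if code != 0 else 1
```

It would be called with the job's own command line as a list of arguments,
for example run_with_retry(["pmemd.cuda", "-O", "-i", "mdin", ...]) with
the remaining arguments filled in for the actual run. Note that a wrapper
like this only hides the symptom; it does nothing about the underlying
cause of the error.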

I'd be VERY surprised if this were an ECC-related memory issue. It is much
more likely to be either a very marginal card in the first place or
overheating / an insufficient power supply. The 650W power supply he has is
too small in my opinion and is probably struggling to keep up with the GPU
under load. Maybe the machine starts to get hot and tries to ramp up the fan
speeds, and that puts the power supply over the limit so that it 'browns
out' on the GPU. Just a guess.

Note, I have run calculations for weeks on a GTX295 without an error, and I
run all the C2050s with ECC turned off, since disabling ECC gives you around
a 10% boost in performance. So I am guessing dodgy hardware.
 
All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.





_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Aug 27 2010 - 06:30:03 PDT