Re: [AMBER] AMBER GPU Cooling/usage Question

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 09 May 2013 06:27:54 -0700

Hi ET,


>I've installed AMBER GPU on several different PCs with varying case
>designs. Generally I've noticed that when the GPU is under load the
>temperature gets to about 76-82 degrees C depending on case design, other
>components, etc.
>
>My questions are:
>
>1) Has anyone managed to get their GPUs running at load at a temperature
>less than 76 degrees C? I've noticed that the well ventilated cases I've
>used seem to level at this temperature under load, but the difference is
>that they cool back to baseline temperature very rapidly (30-60 secs).

You can if you water cool it - here's a 'ghetto' example from my lab that
works and stays below 75C.

http://www.brightsideofnews.com/Data/2013_4_29/Take-a-Tour-of-San-Diegos-Supercomputer-Center/SDSC13%20(40%20of%2041)_689.jpg


That said, it doesn't really matter. The cards are designed to run that hot
- CPUs typically run upwards of 90C so I wouldn't worry about it. I've
been running boxes with 4 GTX-680s, all at 85C or more, flat out for
months on end with no problems. I've had a few infant deaths in the cards
but once that settled things have been pretty stable.
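
If you want to keep an eye on the temperatures while a job is running, a
small script along these lines would do it. This is only a rough sketch: it
assumes nvidia-smi is on your PATH and supports the --query-gpu option, and
the function names are made up for illustration.

```python
import subprocess
from typing import Dict

# nvidia-smi query that prints one "index, temperature" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_temperatures(csv_text: str) -> Dict[int, int]:
    """Turn lines such as '0, 82' into {gpu_index: temperature_in_C}."""
    temps = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def read_temperatures() -> Dict[int, int]:
    """Run nvidia-smi once and return the current temperature of each GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_temperatures(result.stdout)
```

Calling read_temperatures() every minute or so from a loop and logging the
result is enough to see where a given case design levels off under load.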

>2) If you are running a long production run - e.g. 6 repeats of a 100ns
>simulation in series - do people tend to segment the production run? I.e.
>simulate in 5ns segments, then give a "sleep" period of 5 minutes before
>starting the next segment.

I would definitely segment your simulation - purely to avoid heartache if
the machine crashes, or your disk gets corrupted etc. Typically I try to
space my runs to be around 2 to 4 hours and figure that I can always
repeat a 2 hour run without too much trouble. It can also make analysis
easier later as you don't end up with multi-terabyte single trajectory
files. Just make sure you are using a new random seed (or set ig=-1) for
each restart.
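
As a rough illustration of the segmenting, here is a sketch that builds the
chain of pmemd.cuda commands, each segment restarting from the restart file
written by the previous one. The file names (system.prmtop, equil.rst,
prod.in, prod_001 ...) and the function names are placeholders, not anything
standard, and it assumes prod.in is a restart input (irest=1, ntx=5) with
ig=-1 set so every segment gets a fresh random seed.

```python
import math
from typing import List


def segments_needed(total_ns: float, segment_ns: float) -> int:
    """Number of segments needed to cover total_ns in chunks of segment_ns."""
    return math.ceil(total_ns / segment_ns)


def segment_commands(
    n_segments: int,
    prmtop: str = "system.prmtop",   # placeholder topology file
    start_rst: str = "equil.rst",    # placeholder restart from equilibration
    mdin: str = "prod.in",           # assumed to contain irest=1, ntx=5, ig=-1
    prefix: str = "prod",
) -> List[List[str]]:
    """Build one pmemd.cuda command per segment, chained through restart files."""
    commands = []
    previous_rst = start_rst
    for i in range(1, n_segments + 1):
        name = f"{prefix}_{i:03d}"
        commands.append([
            "pmemd.cuda", "-O",
            "-i", mdin,
            "-p", prmtop,
            "-c", previous_rst,      # start from the previous segment's restart
            "-o", f"{name}.out",
            "-r", f"{name}.rst",
            "-x", f"{name}.nc",
        ])
        previous_rst = f"{name}.rst"
    return commands
```

Each command can then be run in order, for example with
subprocess.run(cmd, check=True), so the chain stops at the first failed
segment and you only ever have to repeat that one piece.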

In terms of the 'sleep' I wouldn't bother - probably the worst thing you
can do for hardware is keep heating it up and cooling it down; that just
leads to metal fatigue and probably (pure speculation) shortens the life
of fans etc. Probably better just to leave it running flat out.

>The logic being not to run the card under a continuous load, as I
>understand the consumer grade Geforce 680s & Titans we use are not really
>rated for 24/7 usage. Thus they could potentially burn out under
>protracted, heavy use. On the other hand would lots of rapid heating then
>cooling reduce the lifespan of the electronics, though there must be some
>level of inbuilt tolerance for this.

They come with a 3 year warranty - if they break, get them replaced. To be
honest though I haven't seen any real difference in reliability between
the gaming cards and the Tesla cards. A few of the gaming cards die early,
probably because the QC is not as good, but after that they tend to run
just as well. They are the same physical chip underneath (and if you buy
from EVGA they are made by the same company in the same factory) so there
is no real argument that I know of for why the gaming card should be less
tolerant of being run continuously than a Tesla card - even the fans are
the same. So really I think it is just a case of how much they are tested
when leaving the factory, and that is related to infant death of the card
and not its long term reliability as far as I can figure.

BTW, if you buy from Exxact they will provide you fully warrantied
machines (desktops and rack mount clusters) with 3+ year warranties on
GeForce equipped systems. http://ambermd.org/gpus/recommended_hardware.htm#exxact

Hope that helps.

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.






_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu May 09 2013 - 07:00:02 PDT