Re: [AMBER] Sufficient CPU cores/GPU ratio ?

From: Jodi Ann Hadden <jodih.uga.edu>
Date: Wed, 21 Sep 2011 17:54:08 +0000

Just an update on our GPU machine with the motherboard socket burn:

I received an email from Microway today saying the machine is being shipped back to us. They have installed a new motherboard of the same model and a 1350W PSU, and claim to have stress-tested it with the 4 C2070s according to their protocol for testing a newly assembled machine. The only thing they had to say about what went wrong, after examining the old motherboard and PSU, was that "results were inconclusive". I have lots of work lined up for the machine upon its return, so we should find out pretty quickly whether there is some fundamental insufficiency in this particular grouping of hardware components... I'm still hoping for the lemon explanation and that all will be well now, but I'll be saying a prayer to Glycon, patron deity of the GLYCAM lab, before booting it up just in case...

Jodi

On Sep 18, 2011, at 10:34 AM, Marek Maly wrote:

> Hello Ross,
> thanks for the deep analysis! So let's see what answer/solution will be
> offered to Jodi.
>
> As for me, I have definitely decided not to go beyond 3 x GTX 580 (3GB) on a
> common (one-socket) motherboard (like the "Asus P6T7 WS SuperComputer"
> mentioned).
>
> I will also buy a 1400W or 1500W PSU (even for "just" 3 x GTX 580) to ensure
> safe, long-term functioning of these machines.
>
> Best wishes,
>
> Marek
>
>
>
>
> On Sun, 18 Sep 2011 02:01:49 +0200 Ross Walker <ross.rosswalker.co.uk>
> wrote:
>
>> Hi Marek,
>>
>>> However I would assume that an insufficient PSU would just cause GPU/CPU
>>> errors, not the melting of the PSU connector ... but I am definitely not an
>>> expert here.
>>
>> Yes BUT there should be no way an overloaded power supply would short out
>> and melt the motherboard power connector. In principle it should just trip
>> the power supply. However, I suppose it is possible that the current draw
>> through the PCI-E slots from the 4 C2070s was too high, causing a
>> motherboard voltage regulator to fail and subsequently short out. This would
>> be a fundamental design flaw in the motherboard but seems unlikely.
>>
>> The other possibility is the power supply was up against the limit and
>> shorted in some way and put too much voltage on the motherboard power
>> connector and that caused it to burn out.
>>
>> Either way it is simply not a failure that should be possible even if the
>> power supply got overloaded. But then one only has to look at San Diego's
>> power company trying to blame some lowly technician for blacking out the
>> whole of southern California, Arizona and New Mexico to realize that
>> everybody takes short cuts not bothering to make things fail safe.
>>
>> Oh well.
>>
>> Let's wait to see what happens when the machine comes back.
>>
>> BTW, people should note that all the 4-GPU boxes I have built, which are 2
>> socket with 4 x C2070 or 4 x M2090 and Supermicro boards, ALL have dual 1.4KW
>> redundant power supplies. Trying to run 4 GPUs (4 GTX580s would be crazy)
>> off a single power supply is really pushing the envelope, especially in the
>> US where the voltage is an appallingly low 110V so you can only get 2.2KW
>> total off a single circuit. 2 x 1.4KW independent power supplies plugged into
>> independent circuits (but sharing the same earth) is probably the correct
>> way to go for building such systems. In Europe you can plug both power
>> supplies into the same circuit. :-)
>>
>> All the best
>> Ross
>>
>> /\
>> \/
>> |\oss Walker
>>
>> ---------------------------------------------------------
>> | Assistant Research Professor |
>> | San Diego Supercomputer Center |
>> | Adjunct Assistant Professor |
>> | Dept. of Chemistry and Biochemistry |
>> | University of California San Diego |
>> | NVIDIA Fellow |
>> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>> ---------------------------------------------------------
>>
>> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
>> be read every day, and should not be used for urgent or sensitive issues.
>>
>>
>>
>>
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>
>
>
>
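The circuit arithmetic in Ross's message quoted above (110V giving about 2.2KW per circuit, versus 2 x 1.4KW power supplies) can be sketched roughly as follows. This is only an illustration under stated assumptions: the 20A breaker rating is inferred from 2.2KW / 110V and is not stated in the thread, the 230V / 16A European figures are likewise assumed, and the helper names are hypothetical. It also compares rated PSU output directly with circuit capacity, ignoring PSU efficiency losses at the wall.

```python
def circuit_capacity_watts(volts, amps):
    """Maximum power one circuit can deliver, P = V * I."""
    return volts * amps


def fits_on_one_circuit(psu_watts, volts, amps):
    """True if the combined rated PSU output fits within a single circuit.

    Simplification: uses rated output, not actual draw at the wall.
    """
    return sum(psu_watts) <= circuit_capacity_watts(volts, amps)


if __name__ == "__main__":
    dual_psus = [1400, 1400]  # 2 x 1.4KW supplies, as in the quoted message

    # Assumed US circuit: 110V (from the thread) on a 20A breaker (assumption).
    print("US circuit capacity (W):", circuit_capacity_watts(110, 20))
    print("Both PSUs on one US circuit:", fits_on_one_circuit(dual_psus, 110, 20))

    # Assumed European circuit: 230V on a 16A breaker (both are assumptions).
    print("EU circuit capacity (W):", circuit_capacity_watts(230, 16))
    print("Both PSUs on one EU circuit:", fits_on_one_circuit(dual_psus, 230, 16))
```

Under these assumptions the two supplies together (2.8KW rated) exceed a single 110V circuit but fit on a single 230V one, which matches the point about independent circuits in the US.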



Received on Wed Sep 21 2011 - 11:00:04 PDT