Re: [AMBER] Sufficient CPU cores/GPU ratio ?

From: Jodi Ann Hadden <jodih.uga.edu>
Date: Tue, 29 Nov 2011 17:43:19 +0000

Just a quick update for anyone who was interested... We've had this machine back for about 2 months now and have encountered no problems with running AMBER on all 4 C2070s at once for days at a time. This hardware configuration seems to be working out for us, though I'm still keeping my fingers crossed just in case... ;-)

Jodi

On Sep 21, 2011, at 2:10 PM, Marek Maly wrote:

> OK,
> I wish your new machine a long life!
>
> It would still be interesting to hear from you once
> this machine has been sufficiently tested (I mean longer Amber calculations
> with all 4 GPUs fully "loaded"), to learn whether everything
> is OK and you are fully satisfied with this HW combination.
>
> Best wishes,
>
> Marek
>
>
>
> On Wed, 21 Sep 2011 19:54:08 +0200, Jodi Ann Hadden <jodih.uga.edu>
> wrote:
>
>> Just an update on our GPU machine with the motherboard socket burn:
>>
>> I received an email from Microway today saying the machine is being
>> shipped back to us. They have installed a new motherboard of the same
>> model and a 1350W PSU and claim to have stress tested it with the 4
>> C2070s according to their protocol for testing a newly assembled
>> machine. The only thing they had to say with regard to what went wrong
>> after examining the old motherboard and PSU was that "results were
>> inconclusive". I have lots of work lined up for the machine upon its
>> return, so we should find out pretty quickly if there is some
>> fundamental insufficiency in this particular grouping of hardware
>> components... I'm still hoping for the lemon explanation and that all
>> will be well now, but I'll be saying a prayer to Glycon, patron deity of
>> the GLYCAM lab, before booting it up just in case...
>>
>> Jodi
>>
>> On Sep 18, 2011, at 10:34 AM, Marek Maly wrote:
>>
>>> Hello Ross,
>>> thanks for the deep analysis! So let's see what answer/solution will be
>>> offered to Jodi.
>>>
>>> As for me, I have definitely decided not to go beyond 3 x GTX 580 (3GB)
>>> on a common (one socket) motherboard (like the mentioned "Asus P6T7 WS
>>> SuperComputer").
>>>
>>> I will also buy a 1400W or 1500W PSU (even for "just" 3 x GTX 580) to
>>> ensure safe, long-term functioning of these machines.
>>>
>>> Best wishes,
>>>
>>> Marek
>>>
>>>
>>>
>>>
>>> On Sun, 18 Sep 2011 02:01:49 +0200, Ross Walker <ross.rosswalker.co.uk>
>>> wrote:
>>>
>>>> Hi Marek,
>>>>
>>>>> However, I would assume that an insufficient PSU would cause just
>>>>> GPU/CPU errors, but not the melting of the PSU connector ... but I am
>>>>> definitely not an expert here.
>>>>
>>>> Yes, but there should be no way an overloaded power supply would short
>>>> out and melt the motherboard power connector. In principle it should
>>>> just trip the power supply. However, I suppose it is possible that the
>>>> current draw through the PCI-E slots from the 4 C2070s was too high,
>>>> causing a motherboard voltage regulator to fail and subsequently short
>>>> out. This would be a fundamental design flaw in the motherboard but
>>>> seems unlikely.
>>>>
>>>> The other possibility is that the power supply was up against its limit,
>>>> shorted in some way, and put too much voltage on the motherboard power
>>>> connector, causing it to burn out.
>>>>
>>>> Either way, it is simply not a failure that should be possible even if
>>>> the power supply got overloaded. But then one only has to look at San
>>>> Diego's power company trying to blame some lowly technician for blacking
>>>> out the whole of southern California, Arizona and New Mexico to realize
>>>> that everybody takes short cuts, not bothering to make things fail-safe.
>>>>
>>>> Oh well.
>>>>
>>>> Let's wait to see what happens when the machine comes back.
>>>>
>>>> BTW, people should note that all the 4 GPU boxes I have built, which
>>>> are 2 socket with 4 x C2070 or 4 x M2090 and Supermicro boards, ALL have
>>>> dual 1.4 kW redundant power supplies. Trying to run 4 GPUs (4 GTX 580s
>>>> would be crazy) off a single power supply is really pushing the
>>>> envelope. Especially in the US, where the voltage is an appallingly low
>>>> 110 V so you can only get 2.2 kW total off a single circuit. 2 x 1.4 kW
>>>> independent power supplies plugged into independent circuits (but
>>>> sharing the same earth) is probably the correct way to go for building
>>>> such systems. In Europe you can plug both power supplies into the same
>>>> circuit. :-)
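
The circuit arithmetic in the paragraph above can be sketched roughly in Python. This is only an illustration under stated assumptions: the 20 A breaker is inferred from the quoted 2.2 kW at 110 V, and the 230 V / 16 A European circuit is an assumed example that does not appear in the thread. The function names are hypothetical, and the check compares nameplate PSU ratings only, ignoring PSU efficiency and continuous-load derating.

```python
from typing import Sequence


def circuit_capacity_w(volts: float, breaker_amps: float) -> float:
    """Nominal power available from one circuit, P = V * I."""
    return volts * breaker_amps


def fits_on_one_circuit(psu_watts: Sequence[float], volts: float,
                        breaker_amps: float) -> bool:
    """True if the summed PSU ratings stay within the circuit's nominal capacity."""
    return sum(psu_watts) <= circuit_capacity_w(volts, breaker_amps)


# Two 1.4 kW supplies, as described in the message.
psus = [1400.0, 1400.0]

# Assumed circuits: 110 V / 20 A (inferred) and 230 V / 16 A (illustrative).
for label, volts, amps in [("US", 110, 20), ("Europe", 230, 16)]:
    capacity = circuit_capacity_w(volts, amps)
    ok = fits_on_one_circuit(psus, volts, amps)
    print(f"{label}: capacity {capacity} W, both PSUs on one circuit: {ok}")
```

With these assumed numbers the two supplies together (2.8 kW) exceed the 2.2 kW circuit but fit within the 230 V one, which matches the reasoning for splitting them across circuits in the US.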
>>>>
>>>> All the best
>>>> Ross
>>>>
>>>> /\
>>>> \/
>>>> |\oss Walker
>>>>
>>>> ---------------------------------------------------------
>>>> | Assistant Research Professor |
>>>> | San Diego Supercomputer Center |
>>>> | Adjunct Assistant Professor |
>>>> | Dept. of Chemistry and Biochemistry |
>>>> | University of California San Diego |
>>>> | NVIDIA Fellow |
>>>> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
>>>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>>>> ---------------------------------------------------------
>>>>
>>>> Note: Electronic Mail is not secure, has no guarantee of delivery, may
>>>> not
>>>> be read every day, and should not be used for urgent or sensitive
>>>> issues.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> AMBER mailing list
>>>> AMBER.ambermd.org
>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> This message was created with Opera's revolutionary e-mail client:
>>> http://www.opera.com/mail/
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>
>
>
>



Received on Tue Nov 29 2011 - 10:00:02 PST