For serial jobs, it ought to work just fine...
-----Original Message-----
From: Sasha Buzko [mailto:obuzko.ucla.edu]
Sent: Thursday, August 05, 2010 10:10
To: AMBER Mailing List
Subject: Re: [AMBER] Parallel pmemd.cuda: local bandwidth question. Are dual host PCI-E cards make any sense?
In other words, even four GPUs per PCI-E slot would be ok if we run
multiple serial processes, right?
In theory, then, we could even take a board with 3+ PCI-E x16 slots and
max them out with DHICs (as long as we can maintain the corresponding
processor core count)...
Thanks
Sasha
Scott Le Grand wrote:
> Four serial processes - go for the S2050...
>
> For one parallel process, maximize the bandwidth, give every GPU 16x...
>
>
> -----Original Message-----
> From: Sasha Buzko [mailto:obuzko.ucla.edu]
> Sent: Wednesday, August 04, 2010 20:58
> To: AMBER Mailing List
> Subject: [AMBER] Parallel pmemd.cuda: local bandwidth question. Are dual host PCI-E cards make any sense?
>
> Ross, Scott,
> thank you for the feedback. Clearly, any future mpi version should go
> over QDR IB.
>
> Can you help with the issue of local bandwidth? Nvidia sells a dual host
> PCI-E adapter card that effectively connects 4 GPUs in an S2050 to a
> single x16 slot. When pmemd.cuda is run locally (parallel or four serial
> processes), how much would this impact performance?
> In other words, should we even consider these cards for a host system
> intended to run pmemd.cuda?
>
> Thanks again
>
> Sasha
>
>
> Thomas Zeiser wrote:
>
>> On Sat, Jul 31, 2010 at 10:12:29AM -0700, Scott Le Grand wrote:
>>
>>
>>> I'd definitely go for QDR between nodes.
>>>
>>> What's up in the air ATM is whether it's best to spread the
>>> C2050s across as many nodes as possible or whether 2 or 4 C2050s
>>> per node is the optimum configuration.
>>>
>>>
>> for QDR to work at full speed you need PCIe2.0 8x (not only
>> mechanically but also electrically). Are there any boards which have
>> four 16x slots and additionally more than only 4x electrically?
>>
>> At least in the typical Intel-based boards you have two 16x and one
>> 4x slot when there is one chipset on the mainboard; or four 16x and
>> (several) 4x slots (4x electrically although their mechanical width
>> is 8x) if there are two chipsets.
>>
>> Thus, with these boards, the maximum feasible is full DDR speed
>> using a DDR-IB card supporting 5.0 GT/s; (slightly cheaper DDR
>> cards with 2.5 GT/s only operate with an effective speed similar to
>> SDR ...)
>>
>>
>>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Aug 05 2010 - 10:30:05 PDT
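
The bandwidth arithmetic behind the thread can be sketched roughly as follows. This is a back-of-the-envelope illustration, not something posted by the participants: it assumes theoretical peak rates with 8b/10b encoding (PCIe 1.x/2.0 and SDR/DDR/QDR InfiniBand), ignores protocol overhead, and assumes GPUs behind a shared slot split its bandwidth evenly when all transfer at once. The function names are made up for this example.

```python
# Theoretical one-way peak bandwidths; real throughput will be lower.
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line code (PCIe 1.x/2.0, SDR/DDR/QDR IB)

# Per-lane signalling rate in Gbit/s for a 4x InfiniBand link.
IB_LANE_GBITS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def pcie_gbytes_per_s(gt_per_s, lanes):
    """Peak one-way PCIe bandwidth in GB/s for a per-lane rate and lane count."""
    return gt_per_s * ENCODING_EFFICIENCY * lanes / 8  # bits -> bytes


def ib_4x_gbytes_per_s(rate):
    """Peak one-way data rate in GB/s of a 4x InfiniBand link."""
    return IB_LANE_GBITS[rate] * ENCODING_EFFICIENCY * 4 / 8


def fastest_ib_rate_for_slot(gt_per_s, lanes):
    """Fastest IB rate the slot can feed at full speed, or None."""
    slot = pcie_gbytes_per_s(gt_per_s, lanes)
    best = None
    for rate in ("SDR", "DDR", "QDR"):  # ordered slowest to fastest
        if ib_4x_gbytes_per_s(rate) <= slot + 1e-9:
            best = rate
    return best


def per_gpu_share(gt_per_s, lanes, gpus):
    """Even split of slot bandwidth among GPUs (an assumption, see above)."""
    return pcie_gbytes_per_s(gt_per_s, lanes) / gpus


if __name__ == "__main__":
    # PCIe 2.0 slots: 8x electrical vs 4x electrical.
    print("PCIe 2.0 8x :", fastest_ib_rate_for_slot(5.0, 8))
    print("PCIe 2.0 4x :", fastest_ib_rate_for_slot(5.0, 4))
    # A 2.5 GT/s card in a 4x electrical slot.
    print("2.5 GT/s 4x :", fastest_ib_rate_for_slot(2.5, 4))
    # Four GPUs behind one 16x slot vs one GPU per 16x slot.
    print("4 GPUs/16x  :", per_gpu_share(5.0, 16, 4), "GB/s each")
    print("1 GPU/16x   :", per_gpu_share(5.0, 16, 1), "GB/s")
```

Under these assumptions a 4x electrical PCIe 2.0 slot tops out at DDR InfiniBand speed, and four GPUs sharing one 16x slot each see about a quarter of the slot's bandwidth when they all transfer simultaneously. That matters far less for independent serial runs than for a single parallel run, which is consistent with the advice in the thread.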