Re: [AMBER] performance of pmemd.cuda.MPI

From: Gould, Ian R <i.gould.imperial.ac.uk>
Date: Fri, 21 Sep 2012 17:18:05 +0000

Hi Jonathan,

I run my GTXs and Teslas on the cheapest of cheap processors, dual
Pentium or i3, with 8 GB of memory at most; combined, this helps keep the
heat in the case down.

Cheers
Ian

Women love us for our defects. If we have enough of them, they will
forgive us everything, even our intellects.
Oscar Wilde,
-- 
Dr Ian R Gould
Reader in Computational Chemical Biology
Department of Chemistry
Imperial College London
Exhibition Road
London
SW7 2AY
E-mail i.gould.imperial.ac.uk
http://www3.imperial.ac.uk/people/i.gould
Tel +44 (0)207 594 5809
On 21/09/2012 18:09, "Jonathan Gough" <jonathan.d.gough.gmail.com> wrote:
>On a slightly related note:
>
>When running GPU calculations, has anyone looked at, or have an idea of,
>the effect that CPU speed has? For example -- if you were to compare
>machines that were otherwise identical (same motherboard, hard drive,
>RAM, and a GTX 680) but had different CPUs -- would you see a performance
>bump/drop, or is it still a level playing field? Can you get by with an
>i3 or i5 instead of a top-end i7? (Could be another way to save $ if
>you're building on a budget.)
>
>Any thoughts or insight?
>
>Thanks,
>Jonathan
>
>
>
>On Fri, Sep 21, 2012 at 12:56 PM, Ross Walker <ross.rosswalker.co.uk>
>wrote:
>
>> Hi Tru,
>>
>>
>> >> MPI performance of GTX 690 is abysmal because the two GPUs share the
>> >> same PCI-E adaptor.
>> >>
>> >> That will improve down the road somewhat.
>> >>
>> >> In the meantime, I think you'll be happy with the performance of two
>> >> independent runs (one on each GPU): 98+% efficiency when I last
>> >> checked...
>> >
>> >If I understand you correctly, with a 4-slot PCI-E motherboard and 4x
>> >GTX 690, one should run 8 independent pmemd.cuda (non-MPI) jobs to get
>> >the maximum throughput.
>>
>> Yes, with caveats.
>>
>> Firstly, when you run single-GPU AMBER the entire calculation is run on
>> the GPU and communication over the PCI-E bus only occurs for I/O. Thus,
>> if you set NTPR and NTWX high enough, typically >= 1000, the PCI-E speed
>> will have little impact on performance. Additionally, as long as you
>> select the GPUs correctly, so that each individual job runs on a
>> different physical GPU, the performance impact of each successive
>> single-GPU job will be minimal. The performance decrease that does occur
>> comes mostly from contention for I/O resources. Thus PCI-E x8 is
>> reasonable for single-GPU jobs; x4 might be cutting it too fine but
>> would still work reasonably well if you don't do I/O too frequently.
>>
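>> As a concrete illustration (file names and device numbers here are just
>> placeholders), two independent single-GPU runs on a 2-GPU box would look
>> something like this, with NTPR/NTWX set to 1000 or more in the mdin file:
>>
>>   export CUDA_VISIBLE_DEVICES=0
>>   nohup $AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd \
>>         -o md_gpu0.out -r md_gpu0.rst -x md_gpu0.mdcrd &
>>
>>   export CUDA_VISIBLE_DEVICES=1
>>   nohup $AMBERHOME/bin/pmemd.cuda -O -i mdin -p prmtop -c inpcrd \
>>         -o md_gpu1.out -r md_gpu1.rst -x md_gpu1.mdcrd &
>>
>> Each job then sees only one physical GPU, so they don't fight over the
>> same device.
>>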
>> Now on to parallel jobs. When you want to run a job across multiple
>> GPUs, information has to be sent between the GPUs on every time step,
>> irrespective of whether I/O is being done. This makes the PCI-E
>> bandwidth critical and a major bottleneck. x16 was marginal back in the
>> days of the C1060 / C2050 cards. Now we have cards that are almost
>> double the speed and we are still at x16 - clearly this is a FAIL! Then
>> it gets worse: with the K10 and GTX690 there are two GPUs on the same
>> board, and for all intents and purposes they are two distinct GPUs
>> essentially jammed into the same PCI-E slot. The bandwidth available to
>> each GPU is thus x8, which is woefully inadequate for running AMBER in
>> parallel across multiple GPUs. When you use both GPUs on a single K10 or
>> GTX690 they still share the PCI-E bus, so it is like having two cards
>> each in an x8 slot; hence it doesn't help in parallel. If there were a
>> 'real' interconnect between the two GPUs it would be interesting, but
>> there isn't; they are just two GPUs, each on half of the PCI-E
>> connection. The K10s scale a little better than the GTX690s, but that's
>> just because the GPUs themselves are slower and so the
>> performance-to-bandwidth ratio is a little better. If you measure
>> absolute performance, though, there is no free lunch there.
>>
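>> If you want to check what link width each GPU is actually negotiating,
>> you can look at the PCI-E link status on the host. For example (the bus
>> ID below is just an example; exact output wording varies with distro and
>> driver version):
>>
>>   lspci | grep -i nvidia                  # find the bus IDs of the GPUs
>>   sudo lspci -vv -s 03:00.0 | grep LnkSta # reports e.g. "Width x8"
>>
>> Newer drivers may also report the current and maximum link width under
>> "GPU Link Info" in the output of "nvidia-smi -q".
>>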
>> Now onto putting 4 GTX690s in the same box. I have not tried this and I
>> don't know of any vendor selling them. 4 x GTX680 is no problem. The
>> issue with 4 x 690s is that you have 8 physical GPUs per box, and there
>> are VERY few motherboards with BIOSes that can support 8 physical GPUs.
>> The Tyan 8-way M2090 boxes took a LOT of work to get working, including
>> substantial hacking of the BIOS. The issue is that the legacy I/O
>> address space is just 64K (a hard limit imposed by the legacy x86
>> architecture). Each GPU uses around 4K of I/O space, so 8 GPUs need half
>> of the total I/O space, which assumes everything else on the motherboard
>> - NIC cards, hard drive controllers etc. - is being very economical and
>> well behaved. On consumer boards this is unlikely, so I'd be very
>> surprised if you can get 4 GTX690s into a regular board. You probably
>> need to go for multi-socket specialist Supermicro or Tyan boards, which
>> can be VERY expensive (not to mention the CPU costs). So it is generally
>> much more cost effective to build 2 nodes with 2 GTX690s each. You might
>> be able to get away with 3 GTX690s in one board, although I don't know
>> anybody who has tried it and it will run VERY hot.
>>
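>> If you are curious how much of that 64K is already spoken for on a given
>> board, you can inspect the legacy I/O port map on a Linux host, e.g.:
>>
>>   sudo cat /proc/ioports                        # full 0x0000-0xffff map
>>   sudo lspci -vv | grep -i "I/O behind bridge"  # per-slot I/O windows
>>
>> The exact layout is of course board and BIOS specific.
>>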
>> Power: you probably also need 2 x 1.2 kW independent power supplies for
>> 4 GTX690s, which will make the case expensive.
>>
>> >
>> >The GTX-690 is seen as 2 NVIDIA devices that are addressed
>> >independently?
>>
>> Yes, for all intents and purposes consider them to be 2 physical GPUs
>> jammed in a single PCI-E x16 slot sharing the fan.
>>
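>> You can see this directly on the host; for example:
>>
>>   nvidia-smi -L
>>
>> will list two separate GPU entries for every GTX690 board installed, and
>> CUDA enumerates them as two devices.
>>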
>> >In order to get better pmemd.cuda.MPI scaling, does one need to target
>> >only one of the 2 GPUs on each PCI-E slot for each run? How does that
>> >behave for independent pmemd.cuda.MPI simulations? Does the shared
>> >PCI-E bus become the bottleneck? Bottom line, are multiple GTX-690s in
>> >the same server worth it, or should one stay with the regular GTX-680?
>>
>> Using only one of the GTX690 GPUs on each board can help. E.g. if you
>> use one GPU from each of two boards then they will get x16 bandwidth
>> each and the parallel scaling will improve. But you will be leaving half
>> the GPUs idle. You can't run 2 x 2-GPU jobs split over two boards, since
>> this puts you back at x8 bandwidth for each GPU. The GTX680s don't scale
>> very well in parallel because they are so damn fast individually and the
>> PCI-E x16 bandwidth can't keep up. Hence, until x32 becomes ubiquitous
>> and can feed all GPUs at that speed, it is better to focus on single-GPU
>> runs, in which case it is a close call between GTX690s and GTX680s: the
>> 690s are about twice the price of the 680s, so you don't get any extra
>> hardware for free. Thus if you can get 690s at a discount compared to
>> 2 x 680s, it is probably worth going with the 690s, unless you have
>> other constraints like space etc. Nobody should be running a single MD
>> simulation these days and trying to draw conclusions from it (in the lab
>> we work with moles of molecules, remember), so it isn't a problem that
>> with 4 GPUs in a machine the optimum way to use them is to run 4
>> independent MD simulations.
>>
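>> To pick one GPU from each of two boards, as described above, a 2-GPU
>> parallel run would look something like the following (device numbering
>> depends on how the GPUs enumerate on your system, and the file names are
>> just placeholders):
>>
>>   export CUDA_VISIBLE_DEVICES=0,2   # one GPU from each physical board
>>   mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin -p prmtop \
>>          -c inpcrd -o md_2gpu.out -r md_2gpu.rst -x md_2gpu.mdcrd
>>
>> That way each of the two MPI ranks gets a GPU with the full x16 link to
>> itself.
>>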
>> Here's an example hardware shopping list for building your own 2 x
>> GTX680 or 2 x GTX690 machines. We have built several of these and they
>> work great.
>>
>> http://www.rosswalker.co.uk/current_amber_gpu_spec.htm
>>
>> Hope that helps.
>>
>> All the best
>> Ross
>>
>> /\
>> \/
>> |\oss Walker
>>
>> ---------------------------------------------------------
>> |             Assistant Research Professor              |
>> |            San Diego Supercomputer Center             |
>> |             Adjunct Assistant Professor               |
>> |         Dept. of Chemistry and Biochemistry           |
>> |          University of California San Diego           |
>> |                     NVIDIA Fellow                     |
>> | http://www.rosswalker.co.uk | http://www.wmd-lab.org  |
>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk  |
>> ---------------------------------------------------------
>>
>> Note: Electronic Mail is not secure, has no guarantee of delivery, may
>> not be read every day, and should not be used for urgent or sensitive
>> issues.
>>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Sep 21 2012 - 10:30:03 PDT