Re: [AMBER] performance of pmed.cuda.MPI

From: Gould, Ian R <i.gould.imperial.ac.uk> Date: Fri, 21 Sep 2012 17:18:05 +0000

-- 
Dr Ian R Gould
Reader in Computational Chemical Biology
Department of Chemistry
Imperial College London
Exhibition Road
London
SW7 2AY
E-mail i.gould.imperial.ac.uk
http://www3.imperial.ac.uk/people/i.gould
Tel +44 (0)207 594 5809
On 21/09/2012 18:09, "Jonathan Gough" <jonathan.d.gough.gmail.com> wrote:
>ON a slightly related note:
>
>When running gpu calculations, has anyone looked at or have an idea of the
>effect that the CPU speed has?  For example -- if  you were to compare
>machines that were identical (same motherboard, hard drive, RAM, and a
>680X) but had different CPU's would you see a performance bump/drop or is
>it still a level playing field?  Can you get by with an i3 or i5 instead
>of
>a top end i7?  (could be another way to save $ if your building on a
>budget)
>
>any thoughts or insight?
>
>Thanks,
>JOnathan
>
>
>
>On Fri, Sep 21, 2012 at 12:56 PM, Ross Walker <ross.rosswalker.co.uk>
>wrote:
>
>> Hi Tru,
>>
>>
>> >>MPI performance of GTX 690 is abysmal because the two GPUs share the
>>same
>> >> PCIEX adaptor.
>> >>
>> >> That will improve down the road somewhat.
>> >>
>> >> In the meantime, I think you'll be happy at the performance of two
>> >> independent runs (one on each GPU): 98+% efficiency when I last
>> >>checked...
>> >
>> >If I understand you correctly, with a 4 PCI-E motherboard, and 4x GTX
>>690,
>> >one should run 8 independant pmemd.cuda (non MPI) to get the maximum
>> >throughput.
>>
>> Yes with caveats.
>>
>> Firstly, when you run single GPU AMBER the entire calculation is run on
>> the GPU and communication over the PCI-E bus only occurs for I/O. Thus
>>if
>> you set NTPR and NTWX high enough, typically >= 1000 then the PCI-E
>>speed
>> will have little impact on the performance. Additionally as long as you
>> select the GPUs correctly so you make sure each individual job runs on a
>> different physical GPU then the performance impact of each successive
>> single GPU job will be minimal. Performance decrease occurs mostly
>>because
>> of contention for I/O resources. Thus PCI-E x8 is reasonable for single
>> GPU jobs. x4 might be cutting it too fine but would still work
>>reasonably
>> well if you don't do I/O too frequently.
>>
>> Now on to parallel jobs. When you want to run a job across multiple GPUs
>> then it is necessary for information to be sent between each GPU on
>>every
>> time step irrespective of whether I/O is being done. This makes the
>>PCI/E
>> bandwidth critical and a major bottleneck. x16 was marginal back in the
>> days of C1060 / C2050 cards. Now we have cards that are almost double
>>the
>> speed and we are still at x16 speed - clearly this is FAIL! Then it gets
>> worse, with the K10 and GTX690 there are two GPUs on the same board,
>> although for all intents and purposes they are really two distinct GPUs
>> that are essentially jammed into the same PCI-E slot. The bandwidth for
>> each GPU is thus x8 which is woefully inadequate for running AMBER in
>> parallel across multiple GPUs. When you use both GPUs on a single K10 or
>> GTX690 they still share the PCI-E bus so it is like having two cards
>>each
>> in X8 slots hence it doesn't help in parallel. If there was a 'real'
>> interconnect between the two GPUs then it would be interesting but these
>> aren't, they are just two GPUs each one on half of the PCI-E connection.
>> The K10's scale a little better than the GTX690s but that's just because
>> the GPUs themselves are slower and so the performance to bandwidth ratio
>> is a little better. If you measure absolute performance though there is
>>no
>> free lunch there.
>>
>> Now onto putting 4 GTX 690s in the same box. I have not tried this and I
>> don't know of any vendor selling them. 4 x GTX680 is no problem. The
>>issue
>> with 4 x 690s is you have 8 physical GPUs per box and there are VERY few
>> motherboards that have bios's that can support 8 physical GPUs. The
>>Tyan 8
>> way M2090s took a LOT of work to get to work, including substantial
>> hacking of the bios. The issue is that the physical address space is
>>just
>> 64K (a hard limit imposed by the legacy x86 architecture). Each GPU uses
>> around 4K of IO space so 8 GPUs needs half of the total IO space which
>> assumes everything else on the motherboard, NIC cards, hard drive
>> controllers etc is being very economical and well behaved. On consumer
>> boards this is unlikely and so I'd be very surprised if you can get 4
>> GTX690s in a regular board. You probably need to go for multi socket
>> specialist super micro or than boards which can be VERY expensive (not
>>to
>> mention the CPU costs). So it is generally much more cost effective to
>> built 2 nodes by 2 GTX690s each. You might be able to get away with 3
>> GTX690s in one board although I don't know anybody who has tried it and
>>it
>> will run VERY hot.
>>
>> Power, you probably need 2 x 1.2KW independent power supplies for 4
>> GTX690s as well which will make the case expensive.
>>
>> >
>> >The GTX-690 is seen as 2 nvidia devices that are adressed
>>independantly?
>>
>> Yes, for all intents and purposes consider them to be 2 physical GPUs
>> jammed in a single PCI-E x16 slot sharing the fan.
>>
>> >In order to get a better pmemd.cuda.MPI scaling, that one needs to only
>> >target
>> >one of the 2 GPUS on each PCI-E for each run? How does that behave for
>> >independant
>> >pmemd.cuda.MPI simulations? Do the shared PCI-E become the bottleneck?
>> >Bottom line, are multiple GTX-690 in the same server worth it? or
>>should
>> >one stay with the regular GTX-680?
>>
>> Using only one of the GTX690 GPUs on each board can help. E.g. if you
>>use
>> one GPU from each of two boards then they will get x16 bandwidth each
>>and
>> the parallel scaling will improve. But you will be leaving half the GPUs
>> idle. You can't run 2 x 2 GPU jobs split over two boards since this puts
>> you back with x8 bandwidth for each GPU. The GTX680s don't scale very
>>well
>> in parallel because they are so damn fast individually and the PCI-E X16
>> bandwidth can't keep up. Hence until we get x32 being ubiquitous and
>> feeding all GPUs at that speed it is better to focus on single GPU runs
>>in
>> which case it is a close call between GTX690s and 680s since the 690s
>>are
>> about twice the price of the 680s so you don't get any extra hardware
>>for
>> free. Thus if you can get 690s at a discount compared to 2 680s then it
>>is
>> probably worth going with the 690s. Unless you have other constraints
>>like
>> space etc. Given nobody should be running single MD runs these days and
>> trying to draw conclusions (in the lab we work with MOLs of molecules
>> remember) so it isn't critical that with 4 GPUs in a machine the optimum
>> way to run them is 4 independent MD simulations.
>>
>> Here's an example hardware shopping list for building your own 2 x
>>GTX680
>> or 2 x GTX690 machines. We have built several of these and they work
>>great.
>>
>> http://www.rosswalker.co.uk/current_amber_gpu_spec.htm
>>
>> Hope that helps.
>>
>> All the best
>> Ross
>>
>> /\
>> \/
>> |\oss Walker
>>
>> ---------------------------------------------------------
>> |             Assistant Research Professor              |
>> |            San Diego Supercomputer Center             |
>> |             Adjunct Assistant Professor               |
>> |         Dept. of Chemistry and Biochemistry           |
>> |          University of California San Diego           |
>> |                     NVIDIA Fellow                     |
>> | http://www.rosswalker.co.uk | http://www.wmd-lab.org  |
>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk  |
>> ---------------------------------------------------------
>>
>> Note: Electronic Mail is not secure, has no guarantee of delivery, may
>>not
>> be read every day, and should not be used for urgent or sensitive
>>issues.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber