Re: [AMBER] Amber CUDA calculation on GeForce GTX 590 ?

From: Jason Swails <jason.swails.gmail.com>
Date: Sat, 9 Apr 2011 10:31:16 -0700

Hi Filip,

I believe pmemd.cuda.MPI needs to be run with not only an even number of
processors, but a power-of-2 number of processors, for performance reasons.
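
As a minimal sketch, a planned process count can be checked before launching. The helper name is_power_of_two is made up for illustration, and the power-of-2 rule is only the belief stated above, not something confirmed against the pmemd.cuda.MPI source:

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to 0.
    return n > 0 and (n & (n - 1)) == 0


# Check the process counts discussed in this thread (1 to 4 GPUs).
for nproc in (1, 2, 3, 4):
    status = "ok" if is_power_of_two(nproc) else "not a power of 2"
    print(f"{nproc} processes: {status}")
```

Under that assumption, a 3-process run would be rejected while 1, 2 and 4 pass.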

Also, the scaling penalty on the GTX590 is not only unsurprising, but
expected (as Ross and Scott have belabored previously in this thread). The
problem is that you effectively have 2 GPUs sharing one PCIe slot. Since the
bottleneck to scaling is going to be the communication, if you halve the
bandwidth available to each GPU (which you're doing by putting 2 GPUs in
the same slot), then you're going to see a serious hit in parallel
scaling.
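
The halving can be put into rough numbers. This is only a back-of-the-envelope sketch: it assumes a PCIe 2.0 link at about 0.5 GB/s per lane per direction and an even split between the GPUs sharing a slot, which ignores protocol overhead and any bridge chip on the card. The function name per_gpu_bandwidth is hypothetical.

```python
# Assumption: PCIe 2.0, roughly 0.5 GB/s per lane in each direction.
GB_PER_S_PER_LANE = 0.5


def per_gpu_bandwidth(lanes: int, gpus_sharing: int) -> float:
    """Idealized per-GPU bandwidth (GB/s) when GPUs evenly share one slot."""
    return lanes * GB_PER_S_PER_LANE / gpus_sharing


dedicated = per_gpu_bandwidth(lanes=16, gpus_sharing=1)  # one GPU per x16 slot
shared = per_gpu_bandwidth(lanes=16, gpus_sharing=2)     # two GPUs in one slot

print(f"dedicated x16 slot: {dedicated:.1f} GB/s per GPU")
print(f"shared x16 slot:    {shared:.1f} GB/s per GPU")
```

The absolute figures matter less than the ratio: each GPU in a shared slot sees about what a single GPU would get from an x8 link.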

In fact, I think the most surprising part of this thread is that using both
GPUs on a GTX590 DOESN'T suck, since they're trying to communicate with each
other through the same PCIe channel (which is now effectively only x8 for
each GPU, because they have to split it).

When you move onto the second GTX590, though, you're definitely pushing your
luck. Now you have 4 GPUs all trying to communicate with each other via
PCIe x8 slots, rather than the x16 slots that would give optimal
performance.
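
To put numbers on this, the DHFR NPT figures quoted below (20.18, 28.93 and 32.60 ns/day on 1, 2 and 4 GPUs) can be turned into speedup and parallel efficiency. This is just a sketch of the arithmetic; the function name scaling is made up for illustration, and efficiency here means speedup divided by GPU count.

```python
def scaling(ns_per_day: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map GPU count -> (speedup vs 1 GPU, parallel efficiency)."""
    base = ns_per_day[1]  # single-GPU throughput is the reference
    return {n: (rate / base, rate / base / n) for n, rate in ns_per_day.items()}


# DHFR NPT throughput from the quoted benchmark, in ns/day.
dhfr_npt = {1: 20.18, 2: 28.93, 4: 32.60}

for n, (speedup, efficiency) in scaling(dhfr_npt).items():
    print(f"{n} GPU: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

The same function can be fed the NVE, FactorIX or Myoglobin figures from the quoted message to compare how each system scales.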

Hope this helps,
Jason

On Sat, Apr 9, 2011 at 9:21 AM, filip fratev <filipfratev.yahoo.com> wrote:

> Hi Ross,
> Is it possible to run pmemd.cuda.MPI with only 3 cards?
>
> I tried to do that but received this error:
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>
> The problem is that I observed today that the 4th GPU (on the second
> GTX590) in fact worsens the results or contributes only up to 5%. For
> instance, the scaling in the docking program Hex and in Amber is almost
> the same, about 1.48x, between the 2 GPUs of one GTX590. When I used a
> third GPU in Hex the scaling was fine: 1.87x compared to 1 GPU (also
> similar to the ACEMD results, where going from 1 to 3 cards gives nearly
> a 2x speedup). However, the 4th GPU worsens the calculations in Hex, and
> I suppose the same will happen in Amber. The results in NAMD 2.8b1 are
> similar: the 4th GPU contributes only 5%.
> Thus, if I am able to use only 3 GPUs in Amber, this will give me a
> significant advantage with the current drivers.
>
> I'd also like to ask your opinion on what could be the reason for
> such bad/terrible scaling. I mean, 1.3-1.4x is somehow OK, but 1.1x... Is
> it only a driver problem, or am I missing something?
>
> All the best,
> Filip
>
>
>
> --- On Fri, 4/8/11, filip fratev <filipfratev.yahoo.com> wrote:
>
> > From: filip fratev <filipfratev.yahoo.com>
> > Subject: Re: [AMBER] Amber CUDA calculation on GeForce GTX 590 ? - Test results
> > To: "AMBER Mailing List" <amber.ambermd.org>
> > Date: Friday, April 8, 2011, 12:04 PM
> > Hi Ross, Marek and all,
> > I obtained terrible results with the GTX590 (two cards) using the
> > unofficial 270.40 drivers:
> >
> > Explicit solvent:
> >
> > 1) GTX590 DHFR NPT:
> > 1GPU: 20.18 ns/day
> > 2GPU: 28.93 ns/day
> > 4GPU: 32.60 ns/day
> > (GTX470: 18.82 ns/day)
> >
> > 2) GTX590 DHFR NVE:
> > 1GPU: 23.45 ns/day
> > 2GPU: 33.54 ns/day
> > 4GPU: 36.91 ns/day
> > (GTX470: 21.20 ns/day)
> >
> >
> > From the above results we can see that for very small
> > systems the GTX590 is only about 54-56% faster than the GTX470.
> > Thus, the GTX590 gives a result comparable to the GTX580, if the
> > latter is in reality 50% faster than the GTX470 (I don't have such a
> > card; if anyone has one, please post some results here). As you can
> > see, the scaling is 1.43x between the cores (the two GPUs on one card)
> > and ONLY 1.13x (13%...) between the cards. So, if one plans to simulate
> > only such small systems, the choice is obvious... :)
> >
> > In the more realistic case (90,906 atoms) things look more
> > optimistic:
> > 3) GTX590 FactorIX NPT:
> > 1GPU: 4.10 ns/day
> > 2GPU: 6.66 ns/day
> > 4GPU: 7.42 ns/day
> > (GTX470: 3.65 ns/day)
> >
> > 4) GTX590 FactorIX NVE:
> > 1GPU: 6.53 ns/day
> > 2GPU: 9.66 ns/day
> > 4GPU: 11.01 ns/day
> > (GTX470: 5.53 ns/day)
> >
> > Here we have more than 1.6x scaling between the cores for NPT and
> > 1.5x for NVE, but again only 1.11-1.14x between the cards. Probably
> > something is wrong with my system... Thus, in this case the GTX590
> > is about 75-81% faster than the GTX470 and probably around 30%
> > faster than the GTX580, but not 50%. I have seen differences of a few
> > percent between drivers, especially with the 270 series (always bad
> > results), so one can expect an additional few % after the official
> > driver release and further improvements. Unfortunately the memory is
> > insufficient for 400,000 atoms, and two 3GB GTX580s (if they scale
> > well! I don't believe that after today's experiments and
> > Ross's comments, but some results are welcome) seem to be the
> > better choice for larger systems.
> >
> > GB (implicit solvent):
> > 5) GTX590 TRPCage:
> > 1GPU: 354.77 ns/day
> > (GTX470: 398.25 ns/day)
> >
> > 6) GTX590 Myoglobin:
> > 1GPU: 49.42 ns/day
> > 2GPU: 62.82 ns/day
> > 4GPU: 79.09 ns/day
> > (GTX470: 49.03 ns/day)
> >
> >
> > I think that the bad-driver conclusion is supported by the
> > above results, because even if only the speed plays the major
> > role here, the GTX470 gives better results on TRPCage, which I think
> > should be impossible... In the case of Myoglobin the scaling between
> > the cores is the same as for two C2050s, and it is similar between
> > cards. Thus, I hope that with better drivers the numbers will be
> > better in all tests.
> >
> > I'd like to note that I observed similar scaling with NAMD
> > 2.7 and 2.8.
> > There is no problem with core temperatures (I don't know about
> > the VRM and all the hysteria in that direction): 62-65C under
> > load, and they will probably reach 70-75C during long simulations,
> > but I don't think more than that, given how the GPUs scale.
> >
> >
> > I also have to mention that I have some problems with the
> > BIOS versions: GPU 1 and GPU 3 work with ASUS BIOS revision
> > 2, but GPU 2 and GPU 4 with revision 1... I will solve that
> > today. In general the driver is not good; I was not
> > able to start some programs or to monitor GPU
> > usage. How these problems, or others, relate to the above
> > results I don't know, but probably we cannot expect much
> > more from the GTX590. Disappointed :(
> >
> > I am not an expert, but I hope that CUDA 4.0 (created mainly for
> > better parallel performance) may solve some of these
> > problems, although, as Ross informed us today, this will take time.
> >
> > Regards,
> > Filip
> >
> >
> >
> > --- On Fri, 4/8/11, Marek Maly <marek.maly.ujep.cz> wrote:
> >
> > > From: Marek Maly <marek.maly.ujep.cz>
> > > Subject: Re: [AMBER] Amber CUDA calculation on GeForce GTX 590 ?
> > > To: "AMBER Mailing List" <amber.ambermd.org>
> > > Date: Friday, April 8, 2011, 3:53 AM
> > > Hi Filip,
> > > thanks for the info (I haven't received any information from the
> > > NVIDIA help desk so far).
> > > Anyway, once you have the first Amber benchmarks with the 590
> > > done, I hope that you will tell us here about your first
> > > impressions ...
> > >
> > >
> > > Best wishes,
> > >
> > > Marek
> > >
> > >
> > >
> > >
> > > On Fri, 08 Apr 2011 01:52:35 +0200, filip fratev <filipfratev.yahoo.com> wrote:
> > >
> > > > Hi Ross and Marek,
> > > >
> > > > Marek, I saw this evening that the GTX590 is included in the new
> > > > "Developer Drivers for Linux (270.40)" that come with the new
> > > > CUDA 4.0 RC2 release. There is still no access via the official
> > > > Nvidia website, but one can download them from the developer zone:
> > > > http://developer.nvidia.com/cuda-toolkit-40
> > > > I will test the drivers and hope finally to test the GTX590 with
> > > > Amber too.
> > > >
> > > > Ross, I read your further comments about CUDA 4.0, but regarding
> > > > our last discussion about GTX590 GPU scaling, do you think that
> > > > CUDA 4.0 can bring us better performance for these cards
> > > > compared to the GTX295?
> > > >
> > > > Regards,
> > > > Filip
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > AMBER mailing list
> > > > AMBER.ambermd.org
> > > > http://lists.ambermd.org/mailman/listinfo/amber
> > >
> > >
> > > --
> > > This message was created with Opera's revolutionary e-mail client:
> > > http://www.opera.com/mail/



-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
Received on Sat Apr 09 2011 - 11:00:02 PDT