Re: [AMBER] The system size limitations for Tesla C2050 ?

From: Marek Maly <marek.maly.ujep.cz>
Date: Wed, 14 Jul 2010 01:04:01 +0200

OK,

thanks for the supplementary info!

   Best,

     Marek


On Wed, 14 Jul 2010 00:13:14 +0200, Scott Le Grand <SLeGrand.nvidia.com> wrote:

> No, I'm only talking about multi-GPU. Single-GPU jobs run more or less
> independently unless you're doing something silly like dumping
> coordinates on every single step.
>
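(A quick back-of-the-envelope sketch of why writing coordinates every step is "silly": in AMBER the coordinate-write interval is set by ntwx in the &cntrl namelist, and the Python below, with purely illustrative numbers, estimates the trajectory volume a single-GPU run would have to push out per nanosecond.)

    # Rough estimate (illustrative numbers only) of trajectory output volume
    # when coordinates are written every step vs. every 1000 steps (ntwx).
    atoms = 400_000                  # roughly the cellulose benchmark
    bytes_per_frame = atoms * 3 * 4  # x, y, z at ~4 bytes each (approximate)
    steps_per_ns = 500_000           # 2 fs time step

    for ntwx in (1, 1000):
        frames = steps_per_ns // ntwx
        gb_per_ns = frames * bytes_per_frame / 1e9
        print(f"ntwx = {ntwx:4d}: ~{gb_per_ns:10,.1f} GB of trajectory per ns")

With ntwx = 1 that is terabytes of output per nanosecond, so the GPU ends up waiting on disk I/O instead of computing.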
>
> -----Original Message-----
> From: Marek Maly [mailto:marek.maly.ujep.cz]
> Sent: Tuesday, July 13, 2010 14:47
> To: AMBER Mailing List
> Cc: andrea.danani.supsi.ch; massimo maiolo
> Subject: Re: [AMBER] The system size limitations for Tesla C2050 ?
>
> If this is true also for single-GPU jobs on such a machine, then of course
> a hybrid solution really is a waste of potential performance and money.
>
> Thanks a lot for this valuable and not-so-obvious information!
>
> Best wishes,
>
> Marek
>
>
> On Tue, 13 Jul 2010 23:37:19 +0200, Scott Le Grand <SLeGrand.nvidia.com> wrote:
>
>> Without a lot of load-balancing work, a hybrid solution will have the
>> performance of the slowest part...
>>
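(As a sketch of what "performance of the slowest part" means here, assume a static, even split of the work with a synchronization at every MD step; the per-step times below are made up purely for illustration.)

    # Illustrative only: with the work split evenly across mixed GPUs and a
    # barrier every step, the step time is set by the slowest card.
    step_time = {"C2050": 1.0, "C1060": 2.0}  # arbitrary units for one full step on one card

    def hybrid_step_time(cards):
        # each card gets 1/len(cards) of the work, then everyone waits
        return max(step_time[c] / len(cards) for c in cards)

    print(hybrid_step_time(["C2050"] * 4))              # 0.25
    print(hybrid_step_time(["C2050"] * 3 + ["C1060"]))  # 0.50 -- the C1060 sets the pace

So in a single multi-GPU job, a 3x C2050 + 1x C1060 box would run at half the speed of a pure 4x C2050 box unless the work is explicitly load-balanced.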
>>
>> -----Original Message-----
>> From: Marek Maly [mailto:marek.maly.ujep.cz]
>> Sent: Tuesday, July 13, 2010 09:46
>> To: AMBER Mailing List
>> Cc: andrea.danani.supsi.ch; massimo maiolo
>> Subject: Re: [AMBER] The system size limitations for Tesla C2050 ?
>>
>> OK,
>>
>> many thanks to Ross and Scott for their explanations!
>> I hope that many C2050 users were as pleased as I was :))
>>
>> In fact, my question was motivated by the fact that my colleagues are
>> about to buy a small GPU machine (just a 4-GPU "workstation"), so we
>> were thinking about the proper combination of C1060 and C2050 cards.
>>
>> The C1060 is slower but able to handle big systems (400K atoms and
>> somewhat more), while the C2050 is pretty fast and available for small
>> and medium-sized systems (let's say up to 300K).
>>
>> After this new information about the "magic patch", it seems it will be
>> sufficient to go with a combination of 3x C2050 + 1x C1060, or even with
>> 4x C2050.
>>
>> The question of whether a hybrid solution is worth considering can only
>> be answered fully once it is clear what the difference is in the maximum
>> system size computable on the C2050 and the C1060 after the patch is
>> applied.
>>
>> This will probably still be determined mainly by the differences in the
>> GPUs' built-in memory (4 and 3 GB), but maybe not only that: apart from
>> the architectural differences, which can affect how this memory is used,
>> there is also the possibility that these GPUs make different use of some
>> additional part of the available RAM of the machine where they are
>> installed ...
>>
>> The final answer can probably be given once the new "after-patch"
>> tests/benchmarks are done.
>>
>> Thanks again for the very good news, and for the effort and time of all
>> the developers!
>>
>> Best wishes,
>>
>> Marek
>>
>>
>>
>>
>> On Mon, 12 Jul 2010 20:02:19 +0200, Ross Walker <ross.rosswalker.co.uk> wrote:
>>
>>> Hi Marek,
>>>
>>>> Could you please send some more information (a proper web link is
>>>> enough) about the patch after which a single C2050 was able to
>>>> calculate a 400K-atom system (i.e. the cellulose benchmark)?
>>>
>>> Please be patient here. The patch will come as part of the 'monster'
>>> patch
>>> to add parallel support. This all needs to be extensively tested before
>>> release to make sure the code is giving the correct answers, that we
>>> have
>>> found as many bugs as we can etc.
>>>
>>> I would like to avoid people being given 'experimental' or 'partial'
>>> patches
>>> since it will just make support a complete disaster down the line.
>>> Given
>>> people ultimately want to publish the results from their simulations it
>>> is
>>> also critical that others be able to reproduce their work and this is
>>> difficult if there are multiple versions of AMBER out there, especially
>>> with
>>> something as new as the CUDA GPU support.
>>>
>>>> You are right, the speedup (speaking about explicit-solvent Amber
>>>> calculations) is from circa 40 to 100%, according
>>>> to the relevant benchmark:
>>>>
>>>> http://ambermd.org/gpus/benchmarks.htm
>>>>
>>>> From that benchmark it is evident that the speedup is strongly
>>>> dependent on system size (the speedup decreases as the system size
>>>> increases).
>>>
>>> Yes this will ALWAYS be the case. The interesting thing about the GPU
>>> situation is that the speedup for small systems such as JAC is greater
>>> than
>>> for large systems such as FactorIX. The reasons for this, as with all
>>> benchmarks, are hopelessly complex and a function of the way memory
>>> access
>>> is done on the GPU but also the fact that on the CPU the larger test
>>> case
>>> scales better to the 8 cores of the test machine than the smaller one.
>>> This
>>> is often what is missing when people just talk about speedup since
>>> there
>>> are
>>> MANY degrees of freedom. However, the key point is that the AMBER GPU
>>> code
>>> gets better speedup with smaller systems than larger ones. This of
>>> course
>>> breaks down if you go too small. Probably JAC is the sweet spot, although
>>> I've
>>> never had time to characterize it properly. Note this is the complete
>>> reverse of MPI where the larger the system the better the scaling.
>>>
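(To make the "many degrees of freedom" point concrete, here is a toy calculation with made-up timings: the reported speedup is the CPU time divided by the GPU time, so a large system whose CPU run scales well to 8 cores reports a smaller GPU speedup even when the GPU handles it perfectly well.)

    # Toy numbers only -- not real benchmark data.
    def cpu_time(serial_time, cores, efficiency):
        # simple parallel-efficiency model for the 8-core CPU baseline
        return serial_time / (cores * efficiency)

    small = {"serial": 80.0,  "eff": 0.60, "gpu": 4.0}   # JAC-like: scales poorly on the CPU
    large = {"serial": 800.0, "eff": 0.90, "gpu": 60.0}  # FactorIX-like: scales well on the CPU

    for name, s in (("small", small), ("large", large)):
        t_cpu = cpu_time(s["serial"], 8, s["eff"])
        print(f"{name}: 8-core CPU {t_cpu:6.1f}, GPU {s['gpu']:5.1f}, "
              f"reported speedup {t_cpu / s['gpu']:.1f}x")

With these made-up numbers the small system reports roughly 4.2x and the large one roughly 1.9x, even though nothing is "wrong" with the GPU in either case.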
>>> So, in summary with regards to the patch, please be patient. I wish
>>> things
>>> could be done a lot faster but ultimately funding is the limitation
>>> which
>>> limits the number of people that can work on this. I'm sure NVIDIA
>>> would
>>> love to chuck out the patch to you right now etc but that is because
>>> they
>>> ultimately don't have to support this when things go wrong. Plus I
>>> appreciate the need for the science to be correct! So just give us a
>>> while
>>> to get things properly tested and then the patch will be posted on the
>>> amber
>>> website.
>>>
>>> All the best
>>> Ross
>>>
>>>
>>> /\
>>> \/
>>> |\oss Walker
>>>
>>> ---------------------------------------------------------
>>> | Assistant Research Professor |
>>> | San Diego Supercomputer Center |
>>> | Adjunct Assistant Professor |
>>> | Dept. of Chemistry and Biochemistry |
>>> | University of California San Diego |
>>> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
>>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>>> ---------------------------------------------------------
>>>
>>> Note: Electronic Mail is not secure, has no guarantee of delivery, may
>>> not
>>> be read every day, and should not be used for urgent or sensitive
>>> issues.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>


-- 
This message was created with Opera's revolutionary e-mail client:
http://www.opera.com/mail/
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jul 13 2010 - 16:30:03 PDT