Re: [AMBER] The system size limitations for Tesla C2050 ? from Ross Walker on 2010-07-12 (Amber Archive Jul 2010)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Mon, 12 Jul 2010 11:02:19 -0700

Hi Marek,

> Could you please send some more information (the proper web link is
> enough) about the patch after which a single
> C2050 was able to calculate a 400K atom system (i.e. the cellulose
> benchmark)?

Please be patient here. The patch will come as part of the 'monster' patch
to add parallel support. This all needs to be extensively tested before
release to make sure that the code is giving the correct answers and that
we have found as many bugs as we can.

I would like to avoid people being given 'experimental' or 'partial' patches
since it will just make support a complete disaster down the line. Given
people ultimately want to publish the results from their simulations it is
also critical that others be able to reproduce their work and this is
difficult if there are multiple versions of AMBER out there, especially with
something as new as the CUDA GPU support.

> You are right, the speedup (speaking about the explicit solvent Amber
> calc.) is from about 40 to 100% according
> to the relevant benchmark:
>
> http://ambermd.org/gpus/benchmarks.htm
>
> From that benchmark it is evident that the speedup is strongly dependent
> on system size (the larger the system, the smaller the speedup).

Yes, this will ALWAYS be the case. The interesting thing about the GPU
situation is that the speedup for small systems such as JAC is greater than
for large systems such as FactorIX. The reasons for this, as with all
benchmarks, are hopelessly complex. They are partly a function of the way
memory access is done on the GPU, but also of the fact that on the CPU the
larger test case scales better to the 8 cores of the test machine than the
smaller one does. This is often what is missing when people just talk about
speedup, since there are MANY degrees of freedom. However, the key point is
that the AMBER GPU code gets better speedup with smaller systems than with
larger ones. This of course breaks down if you go too small. JAC is
probably the sweet spot, although I've never had time to characterize it
properly. Note that this is the complete reverse of MPI, where the larger
the system, the better the scaling.
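
To make the CPU-scaling point concrete, here is a rough sketch in Python.
The function name, its parameters and all of the numbers are made up purely
for illustration; they are NOT benchmark results. It assumes a very simple
model in which the CPU reference throughput is the single-core throughput
times the number of cores times a parallel efficiency. Under that
assumption, two systems with the same GPU-to-single-core ratio report
different speedups just because the CPU reference scales differently.

```python
def gpu_speedup(gpu_ns_per_day, cpu_core_ns_per_day, cores, efficiency):
    """Speedup of one GPU over a multi-core CPU run (hypothetical model).

    efficiency is the CPU parallel efficiency in (0, 1]; 1.0 means the
    CPU run scales perfectly across all cores.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    # Throughput of the multi-core CPU reference run.
    cpu_ns_per_day = cpu_core_ns_per_day * cores * efficiency
    return gpu_ns_per_day / cpu_ns_per_day


# Illustrative, made-up throughputs. In both cases the GPU is 16x a
# single CPU core; only the CPU parallel efficiency differs.
small = gpu_speedup(gpu_ns_per_day=16.0, cpu_core_ns_per_day=1.0,
                    cores=8, efficiency=0.5)   # small system, poor CPU scaling
large = gpu_speedup(gpu_ns_per_day=4.0, cpu_core_ns_per_day=0.25,
                    cores=8, efficiency=1.0)   # large system, ideal CPU scaling

print(f"small system speedup: {small:.1f}x")
print(f"large system speedup: {large:.1f}x")
```

This only isolates the CPU-scaling effect. It says nothing about the GPU
memory access side, which also changes with system size.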

So, in summary with regards to the patch, please be patient. I wish things
could be done a lot faster, but ultimately funding limits the number of
people that can work on this. I'm sure NVIDIA would love to chuck the patch
out to you right now, but that is because they ultimately don't have to
support this when things go wrong. Plus I appreciate the need for the
science to be correct! So just give us a while to get things properly
tested and then the patch will be posted on the AMBER website.

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

Received on Mon Jul 12 2010 - 11:30:03 PDT