Re: [AMBER] why cut off is so small for CUDA running?

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sun, 17 Jun 2012 21:47:47 -0700

Hi Albert,

> I found that in the CUDA calculations the cutoff is always 9.0, or
> even 8.0 in most cases. I am curious why it is so small. Is it

Not sure why you found this specific to the CUDA version. AMBER has always
defaulted to a nonbond cutoff of 8.0 Angstroms; some people use 9.0. Very
rarely do people use more than that, and most of the time (unless they
carefully tweak their reciprocal space settings to keep the Ewald error
constant while shifting more work to the direct space) they just end up
wasting time for little or no gain in accuracy.
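To make this concrete, here is a rough sketch of what I mean (the grid
dimensions and run lengths below are made up purely for illustration; they
depend on your own box):

 Typical production PME settings, as used in the benchmarks:

  &cntrl
    irest=1, ntx=5,
    dt=0.002, nstlim=500000,
    ntc=2, ntf=2,
    cut=8.0, ntb=2, ntp=1,
    ntt=3, gamma_ln=1.0, temp0=300.0,
  /

 If you just set cut=10.0 and leave the &ewald defaults alone, pmemd
 recomputes the Ewald coefficient from cut and dsum_tol, so the overall
 accuracy stays about the same and you simply pay for a longer pair list.
 To actually shift work from reciprocal to direct space you would also
 have to coarsen the PME grid yourself, something like:

  &ewald
    nfft1=48, nfft2=48, nfft3=48,
    dsum_tol=1.0d-5,
  /

 and then check that the total Ewald error estimate is still acceptable
 for your system.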

The benchmarks provided on the GPU page were designed long ago (way before
CUDA was even a twinkle in NVIDIA's eye) for the specific purpose of
providing realistic benchmarks that reflect production calculations, rather
than specially tuned ones. We have used those exact settings with the CPU
codes for a long time.

> because of the CUDA module, or just because the developers would like
> to show better performance for CUDA? Usually, in the literature,
> people use 10.0 in most cases.

If I really wanted to show 'better' performance for CUDA then I'd be using
hydrogen mass repartitioning, 4 fs time steps, multiple time stepping for
PME, a crazy coarse PME grid, an 8 Angstrom cutoff, a SHAKE tolerance of
10^-4, and running pure single precision (SPSP). Yeah baby! We'd be at
150 ns/day+ for DHFR, no worries! And it would be 'hot hot hot' ;-)
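Just to spell out what that 'benchmark special' would look like at the mdin
level (a hypothetical sketch, not something we ship or recommend; the mass
repartitioning lives in the prmtop and the pure single precision model was
a compile-time choice for the GPU code, so neither appears here):

  &cntrl
    dt=0.004, nstlim=500000,
    ntc=2, ntf=2, tol=1.0d-4,
    cut=8.0, ntb=1, ntt=0,
  /
  &ewald
    nfft1=32, nfft2=32, nfft3=32,
  /

Great for a headline number, not so great if you care about the quality of
the trajectory you get back.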

Man, if we really wanted to 'artificially' show better performance we'd be
SHAKEing all bonds and angles, doing dihedral-only MD at a 20 fs timestep,
and we'd be at a microsecond a day+. Woo hoo!!! Don't laugh: some marketing
folks for 'other' codes, which shall remain nameless to protect the guilty,
seriously do this to publish headline benchmark numbers and then compare
against AMBER running 'production' settings.

So, NO, the settings have been chosen specifically to reflect the real-world
simulations that people typically run. The comparisons are ALL as close to
apples-to-apples as we can make them (with the exception of using the new
hybrid precision model), and the CPU runs all use the same settings.
Besides, using a big cutoff would likely make the CPU code even slower
relative to the GPU code, given that GPUs can do the direct space sum so
much more efficiently.

Hope that clears that up.

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.




_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sun Jun 17 2012 - 22:00:02 PDT