Re: [AMBER] OpenMM Performance (was Re: in vacuo dynamics)

From: Jason Swails <jason.swails.gmail.com>
Date: Tue, 27 May 2014 22:14:38 -0400

On Tue, May 27, 2014 at 5:11 PM, Robert McGibbon <rmcgibbo.gmail.com> wrote:
>
> The particular claim of Jason's that I disputed was that the overhead from
> the python interpreter and/or file IO has a significant effect on OpenMM
> performance.


It can have a very significant impact. Writing a trajectory file, even
in a binary format, requires a _lot_ of overhead when it goes through
the Python application layer. (For my example I used the NetCDF
reporter class in ParmEd, which uses the scipy.io.netcdf_file backend
and does effectively the same thing as MDTraj's NetCDFReporter class.)
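For reference, this is roughly what the Python-layer setup looks like
(a minimal sketch, not my actual benchmark script -- the file names and
the 8 A cutoff are placeholders, and the property names follow the
2014-era simtk.* API):

    # Sketch: attach a NetCDF trajectory reporter to an OpenMM Simulation.
    from simtk.openmm import app
    from simtk import openmm, unit
    from mdtraj.reporters import NetCDFReporter

    prmtop = app.AmberPrmtopFile('system.prmtop')   # placeholder names
    inpcrd = app.AmberInpcrdFile('system.inpcrd')
    system = prmtop.createSystem(nonbondedMethod=app.PME,
                                 nonbondedCutoff=8.0*unit.angstroms,
                                 constraints=app.HBonds)
    integrator = openmm.LangevinIntegrator(300*unit.kelvin,
                                           1.0/unit.picosecond,
                                           2.0*unit.femtoseconds)
    sim = app.Simulation(prmtop.topology, system, integrator,
                         openmm.Platform.getPlatformByName('CUDA'),
                         dict(CudaPrecision='mixed'))
    sim.context.setPositions(inpcrd.positions)

    # Every report below goes through the Python layer: positions are
    # pulled off the GPU, unit-converted, and handed to the NetCDF writer.
    sim.reporters.append(NetCDFReporter('traj.nc', 500))  # every 500 steps
    sim.step(10000)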

You download all 3N coordinates into Python space (I am not sure of the
mechanics, or how much memory, if any, gets copied each time), do a
unit conversion on all 3N coordinates, then write those coordinates to
the trajectory file. In pmemd.cuda this overhead is effectively
eliminated: the only costs are the NetCDF API call (cheap) and the I/O
itself (expensive). If you write snapshots infrequently, the trajectory
writing is not noticeable, since the MD itself takes so much longer. If
you write frequently enough, the cost becomes measurable (and at some
point, even dominant). Trajectory I/O is roughly two orders of
magnitude faster in pmemd.cuda (or in any program that uses the OpenMM
C/C++/Fortran API directly -- basically anything that _avoids_ Python
and OpenMM's dimensional analysis) than it is in the OpenMM Python
application layer.
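To make the per-frame work concrete, here is a bare-bones reporter in
the spirit of what ParmEd/MDTraj do (heavily simplified, my own sketch
rather than their actual code):

    import numpy as np
    from scipy.io import netcdf_file
    from simtk import unit

    class MinimalNetCDFReporter(object):
        """Illustration of the Python work done for every saved frame."""
        def __init__(self, filename, interval, natom):
            self._interval = interval
            self._frame = 0
            self._nc = netcdf_file(filename, 'w')
            self._nc.createDimension('frame', None)      # unlimited
            self._nc.createDimension('atom', natom)
            self._nc.createDimension('spatial', 3)
            self._coords = self._nc.createVariable(
                'coordinates', 'f', ('frame', 'atom', 'spatial'))

        def describeNextReport(self, simulation):
            steps = self._interval - simulation.currentStep % self._interval
            return (steps, True, False, False, False)    # positions only

        def report(self, simulation, state):
            # 1) pull all 3N coordinates from the Context into Python
            xyz = state.getPositions(asNumpy=True)
            # 2) unit-convert every coordinate (nanometers -> Angstroms)
            xyz = xyz.value_in_unit(unit.angstroms)
            # 3) hand the array to the NetCDF layer and flush to disk
            self._coords[self._frame] = np.asarray(xyz, dtype=np.float32)
            self._frame += 1
            self._nc.sync()

Steps 1 and 2 happen in pure Python/NumPy for every saved frame;
pmemd.cuda does the equivalent work in compiled code right next to the
NetCDF call.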

I ran a quick benchmark on my desktop with a GTX 680 (writing to a
standard SATA drive, 6 Gb/s IIRC), running all calculations with CUDA
5.5 on a 21,548-atom system with PME (SPFP precision in pmemd and the
mixed precision model in OpenMM), using the same constraint tolerance,
cutoff, etc. in both. Results are shown in the fixed-width table below;
the text file is attached in case the formatting comes out poorly and
you want to see it:

                OpenMM                           pmemd.cuda
                ------                           ----------

       steps | Total time (s)          steps | Total time (s)
      --------------------------      --------------------------
      10,000 |    374.9797            10,000 |    86.58
       5,000 |    375.5790             5,000 |    85.89
       2,000 |    378.6431             2,000 |    85.96
       1,000 |    384.0025             1,000 |    85.93
         500 |    395.7065               500 |    86.34
         250 |    417.7666               250 |    86.31
         100 |    484.1631               100 |    87.03
          50 |    597.1071                50 |    88.20
          10 |   1478.6362                10 |    98.74
           5 |   2677.9380*                5 |   109.40*
           1 |  11431.2420*                1 |   199.20*
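For anyone who wants to reproduce the OpenMM column, a driver along the
following lines should work (a sketch only, not the exact script I ran;
here "steps" is the interval between trajectory writes, and TOTAL_STEPS
is a placeholder for the actual run length):

    import time
    from simtk import unit
    from mdtraj.reporters import NetCDFReporter

    # 'sim' and 'inpcrd' come from the setup sketch earlier in this mail.
    TOTAL_STEPS = 10000   # placeholder run length

    for interval in (10000, 5000, 2000, 1000, 500, 250, 100, 50, 10, 5, 1):
        sim.context.setPositions(inpcrd.positions)
        sim.context.setVelocitiesToTemperature(300*unit.kelvin)
        sim.reporters = [NetCDFReporter('bench_%d.nc' % interval, interval)]
        start = time.time()
        sim.step(TOTAL_STEPS)
        print('%6d | %10.4f' % (interval, time.time() - start))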

I had seen this effect before. What I have never measured is the cost
of having the Simulation class break the MD up into 10-step chunks so
that Python can catch a SIGINT. It would be easy enough to subclass
Simulation and implement "step" without breaking it into 10-step
chunks; I've just never done it.
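For what it's worth, a subclass along these lines should do it (an
untested sketch of the idea, not code from OpenMM -- it requests all
State data for simplicity and skips some of the bookkeeping the stock
step() does):

    from simtk.openmm import app

    class UnchunkedSimulation(app.Simulation):
        """Advance the integrator straight to the next reporter deadline
        instead of in 10-step chunks (giving up the chance to catch a
        KeyboardInterrupt between chunks)."""
        def step(self, steps):
            while steps > 0:
                # steps remaining until each reporter is next due
                due = [r.describeNextReport(self)[0] for r in self.reporters]
                stride = min([steps] + [d for d in due if d > 0])
                self.integrator.step(stride)   # one call, no chunking
                self.currentStep += stride
                steps -= stride
                # fire any reporters that came due on this stride
                state = None
                for reporter, d in zip(self.reporters, due):
                    if d == stride:
                        if state is None:
                            state = self.context.getState(
                                getPositions=True, getVelocities=True,
                                getForces=True, getEnergy=True)
                        reporter.report(self, state)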

All the best,
Jason

P.S. The timings presented here are intended only to show the impact of
trajectory file I/O on the performance of OMM-Python and pmemd.cuda;
the cross-comparisons between the two are probably less reliable.

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher



_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Tue May 27 2014 - 19:30:02 PDT