Re: [AMBER] GBSA speeds for big molecules

From: Jason Swails
Date: Tue, 27 Jan 2015 17:18:35 -0500

> On Jan 27, 2015, at 4:18 PM, Asmit Bhowmick wrote:
> Hello Amber users,
> I am trying to do an MD simulation of a big protein (~22,000 atoms) with
> Amber12 software. The first thing I tried was to use GBSA with Amber ff99SB
> given the system size. However, the speeds I am getting are surprisingly
> slow. On 64 cores of an AMD Opteron 6274 processor (2.2 GHz) using PMEMD and
> using 16 A cutoffs for both VdW and rgbmax, I get about 0.2 ns/day (~300
> ms/step). Are these the standard speeds to expect on such a big system with
> GBSA?

They are not surprising to me. Do you get nearly linear scaling up to these 64 cores? (Linear scaling really assumes either a very good interconnect between nodes or a single 64-core node.) If scaling is not very good, then something weird might be happening (are you using pmemd.MPI instead of pmemd? Does the output file state that 64 nodes are being used?). However, even if your timings really reflect only 1 CPU, you will definitely not get better than 12.8 ns/day (64 x 0.2 ns/day, assuming perfect scaling) -- probably a decent bit less (unless something that you’re doing is leading to resource thrashing that renders the 64-thread performance *worse* than 1-thread performance, which is possible).
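
One way to answer the scaling question is to run the same short benchmark at a few core counts and compare each result against ideal linear scaling. The sketch below only does that arithmetic; the ns/day values in it are made-up placeholders, not measurements from any real system, and the function name is hypothetical.

```python
def parallel_efficiency(rates):
    """Return {cores: efficiency} relative to the smallest core count.

    rates maps core count -> measured throughput in ns/day.
    An efficiency of 1.0 means perfectly linear scaling.
    """
    base_cores = min(rates)
    base_rate = rates[base_cores]
    return {
        cores: (rate / base_rate) / (cores / base_cores)
        for cores, rate in rates.items()
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only (assumption, not real data).
    example = {1: 0.01, 16: 0.15, 64: 0.2}
    for cores, eff in sorted(parallel_efficiency(example).items()):
        print(f"{cores:3d} cores: {eff:.0%} of ideal")
```

If efficiency collapses as cores are added, the interconnect or the launch command is a more likely culprit than the GB method itself.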

That said, GBSA is a very poorly-scaling method -- it scales as O(N^2), since it typically requires *very* large cutoffs (a 16 A GB cutoff is probably too small). This is notably worse than PME, which, by virtue of the FFT used in the reciprocal-space calculation, scales as O(N log N). The only reason GB is frequently faster is that PME usually requires a lot more particles (the explicit solvent) -- but there is a crossover point beyond which the algorithmic advantages of PME outweigh the extra particles it requires over GB. That crossover point is quite likely below 22K atoms (but I’ve never simulated a system that large).
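
The crossover argument can be made concrete with a toy cost model: GB cost grows as N^2 in the solute atom count, while PME cost grows as M log M in the solvated atom count M, which is several times larger than N. In the sketch below every constant (the prefactors `c_gb` and `c_pme` and the `solvent_factor`) is an arbitrary assumption chosen only so that the script runs; the number it returns is not a prediction for Amber or for any real system. It only shows that a crossover exists and how it moves with the constants.

```python
import math


def gb_cost(n_solute, c_gb=1.0):
    """Toy GB cost: quadratic in the number of solute atoms."""
    return c_gb * n_solute ** 2


def pme_cost(n_solute, solvent_factor=10.0, c_pme=50.0):
    """Toy PME cost: M log M, where M includes explicit solvent atoms."""
    n_total = solvent_factor * n_solute
    return c_pme * n_total * math.log2(n_total)


def crossover_size(solvent_factor=10.0, c_gb=1.0, c_pme=50.0, n_max=1_000_000):
    """Smallest solute size at which toy GB becomes more expensive than toy PME.

    Returns None if no crossover is found below n_max.
    """
    for n in range(2, n_max + 1):
        if gb_cost(n, c_gb) > pme_cost(n, solvent_factor, c_pme):
            return n
    return None


if __name__ == "__main__":
    # All constants here are assumptions; change them and the answer changes.
    print("toy crossover at N =", crossover_size())
```

The real prefactors depend on the code, the hardware, the cutoff, and how much solvent the box needs, so the only reliable way to locate the crossover for a given system is to benchmark both setups.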

Some more info here -- while technically using a cutoff reduces the scaling of a simulation from O(N^2) to O(N), the lack of any pairlist or spatial “cell” decomposition in the sander and pmemd GB code drops you back to O(N^2) complexity. So while a 16 A cutoff is computationally cheaper than no cutoff, it isn’t *that* much cheaper.
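
To illustrate the point about pairlists: without a spatial decomposition, applying a cutoff still means computing the distance for every pair and then discarding most of them, so the pair loop stays O(N^2). A cell decomposition only compares atoms in the same or adjacent cells. The sketch below is a generic, non-periodic illustration of that idea in plain Python; it is not how sander, pmemd, or NAB implement anything, and the coordinates are random placeholders.

```python
import itertools
import random


def within(a, b, cutoff_sq):
    """True if points a and b are no farther apart than the cutoff."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) <= cutoff_sq


def pairs_brute_force(coords, cutoff):
    """Test every pair; the cutoff only filters after the distance is computed."""
    cutoff_sq = cutoff * cutoff
    found, checked = set(), 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            checked += 1
            if within(coords[i], coords[j], cutoff_sq):
                found.add((i, j))
    return found, checked


def pairs_cell_list(coords, cutoff):
    """Bin atoms into cubic cells of side `cutoff`; test only neighbouring cells."""
    cutoff_sq = cutoff * cutoff
    cells = {}
    for idx, (x, y, z) in enumerate(coords):
        key = (int(x // cutoff), int(y // cutoff), int(z // cutoff))
        cells.setdefault(key, []).append(idx)

    found, checked = set(), 0
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            neighbours = cells.get((cx + dx, cy + dy, cz + dz))
            if neighbours is None:
                continue
            for i in members:
                for j in neighbours:
                    if i < j:  # visit each unordered pair exactly once
                        checked += 1
                        if within(coords[i], coords[j], cutoff_sq):
                            found.add((i, j))
    return found, checked


if __name__ == "__main__":
    rng = random.Random(0)
    # Random placeholder coordinates in a 60 A cube (assumption, not a protein).
    atoms = [tuple(rng.uniform(0.0, 60.0) for _ in range(3)) for _ in range(300)]
    brute, n_brute = pairs_brute_force(atoms, 10.0)
    cell, n_cell = pairs_cell_list(atoms, 10.0)
    print("same pairs found:", brute == cell)
    print("distance checks:", n_brute, "brute force vs", n_cell, "with cells")
```

Both functions return the same set of pairs; the difference is how many distance evaluations were needed to find them, and that gap widens as the system grows relative to the cutoff.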

> Are there any suggested methods to simulate such a big protein ?

Check out the GB functionality in NAB (which you can actually use to run MD). It is parallelized with MPI just like pmemd (although it is also parallelized with OpenMP, which is likely a better choice if you are staying completely within a single node), but it *does* use a pairlist and so probably runs a little bit faster. There is also a new GB method in NAB that scales as O(N log N), called the hierarchical charge partitioning scheme (HCP). I’ve never used it myself, but it was designed for use with very large systems, and you may find that it substantially improves performance.

It does require some extra steps when preparing the system, which are described in the manual.

Another thing to check out -- see what the GPU performance for your system is. You may find that it is quite acceptable.


Jason M. Swails
Rutgers University
Postdoctoral Researcher
AMBER mailing list
Received on Tue Jan 27 2015 - 14:30:02 PST