On Wed, Jun 13, 2018, Dmitry Suplatov wrote:
>
> I compared scalability of classical explicit-solvent MD implemented in
> AMBER on GPU and CPU. You can see the resulting plot by the dropbox link
> below (different colors correspond to four proteins of different size):
It looks like the communication efficiency on the CPU drops sharply once
you go beyond four nodes. You can examine the "logfile" from each run to
see how much time is spent in communication and where the bottlenecks
might be.
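If it helps, here is a minimal, generic sketch (plain Python, nothing
AMBER-specific) for pulling timing-related lines out of each run's
logfile for side-by-side comparison; the keyword list is only a guess
and will need adjusting to whatever section names your files actually
contain:

    import sys

    # Assumed markers for timing/communication lines; adjust to match
    # the section names in your own logfiles.
    KEYWORDS = ("time", "comm", "fft", "barrier")

    def show_timing_lines(path):
        # Print every line that mentions one of the keywords,
        # prefixed with file name and line number.
        with open(path) as fh:
            for lineno, line in enumerate(fh, 1):
                if any(k in line.lower() for k in KEYWORDS):
                    print(f"{path}:{lineno}: {line.rstrip()}")

    if __name__ == "__main__":
        for logfile in sys.argv[1:]:
            show_timing_lines(logfile)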
Overall, your (relative) numbers look a lot like what one would expect
from our benchmarks page: a single K40 GPU is 3-4 times faster than a
single node (15-20 cores) of a 2697v3 CPU. Scaling on the GPUs is
rather poor: e.g., your blue curve goes from 19 ns/day to only 28 ns/day
on going to 4 GPUs. This is why many people choose to use GPUs in
serial mode, running multiple independent simulations rather than a
single coupled one.
Scaling on the CPU side, by contrast, is rather better: the black curve
goes from 7 to 22 ns/day on going to 4 CPU nodes, roughly a 3x speedup.
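To put rough numbers on the two scaling comments above, here is a quick
back-of-the-envelope sketch (plain Python, using only the ns/day figures
quoted from your plot):

    def scaling(single, parallel, n):
        # Speedup relative to one device, and parallel efficiency.
        speedup = parallel / single
        return speedup, speedup / n

    # Blue curve on GPUs: 19 -> 28 ns/day on going to 4 GPUs
    s, e = scaling(19.0, 28.0, 4)
    print(f"GPU: {s:.2f}x speedup, {e:.0%} efficiency")   # ~1.47x, ~37%

    # Black curve on CPUs: 7 -> 22 ns/day on going to 4 nodes
    s, e = scaling(7.0, 22.0, 4)
    print(f"CPU: {s:.2f}x speedup, {e:.0%} efficiency")   # ~3.14x, ~79%

    # Aggregate sampling from 4 independent (uncoupled) GPU runs:
    print(f"4 independent GPU runs: {4 * 19} ns/day total")  # 76 ns/day

That ~37% efficiency on 4 GPUs, versus 4 x 19 = 76 ns/day of aggregate
sampling from four uncoupled runs, is the arithmetic behind the
independent-runs advice above.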
...hope this helps....dac