Re: [AMBER] Cluster considerations

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 11 Feb 2011 20:12:15 -0800

Hi Peker,

> We are considering putting together a small GPU cluster for running
> AMBER simulations of some larger biomolecules (~100k atoms).
> Naturally, there are many decisions to be made and not a whole lot of
> documentation describing what works. Our budget is <$10k, so our first
> inclination is to buy four Intel i5 boxes, each with two GPUs
> connected over Gigabit Ethernet. Have people had good experiences with
> this sort of setup? In particular,
>
> 1) Has anyone had experience using GPUs in an MPI configuration over
> Gigabit Ethernet? Is Gigabit Ethernet capable of delivering the
> bandwidth/latency to keep the cards busy?

Gigabit Ethernet will be fine for mounting a file system, say over NFS. For
MPI communication, i.e. running a single simulation in parallel across
nodes, it will be completely useless: for multi-node GPU runs you need QDR
InfiniBand as a minimum. You would, however, still be able to run in
parallel within a node across one or more GPUs.
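To put some rough numbers on that, here is a back-of-envelope sketch you can
compile with nvcc (or any C compiler, it is host-only code). Every figure in
it, the usable bandwidths, the per-step traffic and the per-step GPU time,
is an assumption chosen for illustration, not a measurement:

/* Back-of-envelope: why gigabit Ethernet cannot keep multi-node
 * GPU runs busy. Host-only code; all figures are assumptions. */
#include <stdio.h>

int main(void)
{
    const double natoms     = 100000.0;      /* ~100k atom system         */
    const double step_bytes = natoms * 3.0 * 8.0 * 2.0;
                                /* coords + forces as doubles (assumed)   */
    const double gige_bps   = 100.0e6;       /* ~100 MB/s usable GigE     */
    const double qdr_bps    = 3200.0e6;      /* ~3.2 GB/s usable QDR IB   */
    const double gpu_step_s = 0.020;         /* ~20 ms/step, c.2011 GPU   */

    printf("traffic per step : %.1f MB\n", step_bytes / 1.0e6);
    printf("GigE wire time   : %.1f ms (GPU compute ~%.0f ms)\n",
           1.0e3 * step_bytes / gige_bps, 1.0e3 * gpu_step_s);
    printf("QDR IB wire time : %.1f ms\n", 1.0e3 * step_bytes / qdr_bps);
    return 0;
}

With those assumptions the wire time over GigE alone is longer than the GPU
step itself, so a second node makes you slower, not faster, while QDR IB
keeps communication to a small fraction of a step (and its microsecond
latencies matter as much as the bandwidth does).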

> 2) In the event that gigabit ethernet is insufficient, we have
> considered purchasing an Infiniband interconnect. This, of course,

Only if you want to run a single calculation across multiple nodes. If you
have multiple independent jobs, you can simply run one per node, which
should be fine.

> would require 3x16 PCIe lanes, which no consumer motherboard I have
> seen provides. It seems like the most common configuration is one x16

See: http://www.provantage.com/supermicro-x8dtg-qf~7SUPM39V.htm

Works well with 4 GPUs in one box; all four slots are x16. If you want the
specs for a complete machine, here's an option:

http://www.rosswalker.co.uk/foo.htm

> slot with two x8 slots. This brings us to the question, how much does
> AMBER rely on GPU-CPU data transfers? Would running two GPUs with 8
> lanes each substantially reduce performance? Is there a way we could
> disable 8 lanes of our current setup for benchmarking purposes?

Running a single calculation in parallel across multiple GPUs will perform
poorly if they sit in x8 slots; for that you need them all in x16 slots.
Single-GPU runs are affected much less, maybe 10% or so.
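Rather than trying to disable lanes, you can simply measure what a given
slot delivers. The sketch below is generic CUDA, not anything from AMBER,
and the buffer size and repeat count are arbitrary choices; it times pinned
host-to-device copies. Run it with a card in an x16 slot and again in an x8
slot and compare:

/* Generic CUDA sketch: measure host->device PCIe bandwidth using
 * pinned memory. Compare the result in an x16 vs an x8 slot. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = (size_t)64 << 20;   /* 64 MB test buffer */
    const int    reps  = 20;
    float *h_buf = NULL, *d_buf = NULL;

    cudaMallocHost((void **)&h_buf, bytes);  /* pinned host memory */
    cudaMalloc((void **)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    /* one warm-up copy, then the timed copies */
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("host->device: %.2f GB/s\n",
           (double)bytes * reps / (ms * 1.0e-3) / 1.0e9);

    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}

Roughly speaking, an x16 Gen2 slot should report somewhere around 5-6 GB/s
with pinned memory and an x8 slot about half that; swapping the cudaMemcpy
direction gives you the device-to-host figure as well.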

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.





_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Fri Feb 11 2011 - 20:30:02 PST