AMBER: Small scale compute environments

From: M. L. Dodson <mldodson.houston.rr.com>
Date: Sat, 12 May 2007 14:13:59 -0500

Hello Ambers,

I am about to purchase two new compute nodes for my organization.
I asked Ross Walker to comment on compute nodes for small-scale
Amber simulation environments. In particular, I asked him to
comment on two Intel quad-core nodes connected by gigabit ethernet
in a crossover-cable configuration (8 cores total). This is a
summary of his response. I hope it will be useful for other
individual-investigator Amber environments. In some cases I have
left out the technical rationale he gave for his positions.
Contact me directly by email, and I will forward his whole
response.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Hi Bud,

> I'm thinking of two 2.4GHz Intel Core 2 Quad motherboards with
> GB ethernet talking to each other via a crossover cable. What
> is your feeling about Core 2 Duo CPUs, Core 2 Quad CPUs, etc.

I have not looked directly into the quad-core chips yet. My
initial instinct is that they will be sorely lacking in memory
bandwidth. So for reasonably small systems, say 30K atoms or less,
they will likely scream, but for > 30K atoms the performance,
while better than 2 cores, will not be as great as expected. That
said, for a 4-way calculation you should get pretty good scaling.

As for a crossover cable, Bob Duke, who writes PMEMD, swears by
it, although I don't think he has tried it with dual quad cores.
That said, he has certainly done it with two machines that each
have two dual-core CPUs, so 4 cores to 4 cores, and it works okay,
so 2x quad core shouldn't be bad. I don't have explicit numbers
for a crossover cable, but from some recent things I have been
finding out, it may be pretty good even with 4 CPUs, as long as it
is definitely a crossover cable. It seems that the diabolical
performance these days with gigabit ethernet may not simply be
because we have maxed out the bandwidth, but because the CPUs are
now fast enough to overload the switch... [rationale elided] The
issue is one of fundamental design. The reason InfiniBand is so
much better is not so much that it has higher bandwidth, but that
it uses a transaction-based flow control mechanism that guarantees
a packet can never be lost due to lack of buffer space at the
receiver or through a collision. Plus, if a packet is lost due to
signal degradation, the retransmit time with InfiniBand is on the
order of milliseconds, as opposed to 2 seconds with ethernet...

[summary elided]

You may have to tweak the software network settings a bit to get
the best performance, but it should also be pretty good out of the
box - although I can't guarantee this since I haven't tried it in
about 4 years. The advantage of working at a supercomputer center
is that I don't have to worry about cheap interconnects anymore
;-)...
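
[Illustration added to this summary, not part of Ross's message:
on Linux, the software network settings usually meant here are the
kernel socket-buffer limits (plus the NIC's pause-frame flow
control, adjustable with ethtool). A minimal Python sketch that
only reports the current buffer limits:]

# Report the kernel socket-buffer limits commonly raised for MPI
# over gigabit ethernet. This only reads the current values;
# changing them requires sysctl (or writing these files) as root.
settings = [
    "/proc/sys/net/core/rmem_max",  # max receive socket buffer (bytes)
    "/proc/sys/net/core/wmem_max",  # max send socket buffer (bytes)
    "/proc/sys/net/ipv4/tcp_rmem",  # min/default/max TCP receive buffer
    "/proc/sys/net/ipv4/tcp_wmem",  # min/default/max TCP send buffer
]
for path in settings:
    try:
        print("%-32s %s" % (path, open(path).read().strip()))
    except IOError:
        print("%-32s (not readable on this kernel)" % path)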

> I don't want to step up to really high speed interconnects
> because of the cost considerations.  Ditto going through a
> switch.  That is why I am thinking about limiting myself to two
> physical boxes.

Yes, this will save you quite a bit. One thing you might want to
consider as well is how much a dual-socket quad-core machine would
cost you - that way you could get 8 cores in one box, and the
performance would be better than two 4-core boxes.

> 8GB of memory, BTW.

That's good - although the thing to remember here is memory per
core, so 8 GB is really only 2 GB per core. That said, that is
still a ton for MD - sander typically never needs more than about
2 GB total, even for the largest of systems.

> My reading of the benchmarking posts indicates I should scale
> reasonably well over the quad CPU cores, with quite a bit of
> falloff when going over the GB ethernet.

Yes, but that is for a switch - which almost certainly (in
hindsight) had flow control turned off. I would expect at least a
5 to 6 times speedup for 8 CPUs with a crossover cable. Plus, for
regular MD you should use PMEMD, which will both run faster and
scale better.
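
[Arithmetic added to this summary, not part of Ross's message: a
5-6x speedup on 8 cores works out to roughly 60-75% parallel
efficiency, e.g.:]

# Parallel efficiency implied by a 5-6x speedup on 8 cores.
ncores = 8
for speedup in (5.0, 6.0):
    print("%.1fx on %d cores -> %.0f%% parallel efficiency"
          % (speedup, ncores, 100.0 * speedup / ncores))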

> Job mix would be PMEMD classical MD, steered MD (classical and
> QMMM) and NEB (classical and QMMM), mostly.  I understand QMMM
> will not be very accelerated.

That depends on the ratio of QM to MM. If you are doing pure QM,
at present you'll see pretty much no speedup. However, everything
in the QM code is parallel in Amber 9, with the exception of the
matrix diagonalization and the density build. So you can get an
idea of how well it will scale by looking at the percentage of
time that these two operations take... [rationale elided]
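
[Worked example added to this summary, not part of Ross's message:
his suggestion is essentially an Amdahl's-law estimate. If the
serial steps (diagonalization + density build) take a fraction s
of the run time, the best possible speedup on N cores is
1/(s + (1-s)/N). A sketch, with serial fractions chosen purely for
illustration:]

# Amdahl's-law bound on QM/MM scaling from the fraction of time
# spent in the serial parts (matrix diagonalization + density
# build). The serial fractions below are made-up examples, not
# measured numbers.
def max_speedup(serial_fraction, ncores):
    """Best-case speedup when part of the work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / ncores)

ncores = 8
for s in (0.05, 0.20, 0.50):
    print("serial fraction %3.0f%% -> at most %.1fx on %d cores"
          % (100 * s, max_speedup(s, ncores), ncores))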

Note that NEB should scale really well, since the replicas get
shared out across processors. So even a pure QM NEB calculation
with 32 replicas should show close to an 8 times speedup on 8
CPUs. Amber 10 will have an even better parallel NEB
implementation - I am currently testing it, and it will likely
scale to 2048+ CPUs :-)
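
[Illustration added to this summary, not part of Ross's message:
with replica-level parallelism the ideal speedup is set simply by
how evenly the replicas divide over the processors, e.g.:]

# Ideal replica-parallel speedup for a 32-replica NEB run: each
# processor group integrates its share of the replicas, so the
# per-step load drops from 32 replicas to ceil(32/ncpus).
# Illustrative arithmetic only.
import math

nreplicas = 32
for ncpus in (1, 2, 4, 8, 16, 32):
    per_cpu = math.ceil(nreplicas / ncpus)  # replicas handled per cpu
    print("%2d cpus: %2d replicas each -> ideal speedup %4.1fx"
          % (ncpus, per_cpu, nreplicas / per_cpu))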
 
[elided]

All the best
Ross
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Hope this is as helpful to others as it was for me. Thanks to
Ross for sharing his insights with me and with the list.

Bud Dodson
-- 
M. L. Dodson
Email:	mldodson-at-houston-dot-rr-dot-com
Phone:	eight_three_two-five_63-386_one