Re: [AMBER] Amber scaling on cluster

From: Ross Walker <ross@rosswalker.co.uk>
Date: Tue, 24 Jun 2014 13:48:46 -0700

That sounds normal to me; scaling over multiple nodes is mostly an
exercise in futility these days. Scaling across more cores normally
improves with system size, and chances are your system is too small
(12,000 atoms?) to scale beyond about 16 or 24 MPI tasks, so that is
probably about where you will top out. Unfortunately, the latencies
and bandwidths of 'modern' interconnects just aren't up to the job.
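
If you want to see where that crossover is on your machine, the
simplest check is to rerun the same input (with nstlim cut down to a
few thousand steps) at increasing MPI task counts and compare the
ns/day figure reported at the end of each mdout. A rough sketch - the
file names and task counts below are just placeholders for your setup:

  # quick scaling test with placeholder file names (prod.in etc.)
  for n in 12 24 48 96; do
      mpirun -np $n $AMBERHOME/bin/pmemd.MPI -O \
          -i prod.in -p system.prmtop -c system.rst7 \
          -o scale_${n}.out -r scale_${n}.rst -x scale_${n}.nc
      grep "ns/day" scale_${n}.out | tail -1
  done

Plot ns/day against task count and you will see where it flattens out
(and often drops) once you cross node boundaries.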

You would be better off with a single GTX 780 GPU in a single node;
that should give you 180+ ns/day, and a node with two of these costs
under $2,500:
http://ambermd.org/gpus/recommended_hardware.htm#diy
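
For reference, the single-GPU run just swaps pmemd.MPI for pmemd.cuda;
something along these lines, with the same placeholder file names:

  # select one of the two GPUs in the node, then run on it
  export CUDA_VISIBLE_DEVICES=0
  $AMBERHOME/bin/pmemd.cuda -O -i prod.in -p system.prmtop -c system.rst7 \
      -o prod.out -r prod.rst -x prod.nc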

All the best
Ross


On 6/24/14, 1:39 PM, "Roitberg, Adrian E" <roitberg@ufl.edu> wrote:

>Hi
>
>I am not sure those numbers are indicative of bad performance. Why do
>you say that?
>
>If I look at the Amber benchmarks on the Amber web page for JAC (25K
>atoms, roughly double yours), it seems that 45 ns/day is not bad at all
>for CPUs.
>
>
>Dr. Adrian E. Roitberg
>
>Colonel Allan R. and Margaret G. Crow Term Professor.
>Quantum Theory Project, Department of Chemistry
>University of Florida
>roitberg@ufl.edu
>352-392-6972
>
>________________________________________
>From: George Tzotzos [gtzotzos@me.com]
>Sent: Tuesday, June 24, 2014 4:19 PM
>To: AMBER Mailing List
>Subject: [AMBER] Amber scaling on cluster
>
>Hi everybody,
>
>This is a plea for help. I'm running production MD of a relatively
>small system (126 residues, ~4,000 water molecules) on a cluster.
>Despite all sorts of tests using different numbers of nodes and
>processors, I never managed to get the system running faster than
>45 ns/day, which seems to me rather poor performance. The problem
>seems to be beyond the expertise of our IT people, so your help will
>be greatly appreciated.
>
>
>I'm running Amber 12 and AmberTools 13
>
>My input script is:
>
>production Agam(3n7h)-7octenoic acid (OCT)
> &cntrl
>  imin=0, irest=1, ntx=5,            ! restart: read coordinates and velocities
>  nstlim=10000000, dt=0.002,         ! 10,000,000 x 2 fs steps = 20 ns
>  ntc=2, ntf=2,                      ! SHAKE constraints on bonds to hydrogen
>  cut=8.0, ntb=2, ntp=1, taup=2.0,   ! 8 A cutoff, constant pressure, 2 ps relaxation
>  ntpr=5000, ntwx=5000,              ! write energies/coordinates every 5000 steps
>  ntt=3, gamma_ln=2.0, ig=-1,        ! Langevin thermostat, 2 ps^-1, random seed
>  temp0=300.0,                       ! target temperature 300 K
> /
>
>The cluster configuration is:
>
>
>SGI specs: SGI ICE X
>OS: SUSE Linux Enterprise Server 11 SP2
>Kernel version: 3.0.38-0.5
>2 x 6-core Intel Xeon per blade (Xeon E5-2630 @ 2.3 GHz)
>16 blades, 12 cores each
>Interconnect: InfiniBand FDR, 70 Gbit/sec
>
>
>
>[root@service0 ~]# mpirun -host r1i0n0,r1i0n2 -np 2 /mnt/IMB-MPI1 PingPong
>#---------------------------------------------------
># Intel (R) MPI Benchmark Suite V3.2.4, MPI-1 part
>#---------------------------------------------------
># Date : Wed May 21 19:52:41 2014
># Machine : x86_64
># System : Linux
># Release : 2.6.32-358.el6.x86_64
># Version : #1 SMP Tue Jan 29 11:47:41 EST 2013
># MPI Version : 2.2
># MPI Thread Environment:
>
># New default behavior from Version 3.2 on:
>
># the number of iterations per message size is cut down
># dynamically when a certain run time (per message size sample)
># is expected to be exceeded. Time limit is defined by variable
># "SECS_PER_SAMPLE" (=> IMB_settings.h)
># or through the flag => -time
>
>======================================================
>The tests produced the following output:
>
># Calling sequence was:
>
># /mnt/IMB-MPI1 PingPong
>
># Minimum message length in bytes: 0
># Maximum message length in bytes: 4194304
>#
># MPI_Datatype : MPI_BYTE
># MPI_Datatype for reductions : MPI_FLOAT
># MPI_Op : MPI_SUM
>#
>#
>
># List of Benchmarks to run:
>
># PingPong
>
>#---------------------------------------------------
># Benchmarking PingPong
># #processes = 2
>#---------------------------------------------------
> #bytes #repetitions t[usec] Mbytes/sec
> 0 1000 0.91 0.00
> 1 1000 0.94 1.02
> 2 1000 0.96 1.98
> 4 1000 0.98 3.90
> 8 1000 0.97 7.87
> 16 1000 0.96 15.93
> 32 1000 1.09 28.07
> 64 1000 1.09 55.82
> 128 1000 1.28 95.44
> 256 1000 1.27 192.46
> 512 1000 1.44 338.48
> 1024 1000 1.64 595.48
> 2048 1000 1.97 992.49
> 4096 1000 3.10 1261.91
> 8192 1000 4.65 1681.57
> 16384 1000 8.56 1826.30
> 32768 1000 15.84 1972.98
> 65536 640 17.73 3525.00
> 131072 320 32.92 3797.43
> 262144 160 55.51 4504.01
> 524288 80 115.21 4339.80
> 1048576 40 256.11 3904.54
> 2097152 20 537.72 3719.39
> 4194304 10 1112.70 3594.86
>
>
># All processes entering MPI_Finalize



_______________________________________________
AMBER mailing list
AMBER@ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jun 24 2014 - 14:00:03 PDT