Re: [AMBER] Amber scaling on cluster from Ross Walker on 2014-06-24 (Amber Archive Jun 2014)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 24 Jun 2014 14:11:49 -0700

One further note - you can improve things a little by using ntt=1 or 2
rather than 3. The Langevin thermostat can hurt parallel scaling. You
could also try leaving some of the cores on each node idle - sometimes
this helps. For example, request 4 nodes but use only 8 cores per node and
run with mpirun -np 32. Make sure it really does run only 8 MPI tasks per
node.
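
A rough way to check the placement, sketched in Python under a few assumptions: launch hostname with the same mpirun line (e.g. mpirun -np 32 hostname, which prints one line per MPI task), save the output, and count the lines per node. The function names and the two node names r1i0n1 and r1i0n3 are made up for illustration, and how you actually limit the launcher to 8 tasks per node depends on your MPI implementation and queueing system.

```python
from collections import Counter


def tasks_per_node(hostname_lines):
    """Count MPI tasks per node from the output of `mpirun -np N hostname`."""
    # One hostname is printed per task; skip blank lines and strip whitespace.
    return Counter(line.strip() for line in hostname_lines if line.strip())


def placement_ok(counts, nodes=4, per_node=8):
    """Return True if exactly `nodes` hosts appear, each with `per_node` tasks."""
    return len(counts) == nodes and all(n == per_node for n in counts.values())


# Illustrative input: what the saved hostname output would look like
# for 4 nodes running 8 tasks each (node names are hypothetical).
sample = ["r1i0n0"] * 8 + ["r1i0n1"] * 8 + ["r1i0n2"] * 8 + ["r1i0n3"] * 8
counts = tasks_per_node(sample)
for host, n in sorted(counts.items()):
    print(f"{host}: {n} tasks")
print("placement OK" if placement_ok(counts) else "placement differs from 4 x 8")
```

With real output you would pass the lines of the saved file instead of the sample list. If some node shows 12 tasks, the launcher has packed the nodes and the idle-core test is not measuring what you intended.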

All the best
Ross

On 6/24/14, 1:48 PM, "Ross Walker" <ross.rosswalker.co.uk> wrote:

>That sounds normal to me - scaling over multiple nodes is mostly an
>exercise in futility these days. Scaling to multiple cores normally
>improves with system size - chances are your system is too small (12,000
>atoms?) to scale to more than about 16 or 24 MPI tasks so that's probably
>about where you will top out. Unfortunately the latencies and bandwidths
>of 'modern' interconnects just aren't up to the job.
>
>You would be better off using a single GTX 780 GPU in a single node,
>which should give you 180+ ns/day. A node with two of these costs less
>than $2500:
>http://ambermd.org/gpus/recommended_hardware.htm#diy
>
>All the best
>Ross
>
>
>On 6/24/14, 1:39 PM, "Roitberg,Adrian E" <roitberg.ufl.edu> wrote:
>
>>Hi
>>
>>I am not sure those numbers indicate bad performance. Why do you say
>>that?
>>
>>If I look at the Amber benchmarks on the Amber webpage for JAC (25K
>>atoms, roughly double yours), 45 ns/day does not seem bad at all for
>>CPUs.
>>
>>
>>Dr. Adrian E. Roitberg
>>
>>Colonel Allan R. and Margaret G. Crow Term Professor.
>>Quantum Theory Project, Department of Chemistry
>>University of Florida
>>roitberg.ufl.edu
>>352-392-6972
>>
>>________________________________________
>>From: George Tzotzos [gtzotzos.me.com]
>>Sent: Tuesday, June 24, 2014 4:19 PM
>>To: AMBER Mailing List
>>Subject: [AMBER] Amber scaling on cluster
>>
>>Hi everybody,
>>
>>This is a plea for help. I'm running production MD on a cluster for a
>>relatively small system (126 residues, ~4,000 HOH molecules). Despite
>>all sorts of tests using different numbers of nodes and processors, I
>>have never managed to get the system running faster than 45 ns/day,
>>which seems to me rather poor performance. The problem seems to be
>>beyond the knowledge of our IT people, so your help would be greatly
>>appreciated.
>>
>>
>>I'm running Amber 12 and AmberTools 13
>>
>>My input script is:
>>
>>production Agam(3n7h)-7octenoic acid (OCT)
>> &cntrl
>> imin=0,irest=1,ntx=5,
>> nstlim=10000000,dt=0.002,
>> ntc=2,ntf=2,
>> cut=8.0, ntb=2, ntp=1, taup=2.0,
>> ntpr=5000, ntwx=5000,
>> ntt=3, gamma_ln=2.0, ig=-1,
>> temp0=300.0,
>> /
>>
>>The Cluster configuration is:
>>
>>
>>SGI Specs SGI ICE X
>>OS - SUSE Linux Enterprise Server 11 SP2
>>Kernel Version: 3.0.38-0.5
>>2x6-Core Intel Xeon
>>
>>16 blades 12 cores each
>>
>>The cluster uses Xeon E5-2630 @ 2.3 GHz; Infiniband FDR 70 Gbit/sec
>>
>>
>>
>>[root.service0 ~]# mpirun -host r1i0n0,r1i0n2 -np 2 /mnt/IMB-MPI1 PingPong
>> benchmarks to run PingPong
>>#---------------------------------------------------
>># Intel (R) MPI Benchmark Suite V3.2.4, MPI-1 part
>>#---------------------------------------------------
>># Date : Wed May 21 19:52:41 2014
>># Machine : x86_64
>># System : Linux
>># Release : 2.6.32-358.el6.x86_64
>># Version : #1 SMP Tue Jan 29 11:47:41 EST 2013
>># MPI Version : 2.2
>># MPI Thread Environment:
>>
>># New default behavior from Version 3.2 on:
>>
>># the number of iterations per message size is cut down
>># dynamically when a certain run time (per message size sample)
>># is expected to be exceeded. Time limit is defined by variable
>># "SECS_PER_SAMPLE" (=> IMB_settings.h)
>># or through the flag => -time
>>
>>======================================================
>>Tests resulted in the following output
>>
>># Calling sequence was:
>>
>># /mnt/IMB-MPI1 PingPong
>>
>># Minimum message length in bytes: 0
>># Maximum message length in bytes: 4194304
>>#
>># MPI_Datatype : MPI_BYTE
>># MPI_Datatype for reductions : MPI_FLOAT
>># MPI_Op : MPI_SUM
>>#
>>#
>>
>># List of Benchmarks to run:
>>
>># PingPong
>>
>>#---------------------------------------------------
>># Benchmarking PingPong
>># #processes = 2
>>#---------------------------------------------------
>>    #bytes  #repetitions   t[usec]  Mbytes/sec
>>         0          1000      0.91        0.00
>>         1          1000      0.94        1.02
>>         2          1000      0.96        1.98
>>         4          1000      0.98        3.90
>>         8          1000      0.97        7.87
>>        16          1000      0.96       15.93
>>        32          1000      1.09       28.07
>>        64          1000      1.09       55.82
>>       128          1000      1.28       95.44
>>       256          1000      1.27      192.46
>>       512          1000      1.44      338.48
>>      1024          1000      1.64      595.48
>>      2048          1000      1.97      992.49
>>      4096          1000      3.10     1261.91
>>      8192          1000      4.65     1681.57
>>     16384          1000      8.56     1826.30
>>     32768          1000     15.84     1972.98
>>     65536           640     17.73     3525.00
>>    131072           320     32.92     3797.43
>>    262144           160     55.51     4504.01
>>    524288            80    115.21     4339.80
>>   1048576            40    256.11     3904.54
>>   2097152            20    537.72     3719.39
>>   4194304            10   1112.70     3594.86
>>
>>
>># All processes entering MPI_Finalize
>>_______________________________________________
>>AMBER mailing list
>>AMBER.ambermd.org
>>http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jun 24 2014 - 14:30:02 PDT