Re: [AMBER] Running Amber on a SGI cluster from George Tzotzos on 2013-09-25 (Amber Archive Sep 2013)

From: George Tzotzos <gtzotzos.me.com>
Date: Wed, 25 Sep 2013 15:35:43 -0300

Ross, Jason,

Many thanks.

Our cluster uses Xeon E5-2630 @ 2.3 GHz; Infiniband FDR 70 Gbit/sec

I benchmarked the system I mentioned earlier for a 2 ns production MD run. It scaled as follows:

Nodes  Cores/node  Est. time for 2 ns  ns/day
    8           6               2.8 h    16.7
   16           3               2.4 h    18.0
   16           6               2.3 h    19.5
   16           8               6.5 h     7.3

The above shows that throughput saturates at around 96 cores (16 nodes x 6 cores) and drops sharply when 8 cores per node are used.
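
As a rough check of the figures above, here is a minimal Python sketch that turns the reported ns/day values into speedup and parallel efficiency. The names (RUNS, ns_per_day, scaling_table) are purely illustrative, and taking the 8 nodes x 6 cores run as the baseline is an assumption made for the comparison. The reported ns/day values are used as given rather than recomputed from the estimated hours.

```python
# Benchmark figures copied from the list above:
# (nodes, cores per node, reported ns/day).
RUNS = [
    (8, 6, 16.7),
    (16, 3, 18.0),
    (16, 6, 19.5),
    (16, 8, 7.3),
]


def ns_per_day(simulated_ns, wall_hours):
    """Convert a simulated length and its wall-clock time into ns/day."""
    return simulated_ns * 24.0 / wall_hours


def scaling_table(runs):
    """Return (cores, ns/day, speedup, efficiency) rows.

    Speedup and efficiency are relative to the first run in `runs`
    (an assumed baseline, not something stated in the thread).
    """
    base_nodes, base_cores_per_node, base_rate = runs[0]
    base_cores = base_nodes * base_cores_per_node
    rows = []
    for nodes, cores_per_node, rate in runs:
        cores = nodes * cores_per_node
        speedup = rate / base_rate
        # Efficiency: achieved speedup divided by the increase in core count.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, rate, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for cores, rate, speedup, eff in scaling_table(RUNS):
        print(f"{cores:4d} cores  {rate:5.1f} ns/day  "
              f"speedup {speedup:4.2f}  efficiency {eff:4.0%}")
```

Relative to the 48-core baseline, this gives an efficiency of about 58% at 96 cores and about 16% at 128 cores, which is consistent with the saturation described above.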

Best regards

George

On Sep 25, 2013, at 1:30 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> So that's like, what? < 23000 atoms.
>
> That's likely what is limiting your scaling.
>
> Things that affect scaling:
>
> 1) Atom count - more atoms will scale better
> 2) FFT dimensions - larger NFFT values will scale better
> 3) cut off - larger cutoffs will run slower but scale better
> 4) ntt=3 - the Langevin thermostat hurts scaling - Berendsen and Andersen
> scale better.
> 5) NPT - constant pressure runs don't scale as well due to the all to all
> in virial.
>
> You are probably just at the limit for such a small system. Note scaling
> for DHFR NPT tops out on Gordon at 64 cores of E5-2670 CPUs [at
> 44.2ns/day] (8 core sandybridge with decent memory bandwidth) - note this
> is dual channel QDR IB.
>
> What is the IB you have on your system? - is it dual channel QDR or single
> channel FDR? (or something worse?)
>
> Specifically which xeon chips are these? Are they the models prior to
> Sandybridge? - The memory bandwidth pre-sandybridge was notoriously bad
> and so one typically saw very poor performance if all the cores in a node
> were used. Try using just 8 cores per node and see if that helps.
>
> All the best
> Ross
>
>
> On 9/25/13 9:20 AM, "George Tzotzos" <gtzotzos.me.com> wrote:
>
>> Hi Ross,
>>
>> Using a 140 residue protein in water (~ 4,000 HOHs).
>>
>> Cheers
>>
>> George
>>
>> On Sep 25, 2013, at 12:53 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
>>
>>> How many atoms is the system you are trying to simulate?
>>>
>>> And what settings are you using? - Thermostat, barostat etc.
>>>
>>> All the best
>>> Ross
>>>
>>>
>>> On 9/25/13 8:26 AM, "George Tzotzos" <gtzotzos.me.com> wrote:
>>>
>>>> Jason,
>>>>
>>>> Many thanks. We have Infiniband running on the cluster. Is there another
>>>> diagnostic to achieve better scaling?
>>>>
>>>> Any suggestion will be much appreciated.
>>>>
>>>> Regards
>>>>
>>>> George
>>>>
>>>>
>>>> On Sep 25, 2013, at 8:52 AM, Jason Swails <jason.swails.gmail.com>
>>>> wrote:
>>>>
>>>>> On Tue, Sep 24, 2013 at 3:32 PM, George Tzotzos <gtzotzos.me.com>
>>>>> wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>>
>>>>>> I'm trying to run Amber on a cluster of the following specs
>>>>>>
>>>>>> SGI Specs SGI ICE X
>>>>>> OS - SUSE Linux Enterprise Server 11 SP2
>>>>>> Kernel Version: 3.0.38-0.5
>>>>>> 2x6-Core Intel Xeon
>>>>>>
>>>>>> 16 blades 12 cores each
>>>>>>
>>>>>> Environment
>>>>>> export AMBERHOME=/bio/georgios/MD/amber12
>>>>>> export LD_LIBRARY_PATH=/opt/rpm_share/lib/lib64:/bio/george/amber12/lib:/bio/george/MD/amber12/AmberTools/lib
>>>>>>
>>>>>> Command line
>>>>>> mpirun -np 48 pmemd.MPI -O -i prod.in -o prod_12ns.out -p
>>>>>> 2erb_bis_solv.prmtop -c prod_10ns.rst -r prod_12ns.rst -x
>>>>>> prod_12ns.mdcrd
>>>>>>
>>>>>> Question
>>>>>>
>>>>>> There is no advantage in increasing the number of MPI processes beyond
>>>>>> -np 24. The performance is reduced the more cores are engaged. In fact
>>>>>> it is similar to or worse than that on an OS X machine with
>>>>>> 2 x 3.06 GHz 6-Core Intel Xeon.
>>>>>>
>>>>>> I'd be very grateful for any suggestions on what may be wrong
>>>>>>
>>>>>
>>>>> Your interconnect between nodes may be too slow. Also, the more cores you
>>>>> have on a single node, the more bandwidth you need between nodes to avoid
>>>>> a slow-down (this is why some supercomputers give faster timings for
>>>>> Amber when you do not utilize the whole node). This does not paint the
>>>>> whole picture (the topology of the inter-node connections also matters to
>>>>> some extent), but it's probably the most important part.
>>>>>
>>>>> You really need Infiniband (QDR is typical, I think) to see good scaling
>>>>> across nodes with Amber.
>>>>>
>>>>> HTH,
>>>>> Jason
>>>>>
>>>>> --
>>>>> Jason M. Swails
>>>>> BioMaPS,
>>>>> Rutgers University
>>>>> Postdoctoral Researcher
>>>>> _______________________________________________
>>>>> AMBER mailing list
>>>>> AMBER.ambermd.org
>>>>> http://lists.ambermd.org/mailman/listinfo/amber

Received on Wed Sep 25 2013 - 12:00:02 PDT