Re: [AMBER] README under pmemd/src?

From: yunshi11 . <yunshi09.gmail.com>
Date: Wed, 20 Nov 2013 09:21:41 -0800

Hi Ross,


On Tue, Nov 19, 2013 at 10:35 AM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Yun,
>
>
> What do you mean by 48 CPUs + 12 GPUs? - Do you mean you are trying to run
> pmemd.cuda.MPI across 48 cores connected to 12 GPUs? - For starters things
> like block_fft only apply to CPU runs - they are meaningless in GPU runs.
> Secondly I would suggest reading the following page:
> http://ambermd.org/gpus/ which will explain how to run AMBER GPU runs.
> Essentially 48 CPUs + 12 GPUs does not make sense and even if this was 12
> Cores + 12 GPUs the calculation would be unlikely to scale unless it was a
> replica exchange run.
>
>
Our cluster has some 12-core nodes (2 x 6-core Intel E5649), each with 3
general-purpose GPUs (NVIDIA Tesla M2070), which gives this 4:1
CPU(core):GPU ratio.

Reading through the link, it seems to me that a 1:1 ratio would be better?
When running *pmemd.cuda.MPI*, does the number of MPI tasks/threads depend on
the number of CPU cores? And is it better to assign only one task to each GPU?

But why would 12 cores + 12 GPUs NOT scale? Because 12 is not a power of 2?

I also noticed that the "AMBER Certified Mid-Level Workstation" has 2x
(6-core) Intel Xeon E5-2620 CPUs with 2x NVIDIA GTX 780 GPUs, which would put
the CPU(core):GPU ratio at 6:1?
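
Just to make sure I am reading the GPU page correctly: on one of our nodes,
would the right way be to launch one MPI task per GPU, something like the
following? (The input/output file names below are just placeholders from my
own runs.)

    # select the node's 3 GPUs and start one MPI task per GPU
    export CUDA_VISIBLE_DEVICES=0,1,2
    mpirun -np 3 pmemd.cuda.MPI -O -i md.in -o md.out -p complex.prmtop \
           -c md1.restrt -r md2.restrt -x md2.mdcrd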



> In terms of parameters for the CPU-only runs: the README is attached,
> although it has not been maintained so it may be out of date in places.
>
> To summarize (although note these are mostly set automatically based on
> processor count - so messing with them may or may not help):
>
> block_fft = 0 - use slab fft
>           = 1 - use block fft; requires at least 4 processors, and not
>               permitted for minimizations or if nrespa > 1.
>
> fft_blk_y_divisor = 2 .. nfft2 (after axis optimization reorientation);
>                     default = 2 or 4 depending on numtasks.
>
> excl_recip = 0..1 - Exclusive reciprocal tasks flag. This flag, when 1,
>                     specifies that tasks that do reciprocal force calcs
>                     will not also do direct force calculations. This has
>                     some benefits at higher task count. At lower task
>                     count, setting this flag can result in significant
>                     underutilization of reciprocal tasks. This flag will
>                     automatically be cleared if block fft's are not in use.
>
> excl_master = 0..1 - Exclusive master task flag. This flag, when 1,
>                      specifies that the master task will not do force and
>                      energy calculations. At high scaling, what this does
>                      is ensure that no tasks are waiting for the master to
>                      initiate collective communications events. The master
>                      is thus basically dedicated to handling load balancing
>                      and output. At lower task count, this is obviously
>                      wasteful. This flag will automatically be cleared if
>                      block fft's are not in use or if excl_recip .ne. 1.
>
>                      AND NOTE - when block fft's are in use, that implies
>                      that you are not doing a minimization and are not
>                      using nrespa > 1.
>
> atm_redist_freq = 16..1280 - The frequency (in pairlist build events) for
>                      reassigning atom ownership to tasks. As a run
>                      progresses, diffusion causes the atoms originally
>                      collocated and assigned to one task to occupy a larger
>                      volume. With time, this starts to cause a higher
>                      communications load, though the increased load is
>                      lower than one might expect. Currently, by default we
>                      reassign atoms to tasks every 320 pairlist builds at
>                      low to medium task count and every 32 pairlist builds
>                      at higher task counts (currently defined as >= 96
>                      tasks, redefinable in config.h). The user can,
>                      however, specify the specific value desired. At low
>                      task count, frequent atom redistribution tends to have
>                      a noticeable cost and little benefit. At higher task
>                      count, the cost is lower and the benefit is higher.
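
Thanks for the README excerpt, that is very helpful. Just to check my
understanding: if I wanted to override the automatic choices for a CPU run,
these would presumably go in the &ewald namelist of the mdin file (please
correct me if that is not where they belong)? For example, to reproduce what
pmemd picked automatically for my 128-core run:

     &ewald
       block_fft = 1,
       fft_blk_y_divisor = 4,
       excl_recip = 1,
       excl_master = 1,
       atm_redist_freq = 32,
     /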
>
> All the best
> Ross
>
> On 11/19/13 8:45 AM, "yunshi11 ." <yunshi09.gmail.com> wrote:
>
> >Hi there,
> >
> >I'm curious about these "PMEMD ewald parallel performance parameters",
> >since I found pmemd assigns them differently on different computing
> >facilities (exactly the same system, i.e. the MD run starts with the same
> >.restrt file).
> >
> >With 8*16=128 CPUs, I have them as:
> >
> >| block_fft = 1
> >| fft_blk_y_divisor = 4
> >| excl_recip = 1
> >| excl_master = 1
> >| atm_redist_freq = 32
> >
> >
> >In another run with 48 CPUs + 12 GPUs, I have:
> >
> >| block_fft = 0
> >| fft_blk_y_divisor = 4
> >| excl_recip = 0
> >| excl_master = 0
> >| atm_redist_freq = 320
> >
> >
> >So I really wonder what these parameters mean, as I cannot access the
> >"README under pmemd/src" for some reason.
> >
> >Best,
> >Yun

Thanks,

Yun
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Nov 20 2013 - 09:30:04 PST