Re: [AMBER] README under pmemd/src?

From: Ross Walker <>
Date: Tue, 19 Nov 2013 10:35:28 -0800

Hi Yun,

What do you mean by 48 CPUs + 12 GPUs? Do you mean you are trying to run
pmemd.cuda.MPI across 48 cores connected to 12 GPUs? For starters, things
like block_fft only apply to CPU runs; they are meaningless in GPU runs.
Secondly, I would suggest reading the following page: which will explain how to run AMBER GPU runs.
Essentially, 48 CPUs + 12 GPUs does not make sense, and even if this were 12
cores + 12 GPUs the calculation would be unlikely to scale unless it was a
replica exchange run.

As for parameters for the CPU-only runs, the README is attached,
although it has not been maintained and so may be out of date in places.

To summarize (although note these are mostly set automatically based on
processor count - so messing with them may or may not help):

block_fft = 0 - use slab fft
           = 1 - use block fft; requires at least 4 processors, and not
                 permitted for minimizations or if nrespa > 1.

fft_blk_y_divisor = 2 .. nfft2 (after axis optimization reorientation);
                         default=2 or 4 depending on numtasks.

excl_recip = 0..1 - Exclusive reciprocal tasks flag. This flag, when 1,
                    specifies that tasks that do reciprocal force calcs
                    not also do direct force calculations. This has some
                    benefits at higher task count. At lower task count,
                    setting this flag can result in significant
                    underutilization of reciprocal tasks. This flag will
                    automatically be cleared if block fft's are not in use.

excl_master = 0..1 - Exclusive master task flag. This flag, when 1,
                     specifies that the master task will not do force and
                     energy calculations. At high scaling, what this does
                     is ensure that no tasks are waiting for the master to
                     initiate collective communications events. The master
                     is thus basically dedicated to handling load balancing
                     and output. At lower task count, this is obviously
                     wasteful. This flag will automatically be cleared if
                     block fft's are not in use or if excl_recip .ne. 1.

                     AND NOTE - when block fft's are in use, that implies
                     you are not doing a minimization and are not using
                     nrespa > 1.

atm_redist_freq = 16..1280 - The frequency (in pairlist build events) for
                             reassigning atom ownership to tasks. As a run
                             progresses, diffusion causes the atoms
                             collocated and assigned to one task to occupy
                             a larger volume. With time, this starts to
                             cause a higher communications load, though the
                             load is lower than one might expect.
                             By default we reassign atoms to tasks every
                             320 pairlist builds at low to medium task
                             count, and every 32 pairlist builds at higher
                             task counts (currently defined as >= 96 tasks,
                             redefinable in config.h). The
                             user can however specify the specific value he
                             desires. At low task count, frequent atom
                             redistribution tends to have a noticeable cost
                             and little benefit. At higher task count, the
                             cost is lower and the benefit is higher.
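
As a rough illustration of how the rules above interact, here is a minimal
Python sketch. It is not pmemd code: the function names resolve_flags and
default_atm_redist_freq are made up for this example, the silent fallback to
slab FFT is an assumption (pmemd may instead reject such input), and the 320
figure for lower task counts is taken from the atm_redist_freq value in the
48-CPU output quoted below rather than from the README text itself.

```python
# Hypothetical sketch of the flag rules summarized above; not pmemd source.

HIGH_TASK_COUNT = 96  # threshold quoted above ("redefinable in config.h")


def resolve_flags(numtasks, block_fft, excl_recip, excl_master,
                  minimizing=False, nrespa=1):
    """Return (block_fft, excl_recip, excl_master) after applying the rules."""
    # Block FFT needs at least 4 tasks and is not permitted for
    # minimizations or when nrespa > 1. Assumption: fall back to slab FFT.
    if numtasks < 4 or minimizing or nrespa > 1:
        block_fft = 0
    # excl_recip is cleared if block FFTs are not in use.
    if block_fft == 0:
        excl_recip = 0
    # excl_master is cleared if block FFTs are off or excl_recip is not 1.
    if block_fft == 0 or excl_recip != 1:
        excl_master = 0
    return (block_fft, excl_recip, excl_master)


def default_atm_redist_freq(numtasks):
    """Default atom redistribution frequency, in pairlist build events."""
    # 32 at high task counts; 320 otherwise (assumed from the quoted output).
    return 32 if numtasks >= HIGH_TASK_COUNT else 320
```

For example, resolve_flags(128, 1, 1, 1) keeps all three flags set, whereas
asking for excl_master = 1 together with excl_recip = 0 leaves excl_master
cleared. The sketch does not try to reproduce how pmemd picks block_fft or
fft_blk_y_divisor automatically from the processor count.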

All the best

On 11/19/13 8:45 AM, "yunshi11 ." <> wrote:

>Hi there,
>I'm curious about these "PMEMD ewald parallel performance parameters".
>I found pmemd assigns them differently on different computing facilities
>(exactly the same system, i.e. the MD run starts with the same .restrt file).
>With 8*16=128 CPUs, I have them as:
>| block_fft = 1
>| fft_blk_y_divisor = 4
>| excl_recip = 1
>| excl_master = 1
>| atm_redist_freq = 32
>In another run with 48 CPUs + 12 GPUs, I have:
>| block_fft = 0
>| fft_blk_y_divisor = 4
>| excl_recip = 0
>| excl_master = 0
>| atm_redist_freq = 320
>So I really wonder what these parameters mean, as I cannot access the "README
>under pmemd/src" for some reason.
Received on Tue Nov 19 2013 - 11:00:02 PST
