Re: [AMBER] Multiple CPU and single GPU doubt

From: Jason Swails <jason.swails.gmail.com>
Date: Sat, 2 Jul 2011 18:14:48 -0600

No, this is impossible. Each MPI thread spawns a separate GPU thread as
well, so the number of threads you launch should be less than or equal to
the number of GPUs you have.

What NAMD probably does is assign the entire calculation EXCEPT for the long
range electrostatic part (for PME) to the GPU, handling that last part with
CPUs. Thus, it would pay to have several CPUs to parallelize the long range
part (which uses an FFT, which hasn't yet been efficiently parallelized on
GPUs) while using a single GPU to do the rest.
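To see why extra CPUs help under that kind of split, here is a minimal timing sketch (illustrative numbers and function names of my own, not NAMD measurements), assuming the GPU's direct-space work and the CPUs' reciprocal-space (PME) work overlap perfectly:

```python
# Hypothetical timing model for a NAMD-style CPU/GPU split.
# gpu_direct_ms and cpu_pme_ms are made-up per-step costs, not benchmarks.

def step_time(gpu_direct_ms, cpu_pme_ms, n_cpus):
    """Time per MD step if GPU and CPU work overlap perfectly:
    the step is bound by whichever side finishes last."""
    return max(gpu_direct_ms, cpu_pme_ms / n_cpus)

for n in (1, 2, 4, 8):
    print(n, step_time(10.0, 24.0, n))
```

Note the diminishing returns: once the CPU share of the step drops below the GPU's direct-space time, adding more CPUs no longer speeds anything up.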

pmemd.cuda, I believe, does the FFT on the GPU itself, so the CPU handles
*none* of the calculation (just some of the bookkeeping, printing, etc.)
Here, nothing the CPU does is worth parallelizing.

I'm not sure which way is *better* (doing the FFT on CPUs or GPUs) -- I
think it depends largely on your philosophy of the proper non-bonded
settings. For instance, a larger direct-space cutoff (NAMD uses 12 Å
compared to Amber using 8 Å by default) means that the direct space sum is
bigger, and the reciprocal space sum smaller (the direct space calculation
scales as N^2, whereas the FFT in the reciprocal space is N log (N), so it's
less expensive).
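The cutoff tradeoff can be put in rough numbers. The sketch below uses assumed scalings (per-particle direct-space work growing with the cutoff volume, 3-D FFT work growing as M log M in the number of grid points); the function names and the density constant are mine, for illustration only:

```python
import math

# Rough cost model for the Ewald direct/reciprocal split.
# These are assumed scalings, not measured costs.

def direct_cost(n_atoms, rc, density=0.1):
    """Direct-space work: each atom interacts with the neighbors inside
    its cutoff sphere, whose count grows as density * (4/3) * pi * rc^3."""
    return n_atoms * density * (4.0 / 3.0) * math.pi * rc**3

def fft_cost(grid_points):
    """Reciprocal-space work: a 3-D FFT over M grid points is ~ M log M."""
    return grid_points * math.log(grid_points)

# Going from an 8 A cutoff to a 12 A cutoff multiplies the direct-space
# work by (12/8)^3 ~= 3.4, while permitting a coarser (cheaper) PME grid:
print(direct_cost(10000, 12.0) / direct_cost(10000, 8.0))
```

So the 12 Å cutoff buys a smaller FFT at the price of roughly 3.4x more direct-space work per particle.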

The FFT is more difficult to parallelize, especially over large numbers of
processors, whereas the direct space sum is not. Therefore, using a larger
cutoff will naturally lead to "more efficient scaling", since you're
reducing the size of the problem that can't be parallelized effectively (at
the cost of a much larger increase in the part that can).
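That argument is essentially Amdahl's law. Here is a small sketch, with the serial fractions chosen arbitrarily for illustration: treat the FFT as the poorly parallelizable part and the direct sum as ideally parallel, and watch how shrinking the FFT's share improves scaling even though total work went up:

```python
# Amdahl-style scaling sketch (assumed fractions, not measurements).

def speedup(serial_frac, n_procs):
    """Amdahl's law: serial_frac of the work cannot be parallelized;
    the rest divides evenly across n_procs processors."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_procs)

# A larger cutoff shifts work from the FFT (serial-ish) into the direct
# sum (parallel), lowering the effective serial fraction:
for frac in (0.20, 0.05):
    print(frac, round(speedup(frac, 16), 2))
```

With 16 processors, a 20% serial fraction caps the speedup at 4x, while a 5% serial fraction allows over 9x, which is the sense in which a bigger cutoff "scales better" regardless of absolute speed.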

Therefore, you have to take scaling/timing comparisons with a grain of salt
-- even published timing comparisons are often done partially and/or
unfairly (knowingly or not).

This is probably a more detailed explanation than you were looking for, but
alas here it is.

HTH,
Jason

2011/7/2 Fabrício Bracht &lt;bracht.iq.ufrj.br&gt;

> Having solved my compilation problems, I have one doubt left. Is there
> a way to run Amber on multiple processors while still using the
> GPU-accelerated pmemd? I am asking this because, with NAMD, one can
> achieve greater performance by using multiple processors and the
> GPU for a CUDA-accelerated calculation.
> Thank you
> Fabrício
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>



-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
Received on Sat Jul 02 2011 - 17:30:03 PDT