Hi Eric,
Transferring data between the CPU and the GPU is very slow compared with the speed of the GPU itself. Because of that, and because the GPU code in PMEMD runs so fast, it does not make sense to try to use the CPU cores during an MD run. Other codes such as Gromacs and NAMD do use the CPUs at the same time as the GPU, but that is largely because their GPU implementations are less optimized, which leaves CPU cycles worth mopping up. AMBER just uses a single core per GPU. That core will show up as 100% busy, but in reality it is almost entirely an idle spin loop waiting for data to come back from the GPU. So while a GPU run is going you are free to use the remaining CPU cores for something else. For example, on a 16-core, 4-GPU system you could run 4 single-GPU pmemd.cuda jobs and, at the same time, a 16-core CPU-only pmemd.MPI job (or a Gaussian job, or anything else that only uses the CPUs) and it should be fine.
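As a rough sketch (directory layout, file names and the mpirun invocation are placeholders you would adapt to your own system and scheduler), that could look like:

  # one independent single-GPU pmemd.cuda run per GPU, each pinned to its own
  # device; each run sits in its own directory so output files do not collide
  for i in 0 1 2 3; do
    ( cd run_gpu$i && \
      CUDA_VISIBLE_DEVICES=$i $AMBERHOME/bin/pmemd.cuda -O -i md.in \
          -p sys.prmtop -c sys.inpcrd -o md.out -r md.rst ) &
  done
  # ...and, alongside them, a CPU-only pmemd.MPI job across the 16 cores
  ( cd run_cpu && \
    mpirun -np 16 $AMBERHOME/bin/pmemd.MPI -O -i md.in \
        -p sys.prmtop -c sys.inpcrd -o md.out -r md.rst ) &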
The real benefit of the AMBER design of not using the CPUs for GPU runs is that it lets you buy the very cheap bottom-bin CPUs without affecting performance. With Gromacs, for example, you also have to buy high-end CPUs, so you end up paying for 12-core+ Intel Haswell chips at around $3K apiece on top of your $1K Titan-X GPUs. AMBER is quite happy with $99 8-core AMD chips or $200 6-core Intel Haswell chips, so the cost of a node for a given level of performance is much lower.
I hope that helps.
All the best
Ross
> On Jul 8, 2015, at 5:02 PM, Éric Germaneau <germaneau.sjtu.edu.cn> wrote:
>
> Thank you Gerald for the info,
>
> That really surprises me; multi-threading on the CPU is never useless when
> using a GPU, since the CPU is always heavily used.
> Anyway, I note that Amber doesn't support OpenMP.
> Thanks again.
>
> Eric
>
>
> On 07/08/2015 04:10 PM, Gerald Monard wrote:
>> Hi,
>>
>> On 07/08/2015 08:12 AM, Éric Germaneau wrote:
>>> Dear all,
>>>
>>> I'm helping a user running Amber on our machine.
>>> I basically compiled it by running
>>>
>>> ./configure -intelmpi -mkl -cuda -openmp intel
>> As far as I know, the -openmp configuration option does not have any
>> effect on the compilation of pmemd (except for MIC?).
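>> As a sketch (same compilers and MPI as in your configure line above, just
>> dropping -openmp), a GPU build would simply be:
>>
>> ./configure -intelmpi -mkl -cuda intel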
>>
>>> We have two 8-core CPUs and two GPUs per node, and we use LSF as the job scheduler.
>>> I submit a job with 2 MPI processes per node.
>>> Each process takes care of one GPU as expected; however, only one
>>> thread per MPI process is used (even if I set OMP_NUM_THREADS).
>>> Is there a way to run a multi-threaded job using MPI and CUDA?
>> The way pmemd.cuda works is that it uses 1 CPU thread per GPU. Most of the
>> computation is done on the GPU (unlike NAMD, for example), so
>> multi-threading on the CPU is useless here.
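>>
>> If you want both GPUs of a node to work on a single job, the usual way is
>> one MPI rank per GPU with pmemd.cuda.MPI, for example (input and topology
>> file names here are just placeholders):
>>
>> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i md.in -p sys.prmtop -c sys.inpcrd -o md.out
>>
>> OMP_NUM_THREADS has no effect on pmemd.cuda; each rank remains a single CPU thread.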
>>
>> Gerald.
>>
>>> Thank you,
>>>
>>> Éric.
>>>
>
> --
> Éric Germaneau (艾海克), Specialist
> Center for High Performance Computing
> Shanghai Jiao Tong University
> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China
> Email:germaneau.sjtu.edu.cn Mobi:+86-136-4161-6480 http://hpc.sjtu.edu.cn
>
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber