Re: [AMBER] pmemd.cuda.MPI vs openmpi

From: Victor Ma <victordsmagift.gmail.com>
Date: Wed, 3 Jun 2015 11:58:01 -0700

Hello Ross,

Thank you so much for the detailed explanation. I think I know what the
problem is. My command to run two GPUs on a single node was already correct:
export CUDA_VISIBLE_DEVICES=0,1
mpirun -np 2 pmemd.cuda.MPI -O ...

When I run make for check_p2p, the error message is:
/bin/nvcc -ccbin g++
-I/home/rcf-proj2/zz1/zhen009/membrane/amber/prep/openmpi-1/check_p2p
-m64 -o gpuP2PCheck.o -c gpuP2PCheck.cu
make: /bin/nvcc: Command not found
make: *** [gpuP2PCheck.o] Error 127

I suppose nvcc is indeed not installed on the cluster, or at least not at
/bin/nvcc.
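
Since the error shows make looking for /bin/nvcc, it may just be that the
Makefile builds the compiler path from an unset CUDA_HOME rather than nvcc
being missing entirely. Before bothering the admins I will try something like
the following (just a guess, assuming the cluster uses environment modules and
a standard CUDA install location):

module load cuda                                       # or whatever the cluster calls its CUDA module
which nvcc                                             # check whether nvcc is now on the PATH
export CUDA_HOME=$(dirname $(dirname $(which nvcc)))   # e.g. /usr/local/cuda
make                                                   # re-run the check_p2p build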

And your guess is right: the two gpus are on two different buses:
lspci -t -v
-+-[0000:20]-+-00.0-[31]--
 | +-01.0-[21]--
 | +-01.1-[2a]--
 | +-02.0-[24]----00.0 NVIDIA Corporation GK110GL [Tesla K20m]
 | +-02.1-[2b]--
 | +-02.2-[2c]--
 | +-02.3-[2d]--
 | +-03.0-[27]--
 | +-03.1-[2e]--
 | +-03.2-[2f]--
 | +-03.3-[30]--
 | +-04.0 Intel Corporation Xeon E5/Core i7 DMA Channel 0
 | +-04.1 Intel Corporation Xeon E5/Core i7 DMA Channel 1
 | +-04.2 Intel Corporation Xeon E5/Core i7 DMA Channel 2
 | +-04.3 Intel Corporation Xeon E5/Core i7 DMA Channel 3
 | +-04.4 Intel Corporation Xeon E5/Core i7 DMA Channel 4
 | +-04.5 Intel Corporation Xeon E5/Core i7 DMA Channel 5
 | +-04.6 Intel Corporation Xeon E5/Core i7 DMA Channel 6
 | +-04.7 Intel Corporation Xeon E5/Core i7 DMA Channel 7
 | +-05.0 Intel Corporation Xeon E5/Core i7 Address Map,
VTd_Misc, System Management
 | +-05.2 Intel Corporation Xeon E5/Core i7 Control Status and
Global Errors
 | \-05.4 Intel Corporation Xeon E5/Core i7 I/O APIC
 \-[0000:00]-+-00.0 Intel Corporation Xeon E5/Core i7 DMI2
             +-01.0-[05]----00.0 LSI Logic / Symbios Logic SAS2308
PCI-Express Fusion-MPT SAS-2
             +-01.1-[06]--
             +-02.0-[08]----00.0 NVIDIA Corporation GK110GL [Tesla K20m]
             +-02.1-[0c]--
             +-02.2-[0b]--
             +-02.3-[0d]--
             +-03.0-[07]----00.0 Mellanox Technologies MT27500 Family
[ConnectX-3]
             +-03.1-[0e]--
             +-03.2-[0f]--
             +-03.3-[10]--
             +-04.0 Intel Corporation Xeon E5/Core i7 DMA Channel 0
             +-04.1 Intel Corporation Xeon E5/Core i7 DMA Channel 1
             +-04.2 Intel Corporation Xeon E5/Core i7 DMA Channel 2
             +-04.3 Intel Corporation Xeon E5/Core i7 DMA Channel 3
             +-04.4 Intel Corporation Xeon E5/Core i7 DMA Channel 4
             +-04.5 Intel Corporation Xeon E5/Core i7 DMA Channel 5
             +-04.6 Intel Corporation Xeon E5/Core i7 DMA Channel 6
             +-04.7 Intel Corporation Xeon E5/Core i7 DMA Channel 7
             +-05.0 Intel Corporation Xeon E5/Core i7 Address Map,
VTd_Misc, System Management
             +-05.2 Intel Corporation Xeon E5/Core i7 Control Status and
Global Errors
             +-05.4 Intel Corporation Xeon E5/Core i7 I/O APIC
             +-11.0-[04]--
             +-1a.0 Intel Corporation C600/X79 series chipset USB2
Enhanced Host Controller #2
             +-1c.0-[02]--+-00.0 Intel Corporation I350 Gigabit Network
Connection
             | \-00.1 Intel Corporation I350 Gigabit Network
Connection
             +-1c.7-[01]--+-00.0 Hewlett-Packard Company Integrated
Lights-Out Standard Slave Instrumentation & System Support
             | +-00.1 Matrox Electronics Systems Ltd. MGA G200EH
             | +-00.2 Hewlett-Packard Company Integrated
Lights-Out Standard Management Processor Support and Messaging
             | \-00.4 Hewlett-Packard Company Integrated
Lights-Out Standard Virtual USB Controller
             +-1d.0 Intel Corporation C600/X79 series chipset USB2
Enhanced Host Controller #1
             +-1e.0-[03]--
             +-1f.0 Intel Corporation C600/X79 series chipset LPC
Controller
             \-1f.2 Intel Corporation C600/X79 series chipset 6-Port SATA
AHCI Controller

I will let the system admin know and hope they might do something. :(

Thanks again, I really appreciate it.

Victor


On Wed, Jun 3, 2015 at 11:31 AM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Victor,
>
> Do not attempt to run regular GPU MD runs across multiple nodes.
> Infiniband is way too slow these days to keep up with the computation speed
> of the GPUs. The only types of simulation that you can run over multiple
> nodes with GPUs are loosely coupled runs such as those based on Replica
> exchange approaches.
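>
> (For reference, such loosely coupled runs use the groupfile mechanism - a
> rough sketch for 8 replicas, one GPU each, where remd.groupfile and the
> per-replica input files are placeholders you would need to set up yourself:
>
> mpirun -np 8 $AMBERHOME/bin/pmemd.cuda.MPI -ng 8 -groupfile remd.groupfile
>
> with the replica-exchange settings themselves living in the groupfile and
> mdin files.)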
>
> In terms of using more than one GPU within a node for a single MD run, it
> is crucial that they can communicate via peer to peer over the PCI-E bus.
> Having to go through the CPU chipset (which is what happens when they can't
> talk via peer to peer) is also too slow these days. In terms of CPU counts
> for multi-GPU runs, the CPU is used purely to control the GPU, so running
> with -np 16 does not help - it actually launches 16 GPU 'instances', which
> end up as 8 on each of your two GPUs and really slows things down. We could
> have taken the NAMD / Gromacs approach of offloading only part of the
> calculation to the GPU and using the CPUs for the remainder, but the net
> result is that you end up slower overall than just taking the 'everything
> on the GPU' approach and leaving the excess CPUs idle. That said, you can
> use those CPUs for other jobs. E.g.:
>
> export CUDA_VISIBLE_DEVICES=0
> nohup $AMBERHOME/bin/pmemd.cuda -O -i mdin.0 -o mdout.0 ... &   # serial GPU job on GPU 0
> export CUDA_VISIBLE_DEVICES=1
> nohup $AMBERHOME/bin/pmemd.cuda -O -i mdin.1 -o mdout.1 ... &   # serial GPU job on GPU 1
> nohup mpirun -np 14 $AMBERHOME/bin/pmemd.MPI -O -i mdin.2 -o mdout.2 ... &   # CPU-only job on the 14 spare cores
>
> So the CPUs are not entirely wasted - although this takes a carefully
> crafted scheduler on a cluster.
>
> In terms of using the 2 GPUs at the same time, the correct command line
> for your 2-GPU case is:
>
> export CUDA_VISIBLE_DEVICES=0,1
> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O ...
>
> The issue is that without P2P it is impossible to get a speedup (for non-GB
> calculations) across multiple GPUs. In this case the best you can do is run
> two separate jobs, one on each GPU, as above.
>
> Is there a reason you cannot build the check_p2p code? It's really simple -
> I'd be shocked if the cluster did not have make and nvcc installed. How
> would anyone compile their code for it? How did they compile AMBER 14?
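>
> If make really is out of the question, a quick alternative on reasonably
> recent NVIDIA drivers (assuming yours supports it) is nvidia-smi's topology
> report, which shows how the GPUs are connected without compiling anything:
>
> nvidia-smi topo -m
>
> GPUs whose connection is reported as crossing the CPU / QPI link, rather
> than sharing a PCI-E root, will not be able to do P2P.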
>
> One thing you can quickly try is running 'lspci | grep NVIDIA' on one of
> the nodes. E.g.:
>
> [root.GTX_TD ~]# lspci | grep NVIDIA
> 02:00.0 VGA compatible controller: NVIDIA Corporation GM204 (rev a1)
> 02:00.1 Audio device: NVIDIA Corporation Device 0fbb (rev a1)
> 03:00.0 VGA compatible controller: NVIDIA Corporation GM204 (rev a1)
> 03:00.1 Audio device: NVIDIA Corporation Device 0fbb (rev a1)
> 82:00.0 VGA compatible controller: NVIDIA Corporation GM204 (rev a1)
> 82:00.1 Audio device: NVIDIA Corporation Device 0fbb (rev a1)
> 83:00.0 VGA compatible controller: NVIDIA Corporation GM204 (rev a1)
> 83:00.1 Audio device: NVIDIA Corporation Device 0fbb (rev a1)
>
> Here you get the bus numbers that the GPUs are connected to. In this case
> there are 4 GPUs: one on bus 02, one on bus 03, one on bus 82 and one on
> bus 83. You can then run 'lspci -t -v' to get a full bus connectivity
> listing. In this case (pulling out the bits relevant to the GPUs) we have:
>
> +-[0000:80]-+-00.0-[81]--+-00.0 Intel Corporation I350 Gigabit Network
> Connection
> | | \-00.1 Intel Corporation I350 Gigabit Network
> Connection
> | +-02.0-[82]--+-00.0 NVIDIA Corporation GM204
> | | \-00.1 NVIDIA Corporation Device 0fbb
> | +-03.0-[83]--+-00.0 NVIDIA Corporation GM204
> | | \-00.1 NVIDIA Corporation Device 0fbb
> | +-04.0 Intel Corporation Xeon E5 v3/Core i7 DMA Channel 0
>
> and
>
> \-[0000:00]-+-00.0 Intel Corporation Xeon E5 v3/Core i7 DMI2
> +-01.0-[01]--
> +-02.0-[02]--+-00.0 NVIDIA Corporation GM204
> | \-00.1 NVIDIA Corporation Device 0fbb
> +-03.0-[03]--+-00.0 NVIDIA Corporation GM204
> | \-00.1 NVIDIA Corporation Device 0fbb
> +-04.0 Intel Corporation Xeon E5 v3/Core i7 DMA Channel 0
>
>
> So you see here that the 4 GPUs are in two groups, one set of two on one
> bus (connected to one of the CPU sockets) and the other set of two on the
> other bus connected to the other CPU socket. GPUs here can only communicate
> via P2P if they are on the same PCI bus. So here GPUs 0 and 1 can do P2P
> and 2 and 3 can do P2P but the combinations 0-2,0-3,1-2,1-3 are not
> supported.
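>
> So on a machine laid out like that you would pick a peer-to-peer-capable
> pair explicitly, e.g. (the device numbering here is illustrative - check
> which IDs map to which bus on your own node):
>
> export CUDA_VISIBLE_DEVICES=0,1   # the pair on the first bus
> mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O ...
>
> or CUDA_VISIBLE_DEVICES=2,3 for the pair on the other bus.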
>
> In the case of your system I suspect that they placed one GPU on one bus
> and one GPU on the other bus - this is about the worst combination you can
> make for having two GPUs in the same node. If this is the case then you
> need to ask the administrators to please physically move one of the GPUs to
> a different PCI-E slot such that they are both connected to the same
> physical CPU socket.
>
> Confusing and annoying but unfortunately a complexity that most people
> building clusters these days don't consider.
>
> Hope that helps.
>
> All the best
> Ross
>
> > On Jun 3, 2015, at 10:54 AM, Victor Ma <victordsmagift.gmail.com> wrote:
> >
> > Hello Amber community,
> >
> > I am testing my amber14 on a gpu cluster with IB. I noticed that when I
> > turn on openmpi with pmemd.cuda.MPI, it actually slows things down.
> > On single node, I have two gpus and 16 cpus. If I submit a job using
> > "pmemd.cuda.MPI -O -i .....", one gpu is 99% used and P2P support is on.
> > For my big system, I am getting ~27ns/day. If I turn on openmpi and use
> > this instead "export CUDA_VISIBLE_DEVICES=0,1 then mpirun -np 2
> > pmemd.cuda.MPI -O -i ....", two gpus are 77% used each but P2P is OFF. In
> > this case, I am getting 33 ns/day. It is faster but I suspect that it
> could
> > be even faster if the P2P is on. The other thing I tried is to run
> "mpirun
> > -np 16 pmemd.cuda.MPI -O -i ....". Here the run is slowed down to
> 14ns/day.
> > One GPU is used and all 16 cpus are used. Again p2p is off.
> >
> > I downloaded the check_p2p scripts, but as I am working on a cluster I
> > could not run "make".
> >
> > I am pretty happy with the speed I am getting but am also wondering if the
> > configuration can be further optimized to improve performance, e.g. running
> > on 2 GPUs at 100% with P2P on.
> >
> >
> > Thank you!
> >
> >
> > Victor
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jun 03 2015 - 12:00:03 PDT