Hi Pengfei,
This is a BIOS bug. Go to the motherboard manufacturer's website (or contact your vendor), update the BIOS, and see if that fixes the problem. If it doesn't, you'll need to contact the motherboard / machine vendor and ask how to fix P2P communication between PCIe sockets. I am betting that if you install the NVIDIA CUDA Toolkit, build its samples, and run the P2P bandwidth/latency test (p2pBandwidthLatencyTest), you will find that it fails when run between the two physical K80 cards.
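To see which GPU pairs are worth testing, here is a minimal Python sketch that reads a topology matrix in the format you posted and lists the GPU pairs whose path crosses a PCIe host bridge (PHB) or the inter-socket link (SOC). The names `TOPO` and `pairs_by_link` are hypothetical and not part of AMBER or the CUDA Toolkit. The matrix only describes the topology, so it cannot tell you whether P2P actually works; the CUDA P2P test remains the real check.

```python
# Sketch only: list GPU pairs whose link type is PHB or SOC in a
# topology matrix laid out like the one quoted in this thread.
# TOPO and pairs_by_link are illustrative names, not an existing API.
TOPO = """\
        GPU0  GPU1  GPU2  GPU3  mlx4_0  CPU Affinity
GPU0     X    PIX   PHB   PHB   PHB     0-11,24-35
GPU1    PIX    X    PHB   PHB   PHB     0-11,24-35
GPU2    PHB   PHB    X    PIX   PHB     0-11,24-35
GPU3    PHB   PHB   PIX    X    PHB     0-11,24-35
mlx4_0  PHB   PHB   PHB   PHB    X
"""


def pairs_by_link(topo_text, link_types=("PHB", "SOC")):
    """Return (src, dst, link) for each GPU pair using one of link_types."""
    lines = [ln.split() for ln in topo_text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Device columns end where the "CPU Affinity" label begins.
    columns = header[:header.index("CPU")] if "CPU" in header else header
    found = []
    for row in rows:
        src = row[0]
        if not src.startswith("GPU"):
            continue  # skip non-GPU devices such as the network adapter
        # zip stops after the device columns, so the affinity field is ignored.
        for dst, link in zip(columns, row[1:]):
            if not dst.startswith("GPU"):
                continue
            # Report each unordered pair once (upper triangle of the matrix).
            if columns.index(src) < columns.index(dst) and link in link_types:
                found.append((src, dst, link))
    return found


if __name__ == "__main__":
    for src, dst, link in pairs_by_link(TOPO):
        print(f"{src} <-> {dst}: {link}")
```

For your node this picks out every pair that spans the two physical cards, including the GPU 0 and GPU 2 combination that crashes, so those are the pairs to run the P2P test on.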
All the best
Ross
> On Nov 10, 2016, at 20:48, Pengfei Li <lipengfei_mail.126.com> wrote:
>
> Dear Ross,
> I still have a question about running the GPU version of AMBER in parallel.
> There are two physical K80 cards in our GPU node, and each card has two chips, so the node exposes GPUs 0, 1, 2, and 3.
> When I use GPUs 0 and 1, the parallel calculation runs. However, when I use GPUs 0 and 2, it fails and crashes the machine.
> I see that P2P communication is reported as ENABLED in the output file.
>
> The topology information for our GPU node:
>         GPU0  GPU1  GPU2  GPU3  mlx4_0  CPU Affinity
> GPU0     X    PIX   PHB   PHB   PHB     0-11,24-35
> GPU1    PIX    X    PHB   PHB   PHB     0-11,24-35
> GPU2    PHB   PHB    X    PIX   PHB     0-11,24-35
> GPU3    PHB   PHB   PIX    X    PHB     0-11,24-35
> mlx4_0  PHB   PHB   PHB   PHB    X
>
> Legend:
> X   = Self
> SOC = Connection traversing PCIe as well as the SMP link between CPU sockets (e.g. QPI)
> PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
> PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
> PIX = Connection traversing a single PCIe switch
> NV# = Connection traversing a bonded set of # NVLinks
>
> I wonder whether this node simply does not support P2P communication between the two cards.
> I also wonder whether AMBER fails to detect this and therefore enables P2P communication between the two cards anyway.
>
> Or is there some other problem?
>
> Best,
> Pengfei Li
> Email:lipengfei_mail.126.com
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Nov 10 2016 - 19:00:02 PST