Dear Ross,
I still have a question about running the GPU version of AMBER in parallel.
Our GPU node has two physical K80 cards, and each card carries two GPU chips, so the node exposes four devices: GPU 0, 1, 2, and 3.
When I use GPUs 0 and 1, the parallel calculation runs. However, when I use GPUs 0 and 2, it fails and crashes the machine.
I find that P2P (peer-to-peer) communication is reported as ENABLED in the output file of the latter run.
The topology reported for our GPU node is:
        GPU0    GPU1    GPU2    GPU3    mlx4_0  CPU Affinity
GPU0    X       PIX     PHB     PHB     PHB     0-11,24-35
GPU1    PIX     X       PHB     PHB     PHB     0-11,24-35
GPU2    PHB     PHB     X       PIX     PHB     0-11,24-35
GPU3    PHB     PHB     PIX     X       PHB     0-11,24-35
mlx4_0  PHB     PHB     PHB     PHB     X
Legend:
X = Self
SOC = Connection traversing PCIe as well as the SMP link between CPU sockets(e.g. QPI)
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks
Does this mean that the node cannot support P2P communication between the two physical cards?
If so, is AMBER failing to detect this and enabling P2P communication between the two cards anyway?
Or is there some other problem?
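
To make the question concrete, the GPU rows of the matrix above can be grouped by link type. This is only a minimal Python sketch: it assumes the rows are pasted in by hand exactly as they appear above, and the helper name `pairs_by_link` is made up for illustration (it is not part of AMBER or of any NVIDIA tool). It just reorganizes the matrix according to the legend; it does not test whether P2P actually works on a given link.

```python
# GPU rows of the topology matrix, copied by hand from the output above
# (the mlx4_0 and CPU Affinity columns are left out).
TOPO = """\
GPU0 X PIX PHB PHB
GPU1 PIX X PHB PHB
GPU2 PHB PHB X PIX
GPU3 PHB PHB PIX X
"""


def pairs_by_link(topo_text):
    """Group each unordered GPU pair under the link type reported for it."""
    rows = [line.split() for line in topo_text.strip().splitlines()]
    names = [row[0] for row in rows]
    groups = {}
    for i, row in enumerate(rows):
        links = row[1:1 + len(names)]  # one entry per GPU column
        for j in range(i + 1, len(names)):  # upper triangle only
            groups.setdefault(links[j], []).append((names[i], names[j]))
    return groups


groups = pairs_by_link(TOPO)
for link, pairs in sorted(groups.items()):
    print(link, pairs)
```

For this node it puts GPU 0/1 and GPU 2/3 under PIX (a single PCIe switch), and every pair that spans the two physical cards, including GPU 0/2, under PHB (through a PCIe Host Bridge).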
Best,
Pengfei Li
Email:lipengfei_mail.126.com
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Nov 10 2016 - 18:00:03 PST