Re: [AMBER] Questions about pmemd.cuda.MPI

From: Pengfei Li <lipengfei_mail.126.com>
Date: Fri, 11 Nov 2016 09:48:11 +0800 (CST)

Dear Ross,
I still have a question about running the GPU version of AMBER in parallel.
Our GPU node has two physical K80 cards, and each card contains two GPU chips, so the node exposes GPUs 0, 1, 2, and 3.
When I use GPUs 0 and 1, the parallel calculation runs fine. However, when I use GPUs 0 and 2, the run fails and crashes the machine.
In the output file of the latter run, P2P communication is reported as ENABLED.

The GPU topology reported for our node:

        GPU0   GPU1   GPU2   GPU3   mlx4_0   CPU Affinity
GPU0     X     PIX    PHB    PHB    PHB      0-11,24-35
GPU1    PIX     X     PHB    PHB    PHB      0-11,24-35
GPU2    PHB    PHB     X     PIX    PHB      0-11,24-35
GPU3    PHB    PHB    PIX     X     PHB      0-11,24-35
mlx4_0  PHB    PHB    PHB    PHB     X

Legend:
  X   = Self
  SOC = Connection traversing PCIe as well as the SMP link between CPU sockets (e.g. QPI)
  PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX = Connection traversing a single PCIe switch
  NV# = Connection traversing a bonded set of # NVLinks

I wonder whether this node simply does not support P2P communication between GPUs on different physical cards (e.g. GPU 0 and GPU 2).
I also wonder whether AMBER fails to detect this and enables P2P communication between the two cards anyway.

Or is there some other problem?
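
In case it helps to narrow this down, here is a minimal standalone check I put together (my own sketch using the CUDA runtime API, not part of AMBER). It assumes CUDA_VISIBLE_DEVICES is not set, so that device indices 0 and 2 here correspond to GPU0 and GPU2 in the table above; if that variable is set, the runtime renumbers the visible devices starting from 0.

/* p2p_check.cu -- quick standalone test, compile with: nvcc p2p_check.cu -o p2p_check */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int devA = 0, devB = 2;      /* the pair that crashes our node (assumed mapping to GPU0/GPU2) */
    int aToB = 0, bToA = 0;

    /* Ask the CUDA runtime whether direct peer access is possible
       in each direction between the two devices. */
    cudaDeviceCanAccessPeer(&aToB, devA, devB);
    cudaDeviceCanAccessPeer(&bToA, devB, devA);

    printf("P2P GPU%d -> GPU%d: %s\n", devA, devB, aToB ? "supported" : "NOT supported");
    printf("P2P GPU%d -> GPU%d: %s\n", devB, devA, bToA ? "supported" : "NOT supported");

    return 0;
}

If this reports NOT supported for the 0/2 pair while the 0/1 pair reports supported, that would point to the topology (PHB vs PIX) rather than AMBER itself; I am not sure whether pmemd.cuda.MPI performs this kind of check before enabling P2P.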

Best,


Pengfei Li
Email:lipengfei_mail.126.com
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Nov 10 2016 - 18:00:03 PST