Re: [AMBER] CUDA MPI issues from Jason Swails on 2016-05-23 (Amber Archive May 2016)

From: Jason Swails <jason.swails.gmail.com>
Date: Mon, 23 May 2016 09:22:20 -0400

On Mon, May 23, 2016 at 6:45 AM, Biplab Ghosh <ghosh.biplab.gmail.com>
wrote:

> Dear Amber Experts,
>
> I am trying to run amber14 on multiple GPUs in parallel.
> I have 2 "GeForce GTX TITAN X" cards installed
> in my workstation, along with the cuda-7.5 libs.
> Each GPU works on its own, but when I run
> pmemd.cuda.MPI, it gives me the following error:
>
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> I then referred to the Amber website to check why GPU communication
> is failing. I downloaded the "check_p2p.tar.bz2" program from the Amber site
> and got the following output when I ran it:
>
> [biplab.proline check_p2p]$ ./gpuP2PCheck
> CUDA_VISIBLE_DEVICES is unset.
> CUDA-capable device count: 2
> GPU0 "GeForce GTX TITAN X"
> GPU1 "GeForce GTX TITAN X"
>
> Two way peer access between:
> GPU0 and GPU1: NO
>
>
> Can anyone help me configure my system so that both
> GPUs can work in parallel?
>

This does not mean that GPU0 and GPU1 cannot work in parallel. It simply
means that they cannot communicate via the P2P (peer-to-peer) protocol, so
they won't be able to work together *efficiently*. P2P support comes from
the motherboard, and you need to plug the cards into two PCIe slots that
support this communication. Some motherboards don't support it at all, to
my knowledge. If that is the case, there is nothing you can do to fix P2P
connectivity in your computer.
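
If you want to re-check this after moving the cards between slots, here is
a minimal sketch that reads the text printed by gpuP2PCheck and lists which
GPU pairs report peer access. The line format is assumed from the single
sample quoted above; the "YES" spelling and the function names are my own
assumptions, not something taken from the check_p2p sources.

```python
import re

# Sample copied from the gpuP2PCheck output quoted in this thread.
SAMPLE = """\
CUDA_VISIBLE_DEVICES is unset.
CUDA-capable device count: 2
GPU0 "GeForce GTX TITAN X"
GPU1 "GeForce GTX TITAN X"

Two way peer access between:
GPU0 and GPU1: NO
"""

# Assumed line format: "GPU<i> and GPU<j>: YES" or "... : NO".
PAIR_RE = re.compile(r"^\s*GPU(\d+) and GPU(\d+):\s*(YES|NO)\s*$", re.IGNORECASE)


def parse_peer_access(text):
    """Map each (i, j) GPU pair (i < j) to True/False peer access."""
    pairs = {}
    for line in text.splitlines():
        match = PAIR_RE.match(line)
        if match:
            i, j = int(match.group(1)), int(match.group(2))
            pairs[(min(i, j), max(i, j))] = match.group(3).upper() == "YES"
    return pairs


def p2p_capable_pairs(pairs):
    """Return the sorted list of GPU pairs that reported peer access."""
    return sorted(pair for pair, ok in pairs.items() if ok)


if __name__ == "__main__":
    result = parse_peer_access(SAMPLE)
    for (i, j), ok in sorted(result.items()):
        print(f"GPU{i} <-> GPU{j}: {'P2P available' if ok else 'no P2P'}")
```

On a machine with more than two cards, the pairs returned by
p2p_capable_pairs() are the ones worth selecting together through
CUDA_VISIBLE_DEVICES.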

As for the error message, you should check your output file for possible
errors, and/or run in serial (or on the CPU) if it's still not clear what
the error is.
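
A quick way to do the first check is to scan the output file for lines
that look like complaints. This is only a rough sketch: the keyword list is
my guess rather than a catalogue of Amber's actual messages, and the file
name "mdout" in the usage line is an assumption about how the job was run.

```python
import re

# Hypothetical keyword list; extend it to match what your run prints.
SUSPECT = re.compile(r"error|abort|fail", re.IGNORECASE)


def find_suspect_lines(path, pattern=SUSPECT):
    """Return (line_number, text) for every line matching the pattern."""
    hits = []
    with open(path, errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            if pattern.search(line):
                hits.append((lineno, line.rstrip("\n")))
    return hits


def report(path):
    """Print the suspect lines of one output file, if there are any."""
    hits = find_suspect_lines(path)
    if not hits:
        print(f"{path}: no suspect lines found")
    for lineno, text in hits:
        print(f"{path}:{lineno}: {text}")
```

Usage would be something like report("mdout"), followed by reading the
lines around each hit in the file itself.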

HTH,
Jason

-- 
Jason M. Swails
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber

Received on Mon May 23 2016 - 06:30:02 PDT