Re: [AMBER] Problem running multiple GPU's

From: Jason Swails <jason.swails.gmail.com>
Date: Wed, 10 Sep 2014 12:45:14 -0400

On Wed, 2014-09-10 at 15:53 +0000, jon.maguire.louisville.edu wrote:
> We’ve built a system that has 3 Nvidia Titan Blacks. We CAN run pmemd.cuda (and the MPI version) in the following configs
>
> export CUDA_VISIBLE_DEVICES=0
> export CUDA_VISIBLE_DEVICES=0,1
> export CUDA_VISIBLE_DEVICES=0,2
>
> However, we CANNOT run the following:
>
> export CUDA_VISIBLE_DEVICES=1
> export CUDA_VISIBLE_DEVICES=2
> export CUDA_VISIBLE_DEVICES=1,2
>
> We want to run one job per GPU, but Amber comes back with “Error
> selecting compatible GPU out of memory” when nothing is running on the
> GPU. Or, in the case of running on 1,2, it returns
> “cudaMemcpyToSymbol: SetSim copy to cSim failed out of memory.” Is
> there a flag that needs to be set? An nvidia-smi command? It's really
> bizarre behavior!

What happens when you run deviceQuery from the CUDA code samples? Do
you see all 3 GPUs?

It's important to note that the GPU ordering printed by nvidia-smi is
NOT always the same as the ordering the CUDA runtime sees. To get the
true device ID -> card mapping, you need to use a program that actually
uses the CUDA API (e.g., deviceQuery).

Could it be that you have 4 GPUs in your machine, with one powering the
display, and that 4th GPU won't work for Amber? In any case, the output
of deviceQuery will tell us which GPUs the CUDA runtime sees and what
their properties are.

HTH,
Jason

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Sep 10 2014 - 10:00:03 PDT