Re: [AMBER] Utilization of 3 GPUs suddenly drop to 0% and won't work any more

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 29 Jul 2015 07:31:45 -0700

Hi Asai,

This is normal behavior. Please read http://ambermd.org/gpus/ for details, in particular this section: http://ambermd.org/gpus/#Running

In short, unless you have purchased hardware designed to support 4 GPUs in parallel via 'peer to peer' communication, you will most likely be limited to a maximum of 2 GPUs per simulation.

I would suggest either running 4 independent simulations, one on each GPU, or running 2 x 2-GPU simulations, one on each pair of P2P-capable GPUs. These pairs will most likely be 0+1 and 2+3, but you can use the check_p2p code from the above website to confirm this; a rough sketch of that kind of check is included below.
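For reference, the kind of query the check_p2p tool performs can be sketched with the CUDA runtime API. This is only a minimal illustration of querying peer-to-peer capability between GPU pairs, not the actual check_p2p code distributed on the AMBER GPU page:

// p2p_query.cu - minimal sketch (not the official AMBER check_p2p tool)
// that asks the CUDA runtime which GPU pairs report peer-to-peer access.
// Build with: nvcc p2p_query.cu -o p2p_query
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int nDevices = 0;
    if (cudaGetDeviceCount(&nDevices) != cudaSuccess || nDevices < 2) {
        printf("Fewer than two CUDA devices visible; nothing to check.\n");
        return 0;
    }

    for (int i = 0; i < nDevices; ++i) {
        for (int j = 0; j < nDevices; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            // Reports whether device i can directly address memory on device j.
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("GPU %d -> GPU %d : peer access %s\n",
                   i, j, canAccess ? "YES" : "NO");
        }
    }
    return 0;
}

Pairs that report YES in both directions are the ones worth giving to a single 2-GPU pmemd.cuda.MPI run (for example by restricting each run to one pair with CUDA_VISIBLE_DEVICES); pairs that report NO will fall back to communication through the host and will not scale.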

All the best
Ross

> On Jul 28, 2015, at 11:35 PM, 浅井 賢 <suguruasai.gmail.com> wrote:
>
> Dear Amber user,
>
>
> Hi, I'm occasionally running into a strange problem when using 4 GPUs with pmemd.cuda.MPI, and I wonder if someone can help me.
> I am not really sure of the minimum requirements for reproducing it, but it seems to occur when I use pmemd.cuda.MPI to run a simulation on two or more GPUs.
> The phenomenon is a sudden drop in GPU utilization, which you can see in the screenshot below.
>
> http://gyazo.com/07564256c2b9a9f02277cc5d6170ba15
>
> The simulation seems slow but actually keeps running OK; still, I'm afraid there may be a physical problem with the GPUs, and it is also quite annoying.
> Since I have no idea what is going on, I don't even know what terms to search for on the internet.
> Does anybody have any idea?
>
>
> Thank you.
>
>
> Asai
>
>
> ATTACHMENTS:
>
> md.out - mdout file of `pmemd.cuda.MPI`
> nvidia-smi.txt - `$ nvidia-smi > nvidia-smi.txt`


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Jul 29 2015 - 08:00:03 PDT