Sorry, point 3) ("job needs to be exclusive in a node") should be read as: a job using more than one GPU needs to be exclusive on a node (meaning no other job running on the same node).
On Saturday, January 4, 2014 3:42 PM, Jio M <jiomm.yahoo.com> wrote:
Dear Daniel
Here are my replies:
1) I get this, so it looks like it's not updated?
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
| Version 12.0
2) Actually, I am using a central cluster facility, so this is not really in our hands. The cluster has some nodes with 3 GPUs each and others with 8 GPUs each. For a system of 250,000 atoms I get 2.8 ns/day on 1 GPU, 3.7 ns/day on 2 GPUs, and 4.7 ns/day on 3 GPUs. Since the nodes are connected by InfiniBand (I believe), I thought the job could also be run on GPUs residing on different nodes (see the launch sketch after point 3).
3) Another thing I noted: the job needs to be exclusive on a node. I use bsub submission scripts, where the -x flag requests exclusive use of a node for one job; when I omit it and submit to a free GPU on a node that already has another job running, my AMBER job is killed with the same error (a script sketch follows below).
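For the multi-node case in point 2, a minimal launch sketch (the hostnames and the OpenMPI-style hostfile are assumptions, not details from this cluster):

  # Sketch of a 3-rank run across 3 nodes, one GPU per node.
  # node01-node03 and the hostfile format are placeholders.
  cat > hosts.txt <<EOF
  node01 slots=1
  node02 slots=1
  node03 slots=1
  EOF
  mpirun -np 3 --hostfile hosts.txt \
      pmemd.cuda_SPDP.MPI -O -i md.in -p prmtop -c md.rst -o md.out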
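And a minimal sketch of the exclusive bsub script for point 3 (LSF assumed; the queue name and file names are placeholders):

  #!/bin/bash
  #BSUB -x               # exclusive: no other jobs share the allocated node
  #BSUB -n 3             # 3 MPI ranks, one per GPU
  #BSUB -q gpu           # placeholder queue name
  #BSUB -o amber.%J.log  # %J expands to the LSF job ID
  mpirun pmemd.cuda_SPDP.MPI -O -i md.in -p prmtop -c md.rst -o md.out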
best
JIo
On Saturday, January 4, 2014 4:38 AM, Daniel Roe <daniel.r.roe.gmail.com> wrote:
Hi,
First, make sure you're using the most up-to-date version of the code. Your
output should contain:
|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
| Version 12.3.1
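A quick way to check an existing run (assuming the output file is named md.out):

  # Print the version stamp from the mdout header.
  grep -A1 "Version of PMEMD in use" md.out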
That being said, do you really need to use 3 GPUs across 3 different nodes
for a single job? I can't imagine you would see much of a speedup (if any).
In fact, with the current code, even using 3 GPUs in a single node you
probably don't get much overall speedup. I think the MPI implementation is
going to receive a huge boost for Amber14, which may change this in the
future, but for now you're probably better off running separate jobs on
each GPU (one way to do that is sketched below).
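For example, on a 3-GPU node (a sketch; the input and restart file names are placeholders):

  # Run three independent single-GPU jobs, one pinned to each device.
  # CUDA_VISIBLE_DEVICES limits each pmemd.cuda process to one GPU.
  CUDA_VISIBLE_DEVICES=0 pmemd.cuda -O -i md.in -p prmtop -c run0.rst -o run0.out &
  CUDA_VISIBLE_DEVICES=1 pmemd.cuda -O -i md.in -p prmtop -c run1.rst -o run1.out &
  CUDA_VISIBLE_DEVICES=2 pmemd.cuda -O -i md.in -p prmtop -c run2.rst -o run2.out &
  wait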
-Dan
On Fri, Jan 3, 2014 at 5:16 PM, Jio M <jiomm.yahoo.com> wrote:
> Hi all,
>
> Just to add, I have now tested the same job on a single GPU and it works
> fine. My previous jobs with the same input files ran fine using 3 GPUs on
> the same node, but when I use 3 GPUs on different nodes the job is killed
> without an error message, as described.
>
> Any ideas?
>
> regards
> JIo
>
> On Friday, January 3, 2014 11:26 PM, Jio M <jiomm.yahoo.com> wrote:
>
> Dear all
>
> I have a job with an NPT mdin input file. It terminates without any error
> message with the CUDA version (pmemd.cuda_SPDP.MPI), but it runs fine with
> pmemd.MPI.
>
> Here are the last lines of output from the CUDA job (no error message):
>
> | Conditional Compilation Defines Used:
> | DIRFRC_COMTRANS
> | DIRFRC_EFS
> | DIRFRC_NOVEC
> | MPI
> | PUBFFT
> | FFTLOADBAL_2PROC
> | BINTRAJ
> | CUDA
>
> | Largest sphere to fit in unit cell has radius = 63.294
>
> Please suggest what might be going wrong.
>
> thanks
> JIo
--
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 201
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Jan 04 2014 - 08:00:02 PST