Re: [AMBER] using two GPUs

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 3 May 2012 01:41:45 -0700

Hi Vijay,

> [gpuadmin.gpucc benchMark-malto-Thermo-in-2GPU-amber12]$ ./gpu-md-malHL-RT-1ns.sh &
> [2] 7030
> [gpuadmin.gpucc benchMark-malto-Thermo-in-2GPU-amber12]$ CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list

It looks to me like something is wrong with your mvapich installation.
It is interesting that the calculation hangs at 100 steps, since this
matches your settings of ntwe, ntwx and ntpr. Try setting ntwe = 0 and
ntwx = 0 and run again. If the calculation then runs beyond 100 steps,
that points to a problem with I/O through your MPI implementation.
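
As a rough sketch of that test, the small Python helper below sets ntwx and
ntwe to 0 in an mdin file's text and leaves everything else (including ntpr)
alone. The function name and the example &cntrl block are made up for
illustration; they are not Vijay's actual input, so adapt them to the real
file.

```python
import re


def disable_trajectory_output(mdin_text):
    """Return mdin_text with any ntwx/ntwe assignment changed to 0.

    Other settings, including ntpr, are left untouched.
    """
    for key in ("ntwx", "ntwe"):
        # Match e.g. "ntwx=100" or "ntwx = 100" (case-insensitive).
        pattern = re.compile(r"\b" + key + r"\s*=\s*-?\d+", re.IGNORECASE)
        mdin_text = pattern.sub(key + "=0", mdin_text)
    return mdin_text


# Hypothetical mdin contents, assumed only for this example.
example_mdin = """1 ns MD
 &cntrl
  imin=0, nstlim=500000, dt=0.002,
  ntpr=100, ntwx=100, ntwe=100,
 /
"""

print(disable_trajectory_output(example_mdin))
```

If the run gets past step 100 with the modified input, the trajectory and
energy file writes are the likely trigger.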

I'd start by trying some runs with pmemd.MPI (the CPU version) built with
the same mvapich and see if that runs fine - then move back to the GPU
version.

> When I use the top command to see if pmemd.cuda.MPI is running I get
> these lines:
>
> *******************************************************************
>  6901 gpuadmin  20   0  112g 109m  26m R 99.8  0.1  30:23.90 pmemd.cuda
>  7043 gpuadmin  20   0  116g 128m  32m R 99.8  0.1   9:52.63 pmemd.cuda.MPI
>  7044 gpuadmin  20   0  116g 123m  27m R 99.8  0.1   9:52.71 pmemd.cuda.MPI
> ***********************************************************************

Why do you have a pmemd.cuda job as well as two pmemd.cuda.MPI jobs? Do
you have 3 GPUs in your node? Are you certain they are all running on
different GPUs?

If not, this could cause all sorts of problems. I would suggest rebooting
the system completely before testing further, to make sure everything is
clean.
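
To make the concern concrete, here is a minimal Python sketch that counts the
pmemd.cuda* processes in the top lines quoted above and compares that with
the number of GPUs. The value of 2 GPUs is an assumption taken from the
subject line, and the function names are invented for this example. Note that
it only counts processes; it cannot tell which GPU each one is actually using.

```python
def count_pmemd_cuda(top_lines):
    """Count lines whose last column (the command) starts with 'pmemd.cuda'."""
    count = 0
    for line in top_lines:
        fields = line.split()
        if fields and fields[-1].startswith("pmemd.cuda"):
            count += 1
    return count


def oversubscribed(top_lines, n_gpus):
    """True if there are more pmemd.cuda* processes than GPUs.

    In that case at least two processes must be sharing a GPU.
    """
    return count_pmemd_cuda(top_lines) > n_gpus


# Process lines copied from the top output quoted in this thread.
top_lines = [
    " 6901 gpuadmin  20   0  112g 109m  26m R 99.8  0.1  30:23.90 pmemd.cuda",
    " 7043 gpuadmin  20   0  116g 128m  32m R 99.8  0.1   9:52.63 pmemd.cuda.MPI",
    " 7044 gpuadmin  20   0  116g 123m  27m R 99.8  0.1   9:52.71 pmemd.cuda.MPI",
]

n_gpus = 2  # assumption: the node has two GPUs, as the subject line suggests

print(count_pmemd_cuda(top_lines), oversubscribed(top_lines, n_gpus))
```

Having no more processes than GPUs does not by itself guarantee that each
process is on its own GPU, so this is only a first sanity check.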
 
All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.





_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu May 03 2012 - 02:00:03 PDT