Re: [AMBER] CUDA driver insufficient

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 26 Nov 2013 16:31:22 -0800

Hi Wei,

As soon as you mention the word Dell I am not surprised. First question - is
it REALLY Dell customized in some way? - I.e. Dell's own hacked to bits
driver? - If yes then my advice would be to format the machine and just
install a vanilla CentOS 6 - it will ultimately be a LOT less pain in the
long run.

Failing that try the following:

yum remove nvidia-kmod xorg-x11-drv-nvidia nvidia-settings nvidia-xconfig nvidia-modprobe

Then download NVIDIA-Linux-x86_64-331.20.run from the NVIDIA website,
chmod 755 it, and then run it - when asked, say yes to uninstalling the
existing driver, then install the new driver and say yes when it asks to
update your X11 configuration files. Hopefully your login window will then
come back when you reboot. (You may need to install kernel-devel and
kernel-headers if the driver fails to compile.)

Check that running 'nvidia-smi' shows you the GPU and driver loaded.

You should also download the vanilla CUDA 5.0 toolkit (yes, 5.0) from the
NVIDIA website and install that.

Ultimately if this doesn't fix the mess I would suggest the format and
vanilla install approach since I wouldn't trust anything that is Dell
customized.

All the best
Ross

From: wei zhang <zhangwee.yahoo.com>
Reply-To: wei zhang <zhangwee.yahoo.com>
Date: Tuesday, November 26, 2013 3:35 PM
To: wei zhang <zhangwee.yahoo.com>, Ross Walker <ross.rosswalker.co.uk>,
AMBER Mailing List <amber.ambermd.org>, AMBER Mailing List
<amber.ambermd.org>
Subject: Re: [AMBER] CUDA driver insufficient

Dear All,
As described in my previous emails, I have compiled the CUDA code and it runs
fine on the Dell machine. However, to make PMEMD_cuda work, I had to first
uninstall the original Dell NVIDIA package. I actually had to uninstall it
through the graphical interface (System Settings / Add/Remove Applications),
because rpm -e didn't work (it reported the GPU as not installed...).
However, by doing so, I lost the login window. To make things worse, after
rebooting the server, PMEMD_cuda stopped working, even after I recompiled it.
 
During recompilation, the following advisories and warnings were shown:
B40C/radix_sort/../radix_sort/../radix_sort/upsweep/../../radix_sort/upsweep/cta.cuh(127): Advisory: Loop was not unrolled, unexpected control flow construct
B40C/radix_sort/../radix_sort/../radix_sort/upsweep/../../radix_sort/upsweep/cta.cuh(127): Advisory: Loop was not unrolled, unexpected control flow construct
./kForcesUpdate.cu(87): Advisory: Loop was not unrolled, cannot deduce loop trip count
./kForcesUpdate.cu(127): Advisory: Loop was not unrolled, cannot deduce loop trip count
ptxas warning : Too many threads per SM specified for entry _ZN4b40c10radix_sort7upsweep6KernelINS0_7Enactor16OpaquePassPolicyINS0_15ProblemInstanceINS_4util11MultiBufferILi2EjjEEiEELNS0_11ProblemSizeE1ELi4EE13UpsweepPolicyEijEEvPT0_PT1_NS6_19CtaWorkDistributionISD_EEj, will be ignored
make test.cuda hangs on:
cd nmropt/gb/angle/ && ./Run.nmropt_1angle_gb SPFP
/home/wei/amber12/include/netcdf.mod
The same hang happened with my originally compiled PMEMD_cuda after
rebooting (PMEMD_cuda keeps running but produces no output, as if stuck in
an infinite loop...).
I would very much appreciate your help!
Meanwhile, any hints on how I can get the login window back? I tried to
reinstall the original Dell NVIDIA driver, but it reported a conflict with
the current driver (the same thing happened when I installed the latest
NVIDIA driver, which is why I had to remove the Dell driver to use
PMEMD_cuda).
Thanks a lot!
Wei
  
 
 
 
From: wei zhang <zhangwee.yahoo.com>
To: wei zhang <zhangwee.yahoo.com>; Ross Walker <ross.rosswalker.co.uk>;
AMBER Mailing List <amber.ambermd.org>; AMBER Mailing List
<amber.ambermd.org>
Sent: Wednesday, November 20, 2013 9:09 AM
Subject: Re: [AMBER] CUDA driver insufficient
  
 

Here are some updates about the CUDA driver.
Driver 331.20 works fine for the Tesla C2075.
PMEMD test results on the machine (dual six-core Dell Precision
R5500n):
 
1np: 1X
2np: 1.74X
4np: 3.65X
8np: 6.36X
10np: 7.38X
cuda: 23.39X
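Dividing each CPU speedup by its process count gives the parallel efficiency, which makes the scaling trend easier to read. The sketch below only does that arithmetic on the figures listed above. The CUDA run uses a single GPU, so it is left out of the per-process calculation. The function name is illustrative, not part of AMBER.

```python
# Speedups for PMEMD reported above on the dual six-core Dell Precision
# R5500n, each relative to a single MPI process (1np = 1X).
speedups = {1: 1.00, 2: 1.74, 4: 3.65, 8: 6.36, 10: 7.38}


def parallel_efficiency(n_procs: int, speedup: float) -> float:
    """Fraction of ideal linear scaling achieved: speedup / process count."""
    return speedup / n_procs


# Efficiency for every process count in the table.
efficiency = {n: parallel_efficiency(n, s) for n, s in speedups.items()}

if __name__ == "__main__":
    for n, e in efficiency.items():
        print(f"{n:>2} np: {e:.1%} of linear scaling")
```

On these numbers, efficiency falls from 87% at 2 processes to about 74% at 10. That puts the single C2075's 23.39X in perspective.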
 
Best regards,
Wei
 
 

________________________________
 From: wei zhang <zhangwee.yahoo.com>
To: Ross Walker <ross.rosswalker.co.uk>; AMBER Mailing List
<amber.ambermd.org>
Sent: Tuesday, November 19, 2013 9:19 AM
Subject: Re: [AMBER] CUDA driver insufficient
  

Thanks, Ross!

I got the drivers from the NVIDIA website; here is the link:
http://www.nvidia.com/Download/Find.aspx?lang=en-us
(It seems I had somehow used the advanced search, but listing 304 as the
recommended driver can still be misleading.)
 
Best regards,
Wei
  

________________________________
From: Ross Walker <ross.rosswalker.co.uk>
To: wei zhang <zhangwee.yahoo.com>; AMBER Mailing List <amber.ambermd.org>
Sent: Monday, November 18, 2013 10:26 PM
Subject: Re: [AMBER] CUDA driver insufficient
  

Hi Wei,

This happens when the driver you have loaded is too old for the version of
CUDA, and importantly the CUDA runtime library, that you are using. NVIDIA
has minimum driver requirements for each version of CUDA. The 304 driver
sounds ancient to me. Not sure where you found that one. I doubt it works
for anything beyond CUDA 5.0, which is likely the source of your problem
since you are using 5.5.
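The runtime-versus-driver comparison Ross describes can be sketched as below. In a real CUDA program, the two integers would come from cudaDriverGetVersion() and cudaRuntimeGetVersion(). Both encode a version as 1000 * major + 10 * minor, so CUDA 5.5 is 5050. Here they are plain integers, so the logic can be shown without a GPU.

The value 5000 for a 304-series driver is an assumption that follows Ross's guess that it tops out at CUDA 5.0. The function names are hypothetical.

```python
from __future__ import annotations


def decode_cuda_version(version: int) -> tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def driver_is_sufficient(driver_version: int, runtime_version: int) -> bool:
    """The driver must support a CUDA version at least as new as the runtime."""
    return driver_version >= runtime_version


if __name__ == "__main__":
    # Assumed values: a driver that tops out at CUDA 5.0 (5000) paired with
    # the CUDA 5.5 runtime (5050), as in the original report.
    driver, runtime = 5000, 5050
    if not driver_is_sufficient(driver, runtime):
        d_major, d_minor = decode_cuda_version(driver)
        r_major, r_minor = decode_cuda_version(runtime)
        print(f"Driver supports CUDA {d_major}.{d_minor}, "
              f"but the runtime is CUDA {r_major}.{r_minor}: upgrade the driver.")
```

When this comparison fails, the runtime reports "CUDA driver version is insufficient for CUDA runtime version". That is why upgrading the driver, rather than downgrading CUDA, is the usual fix.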

Note that AMBER also has a minimum driver requirement for the latest GPUs
(700 series): 319.60 (recommended) or 325.15 (which is actually older than
319.60). I have not tried the 331 branch, which is brand new and completely
untested. Hopefully the fixes in the 319 and 325 trees got propagated into
the 331 tree, but you will likely be the first person trying it.
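A minimum-driver check of the kind described here can be sketched as a numeric comparison of version strings. This is a hypothetical illustration only. AMBER's real check lives in gpu.cpp (per Jason's reply in this thread) and may work differently. The helper names and the default of 319.60 are assumptions taken from the paragraph above.

```python
def parse_driver_version(text: str) -> tuple:
    """Turn an NVIDIA driver string such as '331.20' into a tuple like (331, 20)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str = "319.60") -> bool:
    """Hypothetical check: is the installed driver numerically >= the minimum?"""
    # Tuples compare element by element, so (331, 20) >= (319, 60).
    return parse_driver_version(installed) >= parse_driver_version(minimum)


if __name__ == "__main__":
    for version in ("304.1160", "319.60", "325.15", "331.20"):
        status = "OK" if meets_minimum(version) else "too old"
        print(f"{version}: {status}")
```

The comparison is purely numeric and says nothing about release dates. As noted above, 325.15 is actually older than 319.60 even though it sorts higher.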

Where did you see 304 being recommended? - I'll let NVIDIA know to fix
their website.

All the best
Ross



On 11/18/13 3:35 PM, "wei zhang" <zhangwee.yahoo.com> wrote:

>Thanks, Jason!
>Using 331.20 indeed solved the problem.
>A bit of a surprise, though (both 331.20 and 304.1160 are the latest
>versions, released on Nov 6, 2013, but the recommended one did not work
>for some reason).
>
>Best regards,
>Wei
>
>
>________________________________
> From: Jason Swails <jason.swails.gmail.com>
>To: wei zhang <zhangwee.yahoo.com>; AMBER Mailing List
><amber.ambermd.org>
>Sent: Monday, November 18, 2013 4:32 PM
>Subject: Re: [AMBER] CUDA driver insufficient
>
>
>
>On Mon, Nov 18, 2013 at 5:08 PM, wei zhang <zhangwee.yahoo.com> wrote:
>
>Dear All,
>>
>>I am trying to compile AMBER12 on a Dell Precision R5500n machine.
>>It has a single Tesla C2075 GPU.
>>Everything seemed OK until "make test.cuda" returned a bunch of errors
>>indicating "CUDA driver version is insufficient for CUDA runtime version".
>>
>>I used the latest CUDA 5.5 toolkit and the NVIDIA recommended driver:
>>Linux x64 (AMD64/EM64T) Display Driver (version: 304.1160).
>>
>>Could anyone provide some hints? Should I use an older CUDA?
>>
>
>No, you should use a later driver. There were issues with drivers before
>325.15 for certain hardware, so a check was put in to make sure at least
>that driver version is being used.
>
>I believe the C2075 is fine with the 304.1160 drivers (maybe). If you
>can't upgrade the driver for whatever reason, then you can always comment
>out the driver version check in gpu.cpp and recompile (but I would
>suggest trying to get the updated driver first).
>
>Good luck,
>Jason
>--
>
>Jason M. Swails
>BioMaPS,
>Rutgers University
>Postdoctoral Researcher
>_______________________________________________
>AMBER mailing list
>AMBER.ambermd.org
>http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Nov 26 2013 - 17:00:02 PST