RE: [AMBER] amber11 PMEMD cuda problem. (sometimes stopped...)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 3 Jun 2010 07:44:12 -0700

Hi Wookyung,

 

Having tried this out on a machine here, it looks like you need to have
Xwindows started in order for the CUDA drivers to be loaded. Hence try the
following. To make sure everything is fresh, reboot the machine and let it
load up and start Xwindows, but DO NOT log in. Then press Ctrl-Alt-F1 to get
a terminal (if your Linux OS supports that), or log in remotely from another
machine using SSH.

 

Then try running the calculations from within the terminal and see what
happens. At NO POINT should you log in via Xwindows on the machine. Also
make sure there is NO SCREENSAVER running.
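
As a quick sanity check before launching, a small script can confirm that the
shell was not started from inside an X session. This is only a rough sketch:
it assumes that a text console or a plain SSH login (without X forwarding)
leaves the DISPLAY environment variable unset, which is the usual behaviour
but not guaranteed, and the function name is made up for illustration. It
cannot tell whether someone is logged in on the X display itself or whether a
screensaver is running.

```python
import os


def launched_from_x_session(env=None):
    """Return True if the environment looks like it belongs to an X session.

    Heuristic only (an assumption, not an official check): a non-empty
    DISPLAY variable usually means the shell is attached to an X display.
    """
    if env is None:
        env = os.environ
    return bool(env.get("DISPLAY", "").strip())


if __name__ == "__main__":
    if launched_from_x_session():
        print("DISPLAY is set: this shell appears to be attached to an X session.")
    else:
        print("DISPLAY is not set: looks like a text console or plain SSH login.")
```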

 

All the best

Ross

 

From: Wookyung Yu [mailto:sfcywk.gmail.com]
Sent: Thursday, June 03, 2010 6:06 AM
To: Ross Walker
Subject: Re: [AMBER] amber11 PMEMD cuda problem. (sometimes stopped...)

 

Dear Ross Walker,

Thank you for your concern about our problem and for suggesting a way to
solve it.

Maybe you are correct:

"My guess is that you are also running Xwindows on this same GPU and that is
causing conflicts, low memory on the GPU etc."

I shut down X11 and checked by running the simulation. The same problem no
longer occurs.

 

But another problem has appeared.

In the previous case (X11 turned on), the job stopped and the next job then
started running.

In this case (X11 shut down), the job does not stop, but it does not write
any output files (mdcrd, rst, mdout) even though the job (and the GPU card)
keeps running continuously. (The job should finish in less than 2 hours, but
it has been running for more than 7 hours without writing any output files.)

-> A similar problem, but a different one.
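
A hang of this kind (process still running, output files not growing) can be
caught early by watching the modification time of an output file. The sketch
below is one possible way to do that; the function names and the 30-minute
threshold are assumptions chosen for illustration, and the file name md1.out
is only borrowed from the listing elsewhere in this thread. Note that a file
that does not exist yet is also reported as stalled, which may be too strict
right after a job starts.

```python
import os
import time


def seconds_since_last_write(path, now=None):
    """Return seconds since `path` was last modified, or None if it is missing."""
    if now is None:
        now = time.time()
    try:
        return now - os.path.getmtime(path)
    except OSError:
        return None


def looks_stalled(path, max_idle_seconds, now=None):
    """True if the file is missing or has not been written for too long."""
    idle = seconds_since_last_write(path, now)
    return idle is None or idle > max_idle_seconds


if __name__ == "__main__":
    # Illustrative file name and threshold; adjust both to the actual run.
    if looks_stalled("md1.out", max_idle_seconds=30 * 60):
        print("md1.out has not been updated recently; the job may be hung.")
    else:
        print("md1.out is still being written.")
```

Run periodically (for example from cron), a check like this would flag a job
long before the 7-hour mark described above.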

 

The good sign is that the error now happens less often.

Maybe our system has some problem with the GTX480.

What should I do in this case? Is there another good way to solve the
problem?

Thank you for all your concern.

Best regards,

Wookyung Yu



 

2010/6/2 Ross Walker <ross.rosswalker.co.uk>

Hi Wookyung,

 

I have now run this locally on my machine, using your scripts to run the
protein 2 simulation, and as you can see it runs fine:

 

-rw------- 1  107973 May 31 17:12 eq.out
-rw------- 1 6431523 Jun  1 11:48 md10.out
-rw------- 1 6431523 Jun  1 13:36 md11.out
-rw------- 1 6431523 Jun  1 15:23 md12.out
-rw------- 1 5444547 Jun  1 16:55 md13.out
-rw------- 1 6431523 May 31 19:40 md1.out
-rw------- 1 6431523 May 31 21:27 md2.out
-rw------- 1 6431523 May 31 23:15 md3.out
-rw------- 1 6431523 Jun  1 01:03 md4.out
-rw------- 1 6431523 Jun  1 02:50 md5.out
-rw------- 1 6431523 Jun  1 04:38 md6.out
-rw------- 1 6431523 Jun  1 06:26 md7.out
-rw------- 1 6431523 Jun  1 08:14 md8.out
-rw------- 1 6431523 Jun  1 10:01 md9.out

 

Hence I think it must be something local to your setup. My guess is that you
are also running Xwindows on this same GPU and that is causing conflicts,
low memory on the GPU etc. Please let me know if this is not the case. If it
is then I would suggest going and getting a cheap graphics card to run X11
on while you are running GPU calculations. Or be resigned to shutting down
X11 while you are running.
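
One way to see whether something else is already holding memory on the GPU
before a run is to query the card with nvidia-smi. The sketch below assumes a
driver recent enough to provide nvidia-smi with the --query-gpu option, which
may not be true of older driver releases; the function names are made up for
illustration, and the parser assumes plain integer values in MiB.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines like '0, 412, 1535' into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory use (needs --query-gpu support)."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_gpu_memory(out)


if __name__ == "__main__":
    try:
        for index, used, total in query_gpu_memory():
            print(f"GPU {index}: {used} MiB used of {total} MiB")
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        # nvidia-smi missing, failing, or reporting non-numeric fields.
        print(f"Could not read GPU memory: {exc}")
```

If a noticeable amount of memory is already in use before the calculation
starts, that would be consistent with X11 sharing the card.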

 

If this is not the case and you are using the GTX480 dedicated then let me
know and I can try and dig into this some more.

 

All the best

Ross

_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Thu Jun 03 2010 - 08:00:03 PDT