Re: [AMBER] GPU related issues/GTX400 series

From: Gould, Ian R <i.gould.imperial.ac.uk>
Date: Sat, 20 Nov 2010 10:44:08 +0000

Hi Scott,

Has anyone tried a GTX580 yet? If so, do you know whether the same problem as on the GTX480 has been observed? If not, would I be correct in assuming that the firmware on a GTX580 is different to that on a GTX480, and that the problem on the 480s could therefore be a firmware issue, possibly correctable by flashing the firmware? Sorry for so many questions. I am currently running two C2050s in parallel, and they have been rock steady running 60K-atom simulations for 50 ns with no observed problems. I also have an undergraduate project student running an 18K-atom simulation on an old GTX260 for tens of ns, again with no problems. I may just take a punt, buy a GTX580 and see how it goes.

Cheers
Ian

"I think that God in creating Man somewhat overestimated his ability."
Oscar Wilde
-
Dr Ian R Gould
Senior Lecturer Biological and Biophysical Chemistry
Imperial College London
Exhibition Road
London
SW7 2AY
E-mail i.gould.imperial.ac.uk
http://www3.imperial.ac.uk/people/i.gould
Fax +44 (0)207 594 5809



On 20/11/2010 04:23, "Scott Le Grand" <SLeGrand.nvidia.com> wrote:

We are still investigating. We have reproduced it in house. It's a head-scratcher - the cause is not obvious. The only hope I can currently offer is that I developed the initial PME code on a GTX480 (C2050s weren't available yet), and I probably would have run into it then (I didn't) if it were an unfixable bug.



-----Original Message-----
From: Sergio R Aragon [mailto:aragons.sfsu.edu]
Sent: Friday, November 19, 2010 10:17
To: AMBER Mailing List
Subject: Re: [AMBER] GPU related issues/GTX400 series

Hello Scott,

Is NVIDIA investigating the documented problem with GTX 400 series cards running pmemd, or should we kiss those cards goodbye and move on? A sincere answer would be appreciated.

Thanks,

Sergio Aragon


-----Original Message-----
From: Scott Le Grand [mailto:SLeGrand.nvidia.com]
Sent: Friday, November 19, 2010 9:38 AM
To: 'AMBER Mailing List'
Subject: Re: [AMBER] GPU related issues

I have a quick fix for the 10 A cutoff issue. This only happens when there are two or fewer nonbond boxes along any axis. The kludge is to rebuild the neighbor list every step. The better solution will come with the fix to allow an arbitrary number of GPUs in multi-GPU PME runs.
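For illustration, here is a minimal Python sketch of the trigger condition and the kludge; the skin value, the cell-sizing rule, and the default rebuild interval are assumptions for the example, not pmemd.cuda's actual internals.

# Sketch of the rebuild heuristic described above (assumptions noted).

def cells_per_axis(box_length, cut, skin=2.0):
    # Number of nonbond cells along one axis, assuming each cell must
    # span at least the pairlist cutoff (cut + skin). Illustrative only.
    return int(box_length // (cut + skin))

def rebuild_interval(box, cut, default_interval=25):
    # The kludge: with 2 or fewer cells along any axis, rebuild the
    # neighbor list every step instead of every default_interval steps.
    if min(cells_per_axis(L, cut) for L in box) <= 2:
        return 1
    return default_interval

# A ~35 A cubic box gives 3 cells per axis at cut=8 but only 2 at
# cut=10, which is why the problem can appear only at the larger cutoff.
print(rebuild_interval((35.0, 35.0, 35.0), cut=8.0))   # -> 25
print(rebuild_interval((35.0, 35.0, 35.0), cut=10.0))  # -> 1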

The situation there is that I have relieved that constraint in two ways.

The first approach slows down small jobs but provides a 10% speed kick to larger jobs at 8 GPUs.

The second approach hit 58 ns/day on JAC with 8 GPUs but slowed larger molecules down by 1-2%.

Since the roadmap points to larger molecules (500K to 2M atoms), I am focused on fixing the shortcomings of the second approach.

Please send me the input file and I can verify you're hitting what I think you're hitting. If so, I'll check in the kludge for now.



-----Original Message-----
From: Ross Walker [mailto:ross.rosswalker.co.uk]
Sent: Friday, November 19, 2010 09:24
To: 'AMBER Mailing List'
Subject: Re: [AMBER] GPU related issues

Hi Ye,

> I have applied all the patches to AMBER 11 on my GPU machine, which has 4
> C2050 cards. But sometimes the jobs still fail. AMBER 11 is compiled with

Can you just confirm specifically up to what patch number you have applied? Just
to be sure you have ALL of the latest patches.

> In the first job, the system has 12157 atoms only, and the simulation is
> under NPT ensemble. If cut is set to 8A, this job runs fine. But if cut
> is set to 10, it dies with a lot of NaN in energy terms and coordinates.

Can you confirm that you can run this simulation on the CPU with cut=10
without issue?

> In the second job, the system has 34116 atoms. The serial CUDA run is OK.
> But in the parallel CUDA run, it dies with the error message "max pairlist
> cutoff must be less than unit cell max sphere radius". However, cut is set
> to 8A, and the distance between the protein and the cell boundary is set
> to 10A.

This occurs when the system blows up; why it blows up, however, is the issue.
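For intuition, here is a rough Python sketch of the check behind that error message; the 2 A skin buffer and the orthorhombic-box simplification are assumptions for the example, not the actual pmemd.cuda code, which also handles triclinic cells.

def max_sphere_radius(a, b, c):
    # Largest sphere inscribed in an orthorhombic cell: half the
    # shortest box edge.
    return 0.5 * min(a, b, c)

def check_pairlist_cutoff(box, cut, skin=2.0):
    # The sanity check: the pairlist cutoff (cut + skin) must fit
    # inside the unit cell's inscribed sphere.
    if cut + skin >= max_sphere_radius(*box):
        raise ValueError("max pairlist cutoff must be less than "
                         "unit cell max sphere radius")

# A healthy ~70 A box passes easily with cut=8:
check_pairlist_cutoff((70.0, 70.0, 70.0), cut=8.0)

# But if the run blows up and the box collapses under NPT, the same
# cut=8 trips the check - hence the message appears even though the
# original cutoff and solvent buffer were perfectly reasonable:
check_pairlist_cutoff((18.0, 18.0, 18.0), cut=8.0)   # raises ValueError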

> Can anyone help me out?

Can you please send your input files so we can try to reproduce this?

Thanks,

Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| NVIDIA Fellow |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.





_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Sat Nov 20 2010 - 03:00:02 PST