Re: [AMBER] problem of GTX470 running pmemd.cuda_DPDP/input file access provided

From: Scott Le Grand <SLeGrand.nvidia.com>
Date: Wed, 8 Sep 2010 12:52:13 -0700

Running full double-precision changes the balance of computation and memory access. This could have the effect of cooling the chip.

Running NPT versus NVT also traverses different code paths. This could also have the effect of cooling the chip.

But the big question is this: if you run the same simulation twice, does it crash on exactly the same iteration? This is *the* *biggest* question. If it does, then this is a code issue. If not, then it's something outside of the pmemd.cuda application(s). These simulations are deterministic. Two independent runs on the same hardware configuration with the same input files and command line should produce the *same* output.
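
One way to carry out that check is sketched below in Python, assuming each run wrote a plain-text output file. It reports the first line at which two output files differ. The function names and the `keep` filter are hypothetical and not part of Amber; the filter is there because output files may contain lines, such as timestamps or timings, that differ between otherwise identical runs.

```python
from itertools import zip_longest


def first_divergence(lines_a, lines_b, keep=lambda line: True):
    """Return (index, a, b) for the first differing kept line, or None.

    index counts kept lines starting at 1. a or b is None when one run
    ended (for example, crashed) earlier than the other.
    """
    # Drop lines the caller wants ignored, and strip trailing newlines.
    kept_a = (ln.rstrip("\n") for ln in lines_a if keep(ln))
    kept_b = (ln.rstrip("\n") for ln in lines_b if keep(ln))
    for i, (a, b) in enumerate(zip_longest(kept_a, kept_b), start=1):
        if a != b:
            return i, a, b
    return None


def compare_runs(path_a, path_b, keep=lambda line: True):
    """Compare two output files on disk (paths are supplied by the caller)."""
    with open(path_a) as fa, open(path_b) as fb:
        return first_divergence(fa, fb, keep)
```

For example, `compare_runs("run1/md.out", "run2/md.out")` (hypothetical paths) returns `None` when both runs produced the same output and stopped at the same place, which points to a code issue. A reported divergence before the crash, or one file ending earlier than the other, points to something outside the application.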

Scott




-----Original Message-----
From: Sergio R Aragon [mailto:aragons.sfsu.edu]
Sent: Wednesday, September 08, 2010 11:35
To: AMBER Mailing List
Cc: Duncan Poole
Subject: [AMBER] problem of GTX470 running pmemd.cuda_DPDP/input file access provided

Hello Ross,

The job that I wrote to you about, 1faj, just failed with the DPDP program on my 470 card after accumulating 2.3 ns in the NVT ensemble. The error messages captured were the following (a little different from previous failures):

Error: the launch timed out and was terminated launching kernel kPMEGetGridWeights
Error: the launch timed out and was terminated launching kernel kCalculatePMENonbondForces

A second kernel timeout occurred in addition to the usual one. The DPDP model allowed the system to run a bit longer before crashing. It would be very nice if you could try this system on your C2050 card. This 1faj system is also running on an 8-processor machine under Amber 10 and has accumulated 3.66 ns so far under NPT. The density is around 1.07 in both the Amber 10 run and the Cuda_DPDP run (determined by a 1 ns NPT simulation before starting NVT), at 300 K. As I mentioned before, this is a 6-subunit protein, inorganic pyrophosphatase. This system has 65,000 atoms.

An even better system to try to reproduce the error on is 1cts, citrate synthase. This is only a dimeric protein whose file is too big to run under the cuda DPDP program on my 470 card (malloc error). I am running it on Amber 10 and it has accumulated 20.1 ns under NPT. Under pmemd.cuda, it crashes with the usual kernel timeout error (#1 above) in the first ns of NVT MD. The density of this system is 1.04 under both Amber 10 NPT and pmemd.cuda (determined with 1 ns of NPT before starting NVT), at 300 K. This system has 79,000 atoms.

I don't know what systems Sasha Buzko is running, but they appear to be smaller than mine. We are trying the 1faj system at SFSU with a GTX 240 card in the default SPDP model. I'm afraid that card does not have enough memory to run this system - we'll find out soon.

I have made an account on my system for you to log in; the details are provided off list. Thanks!

Sergio

Sergio Aragon
Professor of Chemistry
SFSU

 

-----Original Message-----
From: Ross Walker [mailto:ross.rosswalker.co.uk]
Sent: Monday, September 06, 2010 5:54 PM
To: 'AMBER Mailing List'
Cc: 'Duncan Poole'
Subject: Re: [AMBER] problem of GTX480 running pmemd.cuda

Hi All,

Can we please get a very simple example of the input and output that is
effectively 'guaranteed' to produce this problem? I would like to start by
confirming for sure that this works fine on GTX295, C1060 and C2050. Once
this is confirmed we will know that it is something related specifically to
GTX480 / 470. Unfortunately I do not have any GTX480s so cannot reproduce
things myself. I want to make sure though that it definitely does not occur
on other hardware.

All the best
Ross

> -----Original Message-----
> From: Sasha Buzko [mailto:obuzko.ucla.edu]
> Sent: Monday, September 06, 2010 2:21 PM
> To: AMBER Mailing List
> Subject: Re: [AMBER] problem of GTX480 running pmemd.cuda
>
> Hi Yi,
> yes, this issue does happen to other people, and we are in the process
> of figuring out why these things happen on consumer cards and don't
> happen on Tesla. As far as I know, there is no clear solution to this
> yet, although maybe Ross and Scott could make some suggestions.
>
> As a side note, have you seen any simulation failures with "the launch
> timed out" error? Also, what's your card/CUDA driver versions?
>
> Thanks
>
> Sasha
>
>
> Yi Xue wrote:
> > Dear Amber users,
> >
> > I've been running pmemd.cuda on GTX480 for two months (implicit solvent
> > simulation). Occasionally, the program would get stuck: the process is
> > running ok when typing "top"; output file "md.out" just prints out energy
> > terms at some time point but does not get updated any more; temperature of
> > GPU will decrease by ~13C, but it is still higher than the idle temperature
> > by ~25C. After I restart the current trajectory, the problem would be gone
> > in most cases.
> >
> > It seems like in that case the job cannot be submitted to (or executed in)
> > GPU unit. I'm wondering if this issue also happens to other people...
> >
> > Thanks for any response.
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
> >






Received on Wed Sep 08 2010 - 13:00:04 PDT