Re: [AMBER] Horrific pmemd.cuda performance on an 8-GPU system

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 11 Aug 2010 13:04:25 -0700

Hi Sasha,

> We just got an 8-GPU system that uses onboard PLX switches that expand
> four x16 PCIe slots on a Tyan board to eight. I ran a small production
> simulation test, and the results are horrifying, to say the least.
> If you could help narrow down possible sources of the problem, it would
> be great.

The quick answer would be that something is wrong with the bandwidth to the
cards and that they are defaulting to some very low rate. However, your
numbers look good:

> Device 0: GeForce GTX 480
> 33554432 3306.6
> 33554432 3086.5
> 33554432 111378.4

Compare that with what you get on your reference system and what I get on mine:

   33554432 2490.3
   33554432 2080.6
   33554432 88706.4

So it doesn't look like bandwidth is the issue.
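
As a rough check on that comparison, here is a minimal Python sketch that
divides each figure from the new machine by the corresponding reference
figure. It assumes the three rows are reported in the same order on both
systems; what each row measures and its units are not stated here, so the
code treats them as unlabeled numbers.

```python
# Bandwidth figures quoted above, in the order they were reported.
# What each row measures, and its units, are not stated in the thread.
new_system = [3306.6, 3086.5, 111378.4]   # device 0 on the 8-GPU machine
reference = [2490.3, 2080.6, 88706.4]     # reference figures quoted in this reply


def ratios(test, ref):
    """Return test/ref for each pair of measurements."""
    return [t / r for t, r in zip(test, ref)]


bandwidth_ratios = ratios(new_system, reference)
for r in bandwidth_ratios:
    print(f"{r:.2f}x")
# prints 1.33x, 1.48x, 1.26x
```

Every ratio is above 1, i.e. the new machine reports higher figures than the
reference in all three rows.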

What specifically is the test system you are running? Can you run one of
the standard benchmarks from the AMBER website, either JAC or FactorIX NVE,
and see what you get in ns/day?

Some things that could make it slow:

1) You are running in DPDP mode for some reason. This will be VERY slow on a
GTX480 and only 'slow' on a C2050.

2) The calculation is running on the same GPU that is running X Windows.
This can often cause problems, especially if, say, a screensaver kicks in.
Make sure you uninstall the screensaver rpm, turn off any power saving, etc.
I would also try manually specifying other device IDs and see if the
performance varies with device ID.

Also, can you put the GTX480 into the reference system and try it there, and
vice versa? That would tell you whether or not the card itself is the issue.

> Could it be a BIOS problem, an issue with the PLX switching, slow hard
> drive?

Set ntwx=0, ntpr=100000, ntwr=1000000 and that will rule out the hard drive,
since then it will do essentially no I/O. It could be a BIOS issue or maybe
even a driver issue. Are you using the same driver version and the same CUDA
toolkit version on both machines?

I would start by seeing if the poor performance is reproducible across all
the cards in the new machine, to see if it is something like X Windows being
a resource hog.
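
One way to script that per-card comparison is sketched below in Python. It
rests on assumptions: that restricting the visible GPU through the
CUDA_VISIBLE_DEVICES environment variable is how the device gets selected for
your build (your AMBER version may instead take the device ID another way),
and that the input files are called mdin, prmtop and inpcrd, which are
placeholder names. It only measures wall-clock time per run; the ns/day
figure still has to be read from each output file.

```python
import os
import subprocess
import time


def env_for_device(device_id, base_env=None):
    """Copy the environment and expose only one GPU to the child process."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the CUDA runtime honours this variable on your system.
    env["CUDA_VISIBLE_DEVICES"] = str(device_id)
    return env


def pmemd_command(device_id):
    """Build a pmemd.cuda command line; the file names are placeholders."""
    return ["pmemd.cuda", "-O", "-i", "mdin", "-p", "prmtop", "-c", "inpcrd",
            "-o", f"mdout.gpu{device_id}", "-r", f"restrt.gpu{device_id}"]


def time_each_device(device_ids):
    """Run the same job on each device in turn; return wall-clock seconds."""
    timings = {}
    for device_id in device_ids:
        start = time.perf_counter()
        subprocess.run(pmemd_command(device_id),
                       env=env_for_device(device_id), check=True)
        timings[device_id] = time.perf_counter() - start
    return timings


# Example call on the 8-GPU machine (not executed here):
# timings = time_each_device(range(8))
```

If one device ID is consistently slower than the rest, that points at
whatever else is using that card, such as the display.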

All the best
Ross

/\
\/
|\oss Walker

---------------------------------------------------------
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
---------------------------------------------------------

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.






_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Wed Aug 11 2010 - 13:30:03 PDT