Thanks for the suggestions, guys.
It turned out to be a driver issue. The X server wasn't even running
(runlevel 3), and I used the same SPDP executable as the reference
system, so I checked the driver version. The originally installed driver was
195.36.31 - the same as on the reference system. But when I installed
driver 256.40 on the test 8-GPU system, it worked. Performance numbers
ended up pretty much the same for pmemd.cuda.
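
For anyone wanting to compare driver versions across machines the way described above, here is a minimal sketch. It assumes the NVIDIA Linux kernel module is loaded and exposes its version in /proc/driver/nvidia/version, and that the version number follows the words "Kernel Module" on that line. The function names are made up for illustration and are not part of Amber or the NVIDIA tools.

```python
import re
from pathlib import Path

# Location where the NVIDIA Linux kernel module reports its version.
VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_driver_version(text):
    """Return the driver version (e.g. '256.40') found in text, or None.

    Assumes the version number appears after the words 'Kernel Module'.
    """
    match = re.search(r"Kernel Module.*?\s(\d+(?:\.\d+)+)\s", text)
    return match.group(1) if match else None


def installed_driver_version(path=VERSION_FILE):
    """Read the version file; return None if it is missing or unreadable."""
    try:
        return parse_driver_version(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    # Run this on both machines and compare the printed values.
    print(installed_driver_version())
```

Running it on the test and reference machines and comparing the two printed strings shows whether the installed drivers match.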
Not sure why it required the newer driver, even though the cards are
identical. Perhaps it had something to do with cards sharing an x16
slot and traffic going through a PLX switch. The NVIDIA folks might know
more about this.
Thanks again
Sasha
Ross Walker wrote:
> Hi Sasha,
>
>
>> We just got an 8-GPU system that uses onboard PLX switches that expand
>> four x16 PCIe slots on a Tyan board to eight. I ran a small production
>> simulation test, and the results are horrifying, to say the least.
>> If you could help narrow down possible sources of the problem, it would
>> be great.
>>
>
> The quick answer would be that something is wrong with the bandwidth to the
> cards and they are defaulting to some very low rate. However, your numbers
> look good.
>
>
>> Device 0: GeForce GTX 480
>> 33554432 3306.6
>> 33554432 3086.5
>> 33554432 111378.4
>>
>
> Compared with what you get on your reference system and what I get on mine:
>
> 33554432 2490.3
> 33554432 2080.6
> 33554432 88706.4
>
> So it doesn't look like bandwidth is the issue.
>
> Specifically, what is the test system you are running? Can you run one of
> the standard benchmarks from the Amber website, either JAC or FactorIX NVE,
> and see what you get in ns/day?
>
> Some things that could make it slow:
>
> 1) You are running in DPDP mode for some reason. This will be VERY slow on a
> GTX480 and only 'slow' on a C2050.
>
> 2) The calculation is running on the same GPU that is running X Windows.
> This can often cause problems, especially if, say, a screensaver kicks in.
> Make sure you uninstall the screensaver rpm, turn off any power saving, etc.
> I would also try manually specifying other device IDs and see if the
> performance varies with device ID.
>
> Also, can you put the GTX480 into the reference system and try it there, and
> vice versa? That would show whether it is an issue with the card or not.
>
>
>> Could it be a BIOS problem, an issue with the PLX switching, slow hard
>> drive?
>>
>
> Set ntwx=0, ntpr=100000, ntwr=1000000 and that will rule out the hard drive,
> since then it won't do any I/O. It could be a BIOS issue or maybe even a
> driver issue. Are you using the same driver version and the same CUDA
> toolkit version on both machines?
>
> I would start by seeing if the poor performance is reproducible across all
> the cards in the new machine, to see if it is something like X Windows being
> a resource hog.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> ---------------------------------------------------------
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Adjunct Assistant Professor |
> | Dept. of Chemistry and Biochemistry |
> | University of California San Diego |
> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> ---------------------------------------------------------
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
> _______________________________________________
> AMBER mailing list
> AMBER.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>
>
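
For reference, the two sets of bandwidth figures quoted above can be compared with a few lines of Python. This is only a rough sketch: the raw output does not label the three rows, so the row names below (host-to-device, device-to-host, device-to-device) are an assumption based on the usual layout of a CUDA bandwidth test, and the variable names are made up for illustration.

```python
# Row labels are an assumption; the quoted output lists three unlabeled rates.
ROWS = ("host_to_device", "device_to_host", "device_to_device")

# Figures copied from the thread (transfer size 33554432 bytes in every row).
test_system = (3306.6, 3086.5, 111378.4)  # 8-GPU system, Device 0: GTX 480
comparison = (2490.3, 2080.6, 88706.4)    # figures Ross quotes for comparison


def bandwidth_ratios(measured, baseline):
    """Return measured/baseline for each row, keyed by the assumed row label."""
    return {name: m / b for name, m, b in zip(ROWS, measured, baseline)}


ratios = bandwidth_ratios(test_system, comparison)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Every ratio comes out above 1, which is consistent with Ross's reading that transfer bandwidth on the 8-GPU system is not the bottleneck.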
Received on Thu Aug 12 2010 - 09:30:03 PDT