Re: [AMBER] Possible concerns regarding future availability of cost effective NVIDIA GPU systems for running AMBER

From: David Cerutti <dscerutti.gmail.com>
Date: Mon, 30 Oct 2017 16:21:06 -0400

I would add that NVIDIA has reached out to me several times over the past
couple of months with offers to help in the development of pmemd.cuda.
Today they sent me some timings made by one of their software engineers
trying to make better use of NCCL and the NVLink interconnects to get
pmemd.cuda to scale to eight GPUs on the aforementioned DGX-1V. I don't
know what's in the minds of their bureaucracy, but if the profits are in
one sector we can't fault a corporation for tending to its bottom line.
Appealing to the corporate magnates to please keep things the way we like
isn't a very strong position. When they don't comply with a plea to forgo
their own profits in favor of our academic pursuits, the veil of Robin-Hood
altruism is quickly lost if we then start to complain, e.g. cursing out
their Vice President or sending profanity-laced emails.
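For anyone wondering what making better use of NCCL and NVLink looks like
in practice, the heart of multi-GPU MD scaling is an all-reduce that sums
each card's partial forces across all devices every step; NCCL routes that
traffic over NVLink where the topology allows. A minimal sketch follows
(the buffer sizes, names, and setup are illustrative, not pmemd.cuda's
actual code):

  // Build with: nvcc allreduce_demo.cu -lnccl
  #include <cuda_runtime.h>
  #include <nccl.h>
  #include <stdio.h>

  int main() {
    int nDev = 0;
    cudaGetDeviceCount(&nDev);
    if (nDev > 8) nDev = 8;
    int devs[8];
    for (int i = 0; i < nDev; i++) devs[i] = i;

    // One NCCL communicator per GPU, all managed by this one process.
    ncclComm_t comms[8];
    ncclCommInitAll(comms, nDev, devs);

    // Illustrative: x/y/z force accumulators for 100k atoms on each GPU.
    const size_t n = 3 * 100000;
    float *forces[8];
    cudaStream_t streams[8];
    for (int i = 0; i < nDev; i++) {
      cudaSetDevice(devs[i]);
      cudaMalloc(&forces[i], n * sizeof(float));
      cudaMemset(forces[i], 0, n * sizeof(float));
      cudaStreamCreate(&streams[i]);
    }

    // Sum the partial forces from every GPU into every GPU. The group
    // calls let NCCL schedule all per-device operations together.
    ncclGroupStart();
    for (int i = 0; i < nDev; i++)
      ncclAllReduce(forces[i], forces[i], n, ncclFloat, ncclSum,
                    comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < nDev; i++) {
      cudaSetDevice(devs[i]);
      cudaStreamSynchronize(streams[i]);
      cudaFree(forces[i]);
      cudaStreamDestroy(streams[i]);
    }
    for (int i = 0; i < nDev; i++) ncclCommDestroy(comms[i]);
    printf("force all-reduce across %d GPUs complete\n", nDev);
    return 0;
  }

The hard engineering is in overlapping that exchange with the force kernels
themselves, which is presumably where NVIDIA's engineers are finding their
gains on the DGX-1V.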

We may need to adapt, but even if the price point for MD simulations were
to quadruple I don't see it being the end of Amber CUDA development.
Intel's MIC is not yet a game-changer, and even if it were to become
price-competitive with GPUs the parallelism in CUDA is simpler to develop,
maintain, and debug. If the price per atom-ns of simulation goes way up
with Volta, there may be a lag where academics aren't inclined to buy the
new cards for in-house use until they get even more powerful and the
existing architecture breaks down. Very quickly, if not immediately, there
will be vendors who come out with solutions for hosting 1, 2, or 4
server-style Voltas that, while still expensive, will not be out of reach
for an academic lab making a 3-4 year investment. If NVIDIA is making a
lot of profit with its NCCL and NVLink, then we may need to adapt to make
better use of those technologies. Our performance gains on two or four
GPUs have already been deteriorating as the individual cards get faster, so
if we find that future servers only come with NVLink and are more expensive
for that feature, we would be priced out simply by being unable to make
best use of the hardware.

All that aside, it is concerning to have yet another monopoly becoming
established, this one with such power over our own livelihood. The US
government has taken steps to keep US chip makers competitive and healthy,
not least because that protects our defense contractors from reliance on
foreign-made hardware or software. By all means, let's also engage in
development for AMD GPUs, but let's not turn our backs on NVIDIA or greet
them with frustration.

Dave


On Mon, Oct 30, 2017 at 3:29 PM, Scott Le Grand <varelse2005.gmail.com>
wrote:

> This is for real, guys. DGX-1V is *spectacular*, but most entrepreneurs,
> hackers, researchers, and students simply cannot afford it. Without an
> affordable Volta GPU, IMO a large fraction of the CUDA ecosystem will
> bitrot. Existing CUDA code no longer compiles cleanly under CUDA 9.
> Pmemd.cuda itself doesn't even work on SM 7 (Volta) unless one forces
> legacy compilation, a feature NVIDIA states will go away soon. Recently, I
> refactored the code so that it compiles without warnings or errors, and it
> still crashes on Volta.
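>
> For anyone chasing the Volta crash, a likely culprit is SM 7's
> independent thread scheduling: CUDA 9 deprecates the old implicit
> warp-synchronous intrinsics, and code that assumes all 32 lanes of a warp
> run in lockstep can silently race on Volta. A minimal sketch of the old
> versus new idiom (illustrative, not lifted from pmemd.cuda):
>
>   // Pre-CUDA-9 idiom, assumes lockstep warps (deprecated in CUDA 9):
>   //   val += __shfl_down(val, offset);
>   //
>   // Volta-safe replacement: name the participating lanes explicitly.
>   __device__ float warpReduceSum(float val) {
>     for (int offset = 16; offset > 0; offset >>= 1)
>       val += __shfl_down_sync(0xffffffff, val, offset);
>     return val;
>   }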
>
> What this means is that Volta is the end of the line for pmemd.cuda.
> Whatever is going on is above my pay grade and my free cycles to address.
> At my end, I have begun a clandestine effort to get AMBER running on AMD
> GPUs. At your ends, you need to think about the future. It's been a great
> 8 years, but nothing lasts forever.
>
> In a world where NVDA hadn't stopped caring about its own developers in
> favor of #DerpAllTheThings, there would be a GTX Titan XV on which we could
> figure this out and fix it. Instead, you guys are faced with either paying
> $150K upfront plus hosting costs or $25/hr on AWS. I don't see how pro
> bono CUDA development remains a good idea in that world for anyone.
>
> Scott
>
> On Mon, Oct 30, 2017 at 11:57 AM, Ross Walker <ross.rosswalker.co.uk>
> wrote:
>
> > Dear All,
> >
> > In the spirit of open discussion I want to bring the AMBER community's
> > attention to a concern raised in two recent news articles:
> >
> > 1) "Nvidia halts distribution partners from selling GeForce graphics
> cards
> > to server, HPC sectors" - http://www.digitimes.com/news/
> > a20171027PD200.html
> >
> > 2) "Nvidia is cracking down on servers powered by Geforce graphics cards"
> > - https://www.pcgamesn.com/nvidia-geforce-server
> >
> > I know many of you have benefited greatly over the years from the GPU
> > revolution that has transformed the field of Molecular Dynamics. A lot of
> > the work in this field was done by people volunteering their time, and it
> > grew out of the idea that many of us could not get access to, or could not
> > afford, supercomputers for MD. The underlying drive was to bring
> > supercomputing performance to the 99% and thus greatly extend the amount
> > and quality of science each of us could do. For AMBER this meant
> > supporting all three lines of NVIDIA graphics card (GeForce, Quadro, and
> > Tesla) in whatever format or combination you, the scientist and customer,
> > wanted.
> >
> > In my opinion, key to AMBER's success was the recognition that, for
> > running MD simulations, very few people in academia, and likewise few R&D
> > groups within companies small or large, could afford the high-end Tesla
> > systems, whose prices have been rising substantially above inflation with
> > each new generation (for example, the $149K DGX-1). The understanding,
> > both mine and the field's in general, has essentially always been that,
> > assuming one was willing to accept the risks to reliability etc., use of
> > GeForce cards was perfectly reasonable. We are not, after all, running US
> > air traffic control or some other equally critical system. It is prudent
> > use of limited R&D funds, in many cases taxpayer money, and as the
> > customers we should be free to choose the hardware we buy. NVIDIA has
> > fought a number of us on this front for many years, mostly in a
> > passive-aggressive stance, with the occasional personal insult or threat.
> > As highlighted in the articles above, with the latest AI bubble they have
> > cemented a worrying monopoly and are now getting substantially more
> > aggressive, using it to pressure suppliers to effectively ban the use of
> > GeForce cards for scientific compute and to restrict what we can buy to
> > Tesla cards, which for the vast majority of us are simply out of our
> > price range.
> >
> > In my opinion this is a very worrying trend that could hurt us all and
> > have serious repercussions for our scientific productivity and the field
> > in general. If this is a concern to you too, I would encourage each of
> > you to speak up. Contact people you know at NVIDIA and make your concerns
> > heard. I am concerned that if we as a community do not speak up now, we
> > could see our field completely priced out of the ability to make use of
> > GPUs for MD over the next year.
> >
> > All the best
> > Ross
> >
> >
> >
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Mon Oct 30 2017 - 13:30:02 PDT