
Nvidia's CUDA: The End of the CPU?


Suhit Gupta
06-30-2008, 01:00 PM
http://www.tomshardware.com/reviews/nvidia-cuda-gpu,1954.html

"The operation of a GPU is sublimely simple. The job consists of taking a group of polygons, on the one hand, and generating a group of pixels on the other. The polygons and pixels are independent of each other, and so can be processed by parallel units. That means that a GPU can afford to devote a large part of its die to calculating units which, unlike those of a CPU, will actually be used. GPUs differ from CPUs in another way. Memory access in a GPU is extremely coherent – when a texel is read, a few cycles later the neighboring texel will be read, and when a pixel is written, a few cycles later a neighboring pixel will be written. By organizing memory intelligently, performance comes close to the theoretical bandwidth. That means that a GPU, unlike a CPU, doesn't need an enormous cache, since its role is principally to accelerate texturing operations. A few kilobytes are all that's needed to contain the few texels used in bilinear and trilinear filters."

The article is not only a great history lesson on the GPU but also raises some good thought-provoking questions, like what kind of processing should be done on the CPU vs. the GPU, how much the CPU should be coordinating video processing, and so on. Overall, the direction in which we are moving is somewhat expected really -- i.e. specialized hardware created for various types of processing. Furthermore, with the introduction of CUDA on the GeForce 8800 series, application developers have a whole new level of control over the GPU cores. Have any of you used the beta APIs?
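
For anyone curious what the API actually looks like, here is a minimal toy sketch of my own (not from the article): a kernel that handles one pixel per thread, so neighboring threads read and write neighboring addresses, which is exactly the coherent access pattern the quote describes. The kernel name and image size are made up for illustration; the launch syntax and the cudaMalloc/cudaMemcpy calls are the standard CUDA runtime API.

#include <cuda_runtime.h>
#include <stdlib.h>

// Toy example: scale every pixel of a flattened greyscale image by a constant.
// Thread i handles pixel i, so adjacent threads touch adjacent memory; the
// hardware can coalesce those accesses, as described in the article quote.
__global__ void scalePixels(const float *in, float *out, int n, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per pixel
    if (i < n)
        out[i] = in[i] * gain;
}

int main()
{
    const int n = 1024 * 1024;                       // a 1024x1024 image, flattened
    const size_t bytes = n * sizeof(float);

    float *h_in  = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_in[i] = (float)(i % 256);

    float *d_in, *d_out;
    cudaMalloc((void **)&d_in,  bytes);              // device (GPU) buffers
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                         // threads per block
    const int blocks  = (n + threads - 1) / threads;
    scalePixels<<<blocks, threads>>>(d_in, d_out, n, 1.5f);

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
    return 0;
}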

Felix Torres
06-30-2008, 04:26 PM
A couple of thoughts come to mind:

1- The use of GPU resources for non-graphics apps is already a reality on current-gen gaming consoles (360/PS3). The 360 in particular has a fairly clean interface for allocating CPU and GPU resources within a single app. The best-known examples are the Dashboard media player, the MC Extender app, the HD-DVD playback app, and the WMV/VC-1 and H.264 codecs. In other words, digital media apps. That tells us what to expect on the PC side from stuff like CUDA; it's not just for ivory-tower number crunchers like the old Transputer or Cell...

2- When the XBOX team presented the first working 360 to Chairman Gates, his first question was "Can it run Windows?". The answer was no, because the Xenon processor does not support out-of-order execution and, without it, performance would be unacceptable. Interestingly, neither does Intel's Atom processor, which does run Windows, but it is a three-year-newer design. Which suggests the next XBOX could bring interesting things with it...

3- Speaking of the next XBOX, given that the 360 CPU and GPU tally up to 500 million transistors combined, Moore's law (transistor counts double every 18-24 months) suggests a 2011/12 XBOX-next should easily have 4 billion transistors to work with for CPU/GPU/dedicated EDRAM (three doublings from the 360's 2005-era budget gets you to 8x) and still meet the manufacturing costs of the current models. Keeping the current architecture for compatibility purposes, they could simply scale everything up by a factor of about 4 or so: general-purpose execution units (say 16 threads instead of 6), vector processing units (4 instead of 1), and dedicated RAM (64MB instead of 10). That is a *lot* of power and more than is needed for "merely" 1080p/60fps game rendering. Which opens the door to dedicated physics hardware, ray-traced graphics (Intel is working on that for PCs) as well as raster-based rendering, or possibly adding out-of-order execution to the general-purpose execution units. In other words, MS has all the tools needed to create its own PC architecture, distinct and separate from x86. Since Sony's PS3 has proven people will pay up to $800 for a console, MS could come out with a full-function Media Center XBOX in the next generation along with the replacements for the pure game console/media extender of the current gen.

4- Folks tend to forget that Intel is a big-time graphics chip vendor; just because they don't play at the high end doesn't mean they're not in the game and as NVIDIA and AMD/ATI start to cross-link GPU and CPU functionality, Intel will have strong incentive to bring equivalent vector-processing power to x86 chipsets. In other words: watch for the other shoe to drop.

There's a whole new computing model coming to digital media processing...

Jason Dunn
07-01-2008, 04:00 AM
4- Folks tend to forget that Intel is a big-time graphics chip vendor; just because they don't play at the high end doesn't mean they're not in the game and as NVIDIA and AMD/ATI start to cross-link GPU and CPU functionality, Intel will have strong incentive to bring equivalent vector-processing power to x86 chipsets. In other words: watch for the other shoe to drop.

Intel likes to make noise, then they fizzle when it comes to the actual execution. I'll believe it when I see it. :rolleyes:

Felix Torres
07-01-2008, 01:30 PM
Intel likes to make noise, then they fizzle when it comes to the actual execution. I'll believe it when I see it. :rolleyes:

Hey, you're the one that posted the Anandtech article about the Intel Tick-Tock strategy! :cool:

Well, the thing to keep an eye on is the *next* tock.
The one after Nehalem, due circa 2010.
Code-named Larrabee.
http://arstechnica.com/articles/paedia/hardware/clearing-up-the-confusion-over-intels-larrabee.ars

Intel shows off Raytraced Quake 4 - The INQUIRER (http://www.theinquirer.net/en/inquirer/news/2007/04/23/intel-shows-off-raytraced-quake-4)

http://www.pcper.com/article.php?aid=534&type=expert&pid=3

Larrabee will be formally introduced at SIGGRAPH 2008 in August:
SIGGRAPH Core | Tuesday, 12 August | 10:30 am - 12:15 pm | Hall B

Session Chair/Discussant
Marc Olano, University of Maryland (http://www.siggraph.org/s2008/attendees/papersform/index_7.php)


Larrabee: A Many-Core x86 Architecture for Visual Computing
This paper introduces Larrabee, a many-core hardware architecture, along with a new software rendering pipeline, a many-core programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processing unit, as well as fixed-function co-processors. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads and greatly increases the flexibility and programmability of the architecture as compared to standard GPUs.
My basic point is that as the use of GPUs as non-graphics processors increases, the CPU vendors (Intel/AMD/VIA) are not going to stand still; each of the three has its own in-house GPU expertise that will be applied to the competition. Everybody knows (in general) of the AMD/ATI effort; well, Intel isn't standing still either (Larrabee), MS has the tools it needs to play the game in its own domain (XBOX), and VIA owns S3.

NVIDIA may (or may not) remain dominant in GPUs (ATI has their own ideas about that), but just because they provide ways to use GPUs for GP code doesn't mean they're going to shove anybody out of the market.
Not without a fight.
And a fair fight it will be because, as the CUDA article makes clear, NVIDIA is mapping general-purpose functions onto a GPU architecture. On the other hand, Larrabee, its AMD equivalent (Fusion), and the other many-core CPU/GPU hybrids to come won't be so constrained. Those will use high-end GPU-type vector processing units that are designed from the ground up for GP computing. Should be a fun fight to watch.
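
To make that constraint concrete, here is a rough sketch of my own (not from the article) of what a simple general-purpose job, summing an array, looks like once it has been mapped onto the GPU: the three-line CPU loop gets recast as a tree reduction across thread blocks with explicit shared memory and barriers. The names are made up for illustration; the __shared__/__syncthreads() machinery and the launch syntax are the standard CUDA model.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// The CPU version of the job: a plain loop.
float cpuSum(const float *in, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i) s += in[i];
    return s;
}

// The GPU version: each block reduces 256 elements in shared memory and
// writes one partial sum; the host then adds up the per-block results.
__global__ void partialSums(const float *in, float *blockSums, int n)
{
    __shared__ float cache[256];                 // per-block scratch memory
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                             // wait for all loads

    // Tree reduction within the block: 256 -> 128 -> ... -> 1
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        blockSums[blockIdx.x] = cache[0];        // one value per block
}

int main()
{
    const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d_in, *d_part;
    cudaMalloc((void **)&d_in,   n * sizeof(float));
    cudaMalloc((void **)&d_part, blocks * sizeof(float));
    cudaMemcpy(d_in, h, n * sizeof(float), cudaMemcpyHostToDevice);

    partialSums<<<blocks, threads>>>(d_in, d_part, n);

    float *hp = (float *)malloc(blocks * sizeof(float));
    cudaMemcpy(hp, d_part, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    printf("GPU sum = %f, CPU sum = %f\n", cpuSum(hp, blocks), cpuSum(h, n));

    cudaFree(d_in); cudaFree(d_part); free(h); free(hp);
    return 0;
}

The gap between the plain cpuSum() loop and the kernel above is exactly the architectural constraint I'm talking about.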

CUDA is out today, which is great, but its window of opportunity to establish itself won't stay open forever. Sooner rather than later, x86 CPUs will adopt GPU tech (just as they adopted RISC tech and wiped out that breed of competitors) and take the fight right back to NVIDIA.

That is the other shoe waiting to drop: the counterattack.

PS: Today Intel just told programmers to start thinking in terms of hundreds and thousands of cores in future CPUs and architectures.

Intel says to prepare for 'thousands of cores' | Nanotech: The Circuits Blog - CNET News.com (http://news.cnet.com/8301-13924_3-9981760-64.html?tag=nefd.riv)

(Strictly speaking, I think the guy meant threads or execution units, but if the cores are simple enough, all three could be one and the same...)