Introduction
It might not quite be justified to call today historic, but this July 17, 2002 certainly marks a significant break in the continuity of the PC graphics scene. For years, the Toronto-based 3D chip maker ATi used to run behind technology leader NVIDIA, often just about managing to catch up for a short time, but utterly unable to take a meaningful lead in terms of technology or performance. This trend is over today, as ATi is attacking its Californian arch enemy in three areas. Radeon 9000 (Pro) marks the first DirectX 8-capable 3D chip in the value segment that is so financially important for 3D chip makers, while Radeon 9700 will take the crown of technology as well as performance leader. Additionally, ATi is giving the first presentation of its highly anticipated ‘RenderMonkey’ 3D-tool suite, which will finally be beta-released next week at SIGGRAPH. Unlike with previous releases of ATi products, NVIDIA will not be able to counter ATi’s attack with the launch of an even more powerful product a few weeks later. The mysterious ‘NV30,’ NVIDIA’s “DX9 weapon,” is still months away from its introduction. The Santa Clara-based 3D chip maker will have to live with the No. 2 status for a while.
Of course, it wouldn’t be an ATi launch if there weren’t a few of the ‘regular irregularities.’ While cards with Radeon 9000 Pro, ATi’s new DX 8.1 value chip, should be found on store shelves within the next few days, the world will still have to wait a few weeks until cards with the new ‘super chip’ Radeon 9700 become available. ATi says it won’t be before the second half of August 2002 that you can purchase their new wonder product. This article will focus on Radeon 9700, while a parallel piece is dedicated to Radeon 9000 Pro.
Unusual Launch Preparations
Before I start to get into the technology review of ATi’s new über-product, I’d like to make a few comments about ATi’s launch procedure. These are just personal thoughts from the perspective of a computer journalist and of no consequence whatsoever to you readers out there, which is why you might want to skip this paragraph.
For the first time in five years, ATi has developed a product that can claim technology as well as performance leadership. Amazement at this incredible prospect must have paralyzed ATi’s marketing department, as there was an utter lack of the usual technology white papers, demos and other launch information prior to the release day. Only a week ago, we (the Tom’s Hardware reporter team dedicated to this event) weren’t even quite sure of what exactly would be launched today, because ATi wasn’t able to give us this information. In the last few days before July 17, we were finally buried in new information on an hourly basis, barely able to put all that stuff into the upcoming article. Radeon 9700 is an impressive part, and precisely because of that, you would want to get the word out about its technology well ahead of time.
ATi’s choice of the location for the launch of the new Radeon 9000 series could be considered gutsy or maybe crazy. Instead of inviting the press to ATi’s ‘home town’ Toronto, the launch event was placed right into NVIDIA’s front yard in San Francisco, California. This might be a display of defiance, along the lines of “Hello NVIDIA! We are not afraid of you!” and in fact, NVIDIA was slightly irritated. However, it gave NVIDIA the chance to try and spoil ATi’s efforts prior to the launch. Journalists were told that NVIDIA is less than impressed by Radeon 9000 as well as Radeon 9700 and that they don’t believe ATi is going to sell a lot of either product. I leave it up to you to decide if you consider these words meaningful or simply sour grapes. It remains a fact that this time, NVIDIA will have to accept ATi’s leadership for quite a bit longer than just a few weeks.
ATi’s Radeon 9700 ‘VPU’ – An Introduction
ATi’s new high-end 3D chip, also known under its code name “R300,” was designed for Microsoft’s upcoming DirectX 9 specification. ATi calls it ‘VPU’ for Visual Processing Unit to distinguish it from the term ‘GPU,’ which was invented by NVIDIA at the introduction of GeForce256 in 1999. The step from the DirectX 8.1 of today to DX9 will be another major leap forward, as it will allow a new level of 3D quality and add a number of new 3D features. In DirectX 9, vertex shader programs can be much more complex than before, as the new vertex shader specs add flow control, a lot more constants, as well as up to 1024 vertex shader instructions per program. While the new pixel shaders will still not allow flow control, the maximum number of pixel shader instructions has grown to 160. The real key feature of DirectX 9, however, is the introduction of RGBA values in 64-bit (16-bit FP per color) as well as 128-bit (32-bit FP per color) floating point precision. This great increase in color precision allows a stunning number of new visual effects and much better picture quality.
DirectX 9 won’t be out for several months and so you might wonder about today’s value of a 3D chip that has been designed for those future specifications. This is a very valid question, since DirectX 9 titles won’t ship before Microsoft has actually released DirectX 9. However, OpenGL titles that are programmed for the above-mentioned features can fully use Radeon 9700 already today. It’s another question how long we will have to wait until we indeed see the first games that take full advantage of the new features, but id Software’s upcoming Doom 3, for example, will at least make use of a few of Radeon 9700’s new capabilities.
Besides all this new stuff, Radeon 9700 will also have enough brute force to accelerate today’s DX7 and DX8 games to beyond what we have seen before. The eight-pixel-pipeline rendering unit, four parallel vertex shaders and a 256-bit memory interface with a bandwidth of 20 GB/s make sure that the previous performance leader is left in the dust. Radeon 9700 might be a bit ahead of its time, but not so far ahead that you wouldn’t be able to enjoy it already today.
Radeon 9700 – Fully Loaded
It’s finally about time that I spill the beans about the technical specifications of ATi’s upcoming Radeon 9700 chip.
Let’s start with the numbers.
The Die
Radeon 9700 is still produced in a 0.15µ process and its 110+ million transistors are responsible for a rather respectable size. It is packaged as a flip chip, so that the actual size of the R300 chip can easily be spotted. I forgot to measure the chip, but it is pretty much as big as Intel’s first Pentium 4 core ‘Willamette.’
ATi is ‘shooting for’ a 325 MHz clock rate, but we all know that the Canadian 3D chip maker is infamous for changing chip clock specs ‘on the fly.’ Let’s hope that ATi will finally leave this unpleasant tradition behind and stick to the announced core clock frequency or even a few MHz more.
The power requirements of the chip plus memory will be higher than what ATi wants to run through the AGP-slot. Therefore, Radeon 9700 cards will have an extra floppy drive-size power connector, to avoid a too close resemblance to 3dfx’s good old Voodoo5 cards.
AGP 8x
It is not surprising that Radeon 9700 is an AGP 8x part, as all new 3D chips today follow this new standard. AGP 8x is not exactly an exciting specification, as it merely doubles the already meager 1 GB/s bandwidth of AGP 4x. Still the 2 GB/s of AGP 8x will improve the bottleneck situation for vertex transfers from the host to the graphics chip for a little while.
The Memory Controller
As I already mentioned, Radeon 9700 will have a 256-bit wide memory interface, making it the second 3D chip after Matrox’ Parhelia to utilize an access path of this size to its onboard memory. The DDR-memory will be clocked at 310 MHz or something close, offering a bandwidth of 256/8*2*310 = 19,840 MB/s, or 19.4 GB/s (1 GB = 1024 MB for those of you who don’t work for a hard drive maker).
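The bandwidth arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the article's own formula; the function name is made up for illustration, and the 310 MHz clock is the tentative figure quoted above, not a confirmed spec.

```python
def peak_bandwidth_mb_per_s(bus_width_bits: int, clock_mhz: float,
                            transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in MB/s.

    bus_width_bits / 8 gives the bytes moved per transfer; DDR memory
    performs two transfers per clock, hence the default of 2.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_clock * clock_mhz


# Radeon 9700 figures as quoted in the text (310 MHz is tentative).
radeon_9700_mb = peak_bandwidth_mb_per_s(bus_width_bits=256, clock_mhz=310)

# The article divides by 1024 to get from MB/s to GB/s.
radeon_9700_gb = radeon_9700_mb / 1024

print(f"{radeon_9700_mb:,.0f} MB/s = {radeon_9700_gb:.1f} GB/s")
```

Plugging in a 128-bit bus at the same clock halves the result, which is the gap the wider interface is meant to close.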
Just quoting the raw bandwidth doesn’t do the new memory interface of ‘R300’ justice. Finally, ATi has followed NVIDIA’s example and included a crossbar memory controller that divides the 256-bit wide memory interface into four sub-units that can access the memory separately, making memory accesses more efficient.
Radeon 9700 will be able to be equipped with up to 256 MB of onboard memory, but the initial version will come with the lately common 128 MB.
R300 has already been designed with the new DDRII memory type in mind, so that future cards can be equipped with this upcoming memory type as well, once it becomes available.
The Vertex Shaders
Now it becomes theoretical. Radeon 9700 comes with four parallel vertex shader units, all of which conform to the new Vertex Shader 2.0 standard. This doesn’t just give Radeon 9700 twice the number of vertex shaders found in NVIDIA’s current flagship GeForce4 Ti; each of the four units is also much more sophisticated than the two vertex shaders of NV25.
The Vertex Shader 2.0 specification finally blesses future vertex shaders with real programmability, as it includes flow control instructions, such as conditional jumps, loops or procedures. Vertex programs can now consist of up to 1024 instructions (previously 128 instr.), but this is only a theoretical number, as loops and jumps naturally allow even larger numbers of successive instructions. The amount of constants has been increased to 256, and, most importantly, the new spec allows 128-bit FP color precision for the lighting part of the vertex processor.
ATi says that the four parallel vertex shaders of Radeon 9700 are capable of processing one vertex per clock, leading to a vertex or triangle rate of no less than 325 Mtriangles/s in case Radeon 9700 should indeed be clocked at 325 MHz.
Besides the already known N-patch ‘higher order surface’ tessellation, Radeon 9700 also includes displacement mapping, called “displacement mapped N-patch.” This feature could already be found in Parhelia and is also part of DX9. ATi counts it into its new ‘Truform 2.0’ specification.
ATi supplied a good example for displacement mapping, comparing it to dot3 bump mapping. The latter is also able to give surface the look as if it had more geometry, but as a texture operation, it doesn’t really produce more vertices. Displacement mapping does indeed create geometry. You can see the difference between the two in the picture below.
Pixel Shader 2.0 Specification
When, a bit more than a year ago, NVIDIA released GeForce3, which was the first 3D chip to include vertex as well as pixel shaders, the world of 3D developers welcomed the vertex shader, but criticized the half-hearted specs of the pixel shader. DirectX 9 is supposed to finally improve this situation, even though the new pixel shader 2.0 spec doesn’t seem to be as much of a leap forward as one would have expected. The most important addition to the pixel shader spec is certainly the inclusion of 64-bit and 128-bit floating point color precision. Besides that, new instructions have been added and the maximum number of pixel shader instructions has been increased to (a still measly looking) 160, but there is still no flow control. NVIDIA’s upcoming ‘NV30’ will go several steps further than that, but more about this in three or four months’ time, when review units of NVIDIA’s new chip will finally hit the reviewers’ desks.
Floating Point Precision Color
As already mentioned, the most important new spec of PS 2.0 is the inclusion of 64 or even 128-bit floating point color precision. Why is that so important?
So far, the highest color precision we could use on a PC was 32-bit. A 32-bit integer number can be anything between 0 and 2^32-1 = 4,294,967,295. If you look at it from that point of view, you could say that 32-bit color supports more than 4 billion different colors. Is that not enough?
First of all, 32-bit color does not really allow 4 billion colors. In fact, only 24 bits are used for the ‘RGB’ (=red/green/blue) color information. The remaining 8 bits usually carry the ‘alpha’ value. 2^24 is only 16,777,216, and this is the number you certainly know as ‘true color,’ 16.7 million different colors. Still, this number looks rather respectable. However, if we look at it more closely, we can see that 32-bit color is even less impressive. The 24 bits that carry the color information consist of 8 bits for red, 8 bits for green and 8 bits for blue. This means that for each elementary color, you’ve got a precision of a measly 8 bits, allowing integer values from 0 to 255. That’s not really a lot, is it? This 8-bit per color channel precision has one serious flaw: it doesn’t allow much dynamic range. Dynamic range means the ratio between the lowest and the highest value. If we forget about the ‘0’ value for a minute, and just start at ‘1,’ then we can see that the biggest dynamic range from this value to the maximum is only 255. This is what game developers are fighting with: they want to have very dark, as well as very bright areas in their 3D worlds, and the range from 1 to 255 for each color channel just isn’t good enough.
Now in the computer world, there are two different ways of handling numbers. The easiest to understand are, of course, integer numbers, which are stored ‘just as they are.’ Floating point numbers work differently. Here the number consists of a sign bit, a few bits for the exponent and quite a lot of bits for the mantissa. The formula would be x = m * 2^e, where ‘x’ is the number, ‘m’ is the mantissa and ‘e’ is the exponent. As you can see, the smallest as well as the largest floating point number is specified by the exponent, while the mantissa defines the precision. In the case of a 128-bit FP number, each color channel has a 32-bit floating point precision (IEEE single precision, remember ‘SSE’?), which consists of 1 sign bit, 8 exponent bits (7 plus sign) and 23 mantissa bits. This allows a dynamic range from 0.00000000000000000000000000000000000000294 (=2^-128) to 170,000,000,000,000,000,000,000,000,000,000,000,000 (=2^127), and with 23 bits, a much higher precision than the 8-bit precision of the 32-bit color values.
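To make the sign/exponent/mantissa split more tangible, here is a small Python sketch that takes a 32-bit IEEE single-precision value apart with the standard `struct` module. The helper name is made up for illustration. Two details go slightly beyond the simplified formula above: IEEE 754 stores the exponent with a bias of 127, and the mantissa field holds only the fraction after an implied leading 1.

```python
import struct


def decompose_float32(value: float) -> tuple[int, int, int]:
    """Split an IEEE 754 single-precision number into its three bit fields.

    Returns (sign, stored_exponent, mantissa_field). The stored exponent
    carries a bias of 127, so 127 means 2^0.
    """
    # Pack as a big-endian 32-bit float, then reinterpret as unsigned int.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits
    mantissa = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, mantissa


print(decompose_float32(1.0))    # 1.0  =  1.0  * 2^0
print(decompose_float32(-2.5))   # -2.5 = -1.25 * 2^1

# Steps available per color channel: 8-bit integer vs. the 23-bit mantissa
# that a float offers within every single power of two.
print(2 ** 8, 2 ** 23)
```

The last line shows the precision gap in raw numbers: 256 integer levels per channel versus more than eight million mantissa steps for each doubling of brightness.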
The increase in dynamics as well as precision enables a quantum leap in terms of image quality. It also opens the door for a lot of new effects that weren’t really possible before.
On the left you see the same scene in 32-bit color and on the right you see the dramatic increase in color and brightness dynamics due to the usage of 128-bit FP color precision.
ATi shows a demo of a low polygon car that is visually enhanced with dot3 bump mapping using a 64-bit precision normal map. This kind of effect was impossible with 32-bit integer normal maps, as you can see in the demo pictures below.
Unfortunately, 128-bit color requires four times the memory bandwidth of 32-bit color. It’s going to take a while until memory technology has caught up with that.
The Pixel Rendering Pipelines
ATi is proud that Radeon 9700 is the first mainstream graphics chip with eight parallel pixel rendering pipelines. This is twice the amount of pipelines found in current high-end graphics chips. At a clock of 325 MHz, the eight pipelines are able to supply a fill rate of 8 * 325 = 2,600 Mpixels/s. Each pixel rendering pipeline has one texture unit, so the multi texturing fill rate is the same as the single texturing fill rate above. It might look as if one texture unit per pipeline is very little, but if you calculate the memory bandwidth requirement of eight parallel pipes with one texture unit doing a trilinear 32-bit color texture lookup, you will understand why two texture units wouldn’t have made an awful lot of sense: 32 bit * 8 (trilinear filtering requires 8 texels to be read) * 8 (eight pipelines) = 2048 bit. 2048 bit would have to be read per clock, but ‘only’ 512 bit per clock are provided by the 256 bit-wide DDR memory interface of Radeon 9700. Bilinear filtering mode would still require 1024 bit per clock. Two texture units per pipe could never be fed by the memory interface. This is why it wouldn’t have made sense to add those units.
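The fill rate and texture bandwidth figures above follow from simple multiplication, sketched here in Python. The constant and function names are made up for illustration. Like the reasoning in the text, this is a worst-case estimate: it assumes every texel has to come from memory and ignores any texture caching.

```python
PIPELINES = 8          # pixel rendering pipelines
CORE_CLOCK_MHZ = 325   # the clock ATi is 'shooting for'
TEXEL_BITS = 32        # 32-bit color texture
BUS_WIDTH_BITS = 256   # DDR memory interface width


def texel_bits_per_clock(texels_per_lookup: int,
                         texture_units_per_pipe: int = 1) -> int:
    """Bits of texture data all pipelines together request in one clock."""
    return (TEXEL_BITS * texels_per_lookup
            * texture_units_per_pipe * PIPELINES)


fill_rate_mpixels = PIPELINES * CORE_CLOCK_MHZ

trilinear_demand = texel_bits_per_clock(texels_per_lookup=8)
bilinear_demand = texel_bits_per_clock(texels_per_lookup=4)

# DDR delivers two transfers per clock across the full bus width.
memory_supply = BUS_WIDTH_BITS * 2

print(f"fill rate: {fill_rate_mpixels} Mpixels/s")
print(f"trilinear demand: {trilinear_demand} bit/clock")
print(f"bilinear demand:  {bilinear_demand} bit/clock")
print(f"memory supply:    {memory_supply} bit/clock")
```

Calling `texel_bits_per_clock(8, texture_units_per_pipe=2)` doubles the trilinear demand to 4096 bit per clock, eight times what the memory interface supplies, which is the point of the argument against a second texture unit.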
The Pixel Shaders
Each pixel rendering pipeline of Radeon 9700 is a separate pixel shader. Following the new PS 2.0 spec, those shaders can run programs of up to 160 instructions. Each pixel shader program can do up to 32 texture sampling operations on up to 16 different texture maps and an additional 64 color operations per pass. The number of clock cycles per pass is, of course, variable and can certainly reach rather high numbers as well, especially when anisotropic filtering is used at the same time. Should the 160-instruction limit turn out to be too small, the result can be fed into the pixel shader for another pass without losing any precision, since the result can be handled in 64 or 128-bit floating point precision.
Hyper-Z III
Since the release of Radeon256 in July 2000, ATi calls its technique to avoid the rendering of hidden surfaces ‘Hyper-Z,’ and Radeon 9700 is equipped with its third generation. Just as the two versions before, it’s meant to save precious memory bandwidth.
Hyper-Z divides the frame buffer and Z-buffer in blocks of 8×8 pixels, which can be cached and handled very efficiently. Fast Z-clear only clears a flag for each block, speeding up the Z-clearing process.
The newly improved lossless Z-compression is able to compress the Z-values in those 8×8 pixel blocks at a compression ratio of between 2:1 and 4:1 before the blocks are written to the Z-buffer, thus saving memory bandwidth.
In the case of 6x FSAA, the Z-compression (as well as color compression) can be up to 24:1, as for FSAA the frame and Z-buffer blocks are utilized as well and the six corresponding sample blocks are compressed as one.
Hierarchical-Z is also utilizing the flag that stands for each 8×8 pixel Z-buffer block. This flag contains the lowest Z-value that is found in the block represented by the flag. The Z-value of a pixel that comes from triangle setup is checked against the Z-flag of the block in which the pixel is supposed to be drawn. If the pixel’s Z-value is lower than the flag’s value, the pixel is discarded and the block is not read from the Z-buffer into the cache. Should the Z-value of the pixel be higher than the flag value, the block is read from the Z-buffer into the cache, undergoing a Z-decompression along the way.
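The early-rejection idea can be illustrated with a toy model in Python. This is a sketch of the scheme as described above, not of ATi's hardware: the class and method names are hypothetical, the Z convention simply follows the wording of the text (a pixel with a Z-value lower than the block flag is hidden), and the per-pixel test after the block has been read is an assumption added so the example is complete.

```python
BLOCK = 8  # Hyper-Z block edge length in pixels, per the article


class HierarchicalZBuffer:
    """Toy Z-buffer with one 'lowest Z' flag per 8x8 pixel block."""

    def __init__(self, width: int, height: int, clear_z: float = 0.0):
        self.z = [[clear_z] * width for _ in range(height)]
        blocks_x = (width + BLOCK - 1) // BLOCK
        blocks_y = (height + BLOCK - 1) // BLOCK
        self.flag = [[clear_z] * blocks_x for _ in range(blocks_y)]
        self.block_reads = 0  # counts simulated Z-buffer block fetches

    def _block_min(self, bx: int, by: int) -> float:
        """Lowest Z-value currently stored in block (bx, by)."""
        rows = self.z[by * BLOCK:(by + 1) * BLOCK]
        return min(min(row[bx * BLOCK:(bx + 1) * BLOCK]) for row in rows)

    def test_and_write(self, x: int, y: int, z: float) -> bool:
        """Return True if the pixel is visible and was written."""
        bx, by = x // BLOCK, y // BLOCK
        # Stage 1: compare against the block flag only. A pixel below the
        # lowest Z in the block is hidden, so the block is never fetched.
        if z < self.flag[by][bx]:
            return False
        # Stage 2 (assumed): fetch the block and test the exact pixel.
        self.block_reads += 1
        if z < self.z[y][x]:
            return False
        self.z[y][x] = z
        self.flag[by][bx] = self._block_min(bx, by)
        return True


hz = HierarchicalZBuffer(16, 16)
for py in range(BLOCK):
    for px in range(BLOCK):
        hz.test_and_write(px, py, 0.5)   # fill the first block with Z = 0.5

reads_before = hz.block_reads
hidden_pixel_drawn = hz.test_and_write(3, 3, 0.2)  # below the block flag
reads_after = hz.block_reads
print(hidden_pixel_drawn, reads_after - reads_before)
```

The hidden pixel is rejected without increasing `block_reads`, which is where the memory bandwidth saving comes from.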
Smoothvision 2.0 – FSAA
ATi calls its implementations of anti-aliasing and anisotropic filtering ‘Smoothvision’ and Radeon 9700 is equipped with the latest generation Smoothvision 2.0. Unlike its predecessor Radeon 8500, Radeon 9700 uses a multi-sampling technique for its anti-aliasing implementation. Its superior quality and performance is due to a few special techniques. First of all, Radeon 9700 is using patterns of sub-pixel samples for its multi-sampling that are different to NVIDIA’s implementation. Below you see ATi’s 6x FSAA sample pattern:
ATi is also sampling the Z-buffer values to ensure a better quality FSAA. You get an idea how this is done in the picture above.
ATi is storing the multiple frame samples in the same way as I described for the Z-buffer (Hyper-Z III). Dividing the frame samples into 8×8 pixel blocks allows a lossless compression of the frame buffer as well as Z-buffer blocks across all the different samples (up to 24:1 compression in case all pixel samples carry the same color and Z-value). This technique is able to save a significant amount of memory bandwidth, which is the bottleneck in FSAA. It ensures that the performance impact of FSAA is significantly lower than what we see in other implementations.
Besides the special sample patterns and the Z-buffer sampling, Smoothvision 2.0 comes with another special feature that ensures superior FSAA on Radeon 9700. A patented gamma-correction algorithm ensures that the color gradients generated by the FSAA procedure are displayed as smoothly as they are supposed to be. Most CRTs or flat panels come with a non-linear gamma gradient. ATi’s algorithm corrects the non-linear behavior and ensures smooth color gradients.
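Why gamma matters when FSAA samples are blended can be shown with a short Python sketch. It is emphatically not ATi's patented algorithm, whose details were not disclosed to us. It only illustrates the general principle, under the assumption that the display follows a simple power law with a gamma of 2.2; all names in it are made up for illustration.

```python
GAMMA = 2.2  # assumed display gamma (plain power law), not an ATi figure


def to_linear(value_8bit: int) -> float:
    """Convert a stored 0..255 value to linear light intensity (0..1)."""
    return (value_8bit / 255.0) ** GAMMA


def to_display(linear: float) -> int:
    """Convert linear light intensity back to a stored 0..255 value."""
    return round((linear ** (1.0 / GAMMA)) * 255)


def resolve_naive(samples: list[int]) -> int:
    """Average the stored values directly, ignoring the display gamma."""
    return round(sum(samples) / len(samples))


def resolve_gamma_aware(samples: list[int]) -> int:
    """Average in linear light, then convert back for the display."""
    mean_linear = sum(to_linear(s) for s in samples) / len(samples)
    return to_display(mean_linear)


# An edge pixel under 4x FSAA: two samples hit black, two hit white.
edge_samples = [0, 0, 255, 255]
print(resolve_naive(edge_samples), resolve_gamma_aware(edge_samples))
```

The naive average lands near mid-gray in stored values, which a gamma 2.2 display shows far darker than half brightness; the gamma-aware result is a noticeably higher stored value that actually appears half as bright as white. That difference is what makes intermediate steps along an edge look evenly spaced.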
We had the chance to see a demo of a spoked wheel on Radeon 9700 beside the same wheel on GeForce4, both running at 4x FSAA. It was highly impressive to see how much better this wheel looked on Radeon 9700.
Smoothvision 2.0 – Anisotropic Filtering
Anisotropic filtering is a special filtering technique that greatly improves the quality of textures on surfaces that are at a steep angle to the viewer (e.g., walls along a corridor that we look down). The reason why neither bilinear nor trilinear texture filtering does a good job here is that the actual pixel covers a larger part of the texture than the four or eight texel samples that are used to define the pixel color in the above-mentioned filtering techniques. Anisotropic filtering takes up to 16 bilinear or trilinear samples along the slope of the surface to define the color of the pixel.
We know that ATi’s implementation of anisotropic filtering for Radeon 8500 is running a lot faster than NVIDIA’s implementation on GeForce4 Ti. NVIDIA complains that ATi is only taking bilinear samples, while NVIDIA uses trilinear texture samples, which costs twice the memory bandwidth. The new anisotropic filtering found in Radeon 9700 has a ‘performance’ setting that uses bilinear samples and a new ‘quality’ setting that uses trilinear samples. This should satisfy NVIDIA as well as every future Radeon 9700 owner.
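The cost difference between the two settings is easy to put into numbers. The Python sketch below counts worst-case texel reads per pixel using the figures from the text (four texels per bilinear sample, eight per trilinear sample, up to 16 samples); the names are hypothetical and texture caching is ignored.

```python
TEXELS_PER_SAMPLE = {"bilinear": 4, "trilinear": 8}


def texels_per_pixel(sample_type: str, samples: int = 16) -> int:
    """Worst-case texel reads for one anisotropically filtered pixel."""
    return TEXELS_PER_SAMPLE[sample_type] * samples


performance_mode = texels_per_pixel("bilinear")   # 'performance' setting
quality_mode = texels_per_pixel("trilinear")      # 'quality' setting

print(performance_mode, quality_mode, quality_mode / performance_mode)
```

The factor of two between the modes is the bandwidth difference NVIDIA points to.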
ATi Video Shader
The new programmability of the pixel shaders allows their usage in the video processing engine of Radeon 9700. Several tasks of the video decoding process can be done by the pixel shaders. ATi calls this new technology Videoshader.
Videoshader allows Radeon 9700 cards to do without a special video chip.
ATi demonstrated how ‘Videoshader’ is able to de-block a streamed low-bandwidth video in real time, or how certain effects can be applied to a video signal, such as blurring, embossing or outlining.
Display Output
Matrox introduced it already with Parhelia and it will be a part of Microsoft’s DirectX 9; by “it” we mean the new 10/10/10 bit color precision output format that should provide us with a more vibrant image experience. Usual 32-bit color is only using 24-bit for the actual color information, while the remaining 8-bit are not used for the output to a CRT or flat panel. Radeon 9700 is able to use 10-bit precision for each color channel, supplying 1024 different levels of red, green and blue rather than the mere 256 different levels known so far. I am sure that analog output devices have a good chance of benefiting from this new format, while I wouldn’t know how digital flat panels are supposed to handle the additional two bits per color channel.
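How three 10-bit channels fit into the same 32 bits as classic 8/8/8/8 color can be sketched in Python. The bit layout below (two leftover bits on top, then red, green and blue) is an assumption chosen for illustration; the text does not say how Radeon 9700 arranges the channels, and the function names are made up.

```python
MAX_10BIT = (1 << 10) - 1  # 1023, the highest 10-bit channel value


def pack_10_10_10(r: int, g: int, b: int, spare: int = 0) -> int:
    """Pack three 10-bit channels plus 2 spare bits into one 32-bit word.

    Assumed layout, highest bits first: spare(2) | red(10) | green(10) | blue(10).
    """
    for channel in (r, g, b):
        if not 0 <= channel <= MAX_10BIT:
            raise ValueError("color channels must be in 0..1023")
    if not 0 <= spare <= 3:
        raise ValueError("spare bits must be in 0..3")
    return (spare << 30) | (r << 20) | (g << 10) | b


def unpack_10_10_10(word: int) -> tuple[int, int, int, int]:
    """Inverse of pack_10_10_10; returns (r, g, b, spare)."""
    return ((word >> 20) & MAX_10BIT,
            (word >> 10) & MAX_10BIT,
            word & MAX_10BIT,
            word >> 30)


word = pack_10_10_10(1023, 0, 512)
print(hex(word), unpack_10_10_10(word))
print(2 ** 8, "levels per channel vs.", 2 ** 10)
```

The pixel still occupies 32 bits, so the finer color steps come at the expense of the alpha bits rather than of memory.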
Radeon 9700 comes with the already well known HYDRAVISION software for two simultaneous displays. It has two integrated 10-bit per channel 400 MHz RAMDACs for CRTs, and one integrated 165 MHz TMDS transmitter for digital flat panels. The integrated TV-out supports NTSC/PAL/SECAM formats with a resolution of up to 1024×768.
RenderMonkey
Last but not least, I will only briefly mention our first glimpse at ATi’s 3D development tool suite, ‘RenderMonkey.’ This software can seamlessly be used as a plugin with any of the current 3D-development suites, generating vertex and pixel shader code. RenderMonkey is very comfortable to use for developers as well as artists, and should make the development of titles that use vertex and pixel shaders a lot easier than it has been thus far. Additionally, ATi includes a compiler for RenderMan and told us that another compiler for Maya is in the works. RenderMonkey will be available from ATi’s website once it has officially been beta-released at SIGGRAPH next week, and it will be free of charge. What we heard is that it is currently regarded as much more useful than NVIDIA’s pseudo-standard Cg.
Callan McInally, one of the fathers of RenderMonkey, presents his baby.
Performance Evaluation
Unfortunately, Radeon 9700 is still a few weeks away from its final form, which is why ATi was not supplying test samples to the press. However, to back up the claims in the white papers, ATi allowed us to run tests on a Pentium 4 2.5 GHz system equipped with a Radeon 9700 prototype alongside an identical system equipped with NVIDIA’s GeForce4 Ti4600. ATi asked us to refrain from publishing the actual numbers we saw, but to just report our impressions. The three hours of hands on testing did not disappoint us. Here is the performance report of Lars Weinand, our 3D specialist:
There’s no denying the fact that the specifications of Radeon 9700 promise a very fast graphics card. While full blown test samples are unfortunately still a few weeks away, we were given the chance to test a prototype and compare its performance to a GeForce4 Ti4600 in an identical system.
Tests showed that Radeon 9700 is clearly superior to NVIDIA’s GeForce4 Ti4600, especially once high resolutions, FSAA or anisotropic filtering are used. However, what would you expect from a card with twice the memory bandwidth of its competitor? ATi promises twice the performance of GeForce4 Ti4600 in any game, but that should be considered pure marketing hype. Naturally, it takes high resolutions and full scene anti aliasing to beat NVIDIA’s flagship to the punch. CPU-limited games are of course running just as fast on either of the two cards.
Thanks to the superior performance of Radeon 9700 under FSAA and anisotropic filtering, there is hardly any reason left to run games without these features, as the performance penalty is remarkably small. ATi’s new high-end card beats GeForce4 Ti4600 even in terms of quality, at least when it comes to anti-aliasing. The spoked wheel demo made that very clear. ATi’s new gamma correction feature makes a considerable difference. However, Radeon 9700 doesn’t need those quality-enhancing features to beat the competitor from NVIDIA. The raw power of R300 makes sure that it scores better even at normal settings.
It is not yet clear if the omission of a second texture unit per pixel rendering pipeline was indeed a smart choice. Under multi texturing conditions, GeForce4 Ti4600 is theoretically able to supply just the same amount of pixels per clock as Radeon 9700. The fill rate test of 3DMark2001 SE hinted in the same direction. The score of Radeon 9700 is very close to the result of GeForce4 Ti4600.
Here are a few numbers for those of you who can’t wait. The 3DMark2001 SE score of GeForce4 Ti4600 was 11,400, while Radeon 9700, with its young driver, scored 14,000. Once 4x FSAA and 8x anisotropic filtering were used, the scores changed quite significantly. GeForce4 Ti4600 was able to get a mere 4,500 points, while Radeon 9700 scored 10,000 points, more than twice the points of the competitor from NVIDIA.
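For those who like to see the ratios spelled out, here is the arithmetic behind those four scores as a short Python sketch. It uses nothing but the numbers quoted above; the variable names are made up for illustration.

```python
# 3DMark2001 SE scores quoted above: (normal settings, 4x FSAA + 8x aniso)
scores = {
    "GeForce4 Ti4600": (11_400, 4_500),
    "Radeon 9700": (14_000, 10_000),
}

geforce_normal, geforce_quality = scores["GeForce4 Ti4600"]
radeon_normal, radeon_quality = scores["Radeon 9700"]

# How far ahead is Radeon 9700 in each mode?
lead_normal = radeon_normal / geforce_normal
lead_quality = radeon_quality / geforce_quality

# How much of its normal score does each card keep with FSAA + aniso?
geforce_retained = geforce_quality / geforce_normal
radeon_retained = radeon_quality / radeon_normal

print(f"lead at normal settings: {lead_normal:.2f}x")
print(f"lead with FSAA + aniso:  {lead_quality:.2f}x")
print(f"score retained: GeForce4 {geforce_retained:.0%}, "
      f"Radeon {radeon_retained:.0%}")
```

The lead grows from roughly a quarter at normal settings to more than double with the quality features enabled, because Radeon 9700 keeps about 71% of its score while GeForce4 Ti4600 keeps only about 39%.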
Future games with pixel and vertex shader effects will naturally benefit most from ATi’s new flagship card. Supposedly, ATi is expecting 100 games for Christmas 2002 that use DirectX 8 effects. At this time, DirectX 9 will finally be out, and Radeon 9700 will have much tougher competition than it does today. We don’t expect DirectX 9 to make much of a difference this year. It is known well enough how long it takes until new features get adopted by game developers.
There’s no denying that right now, Radeon 9700 is way ahead of the competition. It plays in a different performance league than the rest of the graphics cards that are available today. Different to Parhelia, it doesn’t take major exploration to find the performance of Radeon 9700. However, in three months NVIDIA is certain to get its revenge. We will see how the upcoming ‘NV30’ will fare against Radeon 9700. Until then, ATi has all the reason in the world to enjoy its new leadership role.
We will follow up as soon as ATi supplies us with final review samples.
Please follow up by reading The new mainstream Radeon: The 9000 series.