The Technology Behind the Radeon 9000
Generally speaking, the new Radeon 9000 chip is based on the Radeon 8500 design and has a vertex shader unit (v1.0) as well as a pixel shader unit (v1.4). This puts ATI ahead of its San Jose-based competitor in the lower price segment ($100-$150), because it brings the first real DirectX 8 card to the market. NVIDIA’s mainstream product, the GeForce4 MX, is still a pure DirectX 7 design stemming from the somewhat outdated GeForce2. Vertex shader programs can only be run on the GeForce4 MX with the help of optimized software emulation in the NVIDIA driver. Even SiS, the newcomer to the 3D arena, offers pixel shader support with its new Xabre card!
Feature | NVIDIA GF4 Ti 4200 | NVIDIA GF4 MX 460 | ATI Radeon 8500 | ATI Radeon 9000 Pro | ATI Radeon 9000 |
Chip Technology | 256-bit | 256-bit | 256-bit | 256-bit | 256-bit |
Process | 0.15 Micron | 0.15 Micron | 0.15 Micron | 0.15 Micron | 0.15 Micron |
Transistors | 63 million | – | 60 million | – | – |
Memory Bus | 128-bit DDR | 128-bit DDR | 128-bit DDR | 128-bit DDR | 128-bit DDR |
Memory Bandwidth | 8.2 GB/s | 8.8 GB/s | 8.8 GB/s | 8.8 GB/s | 6.4 GB/s |
AGP Bus | 1x/2x/4x | 1x/2x/4x | 1x/2x/4x | 1x/2x/4x/8x | 1x/2x/4x/8x |
Memory | 64/128MB | 128MB | 64/128MB | 64/128MB | 64MB |
GPU Clock | 250 MHz | 300 MHz | 275 MHz | 275 MHz | 250 MHz |
Memory Clock | (514) MHz 64MB / (444) MHz 128MB | 275 (550) MHz | 275 (550) MHz | 275 (550) MHz | 200 (400) MHz |
Memory Type | SD/BGA 3.3-4 ns | BGA 3.3 ns | SD/BGA 3.3 ns | SD | SD |
Vertex Shader | 2 | – | 2 | 2 | 2 |
Pixel Pipelines | 4 | 2 | 4 | 4 | 4 |
Texture Units Per Pipe | 2 | 2 | 2 | 1 | 1 |
Textures per Texture Unit | 4 | 4 | 3 | 6 | 6 |
Vertex S. Version | 1.1 | – | 1.1 | 1.1 | 1.1 |
Pixel S. Version | 1.3 | – | 1.4 | 1.4 | 1.4 |
DirectX Generation | 8.0 | 7.1 | 8.1 | 8.1 | 8.1 |
FSAA Modes | MultiSampling | MultiSampling | SuperSampling | SuperSampling | SuperSampling |
Memory Optimizations | LMA II | LMA II | Hyper Z II | Hyper Z II | Hyper Z II |
Display Outputs | 2 | 2 | 2 | 2 | 2 |
Chip Internal RAMDACs | 2 x 400 MHz | 2 x 400 MHz | 2 x 400 MHz | 2 x 400 MHz | 2 x 400 MHz |
Chip External RAMDACs | – | – | – | – | – |
Bits per Color Channel | 8 | 8 | 8 | 8 | 8 |
Special | – | TV Encoder On-Chip | – | TV Encoder On-Chip; FullStream | TV Encoder On-Chip; FullStream |
Estimated Price | ~ $179 | $149 | $179 | $149 | $109 |
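The memory bandwidth figures in the table follow directly from the bus width and the effective (DDR) memory clock. The following is a minimal sketch of that arithmetic, not anything taken from ATI or NVIDIA documentation; the function name and the dictionary layout are made up for illustration, and 1 GB is assumed to mean 10^9 bytes.

```python
# Peak memory bandwidth = (bus width in bytes) x (effective DDR data rate).
# Bus widths and effective clocks come from the table above; the helper
# name and data layout are illustrative assumptions.

def peak_bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Return the theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9

# (bus width in bits, effective memory clock in MHz)
cards = {
    "GF4 Ti 4200 (64MB)": (128, 514),
    "GF4 MX 460": (128, 550),
    "Radeon 8500": (128, 550),
    "Radeon 9000 Pro": (128, 550),
    "Radeon 9000": (128, 400),
}

bandwidths = {
    name: round(peak_bandwidth_gb_s(width, clock), 1)
    for name, (width, clock) in cards.items()
}

for name, bandwidth in bandwidths.items():
    print(f"{name}: {bandwidth} GB/s")
```

These are theoretical peaks only; real-world throughput also depends on memory optimizations such as LMA II or Hyper Z II.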
If you thought that the Radeon 9000 (alias RV250) would merely be a simplified version of the Radeon 8500 (R200), then you’re quite mistaken. As with its predecessor, the Radeon 7500, ATI has once again integrated the functions of the Rage Theater chip and a second RAMDAC into the chip itself. Moreover, significant modifications were made to the 3D core. Aside from performance aspects, these changes are supposed to allow for cost reductions. Fewer transistors enable a higher yield, which lowers production costs and, consequently, the price. After all, in the OEM and mainstream markets, every penny counts.
The most significant changes have to do with the pixel pipeline. Just like the R8500, the R9000 has four pixel pipelines. Instead of two texels per cycle, however, each pipeline can only process one texel per cycle, because it has one texture unit rather than two. In exchange, the number of textures that each texture unit can contribute to the final pixel increases from three to six. What these confusing numbers mean in practice is that the R9000 is considerably slower than the R8500 in games with multitexturing.
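To put numbers on the pipeline change, here is a rough sketch of the theoretical texel fill rate, using the pipeline counts, texture units and core clocks from the specification table. The function names are hypothetical, and the model is a deliberate simplification: it ignores memory bandwidth, driver overhead and any other real-world limits.

```python
import math

def texel_fill_rate_mtexels(pipelines: int, tmus_per_pipe: int, core_mhz: int) -> int:
    """Theoretical peak texel fill rate in megatexels per second."""
    return pipelines * tmus_per_pipe * core_mhz

def cycles_per_pixel(texture_layers: int, tmus_per_pipe: int) -> int:
    """Clock cycles one pipeline needs to apply all texture layers to a pixel."""
    return math.ceil(texture_layers / tmus_per_pipe)

# Values from the specification table: 4 pipelines, 275 MHz core clock.
r8500 = texel_fill_rate_mtexels(pipelines=4, tmus_per_pipe=2, core_mhz=275)
r9000_pro = texel_fill_rate_mtexels(pipelines=4, tmus_per_pipe=1, core_mhz=275)

print(f"Radeon 8500:     {r8500} MTexels/s")
print(f"Radeon 9000 Pro: {r9000_pro} MTexels/s")
print(f"Ratio:           {r9000_pro / r8500:.2f}")

# A dual-textured pixel (for example, base texture plus lightmap):
print("Cycles per dual-textured pixel, R8500:    ", cycles_per_pixel(2, 2))
print("Cycles per dual-textured pixel, R9000 Pro:", cycles_per_pixel(2, 1))
```

Under these assumptions, the Radeon 9000 Pro ends up with half the multitexturing fill rate of the Radeon 8500 at the same clock speed, while single-textured pixels still take one cycle on both chips.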
The second change has to do with the vertex shader. Only now has ATI admitted that the Radeon 8500 contains two vertex shader units, similar to the GeForce4 Ti. The reason for keeping quiet about this had more to do with marketing concerns than with modesty. With its two units, the R8500 had more vertex shader units than NVIDIA’s GeForce3. With the R9000, these units were reworked and optimized. If the Canadian PR department is to be believed, then the new vertex shaders contain many optimizations that are also found in the R300 design.
The third and final big change affects the video capabilities. Graphics cards from ATI have the reputation of providing high-quality DVD and video playback, and with good reason. As with the R300, ATI does away completely with the dedicated circuitry responsible for this and instead uses a new technology called “Videoshader”. In the form of special pixel shader programs, this allows ATI to combine the speed of optimized hardware calculations with the flexibility of a software solution. During video playback, the chip stays in 2D mode, where the pixel shaders would otherwise sit idle, so they are available for this purpose. In addition, there are completely new real-time filtering capabilities that, for example, prevent pixelation in low-resolution videos or allow special optimizations for specific video codecs such as DivX. Optimizations for new video formats can be implemented with simple driver updates. This step lets ATI save on transistors, which, in turn, has a positive effect on the price. However, it remains to be seen how this solution works in practice.
Architecture
The Radeon 9000, like the Radeon 7500 and 9700, is a highly integrated chip that has two 400 MHz RAMDACs as well as the complete circuitry necessary for TV-out. The Radeon 8500 still required an additional Rage Theater chip; integrating these functions reduces production costs and enables a simpler board design. ATI equips the reference card with two display outputs: a normal VGA-out and an additional DVI-I port, to which you can connect a standard CRT monitor using an adapter (included).
According to ATI, an active cooler is not needed for either the fast 9000 PRO version or the standard version, which follows the trend toward the “low-noise PC”. However, the real reason for this, once again, is cost. A simple heat sink is still much cheaper than a fan. Various ATI board partners, such as Hercules, PowerColor and Gigabyte, will certainly provide a cooling fan with their boards when they arrive on the market, at least with the PRO versions. Retail cards that come from ATI itself will only be available on the North American market.
DualView / Hydravision
The dual-display function from the Radeon 8500 has not changed. This allows for various possible combinations with a CRT monitor or TV-out. The maximum possible resolution of the TV-out is 1024×768. Hydravision is responsible for managing the desktop when in dual-display mode.
Test Setup
Hardware | |
Processor | Intel Pentium 4 2.2 GHz (100 MHz) |
Memory | 2 x 256 MB, PC 266, CL2 |
Mainboard | MSI 845G Max |
Drivers & Software | |
Graphics Driver | NVIDIA – v29.42 ATI – v6.13.10.6094 Matrox – v1.00.25 |
DirectX Version | 8.1 |
OS | Windows XP Professional |
Benchmarks & Settings | |
Quake III Arena | Retail Version 1.17 Benchmark using ‘Q3DEMO1’ |
Star Wars Jedi Knight II | Retail Version v1.02 Benchmark using ‘jk2ffa’ |
Max Payne | Retail Version v1.05 Benchmark using ‘Shooting Alex’ |
Giants | Retail Version v1.04 |
Aquanox | Retail Version v1.17 |
Dungeon Siege | Retail Version v1.00 |
Comanche 4 | Benchmark Demo V 1.0.0.1.18 |
3D Mark 2001 SE | Pro Version, Build 330 |
It was pretty clear that the driver for the Radeon 9000 PRO test sample was still in the beta stage, which was rather surprising, considering the imminent product launch. At the time of the test, the driver could only be used with the Radeon 9000. For testing the Radeon 7500 and Radeon 8500, we used the latest official Catalyst driver. NVIDIA’s test cards were tested with the reference driver v29.42 and the Xabre was tested with v3.00.53.
Max Payne
In Max Payne, the Radeon 9000 PRO is slightly ahead of its direct competitor, the GeForce4 MX460, which is barely able to keep the SiS Xabre 400 at a distance. The expensive Radeon 8500 and GeForce4 Ti4200 cards are significantly faster. The Radeon 7500 brings up the rear in this game.
Giants
In Giants, the R9000 PRO plays its trump card, and, at up to 1280×1024, it even beats the GeForce4 Ti4200. It’s not clear where this performance boost comes from, because the Radeon 8500 is considerably further behind. It’s only at the highest resolution that the 9000 breaks down. The setback for the older cards might be caused by the somewhat outdated driver that we had to resort to when testing the R7500 and R8500, for the reasons mentioned at the beginning of this article. In this test, NVIDIA’s GeForce4 MX460 just barely manages to outperform the R7500.
Star Wars Jedi Knight II
At maximum quality settings, Jedi Knight II runs very slowly on 64 MB graphics cards, so it’s hardly possible to make a definitive statement, except that the Xabre 400 clearly seems to be the fastest here. Even caching the map data before making the benchmark run at the various resolutions does not help to solve the problem.
Aquanox
In Aquanox, the R9000 performs very similarly to the R8500. The MX 460 and 440 lag clearly behind, but NVIDIA’s “big” GF4 Ti 4200 proves in this test that it plays in a different class, and not only in terms of price. Also notable is how the SiS Xabre 400 puts a respectable distance between itself and the MX cards.
Quake 3
Quake 3 is an excellent game for reflecting the theoretical fill rate of a card. At higher resolutions, the game puts a considerably smaller load on the CPU than other games, but on the other hand, it uses a lot of multitexturing. The Radeon 9000 PRO loses a bit of ground because its pixel pipelines have been pared down. Even the GeForce4 MX460 is faster in this test, which is surprising when you consider its modest multitexture fill rate.
3D Mark 2001 SE (330)
In 3D Mark 2001, the R9000 also appears to suffer from poor multitexturing performance (relative to the R8500) as a result of its stripped-down pixel engine. Yet the R9000 PRO has no problems keeping its mainstream competitors from NVIDIA and SiS in check.
Comanche 4
In this game, we again see the phenomenon that the R8500 and 7500 run much slower than the R9000 PRO. The only thing that can solve this problem is a final version of the R9000 driver that can also be used with the other cards.
Anisotropic Filtering
Anisotropic filtering is responsible for sharper rendering of textures, particularly on surfaces that recede into the distance. Recently, NVIDIA and ATI have declared FSAA to be their favorite marketing feature. Thanks to internal driver optimization (not all textures are completely filtered), ATI’s anisotropic filtering is very fast. The two manufacturers differ quite widely on how exactly anisotropic filtering is supposed to look. It’s such a complex topic that it should be discussed in its own separate article. Here, we will limit ourselves to the pure numbers.
NVIDIA and ATI give different names to their filtering levels. This table shows the filtering levels that just about match each other in terms of the resulting quality. With NVIDIA, LV2 corresponds to 4tap anisotropic filtering.
FSAA
With R8500 and R9000, ATI’s solution to anti-aliasing is called SmoothVision, which is still at version 1.0. However, what’s behind this fancy-sounding name is a slow, memory-bandwidth-gobbling and rather outmoded SuperSampling. Although it enables rather good image quality, it is still too slow. Even ATI’s Performance Mode (indicated in the table with “P”), which sacrifices image quality, can do nothing to remedy this. Both the GeForce4 Ti and MX versions use a significantly faster multisampling technology.
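The cost difference between the two approaches can be illustrated with a simplified count of fully shaded samples per frame. This sketch assumes the textbook model: supersampling textures and shades every sample, while multisampling shades each pixel once and only stores additional coverage and depth samples. The function name is hypothetical, and the model ignores the extra memory traffic that both methods cause.

```python
def shaded_samples_per_frame(width: int, height: int, samples: int,
                             supersampling: bool) -> int:
    """Count fully shaded (textured) samples for one frame.

    Simplified assumption: supersampling shades every sample, while
    multisampling shades once per pixel regardless of the sample count.
    """
    pixels = width * height
    return pixels * samples if supersampling else pixels

ssaa_4x = shaded_samples_per_frame(1024, 768, samples=4, supersampling=True)
msaa_4x = shaded_samples_per_frame(1024, 768, samples=4, supersampling=False)

print(f"4x supersampling: {ssaa_4x:,} shaded samples")
print(f"4x multisampling: {msaa_4x:,} shaded samples")
print(f"Shading workload factor: {ssaa_4x // msaa_4x}x")
```

In this model, 4x supersampling at 1024×768 costs as much fill rate as rendering four frames without anti-aliasing, which is why it hits a card with limited fill rate and memory bandwidth so hard.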
The R9000 PRO falls clearly behind its older sibling, the R8500. Because Performance Mode (P) comes at a cost to image quality, a direct comparison can only be made to NVIDIA’s cards in Quality Mode (Q). Here, the ATI cards clearly admit defeat to the competition from NVIDIA.
3D Mark 2001 Detail
In addition to the popular overall score, the 3D Mark 2001 benchmark also gives very detailed results in individual disciplines. This is very good for determining the strengths and weaknesses of the individual cards. However, Mad Onion’s benchmark has a few downsides, which make the results of some individual tests seem a bit questionable, as is the case for the fill rate test.
Fill Rate: Single-Texturing
This benchmark tests for speed when in single-texturing mode. Interestingly enough, the R9000 lags behind its older sibling, the R8500, despite the fact that it runs at the same clock speed. This might be caused by the use of additional alpha blending operations in the single-texturing test.
Fill Rate: Multitexturing
Here, the stripped-down pixel pipelines of the R9000 make themselves especially evident. The R9000 reaches almost exactly half of the performance of the R8500, and so it even places behind its predecessor, the R7500. As opposed to the single-texturing test, Mad Onion does not use additional alpha blending operations. However, the GeForce4 MX models, with their castrated pixel pipes, perform even worse than that. Noteworthy is the Xabre 400, which performs very well in this test.
High Polygon Count – 1 Light
Even here, the R9000 PRO loses heavily compared to the R8500, ending up just ahead of last place, which is held by the R7500.
High Polygon Count – 8 Lights
In this test, the results are about the same, except that the Xabre 400 is able to overtake both of the MX models.
Game 4 Nature Test
This is a stress test for the cards because it involves a high polygon count and pixel shader effects. Here, the Radeon cards show their strengths and are even able to put a GeForce4 Ti4200 in its place. The Xabre 400 falls behind all too clearly. The MX 460 and 440 aren’t included in this test at all, since they lack pixel shaders.
Pixel Shader Speed
In this test, the R9000 is able to get ahead of the competition. The Xabre 400 reveals that its pixel shader unit is nothing more than empty marketing jargon, at least in the 3D Mark 2001 Pixel Shader tests.
Adv. Pixel Shader Speed
The results of this test can only be compared indirectly, since it runs with pixel shader version 1.4 on ATI cards, but with version 1.1 on the others. In this test, the ATI cards have the advantage, because the per-pixel calculation of the water can be processed in a single pass, thanks to support for the pixel shader 1.4 standard. Cards with pixel shader 1.3 require two passes. Despite this theoretical advantage, ATI boards are only a nose ahead of the Ti 4200. The Xabre lags hopelessly behind.
Vertex Shader Speed
There’s hardly any evidence of the R9000’s optimized vertex shaders that ATI talks about. The performance is clearly poorer than that of the R8500. Again, the reason for this might be found in the test mode of 3D Mark 2001.
Conclusion: Radeon 9000 PRO
Without a doubt, ATI will introduce a new standard to the mainstream market with its R9000 series. Thanks to Microsoft’s Xbox, some new games can be expected this Christmas that make use of or support pixel shaders (v1.3) and vertex shaders (v1.0). ATI is the first to offer a reasonably priced graphics card that is able to render these effects with respectable performance. By contrast, the GF4 MX series is still based on GeForce2 technology with DirectX 7.1, which has nothing to offer in the way of pixel shaders and vertex shaders.
Despite the good performance of the R9000 relative to its price, there are still many questions that need to be answered. Some tests give you the impression that ATI has made more changes to the 3D core than it is willing to admit. Above all, very little can be seen of the supposedly optimized vertex shader. In practice, the R9000 PRO also suffers from its castrated pixel pipelines. The faster bus speed is hardly noticeable in the latest games.
With the Radeon 9000 series, ATI offers very solid 3D performance at a reasonable price, and it also takes the technological lead in the mainstream segment ($100 – $150). It remains to be seen whether NVIDIA will soon be able to hold its own with NV18, the successor to its GF4 MX (NV17), which stands ready at the starting line. The solution from SiS, whose pixel shader support is supposed to allow it to keep up with the competition, at least on paper, disqualifies itself in practice through the poor performance of its pixel shader unit.