Introduction
ATi has just released details on its upcoming Rage6-based board, the Radeon256, and this has to be one of the most exciting times for consumers, as desktop graphics is one of the hottest areas in the industry. With NVIDIA taking the lead in the 3D performance arena and 3dfx lagging behind, other competitors have decided to get into gear and put their best efforts into becoming the leaders in leading-edge 3D technology. One such company is ATi, currently the largest supplier of 3D graphics in the world. The Radeon256 should make its way onto shelves sometime in May and will be competing with the likes of NVIDIA’s GeForce and GeForce2GTS and 3dfx’s VSA-100 series (Voodoo 4 and 5). Only the specifications and technical details have been released so far, not hardware, so performance comparisons will have to wait until the board ships next month. In this article we’ll take a peek at what ATi has up its sleeve as well as the upcoming competing products from 3dfx, Bitboys, Matrox, NVIDIA, and S3.
Lessons Learned
Aside from the recent release of the Rage Fury MAXX, ATi hasn’t really been in touch with the high-end 3D consumer/gamer. Its cards were mainly for the average Joe and didn’t appeal to the 3D fanatic who needs the latest and greatest. Those needs were mainly filled by 3dfx, Matrox and NVIDIA, and although ATi continued to do extremely well where it really counts (making money by selling tons of cards), it didn’t touch the competition in that part of the market. Well, things have changed, and ATi is going full steam ahead in its attempt to produce a leading-edge 3D graphics card for the desktop. This was apparent with its first big thrust into this market, the Rage Fury MAXX. Seemingly from nowhere, ATi produced a card that nearly reached the performance of the competition. Although the card didn’t have on-board T&L, it managed extremely good fill-rate and memory performance. Unfortunately, it came up short of the competition’s performance, and the price was in most people’s opinion too high, so it didn’t exactly take over the market. The thing to note here is that ATi went from supplying the world with low-priced, average-performing video cards to producing a high-performance graphics card, and did pretty well with its first attempt. This latest technology shows that ATi is paying attention to what the market is asking for and what developers are looking to take advantage of right now and in the near future.
ATi’s Next Generation Technology Feature Set
So what’s all this fuss about? Features, features, features! ATi looked at the current trend and did a little research with developers to come up with a rather complete list of features. Let’s take a look.
Important Features
- Hardware T&L (features Vertex Skinning that blends up to four matrices and Keyframe Interpolation, pretty much equal to NVIDIA’s ‘vertex blending’)
- Large memory support (launching with 32MB but capable of 128MB)
- High speed memory (launching with 333MHz DDR but 366-400MHz DDR a possibility later on)
- Dual rendering pipelines capable of Tri-Texturing in a single pass (launching between 150-200MHz core speed)
- Full featured Pixel Shader support (DX8)
- Hyper Z (lossless compression of Z-buffer data)
- FSAA/Motion Blur/Depth of Field (hardware supported), all the features that require very high fill rates
- Texture Compression (supports all DX modes)
- Emboss, Dot Product 3 and Environmental Bump Mapping
- Texture Transforming Capabilities
- HDTV support (18 formats)
- Adaptive De-Interlacing (enhanced video playback)
The Charisma Engine
The first feature is the Charisma Engine, ATi’s fancy name for hardware T&L that is capable of 30M triangles/sec and up to 8 lights, local or infinite. ATi calls it TCL, for Transform, Clipping and Lighting, but it is basically the T&L we know from NVIDIA with a few enhancements.
Vertex Skinning with more than two matrices allows developers to give models or objects joints that appear more realistic. As you can see in the example below, the 2-matrix joint has visual flaws that make the joint look odd, while the 3-matrix joint looks well rounded. We saw all of this last year when NVIDIA presented its ‘vertex blending’.
More than two matrices are possible on a card like the GeForce 256, but the blending must then be done in software and may cost a bit of performance (keep in mind I’m referring to the NVIDIA T&L that’s currently available and not a future product). Basically, what this boils down to is that ATi’s solution will be able to create, entirely in hardware, a life-like joint that stays well rounded when the model is moved through different angles.
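To make the idea of blending matrices a little more concrete, here is a minimal sketch of what vertex skinning computes for a single vertex. It is only an illustration under my own assumptions: the function names, the 3x4 matrix layout and the requirement that weights sum to 1 are hypothetical choices, not ATi’s or NVIDIA’s actual hardware implementation.

```python
# Hypothetical illustration of vertex skinning (matrix blending).
# Names and matrix layout are assumptions, not any vendor's real design.

def transform(matrix, point):
    """Apply a 3x4 affine matrix (rows of [a, b, c, t]) to a 3D point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def skin_vertex(point, matrices, weights):
    """Blend the positions produced by each matrix, using one weight per matrix."""
    if len(matrices) != len(weights):
        raise ValueError("one weight per matrix is required")
    if len(matrices) > 4:
        raise ValueError("this sketch assumes at most four matrices per vertex")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    blended = [0.0, 0.0, 0.0]
    for matrix, weight in zip(matrices, weights):
        moved = transform(matrix, point)
        for axis in range(3):
            blended[axis] += weight * moved[axis]
    return tuple(blended)


# Two "bones": one leaves the vertex alone, the other shifts it 2 units along x.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
SHIFT_X = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]

# A vertex sitting on the joint follows both bones equally and lands halfway.
joint_vertex = skin_vertex((1.0, 0.0, 0.0), [IDENTITY, SHIFT_X], [0.5, 0.5])
```

With more matrices per vertex, the weights can be spread over more bones, which is what smooths out a joint. Doing this loop for every vertex on the CPU is the software cost mentioned above.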
Keyframe Interpolation will allow developers to interpolate (or insert) frames between “keyframes”. A good example of this is where you have a face mesh with a plain expression (our first keyframe) and then a face mesh that’s smiling (our last keyframe). The hardware is able to create as many frames as a developer wants in between to make the transition from normal to smiling. In our example below, the model only has two frames created by the feature, but the developer could request more to build an animation that’s even more convincing. This would be an excellent feature for having characters lip sync their dialogue.
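As a rough sketch of what generating in-between frames means, the snippet below blends two keyframe meshes vertex by vertex. I am assuming simple linear interpolation here, since ATi has not said which method the hardware uses, and the function and variable names are made up for the example.

```python
# Hypothetical sketch: linear interpolation between two keyframe meshes.
# The interpolation method is an assumption, not a confirmed hardware detail.

def interpolate_keyframes(start_mesh, end_mesh, in_between):
    """Return `in_between` meshes evenly spaced between two keyframes."""
    if len(start_mesh) != len(end_mesh):
        raise ValueError("both keyframes must have the same vertex count")
    frames = []
    for step in range(1, in_between + 1):
        t = step / (in_between + 1)  # blend factor, strictly between 0 and 1
        frame = [
            tuple(a + t * (b - a) for a, b in zip(v0, v1))
            for v0, v1 in zip(start_mesh, end_mesh)
        ]
        frames.append(frame)
    return frames


# Two tiny "face" meshes: a neutral expression and a smile.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
smiling = [(0.0, 1.0, 0.0), (1.0, 2.0, 0.0)]

# Ask for three generated frames between the two keyframes.
frames = interpolate_keyframes(neutral, smiling, 3)
```

Only the two keyframes need to be stored and sent to the card; the frames in between are computed on demand.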
Pixel Tapestry Architecture
The second big feature is ATi’s Pixel Tapestry Architecture (PTA), a fancy set of texturing abilities that the new hardware will possess. The new chip will have two rendering pipelines, and each pipeline will be able to apply three filtered textures per pixel, because ATi has put three texturing units in each pipeline. That’s awesome for multi-textured and/or heavily filtered games, but keep in mind this will NOT improve unfiltered single-textured gaming environments. Consequently this feature might not improve pixel fill-rate, but it’ll definitely boost the texel rate through the roof. There are still many games today that don’t use multi-texturing; however, almost all games are moving towards it, so don’t think this will be a useless feature. The table below shows what the Radeon256’s dual pipeline will be able to do compared to the quad pipelines of the GeForce 256 and GeForce2GTS. Keep in mind that the pass counts below are per pipeline, while the pipeline count, clock and fill-rate rows describe the whole chip.
Textures Per Pixel | Radeon256’s 3 Texturing Unit Graphics Pipeline | GeForce 256 Graphics Pipeline | GeForce2GTS Graphics Pipeline |
1 Bilinear Filtered | 1 Pass | 1 Pass | 1 Pass |
1 Trilinear Filtered | 1 Pass | 1-2 Pass | 1 Pass |
2 Bilinear | 1 Pass | 2 Pass | 1 Pass |
1 Trilinear + 1 Bilinear | 1 Pass | 2-3 Pass | 1-2 Pass |
3 Bilinear | 1 Pass | 3 Pass | 2 Pass |
Number of Pipelines | 2 | 4 | 4 |
Core Clock | 200 MHz | 120 MHz | 200 MHz |
Texel Fill Rate | 1200 Mtexel/s | 480 Mtexel/s | 1600 Mtexel/s |
Pixel Fill Rate | 400 Mpixel/s | 480 Mpixel/s | 800 Mpixel/s |
An important thing to note about the above information is that, leaving aside the superior performance of NVIDIA’s upcoming GeForce2GTS chip, the Radeon256 and the GeForce 256 each have an advantage in particular circumstances. In a single-textured situation with tons of texture filtering, the GeForce 256 would theoretically provide the better performance, while in a heavily textured scene (many textures per pixel) the Radeon256 would take control. Keep in mind that there currently aren’t any other graphics solutions that can do three texels in one cycle in a single pipeline. On the other hand, NVIDIA’s latest chips sport four pipelines vs. the Radeon256’s two.
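The fill-rate rows in the table above follow from simple multiplication, which the short sketch below reproduces. The pipeline counts and clocks come from the table; the texturing units per pipeline for the two GeForce chips (one and two) are my inference from the table’s texel-to-pixel ratios, and the function name is just for illustration.

```python
# Theoretical peak fill rates from pipeline count, core clock and texture units.
# Texture units for the GeForce chips are inferred from the table, not quoted.

def fill_rates(pipelines, core_mhz, texture_units_per_pipeline):
    """Return (pixel rate in Mpixel/s, texel rate in Mtexel/s)."""
    pixel_rate = pipelines * core_mhz
    texel_rate = pixel_rate * texture_units_per_pipeline
    return pixel_rate, texel_rate


chips = {
    "Radeon256": (2, 200, 3),
    "GeForce 256": (4, 120, 1),
    "GeForce2GTS": (4, 200, 2),
}

rates = {name: fill_rates(*spec) for name, spec in chips.items()}
for name, (pixel_rate, texel_rate) in rates.items():
    print(f"{name}: {pixel_rate} Mpixel/s, {texel_rate} Mtexel/s")
```

These are peak numbers only; they say nothing about how often a real game keeps every texturing unit busy.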
3D Textures
Another nifty little feature the PTA provides is 3D textures. Have you ever broken apart a tree (or any object, for that matter) with a weapon and noticed that the innards look very unrealistic because they were just a single-colored texture? With 3D textures you can give the tree “rings” inside, so that no matter how you cut the tree, the rings will be drawn accurately, as if they were really there inside the tree, giving a very realistic appearance. 3D textures aren’t limited to this, as developers may also create complex/dynamic light maps, volumetric fog, smoke, and liquid/fluid effects using this feature.
Bump Mapping is something we’ve known about for some time now thanks to Matrox, which has offered bump mapping in its G400 product line. You can check out the G400 MAX review to see in greater detail what bump mapping looks like. Thanks to the triple texturing units of this upcoming hardware, ATi will be able to provide developers with the ability to do various effects like Embossing, Environment Mapped Bump Mapping or Spherical/Dual-Paraboloid/Cubic Environment Mapping at virtually no performance hit.
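The tree-ring idea can be sketched with a tiny function that returns a color for any point inside the trunk. This is a hypothetical stand-in: real 3D textures are stored volumes of texels sampled by the hardware, whereas this example computes the color procedurally, and the function name, ring spacing and trunk orientation are all my own assumptions.

```python
import math

# Hypothetical procedural stand-in for a 3D "wood" texture lookup.
# The trunk axis is assumed to run along z; ring_spacing is an arbitrary value.

def wood_ring_color(x, y, z, ring_spacing=0.1):
    """Return the ring shade at a 3D point inside the trunk."""
    radius = math.hypot(x, y)  # distance from the trunk axis; z does not matter
    ring_index = int(radius / ring_spacing)
    return "dark" if ring_index % 2 else "light"


# The same lookup works wherever the cut exposes the wood.
core = wood_ring_color(0.05, 0.0, 0.0)
first_ring_low_cut = wood_ring_color(0.15, 0.0, 0.0)
first_ring_high_cut = wood_ring_color(0.0, 0.15, 5.0)
```

Because the color depends on the full 3D position rather than on a flat image stretched over a surface, any cut through the object shows a consistent interior.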
Priority Buffers
One of the most interesting innovations derived from PTA is the Priority Buffer. A Priority Buffer is like a Z-buffer, but instead of storing each object’s distance from the viewpoint as a Z-buffer does, it stores the order of the objects. Priority Buffers have been available in software, and ATi has now brought them into its hardware.
Shadow Mapping
Using Priority Buffers, ATi is pushing the use of Shadow Mapping, which renders the scene using a light source as the viewpoint. The scene is rendered to a Priority Buffer so that the closest shadow-casting object has the highest priority, the next closest has the second-highest priority and so on. This method of adding shadows to a game is easier, faster and removes issues that Volumetric Shadows have (i.e. objects unable to cast a shadow on themselves).
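Here is a minimal sketch of how a priority buffer could drive shadow mapping, based only on the description above. The data layout (each object listed with its distance from the light and the light-space pixels it covers) and all names are hypothetical simplifications of mine, not ATi’s implementation.

```python
# Hypothetical sketch of shadow mapping with a priority buffer.
# Objects are ranked by distance from the light; rank 0 is the highest priority.

def build_priority_buffer(objects, width, height):
    """objects: list of (name, distance_from_light, set of (x, y) pixels)."""
    ordered = sorted(objects, key=lambda obj: obj[1])
    priorities = {name: rank for rank, (name, _, _) in enumerate(ordered)}
    buffer = [[None] * width for _ in range(height)]
    # Draw the farthest object first so nearer objects overwrite it.
    for name, _, pixels in reversed(ordered):
        for x, y in pixels:
            buffer[y][x] = priorities[name]
    return buffer, priorities


def in_shadow(name, pixel, buffer, priorities):
    """A point is shadowed if a higher-priority object covers its pixel."""
    x, y = pixel
    occluder = buffer[y][x]
    return occluder is not None and occluder < priorities[name]


# Seen from the light: a floor covering a 3x3 view and a crate above its center.
floor_pixels = {(x, y) for x in range(3) for y in range(3)}
scene = [("floor", 5.0, floor_pixels), ("crate", 1.0, {(1, 1)})]
buffer, priorities = build_priority_buffer(scene, 3, 3)

under_crate = in_shadow("floor", (1, 1), buffer, priorities)
open_floor = in_shadow("floor", (0, 0), buffer, priorities)
```

Note that only the ordering of objects is stored, not their actual depth values, which is the difference from a Z-buffer described above.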
FSAA/Motion Blur/Depth of Field
Some features that I was surprised to find out about are Full Screen Anti-Aliasing (FSAA), Motion Blur and Depth of Field. The AA is full screen and is supported in hardware through a special memory space in the frame buffer (similar to 3dfx’s T-Buffer). The framerate with FSAA is said to be excellent up to 800x600x32, and at 1024x768x32 in some games. I’ll wait to judge that for myself before I buy it. Motion Blur and Depth of Field are both supported, but no details were given about the performance impact when these features are used. All three features require very high fill rates, because each involves rendering either one very large frame or several different frames to create the frame that will finally be displayed. 3dfx used to create some mysticism around these three features, which are supposed to make up for the lack of T&L support in the upcoming VSA-100, but high fill rate is really what it takes to offer them. Let’s see if the 400 Mpixel/s of the Radeon256 will be enough. I have my doubts about that.
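To get a feel for why these features eat fill rate, here is a back-of-the-envelope sketch. Everything about the workload is my own assumption for illustration (four samples per displayed pixel, an average overdraw of three, a 60 fps target); these are not ATi figures or measured results. Only the 400 Mpixel/s peak comes from the specifications discussed in this article.

```python
# Rough fill-rate budget for supersampled FSAA.
# The workload numbers below are hypothetical, not measurements.

def required_fill_rate_mpixels(width, height, samples_per_pixel, overdraw, fps):
    """Pixels that must be drawn per second, in Mpixel/s."""
    pixels_per_second = width * height * samples_per_pixel * overdraw * fps
    return pixels_per_second / 1_000_000


SAMPLES = 4              # assumed samples rendered per displayed pixel
OVERDRAW = 3             # assumed average number of times each pixel is drawn
FPS = 60                 # assumed target frame rate
RADEON256_MPIXELS = 400  # theoretical peak quoted in this article

at_800x600 = required_fill_rate_mpixels(800, 600, SAMPLES, OVERDRAW, FPS)
at_1024x768 = required_fill_rate_mpixels(1024, 768, SAMPLES, OVERDRAW, FPS)

for label, needed in (("800x600", at_800x600), ("1024x768", at_1024x768)):
    verdict = "within" if needed <= RADEON256_MPIXELS else "beyond"
    print(f"{label}: {needed:.1f} Mpixel/s needed, {verdict} the theoretical peak")
```

Change any of the assumed numbers and the picture shifts, which is exactly why real benchmarks will matter more than the spec sheet.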
Pixel Shader
The Pixel Shader allows for things like Lighting effects, Shadows, Bump Mapping, Texture Morphing, and Alpha Blending between multiple textures. Basically you are able to do any texture blending operation you can imagine. Another advantage of ATi’s Pixel Shader is that all these operations are done with very few rendering passes. Other hardware will redundantly use textures and go through multiple passes to create the same effects. In other words, the ATi implementation can give you all the visual benefits with very little performance loss. The one big thing to note here, though, is that DX8 is needed to take full advantage of this feature.
Above is an example of a shading operation done in DX7. This method has its limitations, as it cannot offer more efficient use of the same textures in one pass. Multiple passes must be made to create given effects.
Here we have an example of what the Radeon256 can do when coupled with DX8. This method is far more complex and allows not only efficient use of a single pass but gives developers greater control of the effects they may create.
Hyper Z
Hyper Z is something that ATi created to alleviate the growing memory bandwidth issue, as rising resolutions and color depths increase the bandwidth needed. Hyper Z is a special Z-buffer cache that helps cut down on the bandwidth and memory used when manipulating Z data. ATi is claiming a performance gain of somewhere around 20%, but the true test will come once we have the board up and running on our test platform.
Fill Rate
Fill-rate is something that we’ve discussed for a while now when comparing various cards’ “theoretical” abilities to each other, but it hasn’t always made sense in the real world. The architecture, clock speed and software situation will all affect how a given card performs. An architecture affects the fill-rate through how many pipelines it has, how many texturing units it has per pipeline, and how efficiently it can apply various filtering methods at the same time. If a given architecture needs more than one pass for a given multi-texturing situation, it is typically bad news on the performance side of things, as it means spending another pass to complete the task. The Radeon256 will take full advantage of these situations, as both of its pipelines can do a triple-textured and filtered pixel in one cycle. If we take a look at the pixel fill-rate, the Radeon256 might not appear to be very impressive.
Pixel Fill-Rate | |
Radeon256 | 400Mpixels/s |
ATi Rage Fury MAXX AFR | 500Mpixels/s |
GeForce256 | 480Mpixels/s |
S3 Diamond Viper II | 250Mpixels/s |
First, we’ve given a range for the Radeon256, as the core speed wasn’t final at the time we spoke with ATi. The numbers were based on a 200MHz core clock speed. You’ll notice that the numbers it boasts aren’t actually all that great, but don’t let this fool you. Let’s take a look at the Texel Fill-Rate.
Texel Fill-Rate (Dual Texture) | |
Radeon256 | 1200MTexels/s |
ATi Rage Fury MAXX AFR | 500MTexels/s |
GeForce256 | 480MTexels/s |
S3 Diamond Viper II | 500MTexels/s |
You can quickly see how the Radeon256 gains ground as scenes become covered with tons of textures. The Radeon256 is just fine up to three textures per pixel, but beyond that you’re talking about needing an additional pass, unlike the Viper II that can do a single textured pixel in a single pass (although only one pixel). There is much more to this, as factors like the total number of textures used per pixel, filtering modes and architecture efficiencies (like Hyper Z) alter real-world performance. Once again, the only true proof can come from testing these capabilities by running various real-world game applications.
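The effect of extra passes can be sketched with a little arithmetic. This is a simplified model of my own: it only counts passes as textures per pixel divided by texturing units per pipeline (rounded up), and it ignores filtering modes, memory bandwidth and features like Hyper Z. The single texturing unit assumed for the GeForce 256 is inferred from its fill-rate figures.

```python
import math

# Simplified model: each extra pass divides the usable pixel rate.
# Filtering cost and memory bandwidth are deliberately ignored here.

def passes_needed(textures_per_pixel, texture_units_per_pipeline):
    """Passes required to apply all textures to one pixel."""
    return math.ceil(textures_per_pixel / texture_units_per_pipeline)


def effective_pixel_rate(pixel_rate_mpixels, textures_per_pixel,
                         texture_units_per_pipeline):
    """Peak pixel rate divided by the number of passes, in Mpixel/s."""
    passes = passes_needed(textures_per_pixel, texture_units_per_pipeline)
    return pixel_rate_mpixels / passes


# Radeon256: 400 Mpixel/s peak, three texturing units per pipeline.
radeon_three_textures = effective_pixel_rate(400, 3, 3)
radeon_four_textures = effective_pixel_rate(400, 4, 3)

# GeForce 256: 480 Mpixel/s peak, one texturing unit per pipeline (inferred).
geforce_three_textures = effective_pixel_rate(480, 3, 1)
```

In this model the Radeon256 keeps its full 400 Mpixel/s with three textures per pixel, drops to 200 Mpixel/s with four, and the GeForce 256 falls to 160 Mpixel/s with three.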
Memory Bandwidth
Memory Bandwidth is yet another interesting topic to throw around because it’s such an expensive commodity these days. Higher fill-rates and leading-edge features are sucking away all available bandwidth and can kill performance. Companies have a few choices to combat this issue: increase the memory bus width from the standard 128-bit to 256-bit or greater, increase memory speed (i.e. higher clock speeds or better memory such as DDR), or provide features that are more memory efficient. ATi took the route of using fast DDR memory as well as implementing features like Hyper Z that help cache various data to decrease the number of wasted rendering passes. This allows for more efficient use of the already available high-speed memory and essentially gives ATi a boost over competitors that use the same type of memory.
Memory Bandwidth | |
Radeon256 | 5.3-6.4 GB/s |
ATi Rage Fury MAXX AFR | 4.9 GB/s |
GeForce256 DDR | 4.8 GB/s |
GeForce2GTS | 5.3 GB/s |
S3 Diamond Viper II | 2.5 GB/s |
We had to use a variable number once again for Memory Bandwidth as the DDR memory to be used was stated to be 166MHz (effectively 333MHz since it is DDR memory) and possibly up to 200MHz later on. Factor in the various efficiencies that ATi has enhanced the Radeon256 with and we’re talking about a very powerful card all of a sudden.
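The two Radeon256 bandwidth figures can be reproduced with a quick calculation. I am assuming a 128-bit memory bus here (the width this article calls standard; ATi’s bus width is not stated in the material above) and counting 1 GB as 1000 MB. The function name is just for illustration.

```python
# Peak memory bandwidth from bus width and memory clock.
# The 128-bit bus width is an assumption, not a figure confirmed above.

def memory_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak bandwidth in GB/s; DDR memory makes two transfers per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_mhz * transfers_per_clock / 1000


launch = memory_bandwidth_gb_s(128, 166)  # 166MHz DDR, effectively 333MHz
later = memory_bandwidth_gb_s(128, 200)   # 200MHz DDR, effectively 400MHz

print(f"Launch memory: {launch:.1f} GB/s, faster memory: {later:.1f} GB/s")
```

Both values are theoretical peaks; what the chip actually gets out of the memory depends on how efficiently it uses each transfer.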
Competition
Tons of hype is surrounding the upcoming next generation graphics cards and as we’ve seen in the past, several companies have come up short of their claims. Sound familiar? Sure it does. It’s all a part of the marketing hype. This might be one of the reasons ATi was smart enough not to hand out any performance numbers for us to beat them with if things don’t go perfectly. In any case, we have the goods on a bit of the competition and the routes they are taking to their next product. Let’s have a look.
3dfx is about to release its latest product and prove to the world that it’s far from dead. The upcoming line-up from 3dfx looks to provide consumers with high fill-rates, enabling various visual effects in hardware like full-screen Anti-Aliasing, Depth of Field and Motion Blur. Although all these things sound appealing, only the fill-rates appear to be its unique advantage, as other companies have already begun supporting the visual effects as well. So far, the results scored by Voodoo5 5500 beta units haven’t been very impressive. Unfortunately the 3dfx name will not carry the once leading-edge graphics company much farther, so let’s hope that the recent acquisition of Gigapixel helps its future product line-up.
Matrox has just announced the new G450 product, but important details such as the core clock and fill rate are being kept under tight wraps, so I cannot give further detail.
NVIDIA will be releasing their next generation chipset tonight and further details will be available shortly. The GeForce2GTS chip will be offering enhanced T&L, significantly improved fill-rates, higher memory performance and revamped graphics pipelines. Keep your eyes open for the full review.
Most people have agreed with me about the tragedy that is S3’s Viper II. I’ve tried testing the various driver drops that have recently become available for download, but to my disbelief the drivers are still sub-par compared to any of the competition. We will be seeing a faster version of the Viper II sometime soon, but with the driver status being so shaky, who will be getting one? We will see what happens with S3’s performance products now that VIA has acquired S3’s graphics department.
Summary
ATi has definitely jumped into the high-end/gamers market, boasting hardware that will offer competitive performance, appealing visual effects and a leading-edge feature set that makes sure ATi won’t lose sight of NVIDIA. It might be offering the best all-around package, with next generation T&L, top-notch texturing capabilities (from texture compression to bump mapping to triple texturing), a well thought out, efficient graphics architecture and ATi’s well known and respected video capabilities. Even so, all this will have a hard time competing against NVIDIA’s upcoming GeForce2GTS. Unfortunately we don’t have a product to test at this time, so we’ll have to reserve our judgment on this future battle until we’re able to line up all the upcoming graphics hardware and put it to the test. The one big question that is lingering in my mind: “What will the competition do if an AFR version of the Radeon256 comes out?”