ATi’s New Radeon – Smart Technology Meets Brute Force
Introduction
ATi, the Canadian graphics chip and card maker headquartered in Toronto, Ontario, has been one of the big players in the PC graphics scene for a long time. It still holds the biggest market share in the overall graphics chip and card market worldwide, and it has been expanding into several new fields lately as well. However, despite all the acclaim, ATi was never quite able to win over the performance freaks of this planet. Power-hungry and fanatical 3D enthusiasts have so far only recognized two companies in the 3D hardware scene: in the past it was 3Dfx (today ‘3dfx’), and today it is NVIDIA.
ATi’s latest 3D chip ‘Radeon’ is supposed to finally establish a strong foothold for ATi in this high-end 3D graphics arena as well, so that all 3D freaks might weigh it closely against the products from NVIDIA and 3dfx. We know that ATi has claimed to have the fastest 3D chip many times in the past, but in tests the products always fell short of those claims. Once more, ATi’s pitch for the new Radeon is anything but modest (‘fastest 32-bit gaming performance’, among other claims), and we will now find out if Radeon can deliver the performance that ATi is promising.
ATi’s Business Strategy
Before I start focusing on Radeon, I’d like to give you some idea of what ATi is actually doing. We all know that this Canadian company produces graphics chips as well as its own graphics cards. Those cards have become pretty famous for their industry-leading support for video acceleration, such as DVD/MPEG2 and DTV/HDTV, and for their long history of flat panel support. With this in mind it is not particularly difficult to understand why ATi is diversifying into the mobile computing market as well as into the set-top box arena. Today ATi has become the largest supplier of 3D chips for notebooks, sharing that market pretty much with S3 alone. The set-top box business is less well known to most end users as well as distributors, because neither of the two groups would have a reason to go out and buy a set-top box. However, the pay-TV market is expanding at a very high rate, and now that digital TV is becoming more and more available, ATi can take advantage of its experience with digital video, television and high-definition TV.
Adding 3D features to a pay-TV box might be difficult for other manufacturers, but it’s obviously an easy thing for a company that has been in the mainstream 3D graphics business for almost as long as it has existed. This gives cable providers the chance to add 3D games to their offerings: once a pay-TV customer is receiving digital service, he can also play 3D games on his TV or HDTV instead of only receiving broadcasts. Once you look at this opportunity, you will understand why ATi has very high expectations for its set-top box division. So far the Canadian company is way ahead of any competition in this market.
Another new development is ATi’s move into the platform market. After the acquisition of ArtX, ATi is now able to produce chipsets with integrated 3D that easily outperform Intel’s i810 or i815 solutions. ATi’s first integrated chipset for Socket 370 platforms will soon be available, and the demonstrations that I have seen clearly showed that this new chipset is way ahead of Intel’s chipsets with their integrated 3D ‘decelerator’.
In the 3D graphics scene ATi was never able to claim the performance crown, but at the same time the Rage line of 3D products was never far behind the competition. ATi’s manufacturing abilities, good pricing and wide product spectrum ensured that ATi chips are found in a huge number of computer systems, because OEMs and system integrators in particular love to do business with ATi. Now the Canadian company is finally reaching for recognition at the top of 3D technology, right up there with NVIDIA. The new Radeon has more high-tech features than any other 3D chip on the market, and we will see if NVIDIA was right when it stated that it considers ATi, not 3dfx, to be its toughest competitor in the high-end 3D arena.
RADEON – Packed With A Million And One Features
I am sure you can remember the release of NVIDIA’s GeForce2 GTS chip that came with a long list of funky features. ATi decided to add a few more still and renamed the ones it shares with GeForce2. From a features standpoint, the Radeon chip is at least as advanced as the GeForce2 GTS, but it doesn’t take the brute force approach of the NVIDIA chip. Instead, ATi went for a more elegant solution, which I consider rather commendable. I would like to go through Radeon’s features in the order in which a 3D scene gets rendered, from the moment the scene is computed by the host CPU until it’s displayed on the screen.
A – Radeon’s Charisma Engine
Those of us who follow ATi’s announcements know that this Canadian company is never shy about creating ‘kewl’ new names for its feature lists. The funky ‘Charisma’ engine is nothing but Radeon’s integrated transform and lighting unit, formerly called the ‘geometry’ unit.
ATi claims that their integrated T&L is able to do 30 million triangles per second, which tops the 25 million triangles/s of GeForce2 GTS. That’s not all however, as ATi was able to add a few more nice features to this unit as well. ‘Vertex Skinning’ is one, ‘Keyframe Interpolation’ is the other. Let’s have a quick look at each of those features.
Hardware T&L
We have known it since the release of NVIDIA’s GeForce256 last October: hardware transform and lighting lets the 3D chip do the power-hungry floating point calculations that turn a 3D world into a scene (‘transforming’), remove the objects that are outside of the viewing range (‘clipping’) and give each vertex a light vector after computing the 3D scene and its light sources (‘lighting’).
Most people who come to hardware websites are pretty sure they’ve got an idea of what T&L is actually doing, but I still think that the usual descriptions are far too abstract to convey what’s really happening inside the chip. Thus I’ll try to give an analogy that’s easier to understand.
The first data that the platform/software unit sends to the 3D chip could be compared with the plan of an architect. The architect gives you the map of a room in a house that he wants to build, and you, as the 3D chip, have to make a picture of this room as you stand, say, in the doorway and look inside. This is pretty much what the transform unit does. It creates a scene of a room viewed from a particular spot after receiving the layout of the room from the CPU. The layout of the room is supplied by the platform/software in the form of ‘vertices’, which are pretty much all the ‘corners’ in this room. These vertices have coordinates that are based on the 3D world and they never change. The scene created by the transform unit is again made up of vertices, but those vertices have coordinates that could, for example, originate from the viewer. These coordinates change each time the location of the viewer changes.
In this picture you see the transformed vertices that represent a Porsche Boxster. You can see the right rear and front wheels, which are included in the transformation although they will be covered by the body of the car.
Now, depending on where you stand in this room, there are parts of it that you cannot see because they are outside of your field of view. ‘Clipping’ removes those parts, so that the next steps in the 3D pipeline don’t have to bother with them. Clipping DOES NOT remove any objects that are within your field of view but covered by other objects in front!
The ‘lighting’ is easier to understand. The platform/software unit tells the 3D chip where the light sources in this room are. Depending on those light sources, the ‘lighting’ unit calculates a special light vector for each vertex.
What you have now, after those three steps are finished, is a room with all the objects that are in your field of view, including those that are behind an object in front. Those objects have no textures yet, though. If the light sources supplied only plain white light, you would see all objects with the same plain surface in different shades of gray.
This is the same Boxster ‘coated’ with a solid skin, after it has been transformed and lit. The textures are still missing.
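To make the analogy a bit more concrete, here is a minimal sketch of what ‘transform’ and ‘lighting’ do to a single vertex. The matrix and light values are invented for illustration; they have nothing to do with Radeon’s actual internals:

```python
# Minimal sketch: 'transform' and 'lighting' for one vertex.
# All values are invented sample data, not Radeon internals.

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vertex."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def dot3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

vertex = [2.0, 1.0, 5.0, 1.0]   # world-space position (x, y, z, w)
normal = [0.0, 1.0, 0.0]        # surface normal at this vertex

# 'Transform': a view matrix that shifts the world so the viewer
# stands at the origin (here simply a translation by -3 along z).
view = [[1.0, 0.0, 0.0,  0.0],
        [0.0, 1.0, 0.0,  0.0],
        [0.0, 0.0, 1.0, -3.0],
        [0.0, 0.0, 0.0,  1.0]]
eye_space = mat_vec(view, vertex)

# 'Lighting': diffuse intensity from one directional white light
# shining straight down onto the vertex.
light_dir = [0.0, 1.0, 0.0]     # direction from surface towards the light
intensity = max(0.0, dot3(normal, light_dir))

print(eye_space)   # [2.0, 1.0, 2.0, 1.0]
print(intensity)   # 1.0 -> fully lit
```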
Hardware T&L, Continued
This is an analogy, though! In ‘reality’ the scene still has no solid surfaces at all after it leaves the T&L unit. I say that because some objects in this room could be see-through, while others are hidden by things in front of them.
That’s what the Boxster looks like once the whole rendering process is finished. You can see that the windows are see-through, so that you can spot the seats inside the car.
What is good about an integrated T&L unit? Well, if those complex T&L calculations don’t have to be done by the platform (CPU, chipset, memory), but by a dedicated unit on the graphics chip that handles those kinds of calculations much more easily, a scene can be made out of a lot more vertices.
For the human brain, ‘vertices’, the ‘corners’ of an object, are rather difficult to grasp, which is why we usually draw lines from each vertex to its neighbors, creating the so-called ‘wire frame model’. A ‘polygon’ is a shape that lies in between the vertices, limited by the lines that run between the vertices surrounding it. In most cases, the polygons of a 3D model are triangles; only on rare occasions does a polygon have more than three ‘corners’ (vertices around it).
This is the Boxster as ‘wireframe model’. You can easily spot all the polygons / triangles.
The more vertices, and thus polygons, a scene is made of, the more detailed it is. To make 3D as realistic as possible, we have to allow as many vertices/polygons as possible. A T&L unit helps to achieve that. It gets us one step closer to the famous ‘photo realism’.
Radeon’s T&L unit is supposed to be able to transform and light 30 million polygons per second, which is higher than the 25 million polygons/s claimed by NVIDIA for their GeForce2 GTS chip. It is very hard to verify those numbers, so we might as well just accept them. However, in each of NVIDIA’s own benchmark programs Radeon performs worse than GeForce2 GTS. This doesn’t necessarily mean much, because those benchmarks are probably optimized for NVIDIA chips, but at the very least it is not obvious that Radeon’s T&L unit is more powerful than that of GeForce2 GTS.
Vertex Skinning
This new term describes an enhancement of what NVIDIA calls ‘vertex blending’. As you know, characters in games, such as animals, monsters or players, are lately animated using a technique called ‘skeletal animation’. The idea is to define ‘bones’ in limbs and attach the surrounding tissue, in the form of vertices, around each ‘bone’. The behavior of these ‘tissue vertices’ then depends on the bones that were defined to influence their appearance. The thought behind this animation technique is to save storage space, loading times and system memory, because it keeps a game from having to store every different appearance of a limb for a movement. Instead, those different appearances are computed, depending on the position of the ‘bones’.
The problem with skeletal animation is that it creates gaps when the joint between two bones is bent too far.
With ‘Vertex Skinning’ or ‘Vertex blending’ this problem can be solved. The usual 2-matrix vertex blending/skinning, as used by NVIDIA’s GeForce chips, joins the two bones together, to create a more or less realistic joint.
ATi’s Radeon supports a more sophisticated version of vertex skinning or blending than NVIDIA’s GeForce group of chips. It’s called ‘4-Matrix Skinning’. In skeletal animation a bent joint requires different matrix transformations for the vertices of the ‘tissue’ around each ‘bone’. Each added matrix transformation requires a lot of computation power, but a joint looks more realistic the more ‘bones’, and thus matrices, are used. The knee would be a typical example, where the patella (knee cap) adds another ‘bone’, and thus matrix, to make the joint look realistic. NVIDIA’s GeForce chips can do this transformation of more than two matrices only in software, which requires a lot of CPU power. Radeon is able to transform up to four matrices in hardware, thus offering a more realistic form of skeletal animation without a performance penalty.
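Here is a minimal sketch of what matrix skinning boils down to: a vertex near a joint is transformed by up to four bone matrices, and the results are blended with weights that sum to one. The matrices and weights below are invented sample values, not real game data:

```python
# Minimal vertex skinning sketch: blend a vertex through several
# bone matrices with weights that sum to 1. Sample values only.

def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def skin(vertex, bone_matrices, weights):
    """Weighted blend of the vertex transformed by each bone matrix."""
    out = [0.0, 0.0, 0.0, 0.0]
    for matrix, w in zip(bone_matrices, weights):
        transformed = mat_vec(matrix, vertex)
        out = [o + w * t for o, t in zip(out, transformed)]
    return out

identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
shifted  = [[1.0, 0.0, 0.0, 1.0],   # this 'bone' moved 1 unit along x
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# Two bones dominate this 'tissue' vertex; Radeon allows up to four.
print(skin([0.0, 1.0, 0.0, 1.0], [identity, shifted], [0.7, 0.3]))
# -> [0.3, 1.0, 0.0, 1.0]: the vertex partially follows the moved bone
```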
Keyframe Interpolation
This funky feature is also known as ‘morphing’ from the 2D world. It comes into play where skeletal animation doesn’t work. A changing facial expression, a character opening and closing his mouth as he speaks, or bulging muscles can all be animated rather easily with this technique. The idea is simple. You need two states, e.g. a frowning face and a smiling one. Then you use those two frames as ‘keyframes’ and interpolate a few frames in between, to ‘morph’ the face from the one expression to the other.
This way game characters can be made to look a whole lot more lifelike. They can talk and show facial expressions. Several game developers have already decided to use this feature in their upcoming games.
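A minimal sketch of the idea, with two invented ‘mouth vertex’ keyframes standing in for real game data:

```python
# Keyframe interpolation sketch: each in-between frame is a linear
# blend of the same vertex in two keyframes. Sample data only.

def lerp_frame(key_a, key_b, t):
    """Interpolate all vertices between two keyframes; t runs 0 -> 1."""
    return [[a + t * (b - a) for a, b in zip(va, vb)]
            for va, vb in zip(key_a, key_b)]

frown = [[0.0, 0.0, 0.0], [1.0, -0.2, 0.0]]   # two corner-of-mouth vertices
smile = [[0.0, 0.0, 0.0], [1.0,  0.3, 0.0]]

# Three interpolated frames morph the frown into the smile.
for t in (0.25, 0.5, 0.75):
    print(lerp_frame(frown, smile, t))
```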
Summarizing Radeon’s T&L Features
It is pretty obvious that Radeon’s T&L outshines that of NVIDIA’s GeForce2 GTS. If ATi’s claims are correct, then Radeon is able to apply T&L to more polygons per second than GeForce2 GTS, and on top of that it throws in 4-matrix vertex skinning and keyframe interpolation as well. From that point of view the standing is 1:0 for Radeon.
Excursus – The Next Step Of The 3D Pipeline After T&L, The Triangle Setup
3D chips have been equipped with this feature for such a long time that hardly anybody remembers what it does anymore. However, it is obviously an integral part of the 3D pipeline, which is why it was implemented into the chips so early. Do you know what the triangle setup actually does?
After the transform and the lighting have been done, we are still looking at a real 3D scene, where each vertex has x, y and z coordinates. This 3D scene now has to be converted into the 2D frame that gets displayed on our screen, which is made up of pixels. The triangle setup procedure has to be done polygon by polygon, or rather triangle by triangle. Some of the triangles of the 3D scene might be covered by other triangles that are ‘in front’ of them, but at this stage the 3D chip does not know which triangles are covered or partly covered and which aren’t. So the triangle setup unit receives a triangle that’s defined by three vertices. Each of these three vertices has an x, y and z coordinate that defines its place in the 3D scene. The triangle setup ‘fills’ those triangles with pixels. Each of the pixels in the triangle receives the x and y coordinates of the place it takes on the screen and a z-coordinate that holds its depth information. Each pixel also gets light information, and then the pixels of this very triangle are sent one by one to the pixel-rendering unit.
I am taking the time to explain this step, because it is integral to what comes next in the Radeon chip, the first part of the mysterious HyperZ, which makes Radeon so very special. It is also important to understand the term ‘triangle size’, as it has a big impact on the fill rate.
Let’s look at a few facts now. First of all, how many pixels represent a triangle depends on the triangle size as well as the screen resolution.
Fill Rate, Rendering Pipelines and Triangle Size
If we look at the wireframe of the Boxster once more, we can see that some of the triangles are extremely small. In fact some triangles are so small that they form a white chaos together with the other small triangles around them. You can easily understand that many of those small triangles represent one or even less than one pixel. This is an important issue for the fill rate of the pixel-rendering unit. You have to realize that the pixel-rendering unit can only be fed with one triangle at a time, unless the chip had several T&L and triangle setup units. If the triangle consists of fewer pixels than the number of rendering pipelines in the pixel-rendering unit, some of those pipelines will be idle. This is the big disadvantage of several parallel rendering pipelines. A chip such as NVIDIA’s GeForce2 GTS that comes with four rendering pipelines will always have some idle pipelines if a triangle contains fewer than four pixels. Frames with many small triangles will therefore never be able to live up to the high fill rate claims. You can imagine that the lower the resolution, the more small triangles there are. This is why we always see the effective fill rate (frame rate * screen resolution) increase as resolution increases.
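For illustration, here is a little back-of-the-envelope sketch of the effect; the clock and pipeline count match a GeForce2 GTS, while the triangle sizes are invented:

```python
# Sketch: how small triangles starve parallel rendering pipelines.
# Only one triangle is issued at a time; leftover pipelines idle.

import math

def effective_fill_mpix(clock_mhz, pipelines, pixels_per_triangle):
    """Achievable fill rate in Mpixel/s for a given triangle size."""
    cycles = math.ceil(pixels_per_triangle / pipelines)
    pixels_per_cycle = pixels_per_triangle / cycles
    return clock_mhz * pixels_per_cycle

# GeForce2 GTS-like chip: 200 MHz, 4 pipelines.
print(effective_fill_mpix(200, 4, 16))  # big triangles: 800.0 Mpixel/s
print(effective_fill_mpix(200, 4, 2))   # tiny triangles: 400.0 Mpixel/s
```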
Wasted Energy – The Rendering Of Hidden Surfaces
The second thing you have to be aware of is the problem that many pixels get rendered in vain. If the triangle setup unit gets a triangle that is somewhere in the background of the scene, where it is partly or completely covered by triangles in front of it, it will still do what it always does and convert this triangle into pixels. After that, those pixels get sent to the rendering unit. Here the Z-buffer gets accessed, and the z-coordinate stored at the spot where the new pixel is supposed to be drawn is read. If the value in the z-buffer is zero, which means that nothing has been drawn at this location yet, or if it shows that the new pixel lies in front of the one already stored, the pixel gets rendered and the z-coordinate of the new pixel gets stored in the z-buffer. The problem, however, is that the rendering pipeline has already wasted one clock cycle even if the new pixel turns out to lie behind the pixel already drawn at the same spot and gets discarded. And even if the new pixel is rendered and stored, it is possible that a later triangle will happen to cover this pixel, so the ‘old’ pixel gets overwritten. Once again the pixel was drawn in vain.
I hope that this flow chart I created will make the understanding a bit easier:
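And for those of you who prefer code to flow charts, here is a minimal sketch of that per-pixel z-test, assuming a simple array z-buffer where 0 means ‘nothing drawn yet’ and a smaller z means closer to the viewer:

```python
# Per-pixel z-test sketch; 0 in the z-buffer means 'empty spot'.

def try_render_pixel(zbuffer, framebuffer, i, z, color):
    stored = zbuffer[i]
    if stored == 0 or z < stored:    # empty, or new pixel is in front
        framebuffer[i] = color       # rendered (may still be overdrawn!)
        zbuffer[i] = z
        return True
    return False                     # discarded: the cycle was wasted

zbuf = [0] * 4
fbuf = [None] * 4
try_render_pixel(zbuf, fbuf, 2, 5.0, "red")    # drawn
try_render_pixel(zbuf, fbuf, 2, 9.0, "blue")   # behind red: discarded
try_render_pixel(zbuf, fbuf, 2, 1.0, "green")  # in front: red was in vain
```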
I hope I have explained, in a somewhat understandable way, the two most important problems of 3D rendering. First of all, the engine is wasting valuable rendering power on the drawing, or at least fetching, of pixels that will never be seen on the screen. Each of those ‘uselessly rendered pixels’ is taking away fill rate. The second problem is the z-buffer. It gets accessed twice for each pixel in each triangle of the scene, which adds up to several times the screen resolution. These z-buffer accesses cost an immense amount of memory bandwidth. In fact the z-buffer is the most accessed part of the local video memory.
Besides the tile architectures we know from VideoLogic and from Microsoft’s Talisman idea, ATi is the first mainstream 3D chip maker to have developed a technology that addresses both problems: the wasted fill rate as well as the wasted memory bandwidth.
B – Radeon’s HyperZ
The biggest draw of ATi’s new 3D chip is clearly this feature. In fact I like HyperZ so much that I really have to commend ATi’s engineering team for it. The only complaint I have is the name. ‘HyperZ’ is definitely ‘too funky for me’. ‘Accelerated Z’ would have catered more to the grown-ups who are supposed to shell out the significant amount of money for a Radeon card.
The actual ‘HyperZ’ technology consists of three different techniques to reduce the waste of fill rate as well as memory bandwidth that kills so much performance in modern 3D accelerators.
Hierarchical Z
The first technique to reduce unnecessary Z-buffer accesses and wasted pixel rendering is called ‘Hierarchical Z’. It comes into play AFTER the triangle setup and BEFORE the rendering unit. Before a pixel gets sent from the triangle setup to the rendering unit, Hierarchical Z looks up a defined area of the Z-buffer and checks whether the pixel will be visible or not. If the pixel turns out to be hidden, it gets discarded right away, so that the rendering unit doesn’t waste its time on it. The catch here is that this defined area of the Z-buffer is kept in a special cache, which avoids unnecessary Z-buffer reads. I don’t want to go into any deeper detail, because ATi doesn’t want to disclose more than necessary.
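Since ATi won’t disclose the details, take the following only as one plausible sketch of how such a scheme can work: keep the farthest z value of each cached z-buffer area and reject pixels that are provably behind it:

```python
# One plausible Hierarchical Z scheme (ATi hasn't disclosed its own):
# per cached z-buffer area, keep the farthest stored z; smaller z = closer.

def hierarchical_z_test(area_max_z, pixel_z):
    """Conservative early test, assuming every pixel of the area has
    been drawn at least once since the last clear."""
    if pixel_z >= area_max_z:
        return "discard"       # behind everything there: no z-buffer read
    return "full z-test"       # undecided: do the real per-pixel test

print(hierarchical_z_test(area_max_z=4.0, pixel_z=7.5))  # discard
print(hierarchical_z_test(area_max_z=4.0, pixel_z=2.0))  # full z-test
```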
Z-Compression
This one is rather easy to understand, although the implementation takes a bit more thought. We have learned that Z-buffer accesses are the biggest threat to local video memory bandwidth, so it is easy to see that the (lossless) compression of the z-coordinates will increase the performance of any chip that is hampered by memory bandwidth problems. The programmers among you will know that compressing one z-coordinate at a time won’t buy you much, so you can figure that ATi is obviously compressing a whole area at once. It is also not too difficult to guess that this area is the very same one that is kept in the cache for the operations of Hierarchical Z.
Fast Z-Clear
‘Z-Clear’ is something that most of us would forget when thinking about the Z-buffer impact on local video memory bandwidth, because it’s not required while a frame is rendered. However, ‘Z-Clear’ is needed each time a frame has been fully rendered, and before the next frame can get drawn.
Hidden behind the term ‘Z-Clear’ is something rather mundane, but important. We’ve learned above that each pixel that gets rendered has its z-coordinate stored in the z-buffer, so that the rendering unit can find out whether a pixel that ‘wants’ to be rendered in the same spot lies in front of or behind that other pixel. Once the frame is fully rendered, the z-buffer holds the z-coordinates of all pixels that are visible on the screen. Those values need to be cleared before the next frame gets rendered, which is done by filling the z-buffer with zeros. A zero in the z-buffer shows the rendering pipeline that no pixel has been rendered in that spot so far, with the result that the pixel in the rendering pipeline will be rendered and not discarded.
Now, filling such a respectable amount of memory with zeros takes a considerable amount of time and, of course, memory bandwidth. At a screen resolution of 1600×1200 and a color depth of 32-bit, the z-buffer is no less than 5.5 MB big (1600 × 1200 pixels at the usual 24 bits of depth information each works out to roughly 5.5 MB). This amount of memory needs to be cleared after each frame, which can take quite a while. ATi’s ‘Fast Z-Clear’ is able to clear the z-buffer more than 50 times faster, thus saving time and memory bandwidth. Again, the programmers among my readers will have a pretty good idea how this works, especially after my comments about the ‘special areas’ of the Z-buffer.
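Since ATi only hints at how this works, here is my guess in the form of a sketch, built on the same z-buffer blocks mentioned above: instead of writing zeros over megabytes of memory, each block merely gets flagged as cleared and is only really zeroed when it is next written:

```python
# Guessed Fast Z-Clear sketch: clear one flag per block, not per pixel.

class BlockedZBuffer:
    def __init__(self, blocks, block_size):
        self.data = [[0.0] * block_size for _ in range(blocks)]
        self.cleared = [True] * blocks     # one flag per block

    def fast_clear(self):
        """Touches one flag per block instead of every z value."""
        for i in range(len(self.cleared)):
            self.cleared[i] = True

    def read(self, block, offset):
        if self.cleared[block]:
            return 0.0                     # behaves as if zero-filled
        return self.data[block][offset]

    def write(self, block, offset, z):
        if self.cleared[block]:            # first touch after a clear:
            self.data[block] = [0.0] * len(self.data[block])
            self.cleared[block] = False    # really zero this block once
        self.data[block][offset] = z

zb = BlockedZBuffer(blocks=4, block_size=64)
zb.write(1, 10, 5.0)
zb.fast_clear()                            # 4 flag writes, not 256 zeros
print(zb.read(1, 10))                      # 0.0: reads as freshly cleared
```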
Fill Rate and Memory Bandwidth – They Belong Together!
In my recent article about 3D benchmarking I already tried to point out that today the fill rate of a 3D chip is largely limited by the bandwidth of its local video memory. In fact, quite a while ago I produced benchmarking data that shows how badly memory bandwidth limitations destroy the fill rate of the GeForce2 GTS chip. I know that Voodoo5 suffers from this memory bandwidth problem even more, and there will soon be an article dedicated to these surprising, even shocking, findings. In today’s article I only want to show you two charts which should prove that ATi’s idea with HyperZ is right on the money.
What I did here was underclock the GeForce2 GTS chip to prove that with standard memory it is utterly unable to ever supply the 800 Mpixel/s that NVIDIA is claiming. What you see are 16-bit Quake 3 Arena results scored with a standard GeForce2 GTS 64 MB card at a chip clock of 200 MHz and a memory clock of 333 MHz, compared to the results of Gainward’s CARDEXpert GeForce2 GTS/400 card with its memory clocked at 472 MHz and the chip clocked at different speeds. The idea behind those numbers is that only the Gainward card is able to supply the claimed fill rate of 800 Mpixel/s, because it isn’t restricted by its memory. A GeForce2 GTS chip clocked at 166 MHz can only supply 667 Mpixel/s, at 133 MHz only 533 Mpixel/s, and at 100 MHz only 400 Mpixel/s, regardless of how fast the memory is. You can see that the standard GeForce2 GTS card scores just a bit better than the Gainward card at a chip clock of 133 MHz. This proves quite nicely that in reality a standard GeForce2 GTS card can supply only some 550-600 Mpixel/s in 16-bit color. There is no way that GeForce2 GTS can achieve 800 Mpixel/s with 6 ns DDR SDRAM! Have you ever wondered why GeForce2 GTS does not score double the frame rates of GeForce256, although it is supposed to have almost double the pixel fill rate and more than triple the texel fill rate of its predecessor? Well, here’s the answer.
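If you want to reproduce that arithmetic, here is a tiny sketch; the pipeline count and clocks are the publicly quoted figures, and the 667/533 values assume exact 166.67/133.33 MHz clocks:

```python
# The article's fill rate arithmetic: core clock x pipelines,
# ignoring memory bandwidth entirely.

def theoretical_fill_mpix(core_mhz, pipelines=4):
    """Peak pixel fill rate in Mpixel/s for a 4-pipeline chip."""
    return core_mhz * pipelines

for mhz in (200, 166.67, 133.33, 100):
    print(f"{mhz:g} MHz -> {theoretical_fill_mpix(mhz):.0f} Mpixel/s")
# 200 -> 800, 166.67 -> 667, 133.33 -> 533, 100 -> 400
```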
Fill Rate and Memory Bandwidth – They Belong Together! Continued
Things are even worse in 32-bit color. Even the Gainward card is dropping its frame rates, proving that it is under memory bandwidth restriction. However, the standard GeForce2 GTS card is not even able to reach the scores of the Gainward card underclocked to a pathetic 100 MHz core clock!! This simply shows that the GeForce2 GTS chip equipped with standard memory is not even able to supply 400 Mpixel/s at 32-bit color. In fact it is even less than 300 Mpixel/s, as I will show you in my upcoming article.
Before the 3dfx freaks start shouting hurray because I found a spot on NVIDIA’s vest, I would like to remind all Voodoo5 owners that the good Voodoo5 5500 scores even less than GeForce2 GTS, although 3dfx had the courage to claim that Voodoo5 5500 has a fill rate of 667 Mpixel/s. The new 3dfx card is even less able to live up to its claims, once again because of its memory restriction. This is even more disappointing than in GeForce2’s case, because Voodoo5 5500 has the very same memory bandwidth as GeForce2. If it had the claimed fill rate, it would kick GeForce2’s butt. However, its 16-bit fill rate is closer to 500 Mpixel/s and its 32-bit fill rate just above 200 Mpixel/s. Ain’t that sad?
I hope this makes you understand why Radeon, with its pathetic-sounding fill rate of only 366 Mpixel/s, is able to be right up there with the competition. By using its memory more efficiently than the others, Radeon can deliver where the others can’t.
Summary HyperZ
Memory bandwidth has become the most important factor for high-end 3D chips today, which is well known to the entire 3D chip industry, even if nobody wants to talk about it. We have just learned that even chips with the highest theoretical fill rates can’t deliver due to memory restrictions, so claims of fill rates above 600 Mpixel/s can only be sustained with either a memory bandwidth way above 5.3 GB/s or a new technology that avoids unnecessary memory accesses.
HyperZ improves frame rates by about 20% right now, and the gain will probably grow once all of its features have proper driver support. HyperZ reduces memory bandwidth requirements and at the same time limits the fill rate wasted on pixels that will never be displayed. With memory speeds advancing much more slowly than 3D chip performance, ATi has done the right thing: if the memory doesn’t get faster soon enough, the solution is to use its bandwidth more economically. Radeon is the first 3D chip of the year 2000 that can actually live up to its (admittedly modest) fill rate claims.
C – The Pixel Tapestry Architecture
What NVIDIA calls ‘NSR’, the NVIDIA Shading Rasterizer, is called ‘Pixel Tapestry’ by ATi. The main difference between the two is that ATi’s solution offers two rendering pipelines with three texture units each, unlike the four pipelines with two texture units each found in NVIDIA’s GeForce2 GTS chip.
The result is that Radeon has a pixel fill rate of only 366 Mpixel/s, while GeForce2 GTS claims 800 Mpixel/s. It’s a lot closer when you look at the texel fill rate, because here Radeon can supply 1,100 Mtexel/s while GeForce2 GTS claims 1,600 Mtexel/s. Keeping in mind the impact of HyperZ makes Radeon’s effective fill rates go up, or GeForce2 GTS’ fill rates go down, whichever you prefer.
It is a fact that Radeon does not look as good if a game uses only two textures per pixel, because then one texture unit per pipeline sits idle, while GeForce2 GTS gets into more trouble if three textures are used per pixel, since then it is the one with idle texture units while Radeon has none. Future games will show whose solution is better, but from a raw-force point of view GeForce2 GTS seems clearly in front.
The Pixel Tapestry Architecture, Continued
Still I’d like to show you this little table made by ATi:
ATi, our new memory bandwidth economist, also makes the point that in the case of three textures per pixel Radeon doesn’t require as many texture memory accesses as GeForce2 GTS, because it can load all three texels, perform the rendering and then store the pixel, while an architecture with only two texture units per pipeline has to write the pixel to memory twice.
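Counting the memory accesses per finally rendered pixel makes that point clearer. This little sketch is a simplification that ignores z-buffer traffic and texture caching:

```python
# Simplified per-pixel memory access count for multi-texturing.
# Ignores z-buffer traffic and texture caching.

def frame_buffer_accesses(units_per_pipe, textures_per_pixel):
    passes = -(-textures_per_pixel // units_per_pipe)   # ceiling division
    texel_reads = textures_per_pixel
    pixel_writes = passes          # each pass writes the pixel out
    pixel_reads = passes - 1       # later passes read it back to blend
    return texel_reads + pixel_writes + pixel_reads

print(frame_buffer_accesses(3, 3))  # Radeon-style pipe: 4 accesses
print(frame_buffer_accesses(2, 3))  # two-unit pipe: 6 accesses
```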
All in all it is important to note that Radeon is just as able to do per-pixel rendering as GeForce2 GTS.
Here are a few sample pics that show how 3 or more applied textures can make a 3D scene look quite awesome:
3D Textures
ATi introduces this new feature although it is so far only supported by OpenGL 1.2 and not by DirectX. The idea is a texture that actually fills a volume instead of just representing a surface. At the same time the texture itself will still only be visible where it intersects polygons, which means that in the end it’s still seen as a 2D texture, only defined in a different way.
The uses of 3D textures include special volumetric lighting conditions as well as the well-known idea of cutting objects open to offer a look inside. It is certainly a great thing for lovers of gruesome games, as you could cut open the tummy of your enemy and have a good look at his liver. We will see how game developers adopt this nifty feature. It’s a new thing, and maybe that’s why it doesn’t yet sound like something one couldn’t live without.
Bump Mapping
Well, well, haven’t we heard it all about bump mapping by now? In fact I think we have, which is why I will simply list the three bump mapping techniques that Radeon supports.
- Emboss Bump Mapping is the simplest way of bump mapping with the least satisfying results. It works by simply adding another texture (height map) plus some simple computations.
- Dot Product 3 is the bump mapping version that uses the ‘per-pixel’ approach with a ‘normal map’, which doesn’t mean that this map is ordinary; ‘normal’ is the name of the vector that carries the light reflection information. Dot Product 3 bump mapping is the most sophisticated and certainly most realistic form of bump mapping to date (see the sketch after this list).
- Environment Mapped Bump Mapping is certainly still known to some of you from Matrox and its G400 chip. This bump mapping technique is best suited for glossy, shiny or reflective surfaces, e.g. waves on water. It uses a special kind of texture map. NVIDIA once said that GeForce2 GTS supports this bump mapping form as well, but rumor has it that GeForce2 actually does not.
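Here is the promised sketch of the Dot Product 3 idea: per pixel, the light intensity is simply the dot product of the normal-map vector and the light direction. The vectors are invented sample values:

```python
# Dot Product 3 bump mapping sketch: per-pixel N . L lighting.

def dot3_intensity(normal, light_dir):
    d = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, d)            # clamp: faces turned away stay dark

flat  = (0.0, 0.0, 1.0)           # normal-map entry for a flat spot
bumpy = (0.3, 0.0, 0.95)          # entry tilted by a 'bump'
light = (0.0, 0.0, 1.0)           # light shining straight at the surface

print(dot3_intensity(flat, light))   # 1.0  -> fully lit
print(dot3_intensity(bumpy, light))  # 0.95 -> darker, looks embossed
```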
Environment Mapping
We’ve also heard a lot about environment mapping already, since NVIDIA did its best to educate us about this feature as well. It ‘maps the environment’ onto an object, which in layman’s terms could be called ‘mirroring’. There are always a lot of objects in a 3D world that could easily be reflective, and that’s where environment mapping comes in.
Radeon supports all forms of environment mapping, just like NVIDIA’s GeForce chips. Spherical environment mapping as well as dual-paraboloid environment mapping don’t offer very satisfying results, which might be the reason why they are not supported by DirectX 7. However, Radeon is also able to use cube environment mapping, which I will explain once more.
Cube Environment Mapping
Reflection of the surrounding environment on a 3D object may sound rather unimportant, but if you look around your room you might realize that many objects actually do reflect. A bottle of water, a mobile phone, a CD cover, a picture frame, the front of your stereo … these are only a few examples of reflecting objects that could be found in any 3D scene just as well. To make the 3D world more realistic, those objects should show reflections too, and for this purpose cube environment mapping was introduced into NVIDIA’s GeForce last year.
Cube environment mapping is a technique developed by SGI a while ago. The idea is pretty simple. From an object with a reflective surface (and not from the room center, as I have read elsewhere) you render six environment maps, one in each room direction (front, back, up, down, left, right), and use those to display the reflections on this object.
The reflection can be rendered in detail or blurred, and the technique can also be used for more accurate (per-pixel) specular lighting. The viewer/camera can move around the reflecting object without you noticing distortions or other artifacts in the reflection, something that’s not possible with sphere environment mapping. Cube environment mapping is fully implemented in DirectX 7 and will certainly be found in 3D games very soon.
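The lookup itself is simple enough to sketch: compute the reflection vector, then pick the environment map whose direction the vector points at most strongly. The vectors below are invented:

```python
# Cube environment mapping sketch: reflect, then select one of the
# six maps (front, back, up, down, left, right) by the dominant axis.

def reflect(incident, normal):
    """r = i - 2 (i . n) n, with n a unit-length normal."""
    d = sum(i * n for i, n in zip(incident, normal))
    return [i - 2.0 * d * n for i, n in zip(incident, normal)]

def cube_face(r):
    x, y, z = r
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "right" if x > 0 else "left"
    if ay >= ax and ay >= az:
        return "up" if y > 0 else "down"
    return "front" if z > 0 else "back"

# A view ray hitting a reflective floor at 45 degrees.
r = reflect([0.7071, -0.7071, 0.0], [0.0, 1.0, 0.0])
print(cube_face(r))   # 'right': sample the right-hand environment map
```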
Projective Textures
The main difference between a projective texture and a normal texture is that, like a spotlight, it can be projected through a room; each polygon it hits displays a part of this projective texture while casting a shadow behind it. To do this, a priority buffer is required.
Priority Buffers
The idea behind the priority buffer is rather simple. It adds another form of depth information on top of the well-known z-buffer, but it’s a lot less complex. A priority buffer assigns numbers to the polygons in the 3D scene depending on how close they are to the viewer. This buffer is required for shadow mapping. Radeon is meant to be the first architecture that supports this special buffer in hardware.
Shadow Mapping
Once a 3D chip supports the priority buffer, it can take advantage of ‘Shadow Mapping’. This method of generating realistic shadows is superior to the volumetric shadows that use the stencil buffer. ‘Shadow Mapping’ is easier to implement and it is the only method that lets an object cast a shadow on itself.
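ATi gives few details here, so the following is only a plausible sketch of an ID-based shadow test with a priority buffer: polygons are numbered by their distance from the light, and a pixel lies in shadow if the light ‘sees’ a different, closer polygon at that spot:

```python
# Plausible priority-buffer shadow test sketch (not ATi's disclosed
# method): the buffer stores, per cell, the ID of the polygon that is
# visible from the light; IDs are assigned front-to-back from the light.

def in_shadow(light_view_ids, cell, polygon_id):
    """True if a closer polygon blocks the light at this spot."""
    return light_view_ids[cell] != polygon_id

ids_seen_by_light = [1, 1, 2, 3]      # per-cell winner of the light pass

print(in_shadow(ids_seen_by_light, 0, 2))  # True: polygon 1 shadows 2
print(in_shadow(ids_seen_by_light, 3, 3))  # False: polygon 3 is lit here
```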
Range Based Fog
The long list of the Pixel Tapestry engine’s features ends with range-based fog. This one is simple to understand. While the older technique of depth-based fog defines its ‘fogginess’ by the depth of an object, range-based fog defines it by the object’s distance from the viewer.
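The difference is easy to put into a formula. Here is a minimal sketch with an invented linear fog ramp; note how an object off to the side, which is actually farther away, only gets the extra fog with the range-based variant:

```python
# Depth-based vs. range-based fog sketch with a linear ramp to 100 units.

import math

def depth_fog(z, fog_end=100.0):
    """Fog from the z value only."""
    return min(1.0, z / fog_end)

def range_fog(x, y, z, fog_end=100.0):
    """Fog from the true distance to the viewer."""
    return min(1.0, math.sqrt(x * x + y * y + z * z) / fog_end)

# Same depth z = 50, but 40 units off to the side:
print(depth_fog(50.0))              # 0.5  -> side object fogged too little
print(range_fog(40.0, 0.0, 50.0))   # ~0.64 -> correctly thicker fog
```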
Radeon’s Video Features
You certainly remember ATi’s long tradition of MPEG2, DVD, DTV and HDTV support, and Radeon can of course take advantage of the latest developments that ATi has made in this area as well. Instead of going on about all the video details of Radeon, I will simply state that so far ATi cards have always been the best integrated video solution available, and I doubt that it will be any different with Radeon.
Here’s a short list of Radeon’s video features:
- Hardware Motion Compensation
- IDCT / DCT
- Subpicture
- 4×4-tap Horizontal and Vertical Filtered Scaling
- Adaptive Per-Pixel De-Interlacing
- 8-bit Alpha Blending of Video and Graphics
- Direct Support of All DTV formats in their Native Resolution, Up to 1920×1080
Card Details
ATi’s new Radeon cards are supposed to start shipping right now. The 32 MB versions are supposed to cost around $279 US, the 64 MB version around $349 US. Initially the Radeon cards will only be available without video in/out, but in August ATi will start to supply cards with this feature as well. September holds another new Radeon card, the Radeon SDR. This card is supposed to be only 10-15% slower than the cards with DDR memory, but it will go for only $199. A little later you will finally be able to get ATi’s famous ‘All-In-Wonder’ version of the Radeon as well, which will include a TV tuner, EPG software and all the other nifty video functions, such as a programmable, fully digital MPEG2 video recorder function with time shifting.
These are the details of our test card:
Chip | ATi Radeon / Rage 6
Chip Clock | 183 MHz
Power Consumption | 6 W
Process | 0.18 micron
Memory | 5.5 ns DDR SDRAM (Hyundai)
Memory Clock | 183 MHz
Amount of Memory | 64 MB
RAMDAC Clock | 350 MHz
Driver Interface
The driver interface is an area that many ATi customers have come to appreciate for its simplicity combined with full control over the various desktop, 3D and video options. This hasn’t changed much, as things have stayed rather simple within this driver. It maintains the functionality that allows advanced users to adjust most of the commonly used features. Let’s take a look at the available option windows.
Our first window is very simple. The options consist of video out choices and the ability to use presets or schemes.
Nothing too strange in this window, which gives you the ability to adjust your various color levels for the desktop. This is a very basic window for a very simple option.
Driver Interface, Continued
Here you have a slew of OpenGL options, ranging from VSYNC to FSAA enabling. You’ll notice that you can only toggle the FSAA option on but not adjust it. The driver forces a 4X Super Sampling mode if you enable FSAA.
Our Direct3D options are rather scarce but you’ll most likely only need a couple of them. The FSAA option is once again not adjustable but instead locked at a 4X Super Sampling mode.
The options window provides you with basic driver information as well as the ability to toggle the taskbar shortcuts to the display property adjustments.
Overall the ATi driver interface is nearly identical to what we’ve seen with the Rage Fury MAXX. The interface is clean and simple but gives users everything they’ll need for the most part. Power users will probably be more inclined to download and use PowerStrip if they need added functionality.
Test Setup
Graphics Cards and Drivers
Radeon DDR 64 MB | 4.12.3044
GeForce2 GTS, GeForce2 MX, GeForce DDR 32 MB | 4.12.01.0532
Voodoo5 5500 | 4.12.01.0543

Platform Information
CPU | PIII 1 GHz
Motherboard | Asus CUSL2 (BIOS 1000 BETA 013)
Memory | Crucial PC133 CAS2
Network | Netgear FA310TX

Environment Settings
OS Version | Windows 98 SE 4.10.2222 A
DirectX Version | 7.0
Quake 3 Arena | Retail version; command line = +set cd_nocd 1 +set s_initsound 0; OpenGL FSAA set to 4X Super Sampling
Expendable | Downloadable demo version; command line = -timedemo; D3D FSAA set to 4X Super Sampling
Evolva | Rolling Demo v1.2 Build 944; standard command line = -benchmark; bump mapped command line = -benchmark -dotbump
MDK2 | Downloadable demo version; T&L = On
Benchmark Expectations
With the obvious raw fill rate disadvantage, I expect to see the Radeon take some losses in 16-bit color and at low resolutions. However, I do feel that we’ll see a few ties or minor wins at the higher color depths and resolutions thanks to the HyperZ technology. T&L benchmarks should be very close, as both NVIDIA and ATi are claiming some serious numbers for their respective hardware implementations. Let’s take a look now at the tests to see what the story really is.
Test hardware marked with an ‘*’ was unable to run the given test due to memory or driver issues.
Benchmark Results
Benchmark Results – Quake 3 Arena Demo001
Although the Radeon doesn’t take a commanding lead in our Quake 3 16-bit tests, it does manage to do a decent job of staying with the rest of the pack. Only the mighty GeForce2 towers over it at this point.
Things start looking up for the Radeon as the benchmark goes into High Quality mode and the resolution rises. You can see the efficiencies of HyperZ start to kick in, as the Radeon gains a tremendous amount of ground quickly when the resolution goes up.
Benchmark Results – Quake 3 Arena Demo001
As you can see from the results, the Radeon is hanging right in there with the various NVIDIA offerings. You’ll notice that the card is mostly limited by the CPU and not by fill rate, as the score remains the same across the board.
The Radeon stands its ground as the fill-rate demands increase while the GeForce2 slowly trails off and loses the lead.
Benchmark Results – Quake 3 Arena Demo001 FSAA
Unfortunately, only two cards are able to do this FSAA mode properly. The Voodoo5 5500 does have a slight image quality edge in this mode, but it’s very hard to tell the two pictures apart; it takes very close examination to notice the differences at this setting. When we bring our high color and resolution FSAA modes into the testing, the Radeon rages forward to take a commanding lead over the 3dfx offering. ATi can probably chalk this win up to HyperZ, as it saves precious memory bandwidth and processing time.
Benchmark Results – Expendable Demo
Our newcomer continues to wade in the midst of its competitors in 16-bit color. Things look brighter once the resolution gets pumped up a bit.
As predicted, the Radeon not only reaches but surpasses the GeForce2 at 32-bit as the resolution rises.
Benchmark Results – Expendable Demo FSAA
Once again only the Radeon and Voodoo5 5500 are able to do the demanding 4X FSAA mode our testing required. The 3dfx monster dominates the 16-bit tests but begins to buckle once the game is switched to 32-bit color.
Benchmark Results – Dagoth Moor Zoological Gardens
With the GeForce2 and Radeon both sporting fully functional T&L units, the only thing keeping them apart is fill rate performance. You’ll see the GeForce1 go from slightly faster to slightly slower when the test makes the transition from 16-bit to 32-bit color.
Benchmark Results – Evolva Rolling Demo
The Radeon doesn’t do as well as I thought it would in this test, even though it is a 16-bit mode. As the resolution rose, the board actually seemed to do worse, which is unusual.
The Radeon does much better now that the color depth is back up to 32-bit again. It keeps a steady 2nd place until it overtakes the GeForce2 at the highest resolution.
Benchmark Results – Evolva Rolling Demo Bump Mapped
Our new Evolva demo had the Bump Mapping feature available so we decided to add it to our suite of testing. It should tax the fill rates of our test subjects a bit more with this feature toggled on.
If you compare the positioning of these bump mapped results to the non-bump mapped, you’ll see that things haven’t changed for the most part. Only at the two highest settings does the Radeon start pulling ahead.
Benchmark Results – MDK2 Demo
Our final test is heavily dependent on T&L first and fill rate performance second. You’ll notice that the Voodoo5 5500 struggles throughout, because it must rely on the CPU for its T&L calculations. The Radeon does respectably well but falls far behind the GeForce2 and GeForce DDR powerhouses.
Things continue to be very consistent with the Radeon in 32-bit color as it sits in the back of the pack until the resolution setting makes its way upwards to 1600×1200.
Apology
I’d like to apologize to all of you who are missing results for a GeForce2 GTS 64 MB card. Unfortunately my 3D specialist Silvino, who helped me with the benchmarks, did not have any 64 MB GeForce2 GTS card in his lab at the time of the tests. Those cards happen to be a bit faster than the 32 MB versions at resolutions above 1024×768, and they are able to run 1600x1200x32 in D3D as well as 640x480x32 and 800x600x32 in 4X FSAA. Still, I can assure you that spot checks done by me have shown that Radeon keeps a continuous lead over GeForce2 GTS in 32-bit color at resolutions above 1024×768, even if the NVIDIA chip is equipped with 64 MB and even if it runs with its memory overclocked to 366 MHz. I will supply you with updated results as soon as I have slept a bit. I will also add the resolutions between 1024×768 and 1600×1200 for those of you who missed them in our benchmarks; running those tests would have delayed this review by about a day.
Conclusions
You have read the endless list of 3D features and you have seen all the benchmark results. What do you say?
First of all I’d like to note that Radeon’s benchmark results at 16-bit color look worse than they are, simply because numbers can’t give you a feel for game play. I understand if some of you complain about Radeon’s 16-bit performance, but the 16-bit scores of Radeon are only an issue if you happen to play at 1280x1024x16 or 1600x1200x16, because Radeon’s scores are definitely good enough at all the lower 16-bit resolutions. In those two resolutions GeForce2 GTS is clearly and utterly beating the Radeon. This doesn’t come as a surprise, because the NVIDIA chip does not suffer from memory bandwidth limitations as much in 16-bit color as it does in 32-bit color, making it able to reach at least 70% of its claimed fill rate. Radeon’s HyperZ is also not as effective in 16-bit color, which is why Radeon’s scores are almost identical in 16-bit and 32-bit color.
Things are a lot different when you look at the results in 32-bit color. You might be missing the scores at 1152×864 and 1280×1024, but believe me, as soon as the resolution climbs past 1024×768, Radeon is ahead of the rest, thanks to its HyperZ feature. The same goes for FSAA. Let’s be honest, why should somebody who is so much into image quality that he uses FSAA settle for anything less than 32-bit color?
I personally like the Radeon, which is mainly due to the elegant ‘HyperZ’-feature with the stupid name. ATi has shown that the memory bandwidth issue can be tackled in a different way than with pure brute force. It’s like a light and fast sports car with a much better fuel consumption thanks to smart technology.
Radeon is indeed up there with the cream of the crop when it comes to 32-bit performance, and the chip comes with a wealth of new 3D features. Additionally you get the best integrated video, DVD and HDTV solution that money can buy right now. What is NVIDIA supposed to say? That 16-bit color is more important than 32-bit color? I don’t think so, since it was NVIDIA who told us how important 32-bit color is back in 1999, when 3dfx’s Voodoo3 was unable to support it.
I am pretty sure that Radeon’s performance will improve further once the drivers have matured a bit. I certainly look forward to the luxurious ‘All-In-Wonder’ Radeon that’s supposed to be released in early fall of this year. The SDR Radeon might mix up NVIDIA’s GeForce2 MX sales as well, because it is meant to offer performance close to that of the DDR Radeon for a rather low price.
As I already said, I like the Radeon. I like it because I prefer intelligent technology to brute force. That’s why I also prefer a Porsche 911 Turbo to a Dodge Viper.
Please follow up by reading the article Update: ATi’s Industry Shaking Radeon revisited.