3D Benchmarking - Understanding Frame Rate Scores
Introduction
Quite a while ago I saw an urgent need to clarify the situation and status of 3D benchmarking as a tool for evaluating 3D accelerator chips and cards. Except for the crazy Rambus hype, there's hardly any other area of the PC-testing scene where as much wrong information, misunderstanding and false conclusion is published as in the 3D arena. The reason is quite simple and reminds me a bit of my medical practice. Everybody who uses 3D cards, and particularly the so-called 'hard core gamers', is convinced they know a lot about 3D graphics, simply because they use it. It's just the same with all those people who give medical advice, or who think they can tell their doctor what kind of treatment he should prescribe, simply because everybody believes he knows something about medicine, due to the fact that everyone has a body and some kind of experience with diseases or injuries. Generalizing from those experiences is typically human, and I don't blame anybody for it, but I'd like to point out that it can have very damaging effects. Adding to the confusion is what we doctors call third-hand or tertiary literature, the typical 'medical advice columns' that you find in any tabloid magazine. The same holds true for hardware websites: 3D is hip, and even when the knowledge isn't deep, it's still attractive to write about.
3D Games Have Become the Common 3D Benchmark Applications
Today it has finally become common to use 3D games, or at least game-like 3D scenes, as benchmarks for 3D chips and cards. Traditionally, the most commonly used benchmarking software is id Software's Quake family, consisting of Quake, Quake 2 and Quake 3 Arena. However, Quake's archenemy Unreal, and now Unreal Tournament, has also become something of a benchmark standard, because Unreal Tournament's graphics engine is simply the best looking on the market. Benchmarking with the two produces very different results, however.
When you run Quake 3 or Unreal Tournament as a 3D benchmark, your system is under heavy demand. The processor has to do a lot of work, and so does the 3D card. While the processor's performance is influenced by the motherboard chipset and the memory, the 3D chip is influenced by its video or onboard memory as well. This is what makes 3D benchmarks so beautiful: they test almost every component of your system except for mass storage devices like the hard drive and CD-ROM. Unfortunately, a 3D benchmark produces only one number, called the 'frame rate'. This number can only express the combined performance of all the system components listed above. It's more difficult to pinpoint individual components with 3D benchmarks, but it is possible. Let's list the three most important factors behind the frame rate score of a 3D benchmark:
1. The Impact of the Platform
(Processor, Motherboard/Chipset, System Memory, Graphics Bus Type and Software + 3D Card Driver)
The motherboard with the chipset, the memory, the processor and the PCI or AGP slot can be seen as one unit when you want to benchmark a graphics card. For simplicity I will call these components 'the platform' from now on.
The platform is responsible for providing the 3D scene with all its players, objects and light sources for each frame, and it also calculates the game AI as well as any special kind of motion. The geometry calculations, today called 'transform and lighting', have to be done by 'the platform' as well, either entirely (for cards without T&L) or in part (for cards with T&L). Once a frame is calculated, the vertices and textures need to be sent to the 3D card, obviously through the bus, which is PCI or AGP 1x, 2x or 4x. The faster 'the platform' is, the more frame data it can send to the 3D card. If 'the platform' is not fast enough, it stalls the 3D card and thus lowers the frame rate.
What is important to note is that ‘the platform’ doesn’t care whatsoever about the screen resolution of the 3D game. For ‘the platform’ it’s just the same if Quake 3 runs at 320×240 or at 1920×1440. The reason why is simple. ‘The platform’ sends VERTICES over to the 3D chip. The relative coordinates of those vertices don’t change with different resolutions.
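As a rough sketch of this point: the data 'the platform' has to push over the bus depends only on the scene's vertex count, while the 3D chip's rasterization load depends on the resolution. The vertex size and vertex count below are made-up illustrative numbers, not figures from any particular game:

```python
# Illustrative sketch: the platform's workload is resolution-independent,
# while the 3D chip's pixel workload grows with resolution.
# VERTEX_SIZE and SCENE_VERTICES are assumed values for illustration only.

VERTEX_SIZE = 32          # assumed bytes per vertex (position, color, texture coords)
SCENE_VERTICES = 10_000   # assumed vertex count for one frame of a Quake 3-class scene

def platform_bytes_per_frame():
    """Data 'the platform' sends over PCI/AGP -- note: no resolution term at all."""
    return SCENE_VERTICES * VERTEX_SIZE

def chip_pixels_per_frame(width, height):
    """Pixels the 3D chip has to render -- scales directly with resolution."""
    return width * height

for w, h in [(320, 240), (640, 480), (1920, 1440)]:
    print(f"{w}x{h}: platform sends {platform_bytes_per_frame():,} bytes, "
          f"chip renders {chip_pixels_per_frame(w, h):,} pixels")
```

Whatever the resolution, the platform's share of the work stays the same; only the chip's pixel count changes.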
The system also doesn't care much about the color depth. It does care to some extent in terms of memory and bus bandwidth (especially if 32-bit textures are used), but this is negligible in most of the 3D benchmarks in use right now.
We can conclude that 3D benchmarks will hardly show any performance change across different resolutions and color depths if 'the platform' is the bottleneck.
This graph shows you how a 3D benchmark looks when 'the platform' is the limiting factor. The frame rate won't decrease at higher resolutions or higher color depths, because the 3D chip is permanently waiting for 3D data from 'the platform'. In this case a faster 3D chip won't get you any higher frame rates.
Unreal Tournament has a rather inefficient engine when it comes to its use of 'the platform'. Therefore a fast 3D chip will always be waiting for 'the platform':
You can see that there's hardly any change in frame rate across the resolutions, particularly in the case of the Celeron 600 system. What you can also see is that a faster CPU translates directly into higher frame rates. For Unreal Tournament you are better off with a fast CPU than with the fastest graphics card.
The situation is similar if you use a processor that's not fast enough to evaluate different 3D cards. If 'the platform' is the bottleneck, you will get identical frame rates with completely different 3D cards. Many 'reviewers' have claimed in such cases that "the different cards perform almost the same!", simply because they were using a slow platform. Always make sure that 3D card evaluations use a platform that is at least as fast as your own! Otherwise the results won't help you at all!
2. The Impact of the Fill Rate
After taking care of 'the platform', the 3D card is the only thing left. The 'fill rate' describes the number of pixels that a 3D solution can render in a given amount of time. We all know that a frame consists of a certain number of little dots, called 'pixels'. Each screen resolution requires a certain number of pixels: the common resolution 640×480 is made up of 307,200 pixels, while a high resolution such as 1600×1200 requires 1,920,000 pixels. The 3D chip has to render each pixel of a frame before the frame can be displayed. The 'frame rate' is defined as the number of frames that can be displayed in a certain amount of time. It's easy to see that it requires a lot more rendering performance to sustain a given frame rate at a high resolution than at a low one. This is why 3D cards typically score high frame rates at 640×480 and lower frame rates at 1600×1200. After all, the 3D chip has to render more than six times as many pixels per frame at 1600×1200 as at 640×480.
Nowadays 3D chips have several rendering pipelines that operate in parallel. Such a pipeline is usually able to render one pixel per clock cycle. Thus the maximum pixel fill rate is the 3D-chip clock times the number of rendering pipelines, times the number of chips if more than one 3D chip is used on a card. A typical example is NVIDIA's new GeForce2 GTS chip, which is clocked at 200 MHz and comes with 4 rendering pipelines: 4 pixels × 200 million/s = 800 million pixels/s. 3dfx's Voodoo5 5500 is clocked at 166 MHz, each chip has two rendering pipelines, and the card comes with two chips: 2 pixels × 166.67 million/s × 2 ≈ 667 million pixels/s.
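The peak fill rate arithmetic above can be written down as a small helper. The numbers for the two cards come straight from the text, with the Voodoo5's clock taken as 166.67 MHz so the product comes out at roughly 667 Mpixels/s:

```python
def peak_fill_rate(clock_mhz, pipelines, chips=1):
    """Theoretical maximum pixel fill rate in Mpixels/s:
    chip clock x rendering pipelines x number of chips."""
    return clock_mhz * pipelines * chips

# GeForce2 GTS: 200 MHz, 4 pipelines, single chip
print(peak_fill_rate(200, 4))                # 800 Mpixels/s
# Voodoo5 5500: ~166.67 MHz, 2 pipelines per chip, 2 chips
print(round(peak_fill_rate(166.67, 2, 2)))   # 667 Mpixels/s
```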
Now, without taking triangle size, T&L and hidden surface removal into consideration, one can still say that if the fill rate remains constant, the frame rate will go down as the resolution goes up. Ideally, you find the highest frame rate at the lowest resolution and see it fall continuously as the resolution increases.
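Under that idealized assumption, the fill-rate-limited frame rate is simply the fill rate divided by the pixels per frame. A minimal sketch, ignoring overdraw, triangle size, T&L and hidden surface removal as stated above:

```python
def fill_limited_fps(fill_rate_mpixels, width, height):
    """Upper bound on the frame rate when pure fill rate is the only limit."""
    return fill_rate_mpixels * 1_000_000 / (width * height)

# An 800 Mpixels/s chip (GeForce2 GTS class):
print(round(fill_limited_fps(800, 640, 480)))    # 2604 fps ceiling
print(round(fill_limited_fps(800, 1600, 1200)))  # 417 fps ceiling
```

Real scores come out far lower, of course: the platform caps the low resolutions and memory bandwidth caps the high ones.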
In most real-world applications this isn't the case. Most of the time you will see the frame rates at the lowest resolutions being almost identical, until the slope finally begins. This is due to the limitation of 'the platform' discussed above. At low resolutions the 3D chip is stalled, because it is able to process data faster than the platform can deliver it. This effect diminishes as the resolution increases, which is one reason why the slope usually starts gently.
The next thing you might have noticed in the schematic fill rate chart above is that I kept the frame rate scores at 32-bit color at the same level as at 16-bit color. This might appear strange to you, because you would never see this behavior in real-world applications. In fact, from the 3D chip's point of view, rendering a frame in 32-bit color is pretty much the same as rendering it in 16-bit color. As long as the rendering engine is able to handle 32-bit wide data, which for example is not the case with 3dfx's Voodoo3 chip, the pixels can be rendered in exactly the same amount of time. Thus, as long as an application is limited purely by fill rate, the frame rates at 32-bit color should be the same as at 16-bit color. Don't ever forget that!
3. The Huge Impact of the Memory Bandwidth
In the past, and that means up until fairly recently, the memory bandwidth of the local graphics memory wasn't much of an issue. Hardly any 3D chip before NVIDIA's GeForce256 was ever really limited by its memory. When the GeForce256 was released in October 1999, it came with SDR memory at a 166 MHz clock. The release of the famous 'GeForce DDR' cards, which were nothing other than the same chip with faster memory, showed how much a fast 3D chip can be stalled by slow memory. Things have become even worse with NVIDIA's latest high-end chip, the GeForce2 GTS, and 3dfx's latest Voodoo5 5500 card suffers from the same problem even a bit more.
I am showing you this diagram once again to point out how much strain the local memory of a modern 3D card is really under. Each red arrow steals a bit more of the available memory bandwidth.
- First of all the local memory hosts the frame buffer, which consists of a front and a back buffer and in case of triple buffering even a third one. Those buffers have exactly the size of the screen resolution times the color depth. The frame buffer needs to be accessed by the rendering unit for each pixel several times.
- The Z-buffer is also as big as the screen resolution times the Z-buffer depth, and it gets accessed like crazy. You get an idea of how heavily the Z-buffer taxes memory bandwidth when you realize that Intel added the 'display cache' option to the integrated 3D graphics of the i810, solely to host its Z-buffer. This 'display cache', an external Z-buffer, improves the i810's 3D performance considerably, because the Z-buffer is the most frequently accessed part of graphics memory.
- Then there is the texture buffer, which holds compressed or uncompressed textures so that the rendering unit can access them faster than if it had to fetch them from main system memory through the AGP. Again, textures need to be read several times for each pixel, depending on the filtering option and the number of textures applied per pixel.
- I am not quite sure how much impact a T&L unit has on memory bandwidth, but you can be sure that it takes at least a small share of it as well.
- Last but not least there is the RAMDAC, which needs to read the front frame buffer to display it on the screen. The higher the resolution and the higher the refresh rate, the more often the RAMDAC has to access the frame buffer. You might think that this is no longer an issue today, but you are sadly mistaken! A 3D card that is already limited by its memory bandwidth, such as a GeForce2 GTS card, reacts extremely sensitively to high refresh rates. I measured an impact of over 15% at 1600×1200 in 32-bit color when I switched between 60 and 85 Hz refresh rates. At lower resolutions it is still an issue.
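Two of these consumers are easy to estimate with simple arithmetic. The RAMDAC scanout figure follows directly from resolution × color depth × refresh rate; the per-pixel rendering traffic below uses assumed byte counts (one color write, one Z read, one Z write, two texture fetches), which is only an illustrative model, not measured data:

```python
def scanout_bandwidth_mb(width, height, bytes_per_pixel, refresh_hz):
    """Bandwidth the RAMDAC consumes reading the front buffer once per refresh."""
    return width * height * bytes_per_pixel * refresh_hz / 1e6

# 1600x1200 at 32-bit (4-byte) color:
print(scanout_bandwidth_mb(1600, 1200, 4, 60))  # 460.8 MB/s
print(scanout_bandwidth_mb(1600, 1200, 4, 85))  # 652.8 MB/s

def per_pixel_traffic(color_bytes, z_bytes, texture_reads, texel_bytes):
    """Assumed memory traffic per rendered pixel: one color write, one Z read,
    one Z write, plus texture fetches (count depends on filtering and the
    number of textures applied per pixel)."""
    return color_bytes + 2 * z_bytes + texture_reads * texel_bytes

print(per_pixel_traffic(4, 4, 2, 4))  # 20 bytes per pixel under these assumptions
```

Multiply that per-pixel figure by the fill rate and you quickly reach several GB/s, which is why every one of the red arrows hurts.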
After you’ve seen how often the local memory needs to be accessed for each pixel, you can imagine why the impact of memory bandwidth on frame rates increases as screen resolution and color depth go up.
At low resolutions and 16-bit color the memory bandwidth doesn't usually limit the chip. However, even at only 16-bit color depth, at high resolutions the memory bandwidth already has a hefty impact on the frame rate, regardless of how high the theoretical fill rate of the 3D chip may be. Things get a lot worse at 32-bit color depth. You will see the frame rate almost halve wherever the memory bandwidth was already the bottleneck at 16-bit color. At 32-bit color the amount of data that needs to be transferred between the 3D chip and the local memory almost exactly doubles. This is why the frame rates at 32-bit color are always lower than at 16-bit color, unless there was excess memory bandwidth at 16-bit color.
This memory bandwidth issue has to be kept in mind when reading the fill rates claimed for a chip. People who, for example, overclock a GeForce2 GTS chip to 250 MHz and then claim a fill rate of 1 Gpixel/s are talking complete crap. To reach that fill rate, the GeForce2 GTS would require its memory to run at 600 MHz.
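The back-of-the-envelope check behind that statement can be sketched as follows. The 12 bytes per pixel (a 32-bit color write plus a 32-bit Z read and write, ignoring texture traffic) and the 128-bit bus width are assumptions of this model, so the exact MHz figure depends on what per-pixel traffic you assume:

```python
def required_mem_clock_mhz(fill_mpixels, bytes_per_pixel, bus_width_bytes=16):
    """Effective memory clock needed to sustain a claimed fill rate,
    assuming bytes_per_pixel of memory traffic per rendered pixel on a bus
    that moves bus_width_bytes per clock (16 bytes = 128-bit interface)."""
    bytes_per_second = fill_mpixels * 1e6 * bytes_per_pixel
    return bytes_per_second / (bus_width_bytes * 1e6)

# 1 Gpixel/s with color write + Z read/write at 32 bits each:
print(required_mem_clock_mhz(1000, 12))  # 750.0 MHz effective -- far beyond what the card's memory delivers
```

Whatever the exact per-pixel assumption, the required effective memory clock lands far above what any card of this generation actually ships with, which is the point of the argument above.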
Summary
The chart above shows pretty much how an average 3D game benchmark chart looks. At low resolutions the platform limits the frame rate, keeping the line flat in this area. Then, at 16-bit color, the fill rate limitation kicks in, and at higher resolutions the frame rate takes another hit from the memory bandwidth limitation. The scores at 32-bit color are lower than the 16-bit color scores: at low resolutions the difference is only small, while at higher resolutions the frame rates at 32-bit color are only half the scores at 16-bit color. The 3D chip is never quite able to deliver its theoretical maximum fill rate. At low resolutions it is limited by platform performance, waiting for the CPU to deliver the 3D data, and at high resolutions the memory bandwidth limitation makes high fill rates impossible.
The Future ..?
Future 3D chips will need much faster memory interfaces if we want high frame rates at high resolutions or with full-scene anti-aliasing. A chip that can render 2 Gpixel/s will be stalled permanently if it doesn't get a memory bandwidth of at least 12 GB/s. The alternatives are solutions that reduce the memory bandwidth requirements, such as ATi's 'HyperZ' technology in the upcoming Radeon chip. Besides that, faster platforms with faster processors, faster memory (please, no RDRAM!) and faster AGP will help a lot too. However, more memory bandwidth is the most important requirement for future 3D solutions.