Introduction
It all started with the 3Dfx Voodoo chip, when performance-hungry users found out that a 3D chip can be overclocked just like a processor. The point of overclocking is evidently to improve performance, and that is what was achieved by altering the clock speed of the Voodoo, Voodoo2, Rendition V2k, TNT and many more. The last generation of 3D chips was even officially differentiated by clock speed, so overclocking 3D chips has become an almost common thing. With this experience in mind it comes as no surprise that NVIDIA's new GeForce256 would be the next overclocking victim, and we at Tom's Hardware decided not only to crank up the core and memory clocks of the chip, but also to evaluate the effects of those procedures in detail.
Expectations
A – Memory Bandwidth
Last week you learned from our first GeForce article that the memory interface is the weak spot of NVIDIA's new chip. If memory bandwidth is indeed the limiting factor, then raising the memory clock should have a bigger impact than raising the core clock, particularly on the SDR board.
B – Rendering Pipeline
GeForce's core clock is currently 'only' 120 MHz. This is a lot less than what we are used to from high-performance 3D chips, but we should not forget that GeForce is a lot more complex than its predecessors, especially due to its integrated T&L engine. GeForce also has twice as many rendering pipelines as the last generation of high-end 3D chips, which is another reason why 120 MHz might be good enough for the time being. Let's have a quick look at the numbers: 4 pipelines that can each render one single-textured pixel per clock, running at 120 MHz, yield a theoretical maximum fill rate of 480 Mpixels/s. The (rather stupid) figure in 'Mtexels/s' is identical, also 480. Please spare me from explaining why; the unit 'texels/s' was invented by 3Dfx to market its Voodoo2 chip, and the world would do a lot better without it. Just keep in mind that 'texels/s' is only for marketing guys or other people who talk a lot without much knowledge; technicians and engineers detest it.
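If you want to play with the numbers yourself, here is a minimal sketch of that fill-rate arithmetic; the pipeline count is GeForce's specification, and the 145 MHz figure is the overclocked SDR core speed from our tests:

```python
# Theoretical maximum fill rate = pipelines x pixels per clock x core clock.
PIPELINES = 4          # GeForce256 rendering pipelines
PIXELS_PER_CLOCK = 1   # one single-textured pixel per pipeline per clock

def fill_rate_mpixels(core_mhz):
    return PIPELINES * PIXELS_PER_CLOCK * core_mhz

print(fill_rate_mpixels(120))  # stock core:       480 Mpixels/s
print(fill_rate_mpixels(145))  # overclocked core: 580 Mpixels/s
```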
Anyway, 480 Mpixels/s is not quite as much as we would expect from a groundbreaking new 3D chip, so we should expect GeForce's performance to improve after raising the core clock beyond 120 MHz. Having said that, I'd like to remind you that a 3D pipeline can only render a pixel per clock if the data supplied by the geometry engine as well as by the local memory is there and ready. GeForce's pipeline could be nastily stalled by its memory bandwidth, particularly in the case of the SDR board, so we might never even see the 480 Mpixels/s. In that case we can raise the core clock as much as we want and we won't get much of a performance increase at all.
C – T&L-engine
We are talking about GeForce, the groundbreaking new 'GPU', so we certainly should not forget that overclocking the core will also improve the speed of the transform and lighting engine. The question is whether we will see any improvement in any of the current game applications. Even if a game uses the T&L engine, we still don't know whether the rendering pipeline, the memory bandwidth or the T&L engine is limiting the game's frame rate. We will try to shed some light on this complex issue with this article.
Testing Methodology
To determine the core and memory frequency limits of each board (SDR & DDR), we ran several games, raising the clocks until we hit each board's limit. So how did we decide whether a particular setting was stable enough to pass? We ran the highest given setting through our full benchmark suite, and if the tests didn't crash or show visual defects, it passed. Keep in mind that this doesn't guarantee that all cards will overclock this far or stay predictably stable in every software application. Note also that although both boards carry the same core chip, the peak core speed of one of our boards was higher than the other's, and unfortunately it wasn't the DDR board. After finding our test points, we ran each setting through our usual graphics benchmark suite along with the new DMZG (Dagoth Moor Zoological Gardens) benchmark. All tests were run at high resolutions to stress the graphics cards.
Clock Frequency Adjustments
With a few changes to the registry we were able to dig out some hidden display property settings that allowed us to adjust all the clocks we needed. As we've seen with the TNT2-based cards, OEMs have provided overclocking utilities with their cards in the past so that users can push their hardware to the limit, and we are pretty sure the same will hold true for GeForce-based boards. With that said, I doubt anyone will have to fiddle with registry settings like we did to adjust the card's clock frequencies.
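For the curious, the tweak we used is along the lines of the widely reported 'CoolBits' registry value that unlocks NVIDIA's hidden clock-frequency tab. Treat the exact key and value as an assumption, since NVIDIA doesn't document them, and back up your registry before trying anything like this. A minimal sketch in Python:

```python
# Hypothetical sketch: enable NVIDIA's hidden clock-adjustment panel.
# The "CoolBits" value under NVTweak is the widely reported tweak; the
# exact key name and value are assumptions, not documented by NVIDIA.
import winreg  # Windows only

key = winreg.CreateKey(
    winreg.HKEY_LOCAL_MACHINE,
    r"Software\NVIDIA Corporation\Global\NVTweak")
winreg.SetValueEx(key, "CoolBits", 0, winreg.REG_DWORD, 0x3)
winreg.CloseKey(key)
```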
So what do the core and memory clock changes do for you? We will keep this explanation simple, because there are so many factors involved. A faster core clock increases the performance of everything processed within the graphics chip; in the case of the GeForce this means a higher triangle rate, a higher fill rate and stronger T&L performance. As for memory performance (we're talking speed, not size), the demand for memory bandwidth grows with high resolutions, triple buffering, complex filtering, high depth complexity, a large Z-buffer and high color depths. With increased bandwidth, you are able to process more screens of information per second.
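To put rough numbers on that demand, here is a simplified back-of-the-envelope sketch of the memory traffic a frame can generate. It only counts color and Z writes; real rendering adds Z reads, texture fetches and overdraw on top, so take the figures as illustrative lower bounds:

```python
# Rough per-frame memory traffic (color + Z writes only, no overdraw,
# no texture reads). Purely illustrative numbers.
def write_traffic_gb_s(width, height, color_bytes, z_bytes, fps):
    return width * height * (color_bytes + z_bytes) * fps / 1e9

# 1024x768, 32-bit color, 32-bit Z, 60 frames per second:
print(write_traffic_gb_s(1024, 768, 4, 4, 60))   # ~0.38 GB/s
# 1600x1200 at the same settings:
print(write_traffic_gb_s(1600, 1200, 4, 4, 60))  # ~0.92 GB/s
```

Multiply those numbers by the scene's depth complexity and add the texture reads, and you can see how quickly even a few GB/s of memory bandwidth get eaten up.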
Here is a peek at what we were working with.
You'll use this panel to set up the core and memory adjustments to your liking. Once you're ready to move forward, you have to test each setting before it's applied. When you run the test you get the following messages.
Voila! As long as your computer didn't freeze, you'll be using your new settings after hitting the OK button. Even if the built-in test passes, that doesn't necessarily mean the board will be stable. We still had to run the board through our graphics test suite before we labeled a setting stable.
The Benchmark Setup
Before looking at the benchmark scores, let's spell out the performance gains we expect from our various test settings. In particular, we anticipate that the SDR GeForce will be starved for memory bandwidth at mid to high resolutions, so we expect its gains to come mainly from overclocking the memory. As for the DDR board, we figure its memory performance should be more than enough, so only the core settings should alter its overall performance.
Hardware Information
CPU | PIII 550
Motherboard (BIOS rev.) | ABIT BX6 2.0 (BIOS date 7/13/99)
Memory | 128 MB Viking PC100 CAS2
Network | Netgear FA310TX

Driver Information
NVIDIA GeForce 256 | 4.12.01.0347
ATI Rage Fury Pro | 4.11.6713
NVIDIA TNT2 Series | 4.11.01.0208
Voodoo3 Series | 4.11.01.2103.03.0204
Matrox G400 Series | 4.11.01.1300 (5.30.007)

Environment Settings
OS Version | Windows 98 SE 4.10.2222 A
DirectX Version | 7.0
Quake 3 Arena | v1.08; command line = +set cd_nocd 1 +set s_initsound 0
Shogo | v2.14; Advanced Settings = disable sound, disable music, disable movies, disable joysticks, enable optimized surfaces, enable triple buffering, enable single-pass multi-texturing; High Detail Settings = enabled
Descent III | Retail version; Settings = -nosound -nomusic -nonetwork -timetest
DMZG | Demo version; Command line = -bench -tl on
The Benchmark Results – Shogo
Shogo is a DX6 game that uses dual-texturing and some lighting; it has neither a high depth complexity nor a high polygon count. It does not use GeForce's T&L engine, and you cannot switch it to 32-bit color depth.
Taking a look at the numbers you can see that the SDR GeForce lacks the memory bandwidth to keep up with its DDR big brother. Note that the SDR board gains more from boosting the memory speed than the DDR board.
Sliding the resolution up pretty high, we begin to hit the fill-rate ceiling of all the cards. Notice how much the overclocked core speeds help the frame rate, but don't overlook that memory bandwidth is also a big issue here, particularly in the case of the GeForce w/SDR. The Voodoo3-based boards do not support this resolution in Shogo.
The Benchmark Results – Descent3 DirectX
Descent 3 is another DX6 game that demands rather little T&L power from the CPU thanks to its low depth complexity and low polygon counts. It uses multi-texturing and lighting effects, but you cannot switch it to 32-bit color, and it doesn't take advantage of GeForce's T&L either.
It appears that the CPU is the bottleneck in this benchmark, which makes it tough to draw any solid conclusions. The resolution isn't really high enough to reach the GeForce's fill-rate limit.
Here we can see that memory bandwidth plays a huge role in performance as the stock DDR board leaps 8 FPS ahead of the stock SDR board. Although upping the core speed helps, the memory performance of SDR is mainly what’s holding back the board.
The Benchmark Results – Descent3 OpenGL
Here the gaps between the DDR board and the higher-clocked SDR boards are striking. Boosting the core speed netted very little, while raising the memory clock helped a great deal. Compare the SDR board at 120/166 with the SDR board at 145/166: the faster core gained us a minor 0.9 FPS over the base configuration, while taking the memory from 166 MHz to 183 MHz gained 4.2 FPS. In other words, the SDR board gains practically nothing from a faster core but almost 10% from a faster memory clock. Now wander over to the DDR board and note how it actually does gain a few frames from the overclocked core speed. The basic rule of thumb: you need to overcome the memory bandwidth problem before raising the core speed does you any good. Again, the Matrox and Voodoo3 boards wouldn't run Descent 3 in OpenGL mode.
The SDR and DDR GeForce boards both gain some ground when boosting the memory speed. Of course we expected this for the SDR board, but it's a surprise to see that even the DDR board is bottlenecked by memory performance.
The Benchmark Results – Descent3 Best API
We've pretty much gone over the GeForce scores in both of this game's APIs, but we wanted to give you an idea of how the other cards fared against the overclocked GeForce. The Voodoo3 3500 is one tough cookie when running in Glide, as only the overclocked DDR GeForce boards best it.
Taking the resolution up to a hefty setting makes most of the competition drop off quickly, while the GeForce holds its ground and a respectable lead. The highest-clocked GeForce is almost 20 FPS faster than the next fastest chipset.
The Benchmark Results – Quake Arena – Normal
Quake 3 is one of the most advanced games currently available. As an OpenGL-based game it has the chance to take advantage of GeForce's integrated T&L, but rumor has it that only the next version of Q3Test will include support for it. So far the lighting is mainly done with light maps rather than lighting calculations, but Q3 has a higher polygon count and depth complexity than the games above, and it gives you the option of 32-bit color depth.
There isn't much to talk about here other than to point out that the CPU has become the bottleneck even at this high resolution; the scores of all the different GeForce flavors are nearly the same. We would like to mention that we've seen an Athlon coupled with a DDR board break 100 FPS in this very same test.
Once again the DDR board shows us why everyone should opt for the faster memory solution. Both boards gain from boosting the core and memory clocks, but the DDR board's superior memory performance is a huge jump in itself. The overclocked GeForce DDR board almost doubles the score of the next fastest chipset, the TNT2 Ultra.
The Benchmark Results – Quake Arena – High Quality (32-bit)
In Quake 3 Arena with all the features turned up, the DDR board shows us the true power of the GeForce, but it's quite surprising to see that memory bandwidth is clearly the limiting factor. Even the 4.8 GB/s of the DDR board is not enough to feed the rendering pipeline fast enough, which is why we see hardly any gain from overclocking the DDR board's core, but a rather remarkable gain from increasing its memory clock. This 32-bit test obviously strains the memory subsystem of the GeForce cards: the SDR board mainly gains performance when you bump up the memory speed, and while the DDR board is not hit as hard, it too gains more from the memory clock than from the core.
If memory bandwidth is the limiting factor already at 1024×768, then we cannot possibly be surprised by the scores at 1600×1200, a resolution that stresses the memory interface even harder. The stock DDR board is about 40% faster than the stock SDR board in this test, and it gains exactly 10% in frame rate after we increase its memory bandwidth by 10%. It's rather obvious: even the DDR board's memory bandwidth is too low.
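As a reminder of where the DDR board's 4.8 GB/s come from, here is the bandwidth arithmetic for both boards, assuming GeForce256's 128-bit memory interface; the DDR board transfers data on both clock edges:

```python
# Peak theoretical memory bandwidth of the two GeForce256 boards.
BUS_WIDTH_BYTES = 128 // 8  # 128-bit memory interface

def bandwidth_gb_s(clock_mhz, transfers_per_clock):
    return clock_mhz * 1e6 * transfers_per_clock * BUS_WIDTH_BYTES / 1e9

print(bandwidth_gb_s(166, 1))  # SDR board, stock:       ~2.66 GB/s
print(bandwidth_gb_s(183, 1))  # SDR board, overclocked: ~2.93 GB/s
print(bandwidth_gb_s(150, 2))  # DDR board, stock:        4.80 GB/s
```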
The Benchmark Results – DMZG 16-bit
This technology demo from WXP is a DX7 game using GeForce's integrated T&L. It comes with a very high depth complexity and high polygon counts, and it can be run in 32-bit color. The high depth complexity requires a high fill rate, because many scene objects are rendered but never displayed, which means that a lot more than the actually visible pixels have to be rendered for each frame. Only Videologic's PowerVR technology is insensitive to this issue.
Explaining these results is not too easy. The GeForce w/DDR is clearly faster than its SDR-equipped brother, but increasing GeForce's core clock improves the frame rate as well. All we can say is that the scores in this benchmark seem to be influenced by memory bandwidth, by the speed of the rendering pipeline and possibly by T&L speed too, all at the same time.
These results are still rather complex, but it becomes more obvious that memory bandwidth is more of a limiting factor than core speed.
The Benchmark Results – DMZG 32-bit
Again we see a mixture of core speed and memory-bandwidth influences, but memory bandwidth seems to have a larger impact.
At this high resolution and color depth, memory bandwidth seems to influence the scores most, but the core speed still plays an important role too.
Conclusion
Before I get to my conclusion, I want to state that overclocking is not for everyone. You take some serious risks by overclocking the core settings and may damage your hardware. On top of that risk, it's very possible that you won't see a problem until you're in the middle of that important spreadsheet or Word document and your machine freezes before you have a chance to save. Make sure you're not going to be using your system for any critical tasks while overclocking the video card, and we also recommend you take careful measures to properly cool your hardware. Please note that not every video board has the same margins in its components for running out of specification. A great example of this is the difference in overclockability between the SDR and DDR boards we used for testing: the SDR board allowed a higher core speed (+10 MHz) than the DDR board.
The findings of this overclocking article are of more scientific than practical value to most of us. The gains achieved by overclocking GeForce256 are rather small, mainly because the possible increase in clock rates is rather small as well. It doesn't come as a surprise that overclocking the memory of an SDR GeForce helps in many cases, especially at high resolutions. Cranking up the core clock makes the most sense on DDR boards.
What made this article very interesting to me was the impact of memory clock alterations on the DDR GeForce. First of all, I was surprised to find that the DDR board actually has a lower memory clock than the SDR board. Doubling the 150 MHz gets you to '300 MHz', which is of course still way beyond 166 MHz. However, you may remember that I criticized GeForce's memory interface in my first article, and the results we saw after overclocking the memory of the DDR GeForce seem to prove my point: especially in Quake 3 at the high-quality setting, the increase in memory clock translated into a frame rate increase of the same percentage. This shows that even DDR memory cannot quite compensate for the shortcomings of GeForce's memory interface. I was told by NVIDIA that a 256-bit wide memory interface is close to unrealizable, but I am sure that this is what it will take for future 3D chips. The only alternative could be the much-beloved RDRAM.