NVIDIA Strikes Back – The GeForce2 Ultra 3D Monster
Introduction
In late April 2000 NVIDIA released its latest high-end 3D chip, the GeForce2 GTS. Although the chip again reached new performance heights, it still left me with mixed feelings. GeForce2 GTS suffered, and obviously still suffers, from a serious imbalance between chip performance and memory bandwidth, which keeps it from ever reaching the impressive fill rate numbers NVIDIA claims.
This imbalance is the very reason why GeForce2 GTS cards are not much faster than 3dfx's Voodoo5 5500 at high resolutions and true color, why the GF2 GTS doesn't perform amazingly well in FSAA, and why ATi's new Radeon chip is even able to surpass the GeForce2 GTS at high-res/true-color settings.
To be properly prepared for this article, I'd suggest reading the following articles, which address the points above, unless you have already done so.
- Tom’s Take On NVIDIA’s New GeForce2 GTS
- Full Review of NVIDIA’s GeForce2 MX
- 3D Benchmarking – Understanding Frame Rate Scores
- The Fastest GeForce2 GTS Card – Gainward’s CARDEXpert GeForce2 GTS/400
- ATi’s New Radeon – Smart Technology Meets Brute Force
Who Is To Blame?
It's not entirely fair to blame NVIDIA for bad performance or bad engineering. What NVIDIA could be blamed for is not telling the whole truth when it claimed the mind-blowing fill rate numbers of GeForce2 GTS at its launch.
Now before we all kick NVIDIA's behind, we should realize that the '3D chip developing game', if you don't mind me calling it that, is not that different from the 'CPU developing game'. In both cases teams work on a new design for many months or even years before it is finally released. By the time the GeForce2 GTS was taped out, nobody could know exactly what the memory market would look like at its release. Obviously NVIDIA took a lucky guess, hoping that fast DDR SDRAM would already be available in the second quarter of 2000. We know today that NVIDIA didn't have a choice and had to equip GeForce2 GTS with 6 ns DDR SDRAM, which does not quite deliver the bandwidth it takes to feed this high-speed chip. ATi just released Radeon, which suffers from memory bandwidth restrictions as well, because the best memory ATi could get was 5.5 ns DDR SDRAM, and that is still not really good enough.
GeForce2 Ultra To The Rescue
Now NVIDIA has finally been able to get hold of DDR SDRAM rated at 4.4 and even 4 ns. Combine 64 MB of this new, fast memory with a GeForce2 GTS chip running at an even higher clock speed of no less than 250 MHz, and there it is: NVIDIA's new GeForce2 Ultra!
Experienced 3D veterans, and those of you who have followed my advice and read all the articles listed above, don't really need to read any further, because you only have to put two and two together to grasp what kind of performance boost GeForce2 gets out of that high-octane mix. Finally NVIDIA's new top performer isn't slowed down by dawdling memory anymore, so it can at last unleash the performance that its engineering fathers designed into it.
To bore you all to death I could now repeat all of GeForce2's 3D features once more, as some of my dear colleagues will certainly do. However, since the GeForce2 Ultra is not a new miracle, but simply a faster version of the GeForce2 GTS, I will save us both some valuable time. If you still require the details, I suggest that you consult the following list. Everyone else gets the chance to skip the redundant stuff. That's a chance you don't get on every website!
GeForce2 Features:
- .18 micron Process
- Second Generation Integrated Transform & Lighting Engine
- NVIDIA Shading Rasterizer
- Per Pixel Lighting and more
- Full Scene Anti Aliasing (FSAA)
- Cube Environment Mapping
Background articles:
- Transform & Lighting – What is it?
- Fill Rate, Rendering Pipelines and Triangle Size
- Wasted Energy – The Rendering Of Hidden Surfaces
- 3D Benchmarking – Understanding Frame Rate Scores
- GeForce2 GTS Suffers Badly From Memory Bandwidth Limitation
There’s even more, but I grew a bit tired of re-iterating it here. Please complain to me if you thought that the list was too short. 🙂
A Look Back in History
Before we get to the details of the new NVIDIA 3D accelerator, I would like to show you how fill rate and memory bandwidth have developed across NVIDIA's products of the last two years.
Let’s first have a look at the theoretical specs found in the press releases:
Well, in terms of theoretical numbers NVIDIA has come a long way since the release of TNT in fall 1998. TNT came with two rendering pipelines running at 90 MHz. Today the GeForce2 Ultra comes with four pipes, each of which can render two texels per clock, running at 250 MHz. This means a theoretical pixel fill rate increase of 455% and a texel fill rate increase of more than 1000%!! That's called progress!
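If you'd like to check the arithmetic yourself, here is a minimal sketch of the calculation, using nothing but the pipeline counts and clocks quoted above (fill rate is simply pipes × texel units × clock):

```python
# Theoretical fill rates from pipeline count and core clock.

def fill_rates(pipes, texels_per_pipe, clock_mhz):
    pixel = pipes * clock_mhz                      # MPixel/s
    texel = pipes * texels_per_pipe * clock_mhz    # MTexel/s
    return pixel, texel

tnt_pix, tnt_tex = fill_rates(2, 1, 90)        # RIVA TNT, fall 1998
ultra_pix, ultra_tex = fill_rates(4, 2, 250)   # GeForce2 Ultra

print(f"pixel increase: {ultra_pix / tnt_pix - 1:.0%}")  # prints 456%, i.e. the ~455% above
print(f"texel increase: {ultra_tex / tnt_tex - 1:.0%}")  # prints 1011%, i.e. 'more than 1000%'
```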
The next thing of interest is how the memory bandwidth of the card’s video memory evolved:
First of all you might spot the step backwards marked by the first GeForce256 chip with SDR memory. It came with a memory bandwidth of only 2.5 GB/s, although its predecessor, TNT2 Ultra, had already sported 2.7 GB/s.
The next important point is that the increase in memory bandwidth from TNT to GeForce2 Ultra is only 331%. It obviously cannot live up to the high pixel and ultra-high texel fill rate increases that took place over the same period. This is strong evidence that memory technology isn't able to keep up with 3D chip technology. It has always been like this, but in the past the graphics-chip makers simply widened the data path. Going from 16-bit to 32-bit or from 32-bit to 64-bit was possible, but today we've reached 128-bit, and more would be very difficult to implement. Thus the memory HAS TO get faster.
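For the curious, this is how such bandwidth numbers are derived. Note that the TNT memory clock of 110 MHz is my assumption based on typical TNT boards, and the exact percentage depends on that figure and on the gigabyte convention used, so take this as a sketch rather than gospel:

```python
# Peak memory bandwidth = bus width in bytes x effective memory clock.
# Assumed: TNT memory at 110 MHz; both cards use a 128-bit (16-byte) bus.

def bandwidth_gb_s(bus_bits, effective_mhz):
    return bus_bits / 8 * effective_mhz * 1e6 / 2**30  # binary gigabytes/s

tnt   = bandwidth_gb_s(128, 110)   # SDR: effective clock = real clock
ultra = bandwidth_gb_s(128, 460)   # DDR: 230 MHz real clock x 2

print(f"TNT: {tnt:.1f} GB/s, GeForce2 Ultra: {ultra:.1f} GB/s")  # 1.6 vs 6.9 GB/s
print(f"increase: {ultra / tnt - 1:.0%}")  # ~318%, in the neighborhood of the 331% above
```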
A Look Back in History, Continued
The next thing I did was simply divide the available memory bandwidth of each card by its pixel or texel fill rate. The result is interesting.
With the exception of the GeForce256 SDR, the ratio stayed the same up to the GeForce256 DDR. It's no miracle why the GeForce256 SDR was such a modest performer: its memory slowed the GeForce chip down immensely. Things went downhill with the GeForce2 GTS, the GeForce2 MX is not quite as bad, and the soon-to-be-released GeForce2 Ultra still has to make do with only about 7 bytes of bandwidth for each 'to-be-rendered' pixel.
Things look even worse if you consider the bandwidth per texel. However, this is not as much of an issue, as we will see later, because the architecture of the modern GeForce chips doesn't require as much bandwidth per texel anymore. Still, the GeForce2 GTS has got the worst ratio.
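To make the ratios concrete, here is the same division for the two GeForce2 variants, computed straight from the clocks rather than from rounded spec-sheet numbers, which is why the per-pixel figure lands slightly above the 7 bytes quoted above:

```python
# Bytes of raw memory bandwidth per theoretical pixel and texel.
# 128-bit bus = 16 bytes per transfer; fill rates in MPixel/s and MTexel/s.

def ratios(mem_mhz_eff, pixel_mps, texel_mps):
    mb_per_s = 16 * mem_mhz_eff            # bandwidth in MB/s
    return mb_per_s / pixel_mps, mb_per_s / texel_mps

for name, (mem, pix, tex) in {
    "GeForce2 GTS":   (333,  800, 1600),
    "GeForce2 Ultra": (460, 1000, 2000),
}.items():
    per_pix, per_tex = ratios(mem, pix, tex)
    print(f"{name}: {per_pix:.1f} B/pixel, {per_tex:.1f} B/texel")
# GeForce2 GTS:   6.7 B/pixel, 3.3 B/texel  <- the worst texel ratio
# GeForce2 Ultra: 7.4 B/pixel, 3.7 B/texel
```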
A Look Back in History, Continued
Let's now get to the facts. I ran each card through the fill rate test of 3DMark2000. This benchmark works fine as long as a pipeline doesn't render more than two texels per clock, which is why it is useless for Radeon. It works great for all the NVIDIA cards though.
Please go ahead and compare this chart to the one from the beginning of this chapter; you will notice a significant difference. Only a few chips come close to their theoretical fill rate limits at 16-bit color. At 32-bit color the memory bandwidth always gets in the way, usually halving the fill rate. This phenomenon is not limited to NVIDIA cards; every other 3D chip released so far suffers from the same problem, some more, some less. Please restrain your disappointment though! The actual increase in pixel fill rate from TNT to GeForce2 Ultra is even bigger than the theoretical numbers would suggest. Going from 140 MPixel/s to 790 MPixel/s is an increase of 465%. That's almost six-fold in less than two years!
The story is the same with the texel fill rate, of course; there's not much more to say. Still, you can see that there was an immense improvement from TNT to GeForce2 Ultra: going from 150 MTexel/s to 1600 MTexel/s is indeed an increase of almost 1000%.
Those numbers were all measured on an Asus CUSL2 i815 platform with a Pentium III at 1 GHz. I chose a resolution of 1024×768 for both color depths. The new 'Detonator 3' driver rev. 6.16 was used.
This little excursion was meant to point out the performance problems created by a lack of memory bandwidth on a 3D card. This is not a new problem at all; however, it has only recently become obvious.
The GeForce2 Ultra Card
This funky piece of expensive hardware represents NVIDIA's new GeForce2 Ultra reference design. Please note that it is obviously derived from the Quadro2 design, although it's a tiny bit bigger, mainly due to the larger power supply circuitry of the GeForce2 Ultra.
The card comes with the following specs:
- Core Clock: 250 MHz (GeForce2 GTS: 200 MHz)
- Memory Clock: 460 MHz (GeForce2 GTS: 333 MHz)
This is good for a theoretical 1 GPixel/s pixel fill rate and a 2 GTexel/s texel fill rate. The memory bandwidth is a whopping 6.9 GB/s, which marks an increase of 38% over the GeForce2 GTS. The other features are, as mentioned above, identical to the GeForce2 GTS.
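As a quick sanity check, all of these figures follow from nothing but the two clocks; the 6.9 GB/s works out if you count binary gigabytes (2^30 bytes):

```python
# Deriving the quoted GeForce2 Ultra figures from its clocks.

core_mhz, mem_mhz_eff = 250, 460       # GeForce2 Ultra
gts_mem_mhz_eff = 333                  # GeForce2 GTS, for the 38% comparison

pixel_fill = 4 * core_mhz              # 4 pipes -> 1000 MPixel/s
texel_fill = 2 * pixel_fill            # 2 texels per pipe -> 2000 MTexel/s
bandwidth  = 16 * mem_mhz_eff * 1e6 / 2**30   # 128-bit bus = 16 bytes wide

print(pixel_fill, texel_fill, f"{bandwidth:.1f} GB/s")        # 1000 2000 6.9 GB/s
print(f"over GTS: +{mem_mhz_eff / gts_mem_mhz_eff - 1:.0%}")  # +38%
```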
The shaggadelicly nice green heat sinks cover the most important feature of GeForce2 Ultra, the memory. We removed one of the heat sinks to find the following chip underneath:
A 4 ns rated DDR SDRAM chip from ESMT. This chip is therefore rated for no less than 250 MHz, or '500 MHz DDR'. That is a bit surprising, since it means this memory is a bit 'too good' for mere 460 MHz operation. Our overclocking checks proved that point, as the card's memory did indeed go up to 500 MHz without problems, then offering some 8 GB/s of memory bandwidth. The GeForce2 chip itself was also not too impressed with the 250 MHz it is meant to run at by default; we got it up to 285 MHz. Unfortunately that still doesn't mean much, because even a memory bandwidth of 8 GB/s is not quite enough to satisfy the big data hunger of a GeForce2 chip running at 250 MHz.
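The conversion from cycle-time rating to clock speed is straightforward, as this little sketch shows (the 'some 8 GB/s' figure matches decimal gigabytes):

```python
# Cycle-time rating to clock speed: f = 1 / t.
# A 4 ns chip is good for 250 MHz, i.e. '500 MHz DDR'; at that speed a
# 128-bit (16-byte) bus moves about 8 GB/s in decimal gigabytes.

rating_ns = 4.0
real_clock_mhz = 1e3 / rating_ns       # 1 / 4 ns = 250 MHz
effective_mhz  = 2 * real_clock_mhz    # DDR transfers twice per clock

bandwidth_gb_s = 16 * effective_mhz * 1e6 / 1e9
print(f"{real_clock_mhz:.0f} MHz real, {effective_mhz:.0f} MHz DDR, "
      f"~{bandwidth_gb_s:.0f} GB/s")   # 250 MHz real, 500 MHz DDR, ~8 GB/s
```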
NVIDIA told us that it wasn't easy at all to gather enough memory for a reasonable launch of the GeForce2 Ultra. Therefore some cards might come with memory rated at 4.4 ns, and the memory manufacturers will most likely vary as well. I guess this is why NVIDIA preferred to stay on the safe side and clocked the memory at 'only' 460 MHz.
The New Detonator 3 Drivers
So much for the hardware behind the new GeForce2 Ultra. There is something else that is particularly important for the excellent performance of this new 3D solution: NVIDIA's driver team under Dwight "I never sleep and my home is my office" Diercks is today the best 3D driver team in existence. Period!
After NVIDIA found out that the air was getting a bit thin with ATi up there in the high-res/true-color heights, the efforts were stepped up, Dwight and his team completely disregarded such a wasteful thing as sleep (that's my opinion of it too), and the Detonator 3 driver set was born.
Silvino is preparing a dedicated article on this topic, so I won't tell you too much now. What I can say, however, is that this new driver set improves the performance of all supported chips (from TNT right up to GeForce2 Ultra) in tremendous fashion.
If there is one complaint, it is the missing Linux Detonator 3 driver. Dwight promised me this driver for the end of the month.
Test Setup
Graphics Cards and Drivers

| Card | Driver |
| --- | --- |
| Radeon DDR 64MB | 4.12.3054 |
| GeForce2 Ultra, GeForce2 GTS 64MB, GeForce2 GTS, GeForce2 MX, GeForce DDR 32MB, GeForce SDR, RIVA TNT2 Ultra, RIVA TNT2, RIVA TNT | 4.12.01.0616 |
| Voodoo5 5500 | 4.12.01.0543 |

Platform Information

| Component | Details |
| --- | --- |
| CPU | Intel Pentium III 1GHz |
| Motherboard | Asus CUSL2 (BIOS 1002 BETA 02) |
| Memory | Wichmann WorkX PC133 CAS2, setting 2-2-2-5/7 |
| Network | Netgear FA310TX |

Environment Settings

| Setting | Details |
| --- | --- |
| OS Version | Windows 98 SE 4.10.2222 A |
| DirectX Version | 7.0 |
| Quake 3 Arena | Retail version; command line = +set cd_nocd 1 +set s_initsound 0; OpenGL; FSAA set to 2x SuperSampling or FSAAQuality 1 |
| Expendable | Downloadable demo version; command line = -timedemo; D3D; FSAA set to 4x SuperSampling |
| Evolva | Rolling Demo v1.2 Build 944; standard command line = -benchmark; bump mapped command line = -benchmark -dotbump |
| MDK2 | Downloadable demo version; T&L = On; trilinear, high texture detail |
Benchmark Results – Quake 3 Arena
I guess there's not much to say about the yellow results in this chart: GeForce2 Ultra leads the pack by quite a respectable margin. If you are surprised by the high results of the other NVIDIA cards, don't be! The secret behind those higher numbers is the new Detonator 3 driver rev. 6.16.
At 32-bit color the lead of GeForce2 Ultra is even more obvious, and the new Detonator 3 drivers make sure that GeForce2 GTS stays untouched by Radeon as well. GeForce256 DDR can even leave Voodoo5 5500 behind thanks to the new driver.
Benchmark Results – MDK2
MDK2 takes advantage of integrated T&L, which is why Voodoo5 doesn’t look good in this test. The yellow lines showing the GeForce2 Ultra scores are leading the field.
Once more GeForce2 Ultra can show its muscles at 32-bit color. ATi's new Radeon stays behind the Detonator 3 turbo-charged GeForce2 GTS, and GeForce DDR looks very good too.
Benchmark Results – Evolva
The same story all over again. Nothing can touch the new ‘Ultra’.
At last Radeon can surpass GeForce2 GTS at 1600x1200x32. Interestingly, Evolva doesn’t benefit too much from the new Detonator 3 drivers, although NVIDIA bundles this game.
Benchmark Results – Evolva Bump Mapped
Evolva with bump mapping enabled doesn’t make much of a difference to GeForce2 Ultra. It stays in the lead.
Once more Radeon can show that it’s still pretty competitive against GeForce2 GTS. The 8 fps difference between Radeon and GeForce2 Ultra is as close as a competitor gets to the scores of NVIDIA’s new chip.
Benchmark Results – Evolva FSAA Without Bump Mapping
To satisfy the FSAA freaks out there we included one test, running FSAA 4X at 800×600 and 32-bit color on our seven contenders.
Interestingly, here even GeForce2 GTS can pull ahead of ATi’s Radeon and Voodoo5. This might be due to the Detonator 3 drivers. Still it’s NVIDIA’s GeForce2 Ultra that shows how high frame rates can be under FSAA.
Overclocking Results
As already mentioned above, I was able to run my GeForce2 Ultra sample at a 285 MHz core clock and a 500 MHz memory clock. This represents a core clock increase of 14% and a memory clock increase of 8.7%. The main gain comes from the increased memory bandwidth; the core clock has only a minor impact, because the fast chip is still starving for data. Here are the results:
Except at the 'low' resolution and color depth of 1024x768x16, where the CPU/system is the limiting factor, the overclocking translates into frame rate increases of 5 - 10%. Especially at 1600×1200 the benefit of the overclocking is clearly noticeable.
What is true for Quake 3 is valid for MDK2 as well.
Overclocking Results, Continued
Evolva, our only Direct3D benchmark, behaves just like the two games shown above. The benefit of the overclocking is clearly visible.
The bump-mapped version of Evolva doesn't reach frame rates quite as high, but it benefits from the overclocking even more than the other games.
It may well be that your GeForce2 Ultra card's memory won't be quite as overclockable as this sample's. However, every percent of additional memory clock will translate directly into frame rate, as long as you play at high resolutions, and especially at true color.
Conclusion
The benchmark results show it clearly: NVIDIA's new GeForce2 Ultra outclasses the whole competition. Frame rates of around 100 fps at 1600×1200 / 16-bit color on a PC haven't been seen by any of us ever before. Finally even 1600×1200 at 32-bit color is becoming seriously playable, with the GeForce2 Ultra scoring over 50 fps in most of the games we tested! We had every reason to omit results at screen resolutions of less than 1024×768, because even the weirdest 'hard-core gamer' would be crazy to play any game at less than that with NVIDIA's latest 3D card. If you should indeed fancy FSAA, for some reason unfathomable to me, then the GeForce2 Ultra has got that for you as well. However, you should perhaps rather invest in a high-class monitor and take advantage of the excellent detail offered by a resolution of 1600×1200 at true color.
NVIDIA has once more managed to leapfrog 3dfx. If you look at the results you will notice that GeForce2 Ultra scores more than double the frame rates of a Voodoo5 5500 at high resolutions. Voodoo5 6000 will hardly score more than double the 5500's numbers, due to its design. Therefore it is most likely that NVIDIA will keep the 3D performance crown even once 3dfx finally releases the 6000. This is bad news for 3dfx, since their part will most likely be even more expensive than the already luxurious GeForce2 Ultra.
So far everyone who has seen the GeForce2 Ultra in action has been amazed by its performance. It is almost too fast for today's games. However, for me this Ultra version of the GeForce2 chip is finally the product that I expected when NVIDIA released GeForce2 GTS less than four months ago. It comes at a mind-blowing $499 US, which won't be affordable for a whole lot of people. If you do spend this rather huge amount of money on a graphics card, you will get the fastest and technically most advanced 3D accelerator money can buy right now. Make sure that you play a lot of games on it, to make the investment worthwhile!
Should you buy this card or not? Well, I guess you can best answer this question yourself after a good look into your wallet. If you only go for the best and price is no worry for you, then go ahead and buy this card. For 'normal' people the price tag of $499 is a very bitter pill to swallow, but obviously somebody is supposed to buy 3dfx's Voodoo5 6000 as well, and that one is supposed to go for $600, without any T&L or per-pixel lighting support.
NVIDIA's next chip, 'NV20', is already waiting around the corner, and it will sport a completely new set of features, all for DirectX 8. Nowadays you have to wonder whether Microsoft's DirectX is driving NVIDIA or whether it isn't really the other way around. Most 3D chip makers haven't even caught up with DirectX 7 yet...