Intel’s Dominance
In April last year AMD was able to offer the fastest Windows x86 CPU for a very short period, until Intel released the Pentium II processor. Today many people have already forgotten about this, simply being used to Intel as the all-time high end CPU supplier. Not only are Pentium II processors ruling the high end market, but price drops have also made the Pentium II more and more affordable. Intel decided to get into the low end market as well and released the Celeron CPU, which is nothing other than a Pentium II without L2 cache. At the same time the Pentium II with 100 MHz front side bus was launched, assuring Intel’s lead in the upper class systems sector. It won’t be long before Intel comes out with the Pentium II Xeon CPU, which will also use the Pentium II core, but with a second level cache running at processor clock speed. Today you can look into almost any PC market segment and Intel is pretty much dominant. The alternative CPUs are becoming less and less popular.
Office Application Performance Becoming Less Significant in Favor of 3D Performance
Times are changing not only in terms of Intel’s dominance. The way in which the performance of a processor has to be evaluated has changed a lot too. Whilst everyone has been used to measuring system performance with office applications for the last 5 years, nowadays there is a trend away from these old trails. It’s not that people don’t use office applications anymore; in the business sector as well as in SOHO, people still use a lot of office apps, or even mainly office apps. However, system performance has today reached a level where the user isn’t waiting for Winword or Excel anymore; instead, Winword or Excel are waiting for the user most of the time. This led to the funny expression ‘Winstone is today measuring how fast the system is waiting for the user’.
Office application performance is still measurable and there are certainly still differences, but it’s really questionable how important office application performance is nowadays. Particularly in the lower end SOHO sector, people don’t really care about how fast their Winword is running. What is getting more and more important today are the eye candy joys brought about by 3D gaming. A lot of the old fashioned computer journalists are going on about how bad the Celeron processor is, completely missing the point that nobody cares how fast it runs Winword, as long as it runs it as fast as a Pentium MMX 233. What matters instead is its 3D gaming performance and, surprise surprise, it’s performing very well in this field, making this CPU a lot better than what many publications have claimed.
Floating Point SIMD vs. Brute FPU Power
AMD saw this development taking place a year ago already, when they decided to improve the K6 CPU by specifically increasing its 3D performance. Whilst Pentium II CPUs get their great 3D performance from brute FPU power, AMD decided to take a more elegant approach to 3D performance. The FPU of a CPU can do an amazing number of complicated floating point calculations, but 3D games only need some of them. What AMD did was pick these special ‘3D’ calculations and enable the CPU to do them on several numbers at the same time. Grabbing and then processing several data packets at the same time is called ‘SIMD’ or ‘single instruction multiple data’. This does not mean that only one instruction is needed to work on multiple data; it means that the same instruction is carried out on multiple data items of the same sort at the same time. 3D processing and rendering uses an incredible number of matrix operations. Huge amounts of data have to be processed in exactly the same way, which is usually done one item after the other. SIMD can improve this significantly, because grabbing, for example, four words and processing them at the same time is obviously faster than grabbing one word four times.
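To make the idea a bit more tangible, here is a minimal Python sketch of the difference. It is only a model: Python does not issue SIMD instructions, and the function names and the pack width of four are assumptions picked to match the ‘four words’ example, not a description of how 3DNow! works internally. It simply counts how many multiply operations each approach would have to issue for the same data.

```python
def scale_scalar(values, factor):
    """Multiply one value per operation, the way a plain FPU loop does."""
    result = []
    operations = 0
    for v in values:
        result.append(v * factor)
        operations += 1          # one multiply issued per value
    return result, operations


def scale_simd(values, factor, width=4):
    """Model a SIMD multiply: one operation handles `width` values at once.

    `width` is a hypothetical pack size used for illustration only.
    """
    result = []
    operations = 0
    for start in range(0, len(values), width):
        pack = values[start:start + width]        # grab several values at once
        result.extend(v * factor for v in pack)   # conceptually ONE instruction
        operations += 1
    return result, operations


vertices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
scalar_result, scalar_ops = scale_scalar(vertices, 0.5)
simd_result, simd_ops = scale_simd(vertices, 0.5)

# Same results either way, but the packed version issues fewer operations.
print(scalar_ops, simd_ops)
```

The results are identical in both cases; what changes is the number of operations the CPU has to issue to get there.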
The first time SIMD was implemented in an x86 CPU was when the Pentium w/MMX was released. Intel did a lot of work convincing us that MMX would accelerate any kind of multimedia, including 3D. Today Intel admits that MMX is mainly good for image processing, though MMX2 or ‘KNI’ (‘Katmai New Instructions’) is supposed to change that significantly. The difference between the K6-2’s new instruction set ‘3DNow!’ and what we already know from MMX is that ‘3DNow!’, like KNI, is able to do SIMD with floating point numbers (MMX could do this only with integers). Here’s where the 3D acceleration takes place.
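The integer versus floating point distinction can be pictured as two different ways of filling the same 64-bit register. The Python sketch below uses the standard `struct` module purely as an illustration of the data layout; the variable names are made up, and nothing here executes MMX or 3DNow! instructions.

```python
import struct

# Four 16-bit integers in 8 bytes: one of the packed integer layouts MMX works on.
packed_ints = struct.pack("<4h", 100, -200, 300, -400)

# Two 32-bit single precision floats in the same 8 bytes: the layout 3DNow! works on.
packed_floats = struct.pack("<2f", 1.5, -0.25)

# Both layouts occupy one 64-bit register's worth of data.
print(len(packed_ints) * 8, len(packed_floats) * 8)

# Unpacking recovers the individual lanes.
print(struct.unpack("<4h", packed_ints))
print(struct.unpack("<2f", packed_floats))
```

The point is that the register is the same size in both cases; what is new is that each lane can now hold a floating point number, which is what 3D geometry calculations work with.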
Problem No. 1 – New Instructions Require New Software
It took a long while for MMX software to materialise after Intel had released the Pentium w/MMX, and it seems as if this was a painful experience for them. AMD is facing a similar problem with 3DNow!. However, their situation doesn’t seem to be quite so dire. Whilst MMX software wasn’t necessarily exciting, 3D games can easily amaze people, so the demand will be higher, thus pushing game developers as well as 3D chip manufacturers into using 3DNow! features as long as Intel will let them.
If anyone wants to take advantage of the new K6-2 and 3DNow!, there are three possible ways of doing so. The first is a 3D game that takes advantage of DirectX 6 by using the geometry engine of Direct3D 6. The second is a game that has its own geometry engine which uses 3DNow! directly. Games that are only written for DirectX 5, or which don’t use 3DNow! in their own engine, will show only small improvements or none at all with the K6-2. The third option is a 3D chip/card driver that is optimized for 3DNow!. NVIDIA is the first 3D chip manufacturer that supplies a special driver for 3DNow!. It would be sensible to assume that AMD prefers game developers to use 3DNow! directly in their games, but if this is not an option, it is still an advantage if the game is at least programmed for DirectX 6. It will be up to us consumers to push 3D chip makers into providing drivers that are optimized for 3DNow!.
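A game with its own geometry engine has to decide at run time whether the 3DNow! code path can be used at all. The Python sketch below shows the rough shape of that decision. The bit position follows AMD’s CPUID convention (extended function 0x80000001, EDX bit 31); everything else is an assumption for illustration: the EDX value is passed in as a plain integer because Python cannot execute CPUID itself, and the function names and path descriptions are hypothetical.

```python
THREEDNOW_BIT = 31  # EDX bit reported by CPUID extended function 0x80000001


def has_3dnow(extended_edx):
    """Return True if the given EDX feature word has the 3DNow! bit set."""
    return bool((extended_edx >> THREEDNOW_BIT) & 1)


def pick_geometry_path(extended_edx, game_has_3dnow_engine, uses_direct3d6_geometry):
    """Pick which of the three routes applies (names are hypothetical)."""
    if not has_3dnow(extended_edx):
        return "plain x87 FPU path"
    if game_has_3dnow_engine:
        return "game engine uses 3DNow! directly"
    if uses_direct3d6_geometry:
        return "Direct3D 6 geometry engine uses 3DNow!"
    return "only a 3DNow!-optimized card driver can help"


# Illustrative EDX value with only the 3DNow! bit set, not read from a real CPU.
example_edx = 1 << THREEDNOW_BIT
print(pick_geometry_path(example_edx, False, True))
```

The last branch is the case of a DirectX 5 game on a K6-2: the CPU supports the instructions, but nothing in the game itself ever calls them.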
Problem No. 2 – The Future of Socket 7
On April 15, 1998 Intel had quite a memorable day. Not only was the next generation of Pentium II CPUs with 100 MHz FSB released, but there was also the release of the Celeron CPU, which is targeted at the low end market segment. Both CPUs use Slot 1 instead of good old Socket 7. Intel wants Socket 7 to die a quick and painful death, and AMD will have a rough job keeping people on this platform. It is certainly not easy to say what is going to happen with Socket 7, but it’s difficult to overlook that Slot 1 looks like the more future proof route to take right now. The K6-2 has to be a good enough product to convince people to stick to the Socket 7 platform. I doubt, however, that any Slot 1 system owner will go back to Socket 7.
Office Application Performance
3D performance may be becoming more and more important, but business application performance shouldn’t be ignored either. Although the K6-2 has got a new chip design, the integer part of the CPU is pretty much still the same as in the K6. It was only optimized for the 100 MHz front side bus clock, mainly to assure more stable timing at this speed. The K6 was already able to run at 100 MHz FSB, but AMD does not officially support this because of those timing issues. More conservative timing slows down the CPU by a very small amount, which is why a K6 at 300/100 MHz runs business apps about 1% faster than the K6-2.
Compared to Intel’s sixth generation CPUs, the K6 already ran relatively better under Windows 95 than under Windows NT, and this hasn’t changed with the K6-2. However, you may remember how much of a performance increase the K6 gets out of the 100 MHz system bus compared to the 66 MHz bus. Thus the K6-2 is now definitely the fastest Socket 7 CPU in office applications, far ahead of the Cyrix ‘M2’, the IBM 6x86MX and the Pentium MMX to boot.
The Pentium II, especially the new 100 MHz FSB versions at 350 and 400 MHz core clock, is still holding the office application performance crown, and this crown will only pass to the Pentium II Xeon processors, with their L2 cache running at CPU clock, once they’re released at the end of June. However, the K6-2 stands up pretty well to its Pentium II competitors at the same clock speed. Remember that the Pentium II, unlike the K6-2, hardly gets any benefit out of the higher front side bus clock, which is why we can expect the K6-2 at 350 and 400 MHz to be around the performance of the PII 333 and 350 once AMD releases these versions later this year.
Under Windows 95 the K6-2 300/100 is slightly slower than the Pentium II 300, and the K6-2 333/95 is slightly slower than the Pentium II 333. The performance of the K6-2 in office applications can still rise with better motherboards and larger L2 caches. The test system was only using 512 kB of L2 cache; 1 MB and even 2 MB are possible as well and will improve the speed. The VIA MVP3 chipset looks pretty promising too, possibly scoring higher than ALI’s Aladdin V.
Windows NT is the domain of Intel’s sixth generation CPUs, so here the K6-2 300/100 and 333/95 can only score somewhere in between the results of the PII 266 and PII 300. The Celeron overclocked to 400/100 MHz is slightly slower than these CPUs under Windows 95, though under Windows NT it’s a tad faster than both of them. We should also not forget that you can run multiple CPU systems with Pentium II CPUs, which accelerates professional software very significantly. AMD’s CPUs cannot do that.
Running the high end application Winstone 98 shows the superiority of Intel’s sixth generation processors even more. The K6-2 300/100 as well as the 333/95 version are both in between the Pentium II 233 and 266, and the Celeron overclocked to 300/100 is in the same area too.
All in all the K6-2 is offering a new office application performance push for Socket 7 and as long as you are using Windows 95 with normal business apps the performance is impressively close to what a Pentium II at the same clock speed is able to do.
Benchmark Setup
Socket 7 System:
- Microstar MS-5169 motherboard w/512 kB L2 cache (ALI Aladdin V Chipset rev. C)
- 64 MB Corsair PC100 SDRAM
- IBM DGVS 09U Ultra wide SCSI hard drive
- Adaptec 2940UW SCSI host adapter
- Diamond Viper V330 AGP graphics card, NVIDIA reference driver 4.10.01.0250
- Resolution 1024×768
- Color depth 16 bit
- Refresh rate 85 Hz
Slot 1 System:
- Asus P2B motherboard (Intel 440BX chipset, final revision)
- 64 MB Corsair PC100 SDRAM
- IBM DGVS 09U Ultra wide SCSI hard drive
- Adaptec 2940UW SCSI host adapter
- Diamond Viper V330 AGP graphics card, NVIDIA reference driver 4.10.01.0250
- Resolution 1024×768
- Color depth 16 bit
- Refresh rate 85 Hz
Benchmark Results
3D Performance
Evaluating the 3D performance of the K6-2 is not quite as easy as evaluating the office application performance. The reason for this is that the K6-2 requires software that takes advantage of its new instruction set 3DNow!. Thus the performance advantage varies a lot between different games. The chart below shows how different the performance gain generated by 3DNow! can be across several of today’s games. It was produced by running these games on a K6 at 300/100 MHz against a K6-2 at 300/100.
Id’s Quake II running on a K6-2 and using the new 3Dfx/3DNow! driver is almost twice as fast as on a K6 without 3DNow!. This was achieved only by optimizing Glide and the mini OpenGL driver; the Quake II engine stayed unchanged. This should be a clear hint of how much performance gain is possible via 3DNow!. 3D Winbench 98 running under Microsoft’s latest DirectX 6 beta shows a 66% performance increase through 3DNow! on an NVIDIA RIVA128 card using drivers optimized for 3DNow!. Rage’s Incoming was optimized for the K6-2 as well, obviously without the same effort as Quake II, but still showed a 25% increase. Forsaken was not specially optimized for the K6-2, but it still gets a performance gain of 14% out of DirectX 6. Older games like Turok don’t use any fancy DirectX 6 features, so they run at a similar speed on the K6-2 as on the K6.
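For anyone who wants to work out such percentages from their own frame rates, the arithmetic is simple enough to sketch in a few lines of Python. The function name is made up for this example, and the frame rates used are hypothetical numbers chosen only to demonstrate the formula; they are not measurements from this review.

```python
def gain_percent(baseline_fps, new_fps):
    """Relative performance gain of new_fps over baseline_fps, in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline frame rate must be positive")
    return (new_fps / baseline_fps - 1.0) * 100.0


# Hypothetical frame rates, NOT measured results from this review.
k6_fps = 40.0
k6_2_fps = 50.0
print(gain_percent(k6_fps, k6_2_fps))   # 25.0
```

A score that is ‘almost twice as fast’ corresponds to a gain of almost 100% by the same formula.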
3D Winbench 98 may not be a very helpful tool when trying to find out about 3D card performance, but it’s still pretty useful for CPU benchmarking. The difference between 3D Winbench and current 3D games is that ZD doesn’t use a 3D engine of its own, but the engine that’s offered by Direct3D, something that no current 3D game does. DirectX 6 has a 3D engine that is significantly improved over DirectX 5’s 3D engine, so any 3D card and any CPU scores higher simply by using DirectX 6. 3DNow! is built into DirectX 6, making the K6-2 300/100 and 333/95 score almost as high as a Pentium II 400, no less. However, only games that use the 3D engine of DirectX 6 will show this amazing performance of the K6-2, unless they are properly optimized for 3DNow! in the first place. Two facts are very important to Intel:
- Current games do not use the 3D engine offered by Direct3D, because the 3D engine that came with DirectX 5 was too lousy. Instead, games use their own 3D engines and only take advantage of DirectX’s rasterization features. Future games written specifically for DirectX 6 will possibly use the much improved 3D engine of Direct3D 6.
- The driver of the NVIDIA RIVA128 card running with the K6-2 was specifically optimized for 3DNow!. Drivers of other 3D cards currently do not offer this optimization and thus score lower in comparison with the Pentium II. However, other 3D chip manufacturers may soon develop drivers optimized for 3DNow! as well, which will change the picture.
All in all 3D Winbench 98 shows what is possible. A K6-2 300 could be faster than a Pentium II 400 once 3D software takes full advantage of 3DNow!. All of this is perhaps even scaring Intel.
Quake II on a Voodoo2 card, using the new 3Dfx/3DNow! driver, is currently the most impressive way of showing how much 3DNow! can kick butt. In the past the K6 didn’t have the slightest chance against any of the Intel CPUs when it came down to playing Quake II. Now a K6-2 300 scores an even higher frame rate than the Pentium II 300, which is another scary thing for Intel. The times when every serious Quake II player (like me) had to go for an Intel CPU are definitely over; the K6-2 offers excellent Quake II performance for less money than a Pentium II. Even Thresh agrees with me here.
3D Performance, Continued
AMD and Rage Software cooperated on Incoming to include 3DNow! support in this great looking game. Unfortunately the result wasn’t as great as what was achieved by 3Dfx and AMD for Quake II. Incoming still runs faster on a Pentium II, showing that the K6-2 alone doesn’t do the job.
It surprised me a lot to see that Forsaken, a game that’s not optimized for 3DNow!, shows the K6-2 in a much better light than the ‘optimized’ game Incoming. However, this game, written for DirectX 5, still hardly shows the abilities of the K6-2 at all.
Turok runs almost indistinguishably on a K6 and a K6-2. It is certainly not optimized for 3DNow! and it doesn’t even really take advantage of DirectX 6’s 3DNow! optimization. For future K6-2 owners, Turok is one of the games that should be avoided.
Evaluation Summary
Giving a final word on the K6-2 is certainly not as easy as it used to be with previous CPUs. As already pointed out in the introduction, the K6-2 is facing a few pretty serious problems. Will the 3D game developers jump on the 3DNow! train? Will the people have enough belief in Socket 7? Will the Taiwanese chipset manufacturers offer convincing solutions for the Super7 platform?
There are three different levels at which a performance increase can be achieved by implementing 3DNow!, each with its own possible performance gain:
- 3D Game Engine: up to 100%
- DirectX 6 Support: up to 30%
- 3D Card Driver: up to 15%
The concept of the K6-2 is certainly a very good approach, arriving more than half a year before Intel’s Katmai CPU, which will use the MMX2 or ‘KNI’ instructions and offer a similar solution to 3DNow!. The K6-2 could shake up the 3D gaming area a whole lot, since the performance offered by 3DNow! could change the CPU hierarchy significantly. I’ve got a comment from AMD’s Director of Technical Marketing, Lance Smith (alias SFX), which should say it all: “The 3DNow! is faster than Intel’s best FPU and on a Super7 bus without a backside L2!”. I don’t doubt that the potential of the K6-2 is close to amazing; however, I am not in a position to foresee whether the answers to the above questions will all be ‘yes’. AMD will do as much as they can, creating an attractive environment for game developers and offering a 3DNow! vector compiler soon, and they will hopefully kick some serious butt over at the Taiwanese chipset and motherboard manufacturers. All in all it’s going to be up to you users out there, and whether you will have enough belief in the K6-2. If enough units are sold, the game developers will be pushed to do something about this, the chipset manufacturers will smell the money too and the future of Socket 7 will be safe. It’s up to you guys. It is certainly better to keep a serious competitor to Intel, isn’t it? Or do you like the situation on the OS market, where Microsoft can play around with us as much as they like?