We are almost there: the wait for the new microprocessors from AMD and Intel is finally coming to an end. Intel’s upcoming Pentium III is an upgrade of the well-known Pentium II-core with the new ‘streaming SIMD extensions’, and AMD’s new K6-3 is nothing but the latest K6-2-core with 256 kB of on-die L2-cache running at core clock. Before we start looking at the scores of these new CPUs, let’s first consider what we should expect.
AMD’s new K6-3
Last November AMD upgraded the K6-2 to the CXT-core, which added a feature called ‘write combining’. Many of us will know this feature from the earlier days of Intel’s Pentium Pro processor. ‘Write combining’ has been included in every sixth-generation processor core from Intel: you can find it in the Pentium Pro, the Pentium II, the Celeron and the upcoming Pentium III. It writes to memory in larger chunks, and thus less often than an application asks it to. An application that wants to write only one or two bytes to memory (this could, for example, be the frame buffer of the graphics card) would usually make the processor perform a ‘store’ over the memory bus right away. Graphics applications in particular write a whole lot of data to the frame buffer all the time, and it is pretty slow if the CPU has to push only one or two bytes at a time over the 64-bit wide data bus to the graphics card. Write combining waits until 64 bits of data have been collected and then transfers them to the memory location in one go. With two-byte stores, this results in only one memory write for every eight bytes instead of four separate writes, which take a lot more time. As already said, the Pentium Pro was the first PC processor equipped with this feature, and many of you will certainly remember how the program ‘fastvid’ improved Quake on a Pentium Pro significantly, because it enabled write combining for the graphics card’s frame buffer. The K6-2 with the ‘CXT’-core is now able to do the same, but the graphics card drivers have to be rewritten for it, which in most cases hasn’t happened yet.
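To make the bus-write arithmetic concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not how the CXT-core or the P6 actually implement their buffers: a single 64-bit (8-byte) combining buffer that issues one bus write whenever a store lands in a different aligned 8-byte block, plus a final flush. The function names are hypothetical.

```python
# Toy model of write combining. Assumption: one 8-byte (64-bit) buffer that
# flushes when a store lands in a different aligned 8-byte block. Real CPUs
# use different buffer sizes and flush rules; this only illustrates the idea.
BUS_WIDTH = 8  # bytes per bus write on a 64-bit data bus


def bus_writes_uncombined(stores):
    """Without write combining, every store is its own bus transaction."""
    return len(stores)


def bus_writes_combined(stores):
    """Count bus writes when small stores are gathered into 8-byte blocks.

    stores: list of (address, size) tuples; each store is assumed to fit
    inside a single aligned 8-byte block.
    """
    writes = 0
    current_block = None
    for address, _size in stores:
        block = address // BUS_WIDTH
        if block != current_block:
            if current_block is not None:
                writes += 1  # flush the previously collected block
            current_block = block
    if current_block is not None:
        writes += 1  # flush whatever is left at the end
    return writes


# One 640-pixel scanline of 16-bit colour, written one pixel (2 bytes) at a time.
scanline = [(x * 2, 2) for x in range(640)]
print(bus_writes_uncombined(scanline))  # 640
print(bus_writes_combined(scanline))    # 160
```

With two-byte stores, four of them fit into one 64-bit bus write, which is where the factor of four comes from.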
The K6-3 adds a 256 kB on-die L2-cache to the CXT-core. This should result in a significant performance increase, because it removes one of the most important speed killers of Socket7 systems: the on-board L2-cache, which only runs at bus clock speed, thus at either 66 MHz or, on Super7 boards, 100 MHz. Intel’s Pentium Pro was the first CPU with an L2-cache running at core clock, so its L2-cache got faster along with a faster processor core. The Pentium II has an L2-cache that runs at only half the core clock, but it still gets faster with a faster CPU. On Socket7 the L2-cache always ran at a fixed speed, regardless of whether you had a K6-2 300 or a K6-2 400. The K6-3 no longer suffers from this problem: its L2-cache is ‘as fast’ as the CPU-core, so faster K6-3s don’t have to wait more core cycles for the L2-cache to deliver its data. The on-board L2-cache on Socket7 boards turns into an L3-cache and can still improve overall system performance by a few percentage points. The K6-3 is thus expected to perform significantly better than the K6-2, and you can plug it into any Socket7 board that supports its voltage (2.3-2.5 V), as long as the board has been upgraded to the latest BIOS.
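The cache-clock argument above is simple arithmetic, sketched here in Python. The clock ratios come straight from the text (board cache at bus clock, Pentium II at half core clock, Pentium Pro and K6-3 at full core clock); the function names and design labels are hypothetical, and the model only covers the cache clock, not the number of cycles an access actually takes.

```python
# L2-cache clock for the cache designs described above (ratios from the text).
# Note: this is the cache clock only; real access latency also depends on how
# many cycles each access takes, which is not modelled here.

def l2_clock_mhz(design: str, core_mhz: float, bus_mhz: float) -> float:
    if design == "board":       # Socket 7 (K6-2): on-board cache at bus clock
        return bus_mhz
    if design == "half_core":   # Pentium II: cartridge cache at half core clock
        return core_mhz / 2
    if design == "full_core":   # Pentium Pro, K6-3: cache at full core clock
        return core_mhz
    raise ValueError(f"unknown L2 design: {design}")


def cycle_ns(mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / mhz


systems = [
    ("K6-2 300", "board", 300),
    ("K6-2 400", "board", 400),
    ("Pentium II 400", "half_core", 400),
    ("K6-3 400", "full_core", 400),
]
for name, design, core in systems:
    clock = l2_clock_mhz(design, core, bus_mhz=100)
    print(f"{name}: L2 at {clock:.0f} MHz, {cycle_ns(clock):.1f} ns per L2 clock")
```

Both K6-2s see the same 10 ns L2 clock, so the 400 MHz part spends four core cycles per L2 clock versus three for the 300 MHz part; the K6-3’s on-die cache keeps that ratio at one.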
Intel’s new Pentium III
The Pentium III, also known by the code name ‘Katmai’, does not come with a feature that shows an immediate performance increase, as the K6-3 does. Its basic core as well as its L2-cache architecture is identical to the Pentium II. The justification for the new name lies in a set of 70 new multimedia instructions, once known as ‘KNI’ and now known as ‘SSE’, short for ‘streaming SIMD extensions’. These new instructions enable the CPU to perform floating-point calculations on multiple data at the same time, which is very helpful for 3D graphics, video encoding and decoding and other floating-point-intensive applications that operate on large sets of data, such as voice recognition. Unfortunately this doesn’t simply work on its own: the application has to be programmed for ‘SSE’ specifically, which means that none of the currently available software will benefit from the Pentium III’s new instructions at all. We should also not forget that this beautiful ‘SSE’ stuff is not exactly a new invention. AMD released their very own set of floating-point SIMD (single instruction, multiple data) instructions, called ‘3DNow!’, last June. We have learned that the K6-2 with its 3DNow! instruction set shows a decent performance improvement when running applications written for 3DNow!, but we also know how many applications are not ‘3DNow!-optimized’. With the Pentium III we are facing pretty much the same situation: we will only benefit from its new instructions if software is ‘SSE-optimized’. Without that enhancement, software will run exactly as fast on a Pentium III as on a Pentium II at the same clock speed. 3D games that are supposed to take advantage of the Pentium III’s new instructions require DirectX 6.1 or higher; without it you won’t see much of a benefit.
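A rough Python sketch of what ‘single instruction, multiple data’ means for a 4-wide single-precision unit like SSE: one operation applied to four 32-bit floats at once. This is plain emulation, not real SSE code; the function names are hypothetical, loosely modelled on SSE’s ADDPS and MULPS instructions, and each result is rounded to 32-bit single precision with the standard `struct` module.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (64-bit double) to 32-bit single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def packed_add_ps(a, b):
    """Emulate a 4-wide single-precision add: one 'instruction', four lanes."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("packed single-precision operands have 4 lanes")
    return [to_f32(to_f32(x) + to_f32(y)) for x, y in zip(a, b)]


def packed_mul_ps(a, b):
    """Emulate a 4-wide single-precision multiply."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("packed single-precision operands have 4 lanes")
    return [to_f32(to_f32(x) * to_f32(y)) for x, y in zip(a, b)]


# A 3D vertex (x, y, z, w) scaled and then translated: two packed operations
# instead of eight scalar ones.
vertex = [1.0, 2.0, 3.0, 1.0]
scaled = packed_mul_ps(vertex, [2.0, 2.0, 2.0, 1.0])
moved = packed_add_ps(scaled, [0.5, 0.5, 0.5, 0.0])
print(moved)  # [2.5, 4.5, 6.5, 1.0]
```

The point is the shape of the work, not the speed: software only gains from SSE or 3DNow! if it is written to hand the CPU data in these packed groups.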
If you need more information about the new features of Pentium III, please have a look at my article from Microprocessor Forum 1998.
The Quick Specs
AMD K6-3
- 0.25 micron chip
- CXT core plus 256 kB on-die second level cache running at core clock frequency
- Plugs into Socket 7
- Socket 7 on-board cache turns from second to third-level cache
- 3DNow! floating point SIMD instructions: two pipelines that can each operate on two 32-bit single precision floating point numbers at the same time, so four 32-bit single precision FP numbers can be computed simultaneously.
- voltage 2.3-2.5 V, depending on core, marked on the chip
- runs in any Socket7-board that can offer 2.3-2.5 V at a high current; the board requires a BIOS that recognizes the K6-3
- available at clock speeds of 350, 400 and 450 MHz
- 3DNow! supported by DirectX 6 and up
Intel Pentium III
- 0.25 micron chip
- Katmai core, modified Deschutes core with SSE
- Plugs into Slot 1
- 512 kB external second level cache running at half the processor clock frequency, found inside the processor cartridge and thus coming with the CPU
- SECC2 package
- SSE floating point SIMD instructions: one pipeline that can operate on four 32-bit single precision floating point numbers at the same time, so four 32-bit single precision FP numbers can be computed simultaneously.
- Voltage 2 V
- Runs in any Slot1 board; the board requires a BIOS that recognizes the Pentium III and supplies the correct microcode update
- Available at clock speeds of 450 and 500 MHz, possibly 550 MHz, but I got conflicting information on this: one official Intel source says the PIII 550 will be released on February 26, another official source tells me that it will be sometime in Q2 1999.
- SSE supported by DirectX 6.1 and up
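Both spec lists end up at four single-precision results per clock, so on paper the peak SIMD rate simply scales with clock speed. The sketch below does that multiplication for the listed clock speeds. It is a theoretical ceiling derived from the figures above (assuming one FP operation per lane per clock), not a measured number; real code rarely keeps the units fully busy. The function name is hypothetical.

```python
def peak_mflops(pipelines: int, lanes_per_pipeline: int, clock_mhz: int) -> int:
    """Theoretical peak: results per clock times clock rate (in millions/s)."""
    return pipelines * lanes_per_pipeline * clock_mhz


# (name, pipelines, 32-bit lanes per pipeline, clock speeds in MHz) from the lists above
chips = [
    ("K6-3 (3DNow!)", 2, 2, [350, 400, 450]),
    ("Pentium III (SSE)", 1, 4, [450, 500]),
]
for name, pipes, lanes, clocks in chips:
    for mhz in clocks:
        print(f"{name} {mhz} MHz: {peak_mflops(pipes, lanes, mhz)} MFLOPS peak")
```

At equal clock speed the two instruction sets tie on paper, so any real difference comes from software support, drivers and the rest of the core.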
The Numbers
Rather than bore you with even more theory, I’ll get straight to the hard benchmark data. The test systems were equipped with the following:
AMD K6-3 and K6-2 system:
- Motherboard Asus P5A, BIOS 1005
- 128 MB PC100 SDRAM
- Adaptec 2940U2W SCSI host adapter
- IBM DGVS 09U SCSI hard drive
- Asus V3400TNT graphics card, running with fast memory timings and NVIDIA reference driver 1.06 (‘Detonator’)
Intel Pentium III, Pentium II and Celeron system:
- Motherboard Asus P2B, BIOS 1008 beta 3
- 128 MB PC100 SDRAM
- Adaptec 2940U2W SCSI host adapter
- IBM DGVS 09U SCSI hard drive
- Asus V3400TNT graphics card, running with fast memory timings and NVIDIA reference driver 1.06 (‘Detonator’)
Winstone99 and 3D Winbench99 ran at 1024x768x16bit color, 85 Hz refresh rate.
Quake 2 ran at 1024x768x16bit color and 85 Hz refresh rate, using version 3.20; the K6-3 and K6-2 ran version 3.19 with the 3DNow! Quake 2 patch.
Shogo ran at 1024x768x16bit color, 85 Hz refresh rate, version 2.1, demo ‘tomsdemo’ recorded by myself.
Photoshop 5 ran at 1024x768x16bit color, 85 Hz refresh rate.
3Dmark99 ran at 1024x768x16bit color, 85 Hz refresh rate, triple buffering enabled.
3DstudioMax 1.2 rendered the file ‘ktx_rays.max’ at 640×480.
Naturally Speaking, Netshow Encoder and Photoshop 5 were benchmarked using Intel’s ‘Application Launcher’ software.
For the K6-3 third-level cache comparison, FIC PA-2013 motherboards were used.
The file system was FAT16.
For Windows NT benchmarks the server version of Windows NT 4 with service pack 4 was used.
Office Application Performance under Windows 98
Winstone used to be the most important benchmark in the PC world, and a product with a good Winstone score is still highly respected. However, times have changed: today the user doesn’t have to wait for office applications to finish their job anymore; office applications nowadays spend most of their time waiting for the next user input. In other words, even the cheapest systems you can buy today provide more than enough power to run Word or Excel. This is why the importance of Winstone scores is certainly decreasing. Today’s PCs are used for a lot more than just office applications. 3D gaming, video editing, 3D modelling, sound editing and many other things are today’s really power-hungry applications, and it’s no longer only a minority that runs them. So let’s be aware that a high Winstone score does not automatically make a CPU a great overall performer, which is certainly different from two years ago.
It is certainly impressive to see that AMD’s K6-3 at 450 MHz is able to beat Intel’s new flagship, the Pentium III at 500 MHz. The new features of the Pentium III don’t help in Business Winstone at all: a Pentium II at 500 (which does not exist; it’s overclocked) and a Celeron 500/100 (which also doesn’t exist yet) score exactly the same results. Even Intel’s own performance brief agrees with that. The K6-3 can really show its muscles here, as the new on-die L2-cache makes it significantly faster than Intel’s 6th generation processors at the same clock speed.
Office Application Performance under Windows NT
NT has always been the friend of Intel’s 6th generation CPUs, and this hasn’t changed. NT is an operating system that supports multi-CPU systems, something that still remains in Intel’s hands, since AMD won’t have any multiprocessor-capable CPU available until the release of the K7.
AMD’s K6-3 drops back considerably under NT and ends up slightly slower than an Intel P6 processor at the same clock speed.
High End Application Performance under Windows NT
Ziff-Davis’ High End Winstone has always been targeted at the workstation area. The applications used in this benchmark are for professional use and not likely to be found in home or small office environments, which is why High End Winstone may not be as important to the majority of computer users. However, it is a segment that does a lot for a company’s image: performing well in workstation environments makes the CPU maker look very professional.
It’s easy to see that the K6-3 has a hard time in this benchmark, but it’s not the right CPU for these kinds of applications anyway. Many of them can take decent advantage of multi-CPU systems, which is impossible with K6-2 or K6-3 CPUs.
Special Pentium III Software Performance – Dragon Naturally Speaking Professional 3.52
Dragon’s Naturally Speaking is a well-known professional voice recognition package, and the latest version (3.52) takes advantage of the Pentium III’s ‘streaming SIMD extensions’. PC computing power is increasing at a very fast pace while prices are dropping at the same time. This opens up a great future for voice recognition software, which has always needed a lot of computing power. Voice recognition is certainly a very important step towards making systems more user friendly and will significantly increase the acceptance of computers as soon as it becomes affordable. The benchmark turns a voice recording into a text file and measures the time it takes to finish the document. Of course this doesn’t reflect real-world use, since we won’t dictate any faster on a faster system. However, scoring well in this benchmark is a good measure of how long it takes until the system is trained to your voice.
The Pentium III scores a lot better than all the other CPUs; its performance is about 25% better than a Pentium II at the same clock speed, seemingly all due to the new SSE instructions. The results show a lot of other interesting things as well, though. Naturally Speaking obviously depends heavily on memory bandwidth, which is why a Celeron 400 at 66 MHz bus clock is a huge 20% slower than a Celeron 400 at 100 MHz bus clock, something you hardly ever see in any other benchmark. L2-cache speed also seems to be very important to Naturally Speaking, since the Celeron, whose smaller 128 kB L2-cache runs at full core clock, scores a whole lot better than a Pentium II, whose 512 kB L2-cache runs at half core clock, at the same core and bus clock.
The K6-3 doesn’t look great in this benchmark, but how could it? This benchmark comes directly from Intel, and Naturally Speaking is of course not 3DNow!-optimized. I have to say, though, that Intel is doing an excellent marketing job by providing a benchmark based on PIII-optimized voice recognition software. I just heard that ViaVoice is supposed to be 3DNow!-optimized, but would AMD’s marketing department let us know about it, or even use it for the presentation of the K6-3? Well, obviously they don’t, and that’s one of the many reasons why Intel is doing a lot better than AMD.
Special Pentium III Software Performance – Microsoft Netshow Encoder
Netshow Encoder is a somewhat odd piece of software to most of us. It converts AVI files (videos compressed with motion-JPEG, not MPEG!) into a special format that makes those videos ‘streamable’ over the Internet. I would have preferred a decent MPEG1 encoder or even some kind of MPEG2 encoder, which would mean a lot more to the majority of people. However, I will let Intel off the hook for choosing this program, since it’s so nicely PIII-optimized and it can at least give some idea of how well the PIII performs at video encoding. Take the results with a grain of salt: the benchmark comes directly from Intel!
You can see what a great job Intel did by choosing this software, and we should congratulate Microsoft for supporting SSE but not 3DNow!. I guess it’s not the first time that Microsoft has twisted the competition a bit, and as long as AMD can’t or won’t kick some ass in Redmond, things will probably just stay the same. OK, so let’s tag this benchmark with ‘Yes, I’ve seen it!’ and quickly move on to the next one.
Special Pentium III Software Performance – Adobe Photoshop 5
Adobe is another close friend of Intel’s processor enhancements; Photoshop was one of the few programs that could show a difference with the good old MMX extensions back then. However, Photoshop is, after all, widely used graphics software, so the results should be respected. What should not be respected is the missing support for 3DNow!, but maybe I am blaming the wrong party and AMD doesn’t even know that Photoshop exists!
Special Pentium III Software Performance – Futuremark MultimediaMark99
Futuremark is also “strongly supported” by Intel: MultimediaMark99 supports SSE but, of course, not 3DNow!. The ‘readme’ says that 3DNow! ‘may’ be supported in the future, and we all understand this very well, don’t we? After all, 3DNow! has only been out for more than nine months now and the Pentium III has just been released, so it is obviously extremely logical that SSE has to be supported first, isn’t it? MultimediaMark runs four tests: MPEG1 encoding, decoding, imaging and sound editing. Please take these results with a grain of salt as well; I only published them for the sake of completeness.
Surprise, surprise: the Pentium III scores better than the rest and the K6-3 scores badly. What a fair benchmark!
3D Gaming Benchmark Scores – 3DWinbench99
You know that I don’t think much of 3DWinbench as a tool for evaluating graphics card performance, but I do consider it one of the better CPU benchmarks. Because it uses DirectX’s transform and lighting engine, it is automatically 3DNow!- as well as SSE-optimized when running under DirectX 6.1 or higher. It’s still hard to say how good a job the optimization does, but it’s at least one way of comparing 3DNow! to SSE.
I do admit that choosing NVIDIA’s RIVA TNT may not have been the wisest decision for this benchmark. The 1.06 driver is certainly not 3DNow!-optimized, which makes the K6-3 fall behind. The Pentium III doesn’t score a whole lot better than the Pentium II, and this should be a sign to all of you who expect a major improvement in 3D gameplay from the Pentium III.
Looking at the transform & lighting performance on its own shows a bit of a different picture:
Here you can see a pretty nice comparison of 3DNow! and SSE. I don’t even know if AMD was ever aware of it, but until the days of Pentium III, K6-2 was the performance leader in this benchmark. Now times have changed, Pentium III smokes the whole competition. However, what is the value of this benchmark? It’s actually pretty worthless, because you can see that the overall 3DWinbench score is hardly influenced by it. Transform and lighting is just one part of the complete rendering process and improving this by even 100% doesn’t make a game run faster by more than maybe 10%. Maybe this benchmark is for the message boards, where one idiot shouts at another because he has better numbers. I’m sure that my message board and the lovely people on it will love that benchmark as well.
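The claim that doubling transform and lighting speed buys a game at most about 10% is an instance of Amdahl’s law: only the share of frame time spent in T&L gets faster. The sketch below applies it; the T&L share of frame time is a hypothetical parameter, not something 3DWinbench reports, and the function names are illustrative.

```python
def overall_speedup(fraction: float, part_speedup: float) -> float:
    """Amdahl's law: speed up `fraction` of the run time by `part_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


def max_fraction_for(target: float, part_speedup: float) -> float:
    """Share of frame time a stage may take if speeding it up by
    `part_speedup` should give at most `target` overall."""
    # Solve 1 / ((1 - f) + f / s) = target for f.
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / part_speedup)


# "Improving T&L by 100%" means part_speedup = 2.
f = max_fraction_for(1.10, 2.0)
print(f"T&L share of frame time for a 10% gain: {f:.1%}")   # 18.2%
print(f"Gain if T&L were 10% of the frame: {overall_speedup(0.10, 2.0):.3f}x")  # 1.053x
```

In other words, a 100% faster T&L stage only yields a 10% faster game if T&L already takes about 18% of each frame; if it takes less, the gain shrinks accordingly.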
3D Gaming Performance – Monolith Shogo Tomsdemo
This benchmark is actually taken from my Voodoo3-review. It does not include all of the CPUs, so it can only be used as a comparison between Pentium III and K6-3.
Shogo is not yet 3DNow!-optimized (the beta version of the 3DNow! renderer is too unstable), and Shogo isn’t SSE-optimized either. You can see that the K6-3 doesn’t post earth-shattering scores, but it’s not as far behind as in the 3DWinbench comparison with the TNT. 3Dfx has always been one of the few companies that really supported 3DNow! in their drivers, and you can see that the K6-3 does pretty well with the Voodoo3.
3D Gaming Performance – Id Quake2 Crusher.dm2
This benchmark was done twice, once with the TNT and once with the Voodoo3:
The K6-3 still doesn’t score a whole lot better than the K6-2 in Quake2, at least as long as the 3DNow! support is only minimal. You can also see that the Pentium III doesn’t score any better than a Pentium II.
Things look seriously different with the Voodoo3 instead of the TNT. The 3DNow! support in the 3Dfx driver makes the difference: now the K6-3 scores better than a Celeron 400!
Floating Point Performance – 3DStudioMax Rendering Time
Floating point has never been a strength of the K6 or K6-2 when compared with Intel CPUs. 3DStudioMax needs pure FPU power. Let’s see how the K6-3 and its on-die L2-cache do here.
Well, there isn’t much of a difference with or without the on-die L2-cache. 3DStudioMax is one of the programs that you don’t really want to run on a K6-2 or K6-3, unless you’ve got a lot of time to waste.
Third Level Cache Size Comparison for AMD K6-3
A few months ago there was quite a lot of blabla on the web about the influence of the on-board L3-cache in K6-3 systems. These are the results:
It seems to make sense to go up to 1 MB of on-board (L3) cache, while 2 MB hardly seems worth it anymore. The P5A with only 512 kB of on-board cache actually scores even better than the PA-2013 with 2 MB. All in all, I wouldn’t worry about the L3-cache issue too much at all.
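One way to see why the L3 size matters so little is the average memory access time of a cache hierarchy: the on-board L3 only comes into play for the small share of accesses that miss both the L1 and the on-die L2. The sketch below computes that average. Every latency and hit rate in it is an invented, round illustrative number, not a measurement from the P5A or PA-2013, and the function name is hypothetical.

```python
def average_access_ns(levels, memory_ns):
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_time_ns, local_hit_rate), innermost cache first.
    Each level costs its hit time, plus its miss rate times the cost of
    the next level down (ending at main memory).
    """
    cost = memory_ns
    for hit_ns, hit_rate in reversed(levels):
        cost = hit_ns + (1.0 - hit_rate) * cost
    return cost


# Hypothetical numbers for illustration only.
l1 = (2.5, 0.95)    # on-die L1
l2 = (7.5, 0.80)    # on-die L2 at core clock
memory = 90.0       # main memory

no_l3 = average_access_ns([l1, l2], memory)
for l3_hit_rate in (0.40, 0.50):         # e.g. a smaller vs. a larger board cache
    with_l3 = average_access_ns([l1, l2, (30.0, l3_hit_rate)], memory)
    gain = (no_l3 / with_l3 - 1.0) * 100
    print(f"L3 hit rate {l3_hit_rate:.0%}: {with_l3:.3f} ns vs {no_l3:.3f} ns, {gain:.1f}% faster")
```

A larger board cache would show up here as a higher L3 hit rate, but it only trims the cost of the few accesses that get that far, so each step up in L3 size buys less.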
A Word on SSE-Optimized 3D Games
I received a few games and a few demos that are SSE-optimized. Intel’s ‘Bug’ demo shows a bug flying around an area with a very high polygon count. This demo of course runs faster with SSE enabled on a Pentium III. However, the frame counter built into this demo shows wrong results, which is why I couldn’t use it for any kind of benchmarking. ‘Dispatched’ is a demo from Rage, written specifically for Intel and the Pentium III. A guy flies around on a jet-ski kind of thing through some nicely done open areas and caves. It isn’t useful as a benchmark either, though. ‘Expendable’ is another SSE-optimized game from Rage. On a Pentium III the little figures have more shadows and some of the effects are better. However, this can’t really push me into buying a Pentium III for this game, since it runs just fine on a Pentium II as well. ‘Wargasm’ is also supposed to be PIII-optimized, but Intel never told me what kind of difference you see when running it on a Pentium III system. All in all, there isn’t any screamer game out right now that would particularly benefit from SSE optimizations. Maybe this will change once Quake Arena is out, but it could also be that games will run just fine without SSE for a long while and Pentium III owners will only see a few more fancy features.
Conclusion
AMD’s K6-3 is now the fastest PC processor for business applications under Windows98, but as soon as Intel releases the Pentium III 550 the crown will go back to Intel. I have to say that I am a bit disappointed by the K6-3, since it only really shines when running integer operations. Applications with floating point calculations, such as 3D games, voice recognition, 3D rendering and video compression, only run fast enough if optimized for 3DNow!. Unfortunately AMD wasn’t able to do a great job of convincing software developers to do decent optimizations for 3DNow!, so the majority of software still doesn’t take advantage of AMD’s floating point SIMD instructions, although this instruction set has been available for almost nine months now. It also seems obvious that a K6-3 owner should get no graphics card other than a 3Dfx card: 3Dfx is the only 3D-chip maker with well-performing 3DNow!-optimized drivers. AMD will definitely have a serious problem positioning the K6-3. Its Winstone performance would make it eligible to be promoted as a high-end processor, but this won’t really work out as long as it performs worse than the Celeron in most 3D games. You also still can’t really use it for 3D rendering or other workstation software; for those tasks the Celeron is the better and still cost-effective choice as well. One of the beauties of the K6-3 is that any K6-2 owner can drop it into his Super7 board, as long as the board provides enough current. However, this previous K6-2 owner may be disappointed by the 3D gaming performance of the K6-3, because he will find that in many cases it’s hardly better than the K6-2.
My verdict on the K6-3: buy it if office application performance is most important to you; get a Celeron if you care about 3D games or other floating point intensive software.
I never really expected the Pentium III to be an exciting product, so I’m not disappointed by it, but I have to say that people should still expect more performance from an Intel product that even got a new name. The Pentium III’s SSE instruction set may be useful in the future, or it may not. The story may not be quite as pathetic as in the case of MMX. However, when MMX came out, Intel added several performance-enhancing features to their product as well, such as a doubled L1-cache. Thus it made sense to move from the Pentium Classic to the Pentium MMX: the Pentium MMX was significantly faster than its predecessor, even though that wasn’t due to MMX. The Pentium III is only faster than the Pentium II due to its higher clock speed. We will have to wait and see whether any software will take real advantage of SSE. The best candidates may be Quake Arena, Dragon’s Naturally Speaking and the upcoming 3DStudioMax 3.0. If these titles do indeed show a real performance advantage on a Pentium III system, then I will recommend it to you. For now, I cannot see why anybody has to rush into buying Pentium III processors. The Celeron is still by far the best bang for the buck, and Pentium II prices will now drop as well.
An Outlook into the Near Future
There is hardly any doubt that Intel has made it again. The Pentium III may not be exciting, but with the SSE-optimized software that is available now and upcoming titles such as Quake Arena, it only takes a decent amount of Intel marketing and the Pentium III will be a success. Intel has a big and pretty powerful department that pushes software developers into using SSE, so we can expect a lot of SSE-optimized software this year. The Pentium III as it is right now will probably not be around for long. The introduction of ‘Dixon’ to the mobile market proves that Intel is ready to go with a 256 kB on-die L2-cache. ‘Dixon’ is a Pentium II core with 256 kB of on-die cache and thus very similar to the Celeron’s ‘Mendocino’-core, which has only 128 kB of on-die L2-cache. The next CPU on Intel’s roadmap has the code name ‘Coppermine’. This CPU is a Pentium III with 256 kB of on-die L2-cache, so the only difference from ‘Dixon’ is the additional SSE instruction unit. It is not very hard to imagine that Intel already has several ‘Coppermine’-cores up and running; the couple of million extra transistors that Coppermine needs over Dixon will hardly be a problem for Intel. Thus we can expect ‘Coppermine’ any time soon, or rather any time Intel feels like releasing it. Since Coppermine no longer requires any external cache chips, it could also become available for a new kind of socket rather than only for Slot1.
AMD will have to go through a hard time until the release of the K7. The K6-3 is not quite the great product many of us were hoping for. Office performance alone doesn’t really amaze any of us anymore; 3D performance is at least as important. As long as the Celeron runs 3D games and even high-end applications faster than the K6-3, AMD can hardly ask for a higher price than what Intel charges for the Celeron. This puts AMD in a horrible situation. The K6-3 will be significantly more expensive to produce than the K6-2, since it has more than twice as many transistors, yet the K6-3 can hardly cost any more than the K6-2 if anyone is supposed to buy it. The Celeron’s low pricing is a serious threat to K6-3 sales, so it’s very questionable whether AMD will make any profit until the release of the K7. The K7 will most likely have to face ‘Coppermine’ rather than the Pentium III’s current ‘Katmai’-core, so even this promising processor will have a very hard time at its release in mid 1999.