Today (September 15, 1998) was the official start of
The most interesting announcement for me came right after those two speakers, when Ticky Thakkar gave the first details about KNI (Katmai New Instructions) and the other enhancements that the upcoming Katmai/Tanner architecture will bring us.
You certainly already know about the 70 new instructions of the upcoming Katmai processor, which are mainly designed to enhance 3D gaming. These new instructions offer SIMD (Single Instruction Multiple Data) operations on single precision floating point values, one of the most important things for 3D game computing. AMD has already realized this idea with its 3DNow! technology, but today's announcement made clear that Intel is taking a more sophisticated approach that differs significantly from AMD’s way of implementing SIMD-FP. There’s also a bit more to the Katmai architecture than just SIMD-FP, as you will see in the following list:
- Introduction of 8 new 128-bit CPU registers, each holding 4 packed 32-bit single precision values. This enables computation on 4 single precision FP values at the same time, which equals up to 2 GFLOPS of peak floating point performance. It is not yet clear whether Katmai will have one or two SIMD-FP pipelines. With two SIMD-FP pipelines, the peak performance could add up to as much as 4 GFLOPS.
AMD’s K6-2 is equipped with only 8 registers of 64 bits (2 x 32 bit) each.
- Introduction of a new separate processor state or mode to take advantage of those registers, which is the first new Intel processor mode since the 386 mode that was introduced more than 10 years ago.
This new processor mode will require an extension to the operating system. Intel already has a patch available for Windows 98, and Windows NT 5 will support the new mode by default.
- The new processor state will enable concurrent use of either SIMD-FP and MMX or SIMD-FP and IA-FP double precision floating point code. You may remember that it was and still is impossible to use MMX and normal IA-FP at the same time, since both use the same registers. This problem does not occur with Katmai’s new SIMD-FP.
The K6-2 is not able to use its 3DNow! unit and its normal double precision FP unit in parallel.
- Introduction of new load/store, basic arithmetic, square root, logic, comparison, negation, masking, ‘swizzle’ and conversion instructions that operate on those new SIMD-FP registers, known as ‘KNI’. It is not yet known how many of the 70 new instructions are devoted to SIMD-FP. The specific SIMD-FP instructions and the algorithms that use them will determine which operations run faster on a Katmai or on a K6-2.
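To make the register widths in the list above a bit more concrete, here is a minimal Python sketch that models a 128-bit register as four packed single precision lanes, and a packed add as one operation over all four lanes. It only illustrates the data layout and does not use any real KNI instruction. The names `pack_register`, `unpack_register` and `packed_add` are made up for this illustration.

```python
import struct

LANES = 4            # four single precision values per register
REGISTER_BYTES = 16  # 4 x 32 bit = 128 bit


def pack_register(values):
    """Pack four floats into a 16-byte (128-bit) register image."""
    if len(values) != LANES:
        raise ValueError("a packed register holds exactly four values")
    # "<4f" = four little-endian 32-bit single precision floats
    return struct.pack("<4f", *values)


def unpack_register(reg):
    """Return the four single precision lanes of a register image."""
    return list(struct.unpack("<4f", reg))


def packed_add(reg_a, reg_b):
    """Model one SIMD add: a single operation adds all four lanes."""
    lanes_a = unpack_register(reg_a)
    lanes_b = unpack_register(reg_b)
    return pack_register([x + y for x, y in zip(lanes_a, lanes_b)])


if __name__ == "__main__":
    a = pack_register([1.0, 2.0, 3.0, 4.0])
    b = pack_register([0.5, 0.5, 0.5, 0.5])
    result = packed_add(a, b)
    print(len(result) * 8, "bit register:", unpack_register(result))
```

With a 64-bit register only two such lanes fit, so the same four additions would take two packed operations instead of one.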
These features will improve 3D gaming significantly, but audio processing (e.g. speech recognition, surround sound, AC3), physical modeling and imaging are also supposed to benefit a great deal from KNI.
Andreas Stiller, the famous CPU expert of c’t magazine, disagrees with the assessment of a well-known 3D analyst at Micro Design Resources, who thinks that Katmai’s architecture is pretty much the same as AMD’s 3DNow! Andreas Stiller agrees with me that the new Katmai SIMD-FP architecture differs significantly from AMD’s SIMD-FP implementation. 3DNow! offers neither a new processor mode nor 8 additional 128-bit registers. AMD uses 8 64-bit registers and can thus do only two single precision FP operations per pipeline at the same time, whereas Katmai can do four. The K6-2 has two pipelines, so Katmai will have a 2x advantage only if it is equipped with two SIMD-FP pipelines as well. We should not forget, however, that Katmai’s underlying SIMD-FP unit is based on the architecture of the normal P6 FP unit, which is a lot faster than the normal FP unit of the K6. Hence it wouldn’t be surprising if Katmai’s SIMD-FP instructions turn out to be faster than those of the K6-2. We can expect AMD to include Katmai’s SIMD-FP in the upcoming K7 processor.
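The pipeline comparison above comes down to simple arithmetic, sketched here in Python. Note the assumptions: the 500 MHz clock is my own placeholder (no clock speed was announced), every lane of every pipeline is assumed to finish one operation per clock, and both chips are given the same clock purely for comparison. The function name `peak_flops` is made up for this sketch.

```python
# Assumption: 500 MHz clock. The announcement did not state a clock speed.
ASSUMED_CLOCK_HZ = 500_000_000


def peak_flops(lanes, pipelines, clock_hz=ASSUMED_CLOCK_HZ):
    """Peak FP operations per second, assuming each lane of each
    pipeline completes one operation per clock cycle."""
    return lanes * pipelines * clock_hz


katmai_one_pipe = peak_flops(lanes=4, pipelines=1)   # 128-bit registers
katmai_two_pipes = peak_flops(lanes=4, pipelines=2)
k6_2 = peak_flops(lanes=2, pipelines=2)              # 64-bit registers

if __name__ == "__main__":
    for name, value in [
        ("Katmai, one SIMD-FP pipeline", katmai_one_pipe),
        ("Katmai, two SIMD-FP pipelines", katmai_two_pipes),
        ("K6-2, two pipelines", k6_2),
    ]:
        print(f"{name}: {value / 1e9} GFLOPS")
```

Under these assumptions a one-pipeline Katmai and the K6-2 both reach four operations per clock, and only a two-pipeline Katmai doubles that. Real-world results will depend on the actual clock speeds and instruction latencies, which this sketch ignores.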