The Trace Cache Branch Prediction Unit
Intel is very proud of the branch prediction unit that aids the execution trace cache. Its branch target buffer is 8 times as large as the one found in Pentium III, and its new algorithm is supposed to be way better than the G-share algorithm AMD uses in Thunderbird and Spitfire. Intel claims that this unit can eliminate 33% of the mispredictions of Pentium III.
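For readers who have never seen G-share spelled out, here is a minimal sketch of the textbook scheme: the branch address is XORed with a global history of recent branch outcomes, and the result selects a 2-bit saturating counter. This is neither AMD's actual implementation nor Intel's new algorithm, whose details are not given here. The class name, the table size and the helper function are assumptions made for illustration only.

```python
class GsharePredictor:
    """Textbook gshare sketch; sizes and names are illustrative assumptions."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0  # global history of recent outcomes, 1 = taken
        # One 2-bit saturating counter per entry, starting at "weakly not taken".
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc):
        # The defining gshare step: XOR the branch address with the history.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the real outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def count_mispredictions(predictor, pc, outcomes):
    """Feed a sequence of outcomes for one branch and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

Once the history register has filled up, a strictly alternating taken/not-taken branch is predicted without any misses in this sketch, because each of the two recurring history patterns selects its own counter.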
One of the best-known features of the new Pentium 4 is its extremely long pipeline. While the pipeline of Pentium III has 10 stages and that of Athlon 11, Pentium 4 has no fewer than 20 stages.
The reason for the longer pipeline is Intel's wish for Pentium 4 to reach the highest possible clock rates. The shorter each pipeline stage, the fewer transistors or 'gates' it needs and the faster it is able to run. However, long pipelines also have one big disadvantage. As soon as it turns out at the end of the pipeline that the software will branch to an address that was not predicted, the whole pipeline needs to be flushed and refilled. The longer the pipeline, the more 'in-flight' instructions are lost and the longer it takes until the pipeline is filled again.
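The trade-off can be put into a back-of-the-envelope formula. The sketch below assumes, as a simplification, that every mispredicted branch costs about as many cycles as the pipeline has stages. The function name and all input numbers (base CPI, share of branches, misprediction rate) are hypothetical values chosen for illustration, not measurements of any real processor.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once misprediction flushes are included.

    Simplified model: each mispredicted branch adds a fixed flush penalty.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical workload: 20% branches, 10% of them mispredicted.
short_pipeline = effective_cpi(1.0, 0.20, 0.10, flush_penalty_cycles=10)
long_pipeline = effective_cpi(1.0, 0.20, 0.10, flush_penalty_cycles=20)

# Same long pipeline, but with a third of the mispredictions eliminated.
long_pipeline_better_bpu = effective_cpi(1.0, 0.20, 0.10 * (1 - 0.33), 20)
```

With these made-up inputs, doubling the penalty from 10 to 20 cycles raises the result from about 1.2 to about 1.4 cycles per instruction, and removing a third of the mispredictions brings it back down to roughly 1.27. That is why a long pipeline depends so heavily on good branch prediction.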
Intel is proud to announce that the Pentium 4 pipeline can keep up to 126 instructions 'in-flight', among them up to 48 load and 24 store operations. The improved trace cache branch prediction unit described above is supposed to ensure that flushes of this long pipeline remain rare.
What happens in the trace cache, as mentioned above, only represents the first five stages of the Pentium 4 pipeline. What follows is:
- Allocate resources
- Register renaming
- Write into the µOP queue
- Write into the schedulers and compute dependencies
- Dispatch µOPs to their execution units
- Read register file (to ensure that the correct ones of the 128 general-purpose registers in the register file are used for the actual instruction)
After that comes the actual execution of the µOP, which I will discuss in more detail in the next paragraph. Of the stages listed above, the schedulers and the register file read are the most interesting. Still, I have decided against discussing them in detail to keep this article from becoming my next book.
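The register renaming step from the list above is at least easy to show as a toy model. The sketch below maps architectural register names onto a pool of 128 physical registers, so that two writes to the same architectural register land in different physical ones. Only the figure of 128 comes from the text above. The class, its method and the simple free list are hypothetical, and returning registers to the pool when instructions retire is left out.

```python
class RenameTable:
    """Toy register renamer; a sketch under assumptions, not Pentium 4's design."""

    def __init__(self, arch_regs, num_physical=128):
        self.free = list(range(num_physical))  # unused physical registers
        self.mapping = {}                      # architectural name -> physical index
        for name in arch_regs:
            self.mapping[name] = self.free.pop(0)

    def rename(self, dest, sources):
        """Return (physical destination, physical sources) for one instruction."""
        # Sources must be looked up before the destination gets remapped,
        # otherwise an instruction like "eax = eax + ebx" would read itself.
        phys_sources = [self.mapping[s] for s in sources]
        if not self.free:
            raise RuntimeError("no free physical register; a real core would stall")
        phys_dest = self.free.pop(0)
        self.mapping[dest] = phys_dest
        return phys_dest, phys_sources


table = RenameTable(["eax", "ebx", "ecx"])
first = table.rename("eax", ["ebx", "ecx"])   # eax = ebx op ecx
second = table.rename("eax", ["eax", "ebx"])  # eax = eax op ebx
```

Here `first` is `(3, [1, 2])` and `second` is `(4, [3, 1])`. The second instruction reads the result of the first from physical register 3 and writes its own result to physical register 4, so the two writes to `eax` no longer compete for the same storage.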