Advanced Transfer Cache
The next thing that (most of) the data has to pass through is Pentium 4's on-die L2 cache. Intel has called it 'Advanced Transfer Cache' since the days of
Pentium 4's L1 Cache
After the discussion of the L2 cache it would be only logical to move on to the L1 cache. This is what we will do, but not without a special remark. While Pentium III is equipped with a 16 KB L1 cache for instructions and a 16 KB L1 cache for data, Pentium 4 has only a small 8 KB L1 data cache. The L1 instruction cache of Pentium III is replaced by a pretty nifty feature called 'Execution Trace Cache', which I'll discuss in the next paragraph.
Intel was probably forced to reduce the size of the L1 data cache to only 8 KB, which is half the size of Pentium III's L1 data cache and only an eighth (!!!) of Athlon's, to achieve its extremely low latency of only 2 clock cycles. Even in the Pentium 4 at 1.4 GHz, this results in an overall read latency of less than half that of Pentium III's L1 data cache. However, the small size of Pentium 4's L1 data cache may be one reason for the performance flaws we will see when we get to the benchmark results.
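To make the latency claim concrete, here is a rough back-of-the-envelope sketch that converts cycle counts into nanoseconds. The 2-cycle figure and the 1.4 GHz clock come from the text above; the Pentium III numbers (3 cycles at 1.0 GHz) are my assumption for the sake of comparison, so treat the result as an illustration rather than a measurement.

```python
def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency given in clock cycles to nanoseconds."""
    # One cycle at f GHz lasts 1/f nanoseconds.
    return cycles / clock_ghz


# Pentium 4: 2 cycles at 1.4 GHz (figures from the text).
p4 = latency_ns(2, 1.4)

# Pentium III: ASSUMED 3 cycles at 1.0 GHz (not stated in the text).
p3 = latency_ns(3, 1.0)

print(f"Pentium 4   @ 1.4 GHz: {p4:.2f} ns")
print(f"Pentium III @ 1.0 GHz: {p3:.2f} ns")
print(f"Pentium 4 / Pentium III: {p4 / p3:.2f}")
```

Under these assumptions the Pentium 4 needs about 1.43 ns against 3 ns for the Pentium III, which is just under half.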
The L1 data cache of Pentium 4 is 4-way set associative and uses 64-byte cache lines. The dual-port architecture allows one load and one store operation per clock.
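The geometry that follows from these numbers can be worked out in a few lines. The sketch below takes the 8 KB size, 4 ways and 64-byte lines from the text and derives the number of sets and the address bits involved. The function name `split_address` and the plain "low bits select the set" indexing are illustrative assumptions of mine; the text does not say how Pentium 4 actually indexes or tags its L1 data cache.

```python
CACHE_BYTES = 8 * 1024   # 8 KB L1 data cache (from the text)
WAYS = 4                 # 4-way set associative (from the text)
LINE_BYTES = 64          # 64-byte cache lines (from the text)

NUM_LINES = CACHE_BYTES // LINE_BYTES        # 128 lines in total
NUM_SETS = NUM_LINES // WAYS                 # 32 sets of 4 lines each
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # 6 bits select the byte in a line
INDEX_BITS = NUM_SETS.bit_length() - 1       # 5 bits select the set


def split_address(addr: int) -> tuple:
    """Split an address into (tag, set index, byte offset).

    ASSUMPTION: simple textbook indexing, where the bits directly
    above the line offset choose the set. Hypothetical helper,
    not a description of the real hardware.
    """
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(NUM_LINES, NUM_SETS, OFFSET_BITS, INDEX_BITS)
print(split_address(0x1234))
```

With this simple model, addresses that lie 2 KB apart (32 sets x 64 bytes) fall into the same set, so a fifth such line would have to evict one of the four already held there. That gives a feel for how quickly an 8 KB cache can run out of room.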