We've seen Intel's move to an on-die cache usher in higher levels of speed and introduce processors whose frequencies were no longer limited by SRAM technologies. At the same time, AMD has been releasing faster processors with ever-smaller cache dividers, keeping the L2 clock at roughly 300-340MHz (the point where SRAM modules become very expensive). We've been mouthing the words "on-die" for months, and AMD is finally ready to deliver.
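To make the divider arithmetic concrete: an off-die L2 runs at the core clock multiplied by a divider ratio, so as the core clock climbs the ratio has to shrink to keep the L2 under the SRAM ceiling. The sketch below is illustrative only; the divider set (1/2, 2/5, 1/3) and the 340MHz ceiling are assumptions drawn from the range quoted above, not a spec sheet, and `pick_l2_divider` is a hypothetical helper.

```python
from fractions import Fraction

# Hypothetical divider options, assumed for illustration (not a list of shipping parts).
DIVIDERS = [Fraction(1, 2), Fraction(2, 5), Fraction(1, 3)]

def pick_l2_divider(core_mhz, ceiling_mhz=340, dividers=DIVIDERS):
    """Return (ratio, l2_mhz) for the fastest divider that keeps the L2 at or below the ceiling."""
    for ratio in sorted(dividers, reverse=True):  # try the largest (fastest) ratio first
        l2_mhz = core_mhz * ratio
        if l2_mhz <= ceiling_mhz:
            return ratio, l2_mhz
    raise ValueError("no divider keeps the L2 under the ceiling")

for core in (600, 850, 1000):
    ratio, l2 = pick_l2_divider(core)
    print(f"{core} MHz core -> {ratio} divider -> {float(l2):.0f} MHz L2")
```

Exact fractions are used so that a result landing right on the ceiling (850MHz x 2/5 = 340MHz) is compared without floating-point rounding.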
Although the overall quantity of cache memory on the Athlon has decreased, the Thunderbird core itself sports three times as much on-die cache as the preceding K75 (384KB versus 128KB). The first level is a dual-ported 128KB L1, split into a 64KB instruction cache and a 64KB data cache, each 2-way set-associative. In theory, having four times as much L1 as the Pentium III (128KB versus 32KB) should translate into more speed, since four times as much data and instruction information sits right next to the execution core.
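Set-associativity determines where a given address may live: the address picks one set, and the line can occupy any of that set's "ways." The sketch below splits a 32-bit address into tag, index and offset fields for the 64KB, 2-way data cache described above. The 64-byte line size and the helper names `cache_geometry` and `locate` are assumptions for illustration.

```python
def cache_geometry(size_bytes, ways, line_bytes, address_bits=32):
    """Split an address into tag/index/offset widths for a set-associative cache."""
    num_sets = size_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1  # log2 of a power-of-two line size
    index_bits = num_sets.bit_length() - 1     # log2 of a power-of-two set count
    tag_bits = address_bits - index_bits - offset_bits
    return {"sets": num_sets, "offset_bits": offset_bits,
            "index_bits": index_bits, "tag_bits": tag_bits}

def locate(addr, size_bytes, ways, line_bytes):
    """Return (set_index, tag); the line may sit in any of the `ways` slots of that set."""
    g = cache_geometry(size_bytes, ways, line_bytes)
    line = addr // line_bytes
    return line % g["sets"], line // g["sets"]

# 64KB, 2-way data cache; the 64-byte line size is an assumption.
print(cache_geometry(64 * 1024, 2, 64))
# Two addresses 32KB apart land in the same set; with 2 ways they can both stay cached.
print(locate(0, 64 * 1024, 2, 64), locate(32 * 1024, 64 * 1024, 2, 64))
```

With these assumptions the cache has 512 sets, so a third address that is also a multiple of 32KB would force one of the first two out.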
Both the Pentium III and the new Athlon use 256KB of full-speed, on-die L2 cache, but the Athlon's L2 is 16-way set-associative, as opposed to the Pentium III's 8-way design. Essentially, associativity lets the hardware resolve cache conflicts (different addresses competing for the same cache location) instead of leaving them to the programmer, who would otherwise have to lay out code and data to avoid them; that is ideal in principle but rarely practical. For unoptimized programs, higher associativity increases the cache "hit" rate and can reduce execution time, especially in multithreaded applications.
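A small simulation shows why more ways help an unoptimized access pattern. This is a sketch, assuming 64-byte lines and true LRU replacement (real hardware may use a different replacement policy); the access pattern is contrived so that twelve blocks spaced 32KB apart all map to the same set in both a 256KB 8-way and a 256KB 16-way cache. The class and function names are hypothetical.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Minimal set-associative cache model with LRU replacement (counts hits and misses only)."""
    def __init__(self, size_bytes, ways, line_bytes=64):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set: keys are tags, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(s) >= self.ways:
                s.popitem(last=False)  # evict the least recently used line
            s[tag] = True

def run(ways, passes=10, blocks=12):
    """Loop over `blocks` addresses 32KB apart; return (hits, misses)."""
    cache = SetAssociativeCache(256 * 1024, ways)
    stride = 32 * 1024  # every address maps to set 0 for both 8-way and 16-way
    for _ in range(passes):
        for k in range(blocks):
            cache.access(k * stride)
    return cache.hits, cache.misses

for ways in (8, 16):
    print(f"{ways}-way: hits, misses = {run(ways)}")
```

With twelve competing lines and only eight ways, LRU evicts each line just before it is needed again, so every access misses; with sixteen ways all twelve fit and only the first pass misses.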
By adding several other features, such as redundant columns (spare cache columns that can replace defective ones, preserving cache integrity), an exclusive architecture (no data is duplicated between the L1 and L2, so the full 384KB holds unique data), and more write-back and fill buffers (which reduce the chance of the processor stalling while it waits for data), AMD has ensured that the Athlon is well suited to bandwidth-heavy workloads.
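The benefit of an exclusive design can be seen in a toy two-level model where a line lives in either L1 or L2, never both: L1 victims move down into L2, and an L2 hit moves the line up (removing it from L2). This is a simplified sketch, assuming fully associative LRU levels measured in lines; the real Athlon caches are set-associative and their exact victim handling is not modeled. `ExclusiveHierarchy` is a hypothetical name.

```python
from collections import OrderedDict

class ExclusiveHierarchy:
    """Toy exclusive L1/L2: each line is held in exactly one level (or neither)."""
    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()  # LRU order, least recently used first
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.memory_fetches = 0

    def _insert_l1(self, line):
        if len(self.l1) >= self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            # Exclusive policy: the L1 victim moves down into L2.
            if len(self.l2) >= self.l2_lines:
                self.l2.popitem(last=False)  # L2 victim leaves the chip (write-back not modeled)
            self.l2[victim] = True
        self.l1[line] = True

    def access(self, line):
        if line in self.l1:
            self.l1.move_to_end(line)
        elif line in self.l2:
            del self.l2[line]  # promote: remove from L2 so the line is never duplicated
            self._insert_l1(line)
        else:
            self.memory_fetches += 1
            self._insert_l1(line)  # fills go straight to L1, not to L2

# Tiny example: 2-line L1 and 4-line L2 can keep a 6-line working set on chip.
h = ExclusiveHierarchy(l1_lines=2, l2_lines=4)
for _ in range(10):
    for line in range(6):
        h.access(line)
print("memory fetches:", h.memory_fetches)
```

Under the 64-byte-line assumption, a 128KB L1 and 256KB L2 are 2048 and 4096 lines, so an exclusive hierarchy can hold up to 6144 unique lines, whereas an inclusive L2 (which must also contain every L1 line) would cap the unique working set at 4096.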
For those of you who are into WCPUID, we ran the latest beta version of this nifty utility for info on AGP, cache, chip features and the usual, and this is what she said…