Like the Pentium III before it, the Pentium 4 boasts 256KB of on-die L2 cache on a 256-bit bus. Unlike the Coppermine, however, the Pentium 4's L2 cache transfers data on every core clock rather than every other clock. The data transfer rate from the L2 to the CPU's core is simply the bus width multiplied by the transfers per clock and the clock speed:
256 bits (32 bytes) x 1 transfer per clock x 1.5GHz = 48GB/s for the Pentium 4 1.5GHz
256 bits (32 bytes) x 0.5 transfers per clock x 1GHz = 16GB/s for the Pentium III 1GHz
Again, as processor frequencies increase, so does the bandwidth of the L2 cache. For example, once Intel hits 2GHz, the L2 will be able to provide 64GB/s of bandwidth, another example of Intel striving to keep the execution units busy rather than sitting idle.
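For readers who want to try other clock speeds, the same arithmetic can be sketched in a few lines of Python. The function name and parameters are our own, chosen for illustration, and the result is a theoretical peak (with 1GB taken as 10^9 bytes), not a measured figure.

```python
def l2_bandwidth_gb_per_s(bus_width_bits, transfers_per_clock, core_clock_ghz):
    """Theoretical peak L2-to-core bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8  # 256 bits -> 32 bytes
    return bytes_per_transfer * transfers_per_clock * core_clock_ghz


# (label, transfers per clock, core clock in GHz), using the figures from the text
configs = [
    ("Pentium III 1GHz", 0.5, 1.0),
    ("Pentium 4 1.5GHz", 1, 1.5),
    ("Pentium 4 2GHz", 1, 2.0),
]

for label, transfers, clock in configs:
    bandwidth = l2_bandwidth_gb_per_s(256, transfers, clock)
    print(f"{label}: {bandwidth:g}GB/s")
```

Because the bus width and transfers per clock are fixed for a given design, peak L2 bandwidth scales linearly with core frequency.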
The first processor to receive Single Instruction Multiple Data (SIMD) instructions was the Pentium with MMX technology back in 1996. These 64-bit integer instructions paved the way for SSE, which debuted with the Pentium III a couple of years later featuring 128-bit single-precision floating-point instructions. Now that quite a few software developers have adopted MMX and SSE, Intel is ready to introduce the Pentium 4 with SSE2.
Featuring 144 new instructions that add 128-bit SIMD integer arithmetic and 128-bit double-precision floating-point operations, SSE2 gives developers further ability to reduce the number of instructions required to execute particular tasks. Among the operations that may be accelerated are those pertaining to video, speech, image and photo processing, encryption, financial, engineering and scientific applications.
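To get a feel for where the instruction savings come from, consider how many data elements fit in one 128-bit register. The Python sketch below is a back-of-the-envelope model only: the helper names are hypothetical, and it counts idealized packed operations while ignoring loads, stores, alignment and latency, so it should not be read as a performance prediction.

```python
REGISTER_BITS = 128  # width of one SSE/SSE2 register


def lanes(element_bits, register_bits=REGISTER_BITS):
    """Number of elements of the given width packed into one register."""
    return register_bits // element_bits


def packed_ops_needed(n_elements, element_bits, register_bits=REGISTER_BITS):
    """Idealized count of packed operations to touch every element once."""
    per_op = lanes(element_bits, register_bits)
    return -(-n_elements // per_op)  # ceiling division


# Element widths covered by 128-bit SIMD: doubles and 8- to 64-bit integers
for description, bits in [
    ("double-precision float", 64),
    ("32-bit integer", 32),
    ("16-bit integer", 16),
    ("8-bit integer", 8),
]:
    print(
        f"{description}: {lanes(bits)} per register, "
        f"{packed_ops_needed(1000, bits)} packed ops for 1000 elements"
    )
```

In this simplified model, an operation applied to 1000 double-precision values takes 500 packed operations instead of 1000 scalar ones, and the saving grows as the element size shrinks.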
Unfortunately, for an application to benefit from SSE2 it must be rewritten or recompiled to use the new instructions, for example with Intel's optimized C/C++ compiler. Thus far Intel has done well in getting developers to support its SIMD instructions, so we do not see a problem with SSE2 other than the time it will take for a majority of applications to incorporate the optimizations.