Since the 2.0 GHz Pentium 4 does not represent a change from previous Pentium 4 core designs, we'll once again do a short run-down of the basic features, performance, and capabilities of the Pentium 4 processor.
One of the more important Pentium 4 enhancements is the switch to the NetBurst micro-architecture. In a nutshell, this shift means that the Pentium 4 changed gears a bit from the existing processor architecture, with much higher frequencies being one of the main goals. Intel has lengthened the Pentium 4's pipeline from 10 stages to 20, thereby helping it to reach the current 2.0 GHz speed.
But this move also translated into some performance penalties. A longer pipeline means a much longer trip after a mispredicted branch, since the work in flight is thrown away and the correct instructions essentially have to climb back up to the top of the 20-stage pipeline to be processed. In the most basic sense, this helps illustrate one of the main differences between the Athlon and Pentium 4, and how the Athlon 1.4 GHz can compete so well against a higher-speed Pentium 4.
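To put rough numbers on that tradeoff, here is a minimal Python sketch of a textbook-style stall model. The function name, the branch frequency, the misprediction rate, and the idea that a flush costs about one full pipeline length are all illustrative assumptions, not measured Intel or AMD figures.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once branch misprediction stalls are added."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Illustrative assumptions only: 20% of instructions are branches,
# and 5% of those branches are mispredicted.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# Assume a flush costs roughly one trip through the whole pipeline.
short_pipe = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 10)
long_pipe = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 20)

print(f"10-stage pipeline: {short_pipe:.2f} cycles per instruction")
print(f"20-stage pipeline: {long_pipe:.2f} cycles per instruction")
```

With these made-up inputs the 20-stage design needs about 9% more cycles per instruction, which is the kind of deficit a higher clock speed has to win back.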
To help combat this, Intel has made a few improvements to the core design, such as the L1 Execution Trace Cache, Advanced Dynamic Execution, and the Rapid Execution Engine. These revolve around improving cache latencies and hit/miss ratios, expanding the instruction set, improving branch prediction, and literally doubling the speed of the Pentium 4's ALUs (Arithmetic Logic Units), which are used in integer processing. This last point holds the most promise, since the ALUs run at twice the processor's clock speed. While impressive on the surface, it could become even more important as Pentium 4 clock speeds continue to increase, thereby giving an even bigger boost to the ALUs.
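The double-speed ALU point is simple arithmetic, sketched below in Python. The function name is ours, and the 3.0 GHz entry is a hypothetical future clock speed used only to show how the gap widens.

```python
def alu_clock_ghz(core_clock_ghz, alu_multiplier=2):
    """Effective ALU clock when the ALUs run at a multiple of the core clock."""
    return core_clock_ghz * alu_multiplier


# 2.0 GHz is the current part; 3.0 GHz is a hypothetical future speed.
for core in (2.0, 3.0):
    alu = alu_clock_ghz(core)
    print(f"{core:.1f} GHz core: ALUs at {alu:.1f} GHz, {alu - core:.1f} GHz above core clock")
```

The multiplier stays at two, but the absolute headroom over the core clock grows with every speed bump.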
Another such enhancement has taken place in the Pentium 4's 256 KB of internal L2 cache. Intel has implemented a form of the Advanced Transfer Cache found in the Pentium III Coppermine, except now it can transfer data on every clock cycle rather than on alternating clocks as in the Coppermine. Under the right circumstances this can have the effect of doubling cache throughput, and coupled with the Pentium 4's high clock rates, it gives this feature some real potential. Intel has also jazzed up its SIMD instructions, adding 144 new instructions to create SSE2.
Currently, the Pentium 4 is the only processor with SSE2 incorporated, and a large number of applications are projected to be recompiled with the new SIMD instruction set in mind. The only enemy of the Pentium 4 and SSE2 is time, as it may take a while yet for the majority of SSE2-enhanced software to be recompiled and released, or for SSE2-enabled games and other entertainment products to work through the design period.
The most recognizable piece of the Pentium 4 architecture is its 400 MHz memory bus coupled with high-bandwidth RDRAM. This dual-channel memory bus can theoretically handle up to 3.2 GB/s of data, compared with 2.1 GB/s for an Athlon DDR system on a 266 MHz bus and only 1.06 GB/s for a Pentium III running on a 133 MHz system bus. While theoretical memory bandwidth figures are definitely in the Pentium 4's corner, there are also questions as to which current games and applications require it, as well as potential tradeoffs both with RDRAM memory and the specific i850 memory bus design.
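The theoretical figures quoted above come from multiplying the effective bus clock by the bus width. Here is a minimal Python sketch of that arithmetic, assuming a 64-bit (8-byte) data path on all three buses; the function and dictionary names are ours.

```python
def peak_bandwidth_gb_s(effective_bus_mhz, bus_width_bytes=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes a 64-bit (8-byte) data path unless told otherwise.
    """
    return effective_bus_mhz * 1e6 * bus_width_bytes / 1e9


# Effective bus speeds as quoted in the text.
systems = {
    "Pentium 4 (400 MHz bus)": 400,
    "Athlon DDR (266 MHz bus)": 266,
    "Pentium III (133 MHz bus)": 133,
}

for name, mhz in systems.items():
    print(f"{name}: {peak_bandwidth_gb_s(mhz):.2f} GB/s")
```

These are ceilings, not measurements, so they say nothing about how much of that bandwidth a given game or application can actually use.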
As we like to say here at SE, the proof is in the pudding, and as we will see in the upcoming System Benchmark area, there are definitely areas where the Pentium 4 excels, just as there are places where it comes in second to the Athlon. The trick is in determining your own specific requirements and then using the performance indicators to determine if the Pentium 4 makes a good fit.