At the heart of the Itanium lies a parallel execution core. Itanium family processors are intended to be very "wide": they can run multiple instructions and operations in parallel. The Itanium will also be somewhat deep, with a ten-stage pipeline. The first-generation Itanium processor will be able to issue up to six EPIC instructions every clock cycle. Compare this to the Pentium III and Athlon architectures, which can issue up to three x86 instructions per clock.
The six-issue (two-bundle) scheduler disperses instructions into nine functional slots: two integer slots, two floating-point slots, two memory slots, and three branch slots. This limits the number of each type of instruction that can be assigned in a single clock cycle. If one or more instructions cannot be executed because too many slots of one type are filled, those instructions are delayed until the next cycle. Proper compiler design should be able to handle most situations without overloading any type of functional slot.
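The slot limits described above can be illustrated with a small model. The sketch below is only a toy: the slot counts and the six-instruction issue width come from the text, while the names (SLOTS_PER_CYCLE, ISSUE_WIDTH, disperse) and the strictly in-order stall rule are assumptions made for illustration, not a description of the real dispersal logic.

```python
from collections import Counter

# Per-cycle slot counts taken from the text; the key names are illustrative.
SLOTS_PER_CYCLE = {"int": 2, "fp": 2, "mem": 2, "branch": 3}
ISSUE_WIDTH = 6  # two bundles of three instructions each


def disperse(instructions):
    """Group an in-order stream of slot types into issue cycles.

    Assumption: an instruction that finds no free slot of its type (or
    arrives after six instructions have issued) ends the current cycle and
    starts the next one. Nothing is reordered.
    """
    cycles = []
    current = []
    used = Counter()
    for kind in instructions:
        if kind not in SLOTS_PER_CYCLE:
            raise ValueError(f"unknown slot type: {kind}")
        width_full = len(current) == ISSUE_WIDTH
        slots_full = used[kind] == SLOTS_PER_CYCLE[kind]
        if width_full or slots_full:
            # Delay this instruction (and everything after it) one cycle.
            cycles.append(current)
            current = []
            used = Counter()
        current.append(kind)
        used[kind] += 1
    if current:
        cycles.append(current)
    return cycles


if __name__ == "__main__":
    balanced = ["mem", "mem", "int", "int", "fp", "fp"]
    int_heavy = ["int", "int", "int", "fp", "mem", "branch"]
    print(len(disperse(balanced)))   # 1
    print(len(disperse(int_heavy)))  # 2
```

In this model, a balanced mix of six instructions issues in a single cycle, while a stream with three integer instructions in a row spills into a second cycle because only two integer slots exist. This is the kind of overload a well-designed compiler would try to avoid when scheduling.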
Itanium Block Diagram
Backing up the Itanium's six-issue scheduler are 11 execution units: four integer, two floating-point, three branch, and two load/store units. These help support the various EPIC instructions that can launch more than one operation in a single instruction, such as SIMD floating-point operations. Combined with the EPIC instruction set, the Itanium can execute up to 20 operations in a single cycle when doing some floating-point-intensive tasks.
The Itanium may not consistently run 20 operations per cycle, but the potential is there, and proper coding and compiling should make efficient use of the CPU. Either way, in concert with the EPIC instruction set and predication, this width will allow tasks to be completed in relatively few clock cycles. For more integer-oriented tasks, where few instructions carry multiple operations, the theoretical maximum is eight operations per clock.