The Itanium contains four pipelined FMAC (floating-point multiply-accumulate) units. The two primary units can each process two floating-point operations per clock (a multiply and an add) in single, double, or double-extended precision. At a clock speed of 800 MHz, that yields up to 3.2 GFLOPS of high-precision floating-point throughput.
Two additional FMACs are tuned for 3D applications. Each can process up to two single-precision floating-point operations per clock, which yields another 3.2 GFLOPS of single-precision throughput. Altogether, the Itanium has a theoretical peak of 6.4 GFLOPS of single-precision floating-point performance.
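The peak figures above are simple products of unit count, operations per clock, and clock rate. The sketch below reproduces that arithmetic. The 800 MHz clock is an assumption chosen because it matches the quoted totals, and the function and variable names are illustrative only.

```python
def peak_gflops(units, flops_per_clock, clock_mhz):
    """Theoretical peak: units x operations per clock x clock rate, in GFLOPS."""
    return units * flops_per_clock * clock_mhz / 1000.0


CLOCK_MHZ = 800  # assumed clock speed; not stated in the figures above

# Two primary FMACs, two operations per clock each (any supported precision).
primary = peak_gflops(units=2, flops_per_clock=2, clock_mhz=CLOCK_MHZ)

# Two additional FMACs, two single-precision operations per clock each.
additional = peak_gflops(units=2, flops_per_clock=2, clock_mhz=CLOCK_MHZ)

# Single-precision work can use all four units at once.
single_precision_total = primary + additional

print(f"primary FMACs:          {primary:.1f} GFLOPS")
print(f"additional FMACs:       {additional:.1f} GFLOPS")
print(f"single-precision total: {single_precision_total:.1f} GFLOPS")
```

These are theoretical ceilings: they assume every unit is kept busy on every clock, which real code rarely manages.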
There are four pipelined ALUs (arithmetic logic units) in the original Itanium. Each can process one integer calculation per cycle, and they can also execute MMX-type instructions. So while the Itanium could be a floating-point powerhouse, its integer throughput has considerable potential as well.
The Itanium will come with 128 floating-point and 128 integer registers. When processing up to 20 operations in a single clock, the registers give plenty of room for data inside the processor. This reduces the chance that an instruction is delayed because its data could not be held locally. This is especially important since the Itanium can process up to eight floating-point operations in a single clock; with that many operations running at once, too few registers could be a serious bottleneck.
The registers can also rotate. With rotating registers, the processor renames a block of software-accessible registers on each loop iteration, so the same instruction operates on a different register in turn. This increases pipeline utilization and efficiency when processing streams of data.
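To make register rotation more concrete, here is a small software model of the idea. It is a simplified sketch, not a description of the actual hardware: the class `RotatingRegisterFile`, the function `pipelined_double`, and the four-register size are all hypothetical, and the real processor ties rotation to special loop instructions rather than an explicit call. The model keeps a rotating base offset, so a value written to logical register 0 in one iteration shows up as logical register 1 in the next without being copied.

```python
class RotatingRegisterFile:
    """Toy model: logical register numbers map to physical slots via a rotating base."""

    def __init__(self, size):
        self.size = size
        self.phys = [None] * size
        self.base = 0  # rotating register base

    def _slot(self, logical):
        return (logical + self.base) % self.size

    def read(self, logical):
        return self.phys[self._slot(logical)]

    def write(self, logical, value):
        self.phys[self._slot(logical)] = value

    def rotate(self):
        # Decrementing the base makes the value in logical r[n] appear in r[n+1].
        self.base = (self.base - 1) % self.size


def pipelined_double(data):
    """Two-stage software pipeline: stage 1 loads a value, stage 2 doubles it."""
    regs = RotatingRegisterFile(4)
    out = []
    # One extra iteration drains the last loaded value out of the pipeline.
    for i in range(len(data) + 1):
        if i >= 1:
            # Stage 2: the value loaded last iteration is now visible as r1.
            out.append(regs.read(1) * 2)
        if i < len(data):
            # Stage 1: always writes logical r0; rotation keeps it from
            # overwriting the value stage 2 still needs.
            regs.write(0, data[i])
        regs.rotate()
    return out


print(pipelined_double([1, 2, 3]))
```

Because both stages always name the same logical registers, the loop body never changes, yet each iteration works on fresh data. That is what lets overlapping iterations of a loop keep the pipeline full when streaming through data.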