When a processor is waiting for data or instructions, time is wasted. The longer data and instructions take to reach the CPU, the worse it gets. When data and instructions are in cache, the processor can grab them much more quickly than when it has to go to slow main memory. Not only is cache latency much lower than DRAM latency, but cache bandwidth is also much higher.
There are some tricky programming techniques in use out there to keep often-used data and instructions in cache, and they are not the kind of techniques you learn in your high school BASIC course. Still, the easiest way to keep data and instructions in cache is to have a lot of cache to keep them in. Intel knew that when they designed the Itanium.
The Itanium has three levels of cache. L1 and L2 are on-die, while L3 is on the cartridge. According to Intel, the L3 cache weighs in at 2MB or 4MB of four-way set associative cache on two or four 1MB chips. IDC reports that the L2 cache is 96KB, and that the L1 cache, which does not deal with floating-point data, consists of a 16KB integer data cache and a 16KB instruction cache.
The 294.8 million transistors of the (4MB) level three cache run at full processor speed, giving 12.8GBps of bandwidth at 800MHz. With 2MB or 4MB of L3 cache on the Itanium, the chances of the required data and instructions being in cache are quite good, so bus traffic is reduced and performance increases. With six pipelines hungry for instructions and data, the Itanium needs all the cache it can get.
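As a quick sanity check on the L3 figures, the quoted bandwidth and clock imply how wide the path to the cache has to be. The sketch below only divides one quoted number by the other; it assumes 1GB means 10**9 bytes and one transfer per clock, neither of which the figures above state outright, and the function name is made up for illustration.

```python
# Figures quoted in the text; units are an assumption (1GB = 10**9 bytes).
L3_BANDWIDTH_BYTES_PER_S = 12.8e9   # 12.8GBps
L3_CLOCK_HZ = 800e6                 # 800MHz


def bytes_per_cycle(bandwidth_bytes_per_s, clock_hz):
    """Bytes that must move on every clock to sustain the given bandwidth."""
    return bandwidth_bytes_per_s / clock_hz


width_bytes = bytes_per_cycle(L3_BANDWIDTH_BYTES_PER_S, L3_CLOCK_HZ)
width_bits = width_bytes * 8
print(f"{width_bytes:.0f} bytes per cycle, a {width_bits:.0f}-bit path")
```

Under those assumptions the numbers work out to 16 bytes per clock, which corresponds to a 128-bit path between the processor and the L3 cache.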
To make caching even more effective, Intel uses data speculation and cache hints. Data speculation means fetching data early, even though it may turn out not to be needed or may change before it is used. If the data is needed and has not changed, the CPU does not have to pay the latency of fetching it at that point. The processor, with the help of compiled instructions, looks ahead, anticipates what data it may need, and brings it into cache or into the processor ahead of time. This helps hide memory latency. Cache hints are two-bit markers on memory loads, set by the compiler, that help the CPU find data in cache. This speeds up retrieving data from cache.
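The idea behind data speculation can be illustrated with a toy model: load a value early, remember where it came from, let any later store to the same address invalidate that memory, and check at the point of use whether the early value is still good. The sketch below is loosely modeled on the Itanium's advanced load and check instructions, but it is a software illustration only, not how the hardware is built; the class and method names are hypothetical.

```python
class SpeculativeLoadModel:
    """Toy model: an early load is tracked until it is checked at its point of use."""

    def __init__(self, memory):
        self.memory = memory      # address -> value
        self.registers = {}       # register name -> loaded value
        self.tracked = {}         # register name -> address of a still-valid early load
        self.reloads = 0          # how many checks had to redo the load

    def advanced_load(self, reg, addr):
        """Load early, before it is known whether a later store will hit addr."""
        self.registers[reg] = self.memory[addr]
        self.tracked[reg] = addr

    def store(self, addr, value):
        """Write memory and invalidate any tracked early load from the same address."""
        self.memory[addr] = value
        for reg in [r for r, a in self.tracked.items() if a == addr]:
            del self.tracked[reg]

    def check(self, reg, addr):
        """At the point of use, keep the early value if still valid, else reload it."""
        if self.tracked.get(reg) != addr:
            self.registers[reg] = self.memory[addr]
            self.reloads += 1
        self.tracked.pop(reg, None)
        return self.registers[reg]


cpu = SpeculativeLoadModel({0x10: 7, 0x20: 9})

cpu.advanced_load("r1", 0x10)
cpu.store(0x20, 1)               # unrelated store: the early load stays valid
kept = cpu.check("r1", 0x10)

cpu.advanced_load("r2", 0x20)
cpu.store(0x20, 5)               # conflicting store: the early load is stale
redone = cpu.check("r2", 0x20)

print(kept, redone, cpu.reloads)
```

In the first case the speculation pays off and the check costs nothing extra; in the second the value changed underneath the early load, so the check falls back to a normal load and the latency is paid after all.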
A major link in the food delivery system for the Itanium is the system bus. The Itanium will use a 2.1GBps multi-drop system bus to keep it well fed with data and instructions. We expect it will have a 128-bit, 133MHz bus. The memory subsystem and I/O will be determined by the chipset used. First-generation systems should use dual-memory ported SDRAM, giving 4.2GBps of memory bandwidth. Later generations will have the option to use DDR SDRAM or RDRAM. Eventually, Intel plans to move server platforms to DDR II. 64-bit, 66MHz PCI and AGP Pro (4x) should be common on Itanium motherboards, and support will be included in Intel's 460GX chipset.
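The expected bus width and clock can be checked against the quoted 2.1GBps with simple arithmetic. This is a minimal sketch assuming one transfer per clock and 1GB = 10**9 bytes; the function name and its `transfers_per_clock` parameter are illustrative, not anything taken from Intel's documentation.

```python
def peak_bandwidth_gbps(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in GB/s (10**9 bytes/s) of a bus of the given width and clock."""
    bytes_per_transfer = width_bits // 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


# The expected 128-bit, 133MHz system bus.
system_bus = peak_bandwidth_gbps(128, 133)
print(f"system bus: {system_bus:.1f} GB/s")
```

A 128-bit bus moves 16 bytes per transfer, so 133 million transfers per second gives just over 2.1GBps, which matches the quoted figure. The 4.2GBps memory figure is twice the bus figure, which would fit two such paths working in parallel, though the configuration is not spelled out here.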