AMD takes a totally different road. Just as Intel extended the x86 instruction set from 16-bit to 32-bit when designing the 386, AMD is extending the x86 instruction set to 64-bit in the K8.
The K8 will carry the whole x86 legacy with it, and it's clear that a 64-bit x86 CPU will never be as efficient as an IA-64 CPU.
Nevertheless, the K8 will run the ubiquitous IA-32 programs fast, very fast, just as the 32-bit 386 ran 16-bit programs faster than the 16-bit 286. The K8 will have an improved K7 core and will not have to switch back and forth between 32-bit and 64-bit. It will be able to execute 32-bit instructions in 64-bit mode, just as the 386 could carry out 16-bit instructions in 32-bit mode. But what about the FPU?
The FPU would seem to be the weak link in a 64-bit x86 CPU. Although the Athlon has a strong FPU, it cannot compete (clock-for-clock) with the RISC cores of Alpha and Sun, and it surely won't be able to compete with Intel's Itanium. The K8 and its siblings would be able to compete with the IA-64 CPUs as long as there are not many IA-64 programs. But once IA-64 programs are prevalent, the IA-64 FPU would slaughter the 64-bit x87 FPU. Considering how important FPU performance has become for desktop CPUs, and how important it has always been for workstations, this is a huge problem.
AMD has, however, anticipated this problem. 64-bit programs will be able to use new RISC-like (three-operand) FPU instructions and a new programming model. The new FPU instructions will be able to use a flat floating-point register file instead of the old stack-based one, just like those fast RISC CPUs (Sun, Alpha). In other words, when running 64-bit x86 programs, the double-precision FPU performance of the K8 should be on par with the fastest FPUs out there.
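To make the difference between a stack-based and a flat register file concrete, here is a minimal sketch in Python. It is a toy model only: the two interpreters, the instruction names and the register names (f0, f1, ...) are invented for illustration and are not real x87 or x86-64 encodings. Both programs compute (a * b) + (c * d).

```python
# Toy model only: two tiny interpreters, not real x87 or x86-64 instructions.

def run_stack(program, memory):
    """Stack-style FPU: operands are implicit, always the top of the stack."""
    stack = []
    for op, *args in program:
        if op == "load":                      # push a value from memory
            stack.append(memory[args[0]])
        elif op == "mul":                     # pop two values, push the product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "add":                     # pop two values, push the sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack.pop()

def run_flat(program, memory):
    """Flat register file: every instruction names dest, src1 and src2."""
    regs = {}
    for op, dest, *srcs in program:
        if op == "load":
            regs[dest] = memory[srcs[0]]
        elif op == "mul":
            regs[dest] = regs[srcs[0]] * regs[srcs[1]]
        elif op == "add":
            regs[dest] = regs[srcs[0]] + regs[srcs[1]]
    return regs[program[-1][1]]               # destination of the last instruction

memory = {"a": 1.5, "b": 2.0, "c": 4.0, "d": 0.5}

stack_program = [
    ("load", "a"), ("load", "b"), ("mul",),
    ("load", "c"), ("load", "d"), ("mul",),
    ("add",),
]

flat_program = [
    ("load", "f0", "a"), ("load", "f1", "b"),
    ("load", "f2", "c"), ("load", "f3", "d"),
    ("mul", "f4", "f0", "f1"),                # three operands: dest, src1, src2
    ("mul", "f5", "f2", "f3"),
    ("add", "f6", "f4", "f5"),
]

stack_result = run_stack(stack_program, memory)
flat_result = run_flat(flat_program, memory)
print(stack_result, flat_result)
```

In the flat version the two multiplies name their registers explicitly and do not depend on each other, so they could be reordered or issued side by side. In the stack version every operation is tied to whatever happens to sit on top of the stack.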
But what about single precision (32-bit)? Well, that should be handled by 3DNow!. Developers who want the best performance should go for 3DNow!, which is, thanks to its SIMD capabilities, faster than any x87 FPU will ever be. Provided that AMD can muster enough software support, the K8 will feature a very powerful FPU, which will trounce the Athlon's FPU. But there is more.
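The SIMD idea behind 3DNow! can be sketched in a few lines: one packed instruction works on two single-precision values held in a 64-bit register, so the same array needs half as many arithmetic instructions. The Python below is only a model of that counting argument; the helper names (to_f32, scalar_add, packed_add) are made up for this example and say nothing about real-world speed.

```python
import struct

def to_f32(x):
    """Round a Python float to single precision (32-bit)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def scalar_add(a, b):
    """Models one scalar instruction: a single addition."""
    return to_f32(a + b)

def packed_add(a, b):
    """Models one packed instruction: both 32-bit lanes are added at once."""
    return (to_f32(a[0] + b[0]), to_f32(a[1] + b[1]))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.5, 0.5, 0.5, 0.5]

# Scalar path: one instruction per element.
scalar_ops = 0
scalar_out = []
for x, y in zip(xs, ys):
    scalar_out.append(scalar_add(x, y))
    scalar_ops += 1

# Packed path: one instruction per pair of elements.
packed_ops = 0
packed_out = []
for i in range(0, len(xs), 2):
    lo, hi = packed_add((xs[i], xs[i + 1]), (ys[i], ys[i + 1]))
    packed_out += [lo, hi]
    packed_ops += 1

print(scalar_out == packed_out, scalar_ops, packed_ops)
```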
While Intel's Itanium will be used in large SMP systems (4 or 8 processors on one motherboard), AMD plans to put multiple x86-64 processors on a single die. One K8 might actually consist of two or more processing units. Those processing units are almost like separate CPUs that share certain functions.
Why is that so interesting? Well, software developers are moving to multithreaded software models. Multithreaded software is where one program consists of several smaller pieces (threads) that can run in parallel. This makes it possible to send those different threads to different CPUs (SMP) or to different multiple thread processing units.
Multiple thread processing units (MTP) united in one core have one big advantage over classic SMP (like dual Xeons). They don't duplicate everything and they don't have to synchronize over a slow bus. In other words, two MTP units take less die space and communicate faster than two CPUs.
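As a rough illustration of what "one program split into pieces" looks like, the sketch below divides a sum of squares into four chunks and hands each chunk to its own thread. The function names (partial_sum, split) and the chunk count are arbitrary choices for this example. Note that in standard CPython the threads will not actually execute this arithmetic in parallel, so the code shows the program structure rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Work done by one thread: sum of squares over its own piece of the data."""
    return sum(x * x for x in chunk)

def split(data, pieces):
    """Cut the data into roughly equal consecutive chunks."""
    size = (len(data) + pieces - 1) // pieces
    return [data[i:i + size] for i in range(0, len(data), size)]

data = list(range(1000))
chunks = split(data, 4)

# Each chunk is an independent piece of work, so the scheduler is free to
# place it on whichever CPU or processing unit is available.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(total)
```

The pieces only meet again at the final sum, which is the kind of workload that benefits when threads can communicate cheaply.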