This really is one of the most exciting rumors I have heard in a long time. For anybody who doesn't know, SMT (simultaneous multithreading) is likely to be the next "big thing" in microprocessor design and should rank with previous improvements like pipelining, superscalar architecture, and out-of-order execution. I don't really think I can talk about SMT without first talking about some of these other things, so I will break this up into several posts. This one covers the background material and is likely to be long-winded, so feel free to skip it if you understand the concepts.
Around 1970 Intel put a whole processor on a single chip, but the idea behind it goes back to the earliest stored-program computers: you use a counter to identify instructions stored in memory and execute a series of instructions programmatically. I.e. load the instruction pointed to by the (program) counter, decode it to find out which logic gates to activate, activate the gates and apply the input from the registers. Then move on to the next instruction. Some instructions (branch instructions) can change the contents of the program counter. Others (conditional instructions) can behave differently based on the result of the previous instruction. This is how the first processors worked.
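To make the fetch/decode/execute loop concrete, here is a minimal sketch of a toy machine. The instruction set (`LOADI`, `ADD`, `SUBI`, `JNZ`, `HALT`), the zero flag, and the function name `run` are all made up for illustration; no real processor is being modeled. `JNZ` is both a branch (it overwrites the program counter) and a conditional instruction (it only does so if the previous arithmetic result was non-zero).

```python
# Toy stored-program machine. The opcodes below are hypothetical,
# invented only to show the fetch -> decode -> execute cycle.

def run(program, num_regs=4):
    regs = [0] * num_regs
    pc = 0             # program counter: index of the next instruction
    zero_flag = False  # set by arithmetic, read by the conditional branch
    while True:
        instr = program[pc]   # fetch the instruction the counter points at
        op, *args = instr     # decode it
        pc += 1               # default: move on to the next instruction
        if op == "LOADI":     # LOADI rd, value
            rd, value = args
            regs[rd] = value
        elif op == "ADD":     # ADD rd, ra, rb
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
            zero_flag = regs[rd] == 0
        elif op == "SUBI":    # SUBI rd, value
            rd, value = args
            regs[rd] -= value
            zero_flag = regs[rd] == 0
        elif op == "JNZ":     # branch: overwrite pc if the last result was non-zero
            (target,) = args
            if not zero_flag:
                pc = target
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode {op!r}")

# Sum 5 + 4 + 3 + 2 + 1 with a loop.
SUM_PROGRAM = [
    ("LOADI", 0, 0),    # r0 = running total
    ("LOADI", 1, 5),    # r1 = loop counter
    ("ADD", 0, 0, 1),   # index 2: r0 += r1
    ("SUBI", 1, 1),     # r1 -= 1 (sets the zero flag)
    ("JNZ", 2),         # jump back to index 2 while r1 != 0
    ("HALT",),
]

print(run(SUM_PROGRAM))  # [15, 0, 0, 0]
```

Every instruction here runs to completion before the next one is even fetched, which is exactly the inefficiency pipelining goes after.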
The next thing people did was to take the operation I described above and break it into stages. I.e. you decode one instruction while the logic for the previous instruction is being activated, and at the same time you obtain the result of the instruction before that. While all this is going on, you can go out and fetch the next instruction from memory. This is called pipelining, and it sped up processors greatly, because once the pipeline is full an instruction can finish every cycle instead of every few cycles.
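A quick way to see the payoff is to count cycles. This sketch assumes a 4-stage pipeline named after the steps above (Fetch, Decode, Execute, Writeback), one cycle per stage, and no stalls; the stage names and function names are my own labels, not any particular chip's.

```python
# Cycle counts for an ideal pipeline (no stalls). Stage names are
# illustrative labels for the four steps described in the text.
STAGES = ["Fetch", "Decode", "Execute", "Writeback"]

def unpipelined_cycles(n, k=len(STAGES)):
    # each instruction goes through all k stages before the next one starts
    return n * k

def pipelined_cycles(n, k=len(STAGES)):
    # the first instruction needs k cycles to get through; after that,
    # one instruction completes every cycle
    return k + n - 1 if n > 0 else 0

def pipeline_diagram(n, stages=STAGES):
    """Show which stage each instruction occupies in each cycle."""
    total = pipelined_cycles(n, len(stages))
    rows = []
    for i in range(n):
        cells = []
        for cycle in range(total):
            stage = cycle - i  # instruction i enters the first stage in cycle i
            cells.append(stages[stage][0] if 0 <= stage < len(stages) else ".")
        rows.append(f"i{i}: " + " ".join(cells))
    return "\n".join(rows)

print(pipeline_diagram(4))
# i0: F D E W . . .
# i1: . F D E W . .
# i2: . . F D E W .
# i3: . . . F D E W
print(unpipelined_cycles(100), pipelined_cycles(100))  # 400 103
```

For long instruction streams the speedup approaches the number of stages (here close to 4x), which is why deeper pipelines were so attractive.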
After that came superscalar architecture. Instead of 1 pipeline you put in 2 (or more) and execute 2 instructions simultaneously.
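Two pipelines only double throughput when the two instructions don't depend on each other. Here is a rough sketch of a 2-wide, in-order issue rule, assuming each instruction is just (destination register, list of source registers) and that the only pairing restriction is "don't issue an instruction alongside the one producing its input." Real chips have many more pairing rules; `issue_groups` and the register names are hypothetical.

```python
# Toy model of a 2-wide in-order superscalar front end.
# Each instruction is (destination register, [source registers]).
# Simplification (assumption): only read-after-write conflicts inside a
# group are checked; real designs have many more pairing rules.

def issue_groups(instrs, width=2):
    """Split an instruction stream into groups that issue in the same cycle."""
    groups, current, written = [], [], set()
    for dest, srcs in instrs:
        # start a new cycle if all pipes are busy or this instruction needs
        # a result produced by an instruction in the current group
        if len(current) == width or any(s in written for s in srcs):
            groups.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups

INDEPENDENT = [("r1", ["r0"]), ("r2", ["r0"]), ("r3", ["r0"]), ("r4", ["r0"])]
CHAIN = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"])]

print(len(issue_groups(INDEPENDENT)))  # 2
print(len(issue_groups(CHAIN)))        # 4
```

Four independent instructions issue in 2 cycles, but a chain of four dependent ones still takes 4, so the second pipe sits idle.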
The next major innovations were speculative execution and out-of-order execution. In previous architectures the pipeline(s) would sometimes stall. For example, an instruction reaches the point where it is ready to apply the contents of the registers to the internal logic, but the data that was supposed to be in the register has not arrived from memory yet. In-order processors have to stop and wait for the data to arrive before they can continue.
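Here is a minimal sketch of that stall on a single-issue, in-order machine. Each instruction is (name, destination, sources, latency); the 3-cycle load latency and all the names are assumptions picked to match the "data hasn't arrived yet" case, not figures for any real chip.

```python
# Toy single-issue, in-order pipeline. Each instruction is
# (name, destination, [sources], latency in cycles). Register names and
# latencies are made up for illustration; registers that nothing in the
# program writes are assumed to be ready at cycle 0.

def in_order_schedule(program):
    ready_at = {}      # register -> cycle when its value becomes available
    issue_cycles = {}  # instruction name -> cycle it issued
    cycle = 0          # earliest cycle the next instruction could issue
    stalls = 0
    for name, dest, srcs, latency in program:
        # wait until every source register has its data
        earliest = max([cycle] + [ready_at.get(s, 0) for s in srcs])
        stalls += earliest - cycle
        issue_cycles[name] = earliest
        ready_at[dest] = earliest + latency
        # in order: nothing behind this instruction may issue before it
        cycle = earliest + 1
    return issue_cycles, stalls

PROGRAM = [
    ("load", "r1", ["r0"], 3),        # load from cache, assumed 3-cycle latency
    ("add",  "r2", ["r1", "r1"], 1),  # needs the loaded value
    ("mul",  "r3", ["r4", "r5"], 1),  # independent of the load
    ("sub",  "r6", ["r4", "r5"], 1),  # independent of the load
]

issued, stalls = in_order_schedule(PROGRAM)
print(issued)  # {'load': 0, 'add': 3, 'mul': 4, 'sub': 5}
print(stalls)  # 2
```

Note that `mul` and `sub` don't need the loaded data at all, yet they are stuck behind `add` for two wasted cycles.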
Another problem was conditional branch instructions. When one of these was encountered, the processor had to stop and wait until the instruction was completed before it knew which instruction to load next.
The solutions to these problems were speculative execution and out-of-order execution. Speculative execution means the processor tries to "guess" which instruction will be next and begins executing it immediately. If it guesses wrong, it has to undo any changes it made and start again with the correct next instruction.
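How the guess is made isn't covered above, so as one example here is a sketch of a 2-bit saturating counter, a common textbook branch predictor. I'm not claiming any particular chip uses exactly this; the class name, starting state, and test pattern are my own choices.

```python
# Toy 2-bit saturating-counter branch predictor (a common textbook scheme,
# used here only as an example of how a processor might "guess").
# States 0-1 predict not taken, states 2-3 predict taken.

class TwoBitPredictor:
    def __init__(self, state=2):
        self.state = state  # start in "weakly taken" (arbitrary choice)

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # move one step toward the actual outcome, saturating at 0 and 3
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

def count_mispredictions(outcomes):
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # wrong guess: speculative work after the branch is discarded
        predictor.update(taken)
    return misses

# A loop branch that is taken 9 times and then falls through, run 3 times.
LOOP_BRANCH = ([True] * 9 + [False]) * 3

print(count_mispredictions(LOOP_BRANCH), "of", len(LOOP_BRANCH))  # 3 of 30
```

The two bits of hysteresis mean a single loop exit doesn't flip the prediction, so the predictor only misses once per trip through the loop instead of twice.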
Out-of-order execution solves the problem of stalls due to a required piece of logic not being available or some data not being ready yet. The idea here is that you fetch, decode, etc. your instructions as usual, but instead of being carried out immediately, they wait in an area called a reservation station. Each instruction is checked to see whether it depends on a previous instruction that has not completed yet. A dependent instruction is held until the instructions it depends on have produced their results, while independent instructions behind it are free to go ahead. When an instruction has no more dependencies, it is issued as soon as the required execution resources (logic) are available.
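Here is a rough sketch of the reservation-station idea using a small four-instruction program with a slow load. Assumptions: every instruction is already decoded and waiting, there is one execution unit, the oldest ready instruction goes first, and only true data dependencies are tracked (as if registers were renamed). The function and register names are hypothetical.

```python
# Toy out-of-order core: all instructions sit in a reservation station and,
# each cycle, the oldest one whose operands are ready is issued to a single
# execution unit. Assumptions: only true (read-after-write) dependencies are
# tracked, as if registers were renamed, and fetch/decode is ignored.

def out_of_order_schedule(program):
    # for each instruction, find the older instructions that produce its inputs
    producers, last_writer = [], {}
    for idx, (_, dest, srcs, _) in enumerate(program):
        producers.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = idx

    ready_at = {}                        # instruction index -> cycle result is available
    station = list(range(len(program)))  # waiting instructions, oldest first
    issue_cycles = {}
    cycle = 0
    while station:
        for idx in station:
            if all(p in ready_at and ready_at[p] <= cycle for p in producers[idx]):
                name, _, _, latency = program[idx]
                issue_cycles[name] = cycle
                ready_at[idx] = cycle + latency
                station.remove(idx)
                break  # one execution unit: at most one issue per cycle
        cycle += 1
    return issue_cycles

PROGRAM = [
    ("load", "r1", ["r0"], 3),        # load from cache, assumed 3-cycle latency
    ("add",  "r2", ["r1", "r1"], 1),  # needs the loaded value
    ("mul",  "r3", ["r4", "r5"], 1),  # independent of the load
    ("sub",  "r6", ["r4", "r5"], 1),  # independent of the load
]

print(out_of_order_schedule(PROGRAM))  # {'load': 0, 'mul': 1, 'sub': 2, 'add': 3}
```

`mul` and `sub` slip into the cycles where `add` is waiting on the load, and `add` issues the moment its data shows up. An in-order pipeline would have to issue `add` first, so both of the independent instructions would sit behind the load.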
This is vital in fast modern processors, since data takes at least 2 or 3 cycles to arrive from cache. Without out-of-order execution, they would stall for a cycle or two every few instructions.
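A back-of-the-envelope calculation shows why a cycle or two every few instructions hurts. The numbers are assumptions for illustration only: an ideal rate of 1 instruction per cycle, 1 instruction in 4 being a load whose result is used right away, and 3-cycle cache latency, so each such load costs 2 stall cycles on an in-order machine.

```python
# Back-of-the-envelope cost of load stalls on an in-order machine.
# All numbers below are assumptions for illustration, not measurements.

def effective_cpi(base_cpi, load_use_fraction, stall_cycles):
    """Average cycles per instruction when a fraction of instructions are
    loads whose result is needed right away, each costing `stall_cycles`."""
    return base_cpi + load_use_fraction * stall_cycles

cpi = effective_cpi(base_cpi=1.0, load_use_fraction=0.25, stall_cycles=2)
print(cpi)                                # 1.5
print(f"{1.0 / cpi:.0%} of ideal speed")  # 67% of ideal speed
```

Under these assumptions the processor loses about a third of its potential speed to waiting, and a superscalar chip (ideal CPI below 1) loses an even larger share.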
AFAIK the P6 (P-Pro, PII, PIII) was the first commercial speculative, out-of-order processor. This is why it was such a big blow to the RISC processor makers. Since the P6 was released, most of the RISC chips have become speculative out-of-order designs as well. As a result, a few (Alpha, PA-RISC) have regained a performance lead over the PIII, in part due to a superior ISA. AMD's Athlon is also a speculative out-of-order chip and is wider (more pipelines) than the PIII.