SharkyForums.com - Print: Performance Discussion - The Biggest Bottleneck

    Performance Discussion - The Biggest Bottleneck
    By Arcadian October 09, 2000, 04:38 PM

    I wanted to spawn a discussion on system performance. More specifically, what do you think is the biggest bottleneck in a computer system?

    These days, we are starting to see diminishing returns for improvements in various hardware. It has gotten to the point where insignificant improvements are taken to be huge leaps in technology. We used to live in a time where the newest video card gave a 20-40% improvement over the competition, whereas today it is more like 5%. CPUs used to give similar increases, and now they, too, are giving < 5% improvements.

    So what is the bottleneck? What is preventing our systems from doubling in performance? Is it system memory? Is it the CPU or the video? Is it AGP? Or front side bus speed? Is it the hard disk or removable media? Or is it something else entirely?

    Before I give my opinion, I was just curious about what other people thought. So come on... state your opinions. Keep it technical, if you can, and cite examples if possible. Web links are really great, too.

    By nkeezer October 09, 2000, 07:35 PM

    As for what's holding up performance on the most general level, I think the answer to that is easy: hard drives. Everything else in a computer is measured on the order of nanoseconds, yet hard drives are still way up in the millisecond range. As long as things like seek times stay up at say 9ms, things won't change much. I mean, the processors will get faster and the computer will be able to crunch more numbers, but in overall use, it won't "seem" that much faster. Unfortunately, I'd consider this to be a limitation of hard drives themselves, i.e. until we get away from mechanical forms of storage, and into the realm of say holographic or other optical storage methods, then things won't get dramatically better.
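    (To put the millisecond-versus-nanosecond gap in rough numbers: a quick sketch in C. The clock speeds are just illustrative picks, with 350 and 877 MHz echoing the chips nkeezer mentions later in this post.)

    /* Rough illustration of the ms-vs-ns gap: how many CPU clock cycles
     * fit inside a single 9 ms disk seek. Clock speeds are arbitrary examples. */
    #include <stdio.h>

    int main(void) {
        const double seek_s = 9e-3;                        /* 9 ms seek time */
        const double clocks_mhz[] = { 350.0, 877.0, 1000.0 };

        for (int i = 0; i < 3; i++) {
            double cycles = seek_s * clocks_mhz[i] * 1e6;  /* cycles spent waiting */
            printf("%4.0f MHz CPU: ~%.1f million cycles per 9 ms seek\n",
                   clocks_mhz[i], cycles / 1e6);
        }
        return 0;
    }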

    And the comments about smaller performance increases with succeeding generations of technology... I think that's only partway true. If you look at CPU speeds over the past couple of years, processing power has actually beaten the estimate from Moore's Law (due to the chip wars between Intel and AMD). So in that sense, performance is still increasing just as much as before. However, I think the greater scale leads to the impression of diminishing returns. For instance: around 1995, Intel had the 100MHz Pentium. They followed up with chips running at 120MHz and 133MHz, representing increases of 20 and 33%. Pretty hefty. But the numbers themselves are small in comparison to today's chips... it'd be akin to Intel releasing a chip running at 800, then following it with chips running at 960 (20%) and 1066 (33%). Which of course we know they don't do. In other words, the increments between chips have stayed pretty much the same, but the frequency of their releases has increased instead (hell, I remember a few years back it seemed like it took 3 or 4 months for a new chip to be released).

    And then of course there's the bigger question: is the extra speed really necessary? For the most part, I'm inclined to think that it isn't. I upgraded my PII 350 to a Celeron II @877 several months ago. To be honest, the difference between the two processors in typical everyday use was negligible at best. To satisfy my own personal curiosity, I even timed how long it took to do certain things on my computer with the two different chips (the results are actually online: http://jpnsystems.8m.com/celeron566/index.html), but it turned out that the new chip didn't make much of a difference at all. And the 250% boost in clock speed only really came into play with things like video compression, where the extra clock cycles can be put to use. The rest of the time, the chip sat on its ass waiting for the hard drive to catch up.

    I'm not sure what the point of all that was

    By zombor October 09, 2000, 11:28 PM

    quote:Originally posted by nkeezer:
    As for what's holding up performance on the most general level, I think the answer to that is easy: hard drives. Everything else in a computer is measured on the order of nanoseconds, yet hard drives are still way up in the millisecond range. As long as things like seek times stay up at say 9ms, things won't change much. I mean, the processors will get faster and the computer will be able to crunch more numbers, but in overall use, it won't "seem" that much faster. Unfortunately, I'd consider this to be a limitation of hard drives themselves, i.e. until we get away from mechanical forms of storage, and into the realm of say holographic or other optical storage methods, then things won't get dramatically better.

    But hard drives have always been measured in ms and memory has always been measured in nanoseconds... that hasn't changed; they've just gotten smaller (proportionally?). I think it's due to the FSB. Think about it: when we had 133 MHz Pentiums, wasn't the bus speed 66? I'm not sure, because the first computer I built was a 233 K6, and before that I just owned a 486. But a 66 bus speed compared to 133 or 233 is a bigger memory-speed-to-CPU-speed ratio than, say, a 133 MHz bus to an 1133 MHz CPU speed. The problem with just speeding up the bus is the heat issue. I'm assuming the chipset would generate gobs and gobs of heat running at 1133 MHz! But if memory/bus/CPU speeds were equal, the proc wouldn't have to go into "wait states (??)" waiting for the memory to catch up to itself.

    This is all based on some articles I was reading a year or so back, so it may not all be 100% accurate. I'm open to criticism because it's probably partly wrong.
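    (For what it's worth, here is the ratio zombor is describing, sketched with a few representative CPU/FSB pairs; the pairings are just examples built from the numbers in the post above.)

    /* CPU clock vs. front side bus clock for a few example systems,
     * showing how far the core has pulled ahead of the bus. */
    #include <stdio.h>

    int main(void) {
        struct { const char *name; double cpu_mhz, fsb_mhz; } sys[] = {
            { "Pentium 133",   133.0,  66.0 },
            { "K6 233",        233.0,  66.0 },
            { "1133 MHz CPU", 1133.0, 133.0 },
        };

        for (int i = 0; i < 3; i++)
            printf("%-12s  CPU/FSB ratio = %.1f\n",
                   sys[i].name, sys[i].cpu_mhz / sys[i].fsb_mhz);
        return 0;
    }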

    By nkeezer October 09, 2000, 11:42 PM

    My point exactly: hard drives are so slow compared to the *other* components in a computer that huge increases in processing power and the like are effectively covered up (in normal use) by the lagging hard drives. So until we can get rid of spinning platters and other mechanical components as a means of data storage, hard drives will continue to be the biggest bottleneck.

    quote:Originally posted by zombor:
    But hard drives have always been measured in ms and memory has always been measured in nanoseconds... that hasn't changed

    By zombor October 10, 2000, 12:01 AM

    but the hard drive isn't continuously being accessed like memory is. It is for loading things like games and programs, but once they're loaded, it doesn't access the HDD for that app again anyway. In-game performance has nothing to do with HDD speed (unless you're using virtual memory!). Although a sped-up HDD will increase overall speed, it's not really the "limiting reactant," as my chem teacher said today.

    And Windows would load 10x faster if all the legacy code for the architecture was destroyed, but that's a whole other topic!

    By Ymaster October 10, 2000, 12:35 AM

    Limiting factors... slow refresh rates for monitors at 1600x+. Just wasted FPS...

    Parts that work non-async.

    There is current tech for faster hard drives with no moving parts. They use crystal and can store the internet... heh... joke

    If HDs were as fast as DRAM, then there would be no point in DRAM.

    I personally think the limiting factors are the motherboard-to-hardware interfaces... PCI, ISA, AGP... If we had AGP slots for all cards... call it APCI?

    When the speed of CPUs has reached its max we will just make dual CPUs a standard. Multiple CPUs are what turned the supercomputers that took up rooms into those small Doctor Who-sized TARDISes...

    I don't want to see video cards anymore. I want to see GPU sockets! Adding Alphas to our video GPUs like we do to new CPUs... Gives a whole new light to onboard video?

    By Ymaster October 10, 2000, 12:38 AM

    quote:Originally posted by Ymaster:
    I don't want to see video cards anymore. I want to see GPU sockets! Adding Alphas to our video GPUs like we do to new CPUs... Gives a whole new light to onboard video?

    Now that I think about it... why not add on-die cash to the GPUs? Hmmmm

    By Igor October 10, 2000, 12:48 AM

    HDs are by far the slowest things computers have... aside from printers.
    If you want more speed, go into RAID 0. This will improve the speed. Also, going SCSI will help, especially if you get those 15,000 RPM drives.
    But for the real boost in performance we've got to wait till solid-state HDs get to the market for anyone who has less than a spare million sitting around.

    By Igor October 10, 2000, 12:48 AM

    BTW nkeezer the link you posted is dead

    By Raptor^ October 10, 2000, 06:06 AM

    This is an interesting question
    I assume we're talking standard PC hardware here.

    This very much depends on the sort of application that you're running.

    For example, in my old job I was dealing with 3D scenes containing hundreds of MB of textures on screen at once and thousands of polygons. Here the limiting factor for a standard PC was the speed of the AGP bus; even with 4X AGP it was simply slow. If you went to an area where the textures fit in the memory of the card, performance would suddenly increase.
    The SGI 320 and 540 PCs could texture directly from main memory and performed much more consistently.

    I can also make a claim for memory bandwidth (more important than, but linked to, FSB). If I'm running a sorting algorithm on data that will fit in RAM, then my memory bandwidth will certainly limit the speed of execution.

    IMHO in normal use the HDD only becomes a limiting factor if you don't have enough RAM. If you're limited by the speed of your disk, buy some more memory instead of a new processor.

    HDDs do limit performance during startup and for the first time you load an application, but to me these are unimportant as they occur relatively rarely and are not performance critical.

    I think of removable media in the same way. If I install something from CD, for example, I know I'm only going to do it once, and then the speed of the CD drive becomes irrelevant.

    CPU speed can still be the limiting factor if you are running an application with a very tight innermost loop. If this loop fits in the L1 cache, then the processor will be running nearly flat out. Even if it misses the L1 and hits the L2 cache we can consider the processor the limiting factor, as the L2 cache tends to be on-die now. It is only if we get a complete cache miss and have to go across the bus to RAM that everything slows down.
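    (A minimal sketch of that cache-versus-main-memory point; the array sizes and pass counts are arbitrary, chosen only so that both runs do the same total number of additions.)

    /* Tight loop over a cache-sized array vs. the same total work over a
     * much larger array that has to stream from main memory. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static long long sum_passes(const int *buf, size_t n, int passes) {
        long long total = 0;
        for (int p = 0; p < passes; p++)
            for (size_t i = 0; i < n; i++)
                total += buf[i];
        return total;
    }

    static void bench(const char *label, size_t n, int passes) {
        int *buf = malloc(n * sizeof *buf);
        for (size_t i = 0; i < n; i++) buf[i] = (int)i;
        clock_t t0 = clock();
        long long total = sum_passes(buf, n, passes);
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("%s: %.3f s (checksum %lld)\n", label, secs, total);
        free(buf);
    }

    int main(void) {
        bench("small array, many passes (cache-resident)", 1u << 13, 1 << 12);
        bench("large array, one pass    (main memory)   ", 1u << 25, 1);
        return 0;
    }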

    So, where have the performance increases been recently?

    Advances in CPU are most obvious, along with those in 3D gfx cards. Both have been pushing Moore's Law recently.

    Memory bandwidth certainly hasn't improved vastly over the last few years, so in the general case this would probably be my choice. My 'old' P2-400 has exactly the same memory bandwidth as my p3-700 at its default FSB, yet the processor speed has almost doubled.

    HDDs have increased in capacity at an astounding rate recently, but have not got significantly quicker in real life situations, especially if lots of seeks are involved.

    CD drives have hit a plateau in terms of performance, but DVDs are significantly faster. I'm not even considering floppy disks.

    In a general situation, I think that memory bandwidth is now the limiting factor in a PC. Advances in this area have been slow, yet memory is used fundamentally in a PC.
    woah this has turned into a monster post... enough of my ramblings for now

    By Humus October 10, 2000, 10:08 AM

    The biggest bottleneck is Microsoft. They keep adding new fancy features to the OS and software that we have to learn how to disable. If they would stop adding useless stuff and work for a year or two on increasing performance and removing some bugs, I'd be more than happy.

    The next bottleneck is system memory; with DDR the situation will be better and will perhaps hold for a couple of years. I hope FC-RAM becomes mainstream by then.


    quote:Originally posted by Ymaster:
    Now that I think about it... why not add on-die cash to the GPUs? Hmmmm

    On-die cash? Would be nice ...
    I guess you mean on-die cache ... they already have this; all graphics cards out there have a texture cache and a vertex cache.

    By zombor October 10, 2000, 10:33 AM

    Any OS on a PC will be just as slow, relatively, as any Windows OS. And even with DDR memory, it still won't be anywhere close to as fast (along with the FSB) as the proc.

    By Adisharr October 10, 2000, 11:41 AM

    I would have to vote for the system bus. We are still running that relatively slow compared to other devices in the system. You can almost factor HDs out of the equation if you're lucky enough to have enough memory. We really need some type of high-speed, very wide data bus with very little latency. That would at least put the monkey back on processors somewhat.

    $ .02

    By Sol October 10, 2000, 11:44 AM

    quote:Originally posted by Adisharr:
    I would have to vote for the system bus. We are still running that relatively slow compared to other devices in the system. You can almost factor HDs out of the equation if you're lucky enough to have enough memory. We really need some type of high-speed, very wide data bus with very little latency. That would at least put the monkey back on processors somewhat.

    $ .02

    Yeah, I have to agree with this. The data still has to travel across the bus to get from component to component.

    By Moridin October 10, 2000, 12:34 PM

    The thing limiting PC performance is a basic rule of economics: "the law of diminishing returns." Any time you add a new technology you see an immediate jump in performance. Over time the tech is perfected, expanded, and extended, but each subsequent change yields smaller gains than the previous iteration of the technology. I may make a separate post about some of these "new" technologies that have allowed processors to keep pace.

    A prime example of this would be 3D acceleration. The jump from no 3D to the first 3DFX chips was huge in comparison to the change between generations of GPU's now.

    Every once in a while we reach a point where improvements in a given area become so difficult that that area begins to lag behind the rest of the system, and changes to the whole system architecture are necessary to compensate. Two (relatively) recent examples of this are RAM and motherboards (I'll talk a little more about motherboards in another post). A lot of the design features we see today are dictated by this limitation, so I consider these the limiting hardware factor.

    HDDs are not really a problem, since if you have enough memory most consumer apps only use them during startup or when they save data.

    I would also like to add that I disagree with some statements made about Moore's Law. Moore's Law is specific to the speed and complexity of integrated circuits, including MPUs; it says nothing about overall performance. It turns out that the increased complexity of MPUs has offset limiting factors in other areas, keeping IPC high while clock rate increased. As a result absolute performance has matched or bettered Moore's Law, but Moore's Law does not specifically address performance.

    I would also disagree with the statements that performance gains have accelerated in the last 2 years. This may be true for the x86 world but if you look at the cutting edge of processor design (the highest performing processors) the opposite is true.

    By zombor October 10, 2000, 01:08 PM

    I was just thinking about this in class:
    what if someone built a PC that was all on-die, except for the hard drive of course. Think about it... the proc would have direct access to the GPU, the RAM, sound card, everything... sure, you wouldn't be able to upgrade it at all, but wouldn't it be a lot faster than having to go through buses to get to everything? Just an idea

    By Arcadian October 10, 2000, 01:34 PM

    quote:Originally posted by zombor:
    I was just thinking about this in class:
    what if someone built a PC that was all on-die, except for the hard drive of course. Think about it... the proc would have direct access to the GPU, the RAM, sound card, everything... sure, you wouldn't be able to upgrade it at all, but wouldn't it be a lot faster than having to go through buses to get to everything? Just an idea

    What you refer to has been branded by computer architects as the SoC processor (System on a Chip). SiS is coming out with a form of SoC processor, which has integrated video, sound, northbridge, and southbridge. VIA is making a Cyrix processor with integrated video, sound, and northbridge. Intel's Timna processor was supposed to integrate video and northbridge, and even though that project was cancelled, I don't think you've seen the last of that technology.

    Right now, integrating DRAM is a very difficult task. Nintendo has managed to create a platform called the GameCube, which will be its next-generation console, and that will be using integrated DRAM. However, it will only be 24MB. It will take semiconductor manufacturers a long time before larger amounts can be integrated inexpensively. Though they might start by including a small amount of embedded DRAM (say, 16 or 32MB) to act as an L3 cache, perhaps.

    Zombor, what you say is true! Integration is in the future for both high and low performance systems.

    By zombor October 10, 2000, 01:44 PM

    but is there that much of a speed increase in these integrated systems? I'd think there would be, due to the speed jump from moving L2 on-die. And if everything was integrated, nobody better release a new system every 3.546 days

    By Moridin October 10, 2000, 02:36 PM

    A few words about Motherboards and Ram.

    Up until about 7 years ago motherboard speed kept up with processor speed. This has changed a lot and I'd like to take a few minutes to discuss why. But first some history.

    When the IBM XT came out, the ISA bus could be used as an expansion port to add additional RAM. It didn't take all that long before RAM outpaced the ISA bus, but for many years RAM and motherboards at least kept up with the FSB of the processor, even if it took some latency hits along the way.

    Back in the 486 days we first started to see the processors pull ahead of motherboards. Intel had a 50 MHz 486 on the market for some time, but even though a 50 MHz motherboard spec was available, the chipsets and boards themselves were almost non-existent. Intel finally got fed up, took matters into their own hands, and produced the 486 DX2 50 and DX2 66, the first x86 processors with clock multipliers. (This was also when they got into the chipset business.) The first generation of Pentiums did not have multipliers, but it didn't take long before they had them also.

    The reason why motherboards now require multipliers is size. Electrically speaking they have become huge. I say electrically speaking because in many cases it is better to measure electronics in wavelengths instead of inches or cm. The original XT board was physically large by today's standards, something on the order of 50 cm by 50 cm. (I don't feel like looking up the exact numbers, but these make my calculations easier.)

    Using an 8 MHz clock and assuming signals propagate at 0.667c (2x10^8 m/s), one wavelength is 200,000,000 / 8,000,000 or 25 m. The board itself is 1/50 of a wavelength across. This is a relatively simple thing to design. Even with harmonics it is unlikely you would have to deal with transmission line effects.

    I guess at this point I should explain harmonics and transmission lines. If you want to transmit a square wave of a given frequency, it is not good enough to transmit just that frequency; you also have to transmit harmonics. A harmonic is a sine wave that occurs at a specific frequency relative to the original signal. A square wave has harmonics at every odd multiple of its fundamental frequency. You usually require at least a couple of harmonics to get a decent square wave.

    For example, to get a square wave you would add a sine wave at F1, the third harmonic at 3xF1, and the fifth harmonic at 5xF1. Each harmonic is smaller than the one below it, so higher harmonics play an ever smaller role in the signal. So you need to allow for frequencies 3-5 times that of your digital signals.
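    (A quick numerical illustration of that harmonic argument, nothing PC-specific: summing the fundamental plus the 3rd and 5th harmonics already gives a recognizably square edge. The 4/pi factor and the 1/n weights come from the standard Fourier series for a square wave.)

    /* Partial Fourier series of a square wave: the fundamental plus odd
     * harmonics (3rd, 5th, ...), each weighted by 1/n.  With only a few
     * odd harmonics the waveform is already roughly square (+/-1). */
    #include <stdio.h>
    #include <math.h>

    #define PI 3.14159265358979

    static double square_approx(double t, int highest_harmonic) {
        double v = 0.0;
        for (int n = 1; n <= highest_harmonic; n += 2)   /* odd harmonics only */
            v += sin(n * t) / n;
        return (4.0 / PI) * v;
    }

    int main(void) {
        for (int h = 1; h <= 5; h += 2) {
            printf("up to harmonic %d:", h);
            for (int k = 1; k <= 7; k++)                 /* sample half a cycle */
                printf(" %+.2f", square_approx(k * PI / 8.0, h));
            printf("\n");
        }
        return 0;
    }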

    Transmission line effects come into play any time a signal must travel more than about 1/10 of a wavelength. They cause a number of bizarre effects if not carefully managed, like causing open circuits to act like short circuits, short circuits to act like capacitors, etc.

    So on the old XT even the fifth harmonic could travel all the way across the board without having to worry about transmission line effects. A 33 MHz board 33 cm across would be roughly 1/20 of a wavelength across, but if you are careful with your traces and avoid having any (33 MHz) trace longer than 20 cm you can at least get the full third harmonic and probably most of the fifth, so you are still OK.

    Consider a modern 133 MHz board that is, say, 25 cm across. The board is now 1/6 of a wavelength across. At this frequency even your primary frequency may be subject to transmission line effects. You now need to carefully control things like impedance, trace length, etc. to send digital signals around the board. (These effects start to kick in around 1/10 of a wavelength but don't get into full gear until about 1/4 of a wavelength.)

    Around this frequency you start to run into other problems as well. If two signals travel different distances, the difference in travel time can become important. Imagine having two memory chips far enough apart that one receives a given clock pulse half a clock period after the other. By the time they send data back to the CPU, the second bit of data from the first chip arrives around the same time as the first bit from the second chip. Not a good thing.

    This is what is likely to stop parallel buses from operating faster than about 200 MHz, and why the really fast memory like RDRAM or SLDRAM uses serial buses.
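    (The wavelength arithmetic above, collected in one place; the propagation speed and board sizes are Moridin's round numbers, not measured values.)

    /* Board size as a fraction of the clock signal's wavelength, using the
     * round numbers from the post: signals travel at ~0.667c = 2e8 m/s.
     * Past roughly 1/10 of a wavelength, transmission line effects matter. */
    #include <stdio.h>

    int main(void) {
        const double v = 2.0e8;   /* propagation speed along a trace, m/s */
        struct { const char *board; double clock_hz, size_m; } b[] = {
            { "XT      (8 MHz, 50 cm)",     8.0e6, 0.50 },
            { "486-era (33 MHz, 33 cm)",   33.0e6, 0.33 },
            { "modern  (133 MHz, 25 cm)", 133.0e6, 0.25 },
        };

        for (int i = 0; i < 3; i++) {
            double wavelength = v / b[i].clock_hz;         /* lambda = v / f */
            double fraction   = b[i].size_m / wavelength;  /* board in wavelengths */
            printf("%-26s lambda = %5.2f m, board = 1/%.0f wavelength%s\n",
                   b[i].board, wavelength, 1.0 / fraction,
                   fraction > 0.1 ? "  <- transmission line territory" : "");
        }
        return 0;
    }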

    By Evil Twin October 10, 2000, 03:04 PM

    I think it's the laws of physics that are starting to take their toll on the materials used nowadays. Anyway, I've always loved wearing turtleneck sweaters. Why not bottleneck sweaters for computer nerds?

    By nkeezer October 10, 2000, 04:15 PM

    Wwwwhhhhhhhooooooosssssshhhhh! -- that was the sound of 90% of what Moridin wrote flying completely over my head. But at least he summed it up nicely in the last paragraph, so I think I know what he's talking about now. Enlightening.

    Anyway, I still stand by the hard drive as the biggest bottleneck. Now maybe all you guys ever do with your computer is play games, in which case I guess your HD doesn't matter. But I don't, and (I know this is a shock) a lot of other people don't either. And I know that if I look down at my computer, there are a lot of times when that hard drive light is on and those things are thrashing away: like any time I want to open an application that I haven't used for a while, or need to save a file and the entire directory structure has to get listed again, or I want to go to a website and it's reading and writing back to the cache folder, etc. Like I said before, all that takes milliseconds. Which might not seem like much, unless you consider that everything else the computer's doing is measured in nanoseconds. So in other words... it seems to me that every time your hard drive is being used, your computer is being slowed down.

    And just one final point: the biggest bottleneck in computers is not really the hard drive, the chip, memory or bus speeds. It's really the average consumer that buys their box at Circuit City and is happy with it for 5 years. If all computer users were a bunch of tech geeks, things would certainly move a hell of a lot faster

    By Arcadian October 10, 2000, 05:41 PM

    Moridin was simply explaining one of the many reasons why it is so difficult to move up speeds on the motherboard. In modern electronics, you can easily crank ICs up to over 1GHz. In fact, in industries other than CPUs it is even common to see ICs on the order of 40-80GHz, with new technology like SOI or silicon-germanium BJT transistors. However, once you leave the safe confines of your die, you begin to feel the wrath of onboard electrical wiring.

    For the reason Moridin gave, and others, it is very difficult to get wide pathways going in excess of certain speeds. Harmonic noise is just one of the factors you start to have to deal with. The front side bus on the Intel and AMD microprocessors that people use today has 64 bits used for data, and many other bits used for address and protocol. Getting these to run at the speed of a microprocessor is not possible with today's electronics, which is why CPUs have multipliers.
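    (As a rough sense of scale for that 64-bit data path: peak bandwidth is just width times clock. The clock rates below are common FSB speeds of the day, picked only for illustration.)

    /* Peak theoretical bandwidth of a 64-bit wide front side bus:
     * bytes/s = (width in bits / 8) * clock rate. */
    #include <stdio.h>

    int main(void) {
        const double width_bits = 64.0;
        const double clocks_mhz[] = { 66.0, 100.0, 133.0 };

        for (int i = 0; i < 3; i++) {
            double mb_per_s = (width_bits / 8.0) * clocks_mhz[i];
            printf("64-bit FSB at %3.0f MHz: ~%4.0f MB/s peak\n",
                   clocks_mhz[i], mb_per_s);
        }
        return 0;
    }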

    However, getting information to the processor faster than it's going right now is only part of the problem. My opinion of the system bottleneck is that it is a number of things that cause current systems to not perform as well as they can. Processor to memory bandwidth is one of them, and perhaps an important one, too. However, even if you were to give a processor an unlimited amount of free information (in other words, the processor receives information instantly without any latency), it still wouldn't be able to process everything it receives.

    Perhaps this sounds a little obvious, but consider how much of a bottleneck the CPU really is. My feeling is that the x86 architecture is the bottleneck. Because of the antique nature of the architecture, it is very difficult to improve upon. I am really looking forward to Intel's new architecture, IA64, because that will be the first new architecture Intel has used in its main processor line since the 8086.

    In the future, I see faster systems doing the following.

    1) Moving to IA64, or another non-x86 architecture. This will undoubtedly be IA64, but I am open to the possibility that IA64 might fail and another architecture comes to fill its shoes. On this same topic, we'll call it 1b) Parallelize data. In other words, SMP configurations are good for now, but I see symmetric multiprocessing at the core level in the future. Read another post on this forum for more information, but I believe SMT and CMP to be the future. More parallelized code will also be necessary so that each of the processor cores and threads will be given data aplenty to sort through.

    2) Integrating everything. Like I said, speed is only found on the die, and as soon as you move off of it, you face nature's wrath. So the solution is to integrate CPU, video, memory, I/O, and the kitchen sink all on to the very same die.

    3) Serialize I/O. If you have to move off the chip, it isn't going to be using wide parallel interfaces. Rather, everything will move to high speed serial. Serial can move much faster, and can scale well by adding more channels. Since data from different channels don't have to sync up like in parallel, this gives the designer a lot more room to speed things up.

    4) Getting rid of rotating media. Nkeezer is mostly correct in sticking with his argument about hard drives. Frankly, a lot of programs out there are hard drive limited, and there may be even more in the future with all the streaming media that is becoming more common. If we want to speed up hard storage, we have to find a way to create cheap media that is not mechanical in nature. Spinning platters will NEVER reach microseconds of access time, let alone nanoseconds. Unfortunately, the technology to do this may be a long way off.

    5) Make outside lines optical. So far, we have seen no cable capable of producing more throughput than optical. However, the disadvantage right now is that you can't bend an optical cable too far, because the photons travelling through the cable will leave it if the angle is too great (this is a scientific fact... want to mess up data going to a server? bend the optical cable, if there is one). Once we overcome this, which may be very distant from now, we can start wiring our motherboards and connecting to online sources through optical cabling, thus enabling very high-speed connections.

    These 5 things together will move the industry into the next tier of computer performance. Small changes in one area or another aren't likely to affect much. This is my opinion, at least.

    By Moridin October 11, 2000, 02:02 PM

    quote:Originally posted by Arcadian:

    1) Moving to IA64, or another non-x86 architecture. This will undoubtedly be IA64, but I am open to the possibility that IA64 might fail and another architecture comes to fill its shoes. On this same topic, we'll call it 1b) Parallelize data. In other words, SMP configurations are good for now, but I see symmetric multiprocessing at the core level in the future. Read another post on this forum for more information, but I believe SMT and CMP to be the future. More parallelized code will also be necessary so that each of the processor cores and threads will be given data aplenty to sort through.

    I am not so sure about IA-64 at this point. When I first heard about it I was excited, but I am beginning to have a lot of doubts. The premise of IA-64 is to make use of VLIW/explicit parallelism to avoid the complexities of out-of-order execution and hopefully get more parallelism than OOOE can achieve.

    OOOE finds parallelism at run time using hardware; VLIW/EPIC finds it at compile time using software. In theory this should make your hardware simple.

    In other words it is supposed to be small, simple, and effective. Somewhere along the way IA-64 became very, very complicated. The die size is huge and it is too complicated to run at high speed. This is a very bad way to start for a processor whose primary advantage is supposed to be simplicity. This may change with future versions of IA-64, but while IA-64 was under development a few other things happened.

    OOOE was developed and perfected. Improved manufacturing processes made more transistors available to chip designers. This meant that deep OOO designs are now much more practical. In fact the OOO components of most processors take up less than 20% of the die space, so it has become relatively cheap to implement OOOE.

    On the other side, VLIW/EPIC compilers have not advanced nearly as quickly. Memory latency has increased dramatically. This is worse for VLIW than OOOE because memory latency can't be predicted at compile time. Dependencies generated by memory latency can only be resolved at run time using some form of OOOE.

    VLIW/EPIC can look much deeper for parallelism than OOOE, provided the compiler is good, but OOOE can react to things like memory latency much better than VLIW/EPIC. Ultimately we will probably see a convergence and most ISAs will be similar.


    By Moridin October 11, 2000, 02:28 PM

    quote:Originally posted by Arcadian:

    1) Moving to IA64, or another non-x86 architecture. This will undoubtedly be IA64, but I am open to the possibility that IA64 might fail and another architecture comes to fill its shoes


    Intel has done a very good job of eliminating the CISC penalty in x86. It is still there, but it is quite small. They still have a lot of problems, though. IMHO, x86 can be competitive for some time yet if it can make the following changes. (Not necessarily easy ones, though.)

    Give the architecture some more registers. I don't know if they need 32 like most RISC machines but 16 are a minimum. Limited registers means more memory operations and a lot less parallelism.

    Get rid of the accumulator configuration in the ALU. X86 ALU can only save the result of a calculation to 1 register (the Accumulator) and on top of that the Accumulator is always one of the original operands, so you always destroy one of your values unless you specifically save it to another register. The process of copying register contents, moving them to put them in the accumulator adds extra, unnecessary instructions to the program.

    SSE can do most of this on the FPU side so that is a step forward, and now that we have SSE2 which is fully IEEE compliant and can completely replace X87 (SSE could not) a lot of the concerns on the FPU side have been addressed.

    Get triadic instructions for both the ALU and FPU. X86 and SSE2 only support 2 operand instructions. They can do things like a=a+b, a=a+c etc (the result must be put into a). Most other ISA's support 3 operand instructions and can do a=b+c+d, a=a+b+c, c=a+b+d (result can go anywhere)

    At some point 64 bit operation will be required (for memory access)

    These changes, along with a trace cache to remove the CISC decode penalty would bring X86 a long way towards parity with more modern ISA's

    By Arcadian October 11, 2000, 08:43 PM

    quote:Originally posted by Moridin:
    Give the architecture some more registers. I don't know if they need 32 like most RISC machines but 16 are a minimum. Limited registers means more memory operations and a lot less parallelism.

    Get rid of the accumulator configuration in the ALU. X86 ALU can only save the result of a calculation to 1 register (the Accumulator) and on top of that the Accumulator is always one of the original operands, so you always destroy one of your values unless you specifically save it to another register. The process of copying register contents, moving them to put them in the accumulator adds extra, unnecessary instructions to the program.

    The Itanium is able to provide these things. It actually has 128 registers, all 64-bit. It also has 128 more floating-point registers, and 64 predicate registers. The instructions also accommodate your suggestion regarding the ALU. You say in a previous post that you are having some doubts about IA-64. You shouldn't, because it is progressing quite well. Some people are displeased that it has taken so long, but there are the 5 9's of reliability it must attain. In other words, 99.999% uptime. Intel can't release it unless it is at least that stable. Plus, the successor to Itanium, McKinley, is supposed to be a much smaller die, and much faster, too. Intel has a lot in store for IA-64, so I would be very surprised if it doesn't do very well.

    By Humus October 12, 2000, 09:42 AM

    Moridin:
    You're wrong on some points. The x86 ALU can save results to any register for many instructions, not just the accumulator (EAX). You can do stuff like
    ADD EBX, ECX
    AND EBX, ECX
    OR EBX, ECX
    SHL EBX, 1
    You don't need to use the accumulator for those.
    However, many instructions (such as MUL and IMUL (the single operand one)) can only work on the accumulator, but it's mainly the complex instructions (which should be avoided anyway).

    And a = b + c + d is a 4 operand instruction. A 3 operand should read a = b + c.

    By Humus October 12, 2000, 09:48 AM

    ... I would like to add that I have high hopes for IA-64 too. It really solves most of the problems that the x86 architecture has. It removes much of the dependency on high-speed memory, it removes the need for a branch prediction unit, it can do register indexing, meaning you can put small arrays into registers instead of memory (I always wanted to be able to do that), and its instruction set isn't so damn ugly!

    By Moridin October 12, 2000, 12:39 PM

    quote:Originally posted by Humus:
    Moridin:
    You're wrong on some points. The x86 ALU can save results to any register for many instructions, not just the accumulator (EAX). You can do stuff like
    ADD EBX, ECX
    AND EBX, ECX
    OR EBX, ECX
    SHL EBX, 1
    You don't need to use the accumulator for those.
    However, many instructions (such as MUL and IMUL (the single operand one)) can only work on the accumulator, but it's mainly the complex instructions (which should be avoided anyway).



    Thanks, I am not familiar with the full IA-32 instruction set. Most of my assembly programming was done in 68000, 6800, 8080 and the 8080 compatible Z80. I still think IA-32 is limited in this regard compared to newer ISA's and that this is a bottleneck in the ISA.

    I also understand that the LEA instruction can extend some of these capabilities, but I am not entirely sure what it does.

    By Moridin October 12, 2000, 12:52 PM

    quote:Originally posted by Arcadian:
    The Itanium is able to provide these things. It actually has 128 registers, all 64-bit. It also has 128 more floating-point registers, and 64 predicate registers. The instructions also accommodate your suggestion regarding the ALU. You say in a previous post that you are having some doubts about IA-64. You shouldn't, because it is progressing quite well. Some people are displeased that it has taken so long, but there are the 5 9's of reliability it must attain. In other words, 99.999% uptime. Intel can't release it unless it is at least that stable. Plus, the successor to Itanium, McKinley, is supposed to be a much smaller die, and much faster, too. Intel has a lot in store for IA-64, so I would be very surprised if it doesn't do very well.

    The problem with adding more registers is that it adds extra levels of logic to every register access. 16 registers require only 4 levels of logic, while 128 require 7. If you look at the pipeline for Itanium, register access is spread over 2 pipeline stages. Every other processor does this in a single stage.
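    (One way to read those numbers, treating register selection as a binary mux/decode tree, which is roughly the structure being described: the number of select levels grows as log2 of the register count. A sketch, not a statement about Itanium's actual circuit.)

    /* Levels of a binary select tree needed to address N registers:
     * ceil(log2(N)).  16 registers -> 4 levels, 128 -> 7. */
    #include <stdio.h>

    static int select_levels(int n_regs) {
        int levels = 0;
        while ((1 << levels) < n_regs)
            levels++;
        return levels;
    }

    int main(void) {
        const int counts[] = { 8, 16, 32, 128 };
        for (int i = 0; i < 4; i++)
            printf("%3d registers -> %d select levels\n",
                   counts[i], select_levels(counts[i]));
        return 0;
    }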

    IA-64 is an in order device so it needs many more registers for the compiler to use so it can find parallelism. An OOO device does not need nearly as many architectural registers to find parallelism. It does however need a set of rename registers for each pipeline stage prior to the execution stage.

    I agree with you though, McKinley looks a lot better than Itanium. I still do not think it will be as fast as EV7, though.


    By zombor October 12, 2000, 01:44 PM

    ok, this just got way out of my league

    By Arcadian October 12, 2000, 02:30 PM

    quote:Originally posted by Moridin:
    The problem with adding more registers is that it adds extra levels of logic to every register access. 16 registers require only 4 levels of logic, while 128 require 7. If you look at the pipeline for Itanium, register access is spread over 2 pipeline stages. Every other processor does this in a single stage.

    I think you are misunderstanding some of the specifications. I don't know much more myself, but what you said probably doesn't have the impact that you think. At least, I haven't heard anyone else complain about this. Can anyone shed some light on this topic?

    quote:Originally posted by Moridin:
    IA-64 is an in order device so it needs many more registers for the compiler to use so it can find parallelism. An OOO device does not need nearly as many architectural registers to find parallelism. It does however need a set of rename registers for each pipeline stage prior to the execution stage.

    For an in-order device, Itanium can be much faster than many out-of-order devices. It is the architecture itself that makes in-order operation work. The reason why in-order affects performance in x86 is because there are too many dependencies, and even register renaming can't solve them all. In EPIC, these dependencies are resolved at compile time, so the CPU will not be stalled as often. In-order operation actually raises the performance of the Itanium.

    quote:Originally posted by Moridin:
    I agree with you though, McKinley looks a lot better than Itanium. I still do not think it will be as fast as EV7, though.

    There is nothing impressive about the EV7. Actually EV7 doesn't change the processor architecture at all. It simply allows new levels of SMP parallelism. It's a different protocol that will only raise performance on a system level, not a processor level. If you are talking about IA-64 vs. an EV7 single processor, Itanium will be much faster, let alone McKinley. Only in very large SMP systems will EV7 be faster. At least this is what I hear.

    By Moridin October 12, 2000, 03:54 PM

    quote:Originally posted by zombor:
    ok, this just got way out of my league

    Hey, I'm pretty close to that myself. I may be an engineer, but I'm not a chip designer.

    By chickenboo October 12, 2000, 03:55 PM

    quote:Originally posted by nkeezer:
    As for what's holding up performance on the most general level, I think the answer to that is easy: hard drives.

    I don't know about you, but my hard drive is blazing fast compared to my floppy drive. Isn't there anyone that can make my floppy drive go faster??!!

    By Ymaster October 12, 2000, 04:13 PM

    quote:Originally posted by chickenboo:
    I don't know about you, but my hard drive is blazing fast compared to my floppy drive. Isn't there anyone that can make my floppy drive go faster??!!

    No! Just use a zipdrive...

    By Moridin October 12, 2000, 04:27 PM

    quote:Originally posted by Arcadian:
    For an in-order device, Itanium can be much faster than many out-of-order devices. It is the architecture itself that makes in-order operation work. The reason why in-order affects performance in x86 is because there are too many dependencies, and even register renaming can't solve them all. In EPIC, these dependencies are resolved at compile time, so the CPU will not be stalled as often. In-order operation actually raises the performance of the Itanium.

    EPIC cannot resolve them all either. Memory dependencies cannot be predicted at compile time and therefore can never be resolved by an in-order machine. As the gap between processor speed and memory speed increases, so does the effect of memory dependencies. A deep OOO design can deal with most of the instruction dependencies and help with memory dependencies as well.

    To expand on this a bit: at compile time you don't know if a given piece of data will be in L1, L2, or main memory. If you try to execute an instruction that operates on that data, an in-order processor, EPIC included, has no choice but to wait until it loads that data from memory. You can get around this a little by placing the load as soon as possible in the instruction stream and working on the data later, but you still have no idea exactly how long you will have to wait before the data is in a register. This is why EPIC needs so many registers.

    The situation gets worse when you are accessing a memory location that is decided upon by a calculation or even just the contents of another memory location. (Pointer chasing) In this case there is little EPIC can do but sit and wait for the data to arrive. Even if it already has all the data in its registers needed to execute an instruction it cannot do anything if that instruction is not next in line. A single call to main memory can stop execution for tens or hundreds of cycles.

    An OOO processor, on the other hand, just keeps looking for instructions that can be executed. If the first cannot be executed, it looks at the next, then the next, and so on until it finds one that can be. If the data for an instruction that comes later in the instruction stream arrives before the data for an earlier one, the later instruction can be executed if it has no other dependencies.
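    (A toy sketch of that difference; the five-instruction trace, the latencies, and the single-issue model are all invented for illustration. The in-order run stalls behind the long load, while the out-of-order run keeps issuing the independent work in the meantime.)

    /* In-order vs. out-of-order issue on a made-up 5-instruction trace. */
    #include <stdio.h>

    #define N 5
    typedef struct {
        const char *text;
        int dep;       /* index of the instruction whose result we need, -1 = none */
        int latency;   /* cycles from issue until the result is available */
    } Insn;

    static const Insn prog[N] = {
        { "load r1, [mem]  (cache miss)", -1, 100 },
        { "add  r2, r1, 4",                0,   1 },
        { "mul  r3, r4, r5",              -1,   3 },
        { "add  r6, r3, 1",                2,   1 },
        { "sub  r7, r8, r9",              -1,   1 },
    };

    /* Returns the cycle in which the last result becomes available. */
    static int run(int out_of_order) {
        int done[N];        /* cycle the result is ready; -1 = not yet issued */
        int issued = 0, cycle = 0, finish = 0;
        for (int i = 0; i < N; i++) done[i] = -1;

        while (issued < N) {
            cycle++;
            for (int i = 0; i < N; i++) {
                if (done[i] != -1) continue;              /* already issued */
                int d = prog[i].dep;
                int ready = (d < 0) || (done[d] != -1 && done[d] <= cycle);
                if (ready) {
                    done[i] = cycle + prog[i].latency;    /* issue one per cycle */
                    if (done[i] > finish) finish = done[i];
                    issued++;
                    break;
                }
                if (!out_of_order)      /* in-order issue cannot skip past */
                    break;              /* the stalled instruction */
            }
        }
        return finish;
    }

    int main(void) {
        printf("in-order finish:     cycle %d\n", run(0));
        printf("out-of-order finish: cycle %d\n", run(1));
        return 0;
    }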


    quote:Originally posted by Arcadian:
    There is nothing impressive about the EV7. Actually EV7 doesn't change the processor architecture at all. It simply allows new levels of SMP parallelism. It's a different protocol that will only raise performance on a system level, not a processor level. If you are talking about IA-64 vs. an EV7 single processor, Itanium will be much faster, let alone McKinley. Only in very large SMP systems will EV7 be faster. At least this is what I hear.

    Given its massive memory bandwidth and process shrink to 0.18 micron, the EV7 should double the current EV6 SPEC scores, which already lead the industry by a large margin. Intel has already announced that they will not submit SPEC scores for Itanium. The only other company that has done this recently is Sun, with its pathetic USII scores. (Sun did post scores for the USIII, but they look fairly average.)

    By PDR60 October 13, 2000, 12:52 AM

    This is an interesting question. On one hand we have hard drives that really haven't even pushed the ATA66 standard for sustained output. I really think that HD technology is topping out. It is a mechanical device that in no way will ever approach the speed of an electronic device. You can only spin platters so fast.
    Then we have system bus technology. I think Intel, VIA, and others are at a loss to find an economical way to increase bus speed and maintain factors like memory bandwidth. Look at all the incarnations of the 8xx chipset Intel has tried. Rambus was a flop. Then VIA came out with all those K133 series chips; now it's the 694 series. Both have memory bandwidth problems. It's hard to believe that a chipset as old as the BX is still a viable option. It can still compete with the new chipsets in its overclocked 133 state.
    Memory is another bottleneck. Maybe not just in the speed category but in the amount that bloated OSes are demanding. 98 running on anything less than 128 megs takes a performance hit. Then there's Win2K! I think that DDR RAM may help on the memory front. However you will still need adequate amounts to satisfy your OS. I run a dually and am considering going to half a gig of memory.

    By Phoenix October 13, 2000, 04:24 AM

    quote:Originally posted by Arcadian:
    I think you are misunderstanding some of the specifications. I don't know much more
    myself, but what you said probably doesn't have the impact that you think. At least, I
    haven't heard anyone else complain about this. Can anyone shed some light on this topic?

    We recently had a small discussion on this subject in one of my lectures. The professor simply said that tests have shown that about 30 registers is the ideal number of registers. 30 was the best amount for non-floating-point calculations, while floating-point calculations only needed about 15 registers before performance was at its max. Once you get above 40 registers, performance starts to decrease due to longer addresses. The only problem is that our discussion was on MIPS processors and I'm not sure how it would relate to any x86, or x87 for that matter, processors.

    By Humus October 13, 2000, 08:49 AM

    quote:Originally posted by Moridin:

    Thanks, I am not familiar with the full IA-32 instruction set. Most of my assembly programming was done in 68000, 6800, 8080 and the 8080 compatible Z80. I still think IA-32 is limited in this regard compared to newer ISA's and that this is a bottleneck in the ISA.

    I also understand that the LEA instruction can extend some of these capabilities, but I am not entirely sure what it does.

    Yes, LEA extends the capabilities some. LEA stands for Load Effective Address and is used for calculating pointers, but can be used for other stuff too. You use it like this:
    LEA REG1, [REG2 + s * REG3 + immediate]

    So, it's a 4-operand instruction where the last one has to be a constant and s has to be 1, 2, 4, or 8.

    So if you want to do a = b + c you can write
    LEA EAX, [EBX + ECX]
    or for a = b + 2 * c you can write
    LEA EAX, [EBX + 2 * ECX]
    or for a = b + 2 * c + 7 you can do
    LEA EAX, [EBX + 2 * ECX + 7]

    The lea instruction is rather powerful, but is a little limited too. You cannot do a = b - c, only add. The constant can be negative though.

    By Rick_James9 October 13, 2000, 11:16 AM

    Here's some info about AMD's new processor "hammer" and "Itanium" You can read more about it at: http://www.cpureview.com/art_64bit_a.html

    "Key Features of AMD's 64 bit x86 architecture
    AMD is stressing compatibility; and with a good reason: it will speed adoption of their upcoming 64 bit processors.

    Full backward compatibility with existing 32 bit x86 code base
    Same instruction set, registers extended to 64 bits
    Full 64 bit flat address model

    If you were to ask programmers what the major faults of the existing x86 architecture were, they would give you a short but extremely important list:

    Not enough general purpose registers
    Stack based floating point processor; with only eight registers
    Did I mention not enough registers?
    Someone must have been listening. Here are some of the key features of x86-64:

    general purpose registers extended to 64 bits
    added eight more general purpose 64 bit registers (!!!!!)
    added an additional 16 register IEEE standard floating point unit (!!!!!)
    They did not stop there. More goodies:

    SSE support! with twice the number of registers Intel provides
    PC relative addressing for data
    same syntax for using 64 bit operations as 16/32 bit operations
    prefix byte to allow access to new general purpose registers in 64 bit mode
    byte-addressing for the low byte of all 16 general purpose registers
    Basically, the changes go a long way to making the instruction set more orthogonal; and the larger register files allow far greater opportunities for code optimization by the compiler writers."


    By Arcadian October 13, 2000, 12:08 PM

    Rick_James9, thanks for replying to this post with such a great topic. I hope other people read far enough into this one to see your link. You might want to repost this as a main topic. But I did want to respond to your post with some insight.

    First, the article makes a good point about the compatibility aspect of the x86-64 architecture. It makes sure to point out that Itanium's 32bit performance is not strong, and even mentions that Intel does not intend for this processor to run in 32bit mode, and that the capability is only there for compatibility.

    I wanted to mention that this should not be taken as a negative for the IA-64 architecture. You see, Intel has made sure to research what OEMs are looking for in systems. What they found out after talking to many different companies is that large servers are usually sold as complete packages that are further tested by the companies that sell them. Take Hewlett-Packard, for example, who I believe is taking an active part in the Itanium launch.

    HP will bundle an Itanium system with the OS and all the software that a customer will want. They will make sure that all the software is 64bit for optimal compatibility, and at the same time try to offer the customer as many choices as are available. If there is one application that is 32bit that the customer requests, then Itanium can certainly be bundled with it and not be severely impacted. However, if there are many 32bit applications that the customer desires, HP will still also have Intel's IA-32 processors, which will continue to progress with new technology and still be compelling choices. However, Intel believes, from what they've heard from their customers, that most users of these advanced systems will be able to use complete 64bit solutions. If they weren't sure about this, it is doubtful that Intel would invest so much into Itanium.

    As the Itanium platform matures, there will be more software available, which means more choices to the end users, and a larger user base. It's a fairly solid business model that's hard to see from our point of view, Rick_James9.

    AMD's business model is different. Even though there will probably be an x86-64 compatible Linux kernel by the time Sledgehammer launches, there will not be too much software at first. Although this will grow with time, AMD knows this, and is prepared to offer Sledgehammer as a strong 32bit processor. As more software is developed, then Sledgehammer will slowly transition to its more powerful 64bit side.

    However, this takes time, and Intel will quickly gain market share with their 64bit solution. They've already been nearly guaranteed this through customer feedback and support. AMD will surely transition their processor to the workstation market, where its strong floating point should be very compelling. But in terms of the enterprise server, they will have a much tougher time penetrating.

    Eventually, AMD wants the Hammer series to be their next desktop chip. With strong 32bit performance, and hopes of more x86-64 software being developed, seeing this processor in the desktop market is inevitable. Intel will be hard pressed to compete with their IA-32 line once 64bit software becomes ubiquitous, but this is very far down the road, and I believe they have solutions already planned.

    The article also mentions McKinley, the successor to Itanium. Details about this chip are scarce, but it is said that it will perform in many ways better than Itanium. Before details are released, I think it is presumptuous for the article to already claim Sledgehammer's superiority to McKinley, especially since the two are aimed at different markets.

    Itanium is already slated to be available in 8, 16, and 32 processor systems from NEC and Unisys, and Sledgehammer will not scale nearly as well. AMD has an emerging 2-processor system, which definitely breaks the surface, but they have a while to go before they are able to implement the 8-way servers that are on their roadmap, and I have yet to hear of anything greater than that.

    I believe there is enough market space for both Intel and AMD to be successful in their own way. I hope they are both successful, because they both offer new and interesting technology that will surely put us into the next generation of computing.

    By Superwormy October 13, 2000, 01:59 PM

    The limiting factor is bus speeds and memory speeds, certainly not HD speed. Look at the computer when you're playing games and such: as long as you have 128 megs of RAM, the little hard drive light rarely comes on. On the other hand, look at the huge FPS difference you get if you go from a bus speed of 66 to 133 with the same processor.

