SharkyForums.Com - Print: GPU vs CPU - Please explain architectural differences.

    GPU vs CPU - Please explain architectural differences.
    By mellojoe September 09, 2001, 11:47 PM

    What are the major (and minor, heck everything) differences between a CPU and a graphics processing unit (GPU)? I mean, why is it that there are 1.4 to 2 GHz processors out, but GPU core clocks have only reached about 500 MHz? My Radeon, for instance, is only clocked at 200 MHz.

    Is it different in the way RISC is different from CISC?

    It seems like CPU processing power is no longer a bottleneck for major applications. The only component that is actually stressed these days is the GPU. So why does the GPU fall so far behind the CPU in clock frequency?

    By idris5 September 10, 2001, 04:03 AM

    You've touched on an interesting subject here where complexity is usually wrongly judged (by the less technically minded) by the transistor count.

    The difference between the attainable clock rates of a GPU and a CPU comes down to the way they are designed - hang on, I know you think this is an obvious comment, but bear with me.

    CPUs are designed using an approach called full custom. This means that every single transistor is hand-optimised for its location in the circuit. So, for example, if you want to drive a long interconnect you'll need a high-drive-strength transistor (so the aspect ratio of gate width to length is increased - roughly speaking, drive current scales with that ratio, so a wider gate can charge the wire's capacitance faster). There are lots of tricks that can be brought to bear to eke out tiny performance advantages in each transistor (like placing transistors next to each other so that they abut, which means no interconnect is required); however, it takes a long time to design using full custom methods.

    GPU designers, on the other hand, absolutely cannot afford long turnaround times - time to market is king - so they use another approach called cell based design.

    Cell based design takes a standard set of cells (which are gates like buffers, AND gates, latches etc.) that are unique to a foundry process like TI 0.13 or TSMC 0.18. These cells are generic and are instantiated into the design with little (I'm not going to overcomplicate this too much) consideration of the function of the cell in that part of the circuit. Because of this the performance achievable is not as high as with full custom; however, you can iterate the back end of the design (RTL -> netlist -> layout [chip floorplan] -> GDSII [what is sent to the foundry for manufacturing]) very quickly (three weeks or so, depending on the speed of the machines available) and fix problems as and when they come up.

    Cell based design isn't simple and there are an awful lot of pitfalls (many of which are brought about by the design tools), but the big advantage it has over full custom is time to market, which it buys at the expense of speed.

    By idris5 September 10, 2001, 05:28 AM

    Oh, and here is an example of one of the major limitations of cell based design with current design tools.

    Synthesis tools can currently only swallow around 500,000 instances (gates, sort of) in one go. So if you have a GPU of 10 million+ instances (not transistors) you have to break it down into lots of sub-blocks and synthesise these blocks separately.

    The problem here is that even though you make every effort to place the pins on these macro blocks as close to each other as possible, you almost always have buses that need to go to at least two blocks, and so one of them has to follow a much longer path (as you can't stack macro blocks on top of each other).

    Obviously you can try and make sure that your critical path has macro blocks placed as close to each other as possible, but you will always get some degradation in maximum clock rate.

    Edit - Realised that many won't know what synthesis is! Synthesis is a process that occurs around the middle of the design flow. You take a Verilog RTL description of your design, which is a reasonably high-level description (it's a bit like C [the programming language]), and this is then synthesised into a gate-level description called a netlist.
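
    Just to make that concrete, here's a toy sketch of the jump from RTL to a netlist for a 2-to-1 mux (the cell names are invented, not from any real library):

        // RTL: roughly what the designer writes - it describes behaviour.
        module mux2 (input a, input b, input sel, output y);
          assign y = sel ? b : a;
        endmodule

        // Netlist: roughly what synthesis produces - the same function,
        // but expressed as instances of standard cells.
        module mux2_netlist (input a, input b, input sel, output y);
          wire sel_n, t0, t1;
          INV  u0 (.A(sel), .Y(sel_n));
          AND2 u1 (.A(a), .B(sel_n), .Y(t0));
          AND2 u2 (.A(b), .B(sel),   .Y(t1));
          OR2  u3 (.A(t0), .B(t1),   .Y(y));
        endmodule

        // Behavioural stand-ins for the (made-up) library cells, just so
        // the example is self-contained.
        module INV  (input A, output Y);          assign Y = ~A;    endmodule
        module AND2 (input A, input B, output Y); assign Y = A & B; endmodule
        module OR2  (input A, input B, output Y); assign Y = A | B; endmodule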

    By elimc September 10, 2001, 05:30 AM

    Hey Idris, do you work for Idris the company? I'm assuming that you do.

    By idris5 September 10, 2001, 05:39 AM

    No, I work for ARM in the UK.

    By Adisharr September 10, 2001, 09:23 AM

    quote:Originally posted by idris5:
    [full post snipped - quoted in its entirety above]

    I always find your posts very interesting..

    By mellojoe September 10, 2001, 09:32 AM

    Wow. That is very interesting.

    So could you quickly explain why, then (maybe I missed something), a GPU has to have 57 million transistors when CPUs are just now reaching 40 million?

    By idris5 September 10, 2001, 11:07 AM

    quote:So could you quickly explain why, then (maybe I missed something), a GPU has to have 57 million transistors when CPUs are just now reaching 40 million?

    GPUs have a lot of duplication - four or so identical pipelines, duplicated logic for matrix operations, etc. However, as I mentioned earlier, you should not look at how many transistors there are, but at how they are put together.

    Just to reiterate/concatenate: the CPU has every transistor crafted by hand (not quite, but almost), whereas for the GPU you just pick a cell out of a library and put it into the design. So the GPU approach is far less flexible but much faster.

    Perhaps an analogy would be helpful:

    If you want a new kitchen you can either get a craftsman in who will measure every unit gap to the mm and use the space to the maximum, but this may take a long time and cost a lot (the full custom approach). Or you can go down to MFI/KMart (insert local DIY place) and buy off-the-shelf units that may not fit brilliantly but are cheap and quick to install (the cell based approach).

    quote:I always find your posts very interesting.

    Thanks.

    [edit] I realise, Mellojoe, that you are looking for architectural differences - but really there are few similarities other than the fact they are both processing units that operate on data. The work they have to do is totally different, so you have to view physical differences like clock rates from a design ethos perspective, which is what I've tried to highlight above.

    [edit 2] Actually, I realise some people may point out that a GPU may be designed using semi-custom techniques, where the majority of the design is cell based, but the critical path is designed using full custom. Alternatively, and perhaps most likely, Nvidia may design their own standard cell library optimised for the TSMC 0.1x process.

    I don't know enough about how Nvidia design to really comment, but those are some of the options available.

    By mellojoe September 10, 2001, 11:32 AM

    Thanks. That is exactly what I was looking for.

    Just a plethora of information there.

    By Moridin September 10, 2001, 12:31 PM

    quote:Originally posted by idris5:
    No, I work for ARM in the UK.


    For those of you not familiar with ARM, it was originally the processor in Acorn's line of RISC-based computers. Today it is the most common processor in embedded devices, and if I am not mistaken it outsells all x86 processors combined.

    Intel's XScale is one example of a processor based on ARM technology. The next-generation Palm will use ARM-based processors, as does the current generation of PDAs from companies like Compaq.

    [/ARM Plug]

    By idris5 September 10, 2001, 01:56 PM

    Thanks Moridin - it would be a bit wrong for me to plug us!

    cough The Gameboy Advance has an ARM7 in it cough!

    By SlartyB September 10, 2001, 02:21 PM

    Thanks for the very detailed discourse on ASIC design, idris5 - you are pretty much right on the money. There are a couple of other factors involved here too:

    1) Intel and AMD literally have armies of people who do nothing but tweak the design, getting every last ounce of performance out of it. Companies like nVidia and ATI cannot afford that kind of resource.

    2) I would actually argue that GPUs are *MORE* complex than CPUs. Controversial, I know, but having been on the inside of the design process at a major GPU manufacturer, I can tell you that the number of adders and multipliers that go into a modern GPU would blow most people's minds - it's a LOT more than in a typical CPU. GPUs basically use the transistors to get more done in the same time, whereas CPUs do a simpler set of more general-purpose things a lot quicker (because they are simpler).

    By Moridin September 10, 2001, 03:15 PM

    quote:Originally posted by SlartyB:

    2) I would actually argue that GPUs are *MORE* complex than CPUs. Controversial, I know, but having been on the inside of the design process at a major GPU manufacturer, I can tell you that the number of adders and multipliers that go into a modern GPU would blow most people's minds - it's a LOT more than in a typical CPU. GPUs basically use the transistors to get more done in the same time, whereas CPUs do a simpler set of more general-purpose things a lot quicker (because they are simpler).

    I would also think there is a lot more opportunity to do things in parallel. This would favor having additional hardware doing the same job over running the same hardware faster, especially when you take into account the design cycle stuff IDRIS5 outlined above.
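
    As a toy sketch of that trade-off (module and signal names made up, nothing to do with any real GPU): rather than making one unit run faster, you stamp out four copies of it and hand each one a different pixel.

        // Four identical "pixel pipes" working side by side. Each copy is no
        // faster than a single one, but together they handle four pixels per clock.
        module pixel_quad (
          input          clk,
          input  [127:0] pix_in,    // four 32-bit pixels packed together
          output [127:0] pix_out
        );
          genvar i;
          generate
            for (i = 0; i < 4; i = i + 1) begin : lane
              pixel_pipe u (
                .clk (clk),
                .d   (pix_in [32*i +: 32]),
                .q   (pix_out[32*i +: 32])
              );
            end
          endgenerate
        endmodule

        // Stand-in for one rendering pipeline - a real one is vastly bigger.
        module pixel_pipe (input clk, input [31:0] d, output reg [31:0] q);
          always @(posedge clk)
            q <= d + 32'd1;    // placeholder for whatever per-pixel work gets done
        endmodule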

    By Conrad Song September 10, 2001, 03:37 PM

    quote:Originally posted by SlartyB:
    Thanks for the very detailed discourse on ASIC design, idris5 - you are pretty much right on the money. There are a couple of other factors involved here too:

    1) Intel and AMD literally have armies of people who do nothing but tweak the design, getting every last ounce of performance out of it. Companies like nVidia and ATI cannot afford that kind of resource.

    2) I would actually argue that GPUs are *MORE* complex than CPUs. Controversial, I know, but having been on the inside of the design process at a major GPU manufacturer, I can tell you that the number of adders and multipliers that go into a modern GPU would blow most people's minds - it's a LOT more than in a typical CPU. GPUs basically use the transistors to get more done in the same time, whereas CPUs do a simpler set of more general-purpose things a lot quicker (because they are simpler).

    But at the same time, one would think that duplicating an adder or multiplier, etc., would not be too difficult? For example, going from two rendering pipelines to four sounds a lot like a "stamp" operation. I would think that the front-end (T&L) logic would be much more complicated. Please correct me if I'm wrong; I'm very interested.

    By SlartyB September 10, 2001, 05:16 PM

    quote:Originally posted by Conrad Song:
    But at the same time, one would think that duplicating an adder or multiplier, etc., would not be too difficult? For example, going from two rendering pipelines to four sounds a lot like a "stamp" operation. I would think that the front-end (T&L) logic would be much more complicated. Please correct me if I'm wrong; I'm very interested.

    Yes - there is a lot of duplication in a GPU. But there is also a lot of specialisation that goes on in a GPU that does not happen in a CPU. For example, suppose you have some algorithm to do a particular function in a GPU. That algorithm will be mapped - more or less unchanged - to the hardware. The precision of each multiplier and adder will be tailored to that particular task, and each will only have as many bits of precision as it needs. A CPU, on the other hand, cannot know ahead of time what algorithm it will be asked to run, so it cannot make compromises on the precision of its calculations - it always calculates a result with maximum precision. However, because of this general-purpose nature, you only have a few multipliers and adders, so you can spend a long time optimising them to make them go as fast as possible. In GPUs, it's more about cramming as many multipliers and adders as you can into the design, worrying less about the individual performance of each unit and more about the overall performance.
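
    A crude sketch of that idea, with the widths picked purely for illustration (none of this is from a real design): the GPU datapath sizes its multiplier to exactly the blend it has to do and throws the low bits away, while the CPU's multiplier has to deliver the full-width answer every time.

        // GPU-style: a multiplier sized to exactly the job in hand.
        // 9-bit colour x 8-bit blend factor -> 17-bit product, of which only
        // the top 9 bits are kept; the rest is precision nobody will ever see.
        module blend_mul (input [8:0] colour, input [7:0] alpha, output [8:0] blended);
          wire [16:0] full = colour * alpha;
          assign blended = full[16:8];
        endmodule

        // CPU-style: a general-purpose 32x32 multiplier. It cannot know what the
        // result will be used for, so it always produces all 64 bits.
        module cpu_mul (input [31:0] a, input [31:0] b, output [63:0] p);
          assign p = a * b;
        endmodule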

    By idris5 September 11, 2001, 01:55 PM

    Yes, I completely agree. Because graphics is deterministic (you know exactly what is going to happen when), you can have lots of highly specialised units and not worry about inefficiency.

    [edit]

    quote:I would actually argue that GPUs are *MORE* complex than CPUs. Controversial, I know, but having been on the inside of the design process at a major GPU manufacturer, I can tell you that the number of adders and multipliers that go into a modern GPU would blow most people's minds - it's a LOT more than in a typical CPU. GPUs basically use the transistors to get more done in the same time, whereas CPUs do a simpler set of more general-purpose things a lot quicker (because they are simpler).

    IMO you are partly right. I'd say that the design planning stages of a GPU are more complicated, and the layout work must be fun; however, the low-level intricacy involved in CPU design goes far beyond that of GPU design.

    So they are both complicated - but in different ways at different levels.

