  SharkyForums.Com - Print: What would you think about putting RDRAM on a Video Card

    What would you think about putting RDRAM on a Video Card
    By Arcadian January 06, 2001, 03:30 PM

    I wanted to start a technical discussion today, and see what people would think about putting RDRAM on a video card instead of SDRAM.

    One of the biggest bottlenecks in video cards, even on the GeForce 2 Ultra, is memory bandwidth. Many people believe that more memory bandwidth will allow for better frame rates, and faster graphics for very intense games.

    Correct me if I'm wrong, but I believe that the current limit for memory speed on a modern video card is 220MHz memory using DDR technology, which gives the equivalent of 440MHz video memory.

    What if, instead of DDR SDRAM, video card manufacturers used RDRAM? First, let's talk about memory speed. It is known that direct connect memory can reach much higher speeds than system memory, which has several loads connected together. While system DDR memory maxes out at 133MHz (PC2100), video DDR memory can go up to 220MHz (440MHz DDR, again correct me if I'm wrong).

    If we used Rambus memory, which easily reaches 400MHz in a system (PC800 RDRAM), the direct connect version should reach up to 500MHz or 600MHz (1GHz or 1.2GHz DDR, since Rambus also uses double data rate technology, in case you didn't know). However, we'll be conservative, and assume an effective 1GHz for video Rambus.

    Now, let's consider the pin count. Rambus is 16 bits wide, and SDRAM is 64 bits wide. Most video cards build a dual-channel SDRAM interface of 128 bits. So, for the same number of pins as a DDR interface, you can create a connection of 8 Rambus channels, effectively quadrupling the channel count of a standard dual-channel connection.

    So let's do a simple calculation. Let's say that current DDR video memory interfaces get 440MHz, effective speed, 64 bits (or 8 bytes per clock), and 2 channels. The bandwidth would then be 440 * 8 * 2 = 7.04GB/s. Let's take the same calculation for a Rambus interface. From before, we can assume we get at least 1GHz per channel for speed, 16 bits (or 2 bytes per clock), and 8 channels. The bandwidth would then be 1000 * 2 * 8, or 16GB/s. That's more than twice the bandwidth for a similar connection! We were also being fairly conservative about the speeds of a direct Rambus connection. I believe I read somewhere that they were about to come out with a 1.6GHz direct Rambus connection, but I don't know whether that ever materialized. If that were to come out some time in the future, though, we could talk about 1600 * 2 * 8 = 25.6GB/s video memory bandwidth!!!
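
    To make that arithmetic easy to replay, here is a small C sketch that just plugs in the clock speeds, transfer widths, and channel counts assumed in this post (these are the post's assumptions, not vendor specs):

        #include <stdio.h>

        /* Effective bandwidth = effective clock (MHz) * bytes per transfer * channels.
           All inputs are the figures assumed above, converted from MB/s to GB/s. */
        static double bandwidth_gbs(double effective_mhz, int bytes_per_clock, int channels)
        {
            return effective_mhz * bytes_per_clock * channels / 1000.0;
        }

        int main(void)
        {
            printf("440MHz DDR, 2 x 64-bit:   %.2f GB/s\n", bandwidth_gbs(440.0, 8, 2));  /* 7.04  */
            printf("1GHz RDRAM, 8 x 16-bit:   %.2f GB/s\n", bandwidth_gbs(1000.0, 2, 8)); /* 16.00 */
            printf("1.6GHz RDRAM, 8 x 16-bit: %.2f GB/s\n", bandwidth_gbs(1600.0, 2, 8)); /* 25.60 */
            return 0;
        }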

    Now, there are two things about Rambus system memory that are disconcerting to the rest of us. The first is latency, and the second is price.

    For the first, latency, we know that Rambus does have higher latencies than DDR SDRAM, mostly because of the packet decoding process that has to go on, since Rambus data is packet based. However, it is known that a lot of Rambus latency comes from the system memory design scheme that allows for multiple loads. Direct Rambus interfaces are faster than Rambus in a PC system. Therefore, latency should not adversely affect performance compared to a similar DDR interface.

    Second, price is also a concern. However, it is known that the largest expense for Rambus memory is the RIMM module. In a direct connection, Rambus is actually cheaper than SDRAM. However, prices will be affected by lower volumes, since a new technology of direct video Rambus would be expensive. But then again, direct DDR interfaces are also relatively low volume, so I am thinking that video Rambus would not be much more expensive than video DDR.

    So there you have it. Obviously, if everything else is equal, then Rambus should be the best thing to ever happen to the video world. But things may not be so equal, and I was hoping to get a few people's opinions on how this kind of setup would work. Do you think that using Rambus in video cards would be worth it? As we have seen, bandwidths can easily be doubled, or even tripled, over current technologies. How would you like frame rates of 120+ fps at 1600*1200*32 with all features turned on in your favorite games?

    Please reply, and have fun with this. I am hoping for technical responses, but feel free to toss in an opinion or two as well. Later.

    By Assmonkey January 06, 2001, 03:39 PM

    First off, I think you talk too much.
    Second off, yes, RDRAM I think would be sweet, it's just too expensive right now.

    By Arcadian January 06, 2001, 04:26 PM

    quote:Originally posted by Assmonkey:
    First off, I think you talk too much.
    Second off, yes, RDRAM I think would be sweet, it's just too expensive right now.

    Wow. This is even more of an insightful technical response than I could ever have hoped for....

    Maybe I've been misled to think that the Highly Technical forum might yield some interesting discussion in some rarely explored areas of interest. Maybe I should go back to the CPU forums and ask how to get another 20MHz out of an already highly overclocked 700MHz Duron. Or maybe assmonkeys are just too stupid to know anything about highly technical subjects such as high bandwidth memory technologies, and they should learn when to keep their mouths shut and not insult someone who is simply trying to entice some technical discussion. Perhaps you should move on to the next mindless overclocking topic and kindly keep future arrogant responses to yourself.

    [edit] As an afterthought, I just want to say that I really don't mean to come down so hard on the assmonkey, but it really does upset me when the only response I get to a writeup that I spent almost a half hour on, was that I talk too much. I know that I did invite comments and opinions, but I still did not appreciate the insult.

    By blk-sabbath-fan January 06, 2001, 04:33 PM

    [Original message edited out]

    We won't tolerate personal attacks, blk-sabbath-fan - this is a perfectly valid discussion, and I, like Arcadian, would like to see intelligent conversation about the possibility of RDRAM on video cards, thank you.

    Ciao!

    By Arcadian January 06, 2001, 04:57 PM

    You've got me figured all wrong, man. I am not picking on the assmonkey because his response was short, but because his response was insulting and inappropriate. And you're not helping things with yours.

    I have a lot of posts, because I've been a member here since this forum first began. I have a life except for times when I'm bored, and during those times, I look for some interesting discussions on this board.

    In case you haven't noticed, we are in the Highly Technical Forum. I spent a good amount of time thinking up and writing about a topic that not many people have talked about so far. If you think this is dorky, then you shouldn't be here.

    Many people that come to this site either look to get help, or help others. I happen to participate in the latter quite a bit, and give knowledge to many readers that want to know more about computer architecture. Again, if you are not interested in topics of computer architecture, then you should be surfing around some Black Sabbath fan sites, and downloading concert times, or whatever floats your boat.

    However, there are quite a few people that are interested in highly technical discussions, and for those people, both you and the assmonkey are wasting space with your bullsh!t. I hope you can be respectful, and keep this forum close to what it is intended to be. If not, I'm sure the moderators here would have no problem barring you from the boards.

    I have no problem with opinions that Rambus is expensive, or that you may not think it is a valid idea. Backing up your claims would be very appropriate, but it isn't necessary. However, I think you are crossing the line when you make comments such as yours or the assmonkey's. Please make sure that you treat others on this board with respect, and you will be treated accordingly. Have a nice day.

    By i954 January 06, 2001, 05:19 PM

    I don't enjoy flame wars. Take it easy, people. Maybe the "you talk too much" remark was made as a joke, but "Wow. This is even more of an insightful technical response than I could ever have hoped for...." is kind of insulting in my book. Of course, "hey dumb **** " is not better either, so people, if you see something that you don't like, don't insult others, but nicely tell them that you think what they said is not appropriate.

    As for RDRAM, I do not have anything to say that you didn't mention in your first post. Actually, you kind of answered yourself: yes, it would be better. Of course, I am sure there is a reason why RDRAM is not (yet) used in video cards. Maybe someone with more knowledge on this can contribute some of it!

    By Angelus January 06, 2001, 06:37 PM

    I'm confused here.

    Isn't the RIMM module the actual piece of hardware? If so, of course that's the most expensive piece, it's THE piece you get.

    If it's only a part of the stick then why is
    the RIMM module the most expensive part? Is it made of something special maybe? I've never seen a RDRAM module for real, only in pictures.

    By H@rdw@reXpert January 06, 2001, 06:44 PM

    DDR is good enough on video cards today.

    By Elxman January 06, 2001, 07:23 PM

    That's very interesting. I think along the way video cards will take advantage of RDRAM's incredible bandwidth (just like what Sony did with the PS2), and as Arcadian noted, the 16-bit datapath should make it cheaper since there don't need to be as many traces. I know there are some downsides, but I just can't think of them or don't know how to explain them.

    By James January 06, 2001, 07:41 PM

    quote:Originally posted by Angelus:
    I'm confused here.

    Isn't the RIMM module the actual piece of hardware? If so, of course that's the most expensive piece, it's THE piece you get.

    If it's only a part of the stick then why is
    the RIMM module the most expensive part? Is it made of something special maybe? I've never seen a RDRAM module for real, only in pictures.

    The "RIMM" (Rambus Inline Memory Module) is the PCB that the actual Rambus memory is mounted on. Same thing with SDRAM. SDRAM memory is currently mounted on a 168 DIMM (Dual Inline Memory Modules). So for example, if you talk about picking up a "stick" of 128MB of RAM, you are thinkind of purchasing a 168 DIMM with 128MB of RAM modules mounted on it. It is the mounting/packing of more memory in the same amount of space that ups the price of DIMMs as you increase the amount on a single DIMM. (aka 2x128MB sticks cost less than a single 256MB stick, etc.)

    Speaking of packaging: To follow along with the original discussion, how much RDRAM do you think could theoretically be packed onto the vid card's PCB? If video bandwidth is the current choke point, I would imagine another factor (not quite so serious at the moment) would be having to fetch data from system memory. Current SDRAM video memory on the Ultra series of nVidia cards runs at what, 4.5 or 5 ns? What does main memory run at, 7.5ns or something close to that?

    Also, just kind of an off-the-wall idea. Do you think that games will ever come to the point where all textures are loaded into the video card's cache at the beginning of the game? Throughout the entire game the video card would never have to load textures from main memory. The only data the video card would require for rendering is the current list of onscreen items and the textures that they use. Obviously, this would take a massive amount of onboard storage.

    Maybe not as technical as you would like, but better than "dumb***!"

    By AMD_Forever January 06, 2001, 10:23 PM

    There is one problem that Rambus just created for themselves in getting their memory adopted for graphics cards: they went hostile on nVidia. In nVidia's latest financial report they said Rambus has requested royalties for the DDR RAM nVidia graphics cards use. This could mean one of two things: a) nVidia wins and is pretty much never friendly toward Rambus again, or b) nVidia loses, has to pay more for DDR use than they would for Rambus use, and switches to using RDRAM. However, it's most likely nVidia will win and not want to deal with Rambus at all. And as for me personally, I think Rambus memory in graphics cards is a great idea, but with Rambus pissing off chipset makers including Intel, and now pissing off video chipset makers, no one is going to consider them for anything no matter the technical gains.

    and on a final note, dont flame arcadian DAMNIT.

    By Adisharr January 06, 2001, 11:02 PM

    quote:Originally posted by Assmonkey:
    First off, I think you talk too much.
    Second off, yes, RDRAM I think would be sweet, it's just too expensive right now.

    Hmm.. let me guess.. 12 or 14 years old? Go to bed please..

    By dighn January 06, 2001, 11:20 PM

    jeez the guy just made a joke(although not funny) and you are all jumping on him... calm down ppl!

    By JabberJaw January 07, 2001, 12:22 AM

    quote:Originally posted by Arcadian:
    Many people believe that more memory bandwidth will allow for better frame rates, and faster graphics for very intense games.

    High end accelerators and 3D cards use memory for two distinctly different functions - for computation, and for the frame buffer.

    For computation, memory is used by the graphics processor like it is by any microprocessor. This is where the term "frame rate" that you used comes in: it refers to how many times per second the graphics engine can calculate a new image and put it into video memory. Frame rate is much more a function of the type of software being used and how well it works with the acceleration capabilities of the video card. So, essentially what you are talking about doing here would be using RDRAM with a graphics processor in the same manner as it is used with some Intel microprocessors.

    For frame buffering, the bandwidth of the memory used to buffer video frame data to the RAMDAC is what affects the refresh rate per color depth per resolution. For example, at 1600x1200 resolution, 24-bit color, and a 100Hz refresh, a video memory bandwidth of 549.3 megabytes/second is required. VRAM is considered best for frame buffering because it is dual ported for simultaneous write by the GPU and read by the RAMDAC. I'm not sure what effect RDRAM packet decoding would have interacting with the RAMDAC, but I would suspect that any latency in this real-time function would be immediately apparent on the monitor screen.
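
    As a quick sanity check on that 549.3 figure, a minimal sketch of the scan-out arithmetic (1600x1200, 24-bit color, 100Hz refresh, counting a megabyte as 2^20 bytes):

        #include <stdio.h>

        /* RAMDAC scan-out bandwidth = width * height * bytes per pixel * refresh rate. */
        int main(void)
        {
            const double width = 1600.0, height = 1200.0;
            const double bytes_per_pixel = 3.0;  /* 24-bit color */
            const double refresh_hz = 100.0;
            double bytes_per_second = width * height * bytes_per_pixel * refresh_hz;
            printf("%.1f MB/s\n", bytes_per_second / (1024.0 * 1024.0));  /* ~549.3 */
            return 0;
        }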

    Some cards use their video memory for both the frame buffer and additional calculations, and some have more than one type of video memory for the separate functions. Do you postulate using RDRAM for either, or both?

    By Fuzzball January 07, 2001, 02:09 AM

    Hmmmm. That's a lot to munch on. Gives me something to ponder tonight. To be honest, I haven't been a Rambus fan, but you do present an interesting idea. It would be cool to see some real-world benchmarks (hint hint to all the relevant R&D departments out there).

    BTW, I don't think you talk too much. You gave a lot of info that was needed, and I give you credit for the time you spent on that post.

    By m538 January 07, 2001, 04:29 AM

    Bandwidth (MB/s) is the product of bit width and frequency. You must understand that increasing bit width and increasing frequency don't have the same effect in every case. Increasing frequency (with the same latency) is actually more effective, but much more expensive, than increasing bit width by the same factor. With main memory, the CPU often needs only 8, 16, or 32 bits out of an entire 64-, 128-, or 256-bit packet, and then needs data from somewhere else; how quickly it can change places is limited by frequency. Programmers try to lay data out contiguously to take advantage of increased bit width, but in most logic-heavy code that can't be done as easily as in, say, a computation over an array of angles. A GPU, however, makes far more contiguous memory requests; it usually sends and receives packets of 256-1024 bits.
    What I'm trying to say is that a GPU doesn't need extra high-clocked memory the way a CPU does. The GPU itself isn't clocked that high; it's wide, and if it needs more bandwidth it will use cheaper wide memory rather than higher-clocked memory. Even DDR technology effectively doubles bit width, not frequency.
    Last, video memory isn't the bottleneck. It is only a bottleneck in the current silly situation where all textures are stored in video memory. Memory bandwidth is high enough to support a triple frame buffer and mipmaps. I think video memory should also store the most frequently used textures, but GPUs don't actually apply a caching mechanism to video memory; the program itself writes the textures used most in a given level into local memory at the beginning of that level.
    I suppose that with AGP 8x (the same bandwidth as 133MHz DDR RAM) we will finally see games with more than 100MB of textures per level, without video cards breaking the 64MB memory and $1000 barriers.

    By Humus January 07, 2001, 09:39 AM

    Well, I didn't read all the posts, but the topic is interesting. Graphics memory is actually a place where RDRAM would fit great (technologically; I don't want to see it happen unless Rambus changes their behaviour).
    A graphics card almost always does linear memory accesses. Many cards use tiled textures to increase the linearity of memory accesses. Latency is a very small problem, but bandwidth is an important one.

    For CPUs I'd hope more for a low-latency technology such as FCRAM. A CPU accesses memory in a linear manner much less often.

    By Humus January 07, 2001, 09:49 AM

    quote:Originally posted by m538:
    Bandwidth (MB/s) is the product of bit width and frequency. You must understand that increasing bit width and increasing frequency don't have the same effect in every case. Increasing frequency (with the same latency) is actually more effective, but much more expensive, than increasing bit width by the same factor. With main memory, the CPU often needs only 8, 16, or 32 bits out of an entire 64-, 128-, or 256-bit packet, and then needs data from somewhere else; how quickly it can change places is limited by frequency. Programmers try to lay data out contiguously to take advantage of increased bit width, but in most logic-heavy code that can't be done as easily as in, say, a computation over an array of angles. A GPU, however, makes far more contiguous memory requests; it usually sends and receives packets of 256-1024 bits.
    What I'm trying to say is that a GPU doesn't need extra high-clocked memory the way a CPU does. The GPU itself isn't clocked that high; it's wide, and if it needs more bandwidth it will use cheaper wide memory rather than higher-clocked memory. Even DDR technology effectively doubles bit width, not frequency.

    Last, video memory isn't the bottleneck. It is only a bottleneck in the current silly situation where all textures are stored in video memory. Memory bandwidth is high enough to support a triple frame buffer and mipmaps. I think video memory should also store the most frequently used textures, but GPUs don't actually apply a caching mechanism to video memory; the program itself writes the textures used most in a given level into local memory at the beginning of that level.
    I suppose that with AGP 8x (the same bandwidth as 133MHz DDR RAM) we will finally see games with more than 100MB of textures per level, without video cards breaking the 64MB memory and $1000 barriers.

    I agree with most of it except the last part. It's not a stupid situation that all textures are in video memory. AGP will never be fast enough for texturing. AGP speed is increasing more slowly than onboard memory speed.
    Also, GPUs do cache video memory. They have both a texture cache and a vertex cache. If you're programming with the texture cache in mind you can easily increase speed by a factor of two, or close to that.

    By Arcadian January 07, 2001, 01:04 PM

    Wow, there is some really good discussion going on here. I appreciate everybody continuing to put their opinions in, even after the flames in the beginning (which I apologize for).

    I see that some people are suggesting that memory bandwidth isn't as important as some other aspects of the video card. Can anyone elaborate on that, or give examples of ways that the GPU portion of the video card can be improved?

    Also, some people are suggesting that the latency of Rambus may affect the timing of the RAMDAC and other refresh timings. Is there anything else fundamental that Rambus could cause problems with?

    Finally, a couple people mentioned Rambus politics, and that of course will also play a big part. Does anyone know of current legal battles with Rambus that could make a difference on whether video card manufacturers may adopt RDRAM as a video memory technology?

    Thanks again. This is good stuff... keep it coming.

    By Humus January 07, 2001, 06:52 PM

    quote:Originally posted by Arcadian:
    I see that some people are suggesting that memory bandwidth isn't as important as some other aspects of the video card. Can anyone elaborate on that, or give examples of ways that the GPU portion of the video card can be improved?

    My opinion: Memory bandwidth is the most important issue now.
    Even a Radeon (which is the least memory-bandwidth-constrained card among the newer ones) will choke if you don't carefully make sure that the GPU can get all the data it needs fast enough. You even have to think about things like access linearity as a developer to get good fps. I'm currently evaluating various rendering techniques for an upcoming 3d engine, and in its current state it has 5 passes (normalmap, bumpmap, basetexture, lightmap and environmentmap). I had a strange experience while implementing the bumpmapping. The basetexture and the bumpmap are large textures (256x256 and higher is what I consider large in this case). The lightmap and normalmap are small (16x16).
    In the first implementation I rendered the bumpmap and basetexture in the same pass with multitexturing and got 90 fps. But when I rendered them with two passes and single texturing I got 100 fps. It was a little strange, since multitexturing should be twice as fast (if there were infinite memory bandwidth) and definitely not slower. Turning texture compression on made the multitexturing score go up to 150 while single texturing went up to only 140. That revealed that memory held it back, and even more importantly, linearity. With two large textures applied in the same pass, the linearity is destroyed, since it would need to fetch texels from two different places in memory for each pixel. After rearranging the passes to have only one large texture and one small texture in the same pass, the fps went up to over 200 with texture compression on for the base texture (but not for the bumpmap, since it doesn't look especially good compressed).

    By nkeezer January 08, 2001, 02:11 AM

    Given 3dfx's recent demise, and ATi's historical position as follower (as opposed to innovator), it seems to me that the only company that would be likely to introduce a Rambus video card would be Nvidia.

    Now, given the facts that 1) they were the first company to use DDR in a video card and 2) they were supposedly developing a DDR chipset, is it reasonable to assume that Nvidia has some kind of vested interest in DDR, or (maybe more likely) a reason to want to see Rambus go down? If that's the case, I wouldn't expect to see Rambus in a video card for a while -- at least as long as Nvidia has the power that it does.

    quote:Originally posted by Arcadian:
    Finally, a couple people mentioned Rambus politics, and that of course will also play a big part. Does anyone know of current legal battles with Rambus that could make a difference on whether video card manufacturers may adopt RDRAM as a video memory technology?

    By m538 January 08, 2001, 04:23 AM

    I must add that I don't know the future, of course. If the market wishes it, all textures will be loaded into local video memory. Let that memory grow up to 4GB or even more, IT IS NOT IMPOSSIBLE, but in that case engineers must develop a protocol that virtually extends main memory with video memory, so textures can be loaded directly without duplicating them in two places. I think the best choice is a relatively cheap video card with a maximum of 64MB of embedded memory and a fast AGP bus.

    As I already posted somewhere, our eyes' resolution and framerate aren't endless. It seems that the NV20 will support the maximum resolution and framerate our eyes can see at current quality. There are still a lot of things to improve to make interactive games look like cinema. I think the majority of such improvements will come from a more powerful GPU and the minority from increased AGP bandwidth. I still haven't seen T&L in action, but I know it is one example. Certainly, I could be wrong there.
    Well, maybe 32-bit AGP 8x is not enough, but a PCI-X based 64-bit AGP 8x (4GB/s, the same as dual-channel 133MHz DDR) should be enough for a while.

    Also, true quality comes from a large amount of textures and an increased number of independent elements on the screen.

    By Humus January 08, 2001, 06:29 AM

    quote:Originally posted by nkeezer:
    Given 3dfx's recent demise, and ATi's historical position as follower (as opposed to innovator), it seems to me that the only company that would be likely to introduce a Rambus video card would be Nvidia.

    I wouldn't call ATi a "follower". There's nothing wrong with their ability to innovate. The Rage128 was the best hardware when it came out (but with the crappiest drivers seen by mankind), and the Radeon is clearly the most innovative graphics product available so far.

    By Humus January 08, 2001, 06:36 AM

    I'd like to add ...
    What I'd like to see is better texture compression schemes. Compressed bumpmaps give very weird results, which makes texture compression useless on them. Some sort of texture compression that is designed with bumpmaps in mind would be great. That's at least something that could boost performance in my 3d engine by 20%.

    By Moridin January 08, 2001, 11:18 AM

    I tend to agree with Arcadian. I think RDRAM is well suited to video cards. There seems to be some discussion about how much memory bandwidth is really required and whether or not it is really a bottleneck. This has been quite interesting and I am looking forward to seeing more on this. To me the correlation between bandwidth and video performance seems strong so I am guessing that it is important but I am no expert.

    My understanding is that video cards tend to use data in a very predictable way, and transfer large blocks of data at a time. This is where RDRAM really excels, since its biggest problem is the initial latency. After the initial request is made, RDRAM transfers data very quickly.

    QRSL RDRAM, which has 3.2 GB/s on a 16-pin channel, was announced last year, although Rambus's web site lists it as being available in 2001. Maybe all this means is that although the technology is ready, nobody is using it yet. Using QRSL and 128 data pins you could get 8*3.2 = 25.6 GB/s, compared to 7 GB/s for the best DDR using the same number of pins. The restriction on QRSL is that you can have a maximum of 4 devices on a 5-inch channel. This should not be a problem for a video card.

    I suspect that RDRAM hasn't been used in video cards yet because of cost. The price of RDRAM has come down a lot recently but I think GPU makers have been reluctant to design around RDRAM while cost was high and availability was suspect, but if one company does I think others will follow suit. Graphics does seem to be one of Rambus's target markets.

    Further down the road I would like to see GPUs move towards embedded DRAM for memory; this would provide the best possible bandwidth and latency. The only problems right now are the cost of the special wafers required and the amount of die space required.

    By Arcadian January 08, 2001, 12:32 PM

    Moridin, not to mention that there is a limit to the amount of embedded memory that you can place on a die. The Playstation 2 and Game Cube are two examples. The PS2 could only fit 4MB of embedded SRAM, but that was on a .25u process, and SRAM tends to be much bigger and more expensive than DRAM. The Game Cube, which is on a .18u process using IBM's most cutting-edge eDRAM process, was only able to fit 24MB of embedded memory on its chips. These amounts are certainly not enough for current video cards, which perform best at around 32MB-64MB. In the future, however, on .13u or better processes, we may see embedded memory start becoming available on video cards.

    By Moridin January 08, 2001, 01:58 PM

    quote:Originally posted by Arcadian:
    Moridin, not to mention that there is a limit to the amount of embedded memory that you can place on a die. The Playstation 2 and Game Cube are two examples. The PS2 could only fit 4MB of embedded SRAM, but that was on a .25u process, and SRAM tends to be much bigger and more expensive than DRAM. The Game Cube, which is on a .18u process using IBM's most cutting-edge eDRAM process, was only able to fit 24MB of embedded memory on its chips. These amounts are certainly not enough for current video cards, which perform best at around 32MB-64MB. In the future, however, on .13u or better processes, we may see embedded memory start becoming available on video cards.

    Yes, that is what I was getting at with limited die space. The primary benefit of EDRAM is that it takes much less space than SRAM, therefore you can get more on a chip. I decided to start a new thread on EDRAM and I would be interested in hearing your thoughts on potential uses for this technology.

    By Marsolin January 08, 2001, 07:44 PM

    I think that RDRAM is very well suited for video memory. One detriment, though, is density. SDRAM technology typically seems to be one generation ahead of RDRAM: 512Mb is available for DDR, but RDRAM is still at 256Mb.

    I agree with Arcadian that the 1 GB/s is conservative. Cypress released a 533MHz DRCG (Direct Rambus Clock Generator) a few months ago that I assume would be ideal for these situations.

    I have also heard rumors about the quad-pumped RDRAM that was mentioned above and it would be a logical progression. I think the proposed 8 channels of RDRAM to match the bit width of DDR would probably be pared down and the clock speed increased to account for the needed speed. Eight channels of RDRAM would take more space to route than 2 channels of DDR, even though the graphics chipset pin count would be the same.

    By 3dcgi January 08, 2001, 11:23 PM

    Despite one earlier post, memory bandwidth is definitely a problem for graphics performance. Its effect will vary depending on the chip's architecture, and every chip on the market today is affected by a lack of bandwidth. For proof, look no further than benchmarks of nVidia's chips. Increasing memory clock speed boosts performance tremendously.

    Designers have attacked this problem with techniques like texture compression, z-buffer wizardry (ATI's HyperZ), and embedded DRAM, among others. Embedded DRAM has so far been tried with limited success and it will probably be a while longer before it becomes mainstream, if it ever does. We have yet to see a product from BitBoys despite their promises of embedded DRAM. The best use of embedded DRAM might be as a supplement to DDR or RDRAM, maybe as a level 2 or 3 cache. I also expect companies to start using some form of vertex or geometry compression.
    Why should textures get all the attention? Polygons take memory too.

    I don't know why RDRAM is not being more widely used, because there could be some advantages. Many people are saying that RDRAM is too expensive, and that may be so; however, RDRAM needs fewer pins for bandwidth equivalent to DDR, which means the chip's die is smaller and less expensive. This is also one reason why DDR is used instead of just slapping more SDR chips on a board. Smaller is better.

    Technical risk and expertise might be a downside to using RDRAM. Intel has shown that designing chipsets and interfaces is difficult. Just look at all the problems with 820. Graphics companies are already coming out with new designs in as little as 6 months. Totally changing the architecture of the memory controller might be too risky.

    By m538 January 09, 2001, 02:11 AM

    Originally posted by Humus:
    Also, GPUs do cache video memory. They have both a texture cache and a vertex cache.
    ----------
    I hadn't found any information saying that the GPU caches video memory, or maybe I understood you incorrectly. I know that GPUs have a small on-die cache. What I mean is that the GPU should use a part of local video memory to cache textures that are basically stored in main memory. Did you say that the GPU does this without help from the program? If so, I will consider the current situation a pretty good one.

    The last I heard about the NV20 is that it will have 200MHz DDR/256-bit memory and a 300MHz/256-bit chip, while the GeForce2 Pro has 200MHz DDR/128-bit memory and a 200MHz/256-bit chip, and the GeForce2 Ultra has 233MHz DDR/128-bit memory and a 250MHz/256-bit chip. I think nVidia's engineers are smart enough to tune their products for optimal performance. See, even though they could make a video card with 233MHz DDR/128-bit memory and a 200MHz/256-bit chip, they didn't. They don't think video-memory-to-GPU bandwidth is the bottleneck. Instead they are developing AGP 8x, even though you said it is a shame.
    Evaluation:
    ( GeForce2pro memory bandwidth ) / ( GeForce2pro chip bandwidth ) = 1
    while
    ( NV20 memory bandwidth ) / ( NV20 chip bandwidth ) = 1.333
    "Aha!", you say, "video memory is a bottleneck". But NV20 will be clock-to-clock faster.
    Once again (sorry, people), video memory is a bottleneck when all textures are loaded into it and you are comparing framerates above 100. But that is not the way a video card should have to work. I am sure nVidia adjusts the chip/memory balance for games whose textures and polygons (as 3dcgi rightly added) take up at least twice the entire video memory. People, that's the right point.
    Probably when GPUs reach 500MHz, RDRAM will be a MUST. I still think that video cards don't need high-clocked RDRAM at this time and are well served by wide DDR RAM.

    By Humus January 09, 2001, 04:11 AM

    quote:Originally posted by m538:
    I hadn't found any information saying that the GPU caches video memory, or maybe I understood you incorrectly. I know that GPUs have a small on-die cache. What I mean is that the GPU should use a part of local video memory to cache textures that are basically stored in main memory. Did you say that the GPU does this without help from the program? If so, I will consider the current situation a pretty good one.

    Games don't need to take care of memory management; that's something for the driver (at least in OpenGL and DirectX, but not in Glide). You just call a texture uploading function that is part of the API. In OpenGL it's called glTexImage2D(). All the application needs to keep track of is the ID for that texture, but the driver decides which textures will be resident in video memory. As a developer you try to fit everything into video memory, because transfers over AGP each frame will kill performance.
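
    For anyone who hasn't seen that upload path, here is a minimal OpenGL 1.x sketch (illustrative only; error handling is omitted and the pixel buffer is a hypothetical placeholder supplied by the caller):

        #include <GL/gl.h>

        /* Upload a 256x256 RGBA texture once. After this call the driver, not the
           application, decides whether the texels stay resident in video memory or
           have to come back across AGP when the texture is used. */
        GLuint upload_texture(const unsigned char *pixels)  /* 256*256*4 bytes from the caller */
        {
            GLuint id;
            glGenTextures(1, &id);
            glBindTexture(GL_TEXTURE_2D, id);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, pixels);
            return id;  /* the application only keeps this ID around */
        }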


    quote:Originally posted by m538:
    The last I heard about the NV20 is that it will have 200MHz DDR/256-bit memory and a 300MHz/256-bit chip, while the GeForce2 Pro has 200MHz DDR/128-bit memory and a 200MHz/256-bit chip, and the GeForce2 Ultra has 233MHz DDR/128-bit memory and a 250MHz/256-bit chip. I think nVidia's engineers are smart enough to tune their products for optimal performance. See, even though they could make a video card with 233MHz DDR/128-bit memory and a 200MHz/256-bit chip, they didn't. They don't think video-memory-to-GPU bandwidth is the bottleneck. Instead they are developing AGP 8x, even though you said it is a shame.
    Evaluation:
    ( GeForce2pro memory bandwidth ) / ( GeForce2pro chip bandwidth ) = 1
    while
    ( NV20 memory bandwidth ) / ( NV20 chip bandwidth ) = 1.333
    "Aha!", you say, "video memory is a bottleneck". But NV20 will be clock-to-clock faster.
    Once again (sorry, people), video memory is a bottleneck when all textures are loaded into it and you are comparing framerates above 100. But that is not the way a video card should have to work. I am sure nVidia adjusts the chip/memory balance for games whose textures and polygons (as 3dcgi rightly added) take up at least twice the entire video memory. People, that's the right point.
    Probably when GPUs reach 500MHz, RDRAM will be a MUST. I still think that video cards don't need high-clocked RDRAM at this time and are well served by wide DDR RAM.

    Whatever you think you know about the NV20, it's probably not going to be the final specs. Anyway, sure they didn't release a GTS with a lower clockspeed. Why would they reduce the clockspeed just because the memory is slower? That would just lower the performance during not-so-memory-intensive operations, such as flat shading or texture magnification.

    And AGP 8x won't be available on the NV20; the first AGP 8x products are expected to come in 2003.

    By Arcadian January 09, 2001, 10:58 AM

    quote:Originally posted by m538:
    The last I heard about the NV20 is that it will have 200MHz DDR/256-bit memory and a 300MHz/256-bit chip, while the GeForce2 Pro has 200MHz DDR/128-bit memory and a 200MHz/256-bit chip, and the GeForce2 Ultra has 233MHz DDR/128-bit memory and a 250MHz/256-bit chip. I think nVidia's engineers are smart enough to tune their products for optimal performance. See, even though they could make a video card with 233MHz DDR/128-bit memory and a 200MHz/256-bit chip, they didn't. They don't think video-memory-to-GPU bandwidth is the bottleneck. Instead they are developing AGP 8x, even though you said it is a shame.
    Evaluation:
    ( GeForce2pro memory bandwidth ) / ( GeForce2pro chip bandwidth ) = 1
    while
    ( NV20 memory bandwidth ) / ( NV20 chip bandwidth ) = 1.333
    "Aha!", you say, "video memory is a bottleneck". But NV20 will be clock-to-clock faster.
    Once again (sorry, people), video memory is a bottleneck when all textures are loaded into it and you are comparing framerates above 100. But that is not the way a video card should have to work. I am sure nVidia adjusts the chip/memory balance for games whose textures and polygons (as 3dcgi rightly added) take up at least twice the entire video memory. People, that's the right point.
    Probably when GPUs reach 500MHz, RDRAM will be a MUST. I still think that video cards don't need high-clocked RDRAM at this time and are well served by wide DDR RAM.

    Assuming your knowledge of the NV20 is correct, don't you think that a lot of clever routing has gone into increasing the bit width of the memory bus to 256 bits? You claim that the DDR channel will be 200MHz and 256 bits wide. If they used quad data rate Rambus, they could have made it 64 bits wide, running at 400MHz QDR, and made a much cheaper video card for the same bandwidth, and perhaps equal engineering work (of course this is a qualitative opinion, since I don't know how hard it is to implement quad data rate Rambus relative to routing 256 bits of DDR data).

    Or, they could have made a slightly cheaper solution using standard double data rate Rambus with 128 bits of data, and the new clock generator that someone mentioned at 533MHz. That would give 533 * 2 * 2 * 8 = 17GB/s of bandwidth, which is much more than the DDR solution of 200 * 2 * 8 * 4 = 12.8GB/s. (The numbers come from the clock speed of the memory, the number of transfers per clock (DDR or QDR), the width of each channel in bytes, and the number of channels needed for 128 bits and 256 bits, respectively.)

    Overall, RDRAM would still have made a better solution. However, the caveats include more work being spent perfecting a new technology, inherent problems in the design that could cause timing inconsistencies or other bugs, Rambus politics, and many other reasons. I'm sure nVidia has their own reasons for not choosing it, but it remains that it could have made a better, cheaper solution.

    By Galen_of_Edgewood January 09, 2001, 12:23 PM

    Rambus is killing itself with all of the lawsuits against possible partners. Suing Nvidia was probably one of Rambus' biggest mistakes.

    I do not personally see Rambus winning its lawsuits. If that's true, I would highly doubt that Nvidia will support a former enemy, because of the bad blood from the lawsuit.

    I do agree with you, Arcadian. Rambus' memory would go very well with video cards, but I do not see it happening unless Rambus wins all of its lawsuits. At that point it would be the cheapest memory, and there would be no other option for making video cards priced in that "sweet spot" without the use of RDRAM.

    Here is another aspect to this that I was curious about. What about the amount of power that is required to keep the memories running? Is RDRAM more expensive than SDRAM or DDR SDRAM in terms of power?

    Then again, I've seen talk about an AGP Pro slot. Will this supply all of the power needs of a video card for a while?

    Another wrinkle that just popped into my mind. (Sorry, my mind is just rambling on over here.) What about the heat spreader that is located on the RDRAM chips? Will this be any more of a problem/cost for the video card manufacturers to consider in the design of their video cards?

    By Marsolin January 09, 2001, 06:58 PM

    quote:Originally posted by Galen_of_Edgewood:
    Rambus is killing itself with all of the lawsuits against possible partners. Suing Nvidia was probably one of Rambus' biggest mistakes.

    Bad PR from all of these suits is a big problem for Rambus. Despite Rambus having a great technology, DDR will probably win as long as it remains close in performance, because no one likes Rambus.

    quote:Originally posted by Galen_of_Edgewood:
    Here is another aspect to this that I was curious about. What about the amount of power that is required to keep the memories running? Is RDRAM more expensive than SDRAM or DDR SDRAM in terms of power?

    Then again, I've seen talk about an AGP Pro slot. Will this supply all of the power needs of a video card for a while?

    AGP Pro allows about 110W of power. And that should be plenty of headroom for a while.

    By Un4given January 11, 2001, 05:21 PM

    One of the main problems I see with RDRAM is the memory controller side of things. Now I'm not an engineer, but if you look at what it took Intel to get a decent performing Rambus chipset, that is a lot of work.

    Right now the dual channel Rambus solutions are putting out 3.2GB/sec. Now even if you go quad channel, you get 6.4GB/sec, which is about right on par with a GF2 Pro. The more channels you add, the more complex the memory controller becomes.

    Also keep in mind that QDR SDRAM is on the horizon.

    Lastly, I do believe that memory is the biggest bottleneck. I also think that tile-based rendering or other HSR techniques would be a better idea than trying to continually increase memory bandwidth via RAM alone. By reducing or eliminating overdraw, a substantial amount of memory bandwidth is conserved.

    By Arcadian January 11, 2001, 08:42 PM

    quote:Originally posted by Un4given:
    One of the main problems I see with RDRAM is the memory controller side of things. Now I'm not an engineer, but if you look at what it took Intel to get a decent performing Rambus chipset, that is a lot of work.

    Right now the dual channel Rambus solutions are putting out 3.2GB/sec. Now even if you go quad channel, you get 6.4GB/sec, which is about right on par with a GF2 Pro. The more channels you add, the more complex the memory controller becomes.

    Also keep in mind that QDR SDRAM is on the horizon.

    Lastly, I do believe that memory is the biggest bottleneck. I also think that tile-based rendering or other HSR techniques would be a better idea than trying to continually increase memory bandwidth via RAM alone. By reducing or eliminating overdraw, a substantial amount of memory bandwidth is conserved.

    Actually, the small pin count and packet-based data transfer allow RDRAM to lend itself to multiple channels. There is no reason that I can see not to go with 8 channels for a video card. The routing of data pins will be no more complicated than for SDRAM. In fact, Rambus designed their technology around multiple channels, so adding more RDRAM channels does not complicate the memory controller nearly as much as adding more pins for SDRAM (as video cards currently do).

    By m538 January 12, 2001, 10:55 PM

    To keep the discussion going, can anyone evaluate where the memory bandwidth requirement ends? Take your GF2 Ultra, downclock the GPU as far as possible, and overclock the RAM for as long as performance keeps rising significantly. Why all that? In the first post Arcadian proposed increasing memory bandwidth about three times. I only want to know by how many times memory bandwidth must be increased to eliminate the bottlenecks in a video card.

    If nVidia does take care to build a good memory controller, mainstream cards will come with 128-256MB of RDRAM to justify the expensive memory controller. And if the GPU is twice as fast, the video card will cost more than an entire value computer. How many people will pay $1000-2000 for the buggiest part of a PC?

    By Plantruler January 13, 2001, 02:37 PM

    quote:Originally posted by Arcadian:
    I wanted to start a technical discussion today, and see what people would think about putting RDRAM on a video card instead of SDRAM.

    I think it's an interesting thought, but:
    I used to have a video card with Rambus on it (it was a 4 megabyte, 64-bit Chaintech Desperado 3D with the Cirrus Logic 5465 Laguna 3D chipset).
    This thing was actually slower than my older Creative Graphics Blaster MA202 with the 5446 and 2 megs of 50ns EDO memory on it.
    For instance, I was getting 60-65 frames per second in Duke Nukem 3D (in the Build editor at 640x480 resolution) with the older 5446, versus 40-45 fps with the Rambus-equipped 5465.
    Nearly all of my non-3D games for DOS and for Windows, including Blood, Shadow Warrior, Quake, Winquake, Doom, Doom2, Doom95, and non-accelerated Quake2, were running only two-thirds as fast on the Rambus-equipped Laguna 3D 5465 versus the venerable EDO-equipped 5446.
    Clearly a lot of work still needs doing to make Rambus really worth the fuss and expense.
    As for DDR, I'm looking forward to the final product of my dream: a 300-350MHz .13 micron 256-bit GPU with not a gimpy 128-bit memory data path, but a competent 256-bit datapath and 350-400MHz DDR memory.
    And have this animal (oops, I mean Beast) scan-line interleavable with the optional addition of a daughtercard, for a price comparable to what dual Voodoo2's cost when they first stormed onto the market 3 years ago.

    By Bateluer January 13, 2001, 03:29 PM

    I didn't read the entire thread, but I am sure someone brought this up. RDRAM has already been used on video cards. It was a total failure; the card was so slow, Quake benches were timed with a calendar. Creative is still selling a 3D Blaster on their site, using RDRAM. http://www.creativelabs.com/graphics/gb-3d/
    The description clearly states it's using RDRAM. Would you trade a GF2 for that?

    BTW, DDR can go much higher than 220MHz; projections place it at over 400MHz by 2002.

    This has all probably been said, but I feel better saying it.

    By Humus January 13, 2001, 06:50 PM

    One thing to add to this topic. While most rendering will be able to use most of RDRAM's bandwidth, some features will take a huge performance hit. One is EMBM. That's because the environment map is addressed by the color values from earlier passes. That means it'll need color values from the environment map in a more or less random way at the per-pixel level. It'll be very slow. The same goes for the yet-to-be-implemented hardware feature called dependent texture reads (of which EMBM is a special case).

    By Moridin January 13, 2001, 08:40 PM

    quote:Originally posted by Un4given:
    One of the main problems I see with RDRAM is the memory controller side of things. Now I'm not an engineer, but if you look at what it took Intel to get a decent performing Rambus chipset, that is a lot of work.

    Right now the dual channel Rambus solutions are putting out 3.2GB/sec. Now even if you go quad channel, you get 6.4GB/sec, which is about right on par with a GF2 Pro. The more channels you add, the more complex the memory controller becomes.

    Also keep in mind that QDR SDRAM is on the horizon.

    Lastly, I do believe that memory is the biggest bottleneck. I also think that tile-based rendering or other HSR techniques would be a better idea than trying to continually increase memory bandwidth via RAM alone. By reducing or eliminating overdraw, a substantial amount of memory bandwidth is conserved.


    I've heard rumors that the Rambus reference design for the memory controller is not very good, and that Compaq is replacing it in the 8-channel RDRAM bus the Alpha EV7 will use.

    I think there may be a nice pin count comparison in here someplace. The EV7 will have 1700 pins (total for the chip) and 12.6 GB/s of bandwidth. IBM's Power4 will have less bandwidth despite having over 5500 total pins. Can anybody confirm these numbers? Also I should note that the Power4 has multiple cores on a chip.

    By supercaffeinated January 14, 2001, 12:07 AM

    Do you guys actually think that you were the first ones to think of this?

    RDRAM was used in the original Creative Graphics Blaster 3d card about 3 years ago.

    Guess what? It SUCKED.

    Guess what else? RDRAM has terribly high latency, making it totally unsuitable for graphics.

    DDR has low latency and high bandwidth, making it perfect for graphics.

    You might check around a bit next time before you waste your time on a discussion like this in the future.

    By Moridin January 14, 2001, 01:23 AM

    quote:Originally posted by supercaffeinated:
    Do you guys actually think that you were the first ones to think of this?

    RDRAM was used in the original Creative Graphics Blaster 3d card about 3 years ago.

    Guess what? It SUCKED.

    Guess what else? RDRAM has terribly high latency, making it totally unsuitable for graphics.

    DDR has low latency and high bandwidth, making it perfect for graphics.

    You might check around a bit next time before you waste your time on a discussion like this in the future.

    There is likely more to it than that. A video card can perform poorly for many reasons. If bandwidth was not a bottleneck to begin with, then RDRAM is not a factor at all.

    Another thing to look at is the type of RDRAM used. PC600, PC700, and PC800 are all relatively new. Before that, most RDRAM clocked at 400MHz. If they used a single channel, that is only 800 MB/s of bandwidth. They may have used Rambus to reduce pin count and therefore design cost, not to get performance.

    What makes you think latency is particularly important for graphics chips? Graphics chips tend to transfer very large blocks of data at once. This is something that RDRAM is particularly good at.

    What makes you think RDRAM has particularly high latency? It uses the same basic cells as SDRAM and DDR, so at that level the latency is the same or better. It does have some additional overhead due to decoding packets, but that is only 2 cycles. RDRAM can also have lower latency than DDR in many cases due to the large number of open banks supported by Rambus.

    You might want to think a bit before posting something like this. There are a lot of very knowledgeable people in this forum and there are good reasons for what they are saying.


    By Chas January 14, 2001, 05:52 AM

    Sorry, but in competitive online play, latency is EVERYTHING.

    Personally, I feel that a two-level RAM solution might be more optimal than developing something like a RAMBUS-based card.

    Have a small amount of high-speed RAM on the die itself. At that point, you can determine how wide a data path to the processing units you want to supply. Everyone knows where this idea's from, so I don't need to mention it. But the basic reasoning behind it is quite solid.

    Such on-die memory could head off any major bandwidth problems. All that's required then is that you back it with a decent (and decently clocked) amount of DDR (or the upcoming QDR) RAM.

    Yes, adding said on-die RAM will bump up the price per chip, but consider how little the price of the chip affects the end price of the graphics card (anywhere from 10 to 15 percent for most cards).

    But look at it this way, it's a cost trade-off. The chip will get more expensive (figure conservatively on a price increase of 50-150%), yet you shouldn't have to scale the second-level RAM subsystem as quickly as you're doing with today's cards. So there's more of a chance for higher quantities to be on the market, and thus drive down prices there.

    By Humus January 14, 2001, 08:26 AM

    quote:Originally posted by Chas:
    Sorry, but in competitive online play, latency is EVERYTHING.

    Personally, I feel that a two-level RAM solution might be more optimal than developing something like a RAMBUS-based card.

    Have a small amount of high-speed RAM on the die itself. At that point, you can determine how wide a data path to the processing units you want to supply. Everyone knows where this idea's from, so I don't need to mention it. But the basic reasoning behind it is quite solid.

    Such on-die memory could head off any major bandwidth problems. All that's required then is that you back it with a decent (and decently clocked) amount of DDR (or the upcoming QDR) RAM.

    Yes, adding said on-die RAM will bump up the price per chip, but consider how little the price of the chip affects the end price of the graphics card (anywhere from 10 to 15 percent for most cards).

    But look at it this way, it's a cost trade-off. The chip will get more expensive (figure conservatively on a price increase of 50-150%), yet you shouldn't have to scale the second-level RAM subsystem as quickly as you're doing with today's cards. So there's more of a chance for higher quantities to be on the market, and thus drive down prices there.

    That latency .. uhm, that's just another word for performance ...

    Anyway, another way to tackle the memory bandwidth issue might be doing it a la Voodoo2. That is, one 4MB memory chip for the frame buffer, one for texture unit 0, and one for texture unit 1, etc. For next-generation cards it would perhaps be 32MB for the framebuffer (1600x1200x32 triple buffered + a 32-bit z-buffer = 29MB), 32MB for texture unit 0, 32MB for texture unit 1 and 32MB for texture unit 2, for a total of 128MB. That would significantly improve multitexturing performance, and would make higher quality lightmapping a possibility.
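
    For what it's worth, a small sketch of where that 29MB estimate comes from (just the arithmetic in the paragraph above, counting a megabyte as 2^20 bytes):

        #include <stdio.h>

        /* 1600x1200 at 32-bit color, triple buffered, plus a 32-bit z-buffer. */
        int main(void)
        {
            const double pixels = 1600.0 * 1200.0;
            const double color_buffers = pixels * 4.0 * 3.0;  /* 4 bytes/pixel, 3 buffers */
            const double z_buffer = pixels * 4.0;             /* 4 bytes/pixel depth */
            printf("%.1f MB\n", (color_buffers + z_buffer) / (1024.0 * 1024.0));  /* ~29.3 */
            return 0;
        }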

    By m538 January 14, 2001, 08:44 AM

    Originally posted by Chas:
    Sorry, but in competitive online play, latency is EVERYTHING.
    ------------
    If I understood correctly, you mean that the latency of RDRAM would produce frame latency in an Internet game. In that sense RDRAM latency can affect framerate, and a low framerate can produce frame latency. But RDRAM latency is TOO small to produce frame latency in an Internet game.

    Also, the latency of RDRAM in TIME UNITS (ns) is still smaller than SDRAM latency. The working frequency of RDRAM is three times higher than that of SDRAM (or DDR): 400 / 133. I know that video memory runs at 200MHz, but I agreed with Arcadian that directly connected RDRAM can reach 600MHz, and 600 / 200 also equals 3. RDRAM latency in CYCLES is NOT three times higher than SDRAM (or DDR) latency, so you catch the idea.
    It is very funny: I am something of an opponent of Arcadian's suggestion because the high clock (low latency) of RDRAM would go unused. Instead I suggest wide DDR RAM, which (as applied to a video card) has nearly unlimited room for increasing bandwidth by increasing bit width rather than by decreasing latency. But several people prefer DDR RAM for its low latency! No comment!

    By Moridin January 14, 2001, 11:04 AM

    quote:Originally posted by Chas:
    Sorry, but in competitive online play, latency is EVERYTHING.

    Personally, I feel that a two-level RAM solution might be more optimal than developing something like a RAMBUS-based card.

    Latency is a concept that occurs in many places. In online gaming you are usually concerned about the latency of your Internet connection (measured in tens of milliseconds). When you are talking about DRAM you are concerned with the latency of the memory itself (measured in tens of nanoseconds, or about 1,000,000 times smaller than Internet latency).

    Latency is the amount of time it takes for information to return once you request it. In the case of Internet gaming this determines how much the action on your machine is “out of sync” with the other machines.

    MPUs do have memory on chip; it's called cache. Caches don't do much for bandwidth, but they greatly improve latency. Does anyone know what is typical for the cache on a GPU (graphics chip)? Do they even have caches?


    By Moridin January 14, 2001, 11:21 AM

    quote:Originally posted by m538:
    Originally posted by Chas:
    Sorry, but in competitive online play, latency is EVERYTHING.
    ------------
    If I understood correctly, you are saying that RDRAM latency produces frame latency in an Internet game. At most, RDRAM latency could affect framerate, and a low framerate could produce frame lag; but RDRAM latency is far TOO small to produce frame lag in an Internet game.

    Also, the latency of RDRAM in TIME UNITS (ns) is still smaller than SDRAM latency. The working frequency of RDRAM is three times higher than that of SDRAM (or DDR): 400 / 133. I know that video memory runs at 200MHz, but I agreed with Arcadian that directly connected RDRAM can reach 600MHz, and 600 / 200 also equals 3. RDRAM latency in CYCLES is NOT three times higher than SDRAM (or DDR) latency, so you catch the idea.
    It is very funny: I am something of an opponent of Arcadian's suggestion because the high clock (and thus low latency) of RDRAM would go unused. Instead I suggest wide DDR RAM, which on a video card has nearly unlimited room for increasing bandwidth by widening the bus rather than by decreasing latency. But several people prefer DDR RAM for its low latency! No comments!

    In absolute terms (measured in ns) the RDRAM latency of a single transfer is usually 20-30% higher than SDRAM. This is far from the entire story though. RDRAM is a different technology, with different strengths and weaknesses than SDRAM or DDR. RDRAM has a number of features that help reduce latency over that of SDRAM.

    As a general rule RDRAM is better for less frequent, large transfers while SDRAM is better at small random transfers. To take full advantage of RDRAM you want data access that fits this pattern. My understanding is that this is true of graphics cards.

    It's off topic a bit, but streaming apps like the ones the P4 excels at also fit this pattern. The P4 also uses larger cache lines to make data access fit this pattern better, and has a built-in prefetch to hide latency when it does not.
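
    A rough way to see why the access pattern matters: model the effective bandwidth of a transfer as its size divided by (fixed latency + size / peak bandwidth). The figures below are illustrative placeholders, not measurements of any real part:

        def effective_bw(transfer_bytes, latency_ns, peak_gbps):
            # peak_gbps is peak bandwidth in GB/s; 1 GB/s is roughly 1 byte/ns
            transfer_ns = transfer_bytes / peak_gbps
            return transfer_bytes / (latency_ns + transfer_ns)

        # hypothetical high-latency/high-peak part vs. low-latency/low-peak part
        for size in (32, 256, 4096):
            fast_peak = effective_bw(size, latency_ns=60, peak_gbps=1.6)
            low_lat = effective_bw(size, latency_ns=45, peak_gbps=1.0)
            print(f"{size:5d} bytes: {fast_peak:.2f} GB/s vs {low_lat:.2f} GB/s")
        # the high-peak part only pulls ahead once transfers get reasonably large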


    By awa64 January 14, 2001, 12:31 PM

    I've read through pretty much this whole thing, and it's a really interesting discussion. Every point made was valid, and if it were done properly, an RDRAM video card could work pretty well. There are many factors that make it impractical, though. As stated before, nVidia has no intention of making an RDRAM card, nor does any other major video card producer. Second, to get a really major improvement, the cards would end up costing way more than most people would want to pay for a graphics card. Third, and most importantly, unless we're running benchmarks or playing really hardware-intensive games, a 32MB TNT2, or a GeForce or GeForce2-based chip, will suffice for any needs. I'm writing this on a 266MHz PII with a 6GB HD and a 4MB Matrox Millenium GII graphics card, and it does everything I want it to do with ease. My main gaming computer is an 800MHz Athlon T-Bird with a 45GB HD and a 32MB TNT2 graphics card, and unless I'm playing first-person shooters, I don't see any difference between the two when I'm using them. In closing, an RDRAM video card would be pretty good, but would be too expensive and excessive for anybody but the biggest 'gotta-have-it' techie, or someone who runs amazingly graphics-intensive programs.

    By Conrad Song January 14, 2001, 02:29 PM

    quote:Originally posted by Moridin:
    Latency is a concept that occurs in many places. In online gaming you are usually concerned about the latency of your Internet connection (measured in tens of milliseconds). When you are talking about DRAM you are concerned with the latency of the memory itself (measured in tens of nanoseconds, or about 1,000,000 times smaller than Internet latency).

    Latency is the amount of time it takes for information to return once you request it. In the case of Internet gaming this determines how much the action on your machine is “out of sync” with the other machines.

    MPUs do have memory on chip; it's called cache. Caches don't do much for bandwidth, but they greatly improve latency. Does anyone know what is typical for the cache on a GPU (graphics chip)? Do they even have caches?


    Caches improve both mean bandwidth and mean latency.

    By Humus January 14, 2001, 02:30 PM

    quote:Originally posted by Moridin:
    MPUs do have memory on chip; it's called cache. Caches don't do much for bandwidth, but they greatly improve latency. Does anyone know what is typical for the cache on a GPU (graphics chip)? Do they even have caches?

    I tested the cache size on my old G400 once by increasing the texture size until I saw a significant decrease in performance. Up to 256x256x32 the performance wasn't affected much, but when going higher than that the performance dropped a lot. That would imply a cache size of 256KB, I guess. When I tested the same thing on my Radeon I didn't see a slowdown even after reaching 2048x2048x32 ... I doubt it has the 16MB cache needed to fit that texture, so I guess they have some optimized memory access pattern ...
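
    For reference, the texture footprints behind that guess (a minimal Python sketch; 32-bit texels assumed, mipmaps ignored):

        def texture_kb(size, bytes_per_texel=4):
            # square texture, size x size texels
            return size * size * bytes_per_texel / 1024

        for size in (128, 256, 512, 2048):
            print(f"{size}x{size}x32: {texture_kb(size):.0f} KB")
        # 256x256x32 is exactly 256 KB, hence the cache-size guess;
        # 2048x2048x32 is 16384 KB (16 MB), far too big for any on-chip cache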

    By Moridin January 14, 2001, 06:37 PM

    quote:Originally posted by Conrad Song:
    Caches improve both mean bandwidth and mean latency.

    It depends on the data access pattern. If you are accessing the same memory location over and over again you can reduce the total bandwidth demand, but the primary job of a cache is to decrease effective latency. To put it another way, a Celeron with 128K of cache doesn't need more memory bandwidth than a PIII with 256K of cache. In fact, the performance reduction due to higher latency would likely reduce the bandwidth used by the Celeron.

    By Moridin January 14, 2001, 06:41 PM

    quote:Originally posted by Humus:
    I tested the cache size on my old G400 once by increasing the texture size until I saw a significant decrease in performance. Up to 256x256x32 the performance wasn't affected much, but when going higher than that the performance dropped a lot. That would imply a cache size of 256KB, I guess. When I tested the same thing on my Radeon I didn't see a slowdown even after reaching 2048x2048x32 ... I doubt it has the 16MB cache needed to fit that texture, so I guess they have some optimized memory access pattern ...


    Could the drop off be due to hitting a bandwidth limit and not a cache size? I would not have thought that a GPU would have a cache this large.

    By Plantruler January 14, 2001, 08:34 PM

    quote:Originally posted by Bateluer:
    I didn't read the entire thread, but I am sure someone brought this up. RDRAM has already been used on video cards. It was a total failure; the card was so slow, Quake benches were timed with a calendar. Creative is still selling a 3D Blaster on their site, using RDRAM. http://www.creativelabs.com/graphics/gb-3d/
    The description clearly states it's using RDRAM. Would you trade a GF2 for that?

    BTW, DDR can go much higher than 220MHz; projections place it at over 400MHz by 2002.

    This has all probably been said, but I feel better saying it.

    Yup, the Cirrus Logic 5465 Laguna 3d struck here too.
    Thanks for bringing up the Creative Graphics Blaster Exxtreme Rambus fiasco; I had (hehe) forgotten about that one.

    By m538 January 15, 2001, 03:16 AM

    The GPU cache is mostly a buffer that connects the GPU and memory, which run at different frequencies and bit widths. It also keeps the GPU from sitting idle during vertical synchronisation requests (on non-SGRAM cards).

    By m538 January 15, 2001, 03:27 AM

    Originally posted by Moridin:
    In absolute terms (measured in ns) the RDRAM latency of a single transfer is usually 20-30% higher than SDRAM. This is far from the entire story though. RDRAM is a different technology, with different strengths and weaknesses than SDRAM or DDR. RDRAM has a number of features that help reduce latency over that of SDRAM.

    As a general rule RDRAM is better for less frequent, large transfers while SDRAM is better at small random transfers. To take full advantage of RDRAM you want data access that fits this pattern. My understanding is that this is true of graphics cards.

    It's off topic a bit, but streaming apps like the ones the P4 excels at also fit this pattern. The P4 also uses larger cache lines to make data access fit this pattern better, and has a built-in prefetch to hide latency when it does not.


    First, I can't tell whether that post is a refutation or a confirmation. The first part is a confirmation, since you yourself said "RDRAM has a number of features that help reduce latency over that of SDRAM". The second part is a refutation, a complete inversion of my suggestions. Earlier you also commented on my first post in this thread and said that you mostly agreed, but you also said "graphic memory is actually a place where RDRAM would fit great", while I had written "the GPU itself isn't clocked high enough, it's wide, and if it needs more bandwidth it will use cheaper wide memory instead of higher-clocked memory. Even DDR technology effectively doubles the bit width, not the frequency". If you thought DDR was the technology used in RDRAM, then sorry for my unclear language; otherwise I think it is clear enough that I prefer DDR RAM for video cards.
    Second, you said "in absolute terms (measured in ns) the RDRAM latency of a single transfer is usually 20-30% higher than SDRAM". That is a more interesting statement, but I'm afraid it is not fully true. I guess this number comes from comparing a P3 system, where the processor and memory buses are both clocked at 133MHz, with a P4 system. I suppose the latency you're talking about is the time in ns between the processor requesting data and getting it, which I'll take as given. But the low latency of a single transfer on a P3 system comes from its simpler memory controller, not from the memory latency itself. SDRAM DIMM latency is still higher than RDRAM RIMM latency.
    The complex memory controller of the P4 does delay a single transfer. But that hardly matters for overall performance, because memory accesses aren't performed at the moment the processor logically needs them; these days what a processor is doing at any given instant is hard to pin down. Memory accesses are performed to synchronise memory with the cache or to serve DMA requests. If we measure the latency of SEVERAL transfers to different places in memory, the P4 will beat the P3. To judge between linear data access and erratic data access we must consider the time latency of the memory itself, because we are talking about many transfers, not one. In light of this I choose RDRAM for main memory and DDR or QDR for video memory. The "stupid" Intel engineers also chose RDRAM and gained triple the memory performance over the Athlon, and the "smart" AMD engineers chose DDR and gained... nothing. Also, do you know why the KT133A does so well even without DDR? Because they "synchronised" the FSBs and eliminated the latency produced by the 100*2-to-133 buffer. And that buffer wasn't needed in the first place. It was just a trick to market the Athlon with an initial 100MHz FSB and then move it easily to a 133MHz platform. They hoped that modern programs access contiguous data, and that in any case other devices would need memory bandwidth and so reduce the available "memory frequency" to 100MHz. The next time they pull such a trick I will personally write them a letter with only three words: "Access Memory Directly".
    Third, I'm a bit tired of the "P4 is optimised for streams" argument, but I'm also very tired of writing. Arcadian, how do you do this? I can only add that, knowing AMD's incredible ability to implement various caches, it would not be difficult for them to add a third cache like the P4's level 1 and beat Intel at both streaming and erratic tasks. BOTH ERRATIC AND STREAMING ALGORITHMS WILL EXIST FOREVER, AND BOTH ARE ESSENTIAL FOR PERFORMANCE.

    By supercaffeinated January 15, 2001, 01:28 PM

    So many people in the semiconductor world have made the same mistake.

    Attempting patch fixes rather than focusing on the real problems. Caching is a kludge.

    There are 3 basic things that matter in memory. Capacity, bandwidth and latency.

    All this effort put into staged caches and cache algorithms should be put into simply making faster memory.

    What? SRAM is too expensive? Put the effort into making fast RAM cheaper rather than wasting time writing complex cache algorithms.

    Caches will always have a "worst case scenario" where they perform horribly. Sure, that will only happen 10% of the time, depending on what you're doing, but according to Murphy's law, that 10% will be when you need it most, like when your car's CPU is trying to calculate the rate for your ABS brakes, or when you make an unpredictable move into a previously unvisited area on a map in Quake 3.

    One of these days (hopefully soon), someone will develop memories that are as fast, dense and cheap as possible, and there will be no point in caching because every access will take as long as any other. The sooner we get to that day the better.

    RDRAM has inherent flaws that lead to high latency. The more RDRAM you add to a memory subsystem, the higher the average latency gets because RDRAM travels in packets and these packets must traverse all modules before reaching the CPU. Of course, a video card could be designed in such a way that a particular model might work acceptably. It doesn't, however, scale particularly well if you need constant low latency.

    By Humus January 15, 2001, 03:34 PM

    quote:Originally posted by Moridin:

    Could the drop off be due to hitting a bandwidth limit and not a cache size? I would not have thought that a GPU would have a cache this large.

    It's probably a combination of the two. The whole texture doesn't need to fit into the cache; the texels the GPU requests just have to come from either the cache or memory. So the G400 could perhaps have a 128KB cache.
    GPUs don't need caches as fast as CPUs do, so they can probably be made a little denser on GPUs. I think the typical cache size on recent GPUs is about 256KB.

    By Humus January 15, 2001, 03:40 PM

    quote:Originally posted by supercaffeinated:
    So many people in the semiconductor world have made the same mistake.

    Attempting patch fixes rather than focusing on the real problems. Caching is a kludge.

    There are 3 basic things that matter in memory. Capacity, bandwidth and latency.

    All this effort put into staged caches and cache algorithms should be put into simply making faster memory.

    What? SRAM is too expensive? Put the effort into making fast RAM cheaper rather than wasting time writing complex cache algorithms.

    Caches will always have a "worst case scenario" where they perform horribly. Sure, that will only happen 10% of the time, depending on what you're doing, but according to Murphy's law, that 10% will be when you need it most, like when your car's CPU is trying to calculate the rate for your ABS brakes, or when you make an unpredictable move into a previously unvisited area on a map in Quake 3.

    One of these days (hopefully soon), someone will develop memories that are as fast, dense and cheap as possible, and there will be no point in caching because every access will take as long as any other. The sooner we get to that day the better.

    RDRAM has inherent flaws that lead to high latency. The more RDRAM you add to a memory subsystem, the higher the average latency gets because RDRAM travels in packets and these packets must traverse all modules before reaching the CPU. Of course, a video card could be designed in such a way that a particular model might work acceptably. It doesn't, however, scale particularly well if you need constant low latency.

    I guess you're rather alone in that opinion. Sure, it might be possible to make a memory almost as fast as a cache to use as system memory or GPU memory, but at what cost? It would probably cost as much as having a 64MB cache and would produce HUGE amounts of heat.
    Caching is definitely here to stay, and is probably the best solution to memory access problems.

    By Marsolin January 15, 2001, 05:37 PM

    One thing to consider when comparing latency between DDR and RDRAM is the amount of bandwidth being used. Latency does not remain constant throughout the usage curve. When the bandwidth usage increases RDRAM begins to see read latency advantages.

    DDR has turnaround cycle penalties for Write-to-Read (read after write), Read-to-Read, and Read-to-Write (write after read) situations. Write-to-Read delays are smallest when accessing different DRAM banks, while Read-to-Read delays are smaller when accessing the same physical bank. These penalties do not apply to data transfers within the specified burst length, only between burst transfers. A typical burst length is 4 for system memory, but I'm not sure whether that would change for graphics memory.

    RDRAM has only read-after-write penalties. Please note, though, that there are other latencies involved with each memory type. Their protocols are different, and memory controller differences could also cause additional delays.
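
    As a rough illustration of how such turnaround penalties eat into effective bandwidth, here is a toy Python model; the burst length, penalty cycles and penalty rate are assumptions made up for the sketch, not figures from any datasheet:

        def bus_efficiency(burst_len, turnaround_cycles, turnaround_rate):
            # burst_len: data cycles per burst
            # turnaround_rate: fraction of bursts that pay a turnaround penalty
            total_cycles = burst_len + turnaround_rate * turnaround_cycles
            return burst_len / total_cycles

        # e.g. bursts of 4, with a 3-cycle penalty paid on 30% of bursts
        print(f"bus efficiency: {bus_efficiency(4, 3, 0.3):.0%}")   # about 82%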

    By Moridin January 16, 2001, 11:04 AM

    quote:Originally posted by m538:


    First, I can't tell whether that post is a refutation or a confirmation. The first part is a confirmation, since you yourself said "RDRAM has a number of features that help reduce latency over that of SDRAM". The second part is a refutation ...

    Maybe it would be clearer if I said it this way. RDRAM starts off at a 20-30% latency disadvantage compared to SDRAM. This is primarily related to time of flight and packet decode.

    After this, however, most of the advantages go to RDRAM. It generally offers more bandwidth and supports more open pages. The one big disadvantage is that it doesn't support critical-word-first bursting. I don't think that is a big issue with the P4, due to the FSB speed. (The P4 FSB can send two entire RDRAM packets in one cycle, so the first word arrives on either the first or second tick of the 100MHz FSB clock.)
    Read Marsolin's post for some insight on other things that affect latency.

    By email_atif January 24, 2001, 05:05 PM

    Hey SharkyForum Readers,

    Once again, Arcadian, you've got an excellent, informative post up here. I remember reading a while ago that a video card (a home video card, not some Quantum-based flight simulator system video card) featured RDRAM. It was a Cirrus Logic card if I'm not mistaken. Anyhow, it didn't perform too well, most likely because it couldn't take advantage of the added bandwidth of the RIMM. Now I realize that their card was vaporware; however, BitBoys Oy always spoke of implementing, I believe, DRAM internally and SDRAM externally (the XBA architecture). According to them, this RAM was able to achieve a bandwidth of 12.5 GB/s. Here's how they broke it down:

    Memory bus: extra-wide 512-bit bus.
    Every clock cycle it can move 768 bits (utilizing eDRAM + AGP + SDRAM).

    I think this technology, if applied by an actual hardware developer, would be simply stunning; we'll have to wait and see.

    Late

    By DaveB3D January 25, 2001, 06:13 AM

    The problem with XBA, at least in the form found in Glaze3D, was that it completely lacked cost effectiveness. They were taking the most brute-force approach possible, and that is frankly stupid (I don't have the slightest clue why anyone would take such an approach, though I'd love to sit down with the engineers and talk to them about it and about better solutions).

    XBA is basically nothing more than embedded memory. Embedded memory massively increases your transistor count. If I had to estimate the size of the chip, I would guess Glaze3D would have been around 120 million transistors (and that is without DX8 compliance). That is just a guess though; I could do the math and calculate it better, but I don't feel like it right now.

    So obviously that chip is huge, so it takes up a lot of wafer space and yields poorly because of its massive complexity. That is why we haven't seen it yet: they can't do it at the fab.

    Now if they wanted to do things intelligently, they would take much better approaches: some occlusion culling, Z compression, deferred rendering or something. As it is they are rendering everything and just using a massive amount of bandwidth. It just is not a smart move at all.

    If anyone would care to discuss this further, let me know.. If not I'll end it here.

    By email_atif January 25, 2001, 04:31 PM

    Hey SharkyForum Readers,

    Dave, your point is well taken. I agree that the embedded memory they are talking about would result in an outrageous price. Still, I was led to believe that, because the RAM doesn't actually have to be very large to attain excellent real-world performance, this memory architecture has been implemented in one of the new video game systems despite the cost (I just can't remember which one). Either way, you're right: as much as people seem to love dishing out $500 for an nVidia GeForce2 Ultra, a card with this memory architecture WOULD cost a lot more. It was worth a try though :-)

    Late

    By Humus January 25, 2001, 08:23 PM

    Embedded RAM could be a solution further into the future, when we can pack a few more transistors onto the chip. One could perhaps use embedded RAM just for the framebuffer. I think the graphics chip that's gonna be in the GameCube is going to do it this way.

    By DaveB3D January 26, 2001, 08:12 AM

    Are you nuts? Do you realize that if you want to use embedded memory as a "solution" for the relatively near future (next year), you are going to need something like a 64MB frame-buffer... so you want to embed 64MB of memory onto a chip? That is just crazy.

    Instead, you do other stuff and use a small amount of frame-buffer space for some tricks.

    For example, I might come up with a deferred rendering architecture. That covers all the needs I have and basically eliminates the need for embedded memory. However, I could still use embedded memory for some things if I needed to. For example, I could use it to pull some interesting tricks to remove all non-visible triangles before deferring to remove at the pixel level. Or I might use it as a bin cache, having several bins always waiting for my tile buffer. Interesting stuff like that.


    If I were going the traditional route, I might use it to store my Z-buffer and use a hierarchical Z with compression, or something along those lines.

    There are many tricks you can use, but using embedded memory as your overall solution to the problem is just a bad, bad idea.


    By Humus January 26, 2001, 09:24 AM

    When I said "further into the future" I thought it was obvious that I didn't mean the very nearest future, but rather something like 3 or more years from now. And I didn't promote it as THE solution. (And for today's cards you wouldn't need 64MB of framebuffer memory either. 1600x1200x32 triple buffered with a 32-bit Z-buffer is about 30MB.)

    Of course smarter GPU chips are preferred, but sometime in the future it may be very difficult to further reduce memory bandwidth needs by using compression technologies or other tricks. At that point an embedded RAM solution could be a possibility (but there will probably be other possibilities...). At that point I of course also expect that we'll have something like a 0.10-micron or smaller process available, so we can pack a little more into the chip.

    By DaveB3D January 26, 2001, 09:54 AM

    Ok, I gotcha.

    As for frame-buffer size, I said next year. If anyone is moving like 3dfx was they'll need it...

    By email_atif January 26, 2001, 12:48 PM

    Hey SharkyForum Readers,

    Yeah, that's why this architecture featured a mixed RAM solution. The Embedded DRAM was limited to 8MB, with the external SDRAM taking the rest of the load. This could prove to be more effective in the future (Perhaps in 2002.) Perhaps even more so if DDR RAM were used to replace the SDRAM. What do you guys think?

    Late

    By DaveB3D January 26, 2001, 01:30 PM

    The problem is that limiting the amount of memory greatly affects efficiency. A big advantage of having some is that you can do your frame-buffer reads/writes in the embedded memory, but you need a very efficient architecture to write from the embedded memory to the local memory on the fly (so you don't overflow).

    By idris5 January 28, 2001, 08:03 PM

    A 300-million-transistor design (32MBytes of memory + GPU core and control) is a while off - probably three years at least (take a look at the SIA roadmap). It isn't so much a complexity problem, since a 3.2-million-gate Xilinx FPGA is around 300 million transistors already. The problem could well be one of clock skew across the device - I'll get to this in a second.

    I had the opportunity to talk to some of the designers of the V4400e GPU that Micron produced. It had 12MB of eDRAM on it (the reason they designed the chip was basically to demo their eDRAM tech), but one of the problems was shifting data around. It was mentioned that if the chip had been any larger, buffers would have been required on the internal buses to combat clock skew. The device was manufactured on a 0.25u process (which, considering it was 125 million transistors, meant it certainly used a lot of power); however, interconnect delays are not significant compared to gate delays at this feature size, so dropping the feature size would not have helped the clock skew.

    Perhaps the biggest problem with a chip of this size would be yield. Xilinx charge an incredible amount for FPGAs - I paid $60 for a 50-million-gate speed grade 5 device. This isn't anywhere near the top end in terms of power consumption or speed, where you'll be looking at $150 or so. Xilinx can afford to charge this much as they've got a captive corporate audience, and also they may not get an excellent yield, but they aren't shifting loads either. However, Nvidia, ATI or whoever must be sure of getting excellent yields, and they must make cheap GPUs too (I think that a GTS is about $50 - give or take, because I used them last summer and they'll have changed a bit + my memory is crap!).

    Whatever, I doubt we'll see large amounts (16MBytes+) of eDRAM or 1T-SRAM on any cards for a while, but we may see 8MBytes or so quite soon (it's easily technically possible and won't be too bad for yields).

    How about an eDRAM z-buffer? Dave?

    By ace_d_1 January 28, 2001, 09:00 PM

    Rambus RAM would be good in a video card, however it is not the route that I would take. After doing some market research on RAM, I found an interesting article at tomshardware relating to RAM. Anyway, he mentioned several varieties of RAM, including enhanced SDRAM. This is nothing more than RAM with RAM (a small cache) on it, but you can get it to run at 200+ MHz with no other enhancements. So I would mix this with DDR technology, VCM tech, or even some odd mix. This should be able to reach speeds of 600MHz+ on the basis of DDR reaching 400+ and enhanced RAM reaching 200+. On top of all of that, the VCM, or virtual channel, pathways should increase bandwidth by about 50%, making this video RAM reach about 9 times the bandwidth of PC100 RAM.
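
    Taking the post's numbers at face value (a 600MHz effective rate on a 64-bit bus, plus roughly 50% from the virtual-channel pathways), the "9 times PC100" figure works out as below. This is just arithmetic on the claimed numbers, not a real product spec:

        bus_bytes = 8            # 64-bit data path
        effective_mhz = 600      # the claimed enhanced-SDRAM + DDR rate
        vcm_boost = 1.5          # the claimed ~50% gain from virtual channels

        video_bw = effective_mhz * 1e6 * bus_bytes * vcm_boost / 1e9   # GB/s
        pc100_bw = 100 * 1e6 * bus_bytes / 1e9                         # GB/s
        print(f"{video_bw:.1f} GB/s vs PC100 at {pc100_bw:.1f} GB/s = "
              f"{video_bw / pc100_bw:.0f}x")
        # prints: 7.2 GB/s vs PC100 at 0.8 GB/s = 9x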

    By DaveB3D January 29, 2001, 01:04 AM

    Interesting information.

    Personally, I greatly favor deferred rendering, so Z-bandwidth isn't an issue. On a traditional architecture, that can help. Even there, though, Z compression can greatly reduce the bandwidth needed (say a 75% reduction), so it becomes less of a factor.

    By James January 29, 2001, 01:43 AM

    Sorry to interrupt, but if I can I would like to pose a few "low brow" questions.

    1. If what I am getting from this discussion is correct, RDRAM is fairly slow to get started (20-30% higher latency) but once it is going it has a much higher throughput. If this is correct it would be well suited to media or similar applications where initial latency is not important, but throughput is.

    My first question is this: Are video card memory demands in line with this architecture?

    2. Obviously speed is almost as important as throughput. Arcadian and others have addressed the fact that the RDRAM architecture is apparently very scalable when it comes to memory frequency.

    What is the theoretical limit on the clock speed of RDRAM (including SDR, DDR, and QDR)?

    3. Next up is memory capacity. As applications (read: games) become more realistic and they require higher resolution textures, as well as more of them, more memory is needed.

    Now the question(s) are, can you get away with a scaled memory architecture, similar to the main memory setup of a PC system? AKA, large but relatively slow "main" or L3 memory (on card), much smaller and faster L2 memory and finally the onchip L1 memory? Or would this not be in line with a video cards demands?

    4. And my final question (and the one I would most like answered) is this: Obviously different graphics cards serve different purposes. For example the high end Wildcat 3D cards from 3DLabs have something like 192MB of onboard memory. The card itself is a mini rendering machine of sorts meant for intensive 3D modeling. However, a card with a much smaller amount of onboard memory can run circles around it when it comes to applications like games.

    Is it because the hardware is not optimized for running gaming applications? If that is the case, then how much does the optimization of both hardware and software (and firmware, if applicable) go towards performance gains?

    As always, these are less than technical questions. If they are inappropriate to the discussion, say so and I will edit them out.

    By Humus January 29, 2001, 11:02 AM

    quote:Originally posted by James:
    Sorry to interrupt, but if I can I would like to pose a few "low brow" questions.

    1. If what I am getting from this discussion is correct, RDRAM is fairly slow to get started (20-30% higher latency) but once it is going it has a much higher throughput. If this is correct it would be well suited to media or similar applications where initial latency is not important, but throughput is.

    My first question is this: Are video card memory demands in line with this architecture?

    2. Obviously speed is almost as important as throughput. Arcadian and others have addressed the fact that the RDRAM architecture is apparently very scalable when it comes to memory frequency.

    What is the theoretical limit on the clock speed of RDRAM (including SDR, DDR, and QDR)?

    3. Next up is memory capacity. As applications (read: games) become more realistic and they require higher resolution textures, as well as more of them, more memory is needed.

    Now the question(s) are, can you get away with a scaled memory architecture, similar to the main memory setup of a PC system? AKA, large but relatively slow "main" or L3 memory (on card), much smaller and faster L2 memory and finally the onchip L1 memory? Or would this not be in line with a video cards demands?

    4. And my final question (and the one I would most like answered) is this: Obviously different graphics cards serve different purposes. For example the high end Wildcat 3D cards from 3DLabs have something like 192MB of onboard memory. The card itself is a mini rendering machine of sorts meant for intensive 3D modeling. However, a card with a much smaller amount of onboard memory can run circles around it when it comes to applications like games.

    Is it because the hardware is not optimized for running gaming applications? If that is the case, then how much does the optimization of both hardware and software (and firmware, if applicable) go towards performance gains?

    As always, these are less than technical questions. If they are inappropriate to the discussion, say so and I will edit them out.

    1) Memory reads are in general very linear, so RDRAM could work quite well. But with certain features (such as EMBM) the texture reads happen in a more random fashion. That may cause a big performance hit on RDRAM graphics cards.

    3) Graphics cards of today already have caches for textures and vertices. But you could add something like an L2 texture cache too.
    Some professional cards have something called virtual texture memory. This works almost like virtual memory in a PC system. The application requests a memory address and the card will first look into the texture cache; if it's not found there it'll look into video memory, and if it's not there either it'll load it over AGP. This way you don't need to swap the whole texture over AGP (a toy sketch of this lookup order follows after these answers).

    4) Professional cards have pretty low fillrate, but are very fast on triangle setups, line drawing etc. Most professionals are working like 90% in wireframe mode, so this will of course be the most important.
    Gaming cards have high fillrates, but are very slow at drawing lines etc. (very often software driven).
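
    A toy sketch of the lookup order described in answer 3; the tier names, contents and latencies are invented purely for illustration (real hardware does this in silicon, not Python):

        # Hypothetical three-tier texel fetch: on-chip cache -> video RAM -> AGP.
        # Each tier is (name, latency in microseconds, resident texels).
        def fetch_texel(address, tiers):
            cost = 0.0
            for name, latency_us, store in tiers:
                cost += latency_us
                if address in store:
                    return store[address], name, round(cost, 2)
            raise KeyError(f"texel {address:#x} is not resident anywhere")

        tiers = [("texture cache", 0.01, {0x10: "texel A"}),
                 ("video memory",  0.20, {0x20: "texel B"}),
                 ("AGP memory",    2.00, {0x30: "texel C"})]

        print(fetch_texel(0x20, tiers))   # ('texel B', 'video memory', 0.21)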

    By Moridin January 29, 2001, 11:13 AM

    quote:Originally posted by idris5:

    Perhaps the biggest problem with a chip of this size would be yield.

    Since most of the 300+ million transistors would be memory cells it should be relatively easy to build in redundancy to avoid yield problems.

    By James January 29, 2001, 12:03 PM

    quote:Originally posted by Humus:
    1), 3), 4)

    What no 2?

    Seriously though, thanks for taking the time to answer my questions. The professional graphics explanation especially.

    By idris5 January 29, 2001, 12:36 PM

    quote:Originally posted by Moridin:
    Since most of the 300+ million transistors would be memory cells it should be relatively easy to build in redundancy to avoid yield problems.

    I'm not sure about this. I would have thought that the level of redundancy needed to combat yield problems would increase the transistor count by a fairly large amount (10% at least), effectively reducing the number of chips on a wafer.

    Power would be a significant problem as well (I hadn't thought of this earlier, which is stupid because it's quite important). You can't have huge great heat sinks and fans hanging off graphics cards, so the power consumption needs to be kept low - a 300-million-transistor chip wouldn't really achieve this!

    By Humus January 29, 2001, 02:41 PM

    quote:Originally posted by James:
    What no 2?

    Seriously though, thanks for taking the time to answer my questions. The professional graphics explanation especially.

    I don't know the answer to no 2 ...

    By fluppy January 29, 2001, 03:03 PM

    Ok, I've just read the entire thread and think I have something to contribute. I will start by quoting this:

    "RDRAM has inherent flaws that lead to high latency. The more RDRAM you add to a memory subsystem, the higher the average latency gets *** because RDRAM travels in packets and these packets must traverse all modules before reaching the CPU***. Of course, a video card could be designed in such a way that a particular model might work acceptably. It doesn't, however, scale particularly well if you need constant low latency."

    Note the part inside the asterisks! It's something I had forgotten about RDRAM. If you were to create a whopping 8 banks of RDRAM to achieve the higher bandwidth, wouldn't you also be whipping up a tremendous latency? I agree that latency is hardly everything, especially in video cards, but in my mind this latency would be tremendous. Not like the tiny increase in latency of the 1/2-bank i820/840 chipsets. We're talking 8 banks which all data must pass through.

    Indeed, if I am wrong in my theory... or it can be overcome... and a memory controller can be made to be efficient, then I think it's a good idea.

    Though I also believe we are discussing the wrong problem. Memory bandwidth bottlenecks are the concern here.
    Try this analogy - if you have a water leak that keeps overflowing the bucket.. do you get a bigger bucket or fix the leak?

    Current 3D acceleration technology, to me, is like a water leak. It reminds me of the x86 processors. They are inefficient... but it takes too much effort to change the standard. I think we are inhibited by the downfalls of pioneering 3D technology, which is difficult to shake. Let's face it... our 3D chips are stupidly bandwidth hungry and very wasteful. Anyone can come up with theories for making this more efficient (such as tile rendering)... but implementing it in the face of 'classic' and proven techniques is difficult. However, I do think graphics companies should be looking to fix the leak rather than get a bigger bucket. From vague rumours I've been hearing here and there, it sounds like nVidia may already have that planned for the next generation after the NV20.

    Also supporting the idea of nVidia's reluctance... one of the reasons they are so popular is that they have released product after product with huge success in performance, time after time, without a single failure. Intel used to be the same. But they blew it... and the market lost confidence in them. I think it could hurt nVidia too much to make a mistake and release a below-average product (or waste valuable resources researching one only to find it's not good enough to release). I would say that with their current legendary and well-loved status, they would be reluctant to change anything unless they were severely threatened by a competitor... which they aren't. I know ATI is up there in performance... but ATI, like Intel, has stuffed up a bit... and most people, including myself, have little faith in them (unlike my faith in nVidia). I can't trust ATI to deliver a great product every time... it's one of those things that I have to see to believe... and I'm sure nVidia doesn't want the world saying that about their products too.

    I hope I've provided some kind of valuable insight. I know I babbled a lot... I'm tired.

    By idris5 January 29, 2001, 06:36 PM

    I don't agree.

    Each RAMBUS IC would be entirely separate from all the others - each chip would have its own data bus (the address bus would be common to all of the RAMBUS ICs). This is no different to how DDR DRAM ICs are addressed on a graphics card.

    In other words, to create a 128-bit-wide data bus you need eight 16-bit-wide RAMBUS ICs. IC 0 gives Data[15:0], IC 1 gives Data[31:16], IC 2 gives Data[47:32] and so on.

    This layout actually means that the latency is massively reduced, as there is only one RAMBUS IC on each data bus, lowering the trace capacitance and shortening the packet's time of flight (in comparison to the 4+ RAMBUS ICs on a RIMM).

    The memory controller in this sense would be no different from one in a DDR graphics card.
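
    To make that lane mapping concrete, here is a little sketch that splits a 128-bit word across eight 16-bit RAMBUS ICs exactly as described above (purely illustrative):

        def split_128bit_word(word):
            # return the 16-bit slice handled by each of the eight ICs
            return [(word >> (16 * ic)) & 0xFFFF for ic in range(8)]

        word = 0x0123456789ABCDEF0011223344556677
        for ic, lane in enumerate(split_128bit_word(word)):
            print(f"IC {ic}: Data[{16 * ic + 15}:{16 * ic}] = 0x{lane:04X}")
        # IC 0 drives Data[15:0], IC 1 drives Data[31:16], ... IC 7 drives Data[127:112]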

    By idris5 January 29, 2001, 06:42 PM

    I hope I explained that properly - it seems a bit garbled...

    The basic premise is you effectively have a channel to each RAMBUS IC, so you don't incur the latency problems you mentioned.

    I agree with you on the second half of your post though.

    By Bateluer January 29, 2001, 10:17 PM

    I think I said this before and it may have been said somewhere in this thread.

    BUT, Creative did release a card with RDRAM on it. http://www.creativelabs.com/graphics/gb-3d/

    It was a major flop; the high latency of RDRAM does not work well on video cards.

    By Marsolin January 30, 2001, 12:37 PM

    quote:Originally posted by fluppy:
    RDRAM has inherent flaws that lead to high latency. The more RDRAM you add to a memory subsystem, the higher the average latency gets *** because RDRAM travels in packets and these packets must traverse all modules before reaching the CPU***. Of course, a video card could be designed in such a way that a particular model might work acceptably. It doesn't, however, scale particularly well if you need constant low latency.

    Note the part inside the asterisks! It's something I had forgotten about RDRAM. If you were to create a whopping 8 banks of RDRAM to achieve the higher bandwidth, wouldn't you also be whipping up a tremendous latency? I agree that latency is hardly everything, especially in video cards, but in my mind this latency would be tremendous. Not like the tiny increase in latency of the 1/2-bank i820/840 chipsets. We're talking 8 banks which all data must pass through.

    Your quote, while technically correct, is a little confusing. We need to be clear about the difference between the number of RDRAM devices and the amount of memory. The number of devices in the chain determines (among other things) the latency. By increasing the number of channels, not only is bandwidth doubled, but latency is decreased by spreading out those devices.

    Flight time with RDRAM is also a component of latency, due to the serial nature we just mentioned. Without the need for RIMM slots like those on a motherboard, a graphics card could scratch out some additional latency savings by shortening data paths.

    By Moridin January 30, 2001, 02:36 PM

    Calling the time-of-flight latency "massive" is misleading at best. The most latency that is ever added in a PIII system would be about 9 ns, which rounds up to a single tick of the FSB clock. This is the same as going from CAS 2 to CAS 3 SDRAM. Since the channel would be much shorter on a video card, the time-of-flight latency penalty would be much smaller.

    Parallel interfaces also have problems with time of flight. In the case of a parallel interface this limits clock speed since the data on every data line must arrive at exactly the same time. If two lines have even slightly different characteristics this will limit your max clock speed. Serial interfaces are not nearly as sensitive to this.

    By idris5 January 31, 2001, 06:38 AM

    I was referring to the trace capacitance as well as the trace length.

    If there is only one IC on the bus (rather than the 4? on a RIMM) then obviously you have only 1/4 the capacitance. As the signal speed on a transmission line is equal to 1/sqrt(L*C), a higher speed can be obtained.

    But either way you are right - massive was an overstatement.

    However, parallel interfaces are not that limited. As long as the bus is not stupidly wide (128 bits+) it isn't too difficult to match trace lengths. At, say, 1GHz a signal will move 0.3 metres in one tick. A synchronous interface will therefore allow a difference in trace lengths of up to 30cm, even at this frequency, before integrity issues occur.

    And data does not have to arrive simultaneously on either an asynchronous or a synchronous bus.

    On a synchronous bus you have a window of time (between clock edges) that the signals must arrive in. On our notional 1GHz bus, if all the signals are sent at the same time there can be a maximum path difference of up to 30cm and yet data integrity can still be kept.

    On an asynchronous bus some kind of handshaking protocol is in use (e.g. the RTS/CTS signals on RS232). By their very nature asynchronous buses tend to be slower, as they rely on these handshaking signals, but they still don't need the data to arrive simultaneously.

    The reasons that massively parallel buses aren't used tend more to be PCB and component cost. Frequency issues do exist, but they are more of a problem in the design of the connectors.

    By Moridin January 31, 2001, 11:44 AM

    quote:Originally posted by idris5:
    I was referring to the trace capacitance as well as the trace length.

    If there is only one IC on the bus (rather than the 4? on a RIMM) then obviously you have only 1/4 the capacitance. As the signal speed on a transmission line is equal to 1/sqrt(L*C), a higher speed can be obtained.

    AFAIK the IC's on a Rambus channel are connected point to point in a loop. Additional IC's do not change the capacitance since they are not connected to a common bus. It is SDR/DDR channels that would suffer from the problem you describe.

    Signals on a circuit board do not propagate as quickly as signals in air. You also need some harmonics to get a digital signal, more if you want a good rise time. Factor in some time for the signals to settle before they are clocked in and you are starting to cut things close. On a parallel bus you also double the problem, since you need to worry about the same issues with both your clock and your data.

    I could be mistaken, but I think timing issues like these are what is holding down the speed of parallel buses. Almost every really high-speed bus I have seen proposed is serial. If you look at the RDRAM packet, all the packet timing occurs as if it were a 100MHz bus.

    By idris5 January 31, 2001, 06:32 PM

    quote:Signals on a circuit board do not propagate as quickly as signals in air. You also need some harmonics to get a digital signal, more if you want a good rise time. Factor in some time for the signals to settle before they are clocked in and you are starting to cut things close. On a parallel bus you also double the problem, since you need to worry about the same issues with both your clock and your data.

    I tried to keep it simple by just using c as the speed of the signals, but you are right, the actual signal speed tends to be about 2/3 c (or 1/sqrt(L*C) if you want it really accurate).
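
    As a quick check on those figures (a toy calculation using the 2/3 c figure just quoted and the notional 1GHz bus from earlier in the thread):

        c = 3.0e8                  # speed of light in vacuum, m/s
        v = (2.0 / 3.0) * c        # rough propagation speed on a PCB trace
        clock_hz = 1.0e9           # the notional 1GHz bus

        period_s = 1.0 / clock_hz  # one clock period
        print(f"distance per tick on the board: {v * period_s * 100:.0f} cm")   # ~20 cm
        # so the trace-length mismatch budget at 1GHz is nearer 20cm than 30cm,
        # and settling time and rise time eat into that window further still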

    quote:I could be mistaken, but I think timing issues like these are what is holding down the speed of parallel buses

    Possibly, but DDR DRAM is being clocked to 250MHz on graphics cards (albeit with easier-to-design signal paths), so it is technically possible. Perhaps it's just economically impractical at present.

