  • SharkyForums.Com - Print: Early Itanium Info

    Early Itanium Info
    By Arcadian January 13, 2001, 02:24 PM

    I was just coursing through the Register, and they happened to have a few links to sites getting ready for Itanium. Here is an update from Compaq, who plans to have products ranging from, "4P application servers to 32P data center servers".
    http://www5.compaq.com/solutions/speedstart/newsletter/jan2001/itanium.html

    Also, here is an early benchmark of Itanium, compared to a 1.0GHz Athlon with SDR SDRAM, and a 1.5GHz Pentium 4 without any optimizations.
    http://www.aceshardware.com/

    Not surprisingly, the Athlon beats the Pentium 4 by a small margin, since the Pentium 4 is using its subpar generic floating point unit, but the interesting thing is the Itanium. Itanium is clocked at only 667MHz, and it is already 40-50% faster than the Athlon at 1GHz!

    Production level Itanium chips will be clocked at 733MHz and 800MHz, which are roughly 10% and 20% faster clock speeds than the one being tested. This is very exciting news, since these are early production samples, and performance is likely to improve even more over time.

    In case anyone doesn't know, Itanium is Intel's new 64 bit processor based on the more parallel EPIC instruction set. It has received some criticism because of repeated delays, and the difficulty in writing a compiler that can take full advantage of the new architecture.

    However, it does seem that performance is doing quite well, even for an early engineering sample that has leaked into the public. I would like to hear anyone's comments or feedback if they have something to say about this.
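As a quick check on the clock speeds mentioned in this post, here is a minimal sketch that computes how much faster 733MHz and 800MHz are than the 667MHz sample. The helper name `percent_faster` is made up for illustration, and this compares clock speed only, not measured performance.

```python
def percent_faster(new_mhz, base_mhz):
    """Return how much faster new_mhz is than base_mhz, in percent."""
    return (new_mhz / base_mhz - 1.0) * 100.0


SAMPLE_MHZ = 667  # clock of the Itanium sample discussed in the thread

for production_mhz in (733, 800):
    gain = percent_faster(production_mhz, SAMPLE_MHZ)
    print(f"{production_mhz} MHz is {gain:.1f}% faster than {SAMPLE_MHZ} MHz")
```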

    By Arcadian January 13, 2001, 02:45 PM

    This is a little off topic, but since the Aces Hardware link does give more information about the Pentium 4, I thought we should discuss that, too.

    As I said above, the first test was with no optimizations, and the Pentium 4 hovered around 1000-1100 MFLOPS, or millions of floating point operations per second. Since Atlas (the benchmark) is a simple test, we find that even though the Pentium 4 runs 1500 million cycles per second (1.5GHz = 1500 million cycles per second), it can only do 1000-1100 million calculations per second, which means that without optimizations, 400-500 million cycles per second are being wasted completely.

    The Athlon is running at 1000 million cycles per second (again, from 1.0GHz), yet it achieves between 1100 and 1200 MFLOPS. Obviously, the Athlon's improved floating point design allows it to sometimes compute more than one operation every cycle. In other words, when it achieves a score of 1200 MFLOPS for a particular matrix size, then 20% of the time it was able to do 2 calculations in the same cycle, which is much more efficient than the Pentium 4.
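The per-cycle reasoning above can be sketched in a few lines. This is only the simplified model used in the post (MFLOPS divided by clock in MHz), using the double precision scores quoted in the thread. The function name `ops_per_cycle` is an illustrative assumption.

```python
def ops_per_cycle(mflops, clock_mhz):
    """Floating point operations completed per clock cycle.

    MFLOPS is millions of operations per second and MHz is millions of
    cycles per second, so the millions cancel out.
    """
    return mflops / clock_mhz


# (clock in MHz, (lowest MFLOPS, highest MFLOPS)) as quoted in the thread
double_precision = {
    "Pentium 4 1.5GHz": (1500, (1000, 1100)),
    "Athlon 1.0GHz": (1000, (1100, 1200)),
}

for cpu, (clock_mhz, (low, high)) in double_precision.items():
    print(f"{cpu}: {ops_per_cycle(low, clock_mhz):.2f} to "
          f"{ops_per_cycle(high, clock_mhz):.2f} operations per cycle")
```

A value below 1.0 means some cycles complete no floating point operation at all, which is the "wasted cycles" point made for the unoptimized Pentium 4.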

    However, there is another test that is exclusively single precision. Single precision is used in a lot of 3D games, for example, where less precision is allowed, and it uses 32 bits for the numbers, while the previous double precision test used 64 bit numbers.

    On this second test, the Pentium 4 was optimized around SSE (not SSE2, which may in fact help the performance even more), while the Athlon was optimized around 3DNow.

    The Athlon received scores between 1860.5 and 2298.9 MFLOPS. Not surprisingly, this score is much higher than for double precision. The reason is that single precision numbers can be parallelized much more easily, and you can often execute several calculations per cycle through the multiple execution engines that modern processors have. Averaging out these results, we can see that through 3DNow, the Athlon can execute roughly 2 single precision calculations every cycle (about average for microprocessors executing single precision these days).

    The Pentium 4 also receives a boost, and because of SSE, the results are quite impressive. Now the Pentium 4 receives between 2500 and 3811.1 MFLOPS. The average (since the 2500 score was the result of a very small matrix, and most scores were much higher) is about 3600. This means that the Pentium 4 actually gets three calculations done per cycle about 40% of the time (3600 / 1500 = 2.4 per cycle), while the rest of the time it can do two! And, with a 50% higher clock speed, it soundly whoops the Athlon.
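One way to check the "N calculations per cycle, some fraction of the time" reading is to split the average per-cycle rate into a whole part and a remainder. This is a sketch of that simplified model only. The helper name `cycle_mix` is hypothetical, the 3600 figure is the rough average given in the post, and the Athlon figure is simply the midpoint of its quoted range.

```python
import math


def cycle_mix(avg_mflops, clock_mhz):
    """Split average operations per cycle into (base, fraction).

    Under the simplified model, the CPU does `base` operations on every
    cycle and `base + 1` operations on `fraction` of the cycles.
    """
    per_cycle = avg_mflops / clock_mhz
    base = math.floor(per_cycle)
    return base, per_cycle - base


p4_base, p4_fraction = cycle_mix(3600, 1500)
athlon_base, athlon_fraction = cycle_mix((1860.5 + 2298.9) / 2, 1000)

print(f"Pentium 4: {p4_base} per cycle, {p4_base + 1} on {p4_fraction:.0%} of cycles")
print(f"Athlon: {athlon_base} per cycle, {athlon_base + 1} on {athlon_fraction:.0%} of cycles")
```

With these inputs the Pentium 4 works out to 2.4 operations per cycle, so the third operation lands on roughly 40% of cycles.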

    The point here is that SSE optimizations are absolutely necessary for the Pentium 4 to really shine. It actually performs better per clock than the Athlon can when optimized, and we know that the Pentium 4 can easily continue its lead in megahertz.

    This is pretty interesting, and obviously it raises the question, "when will applications be optimized for the Pentium 4?" Well, the answer is probably not "never", but the answer could be "a long time". Still, I am waiting to see other tests confirm this. There have already been several, and they show that the Pentium 4 needs optimizations, and that when they are provided, it can really shine.

    As always, I look forward to any feedback to this message.

    By Angelus January 13, 2001, 03:10 PM

    I've been wondering about this, but what exactly is meant by optimizing software for the P4?

    Is it the applications that need huge amounts of bandwidth that need to be programmed differently, or does it begin with redesigning Windows and from that all the way down to, let's say, Winamp to let it make more efficient use of SSE2?

    By Arcadian January 13, 2001, 03:19 PM

    quote:Originally posted by Angelus:
    I've been wondering about this, but what exactly is meant by optimizing software for the P4?

    Is it the applications that need huge amounts of bandwidth that need to be programmed differently, or does it begin with redesigning Windows and from that all the way down to, let's say, Winamp to let it make more efficient use of SSE2?

    Windows is already programmed to be "aware" of SSE/SSE2. In order to use these instructions, it is necessary for Windows to know how to, say, service an Interrupt while executing an SSE2 instruction.

    Besides that, though, individual programs get optimizations. Intel has a compiler that can make these optimizations for you. All you need to do is recompile your code. Intel also is improving this compiler over time to make better use of the code.

    It's hard to describe program optimizations on a complete layman level, but let's say that certain instructions can be executed in a certain amount of time. But, if you use other instructions, or the same instructions in a different order, you can get the same work done, but get it done faster.

    If you really know how a given CPU works, you get good at anticipating the timings, and how each processor takes advantage of parellelism. For the Pentium 4, that can mean replacing standard FPU instructions with the equivalent SSE/SSE2 instruction. This is actually easier done than said, since Intel makes compilers to do this for you.

    However, it is sometimes difficult for developers to have confidence in compilers they haven't used before, so it is up to Intel to get developers to use the tools they are making available. I believe Intel can do this, but it will take time.
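To give a rough feel for why replacing scalar FPU instructions with packed SSE/SSE2 instructions can help, here is a minimal counting sketch. It relies on one hardware fact: an SSE register is 128 bits wide, so it holds four 32-bit single precision values or (with SSE2) two 64-bit double precision values. Everything else is a simplifying assumption, since the model ignores latency, memory bandwidth, and instruction scheduling, and the function names are made up for illustration.

```python
import math

SSE_REGISTER_BITS = 128  # width of one SSE (XMM) register


def lanes(element_bits):
    """How many values of the given width fit in one SSE register."""
    return SSE_REGISTER_BITS // element_bits


def instruction_count(n_elements, element_bits, packed):
    """Arithmetic instructions needed for one operation on every element.

    Scalar code issues one instruction per element. Packed code issues
    one instruction per full or partial register of elements.
    """
    if not packed:
        return n_elements
    return math.ceil(n_elements / lanes(element_bits))


n = 1000
print("scalar:", instruction_count(n, 32, packed=False))
print("packed single precision:", instruction_count(n, 32, packed=True))
print("packed double precision:", instruction_count(n, 64, packed=True))
```

A compiler that vectorizes code is doing roughly this substitution automatically, which is why a recompile alone can change the results.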

    By James January 13, 2001, 03:31 PM

    Question, Arcadian:

    The Itanium is 64-bit native in structure. Is that why it smokes the P4 and AMD when running the 64-bit double precision tests?

    Also, they were running Linux in that link. What kernel? Is it 2.4?

    By slick January 13, 2001, 04:55 PM

    This is kind of a bit off topic, but how much do you think an Itanium will cost when it hits the street?

    By Conrad Song January 13, 2001, 06:36 PM

    quote:Originally posted by Arcadian:
    I was just coursing through the Register, and they happened to have a few links to sites getting ready for Itanium. Here is an update from Compaq, who plans to have products ranging from, "4P application servers to 32P data center servers".
    http://www5.compaq.com/solutions/speedstart/newsletter/jan2001/itanium.html

    Also, here is an early benchmark of Itanium, compared to a 1.0GHz Athlon with SDR SDRAM, and a 1.5GHz Pentium 4 without any optimizations.
    http://www.aceshardware.com/

    Not surprisingly, the Athlon beats the Pentium 4 by a small margin, since the Pentium 4 is using its subpar generic floating point unit, but the interesting thing is the Itanium. Itanium is clocked at only 667MHz, and it is already 40-50% faster than the Athlon at 1GHz!

    Production level Itanium chips will be clocked at 733MHz and 800MHz, which are roughly 10% and 20% faster clock speeds than the one being tested. This is very exciting news, since these are early production samples, and performance is likely to improve even more over time.

    In case anyone doesn't know, Itanium is Intel's new 64 bit processor based on the more parallel EPIC instruction set. It has received some criticism because of repeated delays, and the difficulty in writing a compiler that can take full advantage of the new architecture.

    However, it does seem that performance is doing quite well, even for an early engineering sample that has leaked into the public. I would like to hear anyone's comments or feedback if they have something to say about this.

    I'm interested in knowing what the definition of a FLOP is. There are supposedly only two double precision FMACs on Itanium. Wouldn't that in theory imply about 1.3GFLOPs in double precision? In any case, this is what I expected from Itanium.

    Five years ago, the buzz word was how big your cache was. In the world of superscalar, this has changed. The new words are bandwidth and latency, and the processor with the best of both wins. You can stick eight FMACs on a processor and they won't do a thing if your memory system can't support them.

    It is hard to compare IA-64 and IA-32. Itanium has enough FP registers to help mask the latency from the memory subsystem. Athlon and Pentium IV do not have that luxury. Here, we see that while Pentium IV has a significantly better memory subsystem, it suffers from only having a single FMAC versus Athlon's split FADD and FMUL. However, optimize the thing for SSE and the Athlon advantage disappears and the Pentium IV memory subsystem really shines.

    By Conrad Song January 13, 2001, 06:44 PM

    quote:Originally posted by slick:
    This is kind of a bit off topic, but how much do you think an Itanium will cost when it hits the street?

    I remember reading that Itanium will have several sizes of L3 cache. I imagine these can easily cost several thousands of dollars even for a 1MB L3 version. Remember that these chips will eventually make their way into server boxes costing up to the multimillions.

    By Moridin January 13, 2001, 08:09 PM

    quote:Originally posted by Conrad Song:
    I'm interested in knowing what the definition of a FLOP is. There are supposedly only two double precision FMACs on Itanium. Wouldn't that in theory imply about 1.3GFLOPs in double precision? In any case, this is what I expected from Itanium.

    Five years ago, the buzz word was how big your cache was. In the world of superscalar, this has changed. The new words are bandwidth and latency, and the processor with the best of both wins. You can stick eight FMACs on a processor and they won't do a thing if your memory system can't support them.

    It is hard to compare IA-64 and IA-32. Itanium has enough FP registers to help mask the latency from the memory subsystem. Athlon and Pentium IV do not have that luxury. Here, we see that while Pentium IV has a significantly better memory subsystem, it suffers from only having a single FMAC versus Athlon's split FADD and FMUL. However, optimize the thing for SSE and the Athlon advantage disappears and the Pentium IV memory subsystem really shines.


    Itanium has instructions that X86 does not including some that can perform a multiply and an add with one instruction. This effectively allows 2.6 GFLOPS in double precision.

    Itanium has a very good memory subsystem. While its bandwidth is not as high as the P4's, its latency performance should be superior. On top of this, IA-64 code should make extensive use of prefetch because of the nature of the ISA.
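The two peak figures in this exchange differ only in how a fused multiply-add is counted. A minimal sketch, assuming the two FMAC units and the 667MHz clock mentioned in the thread, and assuming each unit can issue one instruction per cycle (an idealization, and the function name is hypothetical):

```python
def peak_gflops(units, clock_mhz, flops_per_instruction):
    """Theoretical peak in GFLOPS if every unit issues every cycle."""
    return units * clock_mhz * flops_per_instruction / 1000.0


FMAC_UNITS = 2   # double precision FMAC units, per the thread
CLOCK_MHZ = 667  # clock of the sample discussed in the thread

# Counting a multiply-add as one operation
as_one_op = peak_gflops(FMAC_UNITS, CLOCK_MHZ, 1)
# Counting the multiply and the add separately
as_two_ops = peak_gflops(FMAC_UNITS, CLOCK_MHZ, 2)

print(f"multiply-add counted once:  {as_one_op:.2f} GFLOPS")
print(f"multiply-add counted twice: {as_two_ops:.2f} GFLOPS")
```

These come out near 1.3 and 2.7 GFLOPS, in line with the 1.3 and 2.6 figures quoted above. Real benchmark scores will sit below the peak whenever memory cannot keep the units fed.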

    By Moridin January 13, 2001, 08:26 PM

    quote:Originally posted by James:
    Question, Arcadian:

    The Itanium is 64-bit native in structure. Is that why it smokes the P4 and AMD when running the 64-bit double precision tests?

    64-bit operation plays no role in this whatsoever. The X87 FPU has always been 64 bits AFAIK. When you say a processor is 64 bits or 32 bits you are referring to its integer operation.

    Very few applications actually benefit from a 64-bit processor since normally the values you are working with fit into 32 bits. There are some exceptions to this like encryption.

    The real benefit of 64 bits is the amount of memory you can address. Since memory addresses and pointers are handled in the integer part of the chip, the number of bits (32 or 64) sets the maximum amount of memory a chip can access. X86 cannot handle memory in chunks larger than 2^32 bytes, or 4 GB. Since it has hardware left over from its earlier incarnations it can handle 16 such blocks of memory with OS support, but the normal limit is 4 GB.

    (The first IBM PCs were 16 bits and could only handle 64 KB of memory per segment. DOS used 10 of these 16 memory segments for applications, giving you the old 640 K memory limit in DOS.)
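The sizes mentioned in this post follow directly from powers of two. A short sketch of the arithmetic, using binary units (1 KB = 1024 bytes); the constant names are illustrative only:

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# A 32-bit address can name 2^32 distinct bytes.
FLAT_32BIT_BYTES = 2 ** 32

# A 16-bit offset reaches 2^16 bytes inside one segment.
SEGMENT_BYTES = 2 ** 16

# Sixteen 64 KB segments make up the original 1 MB address space,
# and ten of them give the 640 K available to DOS applications.
REAL_MODE_BYTES = 16 * SEGMENT_BYTES
DOS_CONVENTIONAL_BYTES = 10 * SEGMENT_BYTES

print("32-bit flat space:", FLAT_32BIT_BYTES // GB, "GB")
print("one segment:", SEGMENT_BYTES // KB, "KB")
print("16 segments:", REAL_MODE_BYTES // MB, "MB")
print("10 segments:", DOS_CONVENTIONAL_BYTES // KB, "KB")
```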


    By James January 13, 2001, 08:32 PM

    Thanks Moridin. Apparently I am confused. (What else is new?)

    P.S. Here is an old thread I started on the idea of memory addressing, and more specifically the memory addressing capabilities of the IA-64 architecture.
    http://www.sharkyforums.com/ubb/Forum27/HTML/000083.html

    By Moridin January 13, 2001, 09:02 PM

    quote:Originally posted by James:
    Thanks Moridin. Apparently I am confused. (What else is new?)

    P.S. Here is an old thread I started on the idea of memory addressing, and more specifically the memory addressing capabilities of the IA-64 architecture.
    http://www.sharkyforums.com/ubb/Forum27/HTML/000083.html

    The way I understand it, all memory operations are performed using 32 bits. In effect you have a register that is 32 bits wide for memory addressing. However, the register has 4 additional bits that can be set or changed as required, but you would need to specify the change. This allows you to access the additional memory. 4 bits = 16 possible combinations, and 4 GB x 16 = 64 GB. I think the pins are not there on the PII/PIII to support this, but the logic still exists internally.

    In this configuration you would not have a flat 64 GB address space. I may be mistaken, though. I know this is basically what X86 processors up to the 486 did (that is what I studied in school 6 years ago, when the P5 was still very new). However, it is possible that my memory has failed me or that new features have been added to X86.

    Are there any programmers out there who could comment on this? Is the memory model for 64 GB X86 platforms flat? How do you utilize the memory?

    Edit
    (After I wrote this I realized how badly worded it is. I hope people can still answer though, just don't flame me for asking what seems like a dumb question. Obviously the memory model is the OS's responsibility and you shouldn't need different code to use the 64 GB of memory. I'm just having some difficulty phrasing my question.)
    /Edit

    Most 64 bit platforms do not currently use the full 64 bits for addressing. I don't know what Itanium does.
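The "4 extra bits on top of a 32-bit address" picture described in this post can be illustrated with a little bit arithmetic. This is only a sketch of the poster's mental model, with a hypothetical 4-bit bank number selecting one of sixteen 4 GB blocks. It is not a description of how any real x86 processor or operating system implements extended addressing.

```python
GB = 2 ** 30
OFFSET_BITS = 32  # ordinary 32-bit address within one block
BANK_BITS = 4     # the extra bits described in the post


def extended_address(bank, offset):
    """Combine a 4-bit bank number and a 32-bit offset into one address."""
    if not 0 <= bank < 2 ** BANK_BITS:
        raise ValueError("bank must fit in 4 bits")
    if not 0 <= offset < 2 ** OFFSET_BITS:
        raise ValueError("offset must fit in 32 bits")
    return (bank << OFFSET_BITS) | offset


blocks = 2 ** BANK_BITS
total_bytes = blocks * 2 ** OFFSET_BITS

print("blocks:", blocks)
print("total:", total_bytes // GB, "GB")
print("highest address:", hex(extended_address(blocks - 1, 2 ** OFFSET_BITS - 1)))
```

Because the bank has to be chosen separately from the offset, a program sees sixteen 4 GB windows under this model, which is why the post says the space would not be flat.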


    By Marsolin January 15, 2001, 06:40 PM

    quote:Originally posted by slick:
    This is kind of a bit off topic, but how much do you think an Itanium will cost when it hits the street?

    I haven't heard any actual numbers, but I would expect $2000-$3000 depending upon the 2MB or 4MB cache version.

    By Bash January 18, 2001, 12:32 PM

    quote:Originally posted by slick:
    This is kind of a bit off topic, but how much do you think an Itanium will cost when it hits the street?

    I'd bet that a single processor Itanium workstation with SCSI, a decent amount of RAM, and a good monitor will start at about $12k at launch so they compete with the Alpha.
    (wild guess)

    -bash

    By Arcadian January 18, 2001, 01:05 PM

    quote:Originally posted by Bash:
    I'd bet that a single processor Itanium workstation with SCSI, a decent amount of RAM, and a good monitor will start at about $12k at launch so they compete with the Alpha.
    (wild guess)

    -bash

    I believe I read somewhere that the 2MB Itanium systems will cost nearly $2000, and the 4MB versions will cost nearly $4000 for a single processor. I'm also curious, since you mentioned it: how much does a similar Alpha system cost?

    By smtkr January 18, 2001, 08:11 PM

    Would you expect any less for a processor intended for servers? And who really cares? Is there any consumer with the money, time, or craziness to actually want to put a server chip in a home system? I mean, I've heard of people with Xeons, but this is a very advanced processor. As mentioned before, just the level 3 cache will make producing or buying this chip costly.

    By smtkr January 18, 2001, 08:14 PM

    While we're on the price thing, consider the fact that it's Intel. I've always been partial to Intel, but their crap is way too expensive for the performance. It could be different in the server world though, as I am uninformed on pricing there.

    By ua549 January 19, 2001, 07:10 AM

    The last small server I configured cost well over US$50,000. It was a 4 proc, large cache Xeon 700 with 8 GB memory and an Adaptec 3400S RAID controller - No disk.



    Copyright © 1999, 2000 internet.com Corporation. All Rights Reserved.

