Hierarchical Z has also received a significant amount of attention. Naturally, ATI is somewhat secretive about the technology to keep competitors from copying it, so rather than attempt an explanation ourselves, we asked David Nalasco, ATI's Technology Marketing Manager, for a more detailed explanation.
David Nalasco: I'll start with an overview of the traditional (i.e. non-tile-based) 3D graphics pipeline. A 3D scene starts out as just a bunch of polygons, or more specifically, a set of vertex co-ordinates that can be used to define polygons. The T&L engine (whether in s/w or h/w) then rotates, translates, and scales the vertex co-ordinates to get them in the desired location in 3D space (also known as world space). Polygons partially outside of the viewing area are then clipped, and the remaining vertices are then assigned light intensity values (assuming vertex lighting is used). The final T&L stage is the viewport transform, where the vertex co-ordinates are transferred from world space to view space. Once T&L is completed, we are still left with a bunch of vertices and no actual "triangles."
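To make the transform step concrete, here is a minimal Python sketch of our own (not ATI's code) that scales, rotates, and translates vertices. The function name `transform_vertex`, the rotation about the Z axis only, and the scale-then-rotate-then-translate ordering are assumptions chosen for illustration; a real T&L engine would typically use 4x4 matrices.

```python
import math

def transform_vertex(v, scale, angle_z, translate):
    """Scale, rotate about the Z axis, then translate one (x, y, z) vertex."""
    x, y, z = (c * scale for c in v)
    cos_a, sin_a = math.cos(angle_z), math.sin(angle_z)
    # Rotation about Z leaves the z component unchanged.
    xr = x * cos_a - y * sin_a
    yr = x * sin_a + y * cos_a
    tx, ty, tz = translate
    return (xr + tx, yr + ty, z + tz)

# At this stage a "triangle" is still only three vertices.
triangle = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
moved = [transform_vertex(v, 2.0, math.pi / 2, (1.0, 1.0, 1.0)) for v in triangle]
```

The output is still a plain list of vertices, which matches the point made above: nothing has been filled in yet.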
The next stage is triangle setup, where 2D pixel co-ordinates are assigned to each polygon to fill it in. The triangle setup engine also interpolates lighting values across each polygon and applies perspective correction calculations. The output of this stage is a series of Gouraud-shaded, non-textured triangles.
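The fill-and-interpolate step can be sketched as follows. This is a simplified illustration under our own assumptions: pixels are sampled at integer co-ordinates, lighting is a single intensity per vertex, interpolation uses barycentric weights, and perspective correction is left out. The names `edge` and `rasterize_gouraud` are hypothetical.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize_gouraud(v0, v1, v2, i0, i1, i2):
    """Return {(x, y): intensity} for integer pixels covered by the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return {}  # degenerate triangle covers nothing
    xs = [v0[0], v1[0], v2[0]]
    ys = [v0[1], v1[1], v2[1]]
    pixels = {}
    # Walk the bounding box and keep the pixels that fall inside the triangle.
    for y in range(int(min(ys)), int(max(ys)) + 1):
        for x in range(int(min(xs)), int(max(xs)) + 1):
            p = (x, y)
            # Barycentric weights: each vertex's share of influence at p.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                pixels[p] = w0 * i0 + w1 * i1 + w2 * i2
    return pixels

# One fully lit vertex at the origin, two unlit vertices.
shaded = rasterize_gouraud((0, 0), (4, 0), (0, 4), 1.0, 0.0, 0.0)
```

In this example the intensity falls off linearly from the lit vertex, which is the smooth gradient that Gouraud shading produces before any texture is applied.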
In traditional architectures, the next stage is the rendering pipeline. This cycles through the list of triangles, looks at each pixel in each triangle, and applies textures, fog, alpha blending, etc. to assign it a color value. The rendering engine then performs a Z-test, which determines if the pixel just rendered is visible or located behind a previously rendered pixel by looking at its depth or Z value. If the new pixel is visible, it's written to the frame buffer; otherwise it's discarded. Once every pixel in a polygon has been rendered, the engine moves on to the next polygon. This process is inefficient, since it goes through the whole rendering process for each pixel before deciding whether it will be visible. The more hidden pixels (aka overdraw) there are in a scene, the greater the inefficiency. A typical 3D game today, such as Quake III Arena, renders on average about three times as many pixels as are actually visible, meaning two-thirds of the rendering time spent on the scene is wasted. This also severely impacts memory bandwidth, since each pixel rendered requires multiple texture samples to be read from memory (4 for bilinear filtering, 8 for trilinear filtering, and even more for anisotropic).
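The cost of testing depth only after shading can be shown with a small counting sketch. It is a toy model of our own, not a description of any real chip: fragments are `(x, y, z, color)` tuples, smaller Z means closer, and `samples_per_pixel` defaults to the 4 reads of bilinear filtering with a single texture assumed.

```python
def render_late_z(fragments, width, height, samples_per_pixel=4):
    """Shade every fragment first, then depth-test it (smaller z is closer)."""
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, z, c in fragments:
        shaded += 1              # texturing, fog, and blending would happen here
        if z < depth[y][x]:      # the Z-test runs only after that work is done
            depth[y][x] = z
            color[y][x] = c
    visible = sum(1 for row in color for c in row if c is not None)
    return {
        "shaded": shaded,
        "visible": visible,
        "wasted_fraction": 1 - visible / shaded if shaded else 0.0,
        "texture_reads": shaded * samples_per_pixel,
    }

# Three full-screen layers on a 4x4 target, drawn back to front.
layers = [(x, y, z, f"layer{z}")
          for z in (3, 2, 1) for y in range(4) for x in range(4)]
stats = render_late_z(layers, 4, 4)
```

With three full layers, the sketch shades 48 fragments for 16 visible pixels, so two-thirds of the shading work and of the 192 assumed texture reads goes to pixels that never appear in the final image.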
Hierarchical Z improves the efficiency of the rendering process in two ways. First, it does "early Z detection", meaning it does the Z-test on each pixel before it is rendered instead of after. Second, it can do the Z-test on entire blocks of pixels at a time. The result is that overdraw is virtually eliminated, and memory bandwidth efficiency is significantly improved. The RADEON chip's HyperZ implementation also employs lossless Z Compression on Z-buffer reads and writes, and Fast Z Clear to rapidly clear the Z-buffer in between frames. This combination of techniques dramatically reduces memory bandwidth requirements and substantially improves performance.
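Since ATI has not published the details, the following is only a generic sketch of the two ideas named above, not the RADEON implementation. It assumes a 4x4 tile size, a coarse buffer holding the farthest depth stored in each tile, constant-depth blocks that cover exactly one tile, and buffer dimensions that are multiples of the tile size. The class `HiZBuffer` and its members are hypothetical names.

```python
TILE = 4  # assumed tile edge, in pixels

class HiZBuffer:
    def __init__(self, width, height):
        self.depth = [[float("inf")] * width for _ in range(height)]
        # Coarse level: the farthest depth currently stored in each tile.
        self.tile_max = [[float("inf")] * (width // TILE)
                         for _ in range(height // TILE)]
        self.shaded = 0

    def draw_block(self, tx, ty, z):
        """Draw a constant-depth block covering tile (tx, ty)."""
        if z >= self.tile_max[ty][tx]:
            return False  # whole block hidden: one coarse test, no pixel work
        rows = range(ty * TILE, (ty + 1) * TILE)
        cols = range(tx * TILE, (tx + 1) * TILE)
        for y in rows:
            for x in cols:
                if z < self.depth[y][x]:  # early Z: test before shading
                    self.depth[y][x] = z
                    self.shaded += 1      # only visible pixels get shaded
        # Refresh the coarse value from the per-pixel depths in this tile.
        self.tile_max[ty][tx] = max(self.depth[y][x] for y in rows for x in cols)
        return True

# Three layers over an 8x8 target (four tiles), drawn nearest first.
hiz = HiZBuffer(8, 8)
results = [hiz.draw_block(tx, ty, z)
           for z in (1.0, 2.0, 3.0) for ty in range(2) for tx in range(2)]
```

In this run only the first layer's four blocks are drawn and 64 pixels are shaded; the eight blocks of the two farther layers are each rejected with a single comparison. Note that in this sketch the saving depends on nearer geometry arriving first: drawing the same layers farthest first would pass every test and shade all 192 pixels.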