AGP - A New Interface for Graphic Accelerators - THG.RU

It won’t be long until the first AGP systems hit the market. Intel will release its new 440LX Pentium II chipset with AGP support on August 26, 1997, only a few weeks away. There is plenty of hope and hype about AGP all over the place, so I felt it was time to add my 2 cents to this discussion, hopefully giving you some clear facts about this new interface.

What Does AGP Do?

AGP is nothing mystical at all, and the idea behind it isn’t even particularly unique. If future graphics accelerators are to be significantly faster than current PCI graphics cards, it will take much more than just AGP. However, AGP enables graphics hardware to do its job faster while keeping costs low.

The AGP specification is based on the 66 MHz PCI specification rev. 2.1, which is hardly used today, since all current PCI cards can still only use the 33 MHz PCI bus speed. AGP, however, adds three special extensions via so-called ‘sideband’ signals, provided by special lines added to the PCI specification. These three extensions are:

pipelined memory read/write operations
demultiplexing of address and data on the bus
data transfer timing as if clocked at 133 MHz

Now what does this mean in layman’s terms?

First of all, AGP offers much higher throughput than PCI does. PCI, as currently clocked at 33 MHz, can transport 133 MB/s at peak over its 32-bit data bus (33.3 MHz × 4 bytes). AGP is clocked at 66 MHz, which allows a peak rate of 266 MB/s (66.6 MHz × 4 bytes) in the classic ‘x1’ mode. In ‘x2’ mode, which transfers data on both the rising and the falling edge of the 66 MHz clock, it can reach up to 532 MB/s at peak (note that it is up to the graphics accelerator’s vendor whether ‘x2’ mode is supported). So much for the ‘133 MHz’ data transfer rate: it does not mean that the AGP bus is actually clocked at 133 MHz!

In the real world, AGP can also get closer to these theoretical peak values than PCI, thanks to some extra signal lines that enable pipelining and queuing of requests.

Figure 1: Non-pipelined PCI vs. pipelined AGP. An is the address of request n, and Dn is the data returned for it. Copyright (c) Intel Corporation
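The difference shown in Figure 1 can be approximated with a very simple timing model. The Python sketch below assumes that every read has a fixed latency before its data arrives and then needs a fixed number of clocks to transfer; the values of 6 and 2 clocks are made-up illustrative numbers, not figures from the AGP specification. Without pipelining, every request pays the full latency; with pipelining, later requests are queued while earlier data is still on its way, so the latency is paid only once.

```python
# Simplified timing model of Figure 1. Assumption: every read request has a
# fixed latency (clocks until its data starts to arrive) and a fixed data
# transfer time. This is an illustration, not the exact AGP protocol timing.


def non_pipelined_clocks(requests: int, latency: int, transfer: int) -> int:
    """PCI-style: the next request is issued only after the previous data arrived."""
    return requests * (latency + transfer)


def pipelined_clocks(requests: int, latency: int, transfer: int) -> int:
    """AGP-style: requests are queued back to back, so the latency is paid once
    and the data phases then follow each other without idle clocks."""
    if requests == 0:
        return 0
    return latency + requests * transfer


def data_efficiency(requests: int, transfer: int, total_clocks: int) -> float:
    """Fraction of bus clocks that actually carry data."""
    return requests * transfer / total_clocks if total_clocks else 0.0


LATENCY, TRANSFER = 6, 2  # hypothetical values in bus clocks

for n in (1, 4, 16):
    slow = non_pipelined_clocks(n, LATENCY, TRANSFER)
    fast = pipelined_clocks(n, LATENCY, TRANSFER)
    print(f"{n:2d} requests: non-pipelined {slow:3d} clocks "
          f"({data_efficiency(n, TRANSFER, slow):.0%} data), "
          f"pipelined {fast:3d} clocks ({data_efficiency(n, TRANSFER, fast):.0%} data)")
```

As more requests are queued, the pipelined case spends an ever larger share of its clocks moving data, which is why a pipelined bus can get closer to its theoretical peak rate.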

Thanks to this new technology, AGP’s peak transfer rate is as high as the peak transfer rate of current main memory, which in Pentium and later systems operates with a 64-bit-wide bus at a 66 MHz bus clock. Future systems will reach a main memory peak transfer rate of 800 MB/s by using a 100 MHz bus clock.
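The peak figures above all follow from one formula: clock rate × bus width in bytes × transfers per clock. The Python sketch below uses the nominal 33.33 MHz PCI clock and 66.67 MHz AGP and memory bus clocks (the exact clock values are an assumption of this example), so it prints 133.3, 266.7 and 533.3 MB/s where the text uses the rounded values 133, 266 and 532 MB/s. It also shows that AGP ‘x2’ matches the peak of a 64-bit main memory bus at 66 MHz, and what a 100 MHz bus gives.

```python
# Theoretical peak bandwidth = clock (MHz) x bus width (bytes) x transfers per clock.
# Results are in MB/s with 1 MB = 10**6 bytes.


def peak_mb_per_s(clock_mhz: float, bus_bits: int, transfers_per_clock: int = 1) -> float:
    """Theoretical peak bandwidth of a parallel bus in MB/s."""
    return clock_mhz * (bus_bits / 8) * transfers_per_clock


PCI_CLOCK_MHZ = 100 / 3   # nominal 33.33 MHz
AGP_CLOCK_MHZ = 200 / 3   # nominal 66.67 MHz (also the 66 MHz memory bus)

peaks = {
    "PCI, 32 bit @ 33 MHz": peak_mb_per_s(PCI_CLOCK_MHZ, 32),
    "AGP x1, 32 bit @ 66 MHz": peak_mb_per_s(AGP_CLOCK_MHZ, 32),
    "AGP x2, both clock edges": peak_mb_per_s(AGP_CLOCK_MHZ, 32, transfers_per_clock=2),
    "SDRAM, 64 bit @ 66 MHz": peak_mb_per_s(AGP_CLOCK_MHZ, 64),
    "SDRAM, 64 bit @ 100 MHz": peak_mb_per_s(100, 64),
}

for name, rate in peaks.items():
    print(f"{name:26s} {rate:6.1f} MB/s")
```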

This main-memory-class data transfer rate is only one part of the AGP story, but in the early days of AGP it might be the most important one.

Because of the high data transfer rate between the graphics accelerator and main memory, AGP enables graphics accelerators to use main memory instead of local memory for data such as textures, which can be up to 128 kB in size. Until now, these textures had to be loaded into the graphics accelerator’s local memory to be processed there by the graphics processor. Now they can be processed in main memory without a performance impact. Intel calls this DIME, for DIrect Memory “Execute”. UMA, the ‘unified memory architecture’ used on low-cost boards in the past, already used main memory as graphics memory, but with two important differences:

The main memory provided via AGP, hence called ‘AGP memory’, doesn’t replace the graphics accelerator’s screen buffer as it does in UMA. AGP memory is an addition to it.
UMA had to go through the much slower PCI interface.

These two differences explain why UMA was particularly slow, and why AGP graphics accelerators should be faster than current PCI solutions.

If this is hard to understand, let me give you a simple example:

3D accelerators based on the 3Dfx Voodoo chip, e.g. the Diamond Monster 3D, usually come with 4 MB of memory. 2 MB of this memory are used for textures and 2 MB for the frame buffer and Z-buffer. This is why the Monster 3D is limited to 640×480 in games such as GLQuake: only 2 MB are available for frame buffering, because the other 2 MB hold textures. That split would not be necessary if main memory could be used for textures instead, as AGP’s DIME makes possible.
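The arithmetic behind the 640×480 limit can be checked quickly. The Python sketch below assumes 16-bit color with a front buffer, a back buffer and a 16-bit Z-buffer, i.e. three 2-byte values per pixel; these buffer formats are an assumption made for this illustration, not something stated above.

```python
# Frame buffer memory needed at a given resolution. Assumption for this
# illustration: 16-bit color, double buffering (front + back buffer) and a
# 16-bit Z-buffer, i.e. three 2-byte values per pixel.

MB = 1024 * 1024
FRAME_BUFFER_BUDGET = 2 * MB  # the half of the 4 MB not used for textures


def frame_memory_bytes(width: int, height: int,
                       bytes_per_pixel: int = 2, planes: int = 3) -> int:
    """Bytes needed for `planes` full-screen buffers of `bytes_per_pixel` each."""
    return width * height * bytes_per_pixel * planes


for width, height in [(640, 480), (800, 600)]:
    needed = frame_memory_bytes(width, height)
    verdict = "fits" if needed <= FRAME_BUFFER_BUDGET else "does not fit"
    print(f"{width}x{height}: {needed / MB:.2f} MB needed, {verdict} in 2 MB")
```

Under these assumptions 640×480 needs about 1.76 MB, while 800×600 would need about 2.75 MB, more than the 2 MB left over once textures take their share.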

“Graphics local RAM is usually more expensive than generalized system memory and it cannot be used for other purposes by the OS when unneeded by the graphics of the running applications. The graphics chip needs fast access to local memory for screen refresh, Z-buffers, and pixels (front and back-buffers). For these reasons, programmers can always expect to have more texture memory available via AGP system memory. Keeping textures out of the frame buffer allows larger screen resolution, or permits Z-buffering for a given large screen size. Most applications could use 2-16 MB for texture storage. By using AGP and DIME, they can get it.” (Intel Corporation)

But let’s get back to the theory for now.

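The chipset has to provide a way to map the ‘AGP memory’ onto normal main memory. Intel calls this the GART (Graphics Address Remapping Table). It lets the graphics chip see AGP memory as one contiguous address range, even though the underlying pages of main memory may be scattered.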

“The processor “linear” virtual addresses get translated by its paging hardware into physical addresses. These physical addresses are used to access system RAM, local Frame Buffer, and AGP RAM. The CPU accesses to the Local Frame buffer and AGP RAM use the same addresses as the graphics chip does; for that reason, the operating system sets up the CPU paging hardware to a straight 1:1 non-translation of virtual to physical address. ” (Intel Corporation)
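The remapping idea can be illustrated with a toy model. In the Python sketch below, the graphics chip addresses one contiguous AGP aperture, and a table maps each page of that aperture to a page of system RAM that can sit anywhere in physical memory. The 4 KB page size, the aperture base address and the physical page addresses are hypothetical values chosen for the example; a real GART is implemented in the chipset and set up by the operating system and driver, not in software like this.

```python
# Toy model of a Graphics Address Remapping Table (GART).
# Assumptions for illustration: 4 KB pages, a made-up aperture base address
# and made-up physical page addresses.

PAGE_SIZE = 4096


class Gart:
    def __init__(self, aperture_base: int, physical_pages: list[int]) -> None:
        # physical_pages[i] is the physical address of the RAM page that backs
        # page i of the contiguous AGP aperture.
        if aperture_base % PAGE_SIZE or any(p % PAGE_SIZE for p in physical_pages):
            raise ValueError("addresses must be page-aligned")
        self.aperture_base = aperture_base
        self.table = list(physical_pages)

    def translate(self, aperture_addr: int) -> int:
        """Map an address inside the aperture to a physical RAM address."""
        offset = aperture_addr - self.aperture_base
        if not 0 <= offset < len(self.table) * PAGE_SIZE:
            raise ValueError("address outside the AGP aperture")
        page_index, page_offset = divmod(offset, PAGE_SIZE)
        return self.table[page_index] + page_offset


# Three aperture pages that are contiguous for the graphics chip but
# scattered in physical memory.
gart = Gart(0xE000_0000, [0x0120_3000, 0x0045_6000, 0x0789_A000])
for addr in (0xE000_0000, 0xE000_1010, 0xE000_2FFF):
    print(f"aperture {addr:#010x} -> physical {gart.translate(addr):#010x}")
```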

What’s the Beef?

So far, so good. Let’s now summarize the benefits AGP offers:

higher bandwidth than PCI, up to 4 times as high
no sharing of bandwidth with other components, as is the case with PCI
DIME, direct memory execution of textures
CPU accesses to system RAM can proceed concurrently with the graphics chip’s AGP RAM reads
the CPU can write directly to shared AGP system memory when it needs to provide graphics data, such as commands or animated textures. Generally the CPU can access main memory more quickly than it can access graphics local memory via AGP, and certainly more quickly than via the PCI bus.

Obviously, an AGP system doesn’t require a Pentium II. This is why Socket 7 systems with AGP (e.g. those based on the upcoming VIA Apollo VP3 chipset) will offer the same AGP functionality as the 440LX chipset does for Pentium II platforms.

Software Considerations

Unfortunately, getting an AGP board plus an AGP graphics accelerator won’t be enough to take advantage of AGP’s new performance. Nothing works without a proper operating system, which has to take care of the DIME/GART part of AGP’s benefits in particular. The OS has to allocate main memory for AGP RAM and make sure that enough main memory remains for the running applications. This is to be handled via DirectDraw in Memphis (Windows 98) and Windows NT 5. Until these operating systems are released, nobody will be able to take advantage of DIME, and hence only half of AGP’s benefits can be used.

AGP – Some Critical Thoughts

The number one benefit of AGP is supposed to be the DIME feature, which is meant to save video RAM on the graphics adapter. I have some doubts, however, about whether this idea will turn out to be as wonderful as it sounds. We have learned that AGP offers a theoretical peak throughput of about 530 MB/s in ‘x2’ mode, and the next mode, ‘x4’, is already planned. That mode would offer a throughput of about 1 GB/s; isn’t that amazing? There is a little problem we easily forget, though. This throughput is meant to transport data from main memory to the graphics accelerator, and the maximum throughput of main memory itself at a 66 MHz bus clock is currently just about the same: 528 MB/s (66 MHz × 8 bytes).

You certainly don’t expect the whole system to sit idle while the graphics accelerator is accessing main memory via DIME, do you? While the graphics accelerator is doing its work, the CPU and other devices using DMA are accessing main memory as well. Therefore AGP will never get a throughput of 528 MB/s, since this is the entire bandwidth of main memory, and it has to be shared with the CPU and the other devices. In a very simple statistical view, you can’t expect AGP to get more than 50% of main memory bandwidth, i.e. 264 MB/s. What is ‘x2’ mode good for, then? Moreover, the 528 MB/s figure above is only valid for SDRAM systems. EDO is considerably slower, let alone good old FPM.

What AGP really needs is the 100 MHz bus! This bus will offer 800 MB/s of bandwidth with SDRAM, so AGP could get a good share of it. Hence there is not much point in going on about ‘x1’ versus ‘x2’ mode AGP graphics cards right now, since there is simply no technical way that data could be transferred at the ‘x2’ rate in systems with a 66 MHz bus. What does this mean for us? ‘Let’s wait again!’ Let’s wait for the 440BX or VIA Apollo VP4 chipset, both using a 100 MHz system bus.
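The argument above boils down to a few multiplications. This Python sketch uses the figures from the text (528 MB/s for a 66 MHz SDRAM bus, 800 MB/s at 100 MHz, and roughly 266 and 532 MB/s for AGP ‘x1’ and ‘x2’) together with the same simple assumption that AGP gets 50% of main memory bandwidth, and checks which AGP mode that share could keep fully busy.

```python
# Figures as used in the text (MB/s, rounded). The 50% share is the simple
# statistical assumption from the text, not a measured value.
MEMORY_PEAK = {66: 528, 100: 800}   # SDRAM peak bandwidth by bus clock in MHz
AGP_MODES = {"x1": 266, "x2": 532}
AGP_SHARE = 0.5


def agp_budget(bus_mhz: int, share: float = AGP_SHARE) -> float:
    """Main memory bandwidth left for AGP if it gets `share` of the total."""
    return MEMORY_PEAK[bus_mhz] * share


for bus_mhz in MEMORY_PEAK:
    budget = agp_budget(bus_mhz)
    for mode, demand in AGP_MODES.items():
        verdict = "can" if budget >= demand else "cannot"
        print(f"{bus_mhz} MHz bus: AGP budget {budget:.0f} MB/s {verdict} "
              f"keep AGP {mode} ({demand} MB/s) fully busy")
```

Under these assumptions a 66 MHz bus leaves AGP just under the ‘x1’ rate, while a 100 MHz bus (400 MB/s for AGP) covers ‘x1’ and about three quarters of ‘x2’.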

There’s one other consideration as well. Modern VRAM or WRAM cards, as well as Rambus DRAM cards, offer a video memory (on-board memory, also called LFB or local memory) bandwidth of up to 1.6 GB/s (e.g. the Number Nine Revolution 3D with 128-bit WRAM). This is much more than even ‘x4’ mode will offer. These cards will be faster using their local memory for texture processing rather than the much slower AGP RAM. This means that high-end cards will work just like PCI cards did in the past, taking advantage only of AGP’s higher data transfer speed, without using DIME. Intel thinks this approach will be more expensive, but isn’t it funny … RAM prices are lower than ever, so this should not really make a card more expensive.

Three Different Flavors

This raises the question of whether you have to use DIME to benefit from AGP. The answer, even according to Intel, is ‘NO’.

You can use AGP without using the DIME feature at all. In this case the graphics accelerator simply benefits from AGP’s much higher transfer rate compared with PCI. The ‘sidebands’ can be used, but they don’t have to be. Even without ‘sidebanding’ the transfer rate is 266 MB/s, double what a stand-alone PCI graphics card would get. As with PCI cards, the data can be transferred from main memory into the graphics accelerator’s frame buffer using either PIO or DMA.
The majority of graphics accelerators will most likely use DIME, saving on-board texture memory and hence making the card cheaper without losing performance. Of course, these cards should use the ‘sidebands’ to enable ‘x2’ mode.
The high-end versions will most likely use DIMEL (Direct Memory Execute and Local also). Frequently used textures would be stored in a (large) on-board local memory, while less frequently used ones would reside in AGP RAM. These cards will come with a lot of on-board memory, as the (expensive but fast) Diamond Fire GL 4000 (PCI) with its 32 MB of RAM already shows. Even Intel admits that high-end solutions will still have a very large local memory, but will be too expensive for the mainstream.
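The DIMEL idea, keeping frequently used textures local and the rest in AGP RAM, can be sketched as a simple placement policy. The Python example below is a hypothetical greedy heuristic, not a documented driver algorithm: it ranks textures by a made-up per-frame usage count and fills local memory in that order. The texture names, sizes and counts are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Texture:
    name: str
    size_kb: int
    uses_per_frame: int  # hypothetical usage statistic


def place_textures(textures: list[Texture], local_kb: int) -> tuple[list[str], list[str]]:
    """Greedy DIMEL-style placement: textures are taken in order of how often
    they are used and go into local memory while they fit; the rest stay in
    AGP RAM."""
    local, agp = [], []
    free_kb = local_kb
    for tex in sorted(textures, key=lambda t: t.uses_per_frame, reverse=True):
        if tex.size_kb <= free_kb:
            local.append(tex.name)
            free_kb -= tex.size_kb
        else:
            agp.append(tex.name)
    return local, agp


textures = [
    Texture("wall", 128, 40),
    Texture("floor", 128, 35),
    Texture("sky", 128, 2),
    Texture("logo", 32, 1),
]
local, agp = place_textures(textures, local_kb=256)
print("local memory:", local)
print("AGP memory:  ", agp)
```

Because the loop keeps going after a texture does not fit, a smaller, less frequently used texture can still land in local memory if some space is left over.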

Shall We Go and Buy AGP Boards Now?

The answer is yes and no. As you will see from my benchmark results, there currently isn’t much to AGP at all. However, the SDRAM support of 440LX chipset boards and their upgradeability to AGP graphics will be a great advantage over 440FX Pentium II boards. It will probably take at least until NT 5 and Memphis are released before AGP delivers a really visible performance boost.

Valuable AGP Links

AGP and 3D Graphics Software by Intel – a must-read for everybody who wants to know more about AGP

ACCELERATED GRAPHICS PORT INTERFACE SPECIFICATION also by Intel

AGP FAQ

AGP Support in Windows 95 and Windows NT from Microsoft

The Accelerated Graphics Port (AGP) – A Diamond Multimedia White Paper