Anyone who has tested more than a few graphics cards has observed that there is not always a very big performance delta between an AGP graphics card and a similarly configured PCI graphics card. The typical comeback goes something like…
"In today’s games and benchmarks, AGP isn’t even turned on yet. Just wait till games actually turn AGP on with big textures and stuff. Then you’ll see a huge difference."
Allow me to debunk this belief by going through the typical sequence of questions and answers. Then I will get into the details behind the whole situation.
Is AGP faster than PCI?
YES, but not by much.
Is AGP faster than Local Graphics Memory?
NO. Local graphics memory has far higher bandwidth than the AGP bus.
Will an AGP2x board be faster than a 1x board?
Not noticeably, as long as textures are kept in local graphics memory.
Isn’t it true that any AGP board will be faster than any PCI board?
NO. The accelerator chip and its memory configuration matter far more than the bus it plugs into.
Does graphics accelerator performance improve when AGP texturing is turned on?
NO. Graphics accelerators DECREASE in performance when using AGP texturing.
Does AGP enhance CPU performance?
NO. When AGP texturing is used, CPU performance also DECREASES!
Then why is it that sometimes AGP cards score better than non-AGP on benchmarks?
This is usually because the application’s or benchmark’s non-AGP texture management is not very efficient. Game developers pay special attention to the performance of their texture swapping code. The best techniques can equal AGP’s performance under normal circumstances, while some benchmarks use a less optimized approach.
What can I do to avoid the negative performance impact of AGP?
Make sure you have a generous configuration of high performance graphics memory, and don’t buy an i740 based board (unless Intel "un-breaks" its drivers).
What is AGP good for?
It is a safety net in case you weren’t wise enough to buy enough graphics memory for your card. As games get more complex and require more texture memory, they will still run, but probably much slower than if you had enough graphics memory.
When that time comes, should I buy a faster CPU or more graphics memory?
Intel wants you to think that you need a faster CPU, when in fact all you may need is a few dollars worth of graphics memory. This is why Intel likes AGP so much. And this is why they crippled the i740.
How is the i740 crippled?
I’ll save that for the end.
AGP has several advantages over PCI. It offers a minor data transfer rate advantage when moving the geometry stream from the CPU to the graphics card. When it comes to managing large texture databases, AGP’s GART table allows the OS to manage textures in off-screen memory as well as in system memory, and allows the graphics card to access them directly in either location. Prior to AGP, the game developer had three options for managing textures:
1. Limit the texture database to whatever fits in off-screen memory only. This usually delivers outstanding frame rates, but memory size constraints can limit artistic creativity. Depending on the graphics card, texture space could be as low as one megabyte or as high as five or six megabytes.
2. Use the OS to manage textures in main memory, and require the CPU to copy textures from main memory to graphics memory as needed. This is PAINFULLY slow. In order to make AGP look as good as possible, Intel likes to compare AGP to this mode. This is what DirectX Retained Mode does, but game developers do not generally use this method.
3. Place the most frequently used textures in off-screen memory (like #1 above), then lock down a few additional megabytes of system memory for the remainder of the texture database. If the graphics chip needs a texture that is not in graphics memory, the accelerator must use PCI master mode (DMA) to copy the needed texture, on demand, into texture swap space in graphics memory. Performance is very good. This has been the preferred approach for game developers, and can be programmed under DirectX Immediate Mode.
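The on-demand strategy in option #3 can be sketched roughly as follows. This is a hypothetical illustration in Python, not real driver code; all class and method names are my own invention:

```python
# Sketch of the "option 3" texture strategy: hot textures are pinned in
# local graphics memory, the rest sit in locked-down system memory and
# are DMA-copied into a VRAM swap area only when actually needed.

class TextureManager:
    def __init__(self, vram_budget_kb):
        self.vram_budget_kb = vram_budget_kb
        self.resident = set()   # textures already in local graphics memory
        self.used_kb = 0

    def load_hot_set(self, textures_kb):
        """Greedily pin textures in VRAM; assumes the caller lists
        (texture_id, size_kb) pairs most-frequently-used first."""
        for tex_id, size_kb in textures_kb:
            if self.used_kb + size_kb <= self.vram_budget_kb:
                self.resident.add(tex_id)
                self.used_kb += size_kb

    def fetch(self, tex_id):
        """Return the path a texture access takes."""
        if tex_id in self.resident:
            return "local"      # fast path: local graphics memory
        # Miss: the accelerator bus-masters a DMA copy from locked
        # system memory into the VRAM swap space (over PCI or AGP).
        return "dma_copy"
```

With a 4 MB swap budget, for example, a game would pin its most-used textures at level load and fall back to the DMA path only on misses.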
AGP is a modified approach to option 3. Memory management is a little more flexible, and the data transfer rate is better because of AGP’s faster clock speed and bus pipelining.
AGP offers two ways to deal with textures. One is called DMA mode, which operates almost exactly like option #3 above, but transfers occur over AGP rather than PCI. The other is Execute mode, which allows the graphics chip to access texture information in main memory without first copying it to graphics memory. The effective bus throughput of Execute mode and DMA mode is the same. If anything, DMA mode could be faster because of better concurrency and deeper pipelining.
Intel has gone to great lengths to convince game developers that DMA mode stinks. They have even gone so far as to refer to method #2 above as DMA mode in order to confuse everybody. DMA stands for "Direct Memory Access". There is nothing direct about using the CPU to copy data. This is pure deception. True DMA uses a hardware bus master, like a PCI or AGP graphics accelerator.
Game developers and users should prefer AGP’s DMA mode because it offers excellent performance while still being architecturally compatible with the installed base of PCI accelerators. Intel prefers Execute Mode because it only runs with AGP. As we all know, Intel is always trying to stir up more ways to persuade users to dump their PCI Pentium 233 systems ASAP, and go buy a more costly P2 AGP system. Intel’s Mission is not to "Accelerate 3D Graphics", but rather to "Accelerate Obsolescence".
AGP’s DMA mode offers excellent concurrency because the game can detect if it must swap textures before the texture is actually needed, while the CPU is still calculating the geometry (in the geometry setup stage). This way, the graphics card can begin fetching the texture before it is needed to paint pixels on the screen. Concurrency is one of the keys to performance.
In AGP Execute mode, texture accesses are driven by the rasterizer at the final stage of the 3D pipeline. At this point, the accelerator is dead if it cannot have immediate access to textures. In this way, AGP creates a nasty performance bottleneck. Instead of having immediate access to textures in high bandwidth local memory, the accelerator must stall while it arbitrates for access to slower system memory via AGP.
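The difference between the two modes comes down to when the texture transfer happens relative to the pipeline. A toy arithmetic sketch (my own illustration, with made-up stage timings) shows why hiding the transfer behind geometry setup matters:

```python
# Toy frame-time arithmetic: DMA mode overlaps the texture transfer
# with geometry setup, while Execute mode pays for texture fetches
# inside the rasterization stage.

def frame_time_dma(geometry_ms, raster_ms, transfer_ms):
    # The prefetch DMA runs concurrently with geometry setup, so it
    # only costs time if it outlasts that stage.
    return max(geometry_ms, transfer_ms) + raster_ms

def frame_time_execute(geometry_ms, raster_ms, fetch_stall_ms):
    # Execute-mode fetches happen at rasterization time, so the chip
    # stalls and the full fetch time is added to the frame.
    return geometry_ms + raster_ms + fetch_stall_ms

# Example: 10 ms geometry, 8 ms rasterization, 4 ms of texture traffic.
print(frame_time_dma(10, 8, 4))      # transfer hidden behind geometry
print(frame_time_execute(10, 8, 4))  # transfer stalls the rasterizer
```

Under these (invented) numbers the DMA-mode frame finishes in 18 ms versus 22 ms for Execute mode, even though both move the same amount of texture data.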
30% Reduction in Accelerator Performance
Intel has developed a software tool which is useful in comparing the performance of AGP vs. Local texturing modes. It is called IBASES. As a matter of principle though, I do not recommend this tool to anyone. IBASES does not support AGP DMA mode texturing. Instead of AGP DMA mode, it substitutes the extremely useless CPU copy mode (method #2 above). Oddly enough, the software still refers to this as "DMA Mode". This is clearly NOT a mistake, but rather a blatant attempt to deceive. For anyone using this tool, results from the "DMA" test should be completely disregarded. One should instead assume that AGP DMA mode results would be about the same as AGP Execute mode.
I evaluated local vs AGP texture performance of the following accelerators:
ATI Rage Pro
3D Labs Permedia 2
nVidia Riva 128
The program allows the user to change the number of times the textures are accessed per frame. This has the effect of gradually increasing the total texture bandwidth demand. As seen over AGP, the texture bandwidth demand created by IBASES in the chart below ranges from 256K per frame at the left extreme, up to 4 megabytes per frame at the right extreme of the chart.
The chart shows the average difference in rendering performance for all of the accelerators using AGP Execute mode texturing, compared to local graphics memory. This data demonstrates that overall, AGP execute mode is about 30% slower than local texture mode.
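For reference, the per-frame demand swept by IBASES translates into sustained bus bandwidth when multiplied by the frame rate. This is my own back-of-the-envelope arithmetic, assuming an illustrative 30 frames per second:

```python
# Per-frame texture demand (MB) times frame rate (fps) gives the
# sustained bandwidth AGP texturing must deliver.

def texture_bandwidth_mb_s(mb_per_frame, fps):
    return mb_per_frame * fps

print(texture_bandwidth_mb_s(0.25, 30))  # left edge of the chart: 7.5 MB/s
print(texture_bandwidth_mb_s(4.0, 30))   # right edge of the chart: 120 MB/s
```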
10% Reduction in CPU Performance
The other side of the performance equation is the CPU. What happens to the CPU when AGP texturing is activated? When AGP texturing is turned on, the graphics card takes control of the main memory bus in order to access texture data. While this happens, the CPU is locked out of main memory. If the CPU is not very busy, this may not be a problem. But if the CPU is engaged in a computationally demanding task (such as a game), it is quite likely that the CPU will stall, waiting for its turn to access main memory.
Right now there is no good way to physically test this scenario. It must be modeled. For this and other reasons, I have built a rather complex software model of the entire PC architecture. Using this model, I am able to estimate the CPU performance impact of main memory arbitration conflicts resulting from AGP texturing.
Intel has publicly released figures that show that a 300MHz P2 requires about 100MB/s of external bandwidth while running 3D Winbench. Third party testing has demonstrated that games can create a CPU bandwidth demand of 50 to 120MB/s from main memory. In the face of this load, AGP texturing could also place a concurrent main memory bandwidth demand of about 50 megabytes per second (or more).
Using this data, my system performance model shows that main memory conflicts between the CPU and the graphics controller will result in a CPU performance reduction of more than 10%. This assumes the use of a Pentium II with the back-side cache intact. The cacheless Covington would be brought to its knees under these circumstances.
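My full model is too involved to reproduce here, but the flavor of the estimate can be conveyed with a deliberately crude sketch. The formula and every number below are illustrative assumptions of mine, not outputs of the actual simulator:

```python
# Crude contention model: assume each CPU memory access that collides
# with an AGP texture transfer is stalled, so the slowdown is roughly
# (AGP share of bus time) x (memory-bound share of CPU time).

def cpu_slowdown(agp_mb_s, bus_mb_s, mem_bound_fraction):
    bus_busy_with_agp = agp_mb_s / bus_mb_s        # chance the bus is taken
    return bus_busy_with_agp * mem_bound_fraction  # extra CPU stall time

# 50 MB/s of AGP texturing on a bus with an assumed ~250 MB/s of
# effective bandwidth, with the CPU memory-bound half the time despite
# its back-side cache:
print(cpu_slowdown(50, 250, 0.5))  # about a 10% CPU slowdown
```

Even this toy model lands in the same neighborhood as the detailed simulation: a roughly 10% CPU penalty from bus arbitration alone.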
The assumption that AGP is a little faster now, but will be a lot faster when it is "really turned on," is completely false. In fact, the opposite is true. In most cases, systems which actually use AGP for textures will be up to 40% slower than systems which keep textures in local graphics memory.
Is this enough motivation to make sure you get a graphics card with enough memory to do the job?
The Deal with the i740
Intel often talks about a "Balanced Architecture". A 3D graphics computer with a small amount of graphics memory is horribly imbalanced. It is interesting to observe that Intel is now manufacturing and selling its own i740 based graphics cards. These cards come in 2 meg or 4 meg configurations only - and there is no option to add more memory. Why would any intelligent user select such a board? Just to save a few dollars on graphics memory?
Though the i740 seems to be a very good product, beneath the surface there are some things about it which are very odd. First and foremost, its drivers have been crippled so that they refuse to allocate any textures in local graphics memory. Any texture allocated by D3D is automatically redirected to AGP space regardless of how much graphics memory is available. With an 8M configuration, about 2M is usually used for the front and back buffers (plus Z sometimes). This leaves a very generous 5 or 6 megabytes for high speed local texture caching. The i740 drivers refuse to make use of this precious resource (even when the game demands it) and instead move all textures into slower AGP memory. It should be no surprise that the i740 is not blowing away the competition in terms of raw performance.
If Intel insists on crippling the i740, there is only one other company that has the ability to fix the situation. That company is Real3D – the developer of the i740 and its drivers (in cooperation with Intel). If Real3D becomes the only i740 board manufacturer that supports local texturing, their product could blow away all of the competing i740 based boards. As an alternative, they could make a good business out of licensing their drivers to other graphics card manufacturers.
In the meantime, the best bet for anyone who cares about 3D performance is to distance yourself from AGP texturing by choosing a high performance graphics card with a generous memory configuration - or at least the option to add a memory upgrade later. If you cannot stop yourself from buying an i740 card, the safest bet may be to buy it from Real3D – then to bombard them with email until they implement local texturing capability in the drivers.