Introduction
I guess I don’t have to explain much about AGP, the ‘Accelerated Graphics Port’, anymore; it is nowadays the state-of-the-art interface between the system chipset and the graphics card. AGP was developed three years ago to allow data transfer between the system and the graphics adapter at a significantly higher bandwidth than PCI. In 1997 graphics cards with 3D-acceleration became not only fashionable, but also pretty common, and those 3D-accelerating graphics cards required much more data from the CPU and memory than their merely ‘2D-accelerating’ predecessors. AGP was born to accommodate those new needs. Here’s an article I wrote about AGP in 1997, which gives you more details about it.
The Theory of AGP-Performance vs. PCI-Performance
Right from the first launch of AGP, the specification allowed two different AGP-speeds, AGP1x and AGP2x. The main differences between AGP and PCI start with the fact that AGP is a ‘port’, which means it can host only one client, and this client ought to be a graphics accelerator. PCI is a bus, and it can serve several different kinds of clients, be it a graphics accelerator, a network card, a SCSI host adapter or a sound card. All those different clients have to share the PCI-bus and its bandwidth, while AGP offers one direct connection to the chipset and from there to the memory, the CPU or the PCI-bus.
The normal PCI-bus is 32 bits wide and clocked at 33 MHz. Thus it can offer a maximum bandwidth of 33 MHz * 4 byte = 133 MB/s. The new PCI64/66-specification offers four times as much: it comes with 64-bit width and a 66 MHz clock, so its bandwidth limit lies at 533 MB/s. However, let’s not forget that PCI64/66 is hardly supported anywhere yet, and it was developed particularly to host I/O-controllers with very high data bandwidth, such as IEEE1394 or Gbit-network interface cards. AGP is clocked at 66 MHz to begin with and is 32 bits wide. This offers a maximum bandwidth of 266 MB/s in the case of AGP1x, where data is transferred the conventional way, on one edge of each clock. AGP2x offers 533 MB/s by transferring data on the rising as well as the falling edge of the clock. The new addition called AGP4x doubles this bandwidth yet again, to 1066 MB/s.
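All of the bandwidth figures above follow from one and the same formula: bus width in bytes, times clock, times transfers per clock. Here is a minimal Python sketch of that arithmetic; the function name and the slightly more precise clock values of 33.3 and 66.6 MHz are my own choices to reproduce the rounded numbers in the text:

```python
def peak_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in MB/s: bytes per transfer * clock * transfers per clock."""
    return round(width_bits / 8 * clock_mhz * transfers_per_clock)

print(peak_mb_s(32, 33.3))       # PCI:      133 MB/s
print(peak_mb_s(64, 66.6))       # PCI64/66: 533 MB/s
print(peak_mb_s(32, 66.6))       # AGP1x:    266 MB/s
print(peak_mb_s(32, 66.6, 2))    # AGP2x:    533 MB/s
print(peak_mb_s(32, 66.6, 4))    # AGP4x:   1066 MB/s
```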
Why AGP?
In the first years of AGP its higher bandwidth was mainly used to get textures for 3D-objects to the 3D-accelerator. Some 3D-accelerators merely took advantage of AGP’s high bandwidth and used it for the same kinds of tasks they would have used PCI for before. Other 3D-chips used ‘AGP-texturing’, which enables the 3D-accelerator to store and leave large textures in main system memory and use them for the rendering process directly from there, without storing those textures in its local graphics memory. This is certainly still an issue today, but the demands for AGP4x came from a different corner of the 3D-rendering process: the transfer of triangle-data of complex 3D-objects. Before a 3D-scene goes through the transform and lighting stage, the objects in this scene need to be known to the renderer. The more detailed these objects are, the more vertices have to be transferred. NVIDIA’s GeForce, as the first 3D-accelerator with integrated transform and lighting, can process a huge number of triangles, but before it can start, the data needs to be transferred to GeForce, which is obviously done over the AGP.
Benchmarking the AGP
This fact obviously needs to be considered when benchmarking the AGP as well. AGP-benchmarks from a few years ago did nothing but display scenes that used huge textures, trying to saturate the AGP with large texture data streams. Those benchmarks were hardly able to show much of a difference between AGP1x and AGP2x even back then, and they certainly can’t show the performance advantage of AGP4x today. This is why you need different techniques to saturate the AGP. The best way to show AGP-performance today is with 3D-scenes containing very complex objects, which use the AGP to transfer huge amounts of triangle data. You will see that in the benchmark results below. However, today’s 3D-games don’t use anywhere near enough polygons to saturate AGP4x. Again we’ll have to wait for ‘upcoming titles’. For the time being it is mainly professional OpenGL-software that uses very complex 3D-objects, and thus this software is most suitable to take advantage of AGP4x right now.
Issues Combined with AGP
If you have read my good old article ‘AGP – A New Interface for Graphic Accelerators‘ you may recall that back then I demanded a 100 MHz memory bus to supply enough bandwidth for the AGP and the other parts of a system that require memory access at the same time. Today the demands are of course even higher. The AGP’s data bandwidth can only be used fully if the system has ample memory bandwidth. The memory is permanently accessed by several system devices at the same time: the CPU, PCI-masters, DMA-devices and the AGP. If the AGP is to supply its full bandwidth, the memory bandwidth needs to be at least as high as the AGP-bandwidth, since the memory is where the data for the AGP-device comes from under most circumstances. In the case of AGP4x and its 1066 MB/s, at least PC133-memory is required, which offers exactly the same bandwidth of 64 bit times 133 MHz = 1066 MB/s. Remember, however, that the AGP never has the memory bandwidth at its own disposal; it has to share it with the rest of the system, so AGP4x can only live up to its full capacity when the system is using either RDRAM or the upcoming DDR-SDRAM. One PC800 RDRAM-channel, as used in platforms with Intel’s 820-chipset, supplies 1.6 GB/s; PC200 DDR-SDRAM offers the same; PC266 DDR-SDRAM raises that to 2.1 GB/s; and finally two PC800 RDRAM-channels, as found in platforms with Intel’s 840-chipset, supply as much as 3.2 GB/s. Platforms with one of those memory types will show better performance than PC100 or PC133-platforms as software starts to make use of AGP4x.
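The memory figures above follow the same width-times-clock arithmetic. A short Python sketch (the function name and the effective-clock values are my own illustration) compares each memory type against the roughly 1066 MB/s that AGP4x alone can demand, which makes the sharing problem obvious:

```python
def mem_mb_s(width_bits, eff_clock_mhz, channels=1):
    """Peak memory bandwidth in MB/s: bus width in bytes * effective clock * channels."""
    return round(width_bits / 8 * eff_clock_mhz * channels)

AGP4X_DEMAND = 1066  # MB/s, peak AGP4x transfer rate

memories = {
    "PC133 SDRAM":           mem_mb_s(64, 133.3),    # 64-bit bus at 133 MHz
    "PC800 RDRAM (i820)":    mem_mb_s(16, 800),      # one 16-bit Rambus channel
    "PC266 DDR-SDRAM":       mem_mb_s(64, 266),      # 64-bit, double data rate
    "2x PC800 RDRAM (i840)": mem_mb_s(16, 800, 2),   # two Rambus channels
}
for name, bw in memories.items():
    # headroom left for the CPU, PCI-masters and DMA once AGP4x takes its share
    print(f"{name}: {bw} MB/s, headroom {bw - AGP4X_DEMAND} MB/s")
```

PC133 covers AGP4x on paper but leaves essentially nothing for the rest of the system, which is exactly the point made above.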
Fast Writes, a Unique Feature of GeForce
One of the special features of NVIDIA’s GeForce256 graphics accelerator is its unique support for ‘Fast Writes‘. The idea behind this implementation is to improve data transfers that go directly from the CPU to the graphics chip, which obviously does not touch such things as ‘AGP-texturing’. 3D-software with very complex 3D-objects requires that the CPU transfers a huge amount of triangle-data to the graphics chip, and here ‘Fast Writes’ avoid the stalling detour from the CPU to memory and then from memory to the 3D-chip. The idea of ‘Fast Writes’ is to connect CPU and 3D-chip directly. So much for the theory; please look at NVIDIA’s white paper for more detail. Currently ‘Fast Writes’ are only made available on platforms using either Intel’s 820 or 840 chipset. Other AGP4x-chipsets, like VIA’s Apollo Pro 133 Slot1-chipset and VIA’s Apollo KX133 SlotA-chipset, are currently not supported by GeForce’s drivers. Further down in this article you will find out why this is actually a blessing right now, since the driver seems to have some problems with Fast Writes, leading to a rather significant drop in performance on i820 and i840-systems.
AGP and WindowsNT
After describing the hardware-facts of AGP I should not forget to mention that AGP requires a bit of software as well. As you might recall, AGP offers the graphics chip fast access to system memory for several purposes, AGP-texturing being one of them. The operating system has to be aware of that and hand memory resources over to the driver of the graphics card. The GART (graphics address remapping table) is where these memory resources are listed, and the GART-driver is the software that takes care of it. Today all graphics card drivers for Windows95 and Windows98 include the GART-driver for platforms with Intel-chipsets, called ‘vgartd.vxd’. The other chipset vendors have to supply their own GART-driver with the software that comes with the motherboard. An Athlon-system, for example, is not even able to recognize its AGP-graphics card unless you’ve installed this driver: the driver-file is called ‘amdmp.sys’ for the AMD750-chipset or ‘viagart.vxd’ for VIA’s Apollo KX133 chipset.
Microsoft’s soon-to-be-replaced, but still heavily used operating system Windows NT was actually never meant to offer AGP-support. There is no GART-driver in any of the many service pack updates for NT, so graphics-chip vendors were left on their own to supply AGP-support under Windows NT. This AGP-support may or may not be implemented in the NT-driver of a graphics card; you can only tell by using some special detection software or by benchmarking under NT. So far I have tested the AGP-support under Windows NT only for NVIDIA graphics chips and found that TNT, TNT2 and GeForce have AGP-support, but usually only on platforms with an Intel chipset. Platforms with other chipsets can only take advantage of the so-called ‘PCI66’ mode under NT, which offers a data bandwidth of not quite AGP1x. The latest, though not official, exception to this rule is currently only VIA’s new Athlon chipset Apollo KX133, which runs GeForce at full-blown AGP4x even under Windows NT. This will hopefully all improve with Windows2000.
Benchmarks for AGP-Testing
As I’ve already mentioned above, AGP-benchmarks are not that easy to come by, and most games don’t take much advantage of anything faster than AGP2x. It’s also important to be aware of some more restrictions when using 3D-games for AGP-benchmarking. The test is obviously supposed to have AGP as its bottleneck, so that you get different results at different AGP-speeds. It’s therefore rather helpful to use fast CPUs if you want to avoid the CPU becoming the bottleneck. It’s also important to avoid the graphics card becoming the bottleneck, typically in the form of a fill-rate or local-memory-bandwidth restriction. Thus the 3D-gaming benchmarks should not run at too high a screen resolution or color depth, because that’s when the fill-rate or memory-bandwidth restrictions kick in.
After some trial and error I found that the ‘High Polygon, 1 Light’-benchmark within 3DMark2000 is a very revealing AGP-benchmark. NVIDIA’s SphereMark, found on NVIDIA’s website, also produces valid results for this purpose. The only 3D-game I used was the widely available Quake 3 Arena from Id-Software, which I ran at three different settings: the ‘Normal’ setting, the ‘High Quality’ setting, and a setting derived from ‘High Quality’ with the resolution increased to 1024×768 pixels. It also turned out that SPEC’s ‘viewperf’ OpenGL-benchmark produced very helpful results. I ran it under Windows 98, which is rather odd for this benchmark, but because of the above-mentioned restrictions of GeForce’s NT-driver I was not able to adjust the AGP-speeds under Windows NT.
Adjusting the AGP-Speed
Adjusting the system so that it would run at different AGP-speeds is a rather difficult task; the procedure varied for almost every platform I used.
- Athlon-platform with AMD750 Irongate-chipset
AMD750 or ‘Irongate’ is supposed to offer AGP1x and AGP2x, but you are certainly aware by now that on those systems GeForce is running only at AGP1x by default. There is a switch in the driver however, which enables AGP2x on Irongate. It is an entry into the system registry of Windows 95 or Windows 98:
[HKEY_LOCAL_MACHINE\Software\NVIDIA Corporation\Global\System]
“EnableIrongate2x”=dword:00000001
Setting it to ‘1’ as above results in AGP2x-mode; removing the entry completely resets back to AGP1x. For some reason I could set this entry to ‘0’ and AGP2x would still be running. Please don’t forget to restart your system before a change in the registry takes effect. I’d also like to note that most Irongate-systems will sooner or later hang in 3D-games if you keep AGP2x enabled.
- Athlon-platform with VIA’s Apollo KX133-chipset
It’s very nice and easy in the case of this chipset, because you can switch between AGP1x, 2x and 4x within the BIOS-setup. For some reason the registry-entries mentioned below don’t have any impact on the AGP-speed in the case of KX133.
- Pentium III-platform with Intel’s 820 and 840-chipsets
There is another registry-entry for the GeForce-driver that impacts the AGP-speed:
[HKEY_LOCAL_MACHINE\Software\NVIDIA Corporation\Global\System]
“ReqAGPRate”=dword:00000004
Setting this entry to ‘4’, ‘2’ or ‘1’ switches between the different AGP-speeds. Somewhere on the web I found somebody proclaiming the setting ‘256’, or hex ‘100’, as a helpful adjustment, and I would like to advise you that this setting has only one effect: it confuses the GeForce-driver and slows down system performance tremendously. Again, you need to restart your system before the setting takes effect, and this setting doesn’t have any influence on KX133-platforms.
Benchmark Setup
Hardware Information | |
CPU | AMD Athlon 800, Pentium III 800EB |
KX133 Motherboard | VIA VT5249B1 Reference Board, BIOS date Jan 4, 2000 |
Irongate Motherboard | Gigabyte GA-7IX, BIOS Dec. 1999, Super Bypass enabled. The previously used Asus K7M-board without Super Bypass was not part of this test; I apologise for this mistake. |
Intel 820 Motherboard | Asus P3C-L, BIOS 1012 beta 3 |
Intel 840 Motherboard | Intel OR840, BIOS OR840600.86E.0207.P02. |
Memory for Athlon boards | 128 MB Micron/Crucial Technologies PC133 CAS2 |
Memory for PIII-boards | 2 x 64 MB Samsung PC400 RDRAM RIMMs |
Network | Netgear FA310TX |
Graphics Card | NVIDIA GeForce256 DDR Reference Board |
Driver Information | |
GeForce256 | NVIDIA Reference Driver rev. 3.68 |
KX133 Chipset Drivers | VIA 4in1 4.19 |
Environment Settings | |
OS Version | Windows 98 SE 4.10.2222 A |
DirectX Version | 7.0 |
Screen Resolution for SPECviewperf | 1280x1024x32 |
Refresh Rate for all resolutions | 85 Hz |
SPECviewperf version | 6.1.1 |
AGP-Speed Benchmark Results – 3DMark2000
This test sends a large stream of triangle-data to GeForce, thus taking advantage of AGP. Keeping the number of light sources as low as possible makes sure that GeForce’s lighting-engine doesn’t become the bottleneck in this benchmark. You can see that there’s a vast difference between AGP1x and AGP2x. The step from AGP2x to AGP4x, however, is rather small, probably due to a limitation of either the CPU, GeForce’s transform-engine or its fill rate.
AGP-Speed Benchmark Results – SphereMark
The results of SphereMark are very similar to the ones above. Again this benchmark mainly sends many triangles over the AGP to the GeForce-chip. It’s easy to see that AGP1x definitely doesn’t cut it.
AGP-Speed Benchmark Results – Quake 3 Arena
Things change when testing with Q3A at the ‘Normal’ setting, which runs at only 640×480 with 16-bit color, low texture detail and fewer polygons. Here even AGP1x is still sufficient; in the case of i840 you can’t see any difference between the AGP-speeds at all.
In the ‘High Quality’ setting the game is a lot more detailed and at least some effect of the AGP-speed kicks in. If 4 fps are important to you, you might wonder why AMD claims that there’s virtually no difference between AGP1x and AGP2x on Irongate, but I do admit that less than 4% is not exactly a huge difference.
At 1024×768 GeForce’s fill rate or local memory bandwidth starts to even things out; the results hardly differ at all.
AGP-Speed Benchmark Results – SPECviewperf 6.1.1
For people who use OpenGL-software professionally these results could be rather meaningful. Under Windows NT or the upcoming Windows2000 the results should be very similar. However, please don’t forget that you currently don’t have any way of switching to AGP2x on Irongate-platforms under NT.
Advanced Visualizer may not care much about AGP4x, but it definitely shows a big difference between AGP1x and AGP2x. Again I wonder what AMD has to say about the Irongate-results, since a performance difference of more than 20% is definitely noticeable.
The CAD-software Design Review doesn’t show much of a difference between the AGP-speeds, although there’s at least some improvement when you switch from AGP1x to AGP2x.
Data Explorer handles AGP-speed pretty much the same as Design Review above. AGP1x really shouldn’t be used, but more than AGP2x doesn’t seem to be required.
AGP-Speed Benchmark Results – SPECviewperf 6.1.1, Continued
Lightscape is again rather unimpressed by AGP4x, but at least on i840 you can notice quite a difference between AGP1x and AGP2x.
Last but not least, the well-known ProCDRS-benchmark for ProEngineer shows quite a clear advantage when AGP4x is used. AGP4x seems worth considering for ProEngineer users.
Summary of AGP-Speed Results
The benchmarks have shown that even now AGP-speed doesn’t seem to have much of an impact on the current crop of 3D-games. For gamers, AMD might get away with its claim that AGP1x is just as good as AGP2x as long as the difference is only 4%. This will only remain the case until 3D-games with detailed objects, and thus high polygon counts, finally become available, as NVIDIA has been promising us for quite a while. Those games will certainly show a difference between the different AGP-speeds; the first two benchmarks should give you a pretty good idea of it.
Professional users of OpenGL-software should avoid the Athlon/Irongate-combination if they are planning to use a 3D-card with NVIDIA’s GeForce256-chip, such as the Quadro-cards. At the moment it’s close to impossible to run any GeForce or Quadro card on Irongate under NT at anything more than PCI66-speed. Athlon/KX133 or a Pentium III on an expensive RDRAM-platform are the smarter choices.
‘Fast Writes’ on Platforms with i820 and i840
It seems as if NVIDIA is currently having some problems with ‘Fast Writes’. This feature is obviously supposed to improve performance, but currently the system runs faster without it. Unfortunately, ‘Fast Writes’ are enabled by default in GeForce’s driver. There is a way, however, to disable them, at least under Windows 95/98.
Another registry switch takes care of the Fast Writes:
[HKEY_LOCAL_MACHINE\Software\NVIDIA Corporation\Global\System]
“ReqAGPFW”=dword:00000000
Again, you need to restart your system to make the setting effective. The same person who published the hoax with the AGP-speed switch also advises turning this feature on by entering ‘1’. This doesn’t have any impact on platforms with chipsets other than i820 or i840, and it’s the default setting anyway, so you can definitely save yourself the time of entering this setting in ‘enabled’ mode into your registry. You will only get the quite noticeable performance boost by disabling this feature, by entering ‘0’ as shown above.
Here are some benchmarks that show the difference:
‘Fast Writes’ on Platforms with i820 and i840 – 3DMark2000
‘Fast Writes’ on Platforms with i820 and i840 – SphereMark
‘Fast Writes’ on Platforms with i820 and i840 – Quake 3 Arena
‘Fast Writes’ on Platforms with i820 and i840 – SPECviewperf 6.1.1
You can certainly see what a remarkable difference this little setting makes.
I informed NVIDIA about these findings last week, so I’m sure you can expect a driver-update that takes care of this issue very soon.