Introduction
When AMD released the first Athlon processor in August 1999, we were already told that its point-to-point processor bus, Alpha’s EV6, supported multiprocessor configurations. It took almost two years, however, until AMD finally released a chipset that supports more than one Athlon processor. Today AMD is releasing the AMD760MP chipset, which supports two AthlonMP processors. These are not based on the well-known Thunderbird core, but on the new Palomino core, which was recently released in a mobile version. This means AMD is launching a second product today as well: the new AthlonMP processor.
Because of tight time constraints around Computex 2001, and because AMD was unable to provide me with sufficient information in time, this article will be shorter and less detailed than what you are used to as a Tom’s Hardware Guide reader. I had to decide which product I considered more important, and I chose NVIDIA’s groundbreaking nForce chipset. Don’t be too disappointed, though. I still benchmarked AMD760MP and AthlonMP for four days without a break and can provide you with a rather large amount of benchmark data, including a performance evaluation of a single Palomino compared with a single Thunderbird. You will simply have to do without the usual in-depth technical information. I will provide it next week, once Computex 2001 is over.
The AMD760MP Chipset
First, I have to note that the 760MP is not a completely new chipset. Only the AMD762 north bridge, which communicates with the two processors, the system memory and the AGP, is a new design. The AMD766 south bridge that completes the 760MP chipset is already known from the previous AMD760 chipset. This south bridge is of classic design: it connects to the AMD762 via the PCI bus and provides the common features, such as an ATA100 interface, 4 USB ports, interfaces for the serial and parallel ports and the SMBus controller. The AMD762 north bridge is the interesting and obviously important part of the 760MP, as it hosts the two AthlonMP processors that run in SMP configuration.
The AMD762 provides two EV6 buses, one for each AthlonMP, and a DDR-only memory controller that runs the memory at the same clock as the processor bus, either 100 MHz (200 MHz DDR) or 133 MHz (266 MHz DDR). Like its older brother, the AMD760 chipset, the 762 does not support PC100 or PC133 SDRAM. Tyan’s K7 Thunder motherboard, which is currently the only AMD760MP platform available on the market, requires registered DDR DIMMs. This DIMM type comes with additional buffers that reduce the load on the memory controller. The K7 Thunder does not run with common unbuffered DDR DIMMs, although the AMD760MP specification allows up to two of them. In a few weeks or months, we will see low-end AMD760MP motherboards that will probably support two common unbuffered DIMMs as well. The AMD762 supports the typical ECC goodies, including ‘memory scrubbing’, as you would expect from a workstation/server chipset. The final feature that the 762 has over the AMD760’s single-Athlon 761 north bridge is support for 64-bit PCI, or PCI64 for short. While PCI64 is rather uninteresting for desktop and most workstation users, it can be important for servers that need to accommodate cards with high I/O loads.
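To put the 100 MHz and 133 MHz DDR clocks into bandwidth terms, here is a minimal Python sketch of theoretical peak memory bandwidth. It assumes the standard 64-bit (8-byte) DIMM data path and two transfers per clock; the function name is purely illustrative. The results line up with the PC1600 and PC2100 labels of the modules the K7 Thunder accepts.

```python
def ddr_peak_bandwidth_mb_s(bus_clock_mhz: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak DDR bandwidth in MB/s (1 MB = 10**6 bytes)."""
    mega_transfers_per_s = bus_clock_mhz * 2  # DDR moves data on both clock edges
    bytes_per_transfer = bus_width_bits / 8   # 64-bit DIMM -> 8 bytes per transfer
    return mega_transfers_per_s * bytes_per_transfer


for clock_mhz in (100, 133):
    print(f"{clock_mhz} MHz ({clock_mhz * 2} MHz DDR): "
          f"{ddr_peak_bandwidth_mb_s(clock_mhz):.0f} MB/s")
```

This is a peak figure only; real-world memory throughput is lower. The nominal 133 MHz clock is really 133.33 MHz, which is why the 266 MHz DDR modules are sold as PC2100.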
The One And Only AMD760MP Motherboard
Unlike most chipset launches, only one motherboard with the AMD760MP chipset is available right now. Tyan worked together with AMD on the board design, and in return Tyan gets a head start: other board makers will need at least another month before they can supply their own AMD760MP solutions. Tyan’s K7 Thunder motherboard is a real high-end monster with tons of nifty features, but it also carries a hefty price tag of $700-800, which puts it out of reach for sane but enthusiastic Athlon SMP fans.
Here’s the long feature list:
- 2 SocketA plus 2 VRMs
- 1 AGP Pro Slot
- 5 PCI64/32 Slots
- 4 DIMM Slots for PC1600/2100 registered DDR-SDRAM
- ECC-support
- 2 3Com 3C920 10/100 Mbit LAN controllers with two WOL headers
- Adaptec AIC-7899 Dual Channel Ultra160 SCSI Controller
- Optional ATi Rage XL graphics chip with 4 MB video memory
Of course, it also comes with the usual floppy controller, serial and parallel ports and an ATA100 IDE controller.
The board ran stably with single and dual AthlonMP processors, with single and dual Athlon processors based on the Thunderbird core, and with single and dual Duron processors. It even supported the ‘abuse’ of two Thunderbird Athlons with different clock speeds. I tested the K7 Thunder with an Athlon 1200 plus an Athlon 1333, and it ran both processors at their respective speeds without major problems, even showing improved scores in SMP-capable benchmarks. However, neither 3DMark2001 nor Sysmark2001 Office Productivity would run properly with this configuration. We should note that, in theory, Athlon’s point-to-point processor bus does allow two differently clocked Athlon processors to operate together.
The Tyan K7 Thunder requires a special 460W power supply whose board connectors differ from those of common ATX power supplies.
Currently there is only one supplier of this power supply, Delta Electronics Inc. The K7 Thunder does not run with common ATX power supplies.
AthlonMP
We have seen pictures of the Palomino core before, in the form of the mobile Athlon4 processor. AthlonMP uses the same core as Athlon4, but it makes no use of the PowerNow! power-saving features. The core voltage of our AthlonMP 1200 test samples was a surprisingly high 1.75 V. We know the new features of the Palomino core from our previous Athlon4 Mobile article, but I’ll sum them up once more:
- New design that reduces power consumption by 20% compared with the Thunderbird core
- Implementation of the full Intel SSE instruction set. The SSE processor flag is set so that software can recognize AthlonMP as an SSE-capable processor. AMD calls its SSE implementation ‘3DNow! Professional’.
- Automatic hardware data prefetch unit
- L1 Data TLB (Translation Look-Aside Buffer) was increased from 32 to 40 entries, the architecture of the data and instruction L1 and L2 TLBs was made exclusive and TLB-entries can be written speculatively.
- Implementation of a thermal diode to monitor processor temperature
It is a bit disappointing that AMD is only releasing AthlonMPs at 1 and 1.2 GHz, but validation for SMP operation is obviously a lot tougher than for single-processor Athlons, and the relatively low clock speeds seem to be required for reliable SMP operation.
We learned from AMD’s presentation that the above features should improve Palomino’s performance over the previous Thunderbird Athlons, and the benchmark results further down confirm this. In terms of compatibility, I found that many current SocketA motherboards will boot an AthlonMP, but the majority of boards didn’t recognize the Palomino core, which resulted in unstable operation. Only Asus’ A7V133 reported ‘Athlon H-series Processor’ and ran AthlonMP reliably. Unfortunately, I didn’t benchmark AthlonMP on this board, since it only supports PC133 SDRAM and therefore can’t show Palomino’s full potential. The motherboards that didn’t recognize Palomino also somehow masked the SSE flag, so that software was unable to detect Palomino’s SSE capability. Only the Asus A7V133 and, of course, Tyan’s K7 Thunder correctly reported the SSE feature flag as set.
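Software decides whether a processor is SSE-capable by executing the CPUID instruction with function 1 and testing bit 25 of the EDX register. Python cannot issue CPUID by itself, so this minimal sketch assumes the EDX value has already been read (for example by a small native helper, not shown) and only performs the bit test. The function name and register values are hypothetical, made up for illustration.

```python
# CPUID standard function 1 returns feature flags in EDX;
# bit 25 is the SSE flag that software checks before using SSE code paths.
SSE_FLAG_BIT = 25


def reports_sse(cpuid1_edx: int) -> bool:
    """Return True if the given CPUID(1).EDX value has the SSE flag set."""
    return bool((cpuid1_edx >> SSE_FLAG_BIT) & 1)


# Hypothetical register values, not dumps from the boards tested here.
edx_flag_visible = (1 << SSE_FLAG_BIT) | 0x1  # SSE flag plus bit 0 (on-chip FPU)
edx_flag_masked = 0x1                         # same CPU, SSE flag hidden

print(reports_sse(edx_flag_visible))  # True
print(reports_sse(edx_flag_masked))   # False
```

A board whose BIOS leaves the flag cleared produces the second case: the silicon supports SSE, but software that relies on the flag falls back to its non-SSE code path.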
Overall, I expect that many current SocketA motherboards will support Palomino once they are equipped with the proper BIOS. Until then, I would suggest buying a faster Thunderbird Athlon instead, since AthlonMP is more expensive than the ‘normal’ Athlon with Thunderbird core. AMD supplied us with the following pricing information:
$265 for AthlonMP 1.2 GHz (OEM price for 1000 unit quantities)
$215 for AthlonMP 1.0 GHz (OEM price for 1000 unit quantities)
AMD SMP Technology – SmartMP
The SMP capability of Athlon’s architecture was obvious from the day AMD disclosed that Athlon would use Alpha’s EV6 processor bus. Athlon’s processor bus is a point-to-point bus, so a multiprocessor configuration needs a separate processor bus for each CPU. This increases motherboard manufacturing costs, but it ensures full data bandwidth for each Athlon processor. Intel’s processors, including the recently released Xeon 4, use a shared bus architecture, which keeps motherboard production costs down but has the disadvantage that the processors have to share the processor bus bandwidth.
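The trade-off can be sketched in a few lines of Python. This is a deliberately simplified model, assuming both designs use a processor bus of the same peak bandwidth (real Intel and AMD buses differ in clock and width) and that every CPU is busy at once; the function name and the bus figure are illustrative assumptions, and the result is an upper bound per CPU.

```python
def per_cpu_peak_bandwidth(bus_mb_s: float, cpus: int, shared_bus: bool) -> float:
    """Best-case processor-bus bandwidth per CPU when every CPU is busy at once.

    shared_bus=True models one front-side bus split among all CPUs;
    shared_bus=False models point-to-point links, one full bus per CPU.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return bus_mb_s / cpus if shared_bus else float(bus_mb_s)


# Illustrative figure: a 64-bit bus at 133 MHz DDR, about 2128 MB/s peak.
BUS_MB_S = 133 * 2 * 8

for cpus in (1, 2):
    print(cpus, "CPU(s):",
          "shared", per_cpu_peak_bandwidth(BUS_MB_S, cpus, True),
          "| point-to-point", per_cpu_peak_bandwidth(BUS_MB_S, cpus, False))
```

Point-to-point links only remove contention on the processor bus itself. On the AMD760MP, both EV6 buses still end at the AMD762’s single DDR memory controller, so the two CPUs continue to share memory bandwidth.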
Due to time constraints, I am supplying additional information about AMD’s SmartMP in the form of an AMD paper. I apologize and promise to give my own evaluation at a later point.
In short, AMD’s SMP solution has some clear performance advantages over Intel’s SMP architecture, but the dedicated processor buses make motherboard design more complex and expensive. We will have a look at our benchmarks later to see whether dual AthlonMP is indeed able to show an edge over Intel’s dual Xeon 4.
The Benefits Of SMP
Multiprocessor systems have been common in servers and complex workstations for a long time. So far, however, the average home user has been largely unable to benefit from SMP. The first reason why the average PC user won’t be able to take much advantage of a multiprocessor system is the operating system. The most common OS today is Windows98; before that it was Windows95, and many people today are using WindowsME. None of these three operating systems supports more than one processor. If you want to get something out of a multiprocessor system, you need Windows 2000, Windows NT, BeOS or one of the UNIX OSes, such as Linux or FreeBSD.
Then you need either software that is written for multithreading, because normal single-threaded software can’t keep more than one processor busy, or you have to be a person who runs many performance-intensive programs at the same time. A single-processor system can run Excel, Word and PowerPoint at the same time without needing, or benefiting from, an additional CPU. However, if you want to zip large directories or burn CDs while working with Word, for example, you will indeed benefit from two processors, since one processor will be busy zipping or burning the CD while the other is available to you for other work.
If you neither use special workstation software designed for multiprocessor operation, such as 3D rendering or CAD, nor run more than a few low-profile programs at the same time, it is better to spend your money on one fast CPU than on two slower CPUs. This is particularly important for 3D gamers. Only Quake 3 and games based on the Quake 3 engine are currently able to take advantage of a multiprocessor system. All other 3D games run just as fast with one CPU as with two, unless you are running some other processor-intensive software in the background.
The verdict remains the same: the majority of PC users will not benefit from SMP systems. Performance-crazy Quake 3 Arena gamers should have a good look at our benchmarks, though.
Benchmark Setup
Hardware
CPU | AthlonMP 1.2 GHz, Athlon 1.2 GHz, Athlon 1.33 GHz, Pentium 4 1.5 GHz, Pentium 4 1.7 GHz, Xeon 4 1.7 GHz
Motherboard | MSI K7 Master, BIOS 1.2; Tyan K7 Thunder, pre-release BIOS; Asus P4T, BIOS 1.05beta1; no-name Intel OEM i860 board, pre-release BIOS
Memory | 256 MB unbuffered Infineon PC2100 DDR SDRAM 2-2-2 (AMD760); 256 MB registered Corsair PC2100 DDR SDRAM 2-2-3 (AMD760MP); 256 MB Samsung PC800 RDRAM (i850/i860)
Hard Drive | IBM DTLA 307030, 30 GB, 7200 RPM, ATA100, FAT32 (Win98) / NTFS (Win2k)
Network Card | NetGear FA310TX
Graphics Card | NVIDIA GeForce 2 Ultra Reference Card, Driver 12.40 (Win98/Win2k)
Software and Settings
Operating System | Windows 98 SE / Windows 2000 Professional Service Pack 2
Desktop Resolution for BAPCo’s Sysmark 2000 and Webmark2001 | 1024x768, 16-bit color, 85 Hz
Quake 3 Arena | Retail version, no sound
3DMark2001 | Default benchmark
SiSoft Sandra Standard | Version 2000.3.6.4
FlasK Settings | Video codec: DivX 3.11 alpha, Fast-Motion, keyframe every 10 seconds, compression 100, data rate 910 kbps; Audio codec: audio not processed; Video resolution: 720×480, 29.97 fps, interlaced; Resizing: Nearest Neighbor
Windows 2000 Results – Sysmark 2001
The new Sysmark2001 is finally able to show the benefits of SMP systems, because it runs several applications simultaneously, even though the majority of the applications are not designed for SMP operation.
Dual Xeon 4 1.7 GHz clearly leads the pack, but we should not forget that dual AthlonMP has a hefty 500 MHz clock speed disadvantage. Still, you can see that both single and dual AthlonMP perform better than single and dual Athlon.
The internet content creation part of Sysmark2001 is clearly ruled by Pentium 4 and Xeon 4 1.7 GHz, but you can see a major performance increase of the Athlon systems once you switch from single to dual processor operation.
In office productivity, Xeon 4 1.7 GHz scores surprisingly well, but you can see that office software doesn’t benefit much from dual-processor systems, even if you run several programs at the same time.
Windows 2000 Results – 3D Studio Max R3
I am aware that 3D Studio Max R4 is already available, but unfortunately we were not able to get this software early enough to include it in our tests. R3 will do just as well. This time we rendered the complex 3D scene displayed above, which took many minutes to complete.
This is the chart for the render time. Please note that lower is better here! It is obvious that 3D rendering is still ruled by the powerful FPU of AMD processors. Dual AthlonMP is the clear winner, while you have to wait almost 4 minutes longer for dual Xeon 4 1.7 GHz to finish rendering the same scene.
This chart shows pretty much the same thing, but it is for people who don’t understand that a shorter bar can sometimes mean a better result. Once again, dual AthlonMP leads, with a small edge over dual Athlon. Intel’s Pentium 4 / Xeon 4 gang is not able to compete with any of the AMD scores.
Windows 2000 Results – CINEMA 4D
CINEMA 4D is another 3D rendering application.
This benchmark is clearly ruled by the dual-processor systems. All three dual-processor systems score almost alike, which makes you wonder whether they are limited by some other bottleneck.
Windows 2000 Results – Quake 3 Arena
Quake 3 Arena has the famous ‘r_smp’ switch that enables multithreading, and you can see that the scores of the dual-CPU systems improve significantly. We know that the Pentium 4 architecture, which is also found in Xeon 4, scores particularly well in Quake 3, which is why dual Xeon 4 is the clear winner of this contest.
Things change, however, once the processor-intensive NV15 demo is used. Now dual AthlonMP is able to get an edge over dual Xeon 4. You should also note the tremendous relative increase in frame rate from single to dual processor operation.
Windows 2000 Results – FlasK MPEG4 Encoding
The latest version of FlasK does indeed have SMP-support, as you can see in the results below.
Dual Xeon 4 is winning this comparison with ease, as we would have expected from a Pentium 4 architecture. However, the performance increase of the dual AthlonMP compared to single is very impressive once more.
Windows 2000 Results – 3DMark2001
We didn’t expect any SMP support in 3DMark2001, but it is important to show the results, so that you can see it for yourself.
All processors score pretty much alike, proving that 3DMark2001 has become a pure 3D card benchmark, at least as long as you run it at default settings. The single Xeon 4 configuration did not run, apparently due to unstable operation of a single CPU on the dual-processor motherboard.
Windows 2000 Results – SiSoft Sandra 2001 Pro SE
I am including these numbers for geeks only, because they don’t have much real-world meaning. Still, it’s good to check them out. Again, the single Xeon 4 would not run Sandra reliably.
Well, from the Flops and the Mips point of view, dual Xeon 4 1.7 GHz is the big winner.
The multimedia performance of the three dual processor systems is pretty much identical, making this test disappointingly useless for this comparison.
It is interesting to see the increase in memory bandwidth of the dual-processor Athlon(MP) systems. As usual, the RDRAM platforms of Pentium 4 and Xeon 4 are scoring the highest numbers in this test.
Windows 2000 Results – Dual Processor Benefit
Now I am coming to the most interesting chart in this comparison. How much do you actually gain from using two processors instead of one? Is the benefit all the same, or are there differences?
With the exception of CINEMA 4D, all results show the same trend. AthlonMP gets by far the most benefit out of dual-processor operation, Athlon comes in second and Xeon 4 is the loser. This shows the high potential of dual AthlonMP: Xeon 4 will have a hard time once AMD is able to supply AthlonMP at competitive clock speeds. It’s also interesting to see that Athlon with Thunderbird core is indeed not able to benefit from dual operation as much as Palomino. This is most likely due to one of the enhancements implemented in the Palomino core.
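The text does not state exactly how the dual-processor benefit is calculated. One plausible way, assumed here, is to express the dual-CPU result relative to the single-CPU result, inverting the ratio for time-based results such as the 3D Studio Max render time, where lower is better. The function name and the numbers are hypothetical.

```python
def dual_cpu_gain_percent(single: float, dual: float, lower_is_better: bool = False) -> float:
    """Percentage improvement from one CPU to two for a single benchmark.

    Scores and frame rates (higher is better): dual / single - 1.
    Render or encode times (lower is better): single / dual - 1.
    """
    if single <= 0 or dual <= 0:
        raise ValueError("results must be positive")
    ratio = single / dual if lower_is_better else dual / single
    return (ratio - 1.0) * 100.0


# Made-up numbers, not taken from the charts in this article.
print(dual_cpu_gain_percent(100.0, 150.0))                        # frame rate: 50.0
print(dual_cpu_gain_percent(600.0, 400.0, lower_is_better=True))  # render time: 50.0
```

Inverting the ratio for times makes both kinds of result mean the same thing: in each example above, the dual-CPU system gets 50% more work done per unit of time.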
Windows 98 Benchmarks [Updated]
I am supplying these benchmarks only to show you the performance difference between the Thunderbird and Palomino cores, since Windows 98 cannot take advantage of dual processor operation. They are just meant to give you an idea of how the future single-CPU Palomino will perform compared with the current Athlon with Thunderbird core.
Windows 98 Results – Sysmark2001
Before you look at the numbers, I would like to mention that I am extremely disappointed with the consistency of BAPCo’s new Sysmark. Especially under Windows98, results vary by up to 8%. Even under Windows2000 the variance is too high to use it for, e.g., motherboard testing. This is why you have to take these results with a grain of salt, even though we repeated each Sysmark2001 run (which takes about an hour) at least three times. As much as I like the benchmark for its SMP capability, I dislike it for its inconsistent results.
I think we did enough test runs of Sysmark2001 under Windows98 to ensure that Palomino is indeed scoring those 3 points higher than Thunderbird, which is not really that much though.
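With run-to-run variation of several percent, a small gap between two processors has to be checked against the noise. Below is a minimal sketch of one conservative approach, assuming three or more runs per configuration: report each set’s spread, and accept a gap only if the two sets of runs don’t overlap. The function names and scores are hypothetical; this is not necessarily how BAPCo or the author evaluated the results.

```python
from statistics import mean


def spread_percent(runs: list[float]) -> float:
    """Range of repeated benchmark runs as a percentage of their mean."""
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    return (max(runs) - min(runs)) / mean(runs) * 100.0


def gap_exceeds_noise(runs_a: list[float], runs_b: list[float]) -> bool:
    """Conservative check: True only if the two sets of runs do not overlap."""
    better, worse = (runs_a, runs_b) if mean(runs_a) >= mean(runs_b) else (runs_b, runs_a)
    return min(better) > max(worse)


# Hypothetical Sysmark2001-style scores from three runs each.
palomino = [105.0, 106.0, 107.0]
thunderbird = [102.0, 103.0, 104.0]

print(f"spread: {spread_percent(palomino):.1f}%")  # spread: 1.9%
print(gap_exceeds_noise(palomino, thunderbird))    # True
```

Requiring non-overlapping ranges is stricter than a statistical test, but with only three runs per configuration it is a simple way to avoid reading noise as a real difference.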
In the internet content creation part of Sysmark2001, the difference between Thunderbird and Palomino is a bit larger, because this component of Sysmark2001 scales better with processor performance than the office productivity section does.
The office productivity section of Sysmark2001 doesn’t scale very well and so there’s barely any noticeable difference between Thunderbird and Palomino.
Windows 98 Results – 3D Games
The Dronez benchmark is one of NVIDIA’s technology demos for GeForce3. It uses a lot of modern 3D features, which makes it a very up-to-date 3D gaming benchmark. Here the difference between Thunderbird and Palomino is about 10%.
AquaMark is another GeForce3 tech demo and therefore another modern 3D-benchmark. The performance of Palomino is about 8% higher than the performance of Thunderbird.
Evolva does not show as much of a difference between Thunderbird and Palomino, but there’s still a noticeable delta between the two Athlon 1.2 GHz processors.
Windows 98 Results – 3D Games, Continued
In Quake 3’s demo001 Palomino is once more able to leave Thunderbird behind, but the difference is still not very substantial.
The NV15-demo doesn’t change the picture much. Thunderbird is about 5% slower than Palomino.
In Unreal Tournament the difference between Thunderbird and Palomino is minimal.
Altogether, we saw that Palomino is about 2-8% faster than Thunderbird under Windows98 as well as Windows2000. This is not as much as the promised 15%, but in some benchmarks almost as much as one speed bin.
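To put ‘almost as much as one speed bin’ into numbers: the step from the Athlon 1.2 GHz to the Athlon 1.33 GHz used in this review is about 11% in clock speed. This sketch optimistically assumes that performance scales linearly with clock, so the speed-bin figure is an upper bound; the function name is illustrative.

```python
def clock_step_percent(lower_mhz: float, higher_mhz: float) -> float:
    """Clock-speed increase from one speed grade to the next, in percent."""
    return (higher_mhz / lower_mhz - 1.0) * 100.0


step = clock_step_percent(1200, 1333)  # Athlon 1.2 GHz -> 1.33 GHz
print(f"one speed bin: {step:.1f}% more clock")  # one speed bin: 11.1% more clock
for gain in (2, 8):
    print(f"{gain}% gain = {gain / step:.0%} of a speed bin")
```

By this measure, Palomino’s best results are worth roughly three quarters of a speed bin and its weakest ones less than a fifth.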
Summary
Right now, the release of AMD’s new dual-Athlon chipset seems more like a technology demonstration than a full-blown product release. We have learned that dual AthlonMP performs very well indeed, but as long as AMD doesn’t supply AthlonMP at high enough clock speeds, Intel’s Xeon solution has a clear performance advantage.
The high cost of the Tyan K7 Thunder cannot be offset by the relatively low cost of AthlonMP. Right now, dual AthlonMP systems are only interesting for server and workstation setups, where the system price is not considered important. However, even though AthlonMP is significantly cheaper than its Intel Xeon counterpart, this doesn’t make a large difference once you look at the price of a complete server or workstation system.
We have seen that AthlonMP benefits more from dual-processor operation than Xeon, so AthlonMP has a good chance to catch up with and even overtake dual Xeon once AMD supplies AthlonMP processors at higher clock speeds. Right now, however, I don’t see how AMD can attract a reasonable number of customers with dual Xeon beating dual AthlonMP in the majority of benchmarks.
Home users will have to wait for inexpensive motherboard designs before they can afford their own dual AthlonMP system.
The performance increase of 2-8% from Athlon with Thunderbird core to Athlon with Palomino core is not quite what we would have liked to see. It is one more reason why users who want to run Athlon in a single-processor configuration should wait until the official Athlon processors with Palomino core for single-CPU operation become available, rather than spend too much money on an AthlonMP now.
As far as our testing is concerned, we didn’t see any issues with dual-operated Athlon processors based on the Thunderbird core. Those processors feature the same SMP capabilities as Palomino, even if AMD doesn’t want us to believe it. However, no Thunderbird has been or ever will be validated for dual operation, which basically means that AMD doesn’t take any responsibility if you run into trouble. The story looks very similar to the good old Celeron SMP situation of a few years ago. Intel claimed that Celeron wouldn’t support SMP, but in actual fact it did. It’s all politics, my friends, and wouldn’t the world be a boring place without it?