Painting a New Picture of Pentium 4 – Tweaked MPEG4 Encoding
Introduction
This is now my third article about Pentium 4 in less than a week. Again, it will draw new conclusions, because we found new, interesting and important facts that need to be considered. I am fully aware of the risk that some of you might feel confused by now and some might lose their faith in me, but what I want to show with this continuous flow of Pentium 4 articles is that right now it is very difficult to ensure a valid, meaningful and fair evaluation of Intel’s latest processor, simply because it is a brand new design with very special and uncommon behavior in what we would consider a default set of benchmarks. Even at this point it would be dishonest of me to say that our P4 testing has been finalized. In fact, I have just finished another new set of tests with Pentium 4 that will be excellent material for another article after this one.
What I want to achieve with these numerous P4 write-ups is to keep you, the reader, as up to date as possible. I want to involve you in this complicated process of evaluating a very complex product that is of great political as well as technical importance. Instead of claiming that Tom’s Hardware has got all the answers about Pentium 4 right now, I want you to realize that the ‘search for the truth behind Pentium 4’ is a meticulous task that takes a lot more than just running a few benchmarks, taking a few pictures, putting two and two together and then drawing some simple conclusions. I also don’t want to serve you a bottom line that goes “… well, I also really don’t know what to say about Pentium 4, but I believe that it could maybe be a great product sometime in the future. It might not be as well though …” I know that my readers expect authoritative conclusions from me; you expect me to ‘cut the crap’ and to come up with a valid, reliable and crystal clear standpoint. With Pentium 4, reaching this clear standpoint is more difficult than with any other product that I have evaluated before. Therefore I decided not to serve you the whole menu at once, but to actually involve you in the cooking process, serving dish after dish. The final evaluation of the meal can only be done once the dessert has been eaten. Right now we are only at the third course.
Reactions To The Recent P4-Update Article
The Pentium 4 update article published on Wednesday, which I revised about 15 hours later to make it sound more politically correct, produced a huge number of responses. While the Intel followers amongst my readers complained that I was being unfairly harsh with Pentium 4 and AMD supporters applauded me for the new tough criticism of Pentium 4, several software engineers felt compelled to write to me with their opinions about our MPEG4 encoding benchmark with FlasK MPEG and Pentium 4’s bad performance. Those responses were extremely interesting and eye-opening, which is why I want to share them with those of you who want to know a bit more about the background of video encoding and the development of software for it.
- Email from Guy Bonneau, Software Codec Architect, Video Products Group, Matrox Electronic Systems Ltd.
- Email from Jim Quinlan, Multimedia & Intel(r) Xscale(tm), Intel Corporation
- Tom’s Hardware Guide Community Posting from ‘AngusH’
I am amazed by the effort those three and many other developers put into the mails and messages they sent me. It shows how qualified as well as dedicated the readership of Tom’s Hardware Guide is. I want to thank each contributor for his invaluable insights.
Reading each of those three messages carefully brings us to the following conclusions:
- Video encoding software is often programmed to tightly fit a specific group of processors, because it is so time critical. Pentium 4 is a brand new design and is therefore at a possible disadvantage unless Intel had made sure that Pentium III routines run just as well on Pentium 4, which is obviously not the case.
- The IEEE iDCT of FlasK MPEG might not be that useful after all, because the quality of the MMX iDCT should be adequate in the first place. It seemingly isn’t, however, which raises questions about the quality of the MMX implementation.
- Re-programming, or possibly only re-compiling, the open source code of FlasK could show Pentium 4 in a much better light.
At the moment we might only be talking about video encoding, but the current situation seems to be exemplary for the majority of software that is available right now. Because Pentium 4 takes a very different path from Pentium III or AMD’s Athlon, it happens to be at a major disadvantage with today’s software. This was Intel’s decision. It is up to us to decide if this makes Pentium 4 a bad product.
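To make the accuracy question from the second conclusion a little more tangible, here is a minimal Python sketch. It is an illustration only, not FlasK's or Intel's code. It compares a double-precision 8x8 inverse DCT with a naive fixed-point version of the kind an integer routine might use. The function names, the `bits` parameter and the rounding scheme are assumptions made for this example; an accuracy test in the spirit of IEEE-1180 would compare a candidate routine against a high-precision reference in a broadly similar way.

```python
import math
import random

N = 8  # MPEG works on 8x8 blocks


def basis(x, u):
    """DCT basis value c(u) * cos((2x + 1) * u * pi / 16)."""
    c = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
    return c * math.cos((2 * x + 1) * u * math.pi / (2 * N))


def idct_float(coeffs):
    """Reference 2-D inverse DCT in double precision, rounded to integers."""
    tmp = [[sum(coeffs[u][v] * basis(y, v) for v in range(N))
            for y in range(N)] for u in range(N)]
    return [[round(sum(basis(x, u) * tmp[u][y] for u in range(N)))
             for y in range(N)] for x in range(N)]


def idct_fixed(coeffs, bits):
    """Naive integer inverse DCT using a cosine table with `bits` fraction bits.

    Assumption for illustration: both passes round back to whole integers,
    which is cruder than what a carefully written integer routine would do.
    """
    table = [[round(basis(x, u) * (1 << bits)) for u in range(N)]
             for x in range(N)]
    half = 1 << (bits - 1)
    tmp = [[(sum(coeffs[u][v] * table[y][v] for v in range(N)) + half) >> bits
            for y in range(N)] for u in range(N)]
    return [[(sum(table[x][u] * tmp[u][y] for u in range(N)) + half) >> bits
             for y in range(N)] for x in range(N)]


def peak_error(bits, blocks=100, seed=1):
    """Largest per-pixel difference between the two routines on random blocks."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(blocks):
        coeffs = [[rng.randint(-256, 255) for _ in range(N)] for _ in range(N)]
        ref, approx = idct_float(coeffs), idct_fixed(coeffs, bits)
        worst = max(worst, max(abs(ref[x][y] - approx[x][y])
                               for x in range(N) for y in range(N)))
    return worst


# A block with only a DC coefficient should decode to a flat 8x8 area.
dc_only = [[800 if (u, v) == (0, 0) else 0 for v in range(N)] for u in range(N)]
print(idct_float(dc_only)[0][0])      # 100
print(idct_fixed(dc_only, 8)[0][0])   # 101, one step off with an 8-bit table
```

Calling `peak_error(bits)` with different table precisions gives a rough feel for how quickly a low-precision integer routine drifts away from the reference, which is the kind of difference the developers were pointing at.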
Intel’s Admirable Reaction To Our Last Article
So far I have only presented you with a bunch of what some of you may call ‘idle talk’. The sentence “Pentium 4 would be faster if …” doesn’t sound particularly helpful, because it is merely based on wishful thinking. It would be a lot more helpful if we could have a look at some facts and hard numbers.
Luckily there’s a guy by the name of ‘Alex’, who works at Intel’s German headquarters in Munich. After he had read my Pentium 4 update article on Thursday, he decided to have a close look at the source code of FlasK MPEG, which is openly available under the GNU software license. Thursday night Alex didn’t go home, but spent his time re-compiling the iDCT code of FlasK using several different options. He built a special version of FlasK that would show very different behavior. On Friday afternoon I received this little gem from Intel Germany and I was extremely impressed with what I saw.
Hi Tom,
here’s the version of Flask with the Intel additions built in. Our engineer quickly integrated and tested the following iDCT modules with Flask:
You find all the variants in this order selectable from the UI of the attached version. Here are the measurements on producing a non-compressed AVI (so the codec isn’t in the picture) out of a 1:02 min DVD title. The size of the produced AVI is 446KB. The system is a 1.5 GHz Win98SE.
[Table: iDCT Module | Time, m:s]
The new version was compiled with different compiler options. So only by different options No.6 improved significantly. We measured 10:36 with the unchanged Flask v0.594.
Tom, please keep in mind that this program was edited in only half a day. So the engineer didn’t incorporate a CPU identification. If you select one of the SSE2 optimized paths running Flask on a non-SSE2-enabled machine, you might run into issues. Also we didn’t improve the User Interface or the like. The engineer concentrated on coding the given algorithms with SSE2. And finally we didn’t make extensive testing.
As agreed on the phone please don’t distribute this version of flask to anybody else. We still haven’t got hold on the author of Flask and we don’t want to distribute this version without permission.
If you find any strange things or irregularities we would be very happy if you contact George or ourselves in that case.
Thank you and best regards,
Hans & Christian
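The missing CPU identification mentioned in the email is the kind of check that keeps an SSE2 code path from being run on a processor that cannot execute it. A minimal sketch of such a dispatch in Python follows. The variant names, the `cpu_flags` set and the `pick_idct` function are hypothetical, and how the flags are obtained (for example via the CPUID instruction) is deliberately left out.

```python
# Hypothetical iDCT variants, fastest first, each with the CPU feature it
# needs. None means the routine runs on any x86 processor.
IDCT_VARIANTS = [
    ("SSE2 iDCT", "sse2"),
    ("MMX iDCT", "mmx"),
    ("x87 reference iDCT", None),
]


def supported(cpu_flags):
    """Names of all variants this CPU can run, fastest first."""
    return [name for name, needs in IDCT_VARIANTS
            if needs is None or needs in cpu_flags]


def pick_idct(cpu_flags, requested=None):
    """Return the requested variant if the CPU supports it, else the fastest
    supported one, so an unsupported choice never reaches the hardware."""
    usable = supported(cpu_flags)
    if requested in usable:
        return requested
    return usable[0]


print(pick_idct({"mmx", "sse"}, requested="SSE2 iDCT"))  # MMX iDCT
print(pick_idct({"mmx", "sse", "sse2"}))                 # SSE2 iDCT
```

Falling back silently is only one possible design. A real application might prefer to grey out the unsupported options in the user interface instead.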
Intel’s Famous Old Efficiency
Before I get into the discussion of this special version of FlasK MPEG, I would like to express how impressed I am with Intel’s dedication, swiftness and professionalism in dealing with my Pentium 4 update article from Wednesday.
Intel had certainly not ignored my second P4 article. Instead of dismissing it as just another piece of bad press about Pentium 4, several extremely dedicated Intel employees rolled up their sleeves and got to work right away, even sacrificing a good night’s sleep and time with their families to sort this issue out. The result, as you will see below, speaks for itself.
You can say what you want about Intel, but you’ve got to envy them for having employees of this caliber. I certainly do. You also have to give them the highest respect for reacting quickly and wisely at the same time. I personally stand in awe and have to admit that Intel really caught me off guard here. Intel has made a lot of mistakes and I am not forgetting this, but the above reaction deserves a huge amount of credit.
I hope that this issue is also able to show you, my readers, how much is actually happening behind the scenes. Technical journalism is not a black and white kind of job. It’s not just testing, writing and then publishing. There is a lot more going on and I hope you appreciate that I am trying to involve you in this once in a while. You don’t only have a right to know. I think that in several ways you’ve got the duty to know.
There’s another thing I might add. MPEG4 encoding and DVD-ripping in general, as well as FlasK MPEG and similar software in particular, are seen by some groups (especially the self-important and money-thirsty movie industry) as pure piracy tools. This is another reason why Intel could have dismissed the whole MPEG4 issue. Still, Intel decided to get involved and get ‘its hands dirty’ with the coding of a possible ‘video piracy application’. I very much respect this as well.
The Tweaked FlasK MPEG
The changes made to the special FlasK version are pretty nifty. This is what the ‘Global Project Options’ tab of FlasK normally looks like:
You can spot the three different iDCT-options in the right upper area.
Intel’s special ‘overnight’-version looks like this:
That’s all you see. The only other difference is the file size of the two executables. The original version (rev. 0.594) is 995,328 bytes long; Intel’s version comes to 1,032,257 bytes.
Let’s get to the actual testing now.
Benchmark Setup
Hardware Setup | |
Pentium 4 Platform | |
Mainboard | Asus P4T motherboard Intel 850 Chipset |
Processor | Pentium 4 1.5 GHz |
Memory | 256 MB Samsung 45ns RDRAM, 2x 128 MB RIMMs |
Graphics Card | NVIDIA GeForce 2 GTS 32 MB Reference Card |
Operating System | Windows 2000 Professional SP1 |
Pentium III Platform | |
Mainboard | Asus CUSL2 motherboard Intel 815 Chipset |
Processor | Pentium III 1 GHz |
Memory | 256 MB Wichman WorkX PC133 SDRAM, 2x 128 MB DIMMs, 2-2-2-5/7 |
Graphics Card | NVIDIA GeForce 2 GTS 32 MB Reference Card |
Operating System | Windows 2000 Professional SP1 |
AMD Athlon Platform | |
Mainboard | Gigabyte GA-7DX Rev. 1.3 motherboard AMD 760 Chipset w/ VIA Southbridge |
Processor | Athlon 1.2 GHz, 133 MHz Bus |
Memory | 256 MB Infineon DDR-SDRAM, 1x 256 MB DIMM, CL 2 |
Graphics Card | NVIDIA GeForce 2 GTS 32 MB Reference Card |
Operating System | Windows 2000 Professional SP1 |
Due to the urgency of this article I refrained from running the tests at several different clock speeds of Pentium 4 for the time being. The Pentium 4 at 1.5 GHz should suffice to make the point. I also left out Athlon 1200 on KT133 for the same reason.
The file to be encoded to MPEG4 is a piece of ‘Romeo Must Die’ in the DVD-typical ‘VOB’ format. I used the DivX ;-) codec rev. 3.11.
The Difference Between The Old And The New IEEE-1180 iDCT
You remember Intel’s email from above. Amongst other things, Intel simply re-compiled the original source code of FlasK’s IEEE-1180 iDCT that is using the normal x87-FPU. The benchmarks in the Pentium 4 update article were run with this option and you remember that Pentium 4 showed very bad performance. Since the underlying code is the same, the resulting MPEG4-file encoded with the old as well as the new version using the x87-iDCT should obviously be identical (which it is). Let’s see what difference the re-compilation of this code actually makes.
I regard it as rather mind-blowing that Pentium 4 as well as Pentium III and Athlon benefited from the re-compilation immensely. It shows that the original compilation of FlasK was either done with an ancient compiler or with very strange optimizations. Intel’s version doesn’t only improve performance for Pentium 4 owners, but for Pentium III and Athlon owners as well.
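The statement above that the old and the new x87 builds produce identical output is easy to check mechanically. A minimal sketch in Python, assuming the two encoded files are available on disk. The function names are illustrative and SHA-256 is an arbitrary choice; any strong hash or a plain byte-by-byte comparison would do.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large videos fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return file_digest(path_a) == file_digest(path_b)
```

Called as `same_output("old_build.avi", "new_build.avi")` (hypothetical file names), it returns `True` only if the two encodes match bit for bit.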
SSE2-Optimized iDCT Scores
Once SSE2 comes into play, which according to Intel is easy to implement, the Pentium 4 can show its real muscles, scoring significantly better than its competition.
Please keep several things in mind before you draw any conclusions about Pentium 4’s performance in comparison with Athlon’s.
- AMD did not supply an Athlon-optimized version of FlasK, which could and actually should let Athlon score a lot better as well. Still, Athlon also benefited a great deal from the re-compilation with Intel’s latest compiler. I hereby encourage AMD’s developers to supply me with their special version to show what Athlon is really capable of.
- Intel made perfectly clear that they could have optimized the code a lot more to make Pentium 4 as well as Pentium III look even better. Due to time constraints and also due to the fact that a simple compile-job could already provide excellent results Intel refrained from doing that.
I was specifically asked to keep this special FlasK version to myself, even though a lot of you would most likely be very interested in it. I suggest you contact the author of FlasK MPEG, Alberto Vigatá, and ask him to establish a dialog with Intel as well as AMD to improve his excellent application. It should really go through him and certainly not through me. I just got the ball rolling.
Summary
If the situation with FlasK MPEG is exemplary for Pentium 4’s performance – and why shouldn’t it be? – then it is really up to the software industry to at least re-compile its applications to make Pentium 4 look a whole lot better. Once that’s done, Pentium 4 has a good chance of becoming a success even at its current clock speeds.
Of course this is easier said than done. Which software maker would supply its customers with a new and free version of its product, even if it only took a re-compilation? Which software maker will even bother to do that for the time being? After all, Pentium 4 systems are very expensive and thus not exactly widespread. We know that Intel has a very forceful way of ‘convincing’ other industries to follow them. We will see how much power Intel has right now. For the time being Intel is in the same situation AMD used to be in with K6-2 and 3DNow! Without proper support from the software industry it will be hard to make a product such as Pentium 4 successful. Let’s not feel too sorry for Intel however. It is in a much better situation than AMD used to be.
At the end of this article you will have to do without a ‘to buy or not to buy’ comment from me. I supplied you with a lot of facts so far and I am sure that you can come to your own conclusions. Additionally, there is a lot of P4-testing still going on in our lab. Very shortly this article will be followed by another benchmark evaluation, which should help us all to put the Pentium 4 puzzle together a little bit further.
Please follow up by reading Tom’s Blurb: Pentium 4 – Another Recount?