Introduction
Part 2 of this series discussed applications for digital video on the Internet, professional editing systems and processing software. Part 3 covers popular video formats and their fields of application. We will also take a look at the basic principles behind MPEG compression, using practical examples.
Video Formats – from Postage Stamp Sizes to High Resolution Images
Avid PC users will almost certainly remember the first time they were able to view a video clip on their computer. The clips were about the size of a postage stamp and were generously referred to as “multimedia”. Later, the first acceptable video clips were used in the opening scenes of computer games. In some cases, there were even digital 3D animations that couldn’t be rendered in real time with the hardware and software available in those days. As the video clips demanded extensive storage space (despite their short length), they were only distributed on the CD-ROMs that had recently become popular. Because of this, many PCs became multimedia-compatible, in a restricted sense, through the integration of a CD-ROM drive and a sound card. However, the limitations soon became apparent: it wasn’t possible to play video clips smoothly in fullscreen mode even with the most powerful hardware available. With the development of high-performance graphics chips, faster processors and corresponding software interfaces, today’s users are now able to run video clips in all the usual formats (including fullscreen mode) without problems. We’ll continue with a look at the most popular video formats and then provide an overview of their specific applications.
The AVI Format
One of the oldest formats in the x86 computer world is AVI. The abbreviation ‘AVI’ stands for ‘Audio Video Interleave’. This video format was created by Microsoft and introduced along with Windows 3.1. AVI, the proprietary format of Microsoft’s “Video for Windows” application, merely provides a framework for various compression algorithms such as Cinepak, Intel Indeo, Microsoft Video 1, Clear Video or IVI. In its first version, AVI supported a maximum resolution of 160 x 120 pixels at a refresh rate of 15 frames per second. The format attained widespread popularity as the first video editing systems and software appeared that used AVI by default. Examples of such editing boards included Fast’s AV Master and Miro/Pinnacle’s DC10 to DC50. However, there were a number of restrictions: for example, an AVI video that had been processed using an AV Master could not be directly processed using an interface board from Miro/Pinnacle, because the manufacturers adapted the open AVI format according to their own requirements. AVI is subject to additional restrictions under Windows 95 that make professional work at higher resolutions more difficult: for example, the maximum file size under the FAT16 file system is 2 GB. The FAT32 file system (introduced with Windows 95 OSR2 and Windows 98) brought an improvement: in conjunction with ‘DirectShow’, the latest DirectX 6 module, files of up to 8 GB can (at least in theory) be created. In practice, however, many interface cards lack the corresponding driver support, so Windows NT 4.0 and NTFS are strongly recommended. Despite its age and numerous problems, the AVI format is still used by semi-professional video editing cards. Many TV cards and graphics boards with a video input also use the AVI format; these are able to grab video clips at low resolutions (mostly 320 x 240 pixels).
Apple’s Format
The MOV format, which originated in the Macintosh world, was also ported to x86-based PCs. It is the proprietary standard of Apple’s QuickTime application and stores audio and video data simultaneously. Between 1993 and 1995, QuickTime was superior to Microsoft’s AVI format in both functionality and quality. The latest generation (QuickTime 4.0) also supports the streaming of Internet videos, i.e. the real-time transmission of videos without the need to first download the entire file to the computer. Despite this, Apple’s proprietary format is steadily losing popularity with the increasing use of MPEG. Video clips coded with Apple’s format are still found on some CDs because of QuickTime’s ability to run on both Macintosh and x86 computers.
MPEG Formats
The MPEG formats are by far the most popular standard. MPEG stands for “Moving Picture Experts Group”, an international organization that develops standards for the encoding of moving images. In order to attain widespread use, the MPEG standard only specifies a data model for the compression of moving pictures and audio signals; in this way, MPEG remains platform independent. One can currently differentiate between four standards: MPEG-1, MPEG-2, MPEG-4 and MPEG-7. Let’s take a brief look at each format separately.
MPEG-1 was released in 1993 with the objective of achieving acceptable frame rates and the best possible image quality for moving images and their sound signals on media with low bandwidth (1 MBit/s up to 1.5 MBit/s). A design goal of MPEG-1 was the ability to randomly access any sequence within half a second, without a noticeable loss in quality. For most home applications (digitizing vacation videos) and business applications (image videos, documentation), the quality offered by MPEG-1 is adequate.
MPEG-2 has been in existence since 1995. Its basic structure is the same as that of MPEG-1, but it allows data rates of up to 100 MBit/s and is used for digital TV, video films on DVD-ROM and in professional video studios. MPEG-2 allows resolution and data rate to be scaled over a wide range. Due to its high data rate compared with MPEG-1 and the correspondingly higher storage requirements, MPEG-2 is currently only suitable for playback in the home-user field. At data rates of approximately 4 MBit/s, the attainable video quality is noticeably better than with MPEG-1.
MPEG-4 is one of the latest video formats. Its objective is to deliver the highest possible video quality at extremely low data rates, in the range between 10 KBit/s and 1 MBit/s. Furthermore, data integrity and robustness against transmission losses are paramount, as these play an important role in mobile communications. Completely new in MPEG-4 is the organization of the image content into independent objects, which can be addressed and processed individually. MPEG-4 is used for video transmission over the Internet, for example. Some manufacturers plan to transmit moving images to mobile phones in the future; MPEG-4 is intended to form the platform for this type of data transfer.
MPEG-7 is the latest project of the MPEG family. It is a standard for describing multimedia data and can be used independently of the other MPEG standards. MPEG-7 will probably reach the status of an international standard by the year 2001.
Differences between MPEG-1 and MPEG-2
Although the MPEG-2 format is the more recent technology, it does not represent a major technical departure from MPEG-1 as far as the basic principles are concerned. However, some differences have resulted from the extension of the specification, as well as from changes made to meet the requirements of digital television and future high-resolution television. The most important changes are:
- Increased precision of motion vectors to half-pixel accuracy (see the sketch after this list)
- Extended error resilience through special vectors in I frames
- Selectable precision of the discrete cosine transformation
- Additional prediction modes and macro block types
- Scalability (different quality levels in a single video stream)
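To make the first of these points concrete: half-pixel motion vectors force the decoder to interpolate sample values between full-pixel positions. Below is a minimal sketch in Python with NumPy; the bilinear averaging shown is the common textbook form and is assumed here rather than quoted from the standard.

```python
import numpy as np

def sample_half_pel(frame: np.ndarray, y: float, x: float) -> float:
    """Fetch a pixel value at an integer or half-pixel position by
    bilinearly averaging the surrounding full-pixel neighbors."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, frame.shape[0] - 1)
    x1 = min(x0 + 1, frame.shape[1] - 1)
    wy, wx = y - y0, x - x0          # each 0.0 or 0.5 for half-pel vectors
    return ((1 - wy) * (1 - wx) * frame[y0, x0]
            + (1 - wy) * wx * frame[y0, x1]
            + wy * (1 - wx) * frame[y1, x0]
            + wy * wx * frame[y1, x1])

# With a motion vector of (dy, dx) = (10.5, 12.0), every predicted
# sample comes from a half-pel position in the reference frame.
```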
The MJPEG Format
The abbreviation MJPEG stands for “Motion JPEG”. This format is practically an intermediate step between a still-image and a video format, as an MJPEG clip is a sequence of JPEG images. This is one reason why the format is often implemented in video editing cards and systems. MJPEG is a compression method that is applied to every single image individually. Video editing cards such as Fast’s AV Master, Miro’s DC50 or the much less expensive Matrox Marvel product series reduce the data stream of a standard television signal from approximately 30 MB/s (!) to 6 MB/s (MJPEG file), which corresponds to a compression ratio of 5:1. However, the MJPEG format does not define a standard for synchronizing audio and video data during recording, so the manufacturers of video editing cards have had to create their own implementations.
The H.261 and H.263 Protocols
The H.261 standard is designed for videoconferencing and video telephony over ISDN networks. H.261 enables the image quality to be adapted to the bandwidth of the transmission line. In addition, entire images can be omitted from a sequence so that the remaining frames can be transmitted at higher quality. Transmission occurs at a bit rate of 64 KBit/s or 128 KBit/s (bundling of two ISDN channels). The successor standard H.263 implements more precise motion compensation than H.261. It also supports additional image formats to accommodate different fields of application, such as gate monitoring systems and widescreen videoconferences.
Compression – what for?
Digitizing a video sequence results in extremely high data rates. For example, a television image with a resolution of 720 x 576 pixels and a color depth of 24 bits produces about 1.24 MB of data per individual frame. As 25 frames per second are required to avoid jerky video scenes, a gigantic data volume of roughly 31 MB/s is produced! Only a few selected high-end SCSI hard drives in a RAID-0 configuration would be able to record this data stream. A storage medium such as a CD-R would only have space for about 20 seconds of video. For this reason, it is essential to compress video signals, i.e. to remove or reorganize data in order to reduce the size of the digital files. One distinguishes between lossless and lossy compression methods.
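The arithmetic behind these figures is easy to verify. A short Python sketch (the 650 MB CD-R capacity is an assumption typical of the period):

```python
# Raw data rate of an uncompressed PAL television signal.
WIDTH, HEIGHT = 720, 576        # PAL resolution in pixels
BYTES_PER_PIXEL = 3             # 24-bit color depth
FPS = 25                        # PAL frame rate

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL      # ~1.24 MB per frame
rate = frame_bytes * FPS                            # ~31 MB per second

CDR_BYTES = 650 * 10**6                             # assumed 650 MB CD-R
print(f"{frame_bytes / 10**6:.2f} MB per frame")
print(f"{rate / 10**6:.2f} MB/s data rate")
print(f"{CDR_BYTES / rate:.0f} s of raw video fit on one CD-R")
```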
“Lossless” compression retains the original data, so the individual image sequences remain identical after compression (just as ZIP does for files). Most lossless compression techniques use run-length encoding, which collapses image areas of a single color into short codes. However, the compression ratio is no better than about 3:1, depending on the complexity of the individual images. In practice, lossless methods play a minor role because of their low compression rates.
On the other hand, “lossy” compression methods attempt to remove image information that the viewer is unlikely to notice. These methods do not retain the original data; some image information is lost, and the amount lost depends on the degree of compression. In practice, temporal compression is gaining in importance: rather than treating each image in isolation, it exploits the similarity between successive images of a clip to reduce the data volume for each individual frame. The following section covers the most important method, MPEG compression, using an example.
How MPEG Compression works
Figure 1 shows a Mercedes coupe whose position changes from frame N to the following frame N+1. The image background, however, is very similar in both frames.
Figure 1: Motion compensation: The Mercedes coupe has changed its position from the starting frame N to the next frame N+1; the background image, however, is identical.
The image details change continuously from frame N to frame N+1 as the car moves from left to right, while the background (e.g. the trees) remains unchanged. At this point, a central element of MPEG compression comes into play: motion compensation. The movement of the Mercedes coupe can be described quite easily by a vector; it is enough to state that the vehicle has moved 12 pixels to the right and 10 pixels down from frame N to frame N+1. However, recognizing a complete object (let alone one as complicated as a car) would be much too difficult to realize in practice.
Subdivision into Macro Blocks
Instead, the image is split into macro blocks of 8×8 or 16×16 pixels that are handled separately. In the next step, the difference between the macro block in image N and the displaced macro block in image N+1 is computed (see figure 4). This error image has to be coded and saved along with the displacement vector in order to keep errors from accumulating in subsequent frames. Storage requirements are lowest when the difference between the displaced macro blocks is so small that encoding it can be skipped entirely.
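In practice, the displacement vector is found by block matching: the encoder compares a macro block of frame N+1 with shifted candidate blocks in frame N and keeps the offset with the smallest difference. The following simplified full search (Python with NumPy) uses the sum of absolute differences as its measure and a ±8 pixel search window; both are illustrative choices, not values prescribed by MPEG.

```python
import numpy as np

def find_motion_vector(ref: np.ndarray, cur: np.ndarray,
                       by: int, bx: int, size: int = 16,
                       radius: int = 8) -> tuple[int, int]:
    """Full search: return the (dy, dx) that minimizes the sum of
    absolute differences (SAD) between the block at (by, bx) in `cur`
    and a displaced block in the reference frame `ref`."""
    block = cur[by:by + size, bx:bx + size].astype(int)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue                  # candidate lies outside the frame
            sad = np.abs(ref[y:y + size, x:x + size].astype(int) - block).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv    # e.g. (10, 12): 10 pixels down, 12 to the right
```

The error image is then simply the current block minus the displaced reference block; if its values are close enough to zero, its encoding can be skipped, exactly as described above.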
Figure 2: A discrete cosine transformation (DCT) is used to suppress high-frequency image components, as these are barely apparent to the human eye.
In order to code the images (the next step), the following frame types, shown in figure 3, are used: “I” frames (intra coded images) are saved independently, in a manner equivalent to JPEG, and do not depend on previous or subsequent frames. Only the “I” frame permits direct access to individual sections or still frames in a clip. In contrast, “P” frames (predicted images) are predicted from the previous “I” or “P” frame. The most universal image type is the “B” frame (bi-directionally interpolated image), which is interpolated from both the preceding and the following I or P frame.
Figure 3: Frame types used during image encoding.
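Because a “B” frame needs both its past and its future anchor frame, the frames of an MPEG stream are transmitted in a different order than they are displayed. A small Python illustration (the IBBPBBP group-of-pictures pattern is only an example; the actual pattern is an encoder choice):

```python
# Display order of one example group of pictures (GOP).
display_order = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]

# Each B frame is interpolated from the anchor (I or P) frames on
# either side of it, so both anchors must be decoded first.  The
# coded order therefore moves every anchor ahead of its B frames:
coded_order = ["I1", "P4", "B2", "B3", "P7", "B5", "B6"]
```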
Discrete Cosine Transformation (no Data Loss)
Figure 2 shows a section of the side of the car and its headlight as an 8×8 block. This block is mapped to a matrix of color values, which serves as the input to the discrete cosine transformation (DCT). DCT suppresses high-frequency image components that are barely apparent to the human eye. It is related to the Fourier transformation, which represents any signal as a superposition of sine waves of different frequencies and amplitudes; applied to an image, it converts the spatial distribution of pixel values into a distribution of frequencies and amplitudes. Large, regular areas of the image are thus represented by the lower frequency components, whereas finer details end up in the higher range. In our concrete example, DCT transforms the displayed 8×8 macro block into an 8×8 coefficient matrix. The value in the upper left corner of the coefficient matrix contains the lowest frequency components. This coefficient at location (0,0) is normally referred to as the DC coefficient, whereas the remaining 63 coefficients are termed AC coefficients (the names are borrowed from direct and alternating current). As there is normally a strong correlation between the DC coefficients of two successive 8×8 blocks, each DC coefficient is encoded as the difference to its predecessor. The remaining 63 AC coefficients are read out in a fixed zig-zag pattern, from low to high frequencies.
DCT concentrates the signal energy of a block in the lowest coefficients, especially the DC coefficient. The higher AC coefficients are normally zero or close to zero, because the bulk of an image’s visual information lies in a continuously distributed range of values at the lower frequencies; sharp edges normally make up only a small part of an image. After the transformation, the coefficients are quantized to achieve additional compression.
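The transformation itself is compact enough to sketch directly. The following NumPy snippet implements the orthonormal 2-D DCT-II, the variant used by JPEG and MPEG, and applies it to a smooth test gradient (an assumed stand-in for a low-detail macro block) to show the energy collapsing into the low-order coefficients:

```python
import numpy as np

def dct2(block: np.ndarray) -> np.ndarray:
    """Orthonormal 2-D DCT-II of a square block."""
    n = block.shape[0]
    k = np.arange(n)
    # basis[u, x] = alpha(u) * cos((2x + 1) * u * pi / (2n))
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    alpha = np.full(n, np.sqrt(2 / n))
    alpha[0] = np.sqrt(1 / n)
    basis *= alpha[:, None]
    return basis @ block @ basis.T

# A smooth horizontal ramp: nearly all energy ends up in row 0,
# in the DC coefficient and the first few AC coefficients.
ramp = np.tile(np.linspace(0, 255, 8), (8, 1))
print(dct2(ramp - 128).round(1))      # level shift by 128, as in JPEG
```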
Figure 4: Basic flow of MPEG compression. MPEG-1 reduces the original data volume at a ratio of about 1:35.
Quantization (high Data Loss)
Quantization adapts the precision of the encoded data to the limits of human perception. Because the eye does not register changes in fine detail (such as the parked cars in the image) very well, the observer will not notice the slightly reduced display precision. On very close inspection, however, a certain softening effect can be seen that smudges sharp edges. In addition, so-called artifacts appear if the degree of quantization is too high.
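A Python sketch of this step; the step-size matrix below is a made-up illustration that merely grows toward the high frequencies, whereas real encoders use standardized quantization tables:

```python
import numpy as np

# Hypothetical step sizes: coarser quantization toward high frequencies
# (bottom-right).  Real MPEG/JPEG tables follow the same tendency.
steps = 16 + 4 * np.add.outer(np.arange(8), np.arange(8))

def quantize(coeffs: np.ndarray) -> np.ndarray:
    """Rounding to the nearest step is where information is lost."""
    return np.round(coeffs / steps).astype(int)

def dequantize(q: np.ndarray) -> np.ndarray:
    """The decoder can only reconstruct multiples of the step size."""
    return q * steps
```

Many high-frequency coefficients round to zero, which is what makes the run-length and Huffman stages described next so effective.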
Code Optimization 1: Run Length Encoding:
Run-length encoding groups elements that occur repeatedly and encodes each group with a count value. As the counter itself requires space, elements that occur only two or three times in a row remain uncoded. This type of compression is used in the graphics field, for example to store smooth surfaces with a minimum byte count.
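A minimal run-length encoder in Python; for brevity, this sketch codes every run as a (count, value) pair and omits the leave-short-runs-uncoded optimization mentioned above:

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse each run of identical bytes into a (count, value) pair."""
    runs: list[list[int]] = []
    for b in data:
        if runs and runs[-1][1] == b:
            runs[-1][0] += 1            # extend the current run
        else:
            runs.append([1, b])         # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    return bytes(b for count, b in runs for _ in range(count))

smooth = b"\x80" * 61 + b"\x81\x82"     # a smooth surface, then an edge
assert rle_decode(rle_encode(smooth)) == smooth
print(rle_encode(smooth))               # [(61, 128), (1, 129), (1, 130)]
```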
Code Optimization 2: Huffman Encoding:
The Huffman method encodes frequently occurring elements with few bits and rare ones with more bits. The frequency with which each element occurs determines its bit pattern.
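The classic construction fits in a few lines of Python using a min-heap. This sketch derives only the code table; a real encoder would also have to transmit the table or tree so the decoder can invert it:

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict[str, str]:
    """Frequent symbols receive short bit strings, rare ones longer."""
    heap = [[freq, [sym, ""]] for sym, freq in Counter(data).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)        # the two lightest subtrees ...
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]     # ... each gain one leading bit
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(tuple(p) for p in heap[0][1:])

print(huffman_codes("aaaaaaabbbccd"))
# 'a' (7 occurrences) gets the shortest code, 'c' and 'd' the longest.
```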
Errors in encoded Video Clips
The attainable image quality of an encoded MPEG video depends mostly on the quality of the encoder. Low-cost software solutions for MPEG-1 encoding, such as Xing’s, are available from about $150. However, the attainable encoding quality also depends heavily on the original material. Figure 5 shows the Mercedes coupe with noticeable compression errors. The arrows mark so-called artifacts, which occur most often in areas with sharp, visible edges.
Figure 5: Compression errors: artifacts mostly occur in areas with sharp edges and overlapping image areas.
Next article: Part 4 discusses hardware and software products that can be used to create MPEG videos. The AVI, MPEG-1 and MPEG-2 formats play a major role in this.