Building a Digital Video Capture System – Part I
Introduction
The commercials make it look easy. Get a video camera, shoot some scenes, capture the video into your computer, and with a few clicks and drags you turn those disjointed shots into a cinematic masterpiece ready to put on a CD, email to grandma, or stream off your Web site. No problem. But capturing and manipulating digital video in a computer is a little trickier than that.
NLE on a PC – Image courtesy of Pinnacle
In part one of this two-part article I’ll give you a crash course in digital video capture basics, explain what kinds of system issues there are, outline the types of capture hardware available (and their costs), and describe what kinds of results you can expect. In part two we’ll roll up our sleeves and review and compare a number of capture systems that won’t send you to the poorhouse.
Compression is King
There is no other way to say it. Video takes up a lot of everything: bandwidth, storage space, time, and money. We might like to think that all things digital are preferable to all things analog, but the brutal truth is that while analog formats like video might not be exact or precise, they are remarkably efficient when it comes to storing and transmitting vast amounts of information. For example, a typical NTSC television program that you can receive with a pair of rabbit ears on a $60 TV set is streaming data at about 26 MB/s (that’s megabytes per second, not megabits), which works out to about 1.5 gigabytes per minute or nearly 94 gigabytes per hour. If you strip out some of the carrier information and extra baggage inherent in NTSC signals (and convert to megabits for the sake of later arguments) you end up with about 124 Mbps. But that’s still roughly 0.9 gigabytes per minute or 56 gigabytes per hour. If you want to do something simple like a dissolve from one scene to another then you have to have two video streams running at the same time, so double those numbers. If you wanted to do HDTV-quality then you’re closing in on 5 Gb/s.
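The storage totals follow directly from the data rates, and a few lines of Python make the arithmetic easy to check. This is only a back-of-the-envelope sketch: it assumes decimal units (1 MB = 1,000,000 bytes, 1 GB = 1,000 MB), and the function names are invented for illustration.

```python
def mbps_to_mb_per_s(megabits_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return megabits_per_s / 8


def storage_per_minute_gb(rate_mb_per_s):
    """Gigabytes needed to store one minute of video at the given rate."""
    return rate_mb_per_s * 60 / 1000


def storage_per_hour_gb(rate_mb_per_s):
    """Gigabytes needed to store one hour of video at the given rate."""
    return rate_mb_per_s * 3600 / 1000


# The two rates quoted in the text: the raw stream and the stripped-down one.
rates = {
    "raw NTSC stream (26 MB/s)": 26.0,
    "stripped-down stream (124 Mbps)": mbps_to_mb_per_s(124),
}

for label, rate in rates.items():
    print(f"{label}: {storage_per_minute_gb(rate):.2f} GB/min, "
          f"{storage_per_hour_gb(rate):.1f} GB/hour")
```

Doubling either rate gives the requirement for a two-stream effect such as a dissolve.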
You can see that we’re beginning to talk about terabytes of storage and transfer rates approaching gigabytes per second. These days you can buy a 30 GB EIDE hard drive for about $150. While the EIDE specifications list a maximum transfer rate of 33 MB/s, that’s only theoretical. Under ideal conditions an EIDE drive might have a read/write rate of around 12-15 MB/s, but most hard disk drives can only achieve those rates using caching techniques and only hit those numbers in very short bursts. In reality, sustainable EIDE read/write rates are usually anywhere from 3 to 6 MB/s. Wide Ultra SCSI claims a theoretical transfer rate of 40 MB/s and Wide Ultra2 SCSI claims maximum throughput rates of about 80 MB/s. But even these SCSI numbers are only theoretical; in reality most SCSI buses and hard drives can only achieve sustainable read/write rates of about 6 to 13 MB/s.
We’re still only halfway toward that 26 MB/s single-stream of uncompressed video. Now there are ways around this bottleneck (terribly expensive ways that is), such as using fibre-channel and specially configured RAID systems but even then there are other problems I’ll touch on in a bit.
Color Space
So if we can’t adjust our computer systems to accommodate uncompressed video then we have to adjust the video by compressing it down to a more manageable size. But video compression is a tricky business. Video is interlaced while computer displays are non-interlaced. Video operates in a mind-bogglingly complicated variation of YUV called Y’CbCr or ITU-R 601 while computers use RGB (a fairly extensive color-spaces FAQ can be found at Poynton’s Color FAQ, but be forewarned: it gets a bit deep and just skims the reasoning behind TV’s particular variation on using color). Video runs at different frequencies than most computer clocks and displays. Finally, analog video signals are very, very noisy by computer standards.
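To give a feel for what the color-space difference means in practice, here is a minimal sketch of converting one 8-bit RGB pixel to 8-bit Y’CbCr using the ITU-R 601 luma weights (0.299, 0.587, 0.114) and the studio range (16 to 235 for luma, 16 to 240 for the two color-difference channels). The function name is made up for this example, and a real capture board does this in hardware along with chroma subsampling, which is not shown here.

```python
def rgb_to_ycbcr_601(r, g, b):
    """Convert 8-bit gamma-corrected R'G'B' (0-255) to 8-bit studio-range Y'CbCr."""
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0

    # Luma: a weighted sum of the three color channels.
    y = 0.299 * rn + 0.587 * gn + 0.114 * bn

    # Color differences, scaled so each falls in the range -0.5 to +0.5.
    cb = 0.5 * (bn - y) / (1.0 - 0.114)
    cr = 0.5 * (rn - y) / (1.0 - 0.299)

    # Map onto the 8-bit studio range.
    return (round(16 + 219 * y), round(128 + 224 * cb), round(128 + 224 * cr))


for name, pixel in [("white", (255, 255, 255)),
                    ("black", (0, 0, 0)),
                    ("red", (255, 0, 0))]:
    print(name, rgb_to_ycbcr_601(*pixel))
```

Note that white and black do not map to 255 and 0: the video range leaves headroom at both ends, one more mismatch between video and computer graphics.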
CODECs
Tom’s Hardware Guide has covered compression technologies in a previous article, but I’m going to lay it out again, with some added information that will be useful as a reference for the review of video capture in Part II.
Video compression/decompression algorithms (CODECs) have been around for about ten years and some of them, like MPEG-2, are quite good, but they all have their drawbacks. All video compression algorithms perform the same basic functions. And no matter what the manufacturer claims, video compression is always lossy. In short, a frame of video is captured (digitized) and then compressed using a myriad of arcane and borderline mystical techniques before being converted to a non-interlaced RGB image. Ideally, only the visual information that is not noticeable to the human eye is stripped out. This is called intraframe compression since it only happens on one frame at a time.
Some algorithms stop at this point. For example, M-JPEG (motion JPEG) simply compresses individual frames (using the JPEG compression algorithm), which can then be played back like a cartoon flip-book. But compressing 30 frames per second is still a daunting task, so video capture boards that use the M-JPEG technique usually rely on dedicated encoder chips like the ZR36060 and ZR36050 from Zoran. DV cameras use a similar technique to compress and play back frames on the fly. (Note: DV camcorders use a proprietary compression algorithm performed by special chips in the camera, so in spite of what you may hear, the DV format is not uncompressed video. DV also presents its own problems for the aspiring videographer that we’ll get into later.)
The main problem with this technique is that the complexity of each image also affects the size of the compressed frames. Either your playback algorithms have to be able to constantly adjust to different sized frames (while maintaining a constant playback rate) or you can force the compressor to make every frame exactly the same size every time. Of course that will mean that some frames will require very little compression while others will have to be compressed quite a bit. And, the rule of thumb is “the more compression you apply, the worse the image looks”.
MPEG-1
MPEG-1 algorithms force each frame to be the same size by dynamically changing the amount of compression performed on each frame but it also performs additional compression based on the differences between frames. This next step up in video algorithms is called interframe compression. The basic idea is that you can eliminate any redundant information that is common across multiple video frames. Rather than compress each complete frame individually we should only need to keep track of the differences between frames.
You start with a frame and compress it using JPEG. This will be our reference image (called an I frame or Intracoded frame). We then take the next frame, compress it too, and then compare the two. We throw away all the redundant information and only keep track of the differences; what we’re left with is called a P or Predictive coded frame. Repeat for all following frames. This will introduce some errors over time, and since these errors are cumulative you have to reset everything every so often by inserting another I frame. You can also improve the algorithm by inserting special frames that compare differences both backward and forward in time. These are called B or Bidirectionally interpolated frames. In terms of size I frames are the largest, P frames are next, and B frames are the smallest. A collection of I, P, and B frames is called a GOP or Group of Pictures.
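The I frame and P frame idea can be illustrated with a toy encoder. This is a deliberately simplified sketch and not how a real MPEG encoder works: frames are plain lists of pixel values, nothing is JPEG-compressed, there is no motion estimation and there are no B frames, and because the toy is lossless it does not show the cumulative errors described above. The function names and the `gop_size` parameter are assumptions made for the example.

```python
def encode_gop(frames, gop_size=4):
    """Encode frames as ('I', all pixels) or ('P', {pixel index: new value})."""
    encoded = []
    previous = None
    for index, frame in enumerate(frames):
        if index % gop_size == 0:
            # Start of a group of pictures: store the complete frame.
            encoded.append(("I", list(frame)))
        else:
            # Keep only the pixels that differ from the previous frame.
            changes = {i: new for i, (new, old)
                       in enumerate(zip(frame, previous)) if new != old}
            encoded.append(("P", changes))
        previous = frame
    return encoded


def decode_gop(encoded):
    """Rebuild every frame by applying each P frame's changes in order."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "I":
            current = list(payload)
        else:
            current = list(current)
            for i, value in payload.items():
                current[i] = value
        frames.append(current)
    return frames


frames = [
    [10, 10, 10, 10],
    [10, 10, 10, 12],   # one pixel changes
    [10, 11, 10, 12],   # another pixel changes
    [10, 11, 10, 12],   # nothing changes
    [50, 50, 50, 50],   # scene cut, lands on the next I frame
]
encoded = encode_gop(frames, gop_size=4)
for kind, payload in encoded:
    print(kind, payload)
```

In this toy the unchanged frame costs nothing to store, which is the whole appeal of interframe compression. It also shows the dependency chain: a P frame is meaningless without the frames that came before it.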
MPEG-1 was originally designed as an asymmetric compression algorithm. In other words the amount of processing power required to encode MPEG is much higher than the amount of processing power required to decode that same MPEG stream. Good for playback, bad for capture and compression.
MPEG-1 for video CDs (sometimes called White Book) specified that an MPEG stream had to run at a constant 1.15 Mbps no matter what (for 1X CD-ROMs). In order to maintain this constant bit-rate during playback the amount of compression applied to each frame is constantly changing (not to mention that compressing a video stream from 124 Mbps down to MPEG-1 rates is a ratio of over 100:1!).
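To put the 100:1 figure in perspective, the short calculation below works out the ratio and the average number of bytes a constant 1.15 Mbps stream can spend on each frame. The 30 frames per second value is an assumption for NTSC material, and the function names are invented for the example.

```python
def compression_ratio(source_mbps, target_mbps):
    """How many times smaller the target stream is than the source."""
    return source_mbps / target_mbps


def average_frame_bytes(stream_mbps, frames_per_second):
    """Average bytes available per frame at a constant bit rate."""
    bits_per_second = stream_mbps * 1_000_000
    return bits_per_second / frames_per_second / 8


print(f"ratio: {compression_ratio(124, 1.15):.0f}:1")
print(f"average frame budget: {average_frame_bytes(1.15, 30):.0f} bytes")
```

That budget is an average over the whole stream, so the large I frames have to be paid for by much smaller P and B frames.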
And there are other problems to deal with. As I said earlier, video is very noisy. Even if you point a camera at a blank wall and shoot a handful of frames you would find that virtually every single pixel that makes up each image will change slightly from frame to frame. A pixel that shifts from an RGB value 10,10,10 to 10,10,9 and back again from one frame to the next might not be enough to notice with the naked eye but to a computer those pixel color values are completely different and would count as changes to keep track of.
Now you can pre-process the images or make the algorithms more sophisticated, but the tradeoffs involve processing time, image quality, compression ratio, or additional hardware, and even then the results can vary quite a bit depending on the source video you’re trying to compress. You might be able to filter out those noise artifacts in a static scene but what happens when the video really is changing significantly from frame to frame? When the camera zooms or pans or moves at all then virtually every single pixel really is changing. Shots of a wheat field blowing in the wind or water flowing or a crackling fire are all difficult to compress. Scene cuts, fades, and dissolves also involve changing every single pixel, so any interframe compression gains are lost in these situations.
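One simple form of pre-processing is to ignore changes that fall below a noise threshold. The sketch below counts how many pixels differ between two frames, first with an exact comparison and then with a tolerance. It is a hypothetical illustration only (the function name and the tolerance value are assumptions), and real encoders use far more sophisticated filtering.

```python
def changed_pixels(frame_a, frame_b, tolerance=0):
    """Count pixels where any RGB channel differs by more than tolerance."""
    count = 0
    for (r1, g1, b1), (r2, g2, b2) in zip(frame_a, frame_b):
        largest_difference = max(abs(r1 - r2), abs(g1 - g2), abs(b1 - b2))
        if largest_difference > tolerance:
            count += 1
    return count


# A "blank wall": four identical pixels, then the same wall one frame later
# with two pixels nudged by noise and one pixel that really did change.
wall = [(10, 10, 10)] * 4
next_frame = [(10, 10, 9), (10, 10, 10), (11, 10, 10), (200, 10, 10)]

print("exact comparison:", changed_pixels(wall, next_frame))
print("with tolerance 2:", changed_pixels(wall, next_frame, tolerance=2))
```

The tradeoff is visible even here: raise the tolerance too far and genuine but subtle changes get thrown away along with the noise.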
MPEG-2
The next step up the compression ladder is to use a variable bit rate when encoding. MPEG-2 uses this technique. If there is a scene change or the image is particularly difficult to compress MPEG-2 will devote more attention to those frames. The player also has to be able to adjust to this changing bit-rate. By comparison, MPEG-1 VCDs run at about 1.15 Mbps while MPEG-2 DVDs run anywhere from about 4 Mbps up to 9.8 Mbps (about a 40:1 compression ratio).
More MPEG Bad News
While video compression algorithms have become fairly sophisticated they have also introduced other problems. For example, because of the nature of MPEG’s IBP structure a player can’t decode the B and P frames until it has an entire GOP in memory. This has multiple implications. First, MPEG doesn’t degrade gracefully like analog TV. MPEG is an all-or-nothing algorithm. If there are problems in the MPEG stream you simply don’t get any image at all (rather than a blurry or fuzzy one).
This also means that MPEG is poorly suited to streaming over the internet since packets can arrive at different times and not necessarily in the proper order. If one of the frames is incomplete then the whole process of decoding has to wait for that missing information to show up – or worse yet everything just crashes to a halt.
This also means that MPEG is harder to edit. You can’t just say “cut from this sequence to another” since the cut might not fall on an I-frame or GOP boundary. MPEG editing software has to decode all the frames first and then re-encode everything after an edit is performed. Unfortunately, since MPEG is a lossy algorithm every time you decode and re-encode a sequence the quality of the image degrades further and further.
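The editing problem can be sketched by laying out the frame types and asking where a cut lands. The 12-frame pattern used here (IBBPBBPBBPBB) is an assumption chosen for illustration, as are the function names, and the model is simplified: it ignores the frame reordering that B frames require in a real stream.

```python
def frame_types(pattern, total_frames):
    """Repeat a GOP pattern such as 'IBBPBBPBBPBB' to cover total_frames."""
    return [pattern[i % len(pattern)] for i in range(total_frames)]


def bracketing_i_frames(types, cut_frame):
    """Return the nearest I frame at or before the cut and at or after it."""
    previous_i = next(
        (i for i in range(cut_frame, -1, -1) if types[i] == "I"), None)
    next_i = next(
        (i for i in range(cut_frame, len(types)) if types[i] == "I"), None)
    return previous_i, next_i


types = frame_types("IBBPBBPBBPBB", 36)
for cut in (17, 24):
    previous_i, next_i = bracketing_i_frames(types, cut)
    if previous_i == cut:
        print(f"frame {cut}: falls on an I frame, so a clean cut is possible")
    else:
        # Frames from the cut up to the next I frame depend on earlier frames.
        last = (next_i if next_i is not None else len(types)) - 1
        print(f"frame {cut}: decode from I frame {previous_i}, "
              f"then re-encode frames {cut} to {last}")
```

Only one frame in twelve is a clean cut point with this pattern, which is why MPEG editors end up decoding and re-encoding so often.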
Finally, even at 4 to 6 Mbps, we’re still talking about transfer rates and storage requirements that would make an IT manager choke.
Beyond MPEG-2
Now that I’ve got you completely confused about MPEG-1 and MPEG-2 I should mention that there are other MPEGs in the works. MPEG-4 is just getting off the ground and MPEG-7 and MPEG-21 are right around the corner. These new MPEGs are basically extensions of the original MPEG algorithms but are optimized for things like multimedia, very low bit-rates (between 5 kbit/s and 10 Mbit/s), and incorporating things like 3D objects and positional audio into the stream. While these features might sound intriguing we’re just going to have to wait a while to play with them since there are almost no tools available today and the few tools that do exist are still in the experimental phase or cost a fortune. On the plus side many of the first tools to make use of MPEG-4 will be software based so you should be able to make use of any existing hardware capture equipment you might buy today.
Now For The Good News
So compressing video is tough and fraught with problems. There is some good news however. Since most of the current video compression algorithms were invented back in the days of 486s and P-90s, most companies that produced video capture and compression products gave up on the idea of using the CPU to perform all the necessary calculations. For the most part this still holds true today. For example, even a 400 MHz CPU has to devote almost 80% of its processing power to decode a DVD MPEG-2 stream using a software CODEC (and remember, encoding is much tougher than decoding). Granted, a lot of this has to do with getting around the CSS copy protection schemes but it’s still a lot of work. Yes, there are some solutions that are mostly software based but they are usually not even close to real-time. Some specialized effects that require software rendering can take 4 to 10 minutes of processing, or more, for every minute of video.
So, since using the CPU was out, they relied on custom silicon to do the tough stuff and pretty much left the CPU alone. For someone building a digital video capture system today this means that even a top-of-the-line, $30,000 encoder will work fine on an old 300 MHz CPU.
The other good news is that since digital video has been picking up momentum over the past five years a lot of the specialized hardware needed has been falling in price. When the first MPEG-2 capture boards were released they cost about $60,000. Today, you can buy a decent MPEG-2 board for about $400.
Time Is Money
So, if you can buy an MPEG-2 capture board for about $400 then how can companies charge $1,500, $5,000, or up to $30,000 for a capture card? Quality is part of the reason but in digital video processing, speed is the real issue. Post-production houses that charge hundreds of dollars per hour to manipulate your video are willing to pay top dollar in order to do their tweaking in real time. When your client is standing over your shoulder with a stopwatch you don’t want to say “okay, I’ve started the rendering, let’s come back in an hour or two”. This is good news for the do-it-yourselfer. Up to a certain point you’re paying for quality – beyond that point you’re paying for speed. If you don’t mind waiting for an effect to render then you can save literally thousands of dollars. You’ll end up with the same quality but it won’t cost you as much as a new car.
Video Toaster on NT – Image courtesy of Newtek
The trick is finding that line and not stepping over (although, you might want to get close enough so that if you discover you really enjoy this stuff you’ll be able to take the next steps without having to buy all new gear). Another thing to consider when shopping around for digital video boards is the age-old computer adage “garbage in, garbage out.” You might be able to get along now with a $100 capture card that is limited to a resolution of 320 x 240 at 15 fps (frames per second) but you’ll never be able to make that video any bigger or better.
Let’s Face It, Video is Expensive
So even though prices have come way down over the years a reasonably good consumer-level video capture board will cost you between $250 and $900. And unfortunately, that’s just the beginning.
As I pointed out earlier, video requires a lot of bandwidth and storage space. While it is possible to do video work with an EIDE drive you will be pushing the limits of the bus and the hard drive. Just about every video professional or serious hobbyist I know uses an Ultra SCSI 2 controller and drives…big, big drives.
I personally like the Adaptec AHA 2940UW (even though it’s a bit long in the tooth) and 5,000 to 7,500 rpm IBM or Seagate Cheetah drives. There are two sidenotes here. First, not all SCSI interface controllers are created equal. Some of the less expensive controllers don’t play nice with the system bus. They basically take control of the entire bus and can choke off the rest of your system. While video capture is 99% a matter of capture card, bus, controller, and hard drive talking to each other, there are still a few functions that require at least some CPU and memory system bus access. Your SCSI controller and drives may be able to handle the video stream but could still end up waiting for a bus-starved CPU to tell if you clicked the mouse somewhere. I have also seen motherboards with lousy system buses that prevent the SCSI controller from achieving its full potential.
Second, hard drives (and file systems) are designed for lots of little reads and writes with time to rest in-between. They were not designed to handle very, very large files that absolutely must have continuous, uninterrupted data transfers both to and from the drive. Because of this many hard disk drives use a caching system to boost speed. Unfortunately, those caches are too small to do any good when you’re trying to save or read a 10 GB video file. If you find that you are dropping frames while capturing (the most common problem) then you might have to disable the hard drive’s buffering.
Also, some hard drives perform a thermal recalibration every few seconds. It only takes an instant but it can be enough to cause dropped frames. Newer drives and so called “AV drives” don’t have this problem but you should be aware that your old drives might not be up to the strain of video capture.
Lastly, depending on the version of your file system you may hit a 2 GB limit on file sizes. The FAT-16 file system (FAT stands for file allocation table) uses a linked list of 16-bit values to keep track of clusters, the minimum amounts of disk space the operating system will allocate. The maximum number of such entries (65,536) multiplied by the maximum size of a cluster (32,768 bytes) is 2,147,483,648 bytes, or 2 GB. Like so many things in computer design, programmers of the original FAT file system structures never dreamed that anyone would ever want to (or be able to) store files over 2 GB. This problem occurs in PC FAT 16 (or lower), Linux NFS, and Apple Macintosh (prior to OS X) but not in NTFS or FAT 32 (although problems can still occur). Most capture systems and accompanying software will offer a workaround to this problem but not all. If you find that you can’t capture sequences longer than a few minutes (or a few seconds depending on your compression settings) this limit might be your problem and you might consider moving up to FAT-32 or NTFS.
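The limit, and how quickly a capture runs into it, can be worked out as follows. The two data rates are taken from elsewhere in this article (26 MB/s for an uncompressed stream and 3.6 MB/s as a typical compressed capture rate); the constant and function names are invented for the example, and 1 MB is treated as 1,000,000 bytes.

```python
FAT16_MAX_CLUSTERS = 2 ** 16          # 65,536 entries from 16-bit values
MAX_CLUSTER_BYTES = 32 * 1024         # 32,768 bytes per cluster
FAT16_LIMIT_BYTES = FAT16_MAX_CLUSTERS * MAX_CLUSTER_BYTES


def seconds_until_limit(rate_mb_per_s, limit_bytes=FAT16_LIMIT_BYTES):
    """Seconds of capture at the given rate before the file hits the limit."""
    return limit_bytes / (rate_mb_per_s * 1_000_000)


print(f"limit: {FAT16_LIMIT_BYTES:,} bytes")
for rate in (26.0, 3.6):
    seconds = seconds_until_limit(rate)
    print(f"{rate} MB/s: {seconds:.0f} seconds ({seconds / 60:.1f} minutes)")
```

At uncompressed rates the limit arrives in well under two minutes, and even at 3.6 MB/s a single file holds only about ten minutes of video.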
So if you find that your inexpensive EIDE drive just can’t hack it you’ll need to invest in a SCSI controller and a few very large, fast hard drives. If you really get serious about video then you might consider stepping up to a RAID system, but we’ll leave that for now. As far as the rest of your system goes there isn’t that much you’ll need beyond an average CPU (300-400 MHz or better) and a fair amount of RAM (128 MB for Win98, 256 MB for Win2K or NT). Since most video capture systems bypass the graphics card completely and simply overlay the video on top of the signal, just about any graphics adapter that can handle today’s games should be fine for a video capture system as long as it supports DirectDraw hardware overlay. Video on the motherboard (particularly AGP setups), ATI All-in-Wonder cards, and the original Matrox Millennium (not the Millennium II) can cause problems for video capture systems since they tend to steal PCI and memory resources. This problem can also occur with high-end sound cards, DVD cards, and Ultra66 cards. Network adapters can also cause problems during video capture since the adapter and the CPU are constantly talking to each other. If you absolutely must have your system connected to a network then you should probably go with NT.
Finally, there are a number of older systems (and a few new ones), chipsets, processors, and peripherals that don’t play nice with IRQs and memory boundaries. Sometimes, a little IRQ tweaking or getting the latest and greatest drivers will solve the problem, but the naked truth is that some configurations just won’t work for capturing and manipulating video. I’ve heard reports that Compaq Presarios with built-in FireWire, Sony VAIOs, HPs with built-in AGP graphics, IBM Aptivas with built-in MPEG, and old Packard Bell computers all have problems with video capture systems. Also, Pentium Pros, AMD K5s, Cyrix P120/P150s, and OverDrive processors for Pentium 60/66/75/90/100s are not recommended.
Other than the capture board, the proper software, and your computer, you’ll probably find that the most expensive aspect of building a video capture system is all the video equipment. You can spend a small fortune on a good camcorder and VCR (to supply all that video to your new board). You’ll also need tapes, camera batteries, tripods, miles of video and audio cables and adapters, a TV set (to view that video during editing), perhaps some basic audio equipment (microphones, mixer, equalizer, etc.) and a million other small things that can quickly drain your bank account.
I’m not going to try to recommend which camcorder to buy but I will mention that DV and Digital8 cameras are a little different and require a special kind of capture setup. If you’re going out shopping for a new DV camcorder then look for one with IEEE 1394 connections and external control features. Many DV capture systems can connect directly to the camcorder and control functions such as starting and stopping, frame-accurate searching, etc. This can be very handy for batch processing and non-linear editing.
Capture Card Basics
There are basically five types of video capture systems: Analog M-JPEG, Analog MPEG, DV, Analog/DV combinations, and the rest (some using proprietary CODECs such as Indeo, and some ultra-high-end systems costing many thousands of dollars). These can also be broken down into realtime and non-realtime (although those terms actually refer to how transition effects are done for output, not how the systems capture video in the first place). Note that when we say analog here we’re referring to the type of video input signals, not the end results (although some boards can’t output back to a VCR without help).
All capture systems are designed to accept certain types of video input – PAL, NTSC, SECAM, composite, component, S-Video, DV, etc. so you need to make sure your video capture board matches your source.
Some capture cards have an external breakout box (or put all the circuitry in the external box). Breakout boxes are handy since they alleviate all that climbing behind your computer to plug things in and out – and believe me, you’ll be doing a lot of that.
All capture systems have a maximum resolution and maximum frame rate. Some will also have a maximum data rate. Less expensive systems will have a maximum resolution of 320 x 240 or less but keep in mind that you can always make an image smaller if you have to but you can’t expand a small image too far without losing a lot of quality so you should probably stick with a system that can handle at least 640 x 480 or 720 x 480. For the best quality you should also look for a system that can capture 60 fields per second rather than 30 frames per second. As far as maximum data rate goes you should look for a system that can handle at least 3.6 MB/s.
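To see how resolution and frame rate translate into data rate, the sketch below estimates the uncompressed rate for a few capture sizes and the compression needed to fit within 3.6 MB/s. It assumes 2 bytes per pixel (a common figure for 4:2:2 sampled Y’CbCr video), which is an assumption on my part and not a property of any particular board; the function names are also invented for the example.

```python
def uncompressed_rate_mb_per_s(width, height, fps, bytes_per_pixel=2):
    """Uncompressed data rate in MB/s (1 MB = 1,000,000 bytes)."""
    return width * height * bytes_per_pixel * fps / 1_000_000


def required_compression(width, height, fps, target_mb_per_s):
    """Compression ratio needed to squeeze the stream into the target rate."""
    return uncompressed_rate_mb_per_s(width, height, fps) / target_mb_per_s


for width, height, fps in [(320, 240, 15), (640, 480, 30), (720, 480, 30)]:
    rate = uncompressed_rate_mb_per_s(width, height, fps)
    ratio = required_compression(width, height, fps, 3.6)
    print(f"{width} x {height} at {fps} fps: {rate:.1f} MB/s uncompressed, "
          f"{ratio:.1f}:1 to reach 3.6 MB/s")
```

Under these assumptions the small 320 x 240 capture already fits within 3.6 MB/s uncompressed, while full-size video needs only modest compression at that rate, which leaves plenty of image quality intact.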
Analog M-JPEG boards were the first pioneers in digital video capture and have gradually lost out in favor of MPEG and DV alternatives but you can still find them out there (at very good prices) and they do a pretty respectable job.
Analog MPEG systems come in two flavors: MPEG-1 and MPEG-2. If you’re serious about video (or think you might be sometime in the future) then it’s worth it to go for the higher-end MPEG-2 systems – who knows, you might want to put your videos on DVD someday. MPEG-1 systems can be perfectly adequate but the quality can vary quite a bit from system to system.
DV capture systems are a special case and require a DV camera or DV VCR for input. These systems can also be divided into two categories: internal and external. In this case we’re talking about where those proprietary DV encoder chips are located. Some early DV systems used the chips in the camcorder to process the video. They would capture the DV video stream from the camera, convert it into another video format such as AVI, and then send the video back to the camcorder in order to perform the DV encoding. They were a little awkward at first but they’ve improved over the years.
DV capture systems have an advantage in that many of them can connect to and control the DV camcorder or DV deck remotely via IEEE 1394 connections. This is handy for batch capturing and editing.
Some DV capture systems have a unique feature that takes advantage of this external control. Rather than capturing and storing a DV signal on the hard drive they capture in compressed AVI (or another format). After you do all your editing and press the render button, the system goes back out to the camcorder or deck, grabs only the bits and pieces necessary, and outputs straight back out in DV format. Of course, this requires that your DV capture system knows how to control your particular DV camcorder or deck. So if you have a DV camcorder it’s important to make sure the capture system supports that particular make and model.
DV Camera and Deck – Image courtesy of Panasonic
Analog/DV combination systems will accept input from either a standard analog input or from a DV camera. These systems will give you the most flexibility if you already have an analog camcorder but are considering a DV camcorder sometime in the future.
Virtually all capture cards come with some sort of bundled software that will let you capture and at least do some basic editing. If a $500 price tag for the capture system makes you cringe then take comfort in the fact that you’re probably also going to get about $800 worth of bundled software. However, if you have a particular target platform in mind, such as RealVideo or QuickTime for Web streaming then you might have to purchase additional software.
You may also find that once you outgrow the bundled “lite” version of Adobe, Ulead, or Avid software you might want to invest in the full version but be prepared for some serious sticker shock – professional-quality NLE (non-linear editing) software and effects packages can get very expensive.
Beyond these basic features, capture systems differentiate themselves based on speed, built-in transition effects, and bundled software. Just about any system will let you capture video that is suitable for the Web (although that’s not saying much). And just about all systems in the $250 on up category will let you capture and manipulate video that is comparable to VHS or better. You’re not going to get HDTV-quality unless you spend a small fortune but for under $1,000 you should be able to put together a system that is capable of getting very close to DVD-quality.
As with everything, there are compromises and trade-offs. When shopping for a capture system it’s always better to opt for the best you can afford. You can always make the image smaller or use a lower frame rate if you need to, but it’s difficult to take a small, poor-quality video and try to make it better.
Coming Up
There have been dozens of books and articles and entire magazines and Web sites devoted to video capture and there’s no way I can cover everything in one article. In the next installment I’ll take a look at some of the systems out there that you can get for under $1,000, what they can (and can’t) do, and how they stack up. In the meantime, drag that camcorder out of the closet and go shoot some footage. You’ll need something to capture once you’ve built your own system.
guywright@home.com
Guy Wright has worked as a video engineer, video editor, and producer/director for cable and experimental television stations. He is a published science fiction author and author of books on computers and video. He has been Technical Editor, Technical Manager, and Editor-in-Chief for a number of high-tech magazines since 1983 including Run Magazine, Amiga World Magazine, OS/2 Magazine, InterActivity Magazine, and Multimedia Week. He has founded or co-founded four software companies. He has published over 500 articles in more than two dozen high-tech and video magazines. He has lectured at numerous trade shows and conferences. He has also produced over a dozen commercial CD-ROM titles. He is currently a Senior Producer at Digital Media Online where he produces four Web sites – DVDCreation.com, DTVProfessional.com, DigitalGameDeveloper.com, and AVVideo.com.