The article first appeared in
IDE Training Course, Part 3: Using RAID
We talked extensively about IDE and the technical aspects of RAID in the first two parts of this series. Only a few practical questions remain unanswered, such as: what is the performance like for each individual RAID Level? And, which levels are best for which applications?
Levels 0 and 1 are as different as fire and water: Level 0 actually exposes your data to a higher risk, while Level 1 ensures maximum security. If you want both, you’ll find yourself having to dig deep into your budget. RAID Levels 3 and 5 store parity information so that, if a drive crashes, all of the data can be recovered once the culprit has been replaced. However, it takes a rather powerful processor to calculate these checksums. The best choice is a RISC (Reduced Instruction Set Computing) processor, because these chips are well suited to this kind of task. The appropriate controllers are expensive, and you’ll need at least three hard drives.
Setting Up a RAID Array
Setting up a RAID array usually doesn’t take much time. For modes 0 and 1 in particular, it is enough to select the drives to be included in the array in the controller’s BIOS. After the system has rebooted, the new drive must then be formatted (under Windows, it may first be necessary to install the driver for the RAID controller).
In RAID 3 or RAID 5 arrays, the controllers often run an initialization process that can take up to several hours.
Something for Everyone: Possible Applications
Though RAID Level 0 is the fastest option of all, it is also the most precarious by far. For example, using four hard drives will push the data transfer rate far beyond 100 MB/s, but fault tolerance will be virtually non-existent. A hard drive is a mechanical component that will age and wear out after a while. Mechanically-induced defects are therefore really only a question of time. But even an electronic error or a minor production error may result in a catastrophe.
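How quickly the risk grows with each additional drive can be sketched with a little arithmetic. The following is a minimal illustration, assuming every drive fails independently and with the same probability over a given period (an idealization). The function name and the 3% figure are hypothetical and chosen only for the example; they are not measured values.

```python
def raid0_failure_probability(p_drive: float, n_drives: int) -> float:
    """Chance that a RAID 0 array is lost, i.e. that at least one drive fails.

    Assumes independent drives that all share the failure probability p_drive.
    """
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be between 0 and 1")
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    # The array survives only if every single drive survives.
    return 1.0 - (1.0 - p_drive) ** n_drives


if __name__ == "__main__":
    # Illustrative value only: 3% failure probability per drive.
    for n in (1, 2, 4):
        print(n, "drive(s):", round(raid0_failure_probability(0.03, n), 4))
```

Under these assumptions, the probability of losing the array rises with every drive added to the stripe set, since a single failure is enough to destroy all data.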
For this reason, RAID 0 is not recommended for long-term data storage, but primarily for setting up fast drives that hold temporary data, for example on file or database servers. And if the system has to be mirrored on short notice, RAID 1 is your best choice. If you have a hot swap cabinet, you can remove the hard drive while the system is running in order to mirror it to a different drive of the same size on a different computer. Then the drive is reinstalled in the computer, while the copy can in turn be protected with RAID 1 once it is connected to a RAID controller.
RAID Level 1 does nothing other than mirror a hard drive’s data (in special cases, also that of an entire array) to another hard drive in real time. It is a common misconception that RAID 1 offers no improvement in performance. Though write operations really aren’t any faster than with only one hard drive, when reading data it is possible, in theory, to reach a data transfer rate equivalent to that of a comparable RAID 0. This is only logical, since data can be read simultaneously from all the drives in the array. In practice, however, there are differences, as the data to be read are not available in cleanly split stripes as they are with RAID 0; instead, the controller has to perform this division itself based on specific patterns.
Using RAID 1 makes sense if your main focus is on maximum data security and minimum recovery efforts (e.g., simple servers). Most RAID controllers are able to perform the recovery procedure independently after a hard drive has been exchanged. You can do this on the fly only if the hard drives are housed in hot swap cabinets.
RAID Level 3 is steadily losing popularity because RAID 5 offers the same advantages with fewer disadvantages. With RAID 3, parity data is written to a dedicated hard drive. The big advantage is that the actual data is distributed across several drives in the form of stripe sets, which allows a significantly higher data rate while also protecting against a hard drive crash. Its disadvantage, however, is that every write operation must also update this single parity drive, which cuts down on write performance considerably.
RAID 3 is usually deployed in servers with mostly static data or servers that require better performance than RAID 0 can provide, without forgoing data security. This is a simple way to keep the low write performance from carrying too much weight.
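The parity principle behind RAID 3 and RAID 5 can be illustrated with a simple XOR calculation. This is only a sketch of the idea, not how any particular controller implements it; the function names and the sample data are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR several equal-length byte blocks together, byte by byte."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) groups the bytes that sit at the same offset on each drive.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Reconstruct the block of one failed drive from the rest plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data drives (illustrative contents) and the parity derived from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the second drive has crashed and rebuild its contents.
recovered = rebuild_missing([data[0], data[2]], parity)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the missing block. It also shows why every write costs extra work: the parity has to be recalculated whenever any data block changes.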
RAID Level 5 dominates in today’s high-end server segment. If you’re using four to seven drives, such an array is a real performer and, if the drives are large, allows correspondingly large partitions. Unlike RAID 3, the parity data are integrated in the stripes on all drives and are distributed in a way that has a positive impact on performance. Consequently, RAID 5 offers a high level of performance for all kinds of applications.
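How parity can be spread across all drives is easiest to see in a small layout table. The sketch below uses one possible rotation scheme, chosen purely for illustration; real controllers use various layouts, and the function name is hypothetical.

```python
def raid5_layout(n_drives: int, n_stripes: int):
    """Return one row per stripe; 'P' marks parity, 'Dn' the n-th data block.

    Assumption: parity starts on the last drive and moves one drive to the
    left with every stripe. Other rotation schemes exist.
    """
    if n_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    rows = []
    block = 0
    for stripe in range(n_stripes):
        parity_drive = (n_drives - 1 - stripe) % n_drives
        row = []
        for drive in range(n_drives):
            if drive == parity_drive:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in raid5_layout(4, 4):
        print(" | ".join(f"{cell:>3}" for cell in row))
```

In this layout, every drive holds parity for exactly one out of every four stripes, so parity updates are no longer concentrated on a single drive as they are with RAID 3.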
The Sky’s the Limit: Nested RAID
If the data transfer rate of an array with several drives is still not enough, you can combine and nest RAID arrays any way you like. These configurations are called Nested RAID (multiple RAID levels), but you’ll encounter them very rarely – and no wonder, because “conventional” arrays are generally fast enough.
As far as we know, controllers supporting Nested RAID are not yet available in the IDE sector (with the exception of RAID 10). For SCSI, be prepared to quickly dish out several hundred to some thousand dollars if you want to set up an elaborate RAID solution.
RAID Level 0+1
The most popular Nested RAID is probably 0+1. You’ll need an even number of hard drives for this, and at least four. Half of the hard drives are used to create a stripe set (RAID 0), and the resulting construct is then simply mirrored to the other half (with RAID 1). You will then get almost four times the read performance and about twice the write performance relative to a single hard drive.
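The performance figures quoted above for four drives follow from a simple idealized model: reads can be served by every drive, while each write must go to both halves of the mirror. The sketch below captures only this upper bound and ignores controller, bus and seek overhead. The function name and the 40 MB/s single-drive rate are assumptions for the example.

```python
def raid01_ideal_throughput(n_drives: int, single_read: float, single_write: float):
    """Idealized upper bounds (same unit as the inputs) for a RAID 0+1 array."""
    if n_drives < 4 or n_drives % 2 != 0:
        raise ValueError("RAID 0+1 needs an even number of drives, at least four")
    stripe_width = n_drives // 2  # drives in each mirrored stripe set
    return {
        "read": n_drives * single_read,       # all drives can serve reads
        "write": stripe_width * single_write,  # every block is written twice
    }


if __name__ == "__main__":
    # Hypothetical single-drive rate of 40 MB/s for both reading and writing.
    print(raid01_ideal_throughput(4, 40.0, 40.0))
```

Real arrays stay below these bounds, which is why the text speaks of "almost" four times the read performance.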
RAID Level 50 (5+0)
The performance of a RAID 5 with several drives is not good enough for you? Then simply create a stripe set consisting of two identical RAID 5 arrays. Though data security is no longer a given now (an array is to be considered a drive in this case), performance can theoretically be doubled once more. In reality, you’ll now be faced with the limits of what PCI and network connections will allow.
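To get a feel for these limits, the theoretical peak rates of the bus and the network can be calculated directly. The sketch assumes a conventional 32-bit PCI slot clocked at 33.33 MHz and a 100 Mbit/s network link. Both values are assumptions about a typical desktop system, the function names are hypothetical, and real-world throughput is lower than these peaks.

```python
def pci_peak_mb_per_s(bus_width_bits: int = 32, clock_mhz: float = 33.33) -> float:
    """Theoretical peak of a PCI bus: bytes per transfer times clock rate."""
    return bus_width_bits / 8 * clock_mhz


def network_peak_mb_per_s(link_mbit: float = 100.0) -> float:
    """Theoretical peak of a network link, converted from Mbit/s to MB/s."""
    return link_mbit / 8


def effective_rate(array_mb_per_s: float, *limits_mb_per_s: float) -> float:
    """The slowest component in the chain determines what actually arrives."""
    return min((array_mb_per_s, *limits_mb_per_s))


if __name__ == "__main__":
    pci = pci_peak_mb_per_s()
    net = network_peak_mb_per_s()
    # Hypothetical array that could deliver 200 MB/s on its own.
    print("local access:  ", round(effective_rate(200.0, pci), 1), "MB/s")
    print("over the network:", round(effective_rate(200.0, pci, net), 1), "MB/s")
```

Under these assumptions, an array faster than roughly 133 MB/s is already held back by the bus, and a client on a 100 Mbit/s network sees no more than 12.5 MB/s regardless of the array.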
Naming is an important factor in Multiple or Nested RAID configurations. While RAID 0+1 works on the lower level with stripe sets and mirrors only on the upper level, with RAID 10 it’s exactly the opposite. As the latter does not really make sense, the wrong nomenclature would be less grave in this case.
Nested RAID and Security: It’s All or Nothing
Now, a few words on the cascaded application of RAID arrays, even though most of you will probably never be in a situation to have to worry about linked drives like this.
Combining several RAID arrays is efficient and prudent, but perfect data security can be achieved only if each array is just as safe in and of itself. A RAID 5 consisting of multiple RAID 0 arrays is not secure, because if one of the drives of a secondary array crashes, its data cannot be recovered.
RAID Levels: Security and Performance at a Glance
RAID | Number of drives | Data security | Availability | Capacity | Performance | Cost |
0 | 1+ | unsatisfactory | bad | 100% | very good | very low |
1 | 2 | good | good | 50% | satisfactory | low |
3 | 3+ | satisfactory | good | (x-1) / x | satisfactory | medium |
5 | 3+ | satisfactory | good | (x-1) / x | good | medium |
0+1 | 4,6,8… | good | good | 50% | good | medium |
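The capacity column of the table can be turned into a small calculator. This is a sketch based only on the fractions given in the table; the function names and the 80 GB drive size in the example are illustrative assumptions.

```python
def usable_fraction(level: str, n_drives: int) -> float:
    """Fraction of the raw capacity that remains usable, per the table above."""
    if level == "0":
        if n_drives < 1:
            raise ValueError("RAID 0 needs at least one drive")
        return 1.0
    if level == "1":
        if n_drives != 2:
            raise ValueError("RAID 1 uses exactly two drives here")
        return 0.5
    if level in ("3", "5"):
        if n_drives < 3:
            raise ValueError("RAID 3 and 5 need at least three drives")
        # One drive's worth of space is taken up by parity.
        return (n_drives - 1) / n_drives
    if level == "0+1":
        if n_drives < 4 or n_drives % 2 != 0:
            raise ValueError("RAID 0+1 needs an even number of drives, at least four")
        return 0.5
    raise ValueError(f"unknown RAID level: {level}")


def usable_capacity_gb(level: str, n_drives: int, drive_gb: float) -> float:
    """Usable capacity of an array built from identical drives."""
    return usable_fraction(level, n_drives) * n_drives * drive_gb


if __name__ == "__main__":
    # Hypothetical example: four identical 80 GB drives.
    for level in ("0", "5", "0+1"):
        print(f"RAID {level}: {usable_capacity_gb(level, 4, 80):.0f} GB")
```

The more drives a RAID 3 or RAID 5 array contains, the smaller the share of capacity lost to parity, whereas mirroring always costs half.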
The Key to Success: Block Size
In RAID arrays, the block size generally also determines the stripe size (not in RAID 1). The principles concerning block size and wasted memory space apply equally to RAID configurations: if, for example, the blocks have a size of 64 KB, then at least 64 KB are written at all times – even if there’s only a text file with 2 KB. So, the smaller the average file size, the smaller the block size should be.
But block size is also significant in terms of the performance to be expected, as the smallest unit also determines when a file can be distributed to two or more drives. This would mean that with a block size of 64 KB, files with less than 64 KB would be written to only one hard drive. This does not happen any faster in a RAID array than on a single hard drive.
On the other hand, a file with 150 KB would be distributed to three hard drives (if available): 64 + 64 + 22 KB. The controller is now able to read from all three drives simultaneously, which shortens the read operation considerably.
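The two examples above can be reproduced with a few lines of code. The sketch assumes that a file starts on a block boundary on the first drive and that blocks are dealt out to the drives in turn; actual placement depends on the controller and the file system. The function names are hypothetical.

```python
def split_into_stripes(file_kb: int, block_kb: int, n_drives: int):
    """Return (drive_index, kb) pairs showing where each piece of a file lands."""
    if block_kb <= 0 or n_drives <= 0:
        raise ValueError("block size and drive count must be positive")
    chunks = []
    remaining = file_kb
    index = 0
    while remaining > 0:
        size = min(block_kb, remaining)
        chunks.append((index % n_drives, size))  # round-robin over the drives
        remaining -= size
        index += 1
    return chunks


def allocated_kb(file_kb: int, block_kb: int) -> int:
    """Space actually occupied, since only whole blocks can be allocated."""
    blocks = -(-file_kb // block_kb)  # ceiling division
    return blocks * block_kb


if __name__ == "__main__":
    print(split_into_stripes(150, 64, 3))  # the 150 KB example from the text
    print(split_into_stripes(2, 64, 3))    # a small file stays on one drive
    print(allocated_kb(2, 64) - 2, "KB wasted for a 2 KB file")
```

With a 64 KB block size, the 2 KB text file occupies a single block on one drive and leaves 62 KB unused, while the 150 KB file is split into 64, 64 and 22 KB across three drives.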
RAID in Perfection: Adaptec ATA RAID 2400A
Adaptec is one of the indispensable giants, especially in the SCSI sector. The current champion in the IDE sector is the ATA RAID 2400A.
- Four channels, one device per channel
- UltraATA/100
- supports RAID modes 0, 1, and 5
- One DIMM slot for on-board memory up to 128 MB, 32 MB included
- full size 32-bit PCI card
On the controller card, next to the i960, you’ll find two HPT370 chips from HighPoint – one of Promise’s biggest competitors (see below).
HighPoint RocketRAID 404: Conventional but with 4 Channels
The RocketRAID 404 is currently the most highly recommended controller. Not only does the chip used (HPT374, developed and manufactured by HighPoint) master UltraATA/133, it also comes with four full-fledged IDE channels. Consequently, the controller is able to handle eight drives de facto.
- Four channels for up to two drives per channel
- UltraATA/133
- supports RAID modes 0, 1, 1+0
- 32-bit PCI card
6-Channel Monster: Promise SuperTrak SX6000
We selected an IDE RAID controller that we could use in as many tests as possible without having to move on to another model. The winner was the SuperTrak SX6000 by Promise, which offers a maximum of features:
- Six channels, one device per channel
- UltraATA/100
- supports RAID modes 0, 1, 1+0 and 5
- i960 RISC processor
- One DIMM slot for on-board memory up to 128 MB
- full size 32-bit PCI card
Like most other IDE RAID controllers, the SX6000 does not support ATAPI devices such as ZIP drives, CD-ROM drives or DVD drives. Nor has it been designed to run only one hard drive. However, hard drives can still be run independently of each other by setting up a separate RAID array for every single drive.
The fact that this controller does not support UltraATA/133 is not critical for two reasons. For one, there are only a few hard drives that are equipped with this interface. Second, the additional performance is limited to the rare experience of being able to read data directly from the hard drive’s cache instead of from the disk surface.
Test Configuration
Test System | |
Processor | Intel Pentium 4, 2.26 GHz, 256 KB L2 cache (Northwood) |
Motherboard | Asus P4B533, 845E chipset |
RAM | 256 MB DDR/PC2100, CL2, Infineon |
IDE controller | i845E UltraDMA/100 controller (ICH4); HighPoint RocketRAID 404, 4-channel UltraATA/133 RAID |
Graphics card | NVIDIA GeForce3, 64 MB |
Network | 3COM 905TX PCI 100 MBit |
Operating systems | Windows XP Pro 5.10.2600 |
Benchmarks and Measurements | |
Office applications | ZD WinBench 99 – Business Disk Winmark 1.2 |
High-end applications | ZD WinBench 99 – Highend Disk Winmark 1.2 |
Performance test | HD Tach 2.61; PC Mark 2002, HD Test |
I/O performance | Intel I/O-Meter |
Drivers and Settings | |
Graphics driver | NVIDIA reference driver 29.42 |
IDE driver | Intel Application Accelerator 2.2; HighPoint RAID driver 1.21 |
DirectX version | 8.1 |
Screen resolution | 1024×768, 16 bit, 85 Hz refresh |
Data Transfer Performance
The tests of the transfer rates show that a RAID 0 array can clearly rake in points over a single drive when it comes to writing.
Burst Transfer Rate
Access Time
I/O Performance
The I/O performance in RAID 1 is encouraging: as two identical media are available to read data from, this controller seems to be able to put this type of resource to good use. It can respond to almost as many requests as a RAID 0 with three drives.
Application Disk WinMarks
PC Mark 2002: Disk Index
Conclusion
The benefits of RAID systems are clear: depending on the mode, you’ll get better data security or improved performance – and you’ll get it on a large scale. RAID arrays are about two years ahead of hard drives when it comes to performance. First and foremost, however, RAID helps to deplete your funds, because the more complex and powerful you want the solution to be, the more you’ll need to invest.
That’s why you need to take certain factors into consideration when you’re toying with the idea of purchasing a hardware RAID. What are your requirements? What will you need in order to fulfill those requirements?
Experience has shown that RAID 0 or 0+1 are best for home use. Though RAID 1 ensures excellent data security, subjectively speaking, the investment in the controller and two hard drives hardly pays off, as you won’t notice too much more performance at first. Modern PCs boot rather quickly anyhow, and copying CD-ROMs to the hard drive is not really accelerated by the presence of a RAID, either.
RAID 0 is, without a doubt, the fastest system, yet it harbors risks. Only one single defect means it’s all over.
Only with RAID 3 or 5 will you get good performance along with high data security, but several hundred dollars for the right controller, plus several hard drives, is something most of us cannot afford.
At this point we have to take IDE RAID’s ranking down a notch or two, because, in addition to the costs, disadvantages include the increased administrative efforts as well as the higher temperature and operating noise caused by the number of hard drives.
Furthermore, IDE currently has to deal with a few handicaps: the drives have not been designed for continuous operation (which is important for server applications), and the ATA cabling is downright cumbersome, especially when you’re using several drives; it causes the heat to be trapped in the cabinet and blocks the view to the interior. Be that as it may, Serial ATA is on its way to finally putting the 40-pin cable into a well-deserved retirement.