Friday, August 01, 2008

RAID-0 and Amdahl's Law

When I see comments like this:

Ok, yeah, a 10k drive is fast. Guess what? Two 7200 drives in Raid 0 are faster. Four drives in RAID 0+1 are bigger, cheaper, faster, and... yeah. Better. You have a measly 300GB drive - I have a 1TB (2TB counting mirroring) array that's faster and with automatic mirroring for less than what you spent on one 10k drive. And your cost doesn't include the second slower drive for archiving.

...the little nerd-alarm in my head goes off, and I feel compelled to present the case against RAID-0. For those who aren't aware, RAID stands for Redundant Array of Independent (or Inexpensive) Disks. The basic idea is simple: by spreading data across several drives in varying configurations, one can theoretically improve performance and/or protect against data loss. For example, when data is duplicated across two separate drives, one drive can fail without any loss of data. This is commonly called RAID-1.

RAID-0, on the other hand, stripes data evenly across two or more drives, with no redundancy. Because each drive holds only part of the data, you can read from all of the drives simultaneously. However, with two drives this also roughly doubles the propensity for data loss: a single drive failure takes the whole array with it, and you lose big.
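To make the "doubles the propensity for data loss" claim concrete, here is a minimal sketch. It assumes that drives fail independently and that each has the same chance of failing over some period; the 3% figure is a made-up illustration, not a measured failure rate.

```python
def array_failure_probability(p_drive: float, n_drives: int) -> float:
    """Chance that a RAID-0 array loses data, assuming each of its n_drives
    fails independently with probability p_drive over the same period.
    The array survives only if every single drive survives."""
    return 1.0 - (1.0 - p_drive) ** n_drives


# Hypothetical 3% chance that any one drive fails during the period.
p = 0.03
for n in (1, 2, 4):
    print(f"{n} drive(s): {array_failure_probability(p, n):.4f}")
```

For small per-drive failure rates, the two-drive result comes out just under twice the single-drive figure, which is where the "doubling" rule of thumb comes from.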

First of all, I should note that the ground I am about to cover has been discussed before. Anandtech evaluated RAID-0 performance by running two Western Digital Raptor drives through numerous benchmarks, and concluded that:

If you haven't gotten the hint by now, we'll spell it out for you: there is no place, and no need for a RAID-0 array on a desktop computer. The real world performance increases are negligible at best and the reduction in reliability, thanks to a halving of the mean time between failure, makes RAID-0 far from worth it on the desktop.

StorageReview.com, long the most trusted source with respect to drive performance, has also made it quite clear how unthrilled they are with RAID-0. The administrator of the site, Eugene, states:

Given the recent return of various "should I raid?!?!??!" posts in the community, I should also take time to point out that the FarCry data presented above is relatively STR-heavy compared to the Office and even the High-End patterns that are broken down in the TB4 article.

Despite this, however, note that while sequential transfer rates represent about 70% of all accesses in FarCry, the array spends only 15% of its time on sequential transfers. Doubling STR through a two-drive array halves this 15%.

Hence, the 10-20% improvement we see in SR's tests when going from a single drive to 2xRAID0 comes from the doubling in capacity (which, in the past, has more or less established itself as a 7-10% performance boost in our tests) + this small improvement.

In other words, sequential transfers already complete so quickly that they have in effect written themselves out of the performance equation. Doubling the performance of a factor that exerts such a small effect nets a small improvement... as one should expect.

What Eugene is getting at is that drive performance can be broken down into two distinct factors: physically moving the actuator arm to the location of the data (i.e. seek time), and then actually reading the data. The former is quite slow, while the latter happens rapidly. RAID-0 has no effect on seek times and, at best, doubles read speeds, which means it doubles something that already happens quickly.
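Putting rough numbers on the FarCry figures Eugene quotes: normalize the single-drive run time to 1, take the 15% sequential share from the quote, and assume (a simplification on my part) that a two-drive stripe exactly halves that share and leaves everything else untouched.

```latex
T_{\text{RAID-0}}
  = \underbrace{0.85}_{\text{everything else}}
  + \underbrace{\frac{0.15}{2}}_{\text{sequential transfers}}
  = 0.925,
\qquad
\text{speedup} = \frac{1}{0.925} \approx 1.08
```

That is roughly an 8% gain from doubling the sequential transfer rate, which is the "small improvement" Eugene describes.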

There is a formal name for this rule: Amdahl's Law. Consider a program consisting of two distinct routines, A and B:

The length of the line corresponds to how much time each routine takes. Notice that making A twice as fast results in a far greater improvement than making B five times faster, because A accounts for most of the total running time. Programmers frequently refer to Amdahl's Law when optimizing software: they first identify the longest running task, and then seek to optimize that task. Bad Programmers™ (yes, we exist) often spend time optimizing B, which is clearly less effective.
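Amdahl's Law is easy to sketch in code. The function name and the 75%/25% split between A and B below are assumptions chosen for illustration; they are not read off the diagram.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the original run time is made
    `factor` times faster and the rest is left unchanged (Amdahl's Law)."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumed split (hypothetical): A is 75% of the run time, B is 25%.
speedup_a = amdahl_speedup(0.75, 2.0)  # make A twice as fast
speedup_b = amdahl_speedup(0.25, 5.0)  # make B five times faster

print(f"A twice as fast:     {speedup_a:.2f}x overall")
print(f"B five times faster: {speedup_b:.2f}x overall")
```

Under that assumed split, the modest 2x improvement to A beats the 5x improvement to B, simply because A is where the time goes.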

Now consider this graph (courtesy of Storage Review):

The first bar shows the amount of time it takes a single Western Digital Raptor drive to perform a random read. The second bar shows the amount of time it takes two Western Digital Raptor drives in RAID-0 to perform the same read. The yellow section is the time spent actually reading data, and the red section is the time it takes to seek to the location on disk. This brings us to an important conclusion: RAID-0 really only has a big effect on sequential reads, but sequential reads take so little time on modern hard drives that they are completely dwarfed by seek times. Doubling sequential read speeds means jack squat. Again, Eugene states that:

In a typical access pattern that features significant localization such as the Office DriveMark, an adept drive such as the WD740GD can achieve about 600 I/Os per second. Inversely stated, each I/O (which, again, consists of positioning + transfer) takes about 1.7 milliseconds. In a single-drive, highly-localized scenario, the Raptor averages 1.7 milliseconds per I/O. Of this 1.7 ms, 0.3 ms, or 18%, is the transfer of data to or from the platter. The other 82% of the operation consists of moving the actuator to or waiting for the platter to spin to the desired location. The situation further polarizes itself as transfer rates rise. At 126 MB/sec, transfers consist of just 11% of the total service time. In effect, sequential transfer rates ranging from 50 MB/sec to 130 MB/sec and higher "write themselves out of the equation" by trivializing the time it takes to read and write data when contrasted with the time it takes to position the read/write heads to the desired location.

...

RAID helps multi-user applications far more than it does single-user scenarios. The enthusiasm of the power user community combined with the marketing apparatus of firms catering to such crowds has led to an extraordinarily erroneous belief that striping data across two or more drives yields significant performance benefits for the majority of non-server uses. This could not be farther from the truth! Non-server use, even in heavy multitasking situations, generates lower-depth, highly-localized access patterns where read-ahead and write-back strategies dominate. Theory has told those willing to listen that striping does not yield significant performance benefits. Some time ago, a controlled, empirical test backed what theory suggested. Doubts still lingered- irrationally, many believed that results would somehow be different if the array was based off of an SATA or SCSI interface. As shown above, the results are the same. Save your time, money and data- leave RAID for the servers!

...or for those of us who don't like verbosity, Eugene's article can be distilled to: Amdahl's Law, Baby. There are situations where sequential read speed is important (e.g. video editing), but even there the performance gain needs to be weighed carefully against the increased propensity for data loss. For the average desktop user, RAID-0 does not provide a substantial performance benefit, comes with an increased chance of data loss, and should be avoided at all costs.
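As a back-of-the-envelope check on the Raptor figures Eugene quotes (1.7 ms per I/O, of which 0.3 ms is transfer), the sketch below models what striping could buy in the best case. It assumes, generously, that every I/O is split perfectly across the drives and that positioning time is unaffected; the constant and function names are mine.

```python
POSITIONING_MS = 1.4  # seek + rotational latency: 1.7 ms total minus 0.3 ms transfer
TRANSFER_MS = 0.3     # single-drive transfer time per I/O, from the quote


def service_time_ms(stripe_width: int) -> float:
    """Per-I/O service time if striping divides only the transfer time
    (idealized assumption; positioning time stays the same)."""
    return POSITIONING_MS + TRANSFER_MS / stripe_width


baseline = service_time_ms(1)
for drives in (1, 2, 4, 8):
    t = service_time_ms(drives)
    print(f"{drives} drive(s): {t:.3f} ms per I/O, {baseline / t:.2f}x")

# Upper bound: even with instantaneous transfers, positioning time remains.
print(f"limit: {baseline / POSITIONING_MS:.2f}x")
```

Even in this idealized model, two drives give about a 10% improvement, and no number of drives can push the gain past roughly 21%, because the 1.4 ms of positioning never shrinks.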