After moving my blog to its new home and getting my hands dirty with Drupal, it's time to continue my series of blog articles about setting up a home server. Remember? We talked about home server requirements, then I presented to you my small and energy-efficient, still ECC-protected and powerful AMD-based home server. Now it's time to explore some different ZFS disk pool RAID strategies.
The great thing about ZFS for home servers is that it gives you the power of RAID without the need to pay for expensive RAID-cards. In fact, the RAID options offered by ZFS rival those of really big, powerful and expensive enterprise disk systems! And all that with cheap, consumer-grade disks.
A lot has been written about this already, so let me focus on ZFS RAID options in a home server context (if you know this already, you can skip this short ZFS intro and jump straight to the RAID-Greed discussion below):
Home Server ZFS RAID Options
For home servers, you have the same options as your big server brothers in datacenter-land. Enterprise RAID power to the people!
RAID-0: Basic striping. More disks, more space. Simple but dangerous: One disk breaks, and your whole pool is lost. Only recommended in combination with one of the fault-tolerant RAID levels below. See the zpool add section of the zpool(1M) man page.
RAID-1: Basic mirroring. More disks, more reliability, same capacity. If a disk breaks, you still have enough disks left with all the data. You also get more read speed: ZFS will fetch data blocks in a round-robin fashion from all disks in the mirror, in parallel. While 2-way mirrors are most often used, 3- or more-way mirrors are possible, too. See zpool(1M) attach for details.
RAID-Z: Similar to RAID-5, you get n-1 disks worth of space, and spend 1 disk's worth on parity for fault tolerance. One disk breaks, you still get to your data. Performance is more complicated: Writes are spread across all disks, so essentially, per I/O, you'll always get the performance of a single disk. Same for reads. If you're lucky and the stars align and you read a lot of data at once, you may get more than that. I'll let the zpool(1M) man page explain the rest.
RAID-Z2 and -Z3: Same as RAID-Z but you get to spend 2 or 3 disks for parity. This means that 2 or 3 disks may fail before you lose any data. You guessed it: The zpool(1M) man page is your friend.
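The capacity and fault-tolerance rules from the list above can be written down in a few lines. This is only a rough sketch that assumes equal-sized disks in a single group; the function names and level labels are made up for illustration and are not part of any ZFS tool.

```python
# Rough sketch of the rules of thumb above, assuming n equal-sized disks
# in one group. Names and level labels are illustrative, not ZFS API.
PARITY = {"raidz": 1, "raidz2": 2, "raidz3": 3}


def usable_disks(level: str, n: int) -> int:
    """How many disks' worth of space you get out of n disks."""
    if n < 1:
        raise ValueError("need at least one disk")
    if level == "stripe":      # RAID-0: every disk adds space
        return n
    if level == "mirror":      # RAID-1: all disks hold the same data
        return 1
    if level in PARITY:        # RAID-Z family: n minus the parity disks
        if n <= PARITY[level]:
            raise ValueError("not enough disks for this parity level")
        return n - PARITY[level]
    raise ValueError(f"unknown level: {level}")


def tolerated_failures(level: str, n: int) -> int:
    """How many disks may break before data is lost."""
    if level == "stripe":
        return 0
    if level == "mirror":
        return n - 1
    if level in PARITY:
        return PARITY[level]
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    for level, n in [("stripe", 2), ("mirror", 2), ("mirror", 3),
                     ("raidz", 3), ("raidz2", 5), ("raidz3", 7)]:
        print(level, n, usable_disks(level, n), tolerated_failures(level, n))
```

Multiply the usable disk count by the size of one disk to get the approximate pool capacity.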
What ZFS Gives You that Controllers Can't
So far, so good. There are a couple of things that ZFS RAID levels offer that traditional RAID controllers can't:
ZFS RAID is about as fast as hardware RAID. That's right, there's no need to pay money for RAID controllers any more. The explanation is hidden in the algorithms behind ZFS and there are a couple of articles out there to explain why. You can trust me on this or ask me to write an extra article on ZFS performance.
ZFS can detect and recover from partial failures. It's easy to detect a disk that is completely broken: It won't respond to any SATA/SCSI commands. It's less easy for the cases where the disk seems happy, but returns bad data. That's where you lose data without knowing, and these cases are much more common! ZFS will detect any bad block and be able to recover from it, if you configured any RAID-level above 0. Before ZFS, nothing could give you that level of data integrity. In my home server experience with consumer grade disks, I typically see checksum errors every 4-6 months and about 1 fully broken disk every 2 years, with a total population of about 4-6 disks.
ZFS is open-source and cross-platform. This is important, because you can rip out ZFS disks from any server, mix them if you like, then put them into any other server that speaks ZFS, and your pool will be readable. Try that with a 3-year-old RAID controller when it's broken and you can't buy a replacement!
There's still more but let's focus on a different question, one that home server builders tend to neglect too often:
I've seen many people, enterprise customers, developers, consultants and home server builders blindly deciding to use RAID-Z (or RAID-5 if they use a controller). It seems like a natural choice: You buy disks by the Gigabyte, you want to get the most out of them, so you configure them for maximum space, because all that counts is capacity. Really?
Let's say we want to create a 2TB pool. What are the options?
A single 2TB disk or multiple smaller, striped disks: Forget it. We're talking consumer disks here, they'll break sooner rather than later and then all of your data will be lost. If this is a backup, or scratch space and you have other copies elsewhere, this may work, but not for the main data pool of your server.
Mirroring: Buy 2 x 2 TB disks and mirror them together. Today, a pair of Samsung F3 HD203WI (the cheapest 2 TB disks I could find on Amazon.de) will cost you EUR 280 and you're done. You could also buy 4 x 1 TB disks and stripe together 2 mirrors of 2 disks to get a total of 2 TB, but that would be slightly more expensive (around EUR 300). If you're after performance, this is still a good option, because 4 disks in a striped mirror tend to be twice as fast as 2.
RAID-Z: Buy 3 x 1 TB (for example the Samsung F3 HD103SJ, which is the small brother of the 2TB drive above), and you'll get a 2TB RAID-Z pool for about EUR 225.
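To put the three redundant options side by side, here is a small sketch that works out the price per usable TB and how much each option saves against the 2 x 2 TB mirror. The prices are the rounded figures quoted above; the dictionary layout and function names are just for illustration.

```python
# Price comparison for a 2 TB pool, using the rounded EUR prices quoted
# in the text. Structure and names are illustrative only.
OPTIONS = {
    "2 x 2 TB mirror": {"price_eur": 280, "usable_tb": 2.0},
    "4 x 1 TB striped mirrors": {"price_eur": 300, "usable_tb": 2.0},
    "3 x 1 TB RAID-Z": {"price_eur": 225, "usable_tb": 2.0},
}


def price_per_tb(option: dict) -> float:
    """EUR per usable TB."""
    return option["price_eur"] / option["usable_tb"]


def saving_percent(baseline_eur: float, other_eur: float) -> float:
    """How much cheaper 'other' is than 'baseline', in percent."""
    return (baseline_eur - other_eur) / baseline_eur * 100


if __name__ == "__main__":
    baseline = OPTIONS["2 x 2 TB mirror"]["price_eur"]
    for name, option in OPTIONS.items():
        saving = saving_percent(baseline, option["price_eur"])
        print(f"{name}: {price_per_tb(option):.1f} EUR/TB, "
              f"{saving:+.1f}% vs. the 2-disk mirror")
```

A negative percentage means the option costs more than the 2-disk mirror.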
Mirroring and RAID-Z Compared
So how do Mirroring and RAID-Z compare?
Price: RAID-Z wins by about 20%. Nice.
Performance: We'll assume a random read usage pattern. The mirror will write at about the same performance as a single disk. So does the RAID-Z pool. But for reads, the mirror will be up to twice as fast, because data blocks can be fetched in parallel from both disks. The RAID-Z configuration needs to access all disks for every single block of data, so not much to gain here. There is a corner case where streaming large amounts of data (if you're lucky) can take advantage of all the disks in parallel, but this doesn't apply to the regular usage pattern of a home server.
The difference gets worse if you increase the number of disks: A 2 x 2 mirror will be roughly twice as fast for writes and 4 times as fast for reads, while a 2+1 RAID-Z is still stuck at the speed of a single disk for both writes and reads.
Does performance matter for a home server? Maybe, maybe not: Average performance for a single drive of this class is around 60-70 MB/s (and I've seen that on my current home server, too), but this is only about 50% of what Gigabit Ethernet offers, so you won't be able to run a backup at full network speed. But I agree that 60 MB/s may be enough for most use cases.
Fault-Tolerance: In the simplest case, there's almost a draw: Both the mirror and the RAID-Z config can compensate for 1 broken disk. But since the RAID-Z pool has more disks (and assuming roughly the same per-disk probability of disk failure), there's a bigger chance that a disk breaks and that you'll need to order a replacement (and replace it quickly enough!) for RAID-Z. The difference increases with the number of disks: In a 2x2 mirror of 2TB disk scenario, up to 2 disks may break, if they're the right ones, before data is lost. In a 3+1 RAID-Z scenario (same number of disks), still only one disk may break before you're in trouble.
Flexibility: So you want to upgrade your pool because your new DSLR filled the old one too quickly with all those RAWs. In the 2-way mirror case, you just buy 2 extra disks (of your choice) and zpool add them, and you're done. Or you buy 2 bigger disks and replace your old disks with them, anticipating that the old ones may break soon. With RAID-Z, the entry hurdle to replacement starts at 3 disks and goes up with stripe size, unless you want to end up with a chaotically mixed RAID-Z+Mirror configuration (this works, but is not recommended). This may become unwieldy if your server case isn't the biggest, and it introduces yet more disks into your pool that may break and add to your gray hair count.
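The fault-tolerance argument above can be checked by brute force: list every possible set of failed disks and count how many of them the pool survives. The sketch below models a pool as a list of groups, each with a disk count and a number of tolerated failures. The per-disk failure probability used in the demo is a made-up number for illustration, not a measurement.

```python
from itertools import combinations


def survivable_sets(vdevs, failures):
    """Count failure sets the pool survives.

    vdevs: list of (disk_count, tolerated_failures) per group.
    Returns (survivable, total) over all sets of `failures` broken disks.
    """
    # Tag each disk with the index of the group it belongs to.
    owner = [i for i, (count, _) in enumerate(vdevs) for _ in range(count)]
    survivable = total = 0
    for failed in combinations(range(len(owner)), failures):
        total += 1
        broken = [0] * len(vdevs)
        for disk in failed:
            broken[owner[disk]] += 1
        # The pool survives only if every group stays within its limit.
        if all(broken[i] <= vdevs[i][1] for i in range(len(vdevs))):
            survivable += 1
    return survivable, total


def p_any_failure(disks, p_disk):
    """Chance that at least one of `disks` independent disks breaks."""
    return 1 - (1 - p_disk) ** disks


if __name__ == "__main__":
    striped_mirrors = [(2, 1), (2, 1)]   # 2x2 mirror
    raidz_3_plus_1 = [(4, 1)]            # 3+1 RAID-Z
    print(survivable_sets(striped_mirrors, 2))
    print(survivable_sets(raidz_3_plus_1, 2))
    p = 0.05  # hypothetical per-disk failure probability
    print(p_any_failure(2, p), p_any_failure(3, p))
```

For two broken disks, the 2x2 mirror survives whenever the failures land in different mirrors, while the 3+1 RAID-Z never does.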
If you want to dig deeper, Richard Elling wrote a more complete discussion (with more disks) of Price/Performance/Fault Tolerance of RAID-Levels, which I highly recommend reading.
To summarize: The 20% price advantage that RAID-Z gives us doesn't buy us very much:
We get slower read performance,
we get less fault-tolerance (read: A higher probability that we lose data), and
we get less flexibility and more clutter.
More disks in a RAID-Z set mean bigger savings, but performance, fault tolerance and flexibility get worse the bigger the stripes get.
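The performance rule of thumb used in this comparison fits into a tiny model: for random I/O, writes scale with the number of striped groups, mirror reads scale with the total number of mirrored disks, and a RAID-Z group behaves like a single disk. This is a sketch of that simplification, not a benchmark, and the function names are invented for illustration.

```python
# Rule-of-thumb speedups relative to a single disk, for random I/O.
# This encodes the simplified model from the text, not measured results.


def mirror_speedup(groups: int, ways: int) -> dict:
    """Striped mirrors: `groups` mirrors of `ways` disks each."""
    return {"write": groups, "read": groups * ways}


def raidz_speedup(groups: int = 1) -> dict:
    """RAID-Z: each group performs roughly like one disk."""
    return {"write": groups, "read": groups}


def space_efficiency_raidz(disks: int, parity: int = 1) -> float:
    """Fraction of raw capacity that is usable in one RAID-Z group."""
    return (disks - parity) / disks


if __name__ == "__main__":
    print("2-way mirror:", mirror_speedup(1, 2))
    print("2x2 mirror:  ", mirror_speedup(2, 2))
    print("3-way mirror:", mirror_speedup(1, 3))
    print("2+1 RAID-Z:  ", raidz_speedup())
    for n in (3, 5, 9):
        print(n, "disks RAID-Z usable:", space_efficiency_raidz(n))
```

The last loop shows the trade-off: usable space per disk improves with wider RAID-Z groups, while the speedup stays at one disk's worth.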
Conclusion and a Bonus Strategy
You already guessed it: I like mirroring! Especially for home servers. It's simple. It's fast. It does a better job at protecting my data. I can expand my pool in increments of 2, not 3 disks. As Richard concludes in his article: Life's happier with mirrors.
I even added an extra bonus to my home server pool strategy:
In the summer of 2009, when I built my home system, I bought 2 x 1.5 TB disks (the older Samsung F2 HD154UI) and mirrored them (today, I'd buy two same-sized disks from different vendors, just to be sure not to run into any serial production issues).
In January, one of the disks showed 14 read errors. ZFS was able to fix them, because the disks were mirrored, and I got a warning: Time to think about hot spares. I bought the 2 TB Western Digital WD20EADS for two reasons:
Just in case Samsung had a bad year, I wanted to switch vendors. Over time, I'll spread risk over different vendors in my pool.
When attaching the new disk to the 1.5 TB pool, I'll avoid any "sorry, your disk is just a few blocks too small" errors. While two disks may be sold with the same number of GB, they may still differ by small amounts. Adding a slightly smaller disk to a ZFS mirror simply doesn't work, while adding a bigger one always does.
I kept the disk that showed the read errors for now (it still works ok after resilvering), and attached the new disk to the mirror, forming a 3-way mirror. This is like a hot-spare that is already sync'ed in, providing an extra layer of fault-tolerance (I wouldn't complain about the extra 4W of power consumption, this is still as good as the cheaper 2+1 disk RAID-Z configuration from a power perspective, but much more fault-tolerant).
I'm now waiting for one of the 1.5 TB disks to really fail. This will give me an excuse to buy a second 2 TB disk, get rid of both 1.5 TB drives and ZFS will automatically grow the pool size to 2 TB. Automatic, organic pool growth through faulty drive replacement!
Once any of the 2 TB drives starts showing first signs of failure, the whole cycle will start again, with bigger drives. Or, I may decide to add the next bigger hot spare sooner, rather than later, and upgrade to a 3-way mirror again, before any new errors start nagging me again.
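The growth cycle described above boils down to one rule: a mirror is only as big as its smallest disk. Here is a minimal sketch that replays the steps with the disk sizes from my pool. The helper names are hypothetical and only model capacity; they don't talk to ZFS.

```python
# Replays the upgrade cycle, assuming a mirror's usable capacity equals
# the size of its smallest member. Helper names are illustrative only.


def mirror_capacity(disks_tb):
    """Usable capacity of a mirror, in TB."""
    return min(disks_tb)


def attach(disks_tb, new_tb):
    """Add a disk to the mirror; it must not be smaller than the pool."""
    if new_tb < mirror_capacity(disks_tb):
        raise ValueError("new disk is too small for this mirror")
    return disks_tb + [new_tb]


def detach(disks_tb, old_tb):
    """Remove one disk of the given size from the mirror."""
    remaining = list(disks_tb)
    remaining.remove(old_tb)
    return remaining


if __name__ == "__main__":
    pool = [1.5, 1.5]                      # original 2-way mirror
    pool = attach(pool, 2.0)               # sync in the bigger "hot spare"
    print(pool, mirror_capacity(pool))     # still limited by the 1.5 TB disks
    pool = attach(detach(pool, 1.5), 2.0)  # a 1.5 TB disk fails, swap it
    pool = detach(pool, 1.5)               # retire the last 1.5 TB disk
    print(pool, mirror_capacity(pool))     # the pool has grown
```

The size check in attach mirrors the "few blocks too small" problem: only a disk at least as big as the current pool can join the mirror.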
All Good Things Come in Three
Now there you have it: I'm proposing 3-way mirrors for home servers. Really, there's no reason not to!
Contrary to widespread GB-greed practice, disks are cheap!
Disk performance and disk fault-tolerance are not cheap. While I may settle for 60 MB/s vs. the maximum of 128 MB/s that Gigabit Ethernet offers, I really don't want to lose data. This is the real goal: Don't lose data.
You should always have a hot-spare. Why not sync it in already, save resilvering time and avoid windows of vulnerability?
As an extra bonus, a 3-way mirror gives you 3x read performance. Cool!
And finally, mirroring always gives you a granularity of 2 for expansion, which is useful. No need to save money to buy that 5-drive set! Using a 3rd drive is optional with mirroring, it really is just a hotspare that is sync'ed in already, and it lets you sleep really well!
What's your RAID strategy for home servers? What's your rationale behind it? What experiences have you had with broken disks, hot-spares and replacements? Let me know by leaving a comment!
Update: Corrected "128 GB/s" into "128 MB/s". Thanks to Tom for pointing this out!