ZFS Is for 1337 Hax0rz · Constant Thinking

The developers of ZFS are a funny bunch of people. You can tell by watching the “ZFS: The Next Word” talk, meeting them at conferences, or reading their blogs and their comments on mailing lists.

There are some funny parts in the ZFS source code, too. In fact, if you use ZFS, you have a joke sitting on your disk, right under your nose!

I was reminded of this particular joke while listening to Ulrich Gräf’s excellent talk on ZFS internal data structures during OSDevCon 2009 (no link, osdevcon.org no longer exists). A video of Ulrich’s talk was published as well (no link, sun.com no longer exists).

But first, we need to dig a little bit into the world of ZFS data structures.

ZFS On-Disk Data Structure

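ZFS’ on-disk data structure is organized as a self-validating Merkle tree: every block pointer stores the checksum of the block it points to, so each level of the tree vouches for the level below it. The root of the ZFS tree is called the “Überblock”.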

Yes, the intention is indeed to use the German word “Über” (for “over”). So the root of the tree is the “Over-Block”, but since English keyboards don’t have diaereses on their letters, the usual spelling is “Uberblock”. (Don’t worry, a diaeresis is not an illness. It is the pair of dots that sits on top of certain vowels in certain languages, but I digress.)
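To make the “self-validating” part a bit more concrete, here is a minimal sketch in C. It is not ZFS code: the names toy_checksum, toy_blkptr, toy_point_at and toy_verify are made up for illustration, and the checksum is a simplified Fletcher-style sum rather than one of the real ZFS checksum functions. It only shows the principle that the parent records the checksum of its child and checks it again on every read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy Fletcher-style checksum: two running sums packed into 64 bits.
 * A stand-in for illustration, not one of the real ZFS checksums. */
static uint64_t toy_checksum(const uint8_t *data, size_t len)
{
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a += data[i];   /* sum of the bytes */
        b += a;         /* sum of the sums, so byte order matters */
    }
    return ((uint64_t)b << 32) | a;
}

/* Hypothetical block pointer: it lives in the parent block and records
 * where the child is and which checksum the child must have. */
struct toy_blkptr {
    const uint8_t *child;
    size_t len;
    uint64_t expected;
};

/* Build a pointer to a child block, recording its current checksum. */
static struct toy_blkptr toy_point_at(const uint8_t *child, size_t len)
{
    assert(child != NULL);
    struct toy_blkptr bp = { child, len, toy_checksum(child, len) };
    return bp;
}

/* Returns 1 if the child still matches the checksum stored in the parent,
 * 0 if the child was modified or corrupted after the pointer was built. */
static int toy_verify(const struct toy_blkptr *bp)
{
    return toy_checksum(bp->child, bp->len) == bp->expected;
}
```

Because the pointer itself sits inside a block that is checksummed by its own parent, the chain of trust runs all the way up to the root of the tree.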

So in order to import a pool, ZFS needs to identify disks as having a ZFS data structure on them, and it needs to find out where the Überblock is stored. To do that, ZFS looks for a magic number on each disk it imports.

The Magic Number

The magic number for a ZFS Überblock is: 0x00bab10c. I know it by heart, and soon you will, too.

Checking the source (no link, opensolaris.org no longer exists), we see what it really means:

#define	UBERBLOCK_MAGIC		0x00bab10c		/* oo-ba-bloc!	*/

So there you have it, Leetspeak of the finest hex kind, right in the middle of the ZFS source code, and baked into the deepest innards of every ZFS disk!

Of course, I’m not the first to point this out, and perhaps Tabriz deserves the credit for being the first (no link, sun.com no longer exists). But it always leads to a round of chuckles when discussing ZFS during a talk with customers :).

You can learn more about the ZFS on-disk format by reading the ZFS on-disk format specs (no link, opensolaris.org no longer exists).
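As a rough sketch of what such a magic-number check can look like in C, consider the following. The constant is the one quoted above; everything else (magic_result, toy_bswap64, check_magic) is a hypothetical name invented for this example. The sketch assumes that the magic is stored in a 64-bit field and that a byte-swapped match means the pool was written on a machine with the opposite endianness.

```c
#include <stdint.h>

#define UBERBLOCK_MAGIC 0x00bab10cULL   /* oo-ba-bloc! */

/* Hypothetical result type for this sketch. */
enum magic_result { MAGIC_NONE, MAGIC_NATIVE, MAGIC_SWAPPED };

/* Reverse the byte order of a 64-bit value. */
static uint64_t toy_bswap64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* Classify a 64-bit magic field that was read from disk. */
static enum magic_result check_magic(uint64_t on_disk_magic)
{
    if (on_disk_magic == UBERBLOCK_MAGIC)
        return MAGIC_NATIVE;    /* written with our byte order */
    if (on_disk_magic == toy_bswap64(UBERBLOCK_MAGIC))
        return MAGIC_SWAPPED;   /* written with the other byte order */
    return MAGIC_NONE;          /* not an Überblock at all */
}
```

In this sketch, a field that reads back as 0x0cb1ba0000000000 is still recognized: it is the same joke, just told backwards.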

The Überblock’s magic number may be an old joke by now, so here’s a newer one:

The Funny Side of Dedup

Recently, I found another funny piece of source. This time, it was buried deep inside the recently released deduplication source.

As Jeff pointed out (no link, sun.com no longer exists), ZFS deduplication uses SHA-256 as a checksum by default. Since it’s a strong hash algorithm, the chance of finding two blocks with the same checksum, but different data, is low. Very low. Astronomically low.

As a colleague of mine put it: “Your server is likely going to disintegrate into dust out of pure radioactive decay before it’ll find two different blocks with a matching SHA256 hash!”
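To put a rough number on “astronomically low”, the usual birthday approximation says that among n different blocks hashed to b bits, the probability of at least one collision is about n² / 2^(b+1), as long as that value is tiny. The small C sketch below does this arithmetic in logarithms to avoid underflow. The function names are made up for this example, and the pool size used in the note after it is an assumed figure, not a measurement.

```c
#include <assert.h>

#define LOG10_OF_2 0.30102999566398120

/* log2 of the birthday-bound collision probability for 2^log2_blocks
 * distinct blocks and a hash of hash_bits bits:
 *   p ~ n^2 / 2^(hash_bits + 1)  =>  log2(p) ~ 2*log2(n) - (hash_bits + 1)
 * Only meaningful while the result is far below zero. */
static double log2_collision_probability(double log2_blocks, int hash_bits)
{
    assert(hash_bits > 0);
    return 2.0 * log2_blocks - (double)(hash_bits + 1);
}

/* The same estimate expressed as a power of ten. */
static double log10_collision_probability(double log2_blocks, int hash_bits)
{
    return LOG10_OF_2 * log2_collision_probability(log2_blocks, hash_bits);
}
```

For an assumed pool of 2^35 unique blocks (about 4 PiB at 128 KiB per block), log2_collision_probability(35.0, 256) gives -187, which is a probability of roughly 10^-56.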

Still, a programmer’s duty is to write code for every possible case. Including the one with two different blocks sharing the same strong hash. Finding two different blocks with the same hash may be next to impossible, but it is, technically, not completely impossible.

Here’s the piece of code that handles this, straight out of zio.c (no link, opensolaris.org no longer exists), with a very optimistic comment:

if (zp->zp_dedup_verify && zio_ddt_collision(zio, ddt, dde)) {
  /*
   * If we're using a weak checksum, upgrade to a strong checksum
   * and try again.  If we're already using a strong checksum,
   * we can't resolve it, so just convert to an ordinary write.
   * (And automatically e-mail a paper to Nature?)
   */
  if (!zio_checksum_table[zp->zp_checksum].ci_dedup) {
    zp->zp_checksum = spa_dedup_checksum(spa);
    zio_pop_transforms(zio);
    zio->io_stage = ZIO_STAGE_OPEN;
    BP_ZERO(bp);
  } else {
    zp->zp_dedup = 0;
  }
  zio->io_pipeline = ZIO_WRITE_PIPELINE;
  ddt_exit(ddt);
  return (ZIO_PIPELINE_CONTINUE);
}

I’m looking forward to that paper in Nature! In a few million years, that is.

But this still leaves me thinking: How did they test this particular branch of code?
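I don’t know how the ZFS team actually did it. One conceivable approach, sketched below in C, is to separate the decision from the hash and feed it a deliberately weak hash that collides easily. Everything here is hypothetical: weak_hash, write_action and on_hash_match are names invented for this example, and the logic only mirrors the comment in the quoted code, it is not taken from zio.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The three outcomes described in the comment of the quoted code. */
enum write_action { WRITE_DEDUP, WRITE_RETRY_STRONG, WRITE_ORDINARY };

/* Deliberately weak hash: the plain sum of all bytes.
 * Any two blocks that are permutations of each other collide. */
static uint64_t weak_hash(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Decide what to do with an incoming block whose hash matched an
 * existing dedup entry. "verify" asks for a byte-by-byte comparison. */
static enum write_action on_hash_match(const uint8_t *existing,
                                       const uint8_t *incoming, size_t len,
                                       int verify, int checksum_is_strong)
{
    assert(existing != NULL && incoming != NULL);
    /* Without verify the matching hash is trusted; with verify the
     * block is only deduplicated if the bytes really are identical. */
    if (!verify || memcmp(existing, incoming, len) == 0)
        return WRITE_DEDUP;
    /* Genuine collision: retry with a strong checksum if we were using
     * a weak one, otherwise give up on dedup and write normally. */
    return checksum_is_strong ? WRITE_ORDINARY : WRITE_RETRY_STRONG;
}
```

With this weak hash, the two-byte blocks “ab” and “ba” collide, so both the “retry with a strong checksum” path and the “Nature paper” path can be reached on demand by toggling checksum_is_strong.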

Your Take

What pieces of funny code did you find in the OpenSolaris source? Or elsewhere? Share the fun by posting a comment!

Update (2010-06-20): Deirdré pointed out that Ulrich’s talk, in which he explains the 0x00bab10c joke, is available on video. Watch it now! (no link, sun.com no longer exists) Thanks, Deirdré!