Constant Thinking | Technology Thoughts for the Quality Geek | by Constantin Gonzalez
This post is probably obsolete

This blog post is more than 5 years old and a lot has changed since then: I have changed employers, blogging platform, software stack, and infrastructure, as well as the other technologies I work with and the interests I follow.
The stuff I wrote about in this article likely has changed, too, so you may find more recent information about things like Solaris, Oracle/Sun systems, Drupal, etc. somewhere else.
Still, this content is provided for your convenience, in case it is still useful for you. Just don’t expect it to be current or authoritative at this point.
Thanks for stopping by!


ZFS is for 1337 Hax0rz

The developers of ZFS are a funny bunch of people. You can tell by watching the "ZFS: The Next Word" talk, meeting them at conferences, or reading their blogs and their comments on mailing lists.

And there are some funny parts in the ZFS source code, too. In fact, if you use ZFS, you have a joke sitting on your disk, right under your nose!

I was reminded of this particular joke while listening to Ulrich Gräf's excellent talk on ZFS internal data structures during OSDevCon 2009 (a video of Ulrich's talk is available).

But first, we need to dig a little bit into the world of ZFS data structures.

ZFS On-Disk Data Structure

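The ZFS on-disk data structure is organized as a self-validating Merkle tree: every block pointer carries a checksum of the block it points to, so each level of the tree validates the one below it. The root of the ZFS tree is called the "Überblock".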
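
To make "self-validating" a bit more concrete, here is a minimal sketch in C of a parent that records the checksum of its child and checks it on every read. It is a toy, not ZFS code: the names `toy_blkptr_t`, `toy_checksum`, `toy_point_at` and `toy_read_verified` are made up for this example, and the 64-bit FNV-1a hash merely stands in for the checksums ZFS really uses (such as Fletcher or SHA-256).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in checksum (64-bit FNV-1a), NOT a ZFS checksum. */
static uint64_t toy_checksum(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;               /* FNV-1a prime */
    }
    return h;
}

/* Hypothetical "block pointer": the parent keeps the child's checksum. */
typedef struct {
    const unsigned char *child;   /* where the child block lives */
    size_t               len;     /* size of the child block */
    uint64_t             cksum;   /* checksum recorded at write time */
} toy_blkptr_t;

/* Build a pointer to a freshly "written" child block. */
static toy_blkptr_t toy_point_at(const unsigned char *child, size_t len)
{
    assert(child != NULL);
    toy_blkptr_t bp = { child, len, toy_checksum(child, len) };
    return bp;
}

/* Returns 1 if the child still matches the checksum kept in the parent. */
static int toy_read_verified(const toy_blkptr_t *bp)
{
    assert(bp != NULL);
    return toy_checksum(bp->child, bp->len) == bp->cksum;
}
```

Because the checksum lives in the parent rather than next to the data, a corrupted child cannot vouch for itself. Apply the same idea at every level and the root ends up vouching for the whole tree.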

Yes, the intention is indeed to use the German word "Über" (for "over"). So the root of the tree is the "Over-Block", but since English keyboards don't have keys for letters with a diaeresis, the usual spelling is "Uberblock". (Don't worry, a diaeresis is not an illness; it's the two dots on top of certain vowels in certain languages. But I digress.)

So in order to import a pool, ZFS needs to identify disks as having a ZFS data structure on them, along with the information about where the Überblock is stored. To do that, ZFS looks for a magic number on each disk it imports.

The Magic Number

The magic number for a ZFS Überblock is: 0x00bab10c. I know it by heart, and soon you will, too.

Checking the source, we see what it really means:


#define	UBERBLOCK_MAGIC		0x00bab10c		/* oo-ba-bloc!	*/

So there you have it, Leetspeak of the finest hex kind, right in the middle of the ZFS source code, and baked into the deepest innards of every ZFS disk!
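
For the curious, here is a minimal sketch in C of what "looking for the magic number" can boil down to. It assumes the magic is stored as a 64-bit value in the first eight bytes of a candidate block, written in the byte order of the machine that created the pool, so a reader has to accept the byte-swapped form as well. The helper names (`toy_check_magic`, `toy_bswap64`) and the result codes are invented for this example; the real check in the ZFS source does more than this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UBERBLOCK_MAGIC 0x00bab10cULL   /* oo-ba-bloc! */

/* Hypothetical result codes for this sketch. */
enum toy_magic { TOY_NO_MAGIC = 0, TOY_MAGIC_NATIVE = 1, TOY_MAGIC_SWAPPED = 2 };

/* Reverse the byte order of a 64-bit value. */
static uint64_t toy_bswap64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* Inspect the first 8 bytes of a candidate block. */
static enum toy_magic toy_check_magic(const unsigned char *buf)
{
    uint64_t magic;

    assert(buf != NULL);
    memcpy(&magic, buf, sizeof(magic));   /* avoids unaligned access */
    if (magic == UBERBLOCK_MAGIC)
        return TOY_MAGIC_NATIVE;          /* written with our byte order */
    if (magic == toy_bswap64(UBERBLOCK_MAGIC))
        return TOY_MAGIC_SWAPPED;         /* written with the other byte order */
    return TOY_NO_MAGIC;
}
```

A caller would read a candidate block from disk, pass its first bytes to `toy_check_magic()`, and byte-swap the rest of the block before using it if the result is `TOY_MAGIC_SWAPPED`.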

Of course, I'm not the first to point this out, and perhaps Tabriz deserves the credit for being the first. But it always leads to a round of chuckles when discussing ZFS during a talk with customers :).

You can learn more about the ZFS on-disk format by reading the ZFS on-disk format specs.

The Überblock's magic number may be an old joke by now, so here's a newer one:

The Funny Side of Dedup

Recently, I found another funny piece of source. This time, it was buried deep inside the recently released deduplication source.

As Jeff pointed out, ZFS deduplication uses SHA-256 as a checksum by default. Since it's a strong hash algorithm, the chance of finding two blocks with the same checksum, but different data, is low. Very low. Astronomically low.

As a colleague of mine put it: "Your server is likely going to disintegrate into dust out of pure radioactive decay before it'll find two different blocks with a matching SHA256 hash!"
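
To put a rough number on "astronomically low", here is a back-of-the-envelope sketch in C. It assumes SHA-256 behaves like an ideal random 256-bit function, which is an idealization, and uses the standard birthday bound: with n distinct blocks, the chance of any collision is at most n(n-1)/2 divided by 2^256. The function name `toy_collision_log2_bound` is made up for this example.

```c
#include <assert.h>

#define HASH_BITS 256   /* SHA-256 output size in bits */

/*
 * Birthday (union) bound for an ideal b-bit hash: with n = 2^k distinct
 * blocks, P(any collision) <= n(n-1)/2 / 2^b < 2^(2k - b - 1).
 * Returns that exponent, i.e. log2 of the upper bound.
 */
static int toy_collision_log2_bound(int log2_blocks)
{
    assert(log2_blocks >= 0 && log2_blocks <= HASH_BITS / 2);
    return 2 * log2_blocks - HASH_BITS - 1;
}
```

For example, 2^35 unique blocks of 128 KiB each add up to 4 PiB of data, and `toy_collision_log2_bound(35)` gives -187: under this assumption, the chance of even one collision in that pool is below 2^-187.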

Still, a programmer's duty is to write code for every possible case. Including the one with two different blocks sharing the same strong hash. Finding two different blocks with the same hash may be next to impossible, but it is, technically, not completely impossible.

Here's the piece of code that handles this, straight out of zio.c, with a very optimistic comment:


if (zp->zp_dedup_verify && zio_ddt_collision(zio, ddt, dde)) {
  /*
   * If we're using a weak checksum, upgrade to a strong checksum
   * and try again.  If we're already using a strong checksum,
   * we can't resolve it, so just convert to an ordinary write.
   * (And automatically e-mail a paper to Nature?)
   */
  if (!zio_checksum_table[zp->zp_checksum].ci_dedup) {
    zp->zp_checksum = spa_dedup_checksum(spa);
    zio_pop_transforms(zio);
    zio->io_stage = ZIO_STAGE_OPEN;
    BP_ZERO(bp);
  } else {
    zp->zp_dedup = 0;
  }
  zio->io_pipeline = ZIO_WRITE_PIPELINE;
  ddt_exit(ddt);
  return (ZIO_PIPELINE_CONTINUE);
}

I'm looking forward to that paper in Nature! In a few million years, that is.

But this still leaves me thinking: How did they test this particular branch of code?
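
One way to exercise a branch like this, offered as a guess and not taken from the ZFS test suite, is to swap in a deliberately weak checksum so that collisions become easy to produce. The sketch below is a toy dedup table in C keyed by an 8-bit byte sum. When the checksum matches, it compares the actual bytes, which is the idea behind the verify step in the snippet above. Every name in it (`toy_ddt_t`, `toy_weak_cksum`, `toy_dedup_write`, `TOY_BLOCK`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK 16            /* hypothetical fixed block size in bytes */

enum toy_result { TOY_NEW, TOY_DEDUPED, TOY_COLLISION };

/* Toy dedup table: one slot per possible 8-bit checksum value. */
typedef struct {
    unsigned char data[256][TOY_BLOCK];
    int           used[256];
} toy_ddt_t;

/* Deliberately weak checksum: sum of all bytes modulo 256. */
static unsigned char toy_weak_cksum(const unsigned char *blk)
{
    unsigned sum = 0;
    for (size_t i = 0; i < TOY_BLOCK; i++)
        sum += blk[i];
    return (unsigned char)(sum & 0xff);
}

/* "Write" one block through the toy dedup table. */
static enum toy_result toy_dedup_write(toy_ddt_t *ddt, const unsigned char *blk)
{
    assert(ddt != NULL && blk != NULL);
    unsigned char key = toy_weak_cksum(blk);

    if (!ddt->used[key]) {
        /* First block with this checksum: remember it. */
        memcpy(ddt->data[key], blk, TOY_BLOCK);
        ddt->used[key] = 1;
        return TOY_NEW;
    }
    if (memcmp(ddt->data[key], blk, TOY_BLOCK) == 0)
        return TOY_DEDUPED;     /* same checksum, same bytes */

    /* Same checksum, different bytes: the caller falls back to an
     * ordinary, non-deduplicated write. */
    return TOY_COLLISION;
}
```

With a zeroed table, writing a block that starts with the bytes 1, 2 and then one that starts with 2, 1 (all other bytes zero) gives both the checksum 3, so the second write returns `TOY_COLLISION` without anyone having to wait a few million years.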

Your Take

What pieces of funny code did you find in the OpenSolaris source? Or elsewhere? Share the fun by posting a comment!

Update (2010-06-20): Deirdré pointed out that Ulrich's talk, in which he explains the 0x00bab10c joke, is available on video. Watch it now! Thanks, Deirdré!

By Constantin Gonzalez, 2010-06-19, updated: 2017-10-03 in Solaris.


Copyright © 2018 – Constantin Gonzalez – Some rights reserved.