One of the first things to do when setting up a new system is to mirror your boot disk. This protects you against system disk failures: If one of the two mirrored boot disks fails, the system can continue running from the other disk without downtime. You can even boot from the surviving mirror half and continue using the system normally, until you have replaced the failed half.
At the currently low prices for boot drive sized disks, this is a no-brainer for increasing your system’s availability, even for a home server system.
Unfortunately, the steps to complete until you’re running off a mirrored ZFS root pool are not yet a no-brainer. While there is a piece of documentation entitled How to Configure a Mirrored Root Pool, it only covers how to add a second disk to your root pool. It does not cover how to prepare and lay out a fresh disk so Solaris will accept it as a bootable second half of an rpool mirror.
Which, for historic reasons, is slightly more complicated than just saying zpool attach.
Over the weekend, I sat down and played a bit with the current Oracle Solaris 11 Express (no link, page no longer exists) release in VirtualBox and tested, re-tested and investigated all currently necessary steps to get your root pool mirrored, including some common issues and variations.
Here’s a complete, step-by-step guide with background information on how to mirror your ZFS root pool:
The Basic Plan
After a standard install of Oracle Solaris 11 Express, we’ll have our system disk configured as a ZFS root pool called rpool. The rpool disk is set up as an fdisk partition with some SMI partitions (= “slices”) on top. The fdisk part is for compatibility with other OSes, the SMI slicing is done in order to reserve some room on the physical disk for the boot blocks and GRUB.
This is different from a regular ZFS data disk which would normally use EFI (not fdisk) labels and no further partitioning.
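So here’s the basic plan on how to turn a fresh disk into an rpool mirror:
1. Figure out your system’s disks.
2. x86 only: Set up a single fdisk partition on your second disk.
3. Set up an SMI label with the same partitioning on the second disk.
4. Set up the ZFS rpool mirror.
5. x86 only: Make the second mirror half bootable.
6. Add the second disk to the BIOS’ or OpenBoot PROM’s list of bootable devices.
You see, the official documentation only covers step 4 above, and lets you guess about the other steps. Here’s the full sequence of stuff to do to create a proper mirror in more detail: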
1. Figure Out Your System’s Disks
Hard drives in Solaris show up in the /dev/rdsk directory as raw devices, and the same drives with the same names show up again in /dev/dsk. The former are used for raw partitioning and low-level operations, while the latter are the standard way to access disks from a day-to-day point of view, such as setting up ZFS pools.
Here’s a typical device name: c0t0d0s0. The naming convention is simple: Controller 0, SCSI target 0, disk 0 and Solaris slice 0. Of course, the digits may vary and even become multi-digit in larger systems such as c12t18d5s8, but the convention is always the same.
PATA systems omit the t0 part, because PATA doesn’t support “targets” like SCSI or SATA do. This will give you devices like: c0d0s0.
Sometimes, when dealing with DOS partitions, you’ll see a p0 part instead of the (Solaris specific) s0 piece. Here, p0 addresses the whole disk as fdisk sees it, while p1 through p4 refer to the four DOS primary partitions.
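To make the naming convention concrete, here is a minimal sketch in plain POSIX shell that splits a device name into its parts. The function name describe_device is made up for this illustration (it is not a Solaris command), and it does no validation: it simply assumes the name follows the cXtYdZ pattern described above.

```shell
#!/bin/sh
# describe_device NAME: split a disk name such as c7t1d0s0 into its parts.
# Hypothetical helper for illustration only; assumes a well-formed name.
describe_device() {
    name=${1##*/}                      # drop a leading /dev/dsk/ or /dev/rdsk/
    rest=${name#c}
    controller=${rest%%[!0-9]*}        # digits right after the "c"
    rest=${rest#"$controller"}
    target=
    case $rest in
        t*)                            # the target part is optional (PATA has none)
            rest=${rest#t}
            target=${rest%%[!0-9]*}
            rest=${rest#"$target"}
            ;;
    esac
    rest=${rest#d}
    disk=${rest%%[!0-9]*}
    suffix=${rest#"$disk"}             # s0, p0, ... or empty for the bare disk
    echo "controller=$controller target=${target:--} disk=$disk suffix=${suffix:--}"
}

describe_device /dev/rdsk/c7t1d0s0     # SATA/SCSI style name with a slice
describe_device c0d0p0                 # PATA style name in fdisk notation
```

A missing target or suffix is printed as a dash, so the PATA example shows target=- while the first example shows target=1 and suffix=s0.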
So before we do anything, we need to figure out what disks we are dealing with, what device names they have and if they’re used somewhere else already. Two commands will help us here:
- zpool status will print information about running zpools. This should tell you what the device name for your existing root pool (“rpool”) is. On my system, I get this:
admin@s11test:~$ zpool status
pool: rpool
state: ONLINE
scan: resilvered 2.61G in 0h16m with 0 errors on Sun Mar 13 21:01:06 2011
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
c7t0d0s0 ONLINE 0 0 0
errors: No known data errors
This means my rpool sits on controller 7, target 0, disk 0 and slice 0.
- The easiest, interactive way of figuring out all of your disks in the system would be the format command, but we don’t want to spend time going through menus and needless interactivity. Here’s a less common, but effective option: cfgadm. This command will tell you what disks we have in the system:
admin@s11test:~$ cfgadm -s "select=type(disk)"
Ap_Id Type Receptacle Occupant Condition
sata0/0::dsk/c7t0d0 disk connected configured ok
sata0/1::dsk/c7t1d0 disk connected configured ok
Not surprisingly, the second disk in our system therefore sits on target 1 of the same controller. Since cfgadm only knows about hardware, not (software) slices, it omits any “s” part.
Now we know what disks we have, which of them is used for rpool already, and which ones are available as a second mirror half for our rpool.
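If you’d rather not compare the two listings by eye, the same cross-check can be sketched in shell. This is only an illustration working on saved command output: the function names pool_disks, cfgadm_disks and free_disks are made up here, the parsing assumes output shaped like the two listings above, and it assumes a POSIX-compatible awk.

```shell
#!/bin/sh
# pool_disks: read `zpool status` text on stdin and print the disks it
# mentions, with any slice suffix (s0, s1, ...) removed.
pool_disks() {
    awk '$1 ~ /^c[0-9]+(t[0-9]+)?d[0-9]+(s[0-9]+)?$/ {
             sub(/s[0-9]+$/, "", $1); print $1
         }' | sort -u
}

# cfgadm_disks: read `cfgadm -s "select=type(disk)"` text on stdin and
# print the disk name at the end of each Ap_Id (e.g. c7t1d0).
cfgadm_disks() {
    awk '$2 == "disk" { n = split($1, part, "/"); print part[n] }' | sort -u
}

# free_disks STATUS_FILE CFGADM_FILE: print disks that cfgadm reports
# but that no pool in the saved zpool status output uses.
free_disks() {
    used=$(pool_disks < "$1")
    cfgadm_disks < "$2" | while read -r disk; do
        case " $(echo $used) " in
            *" $disk "*) ;;            # already part of a pool, skip it
            *) echo "$disk" ;;
        esac
    done
}

# On a real system (not run here):
#   zpool status > /tmp/zpool.txt
#   cfgadm -s "select=type(disk)" > /tmp/cfgadm.txt
#   free_disks /tmp/zpool.txt /tmp/cfgadm.txt
```

Fed with the two listings from my test system, this leaves c7t1d0 as the only candidate for the second mirror half. Treat the result as a hint, not a guarantee: a disk that is in no pool may still hold data you care about.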
2. x86 only: Set Up a Single Fdisk Partition on Your Second Disk
Solaris disk partitioning works differently in the SPARC and in the x86 world:
- SPARC: Disks are labeled using special, Solaris-specific “SMI labels”. No need for special boot magic or GRUB, etc. here, as the SPARC systems’ OpenBoot PROM is intelligent enough to handle the boot process by itself.
- x86: For reasons of compatibility with the rest of the x86 world, Solaris uses a primary fdisk partition labeled Solaris2, so it can coexist with other OSes. Solaris then treats its fdisk partition as if it were the whole disk and proceeds by using an SMI label on top of that to further slice the disk into smaller partitions. These are then called “slices”. The boot process uses GRUB, again for compatibility reasons, with a special module that is capable of booting off a ZFS root pool.
So for x86, the first thing to do now is to make sure that the disk has an fdisk partition of type “Solaris2” that spans the whole disk. For SPARC, we can skip this step.
fdisk doesn’t know about Solaris slices; it only cares about DOS-style partitions. Therefore, device names are different when dealing with fdisk: We’ll use the “p0” name, which addresses the whole disk in DOS partition mode. This will work even if there are no partitions defined on the disk.
Again, we could use fdisk in interactive mode and wiggle ourselves through the menus, but I prefer the command line way. Here’s how to check if your disk already has some kind of DOS partitioning:
admin@s11test:~# fdisk -W - c7t1d0p0
* /dev/rdsk/c7t1d0p0 default fdisk table
* Dimensions:
* 512 bytes/sector
* 63 sectors/track
* 255 tracks/cylinder
* 2088 cylinders
*
* systid:
* 1: DOSOS12
* 2: PCIXOS
* 4: DOSOS16
(lots of id specifications omitted…)
* 191: SUNIXOS2
* 238: EFI_PMBR
* 239: EFI_FS
*
* Id Act Bhead Bsect Bcyl Ehead Esect Ecyl Rsect Numsect
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
The second - tells the -W option to write to standard output instead of to a file. SUNIXOS2 (191) really means SOLARIS2. This is the partition type that we’ll create soon.
Here’s how to apply a default Solaris fdisk partition to a disk in one simple step:
admin@s11test:~# fdisk -B c7t1d0p0
That’s it. Be careful and double-check that you got the device name right! If you’re unsure, you can still use the interactive version (fdisk c7t1d0p0) and work through the menus by hand.
Now let’s verify that we got what we wanted:
admin@s11test:~# fdisk -W - c7t1d0p0
* /dev/rdsk/c7t1d0p0 default fdisk table
* Dimensions:
* 512 bytes/sector
* 63 sectors/track
* 255 tracks/cylinder
* 2088 cylinders
*
* systid:
* 1: DOSOS12
* 2: PCIXOS
* 4: DOSOS16
(stuff omitted…)
* 191: SUNIXOS2
* 238: EFI_PMBR
* 239: EFI_FS
*
* Id Act Bhead Bsect Bcyl Ehead Esect Ecyl Rsect Numsect
191 128 0 1 1 254 63 1023 16065 33527655
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
Here’s the fdisk partition we wanted. Its type is 191, which corresponds to SOLARIS2 (you can double-check using the interactive version of fdisk), and it spans the whole disk.
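The same verification can be scripted. The following is a rough sketch, not part of any Solaris tool: solaris2_partitions and whole_disk_sectors are names invented for this example, and the parsing assumes the ten-column table layout that fdisk -W prints above (Id first, Rsect and Numsect last).

```shell
#!/bin/sh
# solaris2_partitions: read `fdisk -W -` output on stdin and print
# "Rsect Numsect" for every table entry whose Id is 191 (SUNIXOS2).
# Comment lines start with "*", so their first field never equals 191.
solaris2_partitions() {
    awk 'NF == 10 && $1 == 191 { print $9, $10 }'
}

# whole_disk_sectors CYLINDERS TRACKS SECTORS: total sectors for a geometry.
whole_disk_sectors() {
    echo $(( $1 * $2 * $3 ))
}

# Numbers taken from the listing above: geometry 2088/255/63,
# partition entry with Rsect 16065 and Numsect 33527655.
echo "disk size in sectors: $(whole_disk_sectors 2088 255 63)"
echo "partition ends at:    $(( 16065 + 33527655 ))"

# On a real system (not run here):
#   fdisk -W - c7t1d0p0 | solaris2_partitions
```

Both numbers come out as 33543720: the partition starts right after the first cylinder (16065 sectors) and runs to the very end of the disk, which is what “spans the whole disk” means here.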
3. Set Up an SMI Label With the Same Partitioning on the Second Disk
Before ZFS can do its magic, we need to tell it where on the disk the rpool’s mirror is supposed to be, and what blocks are off-limits because they’re supposed to host the GRUB bootloader. This is done by using a Solaris SMI label that breaks down our Solaris2 fdisk partition into Solaris “slices”.
Again, there’s an interactive possibility using the format command, which involves many interactive steps (print out the original disk’s layout, set it up step by step on the second disk, write the label), but we want to be cool here, so we’ll do it in a single step, again:
admin@s11test:~# prtvtoc /dev/rdsk/c7t0d0s0 | fmthard -s - /dev/rdsk/c7t1d0s0
fmthard: New volume table of contents now in place.
That’s it. You can check what the new Solaris-style partitioning looks like on the second disk and compare it to the first one. Here’s my first disk:
admin@s11test:~# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c7t0d0 <ATA -VBOX HARDDISK -1.0 cyl 2085 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c7t1d0 <ATA -VBOX HARDDISK -1.0 cyl 2085 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@1,0
Specify disk (enter its number): 0
selecting c7t0d0
[disk formatted]
/dev/dsk/c7t0d0s0 is part of active ZFS pool rpool. Please see zpool(1M).
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
fdisk - run the fdisk program
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format> p
PARTITION MENU:
0 - change `0' partition
1 - change `1' partition
2 - change `2' partition
3 - change `3' partition
4 - change `4' partition
5 - change `5' partition
6 - change `6' partition
7 - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name - name the current table
print - display the current table
label - write partition map and label to the disk
!<cmd> - execute <cmd>, then return
quit
partition> p
Current partition table (original):
Total disk cylinders available: 2085 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 1 - 2084 15.96GB (2084/0/0) 33479460
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 2084 15.97GB (2085/0/0) 33495525
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0
8 boot wu 0 - 0 7.84MB (1/0/0) 16065
9 unassigned wm 0 0 (0/0/0) 0
partition> q
And here’s my second disk:
admin@s11test:~# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c7t0d0 <ATA -VBOX HARDDISK -1.0 cyl 2085 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c7t1d0 <ATA -VBOX HARDDISK -1.0 cyl 2085 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@1,0
Specify disk (enter its number): 1
selecting c7t1d0
[disk formatted]
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
fdisk - run the fdisk program
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format> p
PARTITION MENU:
0 - change `0' partition
1 - change `1' partition
2 - change `2' partition
3 - change `3' partition
4 - change `4' partition
5 - change `5' partition
6 - change `6' partition
7 - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name - name the current table
print - display the current table
label - write partition map and label to the disk
!<cmd> - execute <cmd>, then return
quit
partition> p
Current partition table (original):
Total disk cylinders available: 2085 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 1 - 2084 15.96GB (2084/0/0) 33479460
1 unassigned wu 0 0 (0/0/0) 0
2 backup wu 0 - 2084 15.97GB (2085/0/0) 33495525
3 unassigned wu 0 0 (0/0/0) 0
4 unassigned wu 0 0 (0/0/0) 0
5 unassigned wu 0 0 (0/0/0) 0
6 unassigned wu 0 0 (0/0/0) 0
7 unassigned wu 0 0 (0/0/0) 0
8 boot wu 0 - 0 7.84MB (1/0/0) 16065
9 unassigned wu 0 0 (0/0/0) 0
partition> q
Note: This is a typical x86 layout. It’s likely different on SPARC systems as they don’t use a special slice for boot block hosting. But the basic idea on how to replicate the partition table is the same.
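If clicking through format twice feels like too much work, the comparison can also be done on saved prtvtoc output. This is a sketch under assumptions: vtoc_layout and same_layout are hypothetical helper names, and the parsing assumes prtvtoc’s usual column order (Partition, Tag, Flags, First Sector, Sector Count, Last Sector) with comment lines starting with an asterisk. It deliberately ignores tags and flags and only compares where each slice starts and how big it is.

```shell
#!/bin/sh
# vtoc_layout: read prtvtoc output on stdin, skip the "*" comment lines and
# print "slice first_sector sector_count" for every slice, sorted by slice.
vtoc_layout() {
    awk 'substr($1, 1, 1) != "*" && NF >= 6 { print $1, $4, $5 }' | sort -n
}

# same_layout FILE_A FILE_B: succeed if both saved prtvtoc listings
# describe the same slice positions and sizes.
same_layout() {
    [ "$(vtoc_layout < "$1")" = "$(vtoc_layout < "$2")" ]
}

# On a real system (not run here):
#   prtvtoc /dev/rdsk/c7t0d0s0 > /tmp/disk0.vtoc
#   prtvtoc /dev/rdsk/c7t1d0s0 > /tmp/disk1.vtoc
#   same_layout /tmp/disk0.vtoc /tmp/disk1.vtoc && echo "layouts match"
```

Ignoring the flags is a design choice for this sketch: in the format listings above, the unassigned slices differ in their flags (wm versus wu) between the two disks, although the slices that matter are identical.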
Great! We’re almost there.
4. Set Up the ZFS Rpool Mirror
Now that our second disk is prepared, the rest is quite easy. From now on, we can just follow the standard Solaris documentation for mirroring the root pool.
The right command to use here is zpool attach. Notice that this is different from zpool add: By attaching a disk to an existing disk, we mean attaching it to its mirror (you can attach more than one disk to a mirror). By adding a disk to a pool, we mean expanding the pool size in the sense of striping in another disk (or sets of mirrored/RAID-Z disks). For mirroring, zpool attach is the way to go. Remember? Slice 0 is the one we reserved for the rpool’s mirrored data:
admin@s11test:~# zpool attach rpool c7t0d0s0 c7t1d0s0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c7t1d0s0 overlaps with /dev/dsk/c7t1d0s2
Wait, what happened? ZFS is complaining that two slices are overlapping. If ZFS uses slice 0, and something else uses slice 2, it may overwrite some of ZFS’ data!
In this particular case, ZFS’ worries are unfounded: Slice 2 by convention spans the whole disk and is named “backup” (see the output of format above), so traditional disk backup solutions have a way of easily performing raw backups of whole disks. Today it’s hardly used, but the convention remains for historical reasons.
Therefore, we can safely override this little nit and get our mirror done:
admin@s11test:~# zpool attach -f rpool c7t0d0s0 c7t1d0s0
Make sure to wait until resilver is done before rebooting.
admin@s11test:~# zpool status
pool: rpool
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Mar 15 18:17:32 2011
13.9M scanned out of 2.72G at 594K/s, 1h19m to go
13.3M resilvered, 0.50% done
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c7t0d0s0 ONLINE 0 0 0
c7t1d0s0 ONLINE 0 0 0 (resilvering)
errors: No known data errors
Great! Everything’s working fine now. Before we make the second disk bootable, we should really wait until it has finished resilvering. We don’t want to boot into a half-baked root pool, do we?
Here’s the end state, freshly resilvered:
admin@s11test:~# zpool status
pool: rpool
state: ONLINE
scan: resilvered 2.72G in 0h15m with 0 errors on Tue Mar 15 18:33:23 2011
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c7t0d0s0 ONLINE 0 0 0
c7t1d0s0 ONLINE 0 0 0
errors: No known data errors
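Waiting for the resilver doesn’t have to mean re-running zpool status by hand. Here’s a small sketch of a polling loop. The function names resilver_in_progress and wait_for_resilver are invented for this example, and the check simply looks for the “resilver in progress” text shown in the scan line above, so it assumes your zpool status prints the same wording.

```shell
#!/bin/sh
# resilver_in_progress: read `zpool status` text on stdin and succeed
# as long as it still reports a running resilver.
resilver_in_progress() {
    grep 'resilver in progress' > /dev/null
}

# wait_for_resilver POOL [SECONDS]: poll the pool until the resilver has
# finished, then print the final status. Defaults to one check per minute.
wait_for_resilver() {
    while zpool status "$1" | resilver_in_progress; do
        sleep "${2:-60}"
    done
    zpool status "$1"
}

# On a real system (not run here):
#   wait_for_resilver rpool 30
```

The loop only tells you that the resilver is over. You should still read the final status and make sure it reports zero errors before relying on the second disk.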
5. x86 only: Make the Second Mirror Half Bootable
Since x86 systems depend on a bootloader that is installed on disk, we need to perform a final step so that the system can boot off the second disk, too, in case the first one fails completely.
This is a simple install of GRUB onto the second disk. GRUB, ZFS and Solaris will then figure it out automatically in case you have to boot from the second disk instead of the original one.
admin@s11test:~# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c7t1d0s0
stage2 written to partition 0, 277 sectors starting at 50 (abs 16115)
stage1 written to partition 0 sector 0 (abs 16065)
Since we’re dealing with a low-level operation (boot blocks etc.), we want to address the devices using the raw device paths. The s0 part is still needed so GRUB knows what slice to boot from.
Almost done!
6. Add the Second Disk to the BIOS’ or OpenBoot PROM’s List of Bootable Devices
This is one of the little things that often gets overlooked but then becomes critical in case of a real failure: The system crashes because the first disk is completely borked, or you force a reboot and the first disk fails to come up again. How does the system know it’s supposed to boot from the second half of the mirror?
Managing the Solaris boot behavior and its mechanism is described thoroughly in the documentation:
- SPARC: Here you usually set up aliases for your bootable mirror halves in the OpenBoot PROM, then assign them to the boot-device variable as a list of possible devices to boot from (e.g.: “disk1 disk2 net”). Check out the SPARC Enterprise Servers (no link, page no longer exists) section of the Oracle System Documentation area, find the administration guide for your particular system, then consult the sections on booting.
- x86: Most BIOSes have a section where you can configure what disks to boot from, in what order and what to do if a disk is not bootable. Here’s a list of current Oracle Sun x86 system documentation (no link, page no longer exists). Again, look for the boot section of your system’s admin manual.
Play With It! And Check Out Some Man Pages!
How do you know if this really works? How do you develop confidence for something critical like booting from a second mirror half, surviving a disk disaster, etc.?
Here’s the easiest option: Use VirtualBox to set up a test system like I did. It comes with ready-to-use suggestions for a standard Solaris machine. Then, configure a second virtual disk and play with the commands above. Set up a mirrored rpool, bring down the machine, unconfigure the original disk, then see if it can boot from the second mirror half and so on.
BTW: I did not find a way to tell VirtualBox what disk to boot from (it only lets you specify what type of device to boot from, not what individual disk), so I resorted to just pulling out (figuratively speaking) the original boot disk, then testing if it boots from the mirrored one.
In short: Play, experiment, break it, etc., until you know what’s going on and are confident to make it happen on your real system.
Finally, here’s a list of useful man pages to check out:
- cfgadm(1M): Query and modify your system’s hardware configuration.
- fdisk(1M): Manipulate DOS-style partitions.
- prtvtoc(1M): Print out a disk’s partitioning information in a machine-readable format.
- fmthard(1M): Write a partitioning table to a disk.
- format(1M): Interactive formatting utility.
- zpool(1M): Manipulate ZFS pools.
- installgrub(1M): Install the GRUB boot loader.
I hope this article has made rpool mirroring a little easier for you from now on!
Your Take
There are endless variations to the above, and sometimes I’ve been more verbose, or more simplified for the sake of ease-of-use. I’m sure there are many different ways to achieve the same result, so here’s your chance to share your favorite mirrored rpool tricks!
What’s your routine for mirroring rpools? Did you find other good tutorials to share? (‘cause I didn’t, at least nothing obvious in Google…) What are your preferred rpool mirroring tricks?
Feel free to write a comment!
Update: Wow, this article got a lot of comments, thank you! Make sure you check them out as they contain a lot of useful additional information.
Commenting is currently not available, because I’d like to avoid cookies on this site. I may or may not endeavor into building my own commenting system at some time, who knows?
Meanwhile, please use the Contact form to send me your comments.
Thank you!