Acknowledgements
Our gratitude goes to the people who offered feedback on the manuscript that became this book: Will Andrews, Marie Helene Kvello-Aune, Josh Paetzel, Benedict Reuschling, Alan Somers, Matthew Seaman, and Wim Wauters.
Lucas' portions of this book were largely written on hardware from iX Systems (http://www.ixsystems.com).
The authors would like to thank the FreeBSD Project and the FreeBSD Foundation for providing access to NVMe devices in the NetPerf cluster, and to Sentex Data Communications for hosting said cluster. Lucas would like to thank Jude for somehow convincing these folks to grant Jude cluster access, because there's no way they'd give it to Lucas. Also because it means that Lucas didn't have to write that part of the book.
Chapter 0: Introduction
The Z File System, or ZFS, is a complicated beast, but it is also the most powerful tool in a sysadmin's Batman-esque utility belt. This book tries to demystify some of the magic that makes ZFS such a powerhouse, and give you solid, actionable intel as you battle your storage dragons.
ZFS contains over 100 engineering years of effort from some of the best minds in the industry. While it has competitors, such as B-Tree File System (BTRFS), those competitors have a lot of catching up to do. And ZFS races further ahead every day.
This book takes you into some of the more complicated and esoteric parts of managing ZFS. If you want to know why a single gigabyte of data fills your 2 GB drive, if you want to automatically update your disaster recovery facility, or if you just want to use boot environments on your laptop, FreeBSD Mastery: Advanced ZFS is for you.
Just about everything in this book applies in general to OpenZFS. We use FreeBSD as the reference platform, but the mechanics of using OpenZFS don't change much among platforms.
Prerequisites
The title of the book includes the word Advanced. We expect you to know a couple of things before you can use this book. The easy answer would be that you should read and assimilate two earlier FreeBSD Mastery titles: Storage Essentials and ZFS. But you might already know what's in those books, so here are some details on what you need to bring with you.
You'll need familiarity with FreeBSD's storage management layer, GEOM. On non-FreeBSD platforms you can use disks and partition devices for ZFS. Always use ZFS on disk or partition devices, not on RAID or other software devices.
We assume you're familiar with ZFS pools and datasets. You know how to add VDEVs to a pool, and understand why you can't add a lone disk to your RAID-Z. You can take snapshots and create clones.
If you want to use FreeBSD's encrypted ZFS support, you must understand FreeBSD's GELI encryption. (You could use GBDE if you're relying on the encryption to preserve human life, but the built-in GELI support suffices for most of us. Also, GELI takes advantage of the AES-NI hardware crypto acceleration in modern CPUs.)
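As a rough sketch of how GELI and ZFS fit together, you initialize a GELI provider on a partition, attach it, and then build your pool on the resulting .eli device. The device names and key file path here are illustrative, not prescriptions:

```shell
# Illustrative sketch only: device names and key path are examples.
# Initialize GELI on the partition, with 4 KB sectors and a key file.
geli init -s 4096 -K /root/da0p4.key /dev/da0p4
# Attach the provider; this creates /dev/da0p4.eli.
geli attach -k /root/da0p4.key /dev/da0p4
# Build the pool on the encrypted .eli device, not the raw partition.
zpool create cryptpool /dev/da0p4.eli
```

Everything ZFS writes to the pool then passes through GELI on its way to the disk.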
ZFS Best Practices
While you can acquire all the needed ZFS knowledge from publicly available documentation, that won't give you the ZFS best practices we've discussed in earlier books. As with so many other things in technology, the nice thing about best practices is that there are so many of them to choose from.
We're discussing some of our best practices here. Perhaps these practices are better than yours and you'll gleefully adopt them. Maybe they'll spark some improvements in your existing best practices. Even if your best practices blow ours away, these at least display our biases so you know how we're approaching the issues of storage management.
Space Management
With copy-on-write filesystems, deleting files uses space. Sysadmins accustomed to traditional filesystems might hear this when they start with ZFS, but don't really internalize it until the first time they run out of disk and suffer a nasty shock. As the pool approaches capacity, ZFS needs more and more time to store additional data blocks. Performance degrades. While the ZFS developers keep reducing the performance impact of fragmentation, it becomes more and more of an issue as the pool approaches 100% utilization.
Recovering from a completely full pool is terribly hard. To prevent all of the space from being used, or to at least provide a warning ahead of time, create a reservation.
Ideally, you should create a reservation for 20% of the capacity of your pool. You can always lower the reservation to buy time while you work on adding more capacity or removing old data. A reservation gives you the soft landing that the Unix File System (UFS) offers, where only root can use up the last few percent of available disk space. The last thing you want is to unexpectedly run out of space.
On this 1 TB pool, we create a new dataset with a 200 GB refreservation.
# zfs create -o refreservation=200G mypool/reserved
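If space gets tight later, that reservation is also your escape hatch: shrink or clear it to buy time while you add capacity or prune data. The dataset name and values below are illustrative:

```shell
# Check how much space the reservation is holding back.
zfs get refreservation mypool/reserved
# Shrink the reservation to release 100 GB for immediate use.
zfs set refreservation=100G mypool/reserved
# Remove it entirely -- a last resort while you migrate data off.
zfs set refreservation=none mypool/reserved
```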
Any time you're exploring space issues on a ZFS dataset, remember the zfs get space command. You'll see all of the space-related properties in a single convenient display.
# zfs get space zstore/usr
NAME        PROPERTY              VALUE       SOURCE
zstore/usr  name                  zstore/usr  -
zstore/usr  available             5.00T       -
zstore/usr  used                  367M        -
zstore/usr  usedbysnapshots       0           -
zstore/usr  usedbydataset         140K        -
zstore/usr  usedbyrefreservation  0           -
zstore/usr  usedbychildren        367M        -
While zfs get space won't free up space for you, it's the quickest path to finding out where all your space went.
Picking a VDEV Type
As discussed at length in FreeBSD Mastery: ZFS, selecting the correct VDEV type when creating your pool is the most important decision you make. It affects the performance of your pool, as well as the expansion possibilities.
A study by Pâris, Amer, Long, and Schwarz (http://arxiv.org/ftp/arxiv/papers/1501/1501.00513.pdf) found that building a disk array that could survive for four years with no human intervention required triple-parity RAID. Double parity, even with an unlimited number of spares, cannot maintain 99.999% (five nines) reliability over a four-year period.
Combine this consideration with the hardware you have and your expected future storage needs.
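If you have the disks for it, triple parity is a single command away. A hypothetical six-disk example (the pool name and device names are illustrative):

```shell
# Six-disk RAID-Z3: the pool survives any three simultaneous disk failures.
zpool create tank raidz3 da0 da1 da2 da3 da4 da5
# Confirm the layout.
zpool status tank
```

Remember that you cannot later widen a RAID-Z VDEV by adding single disks, so size it for your expected growth now.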
The Importance of Labels
By labeling drives, you save your future self a lot of headache. Label your disks and partitions before adding them to a ZFS pool, or indeed before using them in any way, for reasons we'll discuss throughout this section.
Take the case of an unfortunate friend of Jude's, who created a pool with raw device names. When a device failed, he rebooted before replacing the disk. His pool looked a little different than he expected.
# zpool status
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool
        will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Apr 11 17:49:38 2015
        62.0M scanned out of 1.55T at 5.16M/s, 87h40m to go
        9.81M resilvered, 0.00% done
config:

NAME                       STATE     READ WRITE CKSUM
data                       DEGRADED     0     0     0
  mirror-0                 DEGRADED     0     0     0
    spare-0                UNAVAIL      0     0     0
      5694975301095445325  FAULTED      0     0     0  was /dev/da1
      da7                  ONLINE       0     0   856  (resilvering)
    da14                   ONLINE       0     0     0
  mirror-1                 ONLINE       0     0     0
    da1                    ONLINE       0     0     0
    da13                   ONLINE       0     0     0
Originally, the pool had consisted of two mirrors: mirror-0 of da1 and da15, and mirror-1 of da2 and da14. Disk da1 failed.
FreeBSD dynamically assigns disk device nodes at boot. With da1 missing, the remaining disk devices shifted one number lower. Disk da15 became da14, da14 became da13, and, worst of all, da2 became da1.
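GPT labels sidestep this renumbering, because the label travels with the disk no matter which device node it gets. A sketch of the workflow, with illustrative label names tied to physical drive bays:

```shell
# Partition each disk with a GPT label recording its physical slot.
gpart create -s gpt da1
gpart add -t freebsd-zfs -l zfs-bay1 da1    # appears as /dev/gpt/zfs-bay1
gpart create -s gpt da2
gpart add -t freebsd-zfs -l zfs-bay2 da2    # appears as /dev/gpt/zfs-bay2
# Build the pool from the stable labels, not the shifting da numbers.
zpool create data mirror gpt/zfs-bay1 gpt/zfs-bay2
```

When a drive dies, the surviving labels still tell you exactly which bay holds which pool member.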