More Tech Books from Michael W Lucas
Absolute BSD
Absolute OpenBSD (1st and 2nd edition)
Cisco Routers for the Desperate (1st and 2nd edition)
PGP and GPG
Absolute FreeBSD
Network Flow Analysis
the IT Mastery Series
SSH Mastery
DNSSEC Mastery
Sudo Mastery
FreeBSD Mastery: Storage Essentials
Networking for Systems Administrators
Tarsnap Mastery
FreeBSD Mastery: ZFS (coming soon)
FreeBSD Mastery: Specialty Filesystems (coming soon)
Acknowledgements
Books are not written in a vacuum. For one thing, the author would turn blue and his eyeballs would explode. Unless he had a pressure suit. But it's hard to type in those heavy gloves, so he'd need a custom pressure suit. And a tank of air won't last more than a page or two.
Actually, vacuum-writing might improve many books. The poorly planned ones would certainly be shorter.
But for this book in particular, several people sacrificed their free time so that I wouldn't write only from my own experience. They range from complete novices to Tarsnap masters. Their comments and thoughts helped improve this book immeasurably, and even when I didn't take their advice, they often made me reconsider why I expressed concepts the way I did. They are, in alphabetical order: Navan Carson, Trannie Carter, John Gamble, Josh Grosse, Larry Hynes, Denis Krienbh, Henry Hagns, Frank Moore, Hakisho Nukama, Andreas Olsson, Jason Tubnor, and Scott Vokes.
Thanks also go to Colin Percival, both for creating Tarsnap and for granting me unlimited access to his brain as I wrote this book.
This book's chapters are numbered in octal. Because computers.
For Liz
Chapter 00: The Backup Problem
Everyone from big organizations to family photographers worries about preserving their precious data in the event of system failures. But time has changed what we worry about.
I've worked for more than one firm with more than one room choked with racks and racks of densely packed tapes. Many of these tapes have tidy computer-printed labels that have faded with time. Others have jagged labels hand-scribbled by the over-caffeinated squirrel that the company fired last year. And there's always one label loose in the middle of the floor, the glue failure unnoticed. Restoring data requires locating the correct tapes, which might not be anywhere near each other, labeled correctly, or labeled at all.
Then there's offsite storage. If the organization's offices are destroyed, offsite backups mean that the company might be able to restore critical data and either close down in an orderly manner or perhaps even survive. Many companies exist solely to shuffle backup tapes to and from offsite storage. This involves boxing up the correct physical tapes and cramming returned tapes into their correct spaces on the aforementioned overloaded shelves. Hopefully the backup firm is better at managing tapes than you are. And there's no obvious problem in having a company's groundbreaking research data worth billions protected by a couple of minimum-wage security guards.
I've unwillingly concluded that most technical people aren't any good at managing physical tapes over time. We get totally disgusted and start over, with a nice new organizational system that promises to solve all issues forever, but over time that system degrades into exactly the same maddening morass. Backup tape management requires the attitudes and aptitudes of a fascist thug masquerading as a file clerk.
Then there's tape disposal. Backups contain sensitive, confidential, or possibly damning information. Big companies and government agencies erase or physically destroy worn-out tapes before disposing of them, and even small companies in certain fields must do the same. Mind you, if someone got your organization's old tapes, they'd need your backup software and the right hardware to restore them, but it would be possible.
Sysadmins have a really hard time destroying physical tapes, even if they have a government mandate glaring down at them. Is the data on there really unused? Are you sure? Destroying those tapes might mean your job, while nobody would notice yet another box of worn-out tapes in the back of the tape swamp. That room is so terrible that only the tape monkeys go there anyway. The risk/benefit calculation is clear; you keep the tapes if at all possible. Even though their presence reduces the odds of finding the tapes you really need.
Ubiquitous bandwidth and cloud services have changed backup management.
Systems administrators are good at managing logical entities like data. We're really good at that, so long as we don't have to touch anything but our keyboards. If a company has hundreds of megabits of external bandwidth available, why not send your backups out over the network, keep the backups in an easily searchable and retrievable format, and turn that horrendous tape room into the geek lounge?
More and more organizations run services in virtual environments provided by external vendors, somewhere far removed from their sysadmins. These hosts have access to far more bandwidth than any enterprise could have hoped for even a few years ago. Backing up these hosts over the network makes even more sense.
But network backup carries a whole bunch of new problems with it.
Placing your backup data on the network means that your old data is live. Someone who stole your tapes from an external warehouse would need a bunch of hardware to retrieve your data. Not so with network backup. Even if you use remote offline backup services like Amazon Glacier, a few keystrokes will resurrect your data in a couple hours.
You could run a private remote backup service, but then you must care for all the hardware and infrastructure involved. Part of the point of using an online backup service is that they do that work for you.
Using an external service requires that you trust the provider to handle their backups. That means they must manage their tapes far better than you do. They must take security precautions to protect your data as well as you would, rather than just piling the tapes in a disused warehouse redolent of many generations of rodents. You must choose an external provider who has invested in equipment and people to make backups a non-issue.
Then there are legal aspects. Suppose I own a company that makes widgets with an advanced super-secret process. My company gets a subpoena. Anything entered as evidence is in the public record. Even if I sincerely desire to cooperate fully with the subpoena, I don't want unrelated information about my special widget manufacturing process going into the public record. If my cloud server or backup provider is hit with a subpoena, they'll hand everything over to the court without blinking. It doesn't matter if the service provider has my backups or if they have my live servers, they'll expose my data and might not even tell me. To keep my business running, I really need to know if my information goes out under subpoena.
Some backup services do offer encrypted backups. This is a great idea, but not as simple as it sounds. Encryption can expand the size of your data, increasing the amount of bandwidth and space needed to store the backup. And encrypting a backup means adding complexity to recovery.
Encryption comes in many different grades, usually expressed as key lengths and algorithms. Algorithm choice is important; certain algorithms can be easily broken by freely available software, and most sysadmins don't have the expertise to judge how well an algorithm works for their application. But far more important is how well that algorithm is used as part of the whole system. I can get a safe that is guaranteed to resist acid attacks and shelling by tanks, but if I leave it in the back yard with the door unlatched, it's not secure. A mediocre encryption algorithm used well is more secure than a great algorithm used badly. (A great algorithm used well is best.) Some offsite backup services use nearly unbreakable encryption algorithms, but you must trust that the vendor software implements and uses that algorithm correctly. Most online backup vendors do not share their client software source code, so you can't independently verify the encryption is correctly used.
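Here is one way "a great algorithm used badly" can look in practice. This is a toy Python sketch, not any vendor's design and not Tarsnap's: the keystream construction, the function names, and the sample messages are all invented for this illustration, and nothing here should protect real data. It builds a stream cipher from SHA-256, then makes the classic mistake of reusing the same key and nonce for two messages. Nobody breaks the hash; the misuse alone lets someone who knows one message read the other.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce, and a block counter."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Encrypt by XORing the plaintext with the keystream."""
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


key = os.urandom(32)
secret_a = b"widget process: step 1"   # hypothetical message the attacker knows
secret_b = b"payroll total: $123456"   # hypothetical message the attacker wants

# Used badly: the same nonce encrypts both messages.
reused_nonce = os.urandom(16)
ct_a = encrypt(key, reused_nonce, secret_a)
ct_b = encrypt(key, reused_nonce, secret_b)

# The identical keystreams cancel, so the attacker never needs the key.
recovered = xor(xor(ct_a, ct_b), secret_a)

# Used well: a fresh nonce for every message.
good_a = encrypt(key, os.urandom(16), secret_a)
good_b = encrypt(key, os.urandom(16), secret_b)
failed = xor(xor(good_a, good_b), secret_a)

print(recovered == secret_b)  # the reused nonce leaks the second message
print(failed == secret_b)     # fresh nonces do not
```

With closed-source client software, you have no way to check whether a vendor has made a mistake of this kind.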