PSA: backups

Dear Lazyweb, and also a certain you-know-who-you-are who should certainly know better by now,

I am here to tell you about backups. It's very simple.

Option 1: Learn not to care about your data. Don't save any old email, use a film camera, and only listen to physical CDs and not MP3s. If you have no possessions, you have nothing to lose.

Option 2 goes like this:

  • You have a computer. It came with a hard drive in it. Go buy two more drives of the same size or larger. If the drive in your computer is SATA2, get SATA2. If it's a 2.5" laptop drive, get two of those. Brand doesn't matter, but physical measurements and connectors should match.

  • Get external enclosures for both of them. The enclosures are under $30.

  • Put one of these drives in its enclosure on your desk. Name it something clever like "Backup". If you are using a Mac, the command you use to back up is this:

    sudo rsync -vaxAX --delete --ignore-errors / /Volumes/Backup/

    If you're using Linux, it's something a lot like that. If you're using Windows, go fuck yourself.

  • If you have a desktop computer, have this happen every morning at 5AM by creating a temporary text file containing this line:

    0 5 * * * rsync -vaxAX --delete --ignore-errors / /Volumes/Backup/

    and then doing sudo crontab -u root that-file

    If you have a laptop, do that before you go to bed. Really. Every night when you plug your laptop in to charge.
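
Spelled out, the crontab steps above look like this (the 5 AM schedule and /Volumes/Backup path are from the post; the temp-file path is arbitrary):

```shell
# Write the one-line crontab file described above.
printf '0 5 * * * rsync -vaxAX --delete --ignore-errors / /Volumes/Backup/\n' > /tmp/backup-cron
# Install it for root and read it back to confirm (needs sudo, so shown commented):
#   sudo crontab -u root /tmp/backup-cron
#   sudo crontab -u root -l
```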

  • If you're on a Mac, that backup drive will be bootable. That means that when (WHEN) your internal drive scorches itself, you can just take your backup drive and put it in your computer and go. This is nice.

  • When (WHEN) your backup drive goes bad, which you will notice because your last backup failed, replace it immediately. This is your number one priority. Don't wait until the weekend when you have time, do it now, before you so much as touch your computer again. Do it before goddamned breakfast. The universe tends toward maximum irony. Don't push it.

  • That third drive? Do a backup onto it the same way, then take that to your office and lock it in a desk. Every few months, bring it home, do a backup, and immediately take it away again. This is your "my house burned down" backup.

"OMG, three drives is so expensive! That sounds like a hassle!" Shut up. I know things. You will listen to me. Do it anyway.



Update: Mac users: for the backup drive to be bootable, you need to do two things:

  • When you partition the drive, use GUID, not Apple Partition Map;

  • Get Info on the drive and un-check "Ignore ownership on this drive" under "Ownership and permissions."

You can test whether it's bootable by holding down Option while booting and selecting the external drive.
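
If you prefer the command line, the same two steps can be sketched with diskutil and vsdbutil. "disk2" is a placeholder (check `diskutil list` first), and the destructive repartition is prefixed with echo so nothing runs until you remove it:

```shell
DISK=disk2              # placeholder -- verify with `diskutil list` before running
VOL=/Volumes/Backup     # the volume name used in the post
# 1. Repartition with a GUID partition map (this ERASES the drive):
echo diskutil partitionDisk "$DISK" GPT JHFS+ Backup 100%
# 2. Honor ownership on the volume (the "Ignore ownership" checkbox):
echo sudo vsdbutil -a "$VOL"
```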

119 Responses:

  1. jwz says:

    P.S., RAID is a waste of your goddamned time and money. Is your personal computer a high-availability server with hot-swappable drives? No? Then you don't need RAID, you just need backups.

    • aaronlehmann says:

      100% agree. I've had too many close calls with broken RAID implementations to even consider bothering with it anymore. I just back up using a similar rsync command to what you proposed. I'm planning to buy an external 750GB hard drive specifically for backups when they get just a bit cheaper.

    • carus_erus says:

      RAID = Taking the multiple failure rate of JBOD and giving you one chance to recover from a fuck up instead of zero. (Note: See maximum universe irony above).

      The problem is, you can't back it all up. Tape drives are too small, and you need another RAID of disks to hold all the data. There exist boxes that let you combine hard drives into a "portable" box, but at this point you might as well replace "hard drives" with "computers" in your post above.

      The best solution is to figure out which data is (A) critical and static (i.e., archivable, like photos or, for me, a genealogy project) or (B) critical and dynamic (email, recent work/CVS). DVD-Rs, though a pain in the ass, are decent for (A), since it might be a once-a-year activity to burn three discs and store them in different locations (if it's 100GB or less, stop thinking about tape and get thine ass to a burner). The dynamic stuff could be taken offsite, but really it's likely not that huge, and an offsite mirror would do the trick, if you can figure out what's *really* important.

      Then there's (C) which is all that crap on your hard drive (Farscape, Bleach, tentacle porn, whatever your poison) which should be classified as "shit that I can download again if my house burned down" and not stressed over.

      Not enough people do (A) (I do), almost nobody does (B), and people who fret about the loss of (C) need to get on with their lives.

      • edge_walker says:

        Forget classifying data. It's too error prone.

        1. You will forget to include something important, some buried directory, and then it will be gone.
        2. You will skip stuff that you don't realise is worth backing up, like your apps' settings.
        3. It takes work to compile all these inclusion lists.

        Making backups must be a brain-dead easy activity for two reasons:

        1. You won't do it if it registers as something you have to do. It must be so easy that thinking "I should make a backup" is indistinguishable in effort from making the friggin' backup.
        2. The single biggest threat to your data is yourself. Minimising operator involvement at all stages of the process minimises the potential for operator error.

        That means burning "critical static data" to DVDs is fine as far as that goes, but it should only be a fall-back defence; the same data should always be included along with every "critical dynamic data" backup because You Will Forget To Include Something Important. If you want to exclude "shit I can download again", more power to ya, but your backups should still include everything by default, and only specifically exclude the directories where that stuff goes.

        It also means: forget tapes. Way too slow and way too little capacity, which means way too fucking painful; also, way too much operator involvement. In fact, forget backing up to dedicated media like DVDs and tapes. It's passé. It used to be worthwhile when backup media had much more capacity at much lower per-megabyte prices than hard drives. But the per-megabyte price of hard drives is now so ridiculously low, and their capacity relative to backup media so large, that it's far easier to just use extra hard drives as backup media.

        • rapier1 says:

          Tapes useless? I'd have to say that's highly dependent on what you are doing. For the home user, yes: they generally only have data of sentimental or low economic value. For commercial enterprises or research organizations -- especially larger ones with high-value data -- tapes are a much better choice than rotating media. The initial investment is higher, but the media price per GB is very comparable (a 300GB SDLT is $70, as low as $45 in bulk) and the archival storage life is 30+ years.

          For a home user, yeah, tapes are useless.

          • edge_walker says:

            I understood jwz's advice to be aimed at home users and that's what I talked about.

            Organisations have the resources to pay people for making backups, which means laziness is not an issue and operator error can be negated by rigorous procedure and routine tests. Obviously the rules differ when the premises differ.

            But backups are increasingly important for all of us, and conventional wisdom has not caught up with the change in premises for home users. It's vital that we get away from that one-track thinking.

            • freiheit says:

              Also, most "enterprises" and "organizations" can afford to have a robot do the boring work of changing tapes.

              • edge_walker says:

                Yeah. Though there's lots more tedious work than just changing tapes:

                • Decide what to back up.
                • On what schedule; what scheme of full/incremental/generational.
                • Set up software (oh god, it's hateful).
                • Periodic test restore (not just a dry run) to ensure the "backup regime" actually is.

                (Missing the last point in particular is a classic way to lose - but if you use tapes and only own a single machine, how are you going to do that? In contrast, if you copy everything to a second hard disk, then a test restore is as simple as booting from it.)
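
                A rehearsal of that disk-based test restore can be sketched with rsync itself, using throwaway directories standing in for the real disk and backup volume (the real paths would be / and /Volumes/Backup):

```shell
# Throwaway stand-ins for the source disk and the backup drive.
src=$(mktemp -d); dst=$(mktemp -d)
echo "important data" > "$src/file.txt"
rsync -a --delete "$src/" "$dst/"    # the nightly backup step
# Dry-run in the reverse direction: a faithful backup should propose
# no changes when "restored" back over the source.
rsync -an --delete "$dst/" "$src/"
```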

      • jwz says:

        Dear readers, do not listen to this person.

    • decibel45 says:

      I have to disagree with this. On at least 3 occasions now I've had a drive fail and lost absolutely 0 data because of it, thanks to RAID. This was on a machine running FreeBSD and serving my web sites, email and DNS, so maybe it stretches your definition of 'not a server', but the fact that the machine stayed up and running until I could deal with the failed drive later that day is a godsend for me.

      But of course I *also* have backups for that machine as well.

      • jwz says:

        You also would have lost absolutely 0 data if you had used that second drive as a backup. And the time it takes to rebuild the RAID array is as long as, or longer than, the time it takes to restore the backup. And you don't have both drives on the same power supply. And you don't have the option of the OS deciding to take a shit on both drives' data simultaneously.

        I say again, RAID is a waste of your goddamned time and money.

        But hey, if you want to spend twice as much money on drives for no appreciable benefit, and also spend a huge amount of time debugging your RAID config, and re-learning which arcane command you have to use to rebuild the array when you need to remember that again two years from now, more power to you.

        • smin says:

          RAID protects the data you've added or amended since you last ran your backup. Like everything since 5am if you use your cronjob example.

          "OMG, three four drives is so expensive! That sounds like a hassle!"

        • decibel45 says:

          Unless you lose the drive exactly after the backup, you'll lose data, at least on a mail server with any real volume. And unlike restoring from a backup, you can still use the machine while rebuilding the array. More important, you're still up and running until you do replace the failed drive.

          Yes, I do have to relearn how to rebuild the array every time I lose a drive, but that takes less time than it does to go to the store and buy a replacement. And that's only because I'm using software RAID; every hardware RAID I've seen is very easy to use.

          Finally, you're recommending a total of 3 drives; going with RAID 1 means one more drive, so hardly twice the money. (Remember, I said RAID is not a replacement for backups.)

        • I'm using two 500GB drives in RAID 1 on a Windows box. The RAID 1 is provided by the onboard SATA component of the motherboard, so I spent nothing extra to gain RAID functionality.

          Given that I have 300-400GB of data, I need a second 500GB drive for backing my data up whether I use RAID or not, and RAID simplifies the process. Setting it up was not at all complicated; I just had to read the motherboard manual and then use the semi-graphical user interface that you can access during the boot process.

          I think that many home users are in the same place I am, which is to find themselves facing a choice between 1) spending a brief amount of time setting up a RAID 1 array, which then automatically provides at least some redundancy as long as you leave it that way, or 2) telling themselves they'll make regular backups and then inevitably forgetting or slacking off. And I just don't get the "spend twice as much money" thing, given that RAID cards are cheap, or RAID is built into motherboards these days. All you're really paying for is two hard drives, and isn't that what it would cost to do backups your way anyway?

          • jwz says:

            You have to do backups anyway, so all RAID does is add one drive (or add two drives, if your backup disk is also RAIDed.)

            You may think dicking around with BIOS settings is "not at all complicated" but I characterize it somewhat differently, more along the lines of, "please stab me in the eyes with chopsticks instead."

            I'm done talking about RAID now. All of you can stop.

    • RAID also fails to protect you from the "Oops my stupid self/cat/parent/interloper accidentally deleted a lot of data", because RAID copies your mistakes too.

      All my photos are on redundant Firewire drives, and I'm reaching the point where I'm going to have to swap them all up to a larger capacity. I foresee this as some kind of infinite game of leapfrog.

      I did a survey paper on holographic data storage way back in college -- where the hell is my gazillobyte holographic data cube?

    • rexar says:

      I have an Enterprise 450 server with about a dozen hard drives. Is a ZFS RAID array okay?

      • teh_munchkin says:

        No. ZFS uses one drive's worth of parity. The odds of losing 2 of 12 drives are worse than the odds of losing a single drive.

        • rexar says:

          It's using raidz2, so actually three would have to fail. Granted, the purpose isn't really to act as a place to store backups. It's acting as a fileserver--I just happen to be using it as an additional place to store backups; hence, to lose data, three hard drives plus the hard drives on this machine would have to fail in fairly quick succession.

  2. The universe tends toward maximum irony. Don't push it.

    Somewhere along the line I can see that being included in the set of most memorable quotes of all time.

  3. duskwuff says:

    As a counter-PSA, I'll point out that rsync doesn't behave entirely correctly on HFS. (Yes, even with Apple's improved version. It still screws up some random bits of metadata in the backup.) If this matters to you, you can either use ditto and live with the fact that it isn't incremental, or shell out $30 for SuperDuper, which is totally worth it.

    • jwz says:

      What does it screw up?

      • jwz says:

        (Whoa, stupid completion tricks on the subject. I just typed "metadata" there and Safari filled in the rest.)

      • duskwuff says:

        rsync screws up BSD flags, certain types of utime data, and ACLs. Here is a much more exhaustive treatment of the topic.

        • jwz says:

          Ok, so, nothing that even remotely matters?

          • duskwuff says:

            Not unless you mind the incorrect timestamps, basically. Also, I'm really uncomfortable with how rsync handles resource forks - it seems to work most of the time, but it also has a tendency to spit out unsettling error messages.

            Of course, if you don't have anything that uses resource forks (and unless you've still got pre-OS X stuff floating around, you probably don't), this is a non-issue as well.

            • jwz says:

              The only thing that I'm aware of that uses resource forks in a non-disposable way at all any more is Finder, with text-clippings and web-location files. (Curse you, Finder. That's totally unnecessary.) I suppose it's possible that some application bundles have stuff inside resource forks, but I don't know of any. I guess the only way to find out would be to try and restore from a backup without copying the resources and see what breaks. I strongly suspect the answer would be "almost nothing".

              • duskwuff says:
                #!/usr/bin/env python
                import os

                # Walk the whole filesystem and print every file whose HFS+
                # resource fork (exposed at <file>/..namedfork/rsrc) is non-empty.
                for path, dirnames, filenames in os.walk('/'):
                    for f in filenames:
                        try:
                            if os.stat(os.path.join(path, f, '..namedfork/rsrc')).st_size > 0:
                                print(os.path.join(path, f))
                        except OSError:
                            pass

                • jwz says:

                  Note that I said "in a non-disposable way".

                  Lots of things still write resource forks, for example, Photoshop and Illustrator write resource forks into every saved JPG and PDF. But as far as I can tell, there's nothing in those forks that matters.

                  • inoah says:

                    It looks like all fonts are in resource forks (why?). But aside from that resource forks do seem to be less common than they used to be.

                  • lord_knusper says:

                    yes, fonts are in resource forks, and I could never figure out why either ...

                  • scullin says:

                    Well, fonts targeted at MacOS are, probably because that gives them compatibility with older OS9 apps. But OSX can deal with .ttf and .otf files (and some others) just fine; in fact, most of the non-OS bundled fonts I have installed are not resource-forked.

                    There's probably a way to bulk de-rez them if you're never going to use OS9.

                  • saltation_lj says:

                    the "why" is probably just historical inertia.

                    fonts on the old macos could be added to any file. so if you used a funky font in a Word document, you could make sure it looked&printed ok on any target mac by simply attaching the font to the document. as Resources, they could travel invisibly with the document's user-data.

                    nowadays, people have forgotten about the (to my mind, lovely) idea of fonts not being fixed to a particular os installation/config. and i'd be surprised if macosx respected a document's embedded fonts and used them for its display.

                  • duskwuff says:

                    The other one I'd watch out for is applications which store resources in resource forks. Checking my own apps, I noticed at least a few which look as though they'd turn into pumpkins if their resource forks were removed.

              • jace says:

                Lots of Mac fonts, too.

                I'm currently doing backups to a disk attached to a NSLU2 over SSH and a wireless network (saves me the hassle of plugging the laptop into the network each night) via an rsync process at 4 AM. Three issues:

                1. All HFS+ metadata is lost since the remote disk is ext3 on Linux.
                2. For some weird reason, rsync can't set timestamps on symlinks.
                3. The backup job appears to pull a ton of stuff into disk cache, making system performance somewhat sluggish in the mornings.

                Maybe I should mount as SSHFS instead of using SSH and let OS X create those ._ fork files.

                • this_old_man says:

                  rsync 3 is in active development and just recently became dependable as a replacement for OS X's built-in version.

                  :pserver:cvs@pserver.samba.org:/cvsroot

                  no acls (no loss) but excellent support for forks and metadata. also fixes OS X's large directory crashes and memory leaks.

                  the only hangs i've seen have been when the target disk is full.

                  been using this for weeks for nightly backups of several large servers and it's been fine.

                  • Thank you for this. You just resolved a very large number of headaches.

                  • this_old_man says:

                    rsync 3 command line i use:

                    $ rsync --verbose --archive --xattrs --delete --stats --delete-excluded --exclude-from=excludes-system / /Volumes/backup >> log-backup-laptop 2>&1

                    excludes-system looks like this:

                    /.vol/*
                    /Network/*
                    /Volumes/*
                    /automount/*
                    /cores/*
                    /dev/*
                    /afs/*
                    /private/tmp/*
                    /private/var/launchd/*
                    /private/var/imap/socket/*
                    /private/var/run/*
                    /private/var/servermgrd/*
                    /private/var/spool/postfix/*
                    /private/var/tmp/*
                    /private/var/vm/*
                    /proc/*
                    /tmp/*

        • pw201 says:

          Yep. That article put the fear into me wrt Mac backups. I'm currently using psync to an external drive, but after following the links in these comments, I'll be checking up on psync and going through the other possible tools with Backup Bouncer, a set of torture tests for Mac backup tools which some helpful soul has put together.

    • zojas says:

      just use rsyncX, plain rsync patched to handle the resource forks. add the --eahfs flag (rsyncX only) to turn on the hfs special stuff. the install of rsyncX gives you a useless gui and a new rsync in /usr/local/bin. http://archive.macosxlabs.org/rsyncx/rsyncx.html

      • jwz says:

        There's no need for rsyncX; the version of rsync that Apple ships in /usr/bin/ already supports the -E option for preserving resource forks (--extended-attributes).

  4. 33mhz says:

    Strangely, with the laptops I have, the display is way more prone to outright fail than the hard drive. I only wish there were a way to matter-compile a back-up of one of those.

  5. autopope says:

    I don't do this: I have a different backup policy.

    I work on laptops.

    When I buy a new laptop, I probably sell the last but one. The old one then becomes the hot backup machine. I never have less than two similarly-configured work machines.

    (If the new laptop's drive is much larger than the old laptop, consider buying the old lappie a new hard disk.)

    Use rsync to backup data from the current work machine to the old one every week. Meanwhile, every day (or more often) use rsync to backup critical data -- work files, email archives, not random MP3s -- onto an 8Gb thumb drive that lives on a keyring. Every couple of days, use rsync to backup the critical data to a server 500 miles away.

    When you boil it down, this is essentially the same level of cover as jwz's spare drive policy; it just uses a different kind of drive enclosure (the laptop).

    Oh, as for Windows ... use rsync. (You did install Cygwin, right?)

    Horrible confession time: my most recent laptop is running Vista. And it's staying that way for, oh, another month or so (until Ubuntu Gutsy Gibbon is stable and I have a spare few days to spend getting the wifi drivers to work). Again: rsync is available for Windows.

    • wilecoyote says:

      Alternatively, if you don't want to fight with Cygwin (I mean, isn't that the Windows equivalent of "recompile your kernel"?), you can try cwrsync.

    • pixelfreak says:

      If you've upgraded to an Intel based Mac from PPC, you can't rsync the entire drive. While applications on Mac OS X can be universal binaries, the system bits are platform specific.

      However, you could sync your home folder and applications.

      If I recall correctly, this limitation should be resolved when Leopard ships.

    • vladekk says:

      vista has robocopy.exe

    • etfb says:

      If, for some reason, you don't feel like futzing with rsync, the built-in XCOPY command in DOS isn't too bad, really. XCOPY /C/R/I/K/E/Y/D/H C:\ F:\Backups\CDrive will copy your entire C: drive, maintaining permissions and continuing after any error, only copying files whose dates are more recent than the ones already on the backup. This is inferior to rsync in a bunch of ways -- no --delete option to keep your backups clean, and the date-based system is pretty simplistic -- but superior in that you can remember the switches (Steve Irwin warning D.H. Lawrence not to pet the crocodile, perhaps? "Crikey, D.H.!!!") and it's already there and functional on every MS system since DOS 3.

      Posted from an Ubuntu machine. I only know MS-DOS because I've been using it for the last twenty-odd years, but I can give up anytime, you know?

    • newz_top says:

      It's a nice idea!

  6. everybody42 says:

    Go buy two more drives of the same size or larger. If the drive in your computer is SATA2, get SATA2. If it's a 2.5" laptop drive, get two of those. Brand doesn't matter, but physical size and connectors should match.

    Well, brand matters sometimes, as the typical user won't find out that models with seemingly identical sizes (e.g. 400 GB) but of different brands have different capacities (off by some bytes). That is no problem for rsync, but if you back up your disk with dd (or any other partition copy tool) so you have a replacement to plug in, you should have the same capacity (or more).

    Also, RAID: I use a RAID 1 setup on my main workstation (WinXP) and on a server (Debian). Pro: the backup disk is always up to date, and reads are twice as fast. Con: if your disk does not die 'naturally' but from some physical cause (PC falls down, electrical problems), both disks are probably gone.

    Any chance you tell us what prompted you to write this post?

    • jwz says:

      Which is exactly why my advice does not include the use of "dd".

      I say again: RAID is a waste of your goddamned time and money.

  7. allartburns says:

    (Option++;): Buy a DDS-4 tape drive, a copy of Retrospect and a bunch of DDS-4 tapes. Divide the tapes in half -- one set is the current backup set the other is the "keep at a friend's house" backup set. Swap between the two.

    I back up four systems this way (two laptops, two desktops) and have saved my ass many, many times.

    • nrr says:

      This fails miserably because the price point for the tape drive and tapes is far higher than that of the two hard drives and their external enclosures. Really.

      The idea is to do this sanely and cheaply when confronted with the problem as a home user.

      • allartburns says:

        It doesn't "fail miserably" at all. It simply costs money. How much are daily, incremental backups and an off-site copy worth to you?

        I have four systems that need daily, incremental backups, three Mac and one XP. Sure, I could buy EIGHT hard drives and enclosures and do all sorts of fiddly moving drives around, or I could invest in a tape drive, tapes and software and not have to fool with it.

        I could also save a LOT of money by not using Macs and just using bsd on generic PCs. I wouldn't be nearly as productive, but I'd be saving money!

        • nrr says:

          There's one prerequisite you're missing, and that's the home user bit. You are not the typical home user because you have a myriad of machines under your care. The details in this case are trivial. I.e., I don't care if you're some supernerd with all four machines sitting on your desk, nor do I care if you're just some parental figure doing tech support for your children.

          My claim is that tapes and tape drives are expensive compared to hard drives and external hard drive enclosures in a typical home user setting. The best indicators of that are to consider the mean lifetime of a tape, the price per gigabyte of storage that a tape affords you, the assurance of data integrity (i.e., how do you know that your data won't rot before you end up needing it?), and the amount of human intervention in the process of performing a backup and recovery.

          I'll leave the proof of that as an exercise to you because I don't care to pursue this further than the riling-up-shit-on-the-internets-to-blow-off-steam stage.

          (Also, you suck for using conjunctive elimination. Just FYI.)

    • jwz says:

      Tapes, like RAID, are a waste of your goddamned time and money.

      When you back up to a drive, and that drive is sitting on your desk and mounted, you know when the drive has gone bad because your computer starts yelling errors at you. When you back up to a tape or a DVD, that media is just sitting on your shelf silently rotting and you won't know it's gone bad until that horrible day in the future when you try to read it.

      Never, ever, ever back up to anything except a connected, live file system.
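
One way to guarantee you notice when that live backup stops happening: stamp each successful run and complain when the stamp goes stale. A sketch, with a hypothetical stamp file:

```shell
# After a successful rsync, drop a timestamp on the backup volume:
#   sudo rsync -vaxAX --delete --ignore-errors / /Volumes/Backup/ \
#       && touch /Volumes/Backup/.last-backup
check_backup_freshness() {
    stamp=$1
    # `find -mtime -2` prints the stamp only if it's under two days old.
    if [ -n "$(find "$stamp" -mtime -2 2>/dev/null)" ]; then
        echo "backup is fresh"
    else
        echo "backup is stale or missing" >&2
        return 1
    fi
}
```

Run the check from your login scripts, so a dead backup drive yells at you instead of rotting quietly.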

      • inoah says:

        Never, ever, ever back up to anything except a connected, live file system.

        I second this.

      • allartburns says:

        My experience recovering data from tapes over the past ~15 years is good enough that I'm unlikely to change based on your fear that all my tapes are "silently rotting".

        So no, they haven't been a waste of my time or money.

        >except a connected, live file system.

        Which can be erased in real time along with all your data or destroyed in real time during an earthquake/fire/flood/whatever.

        • rapier1 says:

          Agreed. If you care about your data, you want offsite storage and you want archival quality. Tapes work, and to be perfectly honest, with all the tapes we have at work, I've never heard of one of them getting bit rot. We have a few petabytes of tape on line (two silos controlled by an old J90) at all times, and if we've ever had a user lose data because of a bad tape, I've not heard about it in the past 12 years.

        • keimel says:

          So no, they haven't been a waste of my time or money.

          According to the subsequent followup comments by the original poster, it's only a waste of "goddamned" time and money. If you've got regular old non-damned time and money, go for it!

          And yes, with the cost of drives, tapes can suck it.

      • edouardp says:

        To drive home one of Jamie's points, ad nauseam, RAID sucks. It sucks, among all the other ways, for the same reason that tapes suck. The data can go bad behind the scenes, and you will know nothing about it until it's too late.

        A story: A friend of mine decided that his new work machine was going to have RAID to protect his data. RAID-1 mirroring. Our work didn't do backups, so he thought RAID was the next best thing. Paid good money for the system.

        Then, a few months later, he went to open a very important file he hadn't used in a while and got a disk error. Oh noes! He runs disk check, and finds that, somehow, his file system has become corrupted, and there are now lots of files that are corrupt.

        But how can this happen he thinks? I had RAID, why isn't the second disk OK? Cue sickening realisation...

        Because, of course, RAID works at the block level. If the filesystem gets screwed up on one disk, the RAID sub-system faithfully reproduces that filesystem corruption on the second disk. Now you have two identically corrupt disks. RAID, however, has done its job correctly - all it cares about is that the blocks on disk 1 are now on disk 2. It doesn't care about your *data*. That is your job.

        RAID is an availability system, not a backup system. Don't buy a car if you need a boat.

        Ob-tapes-suck-quip: I fondly remember that the failure rate of DAT tapes, back when I used them for UNIX backups in the 90s, was about one in two. Maybe modern tapes are better, but most people wouldn't know, because almost no-one ever tries regular trial restores to make sure the damn backups actually, you know, worked. Tapes suck.

        • rubin110 says:

          I had a client I had to explain this to, who was rather cheap and didn't understand the concept. Two mirrors were kept, one live and one off-site, swapped once every week. The solution was twofold: a bootable backup that could be at most a week stale, and quicker read times (at the cost of slower write times, which for this machine was not an issue at all).

          He insisted I show him exactly what needed to be done to the SCSI controller to stop the mirror and boot off of the drive sitting in the swap bay. He said this was just in case I was on vacation or unable to come into the office and he had to do it himself during an emergency.

          Truth be told, when he accidentally hacked off all of his client's web site content, instead of calling me in to deal with recovery (his excuse for being a cheap bastard was that he called me twice and got voice mail), he attempted to boot off of the live backup himself. The only issue was that the live backup drive was just a copy of the same missing sites on the boot drive. Being the smart person he thought he was, he ended up futzing with pretty much every changeable option the SCSI controller card had to offer, since from his observations he was still booting off of the "bad" drive and not the "good" live backup.

          In the end I got paid double time to fix his mess over the weekend.

    • chrtle says:

      I've always used RAID systems, normally RAID 5 arrays, in the belief that the data was secure - not necessarily so.

      Three months ago I had a server fail, a server backed up using DDS-4 tapes; only the tapes, or the drive, or something was corrupted, and the data couldn't be read.

      Luckily the data was recovered by a data recovery company, http://www.abcdatarecovery.co.uk , but for the privilege I was relieved of £1200.

      In my experience all the backups in the world will not help you, unless you check that the data stored on them is intact, free from corruption, and stored away from the server in question.

  8. cowsandmilk says:

    is there a reason why rsync would be better than using carbon copy cloner?

    • jwz says:

      Carbon Copy Cloner uses rsync. They are identical.

    • dsandler says:

      They're the same, under the hood. CCC (the new version of which, by the way, is excellent) is a shiny Cocoa wrapper around rsync. For scheduled jobs it seems to use its own ccc_helper background app instead of using cron, but other than that it's essentially a GUI way to do what's suggested here.

  9. malokai says:

    The `ditto` command can also be used to make one-off bootable backups, preserving the resource forks (for fonts, I guess, or any Mac OS Classic applications you may be using if you're still running that ancient copy of Quark?)

    Other utilities like the carbon copy cloner can be used for this (useful for the 'third' backup drive option that you leave in your desk).

    The 10.5 'time travelling' thing seems to just wander your filesystem looking for modified files and backing up previous revisions of the files, so at least things are looking up in the automated Mac OS X backup world (since this post is obviously inspired by a friend or relative's drive crashing?).

    http://www.bombich.com/mactips/image.html

  10. decibel45 says:

    Personally, I use rdiff-backup instead of rsync so that I get incremental backups, and eventually I'll be rsyncing that backup offsite.

    But yeah, what he said. :)

    • keimel says:

      Semantics aside, how is rsync not ending up being incremental? If I use the switches indicated in the original post, it says "only copy the new shit" and "delete all the old shit I deleted". It doesn't have to do what I would normally think of as a Level 0 more than once, when the drive is synchronized the first time.

      Seriously asking, not just yanking your chain or playing. Am I missing something?

      Is it just because rsync automatically looks at the entire list of files first before determining what has changed? Or is there some other thing?

      Thanks
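
      For what it's worth, plain rsync can also be pushed into true versioned increments with --link-dest: each run writes a new dated snapshot directory, and files that haven't changed are hard-linked against the previous snapshot instead of being copied again. A minimal sketch (the helper name and layout are made up, not from this thread; use absolute paths, since rsync treats a relative --link-dest as relative to the destination):

      ```shell
      # Keep dated snapshots; unchanged files become hard links into the
      # previous snapshot, so each increment costs only what changed.
      snapshot() {
          src=$1; dest=$2
          stamp=$(date +%Y-%m-%d-%H%M%S)
          mkdir -p "$dest"
          if [ -e "$dest/latest" ]; then
              rsync -a --delete --link-dest="$dest/latest" "$src" "$dest/$stamp/"
          else
              rsync -a --delete "$src" "$dest/$stamp/"
          fi
          # Repoint "latest" at the snapshot we just made.
          rm -f "$dest/latest"
          ln -s "$stamp" "$dest/latest"
      }
      ```

      Run nightly, that gives you a browsable directory per day, with "latest" always the newest, which is the one thing a plain mirror can't do: last week's version of a file is still there.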

  11. eliot says:

    I do a complete backup at my office on to an external drive with SuperDuper! and then an rsync to StrongSpace of my Documents, Mail, Address Book, and Pictures. I've found this to be useful because not only do I have a completely off-site secondary storage, but I also can browse from a friend's house and find a photo or document I have on StrongSpace.

    • jwz says:

      That looks like a lot more typing for zero benefit.

      • tongodeon says:

        The main benefit is that if your crontab runs "rsync -vaxE --delete --ignore-errors / /Volumes/Backup/" when your backup drive isn't mounted you won't end up with a duplicate directory full of your user data at /Volumes/Backup. (This happened to me a few times.) It gets even stupider when you plug your Backup drive in and then you've got /Volumes/Backup (the directory) and /Volumes/Backup.1 (the new mount point for your backup drive).

        A secondary benefit is that once you've got a launchd script in ~/Library/LaunchAgents/ everything is fire-and-forget. The backup script gets backed up along with everything else, which means that whenever I do a clean reinstall and restore the LaunchAgent gets restored along with everything else and my backups continue without further intervention.

        Launchd also lets you control your jobs a little better. If I ever want to do a special backup (I'm leaving on a 10pm plane flight and I'll miss the 3AM sync) I can run the script early by running "launchctl start com.tongodeon.backup". Launchd isn't a huge thing of course. Mostly I just wanted to make sure I understood how it works, more or less.
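
        A guard against the unmounted-volume problem can be sketched like this (the marker-file trick is my own assumption, not something from the post): drop a marker file on the real backup volume once, and have the job bail out whenever the marker isn't visible at the mount point.

        ```shell
        # Refuse to rsync into /Volumes/Backup when it's just an empty
        # directory rather than the mounted backup drive.
        # One-time setup, with the drive mounted:
        #     touch /Volumes/Backup/.is_backup_volume
        backup_if_mounted() {
            src=$1; dest=$2
            if [ ! -e "$dest/.is_backup_volume" ]; then
                echo "backup volume not mounted at $dest; skipping" >&2
                return 1
            fi
            # Exclude the marker itself so --delete never removes it.
            rsync -ax --delete --ignore-errors \
                --exclude=/.is_backup_volume "$src" "$dest/"
        }
        ```

        Called from cron or launchd, a missed mount then becomes a logged no-op instead of a duplicate copy of your disk sitting at /Volumes/Backup.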

  12. smin says:

    Does one no longer have to bless a Mac backup drive before it's bootable? Did that come in with Tiger?

    • wootest says:

      A bit-by-bit copy of all the data on the drive will include the blessing. It's not in the partition table.

      • yakko says:

        Even if the drive wasn't copied bit-by-bit, all that matters is that the destination disk was created with a GUID partition table. All I had to do to upgrade the disk in my laptop was 1) create a GUID partition on the replacement drive with Disk Utility; 2) rsync the old drive to the new one; 3) swap them and reboot.

        If the machine is a PPC, I believe blessing the replacement volume is still necessary.

  13. ladykalessia says:

    I hereby invoke the jwz-fanboy-advice-baiting.

    I've got an iBook with a tiny hard drive, which I've been cloning to an external drive using Carbon Copy Cloner whenever I remember and/or whenever Apple Software Updater runs. Not a lot of important data here, but every once in a while I dump my documents folder onto a CD and put it away somewhere.

    I remember hearing horror stories of a graphic artist somewhere who had impeccable backup technique, except that they were all sitting by his computer when his house burned down. Leetle paranoid after hearing of that.

    Eventually I'll upgrade my desktop paperweight and have something running OSX, at which point I will come back here and follow the above advice.

  14. lohphat says:

    Basic rules of IT:

    1. Information wants to be free.
    2. It's not a matter of "if" but "when" it will fail.
    3. People do what they can do, not what they should do.

    My desktop system is hardware RAID1 to buy me time when the first drive does fail.

    I back up to one of these weekly. "RAID-X" allows me to grow the volume size dynamically just by popping in one larger drive at a time and letting the volume re-balance, without having to do a full backup and restore.

    It handles CIFS (supports AD auth), NFS (UDP only for now), AFP, FTP, rsync, HTTP, has built-in iTunes and Slimserver streaming, and more.

    Comes in a rack mount model too.

    • ptz says:

      Just some thoughts:
      1. "My desktop system is hardware RAID1 to buy me time when the first drive does fail."

      RAIDs tend to use the same disks. Actually, identical types, sometimes taken from the same production lot.
      Exercise for the reader: go figure the probability of them failing at similar times. Also take into account damage caused by over-current, etc.

      2. ""RAID-X" allows me to grow the volume size dynamically just by popping in one larger drive at a time and letting the volume re-balance, without having to do a full backup and restore."

      Do a disaster recovery exercise. Do it while your primary hard drive is OK (onto a new, spare, empty drive).
      Better yet, do it after doing another backup onto some other computer/drive at a friend's house 50 miles away.

      A couple of years ago a friend of mine got a RAID controller for his work PC, some disks, and - before production use - tried out its balancing capabilities, the result of a simulated hardware failure, drive exchange, etc.
      It was not a cheap RAID solution, and the vendor claimed everything would be fault-tolerant and even hot-pluggable.

      Net result: five trials, five complete data losses.

  15. awooster says:

    Option 3

    When Leopard comes out, use Time Machine, the thing that's been making my weekends disappear for the past few months.

    • cjensen says:

      Time Machine looks cool and all.... but I'm supposed to trust a version 1.0 of a really-complex backup utility with all my precious bits?

      I'm thinking I need to get a real backup scheme working BEFORE switching to Leopard and its time machine.

      • rubin110 says:

        Time Machine is fairly useful if you're the everyday Joe who's got a desktop machine and thinks that getting back on your feet after a system crash of some sort will require some time.

        Running Time Machine on my MBP, the thing has a knack for starting a versioned backup without really informing you. If you have the Finder open (I use Path Finder) you'll see a rotating sync animation where the eject button should be, next to your backup volume. This tiny thing is to let you know that A) There's an automatic backup happening right now and B) Don't fuck with me.

        If you ignore B by simply not noticing that there's a backup in progress because you didn't look, put your machine to sleep, unplug all USB devices, stuff the machine in your bag, bike to a net cafe, then wake the thing up, you'll notice that...

        - Finder is frozen.
        - Path Finder vomits violently when you try to eject your backup drive's volume.
        - The Time Machine preference pane becomes unresponsive.
        - sudo umount -f /Volumes/backup doesn't do shit.
        - You'll need to do a cold reboot to make your MBP functional again.
        - And when you go home you'll have the unpleasant surprise that the backup volume on the external drive is completely toasted and unfixable.

        Applause for Apple in creating Time Machine, a backup solution not for me.

        Did I also mention the drive isn't bootable in any way? You must have an OS X Leopard disc handy to recover any (I mean all) of your data after a full HD crash, that is, after you've replaced the drive in your machine. Here's a link to the most useful/detailed information I've found on this new feature...

        http://www.appleinsider.com/articles/07/10/12/road_to_mac_os_x_leopard_time_machine.html&page=2

  16. mattlazycat says:

    Most people seem to think that they don't need backups. It might not be productive to call them idiots, but masturbating on drugs isn't productive (except in the white sticky sense) either, and that's totally worth it too.

    As someone else pointed out, SuperDuper really is a lovely bit of backup software for those allergic to doing things in the Terminal, and having a bootable backup is supremely useful as well as wise.

    As an additional tip, paranoia (and with backups, every little helps) has me keeping my backup drives switched off whenever I'm not actually doing a backup. No one can wipe or mess up a drive remotely if it's not connected.

  17. sircyan says:

    As a former computer store owner, I can safely say that virtually nobody backs up their data. I know this, because data recovery was my #2 source of income after virus removal. Then you get bizarre happenings, like the woman who bought a 250GB external drive, plugged it in, and thought it would 'magically' backup her data. Her primary hard disk failed, she brought it in all smiles expecting me to restore it from the backup USB drive. Boy, she was sure unhappy to find out that it was empty!

    Personally, I use an online backup service for anything important. I just rsync directly to them via cron every morning at 5am, and it's done. I think it's something like $5.00 a month or so for a couple of gigs. Takes care of the "keep your backups off-site" problem, too.

    -RS.

    • khedron says:

      That's one of the things Maxtor's "OneTouch" line is meant to solve -- once you've plugged the drive in and configured it, you just push the button on the front whenever you want a backup. You can do scheduled backups too, of course.

      I just ordered another of these from Fry's. 750 GB, $185.

  18. rstevens says:

    Learned this the hard way. Can't agree enough.

  19. ghewgill says:

    You, sir, are wise. I had a "maximum irony" episode back in February, where a software installation/config failure was followed by intermittent hardware failure, followed by complete drive failure on my colo server, followed by a stolen laptop.

    Now, amazon s3 and rdiff-backup are my daily friends. I have learned my backup lesson many times before, but I had become lax. That's when the universe strikes.

  20. darkengobot says:

    I more or less do exactly this. I do use OS X's software RAID-1 on some file servers, but those are still backed up using this procedure. The software RAID is there merely to prevent losing a day's worth of data, which is decent value for your $100 SATA drive.

    You always have to back up. And you always have to have an off-site backup. Good post.

  21. divelog says:

    Make sure to test booting off your external drive. MacBooks can only boot off FireWire (sigh). The partition type of your external drive matters; Apple's Disk Utility defaults to Apple Partition Map, which won't boot an Intel Mac. Make sure to pick GUID. This will boot an Intel Mac. And trust me, it really sucks thinking you have a bootable backup when you actually don't.

    • psi0nik says:

      Intel Macs can boot from USB drives as well as FireWire (I've done this first-hand on my MacBook). The partition table type is still significant and does need to be GUID.

  22. ultranurd says:

    Is it tempting fate to have the "live" backup (i.e. not the my-house-burned-down external backup locked in some other location) be a second internal drive? It seems to beg for a lightning strike or exploding power supply that would take out both.

  23. waltfrench says:

    Yup, that's sure one good way.

    Regrettably, it'll do nothing for me when I discover that I didn't save the original when I cropped a photo too tight (or otherwise over-edited it) two months ago. Or that I know I had Frank in my address book last summer when I called him about a trip. And for all sorts of other situations where you need multiple instances of a file. (And I sure as H don't want to overwrite my CURRENT address book with one from 9 months ago to save one entry, and burn a couple dozen.)

    Dunno why anybody in this business assumes HIS practices are right for everybody else, too.

    I imagine we'll find ways that the OMG Time Machine!!! isn't so neat, either -- perhaps, for somebody who maintains multiple, large databases that have small tweaks once a day, and needs detailed rollbacks. Easy way to max out the 750GB external that "should easily" hold all the data. But we're getting pretty specialized here.

    And your 3rd drive is a fine idea for off-site, even WITH TM.

    Thanks for pointing out that Backup Matters, and that you don't need Leopard to do it. But I'm not sure that's what Time Machine is about.

    • rubin110 says:

      You can limit what Time Machine hits, say just your home directory minus Music, Movies and Downloads. Then have rsync, CCC or Super Duper deal with the rest on its own time.

  24. rubin110 says:

    Sir, thank you for a fine post, mostly for the switches on rsync.

    For someone who dwells in the city and knows Jake, I've got a good feeling we've either met at some point in time or our paths will eventually cross.

  25. divelog says:

    Hrm I can't get this to work. I'm syncing to an external fw drive with the command:
    sudo rsync -vaxE --delete --ignore-errors / /Volumes/Backup/

    Rsync finishes with:
    rsync error: some files could not be transferred (code 23) at /SourceCache/rsync/rsync-24.1/rsync/main.c(717)

    And my backup doesn't boot. Well, it almost does, but complains about mDNSResponder spawning too fast or something.

    • divelog says:

      Hrm, well I did a full mirror using SuperDuper, which created a bootable backup. Then using the same rsync command, I now have a bootable drive. Maybe some external drives still need the 'bless'?

      • jwz says:

        Or, something didn't get copied the first time. It would have been interesting to run a diff to see. But, now that SuperDuper has done its thing, they are likely identical again.

      • jwz says:

        I just verified that I can boot my Intel iMac off my external USB2 backup drive... I've never run SuperDuper or anything other than the usual rsync incantation on it. So, I dunno what went wrong on your end...

        • divelog says:

          Just to clear up the confusion, I've tried making a mirror to my backup disk three times now. My backup drive has both USB2 and FireWire. The first time I used USB; the other two times I used FireWire. I think the connection type is irrelevant.

          First time: Used USB2. Formatted the external drive using Disk Utility, didn't mess with any settings. Assumed OS X would do the right thing and use a GUID partition since I have a MacBook. It used Apple Partition Map. Couldn't boot from this at all; it didn't even show up as an option.

          Second time: Used FireWire. Re-formatted as GUID this time. Did an rsync. Holding down Option, I could see my external drive when I booted, though it was named 'EFI Boot'. It booted, but dropped to console mode flashing "mDNSResponder: invalid system call; respawning too fast" or something.

          Third time: Used FireWire. Did a full mirror using SuperDuper, which re-formatted my drive yet again. After that completed, I rsynced. The drive showed up as 'Backup' when I booted with the Option key, and worked fine.

          I'm guessing SuperDuper just un-checked 'ignore ownership on this volume,' as unstablehuman mentioned. I'll re-format and do another mirror tonight, and see if that works.

          I hate computers.

      • The first time I tried to create a clone of my drive using rsync, I got the exact same mDNSResponder error when I tried to boot, along with an error that the OS expected some files to be owned by root that weren't. After a little googling I learned that while rsync will preserve ownership and permissions, the OS defaults to ignoring that information on external drives. This is what I did to fix it:

        1) In Disk Utility, create an HFS+ Journaled partition using the GUID partition map (for Intel-based Macs)
        2) Select the newly-created drive in the Finder, do a Command-I, expand "Ownership & Permissions" and ensure that "Ignore ownership on this volume" is NOT checked
        3) run the rsync command: sudo rsync -vaxEH --delete --ignore-errors / /Volumes/Backup/ (the H switch is to preserve hardlinks, just in case)
        4) Order Thai food and drink a few beers
        5) Reboot system. Hold down option key to boot off of backup drive to test.

  26. I think you missed a second detail that you need in order for the disk to be bootable: you also (may?) need to do bless -verbose -folder "/Volumes/whatever/System/Library/CoreServices" -bootinfo after an rsync.

    • jwz says:

      Not on my core duo iMac.

      • More to the point, not coming from the main disk in your core duo iMac, because you haven't screwed up the permissions in some way and you never had "ignore ownership" enabled on your target volumes. bless(8) is cheap and guarantees that the appropriate EFI is present for the OS you've duped. (It's not clear to me what it would do on a PowerPC mac, yet, but I'll tinker with my Powerbook some time in the future.)

    • Rather, that some people may need, depending on the state of Disk Utility's opinion about whether their permissions need to be "fixed" and so forth.

    • antifuchs says:

      I followed the original instructions, but Boot Camp Utils had trouble seeing the new startup volume and refused to boot into OS X (using the volume selector that comes up when holding Option on bootup, the startup disk had "EFI Boot" as the only entry).

      bless --verbose --folder /System/Library/CoreServices/ --bootefi /System/Library/CoreServices/boot.efi --bootinfo

      fixed this (note that this command differs from yours in the --bootefi switch; I cargo-culted this so that the --info output is the same as it was with the previous volume).

  27. lasarius says:

    Thanks for the fine article. While I've had automatic rsync backups to other machines for a long time, this is far nicer. However, I have one issue with this: Spotlight.

    Whenever I would reconnect my backup disk, Spotlight would start indexing the backup disk. Not much sense in that. Adding the disk to Spotlight's Privacy list would immediately stop the indexing process, but it would do the same thing the next day (I have a MacBook, so I disconnect the backup drive during the day, then attach it in the evening so the nightly cronjob can do its thing).

    Obviously rsync overwrites the backup drive's Spotlight settings with those from the system drive. After a while of procrastination I finally went and found the culprit. And a fix.

    The file /Volumes/BackupDrive/.Spotlight-V100/Store-V1/Exclusions.plist contains the items to ignore on this drive. Since I don't have anything for Spotlight to ignore on my system disk, this gets carried over to the backup drive.

    The solution is simple: Add your backup drive to Spotlight's Privacy list. Then copy the file and put it with your backup script. In the end just copy it back to the backup drive after rsync has finished (as root):

    cp Exclusions.plist /Volumes/MyBackup/.Spotlight-V100/Store-V1/

    Hope this helps somebody. Annoyed the heck out of me.
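
    Folded into the nightly job, the whole thing might look like this sketch (the function name and where you keep the saved plist are made up):

    ```shell
    # Mirror, then put the backup drive's own Spotlight exclusion list
    # back, since the rsync just overwrote it with the system disk's.
    backup_with_spotlight_fix() {
        src=$1; dest=$2; saved_plist=$3
        rsync -ax --delete --ignore-errors "$src" "$dest/" || return 1
        mkdir -p "$dest/.Spotlight-V100/Store-V1"
        cp "$saved_plist" "$dest/.Spotlight-V100/Store-V1/Exclusions.plist"
    }
    ```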

  28. chrisam says:

    I'm a little late to the party, but...
    When running rsync as root, if you have FileVault enabled for your account, it will not back up your /Users/you directory. It will copy the encrypted image of your profile.

    Your /Users/you directory will be empty, and there will be a /Users/.you directory with a you.sparseimage file in it. That's the encrypted home directory.

    Is this a good thing, or a bad thing? It depends. By using FileVault, you are now automatically encrypting your personal files in all the backups you make. This is not a problem if you are backing up to a bootable drive (as suggested). But it makes pulling individual files off of a backup impossible.

    Running a separate rsync as your user for your home directory should do the trick. This will leave all your personal files unencrypted on a backup volume somewhere, though.

  29. steffi2 says:

    If you have copied (or been tempted to copy) anything else to the drive, and it is no longer on the source drive when you sync, --delete will erase it. I personally would still use --delete, but put the contents one level down instead of at the root: you sacrifice the ability to boot from the drive, yet still get true incremental backups without erasing anything else at the root level.
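
    Roughly, the layout being suggested (the "system-mirror" name is just an example):

    ```shell
    # Mirror the source one level down on the backup drive, so --delete
    # can only ever prune inside system-mirror/; anything else kept at
    # the top of the drive survives every sync.
    mirror_one_level_down() {
        src=$1; drive=$2
        rsync -ax --delete --ignore-errors "$src" "$drive/system-mirror/"
    }
    ```

    You give up the bootable clone, but nothing else at the root of the drive can become collateral damage of --delete.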