PSA: Back Up Your Shit.

Here in this modern world, you talk to your friends over all kinds of different media: SMS, Facebook, Twitter DMs, GChat, and maybe even AIM if you're really old. These conversations aren't ephemeral and disposable; they are your life, and you want to save them forever. You don't just throw your letters in the trash. You might want them some day.

Unfortunately, you and your friends have become beholden to third-party corporations who don't give a shit about preserving your data. That's because you're not the customer, you're the product. You already knew that, but you go along with it anyway, because frankly you don't have much choice.

I've fixed that for you. Mostly. Here's what I've got:

  • sms-backup-iphone ~/Documents/SMS\ Messages/

    This extracts the SMSes from your iPhone backup database, and saves them to a local directory. This only works if you back up your iPhone to "This Computer" rather than to iCloud.

  • facebook-rss.pl --messages $USER ~/Documents/FB\ Messages/

    This backs up your Facebook direct messages to a local directory. It only gets things in your Facebook "Inbox" folder, not things that have been shuffled off to the "Other" folder. I'd like to back that up too but I can't figure out how to read it through the API. But you probably never look in that folder anyway.

    You have to create your own "Facebook App" to make this work. It's a pain in the ass. Do that, then run this script once with --generate-session.

  • twit-backup.pl --user $USER ~/Documents/Twitter\ Messages/

    This backs up your Twitter direct messages to a local directory.

    You have to create your own "Twitter App" to make this work. It's a pain in the ass. Do that, then run this script once with --generate-session.

    This will only archive about a year's worth of your DMs. As far as I can tell, DMs older than that are completely inaccessible to you now, even via the Twitter web interface. They're just gone now. You missed them.

    This is why you need backups: because companies like Twitter pull shit like that. All the time.

    (Though Twitter provides a way to download an archive of your public posts, that archive does not include any of your DMs. And the guy who wrote that code quit, so don't expect this feature to be updated again, ever.)

  • AIM, GChat, Jabber, IRC and whatnot:

    If you use Adium for all the protocols that it supports, it does an adequate job of archiving everything (in "Library/Application Support/Adium 2.0/Users/Default/Logs/"). The interface for accessing those logs is a pain in the ass, and searching has never worked reliably, but at least the bits are there.

    If you use Adium for Facebook Messages, it archives those just fine, but if you ever reply to someone using the FB web site or phone app, Adium will only see and archive their half of the conversation, so that's no good. Thus you still need the FB archiver, above.
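
For the curious, here is a minimal sketch of the kind of extraction the first tool does. It is not sms-backup-iphone itself (that script is Perl and isn't reproduced here). It is a hypothetical Python version that assumes three things: you have already located the SMS SQLite file inside a "This Computer" backup; that file has a table named message with date, is_from_me and text columns; and date counts from Apple's 2001-01-01 epoch, in seconds on older iOS releases and nanoseconds on newer ones. The function names are made up. Check the assumptions against your own file before trusting the output.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Apple's reference date (the NSDate / Core Data epoch).
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

# Heuristic (assumption): 10**11 seconds is over 3,000 years, so any
# value larger than this is treated as a nanosecond count instead.
NANOSECOND_THRESHOLD = 10**11


def apple_time_to_datetime(value):
    """Convert a message.date value to an aware UTC datetime."""
    if value > NANOSECOND_THRESHOLD:
        value = value / 1_000_000_000
    return APPLE_EPOCH + timedelta(seconds=value)


def dump_messages(conn):
    """Yield one 'timestamp sender: text' line per message, oldest first.

    Assumes a table named message with date, is_from_me and text columns.
    """
    rows = conn.execute(
        "SELECT date, is_from_me, text FROM message ORDER BY date")
    for date, is_from_me, text in rows:
        when = apple_time_to_datetime(date)
        # Keep the year: "Feb 29" on its own cannot be parsed back reliably.
        stamp = when.strftime("%a %b %d %Y %I:%M %p")
        sender = "me" if is_from_me else "them"
        # text may be NULL in some rows, so fall back to an empty string.
        yield f"{stamp} {sender}: {text or ''}"
```

Run it as dump_messages(sqlite3.connect(path)) on a copy of the file, never on the backup itself. Timestamps come out in UTC. They include the year because a bare "Wed Feb 29 10:21 AM" forces whatever parses it to guess a year, and a non-leap guess fails; that is the error Alastair reports in the comments.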

Remember: if it's not on a drive that is in your physical possession, it's not really yours.

40 Responses:

  1. Other jamie says:

    Sing it, brother.

    Also: back up your backed up shit. That laptop drive? The question is when, not if, it will fail. Your personal risk analysis may vary, but I keep two backup arrays, and important stuff is also copied to a host in a data center. That last bit is the "goddamn neighbor burnt my house down" defense. At least you have your important documents.

    Details depend on your host and OS preferences, but unix-alikes are easy with scp. Set up keys, schedule it, and check it once in a while to make sure nothing broke. I'm sure there is some way to make that work in Windows, but not my department. I'm lucky that I don't pay for my shell accounts, but they are cheap enough - if you're paying more than $20/mo., you're paying too much.

    Sorry for the rant. I was recently enlisted as reluctant sysadmin to a panicking person who failed to even plug in the goddamn drive I bought her for the express purpose of backing up. Time Machine is annoying, but less annoying than losing your grad school papers. For instance.

    • Jonathan says:

      Our gracious host wrote what is perhaps still the most pragmatic backup advice: http://www.jwz.org/doc/backups.html

    • martin says:

      Hi (other) Jamie,
      I have lost more laptop and home computer drives to theft than to drive failure. The important lesson here is:
      If you back up to some kind of portable disk, rather than across a network of some sort, DO NOT store your backups in the same place as the device they're backing up.

      hth

      • Other jamie says:

        You might notice the "goddamn neighbor burnt my house down" part. It applies to a number of situations that don't always include fire. Metaphor is your friend; used wisely, it communicates a lot.

        Or you might not. Your choice.

  2. cheide says:

    That reminds me that you have to watch out for the format things get stored in, too. A while back I organized my archived logs from some old IM clients only to find they were often stored in some custom binary format and/or encrypted. I'm sure the latter was intended for your own protection against snooping family and such, as long as you keep using the exact same program forever...

    Finding tools to export or decrypt them is a fun trawl through adware/bundleware/malware-infested waters, too.

  3. curgoth says:

    Future historians will have nothing to go on besides your backups (and maybe Charlie Stross) and the archival footage of the documentaries of Paul Verhoeven.

    I'd count on the NSA to have copies of everything, but I'm assuming a violent revolt will wipe all of that out at some stage.

    • Other jamie says:

      You can't count on intelligence agencies to have anything. They care about self-perpetuation, just like the rest of us. The documentation might well be there, but if they're not stupid, it likely isn't.

  4. Aloha says:

    I wish Adium and Pidgin used the same logging format; it's made the eventual transition to OS X from Windows even more painful. And it's not just a one-time conversion: I still use finch on Linux.

    • Pavel Lishin says:

      To be fair, how often do you look at age-old conversations?

      When you need to see that stuff, parsing the format won't be that much of a trouble.

  5. Aaron says:

    You can back up most of your Facebook data easily from Facebook.

    Go to: https://www.facebook.com/settings , then click "Download a copy of your Facebook data."

    It's your entire wall, your photo albums, your messages and a few other things. It worked well when I tried it a year ago. It's not beautifully formatted, but the information is there in case you (or someone) decides to write a parser for it.

    • Patrick says:

      Except it isn't - lately they aren't archiving any comments on your posts, they aren't archiving links you shared, etc etc etc. Only content that you created yourself, basically.

  6. Lamont Granquist says:

    Every IRC conversation of mine from the 1990s is completely lost.

    And I guarantee you that I've never once woken up in a cold sweat over it. In fact, I had never considered that someone would care about the historical record of their chat logs until just now.

    Backing up all this shit is just digital clutter. Maybe if you're the Director of the CIA and you want a record for your memoirs you should back up your chat logs (careful to keep work separate from the mistress though), but for the rest of us we're just not that important and 10 years from now you'll be a different person and you should be moving on...

    If your facebook account was accidentally wiped and that causes you to freak out, then you probably have bigger issues.

    • Tobermory says:

      I suspect you never received a formal reprimand from your work for IRC chat content in the 1990s. Cf. people you know who have been wrist-slapped for Facebook postings in the 2010s.

    • Jeremy Wilson says:

      I've been archiving IRC logs on a private server my friends and I use since 1999, and I have to say, that archive has been immensely useful many, many times over the years.

      Beyond its usefulness for remembering that webpage you found a few years ago or proving a point of order, you can't deny the simple pleasures of prancing down memory lane occasionally.

      I've lost email prior to 1997 due to various crashes, thefts and poor data handling over the years and I definitely regret it.

  7. I was thinking this the other day when I was browsing the old Reddit page and none of the links worked, and then I thought "what about my shit?".

    Gratitude.

  8. Rahul Pathak says:

    Does asking twitter to send you email on all activity help? (I guess it doesn't archive messages you originate.) Things sent to you should be ok though.

    • jwz says:

      I guess it doesn't archive messages you originate.

      Yes.
      Thursday?
      A baby's arm holding an apple, obviously.
      Pythagorean Theorem.

  9. Wouter says:

    Apart from not being interested in what people have for breakfast, this is the main reason why I don't like instant messaging and social media. It's already hard enough to keep track of emails and especially (phone) text messages, which in my case do often contain information I'd like to or ought to keep. It's not just the back-up process that's hard, it's also remembering which medium has the specific information you're looking for.

    At least one has a fighting chance when all the information can be downloaded in one form or another onto a real computer; trying to move data between different phones and phone OS platforms can be even more challenging, if not impossible.

    I guess a good policy would be: if it's not in my (IMAP) mailbox, it doesn't exist. Email can be organised, can be searched, can be backed up. But it, too, is pretty far from being the perfect medium these days – as in, since spam and its countermeasures.

  10. Matt Conway says:

    Backing up all your (online) shit is core to our mission here at backupify. We're in the process of rolling out a hosted platform that allows you to do so for _any_ REST api on a regular schedule. We're still in beta, so the ability to define your own backups is limited to some early partners, but we plan to open it up to all within the next few months.

    https://api.backupify.com

    • Aaron says:

      I hope you'll excuse the following blunt question: Other than a potential point of failure, what does your service offer that a competent hacker can't do for himself or his company?

      I get the strong impression that your API service, possibly all of your services, needs either a sysadmin to deal with it on behalf of his users, or a polite interface wrapped around it before users can deal with it themselves. None of the upstream APIs you're hitting appear to be private, so there doesn't seem to be anything in the way of my writing a client for the APIs of whichever of those services I need to back up, rather than writing a client for yours. And, while I'm not quite clear on this part, it appears that backups made via your service exist not on my disks but on yours, which is of course to say that they're not my backups at all.

      So what, precisely, is it that your users get for their money?

      • Matt Conway says:

        Good questions. I understand the confusion as our current products do optimize for the business use cases, where a central authority is usually responsible for making sure all business data is backed up. The product/platform I'm referring to is still not fully exposed and documented, so I'll try to explain it a bit better here.

        The end goal with our developer platform is to allow end users (sysadmins or individuals) to define, share and reuse the backup definitions for the APIs they want backed up. There will be a polite interface for authoring and using said definitions, and since they can be shared, this allows less competent users to reuse them without having to grok N APIs, much less write and maintain clients for them all. Having tried to do this ourselves for a dozen different APIs, we know this is a significant burden for an entire engineering team, much less an individual hacker. Yes, it can be done, but do you really want to do it?

        The backed-up data will live on our servers, but you'll be able to do an export of all your data across all your services through a single API call. You can periodically make that single RESTful call (cron/curl) to save your export locally if so inclined. This is much easier to do than maintaining client code for the matrix of services you probably use. Note that all data is stored encrypted with keys you can choose to deny us access to, though the nature of the beast means you still have to trust us to do this with your best interests in mind - that is, trust us to not "look" at the data as we fetch/export it, nor keep around your keys when we decrypt it for export. Hopefully we have struck the right balance between security and convenience here. If we had a way to fetch your data in a pre-encrypted form, we would do so.

        As with most SaaS applications, it's all about providing you with the value you need whilst saving you the time and focus it would take you to do it yourself.

        Hope this helps clear things up

        Matt

        • Aaron says:

          Good answers, and thank you very kindly!

          I make a habit of not using cloud services for my personal data where possible, so I don't have much of a use case in my personal life for your offering, but I can see a few places where it might fit in here at work, and I'll definitely keep you in mind.

  11. Hear! Hear! I'm a big fan of DVD backups, because they don't get overwritten. I've written a detailed post describing how to make automatic backups with a batch file (Windows):

    http://progenygenealogy.blogspot.ca/2012/08/effective-backups.html

    I can still read CDs burned in 1997.

    Backing up to a hard drive, or Flash memory has risks, as described here:

    http://progenygenealogy.blogspot.ca/2012/11/the-perils-of-cycling-your-backups.html

    • Aaron says:

      The capacity shortcomings of optical media make it unreasonable to consider entirely replacing disk-based backups, but I can see an argument in favor of maintaining a disk backup cycle and checkpointing mission-critical data to optical media every so often. (Or in favor of backing up everything to LTO-5 tapes, but at two grand for the drive and thirty bucks on sale for each tape, that's pretty far out of the realm of feasibility for most home backup situations.)

    • Pavel Lishin says:

      How, exactly, do you back up automatically to a DVD, without buying a system that picks 'em up off a platter and loads them into a drive?

      • Ben says:

        By having all of your data and its periodic diffs for your entire life total less than 4.7GB.

        Backing up my ~6TB of data on DVDs would be an interesting exercise, for values of interesting equal to installing a modern OS from floppies.

        • Pavel Lishin says:

          Ah, but you still have to manually shove a blank DVD into the tray, right? That's exactly the kind of slight inconvenience that would result in me losing all my shit :p

  12. Sam says:

    If it's not on a drive in your possession, then after 180 days it is considered "garbage" and can be taken by anyone, including the government without so much as a subpoena.

  13. Jon says:

    Another problem I'd like to solve would be automatically deleting old tweets, posts etc on social sites. I can see the attraction of having a personal copy of that stuff but not so much the rest of the world. One day I might try to tackle this.

  14. Alastair says:

    I was getting the following error with sms-backup-iphone.pl, on a message from 2012:

    sms-backup-iphone.pl: unparsable time: Wed Feb 29 10:21 AM

    I don't blame strptime for complaining. Changing line 316 to...

    my $timestr = strftime ("%a %b %d %Y %I:%M %p", @lt);

    ... made it happy.

    Thanks for this.

  15. Nath says:

    Did I miss something, or does sms-backup-iphone not work with old text messages? It properly extracted texts exchanged this month but ignored old ones from 2012.

    • jwz says:

      See comment at line 482.

    • jwz says:

      BTW, I know that was lame, so I think I've made it so that on the first run you don't have to do anything special. On subsequent runs (when the files exist already) it still tries to be cautious by not touching old ones.

  16. Louis says:

    Looks like Facebook's trying to discourage people from using your script. On step 4 of "generate session", they redirect to a login_success URL, but that quickly redirects again to https://www.facebook.com/connect/blank.html#_=_ , which says:

    SECURITY WARNING: Please treat the URL above as you would your password and do not share it with anyone.

    I used Chromium's Developer Tools' Network monitor to get the URL (Using the red circle "Preserve Log upon Navigation" icon), and the script is backing up my messages now.
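
Louis's workaround comes down to grabbing the login_success URL before the second redirect replaces it. Assuming Facebook follows the standard OAuth 2.0 implicit-grant layout, where the token rides in the URL fragment (after the "#") as access_token=..., a hypothetical helper like the one below pulls it out of whatever URL you copy from the network log. Whether facebook-rss.pl wants the full URL or just the token isn't stated here, so go by what its --generate-session step asks for.

```python
from urllib.parse import parse_qs, urlsplit


def token_from_redirect(url):
    """Return the access_token carried in a redirect URL's fragment, or None.

    Assumes the OAuth 2.0 implicit-grant layout: key=value pairs joined
    by '&' after the '#'.
    """
    fragment = urlsplit(url).fragment
    values = parse_qs(fragment).get("access_token")
    return values[0] if values else None
```

With a made-up token, token_from_redirect("https://www.facebook.com/connect/login_success.html#access_token=EXAMPLE&expires_in=3600") returns "EXAMPLE". The final blank.html#_=_ URL returns None because its fragment carries no token, which is why you need the earlier URL from the network log.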