ljgrabber

I haven't posted this before, but it's been accreting over several years: my Perl code for keeping a backup copy of my LiveJournal. There are others. This is mine. ljgrabber.

Running it the first time:

ljgrabber.pl -v
Download everything to ~/Documents/LJ/

ljgrabber.pl -v --lock --before 30d
Lock all of your more-than-a-month-old entries to disallow new comments. This helps with comment-spam.

Running it nightly from cron:

ljgrabber.pl --since 2d
Download any entries modified in the last two days.

ljgrabber.pl --lock \
  --before 30d --since 33d
Lock any entries modified between 30 and 33 days ago (the range is so that it doesn't have to examine every old entry, only the ones that have just expired.)
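
To make the windowing concrete, here is a rough sketch of how a --since/--before pair might select entries. It is in Python rather than the script's Perl, and it is not taken from ljgrabber.pl: the helper names (parse_age, in_window), the sample entries, the fixed "now" date, and the treatment of the boundaries are all assumptions made for illustration. Only the "d" suffix is handled, since that is the only one used above.

```python
from datetime import datetime, timedelta

def parse_age(spec: str) -> timedelta:
    """Turn an age like '30d' into a timedelta (hypothetical helper).
    Only the 'd' (days) suffix is handled, as in the examples above."""
    if not spec.endswith("d") or not spec[:-1].isdigit():
        raise ValueError(f"unsupported age spec: {spec!r}")
    return timedelta(days=int(spec[:-1]))

def in_window(modified: datetime, now: datetime, since: str, before: str) -> bool:
    """True if 'modified' falls between now-since and now-before.
    Which end of the range is inclusive is an assumption."""
    oldest = now - parse_age(since)
    newest = now - parse_age(before)
    return oldest <= modified < newest

# Arbitrary fixed clock and made-up entries, so the result is reproducible.
now = datetime(2009, 1, 31, 3, 0, 0)
entries = {
    "yesterday": now - timedelta(days=1),
    "31 days ago": now - timedelta(days=31),
    "a year ago": now - timedelta(days=365),
}

selected = [name for name, mtime in entries.items()
            if in_window(mtime, now, since="33d", before="30d")]
print(selected)  # ['31 days ago']
```

With a three-day-wide window and a nightly run, each entry presumably gets looked at on a few consecutive nights, so a missed cron run or two would not leave it unlocked.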

It doesn't back up comments.

25 Responses:

  1. jarodrussell says:

    You know how to document code quite well.

  2. vordark says:

    Why do you do these sorts of things in Perl? I've been trying to make myself learn Perl for years, but every time I look at it I'm like "Wow, I wonder if this would be any less readable if I rot13'd it."

    • jwz says:

      Because Perl's pre-installed everywhere, the language doesn't change, and it's actually easier to do this kind of text-and-network-munging crap in Perl than Emacs-Lisp.

      I don't like Perl, but every other option comes with more hassle.

    • lionsphil says:

      Did you look at this one? Because the only line noise I see are the regexps, and frankly concise, embedded microlanguages beat ten pages of C string-munging when it comes to capturing intent.

      (I feel vindicated seeing that someone who got into a book about programming practices uses the same kind of indentation-for-alignment, rather than strictly indentation-by-block-nesting, as me.)

      • vordark says:

        Yes, I saw it. I'm still unimpressed by the language. I continue to hear tell that it's great for what it does, once you get used to it. Every time I hear this I think "Sewer rat may taste like pumpkin pie..."

        P.S. Saying a language is good because it's superior to C is kind of like proving how tough you are by hitting a baby with a pipe wrench.

  3. spendocrat says:

    Does it not grab comments because they're a pain to grab, or you just don't want them?

  4. duskwuff says:

    Perhaps LWP::UserAgent might be a more portable solution than wget? Here's the patch, which also adds Firefox support under OS X. I haven't completely tested it, but it seems to successfully download posts, at least.

    Side note: Is it just me, or is the LJ API being really slow right now?

  5. Does it back up friends-only posts?

  6. duskwuff says:

    There does appear to be some bugginess with regard to text encoding - when running this against entries with UTF-8 in the properties and/or body, I get "wide character in print" warnings, and the wide chars in the body end up double-encoded.

  7. strspn says:

    I wonder why the Wayback Machine gave up in 2007. robots.txt is fine.

    • jwz says:

      Weird. Did it give up on all of LJ? It stopped indexing brad.livejournal.com at the same time.

    • jarodrussell says:

      I think it's because 2007 was around the time LJ started having "Adult Content" warnings on some blogs, where you have to click that you're an adult before you go to the entry. If Archive.org gets that screen every time it visits, then as far as it can tell the journal hasn't changed since then.

  8. jkonrath says:

    What exactly does this mean?

    ljgrabber.pl: getting item 83...
    ljgrabber.pl: LJ error: no properties for item 83

    It always seems to happen a dozen or so items in. Am I hitting a locked post or something?

    • semiclever says:

      As far as I can tell this is a bug. JWZ probably didn't see it because all of his posts have music associated with them. Line 432 should be changed from:

      my $propcount = 0;
      to:
      my $propcount = length keys %props;

      • jwz says:

        No, that doesn't make sense, all you've done is make $propcount be wrong and thus hide the error. I don't think it's possible for an LJ entry to have no prop_*_value in it. There should be more than 0 things in $propkeys and $propvals. And I do have many entries with no music.

        • semiclever says:

          Here's an entry from my livejournal as dumped by the script:

          ljgrabber.pl: result:
          events_1_anum
          30
          events_1_event
          From%20an%20article%20on%20the%20new%20%3Ca%20href%3D%22http://reviews.cnet.com/Samsung_Upstage_SPH_M620/4514-6454_7-32378893.html%3Fpart%3Dcnet%26subj%3DSamsung%2BUpstage%2BSPH-M620%22%3ESamsung%20Upstage%20phone%3C/a%3E:%0A%3Cblockquote%3EIt%20took%20us%20about%20an%20hour%20to%20master%20it%20completely,%20even%20after%20we%20took%20the%20handy%20tutorial.%20But%20once%20we%20got%20the%20hang%20of%20it,%20we%20thought%20it%20was%20quite%20user-friendly%20and%20intuitive.%3C/blockquote%3E%0ASure,%20reviewers%20take%20this%20word%20in%20vain%20all%20the%20time.%20%20After%20all,%20%3Ca%20href%3D%22http://www.greenend.org.uk/rjk/2002/08/nipple.html%22%3Ethe%20only%20intuitive%20user%20interface%20is%20the%20nipple%3C/a%3E.%20%20But%20come%20on.
          events_1_eventtime
          2007-03-26 22:26:00
          events_1_itemid
          314
          events_1_subject
          That word ... I do not think it means what you think it means
          events_1_url
          http://semiclever.livejournal.com/80414.html
          events_count
          1
          prop_count
          0
          success
          OK

          • jwz says:

            Huh. All of mine have at least this:

            prop_1_name commentalter
            prop_2_name interface
            prop_3_name opt_lockcomments
            prop_4_name opt_preformatted
            prop_5_name personifi_tags
            prop_6_name personifi_word_count
            prop_7_name revnum
            prop_8_name revtime
            prop_9_name taglist
            prop_10_name verticals_list

            I guess you can just comment out the "no properties" error and it will work. I don't know what this personifi crap is, but it's on every entry of mine I've ever seen.

            • jkonrath says:

              Just a guess, but could this be somehow related to you having a permanent account and me having a free one?

              I commented out the error and it worked fine for me. Thanks for writing this - I appreciate getting my info backed up before the LJ servers are stolen and stripped for copper by Ukrainian farmers.

  9. muftak says:

    Thanks for that, most useful. I hardcoded the cookie in, as I don't use Safari, and commented out all the errors, as LJ's database seems to be pretty broken.