SXSW scraper

I wrote a thing to scrape the SXSW schedule, intersect it against highly-rated tracks in iTunes, and generate an iCal calendar.

This will probably be of interest to like 3 people. Let me know if you are one of those 3, or if you improve it.


Update: It looks like sched.org contains more events than sxsw.com does, so I added an option to read from there instead, or from both of them.

If you've downloaded it already, grab the new version and run it again. I've just realized that the geniuses running the sxsw.com web site think that an event at 12:01AM on Fri Mar 16 should be listed under Thu Mar 15 instead. Fantastic.
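
For anyone fighting the same after-midnight problem in their own scraper, here is a minimal sketch of one way to correct for it. It is not necessarily what scrape-sxsw.pl does: the function name `fix_event_date` and the 6AM cutoff are assumptions made up for illustration. The idea is that any event listed on a given day's page with a start time before the cutoff really belongs to the next calendar day.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Assumed cutoff: a start time before 6AM on a day's listing page is
# treated as belonging to the following calendar day.
my $cutoff_hour = 6;

# Hypothetical helper. Takes the date of the listing page (year, month,
# day) and a time string like "12:01AM"; returns the corrected start
# time as "YYYYMMDDTHHMMSS".
sub fix_event_date {
  my ($year, $month, $day, $time) = @_;

  my ($hh, $mm, $ampm) = ($time =~ m/^(\d{1,2}):(\d\d)\s*([AP])M$/si)
    or die "unparsable time: $time\n";
  $hh = 0   if ($hh == 12);            # 12:xxAM is hour 0; 12:xxPM is hour 12
  $hh += 12 if (uc($ampm) eq 'P');

  # Do the arithmetic in UTC so that month rollover is handled for us
  # and local daylight-saving rules cannot shift the result.
  my $t = timegm (0, $mm, $hh, $day, $month - 1, $year);
  $t += 24 * 60 * 60 if ($hh < $cutoff_hour);

  return strftime ("%Y%m%dT%H%M%S", gmtime ($t));
}

print fix_event_date (2012, 3, 15, "12:01AM"), "\n";   # 20120316T000100
print fix_event_date (2012, 3, 15, "11:00PM"), "\n";   # 20120315T230000
```

The cutoff is a guess: it would mis-file a legitimately early-morning event, so pick it to suit the schedule you are scraping.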


8 Responses:

  1. todd says:

    rad idea but i'm lame and don't know how to code. how do i use it?

  2. Matt Sayler says:

    Extremely handy, given I've been rating the SXSW torrent tracks... When I import this into iCal, it seems to duplicate all events... probably my own fault since the ICS file itself seems just fine.

    The following patch makes minimum stars configurable, though it's kind of awkward since you're not using getopt.

    --- scrape-sxsw.pl.old 2012-03-11 11:32:14.000000000 -0500
    +++ scrape-sxsw.pl 2012-03-11 11:36:48.000000000 -0500
    @@ -49,6 +49,7 @@

    my $verbose = 1;
    my $debug_p = 0;
    +my $minstars = 3;

    my $itunes_xml = $ENV{HOME} . "/Music/iTunes/iTunes Music Library.xml";
    my $base_url = ('http://schedule.sxsw.com/2012/' .
    @@ -140,7 +141,7 @@
    my $body = <$in>;
    close $in;

    - my $stars = 3;
    + my $stars = $minstars;

    my @e = split (m@Track ID@, $body);
    shift @e;
    @@ -456,6 +457,7 @@
    if (m/^--?verbose$/s) { $verbose++; }
    elsif (m/^-v+$/s) { $verbose += length($_)-1; }
    elsif (m/^--?debug$/s) { $debug_p++; }
    + elsif (m/^--?minstars$/s) { $minstars=shift @ARGV; }
    elsif (m/^-./) { usage; }
    elsif (! $out) { $out = $_; }
    else { usage; }

    • jwz says:

      I don't see how adding a single line to an "if" clause is "kind of awkward" compared to using the incomprehensible syntax of getopt. Which is why I never use that junk, in Perl or C.

      Not sure why you'd see dups, since the UID for each event should be unique and stable even if you regenerate it, but the ways of iCal are mysterious.

      • Matt Sayler says:

        well I can't contribute hacks to hacks without at least proffering some pointless criticism…

  3. Dumped into a GitHub repo for easy forking and whatnot. Also a README to make sure people know it's your code (for proper blaming). It found 17 events for me and I didn't even wade through the torrents yet! Fuck, maybe next year I'll make it to the music part and buy you a bloody mary.

  4. Jason Heilig says:

    This would be awesome, if I had gotten a music pass this year.

    $750 is a bit much. I'll stick to unofficial showcases, but I guess that's the door prize I get for living here.

  5. If you live here, you can get a wristband for $165-200 (depending on when you buy) instead. "Party" events tend to require badges, but most other events will let you in with the wristband; sometimes you will have to wait in a longer line, sometimes not.