Fuck Smugmug.

Dear Lazyweb,

Every now and then I have a need to bulk-download a photo gallery created by someone who has had the exceedingly poor taste to use Smugmug to display their photos. (Sometimes hosted on smugmug.com, sometimes on their own site.)

Someone please write me a script to bulk-download such things.

There is some seriously evil AJAX shit going on here. It's easy to find the name of photo #1, but I don't see how to get the name of photo #2 or subsequent. For example.

I'm actually kind of impressed at how inconvenient they've managed to make this. You know, in a beauty-of-pure-evil, Guild of Calamitous Intent kind of way.


Update: Aha, apparently the page's RSS feed actually contains useful information. I hadn't noticed that.

15 Responses:

  1. ritcey says:

    Looks like the RSS feed will at least give you a list of photos (photo pages, anyway).

    curl 'http://sadiemelleriophotography.smugmug.com/hack/feed.mg?Type=gallery&Data=11745213_6EyaZ&format=rss200'|grep link

    Assuming their goal was to make it as difficult as possible to get actual photos off their site, it's impressive. Otherwise, yes, that's some serious crack.
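The `curl ... | grep link` approach above can also be done with a real XML parser, which avoids picking up the channel-level link along with the per-photo ones. This is only a minimal sketch: it assumes the feed is plain RSS 2.0 with one `<item>` per photo, and the function name and `SAMPLE_RSS` document are made up for illustration (they are not taken from an actual SmugMug feed).

```python
import xml.etree.ElementTree as ET


def photo_page_links(rss_text):
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


# Hypothetical sample feed, standing in for the downloaded RSS text.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example gallery</title>
    <link>http://example.com/gallery</link>
    <item><title>One</title><link>http://example.com/gallery/1</link></item>
    <item><title>Two</title><link>http://example.com/gallery/2</link></item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for link in photo_page_links(SAMPLE_RSS):
        print(link)
```

In practice the text passed to `photo_page_links` would be the body fetched from the feed URL rather than the sample string.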

    • jwz says:

      Yes, it doesn't work unless you have an account there.

      • cabbey says:

        Actually *you* don't need an account, the person that posted the images does. You as a viewer can use the api via anonymous login with only their account name (needed so the servers know who you're trying to get info on). This will give you the public view of their account via the api in a read only fashion.

        There are a few tools already written to do what you want, see the downloaders section here: http://wiki.smugmug.net/display/SmugMug/Hacks+and+Apps

  2. proub says:

    This is the barest-bones, dumbass version; proper parsing would be nicer, but as a proof-of-concept:

    Your sample page's Atom feed lives at http://sadiemelleriophotography.smugmug.com/hack/feed.mg?Type=gallery&Data=11745213_6EyaZ&format=atom10

    The following stomps clumsily through that, pulling images as it goes:

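    use strict;
    use warnings;
    use LWP::Simple;    # exports getstore() and is_error()

    # Read the feed on stdin (or from files named on the command line).
    while (<>)
    {
        # Find a thumbnail URL ending in "-Th.jpg": $1 is the directory, $2 the base name.
        if (m!img src="([^"]+/)([^"]+)-Th\.jpg"!i)
        {
            my $imgurl = "$1$2-L.jpg";    # same image, "-L" variant instead of the thumbnail
            my $fn = "$2.jpg";

            # getstore() returns the HTTP status code, so report that rather than $!.
            my $rc = getstore($imgurl, $fn);
            if (is_error($rc))
            {
                print "Unable to retrieve $imgurl to $fn: HTTP status $rc\n";
            }
            else
            {
                print "Saved $fn\n";
            }
        }
    }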
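For anyone who would rather not use Perl, the same thumbnail-to-large rewrite can be sketched in Python. It uses the same regular expression as the script above and makes the same assumption, namely that swapping the `-Th.jpg` suffix for `-L.jpg` yields a fetchable image. The function names and the sample markup are hypothetical. Unlike the Perl loop, which takes at most one match per input line, this collects every match in the text.

```python
import re
import urllib.request

# Same pattern as the Perl version: directory part, base name, "-Th.jpg" suffix.
THUMB_RE = re.compile(r'img src="([^"]+/)([^"]+)-Th\.jpg"', re.IGNORECASE)


def large_image_targets(feed_text):
    """Return (large_url, filename) pairs for each thumbnail <img> found."""
    targets = []
    for match in THUMB_RE.finditer(feed_text):
        base, name = match.group(1), match.group(2)
        targets.append((base + name + "-L.jpg", name + ".jpg"))
    return targets


def download_all(feed_text):
    """Fetch every target into the current directory, reporting failures."""
    for url, filename in large_image_targets(feed_text):
        try:
            urllib.request.urlretrieve(url, filename)
            print("Saved", filename)
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print("Unable to retrieve", url, "to", filename, ":", err)
```

Calling `download_all()` with the text of the Atom feed would do the actual fetching; `large_image_targets()` on its own touches no network, which makes it easy to check the rewrite first.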

  3. kiskadee says:

    If the RSS thing doesn't work out, this is almost certainly possible with Selenium.

  4. onethumb says:

    As the founder, CEO and Chief Geek at SmugMug, it'd be easy for me to get indignant, defensive, and rant about how this isn't fair.

    That, however, isn't what I'm going to do. I'd rather improve our product than get into a flame war, especially with someone whom I've respected for probably close to 15 years.

    So you've got my attention - how can we make it easier to bulk download photos from SmugMug, should the owner of said photos wish to let you do so? We already provide RSS and Atom feeds of the gallery in question, an API that you can consume with no login or account required, and thousands of developers have created a wide range of 3rd party plugins, apps, and scripts. Everything from Python scripts to mirroring apps to browser plugins - it's out there and easy to find.

    Perhaps more important, though, is the question of which photo sharing service does it better, and why? Let's start with the biggest ones: Facebook? Kodak? Shutterfly? Snapfish? Flickr? Which of those makes it easier to do than SmugMug, how do they do it, and how can we exceed whatever method they're using?

    We have lots of customers and lots of goals. Your use case is one - but fast pages chock full of photos and videos is another one. Our "seriously evil AJAX shit" helps us do that. But that certainly doesn't mean we overlooked the "getting data back out" piece. We don't own our customers' data - they do. We want to make it easy for them to get it in *and* out.

    So let me know what barriers we've put in your way. I promise - we'll fix it.

    • jwz says:

      The RSS feeds are fine, once I figured out that they were there, which obviously I found non-obvious, having spent hours beating my head against trying to scrape the HTML first.

      The code is so crazy that it really looked very much like intentional obfuscation (e.g., Youtube, Facebook), which put me very much in a "how do I crack this" frame of mind rather than on a "where did they misplace the control for this feature" hunt. So the hostility of my initial reaction was based on the appearance that your site was specifically designed to make what I was trying to do as hard as possible. I'm glad to learn that I was wrong about that.

      If it is the case that it's possible to use your APIs without creating an account first, that also was very non-obvious; several of the "downloaders" I tried seemed to want me to log in first, making it look like they were for downloading my own photos.

      So, the first problem is poor documentation and undiscoverable features.

      But beyond the bulk access problem, which is something that I face a lot but that admittedly most other users don't, I just intensely dislike the UI you guys have created. It is baroque, cluttered and too clever for its own good. I find Flickr minimal and simple enough to be tolerable, but frankly I prefer plain old HTML galleries that don't blink and zoom and zip around and that aren't covered with random magic pop-ups that appear when I accidentally mouse over the wrong invisible mine-field inside the image, but instead just sit there and show me the page of the URL that I clicked on.

      But I'm just an unfrozen caveman, what do I know.

      Also, your URLs are gross.

      When I click on the gallery list, they start out as

      http://USER.smugmug.com/GALLERY-A/GALLERY-B/TWELVEDIGITS#FIFTEENDIGITS

      but after a little clicking (I'm not sure how) I eventually end up with

      http://USER.smugmug.com/gallery/TWELVEDIGITS#FIFTEENDIGITS

      Any time I see crap like that, as opposed to

      http://USER.whatever/GALLERY/THREEDIGITS.html which embeds

      http://USER.whatever/GALLERY/THREEDIGITS.jpg,

      I know that I'm dealing with someone who designed their URL space not based on what made sense, or what was clear, but based instead on the constraints farted out by whatever database framework they're using. Nobody cares what your database keys and hashes look like. mod_rewrite that shit and make your URLs canonical (the "U" is for "universal").

      Also: having the #anchor in the URL change the content of the page is just insane. I mean, it's really clever and all, it's a good trick, who knew you could even do that -- but it breaks the decades-old assumption about what URL anchors mean! Anchors scroll. They do not change content. Web sites that do that are, again, being too clever for their own good, and require people to re-learn how URLs work for no good reason at all. (Youtube sometimes does this too, and it's just as stupid there, so you're not alone in this new madness.)
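A practical consequence for scraping: the part of a URL after `#` is handled by the browser and is not included in the HTTP request, so two URLs that differ only in the anchor fetch the same document from the server. That is presumably part of why the raw HTML only ever reveals photo #1. The sketch below illustrates the split with Python's standard `urllib.parse.urldefrag`; the host name and digit strings are placeholders in the shape of the URLs above, not real gallery IDs.

```python
from urllib.parse import urldefrag


def request_url_and_anchor(url):
    """Split a URL into the part sent to the server and the client-side anchor."""
    result = urldefrag(url)
    return result.url, result.fragment


# Placeholder URLs: same gallery path, different photo anchors.
first = "http://user.example.com/gallery/123456789012#111111111111111"
second = "http://user.example.com/gallery/123456789012#222222222222222"

if __name__ == "__main__":
    for url in (first, second):
        fetched, anchor = request_url_and_anchor(url)
        print("server sees:", fetched, "| browser keeps:", anchor)
```

Both placeholder URLs reduce to the same request URL, so any per-photo content has to be loaded by script after the page arrives.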

      So, those are some of the reasons that your site irritates me.

      But obviously I am not whoever you believe your target market to be.

      • sharding says:

        I'm a pretty happy SmugMug customer. But Don, if you're still reading, I agree that the weird #anchor in the URL thing does kind of drive me nuts.

      • phs says:

        I think the problem is that currently the "anchor trick" is the only portable way to manipulate the browser's history so that "clever" web developers can subvert the back button to work within their page load. GMail uses this extensively so that one can click on a message and have it load quickly (without actually reloading the page) and still have the back button go back to the inbox.

        Apparently when HTML5 support becomes common (in 2039 or so) there will be history.{push,replace}State() so that people can get the same behavior without this stupid anchor trick. (http://dev.w3.org/html5/spec-author-view/history.html) I'll be holding my breath...

        (I don't like it either, for what it's worth...)

        • n0man says:

          Seconding phs, if you're going to do ajax / dynamic html at all, the best / only way to represent page state is through an anchor tag so that the back button works and you can load bookmarks properly. It doesn't even strike me as evil: scroll position is one kind of page state, "active image" is another in this case.

    • vitorious says:

      I worked on a photo management app last year, and we handled bulk downloads in a "shopping cart" style.

      Just mapping that case here, you could have a "Save" button to the left of "Buy", with the same "This photo"/"Photos in this gallery" drop down, and "This photo" would trigger the same action that "Save photo" does from the right-side slide-out panel, and "Photos in this gallery" would give you the same UI as the buy screen, except instead of purchasing photos, you're adding them to a .zip file that's generated on-the-fly.

      This would probably solve most of the use cases for bulk downloads.

      We also supported saving images to a .zip from multiple galleries. Roughly aligning what we did to Smugmug, you'd have a "Save to .zip" button below the "Save photo" button on the right-side slide-out panel, which would turn on a "Download .zip" button somewhere, and then you could go to a "Photos in this gallery"-style management page to confirm and download the .zip. Downloading it would wipe out your selections (only one "batch" of photos at a time, no libraries of .zip files unless Smugmug already does favoriting, custom galleries, whatever).
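The "adding them to a .zip file that's generated on-the-fly" step can be sketched with Python's standard `zipfile` module writing into an in-memory buffer. This is not how the app described above or SmugMug implements it; the function name, the shape of the `selected` mapping, and the sample file names are all assumptions made for the example.

```python
import io
import zipfile


def build_zip(selected):
    """Pack {filename: image bytes} into a .zip archive and return its bytes."""
    buffer = io.BytesIO()
    # ZIP_STORED skips compression, since JPEG data is already compressed.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name, data in selected.items():
            archive.writestr(name, data)
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in bytes; a real server would read the selected photos from storage.
    cart = {"beach.jpg": b"fake-jpeg-1", "sunset.jpg": b"fake-jpeg-2"}
    payload = build_zip(cart)
    print(len(payload), "bytes ready to send as a download")
```

Building the archive in memory keeps the sketch short; a large gallery would more likely be written to a temporary file or streamed.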

      • jhitesma says:

        Interestingly enough Smugmug does offer that. But like most things on Smugmug it's given to the owner of the photos as an option whether or not they offer their images that way.

        When you click 'Buy' to buy a photo if the owner has made digital downloads available then they're an option there. Though I don't think there's a way to enable it at zero cost.

        While I'm not a fan of Smugmug's URLs and overuse of AJAX, I still find them to offer MUCH more to a photographer than a site like Flickr. A poor analogy would be that Smugmug is the hosting equivalent of a digital SLR while Flickr is the equivalent of a fixed focus 110.

        I've tried both - as well as any number of other options ranging from my own homebrewed galleries (both hand coded and generated from scripts) to several self-hosted gallery packages and dozens of hosted options and none of them come close to giving the creator of the photos the features that smugmug delivers.

        Is there room for improvement? Sure. But at least for me it's still by far the least evil of the options available.

        • Flickr suuuuuucks for *just browsing* a bunch of photos, large and up close without all the surrounding BS about Pools, Groups, tags, blah blah blah.

          By far the best UI for just looking at photos is Boston.com's "Big Picture". Now it's not one URL per picture (sorry JWZ), but it uses some JavaScript cleverness to let you use the good old J and K keys to move up and down through the photos (hitting different anchor points, although the URL does not change. It'd be nice if it did for easy cut & paste.)

          Anything that gives a shout-out to vi is all right in my book.

          [Yes, I know there are hacks/alternate interfaces for Flickr. But they're not what you get when you go to the Flickr site.]

          [Yes, I know that SmugMug has a new Journal mode that is kind of like Boston.com's "Big Picture".]

          [SmugMug user for about 6 years.]