Shortlinks are terrible for all kinds of reasons, but this post isn't about that. But let me get that part out of the way first:

  • They obscure the destination you're about to click on, making them a primary tool for phishing attacks.

  • They train people that not looking at their link destinations is a reasonable thing to do.

  • Each shortening "service" introduces a new point of failure: when, not if, they go out of business, they will have broken a vast swath of the web equivalent to their market share.

  • The real reason that link shorteners exist is not actually to save typing, or reading, but to serve as a tool of surveillance: the shortening "service" wants to interpose itself between your mouse and the destination site so it can sell those hit statistics to other people.

  • Twitter, who inflicted this blight upon the world in the first place, won't even respect the shortlinks that sites provide on their own, but instead double-encode them using their own shortener. They say this is for "security" reasons but that's a bald-faced lie that I'm sure I don't have to... unpack... for you.

So, all that aside -- it's still an interesting numerical / bit-twiddling problem, on a purely technical level.

Back when I switched to WordPress, I noticed that the "shortlinks" it generated for every post were terrible. They really weren't that short at all, just appending the base 10 numeric post ID to the blog's base URL. They were barely shorter than the long URL that includes the post's whole subject. So I wrote a plugin to do better. For example, the blog post:

has this default shortlink: (35 bytes)
Other services give us: (26 bytes) (20 bytes) (19 bytes) (19 bytes) (16 bytes)

My code gives us: (21 bytes)

I did that by just encoding the post's ID number in base64, which is the same thing those other shorteners do, except that the ID in question is intrinsic to the post. Other shorteners either just increment a global variable, or pick a random non-conflicting number. Of course the smaller that number is, the more traversable the space is, which can be a problem.

But since the post's ID number isn't a secret, maybe it could be shorter? Could it be fewer than 4 bytes? Sure, if your post IDs were smaller. By default, a brand new WordPress blog gives its first post the ID 100, which encodes as "ZA". This blog currently has 9469 posts, so that would have still been way down in the three-byte space, "JP0". The post IDs don't increase quite monotonically (the number increases every time you do a preview, among other things), but it still would have fit in three.
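The "ZA" and "JP0" examples are consistent with one particular reading of "base64": pack the ID into the fewest big-endian bytes that will hold it, base64-encode those bytes with the URL-safe alphabet, and drop the "=" padding. Here's a minimal Python sketch under that assumption. The name `encode_id` is made up for illustration, and this is not the plugin's actual code.

```python
import base64


def encode_id(n: int) -> str:
    """Encode a non-negative integer as unpadded URL-safe base64.

    Assumption: the ID is packed into the fewest big-endian bytes
    that hold it before being base64-encoded.
    """
    if n < 0:
        raise ValueError("ID must be non-negative")
    length = max(1, (n.bit_length() + 7) // 8)  # at least one byte, even for 0
    raw = n.to_bytes(length, "big")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


print(encode_id(100))   # ZA
print(encode_id(9469))  # JP0
```

One byte comes out as two characters, two bytes as three, and three bytes as four, which is why anything under 65,536 fits in the "three-byte space" and anything under 16,777,216 fits in four.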

Unfortunately, I used to host my blog on Livejournal, and only migrated it here in 2010. The tool I used to import the blog preserved Livejournal's post ID numbers in the WordPress database. Those were already four bytes: "FDWn" was the last one. And then immediately after that, something went wonky with the import, and subsequent WordPress IDs jumped by eleven million for some reason, all the way up to "ygO-". If I had noticed it at the time, I could have done surgery to pull that number back down, but since then there have been almost 5,000 more posts, and I suspect that WordPress might lose its mind if post IDs are non-increasing. It doesn't matter, though, because these IDs will still fit in 4 bytes for the next 3.5 million posts.
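To sanity-check those numbers, the two quoted IDs can be decoded back to integers. This is a sketch that assumes the same scheme as the examples above (unpadded URL-safe base64 over big-endian bytes); `decode_id` is a hypothetical helper name, not something from the plugin.

```python
import base64


def decode_id(s: str) -> int:
    """Decode unpadded URL-safe base64 back to an integer (big-endian bytes assumed)."""
    padded = s + "=" * (-len(s) % 4)  # restore the padding that was stripped
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


last_imported = decode_id("FDWn")
after_jump = decode_id("ygO-")

print(last_imported)              # 1324455
print(after_jump)                 # 13239230
print(after_jump - last_imported) # 11914775
print(2**24 - after_jump)         # 3537986 IDs left before a 5th character is needed
```

Four characters carry 24 bits, so the ceiling is 2^24 = 16,777,216, and the remaining headroom works out to roughly the 3.5 million mentioned above.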

Anyway, a few weeks ago I decided to waste some time making shortlinks for the DNA web site. Since there was no Livejournal fuckery, the WP blog over there already had nice and small IDs that fit in three bytes, so its shortlinks looked like But I thought it might be interesting to make shortlinks for the various other pages on the site, too. Most of those pages are date-based, so that suggests a way to generate unique IDs that are predictable and do not require a global counter: just use the date! But a time_t is a big number that takes six bytes to encode, so that won't do.

So I computed the number of days since the Epoch instead of the number of seconds (no, you can't just divide, because of leap years and daylight saving time). Then there's the matter of the directory (is this a blog post, a calendar page, a flyer page, a gallery page?) and the room suffix (is this a daytime event in the main room, a nighttime event in Above DNA, etc.?). So I use 3 bits for each of those, adding 6 bits to the 15-bit day number, and a 21-bit number still handily encodes as 4 bytes.
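Roughly, the packing could look like the Python sketch below. Several things here are guesses: the field order (day number in the high bits, then directory, then room), the numeric codes passed for directory and room, and the names `pack_page_id` and `encode_id`. The day count is taken from a calendar date rather than from a timestamp, which avoids the time zone question entirely.

```python
import base64
from datetime import date

EPOCH = date(1970, 1, 1)


def encode_id(n: int) -> str:
    """Unpadded URL-safe base64 over the fewest big-endian bytes (assumed scheme)."""
    length = max(1, (n.bit_length() + 7) // 8)
    raw = n.to_bytes(length, "big")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def pack_page_id(day: date, directory: int, room: int) -> int:
    """Pack a 15-bit day number, 3-bit directory and 3-bit room into 21 bits.

    Assumed layout (not confirmed by the post): day << 6 | directory << 3 | room.
    """
    days = (day - EPOCH).days  # calendar days since 1970-01-01
    if not 0 <= days < (1 << 15):
        raise ValueError("day number does not fit in 15 bits")
    if not 0 <= directory < 8 or not 0 <= room < 8:
        raise ValueError("directory and room must each fit in 3 bits")
    return (days << 6) | (directory << 3) | room


# Hypothetical codes: directory 2 and room 1 are placeholders.
page_id = pack_page_id(date(2014, 1, 1), directory=2, room=1)
print(page_id)                        # 1028561
print(encode_id(page_id))             # D7HR
print(len(encode_id(1_400_000_000)))  # 6, a time_t from around 2014
```

With the day number in the high bits, a date in the 2010s leaves the top of the 24-bit space unused, so the first character lands early in the alphabet. A 15-bit day number runs out 32,768 days after 1970, which is in 2059.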

So here's a gallery: and its calendar page: and flyer: and a blog post from around the same time: That they start with low capital letters means there's plenty of space left.

Of course those aren't actually all that short, since unsurprisingly, whoever was squatting "" back in 1998 never answered my email when I tried to find out what their price for it would be. But if someone wanted to buy me "" from the Registrar of the Great Nation of Georgia, I wouldn't say no.

BTW, autocomplete keeps changing "shortlink" to "chortling", which is what I think we should call them now.

