"The plaintiff gains the power to traverse multiple silos of data"

The Traceability of an Anonymous Online Comment

Suppose that I post an anonymous and potentially defamatory comment on a Boing Boing article, but Boing Boing for some reason is unable to supply the plaintiff with any hints about who I am -- not even my IP address. The plaintiff will only know that my comment was posted publicly at "9:42am on Fri. Feb 5." But as I mentioned yesterday, Boing Boing -- like almost every other site on the web -- takes advantage of a handful of useful third party web services.

For example, one of these services -- for an article that happens to feature video -- is an embedded streaming media service that hosts the video that the article refers to. The plaintiff could issue a subpoena to the video service and ask for information about any user that loaded that particular embedded video via Boing Boing around "9:42am on Fri. Feb 5." There might be one user match or a few user matches, depending on the site's traffic at the time, but for simplicity, say there is only one match -- me. Because the video service tracks each user with a unique persistent cookie, the service can and probably does keep a log of all videos that I have ever loaded from their service, whether or not I actually watched them. The subpoena could give the plaintiff a copy of this log.
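As a rough illustration of that first subpoena, here is a minimal Python sketch. It assumes a hypothetical log format (`LoadEvent`, holding a cookie value, video ID, referring page and timestamp) and an arbitrary ten-minute matching window, and it does not describe any real video service's records. It finds the cookies that loaded an embed from Boing Boing near 9:42am, then pulls every load tied to the matching cookie. The sample URLs and the 2010 dates are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class LoadEvent:
    """One row of a hypothetical embed-provider log."""
    cookie_id: str   # persistent per-browser cookie value
    video_id: str    # which embedded video was loaded
    referrer: str    # page the embed was loaded from
    when: datetime   # server-side timestamp of the load

def candidates(log, referrer_prefix, around, window=timedelta(minutes=10)):
    """Cookie IDs that loaded an embed from a page under `referrer_prefix`
    within +/- `window` of the time `around`."""
    return {
        e.cookie_id
        for e in log
        if e.referrer.startswith(referrer_prefix) and abs(e.when - around) <= window
    }

def history(log, cookie_id):
    """Every load tied to one cookie, oldest first: what the subpoena would hand over."""
    return sorted((e for e in log if e.cookie_id == cookie_id), key=lambda e: e.when)

# Made-up sample data; the year 2010 is an assumption.
log = [
    LoadEvent("c-111", "v-42", "https://boingboing.net/some-article", datetime(2010, 2, 5, 9, 40)),
    LoadEvent("c-111", "v-7", "https://techcrunch.com/other-article", datetime(2010, 2, 2, 14, 5)),
    LoadEvent("c-222", "v-42", "https://boingboing.net/some-article", datetime(2010, 2, 5, 11, 0)),
]
matches = candidates(log, "https://boingboing.net/", datetime(2010, 2, 5, 9, 42))
timeline = history(log, "c-111")
for e in timeline:
    print(e.when, e.referrer)
```

The width of the window is the knob that decides whether the plaintiff gets "one user match or a few": widen it and `candidates` returns more cookies to sort through.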

In perusing my video logs, the plaintiff may see that I loaded a different video, earlier that week, embedded into an article on TechCrunch. He may notice further that TechCrunch uses Google Analytics. With two more subpoenas -- one to TechCrunch and one to Google -- and some simple matching up of dates and times from the different logs, the plaintiff can likely rebuild a list of all the other Analytics-enabled websites that I've visited, since these will likely be noted in the records tied to my Analytics cookie.
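The "simple matching up of dates and times" can be sketched as a join on page and timestamp. This sketch assumes, as the post does, that the analytics records are keyed by one cookie that follows the browser across sites. The `Hit` record, the 30-second window and the sample data are all hypothetical, and the `analytics` list stands in for whatever the TechCrunch and Google subpoenas together return.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import NamedTuple
from urllib.parse import urlsplit

class Hit(NamedTuple):
    cookie_id: str   # cookie value as recorded in this particular log
    page: str        # URL of the page that was loaded
    when: datetime   # timestamp recorded by this log

def link_cookies(known_hits, other_log, window=timedelta(seconds=30)):
    """Rank cookie IDs in `other_log` by how many `known_hits` they co-occur with
    (same page, timestamps within `window`)."""
    votes = Counter()
    for k in known_hits:
        for h in other_log:
            if h.page == k.page and abs(h.when - k.when) <= window:
                votes[h.cookie_id] += 1
    return votes.most_common()

def sites_for(log, cookie_id):
    """Distinct hostnames on which a given cookie shows up."""
    return sorted({urlsplit(h.page).netloc for h in log if h.cookie_id == cookie_id})

# Video-service load of the TechCrunch embed, already tied to cookie c-111.
video_hits = [Hit("c-111", "https://techcrunch.com/a", datetime(2010, 2, 2, 14, 5, 0))]
# Hypothetical cross-site analytics records.
analytics = [
    Hit("ga-9", "https://techcrunch.com/a", datetime(2010, 2, 2, 14, 5, 3)),
    Hit("ga-5", "https://techcrunch.com/a", datetime(2010, 2, 2, 16, 0, 0)),
    Hit("ga-9", "https://example.org/b", datetime(2010, 2, 3, 8, 0, 0)),
    Hit("ga-9", "https://example.net/c", datetime(2010, 2, 4, 19, 30, 0)),
]
ranked = link_cookies(video_hits, analytics)
best_match = ranked[0][0]
print(best_match, sites_for(analytics, best_match))
```

Counting votes rather than trusting a single coincidence helps on a busy page: given several known hits, the right cookie should co-occur with most of them, while a stranger's cookie should co-occur with only one or two.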

The bottom line: From the moment I first load that video on Boing Boing, the plaintiff gains the power to traverse multiple silos of data, held by independent third party entities, to trace my activities and link my anonymous comment to my web browsing history. Given how heavily I use the web, my browsing history will tell the plaintiff a lot about me, and it will probably be enough to uniquely identify who I am.

But this is just one example of many potential paths that a plaintiff could take to identify me. Recall from yesterday that when I visit Boing Boing, the site quietly forwards my information to the servers of at least 17 other parties. Each one of these 17 is a potential subpoena target in the first round of discovery. The information culled from this first round -- most importantly, what other websites I've visited and at what times -- could inform a second round of subpoenas, targeted to these other now-relevant websites and third parties. From there, as you might already be able to tell, the plaintiff can repeat this data linking process and expand the circle of potentially identifying information.
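The round-by-round expansion is essentially a breadth-first search over parties, in which each subpoena's records reveal new parties to target. Here is a minimal sketch, assuming a hypothetical `reveals` map from each party to the parties its records point at. The names below are placeholders, not the actual 17 services Boing Boing uses.

```python
def subpoena_rounds(start, reveals, max_rounds=3):
    """Breadth-first expansion: round 1 targets everyone `start` shares data with;
    each later round targets parties newly revealed by the previous round's records."""
    seen = {start}
    frontier = [start]
    rounds = []
    for _ in range(max_rounds):
        next_round = []
        for party in frontier:
            for other in reveals.get(party, ()):
                if other not in seen:      # never subpoena the same party twice
                    seen.add(other)
                    next_round.append(other)
        if not next_round:                 # nothing new to chase
            break
        rounds.append(next_round)
        frontier = next_round
    return rounds

# Hypothetical "whose records point where" graph.
reveals = {
    "boingboing.net": ["video-embed", "analytics", "ad-network"],
    "video-embed": ["techcrunch.com"],
    "techcrunch.com": ["analytics", "comment-widget"],
    "analytics": ["webmail-site"],
}
rounds = subpoena_rounds("boingboing.net", reveals)
for i, targets in enumerate(rounds, start=1):
    print(f"round {i}: {targets}")
```

The `seen` set is what keeps the circle expanding outward instead of looping back to parties already subpoenaed, such as the analytics provider that both Boing Boing and TechCrunch point to.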

See also EFF's Panopticlick, which shows that even with cookies turned off, your user-agent string alone contains enough information to identify you (on average) to within one in 1,500 people in the world.
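One way to read the "one in 1,500" figure is in bits. A trait shared by one in N people carries log2(N) bits of identifying information, and singling out one person among a population P takes about log2(P) bits. The sketch below does that arithmetic. The 7-billion world population is a round-number assumption, and adding bits from several traits is only valid if the traits are independent.

```python
import math

def surprisal_bits(one_in: float) -> float:
    """Bits revealed by a trait shared by 1 in `one_in` people."""
    return math.log2(one_in)

def bits_to_single_out(population: float) -> float:
    """Bits needed to pick one person out of `population`."""
    return math.log2(population)

ua_bits = surprisal_bits(1500)       # user-agent figure quoted above
needed = bits_to_single_out(7e9)     # assumed round-number world population
print(f"user agent: {ua_bits:.2f} bits of ~{needed:.1f} needed")
print(f"still missing: {needed - ua_bits:.1f} bits")
```

By this measure the user agent alone covers roughly a third of the distance (about 10.55 of about 32.7 bits), which is why each extra trait, such as a font list, matters so much.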

The most surprising thing to me was that web servers can get the list of all the fonts installed on your system -- and that that is usually even more uniquely-identifying than the user-agent string.
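To see why adding the font list helps, here is a toy sketch that treats a fingerprint as a hash of the user agent plus the sorted font list, then counts how many browsers in a sample share each fingerprint. The `fingerprint` helper and the four sample browsers are hypothetical. Real fingerprinting services such as Panopticlick combine many more attributes.

```python
import hashlib
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Browser = Tuple[str, Sequence[str]]  # (user-agent string, installed fonts)

def fingerprint(browser: Browser) -> str:
    """Hypothetical fingerprint: SHA-256 of the user agent plus the sorted font list."""
    user_agent, fonts = browser
    material = user_agent + "\n" + "\n".join(sorted(fonts))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

def anonymity_sets(browsers: Sequence[Browser], key: Callable[[Browser], str]) -> List[int]:
    """For each browser, how many browsers in the sample share its key (1 = unique)."""
    counts = Counter(key(b) for b in browsers)
    return [counts[key(b)] for b in browsers]

sample = [
    ("UA-A", ["Arial", "Helvetica"]),
    ("UA-A", ["Arial", "Helvetica", "Garamond"]),
    ("UA-A", ["Helvetica", "Arial"]),
    ("UA-B", ["Arial"]),
]
by_user_agent = anonymity_sets(sample, key=lambda b: b[0])
by_ua_and_fonts = anonymity_sets(sample, key=fingerprint)
print(by_user_agent)
print(by_ua_and_fonts)
```

Sorting the fonts makes the hash ignore list order, which is a simplifying choice; a real collector might treat the order as one more identifying signal.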


21 Responses:

  1. fantasygoat says:

    They might be able to identify a unique machine, but how might they connect that machine to a person?

    For example, I surf from work, which uses a single NATed IP for all workstations. At best they could narrow it down to an office of dozens of people.

    However, doing the subpoena dance they could probably eventually track it down to a name.

    • jwz says:

      IP → ISP → employer → logs on NAT box. Two subpoenas, both of whom will roll over on you at the drop of a hat.

      • jered says:

        Also, the chain of cookie linking described in the article means you could be tracked to a webmail account, a personal photo album... anything with your username in a URL that gets sent as a referrer to an ad or analytics provider.

        The combination of fonts and plugins means my laptop tested unique. Of course, all iPhones with the same software revision and time zone test the same, as will the iPad.

        • jwz says:

          Surprisingly, my iPhone and a friend's iPhone, same hardware model and OS, did not show up as identical. I'm not sure what the difference was.

          Someone theorized that some installed apps show up as browser plugins, and the server can get a list of those (and their version numbers).

      • fantasygoat says:

        You assume that such logs are kept for any length of time, or at all. For example, my employer doesn't log any network traffic at all on the internal network.

        I know this because I run it. Now, at a Fortune 500 company there might be a process but in my 15 years of industry work I've never seen a small to medium shop keep more than a day's worth of traffic logs.

        An ARIN query will connect the office to an IP without even a subpoena, but after that, they're SOL generally.

        If they're smart, they'll go another way and get a list of destinations from the ISP for the time period and work back from that. The ISP *may* keep such logs, although again in my experience never more than a week's worth.

    • lionsphil says:

      Orthogonal to the "getting down to the individual machine" part, I wonder if there's a potential argument here along the same lines as the need for front-facing photo speed cameras: proof that the actual person using the computer was the defendant.

      Mind you, given that there have been cases of teachers getting put on kiddie-fiddler lists for having machines packed with malware start popping up porn ads during presentations, this is presumably another point where computers are considered magical and the user account is inexorably tied to a single person.

  2. krick says:

    Do you have a source for the font list thing?

  3. discogravy says:

    i'm reading this as "do anything potentially obnoxious via a linux LiveCD from a wifi-enabled DNA Lounge hotspot cafe"

  4. lionsphil says:

    Font information was enough to get me uniquely. Turn off JavaScript (leaving plugins on), however, and "one in 84,945 browsers have the same fingerprint as yours".

    Unfortunately, the tendency for people to write the damn object elements out using JavaScript these days means you won't be watching any BoingBoing embedded videos this way anyway.