Suppose that I post an anonymous and potentially defamatory comment on a Boing Boing article, but Boing Boing is for some reason unable to supply the plaintiff with any hints about who I am -- not even my IP address. The plaintiff will know only that my comment was posted publicly at "9:42am on Fri. Feb 5." But as I mentioned yesterday, Boing Boing -- like almost every other site on the web -- takes advantage of a handful of useful third-party web services.
For example, when an article happens to feature video, one of these services is an embedded streaming media service that hosts the video the article refers to. The plaintiff could issue a subpoena to the video service and ask for information about any user who loaded that particular embedded video via Boing Boing around "9:42am on Fri. Feb 5." There might be one match or a few, depending on the site's traffic at the time, but for simplicity, say there is only one match -- me. Because the video service tracks each user with a unique persistent cookie, the service can and probably does keep a log of all videos that I have ever loaded from it, whether or not I actually watched them. The subpoena could give the plaintiff a copy of this log.
In perusing my video logs, the plaintiff may see that I loaded a different video, earlier that week, embedded into an article on TechCrunch. He may notice further that TechCrunch uses Google Analytics. With two more subpoenas -- one to TechCrunch and one to Google -- and some simple matching up of dates and times from the different logs, the plaintiff can likely rebuild a list of all the other Analytics-enabled websites that I've visited, since these will likely be noted in the records tied to my Analytics cookie.
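The matching of dates and times across logs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log layout, the cookie values, the year, the five-minute window, and the `.test` site names are all invented here, and real services store their records differently.

```python
from datetime import datetime, timedelta

# Hypothetical records: each entry is (cookie ID, site that embedded the
# service, time of the request). All values are invented for illustration.
video_log = [
    ("vid-123", "boingboing.net", datetime(2010, 2, 5, 9, 41)),
    ("vid-123", "techcrunch.com", datetime(2010, 2, 2, 14, 10)),
    ("vid-999", "boingboing.net", datetime(2010, 2, 5, 16, 30)),
]
analytics_log = [
    ("ga-777", "techcrunch.com", datetime(2010, 2, 2, 14, 10)),
    ("ga-777", "example-news.test", datetime(2010, 2, 3, 20, 5)),
    ("ga-777", "example-forum.test", datetime(2010, 2, 4, 8, 15)),
    ("ga-555", "techcrunch.com", datetime(2010, 2, 2, 18, 0)),
]

WINDOW = timedelta(minutes=5)  # assumed tolerance around a known timestamp


def cookies_near(log, site, when, window=WINDOW):
    """Return the cookie IDs that hit `site` within `window` of `when`."""
    return {c for c, s, t in log if s == site and abs(t - when) <= window}


def visits_for(log, cookie):
    """Return every (site, time) pair recorded for one cookie ID."""
    return [(s, t) for c, s, t in log if c == cookie]


# Step 1: the only known fact is the time the comment was posted.
comment_time = datetime(2010, 2, 5, 9, 42)  # the year is an assumption
video_cookies = cookies_near(video_log, "boingboing.net", comment_time)

# Step 2: pull that cookie's full history from the video service.
(video_cookie,) = video_cookies  # assumes exactly one match, as in the text
other_visits = [
    (s, t) for s, t in visits_for(video_log, video_cookie)
    if s != "boingboing.net"
]

# Step 3: use each earlier visit to find the matching analytics cookie,
# then list every site recorded under that cookie.
analytics_cookies = set()
for site, when in other_visits:
    analytics_cookies |= cookies_near(analytics_log, site, when)

browsing_history = sorted(
    {s for c in analytics_cookies for s, _ in visits_for(analytics_log, c)}
)
print(browsing_history)
```

Note that no single log contains both the comment and the browsing history; the link comes entirely from a site name and a timestamp shared between two otherwise unrelated cookie IDs.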
The bottom line: From the moment I first load that video on Boing Boing, the plaintiff gains the power to traverse multiple silos of data, held by independent third-party entities, to trace my activities and link my anonymous comment to my web browsing history. Given how heavily I use the web, my browsing history will tell the plaintiff a lot about me, and it will probably be enough to uniquely identify who I am.
But this is just one example of many potential paths that a plaintiff could take to identify me. Recall from yesterday that when I visit Boing Boing, the site quietly forwards my information to the servers of at least 17 other parties. Each one of these 17 is a potential subpoena target in the first round of discovery. The information culled from this first round -- most importantly, what other websites I've visited and at what times -- could inform a second round of subpoenas, targeted to these other now-relevant websites and third parties. From there, as you might already be able to tell, the plaintiff can repeat this data linking process and expand the circle of potentially identifying information.
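One way to picture this repeated process is as a breadth-first walk over record holders, where each round of subpoenas reveals the targets for the next. The sketch below is hypothetical: the `leads` map and every name in it are made up, and in practice each edge would come from reading the logs returned in the previous round.

```python
# Hypothetical map from a record holder to the other parties that its
# logs would point to. Every name and edge here is invented.
leads = {
    "boingboing.net": ["video-service", "ad-network"],
    "video-service": ["techcrunch.com"],
    "techcrunch.com": ["analytics-provider"],
    "ad-network": [],
    "analytics-provider": ["example-news.test"],
}


def discovery_rounds(start, leads, max_rounds):
    """Return the new subpoena targets found in each successive round."""
    seen = {start}
    frontier = [start]
    rounds = []
    for _ in range(max_rounds):
        next_frontier = []
        for holder in frontier:
            for target in leads.get(holder, []):
                if target not in seen:  # never subpoena the same party twice
                    seen.add(target)
                    next_frontier.append(target)
        if not next_frontier:
            break  # nothing new was learned, so the process stops
        rounds.append(next_frontier)
        frontier = next_frontier
    return rounds


rounds = discovery_rounds("boingboing.net", leads, max_rounds=3)
for number, targets in enumerate(rounds, start=1):
    print(f"Round {number}: {targets}")
```

The `max_rounds` limit stands in for whatever practical or legal constraint ends discovery; without one, the circle keeps widening until no log points anywhere new.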
See also EFF's Panopticlick, which shows that even with cookies turned off, your user agent string alone contains enough information to narrow you down (on average) to about 1 in 1,500 people.
The most surprising thing to me was that web servers can get the list of all the fonts installed on your system, and that this list is usually even more uniquely identifying than the user agent string.
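For a rough sense of scale, the one-in-1,500 figure quoted above can be restated as bits of identifying information. This is a back-of-the-envelope sketch, not Panopticlick's own method: the function names and the 1.5 million visitor pool are made up for the example.

```python
import math


def bits_of_identifying_info(one_in_n):
    """Bits revealed by a trait shared by only 1 in `one_in_n` people."""
    return math.log2(one_in_n)


def expected_matches(population, bits):
    """Roughly how many people in `population` still fit after `bits` bits."""
    return population / 2 ** bits


ua_bits = bits_of_identifying_info(1500)
print(f"user agent string: about {ua_bits:.1f} bits")

# Hypothetical pool of 1.5 million visitors, chosen only for round numbers.
print(f"remaining candidates: about {expected_matches(1_500_000, ua_bits):.0f}")
```

Every additional trait that reveals more bits than this shrinks the pool of candidates faster.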