Hm. That wasn't supposed to be a blank comment. Does Unicode break your comments?
It said this (which might also break this comment, I suppose):
Some unicrud vanishes, most does not. It's some long-known PHP bug that I didn't really understand the last time I looked into it.
Heeehehehee. Heh. Poop.
Are the avatars on this page HUGE for anyone else?
I have no idea why that's happening. I blame Unicrud somehow.
WordPress is garbage. For whatever reason it has decided that this particular user input should result in emitting an unclosed HTML anchor. Maybe that's because of an underlying PHP bug, maybe the blame lies solely with WordPress. Really there's no reason to care about the difference between the two.
Assuming for a moment that there's no way to emit anything else, and thus inject arbitrary code into visitors' browsers, we're just left with this broken tag. The DOM processing in the browser gets confused, this page isn't anywhere close to valid, and we've chosen to take the blue pill and support browsers that try to muddle on rather than just giving you an error page saying "This site is bad and its owners / developers should feel bad".
When doing CSS evaluation the broken DOM means later avatar images are counted as if they were not inside comments, and only part of their style is applied. The partial style includes a width of 100%, which makes the images huge.
I think this is actually a MySQL bug? Apparently anything you try to insert into the database that includes a 4-byte UTF-8 character is silently truncated at that character.
The tag was left unclosed because what David typed above was an A HREF whose properly-quoted string originally contained the literal poo character. So the DB just truncated it there, breaking the tag, long after WordPress had sanitized the input.
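A minimal Python sketch of that failure mode, for illustration only. It assumes the storage layer silently drops everything from the first 4-byte UTF-8 character onward (any code point above U+FFFF, i.e. outside the Basic Multilingual Plane). The function names and the example.com URL are made up here; the actual link David typed isn't shown in the thread.

```python
def first_non_bmp_index(text):
    """Return the index of the first character outside the BMP, or None."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:  # needs 4 bytes in UTF-8
            return i
    return None


def simulate_truncation(text):
    """Mimic the described behavior: cut the string at the first 4-byte character."""
    i = first_non_bmp_index(text)
    return text if i is None else text[:i]


# Hypothetical comment body: a well-formed anchor with U+1F4A9 inside the href.
comment = '<a href="http://example.com/\U0001F4A9">poop</a>'
stored = simulate_truncation(comment)

print(repr(stored))  # the quote, the closing > and the </a> are all gone
```

The input was valid HTML when it was sanitized; what comes back out of the store is an anchor that never closes. Checking with something like `first_non_bmp_index` before the insert would at least let the application notice the problem instead of losing data quietly.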
First of all, I think there might be a market for a browser extension that always put up a message saying "This site is bad and its owners / developers should feel bad".
Second, there's a philosophical question about whether an application that sanitizes data before storing it in a database should trust the data coming out of the database, or whether it should somehow re-sanitize it (which could be hard, if sanity isn't idempotent). Or maybe the sanitization (for html purposes) should only be done on the way out of the db (assuming SQL query sanitization is a separate operation that obviously has to be done on the way into the db). After all, some bozo with a mysql client could easily stuff any old garbage into the database without passing it through the application. Should the database be treated as "inside" the application, or is it "externally visible"?
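To make the idempotence point concrete, here is a small Python sketch using the standard library's `html.escape`. It is only an illustration of the general problem, not what WordPress actually does; the `render_comment` helper and the sample string are hypothetical.

```python
import html

raw = 'Tom & Jerry <b>'

# Escaping once gives safe HTML text.
once = html.escape(raw)

# Escaping the already-escaped text mangles it: the & in &amp; gets escaped again.
twice = html.escape(once)

print(once)   # Tom &amp; Jerry &lt;b&gt;
print(twice)  # Tom &amp;amp; Jerry &amp;lt;b&amp;gt;


def render_comment(stored_text):
    """Hypothetical escape-on-output helper: treat the database as untrusted
    and escape exactly once, at the point where HTML is emitted."""
    return '<p>' + html.escape(stored_text) + '</p>'


print(render_comment(raw))
```

Because escaping twice changes the result, "sanitize on the way in and again on the way out" can't be done blindly with this kind of function. Storing the raw text and escaping only on output sidesteps that, and it also covers the bozo with the mysql client.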
It makes me sad that backward compatibility caused MySQL to create a new character set name (utf8mb4) to mean "really utf-8, we mean it this time for reals!"
On the other hand, I love the Drupal bug name "#1886646: Fire kills bot" (though of course in the details it's revealed that waves, volcano, and panda also kill bot).
From the comments on this shit, I can't tell if bleeding a utf8mb4-shaped chicken over my mysql instance is the thing to do, or if that will invite the attention of the One whose Name cannot be expressed in the Basic Multilingual Plane.
My impression from the comments is that it would probably do the trick (assuming you're running MySQL 5.5.3 or later), but then I'm not the one who'd get sucked into the abyss if I'm wrong.
There's a WordPress ticket about this. It frightens me. This is an explanation of how to upgrade a WordPress DB, but it requires MySQL 5.5.14, and I'm still on 5.1.73 since that's what CentOS 6.5 ships.
That's a lot of work for poop.
ALL SHALL BURN IN THE CLEANSING FIRE OF UNICRUD