CDDB: Feel the Pain
© 2003 Jamie Zawinski <jwz@jwz.org>


In case you didn't know, the file format that CDDB (and FreeDB) use is complete garbage. In addition to random idiotic crap like it being impossible to unambiguously represent a song title that has a slash in it, it's rocket science to figure out how long a song is supposed to be. I need this info not only to display it in Gronk (my MP3 jukebox software), but also for some error-checking that my CD-ripping scripts do, so that I don't end up with truncated files if there was a crash or a full disk or something.

So get this. CDDB files contain junk like this:

    # Track frame offsets:
    #       150
    #       18265
    #       32945
    #       49812
    ...
    # Disc length: 3603 seconds
    #
    DISCID=...
    DTITLE=...

(You'd think that the fact that it's in a comment would mean something, but no: you have to parse both comments and non-comments, which raises the question of what they thought "comment" means.)

Those numbers are the starting sectors of each track on the disc. There are 75 sectors per second. So you convert those to seconds by dividing by 75, and then find the length of each track by subtracting its start from the start of the next track. Oh, but wait, they don't give you the sector address of the end of the last track: for that one, it's expressed in seconds instead of sectors, for no sensible reason. Still, the info is there, right?
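Taken at face value, that arithmetic looks something like the sketch below. The function names are made up for illustration, and it assumes the "Disc length" figure is measured from the same zero point as the frame offsets, which is a guess about the format rather than something the file tells you.

```python
import re

FRAMES_PER_SECOND = 75  # CD sectors per second


def parse_cddb_header(text):
    """Pull (offsets, disc_seconds) out of the '#' comment lines of an entry."""
    offsets = []
    disc_seconds = None
    in_offsets = False
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            in_offsets = False
            continue
        body = line[1:].strip()
        if body.lower().startswith("track frame offsets"):
            in_offsets = True
            continue
        if in_offsets and body.isdigit():
            offsets.append(int(body))
            continue
        # Any other comment line ends the offset list.
        in_offsets = False
        m = re.match(r"disc length:\s*(\d+)", body, re.IGNORECASE)
        if m:
            disc_seconds = int(m.group(1))
    return offsets, disc_seconds


def track_lengths(offsets, disc_seconds):
    """Naive per-track lengths in seconds: next start minus this start."""
    starts = [o / FRAMES_PER_SECOND for o in offsets]
    # The end of the last track is only available in seconds.
    ends = starts[1:] + [float(disc_seconds)]
    return [end - start for start, end in zip(starts, ends)]
```

Feed `parse_cddb_header` the whole file, comments and all, and pass its two results to `track_lengths`.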

Uh, almost.

It turns out that if the last track on a CD is a data track (an ISO9660 file system) then there is a gap between the last track (the data track) and the second-to-last track (the last audio track.) This gap is exactly 11400 sectors (152 seconds, 2:32.) On some discs, you can actually see this gap: it's a differently-shiny ring. Why's it there? I don't know. Why's it that size? I don't know. What if the data track is not the last track on the CD? (Does that even work?) I don't know.

So what this means is, when computing the length that a track should be, you have to subtract 152 seconds from the length of the second-to-last track, only if the last track is a data track.
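As a minimal sketch of that correction (the function name and the `last_is_data` flag are hypothetical, and it only handles the one case described above, where the data track is the last track):

```python
DATA_TRACK_GAP_SECONDS = 11400 / 75  # 11400 sectors = 152 seconds


def adjust_for_data_track(lengths, last_is_data):
    """Return a copy of lengths with the gap removed from the last audio track."""
    adjusted = list(lengths)
    if last_is_data and len(adjusted) >= 2:
        # The naive length of the second-to-last track includes the gap.
        adjusted[-2] -= DATA_TRACK_GAP_SECONDS
    return adjusted
```

The input list is left alone, so the naive and corrected lengths can be compared side by side.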

How do you tell whether the last track is a data track, without having the CD in question physically in your computer? By hoping that the CDDB file contains the words "data track" in the title of that track, I guess. Yeah, that's reliable.
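For what it's worth, the hope-based approach can be sketched like this. It assumes track titles live in `TTITLEn=` lines and that a long title may be split across several lines with the same key; the function name is invented here, and the whole thing is only as good as whoever typed in the entry.

```python
import re


def last_track_looks_like_data(text):
    """Guess whether the last track is a data track from its title alone."""
    titles = {}
    for line in text.splitlines():
        m = re.match(r"TTITLE(\d+)=(.*)", line.lstrip())
        if m:
            n = int(m.group(1))
            # Repeated keys are treated as continuations of the same title.
            titles[n] = titles.get(n, "") + m.group(2)
    if not titles:
        return False
    last_title = titles[max(titles)]
    return "data track" in last_title.lower()
```

A disc whose submitter titled the data track anything else will sail straight past this check.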

And, just to keep things interesting, it turns out that older versions of grip and cdparanoia didn't skip over this gap when ripping: instead, they would append 152 seconds of silence onto the end of the second-to-last track. So now my script that sanity-checks the lengths of the files has to consider two different lengths to be "right", since I now have CDs that were ripped both ways.
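A check that accepts both lengths might look roughly like the following. This is a sketch, not my actual script: the function name is hypothetical, `expected` is assumed to be the length with the gap already subtracted, and the two-second tolerance is an arbitrary number picked for illustration.

```python
GAP_SECONDS = 152.0  # 11400 sectors at 75 sectors per second


def length_is_plausible(actual, expected, is_second_to_last, last_is_data,
                        tolerance=2.0):
    """True if a ripped file's length matches either acceptable length."""
    candidates = [expected]
    if is_second_to_last and last_is_data:
        # Older rips padded this track with the gap's worth of silence.
        candidates.append(expected + GAP_SECONDS)
    return any(abs(actual - c) <= tolerance for c in candidates)
```

Anything shorter than both candidates is still flagged, which is the point: a truncated file should not slip through just because two lengths are allowed.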

Whee. I love love love supporting standards invented by 12-year-olds.

Of course the reason that I use CDDB files at all in Gronk is because of the mind-blowing worthlessness of ID3 tags (30-character limits on titles, etc.) Yay more standards invented by 12-year-olds. (Please don't even mention ID3v2 or Ogg. I laugh at you, you silly person. Those are universally-unsupported fantasies that simply trade one set of problems for a whole new set of problems.)

And as if CDDB wasn't bad enough, FreeDB has taken the CDDB braindeadness and layered even more braindeadness on top of it: it is truly a thing of wonder.

For example, go ahead and try to ever have the "genre" field be something approaching reality -- oops! The first person who ripped this CD said it was "folk" because that's genre number zero! So fix it and resubmit it to the database? Sorry! You can't ever change the genre of an entry in the database after creation, since the genre dictates what directory the file goes in on their server. And so on.

It's a wonder anything works at all.

