<LJ-CUT TEXT=" three fisted tales of how my computer hates me ">
I say to myself, "Self, you've been really slack lately about doing backups on the machine that has all your MP3s." I retorted that doing backups sucks, because I do it to 12G DATs, and each one takes about 3.5 hours to write, and the same to verify, so when I get it in my head to "do backups", that's an every-night-for-a-week job. Then I said to myself, "dude, here's a nickel, go get another disk drive and back them up to that instead!"
So I get this 120G drive, and I start copying my existing MP3s to it. God damn this machine is slow. I mean really slow. Wait, it shouldn't be that slow. This is crazy. Writing DATs is faster than this.
Shit, no wonder performance went to hell at around the time I upgraded to RH7.3: apparently, around that time, one of my three disks (the one my system is on, along with about 1/4th of my MP3s) decided to start running two orders of magnitude slower than it used to:
/dev/hda: 64 MB in 526.75 seconds = 124.42 kB/sec
/dev/hdb: 64 MB in 2.91 seconds = 21.99 MB/sec
/dev/hdc: 64 MB in 3.29 seconds = 19.45 MB/sec
/dev/hdd: 64 MB in 1.70 seconds = 37.65 MB/sec
Wow. Nice. So I tried a bunch of things -- checked jumper settings, bought new IDE cables, etc. -- no luck. The next thing to try was to move it to another IDE controller and see if it still loses. Oh, but it's my system disk. So first I have to copy my system from this disk to a new one. I consolidate space so that I can overwrite one of my disks, and clone my system disk to it. Which takes forever, because, note, I'm copying files off of my system disk at a whopping 125KB/s.
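The numbers in those timing lines are easy to sanity-check. Here's a quick sketch in Python (the function name is made up for illustration, and it assumes 1 MB means 1024 kB) that recomputes each rate from the megabytes and seconds reported and shows how far behind /dev/hda is:

```python
def rate_kb_per_sec(megabytes, seconds):
    """Transfer rate in kB/sec, taking 1 MB as 1024 kB (an assumption)."""
    return megabytes * 1024 / seconds

# (megabytes read, seconds taken), copied from the timing lines above
timings = {
    "/dev/hda": (64, 526.75),
    "/dev/hdb": (64, 2.91),
    "/dev/hdc": (64, 3.29),
    "/dev/hdd": (64, 1.70),
}

rates = {dev: rate_kb_per_sec(mb, secs) for dev, (mb, secs) in timings.items()}
slowest = min(rates, key=rates.get)

for dev, rate in rates.items():
    ratio = rate / rates[slowest]
    print(f"{dev}: {rate:10.2f} kB/sec ({ratio:6.1f}x the slowest)")
```

By this arithmetic the healthy disks come out somewhere between about 160 and 310 times faster than /dev/hda, which is where "two orders of magnitude" comes from.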
So I boot the new system disk, and yup, machine's a lot faster, and yup, the slow disk is still slow. I also attached the slow disk to a different computer, and it was also slow there. Great. So that disk is essentially dead, and now I'm down to only having room for one copy of my MP3s instead of two. So I need to buy another 120G disk. Except by now it's Saturday night, and I can't get one until Monday. Oh well, I'll spend the weekend copying the rest of the MP3s off of the slow disk. This is a lamentably manual process, because I don't trust the machine to, well, function, so I'm babysitting it a lot.
Somewhere in here I have a genuine premonition, and say to myself, "Self, you ought to make checksums of all the files on the slow disk, and compare them to the copies. Just in case." This makes everything take twice as long (since I'm reading each file twice).
So on Monday, it's almost done copying and summing, and I go get another 120G disk. To add insult to injury, the price of 120G disks has gone up by $30 over the weekend.
So now I've got a machine with three disks: a small-ish one for the system, a big one for MP3s, and a big one for a copy of the MP3s, under the assumption that both disks probably won't fail at the same time.
I take advantage of my premonition, and check the checksums of the files on the new disk. Gasp! Some of them (a few dozen files, out of the many thousands) don't match! How did that happen? Well, the "slow disk" is obviously failing, so maybe this is just another symptom. I re-copy those files over, and they match this time.
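For the curious, here's a minimal sketch of that kind of checksum check. It's an illustration only: it assumes Python and SHA-256 via `hashlib`, which is not necessarily what was used here, and the function names (`file_digest`, `build_manifest`, `mismatches`) are made up for the example. The idea is to record a digest for every file under the source directory, then re-read the copies and list any that are missing or differ.

```python
import hashlib
import os

def file_digest(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so big files don't have to fit in RAM."""
    h = hashlib.sha256()  # assumption: any strong checksum would do here
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest

def mismatches(source_manifest, copy_root):
    """Return relative paths that are missing or differ under copy_root."""
    bad = []
    for rel, digest in sorted(source_manifest.items()):
        target = os.path.join(copy_root, rel)
        if not os.path.isfile(target) or file_digest(target) != digest:
            bad.append(rel)
    return bad
```

Building the manifest once, while the source is still readable, means the later check only has to read the copies. Anything `mismatches` returns is a candidate for re-copying.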
I feel like I'm just about done. Ho ho ho!
Because I was using three disks before and now I'm using one, the partition sizes aren't the same in the new world, so there are some partitions that aren't all the way full. So I start moving directories around to pack things in. It's going well, and I'm basking in the glorious speed of the new disks, compared to the broken one.
Then the machine crashes.
And when I boot up, all those directories I was moving around? They're gone.
All of them.
The ext3 file system decided it was going to roll back the journal by at least fifteen minutes -- FIFTEEN MINUTES -- on the destination partitions. The source partitions, it left alone. So it went ahead and let the deletions happen, but un-did all of the file creations.
I spend some time tracking down what went missing, and it's like 70+ albums.
I check the contents of the decommissioned disks -- nope, none of the files are there. It turns out that all of these files happened to originally live on the disk I reformatted to be my new system disk.
I found about 20 of them on my old DAT backups from two years ago, and was able to restore them. But the rest were all things I'd gotten more recently than that. So now I have to re-rip 50+ CDs. And I can't even find them all: since my Damned Shelves have been full for years, new acquisitions have been sitting scattered in piles on the floor, and apparently some of those piles have gone where the socks in the dryer go. Or something.
Oh but wait, there's more!
Remember that premonition about the checksums? Well guess what. When I copy files from the new "main" MP3 disk to the new "backup" MP3 disk, I find that some of the files don't match. This can't be blamed on the "slow" disk, because it's not even attached to the system at this point. What the hell? I pull both versions of one of the files into Emacs and compare them. They're the same length, but starting a few MB into the file, there are a few bytes that have been changed in non-obvious ways. Oddly, mp3_check reports no MP3 errors in either file, which I guess just means the bytes didn't happen to be diddled in an MPEG header.
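Eyeballing two multi-megabyte files in an editor works, but a few lines of code will list exactly which bytes differ. Here's a rough sketch in Python, not what was actually done above; `differing_offsets` is a name invented for the example. It reads both files in step and reports the offset and the two byte values wherever they disagree.

```python
def differing_offsets(path_a, path_b, chunk_size=1 << 20):
    """Return (offset, byte_in_a, byte_in_b) for every position that differs.

    A byte value of None means that file ended before this offset.
    """
    diffs = []
    position = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                # Only walk byte by byte through chunks that actually differ.
                for i in range(max(len(a), len(b))):
                    byte_a = a[i] if i < len(a) else None
                    byte_b = b[i] if i < len(b) else None
                    if byte_a != byte_b:
                        diffs.append((position + i, byte_a, byte_b))
            position += max(len(a), len(b))
    return diffs
```

Whether the bad bytes show up in tidy runs, at regular offsets, or scattered at random might at least hint at where the damage is coming from.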
So I re-copy them again, and again that works.
Since then, I've seen this same kind of file corruption when moving files from partition to partition within the same disk, on both of the new disks. Let's recap:
- brand new disks, two different vendors (IBM, Western Digital)
- brand new IDE cables
- latest stable kernel, 2.4.19
- recently ran "memtest86 3.0" to verify my RAM
- failures seen between two partitions both on the IBM disk
- failures seen between two partitions both on the WD disk
So that sounds like either: ext3 is a way flakier file system than it can believably be, given how widely deployed it is; or, my mobo's IDE controller has lost its mind; or, there's some mysterious white-hole source of cosmic rays under my desk, flipping bits willy-nilly.
Someone said they had seen this kind of thing when slow disks were being used on a fast bus, but I'm pretty sure these disks are way faster than my bus. And my bus may even be running slower than it should be. I don't remember what mobo is in this machine, and I don't want to take it apart again to look, but syslog says
- ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
So the disks are probably running at like, 1/4th the speed they're capable of? That sounds like it ought to be pretty fucking safe.
So, after all this, I've been disabused of the fantasy that I don't need to back up to DAT any more, and my machine is now sitting here making the harmonizing rackets of ripping CDs and writing DATs at the same time. I'm going to be doing this for at least another week.
I really fucking hate computers. I just want an appliance that works.