Something about Linux, Time, and Value.

So, the drive in my DirecTivo is going bad. Hangs every few days, "DriveStatusError BadCRC" in the logs. The usual. Good times.

The annoying part here is that I just replaced a failing drive back in February by buying a pre-formatted replacement drive from DVRUpgrade.com -- and the drive they sent me went bad in eight and a half months.

Turns out they only offered a 6 month warranty, but I mailed them anyway ("Come on, guys, 8 months to failure? Seriously?") Their answer was, "Get fucked." "See if the drive manufacturer will replace it, send the new one to us, and then we will charge you again to copy a new image onto that drive."

Nice.

So now I have a decision to make.

Option 1: Borrow someone's linux machine, figure out how to get MFSTools running on it, try to use that to copy the failing drive to a new drive, and pray. (Because apparently Macs can't read Tivo partition maps, so I can't just rsync or dd the fucker.)

Option 2: Just pay these asshats another $300 for a new pre-formatted drive, re-enter all of my season passes, and hope this one lasts longer than eight months.

I know -- I KNOW -- that Option 2 is the right thing to do. I know it deep inside the place where good decisions come from. But Option 1 is still so viscerally tempting because it infuriates me to give more money to people who have already ripped me off, even when the amount of money is small enough that it is far outweighed by the amount of hassle that not spending it would cause.

Computers are stupid and my indignance is trying to sabotage me.

In related news, I understand that the new HD DirecTivo is still on track for release in mid-2009. Har har har.

Incidentally, recent investigations show that "just torrent everything" is still a shittier experience than the increasingly-shitty experience provided by SD DirecTivo.

Update: In case you're curious, I ended up cloning the dying 750gb drive onto a spare 1tb drive I had lying around using ddrescue and that seemed to work fine, geometry be damned. So, yay. Without repartitioning it (which I think would require Linux-based indignities) I'm missing 1/4th of the drive, but who cares, I've never filled the thing anyway.


62 Responses:

  1. @akx says:

    How about a virtualized Linux and a USB enclosure for the target drive(s) if you don't feel like finding a real Linux box?

  2. Roger says:

    There is probably a good chance you can just dd the whole disk from the old to the new. It won't take into account any additional space but other than that should work.

    • jwz says:

      I thought it never worked to dd one raw drive device to another unless the two drives had identical geometry (platters, sectors, etc.), even if the target drive was larger? So I'd have to track down the exact same model drive instead of using one that I have lying around.

      • Wibble says:

        I would think it matters less with larger drives that have to hide the geometry behind LBA maps and whatnot. Perhaps not?

      • Richard Perrin says:

        Have you searched yet for availability of the exact model drive?
        Option 3) "Buy same shitty drive and blindly dd the mf-er" seems like a winner to me.

        • Niczar says:

          Unless Tivo enforces some kind of check on disk SN#, mfg and model, it's actually safer to get a larger disk; if you try getting the exact same model, you run the risk of getting one that's actually slightly different with a slightly smaller available space. It's happened to me before.

      • Ian Young says:

        I've done it, and it's worked. Now, will it work for the Tivo? It'll cost you--what--$70 and half an hour (not counting dd time) to find out.

        Hell, MFSTools has a bootable ISO. Don't tell me you still have only PPC Macs...

        • Andrew Wilcox says:

          Don't be ignorant. He isn't stupid enough to run 10.6 on PPC, see his rant on how badly iCloud sucks (which, it does).

      • Michael Dwyer says:

        I've never had problems dd'ing complete images from one drive to a larger but unrelated drive.

        The geometry only matters at boot time for older BIOSes. At boot time, the odds of you leaving the first cylinder or head are so minuscule that it usually doesn't matter in practice. Once the bootloader starts, everything else is LBA and the disk is just a giant long array of blocks.

        I suppose there might be a performance issue if you switch from a 512 to a 4k block hard drive, but even then, I'd take the risk. I'd bet that it would be worth your time to try it.

        'Course I'm betting with your time, which is kinda cheap to me... Okay, I'd actually bet that it would be worth my time to do it.
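
Dwyer's "giant long array of blocks" is LBA addressing. For anyone who never had to type drive geometry into a BIOS setup screen, here is a minimal Python sketch of the textbook CHS-to-LBA translation. The 16-head, 63-sectors-per-track geometry in the example is the classic BIOS-visible translation, used purely as an illustrative assumption; it is not something the TiVo or any particular drive is known to report.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Translate a legacy cylinder/head/sector address into a linear block address.

    By convention cylinders and heads count from 0, but sectors count from 1.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range for this geometry")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range for this geometry")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


if __name__ == "__main__":
    # Assumed BIOS-style geometry: 16 heads, 63 sectors per track.
    print(chs_to_lba(0, 0, 1, 16, 63))  # 0: the very first sector
    print(chs_to_lba(0, 1, 1, 16, 63))  # 63
    print(chs_to_lba(1, 0, 1, 16, 63))  # 1008: start of the second cylinder
```

Once an address is just that integer, a block-for-block copy doesn't care what geometry produced it, which is the thread's argument for why dd onto a different, larger drive generally works.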

        • Julian says:

          +1.

          So far never had problems doing that on current era hardware. Firmwares have ignored "geometry" for probably more than a decade now. I don't know what the Tivo disklabel / partition map looks like, but I would give it a try.

          As a bonus, you will have a readable copy should your disk fail, and it sounds like it's pretty close now. It's OK to ignore bad sectors and write zeros in your copy instead. Chances are it won't matter much. Then you could even try to copy back to the original drive, and there's a significant chance that it will work again for some time (maybe years). Or break for good.

          My own experience told me that with modern low-cost drives and enclosures that tend to keep too much heat inside, rewriting the whole data once in a while helps.

        • Zygo says:

          OS software used the critical geometry data from hard disk partition tables during the era when a) there were half a dozen drive-specific variables, b) hard disks could not answer questions from the host computer like "what cylinder should I put the disk heads on before shutdown to avoid damaging you?" or "do I need to skip 630 out of every 1008 sectors of address space because you have only three platters?", and c) the PC partition table could adequately contain some or all of the data. PCs in particular had two copies of that data--one in CMOS on the motherboard, and one on the disk--and different software (including BIOSes, OSes, and boot loaders) would use one or the other of the two copies with disastrous results if they disagreed.

          Today sane OSes only bother to read the partition table to learn what areas of the disk they should quietly pretend don't exist and stay the hell away from, and (except when creating a new partition table on a blank disk) never care to know any detail of the underlying disk geometry at all--not even the total size. Drives can know all of the details, and these days those details are non-trivial.

          Modern hard drives penalize the clever. ATA hard drives can be configured to have some areas that are read-only for system recovery images and the like. The "disk size" includes all areas, but there may be 500GB of readable data on the drive, while only 495GB of it is writable. The smart thing for software to do when it sees a drive that is larger than its partition table describes is to ignore the difference. The dumb thing to do is naively try to use the extra space, because there's no telling what will happen to any data that ventures into the last bits of the drive (the write could fail with an error, or the drive could silently ignore the write request, or the drive could remap bad sectors and make some of the last sectors inaccessible, or something even worse can happen).

          Some RAID controllers put a block of array configuration data at the high end of each drive. Those will break if you copy the data onto any different drive size; however, RAID controllers will happily write a new configuration block on a new disk for you and even fill it with a redundant copy of your data, so it's rarely an issue in practice.
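
One reading of Zygo's "630 out of every 1008 sectors" that makes the numbers add up, assuming the common BIOS-visible geometry of 16 heads and 63 sectors per track, and assuming three platters means six recording surfaces (one head per surface):

```latex
\begin{aligned}
\text{BIOS-visible sectors per cylinder} &= 16 \times 63 = 1008 \\
\text{physical sectors per cylinder} &= (2 \times 3) \times 63 = 378 \\
\text{unused addresses per cylinder} &= 1008 - 378 = 630
\end{aligned}
```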

      • tarzxf says:

        There's also dd_rescue that's on most of the MFS tools live discs that can skip over bad sectors and use smaller and smaller sectors to get as much data from the bad areas as possible. I might also have the instantcake ISO for your model lying about.
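
A minimal Python sketch of the "smaller and smaller" idea tarzxf describes: copy in large blocks, and when a block read fails, split it in half and retry until you're down to single sectors, zero-filling (as Julian suggests above) whatever still can't be read. The 512-byte sector size, the 64 KiB starting block, and the FlakyDisk test double are illustrative assumptions. This is not dd_rescue's actual algorithm; the real tools do considerably more (GNU ddrescue, for example, keeps a mapfile so an interrupted run can resume).

```python
import errno
import io
from typing import List

SECTOR = 512  # assumed sector size in bytes


def rescue_copy(src, dst, size: int, block: int = 64 * 1024) -> List[int]:
    """Copy `size` bytes from src to dst (file-like objects with seek/read/write).

    Failed reads are retried in halves down to one sector; sectors that still
    fail are written as zeros. Returns the byte offsets of those sectors.
    """
    bad: List[int] = []

    def copy_range(offset: int, length: int) -> None:
        src.seek(offset)
        try:
            data = src.read(length)
        except OSError:
            if length <= SECTOR:
                bad.append(offset)        # give up on this sector
                data = b"\0" * length     # zero-fill the hole in the copy
            else:
                # Split on a sector boundary and retry each half separately.
                half = max((length // 2) // SECTOR * SECTOR, SECTOR)
                copy_range(offset, half)
                copy_range(offset + half, length - half)
                return
        dst.seek(offset)
        dst.write(data)

    for offset in range(0, size, block):
        copy_range(offset, min(block, size - offset))
    return bad


class FlakyDisk:
    """Test double: an in-memory 'disk' whose listed sectors fail with EIO."""

    def __init__(self, data: bytes, bad_sectors):
        self.data, self.bad, self.pos = data, set(bad_sectors), 0

    def seek(self, offset: int) -> None:
        self.pos = offset

    def read(self, n: int) -> bytes:
        first, last = self.pos // SECTOR, (self.pos + n - 1) // SECTOR
        if any(s in self.bad for s in range(first, last + 1)):
            raise OSError(errno.EIO, "Input/output error")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk


if __name__ == "__main__":
    data = bytes(range(256)) * 64             # 16 KiB = 32 sectors of sample data
    out = io.BytesIO()
    bad = rescue_copy(FlakyDisk(data, {5, 6, 20}), out, len(data), block=4096)
    print([off // SECTOR for off in bad])     # [5, 6, 20]
```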

        • DFB says:

          ddrescue is better than dd_rescue for this case, and it would run on OSX if both drives could be connected to the mac. ddrescue will do things like approach bad sectors from both sides without manual intervention.
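
To make "approach bad sectors from both sides" concrete: after a large read fails, GNU ddrescue's trimming pass (roughly as its manual describes it) reads forward one sector at a time from the leading edge of the failed area until it hits an error, then backward from the trailing edge until it hits another. A minimal Python sketch of just that step, assuming a hypothetical `read_sector(lba)` callback that raises OSError on a bad sector; the untouched middle is left for later passes.

```python
from typing import Callable, Dict, Optional, Tuple


def trim_from_both_sides(
    read_sector: Callable[[int], bytes], first: int, last: int
) -> Tuple[Dict[int, bytes], Optional[Tuple[int, int]]]:
    """Recover sectors from both edges of the failed range [first, last].

    Returns the sectors that were read, plus the (lo, hi) range still
    unresolved, or None if every sector in the range turned out readable.
    """
    recovered: Dict[int, bytes] = {}

    lo = first
    while lo <= last:                 # forward from the leading edge
        try:
            recovered[lo] = read_sector(lo)
        except OSError:
            break
        lo += 1

    hi = last
    while hi > lo:                    # backward from the trailing edge
        try:
            recovered[hi] = read_sector(hi)
        except OSError:
            break
        hi -= 1

    return recovered, ((lo, hi) if lo <= hi else None)


if __name__ == "__main__":
    bad = {3, 7}

    def read_sector(lba: int) -> bytes:
        if lba in bad:
            raise OSError(5, "Input/output error")
        return bytes([lba]) * 512

    got, remaining = trim_from_both_sides(read_sector, 0, 9)
    print(sorted(got), remaining)     # [0, 1, 2, 8, 9] (3, 7)
```

Note that sectors 4 to 6 in the example are perfectly readable but are left inside the unresolved range; picking those off one by one is a separate, later pass.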

        • Tyler Wagner says:

          Seconding (or thirding, etc) that you can dd between heterogeneous devices, and that dd_rescue is better.

          I advise you to do both option 2 and 1 - if your old disk is fucked, you don't want to clone it. Buy an image again, and this time clone the drive to a raw file somewhere. Then next time you can just restore it.
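
Tyler's "clone the drive to a raw file, then restore it next time", as a minimal Python sketch: stream the source into an image file while computing a SHA-256 digest, so a later restore can be checked against it. The function names and the device paths in the comments are hypothetical; on real devices you would also need root and the right device node.

```python
import hashlib


def copy_with_digest(src_path: str, dst_path: str, dst_mode: str = "wb",
                     bs: int = 1 << 20) -> str:
    """Copy src to dst in bs-byte chunks; return the SHA-256 of the bytes copied.

    Use dst_mode="wb" to create or truncate an image file, and "r+b" to
    overwrite an existing target (such as a disk device) in place.
    """
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, dst_mode) as dst:
        while chunk := src.read(bs):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def digest_of(path: str, length: int, bs: int = 1 << 20) -> str:
    """SHA-256 of the first `length` bytes of path (a bigger target has a spare tail)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        remaining = length
        while remaining > 0 and (chunk := f.read(min(bs, remaining))):
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


# Hypothetical usage:
#   saved = copy_with_digest("/dev/sdX", "tivo.img")                # back up once
#   copy_with_digest("tivo.img", "/dev/sdY", dst_mode="r+b")        # restore later
#   digest_of("/dev/sdY", os.path.getsize("tivo.img")) == saved     # verify
```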

      • Andrew Wilcox says:

        For what it's worth, on a Mac you'd have to dd the /dev/rdiskX, not /dev/diskX. The r stands for raw or something and if you don't, it mucks everything up horribly. But yes, you can dd /dev/rdiskX to any other /dev/rdiskX >= the original size; the only thing you'll lose is the extra space if your target disk is bigger. I've done this with internal, USB, and FireWire, so I'm pretty confident this will work.
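
Andrew's rule (target size at least the source size; extra space simply goes unused) as a minimal Python pre-flight check. It finds sizes by seeking to the end of each path, which works for regular files and for block devices on Linux; it is only an assumption that the same trick reports the right size for macOS /dev/rdisk nodes, so check those with `diskutil info` instead. The function names are hypothetical.

```python
import os


def byte_size(path: str) -> int:
    """Size in bytes of a file or block device, found by seeking to its end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def spare_bytes_after_clone(src: str, dst: str) -> int:
    """Raise if dst can't hold a raw copy of src; otherwise return the unused tail."""
    need, have = byte_size(src), byte_size(dst)
    if have < need:
        raise ValueError(f"target is {need - have} bytes too small for a raw copy")
    return have - need
```

For jwz's update above, a 750 GB source on a 1 TB target leaves a 250 GB unused tail, which is the "1/4th of the drive" he mentions.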

      • Adolf Osborne says:

        I've had excellent luck dd'ing a smaller drive to a larger drive and making it behave exactly like the smaller drive did. Especially with Linux (which the Tivo allegedly runs): The kernel simply trusts whatever the partition table says, even if it's wrong, and even if the partitions themselves don't start/end on a cylinder boundary like they're "supposed" to. (fdisk throws a fit when it sees it, but nothing about running dd implies also running fdisk.)

        The greater question, perhaps, is whether or not your existing drive is in sufficiently-fit form to survive a session of dd. And to that end, I think if it were me I'd dd the thing immediately and look for options afterward, even if all I had for Important Data were some Season Passes...

      • Niczar says:

        I'm surprised you of all people would say this. I've done it a dozen times in the last 10 years, so I know for a fact that it works in general. Just copy the bits from one disk to a larger one (cat is much faster than dd with default values, since dd copies one 512-byte chunk at a time), adjust the partition table, and then extend the volume manager and filesystem. And it's supposed to work anyway, since disk geometry has been bullshit for something like 15 years; it's just there for backward compat in ATA et al.

        • Adolf Osborne says:

          I think that the bs= option in dd is what you're missing when you think that cat is faster.

          When I toyed with it somewhat extensively several years ago, I found that bs=8192 worked the best (fastest) on the particular hardware I had in front of me, and was far faster than bs=512 (which is how you claim cat operates). YMMV, of course.
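
The bs= point in miniature: the number of reads is the total size divided by the block size, rounded up, so going from 512-byte to 8192-byte blocks cuts the read count by a factor of 16. A minimal Python sketch that counts them; in-memory buffers stand in for devices here, and against a real disk you'd open it with `buffering=0` so each read is a single system call. Which block size is actually fastest still depends on the hardware, as Osborne says.

```python
import io


def copy_in_blocks(src, dst, bs: int) -> int:
    """Copy src to dst using bs-byte reads; return the number of non-empty reads."""
    reads = 0
    while chunk := src.read(bs):
        dst.write(chunk)
        reads += 1
    return reads


if __name__ == "__main__":
    payload = bytes(1 << 20)                   # 1 MiB of zeros
    for bs in (512, 8192):
        n = copy_in_blocks(io.BytesIO(payload), io.BytesIO(), bs)
        print(f"bs={bs}: {n} reads")           # 2048, then 128
```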

  3. David says:

    If you dd the raw block device (and buy an identically-enough sized drive) why would it matter if the Mac can read the partition table or not? Or is there some magic Tivo juju that needs to happen for a different drive to work?

  4. Option #2 is the right decision. I ran through this whole rigamarole back in 2002 with my original Philips SD TiVo, and while I eventually made it all work it is seriously Not Worth Your Time. (Pop quiz: what is the most recent version of ubuntu or fedora that will boot on a PC old enough to have ATA33 connectors on the mobo but also has MFS installed by default so you don't have to shed blood recompiling an ancient version of the linux kernel? Answer: spend the $300, dude.)

    • Julian says:

      By the way, don't even think about using any kind of USB to PATA or SATA adapter for data recovery. The ones that can handle read errors (in terms of how the silicon is etched) are extremely rare, and anyway AFAIK even if the chip designers got it right the kernel you will use (mac, linux, bsd whatever) will fail.

      • Adolf Osborne says:

        Whatever you say, Doc. My own anecdotes about USB adapters play out completely differently from your own, with a whole lot of not-fail when a read error happens.

        (Windows tends to vomit on read errors over USB, but then Windows tends to behave irrationally on any other read error as well. [Whatever] seems to behave pretty well.)

        • Zygo says:

          (not really) Surprisingly, different USB to [PS]ATA adapters behave differently, and the firmware that tells the etched silicon what to do with all those annoying USB block read requests coming from the host is the root cause of a lot of mysterious behavioral deviation. Debugging and reloading the firmware in the adapter is going to cost more than $300 in specialized tools alone, so let's ignore that option for now.

          While the $80 USB adapter you bought in 2008 is probably garbage, a $30 adapter in the local retail store this afternoon is probably just fine. Old ones are awful (locking up not just on error cases, but also on every billionth successful I/O). New ones support proper ATA-level pass-through (as opposed to supporting only the minimal USB storage interface), so you can examine the drive's SMART monitoring data and find out whether your drive just needs its data reloaded, or is about to fail completely and permanently.

          To answer the earlier question: The latest version of Ubuntu or Fedora will talk to such a device, and so will year-old and two-year-old versions, and lots of live CDs.

          With a known bad drive, you can find out which category your USB bridges fall into.

          • Adolf Osborne says:

            What about the $14 adapter that I bought in 2006? Because that's the sort of trash that I've had good success with.

            I've never poked around with SMART using such an adapter because by the time I'm in data recovery mode I'm already convinced that the drive is fucked. Obviously if I bothered with asking SMART about it by that point, it would at best produce redundant information. (It's my opinion that any drive that starts losing bits and generating errors is destined to be binned ASAP.)

            But I might give it a shot, just to see.

            • Zygo says:

              Congratulations on winning the old-USB-ATA-bridge lottery!

              I wouldn't be so fast to bin a hard drive with just one bad sector. Consumer drives need to be sold with bright red labels that say "WARNING: this drive can be expected to lose 0.0000000001% of your data in normal operation." That's what the specs allow, if anyone bothers to download and read them.

              Drive firmware will transparently replace the capacity (but not the contents) when new data is written at an LBA that went bad. When this happens, if you're listening carefully, you'll hear an extra seek whenever you access sectors sequentially near the error. Few people notice this in practice--the lost sector is in an area of the disk that nothing ever wants to read, or it gets written over before anything tries to read it, or the operating system trashes far more of the user's data and presents them with more error messages that they can't understand than their drive ever will.

              Disks with minor errors on them can be recovered in situ, without removing them from the machines they're installed in. If you have a RAID mirror of the disk, you (or your RAID controller) simply replace the lost data from the mirror disk. If you have backups, you restore data lost in bad sectors from those. If you have neither, you just learn to live without your data.

              I run full surface scans daily over my disks, and get email when errors are detected or corrected. In my experience it's common for a drive to get a few dozen bad sectors in the first two years, then nothing for the next 5 years, then the drive dies (or gets taken offline just because it's 7 years old). "No errors at all" and "total instantaneous failure" are common cases too, but they require different responses.

              When there are high-hundreds/low-thousands of bad sectors on a disk, or if the disk is hanging instead of returning neat and tidy error codes, it's probably time to replace it. Something in the drive destroying hundreds of sectors at a time is very wrong, but one or two is totally within spec.

              • Adolf Osborne says:

                You know, I'm old enough to remember keying in the list of known-bad sectors into a debug script under MS-DOS from a label printed by the factory and stuck right to the top of an MFM drive. I'm totally aware that hard drives are imperfect.

                However, it has been my understanding, for decades, that IDE (and by extension SATA, plus any modern SCSI implementation and...and...) hard drives map away bad sectors in a manner which is never visible to the operating system under any normal circumstances. In order to do this, a certain amount of space is reserved on the disk.

                Any bad sectors which are visible to the OS (with, eg, badblocks) are an indication that this reserved space has been filled and that the disk is no longer capable of remapping sectors on its own unless Clever, Low-Level Tricks are performed to start things over (and lose an insignificant amount of capacity in the process).

                Which, you know, is just hiding a problem which has demonstrably grown beyond what the manufacturer expected.

                I'm sure that things look very different from the standpoint of a well-managed storage system that is frequently inspected with SMART. But in this context of single-drive end-user PCs and TiVos, any error which is repeatable and reported without special care and feeding is an indication of a drive that is attempting to die.

                I can't make my customers use RAID. And I can't make them perform backups. But I can help them get away from hardware which is approaching death, which a modern hard drive with visible bad sectors plainly is. (And no, performing Clever Low-Level Tricks to "fix" the problem would not be doing them any favors.)

                • Zygo says:

                  You have some details wrong. TL;DR: Write errors are very bad, read errors are unfortunate but do not (within reason, in and of themselves) indicate any sort of problem with the drive.

                  Modern drives add ECC bits (the bit densities are so high now that a zero bit error rate is infeasible) to the data, and retry reads that fail internally. Expensive drives have more ECC bits, and disks designed for RAID and NAS vary the number of retries and push some of the ECC and defect management up into the host storage stack, but they're basically the same.

                  When an error is too big for ECC to handle, you get an uncorrectable read error (or UNC sector). Most of those are discovered during the manufacturing process, and listed in the drive's defect table so they are not visible to a host OS. Those that are discovered later during write operations are added to the drive's grown defect table. SMART reports on the size of that table and the number of times it has been extended, and the drive may or may not let you see the contents of the table with some obscure SCSI commands.

                  Uncorrectable sectors that are discovered during read operations are visible to the OS--they must be, because data has not been successfully read from sectors on the disk, and if the drive said it had done so then the data would appear to the OS to be corrupted. This is why healthy drives sometimes report medium errors during reads.

                  Manufacturers expect hundreds of UNC sectors during the service life of a modern hard disk. The tables have room for thousands, and many drives never use even one, to give you an idea of what the expected range is (there is usually room on modern disks for millions of remapped sectors, but even low-end consumers notice gigabytes of data going missing, so it's artificially capped at a number like 1023 or (size of a small integer number of sectors) / (size of remapped sector data structure)).

                  When an error is detected during a write operation (disk heads have separate read and write elements), the drive firmware will remap the offending sector and retry the write transparently. There is no need to inform the OS of this as it does not in and of itself affect data integrity. If remapping is not possible (e.g. the UNC sector table is full, or the head is broken, or the electronics are failing, or the bad-sector-remapping-table sector is bad) then the drive will send the OS a write error. This is why a drive that reports a medium error on a write is not healthy.

                  I've never seen a disk that worked properly after its very first write error. Such disks invariably have SMART data logs showing thousands of remapped sectors and sometimes other error flags.

                  If you see a disk with a thousand or more bad sectors using a read-only test like badblocks, then it's dead--if you tried to recover the bad sectors you'd just fill up the grown defect table and find yourself in the "when writes fail" case above.

                  If you have a disk that is hanging, corrupting data, failing to spin up, or otherwise operating completely out of spec, then read or write tests don't matter.

                  Overwriting a disk with simple UNC sector errors (e.g. with 'badblocks -w' or using dd to copy the data from somewhere else) works. It's the way the error recovery strategy of low-end hard disks is designed to work.

                  If you return a drive with a small (low hundreds) number of bad sectors for warranty replacement, and it doesn't have electrical or mechanical problems, the manufacturer will (grossly oversimplified) just blast all the sectors with zeros, reset the user-visible serial number and UNC sector counts, and ship it to the next customer because there's nothing wrong with the drive.
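
Zygo's read-versus-write distinction as a toy Python model. This is not how any real drive firmware is written: the class and method names are hypothetical, the 1023-entry spare cap is the example number from the comment, and ECC, read retries, and the factory defect list are left out entirely. What it does show is that a sector that goes bad loses its contents, so reads of it fail visibly; writing to it again remaps it silently and it reads fine afterwards; and only when the spares run out does a write fail.

```python
class MediumError(OSError):
    """Stands in for the medium error a drive reports to the host."""


class ToyDrive:
    SECTOR = 512

    def __init__(self, spare_slots: int = 1023):
        self.spare_slots = spare_slots   # cap on the grown defect table
        self.remapped = 0                # spares used so far
        self.damaged = set()             # LBAs whose current physical sector has failed
        self.unreadable = set()          # LBAs whose contents were lost (UNC)
        self.data = {}

    def damage(self, lba: int) -> None:
        """Simulate a sector going bad: its physical spot fails and its data is lost."""
        self.damaged.add(lba)
        self.unreadable.add(lba)

    def read(self, lba: int) -> bytes:
        if lba in self.unreadable:
            # The drive cannot return good data, so it has to tell the host.
            raise MediumError(f"unrecovered read error at LBA {lba}")
        return self.data.get(lba, bytes(self.SECTOR))

    def write(self, lba: int, payload: bytes) -> None:
        if lba in self.damaged:
            if self.remapped >= self.spare_slots:
                raise MediumError(f"write error at LBA {lba}: no spare sectors left")
            self.remapped += 1           # point the LBA at a fresh spare; host isn't told
            self.damaged.discard(lba)
        self.unreadable.discard(lba)
        self.data[lba] = payload


if __name__ == "__main__":
    drive = ToyDrive(spare_slots=1)
    drive.write(7, b"original")
    drive.damage(7)
    try:
        drive.read(7)
    except MediumError as e:
        print(e)                         # unrecovered read error at LBA 7
    drive.write(7, b"restored copy")     # overwriting "repairs" the sector
    print(drive.read(7), drive.remapped) # b'restored copy' 1
```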

                  • jwz says:

                    Thank you for that. Even though I have absolutely no practical use for the information in your massive data dump, I still love it when that happens. (I am not being sarcastic.) It's like if I was reading some blog and someone made some erroneous statement about CDR-coding and I had to take them to school on it. It always brings me joy to know that someone, somewhere has something truly obscure in their list of Dream Jeopardy Categories.

                  • Alex says:

                    Thanks - that's really useful! I just ran smartctl -t short against my laptop HDD and found a grand total of four read errors and zero write errors in four years of daily service.

    • phuzz says:

      Well, the latest version of Debian runs fine on my NAS box, which is something like 6-7 years old now, and has an ATA interface, so I'm going to guess, the latest version of ubuntu will be fine.

      Unfortunately, now is exactly the wrong time for a harddisk to die, what with most of the fabs in Thailand still being under water. In the UK we're seeing prices about three times higher than they were a few months ago.

  5. As an aside, I honestly believe that the "deals" that DTV, Comcast and various other cable companies did with TiVo were deliberately designed as trojan horses: announce that native tivo support is coming "any day now" so that their customers don't get itchy feet, and then drag their heels on the implementation forever so that tivo inc eventually bleeds out while waiting for the licensing revenue to come in.

    • Travis Roy says:

      The Comcast "TiVo" was just a java version of the TiVo frontend that runs on Comcast supplied cable boxes. It gives you a few features not on the normal DVR, like the 15min tick marks, and suggestions. But they let it lag behind the normal DVR for features (network scheduling sucks for example). That version is going to be discontinued and a discount on a TiVo Premier will be offered for those that want to continue with the TiVo experience.

      Of course, on cable, you could get a CableCard TiVo for years now.

  6. David says:

    If it is available for your model Tivo, you might try "InstantCake", which is essentially a bootable CD that contains the Tivo image. You boot it on some handy x86 machine (I've actually done it with VMware, but that took manual poking around), and it writes the right bits to a new drive that you provide. It's less hassle than #1, and somewhat less offensive than #2.

    Of course, you buy InstantCake from the same people who ripped you off, but on the brighter side, it's only $40 or so.

  7. Mike Hoye says:

    So, two things. The first is that in my limited experience that bad CRC thing is more likely a bad-cable problem than a bad-drive problem; at least it's cheap to check.

    The second is that ddrescue is the tool you really want for this, not dd. Works fine on a Mac, and if you do the whole drive over to an image, and then back to another same-sized drive, neither it nor dd should give any fucks whatsoever about partitions. It's available painlessly through macports. I would start by pulling the drive, getting one of those USB-to-(p/s)ATA adapters and taking the image with ddrescue before blowing the image onto a similar drive with dd.

    In some instances I've let ddrescue run for days, and been able to get usable images out of far-side-of-failed drives; the drives haven't needed to be physically identical provided the target is same-size or bigger. It's a pretty great piece of software.

  8. MZ MegaZone says:

    MFSTools is ancient - no need to deal with it. Check out MFSLive and WinMFS: http://www.mfslive.org/

    Also, TiVo had their quarterly financial call yesterday and finally announced dates for the new DirecTiVo THR22. DirecTV has said they'll ship it for select markets in early December (2011, yes), and will release it wide in early 2012.

  9. Option 2. Think of it as paying yourself to not stab your eyes out trying Option 1.

    In a related note, there are ways to make the "just torrent everything" experience less shitty, but I doubt you have an XBox 360. Vuze combined with EZTV RSS feeds is a nice setup, when the site isn't shitting the bed due to DDoS or other various downtimes. Vuze will also stream to a number of devices, but it doesn't have to transcode for the XBox 360, which saves a shit ton of time. Yes, you can measure time in tons.

  10. Ron says:

    'Incidentally, recent investigations show that "just torrent everything" is still a shittier experience than the increasingly-shitty experience provided by SD DirecTivo.'

    True, but binary newsgroups may not be. SickBeard + SABnzbd.

    • Andrew says:

      Another recommendation for SickBeard. Setting it up is not painless, but beyond the setup you just about never have to think about it again.

  11. Kevin Murray says:

    Just shove it in a system on the network and Clonezilla the sucker. Should allow you to put it on another disk, even if it is bigger. You should be able to run Clonezilla via a live CD / live USB.

  12. Otto says:

    Forget MFSTools and that pain. Use an MFSLive boot CD or WinMFS to copy the fucker from one to the other. See http://www.mfslive.org/softwareguide.htm . Way simple, no need to learn arcane crap.

    And yes, technically you can just dd copy the sucker, as long as the target is larger. Very first copies of drives were made in this way. You lose that extra space, but geometry doesn't have to match anymore. It's all virtual anyway. The OS really treats it as a big ass byte store nowadays anyway.

  13. Russell Borogove says:

    Out of the box question: what is Tivo+cable getting you that iTunes Store + patience wouldn't? Personally, I don't mind being a season behind on most TV shows, but that's partly because I don't care about water cooler OMGWTFHAPPENEDLASTEPISODE chat.

  14. gryazi says:

    The one thing no one has asked yet is "So what was the drive that blew out in 8 months?" so we can all collect an anecdotal data point for your trouble.

  15. Julian says:

    Option 3: buy drive, borrow a linux user that knows something about dd_rescue, offer free pass and drinks to DNA lounge if successful.

    Not me. Flight would cost more than $300.

  16. MattyJ says:

    Obligatory non-answer 'Get a Roku' thread starts here.

  17. Option 2b: Order from some different asshats, such as http://www.newreleasesvideo.com/index.php. I've never used them myself, so this is somewhat useless. But surely someone on the LazyWeb knows someone else who does this, other than the original asshats.

  18. JCB says:

    MFSLive has a boot ISO. I've been using it for quite some time, and have never had an issue copying flaky drives with it.

  19. David M.A. says:

    Something about your posts asking the lazyweb for assistance generates way more comments than anything else (save the nymwars posts that get picked up by the tech blogs). Why is that? Is everyone hoping for bragging rights? "I helped jwz with a problem once, you should totally sleep with me!"

    I wonder how many bars in Silicon Valley that'd work as a pickup line in.

    • jwz says:

      Note that I didn't actually ask a question here!

    • Adolf Osborne says:

      I "helped" JWZ with a couple of problems once, a long time ago, and he swore off Linux forever shortly thereafter citing some arcane difficulty with the very sound card I'd recommended...

      And so, let it be known: "I made JWZ hate Linux" does not work as a pickup line.

    • Otto says:

      Damnit. Helping jwz with problems has never gotten me laid. :(

  20. I've had *really* good experiences with weaKnees (http://www.weaknees.com/). They sent me a new large drive, I dropped it into my gen-1 DirecTivo, and it's worked for several years without issues.

    • Adolf Osborne says:

      Yeah, but the problem here is that of warranty, and integrity. Both weaknees and dvrupgrade.com simply buy plain-pack OEM drives that include a manufacturer's warranty, install some Tivo MFS magic, and sell them to their own customers. Drives sourced from either vendor are equally likely to fail after a given time.

      The problem is that this particular drive failed early, by random chance, and dvrupgrade doesn't care. It's not dvrupgrade's fault that this happened, but it most certainly is their fault that they'd rather piss off a customer than simply perform the MFS magic one more time on a customer-supplied replacement drive.

      (Or, better: Just handle the manufacturer warranty swap themselves and ship out a new/refurb unit post-haste. Certainly, the dollars this would cost have far less value than the bad press that is a JWZ dissertation that names names.)

  21. basketofcat says:

    What was the extent of your investigations about torrenting these days? I take it you looked at flexget and thetvdb.org?

  22. Stephen Thorne says:

    Silly question. Why not buy another harddrive the same, and take the tivo guys up on 'Option 1'. You can mail them a drive *today* and hold them to their promise to copy all the data over.

    I know it means you have to re-enter all your season passes, but at least that way if option 1 doesn't work out, you're not up shit creek.