How the Internet Archive is Digitizing LPs to Preserve Generations of Audio

Earlier this year, the Internet Archive began working with the Boston Public Library (BPL) to digitize more than 100,000 audio recordings from their sound collection.

The recordings exist in a variety of historical formats, including wax cylinders, 78 rpms, and LPs. They span musical genres including classical, pop, rock, and jazz, and contain obscure recordings like this album of music for baton twirlers, and this record of radio's all-time greatest bloopers. [...]

Once cataloged, the LPs are then digitized. The Internet Archive partners with Innodata Knowledge Services, an organization focused on machine learning and digital data transformation, to complete the digitization process at their facilities in Cebu, Philippines. An Innodata worker digitizes 12 LPs at a time, setting turntables to play and record by hand, then turning each record over to the next side. Since each LP is digitized in real time, it takes a full 20 minutes to record an average LP side. By operating 12 turntables simultaneously, the team expects to be able to digitize ten LPs per hour.

This is awesome. However, as someone who has put many hours into manually digitizing records using what I think is the very same turntable pictured (they look like Panasonic 1200s to me), this idea that you can just fire-and-forget seems... fantastically optimistic. Perhaps the records in the BPL collection are all of a never-been-played, white-glove level of archival quality, but if these records have ever been owned by a human... no. No, that's just not going to work. It's going to be a wobbly mess of oscillating playback speeds and skips.
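
For what it's worth, the throughput figures quoted above can be sanity-checked with a little arithmetic. This is only a rough sketch: the function name is made up for illustration, and it assumes two sides per LP and turntables that never sit idle, neither of which the article states outright.

```python
def lps_per_hour(turntables: int, minutes_per_side: float, sides_per_lp: int = 2) -> float:
    """Upper bound on LPs digitized per hour if no turntable is ever idle.

    sides_per_lp=2 is an assumption; the article only gives the per-side time.
    """
    sides_per_hour = turntables * (60.0 / minutes_per_side)
    return sides_per_hour / sides_per_lp


# Figures quoted in the article: 12 turntables, 20 minutes per side.
ceiling = lps_per_hour(turntables=12, minutes_per_side=20)
quoted_estimate = 10  # "ten LPs per hour"
utilization = quoted_estimate / ceiling

print(f"theoretical ceiling: {ceiling:.0f} LPs/hour")
print(f"quoted estimate uses {utilization:.0%} of that ceiling")
```

Under those assumptions the ceiling is 18 LPs per hour, so the quoted ten per hour already budgets a bit under half of each hour for flipping, swapping, and whatever else goes wrong.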

Decades ago I read about someone's project to rip vinyl by scanning the disc on a flatbed scanner and then processing the image. Did that ever go anywhere? That would be kind of analogous to the way IA is archiving old floppy disks with flux scans that record a ludicrously high resolution image of the analog waveform rather than the bits comprising the file system.



21 Responses:

  1. pagrus says:

    Did you go to the block party? I thought all the stuff they talked about seemed very relevant to your interests but I am having a hard time finding official announcements about everything.

    They also announced that they are archiving and transcribing many (millions?) of hours of radio which on the face of it might seem kind of eh. I am very excited about this though, partly because I miss some radio shows that I like a lot and it would be cool to be able to roll back and listen to them.

  2. Ben Collver says:

    Y'all probably know better than me that floppies hail from an era when software routinely went outside of the file system.

    • jwz says:

      Yes, that's kind of the point -- flux recording is a way to reliably image "copy protected" disks that were manufactured with weak writes and half tracks and other shenanigans, without having to crack the software.

      I eagerly await the arguments over how to properly archive records with looping music in the play-out groove!

      • k3ninho says:

        > arguments over how to properly archive records with looping music in the play-out groove!

        There's a tracker-like spec for looping a chunk of a waveform in the forthcoming MPEG 5 that's coming with the 5G radio data-telephony thing. I'm not saying it's exploded to touch every part of existence, that's to be decided when the spec is signed off, but the Legislation and Constitutions Working Group are tardy, as are the Quanta, Relativity and Gravity for Unified Cosmology WG, in submitting their portions of 5G.


  3. Derpatron9000 says:

    A timely reminder to once again make a donation to the Internet Archive, thanks.

  4. Frew says:

    This may be what you were talking about, taking a picture to convert it to sound.

    • Rich says:

      Not the one I remember. I read somewhere about digitizing shellac records (none of your newfangled vinyl) from broken fragments. They'd be trash, otherwise.

      One of the nits Jamie picks at above is the 'preowned' problem. I used to listen to a pompous twit by the name of Frank Wappat on the BBC who had special needles made to match the angle the grooves were cut at, and guess what? They were as clean as a whistle then.

      Cheap needles are what make your recordings of vinyl and shellac sound like shit, not the records themselves.

      • jwz says:

        Apparently you did not store your records unsleeved in a cardboard box full of loose gravel, like the previous owners of literally every used record I ever purchased in my life.

  5. Louis says:

    This girl digitized a tiny vinyl record by processing a digital photo of it:

  6. tfb says:

    I think that naïve calculations show that the 'make an image of a record' trick does not work: there aren't enough bits in a digital image (or were not until very recently) to get anywhere close.

    'Naïve' because ignoring compression is a mistake, but not completely stupid because you probably don't want lossy compression in anything archival, and I'm not sure how well lossless compression can do.
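
    One way to frame a naive calculation of this kind, as a rough sketch only: estimate how finely an image would have to sample along the groove to capture the highest audio frequency. The groove radii, the 20 kHz ceiling, and the function names below are all assumptions for illustration, not figures from this thread, and the estimate ignores how finely the groove's side-to-side displacement would need to be resolved.

```python
import math

RPM = 100 / 3            # 33 1/3 rpm LP
INNER_RADIUS_MM = 60.0   # assumed radius of the innermost groove
OUTER_RADIUS_MM = 146.0  # assumed radius of the outermost groove
MAX_FREQ_HZ = 20_000     # assumed highest audio frequency worth capturing
MM_PER_INCH = 25.4


def groove_speed_mm_s(radius_mm: float, rpm: float = RPM) -> float:
    """Linear speed of the groove under the stylus at a given radius."""
    return 2 * math.pi * radius_mm * rpm / 60


def required_dpi(radius_mm: float, max_freq_hz: float = MAX_FREQ_HZ) -> float:
    """Dots per inch needed along the groove to take two samples per
    cycle (the Nyquist minimum) of max_freq_hz at this radius."""
    sample_spacing_mm = groove_speed_mm_s(radius_mm) / (2 * max_freq_hz)
    return MM_PER_INCH / sample_spacing_mm


# The innermost groove moves slowest, so it sets the resolution requirement.
inner_dpi = required_dpi(INNER_RADIUS_MM)
disc_width_inches = 2 * OUTER_RADIUS_MM / MM_PER_INCH
total_pixels = (disc_width_inches * inner_dpi) ** 2

print(f"resolution needed at inner groove: {inner_dpi:.0f} dpi")
print(f"pixels for a square image of the grooved area: {total_pixels:.2e}")
```

    With these assumed numbers the inner groove needs roughly 4,850 dpi, or about three gigapixels for the whole disc, and that covers timing only; resolving the groove's wiggle with useful dynamic range would presumably demand a good deal more.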

  7. Jason Scott says:

    Well, first, let's show the class some of the output:

  8. Nick Lamb says:

    > It's going to be a wobbly mess of oscillating playback speeds and skips.

    Why would playback speeds oscillate? That's entirely at the mercy of the player, which just needs to spin the disk at constant speed. If there's a consistent problem on a player, or across all their players you can just "fix it in post" anyway.

    Skips I'll grant you are likely. But this is all human audio, it had skips in it when lots of real humans listened to it already. This is quite unlike the floppy disk situation where a naive copy just doesn't work as intended in a lot of cases.

    • jwz says:

      Have you never played a warped record? This is a thing that happens when spinning discs are no longer circular, or flat. When you're dealing with a crappy record you have to eyeball the strobe nubbins and tweak the speed as it's playing. It's horrifyingly manual.

      • MattyJ says:

        I may be reading too much into it, but I translated "... an organization focused on machine learning and digital data transformation" to mean that they would be putting some AI on that to take care of that stuff. Otherwise, why go all the way to the Philippines just to record some vinyl?

        Their website is so full of marketing speak it's hard to tell what they actually do, though.

        • k3ninho says:

          > Otherwise, why go all the way to the Philippines just to record some vinyl?

          Cheap labour and space to put out all the Technics 12xx record players alongside the vinyl source recordings.

          The "machine learning and digital data transformation" stuff will be on rented computer time with a mainframe provider such as AWS or Google Compute.


  9. Pronoiac says:

    Checking my bookmarks, I saw Digital Needle in 2013.

  10. Tree Speaker says:

    Or multi-track-on-the-same-side discs, cf. Mad Magazine's It's A Super-Spectacular Day or The Monty Python Matching Tie and Handkerchief.

  11. Andrew Klossner says:

    Here's the E&T article on laser turntables. "Laser turntables are so thorough in their scrutiny of a recorded groove that they will pick up everything the groove contains – including alien deposits that have not properly been cleaned from the groove – not necessarily physical damage such as scratches."

    • jwz says:

      I had a photo negative scanner years ago, and one of the tricks it did was to make several scanning passes in different wavelengths of light: dust and other crud have different optical properties than film, so it used that pass to build a dust map that allowed it to apply noise reduction to the dirty spots. Like a list of bad sectors. I'm surprised they didn't mention the laser turntables using a similar trick.

      • Torkell says:

        A lot of film scanners use an infrared light to do dust/scratch removal - most colour negatives are transparent to IR light.

    • Tim says:

      ...but: "In the case of the ELP device, records must be black; coloured, transparent or translucent records cannot be played, so laser’s not good for bringing new life to your punk picture-disc or New Wave coloured vinyl collection."

      And presumably no novelty-shaped records either. For shame!
