How bad an idea would it be to re-encode the audio on all of my videos?

Something like:

# Extract the audio track from the video as a 16 bit 44.1kHz WAV.
ffmpeg -i OLD.mov -vn -acodec pcm_s16le -ar 44100 -ac 2 TMP.wav

# Use lame to determine the volume adjustment for that WAV:
lame -f TMP.wav /dev/null 2>&1 | grep ReplayGain

# Convert dB to a ratio: r=10^(db/20)
# Convert ratio to ffmpeg's -vol arg: r*256

# Use ffmpeg to re-encode the audio at the new volume, while leaving video alone:
ffmpeg -i OLD.mov -vcodec copy -vol VOL -acodec libfaac -ab 160k -map_meta_data 0:0 NEW.mov
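The two conversion steps above (dB to ratio, then ratio to "-vol") can be folded into one small shell function. This is a minimal sketch. It assumes lame prints a line of the form "ReplayGain: +6.0dB" (on stderr, hence the "2>&1"), and that ffmpeg's "-vol" treats 256 as unity gain, as the comments above describe. The function name replaygain_to_vol is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: turn lame's ReplayGain report into a value for ffmpeg's -vol flag.
# Assumes lame prints a line like "ReplayGain: +6.0dB" (hypothetical helper name).
replaygain_to_vol() {
  awk '/ReplayGain:/ {
    db = $2                  # e.g. "+6.0dB"
    sub(/dB$/, "", db)       # drop the unit -> "+6.0"
    sub(/^[+]/, "", db)      # drop a leading plus sign -> "6.0"
    # r = 10^(dB/20); -vol is r * 256 (256 = unchanged), rounded to nearest
    printf "%d\n", 10 ^ (db / 20) * 256 + 0.5
    exit
  }'
}

# Example usage (needs lame and an already-extracted TMP.wav):
# vol=$(lame -f TMP.wav /dev/null 2>&1 | replaygain_to_vol)
# ffmpeg -i OLD.mov -vcodec copy -vol "$vol" -acodec libfaac -ab 160k -map_meta_data 0:0 NEW.mov
```

For example, +6.0 dB gives 10^(6/20) * 256, or about 511, and -6.0 dB gives about 128. The ffmpeg flags in the usage comment are the ones from the post. Newer ffmpeg builds have dropped "libfaac" and deprecated "-vol" in favor of the "volume" audio filter, so treat that line as specific to the ffmpeg of the time.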

My worry, of course, is that doing this without actually listening to each file afterward might be a crazy thing to do, since it's a lossy process, and the files are huge, so I'm not going to want to keep both copies around for long. And also because I don't trust ffmpeg to not just go off and do something completely insane every now and then.

I'd be really, really worried about losing A/V sync on some files and not noticing it until later. Is this a way in which ffmpeg is likely to fuck up?

I am only even considering this, of course, since Apple broke the "Volume Adjustment" option on video files in iTunes 10, and I have no reason to expect that they'll ever fix that bug.


Update: Two things:

  1. Apple fixed the bug! They obey Volume Adjustment again as of iTunes 10.5 beta!
  2. That doesn't help with videos that require more than +6dB, though, so I wrote this: video-replaygain.pl.


25 Responses:

  1. Can't you create a QT movie that *references* the original, but applies a volume transform to it?

  2. Ewen McNeill says:

    The main risk that would worry me about that process is how reliably audio/video synchronisation would be maintained during the re-encode. My experience in the past has been "if you're lucky it'll still be in sync". Possibly the chances are better if you're re-encoding to the same audio codec and bitrate as was in the file before. (If there were some metadata that you could set to indicate gain on playback, rather than re-encoding anything, it'd be a lot safer. But that might boil down to the problem you first started with.)

    Ewen

    • jwz says:

      There is volume-adjustment metadata that can be placed in a Quicktime file.
      There is volume-adjustment metadata that can be set in iTunes directly.
      iTunes 10 ignores them both.
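Since there is no metadata shortcut, one cheap way to catch the worst sync failures without listening to every file is to check that the re-encode did not change the length of the audio track. This is a crude heuristic, not proof of sync. A constant offset can pass it, and a few tens of milliseconds of difference from encoder padding can be normal, hence the tolerance. The sketch assumes a reasonably recent ffprobe (the "-select_streams", "-show_entries" and "-of" options). The function names are hypothetical, and FFPROBE can be pointed at a stub for testing.

```bash
#!/usr/bin/env bash
# Crude check that re-encoding did not change the audio track's length.
# Assumes ffprobe is on PATH; set FFPROBE to use another binary or a stub.

# Print the duration in seconds of the first audio stream of a file.
# (ffprobe may print N/A for some containers; this sketch does not handle that.)
audio_duration() {
  "${FFPROBE:-ffprobe}" -v error -select_streams a:0 \
    -show_entries stream=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Succeed if two durations differ by at most $3 seconds (default 0.1).
durations_close() {
  awk -v a="$1" -v b="$2" -v tol="${3:-0.1}" 'BEGIN {
    d = a - b
    if (d < 0) d = -d
    if (d <= tol) exit 0
    exit 1
  }'
}

# Compare original ($1) and re-encoded ($2) audio lengths; non-zero on mismatch.
check_audio_length() {
  local old new
  old=$(audio_duration "$1") || return 2
  new=$(audio_duration "$2") || return 2
  if durations_close "$old" "$new" "${3:-0.1}"; then
    echo "ok: $1 ($old s) vs $2 ($new s)"
  else
    echo "MISMATCH: $1 ($old s) vs $2 ($new s)" >&2
    return 1
  fi
}
```

Something like "check_audio_length OLD.mov NEW.mov" after each re-encode would at least flag files that need a listen before the original is deleted.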

  3. I have frequently had synchronization issues with ffmpeg.

    I think the better solution would be to stick a compressor into the audio chain somehow. The simplest thing would be to just literally get a cheap hardware compressor (I'd guess this would set you back like $100), or perhaps there's an open-source standalone compressor that'll do this.

    • Jake says:

      A compressor would work to a point. The more compression you apply, though, the more the signal is "squashed" and degraded. You can get away with it to a point, but I think simply normalizing the volume is preferable in this case.

      • jwz says:

        We're using a software compressor via Audio Hijack Pro, and we've been tweaking the settings on it for months, but it still tends to breathe and sound crappy a lot of the time.

        • Adolf Osborne says:

          Compressors do that. It's what they know how to do.

          Even fancy broadcast compressors, with the ability to look ahead in time (by minutes, seconds, or hours) to see what's coming and adjust things slowly and accordingly, still suck.

          It seems to me that the most elegant solution would be to get out the pitchforks and torches and spend some quality time in front of Apple until they fix the shit that they broke.

  4. Is there any way you could Mechanical-Turk the QA process? Like, make humans do a sync-check captcha before you let them post comments to your blog, er, enter the DNA Lounge?

  5. Danny Dulai says:

    It's rare you ask for a subjective opinion...

    I assume you want to keep these videos around for a long time, if not as long as possible. I think that losing the quality would be a pretty bad thing for long-term storage. If you agree, then your only options are to use a different player or to keep the two audio streams around.

    Let's go with the two streams, because you seem to want to use iTunes.

    Ideally you would extract the original audio content in whatever the original format is, not decoded to wav. Process the original->wav->replaygain->aac. Then encode both audio streams (the original format & the replaygain'd aac) into the video file. Think of the second stream like a second language audio track.

    Even for a couple of hours, an extra aac audio track should be smallish for modern storage, and years down the road, you could kill the extra audio track if you really didn't want the clutter.

    • Brian Enigma says:

      +1 to the above comment. Most containers will hold multiple audio tracks and the size of audio is almost negligible compared to the size of video. iTunes will (presumably) only play the first. You can use ffmpeg to extract the audio, which you can normalize to your heart's content. Then use ffmpeg, two "-i" flags, three "-map" flags, and a "-newaudio" flag to map video->video ("-vcodec copy"), the external (normalized audio)->audio1 and the video file's audio->audio2 (with "-acodec copy"). This will result in a file that plays, by default, the normalized audio but a backup of the original audio is there as an extra track if you later run across a problem with the normalization and need to re-encode.

      Alternately, you can just extract the original audio (as-is or converted to something lossless like wav) and keep that alongside the file instead of trying to remux it back in as a second audio track.
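Brian's recipe uses ffmpeg's old "-newaudio" flag. In current ffmpeg the same remux is expressed with one "-map" per stream. This is a sketch, assuming the normalized audio has already been encoded to something the container accepts (e.g. AAC in an .m4a), and with a hypothetical function name. The video comes from the original, the normalized audio becomes the first audio track, and the original audio is kept as the second. "-c copy" means nothing is re-encoded in this step.

```bash
#!/usr/bin/env bash
# Keep both audio tracks: normalized audio first, original audio second.
# $1 = original movie, $2 = normalized audio (already encoded, e.g. AAC .m4a),
# $3 = output file. Set FFMPEG=echo for a dry run that prints the arguments.
add_normalized_track() {
  # -map 0:v         video stream(s) from the original movie
  # -map 1:a         normalized audio -> first audio track
  # -map 0:a         original audio   -> second audio track (the backup)
  # -map_metadata 0  keep the original movie's global metadata
  # -c copy          copy every stream as-is; nothing is re-encoded here
  "${FFMPEG:-ffmpeg}" -i "$1" -i "$2" \
    -map 0:v -map 1:a -map 0:a \
    -map_metadata 0 \
    -c copy \
    "$3"
}
```

Running "FFMPEG=echo add_normalized_track OLD.mov NORM.m4a NEW.mov" prints the command line instead of running it, which is a cheap way to eyeball a batch before committing to it.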

  6. James Corey says:

    I know how much you enjoy hearing obvious assertions that don't answer your question, so here we go. I've had good luck with batch audio transcoding, though I agree A/V sync is the main worry. If there's anything you consider precious, it seems reasonably cheap to archive the original compressed audio stream.

  7. subOctave says:

    I tried changing "volume" in the "movie properties" under QuickTime 7 for a video and QuickTime would then save and respect the setting but indeed iTunes completely ignored the metadata.

    ...but on the latest iTunes 10.5 beta (released yesterday) I could get the "Volume Adjustment" offset located in "Get Info" to work at least for the content I was testing with... My guess is that internally they want to store such data in individual playlists since many playlists can access one piece of media.

  8. Sean Graham says:

    I tested this in the iTunes 10.5b3 that is being used for iOS 5 testing and created this video

    http://dl.dropbox.com/u/22906/iTunes%20Video%20Volume%20Bug.mov

    This certainly violates some NDA I'm sure I'm bound by.

  9. Jeff says:

    I'm going to echo the synchronization doomsaying. About a year ago I had cause to attempt re-encoding the audio on a few dozen videos I had. I found no set of tools that would allow me to automate this process without the audio and video going into separate timezones.

  10. jone says:

    It's possible to losslessly adjust the volume of MP3 and AAC tracks with AACGain but getting the adjusted audio back to the video file is still the hard part if the file format is not supported by the tool.

  11. Sean B says:

    Normal ReplayGain relies on the fact that the signal chain is something like "decoded_audio_signal * program_volume_control * master_volume_control" and then it stuffs the gain into, say, "program_volume_control" so it's computing "decoded_audio_signal * (replay_gain * program_volume_control') * master_volume_control". And you want to bake together decoded_audio_signal and replay_gain so that it's the equivalent "(original_audio_signal * replay_gain) * program_volume_control * master_volume_control".

    The above math is valid on reals, but the audio signal in most file formats is clamped to a [-1,1] range, so the question is whether the "replay_gain" value can be greater than 1, and by how much--is it possible for the (original_audio_signal * replay_gain) to also exceed one? Because if so it can't be encoded (it'll clip). I'd have to research what exactly ReplayGain does to know if it's possible.

    You might think 'hey, it can't clip, they're computing the amount to boost the signal to bring it to the same volume', but it's not the same. If it were a normalization gain, then yes, but it's a gain to make audio "sound" the same level, so it's not directly related to the waveform levels. It's a power measure, with an EQ thrown in that approximates the frequency response of human hearing, and such, which means the amount to boost could easily cause clipping (unless they designed the whole thing to avoid this problem by e.g. making ReplayGain turn down the loud songs and never boost the quiet ones, but given the name I'm doubtful).

    • jwz says:

      The man page for lame says that replaygain tries to make the average volume of the track be 89dB, giving you a positive or negative number to get there. It does seem to do some psychoacoustic stuff.

      But, it does look like the new version of iTunes will have this bug fixed, so that's good news.

      (Of course, the way I computed the Volume Adjustment setting for iTunes was by having lame compute the replaygain on these same files, so maybe that's all the same in the end.)
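Sean's clipping question reduces to simple arithmetic. With samples scaled to [-1, 1], a boost of g dB multiplies every sample by 10^(g/20), so a file whose peak absolute sample is p can take at most -20 * log10(p) dB before it clips. This is a sketch, assuming you have measured p some other way (it is a hypothetical input here; the lame command in the post only reports the gain), and the function names are made up.

```bash
#!/usr/bin/env bash
# Headroom arithmetic for a signal scaled to [-1, 1].
# $1 = peak absolute sample value p (measured elsewhere; hypothetical input).

# Largest boost in dB that keeps the peak at or below full scale:
# -20 * log10(p). The "+ 0" turns -0.00 into 0.00 when p is exactly 1.
max_gain_db() {
  awk -v p="$1" 'BEGIN { printf "%.2f\n", -20 * log(p) / log(10) + 0 }'
}

# Exit 0 if boosting a signal with peak $1 by $2 dB would clip.
would_clip() {
  awk -v p="$1" -v db="$2" 'BEGIN {
    if (p * 10 ^ (db / 20) > 1) exit 0
    exit 1
  }'
}
```

For example, a peak at half of full scale (p = 0.5) leaves about 6.02 dB of headroom, so a +7 dB ReplayGain suggestion would clip it and +5 dB would not. Lossy re-encoding can also push peaks slightly past the original ones, so leaving a little extra margin seems prudent.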

  12. Otto says:

    Instead of reencoding, you may just be able to apply a direct transformation. Check out the aacgain code: http://altosdesign.com/aacgain/

    • jwz says:

      Promising, but as far as I can tell, whatever aacgain does is ignored by all of iTunes, Quicktime Player 7, and Quicktime Player X.

      • Otto says:

        I haven't used aacgain, but I did use mp3gain back in the day. The mp3gain program actually modified the MP3s directly. It wasn't attaching some kind of flags or anything (though it could do that too); it directly modified the audio data. Therefore, player support was not required.

        In theory, an aacgained audio track would be modified similarly, therefore it wouldn't make any difference what support iTunes or Quicktime had. The audio data itself is modified without re-encoding it, making it louder/quieter.

        So you could pull out the audio, apply aacgain to it, then reattach the audio, and iTunes would have no choice but to use the now-changed audio data. Because you're not re-encoding, there's no loss involved. Sync should be less of a problem as well, since the length and track mark points are not changing either.

        • jwz says:

          aacgain succeeds in modifying the audio data inside a .mov directly, without needing to detach/re-attach. It's doing something to it. It's just not having an audible effect, even if I crank it up by 10dB.

          Also, it turns out that "detach then re-attach the audio" is not as simple as it sounds. I find that ffmpeg, when you tell it "-vcodec copy", sometimes fucks up the aspect ratio or gives you an unplayable file.
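Otto's description matches how mp3gain works. Rather than decoding, it rewrites the global_gain field in each frame, and that field moves in 1.5 dB steps. aacgain applies the same trick to AAC. One consequence is that any requested adjustment gets rounded to a multiple of 1.5 dB. The sketch below just does that rounding and does not touch any file. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Round a requested gain change to whole 1.5 dB global_gain steps, as
# mp3gain/aacgain-style tools must. Prints "<steps> <effective dB>".
gain_steps() {
  awk -v db="$1" 'BEGIN {
    s = db / 1.5
    # round half away from zero (awk int() truncates toward zero)
    s = (s < 0) ? int(s - 0.5) : int(s + 0.5)
    printf "%d %.1f\n", s, s * 1.5
  }'
}
```

For instance, "gain_steps 10" prints "7 10.5": a +10 dB request would actually be applied as seven steps, or +10.5 dB.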

  13. kohi says:

    Maybe you could leave the .mov untouched and just create a new audio file with the same base name (and maybe an _xy suffix) in its folder. Depending on which player you use, it might be possible to configure it to always prefer a certain "language" xy. But I guess this won't work with iTunes either.