normalizing audio volume on movies

Dear Lazyweb,

How do I normalize the audio volume of a bunch of MOV and MP4 files?

The "Sound Check" option in iTunes works passably well for MP3 files, but doesn't do anything for videos. This makes it annoying to use a playlist full of music videos as a source of ambient entertainment, since the volume fluctuates wildly.

(Only slightly surprisingly, iTunes doesn't use the RVA2 ID3 tag used by normalize, but instead uses a COMM/iTunNORM tag.)

I think that a solution involving manually pulling the audio out of the movie files, normalizing it as a WAV, and re-inserting it into the movie is probably doomed to synchronization errors. So let's not.

34 Responses:

  1. baconmonkey says:

    a painful and tedious route would be manual normalization.
    this can be done either in quicktime via Window > Show Movie Properties > sound track (and then saved to the file)
    or in itunes Get Info > Options > Volume Adjustment.

    I've not seen a tool to automate that, but I've not looked very hard.
    perhaps search

  2. bitwise says:

    Buy a cheap analog compressor and connect it between the computer and speakers? I had an old guitar effects box lying around that I use for this purpose.

    It'd be nice if computers could do this internally, kind of like a master insert on a mixer. Surely there are free compressor plugins out there, but they probably only work inside another piece of software and not directly at the OS level.

    • structurefall says:

      this sounds ridiculous at first, but actually may be the easiest way.

      we've had the normalization/compression conversation before- in all probability the stuff you're dealing with is already "normalized"- that just means that the absolute highest peak of sound during the course of the -entire- recording is as high as possible. what you want is to reduce the dynamic range so that they all average out the same, which is done by making the loud parts softer on really loud recordings, and simultaneously turning up the volume on softer ones.

      a lot of stereo systems used to have compressors built in, back in yon day, but i suspect it doesn't show up much anymore because most people don't know what the hell it does. your receiver's got a lot of crazy stuff on it though, i wouldn't be shocked if it was there.

      i could actually bring a compressor by for you to try out if you want...

      • lnghnds says:

        Normalization across tracks is different than compression. The latter will sacrifice dynamic range within a track whereas normalization would preserve dynamic range.

        Actually, in this case you might want an expander with some gain, no?
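
        To make the distinction concrete, here is a minimal Python sketch (not taken from any real tool; the sample values, threshold and ratio are invented for illustration). Cross-track normalization applies one constant gain per track, so the ratio between a track's loud and quiet passages survives. A compressor changes its gain depending on level, so that ratio shrinks.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def loudness_match(samples, target_rms):
    """Cross-track normalization: one constant gain for the whole track."""
    gain = target_rms / rms(samples)
    return [s * gain for s in samples]

def compress(samples, threshold, ratio):
    """Naive instantaneous compressor: scale down whatever exceeds threshold."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag, s))
    return out

def loud_to_quiet(samples):
    """Level of the second half relative to the first half."""
    half = len(samples) // 2
    return rms(samples[half:]) / rms(samples[:half])

# Hypothetical track: a quiet passage followed by a loud one.
track = [0.05, -0.05] * 50 + [0.8, -0.8] * 50
matched = loudness_match(track, target_rms=0.2)
squashed = compress(track, threshold=0.1, ratio=8.0)

print(loud_to_quiet(track))     # loud passage is about 16x the quiet one
print(loud_to_quiet(matched))   # still about 16x: dynamics preserved
print(loud_to_quiet(squashed))  # about 3.75x: dynamics reduced
```

        A real compressor has attack and release times rather than acting on each sample instantly, but the trade-off is the same.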

        • bitwise says:

          True, but it's possible to set the attack and release time on a compressor so that the difference isn't obvious.

          • adolf says:


            Any amount of reactive compression sufficient to remove the level differences from videos sourced from a variety of places will be certain to also remove all of the dynamics from the music.

            You can set the attack to be several tens of seconds, and it'll still fuck up every crescendo which might be present: Every quiet part will be turned up to be the same as the loud part following it, which isn't at all desirable.

            Yeah, I guess it might not be strictly obvious that this is happening, unless you've heard the track before with dynamics intact. And in that case it will go from "possibly non-obvious" to "murderously bad," or "butchered."

            I submit that the end result, if it actually accomplishes anything, will approach modern FM radio in terms of badness and destruction.
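
            A toy model of that effect, assuming a gain rider that tracks a smoothed level and steers towards a fixed target (the block levels, target and smoothing factor are made up, and this is not any particular compressor's algorithm):

```python
def slow_gain_rider(levels, target, smoothing):
    """Reactive leveller working on per-block levels (say, one block per second).

    smoothing is between 0 and 1; closer to 1 means a slower reaction.
    """
    envelope = levels[0]
    out = []
    for level in levels:
        # Smoothed estimate of how loud the material has been recently.
        envelope = smoothing * envelope + (1.0 - smoothing) * level
        out.append(level * target / envelope)
    return out

# Hypothetical track: 60 quiet blocks building into 60 loud ones.
levels = [0.1] * 60 + [0.8] * 60
out = slow_gain_rider(levels, target=0.5, smoothing=0.9)

print(round(out[59], 3))   # last quiet block
print(round(out[-1], 3))   # last loud block
```

            Both the quiet and the loud section settle near the same target, so the original 8x difference between them is gone. The first loud block also overshoots well above the target while the envelope catches up.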

      • jwz says:

        Looks like my tuner (Denon AVR 2805) does have a couple of compression options, but only when the source is Dolby Digital.

        • structurefall says:


          it'd probably be better to stick a compressor in the path than to stick a dolby converter there, and you'll have more tweaking options with a standalone compressor.

          • baconmonkey says:

            Dolby digital has some magical flags in the stream for gain control and normalization. converting would not insert said flags.

    • structurefall says:

      oh- the other bonus of this is that it won't tax your processor OR overwrite the original files. any in-machine solution will do at least one of those things.

      • houdini_cs says:

        If part of your requirement is to not use the processor, using a computer is going to be difficult.

    • cigfran_lwyd says:

      i think i recommended this an hour or so ago... was the reply deleted?

    • strangedave says:

      You can use something like Soundflower (free from cycling74) to capture default system audio (if whatever app is used to play video doesn't have the option of setting audio out), and then route it to something else that could compress all audio on its way out (the free developer tool AULab would probably do as well as anything if you use an AudioUnit plugin). Jack is the only alternative to Soundflower I'm aware of; it probably also does the job.

      A quick look through the free mac audio plugins on KVR suggests something like Rider is probably the go, as it looks designed for this sort of job, rather than the more 'musical' ones like Audio Damage Rough Rider. The mda dynamics plugins are probably worth a look, too. Ignore blockfish, it's PPC-only and likely to stay that way.

      • jwz says:

        What am I supposed to do with this Rider lego block exactly?

        • strangedave says:

          First, it's compression, not normalisation, so it may not be exactly what you want anyway (though compression is of course useful if you are trying to listen to anything in a noisy environment, which is why every radio station slaps a giant compressor on its output, as it assumes you are listening in your car).

          Put Rider (or whatever; I've never used Rider, and the mda ones are more of a known quantity but seem less specific to your needs) in /Library/Audio/Plug-Ins/Components or the single-user equivalent. Actually, now I look at it, Apple includes a multi-band compressor in the standard AudioUnit set, so it may be worth getting things working with that before downloading anything (or a less subtle effect like a low/high pass filter or distortion might be useful just to verify everything is working).

          If whatever you are using for playback supports AudioUnits, just slap it on the output. Otherwise, get Soundflower and Soundflowerbed. Use Audio MIDI Setup to route sound output via Soundflower. Start up AULab (or whatever), and set it to take its input from Soundflower and send its output to the default output. Rider should show up in the list of effects you can add.

          • jwz says:

            I think you just said, "download this junk and then figure out how to install it into some hypothetical program that is not iTunes."

            Uh. *plonk*

            • strangedave says:

              It's a completely different setup; this whole thread has been about how to strap some additional effects processing over the live audio output of anything, including iTunes if you want (I just explained how to do at OS level what other people were suggesting you do by slapping a bit of expensive physical hardware on the audio output).

              If you didn't pick that up earlier in the thread, well, uh *plonk*

            • strangedave says:

              Oh, and you can give it a try with only a small bit of downloaded stuff - Soundflower for audio routing. For the rest of it, you can use stuff that will already be on your machine if you have a developer install (though you might get better results with stuff more specialised to the task).

    • solarbird says:

      ...if whatever playing software is involved talks to Apple AudioDevices then the OSX port of Ardour should have the functionality needed to replicate this in software only.

      Possibly also Audacity, which should be easier to use.

      Otherwise, the external-hardware solution sounds great to me. Radio stations do that all the time.

      • fnivramd says:

        I think we can safely assume that the modern Mac-using JWZ doesn't regard using Ardour (a full blown DAW) to normalise his background music as an acceptable solution even if it would do what he wants, which it arguably won't.

        Having read Apple's documentation, Sound Check does the same thing as other "intelligent" normalisation systems, storing ahead of time information about the apparent dynamic range of material in order to adjust the software gain, causing a small loss of audio quality but nicely flattening things to the extent possible without (as a compressor would) smashing them into dull pop music mush. So what JWZ (and everybody else) needs is Sound Check or any of its non-proprietary alternatives, but for all AV input.

        The closest I can imagine to a general solution without source code for your player software is: Take a player which changes its titlebar or similar metadata exactly in sync to the music. Add software that can process audio from every conceivable pre-recorded source (maybe ffmpeg derived) and use it to pre-process everything you rip, download or otherwise obtain. When your player changes its titlebar, your new software tweaks the software gain according to its database of normalisation values. This should give the same effect as "Sound Check" but for everything, without needing to update your preferred player software. Unfortunately writing such a system is a beast, and no doubt any solutions which presently exist are half-arsed attempts.
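
        The lookup half of that idea is the easy part. A rough sketch, assuming the per-track levels have already been measured in a pre-processing pass and that the player can somehow report title changes (the function names, titles and levels below are all hypothetical):

```python
import math

def build_gain_table(track_rms, target_rms):
    """Pre-processing step: one gain in dB per track, from its whole-track level."""
    return {title: 20.0 * math.log10(target_rms / level)
            for title, level in track_rms.items()}

def on_title_changed(table, title):
    """Called when the player's titlebar changes; returns a linear gain factor.

    Unknown titles get unity gain rather than a guess.
    """
    return 10.0 ** (table.get(title, 0.0) / 20.0)

# Hypothetical measurements from an earlier analysis pass.
measured = {"quiet video.mov": 0.05, "loud video.mp4": 0.4}
table = build_gain_table(measured, target_rms=0.2)

for title in ("quiet video.mov", "loud video.mp4", "never analysed.mp4"):
    print(title, round(on_title_changed(table, title), 3))
```

        Hooking the titlebar change and actually applying the gain to the output are the beastly parts, and this sketch does neither.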

        • strspn says:

          The algorithm requires two passes, so stream-based software tools which only do a single pass aren't going to work. A waveform editor used in an unsophisticated manner will run the risk of over-softening due to, for example, brief pops caused by electrical discontinuities which lead to an artificially large waveform amplitude. A waveform editor will probably also try to read and malloc the whole file, which you really don't need to do because it can waste time. Careful attention to detail (eliminating all fencepost errors) is the only thing which can prevent loss of time synchronization.

          The correct solution involves "windowing" say every few dozen milliseconds (overlapping windows if you buy gold-plated contacts -- and then stop doing that unless you have contact grime problems, e.g., in a saltwater environment) and taking the mean amplitude, then using the maximum observed to multiply the entire signed vector waveform after performing a DC bias adjust to global mean=0. It is possible for someone to engineer a percussive track which can be harmed by this kind of normalization, but unlikely.

          Someone asked how to tell whether the track has already been normalized. Since you need to compute the maximum, you might as well compute the global (track) standard deviation as well, which can be helpful in deciding whether the track is already normalized (the lower the std. deviation, the greater the probability, in general.)
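
          One possible reading of the windowed approach above, as a Python sketch. Assumptions: non-overlapping windows, the scale factor is target divided by the loudest window's mean amplitude, and the whole signal sits in a list for brevity (a real tool would stream the file). The window size and target are arbitrary.

```python
import math

def analyze(samples, window):
    """Analysis pass: DC offset, loudest window's mean amplitude, std deviation."""
    n = len(samples)
    dc = sum(samples) / n
    loudest = 0.0
    sum_sq = 0.0
    for start in range(0, n, window):
        # Remove the DC bias before measuring this window.
        chunk = [s - dc for s in samples[start:start + window]]
        loudest = max(loudest, sum(abs(s) for s in chunk) / len(chunk))
        sum_sq += sum(s * s for s in chunk)
    return dc, loudest, math.sqrt(sum_sq / n)

def normalize(samples, window, target):
    """Apply pass: remove the DC offset, then one constant gain for every sample."""
    dc, loudest, _ = analyze(samples, window)
    gain = target / loudest
    return [(s - dc) * gain for s in samples]

# Hypothetical signal: a steady tone with a DC offset and one electrical pop.
samples = [0.3, -0.1] * 500
samples[501] = 1.0
dc, loudest, std = analyze(samples, window=100)
out = normalize(samples, window=100, target=0.5)

print(round(dc, 4), round(loudest, 4))  # the pop barely moves the window mean
```

          Because the gain comes from a window mean rather than the single highest sample, the pop does not drag the whole track down. The flip side is that the pop itself can end up above full scale, so something would still have to clip or limit it.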

          In OSX the apparent way to do this is with the AudioUnit framework of which the supplied "compressors" may do the Right Thing. However, my source for suggesting so is little more than my attempts to decipher the audio techs' moon-speak in this thread. Compression, to me, is data rate reduction, not amplitude normalization.

    • jwm says:

      I honestly have no clue as to how a compressor really works; how will it tell the difference between a song made last Tuesday that's already compressed to hell and back, and something recorded in the eighties that still has dynamic range?

    • netik says:

      This is exactly how CoreAudio works.

      You can insert coreaudio effects programmatically anywhere in the audio chain. There are some good examples of this in Xcode's examples folder.

  3. bastard_blog says:

    mplayer can do this during playback via the `-af' (audio filter) option. You'd do something like...

    mplayer -af volnorm=2:0.25 file(s)

    • jwz says:

      Ha ha ha ha ha ha ha ha NO.

    • gryazi says:

      That might not be a completely terrible suggestion to the extent that MEncoder should be able to do the audio processing (and allow you to batch-normalize). I note the following in the documentation that suggests it should be possible to do this without fucking up sync:

      If you want to further guard against strange frame skips and duplication, you can use both -mc 0 and -noskip. This will prevent all A/V sync, and copy frames one-to-one, so you cannot use it if you will be using any filters that unpredictably add or drop frames, or if your input file has variable framerate! Therefore, using -noskip is not in general recommended.

      The so-called "three-pass" audio encoding which MEncoder supports has been reported to cause A/V desync. This will definitely happen if it is used in conjunction with certain filters, therefore, it is now recommended not to use three-pass audio mode. This feature is only left for compatibility purposes and for expert users who understand when it is safe to use and when it is not. If you have never heard of three-pass mode before, forget that we even mentioned it!

      A normalizing filter shouldn't add or delete any audio frames, just edit them. YMMV on figuring out how to pass through the video stream unmodified/without reencoding.
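
      An untested sketch of what the batch version might look like, written as a Python helper that only builds the command lines. The filter arguments are copied from the mplayer example above and the sync flags from the quoted documentation. The choice of `-ovc copy` and `-oac pcm`, the AVI output name and the function names are assumptions, and whether MEncoder copes with a given MOV or MP4 file this way is not something this sketch can promise.

```python
def mencoder_volnorm_command(src, dst):
    """Build (but do not run) a MEncoder command line for one file."""
    return [
        "mencoder", src,
        "-ovc", "copy",            # pass the video stream through untouched
        "-oac", "pcm",             # audio has to be re-encoded once it is filtered
        "-af", "volnorm=2:0.25",   # filter arguments taken from the mplayer example
        "-mc", "0", "-noskip",     # the sync-related flags quoted from the docs
        "-o", dst,
    ]

def batch_commands(sources, suffix="-norm.avi"):
    """One command per input file, writing next to the original."""
    return [mencoder_volnorm_command(s, s.rsplit(".", 1)[0] + suffix)
            for s in sources]

for cmd in batch_commands(["video one.mov", "video two.mp4"]):
    print(" ".join(cmd))
```

      Each list could be handed to `subprocess.run` as-is. Note the quoted warning still applies: `-noskip` is not recommended in general, so treat those two flags as something to experiment with rather than a default.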

  4. jwm says:

    You're doomed.

    The ideal solution is to take a replay gain engine (which is better designed than normalize), and wrap it in some video decoding framework which will handle the demuxing and decoding work for various formats for you.

    The output gain values need to go somewhere; either in the database of whatever player you use (may be feasible), an accompanying file (kinda sucks), or you could try attaching a comment block to the file directly (which, given the variety of container formats, and the likelihood that this will break existing players, is a world of pain).

    Then you need to write a plugin for the player you use to apply the gain to the output volume.

    Like I said, you're doomed.

    (That said, vorbis and FLAC have had replay gain support working in most players for the past five years, largely due to having a sensible tag block, rather than the over engineered brain damage that is ID3v2, so it's possible to get this working, but it would take a mighty effort at this stage.)

    • greatevil says:

      At that point... put a mic in front of the speaker and set a target overall volume and have it adjust the system or app volume. Makes just as much sense and would cover all contingencies... except that it's as much of a shit idea as using mplayer.

      • jwm says:

        The lag would be horrible, and it wouldn't cope well with music that's supposed to have soft and loud bits. Replay gain and its ilk work so well because they work out the 'average' volume of an entire song as a basis for calculating a gain adjustment.

  5. wikkit42 says:

    Have you tried Levelator? It's meant for getting reasonable levels on video or audio podcasts.

    • jwz says:

      The only thing of that name I see is Windows software that works on WAV and AIFF.

      • wikkit42 says:

        Ah. I found the OSX and Linux versions, but you're right about the format restriction. I appear to have misremembered its capabilities. My apologies.

        It's also meant to work in one file where there is difference of volume, so it'd fail for your purpose unless you extracted the audio from the files, concatenated them, leveled them, and then split them back apart. And while that could be done with scripting easier than the stuff above, it'd still be an almighty pain in the ass and not worth the hassle.

        So it looks like you're FUBAR. But I'm always encouraged when you do a post like this and expect there to exist a reasonable solution for a common problem. It's part of that "Never lose that optimism, kids. Don't die inside." thing.

  6. leolo says:

    Are you only interested in the audio? Or does the video also count in the "ambient entertainment"?

    If only audio, extract audio to WAV (optionally convert to MP3), normalise that, playback.
    (Barring finding something that would actually normalise the audio w/in the video.)

  7. baconmonkey says:

    I think the real answer is:
    "Hold an itunes engineer hostage, and submit a feature request whose completion is necessary for the release of said hostage"