How do you get a recording properly leveled (normalized) for YouTube? Glad you asked. In part two of my take-a-break-from-K8s posting, I’ll answer that question. I mentioned a limiter in my previous post. A limiter is somewhat self-describing: it places a limit on the maximum audio level. If you hit the limiter threshold, it will squash the audio to keep it under the threshold level. This comes in handy in the worst-case scenario of going above full scale zero and clipping. Think of a limiter as your last-resort safeguard against clipping your audio. (I’ve seen professional audio engineers use limiters to raise gain and normalize, but that is beyond my current understanding. I know what sounds good to me, and using a limiter to exceed zero always sounds bad to me.)
There is another mechanism that can manage loud signals: a compressor. A limiter is actually a compressor too; it just isn’t adjustable. A compressor is. We can set a compressor to kick in at a threshold, and then reduce (compress) the level above that threshold by a ratio. This is a great overview of compression. A compressor reduces the dynamic range of the waveform, allowing us to raise the entire signal a bit more.
So, a limiter compresses at an infinite ratio to force a signal to stay below a given threshold. We would set that threshold just below full scale zero (e.g. -0.5 dB). If the audio signal went to +5 dB, it would be compressed to stay at -0.5 dB. That can result in a pretty bad sound if we’re constantly hitting the limiter, so we try to avoid it. But the limiter helps us if that unfortunate case arises. A compressor allows us to make the lower parts a bit louder by reducing the louder parts, enabling us to raise the entire signal without clipping.
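To make the threshold and ratio idea concrete, here is a minimal sketch of the level math only. It assumes a simple "hard knee" model with no attack or release timing, which real compressor plugins do have; the function name and the example settings are made up for illustration.

```python
import math

def compress_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Output level (dB) of a simplified hard-knee compressor for one input level."""
    if level_db <= threshold_db:
        return level_db  # below the threshold the signal passes through untouched
    if math.isinf(ratio):
        return threshold_db  # infinite ratio: a limiter pins the output at the threshold
    # Above the threshold, every `ratio` dB of input overshoot becomes 1 dB of output overshoot.
    return threshold_db + (level_db - threshold_db) / ratio

# Limiter set just below full scale: a +5 dB peak is held at -0.5 dB.
print(compress_db(5.0, -0.5, math.inf))   # -0.5
# Hypothetical 4:1 compressor with a -20 dB threshold: a -8 dB peak comes out at -17 dB.
print(compress_db(-8.0, -20.0, 4.0))      # -17.0
# Quiet material below the threshold is unchanged.
print(compress_db(-30.0, -20.0, 4.0))     # -30.0
```

In the 4:1 case the loud part dropped by 9 dB while the quiet part did not move, which is the reduced dynamic range that lets us raise the whole signal afterwards.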
See LUFS described here. LUFS (Loudness Units relative to Full Scale) is a measurement of perceived loudness, based on the overall level of the loud and quiet parts combined over time. It is what online platforms like YouTube use to normalize audio levels. If your recording is too loud, it will be turned down to match the same level as all other videos. But with YouTube, your quiet recordings are not raised.
For YouTube, the max LUFS is -14. There are meters in DAWs that will let you monitor your overall LUFS for this purpose. The theoretical bullseye is hit when you upload a video and YouTube processes the audio to a 0.0 dB level. But anything between -5 and -3 dB for narration audio sounds best to my ear.
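Here is a rough model of that behavior, as I understand it. It assumes the number YouTube reports is simply the distance from -14 LUFS, and that loud uploads are turned down by exactly that amount while quiet ones are left alone. The function names are mine, not anything from YouTube.

```python
YOUTUBE_TARGET_LUFS = -14.0  # the figure quoted in this post

def content_loudness_db(integrated_lufs: float) -> float:
    """Distance from the target; assumed to match what 'stats for nerds' reports."""
    return integrated_lufs - YOUTUBE_TARGET_LUFS

def playback_gain_db(integrated_lufs: float) -> float:
    """Assumed playback adjustment: loud uploads are turned down, quiet ones are not raised."""
    return min(0.0, -content_loudness_db(integrated_lufs))

for lufs in (-10.0, -14.0, -18.0):
    print(lufs, content_loudness_db(lufs), playback_gain_db(lufs))
```

Under this model a -10 LUFS upload shows +4 dB and is turned down by 4 dB, while a -18 LUFS upload shows -4 dB and plays back exactly as uploaded.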
Since LUFS also considers the quiet parts, and given a narrative recording has more quiet moments than a recording with instruments constantly producing sound, I suspect -19 to -17 LUFS might be better for this type of recording. I’ve tested two versions of a sample recording I narrated: one with just my voice, and another with a drum beat added. The version with the drum beat measured 3 LUFS higher, even though my voice level was unchanged. So the -14 LUFS measurement for voice alone isn’t necessarily the ideal target.
Right click on YouTube videos with narration only, select ‘stats for nerds’ and see where it was normalized to. Decide for yourself which level sounds best.
[Bad gain and bad EQ setting. This was recorded without enough headroom to correct. EQ not as bad as previous, but still off. Noise gate set well for de-breath.]
I’ve left some of my learning mistakes in the previous recordings (Too much gain, bad EQ, clipping levels, etc.). When I set out to post this series, I thought about whether or not I should just cover it all in one post. I decided it might be better to post along the same trajectory of my learning process. So here is the final tip, and an example recording that should be the best result of all recordings in this series:
[If anything, bad adjustment on expander for breath. When you compress and raise gain to lift LUFS, you raise and pronounce the lower levels. In this case, I pronounced the breaths (something that makes me cringe). To fix this, I would either adjust my noise/expander gate as is, or move it forward in the processing chain, or compress with a smaller ratio. This recording posted to YouTube at 4.3 dB below -14 LUFS, but it was a mono track. If this had been properly uploaded as stereo interleaved audio, it would be equivalent to 1.2 dB below -14. So still not perfect. I could upload another sample, but I think I’d need to start another blog and YT channel if I post any more audio info.]
Correction to something I later figured out:
In the above recording, I was measuring LUFS on the input (mic) strip. That’s a mono chain. I should have been measuring on the stereo output chain. As it turns out, stereo adds three LUFS to the same level of a mono track (technically 3.1). So, if we measure -17 LUFS on a mono track, it will be -14 on a stereo mix. That cleared up a lot of the frustration I had with trying to get past -18 LUFS without shooting past full scale zero. The second mistake I was making, which compounded my misunderstanding, was using the easy-button ‘export’ for the track. Export created a non-interleaved, mono file. Using the ‘bounce’ function of the DAW created the two-channel interleaved file, with the expected +3 LUFS.
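The roughly 3 LUFS difference can be sanity-checked with a little arithmetic. This sketch assumes the meter adds up the power of each channel (my understanding of how loudness meters following ITU-R BS.1770 work) and that both stereo channels carry the identical mono signal. The function name is just for illustration.

```python
import math

def loudness_offset_db(channel_count: int) -> float:
    """Loudness increase from summing the power of N identical channels."""
    return 10.0 * math.log10(channel_count)

stereo_offset = loudness_offset_db(2)
print(round(stereo_offset, 2))               # 3.01

mono_lufs = -17.0
print(round(mono_lufs + stereo_offset, 1))   # -14.0
```

The arithmetic lands at about 3.0, in the same ballpark as the 3.1 I quoted, and it is why a mono measurement of -17 LUFS shows up as roughly -14 once the same audio sits in both channels.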
There is another step further you can take if you get really picky about the final product. Every DAW has a feature that enables you to selectively lower/raise gain at specific moments in your track. If you’re like me, you will have a few ‘hot’ peaks in your audio track where you emphasize a consonant. After setting your compression and EQ, run your gain up until you see just those few spikes. Then lower just those spots to be in line with the rest. If done well, the listener will not notice the minor adjustment. It then allows you to raise your gain a bit more without clipping/limiting. I’ve found this can move integrated LUFS by as much as 1.5 to 2.
This practice is called volume automation. You graphically pull down, or push up, levels across specific timelines in your audio. The DAW then automates the fader to match those settings during playback. There are paid-for audio plugins that will do all of this for you, automagically. The combination of good compression and volume automation gives us a signal we can raise without needing a limiter.
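As a rough illustration of why taming a few hot peaks buys extra gain, here is a sketch that works on made-up per-phrase peak levels rather than real audio. Real automation draws smooth gain changes in the DAW; this just clamps numbers to show the headroom math. The peak values, the -5 dB "hot" threshold, and the -0.5 dB ceiling are all assumptions.

```python
def headroom_db(peaks_db, ceiling_db=-0.5):
    """Gain (dB) that can be added before the loudest peak reaches the ceiling."""
    return ceiling_db - max(peaks_db)

def automate_hot_peaks(peaks_db, hot_threshold_db):
    """Pull any peak above the threshold down to the threshold; leave the rest alone."""
    return [min(p, hot_threshold_db) for p in peaks_db]

# Hypothetical per-phrase peak levels in dBFS; two hot consonants stand out.
peaks = [-6.0, -5.5, -3.0, -6.5, -5.0, -3.5, -6.0]

before = headroom_db(peaks)                # 2.5 dB available
tamed = automate_hot_peaks(peaks, -5.0)    # only the two spikes change
after = headroom_db(tamed)                 # 4.5 dB available
print(before, after, after - before)       # 2.5 4.5 2.0
```

Only two of the seven phrases were touched, yet the whole track can now come up another 2 dB before anything reaches the ceiling.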
I’ve made all sorts of lazy mistakes with YouTube posts. Clipping and low levels are the most common. Not noticing that my output was being mixed to one channel is the other most common. I have to assume user error, but I could swear YouTube has turned down some of my previous videos.
Net/net, it really isn’t difficult to produce ‘good enough’ audio for any output. Just give it the minimal attention it requires after spending a ton of time researching it, weeding through the opinions of all the ‘experts’, and testing and trying everything for yourself. I’ve done a lot of research and shared it with you in these two posts. Use a free DAW. Use a free noise remover plugin. Use free plugins to compress and EQ. Use a free plugin to meter your peaks and LUFS. Use a free site to verify your LUFS. Make sure you produce your audio file as interleaved stereo. Use zero-latency plugins and free audio path virtualization drivers to improve real-time audio.
At the end of the day/chain, you don’t want to simply target coming in as close to -14 LUFS as possible for YouTube. You want to come in as close as you can (at least -18 LUFS, and probably one or two LUFS shy of -14 for narration only) without clipping (one or two limited spots are fine). Less compression equals better quality.
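That trade-off can be written down as a tiny calculation. This sketch assumes plain gain moves both the integrated LUFS and the peak level by the same number of dB, and it uses a -0.5 dB ceiling as the "don’t clip" line. The function name and the example measurements are made up.

```python
def max_safe_gain_db(integrated_lufs, peak_dbfs, target_lufs=-14.0, ceiling_dbfs=-0.5):
    """Largest gain that moves toward the target without pushing the peak past the ceiling."""
    wanted = target_lufs - integrated_lufs   # gain needed to land exactly on the target
    allowed = ceiling_dbfs - peak_dbfs       # gain available before the loudest peak hits the ceiling
    return min(wanted, allowed)

# Hypothetical narration track: -21 LUFS integrated, loudest peak at -5.5 dBFS.
gain = max_safe_gain_db(integrated_lufs=-21.0, peak_dbfs=-5.5)
print(gain, -21.0 + gain)   # 5.0 -16.0
```

Here the peaks run out of room before the loudness reaches -14, so the track settles at -16 LUFS, which is inside the "one or two LUFS shy" range and needs no limiter.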
Absolutes to know…
- Number one absolute to keep in mind: I am an IT nerd, not an audio engineer. Everything I’ve said in this two part series is likely stacked with incorrect understanding 🙂 But, my intent is to share what I’ve figured out in search of my own ‘good enough’.
- You do not need a FetHead or Cloudlifter for your XLR mic. Your XLR mic might sound quiet, but that’s not a problem. The Cloudlifter/FetHead is sitting on the analog side. Everything you’re doing with your audio is on the digital side.
- You don’t need an audio interface gain turned up past 25%.
- Position your microphone closer to your mouth for less room noise.
- Use a good noise remover plugin. Apple has a surprisingly good one bundled. Waves Clarity Vx for ~40 bucks is probably better.
- For Mac, you can use GarageBand and BlackHole for free. On Windows (afaik), you can use Audacity and VB-Audio for free to route DAW-processed audio to other apps like Zoom.
- It’s always better to have a slightly lower final level than to have a bunch of clipping at a higher one. Whatever you have for difference between high and low level in your recording is what you have. You can compress it a bit, but too much compression brings up all the irritating noises in the quieter levels.
- Try not to raise your voice significantly at single moments during your recording, to avoid setting your ceiling much higher than the average of the entire recording. When you normalize, you’re limited by the loudest single moment in all of the recording. (There are ways to selectively lower high spots and reshape the audio to deal with spikes. There are also paid-for tools that will automate this. I don’t care to go that deeply into it for my own needs. I just need good enough.)
- Final thing I’ve come to accept: you will never achieve what you set out for with audio. You will find ‘good enough’.
- Number ten is just to have a number ten. A list from 1 to 9 always seems incomplete.
For a look into the extremes you can go to with post production audio processing, I think this is a great example. This goes way beyond anything I’d dive into for my needs. It’s an interesting review of what goes into better-than-good-enough though.