Making Mixes Sound Three-Dimensional

I often hear people ask about ways to make their mixes sound more three-dimensional, which is a great question, as such dimensionality is a worthy goal for recording engineers. It is, however, not a simple question to answer (no simpler, for example, than answering the question, "How do you play guitar really well?").

As I see it there are basically two overall approaches to creating deeply dimensional, realistic soundscapes:

  1. capture authentic soundscapes using excellent stereo miking techniques, and
  2. master techniques designed to artificially create the sense of depth and space missing from track after track of close-miked, overdubbed instruments.

For the sake of this post I will assume we are working with the latter: lots of tracks that utterly lack useful spatial information (I may tackle the stereo miking techniques in future videos, so feel free to ask me about it to keep me on task).

(For that matter I need to make some video examples of techniques discussed in this article, so please hold my feet to the fire for that as well.)

While it may be tempting to reach for that killer new reverb or delay plugin you just shelled out your cash for, truly 3D sounding mixes are crafted by folks who understand the role each of the following variables plays in the sound we hear:

  • timing
  • dynamic response
  • frequency content
  • ambient/reflective information

Those who can effectively impact or manipulate these variables will be able to create rich 3D mixes where everything has its place, all without the use of dramatic panning or any oddball effects plugins (though dramatic panning and effects can be quite fun!).

By 'timing' I'm basically referring to the Haas effect: when a copy of a sound arrives a few tens of milliseconds after the original, the ear fuses the two into one sound and places it toward whichever arrived first. It's helpful to understand, though I don't rely on it a great deal, except for the occasional mono track that benefits from an EQ'd mono delay panned opposite the original. Google it. I'm not gonna say more about it specifically here (though it is related to the discussion on ambient and reflective information at the end of this post).

With regard to dynamic response I can't speak to the math or science, but I can speak about my observations over the years. Distant sounds generally have a smaller perceived dynamic range than closer sounds. When someone standing 50 feet away from you goes from whispering to talking to yelling, you likely won't hear the whisper, will hear the talking and yelling, but they won't sound dramatically different (though the timbre will... but that's another topic... we're talking dynamics at the moment). If, however, someone standing immediately in front of you whispers, talks and yells, you will hear all of it, and you will feel it as well. You don't really feel the distant version, but up close you do. So what does this mean? In short, it means that closer sounds have a greater dynamic range.

In practice you can take two voices recorded the same way and pan both to the center. Compress one with a 2:1 ratio, medium threshold, slow attack and quick release, and treat the other with a 4:1 ratio, low to medium threshold, fast attack and medium release. The first will feel closer to you while the second feels further back. All you have adjusted is the dynamic response of the signals, but even with them panned to the center, one will feel upfront, the other further back. The slower the attack, and the lower the ratio, the more dynamic the signal is allowed to remain, and the more up front it will sound. The fast release will further keep the overall level of the endings of phrases and words louder, so that voice dominates.

The faster attack and higher ratio on the other version will grab the signal faster, allowing fewer transients through and overall leveling the track more, reducing its dynamic range. The slower release means that the ends of phrases will not dominate as much, being less distinct when compared to the version with faster release.

This is one way to create a sense of 'front' and 'rear' elements in a mix using nothing but compression.
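For readers who think better in code, here is a rough sketch of the two compressor settings described above. It is a textbook feed-forward peak compressor written from scratch, not a model of any particular plugin, and every number in it (the -18 and -24 dB thresholds, the attack and release times in milliseconds, the 220 Hz test tone) is an assumption chosen only to stand in for "medium threshold," "slow attack," and so on. It measures how far the loud half of a test phrase sits above the quiet half after each treatment.

```python
import math

def compress(samples, sample_rate, threshold_db, ratio, attack_ms, release_ms):
    """Feed-forward peak compressor: a textbook sketch, not a model of any plugin."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Follow rising levels at the attack speed, falling levels at the release speed.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        envelope_db = 20.0 * math.log10(max(envelope, 1e-9))
        # Above the threshold, `ratio` dB of extra input yields 1 dB of extra output.
        over_db = max(envelope_db - threshold_db, 0.0)
        gain_db = -over_db * (1.0 - 1.0 / ratio)
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out

def rms_db(samples):
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-18))

def loud_minus_quiet_db(samples):
    """Level difference between the second (loud) half and the first (quiet) half."""
    half = len(samples) // 2
    return rms_db(samples[half:]) - rms_db(samples[:half])

SAMPLE_RATE = 16000
# Test phrase: half a second "spoken" quietly, then half a second "yelled".
phrase = [
    (0.1 if n < SAMPLE_RATE // 2 else 0.8)
    * math.sin(2.0 * math.pi * 220.0 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]

# Assumed numbers for the "up front" voice: 2:1, medium threshold, slow attack, quick release.
front = compress(phrase, SAMPLE_RATE, threshold_db=-18.0, ratio=2.0,
                 attack_ms=30.0, release_ms=50.0)
# Assumed numbers for the "further back" voice: 4:1, lower threshold, fast attack, medium release.
back = compress(phrase, SAMPLE_RATE, threshold_db=-24.0, ratio=4.0,
                attack_ms=1.0, release_ms=200.0)

for name, track in (("uncompressed", phrase), ("front", front), ("back", back)):
    print(f"{name}: loud passage sits {loud_minus_quiet_db(track):.1f} dB above the quiet one")
```

With these assumed settings the "back" version ends up with the smallest gap between quiet and loud, which is the reduced dynamic range the text associates with distance. Real material and real compressors will behave differently, so treat the numbers as a starting point for listening, not a recipe.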

With respect to frequency content, have someone speak softly into your ear at a distance of a couple of inches. What do you hear? You hear the lows of their voice, and the definition and articulation of the consonants, and the air moving from their lungs (and all the spit in their mouth to boot!). Now have them do the exact same thing from 10 feet away. What happened to the lows and the articulation? The air? Never mind any ambient information from the room at this point. What is the difference you hear between the two with respect only to frequency content? The distant voice lacks the ends of the frequency spectrum and sounds much more midrangey.

So when you want something to sound more distant, roll off the extremes of the frequency response, leaving the midrange content to dominate. Unless, of course, you're working with a primarily low-frequency ("LF") source (bass guitar or kick drum or some thick, low pad). Low frequencies are powerful (which is why you hear them coming through your neighbor's wall so well!). For these instruments leaving the low end intact may benefit them.
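As a minimal sketch of "rolling off the extremes," the snippet below chains a gentle high-pass and a gentle low-pass and then measures what happens to a low, a mid, and a high test tone. The one-pole filters, the 200 Hz and 4 kHz corner frequencies, and the function names are all assumptions made for illustration; the post does not specify any of them, and a real mix would use a proper EQ set by ear.

```python
import math

def one_pole_lowpass(samples, sample_rate, cutoff_hz):
    """Gentle (6 dB/octave) low-pass filter."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y = 0.0
    out = []
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def one_pole_highpass(samples, sample_rate, cutoff_hz):
    """Gentle high-pass filter: the input minus its low-passed copy."""
    lows = one_pole_lowpass(samples, sample_rate, cutoff_hz)
    return [x - low for x, low in zip(samples, lows)]

def push_back(samples, sample_rate, low_cut_hz=200.0, high_cut_hz=4000.0):
    """Trim both ends of the spectrum so the midrange dominates (assumed corners)."""
    trimmed = one_pole_highpass(samples, sample_rate, low_cut_hz)
    return one_pole_lowpass(trimmed, sample_rate, high_cut_hz)

def rms_db(samples):
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-18))

def gain_db(freq_hz, sample_rate=48000):
    """Level change push_back() applies to a sine at freq_hz, after the filters settle."""
    n = sample_rate // 2
    tone = [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    out = push_back(tone, sample_rate)
    return rms_db(out[n // 2:]) - rms_db(tone[n // 2:])

for freq in (50.0, 1000.0, 12000.0):
    print(f"{freq:7.0f} Hz: {gain_db(freq):+.1f} dB")
```

The low and high tones come out noticeably quieter than the 1 kHz tone, which is the "midrangey" balance described above. For a bass or kick, you would likely skip the high-pass stage entirely, as the paragraph suggests.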

Be aware, however, that lower frequencies (sub 150Hz-ish) tend to be perceived as omnidirectional by human ears. So if you pan a predominantly LF instrument to one side, and you're listening through speakers (not headphones), don't be surprised that it doesn't really feel panned. This is a big reason why I use high-pass filtering on guitar cabinets that are hard panned: the (usually) unnecessary uber lows in their sound can really mess up the dramatic imaging by making the sound feel more centered. Filter that LF off and let the bass and kick own the low end a bit more, and your guitars will feel wider, but still very full sitting on top of the bass track.

With most sources, if you want them to sound up close they must contain frequency content that suggests their location to the listener. Without slightly exaggerated LF and HF content (relative to 'distant' elements in the mix) a source will sound many feet away, no matter how loud it is relative to those other elements (so turning it up won't help it feel up front, it'll just make it too loud). It just won't have the intimacy, because it lacks the frequency content that tells our brains that the source is close.

Also, it is helpful to make sure there are no frequencies that are overlapping or fighting with one another, which you can do by simply playing the tune and, one by one, muting each track except the vocal. Just hit 'play,' mute a track, listen, unmute it, mute another, listen, and continue through the whole arrangement. Listen specifically for how the sense of space around the vocal changes as you mute each track. You can, in this manner, identify which elements are contributing to the perspective of the lead vocal. Sometimes I've discovered that a seemingly unrelated track has a profound impact on the tracks I'm focusing on. Often, I can fix my main tracks by leaving them alone and adjusting others that seemed to have no real relationship to them, because it turns out there is some odd interaction anyway. By removing each track one at a time you can often find issues that you never knew existed.

Regarding ambient information, this could be the mixing of room mics or the creation of artificial ambience using reverb and/or delay. I think this is pretty obvious, and something most folks who have some mixing experience understand really well. I have only a few observations and tips here.

Firstly, with the lead vocal, make sure the ambient information isn't competing with the dry track (assuming, of course, that you want a somewhat natural presentation). If you're standing in a large, live room, and someone is immediately in front of you singing, you can hear the room, yes, but is it in any way as loud as the singer's direct voice in your ears? While it is common to use greater reverb level in a mix than most natural environments provide, don't allow it to compete with the dry signal. If you really want an exaggerated, loud, glorious, Celine Dion reverb on your vocal, try sidechaining a compressor over the reverb return using the vocal as the sidechain source, so when the vocal is happening the reverb is attenuated slightly, giving the vocal greater dominance until it stops and the reverb is allowed to fill in the remaining space. Play with the compressor's time constants to make this feel natural, and don't overdo it; a couple dB of reduction can do wonders. This way you can have a glorious reverb, but it doesn't drown out the intimacy and articulation of the vocal.
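The sidechain idea can be sketched in a few lines: the compressor listens to the vocal (the "key") but turns down the reverb return. This is a simplified illustration, not how any specific DAW or plugin implements it. The function name `duck` and the default threshold, ratio, attack, and release values are assumptions, picked so that a moderately loud vocal pulls the reverb down by roughly a couple of dB.

```python
import math

def duck(reverb_return, key, sample_rate, threshold_db=-12.0, ratio=1.5,
         attack_ms=10.0, release_ms=200.0):
    """Compress `reverb_return` using the level of `key` (the dry vocal) as the detector."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    envelope = 0.0
    out = []
    for wet, k in zip(reverb_return, key):
        level = abs(k)
        # The envelope tracks the vocal, never the reverb itself.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        envelope_db = 20.0 * math.log10(max(envelope, 1e-9))
        over_db = max(envelope_db - threshold_db, 0.0)
        gain_db = -over_db * (1.0 - 1.0 / ratio)
        # The resulting gain is applied to the reverb return only.
        out.append(wet * 10.0 ** (gain_db / 20.0))
    return out

SAMPLE_RATE = 16000
HALF = SAMPLE_RATE // 2
# Stand-in vocal: a tone for half a second, then silence.
vocal = [0.5 * math.sin(2.0 * math.pi * 220.0 * n / SAMPLE_RATE) if n < HALF else 0.0
         for n in range(SAMPLE_RATE)]
# Stand-in reverb return: a constant 1.0, so the output shows the applied gain directly.
reverb = [1.0] * SAMPLE_RATE

ducked = duck(reverb, vocal, SAMPLE_RATE)
print(f"reverb gain while the vocal sings: {20.0 * math.log10(ducked[HALF - 1]):+.2f} dB")
print(f"reverb gain after the vocal stops: {20.0 * math.log10(ducked[-1]):+.2f} dB")
```

The release time controls how quickly the reverb swells back in once the phrase ends, which is the "time constants" adjustment mentioned above. Longer releases tend to hide the effect; very short ones can make the reverb audibly pump.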

These days I use very little reverb on lead vocals, often none at all. I usually opt, instead, for a delay of some kind. Delays obviously contain far less spatial information (because they present only a few reflections rather than hundreds or thousands like a reverb). Many reverbs can have a tendency to smear consonants and make articulation less distinct (and therefore farther away in perceived distance). Instead, try to pick a delay setting that complements the other spatial information in the mix (such as picking a delay setting that fits the timing of a guitar's delay or the drum reverb, etc.). With this approach your vocal track can actually 'borrow' or 'imply' the reverb from something else in the mix (by that I mean the listener perceives the spatial dimension in the mix overall and credits it to the vocal track as well, even though there's not really any actual reverb on it) and yet keep the clarity of the up close performance. The delay, in this instance, acts like sonic glue between the more reverberant elements of the mix and the up-front, dry clarity of the vocal. It's weird to talk about, but mess with it, and you'll see what I mean.

This, of course, only really works if there is reverb present in the other elements of the mix, but it is worth keeping in mind.
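One common starting point for a delay time that "fits the timing" of other effects is to derive it from the song's tempo, assuming those other effects are themselves tempo-based. The helper below is a small sketch of that arithmetic (a quarter note lasts 60000 / BPM milliseconds). The function name and its parameters are hypothetical, and matching by ear to an existing guitar delay or reverb tail is just as valid.

```python
def note_delay_ms(bpm, division=4, dotted=False):
    """Delay time in milliseconds for a note value at a given tempo.

    division: 4 = quarter note, 8 = eighth note, 16 = sixteenth note.
    dotted:   lengthen the note by half, e.g. a dotted eighth.
    """
    quarter_ms = 60000.0 / bpm
    delay = quarter_ms * 4.0 / division
    return delay * 1.5 if dotted else delay

# Example: candidate vocal delay times for a song at an assumed 120 BPM.
for label, division, dotted in (("quarter", 4, False),
                                ("dotted eighth", 8, True),
                                ("eighth", 8, False),
                                ("sixteenth", 16, False)):
    print(f"{label:>13}: {note_delay_ms(120, division, dotted):.0f} ms")
```

If the guitar already has, say, an eighth-note delay, trying the same or a related value on the vocal is one way to let the two share a sense of space.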

Delays, as stated above, don't smear things like reverbs can. I generally opt for delay on sustained sources (like guitars and keys) and reserve reverb for more transient-rich elements (drums, percussion, consonants on vocals). Using complementary settings (such as a medium/long delay on sustaining instruments with a large reverb on percussive elements) can create the overall perceived blend I mentioned above without causing the smear and mush of putting reverb on everything. Reverb mush not only robs headroom and clarity in the mix, but the combined smear lessens the clarity of your panning assignments as well.

Another trick I use for creating space is assigning several mono delays to individual mono signals within a mix, creating width by panning a track to one side and its mono delay to the other. I don't do this in a general way, however; that is, it's not set up on an aux send that will get used by multiple tracks. Instead, I pretty much assign a unique delay to each source that I want one on. Otherwise you lose depth: everything would start to sound like it is on one side or the other with an identical sense of distance (because the delay is identical for all sources), and you wind up with a different kind of perceived 'narrowness' to your mix.

It's important not to hard-pan these sources left and right either, as this can give a very wide, unnatural image whereas sources in a natural soundscape won't be hard left and right.  When you stand in front of a band playing live the guitarist on the left side of the stage isn't only in your left ear, though it is likely stronger in that ear.  Mimic this through the use of judicious pan settings, both with the original signals and their delay (both processed with dynamics and EQ for the proper feeling of distance you want).

Short delays (anything under 100ms) have no real sense of timing to them, so they won't alter the timing of a given track (like an 8th or 16th note delay does, for instance). So by putting a 25ms delay on one thing, 35ms on another, and 45, 55 and 70 on others, you can create the sense that these items are in different locations within a consistent environment, creating a sense of width and clarity for each element without creating any timing issues. Again, by creating unique space for each element in a mix you can carve space for the feature element (usually the lead vocal).
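Here is a minimal sketch of the "track on one side, its own short delay on the other" idea, using the delay times from the paragraph above. Everything else is an assumption made for illustration: the function names, the constant-power pan law, the 0.5 delay level, the hypothetical source names, and the specific pan positions (deliberately short of hard left and right). The EQ on the delay is left out to keep the example small.

```python
import math

def pan_gains(position):
    """Constant-power pan. position: -1.0 = hard left, 0.0 = center, +1.0 = hard right."""
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

def widen(mono, sample_rate, delay_ms, dry_pan, wet_pan, wet_level=0.5):
    """Pan a mono track to one spot and a single delayed copy of it to another.

    Returns (left, right) lists the same length as `mono`; the delay tail that
    would run past the end of the input is dropped for simplicity.
    """
    delay = int(round(sample_rate * delay_ms / 1000.0))
    dry_l, dry_r = pan_gains(dry_pan)
    wet_l, wet_r = pan_gains(wet_pan)
    left, right = [], []
    for i, x in enumerate(mono):
        delayed = mono[i - delay] * wet_level if i >= delay else 0.0
        left.append(x * dry_l + delayed * wet_l)
        right.append(x * dry_r + delayed * wet_r)
    return left, right

# Hypothetical sources, each with its own delay time and pan pair.
SETTINGS = {
    "rhythm guitar": (25.0, -0.6, 0.6),
    "keys":          (35.0, 0.5, -0.5),
    "lead guitar":   (45.0, 0.3, -0.7),
    "percussion":    (55.0, -0.4, 0.7),
    "pad":           (70.0, 0.7, -0.3),
}

SAMPLE_RATE = 1000
click = [1.0] + [0.0] * 99  # a single click makes the delay easy to see

for name, (delay_ms, dry_pan, wet_pan) in SETTINGS.items():
    left, right = widen(click, SAMPLE_RATE, delay_ms, dry_pan, wet_pan)
    echo_at = next(i for i in range(1, len(left)) if left[i] != 0.0 or right[i] != 0.0)
    print(f"{name}: echo arrives after {echo_at} samples")
```

Because each source gets its own delay time and pan pair, no two elements share the same implied position, which is the point made above about not feeding them all through one shared aux delay.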

Lastly, with regard to these delays, the same frequency realities apply here. That is, the farther away something is, the less distinct the LF and HF information is. And a delay, if it is to sound natural and fit the other spatial elements of the mix, will need to sound distant (since it is supposed to feel like it is bouncing off of a distant object). So when I set up my aux send for the delay I'll insert an EQ in front of the delay on its channel. Typically I'll use this EQ to apply gentle high and low shelves to remove these frequency extremes from the sources before they hit the delay. By putting the EQ in front of the delay instead of after it, you get to keep the full bandwidth of the delay processor itself (EQ'ing afterward results in the frequency content you want, but it doesn't sound as natural... you can hear the effect, and it doesn't sit as effectively in the mix).

In fact, I generally EQ all my delays and reverbs in my mixes these days. It really allows you to control the frequency content so you can reveal what you want to reveal within the mix, and the overall effect is much more natural.

Oh, and with respect to reverb and delay -- a little dab will do ya. That is, these effects, when used to create a natural sense of space, shouldn't draw your attention to the effects themselves. Ideally, they shouldn't really be noticeable apart from the track they're complementing. This is a common flaw among younger engineers, many of whom use way too much of an effect rather than reach for the tools and techniques discussed above.

When you have control of the dynamic and frequency content, combined with panning assignments, there isn't much need for noticeably audible effects settings (except for purely artistic reasons). Subtlety is key to making it feel natural and truly three-dimensional.

So.... when you bring all of these concepts together, at the same time, when mixing, you can see that there are many ways to create space and depth giving each component of the mix its own unique place. The combinations can be overwhelming, and mastering it is an ongoing process, really, just as mastering any instrument is.

Hopefully this provides some useful tips to help folks create more engaging, three-dimensional mixes. Have fun!

-Joel