Setting Up A More Advanced Home Video Recording Studio For Higher Quality Audio In Your Videos

Last month we published an in-depth look at creating a home recording studio for everything from filming tutorial videos and recorded talks to the ultimate video conferencing. At the time we walked through how to create a basic recording setup using a Mac Air, a Canon SL3 camera and a Blue Yeti USB microphone, including connecting the earphone output of the Blue Yeti to the microphone input of the SL3 to avoid the dreaded "lip sync delay."

One question we've heard since then is what a more sophisticated and professional recording arrangement would look like for audio. Below, we provide a reference recording setup we use that yields extremely high quality audio. While the professional audio world is increasingly moving towards digital hardware, we've purposely avoided incorporating digital components since they can have shorter lifespans and frequently rely on firmware and drivers that manufacturers can abruptly discontinue, stranding you. In particular, there is a growing trend among high-end digital audio devices today to have fewer and fewer user-accessible controls on the device itself, moving all of the controls to proprietary browser-based and mobile applications that can suddenly stop working with an OS upgrade or be discontinued, leaving users with an expensive piece of equipment permanently stuck on its last settings. For this reason we've focused here primarily on analog hardware with an eye towards longevity and independence, allowing each piece to be independently upgraded over time.

We've also purposely focused on the audio side of the equation, keeping the SL3 camera as the video capture device. This is because video standards are evolving at breathtaking speed. While video conferencing software still typically defaults to 720p, 1080p video is the standard for streaming, 4K is increasingly the standard for recorded video, DSLRs support 8K video and 12K cameras are increasingly accessible. This means that even if you spend a large amount of money on a top-end video camera, chances are that it will be obsolete and selling for a fraction of the price just a year or two later. In contrast, professional analog audio equipment is relatively stable, with most of the changes occurring in the studio and digital space, giving a quality audio pipeline a much longer useful life.

The list below describes in order the sound pipeline from microphone to final video, with each component plugging into the next one.

  • Microphone (SM7B). For those with the budget and an exceptionally quiet room, the Neumann U87 Ai is an exceptional studio microphone that is a standard at NPR, but is likely out of reach of most home users. Most importantly, this microphone is very good at picking up background noise, meaning if you are recording from an untreated home office or apartment, the recording may feature too much background noise. For those with less extreme budgets, the Shure SM7B and Electro-Voice RE20 are both excellent choices and are roughly the same price (the RE20 is typically priced about $50 above the SM7B). In practice the SM7B is frequently better at rejecting background noise, so in an untreated room with noisy neighbors it is one of the best at minimizing ambient and external noise. This also gives it a slight edge for podcasting scenarios where two or more speakers are close to one another in a minimally treated room, since it is better at rejecting other speakers' voices in the same room. The RE20 and SM7B have extremely different and distinct sound profiles, so often the choice between these two comes down to personal preference as to which one you feel sounds "best" with your specific voice. Note that the SM7B requires much higher gain and thus may not work well with cheaper prosumer-grade equipment that struggles to provide a full +60dB of gain. Here we show the reference arrangement we use with an SM7B.
  • On-Stage DS7200B Adjustable Desktop Microphone Stand. In our case we don't want the SM7B microphone visible in our videos, so the traditional boom arm mounting of the SM7B would be problematic. Instead the SM7B is mounted on an On-Stage DS7200B Adjustable Desktop Microphone Stand, which adjusts from 9" to 13", allowing the SM7B to be positioned just off camera at the perfect height, placing it just 3" from the speaker's mouth while remaining completely out of view.
  • Kopul Premier Quad Pro 5000 Series Cables. All XLR cables used here are Kopul Premier Quad Pro 5000 Series studio cables. While not quite at the level of Mogami cables, we've found them to offer exceptional quality and they use Neutrik XX connectors.
  • Cloudlifter CL-1. The SM7B requires an extraordinary amount of gain (typically around +60dB) which prosumer audio equipment may struggle to provide without distortion or other audio effects. Even equipment capable of comfortably providing the requisite amount of gain can introduce subtle coloration at such a high level of gain. Thus, the SM7B is typically paired with a Cloud Microphones Cloudlifter CL-1 that uses phantom power to provide +25dB of transparent gain. This requires a preamp capable of phantom power, but significantly reduces the gain demand on the preamp, reducing coloration and other negative effects, and allows the microphone to be used with nearly any available preamp rather than a more limited selection of preamps capable of a full +60dB of gain. The SM7B plugs directly into the CL-1.
  • DBX 286S. While the CL-1 Cloudlifter can be plugged directly into an audio interface or mixer, for the most polished sound quality you'll want to use a Channel Strip Processor, with one of the most popular being the DBX 286S. The CL-1 plugs directly into the XLR microphone input of the 286S with phantom power enabled. For vocal applications you'll typically want to enable the 80Hz high-pass filter as well, which cuts low-frequency rumble below the voice. Use the Process Bypass button to A/B test different configurations. The Compressor reduces the dynamic range of the audio, boosting quiet sounds and reducing loud ones, producing a more consistent volume level that mimics the "in your face" sound of broadcast radio. This can boost ambient noise like computer fan and A/C noise, so it may take some experimentation to get just right for your specific room and speaking style. The De-Esser reduces harsh sibilant "s" and "sh" sounds, while the Enhancers boost low and high pitched sounds. The Expander/Gate is what makes this unit especially powerful: it allows you to set a decibel level below which sound is attenuated by a certain amount. Every room has a certain amount of ambient noise, with the typical untreated or minimally treated home office having a substantial amount of ambient noise from fans, outdoors, etc. While remaining silent, adjust the threshold until the gate closes and the unit effectively mutes the microphone, so that it opens only while you speak and closes between words and sounds. This yields perfectly silent spacing between your words.
  • DBX 231S. We use a dual channel DBX 231S 31-band (1/3 octave) graphic equalizer for precise sound adjustment. Channel one is plugged into the Insert on the DBX 286S, allowing precise adjustment of the microphone input after the preamp but before the processors affect it. We use this to zero out specific ambient noise frequencies, such as computer fan and A/C noise, along just the bands and by just the amount necessary. This is especially important when using the DBX 286S' Compressor, which will boost soft ambient noise like fan noise, raising it above the gate threshold in many cases. This way we are able to largely zero it out before the Compressor sees it. The second channel is used as the Insert on the Mackie mixer we'll discuss in a moment. This allows us to precisely adjust the processed output of the 286S after all of the processors have affected it. Thus, we are able to fine-tune the audio profile both before and after processing, allowing for maximal flexibility and permitting one set of adjustments to minimize ambient noise that affects compression and another set to enhance the audio profile for final recording. Note that parametric equalizers are more commonly used today for their more precise surgical adjustment, but affordable parametric equalizers typically have only a small number of filters and in our case we've found that rejecting ambient noise in a typical home office requires a larger number of adjustments, beyond the ability of all but the most expensive parametric units.
  • Mackie ProFX10v3. The output of the DBX 286S is plugged into a Mackie ProFX10v3 mixer using line-level TRS. Those with more desk space or considering a greater variety of inputs might consider the ProFXv3 units with more inputs. In our case additional inputs are a Windows workstation outputting line level stereo analog via TRS and the Mac Air outputting via USB, allowing mixing of all three inputs. The ProFX10v3 supports four channel (twin stereo) USB input and two channel stereo output. Those with multitrack needs will have to look elsewhere, but most non-musical needs will be met by this unit, though podcasts will be mixed down to two channels, without the ability to separate the speakers onto their own tracks. During recording sessions we record the ProFXv3 directly to the Mac Air via USB to capture the highest quality ADC alongside the synced output to the Canon SL3, with the ProFXv3 natively supporting 24-bit/192kHz conversion. It includes a separate headphone out for monitoring with independent volume control and a separate control room output with its own volume control which we route to a Yamaha receiver and studio monitors.
  • Saramonic SR-AX101. In practice, one could technically connect the Main Out of the ProFX10v3 directly to the microphone input of the Canon SL3 and carefully adjust the output levels. While this will typically work and yield reasonable quality sound, it means connecting studio line level audio to a mic-level input (while some professional cameras do have a setting for line input levels, they are typically designed for the prosumer line output of portable digital audio equipment like portable recorders rather than the studio line output of stationary professional audio equipment). Instead we connect the Main Out of the ProFX10v3 to a Saramonic SR-AX101 passive adapter and the output of the SR-AX101 directly to the microphone input on the SL3 using an Audioquest Tower cable. The unit attenuates the line input by 40dB and has two adjustment knobs to permit fine adjustment. The SR-AX101 is passive, meaning there are no power supplies or batteries to worry about, and it supports both microphone and line inputs from consumer to prosumer to studio levels, offering one of the widest ranges for such adapters. Stereo XLR is run from the ProFX10v3 to the SR-AX101 (while the SM7B is a mono microphone it is mirrored to both channels and more complex audio processing pipelines may introduce stereo effects). The SR-AX101 also supports ground lift to eliminate ground loops and can internally mirror mono audio to stereo channels.
  • Gain Staging. While not a piece of hardware, the sheer number of audio devices in the audio pipeline above requires careful consideration of gain at each level, known as gain staging, which can take considerable effort in untreated spaces like home offices with computing equipment and auxiliary cooling.
  • Clarity M. For precision loudness and spectral analysis the Mackie's Main Out TRS outputs are connected to provide a parallel feed to a Clarity M Loudness Meter. This monitor receives the exact signal being sent to the Saramonic for the SL3 so provides true monitoring of the final audio signal being recorded. The Clarity M provides a wealth of volume statistics and includes profiles for most major film, broadcast and streaming standards. It includes a unique "radar" temporal loudness meter and includes a 1/3 octave RealTime Analyzer (RTA), offering precision spectral analysis that can be used to precisely tune the DBX 231S.
  • Canon SL3. Gain on the Canon SL3 camera is turned to one step above zero, effectively minimizing reliance on the internal preamps, with both automatic gain and wind noise features disabled as well. Despite the extensive signal path from the microphone to the camera, the analog zero-latency nature of the equipment above means there is no noticeable delay between the microphone and camera (adjust the Expander/Gate on the 286S if you find words truncated from a delayed or premature gate closure). This ensures that when the SL3 bundles the external audio from the SM7B with its own captured video, the two will be perfectly synced, despite the SL3, like all non-professional-grade DSLR video capture systems, introducing a significant processing delay from sensor to HDMI out (only true video cameras output near-zero-latency video signals). Remember to set the SL3 to output "Clean HDMI".
  • Elgato Cam Link 4K. Finally, the HDMI output of the SL3 is connected to an Elgato Cam Link 4K USB video capture device. Unlike a PCI capture card or much more expensive external capture unit, the Cam Link 4K is limited to 1080p60 and 2160p30, but has the advantage of being extremely portable and compatible with any computer with a USB port, including a Mac Air. Most importantly, we can switch it back and forth between machines, using it in both a Mac Air for 1080p60 video (which it works exceptionally well for) and a more powerful workstation for 2160p30 (sustained 4K video @ 30 fps is typically beyond the capability of current generation Mac Air laptops). Since the entire audio chain is terminated at the Canon SL3's HDMI output, the Cam Link 4K USB dongle can be plugged into any computer to feed it the video+audio signal, without any change to the rest of the pipeline. Simply unplug the Cam Link 4K from the Mac Air after filming 1080p @ 60fps content and plug it into a Windows workstation to film a 4K @ 30fps video.
  • Anker 13-Port USB Hub. The complete studio requires a number of USB devices to be plugged into the Mac Air (USB mouse, Cam Link 4K, Logitech Brio 4K "behind the scenes" USB camera, ProFX10v3, hardwired internet, SL3 USB). Combined these devices would likely exceed the power capabilities of a single USB port on the Mac Air and so a powered hub is needed. An Anker 13-Port USB Hub is used as a powered USB hub, providing both USB 3.0 support and full 60W support (many large-port-count powered hubs provide far less power and thus don't support expansion to large numbers of high-drain devices). The Anker hub is also made of aluminium allowing it to readily shed the heat that can be generated under significant load.
  • StarTech 12U Desktop Open Frame 2 Post Rack. All of the audio hardware above other than the ProFX10v3 is mounted in a StarTech 12U Desktop Open Frame 2 Post Rack that sits on the desk.
  • StarTech 16 Outlet PDU. Two StarTech 16 Outlet PDUs are mounted in the rack to manage all of the power plugs, one in always-on for the audio equipment and one that switches the lighting.
  • UPS. Even large apartment buildings can have unstable power with momentary brownouts and blackouts, especially during the high-demand summer months. In this case all equipment is connected to a UPS, though note that UPS units can introduce noise while on battery power.
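
The gain figures quoted in the list above can be turned into a rough gain-staging budget. The Python sketch below is only an illustration, not a measurement of our setup: it takes the roughly +60dB the SM7B is said to need, the +25dB the Cloudlifter CL-1 contributes and the 40dB attenuation of the SR-AX101 from the text, and works out what is left for the preamp and what the pad does to the signal voltage. The +4dBu figure is the commonly cited nominal level for professional line-level gear and is an assumption here, not a measured output of the ProFX10v3. The function names are hypothetical helpers written for this example.

```python
import math

# 0 dBu is defined as sqrt(0.6) volts RMS (about 0.775 V).
DBU_REFERENCE_VOLTS = math.sqrt(0.6)


def db_to_voltage_ratio(db: float) -> float:
    """Convert a gain (or loss, if negative) in decibels to a linear voltage ratio."""
    return 10 ** (db / 20)


def dbu_to_volts(dbu: float) -> float:
    """Convert a level in dBu to volts RMS."""
    return DBU_REFERENCE_VOLTS * db_to_voltage_ratio(dbu)


def remaining_preamp_gain(total_needed_db: float, inline_boost_db: float) -> float:
    """Gain the preamp must still supply once an inline booster has done its share."""
    return total_needed_db - inline_boost_db


# Figures taken from the text above.
SM7B_TOTAL_GAIN_DB = 60.0     # approximate gain the SM7B needs
CLOUDLIFTER_GAIN_DB = 25.0    # gain supplied by the Cloudlifter CL-1
PAD_ATTENUATION_DB = -40.0    # attenuation applied by the SR-AX101

# Assumption: nominal professional line level, not a measured mixer output.
ASSUMED_LINE_LEVEL_DBU = 4.0

if __name__ == "__main__":
    preamp_db = remaining_preamp_gain(SM7B_TOTAL_GAIN_DB, CLOUDLIFTER_GAIN_DB)
    print(f"Preamp must supply about {preamp_db:.0f} dB "
          f"(voltage ratio x{db_to_voltage_ratio(preamp_db):.1f})")

    line_volts = dbu_to_volts(ASSUMED_LINE_LEVEL_DBU)
    padded_volts = line_volts * db_to_voltage_ratio(PAD_ATTENUATION_DB)
    print(f"Assumed line level: {line_volts:.3f} V RMS")
    print(f"After the pad:      {padded_volts * 1000:.1f} mV RMS")
```

Because decibels add while voltage ratios multiply, the preamp's share drops from about 60dB to about 35dB once the Cloudlifter is in the chain, and a 40dB pad scales the voltage by a factor of 0.01, which brings a line-level signal down towards the range a camera's microphone input expects.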

We hope this reference architecture is helpful in designing your own home recording studio for these pandemic-era video recording times!