DAC / DA Conversion / Linear vs Minimum Phase

Digital audio requires an anti-aliasing filter to suppress high frequencies (at or above Nyquist, which is half the sampling frequency). On the playback side, the same job is done by what is usually called a reconstruction filter. Without this, an infinite number of different analog waves could pass through the digital sampling points. With this, there is exactly one analog wave that passes through them. The filter is essential to ensure the analog wave that the DAC constructs from the bits is the same one that was recorded and encoded.

Audiophiles debate whether linear or minimum phase anti-aliasing filters are ideal for sound reproduction and perception. Linear phase has the lowest overall distortion, but its symmetric response around transients (a bit of ripple just before and after a transient pulse), often called the Gibbs effect, means there is a “pre-echo” or “pre-ring”. In the diagram below, the red line is the signal and the black wave is the analog wave constructed from it using a linear phase filter.

If the X axis is t for time, this black curve is the function sinc(t). It is symmetric before and after the transient, which means it starts wiggling before the transient actually happens. This is unnatural; in the real world, all of the sound happens after the actual event. This pre-ringing is an artifact of linear phase anti-aliasing filters. Many audiophiles claim this is audible, smearing transients and adding “digital glare”.

Here’s what the audio books don’t always tell you. According to the Whittaker-Shannon interpolation formula, this sinc(t) response represents the “perfect” reconstruction of the band-limited analog signal encoded by the sampling points. The pre-ring is very low level, and it rings at the Nyquist frequency (half the sampling frequency). For CD, that is 22,050 Hz (octaves higher if the digital signal is oversampled, as it virtually always is). This makes it unlikely for anyone to hear it even under ideal conditions of total silence followed by a sudden percussive SMACK.
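To make the Whittaker-Shannon idea concrete, here is a minimal sketch in plain Python. It reconstructs a single-sample impulse by summing one sinc per sample, with time measured in sample periods. The names `sinc`, `reconstruct`, and `samples` are my own for illustration, and the finite 17-sample buffer is an assumption (the real formula sums over infinitely many samples).

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Whittaker-Shannon interpolation at time t (in sample periods).

    Each sample contributes one sinc centered on its own index.
    """
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

# A single impulse at index 8, silence everywhere else.
samples = [0.0] * 17
samples[8] = 1.0

before = reconstruct(samples, 6.5)  # 1.5 sample periods BEFORE the impulse
after = reconstruct(samples, 9.5)   # 1.5 sample periods AFTER the impulse

# The curve is already nonzero before the impulse, and mirrors the "after" side.
print(before, after)
```

The two printed values match: the wiggle 1.5 samples before the impulse is the mirror image of the wiggle 1.5 samples after it. At the sample points themselves the curve passes exactly through the data (1 at the impulse, 0 elsewhere), so the ripple only lives between samples.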

NOTE: I say “unlikely” not “impossible” because even though humans can’t hear 22,050 Hz (let alone frequencies octaves higher), it is at least feasible that somebody could still hear the difference. Under the right conditions, removing frequencies we can’t hear as pure tones causes audible changes to the wave. That doesn’t make sense mathematically, but human perception of the frequency & time domains is not as symmetric as Fourier transforms.

Now many audiophiles suggest minimum phase filters as an alternative to solve this problem. But I believe this cure is worse than the disease. Minimum phase filters do have an asymmetric response around transients, with no pre-ringing. A picture is worth 1,000 words, so here’s what that same impulse looks like when a minimum phase filter is used.

You can see that the impulse strikes instantly without any pre-ringing. Well, it actually rings louder and longer than the linear phase filter, but all of that ringing happens after the transient.

This has the added benefit that the ringing is masked by the sound itself for the simple reason that loud sounds psychoacoustically mask quiet ones. So what’s not to like here?

The problem is, minimum phase filters actually have more distortion (more ringing, more phase shift) than linear phase. So you get more distortion overall, but it’s time-delayed so you get cleaner initial transients with more distorted decay. And the phase shift caused by minimum phase filters happens all the time, not just in transients. So it seems you can have clean transients, or good phase response, but not both. Choose your poison.
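This trade-off can be seen even in a toy example. The sketch below builds two tiny 3-tap filters with identical magnitude response: one with symmetric taps (linear phase) and one with its energy pushed to the front (minimum phase). This is an illustration I made up, not a real DAC filter; the tap values and the helper names `convolve` and `freq_response` are assumptions chosen to keep the arithmetic easy to follow.

```python
import cmath

def convolve(a, b):
    """Multiply two FIR filters together (polynomial multiplication)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def freq_response(h, w):
    """Complex frequency response of taps h at w radians per sample."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))

# [1, 0.5] and its time-reversed twin [0.5, 1] have the same magnitude
# response but different phase. Pairing them two ways gives:
h_lin = convolve([1.0, 0.5], [0.5, 1.0])  # symmetric taps: linear phase
h_min = convolve([1.0, 0.5], [1.0, 0.5])  # front-loaded taps: minimum phase

print("linear :", h_lin)
print("minimum:", h_min)

for w in (0.3, 1.0, 2.5):
    lin = freq_response(h_lin, w)
    mn = freq_response(h_min, w)
    # Same magnitude at every frequency, different phase.
    print(w, abs(lin), abs(mn), cmath.phase(lin), cmath.phase(mn))
```

The linear phase taps peak in the middle, so something happens before the main peak (the pre-ring), and the phase comes out as a straight line in frequency, which is a pure delay. The minimum phase taps start at full height with nothing before them, but the phase is no longer a straight line, so different frequencies are delayed by different amounts.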

At this point a purist audiophile might hang his head in sadness. But there’s a better solution to the digital bogeyman of pre-ring: oversampling (or higher sampling rates). The phase distortion and ringing of any filter is related to its slope, or the width of its transition band. Oversampling further increases the frequency of the pre-ring (which was already ultrasonic) and allows a shallower slope with a wider transition band, which reduces distortion.

For example, consider CD, sampled at 44,100 Hz. Nyquist is 22,050 Hz and some people can hear 20,000 Hz, so the transition band runs from 20,000 to 22,050 Hz. That’s very narrow (only 0.14 octaves) and requires a steep filter with Gibbs effect pre-ring at 22,050 Hz. Oversample it 8x and the sampling rate becomes 352.8 kHz, so Nyquist is now 176.4 kHz and your transition band can run from 20k to 176.4k, which is 3.14 octaves (actually, you’d use a lower cutoff frequency, but it’s still octaves higher than 22,050 Hz). With this 8x oversampling, the frequency of the pre-echo ripple is 176.4 kHz. Absolutely inaudible; go ahead and use linear phase with no worries.
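The octave figures in the CD example are easy to check. This small sketch just redoes the arithmetic; the helper name `octaves` and the variable names are mine, and the 20,000 Hz hearing limit is the same round number assumed in the text.

```python
import math

def octaves(f_low, f_high):
    """Width of a frequency band in octaves (each octave doubles frequency)."""
    return math.log2(f_high / f_low)

fs_cd = 44_100          # CD sampling rate in Hz
hearing_limit = 20_000  # assumed upper limit of hearing in Hz

nyquist_cd = fs_cd / 2      # 22,050 Hz
nyquist_8x = 8 * fs_cd / 2  # 176,400 Hz after 8x oversampling

cd_band = octaves(hearing_limit, nyquist_cd)
os_band = octaves(hearing_limit, nyquist_8x)

print(round(cd_band, 2), round(os_band, 2))  # prints: 0.14 3.14
```

Going 8x adds exactly three octaves of room (8 is two cubed), which is why the transition band grows from 0.14 to 3.14 octaves.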

In short, use higher sampling frequencies (or oversample) not because you need to capture higher frequencies, but because it gives you a more gradual anti-aliasing filter which means faster transient response without any time or phase distortion.

This idea is nothing new. Most D-A converters already oversample, and have been doing so for decades. The pre-ring or ripple of a well-engineered DAC is negligibly small, ultrasonic and inaudible. However, many people prefer minimum phase filters in blind tests! How can we explain that? Minimum phase filters have no pre-ripple, yet they also have phase distortion and ring louder and longer. Blind tests only reveal whether people can hear differences; they don’t identify exactly what differences they were hearing. My guess is that they’re simply finding the greater phase distortion to be euphonic. This seems reasonable, given that preferences for vinyl records and tube amps are also common.

Here’s one example of such a test done a few years ago: http://archimago.blogspot.com/2015/04/internet-blind-test-linear-vs-minimum.html

This topic has been endlessly debated in audiophile circles for years. Keith Howard wrote a good article on it for Stereophile a few years ago: https://www.stereophile.com/reference/106ringing
I love the experimental attitude: test and discover! But when they talk about how hard it was to tell the filters apart, it is kinda funny thinking about a bunch of middle-aged guys wondering why they can’t hear an ultrasonic ripple octaves above the range of their hearing. Especially when most of them understand math & engineering well enough to know why.