All posts by Mike Clements

DACs and Digital Filters, Pushing the Limits

April 6, 2020Audio, TechMike Clements
by Mike Clements

I’ve discussed this topic before, here and here. A recent discussion at ASR led me to think about this further, devise some practical examples, and gain a deeper understanding, which I share here.

44-16 is a Tough Nut to Crack

It all started with the digital filters of the WM8741, which my DAC uses (article linked above). We tend to think of CD audio as being “perfect” for all practical purposes. It certainly is higher quality than lossless streaming, and perceptually transparent for most people. Yet at 44.1 kHz, none of the WM8741’s 5 filters was perfect from an engineering perspective. The closest were filters #3 and #5, which it labels “sharp linear phase” and “slow linear phase”, respectively.

Filter #3 has perfectly flat frequency response up to 20,021 Hz (0.454 fs at 44,100 kHz sampling) and no phase distortion. Problem is, it is too weak. At Nyquist (22,050) it is attenuated only 6.43 dB and the stopband (-110 dB) is 24,079 Hz (0.546 fs at 44,100 kHz sampling). The stopband being above Nyquist, it could allow high frequency noise to leak through.

Filter #5 is fully attenuated by Nyquist – the stopband (-110 dB) is 22,050 Hz. And it has no phase distortion. But the passband only goes up to 18,390 Hz, so it begins to attenuate below 20 kHz.

Neither of these filters is perfect, each is a compromise. Why is that? The problem is, the CD standard of 44.1 kHz sampling is so low, it forces a filter transition band that is very narrow (20,000 to 22,050; only 0.14 octaves). Even with modern hardware, it’s hard to implement digital filters that are correct from an engineering perspective and run in real-time, with these constraints. Something’s got to give: frequency response, phase response, or Nyquist attenuation.

Note that at 48 kHz, the WM8741’s filter is perfect. Fully attenuated at Nyquist, with no attenuation or phase shift below 20 kHz. So while 44.1 kHz may not be quite sufficient for implementing perfect real-time filters, it’s almost sufficient. It only takes just a little more “room” to make it perfect. By “room” I mean a wider filter transition band.

So which of these filters, #3 or #5, is better? At first I thought filter #5 was better because I considered full attenuation at Nyquist to be the most important feature of any digital reconstruction filter. Few people (not me) can hear above 18 kHz, so that is a small price to pay for full attenuation. But on further thought, I believe that filter #3 is better. To explain why, I’ll start with aliasing.

Aliasing

Most audiophiles have heard of aliasing and have some idea what it means. Yet surprisingly few have a solid grasp on the math behind what it actually is. I was one of them, so I did a little exploring to rectify that.

The Nyquist-Shannon theorem says if we sample at least twice as fast as the highest frequency we want to capture, our sampling points capture the wave with mathematical perfection. The Whittaker Shannon formula provides a method to perfectly reconstruct the analog wave from the digital sampling points. In both cases, limiting the bandwidth to frequencies below half the sampling rate (the Nyquist limit) is critical.

Note: the Whittaker-Shannon interpolation formula provides mathematically perfect reconstruction, but it is not the only way to reconstruct the analog wave. It requires summing an infinite series for every sampling point, and even when the series is truncated it is too computationally expensive to be practical for real-time decoding. Two common methods that DACs use are delta-sigma and R2R, which provide similar results. One can think of these as engineering compromises: mathematically imperfect, but requiring fewer computations.

For any frequency (below Nyquist) we encode digitally into sampling points, an alias is a different frequency (above Nyquist) that passes through the exact same sampling points. We can derive a mathematical relationship between frequencies and their aliases. Intuitively, each frequency and its alias are reflected across Nyquist. Put differently, they are equidistant from Nyquist, or that Nyquist is always the arithmetic average of a frequency and its alias.

At CD sampling at 44,100 Hz, Nyquist is 22,050 Hz, so we can encode any frequency below this. Examples:

The alias of 18,000 Hz is 22,050 + (22,050 – 18,000) = 26,100 Hz. That is: 18,000 and 26,100 are each 4,050 away from 22,050: one below it, one above it.
The alias of 1 kHz is 43,100 Hz; each is 21,050 away from Nyquist
The alias of 100 Hz is 44,000 Hz; each is 21,950 away from Nyquist

A picture’s worth a thousand words. In the following graphs, I use small numbers to keep it all simple, but it all extends to any sampling frequency. The entire X axis is 1 second, and we sample at 10 Hz, so Nyquist is 5 Hz.

Here is a 3 Hz wave.

At 10 Hz sampling, the alias of this 3Hz wave is 7 Hz, in red below.

Now recall what exactly it means to say that these 2 waves are aliases of each other at 10 Hz sampling: it means either of these waves can perfectly match the same sampling points.

We can see this below:

Hmmm… is that not obvious? OK try this:

The green shows the points where these waves intersect. Of course, intersecting means they are equal. Observe that these intersection points are perfectly evenly spaced in time. If you sampled either of these waves at these points, you would get the exact same thing. Both waves perfectly fit the sampling points. That is what aliasing means.

Note: the astute reader may notice that the above 2 waves intersect more often than the points noted in green. For purposes of digital sampling and reconstruction, it is sufficient that they pass through the same sampling points, and it's irrelevant whether they intersect more often than that.

Now suppose all you have are these sampling points, and you must construct the analog wave. You could construct either one! So the solution is ambiguous: how do you know which is the correct one — meaning the one that was recorded and encoded?

Recall the primary rule of digital recording: you must filter the analog wave to remove all frequencies above Nyquist. The same rule applies when reconstructing the wave from the sampling points. Alias pairs are always symmetrically centered around Nyquist; one above, one below. Thus, filtering to only frequencies below Nyquist eliminates the ambiguity during reconstruction.

A Simple Yet Clever Trick

One conclusion we can draw from the above is that frequencies close to Nyquist have aliases close to Nyquist. Grokking the fullness of this symmetry leads to a simple, yet clever trick when implementing digital reconstruction filters.

As we’ve seen above, the filter’s stopband should be no higher than Nyquist. But squashing the signal from full scale at 20,000 Hz to negative infinity (say -100 dB) at 22,050 Hz will cause passband artifacts, given real-time hardware limitations.

Yet consider what happens if we break the rules and shift the filter stopband a little above Nyquist. Remember how aliases reflect across Nyquist? We want the top of our passband to be 20 kHz, and Nyquist is 22,050. The difference is 2,050 Hz. Add that to Nyquist and we have 24,100 Hz. This is the alias of 20 kHz, when sampled at 44.1 kHz. What if we make this the filter stopband?

Any frequency below 20 kHz will have an alias above 24,100 Hz, so it will be fully attenuated. Conversely, any frequency between Nyquist and the stopband will have an alias above 20 kHz. And we stretched our filter transition twice as wide, making a gentler slope, easier to implement.

Thus, our digital filter will be imperfect from a math or engineering perspective, but perceptually transparent. It may leak some frequencies above Nyquist, which is by definition noise or distortion (call it “junk”). But all this “junk” and its aliases must be all above 20 kHz which is inaudible.

In this case, we shifted the filter stopband just a bit above Nyquist, to widen its transition band. We took advantage of aliasing symmetry, or the fact that frequencies near Nyquist have their aliases near Nyquist.

Of course, TANSTAAFL and this is no exception. This filter may leak some supersonic junk from 20 kHz to 24 kHz. This is inaudible in itself, but when it passes through analog circuits (preamps, power amps, speakers), harmonic and intermodulation distortion will create artifacts in the passband. However, this filter transition band from 20 to 24 kHz is strongly attenuated and most music has little or no energy up there to begin with. So pragmatically speaking, it should not be a problem. Even so, one can see why Wolfson’s engineers provided filter #5 as an alternative – being fully attenuated at Nyquist, it cannot leak any supersonic junk. So the engineers building devices that use the WM8741 can choose which filter makes the best compromise for their needs.

The WM8741 Uses this Trick

Now let’s take another look at the WM8741’s filter #3, at 44.1 kHz sampling. The passband goes up to .454 * fs, which is 20,021 Hz. The stopband is .546 * fs, which is 24,079 Hz. The range between them is the transition band.

Notice anything interesting about these numbers? The transition band is perfectly centered around Nyquist! By sampling frequency ratio, it’s .046 below and .046 above. By frequency, it’s 2,029 below and 2,029 above. Any frequency below 20,021 will alias above 24,049, so aliases of all passband frequencies are fully attenuated. This is the filter we just described above!

BTW, I don’t think this trick is unique to the WM8741. At ASR, reviews of various DACs show their “sharp, linear phase” digital filters down only 6 dB at Nyquist (22,050 Hz), and their stopband around 24 kHz. So it seems like common engineering practice, creative rule-breaking to stretch the limits and provide the best implementation possible given the constraints of 44.1 kHz sampling. Now I know why, and so do you!

If audio standardized on a higher sampling frequency (even only slightly higher like 48 kHz which is already used for DVD), or as DAC chips gain more processing power, these engineering compromises would become unnecessary.

Ubuntu 18 and Slow Network

March 22, 2020TechMike Clements
by Mike Clements

Recently my Ubuntu 18 laptop had intermittent very slow internet/network. After checking the usual causes (router, WiFi, etc.), I found this helpful article. Turns out what fixed it was one of those suggestions.

Edit this file:

sudo vi /etc/nsswitch.conf

Then change the line that says:

hosts:          files mdns4_minimal [NOTFOUND=return] dns mdns

To this:

hosts:          files dns mdns4_minimal [NOTFOUND=return] mdns

So, in hindsight, it seems to have been a DNS problem.

Update 1

While this seemed to fix the problem temporarily, network problems started again. After an hour or two of reading about the issue, I made another change. I learned that Ubuntu 18 changed the way network is configured, using a new service called “netplan”. The /etc/netplan directory should have config files, but mine was empty! I don’t know how it got emptied; I certainly didn’t do it. But I needed to create a default config file. I created a file called 01-netcfg.yaml:

network:
  version: 2
  renderer: networkd
#  ethernets:
#    wlan0:
#      dhcp4: true
#      nameservers:
#        addresses: [192.168.1.1]

After some experimentation I commented out the last few lines. This tells Ubuntu that the network manager service will control network. Then I configured it in the network WiFi GUI, enabling DHCP wasn’t enough; I also added my home router (192.168.1.1) to the DNS list.

Networking (more specifically, DNS resolution) was still slow and occasionally intermittent. Finally, I had to flush the DNS cache:

sudo systemd-resolve --flush-caches

Now, everything seems to be working again.

Update 2

This too only fixed things temporarily. I still was getting intermittent network problems – but only on WiFi, not when wired. It looks like Ubuntu 18’s new “netplan” was conflicting with the “networking” service. I configured netplan and stopped the networking service. That is:

In the /etc/netplan directory, I had this file named 10-netcfg.yaml, which tells the system that the desktop Network Manager app will control network setup:

network:
version: 2
renderer: NetworkManager

I created additional files AFTER the ones that are there, so they override this. These files look like this:

20-wlan0.yaml

# Enable this only if you don't want to use the desktop GUI
network:
  ethernets:
    wlan0:
      addresses: []
      dhcp4: true
      optional: true
  version: 2

30-eth0.yaml

# Enable this only if you don't want to use the desktop GUI
network:
  ethernets:
    wlan0:
      addresses: []
      dhcp4: true
      optional: true
  version: 2

Because they sort after the first file, they override it. Next, run this command:

mclement@clements6:~$ sudo netplan --debug generate
DEBUG:command generate: running ['/lib/netplan/generate']
** (generate:27016): DEBUG: 09:12:23.854: Processing input file /etc/netplan/10-netcfg.yaml..
** (generate:27016): DEBUG: 09:12:23.854: starting new processing pass
** (generate:27016): DEBUG: 09:12:23.854: Processing input file /etc/netplan/20-wlan0.yaml..
** (generate:27016): DEBUG: 09:12:23.854: starting new processing pass
** (generate:27016): DEBUG: 09:12:23.854: Processing input file /etc/netplan/30-eth0.yaml..
** (generate:27016): DEBUG: 09:12:23.854: starting new processing pass
** (generate:27016): DEBUG: 09:12:23.854: wlan0: setting default backend to 2
** (generate:27016): DEBUG: 09:12:23.854: Configuration is valid
** (generate:27016): DEBUG: 09:12:23.854: eth0: setting default backend to 2
** (generate:27016): DEBUG: 09:12:23.854: Configuration is valid
** (generate:27016): DEBUG: 09:12:23.855: Generating output files..
** (generate:27016): DEBUG: 09:12:23.855: networkd: definition wlan0 is not for us (backend 2)
** (generate:27016): DEBUG: 09:12:23.855: networkd: definition eth0 is not for us (backend 2)

Note the last 2 lines, which is “networkd” saying it won’t be managing these network connections.

Next, apply this configuration

mclement@clements6:~$ sudo netplan apply

Now, disable the system networking service

sudo service networking stop

At this point, my WiFi networking started working again, and was not slow anymore.

Update 3

Sigh… this again was only a temporary fix. Even after all of the above, network was still slow! Then I remembered that I was using 5 GHz WiFi, which has different frequencies/channels in different regions, so requires the device to know what country it is in. So I changed one more thing. Edit the file /etc/default/crda and set my country code. That is, the file originally looked like this:

# Set REGDOMAIN to a ISO/IEC 3166-1 alpha2 country code so that iw(8) may set
# the initial regulatory domain setting for IEEE 802.11 devices which operate
# on this system.
#
# Governments assert the right to regulate usage of radio spectrum within
# their respective territories so make sure you select a ISO/IEC 3166-1 alpha2
# country code suitable for your location or you may infringe on local
# legislature. See `/usr/share/zoneinfo/zone.tab' for a table of timezone
# descriptions containing ISO/IEC 3166-1 alpha2 country codes.

REGDOMAIN=

Note that REGDOMAIN was not set. I changed that last line to this:

REGDOMAIN=US

Since “US” is the code for my country.

What is interesting, is that I tested and 2.4 GHz WiFi worked fine all along. It was only 5 GHz WiFi that was intermittently broken. This would be consistent with not having the region set.

Given all this, I reverted the netplan changes above, so the desktop NetworkManager controls my networking.

Update 4

Again, this still didn’t fix the problem. However, the problem may have been the channel I was using on 5 GHz. This thread was helpful. Channel 149 was listed by “iw list”, but not by “iwlist chan”. I changed the router to use channel 48, which appears in both lists, and it is now working.

Conclusion

There’s a lot here, some of which wasn’t necessary. In summary, the fix was:

Configure /etc/netplan to tell the desktop Network Manager GUI to manage network connections. This is the default Ubuntu desktop setup.

Set /etc/default/crda to set the system country code (needed for 5 GHz).

Run iw list and iwlist chan to see which 5 GHz channels the WiFi card supports.

Configure my router to use one of these 5 GHz channels.

BANG! It’s still working overnight, fast and reliable. Problem solved.

Scrabble on Mobile

February 3, 2020Games, TechMike Clements
by Mike Clements

I’ve been playing Words with Friends with family, both near and far, for the past year or so. It’s similar to Scrabble but the scoring and rules are different enough I wanted to try good old fashioned Scrabble. I discovered that Hasbro and Electronic Arts collaborated to create a mobile version of Scrabble where the gameplay is similar to Words with Friends.

What’s Wrong with WWF?

The Words with Friends rules favor frequent players, which can be unfair. For example, it has short 5-move games you can play with its AI, and doing this earns you credits you can use to buy swaps and other advantages when playing other people. Also, WWF is generally easier than Scrabble encouraging crazy big plays. For example, every time you prepare a move it shows a bargraph showing how good that move is compared to the best available, so you know whether there’s a bigger scoring move, how much bigger it is, and whether it’s worth taking more time before submitting your play. Finally, the Scrabble app uses the official built-in Scrabble dictionary, where WWF has its own dictionary that is frustratingly inconsistent.

Scrabble is available on iOS and Android and it has the same rules & scoring as the good old board game you remember. And the familiar consistent dictionary. Once you get it set up and you log in, it works quite well. But getting there is much more difficult than it needs to be.

Where Are My Friends?

Installing is easy enough. But once installed, if you want to play with your friends, you all need to create accounts. Scrabble offers Facebook, but since I don’t have a FB account, I used the alternative option to create an Electronic Arts (EA) account.

I did this for both Michelle and myself. Then in the app I clicked “New Game”, then “Play with Friends”. The list of friends was empty, as expected since this was my first time. I tapped the “Find player” searchbox to enter her username but the phone’s keyboard didn’t appear, so I couldn’t enter anything. This was confusing: what is the point of a “Find player” search box, if it doesn’t let you type in anything to search for? I tried this on Michelle’s phone too, same behavior.

Then I googled the problem. Apparently, lots of people encounter this problem. With further reading and experimenting, here is the workaround that I cobbled together:

Key Facts

There is a site called Origin, owned by Electronic Arts.
When you create an EA account, it is also an Origin account.
In Scrabble, you can only play with people you have befriended on Origin.
Origin is both a web site, and a fat client application on Windows and Mac.
Finding and befriending other players can only be done in the application, not on the web site.

Workaround Steps

Point your browser at EA create a user account.
Point your browser at Origin and download the fat client (Windows or Mac).
- Since I run Linux, I used my Windows 10 VM running on VirtualBox.
Install the client app, run it, and log in as your EA user.
In the app, use the “find friends” feature to find your friend(s).
- You need to know their username or email.
For each one, click on them and send a friend request.
Your friends must follow the above steps, then accept your friend request.

After the above is complete, run Scrabble on your device. Pick “Create New Game”, then “Play With Friends”. Your friends should now appear in the list. Pick one and play!

Why Can’t I Submit my Move?

I’m in a game with Michelle and I submit a move. The submit button transforms to say “waiting”… and the app just hangs. A few minutes goes by, nothing. I close the app, open it again, and my move is gone as if I had never made it. I make the same move again, and submit it. Same behavior.

Long story short, the app had logged me out. But it didn’t tell me. And it just stopped working without any error message or indication why. Poking around, I went to settings and happened to see that I wasn’t logged in anymore. I logged in again. Then the app started working.

So, the app occasionally logs you out without telling you, and misbehaves for no apparent reason. Keep that in mind. If the app every acts strangely, go to settings and double-check your login status.

Conclusion

What’s really frustrating about this process is how obscure it is. The average person:

Won’t know why the app “Find player” search box doesn’t work.
Won’t know that EA is associated with Origin.
Won’t know that they can only find friends on Origin.
Won’t know that this feature only works in the Origin app, not the web site.
Won’t know that the app occasionally logs them out.

None of this is explained in the app or in help that I could find online. The first part is a one-time setup thing so that’s no a problem if you know to do it. The second part you just need to know about. But once you get past these, the app is pretty good. It’s fun to play old-fashioned Scrabble without any player aids which is more balanced and fair.

Loudness Wars and Classical Music

February 1, 2020Audio, TechMike Clements
by Mike Clements

Note: it turns out that my PC had a background app that was boosting the level by +10 dB. This didn’t show up in the audio panel, which had everything set to flat / zero. There was nothing wrong with this recording. However, I’ll leave this here since it talks about how to identify overly hot recordings and fix them as much as possible.

—

Until recently, classical music has been free of loudness wars nonsense. Most classical music recordings are made with maximum transparency, with little or dynamic range compression, equalization, or other processing. Classical music recordings still sound quite different, but the differences are due to the room, how it’s miced, types of mics, etc. Post-processing is kept to a minimum compared to other genres.

However, as an Idagio subscriber I’ve been listening to a wide variety of different music and recordings and recently found some that make me worry about this. Here is one example, and a few steps I took to “correct” it in Audacity. I use that word loosely because clipping loses information and any restoration is at best mathematically educated guesswork.

The recording is the Brahms Piano Trios played by Ax, Ma and Kavakos recorded on Sony in 2017. You can find it Idagio, Amazon and other places. When I first started listening to it I thought it was a great performance but it seemed a bit loud; I had to turn down the volume to a lower position than I normally use. Then, when the first crescendo came it sounded just a bit harsh and distorted. Not obvious, but just a bit “strained” sounding.

Out of curiosity I loaded the track into Audacity and this is what I saw:

Oops, that doesn’t look good. Let’s turn on “view clipping”:

Yowza! Those engineers really blasted this recording. Let’s zoom in on one of those clipped parts:

Yep, that is some serious clipping. This is not just intersample overs, it is actual honest-to-goodness clipping. They definitely over-baked this recording. Let’s shift the level down by 6 dB, then apply the “Clip Fix” tool with a threshold of 99%.

Holy smokes Batman! Even after a 6 dB reduction, restoring the peaks still clipped! Those engineers really blasted this recording. Let’s undo the clip fix, undo the 6 dB reduction, then reduce it by 9 dB and do another clip fix:

OK, that’s looking better. Now let’s look at the entire track, with view clipping enabled:

Good. After applying -9 dB and clip fix to every track, the new peak level was near -1 dB. So all was good. On listening, that harsh strained sound in the crescendos is gone. But of course, this doesn’t actually fix the problem. When the music is clipped, information is forever lost. We don’t know the shape of the waveform when it exceeded 0 dB. All clip fix does is restore a smooth curve which avoids the harsh sound of the sharp edge transitions of clipping.

Passive Attenuators

January 31, 2020Audio, TechMike Clements
by Mike Clements

Introduction

This is about passive attenuators. Sometimes called “passive preamps”, they are switchboxes with volume controls that typically have 24 to 128 discrete positions. Back in ’00 I designed and built one, and used it daily for over 10 years.

Passive attenuators get a mixed reaction from audiophiles. Some say they are the most transparent way to listen to music, better than any active preamp at any price. Others say they sound un-dynamic and flat. Audiophiles with EE backgrounds also have a mixed reaction to them. Some say they are transparent, others say they have high noise and non-flat frequency response.

In this article I’ll describe

System requirements for a passive to work well
How a passive actually works
Measurements of noise and frequency response comparing their performance to the best active preamps
Comparison to active preamps

1. System Requirements

It turns out all the above views have some thread of truth. How well a passive works depends on the system in which it is used. Here are the requirements:

Upstream devices (sources) have low output impedances
Downstream devices (destinations) have high input impedances
Short cables having low capacitance
Sources are “loud” with enough gain to drive destinations to full power

Put differently

You don’t need gain, you only need attenuation.
All your devices, upstream & downstream are solid state.
If you plug your sources directly into your power amp, it will drive it to extra loud levels you will never actually use.

Most solid state components and well engineered cables meet these requirements. A system that doesn’t meet these requirements is the exception, not the norm.

2. How a Passive Attenuator Works

A passive attenuator is a simple voltage divider. The source device signal is a voltage swinging from + to -. Send this voltage through 2 resistors in series, R1 and R2. The downstream device receiving the signal is in parallel with R2.

The voltage will have some drop across R1, and some drop across R2. How much it drops across each resistor depends on their impedance ratios. This determines the volume setting: how much it attenuates the signal.

The passive attenuator’s volume knob has a fixed number of discrete positions, typically spaced 0.5 to 2 dB apart. For example 24 positions about 2 dB apart, or 64 positions about 0.5 dB apart. Each position puts 2 different resistors in the signal path.

Before going further, let’s mention 2 simplifying assumptions:

The source device output impedance is zero
The destination device input impedance is infinite

These are not actually correct, but they are close enough. Most solid state sources have output impedances around 10 to 100 ohms. Most solid state amps have input impedances around 10,000 to 50,000 ohms.

2a. Source Load

The passive attenuator shows the same load (impedance) to the source device at every volume position. So the source doesn’t “care” what volume position you are using. Make this load high enough that it is easy for the source to drive it, but no higher. The source has to swing a voltage back and forth, and the higher the load impedance, the less current it draws. So higher impedance is an easier load. But too high an impedance creates higher noise (more on that later).

A 10k attenuator means R1 + R2 = 10,000 ohms at every volume position. A 5k attenuator mean they sum to 5,000 ohms. The most popular attenuator is 10k, though 5k and 20k are also used. From here on we’ll talk about 10k, but the reasoning can be applied to any value.

As a general rule, you want at least a 1:10 ratio from the source to the load. If the source has a 100 ohm output impedance, it wants to drive a load of at least 1,000 ohms. Typical solid state sources are less than this, so a 10k attenuator gives more than 1:100 ratio which is more than sufficient. If all your sources are under 500 ohms output impedance, then you should use a 5k attenuator.

Since R1 and R2 are in series, the total load the source sees is R1 + R2. Of course it’s a little less than this since the destination device is in parallel with R2 which lowers the resistance across R2. But its input impedance is so high it doesn’t materially affect it.

So now we have the first rule of a passive attenuator: each pair of resistors R1, R2, sum to 10,000 (or 5k, or 20k).

2b. Attenuation

We mentioned earlier that the ratio of R1 to R2 determines the attenuation. Here I’ll explain exactly what that means.

At every volume position, the total load is 10,000 ohms. If R1 makes up half of that, then half the voltage drops over R1 and the other half drops over R2. In this case, if the source signal is 2 V, then 1 V drops over R1 and 1 V drops over R2. If R1 makes up 75% of that, then 75% of the voltage drops over R1 and 25% drops over R2. In this case if the source signal is 2 V, then 1.5 V drops over R1 and 0.5 V drops over R2.

We convert these ratios into dB with the standard formula

20 * log(ratio) = dB

Harmonic Content, Bass and Energy

January 9, 2020Audio, TechMike Clements
by Mike Clements

Background

Most of the sounds we hear are made up of many different frequencies all vibrating together at the same time. The energy in a wave depends on its amplitude and frequency. The higher the amplitude, the more energy. Also the higher the frequency, the more energy. The amplitude part of this makes intuitive sense. The frequency part does too, but it is less obvious.

If the energy of a wave depends on its amplitude and frequency, this implies that if total energy is constant for all frequencies, then amplitude must drop with frequency.

Consider a musical instrument playing a sound. Since energy depends on amplitude and frequency, if it puts equal energy into all the frequencies it emits, then the higher frequencies must have a smaller amplitude. Musical instruments don’t actually put equal energy at all the frequencies they emit, but this does hold true roughly or approximately. If you do a spectrum analysis, they are loudest at or near the fundamental (lowest) frequency and their amplitude drops with frequency. Typically, roughly around 6 dB per octave. That is, every doubling of the frequency roughly halves the amplitude.

For example, here is amplitude vs. frequency for a high quality orchestral recording:

This graph shows amplitude dropping as frequency increases. Since energy is based on amplitude and frequency, this means roughly constant energy across the spectrum (all frequencies).

This implies that low frequencies are responsible for most of the amplitude in a musical waveform. So, if you look at a typical musical waveform, it looks like a big slow bass wave with ripples on it. Those ripples are the higher frequencies which have smaller amplitudes. Further below I have an example picture.

Audio Linearity

Audio devices are not perfectly linear. They are usually designed to have the best linearity for medium level signals, and as the signal amplitude approaches the maximum extremes they can become less linear. This is generally true with analog devices like speakers and amplifiers, and to a lesser extent with digital devices like DACs.

For example, consider a test signal like 19 and 20 kHz played simultaneously. If you encode this signal at a high level just below clipping, it’s not uncommon for DACs to produce more distortion than they do for the same signal encoded just a little quieter. I’ve seen much smaller level changes, like a 1 dB reduction in level giving a 24 dB reduction in distortion! The same can be true for amplifiers.

Incidentally, when companies publish specs for DACs or CD players, they typically measure distortion at around -20 dB. Yet they measure noise or SNR at full scale. So they’re not really telling the whole truth.

Furthermore, the lower the level of a sound, the fewer bits remain to encode it. 16-bit audio refers to a full scale signal. But a signal at -36 dB has only 10 bits to encode it because the 6 most significant bits are all zero. Because in music the high frequencies are at lower levels, they are encoded with even fewer bits, which is lower resolution. In our -36 dB example, high frequencies 3 octaves above the fundamental are likely 18 dB smaller, which is only 7 bits. When we consider that the lowest bit is dither, this is only 6 bits for the frequencies where our hearing is most sensitive!

The Redbook CD standard had a solution to this called pre-emphasis: boost the high frequencies before digital encoding, then cut them after decoding. This was an effective solution but is no longer used because it reduces high frequency headroom and most recordings today are made in 24 bit and are dithered when converted to 16-bit.

The Importance of Bass Response

One insight from the above is that bass response is more important than we might realize. At low frequencies (say 40 Hz), the lowest level of distortion that trained listeners can detect is around 5%. But at high frequencies (say, 2 kHz), that threshold can be as low as 0.5%.

So one could say who cares if an audio device isn’t perfectly linear? Because of the energy spectrum of music, the highest amplitudes that approach non-linearity are usually in the bass, and we’re 10 times less sensitive to distortion in the bass, so we won’t hear it.

But this view is incorrect. It is based on faulty intuition. The musical signal is a not a bunch of frequencies propagating independently. It is a single wave with all those frequencies superimposed together. Thus, the high frequencies are riding as a ripple on the bass wave. If the bass wave has high amplitude approaching the non-linear regions of a device, it is carrying the smaller amplitude high frequencies along with it, forcing even those smaller frequencies into the non-linear region.

A picture’s worth 1,000 words so here’s what I’m talking about, a snippet from a musical waveform. The ripples marked in red are the midrange & treble which is lower amplitude and normally would be centered around zero, but riding on top of the bass wave has forced them toward the extreme positive and negative ranges:

Speaker Example

Here’s another practical example. Decades ago, I owned a pair of Polk Audio 10B speakers. They had two 6.5″ midrange drivers, a 1″ dome tweeter, and a 10″ tuned passive radiator. The midrange drivers produced the bass and midrange. As you turned up the volume playing music having significant bass, at some point you started hearing distortion in the midrange. This is the point where the bass energy is driving the 6.5″ driver excursion near its limits where its response goes non-linear. All the frequencies it produces are more or less equally affected by this distortion, but our hearing is more sensitive in the higher frequencies so that’s where we hear it first.

Obviously, if you turn down the volume, the distortion goes away. However, if you use EQ or a tone control to turn down the bass, the same thing happens – the distortion goes away. Here the midrange frequencies are just as loud as before, but they’re perfectly clear because the distortion was caused by the larger amplitude bass wave forcing the driver to non-linear excursion.

Other Applications: Headphones

The best quality dynamic headphones have < 1 % distortion through the midrange and treble, but distortion increases at low frequencies, typically reaching 5% or more by the time it reaches down to 20 Hz. The best planar magnetic headphones have < 1% distortion through the entire audible range, even down to 20 Hz and lower. This is due in part to having a physically large driver, which moves less to produce a given volume level.

Most people think it doesn’t matter that dynamic headphones have higher bass distortion, because we can’t easily hear distortion in the bass. But remember that the mids and treble are just a ripple riding on the bass wave, and most headphones have a single full-range driver. If you listen at low levels, it doesn’t matter. But as you turn up the volume, the bass distortion will leak into the mids and treble and become audible.

Thus, low bass distortion is more important in a speaker or headphone, than it might at first seem. If the headphone or speaker has a separate bass driver with a crossover, then this doesn’t apply – the mids and treble aren’t affected by the bass excursions.

Test signals like frequency sweeps will not show this increased distortion, because they don’t play bass & treble at the same time.

Other Applications: amplifiers and DACs

Amplifiers and DACs have a similar issue, though to a lesser extent. This concept applies here as well – especially when considering the dynamic range compression that is so often applied to music these days.

Consider a digital recording that is made with dynamic range compression and leveled too hot, so it has inter-sample overs or clipping. Or, it may be perfectly clean, but with levels that are just below full scale. Sadly, this describes most modern music rock/pop recordings, though it’s less common in jazz and classical.

Most of the energy in the musical waveform is in the bass, so if you attenuate the bass you reduce the overall levels by almost the same amount. This will entirely fix inter-sample overs, though it can’t fix clipping. Remember the 19+20 kHz example above, showing that distortion increases as amplitude levels approach full scale? With most music, attenuating the bass will fix that too, since the higher frequencies are usually riding on that bass wave. For example, this explains how the subsonic filter on an LP may improve midrange and treble response.

VueScan Multi-Crop – How To

December 25, 2019TechMike Clements
by Mike Clements

Continued from a few years ago … VueScan is a great scanning app but it has a UI that only an engineer could love. Once you know how to do something, it’s efficient. But it can be hard to figure it out the first time. Multi-Crop is a feature that scans several things at once on the scanner deck and saves each as a separate file. I use this to scan 35mm film negatives, since my scanner can load 12 frames at a time. While this feature is very useful, it took me a while to figure out how it works.

Here, I describe how I use this feature with VueScan 9.7 and my Epson V600. The process should be similar with other scanners.

First, load your media in the scanner. For this, I use the 35mm film negative tray and load 2 parallel strips each having 6 photos. Getting them lined up perfectly is tedious and requires cutting the negative strips with sharp scissors, but essential for good results. I also recommend cleaning the negatives (I use Pec Pads and Pec-12) before mounting them in the tray.

When loading the film, read the fine print along its edge to get the vendor, brand and type. You will set this below, on the Color tab.

Next, turn on the scanner, then start VueScan, and make the right settings:

Settings: Input

Important settings:

Mode: Transparency
Media: Color Negative
Bits per pixel: 24 bit RGB
Batch scan: On
- This will make it scan each cropped sub-image and save as a separate file
Scan resolution: 3200 dpi
- anything higher is overkill for most film negatives

Snapshot:

Settings: Crop

Important settings:

Crop size: 35mm Film
Auto offset: check
Multi crop: 35mm Film
Show multi outline: check

Snapshot:

Settings: Filter

All settings to taste or as needed. You can set these for the individual slides in the batch, so whatever you set here are just defaults. I typically use:

Infrared clean: Light
Grain reduction: Light

Snapshot:

Settings: Color

Like the Filter tab, these are defaults and you can change them for individual slides. I typically use:

Color balance: Neutral
Black point: 0.1%
White point: 0.5%
Curve low: 0.25
Curve high: 0.75
Brightness: 1
negative vendor: from actual film type
negative brand: from actual film type
negative type: from actual film type

Screenshot:

Settings: Output

Important settings:

Default folder: make sure it exists
- Else VueScan won’t save the pictures and it won’t give you any error message.
Auto file name: check
JPEG, quality 95
- anything higher is overkill for most film negatives

Screenshot:

Settings Complete!

Now you’re done with setup. Select File/Save options and save these settings. Give them a name like “35mNegBatch”. In the future, load these settings and skip all the above steps.

Start Scanning

Hit the Preview button (lower left area of the screen). VueScan will scan, then a grid will appear in the scan overview area. It shows a dotted line rectangle over each of the slides. It won’t be perfectly lined up, but as long as it’s reasonably close it’s OK because you’ll fix that next.

You’ll see something like this:

Note: if it’s not even close, or if you haven’t filled the entire tray and you don’t want to waste time scanning blanks, you can go to the Crop tab and select Multi crop: Custom. Then do your own layout (rows, columns, sizes). That’s a different topic I might cover some other time.

Now back to the grid of scanned slides…

Note the blue <- and -> arrows at the bottom right of the screen
- Located to the right of the image zoom magnifying glass buttons
- These move the focus forward and back across the different images of the multi-crop.
Click the left <- button until it disappears; this moves to the 1st image.
When focused on each image, VueScan remembers the settings you make for that image.

Repeat the following steps for each image:

It’s not always clear which of the 12 pictures has the current focus, but zooming in and out will show you. So…
Click the magnifying + and – buttons (lower right of UI) repeatedly to center the current image.
- After doing this, you see something like this:
In the image preview, click inside the image near a corner, then drag a rectangle to mark the crop area to contain the image.
- As you do this, the image colors will change as VueScan applies the Color settings to the portion of the image that is being captured inside the crop rectangle.
- Now the screen looks something like this. You can see the dotted line rectangle around the image, and that the colors have improved (but they’re still washed out).
If needed, click the rotate buttons (lower right of UI) or Image|Mirror (menu item) to ensure the image is oriented correctly.
- Now the screen looks like this:
Images on old film (like this one) often look washed out. To fix this, go to the Filter tab and check either Restore colors, or Restore fading (whichever looks better).
- Now the screen looks like this:
Go to the Color tab to fine-tune the picture’s exposure.
- This is optional; if you’re happy with how the picture looks, skip this step.
- The key controls for this are:
- Color balance: Neutral, Landscape, etc.
- Black point: what % of the pixels are mapped to black (lowest intensity).
- White point: what % of the pixels are mapped to white (highest intensity).
- Curve low & high: set the shape of the contrast curve
  - Low and high are the 25th and 75th percentiles
  - If you set them to .25 and .75, you get a 1:1 linear mapping
  - To increase mid-intensity contrast, at the expense of losing detail in the darkest and lightest parts of the image: increase the low, decrease the high
After the image looks good, click the blue next -> arrow to focus on the next image.

When you’re done, click the previous <- arrow to review each of the images. VueScan will show each picture with its individual settings, so you can ensure they are all correct.

Now click the “Scan” button at the bottom left of the screen. VueScan will scan each image, which (with my Epson GT-X820 / V600) can take about 3 minutes per image (over 30 minutes for a deck of 12). This is fully automatic so you can walk away and come back later to check the results.

With the above settings, with my scanner, for each photo, VueScan produces a JPG file having approximately 4500×3000 resolution, about 2-4 MB in size (about 13-14 megapixels). This is plenty of resolution for typical 35mm film photos. But you may want to increase that if your photos came from professional equipment.

Corda Soul & WM8741 DAC Filters

December 24, 2019Audio, TechMike Clements
by Mike Clements

The Corda Soul uses the WM8741 DAC chip. Actually, it uses 2 of them, each in mono mode which gives slightly better performance. This chip has 5 different anti-aliasing reconstruction filters. The Corda Soul has a switch to select either of 2 different filters. Here I describe these filters, show some measurements I made, and from this make an educated guess which 2 of these filters the Corda Soul uses, at various sampling rates. At higher sampling frequencies the digital filter should make less difference; more on that here. My measurements and observations below are consistent with that.

Note: this DAC chip has a mode called OSR for oversampling. The Soul uses this chip in OSR high, which means it always oversamples the digital signal at the highest rate possible, to 192 or 176.4 kHz, whichever is an integer multiple of the source. For example, 44.1k is oversampled 4x to 176.4k and 96k is oversampled 2x to 192k. The function of the digital filters depends on this OSR mode.

Summary: the filters have 3 key attributes:

Frequency Response: how fast (sharp) or slow they attenuate high frequencies.
Frequency Response: the filter stop-band – is it above, at, or below Nyquist.
Phase: whether the filter is linear (constant group delay, FIR) or minimum phase (variable group delay, IIR).

This table summarizes key filter attributes – taken from the WM8741 data sheet linked above, for 44.1k / 48k sampling in OSR high mode.

Name	Rate	Phase	Passband	Stopband	Nyquist	Group Delay
1	sharp	lin [min?]	20,021 / 21,792	24,079 / 26,208	-6.02	43
2	slow	min [lin?]	17,993 / 19,584	23,020 / 25,056	-28.07	8
3	sharp	lin	20,021 / 21,792	24,079 / 26,208	-6.43	7
4	slow	min	18,390 / 20,016	22,050 / 24,000	-116.19	47
5	slow	lin	18,390 / 20,016	22,050 / 24,000	-122.6	8

Note: at 44.1 kHz sampling, filters 1 and 3 are almost identical. The first is called “soft knee” while the third is called “brickwall”. Yet strangely, their frequency response is the same (despite their names which suggest otherwise) and the only difference is that 1 has more group delay. This suggests that the labels for filters 1 and 2 might have been mistakenly reversed in the WM8741 data sheet. Brickwall is usually the standard sharp filter closest to the ideal mathematical response. But not here, because being only -6 dB at Nyquist, it can allow ultrasonic noise to leak into the passband.

Filters 4 and 5 are labeled as apodizing. From what I read, this means their stop-band is a little below Nyquist. Why set the stop-band below Nyquist? Theoretically this is unnecessary. The reason given is that rejecting the upper band just below Nyquist is supposed to be an extra-safe way of avoiding any distortion introduced by the AD conversion during recording. Here, the stop-band of the apodizing filters is at Nyquist, but that’s still a bit lower than the others which are above Nyquist (which is an improper implementation).

Based on the above chart, filter 5 is the most correct implementation because it is the only filter that is fully attenuated by Nyquist, with flat phase response (minimal group delay). However, filter 5 rolls off a little early to achieve this. If you want flat response to 20 kHz, filter 3 is the best choice, though it does so at the price of allowing some noise above Nyquist. If one wanted a minimum phase alternative, the best choice would be filter 4. Both 1 and 4 are minimum phase, but 1 is not fully attenuated at Nyquist. Filter 4 is. However, to achieve this, filter 4 sacrifices FR with an earlier roll off.

For comparison, here’s how these filters behave at 96k / 88.2 k sampling (also in OSR high mode).

Name	Rate	Phase	Passband	Stopband	Nyquist	Group Delay
1	sharp	lin [min]	19,968/18,346	48,000/44,100	-120.41	17
2	slow	min [lin]	19,968/18,346	48,000/44,100	–120.8	9
3	sharp	lin	40,032/36,779	48,000/44,100	-116.89	48
4	slow	min	19,968/18,346	43,968/40,396	-126.82	9
5	slow	lin	19,968/18,346	43,968/40,396	-130.52	8

At these higher sampling rates, all the filters are fully attenuated by Nyquist (or lower). That’s a good thing and Wolfson should have done this at the lower rates too. Also, filters 1, 2, 4 and 5 (all but 3) take advantage of the higher sampling frequency to have a wide transition band with gentler slope. This sacrifices response above 20k (which we don’t need) to minimize passband distortion, particularly phase shift. The numbers reflect this, as they all have flatter (better) phase response than filter 3.

As with the first table, filters 1 and 2 look like a mis-print; both have the same transition and stop bands. But all else equal, linear phase should have less phase shift, not more. This is probably a typo, because as you’ll see below, the impulse response for filter 1 is asymmetric, and for filter 2 is symmetric, and symmetric impulse response usually implies linear phase.

Based on this data, filters 2, 3 or 5 are the most correct implementations. Filter 3 has flat FR up to 40 kHz, but this extra octave comes at the price of a narrower transition band having more phase shift and group delay. Filters 2 and 5 have flatter phase response but start rolling off around 20 kHz to get a wider transition band. If one wanted a minimum phase alternative, filters 1 or 4 are the only choices and either would be fine.

I measured the Soul’s output with the digital filter switch in each mode, sharp and slow, using 2 test signals: a frequency sweep and a square wave. From this, I measured frequency and phase response, group delay and impulse response. Charts/graphs are below, in the appendix.

Here’s the square wave: first sharp, then slow:

Overall, at 44.1 kHz I observed 3 key differences:

In sharp mode, frequency response and group delay are both flat to 20 kHz.
In slow mode, frequency response starts to roll off and group delay starts to rise between 18 and 19 kHz.
In slow mode, the square wave shows no ripple before a transition, and ripples with greater amplitude and longer duration after a transition.
The above curves are similar when comparing the sharp & slow filters at 48k sampling.

From these observations I conclude that for 44.1k and 48k signals, the Soul uses filters 3 and 4 in sharp and slow modes, respectively. Here’s why:

Because FR is flat to 20 kHz in sharp mode, it must be using filter 1 or 3.
Because GD is flat in sharp mode, it must be using filter 3.
Because FR rolls off just above 18k in slow mode, it must be using filter 2, 4 or 5.
Because GD rises in slow mode, it must be using filter 4.

Appendix

I recorded these graphs using my sound card, an ESI Juli@. This is not a great setup, but it’s the best I can do without dedicated equipment.

PC USB Audio output –> Corda Soul USB input –> Corda Soul analog output –> sound card analog input

Details:

Configured the sound card for analog balanced input & output (flip its daughter board from unbalanced to balanced.
Cabled from Soul to Juli@, using 3-pin XLR to 1/4″ TRS.
On PC:
- Disable pulseaudio
- Use Room EQ Wizard (REQW) on PC, in ALSA mode
- Configure REQW
  - set desired sampling rate (44.1, 48, 88.1, 96)
  - set audio output to USB
  - set audio input to Juli@ analog
- Configure Corda Soul
  - Select USB audio input
  - Ensure all DSP disabled (knobs at 12:00)
  - Set volume as desired
    - measured at max: 0 dB
    - measured at 12:00; -16 dB; 34 clicks down
- Use REQW “Measure” function
- Confirm proper sampling rate light on Corda Soul

Important Note: My measurements depend as much on the Corda Soul as they do on the Juli@ sound card. For example, if the Juli@ rolls off the frequency response faster than the Soul, then I will measure the same FR in both mods of the Soul. And if the Juli@ applies a minimum phase filter that adds phase distortion, then I will measure that phase distortion in both modes of the Soul. This probably explains why the digital filter responses were so similar at 88 and 96 kHz.

Here are FR, phase, GD and impulse plots for all tested sampling rates. Each is sharp top, slow bottom. Observe that at multiples of 44.1k (44.1k and 88.2k), the sharp filter has flat phase response while the slow filter does not. But at multiples of 48k (48k and 96k), both filters have similar non-flat phase response. This is probably due to the Juli@ card. However, the comments below assume the Juli@ card is transparent and all differences are due to the Soul.

In all cases, both filters at all sampling rates:

Frequency response: starts to taper at 20 kHz for the widest possible transition band.
Impulse response: sharp is symmetric, slow is asymmetric.
Group delay: sharp is flatter than slow.
At high sampling rates, the difference between the filters becomes immaterial. This is consistent with theory.

44.1 kHz: sharp is filter 3 and slow is filter 4.

Sharp FR doesn’t taper until past 20k, so it must be filter 1 or 3.
Sharp has flat GD, so it must be filter 3.
Slow FR tapers past 19k, so it must be filter 4 or 5.
Slow has more GD than sharp, so it must be filter 4.

48 kHz: sharp is filter 3 and slow is filter 4, for the same reasons as above.

88.2 kHz: Sharp is filter 2 and slow is filter 1.

Both FR start to taper at 20 kHz, so neither can be filter 3.
Both have a stopband at 44,100 kHz (beyond 40k), so neither can be filter 4 or 5.
Sharp has flatter phase / less group delay, which is filter 2.

96 kHz: Sharp is filter 2 and slow is filter 1.

Both FR start to taper at 20 kHz, so neither can be filter 3.
Both have a stopband at 48 kHz (beyond 44k), so neither can be filter 4 or 5.
Sharp has flatter phase / less group delay, which is filter 2.

Blind Audio Testing: A/B and A/B/X

November 22, 2019Audio, MathMike Clements
by Mike Clements

Blind Testing: Definitions

The goal of a blind audio test is to differentiate two sounds by listening alone with no other clues. Eliminating other clues ensures that any differences detected were due to sound alone and not to other factors.

A blind audio test (also called A/B) is one in which the person listening to the sounds A and B doesn’t know which is which. It may involve a person conducting the test who does know.

A double-blind audio test (also called A/B/X) is one in which neither the person listening, nor the person conducting the test, knows which is which.

In a blind test, it is possible for the test conductor to give clues or “tells” to the listener, whether directly or indirectly, knowingly or unknowingly. A double-blind test eliminates this possibility.

What is the Point?

The reason we do blind testing is because our listening/hearing perception is affected by other factors. Sighted listening, expectation bias, framing bias, etc. This is often subconscious. Blind testing eliminates these factors to tell us what we are actually hearing.

The goal of an A/B/X test is to differentiate two sounds by listening alone with no other clues. Key word: differentiate.

A blind test does not indicate preference.
A blind test does not indicate which is “better” or “worse”.

Most people — especially audio objectivists — would say that if you pass the test, then you can hear the difference between the sounds. And if don’t, then you can’t. Alas, it is not that simple.

If you pass the test, it doesn’t necessarily mean you can hear the difference.
- You could get lucky: a false positive.
If you fail the test, it doesn’t necessarily mean you can’t hear the difference.
- You might tell them apart better than random guessing, but not often enough to meet the test threshold: a false negative.
If you can hear the difference, it doesn’t necessarily mean you’ll pass the test.
- False negative, like case (2).
If you can’t hear the difference, it doesn’t necessarily mean you’ll fail the test.
- False positive, like case(1).

Simply put, the odds are that if you pass the test, you can hear a difference, and if you fail, you can’t. But exceptions to this rule do happen, how frequently depends on the test conditions. Even a blind squirrel sometimes finds a nut!

Hearing is Unique

Hearing is quite different from touch or sight in an important way that is critical to blind audio testing. If I gave you two similar objects and asked you to tell whether they are exactly identical, you can perceive and compare them both simultaneously. That is, you can view or touch both of them at the same time. But not with sound! If I gave you two audio recordings, you can’t listen to both simultaneously. You have to alternate back and forth, listening to one, then the other. In each case, you compare what you are actually hearing now, with your memory of what you were hearing a moment ago.

In short: audio testing requires an act of memory. Comparing 2 objects by sight and touch can be done with direct perception alone. But comparing 2 sounds requires both perception and memory.

Audio objectivists raise a common objection: “But surely, this makes no difference. It only requires a few seconds of short-term memory, which is near perfect.” This sounds reasonable, but evidence proves it wrong. In A/B/X testing, sensitivity is critically dependent on fast switching. Switching delays as short as 1/10 second reduce sensitivity, meaning it masks differences that are reliably detected with instantaneous switching. This shows that our echoic memory is quite poor. Instantaneous switching improves sensitivity, but it still requires an act of memory because even with instant switching you are still comparing what you are actually hearing, with your memory of what you were hearing a moment before.

This leaves us with the conundrum that the perceptual acuity of our hearing is better than our memory of it. We can’t always remember or articulate what we are hearing. Here, audio objectivists raise a common objection: “If you can’t articulate or remember the differences you hear, then how can they matter? They’re irrelevant.” Yet we know from numerous studies in psychology that perceptions we can’t articulate or remember can still affect us subconsciously — for example subliminal advertising. Thus it is plausible that we hear differences we can’t articulate or remember, and yet they still affect us.

If this seems overly abstract or metaphysical, relax. It plays no role in the rest of this discussion, which is about statistics and confidence.

Accuracy, Precision, Recall

More definitions:

A false positive means the test said the listener could tell them apart, but he actually could not (maybe he was guessing, or just got lucky). Also called a Type I error.

A false negative means the test said the listener could not tell them apart, but he actually can (maybe he got tired or distracted). Also called a Type II error.

Accuracy is what % of the trials the listener got right. An accurate test is one that is rarely wrong.

Precision is what % of the test positives are true positives. High precision means the test doesn’t generate false positives (or does so only rarely). Also called specificity.

Recall is what % of the true positives pass the test. High recall means the test doesn’t generate false negatives (or does so only rarely). Also called sensitivity.

With these definitions, we can see that a test having high accuracy can have low precision (all its errors are false positives) or low recall (all its errors are false negatives), or it can have balanced precision and recall (its errors are a mix of false positives & negatives).

Computing Confidence

A blind audio test is typically a series of trials, in each of which the listener differentiates two sounds, A and B. Given that he got K out of N trials correct, and each trial has 2 choices (X is A or X is B), what is the probability that he could get that many correct by random guessing? Confidence is the inverse of that probability. For example, if the likelihood of guessing is 5% then confidence is 95%.

Confidence Formula

p = probability to guess right (1/2 or 50%)
n = # of trials – total
k = # of trials – successful

The formula:

(n choose k) * p^k * (1-p)^(n-k)

This gives the probability that random guessing would get exactly K of N trials correct. But since p = 1/2, (1-p) also = 1/2. So the formula can be simplified:

(n choose k) * p^n

Now, substituting for (n choose k), we have:

(n! * p^n) / (k! * (n-k)!)

However, this formula doesn’t give the % likelihood to pass the test by guessing. To get that, we must add them up.

For example, consider a test consisting of 8 trials using a decision threshold of 6 correct. To pass the test, one must get at least 6 right. That means scoring 6, 7 or 8. These scores are disjoint and mutually exclusive (each person gets a single score, so you can’t score both 6 and 7), so the probability of getting any of them is the sum of their individual probabilities. Use the above formula 3 times: to compute the probabilities for 6, then 7, then 8. Then sum these 3 numbers. That is the probability that someone will pass the test by randomly guessing to reach our decision threshold of 6. Put differently: how often people who are guessing will get at least 6 right.

Now you can do a little homework by plugging into this formula:

4 trials all correct is 93.8% confidence.
5 trials all correct is 96.9% confidence.
7 correct out of 8 trials (1 mistake) is 96.5% confidence.

The Heisen-Sound Uncertainty Principle

A blind audio test cannot be high precision and high recall at the same time.

Proof: the tradeoff between precision & recall is defined by the test’s confidence threshold. Clearly, we always set that threshold greater than 50%, otherwise the results are no better than random guessing. But how much more than 50% should we set it?

At first, intuition says to set it as high as possible. 95% is often used to validate statistical studies in a variety of fields (P-test at 5%). From the above definitions, the test’s confidence percentile is its precision, so we have only 5% chance for a false positive. That means we are ignoring (considering invalid) all tests with scores below 95%. For example, somebody scoring 80% on the test is considered invalid; we assume he couldn’t hear the difference. But he did better than random guessing! That means he’s more likely than not to have heard a difference, but it didn’t reach our high threshold for confidence. So clearly, with a 95% threshold there will be some people who did hear a difference for whom our tests falsely says they didn’t. Put differently, at 95% (or higher) we are likely to get some false negatives.

The only way to reduce these false negatives is to lower our confidence. The extreme case is to set confidence at 51% (or anything > 50%). Now we’ll give credit to the above fellow who scored 80% on the test. And a lot of other people. Yet this is our new problem. In reducing false negatives, we’ve increased false positives. Now someone who scores 51% on the test is considered valid, even though his score is low enough he could easily have been guessing.

The bottom line: the test will always have false positives and negatives. Reducing one increases the other.

Confidence vs. Raw Score

We said this above but it’s important to emphasize that confidence is not the same as raw test score. From the above, 7 of 8 is 96.5% confidence, yet 7/8 = 87.5%. In this case the raw score is 87.5% but the confidence is 96.5%.

If you get 60% of the trials correct, your confidence may be higher or lower than 60%. It depends on how many trials you did. The more trials you did, the more confident the 60% score becomes. For example, 3 of 5 is only 50% confidence; 6 of 10 is 62.3%; 12 of 20 is 74.8%. Getting 60% of the trials correct, you reach 95% confidence at 48 of 80, which is 95.4% confident.

The intuition behind this is that if you are doing only slightly better than guessing, consistency (more trials) is what separates random flukes from actual performance. If you flip a coin 6 times, you may frequently get 4 heads. But if you flip a coin 600 times, you will almost never get 400 heads. Put differently, you can sometimes win in Vegas, but you can’t consistently win else it would still be a desert.

Problem is, we’re limited in how many trials we can do. Listener fatigue sets in after 10 to 20 trials, skewing the results. You must take a break, relax the ears before continuing. So to get high sensitivity/recall from ABX testing requires multiple tests, in order to get high confidence from marginal raw scores.

Optimal Confidence

The ideal confidence threshold is whatever serves our test purposes. Higher is not always better. It depends on what we are testing, and why. Do we need high precision, or high recall? Two opposite extreme cases illustrate this:

High precision: 99% confidence
We want to know what audio artifacts are audible beyond any doubt.

Use case: We’re designing equipment to be as cheap as possible and don’t want to waste money making it more transparent than it has to be. It has to be at least good enough to eliminate the most obvious audible flaws and we’re willing to accept that it might not be entirely transparent to all listeners.

Use case: We’re debunking audio-fools and the burden of proof is on them to prove beyond any doubt that they really are hearing what they claim. We’re willing to accept that some might actually be hearing differences but can’t prove it (false negatives).

High recall: 75% confidence
We want to detect the minimum thresholds of hearing: what is the smallest difference that is likely to be audible?

Use case: We’re designing state-of-the-art equipment. We’re willing to over-engineer it if necessary to achieve that, but we don’t want to over-engineer it more than justified by testing probabilities.

Use case: Audio-fools are proving that they really can hear what they claim, and the burden of proof is on us to prove they can’t hear what they claim. We’re willing to accept that some might not actually be hearing the differences, as long as the probabilities are on their side however slightly (false positives).

Why wouldn’t we use 51% confidence? Theoretically we could. But there’s so much noise, our results become statistically meaningless. Using 75% reduces the noise (or false positives) while still recognizing raw scores only slightly better than random guessing, and using more trials to reduce false positives. For example, if our threshold raw score is 60%, we achieve 75% confidence at 15 of 25.

Conclusion

To mis-quote Churchill, “Blind testing is the worst form of audio testing, except for all the others.” Blind testing is an essential tool for audio engineering from hardware to software and other applications. For just one example, it’s played a crucial role in developing high quality codecs delivering the highest possible perceptual audio quality with the least bandwidth.

But blind testing is not perfectly sensitive, nor specific. It is easy to do it wrong and invalidate the results (not level matching, not choosing appropriate source material, ignoring listener training & fatigue). Even when done right it always has false positives or false negatives, usually both. When performing blind testing we must keep our goals in mind to select appropriate confidence thresholds (higher is not always better). High precision or specificity can be achieved in a single test, but high recall or sensitivity requires aggregating results across multiple tests.

Survey Bias

November 6, 2019Economics, PoliticsMike Clements
by Mike Clements

With Census 2020 coming around, the topic of survey bias will certainly arise. Drafting neutral surveys free of bias requires understanding in several disciplines from math, to language, psychology, demographics, and a quite a bit of experience & judgement. Here are some of the more obvious forms.

Sample Bias

People living in the same neighborhoods have some common demographics and common opinions on certain topics. This also applies in the virtual world: people who visit certain web sites (say, New York Times, Wired, and Wall Street Journal).

Sometimes sample bias can be unintentional and subtle. The people you surveyed had something in common that you didn’t know about.

Framing Effect Bias

People respond differently to questions depending on how you ask them or “frame” the question. This is one of the most important biases.

For example, 93% of students registered early when a late penalty fee was assessed. But only 67% registered early when the fee was called a discount for early registration.

Another example: suppose 600 people have a deadly disease. Treatment A is predicted to result in 400 deaths. Treatment B is 33% likely to have no deaths, but 67% likely for all 600 to die.

When framed positively: A saves 200 lives, and B has a 33% chance of saving all 600, and 67% chance to save nobody.
Here, A was preferred by 72% of people.

When framed negatively: With A, 400 people will die. B has a 33% chance that nobody will die, and a 67% chance that all 600 die.
Here, A was preferred by 22% of people.

In the long term, outcomes from A and B are the same. Yet how the question is framed made a huge difference in which people preferred.

Response (and non-Response) Bias

This is similar to sample bias. Different people have different rates of response to your survey. Here, you can get burned either way. If you sample every group at the same rate, the uneven response rates can bias your data. If you sample groups at different rates, you can introduce a new bias. Eliminating this kind of bias requires measuring the different response rates and carefully targeting your sampling.

Question Order Bias

The answers people give to early questions influence how they answer later questions. Thus, questions can be ordered to lead people to answer later questions in certain ways. In multiple choice surveys, this also applies to the order in which each question’s potential answers are provided.