Category Archives: Math

Blind Audio Testing: A/B and A/B/X

Blind Testing: Definitions

The goal of a blind audio test is to differentiate two sounds by listening alone with no other clues. Eliminating other clues ensures that any differences detected were due to sound alone and not to other factors.

A blind audio test (also called A/B) is one in which the person listening to the sounds A and B doesn’t know which is which. It may involve a person conducting the test who does know.

A double-blind audio test (also called A/B/X) is one in which neither the person listening, nor the person conducting the test, knows which is which.

In a blind test, it is possible for the test conductor to give clues or “tells” to the listener, whether directly or indirectly, knowingly or unknowingly. A double-blind test eliminates this possibility.

What is the Point?

The reason we do blind testing is that our listening/hearing perception is affected by other factors: sighted listening, expectation bias, framing bias, and so on. These effects are often subconscious. Blind testing eliminates them to tell us what we are actually hearing.

The goal of an A/B/X test is to differentiate two sounds by listening alone with no other clues. Key word: differentiate.

  • A blind test does not indicate preference.
  • A blind test does not indicate which is “better” or “worse”.

Most people — especially audio objectivists — would say that if you pass the test, then you can hear the difference between the sounds. And if you don’t, then you can’t. Alas, it is not that simple.

  • If you pass the test, it doesn’t necessarily mean you can hear the difference.
  • If you fail the test, it doesn’t necessarily mean you can’t hear the difference.
  • If you can hear the difference, it doesn’t necessarily mean you’ll pass the test.
  • If you can’t hear the difference, it doesn’t necessarily mean you’ll fail the test.

Hearing is Unique

Hearing is quite different from touch or sight in an important way that is critical to blind audio testing. If I gave you two similar objects and asked you to tell whether they are exactly identical, you can perceive and compare them both simultaneously. That is, you can view or touch both of them at the same time. But not with sound! If I gave you two audio recordings, you can’t listen to both simultaneously. You have to alternate back and forth, listening to one, then the other. In each case, you compare what you are actually hearing now, with your memory of what you were hearing a moment ago.

In short: audio testing requires an act of memory. Comparing 2 objects by sight and touch can be done with direct perception alone. But comparing 2 sounds requires both perception and memory.

Audio objectivists raise a common objection: “But surely, this makes no difference. It only requires a few seconds of short-term memory, which is near perfect.” This sounds reasonable, but the evidence shows it to be wrong. In A/B/X testing, sensitivity is critically dependent on fast switching. Switching delays as short as 1/10 of a second reduce sensitivity, masking differences that are reliably detected with instantaneous switching. This shows that our echoic memory is quite poor. Instantaneous switching improves sensitivity, but the comparison still requires an act of memory: even with instant switching, you are comparing what you are hearing now with your memory of what you were hearing a moment before.

This leaves us with the conundrum that the perceptual acuity of our hearing is better than our memory of it. We can’t always remember or articulate what we are hearing. Here, audio objectivists raise a common objection: “If you can’t articulate or remember the differences you hear, then how can they matter? They’re irrelevant.” Yet we know from numerous studies in psychology that perceptions we can’t articulate or remember can still affect us subconsciously (subliminal advertising, for example). Thus it is plausible that we hear differences we can’t articulate or remember, and yet they still affect us.

If this seems overly abstract or metaphysical, relax. It plays no role in the rest of this discussion, which is about statistics and confidence.

Accuracy, Precision, Recall

More definitions:

A false positive means the test said the listener could tell them apart, but he actually could not (maybe he was guessing, or just got lucky). Also called a Type I error.

A false negative means the test said the listener could not tell them apart, but he actually could (maybe he got tired or distracted). Also called a Type II error.

Accuracy is what % of the test’s verdicts (could / could not tell them apart) are correct. An accurate test is one that is rarely wrong.

Precision is what % of the test positives are true positives. High precision means the test doesn’t generate false positives (or does so only rarely). Also called positive predictive value.

Recall is what % of the true positives pass the test. High recall means the test doesn’t generate false negatives (or does so only rarely). Also called sensitivity.

With these definitions, we can see that a test having high accuracy can have low precision (all its errors are false positives) or low recall (all its errors are false negatives), or it can have balanced precision and recall (its errors are a mix of false positives & negatives).

A blind audio test is typically a series of trials, in each of which the listener differentiates two sounds, A and B. Given that he got K out of N trials correct, and each trial has 2 choices (X is A or X is B), what is the probability that he could get that many correct by random guessing? Confidence is the complement of that probability. For example, if the likelihood of guessing is 5% then confidence is 95%.

Confidence Formula

p = probability to guess right (1/2 or 50%)
n = # of trials – total
k = # of trials – successful

The formula:

(n choose k) * p^k * (1-p)^(n-k)

This gives the probability that random guessing would get exactly K of N trials correct. But since p = 1/2, (1-p) also = 1/2. So the formula can be simplified:

(n choose k) * p^n

Now, substituting for (n choose k), we have:

(n! * p^n) / (k! * (n-k)!)

However, this formula doesn’t give the % likelihood to pass the test by guessing. To get that, we must add up the probabilities of all passing scores. For example, consider a test consisting of 8 trials using a decision threshold of 6 correct. To pass the test, one must get at least 6 right. That means scoring 6, 7 or 8. These scores are mutually exclusive (each person gets a single score, so you can’t score both 6 and 7), so the probability of getting any of them is the sum of their individual probabilities.

Now you can do a little homework by plugging into this formula:

  • 4 trials all correct is 93.8% confidence.
  • 5 trials all correct is 96.9% confidence.
  • 7 correct out of 8 trials (1 mistake) is 96.5% confidence.
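
As a sanity check, here is a short Python sketch (mine, not from the original article) that computes the confidence for a passing threshold by summing the binomial probabilities as described above:

from math import comb

def confidence(n, k):
    # Confidence that scoring at least k correct out of n trials was not
    # random guessing: 1 minus the summed probabilities of every passing score.
    p_guess = sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n
    return 1 - p_guess

print(confidence(4, 4))   # 0.9375   -> 93.8%
print(confidence(5, 5))   # 0.96875  -> 96.9%
print(confidence(8, 7))   # 0.96484  -> 96.5% (7 or 8 correct out of 8)
print(confidence(8, 6))   # 0.85547  -> the 8-trial test with a threshold of 6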

The Heisen-Sound Uncertainty Principle

A blind audio test cannot have both high precision and high recall at the same time.

Proof: the tradeoff between precision & recall is defined by the test’s confidence threshold. Clearly, we always set that threshold greater than 50%, otherwise the results are no better than random guessing. But how much more than 50% should we set it?

At first, intuition says to set it as high as possible. 95% is often used to validate statistical studies in a variety of fields (a p-value threshold of 5%). From the above definitions, the test’s confidence threshold is its precision, so we have only a 5% chance of a false positive. That means we are ignoring (considering invalid) all tests with scores below 95%. For example, somebody scoring 80% on the test is considered invalid; we assume he couldn’t hear the difference. But he did better than random guessing! That means he’s more likely than not to have heard a difference, but it didn’t reach our high threshold for confidence. So clearly, with a 95% threshold there will be some people who did hear a difference for whom our test falsely says they didn’t. Put differently, at 95% (or higher) we are definitely going to get false negatives.

The only way to reduce these false negatives is to lower our confidence. The extreme case is to set confidence at 51% (or anything > 50%). Now we’ll give credit to the above fellow who scored 80% on the test. And a lot of other people. Yet this is our new problem. In reducing false negatives, we’ve increased false positives. Now someone who scores 51% on the test is considered valid, even though his score is low enough he could easily have been guessing.

The bottom line: the test will always have false positives and negatives. Reducing one increases the other.
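
To make the tradeoff concrete, here is a rough simulation sketch (my own illustration; the listener populations are assumptions: pure guessers at 50% per trial, genuine hearers at 80% per trial). Raising the passing threshold trades false positives for false negatives:

import random

def score(p_correct, trials=8):
    # Number of correct answers in one 8-trial blind test.
    return sum(random.random() < p_correct for _ in range(trials))

random.seed(1)
guessers = [score(0.5) for _ in range(100_000)]   # cannot hear the difference
hearers  = [score(0.8) for _ in range(100_000)]   # genuinely hear it, imperfectly

for threshold in (5, 6, 7, 8):   # minimum correct answers needed to pass
    fp = sum(s >= threshold for s in guessers) / len(guessers)   # false positive rate
    fn = sum(s < threshold for s in hearers) / len(hearers)      # false negative rate
    print(f"pass at >= {threshold}/8: false positives {fp:.1%}, false negatives {fn:.1%}")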

Optimal Confidence

The ideal confidence threshold is whatever serves our test purposes. Higher is not always better. It depends on what we are testing, and why. Do we need high precision, or high recall? Two opposite extreme cases illustrate this:

High precision: 99% confidence
We want to know what audio artifacts are audible beyond any doubt.

Use case: We’re designing equipment to be as cheap as possible and don’t want to waste money making it more transparent than it has to be. It has to be at least good enough to eliminate the most obvious audible flaws and we’re willing to accept that it might not be entirely transparent to all listeners.

Use case: We’re debunking audio-fools and the burden of proof is on them to prove beyond any doubt that they really are hearing what they claim. We’re willing to accept that some might actually be hearing differences but can’t prove it (false negatives).

High recall: 51% confidence
We want to detect the minimum thresholds of hearing: what is the smallest difference that is likely to be audible?

Use case: We’re designing state-of-the-art equipment. We’re willing to over-engineer it if necessary to achieve that, but we don’t want to over-engineer it more than justified by testing probabilities.

Use case: We’re testing audio-fools’ claims that they really can hear what they say they hear, and the burden of proof is on us to prove they can’t. We’re willing to accept that some might not actually be hearing the differences, as long as the probabilities are on their side, however slightly (false positives).

Conclusion

To mis-quote Churchill, “Blind testing is the worst form of audio testing, except for all the others.” Blind testing is an essential tool for audio engineering from hardware to software and other applications. For just one example, it’s played a crucial role in developing high quality codecs delivering the highest possible perceptual audio quality with the least bandwidth.

But blind testing is not perfectly sensitive, nor perfectly specific. It is easy to do it wrong and invalidate the results (not level matching, not choosing appropriate source material, ignoring listener training & fatigue). Even when done right it always has false positives or false negatives, usually both. When performing blind testing we must keep our goals in mind to select appropriate confidence thresholds (higher is not always better). We should recognize its limitations and take them into account when interpreting the results. Most blind testing is done with a decision threshold of 95% confidence, which minimizes false positives but increases false negatives; this means human hearing acuity is better than such tests indicate.

Fractional Octaves

I’ve been working with parametric EQ settings lately; here’s a quick cheat sheet.

Overview

We perceive the frequencies of sounds logarithmically. Each doubling of frequency is an octave. Thus, the difference between 40 and 80 Hz sounds the same as the difference between 4000 and 8000 Hz. Even though the latter difference is 100 times greater, it sounds the same to us. This gives a range of audible frequencies spanning 9 to 10 octaves, which is much wider than the range of frequencies of light that we can see (less than a single octave).

Ratios

Two frequencies 1 octave apart have a frequency ratio of 2:1; one has twice the frequency of the other. A half octave is halfway between them on a logarithmic scale. That is, some ratio R such that f1 * R * R = f2. Since f2 = 2 * f1, R is the square root of 2, or about 1.414. Sanity check: 40 * 1.414 = 56.6, and 56.6 * 1.414 = 80. Thus 56.6 Hz is a half-octave above 40, and a half-octave below 80. Even though 60 Hz is the arithmetic half-way point between 40 and 80 Hz, to our ears 56.6 sounds like the half-way point between them.

More generally, the ratio for the fractional octave 1/N is 2^(1/N). Above, N=2, so the half-octave ratio is 1.414. If N=3 we have the 1/3 octave ratio, which is 2^(1/3) = 1.260. Here is a sequence taken to 4 significant figures (a short code check follows the list):

  • 1 octave = 2.000
  • 3/4 octave = 1.682
  • 1/2 octave = 1.414
  • 1/3 octave = 1.260
  • 1/4 octave = 1.189
  • 1/5 octave = 1.149
  • 1/6 octave = 1.122
  • 1/7 octave = 1.104
  • 1/8 octave = 1.091
  • 1/9 octave = 1.080
  • 1/10 octave = 1.072
  • 1/11 octave = 1.065
  • 1/12 octave = 1.059
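
These ratios are easy to reproduce; a one-line check in Python (mine, not part of the original cheat sheet):

for frac in (1, 3/4, 1/2, 1/3, 1/4, 1/5, 1/6, 1/8, 1/10, 1/12):
    print(f"{frac:.4g} octave -> ratio {2 ** frac:.4f}")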

The last is special because in western music there are 12 notes in an octave. With equal temperament tuning, the notes are spaced at equal frequency ratios. Thus the frequency ratio between any 2 adjacent notes is the 12th root of 2, which is 1.059:1. Every note is about 5.9% higher in frequency than the prior note.
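
For example, a quick sketch (assuming the conventional A4 = 440 Hz reference, which is not specified above) stepping up from A by the 12th root of 2:

ratio = 2 ** (1 / 12)   # ~1.059, one semitone
a4 = 440.0              # assumed reference pitch (standard A440 tuning)
for i, name in enumerate(["A4", "A#4", "B4", "C5", "C#5", "D5", "D#5", "E5"]):
    print(name, round(a4 * ratio ** i, 1))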

Bandwidth with Q

Another way to express the frequency range or bandwidth of a parametric filter is Q. Narrow filters have big Q values, wide filters have small Q values. A filter 2 octaves wide (1 octave on each side of the center frequency) has Q = 2/3 = 0.667.

For a total bandwidth of N octaves (N/2 on each side of center frequency), the formula is:

Q = sqrt(2^N) / (2^N - 1)

Here are some example values. You can check them by plugging into the formula.

  • N=2, Q=0.667
  • N=1.5, Q=0.920
  • N=1, Q=1.414
  • N=2/3, Q=2.145
  • N=1/2, Q=2.871

Note that N here is the total bandwidth in octaves. If you think in terms of a ratio applied to each side of the center frequency (as in the example below), the total width is twice that fraction of an octave.
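
A quick check of the Q formula and the values above, in Python (my own sketch):

from math import sqrt

def bandwidth_to_q(n_octaves):
    # Convert total bandwidth in octaves to filter Q.
    return sqrt(2 ** n_octaves) / (2 ** n_octaves - 1)

for n in (2, 1.5, 1, 2/3, 1/2):
    print(f"N={n:g}: Q={bandwidth_to_q(n):.3f}")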

Gotchas

Whatever tool you’re using for this, make sure you know whether it expects total bandwidth around the center frequency, or bandwidth on each side. And make sure you know whether it expects frequency ranges as raw ratios, fractions of an octave, or Q.

Example

Suppose you are analyzing frequency response and see a peak between frequencies f1 and f2. You want to apply a parametric EQ at the center point that tapers to zero by f1 and f2.

First, find the logarithmic midpoint. Compute the ratio f2 / f1 and take its square root to get R. Multiply f1 by R, or divide f2 by R, and you’ll have the logarithmic midpoint.

For example if f1 is 600 Hz and f2 is 1700 Hz, the ratio is 2.83:1, so R = sqrt(2.83) = 1.683. Double check our work: 600 * 1.683 = 1010 and 1010 * 1.683 = 1699. Close enough.

So 1,010 Hz is the logarithmic midpoint between 600 and 1700 Hz. We center our filter here and want it to taper to zero by 600 and 1700. That range is a ratio of 1.683 on each side, which in the ratio list above is 3/4 octave per side, or 1.5 octaves total, giving Q = 0.920. So now we know the center frequency and width of our parametric EQ.
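
The whole example can be wrapped up in a few lines of Python (my own sketch, using the same Q formula as above):

from math import sqrt, log2

def parametric_eq(f1, f2):
    # Given the edges of a peak, return (center frequency, total octaves, Q).
    center = sqrt(f1 * f2)        # logarithmic midpoint, same as f1 * sqrt(f2/f1)
    octaves = log2(f2 / f1)       # total width in octaves
    q = sqrt(2 ** octaves) / (2 ** octaves - 1)
    return center, octaves, q

center, octaves, q = parametric_eq(600, 1700)
print(f"center {center:.0f} Hz, width {octaves:.2f} octaves, Q {q:.3f}")
# -> center 1010 Hz, width 1.50 octaves, Q 0.918 (close to the rounded 0.920 above)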

Velocity: Orbital vs. Escape

While thinking about escape velocity recently, I wondered why orbital velocity wasn’t the same as escape velocity. The intuition was: consider an object in a circular orbit around the Earth at speed v. If the object speeds up just a smidge, then its centrifugal force increases, which pulls it slightly further away from Earth, where gravity is weaker, so it goes even further away, etc. It seems like a positive feedback chain reaction, the object getting progressively further away from Earth. That would imply that orbital velocity equals escape velocity, because if you go even a smidge faster, you’ll eventually escape orbit.

However, I worked out the equations, and escape velocity is not equal to orbital velocity; it’s about 41% faster (a factor of the square root of 2). Upon further thought, I realized my earlier intuition missed a key point: as the object moving slightly faster goes further from Earth, its trajectory flattens out. When its trajectory is a circle, the force of Earth’s gravity is perpendicular to its motion, so it does not affect the object’s speed. But when the object’s trajectory flattens out, it’s no longer a circle, so Earth’s gravitational pull is no longer perpendicular to its motion. Some small portion of Earth’s gravitational pull is slowing it down! Then, of course, gravity pulls it forward, speeding it up as it comes around the other side of the ellipse.

So when the object speeds up a smidge, its orbit becomes elliptical. It has to go significantly faster than that to escape from Earth: in fact, about 41% faster, since the ratio is the square root of 2.
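
The familiar numbers for low Earth orbit bear this out; a quick check in Python (my own sketch, using standard constants):

from math import sqrt

G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
M = 5.972e24    # mass of the Earth, kg
r = 6.371e6     # radius of the Earth, m (roughly sea level)

v_orbital = sqrt(G * M / r)       # circular orbital speed
v_escape  = sqrt(2 * G * M / r)   # escape speed = sqrt(2) * orbital speed

print(f"orbital {v_orbital/1000:.1f} km/s, escape {v_escape/1000:.1f} km/s, ratio {v_escape/v_orbital:.3f}")
# -> about 7.9 km/s, 11.2 km/s, ratio 1.414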

This also means orbits are stable: if the velocity changes a bit the shape of the orbit changes, but it stays in orbit. If escape velocity equaled orbital velocity, orbits would be unstable: the slightest bump would send it out into space or spiraling inward.

When the math contradicts intuition, it leads to re-thinking which can deepen one’s intuitive understanding.

Escape Velocity

Escape Velocity is commonly described as the minimum speed an object must reach to escape the Earth (or other celestial body) into space. But this definition is ambiguous and can be misleading.

You can escape the Earth at walking speed, if you could walk straight up; you don’t need anywhere near escape velocity. Imagine a rocket launch; in the first few seconds just as it starts to move, it’s going up at walking speed. Theoretically, it could throttle back the engines to maintain that slight upward speed all the way into space, so long as it didn’t run out of fuel or become unstable. A space elevator could also leave Earth at mundane speeds.

The key to this ambiguity is escape velocity applies to a free body, an object that is passively moving according to the laws of physics, having no thrust of its own. In other words, if a rocket achieves escape velocity, it could at that point turn off its engines and it would still escape the Earth. Intuitively it seems the higher the altitude, the slower the escape velocity. This turns out to be correct.

Escape velocity is easy to understand and derive mathematically with some creative thinking. Imagine 2 objects in space (a big one and a much smaller one, like the Earth and a stone) surrounded by vacuum, no other objects. So there is no friction and no other bodies exerting gravitational pull. Suppose the stone is at rest relative to the Earth and almost infinitely far away. The gravitational pull is effectively zero. Imagine the stone precariously balanced just on the outer rim of Earth’s gravity well. Then you nudge the stone just a smidge toward the Earth, so it crosses that rim and the Earth starts pulling on it (and vice versa). It starts out slow, but accelerates toward the Earth incrementally faster and faster.

Eventually, when the stone reaches the Earth it will be moving very fast. Escape velocity is the speed it is going just before it smashes into the Earth. Or, if it misses the Earth, it’s the speed at its point of closest approach. More correctly and completely, the stone is always traveling at escape velocity at every moment along its path. The escape velocity for that distance from the Earth is the speed at which the stone is moving when it’s that far away.

Note: the statement above (that the stone is always traveling at escape velocity at every moment along its path) is the nut of this explanation. When you grok its fullness, you grok the fullness of escape velocity.

That’s because of conservation of energy. When the stone was at the rim of Earth’s gravity well, it had a lot of potential energy. At the point of closest approach, all that potential energy has been converted into kinetic energy. Assuming no atmosphere, no losses, the two energies are equal. So as the stone speeds past the Earth, slowing down due to the same gravitational pull that sucked it in, that kinetic energy is converted back into potential energy. So it must reach the exact same distance away when it peters out and eventually stops.

The direction of motion is irrelevant to escape velocity. Normally this seems counterintuitive, but understanding escape velocity with our theoretical example, you can easily see why direction doesn’t matter. At that point of closest approach, it doesn’t matter what direction the stone is moving relative to the Earth. It could be nearly straight up (can’t be exactly straight up, or it wouldn’t have missed), or nearly horizontal. If it’s going horizontal, it has to travel further to escape, but being horizontal, gravity isn’t pulling it as hard. These conflicting factors are equal and cancel each other. All that matters is the altitude (distance of closest approach), because the speed depends only how much energy it’s gained from Earth’s gravity field.

If, at that point of closest approach, the stone were moving any slower, then it would have less kinetic energy, and it would not go as far away. That means it won’t make it to the rim of Earth’s gravity well, so it will still be inside the well, reverse direction and eventually come back to Earth. So escape velocity is the minimum speed a free body can have and still escape the Earth.

Of course, in the real world direction does matter. The Earth has an atmosphere that creates a lot of friction and energy loss at high speeds. If you go straight up, you’re in the atmosphere for a shorter time, less energy loss. If you go horizontal, you’re in the atmosphere longer and will lose more energy.

Here is the mathematical derivation:

[Image: escape velocity derivation]
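
In outline (my summary of the standard energy-balance argument, following the reasoning above): set the stone’s kinetic energy at distance r equal to the potential energy it gave up falling in from effectively infinite distance, then solve for v:

\frac{1}{2} m v^2 = \frac{G M m}{r}
\qquad\Longrightarrow\qquad
v_{\mathrm{esc}} = \sqrt{\frac{2 G M}{r}}

Here G is the gravitational constant, M the mass of the Earth, m the mass of the stone (it cancels out), and r the distance from the Earth’s center. Since v_esc falls off as 1/sqrt(r), this also confirms the earlier intuition that escape velocity is lower at higher altitude.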

Infinite Numbers

ℵ0 (aleph-zero, also called aleph-null or aleph-naught) is the smallest infinity, the size of the set of natural numbers. It is countably infinite, which means there exists some method of counting that will eventually reach each item in the set.

The rational numbers – all fractions of the form p/q where p and q are natural numbers – are also of size ℵ0. One way to prove this is to demonstrate a method for pairing each natural number with a rational number, and show that every rational number will have a pair. The classic proof draws a table of rational numbers and walks through it starting in a corner and marching along diagonals.
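
A small Python sketch of that diagonal walk (my own illustration, pairing 1, 2, 3, ... with the positive rationals):

from math import gcd

def rationals():
    # Walk the p/q table along diagonals, skipping fractions (like 2/4)
    # that duplicate a value already visited.
    total = 2
    while True:
        for p in range(1, total):
            q = total - p
            if gcd(p, q) == 1:
                yield p, q
        total += 1

gen = rationals()
for n in range(1, 11):
    p, q = next(gen)
    print(f"{n} <-> {p}/{q}")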

This relies on the intuitive principle that you can pair off the elements of 2 sets with each other if and only if the sets are the same size.

To me it seems counterintuitive that the rational numbers are the same size as the natural numbers, even though this fact follows logically from the simple and intuitive proposition above. It seems like there are a lot more rational numbers. However, what follows seems even stranger to me.

The irrational numbers – π, e, and myriad others – are more numerous. Their size is a bigger infinity, the cardinality of the continuum (equal to ℵ1 if the continuum hypothesis holds). They are uncountable – there is no method of counting that will reach all of them. Every method you come up with will skip some. There is no way to pair them off with the rational or natural numbers – no matter how you do it, there will always be irrational numbers left over without a pair.

Despite being countable, the rational numbers are infinitely dense. Between any two of them lie infinitely many more. The irrational numbers are also infinitely dense. What is more, between any two rational numbers lie infinitely many irrational numbers. But we’d expect that, given there are more irrational numbers. Furthermore, and most strangely, between any 2 irrational numbers lie infinitely many rational numbers. How can that be, if irrationals outnumber rationals?

The proof is simple. Pick any two irrational numbers, n1 and n2, and let d = | n1 – n2 | be the gap between them. Choose a natural number q larger than 1/d, so that 1/q is smaller than the gap. Then the interval between n1 and n2 is wider than 1/q, so it must contain a multiple of 1/q – that is, a rational number p/q. Since this works for every larger q as well, there are infinitely many rational numbers between n1 and n2.
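
The same argument can be made concrete in a few lines of Python (my own sketch, ignoring floating-point subtleties):

from fractions import Fraction
from math import floor, sqrt, pi

def rational_between(x, y):
    # Return a rational strictly between two distinct reals by choosing a
    # denominator q such that 1/q is smaller than the gap between them.
    lo, hi = sorted((x, y))
    q = int(1 / (hi - lo)) + 1   # 1/q < hi - lo
    p = floor(lo * q) + 1        # smallest multiple of 1/q above lo
    return Fraction(p, q)

print(rational_between(sqrt(2), pi))   # a rational between sqrt(2) and pi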

It seems strange that 2 sets, each infinitely dense both in itself and in the other, can be of different sizes. But they’re both infinite, so this is probably just a manifestation of the intuitive difficulty of conceptualizing different sizes of infinity.