When digital audio came out I wondered how the number of bits per sample correlated to the amplitude of waves. I imagined that the total expressible range was independent of the size of the smallest discernible gradation. Since this appeared to be a trade-off, I wondered how anyone decided what was a good balance.
Later I realized this is a false distinction. First: the number of bits per sample determines the size of the smallest gradation. Second: total expressible range is not a “thing” in the digital domain. Third: if the total range is a pie of arbitrary size, dynamic range is the number of slices. The smaller the slices, the bigger the dynamic range.
Regarding the first: to be more precise, bits per sample determines the size of the smallest amplitude gradation, as a fraction of full scale. Put differently: what % of full scale is the smallest amplitude gradation. But full scale is the amplitude of the analog wave, which is determined after D/A conversion, so it’s simply not part of the digital specification.
Amplitude swings back and forth. Half the bits are used for negative, the other for positive, values. Thus 16 bit audio gives 2^16 = 65,536 amplitudes, which is 32,768 for positive and negative each (actually one of the 65,536 values is zero, which leaves an odd number of values to cover the + and – amplitude swings, making them asymmetric by 1 value, which is a negligible difference). Measuring symmetrically from zero, we have 32,768 amplitudes in either direction. So the finest amplitude gradation is 1/32,768 of full scale in either direction, or 1/65,536 of peak-to-peak. 16-bit slices the amplitude pie into 65,536 equal pieces.
Here’s another way to think about this: the first bit gives you 2 values and each additional bit doubles the number of values. Amplitude is measured as voltage, and doubling the voltage is 6 dB. So each bit gives 6 dB of range, and 16 bits gives 96 dB of range. But this emphasizes the total range of amplitude, which can be misleading because what we’re really talking about is the size of the finest gradation.
So let’s follow this line of reasoning but think of it as halving, rather than doubling. We start with some arbitrary amplitude range (defined in the analog domain after the D/A conversion). It can be anything; you can suppose it’s 1 Volt but it doesn’t matter. The first digital bit halves it into 2 bins, and each additional bit doubles the number of bins, slicing each bin to half its size. Each of these halving operations shrinks the size of the bins by 6 dB. So 16 bits gives us a bin size 96 dB smaller than full scale. Put differently, twiddling the least significant bit creates noise 96 dB quieter than full scale.
To check our math, let’s work it backward. For any 2 voltages V1 and V2, the definition of voltage dB is:
20 * log(V1/V2) = dB
So 96 dB means for some ratio R,
20 * log R = 96
where R is the ratio of full scale to the smallest bin. This implies that
R = 10 ^ (96/20) = 63,096
That’s almost the 65,536 we expected. The reason it’s slightly off, is that doubling the voltage is not exactly 6 db. That’s just a convenient approximation. To be more precise:
20 * log 2 = 6.0206
So doubling (or halving) the voltage changes the level by 6.0206 dB. If we use this more precise figure, then 16 bits gives us 96.3296 dB of dynamic range. If we compute:
20 * log R = 96.3296
R = 10 ^ (96.3296 / 20) = 65,536
When the math works, it’s always a nice sanity check.
The term dynamic range implies how “big” the signal can be. But it is both more precise and more intuitive to imagine the concept of dynamic range as the opposite: the size of the smallest amplitude gradation or “bin”, relative to full scale. Put differently: dynamic range is defined as the ratio of full scale, to the smallest amplitude bin.
With 16 bits, that smallest bin is 1/65,536 of full scale, which is 96 dB quieter. With 16-bit amplitudes, if you randomly wiggle the least significant bit, you create noise that is 96 dB below full scale.
With 24 bits, that smallest bin is 1/16,777,216 of full scale, which is 144 dB quieter. With 24-bit amplitudes, if you randomly wiggle the least significant bit, you create noise that is 144 dB below full scale.
Typically, the least significant bit is randomized with dither, so we get half a bit less dynamic range, so for 16-bit we get 93 dB and 24-bit we get 141 dB.
Practical Dynamic Range
Virtually nothing we record, from music to explosions, requires more than 93 dB of dynamic range, so why does anyone use 24-bit recording? With more bits, you slice the amplitude pie into a larger number of smaller pieces, which gives more fine-grained amplitude resolution–and, consequently, a larger range of amplitudes to play with. This can be useful during live recording, when you aren’t sure exactly how high peak levels will be. More bits gives you the freedom to set levels conservatively low, so peaks won’t overload, but without losing resolution.
Another reason that 24 bits can be useful is related to the frequency spectrum of musical energy. Most music has its maximum energy at or near its lowest frequencies, and from the lower midrange upward, energy usually drops by around 6 dB / octave. By the time you get to the top octave, the level is down 30 dB or more, so you’ve lost at least 5 bits of resolution — sometimes more. You might think that 16 – 5 = 11 bits is enough. But since the overall level was below full scale to begin with, you don’t have 16 bits. You typically have only 8 bits for these high frequencies, which is only 48 dB. Recording in 24-bit gives you 8 more bits which solves the problem, giving you 16 bits in this top octave.
Back in the late 80s there was another solution to this, part of the CD Redbook standard called “emphasis”. They applied an EQ to boost the top octave by about 10 dB before digitally encoding it, giving about 2 bits more resolution. Then after decoding it, then cut it by the same amount. In principle, it’s Dolby B for digital audio. However, this is never used anymore because the latest ADC and DACs are so much better now than they used to be.
However, once that recording is completed, you know what the peak level recorded was. You can up-shift the amplitude of the entire recording to set the peak level to 0 dB (or something close like -0.1 dB). So long as the recording had less than 93 dB of dynamic range, this transforms the recording to 16-bit without any loss of information (such as dynamic range compression).
In the extremely rare case that the recording had more than 93 dB of dynamic range, you can keep it in 24-bit, or you can apply a slight amount of dynamic range compression while in the 24-bit domain, to shrink it to 93 dB before transforming it. There are sound engineering reasons to use compression in this situation, even for purist audiophiles!
To put this into perspective: 93 dB of dynamic range is beyond what most people can pragmatically enjoy. Consider: a really quiet listening room has an ambient noise level around 30 dB SPL. If you listened to a recording with 93 dB of dynamic range, and you wanted to hear the quietest parts, the loud parts would would peak at 93 + 30 = 123 dB SPL. That is so loud as to be painful; the maximum safe exposure is only a couple of seconds. And whether your speakers or amplifier can do this at all, let alone without distortion, is a whole ‘nuther question. You’d have to apply some amount of dynamic range compression simply to make such a recording listenable.