An Objective Look Into Audio Sampling-Rate

Sampling-rate has been a topic of discussion ever since the introduction of digital music formats, in particular regarding the difference in audio quality between various sampling-rates and bit-depths. Whether a higher sampling-rate makes for a better audio experience is something all of us “audiophiles” wonder at one point or another, and the enormous mountain of marketing campaigns out there can make it incredibly difficult to reach a concrete answer. Many experts have weighed in on this topic too, and they generally seem to agree that there really isn’t much to gain from using higher sampling-rate files. Nevertheless, we decided to conduct in-house tests to see whether we should trust the experts or the companies trying to sell us “higher quality” music.

As many of us know, the music found on CDs has a sampling-rate of 44.1kHz and a 16-bit depth. There are a number of websites, such as HDTracks, that offer high quality downloads at up to 192kHz sampling-rate and 24-bit depth, and sometimes even apparently higher “quality” files in the DSD format. The most commonly used and available lossless format is FLAC, which is commonly distributed at up to 192kHz sampling-rate and 24-bit depth, so we will use FLAC to conduct these tests.
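
To put those numbers in perspective, the raw (uncompressed) data rate of PCM audio is simply sampling-rate × bit-depth × channels. The sketch below assumes 2-channel stereo, and the function name is ours. FLAC files are compressed, so real files come out smaller than these figures.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Uncompressed PCM data rate in kilobits per second.

    Assumption: stereo (2 channels) unless stated otherwise.
    """
    return sample_rate_hz * bit_depth * channels / 1000


formats = [
    ("44.1/16", 44_100, 16),
    ("48/24", 48_000, 24),
    ("96/24", 96_000, 24),
    ("192/24", 192_000, 24),
]

for label, rate, depth in formats:
    print(f"{label}: {pcm_bitrate_kbps(rate, depth):.1f} kbps")
# 44.1/16: 1411.2 kbps
# 48/24: 2304.0 kbps
# 96/24: 4608.0 kbps
# 192/24: 9216.0 kbps
```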

When considering whether or not a “higher quality” sampling-rate file will yield an improvement, there are only 2 questions we need to ask: “Can you hear a difference?” and “Would you hear a difference?”.
The first question is about what is physically possible, given the limits of human hearing.
According to Wikipedia:

What this means is that the absolute highest possible frequency of sound that a human can hear under the absolute best possible conditions is 28kHz, whilst the lowest is 12Hz. However, we have no information as to who it was that could hear those frequencies. It may very well have been a young child. Unfortunately, as we get older, our ability to hear higher frequency sounds decreases considerably, and if we consider the conditions under which most of us listen to our favourite tunes, none of them come even close to ideal laboratory conditions, especially when commuting on public transport. But, let’s still use 28kHz as the upper limit as it is, technically, still in the realm of possibility for humans.

The 2nd question, “Would you hear a difference?”, is somewhat harder to answer. It is difficult to reach a clear-cut answer because some people genuinely do hear better than others, so it is quite possible that they can hear a difference between files of various sampling-rates. However, the vast majority of people can’t tell the difference between a high quality MP3 file and a “simple” 44.1kHz/16-bit FLAC file during a blind test, let alone between FLAC files of various sampling-rates and bit-depths.

To illustrate these differences, we’ll be conducting a number of “null tests” for various sampling-rates and bit-depths, and representing the differences on a spectrogram. A null test is used to show absolute differences between 2 or more audio files. But before we get to the spectrograms, let’s first touch on the basic concept of any audio waveform.

 

Wave Patterns

Below is an example of a simple sine wave pattern.

Notice that the “crests” (top parts) reach a level of +1, whilst the “troughs” (bottom parts) reach -1. These crests and troughs also represent the movement of the driver (e.g. +1 is pushed out and -1 is pulled in), and a level of 0 means no movement (i.e. no sound).
Below is an image of the exact same wave pattern that has been inverted.

As we can see, all crests and troughs have swapped positions, which means that all +1 values are now -1, and vice versa. The purpose of a null test is to combine a “control signal” (the original audio file) and the inverted signal.
The image below illustrates the inverted signal superimposed onto the original signal.

What this shows us is that each +1 value, and each -1 value has an identically inverted counterpart. If we think of simple mathematics, 1-1=0, and -1+1=0. So, when combining 2 perfectly matched signals like this, the result will be dead silence, as shown below.

This is essentially how noise-cancelling headphones work too.
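
The cancellation described above can be sketched in a few lines of Python. This is only an illustration: the 440Hz tone, the 48kHz sampling-rate, and the helper names are our own choices, not part of the tests in this article.

```python
import math


def sine_wave(freq_hz, sample_rate_hz, n_samples):
    """Generate a sine wave with values between -1 and +1."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def invert(samples):
    """Flip the polarity: every crest becomes a trough and vice versa."""
    return [-s for s in samples]


def mix(a, b):
    """Combine two signals by adding them sample by sample."""
    return [x + y for x, y in zip(a, b)]


original = sine_wave(440, 48_000, 480)   # 10 ms of a 440 Hz tone
nulled = mix(original, invert(original))

# Every sample cancels against its inverted counterpart.
print(max(abs(s) for s in nulled))  # 0.0
```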

So the point of the null test is to combine an inverted music file with a non-inverted file, and then see whether or not there is a difference. There are 2 scenarios here. Either the 2 files are audibly identical, in which case there would be no sound that we can hear, or there would be an audible difference between them, which would yield an audible sound. We specifically put an emphasis on “audible” as physical differences may occur between 2 files, but those differences can very well fall entirely out of the range of human hearing. So, relying on sound alone can be incredibly difficult, but luckily we have computers which can translate sound into a visible illustration with the help of a spectrogram.
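
As a rough sketch of how such a residual could be measured rather than just listened to, the snippet below nulls a test tone against a copy of itself that has been rounded to 16-bit values, and reports the loudest leftover sample in decibels relative to full scale. The test tone and the function names are assumptions made for illustration. Real files would first have to be decoded and brought to the same sampling-rate.

```python
import math


def quantize(samples, bits):
    """Round samples in the range -1..+1 to the nearest step of a given bit-depth."""
    scale = 2 ** (bits - 1) - 1
    return [round(s * scale) / scale for s in samples]


def null_residual_db(reference, candidate):
    """Peak level of (reference - candidate) in dB relative to full scale.

    Subtracting the candidate is the same as adding its inverted copy.
    """
    peak = max(abs(r - c) for r, c in zip(reference, candidate))
    return float("-inf") if peak == 0 else 20 * math.log10(peak)


tone = [math.sin(2 * math.pi * 440 * n / 48_000) for n in range(480)]

print(null_residual_db(tone, tone))                      # -inf (perfect null)
print(round(null_residual_db(tone, quantize(tone, 16))))  # a very quiet residual, in dB
```

A residual that measures far below the music itself is a physical difference, but not necessarily an audible one.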

 

Sampling-rate Null Tests

Below is a spectrogram image of Amber Rubarth’s “Ball and Chain” from her Scribbled Folk Symphonies album in 192kHz sampling-rate, 24-bit depth.

For now, let’s focus on the black part. Anything that is pure black means that there is no sound at that frequency. This is the important part to remember when looking at the spectrogram of the combined signals later. It is also important, however, to keep in mind that the spectrogram illustrates the loudness of various frequencies too, but does not mean that those frequencies will be audible to humans (irrespective of how loud that frequency is being played). We also see that some of the sound goes right up to 35kHz, well past the 28kHz limit of what some humans may technically be able to hear under the absolute ideal conditions.
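
A spectrogram is built from many short frequency analyses stacked side by side. The sketch below computes just one such slice with a plain discrete Fourier transform, using only the standard library. The test signal (a loud 1kHz tone plus a quiet 30kHz tone at a 192kHz sampling-rate) is an assumption chosen for illustration. Real spectrogram software uses a faster FFT and overlapping, windowed frames.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate_hz):
    """Return (frequency_hz, amplitude) pairs for one frame of audio.

    Naive DFT; amplitude is scaled so a sine of amplitude A that lands
    exactly on a bin reads as roughly A (the 0 Hz and top bins would need
    a different scale factor, which is ignored here).
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        spectrum.append((k * sample_rate_hz / n, abs(acc) * 2 / n))
    return spectrum


SAMPLE_RATE = 192_000
N = 192  # 1 ms frame, so each bin is 1 kHz wide

frame = [math.sin(2 * math.pi * 1_000 * i / SAMPLE_RATE)
         + 0.1 * math.sin(2 * math.pi * 30_000 * i / SAMPLE_RATE)
         for i in range(N)]

# Keep only the bins that are not "black" (here: amplitude above 0.01).
present = [(f, a) for f, a in magnitude_spectrum(frame, SAMPLE_RATE) if a > 0.01]
for freq, amp in present:
    print(f"{freq / 1000:.0f} kHz: amplitude {amp:.2f}")
# 1 kHz: amplitude 1.00
# 30 kHz: amplitude 0.10
```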

The first test will be to see if there is any difference between the original 192/24 file, and a file that has been converted to a 96kHz sampling-rate, 24-bit depth file. Below is the spectrogram image of those two files combined.

As we can see, everything is black, which means there is absolutely no data lost, and as such, no sound difference between the 2 files.

Next up is the spectrogram image for the test with a 48kHz sampling-rate, 24-bit depth file.

Here we can see that there is a difference between the 2 files. Below about 23kHz everything is identical, so the differences all lie above that point. Since part of that range still falls under our 28kHz upper limit, some people may, under ideal laboratory conditions, be able to hear whatever sound is being produced at those frequencies. These differences represent details that are “lost” by converting to a lower sampling-rate. However, also note the colour: whilst there is a difference, the volume level is incredibly low, and it will be extremely difficult to hear (even under ideal conditions). We should note, however, that some songs do have sounds at much louder volume levels at those frequencies, but it’s still very unlikely that you’d be able to hear those sounds, even under ideal conditions.

The final test is with a standard “CD quality” 44.1kHz sampling-rate, 16-bit depth file.

Here we can see a slightly larger difference, with everything below about 21kHz being identical to the original 192/24 file. But again, the volume level of these “lost details” is incredibly faint.
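
The cut-off points seen in these tests line up with a well-known rule: a digital file can only represent frequencies up to half its sampling-rate (the Nyquist frequency). A quick sketch, with a function name of our own choosing:

```python
def nyquist_khz(sample_rate_khz):
    """Highest frequency a given sampling-rate can represent."""
    return sample_rate_khz / 2


for rate in (192, 96, 48, 44.1):
    print(f"{rate} kHz sampling-rate -> up to {nyquist_khz(rate):.2f} kHz")
# 192 kHz sampling-rate -> up to 96.00 kHz
# 96 kHz sampling-rate -> up to 48.00 kHz
# 48 kHz sampling-rate -> up to 24.00 kHz
# 44.1 kHz sampling-rate -> up to 22.05 kHz
```

The 96kHz file can hold everything up to 48kHz, which covers the roughly 35kHz of content in this track, so nothing is lost. The 48kHz and 44.1kHz files top out at 24kHz and 22.05kHz respectively. The differences we measured start a little lower (about 23kHz and 21kHz), presumably because the converter’s filter begins rolling off just before the limit.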

We decided to test a “lossy” format as well, just for the heck of seeing how much more of a difference there would be.

This test yielded a much larger difference, especially at frequencies below 5kHz. These sounds are absolutely audible to the vast majority of humans. What these changes show is a combination of details that are lost with the lossy format (320kbps LAME MP3), as well as data that may have been changed during the conversion process.
Interestingly enough, once the original and inverted files were combined and played back, it was quite easy to actually make out the lyrics of the song. That’s how much “detail” is thrown out with a lossy file.
However, the most interesting thing to take away from this is that it has been proven time and again in proper blind tests that very few (if any) people can honestly tell the difference between a lossless file and a high quality lossy file (such as a LAME 320 MP3 file). So, if such a big physical difference exists between lossy and lossless, yet pretty much no one can actually tell them apart audibly, it’s safe to say that you’d gain absolutely nothing from listening to higher sampling-rate files instead of “standard” CD-quality files (44.1kHz sampling-rate, 16-bit depth).
Keep in mind, however, that this does not mean that a player which can play 192/24 files will sound the same as one that can only play lossy formats. The way the player (or external DAC) decodes the files and converts them into analogue signals can have a huge impact on how the music sounds. Hardware specs and implementation are key here, and will arguably play a far greater role when it comes to audio quality and fidelity.

So, now that we know what the differences between each of these sampling-rates and bit-depths are in terms of the audio, which one should you use? Well, there are a few things to take into consideration.


Cost: Storage

Firstly, there is an impact with regard to cost. All of this data needs to be stored somewhere, and the more storage you need, the more you’ll need to spend. Storage space can have a significant impact on the final cost of your portable audio setup. So, let’s have a look at just how much storage would be needed for various file sizes.

Compared to the original 192/24 file, the 96/24 file takes up nearly 40% less space, the 48/24 file almost 64% less, and the 44.1/16 file close to 83% less. That’s quite a large difference, and could potentially have a significant knock-on effect in terms of cost. For example, let’s have a look at some specs of a typical album. Below are some details of Lorde’s Pure Heroine album in 192/24 quality.

Here we can see that the average track length for this album is around 3 minutes and 43 seconds, with an average bitrate of 5135kbps. That gives an average file size of about 140MB per track.
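
The file size follows directly from bitrate × duration. Here is a small sketch of that arithmetic (the function name is ours). Note that the result depends on whether a megabyte is counted as 1,000,000 bytes or 1,048,576 bytes, and the figure of about 140 quoted above sits between the two.

```python
def file_size_mb(bitrate_kbps, minutes, seconds):
    """Approximate file size in decimal megabytes (1 MB = 1,000,000 bytes)."""
    duration_s = minutes * 60 + seconds
    kilobits = bitrate_kbps * duration_s
    return kilobits / 8 / 1000  # kilobits -> kilobytes -> megabytes


size_mb = file_size_mb(5135, 3, 43)
size_mib = size_mb * 1_000_000 / 1_048_576  # binary megabytes

print(f"{size_mb:.1f} MB")    # 143.1 MB
print(f"{size_mib:.1f} MiB")  # 136.5 MiB
```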

We conducted a short survey of 47 participants, which revealed that their music libraries consisted of 2893 tracks on average. If we assume that a person’s entire library is made up of only 192/24 files, a library of 2893 tracks would require roughly 396GB of available storage space.
A 396GB music collection wouldn’t fit onto a single MicroSD card (since the largest capacity commercially available right now is 256GB), and would instead require two 256GB cards. However, by converting all of your music down to 96/24, you’d only need around 238GB, which fits on a single 256GB card. Going further, if you down-convert to 48/24, you’d require 143GB of space, which still calls for a single 256GB card. But, if you go even further and convert all your files to 44.1/16, your entire collection would need 68GB of storage, in which case only a single 128GB card would be needed, and you’d still have enough space left over to almost double your collection in the future.
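
The card arithmetic above can be sketched as follows. The library sizes are the figures quoted in this article; the list of card capacities and the helper names are assumptions, and the sketch ignores the fact that a formatted card holds slightly less than its nominal capacity.

```python
import math

# Library sizes in GB for a 2893-track collection, as quoted in the article.
LIBRARY_GB = {"192/24": 396, "96/24": 238, "48/24": 143, "44.1/16": 68}

# Assumption: the card capacities considered here.
CARD_SIZES_GB = (64, 128, 256)


def cards_needed(library_gb, card_sizes=CARD_SIZES_GB):
    """Return (count, capacity): the smallest single card that fits,
    or else how many of the largest card are required."""
    for size in sorted(card_sizes):
        if library_gb <= size:
            return 1, size
    largest = max(card_sizes)
    return math.ceil(library_gb / largest), largest


def percent_smaller(original_gb, converted_gb):
    """How much less space the converted library needs, in percent."""
    return (1 - converted_gb / original_gb) * 100


for fmt, size in LIBRARY_GB.items():
    count, capacity = cards_needed(size)
    saving = percent_smaller(LIBRARY_GB["192/24"], size)
    print(f"{fmt}: {size} GB, {saving:.1f}% smaller, {count} x {capacity} GB card")
# 192/24: 396 GB, 0.0% smaller, 2 x 256 GB card
# 96/24: 238 GB, 39.9% smaller, 1 x 256 GB card
# 48/24: 143 GB, 63.9% smaller, 1 x 256 GB card
# 44.1/16: 68 GB, 82.8% smaller, 1 x 128 GB card
```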

If we take a look on Amazon.com at the current cost of MicroSD cards, you’ll see that there’s quite some saving to be had. Using the Samsung EVO+ line of MicroSD cards for comparison, the 128GB version costs 70% (roughly $110) less than the 256GB version. That works out to $0.36/GB for the 128GB version, and $0.61/GB for the 256GB version. The 64GB version, on the other hand, costs roughly 80% less than the 256GB version, and equates to about $0.47/GB. So clearly the 128GB card offers the best value.
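
As a quick sanity check, the per-gigabyte figures above can be turned back into approximate card prices. The prices below are derived from the article’s own per-GB numbers, not taken from any listing, so treat them as estimates.

```python
# Price per GB for each card capacity, as quoted in the article.
PRICE_PER_GB = {64: 0.47, 128: 0.36, 256: 0.61}


def implied_price(capacity_gb):
    """Approximate card price implied by its per-GB cost."""
    return capacity_gb * PRICE_PER_GB[capacity_gb]


def percent_cheaper(capacity_gb, reference_gb=256):
    """How much cheaper a card is than the reference card, in percent."""
    return (1 - implied_price(capacity_gb) / implied_price(reference_gb)) * 100


best_value = min(PRICE_PER_GB, key=PRICE_PER_GB.get)

for capacity in sorted(PRICE_PER_GB):
    print(f"{capacity} GB: ~${implied_price(capacity):.2f}, "
          f"{percent_cheaper(capacity):.0f}% cheaper than the 256 GB card")
print(f"Best value: {best_value} GB")
# 64 GB: ~$30.08, 81% cheaper than the 256 GB card
# 128 GB: ~$46.08, 70% cheaper than the 256 GB card
# 256 GB: ~$156.16, 0% cheaper than the 256 GB card
# Best value: 128 GB
```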

So, in order to fit your entire 396GB collection of 192/24 files, you’d need two 256GB cards, at a total cost of a little over $300 (excluding shipping and taxes). That’s a pretty penny just for storage, and also assumes that your audio player of choice can take two cards simultaneously (unless you plan on swapping cards).
Since that entire 396GB collection could be down-converted to take up only 143GB of space (converting to 48/24), that would cut your cost in half already. Or, if you down-convert to 44.1/16, you could save another $108 or so by only purchasing a single 128GB card. All of this means that you could potentially be looking at a saving of over $260.
Another potential saving comes if you’re in the market for a new audio player and find yourself stuck between 2 players: one with 2 card slots, and one with only a single card slot. If the player with a single card slot is considerably cheaper, do you really need the one that features 2 card slots (assuming both players can produce the same audio quality)? Moreover, you might find a player that is cheaper, has great specs and features, but can “only” play up to 96/24 files. Would that really be a limitation in terms of audio quality?

 

Cost: Albums

The other aspect, in terms of cost, is what the actual music files will cost you. The Scribbled Folk Symphonies album, for example, can be found on HDTracks in a number of formats and sampling-rates. The 192/24 version will set you back nearly $25.

The 96/24 version will set you back just under $18, which means a nearly 30% decrease in cost to you already.

The “CD quality” 44.1/16 version will save you even more cash, coming in at about $12, which is a nearly 52% saving compared to the 192/24 album.

So, as we can see, there is an almost immediate and considerable benefit to using lower sampling-rate files in terms of how big of a hole it’ll burn in your wallet.

For the sake of this illustration, let’s tally up the potential costs thus far.
If we stick with the idea of a collection consisting of 2893 tracks, how many albums would that be? On average, an album has around 13 tracks, so a collection of 2893 tracks would equate to roughly 223 albums. So, at a cost of around $25 per album, that would mean that the entire collection would cost around $5575. That’s quite a chunk of change. Then add in the cost of the two 256Gb cards needed to store the collection, and you’re looking at a final cost of nearly $5890 just for music and storage.
Conversely, if you were to purchase the music in 44.1/16, the cost of the collection would come to around $2670. Adding in the cost of the 128Gb card, that gets you to a little more than $2720. That’s a lot of cash to save. In fact, you could be saving nearly $3200 in that scenario. Those savings could go a long way in buying better audio equipment instead, or buying even more music.
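
The tally above can be reproduced with a short sketch. The album prices and track counts are the article’s figures; the storage costs are our own rounding of the card prices implied by the per-GB figures in the storage section (about $156 per 256GB card and $46 for a 128GB card), so the totals are approximate.

```python
TRACKS = 2893
TRACKS_PER_ALBUM = 13

# Approximate album prices in dollars, as quoted in the article.
ALBUM_PRICE = {"192/24": 25, "44.1/16": 12}

# Assumption: storage costs rounded from the per-GB figures quoted earlier
# (two 256 GB cards for 192/24, one 128 GB card for 44.1/16).
STORAGE_COST = {"192/24": 2 * 156, "44.1/16": 46}


def collection_cost(fmt):
    """Total cost of music plus storage for one format, in dollars."""
    albums = round(TRACKS / TRACKS_PER_ALBUM)  # roughly 223 albums
    return albums * ALBUM_PRICE[fmt] + STORAGE_COST[fmt]


hi_res = collection_cost("192/24")
cd_quality = collection_cost("44.1/16")

print(hi_res)               # 5887
print(cd_quality)           # 2722
print(hi_res - cd_quality)  # 3165
```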

All of this is entirely hypothetical, of course, and does not necessarily reflect what would happen in reality. Instead, this example is simply intended to illustrate the potential cost benefits of buying and using lower sampling-rate and bit-depth files.


Power

The 3rd aspect to take into account is the effect that higher sampling-rate files can have on your audio player. A higher sampling-rate requires more processing power, which translates into more power needed. Now, when we talk about portable audio devices, power consumption can be a significant factor to take into account.
We’ve tested this idea on FiiO’s latest player, the X5 3rd generation, and found that after 8 hours of continuous playback, a 192/24 album used up 77% of the battery, whilst the same album converted to 48/24 only used up 68% (9 percentage points, or roughly 12%, less battery used). At 44.1/16 the player seemed to have used the same amount of power as the 48/24 version.
When less demand is placed on the audio processing chain, it’ll also produce less heat (an inevitable by-product of electrical processing). When we combine less heat with fewer recharging cycles, in the long run, this may very well mean a greater longevity for your device(s).
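
To put the battery figures in more practical terms, the sketch below works out the relative saving and a rough total playback time. The runtime estimate assumes the battery drains at a constant rate, which is our simplification and was not measured.

```python
def relative_saving(used_high, used_low):
    """Fraction of battery saved, relative to the higher consumption."""
    return (used_high - used_low) / used_high


def estimated_runtime_hours(hours_played, fraction_used):
    """Rough full-charge runtime, assuming a constant drain rate."""
    return hours_played / fraction_used


print(f"{relative_saving(0.77, 0.68):.1%}")            # 11.7%
print(f"{estimated_runtime_hours(8, 0.77):.1f} hours")  # 10.4 hours at 192/24
print(f"{estimated_runtime_hours(8, 0.68):.1f} hours")  # 11.8 hours at 48/24
```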

 

The Audio-chain

Another aspect that should be considered is the entire audio playback chain beyond the audio file. By this, we mean the DAC, the AMP, and the headphones.
For example, have a look at the specs for your audio player, DAC, AMP, and headphones. What are their rated frequency responses? Your external DAC or audio player’s internal DAC may only be able to decode up to 96kHz sampling-rate files. So, would there be any point in spending money on a 192/24 album, or extra storage to hold 192/24 albums? What about the player’s AMP or your external AMP; does it have a rated frequency response to take full advantage of the higher sampling-rate or bit-depth files? And finally, the headphones. The vast majority of headphones, even the really high quality ones, are rated to go up to 20kHz, sometimes higher. If a 44.1/16 file is identical to the 192/24 file below 21kHz and your headphones top out at around 20kHz, they won’t be able to reproduce the “extra” frequencies contained in a higher sampling-rate file, never mind the fact that in all likelihood you can’t physically hear those frequencies anyway, irrespective of how loud they are.

 

Ultrasonics

The final consideration is the effect of ultrasonic sounds on humans. Ultrasonic sound is defined as sound at frequencies above the range of normal human hearing, i.e. above 20kHz. The effect will vary from person to person, but these frequencies can induce fatigue, headaches, or nausea. So whilst those frequencies may not be audible, they can potentially have a significantly undesired effect. Such extremes may only affect very few people, but it may be something to keep in mind. However, it is also unclear whether or not the ultrasonic frequencies that can be contained in higher sampling-rate files (i.e. 22kHz-96kHz) are able to have this undesirable effect.

 

Conclusion

At the end of the day, it’s entirely up to you whether or not you opt for higher sampling-rate files, and this article is simply intended to inform you about what the actual differences are, and whether or not you stand to notice an increase in raw sonic quality. That being said, it would seem that 44.1/16 files are the perfect “compromise” for portable use. Not only can they save you a ton of cash down the road, but the so-called benefits of higher sampling-rate files are simply of no real benefit given the relatively limited ability of the human hearing system.
Just have a search on the internet; you’re bound to find numerous blogs and forum posts by individuals and groups who swear by the benefits of higher sampling-rate audio and who claim they can absolutely hear the differences between various sampling-rate files. It is entirely possible for them to be 100% correct. However, we need to consider WHY they hear those differences. The brain is a funny organ, capable of fooling us to an exceptional degree. If you tell a person that they’re listening to a higher quality file, you’ve just introduced the power of suggestion. This leads to a very real phenomenon: the placebo effect. So whilst they simply cannot actually hear a difference between the files, their brains have fooled them into thinking that they truly are hearing a difference. But, when we conduct a blind test with no indication of which file is which, the person will no longer be able to hear those differences, as no cues have been given for their brains to fool them.
Sometimes, though, you really can hear a difference, but it is important to know why. We simply assume that various formats and sampling-rates of the same track all come from the exact same master recording. This isn’t necessarily true, and it’s only because of a certain file originating from a different master recording that you’d be able to hear a difference (assuming both files are in lossless formats). But, if you were to conduct a null test with the 2 different quality files, yet a spectrogram does not show any difference between the 2 files within the audible range, then that “audible difference” simply does not physically exist, and hence what you’re hearing is a placebo effect.
However, do not assume that this means that audio equipment capable of playing higher sampling-rate files is pointless; far from it. As briefly mentioned before, these audio devices are made up of many very sophisticated audio-specific parts, capable of reproducing audio to a far greater degree of detail and accuracy than your modern smartphone could possibly hope to do.

The simplest but most effective example we can think of to illustrate this point is to think back to your school days.
Picture this: you’re sitting in a quiet classroom and you hear footsteps coming from the corridor. Can you perhaps remember being able to tell which teacher was walking down the corridor simply by listening to their footsteps? We’re talking about a very simple action: walking. Yet, you were able to distinguish between different people merely by the rhythm and frequencies of sound projected from their stride. If their rhythm changed, your brain wouldn’t be able to put together its usual cues in order to determine who it is that’s walking.
The same thing applies to music too. Various instruments and voices have very specific rhythms and other audio cues which your brain uses to distinguish one from the other. So, the more accurate the audio equipment (this includes the original equipment used to make the recording too), the more natural those instruments and voices will sound. It is only with this accuracy and detail that you’d be able to hear the natural brassy texture of cymbals, the sound of the wood resonating from a cello, or the sound of fingers plucking a guitar string. In terms of high quality audio, this is the main advantage of dedicated audio equipment over regular consumer products. It allows you to experience the audio as a presentation, rather than a mere musical rhythm.
