Lines of code are not limited to creating visual user interfaces or logic with a database. They can also be used to craft innovative and captivating soundscapes.
The historical evolution of sound media, from analog to digital, has not been without its struggles. Today, several issues remain relevant in the digital music world. Projects involving audio solutions that we undertake, such as PlayPodcast or L’exposition Jean Starobinski, face significant challenges: sound quality, connection speed constraints, and storage space limitations. Let’s delve into the issues related to sound quality and file size.
A Brief History of Sound Quality and File Size
Vinyl – The Perfect Yet Bulky Signal
Vinyl has historically been regarded as the highest-fidelity sound medium because its signal is not sampled (editor’s note: vinyl record sales, which have been steadily increasing in recent years, reached their highest level since 1990 in 2023). In terms of weight, we’re not talking about kilobits but the famous “180 grams” of a 33 RPM record.
A vinyl record stores music in the form of physical grooves, directly reproducing the original sound vibrations. Unlike digital formats that sample sound, this method captures a richness and authenticity that is difficult to match with digital media. Thanks to the precision of the grooves, the sound wave is faithfully reproduced. If the sound engineer is particularly skilled, a vinyl can even reproduce the acoustics of the recording venue. However, the entire audio playback chain (the Hi-Fi system) must meet the highest standards.
CD – The High-Definition Signal of the Digital World
The first compact disc, developed by Sony and Philips, came out in 1982, featuring ABBA’s album The Visitors. This new product allowed an album to be played on a small digital medium containing about 700 MB of data, a revolution for its time! Three years later, in 1985, Dire Straits released Brothers In Arms, one of the first entirely digital albums on CD. However, it would not be until the early 1990s that the CD became widely adopted, marking the decline of vinyl.
Sampling and Encoding
When moving into the realm of digital audio, there are two crucial “quantities” to understand.
The first is sampling. It is akin to the “frames per second” in a film. The higher the number of frames per second (24 in cinema, but up to 60 on some platforms today), the more natural the motion appears. Conversely, if the number drops below 16-18 frames per second, the movement appears choppy (note: in sports broadcasts, some cameras capture 1,000 frames per second to offer high-quality slow-motion).
A similar principle applies to audio, and the term used is sampling. The explanation again hinges on human capabilities, this time those of our ears.
The basic hearing range of the human ear is between 20 Hz and 20,000 Hz (20 kHz). Under certain conditions, we might hear sounds from 12 Hz to 28 kHz. To record all audible sounds with quality, we need to sample at a rate of at least twice the highest audible frequency. Theoretically, a 40 kHz sampling rate would suffice, meaning 40,000 measurements per second. However, a safety margin is often added. Thus, the 44.1 kHz sampling rate became the CD standard, allowing for sound reproduction up to 22.05 kHz.
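The rule of thumb above is simple arithmetic, and it can be checked in a few lines of Python. This is only an illustrative sketch; the function names are ours and do not come from any audio library.

```python
def min_sample_rate(max_frequency_hz: float) -> float:
    """Lower bound on the sampling rate needed to capture a given frequency."""
    return 2 * max_frequency_hz


def highest_representable_frequency(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2


print(min_sample_rate(20_000))                  # 40000
print(highest_representable_frequency(44_100))  # 22050.0
```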
The second important quantity is the bit depth of encoding.
Just as colors in an image are encoded as a series of 0s and 1s, audio encoding uses bits to represent sound information. For video enthusiasts, color is commonly encoded with 24 bits per pixel; for audiophiles, 16 bits or even 24 bits per sample are used. This “bit depth” indicates the amount of information recorded for each sample. The greater the bit depth, the more detailed the encoded information. Higher bit depth allows capturing the faintest to the loudest sounds in a recording. In summary, the higher the sampling rate and bit depth, the more precise the sound reproduction.
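To give a feel for what bit depth buys, the sketch below computes the number of distinct amplitude levels and the theoretical dynamic range for 16 and 24 bits. The helper names are hypothetical, and the formula is the textbook ratio between the full scale and one quantization step, not a measurement of any real device.

```python
import math


def quantization_levels(bit_depth: int) -> int:
    """Number of distinct amplitude values a sample can take."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical ratio between full scale and one step, in decibels."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 24):
    print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1))
# 16 65536 96.3
# 24 16777216 144.5
```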
A CD stores this signal uncompressed within its roughly 700 MB of space. To maintain the same sound quality in less space, engineers later developed lossless compression algorithms, similar to a zip file for audio. Such an algorithm identifies and compresses recurring patterns in the audio, reducing file size without quality loss.
One such encoding, still favored by audiophiles, is the Free Lossless Audio Codec (FLAC).
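The “zip file for audio” idea can be illustrated with Python's standard zlib module. To be clear, zlib is only a stand-in here and is not the algorithm FLAC uses; the point of the sketch is the lossless round trip, where the decompressed bytes are identical to the original PCM samples. The test tone and its amplitude are arbitrary choices.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44_100

# One second of a 440 Hz tone, stored as 16-bit signed PCM samples.
samples = [
    round(12_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
]
pcm = struct.pack(f"<{len(samples)}h", *samples)

compressed = zlib.compress(pcm, 9)     # generic lossless compression
restored = zlib.decompress(compressed)

print(len(pcm), len(compressed))       # original size vs. compressed size
print(restored == pcm)                 # True: nothing was lost
```

A pure tone is far more repetitive than real music, so the ratio obtained here says nothing about what a dedicated audio codec achieves on an actual recording.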
We now have a high-fidelity source file. However, music and sounds come alive through sharing, so this file is also the starting point for exploring further compression that makes it “transportable.”
Large Files: How to Reduce their Weight?
While today’s phones come with several gigabytes (GB) of memory, early MP3 players only had 1 or 2 GB, equivalent to 2 or 3 albums in FLAC format.
The WAV Format
Among the well-known digital formats is WAV. This file format (or more precisely, a container) captures sound in PCM (Pulse Code Modulation), a technique for digitizing an analog audio signal. At 16 bits and 44,100 Hz, it has the same size and characteristics as CD audio, making it a common format for music recording. However, it is also the heaviest file type: one minute of silence or music in .wav weighs approximately 10 MB…
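The figure of roughly 10 MB per minute follows directly from the sampling rate and bit depth. Here is a minimal sketch of the calculation, assuming stereo audio and ignoring the small WAV header; the function name is ours.

```python
def pcm_size_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Size of uncompressed PCM audio, header not included."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate * bytes_per_sample * channels)


one_minute = pcm_size_bytes(60)
print(one_minute)                         # 10584000
print(round(one_minute / 1_000_000, 1))   # 10.6 (MB)
```

Because the size depends only on the duration, a minute of silence weighs exactly as much as a minute of music.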
Destructive Compression and Masked Frequencies
To reduce the size of an audio file, several techniques are available. Among destructive compression methods, there are primarily approaches that exploit the limitations of human hearing. The human ear perceives frequencies between 20 Hz and 20 kHz. Frequencies outside this range are considered unnecessary and can be removed without noticeable loss in audio quality, as they are inaudible to the ear. Moreover, we are most sensitive to frequencies between 2 kHz and 5 kHz. It takes less than 5 dB to hear frequencies in this band, while more than 20 dB is needed to hear frequencies below 100 Hz or above 10 kHz. These observations can be used to reduce file sizes. For example, frequencies above 15 kHz can be removed.
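Removing frequencies above a cutoff can be sketched with a small Fourier transform written in plain Python. This is a toy illustration under stated assumptions: a 48 kHz sampling rate and a 48-sample block are chosen so that each frequency bin is exactly 1 kHz wide, and all function names are ours. Real encoders such as MP3 rely on filter banks and psychoacoustic models rather than this naive approach.

```python
import cmath
import math

SAMPLE_RATE = 48_000   # assumed rate, chosen so each bin is 1 kHz wide
N = 48                 # number of samples in the analysed block
CUTOFF_HZ = 15_000


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Naive inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def tone(freq_hz):
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(N)]


def remove_above(signal, cutoff_hz):
    spectrum = dft(signal)
    bin_width = SAMPLE_RATE / len(signal)
    for k in range(len(spectrum)):
        # Upper bins mirror the lower ones, so measure from the nearest end.
        freq = min(k, len(spectrum) - k) * bin_width
        if freq > cutoff_hz:
            spectrum[k] = 0
    return inverse_dft(spectrum)


audible = tone(1_000)
mixed = [a + b for a, b in zip(audible, tone(18_000))]
filtered = remove_above(mixed, CUTOFF_HZ)

# After filtering, only the 1 kHz tone remains (up to rounding error).
max_error = max(abs(f - a) for f, a in zip(filtered, audible))
print(max_error < 1e-9)  # True
```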
Similarly, a low-intensity sound occurring immediately after a high-intensity sound may not be audible because the louder sound masks it. Thus, it can be deleted to continue reducing file size by eliminating inaudible information. Such lossy compression techniques permanently alter files and limit data to what can be perceived by humans.
The MP3 Example: Circulating on the Internet Since 1995
With the rise of the Internet and the adoption of the .mp3 file extension in 1995, MP3 files, whose format had been standardized two years earlier as part of MPEG-1, began circulating online.
The bit rate of an MP3 file, which refers to the amount of audio data processed per second during playback, typically ranges from 96 to 320 kbps. Current streaming services, like Spotify, use bit rates of only 96 to 256 kbps (despite the official announcement of “Spotify HiFi” in spring 2021, an option repeatedly delayed)… which is roughly 5 to 15 times lower than the 1,411 kbps of CD audio! Hence, the quality is reduced, noticeably so at the lowest bit rates.
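To put numbers on that gap, the bit rate of CD audio can be derived from its sampling rate and bit depth and then compared with common MP3 bit rates. This is a back-of-the-envelope sketch assuming 16-bit stereo at 44.1 kHz; the function name is ours.

```python
def pcm_bitrate_kbps(sample_rate=44_100, bit_depth=16, channels=2):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000


cd = pcm_bitrate_kbps()
print(cd)  # 1411.2

# How many times smaller each lossy bit rate is compared with CD audio.
for lossy_kbps in (96, 256, 320):
    print(lossy_kbps, round(cd / lossy_kbps, 1))
```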
The Famous AAC from Apple
When Apple entered the market with its iPod, it popularized another audio format: AAC (Advanced Audio Coding), standardized in 1997 as a successor to MP3. The difference from MP3 is not always significant, and the debate has been ongoing for years about whether AAC is truly better than MP3.
FLAC: A Lossless and Viable Solution
FLAC (Free Lossless Audio Codec) emerged in 2001 as an open-source alternative to other lossless formats. It was long considered a “pirate format” due to its lack of DRM protection for files. However, it possesses all the qualities needed to appeal to a broad audience. Another advantage is that it is royalty-free, meaning audio and music professionals do not have to pay to use it.
FLAC has naturally become the preferred format for those who prioritize sound quality. The main benefit of FLAC lies in the assurance that the audio quality remains completely unaltered. The FLAC codec uses lossless audio compression, preserving the original audio file’s integrity while maintaining a relatively compact file size. Moreover, FLAC is supported by the vast majority of smartphones (including iPhones), audio players, computers, and Hi-Fi systems.
“FLAC is excellent for transmitting files over the Internet as it reduces the download time for high-fidelity music by half. It’s unlikely that much better lossless compression will be found,” says a psychoacoustics professor from the University of Essex, cited by Bowers & Wilkins.
FLAC is not limited to CD quality (44.1 kHz/16-bit) and can even correspond to “high-definition” or “Hi-Res” files, at 96 kHz/24-bit and even 192 kHz/24-bit.
8 MB, 25 MB or 40 MB?
A final comparison is in order: at CD quality, a 4-minute music track will be approximately 8 MB in MP3, 20 to 25 MB in FLAC, and 40 MB in WAV.
A High-Quality, Custom Sound Generation Technology
AudioVitality is a project that required extensive reflection and understanding of these aspects. This mobile application, grounded in neuroscience and just one piece of the AudioVitality puzzle, aims to generate high-quality, custom sounds. Each listener hears a unique sound, specifically created for them, as part of sound therapy.