Lossy vs. Lousy Sound


One late evening, while I was visiting a dear friend, he raised a disturbing issue. He owns an upscale men’s clothing store on Market Street in San Francisco, and his sound system plays an assortment of music all day long, contributed by employees from their own mp3 players. He said the arrangement was very convenient and that everyone got a fair share of what they liked, since they simply swapped players and plugged in. Happy with the new arrangement, he began to notice over a period of a few months that something was strange about how he experienced the music. This was especially true when he had been in the store for more than a few hours at a time. He had difficulty describing the feeling, but it wasn’t pleasant. He also mentioned that after those long stretches, when he came home, both the sounds around him and his own audio systems at home sounded a bit strange.

He asked me to stop by the store and analyze the sound system to see whether there was a problem in the setup or possibly the wiring, but as I questioned him further, I recognized the problem all too well. I asked him what devices were being used. They were invariably cell phones or other portable players, and they were all playing compressed files. They were all mp3s! When the mp3 audio layer was officially published and released in 1993, the intention was to reproduce the original sound quality of recordings faithfully while significantly reducing the size of the data. The compression algorithm relies on a model of human perception to discard the parts of the signal judged least likely to be noticed, in the hope that, for most people, the degradation would go unnoticed. Certainly the cost of data space was an issue. In 1956, IBM reported the cost of 5 megabytes of space to be $50,000.00. By September of 1990 the cost had dropped to $9.00. With CD-quality sound taking up just over 10 megabytes per minute, storing a full CD of music at this rate was obviously impractical. So efforts to reduce file size were certainly justified if the computer was to become an audio player.
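The “just over 10 megabytes per minute” figure follows directly from the CD audio format: 44,100 samples per second, 16 bits per sample, two channels. The short Python sketch below does that arithmetic and, purely as a thought experiment, prices a standard 74-minute disc at the 1956 IBM rate quoted above. The function and constant names are illustrative, and decimal megabytes (1,000,000 bytes) are assumed.

```python
# Red Book CD audio: 44,100 samples per second, 16 bits per sample, 2 channels.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# The 1956 IBM figure quoted in the text: $50,000 for 5 MB, i.e. $10,000 per MB.
DOLLARS_PER_MB_1956 = 50_000 / 5


def pcm_megabytes_per_minute(rate_hz=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE, channels=CHANNELS):
    """Size of one minute of uncompressed PCM audio, in decimal megabytes."""
    bits_per_second = rate_hz * bits * channels   # 1,411,200 bits/s for CD audio
    bytes_per_minute = bits_per_second * 60 / 8   # 8 bits per byte
    return bytes_per_minute / 1_000_000


if __name__ == "__main__":
    per_minute = pcm_megabytes_per_minute()
    full_cd_mb = per_minute * 74                  # a standard 74-minute disc
    print(f"{per_minute:.3f} MB per minute")
    print(f"{full_cd_mb:.1f} MB for a 74-minute CD")
    print(f"${full_cd_mb * DOLLARS_PER_MB_1956:,.0f} to store it at 1956 prices")
```

Run as a script, it reports 10.584 MB per minute, about 783 MB for a 74-minute disc, and a 1956 storage bill of roughly $7.8 million.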

Portability was the next step in the evolution of listening. Early portable mp3 players offered up to 32 megabytes. Today a player can hold 160 GIGABYTES of storage. At a typical mp3 compression rate, this translates into carrying roughly 40,000 tracks of pop-song length (3 minutes), or roughly 2,000 full-length CDs. Surely we need to be carrying this much music on our morning commutes! This abundance clearly ties in with every discussion of shrinking attention spans and, more importantly, with the loss of the intimacy we once had in the days of the vinyl long-playing record. I remember bringing a new album home in my youth, caressing the 12-inch cover, reading the liner notes and listening to it repeatedly, learning every gesture and every musical phrase. I often wonder how many of us have this kind of relationship with recorded music now. I’ve heard stories about record labels offering their entire catalog on a packaged flash drive. Imagine a friend asking what you have on your device, and answering unabashedly with something to the effect of “everything that’s ever been recorded.” Although not impossible, having it “all” at a quality unworthy of what our ears are capable of hearing would hardly be worth it.
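How many tracks fit depends on the bit rate and track length assumed, which the essay leaves as “a typical mp3 compression rate.” The sketch below makes both explicit parameters. The 128 kbps and 192 kbps rates and the track lengths in the usage lines are assumptions for illustration; decimal gigabytes are assumed, and file-system and tag overhead are ignored.

```python
def tracks_that_fit(capacity_gb, bitrate_kbps, minutes_per_track):
    """Whole constant-bit-rate tracks that fit in capacity_gb decimal gigabytes.

    Ignores file-system, metadata and tag overhead, so real counts run a bit lower.
    """
    bytes_per_track = bitrate_kbps * 1000 / 8 * minutes_per_track * 60
    return int(capacity_gb * 1_000_000_000 // bytes_per_track)


if __name__ == "__main__":
    # Hypothetical combinations of bit rate and track length for a 160 GB player.
    print(tracks_that_fit(160, 128, 3))      # 3-minute tracks at 128 kbps
    print(tracks_that_fit(160, 128, 4))      # 4-minute tracks at 128 kbps
    print(tracks_that_fit(160, 192, 3))      # 3-minute tracks at 192 kbps
    print(tracks_that_fit(160, 1411.2, 74))  # uncompressed 74-minute CDs
```

These assumptions give roughly 55,500, 41,700 and 37,000 tracks respectively, the same order as the 40,000 quoted above, while the same 160 GB holds only about 204 uncompressed 74-minute CDs.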

Without entering into the distribution of music and the financial woes of those who create it, the question of how music reaches our ears has become a concern that may shed light on the experiential degradation of our current listening habits. Streaming media matters here because it makes music so easy to reach while browsing, and its quality is also a factor. The typical default streaming rate is half that of the average mp3 download. For reference, 32 kbps corresponds to AM quality, 64 kbps to FM quality, and 1,411.2 kbps to conventional CD digital audio, or linear PCM (pulse code modulation). The higher the number, the more information per second; more information means more data, and therefore more aural events and frequencies offered to our ears. In the coming months most companies will be converting to streaming as their model for track delivery, so this trend of poor quality looks set to become permanent. Most radio stations of stature offer a choice of streaming rates, but the highest offered is commonly 128 kbps. When streaming on a mobile device over Wi-Fi, this bit rate can also be achieved; over 3G or EDGE networks, the average rate drops automatically to 32 kbps! Even the best of these rates is, in my opinion, poor. It’s O.K. for NPR talk shows, but we ARE talking about music here.
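To put these rates side by side, one can divide each by the CD rate. The sketch below does this for the rates mentioned above; the labels are informal, and the comparison is of data rates only. By itself it says nothing about audibility, since a perceptual encoder spends its bits where its model predicts they matter most.

```python
# CD digital audio: 44,100 samples/s x 16 bits x 2 channels = 1,411,200 bits/s.
CD_KBPS = 44_100 * 16 * 2 / 1000

# Rates mentioned in the text; the labels are informal.
RATES_KBPS = {
    "32 kbps (AM-like)": 32,
    "64 kbps (FM-like)": 64,
    "128 kbps (typical mp3)": 128,
}


def reduction_factor(kbps):
    """How many times less data per second than CD audio."""
    return CD_KBPS / kbps


def percent_of_cd(kbps):
    """Share of the CD data rate that remains, as a percentage."""
    return 100 * kbps / CD_KBPS


if __name__ == "__main__":
    print(f"CD audio: {CD_KBPS:.1f} kbps")
    for label, kbps in RATES_KBPS.items():
        print(f"{label}: {percent_of_cd(kbps):.1f}% of the CD rate, "
              f"{reduction_factor(kbps):.3f}x less data")
```

At 128 kbps roughly 9% of the CD data rate remains, about an elevenfold reduction; at 32 kbps, a little over 2%.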

In the following example, a short orchestral excerpt is analyzed to show the distinction between a full CD quality file and a typical mp3 file at 128 kbps.

Figure 1 is a spectrogram of the sample at full CD quality: it shows the relative strength of the pitches in the sample over time, with time running horizontally and frequency rising from low at the bottom to high at the top. Figure 2 shows the same data compressed as an mp3 at 128 kbps. The first thing that should be obvious is that an mp3 at 128 kbps slams the door on all frequencies above approximately 15,000 Hz. A bad haircut indeed! Since the human ear can perceive pitches from 20 to 20,000 Hz, a range of roughly 10 octaves, our aural potential is already being robbed of a tremendous amount of data. The fundamental notes of the instruments lie well below this cutoff; the paramount alarm is that the spectral data, the overtones of each note played by each instrument, are brutally and insensitively sacrificed. The highest pitches in the orchestral ensemble can reach about 4,400 Hz, yet when these high pitches sound, their overtone series can extend beyond 15 kHz. With mp3 compression, this range of color is lost. Furthermore, the overtones of the lower-pitched instruments that reach toward the limit of our hearing, though weaker in amplitude, are missing as well. This is what we refer to as color, warmth or presence. In commercial terms: full quality. In the sound examples below, I offer an example at full quality, one as an mp3 at 128 kbps, and one isolating what is missing from the full-quality version. While the example of what is missing may not sound like music, it is essential to the full experience. It is like making the perfect soufflé but forgetting the subtle spices that make it a full, emotionally charged culinary delight, or perhaps like drinking wine out of a box.
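The octave count and the overtone claim can both be checked with a few lines. The sketch below assumes an idealized harmonic series (overtones at exact integer multiples of the fundamental, which real instruments only approximate) and uses the 15 kHz cutoff visible in Figure 2; 440 Hz and 4,400 Hz are example fundamentals chosen for illustration.

```python
import math

CUTOFF_HZ = 15_000         # approximate mp3 cutoff seen in Figure 2
HEARING_LIMIT_HZ = 20_000  # upper limit of human hearing quoted in the text


def octaves_between(low_hz, high_hz):
    """Number of octaves (frequency doublings) from low_hz up to high_hz."""
    return math.log2(high_hz / low_hz)


def harmonics_above_cutoff(fundamental_hz, cutoff_hz=CUTOFF_HZ, limit_hz=HEARING_LIMIT_HZ):
    """(harmonic number, frequency) pairs that are audible but lie above the cutoff.

    Assumes an ideal harmonic series: the n-th harmonic sits at n * fundamental_hz.
    """
    first = math.floor(cutoff_hz / fundamental_hz) + 1
    last = math.floor(limit_hz / fundamental_hz)
    return [(n, n * fundamental_hz) for n in range(first, last + 1)]


if __name__ == "__main__":
    print(f"Hearing range: {octaves_between(20, HEARING_LIMIT_HZ):.2f} octaves")
    for f0 in (440, 4_400):  # example fundamentals: A4 and a very high orchestral note
        lost = harmonics_above_cutoff(f0)
        print(f"{f0} Hz: harmonics lost above {CUTOFF_HZ} Hz -> {lost}")
```

For a 4,400 Hz note, the second and third harmonics (8,800 and 13,200 Hz) survive, but the fourth, at 17,600 Hz, falls above the cutoff; for A440, harmonics 35 through 45 (15,400 to 19,800 Hz) are removed.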



Figure 1: Conventional CD quality. A full spectrum of sound.

 


Figure 2: Sample mp3 at 128 kbps. The missing sound is obvious.

Everything above 15 kHz is gone.

To fully appreciate the importance of this loss, consider why a clarinet sounds different from an oboe: beyond what we loosely call timbre and color, each instrument produces its own distinctive spectrum of overtones. When listening to sound sample 3, which contains “what is missing,” you can hear a complex and thick texture of high pitches. Don’t be fooled into thinking its absence is trivial or unimportant. These are exactly the sounds that fill out the richness of the full-quality sound in context. Whether you bask in the warmth of a Sinatra standard or luxuriate in the lush textures of a Lindberg orchestral piece, feeding the full spectrum of sound through a compression encoder leaves a lifeless and empty version of what could be an explosion of color and richness. The trouble is that, being creatures of habit, we get used to things, and we have been successfully seduced by the convenience of portability, quick downloads, and streaming media. Yet our ears are certainly capable of telling the difference. Put simply, compressed audio is not the same as the original sound file. The engineering goal is to make the difference between the two imperceptible by shaping the errors according to psychoacoustic principles so that our senses are fooled, and after enough repetition, we stop noticing the difference. This manipulative type of indoctrination does have a cost.
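“What is missing” is, in effect, the difference between the original and the compressed version. A real mp3 encoder does much more than cut off the top of the spectrum (it also quantizes the remaining bands according to its psychoacoustic model), so a true difference signal would include those errors as well. The sketch below isolates only the band-limiting part, assuming an idealized “brick-wall” cutoff at 15 kHz to match Figure 2 and a synthetic 440 Hz test tone whose harmonics fall off as 1/n; both are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 44_100  # CD sampling rate
CUTOFF_HZ = 15_000    # assumed cutoff, matching Figure 2; real encoders differ


def brickwall_lowpass(signal, sample_rate=SAMPLE_RATE, cutoff_hz=CUTOFF_HZ):
    """Zero every FFT bin above cutoff_hz: an idealized stand-in for the encoder's lowpass."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=len(signal))


def what_is_missing(signal, **kwargs):
    """Difference between the original and its band-limited version."""
    return signal - brickwall_lowpass(signal, **kwargs)


def harmonic_tone(f0_hz, sample_rate=SAMPLE_RATE, seconds=1.0, top_hz=20_000):
    """Hypothetical test signal: every harmonic of f0_hz up to top_hz, amplitude 1/n."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    n_max = int(top_hz // f0_hz)
    return sum(np.sin(2 * np.pi * n * f0_hz * t) / n for n in range(1, n_max + 1))


if __name__ == "__main__":
    tone = harmonic_tone(440)
    missing = what_is_missing(tone)
    rms_ratio = np.sqrt(np.mean(missing ** 2) / np.mean(tone ** 2))
    print(f"RMS of what is missing, relative to the full tone: {rms_ratio:.4f}")
```

In this synthetic example the residual carries only a small share of the tone's energy, which is part of why the loss is so easy to dismiss; the residual can be written to a file and played back to hear the kind of high, thin texture described above.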

Countless studies describe the damage we are doing to our ears by listening with headphones at loud levels for extended periods of time. This raises concern about the premature degradation of young people’s hearing, which will lead to hearing loss earlier in life than normal. The concern of the musician, composer, or anyone serious about listening should be more about what our ears can no longer distinguish. Many others, like my friend at the clothing store, tell me that compressed music makes them tired. More scientific studies are on the horizon, but surely the psychoacoustic algorithms used to reduce file size are responsible. Surely the intention was never to protect quality and performance. Surely we are being arbitrarily forced into a Procrustean bed to satisfy the needs of an expanding market and an exaggerated availability of music that we could never have the time to know intimately or understand deeply. Finally, 5 megabytes of data no longer sets us back $50,000.00. In August of this year, Western Digital published a cost of 8.21 cents per gigabyte of storage. To put that into perspective: $82.10 for a terabyte of data storage. Certainly we can afford to return to full quality. Whether we do is, and will always be, a matter of the demands of the consumer versus the habitual laziness of the patterns we have grown accustomed to.
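Using the 8.21 cents per gigabyte quoted above, the cost of keeping uncompressed CD-quality audio is easy to estimate. The sketch assumes uncompressed 16-bit, 44.1 kHz stereo PCM, decimal gigabytes, and 74-minute discs; the 2,000-disc library echoes the figure earlier in the essay. Lossless compression would typically shrink these numbers further.

```python
# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels, in decimal MB per minute.
CD_MB_PER_MINUTE = 44_100 * 16 * 2 * 60 / 8 / 1_000_000  # 10.584

PRICE_PER_GB_USD = 0.0821  # the Western Digital figure quoted in the text


def storage_cost_usd(minutes, mb_per_minute=CD_MB_PER_MINUTE, price_per_gb=PRICE_PER_GB_USD):
    """Drive cost of holding `minutes` of audio, using decimal gigabytes."""
    gigabytes = minutes * mb_per_minute / 1000
    return gigabytes * price_per_gb


if __name__ == "__main__":
    print(f"One terabyte: ${1000 * PRICE_PER_GB_USD:.2f}")
    print(f"One 74-minute CD, uncompressed: ${storage_cost_usd(74):.2f}")
    print(f"2,000 CDs, uncompressed: ${storage_cost_usd(2000 * 74):.2f}")
```

Under these assumptions a 2,000-disc library of uncompressed CD audio needs about 1.57 terabytes, roughly $129 of storage at that price.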

It is possible to invest in higher-quality lossless compression formats and spend time converting and cataloging. My feeling is that the time is ripe to simplify things by keeping music at the highest quality possible. Doing so offers artistic and financial justice to those who created the music, and a respectful integrity as well.
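Lossless formats such as FLAC make files smaller without discarding anything: they predict each sample from the previous ones and store the (usually small) prediction residuals, which an entropy coder then packs compactly. The sketch below shows only the simplest such predictor, a first-order difference (FLAC includes this among its fixed predictors), and omits the entropy-coding stage; the sample values are made up for illustration.

```python
from itertools import accumulate


def encode_first_order(samples):
    """Residuals of a first-order predictor: each sample minus the one before it.

    The first sample is predicted as 0, so it is stored as-is.
    """
    return [current - previous for previous, current in zip([0] + samples[:-1], samples)]


def decode_first_order(residuals):
    """Undo the prediction exactly with a running sum."""
    return list(accumulate(residuals))


if __name__ == "__main__":
    samples = [1000, 1010, 1025, 1030, 1020, 1005]  # made-up 16-bit sample values
    residuals = encode_first_order(samples)
    print(residuals)
    print(decode_first_order(residuals) == samples)
```

The residuals here are [1000, 10, 15, 5, -10, -15]: after the first, they are far smaller than the samples themselves and so need fewer bits, yet decoding returns the original samples bit for bit, which is what distinguishes lossless from lossy compression.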

The final insult to this reduction of aural experience is the devices we use to listen to compressed audio. Once the sound files are loaded onto our portable devices, each device’s own digital-to-analog conversion and amplification produce a further approximation of the sound encoded in the file on its way to our ears. The heart of the matter is protecting the frequency response from point A to point B. Without offering a product comparison here, I will say that test results differ between players, and they vary a great deal. Before buying a player, know and understand at least these three specifications: the signal-to-noise ratio (SNR), the total harmonic distortion plus noise (THD+N), and the maximum power output together with stereo crosstalk. Compensating with expensive headphones or earbuds might give the illusion that life is better, but in the end, what you put into a device is what you will get out of it.
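SNR, THD+N and crosstalk are usually quoted either in decibels or as percentages, and the conversions are simple. The sketch below applies them to hypothetical RMS readings. Real measurements specify test signals, measurement bandwidth and often A-weighting, which this sketch ignores, and some datasheets reference THD+N to the total output rather than to the fundamental alone.

```python
import math


def ratio_db(numerator_rms, denominator_rms):
    """Express a ratio of two RMS amplitudes in decibels."""
    return 20 * math.log10(numerator_rms / denominator_rms)


def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio: higher is better."""
    return ratio_db(signal_rms, noise_rms)


def thd_n_percent(fundamental_rms, distortion_plus_noise_rms):
    """THD+N as a percentage of the fundamental: lower is better."""
    return 100 * distortion_plus_noise_rms / fundamental_rms


def crosstalk_db(leaked_rms, driven_rms):
    """Level leaking into the undriven channel relative to the driven one: more negative is better."""
    return ratio_db(leaked_rms, driven_rms)


if __name__ == "__main__":
    # Hypothetical meter readings, in volts RMS, for a single test tone.
    print(f"SNR: {snr_db(1.0, 0.0001):.1f} dB")
    print(f"THD+N: {thd_n_percent(1.0, 0.001):.2f} %")
    print(f"Crosstalk: {crosstalk_db(0.01, 1.0):.1f} dB")
```

With these made-up readings the player would show an SNR of 80 dB, THD+N of 0.1%, and crosstalk of -40 dB; when comparing datasheets, make sure each figure was measured under the same conditions.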

The seduction of technology and product evolution can be extremely powerful and at times overbearing. Software updates are a testament to this. Many feel that they spend most of their productive hours learning their machines rather than actually doing their work. A new feature keeps consumers impressed even if it contributes nothing to a steady, consistent workflow. Our ears, however, remain a priceless set of receptors, carrying us into a lifelong relationship that turns outer phenomena into inner experience.

***


True Rosaschi



True Rosaschi is a New York-based composer, teacher, theorist and technologist. He is interested in the crossroads of music and intercultural/postmodern theory, improvisation, electroacoustic music and teaching music to adults as a transformative process. He has been a student of Alvin Curran, Fred Frith and Pauline Oliveros. In 2005, with Oliveros, he created a live networked performance streamed in real time from Vilnius to New York. Rosaschi continues to lecture and perform throughout Europe and the United States.