The Musical Ear

Chris Plack
Photo by Roger Deeble

Our perception of sound depends on the biological equipment we are born with—our ears and our brains. How does the ear decode the acoustic information that we receive, and what can we learn about music from an understanding of how the ear and the brain respond to sound?

From Air to Ear

Sound is composed of pressure fluctuations in a medium (for example, the air). The pressure fluctuations enter the ear through the ear canal, which ends at the eardrum (see Figure 1). Vibrations at the eardrum are carried to the cochlea by three tiny bones—the malleus, incus, and stapes (collectively called the “ossicles”). The cochlea is a narrow fluid-filled tube curled up into a spiral. Running the length of the tube is a thin sheet of tissue called the “basilar membrane.” Vibrations of the ossicles produce sound waves in the cochlear fluid, which cause the basilar membrane to vibrate. These vibrations are converted into electrical impulses in the auditory nerve, which carries information about the sound to the brain.

Figure 1
Figure 2

The ear is exquisitely sensitive to sound. We can hear vibrations of the eardrum of less than a tenth the width of a hydrogen atom! The ear is also very good at separating out the different frequency components of a sound (e.g., the different harmonics that make up a complex tone). Each place on the basilar membrane is tuned to a different frequency (Figure 2), so that low-frequency sounds cause the membrane to vibrate near the top (apex) of the spiral, and high-frequency sounds cause the membrane to vibrate near the bottom (base) of the spiral. Each nerve cell or neuron in the auditory nerve is connected to a single place on the basilar membrane, so that information about different frequencies travels to the brain along different neurons.

Figure 3

The ear acts a bit like a prism for sound. A prism separates out the different frequencies of light (red, yellow, green, blue, etc.) to produce a spectrum (also seen in a rainbow, of course). Similarly, the ear separates out the different frequencies of sound to produce an acoustic spectrum. The human eye, however, has just three types of color receptor: the vivid sensation of color we experience is made up of combinations of these three responses. The ear, on the other hand, can separate up to a hundred different sound frequencies, corresponding to the number of frequencies that can be resolved by the basilar membrane. We get a much more detailed experience of the “color” of sounds (timbre) than we do of the color of light. This is how we can tell the difference between two different instruments playing the same note, for example, a French horn and a cello both playing C3. Although the pitch of the two instruments is the same, the timbre—which is determined by the relative levels of the harmonics—is different (Figure 3). By separating out the different harmonics on the basilar membrane, the ear can distinguish between the two sounds.

The sensation of dissonance is determined in part by the response of the basilar membrane. When two notes are played together, dissonance is related to the production of “beats,” which are heard as a regular flutter. We hear beats when two harmonics are too close together in frequency to be separated by the basilar membrane. For simple frequency ratios, many of the harmonics of the two tones coincide (e.g., the third harmonic of a 440-Hz fundamental has the same frequency—1320 Hz—as the second harmonic of a 660-Hz fundamental). These simple ratios are heard as consonant. For complex ratios, many of the harmonics from the two tones do not coincide exactly, and those harmonics that are close together in frequency interact on the basilar membrane to produce beating sensations that lead to a sensation of dissonance.
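The harmonic arithmetic behind this can be sketched in a few lines of Python. The fundamental frequencies come from the example in the text; the 30-Hz "beating band" and the eight-harmonic limit are illustrative assumptions, not physiological constants:

```python
# Count exact and near coincidences among the first few harmonics
# of two tones. The 30-Hz "beating band" and the eight-harmonic
# limit are illustrative assumptions, not physiological constants.

def harmonics(f0, n=8):
    """First n harmonics of fundamental f0, in Hz."""
    return [f0 * k for k in range(1, n + 1)]

def coincident_and_beating(f0_a, f0_b, band=30.0):
    """Return (exactly coinciding pairs, close-but-unequal pairs)."""
    coincident = beating = 0
    for ha in harmonics(f0_a):
        for hb in harmonics(f0_b):
            diff = abs(ha - hb)
            if diff == 0:
                coincident += 1
            elif diff < band:
                beating += 1
    return coincident, beating

# Perfect fifth (3:2 ratio): harmonics either coincide exactly or lie far apart.
print(coincident_and_beating(440.0, 660.0))   # some coincide, none beat
# Mistuned fifth: the exact coincidences become near-misses.
print(coincident_and_beating(440.0, 655.0))   # none coincide, some beat
```

With the exact 3:2 ratio, several harmonics line up perfectly and nothing falls in the beating band; mistune the upper note slightly and those coincidences become near-misses that beat against each other.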

The brain processes the electrical signals from the cochlea using vastly complicated networks of specialized neurons in the brain. The way the sound is analyzed depends on our own personal experience to a certain extent. The strengths of the connections between neurons change as we experience sounds, particularly during early infancy when the brain is growing rapidly.


Figure 4

Tonal musical instruments vibrate to produce regular, repetitive patterns of pressure fluctuations (Figure 4) (as opposed to some percussion instruments, such as a cymbal, that produce irregular or impulsive sound waveforms). The frequency at which the instrument vibrates determines the frequency of the pressure fluctuations in the air, which in turn determines the pitch that we hear.

Pitch is the sensation corresponding to the repetition rate of a sound wave. Pitch is represented in the brain in terms of the pattern of neural impulses (Figure 5). When a tone is played to the ear, neurons will tend to produce electrical impulses synchronized to the frequency of the tone, or to the frequencies of the lower harmonics. An individual neuron may not fire on every cycle, but across an array of neurons the periodicity of the waveform is well represented. Indeed, if you record the electrical activity of the auditory nerve when a melody is played, you can hear the melody in the electrical impulses!

Figure 5

The highest frequency that can be represented in this way is about 5000 Hz. Above this frequency, neurons cannot synchronize their impulses to the peaks in the sound waveform. This limit is reflected in the frequency range of musical instruments: The highest note on an orchestral instrument (the piccolo) is about 4500 Hz. Melodies played using frequencies above 5000 Hz sound rather peculiar. You can tell that something is changing but it doesn't sound “melodic” in any way.

Pitch may be decoded by specialized neurons in the brain that are sensitive to different rates of neural impulses. It seems that the information from the first eight harmonics is the most important in determining pitch. The basilar membrane can separate out these first few harmonics, and a trained listener can “hear out” each harmonic in turn, by carefully attending to the individual harmonic frequencies. The frequencies of the low harmonics are coded individually by regular patterns of activity in separate neurons, and the brain combines the information to derive pitch. For example, if harmonics of 880 Hz, 1320 Hz, and 1760 Hz are identified, then the brain can work out that the fundamental frequency of the waveform is 440 Hz (the highest common factor of these three).
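The "highest common factor" step in the example above is easy to make concrete. This is only a sketch of the idea, of course; neural pitch estimation is far more tolerant of noise and mistuning than an exact greatest-common-divisor computation:

```python
from math import gcd
from functools import reduce

def inferred_pitch(harmonic_freqs_hz):
    """Fundamental implied by a set of harmonics: their greatest
    common divisor, mirroring the 'highest common factor' idea.
    Expects whole-number frequencies in Hz."""
    return reduce(gcd, harmonic_freqs_hz)

# The example from the text: these harmonics imply a 440-Hz fundamental,
# even though no 440-Hz component is present in the list itself.
print(inferred_pitch([880, 1320, 1760]))  # → 440
```

Note that the answer comes out even though 440 Hz itself is absent from the input, which is consistent with the well-known "missing fundamental" effect: the pitch of a complex tone survives the removal of its fundamental component.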

Absolute Pitch

A melody is composed of a sequence of tones with different frequencies. A melody is characterized by the intervals between the individual frequencies (i.e., the frequency ratios between the notes), rather than by the absolute frequencies of the notes. I can play “Twinkle, Twinkle Little Star” in any key I like, and the melody will still be instantly recognizable. We can easily form memories for a sequence of musical intervals, but most of us do not have an internal reference or memory for absolute frequency. There are individuals (perhaps 0.1 percent of the population) who can instantly identify the note being played, in the absence of any external cues. These individuals are said to have perfect or absolute pitch, and may have acquired their skill by exposure to standard frequencies during a critical learning period in childhood.
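The transposition invariance described here can be checked numerically. The note frequencies below are standard equal-temperament values; the melody is the opening of "Twinkle, Twinkle" (C C G G A A G), first in C major and then shifted up to D major:

```python
def interval_ratios(freqs):
    """Successive frequency ratios: the 'shape' of a melody.
    Rounded to 3 decimals so equal-temperament values compare cleanly."""
    return [round(b / a, 3) for a, b in zip(freqs, freqs[1:])]

# "Twinkle, Twinkle" opening (C C G G A A G) in C major...
in_c = [261.63, 261.63, 392.00, 392.00, 440.00, 440.00, 392.00]
# ...and the same melody transposed to D major.
in_d = [293.66, 293.66, 440.00, 440.00, 493.88, 493.88, 440.00]

# The absolute frequencies differ, but the interval pattern is identical.
print(interval_ratios(in_c) == interval_ratios(in_d))  # → True
```

Every absolute frequency changes between the two keys, yet the sequence of ratios is the same, which is why the tune remains instantly recognizable.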

Individuals with absolute pitch have the ability to form a stable representation of pitch in their memories, which they can use as a standard reference to compare with the pitch of any sound in the environment. These individuals also have a way of labeling the pitch they experience in terms of the language of music. This latter ability is sometimes ignored. It has been argued that there are many people with a stable memory for pitch who cannot provide a musical label in the way associated with absolute pitch, but can, for example, hum or sing a tune from a recording they know with a good frequency match to the original.

Following Musical Sequences

In many situations, we experience a number of different sounds at the same time. This is particularly true if we are listening to an ensemble of musicians, when we may be receiving several different melodies at once. All the sound waves from the different instruments add together in the air, so that our ears receive a sum of the sound waves. It is like trying to work out what swimming strokes several different swimmers on a lake are using, just by looking at the complex patterns of ripples that arrive at the shore. How do our ears make sense of all this?

Figure 6

One of the ways the ear can separate out sounds that occur close together is in terms of their pitches. If the notes from two different sequences cover the same range of frequencies, the melodies are not heard separately, and a combined tune is heard. If the frequency ranges are separated (for example, if one melody is in a different octave) then two distinct melodies are heard (Figure 6). Some composers (e.g., Bach, Telemann, Vivaldi) have used this property of hearing to enable a single instrument (such as a flute) to play two tunes at a time, by rapidly alternating the notes between a low-frequency melody and a high-frequency melody. Looking at this in another way, the ear's tendency to separate sequences of notes by pitch constrains (to a certain extent) the melodies that can be used in music. If the frequency jump in a musical line is too great, then the ear may not be able to fuse the notes into a single sequence. The effect is also dependent on the rate at which the notes are played. Melodies with rates slower than about two notes a second can be fused even if the frequency jump between notes is quite large.

Listen to examples of:

  • Segregated Sound
  • Fused Sound

We can also use the timbres of different instruments to separate melodies and rhythms, even if the notes cover the same frequency range. For example, a melody played on a French horn can be separated from a melody played on a cello, even if the notes used are similar in frequency. Again, the separation is stronger for rapid sequences of notes. As we learned earlier, instruments with different timbres produce different patterns of excitation on the basilar membrane. The ear is very good at distinguishing different patterns of harmonics.

Finally, we can use our two ears to separate sounds coming from different directions. A sound from the right arrives at the right ear before the left and is more intense in the right ear than the left. The brain uses these differences to localize sound sources, and we can easily attend to the sequence of sounds that come from a specific point in space. Each instrument in an ensemble occupies a single location, and this helps us to separate out the different melodies. For this same reason, stereo musical recordings (which contain cues to location) sound much clearer than mono recordings.
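The timing cue mentioned here is tiny but measurable. A minimal sketch, assuming a simple path-length model: the head width of 0.18 m is an assumed round figure, and real heads also diffract sound, which this ignores:

```python
from math import sin, radians

def itd_seconds(azimuth_deg, head_width_m=0.18, c=343.0):
    """Interaural time difference under a simple path-length model:
    the extra travel distance to the far ear is roughly
    head_width * sin(azimuth), divided by the speed of sound c.
    The 0.18-m head width is an assumed round figure."""
    return head_width_m * sin(radians(azimuth_deg)) / c

# A source directly to one side (90 degrees) gives a delay of
# roughly half a millisecond; straight ahead (0 degrees) gives none.
print(round(itd_seconds(90) * 1e6), "microseconds")
print(round(itd_seconds(0) * 1e6), "microseconds")
```

Even at its maximum, the delay is around half a millisecond, yet the brain resolves differences far smaller than this to pin down a source's direction.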

Emotion and Meaning

Why does music have such a strong psychological effect on us?

The brain is very good at learning associations between events. A piece of music may be associated strongly with a particular place or time. If we hear a piece of music during an emotional experience (falling in love, death of a relative) the piece of music may gain the power to conjure up that emotion. A primitive region of the brain called the amygdala seems to be important in making emotional connections such as this. The amygdala controls another region of the brain called the hypothalamus, which in turn controls the release of hormones such as adrenalin, and basic bodily functions such as the beating of the heart and respiration. In this way, emotional stimuli can produce physiological changes in our bodies. Music can cause stress and fear reactions similar to those produced by events that are truly dangerous.

Some chords or sequences of notes seem broadly connected with sad feelings (e.g., minor modes) and others with happy feelings (e.g., major modes). Part of this might be due to learned associations, although even three-year-old children associate minor and major modes in this way. It is possible that consonant musical intervals, such as those involved in major triads, may lead naturally to a positive and upbeat feeling.

Another component of music that can be used for emotional effect is rhythm. I am reminded of the menacing increase in tempo during a shark attack in Jaws. Again, we may learn to associate certain rhythms with particular feelings, although it is clear that, physiologically, a slow tempo reflects withdrawal and depression (slowing down of natural rhythms) and a fast tempo reflects excitement (increase in breathing, heart rate etc.). It seems likely, therefore, that part of the emotional response to rhythm is innate. Indeed, it has been suggested that one of the reasons minor modes sound sad is that they are often played with slow tempi, and children form the association at an early age.

In many ways, music is like a spoken language, and like a spoken language we need to learn the language before we can appreciate the meaning that is being expressed. An American needs to learn to understand Chinese music, just as he must learn to understand Mandarin or Cantonese. Similarly, most children in the West receive intense exposure to harmonic, consonant, major mode, music. To break away from this brainwashing requires a degree of commitment on the part of the listener. It might also help to have the right genes. The evolutionary psychologist Geoffrey Miller has suggested that music (like other art forms) is a “fitness indicator.” According to Miller, musical ability indicates to potential mates that we have good genes that will benefit our progeny. If this hypothesis is correct, then we would expect musical ability to be inherited, and there is some evidence for this. Genetically identical twins are more alike in their musical talents than non-identical twins (although childhood environment plays a greater role). So while we may never discover a “gene for appreciation of avant-garde music” (genetics is rarely this simple), given that many other aspects of our personalities have been shown to be inherited to some extent, it is at least plausible that some individuals are naturally more receptive to new musical ideas.

Further Reading

  • Bregman, A.S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press. The definitive work on how sounds are organized by the ear, including a chapter on music.
  • Deutsch, D. (Ed.) (1999). The Psychology of Music (2nd ed.). London: Academic Press. Covers everything from acoustics, to music perception, to music performance.
  • Moore, B.C.J. (2003). An Introduction to the Psychology of Hearing (5th ed.). London: Academic Press. A comprehensive yet readable account of hearing, including the basic physiology of the ear.
  • Plack, C.J. (2005). The Sense of Hearing. Mahwah, New Jersey: Lawrence Erlbaum Associates. My new book—out soon!


Professor Chris Plack was born in Exeter, England, in 1966. He studied Natural Sciences at the University of Cambridge as an undergraduate, and gained a PhD in psychoacoustics at the same institution. Since then he has worked as a research scientist at the University of Minnesota and at the University of Sussex, England, and now teaches in the Department of Psychology at the University of Essex, England. Professor Plack is a Fellow of the Acoustical Society of America, and a member of the Association for Research in Otolaryngology.