Absolute Pitch research, ear training and more
Now that all my books are out of their cardboard boxes and on the shelves again, I can dig into Principles of Perceptual Learning and Development by Eleanor Gibson. I realized that even though I want to move ahead with leveraging Mon's experience and existing relative-pitch ear training exercises, what I want to learn from them is not their recommended exercises as much as the principles and strategies underlying the exercises, and to that end I'd better know what those principles and strategies are. Already, just on page one, I've encountered an interesting phrase.
Especially convincing, history has shown, was [Bishop Berkeley's] insistence [in 1709] that vision, through association, gains its meaning from touch, and that any apparent equivalence of sensory information is the result of associative integration.
If vision gains its meaning from touch, from what association might sound gain its meaning? Vision, perhaps? Many times in my life I've heard a sound I didn't recognize, and been driven crazy not to know; so every time, in order to identify it, I've had to physically locate and look at what's making the sound. Once I have looked at the object, I am satisfied that I understand the sound. But if sound is learned from vision, and vision is learned from touch, then ultimately sound gains its meaning from touch as well as vision, and you can't see or touch sound. So how do we recognize a sound that has no visual association, like a musical tone or a melody?
I was reading H.G. Wells' The Time Machine last week, and while I read his explanation of the principle of the machine, using Time as a fourth dimension, I thought about musical notes.
For instance, here is a portrait of a man at eight years old, another at fifteen, another at seventeen, another at twenty-three, and so on. All these are evidently sections, as it were, Three-Dimensional representations of his Four-Dimensioned being, which is a fixed and unalterable thing.
When I read this, I realized that a note is not literally a four-dimensional object, but a three-dimensional one-- only not the typical three dimensions. Instead of length, width, and height, it has length (wavelength), height (amplitude), and duration. Its lack of width explains why we are unable to see a sound, of course, if it exists only in two physical dimensions; its third-dimensional existence in time may be reconciled with the experiment in Thinking In Sound, where it's shown that people incorporate duration as part of a sound's identity. What Wells has pointed out is that people unwittingly incorporate duration as part of any object's identity. As the object changes shape, position, or appearance along that duration, it's still recognized as the same object. Wells' portrait of a man's life cycle seems parallel to the life cycle of a sound (attack-sustain-decay). The instantaneous existence of any musical note is a two-dimensional representation of its three-dimensioned being. Wells continued:
Here is a popular scientific diagram, a weather record. This line I trace with my finger shows the movement of the barometer. Yesterday it was so high, yesterday night it fell, then this morning it rose again, and so gently upward to here. Surely the mercury did not trace this line in any of the dimensions of Space generally recognized? But certainly it traced such a line, and that line, therefore, we must conclude was along the Time-Dimension.
Is the implicit movement of a musical interval traced along the time dimension? When you play a perfect fifth, one note after the other, you recognize a distance because you have heard the linear motion from point C to point G. But the note certainly "did not trace this line in any of the dimensions of Space generally recognized." Could it be that our mental conception of a musical distance is that of a distance traced in time? I think perhaps it can be so, in which case the most important aspect to recognize (a la Wells' barometer) is that Time is a spatial dimension. When we hear a melody, we are perceiving an invariant three-dimensional shape. It appears to be changing, just as the appearance of the man changed from age eight to twenty-three, but each moment is merely a cross-section of the entire unchanging shape. As far as our senses are concerned, the song is still the same invariant object, its third dimension defined by lines traced not in Width but in Time.
Gibson promises a chapter on "intermodal transfer", which is the term for how touch and vision become associated. I may have to wait until I see that chapter to speculate about how we learn to recognize and identify a musical object, to know what associations bring specific cognitive meaning to an interval or a melody. On the other hand, perhaps musical sound is as direct a perception as is touch, and isn't informed by another sensory input. Maybe that's why ear training can be such a difficult task; with none of our other senses is the Time dimension so clearly necessary for perception and recognition. Perhaps traditional relative-pitch ear training is teaching us to use our ears to "feel" an object's contour directly, like we would feel the edge of a ruler, but along the spatial dimension of Time.
I wonder how a deaf adult, whose hearing was restored, would perceive music. Gibson cites a then-recent (1960) study by von Senden, who examined various adults whose sight had been restored by surgically removing their cataracts:
Senden reported that color alone is attended to at first, but that absolutely no attention is paid to contour. The same form in a different color, he says, is not recognized as the same object.
By analogy, this seems to support what I'd suggested as a reason why retarded people frequently have perfect pitch. If wavelength is the most obvious property of an object and contour is learned, then the learning-disabled would be more likely to stop learning after gaining the simplest concept of sound. But this situation isn't entirely identical; Gibson warns that "the adult who has developed without access to visual stimulation is not the same organism as the infant who has immediate access to it," which makes me wonder instead about the deaf person. Would they be able to hear a melodic contour? Would they recognize the same tune in a different key? Would they think two different melodies in the same key were the same song? What would their answer be to the hoary question, "is a transposed melody the same song?"
I've addressed that question before, but reading Gibson's introduction makes me understand that the question is misleading, since it has no single answer. The question is truly asking, has the musical object changed? and the answer depends on the criterion used for assessing change. If a listener is perceptually indifferent to wavelength, that listener will answer yes, it is the same, because of its identical contour; but if the listener perceives wavelength variance, they will answer no regardless of the contour. Both answers are correct, because the listeners use different perceptual judgments of "sameness". But these judgments are learned. Their choices illustrate what seems to me a central goal of perceptual learning, and I'll borrow one of Gibson's phrases to express it: "the development of perceived invariance."
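The two criteria can be made concrete with a small sketch. It is only an illustration under my own assumptions: melodies are written as lists of MIDI note numbers, "contour" is taken to mean the sequence of semitone steps between successive notes, and "wavelength" is taken to mean the absolute pitches themselves. The function names are hypothetical, not anything from Gibson.

```python
def contour(melody):
    """Semitone steps between successive notes (the relative-pitch view)."""
    return [b - a for a, b in zip(melody, melody[1:])]


def same_by_contour(m1, m2):
    """A listener indifferent to absolute pitch compares only the steps."""
    return contour(m1) == contour(m2)


def same_by_pitch(m1, m2):
    """A listener who attends to absolute pitch compares the notes themselves."""
    return list(m1) == list(m2)


def transpose(melody, semitones):
    """Shift every note by the same number of semitones."""
    return [note + semitones for note in melody]


# Opening of "Twinkle, Twinkle" in C (MIDI 60 = middle C): C C G G A A G
tune_in_c = [60, 60, 67, 67, 69, 69, 67]
tune_in_d = transpose(tune_in_c, 2)

print(same_by_contour(tune_in_c, tune_in_d))  # True
print(same_by_pitch(tune_in_c, tune_in_d))    # False
```

Both functions are "correct"; they simply encode different judgments of sameness, which is the point of the paragraph above.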
Gibson begins to illustrate perceptual learning by describing how everything in the world around us is constantly changing, "with every move of head, eyes, and body." If you want an amusing demonstration of how this happens, I still remember one from Eye and Brain. The author asked why it is that when our eyes move, we don't see the world whirling about? After all, our eyes are still in the same place in our head; relative to them, it is the objects around us that should seem to have moved. Why do they stay in the same places? It's a trick of your own mind. Close one eye completely, and close the other eyelid far enough so you can place your finger on the eyelid and move your eyeball (gently!) with your finger instead of the usual eye muscles. Do this, and you will see the world whirling about.
But even though your perception of everything in the world around you is changing in each instant, you can still recognize which aspects of what you perceive are not changing. You can detect invariances of pattern, shape, color, and more. That's why, even when you poke your eyeball with your finger, the objects that whirl around are still the same objects, although they move differently. It's the "and more" which I find immediately intriguing, though, because I've been thinking about musical sounds as having shapes that are defined by the Time dimension. Gibson offers, in her introduction, one of her own academic goals:
I shall try to show then... that since stimulation occurs over time, as well as over space, and has temporal as well as spatial structure, invariants are present in the stimulus transformations over time; and that it is pickup of these invariants that permits perception of the permanent properties of things.
This is consistent with Mon's experience, in which he literally created temporal structure by placing the target stimulus (a B-flat) into melodies, chords, and arpeggios. By identifying the invariant among these structures, he learned to perceive the pitch.
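As a rough sketch of what "identifying the invariant" could mean, here is a toy version in which the invariant is simply the pitch class common to every structure. The three structures are invented for illustration (they are not Mon's actual exercises), notes are assumed to be MIDI numbers, and a set intersection is of course not a model of how the ear does it.

```python
# Pitch-class names using flats, so that index 10 reads as B-flat.
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def pitch_classes(midi_notes):
    """Reduce notes to pitch classes (0-11), ignoring octave."""
    return {note % 12 for note in midi_notes}


# Hypothetical structures, each built to contain a B-flat.
structures = {
    "melody": [70, 72, 74, 70],    # Bb4 C5 D5 Bb4
    "chord": [63, 67, 70],         # Eb major triad: Eb4 G4 Bb4
    "arpeggio": [58, 62, 65, 70],  # Bb major arpeggio: Bb3 D4 F4 Bb4
}

# The invariant is whatever survives in every structure.
invariant = set.intersection(*(pitch_classes(n) for n in structures.values()))

print([NOTE_NAMES[pc] for pc in sorted(invariant)])  # ['Bb']
```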
I've been pondering a pair of quotes from Eleanor Gibson's book which seem to provide challenges to what I've been supposing about perfect-pitch education. In the first, the author intends to create a focal concept for studying perceptual learning.
...perception is not of stimuli; it is of the distal objects and surfaces. Stimulation carries information about them, but only if it is examined over space and time. It is extraction of this information that characterizes perception, and it is increasing ability to extract this information that characterizes perceptual learning and development.
The second quote, in which Gibson draws from the work of another Gibson (JJ Gibson, 1966), suggests a relationship between stimulus and object.
These variables are not found... at a given instant, but are found by analyzing structure, that is, boundaries, moving edges, transitions, gradients, and transformations over time. "Stimulus energy, unless it has structure, conveys no information. The natural structure of stimulation from the near environment conveys information directly. The structure of stimulation from representations conveys similar information, but indirectly. The structure of stimulation from socially coded or conventional signals conveys information still more indirectly."
The first quote proposes that we perceive only objects, not stimuli. The second suggests that a stimulus, by itself, "conveys no information". A pitch frequency is a stimulus; if you take both these statements literally, then, logically, you can't learn to perceive pitches by listening to pitches. If you hear a pitch frequency, your mind will not perceive any information, and if you do perceive something, what you perceive can't be a pitch because a pitch is not an object. A pitch has only two dimensions, length and height, and is therefore as abstract and imperceptible a concept as a two-dimensional geometric line; once you add a third dimension (duration) it's no longer a pitch but a tone of which pitch is one characteristic. This notion seems to support the idea that even if you successfully learn to recognize all twelve tone objects of the musical scale, you still may not know pitch; and, as such, seems to suggest that note-naming exercises may be less effective than other methods which present pitches within structures more sophisticated than a tone.
These two comments also seem to say that our ability to extract stimulus information is drawn from the structure in which the stimulus is presented-- and not from the stimulus sensation. If so, this would make clear what I've suggested before, that we can hammer G and C and B-flat tones into our brain all we want, and that won't make us any better at extracting those pitches from tones, chords, or music; our increasing ability to extract pitch from music is not dependent on our increased familiarity with the pitch-stimulus sensation, but on our increased familiarity with the structure of musical objects. This also supports the recommendation of learning relative pitch, of course, and explains why we can more easily recognize tones on our primary instrument-- but in addition, it creates an argument against listening to random note clusters while learning perfect pitch. If a stimulus gains its meaning from the structure which contains it, then doesn't it make sense to contain the target stimulus within a meaningful structure? You can see this in early language education; Sesame Street instructs the child, "M: mask, monster, marble," not "M: lkpmbz, mfgej, hpaqm."
In short, I'm beginning to think that perfect pitch training should be combined with basic music theory, to be most effective. Or, perhaps, I shouldn't expect something so simplistic to be complete; Sesame Street was good with letter sounds, but Electric Company had to take over to explain the consonant blends and syllable formations. I shouldn't expect to accomplish everything at once. But it's eminently clear that I need some basic music theory. I can find my way around a grand staff, but I don't know chords on sight nor other fundamental compositional structures. If someone could recommend a good textbook to me I'd appreciate it. (Update 8/27: Douglas very kindly recommended Tonal Harmony, which, with its companion workbook, seems at first blush to be precisely what I needed. Thanks!)
This week I began a class in Lessac Technique, taught by Yanci. This method is meant to improve the performer's vocal presentation. The training begins with strategies of muscular placement, vocal resonance, facial structure, and so on, for maximum utility of the voice, and I'm looking forward to learning those techniques. But when I first read the textbook, The Use and Training of the Human Voice, I zeroed in on one section in particular which wasn't any of these things. I was amazed because, in that section, Arthur Lessac seems to have created the perfect complement to my pitch studies so far, contributing a critical perspective that I had been missing. In my comparisons of music and language, I've concentrated principally on vowels as analogous to pitches; yet, obviously, there is more to language than just vowels. And, although I have been aware of prosody, I have been thinking mainly of how music is similar to language, not much concerning myself with how language is similar to music. Lessac's book has filled in these gaps; I was thrilled to discover that a major component of the Lessac technique is the recognition of consonants as musical sounds.
Lessac illustrates all the consonant sounds as various musical instruments. He writes that the stopped consonants are various percussions: for example, the "B" sound is a tympani, and "T" is a snare drum. The sustainable consonants are strings and winds: "M" is a viola, "Z" a bass fiddle, "R" a trombone. All together, Lessac assembles an entire orchestra! By visualizing the individual effects of each type of consonant sound, as though they were different instruments in an orchestra, the student can implement those effects in his speech and create a more sophisticated vocal style. His sounds will be more varied and distinct, which will make his voice more compelling.
I have to wonder, as I read this book, how many students assume that Lessac is speaking metaphorically. When Lessac creates his orchestra, I suspect that students may assume that the consonants are merely similar to the instruments. But the physical structure of each consonant sound represents a direct parallel to the timbre of its assigned instrument. The "T" sound doesn't sound like a snare drum; it is a snare drum. And even though Lessac discusses "the music of the vowels", it seems probable that a student will imagine that the vowels are "musical" in an abstract way, without knowing that spoken vowels resolve to virtual pitches which are in fact musical sounds. Throughout their training, Lessac students might not realize that when they speak they are not just "sounding musical", but are literally making music. I think this becomes clearer when you consider two counterintuitive facts.
First, the fundamental vocal frequency and the language formant frequencies are separate events. Your vocal folds produce frequency F0 in the throat, while F1 and F2 (and F3 onward) are created in the mouth. As long as F1 = 800 Hz and F2 = 2000 Hz then your listener hears "a as in hat", so F0 can be any pitch at all. The F0 pitch is heard through the vowel sound-- it is a characteristic, not a component. Similarly, when you hold an "M" sound, the F0 pitch is voiced within the timbre that is the consonant. The pitch is still perfectly recognizable, even though there's no vowel. Therefore, if you accept Lessac's associations, speaking the letter M at F0 = 415Hz is functionally identical to playing G# on a viola. By acknowledging this separation between pitch and language sounds, it should be easy to see that as you speak each word you are "playing" the consonant instruments, each of them resonating at the fundamental pitch of your speaking voice. [Unvoiced consonants like T and CH have no obvious pitch, but they provide the rhythm and texture which percussion should.]
Second, a sound shorter than 450 milliseconds is perceived as a single concept. When you speak the word "turned" in a quick monotone (let's say F0 = 784Hz, which is G5), the T, R, N, and D "instruments" are in fact all playing simultaneously. For language purposes, our phonemic awareness allows us to identify the letters' apparent order-- but musically, all the sounds in this syllable are a single chord. That is, in saying this word, a trombone and violin both play G5 together, on the beat of a tympani and snare. Every syllable you speak is a new symphonic chord arranged at the F0 pitch.
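The note names attached to the F0 values in these two points (415Hz as G#, 784Hz as G5) can be checked with a few lines of arithmetic. This is a minimal sketch assuming twelve-tone equal temperament with A4 = 440 Hz and the common convention that MIDI note 60 is C4; the function name is my own.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Name of the equal-tempered note closest to freq_hz (assumes A4 = a4_hz)."""
    # Each semitone is a factor of 2**(1/12); MIDI note 69 is A4.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    name = NOTE_NAMES[midi % 12]
    octave = midi // 12 - 1  # convention: MIDI 60 is C4
    return f"{name}{octave}"


print(nearest_note(415))  # G#4
print(nearest_note(784))  # G5
```

Nothing in the calculation refers to F1 or F2, which fits the first point: the note name depends on F0 alone, whatever vowel or consonant is carrying it.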
As I continue to read and explore the Lessac theory and technique, I'm sure that I will discover more about how language is like music. I see that there is a section in the book which refers to vowels as the "structure" of language, a statement which seems similar to the assertion that pitches are the building blocks of music. But for now, although I know it is an oversimplified summation, I'm satisfied to accept that consonants are the "instruments" of linguistic sound.
Gibson separates the set of "things we perceive" into five different categories: objects, space, events, coded stimuli, and representations. I've been wondering if Gibson has created these categories because the learning methods are different for each. If that were so, then identifying where pitch and music exist within the categories could suggest which learning methods are best.
The problem seems to be that music is, or can be, at least four of these five. A musical sound could be an invisible, temporal object-- defined by audible rather than visual contours, with its third dimension Time and not Width. A sound creates an implicit spatial perception, which we identify as a scalar "distance" or "degree" (and that's height). Sound is definitely an event, since it happens over time and exists only when some action has occurred. Musical sound is meant to evoke more than the images of the instruments which created it, which makes it coded information, "designed to correspond with some other set of events according to mapping rules which are more or less arbitrary." When Billy Joel sings about the Piano Man, you hear structures and melodies created by many different instruments, and all those musical sounds "correspond" with his lament of playing for that crowd in the bar.
I'm not sure music is a "representation", though. Although there are pieces like Peter and the Wolf which can be said to represent a storyline abstractly, Gibson uses the word literally. That is, even if a picture of a pipe is not a pipe, the picture still offers a visual experience that is almost perceptually equivalent. There are musical sounds which are aurally representational (the "wah-wah" pedal, perhaps) but this is not typical of musical sound.
I suppose that it makes the most sense simply to acknowledge that multiple categories are applicable. These perceptual classes could help us recognize which aspect of musical sound we are attending to; and that awareness, perhaps, could make it easier to infer pitch. A tone object, for example, demonstrates pitch as a consistent characteristic. A melody uses pitch as the spatial points along its contour. A major triad contains pitches which are root and harmony. If there are specific strategies for learning characteristics, or edges, or components, then we should match the strategy to the musical object we're hearing.
Is it necessary to hear a "special quality" of a tone in order to begin perfect pitch training? I've been assuming so, and I think just about everyone else has too. This is the process, and on the surface it sounds logical enough:
- perfect pitch is recognizing tone chroma ("color")
- start by listening to tone chroma
- drill the sensations of chroma until they stick in your mind.
But Gibson's book is making me question that process. I realized that this same perspective and strategy can be rephrased as:
- If you recognize tone chroma, you have perfect pitch.
- To learn perfect pitch, you must recognize tone chroma.
It seems like a Catch-22-- you can't learn perfect pitch unless you already have it. But existing methods of training assume that you do already have it; you just don't know that you have it. Even though trainers seem to agree that you need to develop an awareness of the unique quality of an individual tone (the "chroma"), exercises are mainly focused on helping you to memorize tone chroma, assuming that you are able to perceive that sensation. Although there are meditation drills which allow you to dwell on the sensation of each tone, and I've recommended trigger words and vowel sounds to help you hear and recognize each pitch as individually distinct, none of these strategies actually teaches you to hear pitch chroma. The strategies can help to clarify your recognition of some quality in each sound-- a thing which you already heard but never knew was there-- but only if you "get it".
This is exactly what Gibson addresses when she mentions Gestalt theories of perception. She references the famous type of experiment in which a monkey learns to reach bananas by using a stick. The monkey first throws the useless stick away, but when the bananas appear, the monkey figures out that the stick can be used to retrieve the bananas. The stick doesn't change, and neither does the monkey's sensory experience of the stick; but the monkey "learns" that the stick is also a reaching tool. Is this perceptual learning? Gibson seems to think not, indicating that the monkey's reorientation towards the stick represents a different kind of "perception".
The conception of perceptual learning in this book does not coincide with the conception of insight defined as perceptual reorganization. I am interested, rather, in the role of learning in perception. ...The Gestalt psychologists... rather than studying the contribution of learning to perception, they thought of learning as dependent on perceptual reorganization-- a kind of reversal of the roles of learning and perception... Creative learning, that is, perceptual reorganization, was due not to experience but to self-redistribution of forces within the organism.
The Gestalt psychologist would agree with what I've been agreeing with up until today: everyone definitely hears the tone chroma, they just don't know it, and if they simply alter their understanding of the sound they're on their way to having perfect pitch. Yet if this is so, then according to Gibson's quote, no perceptual learning is occurring in available methods of perfect pitch training. Rather, the learning is "dependent on perceptual reorganization." There is learning, yes, in the same way that the monkey learned that a "stick" was also a "tool", but this is not perceptual learning. It is conceptual insight.
I'm not sure that conceptual insight is reliable. If you've been looking around any of the on-line discussion groups, then by now you've definitely encountered at least one person who complains that, even after trying all these exercises, they just don't hear this mystical thing everyone's trying to tell them about. Since there are people who do get it, it's been easy to assume that the complainer is somehow not doing it right, or is "missing something", or is just thick-witted. I haven't yet seen the idea seriously entertained that these people honestly, truly, do not and can not hear tone chroma. What if they don't already hear the quality of the sound, and therefore can't apply insight to achieve perceptual reorganization? That is, if it isn't enough to just tell them to "listen for something different", expecting and hoping they'll catch on, how do you teach them?
This is what I hope to discover from perceptual learning principles-- how to ensure that someone will learn perfect pitch, even if they don't "get it". I want a curriculum which is scientific according to Dr Weinberger's definition: by applying the same procedure, you will always and invariably get the same results. No more of this "I tried it for a while, but didn't seem to be getting anywhere."..!
I'm also wondering about the usefulness of repeating single note sensations. Conventional knowledge says that the more familiar you become with the absolute sound of each tone, the more easily you will be able to hear those tones in music. I hadn't disagreed with that before now. Although I've discovered that the experience of a complex sound is holistically different from that of a single tone, and that a single tone in a chord is not actually heard but inferred, I still assumed that it was constant reinforcement of the single tone which would make it easier to extract that tone from a more complex sound. That might not be true.
Gibson mentions a 1926 experiment by Gestalt experimenter Gottschaldt in which subjects were shown simple line figures, after which they were presented with more complex shapes-- something like these diagrams.
The experiment was elaborated upon in Kurt Koffka's Principles of Gestalt Psychology, from which Gibson quotes:
After completing exposure of the a patterns, b patterns, which contained in each one an embedded a pattern, were shown for two seconds each. The subjects were told to describe the new patterns, mentioning anything that struck them... According to Koffka, if the empiristic theory was right, 'practice in seeing the a figure should make the b figure look like a plus something else (1935, p 156).' Since 520 exposures of the a figures did not produce more spontaneous mentions of them in the b figures than did three exposures, Koffka concluded that the theory was disproved.
Gibson quickly points out that this "by no means proves that learning does not alter or influence perception," as Koffka concluded, but that's not what interests me about the experiment. I'm interested in the fact that regardless of whether the subject is exposed to the a shape three times or 520 times, their perception of the b pattern does not spontaneously change. Once the a shape is defined for you, naturally you can find it in the b pattern, if you look-- this is learning, but it's not spontaneous, and you don't need to see the a shape more than once to do it. One exposure or a thousand, it's the same result. I'm remembering Sesame Street, too-- although they were big on repetition, most of the segments I can remember which involved placing a simple sound into a complex form (like this a/b experiment) presented the letter sound only once at the beginning and once at the end.
So I'm left wondering-- is it really necessary to play a tone 3000 times over? Will that really help you hear the chroma, and make it easier to extract the pitches from music? The Gottschaldt experiment suggests that no matter how familiar you become with individual tones, your perception of music will not spontaneously change; and, if you make a deliberate effort, you only need to have heard a tone three times, not 3000 times. You just have to have heard it correctly those three times.
Exercises of relentless note and chord repetition seem to be a Gestalt approach. That is, by continually attempting to hear tone chroma, eventually perceptual reorganization will occur, and learning will result. But Gibson thinks this is a "reversal of the roles of learning and perception"; in her view, once you learned to hear tone chroma, perceptual reorganization would result. If this is so, then no available methods actually teach you to hear tone chroma. It's my suspicion and my hope that Mon has already demonstrated the most effective approach to learning tone chroma (as an adult), and that my exploration of Gibson's book will mainly serve to codify and structure that approach in order to make it simple and accessible to anyone.
Mon raised a few questions about my previous entry; most notably, he was surprised that I seemed to suggest that becoming familiar with the a shape would not make it any easier to find within the b figure. I do need to offer some further clarification of what I've written, and I'm copying my reply to him below. I also need to mention-- I know that the entries I've written in this phase, so far, are almost purely speculative, although they draw from legitimate sources. In this part of the process, I'm not aiming to prove anything, but to create some additional perspectives and possible approaches which may become useful as I continue.
[Here's what I wrote to Mon:]
The emphasis I'm making is twofold: first, that familiarity with the A shape did not spontaneously cause the complex B shape to seem different; and second, that the experiment assumed that the A shape could be perceived without difficulty.
The lack of spontaneous change I see as mainly reinforcing the idea that even when you know the original figure extremely well, you still have to learn how to find it when it's presented in a B shape-- that analyzing the B shape is a different task. You're right in surmising that repetition of the original shape does have an effect, but this is what Gibson has to say about that:
Hanawalt (1942) found not only that repetition of the same design was effective in locating such forms, but that practice yielded transfer to new designs. Amount of practice was the important factor. An attitude of search could not alone explain this transfer. The subjects learned, among other things, to look for distinctive parts of the embedded design in the whole complex. ...It is not clear... whether the old experience must always be an exact repetition of the new one... we do not know whether or not it is literally the repeated experience itself that accomplishes something, since transfer to discovery of different embedded figures occurs.
In other words, although repeated exposure to the A shape does improve the performance of actively finding A within the B shape, the subject also finds other embedded shapes as a result of the same exposure; so it's not clear whether "A exposure" is actually familiarizing the subject with the A shape itself or, instead, principally raising their awareness of the fact that there are embedded shapes (of all different types) within a B shape.
You mention that "to extract [a pitch] from a chord, I need to be familiar with it," and I'm not challenging that. What I am challenging is whether repeated exposure to the A shape actually helps you to see the A shape itself any differently than you did before. You may become more familiar with what you perceive to be the A shape, but if you can't see the shape properly to begin with, it may not spontaneously turn into a meaningful figure. If you aren't listening to a tone correctly, if you aren't hearing the true pitch chroma, then you may never come to hear that chroma even after listening to that tone 3000 times. It won't just pop into your head. Gibson, in her next chapter, talks about cue-based learning:
Eriksen and Doroz (1963) presented subjects with ostensibly perceptual tasks in which extraneous but correlated cues were provided... Analysis of the results revealed that those subjects who had detected the cue and could report on it showed cue learning in the sense of above-chance performance. A number of these subjects spontaneously verbalized their awareness during the course of the experiment. But the subjects who were unable to verbalize the nature of the cue did not learn.
I interpret this as supporting what I mentioned in my last entry. Current methods of pitch training encourage you to listen for pitch chroma, but do so via a Gestalt assumption that you already hear it and just need to be cued to recognize it. But cue learning is unreliable, as this experiment demonstrates: some people "get it" and learn, but the people who don't understand the cue will not learn. This is why I am so interested in what you did-- you didn't use meditation, hypnosis, trigger words, vowel sounds, or any such external-but-correlated cues, applied repeatedly to individual tones; you forced yourself to hear the phonemic identity of each pitch directly, by inferring an invariant component from complex musical objects. I think this will very probably work for anyone, whether or not they can insightfully grasp the concept of "pitch chroma" the way the subjects in the Eriksen/Doroz experiment were able to verbalize their experimental cues.
Today I needed to kill ten minutes at the music library, so I thumbed through Selected Theories of Music Perception by Harold Fisk. One of the chapter titles caught my attention-- "Why Music Does Not (Usually) Sound Like Speech"-- and, intrigued, I began to read.
The chapter cited two studies conducted by Campbell and Heller (I guess I just have to get used to the fact that academic citations never use first names) in 1978 and 1980, in which the researchers discovered that people could identify the timbre of an instrument-- distinguishing, for instance, a violin from a saxophone-- within only 100 milliseconds. The writer offered his evaluation of that experiment:
Counting voice as an instrument, I reasoned that a speech sound could (also likely) be discriminated from a nonspeech sound given as little acoustic information as that provided by the first 100 milliseconds or so of the signal.
This statement was the centerpiece of this chapter. Naturally, when I read it I thought of the entry I'd posted here just a few days ago, reporting on the Lessac technique, and I became excited. Had I serendipitously found the proof (or disconfirmation) of my speculation-- would I find out for sure that consonants were (or were not) like musical timbres? I read further in the chapter, looking for the experimental data that would demonstrate this hypothesis, and was frustrated when I didn't find any. As it turns out, although the title of the book had made me think that it was a collection of scientific articles selected and edited by Fisk, it was actually a series of philosophical essays written by him.
I felt let down; his was a reasonable speculation, but without scientific proof it is, unfortunately, easily contested. For example, I've learned that our minds process sound differently based on our expectations of the sound; what if, for example, a person was played a musical sound but told that it was a speech sound-- or vice versa? When I got home, I recorded myself speaking Lessac consonants and used MIDI to generate notes of the corresponding timbres. I immediately found that 100 milliseconds is a long time-- you can fit an entire drumbeat into 100 milliseconds-- and that Fisk was quite right, in that I could easily discriminate musical from speech sound. But while I was listening, I also confirmed that a 100 ms speech sound can definitely sound musical if you're not aware it's speech. As long as you know that I'm speaking the letters L and N, you can easily recognize which is which; but if you didn't know, it would just sound like a couple of synthesized musical tones. I wanted to see proof of whether consonants and instruments were psychologically distinct, and Fisk hadn't taken the idea far enough to be of help to me.
But the important part of this experience, for me, isn't Fisk's content. It's the fact that I felt let down that he had an interesting, reasonable idea but didn't do the scientific work to prove his point. (It was especially frustrating because setting up an experiment to test this idea would have been so easy!) I was surprised recently when a fellow read Phase 7, in which I discuss my own theories about the connection between language and music, and responded to me that it was a "not very convincing argument." I hadn't understood what he found lacking; but the Fisk chapter now reminds me that it really doesn't matter what my logical "argument" is. Until I take the time to do legitimate tests on my hypotheses they can't possibly be convincing. Interesting, yes, and compellingly probable, but definitely not convincing. I'm glad to be reminded of this, mainly because now I'll start wondering how I can test the music/language correspondence hypothesis that I began with on July 26.
Before getting to her own theories of perceptual learning, Eleanor Gibson wades through a lot of theories which she positions as "old school" thinking.
She doesn't seem too keen on cognitive theories of perceptual learning. She characterizes these theories as "enrichment" theories, because each of them supposes that sensory input is supplemented (or "enriched") by inference or hypothesis, and she describes them with apparent skepticism. She acknowledges that cognitive theories do seem to explain various strategies with which we categorize and comprehend sensory input, but she concludes that they fall short of being useful. Because they're "thought to be on the inside, to be inferential and unconscious," they can't be easily tested or proven. That is, they may be perfectly valid descriptions of some aspect of our perceptual process, but in the end they aren't much practical help. She proceeds to this alternative:
Another group of theories... may also be classified as enrichment theories, but the supplementary processes are thought of as responses instead of conceptual or inferential processes. These response-oriented theories assume that association is the essential mechanism of integration, usually Pavlovian conditioning of stimuli to responses.
In this category she includes "motor copy" theory (I mentioned this theory earlier), in which the sense of vision is learned by association with motor stimulus. This theory posits, for example, that when the eye perceives a "cube", it is the direct tactile experience of the cube that will allow us to generate a meaningful visual imprint of the "cube" concept. The contours which are visibly suggested are physically confirmed. This seemed plausible enough, but Gibson raises important questions: "how would it explain perceptions other than those of form and scale? How would it treat, for example, the perception of color differences?" She also points out the potential contradiction that, if motor confirmation were required to perceive a contour, then an object's contour would not necessarily be perceived, and thereby could not be confirmed through movement!
Gibson also takes some time to discuss strategies of verbal labeling. I took particular interest in this section, because, of course, I have thought that trigger words and vowel sounds are useful for improving your ability to recognize pitch sensation. She describes a number of perceptual experiments which involved different types of verbal labels-- nonsense sounds, alphabetic assignments, shape-descriptive words, and others-- and in most cases, experimental groups which used verbal labels performed better in identification tasks than control groups who had no labels. But even though the evidence in favor of verbal labels is roundly positive, she does not accept the conclusion that the labels themselves were responsible for the improved performance. Rather, in each instance, she finds a reason to challenge the conclusion, like this one: "can one say that the label and its response-produced cue are the cause of increased generalization? It seems at least equally plausible that the children learned to pay attention only to that aspect of the forms which permitted most economical division into two categories." The role of the verbal label, where it appears to improve performance, is that of allowing the subject to explicitly qualify a differentiation that they have perceived separately from the label itself.
Labeling, if I'm understanding her view, doesn't actually contribute to perceptual learning, but it helps to solidify learning when it occurs... usually. She cites at least one experiment where, after multiple practice trials, those who used labels demonstrated equal performance to those who didn't. It occurs to me that I've promoted this idea, although without really knowing it, when I recognized that the trigger words and vowel sounds existed principally to bring you to the point where you could hear the pitch sensation and recognize that it sounds "like a D"-- that eventually, the labels do become irrelevant. I had not, however, considered the probability that the labels do not directly affect the learning process.
She finishes her treatment of existing theories with an introduction to "differentiation" theory, and this theory appears to be the cornerstone of her book. I must admit I like it already.
The last type of theory I have called stimulus oriented, because it considers perceptual development to be an improvement in discrimination of information which is actually present in stimulation. Differentiation of perception occurs as response to aspects of information in the stimulation becomes more selective and specific.
I like it mainly because an "enrichment" theory would presuppose that a neural model of perception exists and may be applied-- but in the case of pitch sensation, there may be no existing model. Differentiation theory, from what I see so far, suggests that learning may be induced by repeated and structured exposure to existing sensory stimuli. The "repeated" part I knew-- now it's a better structure I'm looking for.
P.T. Brady didn't actually teach himself perfect pitch.
I've previously expressed my frustration that, since I wasn't affiliated with a university, I didn't have access to the academic resources and journals that professional researchers could make use of. Well, today, I suddenly remembered that now that I am affiliated with a university, I do have access! I stopped by the science library and, within fifteen wonderfully short minutes, I had in my hands P.T. Brady's 1970 article "Fixed-Scale Mechanism of Absolute Pitch", and an unexpected bonus-- Lola Cuddy's 1968 article "Practice Effects in the Absolute Judgment of Pitch", from the same journal, which (in skimming Brady's article) I discovered had been Brady's principal inspiration. He also cited Felix Salzer's Structural Hearing as a significant influence, and I've now ordered that book; I wonder how it will compare with Mathieu's Harmonic Experience? It was especially exciting to have found these two articles so easily, because, as I passed through the circulation desk, the fellow behind the counter remarked with some surprise that in the thirty-two years that these had been sitting on the shelves, no one else had ever taken them out.
P.T. Brady has been referenced in most of the research that I've read about absolute pitch as "the only man to have taught himself perfect pitch as an adult." Although that statement has to be modified to acknowledge that his is the only scientifically documented case, I had been curious ever since I learned of his accomplishment to know exactly what his new experience with pitch actually was. Marguerite Nering had, at least, described how Brady had done it, but even then, I still only knew of his results: that he had learned to name notes with great accuracy. I wanted to know exactly what he had taught himself, and how he described his experience as a result of his self-training. What had he really learned? I say now with unbridled glee that he didn't actually teach himself absolute pitch! Instead, he learned what Bruce Arnold calls "One Note Relative Pitch"-- and for only the key of C. Note the title of Brady's article: "Fixed-Scale Mechanism of Absolute Pitch."
Now, the reason I'm so delighted isn't because I wanted to prove anyone wrong. Cuddy offers, in her paper, an appropriate admonition.
A great deal of controversy has surrounded the problem of absolute pitch. It has been suggested that absolute pitch is an innate gift (Bachem, 1940; 1955; Revesz, 1953), that it is learned but is dependent upon early experience (Copp, 1916; Watt, 1917; Jeffress, 1962; Ward, 1963b), or that it is attainable at any age through training (Meyer, 1899; Seashore, 1919; Riker, 1946; Neu, 1947; Brammer, 1951; Lundin, 1953). Our present purpose is not to decide among the various explanations of absolute pitch-- for one thing, it is logically impossible to "prove" that absolute pitch is not innate, and, for another, one can always choose to define absolute pitch so as to exclude all cases where some kind of formal training can be detected. Our purpose rather is to develop a method by which listeners may improve their judgment of pitch...
Certainly, at the time, by many folks' definition of perfect pitch, Brady did learn it, and the only reason I can say he didn't is that my definition is different. That's still not why I'm so gleeful-- however, for the moment, I would like you to consider the validity of my distinction. This is based partly on the results of his training, but also on the mentality of the training itself, as Brady describes it: "For example, the sequence G-A should not sound like a whole step; the G should sound like C's dominant, and the A like C's minor." If the G sounds principally like a "dominant" tone, then it's a relative judgment being made, not an absolute one.
Now, here's what Brady could do:
- For 57 days, upon waking each morning, he identified a random note, with 67% accurate responses and 31.5% semitone errors, for a total of 98.5% adequately correct answers.
- "I was able to identify every note... from a uniform-tone distribution played at the fastest rate, without feedback... the task became very easy."
- "A flutist and I conversed in a soundproof booth, and roughly every 3 minutes, the flutist played a note from a randomized list... From five notes played, I made four correct responses and one semitone error."
- "While random notes are played, I have no trouble retaining C."
And here's what he couldn't do:
- "the entire mechanism collapses on hearing a fragment (a few seconds) of music played in a key other than C."
- "On hearing most music, I still cannot identify the key without considerable effort and occasional gross errors... yet possessors of AP have told me that their easiest task is musical key identification..."
- "The sense of pitch memory waxes and wanes... [on some days] I have to struggle to remember any note."
Most telling, though, is his description of how his new skill actually works, relating all tones to the key of C, without hearing any new and unique sound qualities of those tones.
If you state that you are going to play a random note for my identification, I will simply try to reconstruct the C-major scale so that when the note is struck it will be identified as if someone just played the whole scale. It is the same task that a trained musician can perform; there seems to be no new dimension.
Now, why am I so pleased to read all this? Not because I want to tell him he's wrong, but because his work seems to tell me I'm right. Everything he wrote is in complete accordance with my speculations so far. Based on his testing methods, I would have predicted the results he achieved. I would have been surprised if he had been able to identify any tones within music. I would have expected that musical perception would interfere with his ability. His ability to recognize the "absolute" sound of C-major tones, but not any other scale, correlates with what I was writing about on May 23 of this year. And, marvelously, one of his section headings says outright, "The Inefficient but Successful Acquisition of AP" (my emphasis). I could go on-- there's more-- but that's the gist of it. (If you have any further questions about his process or his results, please feel free to write.)
Most importantly, I'm tremendously relieved that discovering this article means that I can keep moving forward in the direction I'm going. Before today, I had been seriously concerned that, without knowledge of Brady's successful method, I would find myself overlooking an essential factor in perfect-pitch training; furthermore, if his results contradicted my research, then I would have needed to refine (or throw out completely) whatever had been contradicted. Instead, I get to keep building in the same direction, with an even more solid base than before.
And if that weren't enough, I now have additional support for what I was thinking about before, that you can't learn pitch by listening to pitches-- or, more precisely, that pitches are most clearly and easily inferred from a musical structure instead of a note. In Taneda's book, he says right at the top of page 33, "Absolute pitch develops primarily with chordal recognition," and "if the child learns to absolutely recognize two chords, he or she can gradually also hear absolute single sounds." Naturally this isn't the entire picture, but it's developing pretty well so far!
As I've been reading about perceptual learning, I've also been teaching myself some basic music theory. Combined with my awareness of intervals' harmonic identity, this has been particularly applicable to a class I'm taking in musical theater (my singing technique badly needs improvement). Right now I'm rehearsing "On This Night of a Thousand Stars", from Evita; and, as I looked at the sheet music and listened to the song last week, the piano chords seemed oddly familiar. This is a song about wooing a woman who is already fond of me; a sentiment sort of like "My Bonnie". I wondered if... yep, all the chords were major sixths. Yesterday, I also asked to look at the sheet music of one of my classmates' pieces; in his song, about reaching for a love that's just out of his grasp, the chorus climaxed at a particular note that created such a powerful, tangible sense of hope and frustration that I immediately wanted to know what the interval was. Turns out it was a major seventh (F# in the key of G). That is, of course, the leading tone of its scale, which is emotionally "reaching for" the tonic that's just out of its grasp.
I'm coming to wonder with more interest to what effect perfect pitch might be used in everyday speech-- or, more specifically, in performance speech. The speech mechanism has two distinct parts: the language formants, which we interpret explicitly, and the fundamental pitch, which seems cognitively irrelevant to language. But the fundamental pitch is a musical sound, and musical sound creates abstract emotion just as directly as language sound creates abstract aural association. If you speak the word "panda" you conjure literal images of a fuzzy black-and-white beast; if you sing the musical interval of a seventh you [can] conjure emotional images of desire and longing. It seems probable to me that a performer with perfect pitch, aware of harmonic effect, could mindfully modulate his spoken pitch frequencies for a very precise impact.
I'm not quite sure what to make of this curious little fact... Taneda, in his book, says that a child will learn and recognize F# and Bb before any other accidental tones. PT Brady agreed that Bb was the easiest to recognize. When I was plunking out intervals at the keyboard, I was surprised to discover that F# and Bb are the only accidentals which form perfect fourths and fifths with non-matching keys. All the other P4s and P5s are either both white keys or both black keys. Is this a coincidence? Or is there something significant about this distinction? I've been told that F# is the boldest of the musical notes; what makes it and Bb so special?
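The keyboard observation is easy to check mechanically. The following is only a minimal sketch: the note spellings and the function name `mixed_color_fifths` are my own choices for illustration, and it treats the twelve pitch classes as a standard piano layout. It lists every perfect fifth whose two notes fall on keys of different colors; since a perfect fourth is just a fifth inverted, the same pairs cover the fourths as well.

```python
# Pitch classes 0..11 starting at C, with the two accidentals of
# interest spelled as in the text (F# and Bb).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "Bb", "B"]
BLACK_KEYS = {1, 3, 6, 8, 10}  # pitch classes that are black keys on a piano


def mixed_color_fifths():
    """Return (lower, upper) note pairs a perfect fifth apart
    (7 semitones) where one key is white and the other is black."""
    pairs = []
    for lower in range(12):
        upper = (lower + 7) % 12
        if (lower in BLACK_KEYS) != (upper in BLACK_KEYS):
            pairs.append((NOTE_NAMES[lower], NOTE_NAMES[upper]))
    return pairs


def black_keys_in(pairs):
    """Collect the black-key notes that appear in the given pairs."""
    black_names = {NOTE_NAMES[pc] for pc in BLACK_KEYS}
    return {note for pair in pairs for note in pair if note in black_names}


if __name__ == "__main__":
    pairs = mixed_color_fifths()
    print(pairs)
    print(black_keys_in(pairs))
```

Only two mixed pairs turn up, Bb with F and B with F#, so the observation holds for the keyboard layout itself; whether that has any perceptual significance is a separate question.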
Although I ask this somewhat rhetorically, since I don't know how to answer it myself, it's not an idle question. Gibson's book reports how stimulus categories are critical to perception. She describes one experiment that identified nine recognizable features of language phonemes, and tested people to learn how they confused the sounds with each other. Here's a chart of the feature categories:
From On Human Communication by Colin Cherry, MIT Press, 1965
When the scientists analyzed which phonemes were mistaken for each other, they confirmed that errors most frequently occurred when the phonemes differed by only one or two features. These results were duplicated by other researchers in 1968, and another experiment showed similar results with written letters (using categories like "horizontal/vertical" or "open/closed"). This categorical method of classifying perceptual distinction could potentially be useful for determining, objectively, which musical tones are most dissimilar-- but the challenge there is identifying the categories. The researchers acknowledged that their phoneme (and letter) categories were intuitive choices; they were able to create the categories because they were already able to recognize the differences, and they could test the categories because their subjects could absolutely identify the phoneme sounds (or symbols). In order to develop a similar categorical scheme, I would need a consensus from people who already had perfect pitch, and I'd need an adequate number of test subjects who also had perfect pitch. At this point I'm not completely certain that the effort it would take to create this structure would pay off in the long run-- but Gibson's book places such a strong and convincing emphasis on feature differences that I do wonder if I should start finding those people.
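To make the idea of "differing by only one or two features" concrete, here is a minimal sketch of how such a comparison could be counted. It is an assumption-laden toy: it uses only three textbook features (voiced, nasal, labial) for six consonants, not the nine features from Cherry's chart, and the names `PHONEMES`, `feature_distance`, and `close_pairs` are hypothetical.

```python
from itertools import combinations

# Toy feature table (NOT the nine-feature chart from the book):
# each phoneme is described as (voiced, nasal, labial), 1 = present.
PHONEMES = {
    "p": (0, 0, 1),
    "b": (1, 0, 1),
    "m": (1, 1, 1),
    "t": (0, 0, 0),
    "d": (1, 0, 0),
    "n": (1, 1, 0),
}


def feature_distance(a, b):
    """Count how many features differ between two phonemes."""
    return sum(x != y for x, y in zip(PHONEMES[a], PHONEMES[b]))


def close_pairs(max_diff=1):
    """List phoneme pairs that differ by at most max_diff features.
    Under the finding described above, these would be the pairs
    predicted to be confused with each other most often."""
    return [
        (a, b)
        for a, b in combinations(PHONEMES, 2)
        if feature_distance(a, b) <= max_diff
    ]


if __name__ == "__main__":
    for a, b in combinations(PHONEMES, 2):
        print(a, b, feature_distance(a, b))
```

A comparable table for musical tones would need feature columns that nobody has yet agreed on, which is exactly the difficulty: the counting is trivial once the categories exist.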
The other experiment was one in which kindergarten children were shown a simple shape, as a prototype, and then asked to determine whether a new figure was the same or different. The new figure would resemble the prototype, but would be distorted in some way: rotated, curved, or resized, for example, among other possible changes. After some training with these shapes, the experiment began. The control group (C) was shown unfamiliar shapes that were distorted in unfamiliar ways. One experimental group (E1) was shown the familiar shapes, but they were distorted in unfamiliar ways. A second experimental group (E2) was shown unfamiliar shapes, but the types of distortion were familiar. The results:
|Group||Number of errors|
The E2 group were able to identify completely unfamiliar shapes as "the same" just because they recognized how the shapes could be distorted. When the E1 group was shown the same shape but in a different context, they couldn't recognize it. This seems very similar to how people can memorize all 12 tones of the musical scale, when played in isolation on a piano, but still can't recognize them in a different context (in music or real-life sounds). They don't know how the pitches can change, so they aren't able to recognize a pitch that has been changed.
In addition to these results, Gibson describes other, similar experiments which also support the conclusion that "learning of a prototype was not the sole or the essential process in improvement of discrimination." Obviously, a prototype must exist in order to make the comparison, and Gibson says that the prototype is undoubtedly important "when retention over time is required", but she emphasizes that the learning process is enhanced not so much by prototype memorization, but through perceptual differentiations.
Although I've written against note memorization almost from the start, these results appear to offer the scientific support I wanted for my recent speculation that, instead of trying to imprint the prototypical 12 tones, the learning process should focus principally on demonstrating transformational differences. That is, if it's demonstrated for you how a G tone can change from one context to another, you should become able to recognize the G-- without ever having had a stable idea of what "G" was to begin with. That is, it should be this type of differential learning process which would teach you to hear "G" even if you never figured out what "pitch color" or "tone chroma" was supposed to be.
Of course, I realized almost as soon as I'd written that last entry that the probable reason for F# and Bb being most recognizable was their horrific harmonic ratios in equal-temperament C-major. The minor seventh is even uglier than the augmented fourth-- no wonder Brady found it the easiest to recognize. That doesn't exactly explain the black/white fourths/fifths thing, but I don't think I need to wrap my head quite so far around that one. Either it will serendipitously show itself to be important or it won't.
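One rough way to put numbers on this is to compare the equal-tempered intervals above C with nearby just-intonation ratios, measured in cents. This is a sketch under stated assumptions, not a settled measure of "ugliness": which just ratio counts as the reference for an augmented fourth or a minor seventh is itself a choice, so several common candidates are listed, and the names `JUST_CANDIDATES` and `deviations` are mine.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


# Equal-tempered distance above C, in semitones (100 cents each).
ET_SEMITONES = {"F#": 6, "Bb": 10}

# Candidate just-intonation ratios. Which one is the "right" reference
# is an assumption; the result depends on the choice.
JUST_CANDIDATES = {
    "F#": {"45/32": 45 / 32, "7/5": 7 / 5},
    "Bb": {"16/9": 16 / 9, "9/5": 9 / 5, "7/4": 7 / 4},
}


def deviations(note):
    """Equal-tempered size minus just size, in cents, for each
    candidate ratio of the given note above C."""
    et_cents = 100 * ET_SEMITONES[note]
    return {name: et_cents - cents(r) for name, r in JUST_CANDIDATES[note].items()}


if __name__ == "__main__":
    for note in ("F#", "Bb"):
        for name, dev in deviations(note).items():
            print(f"{note} vs {name}: {dev:+.2f} cents")
```

The size of the mismatch depends heavily on the reference ratio, so this only illustrates the comparison; it does not by itself settle which of the two intervals is rougher.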
I've recently been challenged, by a notable in the music-education department here at the university, to "define the problem" in a way that can be scientifically tested. Although I think I see the broader scope of what I'm proposing-- adult perfect-pitch training as development of phonemic awareness, treated as a perceptual-learning task-- the challenge is to determine where to begin, in a way which will convincingly demonstrate that this is the correct direction. On the one hand, I could attempt to demonstrate Taneda's method, but that necessarily requires a study over multiple years, involving enough children to create a meaningful sample. On the other hand, results for an adult population might be more immediately evident, but an experiment would need to show that the results are indeed a step towards perfect pitch and not merely aural discrimination.
On the plus side, a perceptual-learning experiment for adults, properly designed, could show more conclusive results than note-naming experiments have been able to show. Thanks again to my association with this school, I finally tracked down yet another elusive publication: the Mark Rush study from Ohio State. This is a PhD dissertation, where Nering's was a master's thesis, so its treatment is more rigorous and its background material more extensive, but it's hampered by the same weakness in its premise: because Rush and Nering both define perfect pitch as "naming notes", they are only able to demonstrate whether or not their subjects got better at naming notes. Their subjects did get better at naming notes, but neither study proves that subjects actually acquired perfect pitch perception as a result of the training. Indeed, Rush seems to have uncovered something which I have wondered about for some while now:
Absolute pitch ability following training was related to advancement in the training method; a strong correlation was found. Correlations between the posttest scores and measures of the amount of effort expended by the subjects were only moderately strong. It was also found that correlations between absolute pitch ability and native ability were slightly stronger than any of the other correlations, though this finding had not been anticipated.
That is: the better a musician you already are, the easier it will be to learn how to identify tones via a note-naming system. Although Rush determined that an ability to name notes increases proportionally to progress in the training material, your ability to progress in the training appears to be strongly dependent on your existing musicianship.
In this respect, I found interesting the results of three particular subjects compared to the comments they submitted. Rush says that his results "indicate that at least one and possibly three of the five subjects who completed the training regimen actually became possessors." Rush says this because all three of these subjects-- J, T, and U-- accurately named as many notes on the posttest as someone with perfect pitch might have been able to. But here's what they said about their accomplishment.
Subject "J" (50% correct, or 75% within 1 semitone): "This did help develop my relative pitch sense. I was much more aware of intonation problems in Symphonic Choir... I really do believe absolute pitch is acquired, especially after practicing. I feel that I am (maybe) halfway there."
Subject "U" (57.5% correct, or 87.5% within 1 semitone): "...I think that it helped my relative pitch extensively. I also think that I've got a beginning sense of absolute pitch. ...I think if I kept at it I could probably perfect it in time."
Subject "T" (88.3% correct, or 93.3% within 1 semitone): "I made it through. What's really strange is listening to the radio and naming the pitch. The study, by itself, was not quite enough to get it going, though. The relative pitch skills (especially relating notes to songs: Beethoven's 5th to remember G, etc) were very helpful. ...[from] the study and my previous training with relative pitch, I now have close to perfect pitch."
Rush indicates, elsewhere in his study, how the people who did well on the pre-tests tended to perform best on the post-tests. I had strongly suspected that this factor-- existing musical ability (especially relative pitch)-- was probably the X-factor which caused people to differ so wildly in their results using commercial perfect-pitch courses or software. I'm pleased to have some scientific support of that, because it helps me refine the experimental problem. It makes me all the more intent on finding out how perceptual learning can teach people what they don't already know.
At rehearsal tonight, I inadvertently confirmed that our accompanist has perfect pitch. I had observed her sight-reading and improvising in the classroom, and noticed how she had memorized the entire musical score for "The Big Bang" which she performed at the Hippodrome state theater, and I expected that she would have it. Tonight, while the cast rehearsed a dance number I am not in, I was tweaking the translation of Taneda's book on my laptop; and she was in earshot when the director curiously asked me what I was working on, so she volunteered the answer to my suspicion without my having to ask. Actually, there were a handful of people in earshot, and since they are singers, I suddenly found myself explaining to some very curious performers what you probably have already read in these pages I've written.
What makes this particularly interesting is that, in learning her status, I had the opportunity to ask her about how she plays piano. I explained what I'd learned from typing all those German words (the ideas I wrote about in July) and asked her if this matched her experience in playing piano. What she seems to have verified-- I say "seems to" because we had only about ten minutes to chat during the break, so there is surely much more to know-- is, well, everything. Specifically, she confirmed three particular suppositions which I thought would have to be true if reading music were identical to reading language. She said that she reads the sounds and her fingers automatically find the keys for the sounds she hears in her head, like the letters of a computer keyboard; she agreed that she doesn't actually perceive the individual notes of a chord when she plays it, but the entire unit as a single aural shape, like a word; she confirmed that if she sees the beginning of a musical phrase, she knows what to expect next, in the same way that anyone would know the last word of an incomplete sentence. Of course, this doesn't prove anything, scientifically, but the important thing for today was that, as I explained my theories equating musical and language perception, she agreed that I was accurately describing her experience.
One important difference between her experience and what I've heard from the people with perfect pitch I've met on-line is that she says she does hear an emotional "lift" or "drop" from a scale played ascending or descending. Does she hear "distance"? Are note acceleration and distance also psychologically separate and distinct from each other? I hope to have the opportunity to talk with her further about the nature of her perception, and she is intrigued to do so, although her schedule is about as busy as mine (obviously, one can't have lengthy theoretical discussions when one meets in rehearsal); and, of course, I'm heartened because this makes me realize that I could probably find a remarkable resource just in canvassing the music school. I've already approached the university's Music Education department; I hope that they can provide additional help in developing the new curriculum (v 3.0!), and I also hope to find aspiring educators who are interested in teaching Taneda's method in a way that I can officially, academically, study and report on.
In one section of his book, Taneda talks about why he uses colored piano notation instead of traditional black-and-white notation. He claims that the black and white note system is too abstract for a three-year-old child. Here is part of his explanation.
"Although it certainly seems that children of three to five years can learn note reading, what appears to be note reading is most likely some other process. When a small child sits in front of a page of traditional notes, of course you can see for yourself that the child is, apparently, playing the music on that page; but the child is not really reading the notes. Rather, the sounds and movement are flowing from the child’s memory.
"With somewhat older children, five to six years of age, it is certainly possible for them to read notes, but their capacity for free musical expression is limited, because so much of their effort must be applied to reading the notes. It is, unfortunately, a fact of musical training for children that development of note literacy rarely takes place. When this systematic instruction is missing, the child is not fully able to realize their musical training. This can be observed in older children, ten years and up, who can read notes only slowly and with great effort. In such cases, the note reading gradually becomes an intolerable hindrance, and psychological barriers arise out of the child’s frustration towards their inability to succeed. Each attempt at playing is perceived as a terrible effort, and the danger of this is that the child will give up completely."
This passage precisely describes my own experience with Suzuki piano. I began training around age three, and by kindergarten was playing "The Happy Farmer" and surprising the kids (and adults) at school with my apparent precociousness. But as I progressed in the method, the pieces became progressively harder and harder to play until I did give up out of frustration at age 10, just as Taneda predicts.
Taneda's explanation makes sense, knowing the Suzuki method. The child is supposed to listen to a recording of the musical pieces and then duplicate that recording on their instrument. This is supposed to help the child to understand the rhythm and the dynamics of each piece. However, this process also makes it possible for the child to learn the piece without ever actually having to read the music on the page. Early last year, when I bought my new keyboard, I bought the first few volumes of the Suzuki method along with it. I was surprised to open the first book, and then the second, and discover that I could immediately play any of the same songs I'd played back then, twenty-odd years ago, even though I hadn't practiced them since. The sounds and movement were effortlessly "flowing from my memory"-- although I couldn't actually play them from memory. If you asked me, I would point to exactly where I was in the piece, and tell you that I was reading from that particular measure. I blazed through the first couple of books, pleased that I was making good progress, and optimistic about being able to start learning piano again... and then I reached the first piece I hadn't played as a child, and wham. I found myself struggling to find simple triads and harmonies with my right hand; and my left hand was practically useless, unable even to consistently remember at which spot on the keyboard to find the bottom note of the staff, much less construct chords or harmonies. I was definitely not reading this music, although I certainly had seemed to be! Rather, I was using what I do know about musical notation to find visual cues that would allow me to access the memory of how to play the song.
Naturally, this isn't going to be everyone's experience with Suzuki method, but it's definitely mine. Clearly, "note literacy" was ignored in my training; very probably there are Suzuki teachers who continue to emphasize note literacy, but rather than teaching reading by absolute pitch they focus on the association between the printed note and the physical movement to the indicated piano key. I also think there must be some validity to the idea that bowed (stringed) instruments are more likely to instill perfect pitch, because they require the student to be attentive to the pitch of their sound where a keyed instrument does not; I was amused when a musician recently told me how strange it is to her that the people with perfect pitch "all seem to choose violin."
I've been reading more of Gibson's book, and trying to separate out what's applicable to learning. I'm intrigued that back in August 2002, when I first started this website, I stumbled across some of the principles she's explaining, but of course her context and purpose make the ideas more meaningful.
This has caught me by surprise today: the Ball-Stick-Bird reading system raises and addresses this question: how can you read if you can't remember the sounds of the phonemes? It seems to be a direct response to Taneda's complaint (which is an extension of the quote I cited in the previous entry): how can you learn to read music if you don't know the sounds of the pitches?
There's plenty on this site that appeals to me (no, not that horrible artwork). The system itself seems to almost precisely match the method by which my mother taught me to read at age 2, so I'm inclined to expect that it would be effective and that its concepts would be verifiable. I'm intrigued by how strongly the author has to argue against "phonemic teaching" techniques-- the word "phoneme" didn't even exist until the 20th century, and now phonemic training is considered the best (or only) way to teach reading? This site would certainly make you think that. Most important, though, as far as I'm concerned, are the examples which are applicable to my research.
How can you learn to read music if you don't know the sounds of the pitches? If the Ball-Stick-Bird method is any example, this can be accomplished through context and through "building". I had speculated in my previous entry that Suzuki students who don't have perfect pitch might have learned "note literacy" through extensive motor drilling; it also seems possible that, by the student's increasing awareness of musical context and content, they would be able to read music fluently without ever knowing the pitch sounds, just as the author of the B-S-B method learned to read language fluently without ever fully grasping the phonemic sounds.
I don't think I can emphasize strongly enough the significance of this fact: the author learned to read fluently from context and structure, without learning to read the phonemic sounds. She describes her childhood experience with reading words in a way that sounds strikingly like a person who can't hear the middle pitch of a triad and can't read sheet music (just substitute "chords" for "words"):
How can you take words apart? Words don't have parts, like a car. Besides, I don't hear those different sounds that people claim they hear. Maybe they're just pretending to hear those different sounds. And the funny symbols that are supposed to represent the different sounds - they all look alike.
Faced with this evidence, and the success that she subsequently discovered in being able to read and write, how would anyone ever convince her that she actually needed "absolute phoneme" ability? Yet, if you yourself are a normal reader, can you take a moment and think to yourself what it would be like to experience language without being able to hear the individual sounds? Could you read without actually being able to hear the words in your head? The B-S-B system's strategy of "word building" seems rather like the Speak & Spell analogy I'd written about on July 21. It's practically unimaginable-- laughable, even-- and yet this is how we expect to experience music, without absolute sound comprehension.
What I was initially drawn to on this site is the case of "Tom", in the article "Is Phonemic Awareness a Prerequisite in Learning to Read?" Apparently, Tom had perfect pitch by age 13, but was still unable to read. This, along with the review of my own material that I've been doing (I'm trying to write a comprehensive and comprehensible summary, which isn't as easy as it might seem, since my thoughts range all over the place and thus resist distillation) has prompted me to re-think my idea of calling perfect pitch "hyper-linguistic". Language sounds are harmonic-- they are composed of distinct combinations of frequencies, not single frequencies. Although I have suggested that perfect pitch is, cognitively, an "extension of the phoneme set" into the twelve musical pitches, and it is true that the language area of the brain is active in absolute pitch judgments, perfect pitch isn't necessarily a hyper-linguistic skill. If a person can have perfect pitch and be unable to read, then it's clear that pitches aren't added to the phoneme set, even if they are processed in the same way. It makes sense enough that the language area of the perfect-pitch brain could become enlarged, not because it's superlatively skilled, but merely because it's pulling double duty.
In all, the site principally demonstrates what I take to be a parallel between the "reading deficient" student and the experience of proficient musicians who do not have perfect pitch. It is mainly designed to teach a person to read despite a lack of phonemic awareness, and is not intended to induce phonemic awareness. But I don't know whether this system might also induce phonemic awareness; and, since (thanks to Gibson) I have been wondering about context as a necessary feature of perfect-pitch training, I wonder if the B-S-B system might be on to something when it suggests that we will learn best when the learning concepts are embedded in a story. A musical melody is a "story", isn't it?
Gibson says that, although perceptual learning is guided by mindful instruction, the learning itself is entirely unconscious. So it definitely would make sense to embed the target percepts in a meaningful structure, as I've considered. I've already heard from many people (including Subject T's comment in Mark Rush's study) that melody association was a strong factor in their being able to identify notes. I am not confident that the first-note-memory strategy is an effective one, but perhaps-- and this makes even more sense in light of what I was writing about on May 23-- the first stage of perfect-pitch training for adults need not be pitches or even chords, but simple melodies in a particular key signature. After all, people with perfect pitch say that key-signature identification is the easiest musical task. Perhaps being "tuned" to the key signature is an effective way to impress the pitch sensation on the brain without actually having to present or recognize an actual pitch.
Well, whether or not I can infer a specific type of exercise from the B-S-B system, it observes potentially important features of acoustic learning and development which I'm glad to have found.
Taneda, in his book, strongly emphasizes what he claims is a critical pedagogic principle. (The italics are his.)
Because children often react to failure with hostility and denial, it is better to simply ignore errors at this stage, and instead respond positively to the next correct answer. This instruction applies both to the teacher and the parents. In many cases where this pedagogic principle was neglected, the listening instruction came to a complete halt. Corrections from the teacher or the parents lead to frustration, and in most cases to resistance. It is often difficult for the parent to ignore the child’s obvious mistakes; of course, they want their child to demonstrate high achievement, and therefore it is difficult not to challenge an incorrect answer. Nevertheless, the principal task is to not draw attention to the child’s mistakes.
Taneda's admonition seems like a simple application of modern child psychology, and when I first read this passage that's all I took it to be. However, as I have been reading further in Gibson's book, I find a particular statement repeated again and again, with one example after another, in chapter after chapter: knowing whether or not you answered correctly is irrelevant. I was especially pleased to see that one of Gibson's many examples is an experiment which showed that "correction given by the experimenter is not essential for improvement of pitch discrimination." Taneda's instruction is not just a sensible idea to make the child feel good about the work; Gibson shows that actively correcting the child's mistakes provides no meaningful contribution to the learning process. It can't be helpful. It can only be harmful. The mind's process is self-regulated; it searches for and identifies the correct answer automatically, whether you know it or not.
Explicit feedback may even interfere with the perceptual learning itself. Gibson cites a 1962 study in which adults were asked to identify characteristics of sounds.
...the sound stimuli varied along four or five dimensions (frequency, amplitude, interruption rate, duty cycle, and duration), and had two to five values on each. The subject identified the sound by listing the value assumed by each dimension, that is, by a four- or five-digit number. A number of training procedures were compared: one was a standard, passive, procedure in which the subject pressed a space bar on his typewriter to initiate a trial and the computer identified a sound for him by typing a five-digit number and playing it; other procedures were conditions with overt response and various elaborate kinds of correction and reinforcement. The standard condition, with no overt response, no correction, and no reinforcement, led to the highest average of correct responses. The condition with most elaborate feedback and highest probability of reinforcement led to the lowest percentage of correct responses.
This experiment was repeated four years later, with the result that "comparison of different training procedures and sequences again indicated that simple observation of the sound and its identification was best, uncluttered by all the special paraphernalia and routines." Gibson likens this to another study in which subjects improved their ability to recognize Morse code signals by doing nothing more than passively listening to a steady stream of code.
I don't see Gibson attempting to directly explain why this might be so. If I consider all her different examples, though, some consistent themes seem to appear. One of these themes is that our survival as a species depends on being able to discriminate helpful from harmful things, so our minds are not only willing but eager to work towards perceptual learning for its own sake (without additional reinforcement). Another is that perceptual learning automatically takes place through repeated exposure to stimuli, so explicit correction is simply not necessary-- and correction may even distract from or obscure the nature of the stimulus information.
Does this mean yet another nail in the coffin of note-naming strategies for learning perfect pitch? I suspect so; but a new strategy would have to frame the problem properly, presenting the pitch-recognition task in such a way that the mind would automatically abstract the relevant sensory information without needing to know "right" or "wrong". It'll be an interesting challenge to do that without using note names.
Although I've encountered strong opinions about perfect pitch on-line, I had the experience this week of meeting such an opinion in person, in the form of a person from the music department here at the school. I swear, this person seemed ready to plunge icepicks into my eyes merely for having the temerity to bring up the subject. It was an unfortunate situation; I wanted to know about this person's many years of experience as a music educator, but in order to ask the questions, I had to explain the theories from which the questions arose, and since this didn't match what the person understood about perfect pitch, their impatience and hostility grew with each new idea. Although the conversation was terribly uncomfortable, in hindsight I suppose it's necessary to experience this as the kind of opposition-- not just resistance, but opposition-- which will greet my findings. But until I have incontrovertible, incontestable proof, it's perfectly understandable; as long as my work remains theoretical, why should anyone listen to my explanation versus anyone else's? In the scientific scheme of things, I'm an unpublished, non-accredited, no-name upstart. All I can do, at this point, is give people new points of view to think about, and if they don't want to think about them, I haven't the evidence to convince beyond question. Yet.
One thing I was surprised to learn is that, according to this person's experience, it is widely known that "pitch recognition" is not the same as "perfect pitch". This person casually and easily accepted that a musician can learn to recognize all the notes of the musical scale, without a reference tone, and still not have perfect pitch. How many people in the music school have perfect pitch? I wondered. Only one or two out of a few hundred, was the reply. And how many have pitch recognition? Ninety-eight percent, I was told. I asked this person if I could attach their name to this observation, and thus provide authoritative evidence that "perfect pitch" courses teach pitch recognition (not perfect pitch); but they declined, stating the point was too insignificant. If I'm understanding this person correctly, they've judged it an unimportant point because, since perfect pitch can't be taught, it can be assumed that "perfect pitch" courses only teach pitch recognition. I have to agree that it's not the most important point, but I have a different reason. That is, if music educators (at a university level) generally agree that pitch recognition is what's being taught, then I don't need to build a case to show that... which is mainly what I've been doing. I don't need to prove that-- but if I do assume that all existing strategies teach pitch recognition, as this person adamantly contends (and my research supports), then I can explore why pitch recognition can be learned without perfect pitch perception. Which I've been doing this week.
Gibson's book states that learning, specifically defined, is "the reduction of uncertainty". That is, when you learn something, that means you've discovered and discarded all the wrong choices. You learn the "correct" answer by systematically ignoring all the information you don't need. When you continue to learn, that means you are screening out the bad information more completely and efficiently. Think about taking a multiple-choice test. If you have learned the answer, you can immediately eliminate all the incorrect choices. If you have not learned, then you are uncertain about which one is correct. This concept translates directly to perceptual learning-- the more irrelevant information you can strip away, the more precise your perception becomes. The question becomes, then, what is irrelevant? How can we distinguish between meaningful and meaningless information?
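The multiple-choice picture can be put into numbers. What follows is a minimal sketch and an illustration only, not something taken from Gibson's book: it borrows the information-theoretic measure of uncertainty (Shannon entropy) and assumes that every choice still in play is equally likely. The function name `uncertainty_bits` is hypothetical.

```python
import math

def uncertainty_bits(remaining_choices):
    """Uncertainty, in bits, of a guess among equally likely remaining choices."""
    if remaining_choices < 1:
        raise ValueError("need at least one remaining choice")
    return math.log2(remaining_choices)

# A four-option question: each wrong answer you can discard lowers the uncertainty.
for remaining in (4, 3, 2, 1):
    print(remaining, "choices left ->", round(uncertainty_bits(remaining), 2), "bits")
```

With all four options open there are 2 bits of uncertainty; with only one option left there are 0, which is the sense in which "having learned the answer" and "having eliminated the wrong choices" are the same thing.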
Gibson presents gobs of evidence that demonstrate how, to recognize any given figure, we fixate on its distinctive features. In fact, she goes so far as to suggest that this is how we perceive objects-- and how we remember them! This is in keeping with the information I recall from Memory and Brain; according to that book, our memories are not actually "stored" in the brain cells, but are instead represented by patterns of associated neurons. I suppose one way to think about that is I might have a single concept of "pointed", but by associating that same concept with "sharp", or "logical", or "finger", it forms a different memory. That's an oversimplification, of course, but I think it's an adequate description of the idea.
Anyway, I read this in Gibson's book while I was at a rehearsal, and out of curiosity I walked around with paper and pencil, asking my castmates to draw a certain figure. I reasoned that, if Gibson was correct in saying that we remember an object through a pattern of distinctive features, instead of its total appearance, then each person I asked would reproduce the complete figure by making sure to include the features they considered "distinctive", regardless of what the overall shape was supposed to be. The figure I asked for was one of the fifty states-- guess which state this is supposed to be?
As unspecific and inaccurate as this illustration is, the fact that it has two distinctive features-- a panhandle and a peninsula-- ensures that it really can't be any state other than Florida. The next illustration was almost as generic, but the person added a couple more features; you'll notice the "turn" at the southern end of the peninsula, and that inlet on the western coast.
The next person to draw felt that Florida's islands were distinctive, and he included them; but he also felt that Florida's relationship to Cuba is a distinctive feature (he is from Cuba), and so he included that in his perception of Florida.
The fourth person was a Florida native, and her drawing was even more specific; plus, as she finished the drawing, she commented "...and I know it has islands!" and proceeded to draw random ellipses at the southern part of the state. The position of the islands was irrelevant; it was "islands" as a distinctive feature which was important.
Of course, I also drew one myself, just to see how it would turn out.
This final figure appears to be the most accurate of these drawings, yet if you compare it to an actual contour of Florida, you'll see that it's incorrect. Gibson has the answer: the more distinctive features you present, the more recognizable the figure becomes. Interestingly, when I was drawing this figure, I had a picture in my head of the Florida map (I've put together USA jigsaw puzzles since I was very young, and I know the shape of each state very well), but when I drew this I still wasn't trying to trace the entire outline of that mental image. I was thinking "include a bump... and a dip... there's a wrinkle at the end of the handle... and a lake somewhere in the middle." Even though I had a complete, rather detailed image in my mind, when it came to reproducing that image I was thinking in terms of its distinctive features.
I'll use this Floridian example to emphasize a point that I began making before: when you're trying to name musical tones, you're trying to categorize an object. As such, you're attempting to make that object as distinct as possible. These Florida drawings show how an object becomes more recognizable as you add more features. But for any given musical object (tone, chord, melody, etc), pitch is merely one feature! So, by naming notes, you could be training yourself away from true pitch recognition. Your mind could be adding features to create a more distinct image of the tone object, instead of stripping away features to isolate the pitch characteristic. The most common experience is that many people can hear a difference between tones almost immediately, but then have to go through months of training before they can actually identify the tones; could it be that they are, through their training, discovering new features of the musical tones, in addition to pitch? It certainly seems possible. Gibson anticipates this probability, but in the same breath describes how categorizing could be helpful.
It would seem, superficially, that practice in categorizing should lead only to more generalized, less specific perceptions and thus result in the opposite of perceptual learning. But it is also possible that members assigned to a category actually share, potentially, some feature-- either a minimal distinctive one or some higher order one-- that distinguishes the category from other categories. If this is picked up with practice and processed without verbal intervention, perceptual learning has taken place. (p 190)
The catch here is the phrase "higher order" feature. In other parts of the book, Gibson points out that our minds tend to prefer a feature which is of higher order-- that is, a feature which is some kind of relationship. You can see that in the first Florida drawing-- although there is a panhandle and a peninsula, the size relationship between the two is clearly "wrong". Even though the second drawing is just as unspecific as the first one, it still looks more accurate because the size relationship-- the higher-order feature-- is more accurate. Our brains like relationships. If I'm interpreting Memory and Brain correctly, the basic principles of memory acquisition are association and relationships. And there are plenty of potential relationships that can be applied to a musical tone which would enable you to recognize it "without a reference". The fellow who drew the second picture above is also a singer, and he can recall a middle G as consistently as I can recall a middle C (it was entertaining when we both did so, and then checked ourselves against each other); I asked him if he can recognize a G in music, and he said "only when it's sustained for a long time," so that he can re-tune his mind away from melodic scale-degree effects. But this implies that he's still not recognizing the lower-order pitch-feature of G, because that would be constant regardless of any tuning effects; rather, he's got some internal relationship that he can use to identify the G sensation, and when other relationships disappear, he can apply his own relationship.
Incidentally, this offers further evidence why I do not consider perfect pitch to merely be a matter of memory. If memory is association, and pitch is by its nature an unassociated percept, then you literally can't "remember" a pitch. Learning pitch has to be something other than memory.
And, of course, people constantly complain that they can't "turn off" their relative pitch. They find that they can't not identify a tone using relative pitch, because the relationship is so obvious to them that their minds are satisfied with the higher-order feature (the interval) and don't continue to search for more information.
So the goal here is to find the lower order feature. How do we strip away everything except the pitch sensation? And then, having stripped it away, how do we recognize the pitch for what it actually is? If we strip down the shape of Florida until it is only "panhandle and peninsula", we have a convincing representation of the object. I just showed the first drawing to my roommate, and asked him if it looked like anything. "Florida," he answered. Anything else? "Florida," he replied. No, I mean other than that. "Florida," he grinned, and handed the picture back to me. So it's definitely a strong suggestion of the state. But it also looks sort of like Serbia-- or even a woman's hairdo, in profile. And if you turn it upside down, it doesn't look like Florida at all. Once you strip an object down to its simplest characteristics, any reorientation can cause you to interpret those characteristics differently. Somehow the characteristic has to be made to stand alone.
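The ambiguity of a stripped-down figure can be sketched as a simple matching problem. This is a rough illustration under invented assumptions, not a model from Gibson: the feature lists in `KNOWN_SHAPES` are hypothetical, made up only to mirror the Florida, Serbia, and hairdo readings described above, and `candidates` is a hypothetical helper that returns every stored shape containing all of the observed features.

```python
# Hypothetical feature lists, invented purely for illustration.
KNOWN_SHAPES = {
    "Florida": {"panhandle", "peninsula", "islands", "central lake"},
    "Serbia": {"panhandle", "peninsula"},
    "hairdo in profile": {"panhandle", "peninsula", "curl"},
}

def candidates(observed):
    """Names of all stored shapes whose features include every observed feature."""
    return sorted(name for name, feats in KNOWN_SHAPES.items() if observed <= feats)

# Two features alone leave several interpretations open...
print(candidates({"panhandle", "peninsula"}))
# ...while one more distinctive feature narrows the field.
print(candidates({"panhandle", "peninsula", "islands"}))
```

Under these made-up lists, "panhandle and peninsula" matches all three shapes, while adding "islands" leaves only Florida. Adding features makes the object more recognizable, which is the opposite of isolating any single feature.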
I still think that the goal is to hear pitches "phonemically"-- but I was disturbed by the fact that "Tom", from the Ball-Stick-Bird page, had perfect pitch yet couldn't recognize language phonemes. So I did some experimentation today which has demonstrated to me, unfortunately, that listening for vowel sounds is not the answer I thought it was. I had speculated that the reason we can hear a vowel sound in a musical pitch is because of "closure". But I had overlooked the fact that, because a vowel sound is two frequencies, recognizing a pitch via "closure" would mean we have mentally created a relationship which helps us recognize the pitch-- and, therefore, it is the imaginary relationship that we are recognizing, not the pitch itself.
I decided to test this by looking at the formant chart and recording piano notes which corresponded to the vowels "ee" and "oo". I chose these vowels for two reasons: because they share the same F1 (lower formant frequency), and because with these two vowels I could make the words "we" (oo-ee) and "you" (ee-oo). I put those intervals together in my sequencer, and connected them with a glide that would be typical of speech. Having done this, I considered how the bottom tone of an interval is merely a reference point, and the top note has the "character". If so, then I could play these intervals, and then yank out the bottom note, and it would still sound the same. The imaginary relationship would be intact. Sure enough, that's exactly what happened. Click the image below to download the sound file; you'll hear the first measure twice and then the second measure twice. We, you; we, you; we, you; we, you.
Then, looking again at the formant chart, I realized that by changing only the bottom note of the interval, I could change this "ee-oo" to "eh-aw". I was disappointed that these didn't actually form words, but I tried it anyway. The result was that the first note was less distinct in each glide, but the second (terminal) note was very clearly now either "eh" or "aw", not "ee" or "oo". Again, click the image file to hear the sound file, with the measures repeated twice: aw-eh, eh-aw, aw-eh, eh-aw. Same pitches-- different vowels.
Of course, having put these frequencies on the musical scale, I quickly realized that the bottom pitches were D and Eb, separated by an octave. If that were so, then I could tune my mind to either D-major or Eb-major, and I would hear the G-tone as either "oo" or "aw" depending on my reference. So I constructed that in the sequencer. Yep-- the first G sounded like "oo", and the second one "aw".
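Putting a frequency "on the musical scale" is a small calculation. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz, which the text does not specify, and the helper `nearest_note` is hypothetical. The sample frequencies are illustrative only; they are not the values read from the formant chart.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Return the nearest equal-tempered note name with its octave, e.g. 'G4'."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note 69 is A4; each semitone is a factor of 2 ** (1 / 12).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"

# Illustrative frequencies only (assumed, not taken from the chart).
for hz in (293.66, 311.13, 392.00):
    print(hz, "Hz ->", nearest_note(hz))
```

Under these assumptions the three sample frequencies land on D4, Eb4, and G4. Formant values rarely fall exactly on a piano key, so rounding to the nearest semitone is itself an approximation.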
Of course, the first G can sound like "aw", and the second one can sound like "oo", but that's not the point. The point is that vowel sounds are a higher-order feature. Vowel sounds are a relationship. Consequently, listening for vowels is not how you'll hear a pitch "phonemically". I suspect that I may use vowel-listening in the future, but it's certainly not the final answer.
So what does all this mean? I've begun to visualize "pitch" as a thread embedded in a musical object. There are so many different kinds of relationships which can wrap themselves around that thread that it's easy to achieve full pitch recognition, without a "reference", by discovering an alternative higher-order feature which can distinguish each individual tone object without an immediate "relationship" to another note. PT Brady used the C-major scale. Peter used vowel sounds. I'm sure there are other equally valid strategies. But-- especially when you are training yourself by categorizing tones-- you are recognizing features that wrap themselves around the thread, instead of the thread itself. Therefore, when the thread is disguised in a different package, it cannot be recognized; or, when you need to produce the pitch with your own singing voice, you don't know how to wrap your voice around the thread in a way that matches the tone object which you would recognize if you heard it in the air.
If I'm reading Gibson correctly, then it is through comparing these higher-order features to each other that our minds will abstract out the lower-order percept. The new curriculum might present a single pitch sound in as many different ways as possible. I'm fairly confident that this is the way to go; it seems the main thing I have to wonder about is which relationships make the most effective comparisons. And in what order. And in what kind of training structure... and with what kind of presentation, and... um...
Well, I'll keep working on it.