The Rhythms of Phonemes


December 2011
Ivan Obolensky

To reach out to more people we often have to know more than one language. Learning another language later in life is hard work, but knowing something of the nature of how we learn languages can help in the process.

We are social creatures. One may disagree, but try being isolated for any period of time. It is not easy. Is it any wonder that isolation is used as a form of torture in prisons?

As humans, we like to communicate, and spoken language is the preferred method. One might ask why we prefer to use the spoken word. Why have we not all learned sign language? It can be just as expressive, and it can also communicate complex ideas.

The answer is speed.

The spoken word does not require proper lighting or line-of-sight contact. It can be shouted across a distance, or whispered. Speech is by far the fastest way of getting information from one person to another.

There are reasons why this is so.

Linguists have broken speech down into phonemes, the smallest units of sound that can distinguish one word from another in a language. The “k” sound in “skill” is an example of a phoneme in English. Each language has its own set of phonemes.

Phonemes are determined for any given language by working out which differences in sound indicate a difference in meaning. For instance, consider the words “skill” and “skull” in English: the difference between the “i” and the “u” sounds creates an entirely different meaning. In Spanish the trilled “rr” in “perro” (dog) differentiates it from “pero” (but). A typical language has anywhere from twenty to sixty phonemes.

The English alphabet contains twenty-six letters, but spoken English uses about forty phonemes, or sound units. The pairing of the alphabet to speech sounds is essential to learning how to read in any alphabetic writing system. One learns that a letter, or a combination of letters, represents a phoneme, or speech sound. By sounding out the letters we recognize a word and instantly we can read. Out of this concept comes the system of reading that we call phonics.1

So why is learning to speak a new language so hard?

Remember what it’s like to hear someone speaking in a totally unfamiliar language? Have you noticed that it is hard to distinguish how many words are being spoken? Or when separate words start and stop? It all seems to run together in a continuous stream of sounds.

Listening to someone speaking Chinese into a cell phone, one has not a clue what is being said, even when looking directly at the speaker’s face and expression. The speaker could be reciting a shopping list or explaining how to fix a flat tire. It is impossible to say.

We do not know what is being said because we are unfamiliar with the phonemes of the Chinese language. When we hear a language that we know, we hear familiar sounds which when run together create familiar words from which we get context and understanding as to what is being said. It is similar to how we immediately respond to our name being mentioned several tables away in a noisy restaurant. The phonemes of our name stand out in our mind.

The study of how the brain interprets sound and words really started during World War II in the 1940s, when engineers tried to develop reading machines for the blind. The US Office of Scientific Research and Development asked certain research laboratories to evaluate and develop technologies for assisting blinded WWII veterans. One of the developments was a reading machine based on an alphabet of individual sounds. Each sound represented a letter of the alphabet so that the listener would hear a series of tones. Unfortunately, even the most skilled listener could not recognize these sounds at a rate faster than three units of sound per second, or about as fast as Morse code over a telegraph.

There is a difference in the way the brain perceives simple sounds and the building blocks of language, or phonemes.

A distinctive click is a sound. As the click is repeated faster and faster it changes from individually perceived clicks to a low buzz or hum. This transition from click to indistinguishable buzz occurs at about twenty clicks per second.

Normal speech is spoken at 10-15 phonemes per second. Late night infomercials can jump this speed to 20-30 phonemes per second. The upper limit of intelligibility is 40-50 phonemes per second.

So how can we hear and interpret speech at forty-five phonemes per second, which is more than twice the rate at which we can distinguish individual repeated sounds?

Firstly, what distinguishes one click from another is the silence between them. Phonemes, on the other hand, form a continuous sound when spoken, with no silences in between. Even words with many syllables run together. When two people speak very rapid Spanish, not only is there no silence between syllables, but word follows word with no space in between.

Secondly, our brain has a special language ability to unpack (decompress) speech information at roughly two to three times the rate at which it can distinguish consecutive sounds (which explains some of the difficulty speech recognition software has when we talk fast). By combining sounds and running them together, the brain can take a shortcut and make out words before it has actually finished hearing them. It is this skill that has evolved and made us so language savvy.2
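For readers who like to see the numbers side by side, the short Python sketch below compares the speech rates quoted above with the rate at which clicks blur into a buzz. It is only an illustration: the figures are the rounded ones given in this article, and the names used (CLICK_FUSION_RATE, ratio_to_click_limit and so on) are made up for the example.

```python
# Rates quoted in this article, in units of sound per second.
# All names below are illustrative; they come from no library.
CLICK_FUSION_RATE = 20   # clicks blur into a buzz at about this rate
TONE_ALPHABET_RATE = 3   # best listeners on the wartime reading machine

SPEECH_RATES = {
    "normal speech": (10, 15),
    "late night infomercial": (20, 30),
    "upper limit of intelligibility": (40, 50),
}


def ratio_to_click_limit(rate):
    """Return how many times the click fusion rate a given rate is."""
    return rate / CLICK_FUSION_RATE


for label, (low, high) in SPEECH_RATES.items():
    low_ratio = ratio_to_click_limit(low)
    high_ratio = ratio_to_click_limit(high)
    print(f"{label}: {low_ratio:.2f}x to {high_ratio:.2f}x the click limit")

# The tone alphabet, by contrast, sat far below even normal speech.
slowest_normal_speech = SPEECH_RATES["normal speech"][0]
print(f"tone alphabet: {TONE_ALPHABET_RATE} units per second, "
      f"against at least {slowest_normal_speech} phonemes per second in normal speech")
```

On these figures, only speech at the upper limit of intelligibility reaches two or more times the click limit, while normal conversation stays below it.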

In computer terms, it is as if the individual phonemes are like tiny compressed files that our brain is able to recognize as having meaning, and is able to unpack (unzip) and process at a speed faster than we are aware of hearing them.

This is the difficulty a person has in learning a new language.

To speak a new language, one has to train the mind to recognize and be familiar with new phonemes and new combinations of phonemes. In some languages (like Russian) there are special letters that denote sounds that have no analog in English. A new language contains a totally unknown vocabulary and grammar. Each of the new words contains sounds that are run together to create a series of sounds we have not heard before. Just as when listening to the person speaking Chinese into the cell phone, one has no idea how many words are being spoken or even when one word ends and another begins. It is unintelligible to us because we are not familiar with the phonemes of the new language.

It is the combining of phonemes, and the training of the brain to recognize their meaning below the level of our awareness, that creates speech understanding and thus language recognition, so that we can hear and speak fluently.3

Learning a language is not like learning to tell time or getting one’s wits around how a sewing machine works.

Strangely, the closest thing to learning a new language that I can think of is learning to land an airplane. It is not using the controls or knowing what the procedures are for landing that I am talking about. It is about the sensory overload of learning new things.

When one is learning to fly it is very easy to “get behind the airplane”. This is the pilot’s term for the airplane flying you as opposed to you piloting the airplane. An example of this is when one is a novice pilot and trying to land. There is the bouncing around of some light turbulence. There is listening to the radio and talking to the tower. There is maintaining proper airspeed and attitude. There is listening to the instructor. There is the fact that you are landing at 70 knots or even 129 if it is a jet. One is descending and the runway is coming up fast. Things are happening very quickly, and as one more piece of sensory input is added to senses already overloaded, one freezes up. At this point the instructor intervenes and saves the day.

This is what constant training in emergency procedures is all about. It is the emphasis on a few important things that one can latch onto when one’s world is spinning around and the altimeter is unwinding rapidly. Practice prevents the freeze-up of sensory overload.

With much landing practice the brain seems to quiet down. Over time, the runway seems to approach at a slower rate. There is time to do other things such as plan which exit you are going to take once you land. Your brain has gotten used to the speed of information and is able to process it faster and more easily.

As a more mundane example, a couple of years ago I went outside with my first pair of prescription sunglasses. I had not seen an eye doctor since I was twelve. I will never forget the experience: I could barely drive because of the massive amount of new sensory inputs that seemed to overload my brain. I could see leaves on trees from far away. Everything had been a slight blur before, but not anymore. Life was crystal clear, and it gave me a headache as my mind tried to process it. I realized that this tendency to process all the information I was receiving through my eyes was not something I could particularly control. It was instinctual and happened whether I liked it or not. Over time, I got used to my sunglasses. But even today I marvel at all the detail I can now perceive.

The point is that in learning a language there is a dual process going on. There are those pesky irregular verbs that one has to sit down and memorize, but there is the other part of constantly putting oneself into sensory overload and trying to speak in spite of that, which allows one to learn the language. This takes time and commitment.

Eventually conversations are not gut-wrenching experiences but simple things that one can have while thinking about several other things. One has trained the brain.

In conclusion, learning a new language is often thought of as a sit-down-and-learn-it activity. The rise of the Rosetta Stone company and its ubiquitous booths at airports attest to the fact that many people would like to learn that way. Many people start but then fail to carry through.

But if learning a language were simply a memorization test, then I think more people would succeed. In fact there is a lot more to it. One has to learn new phonemes by forcing one’s brain to process them over and over until it can handle them. This means abandoning one’s thinking to an activity that happens below our awareness, until our brains can process the information easily. This means putting oneself firmly outside of one’s comfort zone, because this is the only way to get that particular part of the brain (the part that unpacks language phonemes) to become skilled. It is no wonder that immersion courses work so well.

To have the skill of speaking to and understanding people of diverse cultures, one must consistently embrace stepping outside one’s comfort zone. Our language processors require it.

1 University of Oregon. (n.d.). Phonemic Awareness. Retrieved December 19, 2011, from

2 Haskins Laboratories. (n.d.). Alvin M. Liberman, 82, Speech and Reading Scientist. Retrieved December 19, 2011, from

3 Pinker, S. (1994). The Language Instinct: How the Mind Creates Language. New York, NY: Harper Perennial Modern Classics.

© 2011 Ivan Obolensky. All rights reserved. No part of this publication can be reproduced without the written permission from the author.