The Transmission of Acoustic Energy from Air to the Receptor and Transducer in the Cochlea

L. Naftalin and M. Mattey

Department of Bioscience and Biotechnology
University of Strathclyde
31 Taylor Street
Glasgow G4 0NR


The anatomy of the mammalian hearing apparatus is described, followed by details of the microanatomy of the cochlea. Indications are given of the actual size of the structures involved. The generally accepted theory of transmission and transduction is recounted; then observational and experimental data that make current teaching difficult to accept are set out. An alternative explanation is offered in brief outline, in which it is suggested that wave analysis occurs by constructive 3-dimensional interference patterns, resulting in "phonons" (which may be, e.g., Gabor units df·dt or Dirac delta functions) that enter "matching" biological macromolecules, which react coherently.

There is general agreement that frequency analysis of auditory sound takes place in the inner ear, and more specifically, in higher animals, in the cochlea. There is, however, less agreement about which structures actually perform this frequency analysis; and there is incomplete understanding of how transduction of acoustic to electrochemical energy occurs, en route to neurophysiological signalling.


Sound, including speech, propagates through air as a longitudinal compression-rarefaction wave (Fig. 1).

It has to be borne in mind that the passage of a sound wave does not involve bodily motion of the fluid medium as a whole; rather, as particles at points distant from the source are disturbed, energy streams in the direction of wave propagation. This formulation applies also to solid rods and lattices. In the cochlea the acoustic wave energy has to be converted, or transduced, to an electrochemical signal for onward transmission.


Fig. 2 is the well-known illustration by Brödel (1946) of the anatomy of the external, middle and inner ear, in which one can trace the route of sound from air to the cochlea through the tympanic membrane (ear drum), through the three ossicles to the footplate of the stapes vibrating in the oval window of the bony labyrinth.


Fig. 3, modified from Gray's Anatomy (1958), shows that the footplate of the stapes vibrates into the vestibule of the bony labyrinth and does so very nearly at a right angle to the scala vestibuli. The acoustic signal then has to pass through a membrane into the scala media, in which the organ of Corti is sited. The organ of Corti consists essentially of the sensory hair cells together with the basilar membrane below and the tectorial membrane above them.


Other textbook illustrations, e.g. Fig. 4, show the acoustic route as not necessarily going via the helicotrema, which is the narrow space at the apex of the cochlea where the scala vestibuli meets the scala tympani, but as crossing through Reissner's membrane and the scala media to the scala tympani. This point about the route will be returned to later.

To give an indication of the actual size of the structures in the inner ear involved in the transmission of the sound signal: the footplate of the stapes is about 2mm wide and, being roughly oval-shaped, is little more than 1mm across the smaller axis. The total volume of perilymph in the scalae vestibuli and tympani has been estimated at about 78µl (in the human) and the volume of endolymph in the scala media at about 3µl (Maggio, 1966). We are thus dealing with very narrow liquid columns for the sound to propagate through. Having reached the organ of Corti (Fig. 5), the structures receiving the signal are measured in micrometres. The tectorial membrane may be as much as 30µm thick; the hair cell and hair processes together are approximately 45µm long; and the stereocilia (hair processes), regarded as the actual receptor structures, are about 6µm long.


Thus the analysed acoustic signal has to find its way according to its frequency to a microscopic place in the cochlear system.

The textbook explanation of how cochlear frequency analysis is performed is as follows. A major review in Nature (Hudspeth, 1989) described the mode of analysis: "The piston-like motion of the stapes displaces the contents of the cochlea's three fluid-filled internal compartments, thereby flexing the basilar membrane up and down. Hair bundles in the organ of Corti are then stimulated by the shearing motion between the apical hair-cell surfaces and the overlying gelatinous tectorial membrane". More recently, Wu et al (1995) repeat the view that the basilar membrane is mechanically tuned by a gradient in its stiffness, giving rise to a travelling wave, as originally described by von Békésy (1960). Yet another textbook description states that the movements of the stapes footplate set up a series of travelling waves in the perilymph of the scala vestibuli, and that the peak values of these travelling waves stimulate corresponding travelling waves and peaks in the basilar membrane. These latter peaks lift the hair cells so that the hair processes are sheared by the tectorial membrane.

We find difficulty in accepting this mechanical hypothesis for several reasons.

  1. There is not enough energy in the acoustic wave in air, at the threshold of hearing, to move the basilar membrane. At the threshold of hearing the acoustic power at the tympanic membrane is 10⁻¹⁶ W cm⁻², or 10⁻¹⁹ J ms⁻¹ cm⁻² (periods of milliseconds and even less are important in hearing). This power value is so small that not only can the system not afford losses or dissipation, but selective amplification is required somewhere along the route from the tympanic membrane to the site of transduction (to an electrochemical signal) to increase the signal-to-noise ratio. However, in the middle ear the contrary situation appears to be present. Rosowski and colleagues (1986) examined power transfer data from external and middle ears to the cochlea. One of their conclusions was that "the middle ear neither extracts all the available power from the external ear, nor delivers to the cochlea all the power it takes in. Therefore describing the middle ear as an impedance matching device is not helpful" and again, "the match is particularly poor at low frequencies; less than 1% of the available power enters the middle ear at frequencies less than 1 kHz".

    When we note that kT at 25°C is approximately 4 × 10⁻²¹ J (Brownian movement), and given that Rosowski's calculations are of the correct order of magnitude, then at the threshold of hearing the power entering the inner ear is too close to noise to effect mechanical movement as a distinguishing signal.

    Roberts et al (1988) report that a force of 300 fN (0.3pN) is required to open a single transduction channel in a stereocilium, deflecting the hair processes by 0.3nm. Hudspeth (1989) writes "the gate of the transduction channel evidently swings through 4nm upon opening, which is opposed by a restoring force of about 2pN".

    These calculations by Roberts et al (1988) and by Hudspeth (1989) are based on experimental results. Thus we have the situation, if we adhere to concepts suitable for macrolever systems, in which a force of 0.3pN opens a "gate" against an opposing force of 2.0pN closing it. It would seem then that if the input power rises sufficiently, at least tenfold, to overcome the opposing force, we could then begin to use concepts of macrophysics, but at the threshold of hearing we should consider alternative forms of energy transmission and transduction.

  2. The rate of propagation of the travelling wave in the basilar membrane is measured in milliseconds. Much psycho-acoustic evidence demonstrates that speech formants are being recognised in shorter periods of time. It has been pointed out that speech recognition, including the timbre of voice, involving many simultaneous frequencies, can be detected by waves reaching the two ears out of phase by 500µs (Naftalin, 1981). Corey and Hudspeth (1979) have shown that hair cells from the bullfrog sacculus can respond to a fast stimulus with a delay of only 40µs or less. "The insertion of variable low speed components, the travelling waves in the two cochleae, between two higher speed systems (acoustic transmission and nervous system) would seem to produce for the transients of speech an unnecessary degree of confusion." (Naftalin, 1981).

  3. Anatomical evidence. 95% of the afferent nerve fibres (from cochlear haircells to the brain) supply the inner hair cells, which, in at least some mammals, are sited on bone: a travelling wave - of the nature observed by von Békésy - is not possible in bone. It is perhaps not irrelevant that some amphibian and reptilian species are capable of frequency analysis but have no basilar membrane in their inner ears.

  4. The travelling wave is indirectly induced in the basilar membrane. If we accept that the wave is first set up in the scala vestibuli by the movements of the stapes, this travelling wave propagates and dissipates through the whole fluid internal cochlear system. Even if the travelling wave is in some context important, the internal geometry of the cochlea governs its direction and dissipation.
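
The orders of magnitude quoted in point 1 can be checked with a few lines of arithmetic. The following is only a rough sketch: the choice of 1 cm² as the collecting area and 1 ms as the time window is an illustrative assumption, the 1% transfer fraction is the upper bound quoted from Rosowski et al for frequencies below 1 kHz, and the variable names are not taken from any of the cited works.

```python
# Rough energy budget at the threshold of hearing (illustrative assumptions only).

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant in J/K (SI defining value)


def thermal_energy_joules(celsius):
    """Return kT in joules for a temperature given in degrees Celsius."""
    return BOLTZMANN_J_PER_K * (celsius + 273.15)


# Value quoted in the text: 1e-16 W per cm^2 at the tympanic membrane.
threshold_w_per_cm2 = 1e-16

# Assumption: 1 cm^2 collecting area and a 1 ms window (hypothetical round numbers).
energy_per_ms_j = threshold_w_per_cm2 * 1.0 * 1e-3

# Assumption: at most 1% of the available power enters the middle ear below 1 kHz.
transfer_fraction = 0.01
delivered_j = energy_per_ms_j * transfer_fraction

kT = thermal_energy_joules(25.0)

# Forces quoted from Roberts et al (1988) and Hudspeth (1989), in piconewtons.
channel_opening_force_pn = 0.3
gate_restoring_force_pn = 2.0
force_ratio = gate_restoring_force_pn / channel_opening_force_pn

print(f"kT at 25 C:               {kT:.2e} J")
print(f"energy in 1 ms (1 cm^2):  {energy_per_ms_j:.2e} J")
print(f"after 1% transfer:        {delivered_j:.2e} J")
print(f"delivered / kT:           {delivered_j / kT:.2f}")
print(f"restoring / opening force: {force_ratio:.1f}")
```

Under these assumptions the energy delivered in one millisecond falls below kT, which is the sense in which the signal is "too close to noise"; a different area or time window would shift the figures but not by many orders of magnitude.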

Alternative Proposition

Our alternative explanation begins by considering the cochlear geometry. The human cochlea is fully formed by the 7th month of foetal life and remains the same size and has the same internal configuration throughout life. Quoting from a previous communication: "If one thinks of the wind instruments of the orchestra, the problem the designers of these instruments have is that of the distribution of acoustic energy in a fluid elastic medium in a specialised container, and therefore the primary concern of the instrument makers has to be that of the internal geometry of their product. One may ask, why should the cochlea, an instrument dealing with acoustic energy distribution in a fluid elastic medium, be exempt from such a condition?" (Naftalin, 1981). To that we can add that the human mouth is designed to utter various frequencies by altering its internal geometry; this is especially the case with unvoiced phonemes.

Given the high probability that the internal geometry of the cochlea governs the frequency distribution of incoming acoustic signals, at least to a first approximation, such a property would provide an obvious advantage for speech since the nervous system would not have to relearn interpretation of sound signals with each new period of growth. Nevertheless, clinically and experimentally, a temporary threshold shift can occur after acoustic trauma. Since the bony geometry does not alter, a second structure involved in frequency placement must be affected. We suggest that this second structure is the tectorial membrane. This membrane has a graded thickness increasing from base to apex, and its internal composition also varies, with concentrations of proteoglycan macromolecules increasing from base to apex (Munyer and Schulte, 1994). Our hypothesis is illustrated in Figs. 6 and 7.


Fig. 6A is Brillouin's formulation of the interface joining a continuous line capable of carrying all frequencies with a lattice permitting only defined values. Fig. 6B represents the cochlea as an example of 6A. Perilymph in the scala vestibuli is a continuous line carrying all frequencies; the overall internal geometry analyses the incoming frequencies by 3-dimensional constructive interference into Gabor units, df·dt (or into Dirac delta functions), which have specific placements. Where the energy characteristic of a Gabor unit fits a place in the lattice of the tectorial membrane, a transfer of energy can occur. The tectorial membrane is a complex dielectric with an adsorbed water layer on each macromolecule; by their own structure the macromolecules are polarised, and further polarisation is caused by the positive endolymphatic d.c. potential of 100mV in the scala media (Fig. 7).
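
The Gabor unit df·dt reflects the fact that a signal cannot be made arbitrarily narrow in both time and frequency. As a hedged numerical illustration, not part of the hypothesis itself, the sketch below takes a Gaussian pulse and computes the product of its rms duration and rms bandwidth, which should sit at the lower bound 1/(4π) under the rms-width convention for energy densities; other width conventions change the constant. The 1 ms pulse scale, the grid sizes and the function names are arbitrary assumptions.

```python
import cmath
import math


def rms_width(points, weights):
    """Root-mean-square spread of `points` under non-negative `weights`."""
    total = sum(weights)
    mean = sum(p * w for p, w in zip(points, weights)) / total
    variance = sum((p - mean) ** 2 * w for p, w in zip(points, weights)) / total
    return math.sqrt(variance)


def gabor_product(a=1e-3, n_time=400, n_freq=201):
    """Return sigma_t * sigma_f for the Gaussian pulse exp(-t^2 / (2 a^2)).

    `a` is the pulse scale in seconds (hypothetical value of 1 ms).
    """
    # Time grid wide enough that the pulse has decayed to zero at the edges.
    times = [-8 * a + 16 * a * i / (n_time - 1) for i in range(n_time)]
    signal = [math.exp(-t * t / (2 * a * a)) for t in times]

    # Frequency grid covering the whole spectrum of the pulse.
    f_max = 1.0 / a
    freqs = [-f_max + 2 * f_max * k / (n_freq - 1) for k in range(n_freq)]

    # Direct (slow) Fourier sum; constant scale factors do not affect widths.
    spectrum = [
        sum(s * cmath.exp(-2j * math.pi * f * t) for s, t in zip(signal, times))
        for f in freqs
    ]

    sigma_t = rms_width(times, [s * s for s in signal])
    sigma_f = rms_width(freqs, [abs(c) ** 2 for c in spectrum])
    return sigma_t * sigma_f


product = gabor_product()
lower_bound = 1.0 / (4.0 * math.pi)
print(f"sigma_t * sigma_f = {product:.5f}, bound 1/(4*pi) = {lower_bound:.5f}")
```

Any other pulse shape run through the same function should give a larger product; the Gaussian is the limiting case.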


We suggest that in this way the acoustic signal is transduced in the tectorial membrane to a form capable of initiating an enzyme reaction in a stereocilium of a hair cell. It is worth noting that protons can pass through amiloride-sensitive sodium channels in one biological system (Gilbertson et al, 1992) and that hair cell transduction is amiloride sensitive (Hackney et al, 1992).

Returning to the problem of the energy available at the threshold of hearing, we can restate the values in the following way. 10⁻¹⁹ J ms⁻¹ is equivalent to 0.625 eV ms⁻¹. Since frequency can also be expressed in eV, we may conclude that 3 formants of a vowel of speech, with each formant consisting of several frequencies as seen in speech spectrograms, will each have only a fraction of 0.625 eV ms⁻¹, and some speech formants are shorter than 1ms. Hydrogen bond energies range from 0.13 to 0.3 eV, and a transfer of such an energy value via a hydrogen-bond chain would seem to be the best fit to the known structures. Viewed from this standpoint the ion-exchange mechanism, proton percolation and the associated dielectric phenomena are not separable, and underlie the biochemical and electrophysiological observations of transfers in ion-channels in the transducing properties of stereocilia.
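
The conversion in the preceding paragraph can be reproduced as follows. This is a minimal sketch: the equal split of the available energy between three formants is an illustrative assumption (the text says only that each formant receives a fraction), and the names are hypothetical.

```python
# Convert the threshold energy rate from joules to electronvolts and compare
# it with the hydrogen-bond energy range quoted in the text.

JOULES_PER_EV = 1.602176634e-19  # SI defining value of the electronvolt


def joules_to_ev(joules):
    """Convert an energy in joules to electronvolts."""
    return joules / JOULES_PER_EV


threshold_j_per_ms = 1e-19  # value quoted in the text
threshold_ev_per_ms = joules_to_ev(threshold_j_per_ms)

# Assumption: the energy is shared equally between three formants.
n_formants = 3
share_ev_per_ms = threshold_ev_per_ms / n_formants

# Hydrogen-bond energy range quoted in the text, in eV.
h_bond_low_ev, h_bond_high_ev = 0.13, 0.3
share_in_h_bond_range = h_bond_low_ev <= share_ev_per_ms <= h_bond_high_ev

print(f"threshold: {threshold_ev_per_ms:.3f} eV per ms")
print(f"per formant (equal split): {share_ev_per_ms:.3f} eV per ms")
print(f"within hydrogen-bond range: {share_in_h_bond_range}")
```

With the exact value of the electronvolt the conversion gives about 0.62 eV per millisecond, in line with the rounded 0.625 quoted above, and an equal three-way split lands inside the hydrogen-bond range.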


References

Block, H., Goodwin, K.M.W., Gregson, E.M. and Walker, S.M. (1978): Stimulated resonance between electrical and shear fields by a colloid system. Nature (London) 275, 632-634.

Brillouin, L. (1962): Giant molecules and semiconductors. In: Horizons in Biochemistry, pp295-318, Eds. M. Kasha and B. Pullman, Academic Press, New York.

Brödel, M. (1946): The anatomy of the Human Ear. W.B. Saunders Company, Philadelphia.

Cantor, C.C. and Schimmel, P.R. (1980): Biophysical Chemistry. W.H. Freeman and Company, San Francisco.

Corey, D.P. and Hudspeth, A.J. (1979): Response latency of vertebrate haircells. Biophys. J. 26, 499-506.

Gabor, D. (1947): Acoustical quanta and the theory of hearing. Nature (London) 4004, 591-594.

Gilbertson, T.A., Avenet, P., Kinnamon, S.C. and Roper, S.D. (1992): Proton currents through amiloride-sensitive sodium channels. J. Gen. Physiol. 100, 803-824.

Gray's Anatomy (1958): 32nd Edition.

Hackney, C.M., Furness, D.N., Benos, D.J., Woodley, J.F. and Barratt, J. (1992): Putative immunolocalisation of mechanoelectrical transduction channels in mammalian cochlear haircells. Proc. R. Soc. London B. 248, 215-221.

Hudspeth, A.J. (1989): How the ear's works work. Nature (London) 341, 397-404.

Maggio, E. (1966): The humoral system of the labyrinth. Acta Otolaryng. Suppl. 218.

Munyer, P.D. and Schulte, B.A. (1994): Immunochemical localisation of keratan sulphate and chondroitin 4- and 6-sulphate proteoglycans in subregions of the tectorial and basilar membranes. Hear. Res. 79, 83-93.

Naftalin, L. (1981): Energy transduction in the cochlea. Hear. Res. 5, 307-315.

Richards, E.G. (1980): An introduction to physical properties of large molecules in solution. Cambridge University Press, Cambridge.

Roberts, W.M., Howard, J. and Hudspeth, A.J. (1988): Haircells: transduction, tuning and transmission in the inner ear. Annu. Rev. Cell Biol. 4, 63-92.

Rosowski, J.J., Carney, L.H., Lynch, T.J. and Peake, W.T. (1986): The effectiveness of external and middle ears in coupling acoustic power into the cochlea. In: Peripheral auditory mechanisms. Eds. Allen, J.B. et al. Springer, Berlin.

Von Békésy, G. (1960): Experiments in hearing. McGraw Hill, New York.

Wu, Y-C., Art, J.J., Goodman, M.B. and Fettiplace, R. (1995): A kinetic description of the calcium-activated potassium channel and its application to electrical tuning of haircells. Prog. Biophys. Molec. Biol. 63, 131-158.