The word ‘perception’ comes from the Latin word perceptio, meaning “receiving, collecting, action of taking possession, apprehension with the mind or senses”.

What to perceive?

For philosophers, the fundamental question about perception is:


Is there a reality, and if so, do you perceive it directly?

Within the philosophy of perception, there are two and a half answers to this question.


The Realist position is that the objects of perception exist independently of the mind. Within this current of thought two refinements can be distinguished. Direct or naïve realism holds that we perceive reality directly. You should be able to find a counter-example in the discussion of auditory transduction in the following chapter. It is … … that the human basilar membrane is only sensitive to frequencies between 20 Hz and 20 kHz. There are many other frequencies in nature higher or lower than this range, which we cannot hear at all, so it is unrealistic :-) to hold that we can perceive reality in any direct sense.

The fall-back position, indirect or representative realism, maintains that we can only perceive a representation of reality, one mediated through the limitations of the senses. Thus the 20 Hz to 20 kHz range of the basilar membrane is to be expected.
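To make the representative-realist point concrete, the sketch below treats the nominal 20 Hz to 20 kHz range as an ideal band-pass filter. Whatever lies outside that range simply never reaches the representation. It is a minimal sketch, not a model of cochlear mechanics. The sampling rate, the three test frequencies and the helper names `band_limit` and `amplitude_at` are hypothetical choices made for illustration.

```python
import numpy as np

FS = 96_000                   # sampling rate in Hz (hypothetical; high enough for 30 kHz)
N = FS                        # one second of samples, so FFT bins fall on whole hertz
AUDIBLE = (20.0, 20_000.0)    # nominal human hearing range from the text, in Hz


def band_limit(signal: np.ndarray, fs: int, low: float, high: float) -> np.ndarray:
    """Zero every spectral component outside [low, high] Hz (ideal brick-wall filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def amplitude_at(signal: np.ndarray, fs: int, f: float) -> float:
    """Amplitude of the sinusoidal component nearest to f Hz."""
    spectrum = np.abs(np.fft.rfft(signal)) * 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return float(spectrum[np.argmin(np.abs(freqs - f))])


t = np.arange(N) / FS
# "Reality": infrasonic, audible and ultrasonic components of equal amplitude
components = (10.0, 1_000.0, 30_000.0)
world = sum(np.sin(2 * np.pi * f * t) for f in components)
heard = band_limit(world, FS, *AUDIBLE)

for f in components:
    print(f"{f:>7.0f} Hz  world: {amplitude_at(world, FS, f):.2f}  heard: {amplitude_at(heard, FS, f):.2f}")
```

All three components of `world` have an amplitude of about 1. Only the 1 kHz component survives in `heard`.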


The foil of realism, constructivism, agrees that there is a reality out there, but argues that we perceive it by elaborating a prediction of it, and then checking it against the incoming sensory information. If the prediction holds, then there is no need to do any further processing. If it does not, then additional cognitive resources must be spent to figure out the discrepancy.
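The predict-then-check cycle can be sketched in a few lines. This is a toy illustration under loudly stated assumptions:

- The “prediction” is just that the next sensory sample will equal the current percept.
- A hypothetical fixed `tolerance` decides whether a discrepancy is worth extra processing.

Real constructivist models are far richer.

```python
from typing import List, Tuple


def perceive(samples: List[float], tolerance: float) -> Tuple[List[float], int]:
    """Return the resulting percept and how many samples needed extra processing.

    Hypothetical rule: predict that the next sample equals the current percept.
    """
    if not samples:
        return [], 0
    percept: List[float] = []
    extra_work = 0
    prediction = samples[0]              # bootstrap: the first sample is taken at face value
    for observed in samples:
        if abs(observed - prediction) <= tolerance:
            percept.append(prediction)   # prediction holds: no further processing
        else:
            extra_work += 1              # discrepancy: spend extra resources on it
            percept.append(observed)     # and settle on the sensory evidence
        prediction = percept[-1]         # predict that the world stays as it is
    return percept, extra_work


steady_then_jump = [1.0, 1.01, 0.99, 1.0, 3.0, 3.02, 2.98]
percept, extra = perceive(steady_then_jump, tolerance=0.1)
print(percept)  # [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
print(extra)    # 1
```

Only the jump from 1 to 3 triggers extra processing. The small wobbles around each level are absorbed by the prediction.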

You can use your recently gained knowledge of auditory transduction as a test of either philosophy. Since both accept the existence of a reality – the sound waves impinging on the ear drum – the crucial difference is whether there is a prediction made at any point in the transductive sequence. Do you think there is? The only candidate is the feedback process of an outer hair cell in exaggerating the motion of the basilar membrane, but it seems incorrect to claim that an outer hair cell tries to predict an (upcoming) movement of the basilar membrane. I took pains to frame it as a reflex, in which an outer hair cell reacts quickly enough to synchronize with the basilar membrane. We can conclude that auditory transduction is a realist phenomenon.

But we are finished with auditory transduction. A recurring question for the rest of the course is whether, at any point in the process of language understanding, the neural processor that subserves it relies on a prediction to perform its function. Thinking in terms of computation makes this question even more precise. In auditory transduction, processing always runs forward, from one mechanism to the next, except for the outer hair cells, whose dancing feeds back to the basilar membrane. A prediction, on the other hand, invariably arises in a higher area and is passed backwards to guide a lower area. Since both feedforward and feedback pathways must have a neuroanatomical basis, knowing which tracts enter a neural processor can tip us off to whether feedback is involved.
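Framed computationally, the question comes down to the direction of each connection. The sketch below gives each stage a hypothetical level, where a higher level means farther from the periphery. It labels a connection as feedback when it runs from a higher level to a lower one. The stage list is a drastic simplification of auditory transduction, assumed for illustration only.

Note that such a classification can reveal feedback but cannot tell whether the feedback carries a prediction. For the outer hair cells, the text argues on other grounds that the feedback is a reflex.

```python
# Hypothetical, drastically simplified stage hierarchy (illustration only):
# a higher level means a stage farther from the sensory periphery.
LEVEL = {
    "basilar membrane": 1,
    "outer hair cell": 2,
    "inner hair cell": 2,
    "auditory nerve": 3,
}

# Directed connections (source, target) between stages.
CONNECTIONS = [
    ("basilar membrane", "outer hair cell"),
    ("basilar membrane", "inner hair cell"),
    ("inner hair cell", "auditory nerve"),
    ("outer hair cell", "basilar membrane"),  # the outer hair cells' "dancing"
]


def classify(source: str, target: str) -> str:
    """Label a connection by the direction in which it crosses levels."""
    if LEVEL[source] > LEVEL[target]:
        return "feedback"      # runs backwards, from higher to lower
    if LEVEL[source] < LEVEL[target]:
        return "feedforward"   # runs forwards, from lower to higher
    return "lateral"           # stays within one level


for source, target in CONNECTIONS:
    print(f"{source} -> {target}: {classify(source, target)}")

feedback = [(s, t) for s, t in CONNECTIONS if classify(s, t) == "feedback"]
print(feedback)  # [('outer hair cell', 'basilar membrane')]
```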

Let me bundle these ideas together into a table to summarize this discussion:

Table 17 Philosophy of perception

Position | Claim | Processing
direct or naïve realism | reality is perceived directly |
indirect or representative realism | reality is perceived through a representation |
constructivism | sensory representation of reality is tested against a prediction | feedforward + feedback

Why perceive?


What does a sensory system do? Or to state it another way, what is the goal or purpose of a sensory system?

The answer to these questions is related to that of the previous one. Most people would say that the purpose of a sensory system is to provide an accurate portrayal of reality, which is a sort of realist answer. Certainly it seems ludicrous to propose that the purpose of a sensory system is to provide an inaccurate portrayal of reality.

Yet the question hinges on the meaning of “accurate”. Several recent schools of psychology take “accurate” to mean “good enough”. “Good enough for what?” you might ask. That is the proper question.

Ecological psychology

For ecological psychology, perception should be good enough to guide an organism’s behavior in the environment or ecological niche which it inhabits. The coupling between environment-based perception and action is so tight that the environment is claimed to afford certain actions to the organism that perceives it. For humans, the prototypical example of an affordance is a doorknob: you look at it, and you recognize immediately that you should put your hand on it and turn. My preferred example is a cat’s litter box: over the years, my wife and I have adopted several stray or feral cats, and every one of them knew exactly what the litter box was for from the first time they saw it. As the ecological psychologists say, “ask not what’s inside your head, but what your head’s inside of”.

Do you know enough yet to test ecological psychology against the facts of audition? Well, almost. You will learn in greater detail in the next chapter that most of the information in speech is found in the 100 Hz to 5 kHz range. This range takes up about two-thirds of the tonotopic map of the basilar membrane, so there is a plausible argument to be made (at least to me) that the basilar membrane affords speech production.
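The two-thirds figure can be checked with the Greenwood place-frequency function. This is a standard empirical fit that is not part of this chapter. The constants below (A = 165.4, a = 2.1, k = 0.88) are the values commonly quoted for the human cochlea. Position x is measured as a fraction of the membrane's length from the apex. Treat the result as a rough estimate under those assumptions.

```python
import math

# Greenwood place-frequency function, f = A * (10 ** (a * x) - k), where x is
# the position along the basilar membrane as a fraction of its length from the
# apex. Commonly quoted human constants (an assumption, not from the text):
A, a, k = 165.4, 2.1, 0.88


def place_from_frequency(f_hz: float) -> float:
    """Invert the Greenwood function: 0 is the apex, 1 is the base."""
    return math.log10(f_hz / A + k) / a


def share_of_membrane(low_hz: float, high_hz: float) -> float:
    """Fraction of the membrane's length tuned to the band [low_hz, high_hz]."""
    return place_from_frequency(high_hz) - place_from_frequency(low_hz)


speech = share_of_membrane(100, 5_000)
hearing = share_of_membrane(20, 20_000)
print(f"100 Hz to 5 kHz: {speech:.0%} of the membrane")
print(f"share of the 20 Hz to 20 kHz region: {speech / hearing:.0%}")
```

With these constants, the speech band comes out at roughly 63% of the membrane, a little under two-thirds.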

Evolutionary psychology

A somewhat different answer to the question “Good enough for what?” is given by evolutionary psychology, which claims that psychological traits are the result of natural or sexual selection applied to problems recurring in the human ancestral environment. In a nutshell, the subtribe Hominina split from the ancestors of chimpanzees about 7 million years ago. Its members – known as the genus Australopithecus – finished the shift to bipedalism and lost their fur about 3 million years ago. They lived in the savanna, began to incorporate meat from scavenging into their diet and developed the first stone tools. The genus Homo split off from Australopithecus in East Africa 2.8 million years ago. Its stone tools were more sophisticated and abundant, marking the beginning of the Paleolithic or Old Stone Age. Modern Homo sapiens appeared in the fossil record between 200 and 160 thousand years ago. The invention of farming in the Middle East brought the Paleolithic to an end between 10.2 and 8.8 thousand years ago and ushered in the Neolithic or New Stone Age. You can imagine the rest of the story.

Given that roughly 99.6% of Homo evolution took place during the Paleolithic, it is there that evolutionary psychologists look for explanations of modern human cognition. The evidence from stone tools suggests that Old Stone Age humans were hunter-gatherers. Ethnographic descriptions of modern hunter-gatherers such as the Bushmen of the Kalahari Desert in southern Africa, the Aborigines of Australia or the Pirahã of Brazil’s Amazon are therefore considered to reflect Paleolithic social and ecological conditions. In this way they shed light on the “problems recurring in the human ancestral environment”.
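The 99.6% figure follows from the dates given for the emergence of Homo and the end of the Paleolithic. The sketch simply redoes that arithmetic for both ends of the range given for the close of the Paleolithic.

```python
# Dates taken from the text, in years before the present.
HOMO_ORIGIN = 2_800_000                 # genus Homo splits off
PALEOLITHIC_END = (10_200, 8_800)       # range given for the invention of farming

shares = [(HOMO_ORIGIN - end) / HOMO_ORIGIN for end in PALEOLITHIC_END]
for end, share in zip(PALEOLITHIC_END, shares):
    print(f"Paleolithic ends {end:,} years ago: {share:.1%} of Homo evolution precedes it")
```

Either date gives a share between 99.6% and 99.7%.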

To tie the evolutionary to the ecological approach, we could recast the ecological aphorism as “ask not what’s inside your head, but what your head was inside of during the Paleolithic”. Unfortunately, little of this is applicable to our current concern with audition. To the best of my knowledge, auditory transduction has not changed appreciably during the evolution of Homo, though there might be some evolutionary change apparent in the chapter on subcortical audition.

An example from the evolution of color vision


What is the difference between the left and right photographs?

Table 18 A forest scene [3]
_images/percep-ForestNOred.png _images/percep-ForestRed.png

I hope that you realized that the left photograph is just the right photograph with all the reds filtered out.

To briefly explain how this works, imagine that color vision evolves through the three stages in Table 19.

Table 19 Chromacy [4]

monochromatic | dichromatic | trichromatic
_images/percep-monochromatic.png | _images/percep-dichromatic.png | _images/percep-trichromatic.png
black & white, AKA grayscale | blues & greens | blues, greens & reds

The basic type of color vision distinguishes no ‘color’ at all, just black, white and the grayish gradations in between. This is because the retina of such monochromatic visual systems has only photoreceptors that are sensitive to low levels of light – as low as a single photon – at the cost of being insensitive to color. Called rods, they are good for seeing in the dark or in deep water, from which most of the longer wavelengths of light have been filtered out. In such environments, it appears that distinguishing color is a luxury that animals can do without.

more See Rod cell for additional explanation of how rods work.

In the next stage, the retina has evolved two types of photoreceptors, called cones, which require more light to function but are sensitive to different wavelengths. In most modern mammals, one type responds to short wavelengths (high frequency) in the vicinity of blue, and the other to medium-to-long wavelengths (medium frequency) in the vicinity of green.

In the final stage, found in humans, the other apes and the Old World monkeys, the medium-to-long wavelength cones split into two. One type is sensitive to medium wavelengths – just green – and the other to long wavelengths – (low frequency) red. Such trichromacy enables the animal to see what we would consider the full gamut of colors.
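The loss of information from one stage to the next can be mimicked on RGB pixels. This is a deliberately crude sketch, not a colorimetric simulation, and it rests on two assumptions:

- A dichromat's single medium-to-long wavelength cone can be approximated by averaging the red and green channels.
- A monochromat sees the plain average of all three channels.

Careful dichromacy simulations work in a cone (LMS) color space instead. The sample pixels and function names are hypothetical.

```python
import numpy as np


def to_dichromat(rgb: np.ndarray) -> np.ndarray:
    """Crudely merge red and green, as if one cone class covered both."""
    rgb = rgb.astype(float)
    red_green = (rgb[..., 0] + rgb[..., 1]) / 2
    return np.stack([red_green, red_green, rgb[..., 2]], axis=-1)


def to_monochromat(rgb: np.ndarray) -> np.ndarray:
    """Collapse all channels into a single gray level, as with rods alone."""
    gray = rgb.astype(float).mean(axis=-1)
    return np.stack([gray, gray, gray], axis=-1)


# A hypothetical 1x3 "image": a ripe red fruit, a green leaf and a patch of blue sky
pixels = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]])
di = to_dichromat(pixels)
mono = to_monochromat(pixels)
print(di[0])    # fruit and leaf become the same color; the sky stays distinct
print(mono[0])  # all three become the same gray
```

Under these assumptions, the red fruit and the green leaf collapse onto the same value for the dichromat. That loss is exactly what the ripe-fruit hypothesis proposes trichromacy repaired.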

The question that has sparked considerable debate is what motivated the differentiation of dichromats into trichromats. The oldest hypothesis is that it enabled primates to quickly find ripe (that is, red-orange) fruit in a thick canopy of green leaves in the daytime, cf. Allen (1892). There are two main alternatives, however. One is that it enabled primates to better discern hidden predators; the other is that it enabled them to better judge the reproductive state of their partners. Carvalho et al. (2017) summarize the three in the following marvelous image, which I have flipped into the order of Table 19:


Fig. 35 Examples of evolutionary drives that might have influenced the evolution of primate color vision. [5]

It is not my goal to settle this issue. What I have in mind is …


How would ecological psychology and evolutionary psychology treat the evolution of trichromacy?

The two sorts of selection

Table 20 Natural vs. sexual selection

Natural or Darwinian selection [6] | Sexual selection [7]
_images/percep-NaturalSelect.png | _images/percep-SexualSelection.png

Natural or Darwinian selection

In the theory of natural selection,

  • favorable heritable traits become more common in successive generations of a population of reproducing organisms, and

  • unfavorable heritable traits become less common.
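The two bullet points can be made quantitative with the textbook model of selection on a single heritable trait in an asexual (haploid) population. Under this assumed model, carriers of the trait leave 1 + s offspring for every one left by non-carriers, with no mutation or drift. The starting frequencies, the selection coefficient of ±0.05 and the number of generations are hypothetical.

```python
def next_frequency(p: float, s: float) -> float:
    """Frequency of a heritable trait after one generation of selection.

    Assumed haploid model: carriers leave (1 + s) offspring for every one left
    by non-carriers; s > 0 is favorable, s < 0 unfavorable.
    """
    mean_fitness = p * (1 + s) + (1 - p)
    return p * (1 + s) / mean_fitness


def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Trait frequencies over successive generations, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


favorable = trajectory(0.01, +0.05, 200)
unfavorable = trajectory(0.50, -0.05, 200)
print(f"favorable trait:   {favorable[0]:.0%} -> {favorable[-1]:.0%}")
print(f"unfavorable trait: {unfavorable[0]:.0%} -> {unfavorable[-1]:.0%}")
```

In 200 generations, the favorable trait climbs from 1% to about 99% of the population. Over the same span, the unfavorable one falls from 50% to almost nothing.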

Over time, this process may result in adaptations that specialize organisms for particular ecological niches; this is evolution by natural selection. With respect to perception, the process of natural selection guarantees a strong connection between the design of an organism’s perceptual systems and the properties of the physical environment in which the organism lives.

In humans, this connection is implemented through a mixture of

  • fixed (hardwired) adaptations that are present at birth and

  • facultative (plastic) adaptations that alter or adjust the perceptual systems during the lifespan.

Sexual selection

How to perceive

Veridicality is expensive

For the sake of argument, let us assume that this photograph, despite being in grayscale, is a veridical representation of the visual reality of the scene depicted:


Fig. 36 caption here

Author’s plot from code above.

Perceptual objects

According to the Oxford English Dictionary, object means

something placed before the eyes, or presented to the sight or other sense; an individual thing seen or perceived, or that may be seen or perceived; a material thing

Its etymology explains its visuocentric connotation:

object derives from the Latin ob-, ‘before’ or ‘toward’, and iacere, ‘to throw’ and used to mean: something ‘thrown’ or put in the way, so as to interrupt or obstruct the course of a person or thing; an obstacle, a hindrance

Indeed, most visible things are obstacles or a hindrance to sight; they prevent you from seeing something that lies behind them because they are opaque.

Perceptual objects have the following properties:

  1. They have edges (or contours or boundaries).

  2. They can be grouped.

  3. They can be figures for figure-ground segregation.



What is the relationship between the left and right images?

Table 21 Two perspectives on Lena

Processed image [8] | Original image [9]
_images/Percep-LenaEdges.png | _images/Percep-Lena.png

The left image has been processed by a computer program that attempts to retain the edges in the right image and throw away everything else. The fact that the left image is still informative shows the importance of edges to visual object recognition.


Edge: an abrupt change in contrast (light ~ dark)
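We are not told which program produced the left image. As a simple, swapped-in illustration of this definition, the sketch below applies the classic Sobel operator, which estimates how abruptly brightness changes at each pixel. The input is a hypothetical 6-by-8 image with a dark left half and a light right half. Only pixels whose gradient magnitude exceeds an arbitrary threshold are kept.

```python
import numpy as np

# Sobel kernels estimating horizontal and vertical brightness change
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter3x3(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Cross-correlate a 2-D image with a 3x3 kernel (interior pixels only)."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2]
    return out


def edge_map(image: np.ndarray, threshold: float) -> np.ndarray:
    """Mark pixels where brightness changes abruptly (large gradient magnitude)."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.hypot(gx, gy) > threshold


# Hypothetical 6x8 image: dark left half, light right half
image = np.zeros((6, 8))
image[:, 4:] = 1.0
edges = edge_map(image, threshold=1.0)
print(edges.astype(int))
```

The printout marks the two columns of pixels straddling the dark-light boundary and nothing else.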

more For more about Lena/Lenna including the complete photograph, see The Lenna Story, as well as Wikipedia’s Lenna, which rather chastely does not include the entire photograph, but does discuss its potential sexism.

With this in mind, let us turn to audition.


Do you see any edges in the following auditory object?


Fig. 37 Pressure trace and wide-band spectrogram of the author’s pronunciation of ‘phonetician’. [10]

There are quite abrupt transitions between fricatives and
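Edges in sound can also be looked for computationally. One standard measure, spectral flux, is swapped in here for illustration. It compares the magnitude spectra of successive short frames, and a large difference signals an abrupt change. The signal in the sketch is synthetic and hypothetical: a burst of noise (a crude stand-in for the hiss of frication) followed by a steady 500 Hz tone. The frame size and sampling rate are arbitrary.

```python
import numpy as np

FS = 8_000     # sampling rate in Hz (hypothetical)
FRAME = 256    # samples per analysis frame, no overlap (hypothetical)

rng = np.random.default_rng(0)
noise = rng.standard_normal(4 * FRAME)            # broadband, hiss-like segment
n = np.arange(4 * FRAME, 8 * FRAME)
tone = np.sin(2 * np.pi * 500 * n / FS)           # 500 Hz fits 16 whole cycles per frame
signal = np.concatenate([noise, tone])


def spectral_flux(x: np.ndarray, frame: int) -> np.ndarray:
    """Distance between the normalized magnitude spectra of successive frames."""
    usable = len(x) // frame * frame
    frames = x[:usable].reshape(-1, frame) * np.hanning(frame)
    mags = np.abs(np.fft.rfft(frames, axis=1))
    mags /= np.linalg.norm(mags, axis=1, keepdims=True)  # ignore overall loudness
    return np.linalg.norm(np.diff(mags, axis=0), axis=1)


flux = spectral_flux(signal, FRAME)
print(np.round(flux, 2))
boundary = int(np.argmax(flux))
print(f"largest change between frames {boundary} and {boundary + 1}")
```

The largest value falls between frames 3 and 4, where the noise gives way to the tone. Within the tone, the flux is essentially zero.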


Figure-ground segregation



Wikipedia’s Ecological psychology is a convenient starting point to learn about the approach, while Wikipedia’s Evolutionary psychology is an excellent overview of the field. My all-too-brief sketch of human evolution is condensed from Wikipedia’s fascinating Timeline of human evolution, which starts at THE BEGINNING.

  • Carvalho, Livia S., Daniel M. A. Pessoa et al. “The Genetic and Evolutionary Drives Behind Primate Color Vision.” Frontiers in Ecology and Evolution 5.34 (2017)

  • Regan, B. C., C. Julliot et al. “Fruits, Foliage and the Evolution of Primate Colour Vision.” Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 356.1407 (2001): 229-83.

  • Allen, Grant. The Colour-Sense: Its Origin and Development. An Essay in Comparative Psychology. Vol. 10 London: Kegan Paul, Trench, Trübner & Co., 1892.

  • Kubovy, Michael, and David Van Valkenburg. “Auditory and Visual Objects.” Cognition 80 (2001): 97-126.

Powerpoint and podcast


The next topic

Come to class having read Auditory transduction and answered the questions.

Last edited Aug 22, 2023