THE COMPLEMENTARY BRAIN:
Unifying Brain Dynamics and Modularity
Department of Cognitive and Neural Systems
677 Beacon Street
Boston, MA 02215
FAX: (617) 353-7755
Technical Report CAS/CNS-TR-98-003
Boston, MA: Boston University
Trends in Cognitive Sciences, in press
Requests for reprints should be sent to:
Department of Cognitive and Neural Systems
677 Beacon Street, Room 201
Boston, MA 02215
Key Words: modularity, What and Where processing, visual cortex, motor cortex, reinforcement, recognition, attention, learning, expectation, volition, speech, neural network
* Supported in part by grants from the Defense Advanced Research Projects Agency and the Office of Naval Research (ONR N00014-95-1-0409), the National Science Foundation (NSF ITI-97-20333), and the Office of Naval Research (ONR N00014-95-1-0657).
Acknowledgments. The author wishes to thank Robin Amos and Diana Meyers for their valuable assistance in the preparation of the manuscript and figures.
How are our brains functionally organized to achieve adaptive behavior in a changing world? This article presents one alternative to the computer metaphor suggesting that brains are organized into independent modules. Evidence is reviewed that brains are organized into parallel processing streams with complementary properties. Hierarchical interactions within each stream and parallel interactions between streams create coherent behavioral representations that overcome the complementary deficiencies of each stream and support unitary conscious experiences. This perspective suggests how brain design reflects the organization of the physical world with which brains interact. Examples from perception, learning, cognition, and action are described, and theoretical concepts and mechanisms by which complementarity is accomplished are presented.
In one simple view, our brains are proposed to possess independent modules, as in a digital computer, and we see by processing perceptual qualities such as form, color, and motion using these independent modules. The brain's organization into processing streams1 supports the idea that brain processing is specialized, but it does not, in itself, imply that these streams contain independent modules. Independent modules should be able to fully compute their particular processes on their own. Much perceptual data argue against the existence of independent modules, however, because strong interactions are known to occur between perceptual qualities2-6. For example, changes in perceived form or color can cause changes in perceived motion, and conversely; and changes in perceived brightness can cause changes in perceived depth, and conversely. How and why do these qualities interact? An answer to this question is needed to determine the functional and computational units that govern behavior as we know it.
The present article reviews evidence that the brain's processing streams compute complementary properties. Each stream's properties are related to those of a complementary stream much as a lock fits its key, or two pieces of a puzzle fit together. It is also suggested how the mechanisms that enable each stream to compute one set of properties prevent it from computing a complementary set of properties. As a result, each of these streams exhibits complementary strengths and weaknesses. How, then, do these complementary properties get synthesized into a consistent behavioral experience? It is proposed that interactions between these processing streams overcome their complementary deficiencies and generate behavioral properties that realize the unity of conscious experiences. In this sense, pairs of complementary streams are the functional units because only through their interactions can key behavioral properties be competently computed. As illustrated below, these interactions may be used to explain many of the ways in which perceptual qualities are known to influence each other. Thus, although analogies like a key fitting its lock, or puzzle pieces fitting together, are suggestive, they do not fully capture the dynamism of what complementarity means in the brain. I will suggest below that the concept of pairs of complementary processes brings new precision to the popular idea that both functional specialization and functional integration occur in the brain. Table 1 summarizes some pairs of complementary processes that will be described herein.
Why does the brain often need several processing stages to form each processing stream? Accumulating evidence suggests that these stages realize a process of hierarchical resolution of uncertainty. Uncertainty here means that computing one set of properties at a given stage can suppress information about a different set of properties at that stage. As I will illustrate below, these uncertainties are proposed to be overcome by using more than one processing stage to form a stream. Overcoming informational uncertainty utilizes both hierarchical interactions within the stream and the parallel interactions between streams that overcome their complementary deficiencies. The computational unit is thus not a single processing stage; it is, rather, proposed to be an ensemble of processing stages that interact within and between complementary processing streams.
According to this view, the organization of the brain obeys principles of uncertainty and complementarity, as does the physical world with which brains interact, and of which they form a part. This article suggests that these principles reflect each brain's role as a self-organizing measuring device in the world, and of the world. Appropriate principles of uncertainty and complementarity may better explain the brain's functional organization than the simpler view of computationally independent modules. Experimental and theoretical evidence for complementary processes and processing streams is described below.
Table 1. SOME COMPLEMENTARY PAIRS OF BRAIN PROCESSES
What learning and matching | Where learning and matching
Optic flow navigation
Sensory cortical representation | Learned motivational feedback
Working memory order | Working memory rate
In most of these cases, evidence for the existence of processing streams and their role in behavior has been developed by many investigators. The fact that pairs of these streams exhibit complementary computational properties, and that successive processing stages realize a hierarchical resolution of uncertainty, has only gradually become clear through neural modeling, primarily from our group and colleagues. A large number of such modeling studies showed that different pairs of streams realize different combinations of complementary properties, as illustrated below. As of this writing, so many streams seem to follow this pattern that I now suggest that complementarity may be a general principle of brain design.
Complementary boundaries and surfaces in visual form perception.
Visual processing, from the retina through the inferotemporal and parietal cortices, provides excellent examples of parallel processing streams (Figure 1). What evidence is there to suggest that these streams compute complementary properties, and how is this done? A neural theory, called FACADE (Form-And-Color-And-DEpth) theory, proposes that perceptual surfaces are formed in the LGN-Blob-Thin Stripe-V4 stream while perceptual boundaries are formed in the LGN-Interblob-Interstripe-V4 stream7. Many experiments have supported this prediction8-10.
FACADE theory suggests how and why perceptual boundaries and perceptual surfaces compute complementary properties. Figure 2A illustrates three pairs of complementary properties using the illusory contour percept of a Kanizsa square4. In response to both images of this figure, boundaries form inwardly between cooperating pairs of incomplete disk (or pac man) inducers to form the sides of the square. These boundaries are oriented to form in a collinear fashion between like-oriented inducers. The square boundary in Figure 2A can be both seen and recognized because of the enhanced illusory brightness of the Kanizsa square. In contrast, the square boundary in Figure 2B can be recognized even though it is not visible; that is, there is no brightness or color difference on either side of the boundary. Figure 2B thus shows that some boundaries can be recognized even though they are invisible. FACADE theory predicts that all boundaries are invisible within the boundary stream, which is proposed to occur in the Interblob cortical processing stream (Figure 1). This prediction has not yet been directly tested through a neurophysiological experiment, although several studies have shown that the distinctness of a perceptual grouping, such as an illusory contour, can be dissociated from the visible stimulus contrast that is associated with it11,12.
Figure 1. Schematic diagram of anatomical connections and neuronal selectivities of early visual areas in the macaque monkey. LGN = lateral geniculate nucleus (parvocellular [parvo] and magnocellular [magno] divisions). Divisions of visual areas V1 and V2: blob = cytochrome oxidase blob regions, interblob = cytochrome oxidase-poor regions surrounding the blobs, 4B = lamina 4B, thin = thin (narrow) cytochrome oxidase stripes, interstripe = cytochrome oxidase-poor regions between the thin and thick stripes, thick = thick (wide) cytochrome oxidase stripes, V3 = Visual Area 3, V4 = Visual Area(s) 4, and MT = Middle Temporal area. Areas V2, V3, V4, and MT have connections to other areas not explicitly represented here. Area V3 may also receive projections from V2 interstripes or thin stripes. Heavy lines indicate robust primary connections, and thin lines indicate weaker, more variable connections. Dotted lines represent observed connections that require additional verification. Icons: rainbow = tuned and/or opponent wavelength selectivity (incidence at least 40%), angle symbol = orientation selectivity (incidence at least 20%), spectacles = binocular disparity selectivity and/or strong binocular interactions (V2; incidence at least 20%), and right-pointing arrow = direction of motion selectivity (incidence at least 20%). Adapted with permission from Reference 1.
This invisible boundary in Figure 2B can be traced to the fact that its vertical boundaries form between black and white inducers that possess opposite contrast polarity with respect to the gray background. The same is true of the boundary around the gray disk in Figure 2C. In this figure, the gray disk lies in front of a textured background whose contrasts with respect to the disk reverse across space. In order to build a boundary around the entire disk, despite these contrast reversals, the boundary system pools signals from opposite contrast polarities at each position. This pooling process renders the boundary system output insensitive to contrast polarity. The boundary system hereby loses its ability to represent visible colors or brightnesses, since its output cannot signal the difference between dark and light. It is in this sense that "all boundaries are invisible". These properties of boundary completion are summarized in Figure 3. Figure 2D illustrates another invisible boundary that can be consciously recognized.
If boundaries are invisible, then how do we see anything? FACADE theory predicts that visible properties of a scene are represented by the surface processing stream, which is predicted to occur within the Blob cortical stream (Figure 1). A key step in representing a visible surface is called filling-in. Why does a surface filling-in process occur? An early stage of surface processing compensates for variable illumination, or discounts the illuminant13-15 in order to prevent illuminant variations, which can change from moment to moment, from distorting all percepts. Discounting the illuminant attenuates color and brightness signals except near regions of sufficiently rapid surface change, such as edges or texture gradients, which are relatively uncontaminated by illuminant variations. Later stages of surface formation fill in the attenuated regions with these relatively uncontaminated color and brightness signals, and do so at the correct relative depths from the observer through a process called surface capture. This multi-stage process is an example of hierarchical resolution of uncertainty, because the later filling-in stage overcomes uncertainties about brightness and color that were caused by discounting the illuminant at an earlier processing stage.
Figure 2. A Kanizsa square (A) and a reverse-contrast Kanizsa square (B). The emergent Kanizsa square can be seen and recognized because of the enhanced illusory brightness within the illusory square. The reverse-contrast Kanizsa square can be recognized but not seen. (C) The boundary of the gray disk can form around its entire circumference even though the relative contrast between the disk and the white and black background squares reverses periodically along the circumference. (D) The vertical illusory contour that forms at the ends of the horizontal lines can be consciously recognized even though it cannot be seen by virtue of any contrast difference between it and the background.
How do the illuminant-discounted signals fill in an entire region? Filling-in behaves like a diffusion of brightness across space15-17. In response to the display in Figure 3, filling-in spreads outwardly from the individual blue inducers in all directions. Its spread is thus unoriented. How is this spread of activation contained? FACADE theory predicts that signals from the boundary stream to the surface stream define the regions within which filling-in is restricted. This prediction has not yet been neurophysiologically tested. Without these boundary signals, filling-in would dissipate across space, and no surface percept could form. Invisible boundaries hereby indirectly assure their own visibility through their interactions with the surface stream.
For example, in Figure 2A, the square boundary is induced by four black pac man disks that are all less luminant than the white background. In the surface stream, discounting the illuminant causes these pac men to induce local brightness contrasts within the boundary of the square. At a subsequent processing stage, these brightness contrasts trigger surface filling-in within the square boundary. The filled-in square is visible as a brightness difference because the filled-in activity level within the square differs from the filled-in activity of the surrounding region. Filling-in can lead to visible percepts because it is sensitive to contrast polarity. These three properties of surface filling-in are summarized in Figure 3. They are easily seen to be complementary to the corresponding properties of boundary completion.
In Figure 2B, the opposite polarities of the two pairs of pac men with respect to the gray background lead to approximately equal filled-in activities inside and outside the square, so the boundary can be recognized but not seen. In Figure 2D, the white background can fill in uniformly on both sides of the vertical boundary, so no visible contrast difference is seen.
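The idea that filling-in behaves like an unoriented diffusion that is contained by boundary signals can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions, not the FACADE model equations: the one-dimensional row of cells, the function name fill_in, the permeability rule 1 / (1 + block * boundary), and all parameter values are hypothetical choices made for this example.

```python
def fill_in(features, boundaries, steps=2000, dt=0.1, decay=0.1, block=1000.0):
    """Toy 1-D boundary-gated diffusion (an illustrative assumption, not FACADE itself).

    features[i]   : illuminant-discounted feature signal feeding cell i
    boundaries[i] : boundary strength between cell i and cell i + 1
    """
    n = len(features)
    # Assumed gating rule: a strong boundary drives the permeability toward zero.
    permeability = [1.0 / (1.0 + block * b) for b in boundaries]
    activity = [0.0] * n
    for _ in range(steps):
        updated = []
        for i in range(n):
            flow = 0.0
            if i > 0:  # unoriented spread from the left neighbour
                flow += permeability[i - 1] * (activity[i - 1] - activity[i])
            if i < n - 1:  # unoriented spread from the right neighbour
                flow += permeability[i] * (activity[i + 1] - activity[i])
            # Euler step: feature input plus diffusive flow minus passive decay.
            updated.append(activity[i] + dt * (features[i] + flow - decay * activity[i]))
        activity = updated
    return activity


if __name__ == "__main__":
    features = [0.0] * 12
    features[4] = features[7] = 1.0      # signals survive only near the edges
    closed = [0.0] * 11
    closed[3] = closed[7] = 1.0          # boundaries enclose cells 4 to 7
    print([round(a, 2) for a in fill_in(features, closed)])
    print([round(a, 2) for a in fill_in(features, [0.0] * 11)])
```

With the enclosing boundaries in place, activity fills cells 4 to 7 to a roughly uniform level and stays near zero elsewhere. With the boundaries removed, the same feature signals dissipate across the whole row, which is the sense in which no surface percept could form without boundary signals.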
Figure 3. In this example of neon color spreading, the color in the blue contours spreads in all directions until it fills the square illusory contour. An explanation of this percept is given in Reference 7. Three complementary computational properties of visual boundaries and surfaces are also described. Boundaries are predicted to be completed within a Boundary Contour System (BCS) that passes through the Interblobs of cortical area V1, whereas surfaces are filled-in within a Feature Contour System (FCS) that passes through the Blobs of cortical area V1 (see Figure 1).
These remarks just begin the analysis of filling-in. Even in the seemingly simple case of the Kanizsa square, one often perceives a square hovering in front of four partially occluded circular disks, which seem to be completed behind the square. FACADE theory predicts how surface filling-in is organized to help such figure-ground percepts to occur, in response to both two-dimensional pictures and three-dimensional scenes7,18.
In summary, boundary and surface formation illustrate two key principles of brain organization: hierarchical resolution of uncertainty, and complementary interstream interactions. Figure 3 summarizes three pairs of complementary properties of the boundary and surface streams. Hierarchical resolution of uncertainty is illustrated by surface filling-in: Discounting the illuminant creates uncertainty by suppressing surface color and brightness signals except near surface discontinuities. Higher stages of filling-in complete the surface representation using properties that are complementary to those whereby boundaries are formed, guided by signals from these boundaries7,15-17.
Complementary form and motion interactions
A third parallel processing stream, passing through LGN-4B-Thick Stripe-MT, processes motion information (Figure 1)19-21. Why does a separate motion stream exist? In what sense are form and motion computations complementary? What do interactions between form and motion accomplish from a functional point of view? Modeling work suggests how these streams and their mutual interactions compensate for complementary deficiencies of each stream towards generating percepts of moving-form-in-depth22,23. Such motion percepts are called formotion percepts because they arise from a form-motion interaction.
The form system uses orientationally tuned computations while the motion system uses directionally tuned computations. In the formotion model, the processing of form by the boundary stream uses orientationally tuned cells24 to generate emergent object representations, such as the Kanizsa square (Figure 2). Such emergent boundary and surface representations, rather than just the energy impinging on our retinas, define the form percepts of which we are consciously aware. Precise orientationally tuned comparisons of left eye and right eye inputs are used to compute sharp estimates of the relative depth of an object from its observer25,26, and thereby to form three-dimensional boundary and surface representations of objects separated from their backgrounds7.
How is this orientation information used by the motion stream? An object can contain contours of many different orientations which all move in the same direction as part of the object's motion. Both psychophysical and neurophysiological experiments have shown that the motion stream pools information from many orientations that are moving in the same direction to generate precise estimates of a moving object's direction and speed19-21, 27-29. Lesions of the form system also show that, on its own, the motion system can make only coarse depth estimates30,31. Thus it seems reasonable that the orientationally tuned form system generates emergent representations of forms with precise depth estimates, whereas the directionally tuned motion system on its own can generate only coarse depth estimates. In this conception, orientation and direction are complementary properties, since orientation is computed parallel to a contour, whereas, at least in the absence of contextual constraints, direction is computed perpendicular to it32.
How do the emergent object boundaries that are computed with precise depth estimates in the form stream get injected into the motion stream and thereby enable the motion stream to track emergent object representations in depth? How does the motion stream pool information across space from multiple oriented contours to generate precise estimates of an object's direction and speed? These are large questions with complex answers on which many investigators are working. Classical computational models of motion detection involving Reichardt-like or motion-energy mechanisms have focused on the recovery of local motion directions33-35. Cells in motion processing areas like MT, however, are sensitive to both the direction and the speed of moving patterns20,36. Indeed, both direction and speed estimates are needed to track moving objects. More recent models have proposed how motion signals can be differentiated and pooled over multiple orientations and spatial locations to form global estimates of both object direction and speed37.
The present discussion of motion perception focuses on how the complementary uncertainties of the form and motion streams may be overcome by their interaction. There is evidence for an interstream interaction from area V2 of the form stream to area MT of the motion stream (Figure 1). This interaction could enable form representations to be tracked by the motion stream at their correct depths as they move through time. A model of this formotion interaction has successfully simulated many perceptual and brain data about motion perception22,23,37,38. This model predicts an important functional role for percepts of long-range apparent motion, whereby observers perceive continuous motion between properly timed but spatially stationary flashes of color or brightness. These continuous motion interpolations can be used to track targets, such as prey and predators, that intermittently disappear as they move at variable rates behind occluding cover, such as bushes and trees in a forest. The "flashes" are the intermittent appearances of the prey or predator. This prediction has not yet been tested neurophysiologically.
Figure 4. Images used to demonstrate that apparent motion of illusory figures arises through interactions of the static illusory figures, but not from the inducing elements themselves. Frame 1 (row 1) is followed by Frame 2 (row 2) in the same spatial locations. With correctly chosen image sizes, distances, and temporal displacements, an illusory square is seen to move continuously from the inducers in the left picture of Frame 1 to the inducers in the right picture of Frame 2. Reprinted with permission from Reference 39.
Figure 4 shows an experimental display that vividly illustrates such a formotion interaction. In Frame 1, the pac men at the left side of the figure define a Kanizsa square via the boundary completion process that takes place within the form stream. In Frame 2, the pac men are replaced by closed disks, and a square region is cleared in the line array to the right. As a result, an illusory square forms adjacent to the line ends. The pac men and line arrays were designed so that none of their features could be matched. Only the emergent squares have matching features. When Frame 2 is turned on right after Frame 1 is turned off, the square appears to move continuously from the pac man array to the line array. This percept is an example of apparent motion, since nothing in the images actually moves. The percept is a "double illusion" because both the emergent forms and their motions are visual illusions. The theory suggests that the illusory square boundaries are generated in the form stream before being injected into the motion stream, where they are the successive "flashes" that generate a wave of apparent motion. Such displays, and their theoretical explanation, also illustrate how the form system can help to create percepts of moving objects whose boundaries are not explicitly defined within individual frames of a display.
Complementary expectation learning and matching during what and where processing.
Complementary form and motion processing are proposed to be part of a larger design for complementary processing whereby objects in the world are cognitively recognized, spatially localized, and acted upon. The form stream inputs to the inferotemporal cortex, whereas the motion stream inputs to the parietal cortex (Figure 1). Many cognitive neuroscience experiments have supported the hypotheses of Ungerleider and Mishkin40,41 and of Goodale and Milner42 that inferotemporal cortex and its cortical projections learn to categorize and recognize what objects are in the world, whereas the parietal cortex and its cortical projections learn to determine where they are and how to deal with them by locating them in space, tracking them through time, and directing actions towards them. This design thus separates sensory and cognitive processing from spatial and motor processing.
These hypotheses have not, however, noted that sensory and cognitive learning processes are complementary to spatial and motor learning processes on a mechanistic level. Neural modeling has clarified how sensory and cognitive processes solve a key problem, called the stability-plasticity dilemma43-45, and can thus rapidly and stably learn about the world throughout life without catastrophically forgetting our previous experiences. In other words, we remain plastic and open to new experiences without risking the stability of previously learned memories. This type of fast stable learning enables us to become experts at dealing with changing environmental conditions: Old knowledge representations can be refined by changing contingencies, and new ones built up, without destroying the old ones due to catastrophic forgetting.
On the other hand, catastrophic forgetting is a good property for spatial and motor learning. We have no need to remember all the spatial and motor representations (notably motor maps and gains) that we used when we were children. In fact, the parameters that controlled our small childhood limbs would cause major problems if they continued to control our larger and stronger adult limbs. This forgetting property of the motor system should not be confused with the more stable sensory and cognitive representations with which it interacts and which, for example, help us to ride a bike after years of disuse.
These distinct what and where memory properties are proposed to follow from complementary mechanisms whereby these systems learn expectations about the world, and match these expectations against world data. To see how we use a sensory or cognitive expectation, suppose you were asked to "find the yellow ball within one-half second, and you will win a $10,000 prize". Activating an expectation of yellow balls enables more rapid detection of a yellow ball, and with a more energetic neural response, than if you were not looking for it. Neural correlates of such excitatory priming and gain control have been reported by several laboratories46-52. Sensory and cognitive top-down expectations hereby lead to excitatory matching with confirmatory bottom-up data. On the other hand, mismatch between top-down expectations and bottom-up data can suppress the mismatched part of the bottom-up data, and thereby start to focus attention upon the matched, or expected, part of the bottom-up data. This sort of excitatory matching and attentional focusing of bottom-up data with top-down expectations is proposed to generate resonant brain states that support conscious experiences43-45. Paradoxical data about conscious perceptual experiences from several modalities have been explained as emergent properties of such resonant states45.
In contrast, a motor expectation represents where we want to move, such as to the position where our hand can grasp a desired object. Such a motor expectation is matched against where the hand is. After the hand moves to the desired position, no further movement is required, and movement stops. Motor expectations hereby control inhibitory matching. Inhibitory matching does not lead to brain resonance, so motor processing is not conscious. In summary, in the present theory, sensory and cognitive matching is excitatory, whereas spatial and motor matching is inhibitory. These are complementary properties.
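The contrast between the two kinds of matching can be sketched in a few lines. The code below is a toy illustration under stated assumptions, not a circuit from the cited models: the function names, the multiplicative on-center rule, and the subtractive off-surround rule are hypothetical choices. They are picked only so that an expectation enhances matched bottom-up data, suppresses unexpected data, and cannot activate cells by itself, whereas the motor comparison vanishes at a perfect match.

```python
def excitatory_match(bottom_up, top_down, surround=1.0):
    """Toy excitatory matching: a top-down prime modulates bottom-up input.

    The on-center term multiplies the bottom-up signal, so an expectation by
    itself cannot activate a cell. The off-surround term subtracts inhibition
    driven by the expectation of other features. Both rules are assumptions.
    """
    total_prime = sum(top_down)
    matched = []
    for b, t in zip(bottom_up, top_down):
        on_center = b * (1.0 + t)                     # gain enhancement of expected features
        off_surround = surround * (total_prime - t)   # inhibition from other primed features
        matched.append(max(0.0, on_center - off_surround))
    return matched


def inhibitory_match(target, present):
    """Toy inhibitory matching: the present position is subtracted from the target."""
    return [t - p for t, p in zip(target, present)]


if __name__ == "__main__":
    data = [1.0, 1.0, 0.0]    # two features are present in the scene
    prime = [1.0, 0.0, 0.0]   # only the first feature is expected
    print(excitatory_match(data, prime))              # expected feature enhanced, other suppressed
    print(excitatory_match([0.0, 0.0, 0.0], prime))   # expectation alone activates nothing
    print(inhibitory_match([0.3, 0.5], [0.3, 0.5]))   # a perfect match leaves nothing to do
```

In this sketch a good sensory match produces the largest response, while a good motor match produces a zero vector, which is one way to picture why the two matching rules are called complementary.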
Figure 5. The LAMINART model synthesis of bottom-up, top-down, and horizontal interactions in LGN, V1, and V2. Cells and connections with open symbols denote preattentive excitatory mechanisms that are involved in perceptual grouping. Solid black symbols denote inhibitory mechanisms. Dashed symbols denote top-down attentional mechanisms.
Recent modeling work predicts some of the cells and circuits that are proposed to carry out these complementary types of matching. For example, recent modeling has suggested how top-down sensory matching is controlled in visual cortex, notably from cortical area V2 to V1, and by extension in other sensory and cognitive neocortical circuits53,54. This top-down circuit is part of a larger model of how bottom-up, top-down, and horizontal interactions are organized within the laminar circuits of visual cortex; see Figure 5. The circuit generates top-down outputs from cortical layer 6 of V2 that activate, via a possibly polysynaptic pathway, layer 6 of V1. Cells in layer 6 of V1, in turn, activate an on-center off-surround circuit to layer 4 of V1. (See below for more discussion of on-center off-surround circuits.) The on-center is predicted to have a modulatory effect on layer 4, due to the balancing of excitatory and inhibitory inputs to layer 4 within the on-center. The inhibitory signals in the off-surround can suppress unattended visual features. This top-down circuit realizes a type of folded feedback, whereby feedback inputs from V2 are folded back into the feedforward flow of information from layer 6-to-4 of V1. The modulatory nature of the layer 6-to-4 connections helps to explain a curious fact about bottom-up cortical design: despite the fact that the LGN activates layer 6 of V1 in a bottom-up fashion, a separate, direct excitatory pathway exists from LGN to layer 4 of V1. It is predicted that this direct pathway is needed to enable the LGN to drive layer 4 cells to suprathreshold activity levels, because the indirect LGN-6-4 pathway is modulatory. The modeling articles summarize neurophysiological, anatomical, and psychophysical experiments that are consistent with these predictions.
Recent modeling work also predicts some of the cells and circuits that are proposed to carry out top-down motor matching, notably in cortical areas 4 and 5 (refs 55,56). Inhibitory matching is predicted to occur between a Target Position Vector (TPV) that represents where we want to move our arm, and a Present Position Vector (PPV) that computes an outflow representation of where the arm is now (Figure 6). This comparison is proposed to occur at Difference Vector (DV) cells in cortical area 5, which compute how far, and in what direction, the arm is commanded to move. This Difference Vector is, in turn, predicted to be transmitted to cortical area 4, where it is multiplicatively gated by a GO signal that is under volitional control. Turning on the GO signal determines whether the limb will move, and its amplitude scales the speed of movement. The product of DV and GO hereby determines a Desired Velocity Vector (DVV). Such a DV is predicted to be computed at area 5 phasic cells, and its corresponding DVV at area 4 phasic MT cells. The modeling articles summarize neurophysiological, anatomical, and psychophysical experiments that are consistent with these predictions. It should also be noted that various other cell types within cortical areas 4 and 5 do not do inhibitory matching, and may even support resonant states.
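The relations among TPV, PPV, DV, GO, and DVV described above can be sketched as a simple simulation. This is a minimal kinematic sketch under stated assumptions, not the full VITE circuit: the DV is computed instantaneously rather than as a separate dynamical variable, the GO signal is held constant, and the function name vite_reach, the Euler integration, and the parameter values are hypothetical.

```python
def vite_reach(target, start, go=1.0, dt=0.01, steps=1000):
    """Toy reach: DV = TPV - PPV, DVV = GO * DV, and PPV integrates DVV."""
    ppv = list(start)                 # present position vector
    trajectory = [tuple(ppv)]
    for _ in range(steps):
        dv = [t - p for t, p in zip(target, ppv)]       # inhibitory match of target and position
        dvv = [go * d for d in dv]                      # volitional GO gates the difference vector
        ppv = [p + dt * v for p, v in zip(ppv, dvv)]    # outflow position integrates velocity
        trajectory.append(tuple(ppv))
    return trajectory


if __name__ == "__main__":
    tpv = [1.0, 2.0]
    for go in (0.0, 1.0, 2.0):
        path = vite_reach(tpv, [0.0, 0.0], go=go)
        print(go, [round(x, 3) for x in path[100]], [round(x, 3) for x in path[-1]])
```

With GO equal to zero the position never changes, a larger GO reaches the same target sooner, and in every case movement stops by itself once PPV matches TPV, because the DV that drives it has shrunk to zero.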
Figure 6. The VITE circuit model. Thick connections represent the kinematic feedback control aspect of the model, with thin connections representing additional compensatory circuitry. GO, scaleable gating signal; DVV, desired velocity vector; OPV, outflow position vector; OFPV, outflow force + position vector; SFV, static force vector; IFV, inertial force vector; CBM, assumed cerebello-cortical input to the IFV stage; PPV, perceived position vector; DV, difference vector; TPV, target position vector; γd, dynamic gamma motoneuron; γs, static gamma motoneuron; α, alpha motoneuron; Ia, type Ia afferent fiber; II, type II afferent fiber (position error feedback); c.s., central sulcus; i.p.s., intraparietal sulcus. The symbol + represents excitation, − represents inhibition, × represents multiplicative gating, and +∫ represents integration.
The learning processes that accompany these complementary types of matching are also proposed to exhibit complementary properties. Learning within the sensory and cognitive domain is often match learning. Match learning occurs only if a good enough match occurs between active top-down expectations and bottom-up information. When such an approximate match occurs, previously stored knowledge can be refined. If novel information cannot form a good enough match with the expectations that are read-out by previously learned recognition categories, then a memory search is triggered that leads to selection and learning of a new recognition category, rather than catastrophic forgetting of an old one43-45. In contrast, learning within spatial and motor processes is proposed to be mismatch learning that continuously updates sensory-motor maps57 or the gains of sensory-motor commands58,59. Thus both learning and matching within the what and where streams may have complementary properties. As a result, we can stably learn what is happening in a changing world, thereby solving the stability-plasticity dilemma43-45, while adaptively updating our representations of where objects are and how to act upon them using bodies whose parameters change continuously through time57-59.
Complementary attentive-learning and orienting-search.
Match learning has the great advantage that it leads to stable memories in response to changing environmental conditions. It also has a potentially disastrous disadvantage, however: If you can only learn when there is a good enough match between bottom-up data and learned top-down expectations, then how do you ever learn anything that you do not already know? Some popular learning models, such as back propagation, try to escape this problem by assuming that the brain does only supervised learning. During supervised learning, an explicit correct answer, or teaching signal, is provided in response to every input. This teaching signal forces learning to track the correct answer. Such a model cannot learn if an explicit answer is not provided. It appears, however, that much human and animal learning, especially during the crucial early years of life, takes place in a relatively unsupervised fashion.
Other models do allow unsupervised learning to occur. Here, the key problem is that, if a teacher is not available to force the selection and learning of a representation that can map onto a correct answer, then the internal dynamics of the model must do so on their own. In order to escape the problem of not being able to learn something that one does not already know, some of these models assume that we do already know (or, more exactly, have internal representations for) everything that we may ever wish to know, and that experience just selects and amplifies these representations60. These models depend upon the bottom-up filtering of inputs, and a very large number of internal representations that respond to these filtered inputs, to provide enough memory to represent whatever may happen. Having such a large number of representations leads to a combinatorial explosion, with an implausibly large memory. Thus, although using a very large number of representations can help with the problem of catastrophic forgetting, it creates other, equally serious, problems instead. Other unsupervised learning models shut down learning as time goes on in order to avoid catastrophic forgetting61.
I propose that these problems are averted in the brain through the use of another complementary interaction, which was briefly mentioned above. This complementary interaction helps to balance between processing the familiar and the unfamiliar, the expected and the unexpected. It does so using complementary processes of resonance and reset, which are predicted to subserve properties of attention and memory search, respectively. This interaction enables the brain to discover and stably learn new representations for novel events in an efficient way, without assuming that representations already exist for as yet unexperienced events. It hereby solves the combinatorial explosion while also solving the stability-plasticity dilemma.
Figure 7. Search for a recognition code within an ART learning circuit: (A) The input pattern I is instated across the feature detectors at level F1 as a short term memory (STM) activity pattern X. Input I also nonspecifically activates the orienting subsystem A. STM pattern X is represented by the hatched pattern across F1. Pattern X both inhibits A and generates the output pattern S. Pattern S is multiplied by long term memory (LTM) traces, or learned adaptive weights. These LTM-gated signals are added at F2 nodes to form the input pattern T, which activates the STM pattern Y across the recognition categories coded at level F2. (B) Pattern Y generates the top-down output pattern U which is multiplied by top-down LTM traces and added at F1 nodes to form the prototype pattern V that encodes the learned expectation of the active F2 nodes. If V mismatches I at F1, then a new STM activity pattern X* is generated at F1. X* is represented by the hatched pattern. It includes the features of I that are confirmed by V. Mismatched features are inhibited. The inactivated nodes corresponding to unconfirmed features of X are unhatched. The reduction in total STM activity which occurs when X is transformed into X* causes a decrease in the total inhibition from F1 to A. (C) If inhibition decreases sufficiently, A releases a nonspecific arousal wave to F2, which resets the STM pattern Y at F2. (D) After Y is inhibited, its top-down prototype signal is eliminated, and X can be reinstated at F1. Enduring traces of the prior reset lead X to activate a different STM pattern Y at F2. If the top-down prototype due to Y also mismatches I at F1, then the search for an appropriate F2 code continues until a more appropriate F2 representation is selected. Then an attentive resonance develops and learning of the attended data is initiated. [Reprinted with permission from reference .]
One of these complementary subsystems is just the what stream that was described above, with its top-down expectations that are matched against bottom-up inputs. When a recognition category activates a top-down expectation that achieves a good enough match with bottom-up data, this match process focuses attention upon those feature clusters in the bottom-up input that are expected (Figure 7). Experimental evidence for such matching and attentional processes has been found in neurophysiological data about perception and recognition48,50,62-66. Many behavioral and neural data have been explained by assuming that such top-down feedback processes can lead to resonant brain states that play a key role in dynamically stabilizing both developmental and learning processes43-45,53, 67-69.
How does a sufficiently bad mismatch between an active top-down expectation and a bottom-up input drive a memory search, say because the input represents an unfamiliar type of experience? This mismatch within the attentional system is proposed to activate a complementary orienting system, which is sensitive to unexpected and unfamiliar events. Output signals from the orienting system rapidly reset the recognition category that has been reading out the poorly matching top-down expectation (Figure 7B and 7C). The cause of the mismatch is hereby removed, thereby freeing the system to activate a different recognition category (Figure 7D). The reset event hereby triggers memory search, or hypothesis testing, which automatically leads to the selection of a recognition category that can better match the input. If no such recognition category exists, say because the bottom-up input represents a truly novel experience, then the search process can automatically activate an as yet uncommitted population of cells, with which to learn about the novel information. This learning process works well under both unsupervised and supervised conditions. Supervision can force a search for new categories that may be culturally determined, and are not based on feature similarity alone. For example, separating the letters E and F into separate recognition categories is culturally determined; they are quite similar based on visual similarity alone. Taken together, the interacting processes of attentive-learning and orienting-search realize a type of error correction through hypothesis testing that can build an ever-growing, self-refining internal model of a changing world.
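The search cycle just described can be sketched for binary feature patterns. The sketch borrows the choice function, match criterion, and fast-learning rule commonly used in ART 1 and fuzzy ART formulations. These details, the function name `art_present`, and the parameters `vigilance` and `alpha` are assumptions of the illustration rather than quantities specified in this article, and the orienting system is reduced to the act of skipping to the next candidate category.

```python
import numpy as np

def art_present(pattern, prototypes, vigilance=0.7, alpha=1e-3):
    """Present one nonzero binary pattern; return the index of the resonating category.

    `prototypes` is a list of learned top-down expectations and is updated in place.
    """
    pattern = np.asarray(pattern, dtype=float)
    overlaps = [np.minimum(pattern, w).sum() for w in prototypes]
    # Bottom-up choice: test categories in order of decreasing activation.
    order = sorted(range(len(prototypes)),
                   key=lambda j: overlaps[j] / (alpha + prototypes[j].sum()),
                   reverse=True)
    for j in order:
        match = overlaps[j] / pattern.sum()  # fraction of the input confirmed by the expectation
        if match >= vigilance:
            # Good enough match: resonance, and the prototype is refined (match learning).
            prototypes[j] = np.minimum(pattern, prototypes[j])
            return j
        # Poor match: this category is reset and the search moves on.
    # No learned category matches well enough: recruit an uncommitted one.
    prototypes.append(pattern.copy())
    return len(prototypes) - 1

a = [1, 1, 1, 0, 0, 0]
b = [0, 0, 0, 1, 1, 1]
a_noisy = [1, 1, 1, 1, 0, 0]

prototypes = []
labels = [art_present(p, prototypes) for p in (a, b, a_noisy, a)]
```

With the moderate vigilance used here, the noisy version of the first pattern resonates with the existing category instead of overwriting it or creating a new one. Raising `vigilance` forces the search to end in a new category for the same input, which is one way supervision could impose finer distinctions.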
The complementary attentive-learning and orienting-search subsystems and how they interact have been progressively developed since the 1970s within Adaptive Resonance Theory, or ART43-45. Neurobiological data have elsewhere been reviewed in support of the ART hypothesis that the attentive-learning system includes such what processing regions as inferotemporal cortex and its projections in prefrontal cortex, whereas the orienting-search system includes circuits of the hippocampal system45. Data about mismatch cells in the hippocampal system are particularly relevant to this hypothesis70. ART predicts that these interactions between inferotemporal cortex and the hippocampal system during a mismatch event offset the inability of the what processing stream to search for and learn appropriate new recognition codes on its own. This deficiency of the what stream has been used to predict how hippocampal lesions can lead to symptoms of amnesic memory45. Because of their ability to learn stably in real-time about large amounts of information in a rapidly changing world, ART models have also been used in pattern recognition applications in technology71.
Complementary additive and subtractive intrastream processing.
The two types of matching across the what and where processing streams use different combinations of excitatory and inhibitory neural signals. Complementary processes can also arise within a processing stream. Thus, a processing stream may be broken into complementary substreams. Several examples will now be mentioned wherein parallel combinations of additive and subtractive neural signals can be computed within a single processing stream. A classical example in the what processing stream combines outputs from long-wavelength (L) and medium-wavelength (M) retinal photoreceptors into parallel luminance (L + M) and color (L - M) channels72. The color channels compute reflectances, or ratios, by discounting the illuminant, while the luminance channel computes luminant energy. By using both channels, the illuminant can be discounted without throwing away information about luminant energy.
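A small numerical sketch makes the division of labor concrete. It assumes that cone signals are the product of a scalar illuminant intensity and a surface reflectance, and it divides the L - M difference by L + M to obtain a ratio. That normalization, the function name `opponent_channels`, and the sample reflectances are assumptions of the illustration, not details taken from the cited work.

```python
import numpy as np

def opponent_channels(l_cone, m_cone, eps=1e-12):
    """Return additive, subtractive, and (assumed) ratio signals from L and M inputs."""
    l_cone = np.asarray(l_cone, dtype=float)
    m_cone = np.asarray(m_cone, dtype=float)
    luminance = l_cone + m_cone                       # additive channel: total energy
    difference = l_cone - m_cone                      # subtractive channel: raw opponent signal
    chromatic_ratio = difference / (luminance + eps)  # ratio that discounts overall intensity
    return luminance, difference, chromatic_ratio

# Hypothetical surface reflectances seen by the L and M photoreceptors.
reflectance_l = np.array([0.8, 0.2, 0.5])
reflectance_m = np.array([0.2, 0.6, 0.5])

dim = opponent_channels(1.0 * reflectance_l, 1.0 * reflectance_m)
bright = opponent_channels(10.0 * reflectance_l, 10.0 * reflectance_m)
```

Under a tenfold brighter illuminant the luminance signal grows tenfold, whereas the ratio signal is essentially unchanged, so the pair of channels retains both the surface property and the energy of the light.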
Intrastream complementarity also seems to occur within the where stream. Here, cortical area MT activates area MST (not shown in Figure 1) on the way to parietal cortex. In macaque monkeys, the ventral part of MST helps to track moving visual objects, whereas dorsal MST helps to navigate in the world using global properties of optic flow73,74. These tasks are behaviorally complementary: the former tracks an object moving in the world with respect to an observer, whereas the latter navigates a moving observer with respect to the world. The tasks are also neurophysiologically complementary: Neurons in ventral MST compute the relative motion of an object with respect to its background by subtracting background motion from object motion; whereas neurons in dorsal MST compute motions of a wide textured field by adding motion signals over a large visual domain74. Corresponding to MST's breakdown into additive and subtractive subregions, area MT of owl monkeys possesses distinct bands and interbands75. Band cells have additive receptive fields for visual navigation, whereas interband cells have subtractive receptive fields for computing object-relative motion. Modeling studies have shown how these complementary properties can be used, on the one hand, for visual navigation using optical flow information and, on the other hand, for predictive tracking of moving targets using smooth pursuit eye movements76,77. These studies make a number of neurophysiological predictions, including how the log polar mapping that is defined by the cortical magnification factor helps to achieve good navigational properties. A remarkable prediction is that the biologically observed spiral tuning curves that were found by Graziano et al.78 in cortical area MST maximize the amount of position invariance of which the positionally-variant log polar map is capable.
Intrastream complementarity is also predicted to occur during sensory-motor control, or how processing. To see this, suppose that both eyes fixate an object that can be reached by the arms. Psychophysical79 and neurophysiological data80,81 suggest that the vergence of the two eyes, as they fixate the object, is used to estimate the object's radial distance, while the spherical angles that the eyes make relative to the observer's head estimate the object's angular position. Distance and angle are mathematically independent properties of an object's position with respect to an observer. How does the brain compute the distance and angle to an object that the eyes are fixating? A neural model proposes how addition and subtraction can again realize the necessary computations by exploiting the bilateral symmetry of the body57. In particular, eye movement control pathways give rise to parallel branches, called corollary discharges, that inform other brain systems of the present position of the eyes13. These outflow movement control pathways have an opponent organization to control the body's agonist and antagonist muscles. Neural modeling has mathematically proved that, when both eyes fixate an object, accurate spherical angle and vergence estimates of object position may be derived by adding and subtracting, respectively, the ocular corollary discharges that control the two eyes, while preserving their opponent relationships, at separate populations of cells57.
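The geometric idea behind this proposal can be sketched in the horizontal plane. The sketch replaces opponent corollary discharges with plain eye rotation angles, places the eyes symmetrically about the head midline, and assumes an interocular separation of 0.065 m. The function names and these simplifications are assumptions of the illustration, not the cited neural model.

```python
import math

def eye_angles(target_x, target_y, interocular=0.065):
    """Horizontal rotation of each eye (radians from straight ahead) when fixating a target."""
    half = interocular / 2.0
    left = math.atan2(target_x + half, target_y)   # left eye sits at x = -half
    right = math.atan2(target_x - half, target_y)  # right eye sits at x = +half
    return left, right

def version_and_vergence(left, right):
    """Combine the two eye signals additively (direction) and subtractively (nearness)."""
    version = 0.5 * (left + right)  # additive combination: angular position estimate
    vergence = left - right         # subtractive combination: grows as the target approaches
    return version, vergence

def distance_straight_ahead(vergence, interocular=0.065):
    """Radial distance implied by a vergence angle, valid for a target on the midline."""
    return (interocular / 2.0) / math.tan(vergence / 2.0)

near = version_and_vergence(*eye_angles(0.0, 0.3))
far = version_and_vergence(*eye_angles(0.0, 0.6))
side = version_and_vergence(*eye_angles(0.2, 0.6))
```

For midline targets the additive signal is zero and the subtractive signal alone recovers the distance. Moving the target sideways mainly changes the additive signal, which is the sense in which the two combinations carry complementary positional information.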
These examples illustrate how a rich repertoire of complementary behavioral capabilities can be derived by doing "brain arithmetic", whereby outputs of a processing stage are segregated into additive and subtractive parallel computations at a subsequent processing stage. Such additive and subtractive combinations can occur both between processing streams and within a single processing stream. These simple computations generate very different behavioral properties when applied to different sensory inputs or different stages of a processing stream. The next sections illustrate several ways in which complementary multiplication and division operations may enter the brain's "arithmetic" repertoire.
Factorization of pattern and energy: ratio processing and synchrony
Multiplication and division occur during processes that illustrate the general theme of how the brain achieves factorization of pattern and energy67. Pattern here refers to the hypothesis that the brain's functional units of short-term representation of information, and of long-term learning about this information, are distributed patterns of activation and of synaptic weight, respectively, across a neuronal network. Energy refers to the mechanisms whereby pattern processing is turned on and off by activity-dependent modulatory processes.
Why do pattern and energy need to be processed separately? Why cannot a single process do both? One reason is that cell activities can fluctuate within only a narrow dynamic range. Often input amplitudes can vary over a much wider dynamic range. For example, if a large number of input pathways converge on a cell, then the number of active input pathways can vary greatly through time, and with it, the total size of the cell input. Owing to the small dynamic range of the cell, its activity could easily become saturated when a large number of inputs is active. If all the cells became saturated, then their activities could not sensitively represent the relative size, and thus importance, of their respective inputs. One way to prevent this would be to require that each individual input be chosen very small so that the sum of all inputs would not saturate cell activity. But such small individual inputs could easily be lost in cellular noise. The cell's small dynamic range could hereby make it insensitive to both small and large inputs as a result of noise and saturation, respectively, at the lower and upper extremes of the cell's dynamic range. This noise-saturation dilemma faces all biological cells, not merely nerve cells. Interactions across a network of cells are needed to preserve information about the relative sizes of inputs to the cells in the network, and thereby overcome noise and saturation. This kind of pattern processing sacrifices information about the absolute amplitude of inputs in order to enable the cells to respond sensitively to their relative size, over a wide dynamic range. Since the pattern processing network discards information about absolute input size, a separate channel is needed to track information about the total amplitude, or energy, of the inputs.
Retaining sensitivity to the relative size of inputs can be accomplished by on-center off-surround interactions between cells that obey the membrane equations of neurophysiology67,82,83. In a feedforward on-center, off-surround network, feedforward inputs excite their target cells while inhibiting more distant cells. To store inputs temporarily in short-term (or working) memory, excitatory feedback between nearby cells and inhibitory feedback between more distant cells can solve the noise-saturation dilemma. Stated using more general terms, these networks define mass-action interactions among short-range cooperative and longer-range competitive inputs or activities. The mass action terms of membrane equations introduce multiplication into brain arithmetic by multiplying cell inputs with cell voltages, or activities. Membrane equations respond to on-center off-surround interactions by dividing each cell's activity by a weighted sum of all the cell inputs (in a feedforward interaction) or activities (in a feedback interaction) with which it interacts. This operation keeps cell activities away from the saturation range by normalizing them while preserving their sensitivity to input ratios.
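The feedforward case can be illustrated with the shunting equation that is commonly written for such networks, dx_i/dt = -A x_i + (B - x_i) I_i - x_i Σ_{k≠i} I_k, whose equilibrium is x_i = B I_i / (A + Σ_k I_k). This article does not state the equation explicitly, so the specific form, the parameter names `decay` (A) and `ceiling` (B), and the Euler integration settings below are assumptions of the sketch.

```python
import numpy as np

def shunting_steady_state(inputs, decay=1.0, ceiling=1.0):
    """Closed-form equilibrium: each activity is its input divided by (decay + total input)."""
    inputs = np.asarray(inputs, dtype=float)
    return ceiling * inputs / (decay + inputs.sum())

def simulate_shunting(inputs, decay=1.0, ceiling=1.0, dt=1e-3, steps=20000):
    """Euler integration of a feedforward on-center off-surround shunting network."""
    inputs = np.asarray(inputs, dtype=float)
    x = np.zeros_like(inputs)
    surround = inputs.sum() - inputs  # off-surround: summed inputs to all other cells
    for _ in range(steps):
        dx = -decay * x + (ceiling - x) * inputs - x * surround
        x = x + dt * dx
    return x

pattern = np.array([1.0, 2.0, 3.0])
weak = simulate_shunting(pattern)            # low overall input energy
strong = simulate_shunting(100.0 * pattern)  # same pattern, 100 times the energy
```

In this sketch the relative activities of the three cells are the same for weak and strong inputs, and the total activity stays below the ceiling, so neither noise at low energy nor saturation at high energy erases the input pattern.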
The ubiquitous nature of the noise-saturation dilemma in all cellular tissues clarifies why such on-center off-surround anatomies are found throughout the brain. For example, when ratio processing and normalization occur during visual perception, they help to control brightness constancy and contrast15,16 as well as perceptual grouping and attention53,54,84,85. At higher levels of cognitive processing, these mechanisms can provide a neural explanation of the limited capacity of cognitive short-term memory68.
The cooperative-competitive interactions that preserve cell sensitivity to relative input size also bind these cell activities into functional units. Indeed, relative activities need to be computed synchronously, and early theorems about short-term memory and long-term memory processing67 predicted an important role for synchronous processing between the interacting cells. Subsequent neurophysiological experiments have emphasized the functional importance of synchronous brain states86,87. More recent neural modeling has shown how such synchronized activity patterns can, for example, quantitatively explain psychophysical data about temporal order judgments during perceptual grouping within the visual cortex88.
Motor expectation and volition
Factorization of pattern and energy shows itself in many guises. For example, it helps to explain how motor expectations (pattern) interact with volitional speed signals (energy) to generate goal-directed arm movements89-91, as during the computation of the Desired Velocity Vector in the cortical area 4 circuit of Figure 6. As noted in the discussion of where and how processing, a motor expectation represents where we want to move, such as to the position where our hand can grasp a desired object. Such a motor representation, or Target Position Vector (TPV), can prime a movement, or get us ready to make a movement, but by itself, it cannot release the movement55,89. First the TPV needs to be converted into a Difference Vector (DV), which triggers an overt action only when a volitional signal90 multiplicatively gates action read-out. The volitional signal for controlling movement speed is called a GO signal, as in Figure 6. The signal for controlling size is called a GRO signal. Neural models have predicted how such GO and GRO signals may, for example, alter the size and speed of handwritten script without altering its form91. As noted in Figure 6, some motor expectations seem to be computed in the parietal and motor cortex. Volitional signals seem to be computed within the basal ganglia90.
The Vector Integration to Endpoint, or VITE, neural model, summarized in Figure 6, of how these arm-controlling pattern and energy factors combine within cortical areas 4 and 5 has been used to predict the functional roles of six identified cortical cell types, and to quantitatively simulate their temporal responses during a wide range of behavioral tasks55,56. These results support model hypotheses about how variable-speed and variable-force arm movements can be carried out in the presence of obstacles. The model hereby provides a detailed example of how task-sensitive volitional control of action realizes an overall separation into pattern and energy variables.
Figure 8. Schematic conditioning circuit: Conditioned stimuli (CSi) activate sensory categories (SCSi), which compete among themselves for limited capacity short-term memory activation and storage. The activated SCSi representations, i = 1, 2, elicit trainable signals to drive representations D and motor command representations M. Learning from a sensory representation SCSi to a drive representation D is called conditioned reinforcer learning. Learning from D to a SCSi is called incentive motivational learning. Signals from D to SCSi are elicited when the combination of conditioned sensory plus internal drive inputs is sufficiently large. Sensory representations that win the competition in response to the balance of external inputs and internal motivational signals can activate motor command pathways.
Cognitive-emotional interactions and attentional blocking
Cognitive-emotional learning enables sensory and cognitive events to acquire emotional and motivational significance. Both classical and instrumental conditioning can be used for this purpose92-95. For example, during classical conditioning, an irrelevant sensory cue, or conditioned stimulus (CS), is paired with a reinforcing event, or unconditioned stimulus (US). The CS hereby acquires some of the reinforcing properties of the US; it becomes a "conditioned reinforcer" with its own motivational properties. The manner in which the thalamocortical representation of a conditioned reinforcer CS is influenced by motivational signals represents, I suggest, another example of factorization of pattern and energy. Here, the activities across the thalamocortical representations of recently presented sensory events, including the CS, constitute the "pattern". This pattern is normalized by the feedback on-center off-surround interactions that are used to store the activities in short-term memory without saturation. If one or more of these sensory events is a conditioned reinforcer, then it can amplify its own activity via learned motivational feedback signals, which play the role of "energy" in this example45,67. These amplified representations can, in turn, attentionally block94, or inhibit, the representations of irrelevant sensory events via the off-surround of the feedback network. Attentional blocking is one of the key mechanisms whereby animals learn which consequences are causally predicted by their antecedent sensory cues and actions, and which consequences are merely accidental. A more detailed summary of how blocking is proposed to happen is now given.
During cognitive-emotional learning, at least three types of internal representations interact: Sensory and cognitive representations (S), drive representations (D), and motor representations (M)45,67, as depicted in Figure 8. The sensory representations S are thalamocortical representations of external events, like the ones described above within the what processing stream. They include representations of CSs. D representations include the hypothalamic and amygdala circuits at which homeostatic and reinforcing cues converge to generate emotional reactions and motivational decisions96-98. M representations include cortical and cerebellar circuits for controlling discrete adaptive responses59,99. As noted above, the S representations represent the pattern information in this example. They interact with one another via an on-center off-surround feedback network that stores their activities in short-term memory, while also solving the noise-saturation dilemma. The D representations supply modulatory energy owing to the action of the following types of learning processes:
(1) Conditioned reinforcer learning occurs in the S → D pathways, and enables a sensory event, such as a conditioned stimulus CS, to become a conditioned reinforcer that can activate a drive representation D. This may be accomplished by pairing the CS with an unconditioned stimulus US. The CS activates its sensory representation S. The US activates its own sensory representation, which in turn activates the drive representation D. Adaptive weights in the S → D pathway can grow in response to this correlated activity. Future presentations of the CS can hereby lead to activation of D, which controls various emotional and motivational responses.
(2) Due to this pairing of CS and US, incentive motivational learning can also occur in the adaptive weights within the D → S pathway. This type of learning allows an activated drive representation D to prime, or modulate, the sensory representations S of all sensory events that have consistently been activated with it in the past. Speaking intuitively, these sensory events are motivationally compatible with D.
(3) S → M habit learning, or motor learning, trains the sensorimotor maps and gains that control appropriate and accurately calibrated responses to the CS. These processes include circuits such as those summarized in Figure 6.
Conditioned reinforcer learning and incentive motivational learning combine to control attentional blocking in the following way. As noted above, the sensory representations S are the pattern variables that store sensory and cognitive representations in short-term memory using on-center off-surround feedback networks. Due to the self-normalizing properties of these networks, the total activity that can be stored in short-term memory across the entire network is limited. This is thus, once again, an example of the noise-saturation dilemma. Due to activity normalization, sufficiently great activation of one sensory representation implies that other sensory representations cannot be stored in short-term memory. In the present example, conditioning of a CS to a US strengthens both its S → D conditioned reinforcer and D → S incentive motivational pathways. Thus, when a conditioned reinforcer CS activates its sensory representation S, learned S → D → S positive feedback quickly amplifies the activity of S. This S → D → S feedback pathway supplies the motivational energy that focuses attention upon salient conditioned reinforcers. These amplified sensory representations inhibit the storage of other sensory cues in short-term memory via the lateral inhibition that exists among the sensory representations S. Blocking is hereby explained using incentive motivational "energy" to amplify conditioned reinforcer CS representations within the self-normalized sensory "pattern" that is stored in short-term memory. This S → D → S feedback causes a cognitive-emotional resonance to occur. The model prediction of how drive representations D, such as those in the amygdala, influence blocking by delivering incentive motivational feedback to thalamocortical sensory representations has not yet been tested neurophysiologically.
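A toy calculation can show how multiplicative feedback and divisive normalization jointly shift a limited activity budget toward a conditioned reinforcer. It is a sketch under strong assumptions: the short-term memory is reduced to a divisive normalization step, the drive representation is a weighted sum of sensory activities, and incentive feedback multiplies the input of its target. The function name `stm_with_incentive_feedback`, the weight values, and the fixed-point iteration are all hypothetical, and the sketch yields graded suppression rather than the full blocking attributed to the feedback network.

```python
import numpy as np

def stm_with_incentive_feedback(inputs, s_to_d, d_to_s,
                                decay=1.0, ceiling=1.0, iterations=200):
    """Iterate normalized sensory activities under learned S -> D -> S feedback."""
    inputs = np.asarray(inputs, dtype=float)
    s_to_d = np.asarray(s_to_d, dtype=float)  # conditioned reinforcer weights (assumed values)
    d_to_s = np.asarray(d_to_s, dtype=float)  # incentive motivational weights (assumed values)
    x = ceiling * inputs / (decay + inputs.sum())    # normalized activities before feedback
    for _ in range(iterations):
        drive = float(s_to_d @ x)                    # S -> D: drive activated by sensory cues
        gated = inputs * (1.0 + d_to_s * drive)      # D -> S: feedback amplifies its targets
        x = ceiling * gated / (decay + gated.sum())  # competition for limited total activity
    return x

cues = [1.0, 1.0]  # cue 0: conditioned reinforcer, cue 1: equally intense novel cue
baseline = stm_with_incentive_feedback(cues, s_to_d=[0.0, 0.0], d_to_s=[0.0, 0.0])
conditioned = stm_with_incentive_feedback(cues, s_to_d=[5.0, 0.0], d_to_s=[5.0, 0.0])
```

Before conditioning the two equally intense cues share the stored activity equally. After the first cue has acquired feedback weights, its activity grows and the activity of the second cue falls, even though the external inputs are unchanged.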
Rate-invariant speech and language understanding
Factorization of pattern and energy also seems to play an important role in temporally organized cognitive processes such as speech and language. Here sequences of events are transformed into temporally evolving spatial patterns of activation that are stored within working memories100. The pattern information that is stored in working memory represents both the event itself (its so-called item information) and the temporal order in which the events occurred. The energy information encodes both the temporal rate and rhythm with which the events occur68. Factorization of information about item and order from information about rate and rhythm helps us to understand speech that is spoken at variable rates: A rate-invariant representation of speech and language in working memory avoids the need to define multiple representations of the same speech and language utterance at every conceivable rate. This representation can, in turn, be used to learn speech and language codes, or categories, that are themselves not too sensitive to speech rate. Because rate and rhythm information are substantially eliminated from the rate-invariant working memory representation, rate and rhythm need to be computed by a separate process. This is a problem of factorization, rather than of independent representation, because the speech rate and rhythm that are perceived depend upon the categorical language units, such as syllables and words, that are familiar to the listener. What these language units are, in turn, depends upon how the listener has learned to group together, and categorize, the temporally distributed speech and language features that have previously been stored in the rate-invariant working memory.
Rate-invariant working memories can be designed from specialized versions of the on-center off-surround feedback networks that are used to solve the noise-saturation dilemma67,68,101. In other words, the networks that are used to store spatially distributed feature patterns, without a loss of sensitivity to their identity and relative size, can be specialized to store temporally distributed events, without a loss of sensitivity to their identity and temporal order. The normalization of these stored activities is the basis for their rate-invariant properties. Thus, this model predicts that a process like discounting the illuminant, in the spatial domain, uses a variant of the same mechanisms that are used to process rate-invariant speech, in the temporal domain. A key problem concerns how the rate-invariant working memory can maintain the same representation as the speech rate speeds up. The model predicts that the energy information that is computed from the speech rate and rhythm can be used to automatically gain-control the processing rate of the working memory to maintain its rate-invariant speech properties102. In particular, the rate at which the working memory stores individual events needs to keep up with the overall rate at which successive speech sounds are presented. A neural model of this process has been progressively developed to quantitatively simulate psychophysical data concerning the categorization of variable-rate speech by human subjects69,102,103, and to functionally interpret neurophysiological data that are consistent with model properties103. In this model, the working memory interacts with a categorization network via bottom-up and top-down pathways, and conscious speech is a resonant wave that emerges through these interactions.
Much experimental evidence has supported the idea that the brain is organized into processing streams, but how these streams are determined and how they interact to generate behavior is still a topic of active research. This article has summarized some of the rapidly growing empirical and theoretical evidence that our brains compute complementary operations within parallel pairs of processing streams. Table 1 summarizes some of the processes for which evidence of complementarity has been collected from behavioral and neural data and models. The variety of these behavioral processes provides some indication of the generality of this organizational principle in the brain. Interstream interactions are proposed to overcome complementary processing deficiencies within each stream. Hierarchical interactions between the several levels of each processing stream are proposed to overcome informational uncertainties that occur at individual processing stages within that stream. Hierarchical intrastream interactions and parallel interstream interactions work together to generate behavioral properties that are free from these uncertainties and complementary insufficiencies. Such complementary processing may occur on multiple scales of brain organization.
Many experimentalists have described properties of functional specialization and integration in their neural data. Some neural modelers have attempted to characterize such properties using concepts about how the brain may work to achieve information maximization. Information, as a technical concept, is well defined for stationary information channels, or channels whose statistical properties tend to persist through time. In contrast, brains self-organize on a relatively fast time scale through development and life-long learning, and do so in response to nonstationary, or rapidly changing, statistical properties of their environments. I propose that hierarchical intrastream interactions and parallel interstream interactions between complementary systems are a manifestation of this capacity for self-controlled and stable self-organization. This observation leads to my final remarks.
How do complementary sets of properties arise, rather than some other combination of properties? How are smaller-scale complementary properties organized within larger-scale complementary properties? The simplest hypothesis, for which little direct experimental evidence is yet available, is that each pair of complementary processes represents two sides of a larger brain system. Complementarity could arise if, during brain development, precursors of the larger system bifurcated into complementary streams through a process of symmetry-breaking that operates on multiple scales of organization. In this view, complementary systems are an integral part of the self-organization process that enables the brain to adapt to a rapidly changing world. This view of brain development is not in conflict with prevailing views of specific developmental mechanisms104. Rather, it points to a global organizational principle that may be capable of coordinating them.
Thus, it is proposed that the brain, like the physical world with which it interacts, is organized to obey principles of complementarity, uncertainty, and symmetry-breaking. In fact, it can be argued that known complementary properties exist because of the need to process complementary types of information in the environment. The processes that form perceptual boundaries and surfaces provide a particularly clear example of this hypothesis. The complementary brain may thus best be understood through analyses of the cycles of perception, cognition, emotion, and action that link the brain to its physical environment in a continuously operating feedback loop. One useful goal of future research may be to study more directly how complementary aspects of the physical world are translated into complementary brain designs for coping with this world.