Journal of Cognitive Neuroscience, in press
Technical Report CAS/CNS-TR-99-023
Boston: Boston University
§Supported in part by the Defense Research Projects Agency, the National Science Foundation, and the Office of Naval Research (ONR N00014-95-1-0409, NSF IRI 97-20333, and ONR N00014-95-1-0657).
Supported in part by the Defense Research Projects Agency and the Office of Naval Research (ONR N00014-94-1-0597, ONR N00014-95-1-0409, and ONR N00014-95-1-0657).
Smooth pursuit eye movements are eye rotations that are used to maintain fixation on a moving target. Such rotations complicate the interpretation of the retinal image, because they nullify the retinal motion of the target, while generating retinal motion of stationary objects in the background. This poses a problem for the oculomotor system, which must track the stabilized target image while suppressing the optokinetic reflex. That reflex would move the eye in the direction of the retinal background motion, which is opposite to the direction in which the target is moving. Similarly, the perceptual system must estimate the actual direction and speed of moving objects in spite of the confounding effects of the eye rotation. This paper proposes a neural model to account for the ability of primates to accomplish these tasks. The model simulates the neurophysiological properties of cell types found in the superior temporal sulcus of the macaque monkey, specifically the medial superior temporal (MST) region. These cells process signals related to target motion and background motion, and receive an efference copy of eye velocity during pursuit movements. The model focuses on the interactions between cells in the ventral and dorsal subdivisions of MST, which are hypothesized to process target velocity and background motion, respectively. The model shows how these signals can be combined to explain behavioral data about pursuit maintenance and perceptual data from human studies, including the Aubert-Fleischl phenomenon and the Filehne Illusion, thereby clarifying the functional significance of neurophysiological data about these MST cell properties. It is suggested that the connectivity used in the model may represent a general strategy used by the brain in analyzing the visual world.
The visual acuity of humans and other primates is marked by a central foveal region of high acuity and concentric regions of decreasing acuity. As such, it is advantageous to keep the fovea fixed on an object as it moves relative to the observer. This can be accomplished if the eye rotates at a speed equal to that of the target. Such rotations are called smooth pursuit eye movements (SPEMs). Humans can execute accurate SPEMs for target motion in excess of 30°/s (Lisberger et al., 1987). A SPEM consists of at least two phases: one related to target selection and pursuit initiation, and another related to maintenance of ongoing pursuit movements. This paper focuses on the second phase.
The maintenance of SPEMs is often characterized in terms of a negative feedback system, meaning that the oculomotor system continuously attempts to match the velocity of the eye to that of the target. However, this description is likely to be incomplete for the simple reason that a successful SPEM stabilizes the target near the fovea. As a result, there is often little or no motion of the target on the retina. Therefore, the pursuit system cannot rely on retinal target velocity to drive a SPEM. A number of additional signals have been hypothesized to guide pursuit, including target position, target acceleration (Lisberger et al., 1987), and a "memory" of target velocity (Young et al., 1968), which is often described in terms of an oculomotor efference copy of eye velocity (von Holst, 1954). An efference copy duplicates the neural signal sent to the muscles which move the eye, and as such carries information about the movement of the eye that is independent of the retinal image. The efference copy therefore maintains a prediction of pursuit velocity from moment to moment. Thus, the brain may combine retinal information about target motion or position with extraretinal information about the velocity of eye rotation.
An additional source of information relevant to pursuit maintenance is the existence of a visual background. A SPEM is typically made across a visual scene which contains stationary objects, and these objects sweep across the retinal image with a velocity opposite that of the target. This results in large-field coherent motion across the retina, which normally triggers an involuntary eye rotation known as optokinetic nystagmus (OKN). OKN causes the eye to move in the same direction as the large stimulus, so that an OKN movement to track retinal motion of a visual background during pursuit would be in the opposite direction of the ongoing pursuit movement. As such, it is crucial that optokinetic signals be suppressed during execution of a SPEM. On the other hand, such signals provide a reafferent stimulus to the visual system indicating that the eye is moving (Gibson, 1950). Therefore the visual motion of the background can provide information about the velocity of ongoing SPEMs, even when the pursuit target is relatively stable on the retina. Such a signal could also be used to generate a prediction of target velocity. The visual motion of the background therefore has contradictory effects on the pursuit system, providing a potentially useful signal for pursuit maintenance, and a potentially destructive OKN signal.
An analogous problem exists for the perceptual system. During an accurate SPEM, the retinal target motion is very small, while objects in the background move across the retina. Psychophysical experiments indicate that human subjects are able to estimate the velocity of objects during a SPEM, but that this ability is somewhat limited. Specifically, it is well-known that observers underestimate the velocity of a moving target during a SPEM when no visible background is present (the Aubert-Fleischl phenomenon: Aubert, 1886), and perceive slight motion of a stationary visual background during a SPEM (the Filehne illusion: Filehne, 1922). More generally, distinguishing between the retinal motion caused by the movement of external objects and that caused by eye rotation is of primary importance for navigation and object tracking, as evidenced by the behavioral deficits that occur when localized cortical lesions disrupt this ability (Haarmeier et al., 1997).
Neural signals related to target motion, background motion, and an oculomotor efference copy have been found in single cells in the superior temporal sulcus of monkey cortex. Within this sulcus, the middle temporal (MT) area contains cells which are selective for the direction and speed of motion. These cells can be broadly subdivided into two types, based on physiological properties and anatomical clustering (Born and Tootell, 1992). One type of cell responds best to small moving stimuli, and is suppressed by large motion patterns, while the other responds best to large stimuli moving across the receptive fields. The latter has been suggested to be useful for interpreting retinal motion of the background during self-induced motion, while the former is likely to be useful for interpreting the motion of potential pursuit targets (Born and Tootell, 1992; Eifuku and Wurtz, 1998). Area MT projects to the medial superior temporal (MST) area, which appears to be subdivided on the basis of the cell types found in MT (Tanaka et al., 1993; Berezovskii and Born, 1999). The dorsal part (MSTd) contains cells which respond best to large-field motion, while the ventral division (MSTv) responds best to small moving targets.
Lesion studies have confirmed that areas MT and MST are involved in the control of SPEMs. Lesions of MT create a retinotopic deficit in pursuit initiation, meaning that the monkey's ability to execute a SPEM is impaired when the target moves in a particular part of the visual field (Dursteler et al., 1987), irrespective of target direction. Lesions of MST create a directional deficit, impairing the animal's ability to execute a SPEM when the target moves toward the lesioned hemisphere, irrespective of position in the visual field (Dursteler and Wurtz, 1988). A similar deficit is seen for OKN movements, indicating that the two behaviors share common neural pathways. Further evidence for the role of MST in controlling SPEMs comes from studies indicating that microstimulation within MST influences the velocity of SPEMs (Komatsu and Wurtz, 1989). It is also known that cells in MST receive a signal related to the aforementioned oculomotor efference copy -- they continue to respond during a SPEM when the target is momentarily extinguished or stabilized on the retina (Newsome et al., 1988).
We have constructed a model of smooth pursuit that provides a functional interpretation and simulates properties of the MT and MST cell types described above. The model proposes a specific set of interactions among MT and MST cells encoding target and background motion, and demonstrates how the visual properties of these cells can interact with an efference copy of eye movement velocity to control SPEMs and suppress OKN. A key hypothesis is that the brain gathers information about target motion, eye speed, and background motion to drive SPEMs. Perceptual phenomena such as the Filehne illusion and the Aubert-Fleischl phenomenon are shown to be emergent properties of the MSTd and MSTv interactions that control SPEMs. A brief report of these results has appeared previously (Pack et al., 1998).
The model consists of two levels of processing. The first level contains 200 MT-like cells which have direction- and speed-selective responses to retinal image motion for stimuli within their receptive fields. Half of these cells have inhibitory surrounds, such that they are suppressed by large stimuli moving in a coherent direction. The other half have excitatory surrounds, exhibiting larger responses to larger stimulus sizes. These categories of cells have been observed in primate MT (Tanaka et al., 1986; Born & Tootell, 1992; Saito, 1993).
The two types of cells form distinct projections to the model MST, which is the second level of processing (Figure 1). Based on these projections, the model MST cells are divided into two functional classes, MSTd and MSTv. We simulate 2 cells in each class. The model MSTd cells integrate the responses of MT cells with excitatory surrounds, thereby enabling them to respond well to global motion of the background. The model MSTv cells integrate the responses of MT cells with inhibitory surrounds, thereby enabling them to respond well to discrete moving targets. The connections from MT to MSTv are weighted such that MT cells preferring higher speeds generate larger responses in the MSTv cells to which they connect. This generates a graded response to target speed in MSTv (see below). This response is used to drive pursuit eye movements, and the resulting efference copy is fed back into the model at the level of MST.
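The speed weighting of the MT-to-MSTv connections can be illustrated with a minimal sketch. The log-Gaussian tuning curve, the list of preferred speeds, and the weight rule (1 + log2 of the preferred speed) are illustrative assumptions chosen for this example; they are not the model's equations, which are given in Mathematical Methods.

```python
import math

# Hypothetical preferred speeds (deg/s) for a small bank of MT cells
# with inhibitory surrounds; these values are not taken from the paper.
PREFERRED_SPEEDS = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]


def mt_response(speed, preferred, sigma=1.0):
    """Assumed log-Gaussian speed tuning of a single MT cell."""
    d = math.log2(speed) - math.log2(preferred)
    return math.exp(-d * d / (2.0 * sigma ** 2))


def mstv_response(speed):
    """Weighted sum of MT responses.

    Cells preferring higher speeds receive larger weights
    (assumed rule: weight = 1 + log2(preferred speed)).
    """
    total = 0.0
    for preferred in PREFERRED_SPEEDS:
        weight = 1.0 + math.log2(preferred)
        total += weight * mt_response(speed, preferred)
    return total


if __name__ == "__main__":
    for s in (2.0, 4.0, 8.0, 16.0):
        print(s, round(mstv_response(s), 3))
```

Under these assumptions, the summed response grows with target speed across the middle of the tuned range, which is the graded behavior the weighting is meant to produce.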
The excitatory connections from MSTd to MSTv enable increasing relative background motion to supplement decreasing relative target motion as the speed of the eye approaches that of the target (Figure 2). Thus model cells in MSTv code the predicted target speed, rather than the speed of the target on the retina.
The full model connectivity at the level of MST is shown in Figure 3. Each MST cell can be thought of as belonging to a channel for driving eye movements in a preferred direction. Each channel contains excitatory connections for processing three different types of signals:
1) Target speed in a preferred eye movement direction: As noted above, this signal is calculated in the model MSTv. Tanaka et al. (1993) have shown a graded response to velocity in MSTv, although this study did not examine the response of these cells to eye movements. Kawano et al. (1994) have shown that MST cells that drive eye movements exhibit a response that is linear with log stimulus velocity. The model simulates these properties by assigning higher weights to the inputs from MT cells with faster preferred speeds (see Mathematical Methods). Signals related to target acceleration (Lisberger and Movshon, 1999) and position may also be present in MT or MST, but are not included in the current implementation of the model. The relationship between MST cell responses and target velocity will be examined further below.
2) Efference copy of the eye's speed in the preferred eye movement direction: This signal is fed back to MSTv and MSTd. Eye velocity signals have been observed in MSTd and MSTv (Komatsu and Wurtz, 1988). These signals appear to provide an estimate of the ongoing speed of eye movements, even when the target is momentarily extinguished or stabilized on the retina (Newsome et al., 1988).
3) Motion of the background in a direction opposite that of the preferred eye movement direction: This signal is computed in MSTd, and the magnitude of the directional response is scaled primarily by the size of the stimulus (Komatsu and Wurtz, 1988).
For example, a rightward eye movement channel consists of excitatory inputs for MSTv cells preferring rightward motion, MSTd cells preferring leftward motion, and efference copy input which is strongest for rightward eye movements. Channels for opposite eye movement directions are linked by inhibitory connections. The next section presents simulations of pursuit and MST data. The mathematical description of the model is presented at the end of the paper.
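The composition of a channel can be written down as a small sketch. The cell numbering follows the text's references to Figure 3 (cells 1 and 2 in MSTd, cells 3 and 4 in MSTv); the function name, the dictionary layout, and the weights `w_bg` and `w_eff` are hypothetical and serve only to show which signals feed which channel.

```python
# Cell numbering follows the text's references to Figure 3.
# "prefers" is the preferred direction of visual motion;
# "channel" is the eye movement direction the cell helps to drive.
CELLS = {
    1: {"area": "MSTd", "prefers": "right", "channel": "left"},
    2: {"area": "MSTd", "prefers": "left", "channel": "right"},
    3: {"area": "MSTv", "prefers": "left", "channel": "left"},
    4: {"area": "MSTv", "prefers": "right", "channel": "right"},
}


def channel_input(channel, target, background, efference, w_bg=0.5, w_eff=0.8):
    """Sum the three excitatory inputs to one eye movement channel.

    target, background, efference: dicts mapping "left"/"right" to a
    non-negative signal strength. The weights are assumed values.
    """
    opposite = "left" if channel == "right" else "right"
    return (
        target[channel]                # target motion in the channel direction
        + w_bg * background[opposite]  # background motion in the opposite direction
        + w_eff * efference[channel]   # efference copy for the channel direction
    )


if __name__ == "__main__":
    # Rightward pursuit over a textured background (made-up signal values).
    target = {"right": 0.1, "left": 0.0}
    background = {"right": 0.0, "left": 0.6}
    efference = {"right": 0.7, "left": 0.0}
    print(channel_input("right", target, background, efference))
    print(channel_input("left", target, background, efference))
```

The inhibitory link between opposite channels is left out here; the sketch covers only the excitatory inputs listed above.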
Simulation 1 -- Pursuit against a textured background
This section describes simulations of some behavioral and perceptual consequences of SPEMs made over a textured background. One of the primary difficulties encountered by the pursuit system is that of distinguishing between motion signals generated by target motion and those generated by eye rotation across a background. The goal of a SPEM is to track the motion of the target, while suppressing the optokinetic response to the retinal motion of the background. If the system tracked both types of signals indiscriminately, the eye would oscillate, endlessly tracking self-induced motion. The model proposes that this problem is solved by the brain as follows: motion of the background is computed by cells in MSTd, which excite MSTv cells that are tuned to a direction opposite their own preferred motion direction (Figure 3).
Model SPEM performance was tested for target motion across a textured background at a number of target speeds. Figure 4 depicts the results, which show that the model is capable of performing SPEMs against a textured background. In all cases, pursuit speed approaches target speed, although the gain decreases with increasing stimulus speed. Thus, the model is able to maintain an ongoing estimate of target speed despite the fact that there is little motion of the target during the SPEM. Furthermore, the model does not track the retinal motion of the background, even though it is present at the input stage (see Figure 1) during the SPEM. To clarify how this occurs, the next section examines the dynamics of individual model cells during pursuit.
Simulation 2 -- Perceptual and Pursuit Dynamics
One of the main goals of the present work is to link the neurophysiology of pursuit control to the perceptual consequences of eye movements. There exist a number of well-known perceptual illusions related to SPEMs, against which the model can be tested. In particular, a target pursued against an otherwise blank background is perceived as moving more slowly than a target moving at the same speed while the observer fixates a stationary point (Fleischl, 1882; Aubert, 1886; Gibson et al., 1957; Mack and Herman, 1978; Honda, 1990). This illusion is known as the Aubert-Fleischl effect.
The Aubert-Fleischl effect is eradicated when pursuit is made against a textured background, which generates retinal motion signals as the eye is swept across it. In this case, steady-state pursuit speed decreases (Yee et al., 1983; Collewijn and Tamminga, 1984; Barnes and Crombie, 1985; Kaufman and Abel, 1986; Ilg et al., 1993; Mohrmann and Thier, 1995; Ilg and Hoffmann, 1996; Niemann and Hoffmann, 1997), but the perceived speed of the pursuit target increases (Dichgans et al., 1969; Dodge, 1904; Gibson, 1968; Mack and Herman, 1978). Furthermore, observers perceive motion of the textured background in a direction opposite that of the ongoing SPEM. This latter effect is known as the Filehne Illusion (Filehne, 1922).
The dissociation between the perceptual and motor aspects of pursuit, as seen for pursuit against a textured background, might seem to imply the existence of separate neural mechanisms for processing of visual motion for perception and for eye movements, although Mack et al. (1982), Zivotofsky et al. (1995), and Beutter and Stone (1998) have provided evidence to the contrary. We now demonstrate how these disparate effects can be explained as emergent properties of a single MST circuit for controlling SPEMs. The behavioral linking hypotheses that support these explanations are that the perceived leftward and rightward speeds are proportional to the activities of MSTv cells 3 and 4, respectively, whereas the actual pursuit speed is proportional to their time-averaged difference p in equation (1) (Mathematical Methods).
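The two linking hypotheses can be made concrete with a short sketch. Equation (1) is not reproduced in this section, so the time average is approximated here by a first-order leaky integrator; the function name, the time constant `tau`, the step `dt`, and all activity values are assumptions made for illustration only.

```python
def pursuit_command(right_activity, left_activity, dt=0.001, tau=0.05):
    """Leaky time average of the difference between MSTv activities.

    right_activity, left_activity: sequences of activity samples for the
    rightward (cell 4) and leftward (cell 3) MSTv cells.
    Returns the trace of the pursuit command p (positive = rightward).
    The first-order filter is an assumed stand-in for equation (1).
    """
    p = 0.0
    trace = []
    for r, l in zip(right_activity, left_activity):
        p += (dt / tau) * ((r - l) - p)
        trace.append(p)
    return trace


if __name__ == "__main__":
    n = 1000
    # Made-up steady-state activities for rightward pursuit.
    blank = pursuit_command([0.80] * n, [0.00] * n)
    textured = pursuit_command([0.85] * n, [0.15] * n)
    # Perceived rightward speed follows cell 4 alone (0.85 vs 0.80),
    # while the pursuit command follows the difference.
    print(blank[-1], textured[-1])
```

With these illustrative numbers, the textured background raises the activity of the rightward cell, and hence the perceived target speed, while lowering the pursuit command, because the leftward cell is also active.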
The first simulation compares the output of cell 4, an MSTv cell in the rightward eye movement channel (see Figure 3), across a number of different pursuit conditions. Figure 5 shows the output of this cell during pursuit against a blank background (thick solid line), pursuit against a 60° x 60° textured background (thin solid line), and the average output during stimulus motion with no eye movements (dashed line). In all cases, the speed of target motion is 8°/sec. The fact that the activity in cell 4 during pursuit is lower than during fixation for the same target speed can be considered an analogue of the Aubert-Fleischl illusion. That is, the model calculates the speed of the moving target to be less during pursuit than during fixation. This effect disappears for pursuit against a background, as it does in human observers. This is shown by the thin solid line in Figure 5.
It is possible to derive an explanation for these effects in terms of the relative weighting of the three pursuit signals (target motion, efference copy, and background motion) used by the model. The target motion alone drives the model cell 4 most strongly, as evidenced by the dashed line in Figure 5. During a SPEM against a blank background, there is very little target motion, so the remaining signal is primarily due to the efference copy (Figure 1). The fact that the output of cell 4 is lower in this case indicates that the efference copy slightly underestimates pursuit speed, as is often suggested (see Wertheim, 1994, for a review). During pursuit against a textured background, the background motion computed by MSTd cell 2 provides further excitatory input to MSTv cell 4, resulting in a slightly higher output. This corresponds to the perceived increase in speed during pursuit against a textured background.
The retinal motion resulting from rightward pursuit against a textured background is also registered by the model MSTv cells that form the leftward pursuit channel. In this sense, the model experiences an analogue of the Filehne Illusion. Figure 6 shows the activity in model cell 3 (see Figure 2), which is an MSTv cell encoding leftward motion, during the same conditions for rightward pursuit over a textured background as in the previous simulation. Note that there is a low level of activity in this cell, consistent with the observation that during the Filehne Illusion, the perceived speed of the background is low relative to its motion across the retina (Mack and Herman, 1973; 1978). The reason for this is that there are inhibitory connections between cells 3 and 4, so that the combination of large-field background motion and efference copy in cell 4 inhibits the output of cell 3. It is interesting to note that increasing the size of the visible background further decreases the gain of the Filehne Illusion (Turano and Heidenreich, 1999), presumably by increasing this inhibition.
The residual background motion encoded by the leftward motion channel during rightward pursuit has an effect on the commanded pursuit speed, as defined by p in equation (1) of the Mathematical Methods section. A key hypothesis embodied within the circuit in Figure 3 is that the two pursuit channels simultaneously attempt to drive the eye in opposite directions. Therefore the activity described in Figure 6 during a rightward SPEM should decrease the commanded speed of the eye movement. Figure 7 depicts the pursuit speeds controlled by p for the same conditions as in the previous simulations of MSTv cells, for pursuit against a blank and a textured background. When a pursuit eye movement is made against a textured background, the pursuit speed is smaller than when the eye movement is made in the dark, as is observed in human subjects. This effect can be attributed directly to the retinal motion of the background. Since the eye movement speed is affected by the difference of the activities in each channel, the motion of the background creates a leftward eye movement signal, causing a slowing of pursuit in the rightward direction.
The reduced activity in cell 3 also suggests a possible mechanism underlying suppression of optokinetic nystagmus (OKN) during maintenance of SPEMs. Lesion studies have demonstrated that MST plays a role in initiating and maintaining OKN movements, and support the strong possibility that SPEMs and OKN share common neural pathways (see Dursteler and Wurtz, 1988 for a review). Competition between pursuit channels encoding opposite directions of eye movement allows the model cells to suppress stimuli that would normally trigger a disruptive movement to track the retinal motion of the background. This mechanism would appear to be sufficient to suppress at least the cortical portion of the OKN response, and may reflect a general strategy used by the oculomotor system, since similar cell types are found in a number of subcortical structures (see Discussion).
These simulations suggest a possible explanation for differences found psychophysically between perceived speed and pursuit speed. In the model, the quantity that corresponds best with perceived speed is the activity of the MSTv cells that generate the eye movement command. This activity is derived from a combination of visual and efference copy signals. The actual pursuit speed is a result of competition between MSTv cells that encode opposite directions of motion, as in equation (1) (Mathematical Methods). This competition serves to suppress the representation of background motion that would otherwise cause an OKN movement.
Simulation 3 -- Electrical Stimulation in MST
A key issue in understanding the cortical control of SPEMs is the nature of the representation of target motion in MST. It is known that target motion is represented in MST by both visual and efference copy signals, and a number of studies have indicated that MST cell responses are linear with log target velocity for a range of velocities consistent with SPEMs. This has been demonstrated for MST responses to visual motion stimuli (Tanaka et al., 1993; Kawano et al., 1994), and for efference copy signals when retinal stimulation is limited or absent (Sakata et al., 1983; Komatsu and Wurtz, 1988). The present model uses a simple weighting scheme (see Methods) to transform the known velocity tuning in MT into a graded response to log target velocity.
These data indicate that MST cells maintain a representation of target velocity. If this is true, then increasing the activity of MST cells during a SPEM should increase the velocity of eye movement. Komatsu and Wurtz (1989) verified this by introducing electrical stimulation into MSTv cells while monkeys made SPEMs in response to target motion at different speeds. Importantly, the effect of this stimulation was lateralized, such that stimulating in MSTv increased pursuit velocity on the ipsiversive side, and decreased velocity on the contraversive side. That is, stimulation in MSTv biased pursuit velocity towards the stimulated hemisphere. The magnitude of this bias was related to the magnitude of the stimulation current.
The lateralization of eye movement commands in MST makes it possible to study stimulation effects in the model. As described in the Methods section, the model cells are organized into two competing channels, each of which drives pursuit in the opposite direction. Similarly, MSTv appears to have an anatomical specialization for pursuit, with each hemisphere driving pursuit in the ipsiversive direction. We therefore conceptualized the effect of electrical stimulation as an increased input to the model MSTv cells that drive SPEMs in a preferred direction.
Figure 8a shows the effect of introducing stimulation with S3 = 0.8 into the leftward channel (see I3 in equations (12) and (14)) during rightward pursuit of a 10° target moving against a dark background at 22°/sec. The value of S was chosen because it gave a reasonable quantitative fit to the data, although the qualitative effect was robust across choices of stimulation level. The stimulation slows pursuit substantially. Figure 8b shows that the same stimulation of the rightward channel during rightward pursuit causes a slight increase in pursuit speed. This effect can be quantified across speeds by measuring the difference between the average pursuit speed with and without stimulation. The simulation results are shown in Figure 9b, along with the experimental results from Komatsu and Wurtz (1989) in Figure 9a.
The demonstration of lateral specialization for horizontal pursuit raises the question of how the cortex controls vertical pursuit. Komatsu and Wurtz (1989) found that stimulation of MSTv in either hemisphere caused a slowing of pursuit in either of the two vertical directions. Thus, there does not appear to be a cortical lateralization for vertical SPEMs. We verified this effect in the model by redefining the direction selectivity of the model cells to be downward and upward, rather than leftward and rightward. Since there does not appear to be lateral specialization for vertical SPEMs, we modeled the effect of stimulation of each hemisphere as an increased input to both upward and downward cells. That is, we assume that stimulation in MST activates an approximately equal number of cells specialized for upward and for downward pursuit, as suggested by Komatsu and Wurtz (1989). We simulated this effect in the model with S3 = S4 = 1.2, and measured the velocity difference with and without stimulation, as in the previous simulation. Again the value of S was chosen rather arbitrarily. The results, shown in Figure 10, indicate that stimulation of both channels produced a decrease in pursuit velocity. The effect was strongest at high speeds, and almost entirely absent at low speeds, as found by Komatsu and Wurtz (1989).
These results address the nature of the signal that drives eye movements. Komatsu and Wurtz (1989) found that microstimulation in MSTv produced three effects. First, there was a decrease in pursuit velocity for contraversive stimulation, and this decrease was larger at greater pursuit speeds. Second, there was an increase in pursuit velocity for ipsiversive stimulation that was largely unrelated to pursuit velocity before stimulation. Third, there was a decrease in pursuit velocity for microstimulation during vertical pursuit, and the decrease was larger at greater pursuit speeds. The model simulates all three results.
The model explanation for the effects of electrical stimulation is based on two properties. The first is the aforementioned encoding of target velocity by opponent pairs of model MSTv cells. The second property is the observation that the responses of individual cells cannot reach infinity, but are limited by a saturating nonlinearity (see equations (10-13)). As a result, the additive effect of ipsiversive stimulation is greater when the cell is responding at a low level (at low target velocities) than when the cell is responding strongly (to high target velocities). For contraversive stimulation, the increased activity results in inhibition of the channel driving the ongoing SPEM. Since the model MSTv cells encode log velocity (see Methods), a unit decrease in activity causes a greater loss of speed when the ongoing pursuit speed is greater. Likewise, for stimulation of both channels during vertical pursuit, the effect is stronger on the pursuit channel encoding the direction opposite the ongoing SPEM, because its response level before the stimulation is lower. The result is the observed decrease in pursuit velocity.
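Both properties can be checked with a short worked calculation. Equations (10-13) are not reproduced in this section, so the saturating function f below is an illustrative stand-in rather than the model's own nonlinearity; the symbols I (net input before stimulation), k (gain of the log code), v (pursuit speed), and \delta (loss of activity) are likewise introduced only for this sketch, while S denotes the stimulation input as in the text.

```latex
% Assumed saturating response (illustrative, not equations (10-13)):
f(I) = \frac{I}{1 + I}, \qquad I \ge 0 .

% Added activity produced by a stimulation input S > 0:
\Delta x = f(I + S) - f(I) = \frac{S}{(1 + I)(1 + I + S)} ,

% which shrinks as I grows: the same S adds less activity to a cell
% that is already responding strongly (high target velocity).

% Assumed log-velocity code in the channel driving pursuit:
x = k \log v \quad\Longrightarrow\quad v = e^{x / k} .

% A loss of activity \delta > 0 (from opponent inhibition) changes speed by
\Delta v = e^{(x - \delta)/k} - e^{x/k} = -\, v \left( 1 - e^{-\delta / k} \right) ,

% so the loss of speed is proportional to the ongoing pursuit speed v.
```

Under these assumptions, ipsiversive stimulation adds more activity at low target velocities than at high ones, and a fixed loss of activity in the driving channel removes a fixed fraction of pursuit speed, so the absolute slowing grows with pursuit speed.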
One prediction of the model is that electrical stimulation of MSTd cells should evoke an increase in pursuit velocity in the direction opposite the preferred direction of the stimulated cell. For example, stimulation of an MSTd cell preferring rightward motion (e.g., cell 1 in Figure 3) should increase leftward pursuit velocity by exciting MSTv cells that code leftward motion (e.g., cell 3 in Figure 3). This effect has indeed been observed at the level of MT: stimulation of MT cells with excitatory surrounds (MT+ cells in the model) increases pursuit velocity in a direction opposite that of the preferred direction of the cell (Born et al., 2000). Komatsu and Wurtz (1989) did not observe directional effects in their stimulation of MSTd in different hemispheres, but it may be that MSTd lacks the lateral specialization for pursuit found in MSTv.
Simulation 4 -- Neurophysiology
MST eye movement cells exhibit a number of intriguing physiological properties. These cells appear to receive an eye movement efference copy, since they respond to ongoing SPEMs, even when the pursuit target is blinked off or stabilized on the retina (Newsome et al., 1988). More interesting from the standpoint of the current model are the cells' visual properties. The majority of MST eye movement cells respond in a directionally-selective manner to moving dot fields, and the preferred direction depends in a complex way on the size and speed of the dot field. In general, MST cells respond preferentially to small fields of motion in the same direction as pursuit (called the forward response), and to large fields of motion in the opposite direction from pursuit (called the reversed response). This reversal in direction selectivity is primarily dependent on the size of the stimulus, although it is also affected by the speed of the stimulus. High speeds increase the magnitude of the forward response, while slow speeds increase the magnitude of the reversed response. As a result, higher speeds are required to cause a reversal in direction selectivity for larger stimulus sizes. These studies were conducted while the monkey fixated a stationary point, so that they can be attributed entirely to visual properties of the cells. Cells of this type were found in both MSTd and MSTv.
To simulate these effects in the model, the input was a square motion field defined to stimulate a region between 10° and 80° in width and height, centered on the fovea. The speed was either 14°/sec or 28°/sec, as specified by Komatsu and Wurtz (1988). Figure 11 shows that the model provides a good qualitative fit to the MST eye movement cell data. We found that the model reproduced these results in a manner that was robust to parameter choices. The parameters used in Figure 11 (C = 0.02, F = 5.0, J = 40.0, M = 0.15) provided the best fit to the data, although they were slightly different from those used in previous simulations.
The model explanation of these cell properties is based on the connectivity described in Figure 3. The increasing response to increasing stimulus size in the reversed direction is due to spatial summation in area MSTd, which is represented by cells 1 and 2 in the model (see Figure 3). MSTv cells (cells 3 and 4) in the model generate the forward response. As the stimulus size increases, activity in MSTd cells in the opponent channel increases. Because of inhibition between the channels, the MSTv activity decreases, causing the suppression of the forward response for large stimuli. As stimulus speed is increased, activity in MSTv cells increases during the forward response. Greater inhibition from the opponent pursuit channel is required to suppress the response at high speeds. Since the inhibition results from MSTd cells in the opponent channel, this requires a larger stimulus size.
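The interplay of stimulus size and speed in this explanation can be sketched numerically. Every function and constant below is a hypothetical stand-in (a saturating size response for MSTd, a log-like speed response for MSTv, and a single inhibitory weight); none of them is taken from the model equations or from the parameters used for Figure 11.

```python
import math


def mstd_drive(size_deg, half_size=30.0):
    """Assumed saturating spatial summation in MSTd (grows with stimulus size)."""
    return size_deg / (size_deg + half_size)


def mstv_drive(speed_deg_s, gain=0.2):
    """Assumed graded MSTv response that grows with log stimulus speed."""
    return gain * math.log2(1.0 + speed_deg_s)


def net_forward_response(size_deg, speed_deg_s, w_inhib=1.5):
    """MSTv forward drive minus inhibition relayed from the opponent channel."""
    return mstv_drive(speed_deg_s) - w_inhib * mstd_drive(size_deg)


def reversal_size(speed_deg_s, sizes=range(10, 81, 5)):
    """Smallest tested size (deg) at which the forward response is suppressed."""
    for size in sizes:
        if net_forward_response(size, speed_deg_s) < 0.0:
            return size
    return None


if __name__ == "__main__":
    for speed in (14.0, 28.0):
        print(speed, reversal_size(speed))
```

In this sketch the faster stimulus needs a larger field before the opponent channel suppresses the forward response, which mirrors the qualitative argument of the preceding paragraph.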
The model described in this paper focuses on how cortical area MST controls behavioral properties of smooth pursuit maintenance. While MST has been directly implicated in pursuit control, it has been largely ignored by previous models. The current work demonstrates that three types of information -- target velocity, an oculomotor efference copy, and retinal background motion -- can be combined in physiologically realistic ways to generate many of the behavioral and perceptual effects that have been observed during SPEMs. The behavioral effects include suppression of the optokinetic response, slowing of pursuit against a textured background and the effects of electrical stimulation on pursuit maintenance. The perceptual effects include the Filehne Illusion (Filehne, 1922) and the Aubert-Fleischl effect (Fleischl, 1882; Aubert, 1886; Mack and Herman, 1978; Honda, 1990). Furthermore, the connections among the model cell types provide the first functional explanation for the peculiar properties of MST eye movement cells (Komatsu and Wurtz, 1988) in response to visual stimuli of varying sizes and speeds.
The presence of an eye movement signal in MST can be considered as evidence for an ongoing coordinate transformation of visual information. While visual information is processed in retinal coordinates in V1 and MT, it could be argued that MST has access to a head-centered representation of target motion, since motion-selective cells respond even when SPEMs nullify the retinal target motion (Newsome et al., 1988). This head-centered representation can be converted to a body-centered representation if head-rotation signals are taken into account. This would be useful for estimating target motion during combined rotations of the eye and head. Evidence for vestibular signals related to head rotation in MST has been presented by a number of researchers (Kawano et al., 1984; Thier and Erickson, 1992). These signals could be used in a manner identical to that of the oculomotor efference copy in the present model: to maintain a representation of target velocity during head or body rotations, and to nullify the resulting retinal motion of the background. The latter function has been observed in a population of cells in MSTd (Shenoy et al., 1999).
A similar type of computation appears to be made by the "passive-only" cells found in MST. Erickson and Thier (1992) observed that a substantial number of MST cells responded in a direction-selective manner to stimuli moving across their receptive fields during fixation of a stationary point. When the same retinal stimulus was generated by moving the eye across a stationary stimulus, the cells lost their direction selectivity. This effect was not observed in V4 or MT. Thus, MST appears to maintain a representation of stimulus motion or stationarity that is largely independent of retinal stimulation. The current model provides a possible explanation for this "passive-only" property, wherein the suppression of visual stimulation results from an indirect inhibitory input derived from an oculomotor efference copy. This model property is described in the simulation of the Filehne Illusion (Simulation 2). Erickson and Thier (1992) also suggested a role for MST cells in mediating the Filehne Illusion.
Many studies have focused on the selectivity of MSTd cells for optic flow stimuli consisting of expanding, rotating, and spiral motions (e.g., Graziano et al., 1994). We have suggested in previous work (Grossberg et al., 1999) that MSTd cells may actually be tuned for these types of optic flow with respect to the fovea, and shown computer simulations to support this idea. Such selectivity could be used to guide vergence and torsional eye movements. This would be consistent with a general role for MST in maintaining fixation on an object as it moves relative to the observer in three dimensions.
There is substantial evidence to suggest that SPEMs are dependent on the calculation of retinal disparity. In infants, the onset of binocular vision and symmetric optokinetic nystagmus (the ability to make tracking eye movements in leftward and rightward directions) are correlated within individuals (Wattam-Bell et al., 1987). Symmetric OKN does not develop when binocular vision is disrupted early in life (Kiorpes et al., 1996; Westall et al., 1998). In adults, pursuit accuracy is impaired by textured backgrounds only if they are in the plane of fixation, and not when they are moved out of the plane of fixation (Howard and Marton, 1992). Neurophysiological studies have shown that the responses of nearly all MSTd cells increase for stimuli moved out of the plane of fixation (Roy et al., 1992), while no such bias exists in MSTv (Eifuku and Wurtz, 1997). This observation fits well with the current model's suggested roles for MSTd and MSTv cells, since stimulus motion outside the plane of fixation is likely to be interpreted as retinal motion of the background, rather than being used to initiate pursuit (Howard and Simpson, 1990). Earlier modeling work (Chey et al., 1997; 1998; Grossberg and Rudd, 1992) suggested how MT cells become disparity-selective, as well as direction- and speed-selective. Thus interaction between MSTv and MSTd can be interpreted to link cells that represent the fixation plane and backgrounds at further depths, respectively. A sensible elaboration of the current model would be to assign a preference for motion outside the fixation plane to MSTd cells, and a preference for motion near the fixation plane to MSTv cells.
An open question regards the origin of the oculomotor input to MST cells. It is often assumed that this signal is a corollary discharge related to the brainstem signals used to generate the SPEM. Such a signal could be relayed through the pulvinar or superior colliculus to the superior temporal sulcus (Sakata et al., 1980). Alternatively, this signal may not be directly derived from brainstem signals, but may reflect a more cognitive "memory" of target velocity, possibly generated in the frontal eye fields or premotor areas. Because of the reciprocal connections among these areas, the exact nature of the efference copy signal will be difficult to determine.
MST eye movement cells output to subcortical regions which relay pursuit signals to the oculomotor centers in the brainstem. The most direct connection of MST eye movement cells is the dorsolateral pontine nucleus (DLPN). Kawano et al. (1992) found two main cell types in DLPN which correspond well with the hypothesized roles for MST cells. One type of cell responds to large-field motion in the opposite direction of its preferred response for eye movement. The other type responds to motion of a small target in the same direction as pursuit. These cells have also been found in the nucleus of the optic tract (Mustari and Fuchs, 1990; Ilg and Hoffmann, 1996) to which the DLPN projects. Thus, the functional distinction between background and target motion can be seen at anatomical levels throughout the pursuit pathway.
Komatsu and Wurtz (1988) and Wurtz et al. (1990) first pointed out that MST eye movement cell responses are related to a perceptual effect known as induced motion. Induced motion is the perceived motion of a target in a direction opposite that of a moving background. This effect is analogous to the reversed response seen in MST cells (see simulation 5 - Neurophysiology). Simulation 2 showed that this reversed response could help to account for the Filehne Illusion.
The reversed response in the model is generated by spatial integration of background motion in MSTd. Thus, following the analogy suggested by Komatsu and Wurtz (1988), it is likely that the induced motion effect should be similar to the reversed responses seen in MST. If so, induced motion should be stronger for larger stimulus sizes. This has been verified experimentally by Pack and Mingolla (1998), who also found that the magnitude of the induced motion effect saturated near 20°/sec. This correlates well with MST data from Komatsu and Wurtz (1988) and Tanaka et al. (1993) suggesting that the reversed responses in MST cells saturate near the same speed.
Another perceptual correlate that is found in the response patterns of MST eye movement cells is the tendency to perceive large objects as moving more slowly than small objects, even when the actual velocity is the same (Brown, 1930; Snowden, 1996). This is reflected in the outputs of MSTv cells, which decrease for large stimulus sizes (Figure 11, dotted lines), although it remains to be seen whether perceptual responses exhibit the same dependence on stimulus velocity as that found for MST cell responses.
Previous models of MST have been primarily concerned with computing the parameters of self-motion from optic flow (Lappe and Rauschecker, 1993; Perrone and Stone, 1994). This problem is undoubtedly related to the control of SPEMs, since it is useful for the oculomotor system to know in which direction the body is moving in order to maintain fixation on a stationary target. Similarly, an eye movement efference copy is useful for heading perception, as it allows the observer to identify the portion of the optic flow field that is due to eye rotation (Royden et al., 1994).
Previous pursuit models have characterized the control of SPEMs as an engineering problem, and have been successful in characterizing pursuit behavior at very short time scales (Krauzlis and Lisberger, 1989; Ringach, 1995). Notably, Lisberger et al. (1987) have hypothesized that cells in the pons, cerebellum, brainstem, and some areas of visual cortex are involved in converting visual information about target motion into oculomotor signals. Their model has been successful in linking some behavioral observations on SPEMs to neurophysiological substrates. However, as the authors note, "we have a few parts left over that must serve some function. The `spare parts' include . . . cortical area MST . . ." (p. 124). The current model suggests a possible use for the MST in generating behavioral and perceptual consequences of pursuit.
An active area of enquiry has been the study of the representation of target motion in the visual cortical pathways. Most models, including the current model, have focused on the control of pursuit using target velocity. However, experimental (Krauzlis and Lisberger, 1987) and theoretical (Krauzlis and Lisberger, 1989; Pola and Wyatt, 1989; Ringach, 1995) results indicate that a signal encoding target acceleration is useful for stable pursuit. A recent study by Lisberger and Movshon (1999) has shown that target acceleration is represented in the early responses of some MT cells. This observation makes it possible to extract acceleration estimates from the transient activity of MT cells, and to use these estimates in the initiation of SPEMs. Lisberger and Movshon (1999) use a weighted representation of sustained MT cell activities to obtain velocity estimates, as does the current model. Chey, Grossberg, and Mingolla (1997, 1998) model how such a weighted velocity representation can be derived from retinal output signals. These models do not, however, address the role of MT and MST cells in perception of object and background motion during SPEMs.
Models of perception during eye movements have generally not attempted to link perception to cortical processes. The Post and Leibowitz (1985) model hypothesizes that the pursuit and optokinetic systems comprise different systems, and that perceptual illusions such as induced motion result from a voluntary effort to suppress the optokinetic response during fixation of a stationary target. The current model is generally in agreement with this proposition, and it takes the important further step of casting it in a quantitative form and using it to explain known properties of MST cells.
The use of a signal related to retinal motion of the background is akin to Wertheim's (1994) concept of a "reference signal." Wertheim (1994) hypothesized that large-field motion of the background stimulates cells in the vestibular and accessory optic areas. This signal was proposed to compensate for the underestimation of pursuit velocity by the oculomotor efference copy. The current model uses a similar mechanism, although the hypothesized neural substrate is in the cortex. An interesting possibility is that the neural circuit described in this paper is replicated at cortical and subcortical levels. The evidence for this is described above. The Wertheim (1994) model and the model of Post and Leibowitz (1985) did not, however, attempt to link pursuit perception to pursuit control.
The current model's hypothesis of excitatory connections between cells that encode opposite directions is similar to that of structure-from-motion models (Nawrot and Blake, 1989; Andersen et al., 1996). These models also suggest that inhibitory connections between MT cells with preferences for similar disparities and opposite motion directions can allow percepts of rotation-in-depth and transparent motion. The current model hypothesizes a similar connectivity in MST, based on the distinct properties of MSTv and MSTd cells and behavioral observations on pursuit (see Retinal Disparity above). Similar concepts have been used to explain data about 3-D form perception and figure-ground perception (Grossberg, 1987, 1994; Grossberg and McLoughlin, 1997). Thus, it seems possible that this organization reflects a general strategy used by the brain in analyzing both form and motion.
The model is designed to capture key aspects of the interactions of visual signals in MT and MST with eye movement signals in MST. This requires that the model simulate pursuit eye movements, but it is important to keep in mind that the model is not designed to capture the details of pursuit at very short time scales, a problem upon which other models have focused successfully (Robinson et al., 1986; Krauzlis and Lisberger, 1989) by incorporating realistic delays in sensory processing, and information about target acceleration and position. Instead, the current model attempts to link the neurophysiology of MST to behavioral observations on SPEMs, using the first-order dynamics of a simple tracking system.
The pursuit speed p is thus quantified in a simple way according to the equation:
where x3 and x4 are the output of model MSTv cells driving pursuit to the left and right, respectively (see Figure 3). The dynamics of these cells are described in detail below. The parameter S is a "switch" which has been described elsewhere as a neural correlate of the conscious decision to pursue a target (Goldreich et al., 1992). A thorough examination of this switch is beyond the scope of this paper. We set S = 1 during pursuit simulations, and S = 0 during simulations which required fixation of a stationary target. Positive values of p are interpreted as rightward pursuit, and negative values are interpreted as leftward pursuit. The system is limited to one dimension to reduce computational complexity, although a generalization to two dimensions could be made using the same model ideas.
The visual input to the model is a field of motion vectors v(x,y) which describe the speed of the motion vector at each point (x,y) on the retina. The values of x and y are constrained to be in the normalized range [-1,1], as are the velocities v(x,y). The input is described in terms of a square target object of length and width r moving horizontally across the visual field. The center of the object is given by x0 and the velocity by v0. Therefore
In order to do quantitative data simulations, a conversion factor is determined for assigning units to the model inputs and outputs. The spatial dimensions [-1,1] were scaled by a factor of 50 degrees to provide a visual field of 100° x 100°. This applies to sizes of backgrounds and cell receptive fields. Speed was scaled from the range v ∈ (-1,1) to 2^(10|v|) degrees per second, to facilitate comparison with MT cells, which are tuned across octaves of speeds (Rodman and Albright, 1987). This tuning is described in the next section.
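The unit conversion can be illustrated with a short Python sketch. It reads the speed scaling as 2 raised to the power 10|v|, a reading that agrees with the later statement that a model speed of 0.5 corresponds to 32°/sec. The function names and keyword arguments are hypothetical and are not taken from the original implementation.

```python
import math


def model_to_degrees(x, half_field_deg=50.0):
    """Convert a normalized spatial coordinate in [-1, 1] to degrees of visual angle."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("normalized coordinate must lie in [-1, 1]")
    return half_field_deg * x


def model_to_speed(v, octaves=10.0):
    """Convert a normalized velocity to speed in deg/sec, assuming 2**(10*|v|)."""
    if not -1.0 <= v <= 1.0:
        raise ValueError("normalized velocity must lie in [-1, 1]")
    # Under this reading, v = 0 maps to 1 deg/sec rather than 0 deg/sec.
    return 2.0 ** (octaves * abs(v))


def speed_to_model(speed_deg_per_sec, octaves=10.0):
    """Inverse mapping: speed in deg/sec to the magnitude of normalized velocity."""
    if speed_deg_per_sec < 1.0:
        raise ValueError("speeds below 1 deg/sec fall outside the assumed mapping")
    return math.log2(speed_deg_per_sec) / octaves
```

Because the mapping is exponential, equal steps in model speed correspond to equal numbers of octaves in physical speed, which is what allows Gaussian speed tuning in model units to be read as tuning across octaves.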
The cells representing the input to the MST circuit are modeled after cell types found in the middle temporal (MT) area, which projects heavily to MST (Maunsell and Van Essen, 1983a). MT cells represent local motion speed and direction in two different ways (Allman et al., 1985; Tanaka et al., 1986; Born and Tootell, 1992; Saito, 1993). Cells in one group (MT-) respond well to small stimuli moving in their receptive field centers, but are inhibited by large stimuli which extend into the receptive field surrounds. A second group of cells (MT+) shows increasing responses to larger stimulus sizes, indicative of spatial summation. Born and Tootell (1992) showed that these two cell types are clustered anatomically in MT. Since the current model is designed primarily to examine MST cells, every effort was made to model MT functionality in the simplest possible way.
We simulated 200 model MT cells, each of which had a preferred direction and speed. The receptive field center (i,j) for each cell was constrained to lie in the coordinate system defined by points (x,y). The direction preferences are limited to left or right to simplify the simulations. The speed tuning of a cell at position (i,j) is defined by a Gaussian centered on a preferred speed s_ij. The characterization of input speeds allows speed tuning to be defined across octaves of speed, as has been shown for MT cells (Maunsell and Van Essen, 1983b; Rodman and Albright, 1987). The width of each receptive field is a function of its position in the visual field, such that the diameter h of a cell at position (i,j) is scaled by:
For each MT cell, the total response to a motion stimulus was characterized in terms of the center-surround structure of MT cells (Born and Tootell, 1992). For a cell at position (i,j), the response in the center is calculated by summing the speed vectors v(x,y) in the preferred direction within the receptive field:
Setting H = 25 in equation (5) gives a receptive field size (square root of area) equal to about 70% of eccentricity, which is consistent with neurophysiological measurements (e.g., Ferrera and Lisberger, 1997). The multiplicative relationship between the tuning for speed and tuning for space (receptive field position) ensures that neither stimulus feature alone is sufficient to elicit a response from the cell.
The cells also receive stimulation from receptive field surround regions that are chosen to be 5 times the size of the centers, which is consistent with the findings of Allman et al. (1985). The portion of the response due to the receptive field surround is given by:
MT center-surround structure is thereby described as a difference of Gaussians, as in other work (Murakami and Shimojo, 1993; Raiguel et al., 1995). We set the value of G = 0.1, which is consistent with the data of Rodman and Albright (1987).
The distribution of cell types was constrained by physiological data. MT+ and MT- cells are found in approximately equal quantities (Born and Tootell, 1992), so half the model cells were assigned to each category. The distribution of direction preferences was also equally balanced between left and right. The receptive field locations were primarily constrained to be near the fovea. Each model MT cell was assigned a position at a distance d from the fovea with probability exp(-Pd^2), where P determined the distribution of cell positions. This yielded a unimodal distribution centered on d = 0.0. Setting P = 9.0 yielded a distribution that was qualitatively similar to that found for MT by Tanaka et al. (1993).
The distribution of speed preferences was constrained by a unimodal distribution, centered on 32°/sec, as found by Maunsell and Van Essen (1983b). This was given by exp(-Q(0.5-s)^2). Recall that the speed s = 0.5 in the units used by the model corresponds to 32°/sec (following the conversion factor 2^(10|s|)). Setting Q = 10 yielded a distribution of speed preferences comparable to that found by Maunsell and Van Essen (1983b).
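One way to draw a model MT population from these two distributions is rejection sampling, sketched below in Python. The sketch reads the two weighting functions as exp(-P d squared) and exp(-Q (0.5 - s) squared), and it assumes that both the distance d and the preferred speed s are drawn from the normalized range [0, 1]. The sampling method, the alternating assignment of cell type and direction, the random seed, and all names are illustrative assumptions and are not details reported for the original simulations.

```python
import math
import random


def sample_with_weight(weight, rng, lo=0.0, hi=1.0):
    """Rejection-sample a value in [lo, hi]; weight(value) must lie in [0, 1]."""
    while True:
        candidate = rng.uniform(lo, hi)
        if rng.random() < weight(candidate):
            return candidate


def make_mt_population(n_cells=200, P=9.0, Q=10.0, seed=0):
    """Build a list of model MT cells with balanced types and directions."""
    rng = random.Random(seed)
    cells = []
    for k in range(n_cells):
        # Distance from the fovea, weighted by exp(-P * d^2) (peak at d = 0).
        d = sample_with_weight(lambda u: math.exp(-P * u * u), rng)
        # Preferred speed, weighted by exp(-Q * (0.5 - s)^2) (peak at s = 0.5).
        s = sample_with_weight(lambda u: math.exp(-Q * (0.5 - u) ** 2), rng)
        cells.append({
            "eccentricity": d,
            "preferred_speed": s,
            # Alternate types and directions so each category gets half the cells.
            "kind": "MT+" if k % 2 == 0 else "MT-",
            "direction": "left" if (k // 2) % 2 == 0 else "right",
        })
    return cells
```

Alternating the labels keeps the two cell types and the two direction preferences exactly balanced, which matches the equal proportions described in the text; a random assignment would only balance them on average.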
The two cell types found in MT appear to have separate projections to the two subdivisions of MST. This has been demonstrated for the owl monkey (Berezovskii and Born, 1999), although it has yet to be shown for the macaque monkey, on which most physiological studies have been conducted. MSTv cells have response properties that are similar to those of MT- cells (Tanaka et al., 1993), and MSTd cells have similar properties to MT+ cells (Duffy and Wurtz, 1991). This distinction was mirrored in the model connections between MT and MST, with model MT- cells projecting to MSTv, and model MT+ cells projecting to MSTd.
The model simulates the activities of 2 MSTv cells and 2 MSTd cells, with the connectivity described below. The activities for MSTd cells (1 and 2 in Figure 3) are defined by membrane equations (Hodgkin, 1964; Grossberg, 1973); namely:
where C and F are parameters which reflect the relative strength of the excitatory and inhibitory connections. The terms -xi, i=1,2, define passive decay to an equilibrium potential that is scaled to zero. The expressions (1-xi) in the excitatory input terms shunt the response of each cell so that it remains below the normalized maximum output of 1. The terms -Fx1x2 describe inhibition by x2 shunted by x1, as well as inhibition by x1 shunted by x2. The MSTd cells integrate over all MT cells in the visual field using input terms having the same left/right direction preference. They are therefore selective to coherent motion, as are many MSTd cells (Duffy and Wurtz, 1997).
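The qualitative behavior of this pair of cells can be explored with a minimal Python sketch. It is not a transcription of the model equations: it assumes only the three ingredients named in the text (passive decay, an excitatory input shunted by one minus the activity and scaled by C, and mutual inhibition of strength F), and it replaces the summed MT input to each cell with a single hypothetical number, e1 or e2. The forward Euler relaxation, step size, and function names are likewise assumptions made for illustration.

```python
def mstd_derivatives(x1, x2, e1, e2, C=0.05, F=1.0):
    """Assumed shunting form: decay + shunted excitation - mutual inhibition."""
    dx1 = -x1 + C * (1.0 - x1) * e1 - F * x1 * x2
    dx2 = -x2 + C * (1.0 - x2) * e2 - F * x1 * x2
    return dx1, dx2


def relax(e1, e2, C=0.05, F=1.0, dt=0.01, steps=5000):
    """Relax both activities from rest toward equilibrium with forward Euler steps."""
    x1 = 0.0
    x2 = 0.0
    for _ in range(steps):
        dx1, dx2 = mstd_derivatives(x1, x2, e1, e2, C, F)
        x1 += dt * dx1
        x2 += dt * dx2
    return x1, x2
```

Under these assumptions the activities stay between 0 and 1 for any non-negative input, and the cell receiving the larger summed input ends with the larger activity, so the pair signals the dominant direction of coherent motion.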
where J and M are parameters. In (11), however, the terms of (10) are replaced by terms and the MSTd variables x1 and x2 are replaced by the MSTv variables x3 and x4, respectively. The MT/MSTv connections are defined in such a way as to allow MSTv cells to reconstruct the speed of a moving stimulus. To that end, the response of each MT cell is weighted by the speed preference s_ij of that cell. Since model MT speed tuning is defined across octaves of speed, the resulting speed estimate at the level of MSTv is linear with log velocity, as has been found in MST (Kawano et al., 1994). Model MSTv cells integrate terms over a receptive field radius of 20°, which is the average MSTv cell radius found by Tanaka et al. (1993). Both model MSTv cells were placed in the center of the visual field.
where I3(t) and I4(t) indicate the level of stimulation. The simulations used the same technique as Komatsu and Wurtz (1989), which was to initiate pursuit, allow it to stabilize, introduce stimulation as a step function, and remove the stimulation. Mathematically,
The model was implemented in the C programming language on a UNIX platform. The equations were integrated at each time step using a fourth-order Runge-Kutta technique. It was assumed for simplicity that a single integration time step corresponded to 1 msec in real time, although no effort was made to match the model output to millisecond variations in pursuit dynamics.
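For readers unfamiliar with the integration scheme, the following Python sketch shows a classical fourth-order Runge-Kutta step applied to a generic first-order system. It is written in Python rather than C for brevity and is not the original code. The step size, the function names, and the single leaky shunting cell used as an example are hypothetical and serve only to illustrate the method.

```python
def rk4_step(f, t, state, dt):
    """Advance a list-valued state by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, state)
    k2 = f(t + dt / 2.0, [s + dt / 2.0 * k for s, k in zip(state, k1)])
    k3 = f(t + dt / 2.0, [s + dt / 2.0 * k for s, k in zip(state, k2)])
    k4 = f(t + dt, [s + dt * k for s, k in zip(state, k3)])
    return [
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def integrate(f, state, dt, n_steps, t0=0.0):
    """Apply rk4_step repeatedly and return the final state."""
    t = t0
    for _ in range(n_steps):
        state = rk4_step(f, t, state, dt)
        t += dt
    return state


def leaky_cell(t, state, drive=0.5):
    """Example system: one cell with passive decay and shunted excitatory drive."""
    (x,) = state
    return [-x + (1.0 - x) * drive]
```

For example, `integrate(leaky_cell, [0.0], 0.01, 5000)` relaxes the example cell toward its equilibrium activity of drive / (1 + drive).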
The model equations were used to simulate data from a variety of experiments. For the SPEM experiments, pursuit was conducted across either a homogeneous or a textured background. The difference between these two cases lies in the fact that a textured background creates retinal motion signals as the eye is moved across it, while a homogeneous background does not. In all cases, motion vectors were calculated every 0.5°, using equation (4). In general, a large (5° x 5°) moving textured square was used as the tracking stimulus, in order to keep the required number of model MT cells reasonably low. Unless otherwise noted, the parameters were set as follows: C = 0.05, F = 1.0, J = 20.0, M = 0.5.
Dichgans J, Korner F, Voigt K (1969) Vergleichende Skalierung des afferenten und efferenten Bewegungssehens beim Menschen: lineare Funktionen mit verschiedener Anstiegssteilheit. Psychol Forsch 32:277-295.
Dursteler MR, Wurtz RH, Newsome WT (1987) Directional pursuit deficits following lesions of the foveal representation within the superior temporal sulcus of the macaque monkey. J Neurophysiol 57:1262-1287.
Kawano K, Sasaki M, Yamashita M (1984) Response properties of neurons in posterior parietal association cortex of monkey during visual-vestibular stimulation. II. Optokinetic neurons. J Neurophysiol 51:352-360.
Kiorpes L, Walton PJ, O'Keefe LP, Movshon JA, Lisberger SG (1996) Effects of early-onset artificial strabismus on pursuit eye movements and on neuronal responses in area MT of macaque monkeys. J Neurosci 16:6537-6553.
Maunsell JH, Van Essen DC (1983b) Functional properties of neurons in middle temporal visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and orientation. J Neurophysiol 49:1127-1147.
Raiguel S, Van Hulle MM, Xiao DK, Marcar VL, Orban GA (1995) Shape and spatial distribution of receptive fields and antagonistic motion surrounds in the middle temporal area (V5) of the macaque. Eur J Neurosci 7:2064-2082.
Saito H (1993) Hierarchical neural analysis of optical flow in the macaque visual pathway. In: Brain mechanisms of perception and memory: from neuron to behavior (Ono T, Squire L, Raichle M, Perrett D, Fukuda M, eds), pp 121-139. New York: Oxford Press.
Westall CA, Eizenman M, Kraft SP, Panton CM, Chatterjee S, Sigesmund D (1998) Cortical binocularity and monocular optokinetic asymmetry in early-onset esotropia. Invest Ophthalmol Vis Sci 39:1352-1360.
Wurtz R, Komatsu H, Dürsteler M, Yamasaki D (1990) Motion to movement: cerebral cortical visual processing for pursuit eye movements. In: Signal and sense: local and global order in perceptual maps (Edelman G, Cowen WM, eds), pp 233-260. New York: John Wiley.