One focus of our research is motor equivalent control of the speech articulators for producing strings of speech sounds. The DIVA model, introduced in Guenther (1994), is a neural network model of speech production that learns to control the movements of a computer-simulated vocal tract (including the tongue, jaw, lips, and larynx) to produce phoneme strings. The model's neural mappings are tuned during a babbling cycle in which auditory feedback from self-generated speech sounds is used to learn the relationship between motor actions and their acoustic consequences. After learning, the model can produce arbitrary combinations of phonemes, even when constraints are placed on the articulators. The model is described in more detail in Guenther (1995a), which shows how a new theory of the nature of speech targets (the idea that sound targets are convex regions rather than points) leads to new and unified explanations for many long-studied speech production phenomena, including motor equivalence, contextual variability, speaking rate effects, anticipatory coarticulation, and carryover coarticulation.
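A minimal sketch of the convex-region idea, assuming the target is an axis-aligned box in a hypothetical two-dimensional articulatory space. The class name `BoxTarget`, the dimensions, and the step rule are illustrative and are not taken from the DIVA model. If a movement only has to reach the nearest point of the region, its endpoint depends on where it starts, a kind of context dependence that a point target cannot produce, and a state already inside the region needs no movement at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxTarget:
    """Hypothetical convex target: an axis-aligned box in a 2-D
    articulatory space (dimension names are illustrative only)."""
    lo: tuple[float, float]
    hi: tuple[float, float]

    def nearest_point(self, p):
        # Projecting a point onto a box is per-dimension clamping.
        return tuple(min(max(x, l), h) for x, l, h in zip(p, self.lo, self.hi))


def move_to(target, start, gain=0.5, tol=1e-6, max_steps=1000):
    """Step toward the nearest point of the target region. Stops once the
    state is inside the region or within tol of its boundary."""
    p = tuple(start)
    path = [p]
    for _ in range(max_steps):
        goal = target.nearest_point(p)
        err = [g - x for g, x in zip(goal, p)]
        if max(abs(e) for e in err) < tol:
            break
        p = tuple(x + gain * e for x, e in zip(p, err))
        path.append(p)
    return path


# A region target tolerates many endpoints; which one is reached
# depends on the starting configuration (the context).
vowel = BoxTarget(lo=(0.2, 0.4), hi=(0.5, 0.8))
from_left = move_to(vowel, (0.0, 0.6))[-1]   # ends near (0.2, 0.6)
from_right = move_to(vowel, (1.0, 0.6))[-1]  # ends near (0.5, 0.6)
already_inside = move_to(vowel, (0.3, 0.5))  # no movement needed
```

With a point target, both starting positions would end at the same place; with a region, each movement stops at the first acceptable configuration it reaches.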
The initial version of the model did not incorporate true acoustic information. A more recent version, described in Guenther (1995b) and Guenther, Hampson, and Johnson (1998), uses an articulatory synthesizer developed by Shinji Maeda to produce acoustic output. Several simulations of this model have been captured as QuickTime videos. See a portion of the babbling phase, during which the model learns the neural mappings that allow it to produce vowels, or see the model producing the vowels in "bet beet bat but boot" (videos are approximately 2 MB each). The model can produce the same vowels without using its jaw, even though it was never trained this way and a blocked jaw requires very different movements to produce the same sounds. The model can also produce the vowels with 2/3 of its tongue mobility removed, and it comes reasonably close to producing them without moving its jaw, lips, or larynx, much like a ventriloquist.
Another recent research interest of mine is speech perception. In particular, we have been using a combination of modeling and psychophysics to study the formation of phonetic categories in the brain, including the so-called perceptual magnet effect. An explanation of the effect in terms of neural maps in the auditory system is provided in Guenther and Gjaja (1996), and an improved model, together with supporting experimental results, is presented in Guenther, Husain, Cohen, and Shinn-Cunningham (1999).
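The toy sketch below illustrates how a neural map with unevenly distributed cells can warp perception. It assumes a single auditory dimension scaled to [0, 1], Gaussian tuning curves, a population-vector readout, and extra cells placed by hand around a category prototype. The helper names and parameters are hypothetical, and the hand-placed cells stand in for whatever a learning process would produce; this is not the learning law of Guenther and Gjaja (1996).

```python
import math


def make_map(prototype, n_background=101, n_extra=41, spread=0.05):
    """Preferred stimulus values of hypothetical map cells on [0, 1]:
    a uniform background plus extra cells clustered around a category
    prototype (hand-placed, standing in for the result of learning)."""
    background = [i / (n_background - 1) for i in range(n_background)]
    extra = [prototype + spread * (-2 + 4 * k / (n_extra - 1))
             for k in range(n_extra)]
    return background + extra


def perceive(stimulus, cells, width=0.05):
    """Population-vector readout: the activation-weighted mean of the
    cells' preferred values, using Gaussian tuning curves."""
    acts = [math.exp(-((c - stimulus) ** 2) / (2 * width ** 2)) for c in cells]
    return sum(a * c for a, c in zip(acts, cells)) / sum(acts)


cells = make_map(prototype=0.3)
# Same physical spacing (0.05): one pair at the prototype, one far from it.
near = perceive(0.35, cells) - perceive(0.30, cells)
far = perceive(0.75, cells) - perceive(0.70, cells)
print(f"near prototype: {near:.4f}, far from prototype: {far:.4f}")
```

Because the extra cells pull the readout toward the prototype, the pair near the prototype is perceived as closer together than an equally spaced pair far from it, a qualitative version of the perceptual magnet effect.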
My dissertation research primarily concerned motor equivalent reaching to targets. This work was published in Bullock, Grossberg, and Guenther (1993), which describes the DIRECT model of goal-directed reaching. The model is a self-organizing neural network that learns to control a simplified arm during an action-perception cycle in which random movements of the arm are perceived through vision to help tune the model's visual-motor neural mappings. After training, the model is capable of reaching to targets normally, with a joint blocked, with a pointer attached to the hand, or in the absence of visual feedback, as evidenced in the simulations reported in the article. The model is able to automatically and immediately compensate for constraints such as blocked joints, even though no constraints were present during learning. More recent work on the DIRECT model is described in Guenther and Micci Barreca (1997).
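Motor equivalence with a blocked joint can be illustrated with a planar three-joint arm steered by damped least-squares (pseudoinverse) control. This sketch swaps in an analytic Jacobian for the learned visual-motor mappings of the DIRECT model, and the link lengths, gains, and function names are all hypothetical. Blocking a joint is modeled by zeroing its Jacobian column, so the remaining joints take up the work without any retraining.

```python
import math

LINKS = (1.0, 0.8, 0.6)  # hypothetical link lengths of a planar 3-joint arm


def hand_position(q):
    """Forward kinematics: hand (x, y) for joint angles q in radians."""
    x = y = angle = 0.0
    for length, qi in zip(LINKS, q):
        angle += qi
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return x, y


def jacobian_columns(q):
    """Column j holds d(hand)/d(q_j); joint j moves every link from j outward."""
    cols = []
    for j in range(len(q)):
        dx = dy = 0.0
        angle = sum(q[:j])
        for length, qi in zip(LINKS[j:], q[j:]):
            angle += qi
            dx -= length * math.sin(angle)
            dy += length * math.cos(angle)
        cols.append((dx, dy))
    return cols


def reach(q, target, blocked=(), gain=0.2, steps=500, damping=1e-4):
    """Damped least-squares reaching; a blocked joint gets a zero column."""
    q = list(q)
    for _ in range(steps):
        x, y = hand_position(q)
        ex, ey = target[0] - x, target[1] - y
        cols = [(0.0, 0.0) if j in blocked else c
                for j, c in enumerate(jacobian_columns(q))]
        # Entries of the symmetric 2x2 matrix J J^T + damping * I.
        a = sum(cx * cx for cx, _ in cols) + damping
        b = sum(cx * cy for cx, cy in cols)
        d = sum(cy * cy for _, cy in cols) + damping
        det = a * d - b * b
        # Solve (J J^T + damping * I) v = error, then dq = J^T v.
        vx = (d * ex - b * ey) / det
        vy = (a * ey - b * ex) / det
        for j, (cx, cy) in enumerate(cols):
            q[j] += gain * (cx * vx + cy * vy)
    return q


start, target = (0.3, 0.5, 0.4), (1.2, 1.0)
free = reach(start, target)
middle_blocked = reach(start, target, blocked={1})
print(free, middle_blocked)
```

Both runs bring the hand to the same target with different joint configurations, and the blocked middle joint keeps its starting angle throughout.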