ANR REM (2013/2016)
Reinforcement learning theory provides a general conceptual framework to account for behavioral changes. Recently, the idea has emerged that reinforcement may explain learning in motor responses. In particular, there is growing interest in studying the effects of reinforcement learning on arm movement trajectories (Dam, Kording, & Wei, 2013), pointing movements (Trommershauser, Landy, & Maloney, 2006), or eye movements (Madelain, Champrenaut, & Chauvin, 2007; Madelain & Krauzlis, 2003b; Madelain, Paeye, & Wallman, 2011; Sugrue, Corrado, & Newsome, 2004; Takikawa, Kawagoe, Itoh, Nakahara, & Hikosaka, 2002; Xu-Wilson, Zee, & Shadmehr, 2009). However, despite these few seminal studies, much remains unknown about both the details of the effects of reward on motor control and the underlying mechanisms. This proposal aims to better understand how skilled motor responses are learned, focusing on voluntary eye movements.
Although learning is often regarded as a restricted period of time during which a behavior undergoes some changes, we view learning as a continuously ongoing process. In the case of motor control, every instance of a behavior is followed by consequences that affect some dimensions of the future response. These changes in turn affect the functional relations with the environment, and this feedback process continues throughout life. Therefore, we do not regard motor learning as a special phase that allows the emergence of a particular motor response, but as a continuous adaptation to changes within the organism that affect its functional relations with the environment. This distinction is important because learning situations tested experimentally over a short period of time may then be viewed as a condensed version of real-life motor learning: the same adaptive processes are responsible for the changes in the response in both situations.
An important aspect of this fundamental research project is that the theoretical propositions addressed provide a new view of motor learning that departs from conventional wisdom. We expect to gain considerable knowledge about learning by constructing new experimental paradigms to collect behavioral data, implementing new learning models based on Bayesian theories, and testing dynamical mathematical models of behavioral changes. Whichever way the results turn out, we anticipate that these studies will provide a better understanding of motor learning and a well-defined, solid framework for studying other forms of motor plasticity. If eye movement learning follows the rules of other operant responses (i.e., responses reinforced by their consequences), this will constitute a minor revolution in the study of motor control, at both the behavioral and neural levels, with important implications for understanding plasticity in other motor systems.
This work was supported by ANR project ANR-13-APPR-0008 "ANR R.E.M.".
Natural environments potentially contain several interesting targets for goal-directed behavior. Thus, sensorimotor systems need to perform a competitive selection based on behaviorally meaningful parameters. Recently, it has been observed that voluntary eye movements such as saccades and smooth pursuit can be considered operant behaviors (Madelain et al., 2011). Indeed, saccade parameters such as peak velocity or latency (Montagnini et al., 2005), as well as smooth pursuit during transient target blanking (Madelain et al., 2003) or visually guided pursuit of ambiguous stimuli (Schutz et al., 2015), can be modified by reinforcement contingencies. Here we address the question of whether expectancy-based anticipatory smooth pursuit can be modulated by reinforcement contingencies. When predictive information is available, anticipatory smooth pursuit eye movements (aSPEM) are frequently observed before target appearance. Actions that occur at some distance in time from the reinforcement outcome, such as aSPEM, which occur without any concurrent sensory feedback, suffer from the well-known credit assignment problem (Kaelbling et al., 1996). We designed a direction-bias task as a baseline and modified it by setting an implicit eye velocity criterion during anticipation. The outcome delivered after each trial (reward or punishment) was contingent on whether the anticipatory eye velocity met this criterion online. We observed a dominant graded effect of motion-direction bias and a small modulatory effect of reinforcement on aSPEM velocity. A yoked-control paradigm corroborated this result, showing a strong reduction in anticipatory behavior when the reward/punishment schedule was not contingent on behavior. An additional classical conditioning paradigm confirmed that reinforcement contingencies have to be operant to be effective and that they have a role in solving the credit assignment problem during aSPEM.
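The difference between the contingent and yoked schedules can be illustrated with a minimal Python sketch. It assumes a hypothetical velocity criterion (`VELOCITY_CRITERION`, 2 deg/s), a reward-if-at-or-above-criterion rule, and a yoked design in which outcomes are replayed from a paired contingent session; none of these specifics are taken from the study, and all function names are illustrative. The sketch only shows that both schedules can deliver the same reward rate while only the contingent one ties each outcome to the observer's own anticipatory velocity.

```python
import random
from typing import List

# HYPOTHETICAL criterion in deg/s; the abstract does not report the value used.
VELOCITY_CRITERION = 2.0


def contingent_outcomes(velocities: List[float],
                        criterion: float = VELOCITY_CRITERION) -> List[str]:
    """Operant schedule: each trial's outcome depends on that trial's own
    anticipatory velocity. Assumed rule: reward if velocity >= criterion."""
    return ["reward" if v >= criterion else "punishment" for v in velocities]


def yoked_outcomes(paired_outcomes: List[str]) -> List[str]:
    """Yoked control (assumed design): replay the outcome sequence earned in a
    paired contingent session, ignoring the current observer's behavior."""
    return list(paired_outcomes)


def reward_rate(outcomes: List[str]) -> float:
    """Fraction of trials followed by a reward."""
    return outcomes.count("reward") / len(outcomes)


def contingency_agreement(velocities: List[float], outcomes: List[str],
                          criterion: float = VELOCITY_CRITERION) -> float:
    """Fraction of trials whose delivered outcome matches the outcome the
    observer's own velocity would have earned under the contingent rule."""
    earned = contingent_outcomes(velocities, criterion)
    matches = sum(e == o for e, o in zip(earned, outcomes))
    return matches / len(outcomes)


if __name__ == "__main__":
    rng = random.Random(0)
    n_trials = 200
    # Simulated anticipatory velocities (deg/s); purely illustrative numbers.
    master_v = [rng.gauss(2.0, 1.0) for _ in range(n_trials)]
    yoked_v = [rng.gauss(2.0, 1.0) for _ in range(n_trials)]

    master = contingent_outcomes(master_v)
    yoked = yoked_outcomes(master)

    # Same reward rate by construction...
    print("reward rate (contingent):", reward_rate(master))
    print("reward rate (yoked):     ", reward_rate(yoked))
    # ...but only the contingent schedule tracks the observer's own behavior.
    print("agreement (contingent):", contingency_agreement(master_v, master))
    print("agreement (yoked):     ", contingency_agreement(yoked_v, yoked))
```

Holding the reward rate fixed is what allows a yoked control to separate the effect of the contingency itself from the mere delivery of rewards and punishments.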