Publications

DynTex: A Real-Time Generative Model of Dynamic Naturalistic Luminance Textures

The visual systems of animals work in diverse and constantly changing environments where organism survival requires effective senses. To study the hierarchical brain networks that perform visual information processing, vision scientists require suitable tools, and Motion Clouds (MCs)—a dense mixture of drifting Gabor textons—serve as a versatile solution. Here, we present an open toolbox intended for the bespoke use of MC functions and objects within modeling or experimental psychophysics contexts, including easy integration within Psychtoolbox or PsychoPy environments. The toolbox includes output visualization via a Graphical User Interface. Visualizations of parameter changes in real time give users an intuitive feel for adjustments to texture features like orientation, spatiotemporal frequencies, bandwidth, and speed. Vector calculus tools support the frame-by-frame autoregressive generation of fully controlled stimuli, and use of the GPU allows this to be done in real time for typical stimulus array sizes. We give illustrative examples of experimental use to highlight the potential with both simple and composite stimuli. The toolbox is developed for, and by, researchers interested in psychophysics, visual neurophysiology, and mathematical and computational models. We argue the case that in all these fields, MCs can bridge the gap between well-parameterized synthetic stimuli like dots or gratings and more complex and less controlled natural videos.
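
To make the kind of parametric control described above concrete, the sketch below builds a single static Motion-Cloud-like frame by shaping random-phase noise with Gaussian envelopes around a central spatial frequency and orientation. It is a minimal, assumption-laden NumPy illustration (the parameter names `sf_0`, `b_sf`, `theta`, `b_theta` are ours), not the toolbox's actual API.

```python
import numpy as np

def motion_cloud_frame(size=256, sf_0=0.1, b_sf=0.03, theta=0.0, b_theta=0.3, seed=0):
    """One static naturalistic texture frame: random-phase noise filtered by a Gaussian
    envelope centred on spatial frequency sf_0 (cycles/pixel) and orientation theta (rad).
    Illustrative only; the real toolbox also controls the temporal (speed) dimension."""
    rng = np.random.default_rng(seed)
    fx, fy = np.meshgrid(np.fft.fftfreq(size), np.fft.fftfreq(size), indexing="ij")
    f = np.sqrt(fx**2 + fy**2) + 1e-12                    # radial spatial frequency
    angle = np.arctan2(fy, fx)                            # orientation of each Fourier component
    env = np.exp(-0.5 * ((f - sf_0) / b_sf) ** 2)         # spatial-frequency bandwidth
    d_theta = np.angle(np.exp(2j * (angle - theta))) / 2  # orientation difference, wrapped to +/- pi/2
    env *= np.exp(-0.5 * (d_theta / b_theta) ** 2)        # orientation bandwidth
    noise = rng.standard_normal((size, size))
    frame = np.real(np.fft.ifft2(np.fft.fft2(noise) * env))
    return frame / np.abs(frame).max()                    # normalise contrast to [-1, 1]
```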

A Predictive Approach to Enhance Time-Series Forecasting

Accurate time-series forecasting is essential across a multitude of scientific and industrial domains, yet deep learning models often struggle with challenges such as capturing long-term dependencies and adapting to drift in data distributions over time. We introduce Future-Guided Learning, an approach that enhances time-series event forecasting through a dynamic feedback mechanism inspired by predictive coding. Our approach involves two models: a detection model that analyzes future data to identify critical events and a forecasting model that predicts these events based on present data. When discrepancies arise between the forecasting and detection models, the forecasting model undergoes more substantial updates, effectively minimizing surprise and adapting to shifts in the data distribution by aligning its predictions with actual future outcomes. This feedback loop, drawing upon principles of predictive coding, enables the forecasting model to dynamically adjust its parameters, improving accuracy by focusing on features that remain relevant despite changes in the underlying data. We validate our method on a variety of tasks such as seizure prediction in biomedical signal analysis and forecasting in dynamical systems, achieving a 40% increase in the area under the receiver operating characteristic curve (AUC-ROC) and a 10% reduction in mean absolute error (MAE), respectively. By incorporating a predictive feedback mechanism that adapts to data distribution drift, Future-Guided Learning offers a promising avenue for advancing time-series forecasting with deep learning.
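
The feedback mechanism can be caricatured in a few lines: a detection model sees the upcoming window, a forecasting model sees only the present window, and the size of the forecaster's update grows with the discrepancy (here a KL divergence) between the two. This is a hypothetical PyTorch sketch of that idea, not the authors' implementation; `forecaster`, `detector` and `optimizer` are assumed to be standard PyTorch objects.

```python
import torch
import torch.nn.functional as F

def future_guided_step(forecaster, detector, x_now, x_future, optimizer, base_lr=1e-3):
    """One illustrative update: the larger the disagreement between the detector (which
    sees the future window) and the forecaster (which sees only the present window),
    the stronger the forecaster's parameter update."""
    with torch.no_grad():
        target = torch.softmax(detector(x_future), dim=-1)         # "teacher" event probabilities
    log_probs = torch.log_softmax(forecaster(x_now), dim=-1)
    surprise = F.kl_div(log_probs, target, reduction="batchmean")  # discrepancy acts as surprise
    for group in optimizer.param_groups:                           # bigger surprise, bigger step
        group["lr"] = base_lr * (1.0 + surprise.item())
    optimizer.zero_grad()
    surprise.backward()
    optimizer.step()
    return surprise.item()
```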

Robust Unsupervised Learning of Spike Patterns with Optimal Transport Theory

Temporal sequences are an important feature of neural information processing in biology. Neurons can fire a spike with millisecond precision, and, at the network level, repetitions of spatiotemporal spike patterns are observed in neurobiological data. However, methods for detecting precise temporal patterns in neural activity suffer from high computational complexity and poor robustness to noise, and quantitative detection of these repetitive patterns remains an open problem. Here, we propose a new method to extract spike patterns embedded in raster plots using a 1D convolutional autoencoder with the Earth Mover’s Distance (EMD) as a loss function. Importantly, the properties of the EMD make the method suitable for spike-based distributions, easy to compute, and robust to noise. Through gradient descent, the autoencoder is trained to minimize the EMD between the input and its reconstruction. We then expect the weight matrices to learn the repeating spike patterns present in the data. We validate our method on synthetically generated raster plots and compare its performance with an autoencoder trained using the Mean Squared Error (MSE) as a loss function. We show that the method using the EMD performs better at detecting the occurrence of the spike patterns, while the method using the MSE is better at capturing the underlying distributions used to generate the spikes. Finally, we propose to train the autoencoder iteratively by sequentially combining the EMD and the MSE losses. This sequential approach outperforms the widely used seqNMF method in terms of robustness to various types of noise, speed and stability. Overall, our method provides a novel approach to reliably extract repetitive temporal spike sequences, and can be readily generalized to other sequence detection applications.
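
For two normalised 1D histograms defined on the same time bins, the EMD reduces to the L1 distance between their cumulative distributions, which is what makes it cheap to compute and well suited to spike-based data: a spike shifted by one bin costs little under the EMD but a lot under the MSE. The snippet below is a minimal NumPy illustration of this property, not the loss implementation used in the paper.

```python
import numpy as np

def emd_1d(p, q):
    """Earth Mover's Distance between two 1D histograms on the same bins:
    for normalised histograms it is the sum of absolute differences of the CDFs."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

a = np.array([0, 1, 0, 0, 1, 0, 0, 0], dtype=float)   # two spikes
b = np.roll(a, 1)                                      # same spikes, jittered by one bin
print(emd_1d(a, b))            # small: the mass only moves by one bin
print(np.mean((a - b) ** 2))   # the MSE treats the jittered train as completely different
```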

Integrating the What and Where Visual Pathways to Improve CNN Categorisation

Convolutional Neural Networks (CNNs) have been widely used for categorisation tasks over the past decades. Many studies have attempted to improve their performance by increasing model complexity, adding parameters, or adopting alternative architectures such as transformers, which excel at large-scale benchmarks. However, these approaches often come at a high computational cost. We take a different approach, prioritizing ecological plausibility to achieve high accuracy with minimal computational cost. We focus on visual search — a task requiring both localisation and categorisation of a target object in natural scenes. Our work is inspired by the organisation of the primate visual system, which processes visual information through two distinct pathways: the ventral ‘‘What’’ pathway, responsible for object recognition, and the dorsal ‘‘Where’’ pathway, specialized in spatial localisation. Using this principle, we aim to evaluate the validity of a ‘‘what/where’’ approach, capable of selectively processing only the relevant areas of the visual scene with respect to the classification task. This selection relies on the implementation of a visual sensor (‘‘retina’’) that samples only part of the image, coupled with a map representing the regions of the image. This map, referred to as a ‘‘likelihood map’’, is based on the probability of correctly identifying the target label. Depending on the case, it can be guided (resp. not guided) by the target label, similar to Grad-CAM (resp. DFF). In both scenarios, we show improved classification performance when the eye shifts toward the region of interest, outperforming the previously mentioned methods. Surprisingly, the gain in classification accuracy is offset by a reduction in the precision of object localisation within the scene. Beyond its computational benefits, this What-Where framework serves as an experimental tool to further investigate the neural mechanisms underlying visual processing.
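
The selection mechanism can be sketched as: compute a class-likelihood map over the scene, shift the ‘‘retina’’ to its peak, then classify the sampled crop. The helpers `likelihood_map` and `classifier` below are hypothetical placeholders (the former standing for a Grad-CAM- or DFF-like map), so this is an illustration of the what/where loop rather than the paper's code.

```python
import numpy as np

def what_where_step(image, likelihood_map, classifier, crop=64):
    """Move a small 'retinal' crop to the most promising location before classifying.
    `likelihood_map(image)` is assumed to return a 2D array the size of the image."""
    lmap = likelihood_map(image)                      # "where": spatial evidence for the target
    y, x = np.unravel_index(np.argmax(lmap), lmap.shape)
    h, w = image.shape[:2]
    y0 = int(np.clip(y - crop // 2, 0, h - crop))     # keep the crop inside the image
    x0 = int(np.clip(x - crop // 2, 0, w - crop))
    fovea = image[y0:y0 + crop, x0:x0 + crop]
    return classifier(fovea)                          # "what": categorise the fixated region
```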

A Robust Event-Driven Approach to Always-on Object Recognition

We propose a neuromimetic architecture able to perform always-on pattern recognition. To achieve this, we extended an existing event-based algorithm [1], which introduced novel spatio-temporal features as a Hierarchy Of Time-Surfaces (HOTS). Built from asynchronous events acquired by a neuromorphic camera, these time surfaces make it possible to code the local dynamics of a visual scene and to create an efficient event-based pattern recognition architecture. Inspired by neuroscience, we extended this method to increase its performance. Our first contribution was to add a homeostatic gain control on the activity of neurons to improve the learning of spatio-temporal patterns [2]. A second contribution is to draw an analogy between the HOTS algorithm and Spiking Neural Networks (SNNs). Following that analogy, our last contribution is to modify the classification layer and remodel the offline pattern categorization method previously used into an online and event-driven one. This classifier uses the spiking output of the network to define novel time surfaces, and we then perform online classification with a neuromimetic implementation of a multinomial logistic regression. Not only do these improvements consistently increase the performance of the network, they also make this event-driven pattern recognition algorithm online and bio-realistic. Results were validated on different datasets: DVS barrel [3], Poker-DVS [4] and N-MNIST [5]. We plan to develop the SNN version of the method and to extend this fully event-driven approach to more naturalistic tasks, notably for always-on, ultra-fast object categorization.
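
A time surface, the elementary feature of the HOTS hierarchy, can be sketched in a few lines: around an incoming event, the most recent event times in a local neighbourhood are mapped to exponentially decaying values. The function below is a simplified single-polarity NumPy version, for illustration only.

```python
import numpy as np

def time_surface(events, t_ref, center, radius=3, tau=50e-3, shape=(128, 128)):
    """events: iterable of (t, x, y) tuples sorted in time, with t in seconds.
    Returns the (2*radius+1)^2 exponential time surface around `center` at time t_ref."""
    last_t = np.full(shape, -np.inf)                  # time of the last event at each pixel
    for t, x, y in events:
        if t > t_ref:
            break
        last_t[y, x] = t
    cx, cy = center
    patch = last_t[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    return np.exp((patch - t_ref) / tau)              # ~1 for recent events, ~0 for old ones
```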

An open-source vision-science tool for the auto-regressive generation of dynamic stochastic textures: Motion Clouds

Motion Clouds are a generative model for naturalistic visual stimulation that offer full parametric control and more naturalism than the widely used alternatives of Random Dot Kinematograms (RDKs) or luminance gratings. We previously released a 3D FFT-based generation algorithm (Sanz-Leon et al., J Neurophysiol, 2012). Here, we present a novel implementation of Motion Clouds that uses an auto-regressive formulation so that any number of frames can be generated quickly, with parameters changed in near real time, as needed in closed-loop experiments. We demonstrate a version of the proposed toolbox, which will be available online, to illustrate the level of control available. With a graphical user interface, researchers can use interactive sliders to adjust Motion Cloud parameters like central frequency, orientations and bandwidths to get an intuitive feel for the parametric changes. We provide functions that can be easily integrated with psychophysics task tools like Psychtoolbox. Motion Clouds can be used to generate trials of stand-alone moving luminance textures, or added to other stimuli like images or videos as dynamic noise to disrupt visual processing. The toolbox can be run using GPUs to speed up generation to pseudo real-time for large stimulus arrays of about 1024 by 1024 pixels at 100 Hz. We argue that this tool can enhance visual perception experiments in a range of contexts and would like it to be open to extensive testing, use and further development by the psychophysics, computational modelling, functional imaging and neurophysiology communities.
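
In the auto-regressive formulation, each new frame is obtained by mixing the previous frame with freshly filtered noise, so parameters can be changed between frames with no need to precompute the whole movie. The loop below is a toy first-order version assuming a `filtered_noise(**params)` helper (such as the frame generator sketched earlier in this list); it is not the toolbox's actual API.

```python
import numpy as np

def ar_texture_stream(filtered_noise, params, n_frames=100, alpha=0.9):
    """Yield frames of a dynamic texture through a first-order auto-regressive update:
    frame[t+1] = alpha * frame[t] + sqrt(1 - alpha**2) * fresh_filtered_noise.
    `params` may be modified between frames, e.g. in a closed-loop experiment."""
    frame = filtered_noise(**params)
    for _ in range(n_frames):
        yield frame
        frame = alpha * frame + np.sqrt(1.0 - alpha**2) * filtered_noise(**params)
```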

Kernel Heterogeneity Improves Sparseness of Natural Images Representations

Both biological and artificial neural networks inherently balance their performance against their operational cost, which constrains their computational abilities. Typically, an efficient neuromorphic neural network is one that learns representations that reduce the redundancies and dimensionality of its input. This is for instance achieved by sparse coding, and sparse representations derived from natural images are heterogeneous, both in their sampling of input features and in the variance of those features. Here, we investigated the connection between the structure of natural images, particularly oriented features, and their corresponding sparse codes. We showed that representations of input features scattered across multiple levels of variance substantially improve the sparseness and resilience of sparse codes, at the cost of reconstruction performance. This echoes the structure of the model's input, allowing it to account for the heterogeneously aleatoric structure of natural images. We demonstrate that learning kernels from natural images produces heterogeneity by balancing approximate and dense representations, which improves all reconstruction metrics. Using a parametrized control of the heterogeneity of the kernels used by a convolutional sparse coding algorithm, we show that heterogeneity emphasizes sparseness, while homogeneity improves representation granularity. In a broader context, such encoding strategies can serve as inputs to deep convolutional neural networks. We prove that such variance-encoded sparse image datasets enhance computational efficiency, emphasizing the benefits of kernel heterogeneity for leveraging naturalistic and variable input structures, with possible applications to improving the throughput of neuromorphic hardware.
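
The underlying operation, inferring a sparse code of an image patch over a dictionary of kernels, can be made concrete with a few lines of ISTA (iterative soft-thresholding). This generic, non-convolutional sketch only illustrates the kind of inference the abstract refers to, not the algorithm used in the paper.

```python
import numpy as np

def ista(x, D, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||x - D @ a||^2 + lam*||a||_1 by iterative soft-thresholding.
    x: flattened image patch; D: (n_pixels, n_kernels) dictionary with unit-norm columns."""
    L = np.linalg.norm(D, ord=2) ** 2                 # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold -> sparsity
    return a
```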

Dynamic vision using the temporal precision of spike patterns in neural computations

Our brain is extremely efficient at solving highly complex visual tasks. Within a few hundred milliseconds, we are able to recognize different objects invariantly to various characteristics, such as their size or orientation. Recently, artificial neural networks have made great progress in solving the tasks faced by biological systems. They draw on knowledge from neuroscience to build biologically realistic learning architectures that could provide interesting insights into how the human brain works. But these architectures still face a number of challenges: the models are not always interpretable, they do not necessarily seem to use the same strategies as their biological counterparts, and they are very energy-hungry. We believe that one reason for the great efficiency of the visual system is that it uses brief pulses to represent information: the action potentials emitted by neurons. Using a neuromorphic approach, the goal of this thesis project is to develop models of visual information processing that use representations based on these spikes, binary events described only by their time and their origin. We chose to use a dynamic signal, captured by an event-based camera, which transcribes a visual scene using only events, or spikes. We solve visual cognitive tasks using the temporal code formed by precise sequences of events that we call spike patterns. A large body of experimental evidence suggests that the temporal code carried by these patterns could be a strategy used by the brain to encode visual information. We will see that using these patterns makes it possible to develop local and biologically realistic learning methods while processing the events characterizing a visual scene dynamically and asynchronously. We show that these algorithms can solve an object recognition task and a motion estimation task in an ultra-fast and efficient way. We also observe the emergence of an organization of receptive fields similar to that of biological systems, which suggests that a similar strategy may be employed by the brain. In the last part of this work, we detail the development of a new algorithm to detect this type of activity in recordings of real neurons.

Multi-scale modelling of orientation selectivity in natural visual stimulation

This thesis aims to understand the foundations and functions of the probabilistic computations involved in visual processes. We rely on a dual strategy, which involves developing models within the predictive coding framework under the free energy principle. These models serve to define clear hypotheses about neural function, which are tested using extracellular recordings from the primary visual cortex. This brain region is mainly involved in computations on the elementary units of natural visual inputs, in the form of orientation distributions. These probabilistic distributions, by nature, rely on processing the mean and the variance of a visual input. While the former has been the subject of thorough neurobiological scrutiny, the latter has been largely neglected. This thesis aims to fill that gap. We put forward the idea that intracortical recurrent connectivity is perfectly suited to processing such input variance, and our contributions to this idea are multiple. (1) We first provide a computational examination of the orientation structure of natural images and of the associated neural encoding strategies. A sparse empirical model shows that the optimal neural code for representing natural images relies on orientation variance to improve efficiency, performance and resilience. (2) This paves the way for an experimental study of neural responses in the cat primary visual cortex to multivariate stimuli. We discover new, cortical-layer-dependent functional neuron types, which can be related to recurrent connectivity. (3) We demonstrate that this processing of variance can be understood as a dynamic weighted graph conditioned by sensory variance, using recordings from the macaque primary visual cortex. (4) Finally, we argue for the existence of (predictive) variance computations outside the primary visual cortex, via the pulvinar nucleus of the thalamus. This opens the way to studies of variance computations as generic neural computations supported by recurrence throughout the cortex.

Modelling-Guided Analysis of Neurobiological Data

Recent technological advances in neurobiology have paved the way for recording very large populations of neurons at the resolution of the action potential (spike). They shed new light on the structure of neural activity, and in particular on the rich spatio-temporal structure of neural information. Current analysis methods are not yet suited to this level of precision, and the goal of this thesis project is to develop new methods for analysing neurobiological data that take into account fundamental principles derived from computational neuroscience. The main objective is to bridge the gap between two classical approaches, the encoding and the decoding of neural activity, in a self-consistent way, that is, by reaching coherence between the way neurons encode information and the way it can be decoded. To do so, the project will build on a theoretical approach developed in our group that formalises this coherence by deriving a measure of the efficiency of spike processing. The analysis algorithm will be optimised by minimising a cost function reflecting these principles. Existing encoding and decoding models, which we have already validated on neurophysiological data, will then be combined following this approach. By studying the role of principles such as the energy efficiency of neural processing and the consideration of physiological constraints, we will then be able to infer the role of each of these principles in neural information by measuring the changes in processing efficiency. Ultimately, the goal is to develop new analysis methods that establish predictive links between recorded multidimensional neural activity and functional hypotheses such as object detection or spatial navigation. This method will also make it possible to test different hypotheses about the role of temporal precision in information processing. The risks associated with the innovative aspect of the project will be mitigated by the supervisors' expertise in machine learning, neural modelling and computational neuroscience.

Spiking Neural Networks for Embedded Event-Based Vision

Embedded computer vision has recently become ubiquitous. It encompasses tasks such as the detection, recognition and tracking of visual elements, with applications in robotics (autonomous driving), industry (product quality assessment, automation of repetitive tasks), security, customer experience, social networks, and more. This widespread enthusiasm only reinforces the need to overcome the challenges posed by this field of research, namely a gargantuan energy consumption, a large memory footprint and support for a wide range of algorithms. We believe that a promising answer to these challenges can come from the combined use of Spiking Neural Networks (SNNs) and event-based cameras. SNNs are bio-inspired artificial neural networks that aim to mimic the dynamics of biological neurons by processing information as trains of spikes. Event-based cameras are a new type of bio-inspired visual sensor that generates asynchronous data as a function of changes in pixel intensity. They are ideal for real-time applications, but the large amount of temporal information they generate is difficult to process with traditional computer vision models. However, event data combine naturally with SNNs in terms of biological inspiration, energy savings, latency and memory usage, for the dynamic processing of visual data. Yet the novelty of SNNs and event-based cameras leaves room for many improvements in the optimal pre-processing of the data as well as in their subsequent processing, making the most of the particularities of these scientific concepts. In this thesis, we identified several problems related to this vast field of research, which we condensed into two main themes. The first concerns the optimisation of the on-board pre-processing of event data acquired by an embedded camera to facilitate their subsequent analysis. We propose three solutions: the events could either 1) be downscaled in space or in time, online or offline; 2) keep only the salient elements and discard the rest; or 3) be submitted to a foveation mechanism, following a bio-plausible trade-off between the two previous solutions. We compared qualitatively and quantitatively the data obtained after each pre-processing method in order to assess whether the trade-off between the amount of data (i.e. the number of events) kept and the relevance of the preserved information is ideal. The second challenge is exploiting the suitability of SNNs for processing the specific temporality of events in an embedded context. Overall, we hope to have made a useful contribution to exploiting the unique advantages of combining SNNs with event-based cameras for embedded computer vision, in particular regarding the pre-processing of event data.

Accurate Detection of Spiking Motifs by Learning Heterogeneous Delays of a Spiking Neural Network

Recently, interest has grown in exploring the hypothesis that neural activity conveys information through precise spiking motifs. To investigate this phenomenon, various algorithms have been proposed to detect such motifs in Single Unit Activity (SUA) recorded from populations of neurons. In this study, we present a novel detection model based on the inversion of a generative model of raster plot synthesis. Using this generative model, we derive an optimal detection procedure that takes the form of logistic regression combined with temporal convolution. A key advantage of this model is its differentiability, which allows us to formulate a supervised learning approach using a gradient descent on the binary cross-entropy loss. To assess the model’s ability to detect spiking motifs in synthetic data, we first perform numerical evaluations. This analysis highlights the advantages of using spiking motifs over traditional firing rate based population codes. We then successfully demonstrate that our learning method can recover synthetically generated spiking motifs, indicating its potential for further applications. In the future, we aim to extend this method to real neurobiological data, where the ground truth is unknown, to explore and detect spiking motifs in a more natural and biologically relevant context.
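
The detection procedure described above, a temporal convolution over the raster followed by a logistic non-linearity and trained with binary cross-entropy, can be sketched with a minimal PyTorch module. This is an illustrative re-implementation of the general idea, not the released code.

```python
import torch
import torch.nn as nn

class MotifDetector(nn.Module):
    """Temporal convolution over an (n_neurons, T) raster followed by a sigmoid:
    output[b, m, t] is the probability that motif m is present around time t."""
    def __init__(self, n_neurons, n_motifs, motif_len):
        super().__init__()
        self.conv = nn.Conv1d(n_neurons, n_motifs, kernel_size=motif_len,
                              padding=motif_len - 1)

    def forward(self, raster):                         # raster: (batch, n_neurons, T)
        return torch.sigmoid(self.conv(raster)[..., :raster.shape[-1]])

detector = MotifDetector(n_neurons=64, n_motifs=4, motif_len=21)
bce = nn.BCELoss()   # supervised against the ground-truth motif occurrences of synthetic data
```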

Ultra-Fast Image Categorization in biology and in neural models

Humans are able to robustly categorize images and can, for instance, detect the presence of an animal in a briefly flashed image in as little as 120 ms. Initially inspired by neuroscience, deep-learning algorithms have flourished over the last decade, such that the accuracy of machines is at present superior to that of humans for visual recognition tasks. However, these artificial networks are usually trained and evaluated on very specific tasks, for instance on the 1000 separate categories of ImageNet. In that regard, biological visual systems are more flexible and efficient than artificial systems on generic ecological tasks. In order to deepen this comparison, we retrained the standard VGG Convolutional Neural Network (CNN) on two independent tasks which are ecologically relevant for humans: one task defined as detecting the presence of an animal and the other as detecting the presence of an artifact. We show that the retrained network achieves the human-like performance levels reported in psychophysical tasks. We also compare the accuracy of the detection on an image-by-image basis. This showed in particular that the two models perform better when combining their outputs. Indeed, animals (e.g. lions) tend to be less present in photographs containing artifacts (e.g. buildings). These retrained models could reproduce some unexpected behavioral observations from human psychophysics, such as the robustness to rotations (e.g. upside-down or slanted images) or to a grayscale transformation. Finally, we quantitatively tested how many layers of the CNN are necessary to reach such performance, showing that good accuracy for ultra-fast categorization can be reached with only a few layers, challenging the belief that image recognition requires a deep sequential analysis of visual objects. We expect to apply this framework to guide future model-based psychophysical experiments and biomimetic deep neuronal architectures designed for such tasks.
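
Retraining a standard VGG for one of these binary ecological tasks amounts to swapping the final 1000-way ImageNet head for a 2-way head and fine-tuning. The snippet below is a sketch of that procedure assuming a recent torchvision; it is not the exact training setup of the paper.

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained VGG16 backbone (assumes torchvision >= 0.13).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 2)   # 1000-way head -> animal / non-animal head
for p in model.features.parameters():      # optionally freeze the convolutional layers
    p.requires_grad = False
```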

Resilience to sensory uncertainty in the primary visual cortex

Our daily endeavors occur in a complex visual environment, whose intrinsic variability shapes the way we integrate information to make decisions. By processing thousands of parallel sensory inputs, our brain is theoretically able to compute the uncertainty of its environment, which would allow it to perform Bayesian integration of its internal representations and its new sensory inputs to drive optimal inference. While there is convincing evidence that humans do compute this sensory uncertainty to guide their behavior, the actual neurobiological and computational principles on which uncertainty computations rely are still poorly understood. Here, we generated naturalistic stimuli of controlled uncertainty and performed a model-based analysis of their electrophysiological correlates in the primary visual cortex. Firstly, we report two layer-specific neuronal responses: infragranular-layer neurons were vulnerable to increments of uncertainty, in contrast to supragranular neurons, which were resilient, to the point of sometimes reducing the uncertainty of the input. Secondly, we used neural decoding to show that these two responses have two different functional population roles: vulnerable neurons encode only the sensory feature (here, orientation) of the input, while resilient neurons co-encode both the sensory feature and its uncertainty. Finally, we implemented a recurrent leaky integrate-and-fire neural network to mechanistically demonstrate that these different types of responses to uncertainty can be explained by different types of recurrent connectivity between cortical neurons. Overall, we provide neurobiological and computational evidence which pinpoints recurrent interactions as the neural substrate of computations on sensory uncertainty. This fits theoretical considerations on canonical microcircuits in the cortex, potentially establishing uncertainty computations as a new general role for local recurrent cortical connectivity.
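
For reference, the kind of model used in the last step, a discrete-time recurrent network of leaky integrate-and-fire units, can be written in a dozen lines. The parameters below are generic textbook values, not those of the fitted network.

```python
import numpy as np

def simulate_lif(W_rec, input_current, dt=1e-3, tau=20e-3, v_thresh=1.0, v_reset=0.0):
    """W_rec: (N, N) recurrent weights; input_current: (T, N) external drive.
    Returns a (T, N) array of binary spikes from leaky integrate-and-fire units."""
    T, N = input_current.shape
    v = np.zeros(N)
    spikes = np.zeros((T, N))
    for t in range(T):
        rec = W_rec @ spikes[t - 1] if t > 0 else np.zeros(N)   # recurrent drive from last step
        v += dt / tau * (-v + input_current[t] + rec)           # leaky integration
        fired = v >= v_thresh
        spikes[t, fired] = 1.0
        v[fired] = v_reset                                      # reset after a spike
    return spikes
```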

Precise spiking motifs in neurobiological and neuromorphic data

Why do neurons communicate through spikes? By definition, spikes are all-or-none neural events which occur at continuous times. In other words, spikes are on one side binary, existing or not without further details, and on the other can occur at any asynchronous time, without the need for a centralized clock. This stands in stark contrast to the analog representation of values and the discretized timing classically used in digital processing and at the base of modern-day neural networks. As neural systems almost systematically use this so-called event-based representation in the living world, a better understanding of this phenomenon remains a fundamental challenge in neurobiology in order to better interpret the profusion of recorded data. With the growing need for intelligent embedded systems, it also emerges as a new computing paradigm to enable the efficient operation of a new class of sensors and event-based computers, called neuromorphic, which could enable significant gains in computation time and energy consumption, a major societal issue in the era of the digital economy and global warming. In this review paper, we provide evidence from biology, theory and engineering that the precise timing of spikes plays a crucial role in our understanding of the efficiency of neural networks.

Detection of precise spiking motifs using spike-time dependent weight and delay plasticity

The spiking response of a biological neuron depends on the precise timing of afferent spikes. This temporal aspect of the neuronal code is essential in understanding information processing in neurobiology. In particular, raster plot analyses have shown repeated activations of specific spiking motifs, which exhibit a precise temporal sequence of neural activations. Our first contribution is to develop a model for the efficient detection of temporal spiking motifs based on a layer of neurons with hetero-synaptic delays. Indeed, the variety of synaptic delays on the dendritic tree allows synaptic inputs to be synchronized as they reach the basal dendritic tree. Second, we propose a bio-plausible unsupervised learning rule on both weights and delays through the derivation of a loss function which depends on the membrane potential of the spiking neuron and a sparseness regularization. We demonstrate on synthetic data that such a layer of spiking neurons is able to learn different repeating spatio-temporal motifs embedded in the spike train. Then, we test the robustness of the detection accuracy of the model by adding Poisson noise and compare it to a layer of Leaky Integrate-and-Fire neurons trained with STDP. Results show a large improvement in performance when temporal delays are added to the computations and a marked increase in robustness to noise. We show that using synaptic delays for neuronal computations greatly increases the representational capacity of a single neuron and its resilience to noise.
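
The core mechanism, synchronizing afferent spikes through heterogeneous synaptic delays before summing them on the membrane, can be sketched with dense binary spike trains as follows. This toy NumPy function only illustrates the forward computation, not the weight-and-delay learning rule.

```python
import numpy as np

def delayed_membrane_potential(spikes, weights, delays):
    """spikes: (n_pre, T) binary trains; weights, delays: (n_pre,) per-synapse parameters,
    delays in time bins. Each presynaptic train is shifted by its delay before the weighted
    sum: a motif whose internal timing matches the delays produces a sharp peak in v."""
    n_pre, T = spikes.shape
    v = np.zeros(T)
    for i in range(n_pre):
        d = int(delays[i])
        if d < T:
            v[d:] += weights[i] * spikes[i, :T - d]    # shift train i forward by d bins
    return v
```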

Pooling in a predictive model of V1 explains functional and structural diversity across species

Neurons in the primary visual cortex are selective to orientation with various degrees of selectivity to the spatial phase, from high selectivity in simple cells to low selectivity in complex cells. Various computational models have suggested a possible link between the presence of phase invariant cells and the existence of cortical orientation maps in higher mammals’ V1. These models, however, do not explain the emergence of complex cells in animals that do not show orientation maps. In this study, we build a model of V1 based on a convolutional network called Sparse Deep Predictive Coding (SDPC) and show that a single computational mechanism, pooling, allows the SDPC model to account for the emergence of complex cells as well as cortical orientation maps in V1, as observed in distinct species of mammals. By using different pooling functions, our model developed complex cells in networks that exhibit orientation maps (e.g., like in carnivores and primates) or not (e.g., rodents and lagomorphs). The SDPC can therefore be viewed as a unifying framework that explains the diversity of structural and functional phenomena observed in V1. In particular, we show that orientation maps emerge naturally as the most cost-efficient structure to generate complex cells under the predictive coding principle.
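
The role of pooling in producing phase invariance can be illustrated with the textbook energy model: an l2 pool over a quadrature pair of filter responses is nearly independent of stimulus phase, like a complex cell, whereas either rectified response alone is strongly phase-modulated, like a simple cell. The snippet below is only this classical illustration, not the SDPC pooling operator itself.

```python
import numpy as np

phases = np.linspace(0, 2 * np.pi, 200)                 # phase of a drifting grating
simple_even = np.maximum(np.cos(phases), 0)             # rectified response of an even filter
simple_odd = np.maximum(np.sin(phases), 0)              # rectified response of an odd filter
complex_like = np.sqrt(np.cos(phases) ** 2 + np.sin(phases) ** 2)   # l2 pool over the pair
print(simple_even.std(), complex_like.std())            # phase-modulated vs. nearly constant
```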

Ultra-rapid visual search in natural images using active deep learning

Visual search, that is, the simultaneous localization and detection of a visual target of interest, is a vital task. Applied to the case of natural scenes, searching for example for an animal (either a prey, a predator or a partner) constitutes a challenging problem due to the large variability over numerous visual dimensions such as shape, pose, size, texture or position. Yet, biological visual systems are able to perform such detection efficiently in briefly flashed scenes and in a very short amount of time. Deep convolutional neural networks (CNNs) have been shown to be well suited to the image classification task, providing human (or even super-human) performance. Previous models also managed to solve the visual search task by roughly dividing the image into sub-areas. This comes, however, at the cost of compute-intensive parallel processing on relatively low-resolution image samples. Taking inspiration from natural vision systems, we develop here a model that builds on the anatomical visual processing pathways observed in mammals, namely the What and the Where pathways. It operates in two steps: the first selects regions of interest, before knowing their actual visual content, through an ultra-fast, low-resolution analysis of the full visual field; the second provides a detailed categorization of the foveal region selected and attained with a saccade.

Revisiting Horizontal Connectivity Rules in V1: From like-to-like towards like-to-All

Horizontal connections in the primary visual cortex of carnivores, ungulates and primates organize on a near-regular lattice. Given the similar length scale for the regularity found in cortical orientation maps, the currently accepted theoretical standpoint is that these maps are underpinned by a like-to-like connectivity rule: horizontal axons connect preferentially to neurons with similar preferred orientation. However, there is reason to doubt the rule's explanatory power, since a growing number of quantitative studies show that the like-to-like connectivity preference and bias, mostly observed at short range, are highly variable at the neuron-to-neuron level and depend on the origin of the presynaptic neuron. Despite the wide availability of published data, the accepted model of visual processing has never been revised. Here, we review three lines of independent evidence supporting a much-needed revision of the like-to-like connectivity rule, ranging from anatomy to population functional measures, computational models and theoretical approaches. We advocate an alternative, distance-dependent connectivity rule that is consistent with new structural and functional evidence: from a like-to-like bias at short horizontal distance to like-to-all at long horizontal distance. This generic rule accounts for the observed high heterogeneity in interactions between the orientation and retinotopic domains, which we argue is necessary to process non-trivial stimuli in a task-dependent manner.

A Behavioral Receptive Field for Ocular Following in Monkeys: Spatial Summation and Its Spatial Frequency Tuning

In human and non-human primates, reflexive tracking eye movements can be initiated at very short latency in response to a rapid shift of the image. Previous studies in humans have shown that only a part of the central visual field is optimal for driving ocular following responses. Herein, we have investigated spatial summation of motion information across a wide range of spatial frequencies and speeds of drifting gratings by recording short-latency ocular following responses in macaque monkeys. We show that the optimal stimulus size for driving ocular responses covers a small (<20° diameter), central part of the visual field that shrinks with higher spatial frequency. This signature of linear motion integration remains invariant with speed and temporal frequency. For low and medium spatial frequencies, we found a strong suppressive influence from surround motion, evidenced by a decrease of response amplitude for stimulus sizes larger than optimal. Such suppression disappears with gratings at high frequencies. The contribution of peripheral motion was investigated by presenting grating annuli of increasing eccentricity. We observed an exponential decay of response amplitude with grating eccentricity, the decrease being faster for higher spatial frequencies. Weaker surround suppression can thus be explained by sparser eccentric inputs at high frequencies. A Difference-of-Gaussians model best renders the antagonistic contributions of peripheral and central motions. Its best-fit parameters coincide with several well-known spatial properties of area MT neuronal populations. These results describe the mechanism by which central motion information is automatically integrated in a context-dependent manner to drive ocular responses. Significance statement: Ocular following is driven by visual motion at ultra-short latency in both humans and monkeys. Its dynamics reflect the properties of low-level motion integration. Here, we show that a strong center-surround suppression mechanism modulates initial eye velocity. Its spatial properties are dependent upon the visual input's spatial frequency but are insensitive to either its temporal frequency or speed. These properties are best described with a Difference-of-Gaussians model of spatial integration. The model parameters reflect many spatial characteristics of motion-sensitive neuronal populations in monkey area MT. Our results further outline the computational properties of the behavioral receptive field underpinning automatic, context-dependent motion integration.
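
The Difference-of-Gaussians account of spatial summation predicts the response to a centred disk stimulus as the integral, over the disk, of a narrow excitatory Gaussian minus a broader inhibitory one, which yields the rise-then-suppression size-tuning curve described above. The sketch below uses arbitrary illustrative parameters, not the best-fit values from the paper.

```python
import numpy as np

def dog_response(radius, ke=1.0, se=3.0, ki=0.05, si=10.0):
    """Integrated Difference-of-Gaussians drive for a centred disk of given radius (deg):
    a narrow excitatory Gaussian (ke, se) minus a broad inhibitory one (ki, si)."""
    r = np.linspace(0.0, radius, 500)
    dr = r[1] - r[0]
    integrand = ke * np.exp(-r**2 / (2 * se**2)) - ki * np.exp(-r**2 / (2 * si**2))
    return np.sum(integrand * 2 * np.pi * r) * dr       # integrate over the stimulus disk

diameters = np.arange(1, 41)                            # stimulus diameter in degrees
size_tuning = [dog_response(d / 2) for d in diameters]  # rises to an optimum, then is suppressed
```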

From event-based computations to a bio-plausible Spiking Neural Network

We propose a neuromimetic online classifier for always-on digit recognition. To achieve this, we extend an existing event-based algorithm which introduced novel spatio-temporal features: time surfaces. Built from asynchronous events acquired by a neuromorphic camera, these time surfaces make it possible to code the local dynamics of a visual scene and to create an efficient hierarchical event-based pattern recognition architecture. Its formalism was previously adapted to the computational neuroscience domain by showing that it may be implemented using a Spiking Neural Network (SNN) of leaky integrate-and-fire models and Hebbian learning. Here, we add an online classification layer using a multinomial logistic regression which is compatible with a neural implementation. A decision can be taken at any arbitrary time by taking the argmax of the probability values associated with each class. We extend the parallel with computational neuroscience by demonstrating that this classification layer is also equivalent to a layer of spiking neurons with a Hebbian-like learning mechanism. Our method obtains state-of-the-art performance on the N-MNIST dataset and we show that it is robust to both spatial and temporal jitter. In summary, we were able to develop a neuromimetic SNN model for online digit classification. We aim to pursue the study of this architecture for natural scenes and hope to offer insights into the efficiency of neural computations, and in particular into how mechanisms of decision-making may be formed.

A homeostatic gain control mechanism to improve event-driven object recognition

We propose a neuromimetic architecture able to perform pattern recognition. To achieve this, we extended the existing event-based algorithm from Lagorce et al (2017) which introduced novel spatio-temporal features: time surfaces. Built from asynchronous events acquired by a neuromorphic camera, these time surfaces make it possible to code the local dynamics of a visual scene and to create an efficient hierarchical event-based pattern recognition architecture. Inspired by biological findings and the efficient coding hypothesis, our main contribution is to integrate homeostatic regulation into the Hebbian learning rule. Indeed, in order to be optimally informative, average neural activity within a layer should be equally balanced across neurons. We used that principle to regularize neurons within the same layer by setting a gain depending on their past activity, such that they emit spikes with balanced firing rates. The efficiency of this technique was first demonstrated through a robust improvement in the spatio-temporal patterns learnt during the training phase. In order to compare with state-of-the-art methods, we replicated past results on the same dataset as Lagorce et al (2017) and extended the results of this study to the widely used N-MNIST dataset.
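
The homeostatic regulation can be summarised as: track each neuron's recent activation probability and rescale its similarity to the input so that under-used neurons get a chance to win the competition. The class below is a minimal sketch of that idea with an exponential gain, not the exact gain function used in the paper.

```python
import numpy as np

class HomeostaticGain:
    """Keeps a running estimate of each neuron's activation probability and rescales
    similarities so that all neurons converge towards the same target firing rate."""
    def __init__(self, n_neurons, eta=0.01):
        self.p = np.full(n_neurons, 1.0 / n_neurons)   # running activation probabilities
        self.target = 1.0 / n_neurons
        self.eta = eta

    def __call__(self, similarities):
        gain = np.exp(-(self.p - self.target) / self.target)   # boost under-active neurons
        winner = int(np.argmax(gain * similarities))
        onehot = np.zeros_like(self.p)
        onehot[winner] = 1.0
        self.p += self.eta * (onehot - self.p)                 # update the running estimate
        return winner
```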

A robust bio-inspired approach to event-driven object recognition

We propose a neuromimetic architecture able to perform online pattern recognition. To achieve this, we extended the existing event-based algorithm from Lagorce et al (2017) which introduced novel spatio-temporal features: time surfaces. Built from asynchronous events acquired by a neuromorphic camera, these time surfaces make it possible to code the local dynamics of a visual scene and to create an efficient hierarchical event-based pattern recognition architecture. Inspired by biological findings and the efficient coding hypothesis, our main contribution is to integrate homeostatic regulation into the Hebbian learning rule. Indeed, in order to be optimally informative, average neural activity within a layer should be equally balanced across neurons. We used that principle to regularize neurons within the same layer by setting a gain depending on their past activity, such that they emit spikes with balanced firing rates. The efficiency of this technique was first demonstrated through a robust improvement in the spatio-temporal patterns learned during the training phase. We validated classification performance on the widely used N-MNIST dataset, reaching 87.3 percent accuracy with homeostasis compared to 72.5 percent accuracy without homeostasis. Finally, studying the impact of input jitter on classification highlights the resilience of this method. We expect to extend this fully event-driven approach to more naturalistic tasks, notably for ultra-fast object categorization.

Sparse Deep Predictive Coding captures contour integration capabilities of the early visual system

Both neurophysiological and psychophysical experiments have pointed out the crucial role of recurrent and feedback connections to process context-dependent information in the early visual cortex. While numerous models have accounted for feedback effects at either neural or representational level, none of them were able to bind those two levels of analysis. Is it possible to describe feedback effects at both levels using the same model? We answer this question by combining Predictive Coding (PC) and Sparse Coding (SC) into a hierarchical and convolutional framework. In this Sparse Deep Predictive Coding (SDPC) model, the SC component models the internal recurrent processing within each layer, and the PC component describes the interactions between layers using feedforward and feedback connections. Here, we train a 2-layered SDPC on two different databases of images, and we interpret it as a model of the early visual system (V1 & V2). We first demonstrate that once the training has converged, SDPC exhibits oriented and localized receptive fields in V1 and more complex features in V2. Second, we analyze the effects of feedback on the neural organization beyond the classical receptive field of V1 neurons using interaction maps. These maps are similar to association fields and reflect the Gestalt principle of good continuation. We demonstrate that feedback signals reorganize interaction maps and modulate neural activity to promote contour integration. Third, we demonstrate at the representational level that the SDPC feedback connections are able to overcome noise in input images. Therefore, the SDPC captures the association field principle at the neural level which results in better disambiguation of blurred images at the representational level.

Modelling Complex-cells and topological structure in the visual cortex of mammals using Sparse Predictive Coding

Cells in the primary visual cortex of mammals (V1) have historically been divided into two classes: simple and complex. Simple cells exhibit a rectified linear response to oriented visual stimuli while complex cells show various degrees of invariance with respect to the stimulus’ phase (position). The existence of these two populations can be explained by hierarchical models where simple cells feed information into complex cells through a non-linear spatial pooling [1]. Nevertheless, how the brain develops this structure remains an open question. One of the most successful theories to model hierarchical processing in the brain is Predictive Coding (PC): a framework introduced by Rao & Ballard [2] that exploits feedback and feedforward connectivity to solve a Bayesian inference problem. We extended the classical PC to account for a sparse representation of the input data (natural images) and a convolutional structure to allow translation invariance. We demonstrate that this framework, called Sparse Deep Predictive Coding (SDPC) [3], can easily replicate complex-like neurons when a non-linear pooling is included between the layers. In particular, we show that a large population of complex-like neurons, showing various degrees of phase invariance, emerges in the 2nd layer of the model when the pooling function is extended to include not only neighboring spatial locations but also neighboring neurons with different tuning properties. We trained various networks on natural images (STL-10 dataset). To quantify the complex behavior of the model neurons, we used the modulation ratio F1/F0 [4]: if F1/F0 ≥ 1 the cell is identified as simple-like, if F1/F0 < 1 the cell is complex-like. In all the tested settings the 1st layer of the network exhibits simple-like neurons, while the 2nd layer always presents a fraction of complex-like cells. We observed the emergence of different behaviors by introducing different types of non-linear pooling. A striking emergent property of our model is that these non-linearities induce a topographical structure on the neurons of the network. This organization shows qualitatively strong similarities with that found in V1. Importantly, this organization is solely a consequence of the feedback connection coming from the 2nd layer: enforcing the pooling across neighboring neurons constrains neighboring simple cells to encode similar features. In particular, edge-like filters with similar orientations and frequencies, but different phases, tend to be grouped together. We were able to reproduce similar effects with two different types of pooling (l2-pooling, max-pooling) and different network sizes, obtaining different degrees of topographical organization and different ratios of complex-like cells. A possible explanation of this phenomenon lies in the computational bottleneck caused by a reduction in the network layers’ size. This is also in line with previous studies that showed how cortical networks with the same functional characterization can explain different behavior with different structural parameters [5]. The novelty of this model lies in its ability to highlight the link between structure and function in a neural network. This study addresses a long-debated question on the function and role of the diversity of topographical structure in the visual cortex of mammals across species. References: [1] Ko Sakai and Shigeru Tanaka. Spatial pooling in the second-order spatial structure of cortical complex cells. In: Vision Research 40.7 (2000), pp. 855–871.
[2] Rajesh PN Rao and Dana H Ballard. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. In: Nature Neuroscience 2.1 (1999), p. 79. [3] Victor Boutin et al. Sparse Deep Predictive Coding captures contour integration capabilities of the early visual system. In: arXiv preprint arXiv:1902.07651 (2019). [4] Bernt C Skottun et al. Classifying simple and complex cells on the basis of response modulation. In: Vision Research 31.7-8 (1991), pp. 1078–1086. [5] Jaeson Jang, Min Song, and Se-Bum Paik. Classification of columnar and salt-and-pepper organization in mammalian visual cortex. In: bioRxiv (2019), p. 698043.
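
For reference, the modulation ratio used above is simply the amplitude of the response component at the grating's drift frequency (F1) divided by the mean response (F0). A minimal way of computing it from a peri-stimulus time histogram covering an integer number of cycles (the `psth` array is a hypothetical input, not data from the study):

```python
import numpy as np

def modulation_ratio(psth, n_cycles):
    """psth: firing rate sampled over exactly `n_cycles` cycles of the drifting grating.
    F0 is the mean rate; F1 is the amplitude of the Fourier component at the drift frequency."""
    spectrum = np.fft.rfft(psth) / len(psth)
    f0 = np.abs(spectrum[0])               # mean firing rate
    f1 = 2 * np.abs(spectrum[n_cycles])    # amplitude at the stimulus temporal frequency
    return f1 / f0                         # >= 1: simple-like, < 1: complex-like
```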

A dual foveal-peripheral visual processing model implements efficient saccade selection

In computer vision, the visual search task consists in extracting a scarce and specific visual information (the target) from a large and crowded visual display. This task is usually implemented by scanning the different possible target identities at all possible spatial positions, hence with a strong computational load. The human visual system employs a different strategy, combining a foveated sensor with the capacity to rapidly move the center of fixation using saccades. Saccade-based visual exploration can be idealized as an inference process, assuming that the target position and category are independently drawn from a common generative process. Knowing that process, visual processing is then separated in two specialized pathways, the where pathway mainly conveying information about target position in peripheral space, and the what pathway mainly conveying information about the category of the target. We consider here a dual neural network architecture that independently learns where to look and then what to see. This allows in particular the inference of target position in retinotopic coordinates, independently of its category. This framework was tested on a simple task of finding digits in a large, cluttered image. Simulation results demonstrate the benefit of specifically learning where to look before actually knowing the target category. The approach is also energy-efficient as it includes the strong compression rate performed at the sensor level, by retina and V1 encoding, which is preserved up to the action selection level, highlighting the advantages of bio-mimetic strategies with regard to traditional computer vision when computing resources are at stake.

Study of a Hierarchical Sparse and Predictive Coding Algorithm: Towards a Bio-Inspired Model of Visual Perception

The concise and efficient representation of information is a problem that occupies a central place in machine learning. The brain, and more particularly the visual cortex, has long since found efficient and robust solutions to this problem. At the local scale, sparse coding is one of the most promising mechanisms for modelling information processing within neural populations in the visual cortex. At the structural scale, predictive coding suggests that the descending signals observed in the visual cortex modulate neural activity to include contextual details in the ascending flow of information. This thesis proposes to combine sparse coding and predictive coding within a hierarchical and convolutional model. From a computational point of view, we demonstrate that the feedback connections introduced by predictive coding allow a better and faster convergence of the model. Furthermore, we analyse the effects of feedback connections on the organisation of neural populations, as well as their consequences on the way our algorithm represents images. We show that feedback connections reorganise the association fields of neurons in V1 to allow better contour integration. In addition, we observe that these connections allow a better reconstruction of noisy images. Our results suggest that taking inspiration from neuroscience provides a promising framework for developing more efficient and more robust computer vision algorithms.

Effect of top-down connections in Hierarchical Sparse Coding
Effect of top-down connections in Hierarchical Sparse Coding

Hierarchical Sparse Coding (HSC) is a powerful model to efficiently represent multi-dimensional, structured data such as images. The simplest solution to solve this computationally hard problem is to decompose it into independent layer-wise subproblems. However, neuroscientific evidence would suggest inter-connecting these subproblems as in the Predictive Coding (PC) theory, which adds top-down connections between consecutive layers. In this study, a new model called 2-Layers Sparse Predictive Coding (2L-SPC) is introduced to assess the impact of this inter-layer feedback connection. In particular, the 2L-SPC is compared with a Hierarchical Lasso (Hi-La) network made out of a sequence of independent Lasso layers. The 2L-SPC and the 2-layer Hi-La networks are trained on 4 different databases and with different sparsity parameters on each layer. First, we show that the overall prediction error generated by 2L-SPC is lower thanks to the feedback mechanism, as it transfers prediction error between layers. Second, we demonstrate that the inference stage of the 2L-SPC is faster to converge than for the Hi-La model. Third, we show that the 2L-SPC also accelerates the learning process. Finally, the qualitative analysis of both models' dictionaries, supported by their activation probabilities, shows that the 2L-SPC features are more generic and informative.
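
To make the contrast between independent layer-wise inference and inter-connected layers concrete, here is a minimal ISTA-style sketch of two-layer sparse coding in which the first layer is additionally pulled toward the second layer's prediction; the dictionary shapes, step sizes and feedback weight k are illustrative assumptions, not the published 2L-SPC hyper-parameters.

```python
import numpy as np

def soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def two_layer_sparse_coding(x, D1, D2, lam1=0.1, lam2=0.1, k=0.5,
                            n_iter=100, lr=0.1):
    """Infer sparse codes g1, g2 such that x ~ D1 g1 and g1 ~ D2 g2.

    With k = 0 the two problems decouple (Hi-La-like); with k > 0 the
    second layer's prediction D2 g2 feeds back onto the first layer
    (2L-SPC-like).
    """
    g1 = np.zeros(D1.shape[1])
    g2 = np.zeros(D2.shape[1])
    for _ in range(n_iter):
        # layer 1: reconstruct the input and (optionally) match the top-down prediction
        grad1 = D1.T @ (D1 @ g1 - x) + k * (g1 - D2 @ g2)
        g1 = soft(g1 - lr * grad1, lr * lam1)
        # layer 2: explain the first layer's code
        grad2 = D2.T @ (D2 @ g2 - g1)
        g2 = soft(g2 - lr * grad2, lr * lam2)
    return g1, g2

# toy usage with random, column-normalized dictionaries
rng = np.random.default_rng(0)
D1 = rng.standard_normal((64, 128)); D1 /= np.linalg.norm(D1, axis=0)
D2 = rng.standard_normal((128, 32)); D2 /= np.linalg.norm(D2, axis=0)
x = rng.standard_normal(64)
g1, g2 = two_layer_sparse_coding(x, D1, D2)
```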

Humans adapt their anticipatory eye movements to the volatility of visual motion properties
Humans adapt their anticipatory eye movements to the volatility of visual motion properties

Humans are able to accurately track a moving object with a combination of saccades and smooth eye movements. These movements allow us to align and stabilize the object on the fovea, thus enabling high-resolution visual analysis. When predictive information is available about target motion, anticipatory smooth pursuit eye movements (aSPEM) are efficiently generated before target appearance, which reduces the typical sensorimotor delay between target motion onset and foveation. It is generally assumed that the role of anticipatory eye movements is to limit the behavioral impairment due to the eye-to-target position and velocity mismatch. By manipulating the probability of target motion direction we were able to bias the direction and mean velocity of aSPEM, as measured during a fixed-duration gap before target ramp-motion onset. This suggests that probabilistic information may be used to inform the internal representation of motion prediction for the initiation of anticipatory movements. However, such an estimate may become particularly challenging in a dynamic context, where the probabilistic contingencies vary in time in an unpredictable way. In addition, whether and how the information processing underlying the build-up of aSPEM is linked to an explicit estimate of probabilities is unknown. We developed a new paired-task paradigm in order to address these two questions. In a first session, participants observe a target moving horizontally with constant speed from the center either to the right or left across trials. The probability of either motion direction changes randomly in time. Participants are asked to estimate "how much they are confident that the target will move to the right or left in the next trial" and to adjust the cursor’s position on the screen accordingly. In a second session the participants’ eye movements are recorded during the observation of the same sequence of random-direction trials. In parallel, we are developing new automatic routines for the advanced analysis of oculomotor traces. In order to extract the relevant parameters of the oculomotor responses (latency, gain, initial acceleration, catch-up saccades), we developed new tools based on a best-fitting procedure of predefined patterns (i.e. the typical smooth pursuit velocity profile).

Learning dynamics in a neural network model of the primary visual cortex

Background: The primary visual cortex (V1) is a key component of the visual system that builds some of the first levels of coherent visual representations from sparse visual inputs. While the study of its dynamics has been the focus of many computational models over the past years, there are still relatively few research works that put an emphasis on both synaptic plasticity in V1 and biorealism in the context of learning visual inputs. Here, we present a recurrent spiking neural network that is capable of spike-timing-dependent plasticity (STDP) and we demonstrate its capacity to discriminate spatio-temporal orientation patterns in noisy natural images. Methods: A two-stage model was developed. First, natural image streams (videos, gratings, or camera input) were converted into spikes, using a difference-of-Gaussians (DoG) approach. This transformation approximates the retina-lateral geniculate nucleus (LGN) organization. Secondly, a spiking neural network was built using the PyNN simulator, mimicking cortical neuron dynamics and plasticity, as well as V1 topology. This network was then fed with spikes generated by the first model and its ability to build visual representations was assessed using control grating inputs. Results: The neural network exhibited several interesting properties. After a short period of learning, it was capable of learning multiple orientations and reducing noise in the learned features, compared to the inputs. These learned features were stable even after increasing the noise in the inputs and were found to encode not only the spatial properties of the input but also its temporal aspects (i.e., the time of each grating presentation). Conclusions: Our work shows that topological structuring of cortical neural networks, combined with simple plasticity rules, is sufficient to drive strong learning dynamics of natural image properties. This computational model fits many properties found in the literature and provides some theoretical explanations for the shape of tuning curves in certain layers of V1. Further investigations are now conducted to validate its properties against the neuronal responses of rodents, using identical visual stimuli.
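
The abstract does not give the exact plasticity rule, so as an illustration of the STDP mechanism mentioned above, here is a generic pair-based update with exponential windows; the amplitudes and time constants are placeholder values, not those of the PyNN model.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Pair-based STDP: potentiate if the presynaptic spike precedes the
    postsynaptic one, depress otherwise (spike times in ms)."""
    dt = t_post - t_pre
    if dt > 0:                          # pre before post -> potentiation
        w += a_plus * np.exp(-dt / tau_plus)
    else:                               # post before pre -> depression
        w -= a_minus * np.exp(dt / tau_minus)
    return float(np.clip(w, w_min, w_max))

print(stdp_update(0.5, t_pre=10.0, t_post=15.0))   # slightly potentiated
print(stdp_update(0.5, t_pre=15.0, t_post=10.0))   # slightly depressed
```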

From the retina to action: Dynamics of predictive processing in the visual system
From the retina to action: Dynamics of predictive processing in the visual system

Within the central nervous system, visual areas are essential in transforming the raw luminous signal into a representation which efficiently conveys information about the environment. This process is constrained by the necessity of being robust and rapid. Indeed, there exists both a wide variety of potential changes in the geometrical characteristics of the visual scene and also a necessity to be able to respond as quickly as possible to the incoming sensory stream, for instance to drive a movement of the eyes to the location of a potential danger. Decades of study in neurophysiology and psychophysics at the different levels of vision have shown that this system takes advantage of a priori knowledge about the structure of visual information, such as the regularity in the shape and motion of visual objects. As such, the predictive processing framework offers a unified theory to explain a variety of visual mechanisms. However, we still lack a global normative approach unifying those mechanisms and we will review here some recent and promising approaches. First, we will describe Active Inference, a form of predictive processing equipped with the ability to actively sample the visual space. Then, we will extend this paradigm to the case where information is distributed on a topography, such as is the case for retinotopically organized visual areas. In particular, we will compare such models in light of recent neurophysiological data showing the role of traveling waves in shaping visual processing. Finally, we will propose some lines of research to understand how these functional models may be implemented at the neural level. In particular, we will review potential models of cortical processing in terms of prototypical micro-circuits. These allow us to separate the different flows of information, from feed-forward prediction error to feed-back anticipation error. However, the design of such a generic predictive processing circuit is still not fully understood, and we will enumerate some possible implementations using biomimetic neural networks.

A dynamic model for decoding direction and orientation in macaque primary visual cortex

When objects are in motion, the local orientation of their contours and the direction of motion are two essential components of visual information which are processed in parallel in the early visual areas. Generally, to probe a neuron’s response property to moving stimuli, bars or gratings are drifted across the neuron’s receptive field at various angles. The resulting tuning curve will reflect the confounded selectivity to both the orientation and the direction of motion orthogonal to that orientation. Focusing on the primary visual cortex of the macaque monkey (V1), we challenged different models for the joint representation of orientation and direction within the neural activity. Precisely, we considered the response of V1 neurons to an oriented moving bar to investigate whether, and how, the information about the bar’s orientation and direction could be encoded dynamically at the population activity level. For that purpose, we used a decoding approach based on a space-time receptive field model that jointly encodes orientation and direction. Then, using this model and a maximum likelihood paradigm, we inferred the most likely representation for a given network activity [1, 2]. We tested this model on surrogate data and on extracellular recordings in area V1 of awake macaque monkeys in response to oriented bars moving in 12 different directions. Using a cross-validation method we could robustly decode both the orientation and the direction of the bar within the classical receptive field (cRF). Furthermore, this decoding approach shows different properties: First, information about the orientation and direction of the bar emerges before the bar enters the cRF. Second, when testing different orientations with the same direction, our approach unravels that we can "unconfound" the information about direction and orientation by decoding them independently. Finally, our results demonstrate that the orientation and the direction of motion of an ambiguous moving bar can be progressively decoded in V1. This is a signature of a dynamic solution to the aperture problem in area V1, similar to what was already found in area MT [3]. [1] M. Jazayeri and J.A. Movshon. Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5):690–696, 2006. [2] W. Taouali, G. Benvenuti, P. Wallisch, F. Chavane, L. Perrinet. Testing the Odds of Inherent versus Observed Over-dispersion in Neural Spike Counts. Journal of Neurophysiology, 2015. [3] C. Pack, R. Born. Temporal dynamics of a neural solution to the aperture problem in visual area MT of macaque brain. Nature, 409(6823), 1040–1042. 2001.
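
As a minimal sketch of the maximum-likelihood decoding step described above (in the spirit of Jazayeri & Movshon [1] rather than the authors' exact pipeline), assuming independent Poisson spike counts and hypothetical von Mises-like tuning curves:

```python
import numpy as np

def poisson_ml_decode(counts, tuning, angles):
    """Maximum-likelihood decoding of a stimulus angle from population spike counts.

    counts : (n_neurons,) observed spike counts on one trial
    tuning : (n_neurons, n_angles) expected counts for each candidate angle
    angles : (n_angles,) candidate stimulus angles (radians)
    """
    # Poisson log-likelihood up to a constant: sum_i n_i log f_i(theta) - f_i(theta)
    loglik = counts @ np.log(tuning) - tuning.sum(axis=0)
    return angles[np.argmax(loglik)]

# toy population of direction-tuned neurons with von Mises tuning curves
rng = np.random.default_rng(1)
angles = np.linspace(0, 2 * np.pi, 360, endpoint=False)
prefs = np.linspace(0, 2 * np.pi, 24, endpoint=False)
tuning = 5 + 20 * np.exp(2 * (np.cos(prefs[:, None] - angles[None, :]) - 1))
true_dir = np.deg2rad(120)
rates = 5 + 20 * np.exp(2 * (np.cos(prefs - true_dir) - 1))
counts = rng.poisson(rates)
print(np.rad2deg(poisson_ml_decode(counts, tuning, angles)))  # close to 120
```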

Effet de La Variabilité de La Vitesse Sur Le Mouvement de Poursuite Oculaire Lente et Sur La Perception de La Vitesse

We explained how the visual system integrates motion information by manipulating the local speed distribution, using a well-controlled class of broadband random-texture stimuli called Motion Clouds (MCs), with continuous naturalistic spatiotemporal frequency spectra. Our results show that pursuit gain and accuracy deteriorate as the variability of the stimulus speed increases. In the perceptual speed discrimination experiment, we found that MCs with a slightly larger speed bandwidth were perceived as moving faster. However, beyond a critical bandwidth, the perception of a coherent speed was lost. In a third, direction discrimination experiment, we found that for large-bandwidth MCs participants could no longer discriminate the direction of motion. These results suggest that when the speed bandwidth is increased from small to large values, the observer experiences different perceptual regimes. We finally carried out a Maximum Likelihood Difference Scaling experiment with our MC stimuli in order to investigate these different possible perceptual regimes. We identified three regimes within the range of tested speed-difference values, which would correspond to motion coherence, motion transparency and complete motion incoherence.

Sparse Deep Predictive Coding captures contour integration capabilities of the early visual system

Both neurophysiological and psychophysical experiments have pointed out the crucial role of recurrent and feedback connections to process context-dependent information in the early visual cortex. While numerous models have accounted for feedback effects at either the neural or the representational level, none of them were able to bind those two levels of analysis. Is it possible to describe feedback effects at both levels using the same model? We answer this question by combining Predictive Coding (PC) and Sparse Coding (SC) into a hierarchical and convolutional framework. In this Sparse Deep Predictive Coding (SDPC) model, the SC component models the internal recurrent processing within each layer, and the PC component describes the interactions between layers using feedforward and feedback connections. Here, we train a 2-layered SDPC on two different databases of images, and we interpret it as a model of the early visual system (V1 and V2). We first demonstrate that once the training has converged, the SDPC exhibits oriented and localized receptive fields in V1 and more complex features in V2. Second, we analyze the effects of feedback on the neural organization beyond the classical receptive field of V1 neurons using interaction maps. These maps are similar to association fields and reflect the Gestalt principle of good continuation. We demonstrate that feedback signals reorganize interaction maps and modulate neural activity to promote contour integration. Third, we demonstrate at the representational level that the SDPC feedback connections are able to overcome noise in input images. Therefore, the SDPC captures the association field principle at the neural level, which results in better disambiguation of blurred images at the representational level.

Speed-Selectivity in Retinal Ganglion Cells is Sharpened by Broad Spatial Frequency, Naturalistic Stimuli
Speed-Selectivity in Retinal Ganglion Cells is Sharpened by Broad Spatial Frequency, Naturalistic Stimuli

Motion detection represents one of the critical tasks of the visual system and has motivated a large body of research. However, it remains unclear precisely why the response of retinal ganglion cells (RGCs) to simple artificial stimuli does not predict their response to complex, naturalistic stimuli. To explore this topic, we use Motion Clouds (MCs), which are synthetic textures that preserve properties of natural images and are fully parameterized, in particular by modulating the spatiotemporal spectrum complexity of the stimulus through adjustment of the frequency bandwidths. By stimulating the retina of the diurnal rodent Octodon degus with MCs, we show that the RGCs respond to increasingly complex stimuli by narrowing their motion tuning curves. At the level of the population, complex stimuli produce a sparser code while preserving motion information; therefore, the stimuli are encoded more efficiently. Interestingly, these properties were observed throughout different populations of RGCs. Thus, our results reveal that the response at the level of RGCs is modulated by the naturalness of the stimulus, in particular for motion, which suggests that the tuning to the statistics of natural images already emerges at the level of the retina.

Orientation selectivity to synthetic natural patterns in a cortical-like model of the cat primary visual cortex

A key property of the neurons in the primary visual cortex (V1) is their selectivity to oriented stimuli in the visual field. Orientation selectivity allows the segmentation of objects in natural visual scenes, which is the first step in building integrative representations from retinal inputs. As such, V1 has always been of central interest in creating artificial neural networks, and recent years have seen a growing interest in the creation of explainable yet robust and adaptive models of cortical visual processes, for fundamental or applied purposes. One notable challenge for those models is to behave reliably in generic natural environments, where information is usually hidden in noise, while most models are typically studied with oriented gratings. Here we show that a simple biologically inspired neural network accounts for orientation selectivity to natural-like textures in the cat’s primary visual cortex. Our spiking neural network (SNN) is made of point neurons organized in recurrent and hierarchical layers based on the structure of cortical layers IV and II/III. We found that spike-timing-dependent plasticity and synaptic recurrence allowed the SNN to self-organize its connection weights and reproduce the activity of neurons recorded with laminar probes in cortical areas 17 and 18 of cats, notably orientation tuning responses. After less than 5 seconds of stimulus presentation, the SNN displays narrow orientation selectivity (bandwidth = 10 degrees) characteristic of sparse representations, removes noise from the input and learns the structure of natural pattern repetitions. Our results support the use of natural stimuli to study theoretical and experimental cortical dynamics. Furthermore, this model encourages using SNNs to reduce complexity in cortical networks as a method to understand the separate contribution of different components in the laminar organization of the cortex. From an applied perspective, the computations this network performs could also be used as an alternative to classical black-box deep learning models used in artificial vision.

An adaptive homeostatic algorithm for the unsupervised learning of visual features

The formation of structure in the visual system, that is, of the connections between cells within neural populations, is by and large an unsupervised learning process: the emergence of this architecture is mostly self-organized. In the primary visual cortex of mammals, for example, one can observe during development the formation of cells selective to localized, oriented features, which results in the development of a representation of contours in area V1. We modeled such a process using sparse Hebbian learning algorithms. These algorithms alternate a coding step to encode the information with a learning step to find the proper encoder. We identified here a major difficulty of classical solutions: their ability to infer a good representation while the encoders are still immature, and to learn good encoders from a non-optimal representation. To solve this problem, we propose to introduce a new regulation process between learning and coding, called homeostasis. It is compatible with a neuromimetic architecture and allows for a more efficient emergence of localized filters sensitive to orientation. The key to this algorithm lies in a simple adaptation mechanism based on non-linear functions that reconciles the antagonistic processes that occur at the coding and learning time scales. We tested this unsupervised algorithm with this homeostasis rule for a series of learning algorithms coupled with different neural coding algorithms. In addition, we propose a simplification of this optimal homeostasis rule by implementing a simple heuristic on the probability of activation of neurons. Compared to the optimal homeostasis rule, we show that this heuristic allows implementing a faster unsupervised learning algorithm while retaining much of its effectiveness. These results demonstrate the potential application of such a strategy in computer vision and machine learning, and we illustrate it with a result in a convolutional neural network.
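
A minimal sketch of the activation-probability heuristic mentioned above (illustrative only, not the published rule): each dictionary atom keeps a running estimate of how often it is selected, and a gain penalizes over-used atoms during the greedy selection step of matching pursuit.

```python
import numpy as np

def homeostatic_matching_pursuit(X, D, n_active=5, eta=0.01):
    """Sparse-code each sample with matching pursuit while equalizing atom usage.

    X : (n_samples, n_dims) data;  D : (n_atoms, n_dims) unit-norm dictionary.
    A per-atom gain is lowered when an atom fires more often than average,
    which pushes the competition toward rarely used atoms.
    """
    n_atoms = D.shape[0]
    usage = np.full(n_atoms, 1.0 / n_atoms)          # running activation probability
    codes = np.zeros((len(X), n_atoms))
    for i, x in enumerate(X):
        residual = x.copy()
        for _ in range(n_active):
            gain = np.exp(-usage * n_atoms)          # heuristic homeostatic gain
            corr = D @ residual
            k = np.argmax(np.abs(corr) * gain)       # biased greedy selection
            codes[i, k] += corr[k]
            residual -= corr[k] * D[k]
            usage += eta * (np.eye(n_atoms)[k] - usage)   # update selection frequency
    return codes

# toy usage with a random unit-norm dictionary
rng = np.random.default_rng(2)
D = rng.standard_normal((32, 16)); D /= np.linalg.norm(D, axis=1, keepdims=True)
X = rng.standard_normal((100, 16))
codes = homeostatic_matching_pursuit(X, D)
```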

A hierarchical, multi-layer convolutional sparse coding algorithm based on predictive coding

Sparse coding holds the idea that signals can be concisely described as a linear mixture of a few components (called atoms) picked from a bigger set of primary kernels (called a dictionary). This framework has long been used to model the strategy employed by the mammalian primary visual cortex (V1) to detect low-level features, in particular oriented edges, in natural scenes. Differently, predictive coding is a prominent tool used to model hierarchical neural dynamics: high-level cortical layers predict at best the activity of lower-level ones and this prediction is sent back through a feedback connection between the layers. This defines a recursive loop in which the prediction error is integrated with the sensory input and fed forward to refine the quality of the prediction. We propose a Sparse Deep Predictive Coding algorithm (SDPC) that exploits convolutional dictionaries and a feedback information flow for meaningful, hierarchical feature learning in static images. The proposed architecture allows us to add arbitrary non-linear spatial transformation stages between each layer of the hierarchical sparse representations, such as Max-Pooling or Spatial Transformer layers. The SDPC consists of a dynamical system in the form of a convolutional neural network, analogous to the model proposed by Rao and Ballard, 1999. The state variables are sparse feature maps encoding the input and the feedback signals, while the parameters of the system are convolutional dictionaries optimized through Hebbian learning. We observed that varying the strength of the feedback modulates the overall sparsity of low-level representations (lower feedback scales correspond to a less sparse activity), but without changing the exponential shape of the distribution of the sparse prior. This model could shed light on the role of sparsity and feedback modulation in hierarchical feature learning, with important applications in signal processing (data compression), computer vision (by extending it to dynamic scenes) and computational neuroscience, notably by using more complex priors like group sparsity to model topological organization in the cortex.

Bayesian Modeling of Motion Perception using Dynamical Stochastic Textures
Bayesian Modeling of Motion Perception using Dynamical Stochastic Textures

A common practice to account for psychophysical biases in vision is to frame them as consequences of a dynamic process relying on optimal inference with respect to a generative model. The present study details the complete formulation of such a generative model intended to probe visual motion perception. It is first derived in a set of axiomatic steps constrained by biological plausibility. We then extend previous contributions by detailing three equivalent formulations of the Gaussian dynamic texture model. First, the composite dynamic textures are constructed by the random aggregation of warped patterns, which can be viewed as 3D Gaussian fields. Second, these textures are cast as solutions to a stochastic partial differential equation (sPDE). This essential step enables real time, on-the-fly, texture synthesis using time-discretized auto-regressive processes. It also allows for the derivation of a local motion-energy model, which corresponds to the log-likelihood of the probability density. The log-likelihoods are finally essential for the construction of a Bayesian inference framework. We use the model to probe speed perception in humans psychophysically using zoom-like changes in stimulus spatial frequency content. The likelihood is contained within the generative model and we chose a slow speed prior consistent with previous literature. We then validated the fitting process of the model using synthesized data. The human data replicates previous findings that relative perceived speed is positively biased by spatial frequency increments. The effect cannot be fully accounted for by previous models, but the current prior acting on the spatio-temporal likelihoods has proved necessary in accounting for the perceptual bias.
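
The sPDE/auto-regressive formulation mentioned above can be caricatured by a two-frame recursion in which each new frame combines the two previous ones, a rigid shift implementing the mean drift, and fresh spatially filtered noise; the coefficients and filter widths below are arbitrary illustrations, not the calibrated model of the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def dynamic_texture(n_frames=100, size=128, a=1.6, b=-0.8, v=(0.0, 1.0),
                    noise_sigma=2.0, seed=0):
    """Toy AR(2) synthesis of a drifting stochastic texture, frame by frame."""
    rng = np.random.default_rng(seed)
    prev2 = gaussian_filter(rng.standard_normal((size, size)), noise_sigma)
    prev1 = gaussian_filter(rng.standard_normal((size, size)), noise_sigma)
    frames = [prev2, prev1]
    for _ in range(n_frames - 2):
        noise = gaussian_filter(rng.standard_normal((size, size)), noise_sigma)
        # autoregression on shifted copies of the past frames implements the mean motion
        new = (a * shift(prev1, v, mode='wrap')
               + b * shift(prev2, (2 * v[0], 2 * v[1]), mode='wrap')
               + noise)
        prev2, prev1 = prev1, new
        frames.append(new)
    return np.stack(frames)

movie = dynamic_texture()        # (100, 128, 128) array of luminance frames
```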

Smooth Pursuit Eye Movements and Learning: Role of Motion Probability and Reinforcement Contingencies

A major challenge for living organisms is their ability to constantly adapt their motor behaviors. In the first study of this thesis, we investigated the role of statistical regularities and operant conditioning on anticipatory smooth pursuit eye movements (aSPEM). We demonstrated that aSPEM are generated in a manner consistent with the probability of the expected motion. In addition, we showed for the first time that aSPEM can be modulated by reinforcement contingencies. In a second study, we designed a pursuit paradigm, inspired by the Iowa Gambling Task (IGT), involving a choice between two targets associated with different rewards. We tested this paradigm in Parkinson's disease patients (PD), as well as in elderly and young controls. Only in young participants was the latency of the oculomotor choice strongly reduced when the choice was associated with a reinforcement rule. For PD patients the choice was strongly delayed in all conditions, and this effect cannot simply be attributed to a motor deficit. Moreover, the choice strategy turned out to be poor in all groups, suggesting differences with the results of the classical IGT. The last contribution of this thesis was to model the effect of the directional bias on aSPEM that we observed in the first study. We tested two models, including either a leaky-integrator memory of the trial sequence or an adaptive Bayesian estimation of the optimal memory size. Our results suggest that adaptive models could contribute in the future to a better understanding of statistical and reinforcement learning.
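
As a toy illustration of the leaky-integrator memory mentioned above (a hypothetical simplification, not the fitted model of the thesis), the probability of rightward motion can be tracked trial by trial and mapped onto an anticipatory velocity:

```python
import numpy as np

def leaky_integrator_bias(directions, tau=10.0, v_max=5.0):
    """Track the probability of rightward motion with a leaky integrator.

    directions : sequence of +1 (rightward) / -1 (leftward) trial outcomes
    tau        : memory time constant, in trials
    v_max      : anticipatory velocity (deg/s) for a fully predictable target
    Returns the predicted anticipatory velocity before each trial.
    """
    p_right = 0.5                      # prior belief: both directions equally likely
    anticipation = []
    for d in directions:
        anticipation.append(v_max * (2 * p_right - 1))    # signed anticipatory velocity
        outcome = 1.0 if d > 0 else 0.0
        p_right += (outcome - p_right) / tau               # leaky update toward the outcome
    return np.array(anticipation)

# toy block design: 30 trials with p(right) = 0.9, then 30 with p(right) = 0.1
rng = np.random.default_rng(3)
dirs = np.concatenate([rng.choice([1, -1], 30, p=[0.9, 0.1]),
                       rng.choice([1, -1], 30, p=[0.1, 0.9])])
print(leaky_integrator_bias(dirs)[:5])
```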

Speed uncertainty and motion perception with naturalistic random textures

It is still not fully understood how the visual system integrates motion energy across different spatial and temporal frequencies to build a coherent percept of global motion under complex, noisy naturalistic conditions. We addressed this question by manipulating the local speed variability distribution (i.e. speed bandwidth) using a well-controlled class of broadband random-texture stimuli called Motion Clouds (MCs) with continuous naturalistic spatiotemporal frequency spectra (Sanz-Leon et al., 2012; Simoncini et al., 2012). In a first 2AFC experiment on speed discrimination, participants had to compare the speed of a broad speed-bandwidth MC (range: 0.05-8°/s) moving at 1 of 5 possible mean speeds (ranging from 5 to 13°/s) to that of another MC with a small speed bandwidth (SD: 0.05°/s), always moving at a mean speed of 10°/s. We found that MCs with a larger speed bandwidth (between 0.05-0.5°/s) were perceived as moving faster. Within this range, speed uncertainty results in over-estimating the stimulus velocity. However, beyond a critical bandwidth (SD: 0.5°/s), the perception of a coherent speed was lost. In a second 2AFC experiment on direction discrimination, participants had to estimate the motion direction of moving MCs with different speed bandwidths. We found that for large-bandwidth MCs participants could no longer discriminate motion direction. These results suggest that when increasing the speed bandwidth from a small to a large range, the observer experiences different perceptual regimes. We then ran a Maximum Likelihood Difference Scaling (Knoblauch & Maloney, 2008) experiment with our speed-bandwidth stimuli to investigate these different possible perceptual regimes. We identified three regimes within this space that correspond to motion coherency, motion transparency and motion incoherency. These results allow us to further characterize the shape of the interaction kernel between different speed-tuned channels and different spatiotemporal scales (Gekas et al., 2017) that underlies global velocity estimation.
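
To make the notion of "speed bandwidth" concrete, here is a self-contained numpy caricature of a Motion Cloud: random phases are imposed on a spectral envelope concentrated around a speed plane of tunable thickness b_v. All parameter values are illustrative, and this is only a sketch of the idea; the actual stimuli were generated with the MotionClouds toolbox.

```python
import numpy as np

def motion_cloud(n=64, n_t=64, v=1.0, sf0=0.125, b_sf=0.1, b_v=0.5, seed=0):
    """Toy Motion Cloud: band-pass spatial-frequency envelope x Gaussian speed plane."""
    rng = np.random.default_rng(seed)
    fx, fy, ft = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n),
                             np.fft.fftfreq(n_t), indexing='ij')
    f_r = np.sqrt(fx ** 2 + fy ** 2) + 1e-9
    env_sf = np.exp(-(np.log(f_r / sf0)) ** 2 / (2 * b_sf ** 2))   # log-normal SF band
    env_v = np.exp(-(ft + v * fx) ** 2 / (2 * (b_v * f_r) ** 2))   # speed plane, width ~ b_v
    envelope = env_sf * env_v
    spectrum = envelope * np.exp(2j * np.pi * rng.random(envelope.shape))  # random phases
    movie = np.real(np.fft.ifftn(spectrum))
    return movie / np.abs(movie).max()      # (n, n, n_t) luminance frames in [-1, 1]

narrow = motion_cloud(b_v=0.05)   # small speed bandwidth -> coherent drift
broad = motion_cloud(b_v=1.0)     # large speed bandwidth -> incoherent, "snow-like" motion
```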

On the Origins of Hierarchy in Visual Processing

It is widely assumed that visual processing follows a forward sequence of processing steps along a hierarchy of laminar sub-populations of the neural system. Taking the example of the early visual system of mammals, most models are consequently organized in layers from the retina to visual cortical areas, until a decision is taken using the representation that is formed in the highest layer. Typically, features of higher complexity (position, orientation, size, curvature, …) are successively extracted in distinct layers (Carandini, 2012). This is prevalent in most deep learning algorithms and stems from a long history of feed-forward architectures. Though this proved to be highly successful, the origin of such architectures is not known (Serre, 2007). Using a generic unsupervised learning algorithm, we first trained a simple one-layer convolutional neural network on a set of natural images with a growing number of neurons. By doing this, we could quantitatively manipulate the complexity of the representation that emerges from such learning and analyze whether sub-populations within the layer could be grouped by their similarity, hence justifying the emergence of a hierarchical processing. As shown in previous studies (Olshausen, 1996), such an algorithm converges to a weight matrix that has strong analogies with the receptive fields of simple cells located in the Primary Visual Cortex of mammals (V1). This result extends naturally to a cortical representation of the input image that encodes second-order features (edges) as neural responses arranged in a three-dimensional space, where the third dimension can be seen as a model of hyper-columns of the Primary Visual Cortex. From this bio-inspired encoding, we were able to define contours in images as simple smooth trajectories in a cortical representation space. This simple model shows that hierarchical processing may originate from the neural encoding of different visual transformations within natural images: translations, rotations and zooms, respectively, which correspond to rigid translations in the cortical space. The model can be further extended to reproduce the effect of complex cells in V1 (max pooling) and feedback signals from higher cortical areas. We predict that invariance to more complex transformations like shearing (perspective) and viewpoint changes (looming) will emerge as these additional steps in sensory processing are taken into account. Indeed, a higher level of complexity can be introduced as the cortical representation is extended from smooth trajectories (space domain) to smooth surfaces (space-time domain). As such, this justifies the extension of a simple sparse network formalism to translation-invariant neural networks (such as the convolutional neural networks used in deep learning) that is able to generalize geometrical transformations, such as translations, rotations, and zooms, in an invariant bio-inspired representation (Perrinet, 2015). This should provide some key insights into higher-order features such as co-occurrences, but also into novel categorization architectures. Indeed, such features were recently found to be sufficient to allow the categorization of images containing an animal (Perrinet and Bednar, 2015). Crucially, as the geometrical transformations develop in time, we expect that the detection of these features is made robust by dynamical processes.

Estimating and anticipating a dynamic probabilistic bias in visual motion direction

Humans are able to accurately track a moving object with a combination of saccades and smooth eye movements. These movements allow us to align and stabilize the object on the fovea, thus enabling high-resolution visual analysis. When predictive information is available about target motion, anticipatory smooth pursuit eye movements (aSPEM) are efficiently generated before target appearance, which reduces the typical sensorimotor delay between target motion onset and foveation. It is generally assumed that the role of anticipatory eye movements is to limit the behavioral impairment due to the eye-to-target position and velocity mismatch. By manipulating the probability of target motion direction we were able to bias the direction and mean velocity of aSPEM, as measured during a fixed-duration gap before target ramp-motion onset. This suggests that probabilistic information may be used to inform the internal representation of motion prediction for the initiation of anticipatory movements. However, such an estimate may become particularly challenging in a dynamic context, where the probabilistic contingencies vary in time in an unpredictable way. In addition, whether and how the information processing underlying the build-up of aSPEM is linked to an explicit estimate of probabilities is unknown. We developed a new paired-task paradigm in order to address these two questions. In a first session, participants observe a target moving horizontally with constant speed from the center either to the right or left across trials. The probability of either motion direction changes randomly in time. Participants are asked to estimate "how much they are confident that the target will move to the right or left in the next trial" and to adjust the cursor’s position on the screen accordingly. In a second session the participants’ eye movements are recorded during the observation of the same sequence of random-direction trials. In parallel, we are developing new automatic routines for the advanced analysis of oculomotor traces. In order to extract the relevant parameters of the oculomotor responses (latency, gain, initial acceleration, catch-up saccades), we developed new tools based on a best-fitting procedure of predefined patterns (i.e. the typical smooth pursuit velocity profile).

The flash-lag effect as a motion-based predictive shift
The flash-lag effect as a motion-based predictive shift

Due to its inherent neural delays, the visual system has only outdated access to sensory information about the current position of moving objects. In contrast, living organisms are remarkably able to track and intercept moving objects under a large range of challenging environmental conditions. Physiological, behavioral and psychophysical evidence strongly suggests that position coding is extrapolated using an explicit and reliable representation of the object’s motion, but it is still unclear how these two representations interact. For instance, the so-called flash-lag effect supports the idea of a differential processing of position between moving and static objects. Although elucidating such mechanisms is crucial in our understanding of the dynamics of visual processing, a theory is still missing to explain the different facets of this visual illusion. Here, we reconsider several of the key aspects of the flash-lag effect in order to explore the role of motion upon neural coding of objects’ position. First, we formalize the problem using a Bayesian modeling framework which includes a graded representation of the degree of belief about visual motion. We introduce a motion-based prediction model as a candidate explanation for the perception of coherent motion. By including the knowledge of a fixed delay, we can model the dynamics of sensory information integration by extrapolating the information acquired at previous instants in time. Next, we simulate the optimal estimation of object position with and without delay compensation and compare it with human perception under a broad range of different psychophysical conditions. Our computational study suggests that the explicit, probabilistic representation of velocity information is crucial in explaining position coding, and therefore the flash-lag effect. We discuss these theoretical results in light of the putative corrective mechanisms that can be used to cancel out the detrimental effects of neural delays and illuminate the more general question of the dynamical representation of spatial information at the present time in the visual pathways.
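
In its simplest deterministic form (a first-order reduction of the probabilistic model described above, stated here only for intuition), delay compensation amounts to extrapolating the delayed position estimate along the estimated velocity,

$$\hat{x}(t) \;=\; \hat{x}(t-\tau) \;+\; \hat{v}(t-\tau)\,\tau,$$

so that a moving object is represented ahead of a static flash processed with the same delay $\tau$, which is the qualitative signature of the flash-lag effect.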

Voluntary tracking the moving clouds : Effects of speed variability on human smooth pursuit

The properties of motion processing for driving smooth eye movements have been investigated using simple, artificial stimuli such as gratings, small dots or random dot patterns. Motion processing in the context of complex, natural images is less known. We have previously investigated the human ocular following responses to a novel class of random texture stimuli of parameterized naturalistic statistics: the Motion Clouds. In Fourier space, these dynamical textures are designed with a log-normal distribution of spatial frequency power multiplied by a pink-noise power spectral density that reduces the high-frequency content of the stimulus (Sanz-Leon et al. 2011). We have previously shown that the precision of reflexive tracking increases with the spatial frequency bandwidth of large (> 30 deg diameter) patterns (i.e. the width of the spatial frequency distribution around a given mean spatial frequency; Simoncini et al. 2012). Here, we extend this approach to voluntary tracking and focus on the effects of spatial frequency bandwidth upon the initial phase of smooth pursuit eye movements. Participants were instructed to pursue a large patch of moving clouds (mean speeds: 5, 10 or 20 deg/s) embedded within a smooth Gaussian window of standard deviation 5 deg. The motion stimuli were presented with four different spatial frequency bandwidths and two different mean spatial frequencies (0.3 and 1 cpd). We observed that smaller-bandwidth textures exhibit a stronger spectral energy within the low spatial frequency range (below 1 cpd), yielding shorter latencies of smooth pursuit eye movements. A weak and less consistent effect was found on initial eye acceleration, contrary to what was previously observed with ocular following responses. After 400 ms, the steady-state tracking velocity matched the mean visual motion speed and pursuit performance was comparable with that observed with a control, small-dot motion stimulus. Motion Clouds offer an efficient tool to probe the optimal window of visibility for human smooth pursuit through the manipulation of both the mean and the variability of spatial frequency.

Expériences autour de la perception de la forme en art et science
Expériences autour de la perception de la forme en art et science

Vision uses a bundle of information of different qualities to reach a unified perception of the surrounding world. In several art-science projects (see https://github.com/NaturalPatterns) we used installations that make it possible to explicitly manipulate components of this information flow and to reveal ambiguities in our perception. In the installation Tropique, beams of luminous blades are arranged in the darkened space of the installation. Spectators observe them through their interaction with an invisible mist diffused in the space. In Trame Élasticité, 25 mirrored parallelepipeds (3 m high) are arranged vertically along a horizontal line. These blades rotate and their movements are synchronized. Depending on the dynamics imposed on these blades, the perception of the surrounding space fluctuates, recomposing the space from concentration to expansion, generating a seemingly transparent surface, or even inverting the view of what lies in front of and behind the observer. Finally, in Trame Instabilité, we explore the interaction of periodic series of dots placed on transparent surfaces. Building on first experiments using an innovative silk-screen printing technique, these dot patterns are placed so as to make structures emerge depending on the spectator's viewpoint. More generally, we will present here the different methods used, such as the exploitation of perceptual limits, as well as the outcomes brought about by such a collaboration.

Efficient learning of sparse image representations using homeostatic regulation

One core advantage of sparse representations is the efficient coding of complex signals using compact codes. For instance, it allows for the representation of any image as a combination of a few elements drawn from a large dictionary of basis functions. In the context of the efficient processing of natural images, we propose here that such codes can be optimized by designing proper homeostatic rules between the elements of the dictionary. Indeed, a common design for unsupervised learning rules relies on a gradient descent over a cost measuring representation quality with respect to sparseness. The sparseness constraint introduces a competition which can be optimized by ensuring that each item in the dictionary is selected as often as the others. We implemented this rule by introducing a gain normalization similar to what is observed in a balanced neural network. We validated this theoretical insight by challenging different sparse coding algorithms with the same learning rule, with or without homeostasis. The different sparse coding algorithms were chosen for their efficiency and generality. They include least-angle regression, orthogonal matching pursuit and basis pursuit. Simulations show that for a given homeostasis rule, gradient descent performed similarly when learning a dataset of image patches. While the coding algorithm did not matter much, including homeostasis changed qualitatively the learned features. In particular, homeostasis results in a more homogeneous set of orientation-selective filters, which is closer to what is found in the visual cortex of mammals. To further validate these results, we applied this algorithm to the optimization of a visual system embedded in an aerial robot. As a consequence, this biologically-inspired learning rule demonstrates that principles observed in neural computations can help improve real-life machine learning algorithms.

Testing the odds of inherent vs. observed overdispersion in neural spike counts

The repeated presentation of an identical visual stimulus in the receptive field of a neuron may evoke different spiking patterns at each trial. Probabilistic methods are essential to understand the functional role of this variance within the neural activity. In that case, a Poisson process is the most common model of trial-to-trial variability. For a Poisson process, the variance of the spike count is constrained to be equal to the mean, irrespective of the duration of measurements. Numerous studies have shown that this relationship does not generally hold. Specifically, a majority of electrophysiological recordings show an "over-dispersion" effect: responses that exhibit more inter-trial variability than expected from a Poisson process alone. A model that is particularly well suited to quantify over-dispersion is the Negative-Binomial distribution model. This model is well-studied and widely used but has only recently been applied to neuroscience. In this paper, we address three main issues. First, we describe how the Negative-Binomial distribution provides a model apt to account for over-dispersed spike counts. Second, we quantify the significance of this model for any neurophysiological data by proposing a statistical test, which quantifies the odds that over-dispersion could be due to the limited number of repetitions (trials). We apply this test to three neurophysiological datasets along the visual pathway. Finally, we compare the performance of this model to the Poisson model on a population decoding task. We show that the decoding accuracy is improved when accounting for over-dispersion, especially under the hypothesis of tuned over-dispersion.
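
The following sketch illustrates the kind of comparison described above (method-of-moments fits and a simple parametric check of whether the observed variance exceeds what finite Poisson sampling would produce); it is a hypothetical illustration, not the statistical test published in the paper.

```python
import numpy as np
from scipy import stats

def overdispersion_check(counts, n_boot=10000, seed=0):
    """Compare Poisson vs Negative-Binomial accounts of trial-to-trial spike counts."""
    rng = np.random.default_rng(seed)
    m, v = counts.mean(), counts.var(ddof=1)
    fano = v / m
    # probability of seeing a Fano factor this large from a Poisson process
    # with the same mean and the same (limited) number of trials
    sims = rng.poisson(m, size=(n_boot, len(counts)))
    fano_null = sims.var(axis=1, ddof=1) / sims.mean(axis=1)
    p_value = np.mean(fano_null >= fano)
    # method-of-moments Negative-Binomial fit (defined only when v > m)
    nb = None
    if v > m:
        p = m / v
        r = m ** 2 / (v - m)
        nb = stats.nbinom(r, p)        # nb.mean() ~ m, nb.var() ~ v
    return fano, p_value, nb

counts = np.array([12, 7, 15, 22, 9, 18, 25, 6, 14, 20])   # hypothetical spike counts
print(overdispersion_check(counts)[:2])
```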

Push-Pull Receptive Field Organization and Synaptic Depression: Mechanisms for Reliably Encoding Naturalistic Stimuli in V1
Push-Pull Receptive Field Organization and Synaptic Depression: Mechanisms for Reliably Encoding Naturalistic Stimuli in V1

Neurons in the primary visual cortex are known for responding vigorously but with high variability to classical stimuli such as drifting bars or gratings. By contrast, natural scenes are encoded more efficiently by sparse and temporally precise spiking responses. We used a conductance-based model of the visual system in higher mammals to investigate how two specific features of the thalamo-cortical pathway, namely push-pull receptive field organization and synaptic depression, can contribute to this contextual reshaping of V1 responses. By comparing cortical dynamics evoked respectively by natural vs. artificial stimuli in a comprehensive parametric space analysis, we demonstrate that the reliability and sparseness of the spiking responses during natural vision is not a mere consequence of the increased bandwidth in the sensory input spectrum. Rather, it results from the combined impacts of synaptic depression and push-pull inhibition, the latter acting for natural scenes as a form of "effective" feed-forward inhibition, as demonstrated in other sensory systems. Thus, the combination of feedforward-like inhibition with fast thalamo-cortical synaptic depression in simple cells receiving a direct structured input from the thalamus composes a generic computational mechanism for generating a sparse and reliable encoding of natural sensory events.

Compensation of oculomotor delays in the visual system's network

We consider the problem of sensorimotor delays in the optimal control of movement under uncertainty. Specifically, we consider axonal conduction delays in the visuo-oculomotor loop and their implications for active inference. Active inference uses a generalisation of Kalman filtering to provide Bayes optimal estimates of hidden states and action in generalised coordinates of motion. Representing hidden states in generalised coordinates provides a simple means of compensating for both sensory and oculomotor delays. This compensation is illustrated using neuronal simulations of oculomotor following responses with and without compensation. We then consider an extension of the generative model that produces ocular following to simulate smooth pursuit eye movements in which the system believes both the target and its centre of gaze are attracted by a (fictive) point moving in the visual field. Finally, the generative model is equipped with a hierarchical structure, so that it can register and remember unseen (occluded) trajectories and emit anticipatory responses. These simulations speak to a straightforward and neurobiologically plausible solution to the generic problem of integrating information from different sources with different temporal delays and the particular difficulties encountered when a system, like the oculomotor system, tries to control its environment with delayed signals.
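
The compensation mechanism alluded to above rests on a standard property of generalized coordinates, stated here only schematically: if the hidden state is represented together with its temporal derivatives, $\tilde{x} = (x, x', x'', \dots)$, a known delay $\tau$ can be absorbed by applying a Taylor-series shift operator to the prediction,

$$x(t-\tau) \;=\; \sum_{k \ge 0} \frac{(-\tau)^k}{k!}\, x^{(k)}(t) \;=\; \big(e^{-\tau D}\,\tilde{x}(t)\big)_0,$$

where $D$ is the block-shift (differentiation) operator acting on generalized coordinates; truncating the series at the available embedding order gives the approximate compensation used in such simulations.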

A dynamic model for decoding direction and orientation in macaque primary visual cortex

Natural scenes generally contain objects in motion. The local orientation of their contours and the direction of motion are two essential components of visual information which are processed in parallel in the early visual areas. Focusing on the primary visual cortex of the macaque monkey (V1), we challenged different models for the joint representation of orientation and direction within the neural activity. Precisely, we considered the response of V1 neurons to an oriented moving bar to investigate whether, and how, the information about the bar’s orientation and direction could be encoded dynamically at the population activity level. For that purpose, we used a decoding approach based on a space-time receptive field model that jointly encodes orientation and direction. We based our decoding approach on the statistics of natural scenes by first determining optimal space-time receptive fields (RFs) that encode orientation and direction. For this, we first derived a set of dynamic filters from a database of natural images [1] and following an unsupervised learning rule [2]. More generally, this allows us to propose a dynamic generative model for the joint coding of orientation and direction. Then, using this model and a maximum likelihood paradigm, we infer the most likely representation for a given network activity [3, 4]. We tested this model on surrogate data and on extracellular recordings in area V1 (67 cells) of awake macaque monkeys in response to oriented bars moving in 12 different directions. Using a cross-validation method we could robustly decode both the orientation and the direction of the bar within the classical receptive field (cRF). Furthermore, this decoding approach shows different properties: First, information about the orientation of the bar emerges before the bar enters the cRF if its trajectory is long enough. Second, when testing different orientations with the same direction, our approach unravels that we can decode the direction and the orientation independently. Moreover, we found that, similarly to orientation decoding, the decoding of direction is dynamic but weaker. Finally, our results demonstrate that the orientation and the direction of motion of an ambiguous moving bar can be progressively decoded in V1. This is a signature of a dynamic solution to the aperture problem in area V1, similar to what was already found in area MT [5]. [1] J. Burge, W. Geisler. Optimal speed estimation in natural image movies predicts human performance. Nature Communications, 6, 7900. http://doi.org/10.1038/ncomms8900, 2015. [2] L. Perrinet. Role of homeostasis in learning sparse representations. Neural Computation, 22(7):1812–36, 2010. [3] M. Jazayeri and J.A. Movshon. Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5):690–696, 2006. [4] W. Taouali, G. Benvenuti, P. Wallisch, F. Chavane, L. Perrinet. Testing the Odds of Inherent versus Observed Over-dispersion in Neural Spike Counts. Journal of Neurophysiology, 2015. [5] C. Pack, R. Born. Temporal dynamics of a neural solution to the aperture problem in visual area MT of macaque brain. Nature, 409(6823), 1040–1042. 2001.

Biologically Inspired Computer Vision
Biologically Inspired Computer Vision

As state-of-the-art imaging technologies become ever more advanced, yielding scientific data at unprecedented detail and volume, the need to process and interpret these data has made image processing and computer vision increasingly important. Sources of data that have to be routinely dealt with in today's applications include video transmission, wireless communication, automatic fingerprint processing, massive databases, tireless and accurate automatic airport screening, and robust night vision, to name a few. Multidisciplinary inputs from disciplines such as computational neuroscience, cognitive science, mathematics, physics and biology will have a fundamental impact on the progress of imaging and vision sciences. One of the advantages of studying biological organisms is to devise very different types of computational paradigms beyond the usual von Neumann architecture, e.g., by implementing a neural network with a high degree of local connectivity. This is a comprehensive and rigorous reference in the area of biologically motivated vision sensors. The study of biological visual systems can be considered a two-way avenue. On the one hand, biological organisms can provide a source of inspiration for new computationally efficient and robust vision models, and on the other hand machine vision approaches can provide new insights for understanding biological visual systems. Across its chapters, this book covers a wide range of topics, from fundamental to more specialized ones, including visual analysis at the computational level, hardware implementation, and the design of new, more advanced vision sensors. The last two sections of the book provide an overview of a few representative applications and of the current state of the art of research in this area. This makes it a valuable book for graduate, Master's and PhD students as well as researchers in the field.

Anticipatory smooth eye movements and reinforcement

When an object is moving in the visual field, we are able to accurately track it with a combination of saccades and smooth eye movements. These movements allow us to align and stabilize the object on the fovea, thus enabling visual analysis with high acuity. Importantly, when predictive information is available about the target motion, anticipatory smooth pursuit eye movements (aSPEM) are efficiently generated before target appearance, reducing the typical sensorimotor delay between target motion onset and foveation. By manipulating the probability of target motion direction we were able to bias the direction and mean velocity of aSPEM (baseline condition). This suggests that probabilistic information may be used to inform the internal representation of motion prediction for the initiation of anticipatory movements. To further understand the nature of this process, we investigated the effects of reinforcement on aSPEM with two distinct experiments. First, it has been previously shown that several properties of eye movements can be modulated by reinforcement paradigms based on monetary reward (Madelain et al. 2011). We adapted and extended this framework to prediction-based aSPEM, by associating a monetary reward with a criterion-matching anticipatory velocity, in the gap before target onset. Second, it has also been reported that accurate perception per se can play the role of an efficient ecological reinforcer for visually guided saccades (Montagnini & Chelazzi, 2005). With a gaze-contingent procedure, we manipulated the discriminability of a perceptual target (appearing during the pursuit trial and followed by a discrimination task). The difficulty level of this task was matched to the velocity of aSPEM. This experiment taps into the very reason to produce anticipatory tracking movements, namely to grant quicker high-acuity vision of the moving target. We compare predictive anticipatory eye movements across these conditions.

On overdispersion in neuronal evoked activity

The repeated presentation of an identical visual stimulus in the receptive field of a neuron may evoke different spiking patterns at each trial. Probabilistic methods are essential to understand the functional role of this variability within the neural activity. In that case, a Poisson process is the most common model of trial-to-trial variability. However, the variance of the spike count is then constrained to be equal to the mean, irrespective of the measurement’s duration. Numerous studies have shown that this relationship does not generally hold. Specifically, a majority of electrophysiological recordings show an ``overdispersion’’ effect: responses that exhibit more inter-trial variability than expected from a Poisson process alone. A model that is particularly well suited to quantify overdispersion is the Negative-Binomial distribution model. This model is widely used and studied in other fields but has only recently been applied to neuroscience. In this paper, we address three main issues. First, we describe how the Negative-Binomial distribution provides a model apt to account for overdispersed spike counts. Second, we quantify the significance of this model for any neurophysiological data by proposing a statistical test, which quantifies the odds that overdispersion could be due to the limited number of repetitions (trials). We apply this test to three neurophysiological datasets recorded along the visual pathway. Finally, we compare the performance of this model to the Poisson model on a population decoding task. This shows that more knowledge about the form of dispersion tuning is necessary to obtain a significant gain, uncovering a possible feature of the neural spiking code.
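
For intuition, here is a minimal sketch contrasting the Poisson mean-variance relation with the Negative-Binomial one (for which var = mean + mean²/r, hence overdispersion), together with a crude parametric-bootstrap check of whether an observed Fano factor is compatible with a Poisson process. The counts, the shape parameter r, and the bootstrap check are illustrative stand-ins, not the paper's data or statistical test.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical spike counts over repeated trials: Poisson versus Negative-Binomial
# with the same mean.  For the NB, var = mean + mean**2 / r, so var > mean.
n_trials, mean_count, r = 200, 10.0, 4.0

poisson_counts = rng.poisson(mean_count, size=n_trials)
# A Gamma-Poisson mixture is one standard way to draw Negative-Binomial counts.
nb_counts = rng.poisson(rng.gamma(shape=r, scale=mean_count / r, size=n_trials))

for name, counts in [("Poisson", poisson_counts), ("Negative-Binomial", nb_counts)]:
    fano = counts.var(ddof=1) / counts.mean()
    print(f"{name:18s} mean={counts.mean():5.2f}  var={counts.var(ddof=1):6.2f}  Fano={fano:4.2f}")

# Crude parametric bootstrap (not the paper's statistic): how often would a true Poisson
# process with the same mean and trial count produce a Fano factor at least this large?
observed_fano = nb_counts.var(ddof=1) / nb_counts.mean()
boot = rng.poisson(nb_counts.mean(), size=(5000, n_trials))
boot_fano = boot.var(axis=1, ddof=1) / boot.mean(axis=1)
print("p(Fano >= observed under Poisson) ~", np.mean(boot_fano >= observed_fano))
```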

Edge co-occurrences can account for rapid categorization of natural versus animal images
Edge co-occurrences can account for rapid categorization of natural versus animal images

Making a judgment about the semantic category of a visual scene, such as whether it contains an animal, is typically assumed to involve high-level associative brain areas. Previous explanations require progressively analyzing the scene hierarchically at increasing levels of abstraction, from edge extraction to mid-level object recognition and then object categorization. Here we show that the statistics of edge co-occurrences alone are sufficient to perform a rough yet robust (translation, scale, and rotation invariant) scene categorization. We first extracted the edges from images using a scale-space analysis coupled with a sparse coding algorithm. We then computed the ``association field’’ for different categories (natural, man-made, or containing an animal) by computing the statistics of edge co-occurrences. These differed strongly, with animal images having more curved configurations. We show that this geometry alone is sufficient for categorization, and that the pattern of errors made by humans is consistent with this procedure. Because these statistics could be measured as early as the primary visual cortex, the results challenge widely held assumptions about the flow of computations in the visual system. The results also suggest new algorithms for image classification and signal processing that exploit correlations between low-level structure and the underlying semantic category.

Active inference, eye movements and oculomotor delays
Active inference, eye movements and oculomotor delays

This paper considers the problem of sensorimotor delays in the optimal control of (smooth) eye movements under uncertainty. Specifically, we consider delays in the visuo-oculomotor loop and their implications for active inference. Active inference uses a generalisation of Kalman filtering to provide Bayes optimal estimates of hidden states and action in generalised coordinates of motion. Representing hidden states in generalised coordinates provides a simple way of compensating for both sensory and oculomotor delays. The efficacy of this scheme is illustrated using neuronal simulations of pursuit initiation responses, with and without compensation. We then consider an extension of the generative model to simulate smooth pursuit eye movements in which the visuo-oculomotor system believes both the target and its centre of gaze are attracted to a (hidden) point moving in the visual field. Finally, the generative model is equipped with a hierarchical structure, so that it can recognise and remember unseen (occluded) trajectories and emit anticipatory responses. These simulations speak to a straightforward and neurobiologically plausible solution to the generic problem of integrating information from different sources with different temporal delays and the particular difficulties encountered when a system, like the oculomotor system, tries to control its environment with delayed signals.
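
To make the delay-compensation idea concrete, the following is a schematic rendering (a simplified sketch, not the paper's exact operator) of how representing a hidden state together with its temporal derivatives, i.e. in generalised coordinates of motion, allows a known delay $\tau$ to be absorbed by a linear Taylor-shift operator:

```latex
\tilde{x}(t) = \bigl(x(t),\, x'(t),\, x''(t), \ldots\bigr), \qquad
x(t+\tau) \;\approx\; x(t) + \tau\, x'(t) + \tfrac{\tau^{2}}{2!}\, x''(t) + \cdots
\;=\; \bigl[e^{\tau D}\, \tilde{x}(t)\bigr]_{1}
```

where $D$ is the block-derivative operator on generalised coordinates, $D\tilde{x} = (x', x'', x''', \ldots)$. Applying the appropriate exponential shift operators to sensory and motor signals then absorbs known sensory and oculomotor delays into the generalised gradient descent.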

A Computational Approach to the Motion Dependence of Position Coding in the Visual System

This thesis is centered on the question: how can the visual system efficiently encode the position of moving objects, despite the various sources of uncertainty? This study builds on a hypothesis about prior knowledge of the temporal coherence of motion (Burgi et al., 2000; Yuille and Grzywacz, 1989). We extended the modeling framework previously proposed to account for the aperture problem (Perrinet and Masson, 2012), a Bayesian motion estimation framework implemented by particle filtering, which we call motion-based prediction (MBP). On this basis, we introduced a theory of motion-based position coding and studied how the neural mechanisms encoding the instantaneous position of a moving object could be affected by the motion signal along a trajectory. The results of this thesis suggest that motion-based position coding may constitute a generic neural computation across all stages of the visual system. It may partly compensate for the cumulative effects of neural delays in position coding. Furthermore, it can explain motion-based position shifts such as the flash-lag effect. As a particular case, we introduced the diagonal MBP model and reproduced the anticipatory response of neural populations in cortical area V1. Our results indicate that efficient and robust position coding may depend strongly on integration along the trajectory.

The characteristics of microsaccadic eye movements varied with the change of strategy in a match-to-sample task

Under natural viewing conditions, large eye movements are interspersed with small eye movements (microsaccades). Recent works have shown that these two kinds of eye movements are generated by the same oculomotor mechanisms (Goffart et al., 2012) and are driven by the same visual information (Simoncini et al., VSS 2012 abstract). These results seem to demonstrate that microsaccades and saccades represent a continuum of the same ocular movement. However, while the role played by large saccades in visual perception is clearly identified, the role of the microsaccade is not clearly defined. In order to investigate the role of microsaccades, we measured pattern discrimination performance using an ABX match-to-sample task during the presentation of 1/f natural-statistics textures in which we varied the spatial frequency content. We compared perceptual performance with the eye movements recorded during the task. We found that the rate of microsaccadic movements changed as a function of the subjects' task strategy. In particular, in trials where perceiving the difference between the stimuli was easy (low spatial frequency), the subjects used the information provided by all the stimuli to do the task and the microsaccadic rate was the same for all the stimuli (ABX). However, when perceiving the difference between the stimuli was harder (for instance at high spatial frequency), the subjects rather used the information provided by the last two stimuli only, and the microsaccadic rate for images B and X increased with respect to image A. These results demonstrate that microsaccadic eye movements also play a role during the analysis of the visual scene and that such experiments can help decipher their participation in the perception of the scene. Goffart L., Hafed Z.M., Krauzlis R.J. 2012. Visual fixation as equilibrium: evidence from superior colliculus inactivation. Journal of Neuroscience, 32(31):10627-10636.

Motion-based prediction model for flash lag effect

The flash lag effect (FLE) is a well known visual illusion that reveals the perceptual difference in position coding between moving and stationary flashed objects. It has been reproduced experimentally in the retina and V1, along with some relevant evidence about motion-based position coding in areas MT and MT+. Numerous hypotheses about the mechanisms underlying the FLE, such as motion extrapolation, latency difference, position persistence, temporal averaging and postdiction, have been under debate for the last two decades. Here, we have challenged our previous motion-based prediction model to understand the FLE, consistently with the motion extrapolation account proposed by Nijhawan. Our hypothesis is based on the predictability of the motion trajectory and the importance of the motion signal in shaping the receptive fields of moving objects. Using a probabilistic framework, we have implemented motion-based prediction (MBP) and simulated three different demonstrations of the FLE, including the standard, flash-initiated and flash-terminated cycles. This method allowed us to compare the shape of the characteristic receptive fields for moving and stationary flashed dots in the case of rightward and leftward motions. As a control model, we have eliminated the velocity signal from motion estimation and simulated a position-based (PX) model of the FLE. Results of the MBP model suggest that, above a minimal flash duration, the development of a predictive component for the moving object is sufficient to shift it in the direction of motion and to produce the flash lag effect. The MBP model reproduces experimental data on the FLE and its dependence on the contrast of the flash. Contrary to what has been argued as a shortcoming of the motion extrapolation account, in our results the spatial lead of the moving object is also evident in the flash-initiated cycle. Our model, without being restricted to one particular visual area, provides a generic account of the FLE by emphasizing the different treatment of stationary objects and motion trajectories by the sensory system.

Edge co-occurrences are sufficient to categorize natural versus animal images

Analysis and interpretation of a visual scene to extract its category, such as whether it contains an animal, is typically assumed to involve higher-level associative brain areas. Previous proposals have been based on a series of processing steps organized in a multi-level hierarchy that would progressively analyze the scene at increasing levels of abstraction, from contour extraction to low-level object recognition and finally to object categorization (Serre, PNAS 2007). We explore here an alternative hypothesis that the statistics of edge co-occurrences are sufficient to perform a rough yet robust (translation, scale, and rotation invariant) scene categorization. The method is based on a realistic model of image analysis in the primary visual cortex that extends previous work from Geisler et al. (Vis. Res. 2001). Using a scale-space analysis coupled with a sparse coding algorithm, we achieved detailed and robust extraction of edges in different sets of natural images. This edge-based representation allows for a simple characterization of the ``association field’’ of edges by computing the statistics of co-occurrences. We show that the geometry of angles made between edges is sufficient to distinguish between different sets of natural images taken in a variety of environments (natural, man-made, or containing an animal). Specifically, a simple classifier, working solely on the basis of this geometry, gives performance similar to that of hierarchical models and of humans in rapid-categorization tasks. Such results call attention to the importance of the relative geometry of local image patches in visual computation, with implications for designing efficient image analysis systems. Most importantly, they challenge assumptions about the flow of computations in the visual system and emphasize the relative importance in this process of associative connections, and in particular of intra-areal lateral connections.
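
As an illustration of the kind of second-order statistic involved, here is a toy sketch that builds a histogram of relative orientations between nearby edges and assigns a new image to the category with the closest histogram. It is not the paper's pipeline: the scale-space and sparse-coding edge extraction is taken as given, the inputs here are surrogate random edges, and the distance threshold, bin count and chi-square rule are arbitrary illustrative choices.

```python
import numpy as np

def cooccurrence_histogram(orientations, positions, n_bins=12, max_dist=0.1):
    """Histogram of the relative orientation between nearby edge pairs
    (a crude stand-in for the full 'association field' statistics)."""
    hist = np.zeros(n_bins)
    for i in range(len(orientations)):
        d = np.linalg.norm(positions - positions[i], axis=1)
        nearby = (d > 0) & (d < max_dist)
        dtheta = np.mod(orientations[nearby] - orientations[i], np.pi)  # difference in [0, pi)
        hist += np.histogram(dtheta, bins=n_bins, range=(0.0, np.pi))[0]
    return hist / max(hist.sum(), 1)

def classify(query_hist, category_hists):
    """Assign the category whose average co-occurrence histogram is closest (chi-square)."""
    def chi2(p, q):
        return 0.5 * np.sum((p - q) ** 2 / (p + q + 1e-12))
    return min(category_hists, key=lambda c: chi2(query_hist, category_hists[c]))

# Toy usage with random surrogate edges (real inputs would come from the edge extraction):
rng = np.random.default_rng(2)
edges = lambda n: (rng.uniform(0, np.pi, n), rng.uniform(0, 1, (n, 2)))
category_hists = {c: cooccurrence_histogram(*edges(300)) for c in ("animal", "man-made", "natural")}
print(classify(cooccurrence_histogram(*edges(300)), category_hists))
```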

Beyond simply faster and slower: exploring paradoxes in speed perception

Estimating object speed in visual scenes is a critical part of perception. While various aspects of speed computation, including discrimination thresholds, neural mechanisms and spatial integration mechanisms, have been studied, there remain areas to elucidate. One is the integration of information across spatio-temporal frequency channels to compute speed. We probe this integration with a 2-AFC psychophysical task in which moving random phase noise stimuli are used, with experimenter-defined frequency parameters and bandwidths to target specific neural populations. They are presented for 300 ms in a large square aperture, with smooth eye movements recorded while speed discrimination judgements are made over two intervals. There is no instruction to observers to pursue the stimuli and no pre-trial saccade to induce a classic ocular following response. After a latency, eye movements follow the stimulated direction, presumably to facilitate the speed judgement. Within each of the two intervals, we randomly vary a range of spatial frequency and speed parameters respectively, such that stimuli at the centre of the ranges are identical. The aim is to characterise the speed response of the eye movements recorded in a context which creates an ocular motor ‘action’ during a perceptual task, instead of artificially separating the two. Within the speed-varied intervals, averaged eye movements are systematically modulated in strength by stimulus speed. Within the spatial-frequency intervals, higher frequencies that are perceived as faster in the discrimination responses interestingly show no corresponding strengthening of the eye responses, particularly at higher contrasts, where they may even be weaker. Thus, for a pair of stimuli matched for contrast and perceived speed, this early eye response appears to be driven by a contrast-dependent, low-level, motion-energy-like computation. We characterise an underlying spatial frequency response which is shifted towards lower frequencies, unlike the perceptual responses, and which is probably separate from perception.

Signature of an anticipatory response in area V1 as modeled by a probabilistic model and a spiking neural network

As it is confronted to inherent neural delays, how does the visual system create a coherent representation of a rapidly changing environment? In this paper, we investigate the role of motion-based prediction in estimating motion trajectories compensating for delayed information sampling. In particular, we investigate how anisotropic diffusion of information may explain the development of anticipatory response as recorded in a neural population to an approaching stimulus. We validate this using an abstract probabilistic framework and a spiking neural network (SNN) model. Inspired by a mechanism proposed by Nijhawan [1], we first use a Bayesian particle filter framework and introduce a diagonal motion-based prediction model which extrapolates the estimated response to a delayed stimulus in the direction of the trajectory. In the SNN implementation, we have used this pattern of anisotropic, recurrent connections between excitatory cells as mechanism for motion-extrapolation. Consistent with recent experimental data collected in extracellular recordings of macaque primary visual cortex [2], we have simulated different trajectory lengths and have explored how anticipatory responses may be dependent on the information accumulated along the trajectory. We show that both our probabilistic framework and the SNN model can replicate the experimental data qualitatively. Most importantly, we highlight requirements for the development of a trajectory-dependent anticipatory response, and in particular the anisotropic nature of the connectivity pattern which leads to the motion extrapolation mechanism.
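
The core intuition can be illustrated with a deliberately simple toy (not the paper's particle filter or spiking network): activity encoding the delayed target position is advected anisotropically along the motion direction by the estimated speed, so that the represented position anticipates where the target is "now". The grid size, delay, speed and Gaussian profile below are all made-up values.

```python
import numpy as np

# Toy 1D illustration: shift a probability profile along the trajectory to compensate a delay.
n = 200
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
v, delay = 1.0, 0.05                         # estimated speed (units/s), neural delay (s)

p = np.exp(-0.5 * ((x - 0.2) / 0.02) ** 2)   # delayed sensory evidence about target position
p /= p.sum()

shift = int(round(v * delay / dx))           # anisotropic propagation: only along the motion direction
p_extrapolated = np.roll(p, shift)

print(f"sensed position ~ {x[np.argmax(p)]:.2f}, extrapolated ~ {x[np.argmax(p_extrapolated)]:.2f}")
```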

Motion-based prediction explains the role of tracking in motion extrapolation

During normal viewing, the continuous stream of visual input is regularly interrupted, for instance by blinks of the eye. Despite these frequent blanks (that is, the transient absence of a raw sensory source), the visual system is most often able to maintain a continuous representation of motion. For instance, it maintains the movement of the eye so as to stabilize the image of an object. This ability suggests the existence of a generic neural mechanism of motion extrapolation to deal with fragmented inputs. In this paper, we have modeled how the visual system may extrapolate the trajectory of an object during a blank using motion-based prediction. This implies that, using a prior on the coherency of motion, the system may integrate previous motion information even in the absence of a stimulus. In order to compare with experimental results, we simulated tracking velocity responses. We found that the response of the motion integration process to a blanked trajectory pauses at the onset of the blank, but that it quickly recovers the information on the trajectory after reappearance. This is compatible with behavioral and neural observations on motion extrapolation. To understand these mechanisms, we have recorded the response of the model to a noisy stimulus. Crucially, we found that motion-based prediction acted at the global level as a gain control mechanism and that we could switch from a smooth regime to a binary tracking behavior where the dot is either tracked or lost. Our results imply that a local prior implementing motion-based prediction is sufficient to explain a large range of neural and behavioral results at a more global level. We show that the tracking behavior deteriorates for sensory noise levels higher than a certain value, where motion coherency and predictability no longer hold. In particular, we found that motion-based prediction leads to the emergence of a tracking behavior only when enough information from the trajectory has been accumulated. Then, during tracking, trajectory estimation is robust to blanks even in the presence of relatively high levels of noise. Moreover, we found that tracking is necessary for motion extrapolation; this calls for further experimental work exploring the role of noise in motion extrapolation.

Measuring speed of moving textures: Different pooling of motion information for human ocular following and perception

The visual system does not process information instantaneously, but rather integrates over time. Integration occurs both for stationary objects and moving objects, with very similar time constants (Burr, 1981). We measured, as a function of exposure duration, speed discrimination and ocular following performance for rich textured motion stimuli of varying spatial frequency bandwidth. Psychometric and oculometric sensitivities for these patterns increased with exposure duration. However, the best stimuli for ocular following (namely those with a large spatial frequency bandwidth) were well integrated up to about 150-200 ms, while the best stimuli for speed discrimination (small bandwidth) were well integrated up to about 300 ms. Interestingly, the discriminability of ocular tracking eye movements follows a non-monotonic time course, due to the contribution of motor noise. These results suggest that although perception and action work in synergy, they may be described by two different integrating mechanisms: a low-level, fast one guiding the ocular movement to enable one to catch stimuli in the visual field quickly; and a slower one able to measure the speed difference between two objects translating in the visual field. Burr, D.C. (1981). Temporal summation of moving images by the human visual system. Proceedings of the Royal Society B, 211, 321-339.

How and why do image frequency properties influence perceived speed?

Humans are able to interact successfully with moving objects in our dynamic world, and the visual system efficiently performs the motion computation that makes this possible. Object speed and direction are estimated following the integration of information across cortical motion-sensitive channels. Speed estimation along this system is not fully understood, particularly the mapping function between the actual speed of viewed objects and that perceived by observers, a question we address in this work. It has been demonstrated that perceived speed is profoundly influenced by object contrast, spatial frequency, stimulus complexity and frequency bandwidth. In a 2-interval forced choice speed discrimination task, we present a random phase textured motion stimulus to probe small shifts in perceived speed, measured using fixed stimulus sets as reference scales while mean spatial frequency and bandwidth are varied in the probe. The presentations are short (200 ms). Using a scale of narrowband stimuli (0.2 octaves), we measured a shift in perceived speed; higher frequencies are seen as moving faster than lower ones. On the scale of broader bandwidth (1 octave), this difference across frequency was reduced and perceived speed seems to converge on a slower representation. From these results we estimated the mapping between perceived and veridical stimulus speeds. In direct comparisons, the relative speed is faster for high frequencies, and increases in bandwidth make stimuli appear slower. During this early computation, when presented with a random phase stimulus, the visual system appears to make assumptions about expected speeds based on the richness of the frequency content, and the veridical speed is not explicitly computed. In this first 200 ms, the perceptual system perhaps underestimates some speeds in an optimal response for initially stabilizing the scene. Acknowledgement: CNRS & Brainscales FP7

Active inference, eye movements and oculomotor delays

We consider the problem of sensorimotor delays in the optimal control of movement under uncertainty. Specifically, we consider axonal conduction delays in the visuo-oculomotor loop and their implications for active inference. Active inference uses a generalisation of Kalman filtering to provide Bayes optimal estimates of hidden states and action in generalised coordinates of motion. Representing hidden states in generalised coordinates provides a simple means of compensating for both sensory and oculomotor delays. This compensation is illustrated using neuronal simulations of oculomotor following responses with and without compensation. We then consider an extension of the generative model that produces ocular following to simulate smooth pursuit eye movements in which the system believes both the target and its centre of gaze are attracted by a (fictive) point moving in the visual field. Finally, the generative model is equipped with a hierarchical structure, so that it can register and remember unseen (occluded) trajectories and emit anticipatory responses. These simulations speak to a straightforward and neurobiologically plausible solution to the generic problem of integrating information from different sources with different temporal delays and the particular difficulties encountered when a system, like the oculomotor system, tries to control its environment with delayed signals.

Smooth Pursuit and Visual Occlusion: Active Inference and Oculomotor Control in Schizophrenia

This paper introduces a model of oculomotor control during the smooth pursuit of occluded visual targets. This model is based upon active inference, in which subjects try to minimise their (proprioceptive) prediction error based upon posterior beliefs about the hidden causes of their (exteroceptive) sensory input. Our model appeals to a single principle - the minimisation of variational free energy - to provide Bayes optimal solutions to the smooth pursuit problem. However, it tries to accommodate the cardinal features of smooth pursuit of partially occluded targets that have been observed empirically in normal subjects and schizophrenia. Specifically, we account for the ability of normal subjects to anticipate periodic target trajectories and emit pre-emptive smooth pursuit eye movements - prior to the emergence of a target from behind an occluder. Furthermore, we show that a single deficit in the postsynaptic gain of prediction error units (encoding the precision of posterior beliefs) can account for several features of smooth pursuit in schizophrenia: namely, a reduction in motor gain and anticipatory eye movements during visual occlusion, a paradoxical improvement in tracking unpredicted deviations from target trajectories and a failure to recognise and exploit regularities in the periodic motion of visual targets. This model will form the basis of subsequent (dynamic causal) models of empirical eye tracking measurements, which we hope to validate, using psychopharmacology and studies of schizophrenia.

Pattern discrimination for moving random textures: Richer stimuli are more difficult to recognize
Pattern discrimination for moving random textures: Richer stimuli are more difficult to recognize

In order to analyze the characteristics of a rich dynamic visual environment, the visual system must integrate information collected at different scales through different spatiotemporal frequency channels. Still, it remains unclear how reliable representations of motion direction or speed are elaborated when presented with large-bandwidth motion stimuli or natural statistics. Last year, we showed that broadening the spatiotemporal frequency content of a textured pattern moving at constant speed leads to different results on a reflexive tracking task and a speed discrimination task. Larger bandwidth stimuli increase the response amplitude and sensitivity of ocular following, consistent with a maximum-likelihood (ML) model of motion decoding. In contrast, larger bandwidth stimuli impair speed discrimination performance, suggesting that the perceptual system cannot take advantage of such additional, redundant information. Instead of ML, a gain control decoding mechanism can explain the drop in performance, suggesting that action and perception rely on different decoding mechanisms. To further investigate such task-dependent pooling of motion information, we measured pattern discrimination performance using these textured stimuli. Two noise patterns were presented sequentially for 250 ms on a CRT monitor (1280 × 1024 @ 100 Hz) and covered 47° of visual angle, with identical properties (mean SF, SF bandwidth, speed) except for a randomized phase spectrum. A test pattern was then presented and subjects were asked to match it with one or the other reference stimulus (ABX task). At small bandwidth and optimal mean spatial frequency (0.3 cpd), subjects were able to discriminate the two patterns with high accuracy. Performance dropped to chance level as the spatial frequency bandwidth increased. Increasing the mean spatial frequency decreased the overall performance. Again, these results suggest that perceptual performance deteriorates in the presence of richer, larger-bandwidth information.

Motion-based prediction is sufficient to solve the aperture problem
Motion-based prediction is sufficient to solve the aperture problem

In low-level sensory systems, it is still unclear how the noisy information collected locally by neurons may give rise to a coherent global percept. This is well demonstrated for the detection of motion in the aperture problem: as the luminance of an elongated line is symmetrical along its axis, tangential velocity is ambiguous when measured locally. Here, we develop the hypothesis that motion-based predictive coding is sufficient to infer global motion. Our implementation is based on a context-dependent diffusion of a probabilistic representation of motion. We observe in simulations a progressive solution to the aperture problem similar to psychophysics and behavior. We demonstrate that this solution is the result of two underlying mechanisms. First, we demonstrate the formation of a tracking behavior favoring temporally coherent features independently of their texture. Second, we observe that incoherent features are explained away while coherent information diffuses progressively to the global scale. Most previous models included ad-hoc mechanisms such as end-stopped cells or a selection layer to track specific luminance-based features. Here, we have proved that motion-based predictive coding, as it is implemented in this functional model, is sufficient to solve the aperture problem. This simpler solution may give insights into the role of prediction underlying a large class of sensory computations.
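
For intuition, here is a minimal grid-based sketch of the predict-then-update cycle underlying motion-based prediction, with a prior favouring temporally coherent velocities. The published implementation is richer (a particle filter over continuous position and velocity, with texture-independent likelihoods), so the 1D grid, the three candidate velocities, the coherence prior and the noise levels below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimal 1D position x velocity grid filter: predict by transporting probability along each
# velocity, then update with a noisy position likelihood.  Parameters are illustrative only.
n_x, velocities = 100, np.array([-1, 0, 1])          # grid positions, candidate velocities (cells/step)
belief = np.full((n_x, len(velocities)), 1.0 / (n_x * len(velocities)))
sigma_obs, prior_coherence = 2.0, 0.95                # observation noise (cells), P(velocity unchanged)

def predict(b):
    """Transport each velocity slice along its own velocity (motion-based prediction)."""
    moved = np.stack([np.roll(b[:, j], v) for j, v in enumerate(velocities)], axis=1)
    # weak mixing between velocities encodes the prior on temporally coherent motion
    mix = (1 - prior_coherence) / (len(velocities) - 1)
    trans = np.full((len(velocities), len(velocities)), mix)
    np.fill_diagonal(trans, prior_coherence)
    return moved @ trans

def update(b, observed_x):
    like = np.exp(-0.5 * ((np.arange(n_x) - observed_x) / sigma_obs) ** 2)
    b = b * like[:, None]
    return b / b.sum()

true_x = 20
for t in range(30):
    true_x = (true_x + 1) % n_x                       # the target drifts rightward at +1 cell/step
    belief = update(predict(belief), true_x + rng.normal(0, sigma_obs))

x_hat = np.argmax(belief.sum(axis=1))
v_hat = velocities[np.argmax(belief.sum(axis=0))]
print(f"estimated position {x_hat} (true {true_x}), estimated velocity {v_hat}")
```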

Motion-based prediction is sufficient to solve the aperture problem

In low-level sensory systems, it is still unclear how the noisy information collected locally by neurons may give rise to a coherent global percept. This is well demonstrated for the detection of motion in the aperture problem: as the luminance of an elongated line is symmetrical along its axis, tangential velocity is ambiguous when measured locally. Here, we develop the hypothesis that motion-based predictive coding is sufficient to infer global motion. Our implementation is based on a context-dependent diffusion of a probabilistic representation of motion. We observe in simulations a progressive solution to the aperture problem similar to psychophysics and behavior. We demonstrate that this solution is the result of two underlying mechanisms. First, we demonstrate the formation of a tracking behavior favoring temporally coherent features independently of their texture. Second, we observe that incoherent features are explained away while coherent information diffuses progressively to the global scale. Most previous models included ad-hoc mechanisms such as end-stopped cells or a selection layer to track specific luminance-based features. Here, we have proved that motion-based predictive coding, as it is implemented in this functional model, is sufficient to solve the aperture problem. This simpler solution may give insights into the role of prediction underlying a large class of sensory computations.

Measuring speed of moving textures: Different pooling of motion information for human ocular following and perception.

To measure the speed and direction of moving objects, the cortical motion system pools information across different spatiotemporal channels. One as yet unsolved question is to understand how the brain pools this information and whether this pooling is generic or adapted to the behavioral context. Here, we investigate in humans this integration process for two different tasks: psychophysical speed discrimination and ocular following eye movements, which are a probe of early motion detection and integration (Masson & Perrinet, 2011). For both tasks, we used short presentations of ``moving textures’’ stimuli (Schrater et al., 2000) in which the width of the spatial frequency distribution (Bsf) was varied. We found that larger Bsf elicited stronger initial eye velocity during the open-loop part of tracking responses. Moreover, richer stimuli resulted in more accurate and reliable motor responses. By contrast, larger Bsf had a detrimental effect upon speed discrimination performance: speed discrimination thresholds linearly decreased when the width of the spatial frequency distribution increased. These opposite results can be explained by a different decoding strategy where speed information is under the control of different gain setting mechanisms. We tested this model by measuring contrast response functions of both ocular following and speed discrimination for each Bsf. We found that varying the spatial frequency distribution had opposite effects upon contrast gain control. Increasing Bsf lowered the half-saturation contrast for ocular following but increased it for perception. Our results support the view that speed-based perception and tracking eye movements are under the control of different early decoding mechanisms. References: Masson, G.S. & Perrinet, L.U. The behavioural receptive field underlying motion integration for primate tracking eye movements. Neurosci. Biobehav. Rev. 36, 1-25 (2011). Schrater, P.R., Knill, D.C. & Simoncelli, E.P. Mechanisms of visual motion detection. Nat. Neurosci. 3, 64-68 (2000).

Effect of image statistics on fixational eye movements

Under natural viewing conditions, small movements of the eyes prevent the maintenance of a steady direction of gaze. It is unclear how the spatiotemporal content of the fixated scene affects the properties of miniature, fixational eye movements. We have investigated the characteristics of fixational eye movements recorded while human subjects were instructed to fixate natural-statistics random textures (Motion Clouds) in which we manipulated the spatial frequency content. We used long presentations (5 s) of Motion Cloud stimuli (Schrater et al. 2000) of varying spatial frequency bandwidths (Bsf) around different central spatial frequencies (Sf0). We found that the central spatial frequency has an effect upon microsaccadic eye movements. In particular, smaller saccadic amplitudes were associated with high spatial frequencies, and larger saccades with low spatial frequencies. Broadening the spatial frequency bandwidth also changed the distribution of microsaccade amplitudes. At lower spatial frequencies, larger Bsf resulted in a large reduction of microsaccade amplitude, while fixation behavior for high spatial frequency textures was not affected. The relationship between microsaccade rate and intersaccadic timing was also dependent upon Bsf. These results suggest that the spatial frequency content of the fixated images has a strong impact upon fixation instability.

Complex dynamics in recurrent cortical networks based on spatially realistic connectivities
Complex dynamics in recurrent cortical networks based on spatially realistic connectivities

Most studies on the dynamics of recurrent cortical networks are either based on purely random wiring or neighborhood couplings. Neuronal cortical connectivity, however, shows a complex spatial pattern composed of local and remote patchy connections. We ask to what extent such geometric traits influence the ``idle’’ dynamics of two-dimensional (2d) cortical network models composed of conductance-based integrate-and-fire (iaf) neurons. In contrast to the typical 1 mm² used in most studies, we employ an enlarged spatial set-up of 25 mm² to provide for long-range connections. Our models range from purely random to distance-dependent connectivities including patchy projections, i.e., spatially clustered synapses. Analyzing the characteristic measures for synchronicity and regularity in neuronal spiking, we explore and compare the phase spaces and activity patterns of our simulation results. Depending on the input parameters, different dynamical states appear, similar to the known synchronous regular ``SR’’ or asynchronous irregular ``AI’’ firing in random networks. Our structured networks, however, exhibit shifted and sharper transitions, as well as more complex activity patterns. Distance-dependent connectivity structures induce a spatio-temporal spread of activity, e.g., propagating waves, that random networks cannot account for. Spatially and temporally restricted activity injections reveal that a high amount of local coupling induces rather unstable AI dynamics. We find that the amount of local versus long-range connections is an important parameter, whereas the structurally advantageous wiring cost optimization of patchy networks has little bearing on the phase space.

Active inference, smooth pursuit and oculomotor delays

We consider the problem of sensorimotor delays in the optimal control of movement under uncertainty. Specifically, we consider axonal conduction delays in the visuo-oculomotor loop and their implications for active inference. Active inference uses a generalisation of Kalman filtering to provide Bayes optimal estimates of hidden states and action in generalised coordinates of motion. Representing hidden states in generalised coordinates provides a simple means of compensating for both sensory and oculomotor delays. This compensation is illustrated using neuronal simulations of oculomotor following responses with and without compensation. We then consider an extension of the generative model that produces ocular following to simulate smooth pursuit eye movements in which the system believes both the target and its centre of gaze are attracted by a (fictive) point moving in the visual field. Finally, the generative model is equipped with a hierarchical structure, so that it can register and remember unseen (occluded) trajectories and emit anticipatory responses. These simulations speak to a straightforward and neurobiologically plausible solution to the generic problem of integrating information from different sources with different temporal delays and the particular difficulties encountered when a system, like the oculomotor system, tries to control its environment with delayed signals. This work was supported from the European Community’s Seventh Framework Program FP7/2007-2013 under grant agreement number 214728-2, (CODDE)

Pattern discrimination for moving random textures: Richer stimuli are more difficult to recognize

In order to analyze the characteristics of a rich dynamic visual environment, the visual system must integrate information collected at different scales through different spatiotemporal frequency channels. Still, it remains unclear how reliable representations of motion direction or speed are elaborated when presented with large-bandwidth motion stimuli or natural statistics. Last year, we showed that broadening the spatiotemporal frequency content of a textured pattern moving at constant speed leads to different results on a reflexive tracking task and a speed discrimination task. Larger bandwidth stimuli increase the response amplitude and sensitivity of ocular following, consistent with a maximum-likelihood (ML) model of motion decoding. In contrast, larger bandwidth stimuli impair speed discrimination performance, suggesting that the perceptual system cannot take advantage of such additional, redundant information. Instead of ML, a gain control decoding mechanism can explain the drop in performance, suggesting that action and perception rely on different decoding mechanisms. To further investigate such task-dependent pooling of motion information, we measured pattern discrimination performance using these textured stimuli. Two noise patterns were presented sequentially for 250 ms on a CRT monitor (1280 × 1024 @ 100 Hz) and covered 47° of visual angle, with identical properties (mean SF, SF bandwidth, speed) except for a randomized phase spectrum. A test pattern was then presented and subjects were asked to match it with one or the other reference stimulus (ABX task). At small bandwidth and optimal mean spatial frequency (0.3 cpd), subjects were able to discriminate the two patterns with high accuracy. Performance dropped to chance level as the spatial frequency bandwidth increased. Increasing the mean spatial frequency decreased the overall performance. Again, these results suggest that perceptual performance deteriorates in the presence of richer, larger-bandwidth information.

Pursuing motion illusions: a realistic oculomotor framework for Bayesian inference

Accuracy in estimating an object’s global motion over time is not only affected by the noise in visual motion information but also by the spatial limitation of the local motion analyzers (aperture problem). Perceptual and oculomotor data demonstrate that during the initial stages of motion processing, 1D motion cues related to the object’s edges have a dominating influence over the estimate of the object’s global motion. However, during the later stages, 2D motion cues related to terminators (edge endings) progressively take over, leading to a final correct estimate of the object’s global motion. Here, we propose a recursive extension to the Bayesian framework for motion processing (Weiss, Simoncelli, Adelson, 2002) cascaded with a model oculomotor plant to describe the dynamic integration of 1D and 2D motion information in the context of smooth pursuit eye movements. In the recurrent Bayesian framework, the prior defined in velocity space is combined with two independent measurement likelihood functions, representing edge-related and terminator-related information respectively, to obtain the posterior. The prior is updated with the posterior at the end of each iteration step. The maximum a posteriori (MAP) estimate of the posterior distribution at every time step is fed into the oculomotor plant to produce eye velocity responses that are compared to the human smooth pursuit data. The recurrent model was tuned with the variance of pursuit responses to either pure 1D or pure 2D motion. The oculomotor plant was tuned with an independent set of oculomotor data, including the effects of line length (i.e. stimulus energy) and directional anisotropies in the smooth pursuit responses. The model not only provides an accurate qualitative account of dynamic motion integration but also a quantitative account that is close to the smooth pursuit responses across several conditions (three contrasts and three speeds) for two human subjects.
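
Schematically, the recursive cue combination described above can be summarised as follows (a simplified rendering with generic symbols, not the model's exact equations): the velocity prior is multiplied by the edge-related (1D) and terminator-related (2D) likelihoods, the MAP of the resulting posterior drives the oculomotor plant, and the posterior becomes the prior of the next iteration.

```latex
p_{t}(v \mid I) \;\propto\; p_{t}(v)\,\mathcal{L}_{1\mathrm{D}}(I \mid v)\,\mathcal{L}_{2\mathrm{D}}(I \mid v),
\qquad
\hat{v}_{t} = \arg\max_{v}\, p_{t}(v \mid I),
\qquad
p_{t+1}(v) = p_{t}(v \mid I).
```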

Saccadic foveation of a moving visual target in the rhesus monkey

When generating a saccade toward a moving target, the target displacement that occurs during the period spanning from its detection to the saccade end must be taken into account to accurately foveate the target and to initiate its pursuit. Previous studies have shown that these saccades are characterized by a lower peak velocity and a prolonged deceleration phase. In some cases, a second peak in eye velocity appears during the deceleration phase, presumably reflecting the late influence of a mechanism that compensates for the target displacement occurring before saccade end. The goal of this work was to further determine, in the head-restrained monkey, the dynamics of this putative compensatory mechanism. A step-ramp paradigm, where the target motion was orthogonal to a target step occurring along the primary axes, was used to estimate, from the generated saccades, a component induced by the target step and another one induced by the target motion. The resulting oblique saccades were compared with saccades to a static target with matched horizontal and vertical amplitudes. This study permitted estimating the time taken for visual motion-related signals to update the programming and execution of saccades. The amplitude of the motion-related component was slightly hypometric, with an undershoot that increased with target speed. Moreover, it matched the eccentricity of the target 40-60 ms before saccade end. The lack of a significant difference in the delay between the onsets of the horizontal and vertical components, between saccades directed toward a static target and those aimed at a moving target, questions the late influence of the compensatory mechanism. The results are discussed within the framework of the dual drive and mapping hypotheses.

Edge statistics in natural images versus laboratory animal environments: implications for understanding lateral connectivity in V1

Oriented edges in images of natural scenes tend to be aligned in collinear or co-circular arrangements, with lines and smooth curves more common than other possible arrangements of edges (Geisler et al., Vis Res 41:711-24, 2001). The visual system appears to take advantage of this prior information, and human contour detection and grouping performance is well predicted by such an ``association field’’ (Field et al., Vis Res 33:173-93, 1993). One possible candidate substrate for implementing an association field in mammals is the set of long-range lateral connections between neurons in the primary visual cortex (V1), which could act to facilitate detection of contours matching the association field, and/or inhibit detection of other contours (Choe and Miikkulainen, Biol Cyb 90:75-88, 2004). To fill this role, the lateral connections would need to be orientation specific and aligned along contours, and indeed such an arrangement has been found in tree shrew primary visual cortex (Bosking et al., J Neurosci 17:2112-27, 1997). However, it is not yet known whether these patterns develop as a result of visual experience, or are simply hard-wired to be appropriate for the statistics of natural scenes. To investigate this issue, we examined the properties of the visual environment of laboratory animals, to determine whether the observed connection patterns are more similar to the statistics of the rearing environment or of a natural habitat. Specifically, we analyzed the co-occurrence statistics of edge elements in images of natural scenes, and compared them to the corresponding statistics for images taken from within the rearing environment of the animals in the Bosking et al. (1997) study. We used a modified version of the algorithm from Geisler et al. (2001), with a more general edge extraction algorithm that uses sparse coding to avoid multiple responses to a single edge. Collinearity and co-circularity results for natural images replicated qualitatively the results from Geisler et al. (2001), confirming that prior information about continuations appeared consistently in natural images. However, we find that the largely man-made environment in which these animals were reared has a significantly higher probability of collinear edge elements. We thus predict that if the lateral connection patterns are due to visual experience, the patterns in wild-raised tree shrews would be very different from those measured by Bosking et al. (1997), with shorter-range correlations and less emphasis on collinear continuations. This prediction can be tested in future experiments on matching groups of animals reared in different environments. W.H. Bosking, Y. Zhang, B. Schofield and D. Fitzpatrick (1997). Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. Journal of Neuroscience 17:2112-27. E.M. Callaway and L.C. Katz (1990). Emergence and refinement of clustered horizontal connections in cat striate cortex. Journal of Neuroscience 10:1134-53. Y. Choe and R. Miikkulainen (2004). Contour integration and segmentation with self-organized lateral connections. Biological Cybernetics 90:75-88. D.J. Field, A. Hayes, and R.F. Hess (1993). Contour integration by the human visual system: Evidence for a local ``association field’’. Vision Research 33:173-93. W.S. Geisler, J.S. Perry, B.J. Super, and D.P. Gallogly (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Research 41:711-24.

Role of homeostasis in learning sparse representations
Role of homeostasis in learning sparse representations

Neurons in the input layer of primary visual cortex in primates develop edge-like receptive fields. One approach to understanding the emergence of this response is to state that neural activity has to efficiently represent sensory data with respect to the statistics of natural scenes. Furthermore, it is believed that such an efficient coding is achieved using a competition across neurons so as to generate a sparse representation, that is, where a relatively small number of neurons are simultaneously active. Indeed, different models of sparse coding coupled with Hebbian learning and homeostasis have been proposed that successfully match the observed emergent response. However, the specific role of homeostasis in learning such sparse representations is still largely unknown. By quantitatively assessing the efficiency of the neural representation during learning, we derive a cooperative homeostasis mechanism which optimally tunes the competition between neurons within the sparse coding algorithm. We apply this homeostasis while learning small patches taken from natural images and compare its efficiency with state-of-the-art algorithms. Results show that while different sparse coding algorithms give similar coding results, the homeostasis provides an optimal balance for the representation of natural images within the population of neurons. Competition in sparse coding is optimized when it is fair: By contributing to optimize statistical competition across neurons, homeostasis is crucial in providing a more efficient solution to the emergence of independent components.
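
To illustrate the role that such a mechanism can play, here is a toy sketch of a homeostatic gain rule inside a matching-pursuit-style sparse coder: atoms that are selected too often have their effective gain lowered, so that the competition across the dictionary stays fair during learning. This is a simplified stand-in for the cooperative homeostasis rule described above, with random data and illustrative constants (dictionary size, sparsity, learning rate) rather than the learned receptive fields.

```python
import numpy as np

rng = np.random.default_rng(4)

n_atoms, n_features, n_active = 32, 64, 4
D = rng.normal(size=(n_atoms, n_features))
D /= np.linalg.norm(D, axis=1, keepdims=True)          # unit-norm dictionary atoms
gain = np.ones(n_atoms)                                 # homeostatic gains
usage = np.full(n_atoms, 1.0 / n_atoms)                 # running estimate of selection frequency
eta = 0.01                                              # homeostasis learning rate

def sparse_code(x):
    """Greedy matching pursuit, with the selection biased by the homeostatic gains."""
    residual, chosen = x.copy(), []
    for _ in range(n_active):
        c = D @ residual
        k = np.argmax(gain * np.abs(c))                 # gain-modulated competition
        residual -= c[k] * D[k]
        chosen.append(k)
    return chosen

for _ in range(2000):
    x = rng.normal(size=n_features)
    selected = np.bincount(sparse_code(x), minlength=n_atoms) / n_active
    usage = (1 - eta) * usage + eta * selected           # track how often each atom is used
    gain = (1.0 / n_atoms) / (usage + 1e-6)              # push usage toward the uniform target

print("usage spread (std/mean):", usage.std() / usage.mean())
```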

NeuralEnsemble: Towards a meta-environment for network modeling and data analysis

NeuralEnsemble (http://neuralensemble.org) is a multilateral effort to coordinate and organise neuroscience software development efforts based around the Python programming language into a larger, meta-simulator software system. To this end, NeuralEnsemble hosts services for source code management and bug tracking (Subversion/Trac) for a number of open-source neuroscience tools, organizes an annual workshop devoted to collaborative software development in neuroscience, and manages a Google group discussion forum. Here, we present two NeuralEnsemble-hosted projects: PyNN (http://neuralensemble.org/PyNN) is a package for simulator-independent specification of neuronal network models. You can write the code for a model once, using the PyNN API, and then run it without modification on any simulator that PyNN supports. Currently NEURON, NEST, PCSIM and a VLSI hardware implementation are fully supported. NeuroTools (http://neuralensemble.org/NeuroTools) is a set of tools to manage, store and analyse computational neuroscience simulations. It has been designed around PyNN, but can also be used for data from other simulation environments or even electrophysiological measurements. We will illustrate how the use of PyNN and NeuroTools eases the development of models in computational neuroscience, enhancing collaboration between different groups and increasing confidence in the correctness of results. NeuralEnsemble efforts are supported by the European FACETS project (EU-IST-2005-15879)
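
As a flavour of what simulator-independent model specification looks like, here is a minimal sketch using the modern PyNN API (version 0.8 or later, which differs from the syntax current when this abstract was written); the network, its parameters and the choice of backend are purely illustrative, and the same script runs on other supported simulators by changing only the import.

```python
import pyNN.nest as sim   # swap for pyNN.neuron, etc., without touching the model description

sim.setup(timestep=0.1)   # ms

# Two populations of conductance-based integrate-and-fire neurons.
exc = sim.Population(80, sim.IF_cond_exp(tau_m=20.0), label="exc")
inh = sim.Population(20, sim.IF_cond_exp(tau_m=20.0), label="inh")

# Random background drive and sparse recurrent connectivity (illustrative weights, in microsiemens).
noise = sim.Population(80, sim.SpikeSourcePoisson(rate=200.0))
sim.Projection(noise, exc, sim.OneToOneConnector(), sim.StaticSynapse(weight=0.01))
sim.Projection(exc, inh, sim.FixedProbabilityConnector(0.1), sim.StaticSynapse(weight=0.005))
sim.Projection(inh, exc, sim.FixedProbabilityConnector(0.1), sim.StaticSynapse(weight=0.05),
               receptor_type="inhibitory")

exc.record("spikes")
sim.run(1000.0)           # ms

spiketrains = exc.get_data().segments[0].spiketrains
print(sum(len(st) for st in spiketrains), "excitatory spikes recorded")
sim.end()
```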

Inferring monkey ocular following responses from V1 population dynamics using a probabilistic model of motion integration

A short presentation of a large moving pattern elicits an ocular following response that exhibits many of the properties attributed to low-level motion processing, such as spatial and temporal integration, contrast gain control and divisive interaction between competing motions. Similar mechanisms have been demonstrated in V1 cortical activity in response to center-surround grating patterns measured with real-time optical imaging in awake monkeys (see poster of Reynaud et al., VSS09). Based on a previously developed Bayesian framework, we have developed an optimal statistical decoder of such an observed cortical population activity as recorded by optical imaging. This model aims at characterizing the statistical dependence between early neuronal activity and ocular responses, and its performance was analyzed by comparing this neuronal read-out and the actual motor responses on a trial-by-trial basis. First, we show that the relative performance of the behavioral contrast response function is similar to the best estimate obtained from the neural activity. In particular, we show that the latency of the ocular response increases with low contrast conditions, as well as with noisier instances of the behavioral task as decoded by the model. Then, we investigate the temporal dynamics of both neuronal and motor responses and show how motion information, as represented by the model, is integrated in space to improve population decoding over time. Lastly, we explore how a surrounding velocity incongruous with the central excitation shunts the ocular response and how it is topographically represented in the cortical activity. Acknowledgment: European integrated project FACETS IST-15879.

Functional consequences of correlated excitation and inhibition on single neuron integration and signal propagation through synfire chains

Neurons receive a large number of excitatory and inhibitory synaptic inputs whose temporal interplay determines their spiking behavior. On average, excitation (Gexc) and inhibition (Ginh) balance each other, such that spikes are elicited by fluctuations [1]. In addition, it has been shown in vivo that Gexc and Ginh are correlated, with Ginh lagging Gexc by only a few milliseconds (~6 ms), creating a small temporal integration window [2,3]. This correlation structure could be induced by feed-forward inhibition (FFI), which has been shown to be present at many sites in the central nervous system. To characterize the functional consequences of the FFI, we first modeled a simple circuit using spiking neurons with conductance-based synapses and studied the effect on single neuron integration. We then coupled many such circuits to construct a feed-forward network (synfire chain [4,5]) and investigated the effect of FFI on signal propagation along such a feed-forward network. We found that the small temporal integration window, induced by the FFI, changes the integrative properties of the neuron. Only transient stimuli could produce a response when the FFI was active, whereas without FFI the neuron responded to both steady and transient stimuli. Due to the increase in selectivity to transient inputs, the conditions of signal propagation through the feed-forward network changed as well. Whereas synchronous inputs could reliably propagate, high asynchronous input rates, which are known to induce synfire activity [6], failed to do so. In summary, the FFI increased the stability of the synfire chain. Supported by DFG SFB 780, EU-15879-FACETS, BMBF 01GQ0420 to BCCN Freiburg. [1] Kumar A., Schrader S., Aertsen A. and Rotter S. (2008). The high-conductance state of cortical networks. Neural Computation, 20(1):1–43. [2] Okun M. and Lampl I. (2008). Instantaneous correlation of excitation and inhibition during ongoing and sensory-evoked activities. Nat Neurosci, 11(5):535–7. [3] Baudot P., Levy M., Marre O., Monier C. and Frégnac Y. (2008). [4] Abeles M. (1991). Corticonics: Neural circuits of the cerebral cortex. Cambridge University Press, Cambridge, UK. [5] Diesmann M., Gewaltig M-O and Aertsen A. (1999). Stable propagation of synchronous spiking in cortical neural networks. Nature, 402(6761):529–33. [6] Kumar A., Rotter S. and Aertsen A. (2008). Conditions for propagating synchronous spiking and asynchronous firing rates in a cortical network model. J Neurosci, 28(20):5268–80.
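
To make the feed-forward-inhibition motif concrete, here is a minimal single-neuron sketch, assuming a conductance-based integrate-and-fire model in which the inhibitory conductance is simply a scaled copy of the excitatory one delayed by ~6 ms; all parameter values are illustrative and not taken from the cited studies.

```python
# Minimal sketch: conductance-based integrate-and-fire neuron driven by an
# excitatory conductance trace and a delayed, scaled copy acting as feed-forward
# inhibition. Parameter values are illustrative, not from the cited works.
import numpy as np

dt, T = 0.1, 500.0                      # time step and duration (ms)
t = np.arange(0.0, T, dt)
tau_m, v_rest, v_thresh, v_reset = 20.0, -70.0, -50.0, -60.0   # ms, mV
E_exc, E_inh = 0.0, -80.0               # synaptic reversal potentials (mV)
g_leak = 1.0                            # leak conductance (arbitrary units)

rng = np.random.default_rng(1)
# excitatory conductance: noisy background plus a transient bump at t = 250 ms
g_exc = 0.3 + 0.05 * rng.standard_normal(t.size)
g_exc += 1.5 * np.exp(-0.5 * ((t - 250.0) / 5.0) ** 2)
g_exc = np.clip(g_exc, 0.0, None)

lag = int(6.0 / dt)                     # inhibition lags excitation by ~6 ms
g_inh = 2.0 * np.roll(g_exc, lag)
g_inh[:lag] = g_inh[lag]

v, spikes = v_rest, []
for i in range(t.size):
    I_syn = g_exc[i] * (E_exc - v) + g_inh[i] * (E_inh - v)
    v += dt / tau_m * (-(v - v_rest) + I_syn / g_leak)
    if v >= v_thresh:
        spikes.append(t[i])
        v = v_reset

print(f"{len(spikes)} spikes; with FFI, only the transient around 250 ms drives firing here")
```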

Dynamics of cortical networks including long-range patchy connections

Most studies of cortical network dynamics are either based on purely random wiring or on neighborhood couplings [1], focussing on a rather local scale. Neuronal connections in the cortex, however, show a more complex spatial pattern composed of local and long-range patchy connections [2,3], as shown in the figure: it represents a tracer injection (gray areas) in the gray matter of a flattened cortex (top view), where black dots indicate neuron positions, blue lines their patchy axonal ramifications, and red lines the local connections. Moreover, to include distant synapses, one has to enlarge the spatial scale from the typically assumed 1 mm to 5 mm side length. As our aim is to analyze more realistic network models of the cortex, we assume a distance-dependent connectivity that reflects the geometry of dendrites and axons [3]. Here, we ask to what extent the assumption of specific geometric traits influences the resulting dynamical behavior of these networks. Analyzing various characteristic measures that describe spiking neurons (e.g., coefficient of variation, correlation coefficient), we compare the dynamical state spaces of different connectivity types: purely random or purely local couplings, a combination of local and distant synapses, and connectivity structures with patchy projections. On top of biologically realistic background states, a stimulus is applied in order to analyze their stability. As in previous studies [1], we find different dynamical states depending on the external input rate and the numerical relation between excitatory and inhibitory synaptic weights. Preliminary results indicate, however, that transitions between these states are much sharper in the case of local or patchy couplings. This work is supported by EU Grant 15879 (FACETS). Thanks to Stefan Rotter who supervised the PhD project [3] this work is based on. Network dynamics are simulated with NEST/PyNN [4]. [1] A. Kumar, S. Schrader, A. Aertsen and S. Rotter, Neural Computation 20, 2008, 1-43. [2] T. Binzegger, R.J. Douglas and K.A.C. Martin, J. of Neurosci., 27(45), 2007, 12242-12254. [3] Voges N, Fakultaet fuer Biologie, Albert-Ludwigs-Universitaet Freiburg, 2007. [4] NEST: M.O. Gewaltig and M. Diesmann, Scholarpedia 2(4):1430.

Dynamical state spaces of cortical networks representing various horizontal connectivities

Most studies of cortical network dynamics are either based on purely random wiring or on neighborhood couplings, e.g., [Kumar, Schrader, Aertsen, Rotter, 2008, Neural Computation 20, 1–43]. Neuronal connections in the cortex, however, show a complex spatial pattern composed of local and long-range connections, the latter featuring a so-called patchy projection pattern, i.e., spatially clustered synapses [Binzegger, Douglas, Martin, 2007, J. Neurosci. 27(45), 12242–12254]. The idea of our project is to provide and analyze probabilistic network models that more adequately represent horizontal connectivity in the cortex. In particular, we investigate the effect of specific projection patterns on the dynamical state space of cortical networks. Assuming an enlarged spatial scale, we employ a distance-dependent connectivity that reflects the geometry of dendrites and axons. We simulate the network dynamics using the neuronal network simulator NEST via PyNN. Our models are composed of conductance-based integrate-and-fire neurons, representing fast-spiking inhibitory and regular-spiking excitatory cells. In order to compare the dynamical state spaces of previous studies with our network models, we consider the following connectivity assumptions: purely random or purely local couplings, a combination of local and distant synapses, and connectivity structures with patchy projections. Similar to previous studies, we find different dynamical states depending on the input parameters: the external input rate and the numerical relation between excitatory and inhibitory synaptic weights. These states, e.g., synchronous regular (SR) or asynchronous irregular (AI) firing, are characterized by measures like the mean firing rate, the correlation coefficient, the coefficient of variation and so forth. On top of identified biologically realistic background states (AI), stimuli are applied in order to analyze their stability. Comparing the results of our different network models, we find that the parameter space necessary to describe all possible dynamical states of a network is much more concentrated if local couplings are involved. The transition between different states is shifted (with respect to both input parameters) and sharpened depending on the relative amount of local couplings. Local couplings strongly enhance the mean firing rate and lead to smaller values of the correlation coefficient. In terms of the emergence of synchronous states, however, networks with local (versus non-local) or patchy (versus random) remote connections exhibit a higher probability of synchronized spiking. Concerning stability, preliminary results indicate that, again, networks with local or patchy connections show a higher probability of changing from the AI to the SR state. We conclude that the combination of local and remote projections has important consequences for network activity: the apparent differences we found between distinct connectivity assumptions in the dynamical state spaces suggest that network dynamics strongly depend on the connectivity structure. This effect might be even stronger with respect to the spatio-temporal spread of signal propagation. This work is supported by EC IP project FP6-015879 (FACETS).
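
As a rough illustration of the connectivity assumptions compared above, the sketch below draws connections from a local, distance-dependent Gaussian profile plus a few remote "patches" of clustered targets; the sheet size, probabilities and patch geometry are illustrative assumptions, not the parameters of the study.

```python
# Minimal sketch of local (distance-dependent) plus long-range patchy connectivity
# on a 5 x 5 mm sheet. All scales and probabilities are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, size = 2000, 5.0                       # number of neurons, sheet side length (mm)
pos = rng.uniform(0.0, size, (n, 2))

def connection_prob(source, targets, sigma_local=0.3, p_local=0.3,
                    patch_centres=None, sigma_patch=0.2, p_patch=0.2):
    """Probability of connecting `source` to each of `targets` (positions in mm)."""
    d_local = np.linalg.norm(targets - source, axis=1)
    p = p_local * np.exp(-d_local**2 / (2 * sigma_local**2))
    if patch_centres is not None:
        for c in patch_centres:           # long-range patchy projections
            d = np.linalg.norm(targets - c, axis=1)
            p += p_patch * np.exp(-d**2 / (2 * sigma_patch**2))
    return np.clip(p, 0.0, 1.0)

# example: one source neuron projects locally and to 3 patches placed 1-2.5 mm away
src = pos[0]
angles = rng.uniform(0, 2 * np.pi, 3)
radii = rng.uniform(1.0, 2.5, 3)
patches = src + np.c_[radii * np.cos(angles), radii * np.sin(angles)]
p = connection_prob(src, pos, patch_centres=patches)
targets = np.nonzero(rng.random(n) < p)[0]
print(f"neuron 0 connects to {targets.size} targets (local + patchy)")
```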

Decoding center-surround interactions in population of neurons for the ocular following response

Short presentation of a large moving pattern elicits an Ocular Following Response (OFR) that exhibits many of the properties attributed to low-level motion processing, such as spatial and temporal integration, contrast gain control and divisive interaction between competing motions. Similar mechanisms have been demonstrated in V1 cortical activity in response to center-surround grating patterns measured with real-time optical imaging in awake monkeys. More recent OFR experiments have used disk gratings and bipartite stimuli which are optimized to study the dynamics of center-surround integration. We quantified two main characteristics of the global spatial integration of motion from an intermediate map of possible local translation velocities: (i) a finite optimal stimulus size for driving OFR, surrounded by an antagonistic modulation, and (ii) a direction-selective suppressive effect of the surround on the contrast gain control of the central stimuli [Barthelemy06, Barthelemy07]. In fact, the machinery behind the visual perception of motion and the subsequent sensorimotor transformation is confronted with uncertainties which are efficiently resolved in the primate's visual system. We may understand this response as an ideal observer in a probabilistic framework by using Bayesian theory [Weiss02], and we extended the ideal observer model in the dynamical domain to simulate the spatial integration of the different local motion cues within a probabilistic representation. We proved that this model successfully accounts for the OFR in the different experiments [Perrinet07neurocomp], that is, for different levels of noise with full-field gratings, with disks of various sizes, and also for the effect of a flickering surround. However, another ad hoc inhibitory mechanism has to be added to this model to account for suppressive effects of the surround. We explore here a hypothesis in which this could be understood as the effect of a recurrent prediction of information in the velocity map. In fact, in previous models, the integration step assumes independence of the local information, whereas natural scenes are very predictable: due to the rigidity and inertia of physical objects in visual space, neighboring local spatiotemporal information is redundant, and one may introduce this a priori knowledge of the statistics of the input into the ideal observer model. We implement this in a realistic model of a layer representing velocities in a map of cortical columns, where predictions are implemented by lateral interactions within the cortical area. First, raw velocities are estimated locally from images and are propagated to this area in a feed-forward manner. Using this velocity map, we progressively learn the dependence of local velocities in a second layer of the model. This algorithm is cyclic, since the prediction uses the local velocities which themselves use both the feed-forward input and the prediction: we control the convergence of this process by measuring results for different learning rates. Results show that this simple model is sufficient to disambiguate characteristic patterns such as the barber-pole illusion. Due to the recurrent network which modulates the velocity map, it also explains that the representation may exhibit some memory, such as when an object suddenly disappears or when presenting a dot followed by a line (line-motion illusion). Finally, we applied this model, tuned over a set of natural scenes, to gratings of increasing sizes. We observed first that the feed-forward response, as tuned to neurophysiological data, gave lower responses at higher eccentricities, and that this effect was greater for higher grating frequencies. Then, we observed that, depending on the size of the disk and on its spatial frequency, the recurrent network of lateral interactions modulated this feed-forward response. Lastly, we explore how a surround velocity incongruent with the central excitation shunts the ocular response and how it is topographically represented in the cortical activity.

Correlating Excitation and Inhibition in Visual Cortical Circuits : Functional Consequences and Biological Feasibility

The primary visual cortex (V1) is one of the most studied cortical areas in the brain. Together with the retina and the lateral geniculate nucleus (LGN), it forms the early visual system, which has become a common model for studying computational principles in sensory systems. Simple artificial stimuli, such as drifting gratings (DG), have given insights into the neural basis of visual processing. However, more and more researchers have recently started to use more complex natural visual stimuli (NI), arguing that low-dimensional artificial stimuli are not sufficient for a complete understanding of the visual system. For example, whereas the responses of V1 neurons to DG are dense but with variable spike timings, the neurons respond with only few but precise spikes to NI. Furthermore, linear receptive field models provide a good fit to responses to simple stimuli, but they often fail for NI. To investigate the mechanisms behind the stimulus-dependent responses of cortical neurons, we built a biophysical, yet simple and comprehensible, model of the early visual system. We show how the spatial and temporal stimulus properties interact with the model architecture to give rise to the differential response behaviour. Our results show that during NI the LGN afferents show epochs of correlated activity. These temporal correlations induce transient excitatory synaptic inputs, resulting in precise spike timings in V1. Furthermore, the sparseness of the responses to NI can be explained by correlated and lagging inhibitory conductance, which is induced by the interactions of the thalamocortical circuit with the spatio-temporal correlations in the stimulus. We continue by investigating the origin of stimulus-dependent nonlinear responses by comparing models of different complexity. Our results suggest that adaptive processes shape the responses, depending on the temporal properties of the stimuli, while the spatial properties can result in nonlinear inputs through the recurrent cortical network. We then study the functional consequences of correlated excitatory and inhibitory conductances in more detail in generic models. These results show that: (1) spiking of individual neurons becomes sparse and precise; (2) the selectivity of signal propagation increases, and the precise delay allows gating of the propagation through feed-forward structures; and (3) recurrent cortical networks are more stable and more likely to elicit in vivo-like activity states. Lastly, our work illustrates new advances in methods for constructing and exchanging models of neuronal systems by means of a simulator-independent description language (PyNN). We use this new tool to investigate the feasibility of comparing software simulations with neuromorphic hardware emulations. The presented work gives new perspectives on the processing of the early visual system, in particular on the importance of correlated conductances. It thus opens the door for more elaborate models of the visual system.

Dynamics of distributed 1D and 2D motion representations for short-latency ocular following
Dynamics of distributed 1D and 2D motion representations for short-latency ocular following

Integrating information is essential to measure the physical 2D motion of a surface from both the ambiguous local 1D motion of its elongated edges and the non-ambiguous 2D motion of its features, such as corners or texture elements. The dynamics of this motion integration shows a complex time course, as read from tracking eye movements: first, local 1D motion signals are extracted and pooled to initiate ocular responses, then 2D motion signals are integrated to adjust the tracking direction until it matches the surface motion direction. The nature of these 1D and 2D motion computations is still unclear. One hypothesis is that their different dynamics may be explained by different contrast sensitivities. To test this, we measured contrast-response functions of early, 1D-driven and late, 2D-driven components of ocular following responses to different motion stimuli: gratings, plaids and barberpoles. We found that contrast dynamics of 1D-driven responses are nearly identical across the different stimuli. On the contrary, late 2D-driven components with either plaids or barberpoles have similar latencies but different contrast dynamics. Temporal dynamics of both 1D- and 2D-driven responses demonstrate that the different contrast gains are set very early during the response time course. Running a Bayesian model of motion integration, we show that a large family of contrast-response functions can be predicted from the probability distributions of 1D and 2D motion signals for each stimulus and from the shape of the prior distribution. However, the pure delay (i.e. largely independent of contrast) observed between 1D- and 2D-driven responses supports the hypothesis that 1D and 2D probability distributions are computed independently. This two-pathway Bayesian model supports the idea that 1D and 2D mechanisms represent edge and feature motion in parallel.
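
A compact way to see how contrast can shape such response functions is the standard Gaussian Bayesian estimate used in this family of models (in the spirit of Weiss et al., 2002). The equations below are a generic sketch with an assumed likelihood width scaling of sigma_0/c; they are not the paper's specific two-pathway formulation.

```latex
% Generic sketch (Gaussian assumptions in the spirit of Weiss et al., 2002),
% not the paper's specific two-pathway formulation. A motion cue measured at
% contrast c has a likelihood whose variance shrinks with contrast, e.g.
% \sigma^2(c) = \sigma_0^2 / c^2; combined with a zero-mean slow-speed prior
% of variance \sigma_p^2, the MAP velocity is a contrast-dependent shrinkage
% of the measured velocity v_m:
\[
  p(v \mid I) \;\propto\; p(I \mid v)\, p(v),
  \qquad
  \hat{v} \;=\; \arg\max_v \, p(v \mid I)
          \;=\; \frac{\sigma_p^2}{\sigma_p^2 + \sigma_0^2 / c^2}\; v_m .
\]
% With two cues (1D and 2D), each carries its own contrast-dependent
% reliability and the same product rule combines them, which is one way to
% obtain a family of contrast-response functions from the cue statistics.
```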

PyNN: A Common Interface for Neuronal Network Simulators

Computational neuroscience has produced a diversity of software for simulations of networks of spiking neurons, with both negative and positive consequences. On the one hand, each simulator uses its own programming or configuration language, leading to considerable difficulty in porting models from one simulator to another. This impedes communication between investigators and makes it harder to reproduce and build on the work of others. On the other hand, simulation results can be cross-checked between different simulators, giving greater confidence in their correctness, and each simulator has different optimizations, so the most appropriate simulator can be chosen for a given modelling task. A common programming interface to multiple simulators would reduce or eliminate the problems of simulator diversity while retaining the benefits. PyNN is such an interface, making it possible to write a simulation script once, using the Python programming language, and run it without modification on any supported simulator (currently NEURON, NEST, PCSIM, Brian and the Heidelberg VLSI neuromorphic hardware). PyNN increases the productivity of neuronal network modelling by providing high-level abstraction, by promoting code sharing and reuse, and by providing a foundation for simulator-agnostic analysis, visualization and data-management tools. PyNN increases the reliability of modelling studies by making it much easier to check results on multiple simulators. PyNN is open-source software and is available from http://neuralensemble.org/PyNN.
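
For readers unfamiliar with PyNN, a minimal simulator-independent script looks roughly like the following; class and argument names follow recent PyNN releases and may differ from the version current at the time of the abstract, so treat it as an illustrative sketch rather than canonical usage.

```python
# Minimal PyNN sketch (API details vary across versions): the same script runs
# on any supported backend by changing only the import line.
import pyNN.nest as sim          # swap for pyNN.neuron, pyNN.brian2, ... to change backend

sim.setup(timestep=0.1)          # ms

# two populations of conductance-based integrate-and-fire cells plus a noise source
exc = sim.Population(80, sim.IF_cond_exp())
inh = sim.Population(20, sim.IF_cond_exp())
noise = sim.Population(80, sim.SpikeSourcePoisson(rate=10.0))

sim.Projection(noise, exc, sim.OneToOneConnector(),
               sim.StaticSynapse(weight=0.005, delay=1.0))
sim.Projection(exc, inh, sim.FixedProbabilityConnector(0.1),
               sim.StaticSynapse(weight=0.005, delay=1.0))
sim.Projection(inh, exc, sim.FixedProbabilityConnector(0.1),
               sim.StaticSynapse(weight=0.05, delay=1.0),
               receptor_type='inhibitory')

exc.record('spikes')
sim.run(1000.0)                  # ms
data = exc.get_data()            # Neo Block containing the recorded spike trains
sim.end()
```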

Decoding the population dynamics underlying ocular following response using a probabilistic framework
Decoding the population dynamics underlying ocular following response using a probabilistic framework

The machinery behind the visual perception of motion and the subsequent sensorimotor transformation, such as in the Ocular Following Response (OFR), is confronted with uncertainties which are efficiently resolved in the primate's visual system. We may understand this response as an ideal observer in a probabilistic framework by using Bayesian theory (Weiss et al., 2002), which we previously proved to be successfully adapted to model the OFR for different levels of noise with full-field gratings or with disks of various sizes, and for the effect of a flickering surround (Perrinet and Masson, 2007). More recent OFR experiments have used disk gratings and bipartite stimuli which are optimized to study the dynamics of center-surround integration. We quantified two main characteristics of the global spatial integration of motion from an intermediate map of possible local translation velocities: (i) a finite optimal stimulus size for driving OFR, surrounded by an antagonistic modulation, and (ii) a direction-selective suppressive effect of the surround on the contrast gain control of the central stimuli (Barthélemy et al., 2006, 2007). Herein, we extended the ideal observer model in the dynamical domain to simulate the spatial integration of the different local motion cues within a probabilistic representation. We present analytical results which show that the hypothesis of independence of local measures can describe the initial segment of spatial integration of the motion signal. Within this framework, we successfully accounted for the dynamical contrast gain control mechanisms observed in the behavioral data for center-surround stimuli. However, another inhibitory mechanism had to be added to account for suppressive effects of the surround. We explore here a hypothesis in which this could be understood as the effect of a recurrent integration of information in the velocity map. F. Barthelemy, L. U. Perrinet, E. Castet, and G. S. Masson. Dynamics of distributed 1D and 2D motion representations for short-latency ocular following. Vision Research, 48(4):501–22, Feb 2007. doi: 10.1016/j.visres.2007.10.020. F. V. Barthelemy, I. Vanzetta, and G. S. Masson. Behavioral receptive field for ocular following in humans: Dynamics of spatial summation and center-surround interactions. Journal of Neurophysiology, 95:3712–26, Mar 2006. doi: 10.1152/jn.00112.2006. L. U. Perrinet and G. S. Masson. Modeling spatial integration in the ocular following response using a probabilistic framework. Journal of Physiology (Paris), 2007. doi: 10.1016/j.jphysparis.2007.10.011. Y. Weiss, E. P. Simoncelli, and E. H. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604, Jun 2002. doi: 10.1038/nn858. This work was supported by EC IP project FP6-015879, ‘‘FACETS’’.
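
The independence hypothesis mentioned above amounts to multiplying local likelihoods together with a prior. A toy numerical sketch of that read-out on a discretised velocity axis is given below; the contrast-dependent likelihood width and all numbers are illustrative assumptions, not the model's fitted parameters.

```python
# Toy ideal-observer read-out: local velocity likelihoods are combined under an
# independence assumption and multiplied by a slow-speed prior on a velocity grid.
# All numbers are illustrative assumptions.
import numpy as np

v = np.linspace(-10.0, 10.0, 401)             # candidate horizontal velocities (deg/s)

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

prior = gaussian(v, 0.0, 2.0)                 # preference for slow speeds

# local measurements: true velocity 5 deg/s, reliability set by local contrast
rng = np.random.default_rng(3)
contrasts = rng.uniform(0.05, 0.5, 20)
measurements = 5.0 + rng.standard_normal(20) * (0.5 / contrasts)

log_post = np.log(prior)
for m, c in zip(measurements, contrasts):
    sigma = 0.5 / c                           # higher contrast -> narrower likelihood
    log_post += np.log(gaussian(v, m, sigma) + 1e-300)

posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()
v_hat = v[np.argmax(posterior)]
print(f"decoded velocity: {v_hat:.2f} deg/s (true value 5.00)")
```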

Control of the temporal interplay between excitation and inhibition by the statistics of visual input: a V1 network modelling study

In the primary visual cortex (V1), single-cell responses to simple visual stimuli (gratings) are usually dense but with a high trial-by-trial variability. In contrast, when exposed to full-field natural scenes, the firing patterns of these neurons are sparse but highly reproducible over trials (Marre et al., 2005; Frégnac et al., 2006). It is still not understood how these two classes of stimuli can elicit these two distinct firing behaviours. A common model for simple-cell computation in layer 4 is the ‘push-pull’ circuitry (Troyer et al., 1998). It accounts for the observed anti-phase behaviour between excitatory and inhibitory conductances in response to a drifting grating (Anderson et al., 2000; Monier et al., 2008), creating a wide temporal integration window during which excitation is integrated, without the shunting or opponent effect of inhibition, and allowed to elicit multiple spikes. This is in contrast to recent results from intracellular recordings in vivo during presentation of natural scenes (Baudot et al., 2013). Here the excitatory and inhibitory conductances were highly correlated, with inhibition lagging excitation by only a few milliseconds (~6 ms). This small lag creates a narrow temporal integration window such that only synchronized excitatory inputs can elicit a spike, similar to parallel observations in other cortical sensory areas (Wehr and Zador, 2003; Okun and Lampl, 2008). To investigate the cellular and network mechanisms underlying these two different correlation structures, we constructed a realistic model of the V1 network using spiking neurons with conductance-based synapses. We calibrated our model to fit the irregular ongoing activity pattern as well as in vivo conductance measurements during drifting grating stimulation, and then extracted predicted responses to natural scenes seen through eye movements. Our simulations reproduced the experimental observations described above, with anti-phase behaviour between excitation and inhibition during gratings and phase-lagged activation during natural scenes. In conclusion, the same cortical network that shows dense and variable responses to gratings exhibits sparse and precise spiking to natural scenes. Work is under way to show to what extent this feature is specific to the feedforward versus recurrent nature of the modelled circuit.

Adaptive Sparse Spike Coding : applications of Neuroscience to the compression of natural images
Adaptive Sparse Spike Coding : applications of Neuroscience to the compression of natural images

While modern computers are sometimes superior to human cognition in specialized tasks such as playing chess or browsing a large database, they cannot match the efficiency of biological vision for such simple tasks as recognizing a relative or following an object against a complex background. We present in this paper our attempt at outlining the dynamical, parallel and event-based representation for vision in the architecture of the central nervous system. We illustrate this by showing that, in a signal matching framework, an L/NL (linear/non-linear) cascade may efficiently transform a sensory signal into a neural spiking signal, and we apply this framework to a model retina. However, this code becomes redundant when using an over-complete basis, as is necessary for modeling the primary visual cortex: we therefore optimize the efficiency cost by increasing the sparseness of the code. This is implemented by propagating and canceling redundant information using lateral interactions. We compare the efficiency of this representation in terms of compression, that is, the reconstruction quality as a function of the coding length. This corresponds to a modification of the Matching Pursuit algorithm in which the ArgMax function is optimized for competition, or Competition-Optimized Matching Pursuit (COMP). We particularly focus on bridging neuroscience and image processing and on the advantages of such an interdisciplinary approach.

Dynamical Neural Networks: modeling low-level vision at short latencies

The machinery behind the visual perception of motion and the subsequent sensori-motor transformation, such as in the ocular following response (OFR), is confronted with uncertainties which are efficiently resolved in the primate's visual system. We may understand this response as an ideal observer in a probabilistic framework by using Bayesian theory [Weiss, Y., Simoncelli, E.P., Adelson, E.H., 2002. Motion illusions as optimal percepts. Nature Neuroscience, 5(6), 598-604, doi:10.1038/nn858], which we previously proved to be successfully adapted to model the OFR for different levels of noise with full-field gratings. More recent OFR experiments have used disk gratings and bipartite stimuli which are optimized to study the dynamics of center-surround integration. We quantified two main characteristics of the spatial integration of motion: (i) a finite optimal stimulus size for driving OFR, surrounded by an antagonistic modulation, and (ii) a direction-selective suppressive effect of the surround on the contrast gain control of the central stimuli [Barthélemy, F.V., Vanzetta, I., Masson, G.S., 2006. Behavioral receptive field for ocular following in humans: dynamics of spatial summation and center-surround interactions. Journal of Neurophysiology, 95, 3712-3726, doi:10.1152/jn.00112.2006]. Herein, we extended the ideal observer model to simulate the spatial integration of the different local motion cues within a probabilistic representation. We present analytical results which show that the hypothesis of independence of local measures can describe the spatial integration of the motion signal. Within this framework, we successfully accounted for the contrast gain control mechanisms observed in the behavioral data for center-surround stimuli. However, another inhibitory mechanism had to be added to account for suppressive effects of the surround.

Self-Invertible 2D Log-Gabor Wavelets
Self-Invertible 2D Log-Gabor Wavelets

While biorthogonal wavelets have become a very popular image processing tool, alternative multiresolution transforms have been proposed to solve some of their drawbacks, namely the poor selectivity in orientation and the lack of translation invariance due to the aliasing between subbands. These transforms are generally overcomplete and consequently offer large degrees of freedom in their design. At the same time, their optimization becomes a challenging task. We propose here a log-Gabor wavelet transform which combines the excellent mathematical properties of Gabor functions with a careful construction that maintains the properties of the filters and permits exact reconstruction. Two major improvements are proposed: first, the highest frequency bands are covered by narrowly localized oriented filters. Second, all frequency bands, including the highest and lowest frequencies, are uniformly covered, so that exact reconstruction is achieved using the same filters in both the direct and inverse transforms (which means that the transform is self-invertible). The transform is optimized not only mathematically; it also follows as closely as possible current knowledge of the receptive fields of simple cells in the primary visual cortex (V1) of primates and of the statistics of natural images. Compared to the state of the art, the log-Gabor wavelets show excellent behavior in their ability to segregate image information (e.g. contrast edges) from incoherent Gaussian noise by hard thresholding, and to code the image features through a reduced set of coefficients with large magnitude. Such characteristics make the transform a promising tool for general image processing tasks.
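
For illustration, a single 2D log-Gabor filter can be written directly in the Fourier domain as a radial log-Gaussian multiplied by an angular Gaussian. The sketch below uses this textbook construction with illustrative bandwidth values; it is not the exact filter bank of the paper.

```python
# Generic 2D log-Gabor filter defined in the Fourier domain; bandwidths are
# illustrative and the construction is the textbook one, not the paper's exact bank.
import numpy as np

def log_gabor(shape, f0, theta0, sigma_f=0.55, sigma_theta=np.pi / 8):
    """Frequency-domain transfer function of one log-Gabor filter.

    f0     : centre frequency (cycles per pixel, 0 < f0 < 0.5)
    theta0 : preferred orientation (radians)
    """
    ny, nx = shape
    fy = np.fft.fftfreq(ny)[:, None]
    fx = np.fft.fftfreq(nx)[None, :]
    f = np.hypot(fx, fy)
    theta = np.arctan2(fy, fx)

    with np.errstate(divide='ignore'):
        radial = np.exp(-0.5 * (np.log(f / f0) / np.log(1 + sigma_f)) ** 2)
    radial[f == 0] = 0.0                                # no DC response

    dtheta = np.angle(np.exp(1j * (theta - theta0)))    # wrapped angular distance
    angular = np.exp(-0.5 * (dtheta / sigma_theta) ** 2)
    return radial * angular

# filter an image (random noise as a stand-in) with one oriented frequency band
img = np.random.default_rng(4).standard_normal((128, 128))
G = log_gabor(img.shape, f0=0.1, theta0=np.pi / 4)
response = np.fft.ifft2(np.fft.fft2(img) * G)           # complex (quadrature) response
print(response.real.std(), np.abs(response).mean())
```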

Visual tracking of ambiguous moving objects: A recursive Bayesian model

Perceptual and oculomotor data demonstrate that, when the visual information about an object's motion differs on the local (edge-related) and global levels, the local 1D motion cues dominate initially, whereas 2D information progressively takes over and leads to the final correct representation of global motion. Previous models have explained the initial errors (deviations from the global motion) in terms of the best perceptual guess in the Bayesian sense. These models accounted for the intrinsic sensory noise of the image and general expectancies for object velocities. Here we propose a recursive extension of the Bayesian model, with the purpose of encompassing the whole dynamical evolution of motion processing, from the 1D cues to the correct global motion. Our model is motivated and constrained by smooth pursuit oculomotor data. Eye movements were recorded in 3 participants using the scleral search coil technique. Participants were asked to track either a single line (vertical or oblique) or a Gaussian blob moving horizontally. In our model, oculomotor data obtained with non-ambiguous stimuli (e.g. with coherent local and global information, such as a Gaussian blob or a vertical line moving horizontally) are combined to constrain the initial likelihood and prior functions for the general, ambiguous case (e.g. a tilted line moving horizontally). The prior knowledge is then recursively updated by using the previous posterior probability as the current prior. The idea is that the recursive injection of the posterior distribution boosts the spread of information about the object's shape, favoring the integration of 1D and 2D cues. In addition, a simple model of the sensory-oculomotor loop is taken into account, including transmission delays and the evolution of the retinal motion during pursuit. Preliminary results show substantial agreement between the model predictions and the oculomotor data.
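
A minimal numerical sketch of the recursion described above, under Gaussian assumptions, is given below: 1D (edge) and 2D (feature) likelihoods are fused with the current prior, and the posterior is re-injected as the next prior. The cue directions, variances and the schedule by which 2D evidence sharpens over time are illustrative assumptions, not the fitted model.

```python
# Toy recursive Bayesian update: the posterior over 2D velocity becomes the prior
# of the next step, so the estimate drifts from the 1D (oblique) cue toward the
# true horizontal motion. All numbers are illustrative assumptions.
import numpy as np

def fuse(mu_prior, var_prior, mu_like, var_like):
    """Product of two Gaussians -> posterior mean and variance (element-wise)."""
    var_post = 1.0 / (1.0 / var_prior + 1.0 / var_like)
    mu_post = var_post * (mu_prior / var_prior + mu_like / var_like)
    return mu_post, var_post

# a tilted line translating horizontally: the 1D cue points obliquely (orthogonal
# to the edge), while the 2D cue (line endpoints) points in the true direction
true_dir = np.array([1.0, 0.0])
cue_1d = np.array([0.5, 0.5])                 # component orthogonal to a 45-deg edge
var_1d, var_2d = 0.05, 0.5                    # 1D cue is initially more reliable

mu, var = np.zeros(2), np.full(2, 4.0)        # broad initial prior on velocity
for step in range(20):
    mu, var = fuse(mu, var, cue_1d, var_1d)
    mu, var = fuse(mu, var, true_dir, var_2d)
    var_2d *= 0.8                             # 2D evidence sharpens over time
    if step % 5 == 0:
        angle = np.degrees(np.arctan2(mu[1], mu[0]))
        print(f"step {step:2d}: tracking direction {angle:5.1f} deg")
```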

PyNN: towards a universal neural simulator API in Python

Trends in programming language development and adoption point to Python as the high-level systems integration language of choice. Python leverages a vast developer-base external to the neuroscience community, and promises leaps in simulation complexity and maintainability to any neural simulator that adopts it. PyNN http://neuralensemble.org/PyNN strives to provide a uniform application programming interface (API) across neural simulators. Presently NEURON and NEST are supported, and support for other simulators and neuromorphic VLSI hardware is under development. With PyNN it is possible to write a simulation script once and run it without modification on any supported simulator. It is also possible to write a script that uses capabilities specific to a single simulator. While this sacrifices simulator-independence, it adds flexibility, and can be a useful step in porting models between simulators. The design goals of PyNN include allowing access to low-level details of a simulation where necessary, while providing the capability to model at a high level of abstraction, with concomitant gains in development speed and simulation maintainability. Another of our aims with PyNN is to increase the productivity of neuroscience modeling, by making it faster to develop models de novo, by promoting code sharing and reuse across simulator communities, and by making it much easier to debug, test and validate simulations by running them on more than one simulator. Modelers would then become free to devote more software development effort to innovation, building on the simulator core with new tools such as network topology databases, stimulus programming, analysis and visualization tools, and simulation accounting. The resulting, community-developed ‘meta-simulator’ system would then represent a powerful tool for overcoming the so-called complexity bottleneck that is presently a major roadblock for neural modeling.

On efficient sparse spike coding schemes for learning natural scenes in the primary visual cortex

We describe the theoretical formulation of a learning algorithm in a model of the primary visual cortex (V1) and present results on the efficiency of this algorithm by comparing it to the SparseNet algorithm [1]. Like the SparseNet algorithm, it is based on a model of signal synthesis as a Linear Generative Model, but it differs in the efficiency criterion used for the representation. This learning algorithm is based on an efficiency criterion derived from Occam's razor: for a similar quality, the shortest representation should be preferred. This inverse problem is NP-complete, and we propose here a greedy solution which is based on the architecture and nature of neural computations [2]. It proposes that supra-threshold neural activity progressively removes redundancies in the representation based on a correlation-based inhibition, and it provides a dynamical implementation close to Hebb's concept of neural assemblies [3]. We present results of simulations of this network with small natural images and compare it to the SparseNet solution. Extending it to realistic images and to the NEST simulator http://www.nest-initiative.org/, we show that this learning algorithm based on the properties of neural computations produces adaptive and efficient representations in V1. 1. Olshausen B, Field DJ: Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Res 1997, 37:3311-3325. 2. Perrinet L: Feature detection using spikes: the greedy approach. J Physiol Paris 2004, 98(4–6):530-539. 3. Hebb DO: The organization of behavior. Wiley, New York; 1949.

Modeling spatial integration in the ocular following response using a probabilistic framework
Modeling spatial integration in the ocular following response using a probabilistic framework

The machinery behind the visual perception of motion and the subsequent sensori-motor transformation, such as in the Ocular Following Response (OFR), is confronted with uncertainties which are efficiently resolved in the primate's visual system. We may understand this response as an ideal observer in a probabilistic framework by using Bayesian theory (Weiss et al., 2002), which we previously proved to be successfully adapted to model the OFR for different levels of noise with full-field gratings (Perrinet et al., 2005). More recent OFR experiments have used disk gratings and bipartite stimuli which are optimized to study the dynamics of center-surround integration. We quantified two main characteristics of the spatial integration of motion: (i) a finite optimal stimulus size for driving OFR, surrounded by an antagonistic modulation, and (ii) a direction-selective suppressive effect of the surround on the contrast gain control of the central stimuli (Barthélemy et al., 2006). Herein, we extended the ideal observer model to simulate the spatial integration of the different local motion cues within a probabilistic representation. We present analytical results which show that the hypothesis of independence of local measures can describe the integration of the spatial motion signal. Within this framework, we successfully accounted for the contrast gain control mechanisms observed in the behavioral data for center-surround stimuli. However, another inhibitory mechanism had to be added to account for suppressive effects of the surround.

Input-output transformation in the visuo-oculomotor loop: modeling the ocular following response to center-surround stimulation in a probabilistic framework

The quality of the representation of an object's motion is limited by the noise in the sensory input as well as by an intrinsic ambiguity due to the spatial limitation of the visual motion analyzers (aperture problem). Perceptual and oculomotor data demonstrate that motion processing of extended objects is initially dominated by the local 1D motion cues orthogonal to the object's edges, whereas 2D information progressively takes over and leads to the final correct representation of global motion. A Bayesian framework accounting for the sensory noise and general expectancies for object velocities has proven successful in explaining several experimental findings concerning early motion processing [1, 2, 3]. However, a complete functional model encompassing the dynamical evolution of object motion perception is still lacking. Here we outline several experimental observations concerning human smooth pursuit of moving objects and more particularly the time course of its initiation phase. In addition, we propose a recursive extension of the Bayesian model, motivated and constrained by our oculomotor data, to describe the dynamical integration of 1D and 2D motion information.

Dynamical contrast gain control mechanisms in a layer 2/3 model of the primary visual cortex

Computations in a cortical column are characterized by the dynamical, event-based nature of neuronal signals and are structured by the layered and parallel structure of cortical areas. But they are also characterized by their efficiency in terms of rapidity and robustness. We propose and study here a model of information integration in the primary visual cortex (V1) thanks to the parallel and interconnected network of similar cortical columns. In particular, we focus on the dynamics of contrast gain control mechanisms as a function of the distribution of information relevance in a small population of cortical columns. This cortical area is modeled as a collection of similar cortical columns which receive input and are linked according to a specific connectivity pattern which is relevant to this area. These columns are simulated using the NEST simulator [Morrison04] with conductance-based integrate-and-fire neurons and consist vertically of 3 different layers. The architecture was inspired by neurophysiological observations on the influence of neighboring activities on pyramidal cell activity and correlates with the lateral flow of information observed in the primary visual cortex, notably in optical imaging experiments [Jancke04]; it is similar in its final implementation to the local micro-circuitry of the cortical column presented by [Grossberg05]. The columns show prototypical spontaneous dynamical behavior at different levels of noise which are relevant to the generic modeling of biological cortical columns [Kremkow05]. In the future, the connectivity will be derived from an algorithm that was used for modeling the transient spiking response of a layer of neurons to a flashed image and which was based on the Matching Pursuit algorithm [Perrinet04]. The visual input is first transmitted from the Lateral Geniculate Nucleus (LGN) using the model of [Gazeres98]. It transforms the image flow into a stream of spikes, with contrast gain control mechanisms specific to the retina and the LGN. This spiking activity converges onto the pyramidal cells of layer 2/3 thanks to the specification of receptive fields in layer 4 providing a preference for oriented local contrasts in the spatio-temporal visual flow. In particular, we use in these experiments a visual input organized in a center-surround spatial pattern whose size was optimized to maximize the response of a column in the center and the modulation of this response by the surround (bipartite stimulus). This class of stimuli provides different levels of input activation and of visual ambiguity, embedded in the spatio-temporal correlations of the input spike flow and matched to the resolution of cortical columns in visual space. It thus provides a method to reveal the dynamics of information integration, and particularly of contrast gain control, which are characteristic of the function of V1.
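
As an aside, the bipartite center-surround stimulus described above is easy to generate; the sketch below builds frames as a central disk grating surrounded by an annular grating whose drift direction can be set independently, with illustrative sizes and frequencies rather than the optimized values of the study.

```python
# Toy bipartite centre-surround grating: a drifting central disk grating and an
# annular surround grating with an independent drift direction. Sizes, spatial
# frequency and speed are illustrative assumptions.
import numpy as np

def bipartite_grating(size=256, radius=60, sf=0.04, t=0.0,
                      dir_center=0.0, dir_surround=np.pi, speed=2.0):
    """Return one frame (values in [-1, 1]) of the centre-surround stimulus."""
    y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
    r = np.hypot(x, y)
    phase_c = (2 * np.pi * sf * (x * np.cos(dir_center) + y * np.sin(dir_center))
               - 2 * np.pi * sf * speed * t)
    phase_s = (2 * np.pi * sf * (x * np.cos(dir_surround) + y * np.sin(dir_surround))
               - 2 * np.pi * sf * speed * t)
    return np.where(r <= radius, np.sin(phase_c), np.sin(phase_s))

movie = np.stack([bipartite_grating(t=t) for t in np.arange(0.0, 1.0, 0.02)])
print(movie.shape)   # (frames, size, size), ready to feed an LGN front-end
```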

Modeling of simple cells through a sparse overcomplete gabor wavelet representation based on local inhibition and facilitation

We present a biologically plausible model of simple cortical cells as 1) a linear transform representing edges and 2) a non-linear iterative stage of inhibition and facilitation between neighboring coefficients. The linear transform is a complex log-Gabor wavelet transform which is overcomplete (i.e. there are more coefficients than pixels in the image) and has exact reconstruction. The inhibition consists in reducing the coefficients that are not local maxima along the direction normal to the edge filter orientation, whereas the facilitation enhances the collinear and co-aligned local-maximum coefficients. At each iteration, after the inhibition and facilitation stages, the reconstruction error is subtracted in the transform domain to keep an exact reconstruction. This process concentrates the signal energy on a few coefficients situated along the edges of the objects, yielding a sparse representation. The rationale for this procedure is: (1) the overcompleteness offers flexibility for activity reassignment; (2) images can be coded by sparse Gabor coefficients located on object edges; (3) image contours produce aligned and collinear local maxima in the transform domain; (4) the inhibition/facilitation processes are able to extract the contours. The sparse Gabor coefficients are mostly connected to each other and located along object contours. This layout makes chain coding suitable for compression purposes. Specially adapted to Gabor wavelet features, our chain coding represents every chain by its end-points (head and tail) and the elementary movements necessary to walk along the chain from head to tail. Moreover, it predicts the modulus and phase of each Gabor coefficient from the previous chain coefficient. As a result, redundancy in the transform domain is further reduced. Used for compression, the scheme particularly limits high-frequency artifacts. The model also performs efficiently in tasks the human visual system is assumed to deal with, such as edge extraction and image denoising.

Efficient Source Detection Using Integrate-and-Fire Neurons
Efficient Source Detection Using Integrate-and-Fire Neurons
Efficient representation of natural images using local cooperation

Low-level perceptual computations may be understood in terms of efficient codes (Simoncelli and Olshausen, 2001, Annual Review of Neuroscience 24, 1193-216). Following this argument, we explore models of representation for natural static images as a way to understand the processing of information in the primary visual cortex. This representation is based on a generative linear model of the synthesis of images using an over-complete multi-resolution dictionary of edges. This transform is implemented using log-Gabor filters and permits an exact reconstruction of any image. However, this linear representation is redundant and, since different representations may correspond to the same image, we explore more efficient representations of the image. The problem is stated as an ill-posed inverse problem, and we first compare different known strategies by computing the efficiency of the solutions given by Matching Pursuit (Perrinet, 2004, IEEE Trans. Neural Networks 15, 1164-75) and sparse edge coding (Fischer, in press, Trans. Image Processing) with classical representation methods such as JPEG. This comparison allows us to propose a synthesized approach using a probabilistic representation which progressively constructs the neural representation by using lateral cooperation. We propose an algorithm which dynamically diffuses information to correlated filters so as to yield a progressively disambiguated representation. This approach takes advantage of the computational properties of spiking neurons, such as integrate-and-fire neurons, and provides an efficient yet simple model for the representation of natural images. This representation is directly linked with the edge content of natural images, and we show applications of this method to edge extraction, denoising and compression. We also show that this dynamical approach fits with neurophysiological observations and may explain the non-linear interactions between neighboring neurons which may be observed in the cortex.

Dynamics of motion representation in short-latency ocular following: A two-pathways Bayesian model

The integration of information is essential to measure the exact 2D motion of a surface from both local ambiguous 1D motion produced by elongated edges and local non-ambiguous 2D motion from features such as corners, end-points or texture elements. The dynamics of this motion integration shows a complex time course which can be read from tracking eye movements: local 1D motion signals are extracted first and pooled to initiate the ocular responses, before 2D motion signals are taken into account to refine the tracking direction until it matches the surface motion direction. The nature of these 1D and 2D motion computations is still unclear. Previously, we have shown that the late, 2D-driven response components to either plaids or barber-poles have very similar latencies over a large range of contrast, suggesting a shared mechanism. However, they showed different contrast response functions with these different motion stimuli, suggesting different motion processing. We designed a two-pathway Bayesian model of motion integration and showed that this family of contrast response functions can be predicted from the probability distributions of 1D and 2D motion signals for each type of stimulus. Indeed, this formulation may explain contrast response functions that could not be explained by a simple Bayesian model (Weiss et al., 2002, Nature Neuroscience, 5, 598–604) and gives a quantitative argument to study how local information with different relative ambiguity values may be pooled to provide an integrated response of the system. Finally, we formulate how different spatial information may be pooled, and we draw an analogy between this method and methods based on partial differential equations. This simple model correctly explains some non-linear interactions between neighboring neurons selective to motion direction which are observed in short-latency ocular following and neurophysiological data.

Coding static natural images using spiking event times: do neurons cooperate?
Coding static natural images using spiking event times: do neurons cooperate?

To understand possible strategies of temporal spike coding in the central nervous system, we study functional neuromimetic models of visual processing for static images. We first present the retinal model introduced by Van Rullen and Thorpe [1], which represents the multiscale contrast values of the image using an orthonormal wavelet transform. These analog values activate a set of spiking neurons which each fire once to produce an asynchronous wave of spikes. According to this model, the image may be progressively reconstructed from this spike wave thanks to regularities in the statistics of the coefficients determined with natural images. Here, we study mathematically how the quality of information transmission carried by this temporal representation varies over time. In particular, we study how these regularities can be used to optimize information transmission by using a form of temporal cooperation among neurons to code analog values. The original model used wavelet transforms that are close to orthogonal. However, the selectivities of realistic neurons overlap, and we propose an extension of the previous model by adding a spatial cooperation between filters. This model extends the previous scheme to arbitrary, and possibly non-orthogonal, representations of features in the images. In particular, we compared the performance of increasingly over-complete representations in the retina. Results show that this algorithm provides an efficient spike coding strategy for low-level visual processing which may adapt to the complexity of the visual input.
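
The progressive reconstruction from a single wave of spikes can be illustrated with a toy rank-order code: coefficients are sorted by magnitude, only their rank and sign are transmitted, and the decoder uses a look-up table of expected magnitudes per rank learned on other images. The sketch below uses a random orthonormal basis as a stand-in for the wavelet transform of the original model, and all sizes are illustrative.

```python
# Toy rank-order (one spike per neuron) code: a spike wave ordered by coefficient
# magnitude is decoded using only each spike's rank and sign plus a look-up table
# of expected magnitudes. A random orthonormal basis stands in for the wavelet
# transform of the original model; all sizes are illustrative.
import numpy as np

rng = np.random.default_rng(5)
n = 16 * 16
basis, _ = np.linalg.qr(rng.standard_normal((n, n)))   # rows form an orthonormal basis

def analysis(img):
    """Project an image onto the (orthonormal) basis."""
    return basis @ img.ravel()

# 'training' images give the expected |coefficient| for each firing rank
train = np.array([analysis(rng.standard_normal((16, 16))) for _ in range(200)])
expected_mag = np.sort(np.abs(train), axis=1)[:, ::-1].mean(axis=0)   # rank -> magnitude

# encode a new image as an ordered spike wave (only neuron index and sign are kept)
img = rng.standard_normal((16, 16))
coeffs = analysis(img)
order = np.argsort(-np.abs(coeffs))                    # firing order of the neurons

reconstruction = np.zeros(n)
for rank, neuron in enumerate(order[:64]):             # read only the first 64 spikes
    reconstruction += np.sign(coeffs[neuron]) * expected_mag[rank] * basis[neuron]

err = np.linalg.norm(reconstruction - img.ravel()) / np.linalg.norm(img)
print(f"relative error after 64 of {n} possible spikes: {err:.2f}")
```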

Feature detection using spikes: the greedy approach
Feature detection using spikes: the greedy approach

A goal of low-level neural processes is to build an efficient code extracting the relevant information from the sensory input. It is believed that this is implemented in cortical areas by elementary inferential computations dynamically extracting the most likely parameters corresponding to the sensory signal. We explore here a neuro-mimetic feed-forward model of the primary visual area (V1) solving this problem in the case where the signal may be described by a robust linear generative model. This model uses an over-complete dictionary of primitives which provides a distributed probabilistic representation of input features. Relying on an efficiency criterion, we derive an algorithm as an approximate solution which uses incremental greedy inference processes. This algorithm is similar to ‘Matching Pursuit’ and mimics the parallel architecture of neural computations. We propose here a simple implementation using a network of spiking integrate-and-fire neurons which communicate using lateral interactions. Numerical simulations show that this Sparse Spike Coding strategy provides an efficient model for representing visual data from a set of natural images. Even though it is simplistic, this transformation of spatial data into a spatio-temporal pattern of binary events provides an accurate description of some complex neural patterns observed in the spiking activity of biological neural networks.

Finding Independent Components using spikes : a natural result of Hebbian learning in a sparse spike coding scheme


Visual Strategies for Sparse Spike Coding
Visual Strategies for Sparse Spike Coding