Let’s admit it: brains are not computers. Indeed, computers are still deceptive compared to biological perceptual systems. Think about rapidly detecting a novel object in clutter. Think about performing this with little supervision at a low energetic cost…
To narrow the gap between neuroscience and the theory of sensory processing computations, I am interested in bridging geometrical regularities found in natural scenes with the properties of neural computations as they are observed in sensory processes or behavior.
Laurent Perrinet is a computational neuroscientist specialized in large scale neural network models of low-level vision, perception and action, currently at the “Institut de Neurosciences de la Timone” (France), a joint research unit (UMR7289, CNRS / Aix-Marseille Université). He co-authored more than 55 articles in computational neuroscience and computer vision. He graduated from the aeronautics engineering school SUPAERO, in Toulouse (France) with a signal processing and applied mathematics degree. He received a PhD in Cognitive Science in 2003 on the mathematical analysis of temporal spike coding of images by using a multiscale and adaptive representation of natural scenes. His research program is focusing in bridging the complex dynamics of realistic, large-scale models of spiking neurons with functional models of low-level vision. In particular, as part of the FACETS and BrainScaleS consortia, he has developed experimental protocols in collaboration with neurophysiologists to characterize the response of population of neurons. Recently, he extended models of visual processing in the framework of predictive processing in collaboration with the team of Karl Friston at the University College of London. This method aims at characterizing the processing of dynamical flow of information as an active inference process. His current challenge within the NeOpTo team is to translate, or compile in computer terminology, this mathematical formalism with the event-based nature of neural information with the aim of pushing forward the frontiers of Artificial Intelligence systems.
Habilitation à diriger des recherches, 2014
Aix-Marseille Université
PhD. in Cognitive Science, 2003
Université P. Sabatier, Toulouse, France
M.S. in Engineering, 1998
SupAéro, Toulouse, France
Event cameras asynchronously report brightness changes with a temporal resolution in the order of microseconds, which makes them inherently suitable to address problems that involve rapid motion perception, such as ventral landing and fast obstacle avoidance. These problems are typically addressed by estimating a single global time-to-contact (TTC) measure, which explicitly assumes that the surface/obstacle is planar and fronto-parallel. We relax this assumption by proposing an incremental event-based method to estimate the TTC that jointly estimates the (up-to scale) inverse depth and global motion using a single event camera. The proposed method is reliable and fast while asynchronously maintaining a TTC map (TTCM), which provides per-pixel TTC estimates. As a side product, the proposed method can also estimate per-event optical flow. We achieve state-of-the-art performances on TTC estimation in terms of accuracy and runtime per event while achieving competitive performance on optical flow estimation.
Our daily endeavors occur in a complex visual environment, whose intrinsic variability challenges the way we integrate information to make decisions. By processing myriads of parallel sensory inputs, our brain is theoretically able to compute the variance of its environment, a cue known to guide our behavior. Yet, the neurobiological and computational basis of such variance computations are still poorly understood. Here, we quantify the dynamics of sensory variance modulations of cat primary visual cortex neurons. We report two archetypal neuronal responses, one of which is resilient to changes in variance and co-encodes the sensory feature and its variance, improving the population encoding of orientation. The existence of these variance-specific responses can be accounted for by a model of intracortical recurrent connectivity. We thus propose that local recurrent circuits process uncertainty as a generic computation, advancing our understanding of how the brain handles naturalistic inputs.
Humans are able to robustly categorize images and can, for instance, detect the presence of an animal in a briefly flashed image in as little as 120 ms. Initially inspired by neuroscience, deep-learning algorithms literally bloomed up in the last decade such that the accuracy of machines is at present superior to humans for visual recognition tasks. However, these artificial networks are usually trained and evaluated on very specific tasks, for instance on the 1000 separate categories of IMAGENET. In that regard, biological visual systems are more flexible and efficient compared to artificial systems on generic ecological tasks. In order to deepen this comparison, we retrained the standard VGG Convolutional Neural Network (CNN) on two independent tasks which are ecologically relevant for humans: one task defined as detecting the presence of an animal and the other as detecting the presence of an artifact. We show that retraining the network achieves human-like performance level which is reported in psychophysical tasks. We also compare the accuracy of the detection on an image-by-image basis. This showed in particular that the two models perform better when combining their outputs. Indeed, animals (e.g. lions) tend to be less present in photographs containing artifacts (e.g. buildings). These re-trained models could reproduce some unexpected behavioral observations from humans psychophysics such as the robustness to rotations (e.g. upside-down or slanted image) or to a grayscale transformation. Finally, we quantitatively tested the number of layers of the CNN which are necessary to reach such a performance, showing that a good accuracy for ultra-fast categorization could be reached with only a few layers, challenging the belief that image recognition would require a deep sequential analysis of visual objects. We expect to apply this framework to guide future model-based psychophysical experiments and biomimetic deep neuronal architectures designed for such tasks.
Why do neurons communicate through spikes? By definition, spikes are all-or-none neural events which occur at continuous times. In other words, spikes are on one side binary, existing or not without further details, and on the other can occur at any asynchronous time, without the need for a centralized clock. This stands in stark contrast to the analog representation of values and the discretized timing classically used in digital processing and at the base of modern-day neural networks. As neural systems almost systematically use this so-called event-based representation in the living world, a better understanding of this phenomenon remains a fundamental challenge in neurobiology in order to better interpret the profusion of recorded data. With the growing need for intelligent embedded systems, it also emerges as a new computing paradigm to enable the efficient operation of a new class of sensors and event-based computers, called neuromorphic, which could enable significant gains in computation time and energy consumption, a major societal issue in the era of the digital economy and global warming. In this review paper, we provide evidence from biology, theory and engineering that the precise timing of spikes plays a crucial role in our understanding of the efficiency of neural networks.
Neurons in the primary visual cortex are selective to orientation with various degrees of selectivity to the spatial phase, from high selectivity in simple cells to low selectivity in complex cells. Various computational models have suggested a possible link between the presence of phase invariant cells and the existence of cortical orientation maps in higher mammals’ V1. These models, however, do not explain the emergence of complex cells in animals that do not show orientation maps. In this study, we build a model of V1 based on a convolutional network called Sparse Deep Predictive Coding (SDPC) and show that a single computational mechanism, pooling, allows the SDPC model to account for the emergence of complex cells as well as cortical orientation maps in V1, as observed in distinct species of mammals. By using different pooling functions, our model developed complex cells in networks that exhibit orientation maps (e.g., like in carnivores and primates) or not (e.g., rodents and lagomorphs). The SDPC can therefore be viewed as a unifying framework that explains the diversity of structural and functional phenomena observed in V1. In particular, we show that orientation maps emerge naturally as the most cost-efficient structure to generate complex cells under the predictive coding principle.
👉🏼 Jean-Nicolas Jérémie @JnJerem 📍Poster session at #FENS2022 🎯 Ultra-rapid visual search in natural images using active deep learning.#FENSAmbassador @SocNeuro_Tweets @laurentperrinet 1/2 pic.twitter.com/ldGJMCQb4c — Mélina Cordeau (@CordeauMelina) July 11, 2022 This work extends to natural scenes a previous work on visual search on a simplified task formulated in Emmanuel Daucé, Pierre Albigès, Laurent U Perrinet (2020).
On a daily basis, the primary visual cortex (V1) detects oriented elements from sensory inputs made of orientation distributions. To remain selective to a large variety of possible input configurations, V1 has to account for the precision of these inputs. Here, we decode the population activity of V1 to uncover a neural code which achieves invariance to input precision. Extracellular recordings were made from 247 V1 neurons in anesthetized cats in response to visually presented naturalistic textures (a). These textures were generated from two parameters : orientation and orientation precision. Using multinomial logistic regression, we were able to recover these two parameters from the population activity. We report two previously unknown types of neurons in V1 : predominantly infragranular neurons that encode solely orientation, and predominantly supragranular neurons which co-encode both orientation and its precision. Using a simple mean-rate population model, we observed that recurrent cortical inhibition can single-handedly account for the existence of these two types of neurons.
Visual search, that is, the simultaneous localization and detection of a visual target of interest, is a vital task. Applied to the case of natural scenes, searching for example to an animal (either a prey, a predator or a partner) constitutes a challenging problem due to large variability over numerous visual dimensions such as shape, pose, size, texture or position. Yet, biological visual systems are able to perform such detection efficiently in briefly flashed scenes and in a very short amount of time.Deep convolutional neuronal networks (CNNs) were shown to be well fitted to the image classification task, providing with human (or even super-human) performance. Previous models also managed to solve the visual search task, by roughly dividing the image into sub-areas. This is at the cost, however, of computer-intensive parallel processing on relatively low-resolution image samples. Taking inspiration from natural vision systems, we develop here a model that builds over the anatomical visual processing pathways observed in mammals, namely the What and the Where pathways. It operates in two steps, one by selecting regions of interest, before knowing their actual visual content, through an ultra-fast/low resolution analysis of the full visual field, and the second providing a detailed categorization over the detailed foveal selected region attained with a saccade.
Convolutional Neural Networks have been considered the go-to option for object recognition in computer vision for the last couple of years. However, their invariance to object’s translations is still deemed as a weak point and remains limited to small translations only via their max-pooling layers. One bio-inspired approach considers the What/Where pathway separation in Mammals to overcome this limitation. This approach works as a nature-inspired attention mechanism, another classical approach of which is Spatial Transformers. These allow an adaptive endto-end learning of different classes of spatial transformations throughout training. In this work, we overview Spatial Transformers as an attention-only mechanism and compare them with the What/Where model. We show that the use of attention restricted or ``Foveated’’ Spatial Transformer Networks, coupled alongside a curriculum learning training scheme and an efficient log-polar visual space entry, provides better performance when compared to the What/Where model, all this without the need for any extra supervision whatsoever.
see a follow-up in: Jean-Nicolas Jérémie, Laurent U Perrinet (2023). Ultra-Fast Image Categorization in biology and in neural models. Vision. Preprint Cite DOI see an extension to visual search in: Jean-Nicolas Jérémie, Emmanuel Daucé, Laurent U Perrinet (2022).
Both neurophysiological and psychophysical experiments have pointed out the crucial role of recurrent and feedback connections to process context-dependent information in the early visual cortex. While numerous models have accounted for feedback effects at either neural or representational level, none of them were able to bind those two levels of analysis. Is it possible to describe feedback effects at both levels using the same model? We answer this question by combining Predictive Coding (PC) and Sparse Coding (SC) into a hierarchical and convolutional framework. In this Sparse Deep Predictive Coding (SDPC) model, the SC component models the internal recurrent processing within each layer, and the PC component describes the interactions between layers using feedforward and feedback connections. Here, we train a 2-layered SDPC on two different databases of images, and we interpret it as a model of the early visual system (V1~&~V2). We first demonstrate that once the training has converged, SDPC exhibits oriented and localized receptive fields in V1 and more complex features in V2. Second, we analyze the effects of feedback on the neural organization beyond the classical receptive field of V1 neurons using interaction maps. These maps are similar to association fields and reflect the Gestalt principle of good continuation. We demonstrate that feedback signals reorganize interaction maps and modulate neural activity to promote contour integration. Third, we demonstrate at the representational level that the SDPC feedback connections are able to overcome noise in input images. Therefore, the SDPC captures the association field principle at the neural level which results in better disambiguation of blurred images at the representational level.
Dans la pièce de théâtre la plus célèbre de Marivaux Le jeu de l’amour et du hasard, l’auteur joue à inverser le rôle des personnages, et le hasard est invité à guider leurs destins. De la même façon, notre cerveau est ballotté au gré du hasard, aussi bien dans une loterie que dans les incertitudes et ambiguı̈tés révélées dans la vision par les illusions d’optique. Au point que l’on peut attribuer à un esprit malin le fait que la tartine tombe du côté de la confiture, ou que la fiche du câble USB soit toujours dans le mauvais sens. Le hasard s’invite comme un personnage à part entière dans la cognition, et on peut s’interroger du rôle que celui-ci peut jouer dans le fonctionnement de notre cerveau.
Hierarchical Sparse Coding (HSC) is a powerful model to efficiently represent multi-dimensional, structured data such as images. The simplest solution to solve this computationally hard problem is to decompose it into independent layer-wise subproblems. However, neuroscientific evidence would suggest inter-connecting these subproblems as in the Predictive Coding (PC) theory, which adds top-down connections between consecutive layers. In this study, a new model called 2-Layers Sparse Predictive Coding (2L-SPC) is introduced to assess the impact of this inter-layer feedback connection. In particular, the 2L-SPC is compared with a Hierarchical Lasso (Hi-La) network made out of a sequence of independent Lasso layers. The 2L-SPC and the 2-layers Hi-La networks are trained on 4 different databases and with different sparsity parameters on each layer. First, we show that the overall prediction error generated by 2L-SPC is lower thanks to the feedback mechanism as it transfers prediction error between layers. Second, we demonstrate that the inference stage of the 2L-SPC is faster to converge than for the Hi-La model. Third, we show that the 2L-SPC also accelerates the learning process. Finally, the qualitative analysis of both models dictionaries, supported by their activation probability, show that the 2L-SPC features are more generic and informative.
Humans are able to accurately track a moving object with a combination of saccades and smooth eye movements. These movements allow us to align and stabilize the object on the fovea, thus enabling highresolution visual analysis. When predictive information is available about target motion, anticipatory smooth pursuit eye movements (aSPEM) are efficiently generated before target appearance, which reduce the typical sensorimotor delay between target motion onset and foveation. It is generally assumed that the role of anticipatory eye movements is to limit the behavioral impairment due to eyetotarget position and velocity mismatch. By manipulating the probability for target motion direction we were able to bias the direction and mean velocity of aSPEM, as measured during a fixed duration gap before target rampmotion onset. This suggests that probabilistic information may be used to inform the internal representation of motion prediction for the initiation of anticipatory movements. However, such estimate may become particularly challenging in a dynamic context, where the probabilistic contingencies vary in time in an unpredictable way. In addition, whether and how the information processing underlying the buildup of aSPEM is linked to an explicit estimate of probabilities is unknown. We developed a new paired* task paradigm in order to address these two questions. In a first session, participants observe a target moving horizontally with constant speed from the center either to the right or left across trials. The probability of either motion direction changes randomly in time. Participants are asked to estimate "how much they are confident that the target will move to the right or left in the next trial" and to adjust the cursor’s position on the screen accordingly. In a second session the participants eye movements are recorded during the observation of the same sequence of randomdirection trials. In parallel, we are developing new automatic routines for the advanced analysis of oculomotor traces. In order to extract the relevant parameters of the oculomotor responses (latency, gain, initial acceleration, catchup saccades), we developed new tools based on best*fitting procedure of predefined patterns (i.e. the typical smooth pursuit velocity profile).
Within the central nervous system, visual areas are essential in transforming the raw luminous signal into a representation which efficiently conveys information about the environment. This process is constrained by the necessity of being robust and rapid. Indeed, there exists both a wide variety of potential changes in the geometrical characteristics of the visual scene and also a necessity to be able to respond as quickly as possible to the incoming sensory stream, for instance to drive a movement of the eyes to the location of a potential danger. Decades of study in neurophysiology and psychophysics at the different levels of vision have shown that this system takes advantage of a priori knowledge about the structure of visual information, such as the regularity in the shape and motion of visual objects. As such, the predictive processing framework offers a unified theory to explain a variety of visual mechanisms. However, we still lack a global normative approach unifying those mechanisms and we will review here some recent and promising approaches. First, we will describe Active Inference, a form of predictive processing equipped with the ability to actively sample the visual space. Then, we will extend this paradigm to the case where information is distributed on a topography, such as is the case for retinotopically organized visual areas. In particular, we will compare such models in light of recent neurophysiological data showing the role of traveling waves in shaping visual processing. Finally, we will propose some lines of research to understand how these functional models may be implemented at the neural level. In particular, we will review potential models of cortical processing in terms of prototypical micro-circuits. These allow to separate the different flows of information, from feed-forward prediction error to feed-back anticipation error. Still, the design of such a generic predictive processing circuit is still not fully understood and we will enumerate some possible implementations using biomimetic neural networks.
Traveling waves have recently been observed in different animal species, brain areas and behavioral states. However, it is still unclear what are their functional roles. In the case of cortical visual processing, waves propagate across retinotopic maps and can hereby generate interactions between spatially and temporally separated instances of feedforward driven activity. Such interactions could participate in processing long-range apparent motion stimuli, an illusion for which no clear neuronal mechanisms have yet been proposed. Using this paradigm in awake monkeys, we show that suppressive traveling waves produce to a spatio-temporal normalization of apparent motion stimuli. Our study suggests that cortical waves shape the representation of illusory moving stimulus within retinotopic maps for an straightforward read-out by downstream areas.
Motion detection represents one of the critical tasks of the visual system and has motivated a large body of research. However, it remains unclear precisely why the response of retinal ganglion cells (RGCs) to simple artificial stimuli does not predict their response to complex, naturalistic stimuli. To explore this topic, we use Motion Clouds (MC), which are synthetic textures that preserve properties of natural images and are merely parameterized, in particular by modulating the spatiotemporal spectrum complexity of the stimulus by adjusting the frequency bandwidths. By stimulating the retina of the diurnal rodent, Octodon degus with MC we show that the RGCs respond to increasingly complex stimuli by narrowing their adjustment curves in response to movement. At the level of the population, complex stimuli produce a sparser code while preserving movement information; therefore, the stimuli are encoded more efficiently. Interestingly, these properties were observed throughout different populations of RGCs. Thus, our results reveal that the response at the level of RGCs is modulated by the naturalness of the stimulus -in particular for motion- which suggests that the tuning to the statistics of natural images already emerges at the level of the retina.
If you’re at #sfn2019 and have an interest in #sparse #deep Predictive Coding, checkout @VictorBoutin ‘s poster 403.16 / P20:https://t.co/2VLEsl98oU It shows today + comes with a (timely) preprint https://t.co/FfKi9tjqrN !
Lorsque nous observons un sablier, lorsque nous fixons notre regard sur les grains de sable qui tombent, nous avons le sentiment que le temps s’écoule de façon continue. Nous pensons qu’il en est ainsi depuis la naissance du monde, et que rien ne peut contredire cette vérité universelle. Pourtant, nos perceptions sensorielles et les neurones qui en sont à l’origine ont une toute autre manière de scander le temps. Une manière subjective et sensuelle, au sens propre du terme.
A common practice to account for psychophysical biases in vision is to frame them as consequences of a dynamic process relying on optimal inference with respect to a generative model. The present study details the complete formulation of such a generative model intended to probe visual motion perception. It is first derived in a set of axiomatic steps constrained by biological plausibility. We then extend previous contributions by detailing three equivalent formulations of the Gaussian dynamic texture model. First, the composite dynamic textures are constructed by the random aggregation of warped patterns, which can be viewed as 3D Gaussian fields. Second, these textures are cast as solutions to a stochastic partial differential equation (sPDE). This essential step enables real time, on-the-fly, texture synthesis using time-discretized auto-regressive processes. It also allows for the derivation of a local motion-energy model, which corresponds to the log-likelihood of the probability density. The log-likelihoods are finally essential for the construction of a Bayesian inference framework. We use the model to probe speed perception in humans psychophysically using zoom-like changes in stimulus spatial frequency content. The likelihood is contained within the generative model and we chose a slow speed prior consistent with previous literature. We then validated the fitting process of the model using synthesized data. The human data replicates previous findings that relative perceived speed is positively biased by spatial frequency increments. The effect cannot be fully accounted for by previous models, but the current prior acting on the spatio-temporal likelihoods has proved necessary in accounting for the perceptual bias.
see a write-up in “Humans adapt their anticipatory eye movements to the volatility of visual motion properties” as presented at https://eyemovements.sciencesconf.org/ get the poster code : https://github.com/invibe/ANEMO/
The selectivity of the visual system to oriented patterns is very well documented in a wide range of species, especially in mammals. In particular, neurons of the primary visual cortex are anatomically grouped by their preference to a given oriented visual stimulus. Interactions between such groups of neurons have been successfully modeled using recurrently-connected network of spiking neurons, so called "ring models". Nonetheless, this selectivity is most often studied with crystal-like patterns such as gratings. Here, we studied the ability of human observers to discriminate texture-like patterns for which we could quantitatively tune the precision of their oriented content and we propose a generic model to explain such results. The first contribution shows that the discrimination threshold as a function of the precision did not vary smoothly as would be expected, but more in a binary, "all or none" fashion. Our second contribution is to propose a novel model of orientation selectivity that is based on deep-learning techniques, which performance we evaluated in the same task. This model has human-like performance in term of accuracy and exhibits qualitatively similar psychophysical curves. One hypothesis that such a structure allows for the system to be robust to noise in its visual inputs.
Due to its inherent neural delays, the visual system has an outdated access to sensory information about the current position of moving objects. In contrast, living organisms are remarkably able to track and intercept moving objects under a large range of challenging environmental conditions. Physiological, behavioral and psychophysical evidences strongly suggest that position coding is extrapolated using an explicit and reliable representation of object’s motion but it is still unclear how these two representations interact. For instance, the so-called flash-lag effect supports the idea of a differential processing of position between moving and static objects. Although elucidating such mechanisms is crucial in our understanding of the dynamics of visual processing, a theory is still missing to explain the different facets of this visual illusion. Here, we reconsider several of the key aspects of the flash-lag effect in order to explore the role of motion upon neural coding of objects’ position. First, we formalize the problem using a Bayesian modeling framework which includes a graded representation of the degree of belief about visual motion. We introduce a motion-based prediction model as a candidate explanation for the perception of coherent motion. By including the knowledge of a fixed delay, we can model the dynamics of sensory information integration by extrapolating the information acquired at previous instants in time. Next, we simulate the optimal estimation of object position with and without delay compensation and compared it with human perception under a broad range of different psychophysical conditions. Our computational study suggests that the explicit, probabilistic representation of velocity information is crucial in explaining position coding, and therefore the flash-lag effect. We discuss these theoretical results in light of the putative corrective mechanisms that can be used to cancel out the detrimental effects of neural delays and illuminate the more general question of the dynamical representation of spatial information at the present time in the visual pathways.
La vision utilise un faisceau d’informations de différentes qualités pour atteindre une perception unifiée du monde environnant. Nous avons utilisé lors de plusieurs projets art-science (voir https://github.com/NaturalPatterns) des installations permettant de manipuler explicitement des composantes de ce flux d’information et de révéler des ambiguités dans notre perception. Dans l’installation Tropique, des faisceaux de lames lumineuses sont arrangés dans l’espace assombri de l’installation. Les spectateurs les observent grâce à leur interaction avec une brume invisible qui est diffusée dans l’espace. Dans Trame Élasticité, 25 parallélépipèdes de miroirs (3m de haut) sont arrangés verticalement sur une ligne horizontale. Ces lames sont rotatives et leurs mouvements est synchronisé. Suivant la dyamique qui est imposé à ces lames, la perception de l’espace environnent fluctue conduisant à recomposer l’espace de la concentration à l’expansion, ou encore à générer un surface semblant transparente ou inverser la visons de ce qui est située devant et derrière l’observateur. Enfin, dans Trame instabilité, nous explorons l’interaction de séries périodiques de points placées sur des surfaces transparentes. À partir de premières expérimentations utilisant une technique novatrice de sérigraphie, ces trames de points sont placées afin de faire émerger des structures selon le point de vue du spectateur. De manière générale, nous montrerons ici les différentes méthodes utilisées, comme l’utilisation des limites perceptives, et aussi les résultats apportés par une telle collaboration.
Sparse coding of images in the retina follows regular statistics at the global, not the local scale. See supplementray code. How does the retina respond to stimuli with different sparseness?
Natural scenes generally contain objects in motion. The local orientation of their contours and the direction of motion are two essential components of visual information which are processed in parallel in the early visual areas. Focusing on the primary visual cortex of the macaque monkey (V1), we challenged different models for the joint representation of orientation and direction within the neural activity. Precisely, we considered the response of V1 neurons to an oriented moving bar to investigate whether, and how, the information about the bar’s orientation and direction could be encoded dynamically at the population activity level. For that purpose, we used a decoding approach based on a space-time receptive field model that encodes jointly orientation and direction. We based our decoding approach on the statistics of natural scenes by first determining optimal space-time receptive fields (RFs) that encode orientation and direction. For this, we first derived a set of dynamic filters from a database of natural images~[1] and following an unsupervised learning rule~[2]. More generally, this allows us to propose a dynamic generative model for the joint coding of orientation and direction. Then, using this model and a maximum likelihood paradigm, we infer the most likely representation for a given network activity~[3, 4]. We tested this model on surrogate data and on extracellular recordings in area emphV1 (67 cells) of awake macaque monkeys in response to oriented bars moving in $12$ different directions. Using a cross-validation method we could robustly decode both the orientation and the direction of the bar within the classical receptive field (cRF). Furthermore, this decoding approach shows different properties: First, information about the orientation of the bar is emerging ıt before entering the cRF if the trajectory of the bar is long enough. Second, when testing different orientations with the same direction, our approach unravels that we can decode the direction and the orientation independently. Moreover, we found that, similarly to orientation decoding, the decoding of direction is dynamic but weaker. Finally, our results demonstrate that the orientation and the direction of motion of an ambiguous moving bar can be progressively decoded in V1. This is a signature of a dynamic solution to the aperture problem in area V1, similarly to what was already found in area MT~[5]. $[1]$ J. Burge, W. Geisler. Optimal speed estimation in natural image movies predicts human performance. Nature Communications, 6, 7900. http://doi.org/10.1038/ncomms8900, 2015. $[2]$ L. Perrinet. Role of homeostasis in learning sparse representations. ıt Neural Computation, 22(7):1812–36, 2010. $[3]$ M. Jazayeri and J.A. Movshon. Optimal representation of sensory information by neural populations. ıt Nature Neuroscience, 9(5):690–696, 2006. $[4]$ W. Taouali, G. Benvenuti, P. Wallisch, F. Chavane, L. Perrinet. Testing the Odds of Inherent versus Observed Over-dispersion in Neural Spike Counts. ıt Journal of Neurophysiology, 2015. $[5]$ C. Pack, R. Born. Temporal dynamics of a neural solution to the aperture problem in visual area MT of macaque brain. ıt Nature, 409(6823), 1040–1042. 2001.
Natural scenes generally contain objects in motion. The local orientation of their contours and the direction of motion are two essential components of visual information which are processed in parallel in the early visual areas. Focusing on the primary visual cortex of the macaque monkey (V1), we challenged different models for the joint representation of orientation and direction within the neural activity. Precisely, we considered the response of V1 neurons to an oriented moving bar to investigate whether, and how, the information about the bar’s orientation and direction could be encoded dynamically at the population activity level. For that purpose, we used a decoding approach based on a space-time receptive field model that encodes jointly orientation and direction. We based our decoding approach on the statistics of natural scenes by first determining optimal space-time receptive fields (RFs) that encode orientation and direction. For this, we first derived a set of dynamic filters from a database of natural images~[1] and following an unsupervised learning rule~[2]. More generally, this allows us to propose a dynamic generative model for the joint coding of orientation and direction. Then, using this model and a maximum likelihood paradigm, we infer the most likely representation for a given network activity~[3, 4]. We tested this model on surrogate data and on extracellular recordings in area emphV1 (67 cells) of awake macaque monkeys in response to oriented bars moving in $12$ different directions. Using a cross-validation method we could robustly decode both the orientation and the direction of the bar within the classical receptive field (cRF). Furthermore, this decoding approach shows different properties: First, information about the orientation of the bar is emerging ıt before entering the cRF if the trajectory of the bar is long enough. Second, when testing different orientations with the same direction, our approach unravels that we can decode the direction and the orientation independently. Moreover, we found that, similarly to orientation decoding, the decoding of direction is dynamic but weaker. Finally, our results demonstrate that the orientation and the direction of motion of an ambiguous moving bar can be progressively decoded in V1. This is a signature of a dynamic solution to the aperture problem in area V1, similarly to what was already found in area MT~[5]. $[1]$ J. Burge, W. Geisler. Optimal speed estimation in natural image movies predicts human performance. Nature Communications, 6, 7900. http://doi.org/10.1038/ncomms8900, 2015. $[2]$ L. Perrinet. Role of homeostasis in learning sparse representations. ıt Neural Computation, 22(7):1812–36, 2010. $[3]$ M. Jazayeri and J.A. Movshon. Optimal representation of sensory information by neural populations. ıt Nature Neuroscience, 9(5):690–696, 2006. $[4]$ W. Taouali, G. Benvenuti, P. Wallisch, F. Chavane, L. Perrinet. Testing the Odds of Inherent versus Observed Over-dispersion in Neural Spike Counts. ıt Journal of Neurophysiology, 2015. $[5]$ C. Pack, R. Born. Temporal dynamics of a neural solution to the aperture problem in visual area MT of macaque brain. ıt Nature, 409(6823), 1040–1042. 2001.
Neurons in the primary visual cortex are known for responding vigorously but with high variability to classical stimuli such as drifting bars or gratings. By contrast, natural scenes are encoded more efficiently by sparse and temporal precise spiking responses. We used a conductance-based model of the visual system in higher mammals to investigate how two specific features of the thalamo-cortical pathway, namely push-pull receptive field organization and synaptic depression, can contribute to this contextual reshaping of V1 responses. By comparing cortical dynamics evoked respectively by natural vs. artificial stimuli in a comprehensive parametric space analysis, we demonstrate that the reliability and sparseness of the spiking responses during natural vision is not a mere consequence of the increased bandwidth in the sensory input spectrum. Rather, it results from the combined impacts of synaptic depression and push-pull inhibition, the later acting for natural scenes as a form of ``effective’’ feed-forward inhibition as demonstrated in other sensory systems. Thus, the combination of feedforward-like inhibition with fast thalamo-cortical synaptic depression by simple cells receiving a direct structured input from thalamus composes a generic computational mechanism for generating a sparse and reliable encoding of natural sensory events.
As the state-of-the-art imaging technologies became more and more advanced, yielding scientific data at unprecedented detail and volume, the need to process and interpret all the data has made image processing and computer vision also increasingly important. Sources of data that have to be routinely dealt with today applications include video transmission, wireless communication, automatic fingerprint processing, massive databases, non-weary and accurate automatic airport screening, robust night vision to name a few. Multidisciplinary inputs from other disciplines such as computational neuroscience, cognitive science, mathematics, physics and biology will have a fundamental impact in the progress of imaging and vision sciences. One of the advantages of the study of biological organisms is to devise very different type of computational paradigms beyond the usual von Neumann e.g. by implementing a neural network with a high degree of local connectivity. This is a comprehensive and rigorous reference in the area of biologically motivated vision sensors. The study of biologically visual systems can be considered as a two way avenue. On the one hand, biological organisms can provide a source of inspiration for new computational efficient and robust vision models and on the other hand machine vision approaches can provide new insights for understanding biological visual systems. Along the different chapters, this book covers a wide range of topics from fundamental to more specialized topics, including visual analysis based on a computational level, hardware implementation, and the design of new more advanced vision sensors. The last two sections of the book provide an overview of a few representative applications and current state of the art of the research in this area. This makes it a valuable book for graduate, Master, PhD students and also researchers in the field.
Talk @ NeurIPS: https://neurips.cc/Conferences/2015/Schedule?showEvent=5418 See a followup in Jonathan Vacher, Andrew Isaac Meso, Laurent U Perrinet, Gabriel Peyré (2018). Bayesian Modeling of Motion Perception using Dynamical Stochastic Textures. Neural Computation.
Making a judgment about the semantic category of a visual scene, such as whether it contains an animal, is typically assumed to involve high-level associative brain areas. Previous explanations require progressively analyzing the scene hierarchically at increasing levels of abstraction, from edge extraction to mid-level object recognition and then object categorization. Here we show that the statistics of edge co-occurrences alone are sufficient to perform a rough yet robust (translation, scale, and rotation invariant) scene categorization. We first extracted the edges from images using a scale-space analysis coupled with a sparse coding algorithm. We then computed the ``association field’’ for different categories (natural, man-made, or containing an animal) by computing the statistics of edge co-occurrences. These differed strongly, with animal images having more curved configurations. We show that this geometry alone is sufficient for categorization, and that the pattern of errors made by humans is consistent with this procedure. Because these statistics could be measured as early as the primary visual cortex, the results challenge widely held assumptions about the flow of computations in the visual system. The results also suggest new algorithms for image classification and signal processing that exploit correlations between low-level structure and the underlying semantic category.
This paper considers the problem of sensorimotor delays in the optimal control of (smooth) eye movements under uncertainty. Specifically, we consider delays in the visuo-oculomotor loop and their implications for active inference. Active inference uses a generalisation of Kalman filtering to provide Bayes optimal estimates of hidden states and action in generalised coordinates of motion. Representing hidden states in generalised coordinates provides a simple way of compensating for both sensory and oculomotor delays. The efficacy of this scheme is illustrated using neuronal simulations of pursuit initiation responses, with and without compensation. We then consider an extension of the generative model to simulate smooth pursuit eye movements in which the visuo-oculomotor system believes both the target and its centre of gaze are attracted to a (hidden) point moving in the visual field. Finally, the generative model is equipped with a hierarchical structure, so that it can recognise and remember unseen (occluded) trajectories and emit anticipatory responses. These simulations speak to a straightforward and neurobiologically plausible solution to the generic problem of integrating information from different sources with different temporal delays and the particular difficulties encountered when a system, like the oculomotor system, tries to control its environment with delayed signals.
In order to analyze the characteristics of a rich dynamic visual environment, the visual system must integrate information collected at different scales through different spatiotemporal frequency channels. Still, it remains unclear how reliable representations of motion direction or speed are elaborated when presented with large bandwidth motion stimuli or natural statistics. Last year, we have shown that broadening the spatiotemporal frequency content of a textured pattern moving at constant speed leads to different results on a reflexive tracking task and a speed discrimination task. Larger bandwidth stimuli increase response amplitude and sensitivity of ocular following, consistently with a maximum-likelihood (ML) model of motion decoding. In contrast, larger bandwidth stimuli impair speed discrimination performance, suggesting that the perceptual system cannot take advantage of such additional, redundant information. Instead of ML, a gain control decoding mechanism can explain the drop in performance, suggesting that action and perception rely on different decoding mechanisms. To further investigate such task-dependant pooling of motion information, we measured pattern discrimination performance using these textured stimuli. Two noise patterns were presented sequentially for 250 ms on a CRT monitor (1280 × 1024 @ 100 Hz) and covered 47$,^∘$ of visual angle with identical properties (mean SF, bandwidth SF, speed) except for a randomized phase spectrum. A test pattern was then presented and subjects were asked to match it with one or the other reference stimulus (ABX task). At small bandwidth and optimal mean spatial frequency (0.3 cpd), subjects were able to discriminate the two patterns with high accuracy. Performance dropped to chance level as spatial frequency bandwidth increased. Increasing the mean spatial frequency decreased the overall performance. Again, these results suggest that perceptual performance is deteriorated in presence of larger information.
Choosing an appropriate set of stimuli is essential to characterize the response of a sensory system to a particular functional dimension, such as the eye movement following the motion of a visual scene. Here, we describe a framework to generate random texture movies with controlled information content, i.e., Motion Clouds. These stimuli are defined using a generative model that is based on controlled experimental parametrization. We show that Motion Clouds correspond to dense mixing of localized moving gratings with random positions. Their global envelope is similar to natural-like stimulation with an approximate full-field translation corresponding to a retinal slip. We describe the construction of these stimuli mathematically and propose an open-source Python-based implementation. Examples of the use of this framework are shown. We also propose extensions to other modalities such as color vision, touch, and audition.
Most studies on the dynamics of recurrent cortical networks are either based on purely random wiring or neighborhood couplings. Neuronal cortical connectivity, however, shows a complex spatial pattern composed of local and remote patchy connections. We ask to what extent such geometric traits influence the ’’ idle’’ dynamics of two-dimensional (2d) cortical network models composed of conductance-based integrate-and-fire (iaf) neurons. In contrast to the typical 1 mm2 used in most studies, we employ an enlarged spatial set-up of 25 mm2 to provide for long-range connections. Our models range from purely random to distance-dependent connectivities including patchy projections, i.e., spatially clustered synapses. Analyzing the characteristic measures for synchronicity and regularity in neuronal spiking, we explore and compare the phase spaces and activity patterns of our simulation results. Depending on the input parameters, different dynamical states appear, similar to the known synchronous regular ’’ SR’’ or asynchronous irregular ’’ AI’’ firing in random networks. Our structured networks, however, exhibit shifted and sharper transitions, as well as more complex activity patterns. Distance-dependent connectivity structures induce a spatio-temporal spread of activity, e.g., propagating waves, that random networks cannot account for. Spatially and temporally restricted activity injections reveal that a high amount of local coupling induces rather unstable AI dynamics. We find that the amount of local versus long-range connections is an important parameter, whereas the structurally advantageous wiring cost optimization of patchy networks has little bearing on the phase space.
Moving objects generate motion information at different scales, which are processed in the visual system with a bank of spatiotemporal frequency channels. It is not known how the brain pools this information to reconstruct object speed and whether this pooling is generic or adaptive; that is, dependent on the behavioral task. We used rich textured motion stimuli of varying bandwidths to decipher how the human visual motion system computes object speed in different behavioral contexts. We found that, although a simple visuomotor behavior such as short-latency ocular following responses takes advantage of the full distribution of motion signals, perceptual speed discrimination is impaired for stimuli with large bandwidths. Such opposite dependencies can be explained by an adaptive gain control mechanism in which the divisive normalization pool is adjusted to meet the different constraints of perception and action.
In low-level sensory systems, it is still unclear how the noisy information collected locally by neurons may give rise to a coherent global percept. This is well demonstrated for the detection of motion in the aperture problem: as luminance of an elongated line is symmetrical along its axis, tangential velocity is ambiguous when measured locally. Here, we develop the hypothesis that motion-based predictive coding is sufficient to infer global motion. Our implementation is based on a context-dependent diffusion of a probabilistic representation of motion. We observe in simulations a progressive solution to the aperture problem similar to psychophysics and behavior. We demonstrate that this solution is the result of two underlying mechanisms. First, we demonstrate the formation of a tracking behavior favoring temporally coherent features independently of their texture. Second, we observe that incoherent features are explained away while coherent information diffuses progressively to the global scale. Most previous models included ad-hoc mechanisms such as end-stopped cells or a selection layer to track specific luminance-based features. Here, we have proved that motion-based predictive coding, as it is implemented in this functional model, is sufficient to solve the aperture problem. This simpler solution may give insights in the role of prediction underlying a large class of sensory computations.
If perception corresponds to hypothesis testing (Gregory, 1980); then visual searches might be construed as experiments that generate sensory data. In this work, we explore the idea that saccadic eye movements are optimal experiments, in which data are gathered to test hypotheses or beliefs about how those data are caused. This provides a plausible model of visual search that can be motivated from the basic principles of self-organized behavior: namely, the imperative to minimize the entropy of hidden states of the world and their sensory consequences. This imperative is met if agents sample hidden states of the world efficiently. This efficient sampling of salient information can be derived in a fairly straightforward way, using approximate Bayesian inference and variational free-energy minimization. Simulations of the resulting active inference scheme reproduce sequential eye movements that are reminiscent of empirically observed saccades and provide some counterintuitive insights into the way that sensory evidence is accumulated or assimilated into beliefs about the world.
Qui créera le premier ordinateur intelligent? Les ordinateurs classiques sont de plus en plus puissants, mais restent toujours aussi « stupides ». Impossible d’en trouver un avec lequel on puisse dialoguer de façon naturelle.
Neurons in the input layer of primary visual cortex in primates develop edge-like receptive fields. One approach to understanding the emergence of this response is to state that neural activity has to efficiently represent sensory data with respect to the statistics of natural scenes. Furthermore, it is believed that such an efficient coding is achieved using a competition across neurons so as to generate a sparse representation, that is, where a relatively small number of neurons are simultaneously active. Indeed, different models of sparse coding coupled with Hebbian learning and homeostasis have been proposed that successfully match the observed emergent response. However, the specific role of homeostasis in learning such sparse representations is still largely unknown. By quantitatively assessing the efficiency of the neural representation during learning, we derive a cooperative homeostasis mechanism which optimally tunes the competition between neurons within the sparse coding algorithm. We apply this homeostasis while learning small patches taken from natural images and compare its efficiency with state-of-the-art algorithms. Results show that while different sparse coding algorithms give similar coding results, the homeostasis provides an optimal balance for the representation of natural images within the population of neurons. Competition in sparse coding is optimized when it is fair: By contributing to optimize statistical competition across neurons, homeostasis is crucial in providing a more efficient solution to the emergence of independent components.
Neurons in the neocortex receive a large number of excitatory and inhibitory synaptic inputs. Excitation and inhibition dynamically balance each other, with inhibition lagging excitation by only few milliseconds. To characterize the functional consequences of such correlated excitation and inhibition, we studied models in which this correlation structure is induced by feedforward inhibition (FFI). Simple circuits show that an effective FFI changes the integrative behavior of neurons such that only synchronous inputs can elicit spikes, causing the responses to be sparse and precise. Further, effective FFI increases the selectivity for propagation of synchrony through a feedforward network, thereby increasing the stability to background activity. Last, we show that recurrent random networks with effective inhibition are more likely to exhibit dynamical network activity states as have been observed in vivo. Thus, when a feedforward signal path is embedded in such recurrent network, the stabilizing effect of effective inhibition creates an suitable substrate for signal propagation. In conclusion, correlated excitation and inhibition support the notion that synchronous spiking may be important for cortical processing.
We study cortical network dynamics for a more realistic network model. It represents, in terms of spatial scale, a large piece of cortex allowing for long-range connections, resulting in a rather sparse connectivity. We use two different types of conductance-based I&F neurons as excitatory and inhibitory units, as well as specific connection probabilities. In order to remain computationally tractable, we reduce neuron density, modelling part of the missing internal input via external poissonian spike trains. Compared to previous studies, we observe significant changes in the dynamical phase space: Altered activity patterns require another regularity measure than the coefficient of variation. We identify two types of mixed states, where different phases coexist in certain regions of the phase space. More notably, our boundary between high and low activity states depends predominantly on the relation between excitatory and inhibitory synaptic strength instead of the input rate. Key words:Artificial neural networks, Data analysis, Simulation, Spiking neurons. This work is supported by EC IP project FP6-015879 (FACETS).
NeuralEnsemble (http://neuralensemble.org) is a multilateral effort to coordinate and organise neuroscience software development efforts based around the Python programming language into a larger, meta-simulator software system. To this end, NeuralEnsemble hosts services for source code management and bug tracking (Subversion/Trac) for a number of open-source neuroscience tools, organizes an annual workshop devoted to collaborative software development in neuroscience, and manages a google-group discussion forum. Here, we present two NeuralEnsemble hosted projects: PyNN (http://neuralensemble.org/PyNN) is a package for simulator-independent specification of neuronal network models. You can write the code for a model once, using the PyNN API, and then run it without modification on any simulator that PyNN supports. Currently NEURON, NEST, PCSIM and a VLSI hardware implementation are fully supported. NeuroTools (http://neuralensemble.org/NeuroTools) is a set of tools to manage, store and analyse computational neuroscience simulations. It has been designed around PyNN, but can also be used for data from other simulation environments or even electrophysiological measurements. We will illustrate how the use of PyNN and NeuroTools ease the developmental process of models in computational neuroscience, enhancing collaboration between different groups and increasing the confidence in correctness of results. NeuralEnsemble efforts are supported by the European FACETS project (EU-IST-2005-15879)
Integrating information is essential to measure the physical 2D motion of a surface from both ambiguous local 1D motion of its elongated edges and non-ambiguous 2D motion of its features such as corners or texture elements. The dynamics of this motion integration shows a complex time course as read from tracking eye movements: first, local 1D motion signals are extracted and pooled to initiate ocular responses, then 2D motion signals are integrated to adjust the tracking direction until it matches the surface motion direction. The nature of these 1D and 2D motion computations are still unclear. One hypothesis is that their different dynamics may be explained from different contrast sensitivities. To test this, we measured contrast-response functions of early, 1D-driven and late, 2D-driven components of ocular following responses to different motion stimuli: gratings, plaids and barberpoles. We found that contrast dynamics of 1D-driven responses are nearly identical across the different stimuli. On the contrary, late 2D-driven components with either plaids or barberpoles have similar latencies but different contrast dynamics. Temporal dynamics of both 1D- and 2D-driven responses demonstrates that the different contrast gains are set very early during the response time course. Running a Bayesian model of motion integration, we show that a large family of contrast-response functions can be predicted from the probability distributions of 1D and 2D motion signals for each stimulus and by the shape of the prior distribution. However, the pure delay (i.e. largely independent upon contrast) observed between 1D- and 2D-motion supports the fact that 1D and 2D probability distributions are computed independently. This two-pathway Bayesian model supports the idea that 1D and 2D mechanisms represent edges and features motion in parallel.
If modern computers are sometimes superior to cognition in some specialized tasks such as playing chess or browsing a large database, they can’t beat the efficiency of biological vision for such simple tasks as recognizing a relative or following an object in a complex background. We present in this paper our attempt at outlining the dynamical, parallel and event-based representation for vision in the architecture of the central nervous system. We will illustrate this by showing that in a signal matching framework, a L/LN (linear/non-linear) cascade may efficiently transform a sensory signal into a neural spiking signal and we apply this framework to a model retina. However, this code gets redundant when using an over-complete basis as is necessary for modeling the primary visual cortex: we therefore optimize the efficiency cost by increasing the sparseness of the code. This is implemented by propagating and canceling redundant information using lateral interactions. We compare the efficiency of this representation in terms of compression as the reconstruction quality as a function of the coding length. This will correspond to a modification of the Matching Pursuit algorithm where the ArgMax function is optimized for competition, or Competition Optimized Matching Pursuit (COMP). We will particularly focus on bridging neuroscience and image processing and on the advantages of such an interdisciplinary approach.
The machinery behind the visual perception of motion and the subsequent sensorimotor transformation, such as in Ocular Following Response (OFR), is confronted to uncertainties which are efficiently resolved in the primate’s visual system. We may understand this response as an ideal observer in a probabilistic framework by using Bayesian theorỹWeiss02 which we previously proved to be successfully adapted to model the OFR for different levels of noise with full field gratings or with disk of various sizes and the effect of a flickering surround̃Perrinet07neurocomp. More recent experiments of OFR have used disk gratings and bipartite stimuli which are optimized to study the dynamics of center-surround integration. We quantified two main characteristics of the global spatial integration of motion from an intermediate map of possible local translation velocities: (i) a finite optimal stimulus size for driving OFR, surrounded by an antagonistic modulation and (ii) a direction selective suppressive effect of the surround on the contrast gain control of the central stimuliB̃arthelemy06,Barthelemy07. Herein, we extended in the dynamical domain the ideal observer model to simulate the spatial integration of the different local motion cues within a probabilistic representation. We present analytical results which show that the hypothesis of independence of local measures can describe the initial segment of spatial integration of motion signal. Within this framework, we successfully accounted for the dynamical contrast gain control mechanisms observed in the behavioral data for center-surround stimuli. However, another inhibitory mechanism had to be added to account for suppressive effects of the surround. We explore here an hypothesis where this could be understood as the effect of a recurrent integration of information in the velocity map. F. Barthelemy, L. U. Perrinet, E. Castet, and G. S. Masson. Dynamics of distributed 1D and 2D motion representations for short-latency ocular following. Vision Research, 48(4):501–22, feb 2007. doi: 10.1016/j.visres.2007.10.020. F. V. Barthelemy, I. Vanzetta, and G. S. Masson. Behavioral receptive field for ocular following in humans: Dynamics of spatial summation and center-surround interactions. Journal of Neurophysiology, (95):3712–26, Mar 2006. doi: 10.1152/jn.00112.2006. L. U. Perrinet and G. S. Masson. Modeling spatial integration in the ocular following response using a probabilistic framework. Journal of Physiology (Paris), 2007. doi: 10.1016/j.jphysparis.2007.10.011. Y. Weiss, E. P. Simoncelli, and E. H. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604, Jun 2002. doi: 10.1038/nn858. This work was supported by EC IP project FP6-015879, ‘‘FACETS’’.
Computational neuroscience has produced a diversity of software for simulations of networks of spiking neurons, with both negative and positive consequences. On the one hand, each simulator uses its own programming or configuration language, leading to considerable difficulty in porting models from one simulator to another. This impedes communication between investigators and makes it harder to reproduce and build on the work of others. On the other hand, simulation results can be cross-checked between different simulators, giving greater confidence in their correctness, and each simulator has different optimizations, so the most appropriate simulator can be chosen for a given modelling task. A common programming interface to multiple simulators would reduce or eliminate the problems of simulator diversity while retaining the benefits. PyNN is such an interface, making it possible to write a simulation script once, using the Python programming language, and run it without modification on any supported simulator (currently NEURON, NEST, PCSIM, Brian and the Heidelberg VLSI neuromorphic hardware). PyNN increases the productivity of neuronal network modelling by providing high-level abstraction, by promoting code sharing and reuse, and by providing a foundation for simulator-agnostic analysis, visualization and data-management tools. PyNN increases the reliability of modelling studies by making it much easier to check results on multiple simulators. PyNN is open-source software and is available from http://neuralensemble.org/PyNN.
Meanwhile biorthogonal wavelets got a very popular image processing tool, alternative multiresolution transforms have been proposed for solving some of their drawbacks, namely the poor selectivity in orientation and the lack of translation invariance due to the aliasing between subbands. These transforms are generally overcomplete and consequently offer huge degrees of freedom in their design. At the same time their optimization get a challenging task. We proposed here a log-Gabor wavelet transform gathering the excellent mathematical properties of the Gabor functions with a carefully construction to maintain the properties of the filters and to permit exact reconstruction. Two major improvements are proposed: first the highest frequency bands are covered by narrowly localized oriented filters. And second, all the frequency bands including the highest and lowest frequencies are uniformly covered so as exact reconstruction is achieved using the same filters in both the direct and the inverse transforms (which means that the transform is self-invertible). The transform is optimized not only mathematically but it also follows as much as possible the knowledge on the receptive field of the simple cells of the Primary Visual Cortex (V1) of primates and on the statistics of natural images. Compared to the state of the art, the log-Gabor wavelets show excellent behavior in their ability to segregate the image information (e.g. the contrast edges) from incoherent Gaussian noise by hard thresholding and to code the image features through a reduced set of coefficients with large magnitude. Such characteristics make the transform a promising tool for general image processing tasks.
The machinery behind the visual perception of motion and the subsequent sensori-motor transformation, such as in Ocular Following Response (OFR), is confronted to uncertainties which are efficiently resolved in the primate’s visual system. We may understand this response as an ideal observer in a probabilistic framework by using Bayesian theory (Weiss et al., 2002) which we previously proved to be successfully adapted to model the OFR for different levels of noise with full field gratings (Perrinet et al., 2005). More recent experiments of OFR have used disk gratings and bipartite stimuli which are optimized to study the dynamics of center-surround integration. We quantified two main characteristics of the spatial integration of motion : (i) a finite optimal stimulus size for driving OFR, surrounded by an antagonistic modulation and (ii) a direction selective suppressive effect of the surround on the contrast gain control of the central stimuli (Barthélemy et al., 2006). Herein, we extended the ideal observer model to simulate the spatial integration of the different local motion cues within a probabilistic representation. We present analytical results which show that the hypothesis of independence of local measures can describe the integration of the spatial motion signal. Within this framework, we successfully accounted for the contrast gain control mechanisms observed in the behavioral data for center-surround stimuli. However, another inhibitory mechanism had to be added to account for suppressive effects of the surround.
I will illustrate in this talk how computational neuroscience may inspire and be inspired by mathematical image processing. Focusing on efficiently representing natural images in the primary visual cortex, we derive an event-based adaptive algorithm inspired by statistical inference, Matching Pursuit and Hebbian learning. This algorithm allows to learn efficient ëdge-likë̊eceptive fields similarly to Independent Components Analysis. The correlation-based inhibition has been shown to be a necessary condition for the fomation of this type of receptive fields and shows the putative functional role of lateral propagation of information in cortical layers. I’ll first present state-of-the-art neural algorithms for this task, the results of a detailed analysis of this Sparse Hebbian Learning algorithm and finally draw a comparison with similar strategies.
We describe the theoretical formulation of a learning algorithm in a model of the primary visual cortex (V1) and present results of the efficiency of this algorithm by comparing it to the SparseNet algorithm [1]. As the SparseNet algorithm, it is based on a model of signal synthesis as a Linear Generative Model but differs in the efficiency criteria for the representation. This learning algorithm is in fact based on an efficiency criteria based on the Occam razor: for a similar quality, the shortest representation should be privileged. This inverse problem is NP-complete and we propose here a greedy solution which is based on the architecture and nature of neural computations [2]). It proposes that the supra-threshold neural activity progressively removes redundancies in the representation based on a correlation-based inhibition and provides a dynamical implementation close to the concept of neural assemblies from Hebb [3]). We present here results of simulation of this network with small natural images and compare it to the Sparsenet solution. Extending it to realistic images and to the NEST simulator http://www.nest-initiative.org/, we show that this learning algorithm based on the properties of neural computations produces adaptive and efficient representations in V1. 1. Olshausen B, Field DJ: Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Res 1997, 37:3311-3325. 2. Perrinet L: Feature detection using spikes: the greedy approach. J Physiol Paris 2004, 98(4–6):530-539. 3. Hebb DO: The organization of behavior. Wiley, New York; 1949.
Trends in programming language development and adoption point to Python as the high-level systems integration language of choice. Python leverages a vast developer-base external to the neuroscience community, and promises leaps in simulation complexity and maintainability to any neural simulator that adopts it. PyNN http://neuralensemble.org/PyNN strives to provide a uniform application programming interface (API) across neural simulators. Presently NEURON and NEST are supported, and support for other simulators and neuromorphic VLSI hardware is under development. With PyNN it is possible to write a simulation script once and run it without modification on any supported simulator. It is also possible to write a script that uses capabilities specific to a single simulator. While this sacrifices simulator-independence, it adds flexibility, and can be a useful step in porting models between simulators. The design goals of PyNN include allowing access to low-level details of a simulation where necessary, while providing the capability to model at a high level of abstraction, with concomitant gains in development speed and simulation maintainability. Another of our aims with PyNN is to increase the productivity of neuroscience modeling, by making it faster to develop models de novo, by promoting code sharing and reuse across simulator communities, and by making it much easier to debug, test and validate simulations by running them on more than one simulator. Modelers would then become free to devote more software development effort to innovation, building on the simulator core with new tools such as network topology databases, stimulus programming, analysis and visualization tools, and simulation accounting. The resulting, community-developed ‘meta-simulator’ system would then represent a powerful tool for overcoming the so-called complexity bottleneck that is presently a major roadblock for neural modeling.
relies on log-Gabor filters: Sylvain Fischer, Filip Šroubek, Laurent U Perrinet, Rafael Redondo, Gabriel Cristóbal (2007). Self-Invertible 2D Log-Gabor Wavelets. International Journal of Computer Vision. PDF Cite Code DOI Schematic structure of the primary visual cortex implemented in the present study.
We describe the theoretical formulation of a learning algorithm in a model of the primary visual cortex (V1) and present results of the efficiency of this algorithm by comparing it to the SparseNet algorithm (Olshausen, 1996). As the SparseNet algorithm, it is based on a model of signal synthesis as a Linear Generative Model but differs in the efficiency criteria for the representation. This learning algorithm is in fact based on an efficiency criteria based on the Occam razor: for a similar quality, the shortest representation should be privilegied. This inverse problem is NP-complete and we propose here a greedy solution which is based on the architecture and nature of neural computations (Perrinet, 2006). We present here results of a simulation of this network of small natural images (available at https://github.com/bicv/SparseHebbianLearning) and compare it to the SparseNet solution. We show that this solution based on neural computations produces an adaptive algorithm for efficient representations in V1.
To understand possible strategies of temporal spike coding in the central nervous system, we study functional neuromimetic models of visual processing for static images. We will first present the retinal model which was introduced by Van Rullen and Thorpe [1] and which represents the multiscale contrast values of the image using an orthonormal wavelet transform. These analog values activate a set of spiking neurons which each fire once to produce an asynchronous wave of spikes. According to this model, the image may be progressively reconstructed from this spike wave thanks to regularities in the statistics of the coefficients determined with natural images. Here, we study mathematically how the quality of information transmission carried by this temporal representation varies over time. In particular, we study how these regularities can be used to optimize information transmission by using a form of temporal cooperation of neurons to code analog values. The original model used wavelet transforms that are close to orthogonal. However, the selectivity of realistic neurons overlap, and we propose an extension of the previous model by adding a spatial cooperation between filters. This model extends the previous scheme for arbitrary -and possibly non-orthogonal representations of features in the images. In particular, we compared the performance of increasingly over-complete representations in the retina. Results show that this algorithm provides an efficient spike coding strategy for low-level visual processing which may adapt to the complexity of the visual input.
A goal of low-level neural processes is to build an efficient code extracting the relevant information from the sensory input. It is believed that this is implemented in cortical areas by elementary inferential computations dynamically extracting the most likely parameters corresponding to the sensory signal. We explore here a neuro-mimetic feed-forward model of the primary visual area (V1) solving this problem in the case where the signal may be described by a robust linear generative model. This model uses an over-complete dictionary of primitives which provides a distributed probabilistic representation of input features. Relying on an efficiency criterion, we derive an algorithm as an approximate solution which uses incremental greedy inference processes. This algorithm is similar to ‘Matching Pursuit’ and mimics the parallel architecture of neural computations. We propose here a simple implementation using a network of spiking integrate-and-fire neurons which communicate using lateral interactions. Numerical simulations show that this Sparse Spike Coding strategy provides an efficient model for representing visual data from a set of natural images. Even though it is simplistic, this transformation of spatial data into a spatio-temporal pattern of binary events provides an accurate description of some complex neural patterns observed in the spiking activity of biological neural networks.
It is generally assumed that neurons in the central nervous system communicate through temporal firing patterns. As a first step, we will study the learning of a layer of realistic neurons in the particular case where the relevant messages are formed by temporally correlated patterns, or synfire patterns. The model is a layer of Integrate-and-Fire (IF) neurons with synaptic current dynamics that adapts by minimizing a cost according to a gradient descent scheme. This leads to a rule similar to Spike-Time Dependent Hebbian Plasticity (STDHP). Our results show that the rule that we derive is biologically plausible and leads to the detection of the coherence in the input in an unsupervised way. An application to shape recognition is shown as an illustration.
How to reach me