Matthis Dallain

PhD candidate in Computational Neuroscience

PhD Student (2024-10 / 2027-09): Focus of attention: a sensory-motor task for energy reduction in spiking neural networks.

Context

This project takes place in the context of the EMERGENCES project (ANR PEPR IA 2023-2027), which aims to advance the state of the art in machine learning models by drawing inspiration from biology. Indeed, brain-inspired features promise the emergence of unrivalled processing efficiency. Among the most promising features studied in the bio-inspired AI literature are temporal data encoding using spikes, multimodal association, local learning, and attention-based processing.

This PhD subject focuses on the association between attention and spiking neural networks to define new, efficient AI models for embedded systems such as drones, robots, and autonomous systems more generally.

The thesis will take place between the LEAT research lab in Sophia-Antipolis and the INT institute in Marseille, which both develop complementary approaches to bio-inspired AI, from neuroscience observation to embedded systems design.

Subject

The volume and diversity of visual information reaching our eyes at every moment are enormous and cannot be fully integrated by the visual system. In other words, the biological system is confronted with the same challenge that artificial systems face (especially at the edge) when dealing with the huge amounts of information coming continuously from the real world. Interestingly, the brain has found an original approach to this issue: focusing on a sub-part of the visual information at a time. Indeed, the study of the visual cortex in neuroscience has highlighted subregions that each process one or several of the properties of the information coming from the visual pathways: shape, color, motion, etc. [1], thus revealing the interaction of attentional processes and the concept of "saliency" used in cognitive science.

Creating a fully autonomous system remains a significant challenge, especially when operating in the dynamic real world. In recent years, machine learning has assumed a prominent role in machine vision, particularly through deep learning algorithms, which have yielded impressive results in tasks such as object detection, recognition, and tracking. However, these systems come with a high computational cost, as they must process entire camera images to produce these results, and they struggle to adapt dynamically to changes in their environment.

Our focus lies on two integrated bio-inspired approaches that leverage attentional mechanisms. The first approach, known as bottom-up, draws inspiration from Gestalt theory, the Feature Integration Theory (Treisman & Gelade) [3], and the Itti & Koch model of visual attention [1]. This approach relies on the saliency of low-level features in the visual field, processed in parallel, including motion, color, and edges. It employs emergent mechanisms that integrate features guided by their saliency in order to detect the consistency of objects, encompassing their form, position, and speed. As Gestalt theory shows, only the most salient data are needed by this mechanism. Thus, the amount of data to process can be dramatically reduced by extracting only the most salient regions of interest during the bottom-up phase, as sketched below.
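To make the bottom-up path concrete, here is a minimal Itti & Koch-style saliency sketch in Python (NumPy/SciPy). The feature set (intensity, a crude red-green opponency, edge energy), the two blur scales, and the equal-weight combination are illustrative assumptions, not the exact model of [1].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(feature, sigma_c=2.0, sigma_s=8.0):
    """Center-surround contrast: fine-scale blur minus coarse-scale blur."""
    return np.abs(gaussian_filter(feature, sigma_c) - gaussian_filter(feature, sigma_s))

def saliency_map(rgb):
    """Combine normalized conspicuity maps for intensity, color, and edges."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    red_green = r - g                        # crude color-opponency channel
    gy, gx = np.gradient(intensity)          # edge energy as an orientation proxy
    edges = np.hypot(gx, gy)
    maps = [center_surround(f) for f in (intensity, red_green, edges)]
    # Normalize each map to [0, 1] before averaging so no channel dominates.
    maps = [(m - m.min()) / (m.max() - m.min() + 1e-8) for m in maps]
    return sum(maps) / len(maps)

# Usage: keep only the most salient region of interest of a frame,
# instead of processing the whole image.
frame = np.random.rand(120, 160, 3)          # stand-in for a camera image
s = saliency_map(frame)
row, col = np.unravel_index(np.argmax(s), s.shape)
roi = frame[max(row - 16, 0):row + 16, max(col - 16, 0):col + 16]
```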

The second approach, known as top-down, considers that visual attention is guided by higher-level cognitive stages. For instance, in the Guided Search theory [4], Wolfe emphasizes the role of prior knowledge, expectations, and intentions, and proposes a guided search mechanism that relies on a "priority map that represents the system's best guess as to where to deploy attention next". This priority map is built from multiple sources of information: the visual system itself as well as higher-level information such as intention, search history, and the actual visual semantics. In this way, higher-level information guides the filtering of the bottom-up path, so that only the information required for a given task is selected and processed (see the sketch below). Schöner [5] proposes similar systems in which saliency maps, working memories, priority maps, and guided visual search mechanisms are implemented through Neural Field Theory (NFT); there, dynamic neural fields implement the saliency of feature maps, the spatial selection of the scene, working memory, etc.
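As an illustration of how a priority map could blend the two paths, here is a hypothetical sketch in the spirit of Guided Search [4]. The linear weighting, the `target_weights` guidance scheme, and the inhibition-of-return term are assumptions for illustration, not Wolfe's exact formulation.

```python
import numpy as np

def top_down_guidance(feature_maps, target_weights):
    """Weight each bottom-up feature map by how diagnostic it is for the
    current target, e.g. target_weights={"red_green": 1.0} when searching
    for a red object (hypothetical weighting scheme)."""
    guidance = np.zeros_like(next(iter(feature_maps.values())))
    for name, fmap in feature_maps.items():
        guidance += target_weights.get(name, 0.0) * fmap
    return guidance

def priority_map(bottom_up, guidance, w_bu=0.4, w_td=0.6, inhibition=None):
    """Best guess of where to deploy attention next: a weighted blend of
    bottom-up saliency and top-down guidance, minus inhibition-of-return
    at already-visited locations so attention moves on."""
    prio = w_bu * bottom_up + w_td * guidance
    if inhibition is not None:
        prio = prio - inhibition
    return prio

# Usage: hypothetical search for a "red" target on random feature maps.
fmaps = {"red_green": np.random.rand(120, 160), "edges": np.random.rand(120, 160)}
prio = priority_map(np.random.rand(120, 160),
                    top_down_guidance(fmaps, {"red_green": 1.0}))
```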

In previous work at LEAT [6], we proposed a brain-inspired attentional process implementing bottom-up and top-down paths based on the properties of dynamic neural fields embodied in a sensory-motor loop. In complementary work, the INT group developed a dual-pathway model of the visual system in which saliency emerges as a property of the perceptual system performing saccades, that is, rapid shifts of the fixation point [7]. This model uses a recognition network that takes a retinotopically transformed input and shows the emergence of saliency maps [8]. In the dual-pathway model, the exploration of a visual scene is based both on the saliency of the color feature (bottom-up) and on the class of the last selected object recognized by a convolutional neural network (top-down). Both paths are integrated by a dynamic neural field that selects the next visual information to be explored or conserved by setting motor orders accordingly; a minimal selection sketch follows.
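The selection step itself can be sketched as an Amari-type dynamic neural field receiving the priority map as input, where local excitation and global inhibition let a single bump of activity survive. All parameters here (time constant, resting level, kernel width, gains) are illustrative assumptions, not those used in [5]-[7].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dnf_select(priority, steps=200, dt=0.1, tau=1.0, h=-0.5,
               exc_sigma=2.0, exc_gain=1.5, inh_gain=2.0):
    """Euler integration of  tau * du/dt = -u + h + input + lateral(f(u)),
    with local Gaussian excitation and global inhibition. The surviving
    bump of supra-threshold activity marks the selected saccade target."""
    u = np.zeros_like(priority)
    for _ in range(steps):
        f = 1.0 / (1.0 + np.exp(-10.0 * u))             # sigmoidal firing rate
        lateral = exc_gain * gaussian_filter(f, exc_sigma) - inh_gain * f.mean()
        u += (dt / tau) * (-u + h + priority + lateral)
    return np.unravel_index(np.argmax(u), u.shape)      # peak = next fixation

# Usage: select the next fixation from a (here random) priority map.
target = dnf_select(np.random.rand(120, 160) * 0.5)
```

In a sensory-motor loop, the returned location would set the motor order for the next saccade, and inhibition-of-return would then suppress that location in the priority map so exploration continues.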

The main goal of the thesis is to propose a new vision of the integration of attention into machine learning models. The proposed model will draw on the dynamics at play in a sensory-motor approach to perception, and will thus reconsider classical perception tasks in order to better fit the continuous flow of information coming from the environment.

References

Education
  • PhD candidate in Computational Neuroscience

    Aix-Marseille Université

  • Master in Neuroscience, 2024

    Aix-Marseille Université