
Humans are able to robustly categorize images and can, for instance, detect the presence of an animal in a briefly flashed image in as little as 120 ms. Initially inspired by neuroscience, deep-learning algorithms literally bloomed up in the last decade such that the accuracy of machines is at present superior to humans for visual recognition tasks. However, these artificial networks are usually trained and evaluated on very specific tasks, for instance on the 1000 separate categories of IMAGENET. In that regard, biological visual systems are more flexible and efficient compared to artificial systems on generic ecological tasks. In order to deepen this comparison, we retrained the standard VGG Convolutional Neural Network (CNN) on two independent tasks which are ecologically relevant for humans: one task defined as detecting the presence of an animal and the other as detecting the presence of an artifact. We show that retraining the network achieves human-like performance level which is reported in psychophysical tasks. We also compare the accuracy of the detection on an image-by-image basis. This showed in particular that the two models perform better when combining their outputs. Indeed, animals (e.g. lions) tend to be less present in photographs containing artifacts (e.g. buildings). These re-trained models could reproduce some unexpected behavioral observations from humans psychophysics such as the robustness to rotations (e.g. upside-down or slanted image) or to a grayscale transformation. Finally, we quantitatively tested the number of layers of the CNN which are necessary to reach such a performance, showing that a good accuracy for ultra-fast categorization could be reached with only a few layers, challenging the belief that image recognition would require a deep sequential analysis of visual objects. We expect to apply this framework to guide future model-based psychophysical experiments and biomimetic deep neuronal architectures designed for such tasks.