The formation of connections between neural cells emerges essentially from an unsupervised learning process. During the development of the primary visual cortex (V1) of mammals, for example, one may observe the emergence of cells selective to localized and oriented features, which leads to a rough contour-based representation of the retinal image in area V1. We modeled the formation of this representation along the thalamo-cortical pathway using a sparse unsupervised learning algorithm in a hierarchical network. This algorithm alternates between (i) a coding phase, which encodes the information, and (ii) a learning phase, which finds the proper encoder (also called the dictionary). We replicated and adapted the Multi-Layer Convolutional Sparse Coding (ML-CSC) model from Michael Elad’s group. As an application, we trained our implementation on a database of face images. The extracted features show similarities with the receptive fields of some neurons found in V1 and beyond. Furthermore, our results demonstrate the potential application of such a strategy to the fast classification of images, for example in hierarchical and dynamical architectures.
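The alternation between a coding phase and a learning phase can be illustrated with a minimal sketch. This is not the ML-CSC implementation: it is a single-layer, non-convolutional simplification, and the choice of ISTA for the coding phase and of a plain gradient step for the learning phase are assumptions made here for illustration. All function names and parameters (`sparse_code`, `update_dictionary`, `learn`, `lam`, `lr`) are hypothetical.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink every entry of x toward zero by lam (proximal operator of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_code(X, D, lam, n_iter=50):
    """Coding phase (assumed here to be ISTA).

    Approximately minimizes 0.5 * ||X - D A||_F^2 + lam * ||A||_1 over A,
    with the dictionary D held fixed.
    """
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A


def update_dictionary(X, D, A, lr=0.1):
    """Learning phase: one gradient step on the reconstruction error,
    followed by renormalizing each atom (column) to unit norm."""
    grad = (D @ A - X) @ A.T / X.shape[1]
    D = D - lr * grad
    D = D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D


def learn(X, n_atoms, lam=0.1, n_epochs=20, seed=0):
    """Alternate (i) coding and (ii) learning for a fixed number of epochs."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_epochs):
        A = sparse_code(X, D, lam)        # (i) coding phase
        D = update_dictionary(X, D, A)    # (ii) learning phase
    return D, sparse_code(X, D, lam)
```

In this sketch each column of `X` would be a flattened image patch, each column of `D` a learned feature, and `A` the sparse activations. A convolutional, multi-layer model such as ML-CSC replaces the matrix products with convolutions and stacks several such dictionaries.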