Timing is essential for neural processing, yet evidence for such temporal precision is still lacking. We have developed a theoretical model of representation based on spatio-temporal spiking motifs. Our goal is to develop a self-supervised learning method for the optimal detection of such motifs in neurobiological data. To detect these motifs, we extended the K-Means algorithm to temporal data using a convolutional operator. A second layer performs pooling to ensure that only one motif is used per time step. Results were further improved by a homeostatic mechanism that ensures all detected motifs are activated with equal probability. We applied this algorithm to the Spiking Heidelberg Digits (SHD) dataset, which consists of the output of a realistic cochlear model in response to spoken digits. Qualitatively, the learned filters show a structure similar to that of receptive fields found in the auditory cortex. Building on these promising results on a realistic yet synthetic dataset, future work will apply this algorithm to neurobiological data to test the hypothesis that precise spike timing plays a role in neural processing.
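The pipeline above (convolutional K-Means, one-motif-per-time-step pooling, homeostasis) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and rests on several assumptions of our own: the input is a binary raster stored as a list of channel rows; the convolution is a valid-mode cross-correlation whose score `<window, kernel> - ||kernel||^2 / 2` ranks motifs exactly as plain K-Means would by Euclidean distance; pooling is a winner-take-all across motifs at each time step; and homeostasis is modelled as an additive per-motif bias nudged toward equal usage `1 / n_motifs`. The function names `conv_scores`, `pool_one_motif_per_step`, `conv_kmeans` and the parameters `width` and `eta` are hypothetical.

```python
import random


def conv_scores(spikes, kernels, bias):
    """Cross-correlate each motif kernel with the raster (valid mode).

    scores[k][t] = <window_t, kernel_k> - ||kernel_k||^2 / 2 + bias[k].
    With zero bias, argmax over k picks the kernel nearest to window_t in
    Euclidean distance, i.e. the usual K-Means assignment.
    """
    n_ch, n_t = len(spikes), len(spikes[0])
    width = len(kernels[0][0])
    scores = []
    for k, ker in enumerate(kernels):
        half_norm = 0.5 * sum(v * v for row in ker for v in row)
        row_scores = []
        for t in range(n_t - width + 1):
            dot = sum(ker[c][d] * spikes[c][t + d]
                      for c in range(n_ch) for d in range(width))
            row_scores.append(dot - half_norm + bias[k])
        scores.append(row_scores)
    return scores


def pool_one_motif_per_step(scores):
    """Winner-take-all across motifs: exactly one motif label per time step."""
    n_k, n_steps = len(scores), len(scores[0])
    return [max(range(n_k), key=lambda k: scores[k][t]) for t in range(n_steps)]


def conv_kmeans(spikes, n_motifs, width, n_iter=20, eta=0.5, seed=0):
    """Hypothetical convolutional K-Means with an assumed homeostatic bias."""
    rng = random.Random(seed)
    n_ch, n_t = len(spikes), len(spikes[0])
    n_steps = n_t - width + 1

    def window(t):
        return [[float(spikes[c][t + d]) for d in range(width)] for c in range(n_ch)]

    # Initialise each motif with a randomly chosen window of the raster.
    kernels = [window(t) for t in rng.sample(range(n_steps), n_motifs)]
    bias = [0.0] * n_motifs
    labels = []
    for _ in range(n_iter):
        labels = pool_one_motif_per_step(conv_scores(spikes, kernels, bias))
        # K-Means update: each motif becomes the mean of the windows it won.
        for k in range(n_motifs):
            won = [t for t, lab in enumerate(labels) if lab == k]
            if won:
                kernels[k] = [[sum(spikes[c][t + d] for t in won) / len(won)
                               for d in range(width)] for c in range(n_ch)]
        # Homeostasis (assumed rule): raise the bias of under-used motifs and
        # lower it for over-used ones, pushing usage toward 1 / n_motifs.
        for k in range(n_motifs):
            freq = labels.count(k) / n_steps
            bias[k] += eta * (1.0 / n_motifs - freq)
    return kernels, labels, bias
```

For instance, calling `conv_kmeans(spikes, n_motifs=2, width=3)` on a small three-channel raster containing repeated upward and downward spike sweeps returns two 3×3 kernels (mean spike patterns), one motif label per time step, and the per-motif biases. Folding the homeostatic term into the same score that pooling maximises keeps detection a single argmax, so balancing motif usage requires no separate selection step.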