Pachet, F. and Laigre, D. A Naturalist Approach to Music File Name Analysis. In University of Indiana, editor, Proceedings of 2nd International Symposium on Music Information Retrieval (ISMIR 2001), pages 51-58, Indiana, USA, 2001

Sony CSL authors: Damien Laigre, François Pachet


Music title identification is a key ingredient of content-based electronic music distribution. Because of the lack of standards in music identification, or the lack of enforcement of existing standards, there is a huge number of unidentified music files in the world. We propose here an identification mechanism that exploits the information possibly contained in the file name itself. We study large corpora of files whose names are chosen by humans without particular constraints other than readability, and draw various hypotheses concerning the natural syntaxes that emerge from these corpora. A central hypothesis is local syntactic consistency, which claims that file name syntaxes, whatever they are, are locally consistent within clusters of related music files. These heuristics make it possible to parse file names successfully without knowing their syntax a priori, by using statistical measures on clusters of files rather than parsing each file strictly on an individual basis. Based on these validated hypotheses, we propose a heuristics-based parsing system and illustrate it in the context of an Electronic Music Distribution project.
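As a rough illustration of the local syntactic consistency hypothesis, the following minimal sketch treats the files of one directory as a cluster and decides the syntax once for the whole cluster by simple counts, instead of guessing it file by file. Everything here is a hypothetical simplification, not the parsing system described in the paper: the candidate separator list, the assumption that every name has exactly two fields (artist and title), and the rule that the artist is whichever field repeats most within the cluster.

```python
import os
from collections import Counter
from typing import Optional

# Hypothetical candidate separators, longest first so that " - " wins ties over "-".
CANDIDATE_SEPARATORS = [" - ", "_-_", " _ ", "-", "_"]


def strip_extension(name: str) -> str:
    """Drop a short alphanumeric extension such as ".mp3"; leave other dots alone."""
    root, ext = os.path.splitext(name)
    if 1 < len(ext) <= 5 and ext[1:].isalnum():
        return root
    return name


def infer_separator(names: list[str]) -> Optional[str]:
    """Cluster-level statistic: pick the separator that splits the most names into two fields."""
    best, best_count = None, 0
    for sep in CANDIDATE_SEPARATORS:
        count = sum(1 for n in names if len(n.split(sep)) == 2)
        if count > best_count:  # strict ">" keeps the earlier (longer) separator on ties
            best, best_count = sep, count
    return best


def parse_cluster(file_names: list[str]) -> list[dict[str, str]]:
    """Parse all file names of one cluster with a single shared syntax (hypothetical heuristic)."""
    names = [strip_extension(n) for n in file_names]
    sep = infer_separator(names)
    if sep is None:
        return []
    # Keep only names that fit the inferred two-field syntax.
    pairs = [
        (parts[0].strip(), parts[1].strip())
        for parts in (n.split(sep) for n in names)
        if len(parts) == 2
    ]
    # Within one cluster the artist tends to repeat while titles vary, so the
    # position with fewer distinct values is taken as the artist (first position on ties).
    distinct_first = len(Counter(a for a, _ in pairs))
    distinct_second = len(Counter(b for _, b in pairs))
    artist_first = distinct_first <= distinct_second
    return [
        {"artist": a, "title": b} if artist_first else {"artist": b, "title": a}
        for a, b in pairs
    ]
```

For a directory containing "Beatles - Help.mp3" and "Beatles - Yesterday.mp3", the sketch infers " - " as the separator and, because "Beatles" recurs in the first position, reads that field as the artist for every file in the cluster. The point of the design is that a single file such as "Help - Beatles.mp3" is ambiguous on its own, while a cluster of similarly named files is not.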

Keywords: Cuidado, file name, natural language processing, music identification



BibTeX entry

@INPROCEEDINGS{pachet:01b,
  ADDRESS = "Indiana, USA",
  AUTHOR = "Pachet, F. and Laigre, D.",
  BOOKTITLE = "Proceedings of 2nd International Symposium on Music Information Retrieval (ISMIR 2001)",
  EDITOR = "University of Indiana",
  PAGES = "51-58",
  TITLE = "A Naturalist Approach to Music File Name Analysis",
  YEAR = "2001",
}