Sample Orchestrator: Content-Based Management of Sound Samples


Hugues Vinet, Gérard Assayag, Juan José Burred, Grégoire Carpentier, Nicolas Misdariis, Geoffroy Peeters, Axel Roebel, Norbert Schnell, Diemo Schwarz, Damien Tardieu

STMS IRCAM-CNRS-UPMC, 1, place Igor Stravinsky, F-75004 Paris

30 June 2011



We present the main advances of the Sample Orchestrator R&D project, which aimed to develop innovative functions for manipulating sound samples. These functions rely on studies of sound description, i.e. the formalization of data structures suited to characterizing the content and organization of sounds. This work was applied to automatic sound indexing and to the development of new applications for musical creation: interactive corpus-based synthesis and computer-aided orchestration. The project also included a substantial effort on high-quality sound processing, through several enhancements of the phase vocoder model: processing by sinusoidal models in the spectral domain, and automatic computation of the analysis parameters.
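To make the phase vocoder idea mentioned in the abstract concrete, the following minimal sketch (not the project's actual implementation) shows the standard analysis-and-phase-propagation stage used for time stretching: STFT frames are resampled in time while the instantaneous phase of each bin is accumulated so that sinusoids remain coherent. All function names and parameter choices here are illustrative.

```python
import numpy as np

def stft(x, win, hop):
    # Frame the signal and take the real FFT of each windowed frame.
    n = len(win)
    frames = [x[i:i + n] * win for i in range(0, len(x) - n, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def phase_vocoder(spec, rate, hop, n_fft):
    # Resample magnitude frames along time (rate < 1 stretches, > 1 compresses)
    # while accumulating instantaneous phase per bin, the core of the
    # classic phase vocoder time-scale modification.
    omega = 2 * np.pi * np.arange(spec.shape[1]) * hop / n_fft  # expected phase advance
    phase = np.angle(spec[0])
    out = [spec[0]]
    steps = np.arange(0, spec.shape[0] - 1, rate)
    for t in steps[1:]:
        i = int(t)
        # Deviation of the measured phase increment from the expected one.
        dphi = np.angle(spec[i + 1]) - np.angle(spec[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # wrap to [-pi, pi]
        phase = phase + omega + dphi
        out.append(np.abs(spec[i]) * np.exp(1j * phase))
    return np.array(out)
```

Resynthesis (inverse FFT and overlap-add) is omitted; the sketch only covers the spectral-domain modification stage that the project's enhancements build upon.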




Keywords: sound, sound samples, sound synthesis and processing, sound indexing, auditory cognition, machine learning, signal models, phase vocoder, short-term Fourier transform, concatenative synthesis, corpus-based synthesis, orchestration.



1. Introduction
2. Description and Indexing of Sound Samples
3. Signal Models for Sound Processing: Extensions of the Phase Vocoder
4. Applications for Musical Creation
5. Conclusion
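As a toy illustration of the corpus-based synthesis approach mentioned in the abstract, the sketch below reduces unit selection to its core: a nearest-neighbour search in a descriptor space. The unit names, descriptor choices, and values are hypothetical, not the project's actual schema.

```python
import numpy as np

# Toy corpus: each sound unit is mapped to a descriptor vector,
# here [spectral centroid in Hz, loudness in dB] (illustrative only).
corpus = {
    "flute_a4": np.array([1200.0, -20.0]),
    "cello_c3": np.array([400.0, -15.0]),
    "cymbal":   np.array([5000.0, -10.0]),
}

def select_unit(target, corpus):
    # Return the unit whose descriptors are closest (Euclidean distance)
    # to the target description -- the selection step of corpus-based
    # concatenative synthesis, stripped of concatenation-cost terms.
    names = list(corpus)
    dists = [np.linalg.norm(corpus[n] - target) for n in names]
    return names[int(np.argmin(dists))]

print(select_unit(np.array([450.0, -14.0]), corpus))  # → cello_c3
```

A full system would add concatenation costs between successive units and weight each descriptor; this sketch keeps only the target-matching step.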
