
Choice and adaptation of statistical models for single channel singing voice separation.

Choix et adaptation de modèles statistiques pour la séparation de voix chantée à partir d’un seul microphone

Alexey Ozerov, Pierrick Philippe, Rémi Gribonval, Frédéric Bimbot

Orange Labs, 4 rue du Clos Courtel, BP 91226, 35512 Cesson Sévigné cedex, France

IRISA (CNRS & INRIA) - projet METISS, Campus de Beaulieu, 35042 Rennes Cedex, France

Pages: 211-224
Received: 12 January 2006
Accepted: N/A
Open Access

Abstract: 

The problem of singing voice extraction from mono audio recordings, i.e., one-microphone separation of voice and music, is studied. The approach is based on a priori probabilistic models of the two sources, more precisely on Gaussian Mixture Models (GMMs). A method for adapting the models to the characteristics of the mixed sources is developed, and a comparative study of different models and estimators is performed. We show that adapting the music model on the non-vocal parts of the songs yields good results under realistic conditions.

Résumé

Le problème de l’extraction de la voix chantée dans des enregistrements musicaux monophoniques, c’est-à-dire la séparation voix / musique avec un seul capteur, est étudié. Les approches utilisées sont basées sur des modèles statistiques a priori des deux sources (musique et voix), notamment sur des Modèles de Mélange de Gaussiennes (MMG). Une méthode d’adaptation des modèles aux caractéristiques des sources mélangées est proposée, et une étude comparative des différents modèles et estimateurs est effectuée. Les résultats montrent que l’adaptation du modèle de musique sur les parties non-vocales des chansons permet d’obtenir de bonnes performances dans un cadre réaliste.

Keywords: 

Single channel source separation, singing voice, statistical models, Gaussian mixture models, adaptive Wiener filtering, model adaptation.

Mots clés

Séparation de sources avec un seul capteur, voix chantée, modèles statistiques, modèles de mélange de gaussiennes, filtrage de Wiener adaptatif, adaptation de modèles.
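
To make the recipe summarized in the abstract and keywords above more concrete (a priori GMM models for voice and music, separation by adaptive Wiener filtering), here is a minimal Python sketch. It is not the authors' implementation: it fits simple log-spectral GMMs with scikit-learn, replaces the paper's MMSE estimators with a crude max-approximation over joint states, and omits the model adaptation step; all signal names (train_voice, train_music, mixture) and parameters (FS, NFFT, n_states) are placeholders introduced here for illustration.

import numpy as np
from scipy.signal import stft, istft
from sklearn.mixture import GaussianMixture

FS = 16000      # assumed sampling rate for all signals
NFFT = 1024     # STFT window length

def power_spec(x):
    """Complex STFT and the corresponding power spectrogram (frames x bins)."""
    _, _, Z = stft(x, fs=FS, nperseg=NFFT)
    return Z, (np.abs(Z) ** 2).T

def fit_psd_templates(x, n_states=8):
    """Fit a diagonal GMM on log-power spectra; each state mean acts as a PSD template."""
    _, P = power_spec(x)
    gmm = GaussianMixture(n_components=n_states, covariance_type="diag",
                          max_iter=200, random_state=0).fit(np.log(P + 1e-10))
    return np.exp(gmm.means_)                      # shape: (n_states, n_bins)

def separate(mixture, psd_voice, psd_music):
    """Frame-wise joint-state decoding (max approximation) followed by Wiener masking."""
    Z, P_mix = power_spec(mixture)
    mask = np.zeros_like(P_mix)
    for t, p in enumerate(P_mix):
        # Pick the pair of voice/music templates whose sum best matches this frame.
        err = [[np.sum((np.log(p + 1e-10) - np.log(sv + sm)) ** 2)
                for sm in psd_music] for sv in psd_voice]
        k, l = np.unravel_index(np.argmin(err), (len(psd_voice), len(psd_music)))
        mask[t] = psd_voice[k] / (psd_voice[k] + psd_music[l])    # Wiener gain in [0, 1]
    _, voice = istft(mask.T * Z, fs=FS, nperseg=NFFT)
    _, music = istft((1.0 - mask.T) * Z, fs=FS, nperseg=NFFT)
    return voice, music

# Hypothetical usage: train_voice / train_music are isolated excerpts,
# mixture is the mono song to separate (all mono numpy arrays sampled at FS Hz).
# psd_v = fit_psd_templates(train_voice)
# psd_m = fit_psd_templates(train_music)
# voice_est, music_est = separate(mixture, psd_v, psd_m)

In the paper itself, the music model is additionally adapted on the non-vocal segments of the song being processed; that adaptation step is precisely what this sketch leaves out.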

1. Introduction
2. General Overview of the Separation Methods
3. GMM-Based Separation Methods
4. Performance Measures
5. Model Adaptation and Choice of the Separation Method
6. Experimental Framework
7. Experiments and Results
8. Conclusions and Perspectives
References
