Automatic Extraction of Prosodic Features for Automatic Language Identification. Extraction Automatique de Paramètres Prosodiques pour L’Identification Automatique des Langues

Jérôme Farinas, Jean-Luc Rouas, François Pellegrino, Régine André-Obrecht

Université Paul Sabatier; Équipe SAMOVA, IRIT UMR 5505; F-31062 Toulouse Cedex 9

DDL UMR 5596 - ISH; 14, avenue Berthelot; F-69363 Lyon Cedex 7

Pages: 81-97 | Published: 30 April 2005 | Open access

Abstract: 

The aim of this study is to propose a new approach to Automatic Language Identification: it is based on rhythmic modelling and fundamental frequency modelling and does not require any hand-labelled data. First, we investigate how prosodic or rhythmic information can be taken into account for Automatic Language Identification. A new automatically extracted unit, the pseudo-syllable, is introduced. Rhythmic and intonational features are then automatically extracted from this unit. Elementary decision modules are defined with Gaussian mixture models. These prosodic models are combined with a more classical approach, the acoustic modelling of vowel systems. Experiments are conducted on the five European languages of the MULTEXT corpus: English, French, German, Italian and Spanish. The relevance of the rhythmic parameters and the efficiency of each system (rhythmic model, fundamental frequency model and vowel system model) are evaluated. The influence of these approaches on the performance of the automatic language identification system is addressed. Using all the information sources, we obtain 91% correct identification on 21-second utterances.
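As an illustration of the pseudo-syllable, the following Python sketch groups a sequence of consonantal and vocalic segments into units and derives three parameters per unit. It is not the authors' implementation. The abstract does not define the unit, so the sketch assumes a C^nV pattern (a run of consonantal segments closed by one vocalic segment) and reads Dc, Dv and Nc, the names used in the title of Appendix 2, as total consonantal duration, vocalic duration and number of consonantal segments. The `PseudoSyllable` class, the `group_pseudo_syllables` function and the "C"/"V" segment labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PseudoSyllable:
    dc: float  # assumed meaning: total duration of the consonantal segments (s)
    dv: float  # assumed meaning: duration of the vocalic segment (s)
    nc: int    # assumed meaning: number of consonantal segments


def group_pseudo_syllables(segments: List[Tuple[str, float]]) -> List[PseudoSyllable]:
    """Group (label, duration) segments into assumed C^nV units.

    Each label is "C" (consonantal) or "V" (vocalic). Every vocalic segment
    closes one unit; consonants left over after the last vowel are dropped.
    """
    units: List[PseudoSyllable] = []
    dc, nc = 0.0, 0
    for label, duration in segments:
        if label == "C":
            # Accumulate the consonantal run that precedes the next vowel.
            dc += duration
            nc += 1
        elif label == "V":
            # A vowel closes the current unit and resets the accumulators.
            units.append(PseudoSyllable(dc=dc, dv=duration, nc=nc))
            dc, nc = 0.0, 0
        else:
            raise ValueError(f"unknown segment label: {label!r}")
    return units
```

Under these assumptions, each utterance yields one (Dc, Dv, Nc) vector per unit, and a vowel that directly follows another vowel produces a unit with Nc = 0.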

Résumé (translated from French)

The aim of this study is to propose a new approach to automatic language identification, based on rhythm modelling and requiring no manually labelled data. The first question is how information about prosody and rhythm can be brought to bear on automatic language identification. To answer it, we introduce a new unit, the pseudo-syllable, which is extracted automatically. Rhythmic and intonational parameters are then computed from this unit. Elementary models for each type of parameter are defined using Gaussian mixtures. These prosodic models are coupled with a more classical approach based on the acoustic modelling of vowel systems. Experiments are carried out on the five European languages of the MULTEXT corpus. The relevance of the rhythmic parameters and the efficiency of each system (rhythmic model, fundamental frequency model and vowel model) are evaluated. The impact of these approaches on identification performance is analysed. We obtain 91% correct identification with 21-second files.
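The decision stage can be pictured with a small Python sketch, given under stated assumptions rather than as the system described in the paper. Each language is assumed to have one Gaussian mixture per information source; an utterance is scored by summing the log-likelihoods of its feature vectors, and the scores of the modules are added before the best language is chosen. Diagonal covariances, the additive fusion rule and the function names `gmm_log_likelihood`, `utterance_scores` and `identify` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, means, variances), diagonal covariance assumed.
Component = Tuple[float, Sequence[float], Sequence[float]]


def gmm_log_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """Log-density of one feature vector under a diagonal-covariance GMM."""
    log_terms = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p -= 0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        log_terms.append(log_p)
    # Log-sum-exp over the components, for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def utterance_scores(
    vectors: Sequence[Sequence[float]], models: Dict[str, List[Component]]
) -> Dict[str, float]:
    """Per-language log-likelihood of an utterance (vectors treated as independent)."""
    return {
        lang: sum(gmm_log_likelihood(x, gmm) for x in vectors)
        for lang, gmm in models.items()
    }


def identify(module_scores: List[Dict[str, float]]) -> str:
    """Add the per-language scores of all modules (assumed fusion) and pick the best."""
    languages = list(module_scores[0])
    total = {lang: sum(scores[lang] for scores in module_scores) for lang in languages}
    return max(total, key=total.get)
```

In this sketch a rhythm module, a fundamental frequency module and a vowel system module would each call `utterance_scores` on its own feature vectors, and `identify` would receive the three resulting dictionaries.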

Keywords: 

Automatic language identification, prosody, rhythm, fundamental frequency, Gaussian mixture models.

Mots clés (translated from French)

Automatic language identification, prosody, rhythm, fundamental frequency, Gaussian mixture models.

Acknowledgements
1. Introduction
2. Motivations
3. Pseudo-Syllable and ALI
4. General System Architecture
5. Experiments
6. Discussion and Conclusion
7. Appendix 1 - Examples of Read Passages
8. Appendix 2 - Histograms of the Distributions of Dc, Dv and Nc Values for the Five Languages of the Training Corpus
References
