Débruitage de parole par un filtrage utilisant l'image du locuteur. Une étude de faisabilité

Speech Enhancement with Filters Estimated from the Speaker's Image. A Feasibility Study

Laurent Girin | Gang Feng | Jean-Luc Schwartz

Institut de la Communication Parlée, UPRESA 5009, Institut National Polytechnique de Grenoble, Université Stendhal, Domaine Universitaire, 1180 Av. Centrale, B.P. 25, 38040 Grenoble Cedex 9

Corresponding Author Email:

girin@icp.grenet.fr

Received:

6 July 1996

Accepted:

N/A

Citation

ts13_4_319-334.pdf

OPEN ACCESS

Abstract:

Since speech is both auditory and visual, visual cues can compensate to some extent for degraded auditory cues, which can improve man-machine communication and telecommunication systems. This paper presents a noise reduction method based on speech enhancement with adaptive filters estimated from the speaker's lip pattern. We first present the two selected filtering techniques, and then the tool used to predict the filter from the lip shape. The complete noise reduction system is tested on stationary vowels, including a first look at the problem of non-visible gestures. The results of perceptual tests are promising and validate the basic principles of the system, which should be tested with more complex stimuli in the future.

Résumé (English translation)

Since speech is both acoustic and visual, it is worthwhile to exploit this bimodality to improve the performance of telecommunication and human-machine communication systems. In this article we propose an original noise reduction method that uses filters estimated from the shape of the speaker's lips. After describing two filtering techniques, we present a simple and effective method for linking lip shape to these filters. The complete noise reduction system is tested on stationary vowels, which allows a first approach to the problem of non-visible gestures. The results of perceptual tests are very encouraging and validate the basic principles of the system, subject to future extensions to more complex stimuli.
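As a rough illustration of the kind of pipeline the abstract describes, the sketch below maps lip-shape parameters to a speech spectral envelope and turns that envelope into a noise-reduction gain. Everything specific here is an assumption and not the authors' method: the abstract does not name the two filtering techniques or the form of the lip-to-filter predictor, so a linear map and a Wiener-type gain are used purely as stand-ins, and the function names, the lip parameters, the weights, and the noise estimate are all hypothetical.

```python
import math


def predict_log_envelope(lip_params, weights, bias):
    """Hypothetical linear associator: log-power per band = W * lip + b."""
    return [
        sum(w * x for w, x in zip(row, lip_params)) + b
        for row, b in zip(weights, bias)
    ]


def wiener_gain(speech_power, noise_power):
    """Per-band Wiener-type gain H = S / (S + N), which lies in [0, 1]."""
    return [
        s / (s + n) if (s + n) > 0 else 0.0
        for s, n in zip(speech_power, noise_power)
    ]


def enhance(noisy_magnitudes, lip_params, weights, bias, noise_power):
    """Attenuate each band of a noisy magnitude spectrum.

    The speech power in each band is predicted from the lip parameters
    alone (assumed model); the noise power is assumed to be known.
    """
    log_env = predict_log_envelope(lip_params, weights, bias)
    speech_power = [math.exp(v) for v in log_env]
    gains = wiener_gain(speech_power, noise_power)
    return [g * m for g, m in zip(gains, noisy_magnitudes)]


if __name__ == "__main__":
    # Two hypothetical lip parameters (e.g. width and height) and two bands.
    lips = [0.3, 0.1]
    W = [[0.0, 0.0], [0.0, 0.0]]   # placeholder weights, not trained values
    b = [0.0, 0.0]                 # placeholder biases
    noise = [1.0, 1.0]             # assumed noise power per band
    print(enhance([2.0, 4.0], lips, W, b, noise))
```

In a real system the weights would be learned from paired lip measurements and clean speech spectra, and the gain would be applied frame by frame to the short-time spectrum of the noisy signal.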

Keywords:

Speech enhancement, Audiovisual speech, Filtering, Image processing, Spectral estimation

Mots clés (English translation)

Speech denoising, Audiovisual speech, Filtering, Image processing, Spectrum estimation

1. Introduction

2. Filter Design

3. The Lip-Spectrum Associator

4. System Evaluation

5. Discussion

6. Conclusion
