Skew Angle Estimation of Scanned Handwritten Arabic Documents Using a Time-Frequency Analysis of the Projection Histograms
Ancient Arabic textual archives contain a large volume of handwritten documents that need to be scanned and indexed. Some of these documents are skewed, which makes their recognition and indexing difficult, because recognition systems extract words more reliably from straight lines. We are looking for a method that can robustly estimate this orientation, whatever the size of the document. The scientific literature already proposes several solutions for estimating the skew angle of document images. Projection techniques seem the most appropriate, but they need to be adapted to Arabic documents. In Arabic script, words are made of PAWs (Parts of Arabic Words), which are almost vertical or oblique and can distort the computation of the local orientation. This rules out local techniques such as nearest neighbors, because of the irregular alignment, as well as global techniques such as the Hough Transform, because voting points are difficult to locate. Although these techniques work well on printed documents, they remain inadequate for handwritten documents, in which the interline distance is irregular and the skew angle can be large. Kavallieratou et al. [30, 31] applied Cohen's class distributions to Latin-script documents. Cohen's class contains all the quadratic time-frequency distributions that are covariant under time and frequency shifts. Each member of this class is identified by a particular kernel $\varphi_{dD}(\tau,\xi)$, which determines its theoretical properties and its practical readability.
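For reference, a member of Cohen's class can be written through the ambiguity function $A_x$ of the signal and the kernel $\varphi_{dD}(\tau,\xi)$, reading $\tau$ as the delay and $\xi$ as the Doppler (frequency-shift) variable. Sign and normalization conventions vary across texts; the ones below are one common choice and are our assumption, not necessarily the paper's:

$$A_x(\tau,\xi) = \int_{-\infty}^{+\infty} x\left(s+\frac{\tau}{2}\right) x^*\left(s-\frac{\tau}{2}\right) e^{-j2\pi\xi s}\, ds$$

$$C_x(t,\nu) = \iint \varphi_{dD}(\tau,\xi)\, A_x(\tau,\xi)\, e^{j2\pi(\xi t - \nu\tau)}\, d\tau\, d\xi$$

With the constant kernel $\varphi_{dD}(\tau,\xi) = 1$, this reduces to the Wigner-Ville distribution:

$$W_x(t,\nu) = \int_{-\infty}^{+\infty} x\left(t+\frac{\tau}{2}\right) x^*\left(t-\frac{\tau}{2}\right) e^{-j2\pi\nu\tau}\, d\tau$$

Because the kernel does not depend on $(t,\nu)$, every member satisfies the shift covariance: if $y(t) = x(t-t_0)\,e^{j2\pi\nu_0 t}$, then $C_y(t,\nu) = C_x(t-t_0,\nu-\nu_0)$.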
In Kavallieratou's paper, the relationship between the properties of the distributions and the experimental results is not discussed. In this article, we look for the properties most relevant to the skew angle estimation problem and use them to identify the best distribution for the task.
To estimate the orientation angle, we propose to compute a time-frequency representation of the analytic signal $x_a(t)$ of the centered square root of the projection histogram $x(t)$ of the document. The projection angle whose histogram yields the highest maximum of the time-frequency representation is taken as the estimate of the document orientation.

To evaluate the approach, we tested it on 864 handwritten Arabic documents. These documents have different sizes and contain several writing types, layouts (one or two columns), mixes of text and tables, etc. For the experiments, the documents were first manually rotated to angles ranging from −75° to +90°. We found that the Wigner-Ville distribution achieves the highest estimation rate (100%). The other distributions yield lower rates for one of three reasons: they do not satisfy properties that are important for skew angle estimation, such as scale invariance and support conservation; their localization of the signal components is not precise enough for the maximum of the representation to give an accurate skew angle; or their parameters are not fitted to the analysed histogram profiles. The skew angle estimator based on the Wigner-Ville distribution is also compared with the projection analysis and Fourier Transform methods.
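The estimation procedure can be sketched in Python with NumPy, assuming a binarized page image (`True` for ink) and a grid of candidate angles. This is a hedged illustration, not the authors' implementation: the projection geometry, the one-pixel bin width, the zero-padding of every histogram to the image diagonal, the reading of "centered" as mean removal, and all function names (`projection_histogram`, `analytic_signal`, `wigner_ville`, `estimate_skew`) are our assumptions. The discrete Wigner-Ville distribution is computed without smoothing, similarly to the `tfrwv` routine of the Time-Frequency Toolbox [3].

```python
import numpy as np


def projection_histogram(image, angle_deg):
    """Projection histogram of the ink pixels for one candidate skew angle.

    `image` is a 2-D boolean array (True = ink), assumed already binarized
    and containing at least one ink pixel. A text line rising to the right by
    `angle_deg` satisfies row = row0 - col * tan(angle), so all of its pixels
    share r = row * cos(angle) + col * sin(angle) = row0 * cos(angle).
    Bins are one pixel wide; the histogram is zero-padded to the image
    diagonal so that every angle yields a signal of the same length.
    """
    rows, cols = np.nonzero(image)
    a = np.deg2rad(angle_deg)
    r = rows * np.cos(a) + cols * np.sin(a)
    n_bins = int(np.ceil(np.hypot(*image.shape))) + 1
    idx = np.floor(r - r.min()).astype(int)
    return np.bincount(idx, minlength=n_bins).astype(float)


def analytic_signal(x):
    """Analytic signal of a real sequence: negative frequencies removed via the FFT."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def wigner_ville(xa):
    """Discrete (unsmoothed) Wigner-Ville distribution, shape (frequency, time).

    For each time t the instantaneous autocorrelation xa[t+m] * conj(xa[t-m])
    is built for every admissible lag m and Fourier-transformed over m.
    """
    n = len(xa)
    kernel = np.zeros((n, n), dtype=complex)
    for t in range(n):
        max_lag = min(t, n - 1 - t, n // 2 - 1)
        lags = np.arange(-max_lag, max_lag + 1)
        kernel[lags % n, t] = xa[t + lags] * np.conj(xa[t - lags])
    # each lag column is Hermitian (value at -m is the conjugate of +m),
    # so its FFT is real up to rounding errors
    return np.fft.fft(kernel, axis=0).real


def estimate_skew(image, angles_deg):
    """Pick the angle whose WVD has the largest maximum; also return all maxima."""
    maxima = []
    for angle in angles_deg:
        x = np.sqrt(projection_histogram(image, angle))
        x -= x.mean()                     # "centered square root" (assumed: mean removal)
        maxima.append(wigner_ville(analytic_signal(x)).max())
    return angles_deg[int(np.argmax(maxima))], np.array(maxima)


if __name__ == "__main__":
    # Hypothetical synthetic page: ten 2-pixel-thick strokes rising by 5 degrees.
    page = np.zeros((160, 600), dtype=bool)
    cols = np.arange(600)
    for row0 in range(60, 160, 10):
        rows = np.round(row0 - cols * np.tan(np.deg2rad(5.0))).astype(int)
        page[rows, cols] = True
        page[rows + 1, cols] = True
    angle, _ = estimate_skew(page, list(range(-10, 11)))
    print("estimated skew angle (degrees):", angle)
```

Each call to `wigner_ville` builds an N×N lag matrix, so the cost grows quadratically with the histogram length N. For full-resolution pages, downsampling the histogram or restricting the lag range with a window (which gives a pseudo Wigner-Ville distribution) would be natural options, though both depart from the plain distribution evaluated here.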
Summary
In this article, we present a new method for determining the skew of a handwritten Arabic document using an energy time-frequency representation of Cohen's class. The method first computes the projection histograms obtained for different angles, then determines the maximum value of the time-frequency representation of the square root of each histogram. The orientation of the document is then estimated as the projection angle that yields the highest maximum value. The proposed method was tested on 864 skewed documents with 9 different time-frequency representations. The results are presented and analysed at the end of the article.
Keywords
Handwritten documents, energy distributions, Cohen’s class, projection histograms, skew angle estimation.
[1] T. AKIYAMA and N. HAGITA, Automatic entry system for printed documents. Pattern Recognition, 23: 1141-1154, 1990.
[2] F. AUGER and C. DONCARLI, Quelques commentaires sur des représentations temps-fréquence proposées récemment. Traitement du Signal, 9(1): 3-25, 1992.
[3] F. AUGER, O. LEMOINE, P. GONCALVES, and P. FLANDRIN, Time-frequency toolbox for MATLAB, user’s guide and reference guide. http://tftb.nongnu.org, 1996.
[4] A. BELAID and Y. BELAID, Reconnaissance des formes : méthodes et applications. InterEditions, 1992.
[5] T. BLU and J. LEBRUN, Analyse temps-fréquence linéaire II : représentations de type ondelettes. In F. Hlawatsch and F. Auger, editors, Temps-fréquence, concept et outils, pages 101-138. Hermès, Traité IC2, 2005.
[6] B. BOASHASH, Estimating and interpreting the instantaneous frequency of a signal – Part I: Fundamentals. Proc. IEEE, 80(4): 519-538, 1992.
[7] B. BOASHASH, B. LOVELL, and L. WHITE, Time frequency analysis and pattern recognition using singular value decomposition of the Wigner-Ville distribution. Adv. Algorithms Architect. Signal Process., Proc. SPIE 828, pages 104-114, 1987.
[8] P. BOLES and B. BOASHASH, The cross Wigner-Ville distribution – a two dimensional analysis method for the processing of vibroseis seismic signals. Proc. IEEE ICASSP 87, pages 904-907, 1988.
[9] M. BORN and P. JORDAN, Zur Quantenmechanik. Z. Phys., 34: 858-888, 1925.
[10] A. CHAUDHURI and S. CHAUDHURI, Robust detection of skew in document images. IEEE Trans. on Image Process., 6(2): 344-349, 1997.
[11] S. CHEN and R. M. HARALICK, An automatic algorithm for text skew estimation in document images using recursive morphological transforms. Proc. Int. Conf. on Image Processing, Austin, USA, 1: 139-143, 1994.
[12] H. CHOI and W. J. WILLIAMS, Improved time-frequency representation of multicomponent signals using exponential kernels. IEEE Trans. Acoust., Speech, Signal Process., 37(6): 862-871, 1989.
[13] T. CLAASEN and W. MECKLENBRÄUKER, The Wigner distribution – a tool for time-frequency analysis. Parts I-III, Philips J. Res., 35, Part I: No 3, pp. 217-250; Part II: No 4/5, pp. 276-300; Part III: No 6, pp. 372-389, 1980.
[14] L. COHEN, Generalized phase-space distribution functions. J. Math. Phys., 7(5): 781-786, May 1966.
[15] L. COHEN, Time-frequency distributions – a review. Proc. IEEE, 77(7): 941-981, 1989.
[16] G. CRISTOBAL, J. BESCOS, and J. SANTAMARIA, Application of Wigner distribution for image representation and analysis. Proc. IEEE Eighth Int. Conf. Pattern Recogn., 23: 998-1000, 1986.
[17] A. K. DAS and B. CHANDA, A fast algorithm for skew detection of document images using morphology. Int. Journal on Document Analysis and Recognition, 4(2): 109-114, January 2001.
[18] B. ESCUDIÉ and J. GRÉA, Sur une formulation générale de la représentation en temps et en fréquence dans l’analyse des signaux d’énergie finie. C. R. Acad. Sci. Paris, 283: 1049-1051, 1976.
[19] P. FLANDRIN, Temps-fréquence. Hermes, Paris, 1993.
[20] P. FLANDRIN, Time-Frequency/Time-Scale Analysis. Academic Press, San Diego, CA, 1999.
[21] P. FLANDRIN, N. MARTIN, F. AUGER, C. DEMARS, T. DOLIGEZ, LAMBERT-NEBOUT, J. MARS, F. MOLINARO, J. OVARLEZ, and O. RIOUL, Méthodes temps-fréquence : Fiches synthétiques. Special issue of Traitement du Signal, 9(1): 77-113, 1992.
[22] P. FLANDRIN and W. MARTIN, A general class of estimators for the Wigner-Ville spectrum of nonstationary processes. In A. Bensoussan and J.-L. Lions, editors, Systems Analysis and Optimization of Systems, Lecture Notes in Control and Information Sciences, volume 62, pages 15-23. Springer, Berlin, 1984.
[23] R. GRIBONVAL, Analyse temps-fréquence linéaire I : représentations de type Fourier. In F. Hlawatsch and F. Auger, editors, Temps-fréquence, concept et outils, pages 69-100. Hermès, Traité IC2, 2005.
[24] R. HIPPENSTIEL and P. M. D. OLIVEIRA, Time varying spectral estimation using the instantaneous power spectrum (IPS). IEEE Trans. Acoust., Speech, Signal Process., 38(10): 1752-1759, 1990.
[25] F. HLAWATSCH and F. AUGER, Temps-fréquence : concepts et outils. Hermes, Lavoisier, Paris, 2005.
[26] F. HLAWATSCH and G. F. BOUDREAUX-BARTELS, Linear and quadratic time-frequency signal representations. IEEE Signal Process. Mag., 9(2): 21-67, 1992.
[27] S. INGLIS, Lossless document image compression. PhD thesis, University of Waikato, New Zealand, March 1999.
[28] H. F. JIANG, C. C. HAN, and K. C. FAN, A fast approach to the detection and correction of skew documents. Pattern Recognition Letters, 18: 675-686, 1997.
[29] J. F. KAISER and R. W. SCHAFER, On the use of the I0-sinh window for spectrum analysis. IEEE Trans. on Acoustics, Speech and Signal Process., ASSP-28(1), 1980.
[30] E. KAVALLIERATOU, N. FAKOTAKIS, and G. KOKKINAKIS, Skew angle estimation in document processing using Cohen’s class distributions. Pattern Recogn. Lett., 20(11-13), 1999.
[31] E. KAVALLIERATOU, N. FAKOTAKIS, and G. KOKKINAKIS, Skew angle estimation for printed and handwritten documents using the Wigner-Ville distribution. Image and Vision Computing, 20: 813-824, 2002.
[32] O. KENNY and B. BOASHASH, An optical signal processing for time-frequency signal analysis using the Wigner-Ville distribution. J. Elec. Electron Eng, pages 152-158, 1988.
[33] H. MARGENAU and R. W. HILL, Correlation between measurements in quantum theory. Prog. Theor. Phys., 26: 722-738, 1961.
[34] N. OUWAYED and A. BELAID, Multioriented text line extraction from handwritten Arabic documents. The Eighth IAPR Workshop on Document Analysis Systems (DAS 2008), pages 339-346, 2008.
[35] N. OUWAYED and A. BELAID, Une approche générale pour l’extraction des lignes des documents arabes anciens multiorientés. In 12ème Colloque International sur le Document Électronique (CIDE.12), Canada, October 2009.
[36] N. OUWAYED, A. BELAID, and F. AUGER, Cohen's class distributions for skew angle estimation in noisy ancient Arabic documents. In 10th Int. Conf. on Document Analysis and Recognition (ICDAR’2009) – Third Workshop on Analytics for Noisy Unstructured Text Data (AND’2009), Spain, July 2009.
[37] J.-P. OVARLEZ, P. GONCALVÈS, and R. BARANIUK, Analyse temps-fréquence quadratique III : la classe affine et autres classes covariantes. In F. Hlawatsch and F. Auger, editors, Temps-fréquence, concept et outils, pages 201-236. Hermès, Traité IC2, 2005.
[38] U. PAL and B. B. CHAUDHURI, An improved document skew angle estimation technique. Pattern Recognition Letters, 17: 899-904, 1996.
[39] U. PAL and B. B. CHAUDHURI, Skew angle detection of digitized Indian script documents. IEEE Trans. Pattern Anal. Mach. Intell., 19(2): 182-186, 1997.
[40] T. PAVLIDIS and J. ZHOU, Page segmentation and classification. Computer Vision Graphics and Image Processing, 54(2): 484-496, 1992.
[41] G. PEAKE and T. TAN, A general algorithm for document skew angle estimation. IEEE Int. Conf. Image Process., 2: 230-233, 1997.
[42] W. POSTL, Detection of linear oblique structures and skew scan in digitized documents. Proceedings of the Eighth International Conference on Pattern Recognition, IEEE CS Press, Los Alamitos, CA, pages 687-689, 1986.
[43] A. W. RIHACZEK, Principles of High-Resolution Radar. Artech House, Norwood, MA, 1992.
[44] J. SAUVOLA, D. DOERMANN, and M. PIETIKAINEN, Locally adaptive document skew detection. Proc. SPIE Document Recognition and Retrieval IV, 3027: 96-108, 1997.
[45] S. N. SRIHARI and V. GOVINDARAJU, Analysis of textual images using the Hough transform. Machine Vision and Applications, 2: 141-153, 1989.
[46] L. STANKOVIC, A method for time-frequency analysis. IEEE Trans. Signal Process., 42(1): 225-229, 1994.
[47] E. P. WIGNER, On the quantum correction for thermodynamic equilibrium. Phys. Rev., 40: 749-759, 1932.
[48] P. Y. YIN, Skew detection and block classification of printed documents. Image Vis. Comput., 19(8): 567-579, 2001.
[49] Y. ZHAO, L. E. ATLAS, and R. J. MARKS, The use of cone-shaped kernels for generalized time-frequency representations of nonstationary signals. IEEE Trans. Acoust., Speech, Signal Process., 38(7): 1084-1091, 1990.