Optimizing time-frequency representations for signal classification using radially Gaussian kernels
Time-frequency and time-scale distributions offer a broad class of tools for nonstationary signal analysis. The price to pay, however, is the choice of a well-adapted, possibly signal-dependent, representation. Time-frequency distributions with a radially Gaussian kernel (RGK) provide tunable representations for a large class of applications. This parameterization was initially introduced by Baraniuk and Jones in [1] to reduce interference terms in time-frequency distributions. The RGK has attracted the attention of many researchers since this pioneering work; see for instance [3], where it is optimized for a given classification problem. These works, however, do not take advantage of recent developments in pattern recognition with kernel machines. This paper establishes a connection between the optimization of the RGK for classifying nonstationary signals and criteria from the machine learning literature. More specifically, we consider the kernel-target alignment, a criterion for selecting optimal reproducing kernels that requires no training cycle. We show that this criterion leads to an optimization problem similar to the one initially proposed by Baraniuk and Jones for signal analysis. Experimental results demonstrate the relevance of our approach.
Over the last decade, a wide class of pattern recognition algorithms based on the theory of reproducing kernel Hilbert spaces has been proposed, offering improved performance at low computational complexity. The latter is mainly due to the kernel trick, a key idea that turns any conventional linear technique into a nonlinear one, provided it can be expressed solely in terms of inner products. It suffices to replace each inner product by a reproducing kernel, since the latter corresponds to an inner product between data mapped (implicitly) into a nonlinearly transformed space. The most popular kernel machines include Support Vector Machines (SVM) for classification and regression [7, 8], kernel principal component analysis [9] and kernel Fisher discriminant analysis [10, 11].
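As a minimal illustration of the kernel trick (our example, not taken from the paper), consider the nearest-centroid rule: the squared distance to a class mean in feature space expands into inner products only, so any reproducing kernel can be substituted for them.

```python
import numpy as np

def gaussian_kernel(x1, x2, bw=1.0):
    # A classical reproducing kernel: an inner product in an implicit feature space.
    return np.exp(-np.sum((np.asarray(x1) - np.asarray(x2)) ** 2) / (2 * bw**2))

def centroid_distance2(x, X_class, kernel=gaussian_kernel):
    """Squared feature-space distance ||phi(x) - m||^2 to the class mean m,
    expanded so that only inner products, hence only kernel values, appear:
    k(x,x) - (2/n) sum_i k(x,x_i) + (1/n^2) sum_{i,j} k(x_i,x_j)."""
    n = len(X_class)
    return (kernel(x, x)
            - 2.0 / n * sum(kernel(x, xi) for xi in X_class)
            + 1.0 / n**2 * sum(kernel(xi, xj) for xi in X_class for xj in X_class))
```

Assigning a test point to the class whose centroid is closest, with the distance computed as above, kernelizes a purely linear classifier without ever forming the feature map explicitly.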
Recently, in [13, 14], we presented a new framework for nonstationary signal classification and analysis based on machine learning considerations. For this purpose, we considered Cohen's class of time-frequency distributions as a natural choice of nonlinear transformation to apply to nonstationary signals. It can be shown that the reproducing kernel associated with these spaces of representations can be written as (17). Here $\Pi$ is the parameterization function of the time-frequency distribution, expressed in polar coordinates, and $A_x$ the ambiguity function. The cumbersome optimization of the two-dimensional function $\Pi$ can be avoided by using the RGK, which is of the form $\Pi_\sigma(r,\theta) = e^{-r^2/2\sigma^2(\theta)}$. Substituting this expression into the general form of the reproducing kernel defined in (17) yields (18). Equivalently, with the discretization recommended in [16], which involves polar coordinates, we obtain (19), where $\Delta r = 2\sqrt{\pi}/l$ and $l$ is the signal length. With such a reproducing kernel, we can apply the most effective and innovative kernel machines to time-frequency representations. We can also take advantage of the wide range of kernel selection criteria in the machine learning literature to optimize time-frequency representations. In what follows, we study one such criterion, the kernel-target alignment.
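Since equations (17)-(19) are not reproduced here, the following numpy sketch implements one natural reading of this reproducing kernel: the $\Pi^2$-weighted inner product between the ambiguity functions of the two signals. The discrete ambiguity convention (cyclic shifts), the Cartesian-grid evaluation of the polar weighting, and the omitted $\Delta r$ scaling are our simplifications, not the exact discretization of (19).

```python
import numpy as np

def ambiguity(x):
    """Discrete narrowband ambiguity function on an N x N (Doppler, lag) grid:
    A_x[eta, tau] = sum_t x[t + tau] conj(x[t]) exp(-2j pi eta t / N),
    with cyclic time shifts as a simplifying convention."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    A = np.empty((N, N), dtype=complex)
    for k in range(N):
        tau = k - N // 2                      # lags centered around zero
        A[:, k] = np.fft.fft(np.roll(x, -tau) * np.conj(x))
    return np.fft.fftshift(A, axes=0)         # center the Doppler axis as well

def rgk_kernel(x1, x2, sigma):
    """Reproducing kernel induced by an RGK-parameterized distribution:
    kappa(x1, x2) = sum over the ambiguity plane of |Pi_sigma|^2 A_{x1} conj(A_{x2}),
    with |Pi_sigma(r, theta)|^2 = exp(-r^2 / sigma(theta)^2).
    sigma is a callable giving the bandwidth as a function of the angle."""
    N = len(x1)
    A1, A2 = ambiguity(x1), ambiguity(x2)
    eta, tau = np.meshgrid(np.arange(N) - N // 2, np.arange(N) - N // 2,
                           indexing="ij")
    r2, theta = eta**2 + tau**2, np.arctan2(tau, eta)
    w = np.exp(-r2 / sigma(theta) ** 2)       # RGK weighting on the grid
    return float(np.real(np.sum(w * A1 * np.conj(A2))))

# e.g. kappa = rgk_kernel(x1, x2, sigma=lambda th: 5.0 + 0.0 * th)
```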
Introduced by Cristianini et al. in [30], the alignment is a measure of similarity between two reproducing kernels, or between a reproducing kernel and a target function. Given a learning set, the alignment between two kernels $\kappa_1$ and $\kappa_2$ is defined by (20), where $\langle\cdot,\cdot\rangle_F$ denotes the Frobenius inner product, and $K_1$ and $K_2$ are the Gram matrices with $(i,j)$-th entries $\kappa_1(x_i,x_j)$ and $\kappa_2(x_i,x_j)$. In [20, 30], Cristianini et al. suggest determining the optimal reproducing kernel for a given classification problem as the one that maximizes the alignment with an ideal kernel. For a two-class problem, the ideal kernel is the one that maps each $x_i$ to its label $y_i$; the $(i,j)$-th entry of the resulting ideal Gram matrix is then $y_i y_j$. The kernel-target alignment criterion applied to a $\sigma$-tunable kernel is therefore given by (22), namely
$$\sigma^\ast = \arg\max_\sigma \frac{\langle K_\sigma, K^\ast\rangle_F}{n\,\|K_\sigma\|_F},$$
where $n$ is the size of the learning set, $K_\sigma$ denotes the Gram matrix of the tunable kernel and $K^\ast$ the ideal target matrix.
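The alignment with the ideal target is straightforward to compute. The sketch below uses the identity $\langle K, yy^\top\rangle_F = y^\top K y$ and the fact that $\|yy^\top\|_F = n$ for labels in $\{-1,+1\}$; the function name is ours.

```python
import numpy as np

def alignment(K, y):
    """Kernel-target alignment of (22): <K, y y^T>_F / (n ||K||_F),
    computed via <K, y y^T>_F = y^T K y; for labels in {-1, +1}, ||y y^T||_F = n."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(y @ K @ y) / (len(y) * np.linalg.norm(K, "fro"))

# Gram matrix of the tunable kernel over a learning set X = [x_1, ..., x_n]:
# K = [[rgk_kernel(xi, xj, sigma) for xj in X] for xi in X]
```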
By considering the reproducing kernel associated with the RGK time-frequency distribution, we can simplify the resulting optimization problem further. To this end, we rewrite it as the maximization of the numerator of the objective function in (22) subject to its denominator being held constant. This optimization problem is given by (23) subject to (24), where $\upsilon_0$ is a normalization parameter. Substituting the RGK-based reproducing kernel, we can write the objective function as (25). We obtain an objective function similar to the one proposed in [1] in a signal analysis context. The signal-dependent term in the latter, namely $|A_x(r,\theta)|^2$, is replaced here by the so-called equivalent representation, defined by
$$|A_{\mathrm{eq}}(r,\theta)|^2 = \sum_{i,j} y_i\, y_j\, A_{x_i}(r,\theta)\, A_{x_j}^*(r,\theta).$$
We can therefore take advantage of the alternating-projection technique proposed in [1]. In particular, we relax the constraint (24) and replace it with a cheaper constraint on the volume of the parameterization function, $\int \sigma^2(\theta)\, d\theta = \upsilon_0'$. Compared with [1], it is worth noting that the computational complexity of the technique remains unchanged once the equivalent representation has been evaluated.
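Because the double sum factors as $|\sum_i y_i A_{x_i}(r,\theta)|^2$, the equivalent representation can be computed in a single pass over the training set, which is precisely why the cost of the optimization matches [1] once it is available. A sketch, reusing the ambiguity helper above:

```python
import numpy as np

def equivalent_representation(signals, labels):
    """|A_eq(r, theta)|^2 = sum_{i,j} y_i y_j A_{x_i} conj(A_{x_j})
                          = |sum_i y_i A_{x_i}|^2.
    The factored form needs one ambiguity function per signal instead of
    one product per pair of signals."""
    S = sum(y * ambiguity(x) for x, y in zip(signals, labels))
    return np.abs(S) ** 2
```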
To solve the optimization problem, we use an alternating-projection technique inspired by [1], with two update rules at each iteration. Let $\sigma_k$ be the bandwidth parameter at iteration $k$. At iteration $k+1$, we first perform a gradient ascent step using expression (28), where $\mu_k$ is the step size controlling the convergence of the algorithm and $f$ the objective function to be maximized in (26). The gradient of $f$ is the vector whose entries are given by expression (29) for $\theta = 0, 1, \ldots, l-1$. To account for the constraint of the optimization problem, we then project the solution onto the set of admissible functions; this can be implemented by normalizing $\sigma_{k+1}(\theta)$ at each iteration, dividing it by $\|\sigma_{k+1}\|/\upsilon_0'$.
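Since expressions (26), (28) and (29) are not reproduced here, the following sketch works with an assumed discrete objective $f(\sigma) = \sum_{r,\theta} e^{-r^2/\sigma^2(\theta)}\,|A_{\mathrm{eq}}(r,\theta)|^2$, whose gradient with respect to $\sigma(\theta)$ is $\sum_r |A_{\mathrm{eq}}|^2\, e^{-r^2/\sigma^2}\, 2r^2/\sigma^3$; the grid scaling, the constant step size and the projection convention (enforcing $\sum_\theta \sigma^2(\theta) = \upsilon_0'$) are placeholders.

```python
import numpy as np

def optimize_rgk(A_eq_sq, v0, n_iter=100, mu=1e-3):
    """Alternating gradient-ascent / projection updates for the RGK bandwidths.
    A_eq_sq: |A_eq|^2 sampled on an (r, theta) grid (shape n_r x n_theta).
    This is a sketch of the scheme described in the text, under the assumed
    discrete forms of (25), (28) and (29) stated in the lead-in."""
    n_r, n_theta = A_eq_sq.shape
    r2 = (np.arange(n_r)[:, None]) ** 2            # squared radial index (Delta_r omitted)
    sigma = np.ones(n_theta)                       # initial bandwidth profile
    for _ in range(n_iter):
        w = np.exp(-r2 / sigma**2)                 # |Pi_sigma|^2 on the grid
        grad = np.sum(A_eq_sq * w * 2 * r2, axis=0) / sigma**3  # assumed form of (29)
        sigma = sigma + mu * grad                  # gradient ascent step, cf. (28)
        sigma *= np.sqrt(v0) / np.linalg.norm(sigma)  # project onto the volume constraint
    return sigma
```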
To illustrate the relevance of our approach, we study two classification problems. Each consists of two classes of two hundred 64-sample signals with a linear frequency modulation embedded in white Gaussian noise; the signal-to-noise ratio is approximately −8 dB. In the first problem, both classes have an increasing frequency modulation, sweeping from 0.1 to 0.25 in normalized frequency for the first class and from 0.25 to 0.4 for the second. In the second problem, the first class has an increasing frequency modulation from 0.1 to 0.4, and the second a decreasing one from 0.4 to 0.1. Figure 1 shows the resulting RGK parameterization functions. We observe that they retain the regions of the Doppler-lag plane that are relevant for the classification problems considered. The convergence of the algorithm is illustrated in Figure 2, which shows the value of the alignment at each iteration, averaged over 20 trials. To assess generalization, we estimate the error of an SVM classifier associated with the reproducing kernel resulting from our approach, and compare it with the one associated with the Wigner distribution. Table 1 shows that not only is the classification error reduced compared with the Wigner distribution, but the number of support vectors is also halved.
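For reproducibility, a possible generator for such a dataset is sketched below; the chirp phase convention and the way the noise power is set from the −8 dB SNR are our assumptions, not the paper's exact setup.

```python
import numpy as np

def make_chirps(n_per_class, f_pairs, n=64, snr_db=-8.0, seed=None):
    """Two classes of n-sample noisy linear chirps. f_pairs gives the
    (start, end) normalized frequencies of each class."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    t = np.arange(n)
    for label, (f0, f1) in zip((+1, -1), f_pairs):
        phase = 2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * n))  # linear sweep f0 -> f1
        s = np.cos(phase)
        noise_std = np.sqrt(np.mean(s**2) / 10 ** (snr_db / 10))   # set noise for target SNR
        for _ in range(n_per_class):
            X.append(s + noise_std * rng.standard_normal(n))
            y.append(label)
    return np.array(X), np.array(y)

# First problem: two increasing chirps, 0.1 -> 0.25 vs 0.25 -> 0.4
X, y = make_chirps(200, [(0.1, 0.25), (0.25, 0.4)])
```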
Keywords
Time-frequency analysis, classification, radially Gaussian kernel, kernel-target alignment, machine learning
[1] R. BARANIUK and D. JONES, «Signal-dependent time-frequency analysis using a radially Gaussian kernel,» Signal Processing, vol. 32, no. 3, pp. 263-284, 1993.
[2] D. JONES and R. BARANIUK, «An adaptive optimal-kernel time-frequency representation,» IEEE Trans. Signal Processing, vol. 43, no. 10, pp. 2361-2371, 1995.
[3] M. DAVY and C. DONCARLI, «Optimal kernels of time-frequency representations for signal classification,» in Proc. of the IEEE International Symposium on Time-Frequency and Time-Scale analysis, (Pittsburgh, USA), pp. 581-584, Oct. 1998.
[4] C. DONCARLI and N. MARTIN, Décision dans le plan temps-fré-quence. Paris: Hermès Sciences, Traité IC2, 2004.
[5] M. DAVY, A. GRETTON, A. DOUCET, and P. RAYNER, «Optimised support vector machines for nonstationary signal classification,» IEEE Signal Processing Letters, vol. 9, no. 12, pp. 442-445, 2002.
[6] A. RAKOTOMAMONJY, X. MARY, and S. CANU, «Non-parametric regression with wavelet kernels,» Applied Stochastic Models in Business and Industry, vol. 21, no. 2, pp. 153-163, 2005.
[7] B. BOSER, I. GUYON, and V. VAPNIK, «A training algorithm for optimal margin classifiers,» in Proc. 5th Annual Workshop on Computational Learning Theory, pp. 144-152, 1992.
[8] V. VAPNIK, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[9] B. SCHÖLKOPF, A. SMOLA, and K. MÜLLER, «Nonlinear component analysis as a kernel eigenvalue problem,» Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[10] S. MIKA, G. RÄTSCH, J. WESTON, B. SCHÖLKOPF, and K. MÜLLER, «Fisher discriminant analysis with kernels,» in Advances in neural networks for signal processing (Y. H. Hu, J. Larsen, E. Wilson, and S. Douglas, eds.), (San Mateo, CA, USA), pp. 41-48, Morgan Kaufmann, 1999.
[11] G. BAUDAT and F. ANOUAR, «Generalized discriminant analysis using a kernel approach,» Neural Computation, vol. 12, no. 10, pp. 2385-2404, 2000.
[12] F. R. BACH and M. I. JORDAN, «Kernel independent component analysis,» Journal of Machine Learning Research, vol. 3, pp. 1-48, 2002.
[13] P. HONEINE, C. RICHARD, and P. FLANDRIN, «Reconnaissance des formes par méthodes à noyau dans le domaine temps-fréquence,» in Actes du XXème Colloque GRETSI sur le Traitement du Signal et des Images, (Louvain-la-Neuve, Belgium), 2005.
[14] P. HONEINE, C. RICHARD, and P. FLANDRIN, «Time-frequency learning machines,» IEEE Trans. Signal Processing, vol. 55, pp. 3930-3936, July 2007.
[15] P. HONEINE, C. RICHARD, P. FLANDRIN, and J.-B. POTHIN, «Optimal selection of time-frequency representations for signal classification: A kernel-target alignment approach,» in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (Toulouse, France), May 2006.
[16] R. BARANIUK, «Shear madness: Signal-dependent and metaplectic time-frequency representation,» Tech. Rep. UILU-ENG-92-2226, Coordinated Science Laboratory, University of Illinois, Urbana, 1992.
[17] B. SCHÖLKOPF, R. HERBRICH, and R. WILLIAMSON, «A generalized representer theorem,» Tech. Rep. NC2-TR-2000-81, NeuroCOLT, Royal Holloway College, University of London, UK, 2000.
[18] N. ARONSZAJN, «Theory of reproducing kernels,» Trans. Amer. Math. Soc., vol. 68, pp. 337-404, 1950.
[19] R. HERBRICH, Learning kernel classifiers. Theory and algorithms. Cambridge, MA, USA: The MIT Press, 2002.
[20] N. CRISTIANINI, J. KANDOLA, A. ELISSEEFF, and J. SHAWE-TAYLOR, «On kernel target alignment,» in Innovations in Machine Learning: Theory and Application (D. Holmes and L. Jain, eds.), pp. 205-255, Springer Verlag, 2006.
[21] S. AMARI and S. WU, «Improving support vector machine classifiers by modifying kernel functions,» Neural Networks, vol. 12, pp. 783-789, July 1999.
[22] D. MEYER, F. LEISCH, and K. HORNIK, «The support vector machine under test,» Neurocomputing, vol. 55, pp. 169-186, September 2003.
[23] C. BURGES, «A tutorial on support vector machines for pattern recognition,» Data Min. Knowl. Discov., vol. 2, no. 2, pp. 121-167, 1998.
[24] K.-M. CHUNG, W.-C. KAO, C.-L. SUN, L.-L. WANG, and C.-J. LIN, «Radius margin bounds for support vector machines with the RBF kernel,» Neural Comput., vol. 15, no. 11, pp. 2643-2681, 2003.
[25] O. CHAPELLE, V. VAPNIK, O. BOUSQUET, and S. MUKHERJEE, «Choosing multiple parameters for support vector machines,» Machine Learning, vol. 46, no. 1-3, pp. 131-159, 2002.
[26] K. DUAN, S. KEERTHI, and A. POO, «Evaluation of simple performance measures for tuning SVM hyperparameters,» Neurocomputing, vol. 51, pp. 41-59, 2003.
[27] J. KANDOLA, J. SHAWE-TAYLOR, and N. CRISTIANINI, «Optimizing kernel alignment over combinations of kernels,» Tech. Rep. 121, Department of Computer Science, University of London, 2002.
[28] J.-B. POTHIN and C. RICHARD, «Kernel machines : une nouvelle méthode pour l’optimisation de l’alignement des noyaux et l’amélioration des performances,» in Actes du XXème Colloque GRETSI sur le Traitement du Signal et des Images, (Louvain-la-Neuve, Belgium), 2005.
[29] J.-B. POTHIN and C. RICHARD, «A greedy algorithm for optimizing the kernel alignment and the performance of kernel machines,» in Proc. EUSIPCO, (Florence, Italy), 2006.
[30] N. CRISTIANINI, J. SHAWE-TAYLOR, A. ELISSEEFF, and J. KANDOLA, «On kernel-target alignment,» in Proc. Neural Information Processing Systems (NIPS) 14 (T. G. Dietterich, S. Becker, and Z. Ghahramani, eds.), pp. 367-373, MIT Press, December 2001.
[31] J. SHAWE-TAYLOR and N. CRISTIANINI, Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[32] J. KANDOLA, J. SHAWE-TAYLOR, and N. CRISTIANINI, «On the extensions of kernel alignment,» Tech. Rep. 120, Department of Computer Science, University of London, 2002.
[33] G. WU, E. Y. CHANG, and N. PANDA, «Formulating distance functions via the kernel trick,» in Proc. 11th ACM International Conference on Knowledge Discovery in Data Mining, pp. 703-709, 2005.
[34] G. LANCKRIET, N. CRISTIANINI, P. BARTLETT, L. EL GHAOUI, and M. JORDAN, «Learning the kernel matrix with semi-definite programming,» in Proc. 19th International Conference on Machine Learning, pp. 323-330, 2002.