A Neighborhood Structure-Preserving Bi-objective Optimization Method Based on Class Center and Discriminant Analysis and Its Application in Facial Recognition

Wenying Ma

Zibo Vocational Institute, Zibo 255314, China

Corresponding Author Email: 10799@zbvc.cn

Page: 219-225 | DOI: https://doi.org/10.18280/ria.330308

Received: 20 March 2019 | Accepted: 25 June 2019 | Published: 10 October 2019

OPEN ACCESS

Abstract: 

Based on class center and discriminant analysis, this paper puts forward a novel bi-objective optimization method that preserves the neighborhood structure, and applies it to facial recognition. First, the locally preserving projection (LPP) was improved into the class-center locally preserving projection (CLPP) by replacing the sample-based neighborhood structure with a class center-based neighborhood structure. Next, a bi-objective optimization model was developed based on the CLPP and linear discriminant analysis (LDA), and solved using multi-objective optimization theory. The bi-objective model combines the merits of the single-objective CLPP and the single-objective LDA: the class center-based neighborhood structure is preserved and class information is introduced naturally. This avoids the defect of the locally discriminant projection (LDP), whose adjacency coefficients are changed manually, and gives the model a clear physical meaning. Finally, experiments were conducted on the AR, CAS-60 and FERET face databases. The experimental results show that the proposed methods are effective, and that the bi-objective optimization method based on CLPP and LDA (CLPP+LDA) achieved the best recognition performance.

Keywords: 

locally preserving projection (LPP), class-center LPP (CLPP), bi-objective optimization, face recognition

1. Introduction

In the information age, accurately identifying a person and protecting personal information have become pressing social concerns. However, traditional means of identification are easily forged or lost and fail to meet social needs. Currently, biometric techniques are among the most convenient and secure solutions; examples include fingerprint, palmprint, face, iris, retina, speech and signature recognition [1]. Among them, face recognition has attracted much attention from researchers in pattern recognition and artificial intelligence (AI). It has broad application prospects in fields such as public security (e.g. criminal identification), security verification systems, credit card verification, medicine, file management, video conferencing and human-computer interaction (HCI).

Much research has been done on face recognition in China and abroad. By the way faces are represented, popular face recognition methods fall into three categories: those based on geometric features, statistical features and connectionist mechanisms [2]. The most widely studied methods are based on statistical features [3], such as principal component analysis (PCA) [4] and linear discriminant analysis (LDA) [5, 6]. To improve the recognition rate, the LDA maximizes the between-class scatter and minimizes the within-class scatter of the projected samples by optimizing the Fisher criterion [7].

The LDA has been widely applied in face recognition because it directly targets classification. However, as a linear dimensionality reduction technique, the LDA may destroy the nonlinear manifold structure of the spatial distribution of face samples [8], despite its excellence in dimensionality reduction. This structural damage degrades recognition performance. Many nonlinear dimensionality reduction methods have been developed to preserve the manifold structure, including locally linear embedding (LLE) [8], Isomap [9] and Laplacian eigenmaps [10]. Nonetheless, these nonlinear methods cannot directly compute the low-dimensional projections of new samples.

Locally preserving projection (LPP) [11] is a linear approximation of Laplacian eigenmaps. It maintains the local structure of the sample space by constructing a sample-based neighborhood structure, so that samples that are neighbors in the original space remain adjacent after projection. However, the LPP is an unsupervised method that ignores class labels, so it cannot accurately recognize face images with variations in illumination, pose and expression.

Some scholars have explored supervised variants of the LPP. For example, Zhao et al. [12] put forward locally discriminant projection (LDP), which encodes the class of each sample by changing the adjacency coefficients between neighbors in different classes. Compared with the LPP, the LDP assigns high similarity to neighbors in the same class and low similarity to neighbors in different classes. Although it improves recognition, the LDP cannot guarantee optimal results, because its adjacency coefficients are changed mechanically and manually. More recently, Sugiyama et al. [13] and Kim and Kittler [14] combined ideas from Fisher discriminant analysis (FDA) and locality preservation; a representative method is local Fisher discriminant analysis (LFDA), which maximizes between-class separability while preserving the local structure within each class.

Each of the above algorithms for high-dimensional data suits a specific type of data distribution. For example, the LDA assumes that the samples of each class follow a multivariate normal distribution with a common covariance matrix and class-specific means, and thus represents each class by a single cluster. In many problems (e.g. face recognition and radar automatic target recognition), however, each class of data follows a multimodal distribution [15, 16]. In such cases, the LDA and similar methods cannot provide satisfactory recognition results.

Based on the above analysis, this paper improves the LPP into the class-center locally preserving projection (CLPP) by replacing the sample-based neighborhood structure with a class center-based neighborhood structure. Next, a bi-objective optimization model is developed based on the CLPP and the LDA, and solved using multi-objective optimization theory. The bi-objective model combines the merits of the single-objective CLPP and the single-objective LDA: the class center-based neighborhood structure is preserved, and weak and strong supervision information is integrated naturally. This avoids the defect of the LDP, whose adjacency coefficients are changed mechanically and manually, and gives the model a clear physical meaning. Finally, experiments were conducted on the AR, CAS-60 and FERET face databases. The results show that the proposed methods are effective: the CLPP recognizes faces better than the LPP, and the bi-objective CLPP+LDA outperforms both the single-objective CLPP and the single-objective LDA.

The remainder of this paper is organized as follows: Section 2 describes the proposed method; Section 3 presents the experiments and analyzes the results; Section 4 draws conclusions and outlines future research.

2. Methodology

2.1 The CLPP

The LPP is an effective dimensionality reduction method that preserves the geometry of the sample space. However, it has two defects: it ignores the class information of samples, and its sample-based neighborhood structure is difficult to preserve usefully. This difficulty arises from the nonuniform spatial distribution of face samples. As shown in Figure 1, a sample may be surrounded by multiple samples from different classes, so preserving such a neighborhood structure would weaken discrimination.

In a face sample library, the class center statistically represents the feature information of a class. Therefore, this section proposes a new neighborhood partitioning method, in which a center point is selected such that most of its neighboring samples belong to the same class as this point. Choosing the class center as the center point naturally satisfies this requirement. Therefore, a local neighborhood structure was designed around the center of each class. This approach is called the class-center LPP (CLPP).

The CLPP retains the objective function of the LPP but replaces the sample-based neighborhood structure with a class center-based one. First, the center point, i.e. the class center, is determined. Next, the k samples nearest to the class center are found, forming a neighborhood of k samples. Finally, the samples in the neighborhood are weighted, with the similarity matrix S redefined as:

$S_{i j}=\left\{\begin{array}{ll}{\exp \left(-\left\|x_{i}-x_{j}\right\|^{2} / t\right)} & {\text { If } x_{i} \text { and } x_{j} \text { fall within the class center - based neighborhood}} \\ {0} & {\text {Otherwise}}\end{array}\right.$  (1)
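A minimal NumPy sketch of formula (1) is given below; it illustrates the idea under stated assumptions and is not the authors' MATLAB implementation. It assumes that samples are stored as the columns of `X` (matching the $w^{T} X$ notation), that the k neighbors of each class center are searched among all training samples regardless of class, and that $S_{ij}$ is set whenever $x_{i}$ and $x_{j}$ lie in the same class-center neighborhood. The function name `clpp_similarity` and the default kernel width `t=1.0` are hypothetical.

```python
import numpy as np


def clpp_similarity(X, labels, k, t=1.0):
    """Similarity matrix S of formula (1) (illustrative sketch).

    X      : (dim, n) array, one training sample per column (assumption)
    labels : length-n sequence of class labels
    k      : number of samples in each class-center neighborhood
    t      : heat-kernel width (hypothetical default)
    """
    labels = np.asarray(labels)
    n = X.shape[1]
    S = np.zeros((n, n))
    for c in np.unique(labels):
        # Class center: mean of the samples labeled c.
        center = X[:, labels == c].mean(axis=1)
        # The k samples (of any class) closest to this class center.
        dist_to_center = np.linalg.norm(X - center[:, None], axis=0)
        hood = np.argsort(dist_to_center)[:k]
        # Heat-kernel similarity for every pair inside the neighborhood.
        for i in hood:
            for j in hood:
                S[i, j] = np.exp(-np.sum((X[:, i] - X[:, j]) ** 2) / t)
    return S
```

Samples that fall in no neighborhood receive an all-zero row in `S`, so if k is too small the constraint matrix $XDX^{T}$ of formula (2) can become ill-conditioned.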

Figure 2 shows the sample distribution in the class center-based neighborhood structure. A comparison of Figures 1 and 2 shows that most samples in the class center-based neighborhood belong to the same class, which facilitates classification.

Figure 1. Sample-based neighborhood structure

The CLPP algorithm is implemented in the following steps:

Step 1. Perform PCA dimensionality reduction on the sample set. The dimensionally-reduced samples can be expressed as $X^{\prime}=W_{1}^{T} X$, where $W_{1}$ is the PCA projection matrix.

Step 2. Calculate the center $m_{i}$ of each class, find the samples $x_{i j}^{\prime}$ in the neighborhood of $m_{i}$, determine the class of each $x_{i j}^{\prime}$, and compute the similarities by formula (1).

Step 3. Obtain the similarity matrix S and compute the projection vectors from the eigenvalue problem $(X^{\prime} D X^{\prime T})^{-1}(X^{\prime} L X^{\prime T}) w=\lambda w$, where D is a diagonal matrix whose diagonal elements are the column sums of S, and $L=D-S$ is the Laplacian matrix.

Step 4. Let $w_{1}, \ldots, w_{d}$ be the eigenvectors corresponding to the d smallest eigenvalues, and let $W_{2}=(w_{1}, \ldots, w_{d})$. The final projection matrix is then $W=W_{1} W_{2}$.
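Steps 1-4 can be sketched as follows, taking the similarity matrix `S` of formula (1) as given. This is a hypothetical NumPy illustration, not the authors' code: PCA is done with an SVD of the centered data, the degree matrix uses the column sums of `S` as in Step 3, and the eigenvalue problem is solved with `numpy.linalg.solve` followed by `numpy.linalg.eig`. The function name `clpp_projection` and its parameters `n_pca` and `d` are assumptions; `n_pca` must be small enough that $X^{\prime} D X^{\prime T}$ is invertible.

```python
import numpy as np


def clpp_projection(X, S, n_pca, d):
    """Steps 1-4 of the CLPP, given a similarity matrix S from formula (1).

    X     : (dim, n) array, one training sample per column (assumption)
    S     : (n, n) symmetric similarity matrix
    n_pca : number of principal components kept in Step 1
    d     : number of CLPP projection vectors kept in Step 4
    Returns W = W1 @ W2 with shape (dim, d).
    """
    # Step 1: PCA; W1 holds the leading left singular vectors of the centered data.
    Xc = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    W1 = U[:, :n_pca]
    Xr = W1.T @ X                       # X' = W1^T X
    # Step 3: degree matrix D (column sums of S) and Laplacian L = D - S.
    D = np.diag(S.sum(axis=0))
    L = D - S
    A = Xr @ L @ Xr.T                   # X' L X'^T
    B = Xr @ D @ Xr.T                   # X' D X'^T
    # Solve (X' D X'^T)^{-1} (X' L X'^T) w = lambda w.
    evals, evecs = np.linalg.eig(np.linalg.solve(B, A))
    # Step 4: eigenvectors of the d smallest eigenvalues form W2.
    order = np.argsort(evals.real)[:d]
    W2 = evecs[:, order].real
    return W1 @ W2
```

A test sample x is then mapped to $W^{T} x$. For larger problems, a symmetric generalized solver such as `scipy.linalg.eigh(A, B)` is usually preferable to `eig` on $B^{-1}A$, because it exploits the symmetry of both matrices.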

Figure 2. Class center-based neighborhood structure

2.2 Bi-objective optimization based on CLPP and LDA

The previous description shows that the CLPP preserves the neighborhood structure of samples and carries some class information. However, the CLPP is a weakly supervised method lacking strong supervision information. To overcome this defect, a bi-objective optimization model was proposed based on the CLPP and the LDA.

First, the objective of the CLPP was taken as the primary objective. The objective function of the CLPP has the same form as that of the LPP:

$\left\{\begin{array}{l}{\min _{w} w^{T} X L X^{T} w} \\ {s.t.  w^{T} X D X^{T} w=1}\end{array}\right.$   (2)

Next, the objective of the LDA containing strong supervision information was taken as the secondary objective. For better use of neighborhood and class information in bi-objective fusion, the LDA’s objective function was converted to the form of the LPP’s objective function. The conversion was carried out as follows:

The LDA’s objective function can be expressed as:

$\min \frac{w^{T} S_{w} w}{w^{T} S_{B} w}$  (3)

It can be derived that [12]:

$S_{w}=X L a X^{T}$

$L a=I-W$ 

$W_{i j}=\left\{\begin{array}{ll}{\frac{1}{n_{k}}} & {\text { if } x_{i} \text { and } x_{j} \text { belong to the } k \text {-th class }} \\ {0} & {\text { otherwise }}\end{array}\right.$  (4)

$S_{B}=X(C-L a) X^{T}$

$C=I-\frac{1}{n} e e^{T}$

$e=[1,1, \ldots, 1]_{n}^{T}$   (5)
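The identities $S_{w}=XLaX^{T}$ and $S_{B}=X(C-La)X^{T}$ can be checked numerically. The sketch below, which assumes samples are the columns of `X` and uses hypothetical function names, compares the graph-based matrices of formulas (4)-(5) with the classical definitions $\sum_{k}\sum_{x \in k}(x-m_{k})(x-m_{k})^{T}$ and $\sum_{k} n_{k}(m_{k}-m)(m_{k}-m)^{T}$. The matrix called W in formula (4) is named `Wcls` here to avoid confusion with the projection matrix.

```python
import numpy as np


def scatter_matrices_direct(X, labels):
    """Classical within- and between-class scatter (columns of X are samples)."""
    labels = np.asarray(labels)
    mean = X.mean(axis=1, keepdims=True)
    dim = X.shape[0]
    Sw = np.zeros((dim, dim))
    Sb = np.zeros((dim, dim))
    for c in np.unique(labels):
        Xk = X[:, labels == c]
        mk = Xk.mean(axis=1, keepdims=True)
        Sw += (Xk - mk) @ (Xk - mk).T
        Sb += Xk.shape[1] * (mk - mean) @ (mk - mean).T
    return Sw, Sb


def scatter_matrices_graph(X, labels):
    """Same matrices via formulas (4)-(5): S_w = X La X^T, S_B = X (C - La) X^T."""
    labels = np.asarray(labels)
    n = X.shape[1]
    same = labels[:, None] == labels[None, :]
    n_k = same.sum(axis=1)                 # size of each sample's class
    Wcls = same / n_k[:, None]             # 1/n_k within class k, 0 otherwise
    La = np.eye(n) - Wcls
    e = np.ones((n, 1))
    C = np.eye(n) - (e @ e.T) / n          # centering matrix
    return X @ La @ X.T, X @ (C - La) @ X.T
```

The check works because $XCX^{T}$ is the total scatter matrix, and the total scatter equals the sum of the within-class and between-class scatters.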

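Here, $n_{k}$ is the number of training samples in the k-th class, n is the total number of training samples, and I is the identity matrix. Because the ratio in formula (3) does not change when w is rescaled, its denominator can be fixed to 1, so the LDA problem can be rewritten as:

$\left\{\begin{array}{l}{\min _{w} w^{T} X L a X^{T} w} \\ {s.t.  w^{T} X(C-L a) X^{T} w=1}\end{array}\right.$   (6)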

Formula (6) is the converted objective function of the LDA. Thus, the bi-objective optimization model can be defined as:

$\min w^{T} X L X^{T} w$

$\min w^{T} X L a X^{T} w$

s.t. $w^{T} X D X^{T} w=1$

$w^{T} X(C-L a) X^{T} w=1$   (7)

Using the linear weighted sum method, the bi-objective optimization problem in formula (7) can be transformed into a single-objective problem. Let $r_{1}$ and $r_{2}$ be weight coefficients satisfying $\sum_{i=1}^{2} r_{i}=1$. The new objective function is then:

$\min w^{T} X\left(r_{1} L+r_{2} L a\right) X^{T} w$

$s.t.  w^{T} X D X^{T} w-1=0$

$w^{T} X(C-L a) X^{T} w-1=0$   (8)

Applying the method of Lagrange multipliers to formula (8), with multipliers $\lambda_{1}$ and $\lambda_{2}$ for the two constraints, and setting $\lambda_{1}=t \lambda_{2}$ and $\lambda=\lambda_{2}$, gives:

$\left[X\left(r_{1} L+r_{2} L a\right) X^{T}\right] w=\lambda X[(C-L a)+t D] X^{T} w$   (9)

The projection vectors of formula (8) are the eigenvectors corresponding to the d smallest eigenvalues of the generalized eigenvalue problem (9).
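A sketch of how formula (9) might be assembled and solved is shown below. It is an illustration under assumptions rather than the authors' implementation: `X` is assumed to be already PCA-reduced (samples as columns) so that the right-hand matrix is nonsingular, `S` is the CLPP similarity matrix of formula (1), and the multiplier ratio `t`, whose value the text does not specify, defaults to a hypothetical 1.0. The function names are also hypothetical.

```python
import numpy as np


def bi_objective_matrices(X, S, labels, r1, t=1.0):
    """Left- and right-hand matrices of formula (9) (illustrative sketch).

    X      : (dim, n) array, PCA-reduced samples as columns (assumption)
    S      : (n, n) CLPP similarity matrix from formula (1)
    labels : class labels used to build La and C (formulas (4)-(5))
    r1     : CLPP weight; the LDA weight is r2 = 1 - r1
    t      : multiplier ratio lambda_1 / lambda_2 (hypothetical default)
    """
    labels = np.asarray(labels)
    n = X.shape[1]
    D = np.diag(S.sum(axis=0))                          # column sums of S
    L = D - S                                           # CLPP Laplacian
    same = labels[:, None] == labels[None, :]
    La = np.eye(n) - same / same.sum(axis=1)[:, None]   # formula (4)
    C = np.eye(n) - np.ones((n, n)) / n                 # formula (5)
    r2 = 1.0 - r1                                       # r1 + r2 = 1
    A = X @ (r1 * L + r2 * La) @ X.T
    B = X @ ((C - La) + t * D) @ X.T
    return A, B


def smallest_generalized_eigvecs(A, B, d):
    """Eigenpairs of A w = lambda B w for the d smallest eigenvalues."""
    evals, evecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(evals.real)[:d]
    return evals.real[order], evecs[:, order].real
```

Setting `r1=1` recovers the CLPP objective alone; with `r1=0` the left-hand side reduces to the LDA term $XLaX^{T}$, although the right-hand side still contains the $tD$ term from the CLPP constraint.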

When solving the model, the weight coefficients $r_{1}$ and $r_{2}$ must take reasonable values. Following the general theory of optimization, the two weight coefficients were determined from a judgement matrix through the following steps:

Step 1. Set up a 2×2 judgement matrix $A=\left(\begin{array}{ll}{a_{11}} & {a_{12}} \\ {a_{21}} & {a_{22}}\end{array}\right)$ containing the judgement coefficients $a_{ij}$ between the objective of the CLPP and that of the LDA, where $a_{11}=1$ is the importance of the CLPP objective relative to itself, $a_{12}$ is the importance of the CLPP objective relative to the LDA objective, $a_{21}$ is the importance of the LDA objective relative to the CLPP objective, and $a_{22}=1$ is the importance of the LDA objective relative to itself. The off-diagonal coefficients are reciprocals, i.e. $a_{12}=1/a_{21}$. The matrix is built as follows:

(1) Compute the ratio of the number of neighbors in the same class to the number of neighbors in different classes:

$\text{ratio}=\frac{\text{number of neighbors in the same class}}{\text{number of neighbors in different classes}}$   (10)

(2) Estimate $a_{21}$ from the ratio obtained with the minimum number of training samples.

(3) From this estimate, calculate the coefficient $a_{21}^{(i)}$ for each other number i of training samples:

$a_{21}^{(i)}=a_{21} \times \frac{\text{ratio under the minimum number of training samples}}{\text{ratio under } i \text{ training samples}}$   (11)

This yields a judgment matrix for every number of training samples.

Step 2. Multiply the elements in A by row:

$\pi_{i}=a_{i 1} \times a_{i 2}$  (12)

Step 3. Find the square roots of $\pi_{i}$ :

$\alpha_{i}=\sqrt{\pi_{i}}$  (13)

Step 4. Obtain a set of weight coefficients by normalizing the square roots:

$r_{i}=\alpha_{i} / \sum_{j=1}^{2} \alpha_{j}$  (14)

Step 5. Repeat Steps 2-4 to determine the weight coefficients of the two objectives for each number of training samples.
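The whole weighting procedure, formulas (10)-(14), is small enough to sketch in plain Python. The input format of `neighbor_ratio` (one `(center_label, neighbor_labels)` pair per class-center neighborhood) and all function names are hypothetical; the arithmetic follows the formulas above. In the example, $a_{21}^{(i)}$ is rounded to two decimals as listed in Table 2, which is an assumption about how the tabulated weights were obtained.

```python
from math import inf, sqrt


def neighbor_ratio(neighborhoods):
    """Formula (10): same-class vs different-class neighbors of the class centers.

    neighborhoods: iterable of (center_label, neighbor_labels) pairs, one per
    class-center neighborhood (hypothetical input format).
    """
    same = different = 0
    for center_label, neighbor_labels in neighborhoods:
        for label in neighbor_labels:
            if label == center_label:
                same += 1
            else:
                different += 1
    # No neighbor from another class: the ratio is infinite (cf. Table 8).
    return inf if different == 0 else same / different


def scale_a21(a21_min, ratio_min, ratio_i):
    """Formula (11): rescale a21 from the smallest training set to size i."""
    return a21_min * ratio_min / ratio_i


def judgment_weights(a21):
    """Formulas (12)-(14): weights (r1, r2) of the CLPP and LDA objectives."""
    A = [[1.0, 1.0 / a21],   # a11 = 1, a12 = 1 / a21
         [a21, 1.0]]         # a21,     a22 = 1
    pi = [row[0] * row[1] for row in A]      # formula (12): row products
    alpha = [sqrt(p) for p in pi]            # formula (13): square roots
    total = sum(alpha)
    return tuple(a / total for a in alpha)   # formula (14): normalization


# Example: AR ratios from Table 1, with a21 = 1 for two training samples.
ar_ratios = {2: 6.4375, 3: 4.4923, 4: 4.2889, 5: 4.8333}
ar_weights = {}
for i, ratio in ar_ratios.items():
    # Round to two decimals, mirroring the values listed in Table 2.
    a21_i = round(scale_a21(1.0, ar_ratios[2], ratio), 2)
    ar_weights[i] = judgment_weights(a21_i)
```

Because $\alpha_{1}=\sqrt{1/a_{21}}$ and $\alpha_{2}=\sqrt{a_{21}}$, the weights simplify to $r_{1}=1/(1+a_{21})$ and $r_{2}=a_{21}/(1+a_{21})$; for example, $a_{21}=1.43$ gives $r_{1}\approx 0.4115$, the value in Table 4.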

3. Experiments and Results Analysis

This section reports experiments on three face databases, namely AR, CAS-60 and FERET. The proposed methods, i.e. the CLPP and the bi-objective optimization algorithm based on the CLPP and the LDA (CLPP+LDA), were compared with the LDA, the LPP and the LDP. All methods used the nearest neighbor classifier. The algorithms were implemented in MATLAB.
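The nearest neighbor strategy can be sketched as below. The Euclidean distance, the recognition rate defined as the percentage of correctly classified test samples, and the function name `nearest_neighbor_rate` are assumptions, since the text does not specify them.

```python
import numpy as np


def nearest_neighbor_rate(W, X_train, y_train, X_test, y_test):
    """Recognition rate (%) of a 1-NN classifier in the projected space.

    W : (dim, d) projection matrix; X_* are (dim, n_*) arrays, samples as columns.
    """
    Z_train = W.T @ X_train
    Z_test = W.T @ X_test
    # Squared Euclidean distance between every test and training projection.
    dists = ((Z_test[:, :, None] - Z_train[:, None, :]) ** 2).sum(axis=0)
    # Each test sample takes the label of its nearest training sample.
    predicted = np.asarray(y_train)[dists.argmin(axis=1)]
    return 100.0 * np.mean(predicted == np.asarray(y_test))
```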

3.1 Experiment on AR face database

The AR face database contains more than 4,000 color images of 126 people. All the images were converted into grayscale images. Each grayscale image is 768×576 in size with 256 gray levels. The authors downsampled the images to 60×60. One of the samples is shown in Figure 3.

There are 26 samples in each class. For the experiment, 2 to 10 samples were selected from each class to form the training set. The selected samples are the most representative of the variations in expression, illumination and occlusion. Take the training set of 10 samples as an example: in each class, the 1st, 2nd, 5th, 8th, 11th, 14th, 17th, 19th, 23rd and 25th samples were selected for training, and the remaining 16 samples were reserved for testing.
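The per-class split can be expressed with a small helper that converts the 1-based sample positions listed above into 0-based index lists. The helper name `split_by_position` is hypothetical.

```python
def split_by_position(n_per_class, train_positions):
    """Split one class's samples into training and testing index lists.

    train_positions use 1-based numbering, as in the text (1st, 2nd, 5th, ...).
    Returns 0-based (train, test) index lists.
    """
    train = sorted(p - 1 for p in train_positions)
    chosen = set(train)
    test = [i for i in range(n_per_class) if i not in chosen]
    return train, test
```

For example, `split_by_position(26, [1, 2, 5, 8, 11, 14, 17, 19, 23, 25])` returns 10 training and 16 testing indices for one AR class.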

Figure 3. One of the samples in the AR face database

(1) Determining the weight coefficients r1 and r2

The first step is to compute the ratio of the number of neighbors in the same class to the number of neighbors in different classes by formula (10). Table 1 lists the ratios for different numbers of training samples from the AR face database.

Table 1. The ratios under different numbers of training samples from the AR face database

| Number of training samples | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| ratio | 6.4375 | 4.4923 | 4.2889 | 4.8333 | 2.2903 | 1.9539 | 1.6817 | 1.4792 | 1.0660 |

Next, $a_{21}$ was estimated from the ratio under the minimum number of training samples. Given that this ratio is 6.4375, $a_{21}$ was set to 1. Then, the coefficient for each other number i of training samples, denoted $a_{21}^{(i)}$, was computed by formula (11) (Table 2).

On this basis, the weight coefficients for two training samples from the AR face database were computed by formulas (12)-(14) (Table 3).

Finally, the judgment matrices for the other numbers of training samples were constructed from the judgment coefficients in Table 2 and used to determine the corresponding weight coefficients (Table 4). Note that $r_{1}$ is the weight coefficient of the CLPP, and $r_{2}=1-r_{1}$ is the weight coefficient of the LDA.

Table 2. The judgment coefficients under more than two training samples from the AR face database

| Number of training samples i | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|
| Judgment coefficient $a_{21}^{(i)}$ | 1.43 | 1.50 | 1.33 | 2.81 | 3.29 | 3.83 | 4.35 | 6.03 |

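Table 3. The weight coefficients under two training samples from the AR face database

| Objective | Judgement matrix: CLPP | Judgement matrix: LDA | Formula (12): $\pi_{i}$ | Formula (13): $\alpha_{i}$ | Formula (14): $r_{i}$ |
|---|---|---|---|---|---|
| CLPP | $a_{11}=1$ | $a_{12}=1$ | 1 | 1 | 0.5000 |
| LDA | $a_{21}=1$ | $a_{22}=1$ | 1 | 1 | 0.5000 |
| $\sum \alpha_{i}$ | | | | 2 | |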

Table 4. The weight coefficients under different numbers of training samples from the AR face database

| Number of training samples | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| r1 | 0.5000 | 0.4115 | 0.4000 | 0.4292 | 0.2625 | 0.2331 | 0.2070 | 0.1869 | 0.1422 |
| r2 | 0.5000 | 0.5885 | 0.6000 | 0.5708 | 0.7375 | 0.7669 | 0.7930 | 0.8131 | 0.8573 |

Table 5. Comparison of recognition results on AR face database

| Method | **CLPP+LDA** | **CLPP** | LDP | LPP | LDA |
|---|---|---|---|---|---|
| Mean recognition rate (%) | 82.8654 | 80.3393 | 80.3423 | 78.6252 | 78.4266 |

(2) Comparison on AR face database

The recognition results of the tested methods on the AR face database are compared in Table 5, in which the proposed methods are shown in bold. The CLPP outperformed the LPP, and the bi-objective CLPP+LDA outperformed both single-objective methods (CLPP and LDA) and exceeded the LDP by about 2.5 percentage points in mean recognition rate.

3.2 Experiment on CAS-60 face database

The CAS-60 face database was built by the Chinese Academy of Sciences. It contains 1,060 images of 106 people, with 10 images per person. A distinctive feature of this database is that the ten images of each person were taken under very different illuminations. Each image is 120×90 in size with 256 gray levels. The authors downsampled the images to 60×45. One of the samples is displayed in Figure 4.

For the experiment, 2 to 6 samples were selected from each class to form the training set, and the other samples were allocated to the testing set. Here, the training set of 6 samples is taken as an example. In each class, the 1st, 2nd, 3rd, 5th, 6th and 9th samples were selected for training and the remaining 4 samples were reserved for testing.

Figure 4. One of the samples in the CAS-60 face database

(1) Determining the weight coefficients r1 and r2

The weight coefficients of the CLPP+LDA on the CAS-60 face database were computed by the judgement matrix method described in Section 3.1(1). The results are listed in Table 6.

(2) Comparison on CAS-60 face database

The recognition results of the tested methods on the CAS-60 face database are compared in Table 7, in which the proposed methods are shown in bold. The CLPP outperformed the LPP, and the bi-objective CLPP+LDA outperformed both single-objective methods (CLPP and LDA) and exceeded the LDP by about 11.4 percentage points in mean recognition rate.

Table 6. The weight coefficients under different numbers of training samples from the CAS-60 face database

| Number of training samples | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| r1 | 0.5000 | 0.4709 | 0.1290 | 0.0969 | 0.1193 |
| r2 | 0.5000 | 0.5291 | 0.8710 | 0.9031 | 0.8807 |

Table 7. Comparison of recognition results on CAS-60 face database

| Method | **CLPP+LDA** | **CLPP** | LDP | LPP | LDA |
|---|---|---|---|---|---|
| Mean recognition rate (%) | 86.9158 | 81.5690 | 75.4984 | 73.1408 | 84.3407 |

3.3 Experiment on FERET face database

The FERET face database contains a total of 2,200 images of 200 people, with 11 images per person. The images of each person were taken under very different illuminations, expressions and poses. Each image is 384×256 in size with 256 gray levels. The authors downsampled the images to 60×60. One of the samples is displayed in Figure 5.

For the experiment, 2 to 6 samples were selected from each class to form the training set, and the other samples were allocated to the testing set.

Figure 5. One of the samples in the FERET face database

(1) Determining the weight coefficients r1 and r2

As in Section 3.1(1), the first step is to compute the ratio of the number of neighbors in the same class to the number of neighbors in different classes. The results are recorded in Table 8.

Table 8. The ratios under different numbers of training samples from the FERET face database

| Number of training samples | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| ratio | Inf (100) | 21.2222 | 6.4766 | 2.9841 | 3.2705 |

As shown in Table 8, there were no neighbors from different classes when the training set contained two samples per class, so the ratio is infinite. Following sub-step (2) of Step 1 in Section 2.2, the weight of the CLPP was therefore maximized, i.e. $r_{1}=1$. Table 9 presents the weight coefficients for different numbers of training samples.

Table 9. The weight coefficients under different numbers of training samples from the FERET face database

| Number of training samples | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| r1 | 1 | 0.7673 | 0.5015 | 0.3167 | 0.3369 |
| r2 | 0 | 0.2327 | 0.4985 | 0.6833 | 0.6631 |

(2) Comparison on FERET face database

The recognition results of the tested methods on the FERET face database are compared in Table 10, in which the proposed methods are shown in bold. The CLPP outperformed the LPP, and the bi-objective CLPP+LDA outperformed both single-objective methods (CLPP and LDA) and exceeded the LDP by about 1.8 percentage points in mean recognition rate.

Table 10. Comparison of recognition results on FERET face database

| Method | **CLPP+LDA** | **CLPP** | LDP | LPP | LDA |
|---|---|---|---|---|---|
| Mean recognition rate (%) | 76.4747 | 75.8242 | 74.6730 | 72.5148 | 52.3801 |

4. Conclusions

Manifold learning methods based on the LPP have recently become increasingly popular in pattern recognition and machine learning. Several such methods have emerged, including the LPP, the LFDA and the LDP, all of which preserve a sample-based neighborhood structure. In this paper, the class center-based neighborhood structure is shown, both graphically and analytically, to group samples of the same class better than the sample-based neighborhood structure. On this basis, the authors designed a novel method called the CLPP. However, the CLPP lacks strong supervision information when weighting samples. To solve this problem, the strongly supervised LDA was integrated with the weakly supervised CLPP using multi-objective optimization theory. The two proposed methods were tested on the AR, CAS-60 and FERET face databases, and the results demonstrate their effectiveness.

Future research will address small-sample identification, a key difficulty in the application of biometric technology. For example, the Chinese Ministry of Public Security maintains a central population register. Because of the huge population, the register contains only one photo of each person, so multiple face images of the same person cannot be obtained from it. In this case, some algorithms (e.g. neural networks) cannot learn the necessary parameters, and others (e.g. the LDA) may fail altogether. In recent years, many scholars have studied small-sample identification. For instance, Benkaddour and Bounoua [17] extracted features with a deep convolutional neural network (DCNN) and performed face recognition with PCA and a support vector classifier (SVC). Reddy et al. [18] recognized facial emotions with nonlinear principal component analysis (NLPCA) and a support vector machine (SVM). These methods shed new light on small-sample identification and will inform our future research.

Acknowledgment

This article was supported by the academic research funds of Zibo Vocational Institute (2018zzzr01).

  References

[1] Subban, R., Mankame, D.P. (2014). Human face recognition biometric techniques: Analysis and review. Recent Advances in Intelligent Systems and Computing, 235: 455-463. https://doi.org/10.1007/978-3-319-01778-5_47

[2] Taner Eskil, M., Benli, K.S. (2014). Facial expression recognition based on anatomy. Computer Vision and Image Understanding, 119: 1-14. https://doi.org/10.1016/j.cviu.2013.11.002

[3] Perveen, N., Kumar, D., Bhardwaj, I. (2013). Facial expression recognition by statistical, spatial features and using decision tree. International Journal of Computer Applications, 64(18): 15-21. http://dx.doi.org/10.5120/10733-5573

[4] Kshirsagar, V.P., Baviskar, M.R., Gaikwad, M.E. (2011). Face recognition using eigenfaces. 2011 3rd International Conference on Computer Research and Development, 2: 302-306. http://dx.doi.org/10.1109/ICCRD.2011.5764137 

[5] Ye, J., Janardan, R., Park, C., Park, H. (2004). An optimization criterion for generalized discriminant analysis on undersampled problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8): 982-994. http://dx.doi.org/10.1109/ICDM.2003.1250948

[6] Yu, H., Yang, J. (2001). A direct LDA algorithm for high-dimensional data - with application to face recognition. Pattern Recognition, 34(10): 2067-2070. https://doi.org/10.1016/S0031-3203(00)00162-X

[7] Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J. (1997). Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7): 711-720. http://dx.doi.org/10.1109/34.598228

[8] Roweis, S.T., Saul, L.K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500): 2323-2326. http://dx.doi.org/10.1126/science.290.5500.2323

[9] Tenenbaum, J.B., de Silva, V., Langford, J.C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500): 2319-2323. http://dx.doi.org/10.1126/science.290.5500.2319

[10] Belkin, M., Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. NIPS'01 Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, pp. 585-591. 

[11] He, X., Yan, S., Hu, Y., Niyogi, P., Zhang, H. (2005). Face recognition using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3): 328-340. http://dx.doi.org/10.1109/TPAMI.2005.55

[12] Zhao, H.T., Sun, S.Y., Jing, Z.L., Yang, J.Y. (2006). Local structure based supervised feature extraction. Pattern Recognition, 39(8): 1546-1550. http://dx.doi.org/10.1016/j.patcog.2006.02.023

[13] Sugiyama, M., Idé, T., Nakajima, S., Sese, J. (2006). Semi-supervised local Fisher discriminant analysis for dimensionality reduction. Machine Learning, 78(1-2): 35-40. http://dx.doi.org/10.1007/s10994-009-5125-7

[14] Kim, T.K., Kittler, J. (2005). Locally linear discriminant analysis for multimodally distributed classes for face recognition with a single model image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3): 318-327. http://dx.doi.org/10.1109/TPAMI.2005.58

[15] Luo, Y. (2008). Can subclasses help a multiclass learning problem. IEEE Intelligent Vehicles Symposium, pp. 214-219. http://dx.doi.org/10.1109/IVS.2008.4621136

[16] Shakhnarovich, G., Moghaddam, B. (2005). Face recognition in subspaces. Handbook of Face Recognition, 141-168. http://dx.doi.org/10.1007/0-387-27257-7_8

[17] Benkaddour, M.K., Bounoua, A. (2017). Feature extraction and classification using deep convolutional neural networks, PCA and SVC for face recognition. Traitement du Signal, 34(1-2): 77-91. https://doi.org/10.3166/TS.34.77-91

[18] Reddy, C.V.R., Reddy, U.S., Kishore, K.V.K. (2019). Facial emotion recognition using NLPCA and SVM. Traitement du Signal, 36(1): 13-22. https://doi.org/10.18280/ts.360102