Complex Face Emotion Recognition Using Convolutional Neural Networks

Milind Talele^* | Rajashree Jain

Symbiosis Centre for Research and Innovation, Symbiosis International Deemed University, Pune 412115, India

Symbiosis Institute of Computer Studies and Research, Symbiosis International Deemed University, Pune 411016, India

Corresponding Author Email:

milind1248@gmail.com

Received:

30 November 2024

Revised:

14 January 2025

Accepted:

22 January 2025

Available online:

31 January 2025

OPEN ACCESS

Abstract:

Face emotion recognition finds numerous applications, including human-computer interaction, multimedia retrieval, and social robotics. Recent advancements in deep learning, particularly the use of Convolutional Neural Networks (CNNs), have significantly improved the performance of facial emotion recognition systems. This paper builds on the observation that complex face emotions are critical to understanding human reactions and can help tackle problems in healthcare, education, retail and other domains. It attempts to identify complex face emotions and introduces a Complex Face Emotion Recognition model based on deep learning techniques. The experiment treats a complex emotion as a combination of two or more basic emotions. The model was trained on the FER2013 dataset for basic face emotion recognition. A novel complex face emotion technique is then presented and validated on more than five different complex emotions. Model performance is reported in terms of accuracy, precision, and prediction capacity for the different complex emotions.

Keywords:

basic face emotion, complex face emotion, convolutional neural network, face emotion recognition, computer vision

1. Introduction

Facial emotion recognition has been a longstanding challenge in computer vision and deep learning [1]. Traditional methods for emotion recognition have primarily focused on classifying fundamental facial expressions, such as the six basic emotions (sadness, happiness, anger, disgust, surprise and fear). However, real-world facial expressions often represent a much more comprehensive variety of emotions and understated variations in intensity.

The dimensional model of emotion, which represents emotions as points in a multidimensional space, is a robust framework for modelling the complexity of facial expressions. By training Convolutional Neural Networks (CNNs) to predict the coordinates of facial expressions in this dimensional space, researchers have developed systems capable of fine-grained, continuous analysis of emotion, going beyond the limitations of traditional discrete classification approaches [2].

Researchers have also proposed using deep CNNs to address this limitation and analyze facial expressions within a dimensional emotion model. This approach allows for recognizing a broader spectrum of emotions, including complex and nuanced expressions, rather than being limited to a predefined set of discrete categories [3].

CNN-based methods have achieved remarkably high accuracy by learning robust features from large-scale datasets of facial images collected from the web [4]. This allows the models to capture the nuances and complexities of facial expressions, including a full range of emotions and their varying intensities.

The rise of deep learning has accelerated face recognition research, as CNNs are being applied to various computer vision tasks, including facial expression analysis, age estimation, and more. As this field continues to evolve, we expect to see even more sophisticated and accurate systems for complex facial emotion recognition, with far-reaching applications in human-computer interaction, psychology, and mental health [4].

Complex emotional states [5] are significant not only for interpersonal human interactions but also for interactions between humans and computers, as a person's emotional state can significantly affect their concentration, decision-making, and overall productivity. A complex emotion is a combination of two or more basic emotions, based on Robert Plutchik's model of the emotion wheel. For example, nostalgia is a combination of happiness and sadness, and hope is a combination of fear and happiness. Similarly, awe, a feeling of reverence, admiration and wonder, is a combination of fear and surprise.

Although a number of studies are available on face emotion recognition (FER), the literature on Complex Face Emotion Recognition (CFER) is scarce. The present paper attempts to develop a CFER model using a CNN. The model uses available datasets for training on basic emotions, and a novel technique is used to derive complex emotions from them. The model was tested on about seven different complex emotions. The validation accuracy ranged between 20% and 80%.

This research paper is structured as follows: Section 2 provides a literature review focused on the use of deep learning, particularly CNNs, for recognizing complex emotions derived from basic emotions. Section 3 outlines the framework for formulating complex facial emotions, including preprocessing, basic emotion recognition, weight derivation, dataset usage, and the proposed algorithm. Visual guides, such as flow diagrams and training flowcharts, illustrate the process. Section 4 discusses experimental results, comparing the performance of CNNs in recognizing basic and complex emotions. Key findings are supported by accuracy tables, classification trends, and heatmaps for performance metrics. Section 5 concludes by summarizing the contributions and challenges in deep learning-based emotion recognition, emphasizing the need for diverse datasets, interpretable models, and multimodal approaches for future research.

2. Related Literature

According to psychological theories of emotion, facial expressions account for a significant portion of human emotional expression and communication. The intensity and nuanced nature of these expressions play an important role in interpersonal interactions and convey a wealth of information beyond the basic six emotions [6]. Research [7] established that there are six universal basic emotions (anger, fear, disgust, happiness, sadness, and surprise) that can be reliably recognized from facial expressions.

Research [8] has found that facial emotion recognition is a complex task due to the highly personal, contextual, and multidimensional nature of human emotions.

While traditional computer vision techniques have been applied to emotion recognition, they often rely on manually engineered features and are limited in their ability to capture the full complexity of facial expressions [9].

The use of deep learning, especially CNNs, has revolutionised the field of facial emotion recognition. CNN-based methods are capable of learning robust, hierarchical features from large datasets of facial images, enabling them to recognise a much broader range of emotional expressions, including subtle and complex ones. Compared to traditional computer vision approaches, deep learning models can be trained directly on raw images and do not require extensive feature engineering. The model automatically learns the features that matter for emotion recognition, leading to significant performance improvements [10-13].

As the domain of facial emotion recognition progresses, we anticipate increasingly advanced and accurate systems that can reliably recognise a wide range of complex emotional states, with far-reaching applications in human-computer interaction, psychology, and mental health. CNN-based methods are widely used for emotion recognition. However, there are still challenges in accurately recognising complex emotions, as the deep learning models need to capture the intricate patterns and subtle variations in facial expressions [14].

Researchers have proposed various alternatives to, or modifications of, existing methods for CFER, for example Attentive CNN, PROLOG [15], Long Short-Term Memory (LSTM) networks and many more. In another study, the authors proposed a novel approach to represent and detect complex emotions based on Plutchik’s model. Instead of directly predicting complex emotions, the authors train a classifier to detect seven basic emotions and then represent complex emotions as vectors derived from these basic emotion intensities. Their experiments show promising results for complex emotions like “Love”, outperforming the baseline DeepMoji model. However, their model shows slightly lower accuracy in detecting “Guilt” compared to DeepMoji. The authors acknowledge limitations in their dataset and suggest further research with a more diverse range of emotions [16].

Another study provides an approach for detecting complex emotions using Plutchik’s model and multi-label classifiers. The authors introduce a new textual and social corpus labelled with basic emotions, which is used to train a model for complex emotion recognition. The main contributions include a language model transfer to a multi-label classifier based on the transformer decoder architecture and a formal method for interpreting complex emotions using basic emotion vectors [17].

The Plutchik Emotion Wheel, shown in Figure 1, is a theoretical model that categorizes human emotions into primary and complex forms, illustrating their relationships and intensities. Plutchik identified eight primary emotions: happiness, trust, fear, surprise, sadness, anticipation, anger, and disgust. These fundamental emotions can combine to create more nuanced, complex emotional states, similar to how primary colors mix to form secondary colors. For instance, the combination of happiness and trust gives rise to the complex emotion of love, while the blending of fear and surprise results in awe.

Plutchik’s model also includes the concept of “emotion intensity,” where emotions become more heightened towards the center of the wheel. For example, serenity represents a less intense form of happiness, while ecstasy is a more intense manifestation of the same primary emotion. Furthermore, Plutchik’s iceberg model suggests that the visible emotional expressions we observe are merely the surface level, with deeper, underlying emotions and motivations existing beneath. This conceptualization helps to elucidate the complexity and depth of human emotional experiences.

Plutchik’s theory also offers a simple formula for the formation of complex emotions: a complex emotion can be represented as the sum of two primary emotions, as in the love and awe examples above.

1.png

Figure 1. Robert Plutchik model of the emotion wheel [17]

Overall, the Plutchik Emotion Wheel serves as a valuable framework for visualizing and comprehending the intricate nature of human emotions and their interrelationships. Table 1 lists the complex emotions formed from pairs of basic emotions.

Table 1. Complex emotion formed from basic emotions [18-22]

Basic Emotions	Angry	Disgust	Fear	Happy	Sad	Surprise
Angry	Angry	Contempt	Antagonism	Vengeance	Depressed	Indignation
Disgust	Contempt	Disgust	Shame	Morbidity	Remorse	Disbelief
Fear	Antagonism	Shame	Fear	Frustration	Despair	Awe
Happy	Vengeance	Morbidity	Wonder	Happy	Yearning	Delight
Surprise	Indignation	Disbelief	Awe	Delight	Disapproval	Surprise

For example, the combination of surprise and fear leads to the complex emotion “Awe”, as shown in Figure 2.

2.png

Figure 2. Awe complex emotion = Surprise + Fear
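The pairwise mapping of Table 1 can be sketched as a small lookup structure. This is an illustrative sketch rather than the authors' implementation: the names `COMPLEX_EMOTIONS` and `complex_emotion` are hypothetical, the entries are transcribed from Table 1, and the Fear/Happy pair is left out because the table gives two different labels for it.

```python
# Pairwise lookup of complex emotions, transcribed from Table 1.
# Assumption: the pair is unordered, so a frozenset is used as the key.
# The Fear/Happy pair is omitted because Table 1 lists two different
# labels for it (Frustration and Wonder).
COMPLEX_EMOTIONS = {
    frozenset({"Angry", "Disgust"}): "Contempt",
    frozenset({"Angry", "Fear"}): "Antagonism",
    frozenset({"Angry", "Happy"}): "Vengeance",
    frozenset({"Angry", "Sad"}): "Depressed",
    frozenset({"Angry", "Surprise"}): "Indignation",
    frozenset({"Disgust", "Fear"}): "Shame",
    frozenset({"Disgust", "Happy"}): "Morbidity",
    frozenset({"Disgust", "Sad"}): "Remorse",
    frozenset({"Disgust", "Surprise"}): "Disbelief",
    frozenset({"Fear", "Sad"}): "Despair",
    frozenset({"Fear", "Surprise"}): "Awe",
    frozenset({"Happy", "Sad"}): "Yearning",
    frozenset({"Happy", "Surprise"}): "Delight",
    frozenset({"Sad", "Surprise"}): "Disapproval",
}


def complex_emotion(first, second):
    """Return the Table 1 label for a pair of basic emotions, or None."""
    if first == second:
        # Diagonal of Table 1: a basic emotion paired with itself.
        return first
    return COMPLEX_EMOTIONS.get(frozenset({first, second}))


print(complex_emotion("Surprise", "Fear"))  # Awe
```

Using an unordered key means that `complex_emotion("Fear", "Surprise")` and `complex_emotion("Surprise", "Fear")` return the same label, which matches the symmetric reading of the table.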

In another study, authors highlight the difficulty of capturing authentic emotional responses and the complexity of analysing dynamic facial expressions in real-time [23]. They advocate for the use of more naturalistic datasets and experimental paradigms to understand how facial expressions unfold in the context of everyday social interactions.

Beyond the basic six-emotion models [24], authors have explored more nuanced and contextualised approaches to emotion recognition. These include incorporating additional modalities (e.g., speech, body language) and accounting for cultural and individual differences in emotional expression. Research has demonstrated that deep learning frameworks, especially CNNs, achieve better accuracy in facial emotion detection than statistical methods relying on handcrafted features. Moreover, deep learning models trained on large datasets exhibit strong generalisation ability, enabling them to perform well on unseen data and leading to more robust emotion detection systems [25].

In summary, complex facial emotion recognition, using deep learning, particularly CNNs, has emerged as a promising field with significant advancements in recent years.

However, the research also highlights several challenges in developing practical and reliable facial emotion recognition systems using deep learning. A key challenge is the absence of publicly available, diverse, large and labelled datasets for training the model.

A significant body of research has explored using the FER2013 dataset for facial emotion recognition, primarily employing convolutional neural networks. Studies have investigated various CNN architectures, including those tailored for lightweight models, achieving promising results on FER2013. Researchers have also explored the impact of data augmentation techniques to address limitations of FER2013, such as class imbalance and variations in pose and illumination. While FER2013 has been instrumental in advancing FER research, studies acknowledge its biases and limitations, prompting comparisons with other datasets like CK+, JAFFE, and AffectNet. These comparisons highlight the importance of dataset diversity and the generalisation capabilities of FER models [26].

3. Methodology

Facial emotion recognition is a critical component of understanding human behavior and social interaction. It consists of three steps: face identification, face expression recognition and forming a complex emotion based on the seven basic emotions. Facial emotion recognition employs advanced algorithms to achieve accurate results where initial face detection is followed by a detailed analysis of facial movements and features, ultimately classifying emotions into basic categories such as happiness, sadness, anger, surprise, disgust, fear and neutral.

Convolutional neural networks are frequently used for image and video processing. They have built-in feature extraction filters and provide good accuracy in basic emotion recognition. Complex emotions are combinations of two or more related or opposite basic emotions, and they are difficult for a machine to recognise.

As shown in Figure 3, there are three important blocks of complex emotion formulation.

3.png

Figure 3. Flow diagram for complex facial emotion formulation

3.1 Image input and preprocessing

This segment describes the first phase of the pipeline, in which the input image is acquired through a webcam. A series of preprocessing steps, including resizing, normalization, noise attenuation, deblurring, grayscale conversion and dimensionality reduction to 48×48 pixels, is applied to prepare the image for the subsequent analysis. This ensures that the data is clean, uniform, and suitable for the recognition of emotional states. Following preprocessing, the next step is feature extraction, where key characteristics of the facial expressions are identified and quantified to facilitate accurate emotion recognition.
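A minimal sketch of three of these steps (grayscale conversion, resizing to 48×48 and normalization) is given below in plain Python. It is only an illustration: the function names are hypothetical, nearest-neighbour resizing and the BT.601 luminance weights are assumptions rather than choices stated in the paper, and noise attenuation and deblurring are not covered. In practice an image library would normally be used for these operations.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to luminance (assumed BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def resize_nearest(gray, size=48):
    """Nearest-neighbour resize of a 2D list to size x size (assumed method)."""
    height, width = len(gray), len(gray[0])
    return [[gray[(i * height) // size][(j * width) // size]
             for j in range(size)]
            for i in range(size)]


def normalize(gray):
    """Scale 8-bit intensities to the range [0, 1]."""
    return [[value / 255.0 for value in row] for row in gray]


def preprocess(rgb_image, size=48):
    """Grayscale -> resize -> normalize, in that order."""
    return normalize(resize_nearest(to_grayscale(rgb_image), size))


# Usage with a synthetic 96x96 all-white frame (a stand-in for a webcam image).
frame = [[(255, 255, 255)] * 96 for _ in range(96)]
face = preprocess(frame)
print(len(face), len(face[0]))  # 48 48
```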

3.2 Basic emotions recognition and emotion weights generation

In this stage, the system identifies and assigns weights to basic emotions (e.g., happiness, sadness, anger, fear, surprise, disgust) based on the features extracted from the preprocessed image. These weights represent the intensity or likelihood of each basic emotion being present in the input. For this purpose, a CNN model was used.

The core of a CNN is the convolution operation, which extracts salient features from the input (image or video). By learning relevant image features, the convolution layer preserves the spatial relationships between pixels. The layer's parameters are variously called the feature detector, kernel or filter. The matrix generated by sliding the filter over the image and computing the dot product at each position is called the “Activation Map”, “Feature Map” or “Convolved Feature”. The stride and the filter size are crucial design choices in this layer.
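To make the role of the filter and the stride concrete, the following sketch computes a feature map by sliding a kernel over a small image without padding. The function name `conv2d` and the example values are illustrative only; as in most CNN libraries, the operation shown is a cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation of a 2D list with a kernel."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Dot product between the kernel and the image patch under it.
            row.append(sum(image[top + a][left + b] * kernel[a][b]
                           for a in range(k_h) for b in range(k_w)))
        feature_map.append(row)
    return feature_map


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]

print(conv2d(image, kernel))            # [[-4, -4, 2], [-4, -4, 4], [7, 7, 6]]
print(conv2d(image, kernel, stride=2))  # [[-4, 2], [7, 6]]
```

A larger stride shrinks the feature map (3×3 versus 2×2 here), which is why stride and filter size together determine how much spatial detail the layer retains.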

Figure 4 illustrates the functional blocks of training the model for basic emotion recognition, which include preprocessing the captured input image, feature extraction, and classification using the CNN. The model’s highlighted weights correspond to the recognized basic emotions. The emotion with the highest weight returned by the model is taken as the recognized emotion.

4.png

Figure 4. Flowchart for training the model to predict complex emotions

3.3 Derivation of complex emotion weights

A complex emotion is formed from a combination of basic emotions. Table 1 lists the basic emotions in its first row and first column; each cell of the matrix gives the complex emotion formed by the corresponding pair.

The two highest weights are identified from the array of basic emotion weights. Using these weights, the system maps the basic emotions to complex emotions (e.g., nostalgia, jealousy, pride, awe) by applying heuristic or learned relationships together with a mathematical formulation.

In this final block, weights are calculated for the identified complex emotions by combining two or more basic emotions, and the intensity of each emotion is identified through comparison with the average weight. After the fully connected layer, the softmax activation function is applied to the logits. The softmax function converts these raw scores into probability weights for each of the seven emotion categories.
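The softmax step can be illustrated with a short, self-contained sketch. The logits below are made-up values, not outputs of the trained model, and the subtraction of the maximum is a standard numerical-stability measure rather than something described in the paper.

```python
import math

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one face image (illustrative values only).
logits = [0.2, -1.0, 2.1, 0.4, 0.0, -0.5, 1.9]
weights = softmax(logits)

for label, weight in zip(EMOTIONS, weights):
    print(f"{label}: {weight:.3f}")
```

The resulting list plays the role of the basic emotion weights: the largest entry gives the recognized basic emotion, and the two largest entries are the candidates for forming a complex emotion.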

The Basic Emotions (BA) are arranged in an array of size 1×7, as in Eq. (1).

BA = {Angry, Disgust, Fear, Happy, Neutral, Sad, Surprise} (1)

The Complex Emotion is calculated as in Eq. (2).

Complex Emotion $=\sum_{n=1}^{7} c \, W_n$ (2)

where,

$W_n$: weight of the n-th basic emotion.

n: index of the basic emotion in Eq. (1), between 1 and 7.

c: a multiplication factor.

For experimental purposes, based on Robert Plutchik’s model, the multiplication factor c was determined as follows.

The two dominant basic emotions and their respective weights $W_{max1}$ and $W_{max2}$ were prioritized based on an average weight value, as in Eq. (3).

$W_a = \text{Average}\left( W_{max1}, W_{max2} \right)$ (3)

where c is 0 if $W_n < W_a$, and nonzero otherwise, i.e. when

$W_n > W_a$ (4)

The multiplication factor acts as a balancing mechanism, using the average weight of the two dominant emotions ($W_a$) to scale the contribution of each emotion. It also serves as the threshold for identifying complex emotions as a combination of two prominent basic emotions. Figure 5 shows complex emotion prediction using the trained CNN model for complex face emotion recognition.

5.png

Figure 5. Complex emotions from Databrary dataset [14]
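The selection of the two dominant basic emotions and the average weight of Eq. (3) can be sketched as follows. The function name `dominant_pair` and the example weights are hypothetical, and the sketch stops at Eq. (3); it does not attempt to implement the multiplication factor c.

```python
def dominant_pair(weights):
    """Return the two highest-weighted emotions and their average weight W_a.

    weights: dict mapping each basic emotion label to its weight.
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    first, second = ranked[0], ranked[1]
    w_a = (first[1] + second[1]) / 2  # Eq. (3): average of W_max1 and W_max2
    return first, second, w_a


# Hypothetical softmax output for one face image (illustrative values only).
weights = {
    "Angry": 0.05, "Disgust": 0.02, "Fear": 0.35, "Happy": 0.03,
    "Neutral": 0.05, "Sad": 0.05, "Surprise": 0.45,
}

(first_label, w_max1), (second_label, w_max2), w_a = dominant_pair(weights)
print(first_label, second_label, round(w_a, 2))  # Surprise Fear 0.4
```

With these example weights the dominant pair is Surprise and Fear, which Table 1 maps to “Awe”.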

3.4 Dataset

The model was trained using the FER2013 dataset [27] and tested using Databrary’s Complex emotion dataset [14]. FER2013, an open-source dataset known as the “Facial Expressions Recognition Dataset,” consists of grayscale images in the seven categories of basic emotions described in Eq. (1). It contains about 36,321 grayscale images, of which about 80% (29,038 images) were used for training.

The Complex Emotion Expression Database (CEED) is a collection of 480 images depicting both basic and complex emotions: 243 images of basic and 237 of complex facial emotions. The complex emotions are lovesick, desirous, contemptuous, attracted, flirtatious, brokenhearted, betrayed, jealous and affectionate. The database was developed by authors at Pennsylvania State University, with funding from the National Institute of Mental Health (NIMH), to address limitations in previous datasets and to investigate hypotheses regarding the developmental trajectories of sensitivity to complex emotional expressions. It is available on the Databrary portal with the appropriate investigator signatory.

The images were created by photographing eight trained young adult actors, who were guided through a method-acting approach to elicit authentic expressions. The emotional portrayals were independently evaluated by approximately 800 participants to validate the facial emotional expressions. By providing a reliable and validated set of stimuli, the database addresses the difficulty of finding well-tagged and labelled images of complex emotional expressions in open-source datasets, and it is a valuable resource for psychological research on emotion perception and processing.

The eight actors are aged 18 to 27. The dataset has equal gender representation (4 males, 4 females) and actors from two major ethnic groups: White (4 actors, 241 images) and Black (4 actors, 239 images). The number of images per actor ranges from 15 to 88, with the largest contribution from a 24-year-old Black female (88 images) and the smallest from a 21-year-old White female (15 images). The dataset is thus diverse in age, gender, and ethnicity. Figure 6 shows the nine complex emotions from the dataset, and Table 2 shows the number of labelled images for each complex emotion in the Databrary dataset.

Table 2. CEED Dataset images

Expression	No. of Images
Affectionate	36
Attracted	19
Betrayed	20
Brokenhearted	36
Contemptuous	19
Desirous	46
Flirtatious	22
Jealous	9
Lovesick	30
Total (complex)	237

6.png

Figure 6. Complex emotions from Databrary dataset [14]

3.5 Algorithm

Algorithm 1 trains and evaluates a CNN model on the FER 2013 dataset for seven basic emotions.

Dataset details:

• FER 2013 dataset: Grayscale images labeled with seven emotion categories.

• Training and testing splits: X_train, y_train, X_test, y_test.

CNN model components:

• Feature Extractor (FE): Defines the layers used for feature extraction (e.g., CNN layers).

• Classification Layer (CL): Fully connected layer for final classification.

Training process:

• Normalization: Ensures pixel values are scaled to a consistent range.

• Data augmentation: Techniques like rotation, flipping, or cropping to improve model generalization.

• Loss function: Cross-entropy loss (L_cat) to compare predictions with true labels.

• Optimization: Gradient descent/backpropagation used to update model weights.

• Epochs and batches: Number of training epochs and batch sizes for processing images.

Testing metrics:

• Softmax output: Converts into probabilities for seven emotion classification.

• Accuracy: Measures the percentage of correct predictions on the test set.

Algorithm 1: Training and testing for FER

Require:

X_train: Grayscale training images.

y_train: Labels for training images in X_train

X_test: Grayscale test images.

y_test: Labels for test images in X_test

FE: Feature extractor (e.g., CNN layers or pre-trained model).

CL: Classification layer.

1. For epoch in number of epochs do

2. For each batch in Batch (X_train, y_train) do

3. For x,y in batch do

4. x ← Normalization (x)

5. x ← DataAugmentation (x)

6. z ← CL (FE(x))

7. $\hat{y}$ ← Softmax (z)

8. $L_{cat}$ ← CrossEntropy ($\hat{y}$, y)

9. End For

10. $L_{cat}$ ← Average ($L_{cat}$) over batch

11. Update model weights (backpropagation of $L_{cat}$)

12. End For

13. End For

14. For $x_{test}$, $y_{test}$ in $X_{test}$, $y_{test}$ do

15. $\hat{y}_{test}$ ← Softmax (CL (FE ($x_{test}$)))

16. acc ← Accuracy ($\hat{y}_{test}$, $y_{test}$)

17. End For

18. Accuracy ← Average (acc) over the test set
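The control flow of Algorithm 1 can be illustrated with a small runnable sketch. To keep it dependency-free, a single linear layer stands in for the CNN feature extractor and classification layer, so this is a softmax-regression substitute and not the model used in the paper; data augmentation (step 5) is omitted, and all names, hyperparameters and the toy two-pixel "images" are hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(weights, biases, x):
    """Linear stand-in for CL(FE(x)): one logit per class."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train(X_train, y_train, n_classes, epochs=100, batch_size=4, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_features = len(X_train[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    indices = list(range(len(X_train)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [[0.0] * n_features for _ in range(n_classes)]
            grad_b = [0.0] * n_classes
            for i in batch:
                x = [v / 255.0 for v in X_train[i]]           # step 4: normalization
                y_hat = softmax(forward(weights, biases, x))  # steps 6-7
                # Gradient of cross-entropy w.r.t. the logits: y_hat - one_hot(y).
                for c in range(n_classes):
                    delta = y_hat[c] - (1.0 if c == y_train[i] else 0.0)
                    grad_b[c] += delta
                    for j in range(n_features):
                        grad_w[c][j] += delta * x[j]
            # Steps 10-11: average over the batch, then a gradient descent update.
            for c in range(n_classes):
                biases[c] -= lr * grad_b[c] / len(batch)
                for j in range(n_features):
                    weights[c][j] -= lr * grad_w[c][j] / len(batch)
    return weights, biases


def accuracy(weights, biases, X_test, y_test):
    """Steps 14-18: fraction of test samples whose argmax matches the label."""
    correct = 0
    for x, y in zip(X_test, y_test):
        probs = softmax(forward(weights, biases, [v / 255.0 for v in x]))
        correct += int(probs.index(max(probs)) == y)
    return correct / len(y_test)


# Toy data: class 0 has a bright first pixel, class 1 a bright second pixel.
X_train = [[250, 10], [240, 30], [220, 5], [200, 40],
           [10, 250], [30, 240], [5, 220], [40, 200]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test, y_test = [[230, 20], [20, 230]], [0, 1]

weights, biases = train(X_train, y_train, n_classes=2)
print(accuracy(weights, biases, X_test, y_test))
```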

Algorithm 2 was used to test the CNN model on a new image dataset, CEED, to identify complex emotions. The algorithm applies a pre-trained CNN to identify the primary and secondary emotions in images from the new dataset.

Algorithm 2: Primary and secondary FER

Require:

face_classifier ← Haar Cascade for face detection

classifier ← Pre-trained CNN model for emotion classification

emotion_labels ← ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']

cap ← Input video stream or image sequence from the webcam

1. Log system specifications such as CPU, disk and memory

2. for each frame in cap:

3. Read the frame.

4. if frame = None: break (end of the video stream)

5. Convert frame to grayscale gray.

6. Detect faces in gray using face_classifier.

7. for each face (x,y,w,h):

8. Extract roi_gray ← gray[y:y+h,x:x+w].

9. Resize roi_gray to (48,48).

10. Normalize roi_gray to [0,1].

11. Expand roi_gray dimensions for prediction.

12. Compute prediction ← classifier.predict(roi_gray).

13. Identify primary emotion: label ← emotion_labels[prediction.argmax()].

14. Identify secondary emotion: second_label ← emotion_labels[index of the second-highest value in prediction].

15. Overlay label and second_label on frame.

Dataset details:

• CEED dataset used for testing

Face detection:

• Haar cascade classifier: Used for face detection in grayscale images.

Image preprocessing:

• Convert to grayscale.

• Resize to 48 × 48 (same as model input size).

• Normalize pixel values to a range of [0, 1].

Emotion classification:

• Pre-trained CNN: Outputs a probability distribution over seven emotion labels.

• Primary emotion: Determined using the class with the highest predicted probability.

• Secondary emotion: Determined using the second highest predicted probability.

Complex emotions:

Predicted primary and secondary emotions are displayed on the image frame.
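The per-frame part of Algorithm 2 might look roughly like the sketch below. It assumes OpenCV and NumPy are installed, that `face_classifier` is a `cv2.CascadeClassifier`, and that `classifier` is any object whose `predict` method accepts a batch of shape (1, 48, 48, 1) and returns one probability vector per image; that input shape, the function names and the detection parameters (1.3, 5) are assumptions, not details given in the paper.

```python
def top_two_labels(prediction, emotion_labels):
    """Return (primary, secondary) labels for one probability vector."""
    order = sorted(range(len(prediction)),
                   key=lambda i: prediction[i], reverse=True)
    return emotion_labels[order[0]], emotion_labels[order[1]]


def annotate_frame(frame, face_classifier, classifier, emotion_labels):
    """Detect faces in a BGR frame and overlay primary/secondary emotions."""
    import cv2          # imported lazily so top_two_labels works without OpenCV
    import numpy as np

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_classifier.detectMultiScale(gray, 1.3, 5):
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        roi = roi.astype("float32") / 255.0           # normalize to [0, 1]
        roi = np.expand_dims(roi, axis=(0, -1))       # assumed shape (1, 48, 48, 1)
        prediction = classifier.predict(roi)[0]
        label, second_label = top_two_labels(list(prediction), emotion_labels)
        cv2.putText(frame, f"{label} / {second_label}", (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    return frame


EMOTION_LABELS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]
# Hypothetical prediction vector, used only to show the label selection.
print(top_two_labels([0.05, 0.02, 0.35, 0.03, 0.05, 0.05, 0.45], EMOTION_LABELS))
```

A face detector for this sketch could be created with `cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")`, and frames could be read in a loop from `cv2.VideoCapture(0)`.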

4. Results

The experimental results are presented in two parts: performance metrics for basic emotions, followed by performance metrics for complex emotions.

4.1 Performance of CNN model for basic FER

Table 3 shows the experimental results for the emotions Angry, Disgust, Fear, Happy, Sad and Surprise. The “happy” emotion had the highest accuracy at 100%, while “disgust” had the lowest at 11%. The misclassification of disgust [28-30] appears common across the literature; disgust may share visual features with other negative emotions such as anger or sadness, which makes the classes overlap and harder to separate. Additionally, disgust has fewer samples than other emotion categories, is under-represented in datasets, and is difficult to label owing to cultural and contextual differences.

Table 4 shows the confusion matrix for the basic emotions.

Table 3. Accuracy of basic emotions

Emotion Category	No. of Images	Correctly Classified Images	Accuracy (%)	Precision (%)	F1 Score
Angry	48	29	60	60.4	0.6
Disgust	28	3	11	10.7	0.1
Fear	48	34	71	70.8	0.7
Happy	48	48	100	100.0	1.0
Sad	33	25	76	75.8	0.8
Surprise	53	44	83	83.0	0.8

Table 4. Confusion matrix of basic emotions

	Angry	Disgust	Fear	Happy	Sad	Surprise
Angry	29	4	3	5	4	3
Disgust	4	3	4	6	6	5
Fear	5	4	34	1	1	3
Happy	0	0	0	48	0	0
Sad	2	1	1	1	25	3
Surprise	1	1	5	1	1	44
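Per-class metrics can be derived from the confusion matrix in Table 4 as sketched below. The sketch assumes that rows are true classes and columns are predicted classes, which is consistent with the row sums matching the "No. of Images" column of Table 3. Under that assumption the row-wise ratio reproduces the Accuracy column of Table 3, while column-wise precision follows the standard definition and can differ from the Precision values reported there. The function name is hypothetical.

```python
LABELS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise"]

# Table 4, assumed layout: rows = true class, columns = predicted class.
CONFUSION = [
    [29, 4, 3, 5, 4, 3],
    [4, 3, 4, 6, 6, 5],
    [5, 4, 34, 1, 1, 3],
    [0, 0, 0, 48, 0, 0],
    [2, 1, 1, 1, 25, 3],
    [1, 1, 5, 1, 1, 44],
]


def per_class_metrics(matrix):
    """Return a list of (recall, precision, f1) tuples, one per class."""
    n = len(matrix)
    metrics = []
    for k in range(n):
        true_pos = matrix[k][k]
        row_total = sum(matrix[k])                        # images of class k
        col_total = sum(matrix[r][k] for r in range(n))   # predictions of class k
        recall = true_pos / row_total if row_total else 0.0
        precision = true_pos / col_total if col_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((recall, precision, f1))
    return metrics


for label, (recall, precision, f1) in zip(LABELS, per_class_metrics(CONFUSION)):
    print(f"{label}: recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
```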

4.2 Performance of CNN model for complex FER

Table 5 reports the accuracy of the model when tested on the CEED dataset, and Figure 7 shows the classification accuracy and misclassification for the same dataset.

The system performs well in recognizing certain emotions such as “Affectionate” and “Brokenhearted”; however, it struggles significantly with others such as “Contemptuous” and “Lovesick.” The overall accuracy indicates room for improvement.

Across Tables 3 and 5, the test data comprise 491 images (258 basic and 233 complex), of which 283 were classified correctly, giving an overall accuracy of 58%.

The overall Precision, Recall, and F1 Score each stand at 57.64%, indicating a moderate level of classification performance. However, certain categories show significant disparities in performance.

Figure 8 shows the performance of the method for different complex emotions as a heatmap.

Table 5. Accuracy of complex FER

Emotion Category	No. of Images	Correctly Classified Images	Accuracy (%)	Precision (%)	F1 Score
Affectionate	36	29	81	80.6	0.8
Betrayed	19	8	42	42.1	0.4
Attracted	15	4	27	26.7	0.3
Brokenhearted	36	32	89	88.9	0.9
Contemptuous	20	0	0	0.0	0.0
Desirous	46	13	28	28.3	0.3
Flirtatious	22	12	55	54.6	0.6
Lovesick	30	0	0	0.0	0.0
Jealous	9	2	22	22.2	0.2
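The overall figures can be cross-checked directly from the image counts in Tables 3 and 5. The following sketch simply sums the "No. of Images" and "Correctly Classified Images" columns; the variable and function names are illustrative.

```python
# (number of images, correctly classified images), from Tables 3 and 5.
BASIC = {
    "Angry": (48, 29), "Disgust": (28, 3), "Fear": (48, 34),
    "Happy": (48, 48), "Sad": (33, 25), "Surprise": (53, 44),
}
COMPLEX = {
    "Affectionate": (36, 29), "Betrayed": (19, 8), "Attracted": (15, 4),
    "Brokenhearted": (36, 32), "Contemptuous": (20, 0), "Desirous": (46, 13),
    "Flirtatious": (22, 12), "Lovesick": (30, 0), "Jealous": (9, 2),
}


def summarize(counts):
    """Return (total images, correct images, accuracy in percent)."""
    total = sum(n for n, _ in counts.values())
    correct = sum(c for _, c in counts.values())
    return total, correct, 100.0 * correct / total


total, correct, overall = summarize({**BASIC, **COMPLEX})
print(total, correct, round(overall, 2))  # 491 283 57.64

for name, (n, c) in COMPLEX.items():
    print(f"{name}: {100.0 * c / n:.1f}%")
```

The combined totals (491 images, 283 correct, 57.64%) agree with the overall values quoted in the text.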

7.png

Figure 7. Classification accuracy and misclassification of complex emotions

8.png

Figure 8. Heatmap of performance metrics for complex emotion recognition

Happy:

This category achieves perfect classification, with 100% recall, precision and accuracy, and an F1 score of 1.0.

The model performance with this category indicates robustness in detecting happiness-related images.

Brokenhearted:

With an Accuracy of 89% and an F1 Score of 0.89, this category shows strong performance.

The model reliably classifies “Brokenhearted” emotions, suggesting good feature extraction for this emotion.

Surprise and affectionate:

These categories exhibit high accuracy (83% and 81%, respectively) and strong F1 scores (0.83 and 0.81).

The results suggest the model is effective at distinguishing these emotions.

Awe:

The model was also tested using free pick images of the “Awe” complex emotion. 9 out of 10 such images were recognized correctly, giving an accuracy of 90%, a precision of 90.2% and an F1 score of 0.9.

5. Conclusions and Future Directions

The author has put forward a sophisticated model for facial emotion that integrates a combination of seven basic emotions. A mathematical framework has been developed and empirically validated on complex emotions such as Awe, Affectionate, and Brokenhearted.

Deep learning has emerged as a powerful approach for facial emotion recognition, offering several advantages over traditional methods. However, the research also highlights several key challenges in developing practical and reliable facial emotion recognition systems using deep learning.

One key challenge is the limited availability of large, diverse, and annotated datasets for training deep learning models. A second challenge is the need for more robust and interpretable deep learning models: current models can achieve high accuracy, but their internal workings, including feature extraction, are often opaque. Future directions include exploring multimodal approaches that integrate visual and other cues for emotion recognition.

References

[1] Aliyu, I., Bomoi, M.A., Maishanu, M. (2022). A comparative study of eigenface and fisherface algorithms based on opencv and sci-kit libraries implementations. International Journal of Information Engineering and Electronic Business, 12(3): 30-40. https://doi.org/10.5815/ijieeb.2022.03.04

[2] Zhou, F., Kong, S., Fowlkes, C.C., Chen, T., Lei, B. (2020). Fine-grained facial expression analysis using dimensional emotion model. Neurocomputing, 392: 38-49. https://doi.org/10.1016/j.neucom.2020.01.067

[3] Adolphs, R. (2002). Recognizing emotion from facial expressions: Psychological and neurological mechanisms. Behavioral and Cognitive Neuroscience Reviews, 1(1): 21-62. https://doi.org/10.1177/1534582302001001003

[4] Trigueros, D.S., Meng, L., Hartnett, M. (2018). Face recognition: From traditional to deep learning methods. arXiv preprint arXiv:1811.00116. https://doi.org/10.48550/arXiv.1811.00116

[5] Do, L.N., Yang, H.J., Nguyen, H.D., Kim, S.H., Lee, G.S., Na, I.S. (2021). Deep neural network-based fusion model for emotion recognition using visual data. The Journal of Supercomputing, 77: 10773-10790. https://doi.org/10.1007/s11227-021-03690-y

[6] Hareli, S., Hess, U. (2012). The social signal value of emotions. Cognition & Emotion, 26(3): 385-389.10.1080/02699931.2012.665029

[7] Jia, S., Wang, S., Hu, C., Webster, P., Li, X. (2020). Detection of genuine and posed facial expressions of emotion: A review. arXiv preprint arXiv:2008.11353. https://doi.org/10.48550/arXiv.2008.11353

[8] Seyeditabari, A., Tabari, N., Zadrozny, W. (2018). Emotion detection in text: A review. arXiv preprint arXiv:1806.00674. https://doi.org/10.48550/arXiv.1806.00674

[9] Rangulov, D., Fahim, M. (2020). Emotion recognition on large video dataset based on convolutional feature extractor and recurrent neural network. In 2020 IEEE 4th International Conference on Image Processing, Applications and Systems (IPAS), Genova, Italy, pp. 14-20. https://doi.org/10.1109/IPAS50080.2020.9334935

[10] Gao, M., Dong, J., Zhou, D., Zhang, Q., Yang, D. (2019). End-to-end speech emotion recognition based on one-dimensional convolutional neural network. In 2019 the 3rd International Conference on Innovation in Artificial Intelligence, Suzhou, China, pp. 78-82. https://doi.org/10.1145/3319921.3319963

[11] Talele, M., Jain, R., Mapari, S. (2024). Complex face emotion recognition using computer vision and machine learning. In Harnessing Artificial Emotional Intelligence for Improved Human-Computer Interactions, pp. 180-196. https://doi.org/10.4018/979-8-3693-2794-4.ch011

[12] Talele, M., Jain, R. (2023). Complex facial emotion recognition-A systematic literature review. In 2023 Third International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, pp. 1-8. https://doi.org/10.1109/ICAECT57570.2023.10117836

[13] Talele, M., Jain, R., Kulkarni, P. (2023). Review of face emotion recognition using feature extraction techniques. In 2023 Intelligent Computing and Control for Engineering and Business Systems (ICCEBS), Chennai, India, pp. 1-6. https://doi.org/10.1109/ICCEBS58601.2023.10448632

[14] Benda, M.S., Scherf, K.S. (2020). The complex emotion expression database: A validated stimulus set of trained actors. PloS One, 15(2): e0228248. https://doi.org/10.1371/journal.pone.0228248

[15] Ali, T.M. (2013). Query Proof Structure Caching for Incremental Evaluationof Tabled Prolog Programs. University of Malaya (Malaysia).

[16] Billal, B., Sadat, F., Lounis, H. (2020). Complex emotional intelligence learning using deep neural networks (student abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 34(10): 13755-13756. https://doi.org/10.1609/aaai.v34i10.7149

[17] Plutchik, R. (2001). The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist, 89(4): 344-350. https://doi.org/10.1511/2001.4.344

[18] Russell, J.A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6): 1161. https://psycnet.apa.org/doi/10.1037/h0077714

[19] Picard, R.W. (1997). Affective Computing. The MIT Press.

[20] Scherer, K.R. (2005). What are emotions? And how can they be measured? Social Science Information, 44(4): 695-729. https://doi.org/10.1177/0539018405058216

[21] Ortony, A., Clore, G.L., Collins, A. (2022). The Cognitive Structure of Emotions. Cambridge University Press.

[22] Porcu, S., Floris, A., Atzori, L. (2024). GAN generated images for facial expression recognition systems. IEEE DataPort. https://doi.org/10.21227/b7m1-rz14

[23] Srinivasan, R., Martinez, A.M. (2018). Cross-cultural and cultural-specific production and perception of facial expressions of emotion in the wild. IEEE Transactions on Affective Computing, 12(3): 707-721. https://doi.org/10.1109/TAFFC.2018.2887267

[24] Hasani, B., Mahoor, M.H. (2017). Facial affect estimation in the wild using deep residual and convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW): Honolulu, HI, USA, pp. 1955-1962. https://doi.org/10.1109/CVPRW.2017.245

[25] Karthigayan, M., Rizon, M., Yaacob, S., Nagarajan, R. (2007). Genetic algorithm for various face emotions classification. In 3rd Kuala Lumpur International Conference on Biomedical Engineering 2006, Kuala Lumpur, Malaysia, pp. 67-71. https://doi.org/10.1007/978-3-540-68017-8_18

[26] Li, S., Deng, W. (2020). Deep facial expression recognition: A survey. IEEE Transactions on Affective Computing, 13(3): 1195-1215. https://doi.org/10.1109/TAFFC.2020.2981446

[27] Face Expression Recognition Dataset. https://www.kaggle.com/jonathanoheix/face-expression-recognition-dataset.

[28] Zhao, G., Pietikainen, M. (2007). Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6): 915-928. https://doi.org/10.1109/TPAMI.2007.1110

[29] Pantic, M., Rothkrantz, L.J.M. (2000). Automatic analysis of facial expressions: The state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12): 1424-1445. https://doi.org/10.1109/34.895976

[30] Ekman, P., Friesen, W.V. (1971). Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2): 124-129. https://doi.org/10.1037/h0030377
