Home Journals RIA MFCC-Based Feature Extraction Model for Long Time Period Emotion Speech Using CNN

MFCC-Based Feature Extraction Model for Long Time Period Emotion Speech Using CNN

Mahmood Alhlffee

Department of DIEC, IIIE, Universidad Nacional Del Sur, Bahía Blanca 8000, Argentina

Corresponding Author Email: mahmood@uns.edu.ar

Received: 20 December 2019 | Revised: 30 January 2020 | Accepted: 3 February 2020 | Available online: 10 May 2020

OPEN ACCESS

Abstract:

This paper studies the effectiveness of a feature extraction model based on Mel-frequency cepstral coefficients (MFCC) and the Fast Fourier Transform (FFT). A CNN model was used to identify five basic emotions in the input speech corpus, and spectrograms of long-duration speech were used to obtain high-accuracy, fixed-length learning vectors from the audio files. Finally, the author proposes an FFT-based method for recognizing five emotional states in the RAVDESS and SAVEE emotional speech corpora. Compared with state-of-the-art related methods, detection accuracy improves by 70% when the proposed model is used to extract audio segments from audio files and convert the speech into spectrograms.
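To make the feature pipeline named in the abstract more concrete, the following is a minimal NumPy sketch of how FFT-based MFCC features might be computed and brought to a fixed length for clips of different durations. It is not the author's implementation. All parameter values (16 kHz sampling rate, 400-sample frames, 160-sample hop, 26 mel filters, 13 coefficients, 300 output frames) and all function names are illustrative assumptions, and zero-padding or truncating along the time axis is only one simple way to obtain a fixed-length input.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames, zero-padding the tail."""
    n_frames = 1 + max(0, int(np.ceil((len(signal) - frame_len) / hop)))
    padded_len = (n_frames - 1) * hop + frame_len
    padded = np.pad(signal, (0, padded_len - len(signal)))
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return padded[idx]

def power_spectrogram(signal, n_fft=512, frame_len=400, hop=160):
    """Windowed FFT power spectrum, shape (n_frames, n_fft // 2 + 1)."""
    frames = frame_signal(signal, frame_len, hop) * np.hamming(frame_len)
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return (np.abs(spectrum) ** 2) / n_fft

def mel_filterbank(n_mels=26, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along axis 1, keeping the first n_out terms."""
    n = x.shape[1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return x @ basis.T

def mfcc_fixed_length(signal, sr=16000, n_mfcc=13, n_frames=300):
    """MFCC matrix of shape (n_frames, n_mfcc), zero-padded or truncated in time."""
    spec = power_spectrogram(signal)
    mel_energy = spec @ mel_filterbank(sr=sr).T
    log_mel = np.log(mel_energy + 1e-10)   # small offset avoids log(0)
    mfcc = dct_ii(log_mel, n_mfcc)
    out = np.zeros((n_frames, n_mfcc))
    t = min(n_frames, mfcc.shape[0])
    out[:t] = mfcc[:t]
    return out

# Synthetic noise stands in for real recordings (assumed 1 s and 5 s at 16 kHz).
rng = np.random.default_rng(0)
short_clip = rng.standard_normal(16000)
long_clip = rng.standard_normal(80000)
features_short = mfcc_fixed_length(short_clip)
features_long = mfcc_fixed_length(long_clip)
```

Both clips yield a matrix of the same shape, which is what lets a CNN with a fixed input size consume utterances of different durations. A real pipeline would load audio from the corpus files and might prefer a tested library routine over this hand-written version.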

Keywords:

Mel-frequency cepstral coefficients (MFCC), Fast Fourier Transform (FFT)-based features, CNN model, hybrid HMM/CNN system

1. Introduction

2. Literature Related to the Framework

3. Deep Learning Model for Speech Recognition

4. System Architecture

5. Model Evolution Result

6. Conclusion
