APPLICATION OF NEURAL NETWORK MODELS FOR SOLUTION TASKS OF SPEECH EMOTION RECOGNITION

  • Дарья Ивановна Карпенкова, Kazan National Research Technical University named after A.N. Tupolev
  • Алексей Сергеевич Катасёв, Kazan National Research Technical University named after A.N. Tupolev
Keywords: neural network model, speech emotion, audio data analysis, modeling

Abstract

The paper addresses the problem of speech emotion recognition (SER) by constructing and studying a neural network model. Typical methods of emotion classification are analyzed, and the use of a categorical model of emotion representation is justified as the most effective for this problem. The object of research is audio recordings of human speech. A neural network model is proposed for analyzing the values of audio recording parameters such as spectral coefficients, spectrograms, and chromagrams. Several English-language audio datasets found on the Kaggle platform were used as source data for analysis and neural network modeling. The source data distinguishes seven classes (emotions): happiness, surprise, neutral, anger, sadness, fear, and disgust. The combined archive contains 48,648 audio recordings of various lengths. To train the neural network model, characteristic features were extracted from the audio recordings and augmentation was performed. The values of 162 audio recording parameters were calculated from the initial data to obtain a single data table for analysis. The paper describes the preparation of the data for analysis and modeling, the division of the data into training and test sets, and the construction and study of a model in the form of a convolutional neural network. The effectiveness of the constructed model was assessed by its accuracy, recall (completeness), and F-measure. The results show that the model is effective and can be used as part of an intelligent decision support system.
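
The evaluation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard per-class definitions of precision, recall (completeness), and F-measure, plus overall accuracy. The function names `per_class_metrics` and `accuracy`, the label strings, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

# The seven emotion classes listed in the abstract (label strings are an assumption).
EMOTIONS = ["happiness", "surprise", "neutral", "anger", "sadness", "fear", "disgust"]


def per_class_metrics(y_true, y_pred, labels=EMOTIONS):
    """Return {label: (precision, recall, f_measure)} for every class in `labels`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1          # correctly recognized emotion
        else:
            fp[pred] += 1           # predicted class gets a false positive
            fn[truth] += 1          # true class gets a false negative
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics


def accuracy(y_true, y_pred):
    """Share of recordings whose predicted emotion matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example with made-up labels, only to show the call pattern.
toy_true = ["anger", "anger", "fear", "fear"]
toy_pred = ["anger", "fear", "fear", "fear"]
toy_metrics = per_class_metrics(toy_true, toy_pred)
toy_accuracy = accuracy(toy_true, toy_pred)
```

Reporting the metrics per class, rather than accuracy alone, shows which emotions the classifier confuses when the classes are unevenly represented.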

Author Biographies

Дарья Ивановна Карпенкова, Kazan National Research Technical University named after A.N. Tupolev

Postgraduate student at the Department of Information Security Systems of KNRTU-KAI.

Area of scientific interests: neural network modeling, data mining, decision support systems.

SPIN: 3898-0624, AuthorID: 1219347, ORCID: 0009-0008-3897-7286.

Алексей Сергеевич Катасёв, Kazan National Research Technical University named after A.N. Tupolev

Doctor of Technical Sciences, Professor, Professor of the Department of Information Security Systems of KNRTU-KAI.

Area of scientific interests: neural network and neuro-fuzzy modeling, data mining, soft computing, decision support systems.

SPIN: 9374-6690, AuthorID: 651038, ORCID: 0000-0002-9446-0491.

E-mail: ASKatasev@kai.ru

Published
2024-01-30
How to Cite
Карпенкова, Дарья, & Катасёв, Алексей. (2024, January 30). APPLICATION OF NEURAL NETWORK MODELS FOR SOLUTION TASKS OF SPEECH EMOTION RECOGNITION. Electronics, Photonics and Cyberphysical Systems, 3(4), 37-46. Retrieved from http://elphotkai.ru/article/view/599
Section
Cyber-physical systems
