APPLICATION OF NEURAL NETWORK MODELS TO SOLVING SPEECH EMOTION RECOGNITION TASKS
Abstract
The paper addresses the problem of speech emotion recognition (SER) through the construction and study of a neural network model. Typical methods of emotion classification are analyzed, and a categorical model of emotion representation is justified as the most effective choice for this task. The object of research is audio recordings of human speech. It is proposed to use a neural network model to analyze audio recording parameters such as spectral coefficients, spectrograms and chromagrams. Several English-language audio datasets from the Kaggle platform were used as source data for analysis and neural network modeling. The source data covers seven classes (emotions): happiness, surprise, neutral, anger, sadness, fear and disgust. The combined archive contains 48,648 audio recordings in total. The initial data consisted of audio recordings of varying lengths. To train the neural network model, characteristic features were extracted from the recordings and data augmentation was performed. From the initial data, the values of 162 audio parameters were calculated to form a single data table for analysis. The process of preparing the data for analysis and modeling is described. The data was divided into training and test sets, and a neural network model in the form of a convolutional neural network was constructed and studied. To assess the effectiveness of the constructed model, its precision, recall and F-measure were evaluated. The results show that the model is quite effective and can be used as part of an intelligent decision support system.
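The evaluation step summarized above (precision, recall, also called completeness, and F-measure over the seven emotion classes) can be illustrated with a minimal one-vs-rest sketch. It assumes the F-measure is the balanced F1 score and that averaging is unweighted (macro). The helper names `per_class_prf` and `macro_average` are hypothetical, and the toy labels are illustrative only, not the paper's results.

```python
from collections import Counter

# The seven emotion classes named in the abstract.
EMOTIONS = ["happiness", "surprise", "neutral", "anger", "sadness", "fear", "disgust"]


def per_class_prf(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)}, computed one-vs-rest per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted p, but the true class was different
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) across the given classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["anger", "anger", "fear", "sadness"]
    y_pred = ["anger", "fear", "fear", "sadness"]
    scores = per_class_prf(y_true, y_pred, ["anger", "fear", "sadness"])
    for label, (p, r, f) in scores.items():
        print(f"{label:10s} P={p:.3f} R={r:.3f} F1={f:.3f}")
    print("macro", macro_average(scores))
```

Averaging only over the classes passed in `labels` avoids zero scores for classes absent from a small sample. On a real test set, all seven `EMOTIONS` would normally be passed.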

This work is licensed under a Creative Commons Attribution 4.0 International License.
