Authors: Rakhi Sharma, Renu Vadhera, Sarika Chaudhar
Date of Publication: 25th July 2024
Abstract: This research investigates a technique for identifying human emotions from speech. Since emotions are an essential component of human communication, recognizing them from speech can greatly improve interactions in a variety of settings, including customer service, virtual assistants, and healthcare. To increase the accuracy of emotion recognition, we propose an approach that combines Mel-Frequency Cepstral Coefficients (MFCC) with convolutional neural networks (CNN). MFCC captures the key components of speech and converts audio signals into a representation that is easier for machines to process, while CNNs are powerful machine learning models known for their ability to identify patterns in both image and speech features. In this work, we first extract MFCC features from speech samples, which provide a compact depiction of their acoustic properties. These features are then fed into a CNN model, which learns from the resulting patterns to distinguish between emotional states such as happiness, sadness, anger, and neutrality. Our experiments show that, compared with conventional techniques that often rely on less advanced feature extraction and classification methods, the combination of MFCC and CNN substantially improves both the accuracy and the robustness of emotion-recognition systems across a variety of speech datasets. To assess the model's generalizability and efficacy in a range of speech scenarios and environments, we validate it on the RAVDESS speech dataset. The results of this study can support the development of more responsive and empathetic technologies: by improving the recognition of emotions in speech, we can build systems that better understand and react to human emotions, making human-computer interactions more natural, effective, and intuitive. This advance has significant potential for use in virtual assistants, automated customer support, mental health diagnostic tools, and any other domain where an understanding of human emotion is essential.
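To make the described pipeline concrete, the following is a minimal sketch of MFCC extraction followed by a small 2-D CNN classifier, assuming librosa for feature extraction and TensorFlow/Keras for the network. The file paths, MFCC settings, and layer sizes are illustrative assumptions, not the exact configuration used in this study.

```python
# Illustrative MFCC + CNN emotion-recognition sketch (hypothetical settings,
# not the authors' exact configuration). Requires librosa and TensorFlow.
import numpy as np
import librosa
from tensorflow.keras import layers, models

N_MFCC = 40          # number of MFCC coefficients per frame (assumed)
MAX_FRAMES = 174     # pad/truncate every clip to a fixed number of frames (assumed)
EMOTIONS = ["neutral", "happy", "sad", "angry"]  # subset of RAVDESS labels

def extract_mfcc(path, sr=22050):
    """Load an audio file and return a fixed-size (N_MFCC, MAX_FRAMES) MFCC matrix."""
    signal, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
    # Zero-pad or truncate along time so every sample has the same width.
    if mfcc.shape[1] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, 0), (0, MAX_FRAMES - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :MAX_FRAMES]
    return mfcc

def build_cnn(num_classes=len(EMOTIONS)):
    """A small 2-D CNN that treats the MFCC matrix as a single-channel image."""
    model = models.Sequential([
        layers.Input(shape=(N_MFCC, MAX_FRAMES, 1)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Typical usage (wav_paths and integer labels would be derived from the
# RAVDESS file-naming convention):
# X = np.stack([extract_mfcc(p) for p in wav_paths])[..., np.newaxis]
# y = np.array(labels)
# model = build_cnn()
# model.fit(X, y, epochs=30, validation_split=0.2)
```

Treating the MFCC matrix as a single-channel image lets standard 2-D convolutions pick up local time-frequency patterns, which is the pattern-recognition capability of CNNs that the abstract refers to.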