Paper Type
Complete
Paper Number
1380
Description
Music exerts powerful effects on people’s minds and behavior in modern life. Music emotion recognition could provide unprecedented opportunities for many business applications, such as emotion-based music recommendation and music-based purchase behavior prediction. Unfortunately, the performance of existing music emotion recognition models is far from satisfactory, especially at a fine-grained level. In this paper, we design a framework that integrates data from two sources, lyrics and audio, to map music onto Plutchik’s emotion wheel. We utilize a DCNN and a TextCNN to extract multimodal feature representations of each utterance, and we develop a stacking ensemble learning model to address the extreme imbalance among features from the two sources (lyrics and audio). We then propose an LSTM-based classifier to train and predict emotion labels from the global features of the music clips. We conduct three experiments to test the effectiveness of our framework. The results show that our framework outperforms existing models. Moreover, our framework can be successfully extended to recognize emotion evolution at a fine-grained level.
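The pipeline the abstract describes — per-utterance audio and lyric features fused and fed to a sequence classifier over Plutchik's eight primary emotions — can be illustrated with a minimal sketch. All dimensions, weights, and the single-layer LSTM below are hypothetical stand-ins (the paper's actual DCNN/TextCNN extractors and stacking ensemble are not reproduced here); the sketch only shows the data flow: concatenate modality features per utterance, run an LSTM over the utterance sequence, and softmax the final hidden state into 8 emotion probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): 10 utterances per clip,
# 128-d audio features (DCNN stand-in), 64-d lyric features (TextCNN stand-in).
T, D_AUDIO, D_LYRIC, H, N_EMOTIONS = 10, 128, 64, 32, 8  # 8 = Plutchik's primary emotions

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b):
    """Single-layer LSTM over a sequence x of shape (T, D); returns the last hidden state."""
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in x:
        z = W @ x_t + U @ h + b                 # gates stacked: input, forget, output, candidate
        i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
        g = np.tanh(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# Stand-in per-utterance features; a real system would use DCNN/TextCNN outputs.
audio_feats = rng.standard_normal((T, D_AUDIO))
lyric_feats = rng.standard_normal((T, D_LYRIC))
x = np.concatenate([audio_feats, lyric_feats], axis=1)  # fuse modalities per utterance

D = D_AUDIO + D_LYRIC
W = rng.standard_normal((4 * H, D)) * 0.1   # input-to-gate weights
U = rng.standard_normal((4 * H, H)) * 0.1   # hidden-to-gate weights
b = np.zeros(4 * H)
W_out = rng.standard_normal((N_EMOTIONS, H)) * 0.1

logits = W_out @ lstm_forward(x, W, U, b)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                         # softmax over the 8 primary emotions
print(probs.shape)
```

In a trained version of this design, the weights would be learned end-to-end, and the stacking ensemble the authors describe would sit between the modality-specific extractors and the fused representation.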
Recommended Citation
Xu, Liyang; Xu, Wei; and Zhang, Wenping, "Multi-Dimensional Music Emotion Recognition Incorporating Convolutional Neural Networks and Plutchik’s Emotion Wheel" (2021). AMCIS 2021 Proceedings. 9.
https://aisel.aisnet.org/amcis2021/adopt_diffusion/adopt_diffusion/9
Multi-Dimensional Music Emotion Recognition Incorporating Convolutional Neural Networks and Plutchik’s Emotion Wheel