Paper Type

Complete

Paper Number

1380

Description

Music exerts powerful effects on people's minds and behavior in modern life. Music emotion recognition could provide unprecedented opportunities for many business applications, such as emotion-based music recommendation and music-based purchase behavior prediction. Unfortunately, the performance of existing music emotion recognition models is far from satisfactory, especially at a fine-grained level. In this paper, we design a framework that integrates data from two sources, lyrics and audio, to map music onto Plutchik's emotion wheel. We utilize a DCNN and a TextCNN to extract multimodal feature representations of each utterance, and we develop a stacking ensemble learning model to address the extreme imbalance among features from the different sources (lyrics and audio). We then propose an LSTM-based classifier to train and predict emotions from the global features of music clips. We conduct three experiments to test the effectiveness of our framework. The results show that our framework outperforms existing models. Moreover, it can be successfully extended to recognize emotion evolution at a fine-grained level.
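The paper does not include implementation details, but the stacking step it describes can be illustrated with a minimal, hypothetical sketch: each modality-specific model (TextCNN over lyrics, DCNN over audio) emits a score vector over the eight Plutchik emotions, and a simple meta-learner fuses the two distributions. The function names, the single `weight` parameter, and the scores below are illustrative assumptions, not the authors' actual model.

```python
import math

# The eight basic emotions on Plutchik's wheel.
PLUTCHIK = ["joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation"]

def softmax(scores):
    # Numerically stable softmax: turn raw scores into a distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def stack_predict(lyric_scores, audio_scores, weight=0.5):
    """Fuse per-modality scores into one emotion distribution.

    `weight` stands in for the meta-learner: in this toy version it is a
    single learned trust parameter for the lyric model (hypothetical).
    """
    p_lyric = softmax(lyric_scores)
    p_audio = softmax(audio_scores)
    fused = [weight * pl + (1 - weight) * pa
             for pl, pa in zip(p_lyric, p_audio)]
    return PLUTCHIK[fused.index(max(fused))], fused

# Toy scores where both modalities favor the first emotion.
label, dist = stack_predict(
    lyric_scores=[2.0, 0.1, 0.0, 0.3, 0.2, 0.0, 0.1, 0.5],
    audio_scores=[1.5, 0.2, 0.1, 0.4, 0.3, 0.1, 0.2, 0.9],
)
print(label)  # both modalities favor index 0 -> "joy"
```

In the full framework this fusion would sit between the per-utterance CNN feature extractors and the LSTM classifier, which consumes the sequence of fused utterance-level representations.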

Top 25 Percent Paper badge
 
Aug 9th, 12:00 AM

Multi-Dimensional Music Emotion Recognition Incorporating Convolutional Neural Networks and Plutchik’s Emotion Wheel