Toward Multiple Emotion Classification from Musical Audio Signals


We create and publicly release two new music data sets consisting of musical features and multiple emotion labels; each piece of music can be assigned up to four different emotions. We hope they will help researchers who struggle to develop music emotion recognition (MER) systems due to the scarcity of multi-emotion data.

We assembled two data sets, MusicEmo-A and MusicEmo-B. For MusicEmo-A, we collected 100 music clips from five popular genres and had them labeled subjectively approximately 500 times in total through an on-line annotation system. Each clip was truncated to its first 60 seconds for copyright reasons. For each piece of music, we used the MIR toolbox, which offers an integrated set of functions for extracting musical audio features. The extracted features fall into six types: dynamics, fluctuation, rhythm, spectral, timbre, and tonal features. In the end, we obtained MusicEmo-A, which comprises 100 patterns and 864 features. For the second data set, MusicEmo-B, we collected 600 music clips of 45 seconds' playtime from eleven genres, which were annotated approximately 3,600 times in total. We discarded 35 clips because they did not significantly evoke any emotion in the participants during playback. After applying the MIR toolbox to the remaining clips, we obtained MusicEmo-B, which comprises 565 patterns and 346 features; the features in MusicEmo-B are derived from a statistical viewpoint.
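One plausible reading of "derived from a statistical viewpoint" is that frame-level audio features are collapsed into clip-level descriptors via summary statistics. A minimal NumPy sketch of that idea (the function name and the mean/std choice are our illustrative assumptions, not the MIR toolbox API):

```python
import numpy as np

def summarize_features(frames: np.ndarray) -> np.ndarray:
    """Collapse frame-level features (n_frames x n_features) into a
    fixed-length clip-level vector of per-feature mean and std.
    NOTE: hypothetical sketch; the actual MusicEmo-B statistics are
    computed by the MIR toolbox and may differ."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

# Example: 200 frames of 173 raw features yield a 346-dimensional
# clip-level vector, matching the feature count of MusicEmo-B.
frames = np.random.rand(200, 173)
clip_vector = summarize_features(frames)
print(clip_vector.shape)  # (346,)
```

Two statistics per raw feature would account for the 346-feature dimensionality, but any per-feature pair of summary statistics would do the same.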

To model the emotional content of music, we used Thayer's emotion model. In this model, a state of human emotion is represented as a point (vector) in a two-dimensional emotion space spanned by valence and arousal. For a simple description of the multiple emotions of each piece of music, in this study we used the four sub-planes (zones) of Thayer's model as emotion labels. The first plane, with positive arousal and positive valence, labeled l1 = {+,+}, represents Excitement. The other three planes are labeled Distress l2 = {+,-}, Depression l3 = {-,-}, and Contentment l4 = {-,+}. We assumed that if a piece of music does not significantly evoke any emotion, the perceived emotion may be distributed equally across all planes. Since each clip is labeled by many participants, who may perceive different emotions from it, we represent the multiple emotions of a clip as the set of labels, aggregated over the four emotion planes, that were chosen by more than 33% of the participants.
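The thresholded aggregation described above can be sketched in plain Python (the vote counts and helper name are illustrative; the paper specifies only the four labels and the one-third cut-off):

```python
from collections import Counter

# The four sub-planes of Thayer's model used as labels:
# l1 = Excitement, l2 = Distress, l3 = Depression, l4 = Contentment.
LABELS = ["l1", "l2", "l3", "l4"]

def aggregate_labels(annotations, threshold=1/3):
    """Return the set of emotion labels chosen by more than `threshold`
    of the participants who annotated a clip (hypothetical helper)."""
    counts = Counter(annotations)
    n = len(annotations)
    return {lab for lab in LABELS if counts[lab] / n > threshold}

# Example: 7 participants; l1 and l4 each receive 3/7 of the votes,
# exceeding one third, so the clip gets the multi-label set {l1, l4}.
votes = ["l1", "l1", "l1", "l4", "l4", "l4", "l2"]
print(sorted(aggregate_labels(votes)))  # ['l1', 'l4']
```

With this rule a clip can carry anywhere from zero to four labels, which is what makes the data sets multi-emotion.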


MusicEmo-A: 100 patterns, 864 features [Download]

MusicEmo-B: 565 patterns, 346 features [Download]