Datasets

Introduction

Shared data enable researchers to compare their results reliably with those of others. The Centre for Digital Music (C4DM) has a tradition of providing ground truth data for research, and our chord, onset and segmentation annotations have been used by many researchers in the MIR community. This year, we have focused on extending these resources in scope and quantity.

This web page accompanies a late-breaking paper at the 11th Conference on Music Information Retrieval [1].

Reference Annotations: The Beatles

Note: Please be sure to read the page describing these annotations before use. In particular, the level of confidence we have in the individual annotations is described there, as well as the original CD issue numbers from which we worked.

Chris Harte's PhD thesis (2010), which describes the chord syntax, transcription process and verification process for the Beatles collection, can be downloaded here.

Reference Annotations: Carole King

Reference Annotations: Queen

Reference Annotations: Michael Jackson

Reference Annotations: Zweieck

Room Impulse Response Data Set

This collection of room impulse responses was measured in the Great Hall, the Octagon, and a classroom at the Mile End campus of Queen Mary, University of London in 2008. The measurements were created using the sine sweep technique [1] with a Genelec 8250A loudspeaker and two microphones, an omnidirectional DPA 4006 and a B-format Soundfield SPS422B.
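
As a rough illustration of the sine sweep technique mentioned above, the following Python sketch generates an exponential sweep, builds a matching inverse filter, and recovers an impulse response by deconvolution. It is not the measurement code used for this data set: the exponential sweep variant, the function names and all parameter values (20 Hz to 20 kHz, 1 s, 44.1 kHz) are assumptions chosen for illustration.

```python
import numpy as np


def exponential_sweep(f1, f2, duration, sr):
    """Sine sweep whose frequency rises exponentially from f1 to f2 (Hz)."""
    t = np.arange(int(duration * sr)) / sr
    rate = np.log(f2 / f1)
    phase = 2.0 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0)
    return np.sin(phase)


def inverse_filter(sweep, f1, f2, duration, sr):
    """Time-reversed sweep with a decaying envelope.

    The envelope compensates for the sweep spending more time (and
    therefore more energy) at low frequencies than at high ones.
    """
    t = np.arange(len(sweep)) / sr
    rate = np.log(f2 / f1)
    envelope = np.exp(-t * rate / duration)
    return sweep[::-1] * envelope


def estimate_ir(recorded, inv):
    """Convolve the recording with the inverse filter via the FFT.

    The result is an unnormalised impulse response; the part before
    index len(inv) - 1 (pre-ringing) is discarded.
    """
    n = len(recorded) + len(inv) - 1
    nfft = 1 << (n - 1).bit_length()
    spectrum = np.fft.rfft(recorded, nfft) * np.fft.rfft(inv, nfft)
    full = np.fft.irfft(spectrum, nfft)[:n]
    return full[len(inv) - 1:]


if __name__ == "__main__":
    sr = 44100
    sweep = exponential_sweep(20.0, 20000.0, 1.0, sr)
    inv = inverse_filter(sweep, 20.0, 20000.0, 1.0, sr)

    # Hypothetical "room": a direct sound and one reflection at half amplitude.
    room = np.zeros(500)
    room[100] = 1.0
    room[300] = 0.5
    recorded = np.convolve(sweep, room)

    ir = estimate_ir(recorded, inv)
    print("strongest arrival at sample", int(np.argmax(np.abs(ir))))
```

In a real measurement, `recorded` would be the microphone signal captured while the sweep is played through the loudspeaker, rather than a simulated convolution.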

These IRs are released under the Creative Commons Attribution-Noncommercial-Share-Alike license with attribution to the Centre for Digital Music, Queen Mary, University of London.

Automatic Annotations

In the field of music computing, many methods aim at automatically extracting musical information such as beat times, chords, and keys from raw audio data.

Publishing the automatically extracted data can be useful in several ways: researchers working on similar methods can compare their results with those published; non-specialist observers can examine the state of the art; the published data can be used as input for new research methods (e.g. for music classification).
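
To make the first use case concrete, the sketch below compares a list of estimated beat times against a published list and reports an F-measure. It is only an illustration: the function name, the example beat times and the 70 ms tolerance window are assumptions, and the file format of the published annotations is not covered here.

```python
def beat_f_measure(reference, estimated, tolerance=0.07):
    """F-measure between two lists of beat times (seconds).

    An estimated beat counts as a hit if it lies within `tolerance`
    seconds of a reference beat; each beat is matched at most once.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    # Walk both sorted lists in parallel, matching beats greedily.
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimated beat is too early to match anything later
        else:
            i += 1  # reference beat has no estimate close enough
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    published = [0.5, 1.0, 1.5, 2.0]   # hypothetical published beat times
    own = [0.52, 1.0, 1.7, 2.03]       # hypothetical output of a new method
    print(beat_f_measure(published, own))
```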

Singing Voice Audio Dataset

This dataset is intended for the analysis of the singing voice. We hope that publishing it will encourage further work on singing voice audio analysis by removing one of the main impediments in this research area: the lack of unaccompanied singing data.

It contains over 70 original vocal recordings by 28 professional, semi-professional and amateur singers. The singing style is predominantly Chinese opera, with some Western opera recordings. All recordings have a 44.1 kHz sample rate and have been amplitude normalised.
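
For readers unfamiliar with amplitude normalisation, here is a minimal sketch of one common form, peak normalisation. The page does not say which method was applied to these recordings, so the choice of peak normalisation, the function name and the target level are all assumptions.

```python
import numpy as np


def peak_normalise(samples, target_peak=1.0):
    """Scale a signal so that its largest absolute sample equals target_peak.

    Assumption: peak normalisation; the dataset may have used another
    method (for example RMS or loudness normalisation).
    """
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.max(np.abs(samples)) if samples.size else 0.0
    if peak == 0.0:
        # Silent or empty input: nothing to scale.
        return samples.copy()
    return samples * (target_peak / peak)


if __name__ == "__main__":
    quiet = np.array([0.1, -0.25, 0.2])  # hypothetical low-level recording
    print(peak_normalise(quiet))
```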

Jingju structural segmentation dataset

This dataset was collected for research on Jingju structural segmentation. The metadata and structural segmentation annotations can be downloaded from the following links.

Metadata descriptors (xls)

Annotations (zip)

This dataset accompanies a research paper; we will add the details here soon.