We introduce a dataset to facilitate audio-visual analysis of musical performances. The dataset comprises a number of simple multi-instrument musical pieces assembled from coordinated but separately recorded performances of individual tracks. For each piece, we provide the musical score in MIDI format, high-quality audio recordings of the individual instruments, and videos of the assembled pieces. We anticipate that the dataset will be useful for multi-modal information retrieval techniques such as music source separation, transcription, and performance analysis, and that it will also serve as ground truth for evaluating performances.
For each piece, we provide:
o Musical score in MIDI and PDF formats
o Audio recordings of each individual track and of the mixture for each piece
o Videos of the assembled pieces
o Ground-truth frame-level/note-level pitch annotations
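
To illustrate how frame-level and note-level pitch annotations relate, the following minimal sketch groups frame-level pitch values into note-level segments. It assumes, purely for illustration, that frame-level annotations are available as (time in seconds, frequency in Hz) pairs with 0 Hz marking unvoiced frames; this layout and the helper names `hz_to_midi` and `frames_to_notes` are hypothetical and are not the dataset's documented file format or tooling. The conversion uses the standard equal-temperament mapping in which MIDI note 69 corresponds to A4 at 440 Hz.

```python
import math


def hz_to_midi(freq_hz, a4_hz=440.0):
    """Convert a frequency in Hz to a fractional MIDI note number.

    Returns None for non-positive frequencies, which this sketch
    treats as unvoiced frames (an assumption, not a dataset rule).
    """
    if freq_hz <= 0:
        return None
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)


def frames_to_notes(frames):
    """Group consecutive frames that share a rounded MIDI pitch.

    `frames` is an iterable of (time_s, freq_hz) pairs (assumed layout).
    Returns a list of (first_frame_time, last_frame_time, midi_pitch).
    """
    notes = []
    current = None  # [first_frame_time, last_frame_time, midi_pitch]
    for time_s, freq_hz in frames:
        midi = hz_to_midi(freq_hz)
        pitch = None if midi is None else int(round(midi))
        if current is not None and pitch == current[2]:
            # Same pitch as the open note: extend it to this frame.
            current[1] = time_s
            continue
        if current is not None:
            # Pitch changed or the frame is unvoiced: close the open note.
            notes.append(tuple(current))
        current = None if pitch is None else [time_s, time_s, pitch]
    if current is not None:
        notes.append(tuple(current))
    return notes


if __name__ == "__main__":
    example = [(0.00, 440.0), (0.01, 441.0), (0.02, 0.0), (0.03, 880.0)]
    print(frames_to_notes(example))
```

Rounding to the nearest semitone is a deliberate simplification: it ignores vibrato and intonation detail, so a real note segmentation would likely need smoothing or a minimum note duration.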