Onsets and Frames: Dual-Objective Piano Transcription

Online Supplement

Related Material

Main Paper on arXiv
Source Code
Interactive Colab Notebook

Contents

Video Demonstration
Examples from MAPS Dataset
Examples from Musopen
Examples of Current Metric Limitations
Higher Time Resolution Example
Full Results
References

Video Demonstration

Examples from MAPS Dataset

Mozart Sonata K. 331, 3rd movement

Disklavier
Onsets and Frames
Onsets and Frames with Velocity
Kelz et al., 2016
Sigtia et al., 2016

Chopin Etude Op. 25 No. 3

Disklavier
Onsets and Frames
Onsets and Frames with Velocity
Kelz et al., 2016
Sigtia et al., 2016

Beethoven Sonata No. 8, 2nd Movement

Ground Truth MIDI
Disklavier: Notice the missing notes that weren't played by the Disklavier.
Onsets and Frames
Onsets and Frames with Velocity
Kelz et al., 2016
Sigtia et al., 2016

Examples from Musopen

Scarlatti Keyboard Sonata in Ab, K. 127

This example demonstrates the model's ability to transcribe piano recordings that are completely unrelated to the training set.

Original Recording
Onsets and Frames

This transcription is a derivative of the recording of Keyboard Sonata in Ab, K. 127 and is in the public domain.

Scarlatti Sonata in D minor, K. 9

This recording is of a harpsichord. The results are not as accurate as they would be for a piano recording, but the model works surprisingly well given that there are no harpsichord recordings in the training data.

Original Recording
Onsets and Frames

This transcription is a derivative of the recording of Sonata in D minor, K. 9 and is licensed under CC BY-SA 3.0.

Examples of Current Metric Limitations

All examples use Beethoven's Für Elise.

Many 1-frame notes added: This sequence would get a high frame score (~80) despite being almost unlistenable.
One extremely long note added: This sequence sounds perfect but would get the same frame score as the previous example.
Note timing jittered, but still within 50 ms tolerance: This sequence would get a perfect score with current metrics even though it is clearly not an accurate representation of the performance.
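
The limitations listed above can be reproduced on a toy piano roll. The following Python sketch is illustrative only: the piano rolls, the helper names frame_f1 and count_notes, and all numbers are assumptions made for this example, not the evaluation code or data behind the examples on this page. It builds a reference roll, adds either 200 spurious one-frame notes or one spurious 200-frame note, and shows that a frame-level F1 score cannot tell the two apart. The last lines illustrate the timing case in simplified form, comparing onsets pairwise rather than through the optimal matching a full evaluation would use.

```python
import numpy as np


def frame_f1(reference, estimate):
    """Frame-level F1 between two boolean piano rolls (pitches x frames)."""
    tp = np.logical_and(reference, estimate).sum()
    fp = np.logical_and(~reference, estimate).sum()
    fn = np.logical_and(reference, ~estimate).sum()
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def count_notes(roll):
    """Count notes as rising edges (inactive -> active) along the time axis."""
    padded = np.pad(roll, ((0, 0), (1, 0)))  # prepend one silent frame
    return int((padded[:, 1:] & ~padded[:, :-1]).sum())


# Hypothetical reference: 88 pitches, 1000 frames, two held notes (800 active frames).
reference = np.zeros((88, 1000), dtype=bool)
reference[40, 100:500] = True
reference[44, 300:700] = True

# Case 1: 200 spurious one-frame notes on a pitch that is silent in the reference.
many_short = reference.copy()
many_short[60, 0:400:2] = True  # every other frame, so each is a separate note

# Case 2: a single spurious note lasting 200 frames on the same pitch.
one_long = reference.copy()
one_long[60, 0:200] = True

f1_short = frame_f1(reference, many_short)
f1_long = frame_f1(reference, one_long)
print(f"many short notes: F1={f1_short:.3f}, notes={count_notes(many_short)}")
print(f"one long note:    F1={f1_long:.3f}, notes={count_notes(one_long)}")

# Onset jitter: every onset moves, yet all stay inside a 50 ms tolerance window.
rng = np.random.default_rng(0)
ref_onsets = np.array([0.5, 1.0, 1.5, 2.0])  # seconds, hypothetical
est_onsets = ref_onsets + rng.uniform(-0.045, 0.045, size=ref_onsets.shape)
within_tolerance = np.abs(est_onsets - ref_onsets) <= 0.05
print("all onsets counted as correct:", bool(within_tolerance.all()))
```

Both corrupted rolls contain exactly 200 false-positive frames, so their frame F1 is identical (8/9 in this toy setup), even though one adds 200 extra notes and the other adds a single note. The specific score depends entirely on the made-up roll and is not meant to match the ~80 figure quoted above.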

Higher Time Resolution Example

Onsets and Frames trained at 24 ms timing resolution: While the higher time resolution is evident, the model also adds more extraneous notes.

Full Results

Full Inference Results MIDI Files

References

The examples labeled Kelz et al., 2016 and Sigtia et al., 2016 come from our reimplementations of the models described in:

Rainer Kelz, Matthias Dorfer, Filip Korzeniowski, Sebastian Böck, Andreas Arzt, and Gerhard Widmer. On the potential of simple framewise approaches to piano transcription. arXiv preprint arXiv:1612.05153, 2016.

Siddharth Sigtia, Emmanouil Benetos, and Simon Dixon. An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 24(5):927–939, 2016.


This work, other than the examples from Musopen, is a derivative of the MAPS Database and is licensed under CC BY-NC-SA 4.0.