Main Paper on arXiv
Source Code
Interactive Colab Notebook
Video Demonstration
Examples from MAPS Dataset
Examples from Musopen
Examples of Current Metric Limitations
Higher Time Resolution Example
Full Results
References
Disklavier
Onsets and Frames
Onsets and Frames with Velocity
Kelz et al., 2016
Sigtia et al., 2016

Disklavier
Onsets and Frames
Onsets and Frames with Velocity
Kelz et al., 2016
Sigtia et al., 2016

Ground Truth MIDI
Disklavier | Notice the missing notes that weren't played by the Disklavier.
Onsets and Frames
Onsets and Frames with Velocity
Kelz et al., 2016
Sigtia et al., 2016
This example demonstrates the model's ability to transcribe piano recordings that are completely unrelated to the training set.
Original Recording | Onsets and Frames
This transcription is a derivative of the recording of Keyboard Sonata in Ab, K. 127 and is in the public domain.
This recording is of a harpsichord. The results are not as accurate as they would be for a piano recording, but the model works surprisingly well given that there are no harpsichord recordings in the training data.
Original Recording | Onsets and Frames
This transcription is a derivative of the recording of Sonata in D minor, K. 9 and is licensed under CC BY-SA 3.0.
All examples use Beethoven's Für Elise.
Many 1-frame notes added | This sequence would get a high frame score (~80) despite being almost unlistenable.
One extremely long note added | This sequence sounds perfect but would get the same frame score as the previous example.
Note timing jittered, but still within 50 ms tolerance | This sequence would get a perfect score with current metrics even though it is clearly not an accurate representation of the performance.
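
The effects described above can be reproduced on toy data. The sketch below is not the evaluation code used for the paper. It assumes the frame score is an F1 computed over the cells of a binary piano roll, and it uses a simplified greedy onset matcher in place of a full note-matching procedure. The names `frame_f1` and `onset_f1`, the array shapes, and the note values are all made up for illustration.

```python
import numpy as np


def frame_f1(reference, estimate):
    """Frame-level F1 between boolean piano rolls of shape (frames, pitches)."""
    tp = np.logical_and(reference, estimate).sum()
    fp = np.logical_and(~reference, estimate).sum()
    fn = np.logical_and(reference, ~estimate).sum()
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


def onset_f1(ref_notes, est_notes, tolerance=0.05):
    """Onset-only F1 with greedy matching (a simplification, assumed here).

    Notes are (onset_seconds, pitch) pairs. An estimated note matches a
    reference note if the pitch is equal and the onsets differ by at most
    `tolerance` seconds. Each estimated note is used at most once.
    """
    unused = list(est_notes)
    matched = 0
    for ref_onset, ref_pitch in ref_notes:
        for candidate in unused:
            est_onset, est_pitch = candidate
            if est_pitch == ref_pitch and abs(est_onset - ref_onset) <= tolerance:
                unused.remove(candidate)
                matched += 1
                break
    if matched == 0:
        return 0.0
    precision = matched / len(est_notes)
    recall = matched / len(ref_notes)
    return 2 * precision * recall / (precision + recall)


# Toy reference roll: one note held for 100 frames on pitch index 40.
reference = np.zeros((100, 88), dtype=bool)
reference[:, 40] = True

# Estimate A: the correct note plus 50 isolated 1-frame notes.
many_short = reference.copy()
for i in range(50):
    many_short[2 * i, 60 + (i % 5)] = True

# Estimate B: the correct note plus a single spurious 50-frame note.
one_long = reference.copy()
one_long[:50, 70] = True

print(frame_f1(reference, many_short), frame_f1(reference, one_long))

# Toy note list: 16 notes, 250 ms apart, each onset shifted by +/- 40 ms.
ref_notes = [(0.25 * i, 60 + i % 3) for i in range(16)]
jittered = [
    (onset + (0.04 if i % 2 == 0 else -0.04), pitch)
    for i, (onset, pitch) in enumerate(ref_notes)
]

print(onset_f1(ref_notes, jittered))
```

Both estimates contain 100 true-positive cells and 50 false-positive cells, so the frame F1 is identical (precision 2/3, recall 1) even though the errors are distributed very differently over time. Every jittered onset stays inside the 50 ms window, so the onset score remains perfect despite the audible timing changes.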
Onsets and Frames trained at 24 ms timing resolution | While the higher time resolution is evident, the model also adds more extraneous notes.
Kelz et al., 2016 and Sigtia et al., 2016 are our reimplementations of the models described in:
Rainer Kelz, Matthias Dorfer, Filip Korzeniowski, Sebastian Böck, Andreas Arzt, and Gerhard Widmer. On the potential of simple framewise approaches to piano transcription. arXiv preprint arXiv:1612.05153, 2016.
Siddharth Sigtia, Emmanouil Benetos, and Simon Dixon. An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 24(5):927–939, 2016.
This work, other than the examples from Musopen, is a derivative of the MAPS Database and is licensed under CC BY-NC-SA 4.0.