Main Paper on arXiv
Source Code
Interactive Colab Notebook
Video Demonstration
Examples from MAPS Dataset
Examples from Musopen
Examples of Current Metric Limitations
Higher Time Resolution Example
Full Results
References
| Source | Example |
|---|---|
| Disklavier | |
| Onsets and Frames | |
| Onsets and Frames with Velocity | |
| Kelz et al., 2016 | |
| Sigtia et al., 2016 | |
| Source | Example |
|---|---|
| Disklavier | |
| Onsets and Frames | |
| Onsets and Frames with Velocity | |
| Kelz et al., 2016 | |
| Sigtia et al., 2016 | |
| Source | Note | Example |
|---|---|---|
| Ground Truth MIDI | | |
| Disklavier | Notice the missing notes that the Disklavier did not play. | |
| Onsets and Frames | | |
| Onsets and Frames with Velocity | | |
| Kelz et al., 2016 | | |
| Sigtia et al., 2016 | | |
This example demonstrates the model's ability to transcribe piano recordings that are completely unrelated to the training set.
| Original Recording | Onsets and Frames |
|---|---|
| | |
This transcription is a derivative of the recording of Keyboard Sonata in A♭, K. 127 and is in the public domain.
This recording is of a harpsichord. The results are less accurate than they would be for a piano recording, but the model works surprisingly well given that the training data contains no harpsichord recordings.
| Original Recording | Onsets and Frames |
|---|---|
| | |
This transcription is a derivative of the recording of Sonata in D minor, K. 9 and is licensed under CC BY-SA 3.0.
All examples use Beethoven's Für Elise.
| Modification | Comment | Example |
|---|---|---|
| Many 1-frame notes added | This sequence would get a high frame score (~80) despite being almost unlistenable. | |
| One extremely long note added | This sequence sounds perfect but would get the same frame score as the previous example. | |
| Note timing jittered, but still within the 50 ms tolerance | This sequence would get a perfect score with current metrics even though it is clearly not an accurate representation of the performance. | |
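The failure modes in this table can be reproduced on toy data. The sketch below is not the paper's evaluation code; it assumes notes are hypothetical `(onset_ms, offset_ms, midi_pitch)` tuples, a 32 ms frame length chosen only for illustration, and a simple greedy onset matcher using the 50 ms tolerance. Frame F1 only counts active (frame, pitch) cells, so 32 one-frame notes and a single 32-frame note cost exactly the same, while onset matching ignores any timing error that stays inside the tolerance window.

```python
# Sketch: why frame F1 and 50 ms onset matching can miss what a listener hears.
# Assumptions (hypothetical, not the paper's evaluation code): notes are
# (onset_ms, offset_ms, midi_pitch) tuples, frames are 32 ms long, and onsets
# are matched greedily within a 50 ms window.

FRAME_MS = 32        # hypothetical frame length
ONSET_TOL_MS = 50    # onset tolerance discussed in the table above


def active_cells(notes):
    """Set of (frame, pitch) cells covered by the notes, i.e. a piano roll."""
    cells = set()
    for onset, offset, pitch in notes:
        start = onset // FRAME_MS
        end = max(start + 1, offset // FRAME_MS)  # every note covers >= 1 frame
        cells.update((f, pitch) for f in range(start, end))
    return cells


def frame_f1(ref, est):
    """Frame-level F1 = 2*TP / (|ref cells| + |est cells|)."""
    r, e = active_cells(ref), active_cells(est)
    return 2 * len(r & e) / (len(r) + len(e))


def onset_f1(ref, est, tol=ONSET_TOL_MS):
    """Note-level F1: a match needs equal pitch and an onset within tol ms."""
    unmatched = list(est)
    tp = 0
    for onset, _, pitch in ref:
        for i, (e_onset, _, e_pitch) in enumerate(unmatched):
            if e_pitch == pitch and abs(e_onset - onset) <= tol:
                tp += 1
                del unmatched[i]  # each estimated note can match only once
                break
    return 2 * tp / (len(ref) + len(est))


# Opening of Für Elise (E5 D#5 E5 D#5 E5 B4 D5 C5), 256 ms (8 frames) per note.
PITCHES = [76, 75, 76, 75, 76, 71, 74, 72]
REF = [(256 * i, 256 * (i + 1), p) for i, p in enumerate(PITCHES)]

# 1) 32 spurious one-frame notes on a pitch the melody never uses.
MANY_SHORT = REF + [(64 * j, 64 * j + 32, 60) for j in range(32)]
# 2) One long spurious note covering the same number of frames (32).
ONE_LONG = REF + [(0, 1024, 60)]
# 3) Every note shifted by 40 ms, alternating direction.
JITTERED = [(on + d, off + d, p) for (on, off, p), d in zip(REF, [40, -40] * 4)]

print(frame_f1(REF, MANY_SHORT))  # 0.8
print(frame_f1(REF, ONE_LONG))    # 0.8, identical despite sounding very different
print(onset_f1(REF, JITTERED))    # 1.0, perfect despite audible timing errors
```

A real evaluation would normally rely on a library such as mir_eval, which matches notes with an optimal bipartite matching rather than this greedy loop; the greedy version is used here only to keep the sketch short.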
| Model | Comment | Example |
|---|---|---|
| Onsets and Frames trained at 24 ms timing resolution | While the higher time resolution is evident, the model also adds more extraneous notes. | |
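Timing resolution here is the frame size: a predicted onset can only land on a frame boundary. As a rough illustration (an assumption, not the model's actual decoding), the sketch below snaps integer-millisecond onsets down to the start of their frame and reports the worst-case error, comparing the 24 ms setting with a hypothetical coarser 32 ms frame.

```python
# Sketch: frame size bounds how precisely an onset can be placed.
# Assumption: each onset is snapped down to the start of the frame that
# contains it; the 32 ms comparison frame is hypothetical.

def snap_to_frame(onset_ms, frame_ms):
    """Start time (ms) of the frame containing onset_ms."""
    return (onset_ms // frame_ms) * frame_ms


def max_snap_error(onsets_ms, frame_ms):
    """Largest distance between an onset and its snapped position."""
    return max(t - snap_to_frame(t, frame_ms) for t in onsets_ms)


onsets = range(1000)  # every integer millisecond in the first second
for frame_ms in (32, 24):
    print(f"{frame_ms} ms frames: worst-case error {max_snap_error(onsets, frame_ms)} ms")
# 32 ms frames: worst-case error 31 ms
# 24 ms frames: worst-case error 23 ms
```

Smaller frames shrink this bound, at the cost of more frames per second for the model to classify.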
Kelz et al., 2016 and Sigtia et al., 2016 are our reimplementations of the models described in:
Rainer Kelz, Matthias Dorfer, Filip Korzeniowski, Sebastian Böck, Andreas Arzt, and Gerhard Widmer. On the potential of simple framewise approaches to piano transcription. arXiv preprint arXiv:1612.05153, 2016.
Siddharth Sigtia, Emmanouil Benetos, and Simon Dixon. An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 24(5):927–939, 2016.
This work, other than the examples from Musopen, is a derivative of the MAPS Database and is licensed under CC BY-NC-SA 4.0.