MAESTRO Dataset
Samples from the MAESTRO test split (left) re-synthesized by the WaveNet model trained on MAESTRO (center) and basic MIDI synthesis (right).
| Real Audio | WaveNet Synthesis | Other Synthesis |
| Domenico Scarlatti - Sonata in B Minor, K. 87 | ||
| Franz Schubert - Moments Musicaux Op. 94 No. 3 in F-sharp Minor | ||
| Frédéric Chopin - Mazurka in D Major, Op. 33, No. 2 | ||
These are 1800-step samples from the Music Transformer model, synthesized by the WaveNet model trained on MAESTRO (left) and basic MIDI synthesis (right).
| | WaveNet Synthesis | Other Synthesis |
| Sample 1 | | |
| Sample 2 | | |
| Sample 3 | | |
| Sample 4 | | |
Random samples from MAESTRO.
Clips generated by the WaveNet model trained with audio from MAESTRO with no conditioning.
Clips generated by the WaveNet model trained with audio/MIDI pairs from the MAESTRO training and validation splits, conditioned on random 20-second MIDI subsequences from the MAESTRO test split.
Clips generated by the WaveNet model trained with audio and transcribed MIDI from MAESTRO-T (see section 4), conditioned on random 20-second subsequences from the MAESTRO test split.
Clips generated by the WaveNet model trained with audio and transcribed MIDI from MAESTRO-T (see section 4), conditioned on random 20-second subsequences from the Music Transformer model described in section 5 that was trained on MAESTRO-T.
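The conditioned clips above all use random 20-second MIDI subsequences. As a rough illustration of what cropping such a window could look like, here is a minimal sketch. The `Note` type and the `random_window` helper are hypothetical names introduced for this example, and the uniform choice of start offset is an assumption; the procedure actually used to pick the subsequences is not described here.

```python
import random
from typing import List, NamedTuple, Optional, Tuple


class Note(NamedTuple):
    """Hypothetical note record: MIDI pitch plus onset/offset in seconds."""
    pitch: int
    start: float
    end: float


def random_window(
    notes: List[Note],
    total_duration: float,
    window: float = 20.0,
    rng: Optional[random.Random] = None,
) -> Tuple[float, List[Note]]:
    """Crop a randomly placed window of `window` seconds from a note list.

    Assumption: the window start is drawn uniformly over all valid offsets.
    Returned note times are relative to the start of the window.
    """
    rng = rng or random.Random()
    max_offset = max(0.0, total_duration - window)
    offset = rng.uniform(0.0, max_offset)

    cropped = []
    for note in notes:
        # Shift into window-relative time, then drop notes fully outside it.
        start = note.start - offset
        end = note.end - offset
        if end <= 0.0 or start >= window:
            continue
        # Clip notes that straddle either window boundary.
        cropped.append(Note(note.pitch, max(start, 0.0), min(end, window)))
    return offset, cropped
```

Notes that cross a window boundary are clipped rather than dropped, so a sustained note that began before the window still sounds at time 0.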
As a fun side effect, we can also alter performances at the MIDI level and resynthesize them, which gives a different and more natural sound than traditional signal processing techniques applied directly to the audio. The audio alterations were performed with Ableton Live 10 in "Complex" mode. All samples are from Prelude and Fugue in A Minor, WTC I, BWV 865 by Johann Sebastian Bach.
| | Original Audio | MIDI alteration, WaveNet Synthesis | Audio alteration |
| Shift up by 1 octave | | | |
| Shift down by 1 octave | | | |
| Reduce tempo by 50% | | | |
| Increase tempo by 100% | | | |
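The MIDI-level alterations in the table are simple operations on note data. The sketch below shows one way they might be expressed, assuming a performance is represented as a list of notes with a pitch and onset/offset times in seconds. The `Note` type and both helper functions are hypothetical and are not taken from the authors' code.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note record: MIDI pitch plus onset/offset in seconds."""
    pitch: int
    start: float
    end: float


def transpose(notes: List[Note], semitones: int) -> List[Note]:
    """Shift every pitch by `semitones` (12 = up one octave, -12 = down one).

    Notes that would leave the valid MIDI pitch range 0..127 are dropped.
    """
    return [
        note._replace(pitch=note.pitch + semitones)
        for note in notes
        if 0 <= note.pitch + semitones <= 127
    ]


def scale_tempo(notes: List[Note], factor: float) -> List[Note]:
    """Scale the tempo by `factor`, leaving pitches untouched.

    factor=0.5 corresponds to "reduce tempo by 50%" (every time doubles);
    factor=2.0 corresponds to "increase tempo by 100%" (every time halves).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [
        note._replace(start=note.start / factor, end=note.end / factor)
        for note in notes
    ]
```

Because these edits change only the symbolic score, pitch and timing are modified independently; the altered notes would then be passed to the synthesis model rather than stretching or shifting a recorded waveform.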
We find that longer samples often have timbral shifts due to variation in recording settings in the ground truth data. By training with a conditioning signal for the year of the recording, we can force the model to generate with a single timbre over long time scales.
For example, this WaveNet synthesis of Prelude and Fugue in A Minor, WTC I, BWV 865 by Johann Sebastian Bach includes a timbral shift at time 0:34:
Here we synthesize this same score with different year conditionings:
| Year | Audio |
| 2004 | |
| 2006 | |
| 2008 | |
| 2009 | |
| 2011 | |
| 2013 | |
| 2014 | |
| 2015 | |
| 2017 |
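A conditioning signal for the recording year has to be turned into some numeric input for the model. One common option, shown here purely as an assumption since the encoding actually used is not stated, is a one-hot vector over the years listed in the table. The names `YEARS` and `year_one_hot` are hypothetical.

```python
from typing import List

# Recording years listed in the table above.
YEARS = [2004, 2006, 2008, 2009, 2011, 2013, 2014, 2015, 2017]


def year_one_hot(year: int) -> List[float]:
    """Encode a recording year as a one-hot vector over YEARS.

    Assumption: one-hot encoding is only an illustrative choice; the
    model's real year conditioning may be encoded differently.
    """
    if year not in YEARS:
        raise ValueError(f"unknown recording year: {year}")
    return [1.0 if y == year else 0.0 for y in YEARS]
```

Holding this vector fixed for an entire score is what would keep the conditioning, and therefore the targeted timbre, constant over a long sample.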