MAESTRO Dataset
Samples from the MAESTRO test split (left) re-synthesized by the WaveNet model trained on MAESTRO (center) and basic MIDI synthesis (right).
Real Audio | WaveNet Synthesis | Other Synthesis |
Domenico Scarlatti - Sonata in B Minor, K. 87 | ||
Franz Schubert - Moments Musicaux Op. 94 No. 3 in F-sharp Minor | ||
Frédéric Chopin - Mazurka in D Major, Op. 33, No. 2 | ||
These are 1800-step samples from the Music Transformer model, synthesized by the WaveNet model trained on MAESTRO (left) and basic MIDI synthesis (right).
WaveNet Synthesis | Other Synthesis | |
Sample 1 | ||
Sample 2 | ||
Sample 3 | ||
Sample 4 |
Random samples from MAESTRO.
Clips generated by the WaveNet model trained with audio from MAESTRO with no conditioning.
Clips generated by the WaveNet model trained with audio/MIDI pairs from the MAESTRO training and validation splits, conditioned on random 20-second MIDI subsequences from the MAESTRO test split.
Clips generated by the WaveNet model trained with audio and transcribed MIDI from MAESTRO-T (see section 4), conditioned on random 20-second subsequences from the MAESTRO test split.
Clips generated by the WaveNet model trained with audio and transcribed MIDI from MAESTRO-T (see section 4), conditioned on random 20-second subsequences from the Music Transformer model described in section 5 that was trained on MAESTRO-T.
As a fun side effect, we can also alter performances at the MIDI level and resynthesize them, which gives a different and more natural sound than traditional signal processing techniques applied to the audio. The audio alterations were performed with Ableton Live 10 in "Complex" mode. All samples are from Prelude and Fugue in A Minor, WTC I, BWV 865 by Johann Sebastian Bach.
Original Audio | MIDI alteration, WaveNet Synthesis | Audio alteration | |
Shift up by 1 octave | |||
Shift down by 1 octave | |||
Reduce tempo by 50% | |||
Increase tempo by 100% | |||
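The four MIDI alterations in the table above amount to simple arithmetic on note events. The sketch below is an illustration only, not the code used to produce these samples: the `Note` class and the `transpose` and `scale_tempo` helpers are hypothetical names, and notes are assumed to be stored as pitch numbers with onset and release times in seconds.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Note:
    """Hypothetical note event (an assumption, not a MAESTRO data type)."""
    pitch: int          # MIDI pitch number, 0-127
    start: float        # onset time in seconds
    end: float          # release time in seconds
    velocity: int = 64  # MIDI velocity, 0-127


def transpose(notes: List[Note], semitones: int) -> List[Note]:
    """Shift every pitch by `semitones`; an octave is 12 semitones.

    Notes pushed outside the valid MIDI range are dropped.
    """
    shifted = [replace(n, pitch=n.pitch + semitones) for n in notes]
    return [n for n in shifted if 0 <= n.pitch <= 127]


def scale_tempo(notes: List[Note], factor: float) -> List[Note]:
    """Play the notes at `factor` times the original tempo.

    factor=0.5 reduces tempo by 50% (all times double);
    factor=2.0 increases tempo by 100% (all times halve).
    """
    if factor <= 0:
        raise ValueError("tempo factor must be positive")
    return [replace(n, start=n.start / factor, end=n.end / factor) for n in notes]


notes = [Note(pitch=60, start=0.0, end=1.0), Note(pitch=64, start=1.0, end=2.0)]
octave_up = transpose(notes, 12)      # "Shift up by 1 octave"
octave_down = transpose(notes, -12)   # "Shift down by 1 octave"
half_speed = scale_tempo(notes, 0.5)  # "Reduce tempo by 50%"
double_speed = scale_tempo(notes, 2.0)  # "Increase tempo by 100%"
```

Because the edit happens on symbolic note data before synthesis, pitch and duration change independently, whereas audio-domain time stretching and pitch shifting have to approximate that separation.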
We find that longer samples often have timbral shifts due to variation in recording settings in the ground truth data. By training with a conditioning signal for the year of the recording, we can force the model to generate with a single timbre over long time scales.
For example, this WaveNet synthesis of Prelude and Fugue in A Minor, WTC I, BWV 865 by Johann Sebastian Bach includes a timbral shift at time 0:34:
Here we synthesize this same score with different year conditionings:
Year | Audio |
2004 | |
2006 | |
2008 | |
2009 | |
2011 | |
2013 | |
2014 | |
2015 | |
2017 |
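This page does not say how the year is encoded as a conditioning signal. A common choice for a categorical label like this is a one-hot vector, and the sketch below assumes that encoding purely for illustration; `YEARS` is taken from the table above, and `year_one_hot` is a hypothetical helper.

```python
from typing import List

# Recording years listed in the table above.
YEARS = [2004, 2006, 2008, 2009, 2011, 2013, 2014, 2015, 2017]


def year_one_hot(year: int) -> List[float]:
    """Return a one-hot vector over YEARS (assumed encoding, not confirmed by the source)."""
    if year not in YEARS:
        raise ValueError(f"no recordings listed for year {year}")
    return [1.0 if y == year else 0.0 for y in YEARS]


conditioning = year_one_hot(2011)
```

Holding such a vector fixed for the whole clip is one way to ask a model for a single year's timbre over a long time scale.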
The full anonymized listening study results are available in CSV form: listening_study_anon.csv.
Within this data, the models from the paper have the following identifiers:
Model Name | Identifier |
WaveNet Unconditioned | unconditioned |
WaveNet Transcribed/Transformer | transformer_xs |
WaveNet Transcribed/Test | test_xs |
WaveNet Ground/Test | test |
Ground Truth Recordings | validation |
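When working with the CSV, it can help to translate identifiers back to the model names used in the paper. The sketch below encodes only the table above; the CSV column names are not documented on this page, so `model_name` (a hypothetical helper) works on bare identifier strings and makes no assumption about the file layout.

```python
# Identifier -> model name, copied from the table above.
MODEL_NAMES = {
    "unconditioned": "WaveNet Unconditioned",
    "transformer_xs": "WaveNet Transcribed/Transformer",
    "test_xs": "WaveNet Transcribed/Test",
    "test": "WaveNet Ground/Test",
    "validation": "Ground Truth Recordings",
}


def model_name(identifier: str) -> str:
    """Look up the paper's model name; raise if the identifier is unknown."""
    try:
        return MODEL_NAMES[identifier.strip()]
    except KeyError:
        raise ValueError(f"unknown model identifier: {identifier!r}") from None


label = model_name("test_xs")
```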