Learning to Groove with Inverse Sequence Transformations

Online Supplement

Related Material

Main Paper on arXiv
Dataset Download
Blog Post

Contents

Unconditional Samples
Humanization
Tap2Drum
Groove Transfer
Infilling (Hi-Hats)
KNN vs Seq2Seq (Humanization)

Unconditional Samples

These 3 examples were generated by training a recurrent VAE model on our data. This model is similar to the other models described in the paper, except that here we do not remove any performance-characteristic information from the inputs. Here we demonstrate unconditional samples from the model (each is 4 measures instead of 2).

Humanization

These 3 examples are drawn from the pool used in our listening test comparing Seq2Seq Humanization against real data. In the listening test, matched examples were not necessarily paired together; in each comparison, one random example was drawn from the pool of real performances and one random example was drawn from the pool of generated ones.

We include the "Quantized" audio to demonstrate what the input to the Humanization model sounds like with quantized notes and constant velocities.

[Audio table: examples 1–3, each with Original, Quantized, and Generated (Seq2Seq) recordings.]
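As a sketch of what "Quantized" means here, the performance can be factored into hits H, velocities V, and timing offsets O; quantization keeps H, forces every hit to a constant velocity, and zeroes the offsets. The 2×9 shapes and the constant 0.8 below are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def quantize(hits, velocities, offsets, constant_velocity=0.8):
    """Quantize a performance: keep the hits, force a constant
    velocity on every hit, and zero all timing offsets."""
    q_vel = np.where(hits > 0, constant_velocity, 0.0)
    q_off = np.zeros_like(offsets)
    return hits, q_vel, q_off

# Tiny example: a 2-step, 9-instrument pattern with one kick hit.
H = np.zeros((2, 9)); H[0, 0] = 1.0
V = np.zeros((2, 9)); V[0, 0] = 0.93   # expressive velocity
O = np.zeros((2, 9)); O[0, 0] = -0.04  # played slightly early
qH, qV, qO = quantize(H, V, O)
```

The Humanization model then learns the inverse mapping: predicting realistic V and O from H alone.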

Tap2Drum

These 3 examples are drawn from the pool used in our listening test comparing Seq2Seq Tap2Drum against real data. In the listening test, matched examples were not necessarily paired together; in each comparison, one random example was drawn from the pool of real performances and one random example was drawn from the pool of generated ones.

We include the "Tap" audio to demonstrate what the input to the Tap2Drum model sounds like.

[Audio table: examples 1–3, each with Original, Tap, and Generated (Seq2Seq) recordings.]
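One plausible way to picture the "Tap" input is as the full performance collapsed onto a single channel. The reduction below (keep the loudest hit's velocity and offset at each step) is an illustrative assumption, not necessarily the exact mapping used in the paper:

```python
import numpy as np

def performance_to_tap(hits, velocities, offsets):
    """Collapse a (timesteps, 9) performance into one tap channel,
    keeping the loudest hit's velocity and offset at each step."""
    tap_hit = (hits.sum(axis=1, keepdims=True) > 0).astype(float)
    tap_vel = velocities.max(axis=1, keepdims=True)
    loudest = velocities.argmax(axis=1)
    tap_off = offsets[np.arange(len(offsets)), loudest][:, None]
    return tap_hit, tap_vel * tap_hit, tap_off * tap_hit

# Example: step 0 has a loud kick and a quiet hi-hat; step 1 is silent.
H = np.zeros((2, 9)); H[0, 0] = 1.0; H[0, 2] = 1.0
V = np.zeros((2, 9)); V[0, 0] = 0.9; V[0, 2] = 0.4
O = np.zeros((2, 9)); O[0, 0] = -0.05; O[0, 2] = 0.02
tap_h, tap_v, tap_o = performance_to_tap(H, V, O)
```

The Tap2Drum model then maps this single-channel rhythm back to a full drum performance.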

This video visualizes the results of using Tap2Drum in a DAW.

This video visualizes the results of using Tap2Drum on a bassline in a DAW.

Groove Transfer

These 3 audio examples demonstrate the Groove Transfer method. In each example, we extract the drum score, or "hits" H, from the "Target Beat", and the groove, i.e. the performance characteristics V (velocities) and O (timing offsets), from the "Source Groove". Together these are passed to our decoder, which generates the "Transferred" output. The resulting performance generally follows the score of the Target Beat but the groove of the Source Groove. It is not always clear that a given transfer can yield a realistic output; for this reason, these outputs tend to be less realistic than those of our other models, though the application is quite different.

[Audio table: examples 1–3, each with Source Groove, Target Beat, and Transfer recordings.]
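The recombination described above can be sketched as follows; `decode` is a hypothetical stand-in for the trained decoder, and the shapes are illustrative:

```python
import numpy as np

def groove_transfer(target, source, decode):
    """Take the score (hits H) from the target beat and the groove
    (velocities V, offsets O) from the source performance, then pass
    them to the decoder."""
    target_H, _, _ = target
    _, source_V, source_O = source
    return decode(target_H, source_V, source_O)

# Identity "decoder" just to show the plumbing.
target = (np.ones((2, 9)), np.zeros((2, 9)), np.zeros((2, 9)))
source = (np.zeros((2, 9)), np.full((2, 9), 0.5), np.full((2, 9), 0.1))
out_H, out_V, out_O = groove_transfer(
    target, source, decode=lambda h, v, o: (h, v, o))
```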

This video visualizes the results of Groove Transfer in a DAW.

Infilling (Hi-Hats)

These 3 examples are drawn from the pool used in our listening test comparing Seq2Seq Infilling against real data. In the listening test, matched examples were not necessarily paired together; in each comparison, one random example was drawn from the pool of real performances and one random example was drawn from the pool of generated ones.

The "Minus Hi-Hats" audio demonstrates what the inputs to this model sound like: the Infilling model takes drums with no hi-hats as input and generates a hi-hat part.

[Audio table: examples 1–3, each with Original, Minus Hi-Hats, and Generated (Adds Hi-Hats) recordings.]
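The "Minus Hi-Hats" input can be pictured as masking the hi-hat channels of the hit, velocity, and offset matrices. The channel indices below are a hypothetical assumption about the 9-class drum mapping, not the dataset's actual layout:

```python
import numpy as np

# Hypothetical hi-hat channel indices within the 9-class drum mapping;
# the actual mapping used in the dataset may differ.
HIHAT_CHANNELS = [2, 3]

def remove_hihats(hits, velocities, offsets):
    """Zero out the hi-hat channels to form the "Minus Hi-Hats" input."""
    keep = np.ones(hits.shape[1])
    keep[HIHAT_CHANNELS] = 0.0
    return hits * keep, velocities * keep, offsets * keep

H = np.ones((2, 9)); V = np.full((2, 9), 0.7); O = np.full((2, 9), 0.01)
mh, mv, mo = remove_hihats(H, V, O)
```

The Infilling model is then trained to reconstruct the missing channels from the masked input.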

This video visualizes the results of Infilling in a DAW.

KNN vs Seq2Seq (Humanization)

These 3 examples are drawn from the pool used in our listening test comparing Humanization with Seq2Seq against Humanization with KNN. In the listening test, matched examples were not necessarily paired together; in each comparison, one random example was drawn from each of the two pools.

The "Quantized" audio demonstrates what the drums sound like with quantized notes and constant velocities. These are exact renderings of the drum scores with no performance characteristics. Our Humanization models take these as input and learn to perform them, as shown in the "KNN" and "Seq2Seq" audio.

[Audio table: examples 1–3, each with Quantized, KNN, and Seq2Seq recordings.]
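One plausible form of a KNN humanization baseline is a 1-nearest-neighbour lookup over hit patterns: find the training score closest to the query and borrow its groove. The details below (Hamming distance, copying the neighbour's V and O onto the query's hits) are illustrative assumptions, not necessarily the paper's exact baseline:

```python
import numpy as np

def knn_humanize(query_hits, train_hits, train_vel, train_off):
    """1-nearest-neighbour humanization sketch: find the training
    pattern whose hit matrix is closest to the query (Hamming
    distance) and borrow its velocities and offsets on the query's hits."""
    dists = np.abs(train_hits - query_hits[None]).sum(axis=(1, 2))
    i = int(dists.argmin())
    return query_hits, train_vel[i] * query_hits, train_off[i] * query_hits

# Two training patterns; the query exactly matches the first.
train_hits = np.zeros((2, 2, 9))
train_hits[0, 0, 0] = 1.0
train_hits[1, 0, 1] = 1.0
train_vel = train_hits * 0.6
train_off = train_hits * -0.03
query = np.zeros((2, 9)); query[0, 0] = 1.0
out_hits, out_vel, out_off = knn_humanize(query, train_hits, train_vel, train_off)
```

Unlike Seq2Seq, such a lookup cannot generalize beyond grooves seen in the training set, which is the contrast the listening test probes.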