Video of Improvisational Session with ReaLJam
Below is a video of a user playing live with ReaLJam: the user plays a melody and an AI agent plays accompanying chords in real time. After the user starts the live session, they begin improvising a melody on an external MIDI piano, lighting up the on-screen piano keys in orange. Once the silence period ends (8 beats in), the chords anticipated by the AI agent appear in blue at the top of the middle region and fall downward until they reach the piano and sound.
This performance highlights several of ReaLJam's strengths. First, even though the user is not listening to the metronome, they are able to synchronize well with the agent (which generates in discrete time frames). Second, the agent harmonizes well with the melody, even though the melody often deviates from notes in the major or minor scales. This is enabled by the agent's ability to anticipate the user's melody, as well as the user being able to see which chords the agent plans to play next. Additionally, the chord progressions are often surprising and interesting, notably the transition from Bbm to Bb at 0:27 or Eb to B at 1:42. Third, the performance has a consistent musical structure throughout, with the harmony often straying from but consistently returning to a main theme centered around Ab. This shows that the agent not only harmonizes and anticipates well, but also maintains a conception of the main musical ideas in a performance.
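To make the anticipation mechanism concrete, here is a minimal sketch of how an agent generating in discrete time frames can stay ahead of the current playback position, so upcoming chords can be displayed before they sound. This is illustrative only, not the paper's implementation; `agent.predict` is an assumed interface.

```python
# Hypothetical sketch of anticipatory chord scheduling (not the paper's code).
# The agent predicts chords a few frames ahead of the current playback
# position so the UI can display them "falling" toward the piano in advance.

def schedule_chords(agent, melody_history, current_frame, lookahead=4):
    """Return (frame, chord) pairs for upcoming frames.

    `agent.predict` is a placeholder for a model that maps the melody
    heard so far to a chord for a given future frame.
    """
    upcoming = []
    for offset in range(1, lookahead + 1):
        frame = current_frame + offset
        chord = agent.predict(melody_history, frame)  # assumed interface
        upcoming.append((frame, chord))
    return upcoming
```

Displaying this lookahead is what lets the user see and react to the agent's plan before the chords actually play.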
Users created unique and cohesive performances using ReaLJam (Section 4.1)
As discussed in the paper, users overall enjoyed their experience with ReaLJam. In addition to commenting on the system's functionality, they described particularly exceptional moments where the chord agent was highly in line with their melodies or even surprised them with interesting accompaniments. Below we show a performance from each participant in the study where the melody and chords are well synchronized and contain several satisfying moments. These notably span diverse musical styles, demonstrating ReaLJam's ability to generalize across diverse inputs and support various users' needs.
Reinforcement learning is necessary (Section 4.2 paragraph 1)
We found that only agents trained with reinforcement learning (RL) could deliver consistent and stable accompaniments. Below we show examples from the user study where participants play with the Online model, which is pre-trained with only a supervised objective. For comparison, we show each participant playing with the ReaLchords-S model (an agent fine-tuned with RL) with a commit period of 0 beats for fairness (since the Online model must also use a commit period of 0 beats, as discussed in Section 3.2). In the examples below with the Online model:
- In the first 3 examples, the Online model fails completely, only playing chords sporadically and with poor harmonization.
- In examples 4 and 5, the Online model produces chords for most of the performance, yet they are lower in quality than their RL counterparts, and the model also fails completely near the end of the performance.
- Only in the last example does the Online model successfully accompany the user the whole time.
Warm-starting prevents bad chords early on (Section 4.2 paragraph 3)
We discovered that warm-starting generation was necessary to avoid bad chords from the agent early on. We show examples from the experiment where the silence period is set to 0 beats (effectively disabling warm-starting, as described in Section 2.3.2). For comparison, we show the corresponding baseline performance for each participant, where the silence period is 8 beats and warm-starting occurs. In every case, when there is no warm-starting, the agent generates an initial chord that does not match the user's melody at all. The agent usually recovers after one or two more chords, although this bad start can negatively affect the rest of the performance, as it does in the first example. In the baseline performances, the first chord always matches the melody well, showing that warm-starting is necessary for a good start to the performance.
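A minimal sketch of how warm-starting via a silence period might work (illustrative names; `agent.step` is an assumed interface, not the paper's code): the agent consumes the user's melody every beat, but its chords are only emitted once the silence period has elapsed, so the first audible chord is conditioned on real melodic context rather than on nothing.

```python
# Hypothetical sketch of warm-starting via a silence period (assumed behavior
# per Section 2.3.2; names are illustrative, not the paper's code).

def run_session(agent, melody_frames, silence_beats=8):
    """Feed the melody to the agent beat by beat, emitting chords only
    after the silence period has elapsed."""
    emitted = []
    for beat, melody in enumerate(melody_frames):
        chord = agent.step(melody)      # agent updates its context every beat
        if beat >= silence_beats:
            emitted.append(chord)       # audible only after warm-start period
        # during the silence period, the chord is discarded (warm-start)
    return emitted
```

With silence_beats set to 0, the first emitted chord is generated before the agent has heard any melody, which matches the bad initial chords observed above.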
Commit period reduces chord instability (Section 4.2 paragraph 3)
We discovered that the commit period not only acts as a tool to help users anticipate chords, but can also stabilize chord predictions. We show examples from the experiment where the commit time is set to 0 beats and compare with the corresponding baseline performances, where the commit time is 4 beats. With a commit period of 0, we found the agent was more likely to produce unwanted artifacts in the form of rapid 1/16th-note chord changes, which we hypothesize are due to the agent's plan changing between requests. While these artifacts occasionally occur in the baseline performances (see the last example) due to randomness in decoding, they are considerably reduced.
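One way to picture the commit period is as a freeze on the near-future portion of the agent's plan. The sketch below (hypothetical names, not the paper's implementation) merges a freshly generated plan into the committed chord timeline, revising only chords beyond the commit horizon, which prevents the rapid flip-flopping described above.

```python
# Hypothetical sketch of a commit period (illustrative names). Chords within
# `commit_beats` of the current beat are frozen; a new agent plan may only
# revise chords beyond that horizon.

def merge_plan(committed, new_plan, current_beat, commit_beats=4):
    """Merge a freshly generated plan into the committed chord timeline.

    `committed` and `new_plan` map beat -> chord. Beats earlier than
    current_beat + commit_beats keep their already-committed chord.
    """
    horizon = current_beat + commit_beats
    merged = dict(committed)
    for beat, chord in new_plan.items():
        if beat >= horizon:          # only beats past the horizon may change
            merged[beat] = chord
    return merged
```

With commit_beats set to 0, every request may rewrite even the very next chord, which is consistent with the rapid 1/16th-note chord changes observed in the 0-commit condition.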
Users performed better with their preferred settings (Section 4.3)
We found that many of the interface settings had a strong impact on user preference and the quality of the resulting performances. Below, we show pairs of user performances where one uses the participant's preferred setting (indicated verbally and by what they chose in their last performance) and the other does not. We find that:
- In the first example, the user preferred to see the incoming chords, and we can hear they struggled to synchronize their melody with the chords when they weren't shown.
- In the second example, the user preferred to set the commit time to 0 beats instead of 4, and we can hear the chords are more nuanced and harmonically aligned with the melody when using 0 commit.
- In the third example, the user preferred to not listen to the metronome, and we can hear that their performance is more expressive and less rigid than when the metronome is on.