Joint Audio And Symbolic Conditioning for Temporally Controlled Text-To-Music Generation

Or Tal1,2*, Alon Ziv1*, Itai Gat2, Felix Kreuk2, Yossi Adi1,2



1The Hebrew University of Jerusalem

2FAIR, Meta AI

*Equal Contribution


Abstract


We present JASCO, a temporally controlled text-to-music generation model utilizing both symbolic and audio-based conditions. JASCO can generate high-quality music samples conditioned on global text descriptions along with fine-grained local controls. JASCO is based on the Flow Matching modeling paradigm together with a novel conditioning method. This allows music generation controlled both locally (e.g., chords) and globally (text description). Specifically, we apply information bottleneck layers in conjunction with temporal blurring to extract relevant information with respect to specific controls. This allows the incorporation of both symbolic and audio-based conditions in the same text-to-music model. We experiment with various symbolic conditioning signals (e.g., chords, melody), as well as with audio representations (e.g., separated drum tracks, full-mix). We evaluate JASCO considering both generation quality and condition adherence, using both objective metrics and human studies. Results suggest that JASCO is comparable to the evaluated baselines considering generation quality while allowing significantly better and more versatile controls over the generated music.
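The abstract names temporal blurring as part of the conditioning method but does not spell out the operation. One plausible reading, given here only as an illustrative sketch and not as JASCO's actual implementation, is to average a frame-wise feature sequence over non-overlapping windows and hold each average for the length of its window, which discards fine temporal detail while keeping coarse timing. The function name `temporal_blur`, the window size, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np


def temporal_blur(features: np.ndarray, window: int) -> np.ndarray:
    """Hypothetical temporal blurring: average `features` (n_frames, dim)
    over non-overlapping windows of `window` frames and repeat each average
    across its window, so the output keeps the input shape."""
    if window < 1:
        raise ValueError("window must be a positive number of frames")
    n_frames = features.shape[0]
    blurred = np.empty_like(features, dtype=float)
    for start in range(0, n_frames, window):
        end = min(start + window, n_frames)  # last window may be shorter
        blurred[start:end] = features[start:end].mean(axis=0)
    return blurred


# Toy input: 6 frames of a 1-dimensional feature, blurred in windows of 3.
toy = np.arange(6, dtype=float).reshape(6, 1)
print(temporal_blur(toy, window=3)[:, 0])
```

A larger window passes less timing information from the condition to the model, so the window size acts as a knob on how tightly the output can follow the conditioning signal.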

Melody Conditioning

Input Audio | Text 1 | Text 2
Boléro - Ravel | An 80s driving pop song electronic drums and synth pads in the background | Folk song with accordion and acoustic guitar
[audio] | [audio] | [audio]
Flight of the Bumblebee - Rimsky-Korsakov | Fast tempo country song with dominating banjo and acoustic guitars | Psychodelic trance music with deep synth bass
[audio] | [audio] | [audio]

Drums Conditioning

drum prompt ↓ / text prompt → | Input Audio | 90s rock with electric guitar and heavy drums | Reggae with ukelele and percussions | An 80s driving pop song with heavy drums and synth pads in the background
separated drums | [audio] | [audio] | [audio] | [audio]
separated drums | [audio] | [audio] | [audio] | [audio]
separated drums | [audio] | [audio] | [audio] | [audio]
beatboxing | [audio] | [audio] | [audio] | [audio]
beatboxing | [audio] | [audio] | [audio] | [audio]
beatboxing | [audio] | [audio] | [audio] | [audio]

Chords Conditioning

Chord Progression | 90s rock with electric guitar and heavy drums | Reggae with ukelele and percussions | An 80s driving pop song with heavy drums and synth pads in the background
(C, 0.0), (F, 1.0), (G, 1.75), (C, 4.0), (F, 5.0), (G, 5.75), (C, 8.0), (F, 9.0), (A7, 9.75) | [audio] | [audio] | [audio]
(Em, 0.0), (G, 1.5), (D, 3.0), (A, 4.5), (Em, 6.0), (G, 7.5), (D, 9.0) | [audio] | [audio] | [audio]
(E7, 0.0), (A7, 1.0), (E7, 2.0), (A7, 4.0), (E7, 6.0), (B7, 8.0), (A7, 8.5), (E7, 9.0) | [audio] | [audio] | [audio]
(D, 0.0), (F#m, 2.5), (G, 5.0), (D, 7.5) | [audio] | [audio] | [audio]
(E, 0.0), (D, 1.25), (A, 2.5), (E, 5.0), (D, 6.25), (A, 7.5) | [audio] | [audio] | [audio]
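The chord progressions above appear to be lists of (chord symbol, onset time in seconds) pairs. As a rough illustration of how such a list could be turned into a frame-wise symbolic condition, the sketch below expands it into one chord label per frame. The helper name `chords_to_frames`, the 10-second clip duration, the frame rate of 4 frames per second, and the "N" label for frames before the first chord are all assumptions made for this example, not details taken from the page.

```python
from typing import List, Tuple


def chords_to_frames(
    progression: List[Tuple[str, float]],
    duration: float,
    frame_rate: float,
) -> List[str]:
    """Expand (chord, onset_seconds) pairs into one chord label per frame.

    Each chord is held until the next onset; frames before the first onset
    get the placeholder label "N" (no chord). Hypothetical helper.
    """
    events = sorted(progression, key=lambda event: event[1])
    n_frames = int(round(duration * frame_rate))
    labels = []
    current = -1  # index of the chord active at the current frame
    for frame in range(n_frames):
        t = frame / frame_rate
        # Advance to the latest chord whose onset is at or before time t.
        while current + 1 < len(events) and events[current + 1][1] <= t:
            current += 1
        labels.append(events[current][0] if current >= 0 else "N")
    return labels


# One of the progressions listed above, with an assumed 10 s clip at 4 fps.
progression = [("D", 0.0), ("F#m", 2.5), ("G", 5.0), ("D", 7.5)]
frames = chords_to_frames(progression, duration=10.0, frame_rate=4.0)
print(len(frames), frames[0], frames[10], frames[20], frames[30])
```

A per-frame label sequence like this is one convenient form for a time-aligned condition, since it has the same length as any other frame-wise feature extracted at the same rate.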

Audio Conditioning

Input Audio | 90s rock with electric guitar and heavy drums | Reggae with ukelele and percussions | An 80s driving pop song with heavy drums and synth pads in the background
[audio] | [audio] | [audio] | [audio]
[audio] | [audio] | [audio] | [audio]
[audio] | [audio] | [audio] | [audio]

Sample Sandbox


Source Audio

Input Sample 1 | Input Sample 2 | Input Sample 3
[audio] | [audio] | [audio]


Generated





condition ↓ / text → | 90s rock with electric guitar and heavy drums | lofi slow bpm electro chill with organic samples | Groovy and bright, with funk elements featuring lively horns, bass, and drums to create a positive and confident mood.
Melody | [audio] | [audio] | [audio]
Drums | [audio] | [audio] | [audio]
Chords | [audio] | [audio] | [audio]
Text-Only | [audio] | [audio] | [audio]

Conditioning on Multiple Controls

condition ↓ / text → | A modern Bossa Nova using traditional Brazilian instruments. With nylon stringed guitar, piano, bass, flugel horn and percussion. bpm 112 | Groovy and bright, with funk elements featuring lively horns, bass, and drums to create a positive and confident mood. | Bright and grooving, featuring vocal chops, synthesizers, bass, and beats that create a proud, soaring mood. | Soaring and hopeful, featuring atmospheric electric guitar, floating chops, bouncy choir and light synth drums that create a dreamy, inspirational mood. bpm 105
Source Audio | [audio] | [audio] | [audio] | [audio]
Chords + Drums | [audio] | [audio] | [audio] | [audio]
Chords + Melody | [audio] | [audio] | [audio] | [audio]
Drums + Melody | [audio] | [audio] | [audio] | [audio]
All Controls | [audio] | [audio] | [audio] | [audio]

BibTeX:

@misc{tal2024joint,
  title={Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation},
  author={Or Tal and Alon Ziv and Itai Gat and Felix Kreuk and Yossi Adi},
  year={2024},
  eprint={2406.10970},
  archivePrefix={arXiv},
  primaryClass={cs.SD}
}