Theme Transformer

Abstract

Attention-based Transformer models have been increasingly employed for automatic music generation. To condition the generation process of such a model with a user-specified sequence, a popular approach is to take that conditioning sequence as a priming sequence and ask a Transformer decoder to generate a continuation. However, this prompt-based conditioning cannot guarantee that the conditioning sequence would develop or even simply repeat itself in the generated continuation. In this paper, we propose an alternative conditioning approach, called theme-based conditioning, that explicitly trains the Transformer to treat the conditioning sequence as a thematic material that has to manifest itself multiple times in its generation result. This is achieved with two main technical contributions. First, we propose a deep learning-based approach that uses contrastive representation learning and clustering to automatically retrieve thematic materials from music pieces in the training data. Second, we propose a novel gated parallel attention module to be used in a sequence-to-sequence (seq2seq) encoder/decoder architecture to more effectively account for a given conditioning thematic material in the generation process of the Transformer decoder. We report on objective and subjective evaluations of variants of the proposed Theme Transformer and the conventional prompt-based baseline, showing that our best model can generate, to some extent, polyphonic pop piano music with repetition and plausible variations of a given condition.

Demo

Audio Samples

ID	Theme	Real Data	Baseline	Theme Transformer
875
888
890
893
899
900
901
904
908
909

Figures

ID	First 24 bars	Melody Embedding Distance
875
888
890
893
894
896
899
900
901
904
908
909

Qualitative results for Theme Retrieval

For more details about Theme Retrieval: link

MIDI ID	Piano Roll (.png)	Piano Roll (webpage)	MIDI Files
065	Link	Link	Link
284	Link	Link	Link
310	Link	Link	Link
422	Link	Link	Link
449	Link	Link	Link
464	Link	Link	Link

Comparison of Different Sampling Temperatures at Inference

File: temperature_comparison.zip
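
For readers unfamiliar with the parameter being compared, the following is a minimal sketch of temperature-scaled sampling in general, not the code used to produce the files above. The function names and the example logits are hypothetical. Dividing the logits by a temperature below 1 sharpens the distribution (more conservative output), while a temperature above 1 flattens it (more varied output).

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits over a three-token vocabulary.
example_logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running the loop shows the probability of the most likely token shrinking as the temperature grows, which is the effect the audio files in the archive let you hear.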

Comparison of Different Eps Values in DBSCAN

File: eps_comparison.zip
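
As background for this comparison, here is a small, self-contained sketch of how DBSCAN's `eps` (neighborhood radius) changes the clustering. It is a plain textbook-style implementation written for illustration, not the project's code, and it assumes the clustered items are embedding vectors compared with Euclidean distance. The points and the `min_samples` value are hypothetical.

```python
import math


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    # Each neighborhood includes the point itself, as in the usual formulation.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels


# Hypothetical 1-D "embeddings": two tight groups and one outlier.
points = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,), (5.0,)]
for eps in (0.05, 0.15, 1.0):
    labels = dbscan(points, eps, min_samples=2)
    n_clusters = len(set(labels) - {-1})
    print(f"eps={eps}: {n_clusters} clusters, labels={labels}")
```

With these toy points, a very small `eps` marks everything as noise, a moderate `eps` separates the two groups, and a large `eps` merges them into one cluster. The same trade-off applies when choosing `eps` for grouping repeated musical fragments.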

Citation

If you find this work helpful and use our code in your research, please cite our paper:

@article{shih2021theme,
      title={Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer}, 
      author={Yi-Jen Shih and Shih-Lun Wu and Frank Zalkow and Meinard Müller and Yi-Hsuan Yang},
      year={2021},
      eprint={2111.04093},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}