Voice conversion is the task of making an utterance spoken by one speaker sound as if it were uttered by a different speaker, while keeping other aspects, such as content, the same. Existing methods focus primarily on spectral features like timbre, but ignore each speaker's unique speaking style, which often affects prosody. In this study, we introduce a method for converting not only the timbre, but also the rhythm and pitch changes to those of the target speaker. In addition, we do so in the many-to-many setting with no paired data. We use pretrained, self-supervised, discrete units, which make our approach extremely lightweight. We introduce a suite of quantitative and qualitative evaluation metrics for this setup, and show that our approach outperforms existing methods.

Sample | Source | Target | Speech Resynthesis | AutoPST | DISSC_Rhythm | DISSC_Both |
---|---|---|---|---|---|---|
p231_020 | | | | | | |
p245_019 | | | | | | |

Sample | Source | Target | Speech Resynthesis | AutoPST | DISSC_Rhythm |
---|---|---|---|---|---|
0017Happy_020 | | | | | |
0019Sad_028 | | | | | |

Sample | Source | Target | Speech Resynthesis | DISSC_Pitch | DISSC_Rhythm | DISSC_Both |
---|---|---|---|---|---|---|
p270_001 | | | | | | |
p231_021 | | | | | | |
p245_014 | | | | | | |

Sample | Abnormal | Original | AutoPST | DISSC_Rhythm |
---|---|---|---|---|
p239_010 | | | | |

Source | Target Speaker | DISSC_Both |
---|---|---|

@misc{https://doi.org/10.48550/arxiv.2212.09730,
doi = {10.48550/ARXIV.2212.09730},
url = {https://arxiv.org/abs/2212.09730},
author = {Maimon, Gallil and Adi, Yossi},
keywords = {Sound (cs.SD), Computation and Language (cs.CL), Machine Learning (cs.LG), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering},
title = {Speaking Style Conversion With Discrete Self-Supervised Units},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}