Prompt | Generated (Cold Init.-1.3B) | Generated (TWIST-1.3B) | Generated (TWIST-7B) |
Prompt | Generated | Ground truth (resynthesized from speech tokens using a vocoder) |
@article{hassid2024textually,
  title={Textually pretrained speech language models},
  author={Hassid, Michael and Remez, Tal and Nguyen, Tu Anh and Gat, Itai and Conneau, Alexis and Kreuk, Felix and Copet, Jade and Defossez, Alexandre and Synnaeve, Gabriel and Dupoux, Emmanuel and others},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}