TASTE-Streaming: Towards Streamable Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
arXiv:2603.12350v1
Abstract: Text-speech joint spoken language modeling (SLM) aims at natural and intelligent speech-based interactions, but developing such a system may suffer …
Liang-Hsuan Tseng, Hung-yi Lee