What Language is This? Ask Your Tokenizer
arXiv:2602.17655v1 (Announce Type: new)
Abstract: Language Identification (LID) is an important component of many multilingual natural language processing pipelines, where it facilitates corpus curation, training …
Clara Meister, Ahmetcan Yavuz, Pietro Lesci, Tiago Pimentel