OpenLID-v3: Improving the Precision of Closely Related Language Identification -- An Experience Report
arXiv:2602.13139v1. Abstract: Language identification (LID) is an essential step in building high-quality multilingual datasets from web data. Existing LID tools (such as …
Mariia Fedorova, Nikolay Arefyev, Maja Buljan, Jindřich Helcl, Stephan Oepen, Egil Rønningstad, Yves Scherrer