LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation
arXiv:2603.09403v1. Abstract: Validating evaluation metrics for NLG typically relies on expensive and time-consuming human annotations, which predominantly exist only for English datasets. …
Lukáš Eigler, Jindřich Libovický, David Hurych