Ari Spiesberger, Juan J. Vazquez, Nicky Pochinkov, Tomáš Gavenčiak, Peli Grietzer, Gavin Leech, Nandi Schoots

Soft Contamination Means Benchmarks Test Shallow Generalization

arXiv:2602.12413v1 (cross-listed). Abstract: If LLM training data is polluted with benchmark test data, then benchmark performance gives biased estimates of out-of-distribution (OOD) generalization. …
