All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting
arXiv:2602.17234v1 Announce Type: new Abstract: To evaluate whether LLMs can accurately predict future events, we need the ability to backtest them on events that have …
Zeyu Zhang, Ryan Chen, Bradly C. Stadie