Skip to main content

Academic

Academic

Academic · 1 min

SourceBench: Can AI Answers Reference Quality Web Sources?

arXiv:2602.16942v1 Announce Type: new Abstract: Large language models (LLMs) increasingly answer queries by citing web sources, but existing evaluations emphasize answer correctness rather than evidence …

Hexi Jin, Stephen Liu, Yuheng Li, Simran Malik, Yiying Zhang
4 views
Academic · 1 min

Automating Agent Hijacking via Structural Template Injection

arXiv:2602.16958v1 Announce Type: new Abstract: Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate …

Xinhao Deng, Jiaqing Wu, Miao Chen, Yue Xiao, Ke Xu, Qi Li
6 views
Academic · 1 min

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

arXiv:2602.16990v1 Announce Type: new Abstract: Most recommendation benchmarks evaluate how well a model imitates user behavior. In financial advisory, however, observed actions can be noisy …

Yan Wang, Yi Han, Lingfei Qian, Yueru He, Xueqing Peng, Dongji Feng, Zhuohan Xie, Vincent Jim Zhang, Rosie Guo, Fengran Mo, Jimin Huang, Yankai Chen, Xue Liu, Jian-Yun Nie
8 views
Academic · 1 min

Cinder: A fast and fair matchmaking system

arXiv:2602.17015v1 Announce Type: new Abstract: A fair and fast matchmaking system is an important component of modern multiplayer online games, directly impacting player retention and …

Saurav Pal
8 views
Academic · 1 min

M2F: Automated Formalization of Mathematical Literature at Scale

arXiv:2602.17016v1 Announce Type: new Abstract: Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and …

Zichen Wang, Wanli Ma, Zhenyu Ming, Gong Zhang, Kun Yuan, Zaiwen Wen
6 views