Tag: cs.CY

Academic · 1 min

Equitable Evaluation via Elicitation

arXiv:2602.21327v1 Announce Type: cross Abstract: Individuals with similar qualifications and skills may vary in their demeanor, or outward manner: some tend toward self-promotion while others …

Elbert Du, Cynthia Dwork, Lunjia Hu, Reid McIlroy-Young, Han Shao, Linjun Zhang
6 views
Academic · 1 min

Certified Circuits: Stability Guarantees for Mechanistic Circuits

arXiv:2602.22968v1 Announce Type: new Abstract: Understanding how neural networks arrive at their predictions is essential for debugging, auditing, and deployment. Mechanistic interpretability pursues this goal …

Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz, Jonas Fischer
3 views
Academic · 1 min

Evaluating Proactive Risk Awareness of Large Language Models

arXiv:2602.20976v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly embedded in everyday decision-making, their safety responsibilities extend beyond reacting to explicit harmful …

Xuan Luo, Yubin Chen, Zhiyu Hou, Linpu Yu, Geng Tu, Jing Li, Ruifeng Xu
4 views
Academic · 1 min

The Statistical Signature of LLMs

arXiv:2602.18152v1 Announce Type: new Abstract: Large language models generate text through probabilistic sampling from high-dimensional distributions, yet how this process reshapes the structural statistical organization …

Ortal Hadad, Edoardo Loru, Jacopo Nudo, Niccolò Di Marco, Matteo Cinelli, Walter Quattrociocchi
4 views
Academic · 1 min

Towards a Science of AI Agent Reliability

arXiv:2602.16666v1 Announce Type: new Abstract: AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many …

Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan
4 views