Category

Academic

Academic · 1 min

Training Agents to Self-Report Misbehavior

arXiv:2602.22303v1. Abstract: Frontier AI agents may pursue hidden goals while concealing their pursuit from oversight. Alignment training aims to prevent such behavior …

Bruce W. Lee, Chen Yueh-Han, Tomek Korbak
Academic · 1 min

A 1/R Law for Kurtosis Contrast in Balanced Mixtures

arXiv:2602.22334v1. Abstract: Kurtosis-based Independent Component Analysis (ICA) weakens in wide, balanced mixtures. We prove a sharp redundancy law: for a standardized projection …

Yuda Bi, Wenjun Xiao, Linhao Bai, Vince D Calhoun
Academic · 1 min

Calibrated Test-Time Guidance for Bayesian Inference

arXiv:2602.22428v1. Abstract: Test-time guidance is a widely used mechanism for steering pretrained diffusion models toward outcomes specified by a reward function. Existing …

Daniel Geyfman, Felix Draxler, Jan Groeneveld, Hyunsoo Lee, Theofanis Karaletsos, Stephan Mandt