UK AISI Alignment Evaluation Case-Study
arXiv:2604.00788v1 Announce Type: new Abstract: This technical report presents methods developed by the UK AI Security Institute for assessing whether advanced AI systems reliably follow …
Academic
arXiv:2604.00009v1 Announce Type: cross Abstract: We present the design rationale, implementation attempt, and failure analysis of Eyla, a proposed identity-anchored LLM architecture that integrates biologically-inspired …
arXiv:2604.00007v1 Announce Type: cross Abstract: We present Dynin-Omni, the first masked-diffusion-based omnimodal foundation model that unifies text, image, and speech understanding and generation, together with …
arXiv:2603.05735v2 Announce Type: cross Abstract: We present an AI agentic measurement of the thrust distribution in $e^{+}e^{-}$ collisions at $\sqrt{s}=91.2$~GeV using archived ALEPH data. The …
arXiv:2604.00016v1 Announce Type: cross Abstract: The validity of online behavioral research relies on study participants being human rather than machine. In the past, it was …
arXiv:2604.01221v1 Announce Type: new Abstract: We present HippoCamp, a new benchmark designed to evaluate agents' capabilities on multimodal file management. Unlike existing agent benchmarks that …
arXiv:2604.00005v1 Announce Type: new Abstract: Emotion plays an important role in human cognition and performance. Motivated by this, we investigate whether analogous emotional signals can …
arXiv:2604.00008v1 Announce Type: cross Abstract: As qualitative researchers show growing interest in using automated tools to support interpretive analysis, a large language model (LLM) is …
arXiv:2604.00228v1 Announce Type: new Abstract: Large language models are trained to refuse harmful requests, but can they accurately predict when they will refuse before responding? …
arXiv:2604.00291v1 Announce Type: new Abstract: Biolinguistics is the interdisciplinary scientific study of the biological foundations, evolution, and genetic basis of human language. It treats language …
arXiv:2604.01597v1 Announce Type: new Abstract: Traditional RL algorithms like Proximal Policy Optimization (PPO) typically train on the entire rollout buffer, operating under the assumption that …
arXiv:2604.00284v1 Announce Type: new Abstract: We formally introduce an improvisational wordplay game called Connections to explore reasoning capabilities of AI agents. Playing Connections combines skills …