Understanding LLM Failures: A Multi-Tape Turing Machine Analysis of Systematic Errors in Language Model Reasoning
arXiv:2602.15868v1 Announce Type: new Abstract: Large language models (LLMs) exhibit failure modes on seemingly trivial tasks. We propose a formalisation of LLM interaction using a deterministic multi-tape Turing machine, where each tape represents a distinct component: input characters, tokens, vocabulary,...
Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens
arXiv:2602.15896v1 Announce Type: new Abstract: Multi-modal knowledge graph reasoning (MMKGR) aims to predict the missing links by exploiting both graph structure information and multi-modal entity contents. Most existing works are designed for a transductive setting, which learns dataset-specific embeddings and...
Surgical Activation Steering via Generative Causal Mediation
arXiv:2602.16080v1 Announce Type: new Abstract: Where should we intervene in a language model (LM) to control behaviors that are diffused across many tokens of a long-form response? We introduce Generative Causal Mediation (GCM), a procedure for selecting model components, e.g.,...
Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs
arXiv:2602.16085v1 Announce Type: new Abstract: Research on mental state reasoning in language models (LMs) has the potential to inform theories of human social cognition--such as the theory that mental state reasoning emerges in part from language exposure--and our understanding of...
LLMs Exhibit Significantly Lower Uncertainty in Creative Writing Than Professional Writers
arXiv:2602.16162v1 Announce Type: new Abstract: We argue that uncertainty is a key and understudied limitation of LLMs' performance in creative writing, which is often characterized as trite and clich\'e-ridden. Literary theory identifies uncertainty as a necessary condition for creative expression,...
Long-Tail Knowledge in Large Language Models: Taxonomy, Mechanisms, Interventions and Implications
arXiv:2602.16201v1 Announce Type: new Abstract: Large language models (LLMs) are trained on web-scale corpora that exhibit steep power-law distributions, in which the distribution of knowledge is highly long-tailed, with most appearing infrequently. While scaling has improved average-case performance, persistent failures...
Are LLMs Ready to Replace Bangla Annotators?
arXiv:2602.16241v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used as automated annotators to scale dataset creation, yet their reliability as unbiased annotators--especially for low-resource and identity-sensitive settings--remains poorly understood. In this work, we study the behavior of...
Bayesian Quadrature: Gaussian Processes for Integration
arXiv:2602.16218v1 Announce Type: new Abstract: Bayesian quadrature is a probabilistic, model-based approach to numerical integration, the estimation of intractable integrals, or expectations. Although Bayesian quadrature was popularised already in the 1980s, no systematic and comprehensive treatment has been published. The...
The Information Geometry of Softmax: Probing and Steering
arXiv:2602.15293v1 Announce Type: cross Abstract: This paper concerns the question of how AI systems encode semantic structure into the geometric structure of their representation spaces. The motivating observation of this paper is that the natural geometry of these representation spaces...
Near-Optimal Sample Complexity for Online Constrained MDPs
arXiv:2602.15076v1 Announce Type: new Abstract: Safety is a fundamental challenge in reinforcement learning (RL), particularly in real-world applications such as autonomous driving, robotics, and healthcare. To address this, Constrained Markov Decision Processes (CMDPs) are commonly used to enforce safety constraints...
Doubly Stochastic Mean-Shift Clustering
arXiv:2602.15393v1 Announce Type: new Abstract: Standard Mean-Shift algorithms are notoriously sensitive to the bandwidth hyperparameter, particularly in data-scarce regimes where fixed-scale density estimation leads to fragmentation and spurious modes. In this paper, we propose Doubly Stochastic Mean-Shift (DSMS), a novel...
1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization
arXiv:2602.15563v1 Announce Type: new Abstract: Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs while keeping performance degradation at an acceptable level. However, the optimal choice of quantization format and bit-width presents a challenge...
Uniform error bounds for quantized dynamical models
arXiv:2602.15586v1 Announce Type: new Abstract: This paper provides statistical guarantees on the accuracy of dynamical models learned from dependent data sequences. Specifically, we develop uniform error bounds that apply to quantized models and imperfect optimization algorithms commonly used in practical...
AD-Bench: A Real-World, Trajectory-Aware Advertising Analytics Benchmark for LLM Agents
arXiv:2602.14257v1 Announce Type: new Abstract: While Large Language Model (LLM) agents have achieved remarkable progress in complex reasoning tasks, evaluating their performance in real-world environments has become a critical problem. Current benchmarks, however, are largely restricted to idealized simulations, failing...
Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
arXiv:2602.14299v1 Announce Type: new Abstract: As large language model agents increasingly populate networked environments, a fundamental question arises: do artificial intelligence (AI) agent societies undergo convergence dynamics similar to human social systems? Lately, Moltbook approximates a plausible future scenario in...
BLUEPRINT Rebuilding a Legacy: Multimodal Retrieval for Complex Engineering Drawings and Documents
arXiv:2602.13345v1 Announce Type: new Abstract: Decades of engineering drawings and technical records remain locked in legacy archives with inconsistent or missing metadata, making retrieval difficult and often manual. We present Blueprint, a layout-aware multimodal retrieval system designed for large-scale engineering...
HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models
arXiv:2602.13710v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models enable instruction-following embodied control, but their large compute and memory footprints hinder deployment on resource-constrained robots and edge platforms. While reducing weights to 1-bit precision through binarization can greatly improve efficiency, existing...
Testing For Distribution Shifts with Conditional Conformal Test Martingales
arXiv:2602.13848v1 Announce Type: new Abstract: We propose a sequential test for detecting arbitrary distribution shifts that allows conformal test martingales (CTMs) to work under a fixed, reference-conditional setting. Existing CTM detectors construct test martingales by continually growing a reference set...
Mistral AI buys Koyeb in first acquisition to back its cloud ambitions
Mistral AI has agreed to buy Koyeb, a Paris-based startup that simplifies AI app deployment at scale and manages the infrastructure behind it.