Reranker Optimization via Geodesic Distances on k-NN Manifolds
arXiv:2602.15860v1 Announce Type: new Abstract: Current neural reranking approaches for retrieval-augmented generation (RAG) rely on cross-encoders or large language models (LLMs), requiring substantial computational resources and exhibiting latencies of 3-5 seconds per query. We propose Maniscope, a geometric reranking method...
Understanding LLM Failures: A Multi-Tape Turing Machine Analysis of Systematic Errors in Language Model Reasoning
arXiv:2602.15868v1 Announce Type: new Abstract: Large language models (LLMs) exhibit failure modes on seemingly trivial tasks. We propose a formalisation of LLM interaction using a deterministic multi-tape Turing machine, where each tape represents a distinct component: input characters, tokens, vocabulary,...
Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens
arXiv:2602.15896v1 Announce Type: new Abstract: Multi-modal knowledge graph reasoning (MMKGR) aims to predict the missing links by exploiting both graph structure information and multi-modal entity contents. Most existing works are designed for a transductive setting, which learns dataset-specific embeddings and...
CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill
arXiv:2602.16054v1 Announce Type: new Abstract: The prefill stage in long-context LLM inference remains a computational bottleneck. Recent token-ranking heuristics accelerate inference by selectively processing a subset of semantically relevant tokens. However, existing methods suffer from unstable token importance estimation, often...
Surgical Activation Steering via Generative Causal Mediation
arXiv:2602.16080v1 Announce Type: new Abstract: Where should we intervene in a language model (LM) to control behaviors that are diffused across many tokens of a long-form response? We introduce Generative Causal Mediation (GCM), a procedure for selecting model components, e.g.,...
Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs
arXiv:2602.16085v1 Announce Type: new Abstract: Research on mental state reasoning in language models (LMs) has the potential to inform theories of human social cognition--such as the theory that mental state reasoning emerges in part from language exposure--and our understanding of...
LLMs Exhibit Significantly Lower Uncertainty in Creative Writing Than Professional Writers
arXiv:2602.16162v1 Announce Type: new Abstract: We argue that uncertainty is a key and understudied limitation of LLMs' performance in creative writing, which is often characterized as trite and clich\'e-ridden. Literary theory identifies uncertainty as a necessary condition for creative expression,...
Long-Tail Knowledge in Large Language Models: Taxonomy, Mechanisms, Interventions and Implications
arXiv:2602.16201v1 Announce Type: new Abstract: Large language models (LLMs) are trained on web-scale corpora that exhibit steep power-law distributions, in which the distribution of knowledge is highly long-tailed, with most appearing infrequently. While scaling has improved average-case performance, persistent failures...
Are LLMs Ready to Replace Bangla Annotators?
arXiv:2602.16241v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used as automated annotators to scale dataset creation, yet their reliability as unbiased annotators--especially for low-resource and identity-sensitive settings--remains poorly understood. In this work, we study the behavior of...
Investigating GNN Convergence on Large Randomly Generated Graphs with Realistic Node Feature Correlations
arXiv:2602.16145v1 Announce Type: new Abstract: There are a number of existing studies analysing the convergence behaviour of graph neural networks on large random graphs. Unfortunately, the majority of these studies do not model correlations between node features, which would naturally...
ASPEN: Spectral-Temporal Fusion for Cross-Subject Brain Decoding
arXiv:2602.16147v1 Announce Type: new Abstract: Cross-subject generalization in EEG-based brain-computer interfaces (BCIs) remains challenging due to individual variability in neural signals. We investigate whether spectral representations offer more stable features for cross-subject transfer than temporal waveforms. Through correlation analyses across...
Rethinking Input Domains in Physics-Informed Neural Networks via Geometric Compactification Mappings
arXiv:2602.16193v1 Announce Type: new Abstract: Several complex physical systems are governed by multi-scale partial differential equations (PDEs) that exhibit both smooth low-frequency components and localized high-frequency structures. Existing physics-informed neural network (PINN) methods typically train with fixed coordinate system inputs,...
Bayesian Quadrature: Gaussian Processes for Integration
arXiv:2602.16218v1 Announce Type: new Abstract: Bayesian quadrature is a probabilistic, model-based approach to numerical integration, the estimation of intractable integrals, or expectations. Although Bayesian quadrature was popularised already in the 1980s, no systematic and comprehensive treatment has been published. The...
What the Justice Department overlooks in its historical argument to end birthright citizenship
Immigration Matters is a recurring series by César Cuauhtémoc García Hernández that analyzes the court’s immigration docket, highlighting emerging legal questions about new policy and enforcement practices. In my last […]The postWhat the Justice Department overlooks in its historical argument...
Reliance unveils $110B AI investment plan as India ramps up tech ambitions
Reliance has begun building multi-gigawatt AI data centers in Jamnagar, with more than 120 MW of capacity expected to come online in 2026.
Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems
arXiv:2602.15198v1 Announce Type: cross Abstract: Multi-agent systems, where LLM agents communicate through free-form language, enable sophisticated coordination for solving complex cooperative tasks. This surfaces a unique safety problem when individual agents form a coalition and \emph{collude} to pursue secondary goals...
The Information Geometry of Softmax: Probing and Steering
arXiv:2602.15293v1 Announce Type: cross Abstract: This paper concerns the question of how AI systems encode semantic structure into the geometric structure of their representation spaces. The motivating observation of this paper is that the natural geometry of these representation spaces...
Prescriptive Scaling Reveals the Evolution of Language Model Capabilities
arXiv:2602.15327v1 Announce Type: cross Abstract: For deploying foundation models, practitioners increasingly need prescriptive scaling laws: given a pre training compute budget, what downstream accuracy is attainable with contemporary post training practice, and how stable is that mapping as the field...
Near-Optimal Sample Complexity for Online Constrained MDPs
arXiv:2602.15076v1 Announce Type: new Abstract: Safety is a fundamental challenge in reinforcement learning (RL), particularly in real-world applications such as autonomous driving, robotics, and healthcare. To address this, Constrained Markov Decision Processes (CMDPs) are commonly used to enforce safety constraints...
Size Transferability of Graph Transformers with Convolutional Positional Encodings
arXiv:2602.15239v1 Announce Type: new Abstract: Transformers have achieved remarkable success across domains, motivating the rise of Graph Transformers (GTs) as attention-based architectures for graph-structured data. A key design choice in GTs is the use of Graph Neural Network (GNN)-based positional...
Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics
arXiv:2602.15253v1 Announce Type: new Abstract: Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first...
Doubly Stochastic Mean-Shift Clustering
arXiv:2602.15393v1 Announce Type: new Abstract: Standard Mean-Shift algorithms are notoriously sensitive to the bandwidth hyperparameter, particularly in data-scarce regimes where fixed-scale density estimation leads to fragmentation and spurious modes. In this paper, we propose Doubly Stochastic Mean-Shift (DSMS), a novel...
ExLipBaB: Exact Lipschitz Constant Computation for Piecewise Linear Neural Networks
arXiv:2602.15499v1 Announce Type: new Abstract: It has been shown that a neural network's Lipschitz constant can be leveraged to derive robustness guarantees, to improve generalizability via regularization or even to construct invertible networks. Therefore, a number of methods varying in...
On the Geometric Coherence of Global Aggregation in Federated GNN
arXiv:2602.15510v1 Announce Type: new Abstract: Federated Learning (FL) enables distributed training across multiple clients without centralized data sharing, while Graph Neural Networks (GNNs) model relational data through message passing. In federated GNN settings, client graphs often exhibit heterogeneous structural and...
1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization
arXiv:2602.15563v1 Announce Type: new Abstract: Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs while keeping performance degradation at an acceptable level. However, the optimal choice of quantization format and bit-width presents a challenge...
Uniform error bounds for quantized dynamical models
arXiv:2602.15586v1 Announce Type: new Abstract: This paper provides statistical guarantees on the accuracy of dynamical models learned from dependent data sequences. Specifically, we develop uniform error bounds that apply to quantized models and imperfect optimization algorithms commonly used in practical...
A unified theory of feature learning in RNNs and DNNs
arXiv:2602.15593v1 Announce Type: new Abstract: Recurrent and deep neural networks (RNNs/DNNs) are cornerstone architectures in machine learning. Remarkably, RNNs differ from DNNs only by weight sharing, as can be shown through unrolling in time. How does this structural similarity fit...