A Bayesian Information-Theoretic Approach to Data Attribution
arXiv:2604.03858v1 Announce Type: new Abstract: Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce...
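The abstract is truncated before the scoring rule is spelled out, so the following is only a minimal leave-subset-out sketch of the general idea: score a training subset by the held-out log-likelihood drop its removal induces, a common proxy for induced information loss. The logistic-regression setup and all names are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: score a training subset by the information loss
# (held-out log-likelihood increase) its removal induces, a
# leave-subset-out proxy for an information-theoretic TDA objective.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def subset_information_loss(X, y, X_test, y_test, subset_idx):
    """Increase in held-out log loss when `subset_idx` is removed."""
    full = LogisticRegression(max_iter=1000).fit(X, y)
    keep = np.setdiff1d(np.arange(len(X)), subset_idx)
    ablated = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    return (log_loss(y_test, ablated.predict_proba(X_test))
            - log_loss(y_test, full.predict_proba(X_test)))

X, y = make_classification(n_samples=400, random_state=0)
score = subset_information_loss(X[:300], y[:300], X[300:], y[300:],
                                subset_idx=np.arange(10))
print(f"information loss from removing subset: {score:.4f}")
```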
Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations
arXiv:2604.03634v1 Announce Type: new Abstract: We prove that temporal averaging over multiple observations can be replaced by algebraic group action on a single observation for second-order statistical estimation. A General Replacement Theorem establishes conditions under which a group-averaged estimator from...
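As a concrete toy instance of replacing temporal averaging with a group action, the sketch below estimates a covariance from a single observation by averaging over the cyclic-shift group. The shift-invariant setup and choice of group are assumed for illustration; the conditions of the paper's General Replacement Theorem are not reproduced.

```python
# Minimal sketch (assumed setup): average over the cyclic-shift group
# acting on one observation, instead of averaging many observations,
# to form a second-order (covariance) estimate. For a shift-invariant
# process both estimators target the same circulant covariance.
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = rng.standard_normal(n)  # single observation

# Group-averaged estimator: mean of (g.x)(g.x)^T over all cyclic shifts g.
C_group = np.mean(
    [np.outer(np.roll(x, s), np.roll(x, s)) for s in range(n)], axis=0
)

# The result is circulant by construction: row 1 is a shift of row 0.
assert np.allclose(C_group[1], np.roll(C_group[0], 1))
print("group-averaged covariance estimate, shape:", C_group.shape)
```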
Autoencoder-Based Parameter Estimation for Superposed Multi-Component Damped Sinusoidal Signals
arXiv:2604.03985v1 Announce Type: new Abstract: Damped sinusoidal oscillations are widely observed in many physical systems, and their analysis provides access to underlying physical properties. However, parameter estimation becomes difficult when the signal decays rapidly, multiple components are superposed, and observational...
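A compact sketch of what such an autoencoder could look like: an MLP encoder maps the signal to per-component parameters, and a fixed physics-based decoder rebuilds the superposed damped sinusoids, so reconstruction loss trains the parameter estimator end to end. The architecture, component count, and parameterization are assumptions, since the abstract does not specify them.

```python
# Illustrative autoencoder for superposed damped sinusoids: the encoder
# predicts (amplitude, frequency, damping, phase) per component; the
# decoder is the physics model itself, so no parameter labels are needed.
import math
import torch
import torch.nn as nn

K, N = 3, 256                        # assumed components, samples
t = torch.linspace(0, 1, N)

class DampedSinusoidAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(N, 128), nn.ReLU(), nn.Linear(128, 4 * K)
        )

    def forward(self, x):
        p = self.encoder(x).view(-1, K, 4)        # (batch, K, 4)
        amp, freq, damp, phase = p.unbind(-1)     # each (batch, K)
        # decoder: sum_k amp * exp(-damp * t) * sin(2*pi*freq * t + phase)
        comp = (amp[..., None]
                * torch.exp(-damp.abs()[..., None] * t)
                * torch.sin(2 * math.pi * freq.abs()[..., None] * t
                            + phase[..., None]))
        return comp.sum(dim=1), p                 # reconstruction, params

model = DampedSinusoidAE()
signal = torch.randn(8, N)                        # stand-in noisy batch
recon, params = model(signal)
loss = nn.functional.mse_loss(recon, signal)
loss.backward()                                   # trains the estimator
```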
Automated Attention Pattern Discovery at Scale in Large Language Models
arXiv:2604.03764v1 Announce Type: new Abstract: Large language models have found success by scaling up capabilities to work in general settings. The same, unfortunately, cannot be said for interpretability methods. The current trend in mechanistic interpretability is to provide precise...

POEMetric: The Last Stanza of Humanity
arXiv:2604.03695v1 Announce Type: new Abstract: Large Language Models (LLMs) can compose poetry, but how far are they from human poets? In this paper, we introduce POEMetric, the first comprehensive framework for poetry evaluation, examining 1) basic instruction-following abilities in generating...
Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference
arXiv:2604.03950v1 Announce Type: new Abstract: Transformer-based large language models (LLMs) have demonstrated remarkable performance across a wide range of real-world tasks, but their inference cost remains prohibitively high due to the quadratic complexity of attention and the memory bandwidth limitations...
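A conceptual sketch of the tiling idea (tile size and bit-widths assumed, since the abstract is truncated): keep attention-score tiles on the diagonal, where scores are typically largest, in higher precision, and fake-quantize off-diagonal tiles to a low bit-width, mimicking a diagonal-tiled mixed-precision scheme.

```python
# Diagonal-tiled mixed-precision attention scores, simulated with
# uniform fake quantization; real MXFP kernels would quantize per block.
import numpy as np

def fake_quant(x, bits):
    """Uniform symmetric fake quantization to `bits` bits."""
    q = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / q + 1e-12
    return np.round(x / scale).clip(-q, q) * scale

def diagonal_tiled_scores(q, k, tile=16, hi_bits=8, lo_bits=4):
    s = q @ k.T / np.sqrt(q.shape[-1])            # raw attention scores
    n = s.shape[0]
    out = np.empty_like(s)
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            bits = hi_bits if i == j else lo_bits  # precision by position
            out[i:i+tile, j:j+tile] = fake_quant(s[i:i+tile, j:j+tile], bits)
    return out

rng = np.random.default_rng(0)
q, k = rng.standard_normal((64, 32)), rng.standard_normal((64, 32))
print(diagonal_tiled_scores(q, k).shape)           # (64, 64)
```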
BlazeFL: Fast and Deterministic Federated Learning Simulation
arXiv:2604.03606v1 Announce Type: new Abstract: Federated learning (FL) research increasingly relies on single-node simulations with hundreds or thousands of virtual clients, making both efficiency and reproducibility essential. Yet parallel client training often introduces nondeterminism through shared random state and scheduling...
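A minimal sketch of one determinism tactic the abstract alludes to (the framework's actual mechanism is not shown): give every (round, client) pair its own seeded generator, so results are bit-identical no matter how the parallel scheduler orders clients.

```python
# Per-client seeded RNGs remove the shared random state that makes
# parallel FL simulation nondeterministic.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def train_client(round_id, client_id):
    rng = np.random.default_rng([round_id, client_id])  # private RNG state
    return rng.standard_normal(4)                       # stand-in "update"

clients = range(8)
with ThreadPoolExecutor() as pool:
    updates = list(pool.map(lambda c: train_client(0, c), clients))

# Re-running sequentially yields exactly the same updates.
assert all(np.array_equal(u, train_client(0, c))
           for c, u in zip(clients, updates))
```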
GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces
arXiv:2604.04017v1 Announce Type: new Abstract: Deep research agents integrate fragmented evidence through multi-step tool use. BrowseComp offers a text-only testbed for such agents, but existing multimodal benchmarks rarely require both the composition of weak visual cues and BrowseComp-style multi-hop verification. Geolocation is...

Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation
arXiv:2604.03395v1 Announce Type: new Abstract: We present QIMMA, a quality-assured Arabic LLM leaderboard that places systematic benchmark validation at its core. Rather than aggregating existing resources as-is, QIMMA applies a multi-model assessment pipeline combining automated LLM judgment with human review...
Improving Model Performance by Adapting the KGE Metric to Account for System Non-Stationarity
arXiv:2604.03906v1 Announce Type: new Abstract: Geoscientific systems tend to be characterized by pronounced temporal non-stationarity, arising from seasonal and climatic variability in hydrometeorological drivers, and from natural and anthropogenic changes to land use and cover. As has been pointed out,...
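For reference, the standard Kling-Gupta Efficiency that the paper adapts (Gupta et al., 2009); the non-stationarity modification itself lies behind the truncation and is not shown.

```latex
% Standard KGE; the paper's non-stationarity adaptation is not reproduced.
\[
  \mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2},
  \qquad
  \alpha = \frac{\sigma_{\mathrm{sim}}}{\sigma_{\mathrm{obs}}},
  \quad
  \beta = \frac{\mu_{\mathrm{sim}}}{\mu_{\mathrm{obs}}},
\]
where $r$ is the linear correlation between simulated and observed series,
$\alpha$ the variability ratio, and $\beta$ the bias ratio.
```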
FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification
arXiv:2604.04074v1 Announce Type: new Abstract: Peer review in machine learning is under growing pressure from rising submission volume and limited reviewer time. Most LLM-based reviewing systems read only the manuscript and generate comments from the paper's own narrative. This makes...
Supervised Dimensionality Reduction Revisited: Why LDA on Frozen CNN Features Deserves a Second Look
arXiv:2604.03928v1 Announce Type: new Abstract: Effective ride-hailing dispatch requires anticipating demand patterns that vary substantially across time-of-day, day-of-week, season, and special events. We propose a regime-calibrated approach that (i) segments historical trip data into demand regimes, (ii) matches the current...
Rashomon Memory: Towards Argumentation-Driven Retrieval for Multi-Perspective Agent Memory
arXiv:2604.03588v1 Announce Type: new Abstract: AI agents operating over extended time horizons accumulate experiences that serve multiple concurrent goals, and must often maintain conflicting interpretations of the same events. A concession during a client negotiation encodes as a "trust-building investment"...

IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking
arXiv:2604.03232v1 Announce Type: new Abstract: IC3, also known as property-directed reachability (PDR), is a commonly-used algorithm for hardware safety model checking. It checks if a state transition system complies with a given safety property. IC3 either returns UNSAFE (indicating property...
Simple yet Effective: Low-Rank Spatial Attention for Neural Operators
arXiv:2604.03582v1 Announce Type: new Abstract: Neural operators have emerged as data-driven surrogates for solving partial differential equations (PDEs), and their success hinges on efficiently modeling the long-range, global coupling among spatial points induced by the underlying physics. In many PDE...
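The abstract cuts off before the operator is defined, so here is one standard low-rank attention construction for comparison (a Linformer-style projection, plainly not the paper's exact operator): keys and values over n spatial points are projected onto r << n basis points, cutting the n x n coupling to n x r.

```python
# Linformer-style low-rank spatial attention: compress keys/values along
# the spatial axis with a learned projection E before attending.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_attention(Q, K, V, E):
    """Q, K, V: (n, d); E: (r, n) projection over spatial points."""
    Kr, Vr = E @ K, E @ V                          # (r, d) compressed
    A = softmax(Q @ Kr.T / np.sqrt(Q.shape[-1]))   # (n, r), not (n, n)
    return A @ Vr                                  # (n, d)

rng = np.random.default_rng(0)
n, d, r = 1024, 64, 32
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E = rng.standard_normal((r, n)) / np.sqrt(n)
print(low_rank_attention(Q, K, V, E).shape)        # (1024, 64)
```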
DARE: Diffusion Large Language Models Alignment and Reinforcement Executor
arXiv:2604.04215v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) are emerging as a compelling alternative to dominant autoregressive models, replacing strictly sequential token generation with iterative denoising and parallel generation dynamics. However, their open-source ecosystem remains fragmented across model...
AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation
arXiv:2604.02525v1 Announce Type: new Abstract: Low-precision training (LPT) commonly employs Hadamard transforms to suppress outliers and mitigate quantization error in large language models (LLMs). However, prior methods apply a fixed transform uniformly, despite substantial variation in outlier structures across tensors....
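A small demo of the fixed-transform baseline the abstract describes: an orthonormal Hadamard rotation spreads outliers across coordinates before quantization, shrinking the error of a fixed low-bit grid. The adaptive, outlier-pattern-aware choice of rotation is the paper's contribution and is not reproduced here.

```python
# Hadamard rotation before fake quantization: the injected outlier no
# longer dominates the quantization scale, so small values survive.
import numpy as np
from scipy.linalg import hadamard

def fake_quant(x, bits=4):
    q = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / q + 1e-12
    return np.round(x / scale).clip(-q, q) * scale

rng = np.random.default_rng(0)
n = 256
x = rng.standard_normal(n)
x[7] = 40.0                                    # inject an activation outlier

H = hadamard(n) / np.sqrt(n)                   # orthonormal Hadamard transform
plain = fake_quant(x)                          # quantize directly
rotated = H.T @ fake_quant(H @ x)              # rotate, quantize, rotate back

print("error w/o rotation:", np.linalg.norm(x - plain))
print("error w/  rotation:", np.linalg.norm(x - rotated))
```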
LLM-based Atomic Propositions help weak extractors: Evaluation of a Propositioner for triplet extraction
arXiv:2604.02866v1 Announce Type: new Abstract: Knowledge Graph construction from natural language requires extracting structured triplets from complex, information-dense sentences. In this paper, we investigate whether the decomposition of text into atomic propositions (minimal, semantically autonomous units of information) can improve...

AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models
arXiv:2604.02617v1 Announce Type: new Abstract: Scientific and Technical Intelligence (S&TI) analysis requires verifying complex technical claims across rapidly growing literature, where existing approaches fail to bridge the verification gap between surface-level accuracy and deeper methodological validity. We present AutoVerifier, an...
Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration
arXiv:2604.02869v1 Announce Type: new Abstract: Training tool-calling agents with reinforcement learning on multi-turn tasks remains challenging due to sparse outcome rewards and difficult credit assignment across conversation turns. We present the first application of MT-GRPO (Multi-Turn Group Relative Policy Optimization)...
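A sketch of the group-relative advantage at the heart of GRPO-style training; the multi-turn credit assignment and iterative reward calibration that the paper adds are not reproduced. Each rollout's reward is normalized against its own sampling group rather than a learned value baseline.

```python
# Group-relative advantages: normalize each rollout's outcome reward
# against the mean and std of the group sampled from the same prompt.
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: (groups, rollouts_per_group) outcome rewards."""
    r = np.asarray(rewards, dtype=float)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, keepdims=True)
    return (r - mean) / (std + eps)

rewards = [[1.0, 0.0, 0.0, 1.0],   # group sampled from one prompt
           [0.0, 0.0, 1.0, 0.0]]
print(group_relative_advantages(rewards))
```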
Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus
arXiv:2604.02923v1 Announce Type: new Abstract: Large Language Models (LLMs), particularly those employing Mixture-of-Experts (MoE) architectures, have achieved remarkable capabilities across diverse natural language processing tasks. However, these models frequently suffer from hallucinations -- generating plausible but factually incorrect content --...
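A toy sketch of the simplest consensus rule such a "council" could apply (the agent interface is hypothetical; the paper's actual protocol is behind the truncation): sample several independent answers and keep the majority answer with its agreement ratio.

```python
# Majority-vote consensus across independent agents; low agreement can
# flag a likely hallucination for escalation or abstention.
from collections import Counter

def council_answer(agents, question):
    votes = [agent(question) for agent in agents]
    answer, count = Counter(votes).most_common(1)[0]
    return answer, count / len(votes)          # answer + agreement ratio

# Stand-in agents; real ones would be independent LLM calls.
agents = [lambda q: "Paris", lambda q: "Paris", lambda q: "Lyon"]
print(council_answer(agents, "Capital of France?"))   # ('Paris', 0.667)
```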
Learning the Signature of Memorization in Autoregressive Language Models
arXiv:2604.03199v1 Announce Type: new Abstract: All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibration), each bounded by the designer's intuition. We introduce the first transferable learned attack, enabled by the observation...
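For context, a reference sketch of one hand-crafted baseline the abstract names, Min-K% (Shi et al.): average the k% lowest per-token log-probabilities, with higher values suggesting memorization. The paper's learned, transferable attack replaces such heuristics and is not reproduced here.

```python
# Min-K% membership score: members tend to have fewer very unlikely
# tokens, so the mean of the lowest k% of log-probs is higher for them.
import numpy as np

def min_k_percent_score(token_logprobs, k=20):
    lp = np.sort(np.asarray(token_logprobs))    # ascending
    m = max(1, int(len(lp) * k / 100))
    return lp[:m].mean()                        # mean of lowest k% tokens

logprobs = [-0.1, -0.3, -5.2, -0.2, -4.8, -0.15]
print(f"Min-20% score: {min_k_percent_score(logprobs):.3f}")
```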
ROMAN: A Multiscale Routing Operator for Convolutional Time Series Models
arXiv:2604.02577v1 Announce Type: new Abstract: We introduce ROMAN (ROuting Multiscale representAtioN), a deterministic operator for time series that maps temporal scale and coarse temporal position into an explicit channel structure while reducing sequence length. ROMAN builds an anti-aliased multiscale pyramid,...
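A rough sketch of the mapping the abstract describes, under assumed details (filter choice, level count): each pyramid level is an anti-aliased, decimated copy of the series, and within each level the fine offsets inside a coarse step become explicit channels, so scale and coarse position end up in the channel axis while sequence length shrinks.

```python
# Anti-aliased multiscale pyramid routed into channels over the coarsest
# sequence length; scipy's decimate applies a low-pass filter first.
import numpy as np
from scipy.signal import decimate

def multiscale_channels(x, levels=3):
    pyramid = [np.asarray(x, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(decimate(pyramid[-1], 2))   # anti-aliased halving
    L = len(pyramid[-1])                           # coarsest length
    # within each level, fine offsets inside a coarse step become channels
    chans = [lvl[: (len(lvl) // L) * L].reshape(L, -1) for lvl in pyramid]
    return np.concatenate(chans, axis=1).T         # (channels, L)

x = np.sin(np.linspace(0, 20, 256))
print(multiscale_channels(x).shape)                # (7, 64): 4+2+1 channels
```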
Product-Stability: Provable Convergence for Gradient Descent on the Edge of Stability
arXiv:2604.02653v1 Announce Type: new Abstract: Empirically, modern deep learning training often occurs at the Edge of Stability (EoS), where the sharpness of the loss exceeds the threshold below which classical convergence analysis applies. Despite recent progress, existing theoretical explanations of...
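For readers unfamiliar with the threshold the abstract refers to, the textbook bound that EoS training violates (standard material, not the paper's new analysis):

```latex
% Classical stability threshold: the descent lemma guarantees progress
% only while the sharpness stays below 2/eta.
\[
  f\bigl(x - \eta \nabla f(x)\bigr) \le f(x)
  - \eta\Bigl(1 - \tfrac{\eta L}{2}\Bigr)\,\|\nabla f(x)\|^2 ,
\]
so gradient descent with step size $\eta$ provably decreases an $L$-smooth
loss only when $\lambda_{\max}(\nabla^2 f) \le L < 2/\eta$. At the Edge of
Stability the sharpness $\lambda_{\max}(\nabla^2 f)$ hovers at or above
$2/\eta$, outside this classical regime.
```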
GRADE: Probing Knowledge Gaps in LLMs through Gradient Subspace Dynamics
arXiv:2604.02830v1 Announce Type: new Abstract: Detecting whether a model's internal knowledge is sufficient to correctly answer a given question is a fundamental challenge in deploying responsible LLMs. In addition to verbalising confidence via LLM self-report, more recent methods explore...

WGFINNs: Weak formulation-based GENERIC formalism informed neural networks
arXiv:2604.02601v1 Announce Type: new Abstract: Data-driven discovery of governing equations from noisy observations remains a fundamental challenge in scientific machine learning. While GENERIC formalism informed neural networks (GFINNs) provide a principled framework that enforces the laws of thermodynamics by construction,...
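For context, the GENERIC structure that GFINNs, and hence WGFINNs, enforce by construction, in its standard form; the weak-formulation machinery the paper adds is not reproduced here.

```latex
% Standard GENERIC formalism enforced by GFINN-style architectures.
\[
  \frac{dz}{dt} = L(z)\,\nabla E(z) + M(z)\,\nabla S(z),
  \qquad
  L(z)\,\nabla S(z) = 0, \quad M(z)\,\nabla E(z) = 0,
\]
with $L$ skew-symmetric and $M$ symmetric positive semi-definite, so the
energy $E$ is conserved and the entropy $S$ is non-decreasing along
solutions, i.e., the first and second laws of thermodynamics hold by
construction.
```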
PRISM: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning
arXiv:2604.02353v1 Announce Type: cross Abstract: We present PRISM (Policy Reuse via Interpretable Strategy Mapping), a framework that grounds reinforcement learning agents' decisions in discrete, causally validated concepts and uses those concepts as a zero-shot transfer interface between agents trained with...
InfoSeeker: A Scalable Hierarchical Parallel Agent Framework for Web Information Seeking
arXiv:2604.02971v1 Announce Type: new Abstract: Recent agentic search systems have made substantial progress by emphasising deep, multi-step reasoning. However, this focus often overlooks the challenges of wide-scale information synthesis, where agents must aggregate large volumes of heterogeneous evidence across many...
Internalized Reasoning for Long-Context Visual Document Understanding
arXiv:2604.02371v1 Announce Type: cross Abstract: Visual long-document understanding is critical for enterprise, legal, and scientific applications, yet the best performing open recipes have not explored reasoning, a capability which has driven leaps in math and code performance. We introduce a...
ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents
arXiv:2604.02834v1 Announce Type: new Abstract: Longitudinal health agents must reason across multi-source trajectories that combine continuous device streams, sparse clinical exams, and episodic life events, yet evaluating them is hard: real-world data cannot be released at scale, and temporally...