SE-Search: Self-Evolving Search Agent via Memory and Dense Reward
arXiv:2603.03293v1 Announce Type: new Abstract: Retrieval augmented generation (RAG) reduces hallucinations and factual errors in large language models (LLMs) by conditioning generation on retrieved external knowledge. Recent search agents further cast RAG as an autonomous, multi-turn information-seeking process. However, existing...
How LLMs Cite and Why It Matters: A Cross-Model Audit of Reference Fabrication in AI-Assisted Academic Writing and Methods to Detect Phantom Citations
arXiv:2603.03299v1 Announce Type: new Abstract: Large language models (LLMs) have been noted to fabricate scholarly citations, yet the scope of this behavior across providers, domains, and prompting conditions remains poorly quantified. We present one of the largest citation hallucination audits...
The Logovista English-Japanese Machine Translation System
arXiv:2603.03311v1 Announce Type: new Abstract: This paper documents the architecture, development practices, and preserved artifacts of the Logovista English--Japanese machine translation system, a large, explicitly rule-based MT system that was developed and sold commercially from the early 1990s through at...
StructLens: A Structural Lens for Language Models via Maximum Spanning Trees
arXiv:2603.03328v1 Announce Type: new Abstract: Language exhibits inherent structures, a property that explains both language acquisition and language change. Given this characteristic, we expect language models to manifest internal structures as well. While interpretability research has investigated the components of...
AutoHarness: improving LLM agents by automatically synthesizing a code harness
arXiv:2603.03329v1 Announce Type: new Abstract: Despite significant strides in language models in the last few years, when used as agents, such models often try to perform actions that are not just suboptimal for a given state, but are strictly prohibited...
The CompMath-MCQ Dataset: Are LLMs Ready for Higher-Level Math?
arXiv:2603.03334v1 Announce Type: new Abstract: The evaluation of Large Language Models (LLMs) on mathematical reasoning has largely focused on elementary problems, competition-style questions, or formal theorem proving, leaving graduate-level and computational mathematics relatively underexplored. We introduce CompMath-MCQ, a new benchmark...
Prompt-Dependent Ranking of Large Language Models with Uncertainty Quantification
arXiv:2603.03336v1 Announce Type: new Abstract: Rankings derived from pairwise comparisons are central to many economic and computational systems. In the context of large language models (LLMs), rankings are typically constructed from human preference data and presented as leaderboards that guide...
Tracing Pharmacological Knowledge In Large Language Models
arXiv:2603.03407v1 Announce Type: new Abstract: Large language models (LLMs) have shown strong empirical performance across pharmacology and drug discovery tasks, yet the internal mechanisms by which they encode pharmacological knowledge remain poorly understood. In this work, we investigate how drug-group...
Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs
arXiv:2603.03415v1 Announce Type: new Abstract: In this work, we investigate how Large Language Models (LLMs) adapt their internal representations when encountering inputs of increasing difficulty, quantified as the degree of out-of-distribution (OOD) shift. We reveal a consistent and quantifiable phenomenon:...
A theoretical model of dynamical grammatical gender shifting based on set-valued set function
arXiv:2603.03510v1 Announce Type: new Abstract: This study investigates the diverse characteristics of nouns, focusing on both semantic (e.g., countable/uncountable) and morphosyntactic (e.g., masculine/feminine) distinctions. We explore inter-word variations for gender markers in noun morphology. Grammatical gender shift is a widespread...
[Re] FairDICE: A Gap Between Theory And Practice
arXiv:2603.03454v1 Announce Type: new Abstract: Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do not...
Half the Nonlinearity Is Wasted: Measuring and Reallocating the Transformer's MLP Budget
arXiv:2603.03459v1 Announce Type: new Abstract: We investigate when transformer MLP nonlinearity is actually necessary. A gate with $d+1$ parameters decides when to replace the full MLP with a linear surrogate. Through systematic investigation across six models (162M-2.8B parameters), two architectures,...
Biased Generalization in Diffusion Models
arXiv:2603.03469v1 Announce Type: new Abstract: Generalization in generative modeling is defined as the ability to learn an underlying distribution from a finite dataset and produce novel samples, with evaluation largely driven by held-out performance and perceived sample quality. In practice,...
Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning
arXiv:2603.03480v1 Announce Type: new Abstract: We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence...
Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
arXiv:2603.03511v1 Announce Type: new Abstract: We aim to learn wavefunctions simulated by time-dependent density functional theory (TDDFT), which can be efficiently represented as linear combination coefficients of atomic orbitals. In real-time TDDFT, the electronic wavefunctions of a molecule evolve over...
Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence
arXiv:2603.03523v1 Announce Type: new Abstract: We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an infinite-dimensional, function-valued estimate,...
Test-Time Meta-Adaptation with Self-Synthesis
arXiv:2603.03524v1 Announce Type: new Abstract: As strong general reasoners, large language models (LLMs) encounter diverse domains and tasks, where the ability to adapt and self-improve at test time is valuable. We introduce MASS, a meta-learning framework that enables LLMs to...
Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts
arXiv:2603.03535v1 Announce Type: new Abstract: While large language models (LLMs) fine-tuned with lightweight adapters achieve strong performance across diverse tasks, their performance on individual tasks depends on the fine-tuning strategy. Fusing independently trained models with different strengths has shown promise...
Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs
arXiv:2603.03538v1 Announce Type: new Abstract: Large language models with chain-of-thought generation have demonstrated great potential for producing complex mathematical proofs. However, their reasoning can often go astray, leading to increasing interest in formal and learned verifiers. A major challenge in...
NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training
arXiv:2603.03597v1 Announce Type: new Abstract: The rapid progress of large language models (LLMs) is increasingly constrained by memory and deployment costs, motivating compression methods for practical deployment. Many state-of-the-art compression pipelines leverage the low-rank structure of trained weight matrices, a...
Why Are Linear RNNs More Parallelizable?
arXiv:2603.03612v1 Announce Type: new Abstract: The community is increasingly exploring linear RNNs (LRNNs) as language models, motivated by their expressive power and parallelizability. While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs...
A Stein Identity for q-Gaussians with Bounded Support
arXiv:2603.03673v1 Announce Type: new Abstract: Stein's identity is a fundamental tool in machine learning with applications in generative models, stochastic optimization, and other problems involving gradients of expectations under Gaussian distributions. Less attention has been paid to problems with non-Gaussian...
MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
arXiv:2603.03756v1 Announce Type: new Abstract: While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training...
LEA: Label Enumeration Attack in Vertical Federated Learning
arXiv:2603.03777v1 Announce Type: new Abstract: A typical Vertical Federated Learning (VFL) scenario involves several participants collaboratively training a machine learning model, where each party has different features for the same samples, with labels held exclusively by one party. Since labels...
Inverse Contextual Bandits without Rewards: Learning from a Non-Stationary Learner via Suffix Imitation
arXiv:2603.03778v1 Announce Type: new Abstract: We study the Inverse Contextual Bandit (ICB) problem, in which a learner seeks to optimize a policy while an observer, who cannot access the learner's rewards and only observes actions, aims to recover the underlying...
Relational In-Context Learning via Synthetic Pre-training with Structural Prior
arXiv:2603.03805v1 Announce Type: new Abstract: Relational Databases (RDBs) are the backbone of modern business, yet they lack foundation models comparable to those in text or vision. A key obstacle is that high-quality RDBs are private, scarce and structurally heterogeneous, making...
Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning
arXiv:2603.03818v1 Announce Type: new Abstract: Continual learning is a long-standing challenge in robot policy learning, where a policy must acquire new skills over time without catastrophically forgetting previously learned ones. While prior work has extensively studied continual learning in relatively...
Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation
arXiv:2603.03820v1 Announce Type: new Abstract: Interactive recommender systems (IRS) are increasingly optimized with Reinforcement Learning (RL) to capture the sequential nature of user-system dynamics. However, existing fairness-aware methods often suffer from a fundamental oversight: they assume the observed user state...
Structure-Aware Distributed Backdoor Attacks in Federated Learning
arXiv:2603.03865v1 Announce Type: new Abstract: While federated learning protects data privacy, it also makes the model update process vulnerable to long-term stealthy perturbations. Existing studies on backdoor attacks in federated learning mainly focus on trigger design or poisoning strategies, typically...
HateMirage: An Explainable Multi-Dimensional Dataset for Decoding Faux Hate and Subtle Online Abuse
arXiv:2603.02684v1 Announce Type: new Abstract: Subtle and indirect hate speech remains an underexplored challenge in online safety research, particularly when harmful intent is embedded within misleading or manipulative narratives. Existing hate speech datasets primarily capture overt toxicity, underrepresenting the nuanced...