SE-Search: Self-Evolving Search Agent via Memory and Dense Reward
arXiv:2603.03293v1 Announce Type: new Abstract: Retrieval augmented generation (RAG) reduces hallucinations and factual errors in large language models (LLMs) by conditioning generation on retrieved external knowledge. Recent search agents further cast RAG as an autonomous, multi-turn information-seeking process. However, existing...
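The retrieve-then-generate loop that RAG systems like this one build on can be sketched minimally. This is a generic toy, not SE-Search's agent: the keyword-overlap retriever and the string-building `generate` are stand-ins for a dense retriever and an LLM call.

```python
def retrieve(query, corpus, k=1):
    """Rank documents by word overlap with the query; return the top k."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(query, docs):
    """Stand-in for an LLM call: answer conditioned on retrieved context."""
    return f"Q: {query}\nContext: {' | '.join(docs)}"

corpus = [
    "The Eiffel Tower is in Paris.",
    "Mount Everest is the tallest mountain.",
]
docs = retrieve("Where is the Eiffel Tower?", corpus)
print(generate("Where is the Eiffel Tower?", docs))
```

A multi-turn search agent would wrap this loop, letting the model decide when to re-query; the single-shot form above is only the base case.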
How LLMs Cite and Why It Matters: A Cross-Model Audit of Reference Fabrication in AI-Assisted Academic Writing and Methods to Detect Phantom Citations
arXiv:2603.03299v1 Announce Type: new Abstract: Large language models (LLMs) have been noted to fabricate scholarly citations, yet the scope of this behavior across providers, domains, and prompting conditions remains poorly quantified. We present one of the largest citation hallucination audits...
Benchmarking Legal RAG: The Promise and Limits of AI Statutory Surveys
arXiv:2603.03300v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) offers significant potential for legal AI, yet systematic benchmarks are sparse. Prior work introduced LaborBench to benchmark RAG models based on ostensible ground truth from an exhaustive, multi-month, manual enumeration of all...
Combating data scarcity in recommendation services: Integrating cognitive types of VARK and neural network technologies (LLM)
arXiv:2603.03309v1 Announce Type: new Abstract: Cold start scenarios present fundamental obstacles to effective recommendation generation, particularly when dealing with users lacking interaction history or items with sparse metadata. This research proposes an innovative hybrid framework that leverages Large Language Models...
Entropic-Time Inference: Self-Organizing Large Language Model Decoding Beyond Attention
arXiv:2603.03310v1 Announce Type: new Abstract: Modern large language model (LLM) inference engines optimize throughput and latency under fixed decoding rules, treating generation as a linear progression in token time. We propose a fundamentally different paradigm: entropic-time inference, where decoding is...
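The truncated abstract does not specify the decoding rule, but any entropy-driven scheme presumably monitors the Shannon entropy of the next-token distribution; a minimal computation of that quantity (generic, not the paper's method):

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

peaked = [0.97, 0.01, 0.01, 0.01]   # model is confident: low entropy
flat = [0.25, 0.25, 0.25, 0.25]     # model is uncertain: maximal entropy
print(entropy(peaked), entropy(flat))
```

Over a uniform distribution on n tokens the entropy is log(n), the maximum possible, which is one natural signal a decoder could use to reallocate compute across steps.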
The Logovista English-Japanese Machine Translation System
arXiv:2603.03311v1 Announce Type: new Abstract: This paper documents the architecture, development practices, and preserved artifacts of the Logovista English-Japanese machine translation system, a large, explicitly rule-based MT system that was developed and sold commercially from the early 1990s through at...
StructLens: A Structural Lens for Language Models via Maximum Spanning Trees
arXiv:2603.03328v1 Announce Type: new Abstract: Language exhibits inherent structures, a property that explains both language acquisition and language change. Given this characteristic, we expect language models to manifest internal structures as well. While interpretability research has investigated the components of...
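A maximum spanning tree over pairwise similarities, the structural object the title names, can be extracted with Kruskal's algorithm run on edges in descending weight order. This is a generic sketch; how StructLens actually defines the similarity matrix is not shown in the truncated abstract.

```python
def max_spanning_tree(sim):
    """Kruskal's algorithm on a symmetric similarity matrix: keep the
    highest-weight edges that connect all nodes without forming a cycle."""
    n = len(sim)
    edges = sorted(((sim[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    parent = list(range(n))
    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

sim = [[1.0, 0.9, 0.2],
       [0.9, 1.0, 0.7],
       [0.2, 0.7, 1.0]]
tree = max_spanning_tree(sim)
print(tree)
```

On n nodes the tree always has n-1 edges, so it yields a sparse, acyclic backbone of the strongest pairwise relations.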
AutoHarness: improving LLM agents by automatically synthesizing a code harness
arXiv:2603.03329v1 Announce Type: new Abstract: Despite significant strides in language models in the last few years, when used as agents, such models often try to perform actions that are not just suboptimal for a given state, but are strictly prohibited...
PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning
arXiv:2603.03331v1 Announce Type: new Abstract: Photoplethysmography (PPG) is a widely used non-invasive sensing modality for continuous cardiovascular and physiological monitoring across clinical, laboratory, and wearable settings. While existing PPG datasets support a broad range of downstream tasks, they typically provide...
The CompMath-MCQ Dataset: Are LLMs Ready for Higher-Level Math?
arXiv:2603.03334v1 Announce Type: new Abstract: The evaluation of Large Language Models (LLMs) on mathematical reasoning has largely focused on elementary problems, competition-style questions, or formal theorem proving, leaving graduate-level and computational mathematics relatively underexplored. We introduce CompMath-MCQ, a new benchmark...
Prompt-Dependent Ranking of Large Language Models with Uncertainty Quantification
arXiv:2603.03336v1 Announce Type: new Abstract: Rankings derived from pairwise comparisons are central to many economic and computational systems. In the context of large language models (LLMs), rankings are typically constructed from human preference data and presented as leaderboards that guide...
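Rankings from pairwise comparisons are classically fit with the Bradley-Terry model; a minimal minorize-maximize (MM) implementation of the standard textbook method, not the paper's prompt-dependent or uncertainty-quantified variant:

```python
def bradley_terry(wins, iters=200):
    """MM updates for Bradley-Terry strengths from a win-count matrix,
    where wins[i][j] = number of times model i beat model j."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new.append(total_wins / denom if denom else p[i])
        s = sum(new)
        p = [x * n / s for x in new]  # normalize: strengths are scale-free
    return p

wins = [[0, 8, 9],
        [2, 0, 6],
        [1, 4, 0]]
strengths = bradley_terry(wins)
ranking = sorted(range(3), key=lambda i: -strengths[i])
print(ranking)
```

The fitted strengths give a single global leaderboard; the abstract's point is that such a ranking can shift with the prompt, which this basic model does not capture.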
Tracing Pharmacological Knowledge In Large Language Models
arXiv:2603.03407v1 Announce Type: new Abstract: Large language models (LLMs) have shown strong empirical performance across pharmacology and drug discovery tasks, yet the internal mechanisms by which they encode pharmacological knowledge remain poorly understood. In this work, we investigate how drug-group...
Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs
arXiv:2603.03415v1 Announce Type: new Abstract: In this work, we investigate how Large Language Models (LLMs) adapt their internal representations when encountering inputs of increasing difficulty, quantified as the degree of out-of-distribution (OOD) shift. We reveal a consistent and quantifiable phenomenon:...
Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi
arXiv:2603.03508v1 Announce Type: new Abstract: The dominance of large multilingual foundation models has widened linguistic inequalities in Natural Language Processing (NLP), often leaving low-resource languages underrepresented. This paper introduces LilMoo, a 0.6-billion-parameter Hindi language model trained entirely from scratch to...
A theoretical model of dynamical grammatical gender shifting based on set-valued set function
arXiv:2603.03510v1 Announce Type: new Abstract: This study investigates the diverse characteristics of nouns, focusing on both semantic (e.g., countable/uncountable) and morphosyntactic (e.g., masculine/feminine) distinctions. We explore inter-word variations for gender markers in noun morphology. Grammatical gender shift is a widespread...
AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis
arXiv:2603.03378v1 Announce Type: new Abstract: Large language model (LLM) agents offer a promising data-driven approach to automating Site Reliability Engineering (SRE), yet their enterprise deployment is constrained by three challenges: restricted access to proprietary data, unsafe action execution under permission-governed...
RADAR: Learning to Route with Asymmetry-aware DistAnce Representations
arXiv:2603.03388v1 Announce Type: new Abstract: Recent neural solvers have achieved strong performance on vehicle routing problems (VRPs), yet they mainly assume symmetric Euclidean distances, restricting applicability to real-world scenarios. A core challenge is encoding the relational features in asymmetric distance...
Towards Improved Sentence Representations using Token Graphs
arXiv:2603.03389v1 Announce Type: new Abstract: Obtaining a single-vector representation from a Large Language Model's (LLM) token-level outputs is a critical step for nearly all sentence-level tasks. However, standard pooling methods like mean or max aggregation treat tokens as an independent...
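The mean and max pooling baselines the abstract criticizes reduce to per-dimension aggregation over token vectors, with no account of inter-token structure:

```python
def mean_pool(token_vecs):
    """Average token vectors into one sentence vector (treats tokens as i.i.d.)."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]

def max_pool(token_vecs):
    """Keep the per-dimension maximum across tokens."""
    return [max(col) for col in zip(*token_vecs)]

tokens = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
print(mean_pool(tokens), max_pool(tokens))
```

Both operators are permutation-invariant over tokens, which is exactly the independence assumption a graph over tokens would relax.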
[Re] FairDICE: A Gap Between Theory And Practice
arXiv:2603.03454v1 Announce Type: new Abstract: Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do not...
Half the Nonlinearity Is Wasted: Measuring and Reallocating the Transformer's MLP Budget
arXiv:2603.03459v1 Announce Type: new Abstract: We investigate when transformer MLP nonlinearity is actually necessary. A gate with $d+1$ parameters decides when to replace the full MLP with a linear surrogate. Through systematic investigation across six models (162M-2.8B parameters), two architectures,...
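The abstract's gate with d+1 parameters plausibly means one weight per input dimension plus a bias; a hard-threshold sketch under that assumption (the paper's actual gate form and training procedure are not shown in the truncated abstract):

```python
def gated_mlp(x, gate_w, gate_b, mlp, linear):
    """A d+1-parameter gate (one weight per input dim plus a bias) routes the
    input either to the full nonlinear MLP or to a cheap linear surrogate."""
    score = sum(wi * xi for wi, xi in zip(gate_w, x)) + gate_b
    return mlp(x) if score > 0 else linear(x)

relu = lambda v: max(0.0, v)
mlp = lambda x: [relu(xi) for xi in x]   # toy nonlinear block
linear = lambda x: x                     # identity as the linear surrogate
out = gated_mlp([1.0, -2.0], gate_w=[1.0, 0.0], gate_b=0.0,
                mlp=mlp, linear=linear)
print(out)
```

A soft (sigmoid) gate would make the routing differentiable; the hard threshold above is only the simplest instance of the idea.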
Biased Generalization in Diffusion Models
arXiv:2603.03469v1 Announce Type: new Abstract: Generalization in generative modeling is defined as the ability to learn an underlying distribution from a finite dataset and produce novel samples, with evaluation largely driven by held-out performance and perceived sample quality. In practice,...
Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning
arXiv:2603.03480v1 Announce Type: new Abstract: We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence...
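The augmentation method mentioned here is, in its standard form, to treat the pair (last observed state, actions taken since that observation) as the effective state; a toy illustration of that construction (generic, not the paper's algorithm):

```python
from collections import deque

def run_with_delay(states, actions, delay):
    """At each step the agent's augmented state is the most recent state it
    has actually observed plus the buffer of actions taken since then."""
    buffer = deque()
    augmented = []
    for t, a in enumerate(actions):
        observed = states[t - delay] if t >= delay else states[0]
        buffer.append(a)
        if len(buffer) > delay:
            buffer.popleft()
        augmented.append((observed, tuple(buffer)))
    return augmented

traj = run_with_delay(states=["s0", "s1", "s2", "s3"],
                      actions=["a0", "a1", "a2", "a3"], delay=2)
print(traj)
```

With a fixed delay d the augmented state space grows by a factor of |A|^d, which is why regret analyses for this setting are nontrivial; this paper considers the harder case of a random delay.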
Optimal trajectory-guided stochastic co-optimization for e-fuel system design and real-time operation
arXiv:2603.03484v1 Announce Type: new Abstract: E-fuels are promising long-term energy carriers supporting the net-zero transition. However, the large combinatorial design-operation spaces under renewable uncertainty make the use of mathematical programming impractical for co-optimizing e-fuel production systems. Here, we present MasCOR,...
When Small Variations Become Big Failures: Reliability Challenges in Compute-in-Memory Neural Accelerators
arXiv:2603.03491v1 Announce Type: new Abstract: Compute-in-memory (CiM) architectures promise significant improvements in energy efficiency and throughput for deep neural network acceleration by alleviating the von Neumann bottleneck. However, their reliance on emerging non-volatile memory devices introduces device-level non-idealities, such as write...
Solving adversarial examples requires solving exponential misalignment
arXiv:2603.03507v1 Announce Type: new Abstract: Adversarial attacks - input perturbations imperceptible to humans that fool neural networks - remain both a persistent failure mode in machine learning and a phenomenon with mysterious origins. To shed light, we define and analyze...
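For concreteness, the classic fast-gradient-sign construction on a linear classifier shows how a small per-coordinate perturbation can flip a confident prediction (a textbook example, not this paper's analysis):

```python
def fgsm_linear(x, w, y, eps):
    """Fast gradient sign method for a linear score f(x) = w.x classified by
    sign: step each coordinate by eps against the true label y (+1 or -1).
    The gradient of the margin y*f(x) w.r.t. x is y*w, so we step down it."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]

w = [0.5, -0.3, 0.8]
x, y = [1.0, 1.0, 1.0], 1                      # correctly classified: w.x = 1.0 > 0
x_adv = fgsm_linear(x, w, y, eps=0.9)
adv_score = sum(wi * xi for wi, xi in zip(w, x_adv))
print(adv_score)
```

Because every coordinate contributes its worst-case eps to the score, the damage scales with the dimension even though each per-coordinate change is small, one standard intuition for why adversarial examples persist.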
Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
arXiv:2603.03511v1 Announce Type: new Abstract: We aim to learn wavefunctions simulated by time-dependent density functional theory (TDDFT), which can be efficiently represented as linear combination coefficients of atomic orbitals. In real-time TDDFT, the electronic wavefunctions of a molecule evolve over...
Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence
arXiv:2603.03523v1 Announce Type: new Abstract: We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an infinite-dimensional, function-valued estimate,...
Test-Time Meta-Adaptation with Self-Synthesis
arXiv:2603.03524v1 Announce Type: new Abstract: As strong general reasoners, large language models (LLMs) encounter diverse domains and tasks, where the ability to adapt and self-improve at test time is valuable. We introduce MASS, a meta-learning framework that enables LLMs to...
Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis
arXiv:2603.03527v1 Announce Type: new Abstract: Vision-Language Models (VLMs) with their multimodal capabilities have demonstrated remarkable success in almost all domains, including education, transportation, healthcare, energy, finance, law, and retail. Nevertheless, the utilization of VLMs in healthcare applications raises crucial concerns...
mlx-snn: Spiking Neural Networks on Apple Silicon via MLX
arXiv:2603.03529v1 Announce Type: new Abstract: We introduce mlx-snn, the first spiking neural network (SNN) library built natively on Apple's MLX framework. As SNN research grows rapidly, all major libraries -- snnTorch, Norse, SpikingJelly, Lava -- target PyTorch or custom backends,...
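The core dynamics any SNN library implements are leaky integrate-and-fire updates; a dependency-free sketch of a generic LIF neuron with soft reset (illustrative only, not mlx-snn's actual API):

```python
def lif_step(v, input_current, beta=0.9, threshold=1.0):
    """One leaky integrate-and-fire update: decay the membrane potential,
    add the input current, spike and soft-reset when the threshold is crossed."""
    v = beta * v + input_current
    spike = v >= threshold
    if spike:
        v -= threshold  # soft reset: subtract rather than zero the potential
    return v, int(spike)

v, spikes = 0.0, []
for current in [0.4, 0.4, 0.4, 0.0, 0.9]:
    v, s = lif_step(v, current)
    spikes.append(s)
print(spikes)
```

An MLX-native library would vectorize this update over layers and batches on Apple Silicon's unified memory rather than looping per neuron as above.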