Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents
arXiv:2603.09203v1 Announce Type: new Abstract: Retrieval-augmented agents can query external evidence, yet their reliability in multi-step reasoning remains limited: noisy retrieval may derail multi-hop question answering, while outcome-only reinforcement learning provides credit signals that are too coarse to optimize intermediate...
ConFu: Contemplate the Future for Better Speculative Sampling
arXiv:2603.08899v1 Announce Type: new Abstract: Speculative decoding has emerged as a powerful approach to accelerate large language model (LLM) inference by employing lightweight draft models to propose candidate tokens that are subsequently verified by the target model. The effectiveness of...
EPOCH: An Agentic Protocol for Multi-Round System Optimization
arXiv:2603.09049v1 Announce Type: new Abstract: Autonomous agents are increasingly used to improve prompts, code, and machine learning systems through iterative execution and feedback. Yet existing approaches are usually designed as task-specific optimization loops rather than as a unified protocol for...
Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance
arXiv:2603.08933v1 Announce Type: new Abstract: The first 72 hours of a missing-child investigation are critical for successful recovery. However, law enforcement agencies often face fragmented, unstructured data and a lack of dynamic, geospatial predictive tools. Our system, Guardian, provides an...
AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem
arXiv:2603.08938v1 Announce Type: new Abstract: The rapid emergence of open-source, locally hosted intelligent agents marks a critical inflection point in human-computer interaction. Systems such as OpenClaw demonstrate that Large Language Model (LLM)-based agents can autonomously operate local computing environments, orchestrate...
You Didn't Have to Say It like That: Subliminal Learning from Faithful Paraphrases
arXiv:2603.09517v1 Announce Type: new Abstract: When language models (student models) are trained on synthetic data, they can covertly acquire behavioral traits from the data-generating model (teacher model). Subliminal learning refers to the transmission of traits from a teacher to a...
Build, Borrow, or Just Fine-Tune? A Political Scientist's Guide to Choosing NLP Models
arXiv:2603.09595v1 Announce Type: new Abstract: Political scientists increasingly face a consequential choice when adopting natural language processing tools: build a domain-specific model from scratch, borrow and adapt an existing one, or simply fine-tune a general-purpose model on task data? Each...
Surgical Repair of Collapsed Attention Heads in ALiBi Transformers
arXiv:2603.09616v1 Announce Type: new Abstract: We identify a systematic attention collapse pathology in the BLOOM family of transformer language models, where ALiBi positional encoding causes 31-44% of attention heads to attend almost entirely to the beginning-of-sequence token. The collapse follows...
Understanding the Interplay between LLMs' Utilisation of Parametric and Contextual Knowledge: A keynote at ECIR 2025
arXiv:2603.09654v1 Announce Type: new Abstract: Language Models (LMs) acquire parametric knowledge from their training process, embedding it within their weights. The increasing scalability of LMs, however, poses significant challenges for understanding a model's inner workings and further for updating or...
Automatic Cardiac Risk Management Classification using large-context Electronic Patients Health Records
arXiv:2603.09685v1 Announce Type: new Abstract: To overcome the limitations of manual administrative coding in geriatric Cardiovascular Risk Management, this study introduces an automated classification framework leveraging unstructured Electronic Health Records (EHRs). Using a dataset of 3,482 patients, we benchmarked three...
Fusing Semantic, Lexical, and Domain Perspectives for Recipe Similarity Estimation
arXiv:2603.09688v1 Announce Type: new Abstract: This research focuses on developing advanced methods for assessing similarity between recipes by combining different sources of information and analytical approaches. We explore the semantic, lexical, and domain similarity of food recipes, evaluated through the...
RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation
arXiv:2603.09723v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used across the scientific workflow, including to draft peer-review reports. However, many AI-generated reviews are superficial and insufficiently actionable, leaving authors without concrete, implementable guidance and motivating the gap...
EPIC-EuroParl-UdS: Information-Theoretic Perspectives on Translation and Interpreting
arXiv:2603.09785v1 Announce Type: new Abstract: This paper introduces an updated and combined version of the bidirectional English-German EPIC-UdS (spoken) and EuroParl-UdS (written) corpora containing original European Parliament speeches as well as their translations and interpretations. The new version corrects metadata...
Chow-Liu Ordering for Long-Context Reasoning in Chain-of-Agents
arXiv:2603.09835v1 Announce Type: new Abstract: Sequential multi-agent reasoning frameworks such as Chain-of-Agents (CoA) handle long-context queries by decomposing inputs into chunks and processing them sequentially using LLM-based worker agents that read from and update a bounded shared memory. From a...
N-gram-like Language Models Predict Reading Time Best
arXiv:2603.09872v1 Announce Type: new Abstract: Recent work has found that contemporary language models such as transformers can become so good at next-word prediction that the probabilities they calculate become worse for predicting reading time. In this paper, we propose that...
Benchmarking Political Persuasion Risks Across Frontier Large Language Models
arXiv:2603.09884v1 Announce Type: new Abstract: Concerns persist regarding the capacity of Large Language Models (LLMs) to sway political views. Although prior research has claimed that LLMs are not more persuasive than standard political campaign practices, the recent rise of frontier...
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
arXiv:2603.09906v1 Announce Type: new Abstract: While reasoning in LLMs plays a natural role in math, code generation, and multi-hop factual questions, its effect on simple, single-hop factual questions remains unclear. Such questions do not require step-by-step logical decomposition, making the...
Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions
arXiv:2603.09938v1 Announce Type: new Abstract: Model merging has emerged as a transformative paradigm for combining the capabilities of multiple neural networks into a single unified model without additional training. With the rapid proliferation of fine-tuned large language models (LLMs), merging techniques...
CREATE: Testing LLMs for Associative Creativity
arXiv:2603.09970v1 Announce Type: new Abstract: A key component of creativity is associative reasoning: the ability to draw novel yet meaningful connections between concepts. We introduce CREATE, a benchmark designed to evaluate models' capacity for creative associative reasoning. CREATE requires models...
VeriInteresting: An Empirical Study of Model Prompt Interactions in Verilog Code Generation
arXiv:2603.08715v1 Announce Type: cross Abstract: Rapid advances in language models (LMs) have created new opportunities for automated code generation while complicating trade-offs between model characteristics and prompt design choices. In this work, we provide an empirical map of recent trends...
Self-hosted Lecture-to-Quiz: Local LLM MCQ Generation with Deterministic Quality Control
arXiv:2603.08729v1 Announce Type: cross Abstract: We present an end-to-end self-hosted (API-free) pipeline, where API-free means that lecture content is not sent to any external LLM service, that converts lecture PDFs into multiple-choice questions (MCQs) using a local LLM plus deterministic...
From Word2Vec to Transformers: Text-Derived Composition Embeddings for Filtering Combinatorial Electrocatalysts
arXiv:2603.08881v1 Announce Type: cross Abstract: Compositionally complex solid solution electrocatalysts span vast composition spaces, and even one materials system can contain more candidate compositions than can be measured exhaustively. Here we evaluate a label-free screening strategy that represents each composition...
PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration
arXiv:2603.08935v1 Announce Type: cross Abstract: Pathology underpins modern diagnosis and cancer care, yet its most valuable asset, the accumulated experience encoded in millions of narrative reports, remains largely inaccessible. Although institutions are rapidly digitizing pathology workflows, storing data without effective...
BiCLIP: Domain Canonicalization via Structured Geometric Transformation
arXiv:2603.08942v1 Announce Type: cross Abstract: Recent advances in vision-language models (VLMs) have demonstrated remarkable zero-shot capabilities, yet adapting these models to specialized domains remains a significant challenge. Building on recent theoretical insights suggesting that independently trained VLMs are related by...
Equitable Multi-Task Learning for AI-RANs
arXiv:2603.08717v1 Announce Type: new Abstract: AI-enabled Radio Access Networks (AI-RANs) are expected to serve heterogeneous users with time-varying learning tasks over shared edge resources. Ensuring equitable inference performance across these users requires adaptive and fair learning mechanisms. This paper introduces...
Generalized Reduction to the Isotropy for Flexible Equivariant Neural Fields
arXiv:2603.08758v1 Announce Type: new Abstract: Many geometric learning problems require invariants on heterogeneous product spaces, i.e., products of distinct spaces carrying different group actions, where standard techniques do not directly apply. We show that, when a group $G$ acts transitively...
SPREAD: Subspace Representation Distillation for Lifelong Imitation Learning
arXiv:2603.08763v1 Announce Type: new Abstract: A key challenge in lifelong imitation learning (LIL) is enabling agents to acquire new skills from expert demonstrations while retaining prior knowledge. This requires preserving the low-dimensional manifolds and geometric structures that underlie task representations...
Multi-level meta-reinforcement learning with skill-based curriculum
arXiv:2603.08773v1 Announce Type: new Abstract: We consider problems in sequential decision making with natural multi-level structure, where sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure has remained a longstanding challenge; we describe an efficient...
Are Expressive Encoders Necessary for Discrete Graph Generation?
arXiv:2603.08825v1 Announce Type: new Abstract: Discrete graph generation has emerged as a powerful paradigm for modeling graph data, often relying on highly expressive neural backbones such as transformers or higher-order architectures. We revisit this design choice by introducing GenGNN, a...
Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models
arXiv:2603.08859v1 Announce Type: new Abstract: Hybrid sequence models, which combine Transformer and state-space model layers, seek to gain the expressive versatility of attention as well as the computational efficiency of state-space model layers. Despite burgeoning interest in hybrid models, we lack a basic...