RUMAD: Reinforcement-Unifying Multi-Agent Debate
arXiv:2602.23864v1 Announce Type: new Abstract: Multi-agent debate (MAD) systems leverage collective intelligence to enhance reasoning capabilities, yet existing approaches struggle to simultaneously optimize accuracy, consensus formation, and computational efficiency. Static topology methods lack adaptability to task complexity variations, while external...
RF-Agent: Automated Reward Function Design via Language Agent Tree Search
arXiv:2602.23876v1 Announce Type: new Abstract: Designing efficient reward functions for low-level control tasks is a challenging problem. Recent research aims to reduce reliance on expert experience by using Large Language Models (LLMs) with task information to generate dense reward functions....
Pessimistic Auxiliary Policy for Offline Reinforcement Learning
arXiv:2602.23974v1 Announce Type: new Abstract: Offline reinforcement learning aims to learn an agent from pre-collected datasets, avoiding unsafe and inefficient real-time interaction. However, inevitable access to out-ofdistribution actions during the learning process introduces approximation errors, causing the error accumulation and...
Portfolio Reinforcement Learning with Scenario-Context Rollout
arXiv:2602.24037v1 Announce Type: new Abstract: Market regime shifts induce distribution shifts that can degrade the performance of portfolio rebalancing policies. We propose macro-conditioned scenario-context rollout (SCR) that generates plausible next-day multivariate return scenarios under stress events. However, doing so faces...
Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction
arXiv:2602.24080v1 Announce Type: new Abstract: The pursuit of human-like conversational agents has long been guided by the Turing test. For modern speech-to-speech (S2S) systems, a critical yet unanswered question is whether they can converse like humans. To tackle this, we...
Bi-level RL-Heuristic Optimization for Real-world Winter Road Maintenance
arXiv:2602.24097v1 Announce Type: new Abstract: Winter road maintenance is critical for ensuring public safety and reducing environmental impacts, yet existing methods struggle to manage large-scale routing problems effectively and mostly reply on human decision. This study presents a novel, scalable...
Artificial Agency Program: Curiosity, compression, and communication in agents
arXiv:2602.24100v1 Announce Type: new Abstract: This paper presents the Artificial Agency Program (AAP), a position and research agenda for building AI systems as reality embedded, resource-bounded agents whose development is driven by curiosity-as-learning-progress under physical and computational constraints. The central...
Recycling Failures: Salvaging Exploration in RLVR via Fine-Grained Off-Policy Guidance
arXiv:2602.24110v1 Announce Type: new Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the complex reasoning capabilities of Large Reasoning Models. However, standard outcome-based supervision suffers from a critical limitation that penalizes trajectories that...
LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics
arXiv:2602.24173v1 Announce Type: new Abstract: We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textbook-style problems as proxies for mathematical research. Instead, we...
Learning Flexible Job Shop Scheduling under Limited Buffers and Material Kitting Constraints
arXiv:2602.24180v1 Announce Type: new Abstract: The Flexible Job Shop Scheduling Problem (FJSP) originates from real production lines, while some practical constraints are often ignored or idealized in current FJSP studies, among which the limited buffer problem has a particular impact...
A Minimal Agent for Automated Theorem Proving
arXiv:2602.24273v1 Announce Type: new Abstract: We propose a minimal agentic baseline that enables systematic comparison across different AI-based theorem prover architectures. This design implements the core features shared among state-of-the-art systems: iterative proof refinement, library search and context management. We...
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
arXiv:2602.24288v1 Announce Type: new Abstract: The fast-growing demands in using Large Language Models (LLMs) to tackle complex multi-step data science tasks create an emergent need for accurate benchmarking. There are two major gaps in existing benchmarks: (i) the lack of...
QD-MAPPER: A Quality Diversity Framework to Automatically Evaluate Multi-Agent Path Finding Algorithms in Diverse Maps
arXiv:2409.06888v5 Announce Type: cross Abstract: We use the Quality Diversity (QD) algorithm with Neural Cellular Automata (NCA) to automatically evaluate Multi-Agent Path Finding (MAPF) algorithms by generating diverse maps. Previously, researchers typically evaluate MAPF algorithms on a set of specific,...
Let There Be Claws: An Early Social Network Analysis of AI Agents on Moltbook
arXiv:2602.20044v1 Announce Type: cross Abstract: Within twelve days of launch, an AI-native social platform exhibits extreme attention concentration, hierarchical role separation, and one-way attention flow, consistent with the hypothesis that stratification in agent ecosystems can emerge rapidly rather than gradually....
Keyword search is all you need: Achieving RAG-Level Performance without vector databases using agentic tool use
arXiv:2602.23368v1 Announce Type: cross Abstract: While Retrieval-Augmented Generation (RAG) has proven effective for generating accurate, context-based responses based on existing knowledge bases, it presents several challenges including retrieval quality dependencies, integration complexity and cost. Recent advances in agentic-RAG and tool-augmented...
Toward General Semantic Chunking: A Discriminative Framework for Ultra-Long Documents
arXiv:2602.23370v1 Announce Type: cross Abstract: Long-document topic segmentation plays an important role in information retrieval and document understanding, yet existing methods still show clear shortcomings in ultra-long text settings. Traditional discriminative models are constrained by fixed windows and cannot model...
Domain-Partitioned Hybrid RAG for Legal Reasoning: Toward Modular and Explainable Legal AI for India
arXiv:2602.23371v1 Announce Type: cross Abstract: Legal research in India involves navigating long and heterogeneous documents spanning statutes, constitutional provisions, penal codes, and judicial precedents, where purely keyword-based or embedding-only retrieval systems often fail to support structured legal reasoning. Recent retrieval...
Democratizing GraphRAG: Linear, CPU-Only Graph Retrieval for Multi-Hop QA
arXiv:2602.23372v1 Announce Type: cross Abstract: GraphRAG systems improve multi-hop retrieval by modeling structure, but many approaches rely on expensive LLM-based graph construction and GPU-heavy inference. We present SPRIG (Seeded Propagation for Retrieval In Graphs), a CPU-only, linear-time, token-free GraphRAG pipeline...
Now You See Me: Designing Responsible AI Dashboards for Early-Stage Health Innovation
arXiv:2602.23378v1 Announce Type: cross Abstract: Innovative HealthTech teams develop Artificial Intelligence (AI) systems in contexts where ethical expectations and organizational priorities must be balanced under severe resource constraints. While Responsible AI practices are expected to guide the design and evaluation...
Task-Lens: Cross-Task Utility Based Speech Dataset Profiling for Low-Resource Indian Languages
arXiv:2602.23388v1 Announce Type: cross Abstract: The rising demand for inclusive speech technologies amplifies the need for multilingual datasets for Natural Language Processing (NLP) research. However, limited awareness of existing task-specific resources in low-resource languages hinders research. This challenge is especially...
Learning to Generate Secure Code via Token-Level Rewards
arXiv:2602.23407v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained...
Long Range Frequency Tuning for QML
arXiv:2602.23409v1 Announce Type: cross Abstract: Quantum machine learning models using angle encoding naturally represent truncated Fourier series, providing universal function approximation capabilities with sufficient circuit depth. For unary fixed-frequency encodings, circuit depth scales as O(omega_max * (omega_max + epsilon^{-2})) with...
Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning
arXiv:2602.23446v1 Announce Type: cross Abstract: Large language models are trained primarily on human-generated data and feedback, yet they exhibit persistent errors arising from annotation noise, subjective preferences, and the limited expressive bandwidth of natural language. We argue that these limitations...
SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
arXiv:2602.23447v1 Announce Type: cross Abstract: Detection of rare lesions in whole-body CT is fundamentally limited by extreme class imbalance and low target-to-volume ratios, producing precision collapse despite high AUROC. Synthetic augmentation with diffusion models offers promise, yet pixel-space diffusion is...
BiKA: Kolmogorov-Arnold-Network-inspired Ultra Lightweight Neural Network Hardware Accelerator
arXiv:2602.23455v1 Announce Type: cross Abstract: Lightweight neural network accelerators are essential for edge devices with limited resources and power constraints. While quantization and binarization can efficiently reduce hardware cost, they still rely on the conventional Artificial Neural Network (ANN) computation...
TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving
arXiv:2602.23499v1 Announce Type: cross Abstract: Collecting a high-quality dataset is a critical task that demands meticulous attention to detail, as overlooking certain aspects can render the entire dataset unusable. Autonomous driving challenges remain a prominent area of research, requiring further...
Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning
arXiv:2602.23440v1 Announce Type: new Abstract: Training large language models to reason with search engines via reinforcement learning is hindered by a fundamental credit assignment problem: existing methods such as Search-R1 provide only a sparse outcome reward after an entire multi-step...
CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
arXiv:2602.23452v1 Announce Type: new Abstract: Scientific research relies on accurate citation for attribution and integrity, yet large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. Such hallucinated citations have already...
FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records
arXiv:2602.23479v1 Announce Type: new Abstract: Though patients are increasingly granted digital access to their electronic health records (EHRs), existing interfaces may not support precise, trustworthy answers to patient-specific questions. Large language models (LLM) show promise in clinical question answering (QA),...
Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking
arXiv:2603.00267v1 Announce Type: new Abstract: Misinformation spreading over the Internet poses a significant threat to both societies and individuals, necessitating robust and scalable fact-checking that relies on retrieving accurate and trustworthy evidence. Previous methods rely on semantic and social-contextual patterns...