Optimization of Edge Directions and Weights for Mixed Guidance Graphs in Lifelong Multi-Agent Path Finding
arXiv:2602.23468v1 Announce Type: cross Abstract: Multi-Agent Path Finding (MAPF) aims to move agents from their start to goal vertices on a graph. Lifelong MAPF (LMAPF) continuously assigns new goals to agents as they complete current ones. To guide agents' movement...
TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving
arXiv:2602.23499v1 Announce Type: cross Abstract: Collecting a high-quality dataset is a critical task that demands meticulous attention to detail, as overlooking certain aspects can render the entire dataset unusable. Autonomous driving challenges remain a prominent area of research, requiring further...
CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
arXiv:2602.23452v1 Announce Type: new Abstract: Scientific research relies on accurate citation for attribution and integrity, yet large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. Such hallucinated citations have already...
FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records
arXiv:2602.23479v1 Announce Type: new Abstract: Though patients are increasingly granted digital access to their electronic health records (EHRs), existing interfaces may not support precise, trustworthy answers to patient-specific questions. Large language models (LLM) show promise in clinical question answering (QA),...
IDP Accelerator: Agentic Document Intelligence from Extraction to Compliance Validation
arXiv:2602.23481v1 Announce Type: new Abstract: Understanding and extracting structured insights from unstructured documents remains a foundational challenge in industrial NLP. While Large Language Models (LLMs) enable zero-shot extraction, traditional pipelines often fail to handle multi-document packets, complex reasoning, and strict...
Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking
arXiv:2603.00267v1 Announce Type: new Abstract: Misinformation spreading over the Internet poses a significant threat to both societies and individuals, necessitating robust and scalable fact-checking that relies on retrieving accurate and trustworthy evidence. Previous methods rely on semantic and social-contextual patterns...
EmCoop: A Framework and Benchmark for Embodied Cooperation Among LLM Agents
arXiv:2603.00349v1 Announce Type: new Abstract: Real-world scenarios increasingly require multiple embodied agents to collaborate in dynamic environments under embodied constraints, as many tasks exceed the capabilities of any single agent. Recent advances in large language models (LLMs) enable high-level cognitive...
Monotropic Artificial Intelligence: Toward a Cognitive Taxonomy of Domain-Specialized Language Models
arXiv:2603.00350v1 Announce Type: new Abstract: The prevailing paradigm in artificial intelligence research equates progress with scale: larger models trained on broader datasets are presumed to yield superior capabilities. This assumption, while empirically productive for general-purpose applications, obscures a fundamental epistemological...
NeuroHex: Highly-Efficient Hex Coordinate System for Creating World Models to Enable Adaptive AI
arXiv:2603.00376v1 Announce Type: new Abstract: \textit{NeuroHex} is a hexagonal coordinate system designed to support highly efficient world models and reference frames for online adaptive AI systems. Inspired by the hexadirectional firing structure of grid cells in the human brain, NeuroHex...
MED-COPILOT: A Medical Assistant Powered by GraphRAG and Similar Patient Case Retrieval
arXiv:2603.00460v1 Announce Type: new Abstract: Clinical decision-making requires synthesizing heterogeneous evidence, including patient histories, clinical guidelines, and trajectories of comparable cases. While large language models (LLMs) offer strong reasoning capabilities, they remain prone to hallucinations and struggle to integrate long,...
MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
arXiv:2603.00585v1 Announce Type: new Abstract: Recent advances in video generation have opened new avenues for macroscopic simulation of complex dynamic systems, but their application to microscopic phenomena remains largely unexplored. Microscale simulation holds great promise for biomedical applications such as...
TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces
arXiv:2603.00623v1 Announce Type: new Abstract: Agentic systems augment large language models with external tools and iterative decision making, enabling complex tasks such as deep research, function calling, and coding. However, their long and intricate execution traces make failure diagnosis and...
LiTS: A Modular Framework for LLM Tree Search
arXiv:2603.00631v1 Announce Type: new Abstract: LiTS is a modular Python framework for LLM reasoning via tree search. It decomposes tree search into three reusable components (Policy, Transition, and RewardModel) that plug into algorithms like MCTS and BFS. A decorator-based registry...
The Synthetic Web: Adversarially-Curated Mini-Internets for Diagnosing Epistemic Weaknesses of Language Agents
arXiv:2603.00801v1 Announce Type: new Abstract: Language agents increasingly act as web-enabled systems that search, browse, and synthesize information from diverse sources. However, these sources can include unreliable or adversarial content, and the robustness of agents to adversarial ranking - where...
MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains
arXiv:2603.00873v1 Announce Type: new Abstract: With the increasing demand for step-wise, cross-modal, and knowledge-grounded reasoning, multimodal large language models (MLLMs) are evolving beyond the traditional fixed retrieve-then-generate paradigm toward more sophisticated agentic multimodal retrieval-augmented generation (MM-RAG). Existing benchmarks, however, mainly...
BioProAgent: Neuro-Symbolic Grounding for Constrained Scientific Planning
arXiv:2603.00876v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated significant reasoning capabilities in scientific discovery but struggle to bridge the gap to physical execution in wet-labs. In these irreversible environments, probabilistic hallucinations are not merely incorrect, but also...
HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents
arXiv:2603.00977v1 Announce Type: new Abstract: Large language model (LLM) agents have recently demonstrated strong capabilities in interactive decision-making, yet they remain fundamentally limited in long-horizon tasks that require structured planning and reliable execution. Existing approaches predominantly rely on flat autoregressive...
Tracking Capabilities for Safer Agents
arXiv:2603.00991v1 Announce Type: new Abstract: AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause unintended side effects, or be manipulated through prompt injection. To address these challenges, we...
DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage
arXiv:2603.01106v1 Announce Type: new Abstract: Reinforcement learning (RL) with group relative policy optimization (GRPO) has become a widely adopted approach for enhancing the reasoning capabilities of multimodal large language models (MLLMs). While GRPO enables long-chain reasoning without a critic, it...
DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent
arXiv:2603.01152v1 Announce Type: new Abstract: Deep-research agents are capable of executing multi-step web exploration, targeted retrieval, and sophisticated question answering. Despite their powerful capabilities, deep-research agents face two critical bottlenecks: (1) the lack of large-scale, challenging datasets with real-world difficulty,...
Incremental LTLf Synthesis
arXiv:2603.01201v1 Announce Type: new Abstract: In this paper, we study incremental LTLf synthesis -- a form of reactive synthesis where the goals are given incrementally while in execution. In other words, the protagonist agent is already executing a strategy for...
Noise reduction in BERT NER models for clinical entity extraction
arXiv:2603.00022v1 Announce Type: new Abstract: Precision is of utmost importance in the realm of clinical entity extraction from clinical notes and reports. Encoder Models fine-tuned for Named Entity Recognition (NER) are an efficient choice for this purpose, as they don't...
SimpleTool: Parallel Decoding for Real-Time LLM Function Calling
arXiv:2603.00030v1 Announce Type: new Abstract: LLM-based function calling enables intelligent agents to interact with external tools and environments, yet autoregressive decoding imposes a fundamental latency bottleneck that limits real-time applications such as embodied intelligence, game AI, and interactive avatars (e.g.,...
When Metrics Disagree: Automatic Similarity vs. LLM-as-a-Judge for Clinical Dialogue Evaluation
arXiv:2603.00314v1 Announce Type: new Abstract: This paper details the baseline model selection, fine-tuning process, evaluation methods, and the implications of deploying more accurate LLMs in healthcare settings. As large language models (LLMs) are increasingly employed to address diverse problems, including...
Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory
arXiv:2603.02473v1 Announce Type: new Abstract: Memory-augmented LLM agents store and retrieve information from prior interactions, yet the relative importance of how memories are written versus how they are retrieved remains unclear. We introduce a diagnostic framework that analyzes how performance...
See and Remember: A Multimodal Agent for Web Traversal
arXiv:2603.02626v1 Announce Type: new Abstract: Autonomous web navigation requires agents to perceive complex visual environments and maintain long-term context, yet current Large Language Model (LLM) based agents often struggle with spatial disorientation and navigation loops. In this paper, we propose...
FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing
arXiv:2603.02702v1 Announce Type: new Abstract: The financial domain involves a variety of important time-series problems. Recently, time-series analysis methods that jointly leverage textual and numerical information have gained increasing attention. Accordingly, numerous efforts have been made to construct text-paired time-series...
Rethinking Code Similarity for Automated Algorithm Design with LLMs
arXiv:2603.02787v1 Announce Type: new Abstract: The rise of Large Language Model-based Automated Algorithm Design (LLM-AAD) has transformed algorithm development by autonomously generating code implementations of expert-level algorithms. Unlike traditional expert-driven algorithm development, in the LLM-AAD paradigm, the main design principle...
SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
arXiv:2603.02908v1 Announce Type: new Abstract: In recent years, pre-trained large language models have achieved remarkable success across diverse tasks. Besides the pivotal role of self-supervised pre-training, their effectiveness in downstream applications also depends critically on the post-training process, which adapts...
OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents
arXiv:2603.03005v1 Announce Type: new Abstract: Multi-agent large language model frameworks are promising for complex multi step reasoning, yet existing systems remain weak for scientific and knowledge intensive domains due to static prompts and agent roles, rigid workflows, and homogeneous model...