Empirical Comparison of Agent Communication Protocols for Task Orchestration
arXiv:2603.22823v1 Announce Type: new Abstract: Context. Nowadays, artificial intelligence agent systems are transforming from single-tool interactions to complex multi-agent orchestrations. As a result, two competing communication protocols have emerged: a tool integration protocol that standardizes how agents invoke external tools,...
Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?
arXiv:2603.22582v1 Announce Type: new Abstract: Chain-of-thought (CoT) reasoning has been proposed as a transparency mechanism for large language models in safety-critical deployments, yet its effectiveness depends on faithfulness (whether models accurately verbalize the factors that actually influence their outputs), a...
Functional Component Ablation Reveals Specialization Patterns in Hybrid Language Model Architectures
arXiv:2603.22473v1 Announce Type: new Abstract: Hybrid language models combining attention with state space models (SSMs) or linear attention offer improved efficiency, but whether both components are genuinely utilized remains unclear. We present a functional component ablation framework applied to two...
The Efficiency Attenuation Phenomenon: A Computational Challenge to the Language of Thought Hypothesis
arXiv:2603.22312v1 Announce Type: new Abstract: This paper computationally investigates whether thought requires a language-like format, as posited by the Language of Thought (LoT) hypothesis. We introduce the ``AI Private Language'' thought experiment: if two artificial agents develop an efficient, inscrutable...
Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length
arXiv:2603.22608v1 Announce Type: new Abstract: Users often rely on Large Language Models (LLMs) for processing multiple documents or performing analysis over a number of instances. For example, analysing the overall sentiment of a number of movie reviews requires an LLM...
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments
arXiv:2603.23231v1 Announce Type: new Abstract: Empowering large language models with long-term memory is crucial for building agents that adapt to users' evolving needs. However, prior evaluations typically interleave preference-related dialogues with irrelevant conversations, reducing the task to needle-in-a-haystack retrieval while...
MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models
arXiv:2603.23085v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet, existing medical chain-of-thought (CoT) models lack explicit mechanisms to represent and enforce causal reasoning, leaving them vulnerable to spurious...
Efficient Hallucination Detection: Adaptive Bayesian Estimation of Semantic Entropy with Guided Semantic Exploration
arXiv:2603.22812v1 Announce Type: new Abstract: Large language models (LLMs) have achieved remarkable success in various natural language processing tasks, yet they remain prone to generating factually incorrect outputs known as hallucinations. While recent approaches have shown promise for hallucination detection...
Parametric Knowledge and Retrieval Behavior in RAG Fine-Tuning for Electronic Design Automation
arXiv:2603.23047v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) fine-tuning has shown substantial improvements over vanilla RAG, yet most studies target document question answering and often rely on standard NLP metrics that can obscure factual differences. We evaluate RAG fine-tuning for...
AuthorMix: Modular Authorship Style Transfer via Layer-wise Adapter Mixing
arXiv:2603.23069v1 Announce Type: new Abstract: The task of authorship style transfer involves rewriting text in the style of a target author while preserving the meaning of the original text. Existing style transfer methods train a single model on large corpora...
Trained Persistent Memory for Frozen Decoder-Only LLMs
arXiv:2603.22329v1 Announce Type: new Abstract: Decoder-only language models are stateless: hidden representations are discarded after every forward pass and nothing persists across sessions. Jeong (2026a) showed that trained memory adapters give a frozen encoder-decoder backbone persistent latent-space memory, building on...
WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement
arXiv:2603.22352v1 Announce Type: new Abstract: Recent progress in reinforcement learning with verifiable rewards (RLVR) offers a practical path to self-improvement of language models, but existing methods face a key trade-off: endogenous self-play can drift over iterations, while corpus-grounded approaches rely...
FAAR: Format-Aware Adaptive Rounding for NVFP4
arXiv:2603.22370v1 Announce Type: new Abstract: Deploying large language models (LLMs) on edge devices requires extremely low-bit quantization. Ultra-low precision formats such as NVFP4 offer a promising solution for reducing memory footprint and accelerating computation. However, existing quantization methods typically rely...
Instruction-Tuned, but Not More Verifiable Instruction-Following: A Cross-Task Diagnosis for LoRA Adapters
arXiv:2603.22379v1 Announce Type: new Abstract: Adapters are often selected and deployed based on nominal labels (e.g., instruction-tuned), which implicitly suggest what capability improves after adaptation. We test whether nominal training objectives reliably align with realized cross-task capability gains by evaluating...
Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure
arXiv:2603.22384v1 Announce Type: new Abstract: Autonomous agents operating in continuous environments must decide not only what to do, but when to act. We introduce a lightweight adaptive temporal control system that learns the optimal interval between cognitive ticks from experience,...
Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning
arXiv:2603.22430v1 Announce Type: new Abstract: Offline Reinforcement Learning (RL) aims to learn optimal policies from fixed offline datasets, without further interactions with the environment. Such methods train an offline policy (or value function), and apply it at inference time without...
SkillRouter: Retrieve-and-Rerank Skill Selection for LLM Agents at Scale
arXiv:2603.22455v1 Announce Type: new Abstract: As LLM agent ecosystems grow, the number of available skills (tools, plugins) has reached tens of thousands, making it infeasible to inject all skills into an agent's context. This creates a need for skill routing...
A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks
arXiv:2603.22586v1 Announce Type: new Abstract: In-context learning (ICL) allows a model to adapt at inference time by conditioning on examples rather than updating parameters. Existing time-series foundation models use implicit positional context, retrieval, or task-specific objectives, but rarely explicit instruction-conditioned...
Context Cartography: Toward Structured Governance of Contextual Space in Large Language Model Systems
arXiv:2603.20578v1 Announce Type: new Abstract: The prevailing approach to improving large language model (LLM) reasoning has centered on expanding context windows, implicitly assuming that more tokens yield better performance. However, empirical evidence - including the "lost in the middle" effect...
Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs
arXiv:2603.20209v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) combine the linguistic strengths of LLMs with the ability to process multimodal data, enbaling them to address a broader range of visual tasks. Because MLLMs aim at more general, human-like...
KLDrive: Fine-Grained 3D Scene Reasoning for Autonomous Driving based on Knowledge Graph
arXiv:2603.21029v1 Announce Type: new Abstract: Autonomous driving requires reliable reasoning over fine-grained 3D scene facts. Fine-grained question answering over multi-modal driving observations provides a natural way to evaluate this capability, yet existing perception pipelines and driving-oriented large language model (LLM)...
Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning
arXiv:2603.20662v1 Announce Type: new Abstract: Despite remarkable advances in large Vision-Language Models (VLMs), spatial reasoning remains a persistent challenge. In this work, we investigate how attention heads within VLMs contribute to spatial reasoning by analyzing their functional roles through a...
From 50% to Mastery in 3 Days: A Low-Resource SOP for Localizing Graduate-Level AI Tutors via Shadow-RAG
arXiv:2603.20650v1 Announce Type: new Abstract: Deploying high-fidelity AI tutors in schools is often blocked by the Resource Curse -- the need for expensive cloud GPUs and massive data engineering. In this practitioner report, we present a replicable Standard Operating Procedure...
AutoMOOSE: An Agentic AI for Autonomous Phase-Field Simulation
arXiv:2603.20986v1 Announce Type: new Abstract: Multiphysics simulation frameworks such as MOOSE provide rigorous engines for phase-field materials modeling, yet adoption is constrained by the expertise required to construct valid input files, coordinate parameter sweeps, diagnose failures, and extract quantitative results....
The AI Scientific Community: Agentic Virtual Lab Swarms
arXiv:2603.21344v1 Announce Type: new Abstract: In this short note we propose using agentic swarms of virtual labs as a model of an AI Science Community. In this paradigm, each particle in the swarm represents a complete virtual laboratory instance, enabling...
CRoCoDiL: Continuous and Robust Conditioned Diffusion for Language
arXiv:2603.20210v1 Announce Type: new Abstract: Masked Diffusion Models (MDMs) provide an efficient non-causal alternative to autoregressive generation but often struggle with token dependencies and semantic incoherence due to their reliance on discrete marginal distributions. We address these limitations by shifting...
ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics
arXiv:2603.20260v1 Announce Type: new Abstract: The integration of Large Language Models into Multi-Agent Systems (MAS) has enabled the so-lution of complex, long-horizon tasks through collaborative reasoning. However, this collec-tive intelligence is inherently fragile, as a single logical fallacy can rapidly...
gUFO: A Gentle Foundational Ontology for Semantic Web Knowledge Graphs
arXiv:2603.20948v1 Announce Type: new Abstract: gUFO is a lightweight implementation of the Unified Foundational Ontology (UFO) suitable for Semantic Web OWL 2 DL applications. UFO is a mature foundational ontology with a rich axiomatization and that has been employed in...
Position: Multi-Agent Algorithmic Care Systems Demand Contestability for Trustworthy AI
arXiv:2603.20595v1 Announce Type: new Abstract: Multi-agent systems (MAS) are increasingly used in healthcare to support complex decision-making through collaboration among specialized agents. Because these systems act as collective decision-makers, they raise challenges for trust, accountability, and human oversight. Existing approaches...
Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models
arXiv:2603.20212v1 Announce Type: new Abstract: Reward models (RMs) are critical for aligning Large Language Models via Reinforcement Learning from Human Feedback (RLHF). While Generative Reward Models (GRMs) achieve superior accuracy through chain-of-thought (CoT) reasoning, they incur substantial computational costs. Conversely,...