Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies
arXiv:2602.18291v1 Announce Type: new Abstract: Online Multi-Agent Reinforcement Learning (MARL) is a prominent framework for efficient agent coordination. Crucially, enhancing policy expressiveness is pivotal for achieving superior performance. Diffusion-based generative models are well-positioned to meet this demand, having demonstrated remarkable...
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models
arXiv:2602.17684v1 Announce Type: cross Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large language models by leveraging execution-based feedback from unit tests, but its scalability is fundamentally constrained by the availability and reliability of high-quality...
Agentic Unlearning: When LLM Agent Meets Machine Unlearning
arXiv:2602.17692v1 Announce Type: cross Abstract: In this paper, we introduce agentic unlearning, which removes specified information from both model parameters and persistent memory in agents with closed-loop interaction. Existing unlearning methods target parameters alone, leaving two critical gaps: (i) parameter-memory...
A Case Study of Selected PTQ Baselines for Reasoning LLMs on Ascend NPU
arXiv:2602.17693v1 Announce Type: cross Abstract: Post-Training Quantization (PTQ) is crucial for efficient model deployment, yet its effectiveness on Ascend NPU remains under-explored compared to GPU architectures. This paper presents a case study of representative PTQ baselines applied to reasoning-oriented models...
MIDAS: Mosaic Input-Specific Differentiable Architecture Search
arXiv:2602.17700v1 Announce Type: cross Abstract: Differentiable Neural Architecture Search (NAS) provides efficient, gradient-based methods for automatically designing neural networks, yet its adoption remains limited in practice. We present MIDAS, a novel approach that modernizes DARTS by replacing static architecture parameters...
UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems
arXiv:2602.17709v1 Announce Type: cross Abstract: All-atom molecular simulation serves as a quintessential "computational microscope" for understanding the machinery of life, yet it remains fundamentally limited by the trade-off between quantum-mechanical (QM) accuracy and biological scale. We present UBio-MolFM, a universal...
Symbolic computation of conservation laws of nonlinear partial differential equations in multi‐dimensions
Abstract A direct method for the computation of polynomial conservation laws of polynomial systems of nonlinear partial differential equations (PDEs) in multi‐dimensions is presented. The method avoids advanced differential‐geometric tools. Instead, it is solely based on calculus, variational calculus, and...
On the Dynamics of Observation and Semantics
arXiv:2602.18494v1 Announce Type: new Abstract: A dominant paradigm in visual intelligence treats semantics as a static property of latent representations, assuming that meaning can be discovered through geometric proximity in high dimensional embedding spaces. In this work, we argue that...
Spilled Energy in Large Language Models
arXiv:2602.18671v1 Announce Type: new Abstract: We reinterpret the final Large Language Model (LLM) softmax classifier as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs at inference. This principled approach allows us to track "energy spills"...
Task-Aware Exploration via a Predictive Bisimulation Metric
arXiv:2602.18724v1 Announce Type: new Abstract: Accelerating exploration in visual reinforcement learning under sparse rewards remains challenging due to the substantial task-irrelevant variations. Despite advances in intrinsic exploration, many methods either assume access to low-dimensional states or lack task-aware exploration strategies,...
The Convergence of Schema-Guided Dialogue Systems and the Model Context Protocol
arXiv:2602.18764v1 Announce Type: new Abstract: This paper establishes a fundamental convergence: Schema-Guided Dialogue (SGD) and the Model Context Protocol (MCP) represent two manifestations of a unified paradigm for deterministic, auditable LLM-agent interaction. SGD, designed for dialogue-based API discovery (2019), and...
ABD: Default Exception Abduction in Finite First Order Worlds
arXiv:2602.18843v1 Announce Type: new Abstract: We introduce ABD, a benchmark for default-exception abduction over finite first-order worlds. Given a background theory with an abnormality predicate and a set of relational structures, a model must output a first-order formula that defines...
TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models
arXiv:2602.18884v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs), particularly smaller, deployable variants, exhibit a critical deficiency in understanding temporal and procedural visual data, a bottleneck hindering their application in real-world embodied AI. This gap is largely caused by...
Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking)
arXiv:2602.18918v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used as scientific copilots, but evidence on their role in research-level mathematics remains limited, especially for workflows accessible to individual researchers. We present early evidence for vibe-proving with a...
DREAM: Deep Research Evaluation with Agentic Metrics
arXiv:2602.18940v1 Announce Type: new Abstract: Deep Research Agents generate analyst-grade reports, yet evaluating them remains challenging due to the absence of a single ground truth and the multidimensional nature of research quality. Recent benchmarks propose distinct methodologies, yet they suffer...
High Dimensional Procedural Content Generation
arXiv:2602.18943v1 Announce Type: new Abstract: Procedural content generation (PCG) has made substantial progress in shaping static 2D/3D geometry, while most methods treat gameplay mechanics as auxiliary and optimize only over space. We argue that this limits controllability and expressivity, and...
(Perlin) Noise as AI coordinator
arXiv:2602.18947v1 Announce Type: new Abstract: Large scale control of nonplayer agents is central to modern games, while production systems still struggle to balance several competing goals: locally smooth, natural behavior, and globally coordinated variety across space and time. Prior approaches...
Modularity is the Bedrock of Natural and Artificial Intelligence
arXiv:2602.18960v1 Announce Type: new Abstract: The remarkable performance of modern AI systems has been driven by unprecedented scales of data, computation, and energy -- far exceeding the resources required by human intelligence. This disparity highlights the need for new guiding...
InfEngine: A Self-Verifying and Self-Optimizing Intelligent Engine for Infrared Radiation Computing
arXiv:2602.18985v1 Announce Type: new Abstract: Infrared radiation computing underpins advances in climate science, remote sensing and spectroscopy but remains constrained by manual workflows. We introduce InfEngine, an autonomous intelligent computational engine designed to drive a paradigm shift from human-led orchestration...
Quantifying Automation Risk in High-Automation AI Systems: A Bayesian Framework for Failure Propagation and Optimal Oversight
arXiv:2602.18986v1 Announce Type: new Abstract: Organizations across finance, healthcare, transportation, content moderation, and critical infrastructure are rapidly deploying highly automated AI systems, yet they lack principled methods to quantify how increasing automation amplifies harm when failures occur. We propose a...
Evaluating Large Language Models on Quantum Mechanics: A Comparative Study Across Diverse Models and Tasks
arXiv:2602.19006v1 Announce Type: new Abstract: We present a systematic evaluation of large language models on quantum mechanics problem-solving. Our study evaluates 15 models from five providers (OpenAI, Anthropic, Google, Alibaba, DeepSeek) spanning three capability tiers on 20 tasks covering derivations,...
DoAtlas-1: A Causal Compilation Paradigm for Clinical AI
arXiv:2602.19158v1 Announce Type: new Abstract: Medical foundation models generate narrative explanations but cannot quantify intervention effects, detect evidence conflicts, or validate literature claims, limiting clinical auditability. We propose causal compilation, a paradigm that transforms medical evidence from narrative text into...
Hiding in Plain Text: Detecting Concealed Jailbreaks via Activation Disentanglement
arXiv:2602.19396v1 Announce Type: new Abstract: Large language models (LLMs) remain vulnerable to jailbreak prompts that are fluent and semantically coherent, and therefore difficult to detect with standard heuristics. A particularly challenging failure mode occurs when an attacker tries to hide...
INSURE-Dial: A Phase-Aware Conversational Dataset & Benchmark for Compliance Verification and Phase Detection
arXiv:2602.18448v1 Announce Type: new Abstract: Administrative phone tasks drain roughly 1 trillion USD annually from U.S. healthcare, with over 500 million insurance-benefit verification calls manually handled in 2024. We introduce INSURE-Dial, to our knowledge the first public benchmark for developing...
Semantic Substrate Theory: An Operator-Theoretic Framework for Geometric Semantic Drift
arXiv:2602.18699v1 Announce Type: new Abstract: Most semantic drift studies report multiple signals, e.g., embedding displacement, neighbor changes, distributional divergence, and recursive trajectory instability, without a shared explanatory theory that relates them. This paper proposes a formalization of these signals in...
DeepInnovator: Triggering the Innovative Capabilities of LLMs
arXiv:2602.18920v1 Announce Type: new Abstract: The application of Large Language Models (LLMs) in accelerating scientific discovery has garnered increasing attention, with a key focus on constructing research agents endowed with innovative capability, i.e., the ability to autonomously generate novel and...
Causal Identification from Counterfactual Data: Completeness and Bounding Results
arXiv:2602.23541v1 Announce Type: new Abstract: Previous work establishing completeness results for counterfactual identification has been circumscribed to the setting where the input data belongs to observational or interventional distributions (Layers 1 and 2 of Pearl's Causal Hierarchy), since it was...
Planning under Distribution Shifts with Causal POMDPs
arXiv:2602.23545v1 Announce Type: new Abstract: In the real world, planning is often challenged by distribution shifts. As such, a model of the environment obtained under one set of conditions may no longer remain valid as the distribution of states or...
The Auton Agentic AI Framework
arXiv:2602.23720v1 Announce Type: new Abstract: The field of Artificial Intelligence is undergoing a transition from Generative AI -- probabilistic generation of text and images -- to Agentic AI, in which autonomous systems execute actions within external environments on behalf of...
RUMAD: Reinforcement-Unifying Multi-Agent Debate
arXiv:2602.23864v1 Announce Type: new Abstract: Multi-agent debate (MAD) systems leverage collective intelligence to enhance reasoning capabilities, yet existing approaches struggle to simultaneously optimize accuracy, consensus formation, and computational efficiency. Static topology methods lack adaptability to task complexity variations, while external...