Scaling View Synthesis Transformers
arXiv:2602.21341v1 Announce Type: cross Abstract: Geometry-free view synthesis transformers have recently achieved state-of-the-art performance in Novel View Synthesis (NVS), outperforming traditional approaches that rely on explicit geometry modeling. Yet the factors governing their scaling with compute remain unclear. We present...
Towards Controllable Video Synthesis of Routine and Rare OR Events
arXiv:2602.21365v1 Announce Type: cross Abstract: Purpose: Curating large-scale datasets of operating room (OR) workflow, encompassing rare, safety-critical, or atypical events, remains operationally and ethically challenging. This data bottleneck complicates the development of ambient intelligence for detecting, understanding, and mitigating rare...
The Mean is the Mirage: Entropy-Adaptive Model Merging under Heterogeneous Domain Shifts in Medical Imaging
arXiv:2602.21372v1 Announce Type: cross Abstract: Model merging under unseen test-time distribution shifts often renders naive strategies, such as mean averaging unreliable. This challenge is especially acute in medical imaging, where models are fine-tuned locally at clinics on private data, producing...
Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages
arXiv:2602.21374v1 Announce Type: cross Abstract: Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP). This study evaluates a two-step pipeline combining Aya-expanse-8B as a Persian-to-English translation model with five open-source...
FedVG: Gradient-Guided Aggregation for Enhanced Federated Learning
arXiv:2602.21399v1 Announce Type: cross Abstract: Federated Learning (FL) enables collaborative model training across multiple clients without sharing their private data. However, data heterogeneity across clients leads to client drift, which degrades the overall generalization performance of the model. This effect...
The Headless Firm: How AI Reshapes Enterprise Boundaries
arXiv:2602.21401v1 Announce Type: cross Abstract: The boundary of the firm is determined by coordination cost. We argue that agentic AI induces a structural change in how coordination costs scale: in prior modular systems, integration cost grew with interaction topology (O(n^2)...
Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation
arXiv:2602.22215v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate potential in the field of scientific idea generation. However, the generated results often lack controllable academic context and traceable inspiration pathways. To bridge this gap, this paper proposes a scientific...
FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation
arXiv:2602.22273v1 Announce Type: new Abstract: We introduce FIRE, a comprehensive benchmark designed to evaluate both the theoretical financial knowledge of LLMs and their ability to handle practical business scenarios. For theoretical assessment, we curate a diverse set of examination questions...
Multi-Level Causal Embeddings
arXiv:2602.22287v1 Announce Type: new Abstract: Abstractions of causal models allow for the coarsening of models such that relations of cause and effect are preserved. Whereas abstractions focus on the relation between two models, in this paper we study a framework...
Agent Behavioral Contracts: Formal Specification and Runtime Enforcement for Reliable Autonomous AI Agents
arXiv:2602.22302v1 Announce Type: new Abstract: Traditional software relies on contracts -- APIs, type systems, assertions -- to specify and enforce correct behavior. AI agents, by contrast, operate on prompts and natural language instructions with no formal behavioral specification. This gap...
Exploring Human Behavior During Abstract Rule Inference and Problem Solving with the Cognitive Abstraction and Reasoning Corpus
arXiv:2602.22408v1 Announce Type: new Abstract: Humans exhibit remarkable flexibility in abstract reasoning, and can rapidly learn and apply rules from sparse examples. To investigate the cognitive strategies underlying this ability, we introduce the Cognitive Abstraction and Reasoning Corpus (CogARC), a...
Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents
arXiv:2602.22413v1 Announce Type: new Abstract: We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical epistemic voting results, such as the \textit{Condorcet Jury Theorem} (CJT), assume...
How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?
arXiv:2602.22441v1 Announce Type: new Abstract: Latent reasoning has been recently proposed as a reasoning paradigm and performs multi-step reasoning through generating steps in the latent space instead of the textual space. This paradigm enables reasoning beyond discrete language tokens by...
CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines
arXiv:2602.22452v1 Announce Type: new Abstract: A reliable action feasibility scorer is a critical bottleneck in embodied agent pipelines: before any planning or reasoning occurs, the agent must identify which candidate actions are physically executable in the current state. Existing approaches...
VeRO: An Evaluation Harness for Agents to Optimize Agents
arXiv:2602.22480v1 Announce Type: new Abstract: An important emerging application of coding agents is agent optimization: the iterative improvement of a target agent through edit-execute-evaluate cycles. Despite its relevance, the community lacks a systematic understanding of coding agent performance on this...
A Mathematical Theory of Agency and Intelligence
arXiv:2602.22519v1 Announce Type: new Abstract: To operate reliably under changing conditions, complex systems require feedback on how effectively they use resources, not just whether objectives are met. Current AI systems process vast information to produce sophisticated predictions, yet predictions can...
Agentic AI for Intent-driven Optimization in Cell-free O-RAN
arXiv:2602.22539v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) is emerging as a key enabler for autonomous radio access networks (RANs), where multiple large language model (LLM)-based agents reason and collaborate to achieve operator-defined intents. The open RAN (O-RAN) architecture...
Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance
arXiv:2602.22583v1 Announce Type: new Abstract: Example-based guidance is widely used to improve mathematical reasoning at inference time, yet its effectiveness is highly unstable across problems and models-even when the guidance is correct and problem-relevant. We show that this instability arises...
Correcting Human Labels for Rater Effects in AI Evaluation: An Item Response Theory Approach
arXiv:2602.22585v1 Announce Type: new Abstract: Human evaluations play a central role in training and assessing AI models, yet these data are rarely treated as measurements subject to systematic error. This paper integrates psychometric rater models into the AI pipeline to...
SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning
arXiv:2602.22603v1 Announce Type: new Abstract: Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distributed across multiple webpages and documents. In such tasks, the LLM context is dominated by tokens from external retrieval, causing memory usage to...
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
arXiv:2602.22638v1 Announce Type: new Abstract: Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is...
Knob: A Physics-Inspired Gating Interface for Interpretable and Controllable Neural Dynamics
arXiv:2602.22702v1 Announce Type: new Abstract: Existing neural network calibration methods often treat calibration as a static, post-hoc optimization task. However, this neglects the dynamic and temporal nature of real-world inference. Moreover, existing methods do not provide an intuitive interface enabling...
RLHFless: Serverless Computing for Efficient RLHF
arXiv:2602.22718v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LLM) post-training to align model outputs with human preferences. Recent models, such as DeepSeek-R1, have also shown RLHF's potential to improve...
Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning
arXiv:2602.22751v1 Announce Type: new Abstract: Large reasoning models (LRMs) have emerged as a powerful paradigm for solving complex real-world tasks. In practice, these models are predominantly trained via Reinforcement Learning with Verifiable Rewards (RLVR), yet most existing outcome-only RLVR pipelines...
AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications
arXiv:2602.22769v1 Announce Type: new Abstract: Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, where enabling long-horizon memory is critical for achieving strong performance. However, a significant gap exists between practical applications and current evaluation standards...
DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation
arXiv:2602.22839v1 Announce Type: new Abstract: Presentation generation requires deep content research, coherent visual design, and iterative refinement based on observation. However, existing presentation agents often rely on predefined workflows and fixed templates. To address this, we present DeepPresenter, an agentic...
The AI Research Assistant: Promise, Peril, and a Proof of Concept
arXiv:2602.22842v1 Announce Type: new Abstract: Can artificial intelligence truly contribute to creative mathematical research, or does it merely automate routine calculations while introducing risks of error? We provide empirical evidence through a detailed case study: the discovery of novel error...
OmniGAIA: Towards Native Omni-Modal AI Agents
arXiv:2602.22897v1 Announce Type: new Abstract: Human intelligence naturally intertwines omni-modal perception -- spanning vision, audio, and language -- with complex reasoning and tool usage to interact with the world. However, current multi-modal LLMs are primarily confined to bi-modal interactions (e.g.,...
Modeling Expert AI Diagnostic Alignment via Immutable Inference Snapshots
arXiv:2602.22973v1 Announce Type: new Abstract: Human-in-the-loop validation is essential in safety-critical clinical AI, yet the transition between initial model inference and expert correction is rarely analyzed as a structured signal. We introduce a diagnostic alignment framework in which the AI-generated...
Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design
arXiv:2602.23092v1 Announce Type: new Abstract: The Capacitated Vehicle Routing Problem (CVRP), a fundamental combinatorial optimization challenge, focuses on optimizing fleet operations under vehicle capacity constraints. While extensively studied in operational research, the NP-hard nature of CVRP continues to pose significant...