Quantifying Automation Risk in High-Automation AI Systems: A Bayesian Framework for Failure Propagation and Optimal Oversight
arXiv:2602.18986v1 Announce Type: new Abstract: Organizations across finance, healthcare, transportation, content moderation, and critical infrastructure are rapidly deploying highly automated AI systems, yet they lack principled methods to quantify how increasing automation amplifies harm when failures occur. We propose a...
INSURE-Dial: A Phase-Aware Conversational Dataset \& Benchmark for Compliance Verification and Phase Detection
arXiv:2602.18448v1 Announce Type: new Abstract: Administrative phone tasks drain roughly 1 trillion USD annually from U.S. healthcare, with over 500 million insurance-benefit verification calls manually handled in 2024. We introduce INSURE-Dial, to our knowledge the first public benchmark for developing...
The Auton Agentic AI Framework
arXiv:2602.23720v1 Announce Type: new Abstract: The field of Artificial Intelligence is undergoing a transition from Generative AI -- probabilistic generation of text and images -- to Agentic AI, in which autonomous systems execute actions within external environments on behalf of...
From Goals to Aspects, Revisited: An NFR Pattern Language for Agentic AI Systems
arXiv:2603.00472v1 Announce Type: new Abstract: Agentic AI systems exhibit numerous crosscutting concerns -- security, observability, cost management, fault tolerance -- that are poorly modularized in current implementations, contributing to the high failure rate of AI projects in reaching production. The...
AIoT-based Continuous, Contextualized, and Explainable Driving Assessment for Older Adults
arXiv:2603.00691v1 Announce Type: new Abstract: The world is undergoing a major demographic shift as older adults become a rapidly growing share of the population, creating new challenges for driving safety. In car-dependent regions such as the United States, driving remains...
From Prerequisites to Predictions: Validating a Geometric Hallucination Taxonomy Through Controlled Induction
arXiv:2603.00307v1 Announce Type: new Abstract: We test whether a geometric hallucination taxonomy -- classifying failures as center-drift (Type~1), wrong-well convergence (Type~2), or coverage gaps (Type~3) -- can distinguish hallucination types through controlled induction in GPT-2. Using a two-level statistical design...
Federated Inference: Toward Privacy-Preserving Collaborative and Incentivized Model Serving
arXiv:2603.02214v1 Announce Type: new Abstract: Federated Inference (FI) studies how independently trained and privately owned models can collaborate at inference time without sharing data or model parameters. While recent work has explored secure and distributed inference from disparate perspectives, a...
NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail Effect
arXiv:2603.02504v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve strong performance on natural language tasks but remain unreliable in mathematical reasoning, frequently generating fluent yet logically inconsistent solutions. We present \textbf{NeuroProlog}, a neurosymbolic framework that ensures verifiable reasoning by...
Agentified Assessment of Logical Reasoning Agents
arXiv:2603.02788v1 Announce Type: new Abstract: We present a framework for evaluating and benchmarking logical reasoning agents when assessment itself must be reproducible, auditable, and robust to execution failures. Building on agentified assessment, we use an assessor agent to issue tasks,...
From Offline to Periodic Adaptation for Pose-Based Shoplifting Detection in Real-world Retail Security
arXiv:2603.04723v1 Announce Type: new Abstract: Shoplifting is a growing operational and economic challenge for retailers, with incidents rising and losses increasing despite extensive video surveillance. Continuous human monitoring is infeasible, motivating automated, privacy-preserving, and resource-aware detection solutions. In this paper,...
MOOSEnger -- a Domain-Specific AI Agent for the MOOSE Ecosystem
arXiv:2603.04756v1 Announce Type: new Abstract: MOOSEnger is a tool-enabled AI agent tailored to the Multiphysics Object-Oriented Simulation Environment (MOOSE). MOOSE cases are specified in HIT ".i" input files; the large object catalog and strict syntax make initial setup and debugging...
Rethinking Representativeness and Diversity in Dynamic Data Selection
arXiv:2603.04981v1 Announce Type: new Abstract: Dynamic data selection accelerates training by sampling a changing subset of the dataset while preserving accuracy. We rethink two core notions underlying sample evaluation: representativeness and diversity. Instead of local geometric centrality, we define representativeness...
S5-SHB Agent: Society 5.0 enabled Multi-model Agentic Blockchain Framework for Smart Home
arXiv:2603.05027v1 Announce Type: new Abstract: The smart home is a key application domain within the Society 5.0 vision for a human-centered society. As smart home ecosystems expand with heterogeneous IoT protocols, diverse devices, and evolving threats, autonomous systems must manage...
Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure
arXiv:2603.05028v1 Announce Type: new Abstract: As Large Language Models (LLMs) evolve from chatbots to agentic assistants, they are increasingly observed to exhibit risky behaviors when subjected to survival pressure, such as the threat of being shut down. While multiple cases...
AegisUI: Behavioral Anomaly Detection for Structured User Interface Protocols in AI Agent Systems
arXiv:2603.05031v1 Announce Type: new Abstract: AI agents that build user interfaces on the fly assembling buttons, forms, and data displays from structured protocol payloads are becoming common in production systems. The trouble is that a payload can pass every schema...
Can LLMs Capture Expert Uncertainty? A Comparative Analysis of Value Alignment in Ethnographic Qualitative Research
arXiv:2603.04897v1 Announce Type: new Abstract: Qualitative analysis of open-ended interviews plays a central role in ethnographic and economic research by uncovering individuals' values, motivations, and culturally embedded financial behaviors. While large language models (LLMs) offer promising support for automating and...
Invariant Causal Routing for Governing Social Norms in Online Market Economies
arXiv:2603.04534v1 Announce Type: new Abstract: Social norms are stable behavioral patterns that emerge endogenously within economic systems through repeated interactions among agents. In online market economies, such norms -- like fair exposure, sustained participation, and balanced reinvestment -- are critical...
Why Do Neural Networks Forget: A Study of Collapse in Continual Learning
arXiv:2603.04580v1 Announce Type: new Abstract: Catastrophic forgetting is a major problem in continual learning, and lots of approaches arise to reduce it. However, most of them are evaluated through task accuracy, which ignores the internal model structure. Recent research suggests...
A Late-Fusion Multimodal AI Framework for Privacy-Preserving Deduplication in National Healthcare Data Environments
arXiv:2603.04595v1 Announce Type: new Abstract: Duplicate records pose significant challenges in customer relationship management (CRM)and healthcare, often leading to inaccuracies in analytics, impaired user experiences, and compliance risks. Traditional deduplication methods rely heavily on direct identifiers such as names, emails,...
PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
arXiv:2603.03296v1 Announce Type: cross Abstract: Long-term memory is essential for large language model (LLM) agents operating in complex environments, yet existing memory designs are either task-specific and non-transferable, or task-agnostic but less effective due to low task-relevance and context explosion...
AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis
arXiv:2603.03378v1 Announce Type: new Abstract: Large language model (LLM) agents offer a promising data-driven approach to automating Site Reliability Engineering (SRE), yet their enterprise deployment is constrained by three challenges: restricted access to proprietary data, unsafe action execution under permission-governed...
Role-Aware Conditional Inference for Spatiotemporal Ecosystem Carbon Flux Prediction
arXiv:2603.03531v1 Announce Type: new Abstract: Accurate prediction of terrestrial ecosystem carbon fluxes (e.g., CO$_2$, GPP, and CH$_4$) is essential for understanding the global carbon cycle and managing its impacts. However, prediction remains challenging due to strong spatiotemporal heterogeneity: ecosystem flux...
Riemannian Optimization in Modular Systems
arXiv:2603.03610v1 Announce Type: new Abstract: Understanding how systems built out of modular components can be jointly optimized is an important problem in biology, engineering, and machine learning. The backpropagation algorithm is one such solution and has been instrumental in the...
Believe Your Model: Distribution-Guided Confidence Calibration
arXiv:2603.03872v1 Announce Type: new Abstract: Large Reasoning Models have demonstrated remarkable performance with the advancement of test-time scaling techniques, which enhances prediction accuracy by generating multiple candidate responses and selecting the most reliable answer. While prior work has analyzed that...
MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Evaluation
arXiv:2603.02222v1 Announce Type: new Abstract: MedCalc-Bench is a widely used benchmark for evaluating LLM performance on clinical calculator tasks, with state-of-the-art direct prompting scores plateauing around 35% on the Verified split (HELM MedHELM leaderboard) and the best published approach-RL with...
Quantum-Inspired Fine-Tuning for Few-Shot AIGC Detection via Phase-Structured Reparameterization
arXiv:2603.02281v1 Announce Type: new Abstract: Recent studies show that quantum neural networks (QNNs) generalize well in few-shot regimes. To extend this advantage to large-scale tasks, we propose Q-LoRA, a quantum-enhanced fine-tuning scheme that integrates lightweight QNNs into the low-rank adaptation...
LLM-Bootstrapped Targeted Finding Guidance for Factual MLLM-based Medical Report Generation
arXiv:2603.00426v1 Announce Type: new Abstract: The automatic generation of medical reports utilizing Multimodal Large Language Models (MLLMs) frequently encounters challenges related to factual instability, which may manifest as the omission of findings or the incorporation of inaccurate information, thereby constraining...
Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains
arXiv:2603.00924v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used for medical entity extraction, yet their confidence scores are often miscalibrated, limiting safe deployment in clinical settings. We present a conformal prediction framework that provides finite-sample coverage guarantees...
Engineering FAIR Privacy-preserving Applications that Learn Histories of Disease
arXiv:2603.00181v1 Announce Type: new Abstract: A recent report on "Learning the natural history of human disease with generative transformers" created an opportunity to assess the engineering challenge of delivering user-facing Generative AI applications in privacy-sensitive domains. The application of these...
Exact and Asymptotically Complete Robust Verifications of Neural Networks via Quantum Optimization
arXiv:2603.00408v1 Announce Type: new Abstract: Deep neural networks (DNNs) enable high performance across domains but remain vulnerable to adversarial perturbations, limiting their use in safety-critical settings. Here, we introduce two quantum-optimization-based models for robust verification that reduce the combinatorial burden...