Budget-Aware Agentic Routing via Boundary-Guided Training
arXiv:2602.21227v1 Announce Type: cross Abstract: As large language models (LLMs) evolve into autonomous agents that execute long-horizon workflows, invoking a high-capability model at every step becomes economically unsustainable. While model routing is effective for single-turn queries, agentic routing is a...
ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces
arXiv:2602.21231v1 Announce Type: cross Abstract: We present ACAR (Adaptive Complexity and Attribution Routing), a measurement framework for studying multi-model orchestration under auditable conditions. ACAR uses self-consistency variance (sigma) computed from N=3 probe samples to route tasks across single-model, two-model, and...
AgenticTyper: Automated Typing of Legacy Software Projects Using Agentic AI
arXiv:2602.21251v1 Announce Type: cross Abstract: Legacy JavaScript systems lack type safety, making maintenance risky. While TypeScript can help, manually adding types is expensive. Previous automated typing research focuses on type inference but rarely addresses type checking setup, definition generation, bug...
Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
arXiv:2602.21269v1 Announce Type: cross Abstract: We present Group Orthogonalized Policy Optimization (GOPO), a new alignment algorithm for large language models derived from the geometry of Hilbert function spaces. Instead of optimizing on the probability simplex and inheriting the exponential curvature...
The Mean is the Mirage: Entropy-Adaptive Model Merging under Heterogeneous Domain Shifts in Medical Imaging
arXiv:2602.21372v1 Announce Type: cross Abstract: Model merging under unseen test-time distribution shifts often renders naive strategies, such as mean averaging, unreliable. This challenge is especially acute in medical imaging, where models are fine-tuned locally at clinics on private data, producing...
FedVG: Gradient-Guided Aggregation for Enhanced Federated Learning
arXiv:2602.21399v1 Announce Type: cross Abstract: Federated Learning (FL) enables collaborative model training across multiple clients without sharing their private data. However, data heterogeneity across clients leads to client drift, which degrades the overall generalization performance of the model. This effect...
A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines
arXiv:2602.22442v1 Announce Type: new Abstract: Agent-based AutoML systems rely on large language models to make complex, multi-stage decisions across data processing, model selection, and evaluation. However, existing evaluation practices remain outcome-centric, focusing primarily on final task performance. Through a review...
VeRO: An Evaluation Harness for Agents to Optimize Agents
arXiv:2602.22480v1 Announce Type: new Abstract: An important emerging application of coding agents is agent optimization: the iterative improvement of a target agent through edit-execute-evaluate cycles. Despite its relevance, the community lacks a systematic understanding of coding agent performance on this...
Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models
arXiv:2602.22508v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) often exhibit structural fragility in complex reasoning tasks, failing to produce correct answers even after successfully deriving valid intermediate steps. Through systematic analysis, we observe that these failures frequently stem not...
A Mathematical Theory of Agency and Intelligence
arXiv:2602.22519v1 Announce Type: new Abstract: To operate reliably under changing conditions, complex systems require feedback on how effectively they use resources, not just whether objectives are met. Current AI systems process vast information to produce sophisticated predictions, yet predictions can...
Agentic AI for Intent-driven Optimization in Cell-free O-RAN
arXiv:2602.22539v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) is emerging as a key enabler for autonomous radio access networks (RANs), where multiple large language model (LLM)-based agents reason and collaborate to achieve operator-defined intents. The open RAN (O-RAN) architecture...
CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety
arXiv:2602.22557v1 Announce Type: new Abstract: Current safety mechanisms for Large Language Models (LLMs) rely heavily on static, fine-tuned classifiers that suffer from adaptation rigidity, the inability to enforce new governance rules without expensive retraining. To address this, we introduce CourtGuard,...
AHBid: An Adaptable Hierarchical Bidding Framework for Cross-Channel Advertising
arXiv:2602.22650v1 Announce Type: new Abstract: In online advertising, the inherent complexity and dynamic nature of advertising environments necessitate the use of auto-bidding services to assist advertisers in bid optimization. This complexity is further compounded in multi-channel scenarios, where effective allocation...
Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions
arXiv:2602.22680v1 Announce Type: new Abstract: Large language models have enabled agents that reason, plan, and interact with tools and environments to accomplish complex tasks. As these agents operate over extended interaction horizons, their effectiveness increasingly depends on adapting behavior to...
When Should an AI Act? A Human-Centered Model of Scene, Context, and Behavior for Agentic AI Design
arXiv:2602.22814v1 Announce Type: new Abstract: Agentic AI increasingly intervenes proactively by inferring users' situations from contextual data yet often fails for lack of principled judgment about when, why, and whether to act. We address this gap by proposing a conceptual...
DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation
arXiv:2602.22839v1 Announce Type: new Abstract: Presentation generation requires deep content research, coherent visual design, and iterative refinement based on observation. However, existing presentation agents often rely on predefined workflows and fixed templates. To address this, we present DeepPresenter, an agentic...
Towards LLM-Empowered Knowledge Tracing via LLM-Student Hierarchical Behavior Alignment in Hyperbolic Space
arXiv:2602.22879v1 Announce Type: new Abstract: Knowledge Tracing (KT) diagnoses students' concept mastery through continuous learning state monitoring in education. Existing methods primarily focus on studying behavioral sequences based on ID or textual information. While existing methods rely on ID-based sequences or shallow...
Certified Circuits: Stability Guarantees for Mechanistic Circuits
arXiv:2602.22968v1 Announce Type: new Abstract: Understanding how neural networks arrive at their predictions is essential for debugging, auditing, and deployment. Mechanistic interpretability pursues this goal by identifying circuits - minimal subnetworks responsible for specific behaviors. However, existing circuit discovery methods...
SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy
arXiv:2602.22971v1 Announce Type: new Abstract: As LLMs achieve breakthroughs in general reasoning, their proficiency in specialized scientific domains reveals pronounced gaps in existing benchmarks due to data contamination, insufficient complexity, and prohibitive human labor costs. Here we present SPM-Bench, an...
RepSPD: Enhancing SPD Manifold Representation in EEGs via Dynamic Graphs
arXiv:2602.22981v1 Announce Type: new Abstract: Decoding brain activity from electroencephalography (EEG) is crucial for neuroscience and clinical applications. Among recent advances in deep learning for EEG, geometric learning stands out as its theoretical underpinnings on symmetric positive definite (SPD) allows...
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
arXiv:2602.22983v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly used, their security risks have drawn increasing attention. Existing research reveals that LLMs are highly susceptible to jailbreak attacks, with effectiveness varying across language contexts. This paper investigates...
Iterative Prompt Refinement for Dyslexia-Friendly Text Summarization Using GPT-4o
arXiv:2602.22524v1 Announce Type: new Abstract: Dyslexia affects approximately 10% of the global population and presents persistent challenges in reading fluency and text comprehension. While existing assistive technologies address visual presentation, linguistic complexity remains a substantial barrier to equitable access. This...
Bankruptcy as a National Security Risk - Minnesota Law Review
By JASON JIA-XI WU. Defense contractors lie at the heart of the U.S. national security regime. Each year, over half of the federal defense budget is allocated to contracts outsourcing military operations, projects, and services to private companies....
The Crisis in U.S. Cancer Care: Law, Markets, and Privatization - Minnesota Law Review
By DANIEL G. AARON. Cancer is surging among youth and young adults in the United States, yet, instead of public regulation addressing its root causes, we have outsourced the management of cancer to the private sector. A suite...
The Rise of AI-Powered Legal Research: Transforming How Lawyers Work
AI-powered legal research tools are fundamentally changing the practice of law, offering unprecedented efficiency while raising questions about quality and oversight.
The Emerging Legal Framework for Generative AI: A Comprehensive Analysis
As generative AI transforms industries worldwide, legal systems are racing to establish frameworks that balance innovation with accountability.
CRISPR Gene Therapy Patents: The Legal Battle Reshaping Biotechnology
The ongoing patent disputes surrounding CRISPR gene editing technology have profound implications for biotech innovation, patient access, and IP strategy.
Digital Sovereignty: How Nations Are Asserting Control Over Technology Infrastructure
Countries worldwide are implementing digital sovereignty measures to control data flows, technology standards, and digital infrastructure within their borders.
Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction
arXiv:2602.22752v1 Announce Type: new Abstract: The transition of Large Language Models (LLMs) from exploratory tools to active "silicon subjects" in social science lacks extensive validation of operational validity. This study introduces Conditioned Comment Prediction (CCP), a task in which a...
AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors
arXiv:2602.22755v1 Announce Type: new Abstract: We introduce AuditBench, an alignment auditing benchmark. AuditBench consists of 56 language models with implanted hidden behaviors. Each model has one of 14 concerning behaviors--such as sycophantic deference, opposition to AI regulation, or secret geopolitical...