Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration
arXiv:2602.21368v1 Announce Type: cross Abstract: Given a black-box AI system and a task, at what confidence level can a practitioner trust the system's output? We answer with a reliability level -- a single number per system-task pair, derived from self-consistency...
The Mean is the Mirage: Entropy-Adaptive Model Merging under Heterogeneous Domain Shifts in Medical Imaging
arXiv:2602.21372v1 Announce Type: cross Abstract: Model merging under unseen test-time distribution shifts often renders naive strategies, such as mean averaging unreliable. This challenge is especially acute in medical imaging, where models are fine-tuned locally at clinics on private data, producing...
Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages
arXiv:2602.21374v1 Announce Type: cross Abstract: Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP). This study evaluates a two-step pipeline combining Aya-expanse-8B as a Persian-to-English translation model with five open-source...
MrBERT: Modern Multilingual Encoders via Vocabulary, Domain, and Dimensional Adaptation
arXiv:2602.21379v1 Announce Type: cross Abstract: We introduce MrBERT, a family of 150M-300M parameter encoders built on the ModernBERT architecture and pre-trained on 35 languages and code. Through targeted adaptation, this model family achieves state-of-the-art results on Catalan- and Spanish-specific tasks,...
The Headless Firm: How AI Reshapes Enterprise Boundaries
arXiv:2602.21401v1 Announce Type: cross Abstract: The boundary of the firm is determined by coordination cost. We argue that agentic AI induces a structural change in how coordination costs scale: in prior modular systems, integration cost grew with interaction topology (O(n^2)...
Multi-Level Causal Embeddings
arXiv:2602.22287v1 Announce Type: new Abstract: Abstractions of causal models allow for the coarsening of models such that relations of cause and effect are preserved. Whereas abstractions focus on the relation between two models, in this paper we study a framework...
How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?
arXiv:2602.22441v1 Announce Type: new Abstract: Latent reasoning has been recently proposed as a reasoning paradigm and performs multi-step reasoning through generating steps in the latent space instead of the textual space. This paradigm enables reasoning beyond discrete language tokens by...
Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models
arXiv:2602.22508v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) often exhibit structural fragility in complex reasoning tasks, failing to produce correct answers even after successfully deriving valid intermediate steps. Through systematic analysis, we observe that these failures frequently stem not...
Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance
arXiv:2602.22583v1 Announce Type: new Abstract: Example-based guidance is widely used to improve mathematical reasoning at inference time, yet its effectiveness is highly unstable across problems and models-even when the guidance is correct and problem-relevant. We show that this instability arises...
Correcting Human Labels for Rater Effects in AI Evaluation: An Item Response Theory Approach
arXiv:2602.22585v1 Announce Type: new Abstract: Human evaluations play a central role in training and assessing AI models, yet these data are rarely treated as measurements subject to systematic error. This paper integrates psychometric rater models into the AI pipeline to...
Generative Data Transformation: From Mixed to Unified Data
arXiv:2602.22743v1 Announce Type: new Abstract: Recommendation model performance is intrinsically tied to the quality, volume, and relevance of their training data. To address common challenges like data sparsity and cold start, recent researchs have leveraged data from multiple auxiliary domains...
Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning
arXiv:2602.22751v1 Announce Type: new Abstract: Large reasoning models (LRMs) have emerged as a powerful paradigm for solving complex real-world tasks. In practice, these models are predominantly trained via Reinforcement Learning with Verifiable Rewards (RLVR), yet most existing outcome-only RLVR pipelines...
Decomposing Physician Disagreement in HealthBench
arXiv:2602.22758v1 Announce Type: new Abstract: We decompose physician disagreement in the HealthBench medical AI evaluation dataset to understand where variance resides and what observable features can explain it. Rubric identity accounts for 15.8% of met/not-met label variance but only 3.6-6.9%...
When Should an AI Act? A Human-Centered Model of Scene, Context, and Behavior for Agentic AI Design
arXiv:2602.22814v1 Announce Type: new Abstract: Agentic AI increasingly intervenes proactively by inferring users' situations from contextual data yet often fails for lack of principled judgment about when, why, and whether to act. We address this gap by proposing a conceptual...
Modeling Expert AI Diagnostic Alignment via Immutable Inference Snapshots
arXiv:2602.22973v1 Announce Type: new Abstract: Human-in-the-loop validation is essential in safety-critical clinical AI, yet the transition between initial model inference and expert correction is rarely analyzed as a structured signal. We introduce a diagnostic alignment framework in which the AI-generated...
Learning-based Multi-agent Race Strategies in Formula 1
arXiv:2602.23056v1 Announce Type: new Abstract: In Formula 1, race strategies are adapted according to evolving race conditions and competitors' actions. This paper proposes a reinforcement learning approach for multi-agent race strategy optimization. Agents learn to balance energy management, tire degradation,...
A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection
arXiv:2602.22449v1 Announce Type: new Abstract: Cyberbullying has become a serious and growing concern in todays virtual world. When left unnoticed, it can have adverse consequences for social and mental health. Researchers have explored various types of cyberbullying, but most approaches...
Importance of Prompt Optimisation for Error Detection in Medical Notes Using Language Models
arXiv:2602.22483v1 Announce Type: new Abstract: Errors in medical text can cause delays or even result in incorrect treatment for patients. Recently, language models have shown promise in their ability to automatically detect errors in medical text, an ability that has...
Iterative Prompt Refinement for Dyslexia-Friendly Text Summarization Using GPT-4o
arXiv:2602.22524v1 Announce Type: new Abstract: Dyslexia affects approximately 10% of the global population and presents persistent challenges in reading fluency and text comprehension. While existing assistive technologies address visual presentation, linguistic complexity remains a substantial barrier to equitable access. This...
Towards Faithful Industrial RAG: A Reinforced Co-adaptation Framework for Advertising QA
arXiv:2602.22584v1 Announce Type: new Abstract: Industrial advertising question answering (QA) is a high-stakes task in which hallucinated content, particularly fabricated URLs, can lead to financial loss, compliance violations, and legal risk. Although Retrieval-Augmented Generation (RAG) is widely adopted, deploying it...
Enhancing Persuasive Dialogue Agents by Synthesizing Cross-Disciplinary Communication Strategies
arXiv:2602.22696v1 Announce Type: new Abstract: Current approaches to developing persuasive dialogue agents often rely on a limited set of predefined persuasive strategies that fail to capture the complexity of real-world interactions. We applied a cross-disciplinary approach to develop a framework...
The Poly Problem in Zoning: Redefining “Family” for a Changing Society lawreview - Minnesota Law Review
By ARIC SHORT & TANYA PIERCE. Full Text. Single-family zoning has long dictated not only where people may live but also with whom. Although extensively critiqued for perpetuating racial and economic exclusion, these laws also privilege relationships defined by blood,...
Bankruptcy as a National Security Risk lawreview - Minnesota Law Review
By JASON JIA-XI WU. Full Text. Defense contractors lie at the heart of the U.S. national security regime. Each year, over half of the federal defense budget is allocated to contracts outsourcing military operations, projects, and services to private companies....
The Innocence Trap lawreview - Minnesota Law Review
By CAITLIN GLASS & JULIAN GREEN. Full Text. What makes a conviction wrongful? Developments in DNA science have led to a wave of exonerations over the past thirty years, revealing sources of error in the criminal legal process. Innocence organizations...
Waging the Battle for Society’s Soul: The Constitutionality of Juvenile Transfer Legislation in the Wake of Jones v. Mississippi lawreview - Minnesota Law Review
By LOGAN KNUTSON. Full Text. Trying juvenile defendants as adults is a cruel, yet enduring practice in U.S. criminal law. If convicted, these youthful offenders face brutal conditions in adult prison and a lifelong stigma. Although these devastating consequences of...
The Skidmore Compromise: Interpreting Skidmore as a Tiebreaker to Preserve Judicial Wisdom in the Era of Loper Bright lawreview - Minnesota Law Review
By MITCHELL ZAIC. Full Text. 'Law must be stable, and yet it cannot stand still.' Here is the great antinomy confronting us at every turn. Rest and motion, unrelieved and unchecked, are equally destructive. The law, like human kind, if...
The Crisis in U.S. Cancer Care: Law, Markets, and Privatization lawreview - Minnesota Law Review
By DANIEL G. AARON. Full Text. Cancer is surging among youth and young adults in the United States, yet, instead of public regulation addressing its root causes, we have outsourced the management of cancer to the private sector. A suite...
Volume 110 – Issue 3 - Minnesota Law Review
Regulatory History and Judicial Review lawreview - Minnesota Law Review
By TODD PHILLIPS & ANTHONY MOFFA. Full Text. The Administrative Procedure Act (APA) requires federal agencies to simply "incorporate in the rules adopted a concise general statement of their basis and purpose" after they receive comments from the public, and...
Trump moves to ban Anthropic from the US government
The Defense Department pressured Anthropic to drop restrictions on how its AI can be used by the military.