Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence
arXiv:2602.20934v1 Announce Type: new Abstract: The paradigm of Large Language Models is undergoing a fundamental transition from static inference engines to dynamic autonomous cognitive systems.While current research primarily focuses on scaling context windows or optimizing prompt engineering the theoretical bridge...
LogicGraph : Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification
arXiv:2602.21044v1 Announce Type: new Abstract: Evaluations of large language models (LLMs) primarily emphasize convergent logical reasoning, where success is defined by producing a single correct proof. However, many real-world reasoning problems admit multiple valid derivations, requiring models to explore diverse...
Tool Building as a Path to "Superintelligence"
arXiv:2602.21061v1 Announce Type: new Abstract: The Diligent Learner framework suggests LLMs can achieve superintelligence via test-time search, provided a sufficient step-success probability $\gamma$. In this work, we design a benchmark to measure $\gamma$ on logical out-of-distribution inference. We construct a...
NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
arXiv:2602.21172v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we...
No One Size Fits All: QueryBandits for Hallucination Mitigation
arXiv:2602.20332v1 Announce Type: new Abstract: Advanced reasoning capabilities in Large Language Models (LLMs) have led to more frequent hallucinations; yet most mitigation work focuses on open-source models for post-hoc detection and parameter editing. The dearth of studies focusing on hallucinations...
A Dynamic Survey of Soft Set Theory and Its Extensions
arXiv:2602.21268v1 Announce Type: new Abstract: Soft set theory provides a direct framework for parameterized decision modeling by assigning to each attribute (parameter) a subset of a given universe, thereby representing uncertainty in a structured way [1, 2]. Over the past...
Latent Context Compilation: Distilling Long Context into Compact Portable Memory
arXiv:2602.21221v1 Announce Type: cross Abstract: Efficient long-context LLM deployment is stalled by a dichotomy between amortized compression, which struggles with out-of-distribution generalization, and Test-Time Training, which incurs prohibitive synthetic data costs and requires modifying model weights, creating stateful parameters that...
Measuring Pragmatic Influence in Large Language Model Instructions
arXiv:2602.21223v1 Announce Type: cross Abstract: It is not only what we ask large language models (LLMs) to do that matters, but also how we prompt. Phrases like "This is urgent" or "As your supervisor" can shift model behavior without altering...
Bounded Rationality and the Theory of Property
ARTICLE Bounded Rationality and the Theory of Property Oren Bar-Gill* & Nicola Persico** Strong, property rule protection—implemented via injunctions, criminal sanctions, and supercompensatory damages—is a defining aspect of property. What is the theoretical justification for property rule protection? The conventional...
Autonomous Vehicles and Liability: Who Is Responsible When AI Drives?
As autonomous vehicles approach widespread deployment, legal frameworks for determining liability in accidents involving self-driving cars remain uncertain.
ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces
arXiv:2602.21231v1 Announce Type: cross Abstract: We present ACAR (Adaptive Complexity and Attribution Routing), a measurement framework for studying multi-model orchestration under auditable conditions. ACAR uses self-consistency variance (sigma) computed from N=3 probe samples to route tasks across single-model, two-model, and...
A General Equilibrium Theory of Orchestrated AI Agent Systems
arXiv:2602.21255v1 Announce Type: cross Abstract: We establish a general equilibrium theory for systems of large language model (LLM) agents operating under centralized orchestration. The framework is a production economy in the sense of Arrow-Debreu (1954), extended to infinite-dimensional commodity spaces...
A Systematic Review of Algorithmic Red Teaming Methodologies for Assurance and Security of AI Applications
arXiv:2602.21267v1 Announce Type: cross Abstract: Cybersecurity threats are becoming increasingly sophisticated, making traditional defense mechanisms and manual red teaming approaches insufficient for modern organizations. While red teaming has long been recognized as an effective method to identify vulnerabilities by simulating...
Equitable Evaluation via Elicitation
arXiv:2602.21327v1 Announce Type: cross Abstract: Individuals with similar qualifications and skills may vary in their demeanor, or outward manner: some tend toward self-promotion while others are modest to the point of omitting crucial information. Comparing the self-descriptions of equally qualified...
Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment
arXiv:2602.21346v1 Announce Type: cross Abstract: Recent advances in alignment techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Direct Preference Optimization (DPO) have improved the safety of large language models (LLMs). However, these LLMs remain vulnerable...
Towards single-shot coherent imaging via overlap-free ptychography
arXiv:2602.21361v1 Announce Type: cross Abstract: Ptychographic imaging at synchrotron and XFEL sources requires dense overlapping scans, limiting throughput and increasing dose. Extending coherent diffractive imaging to overlap-free operation on extended samples remains an open problem. Here, we extend PtychoPINN (O....
FedVG: Gradient-Guided Aggregation for Enhanced Federated Learning
arXiv:2602.21399v1 Announce Type: cross Abstract: Federated Learning (FL) enables collaborative model training across multiple clients without sharing their private data. However, data heterogeneity across clients leads to client drift, which degrades the overall generalization performance of the model. This effect...
Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation
arXiv:2602.22215v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate potential in the field of scientific idea generation. However, the generated results often lack controllable academic context and traceable inspiration pathways. To bridge this gap, this paper proposes a scientific...
FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation
arXiv:2602.22273v1 Announce Type: new Abstract: We introduce FIRE, a comprehensive benchmark designed to evaluate both the theoretical financial knowledge of LLMs and their ability to handle practical business scenarios. For theoretical assessment, we curate a diverse set of examination questions...
ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization
arXiv:2602.22465v1 Announce Type: new Abstract: Large language models are increasingly applied to operational decision-making where the underlying structure is constrained optimization. Existing benchmarks evaluate whether LLMs can formulate optimization problems as solver code, but leave open a complementary question. Can...
VeRO: An Evaluation Harness for Agents to Optimize Agents
arXiv:2602.22480v1 Announce Type: new Abstract: An important emerging application of coding agents is agent optimization: the iterative improvement of a target agent through edit-execute-evaluate cycles. Despite its relevance, the community lacks a systematic understanding of coding agent performance on this...
A Mathematical Theory of Agency and Intelligence
arXiv:2602.22519v1 Announce Type: new Abstract: To operate reliably under changing conditions, complex systems require feedback on how effectively they use resources, not just whether objectives are met. Current AI systems process vast information to produce sophisticated predictions, yet predictions can...
Decomposing Physician Disagreement in HealthBench
arXiv:2602.22758v1 Announce Type: new Abstract: We decompose physician disagreement in the HealthBench medical AI evaluation dataset to understand where variance resides and what observable features can explain it. Rubric identity accounts for 15.8% of met/not-met label variance but only 3.6-6.9%...
Certified Circuits: Stability Guarantees for Mechanistic Circuits
arXiv:2602.22968v1 Announce Type: new Abstract: Understanding how neural networks arrive at their predictions is essential for debugging, auditing, and deployment. Mechanistic interpretability pursues this goal by identifying circuits - minimal subnetworks responsible for specific behaviors. However, existing circuit discovery methods...
Scaling In, Not Up? Testing Thick Citation Context Analysis with GPT-5 and Fragile Prompts
arXiv:2602.22359v1 Announce Type: new Abstract: This paper tests whether large language models (LLMs) can support interpretative citation context analysis (CCA) by scaling in thick, text-grounded readings of a single hard case rather than scaling up typological labels. It foregrounds prompt-sensitivity...
Causality $\neq$ Invariance: Function and Concept Vectors in LLMs
arXiv:2602.22424v1 Announce Type: new Abstract: Do large language models (LLMs) represent concepts abstractly, i.e., independent of input format? We revisit Function Vectors (FVs), compact representations of in-context learning (ICL) tasks that causally drive task performance. Across multiple LLMs, we show...
dLLM: Simple Diffusion Language Modeling
arXiv:2602.22661v1 Announce Type: new Abstract: Although diffusion language models (DLMs) are evolving quickly, many recent models converge on a set of shared components. These components, however, are distributed across ad-hoc research codebases or lack transparent implementations, making them difficult to...
The Poly Problem in Zoning: Redefining “Family” for a Changing Society lawreview - Minnesota Law Review
By ARIC SHORT & TANYA PIERCE. Full Text. Single-family zoning has long dictated not only where people may live but also with whom. Although extensively critiqued for perpetuating racial and economic exclusion, these laws also privilege relationships defined by blood,...
The Innocence Trap lawreview - Minnesota Law Review
By CAITLIN GLASS & JULIAN GREEN. Full Text. What makes a conviction wrongful? Developments in DNA science have led to a wave of exonerations over the past thirty years, revealing sources of error in the criminal legal process. Innocence organizations...
The Rise of AI-Powered Legal Research: Transforming How Lawyers Work
AI-powered legal research tools are fundamentally changing the practice of law, offering unprecedented efficiency while raising questions about quality and oversight.