Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking)
arXiv:2602.18918v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used as scientific copilots, but evidence on their role in research-level mathematics remains limited, especially for workflows accessible to individual researchers. We present early evidence for vibe-proving with a...
Robust and Efficient Tool Orchestration via Layered Execution Structures with Reflective Correction
arXiv:2602.18968v1 Announce Type: new Abstract: Tool invocation is a core capability of agentic systems, yet failures often arise not from individual tool calls but from how multiple tools are organized and executed together. Existing approaches tightly couple tool execution with...
Beyond Behavioural Trade-Offs: Mechanistic Tracing of Pain-Pleasure Decisions in an LLM
arXiv:2602.19159v1 Announce Type: new Abstract: Prior behavioural work suggests that some LLMs alter choices when options are framed as causing pain or pleasure, and that such deviations can scale with stated intensity. To bridge behavioural evidence (what the model does)...
Portfolio Reinforcement Learning with Scenario-Context Rollout
arXiv:2602.24037v1 Announce Type: new Abstract: Market regime shifts induce distribution shifts that can degrade the performance of portfolio rebalancing policies. We propose macro-conditioned scenario-context rollout (SCR) that generates plausible next-day multivariate return scenarios under stress events. However, doing so faces...
SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
arXiv:2602.23447v1 Announce Type: cross Abstract: Detection of rare lesions in whole-body CT is fundamentally limited by extreme class imbalance and low target-to-volume ratios, producing precision collapse despite high AUROC. Synthetic augmentation with diffusion models offers promise, yet pixel-space diffusion is...
Optimization of Edge Directions and Weights for Mixed Guidance Graphs in Lifelong Multi-Agent Path Finding
arXiv:2602.23468v1 Announce Type: cross Abstract: Multi-Agent Path Finding (MAPF) aims to move agents from their start to goal vertices on a graph. Lifelong MAPF (LMAPF) continuously assigns new goals to agents as they complete current ones. To guide agents' movement...
IDP Accelerator: Agentic Document Intelligence from Extraction to Compliance Validation
arXiv:2602.23481v1 Announce Type: new Abstract: Understanding and extracting structured insights from unstructured documents remains a foundational challenge in industrial NLP. While Large Language Models (LLMs) enable zero-shot extraction, traditional pipelines often fail to handle multi-document packets, complex reasoning, and strict...
EmCoop: A Framework and Benchmark for Embodied Cooperation Among LLM Agents
arXiv:2603.00349v1 Announce Type: new Abstract: Real-world scenarios increasingly require multiple embodied agents to collaborate in dynamic environments under embodied constraints, as many tasks exceed the capabilities of any single agent. Recent advances in large language models (LLMs) enable high-level cognitive...
FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing
arXiv:2603.02702v1 Announce Type: new Abstract: The financial domain involves a variety of important time-series problems. Recently, time-series analysis methods that jointly leverage textual and numerical information have gained increasing attention. Accordingly, numerous efforts have been made to construct text-paired time-series...
Evaluating the Search Agent in a Parallel World
arXiv:2603.04751v1 Announce Type: new Abstract: Integrating web search tools has significantly extended the capability of LLMs to address open-world, real-time, and long-tail problems. However, evaluating these Search Agents presents formidable challenges. First, constructing high-quality deep search benchmarks is prohibitively expensive,...
Rethinking Representativeness and Diversity in Dynamic Data Selection
arXiv:2603.04981v1 Announce Type: new Abstract: Dynamic data selection accelerates training by sampling a changing subset of the dataset while preserving accuracy. We rethink two core notions underlying sample evaluation: representativeness and diversity. Instead of local geometric centrality, we define representativeness...
S5-SHB Agent: Society 5.0 enabled Multi-model Agentic Blockchain Framework for Smart Home
arXiv:2603.05027v1 Announce Type: new Abstract: The smart home is a key application domain within the Society 5.0 vision for a human-centered society. As smart home ecosystems expand with heterogeneous IoT protocols, diverse devices, and evolving threats, autonomous systems must manage...
Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure
arXiv:2603.05028v1 Announce Type: new Abstract: As Large Language Models (LLMs) evolve from chatbots to agentic assistants, they are increasingly observed to exhibit risky behaviors when subjected to survival pressure, such as the threat of being shut down. While multiple cases...
Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination
arXiv:2603.05040v1 Announce Type: new Abstract: Recent advancements in zero-shot commonsense reasoning have empowered Pre-trained Language Models (PLMs) to acquire extensive commonsense knowledge without requiring task-specific fine-tuning. Despite this progress, these models frequently suffer from limitations caused by human reporting biases...
Jagarin: A Three-Layer Architecture for Hibernating Personal Duty Agents on Mobile
arXiv:2603.05069v1 Announce Type: new Abstract: Personal AI agents face a fundamental deployment paradox on mobile: persistent background execution drains battery and violates platform sandboxing policies, yet purely reactive agents miss time-sensitive obligations until the user remembers to ask. We present...
Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning
arXiv:2603.05120v1 Announce Type: new Abstract: Enhancing mathematical reasoning in Large Language Models typically demands massive datasets, yet data efficiency remains a critical bottleneck. While Curriculum Learning attempts to structure this process, standard unidirectional approaches (simple-to-complex) suffer from inefficient sample utilization:...
AI-Assisted Moot Courts: Simulating Justice-Specific Questioning in Oral Arguments
arXiv:2603.04718v1 Announce Type: new Abstract: In oral arguments, judges probe attorneys with questions about the factual record, legal claims, and the strength of their arguments. To prepare for this questioning, both law schools and practicing attorneys rely on moot courts:...
FedEMA-Distill: Exponential Moving Average Guided Knowledge Distillation for Robust Federated Learning
arXiv:2603.04422v1 Announce Type: new Abstract: Federated learning (FL) often degrades when clients hold heterogeneous non-Independent and Identically Distributed (non-IID) data and when some clients behave adversarially, leading to client drift, slow convergence, and high communication overhead. This paper proposes FedEMA-Distill,...
Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices
arXiv:2603.04428v1 Announce Type: new Abstract: Multi-agent LLM systems on edge devices face a memory management problem: device RAM is too small to hold every agent's KV cache simultaneously. On Apple M4 Pro with 10.2 GB of cache budget, only 3...
Why Do Neural Networks Forget: A Study of Collapse in Continual Learning
arXiv:2603.04580v1 Announce Type: new Abstract: Catastrophic forgetting is a major problem in continual learning, and lots of approaches arise to reduce it. However, most of them are evaluated through task accuracy, which ignores the internal model structure. Recent research suggests...
A Late-Fusion Multimodal AI Framework for Privacy-Preserving Deduplication in National Healthcare Data Environments
arXiv:2603.04595v1 Announce Type: new Abstract: Duplicate records pose significant challenges in customer relationship management (CRM) and healthcare, often leading to inaccuracies in analytics, impaired user experiences, and compliance risks. Traditional deduplication methods rely heavily on direct identifiers such as names, emails,...
EVMbench: Evaluating AI Agents on Smart Contract Security
arXiv:2603.04915v1 Announce Type: new Abstract: Smart contracts on public blockchains now manage large amounts of value, and vulnerabilities in these systems can lead to substantial losses. As AI agents become more capable at reading, writing, and running code, it is...
The Untold Story of the Proto-Smith Era: Justice O’Connor’s Papers and the Court’s Free Exercise Revolution
Justice O’Connor’s recently released Supreme Court papers reveal the untold story of how the Court systematically dismantled religious accommodation protections in the decade leading up to Employment Division v. Smith. While Smith’s abandonment of strict scrutiny for neutral, generally applicable...
Birthright citizenship: the exceptions provide the rule
The battle over birthright citizenship is a battle over its exceptions. The 14th Amendment’s first sentence proudly proclaims that “[a]ll persons born . . . in the United States, and subject to the jurisdiction […]
Anthropic’s Pentagon deal is a cautionary tale for startups chasing federal contracts
The Pentagon has officially designated Anthropic a supply-chain risk after the two failed to agree on how much control the military should have over its AI models, including its use in autonomous weapons and mass domestic surveillance. As Anthropic’s $200...
Anthropic vs. the Pentagon, the SaaSpocalypse, and why competition is good, actually
The Pentagon has officially designated Anthropic a supply-chain risk after the two failed to agree on how much control the military should have over its AI models, including its use in autonomous weapons and mass domestic surveillance. As Anthropic’s $200...
Developing an AI Assistant for Knowledge Management and Workforce Training in State DOTs
arXiv:2603.03302v1 Announce Type: cross Abstract: Effective knowledge management is critical for preserving institutional expertise and improving the efficiency of workforce training in state transportation agencies. Traditional approaches, such as static documentation, classroom-based instruction, and informal mentorship, often lead to fragmented...
Old Habits Die Hard: How Conversational History Geometrically Traps LLMs
arXiv:2603.03308v1 Announce Type: cross Abstract: How does the conversational past of large language models (LLMs) influence their future performance? Recent work suggests that LLMs are affected by their conversational history in unexpected ways. For instance, hallucinations in prior interactions may...
PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning
arXiv:2603.03331v1 Announce Type: new Abstract: Photoplethysmography (PPG) is a widely used non-invasive sensing modality for continuous cardiovascular and physiological monitoring across clinical, laboratory, and wearable settings. While existing PPG datasets support a broad range of downstream tasks, they typically provide...
AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis
arXiv:2603.03378v1 Announce Type: new Abstract: Large language model (LLM) agents offer a promising data-driven approach to automating Site Reliability Engineering (SRE), yet their enterprise deployment is constrained by three challenges: restricted access to proprietary data, unsafe action execution under permission-governed...