I Know What I Don't Know: Latent Posterior Factor Models for Multi-Evidence Probabilistic Reasoning
arXiv:2603.15670v1 Announce Type: new Abstract: Real-world decision-making, from tax compliance assessment to medical diagnosis, requires aggregating multiple noisy and potentially contradictory evidence sources. Existing approaches either lack explicit uncertainty quantification (neural aggregation methods) or rely on manually engineered discrete predicates...
Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau Equilibrium
arXiv:2603.15929v1 Announce Type: new Abstract: We present a complete Lean 4 formalization of the equilibrium characterization in the Vlasov-Maxwell-Landau (VML) system, which describes the motion of charged plasma. The project demonstrates the full AI-assisted mathematical research loop: an AI reasoning...
Context-Length Robustness in Question Answering Models: A Comparative Empirical Study
arXiv:2603.15723v1 Announce Type: new Abstract: Large language models are increasingly deployed in settings where relevant information is embedded within long and noisy contexts. Despite this, robustness to growing context length remains poorly understood across different question answering tasks. In this...
NLP Occupational Emergence Analysis: How Occupations Form and Evolve in Real Time -- A Zero-Assumption Method Demonstrated on AI in the US Technology Workforce, 2022-2026
arXiv:2603.15998v1 Announce Type: new Abstract: Occupations form and evolve faster than classification systems can track. We propose that a genuine occupation is a self-reinforcing structure (a bipartite co-attractor) in which a shared professional vocabulary makes practitioners cohesive as a group,...
MoLoRA: Composable Specialization via Per-Token Adapter Routing
arXiv:2603.15965v1 Announce Type: new Abstract: Multi-adapter serving systems route entire sequences to a single adapter, forcing a choice when requests span multiple domains. This assumption fails in two important settings: (1) multimodal generation, where text and image tokens require different...
Form Follows Function: Recursive Stem Model
arXiv:2603.15641v1 Announce Type: new Abstract: Recursive reasoning models such as Hierarchical Reasoning Model (HRM) and Tiny Recursive Model (TRM) show that small, weight-shared networks can solve compute-heavy and NP puzzles by iteratively refining latent states, but their training typically relies...
Persona-Conditioned Risk Behavior in Large Language Models: A Simulated Gambling Study with GPT-4.1
arXiv:2603.15831v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as autonomous agents in uncertain, sequential decision-making contexts. Yet it remains poorly understood whether the behaviors they exhibit in such environments reflect principled cognitive patterns or simply surface-level...
POaaS: Minimal-Edit Prompt Optimization as a Service to Lift Accuracy and Cut Hallucinations on On-Device sLLMs
arXiv:2603.16045v1 Announce Type: new Abstract: Small language models (sLLMs) are increasingly deployed on-device, where imperfect user prompts--typos, unclear intent, or missing context--can trigger factual errors and hallucinations. Existing automatic prompt optimization (APO) methods were designed for large cloud LLMs and...
Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
arXiv:2603.15857v1 Announce Type: new Abstract: Behavioral Foundation Models (BFMs) produce agents with the capability to adapt to any unknown reward or task. These methods, however, are only able to produce near-optimal policies for the reward functions that are in the...
Agent-based imitation dynamics can yield efficiently compressed population-level vocabularies
arXiv:2603.15903v1 Announce Type: new Abstract: Natural languages have been argued to evolve under pressure to efficiently compress meanings into words by optimizing the Information Bottleneck (IB) complexity-accuracy tradeoff. However, the underlying social dynamics that could drive the optimization of a...
Did You Check the Right Pocket? Cost-Sensitive Store Routing for Memory-Augmented Agents
arXiv:2603.15658v1 Announce Type: new Abstract: Memory-augmented agents maintain multiple specialized stores, yet most systems retrieve from all stores for every query, increasing cost and introducing irrelevant context. We formulate memory retrieval as a store-routing problem and evaluate it using coverage,...
MAC: Multi-Agent Constitution Learning
arXiv:2603.15968v1 Announce Type: new Abstract: Constitutional AI is a method to oversee and control LLMs based on a set of rules written in natural language. These rules are typically written by human experts, but could in principle be learned automatically...
Algorithmic Trading Strategy Development and Optimisation
arXiv:2603.15848v1 Announce Type: new Abstract: The report presents the development and optimisation of an enhanced algorithmic trading strategy through the use of historical S&P 500 market data and earnings call sentiment analysis. The proposed strategy integrates various technical indicators...
Resilience Meets Autonomy: Governing Embodied AI in Critical Infrastructure
arXiv:2603.15885v1 Announce Type: new Abstract: Critical infrastructure increasingly incorporates embodied AI for monitoring, predictive maintenance, and decision support. However, AI systems designed to handle statistically representable uncertainty struggle with cascading failures and crisis dynamics that exceed their training assumptions. This...
Theoretical Foundations of Latent Posterior Factors: Formal Guarantees for Multi-Evidence Reasoning
arXiv:2603.15674v1 Announce Type: new Abstract: We present a complete theoretical characterization of Latent Posterior Factors (LPF), a principled framework for aggregating multiple heterogeneous evidence items in probabilistic prediction tasks. Multi-evidence reasoning arises pervasively in high-stakes domains including healthcare diagnosis, financial...
ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning
arXiv:2603.16060v1 Announce Type: new Abstract: The dominant paradigm for improving mathematical reasoning in language models relies on Reinforcement Learning with verifiable rewards. Yet existing methods treat each problem instance in isolation without leveraging the reusable strategies that emerge and accumulate...
COGNAC at SemEval-2026 Task 5: LLM Ensembles for Human-Level Word Sense Plausibility Rating in Challenging Narratives
arXiv:2603.15897v1 Announce Type: new Abstract: We describe our system for SemEval-2026 Task 5, which requires rating the plausibility of given word senses of homonyms in short stories on a 5-point Likert scale. Systems are evaluated by the unweighted average of...
CraniMem: Cranial Inspired Gated and Bounded Memory for Agentic Systems
arXiv:2603.15642v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly deployed in long running workflows, where they must preserve user and task state across many turns. Many existing agent memory systems behave like external databases with ad hoc...
NeSy-Route: A Neuro-Symbolic Benchmark for Constrained Route Planning in Remote Sensing
arXiv:2603.16307v1 Announce Type: new Abstract: Remote sensing underpins crucial applications such as disaster relief and ecological field surveys, where systems must understand complex scenes and constraints and make reliable decisions. Current remote-sensing benchmarks mainly focus on evaluating perception and reasoning...
SQL-ASTRA: Alleviating Sparse Feedback in Agentic SQL via Column-Set Matching and Trajectory Aggregation
arXiv:2603.16161v1 Announce Type: new Abstract: Agentic Reinforcement Learning (RL) shows promise for complex tasks, but Text-to-SQL remains mostly restricted to single-turn paradigms. A primary bottleneck is the credit assignment problem. In traditional paradigms, rewards are determined solely by the final-turn...
Optimizing Hospital Capacity During Pandemics: A Dual-Component Framework for Strategic Patient Relocation
arXiv:2603.15960v1 Announce Type: new Abstract: The COVID-19 pandemic has placed immense strain on hospital systems worldwide, leading to critical capacity challenges. This research proposes a two-part framework to optimize hospital capacity through patient relocation strategies. The first component involves developing...
RadAnnotate: Large Language Models for Efficient and Reliable Radiology Report Annotation
arXiv:2603.16002v1 Announce Type: new Abstract: Radiology report annotation is essential for clinical NLP, yet manual labeling is slow and costly. We present RadAnnotate, an LLM-based framework that studies retrieval-augmented synthetic reports and confidence-based selective automation to reduce expert effort for...
Understanding Moral Reasoning Trajectories in Large Language Models: Toward Probing-Based Explainability
arXiv:2603.16017v1 Announce Type: new Abstract: Large language models (LLMs) increasingly participate in morally sensitive decision-making, yet how they organize ethical frameworks across reasoning steps remains underexplored. We introduce moral reasoning trajectories, sequences of ethical framework invocations across intermediate reasoning steps,...
SEAHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Southeast Asia
arXiv:2603.16070v1 Announce Type: new Abstract: Hate speech detection relies heavily on linguistic resources, which are primarily available in high-resource languages such as English and Chinese, creating barriers for researchers and platforms developing tools for low-resource languages in Southeast Asia, where...
ClaimFlow: Tracing the Evolution of Scientific Claims in NLP
arXiv:2603.16073v1 Announce Type: new Abstract: Scientific papers do more than report results: they advance claims that later work supports, extends, or sometimes refutes. Yet existing methods for citation and claim analysis capture only fragments of this dialogue. In this...
Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization
arXiv:2603.16105v1 Announce Type: new Abstract: Post-training model compression is essential for enhancing the portability of Large Language Models (LLMs) while preserving their performance. While several compression approaches have been proposed, less emphasis has been placed on selecting the most suitable...
ASDA: Automated Skill Distillation and Adaptation for Financial Reasoning
arXiv:2603.16112v1 Announce Type: new Abstract: Adapting large language models (LLMs) to specialized financial reasoning typically requires expensive fine-tuning that produces model-locked expertise. Training-free alternatives have emerged, yet our experiments show that leading methods (GEPA and ACE) achieve only marginal gains...
Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users
arXiv:2603.16120v1 Announce Type: new Abstract: Deep Research (DR) tools (e.g. OpenAI DR) help researchers cope with ballooning publishing counts. Such tools can synthesize scientific papers to answer researchers' queries, but lack understanding of their users. We change that in MyScholarQA...
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
arXiv:2603.16127v1 Announce Type: new Abstract: We investigate the role of learning rate scheduling in the large-scale pre-training of large language models, focusing on its influence on downstream performance after supervised fine-tuning (SFT). Decay-based learning rate schedulers are widely used to...