Adaptive Conditional Forest Sampling for Spectral Risk Optimisation under Decision-Dependent Uncertainty
arXiv:2603.12507v1 Announce Type: new Abstract: Minimising a spectral risk objective, defined as a convex combination of expected cost and Conditional Value-at-Risk (CVaR), is challenging when the uncertainty distribution is decision-dependent, making both surrogate modelling and simulation-based ranking sensitive to tail...
CALF: Communication-Aware Learning Framework for Distributed Reinforcement Learning
arXiv:2603.12543v1 Announce Type: new Abstract: Distributed reinforcement learning policies face network delays, jitter, and packet loss when deployed across edge devices and cloud servers. Standard RL training assumes zero-latency interaction, causing severe performance degradation under realistic network conditions. We introduce...
Lyapunov Stable Graph Neural Flow
arXiv:2603.12557v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) are highly vulnerable to adversarial perturbations in both topology and features, making the learning of robust representations a critical challenge. In this work, we bridge GNNs with control theory to introduce...
Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
arXiv:2603.12595v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) is a widely used approach to align large-scale AI systems with human values. However, RLHF typically assumes a single, universal reward, which overlooks diverse preferences and limits personalization. Variational...
When Drafts Evolve: Speculative Decoding Meets Online Learning
arXiv:2603.12617v1 Announce Type: new Abstract: Speculative decoding has emerged as a widely adopted paradigm for accelerating large language model inference, where a lightweight draft model rapidly generates candidate tokens that are then verified in parallel by a larger target model....
Human-AI Collaborative Autonomous Experimentation With Proxy Modeling for Comparative Observation
arXiv:2603.12618v1 Announce Type: new Abstract: Optimization for different tasks like material characterization, synthesis, and functional properties for desired applications over multi-dimensional control parameters need a rapid strategic search through active learning such as Bayesian optimization (BO). However, such high-dimensional experimental...
Adaptive Diffusion Posterior Sampling for Data and Model Fusion of Complex Nonlinear Dynamical Systems
arXiv:2603.12635v1 Announce Type: new Abstract: High-fidelity numerical simulations of chaotic, high dimensional nonlinear dynamical systems are computationally expensive, necessitating the development of efficient surrogate models. Most surrogate models for such systems are deterministic, for example when neural operators are involved....
LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
arXiv:2603.12645v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) based Large Language Models (LLMs) have demonstrated impressive performance and computational efficiency. However, their deployment is often constrained by substantial memory demands, primarily due to the need to load numerous expert modules. While...
Federated Hierarchical Clustering with Automatic Selection of Optimal Cluster Numbers
arXiv:2603.12684v1 Announce Type: new Abstract: Federated Clustering (FC) is an emerging and promising solution in exploring data distribution patterns from distributed and privacy-protected data in an unsupervised manner. Existing FC methods implicitly rely on the assumption that clients are with...
Does legislative history have a judicial future?
Major Questions is a recurring series by Adam White, which analyzes the court’s approach to administrative law, agencies, and the lower courts. Does legislative history have a future in judicial […]The postDoes legislative history have a judicial future?appeared first onSCOTUSblog.
Training Is Everything: Artificial Intelligence, Copyright, and Fair Training
To learn how to behave, the current revolutionary generation of AIs must be trained on vast quantities of published images, written works, and sounds, many of which fall within the core subject matter of copyright law. To some, the use...
Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution
arXiv:2603.11445v1 Announce Type: new Abstract: We present Verified Multi-Agent Orchestration (VMAO), a framework that coordinates specialized LLM-based agents through a verification-driven iterative loop. Given a complex query, our system decomposes it into a directed acyclic graph (DAG) of sub-questions, executes...
From Debate to Deliberation: Structured Collective Reasoning with Typed Epistemic Acts
arXiv:2603.11781v1 Announce Type: new Abstract: Multi-agent LLM systems increasingly tackle complex reasoning, yet their interaction patterns remain limited to voting, unstructured debate, or pipeline orchestration. None model deliberation: a phased process where differentiated participants exchange typed reasoning moves, preserve disagreements,...
BLooP: Zero-Shot Abstractive Summarization using Large Language Models with Bigram Lookahead Promotion
arXiv:2603.11415v1 Announce Type: new Abstract: Abstractive summarization requires models to generate summaries that convey information in the source document. While large language models can generate summaries without fine-tuning, they often miss key details and include extraneous information. We propose BLooP...
AI Knows What's Wrong But Cannot Fix It: Helicoid Dynamics in Frontier LLMs Under High-Stakes Decisions
arXiv:2603.11559v1 Announce Type: new Abstract: Large language models perform reliably when their outputs can be checked: solving equations, writing code, retrieving facts. They perform differently when checking is impossible, as when a clinician chooses an irreversible treatment on incomplete data,...
LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms
arXiv:2603.11333v1 Announce Type: new Abstract: Short-video platforms are closed-loop, human-in-the-loop ecosystems where platform policy, creator incentives, and user behavior co-evolve. This feedback structure makes counterfactual policy evaluation difficult in production, especially for long-horizon and distributional outcomes. The challenge is amplified...
GPT4o-Receipt: A Dataset and Human Study for AI-Generated Document Forensics
arXiv:2603.11442v1 Announce Type: new Abstract: Can humans detect AI-generated financial documents better than machines? We present GPT4o-Receipt, a benchmark of 1,235 receipt images pairing GPT-4o-generated receipts with authentic ones from established datasets, evaluated by five state-of-the-art multimodal LLMs and a...
Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework
arXiv:2603.11768v1 Announce Type: new Abstract: Long-term memory has emerged as a foundational component of autonomous Large Language Model (LLM) agents, enabling continuous adaptation, lifelong multimodal learning, and sophisticated reasoning. However, as memory systems transition from static retrieval databases to dynamic,...
FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles
arXiv:2603.11339v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly applied to financial analysis, yet their ability to audit structured financial statements under explicit accounting principles remains poorly explored. Existing benchmarks primarily evaluate question answering, numerical reasoning, or anomaly...
Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation
arXiv:2603.11067v1 Announce Type: new Abstract: Large language models (LLMs) achieve remarkable performance, yet further gains often require costly training. This has motivated growing interest in post-training techniques-especially training-free approaches that improve models at inference time without updating weights. Most training-free...
Entropy Guided Diversification and Preference Elicitation in Agentic Recommendation Systems
arXiv:2603.11399v1 Announce Type: new Abstract: Users on e-commerce platforms can be uncertain about their preferences early in their search. Queries to recommendation systems are frequently ambiguous, incomplete, or weakly specified. Agentic systems are expected to proactively reason, ask clarifying questions,...
TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting
arXiv:2603.11352v1 Announce Type: new Abstract: Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly with sequence length, whereas fixed-length patching improves efficiency by imposing uniform boundaries that may...
Gender Bias in Generative AI-assisted Recruitment Processes
arXiv:2603.11736v1 Announce Type: new Abstract: In recent years, generative artificial intelligence (GenAI) systems have assumed increasingly crucial roles in selection processes, personnel recruitment and analysis of candidates' profiles. However, the employment of large language models (LLMs) risks reproducing, and in...
MaterialFigBENCH: benchmark dataset with figures for evaluating college-level materials science problem-solving abilities of multimodal large language models
arXiv:2603.11414v1 Announce Type: new Abstract: We present MaterialFigBench, a benchmark dataset designed to evaluate the ability of multimodal large language models (LLMs) to solve university-level materials science problems that require accurate interpretation of figures. Unlike existing benchmarks that primarily rely...
When OpenClaw Meets Hospital: Toward an Agentic Operating System for Dynamic Clinical Workflows
arXiv:2603.11721v1 Announce Type: new Abstract: Large language model (LLM) agents extend conventional generative models by integrating reasoning, tool invocation, and persistent memory. Recent studies suggest that such agents may significantly improve clinical workflows by automating documentation, coordinating care processes, and...
One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries
arXiv:2603.11545v1 Announce Type: new Abstract: We present an agentic AI framework for autonomous multimodal query processing that coordinates specialized tools across text, image, audio, video, and document modalities. A central Supervisor dynamically decomposes user queries, delegates subtasks to modality-appropriate tools...
A technology-oriented mapping of the language and translation industry: Analysing stakeholder values and their potential implication for translation pedagogy
arXiv:2603.11667v1 Announce Type: new Abstract: This paper examines how value is constructed and negotiated in today's increasingly automated language and translation industry. Drawing on interview data from twenty-nine industry stakeholders collected within the LT-LiDER project, the study analyses how human...
SemBench: A Universal Semantic Framework for LLM Evaluation
arXiv:2603.11687v1 Announce Type: new Abstract: Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Language Models (LLMs), which exhibit remarkable generative and reasoning capabilities. However, despite their success, evaluating the true semantic understanding of...
Semi-Synthetic Parallel Data for Translation Quality Estimation: A Case Study of Dataset Building for an Under-Resourced Language Pair
arXiv:2603.11743v1 Announce Type: new Abstract: Quality estimation (QE) plays a crucial role in machine translation (MT) workflows, as it serves to evaluate generated outputs that have no reference translations and to determine whether human post-editing or full retranslation is necessary....
Legal-DC: Benchmarking Retrieval-Augmented Generation for Legal Documents
arXiv:2603.11772v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has emerged as a promising technology for legal document consultation, yet its application in Chinese legal scenarios faces two key limitations: existing benchmarks lack specialized support for joint retriever-generator evaluation, and mainstream...