LLM-Driven Heuristic Synthesis for Industrial Process Control: Lessons from Hot Steel Rolling
arXiv:2603.20537v1 Announce Type: new Abstract: Industrial process control demands policies that are interpretable and auditable, requirements that black-box neural policies struggle to meet. We study an LLM-driven heuristic synthesis framework for hot steel rolling, in which a language model iteratively...
gUFO: A Gentle Foundational Ontology for Semantic Web Knowledge Graphs
arXiv:2603.20948v1 Announce Type: new Abstract: gUFO is a lightweight implementation of the Unified Foundational Ontology (UFO) suitable for Semantic Web OWL 2 DL applications. UFO is a mature foundational ontology with a rich axiomatization that has been employed in...
Position: Multi-Agent Algorithmic Care Systems Demand Contestability for Trustworthy AI
arXiv:2603.20595v1 Announce Type: new Abstract: Multi-agent systems (MAS) are increasingly used in healthcare to support complex decision-making through collaboration among specialized agents. Because these systems act as collective decision-makers, they raise challenges for trust, accountability, and human oversight. Existing approaches...
Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions
arXiv:2603.20925v1 Announce Type: new Abstract: As agentic systems move into real-world deployments, their decisions increasingly depend on external inputs such as retrieved content, tool outputs, and information provided by other actors. When these inputs can be strategically shaped by adversaries,...
Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models
arXiv:2603.20212v1 Announce Type: new Abstract: Reward models (RMs) are critical for aligning Large Language Models via Reinforcement Learning from Human Feedback (RLHF). While Generative Reward Models (GRMs) achieve superior accuracy through chain-of-thought (CoT) reasoning, they incur substantial computational costs. Conversely,...
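The abstract above names a scalar-vs-generative trade-off; a minimal hedged sketch of one plausible way such models could be combined, assuming a confidence-gated router (the function names, the margin threshold, and the length-based stand-in scorers are hypothetical illustrations, not the paper's method):

```python
# Hypothetical sketch: route easy comparisons to a fast scalar reward model
# and escalate ambiguous ones to a slow generative (CoT) reward model.

def scalar_rm(response: str) -> float:
    # Stand-in scalar RM: a cheap heuristic score (length as a dummy signal).
    return len(response) / 100.0

def generative_rm(a: str, b: str) -> str:
    # Stand-in GRM: an expensive chain-of-thought comparison (stubbed here).
    return a if len(a) >= len(b) else b

def fast_slow_preference(a: str, b: str, margin: float = 0.2) -> str:
    sa, sb = scalar_rm(a), scalar_rm(b)
    if abs(sa - sb) >= margin:      # confident: keep the cheap scalar verdict
        return a if sa > sb else b
    return generative_rm(a, b)      # ambiguous: pay for CoT reasoning
```

The design intent is that the costly GRM is invoked only on the fraction of pairs where the scalar scores are close.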
ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics
arXiv:2603.20260v1 Announce Type: new Abstract: The integration of Large Language Models into Multi-Agent Systems (MAS) has enabled the solution of complex, long-horizon tasks through collaborative reasoning. However, this collective intelligence is inherently fragile, as a single logical fallacy can rapidly...
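The title points to Markov transition dynamics for error forecasting; a heavily hedged illustration of the general idea (the two-state chain, its transition probabilities, and the function below are assumptions for exposition, not the paper's model):

```python
import numpy as np

# Toy model: treat each agent turn as a step of a Markov chain over
# {healthy, erroneous} states and forecast how likely an early error
# is to persist k turns later.
P = np.array([[0.95, 0.05],   # healthy  -> healthy / erroneous
              [0.30, 0.70]])  # erroneous -> healthy / erroneous

def error_prob_after(k: int, start_erroneous: bool = True) -> float:
    # Propagate the state distribution k steps and read off P(erroneous).
    state = np.array([0.0, 1.0]) if start_erroneous else np.array([1.0, 0.0])
    return float((state @ np.linalg.matrix_power(P, k))[1])
```

Under these made-up numbers the error probability decays toward the chain's stationary value, which is the kind of forward-looking quantity a proactive forecaster could threshold on.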
Where can AI be used? Insights from a deep ontology of work activities
arXiv:2603.20619v1 Announce Type: new Abstract: Artificial intelligence (AI) is poised to profoundly reshape how work is executed and organized, but we do not yet have deep frameworks for understanding where AI can be used. Here we provide a comprehensive ontology...
NeurIPS Datasets & Benchmarks Track: From Art to Science in AI Evaluations
Domain-Specialized Tree of Thought through Plug-and-Play Predictors
arXiv:2603.20267v1 Announce Type: new Abstract: While Large Language Models (LLMs) have advanced complex reasoning, prominent methods like the Tree of Thoughts (ToT) framework face a critical trade-off between exploration depth and computational efficiency. Existing ToT implementations often rely on heavyweight...
Coding Agents are Effective Long-Context Processors
arXiv:2603.20432v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable progress in scaling to access massive contexts. However, this access is mediated by latent, uninterpretable attention mechanisms, and LLMs fail to effectively process long contexts, exhibiting significant...
Policies Permitting LLM Use for Polishing Peer Reviews Are Currently Not Enforceable
arXiv:2603.20450v1 Announce Type: new Abstract: A number of scientific conferences and journals have recently enacted policies that prohibit LLM usage by peer reviewers, except for polishing, paraphrasing, and grammar correction of otherwise human-written reviews. But, are these policies enforceable? To...
Diffutron: A Masked Diffusion Language Model for Turkish Language
arXiv:2603.20466v1 Announce Type: new Abstract: Masked Diffusion Language Models (MDLMs) have emerged as a compelling non-autoregressive alternative to standard large language models; however, their application to morphologically rich languages remains limited. In this paper, we introduce $\textit{Diffutron}$, a masked diffusion...
PARHAF, a human-authored corpus of clinical reports for fictitious patients in French
arXiv:2603.20494v1 Announce Type: new Abstract: The development of clinical natural language processing (NLP) systems is severely hampered by the sensitive nature of medical records, which restricts data sharing under stringent privacy regulations, particularly in France and the broader European Union....
JUBAKU: An Adversarial Benchmark for Exposing Culturally Grounded Stereotypes in Japanese LLMs
arXiv:2603.20581v1 Announce Type: new Abstract: Social biases reflected in language are inherently shaped by cultural norms, which vary significantly across regions and lead to diverse manifestations of stereotypes. Existing evaluations of social bias in large language models (LLMs) for non-English...
Weber's Law in Transformer Magnitude Representations: Efficient Coding, Representational Geometry, and Psychophysical Laws in Language Models
arXiv:2603.20642v1 Announce Type: new Abstract: How do transformer language models represent magnitude? Recent work disagrees: some find logarithmic spacing, others linear encoding, others per-digit circular representations. We apply the formal tools of psychophysics to resolve this. Using four converging paradigms...
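For context on the psychophysics invoked above: Weber's law says the just-noticeable difference ΔI grows in proportion to stimulus magnitude I, so the Weber fraction k = ΔI/I is constant, and Fechner's classical integration of dS = dI/(kI) yields a logarithmic internal scale. A tiny sketch of these textbook relations (the constant k = 0.1 is an arbitrary choice):

```python
import math

def weber_jnd(intensity: float, k: float = 0.1) -> float:
    # Weber's law: just-noticeable difference is a constant fraction of I.
    return k * intensity

def fechner_scale(intensity: float, i0: float = 1.0, k: float = 0.1) -> float:
    # Fechner's law: S(I) = (1/k) * ln(I / I0), a logarithmic internal scale.
    return math.log(intensity / i0) / k
```

The logarithmic scale is what makes equal stimulus *ratios* map to equal internal steps, which is the signature the magnitude-representation debate above turns on.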
MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages
arXiv:2603.20732v1 Announce Type: new Abstract: Decoder-only language models can be adapted to diverse tasks through instruction finetuning, but the extent to which this generalizes at small scale for low-resource languages remains unclear. We focus on the languages of South Africa,...
BenchBench: Benchmarking Automated Benchmark Generation
arXiv:2603.20807v1 Announce Type: new Abstract: Benchmarks are the de facto standard for tracking progress in large language models (LLMs), yet static test sets can rapidly saturate, become vulnerable to contamination, and are costly to refresh. Scalable evaluation of open-ended items...
HiCI: Hierarchical Construction-Integration for Long-Context Attention
arXiv:2603.20843v1 Announce Type: new Abstract: Long-context language modeling is commonly framed as a scalability challenge of token-level attention, yet local-to-global information structuring remains largely implicit in existing approaches. Drawing on cognitive theories of discourse comprehension, we propose HiCI (Hierarchical Construction--Integration),...
Can ChatGPT Really Understand Modern Chinese Poetry?
arXiv:2603.20851v1 Announce Type: new Abstract: ChatGPT has demonstrated remarkable capabilities on both poetry generation and translation, yet its ability to truly understand poetry remains unexplored. Previous poetry-related work merely analyzed experimental outcomes without addressing fundamental issues of comprehension. This paper...
DiscoUQ: Structured Disagreement Analysis for Uncertainty Quantification in LLM Agent Ensembles
arXiv:2603.20975v1 Announce Type: new Abstract: Multi-agent LLM systems, where multiple prompted instances of a language model independently answer questions, are increasingly used for complex reasoning tasks. However, existing methods for quantifying the uncertainty of their collective outputs rely on shallow...
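The abstract criticizes "shallow" uncertainty signals for agent ensembles; one common such baseline, sketched here for contrast (this is the baseline being criticized, not the paper's proposed structured analysis), is the entropy of the answer vote distribution:

```python
import math
from collections import Counter

def vote_entropy(answers: list[str]) -> float:
    # Shannon entropy (in bits) of the ensemble's answer distribution:
    # 0 when all agents agree, maximal when votes are uniformly split.
    counts = Counter(answers)
    n = len(answers)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

Such a score collapses all disagreement into one number and ignores *which* agents disagree and why, which is presumably the gap a structured disagreement analysis targets.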
Transformer-Based Predictive Maintenance for Risk-Aware Instrument Calibration
arXiv:2603.20297v1 Announce Type: new Abstract: Accurate calibration is essential for instruments whose measurements must remain traceable, reliable, and compliant over long operating periods. Fixed-interval programs are easy to administer, but they ignore that instruments drift at different rates under different...
Bounded Coupled AI Learning Dynamics in Tri-Hierarchical Drone Swarms
arXiv:2603.20333v1 Announce Type: new Abstract: Modern autonomous multi-agent systems combine heterogeneous learning mechanisms operating at different timescales. An open question remains: can one formally guarantee that coupled dynamics of such mechanisms stay within the admissible operational regime? This paper studies...
KV Cache Optimization Strategies for Scalable and Efficient LLM Inference
arXiv:2603.20397v1 Announce Type: new Abstract: The key-value (KV) cache is a foundational optimization in Transformer-based large language models (LLMs), eliminating redundant recomputation of past token representations during autoregressive generation. However, its memory footprint scales linearly with context length, imposing critical...
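The mechanism named above, caching past keys/values so they are reused rather than recomputed at every decoding step, can be sketched minimally (this is a generic illustration, not any specific framework's API; the class and method names are invented):

```python
import numpy as np

class KVCache:
    # Per-head KV cache for one attention head: each generated token appends
    # one key and one value vector, so memory grows linearly with length.
    def __init__(self, head_dim: int):
        self.keys, self.values = [], []
        self.head_dim = head_dim

    def append(self, k: np.ndarray, v: np.ndarray):
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q: np.ndarray) -> np.ndarray:
        # Attention for the newest query over all cached (never recomputed) KV.
        K = np.stack(self.keys)                       # (t, d)
        V = np.stack(self.values)                     # (t, d)
        scores = K @ q / np.sqrt(self.head_dim)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V

    def bytes_used(self) -> int:
        # The linear memory footprint the abstract's optimizations target.
        return sum(a.nbytes for a in self.keys + self.values)
```

For example, 10 cached tokens of float32 key and value vectors with `head_dim=4` occupy 10 × 2 × 4 × 4 = 320 bytes for this single head; scaling that by layers, heads, and batch size is what makes the footprint critical at long contexts.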
SLE-FNO: Single-Layer Extensions for Task-Agnostic Continual Learning in Fourier Neural Operators
arXiv:2603.20410v1 Announce Type: new Abstract: Scientific machine learning is increasingly used to build surrogate models, yet most models are trained under a restrictive assumption in which future data follow the same distribution as the training set. In practice, new experimental...
SDE-Driven Spatio-Temporal Hypergraph Neural Networks for Irregular Longitudinal fMRI Connectome Modeling in Alzheimer's Disease
arXiv:2603.20452v1 Announce Type: new Abstract: Longitudinal neuroimaging is essential for modeling disease progression in Alzheimer's disease (AD), yet irregular sampling and missing visits pose substantial challenges for learning reliable temporal representations. To address this challenge, we propose SDE-HGNN, a stochastic...
Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret
arXiv:2603.20453v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) replaces hard-to-specify rewards with pairwise trajectory preferences, yet regret-oriented theory often assumes that preference labels are generated consistently from a single ground-truth objective. In practical RLHF systems, however, feedback...
Spatio-Temporal Grid Intelligence: A Hybrid Graph Neural Network and LSTM Framework for Robust Electricity Theft Detection
arXiv:2603.20488v1 Announce Type: new Abstract: Electricity theft, or non-technical loss (NTL), presents a persistent threat to global power systems, driving significant financial deficits and compromising grid stability. Conventional detection methodologies, predominantly reactive and meter-centric, often fail to capture the complex...
AE-LLM: Adaptive Efficiency Optimization for Large Language Models
arXiv:2603.20492v1 Announce Type: new Abstract: Large Language Models (LLMs) have achieved remarkable success across diverse applications, yet their deployment remains challenging due to substantial computational costs, memory requirements, and energy consumption. Recent empirical studies have demonstrated that no single efficiency...
RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization
arXiv:2603.20527v1 Announce Type: new Abstract: Preconditioned adaptive methods have gained significant attention for training deep neural networks, as they capture rich curvature information of the loss landscape. The central challenge in this field lies in balancing preconditioning effectiveness with...