Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment
arXiv:2603.23114v1 Announce Type: new Abstract: A human's moral decision depends heavily on the context. Yet research on LLM morality has largely studied fixed scenarios. We address this gap by introducing Contextual MoralChoice, a dataset of moral dilemmas with systematic contextual...
LLM Olympiad: Why Model Evaluation Needs a Sealed Exam
arXiv:2603.23292v1 Announce Type: new Abstract: Benchmarks and leaderboards are how NLP most often communicates progress, but in the LLM era they are increasingly easy to misread. Scores can reflect benchmark-chasing, hidden evaluation choices, or accidental exposure to test content --...
The Efficiency Attenuation Phenomenon: A Computational Challenge to the Language of Thought Hypothesis
arXiv:2603.22312v1 Announce Type: new Abstract: This paper computationally investigates whether thought requires a language-like format, as posited by the Language of Thought (LoT) hypothesis. We introduce the ``AI Private Language'' thought experiment: if two artificial agents develop an efficient, inscrutable...
When AI Shows Its Work, Is It Actually Working? Step-Level Evaluation Reveals Frontier Language Models Frequently Bypass Their Own Reasoning
arXiv:2603.22816v1 Announce Type: new Abstract: Language models increasingly "show their work" by writing step-by-step reasoning before answering. But are these reasoning steps genuinely used, or decorative narratives generated after the model has already decided? Consider: a medical AI writes "The...
Set-Valued Prediction for Large Language Models with Feasibility-Aware Coverage Guarantees
arXiv:2603.22966v1 Announce Type: new Abstract: Large language models (LLMs) inherently operate over a large generation space, yet conventional usage typically reports the most likely generation (MLG) as a point prediction, which underestimates the model's capability: although the top-ranked response can...
AuthorMix: Modular Authorship Style Transfer via Layer-wise Adapter Mixing
arXiv:2603.23069v1 Announce Type: new Abstract: The task of authorship style transfer involves rewriting text in the style of a target author while preserving the meaning of the original text. Existing style transfer methods train a single model on large corpora...
From Synthetic to Native: Benchmarking Multilingual Intent Classification in Logistics Customer Service
arXiv:2603.23172v1 Announce Type: new Abstract: Multilingual intent classification is central to customer-service systems on global logistics platforms, where models must process noisy user queries across languages and hierarchical label spaces. Yet most existing multilingual benchmarks rely on machine-translated text, which...
Latent Semantic Manifolds in Large Language Models
arXiv:2603.22301v1 Announce Type: new Abstract: Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce discrete tokens -- a fundamental mismatch whose geometric consequences remain poorly understood. We develop a mathematical framework that interprets LLM hidden states...
Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models
arXiv:2603.22303v1 Announce Type: new Abstract: Hallucinations in large language models (LLMs) remain a central obstacle to trustworthy deployment, motivating detectors that are accurate, lightweight, and broadly applicable. Since an LLM with a prompt defines a conditional distribution, we argue that...
A Multi-Modal CNN-LSTM Framework with Multi-Head Attention and Focal Loss for Real-Time Elderly Fall Detection
arXiv:2603.22313v1 Announce Type: new Abstract: The increasing global aging population has intensified the demand for reliable health monitoring systems, particularly those capable of detecting critical events such as falls among elderly individuals. Traditional fall detection approaches relying on single-modality acceleration...
Enhancing AI-Based Tropical Cyclone Track and Intensity Forecasting via Systematic Bias Correction
arXiv:2603.22314v1 Announce Type: new Abstract: Tropical cyclones (TCs) pose severe threats to life, infrastructure, and economies in tropical and subtropical regions, underscoring the critical need for accurate and timely forecasts of both track and intensity. Recent advances in AI-based weather...
Geometric Mixture-of-Experts with Curvature-Guided Adaptive Routing for Graph Representation Learning
arXiv:2603.22317v1 Announce Type: new Abstract: Graph-structured data typically exhibits complex topological heterogeneity, making it difficult to model accurately within a single Riemannian manifold. While emerging mixed-curvature methods attempt to capture such diversity, they often rely on implicit, task-driven routing that...
Sparsely-Supervised Data Assimilation via Physics-Informed Schr\"odinger Bridge
arXiv:2603.22319v1 Announce Type: new Abstract: Data assimilation (DA) for systems governed by partial differential equations (PDE) aims to reconstruct full spatiotemporal fields from sparse high-fidelity (HF) observations while respecting physical constraints. While full-grid low-fidelity (LF) simulations provide informative priors in...
DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression
arXiv:2603.22324v1 Announce Type: new Abstract: We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately...
A Direct Classification Approach for Reliable Wind Ramp Event Forecasting under Severe Class Imbalance
arXiv:2603.22326v1 Announce Type: new Abstract: Decision support systems are essential for maintaining grid stability in low-carbon power systems, such as wind power plants, by providing real-time alerts to control room operators regarding potential events, including Wind Power Ramp Events (WPREs)....
Cloud-Edge Collaborative Large Models for Robust Photovoltaic Power Forecasting
arXiv:2603.22343v1 Announce Type: new Abstract: Photovoltaic (PV) power forecasting in edge-enabled grids requires balancing forecasting accuracy, robustness under weather-driven distribution shifts, and strict latency constraints. Local specialized models are efficient for routine conditions but often degrade under rare ramp events...
Three Creates All: You Only Sample 3 Steps
arXiv:2603.22375v1 Announce Type: new Abstract: Diffusion models deliver high-fidelity generation but remain slow at inference time due to many sequential network evaluations. We find that standard timestep conditioning becomes a key bottleneck for few-step sampling. Motivated by layer-dependent denoising dynamics,...
Symbolic Graph Networks for Robust PDE Discovery from Noisy Sparse Data
arXiv:2603.22380v1 Announce Type: new Abstract: Data-driven discovery of partial differential equations (PDEs) offers a promising paradigm for uncovering governing physical laws from observational data. However, in practical scenarios, measurements are often contaminated by noise and limited by sparse sampling, which...
Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure
arXiv:2603.22384v1 Announce Type: new Abstract: Autonomous agents operating in continuous environments must decide not only what to do, but when to act. We introduce a lightweight adaptive temporal control system that learns the optimal interval between cognitive ticks from experience,...
SkillRouter: Retrieve-and-Rerank Skill Selection for LLM Agents at Scale
arXiv:2603.22455v1 Announce Type: new Abstract: As LLM agent ecosystems grow, the number of available skills (tools, plugins) has reached tens of thousands, making it infeasible to inject all skills into an agent's context. This creates a need for skill routing...
Multimodal Training to Unimodal Deployment: Leveraging Unstructured Data During Training to Optimize Structured Data Only Deployment
arXiv:2603.22530v1 Announce Type: new Abstract: Unstructured Electronic Health Record (EHR) data, such as clinical notes, contain clinical contextual observations that are not directly reflected in structured data fields. This additional information can substantially improve model learning. However, due to their...
AutoMOOSE: An Agentic AI for Autonomous Phase-Field Simulation
arXiv:2603.20986v1 Announce Type: new Abstract: Multiphysics simulation frameworks such as MOOSE provide rigorous engines for phase-field materials modeling, yet adoption is constrained by the expertise required to construct valid input files, coordinate parameter sweeps, diagnose failures, and extract quantitative results....
Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models
arXiv:2603.20212v1 Announce Type: new Abstract: Reward models (RMs) are critical for aligning Large Language Models via Reinforcement Learning from Human Feedback (RLHF). While Generative Reward Models (GRMs) achieve superior accuracy through chain-of-thought (CoT) reasoning, they incur substantial computational costs. Conversely,...
Knowledge Boundary Discovery for Large Language Models
arXiv:2603.21022v1 Announce Type: new Abstract: We propose Knowledge Boundary Discovery (KBD), a reinforcement learning based framework to explore the knowledge boundaries of the Large Language Models (LLMs). We define the knowledge boundary by automatically generating two types of questions: (i)...
Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning
arXiv:2603.20662v1 Announce Type: new Abstract: Despite remarkable advances in large Vision-Language Models (VLMs), spatial reasoning remains a persistent challenge. In this work, we investigate how attention heads within VLMs contribute to spatial reasoning by analyzing their functional roles through a...
Reasoning Traces Shape Outputs but Models Won't Say So
arXiv:2603.20620v1 Announce Type: new Abstract: Can we trust the reasoning traces that large reasoning models (LRMs) produce? We investigate whether these traces faithfully reflect what drives model outputs, and whether models will honestly report their influence. We introduce Thought Injection,...
Compression is all you need: Modeling Mathematics
arXiv:2603.20396v1 Announce Type: new Abstract: Human mathematics (HM), the mathematics humans discover and value, is a vanishingly small subset of formal mathematics (FM), the totality of all valid deductions. We argue that HM is distinguished by its compressibility through hierarchically...
Locally Coherent Parallel Decoding in Diffusion Language Models
arXiv:2603.20216v1 Announce Type: new Abstract: Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models, offering sub-linear generation latency and bidirectional capabilities that are particularly appealing for code generation and editing. Achieving sub-linear latency in discrete...
KLDrive: Fine-Grained 3D Scene Reasoning for Autonomous Driving based on Knowledge Graph
arXiv:2603.21029v1 Announce Type: new Abstract: Autonomous driving requires reliable reasoning over fine-grained 3D scene facts. Fine-grained question answering over multi-modal driving observations provides a natural way to evaluate this capability, yet existing perception pipelines and driving-oriented large language model (LLM)...
DiffGraph: An Automated Agent-driven Model Merging Framework for In-the-Wild Text-to-Image Generation
arXiv:2603.20470v1 Announce Type: new Abstract: The rapid growth of the text-to-image (T2I) community has fostered a thriving online ecosystem of expert models, which are variants of pretrained diffusion models specialized for diverse generative abilities. Yet, existing model merging methods remain...