From Static Inference to Dynamic Interaction: Navigating the Landscape of Streaming Large Language Models
arXiv:2603.04592v1 Announce Type: new Abstract: Standard Large Language Models (LLMs) are predominantly designed for static inference with pre-defined inputs, which limits their applicability in dynamic, real-time scenarios. To address this gap, the streaming LLM paradigm has emerged. However, existing definitions...
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
arXiv:2603.04597v1 Announce Type: new Abstract: Large language models (LLMs) typically receive diverse natural language (NL) feedback through interaction with the environment. However, current reinforcement learning (RL) algorithms rely solely on scalar rewards, leaving the rich information in NL feedback underutilized...
iAgentBench: Benchmarking Sensemaking Capabilities of Information-Seeking Agents on High-Traffic Topics
arXiv:2603.04656v1 Announce Type: new Abstract: With the emergence of search-enabled generative QA systems, users are increasingly turning to tools that browse, aggregate, and reconcile evidence across multiple sources on their behalf. Yet many widely used QA benchmarks remain answerable by...
Stan: An LLM-based thermodynamics course assistant
arXiv:2603.04657v1 Announce Type: new Abstract: Discussions of AI in education focus predominantly on student-facing tools -- chatbots, tutors, and problem generators -- while the potential for the same infrastructure to support instructors remains largely unexplored. We describe Stan, a suite...
Optimizing Language Models for Crosslingual Knowledge Consistency
arXiv:2603.04678v1 Announce Type: new Abstract: Large language models are known to often exhibit inconsistent knowledge. This is particularly problematic in multilingual scenarios, where models are likely to be asked similar questions in different languages, and inconsistent responses can undermine their...
Hate Speech Detection using Large Language Models with Data Augmentation and Feature Enhancement
arXiv:2603.04698v1 Announce Type: new Abstract: This paper evaluates data augmentation and feature enhancement techniques for hate speech detection, comparing traditional classifiers, e.g., Delta Term Frequency-Inverse Document Frequency (Delta TF-IDF), with transformer-based models (DistilBERT, RoBERTa, DeBERTa, Gemma-7B, gpt-oss-20b) across diverse datasets....
Detection of Illicit Content on Online Marketplaces using Large Language Models
arXiv:2603.04707v1 Announce Type: new Abstract: Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the proliferation of illicit activities, including drug trafficking, counterfeit sales, and cybercrimes. Traditional content moderation methods such as manual reviews and rule-based automated systems struggle with...
AI-Assisted Moot Courts: Simulating Justice-Specific Questioning in Oral Arguments
arXiv:2603.04718v1 Announce Type: new Abstract: In oral arguments, judges probe attorneys with questions about the factual record, legal claims, and the strength of their arguments. To prepare for this questioning, both law schools and practicing attorneys rely on moot courts:...
TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings
arXiv:2603.04772v1 Announce Type: new Abstract: Despite the exceptional reasoning capabilities of Multimodal Large Language Models (MLLMs), their adaptation into universal embedding models is significantly impeded by task conflict. To address this, we propose TSEmbed, a universal multimodal embedding framework that...
Autoscoring Anticlimax: A Meta-analytic Understanding of AI's Short-answer Shortcomings and Wording Weaknesses
arXiv:2603.04820v1 Announce Type: new Abstract: Automated short-answer scoring lags other LLM applications. We meta-analyze 890 culminating results across a systematic review of LLM short-answer scoring studies, modeling the traditional effect size of Quadratic Weighted Kappa (QWK) with mixed effects metaregression....
SinhaLegal: A Benchmark Corpus for Information Extraction and Analysis in Sinhala Legislative Texts
arXiv:2603.04854v1 Announce Type: new Abstract: SinhaLegal introduces a Sinhala legislative text corpus containing approximately 2 million words across 1,206 legal documents. The dataset includes two types of legal documents: 1,065 Acts dated from 1981 to 2014 and 141 Bills from...
HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents
arXiv:2603.04855v1 Announce Type: new Abstract: Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned...
Free Lunch for Pass@$k$? Low Cost Diverse Sampling for Diffusion Language Models
arXiv:2603.04893v1 Announce Type: new Abstract: Diverse outputs in text generation are necessary for effective exploration in complex reasoning tasks, such as code generation and mathematical problem solving. Such Pass@$k$ problems benefit from distinct candidates covering the solution space. However, traditional...
Can LLMs Capture Expert Uncertainty? A Comparative Analysis of Value Alignment in Ethnographic Qualitative Research
arXiv:2603.04897v1 Announce Type: new Abstract: Qualitative analysis of open-ended interviews plays a central role in ethnographic and economic research by uncovering individuals' values, motivations, and culturally embedded financial behaviors. While large language models (LLMs) offer promising support for automating and...
Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition
arXiv:2603.04945v1 Announce Type: new Abstract: Training automatic speech recognition (ASR) models increasingly relies on decentralized federated learning to ensure data privacy and accessibility, producing multiple local models that require effective merging. In hybrid ASR systems, while acoustic models can be...
Decorrelating the Future: Joint Frequency Domain Learning for Spatio-temporal Forecasting
arXiv:2603.04418v1 Announce Type: new Abstract: Standard direct forecasting models typically rely on point-wise objectives such as Mean Squared Error, which fail to capture the complex spatio-temporal dependencies inherent in graph-structured signals. While recent frequency-domain approaches such as FreDF mitigate temporal...
Machine Learning for Complex Systems Dynamics: Detecting Bifurcations in Dynamical Systems with Deep Neural Networks
arXiv:2603.04420v1 Announce Type: new Abstract: Critical transitions are the abrupt shifts between qualitatively different states of a system, and they are crucial to understanding tipping points in complex dynamical systems across ecology, climate science, and biology. Detecting these shifts typically...
FedEMA-Distill: Exponential Moving Average Guided Knowledge Distillation for Robust Federated Learning
arXiv:2603.04422v1 Announce Type: new Abstract: Federated learning (FL) often degrades when clients hold heterogeneous non-Independent and Identically Distributed (non-IID) data and when some clients behave adversarially, leading to client drift, slow convergence, and high communication overhead. This paper proposes FedEMA-Distill,...
Delta-Crosscoder: Robust Crosscoder Model Diffing in Narrow Fine-Tuning Regimes
arXiv:2603.04426v1 Announce Type: new Abstract: Model diffing methods aim to identify how fine-tuning changes a model's internal representations. Crosscoders approach this by learning shared dictionaries of interpretable latent directions between base and fine-tuned models. However, existing formulations struggle with narrow...
Thin Keys, Full Values: Reducing KV Cache via Low-Dimensional Attention Selection
arXiv:2603.04427v1 Announce Type: new Abstract: Standard transformer attention uses identical dimensionality for queries, keys, and values ($d_q = d_k = d_v = \dmodel$). Our insight is that these components serve fundamentally different roles, and this symmetry is unnecessary. Queries and...
Flowers: A Warp Drive for Neural PDE Solvers
arXiv:2603.04430v1 Announce Type: new Abstract: We introduce Flowers, a neural architecture for learning PDE solution operators built entirely from multihead warps. Aside from pointwise channel mixing and a multiscale scaffold, Flowers use no Fourier multipliers, no dot-product attention, and no...
Uncertainty-Calibrated Spatiotemporal Field Diffusion with Sparse Supervision
arXiv:2603.04431v1 Announce Type: new Abstract: Physical fields are typically observed only at sparse, time-varying sensor locations, making forecasting and reconstruction ill-posed and uncertainty-critical. We present SOLID, a mask-conditioned diffusion framework that learns spatiotemporal dynamics from sparse observations alone: training and...
ZorBA: Zeroth-order Federated Fine-tuning of LLMs with Heterogeneous Block Activation
arXiv:2603.04436v1 Announce Type: new Abstract: Federated fine-tuning of large language models (LLMs) enables collaborative tuning across distributed clients. However, due to the large size of LLMs, local updates in federated learning (FL) may incur substantial video random-access memory (VRAM) usage....
ASFL: An Adaptive Model Splitting and Resource Allocation Framework for Split Federated Learning
arXiv:2603.04437v1 Announce Type: new Abstract: Federated learning (FL) enables multiple clients to collaboratively train a machine learning model without sharing their raw data. However, the limited computation resources of the clients may result in a high delay and energy consumption...
An Explainable Ensemble Framework for Alzheimer's Disease Prediction Using Structured Clinical and Cognitive Data
arXiv:2603.04449v1 Announce Type: new Abstract: Early and accurate detection of Alzheimer's disease (AD) remains a major challenge in medical diagnosis due to its subtle onset and progressive nature. This research introduces an explainable ensemble learning Framework designed to classify individuals...
On Emergences of Non-Classical Statistical Characteristics in Classical Neural Networks
arXiv:2603.04451v1 Announce Type: new Abstract: Inspired by measurement incompatibility and Bell-family inequalities in quantum mechanics, we propose the Non-Classical Network (NCnet), a simple classical neural architecture that stably exhibits non-classical statistical behaviors under typical and interpretable experimental setups. We find...
Learning Unified Distance Metric for Heterogeneous Attribute Data Clustering
arXiv:2603.04458v1 Announce Type: new Abstract: Datasets composed of numerical and categorical attributes (also called mixed data hereinafter) are common in real clustering tasks. Differing from numerical attributes that indicate tendencies between two concepts (e.g., high and low temperature) with their...
VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling
arXiv:2603.04460v1 Announce Type: new Abstract: The quadratic complexity of self-attention during the prefill phase impedes long-context inference in large language models. Existing sparse attention methods face a trade-off among context adaptivity, sampling overhead, and fine-tuning costs. We propose VSPrefill, a...
MAD-SmaAt-GNet: A Multimodal Advection-Guided Neural Network for Precipitation Nowcasting
arXiv:2603.04461v1 Announce Type: new Abstract: Precipitation nowcasting (short-term forecasting) is still often performed using numerical solvers for physical equations, which are computationally expensive and make limited use of the large volumes of available weather data. Deep learning models have shown...
Understanding the Dynamics of Demonstration Conflict in In-Context Learning
arXiv:2603.04464v1 Announce Type: new Abstract: In-context learning enables large language models to perform novel tasks through few-shot demonstrations. However, demonstrations per se can naturally contain noise and conflicting examples, making this capability vulnerable. To understand how models process such conflicts,...