GAIN: A Benchmark for Goal-Aligned Decision-Making of Large Language Models under Imperfect Norms
arXiv:2603.18469v1 Announce Type: new Abstract: We introduce GAIN (Goal-Aligned Decision-Making under Imperfect Norms), a benchmark designed to evaluate how large language models (LLMs) balance adherence to norms against business goals. Existing benchmarks typically focus on abstract scenarios rather than real-world...
EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models
arXiv:2603.18489v1 Announce Type: new Abstract: Diffusion-based large language models (dLLMs) rely on bidirectional attention, which prevents lossless KV caching and requires a full forward pass at every denoising step. Existing approximate KV caching methods reduce this cost by selectively updating...
Cross-Lingual LLM-Judge Transfer via Evaluation Decomposition
arXiv:2603.18557v1 Announce Type: new Abstract: As large language models are increasingly deployed across diverse real-world applications, extending automated evaluation beyond English has become a critical challenge. Existing evaluation approaches are predominantly English-focused, and adapting them to other languages is hindered...
Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media
arXiv:2603.18611v1 Announce Type: new Abstract: Advances in social media data dissemination enable the provision of real-time information during a crisis. The information comes from different classes, such as infrastructure damages, persons missing or stranded in the affected zone, etc. Existing...
A Comparative Empirical Study of Catastrophic Forgetting Mitigation in Sequential Task Adaptation for Continual Natural Language Processing Systems
arXiv:2603.18641v1 Announce Type: new Abstract: Neural language models deployed in real-world applications must continually adapt to new tasks and domains without forgetting previously acquired knowledge. This work presents a comparative empirical study of catastrophic forgetting mitigation in continual intent classification....
Why Better Cross-Lingual Alignment Fails for Better Cross-Lingual Transfer: Case of Encoders
arXiv:2603.18863v1 Announce Type: new Abstract: Better cross-lingual alignment is often assumed to yield better cross-lingual transfer. However, explicit alignment techniques -- despite increasing embedding similarity -- frequently fail to improve token-level downstream performance. In this work, we show that this...
Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case Study on Duolingo
arXiv:2603.18873v1 Announce Type: new Abstract: Popular language learning applications such as Duolingo use large language models (LLMs) to generate lessons for their users. Most lessons focus on general real-world scenarios such as greetings, ordering food, or asking directions, with limited...
A Human-in/on-the-Loop Framework for Accessible Text Generation
arXiv:2603.18879v1 Announce Type: new Abstract: Plain Language and Easy-to-Read formats in text simplification are essential for cognitive accessibility. Yet current automatic simplification and evaluation pipelines remain largely automated and metric-driven, and fail to reflect user comprehension or normative standards. This paper...
Progressive Training for Explainable Citation-Grounded Dialogue: Reducing Hallucination to Zero in English-Hindi LLMs
arXiv:2603.18911v1 Announce Type: new Abstract: Knowledge-grounded dialogue systems aim to generate informative, contextually relevant responses by conditioning on external knowledge sources. However, most existing approaches focus exclusively on English, lack explicit citation mechanisms for verifying factual claims, and offer limited...
Towards Differentiating Between Failures and Domain Shifts in Industrial Data Streams
arXiv:2603.18032v1 Announce Type: new Abstract: Anomaly and failure detection methods are crucial in identifying deviations from normal system operational conditions, which allows for actions to be taken in advance, usually preventing more serious damages. Long-lasting deviations indicate failures, while sudden,...
Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization
arXiv:2603.18037v1 Announce Type: new Abstract: This paper presents a systematic methodology for building domain-specific Japanese small language models using QLoRA fine-tuning. We address three core questions: optimal training scale, base-model selection, and architecture-aware quantization. Stage 1 (Training scale): Scale-learning experiments...
Fundamental Limits of Neural Network Sparsification: Evidence from Catastrophic Interpretability Collapse
arXiv:2603.18056v1 Announce Type: new Abstract: Extreme neural network sparsification (90% activation reduction) presents a critical challenge for mechanistic interpretability: understanding whether interpretable features survive aggressive compression. This work investigates feature survival under severe capacity constraints in hybrid Variational Autoencoder--Sparse Autoencoder...
Probabilistic Federated Learning on Uncertain and Heterogeneous Data with Model Personalization
arXiv:2603.18083v1 Announce Type: new Abstract: Conventional federated learning (FL) frameworks often suffer from training degradation due to data uncertainty and heterogeneity across local clients. Probabilistic approaches such as Bayesian neural networks (BNNs) can mitigate this issue by explicitly modeling uncertainty,...
Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner
arXiv:2603.18088v1 Announce Type: new Abstract: Constraints are essential for stabilizing reinforcement learning fine-tuning (RFT) and preventing degenerate outputs, yet they inherently conflict with the optimization objective because stronger constraints limit the ability of a fine-tuned model to discover better solutions....
BoundAD: Boundary-Aware Negative Generation for Time Series Anomaly Detection
arXiv:2603.18111v1 Announce Type: new Abstract: Contrastive learning methods for time series anomaly detection (TSAD) heavily depend on the quality of negative sample construction. However, existing strategies based on random perturbations or pseudo-anomaly injection often struggle to simultaneously preserve temporal semantic...
AGRI-Fidelity: Evaluating the Reliability of Listenable Explanations for Poultry Disease Detection
arXiv:2603.18247v1 Announce Type: new Abstract: Existing XAI metrics measure faithfulness for a single model, ignoring model multiplicity where near-optimal classifiers rely on different or spurious acoustic cues. In noisy farm environments, stationary artifacts such as ventilation noise can produce explanations...
ALIGN: Adversarial Learning for Generalizable Speech Neuroprosthesis
arXiv:2603.18299v1 Announce Type: new Abstract: Intracortical brain-computer interfaces (BCIs) can decode speech from neural activity with high accuracy when trained on data pooled across recording sessions. In realistic deployment, however, models must generalize to new sessions without labeled data, and...
Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum
arXiv:2603.18325v1 Announce Type: new Abstract: Chain-of-thought reasoning, where language models expend additional computation by producing thinking tokens prior to final responses, has driven significant advances in model capabilities. However, training these reasoning models is extremely costly in terms of both...
Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration
arXiv:2603.18326v1 Announce Type: new Abstract: While offline reinforcement learning provides reliable policies for real-world deployment, its inherent pessimism severely restricts an agent's ability to explore and collect novel data online. Drawing inspiration from safe reinforcement learning, exploring near the boundary...
A Family of Adaptive Activation Functions for Mitigating Failure Modes in Physics-Informed Neural Networks
arXiv:2603.18328v1 Announce Type: new Abstract: Physics-Informed Neural Networks (PINNs) are a powerful and flexible learning framework that has gained significant attention in recent years. It has demonstrated strong performance across a wide range of scientific and engineering problems. In parallel, wavelets...
Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration
arXiv:2603.18417v1 Announce Type: new Abstract: Sparse attention mechanisms promise to break the quadratic bottleneck of long-context transformers, yet production adoption remains limited by a critical usability gap: optimal hyperparameters vary substantially across layers and models, and current methods (e.g., SpargeAttn)...
Beyond Passive Aggregation: Active Auditing and Topology-Aware Defense in Decentralized Federated Learning
arXiv:2603.18538v1 Announce Type: new Abstract: Decentralized Federated Learning (DFL) remains highly vulnerable to adaptive backdoor attacks designed to bypass traditional passive defense metrics. To address this limitation, we shift the defensive paradigm toward a novel active, interventional auditing framework. First,...
Multi-Agent Reinforcement Learning for Dynamic Pricing: Balancing Profitability, Stability and Fairness
arXiv:2603.16888v1 Announce Type: new Abstract: Dynamic pricing in competitive retail markets requires strategies that adapt to fluctuating demand and competitor behavior. In this work, we present a systematic empirical evaluation of multi-agent reinforcement learning (MARL) approaches, specifically MAPPO and MADDPG, for dynamic...
MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning
arXiv:2603.16929v1 Announce Type: new Abstract: Regulating the importance ratio is critical for the training stability of Group Relative Policy Optimization (GRPO) based frameworks. However, prevailing ratio control methods, such as hard clipping, suffer from non-differentiable boundaries and vanishing gradient regions,...
Personalized Fall Detection by Balancing Data with Selective Feedback Using Contrastive Learning
arXiv:2603.17148v1 Announce Type: new Abstract: Personalized fall detection models can significantly improve accuracy by adapting to individual motion patterns, yet their effectiveness is often limited by the scarcity of real-world fall data and the dominance of non-fall feedback samples. This...
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
arXiv:2603.17187v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and...
WINFlowNets: Warm-up Integrated Networks Training of Generative Flow Networks for Robotics and Machine Fault Adaptation
arXiv:2603.17301v1 Announce Type: new Abstract: Generative Flow Networks for continuous scenarios (CFlowNets) have shown promise in solving sequential decision-making tasks by learning stochastic policies using a flow and a retrieval network. Despite their demonstrated efficiency compared to state-of-the-art Reinforcement Learning...
The Causal Uncertainty Principle: Manifold Tearing and the Topological Limits of Counterfactual Interventions
arXiv:2603.17385v1 Announce Type: new Abstract: Judea Pearl's do-calculus provides a foundation for causal inference, but its translation to continuous generative models remains fraught with geometric challenges. We establish the fundamental limits of such interventions. We define the Counterfactual Event Horizon...
Baguan-TS: A Sequence-Native In-Context Learning Model for Time Series Forecasting with Covariates
arXiv:2603.17439v1 Announce Type: new Abstract: Transformers enable in-context learning (ICL) for rapid, gradient-free adaptation in time series forecasting, yet most ICL-style approaches rely on tabularized, hand-crafted features, while end-to-end sequence models lack inference-time adaptation. We bridge this gap with a...