A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness
arXiv:2603.06594v1 Announce Type: new Abstract: Automated "LLM-as-a-Judge" frameworks have become the de facto standard for scalable evaluation across natural language processing. For instance, in safety evaluation, these judges are relied upon to evaluate harmfulness in order to benchmark the robustness...
Can Safety Emerge from Weak Supervision? A Systematic Analysis of Small Language Models
arXiv:2603.07017v1 Announce Type: new Abstract: Safety alignment is critical for deploying large language models (LLMs) in real-world applications, yet most existing approaches rely on large human-annotated datasets and static red-teaming benchmarks that are costly, difficult to scale, and slow to...
AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge
arXiv:2603.07019v1 Announce Type: new Abstract: Checklists have emerged as a popular approach for interpretable and fine-grained evaluation, particularly with LLM-as-a-Judge. Beyond evaluation, these structured criteria can serve as signals for model alignment, reinforcement learning, and self-correction. To support these use...
Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning
arXiv:2603.07445v1 Announce Type: new Abstract: Large language models (LLMs) often require fine-tuning (FT) to perform well on downstream tasks, but FT can induce safety-alignment drift even when the training dataset contains only benign data. Prior work shows that introducing a...
Cross-Modal Taxonomic Generalization in (Vision-) Language Models
arXiv:2603.07474v1 Announce Type: new Abstract: What is the interplay between the semantic representations that language models (LMs) learn from surface form alone and those learned from more grounded evidence? We study this question for a scenario where part of the input...
Benchmarking Large Language Models for Quebec Insurance: From Closed-Book to Retrieval-Augmented Generation
arXiv:2603.07825v1 Announce Type: new Abstract: The digitization of insurance distribution in the Canadian province of Quebec, accelerated by legislative changes such as Bill 141, has created a significant "advice gap", leaving consumers to interpret complex financial contracts without professional guidance....
vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM
arXiv:2603.06588v1 Announce Type: new Abstract: Modern artificial intelligence (AI) models are deployed on inference engines to optimize runtime efficiency and resource allocation, particularly for transformer-based large language models (LLMs). The vLLM project is a major open-source library to support model...
Not all tokens are needed (NAT): token efficient reinforcement learning
arXiv:2603.06619v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a key driver of progress in large language models, but scaling RL to long chain-of-thought (CoT) trajectories is increasingly constrained by backpropagation over every generated token. Even with optimized rollout...
Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
arXiv:2603.06745v1 Announce Type: new Abstract: Large Language Models (LLMs), despite advances in instruction tuning, often fail to follow complex user instructions. Activation steering techniques aim to mitigate this by manipulating model internals, but have a potential risk of oversteering, where...
Latent Autoencoder Ensemble Kalman Filter for Data assimilation
arXiv:2603.06752v1 Announce Type: new Abstract: The ensemble Kalman filter (EnKF) is widely used for data assimilation in high-dimensional systems, but its performance often deteriorates for strongly nonlinear dynamics due to the structural mismatch between the Kalman update and the underlying...
DeepFact: Co-Evolving Benchmarks and Agents for Deep Research Factuality
arXiv:2603.05912v1 Announce Type: new Abstract: Search-augmented LLM agents can produce deep research reports (DRRs), but verifying claim-level factuality remains challenging. Existing fact-checkers are primarily designed for general-domain, factoid-style atomic claims, and there is no benchmark to test whether such verifiers...
Reasoning Models Struggle to Control their Chains of Thought
arXiv:2603.05706v1 Announce Type: new Abstract: Chain-of-thought (CoT) monitoring is a promising tool for detecting misbehaviors and understanding the motivations of modern reasoning models. However, if models can control what they verbalize in their CoT, it could undermine CoT monitorability. To...
SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
arXiv:2603.06333v1 Announce Type: new Abstract: Recursive self-improvement is moving from theory to practice: modern systems can critique, revise, and evaluate their own outputs, yet iterative self-modification risks subtle alignment drift. We introduce SAHOO, a practical framework to monitor and control...
The DSA's Blind Spot: Algorithmic Audit of Advertising and Minor Profiling on TikTok
arXiv:2603.05653v1 Announce Type: cross Abstract: Adolescents spend an increasing amount of their time in digital environments where their still-developing cognitive capacities leave them unable to recognize or resist commercial persuasion. Article 28(2) of the Digital Services Act (DSA) responds to...
CodeScout: Contextual Problem Statement Enhancement for Software Agents
arXiv:2603.05744v1 Announce Type: new Abstract: Current AI-powered code assistance tools often struggle with poorly-defined problem statements that lack sufficient task context and requirements specification. Recent analysis of software engineering agents reveals that failures on such underspecified requests are highly correlated...
PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models
arXiv:2603.05776v1 Announce Type: new Abstract: Motivation: Patient-generated text contains critical information about patients' lived experiences, social circumstances, and engagement in care, including factors that strongly influence adherence, care coordination, and health equity. However, these patient voice signals are rarely available...
RouteGoT: Node-Adaptive Routing for Cost-Efficient Graph of Thoughts Reasoning
arXiv:2603.05818v1 Announce Type: new Abstract: Large Language Models (LLMs) excel at multi-step reasoning, yet increasing the structural complexity of inference does not consistently improve system-level returns. Methods such as Tree of Thoughts (ToT), Graph of Thoughts (GoT), and Adaptive Graph...
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs
arXiv:2603.05890v1 Announce Type: new Abstract: What happens when a storyteller forgets its own story? Large Language Models (LLMs) can now generate narratives spanning tens of thousands of words, but they often fail to maintain consistency throughout. When generating long-form narratives,...
Learning Next Action Predictors from Human-Computer Interaction
arXiv:2603.05923v1 Announce Type: new Abstract: Truly proactive AI systems must anticipate what we will do next. This foresight demands far richer information than the sparse signals we type into our prompts -- it demands reasoning over the entire context of...
SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models
arXiv:2603.06222v1 Announce Type: new Abstract: Explicit Chain-of-Thought improves the reasoning performance of large language models but often incurs high inference cost due to verbose token-level traces. While recent approaches reduce this overhead via concise prompting or step pruning, they largely...
Mind the Gap: Pitfalls of LLM Alignment with Asian Public Opinion
arXiv:2603.06264v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly being deployed in multilingual, multicultural settings, yet their reliance on predominantly English-centric training data risks misalignment with the diverse cultural values of different societies. In this paper, we present...
Abductive Reasoning with Syllogistic Forms in Large Language Models
arXiv:2603.06428v1 Announce Type: new Abstract: Research in AI using Large Language Models (LLMs) is rapidly evolving, and the comparison of their performance with human reasoning has become a key concern. Prior studies have indicated that LLMs and humans share similar biases,...
Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing
arXiv:2603.06503v1 Announce Type: new Abstract: Recent advances in multimodal Retrieval-Augmented Generation (RAG) enable Large Language Models (LLMs) to analyze enterprise spreadsheet workbooks containing millions of cells, cross-sheet dependencies, and embedded visual artifacts. However, state-of-the-art approaches exclude critical context through single-pass...
Score-Guided Proximal Projection: A Unified Geometric Framework for Rectified Flow Editing
arXiv:2603.05761v1 Announce Type: new Abstract: Rectified Flow (RF) models achieve state-of-the-art generation quality, yet controlling them for precise tasks -- such as semantic editing or blind image recovery -- remains a challenge. Current approaches bifurcate into inversion-based guidance, which suffers...
Sparse Crosscoders for diffing MoEs and Dense models
arXiv:2603.05805v1 Announce Type: new Abstract: Mixture of Experts (MoE) models achieve parameter-efficient scaling through sparse expert routing, yet their internal representations remain poorly understood compared to dense models. We present a systematic comparison of MoE and dense model internals using crosscoders,...
MoE Lens -- An Expert Is All You Need
arXiv:2603.05806v1 Announce Type: new Abstract: Mixture of Experts (MoE) models enable parameter-efficient scaling through sparse expert activations, yet optimizing their inference and memory costs remains challenging due to limited understanding of their specialization behavior. We present a systematic analysis of...
Self-Auditing Parameter-Efficient Fine-Tuning for Few-Shot 3D Medical Image Segmentation
arXiv:2603.05822v1 Announce Type: new Abstract: Adapting foundation models to new clinical sites remains challenging in practice. Domain shift and scarce annotations must be handled by experts, yet many clinical groups do not have ready access to skilled AI engineers to...
Dynamic Momentum Recalibration in Online Gradient Learning
arXiv:2603.06120v1 Announce Type: new Abstract: Stochastic Gradient Descent (SGD) and its momentum variants form the backbone of deep learning optimization, yet the underlying dynamics of their gradient behavior remain insufficiently understood. In this work, we reinterpret gradient updates through the...
Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions
arXiv:2603.06248v1 Announce Type: new Abstract: Understanding the intricate non-convex training dynamics of softmax-based models is crucial for explaining the empirical success of transformers. In this article, we analyze the gradient flow dynamics of the value-softmax model, defined as ${L}(\mathbf{V} \sigma(\mathbf{a}))$,...