BenchBrowser -- Collecting Evidence for Evaluating Benchmark Validity
arXiv:2603.18019v1 Announce Type: new Abstract: Do language model benchmarks actually measure what practitioners intend them to ? High-level metadata is too coarse to convey the granular reality of benchmarks: a "poetry" benchmark may never test for haikus, while "instruction-following" benchmarks...
TARo: Token-level Adaptive Routing for LLM Test-time Alignment
arXiv:2603.18411v1 Announce Type: new Abstract: Large language models (LLMs) exhibit strong reasoning capabilities but typically require expensive post-training to reach high performance. Recent test-time alignment methods offer a lightweight alternative, but have been explored mainly for preference alignment rather than...
Implicit Grading Bias in Large Language Models: How Writing Style Affects Automated Assessment Across Math, Programming, and Essay Tasks
arXiv:2603.18765v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly deployed as automated graders in educational settings, concerns about fairness and bias in their evaluations have become critical. This study investigates whether LLMs exhibit implicit grading bias based...
InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
arXiv:2603.18031v1 Announce Type: new Abstract: Balancing fine-grained local modeling with long-range dependency capture under computational constraints remains a central challenge in sequence modeling. While Transformers provide strong token mixing, they suffer from quadratic complexity, whereas Mamba-style selective state-space models (SSMs)...
Quotient Geometry and Persistence-Stable Metrics for Swarm Configurations
arXiv:2603.18041v1 Announce Type: new Abstract: Swarm and constellation reconfiguration can be viewed as motion of an unordered point configuration in an ambient space. Here, we provide persistence-stable, symmetry-invariant geometric representations for comparing and monitoring multi-agent configuration data. We introduce a...
Towards Noise-Resilient Quantum Multi-Armed and Stochastic Linear Bandits
arXiv:2603.18431v1 Announce Type: new Abstract: Quantum multi-armed bandits (MAB) and stochastic linear bandits (SLB) have recently attracted significant attention, as their quantum counterparts can achieve quadratic speedups over classical MAB and SLB. However, most existing quantum MAB algorithms assume ideal...
Discounted Beta--Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards
arXiv:2603.18444v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as an effective post-training paradigm for improving the reasoning capabilities of large language models. However, existing group-based RLVR methods often suffer from severe sample inefficiency. This inefficiency...
AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models
arXiv:2603.18464v1 Announce Type: new Abstract: Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models faces significant challenges in computational efficiency and data acquisition. We propose AcceRL, a fully asynchronous and decoupled RL framework designed to eliminate synchronization barriers by physically isolating...
Meta rolls out new AI content enforcement systems while reducing reliance on third-party vendors
Meta believes these AI systems can detect more violations with greater accuracy, better prevent scams, respond more quickly to real-world events, and reduce over-enforcement.
Integrating Inductive Biases in Transformers via Distillation for Financial Time Series Forecasting
arXiv:2603.16985v1 Announce Type: new Abstract: Transformer-based models have been widely adopted for time-series forecasting due to their high representational capacity and architectural flexibility. However, many Transformer variants implicitly assume stationarity and stable temporal dynamics -- assumptions routinely violated in financial...
REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge
arXiv:2603.17145v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as automated evaluators that assign numeric scores to model outputs, a paradigm known as LLM-as-a-Judge. However, standard Reinforcement Learning (RL) methods typically rely on binary rewards (e.g., 0-1...
Noise-Response Calibration: A Causal Intervention Protocol for LLM-Judges
arXiv:2603.17172v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as automated judges and synthetic labelers, especially in low-label settings. Yet these systems are stochastic and often overconfident, which makes deployment decisions difficult when external ground truth is...
Abstraction as a Memory-Efficient Inductive Bias for Continual Learning
arXiv:2603.17198v1 Announce Type: new Abstract: The real world is non-stationary and infinitely complex, requiring intelligent agents to learn continually without the prohibitive cost of retraining from scratch. While online continual learning offers a framework for this setting, learning new information...
WINFlowNets: Warm-up Integrated Networks Training of Generative Flow Networks for Robotics and Machine Fault Adaptation
arXiv:2603.17301v1 Announce Type: new Abstract: Generative Flow Networks for continuous scenarios (CFlowNets) have shown promise in solving sequential decision-making tasks by learning stochastic policies using a flow and a retrieval network. Despite their demonstrated efficiency compared to state-of-the-art Reinforcement Learning...
Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach Driven by Numerical and Structural Dual-Sensitivity
arXiv:2603.17354v1 Announce Type: new Abstract: Layer-wise mixed-precision quantization (LMPQ) enables effective compression under extreme low-bit settings by allocating higher precision to sensitive layers. However, existing methods typically treat all intra-layer weight modules uniformly and rely on a single numerical property...
Cohomological Obstructions to Global Counterfactuals: A Sheaf-Theoretic Foundation for Generative Causal Models
arXiv:2603.17384v1 Announce Type: new Abstract: Current continuous generative models (e.g., Diffusion Models, Flow Matching) implicitly assume that locally consistent causal mechanisms naturally yield globally coherent counterfactuals. In this paper, we prove that this assumption fails fundamentally when the causal graph...
The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle
arXiv:2603.17433v1 Announce Type: new Abstract: Transformer models have redefined sequence learning, yet dot-product self-attention introduces a quadratic token-mixing bottleneck for long-context time-series. We introduce the \textbf{Phasor Transformer} block, a phase-native alternative representing sequence states on the unit-circle manifold $S^1$. Each...
QuantFL: Sustainable Federated Learning for Edge IoT via Pre-Trained Model Quantisation
arXiv:2603.17507v1 Announce Type: new Abstract: Federated Learning (FL) enables privacy-preserving intelligence on Internet of Things (IoT) devices but incurs a significant carbon footprint due to the high energy cost of frequent uplink transmission. While pre-trained models are increasingly available on...
Form Follows Function: Recursive Stem Model
arXiv:2603.15641v1 Announce Type: new Abstract: Recursive reasoning models such as Hierarchical Reasoning Model (HRM) and Tiny Recursive Model (TRM) show that small, weight-shared networks can solve compute-heavy and NP puzzles by iteratively refining latent states, but their training typically relies...
MOSAIC: Composable Safety Alignment with Modular Control Tokens
arXiv:2603.16210v1 Announce Type: new Abstract: Safety alignment in large language models (LLMs) is commonly implemented as a single static policy embedded in model parameters. However, real-world deployments often require context-dependent safety rules that vary across users, regions, and applications. Existing...
VIGIL: Towards Edge-Extended Agentic AI for Enterprise IT Support
arXiv:2603.16110v1 Announce Type: new Abstract: Enterprise IT support is constrained by heterogeneous devices, evolving policies, and long-tail failure modes that are difficult to resolve centrally. We present VIGIL, an edge-extended agentic AI system that deploys desktop-resident agents to perform situated...
Context-Length Robustness in Question Answering Models: A Comparative Empirical Study
arXiv:2603.15723v1 Announce Type: new Abstract: Large language models are increasingly deployed in settings where relevant information is embedded within long and noisy contexts. Despite this, robustness to growing context length remains poorly understood across different question answering tasks. In this...
Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization
arXiv:2603.16105v1 Announce Type: new Abstract: Post-training model compression is essential for enhancing the portability of Large Language Models (LLMs) while preserving their performance. While several compression approaches have been proposed, less emphasis has been placed on selecting the most suitable...
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
arXiv:2603.16127v1 Announce Type: new Abstract: We investigate the role of learning rate scheduling in the large-scale pre-training of large language models, focusing on its influence on downstream performance after supervised fine-tuning (SFT). Decay-based learning rate schedulers are widely used to...
Social Simulacra in the Wild: AI Agent Communities on Moltbook
arXiv:2603.16128v1 Announce Type: new Abstract: As autonomous LLM-based agents increasingly populate social platforms, understanding the dynamics of AI-agent communities becomes essential for both communication research and platform governance. We present the first large-scale empirical comparison of AI-agent and human online...
PlotTwist: A Creative Plot Generation Framework with Small Language Models
arXiv:2603.16410v1 Announce Type: new Abstract: Creative plot generation presents a fundamental challenge for language models: transforming a concise premise into a coherent narrative that sustains global structure, character development, and emotional resonance. Although recent Large Language Models (LLMs) demonstrate strong...
Discovering the Hidden Role of Gini Index In Prompt-based Classification
arXiv:2603.15654v1 Announce Type: new Abstract: In classification tasks, the long-tailed minority classes usually offer the predictions that are most important. Yet these classes consistently exhibit low accuracies, whereas a few high-performing classes dominate the game. We pursue a foundational understanding...
Transition Flow Matching
arXiv:2603.15689v1 Announce Type: new Abstract: Mainstream flow matching methods typically focus on learning the local velocity field, which inherently requires multiple integration steps during generation. In contrast, Mean Velocity Flow models establish a relationship between the local velocity field and...
When Stability Fails: Hidden Failure Modes Of LLMS in Data-Constrained Scientific Decision-Making
arXiv:2603.15840v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as decision-support tools in data-constrained scientific workflows, where correctness and validity are critical. However, evaluation practices often emphasize stability or reproducibility across repeated runs. While these properties are...
Deriving Hyperparameter Scaling Laws via Modern Optimization Theory
arXiv:2603.15958v1 Announce Type: new Abstract: Hyperparameter transfer has become an important component of modern large-scale training recipes. Existing methods, such as muP, primarily focus on transfer between model sizes, with transfer across batch sizes and training horizons often relying on...