Length Generalization Bounds for Transformers
arXiv:2603.02238v1 Announce Type: new Abstract: Length generalization is a key property of a learning algorithm that enables it to make correct predictions on inputs of any length, given finite training data. To provide such a guarantee, one needs to be...
High-order Knowledge Based Network Controllability Robustness Prediction: A Hypergraph Neural Network Approach
arXiv:2603.02265v1 Announce Type: new Abstract: In order to evaluate the invulnerability of networks against various types of attacks and provide guidance for potential performance enhancement as well as controllability maintenance, network controllability robustness (NCR) has attracted increasing attention in recent...
Graph Attention Based Prioritization of Disease Responsible Genes from Multimodal Alzheimer's Network
arXiv:2603.02273v1 Announce Type: new Abstract: Prioritizing disease-associated genes is central to understanding the molecular mechanisms of complex disorders such as Alzheimer's disease (AD). Traditional network-based approaches rely on static centrality measures and often fail to capture cross-modal biological heterogeneity. We...
Temporal Imbalance of Positive and Negative Supervision in Class-Incremental Learning
arXiv:2603.02280v1 Announce Type: new Abstract: With the widespread adoption of deep learning in visual tasks, Class-Incremental Learning (CIL) has become an important paradigm for handling dynamically evolving data distributions. However, CIL faces the core challenge of catastrophic forgetting, often manifested...
Quantum-Inspired Fine-Tuning for Few-Shot AIGC Detection via Phase-Structured Reparameterization
arXiv:2603.02281v1 Announce Type: new Abstract: Recent studies show that quantum neural networks (QNNs) generalize well in few-shot regimes. To extend this advantage to large-scale tasks, we propose Q-LoRA, a quantum-enhanced fine-tuning scheme that integrates lightweight QNNs into the low-rank adaptation...
The Malignant Tail: Spectral Segregation of Label Noise in Over-Parameterized Networks
arXiv:2603.02293v1 Announce Type: new Abstract: While implicit regularization facilitates benign overfitting in low-noise regimes, recent theoretical work predicts a sharp phase transition to harmful overfitting as the noise-to-signal ratio increases. We experimentally isolate the geometric mechanism of this transition: the...
Preconditioned Score and Flow Matching
arXiv:2603.02337v1 Announce Type: new Abstract: Flow matching and score-based diffusion train vector fields under intermediate distributions $p_t$, whose geometry can strongly affect their optimization. We show that the covariance $\Sigma_t$ of $p_t$ governs optimization bias: when $\Sigma_t$ is ill-conditioned, and...
Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles
arXiv:2603.02406v1 Announce Type: new Abstract: Generative models have recently advanced $\textit{de novo}$ protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) Existing methods cannot jointly learn protein geometry and design tasks,...
Personalized Multi-Agent Average Reward TD-Learning via Joint Linear Approximation
arXiv:2603.02426v1 Announce Type: new Abstract: We study personalized multi-agent average reward TD learning, in which a collection of agents interacts with different environments and jointly learns their respective value functions. We focus on the setting where there exists a shared...
Dimension-Independent Convergence of Underdamped Langevin Monte Carlo in KL Divergence
arXiv:2603.02429v1 Announce Type: new Abstract: Underdamped Langevin dynamics (ULD) is a widely-used sampler for Gibbs distributions $\pi\propto e^{-V}$, and is often empirically effective in high dimensions. However, existing non-asymptotic convergence guarantees for discretized ULD typically scale polynomially with the ambient...
A Unified Revisit of Temperature in Classification-Based Knowledge Distillation
arXiv:2603.02430v1 Announce Type: new Abstract: A central idea of knowledge distillation is to expose relational structure embedded in the teacher's weights for the student to learn, which is often facilitated using a temperature parameter. Despite its widespread use, there remains...
Spectral Regularization for Diffusion Models
arXiv:2603.02447v1 Announce Type: new Abstract: Diffusion models are typically trained using pointwise reconstruction objectives that are agnostic to the spectral and multi-scale structure of natural signals. We propose a loss-level spectral regularization framework that augments standard diffusion training with differentiable...
Manifold Aware Denoising Score Matching (MAD)
arXiv:2603.02452v1 Announce Type: new Abstract: A major focus in designing methods for learning distributions defined on manifolds is to alleviate the need to implicitly learn the manifold so that learning can concentrate on the data distribution within the manifold. However,...
EdgeFLow: Serverless Federated Learning via Sequential Model Migration in Edge Networks
arXiv:2603.02562v1 Announce Type: new Abstract: Federated Learning (FL) has emerged as a transformative distributed learning paradigm in the era of Internet of Things (IoT), reconceptualizing data processing methodologies. However, FL systems face significant communication bottlenecks due to inevitable client-server data...
Court unanimously sides with government in immigration dispute
The Supreme Court unanimously sided with the federal government on Wednesday in Urias-Orellana v. Bondi, holding in an opinion by Justice Ketanji Brown Jackson that federal courts of appeals must […]The postCourt unanimously sides with government in immigration disputeappeared first...
Birthright citizenship: an empirical analysis of supposedly originalist briefs
Brothers in Law is a recurring series by brothers Akhil and Vikram Amar, with special emphasis on measuring what the Supreme Court says against what the Constitution itself says. For more content from […]The postBirthright citizenship: an empirical analysis of...
Distribution-Aware Companding Quantization of Large Language Models
arXiv:2603.00364v1 Announce Type: new Abstract: Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample...
A Typologically Grounded Evaluation Framework for Word Order and Morphology Sensitivity in Multilingual Masked LMs
arXiv:2603.00432v1 Announce Type: new Abstract: We introduce a typology-aware diagnostic for multilingual masked language models that tests reliance on word order versus inflectional form. Using Universal Dependencies, we apply inference-time perturbations: full token scrambling, content-word scrambling with function words fixed,...
CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles
arXiv:2603.00523v1 Announce Type: new Abstract: Mechanistic circuit discovery is notoriously sensitive to arbitrary analyst choices, especially pruning thresholds and feature dictionaries, often yielding brittle "one-shot" explanations with no principled notion of uncertainty. We reframe circuit discovery as an uncertainty-quantification problem...
CoMoL: Efficient Mixture of LoRA Experts via Dynamic Core Space Merging
arXiv:2603.00573v1 Announce Type: new Abstract: Large language models (LLMs) achieve remarkable performance on diverse downstream and domain-specific tasks via parameter-efficient fine-tuning (PEFT). However, existing PEFT methods, particularly MoE-LoRA architectures, suffer from limited parameter efficiency and coarse-grained adaptation due to the...
SSKG Hub: An Expert-Guided Platform for LLM-Empowered Sustainability Standards Knowledge Graphs
arXiv:2603.00669v1 Announce Type: new Abstract: Sustainability disclosure standards (e.g., GRI, SASB, TCFD, IFRS S2) are comprehensive yet lengthy, terminology-dense, and highly cross-referential, hindering structured analysis and downstream use. We present SSKG Hub (Sustainability Standards Knowledge Graph Hub), a research prototype...
Polynomial Mixing for Efficient Self-supervised Speech Encoders
arXiv:2603.00683v1 Announce Type: new Abstract: State-of-the-art speech-to-text models typically employ Transformer-based encoders that model token dependencies via self-attention mechanisms. However, the quadratic complexity of self-attention in both memory and computation imposes significant constraints on scalability. In this work, we propose...
RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis
arXiv:2603.00686v1 Announce Type: new Abstract: Large Language Models have evolved from single-round generators into long-horizon agents, capable of complex text synthesis scenarios. However, current evaluation frameworks lack the ability to assess the actual synthesis operations, such as outlining, drafting, and...
RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models
arXiv:2603.00724v1 Announce Type: new Abstract: Large language model alignment via reinforcement learning depends critically on reward function quality. However, static, domain-specific reward models are often costly to train and exhibit poor generalization in out-of-distribution scenarios encountered during RL iterations. We...
Constitutional Black-Box Monitoring for Scheming in LLM Agents
arXiv:2603.00829v1 Announce Type: new Abstract: Safe deployment of Large Language Model (LLM) agents in autonomous settings requires reliable oversight mechanisms. A central challenge is detecting scheming, where agents covertly pursue misaligned goals. One approach to mitigating such risks is LLM-based...
Learning Nested Named Entity Recognition from Flat Annotations
arXiv:2603.00840v1 Announce Type: new Abstract: Nested named entity recognition identifies entities contained within other entities, but requires expensive multi-level annotation. While flat NER corpora exist abundantly, nested resources remain scarce. We investigate whether models can learn nested structure from flat...
KVSlimmer: Theoretical Insights and Practical Optimizations for Asymmetric KV Merging
arXiv:2603.00907v1 Announce Type: new Abstract: The growing computational and memory demands of the Key-Value (KV) cache significantly limit the ability of Large Language Models (LLMs). While KV merging has emerged as a promising solution, existing methods that rely on empirical...