CWoMP: Morpheme Representation Learning for Interlinear Glossing
arXiv:2603.18184v1 Announce Type: new Abstract: Interlinear glossed text (IGT) is a standard notation for language documentation which is linguistically rich but laborious to produce manually. Recent automated IGT methods treat glosses as character sequences, neglecting their compositional structure. We propose...
TopoChunker: Topology-Aware Agentic Document Chunking Framework
arXiv:2603.18409v1 Announce Type: new Abstract: Current document chunking methods for Retrieval-Augmented Generation (RAG) typically linearize text. This forced linearization strips away intrinsic topological hierarchies, creating "semantic fragmentation" that degrades downstream retrieval quality. In this paper, we propose TopoChunker, an agentic...
Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation
arXiv:2603.18428v1 Announce Type: new Abstract: Decoding strategies largely determine the quality of Large Language Model (LLM) outputs, yet widely used heuristics such as greedy or fixed temperature/top-p decoding are static and often task-agnostic, leading to suboptimal or inconsistent generation quality...
Mi:dm K 2.5 Pro
arXiv:2603.18788v1 Announce Type: new Abstract: The evolving LLM landscape requires capabilities beyond simple text generation, prioritizing multi-step reasoning, long-context understanding, and agentic workflows. This shift challenges existing models in enterprise environments, especially in Korean-language and domain-specific scenarios where scaling is...
Detecting Basic Values in A Noisy Russian Social Media Text Data: A Multi-Stage Classification Framework
arXiv:2603.18822v1 Announce Type: new Abstract: This study presents a multi-stage classification framework for detecting human values in noisy Russian language social media, validated on a random sample of 7.5 million public text posts. Drawing on Schwartz's theory of basic human...
RADIUS: Ranking, Distribution, and Significance - A Comprehensive Alignment Suite for Survey Simulation
arXiv:2603.19002v1 Announce Type: new Abstract: Simulation of surveys using LLMs is emerging as a powerful application for generating human-like responses at scale. Prior work evaluates survey simulation using metrics borrowed from other domains, which are often ad hoc, fragmented, and...
Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval
arXiv:2603.19008v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by grounding generation in external, non-parametric knowledge. However, when a task requires choosing among competing options, simply grounding generation in broadly relevant context is often insufficient to...
Frayed RoPE and Long Inputs: A Geometric Perspective
arXiv:2603.18017v1 Announce Type: new Abstract: Rotary Positional Embedding (RoPE) is a widely adopted technique for encoding position in language models, which, while effective, causes performance breakdown when input length exceeds training length. Prior analyses assert (rightly) that long inputs cause...
Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
arXiv:2603.18029v1 Announce Type: new Abstract: Transformers resist surgical control. Ablating an attention head identified as critical for capitalization produces minimal behavioral change because distributed redundancy compensates for damage. This Hydra effect renders interpretability illusory: we may identify components through correlation,...
Fundamental Limits of Neural Network Sparsification: Evidence from Catastrophic Interpretability Collapse
arXiv:2603.18056v1 Announce Type: new Abstract: Extreme neural network sparsification (90% activation reduction) presents a critical challenge for mechanistic interpretability: understanding whether interpretable features survive aggressive compression. This work investigates feature survival under severe capacity constraints in hybrid Variational Autoencoder--Sparse Autoencoder...
Probabilistic Federated Learning on Uncertain and Heterogeneous Data with Model Personalization
arXiv:2603.18083v1 Announce Type: new Abstract: Conventional federated learning (FL) frameworks often suffer from training degradation due to data uncertainty and heterogeneity across local clients. Probabilistic approaches such as Bayesian neural networks (BNNs) can mitigate this issue by explicitly modeling uncertainty,...
BoundAD: Boundary-Aware Negative Generation for Time Series Anomaly Detection
arXiv:2603.18111v1 Announce Type: new Abstract: Contrastive learning methods for time series anomaly detection (TSAD) heavily depend on the quality of negative sample construction. However, existing strategies based on random perturbations or pseudo-anomaly injection often struggle to simultaneously preserve temporal semantic...
Tula: Optimizing Time, Cost, and Generalization in Distributed Large-Batch Training
arXiv:2603.18112v1 Announce Type: new Abstract: Distributed training increases the number of batches processed per iteration either by scaling-out (adding more nodes) or scaling-up (increasing the batch-size). However, the largest configuration does not necessarily yield the best performance. Horizontal scaling introduces...
Gradient-Informed Temporal Sampling Improves Rollout Accuracy in PDE Surrogate Training
arXiv:2603.18237v1 Announce Type: new Abstract: Researchers train neural simulators on uniformly sampled numerical simulation data. But under the same budget, does systematically sampled data provide the most effective information? A fundamental yet unformalized problem is how to sample training data...
Path-Constrained Mixture-of-Experts
arXiv:2603.18297v1 Announce Type: new Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling by activating only a subset of parameters for each input. However, conventional MoE routing selects each layer's experts independently, creating N^L possible expert paths -- for N experts...
Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration
arXiv:2603.18326v1 Announce Type: new Abstract: While offline reinforcement learning provides reliable policies for real-world deployment, its inherent pessimism severely restricts an agent's ability to explore and collect novel data online. Drawing inspiration from safe reinforcement learning, exploring near the boundary...
RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach
arXiv:2603.18396v1 Announce Type: new Abstract: Bus holding control is challenging due to stochastic traffic and passenger demand. While deep reinforcement learning (DRL) shows promise, standard actor-critic algorithms suffer from Q-value instability in volatile environments. A key source of this instability...
Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration
arXiv:2603.18417v1 Announce Type: new Abstract: Sparse attention mechanisms promise to break the quadratic bottleneck of long-context transformers, yet production adoption remains limited by a critical usability gap: optimal hyperparameters vary substantially across layers and models, and current methods (e.g., SpargeAttn)...
Discounted Beta--Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards
arXiv:2603.18444v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as an effective post-training paradigm for improving the reasoning capabilities of large language models. However, existing group-based RLVR methods often suffer from severe sample inefficiency. This inefficiency...
Seeking Universal Shot Language Understanding Solutions
arXiv:2603.18448v1 Announce Type: new Abstract: Shot language understanding (SLU) is crucial for cinematic analysis but remains challenging due to its diverse cinematographic dimensions and subjective expert judgment. While vision-language models (VLMs) have shown strong ability in general visual understanding, recent...
AIMER: Calibration-Free Task-Agnostic MoE Pruning
arXiv:2603.18492v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) language models increase parameter capacity without proportional per-token compute, but the deployment still requires storing all experts, making expert pruning important for reducing memory and serving overhead. Existing task-agnostic expert pruning methods are...
Balancing the Reasoning Load: Difficulty-Differentiated Policy Optimization with Length Redistribution for Efficient and Robust Reinforcement Learning
arXiv:2603.18533v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) have shown exceptional reasoning capabilities, but they also suffer from the issue of overthinking, often generating excessively long and redundant answers. For problems that exceed the model's capabilities, LRMs tend to...
Data-efficient pre-training by scaling synthetic megadocs
arXiv:2603.18534v1 Announce Type: new Abstract: Synthetic data augmentation has emerged as a promising solution when pre-training is constrained by data rather than compute. We study how to design synthetic data algorithms that achieve better loss scaling: not only lowering loss...
Birthright citizenship: why the text, history, and structure of a landmark 1952 statute doom Trump’s executive order
Brothers in Law is a recurring series by brothers Akhil and Vikram Amar, with special emphasis on measuring what the Supreme Court says against what the Constitution itself says. For more content from […]
Justices to consider the rights of asylum seekers at the U.S.-Mexico border
The Supreme Court will hear oral arguments next week in a challenge to the government’s policy of systematically turning back asylum seekers before they can reach the U.S. border with […]
Uninjured class members, hindsight harmlessness, presidential cronies, and the mistaken use of deadly force
The Relist Watch column examines cert petitions that the Supreme Court has “relisted” for its upcoming conference. A short explanation of relists is available here. There are 261 petitions and applications […]
Volume 2026, No. 1 – Wisconsin Law Review – UW–Madison
Contract Law and Civil Justice in Local Courts by Cathy Hwang & Justin Weinstein-Tull; Preempting Drug Price Reform by Shweta Kumar; Lessons Learned? COVID’s Continued Impact on Remote Work Disability Accommodations by D’Andra Millsap Shu; Unbundling AI Openness by Parth...
Catching Pokémon, Not Tax Bills
Introduction: What if we told you that you could play a unique and magical game for free? What if we told you this game would let you chase fantastical creatures across your neighborhood, turning your daily stroll into an epic...
Federated Multi Agent Deep Learning and Neural Networks for Advanced Distributed Sensing in Wireless Networks
arXiv:2603.16881v1 Announce Type: new Abstract: Multi-agent deep learning (MADL), including multi-agent deep reinforcement learning (MADRL), distributed/federated training, and graph-structured neural networks, is becoming a unifying framework for decision-making and inference in wireless systems where sensing, communication, and computing are tightly...
HoloByte: Continuous Hyperspherical Distillation for Tokenizer-Free Modeling
arXiv:2603.16917v1 Announce Type: new Abstract: Sequence modeling universally relies on discrete subword tokenization to circumvent the $\mathcal{O}(N^2)$ computational intractability of native byte-level attention. However, this heuristic quantization imposes artificial morphological boundaries, enforces vocabulary dependence, and fractures the continuity of the...