One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
arXiv:2603.04411v1 Announce Type: new Abstract: Despite the remarkable progress of Large Language Models (LLMs), the escalating memory footprint of the Key-Value (KV) cache remains a critical bottleneck for efficient inference. While dimensionality reduction offers a promising compression avenue, existing approaches...
Optimizing Language Models for Crosslingual Knowledge Consistency
arXiv:2603.04678v1 Announce Type: new Abstract: Large language models are known to often exhibit inconsistent knowledge. This is particularly problematic in multilingual scenarios, where models are likely to be asked similar questions in different languages, and inconsistent responses can undermine their...
Non-Zipfian Distribution of Stopwords and Subset Selection Models
arXiv:2603.04691v1 Announce Type: new Abstract: Stopwords are words that are not very informative to the content or the meaning of a language text. Most stopwords are function words but can also be common verbs, adjectives and adverbs. In contrast to...
Stacked from One: Multi-Scale Self-Injection for Context Window Extension
arXiv:2603.04759v1 Announce Type: new Abstract: The limited context window of contemporary large language models (LLMs) remains a primary bottleneck for their broader application across diverse domains. Although continual pre-training on long-context data offers a straightforward solution, it incurs prohibitive data...
AILS-NTUA at SemEval-2026 Task 3: Efficient Dimensional Aspect-Based Sentiment Analysis
arXiv:2603.04933v1 Announce Type: new Abstract: In this paper, we present AILS-NTUA system for Track-A of SemEval-2026 Task 3 on Dimensional Aspect-Based Sentiment Analysis (DimABSA), which encompasses three complementary problems: Dimensional Aspect Sentiment Regression (DimASR), Dimensional Aspect Sentiment Triplet Extraction (DimASTE),...
VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling
arXiv:2603.04460v1 Announce Type: new Abstract: The quadratic complexity of self-attention during the prefill phase impedes long-context inference in large language models. Existing sparse attention methods face a trade-off among context adaptivity, sampling overhead, and fine-tuning costs. We propose VSPrefill, a...
Understanding the Dynamics of Demonstration Conflict in In-Context Learning
arXiv:2603.04464v1 Announce Type: new Abstract: In-context learning enables large language models to perform novel tasks through few-shot demonstrations. However, demonstrations per se can naturally contain noise and conflicting examples, making this capability vulnerable. To understand how models process such conflicts,...
Oracle-efficient Hybrid Learning with Constrained Adversaries
arXiv:2603.04546v1 Announce Type: new Abstract: The Hybrid Online Learning Problem, where features are drawn i.i.d. from an unknown distribution but labels are generated adversarially, is a well-motivated setting positioned between statistical and fully-adversarial online learning. Prior work has presented a...
Distribution-Conditioned Transport
arXiv:2603.04736v1 Announce Type: new Abstract: Learning a transport model that maps a source distribution to a target distribution is a canonical problem in machine learning, but scientific applications increasingly require models that can generalize to source and target distributions unseen...
Distributional Reinforcement Learning with Information Bottleneck for Uncertainty-Aware DRAM Equalization
arXiv:2603.04768v1 Announce Type: new Abstract: Equalizer parameter optimization is critical for signal integrity in high-speed memory systems operating at multi-gigabit data rates. However, existing methods suffer from computationally expensive eye diagram evaluation, optimization of expected rather than worst-case performance, and...
Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning
arXiv:2603.04780v1 Announce Type: new Abstract: Causal discovery with latent variables is a fundamental task. Yet most existing methods rely on strong structural assumptions, such as enforcing specific indicator patterns for latents or restricting how they can interact with others. We...
Immigration Enforcement and Constraints on Information Commandeering
The debate over American immigration policy reflects deep moral divides over the meaning of American identity and the scope of fundamental individual rights like due process and the freedom of movement. Although the modern American immigration system no longer includes...
Trump gets data center companies to pledge to pay for power generation
With no enforcement and questionable economics, it may not make a difference.
From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG
arXiv:2603.03292v1 Announce Type: cross Abstract: Large Language Models (LLMs) exhibit high reasoning capacity in medical question-answering, but their tendency to produce hallucinations and outdated knowledge poses critical risks in healthcare fields. While Retrieval-Augmented Generation (RAG) mitigates these issues, existing methods...
Draft-Conditioned Constrained Decoding for Structured Generation in LLMs
arXiv:2603.03305v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used to generate executable outputs, JSON objects, and API calls, where a single syntax error can make the output unusable. Constrained decoding enforces validity token-by-token via masking and renormalization,...
Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation
arXiv:2603.03306v1 Announce Type: cross Abstract: Recently presented Token-Oriented Object Notation (TOON) aims to replace JSON as a serialization format for passing structured data to LLMs with significantly reduced token usage. While showing solid accuracy in LLM comprehension, there is a...
How does fine-tuning improve sensorimotor representations in large language models?
arXiv:2603.03313v1 Announce Type: cross Abstract: Large Language Models (LLMs) exhibit a significant "embodiment gap", where their text-based representations fail to align with human sensorimotor experiences. This study systematically investigates whether and how task-specific fine-tuning can bridge this gap. Utilizing Representational...
Automated Concept Discovery for LLM-as-a-Judge Preference Analysis
arXiv:2603.03319v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used as scalable evaluators of model outputs, but their preference judgments exhibit systematic biases and can diverge from human evaluations. Prior work on LLM-as-a-judge has largely focused on a...
StructLens: A Structural Lens for Language Models via Maximum Spanning Trees
arXiv:2603.03328v1 Announce Type: new Abstract: Language exhibits inherent structures, a property that explains both language acquisition and language change. Given this characteristic, we expect language models to manifest internal structures as well. While interpretability research has investigated the components of...
AutoHarness: improving LLM agents by automatically synthesizing a code harness
arXiv:2603.03329v1 Announce Type: new Abstract: Despite significant strides in language models in the last few years, when used as agents, such models often try to perform actions that are not just suboptimal for a given state, but are strictly prohibited...
Compressed Sensing for Capability Localization in Large Language Models
arXiv:2603.03335v1 Announce Type: new Abstract: Large language models (LLMs) exhibit a wide range of capabilities, including mathematical reasoning, code generation, and linguistic behaviors. We show that many capabilities are highly localized to small subsets of attention heads within Transformer architectures....
When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning
arXiv:2603.03475v1 Announce Type: new Abstract: Mathematical reasoning models are widely deployed in education, automated tutoring, and decision support systems despite exhibiting fundamental computational instabilities. We demonstrate that state-of-the-art models (Qwen2.5-Math-7B) achieve 61% accuracy through a mixture of reliable and unreliable...
Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
arXiv:2603.03511v1 Announce Type: new Abstract: We aim to learn wavefunctions simulated by time-dependent density functional theory (TDDFT), which can be efficiently represented as linear combination coefficients of atomic orbitals. In real-time TDDFT, the electronic wavefunctions of a molecule evolve over...
NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training
arXiv:2603.03597v1 Announce Type: new Abstract: The rapid progress of large language models (LLMs) is increasingly constrained by memory and deployment costs, motivating compression methods for practical deployment. Many state-of-the-art compression pipelines leverage the low-rank structure of trained weight matrices, a...
JANUS: Structured Bidirectional Generation for Guaranteed Constraints and Analytical Uncertainty
arXiv:2603.03748v1 Announce Type: new Abstract: High-stakes synthetic data generation faces a fundamental Quadrilemma: achieving Fidelity to the original distribution, Control over complex logical constraints, Reliability in uncertainty estimation, and Efficiency in computational cost -- simultaneously. State-of-the-art Deep Generative Models (CTGAN,...
MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
arXiv:2603.03756v1 Announce Type: new Abstract: While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training...
LEA: Label Enumeration Attack in Vertical Federated Learning
arXiv:2603.03777v1 Announce Type: new Abstract: A typical Vertical Federated Learning (VFL) scenario involves several participants collaboratively training a machine learning model, where each party has different features for the same samples, with labels held exclusively by one party. Since labels...
The Distribution of Phoneme Frequencies across the World's Languages: Macroscopic and Microscopic Information-Theoretic Models
arXiv:2603.02860v1 Announce Type: new Abstract: We demonstrate that the frequency distribution of phonemes across languages can be explained at both macroscopic and microscopic levels. Macroscopically, phoneme rank-frequency distributions closely follow the order statistics of a symmetric Dirichlet distribution whose single...
Eval4Sim: An Evaluation Framework for Persona Simulation
arXiv:2603.02876v1 Announce Type: new Abstract: Large Language Model (LLM) personas with explicit specifications of attributes, background, and behavioural tendencies are increasingly used to simulate human conversations for tasks such as user modeling, social reasoning, and behavioural analysis. Ensuring that persona-grounded...