Arbitration

LOW Academic International

Compressed Sensing for Capability Localization in Large Language Models

arXiv:2603.03335v1 Announce Type: new Abstract: Large language models (LLMs) exhibit a wide range of capabilities, including mathematical reasoning, code generation, and linguistic behaviors. We show that many capabilities are highly localized to small subsets of attention heads within Transformer architectures....

1 min 1 month, 1 week ago

bit

LOW Academic International

When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning

arXiv:2603.03475v1 Announce Type: new Abstract: Mathematical reasoning models are widely deployed in education, automated tutoring, and decision support systems despite exhibiting fundamental computational instabilities. We demonstrate that state-of-the-art models (Qwen2.5-Math-7B) achieve 61% accuracy through a mixture of reliable and unreliable...

1 min 1 month, 1 week ago

bit

LOW Academic European Union

Solving adversarial examples requires solving exponential misalignment

arXiv:2603.03507v1 Announce Type: new Abstract: Adversarial attacks - input perturbations imperceptible to humans that fool neural networks - remain both a persistent failure mode in machine learning, and a phenomenon with mysterious origins. To shed light, we define and analyze...

1 min 1 month, 1 week ago

bit

LOW Academic International

Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory

arXiv:2603.03511v1 Announce Type: new Abstract: We aim to learn wavefunctions simulated by time-dependent density functional theory (TDDFT), which can be efficiently represented as linear combination coefficients of atomic orbitals. In real-time TDDFT, the electronic wavefunctions of a molecule evolve over...

1 min 1 month, 1 week ago

bit

LOW Academic International

NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training

arXiv:2603.03597v1 Announce Type: new Abstract: The rapid progress of large language models (LLMs) is increasingly constrained by memory and deployment costs, motivating compression methods for practical deployment. Many state-of-the-art compression pipelines leverage the low-rank structure of trained weight matrices, a...

1 min 1 month, 1 week ago

bit

LOW Academic International

JANUS: Structured Bidirectional Generation for Guaranteed Constraints and Analytical Uncertainty

arXiv:2603.03748v1 Announce Type: new Abstract: High-stakes synthetic data generation faces a fundamental Quadrilemma: achieving Fidelity to the original distribution, Control over complex logical constraints, Reliability in uncertainty estimation, and Efficiency in computational cost -- simultaneously. State-of-the-art Deep Generative Models (CTGAN,...

1 min 1 month, 1 week ago

adr

LOW Academic International

MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

arXiv:2603.03756v1 Announce Type: new Abstract: While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training...

1 min 1 month, 1 week ago

bit

LOW Academic International

LEA: Label Enumeration Attack in Vertical Federated Learning

arXiv:2603.03777v1 Announce Type: new Abstract: A typical Vertical Federated Learning (VFL) scenario involves several participants collaboratively training a machine learning model, where each party has different features for the same samples, with labels held exclusively by one party. Since labels...

1 min 1 month, 1 week ago

bit

LOW Academic European Union

Large-Margin Hyperdimensional Computing: A Learning-Theoretical Perspective

arXiv:2603.03830v1 Announce Type: new Abstract: Overparameterized machine learning (ML) methods such as neural networks may be prohibitively resource intensive for devices with limited computational capabilities. Hyperdimensional computing (HDC) is an emerging resource efficient and low-complexity ML method that allows hardware...

1 min 1 month, 1 week ago

bit

LOW Academic United States

Believe Your Model: Distribution-Guided Confidence Calibration

arXiv:2603.03872v1 Announce Type: new Abstract: Large Reasoning Models have demonstrated remarkable performance with the advancement of test-time scaling techniques, which enhances prediction accuracy by generating multiple candidate responses and selecting the most reliable answer. While prior work has analyzed that...

1 min 1 month, 1 week ago

bit

LOW Academic International

The Distribution of Phoneme Frequencies across the World's Languages: Macroscopic and Microscopic Information-Theoretic Models

arXiv:2603.02860v1 Announce Type: new Abstract: We demonstrate that the frequency distribution of phonemes across languages can be explained at both macroscopic and microscopic levels. Macroscopically, phoneme rank-frequency distributions closely follow the order statistics of a symmetric Dirichlet distribution whose single...

1 min 1 month, 2 weeks ago

bit

LOW Academic International

Eval4Sim: An Evaluation Framework for Persona Simulation

arXiv:2603.02876v1 Announce Type: new Abstract: Large Language Model (LLM) personas with explicit specifications of attributes, background, and behavioural tendencies are increasingly used to simulate human conversations for tasks such as user modeling, social reasoning, and behavioural analysis. Ensuring that persona-grounded...

1 min 1 month, 2 weeks ago

bit

LOW Academic International

MaBERT:A Padding Safe Interleaved Transformer Mamba Hybrid Encoder for Efficient Extended Context Masked Language Modeling

arXiv:2603.03001v1 Announce Type: new Abstract: Self attention encoders such as Bidirectional Encoder Representations from Transformers(BERT) scale quadratically with sequence length, making long context modeling expensive. Linear time state space models, such as Mamba, are efficient; however, they show limitations in...

1 min 1 month, 2 weeks ago

adr

LOW Academic International

Routing Absorption in Sparse Attention: Why Random Gates Are Hard to Beat

arXiv:2603.02227v1 Announce Type: cross Abstract: Can a transformer learn which attention entries matter during training? In principle, yes: attention distributions are highly concentrated, and a small gate network can identify the important entries post-hoc with near-perfect accuracy. In practice, barely....

1 min 1 month, 2 weeks ago

bit

LOW Academic European Union

Neural Paging: Learning Context Management Policies for Turing-Complete Agents

arXiv:2603.02228v1 Announce Type: new Abstract: The proof that Large Language Models (LLMs) augmented with external read-write memory constitute a computationally universal system has established the theoretical foundation for general-purpose agents. However, existing implementations face a critical bottleneck: the finite and...

1 min 1 month, 2 weeks ago

adr

LOW Academic European Union

Physics-Informed Neural Networks with Architectural Physics Embedding for Large-Scale Wave Field Reconstruction

arXiv:2603.02231v1 Announce Type: new Abstract: Large-scale wave field reconstruction requires precise solutions but faces challenges with computational efficiency and accuracy. The physics-based numerical methods like Finite Element Method (FEM) provide high accuracy but struggle with large-scale or high-frequency problems due...

1 min 1 month, 2 weeks ago

bit

LOW Academic International

Concept Heterogeneity-aware Representation Steering

arXiv:2603.02237v1 Announce Type: new Abstract: Representation steering offers a lightweight mechanism for controlling the behavior of large language models (LLMs) by intervening on internal activations at inference time. Most existing methods rely on a single global steering direction, typically obtained...

1 min 1 month, 2 weeks ago

bit

LOW Academic International

A Comparative Study of UMAP and Other Dimensionality Reduction Methods

arXiv:2603.02275v1 Announce Type: new Abstract: Uniform Manifold Approximation and Projection (UMAP) is a widely used manifold learning technique for dimensionality reduction. This paper studies UMAP, supervised UMAP, and several competing dimensionality reduction methods, including Principal Component Analysis (PCA), Kernel PCA,...

1 min 1 month, 2 weeks ago

bit

LOW Academic United States

Thermodynamic Regulation of Finite-Time Gibbs Training in Energy-Based Models: A Restricted Boltzmann Machine Study

arXiv:2603.02525v1 Announce Type: new Abstract: Restricted Boltzmann Machines (RBMs) are typically trained using finite-length Gibbs chains under a fixed sampling temperature. This practice implicitly assumes that the stochastic regime remains valid as the energy landscape evolves during learning. We argue...

1 min 1 month, 2 weeks ago

bit

LOW Conference International

CVPR 2026 Media Center

1 min 1 month, 2 weeks ago

bit

LOW Conference International

CVPR 2026 News and Resources for Press

1 min 1 month, 2 weeks ago

bit

LOW Academic United States

How Large Language Models Get Stuck: Early structure with persistent errors

arXiv:2603.00359v1 Announce Type: new Abstract: Linguistic insights may help make Large Language Model (LLM) training more efficient. We trained Meta's OPT model on the 100M word BabyLM dataset, and evaluated it on the BLiMP benchmark, which consists of 67 classes,...

1 min 1 month, 2 weeks ago

bit

LOW Academic International

CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles

arXiv:2603.00523v1 Announce Type: new Abstract: Mechanistic circuit discovery is notoriously sensitive to arbitrary analyst choices, especially pruning thresholds and feature dictionaries, often yielding brittle "one-shot" explanations with no principled notion of uncertainty. We reframe circuit discovery as an uncertainty-quantification problem...

1 min 1 month, 2 weeks ago

bit

LOW Academic International

Polynomial Mixing for Efficient Self-supervised Speech Encoders

arXiv:2603.00683v1 Announce Type: new Abstract: State-of-the-art speech-to-text models typically employ Transformer-based encoders that model token dependencies via self-attention mechanisms. However, the quadratic complexity of self-attention in both memory and computation imposes significant constraints on scalability. In this work, we propose...

1 min 1 month, 2 weeks ago

adr

LOW Academic International

RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models

arXiv:2603.00724v1 Announce Type: new Abstract: Large language model alignment via reinforcement learning depends critically on reward function quality. However, static, domain-specific reward models are often costly to train and exhibit poor generalization in out-of-distribution scenarios encountered during RL iterations. We...

1 min 1 month, 2 weeks ago

bit

LOW Academic International

MedGPT-oss: Training a General-Purpose Vision-Language Model for Biomedicine

arXiv:2603.00842v1 Announce Type: new Abstract: Biomedical multimodal assistants have the potential to unify radiology, pathology, and clinical-text reasoning, yet a critical deployment gap remains: top-performing systems are either closed-source or computationally prohibitive, precluding the on-premises deployment required for patient privacy...

1 min 1 month, 2 weeks ago

bit

LOW Academic International

CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning

arXiv:2603.00889v1 Announce Type: new Abstract: Large Language Models (LLMs) have recently exhibited remarkable reasoning capabilities, largely enabled by supervised fine-tuning (SFT)- and reinforcement learning (RL)-based post-training on high-quality reasoning data. However, reproducing and extending these capabilities in open and scalable...

1 min 1 month, 2 weeks ago

bit

LOW Academic International

Prompt Sensitivity and Answer Consistency of Small Open-Source Large Language Models on Clinical Question Answering: Implications for Low-Resource Healthcare Deployment

arXiv:2603.00917v1 Announce Type: new Abstract: Small open-source language models are gaining attention for low-resource healthcare settings, but their reliability under different prompt phrasings remains poorly understood. We evaluated five open-source models (Gemma 2 2B, Phi-3 Mini 3.8B, Llama 3.2 3B,...

1 min 1 month, 2 weeks ago

bit

LOW Academic International

Thoth: Mid-Training Bridges LLMs to Time Series Understanding

arXiv:2603.01042v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable success in general-purpose reasoning. However, they still struggle to understand and reason about time series data, which limits their effectiveness in decision-making scenarios that depend on temporal dynamics....

1 min 1 month, 2 weeks ago

bit

LOW Academic United States

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

arXiv:2603.00039v1 Announce Type: new Abstract: LLM-as-a-judge ensembles are the standard paradigm for scalable evaluation, but their aggregation mechanisms suffer from a fundamental flaw: they implicitly assume that judges provide independent estimates of true quality. However, in practice, LLM judges exhibit...

1 min 1 month, 2 weeks ago

bit

Compressed Sensing for Capability Localization in Large Language Models

When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning

Solving adversarial examples requires solving exponential misalignment

Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory

NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training

JANUS: Structured Bidirectional Generation for Guaranteed Constraints and Analytical Uncertainty

MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

LEA: Label Enumeration Attack in Vertical Federated Learning

Large-Margin Hyperdimensional Computing: A Learning-Theoretical Perspective

Believe Your Model: Distribution-Guided Confidence Calibration

The Distribution of Phoneme Frequencies across the World's Languages: Macroscopic and Microscopic Information-Theoretic Models

Eval4Sim: An Evaluation Framework for Persona Simulation

MaBERT:A Padding Safe Interleaved Transformer Mamba Hybrid Encoder for Efficient Extended Context Masked Language Modeling

Routing Absorption in Sparse Attention: Why Random Gates Are Hard to Beat

Neural Paging: Learning Context Management Policies for Turing-Complete Agents

Physics-Informed Neural Networks with Architectural Physics Embedding for Large-Scale Wave Field Reconstruction

Concept Heterogeneity-aware Representation Steering

A Comparative Study of UMAP and Other Dimensionality Reduction Methods

Thermodynamic Regulation of Finite-Time Gibbs Training in Energy-Based Models: A Restricted Boltzmann Machine Study

CVPR 2026 Media Center

CVPR 2026 News and Resources for Press

How Large Language Models Get Stuck: Early structure with persistent errors

CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles

Polynomial Mixing for Efficient Self-supervised Speech Encoders

RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models

MedGPT-oss: Training a General-Purpose Vision-Language Model for Biomedicine

CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning

Prompt Sensitivity and Answer Consistency of Small Open-Source Large Language Models on Clinical Question Answering: Implications for Low-Resource Healthcare Deployment

Thoth: Mid-Training Bridges LLMs to Time Series Understanding

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

Impact Distribution

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.