Semi-Synthetic Parallel Data for Translation Quality Estimation: A Case Study of Dataset Building for an Under-Resourced Language Pair
arXiv:2603.11743v1 Announce Type: new Abstract: Quality estimation (QE) plays a crucial role in machine translation (MT) workflows, as it serves to evaluate generated outputs that have no reference translations and to determine whether human post-editing or full retranslation is necessary....
Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language
arXiv:2603.11881v1 Announce Type: new Abstract: This report details the creation of Bielik-Minitron-7B, a compressed 7.35B parameter version of the Bielik-11B-v3.0 model, specifically optimized for European languages. By leveraging a two-stage compression methodology inspired by the NVIDIA Minitron approach, we combined...
Structure-Aware Epistemic Uncertainty Quantification for Neural Operator PDE Surrogates
arXiv:2603.11052v1 Announce Type: new Abstract: Neural operators (NOs) provide fast, resolution-invariant surrogates for mapping input fields to PDE solution fields, but their predictions can exhibit significant epistemic uncertainty due to finite data, imperfect optimization, and distribution shift. For practical deployment...
Graph Tokenization for Bridging Graphs and Transformers
arXiv:2603.11099v1 Announce Type: new Abstract: The success of large pretrained Transformers is closely tied to tokenizers, which convert raw input into discrete symbols. Extending these models to graph-structured data remains a significant challenge. In this work, we introduce a graph...
Beyond Barren Plateaus: A Scalable Quantum Convolutional Architecture for High-Fidelity Image Classification
arXiv:2603.11131v1 Announce Type: new Abstract: While Quantum Convolutional Neural Networks (QCNNs) offer a theoretical paradigm for quantum machine learning, their practical implementation is severely bottlenecked by barren plateaus (the exponential vanishing of gradients) and poor empirical accuracy compared...
Algorithmic Capture, Computational Complexity, and Inductive Bias of Infinite Transformers
arXiv:2603.11161v1 Announce Type: new Abstract: We formally define Algorithmic Capture (i.e., "grokking" an algorithm) as the ability of a neural network to generalize to arbitrary problem sizes ($T$) with controllable error and minimal sample adaptation, distinguishing true algorithmic learning from...
Reference-Guided Machine Unlearning
arXiv:2603.11210v1 Announce Type: new Abstract: Machine unlearning aims to remove the influence of specific data from trained models while preserving general utility. Existing approximate unlearning methods often rely on performance-degradation heuristics, such as loss maximization or random labeling. However, these...
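The loss-maximization heuristic cited above is easy to state concretely: take gradient ascent steps on the forget set while keeping ordinary descent on retained data. The sketch below is a minimal illustration of that existing heuristic, not of this paper's reference-guided method; the function and the weighting term `alpha` are hypothetical.

```python
# Illustrative sketch of the loss-maximization unlearning heuristic:
# gradient ascent on the forget set, ordinary descent on retained data.
import torch

def unlearning_step(model, optimizer, forget_batch, retain_batch, loss_fn, alpha=1.0):
    xf, yf = forget_batch
    xr, yr = retain_batch
    forget_loss = loss_fn(model(xf), yf)        # loss on data to forget: pushed up
    retain_loss = loss_fn(model(xr), yr)        # loss on retained data: kept low
    total = retain_loss - alpha * forget_loss   # ascent on forget set, descent on retain set
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```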
Differentiable Thermodynamic Phase-Equilibria for Machine Learning
arXiv:2603.11249v1 Announce Type: new Abstract: Accurate prediction of phase equilibria remains a central challenge in chemical engineering. Physics-consistent machine learning methods that incorporate thermodynamic structure into neural networks have recently shown strong performance for activity-coefficient modeling. However, extending such approaches...
ARROW: Augmented Replay for RObust World models
arXiv:2603.11395v1 Announce Type: new Abstract: Continual reinforcement learning challenges agents to acquire new skills while retaining previously learned ones with the goal of improving performance in both past and future tasks. Most existing approaches rely on model-free methods with replay...
Harnessing Data Asymmetry: Manifold Learning in the Finsler World
arXiv:2603.11396v1 Announce Type: new Abstract: Manifold learning is a fundamental task at the core of data analysis and visualisation. It aims to capture the simple underlying structure of complex high-dimensional data by preserving pairwise dissimilarities in low-dimensional embeddings. Traditional methods...
UniHetCO: A Unified Heterogeneous Representation for Multi-Problem Learning in Unsupervised Neural Combinatorial Optimization
arXiv:2603.11456v1 Announce Type: new Abstract: Unsupervised neural combinatorial optimization (NCO) offers an appealing alternative to supervised approaches by training learning-based solvers without ground-truth solutions, directly minimizing instance objectives and constraint violations. Yet for graph node subset-selection problems (e.g., Maximum Clique...
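To make "directly minimizing instance objectives and constraint violations" concrete, the sketch below shows a generic unsupervised penalty loss for one of the node subset-selection problems mentioned (Maximum Clique): selection probabilities from a neural solver are rewarded for clique size and penalized for selecting non-adjacent pairs. The function name and penalty weight `beta` are illustrative assumptions, not details from the paper.

```python
# Generic unsupervised loss for a node subset-selection problem (Maximum Clique):
# reward the relaxed objective, penalize constraint violations.
import torch

def max_clique_unsup_loss(probs, adj, beta=2.0):
    """probs: (n,) selection probabilities in [0,1]; adj: (n, n) 0/1 float adjacency."""
    n = probs.shape[0]
    objective = probs.sum()                             # relaxed clique size to maximize
    non_edges = (1.0 - adj) * (1.0 - torch.eye(n))      # node pairs that must not co-occur
    violation = 0.5 * (probs.unsqueeze(0) * probs.unsqueeze(1) * non_edges).sum()
    return -objective + beta * violation                 # loss to minimize
```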
Slack More, Predict Better: Proximal Relaxation for Probabilistic Latent Variable Model-based Soft Sensors
arXiv:2603.11473v1 Announce Type: new Abstract: Nonlinear Probabilistic Latent Variable Models (NPLVMs) are a cornerstone of soft sensor modeling due to their capacity for uncertainty delineation. However, conventional NPLVMs are trained using amortized variational inference, where neural networks parameterize the variational...
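For readers unfamiliar with the training setup described above, the sketch below illustrates amortized variational inference in its most common form: an encoder network outputs the parameters of a Gaussian variational posterior, sampled with the reparameterization trick. This is a generic baseline, not the paper's proximal-relaxation scheme.

```python
# Minimal amortized variational inference sketch: a neural encoder
# parameterizes a Gaussian q(z|x); standard VAE-style KL term.
import torch
import torch.nn as nn

class AmortizedEncoder(nn.Module):
    def __init__(self, x_dim, z_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)

    def forward(self, x):
        h = self.net(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)
        return z, kl
```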
Grammar of the Wave: Towards Explainable Multivariate Time Series Event Detection via Neuro-Symbolic VLM Agents
arXiv:2603.11479v1 Announce Type: new Abstract: Time Series Event Detection (TSED) has long been an important task with critical applications across many high-stakes domains. Unlike statistical anomalies, events are defined by semantics with complex internal structures, which are difficult to learn...
PoultryLeX-Net: Domain-Adaptive Dual-Stream Transformer Architecture for Large-Scale Poultry Stakeholder Modeling
arXiv:2603.09991v1 Announce Type: cross Abstract: The rapid growth of the global poultry industry, driven by rising demand for affordable animal protein, has intensified public discourse surrounding production practices, housing, management, animal welfare, and supply-chain transparency. Social media platforms such as...
Defining AI Models and AI Systems: A Framework to Resolve the Boundary Problem
arXiv:2603.10023v1 Announce Type: cross Abstract: Emerging AI regulations assign distinct obligations to different actors along the AI value chain (e.g., the EU AI Act distinguishes providers and deployers for both AI models and AI systems), yet the foundational terms "AI...
Fine-Tune, Don't Prompt, Your Language Model to Identify Biased Language in Clinical Notes
arXiv:2603.10004v1 Announce Type: new Abstract: Clinical documentation can contain emotionally charged language with stigmatizing or privileging valences. We present a framework for detecting and classifying such language as stigmatizing, privileging, or neutral. We constructed a curated lexicon of biased terms...
The Prediction-Measurement Gap: Toward Meaning Representations as Scientific Instruments
arXiv:2603.10130v1 Announce Type: new Abstract: Text embeddings have become central to computational social science and psychology, enabling scalable measurement of meaning and mixed-method inference. Yet most representation learning is optimized and evaluated for prediction and retrieval, yielding a prediction-measurement gap:...
Reason and Verify: A Framework for Faithful Retrieval-Augmented Generation
arXiv:2603.10143v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) significantly improves the factuality of Large Language Models (LLMs), yet standard pipelines often lack mechanisms to verify intermediate reasoning, leaving them vulnerable to hallucinations in high-stakes domains. To address this, we...
Lost in Backpropagation: The LM Head is a Gradient Bottleneck
arXiv:2603.10145v1 Announce Type: new Abstract: The last layer of neural language models (LMs) projects output features of dimension $D$ to logits in dimension $V$, the size of the vocabulary, where usually $D \ll V$. This mismatch is known to raise...
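The layer in question is a single projection from feature dimension $D$ to vocabulary size $V$; the sketch below shows its shapes and where the backward pass squeezes the gradient back through dimension $D$. The specific values of $D$ and $V$ here are illustrative, not taken from the paper.

```python
# Minimal sketch of the LM head described above: features of dimension D
# are projected to V logits with D << V; the gradient on the features
# flows back through this single D x V matrix.
import torch
import torch.nn as nn

D, V = 768, 50_000                                   # illustrative dimensions
lm_head = nn.Linear(D, V, bias=False)

features = torch.randn(4, D, requires_grad=True)     # (batch, D)
logits = lm_head(features)                           # (batch, V)
loss = nn.functional.cross_entropy(logits, torch.randint(V, (4,)))
loss.backward()
print(features.grad.shape)                           # gradient squeezed back into dimension D
```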
Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models
arXiv:2603.10195v1 Announce Type: new Abstract: Large Language Models frequently generate fluent but factually incorrect text. We propose Adaptive Activation Cancellation (AAC), a real-time inference-time framework that treats hallucination-associated neural activations as structured interference within the transformer residual stream, drawing an...
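The excerpt frames hallucination-associated activations as interference to be cancelled in the residual stream. One plausible reading of "cancellation", sketched below, is removing the component of a hidden state along an estimated hallucination direction; this is a generic direction-removal illustration, not necessarily the AAC procedure, and all names are hypothetical.

```python
# Generic direction-removal sketch: project an estimated hallucination-
# associated direction out of a residual-stream vector.
import numpy as np

def cancel_direction(hidden, direction, strength=1.0):
    """hidden: (d,) residual-stream vector; direction: (d,) estimated pattern."""
    d = direction / np.linalg.norm(direction)
    return hidden - strength * np.dot(hidden, d) * d

h = np.random.randn(512)
bad = np.random.randn(512)
h_clean = cancel_direction(h, bad)
print(np.dot(h_clean, bad / np.linalg.norm(bad)))   # ~0 after full cancellation
```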
LWM-Temporal: Sparse Spatio-Temporal Attention for Wireless Channel Representation Learning
arXiv:2603.10024v1 Announce Type: new Abstract: LWM-Temporal is a new member of the Large Wireless Models (LWM) family that targets the spatiotemporal nature of wireless channels. Designed as a task-agnostic foundation model, LWM-Temporal learns universal channel embeddings that capture mobility-induced evolution...
Cluster-Aware Attention-Based Deep Reinforcement Learning for Pickup and Delivery Problems
arXiv:2603.10053v1 Announce Type: new Abstract: The Pickup and Delivery Problem (PDP) is a fundamental and challenging variant of the Vehicle Routing Problem, characterized by tightly coupled pickup-delivery pairs, precedence constraints, and spatial layouts that often exhibit clustering. Existing deep reinforcement...
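The precedence constraints mentioned above (each pickup before its delivery) are typically enforced in attention-based construction solvers by masking infeasible actions at each decoding step. The sketch below shows such a feasibility mask; it is a generic illustration, not this paper's cluster-aware mechanism.

```python
# Generic feasibility mask for pickup-delivery precedence: a delivery
# becomes selectable only after its paired pickup has been visited.
import numpy as np

def action_mask(visited, pickup_of):
    """visited: (n,) bool; pickup_of[j] = index of j's pickup, or -1 if j is itself a pickup."""
    n = len(visited)
    mask = ~visited                                      # cannot revisit nodes
    for j in range(n):
        if pickup_of[j] >= 0 and not visited[pickup_of[j]]:
            mask[j] = False                              # delivery blocked until its pickup is done
    return mask

visited = np.array([True, False, False, False])          # node 0 (a pickup) already visited
pickup_of = np.array([-1, -1, 0, 1])                     # nodes 2, 3 are deliveries of 0, 1
print(action_mask(visited, pickup_of))                   # [False  True  True False]
```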
KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
arXiv:2603.10085v1 Announce Type: new Abstract: Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines typically rely on opaque,...
A Survey of Weight Space Learning: Understanding, Representation, and Generation
arXiv:2603.10090v1 Announce Type: new Abstract: Neural network weights are typically viewed as the end product of training, while most deep learning research focuses on data, features, and architectures. However, recent advances show that the set of all possible weight values...
A neural operator for predicting vibration frequency response curves from limited data
arXiv:2603.10149v1 Announce Type: new Abstract: In the design of engineered components, rigorous vibration testing is essential for performance validation and identification of resonant frequencies and amplitudes encountered during operation. Performing this evaluation numerically via machine learning has great potential to...
Mashup Learning: Faster Finetuning by Remixing Past Checkpoints
arXiv:2603.10156v1 Announce Type: new Abstract: Finetuning on domain-specific data is a well-established method for enhancing LLM performance on downstream tasks. Training on each dataset produces a new set of model weights, resulting in a multitude of checkpoints saved in-house or...
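The excerpt does not say how past checkpoints are remixed; purely as a hypothetical illustration of reusing them, the sketch below interpolates the weights of several checkpoints into a single starting point for further finetuning. The function and mixing weights are assumptions, not the paper's procedure.

```python
# Hypothetical illustration: convex combination of parameter tensors from
# several saved checkpoints as a starting point for finetuning.
import torch

def remix_checkpoints(state_dicts, weights):
    assert abs(sum(weights) - 1.0) < 1e-6
    mixed = {}
    for key in state_dicts[0]:
        mixed[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return mixed

# model.load_state_dict(remix_checkpoints([ckpt_a, ckpt_b], [0.7, 0.3]))
```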
Rethinking the Harmonic Loss via Non-Euclidean Distance Layers
arXiv:2603.10225v1 Announce Type: new Abstract: Cross-entropy loss has long been the standard choice for training deep neural networks, yet it suffers from interpretability limitations, unbounded weight growth, and inefficiencies that can contribute to costly training dynamics. The harmonic loss is...
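As background for the distance-layer framing in the title: harmonic-loss formulations in prior work replace dot-product logits with functions of distances to learned class prototypes. The sketch below shows a plain Euclidean distance layer as a baseline; the paper's non-Euclidean variants are not reproduced here.

```python
# Baseline Euclidean distance layer: class scores are negative distances
# to learned prototypes rather than dot products with weight rows.
import torch
import torch.nn as nn

class DistanceLayer(nn.Module):
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, in_dim))

    def forward(self, x):                       # x: (batch, in_dim)
        d = torch.cdist(x, self.prototypes)     # (batch, num_classes) distances
        return -d                               # higher score = closer prototype

logits = DistanceLayer(32, 10)(torch.randn(8, 32))
loss = nn.functional.cross_entropy(logits, torch.randint(10, (8,)))
```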
Copula-ResLogit: A Deep-Copula Framework for Unobserved Confounding Effects
arXiv:2603.10284v1 Announce Type: new Abstract: A key challenge in travel demand analysis is the presence of unobserved factors that may generate non-causal dependencies, obscuring the true causal effects. To address the issue, the study introduces a novel deep learning based...
GaLoRA: Parameter-Efficient Graph-Aware LLMs for Node Classification
arXiv:2603.10298v1 Announce Type: new Abstract: The rapid rise of large language models (LLMs) and their ability to capture semantic relationships has led to their adoption in a wide range of applications. Text-attributed graphs (TAGs) are a notable example where LLMs...
What do near-optimal learning rate schedules look like?
arXiv:2603.10301v1 Announce Type: new Abstract: A basic unanswered question in neural network training is: what is the best learning rate schedule shape for a given workload? The choice of learning rate schedule is a key factor in the success or...
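For context on what "schedule shape" means in practice, the sketch below computes two shapes that such comparisons commonly include: linear warmup followed by cosine decay, and plain linear decay. The step counts and peak rate are illustrative values, not the paper's recommendations.

```python
# Two common learning-rate schedule shapes, for illustration only.
import math

def warmup_cosine(step, total_steps, peak_lr, warmup_steps):
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)          # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))  # cosine decay to zero

def linear_decay(step, total_steps, peak_lr):
    return peak_lr * max(0.0, 1 - step / total_steps)

print(warmup_cosine(500, 10_000, 3e-4, 1_000), linear_decay(500, 10_000, 3e-4))
```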