Transit Network Design with Two-Level Demand Uncertainties: A Machine Learning and Contextual Stochastic Optimization Framework
arXiv:2603.00010v1 Announce Type: new Abstract: Transit Network Design is a well-studied problem in the field of transportation, typically addressed by solving optimization models under fixed demand assumptions. Considering the limitations of these assumptions, this paper proposes a new framework, namely...
StaTS: Spectral Trajectory Schedule Learning for Adaptive Time Series Forecasting with Frequency Guided Denoiser
arXiv:2603.00037v1 Announce Type: new Abstract: Diffusion models have been used for probabilistic time series forecasting and show strong potential. However, fixed noise schedules often produce intermediate states that are hard to invert and a terminal state that deviates from the...
CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation
arXiv:2603.00039v1 Announce Type: new Abstract: LLM-as-a-judge ensembles are the standard paradigm for scalable evaluation, but their aggregation mechanisms suffer from a fundamental flaw: they implicitly assume that judges provide independent estimates of true quality. However, in practice, LLM judges exhibit...
Attn-QAT: 4-Bit Attention With Quantization-Aware Training
arXiv:2603.00040v1 Announce Type: new Abstract: Achieving reliable 4-bit attention is a prerequisite for end-to-end FP4 computation on emerging FP4-capable GPUs, yet attention remains the main obstacle due to FP4's tiny dynamic range and attention's heavy-tailed activations. This paper presents the...
Econometric vs. Causal Structure-Learning for Time-Series Policy Decisions: Evidence from the UK COVID-19 Policies
arXiv:2603.00041v1 Announce Type: new Abstract: Causal machine learning (ML) recovers graphical structures that inform us about potential cause-and-effect relationships. Most progress has focused on cross-sectional data with no explicit time order, whereas recovering causal structures from time series data remains...
Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment
arXiv:2603.00042v1 Announce Type: new Abstract: We identify the Spectral Energy Gain in extreme model compression, where low-rank binary approximations outperform tiny-rank floating-point baselines for heavy-tailed spectra. However, prior attempts fail to realize this potential, trailing state-of-the-art 1-bit methods. We attribute...
Breaking the Factorization Barrier in Diffusion Language Models
arXiv:2603.00045v1 Announce Type: new Abstract: Diffusion language models theoretically allow for efficient parallel generation but are practically hindered by the "factorization barrier": the assumption that simultaneously predicted tokens are independent. This limitation forces a trade-off: models must either sacrifice speed...
REMIND: Rethinking Medical High-Modality Learning under Missingness--A Long-Tailed Distribution Perspective
arXiv:2603.00046v1 Announce Type: new Abstract: Medical multi-modal learning is critical for integrating information from a large set of diverse modalities. However, when leveraging a high number of modalities in real clinical applications, it is often impractical to obtain full-modality observations...
Mag-Mamba: Modeling Coupled spatiotemporal Asymmetry for POI Recommendation
arXiv:2603.00053v1 Announce Type: new Abstract: Next Point-of-Interest (POI) recommendation is a critical task in location-based services, yet it faces the fundamental challenge of coupled spatiotemporal asymmetry inherent in urban mobility. Specifically, transition intents between locations exhibit high asymmetry and are...
Expert Divergence Learning for MoE-based Language Models
arXiv:2603.00054v1 Announce Type: new Abstract: The Mixture-of-Experts (MoE) architecture is a powerful technique for scaling language models, yet it often suffers from expert homogenization, where experts learn redundant functionalities, thereby limiting MoE's full potential. To address this, we introduce Expert...
Certainty-Validity: A Diagnostic Framework for Discrete Commitment Systems
arXiv:2603.00070v1 Announce Type: new Abstract: Standard evaluation metrics for machine learning -- accuracy, precision, recall, and AUROC -- assume that all errors are equivalent: a confident incorrect prediction is penalized identically to an uncertain one. For discrete commitment systems (architectures...
SEval-NAS: A Search-Agnostic Evaluation for Neural Architecture Search
arXiv:2603.00099v1 Announce Type: new Abstract: Neural architecture search (NAS) automates the discovery of neural networks that meet specified criteria, yet its evaluation procedures are often hardcoded, limiting the ability to introduce new metrics. This issue is especially pronounced in hardware-aware...
Wideband Power Amplifier Behavioral Modeling Using an Amplitude Conditioned LSTM
arXiv:2603.00101v1 Announce Type: new Abstract: Wideband power amplifiers exhibit complex nonlinear and memory effects that challenge traditional behavioral modeling approaches. This paper proposes a novel amplitude conditioned long short-term memory (AC-LSTM) network that introduces explicit amplitude-dependent gating to enhance the...
LIDS: LLM Summary Inference Under the Layered Lens
arXiv:2603.00105v1 Announce Type: new Abstract: Large language models (LLMs) have gained significant attention by many researchers and practitioners in natural language processing (NLP) since the introduction of ChatGPT in 2022. One notable feature of ChatGPT is its ability to generate...
MAML-KT: Addressing Cold Start Problem in Knowledge Tracing for New Students via Few-Shot Model-Agnostic Meta Learning
arXiv:2603.00137v1 Announce Type: new Abstract: Knowledge tracing (KT) models are commonly evaluated by training on early interactions from all students and testing on later responses. While effective for measuring average predictive performance, this evaluation design obscures a cold start scenario...
NNiT: Width-Agnostic Neural Network Generation with Structurally Aligned Weight Spaces
arXiv:2603.00180v1 Announce Type: new Abstract: Generative modeling of neural network parameters is often tied to architectures because standard parameter representations rely on known weight-matrix dimensions. Generation is further complicated by permutation symmetries that allow networks to model similar input-output functions...
A medical coding language model trained on clinical narratives from a population-wide cohort of 1.8 million patients
arXiv:2603.00221v1 Announce Type: new Abstract: Medical coding translates clinical documentation into standardized codes for billing, research, and public health, but manual coding is time-consuming and error-prone. Existing automation efforts rely on small datasets that poorly represent real-world patient heterogeneity. We...
CoPeP: Benchmarking Continual Pretraining for Protein Language Models
arXiv:2603.00253v1 Announce Type: new Abstract: Protein language models (pLMs) have recently gained significant attention for their ability to uncover relationships between sequence, structure, and function from evolutionary statistics, thereby accelerating therapeutic drug discovery. These models learn from large protein databases...
Polynomial Surrogate Training for Differentiable Ternary Logic Gate Networks
arXiv:2603.00302v1 Announce Type: new Abstract: Differentiable logic gate networks (DLGNs) learn compact, interpretable Boolean circuits via gradient-based training, but all existing variants are restricted to the 16 two-input binary gates. Extending DLGNs to Ternary Kleene $K_3$ logic and training DTLGNs...
Vectorized Adaptive Histograms for Sparse Oblique Forests
arXiv:2603.00326v1 Announce Type: new Abstract: Classification using sparse oblique random forests provides guarantees on uncertainty and confidence while controlling for specific error types. However, they use more data and more compute than other tree ensembles because they create deep trees...
StethoLM: Audio Language Model for Cardiopulmonary Analysis Across Clinical Tasks
arXiv:2603.00355v1 Announce Type: new Abstract: Listening to heart and lung sounds - auscultation - is one of the first and most fundamental steps in a clinical examination. Despite being fast and non-invasive, it demands years of experience to interpret subtle...
Quantifying Catastrophic Forgetting in IoT Intrusion Detection Systems
arXiv:2603.00363v1 Announce Type: new Abstract: Distribution shifts in attack patterns within RPL-based IoT networks pose a critical threat to the reliability and security of large-scale connected systems. Intrusion Detection Systems (IDS) trained on static datasets often fail to generalize to...
Improving Full Waveform Inversion in Large Model Era
arXiv:2603.00377v1 Announce Type: new Abstract: Full Waveform Inversion (FWI) is a highly nonlinear and ill-posed problem that aims to recover subsurface velocity maps from surface-recorded seismic waveforms data. Existing data-driven FWI typically uses small models, as available datasets have limited...
TENG-BC: Unified Time-Evolving Natural Gradient for Neural PDE Solvers with General Boundary Conditions
arXiv:2603.00397v1 Announce Type: new Abstract: Accurately solving time-dependent partial differential equations (PDEs) with neural networks remains challenging due to long-time error accumulation and the difficulty of enforcing general boundary conditions. We introduce TENG-BC, a high-precision neural PDE solver based on...
USE: Uncertainty Structure Estimation for Robust Semi-Supervised Learning
arXiv:2603.00404v1 Announce Type: new Abstract: In this study, a novel idea, Uncertainty Structure Estimation (USE), a lightweight, algorithm-agnostic procedure that emphasizes the often-overlooked role of unlabeled data quality is introduced for Semi-supervised learning (SSL). SSL has achieved impressive progress, but...
Exact and Asymptotically Complete Robust Verifications of Neural Networks via Quantum Optimization
arXiv:2603.00408v1 Announce Type: new Abstract: Deep neural networks (DNNs) enable high performance across domains but remain vulnerable to adversarial perturbations, limiting their use in safety-critical settings. Here, we introduce two quantum-optimization-based models for robust verification that reduce the combinatorial burden...
Physics-Aware Learnability: From Set-Theoretic Independence to Operational Constraints
arXiv:2603.00417v1 Announce Type: new Abstract: Beyond binary classification, learnability can become a logically fragile notion: in EMX, even the class of all finite subsets of $[0,1]$ is learnable in some models of ZFC and not in others. We argue the...
Efficient Decoder Scaling Strategy for Neural Routing Solvers
arXiv:2603.00430v1 Announce Type: new Abstract: Construction-based neural routing solvers, typically composed of an encoder and a decoder, have emerged as a promising approach for solving vehicle routing problems. While recent studies suggest that shifting parameters from the encoder to the...
ROKA: Robust Knowledge Unlearning against Adversaries
arXiv:2603.00436v1 Announce Type: new Abstract: The need for machine unlearning is critical for data privacy, yet existing methods often cause Knowledge Contamination by unintentionally damaging related knowledge. Such a degraded model performance after unlearning has been recently leveraged for new...
Benchmarking Few-shot Transferability of Pre-trained Models with Improved Evaluation Protocols
arXiv:2603.00478v1 Announce Type: new Abstract: Few-shot transfer has been revolutionized by stronger pre-trained models and improved adaptation algorithms.However, there lacks a unified, rigorous evaluation protocol that is both challenging and realistic for real-world usage. In this work, we establish FEWTRANS,...