Neural Operators Can Discover Functional Clusters
arXiv:2602.23528v1 Announce Type: new Abstract: Operator learning is reshaping scientific computing by amortizing inference across infinite families of problems. While neural operators (NOs) are increasingly well understood for regression, far less is known for classification and its unsupervised analogue: clustering....
Active Value Querying to Minimize Additive Error in Subadditive Set Function Learning
arXiv:2602.23529v1 Announce Type: new Abstract: Subadditive set functions play a pivotal role in computational economics (especially in combinatorial auctions), combinatorial optimization or artificial intelligence applications such as interpretable machine learning. However, specifying a set function requires assigning values to an...
Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents
arXiv:2602.23556v1 Announce Type: new Abstract: Large-scale Graph Neural Networks (GNNs) are typically trained by sampling a vertex's neighbors to a fixed distance. Because large input graphs are distributed, training requires frequent irregular communication that stalls forward progress. Moreover, fetched data...
Dynamics of Learning under User Choice: Overspecialization and Peer-Model Probing
arXiv:2602.23565v1 Announce Type: new Abstract: In many economically relevant contexts where machine learning is deployed, multiple platforms obtain data from the same pool of users, each of whom selects the platform that best serves them. Prior work in this setting...
Flowette: Flow Matching with Graphette Priors for Graph Generation
arXiv:2602.23566v1 Announce Type: new Abstract: We study generative modeling of graphs with recurring subgraph motifs. We propose Flowette, a continuous flow matching framework that employs a graph neural network-based transformer to learn a velocity field defined over graph representations...
Hybrid Quantum Temporal Convolutional Networks
arXiv:2602.23578v1 Announce Type: new Abstract: Quantum machine learning models for sequential data face scalability challenges with complex multivariate signals. We introduce the Hybrid Quantum Temporal Convolutional Network (HQTCN), which combines classical temporal windowing with a quantum convolutional neural network core....
Normalisation and Initialisation Strategies for Graph Neural Networks in Blockchain Anomaly Detection
arXiv:2602.23599v1 Announce Type: new Abstract: Graph neural networks (GNNs) offer a principled approach to financial fraud detection by jointly learning from node features and transaction graph topology. However, their effectiveness on real-world anti-money laundering (AML) benchmarks depends critically on training...
When Does Multimodal Learning Help in Healthcare? A Benchmark on EHR and Chest X-Ray Fusion
arXiv:2602.23614v1 Announce Type: new Abstract: Machine learning holds promise for advancing clinical decision support, yet it remains unclear when multimodal learning truly helps in practice, particularly under modality missingness and fairness constraints. In this work, we conduct a systematic benchmark...
BTTackler: A Diagnosis-based Framework for Efficient Deep Learning Hyperparameter Optimization
arXiv:2602.23630v1 Announce Type: new Abstract: Hyperparameter optimization (HPO) is known to be costly in deep learning, especially when leveraging automated approaches. Most of the existing automated HPO methods are accuracy-based, i.e., accuracy metrics are used to guide the trials of...
On the Convergence of Single-Loop Stochastic Bilevel Optimization with Approximate Implicit Differentiation
arXiv:2602.23633v1 Announce Type: new Abstract: Stochastic Bilevel Optimization has emerged as a fundamental framework for meta-learning and hyperparameter optimization. Despite the practical prevalence of single-loop algorithms (which update lower and upper variables concurrently), their theoretical understanding, particularly in the stochastic regime, remains...
Selective Denoising Diffusion Model for Time Series Anomaly Detection
arXiv:2602.23662v1 Announce Type: new Abstract: Time series anomaly detection (TSAD) has been an important area of research for decades, with reconstruction-based methods, mostly based on generative models, gaining popularity and demonstrating success. Diffusion models have recently attracted attention due to...
Disentangled Mode-Specific Representations for Tensor Time Series via Contrastive Learning
arXiv:2602.23663v1 Announce Type: new Abstract: Multi-mode tensor time series (TTS) can be found in many domains, such as search engines and environmental monitoring systems. Learning representations of a TTS benefits various applications, but it is also challenging since the complexities...
Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training
arXiv:2602.23696v1 Announce Type: new Abstract: We study the geometry of training trajectories in small transformer models and find that parameter updates organize into a dominant drift direction with transverse residual dynamics. Using uncentered, row-normalized trajectory PCA, we show that a...
Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning
arXiv:2602.23737v1 Announce Type: new Abstract: Cross-domain reinforcement learning (RL) aims to learn transferable policies under dynamics shifts between source and target domains. A key challenge lies in the lack of target-domain environment interaction and reward supervision, which prevents direct policy...
MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning
arXiv:2602.23770v1 Announce Type: new Abstract: Generative models have gained significant traction in offline reinforcement learning (RL) due to their ability to model complex trajectory distributions. However, existing generation-based approaches still struggle with long-horizon tasks characterized by sparse rewards. Some hierarchical...
TradeFM: A Generative Foundation Model for Trade-flow and Market Microstructure
arXiv:2602.23784v1 Announce Type: new Abstract: Foundation models have transformed domains from language to genomics by learning general-purpose representations from large-scale, heterogeneous data. We introduce TradeFM, a 524M-parameter generative Transformer that brings this paradigm to market microstructure, learning directly from billions...
Provable Subspace Identification of Nonlinear Multi-view CCA
arXiv:2602.23785v1 Announce Type: new Abstract: We investigate the identifiability of nonlinear Canonical Correlation Analysis (CCA) in a multi-view setup, where each view is generated by an unknown nonlinear map applied to a linear mixture of shared latents and view-private noise....
UPath: Universal Planner Across Topological Heterogeneity For Grid-Based Pathfinding
arXiv:2602.23789v1 Announce Type: new Abstract: The performance of search algorithms for grid-based pathfinding, e.g. A*, critically depends on the heuristic function that is used to focus the search. Recent studies have shown that informed heuristics that take the positions/shapes of...
GRAIL: Post-hoc Compensation by Linear Reconstruction for Compressed Networks
arXiv:2602.23795v1 Announce Type: new Abstract: Structured deep model compression methods are hardware-friendly and substantially reduce memory and inference costs. However, under aggressive compression, the resulting accuracy degradation often necessitates post-compression finetuning, which can be impractical due to missing labeled data...
MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models
arXiv:2602.23798v1 Announce Type: new Abstract: Machine unlearning for large language models often faces a privacy dilemma in which strict constraints prohibit sharing either the server's parameters or the client's forget set. To address this dual non-disclosure constraint, we propose MPU,...
Actor-Critic Pretraining for Proximal Policy Optimization
arXiv:2602.23804v1 Announce Type: new Abstract: Reinforcement learning (RL) actor-critic algorithms enable autonomous learning but often require a large number of environment interactions, which limits their applicability in robotics. Leveraging expert data can reduce the number of required environment interactions. A...
Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies
arXiv:2602.23811v1 Announce Type: new Abstract: We investigate the theoretical aspects of offline reinforcement learning (RL) under general function approximation. While prior works (e.g., Xie et al., 2021) have established the theoretical foundations of learning a good policy from offline data...
Learning to maintain safety through expert demonstrations in settings with unknown constraints: A Q-learning perspective
arXiv:2602.23816v1 Announce Type: new Abstract: Given a set of trajectories demonstrating the execution of a task safely in a constrained MDP with observable rewards but with unknown constraints and non-observable costs, we aim to find a policy that maximizes the...
Inferring Chronic Treatment Onset from ePrescription Data: A Renewal Process Approach
arXiv:2602.23824v1 Announce Type: new Abstract: Longitudinal electronic health record (EHR) data are often left-censored, making diagnosis records incomplete and unreliable for determining disease onset. In contrast, outpatient prescriptions form renewal-based trajectories that provide a continuous signal of disease management. We...
FedNSAM: Consistency of Local and Global Flatness for Federated Learning
arXiv:2602.23827v1 Announce Type: new Abstract: In federated learning (FL), multi-step local updates and data heterogeneity usually lead to sharper global minima, which degrades the performance of the global model. Popular FL algorithms integrate sharpness-aware minimization (SAM) into local training to...
ULW-SleepNet: An Ultra-Lightweight Network for Multimodal Sleep Stage Scoring
arXiv:2602.23852v1 Announce Type: new Abstract: Automatic sleep stage scoring is crucial for the diagnosis and treatment of sleep disorders. Although deep learning models have advanced the field, many existing models are computationally demanding and designed for single-channel electroencephalography (EEG), limiting...
A Theory of Random Graph Shift in Truncated-Spectrum vRKHS
arXiv:2602.23880v1 Announce Type: new Abstract: This paper develops a theory of graph classification under domain shift through a random-graph generative lens, where we consider intra-class graphs sharing the same random graph model (RGM) and the domain shift induced by changes...
Learning Generation Orders for Masked Discrete Diffusion Models via Variational Inference
arXiv:2602.23968v1 Announce Type: new Abstract: Masked discrete diffusion models (MDMs) are a promising new approach to generative modelling, offering the ability for parallel token generation and therefore greater efficiency than autoregressive counterparts. However, achieving an optimal balance between parallel generation...
Intrinsic Lorentz Neural Network
arXiv:2602.23981v1 Announce Type: new Abstract: Real-world data frequently exhibit latent hierarchical structures, which can be naturally represented by hyperbolic geometry. Although recent hyperbolic neural networks have demonstrated promising results, many existing architectures remain partially intrinsic, mixing Euclidean operations with hyperbolic...
MINT: Multimodal Imaging-to-Speech Knowledge Transfer for Early Alzheimer's Screening
arXiv:2602.23994v1 Announce Type: new Abstract: Alzheimer's disease is a progressive neurodegenerative disorder in which mild cognitive impairment (MCI) marks a critical transition between aging and dementia. Neuroimaging modalities, such as structural MRI, provide biomarkers of this transition; however, their high...