Golden Layers and Where to Find Them: Improved Knowledge Editing for Large Language Models Via Layer Gradient Analysis
arXiv:2602.20207v1 Announce Type: new Abstract: Knowledge editing in Large Language Models (LLMs) aims to update the model's prediction for a specific query to a desired target while preserving its behavior on all other inputs. This process typically involves two stages:...
Model Merging in the Essential Subspace
arXiv:2602.20208v1 Announce Type: new Abstract: Model merging aims to integrate multiple task-specific fine-tuned models derived from a shared pre-trained checkpoint into a single multi-task model without additional training. Despite extensive research, task interference remains a major obstacle that often undermines...
MultiModalPFN: Extending Prior-Data Fitted Networks for Multimodal Tabular Learning
arXiv:2602.20223v1 Announce Type: new Abstract: Recently, TabPFN has gained attention as a foundation model for tabular data. However, it struggles to integrate heterogeneous modalities such as images and text, which are common in domains like healthcare and marketing, thereby limiting...
Uncertainty-Aware Delivery Delay Duration Prediction via Multi-Task Deep Learning
arXiv:2602.20271v1 Announce Type: new Abstract: Accurate delivery delay prediction is critical for maintaining operational efficiency and customer satisfaction across modern supply chains. Yet the increasing complexity of logistics networks, spanning multimodal transportation, cross-country routing, and pronounced regional variability, makes this...
The Truthfulness Spectrum Hypothesis
arXiv:2602.20273v1 Announce Type: new Abstract: Large language models (LLMs) have been reported to linearly encode truthfulness, yet recent work questions this finding's generality. We reconcile these views with the truthfulness spectrum hypothesis: the representational space contains directions ranging from broadly...
Learning to Solve Complex Problems via Dataset Decomposition
arXiv:2602.20296v1 Announce Type: new Abstract: Curriculum learning is a class of training strategies that organizes the data being exposed to a model by difficulty, gradually from simpler to more complex examples. This research explores a reverse curriculum generation approach that...
Shape-informed cardiac mechanics surrogates in data-scarce regimes via geometric encoding and generative augmentation
arXiv:2602.20306v1 Announce Type: new Abstract: High-fidelity computational models of cardiac mechanics provide mechanistic insight into the heart function but are computationally prohibitive for routine clinical use. Surrogate models can accelerate simulations, but generalization across diverse anatomies is challenging, particularly in...
In-context Pre-trained Time-Series Foundation Models adapt to Unseen Tasks
arXiv:2602.20307v1 Announce Type: new Abstract: Time-series foundation models (TSFMs) have demonstrated strong generalization capabilities across diverse datasets and tasks. However, existing foundation models are typically pre-trained to enhance performance on specific tasks and often struggle to generalize to unseen tasks...
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models
arXiv:2602.20309v1 Announce Type: new Abstract: Vision-language-action (VLA) models unify perception, language, and control for embodied agents but face significant challenges in practical deployment due to rapidly increasing compute and memory demands, especially as models scale to longer horizons and larger...
Emergent Manifold Separability during Reasoning in Large Language Models
arXiv:2602.20338v1 Announce Type: new Abstract: Chain-of-Thought (CoT) prompting significantly improves reasoning in Large Language Models, yet the temporal dynamics of the underlying representation geometry remain poorly understood. We investigate these dynamics by applying Manifold Capacity Theory (MCT) to a compositional...
Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction
arXiv:2602.20344v1 Announce Type: new Abstract: Graph self-supervised learning (GSSL) has demonstrated strong potential for generating expressive graph embeddings without the need for human annotations, making it particularly valuable in domains with high labeling costs such as molecular graph analysis. However,...
Momentum Guidance: Plug-and-Play Guidance for Flow Models
arXiv:2602.20360v1 Announce Type: new Abstract: Flow-based generative models have become a strong framework for high-quality generative modeling, yet pretrained models are rarely used in their vanilla conditional form: conditional samples without guidance often appear diffuse and lack fine-grained detail due...
Quantitative Approximation Rates for Group Equivariant Learning
arXiv:2602.20370v1 Announce Type: new Abstract: The universal approximation theorem establishes that neural networks can approximate any continuous function on a compact set. Later works in approximation theory provide quantitative approximation rates for ReLU networks on the class of $\alpha$-H\"older functions...
cc-Shapley: Measuring Multivariate Feature Importance Needs Causal Context
arXiv:2602.20396v1 Announce Type: new Abstract: Explainable artificial intelligence promises to yield insights into relevant features, thereby enabling humans to examine and scrutinize machine learning models or even facilitating scientific discovery. Considering the widespread technique of Shapley values, we find that...
Wasserstein Distributionally Robust Online Learning
arXiv:2602.20403v1 Announce Type: new Abstract: We study distributionally robust online learning, where a risk-averse learner updates decisions sequentially to guard against worst-case distributions drawn from a Wasserstein ambiguity set centered at past observations. While this paradigm is well understood in...
CITED: A Decision Boundary-Aware Signature for GNNs Towards Model Extraction Defense
arXiv:2602.20418v1 Announce Type: new Abstract: Graph neural networks (GNNs) have demonstrated superior performance in various applications, such as recommendation systems and financial risk management. However, deploying large-scale GNN models locally is particularly challenging for users, as it requires significant computational...
CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks
arXiv:2602.20419v1 Announce Type: new Abstract: Machine Learning as a Service (MLaaS) has emerged as a widely adopted paradigm for providing access to deep neural network (DNN) models, enabling users to conveniently leverage these models through standardized APIs. However, such services...
GauS: Differentiable Scheduling Optimization via Gaussian Reparameterization
arXiv:2602.20427v1 Announce Type: new Abstract: Efficient operator scheduling is a fundamental challenge in software compilation and hardware synthesis. While recent differentiable approaches have sought to replace traditional ones like exact solvers or heuristics with gradient-based search, they typically rely on...
Imputation of Unknown Missingness in Sparse Electronic Health Records
arXiv:2602.20442v1 Announce Type: new Abstract: Machine learning holds great promise for advancing the field of medicine, with electronic health records (EHRs) serving as a primary data source. However, EHRs are often sparse and contain missing data due to various challenges...
Oracle-Robust Online Alignment for Large Language Models
arXiv:2602.20457v1 Announce Type: new Abstract: We study online alignment of large language models under misspecified preference feedback, where the observed preference oracle deviates from an ideal but unknown ground-truth oracle. The online LLM alignment problem is a bi-level reinforcement problem...
Nonparametric Teaching of Attention Learners
arXiv:2602.20461v1 Announce Type: new Abstract: Attention learners, neural networks built on the attention mechanism, e.g., transformers, excel at learning the implicit relationships that relate sequences to their corresponding properties, e.g., mapping a given sequence of tokens to the probability of...
A Long-Short Flow-Map Perspective for Drifting Models
arXiv:2602.20463v1 Announce Type: new Abstract: This paper provides a reinterpretation of the Drifting Model~\cite{deng2026generative} through a semigroup-consistent long-short flow-map factorization. We show that a global transport process can be decomposed into a long-horizon flow map followed by a short-time terminal...
Elimination-compensation pruning for fully-connected neural networks
arXiv:2602.20467v1 Announce Type: new Abstract: The unmatched ability of Deep Neural Networks in capturing complex patterns in large and noisy datasets is often associated with their large hypothesis space, and consequently to the vast amount of parameters that characterize model...
CGSTA: Cross-Scale Graph Contrast with Stability-Aware Alignment for Multivariate Time-Series Anomaly Detection
arXiv:2602.20468v1 Announce Type: new Abstract: Multivariate time-series anomaly detection is essential for reliable industrial control, telemetry, and service monitoring. However, the evolving inter-variable dependencies and inevitable noise render it challenging. Existing methods often use single-scale graphs or instance-level contrast. Moreover,...
Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA
arXiv:2602.20492v1 Announce Type: new Abstract: Decentralized federated learning (DFL) based on low-rank adaptation (LoRA) enables mobile devices with multi-task datasets to collaboratively fine-tune a large language model (LLM) by exchanging locally updated parameters with a subset of neighboring devices via...
A Generalized Apprenticeship Learning Framework for Capturing Evolving Student Pedagogical Strategies
arXiv:2602.20527v1 Announce Type: new Abstract: Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) have advanced rapidly in recent years and have been successfully applied to e-learning environments like intelligent tutoring systems (ITSs). Despite great success, the broader application of DRL...
Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition
arXiv:2602.20530v1 Announce Type: new Abstract: Emotion recognition from multi-modal physiological and behavioral signals plays a pivotal role in affective computing, yet most existing models remain constrained to the prediction of singular emotions in controlled laboratory settings. Real-world human emotional experiences,...
Sample-efficient evidence estimation of score based priors for model selection
arXiv:2602.20549v1 Announce Type: new Abstract: The choice of prior is central to solving ill-posed imaging inverse problems, making it essential to select one consistent with the measurements $y$ to avoid severe bias. In Bayesian inverse problems, this could be achieved...
GENSR: Symbolic Regression Based in Equation Generative Space
arXiv:2602.20557v1 Announce Type: new Abstract: Symbolic Regression (SR) tries to reveal the hidden equations behind observed data. However, most methods search within a discrete equation space, where the structural modifications of equations rarely align with their numerical behavior, leaving fitting...
Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs
arXiv:2602.20567v1 Announce Type: new Abstract: Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration stability and generalization behavior remain unclear due to structural...