Support Vector Data Description for Radar Target Detection
arXiv:2602.18486v1 Announce Type: new Abstract: Classical radar detection techniques rely on adaptive detectors that estimate the noise covariance matrix from target-free secondary data. While effective in Gaussian environments, these methods degrade in the presence of clutter, which is better modeled...
Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning
arXiv:2602.18493v1 Announce Type: new Abstract: Long-context LLMs and Retrieval-Augmented Generation (RAG) systems process information passively, deferring state tracking, contradiction resolution, and evidence aggregation to query time, which becomes brittle under ultra long streams with frequent updates. We propose the Unified...
Weak-Form Evolutionary Kolmogorov-Arnold Networks for Solving Partial Differential Equations
arXiv:2602.18515v1 Announce Type: new Abstract: Partial differential equations (PDEs) form a central component of scientific computing. Among recent advances in deep learning, evolutionary neural networks have been developed to successively capture the temporal dynamics of time-dependent PDEs via parameter evolution....
Wide Open Gazes: Quantifying Visual Exploratory Behavior in Soccer with Pose Enhanced Positional Data
arXiv:2602.18519v1 Announce Type: new Abstract: Traditional approaches to measuring visual exploratory behavior in soccer rely on counting visual exploratory actions (VEAs) based on rapid head movements exceeding 125°/s, but this method suffers from player position bias (i.e., a focus on...
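The threshold-based VEA count mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `count_veas`, the input format (head yaw angles in degrees sampled at a fixed rate), and the rule that one contiguous run above the threshold counts as one action are all assumptions made for illustration. Only the 125°/s threshold comes from the abstract. Angle wrap-around at ±180° is ignored here.

```python
def count_veas(yaw_deg, fs_hz, threshold_deg_s=125.0):
    """Count runs of samples whose head angular speed exceeds the threshold.

    yaw_deg: head yaw angles in degrees (hypothetical input format).
    fs_hz: sampling rate in Hz.
    threshold_deg_s: speed threshold; 125 deg/s is the value quoted in the abstract.
    """
    count = 0
    above = False  # whether the previous sample pair was already above threshold
    for prev, cur in zip(yaw_deg, yaw_deg[1:]):
        # Finite-difference angular speed in degrees per second.
        speed = abs(cur - prev) * fs_hz
        if speed > threshold_deg_s and not above:
            count += 1  # a new fast head movement starts here
        above = speed > threshold_deg_s
    return count


# Two fast movements: one rightward sweep and one leftward snap.
yaw = [0, 1, 2, 10, 18, 19, 20, 5, 4]
n_veas = count_veas(yaw, fs_hz=25)
```

With the sample above, the speeds are 25, 25, 200, 200, 25, 25, 375, 25 deg/s, so two separate runs cross the threshold.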
AdaptStress: Online Adaptive Learning for Interpretable and Personalized Stress Prediction Using Multivariate and Sparse Physiological Signals
arXiv:2602.18521v1 Announce Type: new Abstract: Continuous stress forecasting could potentially contribute to lifestyle interventions. This paper presents a novel, explainable, and individualized approach for stress prediction using physiological data from consumer-grade smartwatches. We develop a time series forecasting model that...
The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
arXiv:2602.18523v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization long after near-zero training loss -- has been studied mainly in single-task settings. We extend geometric analysis to multi-task modular arithmetic, training shared-trunk Transformers on dual-task...
Audio-Visual Continual Test-Time Adaptation without Forgetting
arXiv:2602.18528v1 Announce Type: new Abstract: Audio-visual continual test-time adaptation involves continually adapting a source audio-visual model at test time to unlabeled, non-stationary domains, where either or both modalities can be distributionally shifted, which hampers online cross-modal learning and eventually leads to...
Deep Reinforcement Learning for Optimizing Energy Consumption in Smart Grid Systems
arXiv:2602.18531v1 Announce Type: new Abstract: The energy management problem in the context of smart grids is inherently complex due to the interdependencies among diverse system components. Although Reinforcement Learning (RL) has been proposed for solving Optimal Power Flow (OPF) problems,...
Sub-City Real Estate Price Index Forecasting at Weekly Horizons Using Satellite Radar and News Sentiment
arXiv:2602.18572v1 Announce Type: new Abstract: Reliable real estate price indicators are typically published at city level and low frequency, limiting their use for neighborhood-scale monitoring and long-horizon planning. We study whether sub-city price indices can be forecasted at weekly frequency...
Learning Beyond Optimization: Stress-Gated Dynamical Regime Regulation in Autonomous Systems
arXiv:2602.18581v1 Announce Type: new Abstract: Despite their apparent diversity, modern machine learning methods can be reduced to a remarkably simple core principle: learning is achieved by continuously optimizing parameters to minimize or maximize a scalar objective function. This paradigm has...
Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning
arXiv:2602.18591v1 Announce Type: new Abstract: A fundamental problem in multi-task learning (MTL) is identifying groups of tasks that should be learned together. Since training MTL models for all possible combinations of tasks is prohibitively expensive for large task sets, a...
Non-Interfering Weight Fields: Treating Model Parameters as a Continuously Extensible Function
arXiv:2602.18628v1 Announce Type: new Abstract: Large language models store all learned knowledge in a single, fixed weight vector. Teaching a model new capabilities requires modifying those same weights, inevitably degrading previously acquired knowledge. This fundamental limitation, known as catastrophic forgetting,...
Learning Invariant Visual Representations for Planning with Joint-Embedding Predictive World Models
arXiv:2602.18639v1 Announce Type: new Abstract: World models learned from high-dimensional visual observations allow agents to make decisions and plan directly in latent space, avoiding pixel-level reconstruction. However, recent latent predictive architectures (JEPAs), including the DINO world model (DINO-WM), display a...
Adaptive Time Series Reasoning via Segment Selection
arXiv:2602.18645v1 Announce Type: new Abstract: Time series reasoning tasks often start with a natural language question and require targeted analysis of a time series. Evidence may span the full series or appear in a few short intervals, so the model...
Global Low-Rank, Local Full-Rank: The Holographic Encoding of Learned Algorithms
arXiv:2602.18649v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization after extended training -- has been linked to the emergence of low-dimensional structure in learning dynamics. Yet neural network parameters inhabit extremely high-dimensional spaces. How can...
Communication-Efficient Personalized Adaptation via Federated-Local Model Merging
arXiv:2602.18658v1 Announce Type: new Abstract: Parameter-efficient fine-tuning methods, such as LoRA, offer a practical way to adapt large vision and language models to client tasks. However, this becomes particularly challenging under task-level heterogeneity in federated deployments. In this regime, personalization...
Large Causal Models for Temporal Causal Discovery
arXiv:2602.18662v1 Announce Type: new Abstract: Causal discovery for both cross-sectional and temporal data has traditionally followed a dataset-specific paradigm, where a new model is fitted for each individual dataset. Such an approach limits the potential of multi-dataset pretraining. The concept...
Robustness of Deep ReLU Networks to Misclassification of High-Dimensional Data
arXiv:2602.18674v1 Announce Type: new Abstract: We present a theoretical study of the robustness of parameterized networks to random input perturbations. Specifically, we analyze local robustness at a given network input by quantifying the probability that a small additive random perturbation...
Transformers for dynamical systems learn transfer operators in-context
arXiv:2602.18679v1 Announce Type: new Abstract: Large-scale foundation models for scientific machine learning adapt to physical settings unseen during training, such as zero-shot transfer between turbulent scales. This phenomenon, in-context learning, challenges conventional understanding of learning and adaptation in physical systems....
In-Context Planning with Latent Temporal Abstractions
arXiv:2602.18694v1 Announce Type: new Abstract: Planning-based reinforcement learning for continuous control is bottlenecked by two practical issues: planning at primitive time scales leads to prohibitive branching and long horizons, while real environments are frequently partially observable and exhibit regime shifts...
Insertion Based Sequence Generation with Learnable Order Dynamics
arXiv:2602.18695v1 Announce Type: new Abstract: In many domains, generating variable-length sequences through insertions provides greater flexibility than autoregressive models. However, the action space of insertion models is much larger than that of autoregressive models (ARMs), making learning challenging....
Phase-Consistent Magnetic Spectral Learning for Multi-View Clustering
arXiv:2602.18728v1 Announce Type: new Abstract: Unsupervised multi-view clustering (MVC) aims to partition data into meaningful groups by leveraging complementary information from multiple views without labels, yet a central challenge is to obtain a reliable shared structural signal to guide representation...
HONEST-CAV: Hierarchical Optimization of Network Signals and Trajectories for Connected and Automated Vehicles with Multi-Agent Reinforcement Learning
arXiv:2602.18740v1 Announce Type: new Abstract: This study presents a hierarchical, network-level traffic flow control framework for mixed traffic consisting of Human-driven Vehicles (HVs) and Connected and Automated Vehicles (CAVs). The framework jointly optimizes vehicle-level eco-driving behaviors and intersection-level traffic signal control...
RadioGen3D: 3D Radio Map Generation via Adversarial Learning on Large-Scale Synthetic Data
arXiv:2602.18744v1 Announce Type: new Abstract: Radio maps are essential for efficient radio resource management in future 6G and low-altitude networks. While deep learning (DL) techniques have emerged as an efficient alternative to conventional ray-tracing for radio map estimation (RME), most...
GLaDiGAtor: Language-Model-Augmented Multi-Relation Graph Learning for Predicting Disease-Gene Associations
arXiv:2602.18769v1 Announce Type: new Abstract: Understanding disease-gene associations is essential for unravelling disease mechanisms and advancing diagnostics and therapeutics. Traditional approaches based on manual curation and literature review are labour-intensive and not scalable, prompting the use of machine learning on...
From Few-Shot to Zero-Shot: Towards Generalist Graph Anomaly Detection
arXiv:2602.18793v1 Announce Type: new Abstract: Graph anomaly detection (GAD) is critical for identifying abnormal nodes in graph-structured data from diverse domains, including cybersecurity and social networks. Existing GAD methods often follow a "one-model-for-one-dataset" learning paradigm, requiring dataset-specific...
SGNO: Spectral Generator Neural Operators for Stable Long Horizon PDE Rollouts
arXiv:2602.18801v1 Announce Type: new Abstract: Neural operators provide fast PDE surrogates and often generalize across parameters and resolutions. However, in the short-train, long-test setting, autoregressive rollouts can become unstable. This typically happens for two reasons: one-step errors...
L2G-Net: Local to Global Spectral Graph Neural Networks via Cauchy Factorizations
arXiv:2602.18837v1 Announce Type: new Abstract: Despite their theoretical advantages, spectral methods based on the graph Fourier transform (GFT) are seldom used in graph neural networks (GNNs) due to the cost of computing the eigenbasis and the lack of vertex-domain locality...
Exact Attention Sensitivity and the Geometry of Transformer Stability
arXiv:2602.18849v1 Announce Type: new Abstract: Despite powering modern AI, transformers remain mysteriously brittle to train. We develop a stability theory that explains why pre-LayerNorm works, why DeepNorm uses $N^{-1/4}$ scaling, and why warmup is necessary, all from first principles. Our...
Rank-Aware Spectral Bounds on Attention Logits for Stable Low-Precision Training
arXiv:2602.18851v1 Announce Type: new Abstract: Attention scores in transformers are bilinear forms $S_{ij} = x_i^\top M x_j / \sqrt{d_h}$ whose maximum magnitude governs overflow risk in low-precision training. We derive a rank-aware concentration inequality: when the interaction matrix $M =...
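As a rough illustration of the quantity this abstract describes, the sketch below evaluates the bilinear form $S_{ij} = x_i^\top M x_j / \sqrt{d_h}$ directly and compares its largest magnitude with the largest finite half-precision value. It does not implement the paper's bound, which is cut off in the abstract. The function names, the toy inputs, and the reading of $d_h$ as the head dimension are assumptions made for illustration.

```python
import math

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def attention_logits(X, M, d_h):
    """Return S with S[i][j] = x_i^T M x_j / sqrt(d_h).

    X: list of token vectors, M: square interaction matrix (list of rows),
    d_h: scaling dimension (assumed here to be the head dimension).
    """
    scale = 1.0 / math.sqrt(d_h)
    # Precompute M @ x_j for every token so each logit is a single dot product.
    MX = [[sum(row[c] * x[c] for c in range(len(x))) for row in M] for x in X]
    return [[scale * sum(a * b for a, b in zip(xi, mx)) for mx in MX] for xi in X]


def max_abs_logit(S):
    """Largest logit magnitude, the quantity tied to overflow risk."""
    return max(abs(s) for row in S for s in row)


# Toy inputs (hypothetical values, not from the paper).
X = [[1.0, 2.0], [3.0, -1.0]]
M = [[2.0, 0.0], [0.0, 1.0]]
S = attention_logits(X, M, d_h=4)
overflow_risk = max_abs_logit(S) > FP16_MAX
```

For these toy inputs the logits are `[[3.0, 2.0], [2.0, 9.5]]`, so the maximum magnitude is 9.5 and far below the half-precision limit. A direct scan like this costs one dot product per token pair, which is the cost an analytic bound on the maximum would avoid.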