Stochastic Gradient Descent in the Saddle-to-Saddle Regime of Deep Linear Networks
arXiv:2604.06366v1 Announce Type: new Abstract: Deep linear networks (DLNs) are used as an analytically tractable model of the training dynamics of deep neural networks. While gradient descent in DLNs is known to exhibit saddle-to-saddle dynamics, the impact of stochastic gradient...
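For readers unfamiliar with the setup, full-batch gradient descent on a toy two-layer deep linear network with small initialization exhibits the plateau-then-drop loss curves characteristic of saddle-to-saddle dynamics. A minimal sketch (all sizes, seeds, and learning rates are illustrative, not the paper's regime, and the paper's stochastic-gradient analysis is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher: linear map Y = X W*^T; student: two-layer linear network W2 W1.
d = 5
X = rng.normal(size=(200, d))
W_star = rng.normal(size=(d, d))
Y = X @ W_star.T

# Small initialization -- the regime where saddle-to-saddle dynamics appear.
W1 = 1e-3 * rng.normal(size=(d, d))
W2 = 1e-3 * rng.normal(size=(d, d))
lr = 0.05

losses = []
for _ in range(2000):
    P = X @ W1.T @ W2.T                 # network output
    E = P - Y
    losses.append(0.5 * np.mean(np.sum(E**2, axis=1)))
    G = E / X.shape[0]                  # dL/dP
    gW2 = G.T @ (X @ W1.T)              # chain rule through the product
    gW1 = W2.T @ G.T @ X
    W2 -= lr * gW2
    W1 -= lr * gW1
# `losses` shows long plateaus (near-saddle phases) punctuated by sharp drops
# as the singular modes of W* are learned one by one.
```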
Bi-level Heterogeneous Learning for Time Series Foundation Models: A Federated Learning Approach
arXiv:2604.06727v1 Announce Type: new Abstract: Heterogeneity in time series data is more pronounced than in vision or language, as temporal dynamics vary substantially across domains and tasks. Existing efforts on training time series foundation models (TSFMs) from scratch are often...
Context-Agent: Dynamic Discourse Trees for Non-Linear Dialogue
arXiv:2604.05552v1 Announce Type: new Abstract: Large Language Models demonstrate outstanding performance in many language tasks but still face fundamental challenges in managing the non-linear flow of human conversation. The prevalent approach of treating dialogue history as a flat, linear sequence...
YoNER: A New Yorùbá Multi-domain Named Entity Recognition Dataset
arXiv:2604.05624v1 Announce Type: new Abstract: Named Entity Recognition (NER) is a foundational NLP task, yet research in Yorùbá has been constrained by limited and domain-specific resources. Existing resources, such as MasakhaNER (a manually annotated news-domain corpus) and WikiAnn (automatically created...
AutoSOTA: An End-to-End Automated Research System for State-of-the-Art AI Model Discovery
arXiv:2604.05550v1 Announce Type: new Abstract: Artificial intelligence research increasingly depends on prolonged cycles of reproduction, debugging, and iterative refinement to achieve State-Of-The-Art (SOTA) performance, creating a growing need for systems that can accelerate the full pipeline of empirical model optimization....
Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion
arXiv:2604.05688v1 Announce Type: new Abstract: Key-Value (KV) cache memory and bandwidth increasingly dominate large language model inference cost in long-context and long-generation regimes. Architectures such as multi-head latent attention (MLA) and hybrid sliding-window attention (SWA) can alleviate this bottleneck, but...
Dialogue Act Patterns in GenAI-Mediated L2 Oral Practice: A Sequential Analysis of Learner-Chatbot Interactions
arXiv:2604.05702v1 Announce Type: new Abstract: While generative AI (GenAI) voice chatbots offer scalable opportunities for second language (L2) oral practice, the interactional processes related to learners' gains remain underexplored. This study investigates dialogue act (DA) patterns in interactions between Grade...
LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals
arXiv:2604.05655v1 Announce Type: new Abstract: This work characterizes large language models' chain-of-thought generation as a structured trajectory through representation space. We show that mathematical reasoning traverses functionally ordered, step-specific subspaces that become increasingly separable with layer depth. This structure already...
Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction
arXiv:2604.05477v1 Announce Type: new Abstract: Autonomous GUI agents based on vision-language models (VLMs) often assume deterministic environment responses, generating actions without verifying whether previous operations succeeded. In real-world settings with network latency, rendering delays, and system interruptions, this assumption leads...
Learning-Based Multi-Criteria Decision Making Model for Sawmill Location Problems
arXiv:2604.04996v1 Announce Type: new Abstract: Strategically locating a sawmill is vital for enhancing the efficiency, profitability, and sustainability of timber supply chains. Our study proposes a Learning-Based Multi-Criteria Decision-Making (LB-MCDM) framework that integrates machine learning (ML) with GIS-based spatial location...
PRIME: Prototype-Driven Multimodal Pretraining for Cancer Prognosis with Missing Modalities
arXiv:2604.04999v1 Announce Type: new Abstract: Multimodal self-supervised pretraining offers a promising route to cancer prognosis by integrating histopathology whole-slide images, gene expression, and pathology reports, yet most existing approaches require fully paired and complete inputs. In practice, clinical cohorts are...
Enhancing sample efficiency in reinforcement-learning-based flow control: replacing the critic with an adaptive reduced-order model
arXiv:2604.04986v1 Announce Type: new Abstract: Model-free deep reinforcement learning (DRL) methods suffer from poor sample efficiency. To overcome this limitation, this work introduces an adaptive reduced-order-model (ROM)-based reinforcement learning framework for active flow control. In contrast to conventional actor-critic architectures,...
Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time Series
arXiv:2604.05064v1 Announce Type: new Abstract: Synthetic data is essential for training foundation models for time series (FMTS), but most generators assume static correlations and typically miss realistic inter-channel dependencies. We introduce DynLMC, a Dynamic Linear Model of Coregionalization that...
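The coregionalization idea the abstract builds on is standard: channels are linear mixtures of a few latent processes, and making the mixing matrix time-varying induces dynamic inter-channel correlations. A minimal sketch (the latent AR(1) processes and rotating loadings are illustrative; DynLMC's actual construction is not given in the truncated abstract):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes are illustrative: T time steps, q latent factors, c observed channels.
T, q, c = 500, 2, 4

# Latent AR(1) factor processes.
u = np.zeros((T, q))
for t in range(1, T):
    u[t] = 0.95 * u[t - 1] + rng.normal(size=q)

# Time-varying coregionalization: loadings rotate slowly over time,
# so the channel-to-channel correlation structure drifts.
B0 = rng.normal(size=(c, q))
Y = np.zeros((T, c))
for t in range(T):
    theta = 2 * np.pi * t / T
    Rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    Y[t] = B0 @ Rot @ u[t] + 0.1 * rng.normal(size=c)
```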
Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning
arXiv:2604.05134v1 Announce Type: new Abstract: How can you get a language model to reason in a task it natively struggles with? We study how reasoning evolves in a language model -- from supervised fine-tuning (SFT) to reinforcement learning (RL) --...
Learning Stable Predictors from Weak Supervision under Distribution Shift
arXiv:2604.05002v1 Announce Type: new Abstract: Learning from weak or proxy supervision is common when ground-truth labels are unavailable, yet robustness under distribution shift remains poorly understood, especially when the supervision mechanism itself changes. We formalize this as supervision drift, defined...
Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO
arXiv:2604.04983v1 Announce Type: new Abstract: We present Territory Paint Wars, a minimal competitive multi-agent reinforcement learning environment implemented in Unity, and use it to systematically investigate failure modes of Proximal Policy Optimisation (PPO) under self-play. A first agent trained for...
Learning What Matters: Dynamic Dimension Selection and Aggregation for Interpretable Vision-Language Reward Modeling
arXiv:2604.05445v1 Announce Type: new Abstract: Vision-language reward modeling faces a dilemma: generative approaches are interpretable but slow, while discriminative ones are efficient but act as opaque "black boxes." To bridge this gap, we propose VL-MDR (Vision-Language Multi-Dimensional Reward), a framework...
Expectation Maximization (EM) Converges for General Agnostic Mixtures
arXiv:2604.05842v1 Announce Type: new Abstract: Mixture of linear regression is well studied in statistics and machine learning, where the data points are generated probabilistically using $k$ linear models. Algorithms like Expectation Maximization (EM) may be used to recover the ground...
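As context for the setting the abstract describes, the classic EM loop for a mixture of linear regressions alternates soft assignments with weighted least squares. A generic sketch for k = 2 components (synthetic data and all parameter choices are illustrative, not the paper's agnostic setting or its convergence analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from k=2 ground-truth linear models (illustrative only).
n, d, k = 400, 3, 2
X = rng.normal(size=(n, d))
true_w = np.array([[2.0, -1.0, 0.5], [-1.5, 0.5, 1.0]])
z = rng.integers(0, k, size=n)
y = np.einsum("nd,nd->n", X, true_w[z]) + 0.1 * rng.normal(size=n)

# EM state: per-component regression weights, mixing proportions, noise.
w = rng.normal(size=(k, d))
pi = np.full(k, 1.0 / k)
sigma2 = 1.0

for _ in range(100):
    # E-step: responsibilities under a Gaussian noise model.
    resid = y[:, None] - X @ w.T                   # (n, k) residuals
    log_p = -0.5 * resid**2 / sigma2 + np.log(pi)
    log_p -= log_p.max(axis=1, keepdims=True)      # stabilize the softmax
    r = np.exp(log_p)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: weighted least squares for each component.
    for j in range(k):
        A = X.T @ (r[:, j, None] * X)
        b = X.T @ (r[:, j] * y)
        w[j] = np.linalg.solve(A, b)
    pi = r.mean(axis=0)
    resid = y[:, None] - X @ w.T
    sigma2 = (r * resid**2).sum() / n
```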
The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Models
arXiv:2604.05923v1 Announce Type: new Abstract: State space models (SSMs) have been shown to possess the theoretical capacity to model both star-free sequential tasks and bounded hierarchical structures (Sarrof et al., 2024). However, formal expressivity results do not guarantee that gradient-based...
Energy-Based Dynamical Models for Neurocomputation, Learning, and Optimization
arXiv:2604.05042v1 Announce Type: new Abstract: Recent advances at the intersection of control theory, neuroscience, and machine learning have revealed novel mechanisms by which dynamical systems perform computation. These advances encompass a wide range of conceptual, mathematical, and computational ideas, with...
Improving Sparse Memory Finetuning
arXiv:2604.05248v1 Announce Type: new Abstract: Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, like full finetuning or parameter-efficient methods (e.g., LoRA),...
Training Without Orthogonalization, Inference With SVD: A Gradient Analysis of Rotation Representations
arXiv:2604.05414v1 Announce Type: new Abstract: Recent work has shown that removing orthogonalization during training and applying it only at inference improves rotation estimation in deep learning, with empirical evidence favoring 9D representations with SVD projection. However, the theoretical understanding of...
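The 9D representation with SVD projection that the abstract refers to has a standard closed form: treat the 9 outputs as a 3x3 matrix and project it onto SO(3). A minimal sketch of that projection (illustrative, independent of the paper's gradient analysis):

```python
import numpy as np

def svd_orthogonalize(M):
    # Closest rotation to M in Frobenius norm: R = U diag(1, 1, det(U V^T)) V^T.
    # The sign correction keeps det(R) = +1 (a proper rotation, not a reflection).
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt

# Usage: project an arbitrary (e.g. network-predicted) 3x3 matrix onto SO(3).
M = np.random.default_rng(0).normal(size=(3, 3))
R = svd_orthogonalize(M)
```

In the training recipe the abstract describes, this projection would be applied only at inference, with the raw 9D output supervised directly during training.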
Shadow Derivatives: The Quiet Propertization of AI Learning
Introduction Artificial intelligence (AI) systems learn. In today’s AI markets, durable advantage comes less from any single output than from the learning that accumulates through training, fine-tuning, and downstream feedback loops.[1] Each interaction, correction, and deployment contributes incrementally to improved...
Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem
arXiv:2604.05195v1 Announce Type: new Abstract: Unlike traditional homogeneous routing problems, the Heterogeneous Fleet Vehicle Routing Problem (HFVRP) involves heterogeneous fixed costs, variable travel costs, and capacity constraints, rendering solution quality highly sensitive to vehicle selection. Furthermore, real-world logistics applications often...
Reproducing AlphaZero on Tablut: Self-Play RL for an Asymmetric Board Game
arXiv:2604.05476v1 Announce Type: new Abstract: This work investigates the adaptation of the AlphaZero reinforcement learning algorithm to Tablut, an asymmetric historical board game featuring unequal piece counts and distinct player objectives (king capture versus king escape). While the original AlphaZero...
Optimal-Transport-Guided Functional Flow Matching for Turbulent Field Generation in Hilbert Space
arXiv:2604.05700v1 Announce Type: new Abstract: High-fidelity modeling of turbulent flows requires capturing complex spatiotemporal dynamics and multi-scale intermittency, posing a fundamental challenge for traditional knowledge-based systems. While deep generative models, such as diffusion models and Flow Matching, have shown promising...
Top-K Retrieval with Fixed-Size Linear-Attention Completion: Backbone- and KV-Format-Preserving Attention for KV-Cache Read Reduction
arXiv:2604.05438v1 Announce Type: new Abstract: Long-context generation is increasingly limited by decode-time key-value (KV) cache traffic, particularly when KV is offloaded beyond GPU memory. Query-aware retrieval (e.g., Top-K selection) reduces this traffic by loading only a subset of KV pairs,...
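The query-aware Top-K retrieval the abstract starts from can be sketched in a few lines: score every cached key against the current query, then load and attend over only the k highest-scoring KV pairs. A minimal single-head sketch (sizes and the NumPy setting are illustrative; the paper's linear-attention completion of the unselected entries is not shown):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: a long cached context and one decode-step query.
seq_len, d_head, top_k = 1024, 64, 32
K = rng.normal(size=(seq_len, d_head)).astype(np.float32)  # cached keys
V = rng.normal(size=(seq_len, d_head)).astype(np.float32)  # cached values
q = rng.normal(size=(d_head,)).astype(np.float32)          # current query

# Query-aware retrieval: score all keys, keep only the top-k KV pairs,
# so only k rows of K/V need to be read from slow (offloaded) memory.
scores = K @ q / np.sqrt(d_head)
idx = np.argpartition(scores, -top_k)[-top_k:]

# Attention restricted to the retrieved subset.
s = scores[idx]
p = np.exp(s - s.max())
p /= p.sum()
out_topk = p @ V[idx]
```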
Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning
arXiv:2604.05834v1 Announce Type: new Abstract: Multimodal contrastive learning is increasingly enriched by going beyond image-text pairs. Among recent contrastive methods, Symile is a strong approach for this challenge because its multiplicative interaction objective captures higher-order cross-modal dependence. Yet, we find...
EEG-MFTNet: An Enhanced EEGNet Architecture with Multi-Scale Temporal Convolutions and Transformer Fusion for Cross-Session Motor Imagery Decoding
arXiv:2604.05843v1 Announce Type: new Abstract: Brain-computer interfaces (BCIs) enable direct communication between the brain and external devices, providing critical support for individuals with motor impairments. However, accurate motor imagery (MI) decoding from electroencephalography (EEG) remains challenging due to noise and...
Graph Topology Information Enhanced Heterogeneous Graph Representation Learning
arXiv:2604.05732v1 Announce Type: new Abstract: Real-world heterogeneous graphs are inherently noisy and rarely have graph structures that are optimal for downstream tasks, which often degrades the performance of graph representation learning (GRL) models. Although Graph Structure Learning (GSL) methods...