Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
arXiv:2602.19548v1 Announce Type: new Abstract: One of the first pre-processing steps for constructing web-scale LLM pretraining datasets involves extracting text from HTML. Despite the immense diversity of web content, existing open-source datasets predominantly apply a single fixed extractor to all...
Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework
arXiv:2602.19549v1 Announce Type: new Abstract: Visual Document Retrieval (VDR), which aims to retrieve relevant pages within vast corpora of visually-rich documents, is of significance in current multimodal retrieval applications. The state-of-the-art multi-vector paradigm excels in performance but suffers from prohibitive...
KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge
arXiv:2602.19643v1 Announce Type: new Abstract: Large Language Models (LLMs) possess a remarkable capacity to generate persuasive and intelligible language. However, coherence does not equate to truthfulness, as the responses often contain subtle hallucinations. Existing benchmarks are limited by static and...
Revisiting the Seasonal Trend Decomposition for Enhanced Time Series Forecasting
arXiv:2602.18465v1 Announce Type: new Abstract: Time series forecasting presents significant challenges in real-world applications across various domains. Building upon the decomposition of the time series, we enhance the architecture of machine learning models for better multivariate time series forecasting. To...
Weak-Form Evolutionary Kolmogorov-Arnold Networks for Solving Partial Differential Equations
arXiv:2602.18515v1 Announce Type: new Abstract: Partial differential equations (PDEs) form a central component of scientific computing. Among recent advances in deep learning, evolutionary neural networks have been developed to successively capture the temporal dynamics of time-dependent PDEs via parameter evolution....
Wide Open Gazes: Quantifying Visual Exploratory Behavior in Soccer with Pose Enhanced Positional Data
arXiv:2602.18519v1 Announce Type: new Abstract: Traditional approaches to measuring visual exploratory behavior in soccer rely on counting visual exploratory actions (VEAs) based on rapid head movements exceeding 125{\deg}/s, but this method suffer from player position bias (i.e., a focus on...
The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
arXiv:2602.18523v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization long after near-zero training loss -- has been studied mainly in single-task settings. We extend geometric analysis to multi-task modular arithmetic, training shared-trunk Transformers on dual-task...
Audio-Visual Continual Test-Time Adaptation without Forgetting
arXiv:2602.18528v1 Announce Type: new Abstract: Audio-visual continual test-time adaptation involves continually adapting a source audio-visual model at test-time, to unlabeled non-stationary domains, where either or both modalities can be distributionally shifted, which hampers online cross-modal learning and eventually leads to...
Deep Reinforcement Learning for Optimizing Energy Consumption in Smart Grid Systems
arXiv:2602.18531v1 Announce Type: new Abstract: The energy management problem in the context of smart grids is inherently complex due to the interdependencies among diverse system components. Although Reinforcement Learning (RL) has been proposed for solving Optimal Power Flow (OPF) problems,...
Learning Beyond Optimization: Stress-Gated Dynamical Regime Regulation in Autonomous Systems
arXiv:2602.18581v1 Announce Type: new Abstract: Despite their apparent diversity, modern machine learning methods can be reduced to a remarkably simple core principle: learning is achieved by continuously optimizing parameters to minimize or maximize a scalar objective function. This paradigm has...
Online decoding of rat self-paced locomotion speed from EEG using recurrent neural networks
arXiv:2602.18637v1 Announce Type: new Abstract: $\textit{Objective.}$ Accurate neural decoding of locomotion holds promise for advancing rehabilitation, prosthetic control, and understanding neural correlates of action. Recent studies have demonstrated decoding of locomotion kinematics across species on motorized treadmills. However, efforts to...
Adaptive Time Series Reasoning via Segment Selection
arXiv:2602.18645v1 Announce Type: new Abstract: Time series reasoning tasks often start with a natural language question and require targeted analysis of a time series. Evidence may span the full series or appear in a few short intervals, so the model...
Information-Guided Noise Allocation for Efficient Diffusion Training
arXiv:2602.18647v1 Announce Type: new Abstract: Training diffusion models typically relies on manually tuned noise schedules, which can waste computation on weakly informative noise regions and limit transfer across datasets, resolutions, and representations. We revisit noise schedule allocation through an information-theoretic...
Global Low-Rank, Local Full-Rank: The Holographic Encoding of Learned Algorithms
arXiv:2602.18649v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization after extended training -- has been linked to the emergence of low-dimensional structure in learning dynamics. Yet neural network parameters inhabit extremely high-dimensional spaces. How can...
Communication-Efficient Personalized Adaptation via Federated-Local Model Merging
arXiv:2602.18658v1 Announce Type: new Abstract: Parameter-efficient fine-tuning methods, such as LoRA, offer a practical way to adapt large vision and language models to client tasks. However, this becomes particularly challenging under task-level heterogeneity in federated deployments. In this regime, personalization...
Large Causal Models for Temporal Causal Discovery
arXiv:2602.18662v1 Announce Type: new Abstract: Causal discovery for both cross-sectional and temporal data has traditionally followed a dataset-specific paradigm, where a new model is fitted for each individual dataset. Such an approach limits the potential of multi-dataset pretraining. The concept...
Robustness of Deep ReLU Networks to Misclassification of High-Dimensional Data
arXiv:2602.18674v1 Announce Type: new Abstract: We present a theoretical study of the robustness of parameterized networks to random input perturbations. Specifically, we analyze local robustness at a given network input by quantifying the probability that a small additive random perturbation...
Transformers for dynamical systems learn transfer operators in-context
arXiv:2602.18679v1 Announce Type: new Abstract: Large-scale foundation models for scientific machine learning adapt to physical settings unseen during training, such as zero-shot transfer between turbulent scales. This phenomenon, in-context learning, challenges conventional understanding of learning and adaptation in physical systems....
In-Context Planning with Latent Temporal Abstractions
arXiv:2602.18694v1 Announce Type: new Abstract: Planning-based reinforcement learning for continuous control is bottlenecked by two practical issues: planning at primitive time scales leads to prohibitive branching and long horizons, while real environments are frequently partially observable and exhibit regime shifts...
Insertion Based Sequence Generation with Learnable Order Dynamics
arXiv:2602.18695v1 Announce Type: new Abstract: In many domains generating variable length sequences through insertions provides greater flexibility over autoregressive models. However, the action space of insertion models is much larger than that of autoregressive models (ARMs) making the learning challenging....
HONEST-CAV: Hierarchical Optimization of Network Signals and Trajectories for Connected and Automated Vehicles with Multi-Agent Reinforcement Learning
arXiv:2602.18740v1 Announce Type: new Abstract: This study presents a hierarchical, network-level traffic flow control framework for mixed traffic consisting of Human-driven Vehicles (HVs), Connected and Automated Vehicles (CAVs). The framework jointly optimizes vehicle-level eco-driving behaviors and intersection-level traffic signal control...
GLaDiGAtor: Language-Model-Augmented Multi-Relation Graph Learning for Predicting Disease-Gene Associations
arXiv:2602.18769v1 Announce Type: new Abstract: Understanding disease-gene associations is essential for unravelling disease mechanisms and advancing diagnostics and therapeutics. Traditional approaches based on manual curation and literature review are labour-intensive and not scalable, prompting the use of machine learning on...
SGNO: Spectral Generator Neural Operators for Stable Long Horizon PDE Rollouts
arXiv:2602.18801v1 Announce Type: new Abstract: Neural operators provide fast PDE surrogates and often generalize across parameters and resolutions. However, in the short train long test setting, autoregressive rollouts can become unstable. This typically happens for two reasons: one step errors...
Hyperbolic Busemann Neural Networks
arXiv:2602.18858v1 Announce Type: new Abstract: Hyperbolic spaces provide a natural geometry for representing hierarchical and tree-structured data due to their exponential volume growth. To leverage these benefits, neural networks require intrinsic and efficient components that operate directly in hyperbolic space....
Boosting for Vector-Valued Prediction and Conditional Density Estimation
arXiv:2602.18866v1 Announce Type: new Abstract: Despite the widespread use of boosting in structured prediction, a general theoretical understanding of aggregation beyond scalar losses remains incomplete. We study vector-valued and conditional density prediction under general divergences and identify stability conditions under...
HEHRGNN: A Unified Embedding Model for Knowledge Graphs with Hyperedges and Hyper-Relational Edges
arXiv:2602.18897v1 Announce Type: new Abstract: Knowledge Graph(KG) has gained traction as a machine-readable organization of real-world knowledge for analytics using artificial intelligence systems. Graph Neural Network(GNN), is proven to be an effective KG embedding technique that enables various downstream tasks...
OpenAI COO says ‘we have not yet really seen AI penetrate enterprise business processes’
There is a lot of talk around AI agents taking over business processes and claiming that "SaaS is dead." While these predictions have moved SaaS stocks at times, they haven't really come true.
Games That Teach, Chats That Convince: Comparing Interactive and Static Formats for Persuasive Learning
arXiv:2602.17905v1 Announce Type: cross Abstract: Interactive systems such as chatbots and games are increasingly used to persuade and educate on sustainability-related topics, yet it remains unclear how different delivery formats shape learning and persuasive outcomes when content is held constant....
MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance
arXiv:2602.17930v1 Announce Type: cross Abstract: Reinforcement learning (RL) agents often suffer from high sample complexity in sparse or delayed reward settings due to limited prior structure. Large language models (LLMs) can provide subgoal decompositions, plausible trajectories, and abstract priors that...
On the scaling relationship between cloze probabilities and language model next-token prediction
arXiv:2602.17848v1 Announce Type: new Abstract: Recent work has shown that larger language models have better predictive power for eye movement and reading time data. While even the best models under-allocate probability mass to human responses, larger models assign higher-quality estimates...