Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning
arXiv:2602.20722v1 Announce Type: new Abstract: Traditional on-policy Reinforcement Learning with Verifiable Rewards (RLVR) frameworks suffer from experience waste and reward homogeneity, which directly hinder learning efficiency on difficult samples during large language model post-training. In this paper, we introduce Batch...
Predicting Sentence Acceptability Judgments in Multimodal Contexts
arXiv:2602.20918v1 Announce Type: new Abstract: Previous work has examined the capacity of deep neural networks (DNNs), particularly transformers, to predict human sentence acceptability judgments, both independently of context, and in document contexts. We consider the effect of prior exposure to...
LogicGraph: Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification
arXiv:2602.21044v1 Announce Type: new Abstract: Evaluations of large language models (LLMs) primarily emphasize convergent logical reasoning, where success is defined by producing a single correct proof. However, many real-world reasoning problems admit multiple valid derivations, requiring models to explore diverse...
A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives
arXiv:2602.21351v1 Announce Type: new Abstract: The rapid accumulation of Earth science data has created a significant scalability challenge; while repositories like PANGAEA host vast collections of datasets, citation metrics indicate that a substantial portion remains underutilized, limiting data reusability. Here...
Semantic Partial Grounding via LLMs
arXiv:2602.22067v1 Announce Type: new Abstract: Grounding is a critical step in classical planning, yet it often becomes a computational bottleneck due to the exponential growth in grounded actions and atoms as task size increases. Recent advances in partial grounding have...
EQ-5D Classification Using Biomedical Entity-Enriched Pre-trained Language Models and Multiple Instance Learning
arXiv:2602.21216v1 Announce Type: cross Abstract: The EQ-5D (EuroQol 5-Dimensions) is a standardized instrument for the evaluation of health-related quality of life. In health economics, systematic literature reviews (SLRs) depend on the correct identification of publications that use the EQ-5D, but...
Task-Aware LoRA Adapter Composition via Similarity Retrieval in Vector Databases
arXiv:2602.21222v1 Announce Type: cross Abstract: Parameter efficient fine tuning methods like LoRA have enabled task specific adaptation of large language models, but efficiently composing multiple specialized adapters for unseen tasks remains challenging. We present a novel framework for dynamic LoRA...
Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
arXiv:2602.21269v1 Announce Type: cross Abstract: We present Group Orthogonalized Policy Optimization (GOPO), a new alignment algorithm for large language models derived from the geometry of Hilbert function spaces. Instead of optimizing on the probability simplex and inheriting the exponential curvature...
SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning
arXiv:2602.22603v1 Announce Type: new Abstract: Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distributed across multiple webpages and documents. In such tasks, the LLM context is dominated by tokens from external retrieval, causing memory usage to...
Decomposing Physician Disagreement in HealthBench
arXiv:2602.22758v1 Announce Type: new Abstract: We decompose physician disagreement in the HealthBench medical AI evaluation dataset to understand where variance resides and what observable features can explain it. Rubric identity accounts for 15.8% of met/not-met label variance but only 3.6-6.9%...
RepSPD: Enhancing SPD Manifold Representation in EEGs via Dynamic Graphs
arXiv:2602.22981v1 Announce Type: new Abstract: Decoding brain activity from electroencephalography (EEG) is crucial for neuroscience and clinical applications. Among recent advances in deep learning for EEG, geometric learning stands out as its theoretical underpinnings on symmetric positive definite (SPD) allows...
Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design
arXiv:2602.23092v1 Announce Type: new Abstract: The Capacitated Vehicle Routing Problem (CVRP), a fundamental combinatorial optimization challenge, focuses on optimizing fleet operations under vehicle capacity constraints. While extensively studied in operational research, the NP-hard nature of CVRP continues to pose significant...
Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
arXiv:2602.22522v1 Announce Type: new Abstract: Taiwanese Hakka is a low-resource, endangered language that poses significant challenges for automatic speech recognition (ASR), including high dialectal variability and the presence of two distinct writing systems (Hanzi and Pinyin). Traditional ASR models often...
Digital Sovereignty: How Nations Are Asserting Control Over Technology Infrastructure
Countries worldwide are implementing digital sovereignty measures to control data flows, technology standards, and digital infrastructure within their borders.
Improving Neural Argumentative Stance Classification in Controversial Topics with Emotion-Lexicon Features
arXiv:2602.22846v1 Announce Type: new Abstract: Argumentation mining comprises several subtasks, among which stance classification focuses on identifying the standpoint expressed in an argumentative text toward a specific target topic. While arguments, especially about controversial topics, often appeal to emotions, most prior work...
Improving Spatial Allocation for Energy System Coupling with Graph Neural Networks
arXiv:2602.22249v1 Announce Type: new Abstract: In energy system analysis, coupling models with mismatched spatial resolutions is a significant challenge. A common solution is assigning weights to high-resolution geographic units for aggregation, but traditional models are limited by using only a...
Orthogonal Weight Modification Enhances Learning Scalability and Convergence Efficiency without Gradient Backpropagation
arXiv:2602.22259v1 Announce Type: new Abstract: Recognizing the substantial computational cost of backpropagation (BP), non-BP methods have emerged as attractive alternatives for efficient learning on emerging neuromorphic systems. However, existing non-BP approaches still face critical challenges in efficiency and scalability. Inspired...
X-REFINE: XAI-based RElevance input-Filtering and archItecture fiNe-tuning for channel Estimation
arXiv:2602.22277v1 Announce Type: new Abstract: AI-native architectures are vital for 6G wireless communications. The black-box nature and high complexity of deep learning models employed in critical applications, such as channel estimation, limit their practical deployment. While perturbation-based XAI solutions offer...
Reliable XAI Explanations in Sudden Cardiac Death Prediction for Chagas Cardiomyopathy
arXiv:2602.22288v1 Announce Type: new Abstract: Sudden cardiac death (SCD) is unpredictable, and its prediction in Chagas cardiomyopathy (CC) remains a significant challenge, especially in patients not classified as high risk. While AI and machine learning models improve risk stratification, their...
Learning geometry-dependent lead-field operators for forward ECG modeling
arXiv:2602.22367v1 Announce Type: new Abstract: Modern forward electrocardiogram (ECG) computational models rely on an accurate representation of the torso domain. The lead-field method enables fast ECG simulations while preserving full geometric fidelity. Achieving high anatomical accuracy in torso representation is,...
Disentangling Shared and Target-Enriched Topics via Background-Contrastive Non-negative Matrix Factorization
arXiv:2602.22387v1 Announce Type: new Abstract: Biological signals of interest in high-dimensional data are often masked by dominant variation shared across conditions. This variation, arising from baseline biological structure or technical effects, can prevent standard dimensionality reduction methods from resolving condition-specific...
Copyright Protection for AI-Generated Works
Since the 2010s, artificial intelligence (AI) has quickly grown from another subset of machine learning (i.e., deep learning), in particular with recent advances in generative AI such as ChatGPT. The use of generative AI has gone beyond leisure purposes. It...
Precision Medicine and Data Privacy: Balancing Innovation with Patient Rights
The rapid advancement of precision medicine creates unprecedented opportunities for personalized treatment while raising complex data privacy and consent challenges.
Multi-dimensional Assessment and Explainable Feedback for Counselor Responses to Client Resistance in Text-based Counseling with LLMs
arXiv:2602.21638v1 Announce Type: new Abstract: Effectively addressing client resistance is a sophisticated clinical skill in psychological counseling, yet practitioners often lack timely and scalable supervisory feedback to refine their approaches. Although current NLP research has examined overall counseling quality and...
SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks
arXiv:2602.21307v1 Announce Type: new Abstract: Symbolic distillation replaces neural networks, or components thereof, with interpretable, closed-form mathematical expressions. This approach has shown promise in discovering physical laws and mathematical relationships directly from trained deep learning models, yet adoption remains limited...
Interleaved Head Attention
arXiv:2602.21371v1 Announce Type: new Abstract: Multi-Head Attention (MHA) is the core computational primitive underlying modern Large Language Models (LLMs). However, MHA suffers from a fundamental linear scaling limitation: $H$ attention heads produce exactly $H$ independent attention matrices, with no communication...
Benchmarking State Space Models, Transformers, and Recurrent Networks for US Grid Forecasting
arXiv:2602.21415v1 Announce Type: new Abstract: Selecting the right deep learning model for power grid forecasting is challenging, as performance heavily depends on the data available to the operator. This paper presents a comprehensive benchmark of five modern neural architectures: two...
When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training
arXiv:2602.21454v1 Announce Type: new Abstract: Recurrent neural networks (RNNs) can be interpreted as discrete-time state-space models, where the state evolution corresponds to an infinite-impulse-response (IIR) filtering operation governed by both feedforward weights and recurrent poles. While, in principle, all parameters...
Geometric Priors for Generalizable World Models via Vector Symbolic Architecture
arXiv:2602.21467v1 Announce Type: new Abstract: A key challenge in artificial intelligence and neuroscience is understanding how neural systems learn representations that capture the underlying dynamics of the world. Most world models represent the transition function with unstructured neural networks, limiting...
GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning
arXiv:2602.21492v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a central post-training paradigm for large language models (LLMs), but its performance is highly sensitive to the quality of training problems. This sensitivity stems from the non-stationarity of RL: rollouts...