ReIn: Conversational Error Recovery with Reasoning Inception
arXiv:2602.17022v1 Announce Type: new Abstract: Conversational agents powered by large language models (LLMs) with tool integration achieve strong performance on fixed task-oriented dialogue datasets but remain vulnerable to unanticipated, user-induced errors. Rather than focusing on error prevention, this work focuses...
The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI
arXiv:2602.17127v1 Announce Type: new Abstract: As Large Language Models (LLMs) transition from standalone chat interfaces to foundational reasoning layers in multi-agent systems and recursive evaluation loops (LLM-as-a-judge), the detection of durable, provider-level behavioral signatures becomes a critical requirement for safety...
ABCD: All Biases Come Disguised
arXiv:2602.17445v1 Announce Type: new Abstract: Multiple-choice question (MCQ) benchmarks have been a standard evaluation practice for measuring LLMs' ability to reason and answer knowledge-based questions. Through a synthetic NonsenseQA benchmark, we observe that different LLMs exhibit varying degrees of label-position-few-shot-prompt...
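The label-position bias this abstract probes can be measured by re-asking the same MCQ items under random permutations of the option order and tracking which position the model picks. A minimal sketch (the `predict` callable and the permutation protocol are assumptions for illustration, not the paper's benchmark):

```python
import random

def permute_options(question, options, answer_idx, rng):
    """Shuffle the option order and return the gold answer's new index."""
    perm = list(range(len(options)))
    rng.shuffle(perm)
    shuffled = [options[i] for i in perm]
    return shuffled, perm.index(answer_idx)

def position_bias(predict, items, n_perms=20, seed=0):
    """Fraction of predictions landing on each label position across
    random option permutations; a uniform result suggests no bias."""
    rng = random.Random(seed)
    counts, total = {}, 0
    for _ in range(n_perms):
        for q, opts, ans in items:
            shuffled, _ = permute_options(q, opts, ans, rng)
            pos = predict(q, shuffled)
            counts[pos] = counts.get(pos, 0) + 1
            total += 1
    return {p: c / total for p, c in sorted(counts.items())}

# Toy "model" that always answers position 0, i.e. label "A".
always_a = lambda q, opts: 0
items = [("2+2?", ["4", "5", "6", "7"], 0)]
print(position_bias(always_a, items))  # {0: 1.0}
```

A position-unbiased model would spread mass roughly evenly across positions; the degenerate predictor above concentrates all of it on label "A".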
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning
arXiv:2602.16742v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has been shown effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or...
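The "verifiable rewards" in RLVR are typically programmatic checks of a model answer against a reference. A minimal sketch of such a checker for numeric math answers (the final-number extraction rule is an assumption for illustration, not this dataset's grader):

```python
import re

def verifiable_reward(response: str, gold: str) -> float:
    """Binary RLVR-style reward: 1.0 if the last number in the
    response matches the reference answer exactly, else 0.0."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not nums:
        return 0.0
    return 1.0 if nums[-1] == gold else 0.0

print(verifiable_reward("The area is 12, so the answer is 24.", "24"))  # 1.0
print(verifiable_reward("I think it's 25.", "24"))                      # 0.0
```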
Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking
arXiv:2602.16746v1 Announce Type: new Abstract: Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight...
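The low-dimensional claim can be tested by running PCA over flattened weight snapshots collected during training and checking how much trajectory variance the top directions capture. A minimal sketch with a synthetic rank-1 trajectory (the snapshot protocol is an assumption, not the paper's setup):

```python
import numpy as np

def trajectory_pca(snapshots, k=2):
    """PCA of a training trajectory: rows are flattened weight
    snapshots at successive steps. Returns the fraction of
    trajectory variance captured by the top-k directions."""
    X = np.asarray(snapshots, dtype=float)
    X = X - X.mean(axis=0)                      # center across time
    s = np.linalg.svd(X, compute_uv=False)      # singular values
    var = s ** 2
    return var[:k].sum() / var.sum()

# Synthetic trajectory that truly lives on a line in weight space.
t = np.linspace(0, 1, 50)[:, None]
direction = np.random.default_rng(0).normal(size=(1, 100))
snapshots = t @ direction
print(round(trajectory_pca(snapshots, k=1), 3))  # 1.0
```

For a real grokking run, one would instead stack the attention weights saved at checkpoints and look for a small `k` that already explains most of the variance.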
Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models
arXiv:2602.16793v1 Announce Type: new Abstract: In the past year, custom and unreleased math reasoning models reached gold medal performance on the International Mathematical Olympiad (IMO). Similar performance was then reported using large-scale inference on publicly available models but at prohibitive...
Position: Why a Dynamical Systems Perspective is Needed to Advance Time Series Modeling
arXiv:2602.16864v1 Announce Type: new Abstract: Time series (TS) modeling has come a long way from early statistical, mainly linear, approaches to the current trend in TS foundation models. With a lot of hype and industrial demand in this field, it...
A Unified Framework for Locality in Scalable MARL
arXiv:2602.16966v1 Announce Type: new Abstract: Scalable Multi-Agent Reinforcement Learning (MARL) is fundamentally challenged by the curse of dimensionality. A common solution is to exploit locality, which hinges on an Exponential Decay Property (EDP) of the value function. However, existing conditions...
Towards Fair and Efficient De-identification: Quantifying the Efficiency and Generalizability of De-identification Approaches
arXiv:2602.15869v1 Announce Type: new Abstract: Large language models (LLMs) have shown strong performance on clinical de-identification, the task of identifying sensitive identifiers to protect privacy. However, previous work has not examined their generalizability between formats, cultures, and genders. In this...
Seeing to Generalize: How Visual Data Corrects Binding Shortcuts
arXiv:2602.15183v1 Announce Type: cross Abstract: Vision Language Models (VLMs) are designed to extend Large Language Models (LLMs) with visual capabilities, yet in this work we observe a surprising phenomenon: VLMs can outperform their underlying LLMs on purely text-only tasks, particularly...
How to Train Your Long-Context Visual Document Model
arXiv:2602.15257v1 Announce Type: cross Abstract: We present the first comprehensive, large-scale study of training long-context vision language models up to 344K context, targeting long-document visual question answering with measured transfer to long-context text. While several such strong models are open-weight, namely...

FrameRef: A Framing Dataset and Simulation Testbed for Modeling Bounded Rational Information Health
arXiv:2602.15273v1 Announce Type: cross Abstract: Information ecosystems increasingly shape how people internalize exposure to adverse digital experiences, raising concerns about the long-term consequences for information health. In modern search and recommendation systems, ranking and personalization policies play a central role...
Learning Representations from Incomplete EHR Data with Dual-Masked Autoencoding
arXiv:2602.15159v1 Announce Type: new Abstract: Learning from electronic health records (EHRs) time series is challenging due to irregular sampling, heterogeneous missingness, and the resulting sparsity of observations. Prior self-supervised methods either impute before learning, represent missingness through a...
Hybrid Federated and Split Learning for Privacy Preserving Clinical Prediction and Treatment Optimization
arXiv:2602.15304v1 Announce Type: new Abstract: Collaborative clinical decision support is often constrained by governance and privacy rules that prevent pooling patient-level records across institutions. We present a hybrid privacy-preserving framework that combines Federated Learning (FL) and Split Learning (SL) to...
Benchmarking IoT Time-Series AD with Event-Level Augmentations
arXiv:2602.15457v1 Announce Type: new Abstract: Anomaly detection (AD) for safety-critical IoT time series should be judged at the event level: reliability and earliness under realistic perturbations. Yet many studies still emphasize point-level results on curated base datasets, limiting value for...
Evaluating Federated Learning for Cross-Country Mood Inference from Smartphone Sensing Data
arXiv:2602.15478v1 Announce Type: new Abstract: Mood instability is a key behavioral indicator of mental health, yet traditional assessments rely on infrequent and retrospective reports that fail to capture its continuous nature. Smartphone-based mobile sensing enables passive, in-the-wild mood inference from...
LLM-as-Judge on a Budget
arXiv:2602.15481v1 Announce Type: new Abstract: LLM-as-a-judge has emerged as a cornerstone technique for evaluating large language models by leveraging LLM reasoning to score prompt-response pairs. Since LLM judgments are stochastic, practitioners commonly query each pair multiple times to estimate mean...
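The repeated-query protocol the abstract describes can be budgeted with a simple sequential rule: keep sampling the stochastic judge until the standard error of the mean falls below a tolerance or the call budget runs out. A minimal sketch with a hypothetical stand-in judge (the stopping rule and `noisy_judge` are assumptions for illustration, not the paper's allocation scheme):

```python
import math
import random

def mean_judge_score(judge, pair, budget=16, tol=0.05):
    """Estimate the mean of a stochastic judge score by repeated
    queries, stopping early once the standard error drops below tol
    or the call budget is exhausted."""
    scores = []
    for _ in range(budget):
        scores.append(judge(pair))
        n = len(scores)
        if n >= 2:
            mean = sum(scores) / n
            var = sum((s - mean) ** 2 for s in scores) / (n - 1)
            if math.sqrt(var / n) < tol:
                break
    return sum(scores) / len(scores), len(scores)

# Hypothetical stand-in judge: noisy score around 0.7.
rng = random.Random(0)
noisy_judge = lambda pair: 0.7 + rng.uniform(-0.1, 0.1)
mean, calls = mean_judge_score(noisy_judge, ("prompt", "response"))
print(0.5 < mean < 0.9, calls <= 16)  # True True
```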
Uniform error bounds for quantized dynamical models
arXiv:2602.15586v1 Announce Type: new Abstract: This paper provides statistical guarantees on the accuracy of dynamical models learned from dependent data sequences. Specifically, we develop uniform error bounds that apply to quantized models and imperfect optimization algorithms commonly used in practical...
Call for Tutorial Proposals for CVPR 2026
BLUEPRINT Rebuilding a Legacy: Multimodal Retrieval for Complex Engineering Drawings and Documents
arXiv:2602.13345v1 Announce Type: new Abstract: Decades of engineering drawings and technical records remain locked in legacy archives with inconsistent or missing metadata, making retrieval difficult and often manual. We present Blueprint, a layout-aware multimodal retrieval system designed for large-scale engineering...
Preventing Rank Collapse in Federated Low-Rank Adaptation with Client Heterogeneity
arXiv:2602.13486v1 Announce Type: new Abstract: Federated low-rank adaptation (FedLoRA) has facilitated communication-efficient and privacy-preserving fine-tuning of foundation models for downstream tasks. In practical federated learning scenarios, client heterogeneity in system resources and data distributions motivates heterogeneous LoRA ranks across clients....
Fast Swap-Based Element Selection for Multiplication-Free Dimension Reduction
arXiv:2602.13532v1 Announce Type: new Abstract: In this paper, we propose a fast algorithm for element selection, a multiplication-free form of dimension reduction that produces a dimension-reduced vector by simply selecting a subset of elements from the input. Dimension reduction is...
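Element selection as described here reduces dimension by keeping a subset of coordinates, so the projection itself needs no multiplications. A minimal sketch with a toy variance-based selection rule (the rule is an assumption for illustration, not the paper's fast swap-based algorithm):

```python
import numpy as np

def select_elements(x, idx):
    """Multiplication-free dimension reduction: keep only the
    chosen coordinates of the input vector."""
    return x[idx]

def variance_selection(X, k):
    """Toy selection rule: pick the k coordinates with the largest
    variance across a sample matrix X (rows = samples)."""
    return np.argsort(X.var(axis=0))[::-1][:k]

X = np.array([[1.0, 0.0,  5.0],
              [1.0, 0.0, -5.0],
              [1.0, 0.0,  5.0]])
idx = variance_selection(X, 1)
print(idx.tolist())                         # [2]
print(select_elements(X[0], idx).tolist())  # [5.0]
```

Swap-based methods improve such a selection iteratively by exchanging chosen and unchosen indices; the point of the sketch is only the selection interface itself.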
Benchmark Leakage Trap: Can We Trust LLM-based Recommendation?
arXiv:2602.13626v1 Announce Type: new Abstract: The expanding integration of Large Language Models (LLMs) into recommender systems poses critical challenges to evaluation reliability. This paper identifies and investigates a previously overlooked issue: benchmark data leakage in LLM-based recommendation. This phenomenon occurs...
Cumulative Utility Parity for Fair Federated Learning under Intermittent Client Participation
arXiv:2602.13651v1 Announce Type: new Abstract: In real-world federated learning (FL) systems, client participation is intermittent, heterogeneous, and often correlated with data characteristics or resource constraints. Existing fairness approaches in FL primarily focus on equalizing loss or accuracy conditional on participation,...
Zero-Order Optimization for LLM Fine-Tuning via Learnable Direction Sampling
arXiv:2602.13659v1 Announce Type: new Abstract: Fine-tuning large pretrained language models (LLMs) is a cornerstone of modern NLP, yet its growing memory demands (driven by backpropagation and large optimizer states) limit deployment in resource-constrained settings. Zero-order (ZO) methods bypass backpropagation by...
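Zero-order methods replace backpropagated gradients with finite-difference probes of the loss along random directions. A minimal sketch of the standard two-point estimator with isotropic Gaussian directions (the baseline that learnable direction sampling would improve on; not this paper's method):

```python
import numpy as np

def zo_gradient(loss, w, mu=1e-3, n_dirs=8, rng=None):
    """Two-point zero-order gradient estimate: average of
    directional-derivative probes along random Gaussian directions,
    requiring only forward evaluations of the loss."""
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(w)
    for _ in range(n_dirs):
        u = rng.standard_normal(w.shape)
        g += (loss(w + mu * u) - loss(w - mu * u)) / (2 * mu) * u
    return g / n_dirs

# Quadratic toy loss: the true gradient is 2 * w.
loss = lambda w: float(np.dot(w, w))
w = np.array([1.0, -2.0, 3.0])
g = zo_gradient(loss, w, n_dirs=5000)
print(np.round(g, 1))  # close to [2.0, -4.0, 6.0]
```

The estimator is unbiased but high-variance; the abstract's learnable direction sampling targets exactly that variance.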
HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models
arXiv:2602.13710v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models enable instruction-following embodied control, but their large compute and memory footprints hinder deployment on resource-constrained robots and edge platforms. While reducing weights to 1-bit precision through binarization can greatly improve efficiency, existing...
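The 1-bit binarization the abstract refers to is usually `W ≈ alpha * sign(W)` with a closed-form scale. A minimal sketch of the classic XNOR-Net-style per-row scale (an assumption for illustration, not the HBVLA scheme):

```python
import numpy as np

def binarize(W):
    """1-bit post-training quantization: W ~ alpha * sign(W) with
    per-row scale alpha = mean |W|, which minimizes the L2 error
    of the rank-preserving binary approximation."""
    alpha = np.abs(W).mean(axis=1, keepdims=True)
    B = np.sign(W)
    B[B == 0] = 1.0  # map zeros to +1 so every weight is 1-bit
    return alpha, B

W = np.array([[ 0.5, -1.5, 1.0],
              [-0.2,  0.2, 0.2]])
alpha, B = binarize(W)
print([round(a, 3) for a in alpha.ravel()])  # [1.0, 0.2]
print(B.tolist())  # [[1.0, -1.0, 1.0], [-1.0, 1.0, 1.0]]
```

Matrix products with `B` reduce to sign flips and additions; only the final rescale by `alpha` uses full-precision multiplies.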
Discrete Double-Bracket Flows for Isotropic-Noise Invariant Eigendecomposition
arXiv:2602.13759v1 Announce Type: new Abstract: We study matrix-free eigendecomposition under a matrix-vector product (MVP) oracle, where each step observes a covariance operator $C_k = C_{sig} + \sigma_k^2 I + E_k$. Standard stochastic approximation methods either use fixed steps that couple...
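The MVP-oracle setting means the eigensolver only ever sees products $C_k v$. A minimal matrix-free sketch using plain power iteration, which also illustrates why the isotropic $\sigma_k^2 I$ term is benign: it shifts eigenvalues but leaves eigenvectors unchanged (power iteration is the textbook baseline here, not the paper's double-bracket flow):

```python
import numpy as np

def top_eigvec(mvp, dim, iters=200, seed=0):
    """Matrix-free power iteration: recover the leading eigenvector
    of a symmetric operator using only matrix-vector products."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = mvp(v)
        v /= np.linalg.norm(v)
    return v

# Oracle for C = C_sig + sigma^2 * I: the isotropic term shifts the
# spectrum uniformly, so the eigenvectors of C_sig are preserved.
sigma2 = 0.5
C_sig = np.diag([3.0, 1.0, 0.1])
mvp = lambda v: C_sig @ v + sigma2 * v
v = top_eigvec(mvp, 3)
print(np.allclose(np.abs(v), [1.0, 0.0, 0.0], atol=1e-6))  # True
```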