Decoupling Strategy and Execution in Task-Focused Dialogue via Goal-Oriented Preference Optimization
arXiv:2602.15854v1 Announce Type: cross Abstract: Large language models show potential in task-oriented dialogue systems, yet existing training methods often rely on token-level likelihood or preference optimization, which poorly align with long-horizon task success. To address this, we propose Goal-Oriented Preference...
Kalman-Inspired Runtime Stability and Recovery in Hybrid Reasoning Systems
arXiv:2602.15855v1 Announce Type: cross Abstract: Hybrid reasoning systems that combine learned components with model-based inference are increasingly deployed in tool-augmented decision loops, yet their runtime behavior under partial observability and sustained evidence mismatch remains poorly understood. In practice, failures often...
Enhancing Action and Ingredient Modeling for Semantically Grounded Recipe Generation
arXiv:2602.15862v1 Announce Type: cross Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have enabled recipe generation from food images, yet outputs often contain semantically incorrect actions or ingredients despite high lexical scores (e.g., BLEU, ROUGE). To address this gap,...
Test-Time Adaptation for Tactile-Vision-Language Models
arXiv:2602.15873v1 Announce Type: cross Abstract: Tactile-vision-language (TVL) models are increasingly deployed in real-world robotic and multimodal perception tasks, where test-time distribution shifts are unavoidable. Existing test-time adaptation (TTA) methods provide filtering in unimodal settings but lack explicit treatment of modality-wise...
IT-OSE: Exploring Optimal Sample Size for Industrial Data Augmentation
arXiv:2602.15878v1 Announce Type: cross Abstract: In industrial scenarios, data augmentation is an effective approach to improve model performance. However, its effect is not uniformly beneficial. There is no theoretical research or established estimation for the optimal sample size (OSS) in...
FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution
arXiv:2602.15882v1 Announce Type: cross Abstract: General vision-language models increasingly support unified spatiotemporal reasoning over long video streams, yet deploying such capabilities on robots remains constrained by the prohibitive latency of processing long-horizon histories and generating high-dimensional future predictions. To bridge...
NeuroSleep: Neuromorphic Event-Driven Single-Channel EEG Sleep Staging for Edge-Efficient Sensing
arXiv:2602.15888v1 Announce Type: cross Abstract: Reliable, continuous neural sensing on wearable edge platforms is fundamental to long-term health monitoring; however, for electroencephalography (EEG)-based sleep monitoring, dense high-frequency processing is often computationally prohibitive under tight energy budgets. To address this bottleneck,...
Surrogate Modeling for Neutron Transport: A Neural Operator Approach
arXiv:2602.15890v1 Announce Type: cross Abstract: This work introduces a neural operator based surrogate modeling framework for neutron transport computation. Two architectures, the Deep Operator Network (DeepONet) and the Fourier Neural Operator (FNO), were trained for fixed source problems to learn...
Improved Upper Bounds for Slicing the Hypercube
arXiv:2602.16807v1 Announce Type: new Abstract: A collection of hyperplanes $\mathcal{H}$ slices all edges of the $n$-dimensional hypercube $Q_n$ with vertex set $\{-1,1\}^n$ if, for every edge $e$ in the hypercube, there exists a hyperplane in $\mathcal{H}$ intersecting $e$ in its...
An order-oriented approach to scoring hesitant fuzzy elements
arXiv:2602.16827v1 Announce Type: new Abstract: Traditional scoring approaches on hesitant fuzzy sets often lack a formal base in order theory. This paper proposes a unified framework, where each score is explicitly defined with respect to a given order. This order-oriented...
Narrow fine-tuning erodes safety alignment in vision-language agents
arXiv:2602.16931v1 Announce Type: new Abstract: Lifelong multimodal agents must continuously adapt to new tasks through post-training, but this creates a fundamental tension between acquiring capabilities and preserving safety alignment. We demonstrate that fine-tuning aligned vision-language models on narrow-domain harmful datasets induces...
HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing
arXiv:2602.16976v1 Announce Type: new Abstract: Financial risk systems usually follow a two-step routine: a model predicts return or risk, and then an optimizer makes a decision such as a...
Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases
arXiv:2602.17001v1 Announce Type: new Abstract: Natural Language Querying for Time Series Databases (NLQ4TSDB) aims to help non-expert users retrieve meaningful events, intervals, and summaries from massive temporal records. However, existing Text-to-SQL methods are not designed for continuous morphological intents such...
Cinder: A fast and fair matchmaking system
arXiv:2602.17015v1 Announce Type: new Abstract: A fair and fast matchmaking system is an important component of modern multiplayer online games, directly impacting player retention and satisfaction. However, creating fair matches between lobbies (pre-made teams) of heterogeneous skill levels presents a...
M2F: Automated Formalization of Mathematical Literature at Scale
arXiv:2602.17016v1 Announce Type: new Abstract: Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and research papers is largely unaddressed, as it requires managing cross-file dependencies, resolving imports, and ensuring...
IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents
arXiv:2602.17049v1 Announce Type: new Abstract: Computer-use agents operate over long horizons under noisy perception, multi-window contexts, and evolving environment states. Existing approaches, from RL-based planners to trajectory retrieval, often drift from user intent and repeatedly solve routine subproblems, leading to error...
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
arXiv:2602.17053v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We introduce a formal framework for reasoning faithfulness, defined...
Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization
arXiv:2602.17066v1 Announce Type: new Abstract: We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construction. Unlike curriculum learning approaches that require predefined difficulty metrics or hard...
Texo: Formula Recognition within 20M Parameters
arXiv:2602.17189v1 Announce Type: new Abstract: In this paper we present Texo, a minimalist yet high-performance formula recognition model that contains only 20 million parameters. By attentive design, distillation and transfer of the vocabulary and the tokenizer, Texo achieves comparable performance...
Continual learning and refinement of causal models through dynamic predicate invention
arXiv:2602.17217v1 Announce Type: new Abstract: Efficiently navigating complex environments requires agents to internalize the underlying logic of their world, yet standard world modelling methods often struggle with sample inefficiency, lack of transparency, and poor scalability. We propose a framework for...
One-step Language Modeling via Continuous Denoising
arXiv:2602.16813v1 Announce Type: new Abstract: Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. In practice, however, they exhibit a sharp degradation of sample quality in the few-step regime,...
BanglaSummEval: Reference-Free Factual Consistency Evaluation for Bangla Summarization
arXiv:2602.16843v1 Announce Type: new Abstract: Evaluating factual consistency is essential for reliable text summarization, particularly in high-stakes domains such as healthcare and news. However, most existing evaluation metrics overlook Bangla, a widely spoken yet under-resourced language, and often depend on...
When Semantic Overlap Is Not Enough: Cross-Lingual Euphemism Transfer Between Turkish and English
arXiv:2602.16957v1 Announce Type: new Abstract: Euphemisms substitute socially sensitive expressions, often softening or reframing meaning, and their reliance on cultural and pragmatic context complicates modeling across languages. In this study, we investigate how cross-lingual equivalence influences transfer in multilingual euphemism...
Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History
arXiv:2602.17003v1 Announce Type: new Abstract: Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user...
Evaluating Cross-Lingual Classification Approaches Enabling Topic Discovery for Multilingual Social Media Data
arXiv:2602.17051v1 Announce Type: new Abstract: Analysing multilingual social media discourse remains a major challenge in natural language processing, particularly when large-scale public debates span diverse languages. This study investigates how different approaches for cross-lingual text classification can support reliable...
ALPS: A Diagnostic Challenge Set for Arabic Linguistic & Pragmatic Reasoning
arXiv:2602.17054v1 Announce Type: new Abstract: While recent Arabic NLP benchmarks focus on scale, they often rely on synthetic or translated data which may benefit from deeper linguistic verification. We introduce ALPS (Arabic Linguistic & Pragmatic Suite), a native, expert-curated diagnostic...
Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests
arXiv:2602.17108v1 Announce Type: new Abstract: The Thematic Apperception Test (TAT) is a psychometrically grounded, multidimensional assessment framework that systematically differentiates between cognitive-representational and affective-relational components of personality-like functioning. This test is a projective psychological framework designed to uncover unconscious aspects of...
What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform
arXiv:2602.17194v1 Announce Type: new Abstract: Text-based telemedicine has become a common mode of care, requiring clinicians to deliver medical advice clearly and effectively in writing. As platforms increasingly rely on patient ratings and feedback, clinicians face growing pressure to maintain...
Representation Collapse in Machine Translation Through the Lens of Angular Dispersion
arXiv:2602.17287v1 Announce Type: new Abstract: Modern neural translation models based on the Transformer architecture are known for their high performance, particularly when trained on high-resource datasets. A standard next-token prediction training strategy, while widely adopted in practice, may lead to...