CLOT: Closed-Loop Global Motion Tracking for Whole-Body Humanoid Teleoperation
arXiv:2602.15060v1 Announce Type: cross Abstract: Long-horizon whole-body humanoid teleoperation remains challenging due to accumulated global pose drift, particularly on full-sized humanoids. Although recent learning-based tracking methods enable agile and coordinated motions, they typically operate in the robot's local frame and...
Safe-SDL: Establishing Safety Boundaries and Control Mechanisms for AI-Driven Self-Driving Laboratories
arXiv:2602.15061v1 Announce Type: cross Abstract: The emergence of Self-Driving Laboratories (SDLs) transforms scientific discovery methodology by integrating AI with robotic automation to create closed-loop experimental systems capable of autonomous hypothesis generation, experimentation, and analysis. While promising to compress research timelines...
Structural Divergence Between AI-Agent and Human Social Networks in Moltbook
arXiv:2602.15064v1 Announce Type: cross Abstract: Large populations of AI agents are increasingly embedded in online environments, yet little is known about how their collective interaction patterns compare to human social systems. Here, we analyze the full interaction network of Moltbook,...
ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling
arXiv:2602.15537v1 Announce Type: new Abstract: Pure speech language models aim to learn language directly from raw audio without textual resources. A key challenge is that discrete tokens from self-supervised speech encoders result in excessively long sequences, motivating recent work on...
Evidence-Grounded Subspecialty Reasoning: Evaluating a Curated Clinical Intelligence Layer on the 2025 Endocrinology Board-Style Examination
arXiv:2602.16050v1 Announce Type: new Abstract: Background: Large language models have demonstrated strong performance on general medical examinations, but subspecialty clinical reasoning remains challenging due to rapidly evolving guidelines and nuanced evidence hierarchies. Methods: We evaluated January Mirror, an evidence-grounded clinical...
Leveraging Large Language Models for Causal Discovery: a Constraint-based, Argumentation-driven Approach
arXiv:2602.16481v1 Announce Type: new Abstract: Causal discovery seeks to uncover causal relations from data, typically represented as causal graphs, and is essential for predicting the effects of interventions. While expert knowledge is required to construct principled causal graphs, many statistical...
Redefining boundaries in innovation and knowledge domains: Investigating the impact of generative artificial intelligence on copyright and intellectual property rights
Narrative Theory-Driven LLM Methods for Automatic Story Generation and Understanding: A Survey
arXiv:2602.15851v1 Announce Type: cross Abstract: Applications of narrative theories using large language models (LLMs) deliver promising use-cases in automatic story generation and understanding tasks. Our survey examines how natural language processing (NLP) research engages with fields of narrative studies, and...
Kalman-Inspired Runtime Stability and Recovery in Hybrid Reasoning Systems
arXiv:2602.15855v1 Announce Type: cross Abstract: Hybrid reasoning systems that combine learned components with model-based inference are increasingly deployed in tool-augmented decision loops, yet their runtime behavior under partial observability and sustained evidence mismatch remains poorly understood. In practice, failures often...
Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective
arXiv:2602.15856v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) effectively grounds Large Language Models (LLMs) with external knowledge and is widely applied to Web-related tasks. However, its scalability is hindered by excessive context length and redundant retrievals. Recent research on soft...
IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages
arXiv:2602.16832v1 Announce Type: new Abstract: Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities understudied. We introduce Indic Jailbreak Robustness (IJR), a judge-free benchmark for adversarial safety across 12 Indic and South...
Narrow fine-tuning erodes safety alignment in vision-language agents
arXiv:2602.16931v1 Announce Type: new Abstract: Lifelong multimodal agents must continuously adapt to new tasks through post-training, but this creates fundamental tension between acquiring capabilities and preserving safety alignment. We demonstrate that fine-tuning aligned vision-language models on narrow-domain harmful datasets induces...
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
arXiv:2602.17053v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We introduce a formal framework for reasoning faithfulness, defined...
Web Verbs: Typed Abstractions for Reliable Task Composition on the Agentic Web
arXiv:2602.17245v1 Announce Type: new Abstract: The Web is evolving from a medium that humans browse to an environment where software agents act on behalf of users. Advances in large language models (LLMs) make natural language a practical interface for goal-directed...
Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect
arXiv:2602.16852v1 Announce Type: new Abstract: Meenzerisch, the dialect spoken in the German city of Mainz, is also the traditional language of the Mainz carnival, a yearly celebration well known throughout Germany. However, Meenzerisch is on the verge of dying out -- a...
Eigenmood Space: Uncertainty-Aware Spectral Graph Analysis of Psychological Patterns in Classical Persian Poetry
arXiv:2602.16959v1 Announce Type: new Abstract: Classical Persian poetry is a historically sustained archive in which affective life is expressed through metaphor, intertextual convention, and rhetorical indirection. These properties make close reading indispensable while limiting reproducible comparison at scale. We present...
Large Language Models Persuade Without Planning Theory of Mind
arXiv:2602.17045v1 Announce Type: new Abstract: A growing body of work attempts to evaluate the theory of mind (ToM) abilities of humans and large language models (LLMs) using static, non-interactive question-and-answer benchmarks. However, theoretical work in the field suggests that first-personal...
ALPS: A Diagnostic Challenge Set for Arabic Linguistic & Pragmatic Reasoning
arXiv:2602.17054v1 Announce Type: new Abstract: While recent Arabic NLP benchmarks focus on scale, they often rely on synthetic or translated data which may benefit from deeper linguistic verification. We introduce ALPS (Arabic Linguistic & Pragmatic Suite), a native, expert-curated diagnostic...
The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI
arXiv:2602.17127v1 Announce Type: new Abstract: As Large Language Models (LLMs) transition from standalone chat interfaces to foundational reasoning layers in multi-agent systems and recursive evaluation loops (LLM-as-a-judge), the detection of durable, provider-level behavioral signatures becomes a critical requirement for safety...
Fine-Grained Uncertainty Quantification for Long-Form Language Model Outputs: A Comparative Study
arXiv:2602.17431v1 Announce Type: new Abstract: Uncertainty quantification has emerged as an effective approach to closed-book hallucination detection for LLMs, but existing methods are largely designed for short-form outputs and do not generalize well to long-form generation. We introduce a taxonomy...
AIDG: Evaluating Asymmetry Between Information Extraction and Containment in Multi-Turn Dialogue
arXiv:2602.17443v1 Announce Type: new Abstract: Evaluating the strategic reasoning capabilities of Large Language Models (LLMs) requires moving beyond static benchmarks to dynamic, multi-turn interactions. We introduce AIDG (Adversarial Information Deduction Game), a game-theoretic framework that probes the asymmetry between information...
Auditing Reciprocal Sentiment Alignment: Inversion Risk, Dialect Representation and Intent Misalignment in Transformers
arXiv:2602.17469v1 Announce Type: new Abstract: The core theme of bidirectional alignment is ensuring that AI systems accurately understand human intent and that humans can trust AI behavior. However, this loop fractures significantly across language barriers. Our research addresses Cross-Lingual Sentiment...
Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning
arXiv:2602.17546v1 Announce Type: new Abstract: Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and worsen under adversarial updates. Existing defenses often offer limited protection or force a trade-off between...
Unmasking the Factual-Conceptual Gap in Persian Language Models
arXiv:2602.17623v1 Announce Type: new Abstract: While emerging Persian NLP benchmarks have expanded into pragmatics and politeness, they rarely distinguish between memorized cultural facts and the ability to reason about implicit social norms. We introduce DivanBench, a diagnostic benchmark focused on...
Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking
arXiv:2602.16746v1 Announce Type: new Abstract: Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight...
Attending to Routers Aids Indoor Wireless Localization
arXiv:2602.16762v1 Announce Type: new Abstract: Modern machine learning-based wireless localization using Wi-Fi signals continues to face significant challenges in achieving groundbreaking performance across diverse environments. A major limitation is that most existing algorithms do not appropriately weight the information from...
Position: Why a Dynamical Systems Perspective is Needed to Advance Time Series Modeling
arXiv:2602.16864v1 Announce Type: new Abstract: Time series (TS) modeling has come a long way from early statistical, mainly linear, approaches to the current trend in TS foundation models. With a lot of hype and industrial demand in this field, it...
A Unified Framework for Locality in Scalable MARL
arXiv:2602.16966v1 Announce Type: new Abstract: Scalable Multi-Agent Reinforcement Learning (MARL) is fundamentally challenged by the curse of dimensionality. A common solution is to exploit locality, which hinges on an Exponential Decay Property (EDP) of the value function. However, existing conditions...
Early-Warning Signals of Grokking via Loss-Landscape Geometry
arXiv:2602.16967v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization after prolonged training -- has been linked to confinement on low-dimensional execution manifolds in modular arithmetic. Whether this mechanism extends beyond arithmetic remains open. We study...
Fail-Closed Alignment for Large Language Models
arXiv:2602.16977v1 Announce Type: new Abstract: We identify a structural weakness in current large language model (LLM) alignment: modern refusal mechanisms are fail-open. While existing approaches encode refusal behaviors across multiple latent features, suppressing a single dominant feature -- via prompt-based jailbreaks -- can cause...