DistillLens: Symmetric Knowledge Distillation Through Logit Lens
arXiv:2602.13567v1 Announce Type: new Abstract: Standard Knowledge Distillation (KD) compresses Large Language Models (LLMs) by optimizing final outputs, yet it typically treats the teacher's intermediate layer's thought process as a black box. While feature-based distillation attempts to bridge this gap,...
Elo-Evolve: A Co-evolutionary Framework for Language Model Alignment
arXiv:2602.13575v1 Announce Type: new Abstract: Current alignment methods for Large Language Models (LLMs) rely on compressing vast amounts of human preference data into static, absolute reward functions, leading to data scarcity, noise sensitivity, and training instability. We introduce Elo-Evolve, a...
Metaphors' journeys across time and genre: tracking the evolution of literary metaphors with temporal embeddings
arXiv:2602.13701v1 Announce Type: new Abstract: Metaphors are a distinctive feature of literary language, yet they remain less studied experimentally than everyday metaphors. Moreover, previous psycholinguistic and computational approaches overlooked the temporal dimension, although many literary metaphors were coined centuries apart...
PrivAct: Internalizing Contextual Privacy Preservation via Multi-Agent Preference Training
arXiv:2602.13840v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly deployed in personalized tasks involving sensitive, context-dependent information, where privacy violations may arise in agents' action due to the implicitness of contextual privacy. Existing approaches rely on external,...
Predicting Invoice Dilution in Supply Chain Finance with Leakage Free Two Stage XGBoost, KAN (Kolmogorov Arnold Networks), and Ensemble Models
arXiv:2602.15248v1 Announce Type: new Abstract: Invoice or payment dilution is the gap between the approved invoice amount and the actual collection is a significant source of non credit risk and margin loss in supply chain finance. Traditionally, this risk is...
Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies
arXiv:2602.18291v1 Announce Type: new Abstract: Online Multi-Agent Reinforcement Learning (MARL) is a prominent framework for efficient agent coordination. Crucially, enhancing policy expressiveness is pivotal for achieving superior performance. Diffusion-based generative models are well-positioned to meet this demand, having demonstrated remarkable...
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models
arXiv:2602.17684v1 Announce Type: cross Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large language models by leveraging execution-based feedback from unit tests, but its scalability is fundamentally constrained by the availability and reliability of high-quality...
Agentic Unlearning: When LLM Agent Meets Machine Unlearning
arXiv:2602.17692v1 Announce Type: cross Abstract: In this paper, we introduce \textbf{agentic unlearning} which removes specified information from both model parameters and persistent memory in agents with closed-loop interaction. Existing unlearning methods target parameters alone, leaving two critical gaps: (i) parameter-memory...
A Case Study of Selected PTQ Baselines for Reasoning LLMs on Ascend NPU
arXiv:2602.17693v1 Announce Type: cross Abstract: Post-Training Quantization (PTQ) is crucial for efficient model deployment, yet its effectiveness on Ascend NPU remains under-explored compared to GPU architectures. This paper presents a case study of representative PTQ baselines applied to reasoning-oriented models...
UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems
arXiv:2602.17709v1 Announce Type: cross Abstract: All-atom molecular simulation serves as a quintessential ``computational microscope'' for understanding the machinery of life, yet it remains fundamentally limited by the trade-off between quantum-mechanical (QM) accuracy and biological scale. We present UBio-MolFM, a universal...
On the Dynamics of Observation and Semantics
arXiv:2602.18494v1 Announce Type: new Abstract: A dominant paradigm in visual intelligence treats semantics as a static property of latent representations, assuming that meaning can be discovered through geometric proximity in high dimensional embedding spaces. In this work, we argue that...
Task-Aware Exploration via a Predictive Bisimulation Metric
arXiv:2602.18724v1 Announce Type: new Abstract: Accelerating exploration in visual reinforcement learning under sparse rewards remains challenging due to the substantial task-irrelevant variations. Despite advances in intrinsic exploration, many methods either assume access to low-dimensional states or lack task-aware exploration strategies,...
ABD: Default Exception Abduction in Finite First Order Worlds
arXiv:2602.18843v1 Announce Type: new Abstract: We introduce ABD, a benchmark for default-exception abduction over finite first-order worlds. Given a background theory with an abnormality predicate and a set of relational structures, a model must output a first-order formula that defines...
TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models
arXiv:2602.18884v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs), particularly smaller, deployable variants, exhibit a critical deficiency in understanding temporal and procedural visual data, a bottleneck hindering their application in real-world embodied AI. This gap is largely caused by...
DREAM: Deep Research Evaluation with Agentic Metrics
arXiv:2602.18940v1 Announce Type: new Abstract: Deep Research Agents generate analyst-grade reports, yet evaluating them remains challenging due to the absence of a single ground truth and the multidimensional nature of research quality. Recent benchmarks propose distinct methodologies, yet they suffer...
High Dimensional Procedural Content Generation
arXiv:2602.18943v1 Announce Type: new Abstract: Procedural content generation (PCG) has made substantial progress in shaping static 2D/3D geometry, while most methods treat gameplay mechanics as auxiliary and optimize only over space. We argue that this limits controllability and expressivity, and...
(Perlin) Noise as AI coordinator
arXiv:2602.18947v1 Announce Type: new Abstract: Large scale control of nonplayer agents is central to modern games, while production systems still struggle to balance several competing goals: locally smooth, natural behavior, and globally coordinated variety across space and time. Prior approaches...
Evaluating Large Language Models on Quantum Mechanics: A Comparative Study Across Diverse Models and Tasks
arXiv:2602.19006v1 Announce Type: new Abstract: We present a systematic evaluation of large language models on quantum mechanics problem-solving. Our study evaluates 15 models from five providers (OpenAI, Anthropic, Google, Alibaba, DeepSeek) spanning three capability tiers on 20 tasks covering derivations,...
DoAtlas-1: A Causal Compilation Paradigm for Clinical AI
arXiv:2602.19158v1 Announce Type: new Abstract: Medical foundation models generate narrative explanations but cannot quantify intervention effects, detect evidence conflicts, or validate literature claims, limiting clinical auditability. We propose causal compilation, a paradigm that transforms medical evidence from narrative text into...
Semantic Substrate Theory: An Operator-Theoretic Framework for Geometric Semantic Drift
arXiv:2602.18699v1 Announce Type: new Abstract: Most semantic drift studies report multiple signals e.g., embedding displacement, neighbor changes, distributional divergence, and recursive trajectory instability, without a shared explanatory theory that relates them. This paper proposes a formalization of these signals in...
DeepInnovator: Triggering the Innovative Capabilities of LLMs
arXiv:2602.18920v1 Announce Type: new Abstract: The application of Large Language Models (LLMs) in accelerating scientific discovery has garnered increasing attention, with a key focus on constructing research agents endowed with innovative capability, i.e., the ability to autonomously generate novel and...
Causal Identification from Counterfactual Data: Completeness and Bounding Results
arXiv:2602.23541v1 Announce Type: new Abstract: Previous work establishing completeness results for $\textit{counterfactual identification}$ has been circumscribed to the setting where the input data belongs to observational or interventional distributions (Layers 1 and 2 of Pearl's Causal Hierarchy), since it was...
Planning under Distribution Shifts with Causal POMDPs
arXiv:2602.23545v1 Announce Type: new Abstract: In the real world, planning is often challenged by distribution shifts. As such, a model of the environment obtained under one set of conditions may no longer remain valid as the distribution of states or...
Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction
arXiv:2602.24080v1 Announce Type: new Abstract: The pursuit of human-like conversational agents has long been guided by the Turing test. For modern speech-to-speech (S2S) systems, a critical yet unanswered question is whether they can converse like humans. To tackle this, we...
Artificial Agency Program: Curiosity, compression, and communication in agents
arXiv:2602.24100v1 Announce Type: new Abstract: This paper presents the Artificial Agency Program (AAP), a position and research agenda for building AI systems as reality embedded, resource-bounded agents whose development is driven by curiosity-as-learning-progress under physical and computational constraints. The central...
Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume
arXiv:2602.24195v1 Announce Type: new Abstract: Despite their capabilities, Multimodal Large Language Models (MLLMs) may produce plausible but erroneous outputs, hindering reliable deployment. Accurate uncertainty metrics could enable escalation of unreliable queries to human experts or larger models for improved performance....
Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG
arXiv:2602.23374v1 Announce Type: cross Abstract: The integration of Large Language Models (LLMs) into enterprise knowledge management systems has been catalyzed by the Retrieval-Augmented Generation (RAG) paradigm, which augments parametric memory with non-parametric external data. However, the transition from proof-of-concept to...
Now You See Me: Designing Responsible AI Dashboards for Early-Stage Health Innovation
arXiv:2602.23378v1 Announce Type: cross Abstract: Innovative HealthTech teams develop Artificial Intelligence (AI) systems in contexts where ethical expectations and organizational priorities must be balanced under severe resource constraints. While Responsible AI practices are expected to guide the design and evaluation...
CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
arXiv:2602.23452v1 Announce Type: new Abstract: Scientific research relies on accurate citation for attribution and integrity, yet large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. Such hallucinated citations have already...
TraderBench: How Robust Are AI Agents in Adversarial Capital Markets?
arXiv:2603.00285v1 Announce Type: new Abstract: Evaluating AI agents in finance faces two key challenges: static benchmarks require costly expert annotation yet miss the dynamic decision-making central to real-world trading, while LLM-based judges introduce uncontrolled variance on domain-specific tasks. We introduce...