Exploring Human Behavior During Abstract Rule Inference and Problem Solving with the Cognitive Abstraction and Reasoning Corpus
arXiv:2602.22408v1 Announce Type: new Abstract: Humans exhibit remarkable flexibility in abstract reasoning, and can rapidly learn and apply rules from sparse examples. To investigate the cognitive strategies underlying this ability, we introduce the Cognitive Abstraction and Reasoning Corpus (CogARC), a...
Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents
arXiv:2602.22413v1 Announce Type: new Abstract: We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical epistemic voting results, such as the \textit{Condorcet Jury Theorem} (CJT), assume...
How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?
arXiv:2602.22441v1 Announce Type: new Abstract: Latent reasoning has been recently proposed as a reasoning paradigm and performs multi-step reasoning through generating steps in the latent space instead of the textual space. This paradigm enables reasoning beyond discrete language tokens by...
CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines
arXiv:2602.22452v1 Announce Type: new Abstract: A reliable action feasibility scorer is a critical bottleneck in embodied agent pipelines: before any planning or reasoning occurs, the agent must identify which candidate actions are physically executable in the current state. Existing approaches...
ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization
arXiv:2602.22465v1 Announce Type: new Abstract: Large language models are increasingly applied to operational decision-making where the underlying structure is constrained optimization. Existing benchmarks evaluate whether LLMs can formulate optimization problems as solver code, but leave open a complementary question. Can...
VeRO: An Evaluation Harness for Agents to Optimize Agents
arXiv:2602.22480v1 Announce Type: new Abstract: An important emerging application of coding agents is agent optimization: the iterative improvement of a target agent through edit-execute-evaluate cycles. Despite its relevance, the community lacks a systematic understanding of coding agent performance on this...
Mapping the Landscape of Artificial Intelligence in Life Cycle Assessment Using Large Language Models
arXiv:2602.22500v1 Announce Type: new Abstract: Integration of artificial intelligence (AI) into life cycle assessment (LCA) has accelerated in recent years, with numerous studies successfully adapting machine learning algorithms to support various stages of LCA. Despite this rapid development, comprehensive and...
A Mathematical Theory of Agency and Intelligence
arXiv:2602.22519v1 Announce Type: new Abstract: To operate reliably under changing conditions, complex systems require feedback on how effectively they use resources, not just whether objectives are met. Current AI systems process vast information to produce sophisticated predictions, yet predictions can...
Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents
arXiv:2602.22523v1 Announce Type: new Abstract: While contemporary large language models (LLMs) are increasingly capable in isolation, there are still many difficult problems that lie beyond the abilities of a single LLM. For such tasks, there is still uncertainty about how...
Agentic AI for Intent-driven Optimization in Cell-free O-RAN
arXiv:2602.22539v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) is emerging as a key enabler for autonomous radio access networks (RANs), where multiple large language model (LLM)-based agents reason and collaborate to achieve operator-defined intents. The open RAN (O-RAN) architecture...
Requesting Expert Reasoning: Augmenting LLM Agents with Learned Collaborative Intervention
arXiv:2602.22546v1 Announce Type: new Abstract: Large Language Model (LLM) based agents excel at general reasoning but often fail in specialized domains where success hinges on long-tail knowledge absent from their training data. While human experts can provide this missing knowledge,...
Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance
arXiv:2602.22583v1 Announce Type: new Abstract: Example-based guidance is widely used to improve mathematical reasoning at inference time, yet its effectiveness is highly unstable across problems and models-even when the guidance is correct and problem-relevant. We show that this instability arises...
SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning
arXiv:2602.22603v1 Announce Type: new Abstract: Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distributed across multiple webpages and documents. In such tasks, the LLM context is dominated by tokens from external retrieval, causing memory usage to...
AHBid: An Adaptable Hierarchical Bidding Framework for Cross-Channel Advertising
arXiv:2602.22650v1 Announce Type: new Abstract: In online advertising, the inherent complexity and dynamic nature of advertising environments necessitate the use of auto-bidding services to assist advertisers in bid optimization. This complexity is further compounded in multi-channel scenarios, where effective allocation...
Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions
arXiv:2602.22680v1 Announce Type: new Abstract: Large language models have enabled agents that reason, plan, and interact with tools and environments to accomplish complex tasks. As these agents operate over extended interaction horizons, their effectiveness increasingly depends on adapting behavior to...
Knob: A Physics-Inspired Gating Interface for Interpretable and Controllable Neural Dynamics
arXiv:2602.22702v1 Announce Type: new Abstract: Existing neural network calibration methods often treat calibration as a static, post-hoc optimization task. However, this neglects the dynamic and temporal nature of real-world inference. Moreover, existing methods do not provide an intuitive interface enabling...
RLHFless: Serverless Computing for Efficient RLHF
arXiv:2602.22718v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LLM) post-training to align model outputs with human preferences. Recent models, such as DeepSeek-R1, have also shown RLHF's potential to improve...
Generative Data Transformation: From Mixed to Unified Data
arXiv:2602.22743v1 Announce Type: new Abstract: Recommendation model performance is intrinsically tied to the quality, volume, and relevance of their training data. To address common challenges like data sparsity and cold start, recent researchs have leveraged data from multiple auxiliary domains...
Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning
arXiv:2602.22751v1 Announce Type: new Abstract: Large reasoning models (LRMs) have emerged as a powerful paradigm for solving complex real-world tasks. In practice, these models are predominantly trained via Reinforcement Learning with Verifiable Rewards (RLVR), yet most existing outcome-only RLVR pipelines...
Decomposing Physician Disagreement in HealthBench
arXiv:2602.22758v1 Announce Type: new Abstract: We decompose physician disagreement in the HealthBench medical AI evaluation dataset to understand where variance resides and what observable features can explain it. Rubric identity accounts for 15.8% of met/not-met label variance but only 3.6-6.9%...
MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks
arXiv:2602.22808v1 Announce Type: new Abstract: Despite the remarkable progress of large language models (LLMs), the capabilities of standalone LLMs have begun to plateau when tackling real-world, complex tasks that require interaction with external tools and dynamic environments. Although recent agent...
FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics
arXiv:2602.22822v1 Announce Type: new Abstract: The identification and property prediction of chemical molecules is of central importance in the advancement of drug discovery and material science, where the tandem mass spectrometry technology gives valuable fragmentation cues in the form of...
DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation
arXiv:2602.22839v1 Announce Type: new Abstract: Presentation generation requires deep content research, coherent visual design, and iterative refinement based on observation. However, existing presentation agents often rely on predefined workflows and fixed templates. To address this, we present DeepPresenter, an agentic...
The AI Research Assistant: Promise, Peril, and a Proof of Concept
arXiv:2602.22842v1 Announce Type: new Abstract: Can artificial intelligence truly contribute to creative mathematical research, or does it merely automate routine calculations while introducing risks of error? We provide empirical evidence through a detailed case study: the discovery of novel error...
Towards LLM-Empowered Knowledge Tracing via LLM-Student Hierarchical Behavior Alignment in Hyperbolic Space
arXiv:2602.22879v1 Announce Type: new Abstract: Knowledge Tracing (KT) diagnoses students' concept mastery through continuous learning state monitoring in education.Existing methods primarily focus on studying behavioral sequences based on ID or textual information.While existing methods rely on ID-based sequences or shallow...
FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning
arXiv:2602.22963v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have substantially advanced video misinformation detection through unified multimodal reasoning, but they often rely on fixed-depth inference and place excessive trust in internally generated assumptions, particularly in scenarios where critical...
RepSPD: Enhancing SPD Manifold Representation in EEGs via Dynamic Graphs
arXiv:2602.22981v1 Announce Type: new Abstract: Decoding brain activity from electroencephalography (EEG) is crucial for neuroscience and clinical applications. Among recent advances in deep learning for EEG, geometric learning stands out as its theoretical underpinnings on symmetric positive definite (SPD) allows...
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
arXiv:2602.22983v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly used, their security risks have drawn increasing attention. Existing research reveals that LLMs are highly susceptible to jailbreak attacks, with effectiveness varying across language contexts. This paper investigates...
Learning-based Multi-agent Race Strategies in Formula 1
arXiv:2602.23056v1 Announce Type: new Abstract: In Formula 1, race strategies are adapted according to evolving race conditions and competitors' actions. This paper proposes a reinforcement learning approach for multi-agent race strategy optimization. Agents learn to balance energy management, tire degradation,...
Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design
arXiv:2602.23092v1 Announce Type: new Abstract: The Capacitated Vehicle Routing Problem (CVRP), a fundamental combinatorial optimization challenge, focuses on optimizing fleet operations under vehicle capacity constraints. While extensively studied in operational research, the NP-hard nature of CVRP continues to pose significant...