DeepInnovator: Triggering the Innovative Capabilities of LLMs
arXiv:2602.18920v1 Announce Type: new Abstract: The application of Large Language Models (LLMs) in accelerating scientific discovery has garnered increasing attention, with a key focus on constructing research agents endowed with innovative capability, i.e., the ability to autonomously generate novel and...
Yor-Sarc: A gold-standard dataset for sarcasm detection in a low-resource African language
arXiv:2602.18964v1 Announce Type: new Abstract: Sarcasm detection poses a fundamental challenge in computational semantics, requiring models to resolve disparities between literal and intended meaning. The challenge is amplified in low-resource languages where annotated datasets are scarce or nonexistent. We present...
Whisper: Courtside Edition Enhancing ASR Performance Through LLM-Driven Context Generation
arXiv:2602.18966v1 Announce Type: new Abstract: Domain-specific speech remains a persistent challenge for automatic speech recognition (ASR), even for state-of-the-art systems like OpenAI's Whisper. We introduce Whisper: Courtside Edition, a novel multi-agent large language model (LLM) pipeline that enhances Whisper transcriptions...
An Agentic LLM Framework for Adverse Media Screening in AML Compliance
arXiv:2602.23373v1 Announce Type: new Abstract: Adverse media screening is a critical component of anti-money laundering (AML) and know-your-customer (KYC) compliance processes in financial institutions. Traditional approaches rely on keyword-based searches that generate high false-positive rates or require extensive manual review....
Causal Identification from Counterfactual Data: Completeness and Bounding Results
arXiv:2602.23541v1 Announce Type: new Abstract: Previous work establishing completeness results for $\textit{counterfactual identification}$ has been circumscribed to the setting where the input data belongs to observational or interventional distributions (Layers 1 and 2 of Pearl's Causal Hierarchy), since it was...
Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem
arXiv:2602.23579v1 Announce Type: new Abstract: The Multiple Traveling Salesman Problem (mTSP) extends the Traveling Salesman Problem to m tours that start and end at a common depot and jointly visit all customers exactly once. In the min-max variant, the objective...
PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents
arXiv:2602.23668v1 Announce Type: new Abstract: Large language model (LLM) agents typically rely on reactive decision-making paradigms such as ReAct, selecting actions conditioned on growing execution histories. While effective for short tasks, these approaches often lead to redundant tool usage, unstable...
ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference
arXiv:2602.23681v1 Announce Type: new Abstract: The paradigm of large language model (LLM) reasoning is shifting from parameter scaling to test-time compute scaling, yet many existing approaches still rely on uniform brute-force sampling (for example, fixed best-of-N or self-consistency) that is...
From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems
arXiv:2602.23701v1 Announce Type: new Abstract: LLM-powered Multi-Agent Systems (MAS) have demonstrated remarkable capabilities in complex domains but suffer from inherent fragility and opaque failure mechanisms. Existing failure attribution methods, whether relying on direct prompting, costly replays, or supervised fine-tuning, typically...
Reasoning-Driven Multimodal LLM for Domain Generalization
arXiv:2602.23777v1 Announce Type: new Abstract: This paper addresses the domain generalization (DG) problem in deep learning. While most DG methods focus on enforcing visual feature invariance, we leverage the reasoning capability of multimodal large language models (MLLMs) and explore the...
EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
arXiv:2602.23802v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have shown remarkable progress in visual reasoning and understanding tasks but still struggle to capture the complexity and subjectivity of human emotions. Existing approaches based on supervised fine-tuning often suffer...
RUMAD: Reinforcement-Unifying Multi-Agent Debate
arXiv:2602.23864v1 Announce Type: new Abstract: Multi-agent debate (MAD) systems leverage collective intelligence to enhance reasoning capabilities, yet existing approaches struggle to simultaneously optimize accuracy, consensus formation, and computational efficiency. Static topology methods lack adaptability to task complexity variations, while external...
Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction
arXiv:2602.24080v1 Announce Type: new Abstract: The pursuit of human-like conversational agents has long been guided by the Turing test. For modern speech-to-speech (S2S) systems, a critical yet unanswered question is whether they can converse like humans. To tackle this, we...
Bi-level RL-Heuristic Optimization for Real-world Winter Road Maintenance
arXiv:2602.24097v1 Announce Type: new Abstract: Winter road maintenance is critical for ensuring public safety and reducing environmental impacts, yet existing methods struggle to manage large-scale routing problems effectively and mostly reply on human decision. This study presents a novel, scalable...
Artificial Agency Program: Curiosity, compression, and communication in agents
arXiv:2602.24100v1 Announce Type: new Abstract: This paper presents the Artificial Agency Program (AAP), a position and research agenda for building AI systems as reality embedded, resource-bounded agents whose development is driven by curiosity-as-learning-progress under physical and computational constraints. The central...
Recycling Failures: Salvaging Exploration in RLVR via Fine-Grained Off-Policy Guidance
arXiv:2602.24110v1 Announce Type: new Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the complex reasoning capabilities of Large Reasoning Models. However, standard outcome-based supervision suffers from a critical limitation that penalizes trajectories that...
LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics
arXiv:2602.24173v1 Announce Type: new Abstract: We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textbook-style problems as proxies for mathematical research. Instead, we...
A Minimal Agent for Automated Theorem Proving
arXiv:2602.24273v1 Announce Type: new Abstract: We propose a minimal agentic baseline that enables systematic comparison across different AI-based theorem prover architectures. This design implements the core features shared among state-of-the-art systems: iterative proof refinement, library search and context management. We...
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
arXiv:2602.24288v1 Announce Type: new Abstract: The fast-growing demands in using Large Language Models (LLMs) to tackle complex multi-step data science tasks create an emergent need for accurate benchmarking. There are two major gaps in existing benchmarks: (i) the lack of...
Let There Be Claws: An Early Social Network Analysis of AI Agents on Moltbook
arXiv:2602.20044v1 Announce Type: cross Abstract: Within twelve days of launch, an AI-native social platform exhibits extreme attention concentration, hierarchical role separation, and one-way attention flow, consistent with the hypothesis that stratification in agent ecosystems can emerge rapidly rather than gradually....
Democratizing GraphRAG: Linear, CPU-Only Graph Retrieval for Multi-Hop QA
arXiv:2602.23372v1 Announce Type: cross Abstract: GraphRAG systems improve multi-hop retrieval by modeling structure, but many approaches rely on expensive LLM-based graph construction and GPU-heavy inference. We present SPRIG (Seeded Propagation for Retrieval In Graphs), a CPU-only, linear-time, token-free GraphRAG pipeline...
Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG
arXiv:2602.23374v1 Announce Type: cross Abstract: The integration of Large Language Models (LLMs) into enterprise knowledge management systems has been catalyzed by the Retrieval-Augmented Generation (RAG) paradigm, which augments parametric memory with non-parametric external data. However, the transition from proof-of-concept to...
Task-Lens: Cross-Task Utility Based Speech Dataset Profiling for Low-Resource Indian Languages
arXiv:2602.23388v1 Announce Type: cross Abstract: The rising demand for inclusive speech technologies amplifies the need for multilingual datasets for Natural Language Processing (NLP) research. However, limited awareness of existing task-specific resources in low-resource languages hinders research. This challenge is especially...
Learning to Generate Secure Code via Token-Level Rewards
arXiv:2602.23407v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained...
Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG
arXiv:2602.23410v1 Announce Type: cross Abstract: Brain foundation models have achieved remarkable advances across a wide range of neuroscience tasks. However, most existing models are limited to a single functional modality, restricting their ability to exploit complementary spatiotemporal dynamics and the...
DesignSense: A Human Preference Dataset and Reward Modeling Framework for Graphic Layout Generation
arXiv:2602.23438v1 Announce Type: cross Abstract: Graphic layouts serve as an important and engaging medium for visual communication across different channels. While recent layout generation models have demonstrated impressive capabilities, they frequently fail to align with nuanced human aesthetic judgment. Existing...
SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
arXiv:2602.23447v1 Announce Type: cross Abstract: Detection of rare lesions in whole-body CT is fundamentally limited by extreme class imbalance and low target-to-volume ratios, producing precision collapse despite high AUROC. Synthetic augmentation with diffusion models offers promise, yet pixel-space diffusion is...
BiKA: Kolmogorov-Arnold-Network-inspired Ultra Lightweight Neural Network Hardware Accelerator
arXiv:2602.23455v1 Announce Type: cross Abstract: Lightweight neural network accelerators are essential for edge devices with limited resources and power constraints. While quantization and binarization can efficiently reduce hardware cost, they still rely on the conventional Artificial Neural Network (ANN) computation...
Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning
arXiv:2602.23440v1 Announce Type: new Abstract: Training large language models to reason with search engines via reinforcement learning is hindered by a fundamental credit assignment problem: existing methods such as Search-R1 provide only a sparse outcome reward after an entire multi-step...
FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records
arXiv:2602.23479v1 Announce Type: new Abstract: Though patients are increasingly granted digital access to their electronic health records (EHRs), existing interfaces may not support precise, trustworthy answers to patient-specific questions. Large language models (LLM) show promise in clinical question answering (QA),...