Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG
arXiv:2602.23374v1 Announce Type: cross Abstract: The integration of Large Language Models (LLMs) into enterprise knowledge management systems has been catalyzed by the Retrieval-Augmented Generation (RAG) paradigm, which augments parametric memory with non-parametric external data. However, the transition from proof-of-concept to...
Task-Lens: Cross-Task Utility Based Speech Dataset Profiling for Low-Resource Indian Languages
arXiv:2602.23388v1 Announce Type: cross Abstract: The rising demand for inclusive speech technologies amplifies the need for multilingual datasets for Natural Language Processing (NLP) research. However, limited awareness of existing task-specific resources in low-resource languages hinders research. This challenge is especially...
Learning to Generate Secure Code via Token-Level Rewards
arXiv:2602.23407v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained...
Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG
arXiv:2602.23410v1 Announce Type: cross Abstract: Brain foundation models have achieved remarkable advances across a wide range of neuroscience tasks. However, most existing models are limited to a single functional modality, restricting their ability to exploit complementary spatiotemporal dynamics and the...
DesignSense: A Human Preference Dataset and Reward Modeling Framework for Graphic Layout Generation
arXiv:2602.23438v1 Announce Type: cross Abstract: Graphic layouts serve as an important and engaging medium for visual communication across different channels. While recent layout generation models have demonstrated impressive capabilities, they frequently fail to align with nuanced human aesthetic judgment. Existing...
SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
arXiv:2602.23447v1 Announce Type: cross Abstract: Detection of rare lesions in whole-body CT is fundamentally limited by extreme class imbalance and low target-to-volume ratios, producing precision collapse despite high AUROC. Synthetic augmentation with diffusion models offers promise, yet pixel-space diffusion is...
BiKA: Kolmogorov-Arnold-Network-inspired Ultra Lightweight Neural Network Hardware Accelerator
arXiv:2602.23455v1 Announce Type: cross Abstract: Lightweight neural network accelerators are essential for edge devices with limited resources and power constraints. While quantization and binarization can efficiently reduce hardware cost, they still rely on the conventional Artificial Neural Network (ANN) computation...
Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning
arXiv:2602.23440v1 Announce Type: new Abstract: Training large language models to reason with search engines via reinforcement learning is hindered by a fundamental credit assignment problem: existing methods such as Search-R1 provide only a sparse outcome reward after an entire multi-step...
FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records
arXiv:2602.23479v1 Announce Type: new Abstract: Though patients are increasingly granted digital access to their electronic health records (EHRs), existing interfaces may not support precise, trustworthy answers to patient-specific questions. Large language models (LLM) show promise in clinical question answering (QA),...
TraderBench: How Robust Are AI Agents in Adversarial Capital Markets?
arXiv:2603.00285v1 Announce Type: new Abstract: Evaluating AI agents in finance faces two key challenges: static benchmarks require costly expert annotation yet miss the dynamic decision-making central to real-world trading, while LLM-based judges introduce uncontrolled variance on domain-specific tasks. We introduce...
EmCoop: A Framework and Benchmark for Embodied Cooperation Among LLM Agents
arXiv:2603.00349v1 Announce Type: new Abstract: Real-world scenarios increasingly require multiple embodied agents to collaborate in dynamic environments under embodied constraints, as many tasks exceed the capabilities of any single agent. Recent advances in large language models (LLMs) enable high-level cognitive...
Monotropic Artificial Intelligence: Toward a Cognitive Taxonomy of Domain-Specialized Language Models
arXiv:2603.00350v1 Announce Type: new Abstract: The prevailing paradigm in artificial intelligence research equates progress with scale: larger models trained on broader datasets are presumed to yield superior capabilities. This assumption, while empirically productive for general-purpose applications, obscures a fundamental epistemological...
Conservative Equilibrium Discovery in Offline Game-Theoretic Multiagent Reinforcement Learning
arXiv:2603.00374v1 Announce Type: new Abstract: Offline learning of strategies takes data efficiency to its extreme by restricting algorithms to a fixed dataset of state-action trajectories. We consider the problem in a mixed-motive multiagent setting, where the goal is to solve...
MED-COPILOT: A Medical Assistant Powered by GraphRAG and Similar Patient Case Retrieval
arXiv:2603.00460v1 Announce Type: new Abstract: Clinical decision-making requires synthesizing heterogeneous evidence, including patient histories, clinical guidelines, and trajectories of comparable cases. While large language models (LLMs) offer strong reasoning capabilities, they remain prone to hallucinations and struggle to integrate long,...
Optimizing In-Context Demonstrations for LLM-based Automated Grading
arXiv:2603.00465v1 Announce Type: new Abstract: Automated assessment of open-ended student responses is a critical capability for scaling personalized feedback in education. While large language models (LLMs) have shown promise in grading tasks via in-context learning (ICL), their reliability is heavily...
From Goals to Aspects, Revisited: An NFR Pattern Language for Agentic AI Systems
arXiv:2603.00472v1 Announce Type: new Abstract: Agentic AI systems exhibit numerous crosscutting concerns -- security, observability, cost management, fault tolerance -- that are poorly modularized in current implementations, contributing to the high failure rate of AI projects in reaching production. The...
LifeEval: A Multimodal Benchmark for Assistive AI in Egocentric Daily Life Tasks
arXiv:2603.00490v1 Announce Type: new Abstract: The rapid progress of Multimodal Large Language Models (MLLMs) marks a significant step toward artificial general intelligence, offering great potential for augmenting human capabilities. However, their ability to provide effective assistance in dynamic, real-world environments...
DenoiseFlow: Uncertainty-Aware Denoising for Reliable LLM Agentic Workflows
arXiv:2603.00532v1 Announce Type: new Abstract: Autonomous agents are increasingly entrusted with complex, long-horizon tasks, ranging from mathematical reasoning to software generation. While agentic workflows facilitate these tasks by decomposing them into multi-step reasoning chains, reliability degrades significantly as the sequence...
MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
arXiv:2603.00585v1 Announce Type: new Abstract: Recent advances in video generation have opened new avenues for macroscopic simulation of complex dynamic systems, but their application to microscopic phenomena remains largely unexplored. Microscale simulation holds great promise for biomedical applications such as...
Machine Learning Grade Prediction Using Students' Grades and Demographics
arXiv:2603.00608v1 Announce Type: new Abstract: Student repetition in secondary education imposes significant resource burdens, particularly in resource-constrained contexts. Addressing this challenge, this study introduces a unified machine learning framework that simultaneously predicts pass/fail outcomes and continuous grades, a departure from...
InfoPO: Information-Driven Policy Optimization for User-Centric Agents
arXiv:2603.00656v1 Announce Type: new Abstract: Real-world user requests to LLM agents are often underspecified. Agents must interact to acquire missing information and make correct downstream decisions. However, current multi-turn GRPO-based methods often rely on trajectory-level reward computation, which leads to...
MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning
arXiv:2603.00730v1 Announce Type: new Abstract: Deep reinforcement learning (RL) has been applied extensively to solve complex decision-making problems. In many real-world scenarios, tasks often have several conflicting objectives and may require multiple agents to cooperate, which are the multi-objective multi-agent...
MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains
arXiv:2603.00873v1 Announce Type: new Abstract: With the increasing demand for step-wise, cross-modal, and knowledge-grounded reasoning, multimodal large language models (MLLMs) are evolving beyond the traditional fixed retrieve-then-generate paradigm toward more sophisticated agentic multimodal retrieval-augmented generation (MM-RAG). Existing benchmarks, however, mainly...
BioProAgent: Neuro-Symbolic Grounding for Constrained Scientific Planning
arXiv:2603.00876v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated significant reasoning capabilities in scientific discovery but struggle to bridge the gap to physical execution in wet-labs. In these irreversible environments, probabilistic hallucinations are not merely incorrect, but also...
HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents
arXiv:2603.00977v1 Announce Type: new Abstract: Large language model (LLM) agents have recently demonstrated strong capabilities in interactive decision-making, yet they remain fundamentally limited in long-horizon tasks that require structured planning and reliable execution. Existing approaches predominantly rely on flat autoregressive...
CollabEval: Enhancing LLM-as-a-Judge via Multi-Agent Collaboration
arXiv:2603.00993v1 Announce Type: new Abstract: Large Language Models (LLMs) have revolutionized AI-generated content evaluation, with the LLM-as-a-Judge paradigm becoming increasingly popular. However, current single-LLM evaluation approaches face significant challenges, including inconsistent judgments and inherent biases from pre-training data. To address...
Alien Science: Sampling Coherent but Cognitively Unavailable Research Directions from Idea Atoms
arXiv:2603.01092v1 Announce Type: new Abstract: Large language models are adept at synthesizing and recombining familiar material, yet they often fail at a specific kind of creativity that matters most in research: producing ideas that are both coherent and non-obvious to...
FCN-LLM: Empower LLM for Brain Functional Connectivity Network Understanding via Graph-level Multi-task Instruction Tuning
arXiv:2603.01135v1 Announce Type: new Abstract: Large Language Models have achieved remarkable success in language understanding and reasoning, and their multimodal extensions enable comprehension of images, video, and audio. Inspired by this, foundation models for brain functional connectivity networks derived from...
AutoSkill: Experience-Driven Lifelong Learning via Skill Self-Evolution
arXiv:2603.01145v1 Announce Type: new Abstract: In practical LLM applications, users repeatedly express stable preferences and requirements, such as reducing hallucinations, following institutional writing conventions, or avoiding overly technical wording, yet such interaction experience is seldom consolidated into reusable knowledge. Consequently,...
DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent
arXiv:2603.01152v1 Announce Type: new Abstract: Deep-research agents are capable of executing multi-step web exploration, targeted retrieval, and sophisticated question answering. Despite their powerful capabilities, deep-research agents face two critical bottlenecks: (1) the lack of large-scale, challenging datasets with real-world difficulty,...