BotzoneBench: Scalable LLM Evaluation via Graded AI Anchors
arXiv:2602.13214v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in interactive environments requiring strategic decision-making, yet systematic evaluation of these capabilities remains challenging. Existing benchmarks for LLMs primarily assess static reasoning through isolated tasks and fail to...
When to Think Fast and Slow? AMOR: Entropy-Based Metacognitive Gate for Dynamic SSM-Attention Switching
arXiv:2602.13215v1 Announce Type: new Abstract: Transformers allocate uniform computation to every position, regardless of difficulty. State Space Models (SSMs) offer efficient alternatives but struggle with precise information retrieval over a long horizon. Inspired by dual-process theories of cognition (Kahneman, 2011),...
VeRA: Verified Reasoning Data Augmentation at Scale
arXiv:2602.13217v1 Announce Type: new Abstract: The main issue with most evaluation schemes today is their "static" nature: the same problems are reused repeatedly, allowing for memorization, format exploitation, and eventual saturation. To measure genuine AI progress, we need evaluation that...
X-Blocks: Linguistic Building Blocks of Natural Language Explanations for Automated Vehicles
arXiv:2602.13248v1 Announce Type: new Abstract: Natural language explanations play a critical role in establishing trust and acceptance of automated vehicles (AVs), yet existing approaches lack systematic frameworks for analysing how humans linguistically construct driving rationales across diverse scenarios. This paper...
NeuroWeaver: An Autonomous Evolutionary Agent for Exploring the Programmatic Space of EEG Analysis Pipelines
arXiv:2602.13473v1 Announce Type: new Abstract: Although foundation models have demonstrated remarkable success in general domains, the application of these models to electroencephalography (EEG) analysis is constrained by substantial data requirements and high parameterization. These factors incur prohibitive computational costs, thereby...
Differentiable Rule Induction from Raw Sequence Inputs
arXiv:2602.13583v1 Announce Type: new Abstract: Rule learning-based models are widely used in highly interpretable scenarios due to their transparent structures. Inductive logic programming (ILP), a form of machine learning, induces rules from facts while maintaining interpretability. Differentiable ILP models enhance...
The Quantization Trap: Breaking Linear Scaling Laws in Multi-Hop Reasoning
arXiv:2602.13595v1 Announce Type: new Abstract: Neural scaling laws provide a predictable recipe for AI advancement: reducing numerical precision should linearly improve computational efficiency and energy profile (E proportional to bits). In this paper, we demonstrate that this scaling law breaks...
PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning
arXiv:2602.13691v1 Announce Type: new Abstract: Recent advancements in Large Language Model (LLM) agents have demonstrated strong capabilities in executing complex tasks through tool use. However, long-horizon multi-step tool planning is challenging, because the exploration space suffers from a combinatorial explosion....
ADAB: Arabic Dataset for Automated Politeness Benchmarking -- A Large-Scale Resource for Computational Sociopragmatics
arXiv:2602.13870v1 Announce Type: new Abstract: The growing importance of culturally-aware natural language processing systems has led to an increasing demand for resources that capture sociopragmatic phenomena across diverse languages. Nevertheless, Arabic-language resources for politeness detection remain under-explored, despite the rich...
Chain-of-Thought Reasoning with Large Language Models for Clinical Alzheimer's Disease Assessment and Diagnosis
arXiv:2602.13979v1 Announce Type: new Abstract: Alzheimer's disease (AD) has become a prevalent neurodegenerative disease worldwide. Traditional diagnosis still relies heavily on medical imaging and clinical assessment by physicians, which is often time-consuming and resource-intensive in terms of both human expertise...
LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts
arXiv:2602.14060v1 Announce Type: new Abstract: We introduce LM-Lexicon, an innovative definition modeling approach that incorporates data clustering, semantic expert learning, and model merging using a sparse mixture-of-experts architecture. By decomposing the definition modeling task into specialized semantic domains, where small...
Attention-gated U-Net model for semantic segmentation of brain tumors and feature extraction for survival prognosis
arXiv:2602.15067v1 Announce Type: new Abstract: Gliomas, among the most common primary brain tumors, vary widely in aggressiveness, prognosis, and histology, making treatment challenging due to complex and time-intensive surgical interventions. This study presents an Attention-Gated Recurrent Residual U-Net (R2U-Net) based...
SOMtime the World Ain't Fair: Violating Fairness Using Self-Organizing Maps
arXiv:2602.18201v1 Announce Type: new Abstract: Unsupervised representations are widely assumed to be neutral with respect to sensitive attributes when those attributes are withheld from training. We show that this assumption is false. Using SOMtime, a topology-preserving representation method based on...
AsynDBT: Asynchronous Distributed Bilevel Tuning for efficient In-Context Learning with Large Language Models
arXiv:2602.17694v1 Announce Type: cross Abstract: With the rapid development of large language models (LLMs), an increasing number of applications leverage cloud-based LLM APIs to reduce usage costs. However, since cloud-based models' parameters and gradients are agnostic, users have to manually...
ScaleBITS: Scalable Bitwidth Search for Hardware-Aligned Mixed-Precision LLMs
arXiv:2602.17698v1 Announce Type: cross Abstract: Post-training weight quantization is crucial for reducing the memory and inference cost of large language models (LLMs), yet pushing the average precision below 4 bits remains challenging due to highly non-uniform weight sensitivity and the...
Detection and Classification of Cetacean Echolocation Clicks using Image-based Object Detection Methods applied to Advanced Wavelet-based Transformations
arXiv:2602.17749v1 Announce Type: cross Abstract: A challenge in marine bioacoustic analysis is the detection of animal signals, like calls, whistles and clicks, for behavioral studies. Manual labeling is too time-consuming to process sufficient data to get reasonable results. Thus, an...
Inelastic Constitutive Kolmogorov-Arnold Networks: A generalized framework for automated discovery of interpretable inelastic material models
arXiv:2602.17750v1 Announce Type: cross Abstract: A key problem of solid mechanics is the identification of the constitutive law of a material, that is, the relation between strain and stress. Machine learning has led to considerable advances in this field lately....
Investigating Target Class Influence on Neural Network Compressibility for Energy-Autonomous Avian Monitoring
arXiv:2602.17751v1 Announce Type: cross Abstract: Biodiversity loss poses a significant threat to humanity, making wildlife monitoring essential for assessing ecosystem health. Avian species are ideal subjects for this due to their popularity and the ease of identifying them through their...
Symbolic computation of conservation laws of nonlinear partial differential equations in multi‐dimensions
Abstract: A direct method for the computation of polynomial conservation laws of polynomial systems of nonlinear partial differential equations (PDEs) in multi‐dimensions is presented. The method avoids advanced differential‐geometric tools. Instead, it is solely based on calculus, variational calculus, and...
Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System
arXiv:2602.18640v1 Announce Type: new Abstract: Modern large-scale ranking systems operate within a sophisticated landscape of competing objectives, operational constraints, and evolving product requirements. Progress in this domain is increasingly bottlenecked by the engineering context constraint: the arduous process of translating...
Modularity is the Bedrock of Natural and Artificial Intelligence
arXiv:2602.18960v1 Announce Type: new Abstract: The remarkable performance of modern AI systems has been driven by unprecedented scales of data, computation, and energy -- far exceeding the resources required by human intelligence. This disparity highlights the need for new guiding...
InfEngine: A Self-Verifying and Self-Optimizing Intelligent Engine for Infrared Radiation Computing
arXiv:2602.18985v1 Announce Type: new Abstract: Infrared radiation computing underpins advances in climate science, remote sensing and spectroscopy but remains constrained by manual workflows. We introduce InfEngine, an autonomous intelligent computational engine designed to drive a paradigm shift from human-led orchestration...
K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
arXiv:2602.19128v1 Announce Type: new Abstract: Optimizing GPU kernels is critical for efficient modern machine learning systems yet remains challenging due to the complex interplay of design factors and rapid hardware evolution. Existing automated approaches typically treat Large Language Models (LLMs)...
Characterizing MARL for Energy Control: A Multi-KPI Benchmark on the CityLearn Environment
arXiv:2602.19223v1 Announce Type: new Abstract: The optimization of urban energy systems is crucial for the advancement of sustainable and resilient smart cities, which are becoming increasingly complex with multiple decision-making units. To address scalability and coordination concerns, Multi-Agent Reinforcement Learning...
ALPACA: A Reinforcement Learning Environment for Medication Repurposing and Treatment Optimization in Alzheimer's Disease
arXiv:2602.19298v1 Announce Type: new Abstract: Evaluating personalized, sequential treatment strategies for Alzheimer's disease (AD) using clinical trials is often impractical due to long disease horizons and substantial inter-patient heterogeneity. To address these constraints, we present the Alzheimer's Learning Platform for...
ReHear: Iterative Pseudo-Label Refinement for Semi-Supervised Speech Recognition via Audio Large Language Models
arXiv:2602.18721v1 Announce Type: new Abstract: Semi-supervised learning in automatic speech recognition (ASR) typically relies on pseudo-labeling, which often suffers from confirmation bias and error accumulation due to noisy supervision. To address this limitation, we propose ReHear, a framework for iterative...
RUMAD: Reinforcement-Unifying Multi-Agent Debate
arXiv:2602.23864v1 Announce Type: new Abstract: Multi-agent debate (MAD) systems leverage collective intelligence to enhance reasoning capabilities, yet existing approaches struggle to simultaneously optimize accuracy, consensus formation, and computational efficiency. Static topology methods lack adaptability to task complexity variations, while external...
Bi-level RL-Heuristic Optimization for Real-world Winter Road Maintenance
arXiv:2602.24097v1 Announce Type: new Abstract: Winter road maintenance is critical for ensuring public safety and reducing environmental impacts, yet existing methods struggle to manage large-scale routing problems effectively and mostly rely on human decision. This study presents a novel, scalable...
Learning Flexible Job Shop Scheduling under Limited Buffers and Material Kitting Constraints
arXiv:2602.24180v1 Announce Type: new Abstract: The Flexible Job Shop Scheduling Problem (FJSP) originates from real production lines, while some practical constraints are often ignored or idealized in current FJSP studies, among which the limited buffer problem has a particular impact...
QD-MAPPER: A Quality Diversity Framework to Automatically Evaluate Multi-Agent Path Finding Algorithms in Diverse Maps
arXiv:2409.06888v5 Announce Type: cross Abstract: We use the Quality Diversity (QD) algorithm with Neural Cellular Automata (NCA) to automatically evaluate Multi-Agent Path Finding (MAPF) algorithms by generating diverse maps. Previously, researchers typically evaluated MAPF algorithms on a set of specific,...