Call for Proposals: “AIx” Pop-Up Events
We are now accepting proposals for AAAI-sponsored “AIx” Pop-Up Events — TEDx-style talks, panels, or public forums
A Theoretical Framework for Adaptive Utility-Weighted Benchmarking
arXiv:2602.12356v1 Announce Type: new Abstract: Benchmarking has long served as a foundational practice in machine learning and, increasingly, in modern AI systems such as large language models, where shared tasks, metrics, and leaderboards offer a common basis for measuring progress...
To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models
arXiv:2602.12566v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) plays a key role in stimulating the explicit reasoning capability of Large Language Models (LLMs). We can achieve expert-level performance in some specific domains via RLVR, such as coding...
Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models
arXiv:2602.12586v1 Announce Type: new Abstract: While plan-and-infill decoding in Masked Diffusion Models (MDMs) shows promise for mathematical and code reasoning, performance remains highly sensitive to slot infilling order, often yielding substantial output variance. We introduce McDiffuSE, a framework that formulates...
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
arXiv:2602.12617v1 Announce Type: new Abstract: This paper presents GeoAgent, a model capable of reasoning closely with humans and deriving fine-grained address conclusions. Previous RL-based methods have achieved breakthroughs in performance and interpretability, but concerns remain because of their reliance...
AI Agents for Inventory Control: Human-LLM-OR Complementarity
arXiv:2602.12631v1 Announce Type: new Abstract: Inventory control is a fundamental operations problem in which ordering decisions are traditionally guided by theoretically grounded operations research (OR) algorithms. However, such algorithms often rely on rigid modeling assumptions and can perform poorly when...
X-SYS: A Reference Architecture for Interactive Explanation Systems
arXiv:2602.12748v1 Announce Type: new Abstract: The explainable AI (XAI) research community has proposed numerous technical methods, yet deploying explainability as systems remains challenging: Interactive explanation systems require both suitable algorithms and system capabilities that maintain explanation usability across repeated queries,...
WebClipper: Efficient Evolution of Web Agents with Graph-based Trajectory Pruning
arXiv:2602.12852v1 Announce Type: new Abstract: Deep Research systems based on web agents have shown strong potential in solving complex information-seeking tasks, yet their search efficiency remains underexplored. We observe that many state-of-the-art open-source web agents rely on long tool-call trajectories...
Optimal Take-off under Fuzzy Clearances
arXiv:2602.13166v1 Announce Type: new Abstract: This paper presents a hybrid obstacle avoidance architecture that integrates Optimal Control under clearance with a Fuzzy Rule Based System (FRBS) to enable adaptive constraint handling for unmanned aircraft. Motivated by the limitations of classical...
Language-Guided Invariance Probing of Vision-Language Models
arXiv:2511.13494v1 Announce Type: cross Abstract: Recent vision-language models (VLMs) such as CLIP, OpenCLIP, EVA02-CLIP and SigLIP achieve strong zero-shot performance, but it is unclear how reliably they respond to controlled linguistic perturbations. We introduce Language-Guided Invariance Probing (LGIP), a benchmark...
A Lightweight LLM Framework for Disaster Humanitarian Information Classification
arXiv:2602.12284v1 Announce Type: cross Abstract: Timely classification of humanitarian information from social media is critical for effective disaster response. However, deploying large language models (LLMs) for this task faces challenges in resource-constrained emergency settings. This paper develops a lightweight, cost-effective...
From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness
arXiv:2602.12285v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly deployed as autonomous agents capable of actions with real-world impacts beyond text generation. While persona-induced biases in text generation are well documented, their effects on agent task performance remain...
Energy-Aware Reinforcement Learning for Robotic Manipulation of Articulated Components in Infrastructure Operation and Maintenance
arXiv:2602.12288v1 Announce Type: cross Abstract: With the growth of intelligent civil infrastructure and smart cities, operation and maintenance (O&M) increasingly requires safe, efficient, and energy-conscious robotic manipulation of articulated components, including access doors, service drawers, and pipeline valves. However, existing...
Adaptive traffic signal control optimization using a novel road partition and multi-channel state representation method
arXiv:2602.12296v1 Announce Type: cross Abstract: This study proposes a novel adaptive traffic signal control method leveraging a Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) to optimize signal timing by integrating variable cell length and multi-channel state representation. A road...
OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model
arXiv:2602.12304v1 Announce Type: cross Abstract: Existing mainstream video customization methods focus on generating identity-consistent videos based on given reference images and textual prompts. Benefiting from the rapid advancement of joint audio-video generation, this paper proposes a more compelling new task:...
OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization
arXiv:2602.12305v1 Announce Type: cross Abstract: Generating high-performance CUDA kernels remains challenging due to the need to navigate a combinatorial space of low-level transformations under noisy and expensive hardware feedback. Although large language models can synthesize functionally correct CUDA code, achieving...
Quantum walk inspired JPEG compression of images
arXiv:2602.12306v1 Announce Type: cross Abstract: This work proposes a quantum inspired adaptive quantization framework that enhances the classical JPEG compression by introducing a learned, optimized Qtable derived using a Quantum Walk Inspired Optimization (QWIO) search strategy. The optimizer searches a...
Visible and Hyperspectral Imaging for Quality Assessment of Milk: Property Characterisation and Identification
arXiv:2602.12313v1 Announce Type: cross Abstract: Rapid and non-destructive assessment of milk quality is crucial to ensuring both nutritional value and food safety. In this study, we investigated the potential of visible and hyperspectral imaging as cost-effective and quick-response alternatives to...
AgenticShop: Benchmarking Agentic Product Curation for Personalized Web Shopping
arXiv:2602.12315v1 Announce Type: cross Abstract: The proliferation of e-commerce has made web shopping platforms key gateways for customers navigating the vast digital marketplace. Yet this rapid expansion has led to a noisy and fragmented information environment, increasing cognitive burden as...
Free Lunch in Medical Image Foundation Model Pre-training via Randomized Synthesis and Disentanglement
arXiv:2602.12317v1 Announce Type: cross Abstract: Medical image foundation models (MIFMs) have demonstrated remarkable potential for a wide range of clinical tasks, yet their development is constrained by the scarcity, heterogeneity, and high cost of large-scale annotated datasets. Here, we propose...
ForeAct: Steering Your VLA with Efficient Visual Foresight Planning
arXiv:2602.12322v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models convert high-level language instructions into concrete, executable actions, a task that is especially challenging in open-world environments. We present Visual Foresight Planning (ForeAct), a general and efficient planner that guides a VLA...
Intrinsic Credit Assignment for Long Horizon Interaction
arXiv:2602.12342v1 Announce Type: cross Abstract: How can we train agents to navigate uncertainty over long horizons? In this work, we propose ΔBelief-RL, which leverages a language model's own intrinsic beliefs to reward intermediate progress. Our method utilizes the change in...
Policy4OOD: A Knowledge-Guided World Model for Policy Intervention Simulation against the Opioid Overdose Crisis
arXiv:2602.12373v1 Announce Type: cross Abstract: The opioid epidemic remains one of the most severe public health crises in the United States, yet evaluating policy interventions before implementation is difficult: multiple policies interact within a dynamic system where targeting one risk...
Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning
arXiv:2602.12375v1 Announce Type: cross Abstract: Optimistic value estimates provide one mechanism for directed exploration in reinforcement learning (RL). The agent acts greedily with respect to an estimate of the value plus what can be seen as a value bonus. The...
Why Deep Jacobian Spectra Separate: Depth-Induced Scaling and Singular-Vector Alignment
arXiv:2602.12384v2 Announce Type: cross Abstract: Understanding why gradient-based training in deep networks exhibits strong implicit bias remains challenging, in part because tractable singular-value dynamics are typically available only for balanced deep linear models. We propose an alternative route based on...
Rational Neural Networks have Expressivity Advantages
arXiv:2602.12390v1 Announce Type: cross Abstract: We study neural networks with trainable low-degree rational activation functions and show that they are more expressive and parameter-efficient than modern piecewise-linear and smooth activations such as ELU, LeakyReLU, LogSigmoid, PReLU, ReLU, SELU, CELU, Sigmoid,...
What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
arXiv:2602.12395v1 Announce Type: cross Abstract: Reinforcement learning (RL) with verifiable rewards has become a standard post-training stage for boosting visual reasoning in vision-language models, yet it remains unclear what capabilities RL actually improves compared with supervised fine-tuning as cold-start initialization...
AstRL: Analog and Mixed-Signal Circuit Synthesis with Deep Reinforcement Learning
arXiv:2602.12402v1 Announce Type: cross Abstract: Analog and mixed-signal (AMS) integrated circuits (ICs) lie at the core of modern computing and communications systems. However, despite the continued rise in design complexity, advances in AMS automation remain limited. This reflects the central...
Soft Contamination Means Benchmarks Test Shallow Generalization
arXiv:2602.12413v1 Announce Type: cross Abstract: If LLM training data is polluted with benchmark test data, then benchmark performance gives biased estimates of out-of-distribution (OOD) generalization. Typical decontamination filters use n-gram matching, which fails to detect semantic duplicates: sentences with equivalent...
RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty
arXiv:2602.12424v1 Announce Type: cross Abstract: Benchmarks establish a standardized evaluation framework to systematically assess the performance of large language models (LLMs), facilitating objective comparisons and driving advancements in the field. However, existing benchmarks fail to differentiate question difficulty, limiting their...