Call for Proposals: “AIx” Pop-Up Events
We are now accepting proposals for AAAI-sponsored “AIx” Pop-Up Events — TEDx-style talks, panels, or public forums
GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory
arXiv:2602.12316v1 Announce Type: new Abstract: Frontier AI systems are increasingly capable and deployed in high-stakes multi-agent environments. However, existing AI safety benchmarks largely evaluate single agents, leaving multi-agent risks such as coordination failure and conflict poorly understood. We introduce GT-HarmBench,...
Evolving Beyond Snapshots: Harmonizing Structure and Sequence via Entity State Tuning for Temporal Knowledge Graph Forecasting
arXiv:2602.12389v1 Announce Type: new Abstract: Temporal knowledge graph (TKG) forecasting requires predicting future facts by jointly modeling structural dependencies within each snapshot and temporal evolution across snapshots. However, most existing methods are stateless: they recompute entity representations at each timestamp...
Intent-Driven Smart Manufacturing Integrating Knowledge Graphs and Large Language Models
arXiv:2602.12419v1 Announce Type: new Abstract: The increasing complexity of smart manufacturing environments demands interfaces that can translate high-level human intents into machine-executable actions. This paper presents a unified framework that integrates instruction-tuned Large Language Models (LLMs) with ontology-aligned Knowledge Graphs...
Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation
arXiv:2602.12544v1 Announce Type: new Abstract: We present a scalable pipeline for automatically generating high-quality training data for web agents. In particular, a major challenge in identifying high-quality training instances is trajectory evaluation - quantifying how much progress was made towards...
To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models
arXiv:2602.12566v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) plays a key role in stimulating the explicit reasoning capability of Large Language Models (LLMs). We can achieve expert-level performance in some specific domains via RLVR, such as coding...
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
arXiv:2602.12617v1 Announce Type: new Abstract: This paper presents GeoAgent, a model capable of reasoning closely with humans and deriving fine-grained address conclusions. Previous RL-based methods have achieved breakthroughs in performance and interpretability, but concerns remain because of their reliance...
X-SYS: A Reference Architecture for Interactive Explanation Systems
arXiv:2602.12748v1 Announce Type: new Abstract: The explainable AI (XAI) research community has proposed numerous technical methods, yet deploying explainability as systems remains challenging: Interactive explanation systems require both suitable algorithms and system capabilities that maintain explanation usability across repeated queries,...
Consistency of Large Reasoning Models Under Multi-Turn Attacks
arXiv:2602.13093v2 Announce Type: new Abstract: Large reasoning models achieve state-of-the-art performance on complex tasks, but their robustness under multi-turn adversarial pressure remains underexplored. We evaluate nine frontier reasoning models under adversarial attacks. Our findings reveal that reasoning...
Optimal Take-off under Fuzzy Clearances
arXiv:2602.13166v1 Announce Type: new Abstract: This paper presents a hybrid obstacle avoidance architecture that integrates Optimal Control under clearance with a Fuzzy Rule Based System (FRBS) to enable adaptive constraint handling for unmanned aircraft. Motivated by the limitations of classical...
A Lightweight LLM Framework for Disaster Humanitarian Information Classification
arXiv:2602.12284v1 Announce Type: cross Abstract: Timely classification of humanitarian information from social media is critical for effective disaster response. However, deploying large language models (LLMs) for this task faces challenges in resource-constrained emergency settings. This paper develops a lightweight, cost-effective...
Energy-Aware Reinforcement Learning for Robotic Manipulation of Articulated Components in Infrastructure Operation and Maintenance
arXiv:2602.12288v1 Announce Type: cross Abstract: With the growth of intelligent civil infrastructure and smart cities, operation and maintenance (O&M) increasingly requires safe, efficient, and energy-conscious robotic manipulation of articulated components, including access doors, service drawers, and pipeline valves. However, existing...
OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model
arXiv:2602.12304v1 Announce Type: cross Abstract: Existing mainstream video customization methods focus on generating identity-consistent videos based on given reference images and textual prompts. Benefiting from the rapid advancement of joint audio-video generation, this paper proposes a more compelling new task:...
Perceptual Self-Reflection in Agentic Physics Simulation Code Generation
arXiv:2602.12311v1 Announce Type: cross Abstract: We present a multi-agent framework for generating physics simulation code from natural language descriptions, featuring a novel perceptual self-reflection mechanism for validation. The system employs four specialized agents: a natural language interpreter that converts user...
Visible and Hyperspectral Imaging for Quality Assessment of Milk: Property Characterisation and Identification
arXiv:2602.12313v1 Announce Type: cross Abstract: Rapid and non-destructive assessment of milk quality is crucial to ensuring both nutritional value and food safety. In this study, we investigated the potential of visible and hyperspectral imaging as cost-effective and quick-response alternatives to...
Free Lunch in Medical Image Foundation Model Pre-training via Randomized Synthesis and Disentanglement
arXiv:2602.12317v1 Announce Type: cross Abstract: Medical image foundation models (MIFMs) have demonstrated remarkable potential for a wide range of clinical tasks, yet their development is constrained by the scarcity, heterogeneity, and high cost of large-scale annotated datasets. Here, we propose...
ForeAct: Steering Your VLA with Efficient Visual Foresight Planning
arXiv:2602.12322v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models convert high-level language instructions into concrete, executable actions, a task that is especially challenging in open-world environments. We present Visual Foresight Planning (ForeAct), a general and efficient planner that guides a VLA...
Policy4OOD: A Knowledge-Guided World Model for Policy Intervention Simulation against the Opioid Overdose Crisis
arXiv:2602.12373v1 Announce Type: cross Abstract: The opioid epidemic remains one of the most severe public health crises in the United States, yet evaluating policy interventions before implementation is difficult: multiple policies interact within a dynamic system where targeting one risk...
Reproducing DragDiffusion: Interactive Point-Based Editing with Diffusion Models
arXiv:2602.12393v1 Announce Type: cross Abstract: DragDiffusion is a diffusion-based method for interactive point-based image editing that enables users to manipulate images by directly dragging selected points. The method claims that accurate spatial control can be achieved by optimizing a single...
Correctness, Artificial Intelligence, and the Epistemic Value of Mathematical Proof
arXiv:2602.12463v1 Announce Type: cross Abstract: We argue that it is neither necessary nor sufficient for a mathematical proof to have epistemic value that it be "correct", in the sense of formalizable in a formal proof system. We then present a...
Designing RNAs with Language Models
arXiv:2602.12470v1 Announce Type: cross Abstract: RNA design, the task of finding a sequence that folds into a target secondary structure, has broad biological and biomedical impact but remains computationally challenging due to the exponentially large sequence space and exponentially many...
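The abstract above frames RNA design as inverse folding: finding a sequence compatible with a target secondary structure. As a minimal illustration of the problem setup (not the paper's method), the standard dot-bracket notation for the target and the usual Watson-Crick/wobble pairing rules can be checked like this:

```python
def pairs_from_dot_bracket(structure):
    """Parse a dot-bracket string into a list of (i, j) base-pair indices."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs

# Watson-Crick and wobble pairs commonly allowed in RNA secondary structure.
VALID = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def compatible(sequence, structure):
    """True if every paired position in the target structure holds a valid base pair."""
    return all((sequence[i], sequence[j]) in VALID
               for i, j in pairs_from_dot_bracket(structure))

print(compatible("GCGAAAGC", "((....))"))  # G-C and C-G pairs -> True
```

Compatibility is only a necessary condition; actual design must also ensure the sequence folds into the target structure rather than a competing one, which is where the sequence space becomes exponentially large.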
propella-1: Multi-Property Document Annotation for LLM Data Curation at Scale
arXiv:2602.12414v1 Announce Type: new Abstract: Since FineWeb-Edu, data curation for LLM pretraining has predominantly relied on single scalar quality scores produced by small classifiers. A single score conflates multiple quality dimensions, prevents flexible filtering, and offers no interpretability. We introduce...
RBCorr: Response Bias Correction in Language Models
arXiv:2602.12445v1 Announce Type: new Abstract: Language models (LMs) are known to be prone to response biases, which present as option preference biases in fixed-response questions. It is therefore imperative to develop low-cost and effective response bias correction methods to improve...
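Option-preference bias in fixed-response questions is commonly probed by permuting answer order. As a hedged, generic sketch (not RBCorr's method; the scorer interface is an assumption for illustration), averaging each option's score over all orderings cancels any purely positional preference:

```python
import itertools

def debias_by_permutation(score_fn, question, options):
    """Average each option's score over all orderings of the options.

    score_fn(question, ordered_options) -> list of scores aligned with that
    ordering; a purely positional preference cancels out in the average.
    """
    totals = {opt: 0.0 for opt in options}
    perms = list(itertools.permutations(options))
    for perm in perms:
        scores = score_fn(question, perm)
        for opt, s in zip(perm, scores):
            totals[opt] += s
    return {opt: t / len(perms) for opt, t in totals.items()}

# Toy scorer with a pure position bias: the first-listed option always wins.
biased = lambda q, opts: [1.0] + [0.0] * (len(opts) - 1)
print(debias_by_permutation(biased, "which?", ["A", "B", "C"]))  # all options tie
```

Exhaustive permutation costs n! scorer calls, which is why low-cost correction methods like the one the abstract promises are of practical interest.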
Discovering Semantic Latent Structures in Psychological Scales: A Response-Free Pathway to Efficient Simplification
arXiv:2602.12575v1 Announce Type: new Abstract: Psychological scale refinement traditionally relies on response-based methods such as factor analysis, item response theory, and network psychometrics to optimize item composition. Although rigorous, these approaches require large samples and may be constrained by data...
Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR
arXiv:2602.12642v1 Announce Type: new Abstract: Reward-maximizing RL methods enhance the reasoning performance of LLMs, but often reduce the diversity among outputs. Recent works address this issue by adopting GFlowNets, training LLMs to match a target distribution while jointly learning its...
ReFilter: Improving Robustness of Retrieval-Augmented Generation via Gated Filter
arXiv:2602.12709v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) has become a dominant paradigm for grounding large language models (LLMs) with external evidence in knowledge-intensive question answering. A core design choice is how to fuse retrieved samples into the LLMs, where...
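The abstract is cut off before describing ReFilter's actual gate, but the general idea of gated evidence fusion can be sketched as follows (function names and the threshold are illustrative assumptions, not the paper's design): score each retrieved passage with a relevance gate, drop low-scoring passages, and fuse only what remains into the prompt.

```python
def gated_filter(passages, gate_scores, threshold=0.5):
    """Keep only passages whose gate score clears the threshold,
    returning (passage, score) pairs sorted by descending score."""
    kept = [(p, s) for p, s in zip(passages, gate_scores) if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

passages = ["Paris is the capital of France.",
            "Bananas are yellow.",
            "France borders Spain."]
scores = [0.92, 0.10, 0.61]  # e.g. from a small relevance scorer (assumed)
for text, score in gated_filter(passages, scores):
    print(f"{score:.2f}  {text}")
```

The appeal of gating over unconditional concatenation is robustness: irrelevant or adversarial retrievals are suppressed before they can steer the LLM's answer.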
Towards a Diagnostic and Predictive Evaluation Methodology for Sequence Labeling Tasks
arXiv:2602.12759v1 Announce Type: new Abstract: Standard evaluation in NLP typically indicates that system A is better on average than system B, but it provides little information on how to improve performance and, what is worse, it should not come as...
RAT-Bench: A Comprehensive Benchmark for Text Anonymization
arXiv:2602.12806v1 Announce Type: new Abstract: Data containing personal information is increasingly used to train, fine-tune, or query Large Language Models (LLMs). Text is typically scrubbed of identifying information prior to use, often with tools such as Microsoft's Presidio or Anthropic's...
Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence
arXiv:2602.12811v1 Announce Type: new Abstract: When humans and large language models (LLMs) process the same text, activations in the LLMs correlate with brain activity measured, e.g., with functional magnetic resonance imaging (fMRI). Moreover, it has been shown that, as the...