Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning
arXiv:2603.13243v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) generate text via iterative denoising but consistently underperform on multi-step reasoning. We hypothesize this gap stems from a coordination problem: AR models build coherence token-by-token, while diffusion models must coordinate...
LLM Routing as Reasoning: A MaxSAT View
arXiv:2603.13612v1 Announce Type: new Abstract: Routing a query through an appropriate LLM is challenging, particularly when user preferences are expressed in natural language and model attributes are only partially observable. We propose a constraint-based interpretation of language-conditioned LLM routing, formulating...
GroupGuard: A Framework for Modeling and Defending Collusive Attacks in Multi-Agent Systems
arXiv:2603.13940v1 Announce Type: new Abstract: While large language model-based agents demonstrate great potential in collaborative tasks, their interactivity also introduces security vulnerabilities. In this paper, we propose and model group collusive attacks, a highly destructive threat in which multiple agents...
ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems
arXiv:2603.13247v1 Announce Type: new Abstract: The proliferation of autonomous AI agents capable of executing real-world actions - filesystem operations, API calls, database modifications, financial transactions - introduces a class of safety risk not addressed by existing content-moderation infrastructure. Current text-safety...
DyACE: Dynamic Algorithm Co-evolution for Online Automated Heuristic Design with Large Language Model
arXiv:2603.13344v1 Announce Type: new Abstract: The prevailing paradigm in Automated Heuristic Design (AHD) typically relies on the assumption that a single, fixed algorithm can effectively navigate the shifting dynamics of a combinatorial search. This static approach often proves inadequate for...
Human Attribution of Causality to AI Across Agency, Misuse, and Misalignment
arXiv:2603.13236v1 Announce Type: new Abstract: AI-related incidents are becoming increasingly frequent and severe, ranging from safety failures to misuse by malicious actors. In such complex situations, identifying which elements caused an adverse outcome, the problem of cause selection, is a...
vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
arXiv:2603.13966v1 Announce Type: new Abstract: Vision Language Action VLA models are typically evaluated using per benchmark scripts maintained independently by each model repository, leading to duplicated code, dependency conflicts, and underspecified protocols. We present vla eval, an open source evaluation...
StatePlane: A Cognitive State Plane for Long-Horizon AI Systems Under Bounded Context
arXiv:2603.13644v1 Announce Type: new Abstract: Large language models (LLMs) and small language models (SLMs) operate under strict context window and key-value (KV) cache constraints, fundamentally limiting their ability to reason coherently over long interaction horizons. Existing approaches -- extended context...
MESD: Detecting and Mitigating Procedural Bias in Intersectional Groups
arXiv:2603.13452v1 Announce Type: new Abstract: Research about bias in machine learning has mostly focused on outcome-oriented fairness metrics (e.g., equalized odds) and on a single protected category. Although these approaches offer great insight into bias in ML, they provide limited...
Distilling Deep Reinforcement Learning into Interpretable Fuzzy Rules: An Explainable AI Framework
arXiv:2603.13257v1 Announce Type: new Abstract: Deep Reinforcement Learning (DRL) agents achieve remarkable performance in continuous control but remain opaque, hindering deployment in safety-critical domains. Existing explainability methods either provide only local insights (SHAP, LIME) or employ over-simplified surrogates failing to...
Multimodal Emotion Regression with Multi-Objective Optimization and VAD-Aware Audio Modeling for the 10th ABAW EMI Track
arXiv:2603.13760v1 Announce Type: new Abstract: We participated in the 10th ABAW Challenge, focusing on the Emotional Mimicry Intensity (EMI) Estimation track on the Hume-Vidmimic2 dataset. This task aims to predict six continuous emotion dimensions: Admiration, Amusement, Determination, Empathic Pain, Excitement,...
Artificial intelligence-driven improvement of hospital logistics management resilience: a practical exploration based on H Hospital
arXiv:2603.13816v1 Announce Type: new Abstract: Hospital logistics management faces growing pressure from internal operations and external emergencies, with artificial intelligence (AI) holding untapped potential to boost its resilience. This study explores AI's role in enhancing logistics resilience via a mixed-methods...
A Dual-Path Generative Framework for Zero-Day Fraud Detection in Banking Systems
arXiv:2603.13237v1 Announce Type: new Abstract: High-frequency banking environments face a critical trade-off between low-latency fraud detection and the regulatory explainability demanded by GDPR. Traditional rule-based and discriminative models struggle with "zero-day" attacks due to extreme class imbalance and the lack...
PA-Net: Precipitation-Adaptive Mixture-of-Experts for Long-Tail Rainfall Nowcasting
arXiv:2603.13818v1 Announce Type: new Abstract: Precipitation nowcasting is vital for flood warning, agricultural management, and emergency response, yet two bottlenecks persist: the prohibitive cost of modeling million-scale spatiotemporal tokens from multi-variate atmospheric fields, and the extreme long-tailed rainfall distribution where...
Multi-Axis Trust Modeling for Interpretable Account Hijacking Detection
arXiv:2603.13246v1 Announce Type: new Abstract: This paper proposes a Hadith-inspired multi-axis trust modeling framework, motivated by a structurally analogous problem in classical Hadith scholarship: assessing the trustworthiness of information sources using interpretable, multidimensional criteria rather than a single anomaly score....
Agent-Based User-Adaptive Filtering for Categorized Harassing Communication
arXiv:2603.13288v1 Announce Type: new Abstract: We propose an agent-based framework for personalized filtering of categorized harassing communication in online social networks. Unlike global moderation systems that apply uniform filtering rules, our approach models user-specific tolerance levels and preferences through adaptive...
Traffic and weather driven hybrid digital twin for bridge monitoring
arXiv:2603.14028v1 Announce Type: new Abstract: A hybrid digital twin framework is presented for bridge condition monitoring using existing traffic cameras and weather APIs, reducing reliance on dedicated sensor installations. The approach is demonstrated on the Peace Bridge (99 years in...
DeceptGuard :A Constitutional Oversight Framework For Detecting Deception in LLM Agents
arXiv:2603.13791v1 Announce Type: new Abstract: Reliable detection of deceptive behavior in Large Language Model (LLM) agents is an essential prerequisite for safe deployment in high-stakes agentic contexts. Prior work on scheming detection has focused exclusively on black-box monitors that observe...
Why Grokking Takes So Long: A First-Principles Theory of Representational Phase Transitions
arXiv:2603.13331v1 Announce Type: new Abstract: Grokking is the sudden generalization that appears long after a model has perfectly memorized its training data. Although this phenomenon has been widely observed, there is still no quantitative theory explaining the length of the...
Learning When to Trust in Contextual Bandits
arXiv:2603.13356v1 Announce Type: new Abstract: Standard approaches to Robust Reinforcement Learning assume that feedback sources are either globally trustworthy or globally adversarial. In this paper, we challenge this assumption and we identify a more subtle failure mode. We term this...
LiveWeb-IE: A Benchmark For Online Web Information Extraction
arXiv:2603.13773v1 Announce Type: new Abstract: Web information extraction (WIE) is the task of automatically extracting data from web pages, offering high utility for various applications. The evaluation of WIE systems has traditionally relied on benchmarks built from HTML snapshots captured...
GhanaNLP Parallel Corpora: Comprehensive Multilingual Resources for Low-Resource Ghanaian Languages
arXiv:2603.13793v1 Announce Type: new Abstract: Low resource languages present unique challenges for natural language processing due to the limited availability of digitized and well structured linguistic data. To address this gap, the GhanaNLP initiative has developed and curated 41,513 parallel...
A Systematic Evaluation Protocol of Graph-Derived Signals for Tabular Machine Learning
arXiv:2603.13998v1 Announce Type: new Abstract: While graph-derived signals are widely used in tabular learning, existing studies typically rely on limited experimental setups and average performance comparisons, leaving the statistical reliability and robustness of observed gains largely unexplored. Consequently, it remains...
Formal Abductive Explanations for Navigating Mental Health Help-Seeking and Diversity in Tech Workplaces
arXiv:2603.14007v1 Announce Type: new Abstract: This work proposes a formal abductive explanation framework designed to systematically uncover rationales underlying AI predictions of mental health help-seeking within tech workplace settings. By computing rigorous justifications for model outputs, this approach enables principled...
Deep Convolutional Architectures for EEG Classification: A Comparative Study with Temporal Augmentation and Confidence-Based Voting
arXiv:2603.13261v1 Announce Type: new Abstract: Electroencephalography (EEG) classification plays a key role in brain-computer interface (BCI) systems, yet it remains challenging due to the low signal-to-noise ratio, temporal variability of neural responses, and limited data availability. In this paper, we...
Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models
arXiv:2603.13985v1 Announce Type: new Abstract: Pre-trained Large Language Model (LLM) exhibits broad capabilities, yet, for specific tasks or domains their attainment of higher accuracy and more reliable reasoning generally depends on post-training through Supervised Fine-Tuning (SFT) or Reinforcement Learning (RL)....
Can We Trust LLMs on Memristors? Diving into Reasoning Ability under Non-Ideality
arXiv:2603.13725v1 Announce Type: new Abstract: Memristor-based analog compute-in-memory (CIM) architectures provide a promising substrate for the efficient deployment of Large Language Models (LLMs), owing to superior energy efficiency and computational density. However, these architectures suffer from precision issues caused by...
Intelligent Materials Modelling: Large Language Models Versus Partial Least Squares Regression for Predicting Polysulfone Membrane Mechanical Performance
arXiv:2603.13834v1 Announce Type: new Abstract: Predicting the mechanical properties of polysulfone (PSF) membranes from structural descriptors remains challenging due to extreme data scarcity typical of experimental studies. To investigate this issue, this study benchmarked knowledge-driven inference using four large language...
The ARC of Progress towards AGI: A Living Survey of Abstraction and Reasoning
arXiv:2603.13372v1 Announce Type: new Abstract: The Abstraction and Reasoning Corpus (ARC-AGI) has become a key benchmark for fluid intelligence in AI. This survey presents the first cross-generation analysis of 82 approaches across three benchmark versions and the ARC Prize 2024-2025...
Early Rug Pull Warning for BSC Meme Tokens via Multi-Granularity Wash-Trading Pattern Profiling
arXiv:2603.13830v1 Announce Type: new Abstract: The high-frequency issuance and short-cycle speculation of meme tokens in decentralized finance (DeFi) have significantly amplified rug-pull risk. Existing approaches still struggle to provide stable early warning under scarce anomalies, incomplete labels, and limited interpretability....