Adversarial Reinforcement Learning for Detecting False Data Injection Attacks in Vehicular Routing
arXiv:2603.11433v1 Announce Type: new Abstract: In modern transportation networks, adversaries can manipulate routing algorithms using false data injection attacks, such as simulating heavy traffic with multiple devices running crowdsourced navigation applications, to mislead vehicles toward suboptimal routes and increase congestion....
Gender Bias in Generative AI-assisted Recruitment Processes
arXiv:2603.11736v1 Announce Type: new Abstract: In recent years, generative artificial intelligence (GenAI) systems have assumed increasingly crucial roles in selection processes, personnel recruitment and analysis of candidates' profiles. However, the employment of large language models (LLMs) risks reproducing, and in...
DocSage: An Information Structuring Agent for Multi-Doc Multi-Entity Question Answering
arXiv:2603.11798v1 Announce Type: new Abstract: Multi-document Multi-entity Question Answering inherently demands models to track implicit logic between multiple entities across scattered documents. However, existing Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) frameworks suffer from critical limitations: standard RAG's vector...
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
arXiv:2603.11076v1 Announce Type: new Abstract: Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace this brittleness to insufficient diversity in synthesized tasks. Scaling diversity is...
BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs
arXiv:2603.11991v1 Announce Type: new Abstract: Zero-shot text classification (ZSC) offers the promise of eliminating costly task-specific annotation by matching texts directly to human-readable label descriptions. While early approaches have predominantly relied on cross-encoder models fine-tuned for natural language inference (NLI),...
H2LooP Spark Preview: Continual Pretraining of Large Language Models for Low-Level Embedded Systems Code
arXiv:2603.11139v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate strong code generation abilities in general-purpose programming languages but remain limited in specialized domains such as low-level embedded systems programming. This domain involves hardware register manipulation, vendor-specific SDKs, real-time operating...
Procedural Fairness via Group Counterfactual Explanation
arXiv:2603.11140v1 Announce Type: new Abstract: Fairness in machine learning research has largely focused on outcome-oriented fairness criteria such as Equalized Odds, while comparatively less attention has been given to procedural-oriented fairness, which addresses how a model arrives at its predictions....
Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT
arXiv:2603.11142v1 Announce Type: new Abstract: The paper explores how video models trained for classification tasks represent nuanced, hidden semantic information that may not affect the final outcome, a key challenge for Trustworthy AI models. Through Explainable and Interpretable AI methods,...
Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings
arXiv:2603.11321v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for post-training reasoning models. However, group-based methods such as Group Relative Policy Optimization (GRPO) face a critical dilemma in sparse-reward settings: pure Reinforcement...
CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents
arXiv:2603.10577v1 Announce Type: new Abstract: Computer-Use Agents (CUAs) are emerging as a new paradigm in human-computer interaction, enabling autonomous execution of tasks in desktop environment by perceiving high-level natural-language instructions. As such agents become increasingly capable and are deployed across...
TAMUSA-Chat: A Domain-Adapted Large Language Model Conversational System for Research and Responsible Deployment
arXiv:2603.09992v1 Announce Type: cross Abstract: This paper presents TAMUSA-Chat, a research-oriented framework for building domain-adapted large language model conversational systems. The work addresses critical challenges in adapting general-purpose foundation models to institutional contexts through supervised fine-tuning, retrieval-augmented generation, and systematic...
The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths
arXiv:2603.10030v1 Announce Type: cross Abstract: AI transport libraries move bytes efficiently, but they commonly assume that buffers are already correctly allocated, placed, shared, registered, and safe under completion and teardown pressure. This paper presents dmaplane, a Linux kernel module that...
Mitigating Translationese Bias in Multilingual LLM-as-a-Judge via Disentangled Information Bottleneck
arXiv:2603.10351v1 Announce Type: new Abstract: Large language models (LLMs) have become a standard for multilingual evaluation, yet they exhibit a severe systematic translationese bias. In this paper, translationese bias is characterized as LLMs systematically favoring machine-translated text over human-authored references,...
Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias
arXiv:2603.10123v1 Announce Type: new Abstract: The ``Lost in the Middle'' phenomenon -- a U-shaped performance curve where LLMs retrieve well from the beginning and end of a context but fail in the middle -- is widely attributed to learned Softmax...
Variance-Aware Adaptive Weighting for Diffusion Model Training
arXiv:2603.10391v1 Announce Type: new Abstract: Diffusion models have recently achieved remarkable success in generative modeling, yet their training dynamics across different noise levels remain highly imbalanced, which can lead to inefficient optimization and unstable learning behavior. In this work, we...
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
arXiv:2603.10397v1 Announce Type: new Abstract: One crucial factor behind the success of deep learning lies in the implicit bias induced by noise inherent in gradient-based training algorithms. Motivated by empirical observations that training with noisy labels improves model generalization, we...
World2Mind: Cognition Toolkit for Allocentric Spatial Reasoning in Foundation Models
arXiv:2603.09774v1 Announce Type: new Abstract: Achieving robust spatial reasoning remains a fundamental challenge for current Multimodal Foundation Models (MFMs). Existing methods either overfit statistical shortcuts via 3D grounding data or remain confined to 2D visual perception, limiting both spatial reasoning...
Meissa: Multi-modal Medical Agentic Intelligence
arXiv:2603.09018v1 Announce Type: new Abstract: Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However, these systems rely...
MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games
arXiv:2603.09022v1 Announce Type: new Abstract: Multi-turn, multi-agent LLM game evaluations often exhibit substantial run-to-run variance. In long-horizon interactions, small early deviations compound across turns and are amplified by multi-agent coupling. This biases win rate estimates and makes rankings unreliable across...
Telogenesis: Goal Is All U Need
arXiv:2603.09476v1 Announce Type: new Abstract: Goal-conditioned systems assume goals are provided externally. We ask whether attentional priorities can emerge endogenously from an agent's internal cognitive state. We propose a priority function that generates observation targets from three epistemic gaps: ignorance...
One Language, Two Scripts: Probing Script-Invariance in LLM Concept Representations
arXiv:2603.08869v1 Announce Type: new Abstract: Do the features learned by Sparse Autoencoders (SAEs) represent abstract meaning, or are they tied to how text is written? We investigate this question using Serbian digraphia as a controlled testbed: Serbian is written interchangeably...
TaSR-RAG: Taxonomy-guided Structured Reasoning for Retrieval-Augmented Generation
arXiv:2603.09341v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) helps large language models (LLMs) answer knowledge-intensive and time-sensitive questions by conditioning generation on external evidence. However, most RAG systems still retrieve unstructured chunks and rely on one-shot generation, which often yields...
Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety
arXiv:2603.09154v1 Announce Type: new Abstract: Large language models (LLMs) trained on internet-scale corpora can exhibit systematic biases that increase the probability of unwanted behavior. In this study, we examined potential biases towards synthetic vs. biological technological solutions across four domains...
Logos: An evolvable reasoning engine for rational molecular design
arXiv:2603.09268v1 Announce Type: new Abstract: The discovery and design of functional molecules remain central challenges across chemistry,biology, and materials science. While recent advances in machine learning have accelerated molecular property prediction and candidate generation, existing models tend to excel either...
Logics-Parsing-Omni Technical Report
arXiv:2603.09677v1 Announce Type: new Abstract: Addressing the challenges of fragmented task definitions and the heterogeneity of unstructured data in multimodal parsing, this paper proposes the Omni Parsing framework. This framework establishes a Unified Taxonomy covering documents, images, and audio-visual streams,...
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants
arXiv:2603.09652v1 Announce Type: new Abstract: With the rapid advancement of Large Language Models (LLMs) in code generation, human-AI interaction is evolving from static text responses to dynamic, interactive HTML-based applications, which we term MiniApps. These applications require models to not...
Self-hosted Lecture-to-Quiz: Local LLM MCQ Generation with Deterministic Quality Control
arXiv:2603.08729v1 Announce Type: cross Abstract: We present an end-to-end self-hosted (API-free) pipeline, where API-free means that lecture content is not sent to any external LLM service, that converts lecture PDFs into multiple-choice questions (MCQs) using a local LLM plus deterministic...
Fish Audio S2 Technical Report
arXiv:2603.08823v1 Announce Type: cross Abstract: We introduce Fish Audio S2, an open-sourced text-to-speech system featuring multi-speaker, multi-turn generation, and, most importantly, instruction-following control via natural-language descriptions. To scale training, we develop a multi-stage training recipe together with a staged data...
PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration
arXiv:2603.08935v1 Announce Type: cross Abstract: Pathology underpins modern diagnosis and cancer care, yet its most valuable asset, the accumulated experience encoded in millions of narrative reports, remains largely inaccessible. Although institutions are rapidly digitizing pathology workflows, storing data without effective...
The Coupling Within: Flow Matching via Distilled Normalizing Flows
arXiv:2603.09014v1 Announce Type: new Abstract: Flow models have rapidly become the go-to method for training and deploying large-scale generators, owing their success to inference-time flexibility via adjustable integration steps. A crucial ingredient in flow training is the choice of coupling...