Accurate and Efficient Multi-Channel Time Series Forecasting via Sparse Attention Mechanism
arXiv:2603.18712v1 Announce Type: new Abstract: The task of multi-channel time series forecasting is ubiquitous in numerous fields such as finance, supply chain management, and energy planning. It is critical to effectively capture complex dynamic dependencies within and between channels for...
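The truncated abstract names sparse attention but not its mechanism. As background, a generic top-k sparse attention can be sketched as below; the function name, the top-k masking rule, and all shapes are illustrative assumptions, not the paper's design:

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k=4):
    """Generic top-k sparse attention: each query attends only to its
    k highest-scoring keys; all other scores are masked to -inf so
    they receive zero weight after the softmax."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (n_q, n_k)
    kth = np.sort(scores, axis=-1)[:, -k][:, None]     # k-th largest per row
    masked = np.where(scores >= kth, scores, -np.inf)  # drop sub-threshold keys
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

With `k` equal to the number of keys this reduces to ordinary dense attention; smaller `k` trades coverage for the efficiency the title refers to.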
Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably
arXiv:2603.18563v1 Announce Type: new Abstract: AI agents are increasingly deployed in interactive economic environments characterized by repeated AI-AI interactions. Despite AI agents' advanced capabilities, empirical studies reveal that such interactions often fail to stably induce a strategic equilibrium, such as...
A Concept is More Than a Word: Diversified Unlearning in Text-to-Image Diffusion Models
arXiv:2603.18767v1 Announce Type: new Abstract: Concept unlearning has emerged as a promising direction for reducing the risks of harmful content generation in text-to-image diffusion models by selectively erasing undesirable concepts from a model's parameters. Existing approaches typically rely on keywords...
Consumer-to-Clinical Language Shifts in Ambient AI Draft Notes and Clinician-Finalized Documentation: A Multi-level Analysis
arXiv:2603.18327v1 Announce Type: new Abstract: Ambient AI generates draft clinical notes from patient-clinician conversations, often using lay or consumer-oriented phrasing to support patient understanding instead of standardized clinical terminology. How clinicians revise these drafts for professional documentation conventions remains unclear....
Access Controlled Website Interaction for Agentic AI with Delegated Critical Tasks
arXiv:2603.18197v1 Announce Type: new Abstract: Recent studies reveal gaps in delegating critical tasks to agentic AI that accesses websites on the user's behalf, primarily due to limited access control mechanisms on websites designed for agentic AI. In response, we propose...
NeuroGame Transformer: Gibbs-Inspired Attention Driven by Game Theory and Statistical Physics
arXiv:2603.18761v1 Announce Type: new Abstract: Standard attention mechanisms in transformers are limited by their pairwise formulation, which hinders the modeling of higher-order dependencies among tokens. We introduce the NeuroGame Transformer (NGT) to overcome this by reconceptualizing attention through a dual...
Balanced Thinking: Improving Chain of Thought Training in Vision Language Models
arXiv:2603.18656v1 Announce Type: new Abstract: Multimodal reasoning in vision-language models (VLMs) typically relies on a two-stage process: supervised fine-tuning (SFT) and reinforcement learning (RL). In standard SFT, all tokens contribute equally to the loss, even though reasoning data are inherently...
TeachingCoach: A Fine-Tuned Scaffolding Chatbot for Instructional Guidance to Instructors
arXiv:2603.18189v1 Announce Type: new Abstract: Higher education instructors often lack timely and pedagogically grounded support, as scalable instructional guidance remains limited and existing tools rely on generic chatbot advice or non-scalable human-to-human consultations at teaching centers. We present TeachingCoach, a pedagogically...
Controllable Evidence Selection in Retrieval-Augmented Question Answering via Deterministic Utility Gating
arXiv:2603.18011v1 Announce Type: new Abstract: Many modern AI question-answering systems convert text into vectors and retrieve the closest matches to a user question. While effective for topical similarity, similarity scores alone do not explain why some retrieved text can serve...
Don't Vibe Code, Do Skele-Code: Interactive No-Code Notebooks for Subject Matter Experts to Build Lower-Cost Agentic Workflows
arXiv:2603.18122v1 Announce Type: new Abstract: Skele-Code is a natural-language and graph-based interface for building workflows with AI agents, designed especially for less technical or non-technical users. It supports incremental, interactive notebook-style development, and each step is converted to code with a...
MANAR: Memory-augmented Attention with Navigational Abstract Conceptual Representation
arXiv:2603.18676v1 Announce Type: new Abstract: MANAR (Memory-augmented Attention with Navigational Abstract Conceptual Representation) is a contextualization layer that generalizes standard multi-head attention (MHA) by instantiating the principles of Global Workspace Theory (GWT). While MHA enables unconstrained all-to-all communication, it lacks the functional bottleneck...
Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI
arXiv:2603.18104v1 Announce Type: new Abstract: Prevailing AI training infrastructure assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structural degradation of geometric properties through training are consequences of this arithmetic substrate....
CWoMP: Morpheme Representation Learning for Interlinear Glossing
arXiv:2603.18184v1 Announce Type: new Abstract: Interlinear glossed text (IGT) is a standard notation for language documentation which is linguistically rich but laborious to produce manually. Recent automated IGT methods treat glosses as character sequences, neglecting their compositional structure. We propose...
From Noise to Signal: When Outliers Seed New Topics
arXiv:2603.18358v1 Announce Type: new Abstract: Outliers in dynamic topic modeling are typically treated as noise, yet we show that some can serve as early signals of emerging topics. We introduce a temporal taxonomy of news-document trajectories that defines how documents...
TopoChunker: Topology-Aware Agentic Document Chunking Framework
arXiv:2603.18409v1 Announce Type: new Abstract: Current document chunking methods for Retrieval-Augmented Generation (RAG) typically linearize text. This forced linearization strips away intrinsic topological hierarchies, creating "semantic fragmentation" that degrades downstream retrieval quality. In this paper, we propose TopoChunker, an agentic...
Multimodal Task Interference: A Benchmark and Analysis of History-Target Mismatch in Multimodal LLMs
arXiv:2603.18425v1 Announce Type: new Abstract: Task interference, the performance degradation caused by task switches within a single conversation, has been studied exclusively in text-only settings despite the growing prevalence of multimodal dialogue systems. We introduce a benchmark for evaluating this...
UT-ACA: Uncertainty-Triggered Adaptive Context Allocation for Long-Context Inference
arXiv:2603.18446v1 Announce Type: new Abstract: Long-context inference remains challenging for large language models due to attention dilution and out-of-distribution degradation. Context selection mitigates this limitation by attending to a subset of key-value cache entries, yet most methods allocate a fixed...
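The abstract criticizes methods that allocate a fixed budget of key-value cache entries. That fixed-budget baseline (not UT-ACA itself) can be sketched generically; the function name, scoring rule, and shapes here are assumptions for illustration:

```python
import numpy as np

def select_kv(query, keys, values, budget=8):
    """Fixed-budget context selection baseline: score every cached key
    against the current query and keep only the top-`budget` KV pairs.
    Adaptive methods would instead vary `budget` per query."""
    scores = keys @ query / np.sqrt(query.shape[-1])   # (n_cache,)
    keep = np.argsort(scores)[-budget:]                # indices of top scores
    return keys[keep], values[keep]
```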
The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices
arXiv:2603.18482v1 Announce Type: new Abstract: Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human language production operates differently: tokens are chosen for communicative appropriateness rather...
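The "blind spot" the title refers to is visible in standard nucleus (top-p) truncation, where every token outside the high-probability nucleus receives exactly zero probability. A minimal sketch of that standard filter (not the paper's proposal):

```python
import numpy as np

def nucleus_filter(probs, p=0.9):
    """Top-p (nucleus) truncation: keep the smallest set of tokens whose
    cumulative probability reaches p, zero out the rest, renormalize."""
    order = np.argsort(probs)[::-1]            # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1       # smallest prefix with mass >= p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()
```

Any token the model ranks low but a human might plausibly choose is assigned probability zero by this filter, which is the systematic exclusion the abstract describes.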
Language Model Maps for Prompt-Response Distributions via Log-Likelihood Vectors
arXiv:2603.18593v1 Announce Type: new Abstract: We propose a method that represents language models by log-likelihood vectors over prompt-response pairs and constructs model maps for comparing their conditional distributions. In this space, distances between models approximate the KL divergence between the...
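The construction described in the abstract, a vector of log-likelihoods per model with pairwise distances between those vectors, can be sketched generically. The centering step and Euclidean metric below are illustrative assumptions, not necessarily the paper's exact formulation:

```python
import numpy as np

def model_map(loglik):
    """loglik[i, j] = log-likelihood that model i assigns to
    prompt-response pair j. Returns the pairwise distance matrix
    between the models' (centered) log-likelihood vectors."""
    X = loglik - loglik.mean(axis=1, keepdims=True)   # center each model's vector
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))          # (n_models, n_models)
```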
Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media
arXiv:2603.18611v1 Announce Type: new Abstract: Advances in social media data dissemination enable the provision of real-time information during a crisis. This information falls into different classes, such as infrastructure damage or persons missing or stranded in the affected zone. Existing...
DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
arXiv:2603.18612v1 Announce Type: new Abstract: We introduce DiscoPhon, a multilingual benchmark for evaluating unsupervised phoneme discovery from discrete speech units. DiscoPhon covers 6 dev and 6 test languages, chosen to span a wide range of phonemic contrasts. Given only 10...
Why Better Cross-Lingual Alignment Fails for Better Cross-Lingual Transfer: Case of Encoders
arXiv:2603.18863v1 Announce Type: new Abstract: Better cross-lingual alignment is often assumed to yield better cross-lingual transfer. However, explicit alignment techniques -- despite increasing embedding similarity -- frequently fail to improve token-level downstream performance. In this work, we show that this...
Frayed RoPE and Long Inputs: A Geometric Perspective
arXiv:2603.18017v1 Announce Type: new Abstract: Rotary Positional Embedding (RoPE) is a widely adopted technique for encoding position in language models, which, while effective, causes performance breakdown when input length exceeds training length. Prior analyses assert (rightly) that long inputs cause...
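For context, standard RoPE rotates each 2-D pair of a query or key vector by an angle proportional to its position, so attention scores depend only on relative position; positions far beyond the training length produce rotation angles outside the trained range, the breakdown the abstract analyzes. A minimal sketch of the standard rotation (not the paper's fix):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotary positional embedding: rotate each pair (x[2i], x[2i+1])
    of an even-dimensional vector by angle pos * base**(-2i/d)."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(d // 2) * 2.0 / d)    # per-pair frequencies
    ang = pos * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = np.stack([x1 * cos - x2 * sin,
                        x1 * sin + x2 * cos], axis=-1)
    return rotated.reshape(x.shape)
```

The defining property is that dot products between rotated vectors depend only on the positional offset, not the absolute positions.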
Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
arXiv:2603.18029v1 Announce Type: new Abstract: Transformers resist surgical control. Ablating an attention head identified as critical for capitalization produces minimal behavioral change because distributed redundancy compensates for damage. This Hydra effect renders interpretability illusory: we may identify components through correlation,...
InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
arXiv:2603.18031v1 Announce Type: new Abstract: Balancing fine-grained local modeling with long-range dependency capture under computational constraints remains a central challenge in sequence modeling. While Transformers provide strong token mixing, they suffer from quadratic complexity, whereas Mamba-style selective state-space models (SSMs)...
Taming Epilepsy: Mean Field Control of Whole-Brain Dynamics
arXiv:2603.18035v1 Announce Type: new Abstract: Controlling the high-dimensional neural dynamics during epileptic seizures remains a significant challenge due to the nonlinear characteristics and complex connectivity of the brain. In this paper, we propose a novel framework, namely Graph-Regularized Koopman Mean-Field...
Quotient Geometry and Persistence-Stable Metrics for Swarm Configurations
arXiv:2603.18041v1 Announce Type: new Abstract: Swarm and constellation reconfiguration can be viewed as motion of an unordered point configuration in an ambient space. Here, we provide persistence-stable, symmetry-invariant geometric representations for comparing and monitoring multi-agent configuration data. We introduce a...
Variational Phasor Circuits for Phase-Native Brain-Computer Interface Classification
arXiv:2603.18078v1 Announce Type: new Abstract: We present the Variational Phasor Circuit (VPC), a deterministic classical learning architecture operating on the continuous $S^1$ unit circle manifold. Inspired by variational quantum circuits, VPC replaces dense real-valued weight matrices with trainable phase shifts,...
Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner
arXiv:2603.18088v1 Announce Type: new Abstract: Constraints are essential for stabilizing reinforcement learning fine-tuning (RFT) and preventing degenerate outputs, yet they inherently conflict with the optimization objective because stronger constraints limit the ability of a fine-tuned model to discover better solutions....
Tula: Optimizing Time, Cost, and Generalization in Distributed Large-Batch Training
arXiv:2603.18112v1 Announce Type: new Abstract: Distributed training increases the number of batches processed per iteration either by scaling-out (adding more nodes) or scaling-up (increasing the batch-size). However, the largest configuration does not necessarily yield the best performance. Horizontal scaling introduces...