Context Engineering: From Prompts to Corporate Multi-Agent Architecture
arXiv:2603.09619v1 Announce Type: new Abstract: As artificial intelligence (AI) systems evolve from stateless chatbots to autonomous multi-step agents, prompt engineering (PE), the discipline of crafting individual queries, proves necessary but insufficient. This paper introduces context engineering (CE) as a standalone...
Meissa: Multi-modal Medical Agentic Intelligence
arXiv:2603.09018v1 Announce Type: new Abstract: Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However, these systems rely...
Vibe-Creation: The Epistemology of Human-AI Emergent Cognition
arXiv:2603.09486v1 Announce Type: new Abstract: The encounter between human reasoning and generative artificial intelligence (GenAI) cannot be adequately described by inherited metaphors of tool use, augmentation, or collaborative partnership. This article argues that such interactions produce a qualitatively distinct cognitive-epistemic...
DEO: Training-Free Direct Embedding Optimization for Negation-Aware Retrieval
arXiv:2603.09185v1 Announce Type: new Abstract: Recent advances in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) have enabled diverse retrieval methods. However, existing retrieval methods often fail to accurately retrieve results for negation and exclusion queries. To address this limitation,...
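One training-free intuition behind negation-aware retrieval is to edit the query embedding directly, steering it away from the excluded concept. A minimal sketch with toy 3-d vectors follows; the embedding table, the subtraction weight `alpha`, and the `negated_query` helper are illustrative assumptions, not the paper's DEO method.

```python
# Sketch: steer a query embedding away from an excluded concept.
# Toy 3-d embeddings; vectors and alpha are illustrative, not DEO itself.
import math

EMB = {
    "fruit":  [0.9, 0.1, 0.0],
    "apple":  [0.8, 0.5, 0.1],
    "banana": [0.7, -0.3, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def negated_query(include, exclude, alpha=0.7):
    """Embedding for 'include but NOT exclude': subtract the excluded
    direction from the query vector (a training-free edit in embedding space)."""
    return [i - alpha * e for i, e in zip(EMB[include], EMB[exclude])]

# Plain query "fruit" ranks the excluded "apple" first ...
assert cosine(EMB["fruit"], EMB["apple"]) > cosine(EMB["fruit"], EMB["banana"])

# ... while the edited query demotes it below "banana".
q = negated_query("fruit", "apple")
ranked = sorted(["apple", "banana"], key=lambda d: cosine(q, EMB[d]), reverse=True)
print(ranked[0])  # -> banana
```

The point of the sketch is only that a single vector-space edit, with no retraining, can flip the ranking for an exclusion query.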
Abundant Intelligence and Deficient Demand: A Macro-Financial Stress Test of Rapid AI Adoption
arXiv:2603.09209v1 Announce Type: new Abstract: We formalize a macro-financial stress test for rapid AI adoption. Rather than a productivity bust or existential risk, we identify a distribution-and-contract mismatch: AI-generated abundance coexists with demand deficiency because economic institutions are anchored to...
PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution
arXiv:2603.09641v1 Announce Type: new Abstract: LLM agents that store knowledge as natural language suffer steep retrieval degradation as condition count grows, often struggle to compose learned rules reliably, and typically lack explicit mechanisms to detect stale or adversarial knowledge. We...
Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety
arXiv:2603.09154v1 Announce Type: new Abstract: Large language models (LLMs) trained on internet-scale corpora can exhibit systematic biases that increase the probability of unwanted behavior. In this study, we examined potential biases towards synthetic vs. biological technological solutions across four domains...
Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning
arXiv:2603.08999v1 Announce Type: new Abstract: Large language models (LLMs) achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet often generate unnecessarily long reasoning paths that incur high inference cost. Recent self-consistency-based approaches further improve accuracy but require sampling and aggregating...
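The cost-saving idea behind confidence-aware self-consistency can be sketched in a few lines: draw chain-of-thought samples one at a time and stop as soon as the majority answer is confident enough, rather than always taking a fixed budget. The `sample_answer` callable and the stopping rule below are assumptions for illustration, not the paper's learned criterion.

```python
# Sketch: early-stopping self-consistency. Draw CoT answers one at a time
# and stop once the leading answer's vote share clears a threshold.
# sample_answer() stands in for an LLM call (an assumption, not a real API).
from collections import Counter

def adaptive_self_consistency(sample_answer, max_samples=16,
                              min_samples=3, threshold=0.75):
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        answer, votes = counts.most_common(1)[0]
        if n >= min_samples and votes / n >= threshold:
            return answer, n              # early stop: confident majority
    return counts.most_common(1)[0][0], max_samples

# Deterministic stand-in for a stochastic model that mostly answers "42".
stream = iter(["42", "42", "42", "41", "42", "42"])
ans, used = adaptive_self_consistency(lambda: next(stream))
print(ans, used)  # stops well before max_samples
```

On easy prompts the first few samples agree and sampling halts immediately; the full budget is spent only when answers genuinely disagree.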
Telogenesis: Goal Is All U Need
arXiv:2603.09476v1 Announce Type: new Abstract: Goal-conditioned systems assume goals are provided externally. We ask whether attentional priorities can emerge endogenously from an agent's internal cognitive state. We propose a priority function that generates observation targets from three epistemic gaps: ignorance...
MultiGraSCCo: A Multilingual Anonymization Benchmark with Annotations of Personal Identifiers
arXiv:2603.08879v1 Announce Type: new Abstract: Accessing sensitive patient data for machine learning is challenging due to privacy concerns. Datasets with annotations of personally identifiable information are crucial for developing and testing anonymization systems to enable safe data sharing that complies...
LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation
arXiv:2603.09403v1 Announce Type: new Abstract: Validating evaluation metrics for NLG typically relies on expensive and time-consuming human annotations, which predominantly exist only for English datasets. We propose "LLM as a Meta-Judge", a scalable framework that utilizes LLMs to generate synthetic...

Cognitively Layered Data Synthesis for Domain Adaptation of LLMs to Space Situational Awareness
arXiv:2603.09231v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate exceptional performance on general-purpose tasks. However, transferring them to complex engineering domains such as space situational awareness (SSA) remains challenging owing to insufficient structural alignment with mission chains, the absence...
An Empirical Study and Theoretical Explanation on Task-Level Model-Merging Collapse
arXiv:2603.09463v1 Announce Type: new Abstract: Model merging unifies independently fine-tuned LLMs from the same base, enabling reuse and integration of parallel development efforts without retraining. However, in practice we observe that merging does not always succeed: certain combinations of task-specialist...
Enhancing Debunking Effectiveness through LLM-based Personality Adaptation
arXiv:2603.09533v1 Announce Type: new Abstract: This study proposes a novel methodology for generating personalized fake news debunking messages by prompting Large Language Models (LLMs) with persona-based inputs aligned to the Big Five personality traits: Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness....
ALARM: Audio-Language Alignment for Reasoning Models
arXiv:2603.09556v1 Announce Type: new Abstract: Large audio language models (ALMs) extend LLMs with auditory understanding. A common approach freezes the LLM and trains only an adapter on self-generated targets. However, this fails for reasoning LLMs (RLMs) whose built-in chain-of-thought traces...
Build, Borrow, or Just Fine-Tune? A Political Scientist's Guide to Choosing NLP Models
arXiv:2603.09595v1 Announce Type: new Abstract: Political scientists increasingly face a consequential choice when adopting natural language processing tools: build a domain-specific model from scratch, borrow and adapt an existing one, or simply fine-tune a general-purpose model on task data? Each...
Surgical Repair of Collapsed Attention Heads in ALiBi Transformers
arXiv:2603.09616v1 Announce Type: new Abstract: We identify a systematic attention collapse pathology in the BLOOM family of transformer language models, where ALiBi positional encoding causes 31-44% of attention heads to attend almost entirely to the beginning-of-sequence token. The collapse follows...
ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling
arXiv:2603.09691v1 Announce Type: new Abstract: Existing end-to-end modeling methods for modular task-oriented dialog systems are typically tailored to specific datasets, making it challenging to adapt to new dialog scenarios. In this work, we propose ESAinsTOD, a unified End-to-end Schema-Aware Instruction-tuning...
Evaluation of LLMs in retrieving food and nutritional context for RAG systems
arXiv:2603.09704v1 Announce Type: new Abstract: In this article, we evaluate four Large Language Models (LLMs) and their effectiveness at retrieving data within a specialized Retrieval-Augmented Generation (RAG) system, using a comprehensive food composition database. Our method is focused on the...
EPIC-EuroParl-UdS: Information-Theoretic Perspectives on Translation and Interpreting
arXiv:2603.09785v1 Announce Type: new Abstract: This paper introduces an updated and combined version of the bidirectional English-German EPIC-UdS (spoken) and EuroParl-UdS (written) corpora containing original European Parliament speeches as well as their translations and interpretations. The new version corrects metadata...
Equitable Multi-Task Learning for AI-RANs
arXiv:2603.08717v1 Announce Type: new Abstract: AI-enabled Radio Access Networks (AI-RANs) are expected to serve heterogeneous users with time-varying learning tasks over shared edge resources. Ensuring equitable inference performance across these users requires adaptive and fair learning mechanisms. This paper introduces...
Cross-Domain Uncertainty Quantification for Selective Prediction: A Comprehensive Bound Ablation with Transfer-Informed Betting
arXiv:2603.08907v1 Announce Type: new Abstract: We present a comprehensive ablation of nine finite-sample bound families for selective prediction with risk control, combining concentration inequalities (Hoeffding, Empirical Bernstein, Clopper-Pearson, Wasserstein DRO, CVaR) with multiple-testing corrections (union bound, Learn Then Test fixed-sequence)...
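A minimal worked instance of the ingredients this abstract combines is a one-sided Hoeffding bound with a union bound over the thresholds being tested: certify a selective-prediction threshold only if its empirical risk plus the concentration term stays under the target. The candidate thresholds and empirical risks below are made-up numbers for illustration.

```python
# Hoeffding upper bound + union bound for selective prediction.
# Candidate (threshold, empirical risk) pairs are illustrative assumptions.
import math

def hoeffding_risk_bound(emp_risk, n, delta, k=1):
    """One-sided Hoeffding upper bound on a [0,1]-valued risk from n i.i.d.
    samples, union-bounded over k simultaneously tested candidates:
        risk <= emp_risk + sqrt(ln(k / delta) / (2 * n))  w.p. >= 1 - delta.
    """
    return emp_risk + math.sqrt(math.log(k / delta) / (2 * n))

candidates = [(0.90, 0.012), (0.80, 0.021), (0.70, 0.034)]
n, delta, target = 2000, 0.05, 0.05

safe = [t for t, r in candidates
        if hoeffding_risk_bound(r, n, delta, k=len(candidates)) <= target]
print(safe)  # only thresholds whose certified risk stays under 5%
```

With n = 2000 and three candidates, the concentration term is about 0.032, so only the most selective threshold (empirical risk 0.012) is certified at the 5% level.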
The Coupling Within: Flow Matching via Distilled Normalizing Flows
arXiv:2603.09014v1 Announce Type: new Abstract: Flow models have rapidly become the go-to method for training and deploying large-scale generators, owing their success to inference-time flexibility via adjustable integration steps. A crucial ingredient in flow training is the choice of coupling...
When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency
arXiv:2603.09024v1 Announce Type: new Abstract: Sudden concept drift makes previously trained predictors unreliable, yet deciding when to retrain and what post-drift data size is sufficient is rarely addressed. We propose CALIPER - a detector- and model-agnostic, data-only test that estimates...
Dynamic Multi-period Experts for Online Time Series Forecasting
arXiv:2603.09062v1 Announce Type: new Abstract: Online Time Series Forecasting (OTSF) requires models to continuously adapt to concept drift. However, existing methods often treat concept drift as a monolithic phenomenon. To address this limitation, we first redefine concept drift by categorizing...
Learning Adaptive LLM Decoding
arXiv:2603.09065v1 Announce Type: new Abstract: Decoding from large language models (LLMs) typically relies on fixed sampling hyperparameters (e.g., temperature, top-p), despite substantial variation in task difficulty and uncertainty across prompts and individual decoding steps. We propose to learn adaptive decoding...
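The premise that fixed sampling hyperparameters ignore per-step uncertainty can be illustrated with a hand-written (not learned) rule: map the entropy of the next-token distribution to a temperature. The interpolation bounds and the `pick_temperature` helper are illustrative assumptions, not the paper's learned decoding policy.

```python
# Sketch: choose a per-step temperature from next-token entropy.
# Peaked distributions get near-greedy decoding; flatter ones keep a
# higher temperature for diversity. The mapping is a hand-written stand-in
# for a learned policy.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pick_temperature(probs, low=0.2, high=1.0):
    """Interpolate between `low` and `high` by normalized entropy."""
    h_max = math.log(len(probs))          # entropy of the uniform distribution
    return low + (high - low) * (entropy(probs) / h_max)

peaked = [0.97, 0.01, 0.01, 0.01]         # model is confident: sample coolly
flat = [0.25, 0.25, 0.25, 0.25]           # model is unsure: keep diversity
print(round(pick_temperature(peaked), 2), pick_temperature(flat))
```

A learned version would replace this fixed mapping with a policy trained to pick the hyperparameters that maximize downstream task reward, but the interface, distribution in, decoding parameters out, is the same.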
Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning
arXiv:2603.09184v1 Announce Type: new Abstract: Most multi-agent systems rely exclusively on autoregressive language models (ARMs) that are based on sequential generation. Although effective for fluent text, ARMs limit global reasoning and plan revision. On the other hand, Discrete Diffusion Language...
The Radio-Frequency Transformer for Signal Separation
arXiv:2603.09201v1 Announce Type: new Abstract: We study a problem of signal separation: estimating a signal of interest (SOI) contaminated by an unknown non-Gaussian background/interference. Given the training data consisting of examples of SOI and interference, we show how to build...
Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control
arXiv:2603.09221v1 Announce Type: new Abstract: Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require but do not natively encode....
TA-GGAD: Testing-time Adaptive Graph Model for Generalist Graph Anomaly Detection
arXiv:2603.09349v1 Announce Type: new Abstract: A significant number of anomalous nodes in the real world, such as fake news, noncompliant users, malicious transactions, and malicious posts, severely compromise the health of the graph data ecosystem and urgently require effective identification...