Intelligence Inertia: Physical Principles and Applications
arXiv:2603.22347v1 Announce Type: new Abstract: While Landauer's principle establishes the fundamental thermodynamic floor for information erasure and Fisher Information provides a metric for local curvature in parameter space, these classical frameworks function effectively only as approximations within regimes of sparse...
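For orientation, the two quantities this abstract invokes have standard textbook forms; the expressions below are background definitions, not notation taken from the paper itself. Landauer bound (minimum heat dissipated to erase one bit): $E_{\min} \ge k_B T \ln 2$. Fisher information (local curvature of the log-likelihood in parameter space): $\mathcal{I}(\theta) = \mathbb{E}_{x \sim p(x;\theta)}\big[\big(\partial_\theta \log p(x;\theta)\big)^{2}\big]$.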
Computational Arbitrage in AI Model Markets
arXiv:2603.22404v1 Announce Type: new Abstract: Consider a market of competing model providers selling query access to models with varying costs and capabilities. Customers submit problem instances and are willing to pay up to a budget for a verifiable solution. An...
AI Mental Models: Learned Intuition and Deliberation in a Bounded Neural Architecture
arXiv:2603.22561v1 Announce Type: new Abstract: This paper asks whether a bounded neural architecture can exhibit a meaningful division of labor between intuition and deliberation on a classic 64-item syllogistic reasoning benchmark. More broadly, the benchmark is relevant to ongoing debates...
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents
arXiv:2603.22386v1 Announce Type: new Abstract: Large language model (LLM)-based systems are becoming increasingly popular for solving tasks by constructing executable workflows that interleave LLM calls, information retrieval, tool use, code execution, memory updates, and verification. This survey reviews recent methods...
Whether, Not Which: Mechanistic Interpretability Reveals Dissociable Affect Reception and Emotion Categorization in LLMs
arXiv:2603.22295v1 Announce Type: new Abstract: Large language models appear to develop internal representations of emotion -- "emotion circuits," "emotion neurons," and structured emotional manifolds have been reported across multiple model families. But every study making these claims uses stimuli signalled...
Towards Automated Community Notes Generation with Large Vision Language Models for Combating Contextual Deception
arXiv:2603.22453v1 Announce Type: new Abstract: Community Notes have emerged as an effective crowd-sourced mechanism for combating online deception on social media platforms. However, their reliance on human contributors limits both timeliness and scalability. In this work, we study the...
CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models
arXiv:2603.22846v1 Announce Type: new Abstract: Embodied Visual Tracking (EVT), a core dynamic task in embodied intelligence, requires an agent to precisely follow a language-specified target. Yet most existing methods rely on single-agent imitation learning, suffering from costly expert data and...
Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies
arXiv:2603.23406v1 Announce Type: new Abstract: While large language models simulate social behaviors, their capacity for stable stance formation and identity negotiation during complex interventions remains unclear. To overcome the limitations of static evaluations, this paper proposes a novel mixed-methods framework...
LLM Olympiad: Why Model Evaluation Needs a Sealed Exam
arXiv:2603.23292v1 Announce Type: new Abstract: Benchmarks and leaderboards are how NLP most often communicates progress, but in the LLM era they are increasingly easy to misread. Scores can reflect benchmark-chasing, hidden evaluation choices, or accidental exposure to test content --...
CAPITU: A Benchmark for Evaluating Instruction-Following in Brazilian Portuguese with Literary Context
arXiv:2603.22576v1 Announce Type: new Abstract: We introduce CAPITU, a benchmark for evaluating instruction-following capabilities of Large Language Models (LLMs) in Brazilian Portuguese. Unlike existing benchmarks that focus on English or use generic prompts, CAPITU contextualizes all tasks within eight canonical...
Improving LLM Predictions via Inter-Layer Structural Encoders
arXiv:2603.22665v1 Announce Type: new Abstract: The standard practice in Large Language Models (LLMs) is to base predictions on the final-layer token representations. Recent studies, however, show that intermediate layers encode substantial information, which may contain more task-relevant features than the...
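The premise above, that intermediate layers carry usable signal, can be inspected directly. The sketch below pulls per-layer hidden states from an off-the-shelf model via Hugging Face transformers; the model name and mean pooling are illustrative choices, not the inter-layer structural encoders proposed in the paper.

# Minimal sketch: inspect intermediate-layer representations of a small LM.
# "gpt2" and mean pooling are illustrative, not the paper's method.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tokenizer("Intermediate layers often encode task-relevant features.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: (embedding layer, layer 1, ..., layer N), each [batch, seq, dim]
for i, h in enumerate(outputs.hidden_states):
    pooled = h.mean(dim=1)                      # mean-pool over the sequence axis
    print(f"layer {i:2d}: pooled representation of dim {pooled.shape[-1]}")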
MERIT: Memory-Enhanced Retrieval for Interpretable Knowledge Tracing
arXiv:2603.22289v1 Announce Type: new Abstract: Knowledge Tracing (KT) models students' evolving knowledge states to predict future performance, serving as a foundation for personalized education. While traditional deep learning models achieve high accuracy, they often lack interpretability. Large Language Models (LLMs)...
Memory Bear AI Memory Science Engine for Multimodal Affective Intelligence: A Technical Report
arXiv:2603.22306v1 Announce Type: new Abstract: Affective judgment in real interaction is rarely a purely local prediction problem. Emotional meaning often depends on prior trajectory, accumulated context, and multimodal evidence that may be weak, noisy, or incomplete at the current moment....
The Efficiency Attenuation Phenomenon: A Computational Challenge to the Language of Thought Hypothesis
arXiv:2603.22312v1 Announce Type: new Abstract: This paper computationally investigates whether thought requires a language-like format, as posited by the Language of Thought (LoT) hypothesis. We introduce the "AI Private Language" thought experiment: if two artificial agents develop an efficient, inscrutable...
Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length
arXiv:2603.22608v1 Announce Type: new Abstract: Users often rely on Large Language Models (LLMs) for processing multiple documents or performing analysis over many instances at once. For example, analysing the overall sentiment of a collection of movie reviews requires an LLM...
KALAVAI: Predicting When Independent Specialist Fusion Works -- A Quantitative Model for Post-Hoc Cooperative LLM Training
arXiv:2603.22755v1 Announce Type: new Abstract: Independently trained domain specialists can be fused post-hoc into a single model that outperforms any individual specialist, and the gain is predictable: gain = 0.82 x divergence - 2.72 (R^2 = 0.856, n=6, 3-26% divergence)....
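The fitted relationship quoted above can be applied directly; the snippet below simply evaluates the reported linear fit (gain = 0.82 x divergence - 2.72), with divergence expressed in percent as in the abstract. It is an illustration of the reported fit, not code from the paper.

# Evaluate the linear fit reported in the abstract: gain = 0.82 * divergence - 2.72.
# Divergence is in percent; the fit was reported over 3-26% divergence (n=6, R^2 = 0.856).
def predicted_fusion_gain(divergence_pct: float) -> float:
    return 0.82 * divergence_pct - 2.72

for d in (3.0, 10.0, 26.0):
    print(f"divergence {d:5.1f}%  ->  predicted gain {predicted_fusion_gain(d):6.2f}")

Under this fit, the predicted gain only becomes positive once divergence exceeds roughly 2.72 / 0.82 ≈ 3.3%, which is consistent with framing fusion success as predictable from specialist divergence.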
Avoiding Over-smoothing in Social Media Rumor Detection with Pre-trained Propagation Tree Transformer
arXiv:2603.22854v1 Announce Type: new Abstract: Deep learning techniques for rumor detection typically utilize Graph Neural Networks (GNNs) to analyze post relations. These methods, however, suffer from over-smoothing when processing rumor propagation structures, which degrades performance. Our investigation...
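As background on the over-smoothing issue mentioned above: repeatedly averaging each node's features with its neighbours drives all node representations toward a common vector. The toy numpy sketch below demonstrates that collapse; it illustrates the phenomenon only and is not the propagation-tree transformer proposed in the paper.

# Toy illustration of over-smoothing: repeated neighbour-mean aggregation
# collapses node features toward a single shared vector.
import numpy as np

# Small path-like graph (adjacency with self-loops), 4 nodes.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
A = A / A.sum(axis=1, keepdims=True)   # row-normalised mean aggregation

X = np.random.randn(4, 8)              # random 8-dim node features
for layer in range(1, 21):
    X = A @ X                          # one round of neighbour averaging
    spread = np.linalg.norm(X - X.mean(axis=0), axis=1).mean()
    if layer in (1, 5, 10, 20):
        print(f"after {layer:2d} layers, mean distance to centroid = {spread:.4f}")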
Quality Over Clicks: Intrinsic Quality-Driven Iterative Reinforcement Learning for Cold-Start E-Commerce Query Suggestion
arXiv:2603.22922v1 Announce Type: new Abstract: Existing dialogue systems rely on Query Suggestion (QS) to enhance user engagement. Recent efforts typically pair large language models with a Click-Through Rate (CTR) model, yet fail in cold-start scenarios due to their heavy reliance on...
Beyond Hate: Differentiating Uncivil and Intolerant Speech in Multimodal Content Moderation
arXiv:2603.22985v1 Announce Type: new Abstract: Current multimodal toxicity benchmarks typically use a single binary hatefulness label. This coarse approach conflates two fundamentally different characteristics of expression: tone and content. Drawing on communication science theory, we introduce a fine-grained annotation scheme...
PaperVoyager: Building Interactive Web with Visual Language Models
arXiv:2603.22999v1 Announce Type: new Abstract: Recent advances in visual language models have enabled autonomous agents for complex reasoning, tool use, and document understanding. However, existing document agents mainly transform papers into static artifacts such as summaries, webpages, or slides, which...
AuthorMix: Modular Authorship Style Transfer via Layer-wise Adapter Mixing
arXiv:2603.23069v1 Announce Type: new Abstract: The task of authorship style transfer involves rewriting text in the style of a target author while preserving the meaning of the original text. Existing style transfer methods train a single model on large corpora...
HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature
arXiv:2603.23136v1 Announce Type: new Abstract: Automated knowledge graph (KG) construction is essential for navigating the rapidly expanding body of scientific literature. However, existing approaches struggle to recognize long multi-word entities, often fail to generalize across domains, and typically overlook the...
Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy
arXiv:2603.23146v1 Announce Type: new Abstract: The widespread adoption of Large Language Models (LLMs) has made the detection of AI-generated text a pressing and complex challenge. Although many detection systems report high benchmark accuracy, their reliability in real-world settings remains uncertain,...
ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM Alignment
arXiv:2603.23184v1 Announce Type: new Abstract: Reward modeling is a long-standing challenge in reinforcement learning from human feedback (RLHF) for aligning language models. Current reward modeling depends heavily on experimentally collected feedback data, which is costly to obtain. In this work, we...
Decoding AI Authorship: Can LLMs Truly Mimic Human Style Across Literature and Politics?
arXiv:2603.23219v1 Announce Type: new Abstract: Amidst the rising capabilities of generative AI to mimic specific human styles, this study investigates the ability of state-of-the-art large language models (LLMs), including GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet, to emulate the...
I Came, I Saw, I Explained: Benchmarking Multimodal LLMs on Figurative Meaning in Memes
arXiv:2603.23229v1 Announce Type: new Abstract: Internet memes represent a popular form of multimodal online communication and often use figurative elements to convey layered meaning through the combination of text and images. However, it remains largely unclear how multimodal large language...
Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning
arXiv:2603.22292v1 Announce Type: new Abstract: Sequential decision making based on Markov Decision Processes underpins many real-world applications. Both model-based and model-free methods have achieved strong results in these settings. However, real-world tasks must balance reward maximization with safety constraints, often...
Scaling Attention via Feature Sparsity
arXiv:2603.22300v1 Announce Type: new Abstract: Scaling Transformers to ultra-long contexts is bottlenecked by the $O(n^2 d)$ cost of self-attention. Existing methods reduce this cost along the sequence axis through local windows, kernel approximations, or token-level sparsity, but these approaches consistently...
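The $O(n^2 d)$ term quoted above comes from the query-key score matrix of standard attention; the numpy sketch below makes that bottleneck explicit. It illustrates the cost being addressed, not the feature-sparsity method the paper proposes.

# Standard single-head attention: the n x n score matrix is the O(n^2 d) bottleneck.
import numpy as np

n, d = 1024, 64                       # sequence length, head dimension
Q = np.random.randn(n, d)
K = np.random.randn(n, d)
V = np.random.randn(n, d)

scores = Q @ K.T / np.sqrt(d)         # n x n scores: ~n^2 * d multiply-adds
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ V                     # another ~n^2 * d multiply-adds

print(f"score matrix holds {scores.size:,} entries for n = {n}")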
Latent Semantic Manifolds in Large Language Models
arXiv:2603.22301v1 Announce Type: new Abstract: Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce discrete tokens -- a fundamental mismatch whose geometric consequences remain poorly understood. We develop a mathematical framework that interprets LLM hidden states...
Research on Individual Trait Clustering and Development Pathway Adaptation Based on the K-means Algorithm
arXiv:2603.22302v1 Announce Type: new Abstract: With the development of information technology, the application of artificial intelligence and machine learning in education shows great potential. This study explores how to utilize the K-means clustering algorithm to provide...
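A minimal version of the clustering step this abstract describes could look like the sketch below, using scikit-learn's KMeans on a matrix of per-student trait features. The synthetic feature matrix, the choice of k=3, and the mapping from clusters to development pathways are illustrative assumptions, not details from the paper.

# Minimal sketch: cluster student trait vectors with K-means (scikit-learn).
# The synthetic trait matrix and k=3 are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
traits = rng.normal(size=(200, 5))            # 200 students, 5 trait dimensions

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(traits)           # cluster index per student

for c in range(3):
    members = int(np.sum(labels == c))
    print(f"cluster {c}: {members} students, centroid {kmeans.cluster_centers_[c].round(2)}")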