CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis
arXiv:2602.15037v1 Announce Type: cross Abstract: As large language models (LLMs) advance toward expert-level performance in engineering domains, reliable reasoning under user-specified constraints becomes critical. In circuit analysis, for example, a numerically correct solution is insufficient if it violates established methodological...
Indic-TunedLens: Interpreting Multilingual Models in Indian Languages
arXiv:2602.15038v1 Announce Type: cross Abstract: Multilingual large language models (LLMs) are increasingly deployed in linguistically diverse regions like India, yet most interpretability tools remain tailored to English. Prior work reveals that LLMs often operate in English-centric representation spaces, making...
GRACE: an Agentic AI for Particle Physics Experiment Design and Simulation
arXiv:2602.15039v1 Announce Type: cross Abstract: We present GRACE, a simulation-native agent for autonomous experimental design in high-energy and nuclear physics. Given multimodal input in the form of a natural-language prompt or a published experimental paper, the agent extracts a structured...
Beyond Context Sharing: A Unified Agent Communication Protocol (ACP) for Secure, Federated, and Autonomous Agent-to-Agent (A2A) Orchestration
arXiv:2602.15055v1 Announce Type: cross Abstract: In the artificial intelligence space, we are transitioning from isolated large language models to autonomous agents capable of complex reasoning and tool use. While foundational architectures and local context management protocols have been established, the...
PolyNODE: Variable-dimension Neural ODEs on M-polyfolds
arXiv:2602.15128v1 Announce Type: cross Abstract: Neural ordinary differential equations (NODEs) are geometric deep learning models based on dynamical systems and flows generated by vector fields on manifolds. Despite numerous successful applications, particularly within the flow matching paradigm, all existing NODE...
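The abstract above describes NODEs as flows generated by vector fields; a minimal sketch of that idea, assuming a toy fixed-dimension setting with an untrained two-layer network as the vector field (this illustrates the general NODE construction, not the paper's M-polyfold variant):

```python
import numpy as np

# Hypothetical toy neural ODE: integrate dx/dt = f(x) with forward Euler,
# where f is a tiny randomly initialised two-layer network on R^2.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 2)) * 0.1
W2 = rng.normal(size=(2, 8)) * 0.1

def vector_field(x):
    """Learned vector field f(x); the weights here are untrained stand-ins."""
    return np.tanh(x @ W1.T) @ W2.T

def node_flow(x0, t=1.0, steps=100):
    """Approximate the time-t flow of dx/dt = f(x) starting from x0."""
    x, dt = np.array(x0, dtype=float), t / steps
    for _ in range(steps):
        x = x + dt * vector_field(x)
    return x

print(node_flow([1.0, -1.0]))
```

In practice the Euler loop is replaced by an adaptive ODE solver and the flow is differentiated through for training; the point here is only the state-space view the abstract refers to.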
AIC CTU@AVerImaTeC: dual-retriever RAG for image-text fact checking
arXiv:2602.15190v1 Announce Type: new Abstract: In this paper, we present our 3rd place system in the AVerImaTeC shared task, which combines last year's retrieval-augmented generation (RAG) pipeline with a reverse image search (RIS) module. Despite its simplicity, our system...
Extracting Consumer Insight from Text: A Large Language Model Approach to Emotion and Evaluation Measurement
arXiv:2602.15312v1 Announce Type: new Abstract: Accurately measuring consumer emotions and evaluations from unstructured text remains a core challenge for marketing research and practice. This study introduces the Linguistic eXtractor (LX), a fine-tuned, large language model trained on consumer-authored text that...
NeuroSymActive: Differentiable Neural-Symbolic Reasoning with Active Exploration for Knowledge Graph Question Answering
arXiv:2602.15353v1 Announce Type: new Abstract: Large pretrained language models and neural reasoning systems have advanced many natural language tasks, yet they remain challenged by knowledge-intensive queries that require precise, structured multi-hop inference. Knowledge graphs provide a compact symbolic substrate for...
Far Out: Evaluating Language Models on Slang in Australian and Indian English
arXiv:2602.15373v1 Announce Type: new Abstract: Language models exhibit systematic performance gaps when processing text in non-standard language varieties, yet their ability to comprehend variety-specific slang remains underexplored for several languages. We present a comprehensive evaluation of slang awareness in Indian...
Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language
arXiv:2602.15378v1 Announce Type: new Abstract: Can large language models converse in languages virtually absent from their training data? We investigate this question through a case study on Tulu, a Dravidian language with over 2 million speakers but minimal digital presence....
The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems
arXiv:2602.15382v1 Announce Type: new Abstract: Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain shackled by the inefficiency of discrete text communication, which imposes significant runtime overhead and information quantization loss. While latent...
TAROT: Test-driven and Capability-adaptive Curriculum Reinforcement Fine-tuning for Code Generation with Large Language Models
arXiv:2602.15449v1 Announce Type: new Abstract: Large Language Models (LLMs) are changing the coding paradigm, a shift known as vibe coding, yet synthesizing algorithmically sophisticated and robust code remains a critical challenge. Incentivizing the deep reasoning capabilities of LLMs is essential to...
In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations
arXiv:2602.15456v1 Announce Type: new Abstract: Agents based on Large Language Models (LLMs) are increasingly being deployed as interfaces to information on online platforms. These agents filter, prioritize, and synthesize information retrieved from the platforms' back-end databases or via web search....
LuxMT Technical Report
arXiv:2602.15506v1 Announce Type: new Abstract: We introduce LuxMT, a machine translation system based on Gemma 3 27B and fine-tuned for translation from Luxembourgish (LB) into French (FR) and English (EN). To assess translation performance, we construct a novel benchmark covering...
Fine-Refine: Iterative Fine-grained Refinement for Mitigating Dialogue Hallucination
arXiv:2602.15509v1 Announce Type: new Abstract: The tendency for hallucination in current large language models (LLMs) negatively impacts dialogue systems. Such hallucinations produce factually incorrect responses that may mislead users and undermine system trust. Existing refinement methods for dialogue systems typically...
Perspectives - Interactive Document Clustering in the Discourse Analysis Tool Suite
arXiv:2602.15540v1 Announce Type: new Abstract: This paper introduces Perspectives, an interactive extension of the Discourse Analysis Tool Suite designed to empower Digital Humanities (DH) scholars to explore and organize large, unstructured document collections. Perspectives implements a flexible, aspect-focused document clustering...
LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models
arXiv:2602.15675v1 Announce Type: new Abstract: Despite the advances in neural text to speech (TTS), many Arabic dialectal varieties remain marginally addressed, with most resources concentrated on Modern Standard Arabic (MSA) and Gulf dialects, leaving Egyptian Arabic -- the most widely...
Revisiting Northrop Frye's Four Myths Theory with Large Language Models
arXiv:2602.15678v1 Announce Type: new Abstract: Northrop Frye's theory of four fundamental narrative genres (comedy, romance, tragedy, satire) has profoundly influenced literary criticism, yet computational approaches to his framework have focused primarily on narrative patterns rather than character functions. In this...
Rethinking Metrics for Lexical Semantic Change Detection
arXiv:2602.15716v1 Announce Type: new Abstract: Lexical semantic change detection (LSCD) increasingly relies on contextualised language model embeddings, yet most approaches still quantify change using a small set of semantic change metrics, primarily Average Pairwise Distance (APD) and cosine distance over...
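The two metrics this abstract names are simple to state concretely; a minimal sketch, assuming cosine as the underlying distance and toy 2-d arrays standing in for contextualised embeddings from two time periods (the setup is illustrative, not the paper's):

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def average_pairwise_distance(A, B):
    """APD: mean cosine distance over all cross pairs (a, b) in A x B."""
    return float(np.mean([cosine_distance(a, b) for a in A for b in B]))

def prototype_distance(A, B):
    """Cosine distance between the centroids of the two usage sets."""
    return cosine_distance(A.mean(axis=0), B.mean(axis=0))

A = np.array([[1.0, 0.0], [0.9, 0.1]])   # toy usages of a word at time t1
B = np.array([[0.0, 1.0], [0.1, 0.9]])   # toy usages of the same word at t2
print(average_pairwise_distance(A, B), prototype_distance(A, B))
```

APD compares all usage pairs directly, while the prototype variant first averages each period's usages; the abstract's point is that the field leans heavily on just these two summaries.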
Ethical Considerations in Artificial Intelligence: Addressing Bias and Fairness in Algorithmic Decision-Making
The expanding use of artificial intelligence (AI) in decision-making across a range of industries has given rise to serious ethical questions about bias and fairness. This study examines the moral ramifications of using AI algorithms in decision-making and...
Optimization Instability in Autonomous Agentic Workflows for Clinical Symptom Detection
arXiv:2602.16037v1 Announce Type: new Abstract: Autonomous agentic workflows that iteratively refine their own behavior hold considerable promise, yet their failure modes remain poorly characterized. We investigate optimization instability, a phenomenon in which continued autonomous improvement paradoxically degrades classifier performance, using...
How Uncertain Is the Grade? A Benchmark of Uncertainty Metrics for LLM-Based Automatic Assessment
arXiv:2602.16039v1 Announce Type: new Abstract: The rapid rise of large language models (LLMs) is reshaping the landscape of automatic assessment in education. While these systems demonstrate substantial advantages in adaptability to diverse question types and flexibility in output formats, they...
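One uncertainty metric commonly applied in this kind of benchmark can be sketched in a few lines, assuming a sampling-based setup where the grader is queried repeatedly and uncertainty is the entropy of the empirical grade distribution (the metric choice and names here are illustrative, not necessarily those in the paper):

```python
import math
from collections import Counter

def grade_entropy(samples):
    """Shannon entropy (nats) of repeated grades; 0 means full agreement."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

confident = ["B", "B", "B", "B"]   # grader always agrees with itself
uncertain = ["A", "B", "C", "D"]   # grades maximally spread over 4 options
print(grade_entropy(confident), grade_entropy(uncertain))
```

Higher entropy flags items where the LLM grader's output is unstable and human review is most valuable; benchmarks like the one above compare such metrics against actual grading errors.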
Improving Interactive In-Context Learning from Natural Language Feedback
arXiv:2602.16066v1 Announce Type: new Abstract: Adapting one's thought process based on corrective feedback is an essential ability in human learning, particularly in collaborative settings. In contrast, the current large language model training paradigm relies heavily on modeling vast, static corpora....
Learning Personalized Agents from Human Feedback
arXiv:2602.16173v1 Announce Type: new Abstract: Modern AI agents are powerful but often fail to align with the idiosyncratic, evolving preferences of individual users. Prior approaches typically rely on static datasets, either training implicit preference models on interaction history or encoding...
EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments
arXiv:2602.16179v1 Announce Type: new Abstract: We show that training AI agents on high-fidelity reinforcement learning environments produces capabilities that generalize beyond the training distribution. We introduce Corecraft, the first environment in EnterpriseGym, Surge AI's suite of agentic RL environments. Corecraft...
Multi-agent cooperation through in-context co-player inference
arXiv:2602.16301v1 Announce Type: new Abstract: Achieving cooperation among self-interested agents remains a fundamental challenge in multi-agent reinforcement learning. Recent work showed that mutual cooperation can be induced between "learning-aware" agents that account for and shape the learning dynamics of their...
Framework of Thoughts: A Foundation Framework for Dynamic and Optimized Reasoning based on Chains, Trees, and Graphs
arXiv:2602.16512v1 Announce Type: new Abstract: Prompting schemes such as Chain of Thought, Tree of Thoughts, and Graph of Thoughts can significantly enhance the reasoning capabilities of large language models. However, most existing schemes require users to define static, problem-specific reasoning...
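The chain/tree/graph-of-thoughts schemes this abstract builds on share a common skeleton: partial reasoning states, an expansion step, and a score that guides search. A minimal sketch of that skeleton, assuming a toy string-building task in place of LLM calls (in practice `expand` and `score` would query a model; none of these names come from the paper):

```python
import heapq

def best_first_thoughts(start, expand, score, is_goal, max_nodes=100):
    """Explore thoughts in best-score-first order until a goal is found."""
    frontier = [(-score(start), start)]   # max-heap via negated scores
    seen = {start}
    while frontier and max_nodes > 0:
        _, thought = heapq.heappop(frontier)
        if is_goal(thought):
            return thought
        max_nodes -= 1
        for nxt in expand(thought):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(nxt), nxt))
    return None

# Toy instance: grow the string "aaa" one character at a time.
result = best_first_thoughts(
    "",
    expand=lambda t: [t + "a", t + "b"],
    score=lambda t: t.count("a") - len(t),   # prefer strings of a's
    is_goal=lambda t: t == "aaa",
)
print(result)  # "aaa"
```

A chain corresponds to `expand` returning one successor, a tree to several, and a graph to expansions that can revisit shared states; the abstract's framework makes that topology dynamic rather than fixed by the user.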
Agent Skill Framework: Perspectives on the Potential of Small Language Models in Industrial Environments
arXiv:2602.16653v1 Announce Type: new Abstract: The Agent Skill framework, now widely and officially supported by major players such as GitHub Copilot, LangChain, and OpenAI, performs especially well with proprietary models by improving context engineering, reducing hallucinations, and boosting task accuracy. Based...
Towards a Science of AI Agent Reliability
arXiv:2602.16666v1 Announce Type: new Abstract: AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still fail in practice. This discrepancy highlights a fundamental limitation of current...
The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts
arXiv:2602.15843v1 Announce Type: cross Abstract: In "Compress or Route?" (Johnson, 2026), we found that code generation tolerates aggressive prompt compression (r >= 0.6) while chain-of-thought reasoning degrades gradually. That study was limited to HumanEval (164 problems), left the "perplexity paradox"...