Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language
arXiv:2602.15378v1 Announce Type: new Abstract: Can large language models converse in languages virtually absent from their training data? We investigate this question through a case study on Tulu, a Dravidian language with over 2 million speakers but minimal digital presence....
The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems
arXiv:2602.15382v1 Announce Type: new Abstract: Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain shackled by the inefficiency of discrete text communication, which imposes significant runtime overhead and information quantization loss. While latent...
TAROT: Test-driven and Capability-adaptive Curriculum Reinforcement Fine-tuning for Code Generation with Large Language Models
arXiv:2602.15449v1 Announce Type: new Abstract: Large Language Models (LLMs) are reshaping the coding paradigm into what is popularly known as vibe coding, yet synthesizing algorithmically sophisticated and robust code remains a critical challenge. Incentivizing the deep reasoning capabilities of LLMs is essential to...
In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations
arXiv:2602.15456v1 Announce Type: new Abstract: Agents based on Large Language Models (LLMs) are increasingly being deployed as interfaces to information on online platforms. These agents filter, prioritize, and synthesize information retrieved from the platforms' back-end databases or via web search....
LuxMT Technical Report
arXiv:2602.15506v1 Announce Type: new Abstract: We introduce LuxMT, a machine translation system based on Gemma 3 27B and fine-tuned for translation from Luxembourgish (LB) into French (FR) and English (EN). To assess translation performance, we construct a novel benchmark covering...
Fine-Refine: Iterative Fine-grained Refinement for Mitigating Dialogue Hallucination
arXiv:2602.15509v1 Announce Type: new Abstract: The tendency for hallucination in current large language models (LLMs) negatively impacts dialogue systems. Such hallucinations produce factually incorrect responses that may mislead users and undermine system trust. Existing refinement methods for dialogue systems typically...
Perspectives - Interactive Document Clustering in the Discourse Analysis Tool Suite
arXiv:2602.15540v1 Announce Type: new Abstract: This paper introduces Perspectives, an interactive extension of the Discourse Analysis Tool Suite designed to empower Digital Humanities (DH) scholars to explore and organize large, unstructured document collections. Perspectives implements a flexible, aspect-focused document clustering...
LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models
arXiv:2602.15675v1 Announce Type: new Abstract: Despite the advances in neural text-to-speech (TTS), many Arabic dialectal varieties remain only marginally addressed, with most resources concentrated on Modern Standard Arabic (MSA) and Gulf dialects, leaving Egyptian Arabic -- the most widely...
Revisiting Northrop Frye's Four Myths Theory with Large Language Models
arXiv:2602.15678v1 Announce Type: new Abstract: Northrop Frye's theory of four fundamental narrative genres (comedy, romance, tragedy, satire) has profoundly influenced literary criticism, yet computational approaches to his framework have focused primarily on narrative patterns rather than character functions. In this...
Rethinking Metrics for Lexical Semantic Change Detection
arXiv:2602.15716v1 Announce Type: new Abstract: Lexical semantic change detection (LSCD) increasingly relies on contextualised language model embeddings, yet most approaches still quantify change using a small set of semantic change metrics, primarily Average Pairwise Distance (APD) and cosine distance over...
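The two baseline metrics this abstract names (Average Pairwise Distance and cosine distance over period prototypes) can be sketched concretely. The NumPy implementation below is illustrative only, assuming each word usage is already a contextualised embedding vector; the function names are ours, not the paper's.

```python
import numpy as np

def cosine_distance(u, v):
    # 1 minus cosine similarity between two embedding vectors
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def average_pairwise_distance(embs_t1, embs_t2):
    # APD: mean cosine distance over all cross-period usage pairs
    return float(np.mean([cosine_distance(u, v)
                          for u in embs_t1 for v in embs_t2]))

def prototype_distance(embs_t1, embs_t2):
    # Cosine distance between the mean ("prototype") embedding of each period
    return cosine_distance(np.mean(embs_t1, axis=0),
                           np.mean(embs_t2, axis=0))
```

APD averages over individual usages and so is sensitive to within-period spread, while the prototype variant collapses each period to a single vector first; the paper's point is that relying on only this small metric set may miss other facets of change.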
Ethical Considerations in Artificial Intelligence: Addressing Bias and Fairness in Algorithmic Decision-Making
The expanding use of artificial intelligence (AI) in decision-making across a range of industries has raised serious ethical questions about bias and fairness. This study examines the moral ramifications of using AI algorithms in decision-making and...
Optimization Instability in Autonomous Agentic Workflows for Clinical Symptom Detection
arXiv:2602.16037v1 Announce Type: new Abstract: Autonomous agentic workflows that iteratively refine their own behavior hold considerable promise, yet their failure modes remain poorly characterized. We investigate optimization instability, a phenomenon in which continued autonomous improvement paradoxically degrades classifier performance, using...
How Uncertain Is the Grade? A Benchmark of Uncertainty Metrics for LLM-Based Automatic Assessment
arXiv:2602.16039v1 Announce Type: new Abstract: The rapid rise of large language models (LLMs) is reshaping the landscape of automatic assessment in education. While these systems demonstrate substantial advantages in adaptability to diverse question types and flexibility in output formats, they...
Improving Interactive In-Context Learning from Natural Language Feedback
arXiv:2602.16066v1 Announce Type: new Abstract: Adapting one's thought process based on corrective feedback is an essential ability in human learning, particularly in collaborative settings. In contrast, the current large language model training paradigm relies heavily on modeling vast, static corpora....
Learning Personalized Agents from Human Feedback
arXiv:2602.16173v1 Announce Type: new Abstract: Modern AI agents are powerful but often fail to align with the idiosyncratic, evolving preferences of individual users. Prior approaches typically rely on static datasets, either training implicit preference models on interaction history or encoding...
EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments
arXiv:2602.16179v1 Announce Type: new Abstract: We show that training AI agents on high-fidelity reinforcement learning environments produces capabilities that generalize beyond the training distribution. We introduce Corecraft, the first environment in EnterpriseGym, Surge AI's suite of agentic RL environments. Corecraft...
Multi-agent cooperation through in-context co-player inference
arXiv:2602.16301v1 Announce Type: new Abstract: Achieving cooperation among self-interested agents remains a fundamental challenge in multi-agent reinforcement learning. Recent work showed that mutual cooperation can be induced between "learning-aware" agents that account for and shape the learning dynamics of their...
Framework of Thoughts: A Foundation Framework for Dynamic and Optimized Reasoning based on Chains, Trees, and Graphs
arXiv:2602.16512v1 Announce Type: new Abstract: Prompting schemes such as Chain of Thought, Tree of Thoughts, and Graph of Thoughts can significantly enhance the reasoning capabilities of large language models. However, most existing schemes require users to define static, problem-specific reasoning...
Agent Skill Framework: Perspectives on the Potential of Small Language Models in Industrial Environments
arXiv:2602.16653v1 Announce Type: new Abstract: The Agent Skill framework, now widely and officially supported by major players such as GitHub Copilot, LangChain, and OpenAI, performs especially well with proprietary models by improving context engineering, reducing hallucinations, and boosting task accuracy. Based...
Towards a Science of AI Agent Reliability
arXiv:2602.16666v1 Announce Type: new Abstract: AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still fail in practice. This discrepancy highlights a fundamental limitation of current...
The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts
arXiv:2602.15843v1 Announce Type: cross Abstract: In "Compress or Route?" (Johnson, 2026), we found that code generation tolerates aggressive prompt compression (r >= 0.6) while chain-of-thought reasoning degrades gradually. That study was limited to HumanEval (164 problems), left the "perplexity paradox"...
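The compression rate r this abstract refers to can be made concrete with a toy sketch. The uniform token-dropping below is purely illustrative and not the study's method: real prompt compressors rank tokens by informativeness (e.g. perplexity) rather than dropping them uniformly, and the function name is ours.

```python
import math

def compress_by_ratio(tokens, r):
    """Keep ceil((1 - r) * n) evenly spaced tokens, so r is the fraction
    of the prompt removed. Illustrative only: real compressors score
    tokens by informativeness instead of position."""
    keep = max(1, math.ceil((1 - r) * len(tokens)))
    step = len(tokens) / keep
    return [tokens[min(len(tokens) - 1, int(i * step))] for i in range(keep)]
```

At r = 0.6, a 10-token prompt is cut to 4 tokens; the abstract's claim is that code-generation prompts survive this level of pruning while chain-of-thought prompts degrade.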
Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models
arXiv:2602.15847v1 Announce Type: cross Abstract: Personality steering in large language models (LLMs) commonly relies on injecting trait-specific steering vectors, implicitly assuming that personality traits can be controlled independently. In this work, we examine whether this assumption holds by analysing the...
Institutionalizing trust in AI governance: from ethical principles to legal design
Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints
arXiv:2602.15852v1 Announce Type: cross Abstract: Clinical natural language processing (NLP) models have shown promise for supporting hospital discharge planning by leveraging narrative clinical documentation. However, note-based models are particularly vulnerable to temporal and lexical leakage, where documentation artifacts encode future...
Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective
arXiv:2602.15856v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) effectively grounds Large Language Models (LLMs) with external knowledge and is widely applied to Web-related tasks. However, its scalability is hindered by excessive context length and redundant retrievals. Recent research on soft...
State Design Matters: How Representations Shape Dynamic Reasoning in Large Language Models
arXiv:2602.15858v1 Announce Type: cross Abstract: As large language models (LLMs) move from static reasoning tasks toward dynamic environments, their success depends on the ability to navigate and respond to an environment that changes as they interact at inference time. An...
NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey
arXiv:2602.15866v1 Announce Type: cross Abstract: Natural Language Processing (NLP) is integral to social media analytics but often processes content containing Personally Identifiable Information (PII), behavioral cues, and metadata, raising privacy risks such as surveillance, profiling, and targeted advertising. To systematically...
Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation
arXiv:2602.15875v1 Announce Type: cross Abstract: Current Visual-Language Navigation (VLN) methodologies face a trade-off between semantic understanding and control precision. While Multimodal Large Language Models (MLLMs) offer superior reasoning, deploying them as low-level controllers leads to high latency, trajectory oscillations, and...
IT-OSE: Exploring Optimal Sample Size for Industrial Data Augmentation
arXiv:2602.15878v1 Announce Type: cross Abstract: In industrial scenarios, data augmentation is an effective approach to improve model performance. However, its benefits do not increase monotonically with sample size. There is no theoretical research or established estimation for the optimal sample size (OSS) in...
FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution
arXiv:2602.15882v1 Announce Type: cross Abstract: General vision-language models increasingly support unified spatiotemporal reasoning over long video streams, yet deploying such capabilities on robots remains constrained by the prohibitive latency of processing long-horizon histories and generating high-dimensional future predictions. To bridge...