Trajectory-Informed Memory Generation for Self-Improving Agent Systems
arXiv:2603.10600v1 Announce Type: new Abstract: LLM-powered agents face a persistent challenge: learning from their execution experiences to improve future performance. While agents can successfully complete many tasks, they often repeat inefficient patterns, fail to recover from similar errors, and miss...
A Retrieval-Augmented Language Assistant for Unmanned Aircraft Safety Assessment and Regulatory Compliance
arXiv:2603.09999v1 Announce Type: cross Abstract: This paper presents the design and validation of a retrieval-based assistant that supports safety assessment, certification activities, and regulatory compliance for unmanned aircraft systems. The work is motivated by the growing complexity of drone operations...
Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning
arXiv:2603.10588v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in logical reasoning tasks, yet whether large language model (LLM) alignment requires fundamentally different approaches remains unclear. Given the apparent tolerance for multiple valid responses...
TAMUSA-Chat: A Domain-Adapted Large Language Model Conversational System for Research and Responsible Deployment
arXiv:2603.09992v1 Announce Type: cross Abstract: This paper presents TAMUSA-Chat, a research-oriented framework for building domain-adapted large language model conversational systems. The work addresses critical challenges in adapting general-purpose foundation models to institutional contexts through supervised fine-tuning, retrieval-augmented generation, and systematic...
Empathy Is Not What Changed: Clinical Assessment of Psychological Safety Across GPT Model Generations
arXiv:2603.09997v1 Announce Type: cross Abstract: When OpenAI deprecated GPT-4o in early 2026, thousands of users protested under #keep4o, claiming newer models had "lost their empathy." No published study has tested this claim. We conducted the first clinical measurement, evaluating three...
PoultryLeX-Net: Domain-Adaptive Dual-Stream Transformer Architecture for Large-Scale Poultry Stakeholder Modeling
arXiv:2603.09991v1 Announce Type: cross Abstract: The rapid growth of the global poultry industry, driven by rising demand for affordable animal protein, has intensified public discourse surrounding production practices, housing, management, animal welfare, and supply-chain transparency. Social media platforms such as...
There Are No Silly Questions: Evaluation of Offline LLM Capabilities from a Turkish Perspective
arXiv:2603.09996v1 Announce Type: cross Abstract: The integration of large language models (LLMs) into educational processes introduces significant constraints regarding data privacy and reliability, particularly in pedagogically vulnerable contexts such as Turkish heritage language education. This study aims to systematically evaluate...
HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation
arXiv:2603.10359v1 Announce Type: new Abstract: Distilling reasoning capabilities from Large Reasoning Models (LRMs) into smaller models is typically constrained by the limitation of rejection sampling. Standard methods treat the teacher as a static filter, discarding complex "corner-case" problems where the...
Emulating Clinician Cognition via Self-Evolving Deep Clinical Research
arXiv:2603.10677v1 Announce Type: new Abstract: Clinical diagnosis is a complex cognitive process, grounded in dynamic cue acquisition and continuous expertise accumulation. Yet most current artificial intelligence (AI) systems are misaligned with this reality, treating diagnosis as single-pass retrospective prediction while...
Adaptive RAN Slicing Control via Reward-Free Self-Finetuning Agents
arXiv:2603.10564v1 Announce Type: new Abstract: The integration of Generative AI models into AI-native network systems offers a transformative path toward achieving autonomous and adaptive control. However, the application of such models to continuous control tasks is impeded by intrinsic architectural...
Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects
arXiv:2603.10016v1 Announce Type: cross Abstract: We investigate whether large language models (LLMs) display human-like cognitive biases, focusing on potential implications for assistance in judicial sentencing, a decision-making system where fairness is paramount. Two of the most relevant biases were chosen:...
Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities
arXiv:2603.10396v1 Announce Type: new Abstract: Despite the growing demand for eliciting uncertainty from large language models (LLMs), empirical evidence suggests that LLM behavior is not always adequately captured by the elicitation techniques developed under the classical probabilistic uncertainty framework. This...
Personalized Group Relative Policy Optimization for Heterogeneous Preference Alignment
arXiv:2603.10009v1 Announce Type: cross Abstract: Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences because standard post-training methods, like Reinforcement Learning with Human Feedback (RLHF), optimize for a single, global objective. While...
Explainable LLM Unlearning Through Reasoning
arXiv:2603.09980v1 Announce Type: cross Abstract: LLM unlearning is essential for mitigating safety, copyright, and privacy concerns in pre-trained large language models (LLMs). Compared to preference alignment, it offers a more explicit way by removing undesirable knowledge characterized by specific unlearning...
Automated evaluation of LLMs for effective machine translation of Mandarin Chinese to English
arXiv:2603.09998v1 Announce Type: cross Abstract: Although Large Language Models (LLMs) achieve exceptional performance in machine translation, systematic assessment of their translation quality remains limited. The challenge lies in building automated frameworks, as human-expert-based evaluations can be time-consuming, given...
Evolving Demonstration Optimization for Chain-of-Thought Feature Transformation
arXiv:2603.09987v1 Announce Type: cross Abstract: Feature Transformation (FT) is a core data-centric AI task that improves feature space quality to advance downstream predictive performance. However, discovering effective transformations remains challenging due to the large space of feature-operator combinations. Existing solutions...
One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis
arXiv:2603.09978v1 Announce Type: cross Abstract: Large language models have recently surpassed specialized systems on code generation, yet their effectiveness on other code-analysis tasks remains less clear. At the same time, multi-task learning offers a way to unify diverse objectives within...
Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization
arXiv:2603.10808v1 Announce Type: new Abstract: The emergence of large language model (LLM)-based agent frameworks has shifted the primary challenge in building domain-expert AI agents from raw capability to effective encoding of domain expertise. Two dominant paradigms -- code-first development, which...
Resource-constrained Amazons chess decision framework integrating large language models and graph attention
arXiv:2603.10512v1 Announce Type: new Abstract: Artificial intelligence has advanced significantly through the development of intelligent game-playing systems, providing rigorous testbeds for decision-making, strategic planning, and adaptive learning. However, resource-constrained environments pose critical challenges, as conventional deep learning methods heavily rely...
Defining AI Models and AI Systems: A Framework to Resolve the Boundary Problem
arXiv:2603.10023v1 Announce Type: cross Abstract: Emerging AI regulations assign distinct obligations to different actors along the AI value chain (e.g., the EU AI Act distinguishes providers and deployers for both AI models and AI systems), yet the foundational terms "AI...
RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators
arXiv:2603.10026v1 Announce Type: cross Abstract: Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers. However, for cascaded reduction operations involving multiple loops...
The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths
arXiv:2603.10030v1 Announce Type: cross Abstract: AI transport libraries move bytes efficiently, but they commonly assume that buffers are already correctly allocated, placed, shared, registered, and safe under completion and teardown pressure. This paper presents dmaplane, a Linux kernel module that...
Large Language Models and Book Summarization: Reading or Remembering, Which Is Better?
arXiv:2603.09981v1 Announce Type: new Abstract: Summarization is a core task in Natural Language Processing (NLP). Recent advances in Large Language Models (LLMs) and the introduction of large context windows reaching millions of tokens make it possible to process entire books...
An Efficient Hybrid Deep Learning Approach for Detecting Online Abusive Language
arXiv:2603.09984v1 Announce Type: new Abstract: The digital age has expanded social media and online forums, allowing free expression for nearly 45% of the global population. Yet, it has also fueled online harassment, bullying, and harmful behaviors like hate speech and...
Beyond the Prompt in Large Language Models: Comprehension, In-Context Learning, and Chain-of-Thought
arXiv:2603.10000v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, exhibiting emergent properties such as semantic prompt comprehension, In-Context Learning (ICL), and Chain-of-Thought (CoT) reasoning. Despite their empirical success, the theoretical mechanisms driving these...
Fine-Tune, Don't Prompt, Your Language Model to Identify Biased Language in Clinical Notes
arXiv:2603.10004v1 Announce Type: new Abstract: Clinical documentation can contain emotionally charged language with stigmatizing or privileging valences. We present a framework for detecting and classifying such language as stigmatizing, privileging, or neutral. We constructed a curated lexicon of biased terms...
GATech at AbjadGenEval Shared Task: Multilingual Embeddings for Arabic Machine-Generated Text Classification
arXiv:2603.10007v1 Announce Type: new Abstract: We present our approach to the AbjadGenEval shared task on detecting AI-generated Arabic text. We fine-tuned the multilingual E5-large encoder for binary classification, and we explored several strategies for pooling token representations, including weighted...
Evaluating Progress in Graph Foundation Models: A Comprehensive Benchmark and New Insights
arXiv:2603.10033v1 Announce Type: new Abstract: Graph foundation models (GFMs) aim to acquire transferable knowledge by pre-training on diverse graphs, which can be adapted to various downstream tasks. However, domain shift in graphs is inherently two-dimensional: graphs differ not only in...
TriageSim: A Conversational Emergency Triage Simulation Framework from Structured Electronic Health Records
arXiv:2603.10035v1 Announce Type: new Abstract: Research in emergency triage is restricted to structured electronic health records (EHR) due to regulatory constraints on nurse-patient interactions. We introduce TriageSim, a simulation framework for generating persona-conditioned triage conversations from structured records. TriageSim enables...
The Prediction-Measurement Gap: Toward Meaning Representations as Scientific Instruments
arXiv:2603.10130v1 Announce Type: new Abstract: Text embeddings have become central to computational social science and psychology, enabling scalable measurement of meaning and mixed-method inference. Yet most representation learning is optimized and evaluated for prediction and retrieval, yielding a prediction-measurement gap:...