Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM
arXiv:2603.18507v1 Announce Type: new Abstract: Persona prompting can steer LLM generation towards a domain-specific tone and pattern. This behavior enables use cases in multi-agent systems where diverse interactions are crucial and human-centered tasks require high-level human alignment. Prior works provide...
Consumer-to-Clinical Language Shifts in Ambient AI Draft Notes and Clinician-Finalized Documentation: A Multi-level Analysis
arXiv:2603.18327v1 Announce Type: new Abstract: Ambient AI generates draft clinical notes from patient-clinician conversations, often using lay or consumer-oriented phrasing to support patient understanding instead of standardized clinical terminology. How clinicians revise these drafts for professional documentation conventions remains unclear....
TherapyGym: Evaluating and Aligning Clinical Fidelity and Safety in Therapy Chatbots
arXiv:2603.18008v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for mental-health support; yet prevailing evaluation methods--fluency metrics, preference tests, and generic dialogue benchmarks--fail to capture the clinically critical dimensions of psychotherapy. We introduce THERAPYGYM, a framework that...
MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution
arXiv:2603.18718v1 Announce Type: new Abstract: Memory-augmented LLM agents maintain external memory banks to support long-horizon interaction, yet most existing systems treat construction, retrieval, and utilization as isolated subroutines. This creates two coupled challenges: strategic blindness on the forward path of...
A Concept is More Than a Word: Diversified Unlearning in Text-to-Image Diffusion Models
arXiv:2603.18767v1 Announce Type: new Abstract: Concept unlearning has emerged as a promising direction for reducing the risks of harmful content generation in text-to-image diffusion models by selectively erasing undesirable concepts from a model's parameters. Existing approaches typically rely on keywords...
Retrieval-Augmented LLM Agents: Learning to Learn from Experience
arXiv:2603.18272v1 Announce Type: new Abstract: While large language models (LLMs) have advanced the development of general-purpose agents, achieving robust generalization to unseen tasks remains a significant challenge. Current approaches typically rely on either fine-tuning or training-free memory-augmented generation using retrieved...
TeachingCoach: A Fine-Tuned Scaffolding Chatbot for Instructional Guidance to Instructors
arXiv:2603.18189v1 Announce Type: new Abstract: Higher education instructors often lack timely and pedagogically grounded support, as scalable instructional guidance remains limited and existing tools rely on generic chatbot advice or non-scalable teaching center human-human consultations. We present TeachingCoach, a pedagogically...
MedForge: Interpretable Medical Deepfake Detection via Forgery-aware Reasoning
arXiv:2603.18577v1 Announce Type: new Abstract: Text-guided image editors can now manipulate authentic medical scans with high fidelity, enabling lesion implantation/removal that threatens clinical trust and safety. Existing defenses are inadequate for healthcare. Medical detectors are largely black-box, while MLLM-based explainers...
MineDraft: A Framework for Batch Parallel Speculative Decoding
arXiv:2603.18016v1 Announce Type: new Abstract: Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose draft tokens that are subsequently verified by a larger target model. However, the performance of standard SD is often...
D-Mem: A Dual-Process Memory System for LLM Agents
arXiv:2603.18631v1 Announce Type: new Abstract: Driven by the development of persistent, self-adapting autonomous agents, equipping these systems with high-fidelity memory access for long-horizon reasoning has emerged as a critical requirement. However, prevalent retrieval-based memory frameworks often follow an incremental processing...
CORE: Robust Out-of-Distribution Detection via Confidence and Orthogonal Residual Scoring
arXiv:2603.18290v1 Announce Type: new Abstract: Out-of-distribution (OOD) detection is essential for deploying deep learning models reliably, yet no single method performs consistently across architectures and datasets -- a scorer that leads on one benchmark often falters on another. We attribute...
DynaRAG: Bridging Static and Dynamic Knowledge in Retrieval-Augmented Generation
arXiv:2603.18012v1 Announce Type: new Abstract: We present DynaRAG, a retrieval-augmented generation (RAG) framework designed to handle both static and time-sensitive information needs through dynamic knowledge integration. Unlike traditional RAG pipelines that rely solely on static corpora, DynaRAG selectively invokes external...
AlignMamba-2: Enhancing Multimodal Fusion and Sentiment Analysis with Modality-Aware Mamba
arXiv:2603.18462v1 Announce Type: new Abstract: In the era of large-scale pre-trained models, effectively adapting general knowledge to specific affective computing tasks remains a challenge, particularly regarding computational efficiency and multimodal heterogeneity. While Transformer-based methods have excelled at modeling inter-modal dependencies,...
MemArchitect: A Policy Driven Memory Governance Layer
arXiv:2603.18330v1 Announce Type: new Abstract: Persistent Large Language Model (LLM) agents expose a critical governance gap in memory management. Standard Retrieval-Augmented Generation (RAG) frameworks treat memory as passive storage, lacking mechanisms to resolve contradictions, enforce privacy, or prevent outdated information...
Learned but Not Expressed: Capability-Expression Dissociation in Large Language Models
arXiv:2603.18013v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate the capacity to reconstruct and trace learned content from their training data under specific elicitation conditions, yet this capability does not manifest in standard generation contexts. This empirical observational study...
Large-Scale Analysis of Political Propaganda on Moltbook
arXiv:2603.18349v1 Announce Type: new Abstract: We present an NLP-based study of political propaganda on Moltbook, a Reddit-style platform for AI agents. To enable large-scale analysis, we develop LLM-based classifiers to detect political propaganda, validated against expert annotation (Cohen's $\kappa$= 0.64-0.74)....
Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
arXiv:2603.18426v1 Announce Type: new Abstract: What happens when multiple compression methods are combined-does the order in which they are applied matter? Joint model compression has emerged as a powerful strategy to achieve higher efficiency by combining multiple methods such as...
Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm
arXiv:2603.18007v1 Announce Type: new Abstract: The study explores whether current Large Language Models (LLMs) exhibit Theory of Mind (ToM) capabilities -- specifically, the ability to infer others' beliefs, intentions, and emotions from text. Given that LLMs are trained on language...
LGESynthNet: Controlled Scar Synthesis for Improved Scar Segmentation in Cardiac LGE-MRI Imaging
arXiv:2603.18356v1 Announce Type: new Abstract: Segmentation of enhancement in LGE cardiac MRI is critical for diagnosing various ischemic and non-ischemic cardiomyopathies. However, creating pixel-level annotations for these images is challenging and labor-intensive, leading to limited availability of annotated data. Generative...
Modeling the human lexicon under temperature variations: linguistic factors, diversity and typicality in LLM word associations
arXiv:2603.18171v1 Announce Type: new Abstract: Large language models (LLMs) achieve impressive results in terms of fluency in text generation, yet the nature of their linguistic knowledge - in particular the human-likeness of their internal lexicon - remains uncertain. This study...
GRAFITE: Generative Regression Analysis Framework for Issue Tracking and Evaluation
arXiv:2603.18173v1 Announce Type: new Abstract: Large language models (LLMs) are largely motivated by their performance on popular topics and benchmarks at the time of their release. However, over time, contamination occurs due to significant exposure of benchmark data during training....
CWoMP: Morpheme Representation Learning for Interlinear Glossing
arXiv:2603.18184v1 Announce Type: new Abstract: Interlinear glossed text (IGT) is a standard notation for language documentation which is linguistically rich but laborious to produce manually. Recent automated IGT methods treat glosses as character sequences, neglecting their compositional structure. We propose...
How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence
arXiv:2603.18203v1 Announce Type: new Abstract: The dominant paradigms of artificial intelligence were shaped by learning theories from psychology: behaviorism inspired reinforcement learning, cognitivism gave rise to deep learning and memory-augmented architectures, and constructivism influenced curriculum learning and compositional approaches. This...
From Noise to Signal: When Outliers Seed New Topics
arXiv:2603.18358v1 Announce Type: new Abstract: Outliers in dynamic topic modeling are typically treated as noise, yet we show that some can serve as early signals of emerging topics. We introduce a temporal taxonomy of news-document trajectories that defines how documents...
Synthetic Data Generation for Training Diversified Commonsense Reasoning Models
arXiv:2603.18361v1 Announce Type: new Abstract: Conversational agents are required to respond to their users not only with high quality (i.e. commonsense bearing) responses, but also considering multiple plausible alternative scenarios, reflecting the diversity in their responses. Despite the growing need...
PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching
arXiv:2603.18363v1 Announce Type: new Abstract: Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities of Large Language Models (LLMs) without external supervision. However, current methods rely on heuristic intrinsic rewards, which...
AutoScreen-FW: An LLM-based Framework for Resume Screening
arXiv:2603.18390v1 Announce Type: new Abstract: Corporate recruiters often need to screen many resumes within a limited time, which increases their burden and may cause suitable candidates to be overlooked. To address these challenges, prior work has explored LLM-based automated resume...
TopoChunker: Topology-Aware Agentic Document Chunking Framework
arXiv:2603.18409v1 Announce Type: new Abstract: Current document chunking methods for Retrieval-Augmented Generation (RAG) typically linearize text. This forced linearization strips away intrinsic topological hierarchies, creating ``semantic fragmentation'' that degrades downstream retrieval quality. In this paper, we propose TopoChunker, an agentic...
Multimodal Task Interference: A Benchmark and Analysis of History-Target Mismatch in Multimodal LLMs
arXiv:2603.18425v1 Announce Type: new Abstract: Task interference, the performance degradation caused by task switches within a single conversation, has been studied exclusively in text-only settings despite the growing prevalence of multimodal dialogue systems. We introduce a benchmark for evaluating this...
The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices
arXiv:2603.18482v1 Announce Type: new Abstract: Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human language production operates differently: tokens are chosen for communicative appropriateness rather...