RadAnnotate: Large Language Models for Efficient and Reliable Radiology Report Annotation
arXiv:2603.16002v1 Announce Type: new Abstract: Radiology report annotation is essential for clinical NLP, yet manual labeling is slow and costly. We present RadAnnotate, an LLM-based framework that studies retrieval-augmented synthetic reports and confidence-based selective automation to reduce expert effort for...
Understanding Moral Reasoning Trajectories in Large Language Models: Toward Probing-Based Explainability
arXiv:2603.16017v1 Announce Type: new Abstract: Large language models (LLMs) increasingly participate in morally sensitive decision-making, yet how they organize ethical frameworks across reasoning steps remains underexplored. We introduce "moral reasoning trajectories", sequences of ethical framework invocations across intermediate reasoning steps,...
SEAHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Southeast Asia
arXiv:2603.16070v1 Announce Type: new Abstract: Hate speech detection relies heavily on linguistic resources, which are primarily available in high-resource languages such as English and Chinese, creating barriers for researchers and platforms developing tools for low-resource languages in Southeast Asia, where...
ClaimFlow: Tracing the Evolution of Scientific Claims in NLP
arXiv:2603.16073v1 Announce Type: new Abstract: Scientific papers do more than report results: they advance claims that later work supports, extends, or sometimes refutes. Yet existing methods for citation and claim analysis capture only fragments of this dialogue. In this...
CounterRefine: Answer-Conditioned Counterevidence Retrieval for Inference-Time Knowledge Repair in Factual Question Answering
arXiv:2603.16091v1 Announce Type: new Abstract: In factual question answering, many errors are not failures of access but failures of commitment: the system retrieves relevant evidence, yet still settles on the wrong answer. We present CounterRefine, a lightweight inference-time repair layer...
ASDA: Automated Skill Distillation and Adaptation for Financial Reasoning
arXiv:2603.16112v1 Announce Type: new Abstract: Adapting large language models (LLMs) to specialized financial reasoning typically requires expensive fine-tuning that produces model-locked expertise. Training-free alternatives have emerged, yet our experiments show that leading methods (GEPA and ACE) achieve only marginal gains...
Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users
arXiv:2603.16120v1 Announce Type: new Abstract: Deep Research (DR) tools (e.g. OpenAI DR) help researchers cope with ballooning publication volumes. Such tools can synthesize scientific papers to answer researchers' queries, but lack understanding of their users. We change that in MyScholarQA...
SIA: A Synthesize-Inject-Align Framework for Knowledge-Grounded and Secure E-commerce Search LLMs with Industrial Deployment
arXiv:2603.16137v1 Announce Type: new Abstract: Large language models offer transformative potential for e-commerce search by enabling intent-aware recommendations. However, their industrial deployment is hindered by two critical challenges: (1) knowledge hallucination due to insufficient encoding of dynamic, fine-grained product knowledge,...
Parametric Social Identity Injection and Diversification in Public Opinion Simulation
arXiv:2603.16142v1 Announce Type: new Abstract: Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, offering a promising alternative to costly and slow human surveys. Despite their scalability, current LLM-based simulation methods fail to capture...
Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR
arXiv:2603.16184v1 Announce Type: new Abstract: We present Polyglot-Lion, a family of compact multilingual automatic speech recognition (ASR) models tailored for the linguistic landscape of Singapore, covering English, Mandarin, Tamil, and Malay. Our models are obtained by fine-tuning Qwen3-ASR-0.6B and Qwen3-ASR-1.7B...
Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models
arXiv:2603.16192v1 Announce Type: new Abstract: Modern LLMs employ safety mechanisms that extend beyond surface-level input filtering to latent semantic representations and generation-time reasoning, enabling them to recover obfuscated malicious intent during inference and refuse accordingly, and rendering many surface-level obfuscation...
Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus
arXiv:2603.16258v1 Announce Type: new Abstract: This paper analyses the implementation of Automatic Speech Recognition (ASR) into the transcription workflow of the KIParla corpus, a resource of spoken Italian. Through a two-phase experiment, 11 expert and novice transcribers produced both manual...
PyPhonPlan: Simulating phonetic planning with dynamic neural fields and task dynamics
arXiv:2603.16299v1 Announce Type: new Abstract: We introduce PyPhonPlan, a Python toolkit for implementing dynamical models of phonetic planning using coupled dynamic neural fields and task dynamic simulations. The toolkit provides modular components for defining planning, perception and memory fields, as...
PashtoCorp: A 1.25-Billion-Word Corpus, Evaluation Suite, and Reproducible Pipeline for Low-Resource Language Development
arXiv:2603.16354v1 Announce Type: new Abstract: We present PashtoCorp, a 1.25-billion-word corpus for Pashto, a language spoken by 60 million people that remains severely underrepresented in NLP. The corpus is assembled from 39 sources spanning seven HuggingFace datasets and 32 purpose-built...
RECOVER: Robust Entity Correction via agentic Orchestration of hypothesis Variants for Evidence-based Recovery
arXiv:2603.16411v1 Announce Type: new Abstract: Entity recognition in Automatic Speech Recognition (ASR) is challenging for rare and domain-specific terms. In domains such as finance, medicine, and air traffic control, these errors are costly. If the entities are entirely absent from...
IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time
arXiv:2603.16415v1 Announce Type: new Abstract: Multi-hop question answering (QA) requires reasoning across multiple documents, yet existing retrieval-augmented generation (RAG) approaches address this either through graph-based methods requiring additional online processing or iterative multi-step reasoning. We present IndexRAG, a novel approach...
DynHD: Hallucination Detection for Diffusion Large Language Models via Denoising Dynamics Deviation Learning
arXiv:2603.16459v1 Announce Type: new Abstract: Diffusion large language models (D-LLMs) have emerged as a promising alternative to auto-regressive models due to their iterative refinement capabilities. However, hallucinations remain a critical issue that hinders their reliability. To detect hallucination responses from...
On the Emotion Understanding of Synthesized Speech
arXiv:2603.16483v1 Announce Type: new Abstract: Emotion is a core paralinguistic feature in voice interaction. It is widely believed that emotion understanding models learn fundamental representations that transfer to synthesized speech, making emotion understanding results a plausible reward or evaluation metric...
AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents
arXiv:2603.16496v1 Announce Type: new Abstract: Large language model (LLM) agents increasingly rely on external memory to support long-horizon interaction, personalized assistance, and multi-step reasoning. However, existing memory systems still face three core challenges: they often rely too heavily on semantic...
How often do Answers Change? Estimating Recency Requirements in Question Answering
arXiv:2603.16544v1 Announce Type: new Abstract: Large language models (LLMs) often rely on outdated knowledge when answering time-sensitive questions, leading to confident yet incorrect responses. Without explicit signals indicating whether up-to-date information is required, models struggle to decide when to retrieve...
How to Achieve Prototypical Birth and Death for OOD Detection?
arXiv:2603.15650v1 Announce Type: new Abstract: Out-of-Distribution (OOD) detection is crucial for the secure deployment of machine learning models, and prototype-based learning methods are among the mainstream strategies for achieving OOD detection. Existing prototype-based learning methods generally rely on a fixed...
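The abstract does not describe RadAnnotate-style specifics of this paper's prototype scheme, but the general prototype-based OOD recipe it builds on is standard: score a sample by its distance to the nearest class prototype and flag distant samples as out-of-distribution. A minimal sketch of that generic scoring rule (not this paper's method; `ood_score` and the 2-D prototypes are illustrative assumptions):

```python
import numpy as np

def ood_score(x, prototypes):
    """Distance-based OOD score: distance from sample x to its nearest
    class prototype. Larger scores suggest the sample lies farther from
    all known classes (a generic rule, not this paper's contribution)."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    return dists.min()

# Two fixed class prototypes in a toy 2-D feature space.
protos = np.array([[0.0, 0.0], [5.0, 5.0]])

in_dist = np.array([0.1, -0.1])     # near a prototype -> low score
out_dist = np.array([20.0, -20.0])  # far from both -> high score

print(ood_score(in_dist, protos) < ood_score(out_dist, protos))  # True
```

The paper's point of departure is that such methods "rely on a fixed" prototype set; dynamic creation and removal of prototypes ("birth and death") would replace the static `protos` array above.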
Discovering the Hidden Role of Gini Index In Prompt-based Classification
arXiv:2603.15654v1 Announce Type: new Abstract: In classification tasks, the predictions that matter most often concern long-tailed minority classes. Yet these classes consistently exhibit low accuracy, while a few high-performing classes dominate. We pursue a foundational understanding...
Transition Flow Matching
arXiv:2603.15689v1 Announce Type: new Abstract: Mainstream flow matching methods typically focus on learning the local velocity field, which inherently requires multiple integration steps during generation. In contrast, Mean Velocity Flow models establish a relationship between the local velocity field and...
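The "local velocity field" that the abstract contrasts against mean-velocity models can be made concrete with the standard conditional flow matching construction (a generic background sketch, not this paper's Transition Flow Matching): interpolate x_t = (1 - t)·x0 + t·x1 between a noise sample x0 and a data sample x1, and regress a network onto the constant conditional velocity dx_t/dt = x1 - x0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear interpolation path between a noise sample x0 and a data
# sample x1: x_t = (1 - t) * x0 + t * x1.  Along this path the local
# (conditional) velocity is constant: dx_t/dt = x1 - x0.
x0 = rng.standard_normal(3)  # source (noise) sample
x1 = rng.standard_normal(3)  # target (data) sample
t = 0.3

x_t = (1 - t) * x0 + t * x1
target_velocity = x1 - x0

# A model v_theta(x_t, t) trained on this target is queried at many
# small steps to integrate dx/dt = v_theta during generation, which is
# exactly the multi-step cost that mean-velocity models try to avoid.
print(np.allclose(x_t + (1 - t) * target_velocity, x1))  # True
```

The final check verifies that following the local velocity for the remaining time 1 - t lands exactly on the data sample x1.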
Tackling Over-smoothing on Hypergraphs: A Ricci Flow-guided Neural Diffusion Approach
arXiv:2603.15696v1 Announce Type: new Abstract: Hypergraph neural networks (HGNNs) have demonstrated strong capabilities in modeling complex higher-order relationships. However, existing HGNNs often suffer from over-smoothing as the number of layers increases and lack effective control over message passing among nodes....
Mastering the Minority: An Uncertainty-guided Multi-Expert Framework for Challenging-tailed Sequence Learning
arXiv:2603.15708v1 Announce Type: new Abstract: Imbalanced data distribution remains a critical challenge in sequential learning, leading models to easily recognize frequent categories while failing to detect minority classes adequately. The Mixture-of-Experts model offers a scalable solution, yet its application is...
Embedding-Aware Feature Discovery: Bridging Latent Representations and Interpretable Features in Event Sequences
arXiv:2603.15713v1 Announce Type: new Abstract: Industrial financial systems operate on temporal event sequences such as transactions, user actions, and system logs. While recent research emphasizes representation learning and large language models, production systems continue to rely heavily on handcrafted statistical...
Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models
arXiv:2603.15724v1 Announce Type: new Abstract: Existing test-time scaling (TTS) methods for unified multimodal models (UMMs) in text-to-image (T2I) generation primarily rely on search or sampling strategies that produce only instance-level improvements, limiting the ability to learn from prior inferences and...
Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs
arXiv:2603.15803v1 Announce Type: new Abstract: Discrete diffusion models offer global context awareness and flexible parallel generation. However, uniform random noise schedulers in standard DLLM training overlook the highly non-uniform information density inherent in real-world sequences. This wastes optimization resources on...
Hypothesis Class Determines Explanation: Why Accurate Models Disagree on Feature Attribution
arXiv:2603.15821v1 Announce Type: new Abstract: The assumption that prediction-equivalent models produce equivalent explanations underlies many practices in explainable AI, including model selection, auditing, and regulatory evaluation. In this work, we show that this assumption does not hold. Through a large-scale...
FlashSampling: Fast and Memory-Efficient Exact Sampling
arXiv:2603.15854v1 Announce Type: new Abstract: Sampling from a categorical distribution is mathematically simple, but in large-vocabulary decoding, it often triggers extra memory traffic and extra kernels after the LM head. We present FlashSampling, an exact sampling primitive that fuses sampling...
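The abstract does not reveal which primitive FlashSampling fuses, but one well-known route to exact categorical sampling without a separate normalization pass is the Gumbel-max trick: argmax(logits + Gumbel noise) is distributed exactly as softmax(logits). A generic illustration of that identity (not this paper's kernel):

```python
import numpy as np

def gumbel_max_sample(logits, rng):
    """Exact categorical sampling via the Gumbel-max trick:
    argmax(logits + Gumbel noise) ~ Categorical(softmax(logits)),
    with no explicit softmax or cumulative-sum pass."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    return int(np.argmax(logits + g))

rng = np.random.default_rng(0)
logits = np.array([0.0, 2.0, -1.0])

# Empirical sampling frequencies should match softmax(logits).
counts = np.bincount(
    [gumbel_max_sample(logits, rng) for _ in range(20000)], minlength=3)
probs = np.exp(logits) / np.exp(logits).sum()
print(np.allclose(counts / 20000, probs, atol=0.02))  # True
```

In large-vocabulary decoding the interesting part is doing this without materializing extra intermediate tensors after the LM head, which is the memory-traffic problem the paper targets.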