Persona-Conditioned Risk Behavior in Large Language Models: A Simulated Gambling Study with GPT-4.1
arXiv:2603.15831v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as autonomous agents in uncertain, sequential decision-making contexts. Yet it remains poorly understood whether the behaviors they exhibit in such environments reflect principled cognitive patterns or simply surface-level...
CraniMem: Cranial Inspired Gated and Bounded Memory for Agentic Systems
arXiv:2603.15642v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly deployed in long running workflows, where they must preserve user and task state across many turns. Many existing agent memory systems behave like external databases with ad hoc...
RadAnnotate: Large Language Models for Efficient and Reliable Radiology Report Annotation
arXiv:2603.16002v1 Announce Type: new Abstract: Radiology report annotation is essential for clinical NLP, yet manual labeling is slow and costly. We present RadAnnotate, an LLM-based framework that studies retrieval-augmented synthetic reports and confidence-based selective automation to reduce expert effort for...
Understanding Moral Reasoning Trajectories in Large Language Models: Toward Probing-Based Explainability
arXiv:2603.16017v1 Announce Type: new Abstract: Large language models (LLMs) increasingly participate in morally sensitive decision-making, yet how they organize ethical frameworks across reasoning steps remains underexplored. We introduce moral reasoning trajectories, sequences of ethical framework invocations across intermediate reasoning steps,...
SEAHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Southeast Asia
arXiv:2603.16070v1 Announce Type: new Abstract: Hate speech detection relies heavily on linguistic resources, which are primarily available in high-resource languages such as English and Chinese, creating barriers for researchers and platforms developing tools for low-resource languages in Southeast Asia, where...
ClaimFlow: Tracing the Evolution of Scientific Claims in NLP
arXiv:2603.16073v1 Announce Type: new Abstract: Scientific papers do more than report results; they advance claims that later work supports, extends, or sometimes refutes. Yet existing methods for citation and claim analysis capture only fragments of this dialogue. In this...
Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization
arXiv:2603.16105v1 Announce Type: new Abstract: Post-training model compression is essential for enhancing the portability of Large Language Models (LLMs) while preserving their performance. While several compression approaches have been proposed, less emphasis has been placed on selecting the most suitable...
ASDA: Automated Skill Distillation and Adaptation for Financial Reasoning
arXiv:2603.16112v1 Announce Type: new Abstract: Adapting large language models (LLMs) to specialized financial reasoning typically requires expensive fine-tuning that produces model-locked expertise. Training-free alternatives have emerged, yet our experiments show that leading methods (GEPA and ACE) achieve only marginal gains...
Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users
arXiv:2603.16120v1 Announce Type: new Abstract: Deep Research (DR) tools (e.g. OpenAI DR) help researchers cope with ballooning publishing counts. Such tools can synthesize scientific papers to answer researchers' queries, but lack understanding of their users. We change that in MyScholarQA...
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
arXiv:2603.16127v1 Announce Type: new Abstract: We investigate the role of learning rate scheduling in the large-scale pre-training of large language models, focusing on its influence on downstream performance after supervised fine-tuning (SFT). Decay-based learning rate schedulers are widely used to...
Social Simulacra in the Wild: AI Agent Communities on Moltbook
arXiv:2603.16128v1 Announce Type: new Abstract: As autonomous LLM-based agents increasingly populate social platforms, understanding the dynamics of AI-agent communities becomes essential for both communication research and platform governance. We present the first large-scale empirical comparison of AI-agent and human online...
SciZoom: A Large-scale Benchmark for Hierarchical Scientific Summarization across the LLM Era
arXiv:2603.16131v1 Announce Type: new Abstract: The explosive growth of AI research has created unprecedented information overload, increasing the demand for scientific summarization at multiple levels of granularity beyond traditional abstracts. While LLMs are increasingly adopted for summarization, existing benchmarks remain...
SIA: A Synthesize-Inject-Align Framework for Knowledge-Grounded and Secure E-commerce Search LLMs with Industrial Deployment
arXiv:2603.16137v1 Announce Type: new Abstract: Large language models offer transformative potential for e-commerce search by enabling intent-aware recommendations. However, their industrial deployment is hindered by two critical challenges: (1) knowledge hallucination due to insufficient encoding of dynamic, fine-grained product knowledge,...
Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR
arXiv:2603.16184v1 Announce Type: new Abstract: We present Polyglot-Lion, a family of compact multilingual automatic speech recognition (ASR) models tailored for the linguistic landscape of Singapore, covering English, Mandarin, Tamil, and Malay. Our models are obtained by fine-tuning Qwen3-ASR-0.6B and Qwen3-ASR-1.7B...
Attention-guided Evidence Grounding for Spoken Question Answering
arXiv:2603.16292v1 Announce Type: new Abstract: Spoken Question Answering (Spoken QA) presents a challenging cross-modal problem: effectively aligning acoustic queries with textual knowledge while avoiding the latency and error propagation inherent in cascaded ASR-based systems. In this paper, we introduce Attention-guided...
PyPhonPlan: Simulating phonetic planning with dynamic neural fields and task dynamics
arXiv:2603.16299v1 Announce Type: new Abstract: We introduce PyPhonPlan, a Python toolkit for implementing dynamical models of phonetic planning using coupled dynamic neural fields and task dynamic simulations. The toolkit provides modular components for defining planning, perception and memory fields, as...
PashtoCorp: A 1.25-Billion-Word Corpus, Evaluation Suite, and Reproducible Pipeline for Low-Resource Language Development
arXiv:2603.16354v1 Announce Type: new Abstract: We present PashtoCorp, a 1.25-billion-word corpus for Pashto, a language spoken by 60 million people that remains severely underrepresented in NLP. The corpus is assembled from 39 sources spanning seven HuggingFace datasets and 32 purpose-built...
Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic
arXiv:2603.16406v1 Announce Type: new Abstract: This paper evaluates current Large Language Model (LLM) benchmarking for Icelandic, identifies problems, and calls for improved evaluation methods in low/medium-resource languages in particular. We show that benchmarks that include synthetic or machine-translated data that...
DynHD: Hallucination Detection for Diffusion Large Language Models via Denoising Dynamics Deviation Learning
arXiv:2603.16459v1 Announce Type: new Abstract: Diffusion large language models (D-LLMs) have emerged as a promising alternative to auto-regressive models due to their iterative refinement capabilities. However, hallucinations remain a critical issue that hinders their reliability. To detect hallucination responses from...
On the Emotion Understanding of Synthesized Speech
arXiv:2603.16483v1 Announce Type: new Abstract: Emotion is a core paralinguistic feature in voice interaction. It is widely believed that emotion understanding models learn fundamental representations that transfer to synthesized speech, making emotion understanding results a plausible reward or evaluation metric...
DanceHA: A Multi-Agent Framework for Document-Level Aspect-Based Sentiment Analysis
arXiv:2603.16546v1 Announce Type: new Abstract: Aspect-Based Sentiment Intensity Analysis (ABSIA) has garnered increasing attention, though research largely focuses on domain-specific, sentence-level settings. In contrast, document-level ABSIA, particularly in addressing complex tasks like extracting Aspect-Category-Opinion-Sentiment-Intensity (ACOSI) tuples, remains underexplored. In this work, we...
EmoLLM: Appraisal-Grounded Cognitive-Emotional Co-Reasoning in Large Language Models
arXiv:2603.16553v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate strong cognitive intelligence (IQ), yet many real-world interactions also require emotional intelligence (EQ) to produce responses that are both factually reliable and emotionally appropriate. In settings such as emotional support,...
Characterizing Delusional Spirals through Human-LLM Chat Logs
arXiv:2603.16567v1 Announce Type: new Abstract: As large language models (LLMs) have proliferated, disturbing anecdotal reports of negative psychological effects, such as delusions, self-harm, and "AI psychosis," have emerged in global media and legal discourse. However, it remains unclear how users...
Tokenization Tradeoffs in Structured EHR Foundation Models
arXiv:2603.15644v1 Announce Type: new Abstract: Foundation models for structured electronic health records (EHRs) are pretrained on longitudinal sequences of timestamped clinical events to learn adaptable patient representations. Tokenization -- how these timelines are converted into discrete model inputs -- determines...
XLinear: Frequency-Enhanced MLP with CrossFilter for Robust Long-Range Forecasting
arXiv:2603.15645v1 Announce Type: new Abstract: Time series forecasters are widely used across various domains. Among them, MLP (multi-layer perceptron)-based forecasters have been proven to be more robust to noise compared to Transformer-based forecasters. However, MLP struggles to capture complex features,...
Alternating Reinforcement Learning with Contextual Rubric Rewards
arXiv:2603.15646v1 Announce Type: new Abstract: Reinforcement Learning with Rubric Rewards (RLRR) is a framework that extends conventional reinforcement learning from human feedback (RLHF) and verifiable rewards (RLVR) by replacing scalar preference signals with structured, multi-dimensional, contextual rubric-based evaluations. However, existing...
Steering Frozen LLMs: Adaptive Social Alignment via Online Prompt Routing
arXiv:2603.15647v1 Announce Type: new Abstract: Large language models (LLMs) are typically governed by post-training alignment (e.g., RLHF or DPO), which yields a largely static policy during deployment and inference. However, real-world safety is a full-lifecycle problem: static defenses degrade against...
How to Achieve Prototypical Birth and Death for OOD Detection?
arXiv:2603.15650v1 Announce Type: new Abstract: Out-of-Distribution (OOD) detection is crucial for the secure deployment of machine learning models, and prototype-based learning methods are among the mainstream strategies for achieving OOD detection. Existing prototype-based learning methods generally rely on a fixed...
A federated learning framework with knowledge graph and temporal transformer for early sepsis prediction in multi-center ICUs
arXiv:2603.15651v1 Announce Type: new Abstract: The early prediction of sepsis in intensive care unit (ICU) patients is crucial for improving survival rates. However, the development of accurate predictive models is hampered by data fragmentation across healthcare institutions and the complex,...
Beyond Reward Suppression: Reshaping Steganographic Communication Protocols in MARL via Dynamic Representational Circuit Breaking
arXiv:2603.15655v1 Announce Type: new Abstract: In decentralized Multi-Agent Reinforcement Learning (MARL), steganographic collusion -- where agents develop private protocols to evade monitoring -- presents a critical AI safety threat. Existing defenses, limited to behavioral or reward layers, fail to detect...