Academic
Academic
Training Is Everything: Artificial Intelligence, Copyright, and Fair Training
To learn how to behave, the current revolutionary generation of AIs must be trained on vast quantities of published images, written works, and sounds, many …
Mind the Sim2Real Gap in User Simulation for Agentic Tasks
arXiv:2603.11245v1 Announce Type: new Abstract: As NLP evaluation shifts from static benchmarks to multi-turn interactive settings, LLM-based simulators have become widely used as user proxies, …
LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms
arXiv:2603.11333v1 Announce Type: new Abstract: Short-video platforms are closed-loop, human-in-the-loop ecosystems where platform policy, creator incentives, and user behavior co-evolve. This feedback structure makes counterfactual …
Algorithmic Consequences of Particle Filters for Sentence Processing: Amplified Garden-Paths and Digging-In Effects
arXiv:2603.11412v1 Announce Type: new Abstract: Under surprisal theory, linguistic representations affect processing difficulty only through the bottleneck of surprisal. Our best estimates of surprisal come …
Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory …
arXiv:2603.11768v1 Announce Type: new Abstract: Long-term memory has emerged as a foundational component of autonomous Large Language Model (LLM) agents, enabling continuous adaptation, lifelong multimodal …
Entropy Guided Diversification and Preference Elicitation in Agentic Recommendation Systems
arXiv:2603.11399v1 Announce Type: new Abstract: Users on e-commerce platforms can be uncertain about their preferences early in their search. Queries to recommendation systems are frequently …
AI Psychometrics: Evaluating the Psychological Reasoning of Large Language Models with Psychometric Validities
arXiv:2603.11279v1 Announce Type: new Abstract: The immense number of parameters and deep neural networks make large language models (LLMs) rival the complexity of human brains, …
Stop Listening to Me! How Multi-turn Conversations Can Degrade Diagnostic Reasoning
arXiv:2603.11394v1 Announce Type: new Abstract: Patients and clinicians are increasingly using chatbots powered by large language models (LLMs) for healthcare inquiries. While state-of-the-art LLMs exhibit …
Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment
arXiv:2603.11388v1 Announce Type: new Abstract: Safety alignment aims to ensure that large language models (LLMs) refuse harmful requests by post-training on harmful queries paired with …
MaterialFigBENCH: benchmark dataset with figures for evaluating college-level materials science problem-solving abilities of multimodal large …
arXiv:2603.11414v1 Announce Type: new Abstract: We present MaterialFigBench, a benchmark dataset designed to evaluate the ability of multimodal large language models (LLMs) to solve university-level …