MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
arXiv:2602.22638v1 Announce Type: new Abstract: Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is...
RLHFless: Serverless Computing for Efficient RLHF
arXiv:2602.22718v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LLM) post-training to align model outputs with human preferences. Recent models, such as DeepSeek-R1, have also shown RLHF's potential to improve...
AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications
arXiv:2602.22769v1 Announce Type: new Abstract: Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, where enabling long-horizon memory is critical for achieving strong performance. However, a significant gap exists between practical applications and current evaluation standards...
DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation
arXiv:2602.22839v1 Announce Type: new Abstract: Presentation generation requires deep content research, coherent visual design, and iterative refinement based on observation. However, existing presentation agents often rely on predefined workflows and fixed templates. To address this, we present DeepPresenter, an agentic...
OmniGAIA: Towards Native Omni-Modal AI Agents
arXiv:2602.22897v1 Announce Type: new Abstract: Human intelligence naturally intertwines omni-modal perception -- spanning vision, audio, and language -- with complex reasoning and tool usage to interact with the world. However, current multi-modal LLMs are primarily confined to bi-modal interactions (e.g.,...
Modeling Expert AI Diagnostic Alignment via Immutable Inference Snapshots
arXiv:2602.22973v1 Announce Type: new Abstract: Human-in-the-loop validation is essential in safety-critical clinical AI, yet the transition between initial model inference and expert correction is rarely analyzed as a structured signal. We introduce a diagnostic alignment framework in which the AI-generated...
Decoder-based Sense Knowledge Distillation
arXiv:2602.22351v1 Announce Type: new Abstract: Large language models (LLMs) learn contextual embeddings that capture rich semantic information, yet they often overlook structured lexical knowledge such as word senses and relationships. Prior work has shown that incorporating sense dictionaries can improve...
Scaling In, Not Up? Testing Thick Citation Context Analysis with GPT-5 and Fragile Prompts
arXiv:2602.22359v1 Announce Type: new Abstract: This paper tests whether large language models (LLMs) can support interpretative citation context analysis (CCA) by scaling in thick, text-grounded readings of a single hard case rather than scaling up typological labels. It foregrounds prompt-sensitivity...
Causality $\neq$ Invariance: Function and Concept Vectors in LLMs
arXiv:2602.22424v1 Announce Type: new Abstract: Do large language models (LLMs) represent concepts abstractly, i.e., independent of input format? We revisit Function Vectors (FVs), compact representations of in-context learning (ICL) tasks that causally drive task performance. Across multiple LLMs, we show...
A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection
arXiv:2602.22449v1 Announce Type: new Abstract: Cyberbullying has become a serious and growing concern in todays virtual world. When left unnoticed, it can have adverse consequences for social and mental health. Researchers have explored various types of cyberbullying, but most approaches...
Bridging Latent Reasoning and Target-Language Generation via Retrieval-Transition Heads
arXiv:2602.22453v1 Announce Type: new Abstract: Recent work has identified a subset of attention heads in Transformer as retrieval heads, which are responsible for retrieving information from the context. In this work, we first investigate retrieval heads in multilingual contexts. In...
Iterative Prompt Refinement for Dyslexia-Friendly Text Summarization Using GPT-4o
arXiv:2602.22524v1 Announce Type: new Abstract: Dyslexia affects approximately 10% of the global population and presents persistent challenges in reading fluency and text comprehension. While existing assistive technologies address visual presentation, linguistic complexity remains a substantial barrier to equitable access. This...
Search-P1: Path-Centric Reward Shaping for Stable and Efficient Agentic RAG Training
arXiv:2602.22576v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, yet traditional single-round retrieval struggles with complex multi-step reasoning. Agentic RAG addresses this by enabling LLMs to dynamically decide when and what to...
Enhancing Persuasive Dialogue Agents by Synthesizing Cross-Disciplinary Communication Strategies
arXiv:2602.22696v1 Announce Type: new Abstract: Current approaches to developing persuasive dialogue agents often rely on a limited set of predefined persuasive strategies that fail to capture the complexity of real-world interactions. We applied a cross-disciplinary approach to develop a framework...
The Poly Problem in Zoning: Redefining “Family” for a Changing Society lawreview - Minnesota Law Review
By ARIC SHORT & TANYA PIERCE. Full Text. Single-family zoning has long dictated not only where people may live but also with whom. Although extensively critiqued for perpetuating racial and economic exclusion, these laws also privilege relationships defined by blood,...
Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue
arXiv:2602.22697v1 Announce Type: new Abstract: The rapid evolution of Large Language Models (LLMs) has accelerated the transition from conversational chatbots to general agents. However, effectively balancing empathetic communication with budget-aware decision-making remains an open challenge. Since existing methods fail to...
Probing for Knowledge Attribution in Large Language Models
arXiv:2602.22787v1 Announce Type: new Abstract: Large language models (LLMs) often generate fluent but unfounded claims, or hallucinations, which fall into two types: (i) faithfulness violations - misusing user context - and (ii) factuality violations - errors from internal knowledge. Proper...
Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models
arXiv:2602.22918v1 Announce Type: new Abstract: Vision-language models (VLMs) can read text from images, but where does this optical character recognition (OCR) information enter the language processing stream? We investigate the OCR routing mechanism across three architecture families (Qwen3-VL, Phi-4, InternVL3.5)...
CiteLLM: An Agentic Platform for Trustworthy Scientific Reference Discovery
arXiv:2602.23075v1 Announce Type: new Abstract: Large language models (LLMs) have created new opportunities to enhance the efficiency of scholarly activities; however, challenges persist in the ethical deployment of AI assistance, including (1) the trustworthiness of AI-generated content, (2) preservation of...
Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent
arXiv:2602.23079v1 Announce Type: new Abstract: The rapid advancement of large language models (LLMs) has enabled powerful authorship inference capabilities, raising growing concerns about unintended deanonymization risks in textual data such as news articles. In this work, we introduce an LLM...
MTRAG-UN: A Benchmark for Open Challenges in Multi-Turn RAG Conversations
arXiv:2602.23184v1 Announce Type: new Abstract: We present MTRAG-UN, a benchmark for exploring open challenges in multi-turn retrieval augmented generation, a popular use of large language models. We release a benchmark of 666 tasks containing over 2,800 conversation turns across 6...
Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems
arXiv:2602.23266v1 Announce Type: new Abstract: Achieving human-like responsiveness is a critical yet challenging goal for cascaded spoken dialogue systems. Conventional ASR-LLM-TTS pipelines follow a strictly sequential paradigm, requiring complete transcription and full reasoning before speech synthesis can begin, which results...
A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations
arXiv:2602.23300v1 Announce Type: new Abstract: Emotion Recognition in Conversations (ERC) presents unique challenges, requiring models to capture the temporal flow of multi-turn dialogues and to effectively integrate cues from multiple modalities. We propose Mixture of Speech-Text Experts for Recognition of...
Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning
arXiv:2602.23351v1 Announce Type: new Abstract: The lack of reasoning capabilities in Vision-Language Models (VLMs) has remained at the forefront of research discourse. We posit that this behavior stems from a reporting bias in their training data. That is, how people...
Enriching Taxonomies Using Large Language Models
arXiv:2602.22213v1 Announce Type: cross Abstract: Taxonomies play a vital role in structuring and categorizing information across domains. However, many existing taxonomies suffer from limited coverage and outdated or ambiguous nodes, reducing their effectiveness in knowledge retrieval. To address this, we...
To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning
arXiv:2602.22227v1 Announce Type: new Abstract: Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex scenes. This weakness stems from a reliance on finite training datasets, which are prohibitively expensive to scale and...
Support Tokens, Stability Margins, and a New Foundation for Robust LLMs
arXiv:2602.22271v1 Announce Type: new Abstract: Self-attention is usually described as a flexible, content-adaptive way to mix a token with information from its past. We re-interpret causal self-attention transformers, the backbone of modern foundation models, within a probabilistic framework, much like...
Integrating Machine Learning Ensembles and Large Language Models for Heart Disease Prediction Using Voting Fusion
arXiv:2602.22280v1 Announce Type: new Abstract: Cardiovascular disease is the primary cause of death globally, necessitating early identification, precise risk classification, and dependable decision-support technologies. The advent of large language models (LLMs) provides new zero-shot and few-shot reasoning capabilities, even though...
Early Risk Stratification of Dosing Errors in Clinical Trials Using Machine Learning
arXiv:2602.22285v1 Announce Type: new Abstract: Objective: The objective of this study is to develop a machine learning (ML)-based framework for early risk stratification of clinical trials (CTs) according to their likelihood of exhibiting a high rate of dosing errors, using...
When Should a Model Change Its Mind? An Energy-Based Theory and Regularizer for Concept Drift in Electrocardiogram (ECG) Signals
arXiv:2602.22294v1 Announce Type: new Abstract: Models operating on dynamic physiologic signals must distinguish benign, label-preserving variability from true concept change. Existing concept-drift frameworks are largely distributional and provide no principled guidance on how much a model's internal representation may move...