MentalBench: A Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models
arXiv:2602.12871v1 Announce Type: new Abstract: We introduce MentalBench, a benchmark for evaluating psychiatric diagnostic decision-making in large language models (LLMs). Existing mental health benchmarks largely rely on social media data, limiting their ability to assess DSM-grounded diagnostic judgments. At the...
When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms
arXiv:2602.12921v1 Announce Type: new Abstract: Figurative language understanding remains a significant challenge for Large Language Models (LLMs), especially for low-resource languages. To address this, we introduce a new idiom dataset, a large-scale, culturally-grounded corpus of 10,361 Bengali idioms. Each idiom...
Curriculum Learning and Pseudo-Labeling Improve the Generalization of Multi-Label Arabic Dialect Identification Models
arXiv:2602.12937v1 Announce Type: new Abstract: Arabic Dialect Identification (ADI) has long been modeled as a single-label classification task, but recent work has argued that it should be framed as a multi-label classification task. However, ADI remains constrained by the availability...
Know More, Know Clearer: A Meta-Cognitive Framework for Knowledge Augmentation in Large Language Models
arXiv:2602.12996v1 Announce Type: new Abstract: Knowledge augmentation has significantly enhanced the performance of Large Language Models (LLMs) in knowledge-intensive tasks. However, existing methods typically operate on the simplistic premise that model performance equates with internal knowledge, overlooking the knowledge-confidence gaps...
Can we trust AI to detect healthy multilingual English speakers among the cognitively impaired cohort in the UK? An investigation using real-world conversational speech
arXiv:2602.13047v1 Announce Type: new Abstract: Conversational speech often reveals early signs of cognitive decline, such as dementia and MCI. In the UK, one in four people belongs to an ethnic minority, and dementia prevalence is expected to rise most rapidly...
Exploring a New Competency Modeling Process with Large Language Models
arXiv:2602.13084v1 Announce Type: new Abstract: Competency modeling is widely used in human resource management to select, develop, and evaluate talent. However, traditional expert-driven approaches rely heavily on manual analysis of large volumes of interview transcripts, making them costly and prone...
SCOPE: Selective Conformal Optimized Pairwise LLM Judging
arXiv:2602.13110v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as judges to replace costly human preference labels in pairwise evaluation. Despite their practicality, LLM judges remain prone to miscalibration and systematic biases. This paper proposes SCOPE (Selective...
OpenLID-v3: Improving the Precision of Closely Related Language Identification -- An Experience Report
arXiv:2602.13139v1 Announce Type: new Abstract: Language identification (LID) is an essential step in building high-quality multilingual datasets from web data. Existing LID tools (such as OpenLID or GlotLID) often struggle to identify closely related languages and to distinguish valid natural...
Beyond Musical Descriptors: Extracting Preference-Bearing Intent in Music Queries
arXiv:2602.12301v1 Announce Type: cross Abstract: Although annotated music descriptor datasets for user queries are increasingly common, few consider the user's intent behind these descriptors, which is essential for effectively meeting their needs. We introduce MusicRecoIntent, a manually annotated corpus of...
DiffuRank: Effective Document Reranking with Diffusion Language Models
arXiv:2602.12528v1 Announce Type: cross Abstract: Recent advances in large language models (LLMs) have inspired new paradigms for document reranking. While this paradigm better exploits the reasoning and contextual understanding capabilities of LLMs, most existing LLM-based rerankers rely on autoregressive generation,...
HyperMLP: An Integrated Perspective for Sequence Modeling
arXiv:2602.12601v1 Announce Type: cross Abstract: Self-attention is often viewed as probabilistic query-key lookup, motivating designs that preserve normalized attention scores and fixed positional semantics. We advocate a simpler and more unified perspective: an autoregressive attention head can be viewed as...
Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models
arXiv:2602.12618v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) incur significant computational cost from processing numerous vision tokens through all LLM layers. Prior pruning methods operate either before the LLM, limiting generality due to diverse encoder-projector designs, or within...
VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph
arXiv:2602.12735v1 Announce Type: cross Abstract: Effectively retrieving, reasoning, and understanding multimodal information remains a critical challenge for agentic systems. Traditional Retrieval-augmented Generation (RAG) methods rely on linear interaction histories, which struggle to handle long-context tasks, especially those involving information-sparse yet...
Wireless TokenCom: RL-Based Tokenizer Agreement for Multi-User Wireless Token Communications
arXiv:2602.12338v1 Announce Type: new Abstract: Token Communications (TokenCom) has recently emerged as an effective new paradigm, where tokens are the unified units of multimodal communications and computations, enabling efficient digital semantic- and goal-oriented communications in future wireless networks. To establish...
Deep Doubly Debiased Longitudinal Effect Estimation with ICE G-Computation
arXiv:2602.12379v1 Announce Type: new Abstract: Estimating longitudinal treatment effects is essential for sequential decision-making but is challenging due to treatment-confounder feedback. While Iterative Conditional Expectation (ICE) G-computation offers a principled approach, its recursive structure suffers from error propagation, corrupting the...
Bench-MFG: A Benchmark Suite for Learning in Stationary Mean Field Games
arXiv:2602.12517v1 Announce Type: new Abstract: The intersection of Mean Field Games (MFGs) and Reinforcement Learning (RL) has fostered a growing family of algorithms designed to solve large-scale multi-agent systems. However, the field currently lacks a standardized evaluation protocol, forcing researchers...
Analytical Results for Two Exponential Family Distributions in Hierarchical Dirichlet Processes
arXiv:2602.12527v1 Announce Type: new Abstract: The Hierarchical Dirichlet Process (HDP) provides a flexible Bayesian nonparametric framework for modeling grouped data with a shared yet unbounded collection of mixture components. While existing applications of the HDP predominantly focus on the Dirichlet-multinomial...
AMPS: Adaptive Modality Preference Steering via Functional Entropy
arXiv:2602.12533v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) often exhibit significant modality preference, which is a tendency to favor one modality over another. Depending on the input, they may over-rely on linguistic priors relative to visual evidence, or...
VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
arXiv:2602.12579v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a dominant paradigm for enhancing the reasoning of Large Language Models (LLMs), yet its reliance on external verifiers limits its scalability. Recent findings suggest that RLVR primarily functions...
Block-Sample MAC-Bayes Generalization Bounds
arXiv:2602.12605v1 Announce Type: new Abstract: We present a family of novel block-sample MAC-Bayes bounds (mean approximately correct). While PAC-Bayes bounds (probably approximately correct) typically give bounds for the generalization error that hold with high probability, MAC-Bayes bounds have a similar...
Coden: Efficient Temporal Graph Neural Networks for Continuous Prediction
arXiv:2602.12613v1 Announce Type: new Abstract: Temporal Graph Neural Networks (TGNNs) are pivotal in processing dynamic graphs. However, existing TGNNs primarily target one-time predictions for a given temporal span, whereas many practical applications require continuous predictions, that is, predictions are issued frequently...
Efficient Personalized Federated PCA with Manifold Optimization for IoT Anomaly Detection
arXiv:2602.12622v1 Announce Type: new Abstract: Internet of Things (IoT) networks face increasing security threats due to their distributed nature and resource constraints. Although federated learning (FL) has gained prominence as a privacy-preserving framework for distributed IoT environments, current federated principal...
Uncovering spatial tissue domains and cell types in spatial omics through cross-scale profiling of cellular and genomic interactions
arXiv:2602.12651v1 Announce Type: new Abstract: Cellular identity and function are linked to both their intrinsic genomic makeup and extrinsic spatial context within the tissue microenvironment. Spatial transcriptomics (ST) offers an unprecedented opportunity to study this, providing in situ gene expression...