AIWizards at MULTIPRIDE: A Hierarchical Approach to Slur Reclamation Detection
arXiv:2602.12818v1 Announce Type: new Abstract: Detecting reclaimed slurs represents a fundamental challenge for hate speech detection systems, as the same lexical items can function either as abusive expressions or as in-group affirmations depending on social identity and context. In this...
BaziQA-Benchmark: Evaluating Symbolic and Temporally Compositional Reasoning in Large Language Models
arXiv:2602.12889v1 Announce Type: new Abstract: We present BaziQA-Benchmark, a standardized benchmark for evaluating symbolic and temporally compositional reasoning in large language models. The benchmark is derived from 200 professionally curated, multiple-choice problems from the Global Fortune-teller Competition (2021--2025), where each...
Curriculum Learning and Pseudo-Labeling Improve the Generalization of Multi-Label Arabic Dialect Identification Models
arXiv:2602.12937v1 Announce Type: new Abstract: Although Arabic Dialect Identification (ADI) was long modeled as a single-label classification task, recent work has argued that it should be framed as a multi-label classification task. However, ADI remains constrained by the availability...
ProbeLLM: Automating Principled Diagnosis of LLM Failures
arXiv:2602.12966v1 Announce Type: new Abstract: Understanding how and why large language models (LLMs) fail is becoming a central challenge as models rapidly evolve and static evaluations fall behind. While automated probing has been enabled by dynamic test generation, existing approaches...
Evaluating the Homogeneity of Keyphrase Prediction Models
arXiv:2602.12989v1 Announce Type: new Abstract: Keyphrases, which are useful in several NLP and IR applications, are either extracted from text or predicted by generative models. In contrast to keyphrase extraction approaches, keyphrase generation models can predict keyphrases that do not appear...
Know More, Know Clearer: A Meta-Cognitive Framework for Knowledge Augmentation in Large Language Models
arXiv:2602.12996v1 Announce Type: new Abstract: Knowledge augmentation has significantly enhanced the performance of Large Language Models (LLMs) in knowledge-intensive tasks. However, existing methods typically operate on the simplistic premise that model performance equates with internal knowledge, overlooking the knowledge-confidence gaps...
Can we trust AI to detect healthy multilingual English speakers among the cognitively impaired cohort in the UK? An investigation using real-world conversational speech
arXiv:2602.13047v1 Announce Type: new Abstract: Conversational speech often reveals early signs of cognitive decline, such as dementia and MCI. In the UK, one in four people belongs to an ethnic minority, and dementia prevalence is expected to rise most rapidly...
Exploring a New Competency Modeling Process with Large Language Models
arXiv:2602.13084v1 Announce Type: new Abstract: Competency modeling is widely used in human resource management to select, develop, and evaluate talent. However, traditional expert-driven approaches rely heavily on manual analysis of large volumes of interview transcripts, making them costly and prone...
Towards interpretable models for language proficiency assessment: Predicting the CEFR level of Estonian learner texts
arXiv:2602.13102v1 Announce Type: new Abstract: Using NLP to analyze authentic learner language helps to build automated assessment and feedback tools. It also offers new and extensive insights into the development of second language production. However, there is a lack of...
Semantic Chunking and the Entropy of Natural Language
arXiv:2602.13194v1 Announce Type: new Abstract: The entropy rate of printed English is famously estimated to be about one bit per character, a benchmark that modern large language models (LLMs) have only recently approached. This entropy rate implies that English contains...
Alignment or Integration? Rethinking Multimodal Fusion in DNA-language Foundation Models
arXiv:2602.12286v1 Announce Type: cross Abstract: Fusing DNA foundation models with large language models (LLMs) for DNA-language reasoning raises a fundamental question: at what level should genomic sequences and natural language interact? Most existing approaches encode DNA sequences and text separately...
Beyond Musical Descriptors: Extracting Preference-Bearing Intent in Music Queries
arXiv:2602.12301v1 Announce Type: cross Abstract: Although annotated music descriptor datasets for user queries are increasingly common, few consider the user's intent behind these descriptors, which is essential for effectively meeting their needs. We introduce MusicRecoIntent, a manually annotated corpus of...
Sparse Autoencoders are Capable LLM Jailbreak Mitigators
arXiv:2602.12418v1 Announce Type: cross Abstract: Jailbreak attacks remain a persistent threat to large language model safety. We propose Context-Conditioned Delta Steering (CC-Delta), an SAE-based defense that identifies jailbreak-relevant sparse features by comparing token-level representations of the same harmful request with...
Constraint-Rectified Training for Efficient Chain-of-Thought
arXiv:2602.12526v1 Announce Type: cross Abstract: Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs), especially when combined with reinforcement learning (RL) based post-training methods. While longer reasoning traces can improve answer quality and unlock abilities such...
DiffuRank: Effective Document Reranking with Diffusion Language Models
arXiv:2602.12528v1 Announce Type: cross Abstract: Recent advances in large language models (LLMs) have inspired a new paradigm for document reranking. While this paradigm better exploits the reasoning and contextual understanding capabilities of LLMs, most existing LLM-based rerankers rely on autoregressive generation,...
HyperMLP: An Integrated Perspective for Sequence Modeling
arXiv:2602.12601v1 Announce Type: cross Abstract: Self-attention is often viewed as probabilistic query-key lookup, motivating designs that preserve normalized attention scores and fixed positional semantics. We advocate a simpler and more unified perspective: an autoregressive attention head can be viewed as...
VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph
arXiv:2602.12735v1 Announce Type: cross Abstract: Effectively retrieving, reasoning over, and understanding multimodal information remains a critical challenge for agentic systems. Traditional Retrieval-augmented Generation (RAG) methods rely on linear interaction histories, which struggle to handle long-context tasks, especially those involving information-sparse yet...
Abstractive Red-Teaming of Language Model Character
arXiv:2602.12318v1 Announce Type: new Abstract: We want language model assistants to conform to a character specification, which asserts how the model should act across diverse user interactions. While models typically follow these character specifications, they can occasionally violate them in...
The Appeal and Reality of Recycling LoRAs with Adaptive Merging
arXiv:2602.12323v1 Announce Type: new Abstract: The widespread availability of fine-tuned LoRA modules for open pre-trained models has led to an interest in methods that can adaptively merge LoRAs to improve performance. These methods typically include some way of selecting LoRAs...
Wireless TokenCom: RL-Based Tokenizer Agreement for Multi-User Wireless Token Communications
arXiv:2602.12338v1 Announce Type: new Abstract: Token Communications (TokenCom) has recently emerged as an effective new paradigm, where tokens are the unified units of multimodal communications and computations, enabling efficient digital semantic- and goal-oriented communications in future wireless networks. To establish...
A Machine Learning Approach to the Nirenberg Problem
arXiv:2602.12368v1 Announce Type: new Abstract: This work introduces the Nirenberg Neural Network: a numerical approach to the Nirenberg problem of prescribing Gaussian curvature on $S^2$ for metrics that are pointwise conformal to the round metric. Our mesh-free physics-informed neural network...
Deep Doubly Debiased Longitudinal Effect Estimation with ICE G-Computation
arXiv:2602.12379v1 Announce Type: new Abstract: Estimating longitudinal treatment effects is essential for sequential decision-making but is challenging due to treatment-confounder feedback. While Iterative Conditional Expectation (ICE) G-computation offers a principled approach, its recursive structure suffers from error propagation, corrupting the...
High-dimensional Level Set Estimation with Trust Regions and Double Acquisition Functions
arXiv:2602.12391v1 Announce Type: new Abstract: Level set estimation (LSE) classifies whether an unknown function's value exceeds a specified threshold for given inputs, a fundamental problem in many real-world applications. In active learning settings with limited initial data, we aim to...
Synthetic Interaction Data for Scalable Personalization in Large Language Models
arXiv:2602.12394v1 Announce Type: new Abstract: Personalized prompting offers significant opportunities for deploying large language models (LLMs) to diverse users, yet existing prompt optimization methods primarily focus on task-level optimization while largely overlooking user-specific preferences and latent constraints of individual users....
Computationally sufficient statistics for Ising models
arXiv:2602.12449v1 Announce Type: new Abstract: Learning Gibbs distributions using only sufficient statistics has long been recognized as a computationally hard problem. On the other hand, computationally efficient algorithms for learning Gibbs distributions rely on access to full sample configurations generated...
Regularized Meta-Learning for Improved Generalization
arXiv:2602.12469v1 Announce Type: new Abstract: Deep ensemble methods often improve predictive performance, yet they suffer from three practical limitations: redundancy among base models that inflates computational cost and degrades conditioning, unstable weighting under multicollinearity, and overfitting in meta-learning pipelines. We...
Tight Bounds for Logistic Regression with Large Stepsize Gradient Descent in Low Dimension
arXiv:2602.12471v1 Announce Type: new Abstract: We consider the optimization problem of minimizing the logistic loss with gradient descent to train a linear model for binary classification with separable data. With a budget of $T$ iterations, it was recently shown that...
A Theoretical Analysis of Mamba's Training Dynamics: Filtering Relevant Features for Generalization in State Space Models
arXiv:2602.12499v1 Announce Type: new Abstract: The recent empirical success of Mamba and other selective state space models (SSMs) has renewed interest in non-attention architectures for sequence modeling, yet their theoretical foundations remain underexplored. We present a first-step analysis of generalization...
On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs
arXiv:2602.12506v1 Announce Type: new Abstract: Reinforcement learning (RL) fine-tuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks, motivating its extension to vision language models (VLMs). While RL-tuned VLMs improve on visual reasoning benchmarks, they...
Bench-MFG: A Benchmark Suite for Learning in Stationary Mean Field Games
arXiv:2602.12517v1 Announce Type: new Abstract: The intersection of Mean Field Games (MFGs) and Reinforcement Learning (RL) has fostered a growing family of algorithms designed to solve large-scale multi-agent systems. However, the field currently lacks a standardized evaluation protocol, forcing researchers...