DIA-HARM: Dialectal Disparities in Harmful Content Detection Across 50 English Dialects
arXiv:2604.05318v1 Abstract: Harmful content detectors, particularly disinformation classifiers, are predominantly developed and evaluated on Standard American English (SAE), leaving their robustness to dialectal variation unexplored. We present DIA-HARM, the first benchmark for evaluating disinformation detection robustness across 50 English dialects spanning U.S., British, African, Caribbean, and Asia-Pacific varieties. Using Multi-VALUE's linguistically grounded transformations, we introduce D3 (Dialectal Disinformation Detection), a corpus of 195K samples derived from established disinformation benchmarks. Our evaluation of 16 detection models reveals systematic vulnerabilities: human-written dialectal content degrades detection by 1.4-3.6% F1, while AI-generated content remains stable. Fine-tuned transformers substantially outperform zero-shot LLMs (96.6% vs. 78.3% best-case F1), with some models exhibiting catastrophic failures exceeding 33% degradation on mixed content. Cross-dialectal transfer analysis across 2,450 dialect pairs shows that multilingual models (mDeBERTa: 97.2% average F1) generalize effectively, while monolingual models like RoBERTa and XLM-RoBERTa fail on dialectal inputs. These findings demonstrate that current disinformation detectors may systematically disadvantage hundreds of millions of non-SAE speakers worldwide. We release the DIA-HARM framework, D3 corpus, and evaluation tools: https://github.com/jsl5710/dia-harm
Executive Summary
The article introduces DIA-HARM, the first benchmark designed to evaluate the robustness of harmful content detection systems, particularly disinformation classifiers, across 50 diverse English dialects. Leveraging linguistically grounded transformations from Multi-VALUE, the authors construct the D3 corpus of 195K samples derived from established disinformation benchmarks. Their evaluation of 16 detection models reveals significant dialectal disparities: human-written dialectal content reduces detection performance by 1.4–3.6% F1, while AI-generated content remains stable. Fine-tuned transformers substantially outperform zero-shot LLMs (96.6% vs. 78.3% best-case F1), and multilingual models such as mDeBERTa (97.2% average F1 in cross-dialectal transfer) outperform monolingual ones such as RoBERTa, while some models exhibit catastrophic failures exceeding 33% degradation on mixed content. The findings underscore a systemic bias in current disinformation detection systems against non-Standard American English speakers, potentially marginalizing hundreds of millions of users worldwide.
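The core measurement implied by the abstract is straightforward to sketch: score a detector on SAE originals, rewrite the same samples into a target dialect, and report the F1 drop. Below is a minimal Python sketch of that protocol; `classify` and `transform_to_dialect` are hypothetical stand-ins (the paper performs the rewriting step with Multi-VALUE), so the actual DIA-HARM pipeline may differ.

```python
from sklearn.metrics import f1_score

def dialect_degradation(classify, texts, labels, transform_to_dialect, dialect):
    """F1 drop (percentage points) when SAE inputs are rewritten into a dialect.

    classify and transform_to_dialect are hypothetical stand-ins: a binary
    disinformation classifier and a Multi-VALUE-style dialect rewriter.
    """
    # Baseline: predictions on the Standard American English originals.
    f1_sae = f1_score(labels, [classify(t) for t in texts])
    # Perturbed: the same samples rewritten into the target dialect.
    dialectal = [transform_to_dialect(t, dialect) for t in texts]
    f1_dia = f1_score(labels, [classify(t) for t in dialectal])
    # The paper reports drops of 1.4-3.6 points for human-written content.
    return (f1_sae - f1_dia) * 100
```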
Key Points
- ▸ First comprehensive benchmark (DIA-HARM) assessing disinformation detection across 50 English dialects, addressing a critical gap in AI fairness and linguistic inclusivity.
- ▸ Systematic evaluation reveals that human-written dialectal content degrades F1 performance by 1.4–3.6%, while AI-generated content remains robust, highlighting limitations in model generalization to linguistic diversity.
- ▸ In cross-dialectal transfer, multilingual models (e.g., mDeBERTa) significantly outperform monolingual models (e.g., RoBERTa), reaching 97.2% average F1; separately, some models fail catastrophically, with degradation exceeding 33% on mixed content (a transfer-matrix sketch follows this list).
- ▸ The D3 corpus (195K samples) and DIA-HARM framework are released as open-source tools for further research and mitigation of dialectal biases in harmful content detection.
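On the cross-dialectal transfer point above: with 50 dialects, evaluating every ordered source-to-target combination yields 50 × 49 = 2,450 pairs, matching the count in the abstract. A minimal sketch of that sweep follows; `train_on` and `f1_on` are hypothetical stand-ins for the paper's fine-tuning and evaluation routines.

```python
from itertools import permutations

def transfer_matrix(dialects, train_on, f1_on):
    """Fine-tune on one dialect's data, evaluate F1 on another's.

    permutations(dialects, 2) enumerates every ordered (source, target)
    pair with source != target; for 50 dialects that is 50 * 49 = 2,450
    pairs. train_on and f1_on are hypothetical stand-ins.
    """
    scores = {}
    for source, target in permutations(dialects, 2):
        model = train_on(source)              # fine-tune on the source dialect
        scores[(source, target)] = f1_on(model, target)  # test on the target
    return scores
```

The per-model averages cited in the abstract (e.g., mDeBERTa's 97.2%) would presumably correspond to means taken over such a matrix.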
Merits
Novelty and Scope
The article pioneers the evaluation of disinformation detection across an unprecedented 50 English dialects, filling a critical gap in AI fairness research. The use of Multi-VALUE's linguistically grounded transformations to generate the D3 corpus supports methodological rigor and reproducibility.
Methodological Rigor
The study employs a robust evaluation framework, testing 16 detection models across diverse architectures (fine-tuned transformers and zero-shot LLMs) and analyzing cross-dialectal transfer across all 50 × 49 = 2,450 ordered source-target dialect pairs. The findings are empirically grounded in this large-scale evaluation.
Impact and Accessibility
By releasing the DIA-HARM framework, D3 corpus, and evaluation tools as open-source resources, the authors democratize access to critical tools for addressing dialectal biases in AI systems, fostering collaboration and further innovation.
Demerits
Limited Generalizability to Non-English Dialects
While the study covers 50 English dialects, it does not extend to non-English languages or dialects, which may exhibit even greater linguistic diversity and pose additional challenges for harmful content detection systems.
Focus on Disinformation Over Other Harmful Content
The benchmark and analysis are centered on disinformation, with less emphasis on other forms of harmful content (e.g., hate speech, misinformation, or propaganda) that may also be influenced by dialectal variation.
Potential Overreliance on Synthetic Data
The D3 corpus is derived from established disinformation benchmarks via rule-based, linguistically grounded transformations. While this approach ensures scalability and controlled comparisons, synthetically transformed text may not fully capture the nuances of naturally occurring dialectal disinformation in real-world settings.
Expert Commentary
The DIA-HARM study represents a significant contribution to AI fairness and NLP, exposing a critical blind spot in the development of harmful content detection systems. The authors' rigorous evaluation of 16 models across 50 dialects provides compelling evidence that current systems systematically disadvantage speakers of non-SAE varieties, a finding with profound ethical and societal implications. The stark performance contrast between multilingual models (e.g., mDeBERTa) and monolingual ones (e.g., RoBERTa) underscores the need for a shift in model development that treats cross-dialectal robustness as a non-negotiable criterion for deployment. The release of the DIA-HARM framework and D3 corpus as open-source tools is a commendable step toward inclusivity in AI research. However, the study's focus on English dialects, while groundbreaking, leaves a gap in addressing linguistic diversity in non-English contexts, which warrants further investigation. Policymakers and practitioners should heed these findings, as the marginalization of non-SAE speakers in digital safety tools could exacerbate existing inequalities in the digital public sphere.
Recommendations
- ✓ Developers and researchers should integrate dialectal robustness as a core evaluation metric in the development of harmful content detection systems, using the DIA-HARM framework as a benchmark for assessing performance across diverse dialects.
- ✓ Policymakers should collaborate with researchers to establish standardized testing protocols for AI systems in high-stakes domains, ensuring that linguistic inclusivity is a mandatory requirement for compliance and deployment.
- ✓ The AI community should expand the scope of dialectal benchmarks beyond English to include non-English languages and dialects, leveraging collaborative efforts to address global linguistic diversity.
- ✓ Organizations deploying AI systems for disinformation detection should conduct regular audits using the DIA-HARM framework to identify and mitigate dialectal biases, ensuring equitable performance across all linguistic communities (an illustrative audit loop follows this list).
- ✓ Future research should explore the intersection of dialectal variation and other forms of harmful content (e.g., hate speech, propaganda) to develop comprehensive, linguistically inclusive AI systems.
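As a concrete starting point for the audit recommendation above, the degradation measurement from the earlier sketch can be looped over dialects and thresholded. This is an illustrative pattern only, not the DIA-HARM framework's actual API; the 3.6-point default tolerance simply mirrors the upper end of the degradation range reported for human-written content.

```python
def audit_detector(classify, texts, labels, transform_to_dialect,
                   dialects, max_drop_pp=3.6):
    """Flag dialects where F1 degradation exceeds a tolerance (in points).

    Reuses the hypothetical dialect_degradation() sketch above; all
    function arguments are stand-ins, not DIA-HARM's real interface.
    """
    report = {}
    for dialect in dialects:
        drop = dialect_degradation(classify, texts, labels,
                                   transform_to_dialect, dialect)
        report[dialect] = {"f1_drop_pp": round(drop, 2),
                           "flagged": drop > max_drop_pp}
    return report
```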
Sources
Original: arXiv - cs.CL