HMS-BERT: Hybrid Multi-Task Self-Training for Multilingual and Multi-Label Cyberbullying Detection
arXiv:2603.12920v1. Abstract: Cyberbullying on social media is inherently multilingual and multi-faceted, where abusive behaviors often overlap across multiple categories. Existing methods are commonly limited by monolingual assumptions or single-task formulations, which restrict their effectiveness in realistic multilingual and multi-label scenarios. In this paper, we propose HMS-BERT, a hybrid multi-task self-training framework for multilingual and multi-label cyberbullying detection. Built upon a pretrained multilingual BERT backbone, HMS-BERT integrates contextual representations with handcrafted linguistic features and jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task. To address labeled data scarcity in low-resource languages, an iterative self-training strategy with confidence-based pseudo-labeling is introduced to facilitate cross-lingual knowledge transfer. Experiments on four public datasets demonstrate that HMS-BERT achieves strong performance, attaining a macro F1-score of up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task. Ablation studies further verify the effectiveness of the proposed components.
Executive Summary
The paper proposes HMS-BERT, a hybrid multi-task self-training framework for multilingual and multi-label cyberbullying detection. Built on a pretrained multilingual BERT backbone, it integrates contextual representations with handcrafted linguistic features and jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task. To address labeled-data scarcity in low-resource languages, an iterative self-training strategy with confidence-based pseudo-labeling enables cross-lingual knowledge transfer. The framework performs strongly on four public datasets, reaching a macro F1-score of up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task, and thereby addresses the monolingual and single-task limitations of existing methods.
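The paper does not include code, but the hybrid design can be made concrete with a minimal PyTorch sketch. The snippet below assumes a HuggingFace `bert-base-multilingual-cased` backbone; `num_handcrafted` and `num_labels_multi` are hypothetical placeholders, since the actual handcrafted-feature set and label space are defined in the paper.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class HybridMultiTaskModel(nn.Module):
    """Sketch of a hybrid multi-task classifier: a multilingual BERT
    encoder fused with handcrafted linguistic features, feeding two
    heads (multi-label abuse categories and a 3-class main task)."""

    def __init__(self, num_labels_multi=6, num_handcrafted=16,
                 backbone="bert-base-multilingual-cased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(backbone)
        fused = self.bert.config.hidden_size + num_handcrafted
        # Fine-grained multi-label head (one logit per abuse category).
        self.multi_label_head = nn.Linear(fused, num_labels_multi)
        # Three-class main classification head.
        self.main_head = nn.Linear(fused, 3)

    def forward(self, input_ids, attention_mask, handcrafted_feats):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] representation
        x = torch.cat([cls, handcrafted_feats], dim=-1)
        return self.multi_label_head(x), self.main_head(x)
```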
Key Points
- HMS-BERT is a hybrid multi-task self-training framework for multilingual and multi-label cyberbullying detection
- The framework integrates contextual representations with handcrafted linguistic features
- It jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task (a sketch of the joint objective follows this list)
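Joint optimization of the two tasks typically amounts to summing a multi-label loss and a multi-class loss. The weighting `alpha` below is an assumption for illustration, not a value taken from the paper.

```python
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # multi-label abuse categories
ce = nn.CrossEntropyLoss()    # 3-class main task

def joint_loss(multi_logits, main_logits, multi_targets, main_targets,
               alpha=0.5):
    # alpha is an assumed task weight; the paper's exact weighting
    # scheme is not specified here.
    return (alpha * bce(multi_logits, multi_targets.float())
            + (1 - alpha) * ce(main_logits, main_targets))
```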
Merits
Effective Performance
HMS-BERT achieves strong performance on four public datasets, demonstrating its effectiveness in multilingual and multi-label cyberbullying detection
Cross-Lingual Knowledge Transfer
The framework facilitates cross-lingual knowledge transfer through an iterative self-training strategy with confidence-based pseudo-labeling
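As a rough illustration of confidence-based pseudo-labeling, the loop below keeps only unlabeled examples whose predicted probability on the main task exceeds a threshold, so they can be folded back into the training set for the next round. The threshold value and stopping criterion are assumptions, and `model` and `unlabeled_loader` are hypothetical placeholders matching the earlier sketch.

```python
import torch

def self_training_round(model, unlabeled_loader, threshold=0.9):
    """One round of confidence-based pseudo-labeling (sketch).
    Returns high-confidence (inputs, pseudo_label) pairs for the
    3-class main task."""
    model.eval()
    pseudo = []
    with torch.no_grad():
        for input_ids, attention_mask, feats in unlabeled_loader:
            _, main_logits = model(input_ids, attention_mask, feats)
            probs = torch.softmax(main_logits, dim=-1)
            conf, labels = probs.max(dim=-1)
            keep = conf >= threshold  # confidence filter
            for i in torch.nonzero(keep).flatten():
                pseudo.append((input_ids[i], attention_mask[i],
                               feats[i], labels[i].item()))
    return pseudo

# Iteration sketch: retrain on labeled + pseudo-labeled data, then
# repeat until no new high-confidence examples appear (assumed criterion).
```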
Demerits
Limited Data Availability
Although self-training mitigates labeled-data scarcity, the framework still requires labeled seed data, which remains scarce in low-resource languages
Complexity
Combining a multilingual BERT backbone, handcrafted features, two task heads, and iterative self-training adds implementation complexity, and repeated retraining rounds can be computationally expensive
Expert Commentary
The proposed HMS-BERT framework is a significant contribution to the field of cyberbullying detection, as it addresses the limitations of existing methods and demonstrates strong performance on multilingual and multi-label datasets. The use of a hybrid multi-task self-training approach and cross-lingual knowledge transfer strategy is particularly noteworthy, as it enables the framework to adapt to low-resource languages and improve its overall effectiveness. However, further research is needed to address the complexity and data availability limitations of the framework.
Recommendations
- Further research should be conducted to improve the efficiency and scalability of the HMS-BERT framework
- The framework should be tested on additional datasets and languages to demonstrate its generalizability and effectiveness in real-world applications