HMS-BERT: Hybrid Multi-Task Self-Training for Multilingual and Multi-Label Cyberbullying Detection
arXiv:2603.12920v1. Abstract: Cyberbullying on social media is inherently multilingual and multi-faceted, where abusive behaviors often overlap across multiple categories. Existing methods are commonly limited by monolingual assumptions or single-task formulations, which restrict their effectiveness in realistic multilingual and multi-label scenarios. In this paper, we propose HMS-BERT, a hybrid multi-task self-training framework for multilingual and multi-label cyberbullying detection. Built upon a pretrained multilingual BERT backbone, HMS-BERT integrates contextual representations with handcrafted linguistic features and jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task. To address labeled data scarcity in low-resource languages, an iterative self-training strategy with confidence-based pseudo-labeling is introduced to facilitate cross-lingual knowledge transfer. Experiments on four public datasets demonstrate that HMS-BERT achieves strong performance, attaining a macro F1-score of up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task. Ablation studies further verify the effectiveness of the proposed components.
Executive Summary
The paper proposes HMS-BERT, a hybrid multi-task self-training framework for multilingual and multi-label cyberbullying detection. Built on a pretrained multilingual BERT backbone, it integrates contextual representations with handcrafted linguistic features and jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task. To address labeled-data scarcity in low-resource languages, an iterative self-training strategy with confidence-based pseudo-labeling enables cross-lingual knowledge transfer. The framework performs strongly on four public datasets, reaching a macro F1-score of up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task, and thereby addresses the monolingual and single-task limitations of existing methods.
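The paper does not include code, but the hybrid design can be made concrete with a minimal PyTorch sketch. The snippet below assumes a HuggingFace `bert-base-multilingual-cased` backbone; `num_handcrafted` and `num_labels_multi` are hypothetical placeholders, since the actual handcrafted-feature set and label space are defined in the paper.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class HybridMultiTaskModel(nn.Module):
    """Sketch of a hybrid multi-task classifier: a multilingual BERT
    encoder fused with handcrafted linguistic features, feeding two
    heads (multi-label abuse categories and a 3-class main task)."""

    def __init__(self, num_labels_multi=6, num_handcrafted=16,
                 backbone="bert-base-multilingual-cased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(backbone)
        fused = self.bert.config.hidden_size + num_handcrafted
        # Fine-grained multi-label head (one logit per abuse category).
        self.multi_label_head = nn.Linear(fused, num_labels_multi)
        # Three-class main classification head.
        self.main_head = nn.Linear(fused, 3)

    def forward(self, input_ids, attention_mask, handcrafted_feats):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] representation
        x = torch.cat([cls, handcrafted_feats], dim=-1)
        return self.multi_label_head(x), self.main_head(x)
```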
Key Points
- HMS-BERT is a hybrid multi-task self-training framework for multilingual and multi-label cyberbullying detection
- The framework integrates contextual representations with handcrafted linguistic features
- It jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task (a sketch of the joint objective follows this list)
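Joint optimization of the two tasks typically amounts to summing a multi-label loss and a multi-class loss. The weighting `alpha` below is an assumption for illustration, not a value taken from the paper.

```python
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # multi-label abuse categories
ce = nn.CrossEntropyLoss()    # 3-class main task

def joint_loss(multi_logits, main_logits, multi_targets, main_targets,
               alpha=0.5):
    # alpha is an assumed task weight; the paper's exact weighting
    # scheme is not specified here.
    return (alpha * bce(multi_logits, multi_targets.float())
            + (1 - alpha) * ce(main_logits, main_targets))
```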
Merits
Effective Performance
HMS-BERT achieves strong performance on four public datasets, demonstrating its effectiveness in multilingual and multi-label cyberbullying detection
Cross-Lingual Knowledge Transfer
The framework facilitates cross-lingual knowledge transfer through an iterative self-training strategy with confidence-based pseudo-labeling
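As a rough illustration of confidence-based pseudo-labeling, the loop below keeps only unlabeled examples whose predicted probability on the main task exceeds a threshold, so they can be folded back into the training set for the next round. The threshold value and stopping criterion are assumptions, and `model` and `unlabeled_loader` are hypothetical placeholders matching the earlier sketch.

```python
import torch

def self_training_round(model, unlabeled_loader, threshold=0.9):
    """One round of confidence-based pseudo-labeling (sketch).
    Returns high-confidence (inputs, pseudo_label) pairs for the
    3-class main task."""
    model.eval()
    pseudo = []
    with torch.no_grad():
        for input_ids, attention_mask, feats in unlabeled_loader:
            _, main_logits = model(input_ids, attention_mask, feats)
            probs = torch.softmax(main_logits, dim=-1)
            conf, labels = probs.max(dim=-1)
            keep = conf >= threshold  # confidence filter
            for i in torch.nonzero(keep).flatten():
                pseudo.append((input_ids[i], attention_mask[i],
                               feats[i], labels[i].item()))
    return pseudo

# Iteration sketch: retrain on labeled + pseudo-labeled data, then
# repeat until no new high-confidence examples appear (assumed criterion).
```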
Demerits
Limited Data Availability
Although self-training mitigates labeled-data scarcity, the framework still requires labeled seed data, which remains scarce in low-resource languages
Complexity
Combining a multilingual BERT backbone, handcrafted features, two task heads, and iterative self-training adds implementation complexity, and repeated retraining rounds can be computationally expensive
Expert Commentary
The proposed HMS-BERT framework is a significant contribution to the field of cyberbullying detection, as it addresses the limitations of existing methods and demonstrates strong performance on multilingual and multi-label datasets. The use of a hybrid multi-task self-training approach and cross-lingual knowledge transfer strategy is particularly noteworthy, as it enables the framework to adapt to low-resource languages and improve its overall effectiveness. However, further research is needed to address the complexity and data availability limitations of the framework.
Recommendations
- Further research should be conducted to improve the efficiency and scalability of the HMS-BERT framework
- The framework should be tested on additional datasets and languages to demonstrate its generalizability and effectiveness in real-world applications