Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin
arXiv:2603.07286v1. Abstract: Global safety models exhibit strong performance across widely used benchmarks, yet their training data rarely captures the cultural and linguistic nuances of Taiwanese Mandarin. This limitation results in systematic blind spots when interpreting region-specific risks such as localized financial scams, culturally embedded hate speech, and misinformation patterns. To address these gaps, we introduce TS-Bench (Taiwan Safety Benchmark), a standardized evaluation suite for assessing safety performance in Taiwanese Mandarin. TS-Bench contains 400 human-curated prompts spanning critical domains including financial fraud, medical misinformation, social discrimination, and political manipulation. In parallel, we present Breeze Guard, an 8B safety model derived from Breeze 2, our previously released general-purpose Taiwanese Mandarin LLM with strong cultural grounding from its original pre-training corpus. Breeze Guard is obtained through supervised fine-tuning on a large-scale, human-verified synthesized dataset targeting Taiwan-specific harms. Our central hypothesis is that effective safety detection requires the cultural grounding already present in the base model; safety fine-tuning alone is insufficient to introduce new sociolinguistic knowledge from scratch. Empirically, Breeze Guard significantly outperforms the leading 8B general-purpose safety model, Granite Guardian 3.3, on TS-Bench (+0.17 overall F1), with particularly large gains in high-context categories such as scam (+0.66 F1) and financial malpractice (+0.43 F1). While the model shows slightly lower performance on English-centric benchmarks (ToxicChat, AegisSafetyTest), this tradeoff is expected for a regionally specialized safety model optimized for Taiwanese Mandarin. Together, Breeze Guard and TS-Bench establish a new foundation for trustworthy AI deployment in Taiwan.
Executive Summary
This article introduces TS-Bench, a standardized evaluation suite of 400 human-curated prompts for assessing safety performance in Taiwanese Mandarin, and Breeze Guard, an 8B safety model derived from Breeze 2, a general-purpose Taiwanese Mandarin language model. The authors hypothesize that effective safety detection depends on cultural grounding already present in the base model, and that safety fine-tuning alone cannot introduce new sociolinguistic knowledge from scratch. On TS-Bench, Breeze Guard outperforms Granite Guardian 3.3, the leading 8B general-purpose safety model, by 0.17 overall F1, with the largest gains in high-context categories such as scam (+0.66 F1) and financial malpractice (+0.43 F1). It scores slightly lower on the English-centric benchmarks ToxicChat and AegisSafetyTest, which the authors describe as an expected tradeoff for a regionally specialized model. The authors conclude that Breeze Guard and TS-Bench establish a new foundation for trustworthy AI deployment in Taiwan.
Key Points
- TS-Bench is a standardized evaluation suite of 400 human-curated prompts for assessing safety performance in Taiwanese Mandarin
- Breeze Guard is an 8B safety model derived from Breeze 2, a general-purpose Taiwanese Mandarin language model
- The authors hypothesize that effective safety detection requires cultural grounding in the base model, and that fine-tuning alone is insufficient to introduce new sociolinguistic knowledge
Merits
Strength in cultural context
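The study makes a concrete case that cultural context matters for AI safety: according to the authors, global safety models have systematic blind spots for region-specific risks such as localized financial scams, culturally embedded hate speech, and local misinformation patterns.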
Effective safety detection
Breeze Guard outperforms Granite Guardian 3.3, the leading 8B general-purpose safety model, on TS-Bench by 0.17 overall F1, with particularly large gains in high-context categories such as scam (+0.66 F1) and financial malpractice (+0.43 F1).
Standardized evaluation suite
TS-Bench comprises 400 human-curated prompts spanning financial fraud, medical misinformation, social discrimination, and political manipulation. It gives developers a common yardstick for comparing safety models on Taiwanese Mandarin content.
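To make the reported F1 gaps concrete, the following is a minimal sketch of how per-category and overall F1 differences between two safety classifiers could be computed. It is not the authors' evaluation code. The record layout, the keys `model_a` and `model_b`, and the toy rows are all hypothetical, and pooling every prompt for the overall score is an assumption, since the abstract does not say how the overall F1 is aggregated.

```python
from collections import defaultdict


def f1_score(labels, preds):
    """Binary F1 with 'unsafe' (True) as the positive class."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    denom = 2 * tp + fp + fn
    # Convention chosen here: F1 is 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def per_category_f1(records, model_key):
    """Return F1 per category plus a pooled 'overall' score for one model."""
    by_category = defaultdict(list)
    for row in records:
        by_category[row["category"]].append(row)
    scores = {
        category: f1_score([r["label"] for r in rows],
                           [r[model_key] for r in rows])
        for category, rows in by_category.items()
    }
    # Assumption: 'overall' pools all prompts rather than averaging categories.
    scores["overall"] = f1_score([r["label"] for r in records],
                                 [r[model_key] for r in records])
    return scores


def f1_deltas(records, model_a, model_b):
    """F1 of model_a minus F1 of model_b, per category and overall."""
    a = per_category_f1(records, model_a)
    b = per_category_f1(records, model_b)
    return {category: a[category] - b[category] for category in a}


# Hypothetical toy rows, NOT taken from TS-Bench. 'label' is True for unsafe
# prompts; 'model_a' and 'model_b' hold each classifier's unsafe/safe verdict.
records = [
    {"category": "scam", "label": True, "model_a": True, "model_b": False},
    {"category": "scam", "label": True, "model_a": True, "model_b": True},
    {"category": "scam", "label": False, "model_a": False, "model_b": False},
    {"category": "financial_malpractice", "label": True,
     "model_a": True, "model_b": True},
    {"category": "financial_malpractice", "label": False,
     "model_a": True, "model_b": False},
]

deltas = f1_deltas(records, "model_a", "model_b")
for category, delta in deltas.items():
    print(f"{category}: {delta:+.2f} F1")
```

A positive delta means the first model scores higher in that category. Reporting the gap per category, as the paper does, shows where a regionally grounded model helps most, which a single overall number would hide.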
Demerits
Limited generalizability
The benchmark and model target Taiwanese Mandarin specifically, so the findings may not transfer to other languages or cultures without a comparably grounded base model and a matching benchmark.
Tradeoff between region-specific and general-purpose safety models
Breeze Guard performs slightly worse on the English-centric benchmarks ToxicChat and AegisSafetyTest. The authors regard this as an expected tradeoff for a regionally specialized safety model optimized for Taiwanese Mandarin.
Expert Commentary
The study's findings are significant because they put numbers on the value of cultural context in AI safety: the largest gains over a general-purpose safety model appear in the high-context categories, scam and financial malpractice, where local knowledge matters most. This pattern is consistent with the authors' hypothesis that safety fine-tuning builds on cultural grounding already present in the base model rather than creating it. Two caveats apply. The results may not generalize to other languages or cultures, and the slightly lower scores on English-centric benchmarks mean that developers and policymakers must weigh regional specialization against general-purpose coverage. Overall, the study is a valuable contribution to AI safety for Taiwanese Mandarin.
Recommendations
- Recommendation 1: AI developers and researchers should build safety systems on culturally grounded base models that capture the linguistic and cultural nuances of the region or population they serve.
- Recommendation 2: Policymakers should account for the cultural context of AI deployment and develop policies that address cultural bias and promote trustworthy AI systems.