Skip to main content
Academic

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

arXiv:2602.16832v1 Announce Type: new Abstract: Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities understudied. We introduce \textbf{Indic Jailbreak Robustness (IJR)}, a judge-free benchmark for adversarial safety across 12 Indic and South Asian languages (2.1 Billion speakers), covering 45216 prompts in JSON (contract-bound) and Free (naturalistic) tracks. IJR reveals three patterns. (1) Contracts inflate refusals but do not stop jailbreaks: in JSON, LLaMA and Sarvam exceed 0.92 JSR, and in Free all models reach 1.0 with refusals collapsing. (2) English to Indic attacks transfer strongly, with format wrappers often outperforming instruction wrappers. (3) Orthography matters: romanized or mixed inputs reduce JSR under JSON, with correlations to romanization share and tokenization (approx 0.28 to 0.32) indicating systematic effects. Human audits confirm detector reliability, and lite-to-full compar

P
Priyaranjan Pattnayak, Sanchari Chowdhuri
· · 1 min read · 5 views

arXiv:2602.16832v1 Announce Type: new Abstract: Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities understudied. We introduce \textbf{Indic Jailbreak Robustness (IJR)}, a judge-free benchmark for adversarial safety across 12 Indic and South Asian languages (2.1 Billion speakers), covering 45216 prompts in JSON (contract-bound) and Free (naturalistic) tracks. IJR reveals three patterns. (1) Contracts inflate refusals but do not stop jailbreaks: in JSON, LLaMA and Sarvam exceed 0.92 JSR, and in Free all models reach 1.0 with refusals collapsing. (2) English to Indic attacks transfer strongly, with format wrappers often outperforming instruction wrappers. (3) Orthography matters: romanized or mixed inputs reduce JSR under JSON, with correlations to romanization share and tokenization (approx 0.28 to 0.32) indicating systematic effects. Human audits confirm detector reliability, and lite-to-full comparisons preserve conclusions. IJR offers a reproducible multilingual stress test revealing risks hidden by English-only, contract-focused evaluations, especially for South Asian users who frequently code-switch and romanize.

Executive Summary

The article introduces IndicJR, a judge-free benchmark for evaluating the adversarial safety of large language models (LLMs) in 12 Indic and South Asian languages, covering 24,216 prompts. The study reveals three patterns: contracts do not prevent jailbreaks, attacks transfer from English to Indic languages, and orthography affects the robustness of LLMs. The findings suggest that LLMs are vulnerable to multilingual attacks, particularly in South Asian languages. The study emphasizes the importance of evaluating LLMs in diverse languages and formats, highlighting the risks of relying solely on English-only evaluations.

Key Points

  • IndicJR introduces a judge-free benchmark for evaluating LLMs in 12 Indic and South Asian languages.
  • The study reveals that contracts do not prevent jailbreaks in LLMs.
  • Attacks transfer from English to Indic languages, highlighting the multilingual nature of LLM vulnerabilities.

Merits

Strength

The study provides a comprehensive evaluation of LLMs in 12 Indic and South Asian languages, highlighting the importance of multilingual evaluations. The use of a judge-free benchmark allows for objective and transparent evaluation of LLMs.

Multilingual Evaluation

The study emphasizes the need for multilingual evaluations of LLMs, particularly in South Asian languages, which are often understudied.

Orthography Matters

The study highlights the importance of orthography in evaluating LLM robustness, suggesting that romanized or mixed inputs can reduce JSR.

Demerits

Limitation

The study is limited to 12 Indic and South Asian languages, and may not be representative of all languages. Additionally, the evaluation of LLMs is based on a specific benchmark, which may not generalize to other benchmarks.

Scalability

The study evaluates a relatively small number of LLMs, which may not be scalable to larger models or more diverse languages.

Expert Commentary

The study provides a significant contribution to the field of NLP, highlighting the importance of multilingual evaluations and the vulnerabilities of LLMs in diverse languages. However, the study is limited in its scope and scalability, and future research should aim to evaluate LLMs in a more diverse set of languages and formats. Additionally, the study highlights the need for policymakers to recognize the importance of multilingual NLP evaluations and invest in research and development to address the vulnerabilities of LLMs in diverse languages.

Recommendations

  • Recommendation 1: LLM developers should prioritize multilingual evaluations and invest in research and development to address the vulnerabilities of LLMs in diverse languages.
  • Recommendation 2: Policymakers should recognize the importance of multilingual NLP evaluations and invest in research and development to address the vulnerabilities of LLMs in diverse languages.

Sources