
LLM-Powered Automatic Translation and Urgency in Crisis Scenarios


Belu Ticona, Antonis Anastasopoulos

arXiv:2602.13452v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly proposed for crisis preparedness and response, particularly for multilingual communication. However, their suitability for high-stakes crisis contexts remains insufficiently evaluated. This work examines the performance of state-of-the-art LLMs and machine translation systems in crisis-domain translation, with a focus on preserving urgency, which is a critical property for effective crisis communication and triaging. Using multilingual crisis data and a newly introduced urgency-annotated dataset covering over 32 languages, we show that both dedicated translation models and LLMs exhibit substantial performance degradation and instability. Crucially, even linguistically adequate translations can distort perceived urgency, and LLM-based urgency classifications vary widely depending on the language of the prompt and input. These findings highlight significant risks in deploying general-purpose language technologies for crisis communication and underscore the need for crisis-aware evaluation frameworks.

Executive Summary

The article investigates the effectiveness of large language models (LLMs) and machine translation systems in crisis scenarios, focusing on the preservation of urgency in multilingual communication. Using a newly introduced urgency-annotated dataset covering over 32 languages, the study reveals substantial performance degradation and instability in both dedicated translation models and LLMs. The findings underscore the risks of deploying general-purpose language technologies in high-stakes crisis contexts and call for the development of crisis-aware evaluation frameworks.

Key Points

  • Performance degradation and instability in LLMs and machine translation systems in crisis scenarios.
  • Linguistically adequate translations can distort perceived urgency.
  • Variability in LLM-based urgency classifications depending on the language of the prompt and input.
  • Need for crisis-aware evaluation frameworks for language technologies.
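The second key point, that a linguistically adequate translation can still distort perceived urgency, suggests a simple automated check: classify urgency on the source and on its translation, and flag any label change. The sketch below is illustrative only; the keyword-based `classify_urgency` stub and the example sentences are hypothetical stand-ins for the paper's LLM-based classifier and dataset, not its actual method.

```python
# Minimal sketch of an urgency-preservation check for translated crisis
# messages. The keyword classifier is a toy stand-in for an LLM-based
# urgency model; in a real pipeline this would be a model call.

URGENT_MARKERS = {"trapped", "injured", "immediately", "help"}

def classify_urgency(text: str) -> str:
    """Toy stand-in for an urgency classifier (hypothetical)."""
    tokens = {t.strip(".,;!?").lower() for t in text.split()}
    return "urgent" if tokens & URGENT_MARKERS else "non-urgent"

def urgency_preserved(source: str, translation: str) -> bool:
    """True when source and translation receive the same urgency label."""
    return classify_urgency(source) == classify_urgency(translation)

pairs = [
    # Urgency preserved: both sides carry explicit urgency cues.
    ("People are trapped under the rubble, send help immediately!",
     "Several people are trapped; we need rescue teams immediately."),
    # Urgency distorted: meaning roughly adequate, urgency cues softened.
    ("We are injured and need help now!",
     "We have some medical concerns."),
]

drift = [not urgency_preserved(s, t) for s, t in pairs]
print(drift)  # the second pair loses its urgency cues
```

Such a check captures exactly the failure mode the paper highlights: the second translation is not a mistranslation in the usual sense, yet it would be triaged differently.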

Merits

Comprehensive Dataset

The study utilizes a newly introduced urgency-annotated dataset covering over 32 languages, providing a robust foundation for evaluating the performance of LLMs and machine translation systems in crisis scenarios.

Critical Insight

The article highlights the critical importance of preserving urgency in crisis communication, which is often overlooked in general-purpose language technologies.

Practical Implications

The findings have direct implications for the deployment of language technologies in crisis preparedness and response, emphasizing the need for specialized evaluation frameworks.

Demerits

Limited Scope

The study focuses primarily on the preservation of urgency and does not extensively explore other critical aspects of crisis communication, such as accuracy and cultural sensitivity.

Generalizability

The findings may not be fully generalizable to all crisis scenarios, as the study is based on a specific dataset and set of languages.

Technical Complexity

The article assumes a certain level of technical expertise, which may limit its accessibility to a broader audience, including policymakers and practitioners.

Expert Commentary

The article provides a timely and critical examination of the suitability of LLMs and machine translation systems for crisis scenarios. Its focus on preserving urgency is particularly noteworthy, as it addresses an aspect of crisis communication that general-purpose evaluations often overlook. The comprehensive dataset and rigorous analysis contribute significantly to the discourse on deploying language technologies in high-stakes contexts. That said, the study's limitations, notably its narrow scope and uncertain generalizability, should be acknowledged. Future research should extend these findings to other aspects of crisis communication and evaluate language technologies across a broader range of crisis scenarios and languages. The practical and policy implications underscore the need for specialized evaluation frameworks and regulatory guidelines to ensure the effective and reliable use of AI technologies in crisis communication.

Recommendations

  • Develop specialized evaluation frameworks for language technologies in crisis scenarios, focusing on the preservation of urgency and other critical aspects of crisis communication.
  • Conduct further research to explore the performance of language technologies across a broader range of crisis scenarios and languages, ensuring the generalizability of the findings.
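One concrete ingredient such an evaluation framework might include is a measure of how stable urgency classifications are across prompt languages, since the paper reports that labels vary with the language of the prompt and input. The sketch below is a hedged illustration: the agreement metric and the example labels are assumptions for demonstration, not figures or methods from the paper.

```python
# Hedged sketch: quantifying cross-prompt-language agreement of an
# urgency classifier, one way a crisis-aware evaluation framework could
# surface prompt-language sensitivity. All labels are illustrative.

from collections import Counter

def prompt_language_agreement(labels_by_language: dict[str, str]) -> float:
    """Fraction of prompt languages that agree with the majority label."""
    counts = Counter(labels_by_language.values())
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels_by_language)

# Hypothetical labels for one crisis message, classified with prompts
# written in four languages; disagreement signals instability.
labels = {"en": "urgent", "es": "urgent", "fr": "non-urgent", "sw": "urgent"}
print(prompt_language_agreement(labels))  # 0.75
```

Averaging this agreement score over a dataset would give a single instability figure per model, making the cross-language variability the study describes directly comparable across systems.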
