Assessing LLM Response Quality in the Context of Technology-Facilitated Abuse
arXiv:2602.17672v1 Announce Type: cross Abstract: Technology-facilitated abuse (TFA) is a pervasive form of intimate partner violence (IPV) that leverages digital tools to control, surveil, or harm survivors. While tech clinics are one of the reliable sources of support for TFA survivors, they face limitations due to staffing constraints and logistical barriers. As a result, many survivors turn to online resources for assistance. With the growing accessibility and popularity of large language models (LLMs), and increasing interest from IPV organizations, survivors may begin to consult LLM-based chatbots before seeking help from tech clinics. In this work, we present the first expert-led manual evaluation of four LLMs - two widely used general-purpose non-reasoning models and two domain-specific models designed for IPV contexts - focused on their effectiveness in responding to TFA-related questions. Using real-world questions collected from literature and online forums, we assess the quality of zero-shot single-turn LLM responses generated with a survivor safety-centered prompt on criteria tailored to the TFA domain. Additionally, we conducted a user study to evaluate the perceived actionability of these responses from the perspective of individuals who have experienced TFA. Our findings, grounded in both expert assessment and user feedback, provide insights into the current capabilities and limitations of LLMs in the TFA context and may inform the design, development, and fine-tuning of future models for this domain. We conclude with concrete recommendations to improve LLM performance for survivor support.
Executive Summary
This article evaluates how effectively large language models (LLMs) respond to questions about technology-facilitated abuse (TFA). The study assesses four LLMs, spanning general-purpose and IPV-specific models, on real-world questions using a survivor safety-centered prompt. The findings characterize the current capabilities and limitations of LLMs in the TFA context and motivate concrete recommendations for improving how these models support survivors.
Key Points
- Expert-led evaluation of four LLMs in responding to TFA-related questions
- Assessment of LLM responses using real-world questions and a survivor safety-centered prompt
- User study evaluating the perceived actionability of LLM responses from the perspective of TFA survivors
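The evaluation setup described above, zero-shot single-turn responses elicited with a survivor safety-centered prompt and scored on TFA-tailored criteria, can be sketched as follows. This is a minimal illustration only: the system-prompt wording, the `build_single_turn_request` helper, and the rubric fields are assumptions for demonstration, not the paper's actual prompt or rubric.

```python
from dataclasses import dataclass

# Hypothetical survivor safety-centered system prompt; the study's actual
# prompt text is not given in this summary, so this wording is illustrative.
SAFETY_SYSTEM_PROMPT = (
    "You are assisting a survivor of technology-facilitated abuse. "
    "Prioritize the user's immediate safety, avoid advice that could "
    "alert an abuser (e.g., abruptly removing monitoring software), and "
    "point to professional resources such as tech clinics and hotlines."
)

def build_single_turn_request(question: str) -> list[dict]:
    """Build a zero-shot, single-turn chat request: one system message
    carrying the safety-centered prompt plus one user message, with no
    few-shot examples and no prior conversation history."""
    return [
        {"role": "system", "content": SAFETY_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

@dataclass
class RubricScores:
    """Illustrative TFA-tailored scoring criteria; the paper's exact
    expert rubric may differ."""
    accuracy: int       # technical correctness of the advice
    safety: int         # does the response avoid escalating risk?
    actionability: int  # can the survivor act on it concretely?

msgs = build_single_turn_request(
    "How do I check if my phone is being tracked?"
)
```

A request built this way would be sent unchanged to each of the four models, and each response scored independently by experts against the rubric; the user study then probes the actionability dimension from survivors' own perspective.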
Merits
Comprehensive Evaluation
The study provides a thorough evaluation of LLMs in the TFA context, including both expert assessment and user feedback.
Demerits
Limited Generalizability
The study's findings may not be generalizable to all TFA contexts or survivor populations, as the evaluation is based on a specific set of LLMs and questions.
Expert Commentary
The study's findings underscore the potential of LLMs to provide support to TFA survivors, but also highlight the need for careful evaluation and refinement of these models to ensure they are effective and safe. The use of real-world questions and a survivor safety-centered prompt is a strength of the study, as it allows for a more nuanced understanding of the capabilities and limitations of LLMs in this context. However, further research is needed to fully realize the potential of LLMs in supporting TFA survivors and to address the ethical and practical challenges associated with their use.
Recommendations
- Develop specialized training data and fine-tuning for LLMs in the TFA context
- Conduct further research on the effectiveness and safety of LLMs in supporting TFA survivors