Assessing LLM Response Quality in the Context of Technology-Facilitated Abuse
arXiv:2602.17672v1 Announce Type: cross Abstract: Technology-facilitated abuse (TFA) is a pervasive form of intimate partner violence (IPV) that leverages digital tools to control, surveil, or harm survivors. While tech clinics are one of the reliable sources of support for TFA survivors, they face limitations due to staffing constraints and logistical barriers. As a result, many survivors turn to online resources for assistance. With the growing accessibility and popularity of large language models (LLMs), and increasing interest from IPV organizations, survivors may begin to consult LLM-based chatbots before seeking help from tech clinics. In this work, we present the first expert-led manual evaluation of four LLMs - two widely used general-purpose non-reasoning models and two domain-specific models designed for IPV contexts - focused on their effectiveness in responding to TFA-related questions. Using real-world questions collected from literature and online forums, we assess the quality of zero-shot single-turn LLM responses generated with a survivor safety-centered prompt on criteria tailored to the TFA domain. Additionally, we conducted a user study to evaluate the perceived actionability of these responses from the perspective of individuals who have experienced TFA. Our findings, grounded in both expert assessment and user feedback, provide insights into the current capabilities and limitations of LLMs in the TFA context and may inform the design, development, and fine-tuning of future models for this domain. We conclude with concrete recommendations to improve LLM performance for survivor support.
Executive Summary
This article evaluates how effectively large language models (LLMs) respond to questions about technology-facilitated abuse (TFA). The study assesses four LLMs, spanning general-purpose and IPV-specific models, on real-world questions using a survivor safety-centered prompt. The findings characterize the current capabilities and limitations of LLMs in the TFA context and motivate concrete recommendations for improving how these models support survivors.
Key Points
- Expert-led evaluation of four LLMs in responding to TFA-related questions
- Assessment of LLM responses using real-world questions and a survivor safety-centered prompt
- User study evaluating the perceived actionability of LLM responses from the perspective of TFA survivors
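The evaluation setup described above, zero-shot single-turn responses elicited with a survivor safety-centered prompt and scored on TFA-tailored criteria, can be sketched as follows. This is a minimal illustration only: the system-prompt wording, the `build_single_turn_request` helper, and the rubric fields are assumptions for demonstration, not the paper's actual prompt or rubric.

```python
from dataclasses import dataclass

# Hypothetical survivor safety-centered system prompt; the study's actual
# prompt text is not given in this summary, so this wording is illustrative.
SAFETY_SYSTEM_PROMPT = (
    "You are assisting a survivor of technology-facilitated abuse. "
    "Prioritize the user's immediate safety, avoid advice that could "
    "alert an abuser (e.g., abruptly removing monitoring software), and "
    "point to professional resources such as tech clinics and hotlines."
)

def build_single_turn_request(question: str) -> list[dict]:
    """Build a zero-shot, single-turn chat request: one system message
    carrying the safety-centered prompt plus one user message, with no
    few-shot examples and no prior conversation history."""
    return [
        {"role": "system", "content": SAFETY_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

@dataclass
class RubricScores:
    """Illustrative TFA-tailored scoring criteria; the paper's exact
    expert rubric may differ."""
    accuracy: int       # technical correctness of the advice
    safety: int         # does the response avoid escalating risk?
    actionability: int  # can the survivor act on it concretely?

msgs = build_single_turn_request(
    "How do I check if my phone is being tracked?"
)
```

A request built this way would be sent unchanged to each of the four models, and each response scored independently by experts against the rubric; the user study then probes the actionability dimension from survivors' own perspective.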
Merits
Comprehensive Evaluation
The study provides a thorough evaluation of LLMs in the TFA context, including both expert assessment and user feedback.
Demerits
Limited Generalizability
The study's findings may not be generalizable to all TFA contexts or survivor populations, as the evaluation is based on a specific set of LLMs and questions.
Expert Commentary
The study's findings underscore the potential of LLMs to provide support to TFA survivors, but also highlight the need for careful evaluation and refinement of these models to ensure they are effective and safe. The use of real-world questions and a survivor safety-centered prompt is a strength of the study, as it allows for a more nuanced understanding of the capabilities and limitations of LLMs in this context. However, further research is needed to fully realize the potential of LLMs in supporting TFA survivors and to address the ethical and practical challenges associated with their use.
Recommendations
- Develop specialized training data and fine-tuning for LLMs in the TFA context
- Conduct further research on the effectiveness and safety of LLMs in supporting TFA survivors