Academic

TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health

arXiv:2603.03047v1 Announce Type: new Abstract: While Large Language Models (LLMs) demonstrate significant potential in providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the domains high-stakes and safety-sensitive nature. Existing evaluation paradigms for general-purpose LLMs fail to capture mental health-specific requirements, highlighting an urgent need to prioritize and enhance their trustworthiness. To address this, we propose TrustMH-Bench, a holistic framework designed to systematically quantify the trustworthiness of mental health LLMs. By establishing a deep mapping from domain-specific norms to quantitative evaluation metrics, TrustMH-Bench evaluates models across eight core pillars: Reliability, Crisis Identification and Escalation, Safety, Fairness, Privacy, Robustness, Anti-sycophancy, and Ethics. We conduct extensive experiments across six general-purpose LLMs and six specialized mental health models. E

Zixin Xiong, Ziteng Wang, Haotian Fan, Xinjie Zhang, Wenxuan Wang · March 5, 2026 · 1 min read · 3 views

#cs.CL #cs.AI

Executive Summary

The article introduces TrustMH-Bench, a comprehensive benchmark for evaluating the trustworthiness of large language models (LLMs) in mental health. It assesses models across eight pillars, including reliability, safety, and ethics, revealing significant deficiencies in their performance. The study highlights the need to improve the trustworthiness of LLMs in high-stakes mental health applications. Experimental results show that even powerful models underperform in various trustworthiness dimensions, emphasizing the importance of systematic improvement. The release of data and code enables further research and development in this critical area.

Key Points

▸ Introduction of TrustMH-Bench, a benchmark for evaluating LLM trustworthiness in mental health
▸ Assessment of models across eight core pillars, including reliability and ethics
▸ Revelation of significant deficiencies in model performance, highlighting the need for improvement

Merits

Comprehensive Evaluation Framework

TrustMH-Bench provides a holistic framework for evaluating LLM trustworthiness, covering critical aspects such as safety, fairness, and privacy.

Demerits

Limited Model Generalizability

The study's findings may not generalize to all LLMs or mental health applications, potentially limiting the benchmark's applicability.

Expert Commentary

The introduction of TrustMH-Bench represents a crucial step towards ensuring the trustworthiness of LLMs in mental health applications. The benchmark's comprehensive evaluation framework provides a foundation for assessing model performance across critical dimensions. However, the study's findings also highlight the significant challenges that must be addressed to develop trustworthy LLMs. As the use of LLMs in mental health continues to grow, it is essential to prioritize their trustworthiness and develop effective strategies for improving their performance in high-stakes applications.

Recommendations

✓ Develop and implement more robust testing and evaluation protocols for LLMs in mental health applications
✓ Establish clear guidelines and standards for ensuring LLM trustworthiness in high-stakes mental health applications

Sources

arXiv - cs.CL

TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health

AI Commentary

Executive Summary

Key Points

Merits

Comprehensive Evaluation Framework

Demerits

Limited Model Generalizability

Expert Commentary

Recommendations

Sources

Related Articles

ConstitutionGPT: An AI-Powered Multilingual Legal Assistance System for Indian Citizens

AI Copyright Infringement: Navigating the Legal Risks of AI-Generated Content

The Rhetoric of Machine Learning

Busemann energy-based attention for emotion analysis in Poincar\'e discs

JCG, PC

HSOLLC Co., Ltd.

Related Articles

ConstitutionGPT: An AI-Powered Multilingual Legal Assistance System for Indian Citizens

AI Copyright Infringement: Navigating the Legal Risks of AI-Generated Content

The Rhetoric of Machine Learning

Busemann energy-based attention for emotion analysis in Poincar\'e discs