TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health
arXiv:2603.03047v1 Announce Type: new Abstract: While Large Language Models (LLMs) demonstrate significant potential in providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the domains high-stakes and safety-sensitive nature. Existing evaluation paradigms for general-purpose LLMs fail to capture mental health-specific requirements, highlighting an urgent need to prioritize and enhance their trustworthiness. To address this, we propose TrustMH-Bench, a holistic framework designed to systematically quantify the trustworthiness of mental health LLMs. By establishing a deep mapping from domain-specific norms to quantitative evaluation metrics, TrustMH-Bench evaluates models across eight core pillars: Reliability, Crisis Identification and Escalation, Safety, Fairness, Privacy, Robustness, Anti-sycophancy, and Ethics. We conduct extensive experiments across six general-purpose LLMs and six specialized mental health models. E
arXiv:2603.03047v1 Announce Type: new Abstract: While Large Language Models (LLMs) demonstrate significant potential in providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the domains high-stakes and safety-sensitive nature. Existing evaluation paradigms for general-purpose LLMs fail to capture mental health-specific requirements, highlighting an urgent need to prioritize and enhance their trustworthiness. To address this, we propose TrustMH-Bench, a holistic framework designed to systematically quantify the trustworthiness of mental health LLMs. By establishing a deep mapping from domain-specific norms to quantitative evaluation metrics, TrustMH-Bench evaluates models across eight core pillars: Reliability, Crisis Identification and Escalation, Safety, Fairness, Privacy, Robustness, Anti-sycophancy, and Ethics. We conduct extensive experiments across six general-purpose LLMs and six specialized mental health models. Experimental results indicate that the evaluated models underperform across various trustworthiness dimensions in mental health scenarios, revealing significant deficiencies. Notably, even generally powerful models (e.g., GPT-5.1) fail to maintain consistently high performance across all dimensions. Consequently, systematically improving the trustworthiness of LLMs has become a critical task. Our data and code are released.
Executive Summary
The article introduces TrustMH-Bench, a comprehensive benchmark for evaluating the trustworthiness of large language models (LLMs) in mental health. It assesses models across eight pillars, including reliability, safety, and ethics, revealing significant deficiencies in their performance. The study highlights the need to improve the trustworthiness of LLMs in high-stakes mental health applications. Experimental results show that even powerful models underperform in various trustworthiness dimensions, emphasizing the importance of systematic improvement. The release of data and code enables further research and development in this critical area.
Key Points
- ▸ Introduction of TrustMH-Bench, a benchmark for evaluating LLM trustworthiness in mental health
- ▸ Assessment of models across eight core pillars, including reliability and ethics
- ▸ Revelation of significant deficiencies in model performance, highlighting the need for improvement
Merits
Comprehensive Evaluation Framework
TrustMH-Bench provides a holistic framework for evaluating LLM trustworthiness, covering critical aspects such as safety, fairness, and privacy.
Demerits
Limited Model Generalizability
The study's findings may not generalize to all LLMs or mental health applications, potentially limiting the benchmark's applicability.
Expert Commentary
The introduction of TrustMH-Bench represents a crucial step towards ensuring the trustworthiness of LLMs in mental health applications. The benchmark's comprehensive evaluation framework provides a foundation for assessing model performance across critical dimensions. However, the study's findings also highlight the significant challenges that must be addressed to develop trustworthy LLMs. As the use of LLMs in mental health continues to grow, it is essential to prioritize their trustworthiness and develop effective strategies for improving their performance in high-stakes applications.
Recommendations
- ✓ Develop and implement more robust testing and evaluation protocols for LLMs in mental health applications
- ✓ Establish clear guidelines and standards for ensuring LLM trustworthiness in high-stakes mental health applications