Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study

Kensuke Okada, Yui Furukawa, Kyosuke Bunji

arXiv:2602.17262v1. Abstract: Human self-report questionnaires are increasingly used in NLP to benchmark and audit large language models (LLMs), from persona consistency to safety and bias assessments. Yet these instruments presume honest responding; in evaluative contexts, LLMs can instead gravitate toward socially preferred answers, a form of socially desirable responding (SDR), biasing questionnaire-derived scores and downstream conclusions. We propose a psychometric framework to quantify and mitigate SDR in questionnaire-based evaluation of LLMs. To quantify SDR, the same inventory is administered under HONEST versus FAKE-GOOD instructions, and SDR is computed as a direction-corrected standardized effect size from item response theory (IRT)-estimated latent scores. This enables comparisons across constructs and response formats, as well as against human instructed-faking benchmarks. For mitigation, we construct a graded forced-choice (GFC) Big Five inventory by selecting 30 cross-domain pairs from an item pool via constrained optimization to match desirability. Across nine instruction-tuned LLMs evaluated on synthetic personas with known target profiles, Likert-style questionnaires show consistently large SDR, whereas desirability-matched GFC substantially attenuates SDR while largely preserving the recovery of the intended persona profiles. These results highlight a model-dependent SDR-recovery trade-off and motivate SDR-aware reporting practices for questionnaire-based benchmarking and auditing of LLMs.

Executive Summary

This paper proposes a psychometric framework to quantify and mitigate socially desirable responding (SDR) in questionnaire-based evaluation of large language models (LLMs). By administering the same inventory under HONEST and FAKE-GOOD instructions, the authors estimate SDR as a direction-corrected standardized effect size computed from item response theory (IRT)-estimated latent scores. They demonstrate that a desirability-matched graded forced-choice (GFC) inventory substantially reduces SDR while largely preserving the recovery of intended persona profiles. The results highlight a model-dependent SDR-recovery trade-off and underscore the need for SDR-aware reporting practices in LLM benchmarking and auditing.
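To make the quantification step concrete: once latent trait scores have been estimated under each instruction condition, the SDR measure is essentially a standardized, direction-corrected mean shift between the two conditions. The minimal sketch below assumes such scores are already available; the function name, the Cohen's-d-style pooled standard deviation, and the toy numbers are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def sdr_effect_size(theta_honest, theta_fake_good, desirable_direction=1):
    """Direction-corrected standardized effect size for one trait.

    theta_honest / theta_fake_good: IRT-estimated latent trait scores for
    the same personas under HONEST vs. FAKE-GOOD instructions.
    desirable_direction: +1 if higher trait levels are socially desirable
    (e.g., Conscientiousness), -1 if lower levels are (e.g., Neuroticism).
    """
    honest = np.asarray(theta_honest, dtype=float)
    faked = np.asarray(theta_fake_good, dtype=float)
    # Pooled standard deviation, Cohen's d style (an illustrative choice).
    pooled_sd = np.sqrt((honest.var(ddof=1) + faked.var(ddof=1)) / 2.0)
    shift = faked.mean() - honest.mean()
    # Direction correction: positive values always mean "shift toward the
    # socially desirable pole", regardless of how the trait is keyed.
    return desirable_direction * shift / pooled_sd

# Example with made-up scores: a large positive value indicates faking-good.
print(sdr_effect_size([-0.2, 0.1, 0.0], [1.1, 0.9, 1.3]))
```

Standardizing on the latent scale is what lets the resulting effect sizes be compared across constructs, response formats, and human instructed-faking benchmarks.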

Key Points

  • The authors propose a psychometric framework to quantify and mitigate SDR in questionnaire-based evaluation of LLMs
  • The framework administers the same inventory under honest and fake-good instructions and estimates SDR as a direction-corrected standardized effect size from IRT-estimated latent scores
  • A desirability-matched graded forced-choice (GFC) inventory substantially reduces SDR while largely preserving the recovery of intended persona profiles (a pairing sketch follows this list)
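As a rough illustration of the GFC construction flagged in the last bullet, the sketch below pairs items from different Big Five domains whose social-desirability ratings are close. The greedy matching, the item dictionary fields, and the max_gap tolerance are stand-in assumptions; the paper itself selects its 30 pairs via constrained optimization.

```python
from itertools import combinations

def match_pairs_by_desirability(items, n_pairs=30, max_gap=0.25):
    """Greedy stand-in for the paper's constrained optimization.

    items: dicts such as {"id": "C07", "domain": "Conscientiousness",
           "desirability": 3.8}, with desirability on a common rating scale.
    Returns up to n_pairs cross-domain (id_a, id_b) pairs whose desirability
    ratings differ by at most max_gap, with each item used at most once.
    """
    # Enumerate cross-domain candidate pairs, closest desirability first.
    candidates = sorted(
        (abs(a["desirability"] - b["desirability"]), a["id"], b["id"])
        for a, b in combinations(items, 2)
        if a["domain"] != b["domain"]
    )
    used, pairs = set(), []
    for gap, id_a, id_b in candidates:
        if gap > max_gap or len(pairs) >= n_pairs:
            break
        if id_a in used or id_b in used:
            continue  # keep each item in at most one forced-choice pair
        used.update((id_a, id_b))
        pairs.append((id_a, id_b))
    return pairs
```

A full constrained-optimization formulation could additionally balance domain coverage and content breadth across the 30 pairs, which a greedy heuristic like this does not guarantee.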

Merits

Strength in Methodology

The authors take a rigorous approach, using item response theory (IRT) to estimate latent trait scores; expressing SDR as an effect size on this latent scale enables comparisons across constructs, response formats, and human instructed-faking benchmarks.
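The abstract does not state which IRT model is fitted; for Likert-type items a graded response model is a standard choice, so the sketch below is an assumption used only to show how a latent trait value maps to response-category probabilities.

```python
import numpy as np

def grm_category_probs(theta, discrimination, thresholds):
    """Response-category probabilities under a graded response model.

    Illustrative parameterization: P(X >= k | theta) follows a logistic
    curve with slope `discrimination` and ordered `thresholds`; the
    probability of category k is the difference of adjacent cumulatives.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    cumulative = 1.0 / (1.0 + np.exp(-discrimination * (theta - thresholds)))
    cumulative = np.concatenate(([1.0], cumulative, [0.0]))
    return cumulative[:-1] - cumulative[1:]  # one probability per category

# Example: a 5-point Likert item with four ordered thresholds.
print(grm_category_probs(theta=0.5, discrimination=1.2,
                         thresholds=[-1.5, -0.5, 0.5, 1.5]))
```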

Substantial Reduction in SDR

The desirability-matched GFC inventory substantially attenuates SDR while largely preserving recovery of the intended persona profiles, a crucial finding for accurate questionnaire-based evaluation of LLMs.

Demerits

Limited Generalizability

The evaluation relies on synthetic personas with known target profiles and a fixed set of nine instruction-tuned models, so the results may not generalize to real-world auditing settings where target traits are unknown and evaluators vary in psychometric expertise.

Computational Requirements

The proposed framework requires psychometric expertise (IRT calibration, desirability-matched item pairing) and nontrivial computational resources, which may be a barrier to adoption.

Expert Commentary

The paper makes a significant contribution to the evaluation of LLMs by showing that socially desirable responding can substantially distort questionnaire-based assessments and by offering a concrete mitigation. The proposed framework is a step toward more accurate and reliable evaluation methods, though its limitations, particularly the reliance on synthetic personas with known target profiles, mean further research is needed before the results can be generalized to real-world settings. The implications are far-reaching: policymakers and regulators who rely on questionnaire-derived benchmarks should take note of the need for SDR-aware reporting in LLM benchmarking and auditing. Overall, the paper provides a solid foundation for future work on psychometrically informed LLM evaluation.

Recommendations

  • Recommendation 1: Researchers should prioritize the development of SDR-aware reporting practices for LLM benchmarking and auditing
  • Recommendation 2: Policymakers and regulators should consider the potential biases in LLMs and develop guidelines for their evaluation and deployment
