InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation

arXiv:2602.20294v1 (Announce Type: new)

Abstract: Simulating real personalities with large language models requires grounding generation in authentic personal data. Existing evaluation approaches rely on demographic surveys, personality questionnaires, or short AI-led interviews as proxies, but lack direct assessment against what individuals actually said. We address this gap with an interview-grounded evaluation framework for personality simulation at a large scale. We extract over 671,000 question-answer pairs from 23,000 verified interview transcripts across 1,000 public personalities, each with an average of 11.5 hours of interview content. We propose a multi-dimensional evaluation framework with four complementary metrics measuring content similarity, factual consistency, personality alignment, and factual knowledge retention. Through systematic comparison, we demonstrate that methods grounded in real interview data substantially outperform those relying solely on biographical profiles or the model's parametric knowledge. We further reveal a trade-off in how interview data is best utilized: retrieval-augmented methods excel at capturing personality style and response quality, while chronological-based methods better preserve factual consistency and knowledge retention. Our evaluation framework enables principled method selection based on application requirements, and our empirical findings provide actionable insights for advancing personality simulation research.

Executive Summary

This study introduces InterviewSim, a scalable framework for evaluating personality simulation models against authentic personal data drawn from a large corpus of interview transcripts. The framework extracts over 671,000 question-answer pairs from 23,000 verified interviews covering 1,000 public personalities. A multi-dimensional evaluation framework is proposed, comprising four metrics: content similarity, factual consistency, personality alignment, and factual knowledge retention. The study demonstrates that methods grounded in real interview data outperform those relying solely on biographical profiles or parametric knowledge, and it reveals a trade-off: retrieval-augmented methods better capture personality style and response quality, while chronological methods better preserve factual consistency and knowledge retention. The framework enables principled method selection and provides actionable insights for advancing personality simulation research.
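
The abstract does not give the scoring formulas, so the sketch below is a hypothetical Python illustration rather than the authors' implementation: it uses a bag-of-words cosine similarity as a stand-in for the content-similarity metric and stubs out the other three dimensions, which in practice would likely rely on fact bases or model-based judges. The names Metrics and score_response are assumptions for illustration only.

    from collections import Counter
    from dataclasses import dataclass
    import math

    @dataclass
    class Metrics:
        # The four dimensions named in the paper; values in [0, 1].
        content_similarity: float
        factual_consistency: float
        personality_alignment: float
        knowledge_retention: float

    def cosine(a: Counter, b: Counter) -> float:
        # Cosine similarity between two bag-of-words count vectors.
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values()))
        norm *= math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def score_response(simulated: str, reference: str) -> Metrics:
        # Content similarity: toy lexical overlap with what the person
        # actually said. The other three metrics are placeholders here,
        # since they need judge models or fact bases not sketched.
        sim = cosine(Counter(simulated.lower().split()),
                     Counter(reference.lower().split()))
        return Metrics(sim, 0.0, 0.0, 0.0)

    print(score_response("I grew up in Chicago and studied jazz piano.",
                         "I grew up in Chicago, where I studied jazz piano."))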

Key Points

  • Introduction of InterviewSim, a scalable framework for personality simulation evaluation
  • Use of authentic personal data from a large corpus of interview transcripts for evaluation
  • Proposed multi-dimensional evaluation framework with four metrics
  • Evidence that methods grounded in real interview data outperform profile-only and parametric baselines
  • Identification of a trade-off between retrieval-augmented and chronological methods (see the sketch after this list)
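
The trade-off comes down to how interview evidence is placed in the simulator's context window. Below is a minimal sketch under assumed inputs; retrieval_context, chronological_context, and the token-overlap relevance score are illustrative stand-ins, not the paper's method.

    def retrieval_context(question, qa_pairs, k=3):
        # Rank past QA pairs by lexical overlap with the new question
        # (a toy retriever); surfaces style-relevant evidence, possibly
        # out of chronological order.
        q_tokens = set(question.lower().split())
        scored = sorted(qa_pairs,
                        key=lambda qa: len(q_tokens & set(qa["q"].lower().split())),
                        reverse=True)
        return scored[:k]

    def chronological_context(qa_pairs, k=3):
        # Take the k most recent QA pairs in timeline order (assumes the
        # list is sorted by date); preserves the factual timeline.
        return qa_pairs[-k:]

    qa_pairs = [
        {"q": "Where did you grow up?", "a": "On the south side of Chicago.", "date": "2010"},
        {"q": "What are you working on now?", "a": "A new album.", "date": "2023"},
    ]
    print(retrieval_context("Where did you record your first album?", qa_pairs))
    print(chronological_context(qa_pairs))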

Merits

Strengths in Methodology

The evaluation is grounded directly in what individuals actually said: 23,000 verified interview transcripts, averaging 11.5 hours of content per personality, give the framework both scale and authentic reference answers to score simulations against.
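
The extraction pipeline itself is not detailed in the abstract; below is a minimal sketch of the general idea, assuming speaker-labeled transcript turns. The INTERVIEWER/SUBJECT labels and the name extract_qa_pairs are an assumed format, not the paper's schema.

    def extract_qa_pairs(turns):
        # turns: list of (speaker, text) tuples in transcript order.
        # Pair each interviewer question with the subject's next turn.
        pairs = []
        for (speaker, text), (next_speaker, next_text) in zip(turns, turns[1:]):
            if (speaker == "INTERVIEWER" and next_speaker == "SUBJECT"
                    and text.strip().endswith("?")):
                pairs.append({"question": text, "answer": next_text})
        return pairs

    turns = [
        ("INTERVIEWER", "When did you first pick up a guitar?"),
        ("SUBJECT", "I was about twelve, at my uncle's house."),
    ]
    print(extract_qa_pairs(turns))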

Insights into Personality Simulation

The study provides actionable insight into the trade-off between grounding methods rather than evaluation approaches: retrieval augmentation favors personality style and response quality, while chronological ordering favors factual consistency and knowledge retention, which informs method selection for a given application.

Demerits

Limited Generalizability

The corpus covers public personalities with extensive recorded interviews, so the findings may not generalize to private individuals, other domains, or populations for whom comparable interview data does not exist.

Dependence on High-Quality Data

The effectiveness of the InterviewSim framework relies on the availability of high-quality, verified interview transcripts, which may be difficult to obtain in certain contexts.

Expert Commentary

The study makes a significant contribution to personality simulation research by introducing a scalable, interview-grounded evaluation framework. The findings have implications for building more effective and engaging human-computer interfaces. However, the limitations above point to further work on generalizability and on reducing reliance on high-quality verified transcripts. The policy implications of the findings also warrant careful consideration, particularly in contexts where user trust and consent are critical.

Recommendations

  • Further research is needed to investigate the generalizability of the InterviewSim framework to other domains and populations.
  • Developers and policymakers should consider the potential consequences of deploying personality simulation models in real-world applications and prioritize transparency, consent, and user trust.

Sources

  • arXiv:2602.20294v1. InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation. https://arxiv.org/abs/2602.20294