
Memory-Driven Role-Playing: Evaluation and Enhancement of Persona Knowledge Utilization in LLMs

Kai Wang, Haoyang You, Yang Zhang, Zhongjie Wang

arXiv:2603.19313v1 (new submission). Abstract: A core challenge for faithful LLM role-playing is sustaining consistent characterization throughout long, open-ended dialogues, as models frequently fail to recall and accurately apply their designated persona knowledge without explicit cues. To tackle this, we propose the Memory-Driven Role-Playing paradigm. Inspired by Stanislavski's "emotional memory" acting theory, this paradigm frames persona knowledge as the LLM's internal memory store, requiring retrieval and application based solely on dialogue context, thereby providing a rigorous test of depth and autonomous use of knowledge. Centered on this paradigm, we contribute: (1) MREval, a fine-grained evaluation framework assessing four memory-driven abilities: Anchoring, Recalling, Bounding, and Enacting; (2) MRPrompt, a prompting architecture that guides structured memory retrieval and response generation; and (3) MRBench, a bilingual (Chinese/English) benchmark for fine-grained diagnosis. The novel paradigm provides a comprehensive diagnostic for four-staged role-playing abilities across 12 LLMs. Crucially, experiments show that MRPrompt allows small models (e.g., Qwen3-8B) to match the performance of much larger closed-source LLMs (e.g., Qwen3-Max and GLM-4.7), and confirms that upstream memory gains directly enhance downstream response quality, validating the staged theoretical foundation.
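The abstract describes MRPrompt only at a high level: persona knowledge is treated as an internal memory store that must be retrieved from dialogue context before a response is generated. The paper's actual implementation is not reproduced here; the sketch below is a minimal, hypothetical illustration of that retrieve-then-generate staging, with a naive word-overlap retriever standing in for whatever retrieval MRPrompt actually performs. All class and function names are invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class PersonaMemory:
    """Toy persona memory store: facts the character 'remembers'."""
    facts: list[str] = field(default_factory=list)

    def retrieve(self, context: str, top_k: int = 2) -> list[str]:
        """Rank facts by naive word overlap with the dialogue context.
        A real system would use semantic retrieval instead."""
        ctx_words = set(context.lower().split())
        scored = sorted(
            self.facts,
            key=lambda fact: len(ctx_words & set(fact.lower().split())),
            reverse=True,
        )
        return scored[:top_k]

def build_prompt(memory: PersonaMemory, dialogue: str) -> str:
    """Stage the prompt: retrieved memories first, then the dialogue turn."""
    recalled = memory.retrieve(dialogue)
    memory_block = "\n".join(f"- {fact}" for fact in recalled)
    return (
        "You are role-playing a character. Relevant memories:\n"
        f"{memory_block}\n\n"
        f"Dialogue so far:\n{dialogue}\n\n"
        "Respond in character."
    )

memory = PersonaMemory(facts=[
    "Grew up in a fishing village by the sea",
    "Afraid of thunderstorms since childhood",
    "Trained as a ship's navigator",
])
prompt = build_prompt(memory, "User: The sea looks rough tonight.")
print(prompt)
```

The point is only the staging the paper's pipeline implies: memories are selected from dialogue context first, then injected ahead of the current turn, so the model can apply persona knowledge without explicit cues from the user.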

Executive Summary

This paper proposes the Memory-Driven Role-Playing paradigm to address the challenge of sustaining consistent characterization in long, open-ended LLM dialogues. The paradigm frames persona knowledge as the LLM's internal memory store, which the model must retrieve and apply based solely on dialogue context. The authors contribute MREval, a fine-grained evaluation framework covering four memory-driven abilities; MRPrompt, a prompting architecture for structured memory retrieval and response generation; and MRBench, a bilingual (Chinese/English) benchmark. Experiments across 12 LLMs show that MRPrompt enables small models to match the performance of much larger closed-source LLMs, and that upstream memory gains directly improve downstream response quality, validating the staged theoretical foundation.

Key Points

  • Proposes the Memory-Driven Role-Playing paradigm to address challenges in LLM role-playing
  • Contributes MREval, MRPrompt, and MRBench to facilitate evaluation and enhancement of persona knowledge utilization
  • Demonstrates the effectiveness of MRPrompt in enabling small models to match the performance of larger LLMs
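MREval's four abilities (Anchoring, Recalling, Bounding, Enacting) suggest a per-dialogue scorecard. The sketch below is a hypothetical illustration only: the field names follow the paper's terms, but the scores and the unweighted aggregation are assumptions, not MREval's actual scoring procedure.

```python
from dataclasses import dataclass

@dataclass
class MemoryDrivenScores:
    """Hypothetical per-dialogue scores for the four MREval abilities.
    All scoring logic here is illustrative, not from the paper."""
    anchoring: float  # grounds responses in the assigned persona
    recalling: float  # surfaces relevant persona facts without explicit cues
    bounding: float   # stays within the persona's knowledge scope
    enacting: float   # expresses recalled knowledge in character

    def overall(self) -> float:
        """Unweighted mean; the real framework may weight stages differently."""
        return (self.anchoring + self.recalling + self.bounding + self.enacting) / 4

scores = MemoryDrivenScores(anchoring=0.9, recalling=0.7, bounding=0.8, enacting=0.6)
print(scores.overall())
```

Recording the four stages separately, rather than a single role-play score, is what makes the diagnostic "fine-grained": it localizes whether a failure occurs at retrieval (Recalling) or at expression (Enacting).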

Merits

Strength

The Memory-Driven Role-Playing paradigm provides a comprehensive diagnostic for four-staged role-playing abilities, offering valuable insights into the strengths and weaknesses of LLMs.

Practical Application

The MRPrompt architecture and MRBench benchmark enable researchers to evaluate and enhance the persona knowledge utilization of LLMs, facilitating their practical application in various domains.

Demerits

Limitation

The evaluation of the Memory-Driven Role-Playing paradigm is limited to a single dataset, and its generalizability to other domains and datasets is unclear.

Scalability

The MRPrompt architecture and MRBench benchmark may require significant computational resources and expertise to implement and maintain, potentially limiting their scalability.

Expert Commentary

This research makes a significant contribution to natural language processing by proposing a novel paradigm for evaluating and enhancing how LLMs use persona knowledge. The Memory-Driven Role-Playing paradigm and its associated tools offer a comprehensive diagnostic for four-staged role-playing abilities, helping researchers pinpoint where models fail to anchor, recall, bound, or enact their assigned personas. While the evaluation is limited to a single benchmark, the approach has promising applications in conversational AI and natural language generation. A staged diagnostic of this kind also supports more transparent characterization of model behavior, a step toward accountability and trust in deployed dialogue systems.

Recommendations

  • Future research should focus on evaluating the Memory-Driven Role-Playing paradigm across multiple datasets and domains to ensure its generalizability and scalability.
  • The development of more robust and user-friendly versions of the MRPrompt architecture and MRBench benchmark can facilitate their wider adoption in industry and academia.

Sources

Original: arXiv - cs.CL