
HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents

Yilin Jiang, Fei Tan, Xuanyu Yin, Jing Leng, Aimin Zhou

arXiv:2603.04855v1

Abstract: Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled personas. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints via a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting HACHIMI-1M corpus comprises 1 million personas for Grades 1-12. Intrinsic evaluation shows near-perfect schema validity, accurate quotas, and substantial diversity, while external evaluation instantiates personas as student agents answering CEPS and PISA 2022 surveys; across 16 cohorts, math and curiosity/growth constructs align strongly between humans and agents, whereas classroom-climate and well-being constructs are only moderately aligned, revealing a fidelity gradient. All personas are generated with Qwen2.5-72B, and HACHIMI provides a standardized synthetic student population for group-level benchmarking and social-science simulations. Resources available at https://github.com/ZeroLoss-Lab/HACHIMI

Executive Summary

This article introduces HACHIMI, a multi-agent Propose-Validate-Revise framework for generating student personas (SPs) that align with educational theory and match target population distributions. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints through a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting corpus, HACHIMI-1M, contains 1 million personas for Grades 1-12 and shows near-perfect schema validity, accurate quotas, and substantial diversity in intrinsic evaluation. In external evaluation, personas instantiated as student agents answer the CEPS and PISA 2022 surveys; across 16 cohorts, human and agent responses align strongly on math and curiosity/growth constructs but only moderately on classroom-climate and well-being constructs. The authors position the corpus as a standardized synthetic student population for group-level benchmarking and social-science simulations.
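To make the Propose-Validate-Revise idea concrete, here is a minimal sketch of such a loop. It is not HACHIMI's implementation: the `Persona` fields, the age-for-grade rule, and the `propose` and `revise` functions are hypothetical stand-ins (in the real system these roles would be played by LLM agents and a neuro-symbolic validator whose rules the abstract does not specify).

```python
import random
from dataclasses import dataclass, replace

# Hypothetical schema: these field names are illustrative, not HACHIMI's.
@dataclass(frozen=True)
class Persona:
    grade: int                  # school grade, expected 1-12
    age: int                    # age in years
    math_self_efficacy: float   # assumed to live on a 0-1 scale

def propose(grade, rng):
    """Stand-in for a proposer agent: drafts may violate constraints."""
    return Persona(grade=grade,
                   age=rng.randint(5, 19),
                   math_self_efficacy=rng.uniform(-0.2, 1.2))

def validate(p):
    """Symbolic rule checks; returns the names of violated rules."""
    errors = []
    if not 1 <= p.grade <= 12:
        errors.append("grade_range")
    # Assumed developmental rule: age lies between grade + 5 and grade + 7.
    if not p.grade + 5 <= p.age <= p.grade + 7:
        errors.append("age_grade_mismatch")
    if not 0.0 <= p.math_self_efficacy <= 1.0:
        errors.append("efficacy_range")
    return errors

def revise(p, errors):
    """Stand-in for a reviser agent: repairs only the flagged fields."""
    if "age_grade_mismatch" in errors:
        p = replace(p, age=p.grade + 6)
    if "efficacy_range" in errors:
        clamped = min(1.0, max(0.0, p.math_self_efficacy))
        p = replace(p, math_self_efficacy=clamped)
    return p

def generate(grade, rng, max_rounds=3):
    """Propose once, then alternate validate/revise up to max_rounds times."""
    p = propose(grade, rng)
    for _ in range(max_rounds):
        errors = validate(p)
        if not errors:
            return p
        p = revise(p, errors)
    # Give up on drafts that still fail after the revision budget is spent.
    return p if not validate(p) else None
```

The design point the sketch illustrates is that the validator returns named violations rather than a pass/fail flag, so the reviser can make targeted repairs instead of regenerating the whole persona.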

Key Points

  • HACHIMI is a multi-agent framework for generating theory-aligned and quota-controlled student personas.
  • HACHIMI factorizes each persona into a theory-anchored educational schema and validates it through a neuro-symbolic validator.
  • The resulting 1M-persona corpus, HACHIMI-1M, demonstrates near-perfect schema validity, accurate quotas, and substantial diversity.
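The quota control and deduplication mentioned above can be sketched as follows. This is an assumption-laden illustration, not the authors' pipeline: the strata names and shares are invented, and token-level Jaccard similarity is swapped in for semantic (embedding-based) similarity so that the example stays self-contained.

```python
def allocate_quotas(total, shares):
    """Largest-remainder allocation: integer quotas that sum exactly to total."""
    raw = {k: total * s for k, s in shares.items()}
    quotas = {k: int(v) for k, v in raw.items()}
    leftover = total - sum(quotas.values())
    # Hand the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda k: raw[k] - quotas[k], reverse=True)
    for k in by_remainder[:leftover]:
        quotas[k] += 1
    return quotas

def jaccard(a, b):
    """Token-set overlap; a crude stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0

def fill_stratum(candidates, quota, threshold=0.8):
    """Accept candidates in order until the quota is met, skipping near-duplicates."""
    kept = []
    for text in candidates:
        if len(kept) == quota:
            break
        if all(jaccard(text, k) < threshold for k in kept):
            kept.append(text)
    return kept
```

For example, `allocate_quotas(10, {"primary": 0.5, "middle": 0.25, "high": 0.25})` yields integer quotas that sum to 10, and `fill_stratum` then keeps drawing candidate persona descriptions for each stratum until its quota is filled, rejecting any candidate too similar to one already kept.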

Merits

Strength in Addressing Previous Limitations

HACHIMI replaces the ad-hoc prompting and hand-crafted profiles of prior work with a standardized, scalable pipeline that gives explicit control over both educational theory and population distributions.

Demerits

Limited Generalizability to Real-World Classrooms

Agent responses align strongly with human responses on math and curiosity/growth constructs but only moderately on classroom-climate and well-being constructs. This fidelity gradient suggests that further research is needed before the personas can be assumed to generalize to real-world classrooms.
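The summary does not say which statistic underlies "strong" and "moderate" alignment. One common way to compare human and agent results at the group level is the Pearson correlation between per-cohort mean scores on a construct; the sketch below assumes that choice purely for illustration, and the function name and inputs are hypothetical.

```python
from math import sqrt

def pearson(human_means, agent_means):
    """Pearson correlation between two equal-length lists of cohort means."""
    n = len(human_means)
    if n != len(agent_means) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 cohorts")
    mx = sum(human_means) / n
    my = sum(agent_means) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(human_means, agent_means))
    vx = sum((x - mx) ** 2 for x in human_means)
    vy = sum((y - my) ** 2 for y in agent_means)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)
```

Under this assumption, each construct would get one coefficient computed over the 16 cohorts, and a fidelity gradient would show up as coefficients that differ systematically between construct families.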

Expert Commentary

HACHIMI contributes to AI-powered educational tools by addressing the need for standardized, scalable student persona generation. Factorizing personas into theory-anchored educational schemas and checking them with a neuro-symbolic validator gives the framework explicit control over what is generated, rather than leaving consistency to prompting alone. The fidelity gradient is the main caveat: constructs tied to individual cognition and motivation (math, curiosity/growth) are reproduced well, while socially situated constructs (classroom climate, well-being) are reproduced only moderately, so conclusions drawn from the latter should be treated with more caution. The work is relevant to both practical and policy-related applications, such as group-level benchmarking of educational LLMs and social-science simulation, and its effect on education and student outcomes remains to be studied.

Recommendations

  • Future research should focus on refining HACHIMI to improve alignment with real-world classroom dynamics, particularly on constructs such as classroom-climate and well-being.
  • Educators and researchers should explore the potential applications of HACHIMI in evaluating educational interventions and policies, as well as in developing more effective teaching and learning strategies.

Sources