HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents
arXiv:2603.04855v1 (announce type: new)
Abstract
Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled personas. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints via a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting HACHIMI-1M corpus comprises 1 million personas for Grades 1-12. Intrinsic evaluation shows near-perfect schema validity, accurate quotas, and substantial diversity, while external evaluation instantiates personas as student agents answering CEPS and PISA 2022 surveys; across 16 cohorts, math and curiosity/growth constructs align strongly between humans and agents, whereas classroom-climate and well-being constructs are only moderately aligned, revealing a fidelity gradient. All personas are generated with Qwen2.5-72B, and HACHIMI provides a standardized synthetic student population for group-level benchmarking and social-science simulations. Resources available at https://github.com/ZeroLoss-Lab/HACHIMI
Executive Summary
This article summarizes HACHIMI, a multi-agent Propose-Validate-Revise framework for generating student personas (SPs) that are aligned with educational theory and controlled against target population distributions. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints with a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting corpus, HACHIMI-1M, contains 1 million personas for Grades 1-12, all generated with Qwen2.5-72B; intrinsic evaluation reports near-perfect schema validity, accurate quotas, and substantial diversity. In external evaluation, personas are instantiated as student agents answering CEPS and PISA 2022 surveys: across 16 cohorts, human and agent responses align strongly on math and curiosity/growth constructs but only moderately on classroom-climate and well-being constructs. The authors position the corpus as a standardized synthetic student population for group-level benchmarking and social-science simulations.
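The Propose-Validate-Revise loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Persona` fields, the age and reading-level rules, and the `propose`/`revise` functions (plain Python stand-ins for LLM agents) are hypothetical and are not taken from the HACHIMI schema or codebase.

```python
from dataclasses import dataclass, field


# Hypothetical schema: these fields are illustrative, not the HACHIMI schema.
@dataclass
class Persona:
    grade: int
    age: int
    reading_level: str
    notes: list = field(default_factory=list)


def propose(grade):
    # Stand-in for an LLM proposer agent; deliberately emits a flawed draft.
    return Persona(grade=grade, age=grade + 10, reading_level="college")


def validate(p):
    """Symbolic rule checks; returns the names of the violated fields."""
    violations = []
    # Assumed developmental rule: a grade-g student is g+5 to g+7 years old.
    if not (p.grade + 5 <= p.age <= p.grade + 7):
        violations.append("age")
    # Assumed rule: primary-grade students do not read at college level.
    if p.grade <= 6 and p.reading_level == "college":
        violations.append("reading_level")
    return violations


def revise(p, violations):
    # Stand-in for an LLM reviser agent that repairs only the flagged fields.
    if "age" in violations:
        p.age = p.grade + 6
    if "reading_level" in violations:
        p.reading_level = "grade-appropriate"
    p.notes.extend(violations)  # keep a trace of what was repaired
    return p


def generate(grade, max_rounds=3):
    persona = propose(grade)
    for _ in range(max_rounds):
        violations = validate(persona)
        if not violations:
            return persona
        persona = revise(persona, violations)
    # Final check after the last revision round.
    if validate(persona):
        raise ValueError("persona failed validation within the revision budget")
    return persona


persona = generate(grade=3)
print(persona)
```

The point of the design is that the validator is deterministic and rule-based, so a persona is only accepted once every rule passes, regardless of what the generative steps produce.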
Key Points
- ▸ HACHIMI is a multi-agent framework for generating theory-aligned and quota-controlled student personas.
- ▸ HACHIMI factorizes each persona into a theory-anchored educational schema and validates it through a neuro-symbolic validator.
- ▸ The resulting 1M-persona corpus, HACHIMI-1M, demonstrates near-perfect schema validity, accurate quotas, and substantial diversity.
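As a rough illustration of quota control and deduplication, the sketch below allocates integer quotas per stratum and drops near-duplicate persona descriptions. It rests on stated assumptions: the strata names and proportions are invented, the largest-remainder allocation is one common way to hit exact quotas, and token-level Jaccard overlap is a simple stand-in for the embedding-based semantic similarity a real pipeline would more plausibly use. None of this is confirmed to match the HACHIMI implementation.

```python
def allocate_quotas(total, proportions):
    """Largest-remainder allocation so integer quotas sum exactly to total."""
    raw = {k: total * p for k, p in proportions.items()}
    quotas = {k: int(v) for k, v in raw.items()}
    leftover = total - sum(quotas.values())
    # Hand the remaining slots to the strata with the largest fractional parts.
    by_fraction = sorted(raw, key=lambda k: raw[k] - quotas[k], reverse=True)
    for k in by_fraction[:leftover]:
        quotas[k] += 1
    return quotas


def jaccard(a, b):
    """Token-set overlap; a crude proxy for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def deduplicate(texts, threshold=0.8):
    """Keep a text only if it is not too similar to anything already kept."""
    kept = []
    for t in texts:
        if all(jaccard(t, k) < threshold for k in kept):
            kept.append(t)
    return kept


# Hypothetical strata and proportions, for illustration only.
proportions = {"grades_1_6": 0.5, "grades_7_9": 0.3, "grades_10_12": 0.2}
quotas = allocate_quotas(7, proportions)

drafts = [
    "Curious fourth grader who loves fractions and puzzles",
    "curious fourth grader who loves fractions and puzzles",
    "Quiet ninth grader anxious about geometry tests",
]
unique = deduplicate(drafts)
print(quotas, len(unique))
```

Allocating quotas before generation means each stratum is filled to its target rather than sampled at random, and the deduplication pass removes drafts that collapse onto the same description.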
Merits
Strength in Addressing Previous Limitations
Prior work relied on ad-hoc prompting or hand-crafted profiles, with limited control over educational theory and population distributions. HACHIMI replaces these with a schema-driven, quota-controlled pipeline that scales to 1 million personas.
Demerits
Limited Generalizability to Real-World Classrooms
HACHIMI-1M agents align strongly with human responses on math and curiosity/growth constructs but only moderately on classroom-climate and well-being constructs. This fidelity gradient suggests that further research is needed before the personas can be assumed to generalize to real-world classrooms.
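The summary does not state which statistic is used to measure human-agent alignment. One plausible reading, sketched here purely as an assumption, is a correlation between cohort-level mean scores from humans and from agents, computed separately for each construct. The numbers below are toy values chosen to make the arithmetic easy to follow; they are not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy cohort means (one value per cohort); NOT data from the paper.
human_means = [1.0, 2.0, 3.0, 4.0]
agent_means_tracking = [2.0, 4.0, 6.0, 8.0]  # agents preserve cohort ordering
agent_means_noisy = [2.0, 1.0, 4.0, 3.0]     # agents partly scramble ordering

r_tracking = pearson(human_means, agent_means_tracking)
r_noisy = pearson(human_means, agent_means_noisy)
print(round(r_tracking, 3), round(r_noisy, 3))
```

Under this reading, a construct counts as strongly aligned when agent cohorts rank and space themselves like the human cohorts do, even if absolute score levels differ.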
Expert Commentary
HACHIMI addresses a practical need in AI-powered educational tooling: standardized, scalable student persona generation. Factorizing personas into theory-anchored educational schemas and checking them with a neuro-symbolic validator gives the framework a degree of control that ad-hoc prompting lacks. The remaining weakness is the moderate alignment on classroom-climate and well-being constructs, which limits how far results can be extrapolated to real-world classrooms. The work has implications for both practice and policy, and its potential effect on education and student outcomes warrants further study.
Recommendations
- ✓ Future research should focus on refining HACHIMI to improve alignment with real-world classroom dynamics, particularly on constructs such as classroom-climate and well-being.
- ✓ Educators and researchers should explore the potential applications of HACHIMI in evaluating educational interventions and policies, as well as in developing more effective teaching and learning strategies.