Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems
arXiv:2602.17542v1 Announce Type: new Abstract: Fine-grained skill representations, commonly referred to as knowledge components (KCs), are fundamental to many approaches in student modeling and learning analytics. However, KC-level correctness labels are rarely available in real-world datasets, especially for open-ended programming tasks where solutions typically involve multiple KCs simultaneously. Simply propagating problem-level correctness to all associated KCs obscures partial mastery and often leads to poorly fitted learning curves. To address this challenge, we propose an automated framework that leverages large language models (LLMs) to label KC-level correctness directly from student-written code. Our method assesses whether each KC is correctly applied and further introduces a temporal context-aware Code-KC mapping mechanism to better align KCs with individual student code. We evaluate the resulting KC-level correctness labels in terms of learning curve fit and predictive performance using the power law of practice and the Additive Factors Model. Experimental results show that our framework leads to learning curves that are more consistent with cognitive theory and improves predictive performance, compared to baselines. Human evaluation further demonstrates substantial agreement between LLM and expert annotations.
Executive Summary
This article proposes a framework that uses large language models (LLMs) to automate knowledge component (KC)-level correctness labeling for open-ended coding problems. The framework assesses whether each KC is correctly applied in student-written code and introduces a temporal context-aware Code-KC mapping mechanism to better align KCs with individual submissions. Experiments show improved learning curve fit and predictive performance over baselines, and human evaluation finds substantial agreement between LLM and expert annotations. The work has clear implications for student modeling and learning analytics, enabling more accurate and nuanced assessment of partial mastery.
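The learning-curve evaluation the paper mentions rests on the power law of practice: error rates should decay as a power function of practice opportunities, error(t) = a · t^(-b). A minimal sketch of how such a curve can be fit (via log-log linear regression on synthetic data; the paper's actual fitting procedure and data are not specified here):

```python
import numpy as np

def fit_power_law(opportunities, error_rates):
    """Fit error = a * t^(-b) by linear regression in log-log space.

    Returns (a, b); a larger b means faster learning.
    """
    log_t = np.log(opportunities)
    log_e = np.log(error_rates)
    slope, intercept = np.polyfit(log_t, log_e, 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic error rates that follow the power law exactly (a=0.6, b=0.4),
# so the fit should recover the generating parameters.
t = np.arange(1, 11)
errors = 0.6 * t ** (-0.4)
a, b = fit_power_law(t, errors)
```

A KC whose labels produce a cleanly decaying curve of this form is evidence that the labeling isolates a coherent skill; flat or noisy curves suggest the KC (or its correctness labels) is poorly specified.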
Key Points
- ▸ Proposes an automated framework for KC-level correctness labeling using LLMs
- ▸ Introduces a temporal context-aware Code-KC mapping mechanism
- ▸ Demonstrates improved learning curve fit and predictive performance
- ▸ Achieves substantial agreement between LLM and expert annotations
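The first two points can be sketched as a prompt-and-parse pipeline. Everything below is a hypothetical illustration: the function names, prompt wording, and JSON response format are assumptions, not the paper's actual implementation, and the LLM call itself is stubbed out with a canned response.

```python
import json

def build_prompt(code, kcs):
    """Assemble a labeling prompt asking for per-KC correctness verdicts."""
    kc_list = "\n".join(f"- {kc}" for kc in kcs)
    return (
        "For each knowledge component below, state whether the student's "
        "code applies it correctly. Reply as JSON mapping KC name to "
        "true/false.\n\nKnowledge components:\n"
        f"{kc_list}\n\nStudent code:\n{code}"
    )

def parse_labels(response_text, kcs):
    """Parse the model's JSON reply into a per-KC correctness dict."""
    verdicts = json.loads(response_text)
    # Conservatively treat any KC the model omitted as not demonstrated.
    return {kc: bool(verdicts.get(kc, False)) for kc in kcs}

code = "total = 0\nfor x in nums:\n    total += x"
kcs = ["for-loop", "accumulator-variable", "list-indexing"]
prompt = build_prompt(code, kcs)
# Stand-in for an actual LLM response:
labels = parse_labels('{"for-loop": true, "accumulator-variable": true}', kcs)
```

The temporal context-aware mapping would additionally condition on the student's earlier submissions when deciding which KCs a given piece of code exercises; that step is omitted from this sketch.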
Merits
Strength in Methodology
Using LLMs to automate KC-level correctness labeling is a novel and effective approach: it recovers the partial mastery that problem-level correctness labels obscure, without requiring manual expert annotation at scale.
Demerits
Limitation in Generalizability
The framework's effectiveness may be limited to specific domains or tasks, and further research is needed to explore its generalizability to other areas of education and learning analytics.
Expert Commentary
The article makes a significant contribution to student modeling and learning analytics by automating KC-level correctness labeling, a task for which labeled data is rarely available in real-world datasets. That said, the evaluation leaves open questions about generalizability and scalability: results are reported for open-ended programming tasks, and further work is needed to test the approach in other domains. The potential applications in education policy and practice are substantial. As the field evolves, it will be essential to combine human expertise with automated methods, ensuring both are leveraged to advance our understanding of student learning and behavior.
Recommendations
- ✓ Future research should aim to explore the framework's generalizability and scalability to other domains and tasks, and to develop more robust and widely applicable methods for KC-level correctness labeling.
- ✓ The use of LLMs for KC-level correctness labeling should be further investigated in various educational settings, with a focus on the practical applications and policy implications of this approach.