MERIT: Memory-Enhanced Retrieval for Interpretable Knowledge Tracing
arXiv:2603.22289v1 Announce Type: new Abstract: Knowledge Tracing (KT) models students' evolving knowledge states to predict future performance, serving as a foundation for personalized education. While traditional deep learning models achieve high accuracy, they often lack interpretability. Large Language Models (LLMs) offer strong reasoning capabilities but struggle with limited context windows and hallucinations. Furthermore, existing LLM-based methods typically require expensive fine-tuning, limiting scalability and adaptability to new data. We propose MERIT (Memory-Enhanced Retrieval for Interpretable Knowledge Tracing), a training-free framework combining frozen LLM reasoning with structured pedagogical memory. Rather than updating parameters, MERIT transforms raw interaction logs into an interpretable memory bank. The framework uses semantic denoising to categorize students into latent cognitive schemas and constructs a paradigm bank where representative error patterns are analyzed offline to generate explicit Chain-of-Thought (CoT) rationales. During inference, a hierarchical routing mechanism retrieves relevant contexts, while a logic-augmented module applies semantic constraints to calibrate predictions. By grounding the LLM in interpretable memory, MERIT achieves state-of-the-art performance on real-world datasets without gradient updates. This approach reduces computational costs and supports dynamic knowledge updates, improving the accessibility and transparency of educational diagnosis.
Executive Summary
MERIT introduces a training-free framework for Knowledge Tracing (KT) that integrates frozen Large Language Model (LLM) reasoning with structured pedagogical memory, balancing interpretability and performance. Unlike conventional deep learning or fine-tuned LLM-based KT models, MERIT avoids parameter updates by transforming raw student interaction logs into an interpretable memory bank. Through semantic denoising, it categorizes students into latent cognitive schemas and constructs a paradigm bank of representative error patterns, which are analyzed offline to generate explicit Chain-of-Thought (CoT) rationales. During inference, a hierarchical routing mechanism retrieves relevant contexts efficiently, while a logic-augmented module applies semantic constraints to calibrate predictions. The result is a scalable, transparent KT system that achieves state-of-the-art performance on real-world datasets without gradient updates, reducing computational costs and making educational diagnosis more accessible.
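The abstract does not specify how the hierarchical routing or logic-augmented calibration is implemented. As a purely illustrative sketch, the hypothetical Python below shows one way such a pipeline could be structured: a two-level index over memory entries (schema, then skill), and a calibration step that blends the raw LLM probability with the empirical success rate of the retrieved paradigm entries. All names (`MemoryEntry`, `route`, `calibrate`) and the 50/50 blending rule are assumptions, not details from the paper.

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    schema: str          # latent cognitive schema label (hypothetical)
    skill: str           # knowledge component the entry covers
    rationale: str       # offline-generated CoT explanation
    correct_rate: float  # observed success rate for this error pattern

def build_index(entries):
    """Build a two-level index: schema -> skill -> list of entries."""
    index = {}
    for e in entries:
        index.setdefault(e.schema, {}).setdefault(e.skill, []).append(e)
    return index

def route(index, schema, skill):
    """Hierarchical routing sketch: narrow by cognitive schema first,
    then by skill, returning only the entries relevant to this query."""
    return index.get(schema, {}).get(skill, [])

def calibrate(llm_prob, retrieved):
    """Logic-augmented calibration sketch: pull the raw LLM probability
    toward the empirical prior of the retrieved paradigm entries.
    The equal-weight blend here is an arbitrary illustrative choice."""
    if not retrieved:
        return llm_prob
    prior = sum(e.correct_rate for e in retrieved) / len(retrieved)
    return 0.5 * llm_prob + 0.5 * prior
```

The point of the sketch is the shape of the inference path, not the numbers: retrieval stays interpretable because every prediction can be traced back to concrete memory entries and their rationales.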
Key Points
- ▸ MERIT eliminates the need for gradient updates by using a frozen LLM architecture.
- ▸ The framework leverages semantic denoising and latent schema categorization to enhance interpretability.
- ▸ It achieves state-of-the-art performance without modifying model parameters, reducing both cost and complexity.
Merits
Interpretability Without Sacrifice
MERIT maintains high predictive accuracy while introducing a clear interpretable memory mechanism, addressing a critical gap in KT literature.
Scalability and Adaptability
By avoiding fine-tuning and parameter updates, MERIT supports dynamic data adaptation and reduces computational overhead, making it suitable for large-scale educational platforms.
Demerits
Context Window Constraints
Although MERIT avoids parameter updates, its reliance on the frozen LLM's context window limits how much of a very long interaction history can be analyzed at once, which may reduce diagnostic granularity for students with extensive records.
Offline Processing Dependency
The effectiveness of the paradigm bank construction depends on accurate offline analysis of error patterns, which may introduce latency or require additional infrastructure for real-time deployment.
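The paper does not describe the offline construction step in detail. As a hedged illustration of what "analyzing representative error patterns offline" could look like, the hypothetical sketch below groups incorrect attempts by skill and selects the most frequent error pattern per skill as the representative; in a MERIT-like system, these representatives would then be sent to the LLM once to generate CoT rationales. The log format and the frequency-based selection rule are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def build_paradigm_bank(logs):
    """Hypothetical offline step: from (student, skill, answer, correct)
    tuples, pick the most common wrong answer per skill as the
    representative error pattern for that skill."""
    errors = defaultdict(Counter)
    for student, skill, answer, correct in logs:
        if not correct:
            errors[skill][answer] += 1
    bank = {}
    for skill, counter in errors.items():
        pattern, freq = counter.most_common(1)[0]
        # 'support' records how many students exhibited this pattern,
        # a rough proxy for how much offline analysis it deserves.
        bank[skill] = {"pattern": pattern, "support": freq}
    return bank
```

Because this step runs offline and only over aggregated error patterns, its cost is decoupled from inference latency, but, as noted above, keeping the bank fresh for real-time deployment still requires a periodic rebuild job or streaming update infrastructure.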
Expert Commentary
MERIT represents a notable advance at the intersection of AI and education by reconciling the competing demands of interpretability and performance. The framework's use of a frozen LLM as a reasoning engine, rather than a parameter-updating model, is a pragmatic choice for both computational efficiency and auditability. While traditional KT models have prioritized accuracy at the expense of explainability, MERIT inverts this trade-off by anchoring predictions in structured semantic memory, turning the LLM from an opaque predictor into a transparent diagnostic assistant.

The paradigm bank, a curated repository of error patterns, is particularly innovative: it mirrors the cognitive-science principle of schema-based learning and carries it into a computational setting. The hierarchical routing mechanism likewise reflects practical engineering care, enabling efficient retrieval without compromising interpretability. The implications extend beyond KT: MERIT offers a replicable architectural pattern for other domains where interpretable AI is critical, such as healthcare diagnostics or legal reasoning.

One caveat: while the current implementation avoids parameter updates, future iterations may benefit from hybrid designs that selectively fine-tune parameters for specific edge cases without compromising the core interpretability framework. Overall, MERIT sets a strong precedent for explainable AI in education.
Recommendations
- ✓ 1. Implement MERIT as a pilot module in institutional KT systems for diagnostic evaluation and comparative performance benchmarking.
- ✓ 2. Explore hybrid architectures that integrate MERIT’s memory-based interpretability with selective parameter fine-tuning for enhanced adaptability in dynamic learning environments.
Sources
Original: arXiv - cs.CL