Self-hosted Lecture-to-Quiz: Local LLM MCQ Generation with Deterministic Quality Control
arXiv:2603.08729v1 Announce Type: cross Abstract: We present an end-to-end, self-hosted pipeline that converts lecture PDFs into multiple-choice questions (MCQs) using a local LLM plus deterministic quality control (QC); the pipeline is API-free in the sense that lecture content is never sent to an external LLM service. The pipeline is designed for black-box minimization: LLMs may assist drafting, but the final released artifacts are plain-text question banks with an explicit QC trace and without any need to call an LLM at deployment time. We run a seed sweep on three short "dummy lectures" (information theory, thermodynamics, and statistical mechanics), collecting 15 runs × 8 questions = 120 accepted candidates (122 attempts total under bounded retries). All 120 accepted candidates satisfy hard QC checks (JSON schema conformance, a single marked correct option, and numeric/constant equivalence tests); however, the warning layer flags 8/120 items (spanning 8 runs) that expose residual quality risks such as duplicated distractors or missing rounding instructions. We report a warning taxonomy with concrete before→after fixes, and we release the final 24-question set (three lectures × 8 questions) as JSONL/CSV for Google Forms import (e.g., via Apps Script or API tooling), included as ancillary files under anc/. Finally, we position the work through the AI to Learn (AI2L) rubric lens and argue that self-hosted MCQ generation with explicit QC supports privacy, accountability, and Green AI in educational workflows.
Executive Summary
This article presents a self-hosted pipeline for converting lecture PDFs into multiple-choice questions (MCQs) using a local large language model (LLM) with deterministic quality control. The pipeline never sends lecture content to external LLM services and releases plain-text question banks with an explicit quality-control trace, so no LLM call is needed at deployment time. In the reported seed sweep, all 120 accepted candidates passed the hard QC checks, while a soft warning layer flagged 8 of 120 items for residual risks such as duplicated distractors. The work has significant implications for educational workflows, supporting privacy, accountability, and Green AI principles, and could be particularly valuable for institutions with strict data governance policies or those seeking to reduce reliance on cloud-based services.
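To make the released artifact concrete, the sketch below writes one question-bank record as a JSON Lines entry carrying its QC trace. The field names (`question`, `options`, `correct_index`, `qc_trace`) are illustrative assumptions, not the authors' released schema.

```python
import json

# Hypothetical record layout; field names are assumptions for illustration,
# not the schema released under anc/.
record = {
    "lecture": "information_theory",
    "question": "What is the entropy, in bits, of a fair coin flip?",
    "options": ["0.5", "1.0", "2.0", "0.0"],
    "correct_index": 1,                      # exactly one marked correct option
    "qc_trace": {"hard_checks": "pass", "warnings": []},
}

# JSONL: one self-describing record per line, appended to the bank.
with open("question_bank.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```

Because each line is an independent JSON object, the bank can be filtered or re-audited with standard text tools and no model in the loop.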
Key Points
- Self-hosted pipeline for lecture-to-quiz conversion using a local LLM
- Deterministic quality control ensures high-quality MCQs
- Minimizes reliance on external LLM services and cloud-based infrastructure
- Supports Green AI principles by reducing energy consumption and carbon footprint
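The deterministic hard checks named in the abstract (schema conformance, a single marked correct option) can be gated without any model call. A minimal sketch, assuming the illustrative record fields used throughout (the numeric/constant equivalence test is noted but omitted, since it needs per-item expected values):

```python
def hard_qc(item: dict) -> bool:
    """Deterministic accept/reject gate in the spirit of the paper's hard QC.

    Field names are assumptions for illustration. Numeric/constant
    equivalence testing would slot in here as an additional check.
    """
    # Schema conformance: required keys with the right types.
    if not isinstance(item.get("question"), str) or not item.get("question"):
        return False
    options = item.get("options")
    if not isinstance(options, list) or len(options) != 4:
        return False
    if not all(isinstance(o, str) for o in options):
        return False
    # Exactly one marked correct option, and it must point at a real option.
    idx = item.get("correct_index")
    if not isinstance(idx, int) or not 0 <= idx < len(options):
        return False
    return True
```

An item failing any check is rejected and regenerated under the paper's bounded-retry budget; acceptance never depends on a second LLM judgment.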
Merits
Robustness and Reliability
In the reported seed sweep, every accepted candidate (120/120) satisfied the hard QC checks (schema conformance, a single marked correct option, numeric/constant equivalence), and only 8 of 120 items drew soft warnings, indicating a reliable generation-plus-gating loop rather than reliance on the LLM alone.
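The soft warning layer is what surfaces those 8/120 residual risks without rejecting the items outright. A sketch of two warnings from the abstract's taxonomy, with heuristics and keyword triggers that are assumptions of this illustration:

```python
def warnings_for(item: dict) -> list[str]:
    """Soft checks: flag residual quality risks instead of rejecting.

    Mirrors two warnings named in the abstract (duplicated distractors,
    missing rounding instructions); the detection heuristics here are
    illustrative assumptions, not the authors' implementation.
    """
    warns = []
    # Duplicated distractors: two options that normalize to the same text.
    opts = [o.strip().lower() for o in item["options"]]
    if len(set(opts)) < len(opts):
        warns.append("duplicated_distractors")
    # Numeric options with no explicit rounding instruction in the stem.
    looks_numeric = any(any(ch.isdigit() for ch in o) for o in item["options"])
    stem = item["question"].lower()
    if looks_numeric and not any(k in stem for k in ("round", "decimal", "significant")):
        warns.append("missing_rounding_instruction")
    return warns
```

Warnings land in the QC trace alongside the item, giving instructors a concrete before→after fix list to review by hand.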
Flexibility and Customizability
The self-hosted pipeline allows for flexibility and customizability, enabling users to adapt the solution to their specific needs and workflows.
Green AI Principles
The proposed solution supports Green AI principles by reducing energy consumption and carbon footprint associated with cloud-based services.
Demerits
Technical Complexity
The pipeline may require significant technical expertise to implement and maintain, potentially limiting its adoption in certain settings.
Scalability Limitations
The self-hosted pipeline may face scalability limitations, particularly when dealing with large volumes of lecture content or complex question formats.
Expert Commentary
The article presents a significant contribution to the field of AI in education, demonstrating the potential of LLMs for generating high-quality educational content. The proposed pipeline's emphasis on deterministic quality control and self-hosted architecture addresses key concerns around data governance and privacy. While the solution may require nontrivial technical expertise to implement and maintain, its potential benefits justify further exploration. The implications for data governance, privacy, and Green AI principles are particularly noteworthy, and policymakers should weigh these factors when developing AI applications in education.
Recommendations
- Further research is needed to investigate the scalability limitations of the proposed pipeline and explore potential solutions to address these challenges.
- Educational institutions and policymakers should prioritize data governance and privacy when adopting AI solutions for educational content creation.