A unified foundational framework for knowledge injection and evaluation of Large Language Models in Combustion Science
arXiv:2603.04452v1 Announce Type: new Abstract: To advance foundation Large Language Models (LLMs) for combustion science, this study presents the first end-to-end framework for developing domain-specialized models for the combustion community. The framework comprises an AI-ready multimodal knowledge base at the 3.5 billion-token scale, extracted from over 200,000 peer-reviewed articles, 8,000 theses and dissertations, and approximately 400,000 lines of combustion CFD code; a rigorous and largely automated evaluation benchmark (CombustionQA, 436 questions across eight subfields); and a three-stage knowledge-injection pathway that progresses from lightweight retrieval-augmented generation (RAG) to knowledge-graph-enhanced retrieval and continued pretraining. We first quantitatively validate Stage 1 (naive RAG) and find a hard ceiling: standard RAG accuracy peaks at 60%, far surpassing zero-shot performance (23%) yet well below the theoretical upper bound (87%). We further demonstrate that this stage's performance is severely constrained by context contamination. Consequently, building a domain foundation model requires structured knowledge graphs and continued pretraining (Stages 2 and 3).
Executive Summary
This study proposes a unified foundational framework for developing Large Language Models (LLMs) in combustion science. The framework consists of an AI-ready multimodal knowledge base, an evaluation benchmark (CombustionQA), and a three-stage knowledge-injection pathway. The authors validate the first stage (naive retrieval-augmented generation) and find a hard ceiling: accuracy peaks at 60%, well above zero-shot performance (23%) but far below the 87% theoretical upper bound, constrained chiefly by context contamination. They conclude that building a domain foundation model requires the later stages of the pathway: structured knowledge graphs and continued pretraining. Together, the knowledge base, benchmark, and staged pathway give the combustion community a rigorous, reusable basis for injecting and evaluating domain knowledge in LLMs.
Key Points
- ▸ A unified framework for developing LLMs in combustion science is proposed.
- ▸ The framework consists of an AI-ready knowledge base, CombustionQA, and a three-stage knowledge-injection pathway.
- ▸ Naive retrieval-augmented generation (RAG) hits a hard ceiling at 60% accuracy, constrained by context contamination and well below the 87% theoretical upper bound.
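To make the Stage 1 pipeline concrete, the toy retriever below pairs a bag-of-words similarity score with top-k selection; the corpus, query, and prompt format are hypothetical illustrations, not the paper's implementation. Note that the lexically similar but off-topic turbulent-combustion chunk is retrieved alongside the relevant one, a miniature analogue of the context contamination the study identifies.

```python
import math
import re
from collections import Counter

# Illustrative sketch of Stage 1 (naive RAG). The corpus, scoring function,
# and prompt template are assumptions for demonstration only.

def embed(text):
    """Toy bag-of-words 'embedding': a term-frequency Counter."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)  # missing keys in b count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the top-k chunks ranked by similarity to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

corpus = [
    "Laminar flame speed increases with unburned-gas temperature.",
    "Turbulent flame speed correlations depend on the Karlovitz number.",
    "Soot formation in diffusion flames follows the HACA mechanism.",
]

query = "How does flame speed depend on temperature?"
context = retrieve(query, corpus, k=2)
# Both flame-speed chunks are retrieved: the off-topic turbulent one rides
# in on lexical overlap and dilutes the context given to the generator.
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
```

In a real Stage 1 system the `embed` function would be a dense encoder and the prompt would go to an LLM, but the failure mode is the same: retrieval quality caps end-to-end accuracy regardless of the generator.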
Merits
Strength in Rigorous Evaluation
The study evaluates LLMs against CombustionQA, a rigorous and largely automated benchmark of 436 questions spanning eight combustion subfields, enabling consistent comparison across models and injection stages.
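A benchmark of this shape lends itself to fully automated scoring. The sketch below grades multiple-choice predictions against an answer key and reports per-subfield accuracy; the question IDs, subfield names, and answers are invented for illustration and are not drawn from CombustionQA itself.

```python
# Hypothetical automated grader in the spirit of CombustionQA; the data
# below is illustrative, not the benchmark's actual content.

def grade(predictions, answer_key):
    """Return overall accuracy and a per-subfield accuracy breakdown."""
    per_field = {}
    correct = 0
    for qid, (subfield, gold) in answer_key.items():
        hit = predictions.get(qid) == gold
        correct += hit
        tally = per_field.setdefault(subfield, [0, 0])
        tally[0] += hit   # correct answers in this subfield
        tally[1] += 1     # total questions in this subfield
    accuracy = correct / len(answer_key)
    by_field = {f: c / n for f, (c, n) in per_field.items()}
    return accuracy, by_field

answer_key = {
    "q1": ("kinetics", "B"),
    "q2": ("kinetics", "D"),
    "q3": ("turbulent combustion", "A"),
    "q4": ("emissions", "C"),
}
predictions = {"q1": "B", "q2": "A", "q3": "A", "q4": "C"}

accuracy, by_field = grade(predictions, answer_key)
# accuracy is 0.75 overall; "kinetics" resolves to 0.5
```

Because grading reduces to exact matching against a key, the entire evaluation loop can run unattended, which is what makes the "largely automated" claim practical at 436-question scale.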
Structured Knowledge Graphs
The authors emphasize the importance of structured knowledge graphs in developing effective LLMs in combustion science.
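To make the contrast with naive RAG concrete, here is a minimal sketch of what knowledge-graph-enhanced retrieval (Stage 2) can add: instead of matching text chunks lexically, the retriever walks typed relations between entities. The triples and the hop-expansion rule are illustrative assumptions, not the paper's actual graph design.

```python
from collections import defaultdict

# Toy knowledge-graph expansion (in the spirit of Stage 2). The triples
# are hypothetical examples, not extracted from the paper's corpus.
triples = [
    ("laminar flame speed", "increases_with", "unburned-gas temperature"),
    ("laminar flame speed", "measured_by", "spherical flame method"),
    ("unburned-gas temperature", "affects", "ignition delay"),
]

graph = defaultdict(set)
for subj, rel, obj in triples:
    graph[subj].add((rel, obj))
    graph[obj].add((rel + "_inv", subj))  # store the inverse edge too

def expand(entity, hops=1):
    """Collect all entities reachable within `hops` edges of the query entity."""
    frontier, seen = {entity}, {entity}
    for _ in range(hops):
        nxt = set()
        for e in frontier:
            nxt |= {obj for _, obj in graph.get(e, ())}
        frontier = nxt - seen
        seen |= nxt
    return seen

one_hop = expand("laminar flame speed", hops=1)
two_hop = expand("laminar flame speed", hops=2)
```

The payoff over chunk matching is that a query about flame speed can surface "ignition delay" via a two-hop relation even when the two terms never co-occur in any retrieved passage, which is one way structured graphs raise the retrieval ceiling.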
Demerits
Limitation of Naive Retrieval-Augmented Generation
The study finds that naive retrieval-augmented generation hits a hard ceiling at 60% accuracy, far below the 87% theoretical upper bound, with context contamination identified as the chief constraint.
Expert Commentary
This study contributes to the field of scientific LLMs by proposing a unified framework and demonstrating, quantitatively, why rigorous evaluation and staged knowledge injection matter. At the same time, it exposes the limits of naive retrieval-augmented generation: a 60% accuracy ceiling driven by context contamination, far short of the 87% upper bound. The authors' turn toward structured knowledge graphs and continued pretraining is therefore well motivated, and it is also a step toward more explainable, domain-grounded AI for science.
Recommendations
- ✓ Future studies should investigate the application of the proposed framework in various domains within combustion science.
- ✓ Explainable AI for science should be a priority in the development of domain-specialized LLMs.