A unified foundational framework for knowledge injection and evaluation of Large Language Models in Combustion Science
arXiv:2603.04452v1 Announce Type: new Abstract: To advance foundation Large Language Models (LLMs) for combustion science, this study presents the first end-to-end framework for developing domain-specialized models for the combustion community. The framework comprises an AI-ready multimodal knowledge base at the 3.5 billion-token scale, extracted from over 200,000 peer-reviewed articles, 8,000 theses and dissertations, and approximately 400,000 lines of combustion CFD code; a rigorous and largely automated evaluation benchmark (CombustionQA, 436 questions across eight subfields); and a three-stage knowledge-injection pathway that progresses from lightweight retrieval-augmented generation (RAG) to knowledge-graph-enhanced retrieval and continued pretraining. We first quantitatively validate Stage 1 (naive RAG) and find a hard ceiling: standard RAG accuracy peaks at 60%, far surpassing zero-shot performance (23%) yet well below the theoretical upper bound (87%). We further demonstrate that this stage's performance is severely constrained by context contamination. Consequently, building a domain foundation model requires structured knowledge graphs and continued pretraining (Stages 2 and 3).
Executive Summary
This study proposes a unified foundational framework for developing Large Language Models (LLMs) in combustion science. The framework consists of an AI-ready multimodal knowledge base, an evaluation benchmark (CombustionQA), and a three-stage knowledge-injection pathway. The authors validate the first stage (naive retrieval-augmented generation) and find a hard ceiling: accuracy peaks at 60%, well above zero-shot performance (23%) but far below the 87% theoretical upper bound, constrained chiefly by context contamination. They conclude that building a domain foundation model requires the later stages of the pathway: structured knowledge graphs and continued pretraining. Together, the knowledge base, benchmark, and staged pathway give the combustion community a rigorous, reusable basis for injecting and evaluating domain knowledge in LLMs.
Key Points
- ▸ A unified framework for developing LLMs in combustion science is proposed.
- ▸ The framework consists of an AI-ready knowledge base, CombustionQA, and a three-stage knowledge-injection pathway.
- ▸ Naive retrieval-augmented generation (RAG) hits a hard ceiling at 60% accuracy, constrained by context contamination and well below the 87% theoretical upper bound.
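To make the Stage 1 pipeline concrete, the toy retriever below pairs a bag-of-words similarity score with top-k selection; the corpus, query, and prompt format are hypothetical illustrations, not the paper's implementation. Note that the lexically similar but off-topic turbulent-combustion chunk is retrieved alongside the relevant one, a miniature analogue of the context contamination the study identifies.

```python
import math
import re
from collections import Counter

# Illustrative sketch of Stage 1 (naive RAG). The corpus, scoring function,
# and prompt template are assumptions for demonstration only.

def embed(text):
    """Toy bag-of-words 'embedding': a term-frequency Counter."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)  # missing keys in b count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the top-k chunks ranked by similarity to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

corpus = [
    "Laminar flame speed increases with unburned-gas temperature.",
    "Turbulent flame speed correlations depend on the Karlovitz number.",
    "Soot formation in diffusion flames follows the HACA mechanism.",
]

query = "How does flame speed depend on temperature?"
context = retrieve(query, corpus, k=2)
# Both flame-speed chunks are retrieved: the off-topic turbulent one rides
# in on lexical overlap and dilutes the context given to the generator.
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
```

In a real Stage 1 system the `embed` function would be a dense encoder and the prompt would go to an LLM, but the failure mode is the same: retrieval quality caps end-to-end accuracy regardless of the generator.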
Merits
Strength in Rigorous Evaluation
The study evaluates LLMs against CombustionQA, a rigorous and largely automated benchmark of 436 questions spanning eight combustion subfields, enabling consistent comparison across models and injection stages.
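A benchmark of this shape lends itself to fully automated scoring. The sketch below grades multiple-choice predictions against an answer key and reports per-subfield accuracy; the question IDs, subfield names, and answers are invented for illustration and are not drawn from CombustionQA itself.

```python
# Hypothetical automated grader in the spirit of CombustionQA; the data
# below is illustrative, not the benchmark's actual content.

def grade(predictions, answer_key):
    """Return overall accuracy and a per-subfield accuracy breakdown."""
    per_field = {}
    correct = 0
    for qid, (subfield, gold) in answer_key.items():
        hit = predictions.get(qid) == gold
        correct += hit
        tally = per_field.setdefault(subfield, [0, 0])
        tally[0] += hit   # correct answers in this subfield
        tally[1] += 1     # total questions in this subfield
    accuracy = correct / len(answer_key)
    by_field = {f: c / n for f, (c, n) in per_field.items()}
    return accuracy, by_field

answer_key = {
    "q1": ("kinetics", "B"),
    "q2": ("kinetics", "D"),
    "q3": ("turbulent combustion", "A"),
    "q4": ("emissions", "C"),
}
predictions = {"q1": "B", "q2": "A", "q3": "A", "q4": "C"}

accuracy, by_field = grade(predictions, answer_key)
# accuracy is 0.75 overall; "kinetics" resolves to 0.5
```

Because grading reduces to exact matching against a key, the entire evaluation loop can run unattended, which is what makes the "largely automated" claim practical at 436-question scale.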
Structured Knowledge Graphs
The authors emphasize the importance of structured knowledge graphs in developing effective LLMs in combustion science.
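To make the contrast with naive RAG concrete, here is a minimal sketch of what knowledge-graph-enhanced retrieval (Stage 2) can add: instead of matching text chunks lexically, the retriever walks typed relations between entities. The triples and the hop-expansion rule are illustrative assumptions, not the paper's actual graph design.

```python
from collections import defaultdict

# Toy knowledge-graph expansion (in the spirit of Stage 2). The triples
# are hypothetical examples, not extracted from the paper's corpus.
triples = [
    ("laminar flame speed", "increases_with", "unburned-gas temperature"),
    ("laminar flame speed", "measured_by", "spherical flame method"),
    ("unburned-gas temperature", "affects", "ignition delay"),
]

graph = defaultdict(set)
for subj, rel, obj in triples:
    graph[subj].add((rel, obj))
    graph[obj].add((rel + "_inv", subj))  # store the inverse edge too

def expand(entity, hops=1):
    """Collect all entities reachable within `hops` edges of the query entity."""
    frontier, seen = {entity}, {entity}
    for _ in range(hops):
        nxt = set()
        for e in frontier:
            nxt |= {obj for _, obj in graph.get(e, ())}
        frontier = nxt - seen
        seen |= nxt
    return seen

one_hop = expand("laminar flame speed", hops=1)
two_hop = expand("laminar flame speed", hops=2)
```

The payoff over chunk matching is that a query about flame speed can surface "ignition delay" via a two-hop relation even when the two terms never co-occur in any retrieved passage, which is one way structured graphs raise the retrieval ceiling.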
Demerits
Limitation of Naive Retrieval-Augmented Generation
The study finds that naive retrieval-augmented generation hits a hard ceiling at 60% accuracy, far below the 87% theoretical upper bound, with context contamination identified as the chief constraint.
Expert Commentary
This study contributes to the field of scientific LLMs by proposing a unified framework and demonstrating, quantitatively, why rigorous evaluation and staged knowledge injection matter. At the same time, it exposes the limits of naive retrieval-augmented generation: a 60% accuracy ceiling driven by context contamination, far short of the 87% upper bound. The authors' turn toward structured knowledge graphs and continued pretraining is therefore well motivated, and it is also a step toward more explainable, domain-grounded AI for science.
Recommendations
- ✓ Future studies should investigate the application of the proposed framework in various domains within combustion science.
- ✓ Explainable AI for science should be a priority in the development of domain-specialized LLMs.