Full-Stack Domain Enhancement for Combustion LLMs: Construction and Optimization
arXiv:2603.19268v1 Announce Type: cross Abstract: Large language models (LLMs) show significant application potential for task adaptation and capability enhancement in professional fields. Nevertheless, for complex physical systems such as combustion science, general-purpose LLMs often generate severe hallucinations due to insufficient domain knowledge and an inability to adhere to physical conservation laws. To address this issue, we propose the first full-stack domain-enhanced LLM workflow tailored to combustion science, integrating automated domain corpus construction, incremental pre-training, instruction fine-tuning, and verifiable reward-based reinforcement learning. This workflow ensures that the model truly internalizes physical laws rather than merely learning textual statistical patterns. We also release FlameBench, a standardized evaluation benchmark specifically designed for complex reasoning tasks in combustion science. Experimental results demonstrate that the model developed in this work significantly outperforms state-of-the-art general-purpose closed-source models and traditional retrieval-augmented generation methods on combustion science reasoning tasks. This work lays a solid technical and resource foundation for the subsequent development of domain-specific scientific research agents with reliable scientific reasoning capabilities.
Executive Summary
This paper proposes a full-stack domain-enhanced Large Language Model (LLM) workflow tailored for combustion science, addressing the hallucinations that general-purpose LLMs produce in this domain. The workflow integrates automated domain corpus construction, incremental pre-training, instruction fine-tuning, and verifiable reward-based reinforcement learning to ensure the model internalizes physical laws. Experimental results show the model outperforms state-of-the-art general-purpose models and traditional retrieval-augmented generation methods on combustion science reasoning tasks. The authors also release FlameBench, a standardized evaluation benchmark for complex reasoning tasks in combustion science. This work lays a foundation for developing domain-specific scientific research agents with reliable scientific reasoning capabilities.
Key Points
- ▸ The proposed full-stack domain-enhanced LLM workflow addresses hallucinations in general-purpose LLMs for combustion science.
- ▸ The workflow integrates automated domain corpus construction, incremental pre-training, instruction fine-tuning, and verifiable reward-based reinforcement learning.
- ▸ The model outperforms state-of-the-art general-purpose models and traditional retrieval-augmented generation methods on combustion science reasoning tasks.
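The abstract does not specify how the "verifiable rewards" are computed. As one illustration of the general idea, a reward checker could programmatically verify that a model-generated reaction equation conserves atoms of every element, awarding reward only when the physics checks out. The equation format, function names, and binary reward scheme below are hypothetical sketches, not details from the paper:

```python
import re
from collections import Counter

def atom_counts(side: str) -> Counter:
    """Count atoms on one side of a reaction, e.g. 'CH4 + 2 O2'."""
    total = Counter()
    for term in side.split("+"):
        # Split an optional stoichiometric coefficient from the formula.
        m = re.match(r"(\d*)\s*(.+)", term.strip())
        coeff = int(m.group(1)) if m.group(1) else 1
        # Tally each element symbol with its subscript (default 1).
        for elem, num in re.findall(r"([A-Z][a-z]?)(\d*)", m.group(2)):
            total[elem] += coeff * (int(num) if num else 1)
    return total

def conservation_reward(equation: str) -> float:
    """Binary reward: 1.0 if every element is conserved across '->', else 0.0."""
    lhs, rhs = equation.split("->")
    return 1.0 if atom_counts(lhs) == atom_counts(rhs) else 0.0

# Balanced methane combustion passes; the unbalanced variant is rejected.
print(conservation_reward("CH4 + 2 O2 -> CO2 + 2 H2O"))  # 1.0
print(conservation_reward("CH4 + O2 -> CO2 + H2O"))      # 0.0
```

A checker of this kind is deterministic and cheap, which is what makes the reward "verifiable": unlike a learned reward model, it cannot be satisfied by fluent but physically inconsistent text.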
Merits
Strength in Addressing Hallucinations
The proposed workflow effectively addresses the issue of hallucinations in general-purpose LLMs, ensuring the model internalizes physical laws and adheres to conservation laws.
Standardized Evaluation Benchmark
The authors release FlameBench, a standardized evaluation benchmark for complex reasoning tasks in combustion science, providing a reliable assessment of the model's performance.
Demerits
Limited Generalizability
The model's performance may not generalize well to other complex scientific domains, limiting its applicability beyond combustion science.
High Computational Requirements
The proposed workflow requires significant computational resources, which may be a barrier to adoption in resource-constrained environments.
Expert Commentary
The authors' proposal of a full-stack domain-enhanced LLM workflow for combustion science is a significant contribution to the field of AI and scientific research. The integration of automated domain corpus construction, incremental pre-training, instruction fine-tuning, and verifiable reward-based reinforcement learning addresses the limitations of general-purpose LLMs and ensures the model internalizes physical laws. However, the model's performance may not generalize well to other complex scientific domains, and the high computational requirements may be a barrier to adoption. Nonetheless, the work provides a solid foundation for the development of domain-specific scientific research agents with reliable scientific reasoning capabilities.
Recommendations
- ✓ Future research should focus on adapting the proposed workflow for other complex scientific domains, addressing the limitations of general-purpose LLMs in those areas.
- ✓ The development of more efficient and computationally feasible approaches to verifiable reward-based reinforcement learning is essential for widespread adoption of domain-specific scientific research agents.
Sources
Original: arXiv - cs.AI