Cognitively Layered Data Synthesis for Domain Adaptation of LLMs to Space Situational Awareness
arXiv:2603.09231v1. Abstract: Large language models (LLMs) demonstrate exceptional performance on general-purpose tasks; however, transferring them to complex engineering domains such as space situational awareness (SSA) remains challenging owing to insufficient structural alignment with mission chains, the absence of higher-order cognitive supervision, and poor correspondence between data quality criteria and engineering specifications. The core bottleneck is the construction of high-quality supervised fine-tuning (SFT) datasets. To this end, we propose BD-FDG (Bloom's Taxonomy-based Domain-specific Fine-tuning Data Generation), a framework that addresses incomplete knowledge coverage, shallow cognitive depth, and limited quality controllability through three mechanisms: structured knowledge organization, cognitively layered question modeling, and automated quality control. The framework uses a knowledge tree to ensure structured corpus coverage, designs a question generation scheme spanning nine categories and six cognitive levels from Remember to Create to produce samples with a continuous difficulty gradient, and applies a multidimensional scoring pipeline to enforce domain rigor and consistency. Using BD-FDG, we construct SSA-SFT, a domain dataset of approximately 230K samples, and fine-tune Qwen3-8B to obtain SSA-LLM-8B. Experiments show that SSA-LLM-8B achieves relative BLEU-1 improvements of 144% (no-think) and 176% (think) on the domain test set and a win rate of 82.21% over the baseline in arena comparisons, while largely preserving general benchmark performance (MMLU-Pro, MATH-500). These results validate SFT data construction driven by cognitive layering as an effective paradigm for complex engineering domains and provide a transferable framework for domain-specific LLM adaptation.
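The combination of a knowledge tree with six cognitive levels can be pictured as enumerating one generation task per (topic, level) pair. The following is a minimal sketch under stated assumptions: the `KnowledgeNode` class, the function names, and the example topics are hypothetical and not the authors' implementation. The level names are those of the revised Bloom's taxonomy; the paper's nine question categories are left out because the abstract does not name them.

```python
from dataclasses import dataclass, field
from typing import Iterator

# The six levels of the revised Bloom's taxonomy, lowest to highest.
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


@dataclass
class KnowledgeNode:
    """One topic in a hypothetical knowledge tree."""
    name: str
    children: list["KnowledgeNode"] = field(default_factory=list)


def leaf_paths(node: KnowledgeNode, prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield the root-to-leaf path of every leaf topic."""
    path = prefix + (node.name,)
    if not node.children:
        yield path
        return
    for child in node.children:
        yield from leaf_paths(child, path)


def generation_tasks(root: KnowledgeNode) -> list[dict]:
    """Pair every leaf topic with every cognitive level."""
    return [
        {"topic": " > ".join(path), "level": level, "difficulty": rank}
        for path in leaf_paths(root)
        for rank, level in enumerate(BLOOM_LEVELS, start=1)
    ]


# Illustrative topics only; they are not taken from the paper.
tree = KnowledgeNode("SSA", [
    KnowledgeNode("Orbit determination", [KnowledgeNode("Two-line element sets")]),
    KnowledgeNode("Conjunction assessment"),
])
tasks = generation_tasks(tree)
```

Each task dictionary could then be rendered into a prompt for a generator model. Covering every leaf gives the structured coverage, and sorting by `difficulty` gives a gradient from Remember to Create.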
Executive Summary
This article proposes BD-FDG (Bloom's Taxonomy-based Domain-specific Fine-tuning Data Generation), a framework for adapting large language models (LLMs) to complex engineering domains such as space situational awareness (SSA). It targets three weaknesses of existing SFT data (incomplete knowledge coverage, shallow cognitive depth, and limited quality controllability) with three mechanisms: structured knowledge organization, cognitively layered question modeling, and automated quality control. Using BD-FDG, the authors construct SSA-SFT, a domain dataset of approximately 230K samples, and fine-tune Qwen3-8B to obtain SSA-LLM-8B. The abstract reports relative BLEU-1 improvements of 144% (no-think) and 176% (think) on the domain test set and an 82.21% win rate over the baseline in arena comparisons, with general benchmark performance (MMLU-Pro, MATH-500) largely preserved. The authors present these results as evidence that the framework can transfer to other domain-specific adaptation tasks.
Key Points
- ▸ The article proposes BD-FDG, a framework for generating SFT data that adapts LLMs to complex engineering domains.
- ▸ The framework targets incomplete knowledge coverage, shallow cognitive depth, and limited quality controllability.
- ▸ The authors construct SSA-SFT, a domain dataset of approximately 230K samples, and report relative BLEU-1 improvements of 144% (no-think) and 176% (think) for the fine-tuned SSA-LLM-8B.
Merits
Strength in Addressing Domain-Specific Challenges
The framework pairs each identified weakness with a concrete mechanism: a knowledge tree for structured corpus coverage, question generation across nine categories and six cognitive levels for cognitive depth, and a multidimensional scoring pipeline for quality control.
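The automated quality control step can be pictured as a threshold filter over several scoring dimensions. This is only a sketch: the dimension names, the threshold values, and the `Sample` class are assumptions made for illustration, since the abstract does not list the dimensions or say how scores are combined.

```python
from dataclasses import dataclass

# Hypothetical dimensions and minimum scores; the abstract does not specify them.
THRESHOLDS = {"accuracy": 0.8, "relevance": 0.7, "consistency": 0.7}


@dataclass
class Sample:
    question: str
    answer: str
    scores: dict[str, float]  # one score in [0, 1] per dimension


def passes(sample: Sample, thresholds: dict[str, float] = THRESHOLDS) -> bool:
    """Keep a sample only if every dimension meets its minimum score.

    A missing dimension counts as a score of 0.0, so unscored samples are rejected.
    """
    return all(sample.scores.get(dim, 0.0) >= minimum for dim, minimum in thresholds.items())


def filter_samples(samples: list[Sample]) -> list[Sample]:
    """Return the samples that pass every threshold, preserving order."""
    return [s for s in samples if passes(s)]


candidates = [
    Sample("Q1", "A1", {"accuracy": 0.9, "relevance": 0.8, "consistency": 0.75}),
    Sample("Q2", "A2", {"accuracy": 0.9, "relevance": 0.5, "consistency": 0.9}),
    Sample("Q3", "A3", {"accuracy": 0.95}),
]
kept = filter_samples(candidates)
```

Requiring every dimension to pass, instead of averaging the scores, keeps a high score on one dimension from masking a failure on another. Whether the paper's pipeline works this way is not stated in the abstract.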
Transferable Framework for Domain-Specific LLM Adaptation
The authors present BD-FDG as a transferable framework for domain-specific LLM adaptation, which, if it holds, matters for applying LLMs in other complex engineering domains. The abstract reports results for SSA only.
Quantifiable Performance Improvements
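The abstract reports relative BLEU-1 improvements of 144% (no-think) and 176% (think) on the domain test set and an 82.21% win rate over the baseline in arena comparisons, while general benchmark performance (MMLU-Pro, MATH-500) is largely preserved.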
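To make the reported metrics concrete, the sketch below shows how a sentence-level BLEU-1 score, a relative improvement, and a win rate are commonly computed. It assumes whitespace tokenization, a single reference, and a win rate counted over all comparisons; the paper's exact evaluation setup is not given in the abstract, and the scores in the example are invented for illustration.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Clipped unigram precision multiplied by the BLEU brevity penalty."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[tok]) for tok, count in Counter(cand).items())
    precision = clipped / len(cand)
    # Candidates no longer than the reference are penalized (penalty is 1 at equal length).
    penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return penalty * precision


def relative_improvement(baseline: float, tuned: float) -> float:
    """Percentage gain of the tuned score over the baseline score."""
    return 100.0 * (tuned - baseline) / baseline


def win_rate(wins: int, comparisons: int) -> float:
    """Percentage of pairwise comparisons won."""
    return 100.0 * wins / comparisons


# Invented scores: a 144% relative improvement means 2.44 times the baseline.
gain = relative_improvement(baseline=0.10, tuned=0.244)
```

A relative improvement above 100% means the score more than doubled, so the reported figures say more about the ratio between the two models than about the absolute BLEU-1 level of either.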
Demerits
Limited Dataset Size
At approximately 230K samples, SSA-SFT may be small relative to SFT datasets in other domains, which could affect the robustness of the results.
Dependence on Specific LLM Model
The results are reported for a single base model (Qwen3-8B), which may limit the generalizability of the findings.
Limited Exploration of Alternative Frameworks
The article focuses on BD-FDG; comparisons against alternative data-generation frameworks would help place its performance in context.
Expert Commentary
The article presents a framework for adapting LLMs to complex engineering domains. Cognitive layering and structured knowledge organization give a clear methodology for constructing high-quality SFT datasets, and the reported results support its effectiveness on SSA. If the framework transfers as the authors suggest, it could be applied in other domains. However, the limited dataset size and the reliance on a single base model are notable limitations. Overall, the article is a useful contribution to domain-specific LLM adaptation.
Recommendations
- ✓ Future research should explore the application of the BD-FDG framework in other domains and the extension of the framework to accommodate more complex tasks and datasets.
- ✓ The development of more robust and generalizable methodologies for domain adaptation is crucial for the effective application of LLMs in complex engineering domains.