
Qwen-BIM: developing large language model for BIM-based design with domain-specific benchmark and dataset

Jia-Rui Lin, Yun-Hong Cai, Xiang-Rui Ni, Shaojie Zhou, Peng Pan · March 2, 2026

#cs.AI

arXiv:2602.20812v1

Abstract: As the construction industry advances toward digital transformation, BIM (Building Information Modeling)-based design has become a key driver supporting intelligent construction. Although Large Language Models (LLMs) have shown potential in promoting BIM-based design, the lack of specific datasets and LLM evaluation benchmarks has significantly hindered the performance of LLMs. Therefore, this paper addresses this gap by proposing: 1) an evaluation benchmark for BIM-based design together with corresponding quantitative indicators to evaluate the performance of LLMs, 2) a method for generating textual data from BIM and constructing corresponding BIM-derived datasets for LLM evaluation and fine-tuning, and 3) a fine-tuning strategy to adapt LLMs for BIM-based design. Results demonstrate that the proposed domain-specific benchmark effectively and comprehensively assesses LLM capabilities, highlighting that general LLMs are still incompetent for domain-specific tasks. Meanwhile, with the proposed benchmark and datasets, Qwen-BIM is developed and achieves a 21.0% average increase in G-Eval score compared to the base LLM. Notably, with only 14B parameters, the performance of Qwen-BIM is comparable to that of general LLMs with 671B parameters for BIM-based design tasks. Overall, this study develops the first domain-specific LLM for BIM-based design by introducing a comprehensive benchmark and high-quality dataset, which provide a solid foundation for developing BIM-related LLMs in various fields.

Executive Summary

This article proposes Qwen-BIM, a large language model for BIM-based design, addressing the lack of domain-specific datasets and benchmarks. The authors introduce a comprehensive benchmark, a method for generating textual data from BIM, and a fine-tuning strategy. Qwen-BIM achieves a 21.0% average increase in G-Eval score compared to the base model and, with only 14B parameters, performs comparably to general LLMs with 671B parameters on BIM-based design tasks. This study provides a foundation for developing BIM-related LLMs, supporting the construction industry's digital transformation.

Key Points

▸ Introduction of a domain-specific benchmark for BIM-based design
▸ Development of a method for generating textual data from BIM
▸ Proposal of a fine-tuning strategy for adapting LLMs to BIM-based design
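
The second key point, turning BIM content into text, can be illustrated with a minimal sketch. This page does not describe the authors' actual pipeline, so everything below is an assumption: the element records are hypothetical plain dictionaries standing in for data exported from a BIM model, and `element_to_text` and `element_to_qa_pairs` are illustrative names, not functions from the paper.

```python
from typing import Any

# Hypothetical element records, standing in for data exported from a BIM model.
ELEMENTS: list[dict[str, Any]] = [
    {"id": "W-001", "category": "Wall", "level": "Level 1",
     "properties": {"height_m": 3.0, "material": "Concrete"}},
    {"id": "D-014", "category": "Door", "level": "Level 1",
     "properties": {"width_m": 0.9}},
]


def element_to_text(element: dict[str, Any]) -> str:
    """Render one element record as a single descriptive sentence."""
    props = ", ".join(
        f"{name} = {value}" for name, value in sorted(element["properties"].items())
    )
    return f"{element['category']} {element['id']} on {element['level']} has {props}."


def element_to_qa_pairs(element: dict[str, Any]) -> list[dict[str, str]]:
    """Build one question-answer pair per property (assumed record format)."""
    label = f"{element['category'].lower()} {element['id']}"
    return [
        {"question": f"What is the {name} of {label}?", "answer": str(value)}
        for name, value in sorted(element["properties"].items())
    ]


if __name__ == "__main__":
    for el in ELEMENTS:
        print(element_to_text(el))
        for pair in element_to_qa_pairs(el):
            print(pair)
```

Records of this kind could serve either as evaluation items or as fine-tuning examples; how the authors actually structure their BIM-derived datasets is not stated here.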

Merits

Comprehensive Benchmark

The proposed benchmark effectively assesses LLM capabilities, highlighting the need for domain-specific models.

Demerits

Limited Parameter Comparison

Comparing the 14B-parameter Qwen-BIM against general LLMs with 671B parameters may not be entirely fair, because the larger models could also improve with comparable optimization, which the comparison does not account for.
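
The scale gap behind this comparison, and the meaning of an "average increase", can be made concrete with a little arithmetic. The parameter counts (14B and 671B) come from the abstract; the per-task scores below are invented purely for illustration, and this summary does not say which of the two averaging conventions the authors used.

```python
from statistics import mean

# From the abstract: Qwen-BIM has 14B parameters, the compared general LLMs 671B.
param_ratio = 671 / 14  # roughly 48x more parameters in the larger models


def mean_of_relative_gains(base: list[float], tuned: list[float]) -> float:
    """Average the per-task relative gains (one reading of 'average increase')."""
    return mean((t - b) / b for b, t in zip(base, tuned))


def relative_gain_of_means(base: list[float], tuned: list[float]) -> float:
    """Relative gain of the task-averaged scores (another reading)."""
    return (mean(tuned) - mean(base)) / mean(base)


# Hypothetical per-task G-Eval scores, NOT taken from the paper.
base_scores = [2.0, 4.0]
tuned_scores = [3.0, 4.4]

print(f"parameter ratio: {param_ratio:.1f}x")
print(f"mean of relative gains: {mean_of_relative_gains(base_scores, tuned_scores):.1%}")
print(f"relative gain of means: {relative_gain_of_means(base_scores, tuned_scores):.1%}")
```

The two conventions give different numbers for the same scores, so a headline figure such as 21.0% is easiest to interpret when the averaging method is stated alongside it.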

Expert Commentary

The introduction of Qwen-BIM marks a significant step forward in the development of AI solutions for the construction industry. By addressing the lack of domain-specific datasets and benchmarks, the authors provide a foundation for the creation of more effective and efficient BIM-based design tools. The proposed fine-tuning strategy and comprehensive benchmark will likely have a lasting impact on the field, enabling the development of more specialized and accurate LLMs. However, further research is needed to fully explore the potential of Qwen-BIM and its applications in real-world construction projects.

Recommendations

✓ Further evaluation of Qwen-BIM in real-world construction projects to assess its practical effectiveness
✓ Exploration of potential applications of Qwen-BIM in related fields, such as architecture and urban planning

Sources

arXiv - cs.AI

Qwen-BIM: developing large language model for BIM-based design with domain-specific benchmark and dataset


