Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions
arXiv:2603.09938v1 Announce Type: new Abstract: Model merging has emerged as a transformative paradigm for combining the capabilities of multiple neural networks into a single unified model without additional training. With the rapid proliferation of fine-tuned large language models (LLMs), merging techniques offer a computationally efficient alternative to ensembles and full retraining, enabling practitioners to compose specialized capabilities at minimal cost. This survey presents a comprehensive and structured examination of model merging in the LLM era through the FUSE taxonomy, a four-dimensional framework organized along Foundations, Unification Strategies, Scenarios, and Ecosystem. We first establish the theoretical underpinnings of merging, including loss landscape geometry, mode connectivity, and the linear mode connectivity hypothesis. We then systematically review the algorithmic landscape, spanning weight averaging, task vector arithmetic, sparsification-enhanced methods, mixture-of-experts architectures, and evolutionary optimization approaches. For each method family, we analyze the core formulation, highlight representative works, and discuss practical trade-offs. We further examine downstream applications across multi-task learning, safety alignment, domain specialization, multilingual transfer, and federated learning. Finally, we survey the supporting ecosystem of open-source tools, community platforms, and evaluation benchmarks, and identify key open challenges including theoretical gaps, scalability barriers, and standardization needs. This survey aims to equip researchers and practitioners with a structured foundation for advancing model merging.
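To make the linear mode connectivity hypothesis mentioned in the abstract concrete: two fine-tuned checkpoints are linearly mode connected when the loss stays roughly flat along the straight line between them in weight space. The sketch below measures the loss barrier along that path; the function names and the toy loss are illustrative assumptions, not definitions from the survey.

```python
import numpy as np

def interpolate(theta_a, theta_b, alpha):
    """Linear path theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b,
    applied parameter-by-parameter to state-dict-like mappings."""
    return {k: (1 - alpha) * theta_a[k] + alpha * theta_b[k] for k in theta_a}

def loss_barrier(theta_a, theta_b, loss_fn, num_points=11):
    """Worst interpolated loss minus the average endpoint loss.
    A near-zero barrier is the empirical signature of linear mode
    connectivity, and it is what makes naive weight averaging viable."""
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = [loss_fn(interpolate(theta_a, theta_b, a)) for a in alphas]
    endpoint_avg = 0.5 * (losses[0] + losses[-1])
    return max(losses) - endpoint_avg
```

In practice `loss_fn` would evaluate a model on held-out data; a high barrier signals that the two checkpoints sit in different loss basins and that naive interpolation will degrade both.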
Executive Summary
This comprehensive survey provides a structured framework for advancing model merging, a transformative paradigm in the era of large language models. The FUSE taxonomy organizes the examination along four dimensions: foundations, unification strategies, scenarios, and ecosystem. The survey systematically reviews algorithmic methods, including weight averaging, task vector arithmetic, and mixture-of-experts architectures, and examines downstream applications across multi-task learning, safety alignment, and domain specialization. It also surveys the supporting ecosystem of open-source tools and evaluation benchmarks, and highlights key open challenges. The goal is to equip researchers and practitioners with a solid foundation for advancing model merging across its growing range of applications.
Key Points
- ▸ Model merging emerges as a transformative paradigm for combining multiple neural networks without additional training.
- ▸ The FUSE taxonomy provides a structured framework for examining model merging, covering foundations, unification strategies, scenarios, and ecosystem.
- ▸ The survey systematically reviews algorithmic methods, including weight averaging, task vector arithmetic, and mixture-of-experts architectures.
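The two simplest formulations named in these key points can be sketched in a few lines. The function names, the uniform averaging choice, and the scaling coefficient below are illustrative assumptions, not formulations taken from the survey: uniform weight averaging takes the element-wise mean of fine-tuned checkpoints, while task vector arithmetic adds scaled task vectors tau_i = theta_ft_i - theta_base back onto the base model.

```python
import numpy as np

def weight_average(state_dicts):
    """Uniform weight averaging: element-wise mean of each parameter
    tensor across fine-tuned checkpoints (state-dict-like mappings)."""
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

def task_arithmetic(base, finetuned_models, lam=0.4):
    """Task vector arithmetic: tau_i = theta_ft_i - theta_base, and
    merged = theta_base + lam * sum_i tau_i, where lam is a tunable
    scaling coefficient."""
    merged = {}
    for k in base:
        task_vectors = [ft[k] - base[k] for ft in finetuned_models]
        merged[k] = base[k] + lam * np.sum(task_vectors, axis=0)
    return merged
```

Both operate purely on weights, which is why merging needs no additional training; the choice of `lam` is typically tuned on held-out validation data.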
Merits
Comprehensive Coverage
The survey covers a wide range of algorithmic methods and downstream applications, providing a comprehensive understanding of the paradigm.
Structured Framework
The FUSE taxonomy offers a structured framework for examining model merging, making it easier for researchers and practitioners to navigate and advance the field.
Current State of Practice
The survey provides a current state-of-practice assessment, highlighting key open challenges and areas for future research.
Demerits
Theoretical Gaps
The survey highlights theoretical gaps in model merging, which may hinder future progress in the field.
Scalability Barriers
The survey notes scalability barriers in model merging, which may limit its adoption in large-scale applications.
Standardization Needs
The survey emphasizes the need for standardization in model merging, which may be challenging to achieve given the complexity of the paradigm.
Expert Commentary
The survey delivers a comprehensive, well-structured examination of model merging in the LLM era, with the FUSE taxonomy (foundations, unification strategies, scenarios, and ecosystem) serving as its organizing backbone. Its systematic review of algorithmic families (weight averaging, task vector arithmetic, mixture-of-experts architectures) and downstream applications (multi-task learning, safety alignment, domain specialization) gives practitioners a practical map of the field. At the same time, the open challenges it identifies, including theoretical gaps, scalability barriers, and standardization needs, remain unresolved; advancing model merging will require researchers and practitioners to develop more robust and scalable methods that address them.
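As a rough illustration of the sparsification-enhanced family the survey covers, the sketch below trims each task vector to its largest-magnitude entries and resolves sign conflicts before averaging. This is a simplified toy loosely inspired by TIES-style trim-and-elect merging, not a faithful reimplementation of any published method; all names and the density/scaling defaults are assumptions.

```python
import numpy as np

def trim_task_vector(tau, density=0.2):
    """Keep only the top `density` fraction of entries by magnitude,
    zeroing the rest (ties at the threshold may keep a few extra)."""
    flat = np.abs(tau).ravel()
    k = max(1, int(density * flat.size))
    thresh = np.partition(flat, -k)[-k]
    return np.where(np.abs(tau) >= thresh, tau, 0.0)

def sparsified_merge(base, finetuned, density=0.2, lam=1.0):
    """Trim each task vector, elect a per-parameter sign by total
    magnitude, then average only the entries agreeing with that sign."""
    merged = {}
    for name in base:
        taus = [trim_task_vector(ft[name] - base[name], density)
                for ft in finetuned]
        stacked = np.stack(taus)
        sign = np.sign(np.sum(stacked, axis=0))          # elected sign
        agree = (np.sign(stacked) == sign) & (stacked != 0)
        total = np.where(agree, stacked, 0.0).sum(axis=0)
        count = np.maximum(agree.sum(axis=0), 1)          # avoid /0
        merged[name] = base[name] + lam * total / count
    return merged
```

The intuition, per the survey's framing, is that most task-vector entries are redundant noise; dropping them and discarding sign-conflicting contributions reduces interference when many specialized models are composed.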
Recommendations
- ✓ Develop More Robust and Scalable Methods: Researchers and practitioners should prioritize developing more robust and scalable methods for model merging, addressing theoretical gaps, scalability barriers, and standardization needs.
- ✓ Invest in Explainability Research: Investing in explainability research is essential for developing more transparent and trustworthy models, which is critical for model merging applications.