DC-Merge: Improving Model Merging with Directional Consistency

arXiv:2603.06242v1 Announce Type: new Abstract: Model merging aims to integrate multiple task-adapted models into a unified model that preserves the knowledge of each task. In this paper, we identify that the key to this knowledge retention lies in maintaining the directional consistency of singular spaces between the merged multi-task vector and the individual task vectors. However, this consistency is frequently compromised by two issues: i) an imbalanced energy distribution within task vectors, where a small fraction of singular values dominates the total energy, leading to the neglect of semantically important but weaker components upon merging, and ii) the geometric inconsistency of task vectors in parameter space, which causes direct merging to distort their underlying directional geometry. To address these challenges, we propose DC-Merge, a method for directional-consistent model merging. It first balances the energy distribution of each task vector by smoothing its singular values, ensuring all knowledge components are adequately represented. These energy-balanced vectors are then projected onto a shared orthogonal subspace to align their directional geometries with minimal reconstruction error. Finally, the aligned vectors are aggregated in the shared orthogonal subspace and projected back to the original parameter space. Extensive experiments on vision and vision-language benchmarks show that DC-Merge consistently achieves state-of-the-art performance in both full fine-tuning and LoRA settings. The implementation code is available at https://github.com/Tobeginwith/DC-Merge.

Executive Summary

The article introduces DC-Merge, a model-merging method that prioritizes directional consistency between the merged multi-task vector and the individual task vectors. By addressing imbalanced energy distribution and geometric inconsistency, DC-Merge ensures that all knowledge components, including weaker but semantically important ones, are represented during merging. The approach smooths each task vector's singular values, projects the vectors onto a shared orthogonal subspace to align their directional geometries, aggregates them there, and projects the result back to parameter space. Experiments on vision and vision-language benchmarks show state-of-the-art performance in both full fine-tuning and LoRA settings.
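The first step, singular-value smoothing, can be illustrated with a short sketch. The abstract does not specify the smoothing function, so this uses a hypothetical power-law flattening controlled by a `gamma` exponent, rescaled so the total spectral energy of the task vector is preserved:

```python
import numpy as np

def smooth_singular_values(W, gamma=0.5):
    """Flatten the energy distribution of a task vector's singular values.

    gamma < 1 compresses the spectrum's dynamic range so weaker components
    carry more relative energy. This power-law form is a hypothetical choice;
    the paper's exact smoothing function is not given in the abstract. The
    result is rescaled to preserve total energy (sum of squared singular values).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_smooth = s ** gamma
    # Rescale so the smoothed spectrum carries the same total energy.
    s_smooth *= np.sqrt((s ** 2).sum() / (s_smooth ** 2).sum())
    return U @ np.diag(s_smooth) @ Vt
```

With `gamma=1` the task vector is returned unchanged; smaller values of `gamma` progressively equalize the singular values.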

Key Points

  • DC-Merge prioritizes directional consistency between task vectors
  • The method addresses imbalanced energy distribution and geometric inconsistency
  • Experimental results show state-of-the-art performance in vision and vision-language benchmarks
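The subspace-alignment and aggregation steps can be sketched in the same spirit. One simple way to obtain a shared orthogonal subspace with low reconstruction error is to take the leading left singular vectors of the column-stacked task vectors; the paper's actual construction may differ, so treat this as an assumption:

```python
import numpy as np

def merge_in_shared_subspace(task_vectors, rank):
    """Merge task vectors in a shared orthogonal subspace (hypothetical sketch).

    task_vectors: list of (d, k) weight-delta matrices for one layer.
    rank: dimension of the shared subspace (rank <= d).
    """
    # Build an orthonormal basis from the column-stacked task vectors.
    stacked = np.concatenate(task_vectors, axis=1)          # (d, k * num_tasks)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    B = U[:, :rank]                                         # orthonormal basis (d, rank)
    # Align each task vector by projecting into the shared subspace.
    projected = [B.T @ tv for tv in task_vectors]           # each (rank, k)
    # Aggregate in the shared subspace, then map back to parameter space.
    merged = np.mean(projected, axis=0)
    return B @ merged                                       # (d, k)
```

When `rank` equals the full dimension `d`, the projection is lossless and the result reduces to plain averaging; smaller ranks trade reconstruction error for alignment.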

Merits

Improved Knowledge Retention

DC-Merge effectively preserves the knowledge of each task by maintaining directional consistency, resulting in a more comprehensive and accurate unified model.
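Directional consistency between the merged vector and an individual task vector can be quantified, for instance, via the cosines of the principal angles between their top-k left singular subspaces. This is an illustrative metric only; the paper may define consistency differently:

```python
import numpy as np

def directional_consistency(merged, task_vector, k=5):
    """Cosines of principal angles between top-k left singular subspaces.

    Returns k values in [0, 1]; all ones means the subspaces coincide.
    An illustrative measure of directional consistency, not necessarily
    the definition used in the paper.
    """
    Um = np.linalg.svd(merged, full_matrices=False)[0][:, :k]
    Ut = np.linalg.svd(task_vector, full_matrices=False)[0][:, :k]
    # Singular values of Um^T Ut are the cosines of the principal angles.
    return np.linalg.svd(Um.T @ Ut, compute_uv=False)
```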

Demerits

Computational Complexity

The additional steps involved in DC-Merge, such as smoothing singular values and projecting vectors onto a shared subspace, may increase computational complexity and require more resources.

Expert Commentary

DC-Merge targets two concrete failure modes that have limited model-merging performance: dominant singular values crowding out weaker but semantically important components, and geometric misalignment of task vectors in parameter space. By enforcing directional consistency between the merged multi-task vector and the individual task vectors, the resulting unified model better preserves each task's knowledge, and the reported state-of-the-art results across vision and vision-language benchmarks support the approach. Open questions remain, however: the abstract does not report the cost of the per-layer decompositions the method appears to require, nor how the approach scales with model size or the number of merged tasks.

Recommendations

  • Further investigation into the scalability and computational complexity of DC-Merge
  • Exploration of potential applications and extensions of DC-Merge in various domains, such as natural language processing and reinforcement learning
