ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation

arXiv:2603.02945v1 Announce Type: new Abstract: Model merging aims to combine multiple task-specific expert models into a single model while preserving generalization across diverse tasks. However, interference among experts, especially when they are trained on different objectives, often leads to significant performance degradation. Despite recent progress, resolving this interference without data access, retraining, or architectural modification remains a fundamental challenge. This paper provides a theoretical analysis demonstrating that the input covariance of each task, which is a key factor for optimal merging, can be implicitly estimated from the parameter differences of its fine-tuned model, even in a fully data-free setting. Building on this insight, we introduce ACE-Merging, an Adaptive Covariance Estimation framework that effectively mitigates inter-task interference. Our approach features a principled, closed-form solution that contrasts with prior iterative or heuristic methods. Extensive experiments on both vision and language benchmarks demonstrate that ACE-Merging sets a new state-of-the-art among data-free methods. It consistently outperforms existing baselines; for example, ACE-Merging achieves an average absolute improvement of 4% over previous methods across seven tasks on GPT-2. Owing to its efficient closed-form formulation, ACE-Merging delivers superior performance with a modest computational cost, providing a practical and theoretically grounded solution for model merging.

Executive Summary

This article presents ACE-Merging, a data-free model merging framework that resolves inter-task interference without data access, retraining, or architectural modification. ACE-Merging leverages Adaptive Covariance Estimation, which estimates the input covariance of each task from the parameter differences of its fine-tuned model. The framework outperforms existing data-free baselines, achieving a 4% average absolute improvement across seven tasks on GPT-2. Its efficient closed-form formulation keeps computational cost modest, and the authors' theoretical analysis gives the approach a principled foundation, making ACE-Merging a practical, well-grounded option for combining expert models.

Key Points

  • ACE-Merging is a data-free model merging framework that resolves inter-task interference.
  • ACE-Merging uses Adaptive Covariance Estimation to estimate input covariance from parameter differences.
  • The framework outperforms existing data-free baselines, achieving a 4% average absolute improvement across seven tasks on GPT-2.
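The idea in the points above can be sketched in a few lines of code. The paper's exact estimator is not reproduced in this summary, so the covariance proxy below (built from the task vector Δ_t = W_t − W_0) and all function names are illustrative assumptions, following the generic covariance-weighted closed-form merge used in prior data-free merging work:

```python
import numpy as np

def merge_layer(w_base, w_tasks, eps=1e-3):
    """Closed-form merge of per-task weight matrices for one linear layer.

    Weighted least-squares solution:
        W* = W_0 + (sum_t C_t)^{-1} sum_t C_t @ Delta_t,
    where C_t stands in for the task's input covariance. As a data-free
    proxy (an assumption, not the paper's estimator), C_t is built from
    the parameter difference: C_t = Delta_t @ Delta_t.T + eps * I.
    """
    d_in = w_base.shape[0]
    c_sum = eps * np.eye(d_in)        # small ridge keeps the sum invertible
    weighted = np.zeros_like(w_base)
    for w_t in w_tasks:
        delta = w_t - w_base          # task vector for this layer
        c_t = delta @ delta.T + eps * np.eye(d_in)  # covariance proxy
        c_sum += c_t
        weighted += c_t @ delta
    # Single linear solve per layer: no data, no retraining, no iteration.
    return w_base + np.linalg.solve(c_sum, weighted)

rng = np.random.default_rng(0)
w0 = rng.standard_normal((4, 3))                       # shared base weights
experts = [w0 + 0.1 * rng.standard_normal((4, 3)) for _ in range(2)]
w_merged = merge_layer(w0, experts)
print(w_merged.shape)
```

Tasks whose proxy covariance has large energy along a direction pull the merged weights toward their own update in that direction, which is how covariance weighting mitigates interference relative to plain parameter averaging.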

Merits

Theoretical Foundation

The authors provide a thorough theoretical analysis of ACE-Merging, demonstrating its potential to resolve inter-task interference in a data-free setting.

Efficient Closed-Form Formulation

ACE-Merging's closed-form solution enables efficient computation, making it a practical solution for model merging.
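To make the closed-form claim concrete: covariance-based mergers generally minimize a per-task weighted least-squares objective, which admits a direct solution. The equations below show this standard form (not the paper's exact derivation); ACE-Merging's contribution is estimating the per-task covariance $C_t$ without any data:

```latex
\min_{W} \sum_{t=1}^{T} \operatorname{tr}\!\big[(W - W_t)^{\top} C_t\, (W - W_t)\big]
\quad\Longrightarrow\quad
W^{*} = \Big(\sum_{t=1}^{T} C_t\Big)^{-1} \sum_{t=1}^{T} C_t\, W_t,
\qquad
C_t \approx \mathbb{E}_{x \sim \mathcal{D}_t}\!\big[x\, x^{\top}\big].
```

Because $W^{*}$ is obtained from a single accumulate-and-solve step per layer, the cost stays modest compared with iterative or search-based merging schemes.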

Superior Performance

The framework outperforms existing baselines, achieving state-of-the-art results on vision and language benchmarks.

Demerits

Limited Domain

The framework's performance and generalizability have only been demonstrated on specific vision and language benchmarks.

Scalability

Although the per-layer solve is efficient, its cost grows with the number of tasks and with layer width, which could limit scalability for very large models or many experts.

Expert Commentary

The article presents a well-structured and thorough investigation of ACE-Merging, a novel data-free model merging framework. The authors' theoretical analysis provides a solid foundation for this approach, and the experimental results demonstrate its potential to resolve inter-task interference in a data-free setting. However, the framework's performance and generalizability have only been demonstrated on specific vision and language benchmarks, and its scalability may be limited by the computational cost of the closed-form solution. Nonetheless, ACE-Merging is a significant contribution to the field of model merging, and its implications for transfer learning and knowledge distillation are worth exploring further.

Recommendations

  • Future research should focus on expanding the scope of ACE-Merging to other domains and applications, as well as exploring its potential for transfer learning and knowledge distillation.
  • Developers should investigate methods to optimize the computational cost of ACE-Merging, potentially using approximation techniques or parallel computing to improve its scalability.
