FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models

Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang

arXiv:2604.01762v1. Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets. However, standard PEFT methods often struggle in multi-task fine-tuning settings, where diverse optimization objectives induce task interference and limited parameter budgets lead to representational deficiency. While recent approaches incorporate mixture-of-experts (MoE) to alleviate these issues, they predominantly operate in the spatial domain, which may introduce structural redundancy and parameter overhead. To overcome these limitations, we reformulate adaptation in the spectral domain. Our spectral analysis reveals that different tasks exhibit distinct frequency energy distributions, and that LLM layers display heterogeneous frequency sensitivities. Motivated by these insights, we propose FourierMoE, which integrates the MoE architecture with the inverse discrete Fourier transform (IDFT) for frequency-aware adaptation. Specifically, FourierMoE employs a frequency-adaptive router to dispatch tokens to experts specialized in distinct frequency bands. Each expert learns a set of conjugate-symmetric complex coefficients, preserving complete phase and amplitude information while theoretically guaranteeing lossless IDFT reconstruction into real-valued spatial weights. Extensive evaluations across 28 benchmarks, multiple model architectures, and scales demonstrate that FourierMoE consistently outperforms competitive baselines in both single-task and multi-task settings while using significantly fewer trainable parameters. These results highlight the promise of spectral-domain expert adaptation as an effective and parameter-efficient paradigm for LLM fine-tuning.
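The abstract's lossless-reconstruction claim rests on a standard Fourier identity: the inverse DFT of a conjugate-symmetric complex spectrum is exactly real-valued. The following is a minimal NumPy sketch of that property only; the array sizes and variable names are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # length of the real-valued spatial weight vector

# An expert's learnable parameters: complex coefficients for the
# non-negative frequency bins (n//2 + 1 of them for a length-n signal).
half = rng.standard_normal(n // 2 + 1) + 1j * rng.standard_normal(n // 2 + 1)

# Conjugate symmetry requires X[k] = conj(X[n-k]), so the DC bin (k=0)
# and, for even n, the Nyquist bin (k=n/2) must be purely real.
half[0] = half[0].real
half[-1] = half[-1].real

# Build the full spectrum by mirroring conjugates into the negative bins.
spectrum = np.concatenate([half, np.conj(half[-2:0:-1])])

# The inverse DFT of a conjugate-symmetric spectrum is real-valued,
# so discarding the (numerically zero) imaginary part loses nothing.
weights = np.fft.ifft(spectrum)
assert np.allclose(weights.imag, 0.0)

# Round trip: the forward DFT recovers the coefficients losslessly.
assert np.allclose(np.fft.fft(weights.real)[: n // 2 + 1], half)
```

This is why the experts can parameterize only the non-negative-frequency half of the spectrum: the other half is determined by symmetry, halving the effective parameter count while still reconstructing real spatial weights exactly.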

Executive Summary

The article FourierMoE introduces a spectral-domain adaptation framework for large language models (LLMs) that integrates mixture-of-experts (MoE) with the inverse discrete Fourier transform (IDFT). Traditional PEFT methods face challenges in multi-task settings due to task interference and parameter constraints. FourierMoE addresses these by reformulating adaptation in the spectral domain, leveraging distinct frequency energy distributions across tasks and heterogeneous frequency sensitivities across LLM layers. By employing a frequency-adaptive router and conjugate-symmetric complex coefficients, the method enables frequency-aware, lossless adaptation with reduced parameter overhead. Empirical evaluations across 28 benchmarks show that FourierMoE consistently outperforms competitive baselines in both single- and multi-task scenarios while achieving significant parameter efficiency. This represents a meaningful shift toward spectral-domain expert adaptation as a viable, scalable solution for LLM fine-tuning.

Key Points

  • Spectral-domain reformulation of MoE for LLM adaptation
  • Utilization of IDFT to address task interference and parameter constraints
  • Empirical validation demonstrating superior performance across multiple benchmarks with fewer parameters
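The abstract describes the frequency-adaptive router only at a high level (tokens are dispatched to experts specialized in distinct frequency bands). One plausible way to realize such routing is to score each token by the spectral energy it carries in each band; the sketch below is a hypothetical top-1 router built on that heuristic, and all names, sizes, and the band-energy scoring rule are assumptions rather than the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_experts = 16, 4  # hidden size and number of frequency-band experts

# A small batch of token hidden states (purely illustrative values).
tokens = rng.standard_normal((5, d))

# Take each token's spectrum over the non-negative frequency bins and
# split those bins into n_experts contiguous bands.
spectra = np.fft.rfft(tokens, axis=-1)              # shape (5, d//2 + 1)
bands = np.array_split(np.abs(spectra) ** 2, n_experts, axis=-1)
band_energy = np.stack([b.sum(axis=-1) for b in bands], axis=-1)

# A softmax over band energies yields routing weights; top-1 routing
# sends each token to the expert owning its dominant frequency band.
logits = band_energy - band_energy.max(axis=-1, keepdims=True)
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
expert_ids = probs.argmax(axis=-1)                  # one expert per token
```

In practice a learned gating network would likely replace the raw energy scores, but the sketch shows the core idea the key points allude to: routing decisions conditioned on where a token's energy sits in the frequency spectrum.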

Merits

Innovation

FourierMoE introduces a novel spectral-domain approach that shifts the adaptation paradigm from spatial to frequency-aware mechanisms, tailoring updates to task-specific frequency profiles with greater precision and efficiency.

Empirical Validation

The extensive benchmarking across diverse models and scales provides robust evidence of the method’s superiority, enhancing credibility and applicability in real-world deployment scenarios.

Demerits

Complexity

The implementation of frequency-adaptive routing and complex coefficient handling may introduce additional computational overhead or require specialized expertise for deployment, potentially complicating adoption in non-expert environments.

Generalization Concerns

While performance is validated across 28 benchmarks, the extent to which these results generalize to entirely novel or unseen task domains remains an open question, warranting further empirical scrutiny.

Expert Commentary

FourierMoE represents a sophisticated and timely contribution to the field of LLM adaptation. The shift from spatial to spectral domain adaptation is not merely a methodological tweak; it aligns adaptation with the frequency structure of neural network representations. The use of conjugate-symmetric complex coefficients as a mechanism to preserve phase and amplitude information while enabling lossless reconstruction is particularly noteworthy, demonstrating a careful application of signal processing theory to neural architectures. The claim of theoretically lossless IDFT reconstruction is a strong anchor for credibility.

However, one must consider the practical implications of deploying these complex coefficients in production systems, particularly in latency-sensitive applications. While the paper cites extensive evaluations, the real-world scalability of this approach, especially under high-throughput inference demands, requires further validation. Nonetheless, this work sets a new benchmark for combining signal processing principles with LLMs, and I anticipate it will catalyze a wave of research exploring spectral-domain adaptations across other domains, from computer vision to audio AI.

Recommendations

  • Researchers should replicate FourierMoE’s methodology in alternative neural architectures beyond LLMs, such as vision transformers and reinforcement learning agents, to assess cross-domain applicability.
  • Industry practitioners should evaluate FourierMoE in hybrid PEFT pipelines alongside existing spatial MoE variants to determine optimal integration strategies for production-grade systems.

Sources

Original: arXiv - cs.LG