PA-Net: Precipitation-Adaptive Mixture-of-Experts for Long-Tail Rainfall Nowcasting
arXiv:2603.13818v1 Abstract: Precipitation nowcasting is vital for flood warning, agricultural management, and emergency response, yet two bottlenecks persist: the prohibitive cost of modeling million-scale spatiotemporal tokens from multi-variate atmospheric fields, and the extreme long-tailed rainfall distribution, in which heavy-to-torrential events, those of greatest societal impact, constitute fewer than 0.1% of all samples. We propose the Precipitation-Adaptive Network (PA-Net), a Transformer framework whose computational budget is explicitly governed by rainfall intensity. Its core component, Precipitation-Adaptive MoE (PA-MoE), dynamically scales the number of activated experts per token according to local precipitation magnitude, channeling richer representational capacity toward the rare yet critical heavy-rainfall tail. A Dual-Axis Compressed Latent Attention mechanism factorizes spatiotemporal attention with convolutional reduction to manage massive context lengths, while an intensity-aware training protocol progressively amplifies learning signals from extreme-rainfall samples. Experiments on ERA5 demonstrate consistent improvements over state-of-the-art baselines, with particularly significant gains in heavy-rain and rainstorm regimes.
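To make the PA-MoE idea concrete, here is a minimal NumPy sketch of intensity-dependent top-k routing. Everything specific in it is an assumption rather than the paper's design: the function names `experts_per_token` and `adaptive_moe`, the thresholds in `THRESHOLDS_MM_H`, the expert counts in `K_PER_BIN`, the softmax router, and the single-layer toy experts. It also assumes a per-token precipitation magnitude is available at routing time (for example, from the latest observed frame); the abstract does not say how PA-Net obtains it.

```python
import numpy as np

# Hypothetical intensity bins (mm/h) and expert counts per bin; the abstract
# does not give PA-MoE's actual mapping from precipitation magnitude to k.
THRESHOLDS_MM_H = np.array([0.1, 2.5, 8.0, 16.0])
K_PER_BIN = np.array([1, 2, 3, 4, 6])  # one entry per bin: len(THRESHOLDS_MM_H) + 1


def experts_per_token(precip):
    """Map each token's precipitation magnitude to its number of active experts."""
    return K_PER_BIN[np.digitize(precip, THRESHOLDS_MM_H)]


def adaptive_moe(x, precip, gate_w, expert_ws):
    """Route each token to its own top-k experts, where k depends on precipitation.

    x: (N, D) tokens, precip: (N,) in mm/h, gate_w: (D, E), expert_ws: (E, D, D).
    """
    logits = x @ gate_w                      # (N, E) router scores
    ks = experts_per_token(precip)           # (N,) per-token expert counts
    out = np.zeros_like(x)
    for i in range(x.shape[0]):              # per-token loop for clarity, not speed
        top = np.argsort(logits[i])[::-1][: ks[i]]   # indices of the k best-scoring experts
        g = np.exp(logits[i, top] - logits[i, top].max())
        g /= g.sum()                         # softmax over the selected experts only
        for weight, e in zip(g, top):
            out[i] += weight * np.tanh(x[i] @ expert_ws[e])  # toy expert: dense + tanh
    return out, ks


rng = np.random.default_rng(0)
N, D, E = 6, 16, 8
x = rng.standard_normal((N, D))
precip = np.array([0.0, 0.5, 3.0, 10.0, 20.0, 50.0])
gate_w = 0.1 * rng.standard_normal((D, E))
expert_ws = 0.1 * rng.standard_normal((E, D, D))
out, ks = adaptive_moe(x, precip, gate_w, expert_ws)
print(ks)  # [1 2 3 4 6 6]
```

Dry tokens use a single expert while the heaviest use six, so representational capacity concentrates on the tail. A production MoE layer would group tokens by expert and batch the matrix multiplications instead of looping per token.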
Executive Summary
The PA-Net article addresses a critical gap in precipitation nowcasting with a Transformer-based framework that adapts its computational budget to rainfall intensity. Heavy-rainfall events carry the greatest societal impact yet make up fewer than 0.1% of samples, and modeling million-scale spatiotemporal token sets is computationally expensive. PA-Net's core component, PA-MoE, scales the number of activated experts per token with local precipitation magnitude, steering capacity toward these high-impact events. The Dual-Axis Compressed Latent Attention addresses scalability by factorizing spatiotemporal attention and compressing context through convolutional reduction, and an intensity-aware training protocol progressively amplifies learning signals from extreme-rainfall samples. Experiments on the ERA5 reanalysis show consistent improvements over state-of-the-art baselines, with the largest gains in heavy-rain and rainstorm regimes. The work is a meaningful step toward balancing computational efficiency and predictive accuracy in meteorological forecasting.
Key Points
- ▸ PA-Net is a Transformer framework whose computational budget is explicitly governed by rainfall intensity
- ▸ The PA-MoE component sets the number of activated experts per token according to local precipitation magnitude
- ▸ Dual-Axis Compressed Latent Attention factorizes spatiotemporal attention and uses convolutional reduction to keep massive context lengths tractable
- ▸ An intensity-aware training protocol progressively amplifies learning signals from extreme-rainfall samples
Merits
Innovative Adaptivity
PA-Net’s intensity-aware architecture addresses the long-tail distribution challenge by allocating computation according to each token's rainfall intensity, and hence its potential impact, rather than spreading it uniformly across all tokens.
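A back-of-the-envelope calculation shows why intensity-based allocation is cheap when the tail is rare. The 0.1% heavy-rain share below echoes the abstract's upper bound; the two-level policy (`k_light = 1`, `k_heavy = 4`) and the helper name are hypothetical and are not PA-Net's reported settings.

```python
# Hypothetical two-level policy: light-rain tokens use k_light experts,
# heavy-rain tokens use k_heavy. Values are illustrative only.
def mean_experts_per_token(frac_heavy, k_light, k_heavy):
    """Expected number of activated experts per token under a two-level policy."""
    return (1.0 - frac_heavy) * k_light + frac_heavy * k_heavy


adaptive = mean_experts_per_token(0.001, k_light=1, k_heavy=4)
print(f"mean experts per token: {adaptive:.3f}")             # 1.003
print(f"overhead vs. uniform k=1: {adaptive / 1 - 1:.1%}")   # 0.3%
print(f"saving vs. uniform k=4: {1 - adaptive / 4:.1%}")     # 74.9%
```

Under these assumptions, heavy-rain tokens receive four times the expert capacity while average expert compute rises by only about 0.3% over a uniform single-expert model. The estimate ignores router cost and load imbalance across experts.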
Efficient Scalability
By factorizing attention along two axes and compressing context through convolutional reduction, the Dual-Axis Compressed Latent Attention is designed to handle massive spatiotemporal contexts at a fraction of the cost of full joint attention.
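A minimal NumPy sketch of factorized attention with convolutional key/value reduction follows. The abstract does not specify which two axes PA-Net factorizes or how its latent compression is built, so this sketch assumes the axes are time and space, and assumes a non-overlapping strided convolution (kernel size = stride = `r`) that shrinks each frame's keys and values by a factor of r², similar in spirit to spatial-reduction attention. The function names are hypothetical, and query/key/value projections, multiple heads, normalization, and any latent bottleneck are omitted.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention. q: (..., Lq, D); k, v: (..., Lk, D)."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def conv_reduce(x, r, w):
    """Non-overlapping r x r strided conv: (T, H, W, D) -> (T, H//r, W//r, D)."""
    T, H, W, D = x.shape
    assert H % r == 0 and W % r == 0, "H and W must be divisible by r"
    patches = x.reshape(T, H // r, r, W // r, r, D).transpose(0, 1, 3, 2, 4, 5)
    return patches.reshape(T, H // r, W // r, r * r * D) @ w  # w: (r*r*D, D) kernel


def dual_axis_attention(x, r, w_reduce):
    T, H, W, D = x.shape
    # Axis 1 (time): each spatial location attends over its own T time steps.
    xt = x.reshape(T, H * W, D).transpose(1, 0, 2)             # (HW, T, D)
    x = x + attention(xt, xt, xt).transpose(1, 0, 2).reshape(T, H, W, D)
    # Axis 2 (space): each frame's HW queries attend to HW / r^2 compressed keys/values.
    q = x.reshape(T, H * W, D)
    kv = conv_reduce(x, r, w_reduce).reshape(T, (H // r) * (W // r), D)
    return x + attention(q, kv, kv).reshape(T, H, W, D)


def score_entries(T, H, W, r):
    """Attention-score entries per layer: full joint vs. dual-axis compressed."""
    full = (T * H * W) ** 2
    temporal = (H * W) * T * T                     # each location attends over T steps
    spatial = T * (H * W) * ((H // r) * (W // r))  # each frame's queries vs. compressed keys
    return full, temporal + spatial


rng = np.random.default_rng(0)
T, H, W, D, r = 4, 8, 8, 16, 2
x = rng.standard_normal((T, H, W, D))
w_reduce = rng.standard_normal((r * r * D, D)) / np.sqrt(r * r * D)
y = dual_axis_attention(x, r, w_reduce)
print(y.shape)                        # (4, 8, 8, 16)
print(score_entries(12, 64, 64, 4))   # (2415919104, 13172736)
```

For a 12-frame 64×64 token grid with r = 4, the factorized form needs about 13 million score entries per layer versus about 2.4 billion for full joint attention, roughly 180 times fewer.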
Demerits
Implementation Complexity
Because the number of activated experts varies with the precipitation in each input, per-step compute is data-dependent, which may complicate deployment in real-time operational systems that require deterministic latency.
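With a per-token k, the number of expert evaluations in a forward pass over N tokens ranges from N·k_min to N·k_max depending on how much heavy rain the input contains. One hypothetical mitigation, not described in the source, is a fixed per-batch expert budget, similar in spirit to the expert-capacity limits used in other MoE systems. The sketch below assumes this policy and a hypothetical helper name: it trims the lowest-intensity tokens first so that the total never exceeds the budget.

```python
import numpy as np


def cap_expert_budget(ks, precip, budget, k_min=1):
    """Reduce per-token expert counts so that sum(ks) <= budget.

    Hypothetical policy: trim the lowest-precipitation tokens first, never
    going below k_min experts per token.
    """
    ks = ks.copy()
    if budget < k_min * len(ks):
        raise ValueError("budget cannot cover k_min experts per token")
    excess = int(ks.sum()) - budget
    for i in np.argsort(precip):          # ascending intensity: trim light rain first
        if excess <= 0:
            break
        cut = min(max(int(ks[i]) - k_min, 0), excess)
        ks[i] -= cut
        excess -= cut
    return ks


ks = np.array([2, 2, 4, 6])
precip = np.array([0.5, 1.0, 10.0, 30.0])
capped = cap_expert_budget(ks, precip, budget=10)
print(capped, capped.sum())  # [1 1 2 6] 10
```

The trade-off is that, when the budget binds, moderately heavy tokens lose capacity, which partly works against the tail-focused design; the budget would need tuning against the heaviest expected events.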
Generalizability Concerns
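Evaluation on ERA5 alone leaves open how well the gains transfer to other data sources. ERA5 is a global reanalysis with roughly 0.25° horizontal and hourly temporal resolution, whereas operational nowcasting often relies on higher-resolution radar and station observations, so further validation on such datasets is needed.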
Expert Commentary
PA-Net represents a sophisticated and timely intervention in precipitation nowcasting. Its core innovation is reconciling two persistent obstacles in meteorological forecasting: computational scalability and statistical imbalance. By embedding a conditional computation mechanism that responds to rainfall intensity rather than sample frequency, the authors redirect computational resources toward the most societally relevant events, a departure from conventional models that spend the same computation on every token. The Dual-Axis Compressed Latent Attention is particularly noteworthy for integrating attention factorization with convolutional reduction, aiming to preserve long-range context while containing cost. While the reported ERA5 results are encouraging, the long-term viability of such systems will depend on reproducibility across diverse meteorological datasets and hardware and software platforms. The intensity-aware training protocol, which progressively amplifies learning signals from extreme-rainfall samples, also offers a reusable strategy for imbalance-aware training that could shape future work on adaptive learning in environmental modeling. Overall, PA-Net is a strong example of adaptive, impact-driven architecture design for weather forecasting.
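The abstract says only that the training protocol "progressively amplifies learning signals from extreme-rainfall samples". One plausible reading, assumed here and not confirmed by the source, is an intensity-weighted regression loss whose weights ramp up from uniform over a fixed number of epochs. The thresholds, bin weights, linear ramp, and function names below are hypothetical.

```python
import numpy as np

THRESHOLDS_MM_H = np.array([2.5, 8.0, 16.0])    # hypothetical intensity bins (mm/h)
BIN_WEIGHTS = np.array([1.0, 2.0, 5.0, 10.0])   # hypothetical final weight per bin


def intensity_weights(target, epoch, ramp_epochs):
    """Linearly interpolate from uniform weights (epoch 0) to BIN_WEIGHTS."""
    alpha = min(epoch / ramp_epochs, 1.0)        # 0 -> 1 over ramp_epochs, then held
    final = BIN_WEIGHTS[np.digitize(target, THRESHOLDS_MM_H)]
    return 1.0 + alpha * (final - 1.0)


def weighted_mse(pred, target, epoch, ramp_epochs=10):
    """Weighted mean of squared errors; heavier rain counts more as training proceeds."""
    w = intensity_weights(target, epoch, ramp_epochs)
    return float(np.sum(w * (pred - target) ** 2) / np.sum(w))


target = np.array([0.0, 5.0, 20.0])
pred = np.array([0.0, 4.0, 15.0])
for epoch in (0, 5, 10):
    print(epoch, round(weighted_mse(pred, target, epoch), 3))
# 0 8.667
# 5 17.375
# 10 19.385
```

Dividing by the weight sum makes the loss a weighted average of squared errors, so growing weights shift emphasis toward heavy-rain pixels rather than simply inflating the loss magnitude.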
Recommendations
- ✓ 1. Encourage an open-source release of PA-Net’s architecture to enable comparative evaluation across global meteorological datasets.
- ✓ 2. Fund pilot studies integrating PA-Net into real-time operational nowcasting platforms to assess latency, accuracy, and scalability under live conditions.