Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts
arXiv:2603.10095v1 Announce Type: new Abstract: Time-series forecasting often faces challenges from non-stationarity, particularly distributional drift, where the data distribution evolves over time. This dynamic behavior can undermine the effectiveness of adaptive optimizers, such as Adam, which are typically designed for stationary objectives. In this paper, we revisit Adam in the context of non-stationary forecasting and identify that its second-order bias correction limits responsiveness to shifting loss landscapes. To address this, we propose TS_Adam, a lightweight variant that removes the second-order correction from the learning rate computation. This simple modification improves adaptability to distributional drift while preserving the optimizer's core structure and requiring no additional hyperparameters. TS_Adam integrates easily into existing models and consistently improves performance across long- and short-term forecasting tasks. On the ETT datasets with the MICN model, it achieves an average reduction of 12.8% in MSE and 5.7% in MAE compared to Adam. These results underscore the practicality and versatility of TS_Adam as an effective optimization strategy for real-world forecasting scenarios involving non-stationary data. Code is available at: https://github.com/DD-459-1/TS_Adam.
Executive Summary
This paper proposes a modified version of the Adam optimizer, TS_Adam, to address the challenges of non-stationarity in time-series forecasting. The authors remove Adam's second-order (second-moment) bias correction from the learning-rate computation to improve adaptability to distributional drift. TS_Adam requires no additional hyperparameters and integrates seamlessly into existing models, improving performance across a range of forecasting tasks. On the ETT datasets with the MICN model, it achieves an average reduction of 12.8% in mean squared error (MSE) and 5.7% in mean absolute error (MAE) compared to Adam. This modification has the potential to enhance the effectiveness of time-series forecasting models in real-world scenarios.
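From the abstract's description, the modification can be reconstructed in standard Adam notation (this is a reconstruction, not the paper's own formulas). Adam maintains exponential moving averages of the gradient and its square:

```latex
% Moment estimates (shared by Adam and TS_Adam)
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2

% Adam update: both moments are bias-corrected
\theta_{t+1} = \theta_t - \eta \, \frac{m_t / (1 - \beta_1^t)}{\sqrt{v_t / (1 - \beta_2^t)} + \epsilon}

% TS_Adam (as described in the abstract): the second-moment correction is dropped
\theta_{t+1} = \theta_t - \eta \, \frac{m_t / (1 - \beta_1^t)}{\sqrt{v_t} + \epsilon}
```

One plausible reading of the claimed benefit: since $v_t$ underestimates the true second moment early in training (and while gradient statistics are resettling after a distribution shift), dividing by the uncorrected $\sqrt{v_t}$ yields larger effective steps at exactly those moments, making the optimizer more responsive to a shifting loss landscape.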
Key Points
- ▸ Adam's second-order bias correction limits its responsiveness to shifting loss landscapes in non-stationary time-series forecasting.
- ▸ TS_Adam removes this bias correction to improve adaptability to distributional drift.
- ▸ TS_Adam achieves improved performance across various forecasting tasks without requiring additional hyperparameters.
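The key points above can be sketched as a single optimizer step. This is a minimal, hypothetical scalar implementation based only on the abstract's description (dropping the second-moment bias correction); the function name and defaults are illustrative and not taken from the paper's code.

```python
import math

def ts_adam_step(theta, grad, m, v, t,
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One TS_Adam-style update for a scalar parameter.

    Standard Adam bias-corrects both moment estimates; the variant
    described in the abstract keeps the first-moment correction but
    divides by the raw sqrt(v) instead of the corrected one.
    """
    # Exponential moving averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad

    # First-moment bias correction is kept
    m_hat = m / (1 - beta1 ** t)

    # Standard Adam would also compute v_hat = v / (1 - beta2 ** t);
    # the described variant omits that correction entirely.
    theta = theta - lr * m_hat / (math.sqrt(v) + eps)
    return theta, m, v
```

Because `v` starts at zero and is not rescaled, the denominator is small in early steps, so the effective learning rate is larger until the second-moment estimate warms up. For example, minimizing f(θ) = θ² from θ = 5 drives θ toward zero over a few hundred steps.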
Merits
Improved Adaptability
TS_Adam's design allows for better responsiveness to changing loss landscapes, making it more effective in non-stationary time-series forecasting.
Practical Implementation
TS_Adam integrates easily into existing models, reducing the complexity of adopting this modified optimizer in real-world applications.
Efficient Hyperparameter Tuning
TS_Adam does not require additional hyperparameters beyond Adam's, simplifying the tuning process and reducing the risk of mis-configuration.
Demerits
Dependency on Specific Models
TS_Adam's headline gains are demonstrated on a specific model (MICN) and datasets (ETT), so its generalizability to other architectures and domains is not yet fully established.
Potential Overfitting
Removing Adam's second-order bias correction inflates the effective step size while the second-moment estimate is still warming up, which may destabilize training or encourage overfitting, particularly on complex or noisy data.
Expert Commentary
TS_Adam's effectiveness in non-stationary time-series forecasting demonstrates the importance of tailoring optimization techniques to the characteristics of real-world data. The result motivates further research on adaptive optimizers that can handle distributional shifts, and it underscores the value of rigorous, task-specific evaluation when selecting optimization strategies. As the field continues to evolve, developing robust, adaptable optimizers for non-stationary data should remain a priority.
Recommendations
- ✓ Future research should focus on adapting TS_Adam to other machine learning applications that involve non-stationary data, such as natural language processing and computer vision.
- ✓ Developing more robust and adaptable optimization techniques that can handle complex distributional shifts is crucial for the widespread adoption of machine learning technologies in real-world applications.