Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

arXiv:2603.04791v1 Announce Type: new Abstract: We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B activated parameters for each token, and a context length of 11.5K. To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial Scaling in three dimensions: model architecture, dataset, and training pipeline. Timer-S1 integrates sparse TimeMoE blocks and generic TimeSTP blocks for Serial-Token Prediction (STP), a generic training objective that adheres to the serial nature of forecasting. The proposed paradigm introduces serial computations to improve long-term predictions while avoiding costly rolling-style inference and pronounced error accumulation in the standard next-token prediction. Pursuing a high-quality and unbiased training dataset, we curate TimeBench, a corpus with one trillion time points, and apply meticulous data augmentation to mitigate predictive bias. We further pioneer a post-training stage, including continued pre-training and long-context extension, to enhance short-term and long-context performance. Evaluated on the large-scale GIFT-Eval leaderboard, Timer-S1 achieves state-of-the-art forecasting performance, attaining the best MASE and CRPS scores as a pre-trained model. Timer-S1 will be released to facilitate further research.

Executive Summary

This article introduces Timer-S1, a Mixture-of-Experts time series foundation model that overcomes scalability bottlenecks in existing models through Serial Scaling. With 8.3B total parameters (0.75B activated per token) and an 11.5K context length, Timer-S1 integrates sparse TimeMoE blocks and generic TimeSTP blocks for Serial-Token Prediction (STP). The authors curate TimeBench, a training corpus of one trillion time points, and pioneer a post-training stage (continued pre-training and long-context extension) to enhance short-term and long-context performance. Timer-S1 achieves state-of-the-art forecasting on the GIFT-Eval leaderboard, attaining the best MASE and CRPS scores among pre-trained models, and will be released to facilitate further research.
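For context on the leaderboard scores mentioned above: MASE (Mean Absolute Scaled Error) normalizes a forecast's absolute error by the in-sample error of a seasonal-naive baseline. The sketch below is a generic illustration of the metric, not code from the paper:

```python
import numpy as np

def mase(y_true, y_pred, y_train, m=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample
    MAE of a seasonal-naive forecast with period m (m=1 -> persistence)."""
    mae_forecast = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    y_train = np.asarray(y_train)
    mae_naive = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return mae_forecast / mae_naive

# Values below 1 beat the naive forecaster on the scale of the training data.
train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(mase([6.0, 7.0], [6.5, 7.5], train))  # 0.5 / 1.0 = 0.5
```

Because the error is scaled per series, MASE is comparable across datasets with very different magnitudes, which is why large heterogeneous benchmarks such as GIFT-Eval favor it over raw MAE.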

Key Points

  • Timer-S1 introduces Serial Scaling to overcome scalability bottlenecks in existing time series foundation models.
  • The model integrates sparse TimeMoE blocks and generic TimeSTP blocks for Serial-Token Prediction (STP).
  • TimeBench, a high-quality training dataset with one trillion time points, is curated to mitigate predictive bias.
  • Timer-S1 achieves state-of-the-art forecasting performance on the GIFT-Eval leaderboard.
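The abstract does not detail the internals of the TimeMoE blocks, but the parameter figures (8.3B total, 0.75B activated per token) match the standard sparse Mixture-of-Experts pattern: a router sends each token to its top-k experts, so only a fraction of the weights participate per token. The following is a minimal generic sketch of that pattern, with all names and shapes hypothetical:

```python
import numpy as np

def moe_layer(x, gate_w, expert_ws, k=2):
    """Generic sparse MoE layer (concept illustration, not Timer-S1's
    actual TimeMoE block). x: (tokens, d) activations; gate_w:
    (d, n_experts) router weights; expert_ws: per-expert (d, d) matrices."""
    logits = x @ gate_w                            # router scores per token
    top_k = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k best experts
    sel = np.take_along_axis(logits, top_k, axis=-1)
    gates = np.exp(sel - sel.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)     # softmax over selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # route each token to its experts
        for j, e in enumerate(top_k[t]):
            out[t] += gates[t, j] * (x[t] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.standard_normal((5, d))
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_layer(x, gate_w, experts, k=2)
print(y.shape)  # (5, 8)
```

With k=2 of 4 experts active, each token touches roughly half of the expert parameters, which is how total parameter count can grow without a proportional increase in per-token compute.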

Merits

Strength in Serial Scaling

The authors' approach to Serial Scaling in three dimensions (model architecture, dataset, and training pipeline) allows the model to grow to billions of parameters and a trillion-point corpus without the scalability bottlenecks of prior pre-trained time series forecasters.

Improved Long-term Predictions

The introduction of serial computations in Timer-S1 improves long-term predictions while avoiding costly rolling-style inference and pronounced error accumulation in standard next-token prediction.
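The error-accumulation problem the paper targets is easy to see in a toy setting (my illustration, not the paper's method): when a one-step model is rolled forward recursively, its small per-step error is re-applied at every horizon step and compounds, whereas a model that emits the multi-step forecast in one shot incurs its error only once.

```python
# Toy demo: rolling one-step inference compounds model error; a single
# multi-step prediction incurs it once. True dynamics: x_{t+1} = a * x_t,
# and the learned one-step coefficient is off by a small eps.
a, eps, x0, h = 0.95, 0.01, 1.0, 64

truth = x0 * a ** h
rolling = x0
for _ in range(h):                 # recursive ("rolling") inference
    rolling *= (a + eps)           # per-step error re-applied h times
direct = x0 * (a ** h + eps)       # one h-step call: error enters once

print(abs(rolling - truth))        # compounded error
print(abs(direct - truth))         # single-shot error (= eps here)
assert abs(rolling - truth) > abs(direct - truth)
```

This also shows the cost argument: rolling inference needs h model calls to cover the horizon, while a direct multi-step prediction needs one.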

High-quality Training Dataset

The curation of TimeBench, a corpus with one trillion time points, ensures a high-quality and unbiased training dataset that mitigates predictive bias.

Demerits

Computational Intensity

The development and training of Timer-S1 are computationally intensive, requiring significant resources and expertise.

Complexity of Model Architecture

The integration of sparse TimeMoE blocks and generic TimeSTP blocks in Timer-S1 may add complexity to the model architecture, making it challenging to interpret and maintain.

Expert Commentary

Timer-S1 represents a significant advancement in time series foundation models, addressing the scalability bottlenecks that have limited the development of more accurate and efficient models. The authors' innovative approach to Serial Scaling and the curation of TimeBench demonstrate a deep understanding of the challenges facing time series forecasting. However, the model's complexity and computational intensity may limit its adoption in certain settings. As researchers and practitioners continue to build upon Timer-S1, it is essential to prioritize the development of more interpretable models and the creation of high-quality training datasets. By doing so, we can unlock the full potential of time series forecasting and drive innovation across various industries.

Recommendations

  • Develop more interpretable time series models that can provide transparent and explainable results.
  • Invest in the creation of high-quality training datasets that can mitigate predictive bias and improve model performance.
