The Multiverse of Time Series Machine Learning: an Archive for Multivariate Time Series Classification

arXiv:2603.20352v1

Abstract: Time series machine learning (TSML) is a growing research field that spans a wide range of tasks. The popularity of established tasks such as classification, clustering, and extrinsic regression has, in part, been driven by the availability of benchmark datasets. An archive of 30 multivariate time series classification datasets, introduced in 2018 and commonly known as the UEA archive, has since become an essential resource cited in hundreds of publications. We present a substantial expansion of this archive that more than quadruples its size, from 30 to 133 classification problems. We also release preprocessed versions of datasets containing missing values or unequal length series, bringing the total number of datasets to 147. Reflecting the growth of the archive and the broader community, we rebrand it as the Multiverse archive to capture its diversity of domains. The Multiverse archive includes datasets from multiple sources, consolidating other collections and standalone datasets into a single, unified repository. Recognising that running experiments across the full archive is computationally demanding, we recommend a subset of the full archive called Multiverse-core (MV-core) for initial exploration. To support researchers in using the new archive, we provide detailed guidance and a baseline evaluation of established and recent classification algorithms, establishing performance benchmarks for future research. We have created a dedicated repository for the Multiverse archive that provides a common aeon and scikit-learn compatible framework for reproducibility, an extensive record of published results, and an interactive interface to explore the results.

Executive Summary

This article presents a substantial expansion of the UEA archive, a collection of 30 multivariate time series classification datasets introduced in 2018, into a unified repository named the Multiverse archive. The new archive comprises 133 classification problems; preprocessed versions of the datasets that contain missing values or unequal length series bring the total to 147 datasets. Because running experiments across the full archive is computationally demanding, the authors recommend a subset, Multiverse-core (MV-core), for initial exploration. They also provide a baseline evaluation of established and recent classification algorithms as a reference point for future research. A dedicated repository supports reproducibility with a common aeon and scikit-learn compatible framework, an extensive record of published results, and an interactive interface for exploring them.

Key Points

  • Expansion of the UEA archive into a unified repository called the Multiverse archive
  • Increase in the number of classification problems from 30 to 133
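  • Preprocessed versions of datasets with missing values or unequal length series, bringing the total to 147 datasets
  • A recommended subset, Multiverse-core (MV-core), for initial exploration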
  • Baseline evaluation of established and recent classification algorithms
  • Creation of a dedicated repository for reproducibility and exploration

Merits

Comprehensive and Unified Repository

The Multiverse archive consolidates other collections and standalone datasets into a single repository for multivariate time series classification, so algorithms can be compared on a common set of problems drawn from many domains.

Preprocessed Datasets

The archive includes preprocessed versions of the datasets that contain missing values or unequal length series, so researchers can focus on classification without writing their own preprocessing for those cases.
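The abstract does not describe how the preprocessed versions were produced. As a minimal sketch of what such preprocessing can look like, assuming linear interpolation for missing values and zero right-padding for short series (both choices, and every function name here, are illustrative assumptions rather than the archive's method):

```python
import math


def interpolate_missing(channel):
    """Fill NaN gaps by linear interpolation; edge gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(channel) if not math.isnan(v)]
    if not known:
        raise ValueError("channel has no observed values")
    filled = list(channel)
    for i, value in enumerate(channel):
        if not math.isnan(value):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = channel[right]
        elif right is None:
            filled[i] = channel[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = channel[left] + weight * (channel[right] - channel[left])
    return filled


def pad_to_length(channel, length, pad_value=0.0):
    """Right-pad one channel with pad_value up to the target length."""
    return list(channel) + [pad_value] * (length - len(channel))


def preprocess(cases):
    """cases: list of series; each series is a list of channels (lists of floats)."""
    target = max(len(channel) for case in cases for channel in case)
    return [
        [pad_to_length(interpolate_missing(channel), target) for channel in case]
        for case in cases
    ]


nan = float("nan")
raw = [
    [[1.0, nan, 3.0], [0.0, 1.0, 2.0]],                       # short series, one gap
    [[5.0, 6.0, 7.0, 8.0, 9.0], [nan, 2.0, 2.0, nan, 4.0]],   # long series, two gaps
]
clean = preprocess(raw)
```

After this step every channel in `clean` has length 5 and contains no NaN values. Other choices (truncation, resampling, mean imputation) are equally plausible and can change classifier accuracy, which is one reason shared preprocessed versions help comparability.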

Baseline Evaluation and Performance Benchmarks

The authors' baseline evaluation of established and recent classification algorithms establishes performance benchmarks for future research, providing a valuable resource for the research community.
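To make the idea of a baseline concrete, the following is a minimal sketch of a one-nearest-neighbour classifier with Euclidean distance, scored by accuracy on a tiny hand-made dataset. It is not one of the algorithms evaluated in the paper, the data is invented for illustration, and all names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-shape multivariate series."""
    return sum(
        (x - y) ** 2
        for channel_a, channel_b in zip(a, b)
        for x, y in zip(channel_a, channel_b)
    )


def predict_1nn(train_X, train_y, case):
    """Return the label of the training series closest to `case`."""
    nearest = min(range(len(train_X)), key=lambda i: squared_distance(train_X[i], case))
    return train_y[nearest]


def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test series whose predicted label matches the true label."""
    hits = sum(predict_1nn(train_X, train_y, x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Toy data: each series has 2 channels and 3 time points (invented values).
train_X = [
    [[0.0, 0.1, 0.0], [1.0, 1.0, 1.0]],
    [[0.1, 0.0, 0.1], [1.0, 0.9, 1.0]],
    [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]],
    [[0.1, 1.1, 2.1], [2.0, 1.1, 0.1]],
]
train_y = ["flat", "flat", "trend", "trend"]
test_X = [
    [[0.0, 0.0, 0.1], [0.9, 1.0, 1.0]],
    [[0.0, 0.9, 2.0], [1.9, 1.0, 0.0]],
]
test_y = ["flat", "trend"]

baseline_accuracy = accuracy(train_X, train_y, test_X, test_y)
```

A new method is then judged by whether it improves on such reference scores under the same train and test split, which is the role the published baseline results play for the archive.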

Demerits

Scalability and Computational Demands

Running experiments across the full archive is computationally demanding, as the authors themselves acknowledge, which may put full-archive evaluation out of reach for groups with limited compute.
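A rough way to see the cost is to count model fits: each classifier is trained once per dataset per resample. The sketch below assumes that simple product; only the figure of 133 problems comes from the abstract, while the subset size, classifier count, and resample count are hypothetical placeholders (the abstract does not state the size of MV-core).

```python
def total_runs(n_datasets, n_classifiers, n_resamples):
    """Number of train/test runs if every classifier is fit on every resample of every dataset."""
    return n_datasets * n_classifiers * n_resamples


FULL_ARCHIVE = 133   # classification problems, from the abstract
SUBSET_SIZE = 20     # hypothetical placeholder, NOT the real MV-core size
N_CLASSIFIERS = 10   # hypothetical
N_RESAMPLES = 30     # hypothetical

full_runs = total_runs(FULL_ARCHIVE, N_CLASSIFIERS, N_RESAMPLES)
subset_runs = total_runs(SUBSET_SIZE, N_CLASSIFIERS, N_RESAMPLES)
```

Run counts understate the difference in wall-clock time when the omitted datasets are also the largest ones, so this is a lower bound on the saving rather than an estimate of it.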

Complexity of the Multiverse-core Subset

Recommending a subset, Multiverse-core (MV-core), for initial exploration introduces a second evaluation setting alongside the full archive. Researchers will need to state clearly which of the two they report on.

Expert Commentary

The Multiverse archive is a significant step for time series machine learning, particularly for multivariate classification. The unified repository and preprocessed datasets make it easier to compare methods across domains, and the baseline evaluation gives future work a reference point. The main obstacle is computational cost: running the full archive is demanding, which is why the authors recommend MV-core for initial exploration. If the archive is widely adopted, it should encourage more accurate and more efficient classification algorithms.

Recommendations

  • Researchers working on multivariate time series classification should start with the MV-core subset for initial exploration, then extend to the full Multiverse archive and its preprocessed datasets.
  • The development of more efficient and scalable algorithms for processing large datasets in the Multiverse archive is crucial to facilitate its adoption for large-scale research projects.

Sources

Original: arXiv - cs.LG