Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research

arXiv:2602.16072v1. Abstract: Epilepsy affects over 50 million people worldwide, and one-third of patients suffer drug-resistant seizures where surgery offers the best chance of seizure freedom. Accurate localization of the epileptogenic zone (EZ) relies on intracranial EEG (iEEG). Clinical workflows, however, remain constrained by labor-intensive manual review. At the same time, existing data-driven approaches are typically developed on single-center datasets that are inconsistent in format and metadata, lack standardized benchmarks, and rarely release pathological event annotations, creating barriers to reproducibility, cross-center validation, and clinical relevance. With extensive efforts to reconcile heterogeneous iEEG formats, metadata, and recordings across publicly available sources, we present Omni-iEEG, a large-scale, pre-surgical iEEG resource comprising 302 patients and 178 hours of high-resolution recordings. The dataset includes harmonized clinical metadata such as seizure onset zones, resections, and surgical outcomes, all validated by board-certified epileptologists. In addition, Omni-iEEG provides over 36K expert-validated annotations of pathological events, enabling robust biomarker studies. Omni-iEEG serves as a bridge between machine learning and epilepsy research. It defines clinically meaningful tasks with unified evaluation metrics grounded in clinical priors, enabling systematic evaluation of models in clinically relevant settings. Beyond benchmarking, we demonstrate the potential of end-to-end modeling on long iEEG segments and highlight the transferability of representations pretrained on non-neurophysiological domains. Together, these contributions establish Omni-iEEG as a foundation for reproducible, generalizable, and clinically translatable epilepsy research. The project page with dataset and code links is available at omni-ieeg.github.io/omni-ieeg.

Executive Summary

The article introduces Omni-iEEG, a large-scale, comprehensive intracranial EEG (iEEG) dataset designed to advance epilepsy research. Comprising 302 patients and 178 hours of high-resolution recordings, Omni-iEEG includes harmonized clinical metadata and over 36,000 expert-validated annotations of pathological events. The dataset aims to bridge the gap between machine learning and epilepsy research by providing standardized benchmarks and clinically relevant tasks. The authors demonstrate the potential of end-to-end modeling on long iEEG segments and highlight the transferability of representations pretrained on non-neurophysiological domains. This resource is intended to facilitate reproducible, generalizable, and clinically translatable epilepsy research.

Key Points

  • Omni-iEEG is a large-scale iEEG dataset with 302 patients and 178 hours of recordings.
  • The dataset includes harmonized clinical metadata and expert-validated annotations.
  • Omni-iEEG defines clinically meaningful tasks with unified evaluation metrics.
  • The authors demonstrate the potential of end-to-end modeling on long iEEG segments and of transferring representations pretrained on non-neurophysiological domains.

Merits

Comprehensive Dataset

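Omni-iEEG pools 302 patients and 178 hours of high-resolution, pre-surgical recordings from publicly available sources, together with harmonized clinical metadata (seizure onset zones, resections, and surgical outcomes), making it a valuable resource for epilepsy research.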

Standardized Benchmarks

The dataset offers standardized benchmarks and clinically relevant tasks, which can facilitate cross-center validation and improve the reproducibility of research findings.
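
To make the idea of cross-center validation concrete, here is a minimal sketch of a leave-one-center-out split in Python. It is not the benchmark's actual protocol: the function name, the `(patient_id, center)` record layout, and the toy patient and center labels are all assumptions made for illustration.

```python
from collections import defaultdict


def leave_one_center_out(records):
    """Yield (held_out_center, train_ids, test_ids) once per center.

    `records` is a list of (patient_id, center) pairs. This layout is a
    hypothetical stand-in, not the metadata schema of the Omni-iEEG release.
    """
    # Group patient IDs by the center that recorded them.
    by_center = defaultdict(list)
    for patient_id, center in records:
        by_center[center].append(patient_id)

    # Hold out each center in turn; train on patients from all other centers.
    for held_out in sorted(by_center):
        test_ids = sorted(by_center[held_out])
        train_ids = sorted(
            pid
            for center, ids in by_center.items()
            if center != held_out
            for pid in ids
        )
        yield held_out, train_ids, test_ids


# Toy example with made-up patient IDs and center labels.
records = [("p01", "A"), ("p02", "A"), ("p03", "B"), ("p04", "C"), ("p05", "C")]
folds = list(leave_one_center_out(records))
for held_out, train_ids, test_ids in folds:
    print(held_out, train_ids, test_ids)
```

Splitting by center rather than by recording keeps every patient from the held-out site out of training, so a model's score on that fold reflects performance on a site it has not seen.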

Expert Validation

The clinical metadata in Omni-iEEG are validated by board-certified epileptologists, and the more than 36,000 pathological event annotations are expert-validated, which supports the reliability of the labels.

Demerits

Data Heterogeneity

Despite efforts to harmonize data from different sources, inherent heterogeneity in iEEG recordings and metadata may still pose challenges for data analysis and model generalization.

Limited Patient Diversity

Because the dataset is assembled from publicly available, pre-surgical recordings, it may not fully represent the diversity of epilepsy patients, which could limit the generalizability of research findings to broader populations.

Technical Complexity

The complexity of iEEG data and the need for specialized expertise to interpret and analyze the data may create barriers for researchers without a background in neurophysiology.

Expert Commentary

Omni-iEEG represents a significant advancement in the field of epilepsy research by providing a large-scale, comprehensive iEEG dataset that bridges the gap between machine learning and clinical practice. The dataset's standardized benchmarks and expert-validated annotations offer a robust foundation for developing and evaluating machine learning models in clinically relevant settings. However, the inherent heterogeneity of iEEG data and the need for specialized expertise to interpret and analyze the data pose challenges that must be addressed to fully realize the potential of this resource. The successful translation of research findings into clinical practice will require ongoing collaboration between clinicians, researchers, and data scientists to ensure that technological solutions meet the needs of patients and healthcare providers. Overall, Omni-iEEG is a valuable contribution to the field of epilepsy research and has the potential to accelerate the development of more accurate and efficient diagnostic and treatment approaches.

Recommendations

  • Increase efforts to harmonize data from diverse sources to minimize heterogeneity and improve the generalizability of research findings.
  • Promote interdisciplinary collaboration to ensure that machine learning models developed using Omni-iEEG are clinically relevant and can be integrated into clinical workflows.
