
OSF: On Pre-training and Scaling of Sleep Foundation Models


Zitao Shuai, Zongzhe Xu, David Yang, Wei Wang, Yuzhe Yang

arXiv:2603.00190v1 Announce Type: new Abstract: Polysomnography (PSG) provides the gold standard for sleep assessment but suffers from substantial heterogeneity across recording devices and cohorts. There have been growing efforts to build general-purpose foundation models (FMs) for sleep physiology, but these efforts lack an in-depth understanding of the pre-training process and scaling patterns that lead to more generalizable sleep FMs. To fill this gap, we curate a massive corpus of 166,500 hours of sleep recordings from nine public sources and establish SleepBench, a comprehensive, fully open-source benchmark. Leveraging SleepBench, we systematically evaluate four families of self-supervised pre-training objectives and uncover three critical findings: (1) existing FMs fail to generalize to missing channels at inference; (2) channel-invariant feature learning is essential for pre-training; and (3) scaling sample size, model capacity, and multi-source data mixture consistently improves downstream performance. With an enhanced pre-training and scaling recipe, we introduce OSF, a family of sleep FMs that achieves state-of-the-art performance across nine datasets on diverse sleep and disease prediction tasks. Further analysis of OSF also reveals intriguing properties in sample efficiency, hierarchical aggregation, and cross-dataset scaling.

Executive Summary

This article presents OSF, a family of foundation models for sleep physiology designed to overcome the heterogeneity of polysomnography (PSG) recordings. Leveraging a massive corpus of 166,500 hours of sleep recordings, the authors develop SleepBench, a comprehensive benchmark for evaluating sleep FMs. Their findings highlight the importance of channel-invariant feature learning and of scaling sample size, model capacity, and multi-source data mixture for achieving state-of-the-art performance on diverse sleep and disease prediction tasks. The authors also reveal intriguing properties of OSF, including sample efficiency, hierarchical aggregation, and cross-dataset scaling. This work fills a significant gap in the understanding of pre-training and scaling of sleep FMs, enabling the development of more generalizable models for sleep assessment and disease prediction.

Key Points

  • Existing FMs fail to generalize to missing channels at inference.
  • Channel-invariant feature learning is essential for pre-training.
  • Scaling sample size, model capacity, and multi-source data mixture consistently improves downstream performance.
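The article does not describe how channel-invariant feature learning is implemented in OSF. One common way to encourage invariance to missing channels is to randomly mask channels during self-supervised pre-training, so the encoder cannot depend on any single channel being present at inference. The sketch below is purely illustrative: the function name, the dropout probability, and the masking scheme are assumptions, not the paper's method.

```python
import numpy as np

def mask_random_channels(x, p_drop=0.3, rng=None):
    """Zero out a random subset of channels in a PSG segment.

    Hypothetical augmentation for channel-invariant pre-training;
    the paper's exact objective is not specified in the article.
    x: array of shape (n_channels, n_samples).
    Returns the masked segment and the boolean keep-mask.
    """
    rng = rng or np.random.default_rng()
    n_channels = x.shape[0]
    keep = rng.random(n_channels) >= p_drop
    if not keep.any():
        # Always retain at least one channel so the input is non-empty.
        keep[rng.integers(n_channels)] = True
    return x * keep[:, None], keep

# Toy PSG segment: 4 channels x 1000 samples of synthetic signal.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1000))
x_masked, keep = mask_random_channels(x, p_drop=0.5, rng=rng)
```

During pre-training, a self-supervised loss (e.g. masked reconstruction or contrastive matching between differently masked views) would then be applied to `x_masked`, pushing the encoder toward representations that survive arbitrary channel subsets.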

Merits

Advances the field of sleep research

The development of OSF and SleepBench marks a significant step forward in the field of sleep research, enabling the creation of more generalizable models for sleep assessment and disease prediction.

Provides a comprehensive benchmark for evaluating sleep FMs

SleepBench offers a robust framework for evaluating the performance of sleep FMs, allowing for a more accurate assessment of their capabilities and limitations.

Demerits

Lack of diversity in the dataset

The article relies on a dataset of 166,500 hours of sleep recordings from nine public sources, which may not be representative of the broader population, potentially limiting the generalizability of the findings.

Insufficient evaluation of the environmental impact

The article focuses primarily on the technical aspects of the OSF model, with limited consideration of the environmental implications of training and deploying large-scale machine learning models.

Expert Commentary

The article presents a significant contribution to the field of sleep research, offering a novel approach to the development of generalizable sleep FMs. The introduction of SleepBench provides a comprehensive benchmark that lets researchers more accurately assess the capabilities and limitations of these models. However, the reliance on nine public data sources may limit the generalizability of the findings, and the focus on technical aspects of the OSF model leaves the environmental cost of training and deploying large-scale machine learning models largely unexamined. Nevertheless, the findings have significant implications for the development of personalized sleep therapies and interventions, as well as for the creation of more accurate and efficient sleep assessment tools.

Recommendations

  • Future research should prioritize the development of more diverse and representative datasets for sleep research.
  • Researchers should consider the environmental implications of their work, exploring more sustainable methods for training and deploying large-scale machine learning models.
