Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics
arXiv:2602.15253v1

Abstract: Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first systematic study of scaling behaviour for masked-reconstruction transformers trained on single-cell RNA sequencing (scRNA-seq) data. Using expression profiles from the CELLxGENE Census, we construct two experimental regimes: a data-rich regime (512 highly variable genes, 200,000 cells) and a data-limited regime (1,024 genes, 10,000 cells). Across seven model sizes spanning three orders of magnitude in parameter count (533 to 3.4 x 10^8 parameters), we fit the parametric scaling law to validation mean squared error (MSE). The data-rich regime exhibits clear power-law scaling with an irreducible loss floor of c ~ 1.44, while the data-limited regime shows negligible scaling, indicating that model capacity is not the binding constraint when data are scarce. These results establish that scaling laws analogous to those observed in natural language processing do emerge in single-cell transcriptomics when sufficient data are available, and they identify the data-to-parameter ratio as a critical determinant of scaling behaviour. A preliminary conversion of the data-rich asymptotic floor to information-theoretic units yields an estimate of approximately 2.30 bits of entropy per masked gene position. We discuss implications for the design of single-cell foundation models and outline the additional measurements needed to refine this entropy estimate.
Executive Summary
This article presents the first systematic study of scaling laws for masked-reconstruction transformers trained on single-cell RNA sequencing data. Using expression profiles from the CELLxGENE Census, the authors construct a data-rich regime (512 highly variable genes, 200,000 cells) and a data-limited regime (1,024 genes, 10,000 cells), and train seven model sizes spanning three orders of magnitude in parameter count. Validation MSE in the data-rich regime follows a clear power law with an irreducible loss floor of c ~ 1.44, whereas the data-limited regime shows negligible scaling, indicating that model capacity is not the binding constraint when data are scarce. The study identifies the data-to-parameter ratio as a critical determinant of scaling behaviour, reports a preliminary entropy estimate of roughly 2.30 bits per masked gene position, and draws out implications for the design of single-cell foundation models.
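The fitting procedure described above can be sketched as follows. The paper fits a parametric scaling law to validation MSE across model sizes; a standard form for such laws is L(N) = a * N^(-alpha) + c, where c is the irreducible loss floor. This is an assumed form for illustration, and the data points below are synthetic placeholders, not the paper's measurements:

```python
# Sketch: fitting a parametric scaling law L(N) = a * N^(-alpha) + c
# to validation MSE across model sizes. Assumed form and synthetic
# data for illustration only -- not the paper's actual measurements.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, a, alpha, c):
    # Power-law decay in model size toward an irreducible floor c.
    return a * n_params ** (-alpha) + c

# Hypothetical model sizes spanning ~3 orders of magnitude in
# parameter count, mirroring the paper's 533 to 3.4e8 range.
n = np.array([5.3e2, 1e4, 1e5, 1e6, 1e7, 1e8, 3.4e8])

# Synthetic MSE values generated from the law with a floor near the
# paper's reported c ~ 1.44, plus small observation noise.
rng = np.random.default_rng(0)
mse = scaling_law(n, a=5.0, alpha=0.3, c=1.44) + 0.01 * rng.standard_normal(7)

# Recover (a, alpha, c) by nonlinear least squares.
(a_hat, alpha_hat, c_hat), _ = curve_fit(
    scaling_law, n, mse, p0=(1.0, 0.5, 1.0), maxfev=10000
)
print(f"a={a_hat:.2f}, alpha={alpha_hat:.2f}, floor c={c_hat:.2f}")
```

With only seven (size, loss) points and three free parameters, the fitted floor c is sensitive to the largest models, which is one reason the paper treats downstream uses of the floor as preliminary.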
Key Points
- ▸ The article presents the first systematic study of neural scaling laws for masked-reconstruction transformers on single-cell RNA sequencing data.
- ▸ Validation MSE shows clear power-law scaling with model size in the data-rich regime, but negligible scaling in the data-limited regime.
- ▸ The data-to-parameter ratio emerges as a critical determinant of scaling behaviour.
Merits
Strength of theoretical framework
The article builds upon established knowledge of neural scaling laws in language and vision transformers, providing a solid theoretical foundation for the study.
Methodological rigor
The authors employ a systematic approach, using expression profiles from the CELLxGENE Census and constructing two experimental regimes to investigate scaling laws.
Implications for single-cell genomics
The study provides insights into the design of single-cell foundation models and highlights the importance of data-to-parameter ratio in scaling behaviour.
Demerits
Limited generalizability
The study focuses on a specific type of model and dataset, which may limit the generalizability of the findings to other areas of single-cell genomics.
Need for further validation
The results rest on a single data source and seven model configurations; replication across other datasets, tissue types, and preprocessing choices is needed to confirm the findings and explore their implications.
Expert Commentary
The article presents a well-designed and rigorously executed study of scaling laws for masked-reconstruction transformers in single-cell genomics. The findings are clearly presented and rest on a systematic experimental design and a robust theoretical framework grounded in the scaling-law literature. Two limitations stand out: generalizability beyond the chosen model family and dataset is untested, and the entropy estimate is, by the authors' own account, preliminary. Overall, the article makes a significant contribution at the intersection of single-cell genomics and deep learning, with direct implications for sizing future single-cell foundation models.
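The preliminary entropy estimate the paper reports can be reproduced under one plausible assumption (ours, not confirmed by the source): treat the residual reconstruction error per masked gene as Gaussian with variance equal to the asymptotic MSE floor, whose differential entropy in bits is 0.5 * log2(2 * pi * e * sigma^2):

```python
import math

def gaussian_entropy_bits(variance):
    # Differential entropy of a Gaussian with the given variance,
    # H = 0.5 * log2(2 * pi * e * sigma^2), expressed in bits.
    return 0.5 * math.log2(2 * math.pi * math.e * variance)

# Plugging in the reported data-rich loss floor c ~ 1.44 gives
# roughly 2.31 bits, consistent with the paper's ~2.30 bits per
# masked gene position.
floor = 1.44
print(f"{gaussian_entropy_bits(floor):.2f} bits")
```

That this Gaussian conversion lands on the reported figure is suggestive but not confirmation of the authors' method; the paper itself notes that additional measurements are needed to refine the estimate.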
Recommendations
- ✓ Future studies should investigate the generalizability of the findings to other areas of single-cell genomics, including other types of models and datasets.
- ✓ Further validation of the results is necessary to confirm the findings and explore their implications for the design of single-cell foundation models.