PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis
arXiv:2603.02268v1. Abstract: EEG foundation models are typically pretrained on narrow-source clinical archives and evaluated on benchmarks from the same ecosystem, leaving unclear whether representations encode neural physiology or recording-distribution artifacts. We introduce PRISM (Population Representative Invariant Signal Model), a masked autoencoder ablated along two axes -- pretraining population and downstream adaptation -- with architecture and preprocessing fixed. We compare a narrow-source EU/US corpus (TUH + PhysioNet) against a geographically diverse pool augmented with multi-center South Asian clinical recordings across multiple EEG systems. Three findings emerge. First, narrow-source pretraining yields stronger linear probes on distribution-matched benchmarks, while diverse pretraining produces more adaptable representations under fine-tuning -- a trade-off invisible under single-protocol evaluation. Trained on three source corpora, PRISM matches or outperforms REVE (92 datasets, 60,000+ hours) on the majority of tasks, demonstrating that targeted diversity can substitute for indiscriminate scale and that dataset count is a confounding variable in model comparison. Second, on a clinically challenging and previously untested task -- distinguishing epilepsy from diagnostic mimickers via interictal EEG -- the diverse checkpoint outperforms the narrow-source checkpoint by +12.3 pp balanced accuracy, the largest gap across all evaluations. Third, systematic inconsistencies between EEG-Bench and EEG-FM-Bench reverse model rankings on identical datasets by up to 24 pp; we identify six concrete sources including split construction, checkpoint selection, segment length, and normalization, showing these factors compound non-additively.
Executive Summary
This article presents PRISM (Population Representative Invariant Signal Model), a masked autoencoder used to study how the pretraining population and the downstream adaptation method affect EEG foundation model transfer to clinical differential diagnosis. With architecture and preprocessing held fixed, the authors compare a narrow-source EU/US corpus (TUH + PhysioNet) against a geographically diverse pool that adds multi-center South Asian clinical recordings. Narrow-source pretraining gives stronger linear probes on distribution-matched benchmarks, while diverse pretraining adapts better under fine-tuning. Trained on three source corpora, PRISM matches or outperforms REVE (92 datasets, 60,000+ hours) on the majority of tasks. On distinguishing epilepsy from diagnostic mimickers via interictal EEG, the diverse checkpoint beats the narrow-source checkpoint by 12.3 pp balanced accuracy. The study also traces rank reversals of up to 24 pp between EEG-Bench and EEG-FM-Bench to six sources, including split construction, checkpoint selection, segment length, and normalization, which argues for more standardized evaluation protocols.
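The abstract reports its headline clinical result as a gap in balanced accuracy, measured in percentage points (pp). Balanced accuracy is the mean of per-class recall, so a minority class counts as much as a majority one. The sketch below is a minimal illustration of the metric, not the authors' evaluation code; the labels and predictions are invented toy data, and the function name is a hypothetical helper.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Toy labels invented for illustration: 1 = epilepsy, 0 = mimicker.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
pred_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # always predicts the majority class
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # misses two positives, finds both negatives

acc_a = balanced_accuracy(y_true, pred_a)
acc_b = balanced_accuracy(y_true, pred_b)

# A gap in percentage points is the plain difference of the two percentages.
gap_pp = (acc_b - acc_a) * 100
print(f"A: {acc_a:.3f}  B: {acc_b:.3f}  gap: {gap_pp:+.1f} pp")
```

Both toy predictors get 8 of 10 labels right, so plain accuracy cannot separate them, while balanced accuracy penalizes the one that ignores the minority class.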
Key Points
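- ▸ On distinguishing epilepsy from diagnostic mimickers via interictal EEG, the diverse-pretraining checkpoint outperforms the narrow-source checkpoint by 12.3 pp balanced accuracy
- ▸ Targeted diversity in pretraining can substitute for indiscriminate scale: trained on three source corpora, PRISM matches or outperforms REVE (92 datasets, 60,000+ hours) on the majority of tasks
- ▸ EEG-Bench and EEG-FM-Bench reverse model rankings on identical datasets by up to 24 pp, traced to six sources including split construction, checkpoint selection, segment length, and normalization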
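One way to look for the kind of ranking inconsistency described in the abstract is to compare pairwise model orderings on the same dataset under two benchmarks. The sketch below is an assumption-laden illustration: the model names and scores are invented, the function is hypothetical, and the "swing" it reports (the change in the pairwise score gap) is only one possible way to quantify a reversal, not necessarily the paper's definition.

```python
from itertools import combinations

def rank_reversals(scores_a, scores_b):
    """Find model pairs whose ordering flips between two benchmarks.

    scores_a, scores_b: dicts mapping model name -> score in percent
    on the same dataset. Returns (model_1, model_2, swing_pp) tuples,
    where swing_pp is the absolute change in the pairwise gap.
    """
    shared = sorted(set(scores_a) & set(scores_b))
    reversals = []
    for m1, m2 in combinations(shared, 2):
        gap_a = scores_a[m1] - scores_a[m2]
        gap_b = scores_b[m1] - scores_b[m2]
        if gap_a * gap_b < 0:  # opposite signs: the ordering flipped
            reversals.append((m1, m2, abs(gap_a - gap_b)))
    return reversals

# Invented scores for three hypothetical models on one dataset.
bench_1 = {"model_x": 70.0, "model_y": 62.0, "model_z": 55.0}
bench_2 = {"model_x": 60.0, "model_y": 66.0, "model_z": 58.0}

reversals = rank_reversals(bench_1, bench_2)
for m1, m2, swing in reversals:
    print(f"{m1} vs {m2}: ordering flips, swing {swing:.1f} pp")
```

Ties are not counted as reversals here, since a zero gap makes the product zero.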
Merits
Strength in Evaluation Protocol
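The authors ablate along two axes, pretraining population and downstream adaptation, while holding architecture and preprocessing fixed. This isolates the effect of narrow-source pretraining versus a geographically diverse pool of clinical recordings, and it exposes a trade-off between linear probing and fine-tuning that single-protocol evaluation would miss.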
Insights into EEG-Based Decision Support Systems
The study offers concrete guidance for developing and evaluating EEG-based clinical decision support systems. The gain from diverse pretraining was largest on the clinically challenging task of separating epilepsy from diagnostic mimickers, and benchmark rankings proved sensitive to evaluation protocol choices.
Demerits
Limited Generalizability
The results come from a single fixed architecture and a specific set of pretraining corpora and benchmarks, so they may not carry over to other EEG-based applications or clinical settings without further validation.
Methodological Complexity
The PRISM architecture and evaluation protocol involve many interacting choices, among them split construction, checkpoint selection, segment length, and normalization, which may make the work hard for non-experts to replicate or extend.
Expert Commentary
The study presents a controlled evaluation of EEG foundation model transfer and makes a case for both targeted diversity in pretraining and standardized evaluation protocols. Both points matter for EEG-based clinical decision support systems, which are increasingly being used in clinical settings. Two caveats apply: the results may not generalize to other EEG-based applications or clinical settings, and the PRISM architecture and evaluation protocol may be hard for non-experts to replicate or extend. Further research is needed to validate the findings and map the study's limitations.
Recommendations
- ✓ Future studies should prioritize standardized evaluation protocols for EEG-based decision support systems, fixing and reporting at least split construction, checkpoint selection, segment length, and normalization.
- ✓ Researchers should explore the application of targeted diversity in pretraining to other EEG-based applications and clinical settings.
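A standardized protocol could start by recording, for every reported score, the evaluation choices the abstract names as sources of inconsistency. The sketch below assumes a hypothetical `EvalProtocol` record covering only the four sources the abstract lists (it mentions six in total but does not name the other two here); the field values are invented and do not describe the real settings of EEG-Bench or EEG-FM-Bench.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class EvalProtocol:
    """Hypothetical record of evaluation choices attached to a score."""
    split_construction: str    # e.g. how train/test subjects are separated
    checkpoint_selection: str  # e.g. which checkpoint is evaluated
    segment_length_s: float    # EEG segment length in seconds
    normalization: str         # signal normalization scheme

def protocol_diff(a, b):
    """Return {field_name: (value_in_a, value_in_b)} for differing fields."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }

# Invented settings for two hypothetical benchmark configurations.
bench_a = EvalProtocol("subject-wise", "best-validation", 10.0, "per-channel z-score")
bench_b = EvalProtocol("recording-wise", "last-epoch", 10.0, "per-channel z-score")

diff = protocol_diff(bench_a, bench_b)
for name, (va, vb) in diff.items():
    print(f"{name}: {va!r} vs {vb!r}")
```

Scores would then be compared directly only when the diff is empty. Because the abstract notes that these factors compound non-additively, changing one field at a time may not explain a gap on its own.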