PRIME: Prototype-Driven Multimodal Pretraining for Cancer Prognosis with Missing Modalities
arXiv:2604.04999v1 Announce Type: new Abstract: Multimodal self-supervised pretraining offers a promising route to cancer prognosis by integrating histopathology whole-slide images, gene expression, and pathology reports, yet most existing approaches require fully paired and complete inputs. In practice, clinical cohorts are fragmented and often miss one or more modalities, limiting both supervised fusion and scalable multimodal pretraining. We propose PRIME, a missing-aware multimodal self-supervised pretraining framework that learns robust and transferable representations from partially observed cohorts. PRIME maps heterogeneous modality embeddings into a unified token space and introduces a shared prototype memory bank for latent-space semantic imputation via patient-level consensus retrieval, producing structurally aligned tokens without reconstructing raw signals. Two complementary pretraining objectives, inter-modality alignment and post-fusion consistency under structured missingness augmentation, jointly learn representations that remain predictive under arbitrary modality subsets. We evaluate PRIME on The Cancer Genome Atlas with label-free pretraining on 32 cancer types and downstream 5-fold evaluation on five cohorts across overall survival prediction, 3-year mortality classification, and 3-year recurrence classification. PRIME achieves the best macro-average performance among all compared methods, reaching 0.653 C-index, 0.689 AUROC, and 0.637 AUROC on the three tasks, respectively, while improving robustness under test-time missingness and supporting parameter-efficient and label-efficient adaptation. These results support missing-aware multimodal pretraining as a practical strategy for prognosis modeling in fragmented clinical data settings.
Executive Summary
The paper introduces PRIME, a multimodal self-supervised pretraining framework designed to address the challenge of incomplete clinical data in cancer prognosis. By integrating histopathology whole-slide images, gene expression, and pathology reports, PRIME avoids the requirement of traditional approaches for fully paired inputs. The framework maps heterogeneous modality embeddings into a unified token space and employs a shared prototype memory bank for latent-space semantic imputation, yielding robust and transferable representations even under structured missingness. Pretrained label-free on 32 cancer types from The Cancer Genome Atlas and evaluated with 5-fold validation on five downstream cohorts, PRIME demonstrates superior performance in overall survival prediction, 3-year mortality classification, and 3-year recurrence classification, achieving macro-average metrics of 0.653 C-index, 0.689 AUROC, and 0.637 AUROC, respectively. The approach also supports parameter-efficient and label-efficient adaptation, offering a scalable solution for fragmented clinical datasets.
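The abstract names the mechanism (latent-space semantic imputation via patient-level consensus retrieval against a shared prototype bank) without specifying it. The sketch below illustrates one plausible reading; the function name, shapes, top-k retrieval, and softmax blending are assumptions for illustration, not PRIME's actual design.

```python
import numpy as np

def impute_missing_modality(present_tokens, prototype_bank, top_k=4):
    """Impute a missing modality's token by consensus retrieval against a
    shared prototype bank (hypothetical sketch, not the paper's exact method).

    present_tokens : (n_present, d) tokens from the patient's observed modalities
    prototype_bank : (n_prototypes, d) shared, modality-agnostic prototypes
    """
    # Patient-level consensus query: average the observed modality tokens.
    query = present_tokens.mean(axis=0)
    # Cosine similarity between the consensus query and every prototype.
    q = query / (np.linalg.norm(query) + 1e-8)
    p = prototype_bank / (np.linalg.norm(prototype_bank, axis=1, keepdims=True) + 1e-8)
    sims = p @ q
    # Retrieve the top-k prototypes and blend them with softmax weights,
    # producing a structurally aligned token without reconstructing raw data.
    idx = np.argsort(sims)[-top_k:]
    w = np.exp(sims[idx] - sims[idx].max())
    w /= w.sum()
    return w @ prototype_bank[idx]  # (d,) imputed token in the shared space
```

The key property this sketch captures is that imputation happens entirely in the shared token space, so no raw whole-slide image or expression profile is ever synthesized.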
Key Points
- ▸ PRIME addresses the critical issue of missing modalities in clinical datasets, a common challenge in cancer prognosis modeling.
- ▸ The framework introduces a unified token space and prototype memory bank for latent-space semantic imputation, enabling robust representation learning from partially observed cohorts.
- ▸ PRIME achieves state-of-the-art performance across multiple downstream tasks, demonstrating its practical utility and scalability in real-world clinical settings.
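The "structured missingness augmentation" behind the second pretraining objective can be pictured as randomly dropping modalities during training and penalizing drift in the fused representation. The following is a minimal sketch under that assumption; the drop probability, the keep-at-least-one rule, and the L2 consistency loss are illustrative choices, not details from the paper.

```python
import numpy as np

def sample_missingness_mask(n_modalities, rng, p_drop=0.5):
    """Sample a missingness pattern for one training example: drop each
    modality with probability p_drop, but always keep at least one
    (hypothetical sketch of a missingness augmentation scheme)."""
    mask = rng.random(n_modalities) > p_drop
    if not mask.any():
        mask[rng.integers(n_modalities)] = True  # keep at least one modality
    return mask

def post_fusion_consistency(full_embed, masked_embed):
    """Consistency objective: the fused representation computed from a masked
    modality subset should stay close to the one computed from all modalities.
    Mean squared error is used here; the paper's exact loss is not given in
    the abstract."""
    return float(np.mean((full_embed - masked_embed) ** 2))
```

Training against many such random subsets is what makes the learned representation predictive under arbitrary modality availability at test time.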
Merits
Innovative Framework for Missing Modality Handling
PRIME's use of a shared prototype memory bank and patient-level consensus retrieval for latent-space imputation is a novel approach to handling missing modalities, addressing a long-standing challenge in multimodal learning.
Superior Performance and Robustness
The framework achieves the best macro-average performance across multiple tasks, including overall survival prediction and mortality classification, while improving robustness under test-time missingness.
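For context, the 0.653 figure above is a concordance index (C-index), the standard ranking metric for survival prediction: the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed outcomes. A reference implementation of Harrell's C-index (a standard formulation, not code from the paper):

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index. A pair (i, j) is comparable when patient i failed
    (event observed) at a strictly earlier time than patient j's time; the
    pair is concordant when i also has the higher predicted risk. Ties in
    risk count as half-concordant."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:  # i observed to fail first
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so 0.653 as a macro-average across five cohorts is a meaningful but moderate discriminative signal.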
Scalability and Efficiency
PRIME supports parameter-efficient and label-efficient adaptation, making it scalable for large clinical datasets and adaptable to new cohorts with minimal additional labeling.
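Parameter-efficient adaptation of a pretrained encoder is commonly realized by freezing the backbone and fitting only a small head on its embeddings. The sketch below illustrates that general pattern with a closed-form ridge-regression probe; it is an assumption for illustration, not PRIME's actual adapter design.

```python
import numpy as np

def fit_linear_probe(frozen_embeddings, labels, l2=1.0):
    """Parameter-efficient adaptation sketch: the pretrained encoder stays
    frozen, and only a ridge-regression head (d + 1 parameters) is fit on
    its embeddings.

    frozen_embeddings : (n, d) outputs of the frozen multimodal encoder
    labels            : (n,) downstream targets (e.g. 3-year mortality 0/1)
    """
    X = np.hstack([frozen_embeddings, np.ones((len(labels), 1))])  # bias column
    # Closed-form ridge solution: (X^T X + l2 I)^{-1} X^T y.
    w = np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ np.asarray(labels))
    return w

def predict(w, embeddings):
    X = np.hstack([embeddings, np.ones((len(embeddings), 1))])
    return X @ w
```

Because only the head is trained, adaptation to a new cohort needs few labels and negligible compute relative to full fine-tuning, which is the sense in which such schemes are label- and parameter-efficient.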
Demerits
Complexity of Implementation
The sophisticated architecture of PRIME, including its prototype memory bank and dual pretraining objectives, may pose challenges in implementation and computational requirements, potentially limiting accessibility for smaller research teams.
Dependence on High-Quality Pretraining Data
While PRIME demonstrates label-free pretraining on 32 cancer types, the quality and representativeness of the pretraining data (e.g., The Cancer Genome Atlas) may influence downstream performance, particularly in rare or underrepresented cancer types.
Limited Generalizability to Non-Cancer Applications
The framework is tailored for cancer prognosis and may not generalize seamlessly to other medical or non-medical multimodal applications with different data structures or missingness patterns.
Expert Commentary
PRIME represents a significant advancement in the field of multimodal AI for healthcare, particularly in addressing the pervasive challenge of missing data. The authors' innovative use of a shared prototype memory bank and latent-space imputation is not only technically elegant but also highly practical, offering a solution that mirrors real-world clinical scenarios where data completeness is the exception rather than the rule. The performance metrics reported are impressive, especially given the framework's label-free pretraining approach, which reduces the burden of data annotation—a major bottleneck in medical AI. However, the complexity of PRIME's architecture may pose a barrier to widespread adoption, particularly for smaller institutions with limited computational resources. Future work should explore simplifications or modular adaptations that retain the core benefits while reducing implementation overhead. Additionally, while the results are promising, further validation across diverse datasets and clinical settings will be critical to ensure generalizability and to address potential biases inherent in large-scale clinical datasets.
Recommendations
- ✓ Conduct further validation studies across diverse and underrepresented cancer types to assess the generalizability of PRIME's performance and mitigate potential biases.
- ✓ Develop modular or simplified versions of PRIME to reduce computational and implementation barriers, enabling broader adoption in resource-constrained environments.
- ✓ Explore hybrid approaches that combine PRIME with other missing-data techniques (e.g., multiple imputation) to enhance robustness and interpretability in clinical applications.
- ✓ Engage with regulatory bodies to establish standardized evaluation protocols for AI systems trained on partially observed datasets, ensuring equitable performance and safety in clinical deployment.
Sources
Original: arXiv - cs.LG