
GLaDiGAtor: Language-Model-Augmented Multi-Relation Graph Learning for Predicting Disease-Gene Associations

Osman Onur Kuzucu, Tunca Doğan

arXiv:2602.18769v1

Abstract: Understanding disease-gene associations is essential for unravelling disease mechanisms and advancing diagnostics and therapeutics. Traditional approaches based on manual curation and literature review are labour-intensive and not scalable, prompting the use of machine learning on large biomedical data. In particular, graph neural networks (GNNs) have shown promise for modelling complex biological relationships. To address limitations in existing models, we propose GLaDiGAtor (Graph Learning-bAsed DIsease-Gene AssociaTiOn pRediction), a novel GNN framework with an encoder-decoder architecture for disease-gene association prediction. GLaDiGAtor constructs a heterogeneous biological graph integrating gene-gene, disease-disease, and gene-disease interactions from curated databases, and enriches each node with contextual features from well-known language models (ProtT5 for protein sequences and BioBERT for disease text). In evaluations, our model achieves superior predictive accuracy and generalisation, outperforming 14 existing methods. Literature-supported case studies confirm the biological relevance of high-confidence novel predictions, highlighting GLaDiGAtor's potential to discover candidate disease genes. These results underscore the power of graph convolutional networks in biomedical informatics and may ultimately facilitate drug discovery by revealing new gene-disease links. The source code and processed datasets are publicly available at https://github.com/HUBioDataLab/GLaDiGAtor.
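The abstract describes an encoder-decoder GNN and refers to graph convolutional networks, but it does not give layer details. The following is a minimal sketch of that general pattern, not the authors' implementation: a two-layer graph-convolution encoder followed by a dot-product decoder. The toy graph, the layer sizes, the random weights, and the choice of a dot-product decoder are all assumptions made for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the symmetric normalisation commonly used in GCNs."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1)                     # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_encode(adj_norm, features, weights):
    """Stacked graph-convolution layers: ReLU between layers, none after the last."""
    h = features
    for i, w in enumerate(weights):
        h = adj_norm @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def decode(embeddings, gene_idx, disease_idx):
    """Assumed decoder: sigmoid of the dot product of the two node embeddings."""
    logit = embeddings[gene_idx] @ embeddings[disease_idx]
    return 1.0 / (1.0 + np.exp(-logit))

# Hypothetical toy graph: nodes 0-1 are genes, nodes 2-3 are diseases.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))              # stand-in for language-model features
weights = [                                     # untrained weights, for shape illustration only
    rng.normal(scale=0.1, size=(8, 16)),
    rng.normal(scale=0.1, size=(16, 4)),
]

adj_norm = normalize_adjacency(adj)
embeddings = gcn_encode(adj_norm, features, weights)
score = decode(embeddings, gene_idx=0, disease_idx=3)   # probability-like score in (0, 1)
```

In a trained model the weights would be fitted so that known gene-disease pairs score high and sampled non-associated pairs score low; the sketch only shows how data flows from the graph to a pair score.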

Executive Summary

GLaDiGAtor is a graph neural network framework with an encoder-decoder architecture for predicting disease-gene associations. It builds a heterogeneous graph from gene-gene, disease-disease, and gene-disease interactions drawn from curated databases, and enriches each node with language-model features (ProtT5 for protein sequences, BioBERT for disease text). The authors report that it outperforms 14 existing methods in predictive accuracy and generalization, and that literature-supported case studies back the biological relevance of its high-confidence novel predictions. They suggest the approach may ultimately facilitate drug discovery by revealing new gene-disease links. Source code and processed datasets are publicly available. Open questions remain about how well the model transfers to biological contexts beyond those evaluated and how interpretable its predictions are.

Key Points

  • GLaDiGAtor is a novel graph neural network framework for disease-gene association prediction.
  • The model integrates three interaction types (gene-gene, disease-disease, gene-disease) and node features from two language models (ProtT5, BioBERT).
  • GLaDiGAtor outperforms 14 existing methods in predictive accuracy and generalization.

Merits

Strength in Predictive Accuracy

In the authors' evaluations, GLaDiGAtor outperforms 14 existing methods in predictive accuracy and generalization.

Comprehensive Integration of Interactions

The model effectively combines gene-gene, disease-disease, and gene-disease interactions from curated databases.
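To make the idea of a heterogeneous graph concrete, here is a small sketch of merging three edge lists into one typed, undirected graph. It is an illustration only: the identifiers (`GENE_A`, `DIS_X`, and so on), the function name `build_graph`, and the data layout are hypothetical and are not taken from the GLaDiGAtor repository.

```python
from collections import defaultdict

def build_graph(gene_gene, disease_disease, gene_disease):
    """Merge three edge lists into one undirected graph with typed nodes and edges."""
    node_type = {}
    adjacency = defaultdict(set)

    def add_edges(edges, left_type, right_type, relation):
        for u, v in edges:
            node_type[u] = left_type
            node_type[v] = right_type
            # Store each edge in both directions, tagged with its relation type.
            adjacency[u].add((v, relation))
            adjacency[v].add((u, relation))

    add_edges(gene_gene, "gene", "gene", "gene-gene")
    add_edges(disease_disease, "disease", "disease", "disease-disease")
    add_edges(gene_disease, "gene", "disease", "gene-disease")
    return node_type, dict(adjacency)

# Hypothetical toy edge lists; real ones would come from curated databases.
gene_gene = [("GENE_A", "GENE_B")]
disease_disease = [("DIS_X", "DIS_Y")]
gene_disease = [("GENE_A", "DIS_X"), ("GENE_B", "DIS_Y")]

node_type, adjacency = build_graph(gene_gene, disease_disease, gene_disease)
```

Keeping the relation label on every edge lets a downstream model treat the three interaction types differently or collapse them into a single adjacency matrix, depending on the design.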

Enrichment with Contextual Features

GLaDiGAtor leverages language models (ProtT5 and BioBERT) to enhance node features with contextual information.
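Embeddings from two different language models generally have different sizes, so they need to be brought into a common space before they can serve as node features in a single graph. One common way to do this is a separate linear projection per node type; whether GLaDiGAtor does this is not stated in the abstract. The sketch below assumes 1024-dimensional ProtT5 embeddings and 768-dimensional BioBERT embeddings, uses random arrays in place of real embeddings, and picks a hypothetical shared size of 128.

```python
import numpy as np

# Assumed embedding sizes; the shared size is an arbitrary illustrative choice.
PROTEIN_DIM, DISEASE_DIM, SHARED_DIM = 1024, 768, 128

rng = np.random.default_rng(0)

# Random stand-ins for precomputed embeddings (3 genes, 2 diseases).
gene_emb = rng.normal(size=(3, PROTEIN_DIM))
disease_emb = rng.normal(size=(2, DISEASE_DIM))

# One projection matrix per node type; in a trained model these would be learned.
w_gene = rng.normal(scale=PROTEIN_DIM ** -0.5, size=(PROTEIN_DIM, SHARED_DIM))
w_disease = rng.normal(scale=DISEASE_DIM ** -0.5, size=(DISEASE_DIM, SHARED_DIM))

def project(embeddings, weight):
    """Map type-specific embeddings into the shared feature space."""
    return embeddings @ weight

# Stack gene rows first, then disease rows, to form one node-feature matrix.
node_features = np.vstack([
    project(gene_emb, w_gene),
    project(disease_emb, w_disease),
])
```

The resulting matrix has one row per node and can be passed to a graph encoder alongside the adjacency structure.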

Demerits

Uncertain Generalizability

The abstract reports good generalization in the authors' evaluations, but it is unclear whether that performance carries over to biological contexts and datasets beyond those evaluated.

Interpretability Challenges

The abstract does not describe how individual predictions can be explained, so the reasoning behind a given disease-gene prediction may be difficult to interpret.

Expert Commentary

The reported results support the use of graph neural networks in biomedical informatics, and the integration of ProtT5 and BioBERT features illustrates the value of language-model context for biomedical graphs. Generalizability beyond the evaluated datasets and the interpretability of individual predictions remain open and warrant further investigation. Models that are both robust and interpretable will be important for understanding complex biological relationships and for discovering new therapeutic targets.

Recommendations

  • Future research should test GLaDiGAtor on more diverse biological contexts and datasets, and improve the interpretability and transparency of its predictions.
  • Combining GLaDiGAtor with other analysis techniques (e.g., network analysis, pathway enrichment) may improve its predictive accuracy and the biological relevance of its predictions.
