
BiScale-GTR: Fragment-Aware Graph Transformers for Multi-Scale Molecular Representation Learning


Yi Yang, Ovidiu Daescu

Abstract (arXiv:2604.06336v1)

Graph Transformers have recently attracted attention for molecular property prediction by combining the inductive biases of graph neural networks (GNNs) with the global receptive field of Transformers. However, many existing hybrid architectures remain GNN-dominated, so the resulting representations are still heavily shaped by local message passing. Moreover, most existing methods operate at only a single structural granularity, limiting their ability to capture molecular patterns that span multiple scales. We introduce BiScale-GTR, a unified framework for self-supervised molecular representation learning that combines chemically grounded fragment tokenization with adaptive multi-scale reasoning. Our method improves graph Byte Pair Encoding (BPE) tokenization to produce consistent, chemically valid, and high-coverage fragment tokens, which are used as fragment-level inputs to a parallel GNN-Transformer architecture. Architecturally, atom-level representations learned by a GNN are pooled into fragment-level embeddings and fused with fragment token embeddings before Transformer reasoning, enabling the model to jointly capture local chemical environments, substructure-level motifs, and long-range molecular dependencies. Experiments on MoleculeNet, PharmaBench, and the Long Range Graph Benchmark (LRGB) demonstrate state-of-the-art performance across both classification and regression tasks. Attribution analysis further shows that BiScale-GTR highlights chemically meaningful functional motifs, providing interpretable links between molecular structure and predicted properties. Code will be released upon acceptance.

Executive Summary

BiScale-GTR introduces a novel framework for molecular representation learning, addressing the limitations of existing Graph Transformers that are often GNN-dominated and single-scale. By integrating chemically grounded fragment tokenization with an adaptive multi-scale architecture, BiScale-GTR simultaneously captures local chemical environments, substructure motifs, and long-range dependencies. It enhances graph Byte Pair Encoding (BPE) for consistent, valid, and high-coverage fragment tokens, which are then processed in parallel with atom-level GNN representations. This fusion, followed by Transformer reasoning, yields state-of-the-art performance across diverse molecular property prediction benchmarks and offers improved interpretability through meaningful functional motif attribution.
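The fusion pipeline described above (atom-level GNN embeddings pooled into fragments, summed with fragment token embeddings, then reasoned over with attention) can be sketched in NumPy. All names, shapes, and the toy molecule below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnn_layer(x, adj):
    """One mean-aggregation message-passing step over atom features."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-9
    return np.tanh((adj @ x) / deg + x)

def pool_to_fragments(x, frag_id, n_frags):
    """Mean-pool atom embeddings into fragment-level embeddings."""
    out = np.zeros((n_frags, x.shape[1]))
    for f in range(n_frags):
        out[f] = x[frag_id == f].mean(axis=0)
    return out

def self_attention(h):
    """Single-head attention over fragment tokens (global receptive field)."""
    scores = h @ h.T / np.sqrt(h.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ h

# Toy molecule: a 6-atom chain split into 2 fragments, 8-dim features.
n_atoms, d = 6, 8
x = rng.normal(size=(n_atoms, d))
adj = np.zeros((n_atoms, n_atoms))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
frag_id = np.array([0, 0, 0, 1, 1, 1])   # fragment assignment per atom
frag_tok = rng.normal(size=(2, d))       # fragment-token embeddings (from graph BPE)

h_atom = gnn_layer(x, adj)                    # local chemical environments
h_frag = pool_to_fragments(h_atom, frag_id, 2)
h_fused = h_frag + frag_tok                   # fuse pooled GNN and token embeddings
h_out = self_attention(h_fused)               # Transformer-style reasoning over fragments
print(h_out.shape)                            # (2, 8)
```

The key design point the sketch illustrates is that attention operates over a handful of fragment tokens rather than all atoms, which is what gives the Transformer branch its long-range, multi-scale view at modest cost.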

Key Points

  • Introduces BiScale-GTR, a unified framework for multi-scale molecular representation learning using Graph Transformers.
  • Develops an improved graph Byte Pair Encoding (BPE) for generating consistent, chemically valid, and high-coverage fragment tokens.
  • Employs a parallel GNN-Transformer architecture that fuses atom-level GNN embeddings with fragment token embeddings for multi-scale reasoning.
  • Achieves state-of-the-art performance on MoleculeNet, PharmaBench, and LRGB benchmarks for classification and regression.
  • Provides enhanced interpretability by highlighting chemically meaningful functional motifs through attribution analysis.

Merits

Multi-Scale Representation

Effectively addresses the common limitation of single-granularity models by integrating atom-level GNNs with fragment-level Transformers, capturing diverse molecular patterns.

Chemically Grounded Tokenization

The improved graph BPE ensures fragment tokens are not arbitrary but chemically valid and high-coverage, enhancing the meaningfulness of higher-level representations.
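The core BPE mechanic behind this merit is greedy merging of frequent adjacent pairs. The sketch below is a plain sequence-BPE analogue in Python; graph BPE generalizes this loop from adjacent tokens to bonded atom pairs and adds the chemical-validity constraints the paper describes. Data and names are toy assumptions.

```python
from collections import Counter

def merge_pair(seq, a, b):
    """Replace every adjacent occurrence of (a, b) with the fused token a+b."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + b)   # fused fragment token
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def bpe_merges(corpus, n_merges):
    """Greedily merge the most frequent adjacent token pair n_merges times."""
    merges = []
    for _ in range(n_merges):
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        corpus = [merge_pair(seq, a, b) for seq in corpus]
    return corpus, merges

# Toy "molecules" as atom-symbol sequences (stand-ins for linearized fragments).
corpus = [list("CCOC"), list("CCN"), list("CCO")]
tokens, merges = bpe_merges(corpus, 2)
print(merges)   # [('C', 'C'), ('CC', 'O')]
print(tokens)   # [['CCO', 'C'], ['CC', 'N'], ['CCO']]
```

High coverage falls out of this construction: every input is always representable, since unmerged base tokens remain valid, while frequent substructures collapse into single fragment tokens.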

Unified Architecture

Seamlessly integrates local GNN reasoning with global Transformer attention, overcoming the GNN-dominated bias prevalent in many hybrid models.

Strong Empirical Performance

Demonstrates state-of-the-art results across a variety of challenging benchmarks, validating its effectiveness in diverse molecular tasks.

Enhanced Interpretability

Attribution analysis linking model predictions to chemically meaningful functional motifs is a significant step towards trustworthy AI in chemistry.
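The paper does not specify its attribution method here, but the fragment-level idea can be illustrated with a simple occlusion scheme: zero out one fragment embedding at a time and measure the change in a (here made-up, linear) property score. Everything below is a hypothetical sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
frag_emb = rng.normal(size=(3, 4))   # 3 fragments, 4-dim embeddings
w = rng.normal(size=4)               # stand-in linear property head

def predict(h):
    """Pool fragment embeddings and score a scalar property."""
    return float(h.mean(axis=0) @ w)

base = predict(frag_emb)
attributions = []
for f in range(frag_emb.shape[0]):
    occluded = frag_emb.copy()
    occluded[f] = 0.0                        # mask one fragment
    attributions.append(base - predict(occluded))

top = int(np.argmax(np.abs(attributions)))   # most influential fragment motif
print(top, attributions)
```

Mapping `top` back to the fragment's substructure is what yields the "functional motif" readout the merit describes.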

Demerits

Computational Complexity

The parallel GNN-Transformer architecture and multi-scale processing may introduce higher computational demands, especially for very large molecules or datasets.

Scalability of Tokenization

While improved, the scalability of graph BPE for extremely diverse or novel chemical spaces, particularly in drug discovery, warrants further investigation.

Generalizability of Fragmentation

The 'chemical validity' of fragments, while an improvement, might still be implicitly tied to the training data distribution used for BPE, potentially limiting generalizability to truly novel scaffolds.

Expert Commentary

BiScale-GTR represents a significant methodological advance in molecular representation learning, adeptly navigating the inherent trade-offs between local structural detail and global molecular context. The innovation lies not just in combining GNNs and Transformers, but in the sophisticated, chemically informed integration of these paradigms at multiple scales. The improved graph BPE for fragment tokenization is particularly noteworthy, moving beyond purely statistical tokenization to embed chemical intuition. This grounding in chemical validity lends considerable strength to the learned representations and, critically, enhances the interpretability of the model's predictions. The consistent state-of-the-art performance across diverse benchmarks underscores its robustness. For practitioners in drug discovery and materials science, this offers a powerful tool that promises not only predictive accuracy but also crucial insights into the underlying chemical mechanisms, bridging the gap between computational prediction and experimental design. The emphasis on interpretability positions BiScale-GTR as a foundational step towards more trusted and actionable AI in chemistry.

Recommendations

  • Conduct a thorough analysis of the computational efficiency and scalability of BiScale-GTR, particularly when applied to very large molecules (e.g., proteins) or ultra-large virtual screening libraries.
  • Explore the robustness of the improved graph BPE tokenization across highly diverse and novel chemical spaces, perhaps by benchmarking against specialized chemical vocabularies or expert-curated fragment libraries.
  • Investigate the application of BiScale-GTR in active learning loops to guide experimental design, leveraging its interpretability to suggest informative experiments.
  • Extend the attribution analysis to identify not just functional motifs but also key interactions or conformational features responsible for specific property predictions, potentially through dynamic graph representations.

Sources

Original: arXiv - cs.LG