Towards Improved Sentence Representations using Token Graphs
arXiv:2603.03389v1 (Announce Type: new)

Abstract: Obtaining a single-vector representation from a Large Language Model's (LLM) token-level outputs is a critical step for nearly all sentence-level tasks. However, standard pooling methods like mean or max aggregation treat tokens as an independent set, discarding the rich relational structure captured by the model's self-attention layers and making them susceptible to signal dilution. To address this, we introduce GLOT, a lightweight, structure-aware pooling module that reframes pooling as relational learning followed by aggregation. Operating on the outputs of a frozen LLM, GLOT first constructs a latent token-similarity graph, then refines token representations with a graph neural network, and finally aggregates them using a readout layer. Experimentally, our approach is remarkably robust and efficient: on a diagnostic stress test where 90% of tokens are random distractors, GLOT maintains over 97% accuracy while baseline methods collapse. Furthermore, it is competitive with state-of-the-art techniques on benchmarks like GLUE and MTEB with 20x fewer trainable parameters and speeds up training by over 100x compared with parameter-efficient fine-tuning methods. Supported by a theoretical analysis of its expressive power, our work shows that learning over token graphs is a powerful paradigm for the efficient adaptation of frozen LLMs. Our code is published at https://github.com/ipsitmantri/GLOT.
Executive Summary
The article 'Towards Improved Sentence Representations using Token Graphs' introduces GLOT, a pooling mechanism that addresses a critical bottleneck in sentence-level tasks by making better use of the token-level outputs of frozen LLMs. Rather than treating tokens as an independent set, as conventional pooling does, GLOT takes a structure-aware, graph-based approach: it constructs a latent token-similarity graph, refines token representations with a graph neural network (GNN), and aggregates them through a readout layer. The reported results show strong resilience to noise, with over 97% accuracy maintained when 90% of tokens are random distractors, alongside substantial efficiency gains: 20x fewer trainable parameters and over 100x faster training than parameter-efficient fine-tuning methods. This represents a meaningful advance in adapting frozen models without retraining, particularly for resource-constrained applications.
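The three-stage pipeline described above (similarity graph, GNN refinement, readout) can be sketched in a few lines. Note this is an illustrative reconstruction, not the authors' implementation: the cosine k-NN graph construction, the single round of degree-normalized message passing with a random projection, and the mean readout are all assumptions standing in for the paper's actual choices.

```python
import numpy as np

def glot_pool(token_embs, k=4, seed=0):
    """Structure-aware pooling sketch: graph construction -> message passing -> readout.

    Hypothetical simplification of a GLOT-style module; the real method learns
    its graph and GNN weights, whereas this sketch uses fixed heuristics.
    """
    n, d = token_embs.shape
    # 1. Latent token-similarity graph: cosine similarity, keep top-k neighbours.
    normed = token_embs / (np.linalg.norm(token_embs, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T
    adj = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sim[i])[-(k + 1):]  # top-k neighbours plus self
        adj[i, nbrs] = 1.0
    adj = np.maximum(adj, adj.T)              # symmetrise the graph
    # 2. One round of degree-normalised message passing: H' = ReLU(D^-1 A H W).
    deg = adj.sum(axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, d)) / np.sqrt(d)  # stand-in for learned GNN weights
    h = np.maximum((adj / deg) @ token_embs @ W, 0.0)
    # 3. Readout: aggregate refined token states into a single sentence vector.
    return h.mean(axis=0)
```

Because the graph restricts message passing to each token's nearest neighbours, distractor tokens that are dissimilar to the content tokens contribute little to the refined representations before readout, which is the intuition behind the robustness claim.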
Key Points
- ▸ GLOT introduces a graph-based pooling mechanism to enhance sentence representations from LLMs
- ▸ The method maintains high accuracy under adversarial token conditions
- ▸ It achieves substantial efficiency gains in both parameter count and training speed
Merits
Robustness
GLOT’s performance under signal dilution makes it highly reliable in real-world noisy environments.
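The signal-dilution failure mode that GLOT is claimed to resist is easy to reproduce with a toy example (this is a hypothetical demonstration of the general phenomenon, not the paper's diagnostic stress test): mean pooling averages informative tokens together with random distractors, so the pooled vector drifts away from the signal direction as distractors dominate.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = np.ones(16)                          # a clean "signal" direction
distractors = rng.standard_normal((90, 16))   # 90% of tokens are random distractors
tokens = np.vstack([np.tile(signal, (10, 1)), distractors])

pooled = tokens.mean(axis=0)                  # naive mean pooling over all 100 tokens
cos = pooled @ signal / (np.linalg.norm(pooled) * np.linalg.norm(signal))
# cos measures how well the pooled vector still aligns with the signal direction;
# with 90% distractors it falls visibly below the distractor-free value of 1.0.
```

A structure-aware pooler can instead down-weight tokens that are weakly connected in the similarity graph, which is why the stress-test result is a meaningful differentiator rather than a benchmark artifact.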
Efficiency
The reduction in parameter count and acceleration in training time make GLOT scalable and accessible.
Demerits
Implementation Complexity
The graph construction and GNN stages add moving parts beyond standard pooling, which may introduce overhead when deploying the method or integrating it into existing pipelines.
Expert Commentary
GLOT represents a paradigm shift in how we treat token-level outputs in downstream tasks. Traditionally, pooling was viewed as a mechanical aggregation—mean, max, or attention—but the authors reframe it as a relational learning problem. This is a profound conceptual leap. The use of a latent similarity graph to capture implicit model-derived relationships, followed by GNN refinement, transforms the pooling layer from a black box into an interpretable, structure-aware component. Moreover, the empirical validation on adversarial distractor scenarios is particularly compelling, as it isolates the true value of the method from superficial improvements. The scalability and efficiency metrics are not merely incremental—they are transformative for deployment on edge devices or low-resource platforms. While integration into fine-tuning workflows may require adaptation, the core contribution—reconceptualizing pooling as relational learning—is foundational. This work deserves serious consideration as a standard in LLM adaptation literature.
Recommendations
- ✓ Researchers should evaluate GLOT as a baseline for future LLM adaptation studies, particularly where parameter efficiency or robustness is critical.
- ✓ Open-source repositories should incorporate GLOT as a reference implementation for comparative analysis in downstream task benchmarks.