How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders
arXiv:2602.19115v1 Announce Type: new Abstract: In recent years, there has been a growing use of generative AI, and large language models (LLMs) in particular, to support both the assessment and generation of scientific work. Although some studies have shown that LLMs can, to a certain extent, evaluate research according to perceived quality, our understanding of the internal mechanisms that enable this capability remains limited. This paper presents the first study that investigates how LLMs encode the concept of scientific quality through relevant monosemantic features extracted using sparse autoencoders. We derive such features under different experimental settings and assess their ability to serve as predictors across three tasks related to research quality: predicting citation count, journal SJR, and journal h-index. The results indicate that LLMs encode features associated with multiple dimensions of scientific quality. In particular, we identify four recurring types of features that capture key aspects of how research quality is represented: 1) features reflecting research methodologies; 2) features related to publication type, with literature reviews typically exhibiting higher impact; 3) features associated with high-impact research fields and technologies; and 4) features corresponding to specific scientific jargon. These findings represent an important step toward understanding how LLMs encapsulate concepts related to research quality.
Executive Summary
This study investigates how large language models (LLMs) encode the concept of scientific quality. Using sparse autoencoders, the authors extract monosemantic features from LLM activations and assess how well those features predict research quality across three tasks: citation count, journal SJR, and journal h-index. The results indicate that LLMs encode features spanning multiple dimensions of scientific quality, including research methodologies, publication types, high-impact research fields and technologies, and specific scientific jargon. The findings are a step toward understanding the internal mechanisms by which LLMs represent research quality.
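The predictive-probe setup described above can be sketched as a simple regression from per-paper feature activations onto a quality proxy. This is a minimal illustration, not the paper's actual pipeline: the feature matrix, target, and ridge penalty below are all hypothetical stand-ins (the authors' real inputs would be SAE feature activations over paper text and observed citation or journal metrics).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: per-paper mean SAE feature activations (n papers
# x k features, non-negative like ReLU outputs) and a toy target playing
# the role of log(1 + citation count).
n, k = 200, 32
X = np.abs(rng.normal(size=(n, k)))
true_w = rng.normal(size=k)
y = X @ true_w + 0.1 * rng.normal(size=n)

# Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# R^2 on the training data indicates how much of the quality proxy the
# feature activations can account for.
pred = X @ w
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

A probe like this is deliberately weak (linear, closed-form): if it predicts the proxy well, the information must already be linearly accessible in the features themselves rather than created by the probe.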
Key Points
- ▸ LLMs can encode features associated with multiple dimensions of scientific quality
- ▸ Four recurring feature types capture how quality is represented: research methodologies, publication type, high-impact fields and technologies, and scientific jargon
- ▸ Sparse autoencoders can extract monosemantic features relevant to research quality
Merits
Methodological Innovation
The use of sparse autoencoders to extract monosemantic features is a novel approach to understanding LLMs' internal mechanisms.
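The core of this technique can be sketched in a few lines. The sketch below is a generic sparse autoencoder forward pass and loss, assuming the standard formulation (overcomplete ReLU encoder, linear decoder, L1 sparsity penalty); the dimensions, random activations, and coefficient are illustrative stand-ins, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for LLM residual-stream activations: 256 samples, 64 dims.
d_model, d_feat, n = 64, 512, 256
acts = rng.normal(size=(n, d_model))

# SAE parameters: an overcomplete dictionary (d_feat >> d_model) so each
# learned feature can specialize on one interpretable direction.
W_enc = rng.normal(scale=0.1, size=(d_model, d_feat))
b_enc = np.zeros(d_feat)
W_dec = rng.normal(scale=0.1, size=(d_feat, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    # Encode: ReLU keeps feature activations non-negative and sparse.
    f = np.maximum(x @ W_enc + b_enc, 0.0)
    # Decode: reconstruct the activation from the feature dictionary.
    x_hat = f @ W_dec + b_dec
    return f, x_hat

def sae_loss(x, l1_coeff=1e-3):
    f, x_hat = sae_forward(x)
    recon = np.mean((x - x_hat) ** 2)          # reconstruction error
    sparsity = l1_coeff * np.mean(np.abs(f))   # L1 term pushes activations to 0
    return recon + sparsity

loss = sae_loss(acts)
```

Training minimizes this loss with gradient descent; the sparsity term is what tends to make individual features monosemantic, so that a single feature can later be read off as, e.g., "literature review" or a field-specific term.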
Demerits
Limited Generalizability
The features were derived under specific experimental settings, so the findings may not generalize to other domains, models, or evaluation tasks, limiting the applicability of the results.
Expert Commentary
This study represents a significant contribution to our understanding of how LLMs encode concepts related to scientific quality. The use of sparse autoencoders to extract monosemantic features is a methodological innovation that sheds light on the internal mechanisms of LLMs. The findings have important implications for the development of more effective LLM-based tools for research evaluation and generation. However, further research is needed to fully understand the generalizability of the results and to address the limitations of the study.
Recommendations
- ✓ Future studies should investigate the applicability of the findings to other domains and tasks
- ✓ Researchers should explore the use of other machine learning techniques to extract features relevant to research quality