Enhancing Scientific Literature Chatbots with Retrieval-Augmented Generation: A Performance Evaluation of Vector and Graph-Based Systems
arXiv:2602.17856v1. Abstract: This paper investigates the enhancement of scientific literature chatbots through retrieval-augmented generation (RAG), with a focus on evaluating vector- and graph-based retrieval systems. The proposed chatbot leverages both structured (graph) and unstructured (vector) databases to access scientific articles and gray literature, enabling efficient triage of sources according to research objectives. To systematically assess performance, we examine two use-case scenarios: retrieval from a single uploaded document and retrieval from a large-scale corpus. Benchmark test sets were generated using a GPT model, with selected outputs annotated for evaluation. The comparative analysis emphasizes retrieval accuracy and response relevance, providing insight into the strengths and limitations of each approach. The findings demonstrate the potential of hybrid RAG systems to improve accessibility to scientific knowledge and to support evidence-based decision making.
Executive Summary
This study evaluates vector- and graph-based retrieval systems for enhancing scientific literature chatbots through retrieval-augmented generation (RAG). The proposed chatbot leverages both structured (graph) and unstructured (vector) databases to access scientific articles and gray literature, enabling efficient triage of sources according to research objectives. The study examines two use-case scenarios, retrieval from a single uploaded document and retrieval from a large-scale corpus, and compares the two approaches on retrieval accuracy and response relevance. The findings demonstrate the potential of hybrid RAG systems to improve accessibility to scientific knowledge and support evidence-based decision making.
Key Points
- ▸ The study proposes a hybrid RAG system that leverages both structured (graph) and unstructured (vector) databases.
- ▸ The system enables efficient triage of sources according to research objectives.
- ▸ The study examines two use-case scenarios: retrieval from a single uploaded document and retrieval from a large-scale corpus.
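To make the vector-plus-graph idea concrete, here is a minimal sketch of hybrid retrieval. It is not the paper's implementation: the toy corpus, the `GRAPH` edges, and the function names are all hypothetical, and a bag-of-words count stands in for a learned embedding model.

```python
import math
from collections import Counter

# Toy corpus: document ids and texts are invented for illustration.
DOCS = {
    "d1": "graph databases store entities and relations",
    "d2": "vector embeddings support semantic similarity search",
    "d3": "retrieval augmented generation grounds chatbot answers in sources",
}

# Hypothetical graph edges (e.g. citation or shared-topic links) between documents.
GRAPH = {"d1": {"d3"}, "d2": {"d3"}, "d3": {"d1", "d2"}}


def embed(text):
    """Bag-of-words term counts; a stand-in for a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_search(query, k=1):
    """Return the k document ids most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(DOCS[d])), reverse=True)
    return ranked[:k]


def hybrid_search(query, k=1):
    """Vector hits first, then their graph neighbours, without duplicates."""
    hits = vector_search(query, k)
    results = list(hits)
    for doc_id in hits:
        for neighbour in sorted(GRAPH.get(doc_id, ())):
            if neighbour not in results:
                results.append(neighbour)
    return results
```

In this sketch the vector step finds the passage closest to the query, and the graph step adds linked documents that share no wording with it. A production system would likely replace both stand-ins with a vector database and a graph database.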
Merits
Strength
The study provides a systematic evaluation of vector- and graph-based retrieval systems, shedding light on the strengths and limitations of each approach.
Innovation
The proposed hybrid RAG system has the potential to improve accessibility to scientific knowledge and support evidence-based decision making.
Methodology
The study assesses the retrieval systems on benchmark test sets generated using a GPT model, with selected outputs annotated for evaluation.
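As an illustration of how retrieval accuracy might be scored against such a benchmark, the sketch below computes a hit rate at k. The benchmark pairs, passage ids, and `toy_retrieve` function are hypothetical, and the paper's exact scoring procedure is not described here.

```python
def hit_rate_at_k(benchmark, retrieve, k=3):
    """Fraction of questions whose annotated source appears in the top-k results."""
    if not benchmark:
        return 0.0
    hits = sum(
        1
        for question, relevant_id in benchmark
        if relevant_id in retrieve(question)[:k]
    )
    return hits / len(benchmark)


# Hypothetical benchmark: (question, id of the annotated relevant passage).
BENCHMARK = [
    ("what stores entities and relations", "p1"),
    ("how are answers grounded", "p3"),
]

# Stand-in retriever that returns a fixed ranking per question.
RANKINGS = {
    "what stores entities and relations": ["p1", "p2", "p3"],
    "how are answers grounded": ["p2", "p1", "p3"],
}


def toy_retrieve(question):
    return RANKINGS.get(question, [])


print(hit_rate_at_k(BENCHMARK, toy_retrieve, k=1))  # 0.5: only the first question hits
print(hit_rate_at_k(BENCHMARK, toy_retrieve, k=3))  # 1.0: both sources are in the top 3
```

Running the same function over a vector retriever and a graph retriever would give directly comparable scores for each scenario.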
Demerits
Limitation
The study examines only two use-case scenarios (a single uploaded document and a large-scale corpus), and its generalizability to other domains and applications is unclear.
Data
The benchmark test sets were generated using a GPT model and only selected outputs were annotated, so they may not be representative of the broader scientific literature.
Evaluation Metrics
The comparative analysis emphasizes retrieval accuracy and response relevance, which may not capture the full range of performance characteristics of the retrieval systems.
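One rank-aware measure that could complement a plain accuracy score is mean reciprocal rank (MRR), which rewards placing the relevant source near the top of the results. The sketch below is an illustration only: nothing here indicates the paper uses MRR, and the function names and input shape are assumptions.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """1 / position of the relevant id in the ranking, or 0.0 if it is absent."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_id) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)
```

Two systems with the same top-3 hit rate can differ on MRR if one of them consistently ranks the correct source first.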
Expert Commentary
This study contributes to information retrieval for scientific literature. The proposed hybrid RAG system has the potential to improve accessibility to scientific knowledge and support evidence-based decision making. However, its limitations, namely the restriction to two use-case scenarios and the reliance on GPT-generated benchmark test sets, need to be addressed in future research. The findings also have implications for the development of evidence-based decision-making frameworks and the use of scientific literature in policy-making. Overall, the study provides valuable insight into the strengths and limitations of vector- and graph-based retrieval systems for scientific literature retrieval.
Recommendations
- ✓ Future research should focus on developing hybrid RAG systems that can be applied to different domains and applications.
- ✓ Future research should evaluate use-case scenarios beyond the two studied and use larger, more representative benchmark datasets.
- ✓ Developers of evidence-based decision-making frameworks should consider the study's findings when incorporating scientific literature into policy-making.