P-RAG: Prompt-Enhanced Parametric RAG with LoRA and Selective CoT for Biomedical and Multi-Hop QA


Xingda Lyu, Gongfu Lyu, Zitai Yan, Yuxin Jiang

arXiv:2602.15874v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate remarkable capabilities but remain limited by their reliance on static training data. Retrieval-Augmented Generation (RAG) addresses this constraint by retrieving external knowledge during inference, though it still depends heavily on knowledge base quality. To explore potential improvements, we evaluated three RAG variants on both general and biomedical datasets: Standard RAG, DA-RAG, and our proposed Prompt-Enhanced Parametric RAG (P-RAG), a hybrid architecture that integrates the LLM's parametric knowledge with retrieved evidence, guided by Chain-of-Thought (CoT) prompting and Low-Rank Adaptation (LoRA) fine-tuning. Using LLaMA-3.2-1B-Instruct fine-tuned via LoRA, we evaluate on PubMedQA and 2WikiMultihopQA. P-RAG outperforms Standard RAG on PubMedQA by 10.47 percentage points in F1 (93.33% vs. 82.86%; 12.64% relative). On 2WikiMultihopQA, P-RAG nearly doubles the overall score vs. Standard RAG (33.44% vs. 17.83%) and achieves 44.03% on the Compare subset (with 42.74% Bridge, 21.84% Inference, 8.60% Compose). CoT prompting substantially improves multi-hop reasoning but yields mixed results for simpler, single-hop queries. These findings underscore P-RAG's potential for accurate, scalable, and contextually adaptive biomedical question answering. Our contributions include: (1) LoRA-based fine-tuning of LLaMA-3.2-1B-Instruct for biomedical QA, (2) introduction of P-RAG with Chain-of-Thought prompting, and (3) state-of-the-art results on PubMedQA and 2WikiMultihopQA.

Executive Summary

This article presents a novel approach to Retrieval-Augmented Generation (RAG), called Prompt-Enhanced Parametric RAG (P-RAG), which integrates the parametric knowledge stored in the Large Language Model (LLM) with retrieved evidence. P-RAG is guided by Chain-of-Thought (CoT) prompting and Low-Rank Adaptation (LoRA) fine-tuning. The authors evaluate P-RAG on both general and biomedical datasets, reporting state-of-the-art results on PubMedQA and 2WikiMultihopQA. The findings demonstrate the potential of P-RAG for accurate, scalable, and contextually adaptive biomedical question answering. The contributions include LoRA-based fine-tuning of LLaMA-3.2-1B-Instruct for biomedical QA, the introduction of P-RAG with Chain-of-Thought prompting, and improved multi-hop reasoning capabilities.
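The abstract describes P-RAG's inference flow only at a high level: retrieve evidence, compose a CoT-guided prompt, and generate with the LoRA-adapted model. A minimal sketch of that flow might look like the following; the retriever, generator stub, and prompt wording are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a P-RAG-style inference flow: retrieve evidence,
# build a Chain-of-Thought prompt, and pass it to a generator (which would
# be the LoRA-adapted LLaMA model; here it is a stub).

def retrieve(query, corpus, k=2):
    """Toy lexical retriever: rank passages by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))
    return scored[:k]

def build_cot_prompt(query, passages):
    """Compose retrieved evidence with a CoT instruction (wording assumed)."""
    evidence = "\n".join(f"- {p}" for p in passages)
    return (
        f"Evidence:\n{evidence}\n\n"
        f"Question: {query}\n"
        "Let's reason step by step before giving the final answer."
    )

def p_rag_answer(query, corpus, generate):
    """End-to-end call; `generate` stands in for the fine-tuned model."""
    prompt = build_cot_prompt(query, retrieve(query, corpus))
    return generate(prompt)

corpus = [
    "Aspirin inhibits the COX enzymes.",
    "Paris is the capital of France.",
]
answer = p_rag_answer(
    "Which enzymes does aspirin inhibit?",
    corpus,
    generate=lambda prompt: "COX enzymes" if "COX" in prompt else "unknown",
)
print(answer)
```

In a real pipeline the stub `generate` would wrap the LoRA-adapted LLaMA-3.2-1B-Instruct model and the retriever would query a dense or BM25 index; the structure of the call chain, however, would stay the same.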

Key Points

  • P-RAG integrates the LLM's parametric knowledge with retrieved evidence, reducing the dependence on external knowledge base quality.
  • P-RAG is guided by Chain-of-Thought (CoT) prompting and Low-Rank Adaptation (LoRA) fine-tuning.
  • The authors evaluate P-RAG on both general and biomedical datasets, achieving state-of-the-art results on PubMedQA and 2WikiMultihopQA.
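LoRA, named in the key points above, adapts a frozen weight matrix W by adding a learned low-rank correction scaled by alpha/r, so the forward pass becomes Wx + (alpha/r)·B(Ax). A tiny pure-Python illustration of that update, with dimensions and values chosen purely for demonstration:

```python
# LoRA in miniature: the adapted forward pass adds a low-rank correction
# (alpha / r) * B @ A @ x to the frozen weight's output W @ x.
# The matrices here are tiny and purely illustrative.

def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=2.0, r=1):
    base = matvec(W, x)                # frozen pretrained path
    delta = matvec(B, matvec(A, x))    # low-rank adapter path (rank r)
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# 2x2 frozen weight, rank-1 adapter: A projects down to rank 1, B back up.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # 1x2 down-projection
B = [[0.5], [0.0]]      # 2x1 up-projection
x = [1.0, 2.0]
print(lora_forward(W, A, B, x))  # [4.0, 2.0]
```

Only A and B are trained, so the number of trainable parameters scales with the rank r rather than with the full weight matrix, which is what makes fine-tuning a model like LLaMA-3.2-1B-Instruct tractable on modest hardware.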

Merits

Strength in Knowledge Representation

P-RAG effectively integrates the LLM's parametric knowledge with retrieved evidence, mitigating both the constraint of static training data and RAG's heavy dependence on knowledge base quality.

Improved Multi-Hop Reasoning

CoT prompting substantially improves multi-hop reasoning capabilities, demonstrating the potential of P-RAG for contextually adaptive question answering.
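Since the abstract reports that CoT helps multi-hop questions but gives mixed results on simpler single-hop queries, a natural design is to apply CoT selectively. The paper does not specify a routing criterion, so the keyword heuristic below is purely an illustrative assumption:

```python
# Selective CoT: a toy router that applies Chain-of-Thought prompting only
# to queries that look multi-hop. The cue list and heuristic are assumptions
# for illustration; the paper does not describe its routing rule.

MULTI_HOP_CUES = ("compare", "both", "which of", "earlier", "same")

def looks_multi_hop(query):
    q = query.lower()
    return any(cue in q for cue in MULTI_HOP_CUES)

def make_prompt(query):
    if looks_multi_hop(query):
        # Multi-hop: elicit intermediate reasoning steps.
        return f"Question: {query}\nLet's reason step by step."
    # Single-hop: a direct instruction avoids CoT's mixed results here.
    return f"Question: {query}\nAnswer concisely."

print(make_prompt("Compare the birth years of the two directors."))
print(make_prompt("What is the capital of France?"))
```

A learned classifier over the query (or the retrieved evidence) could replace the keyword test, but even this simple gate captures the idea that CoT is a per-query decision rather than a global setting.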

State-of-the-Art Results

P-RAG achieves state-of-the-art results on PubMedQA and 2WikiMultihopQA, underscoring its potential for accurate and scalable biomedical question answering.

Demerits

Overreliance on Prompting

P-RAG's effectiveness may be contingent upon the quality and design of the CoT prompting used, potentially limiting its applicability in real-world scenarios.

Computational Resource Intensity

P-RAG's integration of LoRA fine-tuning and Chain-of-Thought prompting may incur significant computational resource requirements, potentially hindering its adoption in resource-constrained environments.

Expert Commentary

This article presents a significant advancement in Retrieval-Augmented Generation, offering a novel way to combine the LLM's parametric knowledge with retrieved evidence. The evaluation on both general and biomedical datasets demonstrates P-RAG's potential for accurate and scalable question answering. However, the gains rest on Chain-of-Thought prompting and LoRA fine-tuning, both of which add computational cost and prompt-engineering burden: adoption in resource-constrained environments may be difficult, and effectiveness will likely hinge on the quality and design of the CoT prompts used.

Recommendations

  • Future studies should investigate the applicability of P-RAG in resource-constrained environments and explore the potential of its integration with other question answering systems.
  • The authors should further develop and refine the design of Chain-of-Thought prompting to improve its effectiveness and efficiency.
