PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems

Sudip Bhujel

arXiv:2603.03054v1 Announce Type: new Abstract: Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from doctor-patient conversations that may contain sensitive information. Conventional supervised fine-tuning and reinforcement learning from human feedback (RLHF) can amplify memorization risks, enabling empirical membership inference and extraction of rare training-set content. We present PrivMedChat, an end-to-end framework for differentially private RLHF (DP-RLHF) for medical dialogue. Our design enforces differential privacy at every training stage that directly accesses dialogue-derived supervision: (i) Differentially Private Stochastic Gradient Descent (DP-SGD) for medical SFT and (ii) DP-SGD for reward model learning from preference pairs. To limit additional privacy expenditure during alignment, we apply DP-SGD to the PPO actor and critic when operating on dialogue-derived prompts, while the reward model remains fixed after DP training. We also introduce an annotation-free preference construction strategy that pairs physician responses with filtered non-expert generations to produce scalable preference data without clinician labeling. Experiments on medical dialogue benchmarks show that PrivMedChat at $\varepsilon=7$ achieves the highest ROUGE-L of 0.156 among all DP models, reduces clinical hallucinations to 1.4% and harmful advice to 0.4%, and obtains the highest overall score of 2.86 in a 3-model LLM-jury evaluation, while producing membership-inference signals that are near chance (AUC 0.510-0.555). We open-source our code at https://github.com/sudip-bhujel/privmedchat.

Executive Summary

This article presents PrivMedChat, an end-to-end differentially private reinforcement learning from human feedback (DP-RLHF) framework for medical dialogue systems. The framework addresses memorization and membership-inference risks by enforcing differential privacy at every training stage that touches dialogue-derived supervision. The authors also introduce an annotation-free preference construction strategy and validate the framework on medical dialogue benchmarks: at $\varepsilon=7$, PrivMedChat reports the highest ROUGE-L (0.156) among DP models, reduces clinical hallucinations to 1.4% and harmful advice to 0.4%, and obtains the highest overall score (2.86) in a 3-model LLM-jury evaluation. The main open questions concern the computational and utility costs that DP-SGD imposes. These contributions matter for building medical dialogue systems that are both useful and private.

Key Points

  • PrivMedChat is an end-to-end differentially private RLHF framework for medical dialogue systems
  • The framework addresses memorization risks and membership inference through differential privacy
  • Annotation-free preference construction strategy is introduced for scalable preference data
  • Experiments demonstrate the framework's effectiveness on medical dialogue benchmarks
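The privacy mechanism underlying both DP training stages (SFT and reward model learning) is DP-SGD: clip each example's gradient to a fixed L2 bound, then add Gaussian noise calibrated to that bound before updating. A minimal pure-Python sketch of one such update step, with toy gradients and function names that are illustrative rather than taken from the paper's code:

```python
import math
import random

def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]

def dp_sgd_step(per_example_grads, max_norm, noise_multiplier, rng):
    """One DP-SGD update direction: clip every example's gradient,
    sum them, add Gaussian noise scaled to the clipping bound, average."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [v / n for v in noisy]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # toy per-example gradients
update = dp_sgd_step(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(update)
```

Because each example's influence on the update is bounded by `max_norm` and masked by noise, no single conversation can dominate what the model learns; the noise multiplier and number of steps together determine the privacy budget $\varepsilon$.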

Merits

Strength in Addressing Memorization Risks

PrivMedChat mitigates memorization and membership-inference risks by applying differential privacy at every training stage that accesses dialogue-derived supervision, and the reported membership-inference AUCs of 0.510-0.555 sit close to the 0.5 chance level, supporting this claim empirically.

Improved Scalability

The annotation-free preference construction strategy pairs physician responses with filtered non-expert generations, producing scalable preference data without clinician labeling and removing an expensive bottleneck from the RLHF pipeline.
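The pairing idea can be sketched simply: treat the physician's reply as the preferred response and a filtered non-expert generation as the dispreferred one. The filter heuristic, field names, and example texts below are illustrative assumptions, not the paper's actual pipeline:

```python
def build_preference_pairs(dialogues, passes_filter):
    """Pair each physician response (preferred) with every filtered
    non-expert generation (dispreferred) -- no clinician labels needed."""
    pairs = []
    for d in dialogues:
        kept = [g for g in d["model_generations"] if passes_filter(g)]
        for rejected in kept:
            pairs.append({
                "prompt": d["prompt"],
                "chosen": d["physician_response"],
                "rejected": rejected,
            })
    return pairs

def passes_filter(text):
    """Hypothetical quality filter: drop degenerate or boilerplate outputs."""
    return len(text.split()) >= 5 and "as an ai" not in text.lower()

dialogues = [{
    "prompt": "Patient reports a persistent dry cough for three weeks.",
    "physician_response": "A cough lasting over three weeks warrants an exam...",
    "model_generations": [
        "Try drinking water.",  # too short -> filtered out
        "You should probably just wait and see if it goes away on its own.",
    ],
}]
pairs = build_preference_pairs(dialogues, passes_filter)
print(len(pairs))  # prints 1
```

The resulting (chosen, rejected) pairs feed the DP-trained reward model directly, which is why the strategy scales: preference data grows with available generations rather than with clinician time.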

High ROUGE-L Scores

PrivMedChat reports the highest ROUGE-L (0.156) among DP models at $\varepsilon=7$ on medical dialogue benchmarks, indicating that its responses remain coherent and contextually relevant despite the noise that DP training adds.

Demerits

Reliance on DP-SGD

DP-SGD requires per-example gradient clipping and noise addition, which increases compute and memory costs relative to standard training and can degrade utility at tight privacy budgets; this may limit deployment in resource-constrained environments.

Potential for Membership Inference

Although the reported membership-inference AUCs (0.510-0.555) are near chance, those numbers reflect the specific attacks evaluated; it remains unclear how the framework fares against stronger or adaptive adversaries.
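An AUC of 0.5 means the attacker does no better than coin-flipping at telling training members from non-members. The rank-based AUC such evaluations report can be computed directly from attack scores; the scores below are made-up illustrative values, not the paper's data:

```python
def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member
    (area under the ROC curve); 0.5 means the attack is at chance."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

# Loss-based attack: lower training loss -> higher membership score.
members = [0.52, 0.48, 0.61, 0.45]     # attack scores on training examples
nonmembers = [0.50, 0.49, 0.58, 0.47]  # attack scores on held-out examples
print(round(auc(members, nonmembers), 3))  # prints 0.5
```

The caveat in the paragraph above is that a low AUC for one attack is a lower bound on leakage, not an upper bound: a different attack statistic could still separate the two score distributions.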

Expert Commentary

PrivMedChat represents a meaningful advance toward private medical dialogue systems. Its most notable design choice is enforcing differential privacy at every stage that touches dialogue-derived supervision, rather than at a single stage, while keeping the reward model fixed after DP training to cap privacy expenditure during alignment. Further work should quantify the computational and utility costs that DP-SGD imposes, and the annotation-free preference construction strategy deserves study as a way to scale preference data beyond clinician labeling. As the healthcare industry continues to adopt AI-powered technologies, secure and trustworthy medical dialogue systems will only grow in importance, and PrivMedChat is a useful step in that direction.

Recommendations

  • Future research should focus on exploring alternative differentially private techniques that can mitigate the computational costs associated with DP-SGD
  • The development of more robust and private reinforcement learning frameworks is essential for the widespread adoption of AI-powered healthcare technologies