Generative Pseudo-Labeling for Pre-Ranking with LLMs

arXiv:2602.20995v1 Announce Type: cross Abstract: Pre-ranking is a critical stage in industrial recommendation systems, tasked with efficiently scoring thousands of recalled items for downstream ranking. A key challenge is the train-serving discrepancy: pre-ranking models are trained only on exposed interactions, yet must score all recalled candidates -- including unexposed items -- during online serving. This mismatch not only induces severe sample selection bias but also degrades generalization, especially for long-tail content. Existing debiasing approaches typically rely on heuristics (e.g., negative sampling) or distillation from biased rankers, which either mislabel plausible unexposed items as negatives or propagate exposure bias into pseudo-labels. In this work, we propose Generative Pseudo-Labeling (GPL), a framework that leverages large language models (LLMs) to generate unbiased, content-aware pseudo-labels for unexposed items, explicitly aligning the training distribution with the online serving space. By offline generating user-specific interest anchors and matching them with candidates in a frozen semantic space, GPL provides high-quality supervision without adding online latency. Deployed in a large-scale production system, GPL improves click-through rate by 3.07%, while significantly enhancing recommendation diversity and long-tail item discovery.

Executive Summary

This article proposes Generative Pseudo-Labeling (GPL), a framework that uses large language models (LLMs) to generate unbiased, content-aware pseudo-labels for unexposed items in pre-ranking recommendation systems. GPL addresses the train-serving discrepancy by offline generating user-specific interest anchors and matching them against candidates in a frozen semantic space, providing high-quality supervision without adding online latency. Deployed in a large-scale production system, GPL achieves a 3.07% improvement in click-through rate while enhancing recommendation diversity and long-tail item discovery.
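The anchor-matching step described above can be sketched in a few lines. Note the abstract does not specify the similarity measure, the aggregation over anchors, or any threshold; the cosine similarity, max-over-anchors aggregation, and `threshold` value below are illustrative assumptions chosen for concreteness, not details from the paper.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between two sets of row vectors.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def pseudo_labels(anchor_embs, candidate_embs, threshold=0.6):
    """Score each recalled candidate by its best match against the user's
    LLM-generated interest anchors, all embedded in a frozen semantic space.
    Candidates scoring above `threshold` receive a positive pseudo-label."""
    sims = cosine_sim(anchor_embs, candidate_embs)  # (n_anchors, n_candidates)
    scores = sims.max(axis=0)                       # best anchor per candidate
    return scores, (scores >= threshold).astype(int)

# Toy example: 2 interest anchors and 3 candidates in a 4-d embedding space.
anchors = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
cands = np.array([[0.9, 0.1, 0.0, 0.0],   # close to anchor 0
                  [0.0, 0.0, 1.0, 0.0],   # matches no anchor
                  [0.1, 0.9, 0.0, 0.0]])  # close to anchor 1
scores, labels = pseudo_labels(anchors, cands)
# labels -> [1, 0, 1]: the unrelated candidate gets a negative pseudo-label
```

Because both anchors and candidates live in a frozen embedding space, all of this can run offline in batch, which is how the framework avoids adding online serving latency.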

Key Points

  • Pre-ranking is a critical stage in industrial recommendation systems, tasked with efficiently scoring thousands of recalled items for downstream ranking.
  • Existing debiasing approaches typically rely on heuristics or distillation from biased rankers, which either mislabel plausible unexposed items as negatives or propagate exposure bias into pseudo-labels.
  • GPL leverages LLMs to generate unbiased, content-aware pseudo-labels for unexposed items, explicitly aligning the training distribution with the online serving space.

Merits

Strength in Addressing Train-Serving Discrepancy

GPL effectively addresses the train-serving discrepancy by generating unbiased pseudo-labels for unexposed items, improving generalization and reducing sample selection bias.
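One way to picture how pseudo-labels realign the training distribution with the serving space: exposed interactions keep their observed click labels, while unexposed recalled candidates, which a conventionally trained model never sees, enter the training set with pseudo-labels. A minimal sketch follows; the sample-weighting scheme and the function names are illustrative assumptions, not details from the paper.

```python
def build_training_set(exposed, unexposed, pseudo_label_fn):
    """Combine exposed interactions (observed click labels) with unexposed
    recalled candidates labeled by a pseudo-labeler, so that the training
    distribution covers the full candidate space scored at serving time.

    exposed:   list of (item_id, click) pairs with observed labels.
    unexposed: list of item_ids recalled but never shown to the user.
    """
    # Observed interactions keep full sample weight.
    samples = [(item, label, 1.0) for item, label in exposed]
    for item in unexposed:
        # Pseudo-labeled samples are down-weighted to reflect their lower
        # confidence (the 0.5 weight here is an illustrative assumption).
        samples.append((item, pseudo_label_fn(item), 0.5))
    return samples

train = build_training_set(
    exposed=[("a", 1), ("b", 0)],
    unexposed=["c", "d"],
    pseudo_label_fn=lambda item: 1 if item == "c" else 0,
)
# All four recalled items now contribute supervision, not just "a" and "b".
```

Training on this augmented set is what reduces the sample selection bias: the model no longer treats "unexposed" as implicitly negative.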

Improved Recommendation Diversity

Because GPL's pseudo-labels extend supervision to unexposed candidates rather than only exposed items, the model learns signals for long-tail content, improving recommendation diversity and long-tail item discovery alongside click-through rate, and the offline generation pipeline means none of this adds online latency.

Demerits

Dependence on Large Language Models

GPL's pseudo-label quality depends directly on the underlying large language model, so deployments inherit the LLM's offline inference cost, availability constraints, and any gaps in its coverage of the item domain.

Potential for Overfitting

The offline generation of user-specific interest anchors may lead to overfitting if not properly regularized, affecting the generalizability of GPL.

Expert Commentary

The GPL framework presents a novel approach to addressing the train-serving discrepancy in pre-ranking recommendation systems. By leveraging large language models to generate unbiased pseudo-labels for unexposed items, GPL improves recommendation diversity and long-tail item discovery. However, the framework's dependence on large language models and potential for overfitting require careful consideration. The implications of GPL's deployment in large-scale production systems and its potential impact on data privacy and biased recommendations warrant further investigation.

Recommendations

  • Future research should explore the application of GPL in other recommendation systems and evaluate its performance in diverse scenarios.
  • Developers and practitioners should carefully consider the dependence on large language models and potential for overfitting when deploying GPL in production.
