Large language models can disambiguate opioid slang on social media
arXiv:2603.10313v1

Abstract: Social media text shows promise for monitoring trends in the opioid overdose crisis; however, the overwhelming majority of social media text is unrelated to opioids. When leveraging social media text to monitor trends in the ongoing opioid overdose crisis, a common strategy for identifying relevant content is to use a lexicon of opioid-related terms as inclusion criteria. However, many slang terms for opioids, such as "smack" or "blues," have common non-opioid meanings, making them ambiguous. The advanced textual reasoning capability of large language models (LLMs) presents an opportunity to disambiguate these slang terms at scale. We present three tasks on which to evaluate four state-of-the-art LLMs (GPT-4, GPT-5, Gemini 2.5 Pro, and Claude Sonnet 4.5): a lexicon-based setting, in which the LLM must disambiguate a specific term within the context of a given post; a lexicon-free setting, in which the LLM must identify opioid-related posts from context without a lexicon; and an emergent slang setting, in which the LLM must identify opioid-related posts with simulated new slang terms. All four LLMs showed excellent performance across all tasks. In both subtasks of the lexicon-based setting, LLM F1 scores ("fenty" subtask: 0.824-0.972; "smack" subtask: 0.540-0.862) far exceeded those of the best lexicon strategy (0.126 and 0.009, respectively). In the lexicon-free task, LLM F1 scores (0.544-0.769) surpassed those of lexicons (0.080-0.540), and LLMs demonstrated uniformly higher recall. On emergent slang, all LLMs had higher accuracy (average: 0.784), F1 score (average: 0.712), precision (average: 0.981), and recall (average: 0.587) than the two lexicons assessed. Our results show that LLMs can be used to identify relevant content for low-prevalence topics, including but not limited to opioid references, enhancing data provided to downstream analyses and predictive models.
Executive Summary
This article examines the use of large language models (LLMs) to disambiguate opioid slang on social media, with the goal of improving surveillance of trends in the opioid overdose crisis. The study evaluates four state-of-the-art LLMs (GPT-4, GPT-5, Gemini 2.5 Pro, and Claude Sonnet 4.5) across three tasks: disambiguating a specific slang term in the context of a post, identifying opioid-related posts without a lexicon, and detecting simulated emergent slang. On all three tasks, the LLMs substantially outperformed traditional lexicon-based strategies. The results indicate that LLMs can effectively identify relevant content for low-prevalence topics, with significant implications for data quality and predictive modeling in public health and social media monitoring.
Key Points
- ▸ LLMs can disambiguate opioid slang terms with high accuracy
- ▸ LLMs outperform traditional lexicon-based strategies in identifying opioid-related posts
- ▸ LLMs demonstrate potential in detecting emergent slang terms
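The lexicon-based inclusion strategy that the study uses as a baseline can be sketched minimally. The lexicon below is illustrative, not the study's actual lexicon; only "smack," "blues," and "fenty" appear in the abstract, and the example posts are invented to show why bare term matching produces false positives on ambiguous slang:

```python
import re

# Illustrative opioid slang lexicon. "smack", "blues", and "fenty" come from
# the abstract; any other entries would be assumptions.
OPIOID_LEXICON = {"fenty", "smack", "blues"}

def lexicon_match(post: str) -> bool:
    """Flag a post as opioid-related if any lexicon term appears as a token."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return any(tok in OPIOID_LEXICON for tok in tokens)

# A genuine opioid reference is matched correctly...
print(lexicon_match("copped some fenty last night"))        # True
# ...but so is a non-opioid use of "smack" (a false positive),
# which is the ambiguity an LLM is asked to resolve from context.
print(lexicon_match("gonna smack that volleyball so hard"))  # True
print(lexicon_match("watching the game tonight"))            # False
```

A lexicon cannot distinguish the two uses of "smack" above because it sees only the token, not the context; the paper's lexicon-based LLM setting supplies the full post and asks the model which sense is intended.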
Merits
Improved Accuracy
LLMs achieve substantially higher F1 scores than traditional lexicon-based strategies on all three tasks (e.g., 0.824-0.972 vs. 0.126 on the "fenty" disambiguation subtask), enhancing the reliability of social media monitoring.
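The precision, recall, and F1 figures quoted throughout are standard confusion-matrix metrics. A minimal sketch, using hypothetical counts chosen to mimic the high-precision, modest-recall pattern the study reports for LLMs on emergent slang (the counts themselves are not from the paper):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts: 59 true positives, 1 false positive, 41 false negatives.
p, r, f1 = precision_recall_f1(tp=59, fp=1, fn=41)
print(f"{p:.3f} {r:.3f} {f1:.4f}")  # precision ≈ 0.983, recall = 0.590, F1 = 0.7375
```

Note how F1, as the harmonic mean, is pulled toward the weaker of the two components; near-perfect precision (0.981 on average for LLMs on emergent slang) still yields a moderate F1 when recall sits near 0.587.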
Demerits
Dependence on Training Data
LLM performance may be limited by the quality and diversity of training data, which can introduce biases and inaccuracies into monitoring results.
Expert Commentary
The study's results underscore the potential of LLMs in addressing the complexities of social media monitoring, particularly in the context of low-prevalence topics like opioid references. However, it is crucial to consider the limitations and potential biases of LLMs, ensuring that their deployment is carefully evaluated and validated. As the field continues to evolve, it is essential to prioritize transparency, accountability, and collaboration between researchers, policymakers, and practitioners to harness the full potential of LLMs in addressing pressing social and public health challenges.
Recommendations
- ✓ Further research on the limitations and potential biases of LLMs in social media monitoring
- ✓ Development of guidelines and best practices for the deployment of LLMs in public health and social media monitoring contexts