Detection of Illicit Content on Online Marketplaces using Large Language Models
arXiv:2603.04707v1 Abstract: Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the proliferation of illicit activities, including drug trafficking, counterfeit sales, and cybercrimes. Traditional content moderation methods such as manual reviews and rule-based automated systems struggle with scalability, dynamic obfuscation techniques, and multilingual content. Conventional machine learning models, though effective in simpler contexts, often falter when confronting the semantic complexities and linguistic nuances characteristic of illicit marketplace communications. This research investigates the efficacy of Large Language Models (LLMs), specifically Meta's Llama 3.2 and Google's Gemma 3, in detecting and classifying illicit online marketplace content using the multilingual DUTA10K dataset. Employing fine-tuning techniques such as Parameter-Efficient Fine-Tuning (PEFT) and quantization, these models were systematically benchmarked against a foundational transformer-based model (BERT) and traditional machine learning baselines (Support Vector Machines and Naive Bayes). Experimental results reveal a task-dependent advantage for LLMs. In binary classification (illicit vs. non-illicit), Llama 3.2 demonstrated performance comparable to traditional methods. However, for complex, imbalanced multi-class classification involving 40 specific illicit categories, Llama 3.2 significantly surpassed all baseline models. These findings offer substantial practical implications for enhancing online safety, equipping law enforcement agencies, e-commerce platforms, and cybersecurity specialists with more effective, scalable, and adaptive tools for illicit content detection and moderation.
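As a concrete illustration of the traditional baselines the paper benchmarks against, the sketch below trains a TF-IDF-based Support Vector Machine and a Naive Bayes classifier for the binary (illicit vs. non-illicit) task. The toy corpus and labels are invented placeholders for illustration only, not DUTA10K data:

```python
# Traditional ML baselines: TF-IDF features + linear SVM and Naive Bayes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Illustrative placeholder listings (1 = illicit, 0 = non-illicit).
texts = [
    "buy cheap replica brand watches",       # counterfeit
    "fresh organic vegetables delivered",    # benign
    "unlocked phones no questions asked",    # suspicious resale
    "handmade wooden furniture for sale",    # benign
]
labels = [1, 0, 1, 0]

# Each pipeline vectorizes raw text, then fits a classifier.
svm = make_pipeline(TfidfVectorizer(), LinearSVC())
nb = make_pipeline(TfidfVectorizer(), MultinomialNB())
svm.fit(texts, labels)
nb.fit(texts, labels)

print(svm.predict(["replica watches cheap"]))
print(nb.predict(["organic vegetables delivered"]))
```

Such bag-of-words models are fast and scalable, but, as the abstract notes, they miss the semantic nuance and obfuscation that the fine-tuned LLMs handle better in the multi-class setting.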
Executive Summary
This article examines the use of Large Language Models (LLMs) for detecting illicit content on online marketplaces. Benchmarking Meta's Llama 3.2 and Google's Gemma 3 against BERT and traditional machine learning baselines on the multilingual DUTA10K dataset, the research finds that Llama 3.2 matches traditional methods on binary classification but significantly outperforms all baselines on a complex, imbalanced multi-class task spanning 40 illicit categories. The study highlights the potential of LLMs to enhance online safety, offering law enforcement agencies, e-commerce platforms, and cybersecurity specialists more effective tools for illicit content detection and moderation. The findings carry significant practical implications for online marketplace regulation and cybersecurity.
Key Points
- ▸ LLMs demonstrate task-dependent advantage in detecting illicit online marketplace content
- ▸ Llama 3.2 shows comparable performance to traditional methods in binary classification, but surpasses baselines in multi-class classification
- ▸ Fine-tuning techniques such as Parameter-Efficient Fine-Tuning (PEFT) and quantization make adapting these large models computationally practical
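The PEFT and quantization setup named in the key points is typically wired together with Hugging Face's transformers, peft, and bitsandbytes libraries. The configuration sketch below shows the usual pattern; the checkpoint name, LoRA hyperparameters, and target modules are illustrative assumptions, not values reported by the paper:

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization (bitsandbytes) shrinks the base model's memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Example checkpoint; the paper does not state which Llama 3.2 variant was used.
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    num_labels=40,  # the 40 illicit categories in the multi-class task
    quantization_config=bnb_config,
)

# LoRA (a common PEFT method): train small low-rank adapters
# instead of updating all of the base model's weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative choice
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters are a small fraction of all weights
```

The wrapped model can then be trained with a standard transformers `Trainer` loop; only the adapter weights receive gradient updates.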
Merits
Improved Accuracy
Fine-tuned LLMs achieved higher accuracy than all baselines on the complex, imbalanced 40-category classification task, making them a valuable tool for online safety and regulation
Scalability
LLMs can handle large volumes of data and scale to meet the needs of online marketplaces
Adaptability
LLMs can be fine-tuned to adapt to new and evolving forms of illicit content
Demerits
Limited Contextual Understanding
LLMs may still misread slang, code words, and deliberately obfuscated language, potentially producing false positives or false negatives
Dependence on Training Data
LLMs are only as effective as the data they are trained on, and may not perform well on unseen or novel forms of illicit content
Expert Commentary
The research highlights the potential of LLMs to transform the detection and moderation of illicit content on online marketplaces. It is crucial, however, to address their limitations, including dependence on training data and gaps in contextual understanding. The use of LLMs also raises important questions about the role of human oversight and review in the moderation pipeline, and about the need for regulatory frameworks that balance the benefits of LLMs against demands for transparency and accountability.
Recommendations
- ✓ Further research is needed to explore the applications and limitations of LLMs in detecting illicit content on online marketplaces
- ✓ Regulatory bodies should consider the potential of LLMs to improve online safety and reassess their approaches to online marketplace regulation accordingly