From Perceptions to Evidence: Detecting AI-Generated Content in Turkish News Media with a Fine-Tuned BERT Classifier
arXiv:2602.13504v1 (announce type: new). Abstract: The rapid integration of large language models into newsroom workflows has raised urgent questions about the prevalence of AI-generated content in online media. While computational studies have begun to quantify this phenomenon in English-language outlets, no empirical investigation exists for Turkish news media, where existing research remains limited to qualitative interviews with journalists or fake news detection. This study addresses that gap by fine-tuning a Turkish-specific BERT model (dbmdz/bert-base-turkish-cased) on a labeled dataset of 3,600 articles from three major Turkish outlets with distinct editorial orientations for binary classification of AI-rewritten content. The model achieves a 0.9708 F1 score on the held-out test set with symmetric precision and recall across both classes. Subsequent deployment on over 3,500 unseen articles spanning 2023 to 2026 reveals consistent cross-source and temporally stable classification patterns, with mean prediction confidence exceeding 0.96 and an estimated 2.5 percent of examined news content rewritten or revised by LLMs on average. To the best of our knowledge, this is the first study to move beyond self-reported journalist perceptions toward empirical, data-driven measurement of AI usage in Turkish news media.
Executive Summary
This study addresses the lack of empirical research on AI-generated content in Turkish news media by fine-tuning a Turkish-specific BERT model (dbmdz/bert-base-turkish-cased) to detect AI-rewritten articles. The model reaches an F1 score of 0.9708 on a held-out test set, and deployment on over 3,500 unseen articles suggests that about 2.5% of the examined news content was rewritten or revised by LLMs. The authors describe it as the first data-driven measurement of AI usage in Turkish news media.
Key Points
- ▸ First empirical study of AI-generated content in Turkish news media, to the best of the authors' knowledge.
- ▸ Fine-tuned Turkish BERT model reaches an F1 score of 0.9708 on the held-out test set when detecting AI-rewritten content.
- ▸ An estimated 2.5% of the examined news content was rewritten or revised by LLMs.
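The 2.5% estimate and the mean prediction confidence above 0.96 are summary statistics over per-article classifier outputs. The sketch below shows one plausible way to compute such numbers. The function name, the 0.5 decision threshold, the definition of confidence as the probability of the predicted class, and the toy probabilities are all assumptions for illustration; they are not the paper's code or data.

```python
from statistics import mean


def summarize_predictions(probs, threshold=0.5):
    """Summarize per-article probabilities of the 'AI-rewritten' class.

    probs: list of floats in [0, 1], one per article (hypothetical input).
    threshold: assumed decision threshold for flagging an article.
    """
    if not probs:
        raise ValueError("probs must contain at least one article")
    flagged = [p >= threshold for p in probs]
    # Confidence is taken here as the probability of the predicted class.
    confidences = [p if f else 1.0 - p for p, f in zip(probs, flagged)]
    return {
        "n_articles": len(probs),
        "n_flagged": sum(flagged),
        "flagged_share_pct": 100.0 * sum(flagged) / len(probs),
        "mean_confidence": mean(confidences),
    }


# Toy input: 40 articles, one of which is flagged (1 in 40 is 2.5%).
toy_probs = [0.02] * 39 + [0.97]
summary = summarize_predictions(toy_probs)
print(summary)
```

A share computed this way counts classifier decisions, so it inherits the classifier's false positives and false negatives and should be read as an estimate rather than a census.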
Merits
Innovative Methodology
The study fine-tunes dbmdz/bert-base-turkish-cased, a BERT model specific to Turkish, for binary classification of AI-rewritten content, which gives a language-appropriate method for detecting AI involvement in Turkish news text.
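As a rough illustration of how a checkpoint like this is used for two-class scoring, the sketch below loads the model with the Hugging Face transformers library and returns a probability per article. This is a sketch under stated assumptions, not the authors' pipeline: the label order (index 1 meaning AI-rewritten), the 512-token limit, the 0.5 threshold, and both function names are hypothetical, and the classification head is randomly initialized until it has been fine-tuned on labeled articles.

```python
MODEL_NAME = "dbmdz/bert-base-turkish-cased"  # checkpoint named in the abstract
AI_LABEL = 1  # assumption: index 1 = AI-rewritten, index 0 = human-written


def to_label(p_ai, threshold=0.5):
    """Map P(AI-rewritten) to a binary label; the threshold is an assumption."""
    return "ai_rewritten" if p_ai >= threshold else "human_written"


def score_articles(texts, model_name=MODEL_NAME, max_length=512):
    """Return P(AI-rewritten) for each text (requires torch and transformers)."""
    # Heavy imports stay local so to_label() works without these libraries.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # num_labels=2 attaches a fresh two-class head to the pretrained encoder.
    # Its outputs are meaningless until the model is fine-tuned.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2
    )
    model.eval()

    # Long articles are truncated to max_length tokens (an assumed setting).
    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    probs = torch.softmax(logits, dim=-1)
    return probs[:, AI_LABEL].tolist()
```

After fine-tuning, the same call pattern would apply to the saved checkpoint directory instead of the base model name, for example `[to_label(p) for p in score_articles(texts, model_name="path/to/finetuned")]`, where the path is a placeholder.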
High Accuracy
The model achieves an F1 score of 0.9708 on the held-out test set, with symmetric precision and recall across both classes, which indicates that it is not favoring one class at the expense of the other.
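For readers less familiar with the metric, F1 is the harmonic mean of precision and recall, and it can be computed for each class in turn. The sketch below does this from scratch on a toy set of labels. The labels are invented purely for illustration and are unrelated to the paper's test set; the function name is likewise an assumption.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1), treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels: 1 = AI-rewritten, 0 = human-written (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

ai_metrics = binary_metrics(y_true, y_pred, positive=1)
human_metrics = binary_metrics(y_true, y_pred, positive=0)
print("AI-rewritten:", ai_metrics)
print("Human-written:", human_metrics)
```

In this toy case each class has one false positive and one false negative, so precision, recall, and F1 match across the two classes, which is the kind of balance the word "symmetric" describes.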
Comprehensive Dataset
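The labeled dataset contains 3,600 articles from three major Turkish outlets with distinct editorial orientations, and the model is then applied to over 3,500 unseen articles. This breadth supports the reported cross-source consistency, although three outlets cannot by themselves guarantee that the findings generalize to all Turkish media.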
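When a labeled set mixes several outlets and two classes, a held-out test set is usually drawn so that each outlet and class combination is represented proportionally. The sketch below shows one way to do that with the standard library. The abstract does not state the split ratio or procedure, so the 10% test fraction, the seed, the record fields, and the outlet names are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(records, test_fraction=0.1, seed=13):
    """Split records into (train, test), stratified by outlet and label.

    records: list of dicts with 'outlet' and 'label' keys (hypothetical schema).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["outlet"], rec["label"])].append(rec)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for key in sorted(groups):
        items = groups[key][:]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Toy data: 3 placeholder outlets x 2 labels x 20 articles each.
toy_records = [
    {"outlet": outlet, "label": label, "text": f"{outlet}-{label}-{i}"}
    for outlet in ("outlet_a", "outlet_b", "outlet_c")
    for label in (0, 1)
    for i in range(20)
]
train_set, test_set = stratified_split(toy_records)
print(len(train_set), len(test_set))
```

Stratifying by outlet as well as label keeps any single outlet's style from dominating the test set, which matters when cross-source consistency is one of the claims being evaluated.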
Demerits
Limited Scope
The study focuses on only three Turkish news outlets, which may not fully capture the diversity of AI usage across all Turkish media.
Temporal Limitations
The unseen articles used for deployment span 2023 to 2026, so the 2.5% estimate describes that window and may not hold as LLMs and newsroom practices change.
Binary Classification
The model assigns each article to one of two classes, which cannot distinguish degrees of AI involvement, such as light revision versus a full rewrite.
Expert Commentary
This study advances the empirical measurement of AI-rewritten content in Turkish news media. The fine-tuned BERT model's F1 score of 0.9708 on held-out data indicates a workable detector for AI-rewritten articles and addresses a gap in a literature that has so far relied on journalist interviews and fake news detection. The findings suggest that AI-rewritten content is present but remains a minor share, about 2.5%, of the examined articles. The study's limitations, namely the small number of outlets and the binary classification approach, point to areas for future research. In practice, a detector of this kind could help media outlets and policymakers monitor transparency and the ethical use of AI in journalism. As AI becomes further integrated into newsroom workflows, empirical studies like this one can inform both industry practice and regulatory frameworks.
Recommendations
- ✓ Expand the study to include a broader range of Turkish news outlets to capture the full spectrum of AI usage in the media landscape.
- ✓ Develop more nuanced classification models that can differentiate between various degrees of AI involvement in content creation, rather than relying on binary classification.