Decoding AI Authorship: Can LLMs Truly Mimic Human Style Across Literature and Politics?
arXiv:2603.23219v1 Announce Type: new Abstract: Amidst the rising capabilities of generative AI to mimic specific human styles, this study investigates the ability of state-of-the-art large language models (LLMs), including GPT-4o, Gemini 1.5 Pro, and Claude Sonnet 3.5, to emulate the authorial signatures of prominent literary and political figures: Walt Whitman, William Wordsworth, Donald Trump, and Barack Obama. Utilizing a zero-shot prompting framework with strict thematic alignment, we generated synthetic corpora evaluated through a complementary framework combining transformer-based classification (BERT) and interpretable machine learning (XGBoost). Our methodology integrates Linguistic Inquiry and Word Count (LIWC) markers, perplexity, and readability indices to assess the divergence between AI-generated and human-authored text. Results demonstrate that AI-generated mimicry remains highly detectable, with XGBoost models trained on a restricted set of eight stylometric features achieving accuracy comparable to high-dimensional neural classifiers. Feature importance analyses identify perplexity as the primary discriminative metric, revealing a significant divergence in the stochastic regularity of AI outputs compared to the higher variability of human writing. While LLMs exhibit distributional convergence with human authors on low-dimensional heuristic features, such as syntactic complexity and readability, they do not yet fully replicate the nuanced affective density and stylistic variance inherent in the human-authored corpus. By isolating the specific statistical gaps in current generative mimicry, this study provides a comprehensive benchmark for LLM stylistic behavior and offers critical insights for authorship attribution in the digital humanities and social media.
Executive Summary
This study critically examines the ability of advanced large language models (LLMs) to replicate the authorial styles of iconic literary and political figures, including Walt Whitman, William Wordsworth, Donald Trump, and Barack Obama. Employing a zero-shot prompting framework and a multi-metric evaluation system (BERT, XGBoost, LIWC, perplexity, readability indices), the research demonstrates that while LLMs achieve distributional convergence in low-dimensional stylistic features, their outputs remain detectable as AI-generated due to fundamental differences in stochastic regularity and affective density. The findings underscore the persistent gap between AI mimicry and human writing, offering a benchmark for authorship attribution in digital humanities and social media contexts.
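The summary above singles out stochastic regularity, measured via perplexity, as the decisive gap between AI and human text. As a rough illustration of the quantity involved, the sketch below computes corpus perplexity from per-token log-probabilities; the helper name and toy probabilities are illustrative, not the paper's implementation:

```python
import math

def perplexity(token_log_probs):
    """Perplexity: exp of the negative mean per-token log-probability.

    Tightly clustered, low perplexity across documents is the kind of
    'stochastic regularity' the study flags in LLM output, in contrast
    to the wider spread observed in human-authored text.
    """
    n = len(token_log_probs)
    if n == 0:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / n)

# Toy check: a uniform distribution over 4 tokens yields perplexity 4.
uniform = [math.log(0.25)] * 8
print(round(perplexity(uniform), 6))  # → 4.0
```

In practice the log-probabilities would come from a language model scoring each token in context; here they are supplied directly to keep the arithmetic visible.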
Key Points
- LLMs (GPT-4o, Gemini 1.5 Pro, Claude Sonnet 3.5) were evaluated for their ability to mimic the writing styles of prominent literary and political figures using zero-shot prompting and thematic alignment.
- A hybrid evaluation framework combining BERT, XGBoost, LIWC, perplexity, and readability indices revealed that AI-generated text retains detectable stylistic divergences from human-authored works, particularly in stochastic regularity and affective density.
- Despite achieving convergence in low-dimensional features (e.g., syntactic complexity), LLMs fail to replicate the nuanced stylistic variance and affective depth characteristic of human writing, as evidenced by high detection accuracy in authorship attribution tasks.
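The key points above turn on classification over a small set of stylometric features. The paper's exact eight features are not enumerated here, so the sketch below extracts a few generic stand-ins (mean sentence length, type-token ratio, mean word length) in plain Python; the function and feature names are illustrative only:

```python
import re

def stylometric_features(text):
    """Extract a handful of illustrative low-dimensional style features.

    These are generic stand-ins for the restricted eight-feature set the
    study feeds to XGBoost; the actual feature list is defined in the paper.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(words), 1)
    return {
        "mean_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len({w.lower() for w in words}) / n,
        "mean_word_len": sum(map(len, words)) / n,
    }

sample = "I hear America singing. The varied carols I hear."
feats = stylometric_features(sample)
```

Vectors like these, computed per document, would then be fed to a gradient-boosted classifier such as XGBoost, whose feature-importance scores support the interpretability claims made in the study.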
Merits
Methodological Rigor
The study employs a sophisticated, multi-layered evaluation framework that integrates transformer-based classification, interpretable machine learning, and comprehensive linguistic metrics, ensuring robust and nuanced insights into stylistic mimicry.
Interdisciplinary Relevance
By bridging computational linguistics, digital humanities, and social media studies, the research offers a cross-disciplinary benchmark for assessing AI-generated text, with implications for authorship attribution, content moderation, and legal adjudication.
Empirical Grounding
The use of state-of-the-art LLMs and a diverse corpus of high-profile figures provides empirical grounding that enhances the credibility and applicability of the findings across multiple domains.
Demerits
Limited Scope of Authorial Styles
The study focuses on a narrow selection of literary and political figures, which may not fully capture the stylistic diversity or the rapid evolution of human writing styles, potentially limiting the generalizability of the results.
Zero-Shot Prompting Constraints
The reliance on zero-shot prompting frameworks may underrepresent the adaptability of LLMs, as fine-tuned or few-shot approaches could yield different outcomes in stylistic mimicry, particularly for figures with highly distinctive or idiosyncratic writing styles.
Static Evaluation Metrics
The evaluation metrics (e.g., perplexity, readability indices) are inherently static and may not fully account for the dynamic and adaptive nature of both human writing and LLM outputs over time, particularly in response to evolving cultural or linguistic trends.
Expert Commentary
This study represents a significant contribution to the growing body of research on AI-generated text, offering a rigorous, multi-faceted analysis of the limitations of current LLMs in replicating human writing styles. The integration of interpretable machine learning (XGBoost) with transformer-based models (BERT) yields a nuanced picture of the statistical gaps that persist between AI outputs and human-authored works. Notably, the prominence of perplexity as a discriminative feature aligns with broader observations in computational linguistics: AI-generated text tends toward unusually regular token-level statistics, and that regularity remains a critical obstacle to reproducing the variability and coherence of human prose. While the study’s focus on high-profile figures ensures relevance, future research should expand to a broader range of stylistic and demographic diversity to enhance generalizability. The implications for digital humanities and social media are substantial, particularly in an era when AI-generated content can appear superficially indistinguishable from human-authored material. This work not only advances our understanding of AI mimicry but also serves as a call to action for policymakers, technologists, and ethicists confronting the proliferation of synthetic text.
Recommendations
- Expand the corpus to include a more diverse range of authorial styles, including underrepresented voices and evolving contemporary writing trends, to improve the robustness and generalizability of the findings.
- Investigate the potential of fine-tuned and few-shot prompting approaches to assess whether targeted model adaptations can bridge the identified gaps in stylistic mimicry, particularly for figures with highly distinctive or idiosyncratic writing styles.
- Develop dynamic, real-time evaluation metrics that can adapt to the evolving nature of both human writing and LLM outputs, particularly in response to cultural, linguistic, or technological shifts.
- Collaborate with policymakers to establish standardized benchmarks for AI detection tools, ensuring that authorship attribution methods are both reliable and transparent across different applications and jurisdictions.
Sources
Original: arXiv - cs.CL