
Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text

arXiv:2602.21933v1 Abstract: Sarcasm detection in multilingual and code-mixed environments remains a challenging task for natural language processing models due to structural variations, informal expressions, and low-resource linguistic availability. This study compares four large language models, Llama 3.1, Mistral, Gemma 3, and Phi-4, with a fine-tuned DistilBERT model for sarcasm detection in code-mixed Hinglish text. The results indicate that the smaller, sequentially fine-tuned DistilBERT model achieved the highest overall accuracy of 84%, outperforming all of the LLMs in zero-shot and few-shot setups, while using only a minimal amount of LLM-generated code-mixed data for fine-tuning. These findings indicate that domain-adaptive fine-tuning of smaller transformer-based models may significantly improve sarcasm detection over general LLM inference in low-resource, data-scarce settings.

Bitan Majumder, Anirban Sen

Executive Summary

This study investigates the effectiveness of large language models (LLMs) and domain fine-tuned models for sarcasm detection in code-mixed Hinglish text. The results show that a smaller, sequentially fine-tuned DistilBERT model outperforms all of the evaluated LLMs in zero-shot and few-shot setups, achieving an accuracy of 84%. This finding suggests that domain-adaptive fine-tuning of smaller transformer-based models can significantly improve sarcasm detection in low-resource, data-scarce settings. The study highlights the value of fine-tuning existing compact models for specific tasks and languages rather than relying solely on large, pre-trained LLMs. The findings have significant implications for natural language processing applications, particularly in multilingual and code-mixed environments.
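To make the fine-tuned baseline concrete, the sketch below shows one plausible way to fine-tune DistilBERT for binary sarcasm classification with the Hugging Face Transformers Trainer. It is not the authors' code: the checkpoint, data files, column names, and hyperparameters are assumptions, and the paper's sequential fine-tuning is simplified here to a single training stage.

```python
# Illustrative sketch of the fine-tuned baseline, not the authors' code.
# Checkpoint, data files, column names, and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical CSVs with columns "text" (Hinglish post) and "label" (1 = sarcastic, 0 = not).
data = load_dataset("csv", data_files={"train": "hinglish_sarcasm_train.csv",
                                       "test": "hinglish_sarcasm_test.csv"})

checkpoint = "distilbert-base-multilingual-cased"  # assumed; the paper only names DistilBERT
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    # Pad/truncate short social-media posts to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="distilbert-hinglish-sarcasm",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"], eval_dataset=data["test"])
trainer.train()
print(trainer.evaluate())  # reports eval loss; add compute_metrics for accuracy/F1
```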

Key Points

  • The study compares four LLMs (Llama 3.1, Mistral, Gemma 3, and Phi-4) with a fine-tuned DistilBERT model for sarcasm detection in code-mixed Hinglish text.
  • The fine-tuned DistilBERT model achieved an accuracy of 84%, outperforming all LLMs in zero-shot and few-shot setups (an illustrative prompting sketch follows this list).
  • The study highlights the potential of domain-adaptive fine-tuning of smaller transformer-based models for specific tasks and languages.
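
For the LLM baselines, the abstract does not give the exact prompt wording, so the helper below is only an illustrative sketch of how a zero-shot or few-shot classification prompt might be assembled; the example texts and labels are invented.

```python
# Illustrative zero-/few-shot prompt builder for the LLM baselines.
# The study's actual prompts are not published; this template and its examples are assumptions.
FEW_SHOT_EXAMPLES = [
    ("Wah, kya service hai, do ghante se wait kar raha hoon!", "sarcastic"),
    ("Movie sach mein acchi thi, highly recommend.", "not sarcastic"),
]

def build_prompt(text: str, shots=FEW_SHOT_EXAMPLES) -> str:
    """Assemble a classification prompt; pass shots=[] for the zero-shot setup."""
    lines = ["Classify the following code-mixed Hinglish text as 'sarcastic' or 'not sarcastic'."]
    for example, label in shots:
        lines.append(f"Text: {example}\nLabel: {label}")
    lines.append(f"Text: {text}\nLabel:")
    return "\n\n".join(lines)

# The resulting string would be sent to each model (Llama 3.1, Mistral, Gemma 3, Phi-4)
# and the generated label parsed from the response.
print(build_prompt("Great, mera phone phir se hang ho gaya. Kya din hai!"))
```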

Merits

Strength

The study's focus on low-resource and data-scarce settings is a significant contribution to the field of natural language processing.

Methodological rigor

The study employs a rigorous methodology, comparing multiple LLMs against a fine-tuned model in both zero-shot and few-shot setups.

Demerits

Limitation

The study's reliance on a single fine-tuned model limits the generalizability of the findings.

Scalability

The study does not explore the scalability of the fine-tuned model to larger datasets or more complex tasks.

Expert Commentary

The study's findings are significant but should be interpreted with caution. The focus on low-resource, data-scarce settings is a valuable contribution, yet the reliance on a single fine-tuned model limits the generalizability of the results, and the scalability of the approach to larger datasets or more complex tasks remains untested. Nevertheless, the study highlights domain-adaptive fine-tuning of smaller transformer-based models for specific tasks and languages as a promising direction for research.

Recommendations

  • Future studies should explore the scalability of domain-adaptive fine-tuning to larger datasets and more complex tasks.
  • Researchers should investigate the transferability of fine-tuned models to other languages and tasks.