Fine-Tuning and Evaluating Conversational AI for Agricultural Advisory

arXiv:2603.03294v1 Announce Type: cross Abstract: Large Language Models show promise for agricultural advisory, yet vanilla models exhibit unsupported recommendations, generic advice lacking specific, actionable detail, and communication styles misaligned with smallholder farmer needs. In high-stakes agricultural contexts, where recommendation accuracy has direct consequences for farmer outcomes, these limitations pose challenges for responsible deployment. We present a hybrid LLM architecture that decouples factual retrieval from conversational delivery: supervised fine-tuning with LoRA on expert-curated GOLDEN FACTS (atomic, verified units of agricultural knowledge) optimizes fact recall, while a separate stitching layer transforms retrieved facts into culturally appropriate, safety-aware responses. Our evaluation framework, DG-EVAL, performs atomic fact verification (measuring recall, precision, and contradiction detection) against expert-curated ground truth rather than Wikipedia or retrieved documents. Experiments across multiple model configurations on crops and queries from Bihar, India show that fine-tuning on curated data substantially improves fact recall and F1 while maintaining high relevance. A fine-tuned smaller model achieves comparable or better factual quality at a fraction of the cost of frontier models, and a stitching layer further improves safety subscores while maintaining high conversational quality. We release the farmerchat-prompts library to enable reproducible development of domain-specific agricultural AI.
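The atomic fact verification the abstract describes can be illustrated with a small sketch. This is not the DG-EVAL implementation; it only shows how recall, precision, and F1 fall out once a response has been decomposed into atomic facts and matched against expert-curated golden facts (the matching and decomposition steps, done by an LLM judge in practice, are assumed upstream, and the fact strings below are invented examples).

```python
def fact_scores(predicted: set[str], golden: set[str]) -> dict[str, float]:
    """Score the atomic facts extracted from a model response
    against an expert-curated golden-fact set."""
    true_positives = len(predicted & golden)
    recall = true_positives / len(golden) if golden else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision, "f1": f1}

# Invented illustrative facts (not from the paper's dataset).
golden = {
    "transplant rice seedlings at 21-25 days",
    "apply urea in split doses at tillering",
    "maintain 2-3 cm standing water after transplanting",
}
predicted = {
    "transplant rice seedlings at 21-25 days",
    "maintain 2-3 cm standing water after transplanting",
    "spray any available pesticide weekly",  # an unsupported recommendation
}
scores = fact_scores(predicted, golden)
```

Here two of three golden facts are recovered and one of three predictions is unsupported, so recall, precision, and F1 all come out to 2/3; the unsupported claim is exactly the failure mode the paper's contradiction detection targets.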

Executive Summary

This article presents a novel approach to fine-tuning and evaluating conversational AI for agricultural advisory, specifically addressing limitations in Large Language Model (LLM) performance. The proposed hybrid LLM architecture decouples factual retrieval from conversational delivery, leveraging supervised fine-tuning with LoRA on curated data to optimize fact recall. Experimental results demonstrate significant improvements in fact recall, F1, and safety subscores, while maintaining high conversational quality. The release of the farmerchat-prompts library enables reproducible development of domain-specific agricultural AI. This work has significant implications for the responsible deployment of AI in high-stakes agricultural contexts, where recommendation accuracy directly impacts farmer outcomes.

Key Points

  • Decoupling factual retrieval from conversational delivery improves fact recall and F1
  • Supervised fine-tuning with LoRA on curated data optimizes fact recall
  • The farmerchat-prompts library enables reproducible development of domain-specific agricultural AI
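The decoupling named in the first key point can be sketched as a two-stage pipeline. Both stages below are stand-ins: retrieval is naive keyword overlap where the paper uses a fine-tuned model, and stitching is a template where the paper uses a separate LLM layer; all facts and names are illustrative assumptions, not the paper's data or API.

```python
def retrieve_facts(query: str, golden_facts: list[str], top_k: int = 2) -> list[str]:
    """Stage 1: select golden facts relevant to the query.
    Keyword-overlap ranking stands in for the fine-tuned retrieval model."""
    query_terms = set(query.lower().split())
    ranked = sorted(
        golden_facts,
        key=lambda fact: len(query_terms & set(fact.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def stitch(facts: list[str]) -> str:
    """Stage 2: turn verified facts into a conversational answer.
    A template stands in for the safety-aware LLM stitching layer."""
    body = " ".join(f"- {fact}" for fact in facts)
    return f"Based on verified guidance: {body}"

# Invented illustrative golden facts.
golden_facts = [
    "transplant rice seedlings at 21-25 days after sowing",
    "maintain 2-3 cm standing water after transplanting rice",
    "store wheat seed in airtight containers to prevent pest damage",
]
answer = stitch(retrieve_facts("when should I transplant my rice seedlings", golden_facts))
```

The design point the sketch makes concrete: because the answer is assembled only from retrieved golden facts, factual accuracy is controlled by the retrieval stage, while tone, cultural fit, and safety framing can be tuned in the stitching stage without risking new unsupported claims.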

Merits

Strength in Methodology

The article presents a well-structured approach to addressing limitations in LLM performance, leveraging a novel hybrid architecture and evaluation framework. The use of expert-curated data and ground truth for evaluation adds credibility to the results.

Improved Factual Quality

The experimental results demonstrate significant improvements in fact recall, F1, and safety subscores, indicating that the proposed approach can improve the factual quality of conversational AI for agricultural advisory.

Demerits

Limited Generalizability

The experimental results are based on a single dataset from Bihar, India, which may limit the generalizability of the findings to other contexts and regions.

Dependence on Curated Data

The proposed approach relies on the availability of curated data, which may not be feasible or practical in all settings, particularly in resource-constrained environments.

Expert Commentary

This article presents a significant contribution to the field of conversational AI, particularly in the context of agricultural advisory. The proposed hybrid architecture and evaluation framework offer a promising approach to addressing the limitations of LLM performance. However, the article's reliance on curated data and limited generalizability of the findings may limit the scalability of the approach. Nevertheless, the release of the farmerchat-prompts library provides a valuable resource for reproducible development of domain-specific agricultural AI. As the field of AI continues to evolve, it is essential to prioritize the development of domain-specific solutions that address the unique needs and challenges of smallholder farmers.

Recommendations

  • Future research should focus on developing approaches that can leverage uncurated data sources, such as online forums and social media, to improve the scalability of the proposed approach.
  • Policymakers should prioritize investments in the development of domain-specific AI solutions for agriculture, particularly in resource-constrained environments.
