Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study

arXiv:2603.20514v1 (announce type: new). Abstract: Large Language Models (LLMs) offer significant potential for delivering health information. However, their reliability in low-resource contexts remains uncertain. This study evaluates GPT-4, Gemini Pro, Llama 3, and Mistral-7B on health crisis-related enquiries concerning COVID-19, dengue, the Nipah virus, and Chikungunya in the low-resource context of Bangladesh. We constructed a question-answer dataset from authoritative sources and assessed model outputs through semantic similarity, expert-model cross-evaluation, and Natural Language Inference (NLI). Findings highlight both the strengths and limitations of LLMs in representing epidemiological history and health crisis knowledge, underscoring their promise and risks for informing policy in resource-constrained environments.

Mohammed Rakibul Hasan

Executive Summary

This study evaluates how reliably Large Language Models (LLMs) deliver health information in a low-resource context, using Bangladesh as the setting. The researchers assess four LLMs (GPT-4, Gemini Pro, Llama 3, and Mistral-7B) on health crisis-related enquiries concerning COVID-19, dengue, the Nipah virus, and Chikungunya. Reference answers come from a question-answer dataset built from authoritative sources, and model outputs are assessed with a hybrid multi-metric approach that combines semantic similarity, expert-model cross-evaluation, and Natural Language Inference (NLI). The findings highlight both the strengths and limitations of LLMs in representing epidemiological history and health crisis knowledge. They underscore the promise and risks of LLMs for informing policy in resource-constrained environments, emphasizing the need for further research and evaluation.

Key Points

  • The study evaluates how reliably LLMs deliver health information in a low-resource context (Bangladesh).
  • Four LLMs (GPT-4, Gemini Pro, Llama 3, and Mistral-7B) are assessed on enquiries about COVID-19, dengue, the Nipah virus, and Chikungunya.
  • Model outputs are assessed with a hybrid multi-metric approach: semantic similarity, expert-model cross-evaluation, and NLI.
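
To make the multi-metric idea concrete, here is a minimal sketch of how three per-answer scores could be computed and combined. It is not the authors' pipeline: the abstract does not say which similarity model was used or whether the metrics were ever merged into a single number. The bag-of-words cosine below is a crude lexical stand-in for embedding-based semantic similarity, and the function names, the equal weights, and the example ratings are all hypothetical.

```python
import math
import re
from collections import Counter


def bow_cosine(reference: str, answer: str) -> float:
    """Cosine similarity between bag-of-words count vectors.

    Assumption: a lexical stand-in for the embedding-based semantic
    similarity a real evaluation would more likely use.
    """
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    ans = Counter(re.findall(r"[a-z0-9]+", answer.lower()))
    # Dot product over the tokens both texts share.
    dot = sum(ref[t] * ans[t] for t in ref.keys() & ans.keys())
    norm = math.sqrt(sum(v * v for v in ref.values())) * math.sqrt(
        sum(v * v for v in ans.values())
    )
    return dot / norm if norm else 0.0


def hybrid_score(similarity, expert_rating, nli_entailment,
                 weights=(1.0, 1.0, 1.0)):
    """Weighted mean of three scores, each expected to lie in [0, 1].

    Assumption: the weights are hypothetical; the source does not
    describe any aggregation of its three metrics.
    """
    scores = (similarity, expert_rating, nli_entailment)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each score must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    # Illustrative inputs only; the ratings below are placeholders,
    # not results from the study.
    reference = "Nipah virus spreads through contaminated date palm sap"
    answer = "Nipah virus can spread through raw date palm sap"
    sim = bow_cosine(reference, answer)
    print(round(hybrid_score(sim, expert_rating=0.8, nli_entailment=0.9), 3))
```

Reporting the three scores side by side, rather than collapsing them, would preserve cases where a fluent answer scores high on similarity but is contradicted under NLI; the weighted mean is shown only as one possible summary.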

Merits

Strengths of LLMs in Representing Epidemiological History

The study highlights the ability of LLMs to provide a broad range of information on epidemiological history, including historical data, trends, and patterns.

Potential for Informing Policy in Resource-Constrained Environments

The study underscores the potential of LLMs to inform policy decisions in resource-constrained environments, where access to healthcare information is limited.

Demerits

Limitations of LLMs in Representing Health Crisis Knowledge

The study highlights the limitations of LLMs in providing nuanced and context-specific health crisis knowledge, particularly in low-resource contexts.

Need for Further Research and Evaluation

The study emphasizes the need for further research and evaluation to fully understand the potential and limitations of LLMs in delivering health information in low-resource contexts.

Expert Commentary

The study adds to our understanding of what LLMs can and cannot do when delivering health information in low-resource contexts. Its findings also show that more research and evaluation are needed before LLMs are relied on to inform policy decisions. The study further underscores the importance of digital health literacy where access to healthcare information is limited. As AI plays a growing role in healthcare decision-making, research on its use in low-resource contexts should be a priority.

Recommendations

  • Further research is needed to establish where LLMs deliver reliable health information in low-resource contexts and where they do not.
  • Policymakers in low-resource contexts should weigh both the potential and the limitations of LLMs before relying on them, pending further research and evaluation.

Sources

Original: arXiv - cs.CL