Evaluation of LLMs in retrieving food and nutritional context for RAG systems

arXiv:2603.09704v1. Abstract: In this article, we evaluate four Large Language Models (LLMs) and their effectiveness at retrieving data within a specialized Retrieval-Augmented Generation (RAG) system, using a comprehensive food composition database. Our method is focused on the LLMs' ability to translate natural language queries into structured metadata filters, enabling efficient retrieval via a Chroma vector database. By achieving high accuracy in this critical retrieval step, we demonstrate that LLMs can serve as an accessible, high-performance tool, drastically reducing the manual effort and technical expertise previously required for domain experts, such as food compilers and nutritionists, to leverage complex food and nutrition data. However, despite the high performance on easy and moderately complex queries, our analysis of difficult questions reveals that reliable retrieval remains challenging when queries involve non-expressible constraints. These findings demonstrate that LLM-driven metadata filtering excels when constraints can be explicitly expressed, but struggles when queries exceed the representational scope of the metadata format.
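
To make the retrieval step concrete, here is a minimal sketch of what a structured metadata filter might look like and how it selects records. It imitates the operator style of Chroma's `where` filters (`$and`, `$eq`, `$gte`, and so on) with a small in-memory evaluator instead of calling Chroma itself. The field names (`food_group`, `protein_g`), the sample records, and the example query are assumptions made for illustration and are not taken from the paper.

```python
import operator

# Comparison operators in the style of Chroma's `where` filters.
COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata, where):
    """Return True if a record's metadata satisfies the filter."""
    if "$and" in where:
        return all(matches(metadata, clause) for clause in where["$and"])
    if "$or" in where:
        return any(matches(metadata, clause) for clause in where["$or"])
    # Leaf clause of the form {"field": {"$op": literal}}.
    (field, condition), = where.items()
    (op, literal), = condition.items()
    if field not in metadata:
        return False
    return COMPARATORS[op](metadata[field], literal)


# Hypothetical food composition records (values are illustrative only).
FOODS = [
    {"id": "f1", "name": "lentils, boiled", "food_group": "legumes", "protein_g": 9.0},
    {"id": "f2", "name": "tofu, firm", "food_group": "legumes", "protein_g": 17.3},
    {"id": "f3", "name": "cheddar", "food_group": "dairy", "protein_g": 25.0},
]

# A filter an LLM might emit for the (assumed) query:
# "legumes with at least 10 g of protein per 100 g".
llm_filter = {
    "$and": [
        {"food_group": {"$eq": "legumes"}},
        {"protein_g": {"$gte": 10}},
    ]
}

hits = [food["id"] for food in FOODS if matches(food, llm_filter)]
print(hits)  # ['f2']
```

In a real deployment the same dictionary would typically be passed to the vector database as a metadata filter, so that only matching records take part in the similarity search.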

Executive Summary

This article evaluates how effectively four Large Language Models (LLMs) retrieve food and nutritional context for Retrieval-Augmented Generation (RAG) systems built on a comprehensive food composition database. The models translate natural language queries into structured metadata filters, which are then used for retrieval from a Chroma vector database. Accuracy is high on easy and moderately complex queries, which suggests that LLMs can reduce the manual effort and technical expertise that domain experts, such as food compilers and nutritionists, need in order to work with complex food and nutrition data. On difficult queries, however, retrieval becomes unreliable when a constraint cannot be expressed in the metadata filter format. Reliable retrieval therefore depends on whether a query's constraints can be stated explicitly as filters.

Key Points

  • LLMs can serve as high-performance tools for domain experts to retrieve complex food and nutrition data.
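  • LLM-driven metadata filtering performs well when a query's constraints can be explicitly expressed as filters.
  • Retrieval becomes unreliable when queries involve non-expressible constraints, that is, constraints outside the representational scope of the metadata format.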
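
The contrast between the last two points can be illustrated with a small sketch. It assumes a filter format in which each clause compares one metadata field with a literal value, which is how Chroma's `where` filters work. The records and field names are invented, and the second query is only one plausible example of a constraint such a format cannot express; the difficult queries used in the paper may differ.

```python
# Hypothetical records; nutrient values are illustrative, per 100 g.
FOODS = [
    {"id": "f1", "protein_g": 25.0, "fat_g": 20.0},
    {"id": "f2", "protein_g": 17.3, "fat_g": 8.7},
    {"id": "f3", "protein_g": 2.0, "fat_g": 15.0},
]

# Expressible: "foods with under 10 g of fat" compares a field with a literal.
expressible_filter = {"fat_g": {"$lt": 10}}


def apply_lt(records, where):
    """Apply a single clause of the form {"field": {"$lt": literal}}."""
    (field, condition), = where.items()
    return [r["id"] for r in records if r[field] < condition["$lt"]]


print(apply_lt(FOODS, expressible_filter))  # ['f2']


# Not expressible in this format: "foods with more protein than fat"
# compares two fields of the same record, so there is no literal to fill in.
# One possible workaround (an assumption, not the paper's method) is to
# retrieve candidates first and apply the comparison in application code.
def more_protein_than_fat(records):
    return [r["id"] for r in records if r["protein_g"] > r["fat_g"]]


print(more_protein_than_fat(FOODS))  # ['f1', 'f2']
```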

Merits

Reduced Need for Technical Expertise

The study shows that LLMs can substantially reduce the manual effort and technical expertise that domain experts need in order to work with complex food and nutrition data.

High-Accuracy Retrieval

The evaluated LLMs achieve high accuracy on easy and moderately complex queries within a specialized RAG system built on a comprehensive food composition database.
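
This summary does not state how accuracy was computed. As one plausible reading, the sketch below scores a query as correct only when the set of retrieved record IDs equals the expected set. The function name, the metric choice, and the sample IDs are assumptions for illustration and should not be read as the paper's evaluation protocol.

```python
def exact_match_accuracy(predicted, gold):
    """Fraction of queries whose retrieved ID set equals the gold ID set."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same queries")
    if not gold:
        return 0.0
    # Order of retrieved IDs is ignored; only set membership counts.
    correct = sum(set(p) == set(g) for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical results for three queries.
gold = [["f2"], ["f1", "f3"], ["f4"]]
predicted = [["f2"], ["f3", "f1"], ["f4", "f5"]]

score = exact_match_accuracy(predicted, gold)
print(round(score, 2))  # 0.67
```

Exact set match is a strict criterion: the third query counts as wrong because one extra record was retrieved, even though the expected record is present.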

Demerits

Limitation in Non-Expressible Constraints

The study reveals that LLM-driven metadata filtering struggles with retrieving data when queries involve non-expressible constraints.

Scope of Metadata Format

Retrieval is only as expressive as the metadata format. Constraints that fall outside what the filters can represent cannot be retrieved reliably, so performance depends on queries whose constraints can be stated explicitly.
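
One way to make the scope of a metadata format explicit is to validate each LLM-generated filter against a declared schema before running it. The sketch below is an assumption about how such a check could look; the schema, field names, and error messages are hypothetical, and the operator set imitates Chroma-style comparison operators rather than reproducing any library's validation logic.

```python
# Hypothetical schema: field name -> accepted literal type(s).
ALLOWED_FIELDS = {
    "food_group": str,
    "protein_g": (int, float),
    "fat_g": (int, float),
}
ALLOWED_OPS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte"}


def validation_errors(where):
    """Return a list of reasons the filter is outside the format's scope."""
    errors = []
    for key, value in where.items():
        if key in ("$and", "$or"):
            for clause in value:
                errors.extend(validation_errors(clause))
        elif key not in ALLOWED_FIELDS:
            errors.append(f"unknown field: {key}")
        elif not isinstance(value, dict):
            errors.append(f"{key}: expected an operator clause")
        else:
            for op, literal in value.items():
                if op not in ALLOWED_OPS:
                    errors.append(f"unsupported operator: {op}")
                elif not isinstance(literal, ALLOWED_FIELDS[key]):
                    errors.append(f"{key}: expected a literal, got {literal!r}")
    return errors


ok_filter = {
    "$and": [
        {"food_group": {"$eq": "dairy"}},
        {"protein_g": {"$gte": 10}},
    ]
}
# An attempt to compare one field with another field, which the
# literal-only format cannot represent.
bad_filter = {"protein_g": {"$gt": "fat_g"}}

print(validation_errors(ok_filter))   # []
print(validation_errors(bad_filter))  # ["protein_g: expected a literal, got 'fat_g'"]
```

A check like this does not make the constraint expressible, but it turns a silent retrieval error into an explicit signal that the query exceeds the format.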

Expert Commentary

The study gives a useful picture of where LLM-driven metadata filtering works for food and nutrition RAG systems and where it does not. LLMs perform well as a query-translation layer for domain experts when a query's constraints map onto the metadata filters. The weakness on non-expressible constraints is the main concern, and further research is needed to address it. The results are relevant to food science, nutrition, and data retrieval more broadly.

Recommendations

  • Future studies should investigate the development of LLMs that can handle non-expressible constraints more effectively.
  • Policymakers should consider the limitations of LLMs in retrieving data with non-expressible constraints when developing data retrieval systems.
