FEAST: Retrieval-Augmented Multi-Hierarchical Food Classification for the FoodEx2 System

arXiv:2603.03176v1. Abstract: Hierarchical text classification (HTC) and extreme multi-label classification (XML) tasks face compounded challenges from complex label interdependencies, data sparsity, and extreme output dimensions. These challenges are exemplified in the European Food Safety Authority's FoodEx2 system, a standardized food classification framework essential for food consumption monitoring and contaminant exposure assessment across Europe. FoodEx2 coding transforms natural language food descriptions into a set of codes from multiple standardized hierarchies, but faces implementation barriers due to its complex structure. Given a food description (e.g., "organic yogurt"), the system identifies its base term ("yogurt"), all the applicable facet categories (e.g., "production method"), and then every relevant facet descriptor for each category (e.g., "organic production"). While existing models perform adequately on well-balanced and semantically dense hierarchies, no work has addressed the practical constraints imposed by the FoodEx2 system. The limited literature addressing such real-world scenarios further compounds these challenges. We propose FEAST (Food Embedding And Semantic Taxonomy), a novel retrieval-augmented framework that decomposes FoodEx2 classification into a three-stage approach: (1) base term identification, (2) multi-label facet prediction, and (3) facet descriptor assignment. By leveraging the system's hierarchical structure to guide training and performing deep metric learning, FEAST learns discriminative embeddings that mitigate data sparsity and improve generalization on rare and fine-grained labels. Evaluated on the multilingual FoodEx2 benchmark, FEAST outperforms the prior European's CNN baseline F1 scores by 12-38% on rare classes.

Executive Summary

The FEAST framework addresses the challenges of hierarchical text classification and extreme multi-label classification in the FoodEx2 system. It proposes a three-stage, retrieval-augmented approach that uses the system's hierarchical structure to guide training and applies deep metric learning to improve generalization on rare and fine-grained labels. On the multilingual FoodEx2 benchmark, FEAST improves on the prior CNN baseline's F1 scores by 12-38% on rare classes, which suggests it is effective at mitigating data sparsity.

Key Points

  • FEAST is a novel retrieval-augmented framework for FoodEx2 classification
  • It decomposes classification into base term identification, multi-label facet prediction, and facet descriptor assignment
  • FEAST leverages the system's hierarchical structure to guide training and improve generalization
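The three-stage decomposition listed above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for a learned encoder, the tiny `BASE_TERMS` and `FACETS` tables are hypothetical, and the similarity threshold is arbitrary. It is not the paper's implementation, only one way nearest-neighbor retrieval over label text could drive the three stages.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical miniature label index (not real FoodEx2 catalogue entries).
BASE_TERMS = ["yogurt", "bread", "apple juice"]
FACETS = {
    "production method": ["organic production", "conventional production"],
    "packaging material": ["glass packaging", "plastic packaging"],
}

def retrieve(query, candidates):
    """Return (score, candidate) pairs sorted by similarity, best first."""
    q = embed(query)
    return sorted(((cosine(q, embed(c)), c) for c in candidates), reverse=True)

def classify(description, facet_threshold=0.2):
    # Stage 1: base term identification (single best match).
    base_term = retrieve(description, BASE_TERMS)[0][1]
    result = {"base_term": base_term, "facets": {}}
    for category, descriptors in FACETS.items():
        ranked = retrieve(description, descriptors)
        best_score = ranked[0][0]
        # Stage 2: multi-label facet prediction. A category is kept if any
        # of its descriptors is similar enough to the description.
        if best_score >= facet_threshold:
            # Stage 3: facet descriptor assignment within the kept category.
            result["facets"][category] = [
                d for score, d in ranked if score >= facet_threshold
            ]
    return result
```

Calling `classify("organic yogurt")` on this toy index selects "yogurt" as the base term, keeps only the "production method" category, and assigns "organic production" as its descriptor, mirroring the example in the abstract.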

Merits

Effective Handling of Rare Classes

FEAST improves F1 scores on rare classes by 12-38% over the prior CNN baseline on the multilingual FoodEx2 benchmark.
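To make the rare-class metric concrete, the sketch below computes per-class F1 and then averages it over low-support labels only. The `max_support` cutoff, the macro averaging, and the single-label setting (as in base term identification) are assumptions for illustration; the abstract does not say how rare classes are defined or how scores are aggregated.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Per-class F1 for single-label predictions: 2TP / (2TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

def rare_macro_f1(y_true, y_pred, max_support=2):
    """Macro-averaged F1 over classes with at most `max_support` true examples.

    `max_support` is a hypothetical cutoff chosen for this sketch.
    """
    support = Counter(y_true)
    scores = per_class_f1(y_true, y_pred)
    rare = [label for label, n in support.items() if n <= max_support]
    return sum(scores[label] for label in rare) / len(rare) if rare else 0.0
```

Restricting the average to low-support classes keeps frequent labels from dominating the score, which is why a model can post a large rare-class gain while its overall figure moves much less.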

Improved Generalization

FEAST's ability to learn discriminative embeddings enables better generalization on fine-grained labels, addressing the challenges of data sparsity.
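One common way to let a hierarchy guide metric learning is a triplet loss whose margin grows with the taxonomic distance between the anchor's label and the negative's label. The sketch below shows that idea under stated assumptions: the two-level `PARENT` table, the `tree_distance` helper, and the margin scaling are all hypothetical, and the abstract does not specify which loss FEAST actually uses.

```python
import math

# Hypothetical two-level hierarchy: leaf label -> parent group.
PARENT = {"yogurt": "dairy", "kefir": "dairy", "bread": "bakery"}

def tree_distance(a, b):
    """0 for the same label, 1 for siblings, 2 for labels in different groups."""
    if a == b:
        return 0
    return 1 if PARENT[a] == PARENT[b] else 2

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def triplet_loss(anchor, positive, negative,
                 anchor_label, negative_label, base_margin=0.5):
    """Triplet loss with a hierarchy-scaled margin (an assumed design).

    Negatives from distant branches must be pushed further away than
    negatives that are siblings of the anchor's label.
    """
    margin = base_margin * tree_distance(anchor_label, negative_label)
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)
```

With this scaling, a sibling negative such as "kefir" for a "yogurt" anchor only needs to clear a small margin, while a negative from another branch such as "bread" must be separated by twice as much. Embeddings trained this way would tend to place rare labels near their better-populated relatives, which is one plausible route to the generalization described above.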

Demerits

Limited Literature

Little prior work addresses real-world settings such as FoodEx2, so there are few established points of comparison for FEAST, which may hinder its further development and evaluation.

Complexity of the FoodEx2 System

The complex structure of FoodEx2, which the abstract cites as a barrier to implementing the coding scheme itself, may also make the FEAST framework harder to implement and deploy.

Expert Commentary

The FEAST framework represents a significant advancement in addressing the challenges of hierarchical text classification and extreme multi-label classification. By leveraging the hierarchical structure of the FoodEx2 system, FEAST demonstrates improved generalization and accuracy, particularly for rare classes. However, the complexity of the FoodEx2 system and limited existing literature may pose challenges for the widespread adoption and evaluation of FEAST. Further research is necessary to fully explore the potential of FEAST and its applications in food consumption monitoring and contaminant exposure assessment.

Recommendations

  • Further evaluation of FEAST on diverse datasets to assess its robustness and generalizability
  • Exploration of FEAST's applications in other extreme multi-label classification tasks beyond the FoodEx2 system
