
The GELATO Dataset for Legislative NER

arXiv:2603.14130v1 Announce Type: new Abstract: This paper introduces GELATO (Government, Executive, Legislative, and Treaty Ontology), a dataset of U.S. House and Senate bills from the 118th Congress annotated using a novel two-level named entity recognition ontology designed for U.S. legislative texts. We fine-tune transformer-based models (BERT, RoBERTa) of different architectures and sizes on this dataset for first-level prediction. We then use LLMs with optimized prompts to complete the second-level prediction. The strong performance of RoBERTa and relatively weak performance of BERT models, as well as the application of LLMs as second-level predictors, support future research in legislative NER or downstream tasks using these model combinations as extraction tools.

Matthew Flynn, Timothy Obiso, Sam Newman

Executive Summary

The GELATO dataset advances legislative named entity recognition by introducing a two-level ontology tailored to U.S. legislative texts. The paper combines fine-tuned transformer models (BERT, RoBERTa) for first-level prediction with LLMs guided by optimized prompts for second-level refinement. The empirical results show a clear performance gap: RoBERTa outperforms BERT, supporting the chosen model combination as an extraction tool. This work addresses a gap in legal text processing by offering a structured, scalable framework for legislative NER, with potential applications in downstream legal analytics and policy monitoring.

Key Points

  • Introduction of GELATO as a two-level legislative NER ontology
  • Use of transformer models (BERT, RoBERTa) for first-level annotation
  • Application of LLMs with optimized prompts for second-level prediction

Merits

Innovation

The novel two-level ontology provides a structured, scalable framework for legislative text annotation, enhancing precision and applicability.

Performance Validation

Empirical evidence supports the superiority of RoBERTa and effectiveness of LLMs as second-level predictors, offering a validated methodology for future research.

Demerits

Scope Limitation

The dataset is currently confined to the 118th Congress, limiting generalizability to other legislative bodies or time periods.

Model Dependency

Reliance on specific transformer architectures and prompt-optimized LLMs may introduce barriers to reproducibility or adaptability in resource-constrained settings.

Expert Commentary

The GELATO dataset marks a notable step at the intersection of computational linguistics and legislative informatics. The authors' two-level design, which fine-tunes transformer models for coarse-grained first-level classification and then applies prompt-optimized LLMs for fine-grained second-level prediction, reflects a sophisticated understanding of the nuances of legislative texts. The performance gap between RoBERTa and BERT is particularly instructive: RoBERTa's pretraining appears better matched to the syntactic and semantic complexity of legislative drafting, suggesting that future work should weigh pretraining quality alongside raw model size.

The use of LLMs as second-level predictors points toward a hybrid paradigm for NER, in which coarse neural tagging is refined by an LLM's broader semantic knowledge. This hybrid architecture may become a template for related domains beyond legislation, such as judicial opinions or regulatory filings.

The dataset's temporal specificity (the 118th Congress) invites future work extending GELATO across congressional cycles, enabling longitudinal studies of how legislative language evolves. Finally, the paper's emphasis on prompt optimization as a lever for LLM performance aligns with emerging practice in AI-assisted legal analytics, reinforcing the value of careful prompt design in automated legal processing.

Recommendations

  • Extend GELATO to encompass additional congressional terms and legislative bodies to improve generalizability.
  • Develop open-source prompt templates and fine-tuning pipelines to facilitate reproducibility and adoption by the legal AI community.
