Just Pass Twice: Efficient Token Classification with LLMs for Zero-Shot NER

Ahmed Ewais, Ahmed Hashish, Amr Ali

arXiv:2604.05158v1 Announce Type: new Abstract: Large language models encode extensive world knowledge valuable for zero-shot named entity recognition. However, their causal attention mechanism, where tokens attend only to preceding context, prevents effective token classification when disambiguation requires future context. Existing approaches use LLMs generatively, prompting them to list entities or produce structured outputs, but suffer from slow autoregressive decoding, hallucinated entities, and formatting errors. We propose Just Pass Twice (JPT), a simple yet effective method that enables causal LLMs to perform discriminative token classification with full bidirectional context. Our key insight is that concatenating the input to itself lets each token in the second pass attend to the complete sentence, requiring no architectural modifications. We combine these representations with definition-guided entity embeddings for flexible zero-shot generalization. Our approach achieves state-of-the-art results on zero-shot NER benchmarks, surpassing the previous best method by +7.9 F1 on average across the CrossNER and MIT benchmarks, while being over 20x faster than comparable generative methods.

Executive Summary

The article introduces Just Pass Twice (JPT), a novel method for zero-shot Named Entity Recognition (NER) using causal Large Language Models (LLMs). By concatenating input text with itself, JPT enables bidirectional context processing without architectural modifications, addressing the limitation of causal attention mechanisms. The approach combines full-sentence contextual representations with definition-guided entity embeddings, achieving state-of-the-art performance on zero-shot NER benchmarks (e.g., +7.9 F1 over prior best on CrossNER and MIT benchmarks) while being over 20x faster than generative alternatives. This work bridges the gap between discriminative and generative NER methods, offering a scalable and efficient solution for real-world applications.
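The concatenation trick can be seen directly in the causal attention mask. The following is a minimal numpy sketch of the idea, not the authors' implementation: with a standard lower-triangular mask over the duplicated input `[x; x]`, the copy of token *i* in the second half sits at position *n + i* and can therefore attend to every token of the original sentence.

```python
import numpy as np

def causal_mask(length: int) -> np.ndarray:
    """True where a query position may attend to a key position (causal LM)."""
    return np.tril(np.ones((length, length), dtype=bool))

n = 5                      # toy sentence length in tokens
mask = causal_mask(2 * n)  # attention mask over the duplicated input [x; x]

# First pass: token 0 attends only to itself -- no future context available.
print(int(mask[0, :n].sum()))                        # -> 1

# Second pass: the copy of token i sits at position n + i and attends to
# all n tokens of the first copy, i.e. the complete sentence.
print(all(mask[n + i, :n].all() for i in range(n)))  # -> True
```

No mask or architecture change is needed: the standard causal mask already grants second-pass positions full-sentence visibility, which is what makes the method drop-in for existing causal LLMs.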

Key Points

  • Causal LLMs' unidirectional attention limits zero-shot NER because disambiguating a token often requires future context that earlier positions cannot attend to.
  • JPT leverages concatenation to simulate bidirectional context in a second pass, enabling discriminative token classification without architectural changes.
  • Definition-guided entity embeddings enhance zero-shot generalization, surpassing prior state-of-the-art benchmarks by significant margins while improving computational efficiency.
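The definition-guided classification step can be sketched as nearest-neighbor scoring between token representations and embeddings of natural-language type definitions. The type names, the cosine-similarity scoring rule, and the random stand-in vectors below are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical embeddings of type definitions such as
# "person: the name of a human being", encoded by the same LLM.
type_names = ["person", "location", "organization", "none"]
definition_embs = rng.normal(size=(len(type_names), dim))

# Contextual token representations from the second pass (random stand-ins).
token_embs = rng.normal(size=(3, dim))

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

scores = cosine(token_embs, definition_embs)           # (tokens, types)
labels = [type_names[i] for i in scores.argmax(axis=1)]
print(labels)                                          # one type per token
```

Because new entity types only require writing a new definition and embedding it, this scoring scheme generalizes zero-shot without retraining the classifier head.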

Merits

Computational Efficiency

JPT achieves an over 20x speedup compared to generative methods by avoiding autoregressive decoding and eliminating the post-processing needed to repair formatting errors and filter hallucinated entities.
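A back-of-envelope latency model shows where the speedup comes from. The token counts below are illustrative assumptions, not figures from the paper, and real speedups depend on batching, sequence length, and hardware.

```python
# Back-of-envelope latency model (token counts are illustrative assumptions).
n_input = 50     # sentence length in tokens
n_output = 100   # tokens a generative tagger must decode (entities + markup)

# Autoregressive generation takes one sequential forward step per output token.
generative_sequential_steps = n_output

# JPT runs a single forward pass over the duplicated input [x; x]; every
# position is processed in parallel, so there is one sequential step.
jpt_sequential_steps = 1

print(generative_sequential_steps // jpt_sequential_steps)  # -> 100
```

The sequential-step gap, rather than raw FLOPs, is what dominates wall-clock latency for short inputs, which is consistent with the order-of-magnitude speedup reported in the abstract.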

Architectural Simplicity

The method requires no modifications to the LLM architecture, relying solely on input concatenation and embeddings, making it broadly applicable to existing models.

Performance Gains

Demonstrates substantial improvements in zero-shot NER benchmarks (e.g., +7.9 F1 on average), indicating superior accuracy and generalization across diverse domains.

Demerits

Input Length Constraints

Concatenation doubles input length, which may pose challenges for very long sequences due to memory or computational limits, particularly in resource-constrained environments.
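The cost of doubling can be quantified: self-attention FLOPs grow quadratically with sequence length, so duplicating the input roughly quadruples attention compute, while activation and KV-cache memory grow linearly (2x). This is a standard scaling argument, not an analysis from the paper.

```python
# Attention cost is proportional to the number of query-key pairs,
# i.e. quadratic in sequence length.
def attention_cost(seq_len: int) -> int:
    return seq_len * seq_len

n = 512
ratio = attention_cost(2 * n) / attention_cost(n)
print(ratio)  # -> 4.0
```

For long documents this 4x attention overhead is the binding constraint, which motivates the adaptive-concatenation ideas raised in the recommendations below.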

Latency in Second Pass

While far faster than generative methods, the doubled-input pass adds latency relative to single-pass encoder classifiers (e.g., BERT-style taggers), which could matter for real-time applications.

Dependency on Entity Definitions

The effectiveness of definition-guided embeddings relies on high-quality entity definitions, which may not always be available or may require manual curation for niche domains.

Expert Commentary

This article presents a highly innovative and pragmatic solution to a well-documented limitation of causal LLMs in discriminative tasks. The authors' insight, using input concatenation to simulate bidirectional context, is both elegant and broadly applicable, offering a compelling alternative to the computationally expensive and error-prone generative approaches dominating current NER pipelines. The performance gains, particularly the +7.9 F1 improvement, are remarkable and suggest that JPT could set a new baseline for zero-shot NER. However, the method's reliance on entity definitions and potential scalability issues with long sequences warrant further exploration. From a practical standpoint, JPT's efficiency could reshape industry practices, particularly in domains where real-time processing is critical. The work also invites deeper investigation into hybrid architectures that combine the strengths of causal and bidirectional modeling without sacrificing simplicity.

Recommendations

  • Explore hybrid architectures that integrate JPT's concatenation strategy with lightweight bidirectional mechanisms (e.g., memory-efficient attention) to further reduce latency and memory overhead.
  • Conduct comprehensive evaluations across multilingual and domain-specific datasets to validate the method's robustness in real-world scenarios beyond standard benchmarks.
  • Investigate adaptive concatenation techniques (e.g., dynamic truncation or selective duplication) to optimize performance for varying sequence lengths.
  • Develop automated tools for generating high-quality entity definitions to enhance the method's applicability in low-resource or rapidly evolving domains.

Sources

Original: arXiv - cs.CL