
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track - ACL Anthology


Editors: Yunyao Li, Angeliki Lazaridou
Anthology ID: 2022.emnlp-industry
Month: December
Year: 2022
Address: Abu Dhabi, UAE
Venue: EMNLP
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2022.emnlp-industry/
DOI: 10.18653/v1/2022.emnlp-industry
PDF: https://aclanthology.org/2022.emnlp-industry.pdf

Unsupervised Term Extraction for Highly Technical Domains
Francesco Fusco | Peter Staar | Diego Antognini
Term extraction is an information extraction task at the root of knowledge discovery platforms. Developing term extractors that are able to generalize across very diverse and potentially highly technical domains is challenging, as annotations for domains requiring in-depth expertise are scarce and expensive to obtain. In this paper, we describe the term extraction subsystem of a commercial knowledge discovery platform that targets highly technical fields such as pharma, medical, and material science. To be able to generalize across domains, we introduce a fully unsupervised annotator (UA). It extracts terms by combining novel morphological signals from sub-word tokenization with term-to-topic and intra-term similarity metrics, computed using general-domain pre-trained sentence-encoders. The annotator is used to implement a weakly-supervised setup, where transformer-models are fine-tuned (or pre-trained) over the training data generated by running the UA over large unlabeled corpora. Our experiments demonstrate that our setup can improve the predictive performance while decreasing the inference latency on both CPUs and GPUs. Our annotators provide a very competitive baseline for all the cases where annotations are not available.

DynaMaR: Dynamic Prompt with Mask Token Representation
Xiaodi Sun | Sunny Rajagopalan | Priyanka Nigam | Weiyi Lu | Yi Xu | Iman Keivanloo | Belinda Zeng | Trishul Chilimbi
Recent research has shown that large language models pretrained using unsupervised approaches can achieve significant performance improvement on many downstream tasks. Typically when adapting these language models to downstream tasks, like a classification or regression task, we employ a fine-tuning paradigm in which the sentence representation from the language model is input to a task-specific head; the model is then fine-tuned end-to-end. However, with the emergence of models like GPT-3, prompt-based fine-tuning has been proven to be a successful approach for few-shot tasks. Inspired by this work, we study discrete prompt technologies in practice. There are two issues that arise with the standard prompt approach. First, it can overfit on the prompt template. Second, it requires manual effort to formulate the downstream task as a language model problem. In this paper, we propose an improvement to prompt-based fine-tuning that addresses these two issues. We refer to our approach as DynaMaR (Dynamic Prompt with Mask Token Representation). Results show that DynaMaR can achieve an average improvement of 10% in few-shot settings and improvement of 3.7% in data-rich settings over the standard fine-tuning approach on four e-commerce applications.

A Hybrid Approach to Cross-lingual Product Review Summarization
Saleh Soltan | Victor Soto | Ke Tran | Wael Hamza
We present a hybrid approach for product review summarization which consists of: (i) an unsupervised extractive step to extract the most important sentences out of all the reviews, and (ii) a supervised abstractive step to summarize the extracted sentences into a coherent short summary. This approach allows us to develop an efficient cross-lingual abstractive summarizer that can generate summaries in any language, given the extracted sentences out of thousands of reviews in a source language. In order to train and test the abstractive model, we create the Cross-lingual Amazon Reviews Summarization (CARS) dataset which provides English summaries for training, and English, French, Italian, Arabic, and Hindi summaries for testing based on selected English reviews. We show that the summaries generated by our model are as good as human written summaries in coherence, informativeness, non-redundancy, and fluency.

Augmenting Operations Research with Auto-Formulation of Optimization Models From Problem Descriptions
Rindra Ramamonjison | Haley Li | Timothy Yu | Shiqi He | Vishnu Rengan | Amin Banitalebi-dehkordi | Zirui Zhou | Yong Zhang
We describe an augmented intelligence system for simplifying and enhancing the modeling experience for operations research. Using this system, the user receives a suggested formulation of an optimization problem based on its description. To facilitate this process, we build an intuitive user interface system that enables the users to validate and edit the suggestions. We investigate controlled generation techniques to obtain an automatic suggestion of formulation. Then, we evaluate their effectiveness with a newly created dataset of linear programming problems drawn from various application domains.

Knowledge Distillation based Contextual Relevance Matching for E-commerce Product Search
Ziyang Liu | Chaokun Wang | Hao Feng | Lingfei Wu | Liqun Yang
Online relevance matching is an essential task of e-commerce product search to boost the utility of search engines and ensure a smooth user experience. Previous work adopts either classical relevance matching models or Transformer-style models to address it. However, they ignore the inherent bipartite graph structures that are ubiquitous in e-commerce product search logs and are too inefficient to deploy online. In this paper, we design an efficient knowledge distillation framework for e-commerce relevance matching to integrate the respective advantages of Transformer-style models and classical relevance matching models. Especially for the core student model of the framework, we propose a novel method using k-order relevance modeling. The experimental results on large-scale real-world data (the size is 6 174 million) show that the proposed method significantly improves the prediction accuracy in terms of human relevance judgment. We deploy our method to JD.com online search platform. The A/B testing results show that our method significantly improves most business metrics under price sort mode and default sort mode.

Accelerating the Discovery of Semantic Associations from Medical Literature: Mining Relations Between Diseases and Symptoms
Alberto Purpura | Francesca Bonin | Joao Bettencourt-silva
Medical literature is a vast and constantly expanding source of information about diseases, their diagnoses and treatments. One of the ways to extract insights from this type of data is through mining association rules between such entities. However, existing solutions do not take into account the semantics of sentences from which entity co-occurrences are extracted. We propose a scalable solution for the automated discovery of semantic associations between different entities such as diseases and their symptoms. Our approach employs the UMLS semantic network and a binary relation classification model trained with distant supervision to validate and help ranking the most likely entity associations pairs extracted with frequency-based association rule mining algorithms. We evaluate the proposed system on the task of extracting disease-symptom associations from a collection of over 14M PubMed abstracts and validate our results against a publicly available known list of disease-symptom pairs.

PENTATRON: PErsonalized coNText-Aware Transformer for Retrieval-based cOnversational uNderstanding
Niranjan Uma Naresh | Ziyan Jiang | Ankit | Sungjin Lee | Jie Hao | Xing Fan | Chenlei Guo
Conversational understanding is an integral part of modern intelligent devices. In a large fraction of the global traffic from customers using smart digital assistants, frictions in dialogues may be attributed to incorrect understanding of the entities in a customer’s query due to factors including ambiguous mentions, mispronunciation, background noise and faulty on-device signal processing. Such errors are compounded by two common deficiencies from intelligent devices namely, (1) the device not being tailored to individual customers, and (2) the device responses being unaware of the context in the conversation session. Viewing this problem via the lens of retrieval-based search engines, we build and evaluate a scalable entity correction system, PENTATRON. The system leverages a parametric transformer-based language model to learn patterns from in-session customer-device interactions coupled with a non-parametric personalized entity index to compute the correct query, which aids downstream components in reasoning about the best response. In addition to establishing baselines and demonstrating the value of personalized and context-aware systems, we use multitasking to learn the domain of the correct entity. We also investigate the utility of language model prompts. Through extensive experiments, we show a significant upward movement of the key metric (Exact Match) by up to 500.97% (relative to the baseline).

Machine translation impact in E-commerce multilingual search
Bryan Zhang | Amita Misra
Previous work suggests that performance of cross-lingual information retrieval correlates highly with the quality of Machine Translation. However, there may be a threshold beyond which improving query translation quality yields little or no benefit to further improve the retrieval performance. This threshold may depend upon multiple factors including the source and target languages, the existing MT system quality and the search pipeline. In order to identify the benefit of improving an MT system for a given search pipeline, we investigate the sensitivity of retrieval quality to the presence of different levels of MT quality using experimental datasets collected from actual traffic. We systematically improve the performance of our MT systems quality on language pairs as measured by MT evaluation metrics including Bleu and Chrf to determine their impact on search precision metrics and extract signals that help to guide the improvement strategies. Using this information we develop techniques to compare query translations for multiple language pairs and identify the most promising language pairs to invest and improve.

Ask-and-Verify: Span Candidate Generation and Verification for Attribute Value Extraction
Yifan Ding | Yan Liang | Nasser Zalmout | Xian Li | Christan Grant | Tim Weninger
The product attribute value extraction (AVE) task aims to capture key factual information from product profiles, and is useful for several downstream applications in e-Commerce platforms. Previous contributions usually formulate this task using sequence labeling or reading comprehension architectures. However, sequence labeling models tend to be conservative in their predictions resulting in a high false negative rate. Existing reading comprehension formulations, on the other hand, can over-generate attribute values which hinders precision. In the present work we address these limitations with a new end-to-end pipeline framework called Ask-and-Verify. Given a product and an attribute query, the Ask step detects the top-K span candidates (i.e. possible attribute values) from the product profiles, then the Verify step filters out false positive candidates. We evaluate Ask-and-Verify model on Amazon’s product pages and AliExpress public dataset, and present a comparative analysis as well as a detailed ablation study. Despite its simplicity, we show that Ask-and-Verify outperforms recent state-of-the-art models by up to 3.1% F1 absolute improvement points, while also scaling to thousands of attributes.

Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation
Aleksandar Savkov | Francesco Moramarco | Alex Papadopoulos Korfiatis | Mark Perera | Anya Belz | Ehud Reiter
Evaluating automatically generated text is generally hard due to the inherently subjective nature of many aspects of the output quality. This difficulty is compounded in automatic consultation note generation by differing opinions between medical experts both about which patient statements should be included in generated notes and about their respective importance in arriving at a diagnosis. Previous real-world evaluations of note-generation systems saw substantial disagreement between expert evaluators. In this paper we propose a protocol that aims to increase objectivity by grounding evaluations in Consultation Checklists, which are created in a preliminary step and then used as a common point of reference during quality assessment. We observed good levels of inter-annotator agreement in a first evaluation study using the protocol; further, using Consultation Checklists produced in the study as reference for automatic metrics such as ROUGE or BERTScore improves their correlation with human judgements compared to using the original human note.

Towards Need-Based Spoken Language Understanding Model Updates: What Have We Learned?
Quynh Do | Judith Gaspers | Daniil Sorokin | Patrick Lehnen
In productionized machine learning systems, online model performance is known to deteriorate over time when there is a distributional drift between offline training and online application data. As a remedy, models are typically retrained at fixed time intervals, implying high computational and manual costs. This work aims at decreasing such costs in productionized, large-scale Spoken Language Understanding systems. In particular, we develop a need-based re-training strategy guided by an efficient drift detector and discuss the arising challenges including system complexity, overlapping model releases, observation limitation and the absence of annotated resources at runtime. We present empirical results on historical data and confirm the utility of our design decisions via an online A/B experiment.

Knowledge Distillation Transfer Sets and their Impact on Downstream NLU Tasks
Charith Peris | Lizhen Tan | Thomas Gueudre | Turan Gojayev | Pan Wei | Gokmen Oz
Teacher-student knowledge distillation is a popular technique for compressing today’s prevailing large language models into manageable sizes that fit low-latency downstream applications. Both the teacher and the choice of transfer set used for distillation are crucial ingredients in creating a high quality student. Yet, the generic corpora used to pretrain the teacher and the corpora associated with the downstrea

Executive Summary

The Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP): Industry Track, edited by Yunyao Li and Angeliki Lazaridou, collects natural language processing (NLP) research aimed at industry applications. The papers include one on unsupervised term extraction for highly technical domains such as pharma, medical, and material science, and another, DynaMaR, on dynamic prompts with mask token representations for adapting pretrained language models to downstream tasks. The term extraction work reports improved predictive performance together with lower inference latency on CPUs and GPUs, while DynaMaR reports gains over standard fine-tuning on four e-commerce applications.

Key Points

  • Unsupervised term extraction for highly technical domains using morphological signals and pre-trained sentence encoders.
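  • Dynamic prompt with mask token representation (DynaMaR), which targets two problems of standard prompting: overfitting to the prompt template and the manual effort of recasting a downstream task as a language model problem.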
  • Weakly-supervised setups that leverage unsupervised annotators to enhance model performance with limited labeled data.
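The weakly-supervised setup in the last point can be sketched in a few lines: an unsupervised annotator labels raw text, and the resulting "silver" labels become training data for a tagger. The annotator below is a deliberately crude, hypothetical stand-in (it flags long alphabetic tokens); it is not the paper's UA, and all function names are assumptions made for illustration.

```python
import re

def toy_unsupervised_annotator(tokens, min_len=10):
    """Hypothetical stand-in for an unsupervised annotator: flags long,
    purely alphabetic tokens as terms. This is NOT the paper's UA."""
    return [tok.isalpha() and len(tok) >= min_len for tok in tokens]

def to_bio(flags):
    """Convert per-token term flags into BIO tags, merging adjacent hits."""
    tags, prev = [], False
    for flag in flags:
        tags.append("O" if not flag else ("I-TERM" if prev else "B-TERM"))
        prev = flag
    return tags

def build_silver_corpus(sentences):
    """Run the annotator over unlabeled sentences to get (tokens, tags) pairs
    that a sequence tagger could later be fine-tuned on."""
    corpus = []
    for sentence in sentences:
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        corpus.append((tokens, to_bio(toy_unsupervised_annotator(tokens))))
    return corpus

silver = build_silver_corpus(["Patients received acetaminophen twice daily."])
for token, tag in zip(*silver[0]):
    print(f"{token}\t{tag}")
```

In a real pipeline the silver corpus would be fed to a transformer fine-tuning loop; that step is omitted here because the source does not describe its configuration.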

Merits

Innovative Approach to Term Extraction

The paper on unsupervised term extraction introduces a novel method that combines morphological signals with term-to-topic and intra-term similarity metrics, providing a robust solution for domains with scarce annotations.
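To make the two similarity signals concrete, here is a minimal sketch of how a candidate term might be scored. Several things are assumptions rather than the paper's method: character n-gram counts stand in for a pre-trained sentence encoder, "intra-term similarity" is interpreted as the average pairwise similarity between a candidate's words, the two signals are mixed with an arbitrary weight, and the morphological signal from sub-word tokenization is left out entirely.

```python
import math
from collections import Counter

def embed(text, n=3):
    """Hypothetical stand-in for a sentence encoder: character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def term_to_topic(candidate, topic):
    """Similarity between a candidate term and a description of the topic."""
    return cosine(embed(candidate), embed(topic))

def intra_term(candidate):
    """Assumed definition: mean pairwise similarity between the candidate's words."""
    words = candidate.split()
    if len(words) < 2:
        return 1.0
    pairs = [(a, b) for i, a in enumerate(words) for b in words[i + 1:]]
    return sum(cosine(embed(a), embed(b)) for a, b in pairs) / len(pairs)

def score(candidate, topic, weight=0.5):
    """Illustrative weighted mix of the two signals; the weight is arbitrary."""
    return weight * term_to_topic(candidate, topic) + (1 - weight) * intra_term(candidate)

for candidate in ["polymer chain", "weather report"]:
    print(candidate, round(score(candidate, "polymer chemistry"), 3))
```

Swapping `embed` for a real sentence encoder would leave the rest of the sketch unchanged, which is the main reason for isolating it in one function.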

Improved Model Performance

DynaMaR, the dynamic prompt with mask token representation approach, reports an average improvement of 10% in few-shot settings and 3.7% in data-rich settings over standard fine-tuning on four e-commerce applications.
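For readers unfamiliar with prompt-based fine-tuning, the sketch below shows the standard prompt formulation that the DynaMaR abstract takes as its starting point: the input is wrapped in a template containing a mask token, and label words predicted at the mask position are mapped back to class labels. It does not show DynaMaR's own mechanism, which the source does not detail. The template, the label words, and the `mask_token_scores` dictionary (standing in for a masked language model's output) are all hypothetical.

```python
MASK = "[MASK]"

# Hypothetical mapping from label words to task labels (a "verbalizer").
VERBALIZER = {"great": "positive", "terrible": "negative"}

def build_prompt(text, template="{text} Overall, this product is {mask}."):
    """Recast a classification input as a cloze-style language model input."""
    return template.format(text=text, mask=MASK)

def predict_label(mask_token_scores, verbalizer=VERBALIZER):
    """Pick the label whose label word scores highest at the mask position.

    mask_token_scores is an assumed dict of word -> score, as a masked
    language model might produce for the [MASK] slot.
    """
    best_word = max(verbalizer, key=lambda w: mask_token_scores.get(w, float("-inf")))
    return verbalizer[best_word]

prompt = build_prompt("Battery died in a week.")
print(prompt)
print(predict_label({"great": 0.1, "terrible": 0.7}))
```

Both issues named in the abstract are visible here: the result depends on one hand-written template, and someone has to choose that template and its label words for every new task.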

Demerits

Limited Generalizability

While the unsupervised term extraction method shows promise, its effectiveness may vary across different technical domains, requiring further validation.

Potential Overfitting

The dynamic prompt approach, despite its advantages, may still face challenges related to overfitting, particularly when dealing with small datasets.

Expert Commentary

The 2022 EMNLP Industry Track proceedings show how NLP research is being adapted to industrial constraints such as scarce annotations and tight latency budgets. The term extraction paper combines morphological signals from sub-word tokenization with similarity metrics computed by general-domain pre-trained sentence encoders, so it needs no domain-specific annotations. According to its abstract, fine-tuning transformer models on the annotator's output improves predictive performance while lowering inference latency on both CPUs and GPUs, which makes the setup a practical fit for knowledge discovery platforms.

DynaMaR targets two weaknesses of standard prompt-based fine-tuning: overfitting to the prompt template and the manual effort of recasting a downstream task as a language model problem. Its reported gains over standard fine-tuning are largest in few-shot settings.

Both approaches have limitations. The term extraction method needs validation across more technical domains before its generalizability can be assumed. The dynamic prompt approach may still overfit when training data is small. Even so, the research in these proceedings underscores the need for continued innovation in NLP to serve industries that depend on technical expertise.

Recommendations

  • Further research should focus on validating the unsupervised term extraction method across a broader range of technical domains to ensure its robustness and generalizability.
  • Developers of NLP models should explore hybrid approaches that combine unsupervised, weakly-supervised, and supervised learning methods to enhance model performance and adaptability.
