Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations - ACL Anthology

Editors: Qun Liu, David Schlangen
Anthology ID: 2020.emnlp-demos
Month: October
Year: 2020
Address: Online
Venue: EMNLP
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2020.emnlp-demos/
DOI: 10.18653/v1/2020.emnlp-demos
PDF: https://aclanthology.org/2020.emnlp-demos.pdf

OpenUE: An Open Toolkit of Universal Extraction from Text
Ningyu Zhang, Shumin Deng, Zhen Bi, Haiyang Yu, Jiacheng Yang, Mosha Chen, Fei Huang, Wei Zhang, Huajun Chen
Natural language processing covers a wide variety of tasks with token-level or sentence-level understanding. In this paper, we provide a simple insight that most tasks can be represented in a single universal extraction format. We introduce a prototype model and provide an open-source and extensible toolkit called OpenUE for various extraction tasks. OpenUE allows developers to train custom models to extract information from text and supports quick model validation for researchers. In addition, OpenUE provides various functional modules to maintain sufficient modularity and extensibility. Beyond the toolkit, we also deploy an online demo with RESTful APIs to support real-time extraction without training or deployment. The online system can extract information for various tasks, including relational triple extraction, slot and intent detection, and event extraction. We release the source code, datasets, and pre-trained models to promote future research at http://github.com/zjunlp/openue.
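OpenUE's premise is that most extraction tasks can share one output format. As a hedged illustration only, the sketch below assumes a hypothetical schema (not OpenUE's actual data model): every prediction is a list of labelled character spans plus optional (head, relation, tail) links between them, so a relational triple and a slot-and-intent frame fit the same container. The class names, field names, and labels are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Span:
    """A labelled character span [start, end) in the input text (hypothetical schema)."""
    start: int
    end: int
    label: str

@dataclass
class Extraction:
    """Shared container: labelled spans plus (head_index, relation, tail_index) links."""
    text: str
    spans: list = field(default_factory=list)
    links: list = field(default_factory=list)

    def add_span(self, surface: str, label: str) -> int:
        """Record the first occurrence of `surface` under `label`; return its index."""
        start = self.text.find(surface)
        if start < 0:
            raise ValueError(f"{surface!r} not found in text")
        self.spans.append(Span(start, start + len(surface), label))
        return len(self.spans) - 1

    def link(self, head: int, relation: str, tail: int) -> None:
        """Connect two recorded spans with a typed relation."""
        self.links.append((head, relation, tail))

    def triples(self) -> list:
        """Render every link as a (head text, relation, tail text) triple."""
        return [
            (self.text[self.spans[h].start:self.spans[h].end], rel,
             self.text[self.spans[t].start:self.spans[t].end])
            for h, rel, t in self.links
        ]

# Relational triple extraction in the shared format.
sent = Extraction("Marie Curie was born in Warsaw.")
sent.link(sent.add_span("Marie Curie", "PERSON"), "born_in", sent.add_span("Warsaw", "CITY"))

# Intent detection and slot filling in the same format: the intent labels the whole utterance.
utt = Extraction("book a flight to Boston")
utt.add_span("book a flight to Boston", "intent:BookFlight")
utt.add_span("Boston", "slot:destination")

print(sent.triples())
print(utt.spans)
```

A shared container of this kind lets one decoding and scoring routine serve several tasks, which is one plausible reason a universal format helps extensibility.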

BERTweet: A pre-trained language model for English Tweets
Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen
We present BERTweet, the first public large-scale pre-trained language model for English Tweets. BERTweet has the same architecture as BERT-base (Devlin et al., 2019) and is trained using the RoBERTa pre-training procedure (Liu et al., 2019). Experiments show that BERTweet outperforms the strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), producing better results than the previous state-of-the-art models on three Tweet NLP tasks: part-of-speech tagging, named-entity recognition, and text classification. We release BERTweet under the MIT License to facilitate future research and applications on Tweet data. BERTweet is available at https://github.com/VinAIResearch/BERTweet

NeuralQA: A Usable Library for Question Answering (Contextual Query Expansion + BERT) on Large Datasets
Victor Dibia
Existing tools for Question Answering (QA) have challenges that limit their use in practice. They can be complex to set up or integrate with existing infrastructure, do not offer configurable interactive interfaces, and do not cover the full set of subtasks that frequently comprise the QA pipeline (query expansion, retrieval, reading, and explanation/sensemaking). To help address these issues, we introduce NeuralQA, a usable library for QA on large datasets. NeuralQA integrates well with existing infrastructure (e.g., ElasticSearch instances and reader models trained with the HuggingFace Transformers API) and offers helpful defaults for QA subtasks. It introduces and implements contextual query expansion (CQE) using a masked language model (MLM), as well as relevant snippets (RelSnip), a method for condensing large documents into smaller passages that can be speedily processed by a document reader model.
Finally, it offers a flexible user interface to support workflows for research explorations (e.g., visualization of gradient-based explanations to support qualitative inspection of model behaviour) and large-scale search deployment. Code and documentation for NeuralQA are available as open source on GitHub.

Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia
Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto
The embeddings of entities in a large knowledge base (e.g., Wikipedia) are highly beneficial for solving various natural language tasks that involve real-world knowledge. In this paper, we present Wikipedia2Vec, a Python-based open-source tool for learning the embeddings of words and entities from Wikipedia. The proposed tool enables users to learn the embeddings efficiently by issuing a single command with a Wikipedia dump file as an argument. We also introduce a web-based demonstration of our tool that allows users to visualize and explore the learned embeddings. In our experiments, our tool achieved a state-of-the-art result on the KORE entity relatedness dataset and competitive results on various standard benchmark datasets. Furthermore, our tool has been used as a key component in various recent studies. We publicize the source code, demonstration, and pretrained embeddings for 12 languages at https://wikipedia2vec.github.io/.

ARES: A Reading Comprehension Ensembling Service
Anthony Ferritto, Lin Pan, Rishav Chakravarti, Salim Roukos, Radu Florian, J. William Murdock, Avi Sil
We introduce ARES (A Reading Comprehension Ensembling Service): a novel Machine Reading Comprehension (MRC) demonstration system which uses an ensemble of models to increase F1 by 2.3 points.
While many of the top leaderboard submissions in popular MRC benchmarks such as the Stanford Question Answering Dataset (SQuAD) and Natural Questions (NQ) use model ensembles, the accompanying papers do not publish their ensembling strategies. In this work, we detail and evaluate various ensembling strategies using the NQ dataset. ARES leverages the CFO (Chakravarti et al., 2019) and ReactJS distributed frameworks to provide a scalable interactive Question Answering experience that capitalizes on the agreement (or lack thereof) between models to improve the answer visualization experience.

Transformers: State-of-the-Art Natural Language Processing
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, Alexander Rush
Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models, and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the-art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. Transformers is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at https://github.com/huggingface/transformers.
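The ARES entry above reports that ensembling reading-comprehension models raises F1, but the abstract does not say which strategy works best. As a hedged illustration of one generic strategy (not the ARES method), the sketch below turns each model's raw span scores into probabilities with a softmax, averages them across models, and counts how many models proposed each span as a simple agreement signal. The helper name and the input format are hypothetical.

```python
import math
from collections import defaultdict

def ensemble_answers(model_outputs):
    """Average per-model answer probabilities and count model agreement.

    model_outputs: list with one dict per model, mapping a candidate answer
    span (here a (start, end) tuple) to that model's raw score (logit).
    Returns (span, averaged_probability, votes) tuples, best first.
    Hypothetical helper; not the ARES implementation.
    """
    n_models = len(model_outputs)
    combined = defaultdict(float)
    votes = defaultdict(int)
    for scores in model_outputs:
        if not scores:
            continue  # a model with no candidates contributes nothing; its share is dropped
        top = max(scores.values())
        # Softmax over this model's candidates, shifted by the max for numerical stability.
        exps = {span: math.exp(raw - top) for span, raw in scores.items()}
        total = sum(exps.values())
        for span, e in exps.items():
            combined[span] += e / total / n_models
            votes[span] += 1  # number of models that proposed this span
    ranked = sorted(combined.items(), key=lambda kv: (kv[1], votes[kv[0]]), reverse=True)
    return [(span, prob, votes[span]) for span, prob in ranked]

# Three hypothetical models scoring candidate spans in the same passage.
outputs = [
    {(10, 14): 2.0, (30, 35): 1.0},
    {(10, 14): 0.5, (30, 35): 1.5},
    {(10, 14): 3.0},
]
for span, prob, n in ensemble_answers(outputs):
    print(span, round(prob, 3), n)
```

In this example the span (10, 14) ranks first because all three models propose it, even though the second model prefers the other span; the vote count is a simple way to surface the kind of agreement signal the ARES abstract describes using for answer visualization.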

AdapterHub: A Framework for Adapting Transformers
Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych
The current modus operandi in NLP involves downloading and fine-tuning pre-trained models consisting of millions or billions of parameters. Storing and sharing such large trained models is expensive, slow, and time-consuming, which impedes progress towards more general and versatile NLP methods that learn from and for many tasks. Adapters, small learnt bottleneck layers inserted within each layer of a pre-trained model, ameliorate this issue by avoiding full fine-tuning of the entire model. However, sharing and integrating adapter layers is not straightforward. We propose AdapterHub, a framework that allows dynamic “stitching-in” of pre-trained adapters for different tasks and languages. The framework, built on top of the popular HuggingFace Transformers library, enables extremely easy and quick adaptations of state-of-the-art pre-trained models (e.g., BERT, RoBERTa, XLM-R) across tasks and languages. Downloading, sharing, and training adapters is as seamless as possible, using minimal changes to the training scripts and a specialized infrastructure. Our framework enables scalable and easy access to sharing of task-specific models, particularly in low-resource scenarios. AdapterHub includes all recent adapter architectures and can be found at AdapterHub.ml

HUMAN: Hierarchical Universal Modular ANnotator
Moritz Wolf, Dana Ruiter, Ashwin Geet D’Sa, Liane Reiners, Jan Alexandersson, Dietrich Klakow
Many real-world phenomena are complex and cannot be captured by single-task annotations. This creates a need for subsequent annotations, with interdependent questions and answers describing the nature of the subject at hand.
Even when a phenomenon is easily captured by a single task, the high specialisation of most annotation tools can force a switch to another tool if the task changes only slightly. We introduce HUMAN, a novel web-based annotation tool that addresses these problems by a) covering a variety of annotation tasks on both textual and image data, and b) using an internal deterministic state machine that allows the researcher to chain different annotation tasks in an interdependent manner. Further, the modular nature of the tool makes it easy to define new annotation tasks and integrate machine learning algorithms, e.g., for active learning. HUMAN comes with an easy-to-use graphical user interface that simplifies both annotation and annotation management.

DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching
Kasra Hosseini, Federico Nanni, Mariona Coll Ardanuy
We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach is especially useful where only limited training examples are available. The learned DeezyMatch models can be used to generate rich vector representations from string inputs. The candidate ranker component in DeezyMatch uses these vector representations to find, for a given query, the best matching candidates in a knowledge base. It uses an adaptive searching algorithm applicable to large knowledge bases and query sets. We describe DeezyMatch’s functionality, design, and implementation, accompanied by a use case in toponym matching and candidate ranking in realistic noisy datasets.
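DeezyMatch ranks knowledge-base candidates by comparing vector representations of strings. The sketch below keeps that vector-based ranking step but swaps the trained neural encoder for a simple, untrained stand-in: character trigram count vectors compared by cosine similarity. It is a hedged illustration of the general idea, not DeezyMatch's API or model, and the function names and the tiny gazetteer are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a lower-cased string padded with '#' at both ends."""
    padded = f"#{text.lower()}#"
    return Counter(padded[i:i + n] for i in range(max(1, len(padded) - n + 1)))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_candidates(query: str, knowledge_base: list, top_k: int = 3) -> list:
    """Return the top_k (candidate, similarity) pairs for a query string."""
    query_vec = char_ngrams(query)
    # A real system would cache candidate vectors; they are recomputed here for brevity.
    scored = [(cand, cosine(query_vec, char_ngrams(cand))) for cand in knowledge_base]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# A misspelled toponym queried against a tiny, made-up gazetteer.
gazetteer = ["London", "Londonderry", "Lyon", "Manchester"]
print(rank_candidates("Londn", gazetteer, top_k=2))
```

Character n-grams make the ranking tolerant of small spelling errors such as the dropped letter in "Londn"; DeezyMatch instead learns its string representations with trained neural networks, which is what allows transfer learning across datasets.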

CoSaTa: A Constraint Satisfaction Solver and Interpreted Language for Semi-Structured Tables of Sentences
Peter Jansen
This work presents CoSaTa, an intuitive constraint satisfaction solver and interpreted language for knowledge bases of semi-structured tables expressed as text. The stand-alone CoSaTa solver makes it easy to express complex compositional “inference patterns” for how knowledge from different tables tends to connect to support inference and explanation construction in question answering and other downstream tasks, while including advanced declarative features and the ability to operate over multiple representations of text (words, lemmas, or part-of-speech tags). CoSaTa also includes a hybrid imperative/declarative interpreted language for expressing simple models through minimally-specified simulations grounded in constraint patterns, helping bridge the gap between question answering, question explanation, and model simulation. The solver and interpreter are released as open source. Screencast Demo: https://youtu.be/t93Acsz7LyE

InVeRo: Making Semantic Role Labeling Accessible with Intelligible Verbs and Roles
Simone Conia, Fabrizio Brignone, Davide Zanfardino, Roberto Navigli
Semantic Role Labeling (SRL) is deeply dependent on complex linguistic resources and sophisticated neural models, which makes the task difficult to approach for non-experts. To address this issue, we present a new platform named Intelligible Verbs and Roles (InVeRo). This platform provides access to a new verb resource, VerbAtlas, and a state-of-the-art pretrained implementation of a neural, span-based architecture for SRL. Both the resource and the system provide human-readable verb sense and semantic role information, with an easy-to-use web interface and RESTful APIs available at http://nlp.uniroma1.it/invero.
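The CoSaTa entry above describes “inference patterns” that state how knowledge from different tables connects. To make constraint satisfaction over tables concrete, the toy solver below is a hedged sketch, not CoSaTa's language or solver: each pattern slot picks one row from a named table, a variable shared between slots must take the same cell value everywhere, and backtracking search enumerates every consistent assignment. The tables, column names, and pattern encoding are all hypothetical.

```python
def solve(pattern, tables):
    """Enumerate variable bindings that satisfy a pattern (toy backtracking solver).

    pattern: list of (table_name, {column: variable}) slots. A variable that
    appears in several slots must bind to the same cell value everywhere.
    Yields dicts mapping variable -> value.
    """
    def backtrack(i, bindings):
        if i == len(pattern):
            yield dict(bindings)
            return
        table_name, column_vars = pattern[i]
        for row in tables[table_name]:
            new = dict(bindings)
            ok = True
            for column, var in column_vars.items():
                value = row[column]
                if var in new and new[var] != value:
                    ok = False  # conflicts with an earlier binding: prune this row
                    break
                new[var] = value
            if ok:
                yield from backtrack(i + 1, new)

    yield from backtrack(0, {})

tables = {
    "made_of": [
        {"object": "spoon", "material": "metal"},
        {"object": "cup", "material": "glass"},
    ],
    "property": [
        {"material": "metal", "property": "conducts heat"},
        {"material": "glass", "property": "is transparent"},
    ],
}
# Pattern: an object X is made of material M, and M has property P.
pattern = [
    ("made_of", {"object": "X", "material": "M"}),
    ("property", {"material": "M", "property": "P"}),
]
for solution in solve(pattern, tables):
    print(solution)
```

The shared variable M forces the material chosen from the first table to match the row picked from the second, which is the kind of cross-table connection an inference pattern encodes.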

Youling: An AI-assisted Lyrics Creation System
Rongsheng Zhang, Xiaoxi Mao, Le Li, Lin Jiang, Lin Chen, Zhiwei Hu, Yadong Xi, Changjie Fan, Minlie Huang
Recently, a variety of neural models have been proposed for lyrics generation. However, most previous work completes the generation process in a single pass with little human intervention. We believe that lyrics creation is a creative process centered on human intelligence. AI should act as an assistant in the lyrics creation process, where human interaction is crucial for high-quality creation. This paper demonstrates Youling, an AI-assisted lyrics creation system designed to collaborate with music creators. In the lyrics generation process, Youling supports a traditional one-pass full-text generation mode as well as an interactive generation mode, which allows users to select satisfactory sentences from generated candidates conditioned on the preceding context. The system also provides a revision module that enables users to revise undesired sentences or words of lyrics repeatedly. In addition, Youling allows users to use multifaceted attributes to control the content and format of generated lyrics. The demo video of the system is available at https://youtu.be/DFeNpHk0pm4.

A Technical Question Answering System with Transfer Learning
Wenhao Yu, Lingfei Wu, Yu Deng, Ruchi Mahindru, Qingkai Zeng, Sinem Guven, Meng Jiang
In recent years, the need for community technical question-answering sites has increased significantly. However, it is often expensive for human experts to provide timely and helpful responses on those forums. We develop TransTQA, a novel system that offers automatic responses by retrieving proper answers based on correctly answered similar questions in the past. TransTQA is built upon a siamese ALBERT network, which enables it to respond quickly and accuratel

Executive Summary

The proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations present a range of tools and demo systems, including OpenUE, a universal extraction toolkit, and BERTweet, a pre-trained language model for English Tweets. Other entries cover question answering (NeuralQA, ARES), word and entity embeddings (Wikipedia2Vec), Transformer libraries (Transformers, AdapterHub), annotation (HUMAN), and fuzzy string matching (DeezyMatch). Most of these systems are released as open source or as online demos, which makes them directly reusable for research and applications.

Key Points

  • Introduction of OpenUE, a toolkit that casts tasks such as relational triple extraction, slot and intent detection, and event extraction into a single universal extraction format
  • Presentation of BERTweet, a pre-trained language model for English Tweets that outperforms RoBERTa-base and XLM-R-base on part-of-speech tagging, named-entity recognition, and text classification
  • Development of NeuralQA, a library for question answering on large datasets that combines contextual query expansion with BERT
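The NeuralQA entry in the proceedings also describes RelSnip, a method for condensing large documents into smaller passages that a reader model can process quickly. The abstract does not give the algorithm, so the sketch below swaps in a simple, plainly different technique under stated assumptions: split the document into fixed-size word windows, score each window by the number of distinct query terms it contains, and keep the best windows in their original order. The function name, window size, and scoring rule are hypothetical.

```python
import re

def condense(document: str, query: str, window: int = 8, top_k: int = 2) -> str:
    """Keep the top_k word windows that share the most distinct terms with the query.

    Hypothetical stand-in for passage condensation; not NeuralQA's RelSnip code.
    """
    words = document.split()
    query_terms = {t.lower() for t in re.findall(r"\w+", query)}
    windows = [words[i:i + window] for i in range(0, len(words), window)]

    def overlap(chunk):
        chunk_terms = {t.lower() for w in chunk for t in re.findall(r"\w+", w)}
        return len(query_terms & chunk_terms)

    # sorted() is stable even with reverse=True, so earlier windows win ties.
    best = sorted(range(len(windows)), key=lambda i: overlap(windows[i]), reverse=True)[:top_k]
    # Re-emit the chosen windows in document order so the passage stays readable.
    return " ... ".join(" ".join(windows[i]) for i in sorted(best))

doc = ("The reader model is slow on long documents. Short passages are read much faster. "
       "Unrelated text about the weather follows here.")
print(condense(doc, "Why are long documents slow for the reader?", window=7, top_k=1))
```

Shrinking the input this way bounds how much text the reader model must process per document, which is the speed benefit the NeuralQA abstract attributes to RelSnip.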

Merits

Innovative Approaches

The projects presented in the proceedings take varied approaches to natural language processing, for example casting many extraction tasks into one universal format (OpenUE), pretraining on domain-specific text (BERTweet), and ensembling reading comprehension models (ARES).

Open-Source Availability

The release of open-source toolkits and models, such as OpenUE and BERTweet, facilitates future research and applications in the field.

Demerits

Complexity

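Some systems assume a particular infrastructure stack. NeuralQA, for example, is built around ElasticSearch instances and reader models trained with the HuggingFace Transformers API; although the library aims to reduce setup complexity, this dependency may limit usability and integration for teams that rely on different tooling.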

Limited Domain Adaptation

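Several systems target a single language or domain (BERTweet, for example, covers only English Tweets), so applying them to other domains or languages may require additional data and retraining, which could limit their applicability.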

Expert Commentary

The 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations showcase a broad set of practical systems for text analysis and information extraction. Projects such as OpenUE and BERTweet show how open-source releases of code, data, and pretrained models let other researchers reuse new methods quickly. However, infrastructure requirements and limited domain or language coverage in some projects may pose challenges for usability and integration. Overall, the proceedings illustrate the range of applications that NLP tooling can support and point to the need for continued research and development in this area.

Recommendations

  • Future research should focus on developing more adaptable and user-friendly natural language processing technologies, facilitating integration with existing systems and promoting broader adoption.
  • The development of open-source technologies should be encouraged, promoting transparency, accountability, and collaboration in research and development.
