
Journal Update | SSCI Journal Transactions of the Association for Computational Linguistics, Vol. 10 (2022)

语言学心得 · 2023-01-10

 Transactions of the Association for Computational Linguistics

Volume 10, 2022

Transactions of the Association for Computational Linguistics (SSCI Q1; 2021 impact factor: 9.194) published 47 articles in Volume 10 (2022); for reasons of space, 14 of them are excerpted in this post. The research articles cover topics such as neural language models, handling annotator disagreement, and automated fact-checking.

Tip: click "阅读原文" (Read the original) at the end of this post to download the full texts.


Contents


ARTICLES

  • Word Acquisition in Neural Language Models, by Tyler A. Chang, Benjamin K. Bergen, Pages 1-16.

  • Decomposing and Recomposing Event Structure, by William Andrew Horsley Gantt, Lelia Glass, Aaron Steven White, Pages 17-34.

  • FeTaQA: Free-form Table Question Answering, by Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Hailey Schoelkopf, Riley Kong, Xiangru Tang, Mutethia Mutuma, Benjamin Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, Dragomir Radev, Pages 35-49.

  • Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets, by Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Auguste Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Fred Ọ̀nọ̀mẹ̀ Orife, Kelechi Ogueji, Rubungo Andre Niyongabo, Toan Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure Femi Pancrace Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Ballı, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi, Pages 50-72.

  • CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation, by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting, Pages 73-91.

  • Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations, by Aida Mostafazadeh Davani, Mark Díaz, Vinodkumar Prabhakaran, Pages 92-110.

  • Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition, by Mor Geva, Tomer Wolfson, Jonathan Berant, Pages 111-126.

  • Out-of-Domain Discourse Dependency Parsing via Bootstrapping: An Empirical Analysis on Its Effectiveness and Limitation, by Noriki Nishida, Yuji Matsumoto, Pages 127-144.

  • Samanantar: The Largest Publicly Available Parallel Corpora Collection For 11 Indic Languages, by Gowtham Ramesh, Sumanth Doddapaneni, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Mahalakshmi J, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra, Pages 145-162.

  • SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization, by Philippe Laban, Tobias Schnabel, Paul Bennett, Marti Hearst, Pages 163-177.

  • A Survey on Automated Fact-Checking, by Zhijiang Guo, Michael Sejr Schlichtkrull, Andreas Vlachos, Pages 178-206.

  • Predicting Document Coverage for Relation Extraction, by Sneha Singhania, Simon Razniewski, Gerhard Weikum, Pages 207-223.

  • ABNIRML: Analyzing the Behavior of Neural IR Models, by Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan, Pages 224-239.

  • Neuro-symbolic Natural Logic with Introspective Revision for Natural Language Inference, by Yufei Feng, Xiaoyu Yang, Xiaodan Zhu, Michael Greenspan, Pages 240-256.


Abstracts

Word Acquisition in Neural Language Models


Tyler A. Chang, Department of Cognitive Science, University of California San Diego, USA

Benjamin K. Bergen, Department of Cognitive Science, University of California San Diego, USA


Abstract We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words on the MacArthur-Bates Communicative Development Inventory (Fenson et al., 2007). Drawing on studies of word acquisition in children, we evaluate multiple predictors for words' ages of acquisition in LSTMs, BERT, and GPT-2. We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models, reinforcing the importance of interaction and sensorimotor experience in child language acquisition. Language models rely far more on word frequency than children, but like children, they exhibit slower learning of words in longer utterances. Interestingly, models follow consistent patterns during training for both unidirectional and bidirectional models, and for both LSTM and Transformer architectures. Models predict based on unigram token frequencies early in training, before transitioning loosely to bigram probabilities, eventually converging on more nuanced predictions. These results shed light on the role of distributional learning mechanisms in children, while also providing insights for more human-like language acquisition in language models.
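
For readers who want to see the mechanics, here is a minimal sketch of extracting an "age of acquisition" from a word's learning curve (our illustration, not the authors' released code; the sigmoid-midpoint convention and all numbers below are assumptions):

```python
# Fit a sigmoid to a word's surprisal measured at training checkpoints, then
# take the step at which surprisal crosses the midpoint between its initial
# and final levels.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, k, top, bottom):
    # Decreasing sigmoid: surprisal starts near `top`, ends near `bottom`.
    return bottom + (top - bottom) / (1.0 + np.exp(k * (x - x0)))

def age_of_acquisition(steps, surprisals):
    """steps: checkpoint training steps; surprisals: mean surprisal of the
    target word at each checkpoint. Returns the step at the sigmoid midpoint."""
    x = np.log10(steps)
    p0 = [x.mean(), 1.0, surprisals.max(), surprisals.min()]
    (x0, k, top, bottom), _ = curve_fit(sigmoid, x, surprisals, p0=p0, maxfev=10000)
    return 10 ** x0

# Toy example: a word whose surprisal falls from ~14 to ~6 bits over training.
steps = np.array([1e2, 3e2, 1e3, 3e3, 1e4, 3e4, 1e5])
surps = np.array([14.1, 13.8, 12.5, 9.0, 6.8, 6.2, 6.1])
print(f"estimated age of acquisition: step {age_of_acquisition(steps, surps):.0f}")
```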



Decomposing and Recomposing Event Structure


William Andrew Horsley Gantt, University of Rochester, USA.

Lelia Glass, Georgia Institute of Technology, USA.

Aaron Steven White, University of Rochester, USA.

Abstract We present an event structure classification empirically derived from inferential properties annotated on sentence- and document-level Universal Decompositional Semantics (UDS) graphs. We induce this classification jointly with semantic role, entity, and event-event relation classifications using a document-level generative model structured by these graphs. To support this induction, we augment existing annotations found in the UDS1.0 dataset, which covers the entirety of the English Web Treebank, with an array of inferential properties capturing fine-grained aspects of the temporal and aspectual structure of events. The resulting dataset (available at decomp.io) is the largest annotation of event structure and (partial) event coreference to date.


FeTaQA: Free-form Table Question Answering


Linyong Nan, Yale University, USA. 

Chiachun Hsieh, Yale University, USA. 

Ziming Mao, Yale University, USA. 

Abstract Existing table question answering datasets contain abundant factual questions that primarily evaluate a QA system’s comprehension of query and tabular data. However, restricted by their short-form answers, these datasets fail to include question–answer interactions that represent more advanced and naturally occurring information needs: questions that ask for reasoning and integration of information pieces retrieved from a structured knowledge source. To complement the existing datasets and to reveal the challenging nature of the table-based question answering task, we introduce FeTaQA, a new dataset with 10K Wikipedia-based {table, question, free-form answer, supporting table cells} pairs. FeTaQA is collected from noteworthy descriptions of Wikipedia tables that contain information people tend to seek; generation of these descriptions requires advanced processing that humans perform on a daily basis: Understand the question and table, retrieve, integrate, infer, and conduct text planning and surface realization to generate an answer. We provide two benchmark methods for the proposed task: a pipeline method based on semantic parsing-based QA systems and an end-to-end method based on large pretrained text generation models, and show that FeTaQA poses a challenge for both methods.
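
To make the task format concrete, here is a hypothetical FeTaQA-style record (the field names are illustrative, not the dataset's exact schema):

```python
# Each example pairs a table with a question, a free-form answer, and the
# table cells that support the answer.
example = {
    "table": {
        "header": ["Year", "Title", "Role"],
        "rows": [["2010", "Inception", "Arthur"],
                 ["2012", "Looper", "Joe"]],
    },
    "question": "Which roles did the actor play in 2010 and 2012?",
    "answer": "He played Arthur in Inception (2010) and Joe in Looper (2012).",
    "supporting_cells": [(0, 2), (1, 2)],  # (row, column) indices
}
print(example["answer"])
```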


Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets


Julia Kreutzer, Google Research, Canada

Isaac Caswell, Google Research, USA

Lisa Wang, Ahsan Wahab, et al. (the full author list appears in the contents above)

Abstract With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases.
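
As an illustration of the kind of automatic checks that can complement such a human audit, here is a minimal sketch with made-up heuristics (not the paper's actual pipeline):

```python
# Flag corpus segments that are too short, mostly non-letters, or that
# contain markup/navigation junk. Thresholds here are arbitrary examples.
import re

def looks_usable(segment: str) -> bool:
    text = segment.strip()
    if len(text.split()) < 3:                  # too short to be a sentence
        return False
    letters = sum(ch.isalpha() for ch in text)
    if letters / len(text) < 0.6:              # mostly digits/punctuation
        return False
    if re.search(r"https?://|[<>{}|]", text):  # markup or navigation junk
        return False
    return True

corpus = ["2021-03-05 12:33", "Click here >>>", "Das ist ein normaler Satz."]
print([s for s in corpus if looks_usable(s)])  # ['Das ist ein normaler Satz.']
```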



CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation


Jonathan H. Clark, Google Research, USA.

Dan Garrette, Google Research, USA. 

Iulia Turc, John Wieting, Google Research, USA. 


Abstract Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all languages, and the use of any fixed vocabulary may limit a model's ability to adapt. In this paper, we present Canine, a neural encoder that operates directly on character sequences -- without explicit tokenization or vocabulary -- and a pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias. To use its finer-grained input effectively and efficiently, Canine combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context. Canine outperforms a comparable mBERT model by 5.7 F1 on TyDi QA, a challenging multilingual benchmark, despite having fewer model parameters.
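
A minimal sketch of the core architectural idea, character embeddings plus strided downsampling before a transformer (a toy model of our own, not the released Canine, which among other things uses multiple hash embeddings and upsampling):

```python
# Embed Unicode codepoints directly via a hashed embedding table, shorten the
# sequence with a strided convolution, then contextualize with a transformer.
import torch
import torch.nn as nn

class TinyCharEncoder(nn.Module):
    def __init__(self, dim=64, downsample=4, num_hash_buckets=16384):
        super().__init__()
        self.num_hash_buckets = num_hash_buckets
        self.embed = nn.Embedding(num_hash_buckets, dim)   # hashed codepoints
        self.down = nn.Conv1d(dim, dim, kernel_size=downsample, stride=downsample)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, text: str):
        ids = torch.tensor([[ord(c) % self.num_hash_buckets for c in text]])
        x = self.embed(ids)                                # (1, chars, dim)
        x = self.down(x.transpose(1, 2)).transpose(1, 2)   # ~4x shorter
        return self.encoder(x)                             # contextualized positions

enc = TinyCharEncoder()
print(enc("No tokenizer needed.").shape)  # torch.Size([1, 5, 64])
```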


Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations


Aida Mostafazadeh Davani, University of Southern California, USA.

Mark Díaz, University of Southern California, USA.

Vinodkumar Prabhakaran, Google Research, USA.

Abstract Majority voting and averaging are common approaches employed to resolve annotator disagreements and derive single ground truth labels from multiple annotations. However, annotators may systematically disagree with one another, often reflecting their individual biases and values, especially in the case of subjective tasks such as detecting affect, aggression, and hate speech. Annotator disagreements may capture important nuances in such tasks that are often ignored while aggregating annotations to a single ground truth. In order to address this, we investigate the efficacy of multi-annotator models. In particular, our multi-task based approach treats predicting each annotator's judgements as a separate subtask, while sharing a common learned representation of the task. We show that this approach yields the same or better performance than aggregating labels in the data prior to training across seven different binary classification tasks. Our approach also provides a way to estimate uncertainty in predictions, which we demonstrate correlates better with annotation disagreements than traditional methods. Being able to model uncertainty is especially useful in deployment scenarios where knowing when not to make a prediction is important.
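
A minimal sketch of the multi-annotator idea, a shared encoder with one prediction head per annotator (our illustration, not the authors' implementation; input features are assumed to be precomputed sentence embeddings):

```python
# Each head is trained only on its annotator's labels; disagreement among the
# heads at inference time serves as an uncertainty estimate.
import torch
import torch.nn as nn

class MultiAnnotatorModel(nn.Module):
    def __init__(self, input_dim=768, num_annotators=5):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(256, 1) for _ in range(num_annotators))

    def forward(self, x):
        h = self.shared(x)
        probs = torch.sigmoid(torch.cat([head(h) for head in self.heads], dim=-1))
        prediction = probs.mean(dim=-1)   # aggregate label probability
        uncertainty = probs.std(dim=-1)   # head disagreement as uncertainty
        return prediction, uncertainty

model = MultiAnnotatorModel()
pred, unc = model(torch.randn(2, 768))    # e.g., two sentence embeddings
print(pred.shape, unc.shape)              # torch.Size([2]) torch.Size([2])
```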


Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition


Mor Geva, School of Computer Science, Tel Aviv University, Israel

Tomer Wolfson, School of Computer Science, Tel Aviv University, Israel

Jonathan Berant, School of Computer Science, Tel Aviv University, Israel

Abstract Recent efforts to create challenge benchmarks that test the abilities of natural language understanding models have largely depended on human annotations. In this work, we introduce the "Break, Perturb, Build" (BPB) framework for automatic reasoning-oriented perturbation of question-answer pairs. BPB represents a question by decomposing it into the reasoning steps that are required to answer it, symbolically perturbs the decomposition, and then generates new question-answer pairs. We demonstrate the effectiveness of BPB by creating evaluation sets for three reading comprehension (RC) benchmarks, generating thousands of high-quality examples without human intervention. We evaluate a range of RC models on our evaluation sets, which reveals large performance gaps on generated examples compared to the original data. Moreover, symbolic perturbations enable fine-grained analysis of the strengths and limitations of models. Last, augmenting the training data with examples generated by BPB helps close the performance gaps, without any drop on the original data distribution.
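
A minimal sketch of one BPB-style perturbation (the decomposition format and the operation are simplified illustrations of the framework's idea, not its actual implementation):

```python
# Represent a question as reasoning steps, apply a symbolic perturbation to
# one step, then recompose into a new question-answer pair.
decomposition = [
    "return the goals scored by Team A",
    "return the goals scored by Team B",
    "return the highest of #1, #2",
]

def perturb_superlative(steps):
    """Swap 'highest' for 'lowest' in a comparison step (one example of a
    reasoning-oriented perturbation); the gold answer must be recomputed
    by executing the perturbed steps."""
    return [s.replace("highest", "lowest") for s in steps]

perturbed = perturb_superlative(decomposition)
print(perturbed[-1])  # 'return the lowest of #1, #2'
# A generation model would then recompose these steps into a fluent question.
```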


Out-of-Domain Discourse Dependency Parsing via Bootstrapping: An Empirical Analysis on Its Effectiveness and Limitation


Noriki Nishida, RIKEN Center for Advanced Intelligence Project, Japan. 

Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project, Japan.

Abstract Discourse parsing has been studied for decades. However, it still remains challenging to utilize discourse parsing for real-world applications because the parsing accuracy degrades significantly on out-of-domain text. In this paper, we report and discuss the effectiveness and limitations of bootstrapping methods for adapting modern BERT-based discourse dependency parsers to out-of-domain text without relying on additional human supervision. Specifically, we investigate self-training, co-training, tri-training, and asymmetric tri-training of graph-based and transition-based discourse dependency parsing models, as well as confidence measures and sample selection criteria in two adaptation scenarios: monologue adaptation between scientific disciplines and dialogue genre adaptation. We also release COVID-19 Discourse Dependency Treebank (COVID19-DTB), a new manually annotated resource for discourse dependency parsing of biomedical paper abstracts. The experimental results show that bootstrapping is significantly and consistently effective for unsupervised domain adaptation of discourse dependency parsing, but the low coverage of accurately predicted pseudo labels is a bottleneck for further improvement. We show that active learning can mitigate this limitation.
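
A minimal skeleton of the simplest bootstrapping variant studied, self-training (`parser` and `confidence` are hypothetical stand-ins with assumed interfaces, not the authors' BERT-based models):

```python
# Iteratively retrain a parser on gold data plus its own confident
# predictions over out-of-domain text.
from typing import Callable, List, Tuple

def self_train(parser, labeled: List[Tuple[str, object]], unlabeled: List[str],
               confidence: Callable[[object], float],
               threshold: float = 0.9, rounds: int = 3):
    train_set = list(labeled)
    for _ in range(rounds):
        parser.fit(train_set)
        # Parse out-of-domain text; keep only confident pseudo-labels.
        pseudo = [(doc, parser.parse(doc)) for doc in unlabeled]
        train_set = list(labeled) + [
            (doc, tree) for doc, tree in pseudo if confidence(tree) >= threshold
        ]
    return parser
```

The abstract's observed bottleneck, low coverage of accurately predicted pseudo-labels, corresponds to the threshold filter above discarding most of the unlabeled pool.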


Samanantar: The Largest Publicly Available Parallel Corpora Collection For 11 Indic Languages


Gowtham Ramesh, RBCDSAI, India

Sumanth Doddapaneni, RBCDSAI, India

Aravinth Bheemaraj, Tarento Technologies, India

Abstract We present Samanantar, the largest publicly available parallel corpora collection for Indic languages. The collection contains a total of 49.7 million sentence pairs between English and 11 Indic languages (from two language families). Specifically, we compile 12.4 million sentence pairs from existing, publicly-available parallel corpora, and additionally mine 37.4 million sentence pairs from the web, resulting in a 4x increase. We mine the parallel sentences from the web by combining many corpora, tools, and methods: (a) web-crawled monolingual corpora, (b) document OCR for extracting sentences from scanned documents, (c) multilingual representation models for aligning sentences, and (d) approximate nearest neighbor search for searching in a large collection of sentences. Human evaluation of samples from the newly mined corpora validates the high quality of the parallel sentences across 11 languages. Further, we extract 83.4 million sentence pairs between all 55 Indic language pairs from the English-centric parallel corpus using English as the pivot language. We train multilingual NMT models spanning all these languages on Samanantar, which outperform existing models and baselines on publicly available benchmarks, such as FLORES, establishing the utility of Samanantar. Our data and models are available publicly at Samanantar and we hope they will help advance research in NMT and multilingual NLP for Indic languages.
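
A minimal sketch of embedding-based parallel sentence mining, the general recipe behind points (c) and (d) above (the encoder choice and threshold are our assumptions; real pipelines use approximate nearest-neighbor indexes and margin scoring rather than this brute-force loop):

```python
# Encode sentences from both languages with a multilingual encoder and pair
# up mutual nearest neighbors above a similarity threshold.
import numpy as np
from sentence_transformers import SentenceTransformer  # multilingual encoder

model = SentenceTransformer("sentence-transformers/LaBSE")
en = ["The weather is nice today.", "He bought three books."]
hi = ["उसने तीन किताबें खरीदीं।", "आज मौसम अच्छा है।"]

E = model.encode(en, normalize_embeddings=True)
H = model.encode(hi, normalize_embeddings=True)
sims = E @ H.T                      # cosine similarities (vectors are normalized)

for i, row in enumerate(sims):
    j = int(np.argmax(row))
    if row[j] > 0.5:                # illustrative threshold; tune in practice
        print(en[i], "<->", hi[j])
```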


SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization


Philippe Laban, UC Berkeley, USA. 

Tobias Schnabel, Microsoft, USA. 

Paul Bennett, Microsoft, USA.

Marti Hearst, UC Berkeley, USA. 

Abstract In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suffered from a mismatch in input granularity between NLI datasets (sentence-level) and inconsistency detection (document-level). We provide a highly effective and lightweight method called SummaCConv that enables NLI models to be successfully used for this task by segmenting documents into sentence units and aggregating scores between pairs of sentences. On our newly introduced benchmark called SummaC (Summary Consistency), consisting of six large inconsistency detection datasets, SummaCConv obtains state-of-the-art results with a balanced accuracy of 74.4%, a 5-point improvement compared to prior work.
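
A minimal sketch of the zero-shot variant of this idea (not the released SummaCConv model, which learns a convolutional aggregator over the score matrix; the NLI scorer below is a toy stub):

```python
# Score every (document sentence, summary sentence) pair with an NLI model,
# then aggregate the resulting matrix into one consistency score.
import numpy as np

def summac_zero_shot(nli_entail, doc_sents, summ_sents):
    """nli_entail(premise, hypothesis) -> entailment probability (a
    hypothetical callable wrapping any sentence-level NLI model)."""
    scores = np.array([[nli_entail(d, s) for s in summ_sents] for d in doc_sents])
    # Each summary sentence is as consistent as its best-supporting document
    # sentence; the summary score averages over summary sentences.
    return scores.max(axis=0).mean()

# Toy NLI stub for illustration only:
fake_nli = lambda premise, hyp: 1.0 if hyp in premise else 0.1
doc = ["The deal closed in May for $2M.", "Both firms are based in Oslo."]
good = ["The deal closed in May for $2M."]
bad = ["The deal closed in June for $5M."]
print(summac_zero_shot(fake_nli, doc, good))  # 1.0 (consistent)
print(summac_zero_shot(fake_nli, doc, bad))   # 0.1 (inconsistent)
```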


A Survey on Automated Fact-Checking


Zhijiang Guo, Department of Computer Science and Technology, University of Cambridge, UK.

Michael Sejr Schlichtkrull, Department of Computer Science and Technology, University of Cambridge, UK. 

Andreas Vlachos, Department of Computer Science and Technology, University of Cambridge, UK. 


Abstract Fact-checking has become increasingly important due to the speed with which both information and misinformation can spread in the modern media ecosystem. Therefore, researchers have been exploring how fact-checking can be automated, using techniques based on natural language processing, machine learning, knowledge representation, and databases to automatically predict the veracity of claims. In this paper, we survey automated fact-checking stemming from natural language processing, and discuss its connections to related tasks and disciplines. In this process, we present an overview of existing datasets and models, aiming to unify the various definitions given and identify common concepts. Finally, we highlight challenges for future research.
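
Surveys of this area commonly frame automated fact-checking as a pipeline of claim detection, evidence retrieval, and verdict prediction. A minimal skeleton of that pipeline (every function body here is a placeholder stub, not a real system):

```python
# Stub pipeline: detect check-worthy claims, retrieve evidence, predict verdicts.
from typing import List

def detect_claims(text: str) -> List[str]:
    # Stub: in practice a classifier selects check-worthy sentences.
    return [s.strip() for s in text.split(".") if "is" in s]

def retrieve_evidence(claim: str, corpus: List[str]) -> List[str]:
    # Stub: in practice BM25 or dense retrieval over a large corpus.
    return [doc for doc in corpus if any(w in doc for w in claim.split())]

def predict_verdict(claim: str, evidence: List[str]) -> str:
    # Stub: in practice an NLI-style model scores claim-evidence pairs.
    return "SUPPORTED" if evidence else "NOT ENOUGH INFO"

corpus = ["Water boils at 100 degrees Celsius at sea level."]
for claim in detect_claims("Water is liquid. Water boils at 100 degrees."):
    print(claim, "->", predict_verdict(claim, retrieve_evidence(claim, corpus)))
```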



Predicting Document Coverage for Relation Extraction



Sneha Singhania, Max Planck Institute for Informatics, Germany.

Simon Razniewski, Max Planck Institute for Informatics, Germany.

Gerhard Weikum, Max Planck Institute for Informatics, Germany.

Abstract This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features like length, entity mention frequency, Alexa rank, language complexity and information retrieval scores. Each of these features has only moderate predictive power. We employ methods combining features with statistical models like TF-IDF and language models like BERT. The model combining features and BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of coverage predictions on two use cases: KB construction and claim refutation.
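
A minimal sketch of combining hand-crafted document features with text features for coverage classification (illustrative only; HERB's actual combination of features with BERT is not reproduced here):

```python
# Concatenate TF-IDF text features with simple document features (length,
# relational-cue counts) and train a linear classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["Alice was born in 1970 in Oslo and studied physics.",
        "Click here for the best deals on shoes."]
labels = [1, 0]  # 1 = high coverage of relational tuples, 0 = low

tfidf = TfidfVectorizer().fit_transform(docs).toarray()
extra = np.array([[len(d.split()), d.lower().count("born")] for d in docs])
X = np.hstack([tfidf, extra])   # text features + document-level features

clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```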



ABNIRML: Analyzing the Behavior of Neural IR Models


Sean MacAvaney, IR Lab, Georgetown University, Washington, DC, USA.

Sergey Feldman, Allen Institute for AI, Seattle, WA, USA.

Nazli Goharian, IR Lab, Georgetown University, Washington, DC, USA.

Doug Downey, Allen Institute for AI, Seattle, WA, USA.

Arman Cohan, Allen Institute for AI, Seattle, WA, USA.


Abstract Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search. However, it is not yet well understood why these methods are so effective, what makes some variants more effective than others, and what pitfalls they may have. We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML), which includes new types of diagnostic probes that allow us to test several characteristics (such as writing styles, factuality, and sensitivity to paraphrasing and word order) that are not addressed by previous techniques. To demonstrate the value of the framework, we conduct an extensive empirical study that yields insights into the factors that contribute to neural models' gains and identifies potential unintended biases the models exhibit. Some of our results confirm conventional wisdom, for example that recent neural ranking models rely less on exact term overlap with the query and instead leverage richer linguistic information, evidenced by their higher sensitivity to word and sentence order. Other results are more surprising, such as that some models (e.g., T5 and ColBERT) are biased towards factually correct (rather than simply relevant) texts. Further, some characteristics vary even for the same base language model, and other characteristics can appear due to random variations during model training.
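
A minimal sketch of a diagnostic probe in this spirit (our illustration, not the framework's released code): hold the query fixed, vary exactly one characteristic between paired documents, and record how often the ranker prefers one side.

```python
# probe() returns the fraction of pairs where the ranker scores version A
# above version B; 0.5 would indicate insensitivity to the characteristic.
def probe(score, query, doc_pairs):
    """score(query, doc) -> relevance score (any ranker). doc_pairs:
    (doc_a, doc_b) differing in one characteristic, e.g., word order."""
    prefers_a = sum(score(query, a) > score(query, b) for a, b in doc_pairs)
    return prefers_a / len(doc_pairs)

# Toy ranker that rewards exact term overlap, for illustration:
overlap = lambda q, d: len(set(q.split()) & set(d.split()))
pairs = [("cats chase mice quickly", "quickly mice chase cats")]
# A bag-of-words ranker never strictly prefers the ordered version:
print(probe(overlap, "cats chase mice", pairs))  # 0.0
```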


Neuro-symbolic Natural Logic with Introspective Revision for Natural Language Inference


Yufei Feng, Ingenuity Labs Research Institute & ECE, Queen’s University, Canada.

Xiaoyu Yang, Ingenuity Labs Research Institute & ECE, Queen’s University, Canada. 

Xiaodan Zhu, Ingenuity Labs Research Institute & ECE, Queen’s University, Canada.

Michael Greenspan, Ingenuity Labs Research Institute & ECE, Queen’s University, Canada. 


Abstract We introduce a neuro-symbolic natural logic framework based on reinforcement learning with introspective revision. The model samples and rewards specific reasoning paths through policy gradient, in which the introspective revision algorithm modifies intermediate symbolic reasoning steps to discover reward-earning operations as well as leverages external knowledge to alleviate spurious reasoning and training inefficiency. The framework is supported by properly designed local relation models to avoid input entangling, which helps ensure the interpretability of the proof paths. The proposed model has built-in interpretability and shows superior capability in monotonicity inference, systematic generalization, and interpretability, compared to previous models on the existing datasets.
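
For background, a minimal sketch of the natural-logic machinery such models build on (textbook relations and one monotonicity projection step, not the paper's reinforcement-learning framework):

```python
# Lexical substitutions yield entailment relations, which are projected
# through monotonicity contexts to a sentence-level inference label.
FORWARD = "forward_entailment"   # e.g., 'dog' -> 'animal'
REVERSE = "reverse_entailment"   # e.g., 'animal' -> 'dog'

def project(relation: str, context: str) -> str:
    """In an upward-monotone context the relation is preserved; in a
    downward-monotone context (e.g., under 'no') forward and reverse
    entailment swap."""
    if context == "downward":
        return {FORWARD: REVERSE, REVERSE: FORWARD}.get(relation, relation)
    return relation

# 'Some dogs bark' + (dog -> animal) in an upward context: entailment holds.
print(project(FORWARD, "upward"))    # forward_entailment
# 'No dogs bark' + (dog -> animal) under 'no': the direction flips.
print(project(FORWARD, "downward"))  # reverse_entailment
```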


About the Journal

Transactions of the Association for Computational Linguistics (TACL) is an ACL-sponsored journal published by MIT Press that publishes papers in all areas of computational linguistics and natural language processing. TACL has the following features:

1. TACL publishes conference-length papers but has a journal-style reviewing process (for example, an action editor may recommend "revise and resubmit" for a paper).

2. Papers appearing in TACL are eligible for presentation at certain ACL-sponsored conferences. The model thus combines the benefits of a journal with the ability to present the work at a major conference. (Presentation is optional; authors do not have to present their papers at the conference.)

3. TACL accepts submissions all year round (the first day of each month is a submission deadline).

4. TACL is committed to fast-turnaround reviewing.

Official website:

https://transacl.org/index.php/tacl

Source: TACL official website




