NALA Group | publications

2024

It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension

Sagi Shaier, Lawrence Hunter, and Katharina von der Wense

In Findings of the Association for Computational Linguistics: ACL 2024, 2024
TAMS: Translation-Assisted Morphological Segmentation

Enora Rice, Ali Marashian, Luke Gessler, Alexis Palmer, and Katharina von der Wense

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Aligning to Adults Is Easy, Aligning to Children Is Hard: A Study of Linguistic Alignment in Dialogue Systems

Dorothea French, Sidney D’Mello, and Katharina von der Wense

In Proceedings of the 1st Human-Centered Large Language Modeling Workshop, 2024
Eyes on the Game: Deciphering Implicit Human Signals to Infer Human Proficiency, Trust, and Intent

Nikhil Hulle, Stephane Aroca-Ouellette, Anthony Ries, Jake Brawer, Katharina von der Wense, and Alessandro Roncone

In Proceedings of the 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2024
Findings of the AmericasNLP 2024 Shared Task on Machine Translation into Indigenous Languages

Abteen Ebrahimi, Ona Gibert, Raul Vazquez, Rolando Coto-Solano, Pavel Denisov, Robert Pugh, Manuel Mager, Arturo Oncevay, Luis Chiruzzo, Katharina von der Wense, and Shruti Rijhwani

In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024), 2024
Findings of the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages

Luis Chiruzzo, Pavel Denisov, Alejandro Molina-Villegas, Silvia Fernandez-Sabido, Rolando Coto-Solano, Marvin Agüero-Torales, Aldo Alvarez, Samuel Canul-Yah, Lorena Hau-Ucán, Abteen Ebrahimi, Robert Pugh, Arturo Oncevay, Shruti Rijhwani, Katharina von der Wense, and Manuel Mager

In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024), 2024
Evaluating LLMs as Tools to Support Early Vocabulary Learning

Jennifer Weber, Maria Valentini, Téa Wright, Katharina von der Wense, and Eliana Colunga

In Proceedings of the Annual Meeting of the Cognitive Science Society (to appear), 2024
Prompting as Panacea? A Case Study of In-Context Learning Performance for Qualitative Coding of Classroom Dialog

Ananya Ganesh, Chelsea Chandler, Sidney D’Mello, Martha Palmer, and Katharina von der Wense

In Proceedings of the International Conference on Educational Data Mining (to appear), 2024
Zero-Shot vs. Translation-Based Cross-Lingual Transfer: The Case of Lexical Gaps

Abteen Ebrahimi and Katharina von der Wense

In Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (to appear), 2024
Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget

Minh Duc Bui, Fabian David Schmidt, Goran Glavaš, and Katharina von der Wense

In Proceedings of the Workshop on Insights from Negative Results in NLP (to appear), 2024
The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification

Minh Duc Bui and Katharina von der Wense

In Proceedings of the Fourth Workshop on Trustworthy Natural Language Processing (to appear), 2024
NLP for Language Documentation: Two Reasons for the Gap between Theory and Practice

Luke Gessler and Katharina von der Wense

In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP) (to appear), 2024
JGU Mainz’s Submission to the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages

Minh Duc Bui and Katharina von der Wense

In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP) (to appear), 2024
Quantifying the Hyperparameter Sensitivity of Neural Networks for Character-level Sequence-to-Sequence Tasks

Adam Wiemerslage, Kyle Gorman, and Katharina von der Wense

In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
Comparing Template-based and Template-free Language Model Probing

Sagi Shaier, Kevin Bennett, Lawrence Hunter, and Katharina von der Wense

In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
Desiderata For The Context Use Of Question Answering Systems

Sagi Shaier, Lawrence Hunter, and Katharina von der Wense

In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

2023

On the Automatic Generation and Simplification of Children’s Stories

Maria Valentini, Jennifer Weber, Jesus Salcido, Téa Wright, Eliana Colunga, and Katharina von der Wense

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Emerging Challenges in Personalized Medicine: Assessing Demographic Effects on Biomedical Question Answering Systems

Sagi Shaier, Kevin Bennett, Lawrence Hunter, and Katharina von der Wense

In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023
Who Are All The Stochastic Parrots Imitating? They Should Tell Us!

Sagi Shaier, Lawrence Hunter, and Katharina von der Wense

In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023
Findings of the CoCo4MT 2023 Shared Task on Corpus Construction for Machine Translation

Ananya Ganesh, Marine Carpuat, William Chen, Katharina Kann, Constantine Lignos, John E. Ortega, Jonne Saleva, Shabnam Tafreshi, and Rodolfo Zevallos

In Proceedings of the Second Workshop on Corpus Generation and Corpus Augmentation for Machine Translation, 2023
Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction

Manuel Mager, Rajat Bhatnagar, Graham Neubig, Ngoc Thang Vu, and Katharina Kann

In Proceedings of the Third Workshop on NLP for Indigenous Languages of the Americas, 2023
Findings of the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages

Abteen Ebrahimi, Manuel Mager, Shruti Rijhwani, Enora Rice, Arturo Oncevay, Claudia Baltazar, María Cortés, Cynthia Montaño, John E Ortega, Rolando Coto-Solano, Hilaria Cruz, Alexis Palmer, and Katharina Kann

In Proceedings of the Third Workshop on NLP for Indigenous Languages of the Americas, 2023
A Survey of Challenges and Methods in the Computational Modeling of Multi-Party Dialog

Ananya Ganesh, Martha Palmer, and Katharina Kann

In Proceedings of the 5th Workshop on NLP for Conversational AI, 2023
Mind the Gap between the Application Track and the Real World

Ananya Ganesh, Jie Cao, Margaret Perkoff, Rosy Southwell, Martha Palmer, and Katharina Kann

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
Ethical Considerations for Machine Translation of Indigenous Languages: Giving a Voice to the Speakers

Manuel Mager, Elisabeth Albine Mager, Katharina Kann, and Ngoc Thang Vu

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
An Investigation of Noise in Morphological Inflection

Adam Wiemerslage, Changbing Yang, Garrett Nicolai, Miikka Silfverberg, and Katharina Kann

In Findings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
A Comparative Analysis of Automatic Speech Recognition Errors in Small Group Classroom Discourse

Jie Cao, Ananya Ganesh, Jon Cai, Rosy Southwell, Margaret Perkoff, Michael Regan, Katharina Kann, James Martin, Martha Palmer, and Sidney D’Mello

In Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization, 2023
Navigating Wanderland: Highlighting Off-Task Discussions in Classrooms

Ananya Ganesh, Michael Chang, Rachel Dickler, Michael Regan, Jon Cai, Kristin Wright-Bettner, James Pustejovsky, James Martin, Jeff Flanigan, Martha Palmer, and Katharina Kann

In Proceedings of the 24th International Conference on Artificial Intelligence in Education, 2023
Meeting the Needs of Low-Resource Languages: Exploring Automatic Alignments via Pretrained Models

Abteen Ebrahimi, Arya D. McCarthy, Arturo Oncevay, John E. Ortega, Luis Chiruzzo, Rolando Coto-Solano, Gustavo A. Giménez-Lugo, and Katharina Kann

In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

2022

Findings of the Second AmericasNLP Competition on Speech-to-Text Translation

Abteen Ebrahimi, Manuel Mager, Adam Wiemerslage, Pavel Denisov, Arturo Oncevay, Danni Liu, Sai Koneru, Enes Yavuz Ugan, Zhaolin Li, Jan Niehues, Monica Romero, Ivan G Torre, Tanel Alumäe, Jiaming Kong, Sergey Polezhaev, Yury Belousov, Wei-Rui Chen, Peter Sullivan, Ife Adebara, Bashar Talafha, Alcides Alcoba Inciarte, Muhammad Abdul-Mageed, Luis Chiruzzo, Rolando Coto-Solano, Hilaria Cruz, Sofía Flores-Solórzano, Aldo Andrés Alvarez López, Ivan Meza-Ruiz, John E. Ortega, Alexis Palmer, Rodolfo Joel Zevallos Salazar, Kristine Stenzel, Thang Vu, and Katharina Kann

In Proceedings of the NeurIPS 2022 Competitions Track, 2022

Indigenous languages, including those from the Americas, have received very little attention from the machine learning (ML) and natural language processing (NLP) communities. To tackle the resulting lack of systems for these languages and the accompanying social inequalities affecting their speakers, we conduct the second AmericasNLP competition (and the first one in collaboration with NeurIPS), which is centered around speech-to-text translation systems for Indigenous languages of the Americas. The competition features three tasks – (1) automatic speech recognition, (2) text-based machine translation, and (3) speech-to-text translation – and two tracks: constrained and unconstrained. Five Indigenous languages are covered: Bribri, Guarani, Kotiria, Wa’ikhana, and Quechua. In this overview paper, we describe the tasks, tracks, and languages, introduce the baseline and participating systems, and end with a summary of ongoing and future challenges for the automatic translation of Indigenous languages.
AmericasNLI: Machine translation and natural language inference systems for Indigenous languages of the Americas

Katharina Kann, Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, John E. Ortega, Annette Rios, Angela Fan, Ximena Gutierrez-Vasques, Luis Chiruzzo, Gustavo A. Giménez-Lugo, Ricardo Ramos, Ivan Vladimir Meza Ruiz, Elisabeth Mager, Vishrav Chaudhary, Graham Neubig, Alexis Palmer, Rolando Coto-Solano, and Ngoc Thang Vu

Frontiers in Artificial Intelligence, 2022

Little attention has been paid to the development of human language technology for truly low-resource languages—i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for low-resource languages which are unseen during pretraining. Yet, prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks. It remains unclear if zero-shot learning of deeper semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, a natural language inference dataset covering 10 Indigenous languages of the Americas. We conduct experiments with pretrained models, exploring zero-shot learning in combination with model adaptation. Furthermore, as AmericasNLI is a multiway parallel dataset, we use it to benchmark the performance of different machine translation models for those languages. Finally, using a standard transformer model, we explore translation-based approaches for natural language inference. We find that the zero-shot performance of pretrained models without adaptation is poor for all languages in AmericasNLI, but model adaptation via continued pretraining results in improvements. All machine translation models are rather weak, but, surprisingly, translation-based approaches to natural language inference outperform all other models on that task.
A Major Obstacle for NLP Research: Let’s Talk about Time Allocation!

Katharina Kann, Shiran Dudy, and Arya D. McCarthy

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
A Comprehensive Comparison of Neural Networks as Cognitive Models of Inflection

Adam Wiemerslage, Shiran Dudy, and Katharina Kann

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
CHIA: CHoosing Instances to Annotate for Machine Translation

Rajat Bhatnagar, Ananya Ganesh, and Katharina Kann

In Findings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Generate Me a Bedtime Story: Leveraging Natural Language Processing for Early Vocabulary Enhancement

Trevor A. Hall, Maria Valentini, Eliana Colunga, and Katharina Kann

In Proceedings of the Workshop on NLP for Positive Impact, 2022
Machine Translation Between High-resource Languages in a Language Documentation Setting

Katharina Kann, Abteen Ebrahimi, Kristine Stenzel, and Alexis Palmer

In Proceedings of the First Workshop on Applying NLP to Field Linguistics, 2022
Response Construct Tagging: NLP-Aided Assessment for Engineering Education

Ananya Ganesh, Hugh Scribner, Jasdeep Singh, Katherine Goodman, Jean Hertzberg, and Katharina Kann

In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications, 2022
Open-domain Dialogue Generation: What We Can Do, Cannot Do, And Should Do Next

Katharina Kann, Abteen Ebrahimi, Joewie J. Koh, Shiran Dudy, and Alessandro Roncone

In Proceedings of the 4th Workshop on NLP for Conversational AI, 2022

AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages

Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Vishrav Chaudhary, Luis Chiruzzo, Angela Fan, John Ortega, Ricardo Ramos, Annette Rios, Ivan Vladimir Meza Ruiz, Gustavo A. Giménez-Lugo, Elisabeth Mager, Graham Neubig, Alexis Palmer, Rolando Coto-Solano, Thang Vu, and Katharina Kann

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

How Does Multilingual Pretraining Affect Cross-Lingual Transferability?

Yoshinari Fujinuma, Jordan Lee Boyd-Graber, and Katharina Kann

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022
Morphological Processing of Low-Resource Languages: Where We Are and What’s Next

Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya D. McCarthy, Garrett Nicolai, Eliana Colunga, and Katharina Kann

In Findings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

Manuel Mager, Arturo Oncevay, Elisabeth Mager, Katharina Kann, and Thang Vu

In Findings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

2021

The World of an Octopus: How Reporting Bias Influences a Language Model’s Perception of Color

Cory Paik, Stéphane Aroca-Ouellette, Alessandro Roncone, and Katharina Kann

In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Recent work has raised concerns about the inherent limitations of text-only pretraining. In this paper, we first demonstrate that reporting bias, the tendency of people to not state the obvious, is one of the causes of this limitation, and then investigate to what extent multimodal training can mitigate this issue. To accomplish this, we 1) generate the Color Dataset (CoDa), a dataset of human-perceived color distributions for 521 common objects; 2) use CoDa to analyze and compare the color distribution found in text, the distribution captured by language models, and a human’s perception of color; and 3) investigate the performance differences between text-only and multimodal models on CoDa. Our results show that the distribution of colors that a language model recovers correlates more strongly with the inaccurate distribution found in text than with the ground-truth, supporting the claim that reporting bias negatively impacts and inherently limits text-only training. We then demonstrate that multimodal models can leverage their visual training to mitigate these effects, providing a promising avenue for future research.
What Would a Teacher Do? Predicting Future Talk Moves

Ananya Ganesh, Martha Palmer, and Katharina Kann

In Findings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021
PROST: Physical Reasoning of Objects through Space and Time

Stephane Aroca-Ouellette, Cory Paik, Alessandro Roncone, and Katharina Kann

In Findings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021
How to Adapt Your Pretrained Multilingual Model to 1600 Languages

Abteen Ebrahimi and Katharina Kann

In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021
Don’t Rule Out Monolingual Speakers: A Method For Crowdsourcing Machine Translation Data

Rajat Bhatnagar, Ananya Ganesh, and Katharina Kann

In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021
Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages

Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, and Theodorus Fransen

In Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021), 2021

We present the findings of the LoResMT 2021 shared task, which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. The task was organized as part of the fourth workshop on technologies for machine translation of low-resource languages (LoResMT). Parallel corpora are presented and made publicly available for the following directions: English↔Irish, English↔Marathi, and Taiwanese Sign Language↔Traditional Chinese. The training data consist of 8112, 20933, and 128608 segments, respectively. There are additional monolingual data sets for Marathi and English that consist of 21901 segments. The results presented here are based on entries from a total of eight teams. Three teams submitted systems for English↔Irish, while five teams submitted systems for English↔Marathi. Unfortunately, there were no system submissions for the Taiwanese Sign Language↔Traditional Chinese task. The best system performance, measured in BLEU, was 36.0 for English–Irish, 34.6 for Irish–English, 24.2 for English–Marathi, and 31.3 for Marathi–English.
Paradigm Clustering with Weighted Edit Distance

Andrew Gerlach, Adam Wiemerslage, and Katharina Kann

In Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2021

This paper describes our system for the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering, which asks participants to group inflected forms together according to their underlying lemma without the aid of annotated training data. We employ agglomerative clustering to group word forms together using a metric that combines an orthographic distance and a semantic distance from word embeddings. We experiment with two variations of an edit distance-based model for quantifying orthographic distance, but, due to time constraints, our system does not improve over the shared task’s baseline system.
Findings of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering

Adam Wiemerslage, Arya D. McCarthy, Alexander Erdmann, Garrett Nicolai, Manex Agirrezabal, Miikka Silfverberg, Mans Hulden, and Katharina Kann

In Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2021

We describe the second SIGMORPHON shared task on unsupervised morphology: the goal of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering is to cluster word types from a raw text corpus into paradigms. To this end, we release corpora for 5 development and 9 test languages, as well as gold partial paradigms for evaluation. We receive 14 submissions from 4 teams that follow different strategies, and the best performing system is based on adaptor grammars. Results vary significantly across languages. However, all systems are outperformed by a supervised lemmatizer, implying that there is still room for improvement.
Findings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas

Manuel Mager, Arturo Oncevay, Abteen Ebrahimi, John Ortega, Annette Rios, Angela Fan, Ximena Gutierrez-Vasques, Luis Chiruzzo, Gustavo Giménez-Lugo, Ricardo Ramos, Ivan Vladimir Meza Ruiz, Rolando Coto-Solano, Alexis Palmer, Elisabeth Mager-Hois, Vishrav Chaudhary, Graham Neubig, Ngoc Thang Vu, and Katharina Kann

In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, 2021
Coloring the Black Box: What Synesthesia Tells Us about Character Embeddings

Katharina Kann and Mauro M. Monsalve-Mercado

In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2021
CLiMP: A Benchmark for Chinese Language Model Evaluation

Beilei Xiang, Changbing Yang, Yu Li, Alex Warstadt, and Katharina Kann

In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2021

2020

Making a Point: Pointer-Generator Transformers for Disjoint Vocabularies

Nikhil Prabhu and Katharina Kann

In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 9th International Joint Conference on Natural Language Processing Student Research Workshop, 2020

Best Paper Award
English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too

Jason Phang, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Iacer Calixto, Katharina Kann, and Samuel R. Bowman

In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 9th International Joint Conference on Natural Language Processing, 2020
Tackling the Low-resource Challenge for Canonical Segmentation

Manuel Mager, Özlem Çetinoğlu, and Katharina Kann

In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
Acrostic Poem Generation

Rajat Agarwal and Katharina Kann

In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
IGT2P: From Interlinear Glossed Texts to Paradigms

Sarah Moeller, Ling Liu, Changbing Yang, Katharina Kann, and Mans Hulden

In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
Why Overfitting Isn’t Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries

Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, and Jordan Boyd-Graber

In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Cross-lingual word embeddings (CLWE) are often evaluated on bilingual lexicon induction (BLI). Recent CLWE methods use linear projections, which underfit the training dictionary, to generalize on BLI. However, underfitting can hinder generalization to other downstream tasks that rely on words from the training dictionary. We address this limitation by retrofitting CLWE to the training dictionary, which pulls training translation pairs closer in the embedding space and overfits the training dictionary. This simple post-processing step often improves accuracy on two downstream tasks, despite lowering BLI test accuracy. We also retrofit to both the training dictionary and a synthetic dictionary induced from CLWE, which sometimes generalizes even better on downstream tasks. Our results confirm the importance of fully exploiting the training dictionary in downstream tasks and explain why BLI is a flawed CLWE evaluation.
The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion

Katharina Kann, Arya D. McCarthy, Garrett Nicolai, and Mans Hulden

In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2020

In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology. Participants were asked to submit systems which take raw text and a list of lemmas as input, and output all inflected forms, i.e., the entire morphological paradigm, of each lemma. In order to simulate a realistic use case, we first released data for 5 development languages. However, systems were officially evaluated on 9 surprise languages, which were only revealed a few days before the submission deadline. We provided a modular baseline system, which is a pipeline of 4 components. 3 teams submitted a total of 7 systems, but, surprisingly, none of the submitted systems was able to improve over the baseline on average over all 9 test languages. Only on 3 languages did a submitted system obtain the best results. This shows that unsupervised morphological paradigm completion is still largely unsolved. We present an analysis here, so that this shared task will ground further research on the topic.
Frustratingly Easy Multilingual Grapheme-to-Phoneme Conversion

Nikhil Prabhu and Katharina Kann

In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2020

In this paper, we describe two CU-Boulder submissions to the SIGMORPHON 2020 Task 1 on multilingual grapheme-to-phoneme conversion (G2P). Inspired by the high performance of a standard transformer model (Vaswani et al., 2017) on the task, we improve over this approach by adding two modifications: (i) Instead of training exclusively on G2P, we additionally create examples for the opposite direction, phoneme-to-grapheme conversion (P2G). We then perform multi-task training on both tasks. (ii) We produce ensembles of our models via majority voting. Our approaches, though conceptually simple, result in systems that place 6th and 8th amongst 23 submitted systems, and obtain the best results out of all systems on Lithuanian and Modern Greek, respectively.
The NYU-CUBoulder Systems for SIGMORPHON 2020 Task 0 and Task 2

Assaf Singer and Katharina Kann

In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2020

We describe the NYU-CUBoulder systems for the SIGMORPHON 2020 Task 0 on typologically diverse morphological inflection and Task 2 on unsupervised morphological paradigm completion. The former consists of generating morphological inflections from a lemma and a set of morphosyntactic features describing the target form. The latter requires generating entire paradigms for a set of given lemmas from raw text alone. We model morphological inflection as a sequence-to-sequence problem, where the input is the sequence of the lemma’s characters with morphological tags, and the output is the sequence of the inflected form’s characters. First, we apply a transformer model to the task. Second, as inflected forms share most characters with the lemma, we further propose a pointer-generator transformer model to allow easy copying of input characters.
The IMS–CUBoulder System for the SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion

Manuel Mager and Katharina Kann

In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2020

In this paper, we present the systems of the University of Stuttgart IMS and the University of Colorado Boulder (IMS–CUBoulder) for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion (Kann et al., 2020). The task consists of generating the morphological paradigms of a set of lemmas, given only the lemmas themselves and unlabeled text. Our proposed system is a modified version of the baseline introduced together with the task. In particular, we experiment with substituting the inflection generation component with an LSTM sequence-to-sequence model and an LSTM pointer-generator network. Our pointer-generator system obtains the best score of all seven submitted systems on average over all languages, and outperforms the official baseline, which was best overall, on Bulgarian and Kannada.
Self-Training for Unsupervised Parsing with PRPN

Anhad Mohananey, Katharina Kann, and Samuel R. Bowman

In Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies, 2020

Neural unsupervised parsing (UP) models learn to parse without access to syntactic annotations, while being optimized for another task like language modeling. In this work, we propose self-training for neural UP models: we leverage aggregated annotations predicted by copies of our model as supervision for future copies. To be able to use our model’s predictions during training, we extend a recent neural UP architecture, the PRPN (Shen et al., 2018a), such that it can be trained in a semi-supervised fashion. We then add examples with parses predicted by our model to our unlabeled UP training data. Our self-trained model outperforms the PRPN by 8.1% F1 and the previous state of the art by 1.6% F1. In addition, we show that our architecture can also be helpful for semi-supervised parsing in ultra-low-resource settings.
Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?

Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, and Samuel R. Bowman

In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations. We further evaluate all trained models with 25 probing tasks meant to reveal the specific skills that drive transfer. We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best. We also observe that target task performance is strongly correlated with higher-level abilities such as coreference resolution. However, we fail to observe more granular correlations between probing and target task performance, highlighting the need for further work on broad-coverage probing benchmarks. We also observe evidence that the forgetting of knowledge learned during pretraining may limit our analysis, highlighting the need for further work on transfer learning methods in these settings.
Unsupervised Morphological Paradigm Completion

Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya McCarthy, and Katharina Kann

In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

We propose the task of unsupervised morphological paradigm completion. Given only raw text and a lemma list, the task consists of generating the morphological paradigms, i.e., all inflected forms, of the lemmas. From a natural language processing (NLP) perspective, this is a challenging unsupervised task, and high-performing systems have the potential to improve tools for low-resource languages or to assist linguistic annotators. From a cognitive science perspective, this can shed light on how children acquire morphological knowledge. We further introduce a system for the task, which generates morphological paradigms via the following steps: (i) EDIT TREE retrieval, (ii) additional lemma retrieval, (iii) paradigm size discovery, and (iv) inflection generation. We perform an evaluation on 14 typologically diverse languages. Our system outperforms trivial baselines with ease and, for some languages, even obtains a higher accuracy than minimally supervised systems.
Learning to Learn Morphological Inflection for Resource-Poor Languages

Katharina Kann, Samuel R. Bowman, and Kyunghyun Cho

In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

Katharina Kann, Ophélie Lacroix, and Anders Søgaard

In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge

Katharina Kann

In Proceedings of the Society for Computation in Linguistics, 2020