publications
Selected publications of NALA members (since 2020).
2026
- Does Contextual Informativeness Predict Preschoolers’ Word Learning from Stories? In Proceedings of the Annual Meeting of the Cognitive Science Society, 2026
- From If-Statements to ML Pipelines: Revisiting Bias in Code-Generation. In Findings of the Association for Computational Linguistics: ACL 2026, 2026
- Large Language Models Are Overconfident in Their Own Responses. In Findings of the Association for Computational Linguistics: ACL 2026, 2026
- Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect. In Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), 2026
2025
- Mitigating Label Length Bias in Large Language Models. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2025
- NALA_MAINZ at BLP-2025 Task 2: A Multi-agent Approach for Bangla Instruction to Python Code Generation. In Proceedings of the Workshop on Bangla Language Processing (BLP 2025), 2025
- Dialogue Acts as a Lens on Human–LLM Interaction: Analyzing Conversational Norms in Model-Generated Responses. In Proceedings of the Workshop on Bridging Human–Computer Interaction and Natural Language Processing (HCI+NLP 2025), 2025
- JGU Mainz’s submission to the WMT25 shared task on LLMs with limited resources for Slavic languages: MT and QA. In Proceedings of the Tenth Conference on Machine Translation (WMT 2025), 2025
- Molecular String Representation Preferences in Pretrained LLMs: A Comparative Study in Zero- & Few-Shot Molecular Property Prediction. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
- Large Language Models Discriminate Against Speakers of German Dialects. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
- Interdisciplinary Research in Conversation: A Case Study in Computational Morphology for Language Documentation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
- Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
- ReSeeding Latent States for Sequential Language Understanding. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
- Model-Based Ranking of Source Languages for Zero-Shot Cross-Lingual Transfer. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
- Linguistic Alignment Predicts Learning in Small Group Tutoring Sessions. In Findings of the Association for Computational Linguistics: EMNLP 2025, 2025
- Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2025), 2025
- On Generalization across Measurement Systems: LLMs Entail More Test-Time Compute for Underrepresented Cultures. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025
- Improving Low-Resource Morphological Inflection via Self-Supervised Objectives. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025
- Understanding the Gap: an Analysis of Research Collaborations in NLP and Language Documentation. In Findings of the Association for Computational Linguistics: ACL 2025, 2025
- MALAMUTE: A Multilingual, Highly-granular, Template-free, Education-based Probing Dataset. In Findings of the Association for Computational Linguistics: ACL 2025, 2025
- CLIX: Cross-Lingual Explanations of Idiomatic Expressions. In Findings of the Association for Computational Linguistics: ACL 2025, 2025
- The Effectiveness of Uncased Tokenization for Clinical Notes. In Findings of the Association for Computational Linguistics: ACL 2025, 2025
- Untangling the Influence of Typology, Data, and Model Architecture on Ranking Transfer Languages for Cross-Lingual POS Tagging. In Proceedings of the Workshop on Language Models for Underserved Communities (LM4UC 2025), 2025
- Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models. In Proceedings of the Workshop on Insights from Negative Results in NLP, 2025
- Findings of the AmericasNLP 2025 Shared Tasks on Machine Translation, Creation of Educational Material, and Translation Metrics for Indigenous Languages of the Americas. In Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP), 2025
- Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision–Language Models. In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics, 2025
- More Experts Than Galaxies: Conditionally-Overlapping Experts with Biologically-Inspired Fixed Routing. In The Thirteenth International Conference on Learning Representations (ICLR 2025), 2025
- Asking Again and Again: Exploring LLM Robustness to Repeated Questions. 2025
- From Priest to Doctor: Domain Adaptation for Low-Resource Neural Machine Translation. In Proceedings of the 31st International Conference on Computational Linguistics, 2025
- Measuring Contextual Informativeness in Child-Directed Text. In Proceedings of the 31st International Conference on Computational Linguistics, 2025
2024
- Identifying Telescope Usage in Astrophysics Publications: A Machine Learning Framework for Institutional Research Management at Observatories. The Astronomical Journal, 2024
- Getting The Most Out of Your Training Data: Exploring Unsupervised Tasks for Morphological Inflection. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
- It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension. In Findings of the Association for Computational Linguistics: ACL 2024, 2024
- TAMS: Translation-Assisted Morphological Segmentation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
- Aligning to Adults Is Easy, Aligning to Children Is Hard: A Study of Linguistic Alignment in Dialogue Systems. In Proceedings of the 1st Human-Centered Large Language Modeling Workshop, 2024
- Eyes on the Game: Deciphering Implicit Human Signals to Infer Human Proficiency, Trust, and Intent. In Proceedings of the 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN), 2024
- Findings of the AmericasNLP 2024 Shared Task on Machine Translation into Indigenous Languages. In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024), 2024
- Findings of the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages. In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024), 2024
- Evaluating LLMs as Tools to Support Early Vocabulary Learning. In Proceedings of the Annual Meeting of the Cognitive Science Society, 2024
- Prompting as Panacea? A Case Study of In-Context Learning Performance for Qualitative Coding of Classroom Dialog. In Proceedings of the International Conference on Educational Data Mining, 2024
- Zero-Shot vs. Translation-Based Cross-Lingual Transfer: The Case of Lexical Gaps. In Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2024
- Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget. In Proceedings of the Workshop on Insights from Negative Results in NLP, 2024
- The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification. In Proceedings of the Fourth Workshop on Trustworthy Natural Language Processing, 2024
- NLP for Language Documentation: Two Reasons for the Gap between Theory and Practice. In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), 2024
- JGU Mainz’s Submission to the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages. In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), 2024
- Quantifying the Hyperparameter Sensitivity of Neural Networks for Character-level Sequence-to-Sequence Tasks. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
- Comparing Template-based and Template-free Language Model Probing. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
- Desiderata For The Context Use Of Question Answering Systems. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
2023
- On the Automatic Generation and Simplification of Children’s Stories. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
- Emerging Challenges in Personalized Medicine: Assessing Demographic Effects on Biomedical Question Answering Systems. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023
- Who Are All The Stochastic Parrots Imitating? They Should Tell Us! In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023
- Findings of the CoCo4MT 2023 Shared Task on Corpus Construction for Machine Translation. In Proceedings of the Second Workshop on Corpus Generation and Corpus Augmentation for Machine Translation, 2023
- Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction. In Proceedings of the Third Workshop on NLP for Indigenous Languages of the Americas, 2023
- Findings of the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages. In Proceedings of the Third Workshop on NLP for Indigenous Languages of the Americas, 2023
- A Survey of Challenges and Methods in the Computational Modeling of Multi-Party Dialog. In Proceedings of the 5th Workshop on NLP for Conversational AI, 2023
- Mind the Gap between the Application Track and the Real World. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
- Ethical Considerations for Machine Translation of Indigenous Languages: Giving a Voice to the Speakers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
- An Investigation of Noise in Morphological Inflection. In Findings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
- A Comparative Analysis of Automatic Speech Recognition Errors in Small Group Classroom Discourse. In Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization, 2023
- Navigating Wanderland: Highlighting Off-Task Discussions in Classrooms. In Proceedings of the 24th International Conference on Artificial Intelligence in Education, 2023
- Meeting the Needs of Low-Resource Languages: Exploring Automatic Alignments via Pretrained Models. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023
2022
- AmericasNLI: Machine translation and natural language inference systems for Indigenous languages of the Americas. Frontiers in Artificial Intelligence, 2022
- A Major Obstacle for NLP Research: Let’s Talk about Time Allocation! In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
- A Comprehensive Comparison of Neural Networks as Cognitive Models of Inflection. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
- CHIA: CHoosing Instances to Annotate for Machine Translation. In Findings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
- Generate Me a Bedtime Story: Leveraging Natural Language Processing for Early Vocabulary Enhancement. In Proceedings of the Workshop on NLP for Positive Impact, 2022
- Machine Translation Between High-resource Languages in a Language Documentation Setting. In Proceedings of the First Workshop on Applying NLP to Field Linguistics, 2022
- Response Construct Tagging: NLP-Aided Assessment for Engineering Education. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications, 2022
- Open-domain Dialogue Generation: What We Can Do, Cannot Do, And Should Do Next. In Proceedings of the 4th Workshop on NLP for Conversational AI, 2022
- AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022
- How Does Multilingual Pretraining Affect Cross-Lingual Transferability? In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022
- BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages. In Findings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022
2021
- The World of an Octopus: How Reporting Bias Influences a Language Model’s Perception of Color. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
- What Would a Teacher Do? Predicting Future Talk Moves. In Findings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021
- PROST: Physical Reasoning of Objects through Space and Time. In Findings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021
- How to Adapt Your Pretrained Multilingual Model to 1600 Languages. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021
- Don’t Rule Out Monolingual Speakers: A Method For Crowdsourcing Machine Translation Data. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021
- Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages. In Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021), 2021
- Paradigm Clustering with Weighted Edit Distance. In Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2021
- Findings of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering. In Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2021
- Findings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, 2021
- Coloring the Black Box: What Synesthesia Tells Us about Character Embeddings. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2021
- CLiMP: A Benchmark for Chinese Language Model Evaluation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2021
2020
- Making a Point: Pointer-Generator Transformers for Disjoint Vocabularies. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 9th International Joint Conference on Natural Language Processing Student Research Workshop, 2020. Best Paper Award
- English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 9th International Joint Conference on Natural Language Processing, 2020
- Tackling the Low-resource Challenge for Canonical Segmentation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
- Acrostic Poem Generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
- IGT2P: From Interlinear Glossed Texts to Paradigms. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
- Why Overfitting Isn’t Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
- The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2020
- Frustratingly Easy Multilingual Grapheme-to-Phoneme Conversion. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2020
- The NYU-CUBoulder Systems for SIGMORPHON 2020 Task 0 and Task 2. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2020
- The IMS–CUBoulder System for the SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2020
- Self-Training for Unsupervised Parsing with PRPN. In Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies, 2020
- Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
- Unsupervised Morphological Paradigm Completion. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
- Learning to Learn Morphological Inflection for Resource-Poor Languages. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
- Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
- Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge. In Proceedings of the Society for Computation in Linguistics, 2020