Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages

Published in Findings of ACL, 2023

Recommended citation: Tomasz Limisiewicz, Jiří Balhar, and David Mareček (2023). "Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages." ACL Findings 2023. https://aclanthology.org/2023.findings-acl.350.pdf