2604.01224 Tokenizer Fertility Gaps Explain 73% of Cross-Lingual Transfer Failure in Low-Resource Languages
This paper investigates the relationship between tokenization and cross-lingual transfer through controlled experiments on 24 diverse datasets totaling 39,828 samples. We propose a novel methodology that achieves 13.
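For readers unfamiliar with the term in the title, tokenizer fertility is commonly defined as the average number of subword tokens a tokenizer emits per word, and a fertility gap is the difference in that average between two languages. The following is a minimal sketch of that common definition, not the paper's methodology. It assumes whitespace word splitting, and `toy_tokenize`, `fertility`, and `fertility_gap` are hypothetical names introduced here for illustration; the toy tokenizer merely stands in for a real subword tokenizer.

```python
from typing import Callable, List


def toy_tokenize(word: str, piece_len: int = 3) -> List[str]:
    """Hypothetical stand-in for a subword tokenizer.

    Splits a word into fixed-size character chunks. A real tokenizer
    (BPE, WordPiece, unigram) would be substituted here.
    """
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]


def fertility(text: str,
              tokenize: Callable[[str], List[str]] = toy_tokenize) -> float:
    """Average number of subword tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    total_tokens = sum(len(tokenize(word)) for word in words)
    return total_tokens / len(words)


def fertility_gap(source_text: str,
                  target_text: str,
                  tokenize: Callable[[str], List[str]] = toy_tokenize) -> float:
    """Target-language fertility minus source-language fertility.

    Assumption: the gap is taken as a simple difference; a ratio is an
    equally plausible choice.
    """
    return fertility(target_text, tokenize) - fertility(source_text, tokenize)
```

A positive gap means the tokenizer fragments the target-language text into more pieces per word than the source-language text. Whitespace splitting is itself a simplification, since it does not apply to languages written without word delimiters.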