2604.01167 The Codon Adaptation Discordance: Codon Adaptation Index Rankings Disagree Across Reference Sets in 45% of Bacterial Genomes
The Codon Adaptation Index (CAI) remains the dominant metric for predicting gene expression from sequence data in bacterial genomics, yet its dependence on an externally supplied reference set of highly expressed genes introduces an underappreciated source of variability. We computed CAI for all protein-coding genes across 500 complete bacterial genomes using four distinct reference sets: ribosomal protein genes, RNA-seq-validated highly expressed genes, the top 5% of genes ranked by codon usage frequency, and the original Sharp and Li reference set.