PopulationStructureEngine: PCA Genomics, ADMIXTURE Ancestry Estimation, FST Calculation, and Genetic Drift Simulation
Population structure analysis reveals the genetic relationships between human populations, enabling ancestry inference, stratification correction, and reconstruction of demographic history. We present PopulationStructureEngine, a pure-Python pipeline for population genetics analysis. The engine implements PCA of genotype matrices, ADMIXTURE-style ancestry estimation (K = 3 to 5), FST calculation (Weir-Cockerham), genetic drift simulation (Wright-Fisher model), and population differentiation statistics. Applied to a genotype matrix of 1,000 individuals × 10,000 SNPs across 5 populations (EUR/AFR/EAS/SAS/AMR), the pipeline reports that PC1 and PC2 explain 0.89% and 0.88% of the variance, respectively, with mean FST = 0.0335 and mean heterozygosity = 0.256.
Introduction
Human populations show genetic structure due to historical migration, isolation, and drift. PCA separates populations along principal components. ADMIXTURE estimates individual ancestry proportions. FST measures genetic differentiation.
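To make the role of drift concrete, the following is a minimal sketch of a Wright-Fisher simulation for a single biallelic locus. It is an illustration under stated assumptions, not the engine's code: the function name `wright_fisher`, its parameters, and the use of NumPy are all hypothetical.

```python
import numpy as np

def wright_fisher(p0, pop_size, generations, seed=0):
    """Simulate allele-frequency drift at one biallelic locus.

    Hypothetical helper: a diploid population of constant size `pop_size`
    carries 2 * pop_size allele copies, and each generation resamples
    them binomially from the previous generation's frequency.
    """
    rng = np.random.default_rng(seed)
    n_alleles = 2 * pop_size
    p = float(p0)
    trajectory = [p]
    for _ in range(generations):
        # Binomial sampling of allele copies is the only source of change:
        # no selection, mutation, or migration in this sketch.
        p = rng.binomial(n_alleles, p) / n_alleles
        trajectory.append(p)
    return np.array(trajectory)

# Example: 200 generations of drift in a population of 50 diploids.
traj = wright_fisher(p0=0.5, pop_size=50, generations=200)
```

Smaller values of `pop_size` make the trajectory wander faster and reach fixation (frequency 0 or 1) sooner, which is how isolated populations come to differ in allele frequency.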
Methods
PCA
The genotype matrix is centered and scaled per SNP. Singular value decomposition (SVD) of the standardized matrix yields the principal components.
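The PCA step can be sketched as follows. This is a minimal sketch, not the engine's implementation. It assumes genotypes are coded 0/1/2 as an individuals × SNPs array and that scaling uses the binomial standard deviation sqrt(2p(1-p)), one common choice that the text does not specify. The function name `genotype_pca` and the use of NumPy are hypothetical.

```python
import numpy as np

def genotype_pca(G, n_components=2):
    """PCA of an (individuals x SNPs) genotype matrix coded 0/1/2.

    Returns the PC coordinates of each individual and the fraction of
    variance explained by each retained component.
    """
    G = np.asarray(G, dtype=float)
    # Estimated allele frequency per SNP (mean genotype / 2).
    p = G.mean(axis=0) / 2.0
    # Drop monomorphic SNPs, whose scaling factor would be zero.
    keep = (p > 0) & (p < 1)
    G, p = G[:, keep], p[keep]
    # Center by the mean genotype and scale by the assumed binomial SD.
    Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Thin SVD: columns of U * S are the individuals' PC coordinates.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return U[:, :n_components] * S[:n_components], explained[:n_components]

# Example on random, unstructured genotypes.
rng = np.random.default_rng(0)
coords, explained = genotype_pca(rng.binomial(2, 0.3, size=(20, 50)))
```

The variance-explained values are the squared singular values normalized by their sum, so they are fractions of the total variance of the standardized matrix.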
ADMIXTURE
An EM algorithm estimates the ancestry proportions Q (n × K, one row per individual) and the population allele frequencies P (K × SNPs).
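One way to write such an EM iteration is sketched below. It assumes the standard admixture likelihood, in which each genotype g_ij is Binomial(2, sum_k Q_ik P_kj), with 0/1/2 coding. The function name `admixture_em`, its parameters, the random initialization, and the use of NumPy are hypothetical. The engine's actual update rules are not given in the text.

```python
import numpy as np

def admixture_em(G, K, n_iter=100, seed=0, eps=1e-6):
    """EM updates for ancestry proportions Q (n x K) and frequencies P (K x m)."""
    G = np.asarray(G, dtype=float)
    n, m = G.shape
    rng = np.random.default_rng(seed)
    Q = rng.dirichlet(np.ones(K), size=n)    # each row sums to 1
    P = rng.uniform(0.1, 0.9, size=(K, m))
    for _ in range(n_iter):
        # Per-individual probability of the counted allele at each SNP.
        F = np.clip(Q @ P, eps, 1.0 - eps)
        A = G / F                   # weight for copies of the counted allele
        B = (2.0 - G) / (1.0 - F)   # weight for copies of the other allele
        # Expected allele copies assigned to each population at each SNP.
        alt = P * (Q.T @ A)
        ref = (1.0 - P) * (Q.T @ B)
        # Expected fraction of each individual's 2m allele copies per population.
        Q_new = Q * (A @ P.T + B @ (1.0 - P).T) / (2.0 * m)
        P = np.clip(alt / (alt + ref), eps, 1.0 - eps)
        Q = Q_new / Q_new.sum(axis=1, keepdims=True)
    return Q, P

# Example: fit K = 3 to random genotypes (no real structure expected).
rng = np.random.default_rng(1)
Q, P = admixture_em(rng.binomial(2, 0.4, size=(12, 30)), K=3, n_iter=20)
```

Both updates use the Q and P from the previous iteration. The final renormalization of Q only guards against drift introduced by clipping. As with any EM fit, the result depends on initialization, and the K labels are arbitrary.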
FST
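FST is computed as (π_between - π_within) / π_between, where π_within is the mean pairwise allelic difference within populations and π_between is the same quantity between populations. Note that this is the pairwise-difference form of the statistic; the Weir-Cockerham estimator is conventionally written in terms of variance components.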
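For two populations and biallelic SNPs, the ratio above can be evaluated directly from allele frequencies, as in the following minimal sketch. It assumes known population frequencies and applies no sample-size correction, so it is not a full Weir-Cockerham estimator. The function name `fst_from_frequencies` and the ratio-of-averages pooling across SNPs are assumptions.

```python
import numpy as np

def fst_from_frequencies(p1, p2):
    """FST = (pi_between - pi_within) / pi_between for two populations.

    p1, p2: allele frequencies of the same SNPs in each population.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Probability that two alleles drawn from the same population differ,
    # averaged over the two populations: (2*p1*(1-p1) + 2*p2*(1-p2)) / 2.
    pi_within = np.mean(p1 * (1 - p1) + p2 * (1 - p2))
    # Probability that one allele from each population differ.
    pi_between = np.mean(p1 * (1 - p2) + p2 * (1 - p1))
    if pi_between == 0:
        return 0.0  # no variation at all, so no measurable differentiation
    return (pi_between - pi_within) / pi_between

# Example: a single SNP at frequency 0.2 in one population and 0.8 in the other.
fst = fst_from_frequencies([0.2], [0.8])  # 0.36 / 0.68
```

Per SNP, the numerator reduces to (p1 - p2)^2, so the statistic is zero when the two populations share the same frequencies and one when they are fixed for different alleles.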
Results
PC1 and PC2 explain 0.89% and 0.88% of the variance, respectively. Mean FST is 0.0335 and mean heterozygosity is 0.256.
Code Availability