The structure of common genetic variation in U.S. populations
Stephen L. Guthery et al.
ABSTRACT: The common variant/common disease model predicts that most risk alleles underlying complex health-related traits are common, and therefore old and found in multiple populations, rather than rare or population-specific. Accordingly, there is widespread interest in assessing the population structure of common alleles. However, such assessments have been confounded by the analysis of datasets that are biased toward ascertainment of common alleles (e.g., HapMap, Perlegen) or in which a relatively small number of genes and/or populations were sampled. The aim of this study was to examine the structure of common variation ascertained in major U.S. populations, using data generated by the Genaissance Resequencing Project from resequencing the exons and flanking regions of 3,873 genes in 154 chromosomes from European, Latino/Hispanic, Asian, and African Americans. The frequency distributions of private and common single nucleotide polymorphisms (SNPs) were measured, and the extent to which common SNPs were shared across populations was analyzed using several different estimators of population structure. Most SNPs that were common in one population were present in multiple populations, but SNPs common in one population were frequently not common in other populations. Moreover, SNPs that were common in two or more populations often differed significantly in frequency between those populations, particularly in comparisons of African Americans versus other U.S. populations. These findings indicate that even if the bulk of alleles underlying complex health-related traits are common SNPs, geographic ancestry might well be an important predictor of whether a person carries a risk allele.
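As a rough illustration of the per-SNP summaries the abstract describes, the sketch below classifies a single SNP as private or shared, lists the populations in which it is common, and computes one textbook estimator of population structure (Wright's F_ST with equal population weights). The abstract does not say which estimators or frequency cutoffs the study used, so the 5% "common" threshold, the function names, the population labels, and the example frequencies are all assumptions made for illustration only.

```python
from typing import Dict

# Assumed cutoff for calling an allele "common"; the abstract does not state one.
COMMON_MAF = 0.05


def classify_snp(freqs: Dict[str, float], common_maf: float = COMMON_MAF) -> dict:
    """Summarize how one SNP's minor allele is distributed across populations.

    freqs maps a population label to the minor-allele frequency in that sample.
    """
    present = [pop for pop, f in freqs.items() if f > 0.0]
    common = [pop for pop, f in freqs.items() if f >= common_maf]
    return {
        "private": len(present) == 1,  # observed in exactly one population
        "present_in": present,
        "common_in": common,
    }


def fst(freqs: Dict[str, float]) -> float:
    """Wright's F_ST for a biallelic SNP, weighting populations equally.

    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity at the
    mean allele frequency and H_S is the mean within-population heterozygosity.
    This is one common estimator, not necessarily one used in the study.
    """
    ps = list(freqs.values())
    p_bar = sum(ps) / len(ps)
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic everywhere: no structure to measure
    h_s = sum(2.0 * p * (1.0 - p) for p in ps) / len(ps)
    return (h_t - h_s) / h_t


# Hypothetical minor-allele frequencies for one SNP (not data from the study).
example = {"EA": 0.30, "AA": 0.02, "AS": 0.00, "LA": 0.10}
summary = classify_snp(example)
structure = fst(example)
```

In this made-up example the allele is present in three populations but common in only two of them, which mirrors the abstract's point that presence in multiple populations does not imply being common in all of them.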