The papers are essentially large-scale studies that demonstrate the need for stratification correction among European Americans, for example in a whole genome association (WGA) analysis of rheumatoid arthritis. They've come up with sets of AIMs (ancestry informative markers) that are useful for distinguishing between various European groups (Ashkenazi Jewish, Irish, and north versus south). There's lots of good stuff in these papers (beyond the scope of my post here). I wish they had talked a bit more about which diseases differ in prevalence between European groups: Crohn's disease, rheumatoid arthritis, inflammatory bowel syndrome, celiac disease, coronary heart disease, Alzheimer's, multiple sclerosis, cystic fibrosis... There are lots of good figures, but these are the only ones I could manage to put here.
If you're interested in this, there are other recent papers (here, here) that have looked at European substructure. Blogwise, I would also suggest Dienekes' blog, since he's really on top of this topic.
Application of Ancestry Informative Markers to Association Studies in European Americans Michael F. Seldin, Alkes L. Price
from this paper:
For example, in a recent WGA study of rheumatoid arthritis in European Americans, markers in the LCT and IRF4 genes would have been falsely implicated as associated to disease without the application of methods to control for stratification. Similar empirical examples of population stratification exist for other phenotypes, and genetic risk has been reported to vary across Europe for a wide range of diseases [4–8]. In general, investigators should be alerted to consider population stratification when WGA data indicates that a particular marker shows a strong frequency gradient across Europe.
Discerning the Ancestry of European Americans in Genetic Association Studies Alkes L. Price
An important question is which ancestries should be evaluated in replication studies by genotyping of AIMs at additional cost. The answer to this question will vary from study to study, depending on factors such as the collection location of cases and controls, the phenotype being studied, and considerations of cost. For example, a study of a phenotype with known ancestry differences, in which cases are collected from a large city and controls are collected from throughout the country, would be well-advised to define ancestry to the fullest extent possible. On the other hand, a study of a phenotype with no known ancestry differences, involving cases and controls rigorously matched by location, might choose to bypass the use of AIMs entirely. An intermediate option would be to model only north–south ancestry, addressing the single most likely source of stratification at partial cost, with some residual risk of stratification.
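To make the stratification problem concrete, here is a minimal sketch with made-up allele counts (not data from either paper). It assumes a marker whose frequency differs between two hypothetical ancestry groups labeled "north" and "south", with cases drawn mostly from one group and controls mostly from the other. The marker has no effect on disease within either group, yet the pooled odds ratio looks like a strong association. The correction shown is a textbook Mantel-Haenszel stratified odds ratio, chosen for simplicity; it is not the method the papers themselves use.

```python
# Hypothetical allele counts per ancestry stratum, as
# (case_alt, case_ref, control_alt, control_ref).
# Within each stratum, cases and controls have the same allele frequency
# (0.8 in "north", 0.2 in "south"), so the marker has no real effect.
strata = {
    "north": (640, 160, 160, 40),   # most cases come from this group
    "south": (40, 160, 160, 640),   # most controls come from this group
}


def pooled_odds_ratio(strata):
    """Allelic odds ratio after ignoring ancestry and summing all counts."""
    a = sum(s[0] for s in strata.values())  # case alt alleles
    b = sum(s[1] for s in strata.values())  # case ref alleles
    c = sum(s[2] for s in strata.values())  # control alt alleles
    d = sum(s[3] for s in strata.values())  # control ref alleles
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel odds ratio, combining per-stratum 2x2 tables."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


pooled_or = pooled_odds_ratio(strata)
mh_or = mantel_haenszel_odds_ratio(strata)
print("pooled odds ratio:    ", pooled_or)
print("stratified odds ratio:", mh_or)
```

The pooled figure is inflated purely because ancestry is correlated with both the marker and case status. Once the comparison is made within ancestry groups, the apparent effect disappears.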
Abstract: European Americans are often treated as a homogeneous group, but in fact form a structured population due to historical immigration of diverse source populations. Discerning the ancestry of European Americans genotyped in association studies is important in order to prevent false-positive or false-negative associations due to population stratification and to identify genetic variants whose contribution to disease risk differs across European ancestries. Here, we investigate empirical patterns of population structure in European Americans, analyzing 4,198 samples from four genome-wide association studies to show that components roughly corresponding to northwest European, southeast European, and Ashkenazi Jewish ancestry are the main sources of European American population structure. Building on this insight, we constructed a panel of 300 validated markers that are highly informative for distinguishing these ancestries. We demonstrate that this panel of markers can be used to correct for stratification in association studies that do not generate dense genotype data.
Analysis and Application of European Genetic Substructure Using 300 K SNP Information Chao Tian et al.
Abstract: European population genetic substructure was examined in a diverse set of >1,000 individuals of European descent, each genotyped with >300 K SNPs. Both STRUCTURE and principal component analyses (PCA) showed the largest division/principal component (PC) differentiated northern from southern European ancestry. A second PC further separated Italian, Spanish, and Greek individuals from those of Ashkenazi Jewish ancestry as well as distinguishing among northern European populations. In separate analyses of northern European participants other substructure relationships were discerned showing a west to east gradient. Application of this substructure information was critical in examining a real dataset in whole genome association (WGA) analyses for rheumatoid arthritis in European Americans to reduce false positive signals. In addition, two sets of European substructure ancestry informative markers (ESAIMs) were identified that provide substantial substructure information. The results provide further insight into European population genetic substructure and show that this information can be used for improving error rates in association testing of candidate genes and in replication studies of WGA scans.
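For readers who want to see what "the largest principal component differentiates two ancestries" looks like in practice, here is a rough sketch on simulated genotypes. Everything in it is an assumption for illustration: the two groups, the sample sizes, the number of markers, and the allele frequency shift are invented, and the PCA is a plain SVD of the centered genotype matrix using NumPy rather than the software used in the papers (it also skips the per-marker variance scaling that dedicated tools typically apply).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n_per_group = 50   # hypothetical sample size per ancestry group
n_markers = 500    # hypothetical number of SNPs

# Simulated allele frequencies: a shared baseline shifted up in one group
# and down in the other (an assumption, not values from the papers).
base = rng.uniform(0.2, 0.8, size=n_markers)
freq_north = base + 0.1
freq_south = base - 0.1

# Genotypes coded as 0, 1, or 2 copies of the alternate allele.
geno_north = rng.binomial(2, freq_north, size=(n_per_group, n_markers))
geno_south = rng.binomial(2, freq_south, size=(n_per_group, n_markers))
genotypes = np.vstack([geno_north, geno_south]).astype(float)

# PCA via SVD of the column-centered genotype matrix.
centered = genotypes - genotypes.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pc1 = u[:, 0] * s[0]                 # each individual's score on PC1
explained = s**2 / np.sum(s**2)      # share of variance per component

north_scores = pc1[:n_per_group]
south_scores = pc1[n_per_group:]
print("PC1 share of variance:", round(float(explained[0]), 3))
print("north PC1 range:", north_scores.min(), north_scores.max())
print("south PC1 range:", south_scores.min(), south_scores.max())
```

In this toy setup the two groups land on opposite sides of PC1 (the sign of a principal component is arbitrary, so which group is positive does not matter). In an association test, a score like `pc1` is the kind of quantity that could be included as a covariate to adjust for ancestry.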