Tuesday, October 28, 2008

Predicting unobserved phenotypes from genotypic data

This looks really interesting. The goal is to use whole-genome SNP data to predict three phenotypes in mice (coat color, % of CD8+ cells, cellular hemoglobin), using "reversible jump MCMC" (RJMCMC). According to this website, this is how RJMCMC works:
"– RJMCMC randomly “walks around” the space of possible model structures by changing one edge at a time – called structural learning.
– At each step in its “walk”, all of the model parameters are updated – called parametrical learning.
– At the end, you have a list of all of the model structures it visited at each step and their corresponding set of parameters."
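To make that description a bit more concrete, here is a toy sketch of the two kinds of moves for a SNP regression. This is not the authors' implementation: the function names (`simulate`, `rjmcmc`), the fixed noise variance `sigma2`, the effect prior variance `tau2`, the prior inclusion probability `pi`, and the random-walk `step` are all assumptions I picked for illustration. New effects are proposed from their prior, so the acceptance ratio for a structural move reduces to the likelihood ratio times the prior odds of the model.

```python
import math
import random

def simulate(n=80, p=5, effect=1.5, seed=1):
    """Made-up data: p centered SNP genotypes, only SNP 0 affects the phenotype."""
    rng = random.Random(seed)
    X = [[rng.choice([0, 1, 2]) for _ in range(p)] for _ in range(n)]
    means = [sum(row[j] for row in X) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in X]
    y = [effect * row[0] + rng.gauss(0.0, 1.0) for row in X]
    return y, X

def log_lik(y, X, beta, sigma2):
    """Gaussian log-likelihood of y given SNP effects beta, up to a constant."""
    sse = 0.0
    for yi, xi in zip(y, X):
        pred = sum(b * x for b, x in zip(beta, xi))
        sse += (yi - pred) ** 2
    return -0.5 * sse / sigma2

def rjmcmc(y, X, n_iter=1500, sigma2=1.0, tau2=1.0, pi=0.2, step=0.1, seed=0):
    """Return, for each SNP, the fraction of iterations it spent in the model."""
    rng = random.Random(seed)
    p = len(X[0])
    beta = [0.0] * p              # an excluded SNP has effect exactly 0
    included = [False] * p
    cur = log_lik(y, X, beta, sigma2)
    visits = [0] * p
    for _ in range(n_iter):
        # Structural move: add or drop one randomly chosen SNP.
        j = rng.randrange(p)
        prop = beta[:]
        if included[j]:
            prop[j] = 0.0
            log_odds = math.log((1 - pi) / pi)
        else:
            prop[j] = rng.gauss(0.0, math.sqrt(tau2))   # draw from the prior
            log_odds = math.log(pi / (1 - pi))
        new = log_lik(y, X, prop, sigma2)
        if math.log(1.0 - rng.random()) < new - cur + log_odds:
            beta, cur = prop, new
            included[j] = not included[j]
        # Parameter move: random-walk Metropolis on every included effect.
        for k in range(p):
            if not included[k]:
                continue
            prop = beta[:]
            prop[k] += rng.gauss(0.0, step)
            new = log_lik(y, X, prop, sigma2)
            log_prior = (beta[k] ** 2 - prop[k] ** 2) / (2 * tau2)
            if math.log(1.0 - rng.random()) < new - cur + log_prior:
                beta, cur = prop, new
        for k in range(p):
            visits[k] += included[k]
    return [v / n_iter for v in visits]

y, X = simulate()
pip = rjmcmc(y, X)
print([round(v, 2) for v in pip])
```

Instead of keeping the full list of visited models, this sketch only counts how often each SNP was in the model, which is a rough stand-in for a posterior probability of association. A real analysis would also sample the noise variance and discard a burn-in period.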
They cite this recent paper that I posted about a few months ago that proposes a different way of looking simultaneously at the association between a collection of SNPs and some trait.
So, in this paper, they seem to be able to make decent predictions about these traits using about 12,000 SNPs in each of 2,300 mice. The correlations between observed and expected phenotypes, using only genotype data, are in the range of 0.33 to 0.85.
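For anyone wondering what a number like that measures, here is a minimal sketch of the evaluation idea, going by the abstract's description of using half of the families to estimate effects and the other half for prediction. The helper names (`split_by_family`, `pearson`) and all of the numbers below are made up for illustration; this is not the authors' code or data.

```python
import math
import random

def split_by_family(family_ids, seed=0):
    """Randomly put half of the distinct families in training, the rest in testing."""
    families = sorted(set(family_ids))
    rng = random.Random(seed)
    rng.shuffle(families)
    train_fams = set(families[: len(families) // 2])
    train = [i for i, f in enumerate(family_ids) if f in train_fams]
    test = [i for i, f in enumerate(family_ids) if f not in train_fams]
    return train, test

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical family labels, one per mouse.
family_ids = ["A", "A", "B", "B", "C", "C", "D", "D"]
train_idx, test_idx = split_by_family(family_ids)

# Hypothetical observed phenotypes and genotype-based predictions.
observed = [1.2, 0.8, 2.5, 2.9, 0.3, 0.1, 1.9, 2.2]
predicted = [1.0, 1.1, 2.0, 2.6, 0.6, 0.2, 1.5, 2.4]

# Accuracy is judged only on mice whose families were not used for training.
accuracy = pearson([observed[i] for i in test_idx],
                   [predicted[i] for i in test_idx])
print(round(accuracy, 2))
```

Splitting by family rather than by individual matters here: if siblings ended up on both sides of the split, the predictions could look good simply because close relatives resemble each other.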
There are really a lot of interesting things in this paper, and I won't go over all of them.
I'll just end by mentioning that, as the authors state, if we were to do this in humans we would probably need many more markers and more individuals since the mice used in this experiment were from inbred lines with extended LD.

Predicting Unobserved Phenotypes for Complex Traits from Whole-Genome SNP Data
Sang Hong Lee, Julius H. J. van der Werf, Ben J. Hayes, Michael E. Goddard, Peter M. Visscher
PLoS Genet 4(10): e1000231.
Abstract: Genome-wide association studies (GWAS) for quantitative traits and disease in humans and other species have shown that there are many loci that contribute to the observed resemblance between relatives. GWAS to date have mostly focussed on discovery of genes or regulatory regions harbouring causative polymorphisms, using single SNP analyses and setting stringent type-I error rates. Genome-wide marker data can also be used to predict genetic values and therefore predict phenotypes. Here, we propose a Bayesian method that utilises all marker data simultaneously to predict phenotypes. We apply the method to three traits: coat colour, %CD8 cells, and mean cell haemoglobin, measured in a heterogeneous stock mouse population. We find that a model that contains both additive and dominance effects, estimated from genome-wide marker data, is successful in predicting unobserved phenotypes and is significantly better than a prediction based upon the phenotypes of close relatives. Correlations between predicted and actual phenotypes were in the range of 0.4 to 0.9 when half of the number of families was used to estimate effects and the other half for prediction. Posterior probabilities of SNPs being associated with coat colour were high for regions that are known to contain loci for this trait. The prediction of phenotypes using large samples, high-density SNP data, and appropriate statistical methodology is feasible and can be applied in human medicine, forensics, or artificial selection programs.
