That richer data require smaller sample sizes is the seemingly obvious point here. I'm surprised that combining gene expression and genotype data is a new concept, especially in the gene expression literature, given that it's probably easier to get genotype data if you're already collecting gene expression data than the other way around.
Increasing the Power to Detect Causal Associations by Combining Genotypic and Expression Data in Segregating Populations
Jun Zhu, Matthew C. Wiener, Chunsheng Zhang, Arthur Fridman, Eric Minch, Pek Y. Lum, Jeffrey R. Sachs, Eric E. Schadt
PLoS Computational Biology 3(4): e69
Summary: Complex phenotypes such as common human diseases are caused by variations in DNA in many genes that interact in complex ways with a number of environmental factors. These multifactorial gene and environmental perturbations induce changes in molecular networks that in turn lead to phenotypic changes in the organism under study. The comprehensive monitoring of transcript abundances using gene expression microarrays in different tissues over a large number of individuals in a population can be used to reconstruct molecular networks that underlie higher-order phenotypes such as disease. The cost to generate these large-scale gene activity measurements over large numbers of individuals can be extreme. However, by integrating DNA variation and gene activity data monitored in each individual in a given population of interest, we demonstrate that the power to elucidate molecular networks that drive complex phenotypes can be significantly enhanced, without increasing the sample size. Using a biologically realistic simulation framework, we demonstrate that molecular networks reconstructed using the combined DNA variation and gene activity data are more accurate than molecular networks reconstructed from gene activity data alone, implying that adding DNA variation data might allow us to use fewer subjects to produce molecular networks that better explain complex phenotypes such as disease.
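To make the intuition concrete, here is a toy sketch of why genotype data adds information that expression data alone lacks. It is not the paper's simulation framework or its network reconstruction method; it is a minimal illustration under assumptions I made up: a single hypothetical locus (`genotype`) drives one transcript (`gene_a`), which in turn drives another (`gene_b`), with unit effect sizes and Gaussian noise. The correlation between the two transcripts is symmetric and says nothing about direction, but conditioning on each transcript in turn shows which one sits between the locus and the other.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return float(np.corrcoef(rx, ry)[0, 1])


def simulate_chain(n, seed=0):
    """Toy causal chain: genotype -> gene_a -> gene_b (all values assumed)."""
    rng = np.random.default_rng(seed)
    # Allele counts 0/1/2 at one hypothetical locus.
    genotype = rng.integers(0, 3, size=n).astype(float)
    # gene_a depends on genotype; gene_b depends only on gene_a.
    gene_a = 1.0 * genotype + rng.normal(size=n)
    gene_b = 1.0 * gene_a + rng.normal(size=n)
    return genotype, gene_a, gene_b


genotype, gene_a, gene_b = simulate_chain(2000)

# Expression alone: a symmetric correlation, no hint of direction.
r_ab = float(np.corrcoef(gene_a, gene_b)[0, 1])

# With genotype as an anchor: the locus is (nearly) independent of gene_b
# given gene_a, but still associated with gene_a given gene_b.
pc_lb_given_a = partial_corr(genotype, gene_b, gene_a)
pc_la_given_b = partial_corr(genotype, gene_a, gene_b)

print(f"corr(gene_a, gene_b)              = {r_ab:.3f}")
print(f"pcorr(genotype, gene_b | gene_a)  = {pc_lb_given_a:.3f}")
print(f"pcorr(genotype, gene_a | gene_b)  = {pc_la_given_b:.3f}")
```

In this setup the first partial correlation should sit near zero and the second well above it, which is the asymmetry that lets the genotype orient the edge between the two transcripts. Real data would of course involve many loci, many transcripts, and confounding that this sketch ignores.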