High-throughput genomic data pose a major challenge to researchers because of the “large P, small N” problem: the number of markers (P) far exceeds the number of samples (N). Recently, a machine learning method, Random Forests (RF), has gained popularity for addressing this problem. In this study, we examined the utility of RF in two livestock genome-wide association study (GWAS) datasets: a Spanish sheep pigmentation dataset and a tropical cattle pregnancy status dataset. Comparing the top 10 ranked SNPs identified by RF with those from single-marker GWAS methods showed that: 1) in the sheep data, RF confirmed the most strongly associated SNP (s26449), which is the closest to the sheep pigmentation gene MC1R; 2) in the cattle data, five of the top 10 SNPs identified by RF were close to genes previously reported to be linked with reproductive performance in humans or other species. The results indicate that RF can potentially be used in GWAS as an initial screening tool for candidate genes.
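
As a rough illustration of the screening idea described above, the following minimal sketch ranks SNPs by Random Forest variable importance and keeps the top 10. It is not the authors' pipeline: it assumes scikit-learn's `RandomForestClassifier` and its impurity-based (Gini) importance, whereas the abstract does not state which software or importance measure was used. The genotype matrix, the SNP names (`snp_0`, `snp_1`, ...), the causal marker `snp_42` and the helper `rank_snps` are all hypothetical and simulated purely for demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_snps(genotypes, phenotype, snp_names, top_k=10, seed=0):
    """Return the top_k SNPs ranked by impurity-based RF importance.

    Assumption: Gini importance from scikit-learn stands in for whatever
    importance measure the original study used.
    """
    forest = RandomForestClassifier(
        n_estimators=500,      # many trees stabilise the importance ranking
        max_features="sqrt",   # random subset of SNPs tried at each split
        random_state=seed,
    )
    forest.fit(genotypes, phenotype)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1][:top_k]  # highest importance first
    return [(snp_names[i], float(importances[i])) for i in order]


# Simulated "large P, small N" data: 300 animals, 500 SNPs coded 0/1/2.
rng = np.random.default_rng(1)
n_animals, n_snps, causal = 300, 500, 42
X = rng.integers(0, 3, size=(n_animals, n_snps))
snp_names = [f"snp_{j}" for j in range(n_snps)]  # hypothetical marker names

# Binary trait driven by one hypothetical causal SNP, with 5% label noise.
y = (X[:, causal] >= 1).astype(int)
flip = rng.random(n_animals) < 0.05
y = np.where(flip, 1 - y, y)

top10 = rank_snps(X, y, snp_names)
for name, score in top10:
    print(f"{name}\t{score:.4f}")
```

In a real analysis the ranked list would then be compared with single-marker GWAS results and checked against nearby annotated genes, as the study describes. Impurity-based importance can be biased when markers are correlated, so permutation-based importance is a common alternative.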

Yutao Li, James Kijas, John M Henshall, Sigrid A Lehnert, Russell McCulloch, Anthony Reverter-Gomez

Proceedings of the World Congress on Genetics Applied to Livestock Production, Volume Methods and Tools: Statistical methods - linear and nonlinear models, 206, 2014

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.