Microarray technology allows the expression of thousands of genes to be monitored under various biological conditions. Most of these genes are irrelevant for discriminating between the outcome classes, so feature selection is needed to reduce the dimension of the variable space. The selection method must handle highly correlated relevant genes and identify regulated genes that are not yet known. A promising solution is the random forests classification algorithm, which can deal with a massive number of correlated input variables and can also select features using its internal variable importance measures. This technique was applied to a small, unbalanced, 3-class folliculogenesis dataset taken from pig ovarian cells to illustrate its benefits over multiple testing. A stable, biologically relevant gene selection was obtained using the Mean Decrease Gini importance measure. The relevance of the results was assessed both heuristically and through biological interpretation.
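
The idea behind the Mean Decrease Gini measure can be sketched in a few lines. The code below is only an illustrative stand-in, not the analysis carried out in the paper: it replaces full trees with one-split "stumps", and the toy data, the class labels c1 to c3, the function names, and the parameters n_trees, mtry and seed are all assumptions made for this example. Each stump is grown on a bootstrap sample using a random subset of features, and the Gini decrease of the winning split is credited to the feature that produced it.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_decrease(values, labels):
    """Largest Gini decrease over all thresholds 'x <= t' on one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # the maximum would leave the right side empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def stump_forest_importance(X, y, n_trees=200, mtry=2, seed=0):
    """Gini decrease credited to each feature, averaged over bootstrap stumps."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    total = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of the rows
        feats = rng.sample(range(p), mtry)           # random subset of the features
        yb = [y[i] for i in rows]
        scores = {j: best_split_decrease([X[i][j] for i in rows], yb) for j in feats}
        winner = max(scores, key=scores.get)         # feature with the best split
        total[winner] += scores[winner]
    return [t / n_trees for t in total]


# Hypothetical toy data: 12 samples in 3 unbalanced classes (6 / 4 / 2).
# Feature 0 tracks the class; features 1 and 2 are uninformative patterns.
y = ["c1"] * 6 + ["c2"] * 4 + ["c3"] * 2
level = {"c1": 1.0, "c2": 5.0, "c3": 9.0}
X = [[level[c] + 0.1 * (i % 3), float(i % 2), float((i // 2) % 2)]
     for i, c in enumerate(y)]

importance = stump_forest_importance(X, y)
ranking = sorted(range(len(importance)), key=lambda j: -importance[j])
print("features ranked by mean Gini decrease:", ranking)
```

In a real analysis the stumps would be replaced by fully grown trees, with the decrease summed over every node of every tree, and the ranking would be checked for stability across repeated runs before any gene list is interpreted.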

K. AL Cao, A. Bonnet, P. Besse, C. Robert-Granie, M. San Cristobal

Proceedings of the World Congress on Genetics Applied to Livestock Production, 23.09, 2006

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.