OPEN ACCESS
Many machine learning methods struggle to achieve high classification performance on gene expression profiles because the data are high-dimensional and the sample sizes are small. This paper proposes an improved rotation forest algorithm based on an ensemble of heterogeneous classifiers for classifying gene expression profiles. First, all original genes are ranked using the ReliefF algorithm, and a number of top-ranked genes are selected to build a new training subset from the original training set. Second, because the decision tree classifier used in the rotation forest algorithm is prone to local optima and overfitting, an improved rotation forest algorithm based on heterogeneous classifiers is proposed to overcome these problems. Here, heterogeneous classifiers based on the support vector machine, decision tree, and extreme learning machine replace the decision tree in the rotation forest algorithm and are used to train the base classifiers; the resulting heterogeneous base classifiers are more diverse with respect to one another, which further improves ensemble performance. Experimental results on nine benchmark gene expression profile datasets show that the proposed algorithm outperforms traditional rotation forest, bagging, and boosting. It not only improves classification accuracy but also offers high stability and time efficiency.
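The gene-ranking step can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the original two-class Relief scheme with a single nearest hit and a single nearest miss per sample, whereas ReliefF averages over several neighbours and handles multiple classes. The function names `relief_scores` and `top_k_genes` and the toy data are hypothetical and chosen only for illustration.

```python
import random


def relief_scores(X, y, n_iter=None, seed=0):
    """Simplified Relief weights for a two-class dataset.

    X: list of samples, each a list of numeric gene values.
    y: list of class labels (two distinct classes assumed).
    n_iter: number of randomly drawn samples; None visits every sample once.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])

    # Per-gene value range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(X[a][j] - X[b][j]) / spans[j]

    def dist(a, b):
        # Manhattan distance over range-scaled gene values.
        return sum(diff(j, a, b) for j in range(d))

    if n_iter is None:
        order = list(range(n))
    else:
        order = [rng.randrange(n) for _ in range(n_iter)]

    w = [0.0] * d
    for i in order:
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(i, k))
        miss = min(misses, key=lambda k: dist(i, k))
        # Reward genes that differ on the nearest miss and agree on the nearest hit.
        for j in range(d):
            w[j] += (diff(j, i, miss) - diff(j, i, hit)) / len(order)
    return w


def top_k_genes(X, y, k):
    """Return the indices of the k highest-scoring genes, best first."""
    w = relief_scores(X, y)
    return sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:k]


# Toy data: gene 0 separates the two classes, gene 1 does not.
X_toy = [[0.1, 0.5], [0.2, 0.9], [0.0, 0.1],
         [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y_toy = [0, 0, 0, 1, 1, 1]
print(top_k_genes(X_toy, y_toy, 1))  # [0]
```

The selected indices would then define the reduced training subset on which the ensemble is trained; how many genes to keep is a tuning choice that this sketch does not address.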
gene expression profile, rotation forest, ReliefF algorithm, heterogeneous classifiers.
This work was supported by the Scientific Research Program Funded by Shaanxi Provincial Education Department (16JK1149) and the Scientific Research Program Funded by Shaanxi University of Technology (SLGKY16-15, SLGKY-29).