Author: Mohammed Siyad B
Date of Publication: 7th January 2017
Abstract: Epigenetic alterations have been associated with a wide variety of diseases, including cancer. Bladder cancer is the fourth most common cancer and the ninth leading cause of cancer death. Many tools and protocols for the diagnosis of bladder cancer have been developed over the past 5 to 10 years. In this paper, a machine learning approach is proposed for predicting the disease from epigenetic information in the context of bladder cancer. Three feature selection methods were assessed in combination with three classification methods, using 10-fold cross-validation on the training data set. A model consisting of 151 genes (treated as features) selected through a genetic algorithm, combined with random forest classification, was identified as the best model, with an AUC of 0.96 in 10-fold cross-validation. Most of the selected genes that formed the basis of prediction have been reported in pathways related to bladder cancer. Hence, the selected model can be applied to support disease diagnosis and prognosis.
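The selection loop summarized above (a genetic algorithm searching over gene subsets, each subset scored by 10-fold cross-validated AUC) can be sketched as follows. This is a minimal, standard-library-only sketch under stated assumptions, not the author's implementation: the paper pairs the genetic algorithm with a random forest, whereas here a simple nearest-centroid scorer is swapped in as the classifier so the example stays self-contained. The function names (`ga_select`, `cv_auc`, `toy_data`), the GA settings (population 20, 15 generations, tournament size 3, mutation rate 0.05, two elites) and the synthetic data are all hypothetical. AUC is computed with the rank-based (Mann-Whitney) formulation over pooled out-of-fold scores.

```python
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Mask = Tuple[int, ...]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    # Rank-based (Mann-Whitney) AUC: share of positive/negative pairs where
    # the positive sample scores higher; ties count as half a win.
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def kfold(n: int, k: int, seed: int) -> List[List[int]]:
    # Shuffle sample indices once, then deal them round-robin into k folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid(rows: Matrix, feats: List[int]) -> List[float]:
    return [sum(r[f] for r in rows) / len(rows) for f in feats]


def cv_auc(X: Matrix, y: List[int], mask: Mask, k: int = 10) -> float:
    # Pooled out-of-fold AUC of a nearest-centroid scorer (a stand-in for the
    # paper's random forest) restricted to the features switched on in `mask`.
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:
        return 0.5  # empty feature set: chance level
    scores = [0.0] * len(X)
    for test in kfold(len(X), k, seed=1):
        held = set(test)
        train = [i for i in range(len(X)) if i not in held]
        c1 = centroid([X[i] for i in train if y[i] == 1], feats)
        c0 = centroid([X[i] for i in train if y[i] == 0], feats)
        for i in test:
            x = [X[i][f] for f in feats]
            d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
            d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
            scores[i] = d0 - d1  # larger when the sample lies closer to class 1
    return auc(scores, y)


def ga_select(X: Matrix, y: List[int], pop_size: int = 20, generations: int = 15,
              p_mut: float = 0.05, seed: int = 0) -> Tuple[Mask, float]:
    # Genetic algorithm over binary feature masks; fitness = 10-fold CV AUC.
    rng = random.Random(seed)
    n_feat = len(X[0])
    cache: Dict[Mask, float] = {}

    def fitness(mask: Mask) -> float:
        if mask not in cache:
            cache[mask] = cv_auc(X, y, mask)
        return cache[mask]

    pop = [tuple(rng.randint(0, 1) for _ in range(n_feat)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:2]  # elitism: keep the two best masks unchanged
        while len(nxt) < pop_size:
            # Tournament selection (size 3) picks each parent.
            a = max(rng.sample(ranked, 3), key=fitness)
            b = max(rng.sample(ranked, 3), key=fitness)
            # Uniform crossover, then flip each bit with probability p_mut.
            child = tuple((ba if rng.random() < 0.5 else bb) ^ int(rng.random() < p_mut)
                          for ba, bb in zip(a, b))
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)


def toy_data(n: int = 60, n_feat: int = 20, n_inf: int = 3,
             shift: float = 2.0, seed: int = 0) -> Tuple[Matrix, List[int]]:
    # Hypothetical synthetic data: only the first n_inf features carry signal.
    rng = random.Random(seed)
    X: Matrix = []
    y: List[int] = []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_feat)]
        for j in range(n_inf):
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    mask, score = ga_select(X, y)
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print(f"10-fold CV AUC: {score:.3f}")
```

In a real pipeline the nearest-centroid scorer would be replaced by a random forest (for example scikit-learn's `RandomForestClassifier`). Because the genetic algorithm chooses features using the same cross-validation folds whose AUC it reports, that figure tends to be optimistic; an independent test set gives a fairer estimate.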