
A Classification Model using Neuro Fuzzy Classifier for Imbalanced Data

Anazida Zainal, Mohd Aizaini Maarof, Siti Mariyam Shamsuddin

Abstract


Biological data often suffer from the class imbalance problem, which arises from heterogeneous data and the presence of several categorical attributes. This has motivated researchers to develop methods for handling imbalanced data. A main challenge in bioinformatics is how to resolve the logical issues rather than concentrating too heavily on gathering and examining biological information. As a result of this unpredictability, bioinformatics presents a variety of challenging research problems. Data analysis problems in bioinformatics can broadly be divided into three classes according to the type of biological data: sequences, structures, and networks. The classification and clustering techniques of data mining play a critical role in analyzing biological data, for example in genomic/DNA microarray data classification and analysis. Learning from imbalanced datasets is a common problem in many bioinformatics applications, such as gene prediction, splice site prediction, promoter prediction, and protein classification. In this work, a neuro-fuzzy model is presented for the classification of imbalanced data.


Keywords


Big Biological Data, Data Mining, Class Imbalance Problem, Logical Difficulties, Multi Class Labels.





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.