
Data Processing for Large Database Using MapReduce Approach and Using APSO

S. Bindhu, R. Preethi, N. Gopinath

Abstract


Big Data has given rise to many technical challenges that confront both academic research communities and commercial IT deployments. It is generally known that data sourced from data streams accumulate continuously, making traditional batch-based model induction algorithms impractical for mining on the fly. Feature selection has been widely used to lighten the processing load. However, the search space from which an optimal feature subset is derived grows exponentially with the number of features. To tackle this problem, a novel lightweight feature selection method is proposed, designed particularly for mining streaming data on the fly. It uses accelerated particle swarm optimization (APSO) and achieves good accuracy with reasonable processing time. In this paper, the data stored on disk are processed over successive iterations.
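The abstract notes that the feature-subset search space grows exponentially: with d features there are 2^d - 1 non-empty subsets, so exhaustive search quickly becomes infeasible and a swarm search such as APSO samples it instead. Below is a minimal Python sketch of APSO-driven wrapper feature selection, assuming the position-only APSO update x <- (1 - beta) x + beta g* + alpha eps with alpha decaying geometrically, and a 0.5 threshold that turns each continuous position into a feature mask. The function name, parameter values, and `toy_fitness` are hypothetical illustrations, not the authors' implementation; in a real wrapper the fitness would be the accuracy of a classifier (the keywords mention CART) trained on the selected features, which is not shown here.

```python
import random
from typing import Callable, List, Optional, Tuple


def apso_feature_selection(
    n_features: int,
    fitness: Callable[[List[bool]], float],
    n_particles: int = 20,
    n_iter: int = 100,
    alpha0: float = 0.5,
    beta: float = 0.5,
    gamma: float = 0.95,
    seed: int = 0,
) -> Tuple[List[bool], float, List[float]]:
    """Hypothetical APSO search over positions in [0, 1]^n_features.

    A feature is selected when its coordinate exceeds 0.5. Returns the best
    mask found, its fitness, and the best-so-far fitness after each iteration.
    """
    rng = random.Random(seed)
    # Random initial swarm; APSO keeps positions only (no velocity vectors).
    swarm = [[rng.random() for _ in range(n_features)] for _ in range(n_particles)]

    def to_mask(position: List[float]) -> List[bool]:
        return [x > 0.5 for x in position]

    # Global best g* over every position evaluated so far (stored as a copy).
    best_pos: Optional[List[float]] = None
    best_fit = float("-inf")
    for p in swarm:
        f = fitness(to_mask(p))
        if f > best_fit:
            best_pos, best_fit = p[:], f

    history: List[float] = []
    for t in range(n_iter):
        alpha = alpha0 * gamma ** t  # random step size shrinks each iteration
        for p in swarm:
            for j in range(n_features):
                # APSO update: x <- (1 - beta) * x + beta * g* + alpha * eps
                x = (1 - beta) * p[j] + beta * best_pos[j] + alpha * rng.gauss(0.0, 1.0)
                p[j] = min(1.0, max(0.0, x))  # clip back into the search box
            f = fitness(to_mask(p))
            if f > best_fit:  # elitist: g* only changes when fitness improves
                best_pos, best_fit = p[:], f
        history.append(best_fit)
    return to_mask(best_pos), best_fit, history


# Hypothetical stand-in for a classifier-based fitness (e.g. CART accuracy):
# features 0, 1 and 2 are "relevant"; every other selected feature costs 1.
RELEVANT = {0, 1, 2}


def toy_fitness(mask: List[bool]) -> float:
    return float(sum(1 if i in RELEVANT else -1 for i, on in enumerate(mask) if on))


if __name__ == "__main__":
    mask, score, _ = apso_feature_selection(10, toy_fitness)
    print([i for i, on in enumerate(mask) if on], score)
```

Dropping the velocity term is what makes APSO lighter than standard PSO. The clipping to [0, 1] and the 0.5 threshold are common binary-encoding choices assumed here for illustration, not necessarily those used in the paper.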


Keywords


APSO, CART



Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.