
Data Mining Clustering Technique in Data Streams – A Survey

S. Vijayarani, P. Sathya

Abstract


A data stream is a continuous, real-time, ordered sequence of items, and it is impossible to control the order in which the items arrive. Real-time surveillance systems, telecommunication systems, sensor networks, and financial applications are some examples of data stream systems. These streams produce millions or billions of updates every hour. The data must be processed to extract meaningful information, just as data stored in a database or data warehouse is processed by using a mining algorithm. Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. In this paper, we study how data mining techniques are used in data streams, as well as the clustering problem for data stream applications. Clustering is the partitioning of a data set into one or more groups of similar objects.

Keywords


Clustering, Data Mining, Data Streams






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.