An Emerging Classification Method for Huge Dataset in Clustering

B. Rosiline Jeetha, Dr.M. Punithavalli


Clustering analysis is used to explore classification for large datasets, and the Canberra distance is generalized so that it can process data with categorical attributes. Based on the generalized Canberra distance, an instance of constraint-based clustering is introduced and nearest neighbor classification is improved: class-labeled clusters serve as the models used to classify data. The proposed classification method can also detect data that differ greatly from the instances in the training data, which may indicate a new data type. In summary, the method generalizes the Canberra distance from data with continuous numerical attributes to data with mixed attributes, uses clustering analysis to squash the existing instances, and thereby improves the classical nearest neighbor classification method.
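The abstract does not give the generalized distance itself, so the following is only a minimal sketch of how such a scheme might look. It assumes that numerical attributes contribute the usual Canberra term |a - b| / (|a| + |b|), with 0/0 taken as 0, and that categorical attributes contribute 0 when equal and 1 otherwise. The function names, the representation of a cluster as a (representative, label) pair, and the `threshold` parameter are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple, Union

Value = Union[float, int, str]
Instance = Sequence[Value]


def generalized_canberra(x: Instance, y: Instance) -> float:
    """Canberra-style distance over mixed attributes (assumed form).

    Numerical attributes add |a - b| / (|a| + |b|), with 0/0 counted as 0.
    Categorical (string) attributes add 0 if equal and 1 otherwise.
    """
    if len(x) != len(y):
        raise ValueError("instances must have the same number of attributes")
    total = 0.0
    for a, b in zip(x, y):
        if isinstance(a, str) or isinstance(b, str):
            total += 0.0 if a == b else 1.0
        else:
            denom = abs(a) + abs(b)
            if denom:
                total += abs(a - b) / denom
    return total


def classify(
    instance: Instance,
    clusters: List[Tuple[Instance, str]],
    threshold: float,
) -> Optional[str]:
    """Return the label of the nearest class-labeled cluster.

    Returns None when even the nearest cluster is farther away than
    `threshold`, flagging the instance as a possible new data type.
    """
    representative, label = min(
        clusters, key=lambda c: generalized_canberra(instance, c[0])
    )
    if generalized_canberra(instance, representative) > threshold:
        return None
    return label


# Hypothetical cluster representatives: (numeric attribute, categorical attribute)
clusters = [((1.0, "tcp"), "normal"), ((100.0, "udp"), "attack")]
near = classify((2.0, "tcp"), clusters, threshold=1.0)
far = classify((50.0, "icmp"), clusters, threshold=1.0)
```

Under these assumptions, comparing a new instance only against cluster representatives, rather than against every training instance, is what makes the nearest neighbor step cheaper on a large dataset, and the threshold is one simple way to flag instances that sit far from every known cluster.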


ID3, C4.5, Canberra Distance, Clustering, Improved Nearest Neighbour.

This work is licensed under a Creative Commons Attribution 3.0 License.