
Analysis of Different Similarity Functions with Fuzzy C-Means Clustering Approach Using Meeting Transcripts

J.I. Sheeba, K. Vivekanandan

Abstract


Clustering is a technique for automatically grouping similar data into clusters. A wide variety of similarity measures and distance functions, such as Euclidean distance, Jaccard distance, Pearson correlation distance, cosine similarity and Kullback-Leibler divergence, have been used for clustering. The Fuzzy C-Means algorithm assigns each word point a membership in each cluster, computed from the distance between the cluster center and the word point. The proposed framework evaluates these five similarity measure functions with the Fuzzy C-Means clustering algorithm to determine their effectiveness. The optimal number of clusters is estimated using validity measures such as purity and entropy. Finally, the results obtained with the five similarity measure functions under Fuzzy C-Means clustering are compared. The Euclidean measure provides better and more accurate results than the other distance functions.
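As an illustration of the comparison described in the abstract, the following Python sketch defines the five distance functions and the standard Fuzzy C-Means membership update with a pluggable distance. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the fuzzifier m = 2, the use of the extended (Tanimoto) form of the Jaccard coefficient for real-valued vectors, and the conversion of similarities to distances as 1 - similarity are all assumptions made here for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cosine_distance(p, q):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def jaccard_distance(p, q):
    """1 - extended (Tanimoto) Jaccard coefficient for real-valued vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    denom = sum(a * a for a in p) + sum(b * b for b in q) - dot
    return 1.0 - dot / denom

def pearson_distance(p, q):
    """1 - Pearson correlation; assumes both vectors have non-zero variance."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    sd_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    sd_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    return 1.0 - cov / (sd_p * sd_q)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q).

    Assumes p and q are probability distributions and q > 0 wherever p > 0.
    Note that it is not symmetric, so it is not a true distance metric.
    """
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def fcm_memberships(point, centers, dist, m=2.0, eps=1e-12):
    """Standard Fuzzy C-Means membership of one point in every cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the distance
    from the point to center i. Distances are floored at eps to avoid
    division by zero when the point coincides with a center.
    """
    d = [max(dist(point, c), eps) for c in centers]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in d) for d_i in d]

# Usage: a point midway between two centers belongs equally to both.
memberships = fcm_memberships([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]], euclidean)
```

Passing a different function as `dist` (for example `cosine_distance`) changes only how closeness to each center is measured; the memberships for a point still sum to 1.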

 


 


Keywords


Clustering, Euclidean Distance, Fuzzy C Means Algorithm, Similarity Measure.





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.