Graphgain: A Proposed Measure for Ranking Mined Subgraph
Frequent itemset discovery algorithms have been used to solve various interesting problems over the year. As data mining techniques are being introduced and widely applied to non-traditional itemsets, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of these domains. An alternate way of modeling the objects in these data sets, is to use a graph to model the database objects. Within that model, the problem of finding frequent patterns becomes that of finding subgraphs that occur frequently over the entire set of graphs. Modeling objects using graphs allows us to represne tarbitrary relations among entities. In this paper we present a computationally efficient algorithm for finding the ranking of such frequent subgraphs. The subgraph finding method may follow any one of the existing algorithm. In order to find out the ranking of subgraphs we present a new method called “graphgain”. A graphgain is the normalization technique applied at each position for a chosen value of Discounted Cumulative Gain (DCG) of a subgraph. The DCG alone cannot be used to achieve the performance from one query to the next in the search engine’s algorithm. To obtain the graphgain an ideal ordering of DCG (IDCG) at each position is to be found out. For this, a Modified Dicounted Cumulative Gain using “lift” is introduced here and IDCG is also evaluated. Then the graphgain is evaluated. Finally, the graphgain for all rules can be averaged to obtain a measure of the average performance of a search engine’s ranking algorithm. And also the ordering of graphgain will provide the order of evaluation of rules which gives in turn the efficient ranking of subgraph process.
Cook D.J. and Holder L.B.. 2000, “Graph_Based Data Mining”. IEEE Trans. On Intelligent Systems 15(2):32-41. IEEE Press, Piscataway, NJ, USA.
Michihiro Kuramochi and George Karypis. 2001, “Frequent Subgraph Discovery”, IEEE International Conference on Data Mining (ICDM), .
Lawrence B.Holder, Diane J.Cook. 2005, “Graph-Based Data Mining”, Idea Group Inc., .
Akihiro Inokuchi, Takashi Washio and Hiroshi Motoda. 2000, “An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data”. Proc. PKDD2000, Sept. 13-16, Lyon, France.
“Discounted Cumulative Gain” – Wikipedia, the free encyclopedia.
Kalerro Jarrelin, Jaana Kekalainen : 2002, “Cumulated gain based evaluation of IR techniques”. ACM Transactions on Information system 20(4), 422-446.
Croft. B. Metzler. D. and Strohman T. 2009, “Search Engines. Information retrieval in practice”. Adison Wesley.
Sergey Brin, Motwani. R. and Silverstein. C. 1997, “Beyond Market Baskets : Generalizing Association Rules to Correlations”, SIGMOD’97 AI, USA.
Sergey Brin, Rajeev Motwani, Jeffrey Ullman. D. and Shalom Tsur., 1997, “Dynamic itemset counting and implication rules for market basket data”. In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, pages 255-264, Tucson, Arizona, USA, May.
Bayardo Jr R.J. and Rakesh Agrawal,, 1999, “Mining the most Interesting Rules”, Procedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 145-154.
Akihiro Inokuchi, Takashi Washio and Hiroshi Motoda, 2000, “An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data”, PKDD2000, Sept. 13-16, Lyon, France.
Available : http: // oldwww.comlab.ox.ac.uk/oucl/ groups / machlearn / PTE/.
Srinivasan, A. King, R.D., Muggleton, S.H. and Sternberg, M. 1997, “The predictive toxicology evaluation challenge”. In Proc. Of the 15th International Joint Conference on Artificial Intelligence (IJCAI), pages 1-6, Morgan-Kaufmann,.
- There are currently no refbacks.
This work is licensed under a Creative Commons Attribution 3.0 License.