Cancer Gene Expression Analysis Using Class Chunk
Classification assigns a class label to each of a set of unclassified cases. Both supervised and unsupervised methods are used to assign class labels. Classification is performed in two steps: learning (training) and testing. The learning step identifies class patterns from labeled transactions. In the testing phase, unlabeled transactions are assigned class values with reference to the learned class patterns. Bayesian classification and decision tree classification are used for the category assignment process. An outlier is an observation that deviates so much from other observations as to arouse suspicion. Distance-based outlier detection methods are used to find records that differ from the rest of the data set.
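The distance-based idea mentioned above can be illustrated with a minimal sketch. It assumes numeric records, Euclidean distance, and one common definition of a distance-based outlier (a record with fewer than `min_neighbors` other records within `radius`). The function name, parameters, and sample points are hypothetical and are not taken from the paper.

```python
import math

def distance_based_outliers(records, radius, min_neighbors):
    """Return indices of records that have fewer than `min_neighbors`
    other records within Euclidean distance `radius`."""
    outliers = []
    for i, record in enumerate(records):
        # Count the other records that lie inside the radius.
        neighbors = sum(
            1
            for j, other in enumerate(records)
            if j != i and math.dist(record, other) <= radius
        )
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers

# Hypothetical two-attribute records: four close together, one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 8.0)]
print(distance_based_outliers(points, radius=1.0, min_neighbors=2))  # [4]
```

This brute-force version compares every pair of records, so its cost grows quadratically with the number of records. It is meant only to show the definition, not an efficient detection method.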
Critical nuggets are collections of records that carry essential domain-specific information. Nuggets are also referred to as class chunks. Nuggets are used to assign labels or categories to transactions. A domain-independent method is used to measure criticality and to reduce the search space for identifying critical nuggets. Criticality is measured for records taken together, in terms of their removal from the data set or changes to their attribute values. The criticality score (CR-score) indicates the effect that removing a set of neighboring records has on a classification model. Three algorithms are used. The GetNuggetScore algorithm calculates the CR-score value. The Findboundary algorithm identifies the class boundary values. The Findcriticalnuggets algorithm detects the critical nuggets, and it can be split into two phases to detect critical nuggets for two classes. The centroid neighborhood relationship is used to find the nuggets for the significant classes.
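As a rough illustration of scoring a group of records by the effect of its removal, the sketch below compares a classifier built on all records with one built after a candidate nugget is removed, and reports the fraction of probe records whose predicted label changes. This is not the paper's GetNuggetScore algorithm: the nearest-centroid classifier, the function names, the probe records, and the scoring rule are all assumptions chosen to keep the example self-contained.

```python
import math

def class_centroids(records, labels):
    """Compute the mean vector of each class (a nearest-centroid model)."""
    sums, counts = {}, {}
    for record, label in zip(records, labels):
        acc = sums.setdefault(label, [0.0] * len(record))
        for k, value in enumerate(record):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}

def predict(model, record):
    """Assign the label of the closest class centroid."""
    return min(model, key=lambda y: math.dist(model[y], record))

def removal_score(records, labels, nugget_idx, probes):
    """Fraction of probe records whose predicted label changes
    when the records in `nugget_idx` are removed from the data."""
    removed = set(nugget_idx)
    keep = [i for i in range(len(records)) if i not in removed]
    full = class_centroids(records, labels)
    reduced = class_centroids([records[i] for i in keep],
                              [labels[i] for i in keep])
    changed = sum(predict(full, p) != predict(reduced, p) for p in probes)
    return changed / len(probes)

# Hypothetical data: class "A" on the left, class "B" on the right.
records = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
labels = ["A", "A", "A", "B", "B"]
probes = [(2.0, 0.0), (5.8, 0.0), (9.0, 0.0)]

near_boundary = removal_score(records, labels, [2], probes)
far_from_boundary = removal_score(records, labels, [0], probes)
print(near_boundary, far_from_boundary)
```

In this toy data, removing the record nearest the other class shifts the decision boundary enough to flip one probe, while removing a record far from the boundary flips none. That matches the intuition that records near a class boundary are more likely to be critical.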
To support critical nuggets for multiple classes, we use an identification and classification scheme in a cancer gene expression environment. The scheme can be adapted to handle mixed-attribute data values. To reduce the detection complexity, we use a boundary approximation algorithm. With the help of post-processing operations, classes can be identified in a multi-class data environment.
This work is licensed under a Creative Commons Attribution 3.0 License.