
An Open Source Framework for Data Pre-processing of Online Software Bug Repositories

Naresh Kumar Nagwani, Dr. Shrish Verma


Software bug repositories are a rich source of knowledge. They contain a great deal of useful information about software development, software design, and common error patterns for a software project. Most projects use a bug tracking system to manage the bugs associated with the software. These bug tracking systems work as online bug repositories that can be accessed by project members at different locations, all of whom can read and update the bug-related information. Extracting knowledge from these online software bug repositories requires a mechanism to extract, parse, and save the data locally for analysis. In this paper, a framework for preprocessing online software bug repositories for data mining is proposed and implemented using open source APIs (Application Programming Interfaces). The performance of the implemented framework is also evaluated in terms of the time taken to fetch and parse software bug data from online repositories.
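The fetch, parse, and save steps named in the abstract, together with the fetch and parse timings, can be illustrated with a minimal sketch. This is not the authors' implementation: the XML layout in `SAMPLE_XML`, the `bugs` table schema, and all function names (`fetch`, `parse`, `save`, `timed`, `preprocess`) are assumptions made for illustration, and SQLite stands in for whatever local store the framework uses.

```python
import sqlite3
import time
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical export format; each real bug tracker has its own schema.
SAMPLE_XML = """<bugs>
  <bug><id>101</id><summary>Crash on save</summary><status>NEW</status></bug>
  <bug><id>102</id><summary>Button label truncated</summary><status>RESOLVED</status></bug>
</bugs>"""


def fetch(url, timeout=30):
    """Download one export document from an online repository."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse(xml_text):
    """Turn the XML export into a list of (id, summary, status) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (int(bug.findtext("id")), bug.findtext("summary"), bug.findtext("status"))
        for bug in root.iter("bug")
    ]


def save(rows, connection):
    """Store parsed bugs locally so they can be mined later."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS bugs "
        "(id INTEGER PRIMARY KEY, summary TEXT, status TEXT)"
    )
    connection.executemany("INSERT OR REPLACE INTO bugs VALUES (?, ?, ?)", rows)
    connection.commit()


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def preprocess(source, connection, fetcher=fetch):
    """Fetch, parse, and save one document, reporting fetch and parse times."""
    xml_text, fetch_seconds = timed(fetcher, source)
    rows, parse_seconds = timed(parse, xml_text)
    save(rows, connection)
    return {"bugs": len(rows), "fetch_s": fetch_seconds, "parse_s": parse_seconds}


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    # Offline demo: the fetcher returns the sample instead of contacting a server.
    print(preprocess("sample", db, fetcher=lambda _source: SAMPLE_XML))
```

Passing the fetcher as a parameter keeps the timing logic identical whether the data comes from a live repository URL or from a local sample, which makes the fetch and parse measurements easy to compare across sources.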


Software bug repositories, Fetching bug repositories, Parsing software bugs, Data preprocessing of bug repositories







Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.