SECTOR DE INTERVENCIÓN
2.2.3 La Cooperación Internacional y la Corresponsabilidad para la Conservación del Patrimonio Cultural
In our research, we were able to automate data collecting techniques for mining bug repositories. We collected data from 8 projects from different software categories. Data was cleaned and designed in three different categories: report, activity, and comment. Classification algorithms were studied and then applied to predict the probability of reopened bugs. In all of the projects a decent amount of precision and recall was achieved. The precision for reopen bugs was from 40% to 90% while range of recall was from 40% to 100%. ADtree, C 4.5 achieved the best F-measure for prediction of reopens while NaiveBayes and BayesNet achieved the best recall of reopened bugs. We found the most important factors responsible for a reopen were component, name of person who fixed name of the person who verified the name of the person who closed the bug, the number, resolving time, verifying time, size, and name of person who made the comment. We developed a data mining methods that was different from other software repository miners, for we created a web crawler to get bug information from the web instead of a more traditional way of mining software repository through files. We were able extract information from around 1 million web pages. The advantage of this method was that we got the latest updated information of projects and that we had access to all the projects open to public. We introduced the name and time the person verified and closed and dimension of the last 3 comments. We had a higher precision and recall then the previous research, which was verified by application to different category of projects. Using reopen analysis of bugs, developers can share data with bug reporters which shows likelihood of reopening a bug report if bug is from a certain component.
Components with high reopen rate can be studied to find their coding metrics. Committers which show high rate of reopen can be retrained to reduce their reopen rate. Assigning of the higher priority bugs can be restricted to committers with higher reputation. Guidance of developers whose comments help in reducing reopen rate can be used for higher priority bugs. By predicting whether bug will be reopened beforehand, more resources can be allocated before documenting it as fixed, thus percent of reopen will go down, increasing reliability of software.
8. FUTURE WORK
The amount of resources we had access to had been limited to bug repository. But the code metrics: lines of codes, global variables, cyclomatic complexity, read coupling, write coupling, address coupling, fan-in, fan-out, weighted methods per class, depth of inheritance, class coupling, and number of subclasses, all of which are important factors in finding bugs, can be incorporatedto enhance research regarding reopen bugs. We plan to understand the contents of reports and comments, weight them according to keywords present that can predict reopen. We plan to create a developer profile with their fixing experience with a type of modules and work habits. Adding the mentioned factors will enhance our knowledge of factors responsible for reopen and make out precision recall more accurate.
REFERENCES
Bayes. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 370-418.
Boehm, B., & Basili, V. (2001). Software Defect Reduction Top 10 List. Computer Volume 34 Issue 1, 135-137.
Drauschke, M. (2008). Feature Subset Selection with Adaboost and ADTboost. Bonn: Department of Photogrammetry,University of Bonn.
Fayyad, U. (1996). From Data Mining to Knowledge Discovery in Databases. Scientific and Statistical Database Management, 2 - 11.
Freund, Y., & Mason, L. (1999). The alternating decision tree learning algorithm. International Conference on Machine Learning, (pp. 124-133).
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine Learning, 148-156.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning. Journal of Computer and System Sciences., 119–139.
Freund, Y., & Schapire, R. E. (1999). A Short Introduction to Boosting. Journal of Japanese Society for Artificial Intelligence., 771-780.
Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 131-163.
Gu, Z., Barr, E., Hamilton, D., & Su, Z. (2010). Has the bug really been fixed? International Conference on Software Engineering, 55 - 64.
Guo, P., Zimmermann, T., Nagappan, N., & Murphy, B. (2010). Characterizing and predicting which bugs get fixed: an empirical study of Microsoft Windows. International conference on software engineering, 495–504.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. (2009). The WEKA Data Mining Software.
Hassan, A. (2008). The road ahead for Mining Software Repositories. Frontiers of Software Maintenance, 2008. FoSM 2008., 48 - 57.
Jongyindee, A., Ohira, M., Ihara, A., & Matsumoto, K.-i. (2011). Good or Bad Committers? A Case Study of Committers' Cautiousness and the Consequences on the Bug Fixing Process in the Eclipse Project. Software Measurement, 2011 Joint Conference of the 21st Int'l Workshop on and 6th Int'l Conference on Software Process and Product Measurement (IWSM-MENSURA), 116 - 125. Kohavi, R., & Provost, F. (1998). Classifier performance evaluation. Machine Learning,
271-274.
Mitchel, T. (1997). Machine Learning. McGraw.
Quinlan, J. R. (1986). Induction of Decision Trees. Machine. Learning. , 81-106.
Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross Validation. Encyclopedia of Database Systems (EDBS), 532-538.
Shihab, E., Ihara, A., Kamei, Y. I., Ohira, M., Adams, B., Hassan, A., et al. (2012). Studying re-opened bugs in open source software. Empirical Software Engineering October 2013, Volume 18, Issue 5, 1005-1042.
Shihab, E., Ihara, A., Kamei, Y., & Ibrahim, W. (2010). Predicting Re-opened Bugs: A Case Study on the Eclipse. Reverse Engineering work shop, 249-258.
Shull, F., Singer, J., & Sjoberg, D. (2007). Guide to Advanced Empirical Software Engineering. Springer.
Sliwerski, J., Zimmermann, T., & Zeller, A. (2005). When do changes induce fixes? 1–5: Proceedings of the 2005 international workshop on Mining software repositories, . Vlasceanu, I. V., & Bac, C. (2008). A study concerning the bug tracking applications.
TELECOM & Management SudParis.
Williams, G. (2009). Rattle: A Data Mining GUI for R. The R Journal.
Zimmermann, T., Nagappan, N., & Zeller, A. (2008). Predicting Bugs from History – Software Evolution. Springer, 69-88.