About: “A systematic and comprehensive investigation of methods to build and evaluate fault prediction models”, published in JSS, 2010
Finalist for the JSS Most Influential Paper (MIP) award
At the time we worked on this project, around 2008–2009, there were many papers on predicting the fault-proneness of software components. However, publications tended to focus on specific, sometimes narrow aspects, often relying on public databases or open-source projects. In this paper, in collaboration with Telenor, a major European telecom operator, we set out to investigate, in a comprehensive manner, the best ways to build and apply such fault-proneness models in a realistic industrial context.
We therefore systematically assessed three aspects of how to build and evaluate fault-proneness models in the context of large Java legacy system development projects: (1) we compared many data mining and machine learning techniques for building fault-proneness models, (2) we assessed the impact of using different metric sets, such as source code structural measures and change/fault history (process measures), and (3) we compared several alternative ways of assessing the performance of the models.
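The three-way comparison described above can be pictured as a grid: modelling techniques crossed with metric sets, each combination trained and evaluated the same way. The sketch below is purely illustrative (not the paper's code or data): it uses synthetic "files", a deliberately simple nearest-centroid classifier as a stand-in for the many techniques compared, and two hypothetical metric sets, structural-only versus structural plus process measures.

```python
import random

random.seed(0)

# Synthetic "files": structural metrics (size, complexity), process metrics
# (past_faults, churn), and a fault label. The labelling rule is a toy
# assumption that faults correlate with complexity and change history.
def make_file():
    size = random.randint(50, 2000)
    complexity = random.randint(1, 40)
    past_faults = random.randint(0, 5)
    churn = random.randint(0, 300)
    faulty = int(complexity + 5 * past_faults + churn / 20
                 + random.gauss(0, 8) > 35)
    return {"size": size, "complexity": complexity,
            "past_faults": past_faults, "churn": churn, "faulty": faulty}

data = [make_file() for _ in range(400)]
train, test = data[:300], data[300:]

# Two candidate metric sets, mirroring the structural-vs-process comparison.
METRIC_SETS = {
    "structural": ["size", "complexity"],
    "structural+process": ["size", "complexity", "past_faults", "churn"],
}

def nearest_centroid(train_rows, features):
    """Trivial classifier: predict the class whose centroid is closer."""
    def centroid(rows):
        return [sum(r[f] for r in rows) / len(rows) for f in features]
    c_faulty = centroid([r for r in train_rows if r["faulty"]])
    c_clean = centroid([r for r in train_rows if not r["faulty"]])
    def predict(row):
        def dist(c):
            return sum((row[f] - c[i]) ** 2 for i, f in enumerate(features))
        return int(dist(c_faulty) < dist(c_clean))
    return predict

# The evaluation loop: one accuracy figure per (technique, metric set) cell.
accuracy = {}
for name, features in METRIC_SETS.items():
    predict = nearest_centroid(train, features)
    correct = sum(predict(r) == r["faulty"] for r in test)
    accuracy[name] = correct / len(test)
    print(f"{name}: {accuracy[name]:.2f}")
```

In the actual study, the classifier slot would be filled by each of the compared data mining techniques in turn, and holdout accuracy would be replaced by the alternative performance measures being compared.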
The results of the study indicated that the choice of fault-proneness modeling technique has limited impact on the resulting classification accuracy or cost-effectiveness. There are, however, large differences between the individual metric sets in terms of cost-effectiveness: although the process measures are among the most expensive to collect, including them as candidate measures significantly improves the prediction models compared with models that include only structural measures and/or their deltas between releases. Further, we observed that which model is considered best depends strongly on the criteria used to evaluate and compare the models.
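The last point, that the "best" model depends on the evaluation criterion, can be made concrete with a toy example. Below, two hypothetical models are represented simply as rankings of the same six files (the file sizes and fault counts are invented, not from the paper). Under classification accuracy, model A wins; under a cost-effectiveness-style measure (fraction of faults found within a lines-of-code inspection budget), model B wins, because it surfaces small, fault-dense files first.

```python
# Hypothetical files: (name, lines_of_code, actual_faults).
files = [
    ("f1", 1000, 3), ("f2", 100, 2), ("f3", 100, 0),
    ("f4", 800, 1), ("f5", 50, 1), ("f6", 950, 0),
]
faults = {n: f for n, _, f in files}
loc = {n: l for n, l, _ in files}
total_faults = sum(faults.values())

# Each "model" is just a ranking from most to least fault-prone.
ranking_A = ["f1", "f4", "f2", "f5", "f3", "f6"]  # favours large files
ranking_B = ["f5", "f2", "f3", "f4", "f1", "f6"]  # favours small, dense files

def accuracy(ranking, top_k=3):
    """Classification accuracy: the top-k ranked files are predicted faulty."""
    predicted_faulty = set(ranking[:top_k])
    correct = sum((n in predicted_faulty) == (faults[n] > 0) for n in loc)
    return correct / len(loc)

def cost_effectiveness(ranking, loc_budget=1500):
    """Fraction of faults found inspecting ranked files within a LOC budget."""
    spent = found = 0
    for n in ranking:
        if spent + loc[n] > loc_budget:
            break
        spent += loc[n]
        found += faults[n]
    return found / total_faults

for name, r in [("A", ranking_A), ("B", ranking_B)]:
    print(name, round(accuracy(r), 2), round(cost_effectiveness(r), 2))
```

Here model A classifies more files correctly, yet a tester with a fixed inspection budget finds more faults by following model B, which is exactly why a single evaluation criterion can be misleading.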