An Empirical Analysis of the Effectiveness of Software Metrics and Fault Prediction Model for Identifying Faulty Classes

Date

2017

Publisher

Elsevier

Green Open Access

No

Publicly Funded

No
Impulse
Top 10%
Influence
Top 10%
Popularity
Top 10%

Abstract

Software fault prediction models are used to predict faulty modules at an early stage of the software development life cycle. Predicting fault proneness using source code metrics is an area that has attracted the attention of several researchers. The performance of a model that assesses fault proneness depends on the source code metrics used as its input. In this work, we propose a framework to validate source code metrics and identify a suitable set of them, with the aim of reducing irrelevant features and improving the performance of the fault prediction model. Initially, we applied a t-test analysis and a univariate logistic regression analysis to each source code metric to evaluate its potential for predicting fault proneness. Next, we performed a correlation analysis and multivariate linear regression stepwise forward selection to find the right set of source code metrics for fault prediction. The resulting set of source code metrics is used as the input to develop a fault prediction model using a neural network with five different training algorithms and three different ensemble methods. The effectiveness of the developed fault prediction models is evaluated using a proposed cost evaluation framework. We performed experiments on fifty-six open-source Java projects. The experimental results reveal that the model developed using the set of source code metrics selected by the suggested validation framework achieves better results than models built from the other sets of metrics. The experimental results also demonstrate that the fault prediction model is best suited to projects in which the percentage of faulty classes is below a threshold value that depends on fault identification efficiency (low: 48.89%, median: 39.26%, high: 27.86%).
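The first stages of the metric validation described in the abstract (a per-metric t-test followed by a correlation analysis) can be illustrated with a minimal sketch. This is not the authors' implementation: the Welch variant of the t-test, the cut-off values `t_cut` and `r_cut`, the greedy redundancy filter, the function names, and the toy metric names and values are all assumptions made for illustration. The logistic regression, stepwise selection, neural network, and cost evaluation stages are not covered.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed t-test variant)."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Guard against constant samples, where the statistic is undefined.
    return 0.0 if denom == 0 else (mean(a) - mean(b)) / denom


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def select_metrics(data, faulty, t_cut=2.0, r_cut=0.8):
    """Screen metrics by t statistic, then drop highly correlated ones.

    data:   dict mapping metric name -> list of per-class values
    faulty: list of booleans, True where the class is faulty
    t_cut, r_cut: hypothetical thresholds, not taken from the paper
    """
    # Step 1: keep metrics whose means differ between faulty and clean classes.
    scores = {}
    for name, values in data.items():
        in_faulty = [v for v, f in zip(values, faulty) if f]
        in_clean = [v for v, f in zip(values, faulty) if not f]
        t = abs(welch_t(in_faulty, in_clean))
        if t >= t_cut:
            scores[name] = t

    # Step 2: visit metrics from strongest to weakest and keep one only if
    # it is not strongly correlated with a metric that was already kept.
    kept = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if all(abs(pearson(data[name], data[k])) < r_cut for k in kept):
            kept.append(name)
    return kept


# Toy data for eight classes; names and values are purely illustrative.
faulty = [True, True, True, True, False, False, False, False]
metrics = {
    "wmc": [30, 28, 35, 32, 10, 12, 9, 11],
    "loc": [300, 290, 340, 330, 100, 130, 95, 105],
    "noise": [5, 6, 5, 6, 5, 6, 5, 6],
}
selected = select_metrics(metrics, faulty)
print(selected)
```

With this toy data, the `noise` metric fails the t-test screen because its mean is identical in both groups, and one of the two strongly correlated metrics is dropped as redundant. In practice the thresholds would need to be chosen with a proper significance test rather than a fixed cut-off on the statistic.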

Description

Kumar, Lov/0000-0002-0123-7822; Misra, Sanjay/0000-0002-3556-9331; Rath, Santanu/0000-0001-5641-8199

Keywords

Feature selection techniques, Artificial neural network, Ensemble method, Source code metrics, Cost analysis framework

Fields of Science

0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

WoS Q

Q2

OpenCitations Citation Count
54

Source

Computer Standards & Interfaces

Volume

53

Start Page

1

End Page

32

PlumX Metrics
Citations

CrossRef: 54

Scopus: 62

Captures

Mendeley Readers: 67

SCOPUS™ Citations

62

checked on Feb 13, 2026

Web of Science™ Citations

47

checked on Feb 13, 2026

Page Views

2

checked on Feb 13, 2026

OpenAlex FWCI
12.87579188
