Software is now a key functional element of many engineered systems, with applications ranging from refrigerators and washing machines to cars. As the IT industry has advanced the underlying technology, the significance of software engineering has steadily increased. With this growing number of software applications, one vital challenge is to ensure that a system meets its quality specifications and is reliable to use. The quality of the software determines its value, and reliability ensures failure-free operation of the software for a specified period of time under given environmental conditions. Both the quality and the reliability of software depend on software faults: the more faults there are, the lower the reliability of the software and the more effort is required to maintain its quality. Software quality assurance (SQA) can be thought of as an umbrella activity that incorporates various activities to organize and monitor the software development process and to ensure that the final software product is of high quality and reliability (Menzies et al. ). However, SQA is a time-consuming process and requires substantial resources.
Software fault prediction (SFP) can help allocate limited SQA resources in a cost-effective and optimized manner by predicting the fault-proneness of software modules before testing. It is a process that predicts the fault-prone software modules without executing them, using underlying characteristics of the given software system. A typical fault prediction model is built by training a learning technique on a dataset that contains structural properties together with the fault information of a known project. The trained prediction model is subsequently used to predict the fault-proneness of an unknown software project (Arisholm et al. ). Testers and developers can use the quick and early identification of faulty modules that software fault prediction provides to better strategize their software quality assurance efforts.
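The train-then-predict workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited study: the two metrics (lines of code and cyclomatic complexity), the data values, the nearest-centroid learner, and the names fit_centroids and predict_fault_prone are all hypothetical choices made for the example.

```python
from math import dist
from statistics import mean

# Hypothetical training data from a "known" project (values are invented):
# (lines of code, cyclomatic complexity) per module, and 1 if a fault was recorded.
TRAIN_METRICS = [(120, 4), (300, 12), (80, 2), (450, 20), (200, 6), (520, 25)]
TRAIN_FAULTY = [0, 1, 0, 1, 0, 1]


def fit_centroids(metrics, labels):
    """Learn one mean metric vector (centroid) per class after scaling."""
    scale = [max(col) for col in zip(*metrics)]  # per-feature maximum
    scaled = [[v / s for v, s in zip(row, scale)] for row in metrics]
    centroids = {}
    for cls in set(labels):
        rows = [r for r, y in zip(scaled, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return scale, centroids


def predict_fault_prone(model, module_metrics):
    """Return the class (1 = fault-prone) whose centroid is nearest."""
    scale, centroids = model
    point = [v / s for v, s in zip(module_metrics, scale)]
    return min(centroids, key=lambda cls: dist(point, centroids[cls]))


model = fit_centroids(TRAIN_METRICS, TRAIN_FAULTY)
# Modules of an "unknown" project, classified from metrics alone (no execution).
predictions = [predict_fault_prone(model, m) for m in [(100, 3), (480, 22)]]
```

Here the small module is predicted as not fault-prone and the large, complex one as fault-prone, so testing effort could be directed at the latter first. Any other learning technique could take the place of the nearest-centroid rule; it is used here only because it fits in a few lines.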
The lure of detecting faults early and improving system quality has drawn considerable attention from the research community to software fault prediction. A wide range of statistical and machine learning techniques has been used to build fault prediction models and to predict the fault-proneness of software systems under development. Moreover, the availability of open-source and publicly accessible software fault dataset repositories, such as the NASA Metrics Data Program and the PROMISE data repository (PROMISE ), allows researchers to undertake more investigations and opens up new areas of application.
1.1 Software Faults, Errors, and Failure Terminologies
Software faults can arise in any phase of software development, including requirements gathering and specification, design, coding, and maintenance. The nature of a fault depends on its origin. Faults that are carried into the code of the software may lead to system failure if they are not identified and removed correctly. Errors, defects, faults, and failures are interrelated terms whose definitions are often confused (Huizinga and Kolawa ). Here, we follow the IEEE standard 610-1990 to define these terms.
If the programmer omits the semicolon at the end of the third line of code and compiles the program, the compiler reports an error message. This type of mistake is called an error.
Bug: An unexpected result or deviation from the intended functionality, found by the author (who wrote the code) after compiling the program and during any testing phase, is called a bug.
Ex. In the above program, if the author uses some other variable such as d instead of b on the third line, or uses the + operator instead of the / operator, then no compilation error occurs but the program produces unexpected results. This is called a bug.
Exception: An unhandled error that occurs at run-time of the program is called an exception.
Ex. No error is generated at compilation time, but at run-time the program throws an exception for b = 0 (third line).
Fault: An incorrect step, process, or data definition in a computer program that causes the program to perform in an unanticipated manner. It is also commonly known as a defect and is generally found by a moderator (not the author of the code).
Ex. In the above program, if the author uses some other variable such as d instead of b on the third line, or uses the + operator instead of the / operator, then no compilation error occurs but the program produces unexpected results, and this issue is found by a moderator. This is called a fault or defect.
Failure: The inability of software or a software component to perform its required functions within specified performance requirements. In other words, the software does not do what the requirements describe.
Ex. If the occurrence of a fault affects any other part of the code or another module, this condition is called a failure.
Software faults are different from software failures. A software fault is a quality attribute that indicates a condition under which the software may fail to perform its desired functions, whereas software failures are the symptoms revealed when one or more software faults are executed.
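The distinction between a fault, the failure it produces when executed, and a run-time exception can be illustrated with a short Python sketch. The program that the examples above refer to is not reproduced in this text, so the routine below is a hypothetical stand-in: the names ratio and ratio_with_fault and the input values are assumptions made only for illustration.

```python
def ratio(a, b):
    """Intended behaviour: return a divided by b."""
    return a / b


def ratio_with_fault(a, b):
    """Same routine with a fault: + was typed where / was intended."""
    # The line is syntactically valid, so nothing is reported
    # until it is actually executed with real inputs.
    return a + b


# The fault stays dormant until the faulty line runs; the wrong result
# that then reaches the caller is the observable symptom (the failure).
expected = ratio(10, 2)
observed = ratio_with_fault(10, 2)
failure_observed = observed != expected

# Exception: even the correct routine raises at run-time when b == 0.
try:
    ratio(10, 0)
    raised = False
except ZeroDivisionError:
    raised = True
```

In this sketch the fault exists in ratio_with_fault from the moment it is written, but a failure is only observed once the function is called and its result is compared with the required behaviour.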