CN112527675A

CN112527675A - Lightweight software defect prediction method

Info

Publication number: CN112527675A
Application number: CN202011532907.4A
Authority: CN
Inventors: 包嘉盛; 任洪敏
Original assignee: Shanghai Maritime University
Current assignee: Shanghai Maritime University
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-03-19

Abstract

The invention provides a lightweight software defect prediction method, comprising the following steps: S1, acquiring the project code submitted by a user; S2, analyzing the Java code files with the SpotBugs tool; S3, comparing the historical defect data of the same project with the current data; S4, setting a minimum support and a minimum confidence; S5, performing association rule analysis with the FBCM algorithm; S6, rating the project, visualizing the defects and prediction results, and leaving messages on defects. Unlike common code analysis tools, which can only analyze the code submitted in the current run, the method reflects the defect status of a project over a long period, predicts software defects, and enables control and management of the quality of software projects or products.

Description

Lightweight software defect prediction method

Technical Field

The invention relates to the field of software defect prevention within software repository mining, and in particular to a lightweight software defect prediction method.

Background

With the continuous development of the software industry, the complexity of software defects and the ability to prevent them have become increasingly important. The success of a software project depends on its quality, which is closely tied to cost, time and effort. To avoid the large costs caused by low software quality, it is preferable to invest in defect prevention. Software defect prevention is a complex but essential activity related to software testing. When building high-quality software, defect prevention plays an important role in the quality level of the software, and it also facilitates testing and debugging. For large software, finding defects manually is complex and time-consuming because source files are closely interrelated. Software quality is therefore very important, and testing the relevant software and tools becomes crucial for predicting defects. Code defects are common, and many companies use code review and test code to discover them. A good defect prediction method is therefore needed to find defects as early as possible. This not only saves time but also helps ensure that high-quality software is built. Such a method can also help development members learn which mistakes they are prone to make so that they can try to avoid them.

In recent years, various methods have been proposed to detect whether code is defective. Software defect prevention has been studied by mining historical software repositories and using metric data of software modules to discover and pinpoint defective modules in advance, and a range of statistical and data mining methods have been proposed. However, this research is still at an early stage: no general-purpose product or application has been released, and no dedicated data mining system for software defect prevention is available either in China or abroad.

In short, software defect prediction technology is mainly based on basic attributes of software (length, complexity, functions and processes) and uses historical defect data to predict defects that may remain in the software or have not yet been discovered. This lets people understand the quality of the software, decide whether it can be delivered to a client, and even identify possible failure modes in the process. As software permeates every aspect of life, defect prediction faces new challenges: while solving existing problems, it must keep innovating to meet changing requirements, such as adapting to rapidly changing markets, meeting the demands of complex software systems, and reflecting the economic value of software.

Therefore, a good software defect prevention method, together with a corresponding lightweight code quality management system, is of great significance for product development.

Disclosure of Invention

Purpose of the invention: a static code analysis tool only provides analysis results after the code has been analyzed, and developers then modify the code accordingly. Such tools cannot systematically evaluate developers' code, cannot systematically manage code quality, and cannot predict potential defects in the software product under development. Moreover, although many software defect prediction methods exist, building a prediction model is complex and places high demands on the training and test sets, and defects cannot be predicted well across projects, let alone across companies. The commercial code quality management system SonarQube can continuously analyze and evaluate the quality of project source code, but it cannot predict defects and needs more than 4 GB of memory to run smoothly. With the lightweight software defect prediction method and the corresponding implementation system of the invention, development members can learn which defects are common in their development process and which defects are likely to occur next; each project receives a score based on the level and number of its defects; the status of each defect can be tracked; and handling suggestions are provided. This improves the quality of a company's product development and the efficiency of the development process, and reduces the risk of missed defects.

To achieve the above purpose, the invention adopts the following technical scheme:

A code quality management system is used. It is a Web application based on the B/S (browser/server) architecture, and its overall technical structure is divided into four components: a user management component, a defect analysis component, a defect prediction component and a defect display component. The user management component handles user registration and login, assigns roles to employees and manages related settings; each user is assigned a role, each role has different permissions, and employees are managed by assigning different roles and permissions. The defect analysis component analyzes the code of projects submitted by users, generates project analysis reports, compares the defect data in each new report with the historical defect data to identify newly added and repaired defects, and finally stores the results in the database. The defect prediction component analyzes project defects with the FBCM algorithm to generate frequent itemsets and association rules: the frequent itemsets identify the defects that occur most often, and the association rules identify which defects are likely to occur next, which provides the prediction. The defect display component displays the defect information, the proportion of each defect type, and the number and types of defects of a specific project, and shows the defects predicted to appear next. The components interact as follows: the user management component determines, according to each user's permissions, which project results the defect display component shows; the analysis results generated by the defect analysis component are fed into the defect prediction component; and the data generated by the defect prediction component is displayed by the defect display component.

S1, the system acquires the code that the user submits to the code quality management system;

S2, the submitted Java code is analyzed with the static code analysis tool SpotBugs to obtain a code defect analysis report; if the project has previously submitted historical data, only the modified or newly added code files are analyzed;
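The incremental rule in S2 (only new or modified files are re-analyzed when history exists) could be implemented by storing a content hash for each file after every analysis. The following Python sketch is only an illustration under that assumption: the stored path-to-digest mapping and the function names are hypothetical, not part of the described system. Note that SpotBugs analyzes compiled bytecode, so a real implementation would still have to map the selected source files to their class files.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """Return a SHA-256 hex digest identifying one version of a file."""
    return hashlib.sha256(content).hexdigest()


def files_to_analyze(submitted: dict[str, bytes],
                     previous_digests: dict[str, str]) -> list[str]:
    """Select Java source files that are new or changed since the last analysis.

    submitted:        path -> file content of the current submission
    previous_digests: path -> digest stored for the previous submission
                      (empty when the project has no history, so every file is selected)
    """
    selected = []
    for path, content in sorted(submitted.items()):
        if not path.endswith(".java"):
            continue  # only Java sources are considered
        if previous_digests.get(path) != file_digest(content):
            selected.append(path)  # new file, or content modified since last time
    return selected


# Usage: A.java is unchanged and B.java is new, so only B.java is re-analyzed.
history = {"src/A.java": file_digest(b"class A {}")}
submission = {"src/A.java": b"class A {}", "src/B.java": b"class B {}"}
print(files_to_analyze(submission, history))  # ['src/B.java']
```

Hashing content rather than comparing timestamps keeps the decision independent of how the code was uploaded.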

S3, the historical defect data of the project is retrieved from the database, and the defects in the newly generated report are compared with the historical defects to record which defects have been repaired and which are newly added. If a defect in the report already exists in the history and has not yet been handled, its record is kept unchanged; if it is a new defect, its type, location, creation time and code author are stored in the database. If a historically unresolved defect no longer appears in the report, the submitted code has repaired it: the status of that defect is changed to resolved, and the repair time is recorded and stored in the database. The project is then scored according to the types and number of defects: if a high-risk defect occurs, the project receives the lowest rating; if there are only a few minor defects, the score is relatively high;
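The comparison in S3 amounts to set differences between the unresolved historical defects and the new report, followed by a severity-based score. The sketch below assumes a defect is identified by its bug type and a location key, and it uses hypothetical severity levels and penalty weights, since the method does not specify a scoring formula; only the rule that any high-risk defect yields the lowest score comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    bug_type: str   # e.g. a SpotBugs bug pattern code
    location: str   # hypothetical identity key, e.g. "Class.method"
    severity: str   # assumed levels: "high", "medium", "low"


def compare_reports(unresolved: set[Defect], report: set[Defect]):
    """Split defects into new, repaired and still-open sets."""
    new = report - unresolved          # store type, location, time and author
    repaired = unresolved - report     # mark as resolved and record the repair time
    still_open = unresolved & report   # unchanged, stays unresolved
    return new, repaired, still_open


# Hypothetical penalty per open defect; the method does not specify weights.
PENALTY = {"medium": 10, "low": 2}


def rate_project(open_defects: set[Defect]) -> int:
    """Score from 0 to 100; any high-risk defect gives the lowest score."""
    if any(d.severity == "high" for d in open_defects):
        return 0
    return max(0, 100 - sum(PENALTY[d.severity] for d in open_defects))


old = {Defect("NP_NULL_ON_SOME_PATH", "A.load", "high"),
       Defect("DM_STRING_CTOR", "B.name", "low")}
report = {Defect("DM_STRING_CTOR", "B.name", "low"),
          Defect("URF_UNREAD_FIELD", "C.init", "medium")}
new, repaired, still_open = compare_reports(old, report)
print(rate_project(report))  # 88
```

Using a frozen dataclass makes defects hashable, so the three outcomes of S3 fall out of ordinary set operations.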

S4, the user can enter the minimum support or minimum confidence of the association algorithm as required, or use the default settings;

S5, association analysis is performed on the project with the FBCM algorithm, an optimization of the FUP (Fast Update) incremental mining algorithm based on matrix compression, to obtain frequent itemsets and association rules. The database is traversed, and the code defects recorded for the project on each day are taken as one transaction; each transaction is a set of defects, and every single defect forms a candidate 1-itemset (an itemset of length 1). The minimum support and minimum confidence take default values or are user-defined, and support and confidence are calculated as follows:

Support(X→Y) = P(XY) ≥ min_Support;

Confidence(X→Y) = P(Y|X) ≥ min_Confidence;

where min_Support and min_Confidence are set by the user or take their default values;
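Both measures can be computed directly over the daily defect transactions. This minimal sketch treats each transaction as a Python set of defect types; the function names are illustrative. It uses the standard definition of confidence, P(Y|X) = support(X ∪ Y) / support(X), which is not symmetric in X and Y, as the example shows.

```python
def support(itemset, transactions):
    """Fraction of transactions (daily defect sets) that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of X -> Y, i.e. P(Y|X) = support(X ∪ Y) / support(X)."""
    x, y = frozenset(x), frozenset(y)
    sx = support(x, transactions)
    return support(x | y, transactions) / sx if sx else 0.0


def rule_holds(x, y, transactions, min_support, min_confidence):
    """Check both thresholds for the rule X -> Y."""
    return (support(frozenset(x) | frozenset(y), transactions) >= min_support
            and confidence(x, y, transactions) >= min_confidence)


days = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"C"}]
print(confidence({"A"}, {"B"}, days))  # about 0.667
print(confidence({"B"}, {"A"}, days))  # 1.0
```

Here B never appears without A, so B → A holds with full confidence, while A → B does not.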

The candidate 1-itemsets are pruned to obtain the frequent 1-itemsets, which are the defects that occur frequently in the project;
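Generating and pruning the candidate 1-itemsets is a single counting pass. In this sketch, the defect labels "NP", "DM" and "URF" are placeholders, and the minimum support is assumed to be a relative threshold between 0 and 1.

```python
from collections import Counter


def frequent_1_itemsets(daily_defects, min_support):
    """Count candidate 1-itemsets (C1) and keep those meeting min_support (L1).

    daily_defects: list of sets, one set of defect types per day.
    min_support:   relative threshold in [0, 1].
    """
    n = len(daily_defects)
    # Each day is a set, so a defect type is counted at most once per day.
    counts = Counter(item for day in daily_defects for item in day)
    return {frozenset([item]): c for item, c in counts.items() if c / n >= min_support}


days = [{"NP", "DM"}, {"NP"}, {"URF"}, {"NP", "URF"}]
print(frequent_1_itemsets(days, 0.5))  # NP (3 days) and URF (2 days) kept, DM (1 day) pruned
```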

Association rules are mined on the basis of transaction matrices, and during mining both matrices are compressed in each iteration to reduce the search space. Matrix transformation: the original transaction database DB and the incremental database db are converted into two transaction matrices, and each database is scanned once to generate a Boolean matrix;

MatrixDB = convert(DB); Matrixdb = convert(db);
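A possible reading of convert is shown below: rows are transactions (days), columns are defect types, and a cell is 1 when that defect occurred on that day. The row/column orientation and the shared column list are assumptions of this sketch; sharing the columns between DB and db keeps the two matrices aligned so they can be processed with the same item indices.

```python
def convert(transactions, items=None):
    """Build a Boolean transaction matrix: one row per transaction, one column per item.

    Passing the same `items` list for DB and db keeps the columns of both
    matrices aligned.
    """
    if items is None:
        items = sorted({i for t in transactions for i in t})
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


DB = [{"NP", "DM"}, {"NP"}]
db = [{"URF"}]
all_items = sorted({i for t in DB + db for i in t})   # ['DM', 'NP', 'URF']
items, matrix_DB = convert(DB, all_items)             # [[1, 1, 0], [0, 1, 0]]
_, matrix_db = convert(db, all_items)                 # [[0, 0, 1]]
```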

The matrices are compressed according to the frequent 1-itemsets:

MatrixDB = compress1(MatrixDB, L1(DB + db));

Matrixdb = compress1(Matrixdb, L1(DB + db));

The matrices are compressed according to the frequent k-itemsets:

MatrixDB = compress2(MatrixDB, s, k);

Matrixdb = compress2(Matrixdb, s, k);

where s is the support threshold and k indicates that the frequent k-itemsets are being computed;

Items that cannot satisfy the condition are deleted from the Boolean matrix by checking whether the support of each candidate k-itemset reaches the minimum support. In the k-th iteration, a column whose number of 1s (the support count of that item) is less than the minimum support threshold s can be deleted, and a row whose number of 1s is less than k can be deleted;
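The two compression rules can be sketched on a single Boolean matrix as follows. This is an illustration, not the authors' implementation: s is assumed to be an absolute support count, and the sketch does not address how s is chosen separately for MatrixDB and Matrixdb in the incremental case, which the text leaves open. On one matrix both rules are safe: an item with fewer than s occurrences cannot be in any frequent itemset, and a transaction with fewer than k items cannot contain a k-itemset.

```python
def compress1(items, matrix, frequent_items):
    """Keep only the columns whose item is a frequent 1-itemset, i.e. in L1(DB + db)."""
    keep = [j for j, item in enumerate(items) if item in frequent_items]
    return [items[j] for j in keep], [[row[j] for j in keep] for row in matrix]


def compress2(items, matrix, s, k):
    """Compress one Boolean matrix before mining frequent k-itemsets.

    s: minimum support count (absolute number of transactions)
    k: size of the itemsets mined in this iteration
    """
    # A column with fewer than s ones cannot belong to any frequent itemset.
    keep = [j for j in range(len(items)) if sum(row[j] for row in matrix) >= s]
    items = [items[j] for j in keep]
    matrix = [[row[j] for j in keep] for row in matrix]
    # A row with fewer than k ones cannot contain any k-itemset.
    matrix = [row for row in matrix if sum(row) >= k]
    return items, matrix


items = ["a", "b", "c"]
matrix = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 0, 1]]
print(compress2(items, matrix, s=2, k=2))  # (['a', 'b', 'c'], [[1, 1, 0], [1, 1, 1]])
```

Columns are filtered before rows so that the row test only counts items that can still appear in a frequent k-itemset.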


Using the existing frequent itemsets L(DB), the itemsets that were frequent but have become infrequent (called losers) are found and deleted; the remaining itemsets of L(DB) are winners and are added to the overall frequent itemsets L(DB + db). New winners (itemsets that were infrequent and have become frequent) are then found in the incremental database db and added to L(DB + db). A new frequent itemset can only come from the original frequent itemsets L(DB) or from the frequent itemsets L(db) of the newly added data db. The computation proceeds level by level from the frequent 1-itemsets up to the maximal frequent itemsets;
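The winner/loser update can be illustrated with a simplified FUP-style procedure. This sketch is not FBCM itself: it omits the matrix compression, counts itemsets with plain set operations, and mines db directly with a small Apriori-style helper instead of generating candidates level by level as the original FUP does. It relies on the property stated above: an itemset frequent in DB + db must be frequent in DB or in db, so only new candidates from L(db) require a rescan of DB.

```python
def count(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) mining; returns {itemset: support count}."""
    n = len(transactions)
    result = {}
    level = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while level:
        counts = {c: count(c, transactions) for c in level}
        frequent = {c: v for c, v in counts.items() if v >= min_support * n}
        result.update(frequent)
        k += 1
        level = {a | b for a in frequent for b in frequent if len(a | b) == k}
    return result


def fup_update(old_frequent, DB, db, min_support):
    """Update L(DB) to L(DB + db) in the spirit of FUP.

    old_frequent: {itemset: count in DB} for all frequent itemsets of DB.
    """
    threshold = min_support * (len(DB) + len(db))
    updated = {}
    # 1. Re-check L(DB): winners stay frequent, losers are dropped.
    for itemset, old_count in old_frequent.items():
        total = old_count + count(itemset, db)
        if total >= threshold:
            updated[itemset] = total
    # 2. New winners must be frequent in db; only these need a scan of DB.
    for itemset, db_count in frequent_itemsets(db, min_support).items():
        if itemset not in old_frequent:
            total = count(itemset, DB) + db_count
            if total >= threshold:
                updated[itemset] = total
    return updated


DB = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
db = [{"c", "d"}, {"c", "d"}]
L_DB = frequent_itemsets(DB, 0.5)        # a, b, c and {a, b}
L_all = fup_update(L_DB, DB, db, 0.5)    # {a, b} becomes a loser; a, b, c remain
print(L_all)
```

The result matches mining DB + db from scratch, while the old database is rescanned only for the few candidates that were not already in L(DB).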

The association rules generated by the algorithm are used to predict defects that may occur next: after the user submits the project code, if the defects that appear match the antecedent of an existing association rule, the consequent derived from that rule is a likely code defect, so software defects can be prevented;
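Rule generation and prediction can be sketched as follows, assuming the frequent itemsets are available as a dictionary of support counts that contains every frequent itemset (so every subset of a frequent itemset is also a key). The defect labels are placeholders; a rule X → Y fires when all defects in X appear in the current report, and Y is then reported as likely to occur.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Derive rules X -> Y from frequent itemsets.

    frequent: {itemset: support count} containing every frequent itemset, so every
    subset of a frequent itemset is also a key (downward closure).
    """
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                conf = count / frequent[antecedent]  # P(Y|X) = sup(X ∪ Y) / sup(X)
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


def predict(current_defects, rules):
    """Defect types implied by rules whose antecedent occurs in the current report."""
    current = frozenset(current_defects)
    predicted = set()
    for antecedent, consequent, _ in rules:
        if antecedent <= current:
            predicted |= consequent - current  # only report defects not already present
    return predicted


frequent = {frozenset({"NP"}): 4, frozenset({"DM"}): 3, frozenset({"NP", "DM"}): 3}
rules = generate_rules(frequent, min_confidence=0.7)
print(predict({"NP"}, rules))  # {'DM'}
```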

S6, the generated frequent itemsets and association rules are stored in the database, and the defect types, detailed information and defect-type statistics of the project analysis are returned to the front-end analysis interface for the user to view.

Compared with the prior art, the lightweight software defect prediction method and the system have the following advantages:

Existing static code quality analysis tools can only analyze the code of the current submission and generate a software defect report; they cannot reflect the defect status of a project over a long period, nor can they handle and predict defects well. Existing code quality management platforms vary in quality: some lack complete functionality, and others place high demands on hardware and cannot be used easily. Currently, most research results on software defect prediction come from empirical studies. Although many software defect prediction models have been implemented, most research remains at the theoretical stage and has not been applied to real cases. In view of this, the requirements of a software defect prediction system were determined through in-depth study of the theory and practice of defect prediction, and the system was designed and implemented with the aim of putting software defect prediction theory into practical use. The code quality management system can effectively record software defects, track repair progress, compile statistics and analyses of defects along each dimension, and ultimately control and manage the quality of software projects or products, ensuring normal production capacity and system operation. Its application improves the working efficiency of managers and has good social and economic value.

Drawings

To illustrate the technical solution of the present invention more clearly, the drawings used in the description are briefly introduced below. Obviously, the drawings described below show only one embodiment of the present invention, and those skilled in the art can derive other drawings from them without creative effort:

FIG. 1 is a flow chart of an implementation of the lightweight software defect prediction method of the present invention.

FIG. 2 is a system architecture diagram provided by the lightweight software defect prediction method of the present invention.

FIG. 3 is a flow chart of data mining provided by the lightweight software defect prediction method of the present invention.

FIG. 4 is a visual interface effect diagram of the lightweight software defect prediction method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

To achieve the above aim, the invention provides a lightweight software defect prediction method that uses a code quality management system. The system is a Web application based on the B/S (browser/server) architecture, and its overall technical structure is divided into four components: a user management component, a defect analysis component, a defect prediction component and a defect display component. The user management component handles user registration and login, assigns roles to employees and manages related settings; each user is assigned a role, each role has different permissions, and employees are managed by assigning different roles and permissions. The defect analysis component analyzes the code of projects submitted by users, generates project analysis reports, compares the defect data in each new report with the historical defect data to identify newly added and repaired defects, and finally stores the results in the database. The defect prediction component analyzes project defects with the FBCM algorithm to generate frequent itemsets and association rules: the frequent itemsets identify the defects that occur most often, and the association rules identify which defects are likely to occur next, which provides the prediction. The defect display component displays the defect information, the proportion of each defect type, and the number and types of defects of a specific project, and shows the defects predicted to appear next. The user management component determines, according to each user's permissions, which project results the defect display component shows; the analysis results generated by the defect analysis component are fed into the defect prediction component; and the data generated by the defect prediction component is displayed by the defect display component. The method comprises steps S1 to S6 set out above, as shown in FIG. 1, using the following formulas:

Support(X→Y) = P(XY) ≥ min_Support;

Confidence(X→Y) = P(Y|X) ≥ min_Confidence;

MatrixDB = convert(DB); Matrixdb = convert(db);

Compressing the matrices according to the frequent k-itemsets: MatrixDB = compress2(MatrixDB, s, k);

As shown in FIG. 2, the system structure of the present invention includes a login module, a project submission module, a project rating and information module, a project defect chart display module, a project defect prediction module, a project defect message module, a user management module, a defect data analysis and integration module, a defect prediction module, a module management module and a log management module.

The login module handles user registration and login. The user management module assigns roles to employees and manages related settings; each user is assigned a role, each role has different permissions, and employees are managed by assigning different roles and permissions.
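The role-and-permission scheme can be pictured as a small role-based access check that decides which project results a user may see. The sketch below is hypothetical throughout: the permission strings, project names and the one-permission-per-project mapping are illustrative assumptions, since the document does not define how permissions are named or stored.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    permissions: set[str] = field(default_factory=set)


@dataclass
class User:
    name: str
    role: Role  # each user is assigned one role


def visible_projects(user: User, project_permissions: dict[str, str]) -> list[str]:
    """Projects whose required permission is granted by the user's role.

    project_permissions: project name -> permission needed to view its results
    (a hypothetical scheme; the document does not define permission names).
    """
    return sorted(p for p, needed in project_permissions.items()
                  if needed in user.role.permissions)


projects = {"billing": "view:billing", "portal": "view:portal"}
developer = Role("developer", {"view:portal"})
manager = Role("manager", {"view:billing", "view:portal"})
print(visible_projects(User("alice", developer), projects))  # ['portal']
print(visible_projects(User("bob", manager), projects))      # ['billing', 'portal']
```

Attaching permissions to roles rather than to individual users means an employee's access changes by reassigning a single role.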

The project submission module lets users submit the code of their projects. The system accepts Java projects, since Java remains a mainstream language and is widely used in enterprise back-end projects.

The project rating and information module displays the quality of each project. Every project is rated according to the types and number of its defects, and the rating changes as the project is subsequently submitted and modified, so it reflects the current code quality of the project. The code submission information of developers can also be viewed, namely who submitted code to the project, when, and how many defects were introduced and repaired. The project defect message module lets development members leave messages about a defect, such as its cause, a solution, and an estimated resolution time.

The project defect chart display module displays the defect information, the proportion of each defect type, and the number and types of defects of a specific project, together with the defects predicted to appear next, as shown in FIG. 4.

The defect data analysis and integration module analyzes the project in the background with the SpotBugs code analysis tool, generates a project analysis report, compares the defect data in the new report with the historical defect data to identify newly added and repaired defects, and finally stores the results in the database.

The defect prediction module analyzes project defects in the background with the FBCM algorithm to generate frequent itemsets and association rules. The frequent itemsets identify the defects that occur most often, and the association rules identify which defects are likely to occur next; the data mining flow is shown in FIG. 3.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A lightweight software defect prediction method using a code quality management system, wherein the code quality management system is a Web application based on the B/S (browser/server) architecture and its overall technical structure is divided into four components, namely a user management component, a defect analysis component, a defect prediction component and a defect display component; the user management component handles user registration and login, assigns roles to employees and manages related settings, each user is assigned a role, each role has different permissions, and employees are managed by assigning different roles and permissions; the defect analysis component analyzes the code of projects submitted by users, generates project analysis reports, compares the defect data in each new report with the historical defect data to identify newly added and repaired defects, and finally stores the results in the database; the defect prediction component analyzes project defects with the FBCM algorithm to generate frequent itemsets and association rules, the frequent itemsets identifying the defects that occur most often and the association rules identifying which defects are likely to occur next, thereby providing the prediction; the defect display component displays the defect information, the proportion of each defect type, and the number and types of defects of a specific project, and shows the defects predicted to appear next; the user management component determines, according to each user's permissions, which project results the defect display component shows; the analysis results generated by the defect analysis component are fed into the defect prediction component; and the data generated by the defect prediction component is displayed by the defect display component; the lightweight software defect prediction method is characterized by comprising the following steps:

Support(X→Y) = P(XY) ≥ min_Support;

Confidence(X→Y) = P(Y|X) ≥ min_Confidence;

MatrixDB = convert(DB); Matrixdb = convert(db);

calculating the rules whose confidence is not less than the minimum confidence to generate association rules for predicting defects that may occur next; after a user submits the code of a project, if the defects that appear match an existing association rule, the result derived from that rule is a likely code defect, so that software defects can be prevented.