CN113791976B

CN113791976B - Method and device for enhancing defect positioning based on program dependence

Info

Publication number: CN113791976B
Application number: CN202111056342.1A
Authority: CN
Inventors: 张天; 潘敏学; 罗雯波
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2021-09-09
Filing date: 2021-09-09
Publication date: 2023-06-20
Anticipated expiration: 2041-09-09
Also published as: CN113791976A

Abstract

The invention discloses a method and a device for enhancing defect localization based on program dependence. The method analyzes the statement data dependency relationships of the source code and combines them with a statement defect suspicion list, obtained by analyzing the source code with an existing software tool, to form for each statement an eigenvalue vector (feature vector) composed of N+1 suspicion values. The eigenvalue vector serves as the sample for each statement of the source code and is input into a support vector machine for analysis, yielding an optimized defect suspicion list. The localization result is thereby enhanced and defect localization becomes more accurate.

Description

Method and device for enhancing defect positioning based on program dependence

Technical Field

The present invention relates to program code analysis technology, and more particularly to defect analysis technology of source code of program software.

Background

Software inevitably has defects. In software product projects, the testing period is often much longer than the development period, because repairing the defects found during testing requires a great deal of labor. In defect repair, a primary difficulty is defect localization, i.e., finding where in the software code the defect lies. Especially when the software is large, finding the defective code takes a lot of time.

Automated software defect localization methods and tools are therefore gaining importance. In the prior art, defect localization is mainly achieved in two ways: the first traverses the program code with a static inspection tool to find suspicious defects; the second identifies statements suspected of being defective from the statement coverage information collected during test execution, as in automated program-spectrum-based defect localization tools, of which Ochiai is representative.

Disclosure of Invention

The problem to be solved by the invention is: to further analyze the suspected defective statement locations found by existing tools, so that the suspected locations become more accurate.

In order to solve the problems, the invention adopts the following scheme:

the method for enhancing defect positioning based on program dependence comprises the following steps: a data acquisition step, a dependency analysis step, a model training step and a defect analysis step;

the data acquisition step is used for: acquiring training data and data to be evaluated;

the training data comprises training source code, a training defect suspicion list and a known defect table;

the data to be evaluated comprises a source code to be evaluated and a defect suspicion list to be evaluated;

The training defect suspicion list and the defect suspicion list to be evaluated are both defect suspicion lists;

the dependency analysis step: constructing a control flow graph according to an input source code, and then constructing statement data dependency relations among nodes of the control flow graph through data flow analysis of the control flow graph;

the model training step comprises the following steps:

step ST1: analyzing sentence data dependency relationship of the training source code through the dependency analysis step to obtain training sentence data dependency relationship;

step ST2: constructing a corresponding defect localization training sample for each statement of the training source code according to the training statement data dependency relationship, the training defect suspicion list and the known defect table; the defect localization training sample comprises an eigenvalue vector and a defect label;

step ST3: inputting the eigenvalue vector and the defect label in each defect positioning training sample into a support vector machine for training;

the defect analysis step includes the steps of:

step SA1: analyzing the sentence data dependency relationship of the source code to be evaluated through the dependency analysis step to obtain the sentence data dependency relationship to be evaluated;

step SA2: constructing a corresponding defect localization evaluation sample for each statement of the source code to be evaluated according to the statement data dependency relationship to be evaluated and the defect suspicion list to be evaluated; the defect localization evaluation sample comprises an eigenvalue vector;

step SA3: inputting the eigenvalue vector in each defect localization evaluation sample into the support vector machine trained in the model training step for evaluation, to obtain a new defect suspicion list;

in the above steps,

the defect suspicion list is a set of suspicious defect localization information;

the suspicious defect localization information comprises at least a statement position and a suspicion value;

the known defect table is a set of known defect localization information;

the known defect localization information comprises at least sentence positions;

the statement position is used for indicating the position of the current statement in the source code;

the defect label is used for indicating whether the current sentence has a known defect or not, and is determined according to the known defect table;

the eigenvalue vector, i.e., the feature vector of a sample, is a vector composed of N+1 suspicion values; the first is the suspicion value of the current statement, and the other N are the suspicion values of the N statements with the highest suspicion values among the statements that have a dependency relationship with the current statement;

The dependency relationship is determined according to the statement data dependency relationship;

the suspicion value of a statement is determined according to the defect suspicion list.

Further, according to the method for enhancing defect localization based on program dependency according to the present invention, the dependency analysis step comprises the steps of:

step SY1: constructing a control flow graph according to the input source code;

in the control flow graph, each node corresponds to a source code statement;

step SY2: analyzing the variables involved in each node of the control flow graph to obtain a set of node variable information; each item of node variable information corresponds to one variable and comprises variable information, a variable value type and a statement position; the variable value type is either change or reference: if the corresponding variable is changed in the corresponding statement, the variable value type is change; if the corresponding variable is used in the corresponding statement, the variable value type is reference;

step SY3: combining the node variable information sets of the corresponding precursor nodes of each node in the control flow graph to the current node in an iterative mode to form a new node variable information set until the node variable information sets of each node are not changed;

Step SY4: extracting a set of node variable information of each node, removing node variable information with variable value types as references, and forming data dependency information corresponding to each node as statement data dependency relation after simplifying processing;

the data dependency information is a set of variable value change information corresponding to variables stored in the current node;

the variable value change information comprises change position information;

the change position information represents the statement position in a preamble node at which the variable value of the variable is changed;

the preamble node refers to a node positioned before the current node on a path represented by the control flow graph;

the precursor node refers to a preamble node directly connected to the current node.

Further, according to the method for enhancing defect localization based on program dependency of the present invention, in the step SY4, the data dependency information corresponding to each node is converted into a statement list having a dependency relationship with the current statement as its statement data dependency relationship.

An apparatus for enhancing defect localization based on program dependency according to the present invention comprises: a data acquisition module, a dependency analysis module, a model training module and a defect analysis module;

The data acquisition module is used for: acquiring training data and data to be evaluated;

the dependency analysis module: constructing a control flow graph according to an input source code, and then constructing statement data dependency relations among nodes of the control flow graph through data flow analysis of the control flow graph;

the model training module comprises the following modules:

module MT1: analyzing sentence data dependency relationship of the training source code through the dependency analysis module to obtain training sentence data dependency relationship;

module MT2: constructing a corresponding defect localization training sample for each statement of the training source code according to the training statement data dependency relationship, the training defect suspicion list and the known defect table; the defect localization training sample comprises an eigenvalue vector and a defect label;

module MT3: inputting the eigenvalue vector and the defect label in each defect positioning training sample into a support vector machine for training;

The defect analysis module comprises the following modules:

module MA1: analyzing the sentence data dependency relationship of the source code to be evaluated through the dependency analysis module to obtain the sentence data dependency relationship to be evaluated;

module MA2: constructing a corresponding defect localization evaluation sample for each statement of the source code to be evaluated according to the statement data dependency relationship to be evaluated and the defect suspicion list to be evaluated; the defect localization evaluation sample comprises an eigenvalue vector;

module MA3: inputting the eigenvalue vector in each defect localization evaluation sample into the support vector machine trained by the model training module for evaluation, to obtain a new defect suspicion list;

in the above modules,

the defect suspicion list is a set of suspicious defect localization information;

the known defect table is a set of known defect localization information;

Further, according to the apparatus for enhancing defect localization based on program dependency of the present invention, the dependency analysis module includes:

module MY1: constructing a control flow graph according to the input source code;

in the control flow graph, each node corresponds to a source code statement;

module MY2: analyzing the variables involved in each node of the control flow graph to obtain a set of node variable information; each item of node variable information corresponds to one variable and comprises variable information, a variable value type and a statement position; the variable value type is either change or reference: if the corresponding variable is changed in the corresponding statement, the variable value type is change; if the corresponding variable is used in the corresponding statement, the variable value type is reference;

Module MY3: combining the node variable information sets of the corresponding precursor nodes of each node in the control flow graph to the current node in an iterative mode to form a new node variable information set until the node variable information sets of each node are not changed;

module MY4: extracting a set of node variable information of each node, removing node variable information with variable value types as references, and forming data dependency information corresponding to each node as statement data dependency relation after simplifying processing;

the variable value change information comprises change position information;

the precursor node refers to a preamble node directly connected to the current node.

Further, according to the apparatus for enhancing defect localization based on program dependency of the present invention, the module MY4 converts the data dependency information corresponding to each node into a list of the statements that have a dependency relationship with the current statement, which serves as its statement data dependency relationship.

The invention has the following technical effect: machine learning is used to jointly analyze the original defect localization result and the data dependency relationships of the program, which enhances the localization result and makes defect localization more accurate.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

Where SY is the dependency analysis step, ST is the model training step, SA is the defect analysis step.

FIG. 2 shows example source code of an embodiment of the invention.

FIG. 3 is a control flow graph constructed from the example source code of FIG. 2 in accordance with an embodiment of the invention.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings.

As shown in fig. 1, the method for enhancing defect localization based on program dependency of the present invention mainly includes a model training step ST and a defect analysis step SA.

The foregoing data acquisition step represents the input of the present invention; how the data is acquired, and in what form, need not be described in detail. The input of the invention is: training data and data to be evaluated. The training data is used for the model training step ST and comprises training source code, a training defect suspicion list and a known defect table. The data to be evaluated is used for the defect analysis step SA and comprises the source code to be evaluated and a defect suspicion list to be evaluated.

The training source code and the source code to be evaluated are program source code, which may be written in various programming languages, such as C, C++, Java, Ada, Go or Python; the programming language is not limited. It should be emphasized, however, that the training source code and the source code to be evaluated generally need to be written in the same programming language.

The training defect suspicion list and the defect suspicion list to be evaluated are both defect suspicion lists. A defect suspicion list is a set of suspicious defect localization information, each item of which includes at least a statement position and a suspicion value. The statement position indicates where the current statement is located in the source code, typically by file name and line number, optionally combined with information such as the enclosing function, class or method. The suspicion value is a measure ranging from 0 to 1: a value of 1 indicates that the statement is defective, and a value of 0 indicates that it is not. Defect suspicion lists are typically generated by other software tools, such as the aforementioned Ochiai.
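As an illustration of how such a list might be produced, the following minimal sketch computes suspicion values with the standard Ochiai coefficient from spectrum-based fault localization. The file name `fib.cpp`, the coverage counts and the dictionary layout are hypothetical and are not taken from the patent, which treats the list as an external input.

```python
import math

def ochiai(failed_cover: int, passed_cover: int, total_failed: int) -> float:
    """Ochiai suspicion value of one statement, in the range [0, 1].

    failed_cover: number of failing tests that execute the statement
    passed_cover: number of passing tests that execute the statement
    total_failed: number of failing tests in the whole test suite
    """
    denominator = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denominator if denominator else 0.0

# Hypothetical coverage counts: (failed_cover, passed_cover) per statement position.
coverage = {
    ("fib.cpp", 10): (2, 1),
    ("fib.cpp", 11): (2, 6),
    ("fib.cpp", 16): (0, 8),
}

# Defect suspicion list: statement position -> suspicion value.
suspicion_list = {
    position: ochiai(failed, passed, total_failed=2)
    for position, (failed, passed) in coverage.items()
}
```

A statement executed only by failing tests receives the value 1, and a statement never executed by a failing test receives 0, which matches the value range described above.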

The known defect table is a set of known defect localization information; the known defect localization information includes at least the statement position. Of course, those skilled in the art will understand that the known defect table may be directly merged with the training defect suspicion list, in which case the items with a suspicion value of 1 in the merged list correspond to the known defects. The two are described separately here only for convenience of description.

The model training step ST includes the steps of:

Step ST1, statement data dependency analysis of the training source code: analyzing the statement data dependency relationships of the training source code to obtain the training statement data dependency relationship.

Step ST2, construction of the defect localization training samples: constructing a corresponding defect localization training sample for each statement of the training source code according to the training statement data dependency relationship, the training defect suspicion list and the known defect table; each defect localization training sample includes an eigenvalue vector and a defect label. The defect label indicates whether the current statement has a known defect; it is determined according to the known defect table and is generally represented by 0 or 1. The eigenvalue vector, i.e., the feature vector of the sample, is a vector composed of N+1 suspicion values: the first is the suspicion value of the current statement, and the other N are the suspicion values of the N statements with the highest suspicion values among the statements that have a dependency relationship with the current statement. The dependency relationship here is determined by the input statement data dependency relationship, and the suspicion value of a statement is determined from the input defect suspicion list.
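The construction of a training sample described in step ST2 can be sketched as follows. The function names and the dictionary layout are hypothetical. The text does not say what happens when a statement has fewer than N dependent statements or is missing from the suspicion list; this sketch assumes the value 0 in both cases.

```python
def feature_vector(stmt, dependencies, suspicion, n):
    """Return [own value, top-n values among dependent statements] (length n + 1)."""
    own = suspicion.get(stmt, 0.0)  # assumption: unlisted statements count as 0
    dep_values = sorted(
        (suspicion.get(dep, 0.0) for dep in dependencies.get(stmt, ())),
        reverse=True,
    )
    top = dep_values[:n]
    top += [0.0] * (n - len(top))  # assumption: pad with 0 when fewer than n dependencies
    return [own] + top

def training_sample(stmt, dependencies, suspicion, known_defects, n):
    """Pair the feature vector with the 0/1 defect label from the known defect table."""
    label = 1 if stmt in known_defects else 0
    return feature_vector(stmt, dependencies, suspicion, n), label

# Hypothetical inputs keyed by line number.
suspicion = {2: 0.1, 10: 0.8, 11: 0.5, 12: 0.6}
dependencies = {12: [10, 11, 2], 2: []}
sample = training_sample(12, dependencies, suspicion, known_defects={12}, n=2)
```

Here statement 12 depends on statements 10, 11 and 2, so with N = 2 its vector holds its own value followed by the two largest values among its dependencies.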

Step ST3, training of the support vector machine: inputting the eigenvalue vector and the defect label of each defect localization training sample into a support vector machine for training.
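The patent does not specify the kernel, the hyperparameters or a particular library for the support vector machine. Purely as an illustration, the sketch below trains a linear SVM by subgradient descent on the regularised hinge loss; the function names, the learning rate, the regularisation strength and the sample values are all assumptions.

```python
def train_linear_svm(samples, epochs=500, lr=0.1, lam=0.01):
    """Train a linear SVM by subgradient descent on the regularised hinge loss.

    samples: list of (feature_vector, label) with label 1 (defective) or 0 (clean).
    Returns (weights, bias).
    """
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, label in samples:
            y = 1.0 if label == 1 else -1.0  # SVM convention: labels in {-1, +1}
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Sample violates the margin: take a hinge-loss subgradient step.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regulariser shrinks the weights.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def decision_value(model, x):
    """Signed score of x under the trained model; positive means 'defective'."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical training samples with N = 1: [own value, highest dependent value].
samples = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
model = train_linear_svm(samples)
```

In practice an established SVM implementation would normally be preferred; the hand-written trainer only keeps the sketch free of external dependencies.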

The defect analysis step SA includes the steps of:

Step SA1, statement data dependency analysis of the source code to be evaluated: analyzing the statement data dependency relationships of the source code to be evaluated to obtain the statement data dependency relationship to be evaluated.

Step SA2: constructing a corresponding defect positioning evaluation sample for each sentence of the source code to be evaluated according to the sentence data dependency relationship to be evaluated and the defect suspicion list to be evaluated; the defect localization evaluation samples include eigenvalue vectors.

Step SA3: inputting the eigenvalue vector of each defect localization evaluation sample into the support vector machine trained in the model training step for evaluation, to obtain a new defect suspicion list.
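How the output of the support vector machine is turned into the new defect suspicion list is not detailed in the text. One plausible reading, sketched below, scores every evaluation sample and sorts the statements by that score. The function `rank_statements`, the fixed weights standing in for a trained model, and the sample values are all hypothetical.

```python
def rank_statements(eval_samples, score):
    """Build a new suspicion list from evaluation samples.

    eval_samples: {statement position: feature vector}
    score: callable mapping a feature vector to a number (higher = more suspicious)
    Returns (position, score) pairs, most suspicious first.
    """
    scored = [(position, score(vector)) for position, vector in eval_samples.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Stand-in for a trained model: hypothetical fixed weights and bias.
weights, bias = [1.5, 1.0], -1.2

def score(vector):
    return sum(w * x for w, x in zip(weights, vector)) + bias

# Hypothetical evaluation samples with N = 1, keyed by line number.
eval_samples = {10: [0.8, 0.6], 11: [0.5, 0.8], 16: [0.1, 0.6]}
new_suspicion_list = rank_statements(eval_samples, score)
```

Using the signed decision value as the new suspicion measure is an assumption; a probability estimate from the classifier would serve equally well for ranking.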

In the above steps, step ST1 and step SA1 are substantially the same and differ only in the program source code input to them. In the present embodiment, both are realized by the dependency analysis step SY, which in an implementation may be realized as a routine invoked by a function call. The dependency analysis step SY analyzes the data dependency relationships among statements and outputs the statement data dependency relationship. The training statement data dependency relationship and the statement data dependency relationship to be evaluated are both statement data dependency relationships output by the dependency analysis step SY.

The dependency analysis step constructs a control flow graph from the input source code and then, through data flow analysis of the control flow graph, constructs the statement data dependency relationships among the nodes of the control flow graph. The data between statements here refers to program variables, and the data flow refers to how those program variables are changed and referenced. The aforementioned statement data dependency relationship is thus a dependency relationship between the variables of statements. In this embodiment, the dependency analysis step is implemented as follows:

step SY1: constructing a control flow graph according to the input source code;

step SY2: analyzing the related variables for each node in the control flow graph to obtain a set of node variable information;

step SY3: combining the node variable information sets of the corresponding precursor nodes of each node in the control flow graph into the current node in an iterative mode to form a new node variable information set until the node variable information sets of each node are not changed;

step SY4: extracting the set of node variable information of each node, removing the node variable information whose variable value type is reference, and, after simplification, forming the data dependency information corresponding to each node as the statement data dependency relationship.

In step SY1, the construction of a control flow graph from source code is familiar to those skilled in the art. It should be noted that, in a conventional control flow graph, each node is a basic block of the program. For example, the code illustrated in FIG. 2 is a program segment written in C++ that lists the first 100 values of the Fibonacci sequence. Lines 1, 2, 3 and 4 form one basic block and thus one node of a conventional control flow graph, and lines 10, 11, 12 and 13 form another basic block and thus another node. In this embodiment, such basic blocks are split into separate statements for ease of analysis and localization; that is, each statement is taken as a node of the control flow graph. Thus, the basic block formed by lines 1, 2, 3 and 4 is decomposed into four nodes of the control flow graph, and the basic block formed by lines 10, 11, 12 and 13 is likewise decomposed into four nodes. Referring to FIG. 3, the control flow graph constructed in this embodiment uses statements as nodes. Since in the example code each statement occupies one line and each line contains only one statement, the control flow graph illustrated in FIG. 3 uses line numbers as node identifiers: the node identifiers Line1 to Line16 are the line numbers of the corresponding statements.
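The decomposition of basic blocks into statement nodes can be sketched as follows. The block identifiers and the edges between blocks are hypothetical; only the grouping of lines 1 to 4 and lines 10 to 13 into basic blocks comes from the text, since the code of FIG. 2 is not reproduced here.

```python
def split_blocks(blocks, block_edges):
    """Turn a basic-block control flow graph into a statement-level one.

    blocks: {block id: [line numbers in execution order]}
    block_edges: iterable of (from block, to block)
    Returns a set of (from line, to line) edges with one node per statement.
    """
    edges = set()
    for lines in blocks.values():
        # Chain consecutive statements inside a block.
        edges.update(zip(lines, lines[1:]))
    for src, dst in block_edges:
        # Connect the last statement of the source block to the first of the target.
        edges.add((blocks[src][-1], blocks[dst][0]))
    return edges

# Hypothetical block structure and block-level edges.
blocks = {"B1": [1, 2, 3, 4], "B2": [5], "B3": [10, 11, 12, 13]}
stmt_edges = split_blocks(blocks, [("B1", "B2"), ("B2", "B3"), ("B3", "B2")])
```

Each four-line block contributes three internal edges, so a defect can later be attributed to a single line rather than to a whole block.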

In step SY2, each item of node variable information corresponds to one variable and includes variable information, a variable value type and a statement position. The variable information is generally given by the variable name; when the code base is large, function names, method names and class names can be combined with it to identify the variable to which the node variable information corresponds. For example, if the code in FIG. 2 belongs to a function named Fabonacci100, the variables a, b, i and t in the code may be identified by the variable information fabonacci100_a, fabonacci100_b, fabonacci100_i and fabonacci100_t respectively. The variable value type is either change or reference: if the corresponding variable is assigned or created in the corresponding statement, the variable value type is change; if the corresponding variable is used in the corresponding statement, the variable value type is reference. For example, if the statement corresponding to node Line2 is a=1, the set of node variable information of that node may be represented as { a, change, line2 }, where a is the variable name, change indicates that the value of variable a is assigned, and therefore changed, in the statement, and line2 is the statement position given by the line number.
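The classification into change and reference can be illustrated with Python's standard `ast` module, using Python source as a stand-in for the C++ code of FIG. 2. This is only a sketch under that assumption: the function name and the three-line input are hypothetical, and names of called functions would also be reported as references.

```python
import ast

def node_variable_info(source):
    """Return {line number: set of (variable, kind, line number)} for Python source."""
    info = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Name):
            continue
        if isinstance(node.ctx, ast.Store):
            kind = "change"      # the variable is assigned in this statement
        elif isinstance(node.ctx, ast.Load):
            kind = "reference"   # the variable is only read in this statement
        else:
            continue             # deletions are ignored in this sketch
        info.setdefault(node.lineno, set()).add((node.id, kind, node.lineno))
    return info

# Hypothetical three-statement input, one statement per line.
info = node_variable_info("a = 1\nt = a + b\nb = a\n")
```

For the first line this yields the single entry `("a", "change", 1)`, which corresponds to the { a, change, line2 } notation used above for the statement a=1.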

For simplicity of processing, in this embodiment the control flow graph takes a function or class method as its basic unit. If a call to a function or method appears in the code, the call is treated as a statement and a node is constructed for it. At the call, the arguments passed to the function or method are treated as being assigned in that statement, because the arguments may be assigned during the execution of the function or method. In addition, a function or method call may involve global variables or class members. One approach is to add the node variable information corresponding to all global variables and class members to the set of node variable information of the statement; another is to add only the node variable information corresponding to the global variables and class members actually involved in the function or method.

Further, it should be noted that not every statement involves a variable. For example, the statements corresponding to nodes Line4 and Line16 are calls to the function print that involve no variable, so the corresponding sets of node variable information are empty.

Step SY3 is a loop iteration process, for which step SY2 serves as the initialization of the sets of node variable information. In each iteration of step SY3, every node is traversed according to the control flow graph, and the sets of node variable information of the precursor nodes of each node are merged into the current node to form a new set of node variable information. A precursor node is a preamble node directly connected to the current node, and a preamble node is a node located before the current node on a path represented by the control flow graph. For example, in the control flow graph illustrated in FIG. 3, the precursor nodes of node Line5 are node Line4, node Line8 and node Line13.
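The iteration of step SY3 amounts to a fixed-point computation, which can be sketched as follows. The function name, the data layout and the three-node graph are hypothetical. As in the description, the sets of the precursor nodes are simply merged, and scope handling is left out of this sketch.

```python
def propagate(predecessors, initial):
    """Merge precursor sets into each node until no set changes (step SY3).

    predecessors: {node: [precursor nodes]}
    initial: {node: set of (variable, kind, position)} as produced by step SY2
    """
    info = {node: set(items) for node, items in initial.items()}
    changed = True
    while changed:
        changed = False
        for node in info:
            merged = set(info[node])
            for pred in predecessors.get(node, ()):
                merged |= info[pred]
            if merged != info[node]:
                info[node] = merged
                changed = True
    return info

# Hypothetical three-statement loop: 1 -> 2 -> 3 -> 2.
predecessors = {1: [], 2: [1, 3], 3: [2]}
initial = {
    1: {("a", "change", 1)},
    2: {("a", "reference", 2)},
    3: {("a", "change", 3)},
}
result = propagate(predecessors, initial)
```

Because the sets only grow and the number of possible entries is finite, the loop is guaranteed to terminate; the back edge from node 3 to node 2 is what makes more than one pass necessary in general.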

Take the code illustrated in fig. 2 and the control flow graph illustrated in fig. 3 as examples. After step SY2, the node variable information sets of each node of the control flow graph are respectively:

node Line1: { a, change, line1}, { b, change, line1 };

node Line2: { a, change, line2 };

node Line3: { b, change, line3 };

node Line4: { };

node Line5: { i, change, line5 };

node Line7: { a, reference, line7 };

node Line8: { a, reference, line8 };

node Line10: { a, reference, line10}, { b, reference, line10}, { t, change, line10 };

node Line11: { a, reference, line11}, { b, change, line11 };

node Line12: { t, reference, line12}, { a, change, line12 };

node Line13: { i, reference, line13}, { a, reference, line13 };

node Line16: {}.

After the first round of traversal of the control flow graph in step SY3, the node variable information sets of the nodes of the control flow graph are respectively:

node Line1: { a, change, line1}, { b, change, line1 };

node Line2: { a, change, line1}, { b, change, line1}, { a, change, line2 };

node Line3: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3 };

node Line4: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3 };

node Line5: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line13 };

node Line7: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7 };

node Line8: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line8 };

node Line10: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { t, change, line10 };

node Line11: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { t, change, line10}, { a, reference, line11}, { b, change, line11 };

node Line12: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { t, change, line10}, { a, reference, line11}, { b, change, line11}, { t, reference, line12}, { a, change, line12 };

node Line13: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { t, change, line10}, { a, reference, line11}, { b, change, line11}, { t, reference, line12}, { a, change, line12 };

node Line16: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { a, reference, line8}, { a, reference, line13 }.

After the second round of traversal of the control flow graph in step SY3, the node variable information sets of the nodes of the control flow graph are respectively:

node Line1: { a, change, line1}, { b, change, line1 };

node Line2: { a, change, line1}, { b, change, line1}, { a, change, line2 };

node Line5: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { a, reference, line11}, { b, change, line11}, { a, change, line12 };

node Line7: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { a, reference, line11}, { b, change, line11}, { a, change, line12 };

node Line8: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { a, reference, line11}, { b, change, line11}, { a, change, line12 };

node Line10: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { t, change, line10}, { a, reference, line11}, { b, change, line11}, { t, reference, line12}, { a, change, line12 };

node Line11: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { t, change, line10}, { a, reference, line11}, { b, change, line11}, { t, reference, line12}, { a, change, line12 };

node Line16: { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { a, reference, line8}, { a, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { a, reference, line11}, { b, change, line11}, { a, change, line12 }.

After two rounds of traversal of the control flow graph in step SY3, the node variable information sets of the nodes of the control flow graph no longer change, and the loop iteration of step SY3 ends.

Furthermore, those skilled in the art will understand that the scope of variables needs to be considered when merging the set of node variable information of each precursor node into the current node. For example, the set of node variable information of node Line5 is { a, change, line1}, { b, change, line1}, { a, change, line2}, { b, change, line3}, { i, change, line5}, { a, reference, line8}, { i, reference, line13}, { a, reference, line7}, { a, reference, line10}, { b, reference, line10}, { a, reference, line11}, { b, change, line11}, { a, change, line12}. The variable i is defined in the for statement and its scope is limited to the for loop body, whereas node Line16 lies outside the for loop body; therefore, when the set of node variable information of precursor node Line5 of node Line16 is merged into node Line16, the node variable information relating to variable i should be removed. As another example, the scope of the temporary variable t is limited to statements Line10, Line11, Line12 and Line13, so when the set of node variable information of precursor node Line13 of node Line5 is merged into node Line5, the node variable information relating to the temporary variable t should be removed.
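The scope rule can be sketched as a filter applied during merging. The function name and the `visible` parameter, which lists the variables whose scope covers the current node, are hypothetical; how scopes are actually determined is not described in the text. The sample set is a small excerpt, not the full set of node Line5.

```python
def merge_in_scope(current, precursor_info, visible):
    """Merge a precursor's set into `current`, dropping out-of-scope variables.

    current, precursor_info: sets of (variable, kind, position)
    visible: names of the variables whose scope covers the current node
    """
    return set(current) | {item for item in precursor_info if item[0] in visible}

# Excerpt of the set of node Line5, written as (variable, kind, line) tuples.
line5_info = {("a", "change", 12), ("b", "change", 11), ("i", "change", 5)}

# Node Line16 lies outside the for loop, so only a and b are visible there.
line16_info = merge_in_scope(set(), line5_info, visible={"a", "b"})
```

The entry for variable i is dropped on the way to node Line16, while the entries for a and b are carried over unchanged.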

In step SY4, the data dependency information is the set of variable value change information for the variables stored in the current node. The variable value change information includes a set of change position information, and each piece of change position information indicates the position of a statement in a precursor node at which the value of a variable is changed. Continuing the previous example based on the code illustrated in Fig. 2, the node variable information whose variable value type is reference is removed from the node variable information set of each node. All remaining entries then have the variable value type change, so the type parameter no longer carries any information and is removed as well. The data dependency information of each node is then obtained as follows:

node Line1: { a, line1}, { b, line1 };

node Line2: { a, line1}, { b, line1}, { a, line2 };

node Line3: { a, line1}, { b, line1}, { a, line2}, { b, line3 };

node Line4: { a, line1}, { b, line1}, { a, line2}, { b, line3 };

node Line5: { a, line1}, { b, line1}, { a, line2}, { b, line3}, { i, line5}, { b, line11}, { a, line12 };

node Line7: { a, line1}, { b, line1}, { a, line2}, { b, line3}, { i, line5}, { b, line11}, { a, line12 };

node Line8: { a, line1}, { b, line1}, { a, line2}, { b, line3}, { i, line5}, { b, line11}, { a, line12 };

node Line10: { a, line1}, { b, line1}, { a, line2}, { b, line3}, { i, line5}, { t, line10}, { b, line11}, { a, line12 };

node Line11: { a, line1}, { b, line1}, { a, line2}, { b, line3}, { i, line5}, { t, line10}, { b, line11}, { a, line12 };

node Line12: { a, line1}, { b, line1}, { a, line2}, { b, line3}, { i, line5}, { t, line10}, { b, line11}, { a, line12 };

node Line13: { a, line1}, { b, line1}, { a, line2}, { b, line3}, { i, line5}, { t, line10}, { b, line11}, { a, line12 };

node Line16: { a, line1}, { b, line1}, { a, line2}, { b, line3}, { b, line11}, { a, line12 }.
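The derivation of the data dependency information from a node variable information set can be sketched as below. This is a minimal Python illustration; the function name `to_data_dependency` and the tuple representation are assumptions made for the example, and the input is the set listed above for node Line16.

```python
def to_data_dependency(node_info):
    """Drop the reference entries, then drop the type field,
    keeping (variable, change position) pairs."""
    return {(var, pos) for var, kind, pos in node_info if kind == "change"}


# Node variable information set of node Line16 as (variable, type, position).
line16_info = {
    ("a", "change", "line1"), ("b", "change", "line1"),
    ("a", "change", "line2"), ("b", "change", "line3"),
    ("a", "reference", "line8"), ("a", "reference", "line13"),
    ("a", "reference", "line7"), ("a", "reference", "line10"),
    ("b", "reference", "line10"), ("a", "reference", "line11"),
    ("b", "change", "line11"), ("a", "change", "line12"),
}

line16_deps = to_data_dependency(line16_info)
```

For node Line16 this yields the six pairs { a, line1}, { b, line1}, { a, line2}, { b, line3}, { b, line11}, { a, line12}, which agrees with the list above.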

The data dependency information is then simplified as follows:

1. removing the statement of the node itself;

2. removing the variable information from the data dependency information.

The data dependency information of each node, which represents the statement data dependency relationship, is thus obtained as follows:

node Line1: { };

node Line2: { Line1};

node Line3: { Line1, line2};

node Line4: { Line1, line2, line3};

node Line5: { Line1, line2, line3, line11, line12};

node Line7: { Line1, line2, line3, line5, line11, line12};

node Line8: { Line1, line2, line3, line5, line11, line12};

node Line10: { Line1, line2, line3, line5, line11, line12};

node Line11: { Line1, line2, line3, line5, line10, line12};

node Line12: { Line1, line2, line3, line5, line10, line11 };

node Line13: { Line1, line2, line3, line5, line10, line11, line12};

node Line16: { Line1, line2, line3, line11, line12}.

In the statement data dependency relationship, for example, the data dependency information of node Line12 is { Line1, line2, line3, line5, line10, line11 }, which means that statement Line12 depends on statements Line1, Line2, Line3, Line5, Line10 and Line11; in other words, these statements are the ones depended on by statement Line12.
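The two simplification steps can be sketched as follows for node Line12. This is a minimal Python illustration; the function name `simplify` and the pair representation are assumptions made for the example.

```python
def simplify(node, data_deps):
    """Remove the node's own statement and drop the variable names,
    leaving only the positions of the statements the node depends on."""
    return {pos for _var, pos in data_deps if pos != node}


# Data dependency information of node Line12 as (variable, position) pairs.
line12_deps = {
    ("a", "line1"), ("b", "line1"), ("a", "line2"), ("b", "line3"),
    ("i", "line5"), ("t", "line10"), ("b", "line11"), ("a", "line12"),
}

line12_statements = simplify("line12", line12_deps)
```

Because the result is a set, the two entries that point at line1 collapse into one, and the entry for line12 itself is removed.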

Step ST2 can be specifically divided into two steps:

step ST21: constructing the feature vector of each statement according to the training statement data dependency relationship and the training defect suspicion list;

step ST22: labeling each statement with a defect label indicating whether it contains a defect, according to the known defect table.

Step SA2 constructs the feature vector of each statement according to the statement data dependency relationship to be evaluated and the defect suspicion list to be evaluated.

Therefore, step SA2 is substantially the same as step ST21: in short, the feature vector of each statement is constructed from the statement data dependency relationship and the defect suspicion list. Specifically, the statements of the source code are traversed, and the following steps are performed on each traversed statement:

First, the defect suspicion list is searched for a suspicion value of the current statement. If one is found, it is taken as the first element of the statement's feature vector; if not, the first element is set to 0.

Then, the statements having a dependency relationship with the current statement are found according to the statement data dependency relationship, the suspicion value of each of these statements is extracted from the defect suspicion list, and the N highest suspicion values are selected as the last N elements of the statement's feature vector. If fewer than N statements have a dependency relationship with the current statement, or fewer than N of those statements appear in the defect suspicion list, the remaining elements are filled with 0. In this embodiment N is 6; those skilled in the art may also choose 7, 8 or another value.

That is, the feature vector of a statement is a vector composed of N+1 suspicion values.
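The construction of the feature vector can be sketched as below. This is a minimal Python illustration under stated assumptions: the function name `build_feature_vector`, the dictionary layout and the suspicion values are all hypothetical, and only the dependency set of Line12 is taken from the example above.

```python
def build_feature_vector(stmt, related, suspicion, n=6):
    """Return [own value, top-n values of related statements], padded with 0."""
    own = suspicion.get(stmt, 0.0)
    values = sorted(
        (suspicion[s] for s in related.get(stmt, ()) if s in suspicion),
        reverse=True,
    )[:n]
    values += [0.0] * (n - len(values))  # fill the missing elements with 0
    return [own] + values


# Statements that Line12 has a dependency relationship with (from the example).
related = {"line12": {"line1", "line2", "line3", "line5", "line10", "line11"}}

# Hypothetical defect suspicion list; statements not listed have no value.
suspicion = {"line12": 0.8, "line11": 0.7, "line10": 0.6, "line5": 0.3}

vec = build_feature_vector("line12", related, suspicion)
```

With these made-up values, only three of the six related statements appear in the suspicion list, so the last three elements of the seven-element vector are filled with 0.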

The core of steps ST3 and SA3 is the support vector machine. A support vector machine (SVM) is a generalized linear classifier that performs binary classification of data by supervised learning. In this embodiment, a linear support vector machine (Linear SVM) is used. Linear support vector machines are familiar to those skilled in the art and are not described in detail in this specification.
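For readers less familiar with the technique, a linear support vector machine applied to statement feature vectors can be sketched as follows. This is a from-scratch Python illustration that uses sub-gradient descent on the regularized hinge loss; it is not the solver used in the embodiment, which is not specified here. The function names, the hyperparameters and the toy samples (with N = 2 for brevity) are all assumptions, as is the use of the decision value as a score for a statement.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Fit weights w and bias b by sub-gradient descent on the
    L2-regularized hinge loss. Labels are +1 (defect) or -1 (no defect)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Hinge-loss step: push the boundary away from this sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def decision_value(w, b, x):
    """Signed score of a feature vector; positive means 'defect'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Made-up training feature vectors and defect labels.
train_x = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.1, 0.2, 0.0], [0.2, 0.1, 0.1]]
train_y = [1, 1, -1, -1]

w, b = train_linear_svm(train_x, train_y)
score_suspect = decision_value(w, b, [0.85, 0.85, 0.65])
score_clean = decision_value(w, b, [0.15, 0.15, 0.05])
```

In this sketch a statement whose own suspicion value and related suspicion values are all high receives a positive decision value, and one whose values are all low receives a negative one.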

In addition, steps SA2 and ST21 need to find the statements that have a dependency relationship with the current statement, whereas in this embodiment the data dependency information of each statement is the list of statements on which that statement depends. To simplify the processing of steps SA2 and ST21, the data dependency information of each statement may be converted into a list of statements having a dependency relationship with the current statement.

Claims

1. A method for enhancing defect localization based on program dependency, the method comprising: a data acquisition step, a dependency analysis step, a model training step and a defect analysis step;

the model training step comprises the following steps:

The defect analysis step includes the steps of:

in the above-mentioned steps of the method,

the defect suspicion list is a set of suspicion positioning information;

the known defect table is a set of known defect localization information;

and the suspicious value of the statement is determined according to the suspicious defect positioning table.

2. The method for enhancing defect localization based on program dependency according to claim 1, wherein the dependency analysis step comprises the following steps:

step SY1: constructing a control flow graph according to the input source code;

in the control flow graph, each node corresponds to a source code statement;

the variable value change information comprises change position information;

the precursor node refers to a precursor node connected with the current node.

3. The method for enhancing defect localization based on program dependency according to claim 2, wherein the step SY4 converts the data dependency information corresponding to each node into a statement list having a dependency relationship with the current statement as its statement data dependency relationship.

4. An apparatus for enhancing defect localization based on program dependency, the apparatus comprising: a data acquisition module,

the model training module comprises the following modules:

the defect analysis module comprises the following modules:

in the above-mentioned modules,

the defect suspicion list is a set of suspicion positioning information;

the known defect table is a set of known defect localization information;

5. The apparatus for enhancing defect localization based on program dependency according to claim 4, wherein the dependency analysis module comprises the following modules:

in the control flow graph, each node corresponds to a source code statement;

the variable value change information comprises change position information;

the precursor node refers to a precursor node connected with the current node.

6. The apparatus for enhancing defect localization based on program dependency according to claim 5, wherein the module MY4 converts the data dependency information corresponding to each node into a statement list having a dependency relationship with the current statement as its statement data dependency relationship.