CN109144882B

CN109144882B - Software fault positioning method and device based on program invariants

Info

Publication number: CN109144882B
Application number: CN201811096080.XA
Authority: CN
Inventors: 王甜甜; 许家欢; 王克朝; 苏小红
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2018-09-19
Filing date: 2018-09-19
Publication date: 2021-07-06
Anticipated expiration: 2038-09-19
Also published as: CN109144882A

Abstract

The invention discloses a software fault location method and device based on program invariants. The method comprises the following steps: performing statement-level, value-level and logic-expression-level instrumentation on the source code of target software, and executing the instrumented source code with a preset test case set to obtain execution information; clustering the preset failed test case set, and selecting, for each cluster, a successful test case set that helps distinguish the defective statements; learning the execution information of the preferred successful test case set to obtain a program invariant set, wherein the program invariant set comprises a set type invariant, a truth table type invariant and a floating point type range invariant; and detecting invariant violations according to the execution information of the failed test case set and the program invariant set to obtain a suspicious statement set. Dependence analysis is then used to filter out false invariant violation detections caused by fault propagation, the invariant violations at each statement are statistically analyzed, and the suspicious degree of each statement is calculated. The invention improves the accuracy of software fault location and overcomes the problem of missed detections when locating logic expression defects.

Description

Software fault positioning method and device based on program invariants

Technical Field

The invention relates to the technical field of computer software, in particular to a software fault positioning method and device based on program invariants.

Background

Software systems often contain defects, which reduce their reliability, availability, and security. One of the primary tasks in correcting a defect is to identify the location of the program element associated with it; the programmer can then modify the program based on the location of the suspect program element and its program context. However, this process is very time consuming and laborious: software debugging can account for as much as 80% of overall software cost. There is therefore a need for an automated software fault location method.

In addition to providing direct support for software developers, automated fault location techniques are also used in automated program repair. Suspicious program statements are identified before repair to direct the repair tool in generating a patch, narrowing the search space for the patch. Thus, the accuracy of automatic fault location directly affects the effectiveness of the repair tool.

Therefore, how to improve the effectiveness of the fault location technology becomes a current research hotspot. Various fault location methods have been developed, each of which has advantages and disadvantages, and no fault location technique superior to all other methods exists.

The basic idea of fault location based on program invariants is as follows: first, successful test cases are used to learn a set of program invariants; then the failed test cases are executed and their violations of the program invariants are detected; each violation is added to a candidate set as a possible defect location, that is, a possible cause of the failure. The advantage of the method is that the invariants learned from successful test cases help to analyze the expected behaviors and attributes of the software.

However, one difficulty with this approach is how to automatically identify the program attributes needed for fault location, i.e., how to define the program invariants; inappropriate invariants lead to missed or false detections. Another difficulty is that the method places high demands on the quality of the test cases, and there is a trade-off between false detection and missed detection. If the learned invariants are too wide, the true cause of the failure is not included in the invariant violation candidate set, so the defect location is missed; if the learned invariants are too narrow (some invariants are missed), a large number of false detections result, and the source of the software failure cannot be accurately located.

Disclosure of Invention

In view of the above, the present invention has been made to provide a program invariant based software fault localization method, apparatus, electronic device and computer readable storage medium that overcome or at least partially solve the above problems.

One embodiment of the invention provides a software fault positioning method based on program invariants, which comprises the following steps:

establishing an abstract syntax tree for the source code of target software, performing statement-level instrumentation, value-level instrumentation and logic expression-level instrumentation on the source code of the target software according to the abstract syntax tree, executing the instrumented source code of the target software by adopting a preset successful test case set and a preset failed test case set, and acquiring execution information of each test case, the execution information comprising: statement coverage information, values of variables and expressions, and a logic expression truth table;

performing cluster analysis on the preset failure test case set, and screening the preset success test case set according to a failure coverage equivalence division preference criterion to obtain an optimal success test case set;

learning the execution information of the preferred successful test case set to obtain a program invariant set, wherein the program invariant set comprises a set type invariant, a floating point type range invariant and a truth table type invariant;

and carrying out invariant violation detection according to the execution information of the failed test case set and the program invariant set to obtain a suspicious statement set.

Optionally, before the obtaining of the preferred successful test case set, the method further includes:

and removing the certain coincidentally correct test cases from the successful test cases.

Optionally, the method further comprises:

obtaining statements only covered by the failed test cases according to preset statement coverage information of the successful test case set and statement coverage information of the failed test case set, and adding the statements only covered by the failed test cases into the suspicious statement set;

and obtaining always-true and/or always-false predicates according to the logic expression truth table of the preset successful test case set and the logic expression truth table of the failed test case set, and adding the statements where the always-true and/or always-false predicates are located to the suspicious statement set.

Optionally, after obtaining the set of suspicious statements, the method further comprises:

adding, by reaching definitions analysis, the assignment statements that produce invariant violations and their direct control dependent statements to the suspicious statement set;

and filtering out of the suspicious statement set, by dependence analysis, the suspicious statements that are false detections of invariant violation caused by fault propagation.

Optionally, the adding, by reaching definitions analysis, of the assignment statements that produce invariant violations and their direct control dependent statements to the suspicious statement set includes:

analyzing the target software and creating a control flow graph of the target software;

traversing the control flow graph, and solving the reaching definitions information by using an iterative algorithm and the data flow equations;

determining, according to the reaching definitions information, the assignment statements that produce each invariant violating statement in the suspicious statement set and their direct control dependent statements;

and adding the assignment statements that produce each invariant violating statement and their direct control dependent statements to the suspicious statement set.
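The iterative solution of the reaching definitions data flow equations can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the graph encoding (`cfg` as a successor map, `defs` mapping a node to the single variable it assigns) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def reaching_definitions(cfg, defs):
    """Solve reaching definitions on a control flow graph by iteration.

    cfg:  {node: [successor nodes]} (hypothetical encoding; every node is a key)
    defs: {node: name of the variable assigned at that node}
    Returns {node: set of (variable, defining node)} reaching the node's entry.
    """
    preds = defaultdict(list)
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].append(node)

    in_sets = {n: set() for n in cfg}
    out_sets = {n: set() for n in cfg}
    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for n in cfg:
            # IN[n] = union of OUT[p] over all predecessors p
            in_n = set()
            for p in preds[n]:
                in_n |= out_sets[p]
            # OUT[n] = GEN[n] | (IN[n] - KILL[n])
            var = defs.get(n)
            if var is None:
                out_n = set(in_n)
            else:
                out_n = {(var, n)} | {d for d in in_n if d[0] != var}
            if in_n != in_sets[n] or out_n != out_sets[n]:
                in_sets[n], out_sets[n] = in_n, out_n
                changed = True
    return in_sets


def definitions_reaching_use(in_sets, node, var):
    """Assignment nodes whose definition of `var` may reach `node`."""
    return sorted(d for v, d in in_sets[node] if v == var)


# 1: x = 1;  2: if (c);  3: x = 2;  4: y = x
cfg = {1: [2], 2: [3, 4], 3: [4], 4: []}
defs = {1: "x", 3: "x", 4: "y"}
in_sets = reaching_definitions(cfg, defs)
print(definitions_reaching_use(in_sets, 4, "x"))
```

If the statement at node 4 violates an invariant on `x`, both assignments at nodes 1 and 3 are candidates to be added to the suspicious statement set.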

Optionally, the filtering out of the suspicious statement set, by dependence analysis, of the suspicious statements that are false detections of invariant violation caused by fault propagation includes:

analyzing the target software to obtain a data dependency graph of the target software;

marking the invariant violation information of each failed test case on the statement nodes of the data dependency graph;

filtering out of the suspicious statement set, according to the invariant violation information, the suspicious statements whose invariant violations were caused by fault propagation;

and statistically analyzing the invariant violations produced by each failed test case, calculating the suspicious degree of each suspicious statement in the suspicious statement set, and ranking the suspicious statements by suspicious degree.
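One plausible reading of the filtering step is sketched below. It assumes, purely for illustration, that a violation counts as caused by fault propagation when some statement it transitively data-depends on also violated an invariant in the same failed execution; the text does not spell out the exact rule, and the names `data_deps` and `filter_propagated_violations` are hypothetical.

```python
def filter_propagated_violations(data_deps, violating):
    """Split violating nodes into likely causes and propagated violations.

    data_deps: {node: set of nodes it directly data-depends on}
    violating: set of nodes that violated an invariant for one failed test
    Assumption: a violation is 'propagated' if any transitive data
    dependence predecessor of the node also violated an invariant.
    """
    def has_violating_ancestor(node):
        seen = set()
        stack = list(data_deps.get(node, ()))
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            if cur in violating and cur != node:
                return True
            stack.extend(data_deps.get(cur, ()))
        return False

    causes = {n for n in violating if not has_violating_ancestor(n)}
    return causes, violating - causes


# Statement 3 uses the value computed at 2, which uses the value from 1.
deps = {2: {1}, 3: {2}, 5: {4}}
causes, propagated = filter_propagated_violations(deps, {2, 3, 5})
print(sorted(causes), sorted(propagated))
```

Under this assumption the violation at statement 3 is attributed to the earlier violation at statement 2 and is removed from the suspicious statement set.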

Optionally, the calculating the suspicious degree of each suspicious statement in the suspicious statement set includes:

calculating the suspicious degree of each suspicious statement in the suspicious statement set according to the following formula:

wherein $Sus\_Inv(s_i)$ is the suspicious degree of the suspicious statement $s_i$; $TS$ is the preset successful test case set, and $|TF|$ is the total number of test cases in the preset failed test case set; $v_i$ is the $i$-th node in the data dependency graph, $tf$ is a failed test case, and $confidence(v_i, tf, TS)$ is the confidence that node $v_i$ produces an invariant violation when the failed test case $tf$ is executed; if $v_1$ is the cause of failure determined in the dependency analysis, $confidence(v_1, tf, TS) = 1$, otherwise $confidence(v_i, tf, TS) = 0.1$, $2 \le i \le k$.
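A sketch of how such a score could be computed is given below. It assumes that the suspicious degree of a statement is its confidence summed over the failed test cases and divided by $|TF|$; this normalization is an assumption consistent with the symbol definitions above, not a formula taken from the text. The input layout and function names are hypothetical.

```python
def suspicious_degrees(violations_by_test):
    """Score statements from per-test invariant violation marks.

    violations_by_test: one dict per failed test case, mapping a violating
    statement to True if dependency analysis marked it as the cause of
    failure (confidence 1) or False if it is a propagated violation
    (confidence 0.1).
    Assumption: the score is the summed confidence divided by |TF|.
    """
    total_failed = len(violations_by_test)
    if total_failed == 0:
        return {}
    scores = {}
    for violations in violations_by_test:
        for stmt, is_cause in violations.items():
            confidence = 1.0 if is_cause else 0.1
            scores[stmt] = scores.get(stmt, 0.0) + confidence
    return {stmt: total / total_failed for stmt, total in scores.items()}


def rank(scores):
    """Statements ordered from most to least suspicious."""
    return sorted(scores, key=lambda s: (-scores[s], s))


scores = suspicious_degrees([{10: True, 12: False}, {10: True}])
print(rank(scores))
```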

Another embodiment of the present invention provides a software fault location apparatus based on program invariants, including:

the source code instrumentation unit is used for establishing an abstract syntax tree for the source code of target software, performing statement-level instrumentation, value-level instrumentation and logic expression-level instrumentation on the source code of the target software according to the abstract syntax tree, executing the instrumented source code of the target software by adopting a preset successful test case set and a preset failed test case set, and obtaining execution information of each test case, the execution information comprising: statement coverage information, values of variables and expressions, and a logic expression truth table;

the test case set processing unit is used for carrying out cluster analysis on the failed test case set and screening the successful test case set according to a failure coverage equivalence division preference criterion to obtain an optimal successful test case set;

an invariant set obtaining unit, configured to learn the preferred successful test case set to obtain a program invariant set, where the program invariant set includes a set type invariant, a floating point type range invariant, and a truth table type invariant;

and the suspicious statement set acquisition unit is used for carrying out invariant violation detection according to the execution information of the failed test case set and the program invariant set to obtain a suspicious statement set.

Another embodiment of the present invention provides an electronic device, which includes a memory and a processor, the memory and the processor are communicatively connected through an internal bus, the memory stores program instructions that can be executed by the processor, and the program instructions, when executed by the processor, can implement the method described above.

Another embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to perform the above-described method.

The technical effect of the invention is as follows: statement-level, value-level and logic-expression-level instrumentation is performed on the source code of the target software, the instrumented source code is executed with the preset test case set, and the execution information of each test case is obtained, including statement coverage information, values of variables and expressions, and a logic expression truth table; cluster analysis is performed on the preset failed test case set to obtain a preferred successful test case set; the execution information of the preferred successful test case set is learned to obtain a program invariant set, which comprises a set type invariant, a floating point type range invariant and a truth table type invariant; and invariant violation detection is carried out according to the execution information of the failed test case set and the program invariant set to obtain a suspicious statement set. Dependence analysis is used to filter false invariant violation detections caused by fault propagation out of the suspicious statement set, and on this basis the suspicious degree of each statement is calculated by statistically analyzing the invariant violations at each statement. The invention improves the accuracy of software fault location and overcomes the problem of missed detections when locating logic expression defects.

Drawings

FIG. 1 is a flowchart illustrating a method for locating software faults based on program invariants according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a software fault location method based on program invariants according to another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a software fault location device based on program invariants according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The type of the program invariants, the quality of the test cases and the error propagation all affect the effectiveness of fault location, and the affecting factors need to be comprehensively considered in the locating process. In particular, the following problems still remain to be analyzed and solved.

(1) How to automatically identify the program attributes required for fault location with low computational complexity while still ensuring the effectiveness of fault location?

(2) How to select a test case set so that, on the one hand, the complexity of invariant analysis is reduced and, on the other hand, the effectiveness of fault location is improved?

(3) How to take error state propagation, i.e., the influence of executing a faulty statement on the execution state of its subsequent statements, into account during fault location, so as on the one hand to locate suspicious assignment statements and reduce missed detections, and on the other hand to remove false detections caused by error state propagation?

FIG. 1 is a flowchart illustrating a software fault location method based on program invariants according to an embodiment of the present invention. As shown in FIG. 1, the method of the embodiment of the present invention includes:

S11: establishing an abstract syntax tree for the source code of target software, performing statement-level instrumentation, value-level instrumentation and logic expression-level instrumentation on the source code of the target software according to the abstract syntax tree, executing the instrumented source code of the target software by adopting a preset successful test case set and a preset failed test case set, and acquiring execution information of each test case, the execution information comprising: statement coverage information, values of variables and expressions, and a logic expression truth table;

It should be noted that the values of the variables and the expressions include the values of variables and the values of expressions.

The abstract syntax tree of a program fully represents the program's syntactic structure, and its tree structure makes it easy to insert probe statements and to regenerate source code from the tree; instrumentation is therefore performed on the abstract syntax tree.

Test case selection uses the execution path information of the program, and invariant analysis uses variable values and logic expression truth table information, so instrumentation at three granularities is performed on the abstract syntax tree representation of the program.

(1) Statement-level instrumentation: a probe statement is inserted at each statement to collect the line numbers of the statements covered when a test case executes the program.

(2) Value-level instrumentation: probe statements are inserted at expressions to collect the values of variables while the test case set executes the program, and the corresponding invariant type is recorded according to the types of the variables and expressions.

(3) Logic-expression-level instrumentation: probe statements are inserted into logic expressions to collect the truth table of each logic expression while the test cases execute the program, and the invariant type corresponding to the truth table is recorded.

Specifically, the program instrumentation steps are as follows:

Lexical analysis and syntactic analysis are performed on the source code, an abstract syntax tree is established, and the line number of each statement in the source file is recorded.

Expressions with side effects are first converted into an equivalent static single assignment form. For example, the expression returned by a return statement, or a compound logic expression, is converted into a form in which the expression is assigned to a temporary variable.

The abstract syntax tree is traversed, the positions where probe statements need to be inserted are identified, and nodes that output the corresponding line numbers are inserted into the abstract syntax tree.

Statement-level instrumentation: for every executable statement, a probe statement that outputs the line number is inserted.

If the statement is an assignment statement or the like, value-level instrumentation is performed: the variable name is obtained from the syntax tree and the variable type is looked up in the symbol table, and a probe statement is inserted to output the value of the left-hand side.

If the statement contains a logic expression, logic-expression-level instrumentation is performed: the subtree is traversed in depth-first order and a probe statement that outputs the value of the sub-expression is inserted at each sub-expression until all nodes of the tree have been visited, so that a truth value sequence of the logic expression can be obtained during subsequent execution.

Statements that define an output file stream variable, open the file stream and close the file stream are inserted so that the execution information is written to a file. The corresponding source code is then regenerated from the abstract syntax tree to obtain the instrumented program.

The instrumented program is then executed with the test cases, and the execution information of the program is collected and stored at run time.
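The statement-level part of these steps can be illustrated with Python's standard `ast` module. This is only a sketch under stated assumptions: the embodiment does not name a target language or tool, the probe here records line numbers in memory instead of writing to a file stream, and `StatementProbe`, `_probe`, `instrument` and `run_with_coverage` are hypothetical names.

```python
import ast


class StatementProbe(ast.NodeTransformer):
    """Insert a `_probe(<line number>)` call before every statement."""

    @staticmethod
    def _probe_call(lineno):
        return ast.Expr(
            value=ast.Call(
                func=ast.Name(id="_probe", ctx=ast.Load()),
                args=[ast.Constant(lineno)],
                keywords=[],
            )
        )

    def generic_visit(self, node):
        super().generic_visit(node)  # instrument nested blocks first
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if isinstance(stmts, list) and stmts and isinstance(stmts[0], ast.stmt):
                probed = []
                for stmt in stmts:
                    probed.append(self._probe_call(stmt.lineno))
                    probed.append(stmt)
                setattr(node, field, probed)
        return node


def instrument(source):
    """Parse, instrument and compile the source."""
    tree = StatementProbe().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return compile(tree, "<instrumented>", "exec")


def run_with_coverage(source, func_name, *args):
    """Run one test case and return (result, sorted covered line numbers)."""
    covered = []
    env = {"_probe": covered.append}
    exec(instrument(source), env)
    result = env[func_name](*args)
    return result, sorted(set(covered))


SRC = """def larger(x, y):
    if x > y:
        z = x
    else:
        z = y
    return z
"""

print(run_with_coverage(SRC, "larger", 3, 1))
print(run_with_coverage(SRC, "larger", 1, 3))
```

Value-level and logic-expression-level probes would follow the same pattern, wrapping assigned values and sub-expressions instead of whole statements.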

S12: performing cluster analysis on the preset failure test case set, and screening the preset success test case set according to a failure coverage equivalence division preference criterion to obtain an optimal success test case set;

S13: learning the execution information of the preferred successful test case set to obtain a program invariant set, wherein the program invariant set comprises a set type invariant, a floating point type range invariant and a truth table type invariant;

Program invariants are formulas or rules that always hold in the program source code. An invariant remains unchanged when the program is executed with different inputs.

Carrot and Savant derive all predefined invariant patterns that may be satisfied based on Daikon; the computational complexity is high, some of the derived invariants may be irrelevant to fault location, and it is difficult to adequately represent program invariants related to program control flow.

Software defects often affect the values of variables and expressions in a program, so comparing the values of variables and expressions in successful and failed executions helps locate defects.

The invariant representation in DIDUCE and the value range invariants proposed by Sahoo et al. have lower complexity than the invariants derived by Daikon, but these invariant forms do not consider control-flow-related logical invariants, so logic expression defects are difficult to locate; moreover, the range of integers is represented as an interval, which easily results in missed detections.

Therefore, the embodiment of the present invention improves the invariant representation proposed by Sahoo et al., analyzes the intermediate values of variables and expressions during execution, and defines the following three invariant forms according to the types of the variables and expressions.

Floating point type range invariant: for a given set of successful test cases $TS$, the program invariant of a floating point expression $e$ at program point $p$ is denoted $Inv(p, e, TS) = [a, b]$

if and only if (iff) the following condition is satisfied:

$$\forall ts \in TS:\ a \le value(e, ts) \le b, \quad \text{and} \quad \exists ts_1, ts_2 \in TS:\ value(e, ts_1) = a \wedge value(e, ts_2) = b$$

where $value(e, ts)$ represents the value of expression $e$ when the program is executed with test case $ts$.

That is, for a given successful test case set $TS$, the program invariant of the floating point expression $e$ at program point $p$ is the value interval of $e$ learned by running all successful test cases in $TS$; $a$ and $b$ are each obtained by executing $e$ with some test case, $a$ being the lower bound of the interval and $b$ the upper bound.

The floating point type range invariant captures the value information of floating point program variables and expressions. For example, for a variable declared as double f, the range invariant for f learned from the successful test cases might be [1.1, 5.7].

In order to effectively locate division-by-zero defects, whether a floating point expression value whose absolute value is close to 0 appears is specially recorded. The definition is as follows:

if a value of $e$ whose absolute value is close to 0 appears in the successful executions, then

$Inv(p, e, TS) = [a, b] + [0, 0]$

otherwise,

$Inv(p, e, TS) = [a, b]$

The floating point type range invariant is suitable for analyzing defects such as array subscript out-of-bounds errors; other defects that affect the values of variables and expressions can also be located indirectly, to a certain extent, through value range analysis.
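A minimal sketch of the range invariant with the near-zero record follows. Two points are assumptions made for illustration: the threshold `eps` that decides what counts as "close to 0" is not given in the text, and a near-zero value in a failed run is treated as a violation whenever no near-zero value was seen in successful runs, even if 0 lies inside $[a, b]$.

```python
def learn_float_range(values, eps=1e-9):
    """Learn (a, b, near_zero_seen) from values observed in successful runs.

    eps is a hypothetical threshold for 'absolute value close to 0'.
    """
    values = list(values)
    if not values:
        raise ValueError("at least one observed value is required")
    near_zero_seen = any(abs(v) < eps for v in values)
    return min(values), max(values), near_zero_seen


def violates_float_range(invariant, value, eps=1e-9):
    """True if a value from a failed run violates the learned invariant."""
    a, b, near_zero_seen = invariant
    if abs(value) < eps:
        # Assumption: near-zero values are checked against the [0, 0] record.
        return not near_zero_seen
    return not (a <= value <= b)


inv = learn_float_range([-2.5, 1.1, 5.7])
print(inv)
print(violates_float_range(inv, 0.0))  # near zero never seen in passing runs
print(violates_float_range(inv, 3.0))
print(violates_float_range(inv, 6.0))
```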

Set type invariant: for a given set of successful test cases $TS$, the program invariant of an expression $e$ of integer, character or string (character array or character pointer) type at program point $p$ is denoted $Inv(p, e, TS) = \{ value(e, ts) \mid ts \in TS \}$, where $value(e, ts)$ represents the value of expression $e$ when the program is executed with test case $ts$.

That is, for integer, character and string expressions, the invariant is the set of discrete values observed during execution, which prevents the invariant range from being too wide and reduces missed detections during invariant violation detection.

Truth table type invariant: for a given set of successful test cases $TS$, the program invariant of a multi-condition logic expression $e$ at program point $p$ is denoted $Inv(p, e, TS) = \{ (Inv(p, e_1, ts), Inv(p, e_2, ts), value(e, ts)) \mid e_1, e_2 \text{ are sub-expressions of } e \}$, where $value(e, ts)$ represents the value of expression $e$ when the program is executed with test case $ts$.

That is, the invariant of a logic expression is a truth table formed by the values of its sub-expressions and its own value. If $e_1$ or $e_2$ itself contains sub-expressions, its truth table is in turn formed from the truth values of its sub-expressions.

S14: carrying out invariant violation detection according to the execution information of the failed test case set and the program invariant set to obtain a suspicious statement set.

The invariants at specific program points are learned from the successful test case set; the execution information of each failed test case is then checked against these invariants, and if it does not satisfy an invariant, an invariant violation is said to occur.

Definition (invariant violation): for a given set of successful test cases $TS$ and the expression $e$ at program point $p$, for a failed test case $tf$, if

$$value(e, tf) \notin Inv(p, e, TS)$$

then $tf$ is said to produce an invariant violation for the expression $e$ at program point $p$, and program point $p$ is a suspect program point.
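The violation check can be sketched as a membership test over the execution records of one failed test case. The record layout `(point, expression, value)` and the encoding of invariants (a Python set for discrete and truth table invariants, a `(low, high)` tuple for ranges) are assumptions made for this example.

```python
def violated_points(invariants, failed_records):
    """Return the program points where a failed run violates an invariant.

    invariants:     {(point, expr): set of allowed values, or (low, high)}
    failed_records: iterable of (point, expr, value) from one failed test
    """
    suspects = []
    for point, expr, value in failed_records:
        inv = invariants.get((point, expr))
        if inv is None:
            continue  # nothing was learned at this program point
        if isinstance(inv, tuple):
            satisfied = inv[0] <= value <= inv[1]
        else:
            satisfied = value in inv
        if not satisfied and point not in suspects:
            suspects.append(point)
    return suspects


invariants = {
    (12, "i"): {0, 1, 2},
    (15, "ratio"): (1.1, 5.7),
    (18, "x || y"): {"000", "1×1"},
}
records = [(12, "i", 3), (15, "ratio", 2.0), (18, "x || y", "011")]
print(violated_points(invariants, records))
```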

Table 1 gives examples of truth table type invariants. The symbol × in the table represents short-circuit evaluation: when a logic expression is evaluated and the value of the preceding sub-expression is true, the following sub-expression is not evaluated. Assume that the correct expression is z = x && y, and that it is erroneously written in the program as z = x || y. During fault location, the invariant of the logic expression learned from the successful test cases is {000, 1×1}, and the truth value sequence 011 produced by the failed test case (x = 0, y = 1) does not appear in the invariant set, so an invariant violation occurs and the logic expression error can be located.

Table 1. Examples of truth table type invariants

The invariants of logic expressions are defined in truth table form, which takes the composition structure of the program's expressions into account and solves the problem that simple value range analysis can only obtain a single 0 or 1 value for a logic expression, making defects related to logic expressions difficult to locate. Single-condition expressions are analyzed with floating point type range invariants or set type invariants according to the expression type.
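The example of the faulty expression z = x || y can be replayed with a short script. It is an illustrative sketch: the string encoding of a truth value sequence as the values of x, y and z in that order, with × marking a sub-expression skipped by short-circuit evaluation, and the helper name `trace_or` are assumptions.

```python
SKIPPED = "×"  # marks a sub-expression not evaluated (short circuit)


def trace_or(x, y):
    """Truth value sequence (x, y, z) of the faulty expression z = x || y."""
    if x:
        return f"1{SKIPPED}1"  # y is never evaluated when x is true
    y_val = int(bool(y))
    return f"0{y_val}{y_val}"


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
# A test passes when the faulty program agrees with the intended z = x && y.
passing = [(x, y) for x, y in inputs if bool(x or y) == bool(x and y)]
failing = [(x, y) for x, y in inputs if (x, y) not in passing]

invariant = {trace_or(x, y) for x, y in passing}
violations = {
    (x, y): trace_or(x, y) for x, y in failing if trace_or(x, y) not in invariant
}
print(sorted(invariant))
print(violations)
```

In this encoding the failing input x = 1, y = 0 produces the sequence 1×1, which already belongs to the invariant, so only x = 0, y = 1 exposes the defect through a violation.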

In order to reduce complexity, the embodiment of the invention does not derive the relations that may hold between variables, but analyzes the values taken by variables and expressions during execution. In order to improve location accuracy, the program invariants are defined, according to the types of the variables and expressions, as floating point type range invariants, set type invariants and truth table type invariants; the values of the corresponding variables and expressions when each test case executes the program are obtained through value-level and logic-expression-level instrumentation, and the invariants are then learned from the execution information of the successful test cases. The first two kinds of invariants analyze the execution state values of variables, which helps locate data-related software defects; the last analyzes logic expressions, which helps locate predicate-related defects. This solves the problem that existing methods have difficulty analyzing branch condition expressions, which leads to missed detections when locating logic expression defects.

In an optional implementation manner of the embodiment of the present invention, before the obtaining of the preferred successful test case set, the method further includes:

In order to reduce false and missed detections in invariant violation detection and improve the effectiveness of fault location, the influence of the failed execution paths on fault location is fully considered in the test case selection process, which comprises the following steps:

(1) Cluster the failed test cases: hierarchical clustering is used to group failed test cases with similar statement coverage information into the same cluster. One failed test case cluster is analyzed at a time.

(2) Remove the "certain" coincidentally correct test cases: a successful test case whose statement coverage is identical to that of any failed test case is marked as a certain coincidentally correct test case and deleted from the successful test case set.

(3) Select successful test cases similar to the failed execution paths, filtering out possibly coincidentally correct test cases during selection. A multi-criterion test case selection method [i] is adopted: according to the "failure coverage vector similarity priority" criterion, successful test cases whose execution paths are similar to the failed execution paths are given higher priority, so that the invariant set generated by subsequent learning is more relevant to the failed test cases, reducing false and missed detections in invariant violation detection; then, according to the "failure coverage equivalence division preference" criterion, a minimal successful test case set that distinguishes the statements executed by failures to the greatest extent is selected, which avoids learning invariant ranges that are too wide and helps reduce missed invariant violations.
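Step (2) and the similarity-based prioritization in step (3) can be sketched as follows. The coverage encoding (a set of covered line numbers per test case) and the use of Jaccard similarity are assumptions made for illustration; the text names the criterion but not the similarity measure, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two statement coverage sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drop_coincidentally_correct(passing, failing):
    """Remove passing tests whose coverage equals that of some failing test."""
    failed_coverages = {frozenset(cov) for cov in failing.values()}
    return {
        name: cov
        for name, cov in passing.items()
        if frozenset(cov) not in failed_coverages
    }


def rank_by_failure_similarity(passing, failing):
    """Order passing tests by their best similarity to any failing test."""
    def score(cov):
        return max(jaccard(cov, failed) for failed in failing.values())

    return sorted(passing, key=lambda name: (-score(passing[name]), name))


failing = {"f1": {1, 2, 3, 6}}
passing = {
    "p1": {1, 2, 3, 6},     # same coverage as f1: coincidentally correct
    "p2": {1, 2, 5, 6},
    "p3": {1, 2, 3, 4, 6},
    "p4": {1, 7},
}
kept = drop_coincidentally_correct(passing, failing)
print(rank_by_failure_similarity(kept, failing))
```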

Further, the method further comprises:

If a program statement appears only in failed execution paths and is not covered by any successful test case, the statement is likely to be a suspicious statement that caused the failure. For example, adding redundant statements to otherwise correct program code causes the program to execute erroneous logic or produce erroneous computational results.

Statements covered only by failed test cases: for a given successful test case set $TS$ and failed test case set $TF$, and a statement $s$ at program point $p$, if

$s$ is not covered by any $ts \in TS$

and

$s$ is covered by at least one $tf \in TF$

then $s$ is a suspicious statement covered only by failed test cases.

In this case, no invariant can be learned from the successful test cases, so no invariant can be violated either; such suspicious statements therefore need to be identified from the statement coverage information of the test cases.

Predicate decision conditions in a program are generally used to distinguish different situations so that different branches perform different processing. If a predicate decision condition is always true or always false in all executions of the program, this may mean that the program contains a defect that renders the predicate decision ineffective. In this case, the statement and its related statements are suspicious even though the failed test cases do not violate the invariants learned from the successful test cases. It is therefore necessary to specifically identify such always-true and/or always-false predicates.

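Always-true and/or always-false predicates: for a given successful test case set $TS$, failed test case set $TF$, and condition judgment predicate expression $e$ at program point $p$, if

$$\forall ts \in TS:\ value(e, ts) = true$$

and

$$\forall tf \in TF:\ value(e, tf) = true$$

then $e$ is an always-true predicate. If

$$\forall ts \in TS:\ value(e, ts) = false$$

and

$$\forall tf \in TF:\ value(e, tf) = false$$

then $e$ is an always-false predicate.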

The statement coverage information collected during the execution of each test case is taken as input, and a statement coverage matrix is created for the successful test cases and another for the failed test cases. The two matrices are scanned to check, for each statement, its coverage vector over the successful test cases and its coverage vector over the failed test cases; statements whose successful coverage vector elements are all 0 and whose failed coverage vector elements are not all 0 are taken as suspicious statements and added to the candidate set.
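The matrix scan can be sketched in a few lines. The layout assumed here, one row per test case and one 0/1 column per statement, is an illustration choice, as is the function name.

```python
def only_failed_statements(pass_matrix, fail_matrix):
    """Column indices covered by some failed test and by no successful test.

    Each matrix is a list of rows, one per test case; each row holds one
    0/1 entry per statement (assumed layout).
    """
    if not fail_matrix:
        return []
    suspects = []
    for s in range(len(fail_matrix[0])):
        covered_by_pass = any(row[s] for row in pass_matrix)
        covered_by_fail = any(row[s] for row in fail_matrix)
        if covered_by_fail and not covered_by_pass:
            suspects.append(s)
    return suspects


pass_matrix = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
fail_matrix = [
    [1, 1, 1, 0],
]
print(only_failed_statements(pass_matrix, fail_matrix))
```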

The truth values of all conditional predicates are then analyzed statistically. A conditional predicate that is executed and whose truth value is true (or false) in all successful test cases and all failed test cases is identified as a suspicious statement and added to the candidate set.
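The two checks above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the list-of-rows matrix layout (one row per test case, one column per statement) and the predicate identifiers are assumptions made for the example.

```python
from typing import Dict, List, Set


def failed_only_statements(pass_cov: List[List[int]],
                           fail_cov: List[List[int]]) -> Set[int]:
    """Return indices of statements covered by no passing test and by at
    least one failing test. Rows are test cases, columns are statements."""
    n_statements = len(fail_cov[0]) if fail_cov else 0
    suspicious = set()
    for s in range(n_statements):
        covered_by_pass = any(row[s] for row in pass_cov)
        covered_by_fail = any(row[s] for row in fail_cov)
        if covered_by_fail and not covered_by_pass:
            suspicious.add(s)
    return suspicious


def constant_predicates(truth_values: Dict[str, List[bool]]) -> Dict[str, bool]:
    """truth_values maps a (hypothetical) predicate id to every truth value
    observed for it over all passing and failing runs. Returns the predicates
    that were always true or always false, with their constant value."""
    return {pred: vals[0]
            for pred, vals in truth_values.items()
            if vals and len(set(vals)) == 1}


pass_cov = [[1, 0, 1], [1, 0, 0]]
fail_cov = [[1, 1, 1], [1, 0, 1]]
only_failed = failed_only_statements(pass_cov, fail_cov)
constant = constant_predicates({"p1": [True, True],
                                "p2": [True, False],
                                "p3": [False]})
```

Here statement 1 is the only statement reached exclusively by failed runs, and predicates p1 and p3 never change their truth value, so all three would enter the candidate set.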

Further, after obtaining the set of suspect statements, the method further comprises:

Suspicious statements typically lie on the execution paths of failed test cases. To reduce the cost of solving for invariants, only the statements covered by failed test cases are analyzed, and the preferred successful test case set is used to learn the program invariants.

The execution information of each successful test case in the preferred set is analyzed, and three types of invariants are generated statistically from the recorded variable and expression information.

(1) For a set-type invariant, the discrete value of the variable is added to the invariant set by set union, according to the statement position and the corresponding variable value.

(2) For a floating-point range invariant, the value range of the variable at the corresponding position is updated according to the statement position and the corresponding variable value.

(3) For a truth-table invariant, the truth-value sequence of the logical expression is added to the invariant set by set union, according to the statement position.
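A minimal sketch of this learning step is given below. The record layout `(kind, location, name, value)` and the function name are hypothetical; the text does not specify the format of the execution information file.

```python
from collections import defaultdict


def learn_invariants(passing_runs):
    """Learn three kinds of invariants from passing executions.

    Each run is a list of records (kind, location, name, value), where kind
    is 'set', 'range' or 'truth' (an assumed encoding for this sketch)."""
    set_inv = defaultdict(set)    # (location, name) -> observed discrete values
    range_inv = {}                # (location, name) -> (low, high)
    truth_inv = defaultdict(set)  # (location, name) -> observed truth sequences
    for run in passing_runs:
        for kind, loc, name, value in run:
            key = (loc, name)
            if kind == "set":
                set_inv[key].add(value)
            elif kind == "range":
                low, high = range_inv.get(key, (value, value))
                range_inv[key] = (min(low, value), max(high, value))
            elif kind == "truth":
                truth_inv[key].add(tuple(value))
    return set_inv, range_inv, truth_inv


runs = [
    [("set", 3, "x", 1), ("range", 4, "y", 0.5), ("truth", 5, "e", [True, False])],
    [("set", 3, "x", 2), ("range", 4, "y", 2.0), ("truth", 5, "e", [True, True])],
]
set_inv, range_inv, truth_inv = learn_invariants(runs)
```

Set and truth-table invariants grow by union, while the range invariant only widens its bounds, which is why an overly large passing test set tends to produce loose invariants.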

The execution information file of each failed test case (which has the same format as that of a successful test case) is then analyzed. For the variable or expression at each position, the recorded value is checked against the program invariant of the corresponding type learned from the preferred successful test case set.

(1) For a set-type variable, whether its value appears in the set invariant is determined according to the statement position and the variable ID.

(2) For a floating-point variable, whether its value lies within the lower and upper bounds of the range invariant is determined according to the statement position and the variable ID.

(3) For a logical expression, whether its truth-value sequence appears in the corresponding truth table is determined according to the statement position.

If any of these checks fails, the code line containing the variable or expression is said to produce an invariant violation and is called a suspicious code line.
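The three checks can be illustrated with the following sketch. The record layout `(kind, location, name, value)` and the dictionary shapes of the invariants are assumptions chosen for the example, not a format given in the text.

```python
def detect_violations(failed_run, set_inv, range_inv, truth_inv):
    """Return the (location, name) pairs at which a failed run violates a
    learned invariant. Records without a learned invariant are skipped."""
    violations = []
    for kind, loc, name, value in failed_run:
        key = (loc, name)
        if kind == "set" and key in set_inv:
            ok = value in set_inv[key]
        elif kind == "range" and key in range_inv:
            low, high = range_inv[key]
            ok = low <= value <= high
        elif kind == "truth" and key in truth_inv:
            ok = tuple(value) in truth_inv[key]
        else:
            continue  # nothing was learned here, so nothing can be violated
        if not ok:
            violations.append(key)
    return violations


set_inv = {(3, "x"): {1, 2}}
range_inv = {(4, "y"): (0.5, 2.0)}
truth_inv = {(5, "e"): {(True, False)}}
failed_run = [
    ("set", 3, "x", 7),                    # 7 was never observed: violation
    ("range", 4, "y", 1.0),                # inside [0.5, 2.0]: satisfied
    ("truth", 5, "e", [False, False]),     # unseen truth sequence: violation
    ("set", 9, "z", 0),                    # no invariant learned at line 9
]
violations = detect_violations(failed_run, set_inv, range_inv, truth_inv)
```

The skipped record at line 9 shows why statements covered only by failed test cases need the separate coverage-based check: without a learned invariant there is nothing to violate.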

Further, using reaching definitions analysis to add the assignment statements that give rise to invariant violations, and their direct control-dependence statements, to the suspicious statement set comprises:

The embodiment of the invention analyzes the data dependence and control dependence between statements to determine how statements influence one another and along which paths a fault propagates, thereby improving localization effectiveness. On the one hand, analysis of variable definition relations locates defects in variable assignments, which existing methods find hard to locate. On the other hand, error state propagation causes many statements that are not the root cause of the failure to be flagged as suspicious, so dependency analysis is used to filter out such false positives.

To quantify the suspicion of a statement, a confidence level is defined for the invariant violation, measuring the likelihood that the invariant violation is the root of a failure.

For a constant assignment statement such as var = 1, the variable takes the same fixed value in both successful and failed test cases, so no invariant violation can be detected at the statement. In addition, successful and failed test cases usually both execute the assignment, so it is also hard to locate with methods such as program spectrum analysis.

However, if var is wrongly assigned 1 (for example, when its correct value is 0), every subsequent statement that references var is affected, and invariant violations may be detected at those statements.

To locate the source of the failure and avoid missing such constant assignment errors, reaching definitions analysis is applied once a suspicious statement producing an invariant violation has been detected. The defining statements (assignment or return statements) of the variables in that statement are also treated as suspicious: their invariant violation attribute is set to true, and their confidence of being the source of the failure is defined as 1.

Reaching definitions analysis (Reaching Definitions): for each program point, determine which definitions of the variables in the program can reach that point along some program path. The control flow graph is traversed and the reaching definition information is solved from the data flow equations with an iterative algorithm.
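The iterative solution of the data flow equations can be sketched as below. This is the textbook round-robin formulation, simplified by assuming that each node defines at most one variable; the graph encoding and the function name are illustrative only.

```python
def reaching_definitions(cfg, defs):
    """Classic iterative reaching-definitions analysis.

    cfg:  node -> list of successor nodes (every node must appear as a key).
    defs: node -> name of the variable defined at that node, if any.
    Returns node -> set of (defining_node, variable) pairs reaching its entry."""
    nodes = list(cfg)
    preds = {n: [] for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for n in nodes:
            # IN[n] is the union of OUT[p] over all predecessors p
            new_in = set().union(*(out_sets[p] for p in preds[n]))
            var = defs.get(n)
            if var is None:
                new_out = set(new_in)
            else:
                # kill older definitions of var, then generate this one
                new_out = {(d, v) for d, v in new_in if v != var} | {(n, var)}
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True
    return in_sets


# 1: var = 1;  2: if (...);  3: var = 2;  4: y = ...;  5: use(var)
cfg = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: []}
defs = {1: "var", 3: "var", 4: "y"}
reach_in = reaching_definitions(cfg, defs)
```

At node 5 both definitions of `var` are still live, because the definition at node 1 survives along the branch through node 4.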

Control dependence: if whether node vj is executed is decided by the predicate at node vi, then vj is said to be control dependent on vi, or a control dependence edge is said to exist between vi and vj.

Definition: vi defines variable x if the memory cell of x is written once as a result of executing vi.

A definition of variable x is a statement that assigns, or may assign, a value to x. The most common definitions are statements that assign a value to x or read a value into x.

Reference: vj is said to reference the variable x defined at vi if that x is used as an operand in the expression of vj.

Reaching: the definition vi of variable x reaches vj if and only if there is an execution path from vi to vj along which x is not redefined.

To reduce analysis cost and avoid introducing redundant information, the embodiment of the invention applies only one step of backward control dependence and reaching definitions analysis: for a statement that produces an invariant violation, only its direct control-dependence statement and the defining statements of its variables are added to the candidate set.
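A minimal sketch of this one-step backward expansion follows. The input shapes (`uses`, `reaching_in`, `control_parent`) and the function name are hypothetical and assume that the reaching definitions and direct control dependences have already been computed.

```python
def expand_candidates(violating, uses, reaching_in, control_parent):
    """One-step backward expansion of the candidate set.

    violating:      set of nodes that produced an invariant violation.
    uses:           node -> set of variables referenced at that node.
    reaching_in:    node -> set of (defining_node, variable) reaching it.
    control_parent: node -> predicate node it directly depends on, if any."""
    candidates = set(violating)
    for node in violating:
        for def_node, var in reaching_in.get(node, ()):
            if var in uses.get(node, ()):
                candidates.add(def_node)  # defining statement of a used variable
        if node in control_parent:
            candidates.add(control_parent[node])  # direct control dependence
    return candidates


candidates = expand_candidates(
    violating={5},
    uses={5: {"var"}},
    reaching_in={5: {(1, "var"), (3, "var"), (4, "y")}},
    control_parent={5: 2},
)
```

Node 4 is not added even though its definition of `y` reaches node 5, because node 5 does not reference `y`; the expansion also stops after one step rather than following dependences transitively.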

An error in one statement can corrupt the program state, and the corrupted state can propagate further as execution continues. The location where the execution path of the program changes is therefore not necessarily the statement that actually caused the error; the error may originate in another statement and propagate through the program state until the execution path changes.

If a defective statement is the root cause of a failure, other statements that reference the erroneous value it produces may also produce invariant violations. Based on this heuristic, a fault propagation analysis method based on backward slicing is provided: it analyzes the propagation path of the error state and filters out invariant violation false positives caused by fault propagation. The data dependence path with invariant marks and the error propagation data flow path are defined on the basis of the data dependency graph.

To reduce complexity, no complete dynamic program dependency graph is built; instead, the data dependencies between invariant violations are queried on demand from a static data dependency graph.

Data dependency graph: the data dependency graph DDG = (V, E) of program P is a directed graph, where V is the node set, representing the statements and predicates in the program, and E is the edge set, representing the data dependencies between nodes.

Data dependence: if there is a path from node v1 to node v2, and a variable referenced in v2 is defined in v1 and is not redefined anywhere else on the path from v1 to v2, then v2 is said to be data dependent on v1, or a data flow edge is said to exist between v1 and v2.

Data dependence path with invariant marks: for a given failed test case tf, the data dependence path with invariant marks is a directed graph IDDG(tf) = (V, I, E), where V is the node set, representing the statements and predicates in the program; I is the attribute set of the nodes, recording the coverage and invariant information of each node when tf is executed; and E is the edge set, representing the data dependencies between nodes.

The attribute of each node is a triple (COV, INV, VIO), defined as follows:

COV (statement node covered by the failed execution): T if the failed test case executes the statement, otherwise F;

INV (node with an invariant learned from successful test cases): T if an invariant was learned at the node from the successful test cases, otherwise F;

VIO (node producing an invariant violation): T if the node produces an invariant violation, otherwise F.

Error propagation data flow path: for a failed test case tf, assume the node sequence $v_1, v_2, \ldots, v_k$ is a data flow path in IDDG(tf). If (1) $v_i.\mathrm{COV} = T$ for $1 \le i \le k$, i.e. every node $v_i$ on the path is executed by the failed test case tf; (2) $v_1.\mathrm{INV} = T$ and $v_k.\mathrm{INV} = T$, with $v_1$ and $v_k$ both producing invariant violations; and (3) $k = 1$, or there is no $i$ with $2 < i < k$ such that $v_i.\mathrm{INV} = T$ and $v_i.\mathrm{VIO} = F$, i.e. the data flow path contains no node with a non-violated success invariant, then the path is an error propagation data flow path.

In order to efficiently evaluate whether an invariant violation is the cause of a failure, each invariant violation is assigned a confidence value.

$\mathrm{confidence}(v_1, tf, TS) = 1$

$\mathrm{confidence}(v_i, tf, TS) = 0.1,\quad 2 \le i \le k$

That is, $v_1$ is the most likely cause of the fault and receives the higher confidence, while invariant violations elsewhere on the data flow path are more likely caused by the error state propagating along the data flow. Assigning these violations a lower confidence helps reduce invariant violation false positives.

For example, on a failure propagation data flow path through $s_1$, $s_2$ and $s_7$, $s_1$ is the root of the failure with a confidence of 1, while $s_2$ and $s_7$ are invariant violations caused by the error state propagating along the data flow, each with a confidence of 0.1.

If, on the data flow path, there exists a node $v_i$, $1 < i < k$, with $v_i.\mathrm{INV} = T$ and $v_i.\mathrm{VIO} = F$, i.e. a node with a success invariant that is not violated, then $\mathrm{confidence}(v_{i+1}, tf, TS) = 1$. This follows the principle of trusting success invariants: if a statement carries a success invariant that is not violated, it is assumed not to produce a wrong value, so a successor statement that depends on it and produces an invariant violation may be an independent source of error.
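The confidence rules above can be sketched as follows for a single data flow path. The function name and the tuple encoding of the (COV, INV, VIO) attributes as Python booleans are assumptions for illustration; the sketch applies the trust rule only to the immediate successor of a trusted node, as stated in the text.

```python
def assign_confidence(path, attrs):
    """Assign a confidence to every violating node on a data flow path.

    path:  list of nodes v_1 .. v_k executed by the failed test case.
    attrs: node -> (COV, INV, VIO) as booleans.
    Returns node -> confidence for the nodes whose VIO attribute is true."""
    confidence = {}
    for j, node in enumerate(path):
        _cov, _inv, vio = attrs[node]
        if not vio:
            continue
        if j == 0:
            confidence[node] = 1.0  # first violation: treated as the root
            continue
        _pcov, prev_inv, prev_vio = attrs[path[j - 1]]
        # A predecessor with a satisfied success invariant is trusted, so a
        # violation right after it is treated as an independent source.
        trusted_predecessor = prev_inv and not prev_vio
        confidence[node] = 1.0 if trusted_predecessor else 0.1
    return confidence


violating = (True, True, True)
trusted = (True, True, False)
propagated = assign_confidence(
    ["s1", "s2", "s7"],
    {"s1": violating, "s2": violating, "s7": violating},
)
independent = assign_confidence(
    ["a", "b", "c"],
    {"a": violating, "b": trusted, "c": violating},
)
```

In the first path the violations at s2 and s7 are discounted as propagation from s1; in the second, the satisfied invariant at b breaks the chain, so c is scored as a possible root in its own right.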

When the invariant violations produced by different failed test cases are analyzed, the data dependency graph does not need to be rebuilt and the invariant information learned from the successful test cases does not need to be marked again; only step (3) is executed again with the invariant violation information produced by the new failed test case.

To reduce mutual interference among multiple failures, the failed test cases are clustered by execution path similarity so that failed test cases likely to share the same cause fall into the same cluster. Successful test cases are then selected separately for each cluster of failed test cases: those whose execution paths are similar to the failed executions are chosen and coincidentally correct test cases are excluded, which narrows the invariant ranges and reduces missed detections. Program dependence analysis further reduces missed detections and false positives in fault localization. On the one hand, reaching definitions analysis identifies the defining statements of variables and thereby detects assignment statement defects, reducing missed detections. On the other hand, dependence-based error propagation analysis uses dynamic trace information to analyze error propagation paths and filters out invariant violation false positives caused by error propagation.

Preferably, after obtaining the set of suspect statements, the method further comprises:

The embodiment of the invention sorts the suspicious statements, thereby improving detection efficiency.

As shown in fig. 2, the software fault location method based on program invariants according to the embodiment of the present invention mainly includes the following steps:

In the first step, the source code is parsed and execution information is collected.

To obtain the execution information of the program, the program is instrumented on the basis of its abstract syntax tree. Without changing the logic or function of the original program, probe statements at three levels of granularity record statement coverage, variable and expression values, and logical expression truth tables during program execution.

In the second step, the two classes of suspicious statements that are hard to locate through invariant violations are identified.

Statement coverage information is analyzed to identify statements covered only by failed test cases, and the logical expression truth tables are analyzed to detect always-true and always-false predicates.

In the third step, the test cases are optimized. Successful test cases unrelated to a failed test case can interfere with invariant violation detection. For example, if many successful test cases execute the faulty statement, the learned invariant range may be too broad, causing invariant violations to be missed. At other, correct statements, invariants unrelated to the current failure may be learned, causing false positives in invariant violation detection.

In addition, the successful test case set may contain coincidentally correct test cases. Such a test case produces an erroneous state during execution that is not propagated to the final result. If it is used to learn invariants, an incorrect invariant set may be generated, which affects the subsequent localization and can cause false positives or missed detections in invariant violation detection.

Therefore, the embodiment of the invention provides a test case optimization method that takes the failure paths into account. First, failed test cases with similar execution paths are clustered according to their statement coverage information, which avoids mutual interference among different failures. Then, for each cluster of failed test cases, coincidentally correct test cases are removed and successful test cases whose execution paths are similar to the failed executions are preferentially selected.
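The clustering and selection idea can be illustrated with the sketch below. The similarity measure (Jaccard over covered statements), the greedy clustering, both thresholds and all function names are assumptions chosen for the example; the text does not fix them, and the removal of coincidentally correct test cases is not modelled here.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of covered statements."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_failed(fail_cov, threshold=0.8):
    """Greedy clustering: a failed test joins the first cluster whose
    representative coverage is similar enough, else starts a new cluster.
    fail_cov: test id -> set of covered statements."""
    clusters = []  # list of (representative coverage, member ids)
    for test_id, cov in fail_cov.items():
        for representative, members in clusters:
            if jaccard(representative, cov) >= threshold:
                members.append(test_id)
                break
        else:
            clusters.append((cov, [test_id]))
    return [members for _, members in clusters]


def select_passing(cluster, fail_cov, pass_cov, threshold=0.5):
    """Keep passing tests whose coverage resembles the cluster's failed path."""
    failed_path = set().union(*(fail_cov[t] for t in cluster))
    return [test_id for test_id, cov in pass_cov.items()
            if jaccard(failed_path, cov) >= threshold]


fail_cov = {"f1": {1, 2, 3, 4}, "f2": {1, 2, 3, 4}, "f3": {7, 8, 9}}
pass_cov = {"p1": {1, 2, 3}, "p2": {7, 8}, "p3": {10}}
clusters = cluster_failed(fail_cov)
chosen = [select_passing(c, fail_cov, pass_cov) for c in clusters]
```

Each cluster gets its own passing set, so invariants learned for the f1/f2 failure are not loosened by p2, which exercises unrelated code.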

In the fourth step, the invariant set is learned from the preferred successful test cases. To reduce the complexity of fault localization and improve its effectiveness, the embodiment of the invention defines three types of invariants according to the types of program expressions, supporting the localization of both value-related defects and program-logic-related defects.

In the fifth step, invariant violations are detected.

For the learned invariant set, the values of each failed test case at the corresponding program points are checked against the invariants. If an invariant is not satisfied, the invariant violation is added to the candidate set and the program point is marked as a suspicious statement position.

In the sixth step, fault propagation analysis is used to reduce missed detections and false positives in invariant violation detection.

Defects in constant assignment statements are hard to locate because the value and execution path of such a statement are the same for successful and failed test cases. However, during the execution of a failed test case the erroneous program state propagates along the control and data dependence paths of the program, so subsequent statements that depend on the assignment statement exhibit invariant violations. The defective assignment statement can then be identified through reaching definitions analysis, which reduces missed detections.

On the other hand, error state propagation can cause false positives in invariant violation detection. If a defective statement is the root cause of a failure, other statements that reference the erroneous value it produces may also produce invariant violations. Such subsequent violations are not the root of the failure; they merely result from the erroneous value propagating through later computations. Based on this heuristic, the error propagation path is analyzed and the invariant violation false positives caused by fault propagation are filtered out.

In the seventh step, the suspiciousness of each program point with invariant violations is calculated statistically, the program points are sorted in descending order of suspiciousness, and the list of suspicious statements is output together with their invariant violation information.
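The ranking in the seventh step can be sketched as follows, assuming purely for illustration that a statement's score is the sum of its violation confidences over the failed test cases divided by the number of failed test cases. That aggregation, the function name and the input layout are assumptions of this example; the method's actual formula may differ.

```python
def rank_suspicious(confidences, num_failed):
    """Rank statements by an assumed suspiciousness score.

    confidences: failed test id -> {statement: confidence of its violation}.
    num_failed:  total number of failed test cases (|TF|).
    Score (an assumption of this sketch): summed confidence / num_failed."""
    totals = {}
    for per_test in confidences.values():
        for statement, conf in per_test.items():
            totals[statement] = totals.get(statement, 0.0) + conf
    scores = {s: total / num_failed for s, total in totals.items()}
    # highest score first; ties broken by statement name for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


ranking = rank_suspicious(
    {"f1": {"s1": 1.0, "s2": 0.1}, "f2": {"s1": 1.0}},
    num_failed=2,
)
```

Under this assumed score, a statement flagged as a root cause in every failed run ranks far above one that only shows a propagated violation in a single run.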

Fig. 3 is a schematic structural diagram of a software fault location apparatus based on program invariants according to an embodiment of the present invention. The apparatus of this embodiment comprises:

the source code instrumentation unit 31 is configured to build an abstract syntax tree for the source code of the target software; perform statement-level, value-level and logical-expression-level instrumentation on the source code according to the abstract syntax tree; and execute the instrumented source code with a preset successful test case set and a preset failed test case set to obtain the execution information of each test case, including statement coverage information, the values of variables and expressions, and the truth tables of logical expressions;

the test case set processing unit 32 is configured to perform cluster analysis on the failed test case set, and screen the successful test case set according to a failure coverage equivalence division preference criterion to obtain an optimal successful test case set;

an invariant set obtaining unit 33, configured to learn the preferred successful test case set to obtain a program invariant set, where the program invariant set includes a set type invariant, a floating point type range invariant, and a truth table type invariant;

the suspicious statement set obtaining unit 34 is configured to perform invariant violation detection according to the execution information of the failed test case set and the program invariant set, to obtain a suspicious statement set.

Optionally, the test case set processing unit 32 is further configured to:

Optionally, the suspicious statement set obtaining unit 34 is further configured to:

Optionally, the apparatus further comprises:

a fault propagation analysis unit, configured to add, by reaching definitions analysis, the assignment statements giving rise to invariant violations and their direct control-dependence statements to the suspicious statement set, and to filter out of the suspicious statement set, by dependency analysis, the invariant violation false positives caused by fault propagation.

The fault propagation analysis unit is specifically configured to:

Optionally, the fault propagation analysis unit is further configured to:

Optionally, the apparatus further includes a statement suspiciousness calculation unit, configured to statistically analyze the invariant violations produced by each failed test case, calculate the suspiciousness of each suspicious statement in the suspicious statement set, and sort the suspicious statements by suspiciousness.

The statement suspiciousness calculation unit 36 is specifically configured to calculate the suspiciousness $\mathrm{Sus\_Inv}(s_i)$ of each suspicious statement $s_i$, where TS is the preset successful test case set; $|TF|$ is the total number of test cases in the preset failed test case set; $v_i$ is the i-th node in the data dependency graph; tf is a failed test case; and $\mathrm{confidence}(v_i, tf, TS)$ is the confidence of the invariant violation produced at node $v_i$ when the failed test case tf is executed. If the node is the cause of failure determined in the dependency analysis, $\mathrm{confidence}(v_1, tf, TS) = 1$; otherwise $\mathrm{confidence}(v_i, tf, TS) = 0.1$, $2 \le i \le k$.

The apparatus of the embodiment of the present invention may be used to implement the above method embodiments, and the principle and technical effect are similar, which are not described herein again.

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device includes a memory 41 and a processor 42, the memory 41 and the processor 42 are communicatively connected via an internal bus 43, the memory 41 stores program instructions executable by the processor 42, and the program instructions, when executed by the processor 42, implement the method described above.

In addition, the logic instructions in the memory 41 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention or a part thereof, which essentially contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Another embodiment of the present invention provides a computer-readable storage medium storing computer instructions that cause the computer to perform the above-described method.

The method addresses several problems of existing methods: the defect types that can be located are limited because invariants are incompletely defined, and invariant violation detection suffers from missed detections and false positives because the influence of test cases and the propagation of errors are not analyzed. First, the definition of program invariants is improved: invariants are divided into range invariants, set invariants and Boolean sequence invariants according to the types of variables and expressions, and two classes of suspicious statements that are hard to detect through invariant violations are defined, which enriches the defect types that can be located. Then, optimization strategies for reducing the false positives and missed detections of invariant violation detection are studied. A test case optimization that takes failure paths into account is proposed: failed test cases are clustered by execution path to avoid mutual interference among multiple failures, and successful test cases whose execution paths are similar to the failed executions are selected while coincidentally correct test cases are excluded, which narrows the invariant ranges and reduces missed detections. Fault propagation analysis is also used to reduce missed detections and false positives: on the one hand, reaching definitions analysis detects assignment statement defects that would otherwise be missed; on the other hand, the error propagation path is analyzed to filter out invariant violation false positives caused by error propagation. On this basis, a program invariant fault localization method that analyzes error state propagation is provided.

Compared with the program spectrum method, the method provided by the embodiment of the invention has the following advantages:

(1) The suspicious statement list of the method lists only the suspicious statements that produce invariant violations, and false positives are further reduced by test case optimization, error propagation analysis and similar operations, so most defective versions can be located by examining less code.

(2) For the versions that can be located, the method has a lower code review overhead.

The program spectrum method is better suited to defects for which the statements covered by the failing path and the passing path clearly differ, such as control-flow-related defects. When successful and failed test cases both execute the defective statement, its localization is poor.

The method of the embodiment of the invention considers not only statement coverage (for example, detecting suspicious statements covered only by failed test cases) but also the propagation of erroneous state values along the control flow and data flow of the program, which significantly improves the effectiveness of the localization results.

(3) The program spectrum method does not consider the influence of statements on one another; it can only give a statistical analysis result and cannot effectively help developers understand the defect. Invariant analysis takes fault propagation and the program context into account, and the invariant violation results can help developers understand the cause of a software error and assist them in fixing the defect.

(4) Compared with the program spectrum method, the method of the embodiment of the invention takes concrete variable values and error propagation paths into account, so the number of statements tied at the same rank is significantly reduced and localization effectiveness is improved.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It is to be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

In the description of the present invention, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

While the foregoing is directed to embodiments of the present invention, other modifications and variations of the present invention may be devised by those skilled in the art in light of the above teachings. It should be understood by those skilled in the art that the foregoing detailed description is for the purpose of better explaining the present invention, and the scope of the present invention should be determined by the scope of the appended claims.

Claims

1. A software fault positioning method based on program invariants is characterized by comprising the following steps:

2. The method of claim 1, further comprising:

3. The method of claim 1, wherein after obtaining the set of suspect statements, the method further comprises:

4. The method of claim 3, wherein using reaching definitions analysis to add the assignment statements giving rise to invariant violations and their direct control-dependence statements to the suspicious statement set comprises:

5. The method of claim 3, wherein using dependency analysis to filter out, from the suspicious statement set, the invariant violation false positives caused by fault propagation comprises:

6. The method of claim 1, wherein after obtaining the set of suspect statements, the method further comprises:

7. The method according to claim 6, wherein said calculating the suspiciousness of each suspicious statement in the suspicious statement set comprises:

8. A software fault locating device based on program invariants, comprising:

9. An electronic device comprising a memory and a processor, the memory and the processor being communicatively coupled via an internal bus, the memory storing program instructions executable by the processor, the program instructions when executed by the processor being operable to implement the method of any of claims 1-7.

10. A computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.