WO2017201853A1

WO2017201853A1 - Method for locating program regression fault using slicing model

Info

Publication number: WO2017201853A1
Application number: PCT/CN2016/090956
Authority: WO
Inventors: 刘烃; 王海军; 郑庆华; 管晓宏; 陈泽华; 朱海萍
Original assignee: 西安交通大学
Priority date: 2016-05-26
Filing date: 2016-07-22
Publication date: 2017-11-30
Also published as: CN106095663A; CN106095663B

Abstract

A method for locating a program regression fault using a slicing model, comprising: in a program pre-processing phase, comparing source codes of two versions of a program to identify parts that differ therebetween, and rearranging the source codes according to the identification result; in a trajectory association phase, on the basis of execution trajectories of the two acquired program versions, dependencies among statements thereof, and information of variables thereof, associating and classifying the statements of two execution trajectories; in a slicing analysis phase, taking an execution failure point of a new-version program as a starting point and performing slicing analysis; according to the statement entity classification and the dependencies thereof, backtracking to a statement entity causing the program execution failure, until there are no further dependency statements to be analyzed and the dependency of the current statement under analysis no longer needs to be analyzed; and using all statement entities analyzed in all phases in the slicing analysis as program behavior slicing output causing a regression error. The present invention explains the mechanism by which a regression fault is generated, and provides guidance for recovery from the regression fault.

Description

Program regression error location method based on slice model

Technical field

The invention relates to the field of trusted software and software testing, in particular to a method for locating regression errors in a program.

Background art

Software testing is the basic means of ensuring software quality, and it is also the most labor-intensive and resource-intensive part of the software development process. Experience has shown that regression errors are often introduced during software updates, so regression testing is necessary. Although many technologies automate regression testing, regression testing is only the first step in the software testing process. The more important and challenging task is to find the program modifications that caused the program to fail and to provide a context explaining why these modifications cause the failure. This process is not simple. First, as programs become more complex, regression errors typically appear only in specific environments or configurations. Second, due to time and labor constraints, usually only one test case is available; this test case executes successfully on the old version of the program but fails on the new version. Third, the precise context in which program execution fails is difficult to obtain. Only when testers understand the context in which the program failed can they accurately fix the error based on this information.

In recent years, many techniques for locating regression errors have been proposed, such as ADD (Augmented Delta Debugging) and AFTER (Automated Fault Explanation for Regression testing). ADD combines coverage analysis with delta debugging to automatically isolate the modifications that cause program execution to fail. However, ADD does not analyze the semantics of the program, so the modifications it identifies may merely avoid the execution failure rather than being the real error. AFTER uses dynamic analysis based on delta debugging and semantic analysis based on symbolic analysis to locate regression errors. The main problem with AFTER is its scalability: symbolic analysis is computationally expensive, so it is difficult to apply to large-scale programs. Although these techniques can automatically locate the program changes that cause execution to fail, they are rarely used in practice because they cannot provide the context in which the failure arises. A generally accepted assumption in the existing techniques is that testers can easily understand why the program failed to execute. In fact, understanding the reasons for an execution failure is not simple, so the prior art is of little help in repairing regression errors.

Therefore, a practical regression error localization technique must not only precisely locate the modification that causes the error, but also provide the context in which that modification causes the program to fail.

Summary of the invention

The object of the present invention is to propose a program regression error localization method based on a slice model. By performing slice analysis on two versions of a program, the method outputs the program behavior slice that causes the regression error, thereby solving the problem of regression error localization in program testing.

In order to achieve the above object, the present invention adopts the following technical solutions:

A program regression error location method based on a slice model includes the following steps:

S1), based on the two versions of the program under test, calculate the differences between their source code, and rearrange the source code of both versions according to these differences, so that identical code has the same line number in both versions and differing code corresponds to a blank line in the other version;

S2), run both rearranged versions on a test case that causes the modified version to fail, and extract the execution traces of the two versions, the variable values contained in each statement, and the dependencies of each statement;

S3), organize each execution trace into a tree structure in which each node is the execution sequence of one function; call the function body statement correspondence method to match the execution traces of the two versions extracted in step S2);

S4), according to the source code differences, the trace correspondence result, and the variable values contained in the statements, divide the statements in the traces into four categories: statements that differ due to modification, statements with different flow, statements with different values, and consistent statements;

S5), perform slice analysis of how the regression error arises, taking the execution failure point of the new version as the starting point, and add the failure point statement to the slice analysis queue;

S6), determine whether the slice analysis queue is empty; if it is empty, go to step S9), otherwise go to step S7);

S7), take the statement at the head of the slice analysis queue and determine whether it needs slice analysis; if not, go to step S6), otherwise go to step S8);

S8), add the statement under analysis to the regression error slice, add its dependency statements and its corresponding statement that need slice analysis, as determined by the statement classification of step S4), to the slice analysis queue, and then go to step S6);

S9), output the regression error slice.

A further improvement of the present invention is that the process of code rearrangement in the step S1) comprises the following steps:

S101), calculate the differences between the source code of the two versions of the program;

S102), according to the calculated differences, divide the statements in the source code of the two versions into two categories: identical statements and modified statements;

S103), without changing the execution order of the program statements, rearrange the source code of the two versions so that identical statements have the same line number and each modified statement corresponds to a blank line in the other version;

S104), output the rearranged source code of the two versions as the new programs under test.
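
The code rearrangement steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it assumes a line-based diff computed with Python's standard `difflib.SequenceMatcher`, and the function name `rearrange` and its return shape are hypothetical. Lines that match in both versions are emitted on the same row; every other line is emitted on its own row opposite a blank line, so the relative order of statements within each version is preserved.

```python
import difflib


def rearrange(old_lines, new_lines):
    """Align two source versions so identical lines share a line number.

    Returns (old_out, new_out, modified). old_out and new_out have equal
    length; `modified` holds the 1-based rearranged line numbers of the
    statements that exist in only one version.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    old_out, new_out, modified = [], [], set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical statements land on the same row in both files.
            old_out.extend(old_lines[i1:i2])
            new_out.extend(new_lines[j1:j2])
            continue
        # A modified statement faces a blank line in the other file.
        for line in old_lines[i1:i2]:
            old_out.append(line)
            new_out.append("")
            modified.add(len(old_out))
        for line in new_lines[j1:j2]:
            old_out.append("")
            new_out.append(line)
            modified.add(len(new_out))
    return old_out, new_out, modified


if __name__ == "__main__":
    # Hypothetical three-line fragment in which one statement was deleted.
    old = ["int abs = x;", "if (x < 0)", "abs = -x;"]
    new = ["int abs = x;", "abs = -x;"]
    old_out, new_out, _ = rearrange(old, new)
    for row in zip(old_out, new_out):
        print(row)
```

Because both outputs have the same length, a line number in one rearranged file can be looked up directly in the other, which is what the later trace correspondence relies on.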

A further improvement of the present invention is that the tree structure of the execution trace in step S3) is defined as follows: each node in the tree is the execution trace of one function body, and the child nodes of a node are the execution traces of the function bodies called within the function body that the node represents. In step S3), the function body statement correspondence method starts from the root node of the tree structure (the trace of the main function body) and specifically includes the following steps:

S301), determine whether the set of statements still to be matched in the function body currently being matched is empty; if it is empty, the function body statement correspondence method ends and returns, otherwise go to step S302);

S302), take the next statement from the function body as the current statement to be matched;

S303), determine whether the current statement is a function call statement; if it is, go to step S304), otherwise go to step S306);

S304), determine whether the function call has a corresponding function call in the other trace; if it does, go to step S305), otherwise go to step S301);

S305), call the function body statement correspondence method recursively to match the statements in the traces of the called function bodies; when it returns, go to step S301);

S306), determine whether the current statement has a corresponding statement in the other trace; if it does, go to step S307), otherwise go to step S301);

S307), associate the current statement with its corresponding statement in the other trace, then go to step S301).
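
The recursion in steps S301) to S307) can be sketched as below. The sketch is illustrative only and makes assumptions that the text does not state: statements are matched by their rearranged line number together with their occurrence count inside one function body execution (so that repeated executions of a line pair up in order), a matched call statement is itself associated before its callee bodies are matched recursively, and the names `Stmt`, `Body`, and `align` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Stmt:
    line: int                        # line number after code rearrangement
    callee: "Body | None" = None     # trace of the called body, if a call
    partner: "Stmt | None" = None    # matched statement in the other trace


@dataclass(eq=False)
class Body:
    stmts: list = field(default_factory=list)


def _keyed(body):
    """Yield ((line, k), stmt), where k counts earlier executions of line."""
    seen = {}
    for stmt in body.stmts:
        k = seen.get(stmt.line, 0)
        seen[stmt.line] = k + 1
        yield (stmt.line, k), stmt


def align(body_a, body_b):
    """Associate the statements of two function body traces in place."""
    candidates = dict(_keyed(body_b))
    for key, stmt in _keyed(body_a):       # S301/S302: next statement
        other = candidates.get(key)
        if other is None:                  # S304/S306: no counterpart
            continue
        stmt.partner, other.partner = other, stmt       # S307
        if stmt.callee is not None and other.callee is not None:
            align(stmt.callee, other.callee)            # S305: recurse
```

Statements whose `partner` is still `None` after `align` returns are the ones that have no corresponding statement in the other trace.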

A further improvement of the present invention is that the strategy for classifying the statements in step S4) is to divide all statements in the traces into the following four categories, according to the source code statement classification of step S102), the trace correspondence result, and the variable value information contained in the statements:

(1) Statements that differ due to modification: modified statements;

(2) Statements with different flow: unmodified statements that have no corresponding statement in the other trace;

(3) Statements with different values: statements that have a corresponding statement in the other trace, where at least one variable value contained in the statement differs;

(4) Consistent statements: statements that have a corresponding statement in the other trace, where all variable values contained in the statement are the same.

The classification process includes the following steps:

S401), determine whether all statements in the traces have been classified; if so, go to step S407), otherwise go to step S402);

S402), take the next unclassified statement from the traces and classify it;

S403), according to the source code classification, determine whether the statement is a modified statement; if it is, mark it as a statement that differs due to modification and go to step S401), otherwise go to step S404);

S404), according to the trace correspondence result, determine whether the statement has a corresponding statement; if it does not, mark it as a statement with different flow and go to step S401), otherwise go to step S405);

S405), according to the variable value information contained in the statement, determine whether the variable values contained in the statement and in its corresponding statement are the same; if at least one variable value differs, mark the statement as a statement with different values and go to step S401), otherwise go to step S406);

S406), mark the statement as a consistent statement and go to step S401);

S407), the classification of the statements in the traces ends and the process returns.
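
The decision chain of steps S403) to S406) can be written as a single function. The following is a minimal sketch, assuming that each executed statement carries a flag from step S102) and a dictionary of observed variable values; the names `Kind` and `classify` and the argument layout are hypothetical and do not come from the original text.

```python
from enum import Enum


class Kind(Enum):
    MODIFIED = "different due to modification"
    FLOW = "different flow"
    VALUE = "different value"
    CONSISTENT = "consistent"


def classify(is_modified, values, partner_values):
    """Classify one executed statement (steps S403 to S406).

    is_modified:    True if step S102 marked the source line as modified.
    values:         dict of variable name -> value seen at this statement.
    partner_values: the same dict for the matched statement in the other
                    trace, or None when no matched statement exists.
    """
    if is_modified:                 # S403
        return Kind.MODIFIED
    if partner_values is None:      # S404
        return Kind.FLOW
    if values != partner_values:    # S405: at least one value differs
        return Kind.VALUE
    return Kind.CONSISTENT          # S406
```

The order of the checks matters: a modified statement is reported as modified even if it happens to have no counterpart, which mirrors the order of steps S403) and S404).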

A further improvement of the present invention is that the criterion in step S7) for determining whether a statement needs slice analysis is as follows: if the statement has already been analyzed or was classified as a consistent statement in step S4), no slice analysis is needed; otherwise slice analysis is needed.

A further improvement of the present invention is that the analysis strategy in step S8) for a statement that needs slice analysis is as follows: for a statement classified in step S4) as having different values, only its data dependency statements and its corresponding statement in the other trace are added to the slice analysis queue; for a statement with different flow, only its control dependency statements are added to the slice analysis queue; for a statement that differs due to modification, both its control dependency statements and its data dependency statements are added to the slice analysis queue. In every case, the statement currently analyzed in step S8) is added to the regression error slice.
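
The queue-driven loop of steps S5) to S9), combined with the criterion and strategy above, can be sketched as a worklist algorithm. This is an illustrative sketch under stated assumptions, not the implementation of the invention: statements are opaque hashable keys, the classification is given as one of the strings `"modified"`, `"flow"`, `"value"`, or `"consistent"`, and the function name `regression_slice` and its dictionary arguments are hypothetical.

```python
from collections import deque


def regression_slice(failure, kind, data_deps, ctrl_deps, partner):
    """Return the regression error slice as a list in analysis order.

    failure:   statement at the failure point of the new version.
    kind:      dict statement -> "modified" | "flow" | "value" | "consistent".
    data_deps: dict statement -> statements it is data dependent on.
    ctrl_deps: dict statement -> statements it is control dependent on.
    partner:   dict statement -> matched statement in the other trace.
    """
    queue = deque([failure])                       # S5
    analyzed, result = set(), []
    while queue:                                   # S6
        stmt = queue.popleft()                     # S7
        if stmt in analyzed or kind[stmt] == "consistent":
            continue                               # no slice analysis needed
        analyzed.add(stmt)
        result.append(stmt)                        # S8: add to the slice
        if kind[stmt] == "value":
            if stmt in partner:
                queue.append(partner[stmt])
            queue.extend(data_deps.get(stmt, ()))
        elif kind[stmt] == "flow":
            queue.extend(ctrl_deps.get(stmt, ()))
        else:                                      # "modified"
            queue.extend(ctrl_deps.get(stmt, ()))
            queue.extend(data_deps.get(stmt, ()))
    return result                                  # S9
```

The `analyzed` set guarantees termination even when matched statements point at each other, since each statement enters the slice at most once.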

Compared with the prior art, the present invention has the following beneficial effects. The present invention proposes a program regression error localization method based on a slice model, which performs slice analysis on two versions of a program and outputs the program behavior slice that causes the regression error, thereby solving the problem of regression error localization in program testing. In the program pre-processing stage, the method compares the source code of the two versions, identifies the parts that differ, and rearranges the source code of both versions according to the identification result so that, without affecting the execution order of the program code, corresponding code in the two versions has the same line number. In the trace correspondence stage, the statements on the two execution traces are matched according to the acquired execution traces of the two versions, the dependencies between statements, and the variable value information, and the statements are classified according to the trace correspondence result. In the slice analysis stage, slice analysis starts from the execution failure point of the new version and is based on the control dependencies and data dependencies of the program and on the statement classification; following the classification of each statement entity and its dependencies, the analysis traces back to the statement entities that caused the execution failure, until no dependency statements remain to be analyzed and the statement currently under analysis needs no further analysis of its dependencies. Finally, all statement entities analyzed during the slice analysis stage are output as the program behavior slice that causes the regression error. Compared with existing regression error localization methods, this method explains the mechanism by which the regression error arises and provides guidance for repairing it.

Description of the drawings

Figure 1 is a general flow chart of the method of the present invention;

Figure 2 is a flow chart of the code rearrangement method;

Figure 3 is a flow chart of the function body statement correspondence method;

Figure 4 is a flow chart of the statement classification process;

Figure 5 is a schematic diagram of the example program used in the specific embodiment;

Figure 6 is a diagram of the source code comparison result;

Figure 7 is a diagram of the code rearrangement result;

Figure 8 is a diagram of the trace correspondence result;

Figure 9 is a diagram of the regression error slice;

Figure 10 is a diagram of the slice analysis process.

Detailed description

Embodiments of the present invention will be described in detail below with reference to examples.

Referring to FIG. 5, the program to be tested v1 is the original version, and the program to be tested v2 is the modified new version.

Step S1: Calculate the differences between the source code of the two versions of the program under test, and rearrange the source code of both versions according to these differences, so that identical code has the same line number and differing code corresponds to a blank line. The flow chart is shown in Figure 2. Specifically, the following steps are included:

Step S101: Calculate the differences between the source code of the two versions of the program;

Step S102: According to the calculated differences, divide the statements in the source code into two categories, identical statements and modified statements, and mark the modified statements (c1-c4), as shown in FIG. 6;

Step S103: Without changing the execution order of the program statements, rearrange the source code so that identical statements have the same line number and each modified statement corresponds to a blank line; the code rearrangement result is shown in FIG. 7;

Step S104: Output the rearranged source code of the two versions as the new programs under test.

Step S2: Execute the test case <x=1, y=1, z=2>, which causes the modified version of the program to fail, and extract the execution traces of the two versions of the program, the variable values contained in each statement, and the dependencies of each statement;

Table 1 Program execution traces

Program v1 trace	main#c 1 2 3 5 6 7 11 12 13 main#r
Program v2 trace	main#c 1 2 4 8 9 11 12 13 main#r
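
A flat trace in the format of Table 1 can be turned into the per-function tree used in step S3 with a small parser. The sketch below assumes, based only on the format visible in Table 1, that a token `name#c` marks entry into function `name`, that `name#r` marks its return, and that every other token is a rearranged line number; the function name `parse_trace` and the dictionary layout are hypothetical.

```python
def parse_trace(text):
    """Parse a whitespace-separated trace into nested function nodes.

    Each node is a dict {"name": str, "items": list}; an item is either
    an int (an executed line) or a nested node (a call made in the body).
    Returns the list of top-level nodes.
    """
    root = {"name": None, "items": []}
    stack = [root]
    for token in text.split():
        if token.endswith("#c"):
            node = {"name": token[:-2], "items": []}
            stack[-1]["items"].append(node)   # child of the calling body
            stack.append(node)
        elif token.endswith("#r"):
            node = stack.pop()
            if node["name"] != token[:-2]:
                raise ValueError(f"unbalanced return: {token}")
        else:
            stack[-1]["items"].append(int(token))
    if len(stack) != 1:
        raise ValueError("trace ended inside a function body")
    return root["items"]


if __name__ == "__main__":
    print(parse_trace("main#c 1 2 4 8 9 11 12 13 main#r"))
```

For the single-function example of Table 1 the tree has only a root node for `main`; nested `#c`/`#r` pairs would produce child nodes.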

Step S3: Call the function body statement correspondence method to match the execution traces of the two versions of the program extracted in step S2). The flow chart is shown in FIG. 3. The function body statement correspondence method is as follows:

S301) determine whether the set of statements still to be matched in the function body currently being matched is empty; if it is empty, the function body statement correspondence method ends and returns, otherwise go to step S302);

S302) take the next statement from the function body as the current statement to be matched;

S303) determine whether the current statement is a function call statement; if it is, go to step S304), otherwise go to step S306);

S304) determine whether the function call has a corresponding function call in the other trace; if it does, go to step S305), otherwise go to step S301);

S305) call the function body statement correspondence method recursively to match the statements in the traces of the called function bodies; when it returns, go to step S301);

S306) determine whether the current statement has a corresponding statement in the other trace; if it does, go to step S307), otherwise go to step S301);

S307) associate the current statement with its corresponding statement in the other trace, then go to step S301).

Take the main function body traces in the two execution traces as an example. The specific process includes:

Step S301: the set of statements still to be matched in the main function body is not empty, so go to step S302;

Step S302: take statement 1, which has not yet been matched, from the main function body in the program v1 trace, and go to step S303;

Step S303: statement 1 is not a function call statement, so go to step S306;

Step S306: statement 1 of the main function in the program v1 trace has a corresponding statement, statement 1 of the main function in the program v2 trace, so go to step S307;

Step S307: associate statement 1 of the main function in the traces of the two versions, and mark it as matched.

Repeat the above steps until no unmatched statements remain in the two traces. The trace correspondence result is shown in Fig. 8.

Step S4: According to the source code differences, the trace correspondence result, and the variable values contained in the statements, the statements in the traces are divided into four categories: statements that differ due to modification, statements with different flow, statements with different values, and consistent statements. The classification strategy is as follows:

(1) Statements that differ due to modification: modified statements;

(2) Statements with different flow: unmodified statements that have no corresponding statement in the other trace;

(3) Statements with different values: statements that have a corresponding statement in the other trace, where at least one variable value contained in the statement differs;

(4) Consistent statements: statements that have a corresponding statement in the other trace, where all variable values contained in the statement are the same.

The classification process is shown in Figure 4 and specifically includes the following steps:

S401) determine whether all statements in the traces have been classified; if so, go to step S407), otherwise go to step S402);

S402) take the next unclassified statement from the traces and classify it;

S403) according to the source code classification, determine whether the statement is a modified statement; if it is, mark it as a statement that differs due to modification and go to step S401), otherwise go to step S404);

S404) according to the trace correspondence result, determine whether the statement has a corresponding statement; if it does not, mark it as a statement with different flow and go to step S401), otherwise go to step S405);

S405) according to the variable value information contained in the statement, determine whether the variable values contained in the statement and in its corresponding statement are the same; if at least one variable value differs, mark the statement as a statement with different values and go to step S401), otherwise go to step S406);

S406) mark the statement as a consistent statement and go to step S401);

S407) the classification of the statements in the traces ends and the process returns.

Taking statement 2 as an example, the specific classification process includes:

Step S401: not all statements in the traces have been classified, so go to step S402;

Step S402: take the unclassified statement 2 from the trace;

Step S403: statement 2 is not a modified statement, so go to step S404;

Step S404: statement 2 has a corresponding statement, so go to step S405;

Step S405: the variables contained in the two statements have the same values, so go to step S406;

Step S406: mark statement 2 in both traces as a consistent statement.

Repeat the above steps until all statements in the traces have been classified. The statement classification result is shown in Table 2.

Table 2 Statement classification results

Program v1 trace	Program v2 trace	Statement classification
1: void main(int x, int y, int z)	1: void main(int x, int y, int z)	Consistent
2: int abs = x;	2: int abs = x;	Consistent
3: if (x < 0)		Different due to modification
	4: abs = -x;	Different flow
5: int max = y;		Different due to modification
6: if (y < z)		Different due to modification
7: max = z;		Different due to modification
	8: int max = z;	Different due to modification
	9: if (y > z)	Different due to modification
11: int out = max;	11: int out = max;	Consistent
12: out = abs + out;	12: out = abs + out;	Different value
13: printf("%d", out);	13: printf("%d", out);	Different value

Step S5: Perform slice analysis of how the regression error arises, taking the execution failure point of the new version as the starting point, and add the failure point statement to the slice analysis queue. In this example, the new version is program v2 and the execution failure point is the output statement, statement 13, so statement 13 in the program v2 trace is added to the slice analysis queue;

Step S6: Determine whether the slice analysis queue is empty; if it is empty, go to step S9), otherwise go to step S7). The queue is currently not empty, so go to step S7);

Step S7: Take the statement at the head of the slice analysis queue and determine whether it needs slice analysis; if not, go to step S6), otherwise go to step S8). The criterion is as follows: if the statement has already been analyzed or was classified as a consistent statement in step S4), no slice analysis is needed; otherwise slice analysis is needed. Since statement 13 in the program v2 trace is a statement with a different value, slice analysis is needed, so go to step S8);

Step S8: Add the statement under analysis to the regression error slice, and add its dependency statements and its corresponding statement that need slice analysis, as determined by the statement classification of step S4), to the slice analysis queue. Accordingly, statement 13 in program v2 is added to the regression error slice, and its corresponding statement 13 in the program v1 trace is added to the slice analysis queue. Since statement 13 in program v2 is a statement with a different value, its data dependency, statement 12, is also added to the slice analysis queue. After this step, go to step S6);

The above steps are repeated until step S6) jumps to step S9). The loop is illustrated schematically in Figure 10: solid arrows indicate dependencies between statements, and diagonal hatching indicates statements that have been added to the regression error slice.

Step S9: Output the regression error slice. The regression error slice for the example is shown in FIG. 9. As can be seen from the slice, statement 3 was deleted in program v2 relative to program v1, which leads to a different output, that is, the regression error under discussion. The slice therefore provides the context of the regression error, and the regression error can be corrected accurately on the basis of the slice.

Claims

1. A program regression error localization method based on a slice model, comprising the following steps:

S1), based on the two versions of the program under test, calculate the differences between their source code, and rearrange the source code of both versions according to these differences, so that identical code has the same line number in both versions and differing code corresponds to a blank line in the other version;

S2), run both rearranged versions on a test case that causes the modified version to fail, and extract the execution traces of the two versions, the variable values contained in each statement, and the dependencies of each statement;

S3), organize each execution trace into a tree structure in which each node is the execution sequence of one function; call the function body statement correspondence method to match the execution traces of the two versions extracted in step S2);

S4), according to the source code differences, the trace correspondence result, and the variable values contained in the statements, divide the statements in the traces into four categories: statements that differ due to modification, statements with different flow, statements with different values, and consistent statements;

S5), perform slice analysis of how the regression error arises, taking the execution failure point of the new version as the starting point, and add the failure point statement to the slice analysis queue;

S6), determine whether the slice analysis queue is empty; if it is empty, go to step S9), otherwise go to step S7);

S7), take the statement at the head of the slice analysis queue and determine whether it needs slice analysis; if not, go to step S6), otherwise go to step S8);

S8), add the statement under analysis to the regression error slice, add its dependency statements and its corresponding statement that need slice analysis, as determined by the statement classification of step S4), to the slice analysis queue, and then go to step S6);

S9), output the regression error slice.
2. The program regression error localization method based on a slice model according to claim 1, wherein the process of code rearrangement in step S1) comprises the following steps:

S101), calculate the differences between the source code of the two versions of the program;

S102), according to the calculated differences, divide the statements in the source code of the two versions into two categories: identical statements and modified statements;

S103), without changing the execution order of the program statements, rearrange the source code of the two versions so that identical statements have the same line number and each modified statement corresponds to a blank line in the other version;

S104), output the rearranged source code of the two versions as the new programs under test.
3. The program regression error localization method based on a slice model according to claim 1, wherein the tree structure of the execution trace in step S3) is defined as follows: each node in the tree is the execution trace of one function body, and the child nodes of a node are the execution traces of the function bodies called within the function body that the node represents; in step S3), the function body statement correspondence method starts from the root node of the tree structure and specifically includes the following steps:

S301), determine whether the set of statements still to be matched in the function body currently being matched is empty; if it is empty, the function body statement correspondence method ends and returns, otherwise go to step S302);

S302), take the next statement from the function body as the current statement to be matched;

S303), determine whether the current statement is a function call statement; if it is, go to step S304), otherwise go to step S306);

S304), determine whether the function call has a corresponding function call in the other trace; if it does, go to step S305), otherwise go to step S301);

S305), call the function body statement correspondence method recursively to match the statements in the traces of the called function bodies; when it returns, go to step S301);

S306), determine whether the current statement has a corresponding statement in the other trace; if it does, go to step S307), otherwise go to step S301);

S307), associate the current statement with its corresponding statement in the other trace, then go to step S301).
4. The program regression error localization method based on a slice model according to claim 1, wherein the method for classifying the statements in step S4) is: according to the source code statement classification of step S102), the trace correspondence result, and the variable value information contained in the statements, divide all statements in the traces into the following four categories:

(1) Statements that differ due to modification: modified statements;

(2) Statements with different flow: unmodified statements that have no corresponding statement in the other trace;

(3) Statements with different values: statements that have a corresponding statement in the other trace, where at least one variable value contained in the statement differs;

(4) Consistent statements: statements that have a corresponding statement in the other trace, where all variable values contained in the statement are the same.
5. The program regression error localization method based on a slice model according to claim 1 or 4, wherein the classification process in step S4) comprises the following steps:

S401), determine whether all statements in the traces have been classified; if so, go to step S407), otherwise go to step S402);

S402), take the next unclassified statement from the traces and classify it;

S403), according to the source code classification, determine whether the statement is a modified statement; if it is, mark it as a statement that differs due to modification and go to step S401), otherwise go to step S404);

S404), according to the trace correspondence result, determine whether the statement has a corresponding statement; if it does not, mark it as a statement with different flow and go to step S401), otherwise go to step S405);

S405), according to the variable value information contained in the statement, determine whether the variable values contained in the statement and in its corresponding statement are the same; if at least one variable value differs, mark the statement as a statement with different values and go to step S401), otherwise go to step S406);

S406), mark the statement as a consistent statement and go to step S401);

S407), the classification of the statements in the traces ends and the process returns.
6. The program regression error localization method based on a slice model according to claim 1, wherein the criterion in step S7) for determining whether a statement needs slice analysis is: if the statement has already been analyzed or was classified as a consistent statement in step S4), no slice analysis is needed; otherwise slice analysis is needed.
7. The program regression error localization method based on a slice model according to claim 1, wherein the analysis strategy in step S8) for a statement that needs slice analysis is: for a statement classified in step S4) as having different values, only its data dependency statements and its corresponding statement in the other trace are added to the slice analysis queue; for a statement with different flow, only its control dependency statements are added to the slice analysis queue; for a statement that differs due to modification, both its control dependency statements and its data dependency statements are added to the slice analysis queue.