CN112069059A

CN112069059A - Test case generation method and system based on maximum likelihood estimation maximum expectation

Info

Publication number: CN112069059A
Application number: CN202010811968.8A
Authority: CN
Inventors: 谢晓园; 姚羽秋; 关超; 浦帆
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2020-08-13
Filing date: 2020-08-13
Publication date: 2020-12-11
Anticipated expiration: 2040-08-13
Also published as: CN112069059B

Abstract

The invention discloses a test case generation method and system based on maximum likelihood estimation maximum expectation. The method comprises: dividing the input domain of software into a plurality of sub-regions, distinguishing the boundary region of the input domain from the internal region, and taking the boundary region as the sub-region with the highest priority; introducing latent variables and using the EM algorithm to estimate the probability that each sub-region of the internal region contains a failure region, and ranking the sub-regions of the internal region by that probability; generating test cases in the priority order of the sub-regions until a software error is found; and, if the number of test cases reaches a preset condition and no software error has yet been found, continuing to generate test cases in the sub-region with the highest priority until a software error is found. The method addresses the heavy computational overhead of existing ART methods, mitigates their boundary effect to a certain extent, and improves operating efficiency.

Description

Test case generation method and system based on maximum likelihood estimation maximum expectation

Technical Field

The invention relates to the technical field of computers, in particular to a test case generation method and system based on maximum likelihood estimation maximum expectation.

Background

Software testing is an important part of the software development process: the whole system or some of its modules are run by manual or automatic means, and whether the overall or partial functions of the software meet the specified requirements is judged by whether the actual results are consistent with the expected results. The field of software testing currently offers a wide variety of techniques, the most common being white-box, gray-box and black-box testing. Whichever method the tester chooses, however, it is almost impossible to test the input domain of the software exhaustively, so the common practice is to select test cases from a representative subset of the input domain.

Random testing generates the required test cases at random within the input domain. Because the test cases are generated purely at random, however, its ability to find errors is limited.

To remedy the poor error-finding ability of random testing, adaptive random testing (ART) has been proposed in the prior art. The classical ART method is an adaptive random testing method with a fixed candidate set size, namely the fixed-size-candidate-set ART method (FSCS-ART). FSCS-ART greatly improves error-finding ability by ensuring that test cases are distributed more evenly across the software input domain.

However, FSCS-ART introduces a large number of distance calculations, which makes the method costly in computing resources. In addition, because of the way FSCS-ART selects test cases, the selected test cases tend to accumulate at the boundary of the software input domain, which impairs the detection of software errors; this is known as the boundary effect.
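For orientation, the source of the distance-calculation overhead can be illustrated with a minimal sketch of FSCS-ART selection. The function name `fscs_art_next`, the unit-square input domain and the candidate set size `k=10` are assumptions made for this illustration only and are not taken from the present disclosure.

```python
import math
import random


def fscs_art_next(executed, k=10, rng=random, side=1.0):
    """Choose the next test case in a [0, side] x [0, side] input domain.

    executed: list of (x, y) test cases that have already been run.
    k:        assumed size of the fixed candidate set.
    """
    if not executed:
        # The first test case is purely random.
        return (rng.uniform(0, side), rng.uniform(0, side))
    candidates = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(k)]
    # For every candidate, find the distance to its nearest executed test case,
    # then keep the candidate for which that distance is largest.
    # This costs k * len(executed) distance calculations per selection.
    return max(candidates, key=lambda c: min(math.dist(c, e) for e in executed))


rng = random.Random(1)
tests = []
for _ in range(20):
    tests.append(fscs_art_next(tests, rng=rng))
```

Because every new test case is compared against all previously executed ones, the total number of distance calculations grows quadratically with the number of test cases, which is the overhead referred to above.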

Patent application 201811501282.8 proposes an adaptive random test method based on iterative region averaging and localization.

Patent application 201911030817.2 proposes an adaptive random test case generation method based on a central point compensation strategy.

Both methods partition the input domain, but the sub-region in which the next test case is generated is chosen at random. This relieves the high computational overhead of traditional ART methods to a certain extent, but the blindness of the random choice of sub-region prevents a real improvement in test efficiency. A new technical solution of practical significance is therefore urgently needed in this field.

Disclosure of Invention

The invention aims to provide a test case generation scheme based on maximum likelihood estimation, addressing two problems of the prior-art methods: the very large computational overhead caused by their many distance calculations, and the tendency of test cases to accumulate at the boundary of the software input domain.

The technical scheme of the invention provides a test case generation method based on maximum likelihood estimation maximum expectation, which comprises the following steps,

step S1, dividing the input domain of the software into a plurality of sub-regions, distinguishing the boundary region of the input domain from the internal region, and taking the boundary region as the sub-region with the highest priority;

step S2, introducing latent variables, estimating the probability that the sub-regions in the internal region possibly contain failure regions by combining an EM algorithm, and sequencing the sub-regions in the internal region based on the probability;

step S3, generating test cases according to the priority order of the sub-regions until a software error is found; and if the number of the test cases reaches the preset condition and no software error is found yet, continuing to generate the test cases in the sub-area with the highest priority until the software error is found.

Further, step S1 includes the following sub-steps,

step S1.1, setting the boundary length of the internal region according to the failure rate of the input domain, so as to obtain the boundary region outside the internal region, denoted D1;

step S1.2, ranking the boundary region D1 first in priority, and dividing the internal region into two parts of the same size, denoted sub-regions D2 and D3.

When the input domain is a two-dimensional square, the boundary length of the internal region is set as follows,

where b is the side length of the internal region, a is the side length of the overall input domain, and θ is the failure rate.

Further, step S2 includes the following sub-steps,

step S2.1, randomly generating 1000 test cases in each of the two inner sub-regions D2 and D3, and counting for each the number of test cases that find a software error;

step S2.2, replacing the probability that a sub-region reveals a software error with the probability that a test case finds an error and falls in that sub-region;

step S2.3, introducing latent variables, iterating the probability parameters of the previous step with the EM algorithm until the parameters converge, and taking the converged parameter values as the maximum likelihood estimates of the parameters;

step S2.4, ranking the priorities of the two inner sub-regions D2 and D3 by comparing the probabilities;

step S2.5, further dividing each of the two inner sub-regions into two parts of the same size, sub-region D2 being divided into D4 and D5 and sub-region D3 into D6 and D7, repeating steps S2.1 to S2.4 for D4 and D5, and repeating steps S2.1 to S2.4 for D6 and D7;

step S2.6, ranking the priorities of all the divided sub-regions, thereby determining the priority order of sub-regions D1, D4, D5, D6 and D7.

Furthermore, step S3 is implemented as follows,

1) first, test cases are generated in the boundary region with the highest priority and the tests are executed, stopping if a software error is found; if no software error is found and the number of test cases generated reaches a preset threshold, test cases are generated in the sub-region with the second priority and the tests are executed, stopping if a software error is found; if no software error is found and the number of test cases generated again reaches the preset threshold, test cases are generated in the sub-region with the third priority and the tests are executed, stopping if a software error is found; and so on in decreasing order of priority; if no software error is found in the sub-region with the lowest priority and the number of test cases generated reaches the preset threshold, step 2) is entered;

2) if step 1) has so far been executed once, step 1) is executed again; if step 1) has been executed twice, test cases are generated only in the boundary region with the highest priority and the tests are executed until a software error is found or a preset upper limit on the number of tests is reached.

Further, the threshold is preferably set to 10.

The invention also correspondingly provides a test case generation system based on the maximum likelihood estimation, which is used for realizing the test case generation method based on the maximum likelihood estimation.

Further, the system comprises the following modules,

the first module is used for dividing an input domain of the software into a plurality of sub-regions, distinguishing a boundary region of the input domain from an internal region, and taking the boundary region as the sub-region with the highest priority;

the second module is used for introducing latent variables and, in combination with the EM algorithm, estimating the probability that each sub-region of the internal region contains a failure region, and ranking the sub-regions of the internal region based on that probability;

the third module is used for generating test cases according to the priority order of the sub-regions until a software error is found; and if the number of the test cases reaches the preset condition and no software error is found yet, continuing to generate the test cases in the sub-area with the highest priority until the software error is found.

Alternatively, the test case generation system comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the instructions stored in the memory to execute the test case generation method based on maximum likelihood estimation maximum expectation.

Alternatively, the test case generation system comprises a readable storage medium on which a computer program is stored; when the computer program is executed, the test case generation method based on maximum likelihood estimation maximum expectation is implemented.

The invention discloses a test case generation scheme based on maximum likelihood estimation maximum expectation: the input domain is first divided into a plurality of sub-regions of different sizes; latent variables are introduced and the EM (expectation-maximization) algorithm is used to estimate the probability that each sub-region contains a failure region; finally, the sub-regions are ranked by that probability, and test cases are generated preferentially in the highest-ranked sub-regions. The method addresses the heavy computational overhead of existing ART methods, mitigates their boundary effect to a certain extent, and improves operating efficiency.

Drawings

FIG. 1 is a flowchart of a test case generation method based on maximum likelihood estimation according to an embodiment of the present invention;

FIG. 2 is a diagram showing a first iterative process of the EM-ART method according to the embodiment of the present invention, wherein (a) part is a schematic diagram of division of D1, D2 and D3, and (b) part is a schematic diagram of virtual division of D2.1, D2.2, D3.1 and D3.2;

FIG. 3 is a diagram showing a second iterative process of the EM-ART method according to the embodiment of the present invention, wherein (a) part is a schematic diagram of division of D4 and D5, and (b) part is a schematic diagram of virtual division of D4.1, D4.2, D5.1 and D5.2;

FIG. 4 is a diagram showing a third iteration process and a final division result of the EM-ART method according to the embodiment of the present invention, wherein (a) is a virtual division diagram of portions D6.1, D6.2, D7.1 and D7.2; and part (b) is a schematic diagram divided by D6 and D7.

Detailed Description

The invention provides a test case generation method based on maximum likelihood estimation maximum expectation: the input domain is divided into a plurality of sub-regions of different sizes; latent variables are introduced and the EM algorithm is used to estimate the probability that each sub-region contains a failure region; the sub-regions are then ranked by that probability, and test cases are generated preferentially in the highest-ranked sub-regions, which reduces the computational overhead and relieves the boundary effect. In keeping with the conventions of the technical field, the test case generation method based on maximum likelihood estimation maximum expectation provided by the invention is referred to as the EM-ART method for short.

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment provides a test case generation method based on maximum likelihood estimation maximum expectation; referring to fig. 1, the method includes:

step S1: the input domain of the software is divided into a plurality of sub-regions, the boundary region of the input domain is distinguished from the inner region, and the boundary region is taken as the sub-region with the highest priority. The inner region may be divided into two or more sub-regions. To address the boundary effect of existing ART methods, the invention divides the input domain into a plurality of sub-regions of different sizes and distinguishes the boundary region of the software input domain from the inner region. Taking a two-dimensional square input domain as an example, the way the EM-ART method divides the software input domain is shown in fig. 2.

In one embodiment, step S1 specifically includes:

step S1.1: the boundary length of the inner region is set according to the failure rate of the input field.

Step S1.2: the priority of the boundary area is arranged at the first position, and the inner area is divided into two parts with the same size, so that two inner sub-areas are obtained.

The inner square region is centered in the overall input domain, and the annular region obtained by subtracting this central square from the overall input domain is the boundary region. Specifically, in fig. 2 the boundary region is D1 and, as shown in part (a) of fig. 2, the inner region is divided into two parts of the same size, D2 and D3; that is, the union of D2 and D3 is the inner square region at the center of the overall input domain. So that the boundary region D1 stays close to the "boundary" in the true sense, the side length of the inner square region must also be chosen with care; based on prior-art studies of the failure rate of the input domain and the geometry of failure regions, the boundary length of the inner square region is preferably set as follows:

where b is the side length of the inner square region, a is the side length of the overall input domain, and θ is the failure rate. The three regions D1, D2 and D3 are then prioritized: the boundary region D1 is ranked first, while the order of the two equal-sized inner sub-regions D2 and D3 cannot yet be determined.

In a specific implementation, the user can set θ for the problem at hand according to the application scenario; its value is usually small, for example 0.01, 0.005, 0.002, 0.001, 0.0005, 0.0002 or 0.0001.
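The division into D1, D2 and D3 for a two-dimensional square input domain can be sketched as follows. This is a minimal illustration under stated assumptions: the inner side length `b` is passed in as a parameter rather than computed from θ, the domain is taken to be [0, a] x [0, a], and the inner square is assumed to be split into a left half (D2) and a right half (D3); the function name `classify` is hypothetical.

```python
def classify(point, a, b):
    """Return 'D1', 'D2' or 'D3' for a point of the square domain [0, a] x [0, a].

    a: side length of the overall input domain.
    b: side length of the centered inner square (supplied by the caller).
    """
    x, y = point
    lo, hi = (a - b) / 2, (a + b) / 2  # the inner square is centered in the domain
    if not (lo <= x <= hi and lo <= y <= hi):
        return "D1"  # annular boundary region
    # Assumed split of the inner square into two equal halves (left / right).
    return "D2" if x < a / 2 else "D3"


labels = [classify(p, a=1.0, b=0.8) for p in [(0.01, 0.5), (0.3, 0.5), (0.7, 0.5)]]
```

With `a = 1.0` and `b = 0.8`, the three sample points fall in the boundary ring, the left half and the right half of the inner square, respectively.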

Step S2: to address the heavy computational overhead of existing ART methods, latent variables are introduced and the EM algorithm is used to estimate the probability that each sub-region of the internal region contains a failure region; the sub-regions of the internal region are then ranked by that probability (the boundary region still ranks first, above all sub-regions of the internal region).

To distribute the test cases more evenly, the embodiment divides the internal region equally into sub-regions D2 and D3, determines the priority order of D2 and D3 based on maximum likelihood estimation maximum expectation, and then subdivides and ranks D2 and D3 themselves in the same way.

In one embodiment, step S2 specifically includes:

Step S2.1: randomly generate 1000 test cases in each of the two inner sub-regions, and count for each the number of test cases that find a software error.

Specifically, 1000 test cases are randomly generated in each of regions D2 and D3, and the numbers of test cases that find software errors are counted and recorded as y₁ and y₂, respectively.
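The sampling and counting of step S2.1 can be sketched as below. The rectangle coordinates, the oracle `is_failure` and the toy failure region are assumptions made purely for illustration; in practice the oracle would run the software under test and compare actual and expected results.

```python
import random


def count_failures(region, is_failure, n=1000, rng=random):
    """Sample n uniform test cases in an axis-aligned rectangle and count failures.

    region:     ((x0, y0), (x1, y1)) opposite corners of the sub-region.
    is_failure: callable returning True when a test case reveals an error.
    """
    (x0, y0), (x1, y1) = region
    failures = 0
    for _ in range(n):
        test_case = (rng.uniform(x0, x1), rng.uniform(y0, y1))
        if is_failure(test_case):
            failures += 1
    return failures


# Hypothetical sub-regions D2 and D3 (left and right halves of an inner square)
# and a toy failure region lying entirely inside D2.
D2 = ((0.1, 0.1), (0.5, 0.9))
D3 = ((0.5, 0.1), (0.9, 0.9))
rng = random.Random(0)
y1 = count_failures(D2, lambda t: t[0] < 0.2, rng=rng)
y2 = count_failures(D3, lambda t: t[0] < 0.2, rng=rng)
```

Here `y1` and `y2` play the role of the counts y₁ and y₂ recorded in the embodiment.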

Step S2.2: the probability that a sub-region reveals a software error is replaced by the probability that a test case finds an error and falls in that sub-region.

Specifically, assume that the probability of finding a software error in sub-region D2 is β₁ and that the probability of finding a software error in sub-region D3 is θ - β₁. Because θ is small, iterating directly on these probabilities yields very small values, so the embodiment treats the two probabilities specially: if the event under study is described as "a test case finds an error and falls in sub-region D2 or D3", the conditional probabilities

and

can be used in place of the original probability values for estimation (here

), which makes the results more intuitive and easier to compare. y₁/1000 is approximately equal to α₁; the larger the number of trials, the closer the computed value is to the true probability.

Step S2.3: latent variables are introduced, and the probability parameters of the previous step are iterated with the EM (expectation-maximization) algorithm until they converge; the converged parameter values are taken as the maximum likelihood estimates of the parameters.

Specifically, as shown in part (b) of fig. 2, two sub-regions are assumed to exist inside each of D2 and D3. D2 may be assumed to be divided into D2.1 and D2.2 (the manner of this assumed division is not limited), with the probabilities of finding a software error in D2.1 and D2.2 assumed to be, respectively,

and

Likewise, D3 may be assumed to be divided into D3.1 and D3.2 (again without limitation on the manner of division), with the probabilities of finding a software error in D3.1 and D3.2 assumed to be, respectively,

and

Latent variables z₁ and z₂ are then introduced: the numbers of test cases that find software errors in D2.1 and D2.2 are assumed to be z₁ and y₁ - z₁, respectively, and similarly the numbers of test cases that find software errors in D3.1 and D3.2 are assumed to be z₂ and y₂ - z₂.

After the latent variables z₁ and z₂ are introduced, the conditional expectation is taken with respect to each latent variable, giving the E-step expression:

wherein

Differentiating this expression and setting the derivative to zero gives the M-step iteration formula:

where Q(·) denotes the mathematical expectation, i denotes the iteration number,

is the estimate, at the i-th iteration, of

y₁ and y₂ denote the numbers of test cases that find software errors in regions D2 and D3, and

represents the conditional probability that a test case finds an error and falls in sub-region D2.

The EM algorithm is then used to iterate the unknown parameter

until it converges, and the converged value

is taken as the maximum likelihood estimate of the parameter

Finally, the magnitudes of

and

are compared: if

then region D2 is ranked second in priority; otherwise D3 is ranked second in priority.
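The iterate-until-convergence pattern described above can be sketched generically. The E-step and M-step formulas of the embodiment are not reproduced in this sketch; as a stand-in, the loop is exercised on the classic textbook multinomial example with cell probabilities (1/2 + t/4, (1 - t)/4, (1 - t)/4, t/4), in which the first observed count is split by a latent variable in much the same way that y₁ is split into z₁ and y₁ - z₁. The names `em_iterate`, `e_step`, `m_step` and the counts are assumptions for illustration and are not data from the present disclosure.

```python
def em_iterate(theta0, e_step, m_step, tol=1e-10, max_iter=1000):
    """Alternate E- and M-steps until successive estimates differ by less than tol."""
    theta = theta0
    for iteration in range(1, max_iter + 1):
        expected_latent = e_step(theta)      # E-step: expected latent count
        new_theta = m_step(expected_latent)  # M-step: maximize the expected log-likelihood
        if abs(new_theta - theta) < tol:
            return new_theta, iteration
        theta = new_theta
    return theta, max_iter


# Stand-in textbook data: observed counts for the four multinomial cells.
COUNTS = (125, 18, 20, 34)


def e_step(theta):
    # Expected part of the first count that belongs to the theta/4 component.
    return COUNTS[0] * theta / (2 + theta)


def m_step(z):
    # Closed-form maximizer given the expected latent count z.
    return (z + COUNTS[3]) / (z + COUNTS[1] + COUNTS[2] + COUNTS[3])


theta_hat, n_iter = em_iterate(0.5, e_step, m_step)
```

In the EM-ART setting, the converged estimates obtained for the two sibling sub-regions would then be compared to decide which of them ranks higher.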

Step S2.4: the priorities of the two inner sub-regions D2 and D3 are ranked by comparing the magnitudes of the probabilities.

Step S2.5: each of the two inner sub-regions D2 and D3 is further divided into two equal-sized parts, and steps S2.1 to S2.4 above are repeated.

Step S2.6: the priorities of all the partitioned sub-regions are ordered.

Specifically, after the priority order of the three sub-regions D1, D2 and D3 is determined, region D2 is further divided, in the same manner as before, into two identical sub-regions D4 and D5, as shown in part (a) of fig. 3. As shown in part (b) of fig. 3, two sub-regions are likewise assumed to exist inside each of D4 and D5: D4 may be assumed to be divided into D4.1 and D4.2, and D5 into D5.1 and D5.2, the manner of the assumed division not being limited.

Assume that the probability of finding a software error in sub-region D4 is

and that the probability of finding a software error in sub-region D5 is

The numbers of test cases that find software errors are then counted and recorded as y₃ and y₄, respectively. Latent variables z₃ and z₄ are introduced: the numbers of test cases that find software errors in D4.1 and D4.2 are assumed to be z₃ and y₃ - z₃, with the probabilities of finding a software error being, respectively,

and

Similarly, the numbers of test cases that find software errors in D5.1 and D5.2 are assumed to be z₄ and y₄ - z₄, with the probabilities of finding an error being, respectively,

and

The EM algorithm is then used to iterate the unknown parameter

whose converged value is taken as the maximum likelihood estimate of the parameter

Finally, the magnitudes of

and

are compared, and regions D4 and D5 are ranked according to which is larger.

After the priority order between D4 and D5 is determined, sub-region D3 undergoes the same region division and parameter iteration as sub-region D2. Region D3 is first divided into two equal sub-regions D6 and D7. As shown in part (a) of fig. 4, two sub-regions are assumed to exist inside each of D6 and D7: D6 may be assumed to be divided into D6.1 and D6.2, and D7 into D7.1 and D7.2. Assume that the probability of finding a software error in sub-region D6 is

and that the probability of finding a software error in sub-region D7 is

The numbers of test cases that find software errors are then counted and recorded as y₅ and y₆, respectively. Latent variables z₅ and z₆ are introduced: the numbers of test cases that find software errors in D6.1 and D6.2 are assumed to be z₅ and y₅ - z₅, with the probabilities of finding a software error being, respectively,

and

Similarly, the numbers of test cases that find software errors in D7.1 and D7.2 are assumed to be z₆ and y₆ - z₆, with the probabilities of finding an error being, respectively,

and

The EM algorithm is then used to iterate the unknown parameter ψ₃ until it converges, and the converged value

is taken as the maximum likelihood estimate of the parameter

Finally, the magnitudes of

and

are compared, and regions D6 and D7 are ranked according to which is larger.

In summary, as shown in part (b) of fig. 4, after three rounds of parameter iteration the input domain is finally divided into five sub-regions D1, D4, D5, D6 and D7, and these five sub-regions are prioritized; this priority order determines the order in which test cases are generated in the sub-regions during subsequent testing.
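One plausible way to assemble the final priority order from the three pairwise comparisons is sketched below. It assumes that the children of the higher-ranked parent precede the children of the lower-ranked parent; the text does not state how D4 and D5 are ordered relative to D6 and D7, so this is only one reading. The function names and the probability values in the example are hypothetical.

```python
def order_pair(first, second, p_first, p_second):
    """Return the two region names with the higher estimated probability first."""
    return (first, second) if p_first >= p_second else (second, first)


def final_priority(d2_d3, d4_d5, d6_d7):
    """Build the priority list from three pairs of estimated probabilities.

    Each argument is a tuple (p_a, p_b) holding the converged estimates for the
    two sibling regions, e.g. d2_d3 = (estimate for D2, estimate for D3).
    """
    children = {
        "D2": order_pair("D4", "D5", *d4_d5),
        "D3": order_pair("D6", "D7", *d6_d7),
    }
    order = ["D1"]  # the boundary region always ranks first
    for parent in order_pair("D2", "D3", *d2_d3):
        order.extend(children[parent])
    return order


priority = final_priority(d2_d3=(0.3, 0.7), d4_d5=(0.6, 0.4), d6_d7=(0.2, 0.8))
```

With these made-up estimates, D3 outranks D2, so its children D7 and D6 come directly after D1, followed by D4 and D5.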

Step S3: software testing is performed with the EM-ART method, generating test cases preferentially in the highest-ranked sub-regions.

In the embodiment, test cases are generated in the priority order of the sub-regions until a software error is found. If the number of test cases reaches a preset condition and no software error has been found, test cases continue to be generated in the sub-region with the highest priority until a software error is found.

Specifically, step S3 of the embodiment is implemented as follows:

1) First, test cases are generated in the boundary region with the highest priority and the tests are executed, stopping if a software error is found. If no software error is found and the number of test cases generated reaches a preset threshold, test cases are generated in the sub-region with the second priority and the tests are executed, stopping if a software error is found. If no software error is found and the number of test cases generated again reaches the preset threshold, test cases are generated in the sub-region with the third priority and the tests are executed, stopping if a software error is found. The process continues in decreasing order of priority; if no software error is found in the sub-region with the lowest priority and the number of test cases generated reaches the preset threshold, step 2) is entered;

2) If step 1) has so far been executed once, step 1) is executed again; if step 1) has been executed twice, test cases are generated only in the boundary region with the highest priority and the tests are executed until a software error is found or a preset upper limit on the number of tests is reached.

In a specific implementation, the threshold may be preset by a person skilled in the art as needed; the embodiment adopts the preferred value of 10, that is, 10 test cases are generated in each region in turn, in priority order.
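The generation schedule of step S3 can be sketched as follows, assuming a threshold of 10 test cases per region, two passes over the priority list, and a hypothetical upper limit `max_tests`. The callables `sample_in` and `is_failure` are placeholders for region sampling and for running the software under test; none of these names come from the embodiment.

```python
def em_art_test(priority, sample_in, is_failure, threshold=10, passes=2, max_tests=10_000):
    """Generate test cases region by region and return (failing_test, tests_run).

    priority:  region names in decreasing priority, boundary region first.
    sample_in: callable mapping a region name to a random test case in that region.
    Returns (None, tests_run) if the upper limit is reached without a failure.
    """
    tests_run = 0
    for _pass in range(passes):
        for region in priority:
            for _ in range(threshold):
                if tests_run >= max_tests:
                    return None, tests_run
                test_case = sample_in(region)
                tests_run += 1
                if is_failure(test_case):
                    return test_case, tests_run
    # After the passes, keep testing only in the highest-priority (boundary) region.
    while tests_run < max_tests:
        test_case = sample_in(priority[0])
        tests_run += 1
        if is_failure(test_case):
            return test_case, tests_run
    return None, tests_run


# Toy usage: the "test case" is just the region name, and only D5 fails.
found, used = em_art_test(["D1", "D4", "D5"], lambda r: r, lambda t: t == "D5")
```

In the toy run, 10 tests are spent in D1 and 10 in D4 before the first test in D5 reveals the failure.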

In a specific implementation, a person skilled in the art can automate the above process using computer software technology. System devices for implementing the method, such as a computer-readable storage medium storing a computer program corresponding to the technical solution of the invention, and computer equipment that includes and runs such a program, also fall within the protection scope of the invention.

In some possible embodiments, a test case generation system based on maximum likelihood estimation maximum expectation is provided, comprising the following modules,

In some possible embodiments, a test case generation system based on maximum likelihood estimation maximum expectation is provided, which includes a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the instructions stored in the memory to execute the test case generation method based on maximum likelihood estimation maximum expectation described above.

In some possible embodiments, a test case generation system based on maximum likelihood estimation maximum expectation is provided, which includes a readable storage medium on which a computer program is stored; when the computer program is executed, the test case generation method based on maximum likelihood estimation maximum expectation described above is implemented.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims

1. A test case generation method based on maximum likelihood estimation maximum expectation is characterized by comprising the following steps,

2. The maximum likelihood estimation-based test case generation method of the maximum expectation according to claim 1, wherein: step S1 includes the following sub-steps,

3. The maximum likelihood estimation-based test case generation method of the maximum expectation according to claim 2, wherein: when the input field is a two-dimensional square, the boundary length of the inner region is set as follows,

4. The maximum likelihood estimation-based test case generation method of the maximum expectation according to claim 2, wherein: step S2 includes the following sub-steps,

5. The method for generating test cases based on maximum likelihood estimation maximum expectation according to claim 1, 2, 3 or 4, wherein: the step S3 is implemented as follows,

6. The maximum likelihood estimation-based test case generation method of the maximum expectation according to claim 5, wherein: the threshold setting preferably takes a value of 10.

7. A test case generation system based on maximum likelihood estimation maximum expectation, characterized in that: the system is used for implementing the test case generation method based on maximum likelihood estimation maximum expectation according to any one of claims 1 to 6.

8. The maximum likelihood estimation based test case generation system of claim 7, wherein: comprises the following modules which are used for realizing the functions of the system,

9. The maximum likelihood estimation based test case generation system of claim 7, wherein: comprising a processor and a memory for storing program instructions, the processor being configured to invoke the instructions stored in the memory to perform a test case generation method based on maximum likelihood estimation maximum expectation as claimed in any one of claims 1 to 6.

10. The maximum likelihood estimation based test case generation system of claim 7, wherein: comprising a readable storage medium having stored thereon a computer program which, when executed, implements a method for maximum likelihood estimation based maximum expectation test case generation as claimed in any one of claims 1-6.