CN114356778A - Deep reinforcement learning software testing method based on coverage-guided fuzz testing - Google Patents

Deep reinforcement learning software testing method based on coverage-guided fuzz testing

Info

Publication number
CN114356778A
Authority
CN
China
Prior art keywords
reinforcement learning
coverage
deep reinforcement
initial state
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210021911.7A
Other languages
Chinese (zh)
Inventor
郑征
李天成
万晓晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202210021911.7A
Publication of CN114356778A
Legal status: Pending

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a deep reinforcement learning software testing method based on coverage-guided fuzz testing, which comprises the following steps: 1) generate an initial state that satisfies the defined value ranges; 2) pass the initial state to the deep reinforcement learning software under test for execution, and record all run-time information of the agent while the software under test runs; 3) design an objective function and use it to judge whether the initial state triggers an error; if it does, the initial state is regarded as a successful test case; 4) analyze whether the collected states form new coverage and, if so, add them to the seed pool; 5) select some of the seeds newly added to the seed pool, mutate them, and use the mutated seeds as test inputs. The invention is the first to propose a coverage-guided fuzz testing method for deep reinforcement learning software; it improves the traditional coverage-guided fuzz testing framework, can generate a more sufficient and comprehensive set of test cases, and exposes erroneous behavior of deep reinforcement learning software more efficiently.

Description

Deep reinforcement learning software testing method based on coverage-guided fuzz testing
Technical Field
The invention belongs to the technical field of artificial intelligence software testing within software engineering, and particularly relates to a deep reinforcement learning software testing method based on coverage-guided fuzz testing.
Background
In recent years, deep reinforcement learning techniques have been increasingly applied to game playing, automatic driving, medical health, financial trading, robot control, network security, and other fields. At the same time, quality assurance of deep reinforcement learning software has become increasingly important, especially in safety-critical fields such as automatic driving and medical health. In fact, the primary factor affecting software quality and reliability is software defects, i.e., any attribute that does not meet the software's intended design. In the development life cycle of a large software project, the later an inherent defect is detected, the higher the cost of repairing it. Software defects must therefore be detected and removed promptly, starting from the beginning of development. How to find and eliminate potential defects in deep reinforcement learning software in a timely manner and improve the reliability of artificial intelligence software is a problem that must be addressed throughout the development life cycle of such software.
Early detection of software defects relies primarily on extensive coverage testing of the software. Software testing judges whether a software module contains defects by designing a series of test cases in advance, running the test cases on the software under test, and analyzing whether the output is consistent with the expected result. The essence of deep reinforcement learning software testing is to detect as many defects as possible with limited testing resources so that the software meets delivery requirements. A deep reinforcement learning software testing method that detects defects more quickly accelerates the development process and improves development efficiency. Traditional software programs have explicit semantic logic and can usually be expressed in the form of control-flow statements, and many studies have proposed coverage-based testing methods for traditional software from the perspectives of equivalence classes, boundary values, cause-effect graphs, and so on.
However, as a special class of machine learning software, deep reinforcement learning software shares the general characteristics that distinguish machine learning software from traditional software. First, the failure mechanism is complex: besides code, a large number of neuron structures form the body of the software, which, as statistically driven software, is data-driven and contains many random factors. Second, test cases are difficult to generate: the task-oriented behavior is complex, the test input space is high-dimensional, and there is no specific test oracle. Third, test evaluation is difficult: machine learning software must be considered as a whole system and cannot be decomposed into relatively independent test units as traditional software can. Fourth, repair is difficult: because test cases are hard to generate, little useful information is available for repair, defect characteristics are more complex, and the information in test cases is difficult to exploit for repair. For these reasons, conventional software testing methods cannot be used directly for deep reinforcement learning software testing.
In recent years, software testing for machine learning has become a research focus, and a series of testing methods have been proposed. However, there are still significant differences between deep reinforcement learning software and other machine learning software. For example, a deep reinforcement learning model must interact with a complex environment during training, so its failure causes are more varied; training data and samples are generated during training and are highly correlated, so running time is longer and training is harder; and the expected output is usually a decision trajectory, so test cases are harder to generate. These differences also limit the direct application of existing machine learning testing methods to deep reinforcement learning software testing.
How to design a test case generation method and a test sufficiency measure suitable for deep reinforcement learning software, and how to construct a deep reinforcement learning software testing framework, are the key points and difficulties of designing a reinforcement learning software testing method, and are the problems to be solved by the invention.
Disclosure of Invention
One object of the present invention is: to overcome the difficulties of reinforcement learning software testing and to fill the current gap in academia and industry regarding deep reinforcement learning software testing, the invention provides a deep reinforcement learning software testing method based on coverage-guided fuzz testing. Drawing on the ideas and framework of coverage-guided fuzz testing, the method is designed specifically around two aspects, test case generation and test sufficiency evaluation, and provides a coverage-guided fuzz testing method for deep reinforcement learning software. For test case generation, the invention achieves high-quality and efficient generation of test inputs by designing a suitable seed selector and seed mutator, triggering failures of deep reinforcement learning software more efficiently. For test sufficiency evaluation, the invention judges whether the current state forms new coverage by comparing the shortest distance between the current state and the states in the seed pool against a preset threshold; in addition, during the iterative generation of test inputs, new states can be generated that form new coverage, ensuring that the tests cover the state space.
Another object of the invention is: the software testing framework for reinforcement learning systems aims to accelerate the exposure of defects in such systems, so that it can serve as an effective means of evaluating the reliability of a reinforcement learning system, and the generated test cases can also be used for retraining to improve the system's reliability.
The technical scheme of the invention is as follows: a deep reinforcement learning software testing method based on coverage-guided fuzz testing comprises the following steps:
step 1), define the state of the reinforcement learning task and the value range of each state dimension, set up a random state generator, and generate an initial state satisfying the value ranges with the random state generator;
step 2), pass the randomly generated initial state to the deep reinforcement learning software under test for execution, set up a run-time information tracker, and record all run-time information of the agent, such as the states visited and the corresponding rewards received by the agent in the current episode, while the software under test runs;
step 3), design an objective function and use it to judge whether the initial state triggers an error; if an error is triggered, the initial state is regarded as a successful test case; for example, check whether the episode reward is below a failure threshold, and if so, consider the initial state to have triggered an error and treat it as a successful test case;
step 4), design a coverage analyzer, analyze whether the collected states form new coverage, and add them to the seed pool if they do; for example, compute the shortest distance between each state collected in the current episode and all states in the seed pool, judge whether the current state forms new coverage by checking whether this shortest distance exceeds a certain threshold, and add the state to the seed pool if it does;
the coverage analyzer adopts the approximate nearest neighbor algorithm based on the kd-tree, which is greatly different from the coverage definition in the traditional test. For a randomly generated initial state, its nearest neighbor is found in the seed pool, and if the distance is greater than an adjustable threshold δ, the initial state is added to the seed pool.
To increase efficiency, this is further simplified: the kd-Tree is constructed before each iteration and is not updated in the current iteration. The test input generated in each iteration will be used as the initial state for the subsequent iteration process. In each round, only the current state satisfies: the distance to the latest state of addition is greater than a fixed threshold delta(internal threshold) and the distance to the nearest neighbor in the seed pool is greater than the adjustable threshold δ, we will assume that the current state reaches the new coverage, at which point we will add the current state to the seed pool.
Step 5), design a seed mutator, select some of the seeds newly added to the seed pool for mutation, and use the mutated seeds as test inputs;
further, the test inputs also include states, generated by the random state generator, that can form new coverage; these states, together with the mutated seeds, form the next batch of inputs to the deep reinforcement learning software.
The seed mutator adopts a mutation strategy based on the loss gradient, which is entirely different from the mutation strategies of traditional fuzz testing. The strategy borrows the idea of gradient attacks: by computing the loss gradient of the deep reinforcement learning model and then mutating the current state along the gradient direction, failures of the deep reinforcement learning software can be triggered efficiently.
For example, take the initial states that received the lowest episode rewards in the previous iteration, compute their gradients, and mutate each initial state along the direction of its negative gradient; at the same time, to improve state coverage, randomly generate and select states that can form new coverage, and pass them to the software under test as the initial states of the new iteration.
Steps 2) to 5) are executed repeatedly until the maximum number of iterations is reached; the successful test cases collected over the whole process expose the erroneous behavior of the software under test.
Compared with existing deep reinforcement learning software testing methods, the deep reinforcement learning software testing method based on coverage-guided fuzz testing has the following advantages: by using coverage-guided fuzz testing, the invention greatly improves testing efficiency and generates more failing test cases than traditional random generation; moreover, the generated failing test cases are more diverse than those produced by random generation. The invention therefore generates more, and more diverse, failing test cases, i.e., it improves both the efficiency and the quality of test case generation for deep reinforcement learning software.
Drawings
FIG. 1 is a flow chart of the deep reinforcement learning software testing method based on coverage-guided fuzz testing
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Step 1), the state of the reinforcement learning task and the value range of each state dimension are defined (the value ranges are determined by the characteristics of the task, or by the developer's understanding of the current task); a random state generator is set up, and an initial state satisfying the value ranges is generated with it. Taking the inverted-pendulum control task from the gym library (the CartPole environment) as an example, the input state is a four-dimensional vector whose components are the cart position x, the pole angle θ, the cart velocity, and the pole angular velocity. According to the task, the range of each state dimension is set, e.g. x ∈ [-4.8, 4.8] and θ ∈ [-24, 24], with the ranges of the cart velocity and the pole angular velocity set analogously. A state s is then generated randomly within the set ranges;
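As an illustration only, a minimal random state generator for such a task might look like the following Python sketch; the two velocity bounds and the uniform sampling are assumptions, not values given in the patent text:

```python
import numpy as np

# Value ranges per state dimension, in the order described above:
# cart position, pole angle, cart velocity, pole angular velocity.
# The two velocity bounds are illustrative assumptions.
STATE_RANGES = np.array([
    [-4.8, 4.8],     # x: cart position
    [-24.0, 24.0],   # theta: pole angle (range as given in the text)
    [-3.0, 3.0],     # cart velocity (assumed)
    [-3.0, 3.0],     # pole angular velocity (assumed)
])

def random_state(rng: np.random.Generator) -> np.ndarray:
    """Draw one initial state uniformly within the configured ranges."""
    low, high = STATE_RANGES[:, 0], STATE_RANGES[:, 1]
    return rng.uniform(low, high)

# Example usage: rng = np.random.default_rng(0); s = random_state(rng)
```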
Step 2), the randomly generated initial state is passed to the deep reinforcement learning software under test for execution; a run-time information tracker is set up, and all run-time information of the agent, such as the states visited and the corresponding rewards received by the agent in the current episode, is recorded while the software under test runs;
Step 3), an objective function is designed and used to judge whether the initial state triggers an error; if an error is triggered, the initial state is regarded as a successful test case. For example, check whether the episode reward is below a failure threshold; if so, the initial state is considered to have triggered an error and is treated as a successful test case.
The objective function is a Boolean function of the episode reward, which differs greatly from traditional fuzz testing. Specifically, the deep reinforcement learning software runs a complete episode with the current state as the initial starting point; if the episode reward obtained in that episode is below the failure threshold (for example, for the CartPole task the episode reward is 1 when the task fails, so the failure threshold is set to 1), the objective function returns True, indicating that the current input triggers erroneous behavior of the deep reinforcement learning software and that this input state is a test input that successfully triggers an error.
Step 4), a coverage analyzer is designed to analyze whether the collected states form new coverage; if they do, they are added to the seed pool. The coverage analyzer used by the invention is described in detail below.
The coverage analyzer is responsible for reading the run-time information, in particular the state information, and then checking whether the current input achieves new coverage. Ideally, the coverage analyzer aims to check whether the deep reinforcement learning system reaches states that trigger different faults by constantly exploring new regions of the state space. This goal is achieved with a kd-tree-based nearest-neighbour algorithm, which is widely used to search for approximate nearest neighbours in high-dimensional spaces. Specifically, for a randomly generated initial state, its nearest neighbour is sought in the seed pool, and the initial state is added to the seed pool if the distance is greater than an adjustable threshold δ. However, constructing the kd-tree is very time-consuming, and it is impractical to rebuild it to check coverage at every time step during a run. A compromise is therefore made: the kd-tree is constructed before each iteration and is not updated within the current iteration. Each iteration produces a batch of test inputs that serve as the initial states of the next iteration. Within an episode, the current state is added to the seed pool only if it satisfies two conditions: its distance to the most recently added state is greater than a fixed threshold δ′ (internal threshold), and its distance to its nearest neighbour in the seed pool is greater than the adjustable threshold δ. The procedure is shown in Table 1.
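A minimal sketch of this coverage check, using SciPy's kd-tree as the approximate nearest-neighbour structure; the threshold values, the Euclidean metric, and the class layout are illustrative assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

class CoverageAnalyzer:
    """Checks whether a visited state constitutes new coverage of the state space."""

    def __init__(self, seed_pool, delta=0.5, delta_internal=0.1):
        self.delta = delta                    # adjustable threshold delta
        self.delta_internal = delta_internal  # fixed internal threshold delta'
        # kd-tree built once per iteration over the frozen seed pool
        self.tree = cKDTree(np.asarray(seed_pool))
        self.newly_added = []                 # states accepted during this iteration

    def is_new_coverage(self, state):
        state = np.asarray(state)
        # Condition 1: far enough from the most recently added state.
        if self.newly_added and \
                np.linalg.norm(state - self.newly_added[-1]) <= self.delta_internal:
            return False
        # Condition 2: nearest neighbour in the (frozen) seed pool is farther than delta.
        dist, _ = self.tree.query(state)
        if dist > self.delta:
            self.newly_added.append(state)
            return True
        return False
```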
Table 1: coverage analysis procedure (the algorithm listing is provided as an image in the original publication).
Step 5), a seed mutator is designed; some of the seeds newly added to the seed pool are selected for mutation, and the mutated seeds are used as test inputs. The seed selector and the seed mutator (mutation strategy) used in the invention are described in detail below.
First, a seed selector used in the present invention will be described in detail with reference to fig. 1.
At the beginning of each new iteration of the main fuzzing loop, seeds are selected from the seed pool for mutation. In conventional coverage-guided fuzz testing, a good seed selection strategy typically chooses seeds that cover new paths and are more likely to trigger errors. In deep reinforcement learning testing, however, it is difficult to steer generation towards states that both achieve new coverage and trigger erroneous behavior. Therefore, the seeds that obtained the lowest episode rewards in the previous iteration are selected. These selected seeds form only part of the batch input for the next iteration; the other part of the batch input is described later.
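A possible seed selector following this description; the selection size `k` and the data layout are assumptions:

```python
def select_seeds(last_batch, episode_rewards, k=5):
    """Return the k initial states from the previous iteration
    that obtained the lowest episode rewards."""
    order = sorted(range(len(last_batch)), key=lambda i: episode_rewards[i])
    return [last_batch[i] for i in order[:k]]
```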
Next, a seed mutator (mutation strategy) used in the present invention will be described in detail.
The mutation strategy is another key issue in the deep reinforcement learning software testing method based on coverage-guided fuzz testing; it is responsible for generating new test cases from the batch input obtained in the previous iteration. The input of the mutator is the seeds chosen by the seed selector, and the mutation proceeds as follows:
Given an initial state s, a Q-network π, an action set a, an environment E, and a mutation step size α, first compute the Q-values of the current state:
q0=π(s) (1)
calculating the action index:
idx = argmax(q0) (2)
calculate the Q value for the next state:
q1=π(E(s,a[idx])) (3)
calculating the label of the current state s:
label=q0 (4)
label[idx]=label[idx]+γ·max(q1) (5)
where γ is the discount rate in the training process.
Calculating a loss value:
loss = ||label − q0||² (6)
using the gradient, the state after mutation s is calculated
Figure BDA0003462733090000081
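A minimal PyTorch sketch of the mutation described by equations (1)–(7); the Q-network interface, the environment transition helper `env_step` (playing the role of E), and the plain unsigned gradient step are assumptions consistent with the text rather than a verbatim implementation:

```python
import torch

def mutate_state(state, q_net, env_step, alpha=0.05, gamma=0.99):
    """Mutate an initial state along the negative gradient of the loss in (6)."""
    s = torch.tensor(state, dtype=torch.float32, requires_grad=True)
    q0 = q_net(s)                                          # (1) Q-values of current state
    idx = int(torch.argmax(q0))                            # (2) greedy action index
    next_state = env_step(s.detach().numpy(), idx)         # apply E(s, a[idx])
    with torch.no_grad():
        q1 = q_net(torch.as_tensor(next_state, dtype=torch.float32))  # (3)
    label = q0.detach().clone()                            # (4)
    label[idx] = label[idx] + gamma * torch.max(q1)        # (5)
    loss = torch.sum((label - q0) ** 2)                    # (6) squared-error loss
    loss.backward()                                        # gradient of loss w.r.t. s
    with torch.no_grad():
        mutated = s - alpha * s.grad                       # (7) step along negative gradient
    return mutated.numpy()
```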
A reinforcement learning task can generally be modeled as a Markov decision process, and the Markov property means that the outcome of an action depends only on the current state. Consequently, ignoring random factors in the environment, if an initial state is taken from an intermediate state of an earlier trajectory, the resulting trajectory largely overlaps with that earlier one. The test cases should therefore also explore more of the system's execution states and achieve broader coverage, which the mutation strategy alone cannot provide. For this reason the test inputs are divided into two parts: mutated inputs (10%) and randomly generated inputs (90%).
The complete process of the deep reinforcement learning software testing method based on coverage-guided fuzz testing is shown in FIG. 1 of the accompanying drawings, and the specific implementation steps are listed in Table 2:
Table 2: overall testing procedure (the step listing is provided as an image in the original publication).
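As an illustration of the overall loop (not a reproduction of the original Table 2 listing), the components sketched above could be combined as follows; the batch size, the iteration count, and the way random states are mixed in are assumptions based on the description:

```python
def fuzz(env, policy, q_net, env_step, rng, max_iters=100, batch_size=50):
    """Coverage-guided fuzzing loop over the deep RL software under test."""
    seed_pool = [random_state(rng) for _ in range(batch_size)]   # step 1
    batch = list(seed_pool)
    failures = []                                                # successful test cases
    for _ in range(max_iters):
        analyzer = CoverageAnalyzer(seed_pool)                   # kd-tree rebuilt per iteration
        episode_rewards = []
        for s in batch:                                          # steps 2-4
            states, rewards = run_episode(env, policy, s)
            episode_rewards.append(sum(rewards))
            if triggers_error(rewards):
                failures.append(s)
            for visited in states:
                if analyzer.is_new_coverage(visited):
                    seed_pool.append(visited)
        # step 5: ~10% mutated seeds, ~90% random states
        # (the text additionally filters the random part for new coverage; omitted here)
        mutated = [mutate_state(s, q_net, env_step)
                   for s in select_seeds(batch, episode_rewards, k=max(1, batch_size // 10))]
        randoms = [random_state(rng) for _ in range(batch_size - len(mutated))]
        batch = mutated + randoms
    return failures
```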
The above description describes in detail the deep reinforcement learning software testing method based on coverage-guided fuzz testing according to the present invention, but it is obvious that the specific implementation form of the present invention is not limited thereto. It will be apparent to those skilled in the art that various obvious changes may be made therein without departing from the spirit of the invention and the scope of the appended claims.

Claims (9)

1. A deep reinforcement learning software testing method based on coverage-guided fuzz testing, characterized in that the framework comprises the following specific steps:
step 1), define the state of the reinforcement learning task and the value range of each state dimension, set up a random state generator, and generate an initial state satisfying the value ranges with the random state generator;
step 2), pass the randomly generated initial state to the deep reinforcement learning software under test for execution, set up a run-time information tracker, and record all run-time information of the agent while the software under test runs;
step 3), design an objective function and use it to judge whether the initial state triggers an error; if an error is triggered, the initial state is regarded as a successful test case;
step 4), design a coverage analyzer, analyze whether the collected states form new coverage, and add them to the seed pool if they do;
step 5), design a seed mutator, select some of the seeds newly added to the seed pool for mutation, and use the mutated seeds as test inputs;
steps 2) to 5) are executed repeatedly until the maximum number of iterations is reached, and the successful test cases collected over the whole process expose the erroneous behavior of the software under test.
2. The deep reinforcement learning software testing method based on coverage-guided fuzz testing according to claim 1, characterized in that: the run-time information in step 2) comprises the states visited and the corresponding rewards received by the agent in the current episode.
3. The deep reinforcement learning software testing method based on coverage-guided fuzz testing according to claim 1, characterized in that: the objective function in step 3) is a Boolean function of the episode reward.
4. The deep reinforcement learning software testing method based on coverage-guided fuzz testing according to claim 1, characterized in that: judging whether the initial state triggers an error in step 3) comprises: checking whether the episode reward is below a failure threshold, and if so, considering the initial state to have triggered an error and regarding it as a successful test case.
5. The deep reinforcement learning software testing method based on coverage-guided fuzz testing according to claim 1, characterized in that: the coverage analyzer in step 4) uses a kd-tree-based approximate nearest-neighbour algorithm to find the nearest neighbour of the randomly generated initial state in the seed pool, and adds the initial state to the seed pool if the distance is greater than an adjustable threshold δ.
6. The deep reinforcement learning software testing method based on coverage-guided fuzz testing according to claim 5, characterized in that: the coverage analyzer is further simplified as follows: the kd-tree is constructed before each iteration and is not updated within the current iteration; the test inputs generated in each iteration are used as the initial states of the subsequent iteration; within an episode, the current state is considered to reach new coverage, and is added to the seed pool, only if its distance to the most recently added state is greater than a fixed threshold δ′ and its distance to its nearest neighbour in the seed pool is greater than the adjustable threshold δ.
7. The deep reinforcement learning software testing method based on coverage-guided fuzz testing according to claim 1, characterized in that: the seed mutator in step 5) adopts a mutation strategy based on the loss gradient, computing the loss gradient of the deep reinforcement learning model and then mutating the current state along the gradient direction.
8. The deep reinforcement learning software testing method based on coverage-guided fuzz testing according to claim 7, characterized in that: the loss-gradient-based mutation strategy comprises: taking the initial states with the lowest episode rewards in the previous iteration, computing their gradients, and mutating each initial state along the direction of its negative gradient.
9. The deep reinforcement learning software testing method based on coverage-guided fuzz testing according to claim 1, characterized in that: the test inputs of the seed mutator in step 5) further include states that can form new coverage, generated by the random state generator as the other part of the test inputs; these states and the mutated seeds together serve as the next batch of inputs to the deep reinforcement learning software.
CN202210021911.7A 2022-01-10 2022-01-10 Deep reinforcement learning software testing method based on coverage-guided fuzz testing Pending CN114356778A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021911.7A CN114356778A (en) 2022-01-10 2022-01-10 Deep reinforcement learning software testing method based on coverage-guided fuzz testing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210021911.7A CN114356778A (en) 2022-01-10 2022-01-10 Deep reinforcement learning software testing method based on coverage-guided fuzz testing

Publications (1)

Publication Number Publication Date
CN114356778A true CN114356778A (en) 2022-04-15

Family

ID=81110108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021911.7A Pending CN114356778A (en) 2022-01-10 2022-01-10 Deep reinforcement learning software testing method based on coverage-guided fuzz testing

Country Status (1)

Country Link
CN (1) CN114356778A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114661621A (en) * 2022-05-13 2022-06-24 上海交通大学宁波人工智能研究院 Industrial control protocol fuzzy test system and method based on reinforcement learning
CN114661621B (en) * 2022-05-13 2022-08-23 上海交通大学宁波人工智能研究院 Industrial control protocol fuzzy test system and method based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN107590073B (en) Automatic test case generation method based on path coverage software test
Harman et al. The impact of input domain reduction on search-based test data generation
CN115687115B (en) Automatic testing method and system for mobile application program
Zhang et al. Few-shot bearing anomaly detection via model-agnostic meta-learning
Heo et al. Resource-aware program analysis via online abstraction coarsening
CN114356778A (en) Deep reinforcement learning software testing method based on coverage-guided fuzz testing
Zolfagharian et al. A search-based testing approach for deep reinforcement learning agents
CN114706762A (en) Simulink software testing method based on reinforcement learning
Rao et al. Optimizing the software testing efficiency by using a genetic algorithm: a design methodology
Jia et al. Generating software test data by particle swarm optimization
Bala et al. A hybrid harmony search and particle swarm optimization algorithm (hspso) for testing non-functional properties in software system
CN112181420B (en) Compiler defect positioning method based on reinforcement learning
Wu et al. Genmunn: A mutation-based approach to repair deep neural network models
Ferreira et al. Detecting protocol errors using particle swarm optimization with java pathfinder
CN114661577A (en) Fuzzy test method and tool based on deterministic strategy and coverage guidance
Li et al. Weighted Reward for Reinforcement Learning based Test Case Prioritization in Continuous Integration Testing
CN114064472A (en) Automatic software defect repairing and accelerating method based on code representation
Jiang Research on software defect prediction technology based on deep learning
Li et al. AgentFuzz: Fuzzing for Deep Reinforcement Learning Systems
Gutiérrez-Sánchez et al. Reinforcement Learning with Temporal Logic Specifications for Regression Testing NPCs in Video Games
Ishimoto et al. An Initial Analysis of Repair and Side-effect Prediction for Neural Networks
CN115617693B (en) Deep learning library interface test method based on particle swarm optimization
Wei-wei et al. A Processor Performance Prediction Method Based on Interpretable Hierarchical Belief Rule Base and Sensitivity Analysis.
Ye et al. DANCe: Dynamic Adaptive Neuron Coverage for Fuzzing Deep Neural Networks
Mirzaei et al. Reinforcement Learning Reward Function for Test Case Prioritization in Continuous Integration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination