WO2020151301A1

WO2020151301A1 - Reinforcement learning-based test script generation method and apparatus

Info

Publication number: WO2020151301A1
Application number: PCT/CN2019/116263
Authority: WO
Inventors: 李佳楠; 张新琛; 陈忻; 黄伟东; 孙震
Original assignee: Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date: 2019-01-21
Filing date: 2019-11-07
Publication date: 2020-07-30
Also published as: CN109901994B; CN109901994A

Abstract

Provided in the embodiments of the present description are a reinforcement learning-based test script generation method and apparatus. The method comprises: acquiring states and behaviors used for testing, wherein the states comprise a plurality of testing states from a testing initial state to a testing target state, switching between the plurality of testing states is triggered by the behaviors, and one of the states corresponds to a plurality of possible behaviors; running a reinforcement learning model to determine a corresponding Q value between each of the states and each behavior; according to the Q value, obtaining a test script, the test script corresponding to one execution path from the testing initial state to the testing target state, the execution path comprising a sequence of behaviors that may reach the target state so as to test target software to be tested by means of the test script.

Description

Method and device for generating test script based on reinforcement learning

Technical field

The present disclosure relates to the field of testing technology, and in particular to a method and device for generating test scripts based on reinforcement learning.

Background art

Testing is the process of operating a program under specified conditions to find program errors, measure software quality, and evaluate whether it can meet design requirements. Automated testing is a process that transforms human-driven testing behavior into machine execution.

In the traditional automated testing process, testers need to manually write test scripts for automated testing, and the writing of test scripts often takes most of the testers' time.

Summary of the invention

In view of this, one or more embodiments of this specification provide a method and device for generating test scripts based on reinforcement learning, so as to save labor costs for automated testing and make testing more convenient.

Specifically, one or more embodiments of this specification are implemented through the following technical solutions:

In a first aspect, a method for generating test scripts based on reinforcement learning is provided, where the test scripts are used to test target software under test; the method includes:

Acquire the state and behavior for testing, where the state includes a plurality of test states from the test initial state to the test target state, the switching between the plurality of test states is triggered by the behavior, and one state corresponds to multiple possible behaviors;

Run a reinforcement learning model to determine the corresponding Q value between each state and each behavior;

According to the Q value, a test script is obtained; the test script corresponds to an execution path from the test initial state to the test target state, and the execution path includes a sequence of behaviors that can reach the target state, so that the target software under test is tested by means of the test script.

In a second aspect, a test script generation device based on reinforcement learning is provided, the device is used to generate a test script; the device includes:

The information acquisition module is used to acquire the state and behavior used for testing; the state includes a plurality of test states from the test initial state to the test target state, the switching between the plurality of test states is triggered by the behavior, and one state corresponds to multiple possible behaviors;

The model running module is used to run the reinforcement learning model to determine the corresponding Q value between each state and each behavior;

The script generation module is used to obtain a test script according to the Q value; the test script corresponds to an execution path from the test initial state to the test target state, and the execution path includes a sequence of behaviors that can reach the target state, so that testing is performed by means of the test script.

In a third aspect, a test script generation device based on reinforcement learning is provided. The device includes a memory and a processor. The memory is used to store computer instructions that can run on the processor. The processor is used to implement the following steps when executing the computer instructions:

The method and device for generating test scripts based on reinforcement learning in one or more embodiments of this specification automatically generate test scripts through a reinforcement learning model, thereby saving labor costs.

Description of the drawings

In order to explain one or more embodiments of this specification or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in one or more embodiments of this specification. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative labor.

FIG. 1 is an application system of a test script generation method provided by at least one embodiment of this specification;

FIG. 2 is a scenario of automatic path generation for a login example provided by at least one embodiment of this specification;

FIG. 3 is a process of automatically generating paths according to FIG. 2;

FIG. 4 is a flow of automatic generation of test scripts based on reinforcement learning provided by at least one embodiment of this specification;

FIG. 5 is a test script generation device based on reinforcement learning according to at least one embodiment of this specification.

Detailed description

In order to enable those skilled in the art to better understand the technical solutions in one or more embodiments of this specification, the technical solutions are described clearly and completely below with reference to the drawings in one or more embodiments of this specification. Obviously, the described embodiments are only a part of the embodiments in this specification, rather than all the embodiments. Based on one or more embodiments of this specification, all other embodiments obtained by those of ordinary skill in the art without creative work should fall within the scope of protection of the present disclosure.

At least one embodiment of this specification uses a reinforcement learning model to automatically generate test scripts.

First, the system to which this method is applied is described through FIG. 1. As shown in FIG. 1, the system may include: a test script generation device 11 based on reinforcement learning (hereinafter referred to as a script generation device), an agent device 12 and a device under test 13.

A reinforcement learning model can be run in the script generation device 11. For example, the reinforcement learning model may be a Q Learning model, or may be a Deep Q-Network (DQN) model, etc.

The agent device 12 (Agent) can receive a command sent by the script generation device 11 and operate the device under test 13 according to the command. For example, the agent device 12 can receive a command to execute a certain action sent by the script generation device 11, and then control the device under test 13 to execute the action. In addition, the device under test 13 may return a feedback result to the agent device 12 after executing the action. The feedback result may include the state reached after executing the action, and the agent device 12 may return this state to the script generation device 11 so that the script generation device 11 can perform the next step of processing.

The device under test 13 may be a device running the test software. For example, the device under test 13 may be a mobile terminal device or a PC terminal device. FIG. 1 takes the mobile terminal as an example.
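The message flow among the three components can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the class names `DeviceUnderTest` and `Agent`, the `Feedback` tuple, and the toy transition table are all hypothetical stand-ins for a real device and its transport.

```python
from typing import Dict, NamedTuple, Tuple


class Feedback(NamedTuple):
    """Result the agent returns to the script generation device (assumed shape)."""
    state: str
    reached_target: bool


class DeviceUnderTest:
    """Toy stand-in for device 13: a lookup table of state transitions."""

    def __init__(self, transitions: Dict[Tuple[str, str], str], initial: str):
        self.transitions = transitions
        self.state = initial

    def perform(self, action: str) -> str:
        # An action with no defined transition leaves the state unchanged.
        self.state = self.transitions.get((self.state, action), self.state)
        return self.state


class Agent:
    """Stand-in for agent device 12: relays a command and reports the resulting state."""

    def __init__(self, device: DeviceUnderTest, target: str):
        self.device = device
        self.target = target

    def execute(self, action: str) -> Feedback:
        new_state = self.device.perform(action)
        return Feedback(new_state, new_state == self.target)


device = DeviceUnderTest(
    {
        ("home", "click_login"): "login_page",
        ("login_page", "submit_credentials"): "logged_in",
    },
    initial="home",
)
agent = Agent(device, target="logged_in")

first = agent.execute("click_blank_area")    # no transition defined, state stays "home"
second = agent.execute("click_login")        # moves to the login page
third = agent.execute("submit_credentials")  # reaches the target state
```

In this sketch the script generation device only ever sees `Feedback` values, which mirrors the description above: it issues an action and decides the next step from the returned state.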

The following takes Q Learning as an example to describe how the system of FIG. 1 uses reinforcement learning to automatically generate test scripts:

Q table under test

In the Q Learning method, the purpose of learning is to update the Q table so that it becomes an accurate Q table. Such a Q table can be maintained in the script generation device 11, and the Q table can include states and actions, as in the example of Table 1 below:

Table 1 Example of a Q table

Q	action_1	action_2	...	action_n
state_1				
state_2				
...				
state_n				

In the test scenario, explain the related concepts in the Q table as follows:

The "state": may include multiple test states from the test initial state to the test target state. For example, the test state may include the test initial state and the test intermediate state.

Take a UI-based test in which a user logs in to an application as an example:

The initial state of the test can be that the user opens the application homepage;

The test target state can be that the user enters the application homepage.

The test initial state to the test target state may include multiple test intermediate states. For example, after the user clicks the login button, the page jumps to the login page, and the displayed login page can be a test state. For another example, after the user enters the user name, the entered user name is displayed on the page, which can be used as a test state.

Switching between multiple test states is triggered by actions. For example, if the user clicks the login button, this operation triggers the test state to switch from the application homepage display to the login page display.

The various states mentioned above are observable, and the display information of the current interface can also be obtained as the state through interfaces provided by the system. For example, on an android device the control information of the current interface can be obtained automatically through adb dump and used as the observed state.

For example, the state of the Q table can be designed as follows. Taking the android device as an example, the description information of the controls on the current device interface can be obtained through adb dump and recorded in an xml format file, which records the type and coordinate information of all controls on the current interface. A state vector can be extracted from this file to identify the state of the current interface. The state vector represents the test state and can include multiple state features, each of which corresponds to one interface description dimension of the interface on which the target software under test is running in that test state.

For example, in the state vector [x1, x2, x3, x4, x5], the state feature x1 represents the number of controls on the current interface, x2 represents the maximum level of the current interface layout, x3 is the total area of all controls on the current interface, x4 is the x value of the average center coordinates of all controls, and x5 is the y value of the average center coordinates of all controls. The number of controls, the number of levels, the total area, etc., can each be regarded as an interface description dimension. Using this information, the test state of an interface can be roughly expressed numerically.
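As a rough sketch of how such a state vector might be computed, the following example parses an xml description of the interface controls. It assumes, for illustration only, that every control is an xml element carrying a `bounds` attribute of the form `[x1,y1][x2,y2]`, as in Android UI hierarchy dumps; the function name `state_vector` and the sample xml are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed attribute format: bounds="[left,top][right,bottom]"
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def state_vector(xml_text):
    """Return [x1, x2, x3, x4, x5] as described in the text above."""
    root = ET.fromstring(xml_text)
    count = 0          # x1: number of controls
    max_depth = 0      # x2: maximum level of the layout
    total_area = 0     # x3: total area of all controls
    sum_cx = 0.0       # accumulates center x values for x4
    sum_cy = 0.0       # accumulates center y values for x5

    def walk(node, depth):
        nonlocal count, max_depth, total_area, sum_cx, sum_cy
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match:  # only elements with bounds are counted as controls
            x1, y1, x2, y2 = map(int, match.groups())
            count += 1
            max_depth = max(max_depth, depth)
            total_area += (x2 - x1) * (y2 - y1)
            sum_cx += (x1 + x2) / 2
            sum_cy += (y1 + y2) / 2
        for child in node:
            walk(child, depth + 1)

    walk(root, 0)
    if count == 0:
        return [0, 0, 0, 0.0, 0.0]
    return [count, max_depth, total_area, sum_cx / count, sum_cy / count]


SAMPLE = """
<hierarchy>
  <node bounds="[0,0][100,200]">
    <node bounds="[10,10][30,30]"/>
  </node>
</hierarchy>
"""
vector = state_vector(SAMPLE)
```

Two interfaces with the same vector are treated as the same state in this sketch, which is why the text calls the representation rough.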

The "behavior": multiple action operations that may be encountered in a certain state. In the test scenario, you can test as many possible behaviors as possible in a certain state.

This example is based on UI testing, so the behavior action can be the user's operational behavior, such as clicking the login button. The method in this specification can also be applied to other types of testing, such as functional testing or interface testing, and is not limited to UI testing.

Taking the UI test as an example, the login button is in a certain position on the login page, but the user may not click on the correct position, for example clicking a meaningless blank position or clicking a wrong place. In this example, the page screen can be divided into multiple units, each unit is represented by a position, and clicking on that position is used as an action, for example click (30,10) or click (10,10), where (30,10) is the coordinates of the click position. How the screen is divided can be customized.

For example, one way to design click coordinates is as follows. To make the operations as universal as possible, a click operation can be defined as a click on interface coordinates, and the specific coordinates can be divided according to the screen resolution. For example, all click operations can be defined on a grid of 20*40 units.

The abscissa x of each click operation is:

Unit width = current interface width / 20

x coordinate of the click operation in the i-th column = unit width / 2 + i * unit width

The ordinate y of each click operation is:

Unit length = current interface length / 40

y coordinate of the click operation in the j-th row = unit length / 2 + j * unit length

The design of the click operation coordinate can manually specify the size of the grid. For example, the grid corresponding to the click operation can be 20*40 or 30*60, which can be adjusted according to the performance of the operation in specific use.
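The grid formulas above can be sketched in a few lines. The example assumes that the column index i and the row index j start at 0, so that each click lands at the center of its unit; the function name `click_grid` and the 1080 by 1920 screen size are illustrative choices only.

```python
def click_grid(width, height, cols=20, rows=40):
    """Return the (x, y) click coordinate for every unit of a cols*rows grid."""
    unit_width = width / cols
    unit_length = height / rows
    return [
        (unit_width / 2 + i * unit_width, unit_length / 2 + j * unit_length)
        for j in range(rows)      # j-th row, counted from 0
        for i in range(cols)      # i-th column, counted from 0
    ]


clicks = click_grid(1080, 1920)   # 20*40 grid on a hypothetical 1080x1920 screen
coarse = click_grid(1080, 1920, cols=10, rows=20)
```

Passing different `cols` and `rows` values corresponds to adjusting the grid size, which trades the number of click actions in the Q table against how precisely a control can be hit.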

In addition to click operations, other types of actions can also be included, such as input behavior and sliding behavior. The specific types of action can depend on the design of the interface functions of the test software. For example, if the user needs to enter a user name and password on the page, an input action can be included; if the page requires the user to slide, a slide action can be included. Further, the sliding action may include sliding to the left and sliding to the right, each of which is a separate action. Other types of actions can also be designed as universally as possible. For example, a sliding operation can be specified by four parameters: the coordinates (x, y) of the starting point and the coordinates (x, y) of the ending point.
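One possible way to represent these action types in code is shown below. The class names `Click`, `InputText` and `Swipe` and their field names are hypothetical; the specification only states that a slide can be described by a starting point and an ending point.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Click:
    x: float
    y: float


@dataclass(frozen=True)
class InputText:
    text: str


@dataclass(frozen=True)
class Swipe:
    # Four parameters: starting point (x, y) and ending point (x, y).
    start_x: float
    start_y: float
    end_x: float
    end_y: float


Action = Union[Click, InputText, Swipe]


def describe(action: Action) -> str:
    """Render an action as one readable script step."""
    if isinstance(action, Click):
        return f"click ({action.x}, {action.y})"
    if isinstance(action, InputText):
        return f"input {action.text!r}"
    return (f"swipe ({action.start_x}, {action.start_y})"
            f" -> ({action.end_x}, {action.end_y})")


# Sliding left and sliding right are two distinct actions.
swipe_left = Swipe(800, 500, 200, 500)
swipe_right = Swipe(200, 500, 800, 500)
```

The dataclasses are frozen so that instances are hashable and can serve directly as column keys of a Q table.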

The above-mentioned "state" and "behavior" can be designed according to the characteristics of the test software.

When testing software, there are many situations that need to be tested. For example, testing a certain piece of software can involve multiple tests. Several examples follow, each of which can be called a test case:

Test case 1: Enter the correct user name and password, and click the submit button to verify whether login succeeds. (Normal input)

Test case 2: Enter a wrong user name or password; the login should fail and the corresponding error message should be prompted. (Error checking)

Test case 3: Enter a user name or password that is too short or too long, and check how it is handled. (Security, for example whether there is a prompt when the password is too short)

Each case can design its own Q table, but in order to improve test efficiency, a universal Q table can be designed where possible. For example, taking click-type actions as an example, the correct click position can differ between cases with different test target states, but all possible positions can be listed in the Q table. The behaviors in the Q table can include click behaviors at each of the different screen coordinate positions, and each case can be guided by setting different reward values for the behaviors at different positions.

Different test cases can share some actions, and some cases have exactly the same actions. Different cases may also share part of their states. For example, for a login case and a registration case, both may start from the registration/login page and both may finally reach the application homepage, but the intermediate steps may differ. If two different cases pass through the same states along their whole series of operation paths, their states are exactly the same, but this situation is rare.

That is, suppose the first test target state and the second test target state are two different test target states (corresponding to two different cases). For example, the first test target state can be the display of the login success page, and the second test target state can be a pop-up error message. Then the multiple test states used when the test target state is the first test target state and the multiple test states used when the test target state is the second test target state can be at least partly the same. As mentioned above, the initial test state and some intermediate test states of the two cases may be the same.

The multiple behaviors used when the test target state is the first test target state and the multiple behaviors used when the test target state is the second test target state can also be at least partly the same. For example, both cases include click operations with the same set of possible click coordinates. Although a certain state and behavior used in different cases are the same, the corresponding Q value between that state and behavior can differ between the cases. For example, suppose test case 1 and test case 2 above use the same Q table design. When the Q value is updated, the reward value is set higher for test case 1 if the login succeeds, and set higher for test case 2 if the login fails. The Q values in the Q tables of the two cases can then be different.
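The idea of one Q table layout shared by several cases, with a different reward per case, can be illustrated as follows. The state names, action names, reward values and the learning rate `alpha` are assumptions made for this sketch, and `terminal_update` is the standard Q Learning update restricted to a transition that ends the episode.

```python
from typing import Dict, Tuple

QTable = Dict[Tuple[str, str], float]

# Shared layout: both cases use the same states and the same actions.
STATES = ["home", "login_page", "login_success", "error_prompt"]
ACTIONS = ["click_login", "enter_credentials", "click_submit"]


def new_q_table() -> QTable:
    return {(s, a): 0.0 for s in STATES for a in ACTIONS}


def reward_case_1(reached_state: str) -> float:
    """Test case 1: a successful login is the target."""
    return 1.0 if reached_state == "login_success" else 0.0


def reward_case_2(reached_state: str) -> float:
    """Test case 2: the error prompt is the target."""
    return 1.0 if reached_state == "error_prompt" else 0.0


def terminal_update(q: QTable, state: str, action: str,
                    reward: float, alpha: float = 0.5) -> None:
    # The episode ends after this transition, so there is no future-value term.
    q[(state, action)] += alpha * (reward - q[(state, action)])


q_case_1 = new_q_table()
q_case_2 = new_q_table()

# The same state, action and outcome are scored differently by the two cases.
outcome = "login_success"
terminal_update(q_case_1, "login_page", "click_submit", reward_case_1(outcome))
terminal_update(q_case_2, "login_page", "click_submit", reward_case_2(outcome))
```

After these two updates the entry for the same state and action holds a positive value in the table of case 1 and remains 0 in the table of case 2, which is the difference the paragraph above describes.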

Update of the Q table under test

After the Q table is designed, the script generation device 11 can update the Q table, and after the update is completed, the execution path during the test is obtained according to the Q table.

Take the user login application as an example to describe how to get the test execution path:

Figure 2 illustrates the automatic path generation scenario of the login example, and Figure 3 shows the automatic path generation process based on Figure 2. Referring to Figure 2 and Figure 3, the method may include:

In step 300, the Q table is initialized, and the Q table includes the state and behavior.

For example, see Table 2 below, which is the Q table in the user login application example.

Table 2 Example of Q in the user login application example

For example, state 1 may be the home page of the application; a login button is displayed on the home page of the application, and the corresponding operation action in state 1 may include clicking on various positions of the home page, including clicking the login button.

State 2 can be an application login page, which displays input boxes for the user name and password as well as a login button. In state 2, possible actions may include clicking on different locations, and may also include input operations such as entering a user name or entering a password. Of course, in other examples, there can be other identity authentication methods in addition to entering the user name and password. For example, if the page requires the user to drag a slider to complete a picture, the actions can include sliding to the left or sliding to the right.

State 3 may be to display the user name entered by the user. There can also be state 4, state 5 and other states before the user successfully logs in to the application.

After successful login, enter the application homepage.

A state switch is triggered only after the user performs the correct operation. For example, if the user clicks on a meaningless location on the application homepage, the state may remain in state 1 and will not switch to state 2.

It can be seen that the states are the intermediate states that may be reached during the test, and the actions are the possible user behaviors in each state.

The Q value in the table indicates which behavior is more likely to achieve the goal of this test. If a behavior brings the test closer to the test target, a higher Q value can be set for it; this is equivalent to a guide to the user's behavior, so that the path to the test target can be found more quickly. When a behavior is selected in a certain state, a behavior with a higher Q value has a greater probability of being selected.
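The specification does not say how the selection probability is derived from the Q values. One common choice, used here purely as an assumption, is softmax (Boltzmann) selection, in which the probability of an action grows with its Q value; the `temperature` parameter and the action names are hypothetical.

```python
import math
import random
from typing import Dict


def selection_probabilities(q_row: Dict[str, float],
                            temperature: float = 0.5) -> Dict[str, float]:
    """Map the Q values of one state to action-selection probabilities."""
    top = max(q_row.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = {a: math.exp((q - top) / temperature) for a, q in q_row.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def choose_action(q_row: Dict[str, float], rng: random.Random,
                  temperature: float = 0.5) -> str:
    """Sample one action; higher Q values are picked more often."""
    probs = selection_probabilities(q_row, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


row = {"click_login": 0.9, "click_blank": 0.0, "click_logo": 0.0}
probs = selection_probabilities(row)
picked = choose_action(row, random.Random(0))
```

A lower temperature concentrates the probability on the highest Q value, while a higher temperature keeps the selection closer to uniform and therefore explores more.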

That is, in the test scenario, the states, behaviors, and Q value settings in the Q table are all related to the characteristics of the test software itself and to the test target state, and are determined accordingly. Different test software and test target states can have different states and behaviors, and different Q values can be set to guide the generation of test paths.

In this step, initially, the values in the Q table can be initialized to all 0s, or other values can be used.

In step 302, the Q table is updated by means of Q Learning to obtain the updated Q table. The Q table includes Q values corresponding to various behaviors in each state.

As shown in FIG. 2, in each state, the script generation device 11 can randomly select an action in that state and notify the agent device 12 to execute it. The agent device 12 controls the device under test 13 to execute the behavior according to the instruction; the device under test 13 is equivalent to the operating environment of the software under test.

The agent device 12 may return a feedback result to the script generation device 11, and the feedback result may include whether the state reached after executing the action is the test target state.

The process of updating the Q table can be performed according to the conventional Q Learning method, which will not be described in detail.

The process of updating the Q table is briefly described as follows, but it is not limited to this:

For example, referring to Table 2, in state 1 an action is randomly selected and the agent device 12 is instructed to execute it. The agent device 12 reports that state 2 has been entered. State 2 is not the target state (successful login). As long as the final state has not been reached, the script generation device 11 continues to select an action from the actions corresponding to state 2 and instructs the agent device 12 to execute it.

This loop continues until, after an action is selected in the last state, the result returned by the agent device 12 is that a successful login has been achieved. A reward value can then be given, and the reward value can be used to update the Q values of the actions that led to the login. For example, if the user enters the user name, enters the password, and clicks the login button, and this leads to a successful login to the application, then the Q values of these three actions in their respective states can be updated to be slightly higher. For example, while the Q values of the other actions in the same states are all 0, the Q values of these actions can be 0.8 or 0.9.

In the same way, the iteration can continue: start again from state 1 and randomly select an action to execute. If a behavior can reach the test target, its Q value is updated to a higher value, until all state rows in the Q table have been updated. Then, continue iterating with the resulting Q table, selecting the action with a higher Q value in each state, and update the Q table according to whether the final result reaches the test target state.
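The update loop described above can be sketched with the conventional tabular Q Learning rule, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', a') - Q(s, a)). Everything concrete in this example is an assumption: the toy login transitions replace the agent device and the device under test, the reward of 1.0 on reaching the target is arbitrary, and epsilon-greedy selection with the listed hyperparameters is one common way to mix random and Q-guided choices.

```python
import random

# Toy stand-in for the agent device and device under test (hypothetical).
TRANSITIONS = {
    ("home", "click_login"): "login_page",
    ("login_page", "enter_username"): "username_entered",
    ("username_entered", "enter_password"): "password_entered",
    ("password_entered", "click_login"): "logged_in",
}
STATES = ["home", "login_page", "username_entered", "password_entered"]
ACTIONS = ["click_login", "enter_username", "enter_password", "click_blank"]
TARGET = "logged_in"


def step(state, action):
    """Execute an action; wrong actions leave the state unchanged."""
    next_state = TRANSITIONS.get((state, action), state)
    reward = 1.0 if next_state == TARGET else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}   # step 300: all zeros
    for _ in range(episodes):
        state = "home"
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)             # explore at random
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # Break ties at random so an all-zero row is explored uniformly.
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward = step(state, action)
            future = 0.0 if next_state == TARGET else max(
                q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if state == TARGET:
                break
    return q


q_table = train()
```

In this toy setting the action that advances the login ends up with the highest Q value in every state, while actions that leave the state unchanged converge to a discounted, strictly smaller value.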

In step 304, the execution path of the test script is obtained according to the Q value, and the execution path includes a sequence of actions that can reach the target state.

In this step, after updating the Q table, the execution path of the test script can be obtained accordingly. For example, the action with the highest Q value in each state can be selected to form a behavior sequence, which is the execution path of the test.

At this point the script generation device 11 has automatically generated a test script. The test script can be sent to the agent device 12 for execution, and the test is performed through the test script. At least one test script may be generated.
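Step 304 can be sketched as a greedy walk over an updated Q table. The Q values, state names and the `transitions` lookup below are hand-written assumptions; in the described system the next state would be reported by the agent device rather than read from a dictionary.

```python
def extract_script(q, transitions, start, target, actions, max_steps=20):
    """Follow the highest-Q action from each state until the target is reached."""
    if start == target:
        return []
    state = start
    script = []
    for _ in range(max_steps):
        action = max(actions, key=lambda a: q.get((state, a), 0.0))
        script.append(action)
        state = transitions.get((state, action), state)
        if state == target:
            return script
    raise ValueError("target state not reached; the Q table may need more updates")


ACTIONS = ["click_login", "submit_credentials", "click_blank"]
TRANSITIONS = {
    ("home", "click_login"): "login_page",
    ("login_page", "submit_credentials"): "logged_in",
}
Q = {
    ("home", "click_login"): 0.9,
    ("home", "click_blank"): 0.1,
    ("login_page", "submit_credentials"): 1.0,
    ("login_page", "click_login"): 0.2,
}

script = extract_script(Q, TRANSITIONS, "home", "logged_in", ACTIONS)
```

The `max_steps` bound is a safeguard added for this sketch: with an insufficiently trained table the greedy walk could otherwise repeat an action that never changes the state.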

The test script generation method of this example can save labor costs by using a reinforcement learning model and can generate better test scripts.

The above example is based on Q Learning in reinforcement learning, and other reinforcement learning models can also be used for processing, such as DQN.

When using the DQN model, the DQN network is trained in advance. The input of the DQN network can be the state in the Q table, that is, the state during the test, such as an image of the interface of the software under test. The output of the DQN model can be the Q value corresponding to each action in that state. The Q value is the state-action value, which is a function of state and action. After training is completed, the trained DQN network yields the Q values corresponding to each behavior in each state. During the test, the current state is input and the action corresponding to the maximum Q value is selected and executed; the resulting sequence of actions is the test path.
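The specification does not state the training objective of the DQN. Assuming the commonly used formulation with a separate target network, it can be written as below, where the symbols are introduced here for illustration: $\theta$ are the network parameters, $\theta^{-}$ the periodically copied target-network parameters, $r$ the reward, $\gamma$ the discount factor, and $(s, a, s')$ a transition observed through the agent device.

```latex
% Regression target for one observed transition (s, a, r, s')
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-})

% Loss minimized during training
L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[ \bigl( y - Q(s, a; \theta) \bigr)^{2} \right]

% Action chosen at test time in state s
a^{*} = \arg\max_{a} Q(s, a; \theta)
```

When $s'$ is the test target state, the target reduces to $y = r$, which corresponds to the reward given on reaching the target in the Q Learning example.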

For different cases, you can use the same method to get the corresponding test path in that case.

The method for generating test scripts is not limited to the above Q Learning and DQN, and other reinforcement learning models can also be used. The flow in Figure 4 illustrates the processing flow when reinforcement learning is applied to the automatic generation of test scripts, which may include the following processing:

In step 400, states and behaviors for testing are obtained; the states include a plurality of test states from the test initial state to the test target state, the switching between the plurality of test states is triggered by the behaviors, and one state corresponds to multiple possible behaviors;

In step 402, run a reinforcement learning model to determine the corresponding Q value between each state and each behavior;

In step 404, the execution path of the test script is obtained according to the Q value, and the execution path includes a sequence of actions that can reach the target state, so that testing is performed by means of the test script.

The reinforcement learning model realizes the automatic generation of automated test cases, which greatly reduces labor costs: test cases are generated automatically, without manually writing test case scripts.

Fig. 5 provides a test script generation device based on reinforcement learning according to at least one embodiment of this specification, and the device is used to generate a test script. As shown in FIG. 5, the device may include: an information acquisition module 51, a model running module 52, and a script generation module 53.

The information acquisition module 51 is configured to acquire the state and behavior used for testing; the state includes a plurality of test states from the test initial state to the test target state, the switching between the plurality of test states is triggered by the behavior, and one state corresponds to multiple possible behaviors;

The model running module 52 is configured to run a reinforcement learning model to determine the corresponding Q value between each state and each behavior;

The script generation module 53 is configured to obtain a test script according to the Q value; the test script corresponds to an execution path from the test initial state to the test target state, and the execution path includes a sequence of actions that can reach the target state, so that testing is performed by means of the test script.

In an example, when the type of the behavior is click, the behavior includes click behaviors corresponding to different interface coordinates, and the interface is the running interface of the target software under test.

In an example, the model running module 52 is specifically configured to: initialize the Q table, which includes the state and behavior; and update the Q table by means of Q Learning to obtain the updated Q table.

In an example, the model running module 52 is specifically used to train a DQN, where the input of the DQN is the state, and the output is the Q value corresponding to the state and behavior; and the DQN after the training is obtained.

At least one embodiment of the present specification also provides a test script generation device based on reinforcement learning. The device includes a memory and a processor. The memory is used to store computer instructions that can run on the processor; the processor is used to implement the following steps when executing the computer instructions:

It should also be noted that the terms "including", "comprising" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or equipment including a series of elements not only includes those elements, but also includes other elements that are not explicitly listed, or also includes elements inherent to such processes, methods, commodities, or equipment. If there are no more restrictions, an element defined by the sentence "including a..." does not exclude the existence of other identical elements in the process, method, commodity, or equipment that includes the element.

Those skilled in the art should understand that one or more embodiments of this specification can be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. One or more embodiments of this specification can also be practiced in distributed computing environments. In these distributed computing environments, tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in local and remote computer storage media including storage devices.

The various embodiments in this specification are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the data processing device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.

The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims can be performed in a different order than in the embodiments and still achieve desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The above descriptions are only preferred embodiments of one or more embodiments of this specification and are not intended to limit one or more embodiments of this specification. Any modification, equivalent replacement, or improvement made within the spirit and principles of one or more embodiments of this specification shall fall within the protection scope of one or more embodiments of this specification.

Claims

A method for generating test scripts based on reinforcement learning, where the test scripts are used to test target software under test; the method includes:

Acquire the states and behaviors used for testing, where the states include a plurality of test states from the test initial state to the test target state, switching between the plurality of test states is triggered by the behaviors, and one state corresponds to multiple possible behaviors;

Run a reinforcement learning model to determine the corresponding Q value between each state and each behavior;

According to the Q value, a test script is obtained, the test script corresponds to an execution path from the test initial state to the test target state, and the execution path includes a sequence of behaviors that can reach the target state, so that the target software under test is tested by means of the test script.
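The step of obtaining a test script from the Q values can be illustrated with a minimal sketch. It assumes the Q values are already available as a table and that the script is read off by repeatedly choosing the behavior with the highest Q value in the current state. The state names, behavior names, Q values, and transition map below are hypothetical toy data, and `extract_script` is an illustrative helper rather than anything defined in this specification.

```python
from typing import Dict, List, Tuple

# Hypothetical Q table: Q[state][behavior] -> Q value (toy numbers).
Q: Dict[str, Dict[str, float]] = {
    "home": {"click_login": 0.81, "click_help": 0.10},
    "login": {"type_credentials": 0.90, "click_back": 0.05},
    "filled": {"click_submit": 1.00, "click_back": 0.02},
}

# Hypothetical transitions: (state, behavior) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("home", "click_login"): "login",
    ("home", "click_help"): "home",
    ("login", "type_credentials"): "filled",
    ("login", "click_back"): "home",
    ("filled", "click_submit"): "dashboard",
    ("filled", "click_back"): "login",
}


def extract_script(
    q: Dict[str, Dict[str, float]],
    transitions: Dict[Tuple[str, str], str],
    start: str,
    target: str,
    max_steps: int = 20,
) -> List[str]:
    """Follow the highest-Q behavior from start until target is reached."""
    state = start
    script: List[str] = []
    for _ in range(max_steps):
        if state == target:
            return script
        # Greedy choice: the behavior with the largest Q value in this state.
        behavior = max(q[state], key=q[state].get)
        script.append(behavior)
        state = transitions[(state, behavior)]
    if state == target:
        return script
    raise RuntimeError("target state not reached within max_steps")


script = extract_script(Q, TRANSITIONS, start="home", target="dashboard")
```

The returned list is the behavior sequence of one execution path; the `max_steps` bound is an added safeguard against a Q table whose greedy path loops.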
The method according to claim 1, wherein the behavior includes multiple types of behavior, and the type of the behavior is related to the interface design of the target software under test.
The method according to claim 1,

The multiple test states used when the test target state is the first test target state are at least partially the same as the multiple test states used when the test target state is the second test target state;

The multiple behaviors used when the test target state is the first test target state are at least partially the same as the multiple behaviors used when the test target state is the second test target state;

The first test target state and the second test target state are different.
The method according to claim 1,

When the type of the behavior is click, the behavior includes click behaviors corresponding to different interface coordinates, and the interface is the running interface of the target software under test.
The method according to claim 4,

The interface coordinates are divided according to the screen resolution of the running interface of the target software under test.
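One possible reading of dividing the interface coordinates according to the screen resolution is to split the screen into a grid and treat the center of each cell as one click behavior. The sketch below follows that reading. The grid size and the function name `click_behaviors` are assumptions made for illustration; the claim does not fix a particular division.

```python
from typing import List, Tuple


def click_behaviors(width: int, height: int, cols: int, rows: int) -> List[Tuple[int, int]]:
    """Return one click coordinate (cell center) per grid cell, row by row."""
    cell_w = width / cols
    cell_h = height / rows
    return [
        (int((c + 0.5) * cell_w), int((r + 0.5) * cell_h))
        for r in range(rows)
        for c in range(cols)
    ]


# Example: a 1080 x 1920 screen split into 3 columns and 4 rows.
points = click_behaviors(1080, 1920, cols=3, rows=4)
```

A coarser grid keeps the number of click behaviors per state small, at the cost of being unable to distinguish controls that fall in the same cell.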
The method according to claim 1, wherein running the reinforcement learning model to determine the corresponding Q value between each state and each behavior includes:

Initialize a Q table, the Q table including the states and the behaviors;

Update the Q table through Q-learning to obtain the updated Q table.
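The two steps above can be sketched with standard tabular Q-learning. This is a minimal illustration under stated assumptions: the chain environment `chain_step`, the reward of 1 on reaching the target, the epsilon-greedy exploration, and all hyperparameter values are hypothetical and are not taken from this specification.

```python
import random
from typing import Callable, Dict, List, Tuple

STATES: List[str] = ["s0", "s1", "s2", "goal"]
BEHAVIORS: List[str] = ["next", "stay"]


def chain_step(state: str, behavior: str) -> Tuple[str, float]:
    """Hypothetical environment: 'next' advances one state, 'stay' does nothing."""
    if behavior == "next":
        nxt = STATES[STATES.index(state) + 1]
        return nxt, (1.0 if nxt == "goal" else 0.0)
    return state, 0.0


def q_learning(
    states: List[str],
    behaviors: List[str],
    step: Callable[[str, str], Tuple[str, float]],
    start: str,
    target: str,
    episodes: int = 500,
    alpha: float = 0.5,    # learning rate (assumed value)
    gamma: float = 0.9,    # discount factor (assumed value)
    epsilon: float = 0.2,  # exploration rate (assumed value)
    max_steps: int = 50,
    seed: int = 0,
) -> Dict[str, Dict[str, float]]:
    rng = random.Random(seed)
    # Initialize the Q table with one entry per (state, behavior) pair.
    q = {s: {b: 0.0 for b in behaviors} for s in states}
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            if state == target:
                break
            # Epsilon-greedy selection of the next behavior.
            if rng.random() < epsilon:
                behavior = rng.choice(behaviors)
            else:
                behavior = max(q[state], key=q[state].get)
            next_state, reward = step(state, behavior)
            # Q-learning update: bootstrap from the best Q value of the next state.
            best_next = 0.0 if next_state == target else max(q[next_state].values())
            q[state][behavior] += alpha * (reward + gamma * best_next - q[state][behavior])
            state = next_state
    return q


q_table = q_learning(STATES, BEHAVIORS, chain_step, start="s0", target="goal")
```

In this toy setting the learned Q values for `next` exceed those for `stay` in every non-target state, so a greedy read-out of the table yields a path to the target.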
The method according to claim 1, wherein running the reinforcement learning model to determine the corresponding Q value between each state and each behavior includes:

Train a deep Q network, where the input of the deep Q network is the state and the output is the Q value corresponding to the state and behavior, to obtain the trained deep Q network.
The method according to claim 1, wherein the behavior is a user's operation behavior on the running interface of the target software under test.
The method according to claim 1,

Each of the test states of the target software under test includes multiple state characteristics;

Each of the state characteristics corresponds to an interface description dimension of the target software running interface in the test state.
A test script generation device based on reinforcement learning, the device is used to generate a test script; the device includes:

The information acquisition module is used to acquire the states and behaviors used for testing, where the states include a plurality of test states from the test initial state to the test target state, switching between the plurality of test states is triggered by the behaviors, and one state corresponds to multiple possible behaviors;

The model running module is used to run the reinforcement learning model to determine the corresponding Q value between each state and each behavior;

The script generation module is used to obtain a test script according to the Q value, the test script corresponding to an execution path from the test initial state to the test target state, and the execution path including a sequence of behaviors that can reach the target state, so that testing is performed by means of the test script.
The device according to claim 10,

When the type of the behavior is click, the behavior includes click behaviors corresponding to different interface coordinates, and the interface is the running interface of the target software under test.
The device according to claim 10,

The model running module is specifically configured to: initialize a Q table, which includes the state and behavior; and update the Q table by means of Q Learning to obtain an updated Q table.
The device according to claim 10,

The model running module is specifically configured to train a deep Q network, where the input of the deep Q network is the state and the output is the Q value corresponding to the state and behavior, and to obtain the trained deep Q network.
A test script generation device based on reinforcement learning, the device including a memory and a processor, where the memory is used to store computer instructions that can run on the processor, and the processor is used to implement the following steps when executing the computer instructions:

Acquire the states and behaviors used for testing, where the states include a plurality of test states from the test initial state to the test target state, switching between the plurality of test states is triggered by the behaviors, and one state corresponds to multiple possible behaviors;

Run a reinforcement learning model to determine the corresponding Q value between each state and each behavior;

According to the Q value, a test script is obtained, the test script corresponds to an execution path from the test initial state to the test target state, and the execution path includes a sequence of behaviors that can reach the target state, so that the target software under test is tested by means of the test script.