CN116225963A - Automatic generation method and system for Web application test cases based on reinforcement learning

Automatic generation method and system for Web application test cases based on reinforcement learning

Info

Publication number
CN116225963A
Authority
CN
China
Prior art keywords
state
action
current
elements
actions
Prior art date
Legal status
Pending
Application number
CN202310367115.3A
Other languages
Chinese (zh)
Inventor
裴求根
梁哲恒
龙震岳
沈桂泉
周纯
崔磊
张金波
曾纪钧
沈伍强
张小陆
李凯
周昉昉
吴国全
常晓宁
Current Assignee
Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd
Priority to CN202310367115.3A
Publication of CN116225963A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3668: Software testing
    • G06F11/3672: Test management
    • G06F11/3684: Test management for test design, e.g. generating new test cases
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9024: Graphs; Linked lists
    • G06F16/9027: Trees
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to a reinforcement-learning-based method and system for automatically generating Web application test cases. States are extracted from Web pages and output; interactable elements in a page are identified, and actions for accessing those elements are generated and output. The method identifies whether an extracted state is already known and constructs a state diagram whose nodes are states and whose edges are the actions that transfer between states. Rewards are calculated from the states and actions, and an action selection strategy is trained with a reinforcement learning algorithm. An action is selected according to this strategy and executed on the Web application, causing the application to jump to another page, and the sequence of actions executed on the application is saved as a test case. The invention can automatically provide test cases for a target Web application, greatly improves the coverage of existing test cases, and helps discover potential defects as early as possible. By automating test case generation, it raises the degree of automation of Web application testing and thereby improves the usability of Web applications.

Description

Automatic generation method and system for Web application test cases based on reinforcement learning
Technical Field
The invention relates to an automatic generation method and system for a Web application test case based on reinforcement learning, and belongs to the field of software testing.
Background
In recent years, the number of Web applications has grown dramatically. A recent survey shows that, as of July 2022, Web applications worldwide exceeded 10 billion, and users spend an average of 7 hours per day using them. Manual testing and automated testing are important means of ensuring Web application quality.
However, manual testing is time-consuming, and a Web application admits a very large number of possible action sequences, of which manually written test cases cover only a small portion. Tools such as Selenium and Playwright can generate clicks, inputs and other actions that simulate manual operation according to test scripts written by testers and interact with the target application, but they still require testers with expertise to write the test cases. The automated Web application testing tool and method of CN201710023922.8 further supports writing test cases in the T language, so that a tester only needs to understand the Web page test scenario and simple basic programming to write test scripts. Although the T language is simpler than Java and C, CN201710023922.8 still requires the tester to have development skills and scenario knowledge.
In summary, although the prior art simplifies the writing of test cases, manual intervention is still required, and it remains difficult to automatically generate valid action sequences.
Disclosure of Invention
The technical problem solved by the invention is as follows: a method and tool for automatically generating Web application test cases based on reinforcement learning are provided, which can train an effective action selection strategy from the executed action sequences to generate test cases. The invention supports automatic extraction of application states, selection of actions on those states, and training of the action selection strategy by reinforcement learning according to action execution results, so that test case generation no longer needs to be performed manually.
The technical solution of the invention is as follows:
In a first aspect, a method for automatically generating Web application test cases based on reinforcement learning is provided, comprising the following steps:
step 1: extracting states from a Web page: acquiring the node tree of the Web page and initializing a state with it; then traversing the state to delete redundant node-tree elements that have no influence on the visual presentation of the page, together with their associated edges, thereby reducing the number of elements in the state; the simplified node tree is called the rendering tree; the elements of the rendering tree are traversed breadth-first, and if one element is similar to another, the two are considered to have similar functions; elements with similar functions are identified as belonging to the same state, so that states are extracted from the Web page;
Step 2: identifying accessible elements in the extracted state and generating actions for accessing them: traversing the elements of the original node tree to identify actions; if an element meets a predefined condition, it is regarded as an accessible element and an action for accessing it is generated;
step 3: constructing a state diagram that takes states as nodes and the actions transferring between states as edges, reflecting the transfer relations between states, wherein the state diagram comprises a state set, an action set, the state transfers that actions can trigger, and an initial state; when the current state and the current action are added to the state diagram, it must be judged whether the current state already exists in the diagram, i.e., whether it is a known state, to avoid redundant states; for the current action, the action set of the state diagram is traversed to judge whether a known action identical to the current action already exists, to avoid redundant actions;
step 4: based on the states and actions in the state diagram, evaluating the contribution of each executed action to exploring the state space, designing a reward model, and calculating rewards for actions through the reward model; each time the current action is executed, the state transfers from the previous state to the current state, and a reward is calculated for the current action according to the reward model;
Step 5: training an action selection strategy by using a reinforcement learning algorithm according to the state of the step 1, the action output by the step 2 and the reward of the step 4, selecting the action according to the action selection strategy, and outputting the selected action;
step 6: executing the action output in the step 5 on the Web application, so that the Web application jumps to another webpage;
step 7: continuing to execute the steps 1-6, and storing the action sequence executed on the Web application as a test case.
Further, the specific implementation process of the step 1 is as follows:
(11) Obtaining the DOM tree of the Web page and initializing a state with it, i.e., initializing the node set and edge set of the state with the node set and edge set of the DOM tree;
(12) Traversing the state to delete redundant elements: if an element has one and only one child element, deleting that element from the node set of the state, deleting the edges associated with it from the edge set of the state, and adding an edge from the element's parent to the element's child to the edge set of the state;
(13) Performing a breadth-first traversal of the rendering tree: for one element, if another element is similar to it, the two are considered to carry the same business function and the one element is used to represent that function; the child elements of the two elements are then traversed further, searching among the children of the other element for elements similar to the children of the one element; otherwise, if no such similar element exists, the search for elements similar to the children of the one element stops.
Further, in step (13), judging that the other element is similar to the one element includes: the other element has a structure similar to that of the one element, or the other element has a style similar to that of the one element, wherein:
(131) The other element has a similar structure to the one element: whether two elements have similar structures is identified by a fuzzy match on their XPath expressions, i.e., similar elements are searched for by ignoring the indices of ancestor elements in the XPath;
(132) The other element has a similar style to the one element: the style similarity of the other element to the one element is defined as a weighted average of the edit distances of their class name, position, size, hyperlink, external file reference and identifier; if the style similarity between the two elements is greater than or equal to a preset threshold, the other element is considered to have a similar style to the one element.
Further, in step 2, the predefined condition is one of the following three conditions:
(21) If the tag name of an element matches the default setting, the element is considered operable and an action for accessing it is generated;
(22) If the tag name of an element is an input control or a multi-line plain-text editing control, and the type of the element is one of radio button and check box, the element is considered a clickable element and a click-type action for accessing it is generated;
(23) If the tag name and type of an element match the user configuration, the element is considered operable and an action for accessing it is generated.
Further, in step 3, when a state and an action are added to the state diagram, whether they already exist in the state diagram is determined as follows:
(31) Comparing the current state with the existing states one by one; if the current state is the same as an existing state, setting the current state to that existing state; otherwise, if the current state differs from all existing states, it is not yet in the state set of the state diagram and is added to it; then, adding a transition from the previous state to the current state to the state diagram;
(32) For the current state, each feasible action of the current state must be compared with the existing actions in the state diagram; if the current action is the same as an existing action, it is not added to the action set; otherwise, if the current action differs from the existing actions, it is added to the action set; for an action, the action set of the state diagram is traversed to check whether an existing action is the same as the current action, and two actions are considered the same if they are of the same type and the elements they access are similar.
Further, in step (31), when judging whether the current state is the same as an existing state, a state index tree is constructed and similarity is used for the judgment; if the similarity between the current state and an existing state in the state set is higher than a preset threshold, the current state is considered the same as that existing state;
the similarity judgment is specifically as follows:
given the current state, a state index tree is constructed, and whether an existing state in the state diagram is the same as the current state is judged by comparing the state with the state index tree;
the concrete implementation is as follows: when constructing the state index tree, all elements of the current state s are first labeled to indicate which state they belong to; then the state index tree formed by the existing states is acquired; if the state index tree has not yet been constructed, it is set to the current state and there is no state identical to the current state;
if the state index tree exists, it is traversed to count the number of similar elements between the current state and each existing state; at the same time, the elements of the current state are merged into the state index tree, and the number of similar elements between the current state and each existing state is obtained;
after the number of similar elements between the current state and an existing state is obtained, the similarity between them is calculated; the similarity between the current state s and an existing state s' is defined as:
similarity(s, s') = #similarEleNum / min(#s, #s')
wherein #similarEleNum is the number of similar elements between state s and the existing state s', #s and #s' respectively denote the number of elements of state s and of the existing state s', and min denotes the minimum;
the existing state with the maximum similarity to the current state is determined; if this maximum similarity is greater than a preset threshold, the current state is considered the same as that existing state; otherwise, the current state is not the same state as any existing state.
Further, in step 4, the reward model is as follows:
r_i = penalty, if the current state is extracted from an external Web page or is the same as the previous state; r_i = r'_i otherwise
where r_i is the calculated reward; if the current state is extracted from an external Web page, or the current state is the same as the previous state, the current action is given a negative reward, i.e., penalty < 0; in the other cases, a positive reward r'_i is calculated:
r'_i = w_loc * r_loc + w_cur * r_curiosity + w_freq * r_freq + w_explore * r_explore
wherein w_loc, w_cur, w_freq and w_explore are the weights of the respective reward indicators, each greater than 0;
r_loc is the reward indicator based on the position of the current action: the agent selects the action that accesses an element e_i, where e_{i-1} is an element adjacent to e_i; r_loc is a function of h(e) and w(e), the height and width of an element e, and of dist(e_{i-1}, e_i), the Levy distance between e_i and the adjacent element e_{i-1};
r_curiosity is the reward indicator based on the curiosity of the current action; elements that exist in the current state but not in the previous state are called changed elements, and r_curiosity is calculated as:
r_curiosity = 1/mutants(s_i), if the element accessed by the current action is a changed element; r_curiosity = 0 otherwise
that is, if the element accessed by the current action is a changed element, the current action is given a reward of 1/mutants(s_i), where mutants(s_i) denotes the number of changed elements in the current state s_i;
r_freq is the reward indicator based on the execution frequency of the action transition: the more times the transition from the previous state to the current state has been executed, the smaller the frequency-based reward indicator r_freq; r_freq is a decreasing function of N_i, where N_i denotes the number of times the transition from the previous state to the current state through the current action has been executed;
r_explore is the reward indicator based on the current state: after the current action is executed, the target application transfers to the current state; the larger the proportion of actions in the current state that have not yet been executed, the more the current action helps to test the target application, and the current action is given the reward indicator r_explore, calculated as:
r_explore = n_i / m_i
wherein m_i and n_i respectively denote the number of actions of the current state and the number of actions that have not yet been executed.
Further, in step 5, training the action selection strategy with the reinforcement learning algorithm is specifically implemented as follows:
given a Web application, a group of test cases is generated automatically: first the home page link of the target application is recorded, and the test case set, the strategy and the state set are initialized;
several episodes are executed; at the start of an episode, the target application is reset by revisiting its home page link, and the state and actions are then extracted from the home page; in each episode an action sequence is generated by executing several steps: in step i, a current action is selected in the previous state; after the current action is executed, the target application jumps to the current Web page, from which the current state and the feasible actions are extracted; the state set is updated with the extracted current state, and a reward is calculated for the current action; based on the previous state, the current action, the current state and the reward r_i of the current action, the strategy is updated and the current action is appended to the action sequence; the action sequence generated in each episode is saved into the test case set;
the strategy is updated through the update of a function Q as follows:
Q(s_{i-1}, act_i) ← Q(s_{i-1}, act_i) + α( r_i + γ·Q*(s_i, a_{i+1}) − Q(s_{i-1}, a_i) )
Q(s_{i-1}, act_i) represents the contribution of executing the current action act_i in the previous state s_{i-1} to testing the target application; when executing the current action act_i makes the previous state s_{i-1} transfer to the current state s_i, the current reward r_i is calculated and the function Q is updated; Q*(s_i, a_{i+1}) represents the maximum cumulative contribution obtainable starting from state s_i; the cumulative reward is discounted at a discount rate γ ∈ [0,1], and α ∈ [0,1] denotes the learning rate.
In a second aspect, the invention provides a system for automatically generating Web application test cases based on reinforcement learning, comprising a state extraction module, an action extraction module, a state diagram construction module, a reward model module and a reinforcement learning agent module, wherein:
the state extraction module extracts states from a Web page: it acquires the node tree of the Web page and initializes a state with it; it then traverses the state to delete redundant node-tree elements that have no influence on the visual presentation of the page, together with their associated edges, reducing the number of elements of the state; the simplified node tree is called the rendering tree, whose elements are traversed breadth-first; if one element is similar to another, the two are considered to have similar functions, and elements with similar functions are identified as belonging to the same state, so that states are extracted from the Web page;
the action extraction module identifies accessible elements in the extracted state and generates actions for accessing them: it traverses the elements of the original node tree to identify actions; if an element meets a predefined condition, it is regarded as an accessible element and an action for accessing it is generated;
the state diagram construction module constructs a state diagram that takes states as nodes and the actions transferring between states as edges, reflecting the transfer relations between states; the state diagram comprises a state set, an action set, the state transfers that actions can trigger, and an initial state; when the current state and the current action are added to the state diagram, the module judges whether the current state already exists in the diagram, i.e., whether it is a known state, to avoid redundant states; for the current action, it traverses the action set of the state diagram and judges whether a known action identical to the current action already exists, to avoid redundant actions;
the reward model module evaluates, based on the states and actions in the state diagram, the contribution of each executed action to exploring the state space, designs a reward model, and calculates rewards for actions through the reward model; each time the current action is executed, the state transfers from the previous state to the current state, and a reward is calculated for the current action according to the reward model;
the reinforcement learning agent module trains an action selection strategy with a reinforcement learning algorithm according to the states, actions and rewards, selects an action according to the strategy, outputs the selected action, and executes it on the Web application so that the Web application jumps to another page; the state extraction module, action extraction module, state diagram construction module and reward model module are then executed continuously, and the sequence of actions executed on the Web application is saved as a test case.
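By way of illustration, a minimal Python sketch of the five-module decomposition and its control flow is given below. All class names, method names and the browser handle (goto, page, execute) are assumptions introduced for the example; they are not prescribed by the invention.

class StateExtractor:
    def extract(self, page): ...          # returns the simplified state (rendering tree)

class ActionExtractor:
    def extract(self, state): ...         # returns actions for the accessible elements

class StateGraphBuilder:
    def add(self, prev_state, action, cur_state): ...   # de-duplicating insertion

class RewardModel:
    def reward(self, prev_state, action, cur_state): ...

class RLAgent:
    def select_action(self, state, actions): ...
    def update(self, prev_state, action, reward, cur_state): ...

def generate_test_cases(browser, home_url, episodes=10, steps=20):
    """Illustrative wiring of the modules; 'browser' is any page-automation handle."""
    extractor, actor = StateExtractor(), ActionExtractor()
    graph, rewards, agent = StateGraphBuilder(), RewardModel(), RLAgent()
    test_cases = []
    for _ in range(episodes):
        browser.goto(home_url)                       # reset the target application
        state = extractor.extract(browser.page())
        sequence = []
        for _ in range(steps):
            action = agent.select_action(state, actor.extract(state))
            browser.execute(action)                  # the application jumps to another page
            new_state = extractor.extract(browser.page())
            graph.add(state, action, new_state)
            r = rewards.reward(state, action, new_state)
            agent.update(state, action, r, new_state)
            sequence.append(action)
            state = new_state
        test_cases.append(sequence)                  # the action sequence is a test case
    return test_cases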
In a third aspect, the present invention provides an electronic device (computer, server, smart phone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method for reinforcement learning based automatic generation of a Web application test case of the present invention.
In a fourth aspect, the present invention provides a computer readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, implements the steps of a reinforcement learning-based automatic generation method for a Web application test case of the present invention.
Compared with the prior art, the invention has the advantages that:
(1) The invention can automatically provide test cases for a target Web application, greatly improves the coverage of existing test cases, and helps discover potential defects as early as possible. During test case generation, page information is collected and states and actions are extracted to construct the state space, and a reinforcement learning algorithm is adopted to explore it, which raises the degree of automation of Web application testing and thereby improves the usability of Web applications.
(2) Existing random-based methods explore the state space at random and can hardly generate effective action sequences. In model-based approaches, the model can provide information for generating valid action sequences; however, existing model-based methods find it difficult to build a complete model of the target application, so they can only generate a limited set of action sequences. To generate test cases that cover multiple states, the invention mainly involves two aspects: states and actions. Each time an action is executed, the state and actions are extracted from the Web page. To identify states effectively, the invention exploits the fact that similar elements represent the same function: it merges elements with similar functions and identifies Web pages with similar functions as the same state; the reinforcement learning agent then selects a suitable action and executes it on the target application, and the reward of the selected action is updated according to the execution result. To improve the efficiency of state exploration, an innovative reward model is designed to guide the interaction with the Web application in a manner similar to a human tester, and the action selection strategy is trained by reinforcement learning, so that test case generation no longer needs to be performed manually; the coverage of existing test cases is greatly improved, and potential defects are discovered as early as possible. During test case generation, the degree of automation of Web application testing is raised, which further improves the usability of Web applications.
Drawings
FIG. 1 is a flow chart of the implementation of the method of the present invention
FIG. 2 is a block diagram of the components of the tool of the present invention;
FIG. 3 is a flow chart of extracting a state from a web page according to the present invention;
FIG. 4 is a flow chart of identifying accessible elements in the extracted state of the present invention.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1 and 2, the implementation of the method of the present invention is specifically as follows:
step 1: as shown in fig. 3, states are extracted from the Web page: the node tree of the Web page is acquired (the embodiment of the invention adopts the DOM tree) and a state is initialized with it; the state is then traversed to delete redundant DOM-tree elements that have no influence on the visual presentation of the page, together with their associated edges, reducing the number of elements in the state; the simplified DOM tree is called the rendering tree; the elements of the rendering tree are traversed breadth-first, and for one element e, if another element e' is similar to e, then e and e' are considered elements with similar functions; elements with similar functions are merged and identified as belonging to the same state, so that the state s is extracted from the Web page;
The specific implementation method is as follows:
(11) Obtaining the DOM tree (node tree) of the Web page and initializing a state with it, i.e., initializing the node set and edge set of state s with the node set and edge set of the DOM tree;
(12) Traversing the state to delete redundant elements: if an element has one and only one child element, deleting that element e from the node set of the state, deleting the edges associated with it from the edge set of the state, and adding an edge from the parent of the element to the child of the element to the edge set of state s (see the sketch after this list);
(13) Performing a breadth-first traversal of the rendering tree: for one element e, if another element e' is similar to e, then e and e' are considered to carry the same business function and e is used to represent that function; the child elements of e and e' are then traversed further, searching among the children of e' for elements similar to the children of e; otherwise, if no such similar element exists, the search for elements similar to the children of e stops.
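A minimal Python sketch of steps (11)-(12) follows; the Node structure and its field names are assumptions introduced for the example (parent pointers are expected to be set when the tree is built).

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

def simplify(node: Node) -> Node:
    """Delete redundant elements: a node with exactly one child is removed and
    replaced by an edge from its parent to its only child (step (12))."""
    for child in list(node.children):
        simplify(child)
    if node.parent is not None and len(node.children) == 1:
        only_child = node.children[0]
        idx = node.parent.children.index(node)
        node.parent.children[idx] = only_child   # edge: parent -> grandchild
        only_child.parent = node.parent          # the redundant node is dropped
    return node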
In step (13), judging that the other element e' is similar to the one element e includes: e' has a structure similar to that of e, or e' has a style similar to that of e, wherein:
(131) The other element e' has a similar structure to e: whether two elements have similar structures is identified by a fuzzy match on their XPath expressions, i.e., similar elements are searched for by ignoring the indices of ancestor elements in the XPath;
(132) The other element e' has a similar style to e: the style similarity of e' to e is defined as a weighted average of the edit distances of their class name (className), position, size, hyperlink (href), external file reference (src) and identifier (id); if the style similarity between e' and e is greater than or equal to a preset threshold, e' is considered to have a similar style to e.
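A minimal Python sketch of the style-similarity check in (132) follows; elements are represented as attribute dictionaries, and the equal weights and the example threshold are assumptions, since the invention only specifies a weighted average of edit distances over these attributes.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[len(b)]

def attr_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest

def style_similarity(e1: dict, e2: dict, weights=None) -> float:
    """Weighted average of attribute similarities over className, position, size, href, src, id."""
    attrs = ["className", "position", "size", "href", "src", "id"]
    weights = weights or {k: 1.0 for k in attrs}     # equal weights assumed
    total = sum(weights.values())
    return sum(weights[k] * attr_similarity(str(e1.get(k, "")), str(e2.get(k, "")))
               for k in attrs) / total

# e and e' are considered similar in style when style_similarity(e, e') >= threshold (e.g. 0.8).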
Step 2: as shown in fig. 4, accessible elements are identified in the extracted state and actions for accessing them are generated: the elements of the original DOM tree are traversed to identify actions; if an element e meets a predefined condition, e is regarded as an accessible element and an action act for accessing it is generated;
the predefined condition is one of the following three:
(21) If the tag name of an element e matches the default setting, e is considered operable and an action for accessing e is generated;
(22) If the tag name of an element e is the input control <input> or the multi-line plain-text editing control <textarea>, and the type of e is one of radio and checkbox, then e is considered a clickable element and a click-type action for accessing e is generated;
(23) If the tag name and type of an element e match the user configuration, e is considered operable and an action for accessing e is generated.
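A minimal Python sketch of the three conditions (21)-(23) follows; DEFAULT_TAGS, the element and user_config dictionary formats, and the use of a click-type action for conditions (21) and (23) are assumptions.

DEFAULT_TAGS = {"a", "button", "select"}             # assumed default setting

def generate_actions(elements, user_config=None):
    """Return one action per element that meets one of the predefined conditions."""
    user_config = user_config or []                  # e.g. [{"tag": "div", "type": "menu"}]
    actions = []
    for e in elements:                               # e: {"tag": ..., "type": ..., "xpath": ...}
        tag, typ = e.get("tag", "").lower(), e.get("type", "").lower()
        matches_default = tag in DEFAULT_TAGS                                         # condition (21)
        is_clickable = tag in {"input", "textarea"} and typ in {"radio", "checkbox"}  # condition (22)
        matches_config = any(c.get("tag") == tag and c.get("type") == typ
                             for c in user_config)                                    # condition (23)
        if matches_default or is_clickable or matches_config:
            actions.append({"type": "click", "target": e.get("xpath")})
    return actions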
Step 3: a state diagram is constructed that takes states as nodes and the actions transferring between states as edges, reflecting the transfer relations between states; the state diagram comprises a state set, an action set, the state transfers that actions can trigger, and an initial state, i.e., M = (S, A, Σ, s_0), where S is the state set, A is the action set, Σ: S × A → S is the set of state transfers that actions can trigger, and s_0 is the initial state. When a state and an action are added to the state diagram, it is judged whether the state s already exists in the diagram, i.e., whether it is a known state, to avoid the state-space explosion caused by redundant states; for an action act, the action set A of the state diagram is traversed to judge whether a known action act' in A is the same as act, avoiding redundant actions in the state diagram and thus explosion of the action space;
when a state and an action are added to the state diagram, whether they already exist in the diagram is determined as follows:
(31) The current state s_i is compared with the existing states s' one by one; if s_i is the same as an existing state s', s_i is set to that existing state s'; otherwise, if s_i differs from all existing states, s_i is not yet in the state set of the state diagram M and is added to it; then, a transfer from the previous state to the current state s_i is added to the state diagram;
(32) For the current state s_i, each feasible action of s_i must also be compared with the existing actions in the state diagram; if the current action act_i is the same as an existing action act', act_i is not added to the action set; otherwise, if act_i differs from the existing actions, act_i is added to the action set A. For the current action act_i, the action set A of the state diagram is traversed to check whether an existing action act' ∈ A is the same as act_i; two actions are considered the same if they are of the same type and the elements they access are similar.
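A minimal Python sketch of the state diagram M = (S, A, Σ, s0) with the de-duplicating insertion of (31)-(32) follows; the same_state and same_action predicates stand for the similarity checks described below and are passed in, since their concrete form is given separately.

class StateGraph:
    def __init__(self, initial_state):
        self.states = [initial_state]        # S
        self.actions = []                    # A
        self.transitions = {}                # Σ: (state index, action index) -> state index
        self.initial = initial_state         # s0

    def add_state(self, state, same_state):
        for i, known in enumerate(self.states):
            if same_state(state, known):     # known state: reuse it, avoid redundancy
                return i
        self.states.append(state)
        return len(self.states) - 1

    def add_action(self, action, same_action):
        for i, known in enumerate(self.actions):
            if same_action(action, known):   # same type and similar accessed element
                return i
        self.actions.append(action)
        return len(self.actions) - 1

    def add_transition(self, prev_idx, act_idx, cur_idx):
        self.transitions[(prev_idx, act_idx)] = cur_idx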
In step (31), when judging whether the current state s_i is the same as an existing state s', a state index tree is constructed and similarity is used for the judgment; if the similarity between the current state s_i and an existing state s' in the state set is higher than a preset threshold, s_i is considered the same as s';
the similarity judgment is specifically as follows:
given the current state s_i, a state index tree is constructed, and whether an existing state s' in the state diagram M is the same as s_i is judged by comparing the state with the state index tree;
the concrete implementation is as follows: when constructing the state index tree, all elements of the current state s_i are first labeled to indicate which state they belong to; then the state index tree formed by the existing states s' is acquired; if the state index tree has not yet been constructed, it is set to the current state s_i and there is no state identical to s_i;
if the state index tree exists, it is traversed to count the number of similar elements between the current state s_i and each existing state s'; at the same time, the elements of s_i are merged into the state index tree, and the number of similar elements between s_i and each existing state s' is obtained;
the number of similar elements between state s and the existing states s' is obtained as follows:
(1) Initialize simCnt to count the number of similar elements between state s and each existing state;
(2) Obtain the root element sRoot of state s and the root element iRoot of the state index tree stateIndexTree, and obtain the child elements sChilds of sRoot;
(3) Merge each child element sc ∈ sChilds of sRoot, together with its descendant elements, into the state index subtree rooted at iRoot through the function mgAndCnt(), and count the number of similar elements between the descendants of sc and the descendants of iRoot;
(4) The function mgAndCnt() merges an element element into the state index subtree rooted at iRoot, and counts the number of similar elements between the subtree rooted at element (i.e., the sub-state) and the state index subtree rooted at iRoot;
(5) It first obtains the child elements iChilds of iRoot and finds, among them, the element simEle with the greatest element similarity sim to element; if sim is greater than the preset threshold eleSimThreshold, simEle and element are considered similar;
(6) The label on simEle indicates which existing state (stateId) simEle belongs to; therefore simCnt[stateId] is increased by 1; furthermore, the child elements of element are merged into the state index subtree rooted at simEle, and the number simCnt_c of similar elements between the descendants of element and the state index subtree rooted at simEle is counted; otherwise, if sim is smaller than the preset threshold eleSimThreshold, simEle and element are considered dissimilar;
(7) The element element and the edge (iRoot, element) are added to the state index tree through the function addSubTree();
finally, the number of similar elements between the subtree rooted at element and each existing state is returned.
After the number of similar elements between state s and the existing state s' is obtained, the similarity between s and s' is calculated; the similarity between state s and an existing state s' is defined as:
similarity(s, s') = #similarEleNum / min(#s, #s')
wherein #similarEleNum is the number of similar elements between state s and the existing state s', and #s and #s' respectively denote the number of elements of state s and of the existing state s';
the existing state s' with the maximum similarity to state s is determined; if this maximum similarity is greater than a preset threshold, state s is considered the same as s'; otherwise, state s is not the same state as the existing state s'.
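A minimal Python sketch of the state-matching test follows; similar_element_count abstracts the state-index-tree counting described above, each state is assumed to expose its element list, and the 0.8 threshold is an assumption.

def state_similarity(num_similar: int, size_s: int, size_s2: int) -> float:
    """similarity(s, s') = #similarEleNum / min(#s, #s')."""
    return num_similar / min(size_s, size_s2)

def match_state(state, existing_states, similar_element_count, threshold=0.8):
    """Return the most similar existing state if its similarity exceeds the threshold, else None."""
    best, best_sim = None, 0.0
    for known in existing_states:
        sim = state_similarity(similar_element_count(state, known),
                               len(state.elements), len(known.elements))
        if sim > best_sim:
            best, best_sim = known, sim
    return best if best_sim > threshold else None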
Step 4: based on the states and actions in the state diagram, the contribution of each executed action to exploring the state space is evaluated, a reward model is designed, and rewards are calculated for actions through the reward model; whenever the current action act_i has been executed and the state transfers from the previous state s_{i-1} to the current state s_i, a reward is calculated for act_i according to the reward model;
the reward model r_i is defined as:
r_i = penalty, if the target application jumps to an external link after act_i is executed or the current state s_i is the same as the previous state; r_i = r'_i(act_i) otherwise
that is, if the target application jumps to an external link after the current action act_i is executed, or the current state s_i is the same as the previous state, the current action act_i is given a negative reward, i.e., penalty < 0;
otherwise, a positive reward r'_i(act_i) is calculated:
r'_i = w_loc * r_loc + w_cur * r_curiosity + w_freq * r_freq + w_explore * r_explore
wherein w_loc, w_cur, w_freq and w_explore are the weights of the respective reward indicators, each greater than 0;
r_loc is the reward indicator based on the position of the current action act_i: let the elements operated by the previous action act_{i-1} and the current action act_i be e_{i-1} and e_i respectively; the agent selects the action that accesses element e_i, where e_{i-1} is an element adjacent to e_i; r_loc is a function of h(e) and w(e), the height and width of an element e, and of dist(e_{i-1}, e_i), the Levy distance between the two adjacent elements e_{i-1} and e_i;
r_curiosity is the reward indicator based on the curiosity of the current action act_i; elements that exist in the current state s_i but not in the previous state s_{i-1} are called changed elements, and r_curiosity is calculated as:
r_curiosity = 1/mutants(s_i), if the element accessed by the current action act_i is a changed element; r_curiosity = 0 otherwise
that is, if the element accessed by act_i is a changed element, act_i is given a reward of 1/mutants(s_i), where mutants(s_i) denotes the number of changed elements in the current state s_i;
r_freq is the reward indicator based on the execution frequency of the action transition: the more times the transition from the previous state s_{i-1} through the current action act_i to the current state s_i has been executed, the smaller the frequency-based reward indicator r_freq; r_freq is a decreasing function of N_i, where N_i denotes the number of times the transition from s_{i-1} through act_i to s_i has been executed;
r_explore is the reward indicator based on the current state s_i: after the current action act_i is executed, the target application transfers to s_i; the larger the proportion of actions in s_i that have not yet been executed, the more act_i helps to test the target application, and act_i is given the reward indicator r_explore, calculated as:
r_explore = n_i / m_i
wherein m_i and n_i respectively denote the number of actions of the current state s_i and the number of actions that have not yet been executed.
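A minimal Python sketch of the composite reward of step 4 follows; the weight values, the penalty value, and the simple stand-ins used for r_loc and r_freq are assumptions that only respect the behaviour stated above (r_loc is supplied by the caller, r_freq decreases as N_i grows).

def compute_reward(is_external, same_as_previous, accessed_is_changed, num_changed,
                   n_transition_execs, num_actions, num_unexecuted, r_loc,
                   weights=None, penalty=-1.0):
    """Reward r_i for the current action act_i."""
    if is_external or same_as_previous:
        return penalty                                                      # negative reward
    w = weights or {"loc": 1.0, "cur": 1.0, "freq": 1.0, "explore": 1.0}    # assumed weights
    r_curiosity = 1.0 / num_changed if (accessed_is_changed and num_changed) else 0.0
    r_freq = 1.0 / n_transition_execs if n_transition_execs else 1.0        # assumed stand-in
    r_explore = num_unexecuted / num_actions if num_actions else 0.0        # n_i / m_i
    return (w["loc"] * r_loc + w["cur"] * r_curiosity +
            w["freq"] * r_freq + w["explore"] * r_explore)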
Step 5: an action selection strategy is trained with a reinforcement learning algorithm according to the states of step 1, the actions output in step 2 and the rewards output in step 4; an action is selected according to the action selection strategy and output;
training the action selection strategy with the reinforcement learning algorithm is specifically implemented as follows:
given a Web application, a group of test cases T is generated automatically: first the home page link url of the target application is recorded, and the test case set T, the strategy π and the state set S are initialized;
N episodes are executed; at the start of an episode, the target application is reset by revisiting its home page link url to reach the home page p_0, and the state and actions are then extracted from p_0. In each episode an action sequence as is generated by executing several steps: in the i-th step, the current action act_i is selected in the previous state s_{i-1}; after act_i is executed, the target application jumps to the current Web page p_i, from which the current state s_i and the feasible actions are extracted; the state set S is updated with the extracted current state s_i, and the reward r_i of the current action act_i (one of the feasible actions) is calculated; based on the previous state s_{i-1}, the current action act_i, the current state s_i and the reward r_i, the strategy π is updated and act_i is appended to the action sequence as; the action sequence as generated in each episode is saved into the test case set T;
the strategy π is updated through the update of a function Q as follows:
Q(s_{i-1}, act_i) ← Q(s_{i-1}, act_i) + α( r_i + γ·Q*(s_i, a_{i+1}) − Q(s_{i-1}, a_i) )
Q(s_{i-1}, act_i) represents the contribution of executing the current action act_i in the previous state s_{i-1} to testing the target application; when executing act_i makes the previous state s_{i-1} transfer to the current state s_i, the current reward r_i is calculated and the function Q is updated; Q*(s_i, a_{i+1}) represents the maximum cumulative contribution obtainable starting from state s_i; the cumulative reward is discounted at a discount rate γ ∈ [0,1], and α ∈ [0,1] denotes the learning rate.
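A minimal tabular Q-learning sketch of the strategy update above follows; the ε-greedy selection and the default values of α, γ and ε are assumptions, since the invention only fixes the form of the update.

import random
from collections import defaultdict

class QAgent:
    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)                  # Q[(state id, action id)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select(self, state_id, action_ids):
        """Epsilon-greedy selection over the feasible actions of a state."""
        if random.random() < self.epsilon:
            return random.choice(list(action_ids))
        return max(action_ids, key=lambda a: self.q[(state_id, a)])

    def update(self, prev_state, action, reward, cur_state, next_actions):
        """Q(s_{i-1}, act_i) += alpha * (r_i + gamma * max_a Q(s_i, a) - Q(s_{i-1}, act_i))."""
        best_next = max((self.q[(cur_state, a)] for a in next_actions), default=0.0)
        self.q[(prev_state, action)] += self.alpha * (
            reward + self.gamma * best_next - self.q[(prev_state, action)])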
Step 6: executing the action output in the step 5 on the Web application, so that the Web application jumps to another webpage;
step 7: continuing to execute the steps 1-6, and storing the action sequence executed on the Web application as a test case.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smart phone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of the invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program which, when executed by a computer, implements the steps of the inventive method.
The above examples are provided for the purpose of describing the present invention only and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalents and modifications that do not depart from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A method for automatically generating Web application test cases based on reinforcement learning, characterized by comprising the following steps:
step 1: extracting states from a Web page: acquiring the node tree of the Web page and initializing a state with it; then traversing the state to delete redundant node-tree elements that have no influence on the visual presentation of the page, together with their associated edges, thereby reducing the number of elements in the state; the simplified node tree is called the rendering tree; the elements of the rendering tree are traversed breadth-first, and if one element is similar to another, the two are considered to have similar functions; elements with similar functions are identified as belonging to the same state, so that states are extracted from the Web page;
step 2: identifying accessible elements in the extracted state and generating actions for accessing them: traversing the elements of the original node tree to identify actions; if an element meets a predefined condition, it is regarded as an accessible element and an action for accessing it is generated;
Step 3: constructing a state diagram that takes states as nodes and the actions transferring between states as edges, reflecting the transfer relations between states, wherein the state diagram comprises a state set, an action set, the state transfers that actions can trigger, and an initial state; when the current state and the current action are added to the state diagram, judging whether the current state already exists in the diagram, i.e., whether it is a known state, to avoid redundant states; for the current action, traversing the action set of the state diagram to judge whether a known action identical to the current action already exists, to avoid redundant actions;
step 4: based on the states and actions in the state diagram, evaluating the contribution of each executed action to exploring the states, designing a reward model, and calculating rewards for actions through the reward model; each time the action is executed, the state transfers from the previous state to the current state, and a reward is calculated for the current action according to the reward model;
step 5: training an action selection strategy by using a reinforcement learning algorithm according to the state of the step 1, the action output by the step 2 and the reward of the step 4, selecting the action according to the action selection strategy, and outputting the selected action;
Step 6: executing the action output in the step 5 on the Web application, so that the Web application jumps to another webpage;
step 7: continuing to execute the steps 1-6, and storing the action sequence executed on the Web application as a test case.
2. The method for automatically generating Web application test cases based on reinforcement learning according to claim 1, wherein the specific implementation process of step 1 is as follows:
(11) Obtaining the DOM tree of the Web page and initializing a state with it, i.e., initializing the node set and edge set of the state with the node set and edge set of the DOM tree;
(12) Traversing the state to delete redundant elements: if an element has one and only one child element, deleting that element from the node set of the state, deleting the edges associated with it from the edge set of the state, and adding an edge from the element's parent to the element's child to the edge set of the state;
(13) Performing a breadth-first traversal of the rendering tree: for one element, if another element is similar to it, the two are considered to carry the same business function and the one element is used to represent that function; the child elements of the two elements are then traversed further, searching among the children of the other element for elements similar to the children of the one element; otherwise, if no such similar element exists, the search for elements similar to the children of the one element stops.
3. The method for automatically generating Web application test cases based on reinforcement learning according to claim 2, wherein, in step (13), judging that the other element is similar to the one element includes: the other element has a structure similar to that of the one element, or the other element has a style similar to that of the one element, wherein:
(131) The other element has a similar structure to the one element: whether two elements have similar structures is identified by a fuzzy match on their XPath expressions, searching for similar elements by ignoring the indices of ancestor elements in the XPath;
(132) The other element has a similar style to the one element: the style similarity of the other element to the one element is defined as a weighted average of the edit distances of their class name, position, size, hyperlink, external file reference and identifier; if the style similarity between the two elements is greater than or equal to a preset threshold, the other element is considered to have a similar style to the one element.
4. The method for automatically generating Web application test cases based on reinforcement learning according to claim 1, wherein, in step 2, the predefined condition is one of the following three conditions:
(21) If the tag name of an element matches the default setting, the element is considered operable and an action for accessing it is generated;
(22) If the tag name of an element is an input control or a multi-line plain-text editing control, and the type of the element is one of radio button and check box, the element is considered a clickable element and a click-type action for accessing it is generated;
(23) If the tag name and type of an element match the user configuration, the element is considered operable and an action for accessing it is generated.
5. The method for automatically generating Web application test cases based on reinforcement learning according to claim 1, wherein, in step 3, when a state and an action are added to the state diagram, whether they already exist in the state diagram is determined as follows:
(31) Comparing the current state with the existing states one by one; if the current state is the same as an existing state, setting the current state to that existing state; otherwise, if the current state differs from all existing states, it is not yet in the state set of the state diagram and is added to it; then, adding a transition from the previous state to the current state to the state diagram;
(32) For the current state, each feasible action of the current state must be compared with the existing actions in the state diagram; if the current action is the same as an existing action, it is not added to the action set; otherwise, if the current action differs from the existing actions, it is added to the action set; for an action, the action set of the state diagram is traversed to check whether an existing action is the same as the current action, and two actions are considered the same if they are of the same type and the elements they access are similar.
6. The reinforcement learning-based automatic generation method of Web application test cases according to claim 5, wherein: in the step (1), if the current state is the same as the existing state, constructing a state index tree, judging by adopting similarity, and if the similarity between the current state and the existing state in the state set is higher than a preset threshold, considering that the current state is the same as the existing state;
the similarity judgment is performed as follows:
given a state, a state index tree is constructed, and the state is compared with the state index tree to judge, via similarity, whether it is the same as a state already in the state diagram;
concretely: when the state index tree is constructed, all elements in the state are first labeled to indicate which state they belong to; then the state index tree formed by the existing states is obtained; if the state index tree has not yet been constructed, it is set to the current state, and no state is the same as the current state;
if the state index tree exists, it is traversed and the number of similar elements between the state and each existing state is calculated; at the same time, the elements of the state are merged into the state index tree;
after the number of similar elements between the state and each existing state is obtained, the similarity between the state and each existing state is calculated, where the similarity between state s and an existing state s' is defined as:
similarity(s, s') = #similar_num / min(#s, #s')
where #similar_num is the number of similar elements between state s and existing state s', #s and #s' represent the number of elements of state s and existing state s', respectively, and min denotes the minimum;
the existing state with the maximum similarity to the current state is determined; if this maximum similarity is larger than the preset threshold, the current state is considered the same as that existing state; otherwise, the current state is not the same as any existing state.
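For illustration, a small Python sketch of this similarity test; representing each state as a list of element labels and using a set intersection in place of the state index tree are simplifications, and the 0.8 threshold is an assumed value.

```python
def state_similarity(similar_num: int, size_s: int, size_s2: int) -> float:
    """similarity(s, s') = #similar_num / min(#s, #s')"""
    return similar_num / min(size_s, size_s2)

def find_same_state(current_elements, existing_states, threshold=0.8):
    """Return the id of the existing state judged identical to the current
    state, or None if no similarity exceeds the threshold."""
    best_id, best_sim = None, 0.0
    current_set = set(current_elements)
    for state_id, elements in existing_states.items():
        similar_num = len(current_set & set(elements))
        sim = state_similarity(similar_num, len(current_set), len(set(elements)))
        if sim > best_sim:
            best_id, best_sim = state_id, sim
    return best_id if best_sim > threshold else None

# Usage sketch: two states sharing most elements are treated as the same state.
existing = {"s1": ["login_btn", "user_input", "pwd_input"]}
print(find_same_state(["login_btn", "user_input", "pwd_input", "banner"], existing))
```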
7. The reinforcement learning-based automatic generation method of Web application test cases according to claim 1, wherein: in step 4, the reward model is as follows:
r_i = { negative penalty (< 0), if the current state is extracted from an external web page or is the same as the previous state; r'_i, otherwise }
r_i is the calculated reward: if the current state is extracted from an external web page, or the current state is the same as the previous state, a negative reward (i.e. a penalty, < 0) is given to the current action; in all other cases, a positive reward r'_i is calculated:
r'_i = w_loc * r_loc + w_cur * r_curiosity + w_freq * r_freq + w_explore * r_explore
where w_loc, w_cur, w_freq and w_explore are the weights of the corresponding reward indices, each larger than 0;
r_loc is the reward index based on the position of the current action: the agent has selected the action that accesses an element e_i, where e_{i-1} is the element adjacent to e_i; r_loc is computed as:
(formula image FDA0004167247040000041: r_loc is defined in terms of the element dimensions h(e), w(e) and the distance dist(e_{i-1}, e_i))
where h(e) and w(e) represent the height and width of element e, respectively, and dist(e_{i-1}, e_i) represents the Lewy distance between element e_i and its adjacent element e_{i-1};
r_curiosity is the reward index based on the curiosity of the current action; an element that exists in the current state but did not exist in the previous state is referred to as a change element; r_curiosity is calculated as:
r_curiosity = { 1 / mutants(s_i), if the element accessed by the current action is a change element; 0, otherwise }
that is, if the element accessed by the current action is a change element, the current action is given a reward value of 1/mutants(s_i), where mutants(s_i) denotes the number of change elements in the current state s_i;
r_freq is the reward index based on the execution frequency of the action transition: the more times the action transition from the previous state to the current state has been executed, the smaller the frequency-based reward index r_freq becomes; r_freq is defined as:
r_freq = 1 / N_i
where N_i denotes the number of times the transition from the previous state to the current state through the current action has been executed;
r_explore is the reward index based on the current state: after the current action is executed, the target application transitions to the current state; the larger the proportion of actions in the current state that have not yet been executed, the more the current action helps to test the target application, and the current action is given the reward index r_explore, calculated as:
r_explore = n_i / m_i
where m_i and n_i denote the number of actions of the current state and the number of actions that have not yet been executed, respectively.
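A hedged Python sketch of the reward model above: the penalty of -1.0, the unit weights and the 1/N_i form of r_freq are assumptions where the claim only fixes the sign or the monotonic behaviour.

```python
def curiosity_reward(accessed_change_element: bool, num_change_elements: int) -> float:
    """r_curiosity = 1 / mutants(s_i) when the accessed element is a change element."""
    if accessed_change_element and num_change_elements > 0:
        return 1.0 / num_change_elements
    return 0.0

def frequency_reward(n_executions: int) -> float:
    """r_freq shrinks as the transition is executed more often (assumed 1/N_i form)."""
    return 1.0 / max(n_executions, 1)

def exploration_reward(total_actions: int, unexecuted_actions: int) -> float:
    """r_explore = n_i / m_i: share of not-yet-executed actions in the current state."""
    return unexecuted_actions / total_actions if total_actions else 0.0

def reward(is_external: bool, same_as_previous: bool,
           r_loc: float, r_curiosity: float, r_freq: float, r_explore: float,
           weights=(1.0, 1.0, 1.0, 1.0), penalty=-1.0) -> float:
    """r_i: penalty for external pages or unchanged states, weighted sum otherwise."""
    if is_external or same_as_previous:
        return penalty
    w_loc, w_cur, w_freq, w_explore = weights
    return (w_loc * r_loc + w_cur * r_curiosity +
            w_freq * r_freq + w_explore * r_explore)
```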
8. The reinforcement learning-based automatic generation method of Web application test cases according to claim 1, wherein: in step 5, an action selection strategy is trained using a reinforcement learning algorithm, implemented as follows:
given a Web application, a set of test cases is generated automatically: the homepage link of the target application is first recorded, and the test case set, the strategy and the state set are initialized;
several episodes are executed; at the start of each episode, the target application is reset by revisiting its homepage link, and a state and actions are extracted from the homepage; in each episode, an action sequence is generated by performing several steps: in step i, the current action is selected in the previous state; after the current action is executed, the target application jumps to the current web page, from which the current state and the feasible actions are extracted; the state set is updated with the extracted current state, and a reward is calculated for the current action; based on the previous state, the current action, the current state and the reward r_i of the current action, the strategy is updated and the current action is added to the action sequence; the action sequence generated in each episode is stored in the test case set;
the strategy is updated through an update function Q, as follows:
Q(s_{i-1}, act_i) ← Q(s_{i-1}, act_i) + α ( r_i + γ · Q*(s_i, act_{i+1}) − Q(s_{i-1}, act_i) )
Q(s_{i-1}, act_i) represents the contribution of executing the current action act_i in the previous state s_{i-1} to testing the target application; when executing the current action act_i causes the previous state s_{i-1} to transition to the current state s_i, the reward r_i is calculated and the function Q is updated; Q*(s_i, act_{i+1}) represents the maximum cumulative contribution obtainable starting from state s_i; the cumulative reward is discounted at a discount rate γ ∈ [0, 1], and α ∈ [0, 1] denotes the learning rate.
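A tabular Q-learning sketch of the update above, interpreting Q*(s_i, act_{i+1}) as the maximum Q value over the feasible actions of the current state; states and actions are assumed hashable, and the alpha, gamma and epsilon values as well as the epsilon-greedy selection are illustrative assumptions.

```python
import random
from collections import defaultdict

class QAgent:
    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> estimated contribution
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def update(self, prev_state, action, reward, curr_state, feasible_actions):
        # Q(s_{i-1}, act_i) <- Q(s_{i-1}, act_i)
        #   + alpha * (r_i + gamma * max_a Q(s_i, a) - Q(s_{i-1}, act_i))
        best_next = max((self.q[(curr_state, a)] for a in feasible_actions),
                        default=0.0)
        key = (prev_state, action)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])

    def select_action(self, state, feasible_actions):
        # epsilon-greedy selection over the feasible actions of the state
        if random.random() < self.epsilon:
            return random.choice(feasible_actions)
        return max(feasible_actions, key=lambda a: self.q[(state, a)])
```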
9. An automatic generation system for Web application test cases based on reinforcement learning, characterized by comprising: a state extraction module, an action extraction module, a state diagram construction module, a reward model module and a reinforcement learning agent module; wherein:
the state extraction module extracts a state from a web page: the node tree of the web page is obtained and used to initialize the state; the state is then traversed to delete redundant node-tree elements that have no influence on the page visualization, together with their related edges, so as to reduce the number of elements in the state; the simplified node tree is called a rendering tree; the elements on the rendering tree are traversed breadth-first, and if two elements are similar they are considered to have similar functions; elements with similar functions are identified as belonging to the same state, and states are thereby extracted from the web page;
the action extraction module identifies accessible elements in the extracted state and generates actions for them: the elements on the original node tree are traversed, and if an element satisfies a predefined condition, it is regarded as an accessible element and an action for accessing it is generated;
the state diagram construction module constructs a state diagram that takes states as nodes and the actions triggering transitions between states as edges, reflecting the transition relations between states; the state diagram comprises a state set, the state transitions that actions can trigger, and an initial state; when the current state and the current action are added to the state diagram, it is judged whether the current state already exists in the state diagram, i.e. whether it is a known state, so as to avoid redundant states; for the current action, the action set of the state diagram is traversed to judge whether a known action in the action set is the same as the current action, so as to avoid redundant actions;
the reward model module evaluates, based on the states and actions in the state diagram, the contribution of the action executed at each step to testing the target application; a reward model is designed, and rewards are calculated for actions through it; each time the current action is executed and the state transitions from the previous state to the current state, a reward is calculated for the current action according to the reward model;
the reinforcement learning agent module trains an action selection strategy with a reinforcement learning algorithm based on the states, actions and rewards, selects actions according to the strategy, and outputs the selected actions; the output action is executed on the Web application so that the Web application jumps to another web page; the state extraction module, the action extraction module, the state diagram construction module and the reward model module then continue to execute, and the sequence of actions executed on the Web application is stored as a test case.
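For illustration, a sketch of how the modules of this claim could be wired into the episode loop of claim 8, reusing the QAgent and StateGraph sketches above; all page-level helpers (open_page, extract_state, extract_actions, execute_action, compute_reward) are injected placeholders that a concrete implementation, e.g. on top of a browser-automation library, would have to provide.

```python
def generate_test_cases(agent, graph, open_page, extract_state, extract_actions,
                        execute_action, compute_reward, home_url,
                        num_episodes=10, steps_per_episode=20):
    """Episode loop: reset to the homepage, pick actions with the agent,
    grow the state diagram, update the strategy, and record action sequences
    as test cases."""
    test_cases = []
    for _ in range(num_episodes):
        page = open_page(home_url)                      # reset the target application
        prev_state = graph.add_state(extract_state(page))
        action_sequence = []
        for _ in range(steps_per_episode):
            feasible = [graph.add_action(a) for a in extract_actions(page)]
            if not feasible:
                break
            action = agent.select_action(prev_state, feasible)
            page = execute_action(page, action)         # target app jumps to a new page
            curr_state = graph.add_state(extract_state(page), prev_state, action)
            r = compute_reward(prev_state, action, curr_state, graph)
            agent.update(prev_state, action, r, curr_state, feasible)
            action_sequence.append(action)
            prev_state = curr_state
        test_cases.append(action_sequence)              # one test case per episode
    return test_cases
```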
10. An electronic device comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, characterized in that: the computer program executes the method of any one of claims 1-8; or a computer-readable storage medium storing a computer program for performing the method of any one of claims 1-8.
CN202310367115.3A 2023-04-06 2023-04-06 Automatic generation method and system for Web application test cases based on reinforcement learning Pending CN116225963A (en)


