CN113377651A - Class integration test sequence generation method based on reinforcement learning - Google Patents

Class integration test sequence generation method based on reinforcement learning

Info

Publication number
CN113377651A
Authority
CN
China
Prior art keywords
complexity
class
value
action
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110647435.5A
Other languages
Chinese (zh)
Inventor
张艳梅
丁艳茹
姜淑娟
袁冠
张颖辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202110647435.5A
Publication of CN113377651A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3688Test management for test execution, e.g. scheduling of test suites
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a class integration test sequence generation method based on reinforcement learning, belonging to the technical field of software testing. The method comprises the following steps: 1) defining the reinforcement learning task; 2) static program analysis; 3) measuring test stub complexity; 4) designing the reward function; 5) designing the value function; 6) generating the class integration test sequence. The invention addresses the problem that existing reinforcement-learning-based class integration test sequence generation methods use an insufficiently accurate index to evaluate the total cost of a class integration test sequence. It provides testers with a more accurate measurement method for testing work in actual production, improves the efficiency of integration testing, and allows product quality to be controlled better.

Description

Class integration test sequence generation method based on reinforcement learning
Technical Field
The invention belongs to the technical field of software testing, and particularly relates to a class integration test sequence generation method based on reinforcement learning.
Background
The software testing stage mainly comprises unit testing, integration testing, system testing, verification and validation, regression testing, and the like. Integration testing assembles the software units into modules, subsystems or systems on the basis of unit testing and checks whether each part reaches or realizes the corresponding technical indexes and requirements, so as to ensure that the units cooperate as intended after being combined and that each increment behaves correctly. However, object-oriented programs have no obvious hierarchical division; the call relationships between classes form an intricate mesh structure, and traditional integration testing strategies cannot be applied well to such a structure. Therefore, a new integration testing strategy that conforms to the characteristics of object-oriented programs is needed, one that takes classes as its objects and generates an optimal class integration test sequence to determine the order of testing.
Based on the inter-class dependencies of object-oriented programs, researchers in the field of software engineering have proposed integration strategies based on class integration test sequences. During testing, these strategies often require constructing test stubs for certain classes of the object-oriented program in order to stand in for particular functions. This work is costly and generally unavoidable, so reducing its cost becomes a critical issue in integration testing. Researchers measure the cost of a test stub by calculating its complexity; different class integration test sequences require test stubs of different complexity and therefore have different test costs. Ordering the classes of the program under test reasonably to obtain a feasible class integration test sequence can greatly reduce the overall complexity of the test stubs that must be constructed, and thus reduce the test cost as much as possible.
Existing reinforcement-learning-based class integration test sequence generation methods ignore the evaluation index of test stub complexity: they assume that every inter-class dependency has the same strength, i.e., that every test stub has the same complexity. However, different test stubs have different complexities, and a smaller number of test stubs does not imply a lower stubbing cost for the resulting class integration test sequence. Therefore, the existing reinforcement-learning-based methods, which use the number of test stubs as the measure of the total cost of a class integration test sequence, rely on an index that is not accurate enough. Proposing a reasonable class integration test sequence generation technique and refining the evaluation index is thus of great importance for integration testing.
Disclosure of Invention
The invention aims to provide a class integration test sequence generation method based on reinforcement learning, which solves the problem that the index used by existing reinforcement-learning-based class integration test sequence generation methods to evaluate the total cost of a class integration test sequence is not accurate enough. A more accurate measurement method can therefore be provided for testers performing testing work in actual production, further improving the efficiency of integration testing.
The invention is realized according to the following technical scheme:
a class integration test sequence generation method based on reinforcement learning comprises the following specific processes:
step 1, defining the reinforcement learning task: the task of reinforcement learning is to have the agent try continuously in the environment, adjust its strategy according to the reward values obtained, and finally produce a better strategy, from which the agent knows what action to execute in which state;
step 2, static program analysis: the source program is statically analyzed and the acquired information is used to calculate the inter-class attribute complexity and method complexity; the attribute coupling between classes is calculated from the attribute complexity, and the method coupling between classes is calculated from the method complexity;
step 3, measuring test stub complexity: the test stub complexity is calculated from the obtained attribute and method complexities and provides information for the subsequent design of the reward function;
step 4, designing the reward function: the calculation of test stub complexity is integrated into the design of the reward function, guiding the agent to learn in the direction of lower test stub complexity;
step 5, designing the value function: the value function is fed back through the reward function, and its definition ensures that the accumulated reward is maximized;
step 6, generating the class integration test sequence: when the agent completes the set number of training episodes, the action path with the maximum overall reward value is selected, which is the class integration test sequence obtained by learning.
In a specific scheme, the specific steps of step 1 are as follows:
1.1, the software system to be analyzed is regarded as a set of classes to be integrated during testing;
1.2, the action sequence executed by the agent along a path, i.e. the action history, is retained as a candidate solution for the class integration test sequence;
1.3, the action history with the maximum overall reward is found among the candidate solutions, which is the class integration test sequence required by the learning process.
In a specific scheme, the specific steps of step 2 are as follows:
2.1, the relationships between classes are analyzed, and the attribute coupling between classes is calculated from the attribute complexity, denoted A(i, j), where i and j denote classes in the program; the attribute complexity is numerically equal to the sum of the number of member variables, method parameter types, and method return values through which class i references class j;
2.2, the method coupling between classes is calculated from the method complexity, denoted M(i, j); the method complexity is numerically equal to the number of methods of j called by i;
2.3, the attribute and method complexities are normalized.
In a specific scheme, the specific steps of step 3 are as follows:
3.1, the weights of the inter-class attribute complexity and method complexity are calculated by the entropy weight method;
3.2, the test stub complexity is calculated by combining the attribute complexity and the method complexity;
3.3, when the class integration test sequence is obtained, the complexities of the test stubs generated in the process are accumulated to obtain the total test stub complexity.
Further scheme: test pile complexity
Figure 221838DEST_PATH_IMAGE001
Wherein,
Figure 360696DEST_PATH_IMAGE002
Figure 131074DEST_PATH_IMAGE003
a (i, j) represents the attribute complexity between classes i and j, M (i, j) represents the method complexity between classes i and j, the entropy weight method firstly standardizes the continuous indexes, the obtained results are all between 0 and 1, and the attribute complexity between the classes after standardization is
Figure 380790DEST_PATH_IMAGE004
The complexity of the method is
Figure 303747DEST_PATH_IMAGE005
In a specific scheme, the specific steps of step 4 are as follows:
4.1, the reward function is designed so that the reward value is higher when the class the agent explores and integrates is better;
4.2, when any action class appears twice in a path, the path is given the minimum value −∞ so that it is avoided in subsequent exploration;
4.3, the reward function is designed in combination with the test stub complexity.
In a further scheme, the reward function r(σ_i) is defined in combination with the test stub complexity. The agent reaches σ_i through i − 1 state transitions, where σ_i denotes a state path and r(σ_i) denotes the reward value that the state path receives; Max denotes the maximum reward value, here 1000; c is a positive integer, here 100; a_{σ_i} denotes the action history corresponding to the state path; and SCplx() denotes the test stub complexity. When any situation that does not meet the requirements occurs in the process, the environment gives the agent a penalty value.
In a specific scheme, the specific steps of step 5 are as follows:
5.1, the immediate Q value is obtained from the state generated by interaction with the environment and the selected action, denoted Q(s, a), where s denotes the state and a denotes the action;
5.2, the largest Q(s', a') for the next state s' is selected and multiplied by the discount factor γ;
5.3, the reward value r obtained by the agent executing action a in state s is added;
5.4, the whole is multiplied by the learning rate α;
5.5, the immediate Q value is added to obtain the current Q value.
In a further scheme:
Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ]
where α denotes the learning rate, r denotes the reward value obtained by the agent executing action a in state s, and γ denotes the discount factor.
In a specific scheme, the specific steps of step 6 are as follows:
6.1, the agent selects actions according to an action selection mechanism;
6.2, when the agent completes the training episodes, the system returns the action sequence with the maximum overall reward value, which is the required optimal class integration test sequence.
Compared with the prior art, the invention has the beneficial effects that:
the invention solves the problem that the existing class integrated test sequence generation method based on reinforcement learning has an inaccurate index for evaluating and determining the total cost of the class integrated test sequence, provides a more accurate measurement method for testers to carry out test work in actual production life, improves the efficiency of integrated test, and further better controls the quality of products.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention, are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and together with the description serve to explain the invention without unduly limiting it. It is obvious that the drawings in the following description show only some embodiments, and that a person skilled in the art can derive other drawings from them without inventive effort.
In the drawings:
FIG. 1 is a flow chart of a class integration test sequence generation method based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a flow chart for defining reinforcement learning tasks;
FIG. 3 is a flow chart of program static analysis;
FIG. 4 is a flow chart of measuring test stub complexity;
FIG. 5 is a flow chart for designing a reward function;
FIG. 6 is a flow chart of a design value function;
FIG. 7 is a flow chart for generating a class integration test sequence.
It should be noted that the drawings and the description are not intended to limit the scope of the inventive concept in any way, but rather to illustrate it for a person skilled in the art with reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and the following embodiments are used for illustrating the present invention and are not intended to limit the scope of the present invention.
FIG. 1 is a flowchart of a class integration test sequence generation method based on reinforcement learning according to an embodiment of the present invention.
S1: define the reinforcement learning task. Reinforcement learning is based on the theory of the Markov decision process: the conditional probability of the agent selecting an action is related only to the current state, while the current action influences, with a certain probability, the transition to the next state, and the agent obtains a reward value according to the state and the action. Assume that the action history of a path contains n actions in total. Under a given strategy, the goal of the reinforcement learning task is defined as finding an optimal sequence that maximizes the overall reward obtained.
S2: static program analysis. The source program is statically analyzed to obtain a series of information, such as the types of inter-class dependencies, attribute dependency information (member variable dependencies, parameter passing information, return value information, and the like) and method call information, which is used to calculate the inter-class attribute and method complexity. Inter-class dependencies can be divided into static and dynamic dependencies according to whether the program is running: static dependencies can be analyzed without running the program, while the relationships formed at run time are called dynamic dependencies. The invention is mainly concerned with static dependencies. The strength of the dependencies between classes must be considered when constructing test stubs, and measuring these dependencies requires analyzing the coupling information between classes, from which the cost of constructing the test stubs is calculated. Analysis shows that the degree of coupling is positively correlated with the strength of the dependency, i.e., the degree of coupling is positively correlated with the test cost.
S3: measure test stub complexity. A test stub is not a real class of the system but a component module or piece of object code that serves the object under test. When the dependency between two classes is strong, the test stub must simulate more functions, its construction is more complex, and its complexity is higher; when the dependency is weak, constructing the test stub is easier, the cost is low, and its complexity is low. Therefore the test stub complexity can be calculated from the strength of the inter-class dependency, yielding the test cost.
S4: design the reward function. When the agent takes action a in state s, the environment generally gives the agent a reward r. When the agent obtains a positive r, it reinforces the selection of actions in the corresponding direction, which also influences the next state. The reward function is the function used to calculate the reward value r.
S5: design the value function. The reward function maximizes the reward or penalty obtained when the agent moves from one state to the next; to ensure that the accumulated reward is maximized when the agent reaches the target state, the accumulated reward is represented by the designed value function, and a Q table is used to store the Q values. Assuming the average reward obtained after the first t actions and the reward of the (t+1)-th action are known, the accumulated Q value after the (t+1)-th action selection can be predicted and updated; the value function represents this process of prediction and feedback update in reinforcement learning.
S6: generate the class integration test sequence. Through action selection and reward feedback during learning, the agent adds a suitable action to the action sequence at each step. The reward function is designed by measuring the complexity of the test stubs constructed in the process, and the value function ensures that the accumulated reward of the action sequence is also maximized; the action sequence finally selected is the optimal class integration test sequence obtained by the method.
FIG. 2 is a flow chart of defining the reinforcement learning task. Based on an understanding of the structure of reinforcement learning and related research on class integration test sequences, a reinforcement learning strategy aiming at the lowest possible test stub complexity is formulated, with the accumulated reward of the action sequence selected by the agent as the optimization target. The specific steps are as follows: first, the software system to be analyzed is regarded as a set of classes to be integrated during testing; then, the action sequence executed by the agent along a path, i.e. the action history, is retained as a candidate solution for the class integration test sequence; finally, the action history with the maximum overall reward is found among the candidate solutions, which is the class integration test sequence required by the learning process.
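For illustration only, the following Python sketch shows one way the candidate action histories and their rewards could be recorded so that the history with the maximum overall reward can be returned at the end of learning; the names Episode and TaskResult are hypothetical and do not come from the patent.

# Illustrative sketch (hypothetical names): record each episode's action
# history and overall reward, then return the best history as the class
# integration test sequence.
from dataclasses import dataclass, field

@dataclass
class Episode:
    action_history: list      # the classes in the order they were integrated
    total_reward: float

@dataclass
class TaskResult:
    episodes: list = field(default_factory=list)

    def record(self, history, reward):
        self.episodes.append(Episode(list(history), reward))

    def best_sequence(self):
        # the action history with the maximum overall reward is the
        # class integration test sequence produced by learning
        return max(self.episodes, key=lambda e: e.total_reward).action_history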
FIG. 3 is a flow chart of static program analysis. The inter-class dependencies of the program are analyzed to obtain the attribute coupling and method coupling, preparing for the calculation of test stub complexity in the next step. The specific steps are as follows: first, the relationships between classes are analyzed by examining the concrete statements of the program; then, the attribute coupling between classes is calculated from the attribute complexity and the method coupling between classes is calculated from the method complexity; finally, for convenience of the subsequent calculation, the attribute and method complexities are normalized.
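As a concrete illustration of this step, the sketch below computes A(i, j) and M(i, j) from a hypothetical per-class summary of static-analysis facts; the data layout and the class names are assumptions made for the example, not part of the patent.

# Illustrative sketch (assumed data layout, not the patent's implementation):
# given per-class static-analysis facts, compute the attribute complexity
# A(i, j) and the method complexity M(i, j) for every ordered class pair.

def attribute_complexity(facts, i, j):
    f = facts[i]
    # member variables of type j, parameters of type j, and return values of
    # type j through which class i references class j
    return (f["field_types"].count(j)
            + f["param_types"].count(j)
            + f["return_types"].count(j))

def method_complexity(facts, i, j):
    # number of methods of class j called by class i
    return len(facts[i]["called_methods"].get(j, []))

facts = {
    "Order": {"field_types": ["Item", "Item"], "param_types": ["Customer"],
              "return_types": ["Item"], "called_methods": {"Item": ["price"]}},
    "Item": {"field_types": [], "param_types": [], "return_types": [],
             "called_methods": {}},
    "Customer": {"field_types": [], "param_types": [], "return_types": [],
                 "called_methods": {}},
}

A = {(i, j): attribute_complexity(facts, i, j)
     for i in facts for j in facts if i != j}
M = {(i, j): method_complexity(facts, i, j)
     for i in facts for j in facts if i != j}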
FIG. 4 is a flow chart of measuring test stub complexity. Test stub complexity is an important index of test cost and is obtained mainly by calculating the inter-class attribute and method complexities. The specific steps are as follows: first, the weights of the attribute and method complexities are determined using the entropy weight method; then, the test stub complexity is calculated by combining the attribute and method complexities normalized in the previous step; finally, when a class integration test sequence is obtained, the complexities of the test stubs generated in the process are accumulated to obtain the total test stub complexity, which is used to evaluate the method.
The implementation mode is as follows: in order to more accurately obtain the complexity of the test pile, an entropy weight method is adopted to calculate the weight W of the attribute and the method complexityAAnd WM
The steps of calculating the complexity of the test pile by the entropy weight method are as follows:
(1) Normalizing the indexes
The test stub complexity is calculated from two indexes, the attribute complexity and the method complexity. A(i, j) denotes the attribute complexity between classes i and j and M(i, j) denotes the method complexity between classes i and j. The entropy weight method first normalizes these continuous indexes so that the results lie between 0 and 1. The normalized inter-class attribute complexity Ā(i, j) and the normalized method complexity M̄(i, j) are calculated as
Ā(i, j) = (A(i, j) − min A) / (max A − min A)
M̄(i, j) = (M(i, j) − min M) / (max M − min M)
where the minima and maxima are taken over all class pairs.
(2) Establishing the evaluation matrix
Assuming the system under test contains m classes, an m × 2 matrix can be constructed to represent the two kinds of relationships between the classes: the first column contains the evaluation values under the attribute complexity index and the second column contains the evaluation values under the method complexity index. Together the two columns form the evaluation matrix R = (r_ij) with i = 1, …, m and j = 1, 2, whose first column holds the normalized attribute complexity values and whose second column holds the normalized method complexity values.
(3) Computing the information entropy
Before calculating the information entropy, the proportion of each class's evaluation value under each index is calculated first, denoted P_ij for the j-th index; the information entropy e_j of the j-th index is then calculated from these proportions, where K is a constant:
P_ij = r_ij / Σ_{i=1..m} r_ij
e_j = −K · Σ_{i=1..m} P_ij · ln(P_ij)
K = 1 / ln(m)
(4) Calculating the weights
The weight of the j-th index is denoted W_j, where j = 1 corresponds to the attribute complexity weight and j = 2 corresponds to the method complexity weight:
W_j = (1 − e_j) / Σ_{k=1..2} (1 − e_k)
Finally, the weights of the attribute complexity and the method complexity are obtained, from which the test stub complexity SCplx(i, j) is computed as a weighted combination of Ā(i, j) and M̄(i, j) with weights W_A and W_M.
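To make the calculation concrete, the following sketch implements the entropy weight computation under two stated assumptions: min–max normalization of A and M, and the square-root-of-weighted-squares combination SCplx(i, j) = sqrt(W_A · Ā(i, j)^2 + W_M · M̄(i, j)^2) known from Briand et al.; the exact combination formula used by the patent may differ.

# Illustrative sketch (assumptions: min-max normalization and the
# sqrt-of-weighted-squares stub complexity form from Briand et al.).
import math

def min_max_normalize(values):
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in values.items()}

def entropy_weights(A_norm, M_norm):
    m = len(A_norm)
    K = 1.0 / math.log(m) if m > 1 else 1.0
    raw = []
    for col in (A_norm, M_norm):
        total = sum(col.values()) or 1.0
        e = -K * sum((v / total) * math.log(v / total)
                     for v in col.values() if v > 0)
        raw.append(1.0 - e)
    s = sum(raw) or 1.0
    return raw[0] / s, raw[1] / s      # W_A, W_M

def stub_complexity(A_norm, M_norm, W_A, W_M, pair):
    return math.sqrt(W_A * A_norm[pair] ** 2 + W_M * M_norm[pair] ** 2)

# usage with the A and M dictionaries computed during static analysis:
# A_norm, M_norm = min_max_normalize(A), min_max_normalize(M)
# W_A, W_M = entropy_weights(A_norm, M_norm)
# SCplx = {p: stub_complexity(A_norm, M_norm, W_A, W_M, p) for p in A_norm}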
FIG. 5 is a flow chart of designing the reward function. The reward function is an important indicator guiding the agent's exploration: the agent tends to explore action paths with larger reward values, so to obtain the class integration test sequence with the lowest test cost the reward function is improved by incorporating the test stub complexity. The specific steps are as follows: first, the reward function is designed so that the reward value is higher when the class the agent explores and integrates is better; then, when any action class appears twice in a path, the path is given the minimum value −∞ so that it is avoided in subsequent exploration; finally, the reward function is designed in combination with the test stub complexity, so that the trained agent tends to explore paths with lower test stub complexity.
The implementation mode is as follows: assuming that the agent experiences n +1 states in total from the first state to the f-th state, and selects n actions, the f-th state s is consideredfI.e. the final state, from s1To sfThe formula of the state change function in between is shown below, s' being the next state to state s.
Figure 339934DEST_PATH_IMAGE017
By using
Figure 538834DEST_PATH_IMAGE018
Representing the state from the initial state to the final state sfA state path of (2), where σ0Indicates the initial state s1
Figure 942134DEST_PATH_IMAGE019
. The n action sequences executed by the agent according to the path can be expressed as
Figure 138760DEST_PATH_IMAGE020
I.e. the action history corresponding to the state path. If the path does not contain a duplicate action, it can be considered as
Figure 893089DEST_PATH_IMAGE020
I.e. an alternative class integration test sequence.
The reward and punishment mechanism in reinforcement learning is the core that drives the agent to explore the optimal path: the better the class the agent explores and integrates, the higher the reward value it obtains. In order to further reduce the overall test cost of the class integration test sequence, the invention designs the reward function in combination with the test stub complexity. The agent reaches σ_i through i − 1 state transitions; σ_i denotes a state path, r(σ_i) denotes the reward value the state path receives, Max denotes the maximum reward value, here 1000, c is a positive integer, here 100, a_{σ_i} denotes the action history corresponding to the state path, and SCplx() denotes the test stub complexity. Whenever a situation that does not meet the requirements occurs during the agent's exploration, the environment gives the agent a penalty value. For example, if any action class appears twice in a path, the path is given the minimum value −∞ and can thus be avoided in later exploration. When no duplicate classes appear in the path, the path is considered feasible and the environment assigns the agent a reward value, where c is a positive integer. When the final state is reached with no repeated classes, an alternative path is considered to have been found and its overall test cost is calculated. If the test cost of the currently obtained class integration test sequence is lower than the test costs of all previously obtained sequences, a higher reward value is given. Integrating in the order of the path's action history minimizes the overall test stub complexity of the generated class integration test sequence and thus saves test cost to the greatest extent.
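Because the piecewise reward formula itself is given only as a figure, the sketch below is one plausible reading of the prose above: Max = 1000 and c = 100 as stated, −∞ for repeated classes, a reward that grows as the accumulated stub complexity shrinks for feasible partial paths, and Max for a complete sequence cheaper than any seen before; the exact form in the patent may differ.

# Illustrative sketch of a reward function consistent with the description
# (assumed form; the patent's exact piecewise definition is not reproduced).
MAX_REWARD = 1000.0   # "Max" in the text
C = 100.0             # "c" in the text
NEG_INF = float("-inf")

def reward(action_history, scplx_of, best_cost_so_far, is_final):
    # action_history: classes integrated so far; scplx_of: callable giving the
    # accumulated stub complexity of a history; best_cost_so_far: lowest total
    # cost seen in earlier episodes (None if none yet).
    if len(set(action_history)) < len(action_history):
        return NEG_INF                      # a class was integrated twice
    cost = scplx_of(action_history)
    if is_final and (best_cost_so_far is None or cost < best_cost_so_far):
        return MAX_REWARD                   # best complete sequence so far
    return C / (1.0 + cost)                 # feasible path: cheaper is better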
FIG. 6 is a flow chart of designing the value function. The value function focuses on the accumulated reward obtained during the agent's exploration and supplements the reward function in driving the agent to explore in the direction of lower test cost. The specific steps are as follows: first, the immediate Q value, denoted Q(s, a), is obtained from the state generated by interaction with the environment and the selected action; then the largest Q(s', a') for the next state s' is selected and multiplied by the discount factor γ; then the reward value r obtained by the agent executing action a in state s is added, and the whole is multiplied by the learning rate α; finally the immediate Q value is added to obtain the current Q value.
The calculation formula is as follows:
Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ]
where α denotes the learning rate, r denotes the reward value obtained by the agent executing action a in state s, and γ denotes the discount factor.
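As a minimal concrete example of this update, the sketch below applies the standard tabular Q-learning rule with learning rate α and discount factor γ; the table layout and the names q_table, ALPHA and GAMMA are placeholders for illustration.

# Minimal tabular Q-learning update (standard form assumed by the text).
from collections import defaultdict

q_table = defaultdict(float)          # (state, action) -> Q value
ALPHA, GAMMA = 0.1, 0.9               # learning rate and discount factor

def q_update(state, action, reward_value, next_state, actions_in_next_state):
    best_next = max((q_table[(next_state, a)] for a in actions_in_next_state),
                    default=0.0)
    td_target = reward_value + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])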
FIG. 7 is a flow chart of generating the class integration test sequence. The invention uses reinforcement learning to train the agent to learn in the direction of lower test cost, and the action sequence obtained by selecting actions in the process is the required class integration test sequence. The specific steps are as follows: first, the agent selects actions according to an action selection mechanism; then, when the agent completes the training episodes, the system returns the action sequence with the maximum overall reward value, which is the optimal class integration test sequence sought.
The implementation mode is as follows: the reinforcement learning is a process of exploring and utilizing selection actions, in order to avoid trapping in local optimization in the learning process and further increase the proportion of agent exploration, two selection mechanisms are adopted on the basis of an epsilon-greedy method:
traditional epsilon-greedy algorithms: selecting the action corresponding to the maximum Q value in the current state according to the probability of 1-epsilon; actions are randomly selected with a probability of epsilon.
Dynamically adjusting the epsilon algorithm: the value of ε is first dynamically adjusted, as shown below, where time represents the number of training sessions.
Figure 638301DEST_PATH_IMAGE029
A path σ of the agent from the initial state to the final state is obtained through the Q-learning algorithm; if all actions in all states have been visited, the Q value obtained by the agent reaches its optimal value. Finally, the action history associated with the state path is obtained, and the corresponding action sequence is the optimal class integration test sequence produced by the method.
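The action selection mechanism can be sketched as follows; the ε schedule 1/(1 + time) is an assumed placeholder, since the patent's dynamic adjustment formula is not reproduced here.

# Illustrative epsilon-greedy action selection with a dynamically adjusted
# epsilon (the 1/(1 + time) schedule is an assumed placeholder).
import random

def dynamic_epsilon(time, floor=0.05):
    return max(floor, 1.0 / (1.0 + time))

def select_action(state, candidate_actions, q_table, time):
    eps = dynamic_epsilon(time)
    if random.random() < eps:
        return random.choice(candidate_actions)            # explore
    return max(candidate_actions,                          # exploit
               key=lambda a: q_table[(state, a)])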
The experimental data of the class integration test sequence generation method based on reinforcement learning are given as follows:
the experimental subjects are 9 programs such as elevator, SPM, ATM, day, ANT, BCEL, DNS, email _ spl, and statepad _ spl in the SIR website. The average of the results of 30 experiments in the method indicates that for 5 of these 9 programs, the overall test stub complexity spent in generating the optimal class integration test sequence by the method is the lowest, which is reduced by 39.4%, 33.3%, 7.6%, 37.9% and 17.8% compared to the lowest overall test stub complexity spent in generating the class integration test sequence by using the particle swarm algorithm and the random algorithm, respectively.
In conclusion, the invention solves the problem that the index used by existing reinforcement-learning-based class integration test sequence generation methods to evaluate the total cost of a class integration test sequence is not accurate enough. It can further advance research in the field of class integration test sequence generation, further reduce test cost, provide testers with a more accurate measurement method for testing work in actual production, improve the efficiency of the software testing phase, and safeguard the quality of software products.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some but not other features found in other embodiments, combinations of features of different embodiments are also meant to be within the scope of the invention and to form different embodiments. For example, in the above embodiments, those skilled in the art can combine features according to the known technical solutions and the technical problems to be solved by the present application.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A class integration test sequence generation method based on reinforcement learning is characterized in that:
step 1, defining the reinforcement learning task: the task of reinforcement learning is to have the agent try continuously in the environment, adjust its strategy according to the reward values obtained, and finally produce a better strategy, from which the agent knows what action to execute in which state;
step 2, static program analysis: the source program is statically analyzed and the acquired information is used to calculate the inter-class attribute complexity and method complexity; the attribute coupling between classes is calculated from the attribute complexity, and the method coupling between classes is calculated from the method complexity;
step 3, measuring test stub complexity: the test stub complexity is calculated from the obtained attribute and method complexities and provides information for the subsequent design of the reward function;
step 4, designing the reward function: the calculation of test stub complexity is integrated into the design of the reward function, guiding the agent to learn in the direction of lower test stub complexity;
step 5, designing the value function: the value function is fed back through the reward function, and its definition ensures that the accumulated reward is maximized;
step 6, generating the class integration test sequence: when the agent completes the set number of training episodes, the action path with the maximum overall reward value is selected, which is the class integration test sequence obtained by learning.
2. The class integration test sequence generation method based on reinforcement learning of claim 1, wherein:
the specific steps of step 1 are as follows:
1.1, the software system to be analyzed is regarded as a set of classes to be integrated during testing;
1.2, the action sequence executed by the agent along a path, i.e. the action history, is retained as a candidate solution for the class integration test sequence;
1.3, the action history with the maximum overall reward is found among the candidate solutions, which is the class integration test sequence required by the learning process.
3. The class integration test sequence generation method based on reinforcement learning of claim 1, wherein:
the specific steps of step 2 are as follows:
2.1, the relationships between classes are analyzed, and the attribute coupling between classes is calculated from the attribute complexity, denoted A(i, j), where i and j denote classes in the program; the attribute complexity is numerically equal to the sum of the number of member variables, method parameter types, and method return values through which class i references class j;
2.2, the method coupling between classes is calculated from the method complexity, denoted M(i, j); the method complexity is numerically equal to the number of methods of j called by i; the attribute and method complexities are then normalized.
4. The class integration test sequence generation method based on reinforcement learning of claim 1, wherein:
the specific steps of step 3 are as follows:
3.1, the weights of the inter-class attribute complexity and method complexity are calculated by the entropy weight method;
3.2, the test stub complexity is calculated by combining the attribute complexity and the method complexity;
3.3, when the class integration test sequence is obtained, the complexities of the test stubs generated in the process are accumulated to obtain the total test stub complexity.
5. The class integration test sequence generation method based on reinforcement learning of claim 4, wherein:
the test stub complexity SCplx(i, j) is a weighted combination of the normalized attribute complexity and the normalized method complexity, with weights W_A and W_M determined by the entropy weight method; A(i, j) denotes the attribute complexity between classes i and j and M(i, j) denotes the method complexity between classes i and j; the entropy weight method first normalizes these continuous indexes so that the results lie between 0 and 1, the normalized inter-class attribute complexity being denoted Ā(i, j) and the normalized method complexity M̄(i, j).
6. The class integration test sequence generation method based on reinforcement learning of claim 1, wherein:
the specific steps of step 4 are as follows:
4.1, the reward function is designed so that the reward value is higher when the class the agent explores and integrates is better;
4.2, when any action class appears twice in a path, the path is given the minimum value −∞ so that it is avoided in subsequent exploration;
4.3, the reward function is designed in combination with the test stub complexity, so that the trained agent tends to explore paths with lower test stub complexity.
7. The class integration test sequence generation method based on reinforcement learning of claim 6, wherein:
the reward function r(σ_i) is defined in combination with the test stub complexity, wherein the agent reaches σ_i through i − 1 state transitions, σ_i denotes a state path, r(σ_i) denotes the reward value that the state path receives, Max denotes the maximum reward value, here 1000, c is a positive integer, here 100, a_{σ_i} denotes the action history corresponding to the state path, and SCplx() denotes the test stub complexity.
8. The class integration test sequence generation method based on reinforcement learning of claim 1, wherein:
the specific steps of step 5 are as follows:
5.1, the immediate Q value is obtained from the state generated by interaction with the environment and the selected action, denoted Q(s, a), where s denotes the state and a denotes the action;
5.2, the largest Q(s', a') for the next state s' is selected and multiplied by the discount factor γ;
5.3, the reward value r obtained by the agent executing action a in state s is added;
5.4, the whole is multiplied by the learning rate α;
5.5, the immediate Q value is added to obtain the current Q value.
9. The class integration test sequence generation method based on reinforcement learning of claim 8, wherein:
Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ]
where α denotes the learning rate, r denotes the reward value obtained by the agent executing action a in state s, and γ denotes the discount factor.
10. The class integration test sequence generation method based on reinforcement learning of claim 1, wherein:
the specific steps of step 6 are as follows:
6.1, the agent selects actions according to an action selection mechanism;
6.2, when the agent completes the training episodes, the system returns the action sequence with the maximum overall reward value, which is the required optimal class integration test sequence.
CN202110647435.5A 2021-06-10 2021-06-10 Class integration test sequence generation method based on reinforcement learning Pending CN113377651A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110647435.5A CN113377651A (en) 2021-06-10 2021-06-10 Class integration test sequence generation method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110647435.5A CN113377651A (en) 2021-06-10 2021-06-10 Class integration test sequence generation method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN113377651A true CN113377651A (en) 2021-09-10

Family

ID=77573587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110647435.5A Pending CN113377651A (en) 2021-06-10 2021-06-10 Class integration test sequence generation method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113377651A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435516A (en) * 2023-12-21 2024-01-23 江西财经大学 Test case priority ordering method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399127A (en) * 2018-02-09 2018-08-14 中国矿业大学 A kind of integrated method for creating test sequence of class
CN109800717A (en) * 2019-01-22 2019-05-24 中国科学院自动化研究所 Activity recognition video frame sampling method and system based on intensified learning
CN110659199A (en) * 2018-06-29 2020-01-07 中国矿业大学 Class integration test sequence generation method based on transfer dependence
US20200065157A1 (en) * 2018-08-27 2020-02-27 Vmware, Inc. Automated reinforcement-learning-based application manager that learns and improves a reward function
CN111026549A (en) * 2019-11-28 2020-04-17 国网甘肃省电力公司电力科学研究院 Automatic test resource scheduling method for power information communication equipment
US20210064515A1 (en) * 2019-08-27 2021-03-04 Nec Laboratories America, Inc. Deep q-network reinforcement learning for testing case selection and prioritization
US20210073110A1 (en) * 2019-09-10 2021-03-11 Sauce Labs Inc. Authoring automated test suites using artificial intelligence

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399127A (en) * 2018-02-09 2018-08-14 中国矿业大学 A kind of integrated method for creating test sequence of class
CN110659199A (en) * 2018-06-29 2020-01-07 中国矿业大学 Class integration test sequence generation method based on transfer dependence
US20200065157A1 (en) * 2018-08-27 2020-02-27 Vmware, Inc. Automated reinforcement-learning-based application manager that learns and improves a reward function
CN109800717A (en) * 2019-01-22 2019-05-24 中国科学院自动化研究所 Activity recognition video frame sampling method and system based on intensified learning
US20210064515A1 (en) * 2019-08-27 2021-03-04 Nec Laboratories America, Inc. Deep q-network reinforcement learning for testing case selection and prioritization
US20210073110A1 (en) * 2019-09-10 2021-03-11 Sauce Labs Inc. Authoring automated test suites using artificial intelligence
CN111026549A (en) * 2019-11-28 2020-04-17 国网甘肃省电力公司电力科学研究院 Automatic test resource scheduling method for power information communication equipment

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GABRIELA CZIBULA: "An effective approach for determining the class integration test order using reinforcement learning", Applied Soft Computing, pages 517 - 530 *
LIONEL C. BRIAND: "Using Genetic Algorithms and Coupling Measures to Devise Optimal Integration Test Orders", Proc. of the 14th Int'l Conf. on Software Engineering and Knowledge Engineering, pages 1 - 8 *
何柳柳: "Reinforcement learning reward mechanism for continuous integration test optimization", Journal of Software (软件学报), pages 1438 - 1449 *
张艳梅: "A class integration testing method based on dynamic dependencies", Chinese Journal of Computers (计算机学报), pages 1075 - 1089 *
张艳梅; 姜淑娟; 陈若玉; 王兴亚; 张妙: "Class integration test order determination method based on particle swarm optimization", Chinese Journal of Computers (计算机学报), no. 04, pages 1 - 5 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435516A (en) * 2023-12-21 2024-01-23 江西财经大学 Test case priority ordering method and system
CN117435516B (en) * 2023-12-21 2024-02-27 江西财经大学 Test case priority ordering method and system

Similar Documents

Publication Publication Date Title
Vanhoucke et al. An overview of project data for integrated project management and control
Attarzadeh et al. Proposing a new software cost estimation model based on artificial neural networks
Chang et al. Learning to simulate and design for structural engineering
Hearty et al. Predicting project velocity in xp using a learning dynamic bayesian network model
Lanubile et al. Evaluating predictive quality models derived from software measures: lessons learned
CN113837356A (en) Intelligent sewage treatment prediction method based on fusion neural network
EP4075281A1 (en) Ann-based program test method and test system, and application
CN111125964B (en) Sewage treatment process proxy model construction method based on Kriging interpolation method
JP2022530868A (en) Target object attribute prediction method based on machine learning, related equipment and computer programs
CN108596800A (en) Bayes-based open answer decision method
CN109925718A (en) A kind of system and method for distributing the micro- end map of game
Srivastava et al. Software testing effort: An assessment through fuzzy criteria approach
CN113377651A (en) Class integration test sequence generation method based on reinforcement learning
CN110659199B (en) Class integration test sequence generation method based on transfer dependence
Zhao et al. Designing a prediction model for athlete’s sports performance using neural network
CN111767991B (en) Measurement and control resource scheduling method based on deep Q learning
CN113868113B (en) Class integrated test sequence generation method based on Actor-Critic algorithm
CN111783930A (en) Neural network test sufficiency evaluation method based on path state
Alsmadi et al. Effective generation of test cases using genetic algorithms and optimization theory
CN115081856A (en) Enterprise knowledge management performance evaluation device and method
Smith et al. A framework to model and measure system effectiveness
CN113987261A (en) Video recommendation method and system based on dynamic trust perception
Cevikcan et al. Westinghouse method oriented fuzzy rule based tempo rating approach
Sun et al. Prediction of Condition Monitoring Signals Using Scalable Pairwise Gaussian Processes and Bayesian Model Averaging
CN113392958A (en) Parameter optimization and application method and system of fuzzy neural network FNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination