CN117971684A - A semantically-aware whole-machine regression test case recommendation method - Google Patents
- Publication number: CN117971684A
- Application number: CN202410172283.1A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06F11/3668—Testing of software
- G06F16/322—Indexing structures; trees
- G06F16/3344—Query execution using natural language analysis
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06F16/353—Clustering; classification into predefined classes
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/08—Neural network learning methods
Description
Technical Field

The present invention belongs to the field of software regression testing, and in particular relates to a change-semantics-aware whole-machine regression test case recommendation method.
Background

Regression testing is a key step in software development and release. It is performed after existing code is modified, to ensure that the modifications have introduced no new defects and have not broken other functionality. Automated regression testing can significantly reduce costs in the system testing and maintenance/update phases.

As an important part of the software life cycle, regression testing occupies a central position in the overall software testing process and is executed repeatedly at every stage of software development. In incremental and rapid-iteration development modes in particular, frequent releases of new versions require regression testing to be run more often, and projects that adopt extreme programming may need multiple rounds of regression testing every day. Against this background, choosing a suitable regression testing strategy to improve the efficiency and effectiveness of regression testing is particularly important. For example, Chinese patent document CN101178687A discloses an improved software regression testing method.

As product functions continuously grow and iterate, the test case set keeps expanding, which directly increases the cost of regression testing. Given limited testing resources, it is impossible to execute all test cases. To improve the efficiency of regression testing, a more systematic regression testing strategy is urgently needed: test cases must be ranked effectively against a set of explicit testing objectives to determine their execution order and ensure that the most critical test cases are executed first.

The paper "Can Code Representation Boost IR-Based Test Case Prioritization" proposes a method that recommends test cases using the semantic similarity between changed code and test code. The method first converts the changed code and the test code into vectors using a code representation technique, then computes the similarity between test methods and changed code at two granularities (method level and class level), and finally combines the two similarities to output the test cases most similar to the changed code.

The paper "Prioritizing Natural Language Test Cases Based on Highly-Used Game Features" proposes a method that recommends manual test cases written in natural language via zero-shot classification and multi-objective optimization with a genetic algorithm. It first uses zero-shot classification to automatically identify, from the natural-language description, the features a test case exercises, and then prioritizes test cases by the frequently used features they cover and by their execution time; a genetic algorithm optimizes the two objectives of maximizing feature coverage and minimizing execution time to obtain a good recommendation ranking.
Although existing techniques provide basic methods for regression test case selection, they have notable shortcomings:

1. Scenario mismatch: most prioritization methods are not applicable to manual test cases, because their algorithms require source code information or test execution history, which is usually unavailable in manual testing scenarios.

2. Insufficient understanding of change semantics: commit messages contain important semantic information about code changes, which is crucial for identifying affected features and the corresponding test cases. Existing techniques lack an effective mechanism to interpret this semantic information, so test case selection may be imprecise or irrelevant.

Therefore, a more efficient and intelligent solution is needed to overcome these problems, in particular one that can understand the semantics of code commits and automatically recommend relevant manual test cases, improving the accuracy and efficiency of software testing.
Summary of the Invention

The present invention provides a change-semantics-aware whole-machine regression test case recommendation method that effectively improves the efficiency and accuracy of test case selection.

A change-semantics-aware whole-machine regression test case recommendation method includes the following steps:

(1) Obtain commit messages and test cases; after data cleaning, obtain a commit message text dataset D_commit and a test case description dataset D_testcase.

(2) Use a hierarchical residual multi-granularity classification network model based on a label relationship tree to classify each commit message text in D_commit into hierarchically organized feature labels.

Here, a feature label is a descriptive label for the feature exercised by a test case; its description form is a tree-like multi-level label representation.

(3) According to the classification of each commit message text, select from D_testcase the subset of test cases under that classification, compute the text similarity between the commit message and each test case in the subset, and select the cases with high similarity scores as the recommended test cases.
Step (2) specifically includes:

(2-1) Convert each commit message text in D_commit into a high-dimensional vector with a fine-tuned BERT model.

(2-2) Convert the label hierarchy in the test case description dataset D_testcase into a tree structure, and generate state vectors based on binary labels.

(2-3) Feed the high-dimensional text vectors from step (2-1) into the hierarchical residual multi-granularity classification network model, constrain the output with the state vectors from step (2-2), perform hierarchical feature classification on the text vectors, and determine the hierarchical feature label of each commit message text.

The specific process of step (2-1) is:

(2-1-1) Select a BERT model and fine-tune it on commit message texts annotated with feature labels.

(2-1-2) Using the tokenizer that matches the BERT model, convert each commit message text in D_commit into a token list and the corresponding attention mask list.

(2-1-3) Feed the token lists and mask lists into the BERT model fine-tuned in step (2-1-1) to obtain high-dimensional vectors.

(2-1-4) Take the pooled output of the BERT model as the vector representation of the commit message text.
The specific process of step (2-2) is:

(2-2-1) Design a structured format to store the hierarchical information of the feature tree. A feature label is a descriptive label for the feature exercised by a test case, represented as a tree-like multi-level label. A test case may be assigned to a node at any level of the feature tree, whether a leaf node or an intermediate node; different feature nodes represent different test case classes.

(2-2-2) From the feature tree, compute all binary-label state vectors that satisfy the tree constraints, forming the legal state space. If the total number of feature nodes is N, traverse the feature tree to form an N×N matrix in which each column represents a feature node and each row represents a possible legal path through the feature tree, so that the binary-valued hierarchical category labels represent feature node labels. The state space thus defines the legal constraints on parent-child and sibling relationships.
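As an illustration, the state-space construction of step (2-2-2) can be sketched in a few lines of Python. The feature tree below (Root/UI/Network/Login) and the function name are invented for illustration; only the construction rule from the text is followed: one binary row per legal root-to-node path, one column per feature node.

```python
# Hypothetical sketch of step (2-2-2): enumerate the legal state space of a
# feature tree as binary state vectors. The node names are illustrative.

def build_state_space(tree, root):
    """Return (nodes, matrix): one row per legal root-to-node path,
    one column per feature node (1 = node lies on the path)."""
    # Fix a column order by depth-first traversal.
    nodes = []
    def dfs(n):
        nodes.append(n)
        for c in tree.get(n, []):
            dfs(c)
    dfs(root)
    index = {n: i for i, n in enumerate(nodes)}

    # Parent lookup for walking each path back to the root.
    parent = {c: p for p, cs in tree.items() for c in cs}

    matrix = []
    for n in nodes:                     # one legal path per node
        row = [0] * len(nodes)
        cur = n
        while True:                     # mark the node and all its ancestors
            row[index[cur]] = 1
            if cur == root:
                break
            cur = parent[cur]
        matrix.append(row)
    return nodes, matrix

# Example feature tree: UI and Network are children of Root; Login under UI.
tree = {"Root": ["UI", "Network"], "UI": ["Login"]}
nodes, matrix = build_state_space(tree, "Root")
# With N = 4 feature nodes the state space is a 4x4 matrix, as in the text.
```

Each row is one legal binary state vector, so a classifier output can be checked against the rows of this matrix to reject label combinations that violate the tree.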
The specific process of step (2-3) is:

(2-3-1) From the state space of step (2-2-2), compute the number of levels of the feature tree and the number of features per level; these determine the network structure of the hierarchical residual multi-granularity classification network model. The model comprises a hierarchical feature transfer module, a residual connection part, and a hierarchical classification module.

(2-3-2) Build the hierarchical feature transfer module, composed of several fully connected or convolutional networks, which extracts features at each level and passes them downwards. Build the residual connection part: by connecting the coarse-grained parent-level feature layers to the fine-grained child-level feature layers in sequence, a combined feature vector is obtained in which parent-level feature information is merged level by level down into the child levels.

(2-3-3) Apply the hierarchical classification module to the combined feature vector: the combined feature vector passes through the classification network of each level, and each dimension of a classification network's output corresponds to one feature label.

(2-3-4) Set a threshold per level, and determine the hierarchical classification of the text from the vector output by each level's classification network.
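The threshold-and-constraint decoding of step (2-3-4) can be sketched as follows. The level labels, scores, and thresholds are invented toy values, and the exact decoding rule used by the invention may differ; the sketch only shows one plausible way to combine per-level thresholds with the tree constraint.

```python
# Hypothetical sketch of step (2-3-4): turn per-level sigmoid outputs into a
# hierarchical label path, rejecting labels that violate the feature tree.

def decode_hierarchy(level_outputs, level_labels, thresholds, children):
    """Pick, per level, the highest-scoring label above that level's
    threshold, and keep it only if it is a child of the previous pick."""
    path = []
    parent = None
    for scores, labels, t in zip(level_outputs, level_labels, thresholds):
        best = max(range(len(scores)), key=lambda i: scores[i])
        if scores[best] < t:
            break                                  # no confident label: stop here
        label = labels[best]
        # Reject labels that violate the parent/child constraint of the tree.
        if parent is not None and label not in children.get(parent, []):
            break
        path.append(label)
        parent = label
    return path

children = {"UI": ["Login", "Settings"], "Network": ["Wifi"]}
level_labels = [["UI", "Network"], ["Login", "Settings", "Wifi"]]
outputs = [[0.9, 0.2], [0.7, 0.1, 0.6]]            # per-level sigmoid scores
print(decode_hierarchy(outputs, level_labels, [0.5, 0.5], children))
# -> ['UI', 'Login']
```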
Step (3) specifically includes:

(3-1) According to the classification of each commit message text, select from D_testcase the subset of test cases under that classification; with a fine-tuned BERT model, convert each commit message text in D_commit and the texts of the corresponding test case subset into high-dimensional vectors.

(3-2) Using the high-dimensional text vectors obtained in step (3-1), compute the similarity between the vector representation of the commit message text and the vector representation of each test case description in the corresponding subset, obtaining a list of recommended test cases.

The specific process of step (3-1) is:

(3-1-1) Select a BERT model and fine-tune it on annotated pairs of commit message texts and test case texts.

(3-1-2) According to the classification of each commit message text, select from D_testcase the subset of test cases under that classification; using the tokenizer that matches the BERT model, convert the commit message text and the texts of the corresponding subset into token lists and the corresponding attention mask lists.

(3-1-3) Feed the token lists and mask lists from step (3-1-2) into the BERT model fine-tuned in step (3-1-1) to obtain high-dimensional vectors for the commit message and for each test case description.

(3-1-4) Take the pooled output of the BERT model as the vector representation of the corresponding text.
The specific process of step (3-2) is:

(3-2-1) Compute the similarity between the vector representation of the commit message and that of each test case description in the corresponding subset, using the cosine similarity formula:

sim(u, v) = (u · v) / (‖u‖ ‖v‖)

(3-2-2) According to the similarity scores from step (3-2-1), select the test cases with higher similarity as the recommended test cases, and output the recommended test case list.
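A minimal sketch of the cosine ranking of steps (3-2-1) and (3-2-2), with toy three-dimensional vectors standing in for the BERT embeddings; the test case ids and vector values are invented for illustration:

```python
# Hypothetical sketch: rank test cases by cosine similarity to a commit
# message vector. Real vectors would come from the fine-tuned (S)BERT model.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def recommend(commit_vec, case_vecs, top_k=2):
    """Return the top_k test case ids sorted by descending cosine similarity."""
    ranked = sorted(case_vecs, key=lambda c: cosine(commit_vec, case_vecs[c]), reverse=True)
    return ranked[:top_k]

commit_vec = [1.0, 0.0, 1.0]
case_vecs = {"TC-1": [1.0, 0.1, 0.9], "TC-2": [0.0, 1.0, 0.0], "TC-3": [0.5, 0.5, 0.5]}
print(recommend(commit_vec, case_vecs))
# -> ['TC-1', 'TC-3']
```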
Compared with the prior art, the present invention has the following beneficial effects:

The present invention designs a two-stage test case selection framework. In the first stage, based on the semantics of the commit message, a hierarchical residual multi-granularity classification model built on the label tree structure classifies the commit under the corresponding test case feature labels, thereby filtering a subset of test cases out of a massive test case set; this narrows the range of test cases that must be matched in the second stage and speeds up test case selection. In the second stage, based on the commit message semantics and the test case descriptions, an SBERT model computes the semantic similarity between the commit message and the descriptions of the test cases selected in the first stage, and outputs a recommended test case list ranked by change-semantic relevance according to the similarity scores. Through processing and intelligent analysis of commit message texts, the method provides an efficient and accurate way to automatically select relevant regression test cases during software development, and is well suited to rapidly iterating software projects.
Brief Description of the Drawings

FIG. 1 is a flow chart of the change-semantics-aware whole-machine regression test case recommendation method of the present invention;

FIG. 2 is a schematic diagram of the residual connection part of the hierarchical residual multi-granularity classification network model of the present invention;

FIG. 3 is a flow chart of step S200 in an embodiment of the present invention;

FIG. 4 is a flow chart of step S300 in an embodiment of the present invention.
Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be noted that the embodiments below are intended to facilitate understanding of the present invention and impose no limitation on it.

As shown in FIG. 1, a change-semantics-aware whole-machine regression test case recommendation method includes the following steps.

S100, data preprocessing: collect the commit message texts from historical versions to form the raw dataset, then clean the dataset and output the dataset D_commit used by the following steps. Obtain the cleaned test case description dataset D_testcase by the same procedure.

In this step, the specific process of cleaning the raw dataset is:

S101, collect the texts from the database.

S102, remove irrelevant content such as special characters and extraneous whitespace from the text.

S103, normalize the text: first unify the case of Latin letters, then extract the key fields according to the commit message format template.

S104, output the cleaned and normalized text data.
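A minimal sketch of the cleaning pipeline S101-S104, assuming a simple "[module] description" commit template; the regular expressions and the field names "module" and "desc" are illustrative assumptions, not part of the patent:

```python
# Hypothetical sketch of S101-S104: clean and normalize one commit message.
import re

def clean_commit_message(raw):
    text = re.sub(r"[^\w\s:\[\]-]", " ", raw)     # S102: drop special characters
    text = re.sub(r"\s+", " ", text).strip()      # S102: collapse whitespace
    text = text.lower()                           # S103: unify letter case
    # S103: extract key fields from an assumed "[module] description" template.
    m = re.match(r"\[(?P<module>[^\]]+)\]\s*(?P<desc>.*)", text)
    if m:
        return {"module": m.group("module"), "desc": m.group("desc")}
    return {"module": "", "desc": text}

print(clean_commit_message("[WiFi] Fix reconnect *** bug!!"))
# -> {'module': 'wifi', 'desc': 'fix reconnect bug'}
```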
S200, obtain the structure of the feature label tree of the test case library, and convert the tree structure into label relationship state vectors for building the classification network model. A hierarchical residual multi-granularity classification network model based on the label relationship tree classifies the commit message texts from step S100 into hierarchically organized feature labels. A feature label is a descriptive label for the feature exercised by a test case, represented as a tree-like multi-level label. The model in this step must be trained on a certain amount of commit message text data annotated with feature labels, to adapt it to the specific environment of the project at hand.

The specific procedure of S200 is shown in FIG. 3 and includes:

S210, text vectorization module (BERT model): convert the output texts of step S100 into high-dimensional vectors with a fine-tuned BERT model.

S220, label relationship tree construction module: convert the label hierarchy of the test case library into a tree structure and generate binary-label state vectors.

S230, hierarchical classification module: feed the high-dimensional text vectors from step S210 into the classification model, constrain them with the state vectors from step S220, perform hierarchical feature classification, and determine the hierarchical feature label of each commit message text.

Step S210 specifically includes:

S211, select a suitable pre-trained BERT model and fine-tune it on commit message texts annotated with features, to adapt it to the project-specific text data.

S212, pass the texts output by step S100 through the tokenizer matching the BERT model, converting each text into a token list and the corresponding attention mask list.

S213, feed the token lists and mask lists from step S212 into the BERT model to obtain high-dimensional vectors.

S214, take the pooled output of the BERT model as the vector representation of the text and pass it to the next step.
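The token/mask lists of step S212 can be illustrated with a toy whitespace tokenizer. A real implementation would use the tokenizer shipped with the chosen BERT model, whose WordPiece vocabulary differs from the invented toy vocabulary here; only the output shape (padded id list plus attention mask) is the point of the sketch.

```python
# Illustrative sketch of S212's output: a text becomes a token id list plus
# an attention mask padded to a fixed length. Toy vocabulary, not BERT's.

VOCAB = {"[PAD]": 0, "[UNK]": 1, "fix": 2, "wifi": 3, "reconnect": 4, "bug": 5}

def tokenize(text, max_len=6):
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text.lower().split()]
    ids = ids[:max_len]
    mask = [1] * len(ids)
    pad = max_len - len(ids)             # mask is 0 over padding positions
    return ids + [0] * pad, mask + [0] * pad

ids, mask = tokenize("Fix WiFi reconnect bug")
# ids  -> [2, 3, 4, 5, 0, 0]
# mask -> [1, 1, 1, 1, 0, 0]
```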
Step S220 specifically includes:

S221, extract the hierarchical information of the feature tree to form a multi-level feature label tree. Design a structured format to store this hierarchical information. A feature label is a descriptive label for the feature exercised by a test case, represented as a tree-like multi-level label. A test case may be assigned to a node at any level of the feature tree, whether a leaf node or an intermediate node; in most application scenarios, different feature nodes represent different test case classes.

S222, from the feature tree of step S221, compute all binary-label state vectors that satisfy the tree constraints, forming the legal state space. If the total number of feature nodes is N, traverse the feature tree to form an N×N matrix in which each column represents a feature node and each row represents a possible legal path through the feature tree, so that the binary-valued hierarchical category labels represent feature node labels. The state space thus defines the legal constraints on parent-child and sibling relationships.
其中步骤S230具体包括:Wherein step S230 specifically includes:
S231,计算构建网络所需要的层级属性,根据步骤S222中的状态空间计算特征树的层数、每层特征的数量用于构建层级残差网络结构。S231, calculate the hierarchical attributes required for building the network, and calculate the number of layers of the feature tree and the number of features in each layer according to the state space in step S222 to build a hierarchical residual network structure.
S232,建立层次特征传递模块,该模块由若干全连接网络或者卷积网络构成,用于不同层次之间特征提取以及向下传递。建立残差连接部分,如图2所示,通过依次连接的粗粒度父类层级特征与细粒度子类层级特征层,得到由父类特征信息逐层向下结合到子类的组合特征向量。S232, establish a hierarchical feature transfer module, which is composed of several fully connected networks or convolutional networks, and is used for feature extraction and downward transfer between different levels. Establish a residual connection part, as shown in Figure 2, by sequentially connecting the coarse-grained parent-class level feature and the fine-grained child-class level feature layer, to obtain a combined feature vector that is combined from the parent-class feature information layer by layer to the child class.
S233, feed the combined feature vector of step S232 into the hierarchical classification module, passing it through the classification network of each level. The number of feature labels at each level, determined from the feature tree, fixes the fully connected layer structure of the classification model. The computation of one fully connected layer is:
x_{i+1} = σ(W_i · x_i + b_i)
where x_{i+1} is the output vector, σ is the activation function, W_i is the weight matrix of the current layer, x_i is the input vector, and b_i is the bias. The Sigmoid function is used as the activation function, and each dimension of a classification network's output corresponds to one feature label.
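The per-layer computation can be written directly from the formula; `fc_layer` below is a plain-Python sketch in which each output dimension is the sigmoid score of one feature label at that level:

```python
import math

def sigmoid(z):
    """Sigmoid activation, as specified for the classification layers."""
    return 1.0 / (1.0 + math.exp(-z))

def fc_layer(x, W, b):
    """x_{i+1} = sigmoid(W_i . x_i + b_i); one output dimension per
    feature label at the current level."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]
```

Because sigmoid squashes each dimension independently into (0, 1), the outputs behave as per-label scores rather than a softmax distribution, which is what the per-level thresholding of S234 relies on.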
S234, determine the hierarchical classification of the text from the vectors output by each level's classification network. A threshold is first set for each level; when the value of some dimension of a level's output vector from step S233 exceeds that level's threshold, the text is classified under the feature label corresponding to that dimension. The tree structure is then used to check whether the assigned feature labels are consistent, and finally the overall hierarchical classification label is formed.
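A sketch of the S234 decision rule, assuming that tree consistency means a label is kept only when its parent label was also kept (the patent states the check but not its exact form); label names and thresholds are illustrative:

```python
# Sketch of S234: per-level thresholding plus a tree consistency check.
# Assumption: a label survives only if its score clears the level
# threshold AND its parent label survived at the coarser level.
def assign_labels(level_scores, thresholds, parent_of):
    """level_scores: one {label: score} dict per level, coarse to fine.
    thresholds: one threshold per level.
    parent_of: maps each label to its parent label (None at the top)."""
    kept = set()
    for scores, thr in zip(level_scores, thresholds):
        for label, score in scores.items():
            parent = parent_of.get(label)
            if score >= thr and (parent is None or parent in kept):
                kept.add(label)
    return kept
```

The returned set is the overall hierarchical classification label: every kept fine-grained label is guaranteed to lie on a path whose coarser labels were also kept.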
S300, according to the classification of each commit message text from step S200, filter out the subset of test cases under that classification from the test case description data set D_testcase output in step S100; compute the similarity between the commit message text and each test case text with the SBERT model, and select the cases with high similarity scores as the recommended test cases. The SBERT model in this step must be fine-tuned on a certain amount of labeled commit-message/test-case text pairs to adapt to the specific environment of the project.
As shown in Figure 4, the specific steps of S300 include:
S310, convert the commit message from step S100 and the text data of the filtered test case subset into high-dimensional vectors with the fine-tuned BERT model.
S320, using the high-dimensional text vectors obtained in step S310, compute the similarity between the commit message's vector representation and each test case description's vector representation to obtain the recommended test case list.
Step S310 specifically includes:
S311, select the same pre-trained model as in step S211 and fine-tune it on labeled commit-message/test-case text pairs to adapt it to the project-specific text data.
S312, pass the text output in step S100 through the tokenizer corresponding to the BERT model, converting the commit message text and test case text into token lists and their corresponding mask lists.
S313, feed the token lists and mask lists from step S312 into the BERT model, converting them into high-dimensional vectors for the commit message and for the test case descriptions.
S314, take the pooled vector output of the BERT model as the vector representation of the corresponding text.
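The pooling of S314 might be sketched as mask-aware mean pooling, SBERT's default strategy; whether the patent's pooled vector is the mean-pooled output or the [CLS] vector is not specified, so this is an assumption:

```python
# Sketch of S314: collapse per-token BERT output vectors into one
# sentence vector via mask-weighted mean pooling (SBERT's default).
# Padding positions (mask == 0) are excluded from the average.
def mean_pool(token_vectors, attention_mask):
    dim = len(token_vectors[0])
    total = [0.0] * dim
    n = 0
    for vec, m in zip(token_vectors, attention_mask):
        if m:  # skip padding positions
            n += 1
            for i, v in enumerate(vec):
                total[i] += v
    return [t / n for t in total]
```

The resulting fixed-length vector is what the cosine similarity of S320 compares, regardless of how long the original commit message or test case description was.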
Step S320 specifically includes:
S321, compute the similarity between the commit message's vector representation and each test case description's vector representation, where the similarity computation uses the cosine similarity formula: sim(u, v) = (u · v) / (‖u‖ ‖v‖), with u and v the two text vectors.
S322, according to the similarity scores from step S321, select the test cases with the highest similarity as the recommended test cases and output the recommended test case list.
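Steps S321 and S322 amount to cosine ranking plus top-k selection; a self-contained sketch (the value of `top_k` and the use of a plain descending sort are illustrative choices, not fixed by the patent):

```python
import math

def cosine(u, v):
    """sim(u, v) = (u . v) / (||u|| * ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def recommend(commit_vec, case_vecs, top_k=2):
    """Rank test cases by cosine similarity to the commit-message vector
    and return the ids of the top_k most similar cases."""
    ranked = sorted(case_vecs.items(),
                    key=lambda kv: cosine(commit_vec, kv[1]),
                    reverse=True)
    return [case_id for case_id, _ in ranked[:top_k]]
```

In practice `commit_vec` and the values of `case_vecs` would be the pooled BERT vectors from S310; here any numeric vectors of matching dimension demonstrate the ranking.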
Embodiments of the present invention can effectively improve the efficiency and accuracy of test case selection. Through intelligent text analysis and similarity computation, the method can quickly identify, among a large number of test cases, those closely related to the most recent code commit, significantly improving the quality and reliability of software testing.
The embodiments described above explain the technical solutions and beneficial effects of the present invention in detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit it; any modification, supplement, or equivalent substitution made within the scope of the principles of the present invention shall fall within its protection scope.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410172283.1A CN117971684B (en) | 2024-02-07 | 2024-02-07 | A semantically-aware whole-machine regression test case recommendation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117971684A true CN117971684A (en) | 2024-05-03 |
CN117971684B CN117971684B (en) | 2024-08-23 |
Family
ID=90865760
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117971684B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190034323A1 (en) * | 2017-07-27 | 2019-01-31 | Hcl Technologies Limited | System and method for generating regression test suite |
CN111708703A (en) * | 2020-06-18 | 2020-09-25 | 深圳前海微众银行股份有限公司 | Test case set generation method, apparatus, device, and computer-readable storage medium |
CN112052160A (en) * | 2020-08-06 | 2020-12-08 | 中信银行股份有限公司 | Code case obtaining method and device, electronic equipment and medium |
WO2022095354A1 (en) * | 2020-11-03 | 2022-05-12 | 平安科技(深圳)有限公司 | Bert-based text classification method and apparatus, computer device, and storage medium |
US20220156175A1 (en) * | 2020-11-19 | 2022-05-19 | Ebay Inc. | Mapping of test cases to test data for computer software testing |
US20230195773A1 (en) * | 2019-10-11 | 2023-06-22 | Ping An Technology (Shenzhen) Co., Ltd. | Text classification method, apparatus and computer-readable storage medium |
CN116340159A (en) * | 2023-03-14 | 2023-06-27 | 平安银行股份有限公司 | Regression test case recommendation method, system, equipment and storage medium |
CN116662184A (en) * | 2023-06-05 | 2023-08-29 | 福建师范大学 | A method and system for screening fuzzy test cases of industrial control protocols based on Bert |
CN116775451A (en) * | 2022-12-30 | 2023-09-19 | 广东亿迅科技有限公司 | Intelligent scoring method and device for test cases, terminal equipment and computer medium |
CN117493174A (en) * | 2023-10-23 | 2024-02-02 | 中移互联网有限公司 | Test case determination and cloud disk regression test method and device |
Non-Patent Citations (1)
Title |
---|
Synced (机器之心): "CVPR 2022 | Zhejiang University and Ant Group propose a hierarchical residual multi-granularity classification network based on a label relation tree, modeling hierarchical knowledge across multi-granularity labels", Retrieved from the Internet <URL:《https://cloud.tencent.com/developer/article/2034527》> *
Also Published As
Publication number | Publication date |
---|---|
CN117971684B (en) | 2024-08-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||