CN117971684A - Whole machine regression test case recommendation method capable of changing semantic perception - Google Patents
- Publication number: CN117971684A (application CN202410172283.1A)
- Authority
- CN
- China
- Prior art keywords
- test case
- text
- feature
- hierarchical
- information
- Prior art date
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F11/3668—Software testing (preventing errors by testing or debugging software)
- G06F16/322—Information retrieval; indexing structures: trees
- G06F16/3344—Query execution using natural language analysis
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06F16/353—Clustering; classification into predefined classes
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/08—Neural network learning methods
Abstract
The invention discloses a change-semantics-aware whole-machine regression test case recommendation method, which comprises the following steps: (1) acquire code commit messages and test cases, and clean them to obtain a text dataset D_commit of the commit messages and a description dataset D_case of the test cases; (2) classify each commit-message text in D_commit into different feature tags according to a hierarchical structure, using a hierarchical residual multi-granularity classification network model based on a tag relation tree, where a feature tag is a description tag of the function the test case tests; (3) for each commit-message text, screen the test cases under its classification from D_case, compute the similarity between the commit-message text and each screened test case text, and select the test cases with high similarity scores as recommended test cases. The method effectively improves the efficiency and accuracy of test case selection.
Description
Technical Field
The invention belongs to the field of software regression testing, and particularly relates to a change-semantics-aware whole-machine regression test case recommendation method.
Background
Regression testing is a key step in software development and release. It is performed after existing code is modified to ensure that the modifications do not introduce new defects or affect the normal operation of other functions. Automated regression testing can significantly reduce the overhead of system testing, maintenance, and updating.
As an important component of the software lifecycle, regression testing plays a significant role in the overall software testing process and is repeatedly performed at each stage of software development. Particularly in incremental and fast-iteration development modes, frequent release of new versions requires running regression tests more often; in projects using extreme programming methods, multiple rounds of regression testing may be required daily. Against this background, selecting a proper regression testing strategy is important for improving the efficiency and effectiveness of regression testing. For example, Chinese patent document No. CN101178687A discloses an improved software regression testing method.
As product functions grow and iterate, the scale of test case sets keeps expanding, which directly increases the cost of regression testing. Given limited testing resources, it is impossible to execute all test cases. To improve the efficiency of regression testing, a more scientific regression testing strategy is urgently needed: test cases must be effectively ordered according to a series of well-defined test objectives to determine their execution order, ensuring that the most critical test cases are executed first.
The paper "Can Code Representation Boost IR-Based Test Case Prioritization?" proposes a method that recommends test cases using the semantic similarity between changed code and test code. It first converts the changed code and the test code into vectors with a code representation method, then computes the similarity between each test method and the changed code at two granularities (method level and class level), and finally, combining the similarities at both granularities, outputs the test cases most similar to the changed code.
The paper "Prioritizing Natural Language Test Cases Based on Highly-Used Game Features" recommends manual test cases described in natural language through zero-shot classification and multi-objective genetic optimization. It first uses zero-shot classification to automatically identify, from the natural-language test case descriptions, the functions each test case can test; it then prioritizes the test cases according to the frequently used functions they cover and their execution time, using a genetic algorithm to optimize the two objectives of covering many functions and keeping execution time short, yielding a better recommendation ranking.
While the prior art provides basic approaches to regression test case selection, it suffers from several significant drawbacks:
1. Scenario mismatch: most prioritization methods are not applicable to manual test cases, because they require source code information or test execution history to support their algorithms, which is generally unavailable in manual testing scenarios.
2. Insufficient understanding of change semantics: code commit messages contain important semantic information about code changes, which is critical for identifying affected functions and the corresponding test cases. The prior art lacks efficient mechanisms to interpret this semantic information, so test case selection may not be sufficiently accurate or relevant.
Therefore, a more efficient and intelligent technical scheme is needed, one that can understand the semantic information of code commits and automatically recommend related manual test cases, so as to improve the accuracy and efficiency of software testing.
Disclosure of Invention
The invention provides a change-semantics-aware whole-machine regression test case recommendation method that effectively improves the efficiency and accuracy of test case selection.
A change-semantics-aware whole-machine regression test case recommendation method comprises the following steps:
(1) Acquire code commit messages and test cases, and clean the data to obtain a text dataset D_commit of the commit messages and a description dataset D_case of the test case descriptions;
(2) Classify each commit-message text in D_commit into different feature tags according to a hierarchical structure, using a hierarchical residual multi-granularity classification network model based on a tag relation tree;
A feature tag is a description tag of the function the test case tests; its description form is a tree-shaped multi-level tag representation;
(3) For each commit-message text, screen the subset of test cases under its classification from D_case, compute the similarity between the commit-message text and the texts of that test case subset, and select the test cases with high similarity scores as recommended test cases.
The step (2) specifically comprises:
(2-1) Convert each commit-message text in D_commit into a high-dimensional vector using a fine-tuned BERT model;
(2-2) Convert the tag hierarchy of the description dataset D_case into a tree structure, and generate binary-tag-based state vectors;
(2-3) Input the text vectors obtained in step (2-1) into the hierarchical residual multi-granularity classification network model, and perform hierarchical feature classification constrained by the state vectors from step (2-2), determining the hierarchical feature tag of each commit-message text.
The specific process of the step (2-1) is as follows:
(2-1-1) Select a BERT model and fine-tune it using commit-message texts annotated with feature tags;
(2-1-2) Convert each commit-message text in D_commit into a token list and a corresponding attention mask list using the tokenizer matching the BERT model;
(2-1-3) Input the token list and mask list into the BERT model fine-tuned in step (2-1-1) to obtain a high-dimensional vector;
(2-1-4) Take the pooled vector output of the BERT model as the vector representation of the commit-message text.
The specific process of the step (2-2) is as follows:
(2-2-1) Design a structured format to store the hierarchy information of the feature tree; feature tags are description tags of the functions the test cases test, represented as tree-shaped multi-level tags. Test cases may be assigned to nodes at any layer of the feature tree; different feature nodes, whether leaf or intermediate, represent different test case classifications;
(2-2-2) From the feature tree, compute all binary-tag-based state vectors that satisfy the feature tree constraints, forming the legal state space. If the total number of feature nodes is N, an N×N matrix is formed: each column corresponds to a feature node (obtained by traversing the feature tree), and each row represents a possible legal path in the feature tree, so that binary-valued hierarchical class labels represent the feature node tags. The state space thus defines the legality constraints on the parent-child relations between nodes.
The specific process of the step (2-3) is as follows:
(2-3-1) From the state space of step (2-2-2), compute the number of layers of the feature tree and the number of features per layer, and construct the network structure of the hierarchical residual multi-granularity classification network model, which comprises a hierarchical feature transfer module, a residual connection part, and a hierarchical classification module;
(2-3-2) Build the hierarchical feature transfer module from several fully connected or convolutional networks, used for feature extraction and downward transfer between levels; build the residual connection part, which sequentially connects coarse-granularity parent-class feature layers to fine-granularity subclass feature layers, yielding combined feature vectors that merge parent-class feature information downward, layer by layer, into the subclasses;
(2-3-3) Feed the combined feature vectors into the hierarchical classification module: each combined feature vector passes through the classification network of its level, and each output dimension of the classification network corresponds to one feature tag;
(2-3-4) Set a threshold for each level, and determine the hierarchical classification of the text from the vectors output by each level's classification network.
The step (3) specifically comprises:
(3-1) For each commit-message text, screen the subset of test cases under its classification from D_case; convert the commit-message text and the texts of the corresponding test case subset into high-dimensional vectors using the fine-tuned BERT model;
(3-2) Using the vectors obtained in step (3-1), compute the similarity between the vector representation of the commit-message text and the vector representation of each test case description in the corresponding subset, and obtain a recommended test case list.
The specific process of the step (3-1) is as follows:
(3-1-1) Select a BERT model and fine-tune it on annotated pairs of commit messages and test case texts;
(3-1-2) For each commit-message text, screen the subset of test cases under its classification from D_case; convert the commit-message text and the texts of the corresponding subset into token lists and attention mask lists using the tokenizer matching the BERT model;
(3-1-3) Input the token lists and mask lists of step (3-1-2) into the BERT model fine-tuned in step (3-1-1) to obtain high-dimensional vectors of the commit message and of each test case description;
(3-1-4) Take the pooled vector output of the BERT model as the vector representation of the corresponding text.
The specific process of the step (3-2) is as follows:
(3-2-1) Compute the similarity between the vector representation of the commit message and the vector representation of each test case description in the corresponding subset, using the cosine similarity formula: sim(u, v) = (u · v) / (‖u‖ ‖v‖);
(3-2-2) According to the similarity scores from step (3-2-1), select the test cases with higher similarity as recommended test cases and output the recommended test case list.
Compared with the prior art, the invention has the following beneficial effects:
The invention designs a two-stage test case selection framework. In the first stage, based on the semantic information of the code commit, a hierarchical residual multi-granularity classification model built on the tag tree structure classifies the commit into test case feature tags, screening a test case subset out of a large test case set; this narrows the range of test cases that must be matched in the second stage and speeds up test case selection. In the second stage, an SBERT model computes the semantic similarity between the commit's semantic information and the descriptions of the test case subset screened in the first stage, and outputs a recommended test case list ordered by change-semantic relevance according to the similarity scores. Through the processing and intelligent analysis of commit-message text, the method provides an efficient and accurate way to automatically select relevant regression test cases during software development, and is well suited to fast-iterating software projects.
Drawings
FIG. 1 is a flowchart of the change-semantics-aware whole-machine regression test case recommendation method;
FIG. 2 is a schematic diagram of the residual connection part of the hierarchical residual multi-granularity classification network model of the present invention;
FIG. 3 is a flowchart of step S200 in an embodiment of the present invention;
FIG. 4 is a flowchart of step S300 in an embodiment of the present invention.
Detailed Description
The invention will be described in further detail with reference to the drawings and examples. It should be noted that the examples below are intended to facilitate understanding of the invention and do not limit it in any way.
As shown in FIG. 1, a change-semantics-aware whole-machine regression test case recommendation method includes the following steps:
S100, data preprocessing: acquire the texts of code commit messages from historical versions to form a raw dataset, then clean this dataset and output the dataset D_commit used in the following steps. The cleaned description dataset D_case of the test cases is obtained in the same way.
In this step, the specific process of cleaning the raw dataset is:
S101, collect the texts from a database.
S102, remove irrelevant content such as special characters and redundant spaces from the text.
S103, normalize the text: first unify the case of English letters, then extract key fields according to the template of the commit-message format.
S104, output the cleaned and normalized text data.
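A minimal Python sketch of the cleaning steps S101-S104 (the function name and the regular expressions are illustrative assumptions, and the template-based key-field extraction of S103 is project-specific and omitted):

```python
import re

def clean_commit_message(text: str) -> str:
    """Sketch of steps S102-S103: unify letter case, strip special
    characters, and collapse redundant whitespace."""
    text = text.lower()                       # unify the case of English letters
    text = re.sub(r"[^\w\s]", " ", text)      # remove special characters
    text = re.sub(r"\s+", " ", text).strip()  # collapse redundant whitespace
    return text
```

For example, `clean_commit_message("Fix:  NULL-pointer in Parser!!")` yields `"fix null pointer in parser"`.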
S200, obtain the structure of the feature tag tree from the case library, and convert the tree structure into tag-relation state vectors for constructing the classification network model. Classify the commit-message text data from step S100 into different feature tags according to a hierarchical structure, using the hierarchical residual multi-granularity classification network model based on the tag relation tree. A feature tag is a description tag of the function a test case tests, represented as a tree-shaped multi-level tag. The model in this step must be trained on a certain amount of feature-annotated commit-message text data to adapt to the specific environment of a given project.
The specific steps of S200 are shown in fig. 3, and specifically include:
S210, the text vectorization module converts the text output of step S100 into high-dimensional vectors through a fine-tuned BERT model.
S220, the tag relation tree construction module converts the tag hierarchy of the case library into a tree structure and generates binary-tag-based state vectors.
S230, the hierarchical classification module inputs the text vectors from step S210 into the classification model and, constrained by the state vectors from step S220, performs hierarchical feature classification to determine the hierarchical feature tag of each commit-message text.
The step S210 specifically includes:
S211, select a suitable BERT pre-trained model and fine-tune it on feature-annotated commit-message text data to adapt it to the project-specific text.
S212, convert the text output of step S100 into a token list and a corresponding attention mask list using the tokenizer matching the BERT model.
S213, input the token list and mask list of step S212 into the BERT model and convert them into high-dimensional vectors.
S214, take the pooled vector output of the BERT model as the vector representation of the text and pass it to the next step.
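To illustrate how a single sentence vector is pooled from per-token outputs, here is a mask-aware mean-pooling sketch in plain Python. This is an SBERT-style alternative to BERT's built-in pooler output named in S214, shown only to convey the pooling idea; the list-based vectors stand in for real tensors.

```python
def mean_pool(token_vectors, mask):
    """Mask-aware mean pooling: average only the real-token vectors.
    token_vectors is a list of per-token embeddings; mask marks real
    tokens (1) versus padding (0)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(token_vectors, mask):
        if m == 1:
            count += 1
            for i in range(dim):
                total[i] += vec[i]
    return [t / count for t in total]
```

Padding positions (mask 0) contribute nothing, so padded batches and unpadded single sentences yield the same representation.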
The step S220 specifically includes:
S221, extract the hierarchy information of the feature tree to form a multi-level feature tag tree. A structured format is designed to store this hierarchy information; feature tags are description tags of the functions the test cases test, represented as tree-shaped multi-level tags. Test cases may be assigned to nodes at any layer of the feature tree, i.e. both leaf nodes and intermediate nodes; in most application scenarios, different feature nodes represent different test case classifications.
S222, from the feature tree of step S221, compute all binary-tag-based state vectors that satisfy the feature tree constraints, forming the legal state space. If the total number of feature nodes is N, an N×N matrix is formed: each column corresponds to a feature node (obtained by traversing the feature tree), and each row represents a possible legal path in the feature tree, so that binary-valued hierarchical class labels represent the feature node tags. The state space thus defines the legality constraints on the parent-child relations between nodes.
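The state-space construction of S222 can be sketched as follows (the parent-map representation of the feature tree is an illustrative assumption): each node induces one legal binary state vector in which the node and all of its ancestors are set to 1.

```python
def legal_states(parent, n_nodes):
    """Enumerate the binary state vectors consistent with a feature tree:
    selecting a node implies selecting all its ancestors. `parent` maps
    each node index to its parent index (None for the root)."""
    states = []
    for node in range(n_nodes):
        vec = [0] * n_nodes
        cur = node
        while cur is not None:  # walk up to the root
            vec[cur] = 1
            cur = parent[cur]
        states.append(vec)
    return states  # N rows, one legal root-to-node path each
```

For a 4-node tree `{0: None, 1: 0, 2: 0, 3: 2}` (root 0 with children 1 and 2, node 3 under node 2), the state for node 3 is `[1, 0, 1, 1]`: node 3 together with its ancestors 2 and 0.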
The step S230 specifically includes:
S231, compute the hierarchy attributes needed to construct the network: from the state space of step S222, compute the number of layers of the feature tree and the number of features per layer, for building the hierarchical residual network structure.
S232, build the hierarchical feature transfer module from several fully connected or convolutional networks, used for feature extraction and downward transfer between levels. Build the residual connection part: as shown in FIG. 2, coarse-granularity parent-class feature layers are sequentially connected to fine-granularity subclass feature layers, yielding combined feature vectors that merge parent-class feature information downward, layer by layer, into the subclasses.
S233, feed the combined feature vectors of step S232 into the hierarchical classification module, where each combined feature vector passes through the classification network of its level. First, the number of feature tags per layer is determined from the feature tree, fixing the fully connected hierarchical structure of the classification model. One fully connected layer computes:
x_{i+1} = σ(W_i · x_i + b_i)
where x_{i+1} is the output vector, σ is the activation function, W_i is the weight matrix of the current layer, x_i is the input vector, and b_i is the bias. A sigmoid function is used as the activation function. Each dimension of the classification network output corresponds to one feature tag.
S234, determine the hierarchical classification of the text from the vectors output by each layer's classification network. First, set a threshold for each layer; when some dimension of a layer's output vector in step S233 exceeds that layer's threshold, the text is classified under the feature tag corresponding to that dimension. The resulting feature tags are then checked for consistency against the tree structure, finally forming the overall hierarchical classification tag.
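The per-level threshold decision of S234, including the consistency check against the tree structure, can be sketched as follows (label names, thresholds, and the dict-based structures are illustrative assumptions):

```python
def assign_labels(level_outputs, thresholds, parent_of):
    """Turn per-level classifier scores into a consistent hierarchical
    label set: a label is kept only if its score exceeds the level's
    threshold AND its parent label (if any) was also kept.
    level_outputs: one {label: score} dict per level, coarse to fine.
    parent_of: maps a label to its parent label (absent for top-level)."""
    kept = set()
    for scores, thr in zip(level_outputs, thresholds):
        for label, score in scores.items():
            parent = parent_of.get(label)  # None for top-level labels
            if score > thr and (parent is None or parent in kept):
                kept.add(label)
    return kept
```

Because levels are processed coarse to fine, a fine-grained label whose parent fell below its threshold is discarded, which is exactly the tree-structure legality check.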
S300, for each commit-message text classified in step S200, screen the subset of test cases under its classification from the description dataset D_case output in step S100; compute the similarity between the commit-message text and the test case texts through an SBERT model, and select the test cases with high similarity scores as recommended test cases. The SBERT-based model in this step must be fine-tuned on a certain amount of annotated pairs of commit-message texts and test case texts to adapt to the specific environment of a given project.
As shown in fig. 4, the specific steps of S300 include:
S310, convert the commit messages from step S100 and the texts of the screened test case subsets into high-dimensional vectors through the fine-tuned BERT model.
S320, using the text vectors from step S310, compute the similarity between the vector representation of the commit message and the vector representation of each test case description, and obtain a recommended test case list.
The step S310 specifically includes:
S311, select the same pre-trained model as in step S211 and fine-tune it on annotated pairs of commit messages and test case texts to adapt it to the project's text data.
S312, convert the text output of step S100 into token lists and corresponding attention mask lists using the tokenizer matching the BERT model.
S313, input the token lists and mask lists of step S312 into the BERT model and convert them into high-dimensional vectors of the commit message and of each test case description.
S314, take the pooled vector output of the BERT model as the vector representation of the corresponding text.
The step S320 specifically includes:
s321, calculating the similarity between the vector representation of the submitted information and the vector representation of each test case description, wherein the similarity calculation uses a cosine similarity formula:
S322, selecting the test cases with the highest similarity scores given in step S321 as recommended test cases, and outputting the recommended test case list.
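Assuming the fine-tuned BERT encoder has already produced the pooled vectors, the ranking in steps S321-S322 reduces to cosine similarity followed by sorting. A minimal sketch follows; the vectors and test case names are made-up illustrations, not data from the patent.

```python
# Minimal sketch of steps S321-S322: rank test-case description vectors
# against a commit-message vector by cosine similarity. The vectors here
# are hand-made stand-ins for pooled BERT outputs.
import numpy as np

def recommend(commit_vec, case_vecs, case_names, top_k=2):
    """Return the top_k test cases most similar to the commit vector."""
    commit = commit_vec / np.linalg.norm(commit_vec)
    cases = case_vecs / np.linalg.norm(case_vecs, axis=1, keepdims=True)
    scores = cases @ commit                   # cosine similarity per case
    order = np.argsort(scores)[::-1][:top_k]  # highest scores first
    return [(case_names[i], float(scores[i])) for i in order]

commit_vec = np.array([1.0, 0.0, 1.0])
case_vecs = np.array([[1.0, 0.0, 1.0],   # same direction as the commit
                      [0.0, 1.0, 0.0],   # orthogonal to the commit
                      [1.0, 1.0, 0.0]])  # partial overlap
ranked = recommend(commit_vec, case_vecs, ["tc_a", "tc_b", "tc_c"])
```

Normalizing both sides first makes the dot product equal to the cosine of the angle between the vectors, so identical directions score 1.0 and orthogonal ones score 0.0.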
The embodiment of the invention can effectively improve the efficiency and accuracy of test case selection. Through intelligent text analysis and similarity calculation, the method can rapidly identify, among a large number of test cases, those closely related to the latest code submission, thereby remarkably improving the quality and reliability of software testing.
The foregoing embodiments describe the technical solution and the advantages of the present invention in detail. It should be understood that the foregoing embodiments are merely illustrative of the present invention and are not intended to limit it; any modifications, additions and equivalents made within the scope of the principles of the present invention shall be included in the scope of the invention.
Claims (8)
1. The whole machine regression test case recommendation method for changing semantic perception is characterized by comprising the following steps of:
(1) Acquiring code submission information and a test case, and cleaning data to obtain a text data set D Submitting information of the code submission information and a description data set D Test case of the test case;
(2) Classifying each submitted information text in the text dataset D Submitting information into different feature tags according to a hierarchical structure by adopting a hierarchical residual multi-granularity classification network model based on a tag relation tree;
The feature tag represents a description tag of a corresponding function of the test case test, and the description form of the feature tag is a tree-shaped multi-level tag representation;
(3) Screening the test case subset under the classification from the description dataset D Test case according to the classification of each submitted information text, performing similarity calculation between the submitted information text and the texts of the test case subset, and selecting the test cases with high similarity scores as recommended test cases.
2. The whole machine regression test case recommendation method for changing semantic perception according to claim 1, wherein the step (2) specifically comprises:
(2-1) converting each submitted information text in the text dataset D Submitting information into a high-dimensional vector by means of a fine-tuned BERT model;
(2-2) converting the tag hierarchical relationship in the description dataset D Test case of the test case into a tree structure, and generating a binary tag-based state vector;
(2-3) inputting the text high-dimensional vector obtained in step (2-1) into the hierarchical residual multi-granularity classification network model, and performing hierarchical feature classification on the text high-dimensional vector under the constraint of the state vectors obtained in step (2-2) to determine the hierarchical feature label of each submitted information text.
3. The whole machine regression test case recommendation method for changing semantic perception according to claim 2, wherein the specific process of the step (2-1) is as follows:
(2-1-1) selecting a BERT model, and fine-tuning it with submitted information text data annotated with features;
(2-1-2) converting each submitted information text in the text dataset D Submitting information into a token list and a corresponding mask list by means of the tokenizer corresponding to the BERT model;
(2-1-3) inputting the token list and the corresponding mask list into the BERT model fine-tuned in step (2-1-1), which converts them into a high-dimensional vector;
(2-1-4) selecting the pooled vector output of the BERT model as the vector representation of the submitted information text.
4. The whole machine regression test case recommendation method for changing semantic perception according to claim 2, wherein the specific process of the step (2-2) is as follows:
(2-2-1) designing and storing the hierarchical information of a feature tree in a structured format, wherein the feature tags represent description tags of the functions tested by the test cases, and the description form of the feature tags is a tree-shaped multi-level tag representation; the test cases are assigned to nodes at any layer of the feature tree, and different feature nodes, whether leaf nodes or intermediate nodes, represent different test case classifications;
(2-2-2) computing, from the feature tree, all binary-label-based state vectors conforming to the constraints of the feature tree to form a legal state space; if the total number of feature nodes is N, an N×N matrix is formed by traversing the feature tree structure, where each column of the matrix represents a feature node and each row represents a possible legal path in the feature tree, so that hierarchical class labels taking binary values represent the feature node labels, i.e., the state space defines the legality constraint on the relation between a parent node and its child nodes.
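The state-space construction in step (2-2-2) can be sketched on a small, hypothetical feature tree (the tree shape and node names below are made up for illustration): each root-to-node path yields one binary state vector, and a node may be 1 only if all of its ancestors are 1, which is exactly the parent-child legality constraint the claim describes.

```python
# Hedged sketch of step (2-2-2): enumerate the legal binary state vectors
# of a feature tree. Each row corresponds to one root-to-node path, so a
# node is marked 1 only together with all of its ancestors.
def legal_state_vectors(parent):
    """parent maps each node name -> its parent name (None for the root)."""
    nodes = sorted(parent)                   # fixed column order over N nodes
    index = {n: i for i, n in enumerate(nodes)}
    rows = []
    for node in nodes:
        vec = [0] * len(nodes)
        cur = node
        while cur is not None:               # mark the node and its ancestors
            vec[index[cur]] = 1
            cur = parent[cur]
        rows.append(vec)
    return nodes, rows

# Hypothetical tree: root -> {net, ui}, net -> {wifi}
tree = {"root": None, "net": "root", "ui": "root", "wifi": "net"}
nodes, space = legal_state_vectors(tree)
```

With four nodes this yields a 4×4 matrix of legal paths; a vector such as "wifi without net" never appears in the space, which is how the legality constraint rules out inconsistent label combinations.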
5. The whole machine regression test case recommendation method for changing semantic perception according to claim 4, wherein the specific process of the step (2-3) is as follows:
(2-3-1) calculating the number of layers of the feature tree and the number of features of each layer according to the state space in the step (2-2-2), and constructing a network structure of the hierarchical residual multi-granularity classification network model; the hierarchical residual multi-granularity classification network model comprises a hierarchical feature transfer module, a residual connection part and a hierarchical classification module;
(2-3-2) establishing a hierarchical feature transfer module which is composed of a plurality of fully connected networks or convolution networks and is used for feature extraction and downward transfer between different hierarchies; establishing a residual connection part, and obtaining a combined feature vector which is downwards combined to the subclasses layer by layer from the parent feature information through sequentially connecting coarse-granularity parent class level features and fine-granularity subclass level feature layers;
(2-3-3) applying a hierarchical classification module according to the combined feature vector, wherein the combined feature vector passes through a classification network of each hierarchical level, and each dimension output by the classification network corresponds to each feature label;
(2-3-4) setting a hierarchical threshold, and determining hierarchical classification of the text according to the vector output by each hierarchical classification network.
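Steps (2-3-3) and (2-3-4) can be illustrated as follows, with hypothetical per-level scores and label names: each hierarchy level outputs one score per feature label, a threshold turns scores into label decisions, and a child label is kept only when its parent label was also kept.

```python
# Illustrative sketch of steps (2-3-3)-(2-3-4): threshold each level's
# classification scores and keep a child label only if its parent survived.
# Scores, labels, and the 0.5 threshold are assumptions for this sketch.
def decode_hierarchy(level_scores, parents, threshold=0.5):
    """level_scores: list of {label: score} dicts, ordered coarse to fine."""
    kept = set()
    result = []
    for scores in level_scores:
        level_labels = []
        for label, score in scores.items():
            parent = parents.get(label)
            if score >= threshold and (parent is None or parent in kept):
                level_labels.append(label)
                kept.add(label)
        result.append(level_labels)
    return result

scores = [{"net": 0.9, "ui": 0.2}, {"wifi": 0.8, "button": 0.7}]
parents = {"net": None, "ui": None, "wifi": "net", "button": "ui"}
labels = decode_hierarchy(scores, parents)
```

Note that "button" scores above the threshold but is dropped because its parent "ui" did not survive level 1, mirroring the parent-child consistency that the state-space constraint enforces.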
6. The whole machine regression test case recommendation method for changing semantic perception according to claim 1, wherein the step (3) specifically comprises:
(3-1) screening the test case subset under the classification from the description dataset D Test case according to the classification of each submitted information text; converting each submitted information text in the dataset D Submitting information and the texts of the corresponding test case subset into high-dimensional vectors by the fine-tuned BERT model;
(3-2) calculating the similarity between the vector representation of the submitted information text and the vector representation of each test case description in the corresponding test case set according to the text high-dimensional vector representation obtained in the step (3-1), and obtaining a recommended test case list.
7. The whole machine regression test case recommendation method for changing semantic perception according to claim 6, wherein the specific process of the step (3-1) is as follows:
(3-1-1) selecting a BERT model, and fine-tuning it on the annotated pairs of submitted information texts and test case texts;
(3-1-2) screening the test case subset under the classification from the description dataset D Test case according to the classification of each submitted information text; converting the submitted information text and the texts of the corresponding test case subset into token lists and corresponding mask lists through the tokenizer corresponding to the BERT model;
(3-1-3) inputting the token lists and the corresponding mask lists from step (3-1-2) into the BERT model fine-tuned in step (3-1-1), which converts them into a high-dimensional vector of the submitted information and high-dimensional vectors of the test case descriptions;
(3-1-4) selecting the pooled vector output of the BERT model as the vector representation of the corresponding text.
8. The whole machine regression test case recommendation method for changing semantic perception according to claim 7, wherein the specific process of the step (3-2) is as follows:
(3-2-1) calculating the similarity between the vector representation of the submitted information and the vector representation of each test case description in the corresponding test case set, wherein the similarity calculation uses the cosine similarity formula:
sim(u, v) = (u · v) / (‖u‖ ‖v‖),
where u is the vector representation of the submitted information and v is the vector representation of a test case description;
(3-2-2) selecting the test cases with the highest similarity scores given in step (3-2-1) as recommended test cases, and outputting the recommended test case list.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410172283.1A CN117971684B (en) | 2024-02-07 | 2024-02-07 | Whole machine regression test case recommendation method capable of changing semantic perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117971684A true CN117971684A (en) | 2024-05-03 |
CN117971684B CN117971684B (en) | 2024-08-23 |
Family
ID=90865760
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410172283.1A Active CN117971684B (en) | 2024-02-07 | 2024-02-07 | Whole machine regression test case recommendation method capable of changing semantic perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117971684B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190034323A1 (en) * | 2017-07-27 | 2019-01-31 | Hcl Technologies Limited | System and method for generating regression test suite |
US20230195773A1 (en) * | 2019-10-11 | 2023-06-22 | Ping An Technology (Shenzhen) Co., Ltd. | Text classification method, apparatus and computer-readable storage medium |
CN111708703A (en) * | 2020-06-18 | 2020-09-25 | 深圳前海微众银行股份有限公司 | Test case set generation method, device, equipment and computer readable storage medium |
WO2021253904A1 (en) * | 2020-06-18 | 2021-12-23 | 深圳前海微众银行股份有限公司 | Test case set generation method, apparatus and device, and computer readable storage medium |
CN112052160A (en) * | 2020-08-06 | 2020-12-08 | 中信银行股份有限公司 | Code case obtaining method and device, electronic equipment and medium |
WO2022095354A1 (en) * | 2020-11-03 | 2022-05-12 | 平安科技(深圳)有限公司 | Bert-based text classification method and apparatus, computer device, and storage medium |
US20220156175A1 (en) * | 2020-11-19 | 2022-05-19 | Ebay Inc. | Mapping of test cases to test data for computer software testing |
CN116775451A (en) * | 2022-12-30 | 2023-09-19 | 广东亿迅科技有限公司 | Intelligent scoring method and device for test cases, terminal equipment and computer medium |
CN116340159A (en) * | 2023-03-14 | 2023-06-27 | 平安银行股份有限公司 | Regression test case recommendation method, system, equipment and storage medium |
CN116662184A (en) * | 2023-06-05 | 2023-08-29 | 福建师范大学 | Industrial control protocol fuzzy test case screening method and system based on Bert |
CN117493174A (en) * | 2023-10-23 | 2024-02-02 | 中移互联网有限公司 | Test case determination and cloud disk regression test method and device |
Non-Patent Citations (1)
Title |
---|
Synced (机器之心): "CVPR 2022 | Zhejiang University and Ant Group propose a hierarchical residual multi-granularity classification network based on a label relation tree, modeling hierarchical knowledge among multi-granularity labels", Retrieved from the Internet <URL:《https://cloud.tencent.com/developer/article/2034527》> *
Also Published As
Publication number | Publication date |
---|---|
CN117971684B (en) | 2024-08-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110597735B (en) | Software defect prediction method for open-source software defect feature deep learning | |
CN110059181B (en) | Short text label method, system and device for large-scale classification system | |
Socher et al. | Parsing natural scenes and natural language with recursive neural networks | |
US11164044B2 (en) | Systems and methods for tagging datasets using models arranged in a series of nodes | |
CN112650923A (en) | Public opinion processing method and device for news events, storage medium and computer equipment | |
CN114896388A (en) | Hierarchical multi-label text classification method based on mixed attention | |
CN111966825A (en) | Power grid equipment defect text classification method based on machine learning | |
CN110175235A (en) | Intelligence commodity tax sorting code number method and system neural network based | |
CN109214410A (en) | A kind of method and system promoting multi-tag classification accuracy rate | |
US10614031B1 (en) | Systems and methods for indexing and mapping data sets using feature matrices | |
CN117171413B (en) | Data processing system and method for digital collection management | |
CN116562284B (en) | Government affair text automatic allocation model training method and device | |
CN117971684B (en) | Whole machine regression test case recommendation method capable of changing semantic perception | |
CN116756605A (en) | ERNIE-CN-GRU-based automatic speech step recognition method, system, equipment and medium | |
CN116150669A (en) | Mashup service multi-label classification method based on double-flow regularized width learning | |
CN115269855A (en) | Paper fine-grained multi-label labeling method and device based on pre-training encoder | |
CN115934936A (en) | Intelligent traffic text analysis method based on natural language processing | |
CN115238645A (en) | Asset data identification method and device, electronic equipment and computer storage medium | |
CN115168634A (en) | Fabric cross-modal image-text retrieval method based on multi-level representation | |
CN114648121A (en) | Data processing method and device, electronic equipment and storage medium | |
CN118520176B (en) | Accurate recommendation method and system based on artificial intelligence | |
CN117952022B (en) | Yield multi-dimensional interactive system, method, computer equipment and storage medium | |
Lin et al. | Multi-label text classification based on graph attention network and self-attention mechanism | |
Mojzeš et al. | Feature space for statistical classification of Java source code patterns | |
Chen et al. | Research on software classification based on LSTM and CNN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||