CN117971684A - Change-semantics-aware whole-machine regression test case recommendation method - Google Patents

Change-semantics-aware whole-machine regression test case recommendation method

Info

Publication number
CN117971684A
Authority
CN
China
Prior art keywords
test case
text
feature
hierarchical
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410172283.1A
Other languages
Chinese (zh)
Other versions
CN117971684B (en)
Inventor
邓水光
徐浩然
向天宇
智晨
张高榕
吴孜璇
尹建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Zhejiang University ZJU
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Zhejiang University ZJU
Priority to CN202410172283.1A priority Critical patent/CN117971684B/en
Publication of CN117971684A publication Critical patent/CN117971684A/en
Application granted granted Critical
Publication of CN117971684B publication Critical patent/CN117971684B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3668 Software testing
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/31 Indexing; Data structures therefor; Storage structures
    • G06F 16/316 Indexing structures
    • G06F 16/322 Trees
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods


Abstract

The invention discloses a change-semantics-aware whole-machine regression test case recommendation method, comprising the following steps: (1) acquire code submission information and test cases, and clean them to obtain a text dataset D_submission of the code submission information and a description dataset D_testcase of the test cases; (2) classify each submission text in D_submission into different feature tags according to a hierarchical structure, using a hierarchical residual multi-granularity classification network model based on a tag relation tree, where a feature tag is the description tag of the function a test case tests; (3) for each submission text, screen the test cases under its classification from D_testcase, compute the similarity between the submission text and each screened test case text, and select the test cases with high similarity scores as recommended test cases. The method effectively improves the efficiency and accuracy of test case selection.

Description

Change-semantics-aware whole-machine regression test case recommendation method
Technical Field
The invention belongs to the field of software regression testing, and particularly relates to a change-semantics-aware whole-machine regression test case recommendation method.
Background
Regression testing is a key step in software development and release: it is performed after old code is modified, to ensure that the modifications introduce no new defects and do not affect the normal operation of other functions. Automated regression testing can significantly reduce the overhead of system testing, maintenance updates, and other stages.
As an important component of the software lifecycle, regression testing plays a significant role in the overall testing process and is repeatedly performed at each stage of software development. In progressive, fast-iterating development modes in particular, the frequent release of new versions requires regression testing to run more often, and projects using extreme programming methods may need multiple rounds of regression testing daily. Against this background, selecting a proper regression testing strategy is important for improving the efficiency and effectiveness of regression testing. For example, Chinese patent document No. CN101178687A discloses an improved software regression testing method.
As product functions grow and iterate, the test case set keeps expanding, which directly increases the cost of regression testing. Given limited testing resources, it is impossible to execute all test cases. To improve regression testing efficiency, a more scientific regression testing strategy is urgently needed: test cases must be effectively ordered according to a series of well-defined test objectives so that their execution order guarantees the most critical test cases run first.
The paper "Can Code Representation Boost IR-Based Test Case Prioritization" proposes a method that recommends test cases using the semantic similarity between changed code and test code. The method first converts the changed code and the test code into vectors with a code representation technique, then computes the similarity between each test method and the changed code at two granularities (method level and class level), and finally combines the two granularities to output the test cases most similar to the changed code.
The paper "Prioritizing Natural Language Test Cases Based on Highly-Used Game Features" proposes recommending manual test cases described in natural language through zero-shot classification and genetic-algorithm multi-objective optimization. The method first uses zero-shot classification to automatically identify, from each natural language test case description, the function that the test case can test; it then prioritizes the test cases according to the frequently used functions they cover and their execution time, using a genetic algorithm to optimize the two objectives of covering many functions and keeping execution time short, yielding a better recommendation ranking.
While the prior art provides basic approaches to regression test case selection, it suffers from several significant drawbacks:
1. Scenario mismatch: most prioritization methods are not applicable to manual test cases, because their algorithms require source code information or test execution histories, which are generally unavailable in manual testing scenarios.
2. Insufficient understanding of change semantics: code submission (commit) messages contain important semantic information about code changes, which is critical for identifying the affected functions and the corresponding test cases. The prior art lacks an efficient mechanism to interpret this semantic information, so its test case selection may be insufficiently accurate or relevant.
Therefore, a more efficient and intelligent technical scheme is needed: one that can understand the semantic information of code submissions and automatically recommend the related manual test cases, so as to improve the accuracy and efficiency of software testing.
Disclosure of Invention
The invention provides a change-semantics-aware whole-machine regression test case recommendation method, which can effectively improve the efficiency and accuracy of test case selection.
A change-semantics-aware whole-machine regression test case recommendation method comprises the following steps:
(1) Acquire code submission information and test cases, and clean the data to obtain a text dataset D_submission of the code submission information and a description dataset D_testcase of the test cases;
(2) Classify each submission text in the text dataset D_submission into different feature tags according to a hierarchical structure, using a hierarchical residual multi-granularity classification network model based on a tag relation tree;
A feature tag is the description tag of the function tested by a test case; feature tags take the form of a tree-shaped multi-level tag representation;
(3) For each submission text, screen the test case subset under its classification from the description dataset D_testcase, compute the similarity between the submission text and each text of the screened test case subset, and select the test cases with high similarity scores as recommended test cases.
The step (2) specifically comprises:
(2-1) Convert each submission text in the text dataset D_submission into a high-dimensional vector with a fine-tuned BERT model;
(2-2) Convert the tag hierarchy of the test case description dataset D_testcase into a tree structure, and generate binary-tag-based state vectors;
(2-3) Input the text vectors obtained in step (2-1) into the hierarchical residual multi-granularity classification network model, and perform hierarchical feature classification on them under the constraint of the state vectors obtained in step (2-2), determining the hierarchical feature tags of each submission text.
The specific process of the step (2-1) is as follows:
(2-1-1) Select a BERT model and fine-tune it on submission text data annotated with features;
(2-1-2) Convert each submission text in the text dataset D_submission into a token list and a corresponding mask list with the tokenizer matching the BERT model;
(2-1-3) Input the token list and the corresponding mask list into the BERT model fine-tuned in step (2-1-1), converting them into a high-dimensional vector;
(2-1-4) Select the pooled vector output of the BERT model as the vector representation of the submission text.
The specific process of the step (2-2) is as follows:
(2-2-1) Design a structured format to store the hierarchical information of the feature tree, where feature tags are the description tags of the functions tested by test cases and take the form of a tree-shaped multi-level tag representation; test cases can be assigned to nodes at any layer of the feature tree, and different feature nodes, whether leaf or intermediate, represent different test case classifications;
(2-2-2) From the feature tree, compute all binary-tag-based state vectors that satisfy the feature tree constraints, forming the legal state space. If the total number of feature nodes is N, this yields an N x N matrix: each column corresponds to one feature node obtained by traversing the feature tree, and each row represents one of the possible legal paths in the feature tree, so the binary-valued hierarchical class labels represent the feature node tags. In other words, the state space defines the legality constraints on parent-child relationships.
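The state-space construction of step (2-2-2) can be sketched in a few lines; the feature-node names and tree layout below are illustrative assumptions, not data from the patent. Each row of the N x N binary matrix marks one legal root-to-node path.

```python
def build_state_space(parents):
    """parents: dict mapping each feature node to its parent (None for the root).
    Returns (nodes, matrix): one row per node, with 1 at the node and at every
    ancestor, i.e. the legal root-to-node path required by the N x N state space."""
    nodes = list(parents)
    index = {n: i for i, n in enumerate(nodes)}
    matrix = []
    for node in nodes:
        row = [0] * len(nodes)
        cur = node
        while cur is not None:      # walk up to the root, marking the path
            row[index[cur]] = 1
            cur = parents[cur]
        matrix.append(row)
    return nodes, matrix

# Illustrative feature tree: "Camera" is the root with two child features.
nodes, space = build_state_space({"Camera": None, "Photo": "Camera", "Video": "Camera"})
```

Any classification output can then be checked for legality by testing whether its binary vector appears as a row of this matrix.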
The specific process of the step (2-3) is as follows:
(2-3-1) From the state space of step (2-2-2), compute the number of layers of the feature tree and the number of features in each layer, and construct the network structure of the hierarchical residual multi-granularity classification network model, which comprises a hierarchical feature transfer module, a residual connection part, and a hierarchical classification module;
(2-3-2) Build the hierarchical feature transfer module from several fully connected or convolutional networks; it extracts features and passes them downward between hierarchy levels. Build the residual connection part: by connecting coarse-granularity parent-class feature layers to fine-granularity subclass feature layers in sequence, parent feature information is merged downward layer by layer into the subclasses, producing combined feature vectors;
(2-3-3) Apply the hierarchical classification module to the combined feature vectors: each combined feature vector passes through the classification network of its hierarchy level, and each output dimension of that network corresponds to one feature tag;
(2-3-4) Set a threshold for each level, and determine the hierarchical classification of the text from the vectors output by each level's classification network.
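The per-level threshold decision of step (2-3-4), together with the parent-child consistency implied by the tag tree, can be illustrated minimally as follows. The label names, scores, and thresholds are hypothetical; in the actual method the scores come from the hierarchical classification networks.

```python
def classify_hierarchy(level_scores, level_labels, parents, thresholds):
    """level_scores: per-level lists of sigmoid outputs, one score per label.
    A label is accepted only if its score passes the level threshold AND its
    parent label was accepted at the level above (the tree-consistency check)."""
    accepted = set()
    result = []
    for scores, labels, thr in zip(level_scores, level_labels, thresholds):
        chosen = []
        for score, label in zip(scores, labels):
            parent = parents.get(label)
            if score >= thr and (parent is None or parent in accepted):
                chosen.append(label)
                accepted.add(label)
        result.append(chosen)
    return result

labels = [["Camera"], ["Photo", "Video"]]
parents = {"Camera": None, "Photo": "Camera", "Video": "Camera"}
# "Video" scores below the level-2 threshold of 0.5, so only "Photo" survives.
picked = classify_hierarchy([[0.9], [0.8, 0.3]], labels, parents, [0.5, 0.5])
```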
The step (3) specifically comprises:
(3-1) For each submission text, screen the test case subset under its classification from the description dataset D_testcase; convert the submission text and the texts of the corresponding test case subset into high-dimensional vectors with the fine-tuned BERT model;
(3-2) From the high-dimensional text vectors obtained in step (3-1), compute the similarity between the vector representation of the submission text and the vector representation of each test case description in the corresponding test case subset, obtaining a recommended test case list.
The specific process of the step (3-1) is as follows:
(3-1-1) Select a BERT model and fine-tune it on annotated pairs of submission texts and test case texts;
(3-1-2) For each submission text, screen the test case subset under its classification from the description dataset D_testcase; convert the submission text and the texts of the corresponding test case subset into token lists and corresponding mask lists with the tokenizer matching the BERT model;
(3-1-3) Input the token lists and corresponding mask lists of step (3-1-2) into the BERT model fine-tuned in step (3-1-1), converting them into high-dimensional vectors of the submission information and of the test case descriptions;
(3-1-4) Select the pooled vector output of the BERT model as the vector representation of the corresponding text.
The specific process of the step (3-2) is as follows:
(3-2-1) Compute the similarity between the vector representation of the submission information and the vector representation of each test case description in the corresponding test case subset, using the cosine similarity formula: sim(A, B) = (A · B) / (‖A‖ ‖B‖);
(3-2-2) According to the similarity scores given in step (3-2-1), select the test cases with higher similarity as recommended test cases, and output the recommended test case list.
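The cosine similarity of step (3-2-1) can be sketched directly from its formula; this is a generic implementation, not code from the patent.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors:
    sim(A, B) = (A . B) / (|A| * |B|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

same = cosine_similarity([0.3, 0.7], [0.3, 0.7])        # identical direction, ~1.0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated directions, ~0.0
```

Because the score depends only on vector direction, texts with similar semantics but different lengths still compare cleanly.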
Compared with the prior art, the invention has the following beneficial effects:
The invention designs a two-stage test case selection framework. In the first stage, based on the semantic information of code submissions, a hierarchical residual multi-granularity classification model built on the tag tree structure classifies test case feature tags, screening a test case subset out of the large test case set; this narrows the range of test cases that must be matched in the second stage and speeds up selection. In the second stage, an SBERT model computes the semantic similarity between the code submission's semantic information and the description information of the test case subset screened in the first stage, and outputs a recommended test case list ordered by change-semantic relevance according to the similarity scores. Through the processing and intelligent analysis of code submission texts, the method provides an efficient and accurate way to automatically select relevant regression test cases during software development, and suits fast-iterating software development projects.
Drawings
FIG. 1 is a flowchart of the change-semantics-aware whole-machine regression test case recommendation method;
FIG. 2 is a schematic diagram of the residual connection part of the hierarchical residual multi-granularity classification network model of the present invention;
FIG. 3 is a flowchart of step S200 in an embodiment of the present invention;
FIG. 4 is a flowchart of step S300 in an embodiment of the present invention.
Detailed Description
The invention will be described in further detail with reference to the drawings and examples, it being noted that the examples described below are intended to facilitate the understanding of the invention and are not intended to limit the invention in any way.
As shown in fig. 1, a change-semantics-aware whole-machine regression test case recommendation method includes the following steps:
S100, data preprocessing: acquire the texts of the code submission information in historical versions to form a raw dataset, then clean the dataset and output the dataset D_submission used in the following steps. The description dataset D_testcase of cleaned test cases is obtained in the same way.
In this step, the specific process of cleaning the original dataset is:
S101, collect the texts from a database.
S102, remove irrelevant information such as special characters and redundant whitespace from the texts.
S103, normalize the texts: first unify the case of the English letters, then extract the key fields according to the template of the submission information format.
S104, output the cleaned and normalized text data.
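Steps S101 to S104 could look roughly like the following for a single submission line; the regular expressions and the "[tag] summary" template are assumptions about the submission format, which the patent does not fix.

```python
import re

def clean_commit_text(raw):
    """Illustrative cleaning of one commit-message line: lowercase, strip
    special characters, collapse whitespace, then extract key fields from an
    assumed '[tag] summary' template."""
    text = raw.lower()                          # unify letter case
    text = re.sub(r"[^\w\s\[\]]", " ", text)    # drop special characters
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    m = re.match(r"\[(?P<tag>\w+)\]\s*(?P<summary>.+)", text)
    return m.groupdict() if m else {"tag": "", "summary": text}

record = clean_commit_text("[Camera]  Fix:  preview  freeze!!")
```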
S200, obtain the structure of the feature tag tree from the case library features, and convert the tree structure into tag relation state vectors used to build the classification network model. Using the hierarchical residual multi-granularity classification network model based on the tag relation tree, classify the code submission texts from step S100 into different feature tags according to the hierarchical structure. A feature tag is the description tag of the function tested by a test case, represented as a tree-shaped multi-level tag. The model in this step must be trained on a certain amount of submission text data annotated with features, to adapt it to the specific environment of a given project.
The specific steps of S200 are shown in fig. 3, and specifically include:
S210, the text vectorization module converts the texts output in step S100 into high-dimensional vectors with a fine-tuned BERT model.
S220, the tag relation tree construction module converts the tag hierarchy of the case library into a tree structure and generates binary-tag-based state vectors.
S230, the hierarchical classification module inputs the text vectors obtained in step S210 into the classification model, applies the constraint of the state vectors obtained in step S220, performs hierarchical feature classification on the text vectors, and determines the hierarchical feature tag of each submission text.
The step S210 specifically includes:
S211, select a suitable BERT pre-trained model and fine-tune it on submission text data annotated with features, to adapt it to the project-specific text data.
S212, convert the texts output in step S100 into token lists and corresponding mask lists with the tokenizer matching the BERT model.
S213, input the token lists and corresponding mask lists of step S212 into the BERT model and convert them into high-dimensional vectors.
S214, select the pooled vector output of the BERT model as the vector representation of the text and pass it to the next step.
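As a shape-level illustration of the pooling in S214: the actual method takes the pooler output of a fine-tuned BERT, while the stand-in below merely averages the non-masked token vectors into one fixed-size text representation. All numbers are arbitrary examples.

```python
def mean_pool(token_vectors, mask):
    """Average the vectors of non-masked tokens (mask value 1) into a single
    fixed-size vector, mimicking the shape of BERT's pooled output."""
    dim = len(token_vectors[0])
    kept = [v for v, m in zip(token_vectors, mask) if m]
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]

# Three 2-dimensional token vectors; the third is a padding token (mask 0).
vec = mean_pool([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]], [1, 1, 0])
```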
The step S220 specifically includes:
S221, extract the hierarchical information of the feature tree to form a multi-level feature tag tree. A structured format is designed to store this hierarchical information; feature tags are the description tags of the functions tested by test cases, represented as tree-shaped multi-level tags. Test cases can be assigned to nodes at any layer of the feature tree, i.e. both leaf nodes and intermediate nodes, and in most application scenarios different feature nodes represent different test case classifications.
S222, from the feature tree of step S221, compute all binary-tag-based state vectors that satisfy the feature tree constraints, forming the legal state space. If the total number of feature nodes is N, this yields an N x N matrix: each column corresponds to one feature node obtained by traversing the feature tree, and each row represents one of the possible legal paths in the feature tree, so the binary-valued hierarchical class labels represent the feature node tags. In other words, the state space defines the legality constraints on parent-child relationships.
The step S230 specifically includes:
S231, compute the hierarchy attributes needed to build the network: from the state space of step S222, calculate the number of layers of the feature tree and the number of features in each layer, for constructing the hierarchical residual network structure.
S232, build the hierarchical feature transfer module from several fully connected or convolutional networks; it extracts features and passes them downward between hierarchy levels. Build the residual connection part: as shown in fig. 2, by connecting coarse-granularity parent-class feature layers to fine-granularity subclass feature layers in sequence, parent feature information is merged downward layer by layer into the subclasses, producing combined feature vectors.
S233, apply the hierarchical classification module to the combined feature vectors of step S232: each combined feature vector passes through the classification network of its hierarchy level. First, determine the number of feature tags in each layer from the feature tree, which fixes the fully connected hierarchical structure of the classification model. One fully connected layer computes:
x_{i+1} = σ(W_i x_i + b_i)
where x_{i+1} is the output vector, σ is the activation function, W_i is the network weight of the current layer, x_i is the input vector, and b_i is the bias. A sigmoid function is used as the activation function. Each dimension of the classification network output corresponds to one feature tag.
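A minimal sketch of one such fully connected layer with a sigmoid activation; the weights and inputs below are arbitrary illustrative numbers, not trained parameters.

```python
from math import exp

def sigmoid(z):
    """Sigmoid activation: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def dense_layer(x, W, b):
    """One fully connected layer, x_{i+1} = sigmoid(W_i x_i + b_i):
    each output dimension scores one feature tag of the current level."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# Two inputs, two output feature tags.
scores = dense_layer([1.0, -1.0], [[0.5, 0.5], [2.0, 0.0]], [0.0, 0.0])
```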
S234, determine the hierarchical classification of the text from the vectors output by each layer's classification network. First set a threshold for each layer; when the value of some dimension of a layer's output vector in step S233 exceeds that layer's threshold, the text is classified under the feature tag corresponding to that dimension. The classified feature tags are then checked for consistency against the tree structure, finally forming the overall hierarchical classification tag.
S300, for each submission text, screen the subset of test cases under its classification (from step S200) out of the test case description dataset D_testcase output by step S100, compute the similarity between the submission text and the test case texts with an SBERT model, and select the test cases with high similarity scores as recommended test cases. The SBERT-based model in this step must be fine-tuned on a certain amount of annotated pairs of submission texts and test case texts, to adapt it to the specific environment of a given project.
As shown in fig. 4, the specific steps of S300 include:
S310, convert the submission information from step S100 and the texts of the screened test case subsets into high-dimensional vectors with the fine-tuned BERT model.
S320, from the high-dimensional text vectors obtained in step S310, compute the similarity between the vector representation of the submission information and the vector representation of each test case description, obtaining a recommended test case list.
The step S310 specifically includes:
S311, select the same pre-trained model as in step S211 and fine-tune it on annotated pairs of submission information and test case texts, to adapt it to the project's text data.
S312, convert the texts output in step S100 into token lists and corresponding mask lists with the tokenizer matching the BERT model.
S313, input the token lists and corresponding mask lists of step S312 into the BERT model, converting them into high-dimensional vectors of the submission information and of the test case descriptions.
S314, selecting the pooled vector output of the BERT model as the vector representation of the corresponding text.
The step S320 specifically includes:
S321, compute the similarity between the vector representation of the submission information and the vector representation of each test case description, using the cosine similarity formula: sim(A, B) = (A · B) / (‖A‖ ‖B‖).
S322, according to the similarity scores given in step S321, select the test cases with higher similarity as recommended test cases, and output the recommended test case list.
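The two-stage selection of S300, screening by classification and then ranking by similarity, can be summarized as follows; all identifiers and scores below are hypothetical, and the similarity values stand in for the SBERT scores computed in S321.

```python
def recommend(submission_label, case_similarity, cases, top_k=2):
    """Two-stage selection sketch: keep only test cases whose feature label
    matches the submission's classification, then rank the survivors by their
    precomputed similarity score and return the top-k case ids."""
    subset = [c for c in cases if c["label"] == submission_label]
    subset.sort(key=lambda c: case_similarity[c["id"]], reverse=True)
    return [c["id"] for c in subset[:top_k]]

cases = [{"id": "tc1", "label": "Camera"},
         {"id": "tc2", "label": "Audio"},
         {"id": "tc3", "label": "Camera"}]
sims = {"tc1": 0.4, "tc2": 0.9, "tc3": 0.7}
# tc2 has the highest score but the wrong label, so stage 1 removes it.
recommended = recommend("Camera", sims, cases)
```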
The embodiments of the invention can effectively improve the efficiency and accuracy of test case selection. Through intelligent text analysis and similarity calculation, the method rapidly identifies, among a large number of test cases, those closely related to the latest code submission, thereby significantly improving the quality and reliability of software testing.
The foregoing embodiments describe the technical solution and advantages of the present invention in detail. It should be understood that they are merely illustrative and are not intended to limit the invention; any modifications, additions, and equivalents made within the scope of the principles of the present invention shall fall within the scope of the invention.

Claims (8)

1. A change-semantics-aware whole-machine regression test case recommendation method, characterized by comprising the following steps:
(1) Acquiring code commit information and test cases, and cleaning the data to obtain a commit message text dataset D_commit and a test case description dataset D_testcase;
(2) Classifying each commit message text in D_commit into different feature tags according to a hierarchical structure, using a hierarchical residual multi-granularity classification network model based on a tag relation tree;
a feature tag is a descriptive label of the function tested by a test case, and takes the form of a tree-shaped multi-level tag;
(3) For each commit message text, screening the subset of test cases under its classification from D_testcase, calculating the similarity between the commit message text and the texts of that test case subset, and selecting the test cases with high similarity scores as recommended test cases.
2. The change-semantics-aware whole-machine regression test case recommendation method according to claim 1, wherein step (2) specifically comprises:
(2-1) converting each commit message text in D_commit into a high-dimensional vector with the fine-tuned BERT model;
(2-2) converting the tag hierarchy of the test case description dataset D_testcase into a tree structure, and generating binary tag-based state vectors;
(2-3) inputting the high-dimensional text vectors obtained in step (2-1) into the hierarchical residual multi-granularity classification network model, and performing hierarchical feature classification on them under the constraint of the state vectors obtained in step (2-2), so as to determine the hierarchical feature tag of each commit message text.
3. The change-semantics-aware whole-machine regression test case recommendation method according to claim 2, wherein the specific process of step (2-1) is:
(2-1-1) selecting a BERT model and fine-tuning it on commit message texts annotated with features;
(2-1-2) converting the commit message texts in D_commit into token lists and corresponding mask lists with the BERT model's tokenizer;
(2-1-3) inputting the token lists and mask lists into the BERT model fine-tuned in step (2-1-1), which converts them into high-dimensional vectors;
(2-1-4) selecting the pooled output vector of the BERT model as the vector representation of each commit message text.
4. The change-semantics-aware whole-machine regression test case recommendation method according to claim 2, wherein the specific process of step (2-2) is:
(2-2-1) designing and storing the hierarchical information of the feature tree in a structured format, wherein a feature tag is a descriptive label of the function tested by a test case and takes the form of a tree-shaped multi-level tag; test cases are assigned to nodes at any layer of the feature tree, and different feature nodes, whether leaf nodes or intermediate nodes, represent different test case classifications;
(2-2-2) computing from the feature tree all binary tag-based state vectors that satisfy the feature tree's constraints, forming the legal state space; if the total number of feature nodes is N, an N × N matrix is formed by traversing the feature tree structure, in which each column corresponds to a feature node and each row to a legal path in the feature tree, so that binary-valued hierarchical class labels represent the feature node tags; that is, the state space defines the legal constraints on parent-child relations.
5. The change-semantics-aware whole-machine regression test case recommendation method according to claim 4, wherein the specific process of step (2-3) is:
(2-3-1) calculating the number of layers of the feature tree and the number of features in each layer from the state space of step (2-2-2), and constructing the network structure of the hierarchical residual multi-granularity classification network model, which comprises a hierarchical feature transfer module, residual connections, and a hierarchical classification module;
(2-3-2) building the hierarchical feature transfer module from several fully connected or convolutional networks, used for feature extraction and downward transfer between levels; building the residual connections, which sequentially link the coarse-grained parent-class feature layers to the fine-grained subclass feature layers, yielding combined feature vectors in which parent-class feature information is merged downward, layer by layer, into the subclasses;
(2-3-3) applying the hierarchical classification module to the combined feature vectors, each of which passes through the classification network of its level, with each output dimension of a classification network corresponding to one feature tag;
(2-3-4) setting per-level thresholds, and determining the hierarchical classification of the text from the vectors output by the classification network of each level.
6. The change-semantics-aware whole-machine regression test case recommendation method according to claim 1, wherein step (3) specifically comprises:
(3-1) for each commit message text, screening the subset of test cases under its classification from D_testcase; converting the commit message text and the texts of the corresponding test case subset into high-dimensional vectors with the fine-tuned BERT model;
(3-2) calculating, from the high-dimensional text vectors obtained in step (3-1), the similarity between the vector representation of the commit message text and the vector representation of each test case description in the corresponding test case set, to obtain the recommended test case list.
7. The change-semantics-aware whole-machine regression test case recommendation method according to claim 6, wherein the specific process of step (3-1) is:
(3-1-1) selecting a BERT model and fine-tuning it on the annotated commit messages and test case texts;
(3-1-2) for each commit message text, screening the subset of test cases under its classification from D_testcase; converting the commit message text and the texts of the corresponding test case subset into token lists and corresponding mask lists with the BERT model's tokenizer;
(3-1-3) inputting the token lists and mask lists from step (3-1-2) into the BERT model fine-tuned in step (3-1-1), which converts them into a high-dimensional vector for the commit message and high-dimensional vectors for the test case descriptions;
(3-1-4) selecting the pooled output vector of the BERT model as the vector representation of the corresponding text.
8. The change-semantics-aware whole-machine regression test case recommendation method according to claim 7, wherein the specific process of step (3-2) is:
(3-2-1) calculating the similarity between the vector representation of the commit message and the vector representation of each test case description in the corresponding test case set, using the cosine similarity formula sim(u, v) = (u · v) / (‖u‖ ‖v‖);
(3-2-2) selecting the test cases with the highest similarity scores from step (3-2-1) as recommended test cases, and outputting the recommended test case list.
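The binary state space of claim 4, where each row of the N × N matrix marks one feature node together with all of its ancestors, can be sketched as follows. The feature tree below (connectivity/camera nodes) is a made-up illustration, not taken from the patent:

```python
# Sketch of claim 4's state space: a legal state is a 0/1 vector that sets
# one node and every ancestor on its path to the root. Labels are illustrative.
TREE = {  # child -> parent; None marks a root node
    "connectivity": None, "wifi": "connectivity", "bluetooth": "connectivity",
    "camera": None, "photo": "camera",
}
NODES = list(TREE)  # fixed column order: one matrix column per feature node

def state_vector(node):
    """Binary vector with a 1 at the node and at every ancestor (a legal path)."""
    vec = [0] * len(NODES)
    while node is not None:
        vec[NODES.index(node)] = 1
        node = TREE[node]  # walk up toward the root
    return vec

def state_space():
    """One legal state per feature node: the N x N 0/1 matrix of claim 4."""
    return [state_vector(n) for n in NODES]
```

A vector such as `[1, 1, 0, 0, 0]` (connectivity + wifi) is legal, while a vector setting `wifi` without `connectivity` is not; this is how the state space constrains the hierarchical classifier's outputs to valid parent-child combinations.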
CN202410172283.1A 2024-02-07 2024-02-07 Whole machine regression test case recommendation method capable of changing semantic perception Active CN117971684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410172283.1A CN117971684B (en) 2024-02-07 2024-02-07 Whole machine regression test case recommendation method capable of changing semantic perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410172283.1A CN117971684B (en) 2024-02-07 2024-02-07 Whole machine regression test case recommendation method capable of changing semantic perception

Publications (2)

Publication Number Publication Date
CN117971684A true CN117971684A (en) 2024-05-03
CN117971684B CN117971684B (en) 2024-08-23

Family

ID=90865760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410172283.1A Active CN117971684B (en) 2024-02-07 2024-02-07 Whole machine regression test case recommendation method capable of changing semantic perception

Country Status (1)

Country Link
CN (1) CN117971684B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190034323A1 (en) * 2017-07-27 2019-01-31 Hcl Technologies Limited System and method for generating regression test suite
CN111708703A (en) * 2020-06-18 2020-09-25 深圳前海微众银行股份有限公司 Test case set generation method, device, equipment and computer readable storage medium
CN112052160A (en) * 2020-08-06 2020-12-08 中信银行股份有限公司 Code case obtaining method and device, electronic equipment and medium
WO2022095354A1 (en) * 2020-11-03 2022-05-12 平安科技(深圳)有限公司 Bert-based text classification method and apparatus, computer device, and storage medium
US20220156175A1 (en) * 2020-11-19 2022-05-19 Ebay Inc. Mapping of test cases to test data for computer software testing
US20230195773A1 (en) * 2019-10-11 2023-06-22 Ping An Technology (Shenzhen) Co., Ltd. Text classification method, apparatus and computer-readable storage medium
CN116340159A (en) * 2023-03-14 2023-06-27 平安银行股份有限公司 Regression test case recommendation method, system, equipment and storage medium
CN116662184A (en) * 2023-06-05 2023-08-29 福建师范大学 Industrial control protocol fuzzy test case screening method and system based on Bert
CN116775451A (en) * 2022-12-30 2023-09-19 广东亿迅科技有限公司 Intelligent scoring method and device for test cases, terminal equipment and computer medium
CN117493174A (en) * 2023-10-23 2024-02-02 中移互联网有限公司 Test case determination and cloud disk regression test method and device

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190034323A1 (en) * 2017-07-27 2019-01-31 Hcl Technologies Limited System and method for generating regression test suite
US20230195773A1 (en) * 2019-10-11 2023-06-22 Ping An Technology (Shenzhen) Co., Ltd. Text classification method, apparatus and computer-readable storage medium
CN111708703A (en) * 2020-06-18 2020-09-25 深圳前海微众银行股份有限公司 Test case set generation method, device, equipment and computer readable storage medium
WO2021253904A1 (en) * 2020-06-18 2021-12-23 深圳前海微众银行股份有限公司 Test case set generation method, apparatus and device, and computer readable storage medium
CN112052160A (en) * 2020-08-06 2020-12-08 中信银行股份有限公司 Code case obtaining method and device, electronic equipment and medium
WO2022095354A1 (en) * 2020-11-03 2022-05-12 平安科技(深圳)有限公司 Bert-based text classification method and apparatus, computer device, and storage medium
US20220156175A1 (en) * 2020-11-19 2022-05-19 Ebay Inc. Mapping of test cases to test data for computer software testing
CN116775451A (en) * 2022-12-30 2023-09-19 广东亿迅科技有限公司 Intelligent scoring method and device for test cases, terminal equipment and computer medium
CN116340159A (en) * 2023-03-14 2023-06-27 平安银行股份有限公司 Regression test case recommendation method, system, equipment and storage medium
CN116662184A (en) * 2023-06-05 2023-08-29 福建师范大学 Industrial control protocol fuzzy test case screening method and system based on Bert
CN117493174A (en) * 2023-10-23 2024-02-02 中移互联网有限公司 Test case determination and cloud disk regression test method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiqizhixin (机器之心): "CVPR 2022 | Zhejiang University and Ant Group propose a hierarchical residual multi-granularity classification network based on a label relation tree, modeling hierarchical knowledge among multi-granularity labels", Retrieved from the Internet <URL: https://cloud.tencent.com/developer/article/2034527> *

Also Published As

Publication number Publication date
CN117971684B (en) 2024-08-23

Similar Documents

Publication Publication Date Title
CN110597735B (en) Software defect prediction method for open-source software defect feature deep learning
CN110059181B (en) Short text label method, system and device for large-scale classification system
Socher et al. Parsing natural scenes and natural language with recursive neural networks
US11164044B2 (en) Systems and methods for tagging datasets using models arranged in a series of nodes
CN112650923A (en) Public opinion processing method and device for news events, storage medium and computer equipment
CN114896388A (en) Hierarchical multi-label text classification method based on mixed attention
CN111966825A (en) Power grid equipment defect text classification method based on machine learning
CN110175235A (en) Intelligence commodity tax sorting code number method and system neural network based
CN109214410A (en) A kind of method and system promoting multi-tag classification accuracy rate
US10614031B1 (en) Systems and methods for indexing and mapping data sets using feature matrices
CN117171413B (en) Data processing system and method for digital collection management
CN116562284B (en) Government affair text automatic allocation model training method and device
CN117971684B (en) Whole machine regression test case recommendation method capable of changing semantic perception
CN116756605A (en) ERNIE-CN-GRU-based automatic speech step recognition method, system, equipment and medium
CN116150669A (en) Mashup service multi-label classification method based on double-flow regularized width learning
CN115269855A (en) Paper fine-grained multi-label labeling method and device based on pre-training encoder
CN115934936A (en) Intelligent traffic text analysis method based on natural language processing
CN115238645A (en) Asset data identification method and device, electronic equipment and computer storage medium
CN115168634A (en) Fabric cross-modal image-text retrieval method based on multi-level representation
CN114648121A (en) Data processing method and device, electronic equipment and storage medium
CN118520176B (en) Accurate recommendation method and system based on artificial intelligence
CN117952022B (en) Yield multi-dimensional interactive system, method, computer equipment and storage medium
Lin et al. Multi-label text classification based on graph attention network and self-attention mechanism
Mojzeš et al. Feature space for statistical classification of Java source code patterns
Chen et al. Research on software classification based on LSTM and CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant