CN111625468B - Test case duplicate removal method and device - Google Patents
- Publication number
- CN111625468B (CN202010505902.6A)
- Authority
- CN
- China
- Prior art keywords
- test case
- keyword
- test
- pair
- cases
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3684—Test management for test design, e.g. generating new test cases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The application provides a test case deduplication method and device. The method extracts a feature value from each test case, determines the similarity between the feature values of the two test cases in each test case pair, takes every test case pair whose similarity is greater than a preset similarity threshold as a test case pair to be processed, and deduplicates the test cases in the plurality of test case pairs to be processed, so that test cases are deduplicated automatically. Because a feature value is extracted from every test case and the similarity is determined for every test case pair, all test cases are processed, no test case is overlooked, and the accuracy of deduplication is improved.
Description
Technical Field
The present disclosure relates to the field of testing technologies, and in particular, to a test case deduplication method and apparatus.
Background
Testing a large business system may require a large number of test cases. To produce enough test cases, different testers have to write them, and the test cases written by different testers may duplicate one another, which wastes test resources.
It is therefore necessary to deduplicate test cases that are duplicated. At present, test case deduplication is generally performed manually, and both its accuracy and its efficiency are low.
Disclosure of Invention
To solve the above technical problem, embodiments of the present application provide a test case deduplication method and apparatus that improve the efficiency and accuracy of test case deduplication. The technical solution is as follows:
A test case deduplication method, comprising:
extracting a feature value of each test case;
determining, for each test case pair, the similarity between the feature values of the two test cases in the pair, wherein each test case pair is formed by combining any two test cases selected from a plurality of test cases;
taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed;
and deduplicating the test cases in the plurality of test case pairs to be processed.
Preferably, the extracting the feature value of each test case includes:
extracting the feature value of each test case;
performing word segmentation on the feature value of each test case to obtain at least one keyword;
the determining the similarity of the feature values of the two test cases in each test case pair includes:
determining the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by word segmentation of the feature value of each test case.
Preferably, the determining the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by word segmentation of the feature values of each test case includes:
determining similar keyword pairs in keywords of two test cases in each test case pair respectively, wherein the similar keyword pairs consist of a first keyword and a second keyword, the text similarity of the first keyword and the second keyword is larger than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
Preferably, the determining the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by word segmentation of the feature values of each test case includes:
counting the occurrence times of each first keyword in each test case pair in a first test case and the occurrence times of each second keyword in each test case pair in a second test case respectively, wherein the first test case and the second test case form the test case pair, the first keyword is the keyword of the first test case, and the second keyword is the keyword of the second test case;
and determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
Preferably, the determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the number of times each first keyword in each test case pair appears in the first test case, and the number of times each second keyword in each test case pair appears in the second test case, includes:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
A test case deduplication apparatus, comprising:
an extraction module, configured to extract a feature value of each test case;
a first determining module, configured to determine, for each test case pair, the similarity between the feature values of the two test cases in the pair, wherein each test case pair is formed by combining any two test cases selected from a plurality of test cases;
a second determining module, configured to take each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed;
and a deduplication module, configured to deduplicate the test cases in the plurality of test case pairs to be processed.
Preferably, the extraction module is specifically configured to:
extract the feature value of each test case;
perform word segmentation on the feature value of each test case to obtain at least one keyword;
the first determining module is specifically configured to:
determine the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by word segmentation of the feature value of each test case.
Preferably, the first determining module is specifically configured to:
determining similar keyword pairs in keywords of two test cases in each test case pair respectively, wherein the similar keyword pairs consist of a first keyword and a second keyword, the text similarity of the first keyword and the second keyword is larger than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
Preferably, the first determining module is specifically configured to:
counting the occurrence times of each first keyword in each test case pair in a first test case and the occurrence times of each second keyword in each test case pair in a second test case respectively, wherein the first test case and the second test case form the test case pair, the first keyword is the keyword of the first test case, and the second keyword is the keyword of the second test case;
and determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
Preferably, the first determining module is specifically configured to:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
Compared with the prior art, the beneficial effects of the present application are as follows:
In the method, a feature value is extracted from each test case, the similarity between the feature values of the two test cases in each test case pair is determined, every test case pair whose similarity is greater than a preset similarity threshold is taken as a test case pair to be processed, and the test cases in the plurality of test case pairs to be processed are deduplicated, so that test cases are deduplicated automatically. Because a feature value is extracted from every test case and the similarity is determined for every test case pair, all test cases are processed, no test case is overlooked, and the accuracy of deduplication is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of embodiment 1 of the test case deduplication method provided in the present application;
Fig. 2 is a flowchart of embodiment 2 of the test case deduplication method provided in the present application;
Fig. 3 is a flowchart of embodiment 3 of the test case deduplication method provided in the present application;
Fig. 4 is a flowchart of embodiment 4 of the test case deduplication method provided in the present application;
Fig. 5 is a schematic structural diagram of the test case deduplication apparatus provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
An embodiment of the present application discloses a test case deduplication method, comprising: extracting a feature value of each test case; determining, for each test case pair, the similarity between the feature values of the two test cases in the pair, wherein each test case pair is formed by combining any two test cases selected from a plurality of test cases; taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed; and deduplicating the test cases in the plurality of test case pairs to be processed. The present application can thereby improve the efficiency and accuracy of deduplication.
Next, the test case deduplication method disclosed in the embodiments of the present application is described. Fig. 1 is a flowchart of embodiment 1 of the test case deduplication method provided in the present application, which may include the following steps:
Step S11, extracting a feature value of each test case.
The characteristic values of the test cases may include, but are not limited to: any one or more of a feature value characterizing a functional module to which the test case belongs, a feature value describing the test case, a feature value characterizing an operational step of the test case, and a feature value characterizing an expected result of execution of the test case.
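The feature value described in step S11 can be pictured with a minimal sketch. The record layout, the field names, and the function `extract_feature_value` are assumptions made for illustration only; the source does not specify how a test case is stored or how the selected features are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical test case record; the field names are assumptions."""
    case_id: str
    module: str        # functional module the test case belongs to
    description: str   # description of the test case
    steps: str         # operation steps
    expected: str      # expected execution result


def extract_feature_value(case, fields=("module", "description", "steps", "expected")):
    """Join any one or more of the selected fields into one feature-value string."""
    return " ".join(getattr(case, name) for name in fields)


case = CaseRecord("TC-1", "login", "valid login",
                  "enter user; enter password", "home page shown")
feature = extract_feature_value(case)
```

Passing a shorter `fields` tuple corresponds to using only some of the four kinds of feature value.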
Step S12, determining, for each test case pair, the similarity between the feature values of the two test cases in the pair, wherein each test case pair is formed by combining any two test cases selected from a plurality of test cases.
After the feature value of each test case is extracted, the plurality of test cases can be combined in pairs to obtain a plurality of test case pairs, and the similarity between the feature values of the two test cases in each test case pair is determined. Specifically, the similarity between the feature values of the two test cases in each test case pair can be determined by using a cosine similarity algorithm.
The similarity of the feature values of the two test cases in each test case pair can be used as the similarity of the two test cases in each test case pair.
Step S13, taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed.
The preset similarity threshold may be set as needed, and is not limited in this embodiment.
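Steps S12 and S13 can be sketched as follows, under stated assumptions. The function names are hypothetical, and `token_overlap` is a simple placeholder similarity (set overlap of whitespace-separated tokens) swapped in for brevity; it is not the cosine similarity algorithm mentioned in step S12.

```python
from itertools import combinations


def token_overlap(first, second):
    """Placeholder similarity: shared tokens divided by all tokens (not cosine)."""
    first_tokens, second_tokens = set(first.split()), set(second.split())
    union = first_tokens | second_tokens
    return len(first_tokens & second_tokens) / len(union) if union else 0.0


def pairs_to_process(features, similarity, threshold):
    """Return (id_a, id_b, score) for each pair whose score exceeds the threshold."""
    selected = []
    # combinations(..., 2) yields every unordered pair of test cases exactly once.
    for (id_a, feat_a), (id_b, feat_b) in combinations(features.items(), 2):
        score = similarity(feat_a, feat_b)
        if score > threshold:
            selected.append((id_a, id_b, score))
    return selected


features = {
    "TC-1": "open login page",
    "TC-2": "open login page",
    "TC-3": "export report",
}
result = pairs_to_process(features, token_overlap, 0.8)
```

With n test cases this examines n(n-1)/2 pairs, which is what combining the test cases in pairs implies.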
Step S14, deduplicating the test cases in the plurality of test case pairs to be processed.
Deduplicating the test cases in the plurality of test case pairs to be processed may include:
S141, removing one of the two test cases in each test case pair to be processed to obtain a first test case set;
S142, if a first test case subset exists in the first test case set, selecting one test case from the first test case subset to retain, wherein the first test case subset includes at least 2 test cases and the at least 2 test cases are identical.
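One possible reading of S141 and S142 is sketched below. It assumes that the first test case set collects the test case kept from each pair, so that the same test case can appear several times when it was kept from several pairs; the source does not state this explicitly, and the function name `deduplicate` is hypothetical.

```python
def deduplicate(pairs):
    """pairs: list of (id_a, id_b) test case pairs to be processed.

    Returns the retained test case ids in first-seen order.
    """
    # S141: remove the second test case of every pair and keep the first.
    first_set = [id_a for id_a, _ in pairs]
    # S142: identical entries in first_set form a subset; retain one of each.
    return list(dict.fromkeys(first_set))


retained = deduplicate([("TC-1", "TC-2"), ("TC-1", "TC-3"), ("TC-4", "TC-5")])
```

The sketch does not handle chains such as (A, B) and (B, C), where a test case kept from one pair is removed from another; the source does not describe that situation.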
As another optional embodiment of the present application, Fig. 2 is a flowchart of embodiment 2 of the test case deduplication method provided in the present application. This embodiment mainly refines the test case deduplication method described in embodiment 1. As shown in Fig. 2, the method may include, but is not limited to, the following steps:
Step S21, extracting a feature value of each test case.
In this embodiment, the feature values of the test cases may include, but are not limited to: any one or more of a feature value characterizing a functional module to which the test case belongs, a feature value describing the test case, a feature value characterizing an operational step of the test case, and a feature value characterizing an expected result of execution of the test case.
Step S22, performing word segmentation on the feature value of each test case to obtain at least one keyword.
Steps S21-S22 are a specific implementation of step S11 in embodiment 1.
In this embodiment, the feature value of each test case may be segmented by using a Python word segmentation library to obtain at least one keyword.
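The segmentation step can be illustrated with a stand-in tokenizer. The source does not name the Python word segmentation library, so the sketch below uses only the standard `re` module and the hypothetical function name `segment`; it is an assumption for illustration, not the library the embodiment refers to.

```python
import re


def segment(feature_value):
    """Split a feature value into lowercase keywords (stand-in tokenizer)."""
    # \w+ matches runs of letters, digits, and underscores; punctuation is dropped.
    return re.findall(r"\w+", feature_value.lower())


keywords = segment("Open the Login page, enter password")
```

A regular expression of this kind only works for text that separates words with spaces or punctuation. Chinese text has no such separators, so a dedicated segmentation library (jieba is one commonly used option) would be needed there.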
Step S23, determining the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by word segmentation of the feature value of each test case.
Step S23 is a specific implementation of step S12 in embodiment 1.
Step S24, taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed.
Step S25, deduplicating the test cases in the plurality of test case pairs to be processed.
For the detailed procedure of steps S24-S25, refer to the description of steps S13-S14 in embodiment 1; it is not repeated here.
In this embodiment, the keyword is obtained by word segmentation of the feature value, and the similarity of the feature value is determined based on the keyword, so that the complexity of determining the similarity of the feature value can be reduced, the efficiency of determining the similarity of the feature value is improved, and further the duplicate removal efficiency is improved.
As another optional embodiment of the present application, Fig. 3 is a flowchart of embodiment 3 of the test case deduplication method provided in the present application. This embodiment mainly refines the test case deduplication method described in embodiment 2. As shown in Fig. 3, the method may include, but is not limited to, the following steps:
Step S31, extracting a feature value of each test case.
Step S32, performing word segmentation on the feature value of each test case to obtain at least one keyword.
The detailed procedure of steps S31-S32 can be referred to in the related description of steps S21-S22 in embodiment 2, and will not be described herein.
Step S33, determining similar keyword pairs in keywords of two test cases in each test case pair respectively.
In this embodiment, the pair of similar keywords is composed of a first keyword and a second keyword, where the text similarity between the first keyword and the second keyword is greater than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases.
In this embodiment, the process of determining a similar keyword pair may include: calculating the text similarity of the first keyword and the second keyword, and judging whether that text similarity is greater than the set text similarity threshold; if so, the first keyword and the second keyword form a similar keyword pair.
Step S34, counting the number of similar keyword pairs, and determining the similarity of the feature values of the two test cases based on the number of similar keyword pairs.
In this embodiment, a correspondence between the number of similar keyword pairs and the similarity of the feature values of test cases may be set in advance. After the number of similar keyword pairs is counted, the similarity corresponding to that number is looked up in the correspondence, and the similarity found is used as the similarity of the feature values of the two test cases.
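Steps S33 and S34 can be sketched as follows. The source does not say how text similarity between keywords is computed or what the correspondence contains, so the use of `difflib.SequenceMatcher`, the 0.8 text threshold, and the values in `CORRESPONDENCE` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical correspondence: (minimum number of similar pairs, similarity).
CORRESPONDENCE = [(5, 0.9), (3, 0.6), (1, 0.3)]


def count_similar_pairs(first_keywords, second_keywords, text_threshold=0.8):
    """S33: count keyword pairs whose text similarity exceeds the threshold."""
    count = 0
    for first in first_keywords:
        for second in second_keywords:
            if SequenceMatcher(None, first, second).ratio() > text_threshold:
                count += 1
    return count


def similarity_from_count(count):
    """S34: look up the similarity that corresponds to the pair count."""
    for minimum_pairs, similarity in CORRESPONDENCE:
        if count >= minimum_pairs:
            return similarity
    return 0.0


pair_count = count_similar_pairs(
    ["login", "password", "submit"],
    ["login", "passwords", "cancel"],
)
feature_similarity = similarity_from_count(pair_count)
```

Here "login"/"login" and "password"/"passwords" exceed the text threshold, so two similar keyword pairs are counted.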
Steps S33-S34 are a specific implementation of step S23 in embodiment 2.
Step S35, taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed.
Step S36, deduplicating the test cases in the plurality of test case pairs to be processed.
For the detailed procedure of steps S35-S36, refer to steps S24-S25 in embodiment 2; it is not repeated here.
As another optional embodiment of the present application, Fig. 4 is a flowchart of embodiment 4 of the test case deduplication method provided in the present application. This embodiment mainly refines the test case deduplication method described in embodiment 2. As shown in Fig. 4, the method may include, but is not limited to, the following steps:
Step S41, extracting a feature value of each test case.
Step S42, performing word segmentation on the feature value of each test case to obtain at least one keyword.
Step S43, counting, for each test case pair, the number of times each first keyword appears in the first test case and the number of times each second keyword appears in the second test case.
The first test case and the second test case form the test case pair, the first keyword is a keyword of the first test case, and the second keyword is a keyword of the second test case.
Step S44, determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the number of times that each first keyword in each test case pair appears in the first test case, and the number of times that each second keyword in each test case pair appears in the second test case.
In this embodiment, the first keyword of the first test case and the number of times the first keyword appears in the first test case in each test case pair may be formed into a first vector, the second keyword of the second test case and the number of times the second keyword appears in the second test case may be formed into a second vector, the similarity between the first vector and the second vector may be calculated, and the similarity between the first vector and the second vector may be used as the similarity between the feature values of the first test case and the second test case.
In this embodiment, the similarity of the feature values of the two test cases may be determined using a cosine similarity algorithm. Specifically, the similarity of the first vector and the second vector may be calculated using a cosine similarity algorithm.
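The vector construction and cosine similarity described above can be sketched as follows. Aligning the two count vectors over the union of both keyword lists is an assumption about how the first vector and the second vector are made comparable; the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(first_keywords, second_keywords):
    """Cosine similarity of the keyword-count vectors of two test cases."""
    first_counts = Counter(first_keywords)     # occurrences in the first test case
    second_counts = Counter(second_keywords)   # occurrences in the second test case
    # Shared axis order: every keyword that occurs in either test case.
    vocabulary = sorted(set(first_counts) | set(second_counts))
    first_vector = [first_counts[word] for word in vocabulary]
    second_vector = [second_counts[word] for word in vocabulary]
    dot = sum(a * b for a, b in zip(first_vector, second_vector))
    norm = (math.sqrt(sum(a * a for a in first_vector))
            * math.sqrt(sum(b * b for b in second_vector)))
    # An empty keyword list gives a zero vector; treat its similarity as 0.
    return dot / norm if norm else 0.0


score = cosine_similarity(["login", "login", "page"], ["login", "page", "page"])
```

In this example the vectors over (login, page) are (2, 1) and (1, 2), so the similarity is 4 / 5 = 0.8.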
Steps S43-S44 are a specific implementation of step S23 in embodiment 2.
Step S45, taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed.
Step S46, deduplicating the test cases in the plurality of test case pairs to be processed.
For the detailed procedure of steps S45-S46, refer to steps S24-S25 in embodiment 2; it is not repeated here.
In this embodiment, the similarity of the feature values of the two test cases is determined based on the keywords of the two test cases and the number of times each keyword appears in its test case, so that the accuracy of determining the similarity of the feature values can be improved.
Next, the test case deduplication apparatus provided in the present application is described. The apparatus described below and the test case deduplication method described above may be read with reference to each other.
Referring to fig. 5, the test case deduplication apparatus includes: the device comprises an extraction module 11, a first determination module 12, a second determination module 13 and a deduplication module 14.
The extraction module 11 is configured to extract a feature value of each test case;
the first determining module 12 is configured to determine, for each test case pair, the similarity between the feature values of the two test cases in the pair, wherein each test case pair is formed by combining any two test cases selected from a plurality of test cases;
the second determining module 13 is configured to take each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed;
and the deduplication module 14 is configured to deduplicate the test cases in the plurality of test case pairs to be processed.
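The division into four modules can be sketched as one small class, under stated assumptions. The class name, the callables `token_set` and `token_overlap`, and the choice to remove the second test case of each pair are hypothetical; the overlap measure is a placeholder, not the similarity computation of any particular embodiment.

```python
from itertools import combinations


def token_set(raw_text):
    """Placeholder extraction: the set of lowercase whitespace-separated tokens."""
    return frozenset(raw_text.lower().split())


def token_overlap(first, second):
    """Placeholder similarity: shared tokens divided by all tokens."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


class DeduplicationDevice:
    def __init__(self, extract, similarity, threshold):
        self.extract = extract        # role of extraction module 11
        self.similarity = similarity  # role of first determining module 12
        self.threshold = threshold    # used by second determining module 13

    def run(self, cases):
        """cases: dict mapping test case id to raw text. Returns the ids kept."""
        features = {case_id: self.extract(raw) for case_id, raw in cases.items()}
        scored = [(a, b, self.similarity(features[a], features[b]))
                  for a, b in combinations(features, 2)]
        to_process = [(a, b) for a, b, score in scored if score > self.threshold]
        # Role of deduplication module 14: remove the second case of each pair.
        removed = {b for _, b in to_process}
        return [case_id for case_id in cases if case_id not in removed]


device = DeduplicationDevice(token_set, token_overlap, 0.8)
kept = device.run({
    "TC-1": "Open login page",
    "TC-2": "open login page",
    "TC-3": "export monthly report",
})
```

Passing the extraction and similarity functions as parameters mirrors the statement that each module's function may be implemented in one or more software elements.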
In this embodiment, the extracting module 11 may specifically be configured to:
extracting characteristic values of each test case;
performing word segmentation on the characteristic values of each test case to obtain at least one keyword;
accordingly, the first determining module 12 may specifically be configured to:
determine the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by word segmentation of the feature value of each test case.
In this embodiment, the first determining module 12 may specifically be configured to:
determining similar keyword pairs in keywords of two test cases in each test case pair respectively, wherein the similar keyword pairs consist of a first keyword and a second keyword, the text similarity of the first keyword and the second keyword is larger than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
In this embodiment, the first determining module 12 may specifically be configured to:
counting the occurrence times of each first keyword in each test case pair in a first test case and the occurrence times of each second keyword in each test case pair in a second test case respectively, wherein the first test case and the second test case form the test case pair, the first keyword is the keyword of the first test case, and the second keyword is the keyword of the second test case;
and determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
In this embodiment, the first determining module 12 may specifically be configured to:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
It should be noted that each embodiment emphasizes its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another. Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present application.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
The test case deduplication method and apparatus provided in the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. A person skilled in the art may, in accordance with the idea of the present application, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.
Claims (8)
1. A test case deduplication method, comprising:
extracting a characteristic value of each test case; the characteristic value is any one or more of: a characteristic value characterizing the functional module to which the test case belongs, a characteristic value characterizing the description of the test case, a characteristic value characterizing the operation steps of the test case, and a characteristic value characterizing the expected result of executing the test case;
respectively determining the similarity of the characteristic values of the two test cases in each test case pair, wherein each test case pair is formed by selecting any two test cases from a plurality of test cases;
taking each test case pair whose characteristic-value similarity is greater than a preset similarity threshold as a test case pair to be processed;
performing de-duplication on the test cases in the plurality of test case pairs to be processed, including: removing one of the two test cases in each test case pair to be processed to obtain a first test case set, and, if the first test case set contains a group of at least 2 test cases that are the same, selecting one test case of that group for retention;
the extracting the characteristic value of each test case comprises the following steps:
extracting characteristic values of each test case;
performing word segmentation on the characteristic value of each test case to obtain at least one keyword, wherein the word segmentation is performed based on a Python word segmentation library;
the determining the similarity of the feature values of the two test cases in each test case pair comprises the following steps:
and determining the similarity of the characteristic values of the two test cases in each test case pair based on the keywords obtained by word segmentation of the characteristic value of each test case.
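The overall flow of claim 1 can be illustrated with a minimal Python sketch. It is one possible reading, not the patented implementation. The regex tokenizer `segment` is a stand-in for the Python word segmentation library, which the claim does not name; `jaccard` is a placeholder similarity, since claim 1 does not fix a measure; the test case IDs, the 0.8 threshold, and the choice to drop the later case of each pair are all assumptions.

```python
import re
from itertools import combinations


def segment(text):
    """Stand-in tokenizer (assumption): lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(first_keywords, second_keywords):
    """Placeholder similarity (assumption): overlap of the two keyword sets."""
    first, second = set(first_keywords), set(second_keywords)
    if not first or not second:
        return 0.0
    return len(first & second) / len(first | second)


def deduplicate(cases, threshold=0.8, similarity=jaccard):
    """Return the IDs of the test cases kept after deduplication.

    `cases` maps a hypothetical test case ID to its characteristic-value text.
    """
    keywords = {case_id: segment(text) for case_id, text in cases.items()}
    removed = set()
    # Every unordered pair of test cases is a candidate test case pair.
    for first_id, second_id in combinations(cases, 2):
        # A pair above the threshold is a "pair to be processed":
        # drop one of its two cases (here, arbitrarily, the later one).
        if similarity(keywords[first_id], keywords[second_id]) > threshold:
            removed.add(second_id)
    # Survivors are tracked by ID, so a case retained from several pairs
    # appears only once in the result.
    return [case_id for case_id in cases if case_id not in removed]


suite = {
    "TC1": "Login with valid user name and password",
    "TC2": "Login with valid user name and password",
    "TC3": "Export report as PDF",
}
kept = deduplicate(suite)  # ['TC1', 'TC3']
```

Passing a different `similarity` function, such as one based on similar keyword pairs or on keyword frequencies, changes only the scoring step and leaves the pairing and removal logic untouched.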
2. The method according to claim 1, wherein the determining the similarity of the feature values of two test cases in each test case pair based on the keywords obtained by word segmentation of the feature values of each test case, respectively, comprises:
determining similar keyword pairs in keywords of two test cases in each test case pair respectively, wherein the similar keyword pairs consist of a first keyword and a second keyword, the text similarity of the first keyword and the second keyword is larger than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
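A minimal sketch of the similar-keyword-pair approach in claim 2 follows. The claim specifies neither the text similarity measure for keywords nor how the pair count is turned into a similarity, so both choices here are assumptions: `difflib.SequenceMatcher` from the Python standard library scores keyword similarity, each keyword is matched at most once, and the count is normalized by the size of the larger keyword list.

```python
from difflib import SequenceMatcher


def count_similar_pairs(first_keywords, second_keywords, text_threshold=0.8):
    """Count similar keyword pairs, using each second keyword at most once."""
    unused = list(second_keywords)
    count = 0
    for first_kw in first_keywords:
        for second_kw in unused:
            ratio = SequenceMatcher(None, first_kw, second_kw).ratio()
            if ratio > text_threshold:
                # Consume the matched keyword, then move to the next first keyword.
                unused.remove(second_kw)
                count += 1
                break
    return count


def pair_count_similarity(first_keywords, second_keywords, text_threshold=0.8):
    """Normalize the pair count by the larger keyword list (assumption)."""
    longest = max(len(first_keywords), len(second_keywords))
    if longest == 0:
        return 0.0
    pairs = count_similar_pairs(first_keywords, second_keywords, text_threshold)
    return pairs / longest


first = ["login", "password", "submit"]
second = ["logins", "password", "cancel"]
score = pair_count_similarity(first, second)  # 2 similar pairs out of 3 keywords
```

The one-to-one matching keeps the score within the range 0 to 1. A looser reading of the claim could count every similar pair instead.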
3. The method according to claim 1, wherein the determining the similarity of the feature values of two test cases in each test case pair based on the keywords obtained by word segmentation of the feature values of each test case, respectively, comprises:
for each test case pair, counting the number of times each first keyword appears in a first test case and the number of times each second keyword appears in a second test case, wherein the first test case and the second test case form the test case pair, the first keyword is a keyword of the first test case, and the second keyword is a keyword of the second test case;
and determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
4. The method of claim 3, wherein determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the number of times each first keyword in each test case pair appears in the first test case, and the number of times each second keyword in each test case pair appears in the second test case, respectively, comprises:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
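The frequency-based comparison of claims 3 and 4 can be sketched as follows. This is an illustrative reading in which each test case is represented by a vector of keyword occurrence counts over the union of the two keyword lists, and the similarity is the standard cosine of the angle between the two vectors. The function name and the sample keywords are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(first_keywords, second_keywords):
    """Cosine similarity of the keyword occurrence counts of two test cases."""
    first_counts = Counter(first_keywords)
    second_counts = Counter(second_keywords)
    # Only keywords present in both test cases contribute to the dot product.
    shared = first_counts.keys() & second_counts.keys()
    dot = sum(first_counts[kw] * second_counts[kw] for kw in shared)
    first_norm = math.sqrt(sum(n * n for n in first_counts.values()))
    second_norm = math.sqrt(sum(n * n for n in second_counts.values()))
    if first_norm == 0 or second_norm == 0:
        return 0.0
    return dot / (first_norm * second_norm)


# Counts are {login: 2, password: 1} and {login: 1, password: 2}:
# dot product 4, both norms sqrt(5), so the similarity is 4 / 5.
score = cosine_similarity(
    ["login", "login", "password"],
    ["login", "password", "password"],
)
```

A test case pair whose score exceeds the preset similarity threshold would then be treated as a test case pair to be processed.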
5. A test case deduplication apparatus, comprising:
the extraction module is used for extracting a characteristic value of each test case; the characteristic value is any one or more of: a characteristic value characterizing the functional module to which the test case belongs, a characteristic value characterizing the description of the test case, a characteristic value characterizing the operation steps of the test case, and a characteristic value characterizing the expected result of executing the test case;
the first determining module is used for respectively determining the similarity of the characteristic values of the two test cases in each test case pair, wherein each test case pair is formed by selecting any two test cases from a plurality of test cases;
the second determining module is used for taking each test case pair whose characteristic-value similarity is greater than a preset similarity threshold as a test case pair to be processed;
the de-duplication module is configured to de-duplicate the test cases in the plurality of test case pairs to be processed, including: removing one of the two test cases in each test case pair to be processed to obtain a first test case set, and, if the first test case set contains a group of at least 2 test cases that are the same, selecting one test case of that group for retention;
the extraction module is specifically configured to:
extracting characteristic values of each test case;
performing word segmentation on the characteristic value of each test case to obtain at least one keyword, wherein the word segmentation is performed based on a Python word segmentation library;
the first determining module is specifically configured to:
and determining the similarity of the characteristic values of the two test cases in each test case pair based on the keywords obtained by word segmentation of the characteristic value of each test case.
6. The apparatus of claim 5, wherein the first determining module is specifically configured to:
determining similar keyword pairs in keywords of two test cases in each test case pair respectively, wherein the similar keyword pairs consist of a first keyword and a second keyword, the text similarity of the first keyword and the second keyword is larger than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
7. The apparatus of claim 5, wherein the first determining module is specifically configured to:
for each test case pair, counting the number of times each first keyword appears in a first test case and the number of times each second keyword appears in a second test case, wherein the first test case and the second test case form the test case pair, the first keyword is a keyword of the first test case, and the second keyword is a keyword of the second test case;
and determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
8. The apparatus of claim 7, wherein the first determining module is specifically configured to:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010505902.6A CN111625468B (en) | 2020-06-05 | 2020-06-05 | Test case duplicate removal method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111625468A (en) | 2020-09-04 |
CN111625468B (en) | 2024-04-16 |
Family
ID=72260191
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010505902.6A Active CN111625468B (en) | 2020-06-05 | 2020-06-05 | Test case duplicate removal method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111625468B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11954019B2 (en) | 2022-02-04 | 2024-04-09 | Optum, Inc. | Machine learning techniques for automated software testing configuration management |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8234285B1 (en) * | 2009-07-10 | 2012-07-31 | Google Inc. | Context-dependent similarity measurements |
CN103678702A (en) * | 2013-12-30 | 2014-03-26 | 优视科技有限公司 | Video duplicate removal method and device |
CN104636319A (en) * | 2013-11-11 | 2015-05-20 | 腾讯科技(北京)有限公司 | Text duplicate removal method and device |
CN105824798A (en) * | 2016-03-03 | 2016-08-03 | 云南电网有限责任公司教育培训评价中心 | Examination question de-duplicating method of examination question base based on examination question key word likeness |
CN106598940A (en) * | 2016-11-01 | 2017-04-26 | 四川用联信息技术有限公司 | Text similarity solution algorithm based on global optimization of keyword quality |
WO2017107566A1 (en) * | 2015-12-25 | 2017-06-29 | 广州视源电子科技股份有限公司 | Retrieval method and system based on word vector similarity |
CN109101620A (en) * | 2018-08-08 | 2018-12-28 | 广州神马移动信息科技有限公司 | Similarity calculating method, clustering method, device, storage medium and electronic equipment |
CN109508378A (en) * | 2018-11-26 | 2019-03-22 | 平安科技(深圳)有限公司 | A kind of sample data processing method and processing device |
CN110162630A (en) * | 2019-05-09 | 2019-08-23 | 深圳市腾讯信息技术有限公司 | A kind of method, device and equipment of text duplicate removal |
CN110163688A (en) * | 2019-05-30 | 2019-08-23 | 复旦大学 | Commodity network public sentiment detection system |
CN110162750A (en) * | 2019-01-24 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Text similarity detection method, electronic equipment and computer readable storage medium |
CN110276021A (en) * | 2019-04-29 | 2019-09-24 | 小轮(上海)网络科技有限公司 | Place name matching process and device based on semantic similarity |
CN110377886A (en) * | 2019-06-19 | 2019-10-25 | 平安国际智慧城市科技股份有限公司 | Project duplicate checking method, apparatus, equipment and storage medium |
CN110442760A (en) * | 2019-07-24 | 2019-11-12 | 银江股份有限公司 | A kind of the synonym method for digging and device of question and answer searching system |
CN110941598A (en) * | 2019-12-02 | 2020-03-31 | 北京锐安科技有限公司 | Data deduplication method, device, terminal and storage medium |
CN110956037A (en) * | 2019-10-16 | 2020-04-03 | 厦门美柚股份有限公司 | Multimedia content repeated judgment method and device |
CN111159445A (en) * | 2019-12-30 | 2020-05-15 | 深圳云天励飞技术有限公司 | Picture filtering method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111625468A (en) | 2020-09-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||