CN111625468B - Test case duplicate removal method and device - Google Patents

Info

Publication number
CN111625468B
Authority
CN
China
Prior art keywords
test case
keyword
test
pair
cases
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010505902.6A
Other languages
Chinese (zh)
Other versions
CN111625468A (en
Inventor
李刘强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202010505902.6A
Publication of CN111625468A
Application granted
Publication of CN111625468B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a test case deduplication method and device. The method extracts the feature values of each test case, determines the similarity of the feature values of the two test cases in each test case pair, takes each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed, and deduplicates the test cases in the plurality of test case pairs to be processed, thereby achieving automatic deduplication of test cases. In addition, because the feature values of all test cases are extracted and the similarity is determined for every test case pair, all test cases are processed, no test case is omitted, and the accuracy of test case deduplication is improved.

Description

Test case duplicate removal method and device
Technical Field
The present disclosure relates to the field of testing technologies, and in particular, to a test case deduplication method and apparatus.
Background
A large business system may require a large number of test cases. To produce enough test cases, different testers are needed to write them, and the test cases written by different testers may overlap, which wastes test resources.
Therefore, duplicated test cases need to be deduplicated. At present, test case deduplication is generally performed manually, which is both inaccurate and inefficient.
Disclosure of Invention
To solve the above technical problem, embodiments of the present application provide a test case deduplication method and apparatus that improve the efficiency and accuracy of test case deduplication. The technical solution is as follows:
a test case deduplication method, comprising:
extracting feature values of each test case;
determining, for each test case pair, the similarity of the feature values of the two test cases in the pair, wherein each test case pair is formed by selecting any two test cases from a plurality of test cases;
taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed;
and deduplicating the test cases in the plurality of test case pairs to be processed.
Preferably, the extracting the feature value of each test case includes:
extracting characteristic values of each test case;
performing word segmentation on the characteristic values of each test case to obtain at least one keyword;
the determining the similarity of the feature values of the two test cases in each test case pair comprises:
determining the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by segmenting the feature values of each test case.
Preferably, the determining the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by word segmentation of the feature values of each test case includes:
determining similar keyword pairs in keywords of two test cases in each test case pair respectively, wherein the similar keyword pairs consist of a first keyword and a second keyword, the text similarity of the first keyword and the second keyword is larger than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
Preferably, the determining the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by word segmentation of the feature values of each test case includes:
counting the occurrence times of each first keyword in each test case pair in a first test case and the occurrence times of each second keyword in each test case pair in a second test case respectively, wherein the first test case and the second test case form the test case pair, the first keyword is the keyword of the first test case, and the second keyword is the keyword of the second test case;
and determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
Preferably, the determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the number of times each first keyword in each test case pair appears in the first test case, and the number of times each second keyword in each test case pair appears in the second test case, includes:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
A test case deduplication apparatus comprising:
the extraction module is used for extracting the characteristic values of each test case;
the first determining module is used for determining, for each test case pair, the similarity of the feature values of the two test cases in the pair, wherein each test case pair is formed by selecting any two test cases from a plurality of test cases;
the second determining module is used for taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed;
and the deduplication module is used for deduplicating the test cases in the plurality of test case pairs to be processed.
Preferably, the extraction module is specifically configured to:
extracting characteristic values of each test case;
performing word segmentation on the characteristic values of each test case to obtain at least one keyword;
the first determining module is specifically configured to:
determining the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by segmenting the feature values of each test case.
Preferably, the first determining module is specifically configured to:
determining similar keyword pairs in keywords of two test cases in each test case pair respectively, wherein the similar keyword pairs consist of a first keyword and a second keyword, the text similarity of the first keyword and the second keyword is larger than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
Preferably, the first determining module is specifically configured to:
counting the occurrence times of each first keyword in each test case pair in a first test case and the occurrence times of each second keyword in each test case pair in a second test case respectively, wherein the first test case and the second test case form the test case pair, the first keyword is the keyword of the first test case, and the second keyword is the keyword of the second test case;
and determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
Preferably, the first determining module is specifically configured to:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
Compared with the prior art, the beneficial effects of this application are:
In the present application, the feature values of each test case are extracted, the similarity of the feature values of the two test cases in each test case pair is determined, each test case pair whose feature-value similarity is greater than a preset similarity threshold is taken as a test case pair to be processed, and the test cases in the plurality of test case pairs to be processed are deduplicated, thereby achieving automatic deduplication of test cases. In addition, because the feature values of all test cases are extracted and the similarity is determined for every test case pair, all test cases are processed, no test case is omitted, and the accuracy of test case deduplication is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of Embodiment 1 of the test case deduplication method provided herein;
Fig. 2 is a flowchart of Embodiment 2 of the test case deduplication method provided herein;
Fig. 3 is a flowchart of Embodiment 3 of the test case deduplication method provided herein;
Fig. 4 is a flowchart of Embodiment 4 of the test case deduplication method provided herein;
Fig. 5 is a schematic structural diagram of the test case deduplication apparatus provided in the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
An embodiment of the present application discloses a test case deduplication method, comprising: extracting feature values of each test case; determining, for each test case pair, the similarity of the feature values of the two test cases in the pair, wherein each test case pair is formed by selecting any two test cases from a plurality of test cases; taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed; and deduplicating the test cases in the plurality of test case pairs to be processed. The method can improve the efficiency and accuracy of deduplication.
The test case deduplication method disclosed in the embodiments of the present application is described next. Fig. 1 is a flowchart of Embodiment 1 of the method, which may include the following steps:
Step S11, extracting feature values of each test case.
The characteristic values of the test cases may include, but are not limited to: any one or more of a feature value characterizing a functional module to which the test case belongs, a feature value describing the test case, a feature value characterizing an operational step of the test case, and a feature value characterizing an expected result of execution of the test case.
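The four kinds of feature values listed above could be held in a simple record. The following is a minimal sketch only: the `CaseRecord` class, its field names, and the `extract_feature_values` helper are hypothetical and are not defined by the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical record for one test case; all field names are assumptions."""
    case_id: str
    functional_module: str   # functional module the test case belongs to
    description: str         # description of the test case
    operation_steps: str     # operation steps of the test case
    expected_result: str     # expected result of executing the test case


def extract_feature_values(case: CaseRecord) -> List[str]:
    """Return the non-empty feature values of a test case as text fragments."""
    fields = (case.functional_module, case.description,
              case.operation_steps, case.expected_result)
    return [value for value in fields if value]
```

Because the text allows "any one or more" of the feature values, the helper simply skips fields that are empty.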
Step S12, determining, for each test case pair, the similarity of the feature values of the two test cases in the pair, wherein each test case pair is formed by selecting any two test cases from a plurality of test cases.
After extracting the characteristic values of each test case, the plurality of test cases can be combined in pairs to obtain a plurality of test case pairs, and the similarity of the characteristic values of two test cases in each test case pair is respectively determined. Specifically, the similarity of the feature values of two test cases in each test case pair can be determined by using a cosine similarity algorithm.
The similarity of the feature values of the two test cases in each test case pair can be used as the similarity of the two test cases in each test case pair.
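Combining the test cases two by two can be sketched with the standard library. The function name `build_test_case_pairs` and the case identifiers are illustrative assumptions; the point is that n test cases yield n(n - 1)/2 unordered pairs, so every test case is compared with every other one.

```python
from itertools import combinations


def build_test_case_pairs(case_ids):
    """Combine test cases two by two; n cases yield n * (n - 1) / 2 pairs."""
    return list(combinations(case_ids, 2))


pairs = build_test_case_pairs(["TC-1", "TC-2", "TC-3", "TC-4"])
print(len(pairs))  # 4 * 3 / 2 = 6 pairs
```

The quadratic growth in the number of pairs is worth keeping in mind for large test suites.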
Step S13, taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed.
The preset similarity threshold may be set as needed, and is not limited in this embodiment.
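The threshold comparison in step S13 might look like the sketch below. The function name, the dictionary layout, and the scores are illustrative assumptions, not values from the application. The comparison is strict, following the wording "greater than".

```python
def select_pairs_to_process(pair_similarities, threshold):
    """Keep only the pairs whose similarity is strictly greater than the threshold."""
    return [pair for pair, similarity in pair_similarities.items()
            if similarity > threshold]


# Illustrative similarity scores, not measured data.
scores = {("TC-1", "TC-2"): 0.92, ("TC-1", "TC-3"): 0.40, ("TC-2", "TC-3"): 0.80}
to_process = select_pairs_to_process(scores, threshold=0.80)
```

With these numbers only the first pair is selected, because a similarity equal to the threshold does not qualify.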
Step S14, deduplicating the test cases in the plurality of test case pairs to be processed.
Deduplicating the test cases in the plurality of test case pairs to be processed may include:
S141, removing one of the two test cases in each test case pair to be processed to obtain a first test case set;
S142, if a first test case subset exists in the first test case set, selecting one test case from the first test case subset to retain, wherein the first test case subset comprises at least two test cases and those test cases are identical.
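Steps S141 and S142 can be read as the following sketch. The text does not say which member of a pair is removed, so dropping the second member is an arbitrary assumption made here, as are the function name and the identifiers.

```python
def deduplicate(pairs_to_process):
    """Sketch of steps S141 and S142.

    S141: drop one test case from every pair (here the second one, an
    arbitrary choice) and collect the kept ones into the first test case set.
    S142: if the same test case was kept more than once, retain one copy.
    """
    first_set = [kept for kept, _removed in pairs_to_process]  # S141
    return list(dict.fromkeys(first_set))                      # S142, order preserved


kept = deduplicate([("TC-1", "TC-2"), ("TC-1", "TC-3"), ("TC-4", "TC-5")])
```

Here `TC-1` survives from two different pairs, so S142 collapses the two copies into one.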
As another alternative embodiment of the present application, Fig. 2 is a flowchart of Embodiment 2 of the test case deduplication method provided in the present application. This embodiment mainly refines the test case deduplication method described in Embodiment 1. As shown in Fig. 2, the method may include, but is not limited to, the following steps:
Step S21, extracting feature values of each test case.
In this embodiment, the feature values of the test cases may include, but are not limited to: any one or more of a feature value characterizing a functional module to which the test case belongs, a feature value describing the test case, a feature value characterizing an operational step of the test case, and a feature value characterizing an expected result of execution of the test case.
Step S22, segmenting the feature values of each test case to obtain at least one keyword.
Steps S21-S22 are a specific implementation of step S11 in Embodiment 1.
In this embodiment, the feature values of each test case may be segmented using a Python word segmentation library to obtain at least one keyword.
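The embodiment does not name the segmentation library. The sketch below uses a regular expression as a stand-in tokenizer so that it runs with the standard library alone; this is an assumption, and the `segment` function is hypothetical. For Chinese text, which has no spaces between words, a dedicated segmenter would be needed instead (the `jieba` package and its `jieba.lcut` function are one possibility, again an assumption rather than something the text states).

```python
import re


def segment(feature_value):
    """Stand-in tokenizer: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", feature_value.lower())


keywords = segment("Log in with a valid user name and password")
```

The resulting keyword list keeps repeated words, which matters for approaches that count keyword occurrences.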
Step S23, determining the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by segmenting the feature values of each test case.
Step S23 is a specific implementation of step S12 in Embodiment 1.
Step S24, taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed.
Step S25, deduplicating the test cases in the plurality of test case pairs to be processed.
For the detailed procedure of steps S24-S25, refer to the description of steps S13-S14 in Embodiment 1; it is not repeated here.
In this embodiment, the keyword is obtained by word segmentation of the feature value, and the similarity of the feature value is determined based on the keyword, so that the complexity of determining the similarity of the feature value can be reduced, the efficiency of determining the similarity of the feature value is improved, and further the duplicate removal efficiency is improved.
As another alternative embodiment of the present application, Fig. 3 is a flowchart of Embodiment 3 of the test case deduplication method provided in the present application. This embodiment mainly refines the test case deduplication method described in Embodiment 2. As shown in Fig. 3, the method may include, but is not limited to, the following steps:
Step S31, extracting feature values of each test case.
Step S32, segmenting the feature values of each test case to obtain at least one keyword.
For the detailed procedure of steps S31-S32, refer to the description of steps S21-S22 in Embodiment 2; it is not repeated here.
Step S33, determining similar keyword pairs in keywords of two test cases in each test case pair respectively.
In this embodiment, the pair of similar keywords is composed of a first keyword and a second keyword, where the text similarity between the first keyword and the second keyword is greater than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases.
In this embodiment, the process of determining similar keyword pairs may include: calculating the text similarity of the first keyword and the second keyword, judging whether that text similarity is greater than the set text similarity threshold, and if so, forming a similar keyword pair from the first keyword and the second keyword.
And step S34, counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
In this embodiment, a correspondence between the number of similar keyword pairs and the similarity of the feature values of the test cases may be set in advance. After the number of similar keyword pairs is counted, the similarity corresponding to that number is looked up in the correspondence, and the similarity found is used as the similarity of the feature values of the two test cases.
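Steps S33 and S34 can be sketched as follows. The embodiment does not specify the text similarity measure or the form of the correspondence, so two assumptions are made here: `difflib.SequenceMatcher.ratio()` from the Python standard library serves as the text similarity, and the correspondence is a small lookup table in which a count maps to the entry for the largest configured count not exceeding it. All names and numbers are hypothetical.

```python
from difflib import SequenceMatcher


def count_similar_keyword_pairs(first_keywords, second_keywords, text_threshold):
    """Count (first, second) keyword pairs whose text similarity exceeds the threshold."""
    count = 0
    for first in first_keywords:
        for second in second_keywords:
            if SequenceMatcher(None, first, second).ratio() > text_threshold:
                count += 1
    return count


def similarity_from_count(pair_count, correspondence):
    """Look up the similarity for the largest configured count not above pair_count."""
    eligible = [count for count in correspondence if count <= pair_count]
    return correspondence[max(eligible)] if eligible else 0.0


# Hypothetical correspondence: number of similar keyword pairs -> similarity.
correspondence = {0: 0.0, 1: 0.3, 2: 0.6, 3: 0.9}
n = count_similar_keyword_pairs(["login", "password"],
                                ["logins", "passwords", "logout"], 0.8)
similarity = similarity_from_count(n, correspondence)
```

In this example "login"/"logins" and "password"/"passwords" pass the 0.8 text threshold, while "login"/"logout" does not, so two similar keyword pairs are counted.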
Steps S33-S34 are a specific implementation of step S23 in Embodiment 2.
Step S35, taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed.
Step S36, deduplicating the test cases in the plurality of test case pairs to be processed.
For the detailed procedure of steps S35-S36, refer to steps S24-S25 in Embodiment 2; it is not repeated here.
As another alternative embodiment of the present application, Fig. 4 is a flowchart of Embodiment 4 of the test case deduplication method provided in the present application. This embodiment mainly refines the test case deduplication method described in Embodiment 2. As shown in Fig. 4, the method may include, but is not limited to, the following steps:
Step S41, extracting feature values of each test case.
Step S42, segmenting the feature values of each test case to obtain at least one keyword.
Step S43, counting, for each test case pair, the number of times each first keyword appears in the first test case and the number of times each second keyword appears in the second test case.
The first test case and the second test case form the test case pair, the first keyword is a keyword of the first test case, and the second keyword is a keyword of the second test case.
Step S44, determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the number of times that each first keyword in each test case pair appears in the first test case, and the number of times that each second keyword in each test case pair appears in the second test case.
In this embodiment, the first keyword of the first test case and the number of times the first keyword appears in the first test case in each test case pair may be formed into a first vector, the second keyword of the second test case and the number of times the second keyword appears in the second test case may be formed into a second vector, the similarity between the first vector and the second vector may be calculated, and the similarity between the first vector and the second vector may be used as the similarity between the feature values of the first test case and the second test case.
In this embodiment, the similarity of the feature values of the two test cases may be determined using a cosine similarity algorithm. Specifically, the similarity of the first vector and the second vector may be calculated using a cosine similarity algorithm.
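The vector construction and cosine similarity described above can be sketched as follows. Cosine similarity is the dot product of the two vectors divided by the product of their lengths, cos θ = (A · B) / (|A| |B|). The function name and the handling of an empty keyword list (returning 0.0) are assumptions of this sketch.

```python
import math
from collections import Counter


def cosine_similarity(first_keywords, second_keywords):
    """Cosine similarity of the keyword-frequency vectors of two test cases."""
    first_counts = Counter(first_keywords)
    second_counts = Counter(second_keywords)
    # Both vectors are indexed by the union of the two keyword sets;
    # a Counter yields 0 for a keyword that does not occur.
    vocabulary = set(first_counts) | set(second_counts)
    dot = sum(first_counts[word] * second_counts[word] for word in vocabulary)
    first_norm = math.sqrt(sum(n * n for n in first_counts.values()))
    second_norm = math.sqrt(sum(n * n for n in second_counts.values()))
    if first_norm == 0 or second_norm == 0:
        return 0.0  # no keywords on one side: treat as dissimilar
    return dot / (first_norm * second_norm)
```

For example, the keyword lists `["login", "login", "password"]` and `["login", "password", "password"]` give the vectors (2, 1) and (1, 2), whose cosine similarity is 4 / 5 = 0.8.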
Steps S43-S44 are a specific implementation of step S23 in Embodiment 2.
Step S45, taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed.
Step S46, deduplicating the test cases in the plurality of test case pairs to be processed.
For the detailed procedure of steps S45-S46, refer to steps S24-S25 in Embodiment 2; it is not repeated here.
In this embodiment, the similarity of the feature values of the two test cases is determined based on the number of the similar keyword pairs and the number of times that the keywords in the similar keyword pairs appear in the keywords of the two test cases, so that the accuracy of determining the similarity of the feature values can be improved.
The test case deduplication apparatus provided in the present application is described next. The apparatus described below and the test case deduplication method described above may be read with reference to each other.
Referring to Fig. 5, the test case deduplication apparatus includes an extraction module 11, a first determining module 12, a second determining module 13 and a deduplication module 14.
An extracting module 11, configured to extract feature values of each test case;
a first determining module 12, configured to determine, for each test case pair, the similarity of the feature values of the two test cases in the pair, wherein each test case pair is formed by selecting any two test cases from a plurality of test cases;
a second determining module 13, configured to take each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed;
and a deduplication module 14, configured to deduplicate the test cases in the plurality of test case pairs to be processed.
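One way to picture how the four modules cooperate is the sketch below. The `Deduplicator` class, its parameters, and the identifiers are hypothetical. Jaccard similarity over word sets is swapped in purely to keep the example short and is not the measure described in the embodiments. Removing the second member of each selected pair is likewise an assumption.

```python
from itertools import combinations


class Deduplicator:
    """Hypothetical composition of the four modules; all names are assumptions."""

    def __init__(self, extract, similarity, threshold):
        self.extract = extract          # extraction module 11
        self.similarity = similarity    # used by the first determining module 12
        self.threshold = threshold      # used by the second determining module 13

    def run(self, cases):
        features = {case_id: self.extract(text) for case_id, text in cases.items()}
        scores = {(a, b): self.similarity(features[a], features[b])
                  for a, b in combinations(features, 2)}
        to_process = [pair for pair, score in scores.items() if score > self.threshold]
        removed = {second for _first, second in to_process}  # deduplication module 14
        return [case_id for case_id in cases if case_id not in removed]


def jaccard(first, second):
    """Stand-in similarity: size of the intersection over size of the union."""
    return len(first & second) / len(first | second) if first | second else 0.0


dedup = Deduplicator(extract=lambda text: set(text.lower().split()),
                     similarity=jaccard, threshold=0.7)
remaining = dedup.run({
    "TC-1": "login with valid password",
    "TC-2": "login with valid password",
    "TC-3": "transfer funds between accounts",
})
```

Passing the extraction and similarity functions in as parameters mirrors the way the later refinements change only what the extraction module and the first determining module do.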
In this embodiment, the extracting module 11 may specifically be configured to:
extracting characteristic values of each test case;
performing word segmentation on the characteristic values of each test case to obtain at least one keyword;
accordingly, the first determining module 12 may specifically be configured to:
determine the similarity of the feature values of the two test cases in each test case pair based on the keywords obtained by segmenting the feature values of each test case.
In this embodiment, the first determining module 12 may specifically be configured to:
determining similar keyword pairs in keywords of two test cases in each test case pair respectively, wherein the similar keyword pairs consist of a first keyword and a second keyword, the text similarity of the first keyword and the second keyword is larger than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
In this embodiment, the first determining module 12 may specifically be configured to:
counting the occurrence times of each first keyword in each test case pair in a first test case and the occurrence times of each second keyword in each test case pair in a second test case respectively, wherein the first test case and the second test case form the test case pair, the first keyword is the keyword of the first test case, and the second keyword is the keyword of the second test case;
and determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
In this embodiment, the first determining module 12 may specifically be configured to:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the frequency of occurrence of each first keyword in each test case pair in the first test case, and the frequency of occurrence of each second keyword in each test case pair in the second test case.
It should be noted that each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant points, refer to the description of the method embodiments.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present application.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
The test case deduplication method and apparatus provided in the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help in understanding the method of the present application and its core ideas. Meanwhile, a person skilled in the art may, in accordance with the ideas of the present application, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (8)

1. A test case deduplication method, comprising:
extracting a characteristic value of each test case, wherein the characteristic value is any one or more of: a characteristic value characterizing the functional module to which the test case belongs, a characteristic value characterizing the description of the test case, a characteristic value characterizing the operation steps of the test case, and a characteristic value characterizing the expected execution result of the test case;
determining, for each test case pair, the similarity of the characteristic values of the two test cases in the pair, wherein each test case pair is formed by selecting any two test cases from a plurality of test cases;
taking each test case pair whose characteristic-value similarity is greater than a preset similarity threshold as a test case pair to be processed;
de-duplicating the test cases in the plurality of test case pairs to be processed, including: removing one of the two test cases in each test case pair to be processed to obtain a first test case set, and, if a first test case subset exists in the first test case set, selecting one test case in the first test case subset to retain, wherein the first test case subset comprises at least 2 test cases, and the at least 2 test cases are the same;
wherein the extracting a characteristic value of each test case comprises:
extracting the characteristic value of each test case; and
performing word segmentation on the characteristic value of each test case, based on a Python word segmentation library, to obtain at least one keyword;
and wherein the determining the similarity of the characteristic values of the two test cases in each test case pair comprises:
determining the similarity of the characteristic values of the two test cases in each test case pair based on the keywords obtained by performing word segmentation on the characteristic value of each test case.
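As an illustration of claim 1, the following Python sketch walks through the claimed steps: segment each characteristic value into keywords, score every test case pair, flag the pairs above the threshold, and remove one test case from each flagged pair. It is a minimal sketch under stated assumptions, not the patented implementation. The regex tokenizer `segment` stands in for the unnamed Python word segmentation library (Chinese text would need a real segmenter), the set-overlap score `keyword_overlap` is a placeholder for whichever similarity the dependent claims select, and the function names, the 0.8 threshold, and the rule of removing the second member of each pair are all hypothetical.

```python
import re
from itertools import combinations


def segment(text):
    # Stand-in tokenizer (assumption): the claim names only "a Python word
    # segmentation library", so a plain regex split is used here.
    return re.findall(r"\w+", text.lower())


def keyword_overlap(first_keywords, second_keywords):
    # Placeholder similarity (assumption): shared keywords / all keywords.
    first, second = set(first_keywords), set(second_keywords)
    if not first or not second:
        return 0.0
    return len(first & second) / len(first | second)


def deduplicate(cases, threshold=0.8, similarity=keyword_overlap):
    """cases maps a test case id to its characteristic-value text."""
    keywords = {case_id: segment(text) for case_id, text in cases.items()}

    # Test case pairs to be processed: similarity above the preset threshold.
    pending = [
        (a, b)
        for a, b in combinations(cases, 2)
        if similarity(keywords[a], keywords[b]) > threshold
    ]

    # Remove one case from each pair; the retained ids may repeat.
    first_set = [a for a, _ in pending]
    removed = {b for _, b in pending}

    # Keep one copy of any repeated id. Dropping ids that were removed in
    # another pair is an assumption of this sketch.
    retained = [c for c in dict.fromkeys(first_set) if c not in removed]

    # Cases that never appeared in a flagged pair are kept (assumption).
    flagged = set(first_set) | removed
    untouched = [c for c in cases if c not in flagged]
    return retained + untouched


cases = {
    "A": "login with valid password",
    "B": "login with valid password",
    "C": "Login with valid password",
    "D": "export monthly report to pdf",
}
print(deduplicate(cases))  # ['A', 'D']
```

Passing a different `similarity` callable lets the same skeleton be reused with the keyword-pair count of claim 2 or the cosine similarity of claim 4.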
2. The method according to claim 1, wherein the determining the similarity of the characteristic values of the two test cases in each test case pair based on the keywords obtained by performing word segmentation on the characteristic value of each test case comprises:
determining, for each test case pair, similar keyword pairs among the keywords of the two test cases, wherein each similar keyword pair consists of a first keyword and a second keyword whose text similarity is greater than a set text similarity threshold, the first keyword being a keyword of one of the two test cases and the second keyword being a keyword of the other of the two test cases; and
counting the number of similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of similar keyword pairs.
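The keyword-pair counting of claim 2 can be sketched as follows. The claim does not say how text similarity between two keywords is measured or how the pair count is turned into a score, so both choices here are assumptions: `difflib.SequenceMatcher.ratio()` from the Python standard library serves as the text similarity, and the count is normalized by the larger number of distinct keywords and capped at 1.0. The function names and the 0.8 text threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similar_keyword_pairs(first_keywords, second_keywords, text_threshold=0.8):
    # Compare every distinct first keyword with every distinct second keyword
    # and keep the pairs whose text similarity exceeds the threshold.
    pairs = []
    for first in dict.fromkeys(first_keywords):
        for second in dict.fromkeys(second_keywords):
            if SequenceMatcher(None, first, second).ratio() > text_threshold:
                pairs.append((first, second))
    return pairs


def pair_count_similarity(first_keywords, second_keywords, text_threshold=0.8):
    # Assumed normalization: pair count over the larger distinct-keyword count.
    pairs = similar_keyword_pairs(first_keywords, second_keywords, text_threshold)
    longest = max(len(set(first_keywords)), len(set(second_keywords)))
    if longest == 0:
        return 0.0
    return min(1.0, len(pairs) / longest)


first = ["login", "password", "submit"]
second = ["login", "passwords", "cancel"]
print(similar_keyword_pairs(first, second))
# [('login', 'login'), ('password', 'passwords')]
print(pair_count_similarity(first, second))  # 2 of 3 keywords matched
```

Fuzzy matching at the keyword level lets near-identical terms such as "password" and "passwords" count as a match, which exact set intersection would miss.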
3. The method according to claim 1, wherein the determining the similarity of the characteristic values of the two test cases in each test case pair based on the keywords obtained by performing word segmentation on the characteristic value of each test case comprises:
counting, for each test case pair, the number of times each first keyword appears in a first test case and the number of times each second keyword appears in a second test case, wherein the first test case and the second test case form the test case pair, the first keyword is a keyword of the first test case, and the second keyword is a keyword of the second test case; and
determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the number of times each first keyword appears in the first test case, and the number of times each second keyword appears in the second test case.
4. The method of claim 3, wherein the determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the number of times each first keyword appears in the first test case, and the number of times each second keyword appears in the second test case comprises:
determining the similarity of the characteristic values of the two test cases using a cosine similarity algorithm, based on the keywords of the two test cases in each test case pair, the number of times each first keyword appears in the first test case, and the number of times each second keyword appears in the second test case.
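The counting step of claim 3 and the cosine similarity of claim 4 can be illustrated with a short Python sketch. It assumes the two keyword lists have already been produced by word segmentation; the function name `cosine_similarity` and the choice to return 0.0 for an empty keyword list are assumptions of this sketch. Each test case becomes a vector of keyword counts over the combined vocabulary, and the score is the dot product of the two vectors divided by the product of their lengths.

```python
import math
from collections import Counter


def cosine_similarity(first_keywords, second_keywords):
    # Number of times each keyword appears in each test case.
    first_counts = Counter(first_keywords)
    second_counts = Counter(second_keywords)

    # Dot product over the combined vocabulary; a Counter returns 0 for a
    # keyword that is missing from one side.
    vocabulary = set(first_counts) | set(second_counts)
    dot = sum(first_counts[word] * second_counts[word] for word in vocabulary)

    first_norm = math.sqrt(sum(n * n for n in first_counts.values()))
    second_norm = math.sqrt(sum(n * n for n in second_counts.values()))
    if first_norm == 0 or second_norm == 0:
        return 0.0  # assumption: an empty keyword list matches nothing
    return dot / (first_norm * second_norm)


first = ["open", "account", "open"]   # counts: open=2, account=1
second = ["open", "account"]          # counts: open=1, account=1
print(cosine_similarity(first, second))  # 3 / sqrt(5 * 2), about 0.949
```

Because the score depends on the direction of the count vectors rather than their length, a test case that repeats the same steps more often still scores close to 1 against a shorter version of itself.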
5. A test case deduplication apparatus, comprising:
an extraction module, configured to extract a feature value of each test case, wherein the feature value is any one or more of: a feature value characterizing the functional module to which the test case belongs, a feature value characterizing the description of the test case, a feature value characterizing the operation steps of the test case, and a feature value characterizing the expected execution result of the test case;
a first determining module, configured to determine, for each test case pair, the similarity of the characteristic values of the two test cases in the pair, wherein each test case pair is formed by selecting any two test cases from a plurality of test cases;
a second determining module, configured to take each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed;
a de-duplication module, configured to de-duplicate the test cases in the plurality of test case pairs to be processed, including: removing one of the two test cases in each test case pair to be processed to obtain a first test case set, and, if a first test case subset exists in the first test case set, selecting one test case in the first test case subset to retain, wherein the first test case subset comprises at least 2 test cases, and the at least 2 test cases are the same;
wherein the extraction module is specifically configured to:
extract the characteristic value of each test case; and
perform word segmentation on the characteristic value of each test case, based on a Python word segmentation library, to obtain at least one keyword;
and wherein the first determining module is specifically configured to:
determine the similarity of the characteristic values of the two test cases in each test case pair based on the keywords obtained by performing word segmentation on the characteristic value of each test case.
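The module division of claim 5 could be mirrored in software roughly as below. This is only a structural sketch under stated assumptions: the class names follow the claim wording, but the `run` methods, the injected `tokenizer` and `similarity` callables, the helper functions `simple_tokenizer` and `set_overlap`, and the simplified rule of removing the second member of each flagged pair are all hypothetical.

```python
import re
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Keywords = List[str]
Pair = Tuple[str, str]


@dataclass
class ExtractionModule:
    tokenizer: Callable[[str], Keywords]

    def run(self, cases: Dict[str, str]) -> Dict[str, Keywords]:
        # Segment the feature text of each test case into keywords.
        return {case_id: self.tokenizer(text) for case_id, text in cases.items()}


@dataclass
class FirstDeterminingModule:
    similarity: Callable[[Keywords, Keywords], float]

    def run(self, keywords: Dict[str, Keywords]) -> Dict[Pair, float]:
        # Score every pair formed from any two test cases.
        return {
            (a, b): self.similarity(keywords[a], keywords[b])
            for a, b in combinations(keywords, 2)
        }


@dataclass
class SecondDeterminingModule:
    threshold: float

    def run(self, scores: Dict[Pair, float]) -> List[Pair]:
        # Pairs above the preset threshold become pairs to be processed.
        return [pair for pair, score in scores.items() if score > self.threshold]


class DeduplicationModule:
    def run(self, case_ids: List[str], pending: List[Pair]) -> List[str]:
        # Simplified rule (assumption): drop the second member of each pair.
        removed = {second for _, second in pending}
        return [case_id for case_id in case_ids if case_id not in removed]


def simple_tokenizer(text: str) -> Keywords:
    # Stand-in for the unnamed Python word segmentation library.
    return re.findall(r"\w+", text.lower())


def set_overlap(first: Keywords, second: Keywords) -> float:
    # Placeholder similarity: shared keywords / all keywords.
    union = set(first) | set(second)
    return len(set(first) & set(second)) / len(union) if union else 0.0


def run_apparatus(cases, tokenizer=simple_tokenizer, similarity=set_overlap, threshold=0.8):
    keywords = ExtractionModule(tokenizer).run(cases)
    scores = FirstDeterminingModule(similarity).run(keywords)
    pending = SecondDeterminingModule(threshold).run(scores)
    return DeduplicationModule().run(list(cases), pending)
```

Keeping the tokenizer and the similarity measure as injected callables means the first determining module can be switched between the variants of claims 6 to 8 without touching the other modules.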
6. The apparatus of claim 5, wherein the first determining module is specifically configured to:
determine, for each test case pair, similar keyword pairs among the keywords of the two test cases, wherein each similar keyword pair consists of a first keyword and a second keyword whose text similarity is greater than a set text similarity threshold, the first keyword being a keyword of one of the two test cases and the second keyword being a keyword of the other of the two test cases; and
count the number of similar keyword pairs, and determine the similarity of the characteristic values of the two test cases based on the number of similar keyword pairs.
7. The apparatus of claim 5, wherein the first determining module is specifically configured to:
count, for each test case pair, the number of times each first keyword appears in a first test case and the number of times each second keyword appears in a second test case, wherein the first test case and the second test case form the test case pair, the first keyword is a keyword of the first test case, and the second keyword is a keyword of the second test case; and
determine the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the number of times each first keyword appears in the first test case, and the number of times each second keyword appears in the second test case.
8. The apparatus of claim 7, wherein the first determining module is specifically configured to:
determine the similarity of the characteristic values of the two test cases using a cosine similarity algorithm, based on the keywords of the two test cases in each test case pair, the number of times each first keyword appears in the first test case, and the number of times each second keyword appears in the second test case.
CN202010505902.6A 2020-06-05 2020-06-05 Test case duplicate removal method and device Active CN111625468B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010505902.6A CN111625468B (en) 2020-06-05 2020-06-05 Test case duplicate removal method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010505902.6A CN111625468B (en) 2020-06-05 2020-06-05 Test case duplicate removal method and device

Publications (2)

Publication Number Publication Date
CN111625468A (en) 2020-09-04
CN111625468B (en) 2024-04-16

Family

ID=72260191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010505902.6A Active CN111625468B (en) 2020-06-05 2020-06-05 Test case duplicate removal method and device

Country Status (1)

Country Link
CN (1) CN111625468B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11954019B2 (en) 2022-02-04 2024-04-09 Optum, Inc. Machine learning techniques for automated software testing configuration management

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8234285B1 (en) * 2009-07-10 2012-07-31 Google Inc. Context-dependent similarity measurements
CN103678702A (en) * 2013-12-30 2014-03-26 优视科技有限公司 Video duplicate removal method and device
CN104636319A (en) * 2013-11-11 2015-05-20 腾讯科技(北京)有限公司 Text duplicate removal method and device
CN105824798A (en) * 2016-03-03 2016-08-03 云南电网有限责任公司教育培训评价中心 Examination question de-duplicating method of examination question base based on examination question key word likeness
CN106598940A (en) * 2016-11-01 2017-04-26 四川用联信息技术有限公司 Text similarity solution algorithm based on global optimization of keyword quality
WO2017107566A1 (en) * 2015-12-25 2017-06-29 广州视源电子科技股份有限公司 Retrieval method and system based on word vector similarity
CN109101620A (en) * 2018-08-08 2018-12-28 广州神马移动信息科技有限公司 Similarity calculating method, clustering method, device, storage medium and electronic equipment
CN109508378A (en) * 2018-11-26 2019-03-22 平安科技(深圳)有限公司 A kind of sample data processing method and processing device
CN110162630A (en) * 2019-05-09 2019-08-23 深圳市腾讯信息技术有限公司 A kind of method, device and equipment of text duplicate removal
CN110163688A (en) * 2019-05-30 2019-08-23 复旦大学 Commodity network public sentiment detection system
CN110162750A (en) * 2019-01-24 2019-08-23 腾讯科技(深圳)有限公司 Text similarity detection method, electronic equipment and computer readable storage medium
CN110276021A (en) * 2019-04-29 2019-09-24 小轮(上海)网络科技有限公司 Place name matching process and device based on semantic similarity
CN110377886A (en) * 2019-06-19 2019-10-25 平安国际智慧城市科技股份有限公司 Project duplicate checking method, apparatus, equipment and storage medium
CN110442760A (en) * 2019-07-24 2019-11-12 银江股份有限公司 A kind of the synonym method for digging and device of question and answer searching system
CN110941598A (en) * 2019-12-02 2020-03-31 北京锐安科技有限公司 Data deduplication method, device, terminal and storage medium
CN110956037A (en) * 2019-10-16 2020-04-03 厦门美柚股份有限公司 Multimedia content repeated judgment method and device
CN111159445A (en) * 2019-12-30 2020-05-15 深圳云天励飞技术有限公司 Picture filtering method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111625468A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
US10346257B2 (en) Method and device for deduplicating web page
CN110826648B (en) Method for realizing fault detection by utilizing time sequence clustering algorithm
CN110019792A (en) File classification method and device and sorter model training method
CN111243601B (en) Voiceprint clustering method and device, electronic equipment and computer-readable storage medium
US10783145B2 (en) Block level deduplication with block similarity
CN113448935B (en) Method, electronic device and computer program product for providing log information
CN111597297A (en) Article recall method, system, electronic device and readable storage medium
CN111625468B (en) Test case duplicate removal method and device
CN110096605B (en) Image processing method and device, electronic device and storage medium
CN117743577A (en) Text classification method, device, electronic equipment and storage medium
CN106919554B (en) Method and device for identifying invalid words in document
CN116361185A (en) Software testing method and device
CN110399464B (en) Similar news judgment method and system and electronic equipment
CN110929493B (en) Data management method, redundant data detection method, storage medium and data system
CN111159996B (en) Short text set similarity comparison method and system based on text fingerprint algorithm
CN104484330A (en) Pre-selecting method and device of spam comments based on grading keyword threshold combination evaluation
KR102357023B1 (en) Apparatus and Method for restoring Conversation Segment Sentences
CN110321425B (en) Method and device for judging defect type of power grid
CN113780042A (en) Picture set operation method, picture set labeling method and device
Zhang et al. Research on data cleaning method based on SNM algorithm
CN115858324B (en) AI-based IT equipment fault processing method, apparatus, equipment and medium
Ruiz et al. Video retrieval using sparse Bayesian reconstruction
CN111552864B (en) Information deduplication method, system, storage medium and electronic equipment
CN115344485A (en) Anomaly detection method and device, computer equipment and storage medium
CN113656393B (en) Data processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant