CN111625468A - Test case duplicate removal method and device - Google Patents

Publication number
CN111625468A
Authority
CN
China
Legal status
Granted
Application number
CN202010505902.6A
Other languages
Chinese (zh)
Other versions
CN111625468B (en)
Inventor
李刘强
Current Assignee
Bank of China Ltd
Original Assignee
Bank of China Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The method comprises: extracting characteristic values of each test case; determining the similarity of the characteristic values of the two test cases in each test case pair; taking each test case pair whose similarity is greater than a preset similarity threshold as a test case pair to be processed; and deduplicating the test cases in the plurality of test case pairs to be processed, so that test cases are deduplicated automatically. Because characteristic values are extracted for every test case and the similarity is determined for every test case pair, all test cases are processed, none is omitted, and the accuracy of test case deduplication is improved.

Description

Test case duplicate removal method and device
Technical Field
The present application relates to the field of testing technologies, and in particular to a test case deduplication method and device.
Background
Large business systems may require a large number of test cases. To produce that many test cases, different testers are needed to write them. Test cases written by different testers may duplicate one another, which wastes test resources.
It is therefore necessary to deduplicate test cases that contain duplicates. At present, test cases are generally deduplicated manually, which is both inaccurate and inefficient.
Disclosure of Invention
To solve the above technical problem, embodiments of the present application provide a test case deduplication method and device that improve the efficiency and accuracy of test case deduplication. The technical solution is as follows:
a test case deduplication method, comprising:
extracting characteristic values of each test case;
respectively determining the similarity of the characteristic values of two test cases in each test case pair, wherein the test case pairs are obtained by selecting any two test cases from a plurality of test cases;
taking the test case pair to which the characteristic value with the similarity larger than a preset similarity threshold belongs as a test case pair to be processed;
and carrying out duplicate removal on the test cases in the plurality of test case pairs to be processed.
Preferably, the extracting the feature values of the test cases includes:
extracting characteristic values of each test case;
performing word segmentation on the characteristic value of each test case to obtain at least one keyword;
the determining the similarity of the characteristic values of the two test cases in each test case pair respectively comprises the following steps:
and determining the similarity of the characteristic values of the two test cases in each test case pair based on the keywords obtained by segmenting the characteristic values of the test cases respectively.
Preferably, the determining the similarity between the feature values of the two test cases in each test case pair based on the keywords obtained by segmenting the feature values of each test case respectively includes:
respectively determining similar keyword pairs in keywords of two test cases in each test case pair, wherein the similar keyword pairs consist of first keywords and second keywords, the text similarity of the first keywords and the second keywords is greater than a set text similarity threshold, the first keywords are keywords of one test case in the two test cases, and the second keywords are keywords of the other test case in the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
Preferably, the determining the similarity between the feature values of the two test cases in each test case pair based on the keywords obtained by segmenting the feature values of each test case respectively includes:
respectively counting the occurrence frequency of each first keyword in each test case pair in a first test case and the occurrence frequency of each second keyword in the test case pair in a second test case, wherein the first test case and the second test case form the test case pair, the first keyword is the keyword of the first test case, and the second keyword is the keyword of the second test case;
and determining the similarity of the characteristic values of the two test cases respectively based on the keywords of the two test cases in each test case pair, the occurrence frequency of each first keyword in each test case pair in the first test case, and the occurrence frequency of each second keyword in each test case pair in the second test case.
Preferably, the determining the similarity of the feature values of the two test cases based on the keywords of the two test cases in each test case pair, the occurrence frequency of each first keyword in each test case pair in the first test case, and the occurrence frequency of each second keyword in the test case pair in the second test case respectively includes:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the occurrence frequency of each first keyword in each test case pair in the first test case and the occurrence frequency of each second keyword in each test case pair in the second test case.
A test case deduplication apparatus, comprising:
the extraction module is used for extracting the characteristic value of each test case;
the first determining module is used for respectively determining the similarity of the characteristic values of two test cases in each test case pair, wherein the test case pair is obtained by selecting any two test cases from a plurality of test cases;
the second determination module is used for taking the test case pair to which the characteristic value with the similarity larger than the preset similarity threshold belongs as a test case pair to be processed;
and the duplication removing module is used for carrying out duplication removal on the test cases in the plurality of test case pairs to be processed.
Preferably, the extraction module is specifically configured to:
extracting characteristic values of each test case;
performing word segmentation on the characteristic value of each test case to obtain at least one keyword;
the first determining module is specifically configured to:
and determining the similarity of the characteristic values of the two test cases in each test case pair based on the keywords obtained by segmenting the characteristic values of the test cases respectively.
Preferably, the first determining module is specifically configured to:
respectively determining similar keyword pairs in keywords of two test cases in each test case pair, wherein the similar keyword pairs consist of first keywords and second keywords, the text similarity of the first keywords and the second keywords is greater than a set text similarity threshold, the first keywords are keywords of one test case in the two test cases, and the second keywords are keywords of the other test case in the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
Preferably, the first determining module is specifically configured to:
respectively counting the occurrence frequency of each first keyword in each test case pair in a first test case and the occurrence frequency of each second keyword in the test case pair in a second test case, wherein the first test case and the second test case form the test case pair, the first keyword is the keyword of the first test case, and the second keyword is the keyword of the second test case;
and determining the similarity of the characteristic values of the two test cases respectively based on the keywords of the two test cases in each test case pair, the occurrence frequency of each first keyword in each test case pair in the first test case, and the occurrence frequency of each second keyword in each test case pair in the second test case.
Preferably, the first determining module is specifically configured to:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the occurrence frequency of each first keyword in each test case pair in the first test case and the occurrence frequency of each second keyword in each test case pair in the second test case.
Compared with the prior art, the beneficial effect of this application is:
In the present application, the characteristic values of each test case are extracted, the similarity of the characteristic values of the two test cases in each test case pair is determined, each test case pair whose similarity is greater than a preset similarity threshold is taken as a test case pair to be processed, and the test cases in the plurality of test case pairs to be processed are deduplicated, so that test cases are deduplicated automatically. Because characteristic values are extracted for every test case and the similarity is determined for every test case pair, all test cases are processed, none is omitted, and the accuracy of test case deduplication is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of an embodiment 1 of a test case deduplication method provided in the present application;
FIG. 2 is a flowchart of an embodiment 2 of a test case deduplication method provided by the present application;
FIG. 3 is a flowchart of an embodiment 3 of a test case deduplication method provided in the present application;
FIG. 4 is a flowchart of an embodiment 4 of a test case deduplication method provided in the present application;
FIG. 5 is a schematic structural diagram of a test case deduplication device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiments of the present application disclose a test case deduplication method, which comprises: extracting characteristic values of each test case; determining the similarity of the characteristic values of the two test cases in each test case pair, where each test case pair is obtained by selecting any two test cases from a plurality of test cases; taking each test case pair whose similarity is greater than a preset similarity threshold as a test case pair to be processed; and deduplicating the test cases in the plurality of test case pairs to be processed. The present application can thereby improve the efficiency and accuracy of deduplication.
The test case deduplication method disclosed in the embodiments of the present application is described next. As shown in FIG. 1, which is a flowchart of embodiment 1 of the test case deduplication method provided in the present application, the method may include the following steps:
Step S11: Extract the characteristic values of each test case.
The characteristic values of the test case may include, but are not limited to: any one or more of characteristic values representing the function module to which the test case belongs, characteristic values describing the test case, characteristic values representing operation steps of the test case, and characteristic values representing expected results of execution of the test case.
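The four kinds of characteristic values listed above can be pictured as fields of a test case record. The following is a minimal sketch only: the `CaseRecord` class, its field names, and the `extract_feature_value` helper are hypothetical and do not come from the source, which does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical test case record; field names are assumptions."""
    case_id: str
    module: str        # function module the test case belongs to
    description: str   # description of the test case
    steps: str         # operation steps
    expected: str      # expected execution result


def extract_feature_value(case, fields=("module", "description", "steps", "expected")):
    """Join any one or more of the selected fields into one feature string."""
    return " ".join(getattr(case, name) for name in fields)


case = CaseRecord("TC-1", "login", "valid login",
                  "enter user and password", "home page shown")
feature = extract_feature_value(case)
module_only = extract_feature_value(case, fields=("module",))
```

Passing a shorter `fields` tuple corresponds to using only some of the characteristic values, which the text explicitly allows.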
Step S12: Determine the similarity of the characteristic values of the two test cases in each test case pair, where each test case pair is obtained by selecting any two test cases from the plurality of test cases.
After the characteristic values of the test cases are extracted, pairwise combination can be performed on the test cases to obtain a plurality of test case pairs, and the similarity of the characteristic values of the two test cases in each test case pair is respectively determined. Specifically, the similarity of the feature values of the two test cases in each test case pair can be determined by using a cosine similarity algorithm.
The similarity of the characteristic values of the two test cases in each test case pair can be used as the similarity of the two test cases in each test case pair.
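The pairwise combination in step S12 can be sketched with `itertools.combinations`. The similarity function is passed in as a parameter; the `token_overlap` measure used in the demonstration is a simple stand-in chosen for brevity and is an assumption, not the cosine similarity algorithm the source mentions.

```python
from itertools import combinations


def score_pairs(features, similarity):
    """Score every unordered pair of test cases.

    features: dict mapping a test case id to its feature value.
    similarity: any function of two feature values returning a number.
    """
    return {
        (a, b): similarity(features[a], features[b])
        for a, b in combinations(sorted(features), 2)
    }


def token_overlap(x, y):
    """Stand-in similarity (assumption): shared tokens / all tokens."""
    xs, ys = set(x.split()), set(y.split())
    union = xs | ys
    return len(xs & ys) / len(union) if union else 0.0


features = {
    "TC-1": "login with valid password",
    "TC-2": "login with valid password",
    "TC-3": "export monthly report",
}
scores = score_pairs(features, token_overlap)
```

For n test cases this produces n(n-1)/2 pairs, so the cost grows quadratically with the number of test cases.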
Step S13: Take each test case pair whose characteristic-value similarity is greater than a preset similarity threshold as a test case pair to be processed.
The preset similarity threshold may be set as needed, and is not limited in this embodiment.
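Step S13 amounts to a strict greater-than filter over the scored pairs. In the sketch below, the function name `select_pending_pairs` and the default threshold of 0.8 are illustrative assumptions; the source leaves the threshold value open.

```python
def select_pending_pairs(scores, threshold=0.8):
    """Keep pairs whose similarity is strictly greater than the threshold."""
    return [pair for pair, score in scores.items() if score > threshold]


scores = {("A", "B"): 0.95, ("A", "C"): 0.8, ("B", "C"): 0.1}
pending = select_pending_pairs(scores)
```

A pair whose similarity equals the threshold is not selected, matching the "larger than" wording of the step.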
Step S14: Deduplicate the test cases in the plurality of test case pairs to be processed.
The deduplication of the test cases in the plurality of test case pairs to be processed may include:
S141: Remove one of the two test cases in each test case pair to be processed to obtain a first test case set;
S142: If a first test case subset exists in the first test case set, retain only one test case from the first test case subset, where the first test case subset comprises at least 2 test cases that are identical to one another.
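Sub-steps S141 and S142 can be sketched as follows. The source does not say which member of a pair is removed; this sketch assumes the first member is kept, and the function name `deduplicate` is hypothetical.

```python
def deduplicate(pending_pairs):
    """Return the retained test case ids from the pairs to be processed."""
    # S141: drop one test case of each pair (assumption: keep the first member).
    first_set = [kept for kept, _dropped in pending_pairs]
    # S142: where the same test case was kept from several pairs,
    # retain it only once, preserving the original order.
    return list(dict.fromkeys(first_set))


pending_pairs = [("TC-1", "TC-2"), ("TC-1", "TC-3"), ("TC-4", "TC-5")]
retained = deduplicate(pending_pairs)
```

Here `TC-1` is kept from two different pairs, so S142 collapses the two copies into one.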
As another optional embodiment of the present application, FIG. 2 is a schematic flowchart of embodiment 2 of the test case deduplication method provided by the present application. This embodiment mainly refines the test case deduplication method described in embodiment 1. As shown in FIG. 2, the method may include, but is not limited to, the following steps:
Step S21: Extract the characteristic values of each test case.
In this embodiment, the characteristic values of the test case may include, but are not limited to: any one or more of characteristic values representing the function module to which the test case belongs, characteristic values describing the test case, characteristic values representing operation steps of the test case, and characteristic values representing expected results of execution of the test case.
Step S22: Perform word segmentation on the characteristic value of each test case to obtain at least one keyword.
Steps S21-S22 are a specific implementation of step S11 in embodiment 1.
In this embodiment, the characteristic value of each test case may be segmented using a Python word segmentation library to obtain at least one keyword.
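The source does not name the segmentation library. As a self-contained illustration, the sketch below uses a regular expression from the standard library as a stand-in tokenizer; this is an assumption, and for Chinese text a dedicated segmentation library (jieba is one widely used option) would normally take its place.

```python
import re


def segment(feature_value):
    """Stand-in word segmentation: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", feature_value.lower())


keywords = segment("Enter user name, then click Login")
```

A regular expression like this works for space-delimited languages only, which is why a real segmentation library is needed for Chinese test case text.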
Step S23: Determine the similarity of the characteristic values of the two test cases in each test case pair based on the keywords obtained by segmenting the characteristic value of each test case.
Step S23 is a specific implementation of step S12 in embodiment 1.
Step S24: Take each test case pair whose characteristic-value similarity is greater than the preset similarity threshold as a test case pair to be processed.
Step S25: Deduplicate the test cases in the plurality of test case pairs to be processed.
The detailed procedures of steps S24-S25 can be found in the related descriptions of steps S13-S14 in embodiment 1, and are not repeated herein.
In this embodiment, the keywords are obtained by segmenting the feature values, and the similarity of the feature values is determined based on the keywords, so that the complexity of determining the similarity of the feature values can be reduced, the efficiency of determining the similarity of the feature values is improved, and the deduplication efficiency is further improved.
As another optional embodiment of the present application, FIG. 3 is a schematic flowchart of embodiment 3 of the test case deduplication method provided by the present application. This embodiment mainly refines the test case deduplication method described in embodiment 2. As shown in FIG. 3, the method may include, but is not limited to, the following steps:
Step S31: Extract the characteristic values of each test case.
Step S32: Perform word segmentation on the characteristic value of each test case to obtain at least one keyword.
The detailed procedures of steps S31-S32 can be referred to the related descriptions of steps S21-S22 in embodiment 2, and are not described herein again.
Step S33: Determine the similar keyword pairs among the keywords of the two test cases in each test case pair.
In this embodiment, the similar keyword pair is composed of a first keyword and a second keyword, the text similarity between the first keyword and the second keyword is greater than a set text similarity threshold, the first keyword is a keyword of one of the two test cases, and the second keyword is a keyword of the other of the two test cases.
In this embodiment, the process of determining the similar keyword pair may include: and calculating the text similarity of the first keyword and the second keyword, judging whether the text similarity of the first keyword and the second keyword is greater than a set text similarity threshold, and if so, combining the first keyword and the second keyword into a similar keyword pair.
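The process of forming similar keyword pairs can be sketched as a nested loop over the two keyword lists. The source does not specify how text similarity is computed; this sketch assumes `difflib.SequenceMatcher.ratio()` from the standard library, and the threshold of 0.75 is likewise an illustrative assumption.

```python
from difflib import SequenceMatcher


def similar_keyword_pairs(first_keywords, second_keywords, text_threshold=0.75):
    """Pair each first keyword with every sufficiently similar second keyword."""
    pairs = []
    for first in first_keywords:
        for second in second_keywords:
            # Assumed text similarity: SequenceMatcher ratio in [0, 1].
            ratio = SequenceMatcher(None, first, second).ratio()
            if ratio > text_threshold:
                pairs.append((first, second))
    return pairs


first_keywords = ["login", "password", "submit"]
second_keywords = ["logon", "password", "cancel"]
pairs = similar_keyword_pairs(first_keywords, second_keywords)
```

With these inputs, the near-match `login`/`logon` and the exact match `password`/`password` exceed the threshold, while the unrelated keywords do not.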
Step S34: Count the number of similar keyword pairs, and determine the similarity of the characteristic values of the two test cases based on the number of similar keyword pairs.
In this embodiment, a correspondence between the number of similar keyword pairs and the similarity of the characteristic values of the test cases may be set in advance. After the number of similar keyword pairs is counted, the similarity corresponding to that number is looked up in the correspondence and used as the similarity of the characteristic values of the two test cases.
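One way to picture such a correspondence is a small lookup table keyed by the minimum number of similar keyword pairs. The table values below are invented purely for illustration; the source does not give any concrete correspondence.

```python
from bisect import bisect_right

# Hypothetical correspondence: (minimum number of similar pairs, similarity).
CORRESPONDENCE = [(0, 0.0), (1, 0.3), (3, 0.6), (5, 0.9)]


def similarity_from_pair_count(count, table=CORRESPONDENCE):
    """Look up the similarity that corresponds to a similar-pair count."""
    if count < 0:
        raise ValueError("count must be non-negative")
    minimums = [minimum for minimum, _similarity in table]
    # Find the last row whose minimum does not exceed the count.
    index = bisect_right(minimums, count) - 1
    return table[index][1]


low = similarity_from_pair_count(0)
middle = similarity_from_pair_count(4)
high = similarity_from_pair_count(10)
```

A table lookup keeps the mapping easy to tune, although a fixed table ignores how many keywords each test case has in total.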
Steps S33-S34 are a specific implementation of step S23 in embodiment 2.
Step S35: Take each test case pair whose characteristic-value similarity is greater than the preset similarity threshold as a test case pair to be processed.
Step S36: Deduplicate the test cases in the plurality of test case pairs to be processed.
The detailed procedures of steps S35-S36 can be seen in steps S24-S25 of embodiment 2, and are not repeated herein.
As another optional embodiment of the present application, FIG. 4 is a schematic flowchart of embodiment 4 of the test case deduplication method provided by the present application. This embodiment mainly refines the test case deduplication method described in embodiment 2. As shown in FIG. 4, the method may include, but is not limited to, the following steps:
Step S41: Extract the characteristic values of each test case.
Step S42: Perform word segmentation on the characteristic value of each test case to obtain at least one keyword.
Step S43: For each test case pair, count the number of times each first keyword appears in the first test case and the number of times each second keyword appears in the second test case.
The first test case and the second test case form the test case pair, the first keywords are keywords of the first test case, and the second keywords are keywords of the second test case.
Step S44: Determine the similarity of the characteristic values of the two test cases in each test case pair based on the keywords of the two test cases, the number of times each first keyword appears in the first test case, and the number of times each second keyword appears in the second test case.
In this embodiment, for each test case pair, the first keywords of the first test case and the number of times each first keyword appears in the first test case may be formed into a first vector, and the second keywords of the second test case and the number of times each second keyword appears in the second test case may be formed into a second vector. The similarity between the first vector and the second vector is then calculated and used as the similarity between the characteristic values of the first test case and the second test case.
In this embodiment, the similarity of the feature values of the two test cases may be determined by using a cosine similarity algorithm. Specifically, the similarity of the first vector and the second vector may be calculated using a cosine similarity algorithm.
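The vector construction and cosine similarity calculation can be sketched with `collections.Counter`. Aligning both vectors on the union of the two keyword sets is an assumption of this sketch; the source only states that a first and a second vector are formed from the keywords and their occurrence counts.

```python
import math
from collections import Counter


def cosine_similarity(first_keywords, second_keywords):
    """Cosine similarity of two keyword-count vectors."""
    first_counts = Counter(first_keywords)
    second_counts = Counter(second_keywords)
    # Shared vocabulary so both vectors have the same dimensions (assumption).
    vocabulary = sorted(set(first_counts) | set(second_counts))
    first_vector = [first_counts[word] for word in vocabulary]
    second_vector = [second_counts[word] for word in vocabulary]
    dot = sum(a * b for a, b in zip(first_vector, second_vector))
    norm = (math.sqrt(sum(a * a for a in first_vector))
            * math.sqrt(sum(b * b for b in second_vector)))
    # An empty keyword list has no direction; treat it as dissimilar.
    return dot / norm if norm else 0.0


score = cosine_similarity(["login", "password", "login"], ["login", "password"])
```

In this example the vectors are [2, 1] and [1, 1], giving a similarity of 3/√10, roughly 0.95.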
Steps S43-S44 are a specific implementation of step S23 in embodiment 2.
Step S45: Take each test case pair whose characteristic-value similarity is greater than the preset similarity threshold as a test case pair to be processed.
Step S46: Deduplicate the test cases in the plurality of test case pairs to be processed.
The detailed procedures of steps S45-S46 can be seen in steps S24-S25 of embodiment 2, and are not repeated herein.
In this embodiment, the similarity of the feature values of the two test cases is determined based on the number of the similar keyword pairs and the number of times that the keywords in each similar keyword pair appear in the keywords of the two test cases, so that the accuracy of determining the similarity of the feature values can be improved.
The test case deduplication device provided in the present application is described below, and the test case deduplication device described below and the test case deduplication method described above may be referred to in correspondence with each other.
Referring to fig. 5, the test case deduplication apparatus includes: an extraction module 11, a first determination module 12, a second determination module 13 and a deduplication module 14.
The extraction module 11 is used for extracting the characteristic values of the test cases;
the first determining module 12 is configured to determine similarity between feature values of two test cases in each test case pair, where the test case pair is obtained by combining any two test cases selected from the plurality of test cases;
a second determining module 13, configured to use the test case pair to which the feature value with the similarity greater than the preset similarity threshold belongs as a test case pair to be processed;
and the deduplication module 14 is configured to deduplicate the test cases in the plurality of test case pairs to be processed.
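The four modules can be pictured as stages of one small pipeline. The class below is a hypothetical sketch, not the structure of the claimed device: the class name, the constructor parameters, and the choice to drop the second member of each pair to be processed are all assumptions.

```python
from itertools import combinations


class DeduplicationDevice:
    """Hypothetical composition of modules 11 to 14."""

    def __init__(self, extract, similarity, threshold):
        self.extract = extract          # extraction module 11
        self.similarity = similarity    # used by first determining module 12
        self.threshold = threshold      # used by second determining module 13

    def run(self, cases):
        # Module 11: extract a feature value for every test case.
        features = {cid: self.extract(case) for cid, case in cases.items()}
        # Module 12: similarity for every pair of test cases.
        scores = {
            (a, b): self.similarity(features[a], features[b])
            for a, b in combinations(features, 2)
        }
        # Module 13: pairs above the threshold are pairs to be processed.
        pending = [pair for pair, score in scores.items() if score > self.threshold]
        # Module 14: drop the second member of each pending pair (assumption).
        dropped = {second for _first, second in pending}
        return [cid for cid in cases if cid not in dropped]


cases = {
    "TC-1": "login with valid password",
    "TC-2": "login with valid password",
    "TC-3": "export report",
}
device = DeduplicationDevice(
    extract=lambda text: text,
    similarity=lambda x, y: 1.0 if x == y else 0.0,
    threshold=0.5,
)
remaining = device.run(cases)
```

Passing the extraction and similarity functions in as parameters mirrors the way the later embodiments refine individual modules without changing the overall flow.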
In this embodiment, the extracting module 11 may specifically be configured to:
extracting characteristic values of each test case;
performing word segmentation on the characteristic value of each test case to obtain at least one keyword;
accordingly, the first determining module 12 may be specifically configured to:
and determining the similarity of the characteristic values of the two test cases in each test case pair based on the keywords obtained by segmenting the characteristic values of the test cases respectively.
In this embodiment, the first determining module 12 may be specifically configured to:
respectively determining similar keyword pairs in keywords of two test cases in each test case pair, wherein the similar keyword pairs consist of first keywords and second keywords, the text similarity of the first keywords and the second keywords is greater than a set text similarity threshold, the first keywords are keywords of one test case in the two test cases, and the second keywords are keywords of the other test case in the two test cases;
and counting the number of the similar keyword pairs, and determining the similarity of the characteristic values of the two test cases based on the number of the similar keyword pairs.
In this embodiment, the first determining module 12 may be specifically configured to:
respectively counting the occurrence frequency of each first keyword in each test case pair in a first test case and the occurrence frequency of each second keyword in the test case pair in a second test case, wherein the first test case and the second test case form the test case pair, the first keyword is the keyword of the first test case, and the second keyword is the keyword of the second test case;
and determining the similarity of the characteristic values of the two test cases respectively based on the keywords of the two test cases in each test case pair, the occurrence frequency of each first keyword in each test case pair in the first test case, and the occurrence frequency of each second keyword in each test case pair in the second test case.
In this embodiment, the first determining module 12 may be specifically configured to:
and determining the similarity of the characteristic values of the two test cases by using a cosine similarity algorithm based on the keywords of the two test cases in each test case pair, the occurrence frequency of each first keyword in each test case pair in the first test case and the occurrence frequency of each second keyword in each test case pair in the second test case.
It should be noted that each embodiment is described mainly in terms of its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device embodiment is basically similar to the method embodiments, it is described only briefly; for the relevant points, reference may be made to the corresponding description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above device is described as being divided into various units by function, and the units are described separately. Of course, when the present application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The test case deduplication method and device provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is intended only to aid understanding of the method and core idea of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
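As a non-limiting illustration of the software implementation mentioned above, the following Python sketch walks through the method: it takes a feature value for each test case, segments it into keywords, computes the cosine similarity of the keyword counts for every test case pair, treats pairs whose similarity exceeds a preset threshold as pairs to be processed, and deduplicates them. Several details are assumptions made only for this sketch and are not taken from the application: the feature value is taken to be the test case's description text, word segmentation is approximated by lowercase whitespace splitting (Chinese text would need a proper word segmenter), the function names and the 0.8 threshold are hypothetical, and keeping the earlier test case of each pending pair is just one possible deduplication rule.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def segment(feature_value: str) -> list[str]:
    # Assumption: whitespace splitting stands in for word segmentation.
    return feature_value.lower().split()


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two keyword-count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty keyword list is treated as similar to nothing
    return dot / (norm_a * norm_b)


def find_pending_pairs(cases: dict[str, str], threshold: float):
    # Count keyword occurrences once per test case.
    counts = {cid: Counter(segment(text)) for cid, text in cases.items()}
    pending = []
    # Every unordered pair of test cases is one "test case pair".
    for first, second in combinations(cases, 2):
        sim = cosine_similarity(counts[first], counts[second])
        if sim > threshold:
            pending.append((first, second, sim))
    return pending


def deduplicate(cases: dict[str, str], threshold: float = 0.8) -> dict[str, str]:
    # Assumed policy: keep the earlier case of each pending pair.
    removed = set()
    for first, second, _ in find_pending_pairs(cases, threshold):
        if first not in removed and second not in removed:
            removed.add(second)
    return {cid: text for cid, text in cases.items() if cid not in removed}
```

With three hypothetical cases in which the first two have identical descriptions and the third shares no keyword with them, `deduplicate` keeps the first and third cases. Comparing every pair costs on the order of n² similarity computations for n test cases, which follows directly from forming pairs of any two test cases.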

Claims (10)

1. A test case deduplication method, comprising:
extracting a feature value of each test case;
determining, for each test case pair, the similarity between the feature values of the two test cases in the pair, wherein each test case pair is obtained by selecting any two test cases from a plurality of test cases;
taking each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed; and
deduplicating the test cases in the plurality of test case pairs to be processed.
2. The method of claim 1, wherein extracting the feature value of each test case comprises:
extracting the feature value of each test case; and
performing word segmentation on the feature value of each test case to obtain at least one keyword;
and wherein determining the similarity between the feature values of the two test cases in each test case pair comprises:
determining the similarity between the feature values of the two test cases in each test case pair based on the keywords obtained by segmenting the feature value of each test case.
3. The method of claim 2, wherein determining the similarity between the feature values of the two test cases in each test case pair based on the keywords obtained by segmenting the feature value of each test case comprises:
determining, for each test case pair, the similar keyword pairs among the keywords of the two test cases, wherein each similar keyword pair consists of a first keyword and a second keyword whose text similarity is greater than a set text similarity threshold, the first keyword being a keyword of one of the two test cases and the second keyword being a keyword of the other of the two test cases; and
counting the number of similar keyword pairs, and determining the similarity between the feature values of the two test cases based on the number of similar keyword pairs.
4. The method of claim 2, wherein determining the similarity between the feature values of the two test cases in each test case pair based on the keywords obtained by segmenting the feature value of each test case comprises:
counting, for each test case pair, the number of occurrences of each first keyword in a first test case and the number of occurrences of each second keyword in a second test case, wherein the first test case and the second test case form the test case pair, each first keyword is a keyword of the first test case, and each second keyword is a keyword of the second test case; and
determining, for each test case pair, the similarity between the feature values of the two test cases based on the keywords of the two test cases, the number of occurrences of each first keyword in the first test case, and the number of occurrences of each second keyword in the second test case.
5. The method of claim 4, wherein determining the similarity between the feature values of the two test cases based on the keywords of the two test cases, the number of occurrences of each first keyword in the first test case, and the number of occurrences of each second keyword in the second test case comprises:
determining, for each test case pair, the similarity between the feature values of the two test cases by using a cosine similarity algorithm, based on the keywords of the two test cases, the number of occurrences of each first keyword in the first test case, and the number of occurrences of each second keyword in the second test case.
6. A test case deduplication apparatus, comprising:
an extraction module, configured to extract a feature value of each test case;
a first determining module, configured to determine, for each test case pair, the similarity between the feature values of the two test cases in the pair, wherein each test case pair is obtained by selecting any two test cases from a plurality of test cases;
a second determining module, configured to take each test case pair whose feature-value similarity is greater than a preset similarity threshold as a test case pair to be processed; and
a deduplication module, configured to deduplicate the test cases in the plurality of test case pairs to be processed.
7. The apparatus of claim 6, wherein the extraction module is specifically configured to:
extract the feature value of each test case; and
perform word segmentation on the feature value of each test case to obtain at least one keyword;
and wherein the first determining module is specifically configured to:
determine the similarity between the feature values of the two test cases in each test case pair based on the keywords obtained by segmenting the feature value of each test case.
8. The apparatus of claim 7, wherein the first determining module is specifically configured to:
determine, for each test case pair, the similar keyword pairs among the keywords of the two test cases, wherein each similar keyword pair consists of a first keyword and a second keyword whose text similarity is greater than a set text similarity threshold, the first keyword being a keyword of one of the two test cases and the second keyword being a keyword of the other of the two test cases; and
count the number of similar keyword pairs, and determine the similarity between the feature values of the two test cases based on the number of similar keyword pairs.
9. The apparatus of claim 7, wherein the first determining module is specifically configured to:
count, for each test case pair, the number of occurrences of each first keyword in a first test case and the number of occurrences of each second keyword in a second test case, wherein the first test case and the second test case form the test case pair, each first keyword is a keyword of the first test case, and each second keyword is a keyword of the second test case; and
determine, for each test case pair, the similarity between the feature values of the two test cases based on the keywords of the two test cases, the number of occurrences of each first keyword in the first test case, and the number of occurrences of each second keyword in the second test case.
10. The apparatus of claim 9, wherein the first determining module is specifically configured to:
determine, for each test case pair, the similarity between the feature values of the two test cases by using a cosine similarity algorithm, based on the keywords of the two test cases, the number of occurrences of each first keyword in the first test case, and the number of occurrences of each second keyword in the second test case.
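Claims 3 and 8 derive the similarity from the number of similar keyword pairs, without fixing how text similarity between two keywords is measured or how the count becomes a score. The Python sketch below is one possible reading under assumptions that are not part of the claims: `difflib.SequenceMatcher.ratio()` from the standard library stands in for the text similarity measure, each keyword is allowed to join at most one pair, and the pair count is normalized by the length of the longer keyword list. The function names and the 0.8 text similarity threshold are hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    # Assumption: SequenceMatcher's ratio serves as the text similarity.
    return SequenceMatcher(None, a, b).ratio()


def count_similar_keyword_pairs(first_keywords, second_keywords,
                                text_threshold=0.8):
    # Greedy one-to-one matching: each second keyword is paired at most once.
    unmatched = list(second_keywords)
    count = 0
    for first in first_keywords:
        for second in unmatched:
            if text_similarity(first, second) > text_threshold:
                count += 1
                unmatched.remove(second)
                break  # move on to the next first keyword
    return count


def keyword_pair_similarity(first_keywords, second_keywords,
                            text_threshold=0.8):
    # Assumed normalization: pair count divided by the longer list length.
    longest = max(len(first_keywords), len(second_keywords))
    if longest == 0:
        return 0.0
    pairs = count_similar_keyword_pairs(
        first_keywords, second_keywords, text_threshold)
    return pairs / longest
```

Unlike exact keyword counting, this variant tolerates small spelling or inflection differences between keywords, at the cost of one text comparison per candidate keyword pair.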

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010505902.6A CN111625468B (en) 2020-06-05 2020-06-05 Test case duplicate removal method and device

Publications (2)

Publication Number Publication Date
CN111625468A true CN111625468A (en) 2020-09-04
CN111625468B CN111625468B (en) 2024-04-16

Family

ID=72260191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010505902.6A Active CN111625468B (en) 2020-06-05 2020-06-05 Test case duplicate removal method and device

Country Status (1)

Country Link
CN (1) CN111625468B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11954019B2 (en) 2022-02-04 2024-04-09 Optum, Inc. Machine learning techniques for automated software testing configuration management

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8234285B1 (en) * 2009-07-10 2012-07-31 Google Inc. Context-dependent similarity measurements
CN103678702A (en) * 2013-12-30 2014-03-26 优视科技有限公司 Video duplicate removal method and device
CN104636319A (en) * 2013-11-11 2015-05-20 腾讯科技(北京)有限公司 Text duplicate removal method and device
CN105824798A (en) * 2016-03-03 2016-08-03 云南电网有限责任公司教育培训评价中心 Examination question de-duplicating method of examination question base based on examination question key word likeness
CN106598940A (en) * 2016-11-01 2017-04-26 四川用联信息技术有限公司 Text similarity solution algorithm based on global optimization of keyword quality
WO2017107566A1 (en) * 2015-12-25 2017-06-29 广州视源电子科技股份有限公司 Retrieval method and system based on word vector similarity
CN109101620A (en) * 2018-08-08 2018-12-28 广州神马移动信息科技有限公司 Similarity calculating method, clustering method, device, storage medium and electronic equipment
CN109508378A (en) * 2018-11-26 2019-03-22 平安科技(深圳)有限公司 A kind of sample data processing method and processing device
CN110162630A (en) * 2019-05-09 2019-08-23 深圳市腾讯信息技术有限公司 A kind of method, device and equipment of text duplicate removal
CN110163688A (en) * 2019-05-30 2019-08-23 复旦大学 Commodity network public sentiment detection system
CN110162750A (en) * 2019-01-24 2019-08-23 腾讯科技(深圳)有限公司 Text similarity detection method, electronic equipment and computer readable storage medium
CN110276021A (en) * 2019-04-29 2019-09-24 小轮(上海)网络科技有限公司 Place name matching process and device based on semantic similarity
CN110377886A (en) * 2019-06-19 2019-10-25 平安国际智慧城市科技股份有限公司 Project duplicate checking method, apparatus, equipment and storage medium
CN110442760A (en) * 2019-07-24 2019-11-12 银江股份有限公司 A kind of the synonym method for digging and device of question and answer searching system
CN110941598A (en) * 2019-12-02 2020-03-31 北京锐安科技有限公司 Data deduplication method, device, terminal and storage medium
CN110956037A (en) * 2019-10-16 2020-04-03 厦门美柚股份有限公司 Multimedia content repeated judgment method and device
CN111159445A (en) * 2019-12-30 2020-05-15 深圳云天励飞技术有限公司 Picture filtering method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111625468B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
Pan et al. Event detection with spatial latent Dirichlet allocation
CN111243601B (en) Voiceprint clustering method and device, electronic equipment and computer-readable storage medium
CN110019792A (en) File classification method and device and sorter model training method
CN109933644B (en) Character string matching method and device
CN106372202B (en) Text similarity calculation method and device
WO2020211393A1 (en) Written judgment information retrieval method and device, computer apparatus, and storage medium
CN110110325B (en) Repeated case searching method and device and computer readable storage medium
CN115809662B (en) Method, device, equipment and medium for detecting anomaly of text content
CN111338692A (en) Vulnerability classification method and device based on vulnerability codes and electronic equipment
CN112183102A (en) Named entity identification method based on attention mechanism and graph attention network
CN112784009A (en) Subject term mining method and device, electronic equipment and storage medium
CN110413998B (en) Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof
CN111625468A (en) Test case duplicate removal method and device
CN110716857A (en) Test case management method and device, computer equipment and storage medium
CN113821630A (en) Data clustering method and device
CN116226681B (en) Text similarity judging method and device, computer equipment and storage medium
CN116361185A (en) Software testing method and device
CN111178037A (en) Repeated defect report identification method and device and electronic equipment
CN113656575B (en) Training data generation method and device, electronic equipment and readable medium
CN112115362B (en) Programming information recommendation method and device based on similar code recognition
CN114610576A (en) Log generation monitoring method and device
CN104484330A (en) Pre-selecting method and device of spam comments based on grading keyword threshold combination evaluation
CN111538669B (en) Test case extraction method and device based on historical problem backtracking analysis
CN113935387A (en) Text similarity determination method and device and computer readable storage medium
CN111061924A (en) Phrase extraction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant