CN111178037A - Repeated defect report identification method and device and electronic equipment - Google Patents
- Publication number
- CN111178037A (application number CN201911341418.8A)
- Authority
- CN
- China
- Prior art keywords
- defect report
- similarity
- defect
- content information
- identified
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
Abstract
The application provides a repeated defect report identification method, a repeated defect report identification device, and electronic equipment, applied in the technical field of computers. The method automatically identifies whether a defect report to be identified is a repeated defect report, which improves the efficiency of identifying repeated defect reports. In addition, a third similarity between the defect report to be identified and at least one target defect report is determined based on a first similarity and on second similarities between the content information of each part and the corresponding part of the at least one target defect report, and whether the defect report to be identified is a repeated defect report is then determined based on the third similarity. Because the decision rests on similarities calculated from multiple dimensions, the accuracy of identifying repeated defect reports is improved.
Description
Technical Field
The application relates to the technical field of computers, in particular to a repeated defect report identification method and device and electronic equipment.
Background
A defect report is a document written in natural language that describes the fault conditions under which software fails to run normally or to meet its requirements during operation. It is usually submitted by a user or tester of the software and then handed to a software quality maintainer. How to avoid the submission of repeated defect reports has become a problem.
Currently, repeated defect reports are identified manually, that is, each submitted defect report is read by a person to determine whether it is a repeated defect report. Because the submitted defect reports must be read one by one before each can be judged as repeatedly submitted or not, this approach is time-consuming. The conventional manual method of identifying repeated defect reports therefore suffers from low efficiency.
Disclosure of Invention
The application provides a repeated defect report identification method, a repeated defect report identification device, and electronic equipment, which are used to improve the efficiency and accuracy of identifying repeated defect reports. The technical solution adopted by the application is as follows:
In a first aspect, a repeated defect report identification method is provided, the method comprising:
calculating a first similarity between a defect report to be identified and at least one target defect report;
acquiring content information of a plurality of parts of the defect report to be identified, and determining, for each part, a second similarity between the content information of that part and the corresponding part of the at least one target defect report;
determining a third similarity between the defect report to be identified and the at least one target defect report based on the first similarity, the second similarity of the content information of each part, and preset weight values assigned to them;
and determining whether the defect report to be identified is a repeated defect report based on the third similarity of the defect report to be identified and the at least one target defect report.
Optionally, determining a second similarity between the content information of each portion and the corresponding portion of the at least one target defect report respectively comprises:
determining the type of the content information of each part;
a second similarity of the content information of the respective portion to a corresponding portion of the at least one target defect report is determined based on the type of the content information of the respective portion.
Optionally, the content information of the plurality of parts includes: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
determining a second similarity of the content information of each portion to a corresponding portion of the at least one target defect report based on the type of the content information of each portion, comprising:
when the content information is test case information, test environment information, or defect label information, determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report by calculating the distance between text vectors;
and when the content information is defect creator information or defect responsible person information, determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report by keyword matching.
Further, the method further comprises:
acquiring an existing defect report;
judging whether the existing defect report and the defect report to be identified contain the same defect component name or not;
and when the existing defect report and the defect report to be identified contain the same defect component name, taking the existing defect report as a target defect report.
Further, the method further comprises:
performing stop word removal on the defect report to be identified and the target defect report, wherein the stop words comprise fixed template fields of the defect report;
calculating a first similarity between the defect report to be identified and the at least one target defect report specifically comprises:
calculating a first similarity between the defect report to be identified after stop word removal and the at least one target defect report after stop word removal.
In a second aspect, a repeated defect report identification apparatus is provided, the apparatus comprising:
the calculation determining module is used for calculating and determining a first similarity between the defect report to be identified and at least one target defect report;
the first determining module is used for acquiring the content information of a plurality of parts of the defect report to be identified, and respectively determining the second similarity of the content information of each part and the corresponding part of at least one target defect report to obtain the second similarity of the content information of each part;
the second determining module is used for determining a third similarity between the defect report to be identified and the at least one target defect report based on the first similarity, the second similarity of the content information of each part, and preset weight values assigned to them;
and the third determining module is used for determining whether the defect report to be identified is a repeated defect report or not based on the third similarity between the defect report to be identified and the at least one target defect report.
Optionally, the first determining module includes:
a first determination unit configured to determine types of content information of the respective portions;
a second determining unit for determining a second similarity of the content information of each portion with a corresponding portion of the at least one target defect report based on the type of the content information of each portion.
Optionally, the content information of the plurality of parts includes: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
the second determining unit is specifically used for determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report by calculating the distance between text vectors when the content information is test case information, test environment information, or defect label information; and/or the second determining unit is specifically used for determining, by keyword matching, a second similarity between the corresponding content information and the corresponding part of the at least one target defect report when the content information is defect creator information or defect responsible person information.
Further, the apparatus further comprises:
an acquisition module for acquiring an existing defect report;
the judging module is used for judging whether the existing defect report and the defect report to be identified contain the same defect component name or not;
and a module used for taking the existing defect report as a target defect report when the existing defect report and the defect report to be identified contain the same defect component name.
Further, the apparatus further comprises:
the removing module is used for performing stop word removal on the defect report to be identified and the target defect report, wherein the stop words comprise fixed template fields of the defect report;
the calculation determining module is specifically used for calculating a first similarity between the defect report to be identified after stop word removal and the at least one target defect report after stop word removal.
In a third aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs being configured to perform the repeated defect report identification method of the first aspect.
In a fourth aspect, there is provided a computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to perform the repeated defect report identification method of the first aspect.
Compared with the prior art, in which whether a report is a repeated defect report is determined manually, the method first calculates a first similarity between the defect report to be identified and at least one target defect report. It then acquires content information of a plurality of parts of the defect report to be identified and determines, for each part, a second similarity between the content information of that part and the corresponding part of the at least one target defect report. Next, it determines a third similarity between the defect report to be identified and the at least one target defect report based on the first similarity, the second similarity of the content information of each part, and preset weight values assigned to them. Finally, it determines whether the defect report to be identified is a repeated defect report based on the third similarity. Because the identification is automatic, the efficiency of identifying repeated defect reports is improved. Because the third similarity combines the first similarity with the second similarities of the individual parts, the decision rests on similarities calculated from multiple dimensions, which improves the accuracy of identifying repeated defect reports.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a repeated defect report identification method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a repeated defect report identification apparatus according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of another repeated defect report identification apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The embodiment of the application provides a repeated defect report identification method, as shown in fig. 1, the method may include the following steps:
step S101, calculating and determining a first similarity between a defect report to be identified and at least one target defect report;
Specifically, the first similarity between the defect report to be identified and the at least one target defect report can be determined by calculating the distance between their text vectors, where the distance may be a Euclidean distance, a cosine distance, a Hamming distance, or the like. The text vector representations of the defect report to be identified and of the at least one target defect report may be obtained through a corresponding word embedding algorithm; the representations may be produced by a neural network method, by a TF-IDF (term frequency-inverse document frequency) algorithm, or by any other method capable of implementing the present application, which is not limited herein.
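As a minimal sketch of step S101, the following Python code builds TF-IDF vectors and compares them by cosine similarity. The function names, the whitespace tokenization, and the smoothed IDF formula are assumptions made for illustration; the application itself does not fix any of them.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_similarity(report, targets):
    """Return the first similarity of `report` to each target report text."""
    vectors = tfidf_vectors([report] + list(targets))
    return [cosine_similarity(vectors[0], v) for v in vectors[1:]]
```

A Euclidean or Hamming distance could be substituted for the cosine measure, since the description lists all three as options.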
Step S102, acquiring content information of a plurality of parts of the defect report to be identified, and respectively determining a second similarity between the content information of each part and a corresponding part of at least one target defect report to obtain a second similarity of the content information of each part;
specifically, the defect report may be composed of a plurality of parts, content information of the plurality of parts may be identified by a corresponding keyword matching method or a natural language understanding algorithm, and then a second similarity between the content information of each part and a corresponding part of the at least one target defect report may be determined by a corresponding similarity determination method.
Illustratively, the content information of the plurality of parts may be content information of part A, content information of part B, and content information of part C. When there is only one target defect report, three second similarities may be determined: one between the part A content of the defect report to be identified and the part A content of the target defect report, one between the two part B contents, and one between the two part C contents. There may thus be a plurality of second similarities.
For example, when there are a plurality of target defect reports (three, as an example), a second similarity may be determined between the part A content of the defect report to be identified and the part A content of each of the first, second, and third target defect reports. Second similarities for part B and part C are determined in the same way against the corresponding parts of the first, second, and third target defect reports.
Step S103, determining a third similarity between the defect report to be identified and the at least one target defect report based on the first similarity, the second similarity of the content information of each part, and preset weight values assigned to them;
specifically, a certain weight value may be assigned to each of the first similarity and the second similarity, and then a third similarity between the defect report to be identified and the at least one target defect report may be determined comprehensively.
Illustratively, the content information of the plurality of parts may be content information of part a, content information of part B, content information of part C, and when there is only one target defect report, it may be represented by the formula:
the third similarity is the first similarity, the partial similarity of 1+ a × the partial similarity of 2+ B × the partial similarity of 3+ C × the partial similarity of 4 (equation 1)
And determining a third similarity between the defect report to be identified and the target defect report, wherein the weight 1, the weight 2, the weight 3 and the weight 4 may be the same or different, and optionally, the value of the weight 1 is greater than the values of the weight 2, the weight 3 and the weight 4.
For example, when there are a plurality of target defect reports, the third similarity between the defect report to be identified and each target defect report may be calculated by the above method, and details are not repeated here.
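The weighted combination of step S103 can be sketched as follows. The function name, the dictionary layout, and the numeric weights in the usage note are illustrative assumptions; the application only states that each similarity receives a weight and that the weight of the first similarity may optionally be the largest.

```python
def third_similarity(first_sim, part_sims, first_weight, part_weights):
    """Weighted sum of the first similarity and the per-part second similarities.

    part_sims and part_weights are dicts keyed by part name; every part
    must have exactly one weight.
    """
    if part_sims.keys() != part_weights.keys():
        raise ValueError("each part needs exactly one weight")
    total = first_weight * first_sim
    for part, sim in part_sims.items():
        total += part_weights[part] * sim
    return total
```

For instance, with a first similarity of 0.8 weighted 0.4 and parts A, B, and C scoring 0.5, 1.0, and 0.0 with a weight of 0.2 each, the third similarity is 0.32 + 0.10 + 0.20 + 0.00 = 0.62. With several target defect reports, the function is simply called once per target.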
And step S104, determining whether the defect report to be identified is a repeated defect report or not based on the third similarity between the defect report to be identified and at least one target defect report.
Specifically, whether the defect report to be identified is a repeated defect report is determined based on the third similarity, wherein when the similarity between the defect report to be identified and a certain target defect report exceeds a certain threshold value, the defect report to be identified is determined as the repeated defect report.
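The threshold decision of step S104 might look like the sketch below. The default threshold of 0.8 and the report identifiers are assumptions chosen only for illustration; the application does not specify a threshold value.

```python
def find_repeated_matches(third_similarities, threshold=0.8):
    """Return ids of target reports whose third similarity exceeds the
    threshold, most similar first. The 0.8 default is an assumed value."""
    return sorted(
        (rid for rid, sim in third_similarities.items() if sim > threshold),
        key=lambda rid: third_similarities[rid],
        reverse=True,
    )


def is_repeated_report(third_similarities, threshold=0.8):
    """A report is treated as repeated if any target exceeds the threshold."""
    return bool(find_repeated_matches(third_similarities, threshold))
```

Returning the matching target ids, rather than only a boolean, lets a maintainer see which existing report the new one appears to repeat.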
Compared with the prior art, in which whether a report is a repeated defect report is determined manually, the repeated defect report identification method of this embodiment first calculates a first similarity between the defect report to be identified and at least one target defect report. It then acquires content information of a plurality of parts of the defect report to be identified and determines, for each part, a second similarity between the content information of that part and the corresponding part of the at least one target defect report. Next, it determines a third similarity between the defect report to be identified and the at least one target defect report based on the first similarity, the second similarity of the content information of each part, and preset weight values assigned to them. Finally, it determines whether the defect report to be identified is a repeated defect report based on the third similarity. Because the identification is automatic, the efficiency of identifying repeated defect reports is improved. Because the third similarity combines the first similarity with the second similarities of the individual parts, the decision rests on similarities calculated from multiple dimensions, which improves the accuracy of identifying repeated defect reports.
The embodiment of the present application provides a possible implementation manner, and step S102 includes:
step S1021 (not shown in the figure), which determines the types of content information of the respective sections;
step S1022 (not shown in the figure) determines a second similarity between the content information of each portion and the corresponding portion of the at least one target defect report based on the type of the content information of each portion.
Specifically, the similarity between the content information of different parts may be calculated in different manners, and the type corresponding to the content information of each part may be determined first, and then the second similarity between the content information of each part and the corresponding part of the at least one target defect report may be determined based on the type of the content information of each part.
Illustratively, the content information has three types, different types respectively correspond to corresponding similarity calculation modes, and when the determined content information is the type A, the second similarity is calculated through the similarity calculation mode of the type A content information.
With the embodiment of the application, the second similarity between the content information of each part and the corresponding part of the at least one target defect report is determined based on the type of the content information, and the problem of how to determine the second similarity is solved.
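One way to organize steps S1021 and S1022 is a lookup table from content type to calculation mode, sketched below. The type labels `"text"` and `"identity"` and both similarity functions are hypothetical stand-ins; the application only says that different types correspond to different calculation modes.

```python
from typing import Callable, Dict


def exact_match(a: str, b: str) -> float:
    """1.0 if the two values are equal ignoring case and outer spaces."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets, a simple stand-in for a text measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical type labels and the calculation mode assigned to each.
SIMILARITY_BY_TYPE: Dict[str, Callable[[str, str], float]] = {
    "text": token_overlap,
    "identity": exact_match,
}


def second_similarity(part_type: str, content: str, target_content: str) -> float:
    """Look up the calculation mode for the part's type and apply it."""
    try:
        mode = SIMILARITY_BY_TYPE[part_type]
    except KeyError:
        raise ValueError(f"no similarity mode registered for type {part_type!r}")
    return mode(content, target_content)
```

Keeping the mapping in a table means a new content type only needs a new entry, not a change to the dispatch logic.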
The embodiment of the present application provides a possible implementation manner, where the content information of the multiple parts includes: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
The test environment is an important piece of information in a test report, since each defect is associated with an operating system, middleware, and a topological structure. The test environment information may be stored as an artifact in the test management system or, in the defect tracking system, as part of the "description" field or as the content of the "context" field. The more similar the test environments, the higher the likelihood that the defects are duplicates.
The defect creator may be a developer or a tester. Testers in different test teams follow different test schemes: for example, a functional test team focuses on finding functional errors in the product, while a system verification test team focuses on finding defects in a simulated user environment. Testers in the same test team are therefore more likely to find the same defects.
The defect label may be a keyword label of the defect report, and the keyword label may be manually extracted by a writer of the defect report.
Step S1022 (not shown in the figure) includes:
step S10221 (not shown in the figure), when the content information is test case information, test environment information, or defect label information, determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report by calculating the distance between text vectors;
specifically, for these three kinds of content information, the text of each part is converted into a text vector, and the second similarity is derived from the distance between the vector of the defect report to be identified and the vector of the corresponding part of the target defect report.
Step S10222 (not shown in the figure), when the content information is the defect creator information or the defect responsible person information, determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report by means of keyword matching.
Specifically, when the content information is the defect creator information and the defect responsible person information, determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report in a keyword matching manner, and if the defect creator of the defect report to be identified is the same as or is located in the same team as the defect creator of the target defect report, determining that the similarity value is 1, otherwise, determining that the similarity value is 0.
The method and the device solve the problem of how to specifically determine the similarity between the content information of the defect report to be identified and the content information of the corresponding part of the target defect report.
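The two calculation modes of steps S10221 and S10222 can be sketched together as follows. The part names, the `team_of` lookup table, and the use of raw term counts as text vectors are assumptions for illustration. Applying the same-team rule to the defect responsible person is also an assumption, since the description states that rule only for the defect creator.

```python
import math
from collections import Counter

# Hypothetical field names for the five parts named in the description.
TEXT_PARTS = {"test_case", "test_environment", "defect_label"}
PERSON_PARTS = {"defect_creator", "defect_responsible_person"}


def text_vector_similarity(a, b):
    """Cosine similarity of raw term-count vectors (an assumed vectorization)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def person_similarity(a, b, team_of):
    """1.0 for the same person or the same team, otherwise 0.0."""
    if a == b:
        return 1.0
    team_a, team_b = team_of.get(a), team_of.get(b)
    return 1.0 if team_a is not None and team_a == team_b else 0.0


def second_similarities(report, target, team_of):
    """Second similarity for every part present in both reports."""
    sims = {}
    for part in sorted(TEXT_PARTS | PERSON_PARTS):
        if part not in report or part not in target:
            continue  # skip parts that one of the reports does not provide
        if part in TEXT_PARTS:
            sims[part] = text_vector_similarity(report[part], target[part])
        else:
            sims[part] = person_similarity(report[part], target[part], team_of)
    return sims
```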
The embodiment of the present application provides a possible implementation manner, and further, the method further includes:
step S105 (not shown in the figure), acquiring an existing defect report;
step S106 (not shown in the figure), determining whether the existing defect report and the defect report to be identified contain the same defect component name;
step S107 (not shown in the figure), when the existing defect report and the defect report to be identified contain the same defect component name, the existing defect report is taken as the target defect report.
Specifically, one or more existing defect reports can be obtained from the database, and it is then judged whether each existing defect report and the defect report to be identified contain the same defect component name. Two defect reports that belong to the same component are more likely to be similar than two that belong to different components, so an existing defect report that belongs to the same component as the defect report to be identified is taken as a target defect report.
For the embodiment of the application, the existing defect report is screened by judging whether the existing defect report and the defect report to be identified comprise the same defect component name, so that the subsequent data processing amount or calculation amount is reduced.
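The screening of steps S105 to S107 can be sketched as a simple filter. The dictionary layout with a `"components"` list and the case-insensitive comparison are assumptions; the application only requires that the two reports contain the same defect component name.

```python
def select_target_reports(report_to_identify, existing_reports):
    """Keep only existing reports that share at least one defect component
    name with the report to be identified."""
    wanted = {name.lower() for name in report_to_identify["components"]}
    targets = []
    for existing in existing_reports:
        names = {name.lower() for name in existing["components"]}
        if wanted & names:  # at least one component name in common
            targets.append(existing)
    return targets
```

Only the reports returned here need to go through the similarity calculations, which is where the reduction in processing comes from.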
The embodiment of the present application provides a possible implementation manner, and further, the method further includes:
step S108 (not shown in the figure), performing stop word removal on the defect report to be identified and the target defect report, wherein the stop words comprise fixed template fields of the defect report;
specifically, stop word removal may be performed on both the target defect report and the defect report to be identified, where the stop words include fixed template fields of the defect report. For example, a defect report template for the "description" field may contain phrases such as reproduction steps, actual results, expected results, detailed error information, screenshot attachments, and the like.
Step S101 specifically includes: step S1011 (not shown in the figure), calculating a first similarity between the defect report to be identified after stop word removal and the at least one target defect report after stop word removal.
For the embodiment of the application, the stop word processing is carried out on the fixed template fields of the defect report to be identified and the target defect report, so that the data processing amount is further reduced.
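As a rough sketch of steps S108 and S1011, the following removes template phrases and then compares the remaining words. The phrase list, the tokenizer, and the use of bag-of-words cosine similarity as the first similarity are all assumptions for illustration; this passage does not fix a particular similarity measure.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical fixed template fields of a defect report "description".
TEMPLATE_FIELDS = (
    "reproduction steps",
    "actual result",
    "expected result",
    "detailed error information",
    "screenshot attachment",
)


def strip_template_fields(text: str) -> List[str]:
    """Remove template phrases, then split the rest into lowercase tokens."""
    lowered = text.lower()
    for phrase in TEMPLATE_FIELDS:
        lowered = lowered.replace(phrase, " ")
    return re.findall(r"[a-z0-9]+", lowered)


def cosine_similarity(tokens_a: List[str], tokens_b: List[str]) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def first_similarity(report_a: str, report_b: str) -> float:
    """First similarity computed on the texts after stop-word removal."""
    return cosine_similarity(strip_template_fields(report_a),
                             strip_template_fields(report_b))
```

Without the removal step, two unrelated reports written from the same template would share the template words and therefore appear more similar than their actual content warrants.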
Fig. 2 shows a repeated defect report identification apparatus according to an embodiment of the present application, where the apparatus 20 includes: a calculation determination module 201, a first determination module 202, a second determination module 203, and a third determination module 204, wherein,
a calculation determination module 201, configured to calculate and determine a first similarity between a defect report to be identified and at least one target defect report;
a first determining module 202, configured to obtain content information of multiple portions of a defect report to be identified, and determine a second similarity between the content information of each portion and a corresponding portion of at least one target defect report, to obtain a second similarity between the content information of each portion;
a second determining module 203, configured to determine a third similarity between the defect report to be identified and at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part;
a third determining module 204, configured to determine whether the defect report to be identified is a repeated defect report based on a third similarity between the defect report to be identified and the at least one target defect report.
Compared with the prior art, in which whether a report is a repeated defect report is determined manually, the repeated defect report identification apparatus of this embodiment calculates a first similarity between the defect report to be identified and at least one target defect report; obtains content information of a plurality of parts of the defect report to be identified and determines, for each part, a second similarity between its content information and the corresponding part of the at least one target defect report; determines a third similarity between the defect report to be identified and the at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of each part; and finally determines, based on the third similarity, whether the defect report to be identified is a repeated defect report. Because the identification is performed automatically, the efficiency of identifying repeated defect reports is improved; and because the third similarity combines similarities calculated from multiple dimensions, the accuracy of identifying repeated defect reports is improved.
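The weighted combination can be sketched as follows. The weight values, the part names, and the threshold rule in `is_repeated` are illustrative assumptions; this passage only states that the weights are predetermined and that the decision is based on the third similarity.

```python
from typing import Dict, Iterable


def third_similarity(first_sim: float,
                     second_sims: Dict[str, float],
                     first_weight: float,
                     second_weights: Dict[str, float]) -> float:
    """Weighted sum of the first similarity and the per-part second similarities."""
    total = first_weight * first_sim
    for part, sim in second_sims.items():
        total += second_weights[part] * sim
    return total


def is_repeated(third_sims: Iterable[float], threshold: float) -> bool:
    """Assumed decision rule: repeated if any target reaches the threshold."""
    return any(s >= threshold for s in third_sims)


# Hypothetical weights: 0.5 for the whole-report similarity, 0.1 per part.
parts = ["test_case", "test_environment", "defect_creator",
         "defect_responsible_person", "defect_label"]
weights = {p: 0.1 for p in parts}
score = third_similarity(0.8, {p: 1.0 for p in parts}, 0.5, weights)
```

Choosing weights that sum to 1 keeps the third similarity in the same range as its inputs, which makes a single threshold easier to interpret.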
The device for identifying a duplicate defect report of the present embodiment can perform the method for identifying a duplicate defect report provided in the above embodiments of the present application, and the implementation principles thereof are similar, and are not described herein again.
As shown in fig. 3, the present embodiment provides another repetitive defect report identification apparatus, where the apparatus 30 includes: a calculation determination module 301, a first determination module 302, a second determination module 303, and a third determination module 304, wherein,
a calculation determining module 301, configured to calculate and determine a first similarity between a defect report to be identified and at least one target defect report;
here, the calculation determination module 301 in fig. 3 has the same or similar function as the calculation determination module 201 in fig. 2.
A first determining module 302, configured to obtain content information of multiple portions of a defect report to be identified, and determine a second similarity between the content information of each portion and a corresponding portion of at least one target defect report, to obtain a second similarity between the content information of each portion;
wherein the first determining module 302 in fig. 3 has the same or similar function as the first determining module 202 in fig. 2.
A second determining module 303, configured to determine a third similarity between the defect report to be identified and at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part;
wherein the second determining module 303 in fig. 3 has the same or similar function as the second determining module 203 in fig. 2.
A third determining module 304, configured to determine whether the defect report to be identified is a repeated defect report based on a third similarity between the defect report to be identified and the at least one target defect report.
Wherein the third determining module 304 in fig. 3 has the same or similar function as the third determining module 204 in fig. 2.
The embodiment of the present application provides a possible implementation manner, and in particular, the first determining module 302 includes:
a first determination unit 3021 for determining the types of content information of the respective sections;
a second determining unit 3022 for determining a second similarity of the content information of each portion to a corresponding portion of the at least one target defect report based on the type of the content information of each portion.
With the embodiment of the application, the second similarity between the content information of each part and the corresponding part of the at least one target defect report is determined based on the type of the content information, and the problem of how to determine the second similarity is solved.
The embodiment of the present application provides a possible implementation manner, where the content information of the multiple parts includes: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
a second determining unit 3022, specifically configured to determine, when the content information is test case information, test environment information, or defect label information, a second similarity between the corresponding content information and the corresponding portion of the at least one target defect report by calculating a distance between text vectors; and/or, the second determining unit 3022 is specifically configured to determine, by means of keyword matching, a second similarity between the corresponding content information and the corresponding portion of the at least one target defect report when the content information is defect creator information or defect responsible person information.
The method and the device solve the problem of how to specifically determine the similarity between the content information of the defect report to be identified and the content information of the corresponding part of the target defect report.
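The type-based dispatch can be sketched as below. The part names, the use of cosine similarity over word counts as the text-vector measure, and exact matching after normalization as the keyword-matching rule are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical identifiers for the five kinds of content information.
TEXT_VECTOR_PARTS = {"test_case", "test_environment", "defect_label"}
KEYWORD_PARTS = {"defect_creator", "defect_responsible_person"}


def text_vector_similarity(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two texts."""
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def keyword_match_similarity(a: str, b: str) -> float:
    """1.0 when the two values match after trimming and lower-casing, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def second_similarity(part: str, a: str, b: str) -> float:
    """Choose the similarity measure according to the type of content information."""
    if part in TEXT_VECTOR_PARTS:
        return text_vector_similarity(a, b)
    if part in KEYWORD_PARTS:
        return keyword_match_similarity(a, b)
    raise ValueError(f"unknown content information type: {part}")
```

Free-text parts benefit from a graded measure, while person fields are effectively identifiers, so a match or no-match result is usually sufficient for them.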
The embodiment of the present application provides a possible implementation manner, and further, the apparatus further includes:
an obtaining module 305, configured to obtain an existing defect report;
a judging module 306, configured to judge whether the existing defect report and the defect report to be identified contain the same defect component name;
and a module 307, configured to take the existing defect report as a target defect report when the existing defect report and the defect report to be identified contain the same defect component name.
For the embodiment of the application, the existing defect report is screened by judging whether the existing defect report and the defect report to be identified comprise the same defect component name, so that the subsequent data processing amount or calculation amount is reduced.
The embodiment of the present application provides a possible implementation manner, and further, the apparatus 30 further includes:
a removing module 308, configured to perform stop-word removal on the defect report to be identified and the target defect report, where the stop words include fixed template fields of the defect report;
the calculation determining module 301 is specifically configured to calculate a first similarity between the defect report to be identified after stop-word removal and at least one target defect report after stop-word removal.
For the embodiment of the application, the stop word processing is carried out on the fixed template fields of the defect report to be identified and the target defect report, so that the data processing amount is further reduced.
Compared with the prior art, in which whether a report is a repeated defect report is determined manually, the repeated defect report identification apparatus of this embodiment calculates a first similarity between the defect report to be identified and at least one target defect report; obtains content information of a plurality of parts of the defect report to be identified and determines, for each part, a second similarity between its content information and the corresponding part of the at least one target defect report; determines a third similarity between the defect report to be identified and the at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of each part; and finally determines, based on the third similarity, whether the defect report to be identified is a repeated defect report. Because the identification is performed automatically, the efficiency of identifying repeated defect reports is improved; and because the third similarity combines similarities calculated from multiple dimensions, the accuracy of identifying repeated defect reports is improved.
The embodiment of the present application provides a device for identifying a repeated defect report, which is suitable for the method shown in the above embodiment and is not described herein again.
An embodiment of the present application provides an electronic device. As shown in fig. 4, the electronic device 40 includes: a processor 401 and a memory 403, wherein the processor 401 is coupled to the memory 403, for example via a bus 402. Further, the electronic device 40 may also include a transceiver 404. It should be noted that, in practical applications, the transceiver 404 is not limited to one, and the structure of the electronic device 40 does not constitute a limitation on the embodiment of the present application. In the embodiment of the present application, the processor 401 is configured to implement the functions of the calculation determination module, the first determination module, the second determination module, and the third determination module shown in fig. 2 or fig. 3, and the functions of the obtaining module, the judging module, the module 307, and the removing module shown in fig. 3. The transceiver 404 includes a receiver and a transmitter.
The processor 401 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 401 may also be a combination that implements computing functions, e.g., a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
The memory 403 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 403 is used for storing application program codes for executing the scheme of the application, and the execution is controlled by the processor 401. The processor 401 is configured to execute application program codes stored in the memory 403 to implement the functions of the repetitive defect report identifying apparatus provided by the embodiment shown in fig. 2 or fig. 3.
The embodiment of the application provides an electronic device. Compared with the prior art, in which whether a report is a repeated defect report is determined manually, the embodiment of the application calculates a first similarity between the defect report to be identified and at least one target defect report; obtains content information of a plurality of parts of the defect report to be identified and determines, for each part, a second similarity between its content information and the corresponding part of the at least one target defect report; determines a third similarity between the defect report to be identified and the at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of each part; and finally determines, based on the third similarity, whether the defect report to be identified is a repeated defect report. Because the identification is performed automatically, the efficiency of identifying repeated defect reports is improved; and because the third similarity combines similarities calculated from multiple dimensions, the accuracy of identifying repeated defect reports is improved.
The embodiment of the application provides an electronic device suitable for the above method embodiment, which is not described in detail herein.
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method shown in the above embodiments is implemented.
Embodiments of the present application provide a computer-readable storage medium. When the stored program is executed, a first similarity between the defect report to be identified and at least one target defect report is calculated; content information of a plurality of parts of the defect report to be identified is obtained and, for each part, a second similarity between its content information and the corresponding part of the at least one target defect report is determined; a third similarity between the defect report to be identified and the at least one target defect report is determined based on predetermined weight values of the first similarity and of the second similarity of each part; and finally, whether the defect report to be identified is a repeated defect report is determined based on the third similarity. Because the identification is performed automatically, the efficiency of identifying repeated defect reports is improved; and because the third similarity combines similarities calculated from multiple dimensions, the accuracy of identifying repeated defect reports is improved.
The embodiment of the application provides a computer-readable storage medium suitable for the above method embodiment, which is not described in detail herein.
It should be understood that, although the steps in the flowcharts of the figures are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated herein, the order in which the steps are performed is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present application, and these modifications and refinements shall also fall within the protection scope of the present application.
Claims (10)
1. A method for identifying a repeated defect report, comprising:
calculating and determining a first similarity between a defect report to be identified and at least one target defect report;
acquiring content information of a plurality of parts of the defect report to be identified, and respectively determining second similarity of the content information of each part and a corresponding part of at least one target defect report to obtain second similarity of the content information of each part;
determining a third similarity between the defect report to be identified and at least one target defect report based on the predetermined weight values of the first similarity and the second similarity of the content information of each part;
determining whether the defect report to be identified is a repeated defect report based on a third similarity of the defect report to be identified and at least one target defect report.
2. The method of claim 1, wherein the determining a second similarity between the content information of each portion and the corresponding portion of the at least one target defect report comprises:
determining the type of the content information of each part;
determining a second similarity of the content information of the respective portion to a corresponding portion of the at least one target defect report based on the type of the content information of the respective portion.
3. The method of claim 1, wherein the plurality of portions of content information comprise: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
the determining a second similarity of the content information of each portion to a corresponding portion of at least one target defect report based on the type of the content information of each portion comprises:
when the content information is test case information, test environment information, or defect label information, determining a second similarity of the corresponding content information and a corresponding part of at least one target defect report by calculating the distance between text vectors;
and when the content information is defect creator information or defect responsible person information, determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report in a keyword matching mode.
4. The method of claim 1, further comprising:
acquiring an existing defect report;
judging whether the existing defect report and the defect report to be identified contain the same defect component name or not;
and when the existing defect report and the defect report to be identified contain the same defect component name, taking the existing defect report as a target defect report.
5. The method according to any one of claims 1-4, characterized in that the method further comprises:
performing stop word processing on the defect report to be identified and the target defect report, wherein the stop words comprise fixed template fields of the defect report;
the calculating and determining a first similarity between the defect report to be identified and at least one target defect report specifically comprises:
and calculating a first similarity between the defect report to be identified after stop-word removal and at least one target defect report after stop-word removal.
6. An apparatus for duplicate defect report identification, comprising:
the calculation determining module is used for calculating and determining a first similarity between the defect report to be identified and at least one target defect report;
the first determining module is used for acquiring the content information of a plurality of parts of the defect report to be identified, and respectively determining the second similarity of the content information of each part and the corresponding part of at least one target defect report to obtain the second similarity of the content information of each part;
the second determining module is used for determining a third similarity between the defect report to be identified and at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part;
a third determining module, configured to determine whether the defect report to be identified is a repeated defect report based on a third similarity between the defect report to be identified and at least one target defect report.
7. The apparatus of claim 6, wherein the first determining module comprises:
a first determination unit configured to determine a type of the content information of the respective portions;
a second determining unit for determining a second similarity of the content information of each portion with a corresponding portion of the at least one target defect report based on the type of the content information of each portion.
8. The apparatus of claim 6, wherein the content information of the plurality of portions comprises: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
the second determining unit is specifically configured to determine, when the content information is test case information, test environment information, or defect label information, a second similarity between the corresponding content information and a corresponding portion of the at least one target defect report by calculating a distance between text vectors; and/or, the second determining unit is specifically configured to determine, by means of keyword matching, a second similarity between the corresponding content information and a corresponding portion of the at least one target defect report when the content information is defect creator information or defect responsible person information.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the repeated defect report identification method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the repeated defect report identification method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911341418.8A CN111178037A (en) | 2019-12-24 | 2019-12-24 | Repeated defect report identification method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911341418.8A CN111178037A (en) | 2019-12-24 | 2019-12-24 | Repeated defect report identification method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111178037A true CN111178037A (en) | 2020-05-19 |
Family
ID=70655633
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911341418.8A Pending CN111178037A (en) | 2019-12-24 | 2019-12-24 | Repeated defect report identification method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111178037A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103970666A (en) * | 2014-05-29 | 2014-08-06 | 重庆大学 | Method for detecting repeated software defect reports |
CN106156272A (en) * | 2016-06-21 | 2016-11-23 | 北京工业大学 | A kind of information retrieval method based on multi-source semantic analysis |
CN106250311A (en) * | 2016-07-27 | 2016-12-21 | 成都启力慧源科技有限公司 | Repeated defects based on LDA model report detection method |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103970666A (en) * | 2014-05-29 | 2014-08-06 | 重庆大学 | Method for detecting repeated software defect reports |
CN106156272A (en) * | 2016-06-21 | 2016-11-23 | 北京工业大学 | A kind of information retrieval method based on multi-source semantic analysis |
CN106250311A (en) * | 2016-07-27 | 2016-12-21 | 成都启力慧源科技有限公司 | Repeated defects based on LDA model report detection method |
Non-Patent Citations (4)
Title |
---|
李楠; 王晓博; 刘超: "自动分析软件缺陷报告间相关性的方法研究", 《计算机应用研究》, vol. 27, no. 6, pages 2134 - 2139 * |
范道远 等: "融合文本与分类信息的重复缺陷报告检测方法", 《计算机科学》 * |
范道远 等: "融合文本与分类信息的重复缺陷报告检测方法", 《计算机科学》, 19 August 2019 (2019-08-19) * |
高子欣; 赵逢禹; 刘亚: "基于缺陷报告分析的软件缺陷定位方法", 《软件》, vol. 40, no. 5, pages 8 - 15 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112286808A (en) * | 2020-10-29 | 2021-01-29 | 北京字节跳动网络技术有限公司 | Application program testing method and device, electronic equipment and medium |
CN112286808B (en) * | 2020-10-29 | 2023-08-11 | 抖音视界有限公司 | Application program testing method and device, electronic equipment and medium |
CN113238963A (en) * | 2021-06-16 | 2021-08-10 | 中国农业银行股份有限公司 | Test report generation method, device, equipment, storage medium and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200519 |