CN111178037A - Repeated defect report identification method and device and electronic equipment - Google Patents

Repeated defect report identification method and device and electronic equipment

Info

Publication number
CN111178037A
Authority
CN
China
Prior art keywords
defect report
similarity
defect
content information
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911341418.8A
Other languages
Chinese (zh)
Inventor
章岩
王建秋
付晨
孟博
曹邦中
由军强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Citic Bank Corp Ltd
Original Assignee
China Citic Bank Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Citic Bank Corp Ltd filed Critical China Citic Bank Corp Ltd
Priority to CN201911341418.8A priority Critical patent/CN111178037A/en
Publication of CN111178037A publication Critical patent/CN111178037A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management

Abstract

The application provides a repeated defect report identification method, a repeated defect report identification device and electronic equipment, applied to the technical field of computers. The method automatically identifies whether a defect report to be identified is a repeated defect report, which improves the efficiency of identifying repeated defect reports. In addition, a third similarity between the defect report to be identified and at least one target defect report is determined based on a first similarity between the two reports and on second similarities between the content information of each part of the defect report to be identified and the corresponding part of the at least one target defect report, and whether the defect report to be identified is a repeated defect report is then determined based on the third similarity. Because the decision rests on similarities calculated from multiple dimensions, the accuracy of identifying repeated defect reports is improved.

Description

Repeated defect report identification method and device and electronic equipment
Technical Field
The application relates to the technical field of computers, in particular to a repeated defect report identification method and device and electronic equipment.
Background
A defect report is a document written in natural language that describes the fault conditions under which software cannot run normally or fails to meet its requirements during operation. It is usually submitted by a user or a tester of the software and then handed to a software quality maintainer. How to avoid the submission of repeated defect reports has become a problem.
Currently, repeated defect reports are identified manually, i.e., each submitted defect report is read by a person to determine whether it is a repeated defect report. Because the submitted defect reports must be read one by one before a decision can be made, this approach is time-consuming. The conventional manual method of identifying repeated defect reports is therefore inefficient.
Disclosure of Invention
The application provides a repeated defect report identification method, a repeated defect report identification device and electronic equipment, which are used for improving the efficiency and accuracy of identifying repeated defect reports. The technical scheme adopted by the application is as follows:
in a first aspect, there is provided a repeated defect report identification method, the method comprising:
calculating and determining a first similarity between a defect report to be identified and at least one target defect report;
acquiring content information of a plurality of parts of the defect report to be identified, and respectively determining second similarity of the content information of each part and a corresponding part of at least one target defect report to obtain second similarity of the content information of each part;
determining a third similarity between the defect report to be identified and at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part;
and determining whether the defect report to be identified is a repeated defect report based on the third similarity of the defect report to be identified and the at least one target defect report.
Optionally, determining a second similarity between the content information of each portion and the corresponding portion of the at least one target defect report respectively comprises:
determining the type of the content information of each part;
a second similarity of the content information of the respective portion to a corresponding portion of the at least one target defect report is determined based on the type of the content information of the respective portion.
Optionally, the content information of the plurality of parts includes: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
determining a second similarity of the content information of each portion to a corresponding portion of the at least one target defect report based on the type of the content information of each portion, comprising:
when the content information is test case information, test environment information or defect label information, determining a second similarity of the corresponding content information and a corresponding part of at least one target defect report by calculating the distance between the text vectors;
and when the content information is defect creator information or defect responsible person information, determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report in a keyword matching mode.
Further, the method further comprises:
acquiring an existing defect report;
judging whether the existing defect report and the defect report to be identified contain the same defect component name or not;
and when the existing defect report and the defect report to be identified contain the same defect component name, taking the existing defect report as a target defect report.
Further, the method further comprises:
performing stop-word removal on the defect report to be identified and the target defect report, wherein the stop words comprise fixed template fields of the defect report;
calculating and determining a first similarity between a defect report to be identified and at least one target defect report, specifically comprising:
calculating a first similarity between the defect report to be identified after stop-word removal and at least one target defect report after stop-word removal.
In a second aspect, there is provided a repetitive defect report identification apparatus, the apparatus comprising,
the calculation determining module is used for calculating and determining a first similarity between the defect report to be identified and at least one target defect report;
the first determining module is used for acquiring the content information of a plurality of parts of the defect report to be identified, and respectively determining the second similarity of the content information of each part and the corresponding part of at least one target defect report to obtain the second similarity of the content information of each part;
the second determining module is used for determining a third similarity between the defect report to be identified and at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part;
and the third determining module is used for determining whether the defect report to be identified is a repeated defect report or not based on the third similarity between the defect report to be identified and the at least one target defect report.
Optionally, the first determining module includes:
a first determination unit configured to determine types of content information of the respective portions;
a second determining unit for determining a second similarity of the content information of each portion with a corresponding portion of the at least one target defect report based on the type of the content information of each portion.
Optionally, the content information of the plurality of parts includes: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
the second determining unit is specifically used for determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report by calculating the distance between the text vectors when the content information is test case information, test environment information or defect label information; and/or, the second determining unit is specifically used for determining, by means of keyword matching, a second similarity between the corresponding content information and a corresponding part of the at least one target defect report when the content information is defect creator information or defect responsible person information.
Further, the apparatus further comprises:
an acquisition module for acquiring an existing defect report;
the judging module is used for judging whether the existing defect report and the defect report to be identified contain the same defect component name or not;
and a module for taking the existing defect report as a target defect report when the existing defect report and the defect report to be identified contain the same defect component name.
Further, the apparatus further comprises:
the removing module is used for performing stop-word removal on the defect report to be identified and the target defect report, wherein the stop words comprise fixed template fields of the defect report;
the calculation determining module is specifically used for calculating a first similarity between the defect report to be identified after stop-word removal and at least one target defect report after stop-word removal.
In a third aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to: the repetitive defect report identification method shown in the first aspect is performed.
In a fourth aspect, there is provided a computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to perform the repetitive defect report identification method of the first aspect.
Compared with the prior art, in which whether a report is a repeated defect report is determined manually, the present application calculates a first similarity between the defect report to be identified and at least one target defect report; acquires content information of a plurality of parts of the defect report to be identified and determines a second similarity between the content information of each part and the corresponding part of the at least one target defect report; determines a third similarity between the defect report to be identified and the at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part; and finally determines whether the defect report to be identified is a repeated defect report based on the third similarity. Because the identification is automatic, the efficiency of identifying repeated defect reports is improved. Because the third similarity combines the first similarity with the second similarities of the individual parts, the decision rests on similarities calculated from multiple dimensions, which improves the accuracy of identifying repeated defect reports.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart illustrating a method for identifying a repeated defect report according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an apparatus for identifying a duplicate defect report according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another repeat defect report identification apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The embodiment of the application provides a repeated defect report identification method, as shown in fig. 1, the method may include the following steps:
step S101, calculating and determining a first similarity between a defect report to be identified and at least one target defect report;
specifically, the first similarity between the defect report to be identified and the at least one target defect report can be determined by calculating the distance between their text vectors, where the distance may be a Euclidean distance, a cosine distance, a Hamming distance, or the like. The text vector representations of the defect report to be identified and the at least one target defect report may be obtained through a corresponding word embedding algorithm; they may be obtained through a neural network method, through a TF-IDF (term frequency-inverse document frequency) algorithm, or through any other method capable of implementing the present application, which is not limited herein.
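As an illustration of step S101, the following minimal sketch computes a first similarity with TF-IDF vectors and cosine similarity, one of the combinations the paragraph above permits. The function names, the whitespace tokenizer and the smoothed IDF formula are assumptions made for this example and are not specified by the application.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps shared terms at a non-zero weight.
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_similarity(report, targets):
    """Return one first similarity per target report (hypothetical helper)."""
    docs = [report.lower().split()] + [t.lower().split() for t in targets]
    vectors = tfidf_vectors(docs)
    return [cosine_similarity(vectors[0], v) for v in vectors[1:]]
```

A Euclidean or Hamming distance, or vectors from a neural embedding model, could replace the cosine and TF-IDF choices here without changing the overall flow.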
Step S102, acquiring content information of a plurality of parts of the defect report to be identified, and respectively determining a second similarity between the content information of each part and a corresponding part of at least one target defect report to obtain a second similarity of the content information of each part;
specifically, the defect report may be composed of a plurality of parts, content information of the plurality of parts may be identified by a corresponding keyword matching method or a natural language understanding algorithm, and then a second similarity between the content information of each part and a corresponding part of the at least one target defect report may be determined by a corresponding similarity determination method.
Illustratively, the content information of the plurality of parts may be content information of part a, content information of part B, and content information of part C, and when there is only one target defect report, a second similarity of the content information of part a of the defect report to be identified and the content information of part a of the target defect report, a second similarity of the content information of part B of the defect report to be identified and the content information of part B of the target defect report, and a second similarity of the content information of part C of the defect report to be identified and the content information of part C of the target defect report may be determined, respectively. There may be a plurality of second similarities.
For example, when there are a plurality of (three as an example) target defect reports, a second similarity between the content information of the part a of the defect report to be identified and the content information of the part a of the first target defect report, a second similarity between the content information of the part a of the defect report to be identified and the content information of the part a of the second target defect report, a second similarity between the content information of the part a of the defect report to be identified and the content information of the part a of the third target defect report, and a second similarity between the content information of the part B of the defect report to be identified and the corresponding parts of the first, second, and third target defect reports may be determined, respectively.
Step S103, determining a third similarity between the defect report to be identified and at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part;
specifically, a certain weight value may be assigned to each of the first similarity and the second similarity, and then a third similarity between the defect report to be identified and the at least one target defect report may be determined comprehensively.
Illustratively, the content information of the plurality of parts may be content information of part A, content information of part B, and content information of part C. When there is only one target defect report, the third similarity between the defect report to be identified and the target defect report may be determined by the formula:
third similarity = first similarity × weight 1 + second similarity of part A × weight 2 + second similarity of part B × weight 3 + second similarity of part C × weight 4 (Equation 1)
Weight 1, weight 2, weight 3 and weight 4 may be the same or different; optionally, the value of weight 1 is greater than the values of weight 2, weight 3 and weight 4.
For example, when there are a plurality of target defect reports, the third similarity between the defect report to be identified and each target defect report may be calculated by the above method, and details are not repeated here.
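The weighted combination of Equation 1 can be sketched as follows. The function name, the dictionary layout and the numeric weights are illustrative assumptions; the application does not fix any particular weight values.

```python
def third_similarity(first_sim, part_sims, first_weight, part_weights):
    """Weighted sum of the first similarity and the per-part second similarities.

    part_sims and part_weights are dicts keyed by part name (e.g. "A", "B", "C").
    """
    if part_sims.keys() != part_weights.keys():
        raise ValueError("each part needs exactly one weight")
    weighted_parts = sum(part_weights[p] * s for p, s in part_sims.items())
    return first_weight * first_sim + weighted_parts


# Example with assumed values: weight 1 is larger than the part weights.
score = third_similarity(
    first_sim=0.8,
    part_sims={"A": 1.0, "B": 0.5, "C": 0.0},
    first_weight=0.4,
    part_weights={"A": 0.2, "B": 0.2, "C": 0.2},
)
```

With several target defect reports, the same function would simply be called once per target.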
And step S104, determining whether the defect report to be identified is a repeated defect report or not based on the third similarity between the defect report to be identified and at least one target defect report.
Specifically, whether the defect report to be identified is a repeated defect report is determined based on the third similarity: when the third similarity between the defect report to be identified and a certain target defect report exceeds a certain threshold value, the defect report to be identified is determined to be a repeated defect report.
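The threshold decision of step S104 might look like the sketch below. The threshold of 0.75 and the function names are assumptions for illustration only; the application leaves the threshold value open.

```python
def matching_targets(third_sims, threshold=0.75):
    """Return the IDs of target reports whose third similarity exceeds the threshold.

    third_sims maps a target report ID to its third similarity.
    """
    return [target_id for target_id, sim in third_sims.items() if sim > threshold]


def is_repeated(third_sims, threshold=0.75):
    """A report is treated as repeated if any target exceeds the threshold."""
    return any(sim > threshold for sim in third_sims.values())
```

Returning the matching target IDs as well as the yes/no answer makes it possible to show a reviewer which existing report the new one appears to duplicate.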
Compared with the prior art, in which whether a report is a repeated defect report is determined manually, the repeated defect report identification method provided by the embodiment of the present application calculates a first similarity between the defect report to be identified and at least one target defect report; acquires content information of a plurality of parts of the defect report to be identified and determines a second similarity between the content information of each part and the corresponding part of the at least one target defect report; determines a third similarity between the defect report to be identified and the at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part; and finally determines whether the defect report to be identified is a repeated defect report based on the third similarity. Because the identification is automatic, the efficiency of identifying repeated defect reports is improved. Because the third similarity combines the first similarity with the second similarities of the individual parts, the decision rests on similarities calculated from multiple dimensions, which improves the accuracy of identifying repeated defect reports.
The embodiment of the present application provides a possible implementation manner, and step S102 includes:
step S1021 (not shown in the figure), which determines the types of content information of the respective sections;
step S1022 (not shown in the figure) determines a second similarity between the content information of each portion and the corresponding portion of the at least one target defect report based on the type of the content information of each portion.
Specifically, the similarity between the content information of different parts may be calculated in different manners, and the type corresponding to the content information of each part may be determined first, and then the second similarity between the content information of each part and the corresponding part of the at least one target defect report may be determined based on the type of the content information of each part.
Illustratively, the content information has three types, different types respectively correspond to corresponding similarity calculation modes, and when the determined content information is the type A, the second similarity is calculated through the similarity calculation mode of the type A content information.
With the embodiment of the application, the second similarity between the content information of each part and the corresponding part of the at least one target defect report is determined based on the type of the content information, and the problem of how to determine the second similarity is solved.
The embodiment of the present application provides a possible implementation manner, where the content information of the multiple parts includes: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
wherein the test environment is an important piece of information in the test report, and each defect is associated with an operating system, middleware and a topological structure. The test environment information may be stored as an artifact in the test management system or, in the defect tracking system, as part of the "description" field or as the content of the "context" field. The more similar the test environments, the higher the likelihood that the defects are duplicates.
The defect creator may be a developer or a tester. Testers in different test teams follow different test schemes: for example, a functional test team focuses on finding functional errors in the product, while a system verification test team focuses on finding defects in a simulated user environment. Testers in the same test team are therefore more likely to find the same defects.
The defect label may be a keyword label of the defect report, and the keyword label may be manually extracted by a writer of the defect report.
Step S1022 (not shown in the figure) includes:
step S10221 (not shown in the figure), when the content information is test case information, test environment information or defect label information, determining a second similarity between the corresponding content information and a corresponding portion of at least one target defect report by calculating a distance between the text vectors;
specifically, when the content information is test case information, test environment information or defect label information, the second similarity of the corresponding content information to the corresponding portion of the at least one target defect report may be determined by the distance between the text vectors.
Step S10222 (not shown in the figure), when the content information is the defect creator information or the defect responsible person information, determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report by means of keyword matching.
Specifically, when the content information is the defect creator information or the defect responsible person information, the second similarity between the corresponding content information and the corresponding part of the at least one target defect report is determined by keyword matching. For example, if the defect creator of the defect report to be identified is the same person as, or is in the same team as, the defect creator of the target defect report, the similarity value is 1; otherwise, the similarity value is 0.
The method and the device solve the problem of how to specifically determine the similarity between the content information of the defect report to be identified and the content information of the corresponding part of the target defect report.
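The type-dependent calculation of steps S10221 and S10222 can be sketched as a small dispatcher. The field names, the `team_of` lookup table and the bag-of-words cosine used for text fields are hypothetical choices for this example; the application only requires a text-vector distance for the text-like parts and keyword matching for the person-like parts.

```python
import math
from collections import Counter

# Hypothetical field names for the five kinds of content information.
TEXT_FIELDS = {"test_case", "test_environment", "defect_label"}
PERSON_FIELDS = {"defect_creator", "defect_owner"}


def text_similarity(a, b):
    """Cosine similarity of bag-of-words count vectors (an assumed text metric)."""
    u, v = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def person_similarity(a, b, team_of):
    """1.0 if both names are the same person or share a known team, else 0.0."""
    if a == b:
        return 1.0
    team_a, team_b = team_of.get(a), team_of.get(b)
    return 1.0 if team_a is not None and team_a == team_b else 0.0


def second_similarity(field, a, b, team_of):
    """Pick the similarity calculation that matches the type of the field."""
    if field in TEXT_FIELDS:
        return text_similarity(a, b)
    if field in PERSON_FIELDS:
        return person_similarity(a, b, team_of)
    raise ValueError(f"unknown field type: {field}")
```

Keeping the dispatch in one function means a new kind of content information only needs a new branch and its own similarity routine.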
The embodiment of the present application provides a possible implementation manner, and further, the method further includes:
step S105 (not shown in the figure), acquiring an existing defect report;
step S106 (not shown in the figure), determining whether the existing defect report and the defect report to be identified contain the same defect component name;
step S107 (not shown in the figure), when the existing defect report and the defect report to be identified contain the same defective component name, the existing defect report is taken as the target defect report.
Specifically, one or more existing defect reports can be obtained from the database, and it is then judged whether each existing defect report and the defect report to be identified contain the same defect component name. Two defect reports that belong to the same component are more likely to be similar than two defect reports that belong to different components, so an existing defect report that belongs to the same component as the defect report to be identified is determined as a target defect report.
For the embodiment of the application, the existing defect report is screened by judging whether the existing defect report and the defect report to be identified comprise the same defect component name, so that the subsequent data processing amount or calculation amount is reduced.
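The screening in steps S105 to S107 amounts to a filter on the component name. In the sketch below, reports are plain dictionaries with a hypothetical `"component"` key; how the component name is actually stored or extracted is not specified by the application.

```python
def select_target_reports(report, existing_reports):
    """Keep only existing reports whose component name matches the new report's."""
    component = report["component"]
    return [r for r in existing_reports if r["component"] == component]
```

Only the reports returned here would go on to the more expensive similarity calculations.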
The embodiment of the present application provides a possible implementation manner, and further, the method further includes:
step S108 (not shown in the figure), performing stop-word removal on the defect report to be identified and the target defect report, wherein the stop words comprise fixed template fields of the defect report;
specifically, stop-word removal may be performed on the target defect report and the defect report to be identified, where the stop words include fixed template fields of the defect report; for example, a defect report template for the "description" field may contain words such as reproduction steps, actual results, expected results, detailed error information, screenshot attachments, and the like.
Step S101 specifically includes: step S1011 (not shown in the figure), calculating a first similarity between the defect report to be identified after stop-word removal and at least one target defect report after stop-word removal.
For the embodiment of the application, the stop word processing is carried out on the fixed template fields of the defect report to be identified and the target defect report, so that the data processing amount is further reduced.
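Removing the fixed template fields before computing the first similarity could be done as in the following sketch. The English phrases in `TEMPLATE_PHRASES` are assumed renderings of the template fields named above, and the function name is hypothetical; a real template would supply its own list.

```python
import re

# Assumed template field labels; a real defect report template defines its own.
TEMPLATE_PHRASES = [
    "steps to reproduce",
    "actual result",
    "expected result",
    "detailed error information",
    "screenshot attachment",
]

# Match any template phrase, case-insensitively, plus an optional trailing colon.
_TEMPLATE_PATTERN = re.compile(
    r"(?:%s)\s*:?" % "|".join(re.escape(p) for p in TEMPLATE_PHRASES),
    re.IGNORECASE,
)


def strip_template_fields(text):
    """Remove template field labels and collapse the remaining whitespace."""
    cleaned = _TEMPLATE_PATTERN.sub(" ", text)
    return " ".join(cleaned.split())
```

Because every report built from the same template shares these labels, leaving them in would add common terms to every report without saying anything about the defect itself.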
FIG. 2 shows a repeated defect report identification apparatus according to an embodiment of the present application, where the apparatus 20 includes: a calculation determination module 201, a first determination module 202, a second determination module 203, and a third determination module 204, wherein,
a calculation determination module 201, configured to calculate and determine a first similarity between a defect report to be identified and at least one target defect report;
a first determining module 202, configured to obtain content information of multiple portions of a defect report to be identified, and determine a second similarity between the content information of each portion and a corresponding portion of at least one target defect report, to obtain a second similarity between the content information of each portion;
a second determining module 203, configured to determine a third similarity between the defect report to be identified and at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part;
a third determining module 204, configured to determine whether the defect report to be identified is a repeated defect report based on a third similarity between the defect report to be identified and the at least one target defect report.
Compared with the prior art, in which whether a report is a repeated defect report is determined manually, the repeated defect report identification apparatus provided by this embodiment calculates a first similarity between the defect report to be identified and at least one target defect report; obtains content information of a plurality of parts of the defect report to be identified and determines, for each part, a second similarity between its content information and the corresponding part of the at least one target defect report; determines a third similarity between the defect report to be identified and the at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part; and finally determines, based on the third similarity, whether the defect report to be identified is a repeated defect report. Because the identification is performed automatically, the efficiency of identifying repeated defect reports is improved. In addition, because the third similarity combines the first similarity with the second similarities of the individual parts, the decision is based on similarities calculated from multiple dimensions, which improves the accuracy of identifying repeated defect reports.
The device for identifying a duplicate defect report of the present embodiment can perform the method for identifying a duplicate defect report provided in the above embodiments of the present application, and the implementation principles thereof are similar, and are not described herein again.
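One way to read the weighted combination is as a linear sum. The linear form, the normalization of the weights, and the threshold τ are assumptions made for illustration; the application only states that the third similarity is determined from predetermined weight values of the first similarity and of the second similarity of each part. With S_1 the first similarity, S_2^{(i)} the second similarity of the i-th of n parts, and w_0, ..., w_n the predetermined weights:

```latex
S_3(t) = w_0\, S_1(t) + \sum_{i=1}^{n} w_i\, S_2^{(i)}(t),
\qquad \sum_{i=0}^{n} w_i = 1,
\qquad \text{repeated} \iff \max_{t \in T} S_3(t) \ge \tau
```

Here T denotes the set of target defect reports, and the report to be identified would be treated as a repeated defect report when its best third similarity reaches the (hypothetical) threshold τ.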
As shown in fig. 3, the present embodiment provides another repeated defect report identification apparatus, where the apparatus 30 includes: a calculation determination module 301, a first determination module 302, a second determination module 303, and a third determination module 304, wherein,
a calculation determining module 301, configured to calculate and determine a first similarity between a defect report to be identified and at least one target defect report;
here, the calculation determination module 301 in fig. 3 has the same or similar function as the calculation determination module 201 in fig. 2.
A first determining module 302, configured to obtain content information of multiple portions of a defect report to be identified, and determine a second similarity between the content information of each portion and a corresponding portion of at least one target defect report, to obtain a second similarity between the content information of each portion;
wherein the first determining module 302 in fig. 3 has the same or similar function as the first determining module 202 in fig. 2.
A second determining module 303, configured to determine a third similarity between the defect report to be identified and at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part;
wherein the second determining module 303 in fig. 3 has the same or similar function as the second determining module 203 in fig. 2.
A third determining module 304, configured to determine whether the defect report to be identified is a repeated defect report based on a third similarity between the defect report to be identified and the at least one target defect report.
Wherein the third determining module 304 in fig. 3 has the same or similar function as the third determining module 204 in fig. 2.
The embodiment of the present application provides a possible implementation manner, and in particular, the first determining module 302 includes:
a first determination unit 3021 for determining the types of content information of the respective sections;
a second determining unit 3022 for determining a second similarity of the content information of each portion to a corresponding portion of the at least one target defect report based on the type of the content information of each portion.
With the embodiment of the application, the second similarity between the content information of each part and the corresponding part of the at least one target defect report is determined based on the type of the content information, and the problem of how to determine the second similarity is solved.
The embodiment of the present application provides a possible implementation manner, where the content information of the multiple parts includes: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
a second determining unit 3022, specifically configured to determine, when the content information is test case information, test environment information, or defect label information, a second similarity between the corresponding content information and a corresponding portion of the at least one target defect report by calculating a distance between text vectors; and/or the second determining unit 3022 is specifically configured to determine, by means of keyword matching, a second similarity between the corresponding content information and a corresponding portion of the at least one target defect report when the content information is defect creator information or defect responsible person information.
The embodiment of the application thus solves the problem of how to specifically determine the similarity between the content information of the defect report to be identified and the content information of the corresponding part of the target defect report.
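The type-dependent choice between text-vector distance and keyword matching can be sketched as follows. The field names, the use of Euclidean distance over term-count vectors, the mapping 1 / (1 + distance), and exact matching after normalization are all assumptions for illustration; the application names the two techniques but does not fix these details.

```python
import math
import re
from collections import Counter

# Hypothetical field names for the five kinds of content information.
VECTOR_FIELDS = {"test_case", "test_environment", "defect_label"}
KEYWORD_FIELDS = {"creator", "responsible_person"}


def vector_similarity(text_a, text_b):
    """Map the Euclidean distance between term-count vectors into (0, 1]."""
    a = Counter(re.findall(r"\w+", text_a.lower()))
    b = Counter(re.findall(r"\w+", text_b.lower()))
    distance = math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))
    return 1.0 / (1.0 + distance)


def keyword_similarity(value_a, value_b):
    """Keyword match after normalization: 1.0 if identical, otherwise 0.0."""
    return 1.0 if value_a.strip().lower() == value_b.strip().lower() else 0.0


def second_similarity(field, value_a, value_b):
    """Dispatch on the type of content information."""
    if field in VECTOR_FIELDS:
        return vector_similarity(value_a, value_b)
    if field in KEYWORD_FIELDS:
        return keyword_similarity(value_a, value_b)
    raise ValueError(f"unknown content type: {field}")


print(second_similarity("creator", "Tester_A", "tester_a "))
print(second_similarity("defect_label", "ui crash", "ui layout"))
```

Free-text parts such as test case or environment descriptions tolerate partial overlap, so a graded distance suits them, whereas person names either match or do not, which is why a simple keyword comparison is enough for the creator and responsible person fields.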
The embodiment of the present application provides a possible implementation manner, and further, the apparatus further includes:
an obtaining module 305, configured to obtain an existing defect report;
a judging module 306, configured to judge whether the existing defect report and the defect report to be identified contain the same defect component name;
and a module 307, configured to take the existing defect report as a target defect report when the existing defect report and the defect report to be identified contain the same defective component name.
For the embodiment of the application, the existing defect report is screened by judging whether the existing defect report and the defect report to be identified comprise the same defect component name, so that the subsequent data processing amount or calculation amount is reduced.
The embodiment of the present application provides a possible implementation manner, and further, the apparatus 30 further includes:
the removing module 308 is configured to perform stop word processing on the defect report to be identified and the target defect report, where a stop word includes a fixed template field of the defect report;
the calculation determination module 301 is specifically configured to calculate a first similarity between the defect report to be identified after stop-word removal and at least one target defect report after stop-word removal.
For the embodiment of the application, the stop word processing is carried out on the fixed template fields of the defect report to be identified and the target defect report, so that the data processing amount is further reduced.
Compared with the prior art, in which whether a report is a repeated defect report is determined manually, the repeated defect report identification apparatus provided by this embodiment calculates a first similarity between the defect report to be identified and at least one target defect report; obtains content information of a plurality of parts of the defect report to be identified and determines, for each part, a second similarity between its content information and the corresponding part of the at least one target defect report; determines a third similarity between the defect report to be identified and the at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part; and finally determines, based on the third similarity, whether the defect report to be identified is a repeated defect report. Because the identification is performed automatically, the efficiency of identifying repeated defect reports is improved. In addition, because the third similarity combines the first similarity with the second similarities of the individual parts, the decision is based on similarities calculated from multiple dimensions, which improves the accuracy of identifying repeated defect reports.
The embodiment of the present application provides a device for identifying a repeated defect report, which is suitable for the method shown in the above embodiment and is not described herein again.
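To show how the modules described above might fit together, the following sketch chains the first similarity, the per-part second similarities, the weighted third similarity, and the final decision. It is a simplified illustration under stated assumptions: token-set (Jaccard) overlap stands in for every similarity measure, and the weights, the threshold, and the dictionary keys are hypothetical values not taken from the application.

```python
import re


def jaccard(text_a, text_b):
    """Token-set overlap; stands in for whichever similarity measure is actually used."""
    a = set(re.findall(r"\w+", text_a.lower()))
    b = set(re.findall(r"\w+", text_b.lower()))
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical weights: one for the first similarity and one per report part (sum is 1).
WEIGHTS = {"first": 0.5, "test_case": 0.2, "test_environment": 0.1,
           "creator": 0.05, "responsible_person": 0.05, "defect_label": 0.1}
THRESHOLD = 0.7  # hypothetical decision threshold


def third_similarity(report, target):
    """Weighted sum of the first similarity and the per-part second similarities."""
    score = WEIGHTS["first"] * jaccard(report["description"], target["description"])
    for part, weight in WEIGHTS.items():
        if part != "first":
            score += weight * jaccard(report[part], target[part])
    return score


def is_repeated(report, targets):
    """Return (decision, id of the best-matching target report)."""
    best_id, best_score = None, 0.0
    for target in targets:
        score = third_similarity(report, target)
        if score > best_score:
            best_id, best_score = target["id"], score
    return best_score >= THRESHOLD, best_id


new = {"id": "N", "description": "transfer page crashes on submit",
       "test_case": "TC-12 transfer", "test_environment": "UAT",
       "creator": "tester_a", "responsible_person": "dev_b", "defect_label": "crash"}
same = dict(new, id="T1")
other = {"id": "T2", "description": "login password rejected",
         "test_case": "TC-3 login", "test_environment": "UAT",
         "creator": "tester_c", "responsible_person": "dev_d", "defect_label": "auth"}

print(is_repeated(new, [other, same]))
```

Giving the full-text similarity the largest weight is only one possible choice; in practice the weights and the threshold would be tuned against reports already labelled as repeated.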
An embodiment of the present application provides an electronic device. As shown in fig. 4, the electronic device 40 includes a processor 401 and a memory 403, where the processor 401 is coupled to the memory 403, for example via a bus 402. Further, the electronic device 40 may also include a transceiver 404. It should be noted that, in practical applications, the number of transceivers 404 is not limited to one, and the structure of the electronic device 40 does not constitute a limitation on the embodiments of the present application. In the embodiments of the present application, the processor 401 is configured to implement the functions of the calculation determination module, the first determination module, the second determination module, and the third determination module shown in fig. 2 or fig. 3, and the functions of the acquisition module, the judgment module, the module 307, and the removal module shown in fig. 3. The transceiver 404 includes a receiver and a transmitter.
The processor 401 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 401 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 402 may include a path that transfers information between the above components. The bus 402 may be a PCI bus or an EISA bus, etc. The bus 402 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The memory 403 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 403 is used for storing application program codes for executing the scheme of the application, and the execution is controlled by the processor 401. The processor 401 is configured to execute application program codes stored in the memory 403 to implement the functions of the repetitive defect report identifying apparatus provided by the embodiment shown in fig. 2 or fig. 3.
The embodiment of the application provides an electronic device. Compared with the prior art, in which whether a report is a repeated defect report is determined manually, the embodiment of the application calculates a first similarity between the defect report to be identified and at least one target defect report; obtains content information of a plurality of parts of the defect report to be identified and determines, for each part, a second similarity between its content information and the corresponding part of the at least one target defect report; determines a third similarity between the defect report to be identified and the at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part; and finally determines, based on the third similarity, whether the defect report to be identified is a repeated defect report.
Because the identification is performed automatically, the efficiency of identifying repeated defect reports is improved. In addition, because the third similarity combines the first similarity with the second similarities of the individual parts, the decision is based on similarities calculated from multiple dimensions, which improves the accuracy of identifying repeated defect reports.
The electronic device provided by the embodiment of the application is suitable for the above method embodiments, and details are not described herein again.
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method shown in the above embodiments is implemented.
Embodiments of the present application provide a computer-readable storage medium. Compared with the prior art, the embodiment calculates a first similarity between the defect report to be identified and at least one target defect report; obtains content information of a plurality of parts of the defect report to be identified and determines, for each part, a second similarity between its content information and the corresponding part of the at least one target defect report; determines a third similarity between the defect report to be identified and the at least one target defect report based on predetermined weight values of the first similarity and of the second similarity of the content information of each part; and finally determines, based on the third similarity, whether the defect report to be identified is a repeated defect report. Because the identification is performed automatically, the efficiency of identifying repeated defect reports is improved. In addition, because the third similarity combines the first similarity with the second similarities of the individual parts, the decision is based on similarities calculated from multiple dimensions, which improves the accuracy of identifying repeated defect reports.
The computer-readable storage medium provided by the embodiment of the application is suitable for the above method embodiments, and details are not described herein again.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A method for identifying a repeated defect report, comprising:
calculating and determining a first similarity between a defect report to be identified and at least one target defect report;
acquiring content information of a plurality of parts of the defect report to be identified, and respectively determining second similarity of the content information of each part and a corresponding part of at least one target defect report to obtain second similarity of the content information of each part;
determining a third similarity between the defect report to be identified and at least one target defect report based on the predetermined weight values of the first similarity and the second similarity of the content information of each part;
determining whether the defect report to be identified is a repeated defect report based on a third similarity of the defect report to be identified and at least one target defect report.
2. The method of claim 1, wherein the determining a second similarity between the content information of each portion and the corresponding portion of the at least one target defect report comprises:
determining the type of the content information of each part;
determining a second similarity of the content information of the respective portion to a corresponding portion of the at least one target defect report based on the type of the content information of the respective portion.
3. The method of claim 1, wherein the plurality of portions of content information comprise: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
the determining a second similarity of the content information of each portion to a corresponding portion of at least one target defect report based on the type of the content information of each portion comprises:
when the content information is test case information, test environment information and defect label information, determining a second similarity of the corresponding content information and a corresponding part of at least one target defect report by calculating the distance between text vectors;
and when the content information is defect creator information and defect responsible person information, determining a second similarity between the corresponding content information and the corresponding part of the at least one target defect report in a keyword matching mode.
4. The method of claim 1, further comprising:
acquiring an existing defect report;
judging whether the existing defect report and the defect report to be identified contain the same defect component name or not;
and when the existing defect report and the defect report to be identified contain the same defect component name, taking the existing defect report as a target defect report.
5. The method according to any one of claims 1-4, characterized in that the method further comprises:
performing stop word processing on the defect report to be identified and the target defect report, wherein the stop words comprise fixed template fields of the defect report;
the calculating and determining a first similarity between the defect report to be identified and at least one target defect report specifically comprises:
and calculating a first similarity between the defect report to be identified after stop-word removal and at least one target defect report after stop-word removal.
6. An apparatus for duplicate defect report identification, comprising:
the calculation determining module is used for calculating and determining a first similarity between the defect report to be identified and at least one target defect report;
the first determining module is used for acquiring the content information of a plurality of parts of the defect report to be identified, and respectively determining the second similarity of the content information of each part and the corresponding part of at least one target defect report to obtain the second similarity of the content information of each part;
the second determining module is used for determining a third similarity between the defect report to be identified and at least one target defect report based on the preset first similarity and the weight value of the second similarity of the content information of each part;
a third determining module, configured to determine whether the defect report to be identified is a repeated defect report based on a third similarity between the defect report to be identified and at least one target defect report.
7. The apparatus of claim 6, wherein the first determining module comprises:
a first determination unit configured to determine a type of the content information of the respective portions;
a second determining unit for determining a second similarity of the content information of each portion with a corresponding portion of the at least one target defect report based on the type of the content information of each portion.
8. The apparatus of claim 6, wherein the content information of the plurality of portions comprises: test case information, test environment information, defect creator information, defect responsible person information, and defect label information;
the second determining unit is specifically configured to determine, when the content information is test case information, test environment information, and defect label information, a second similarity between the corresponding content information and a corresponding portion of the at least one target defect report by calculating a distance between text vectors; and/or, the second determining unit is specifically configured to determine, by means of keyword matching, a second similarity between the corresponding content information and a corresponding portion of the at least one target defect report when the content information is defect creator information or defect responsible person information.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to: perform the repeated defect report identification method according to any of claims 1 to 5.
10. A computer-readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the repetitive defect report identification method of any of claims 1-5.
CN201911341418.8A 2019-12-24 2019-12-24 Repeated defect report identification method and device and electronic equipment Pending CN111178037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911341418.8A CN111178037A (en) 2019-12-24 2019-12-24 Repeated defect report identification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911341418.8A CN111178037A (en) 2019-12-24 2019-12-24 Repeated defect report identification method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111178037A true CN111178037A (en) 2020-05-19

Family

ID=70655633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911341418.8A Pending CN111178037A (en) 2019-12-24 2019-12-24 Repeated defect report identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111178037A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970666A (en) * 2014-05-29 2014-08-06 重庆大学 Method for detecting repeated software defect reports
CN106156272A (en) * 2016-06-21 2016-11-23 北京工业大学 A kind of information retrieval method based on multi-source semantic analysis
CN106250311A (en) * 2016-07-27 2016-12-21 成都启力慧源科技有限公司 Repeated defects based on LDA model report detection method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
李楠; 王晓博; 刘超: "自动分析软件缺陷报告间相关性的方法研究", 《计算机应用研究》, vol. 27, no. 6, pages 2134 - 2139 *
范道远 等: "融合文本与分类信息的重复缺陷报告检测方法", 《计算机科学》 *
范道远 等: "融合文本与分类信息的重复缺陷报告检测方法", 《计算机科学》, 19 August 2019 (2019-08-19) *
高子欣; 赵逢禹; 刘亚: "基于缺陷报告分析的软件缺陷定位方法", 《软件》, vol. 40, no. 5, pages 8 - 15 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112286808A (en) * 2020-10-29 2021-01-29 北京字节跳动网络技术有限公司 Application program testing method and device, electronic equipment and medium
CN112286808B (en) * 2020-10-29 2023-08-11 抖音视界有限公司 Application program testing method and device, electronic equipment and medium
CN113238963A (en) * 2021-06-16 2021-08-10 中国农业银行股份有限公司 Test report generation method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200519
