CN111339272A - Code defect report retrieval method and device - Google Patents

Info

Publication number
CN111339272A
Authority
CN
China
Prior art keywords
code
retrieval
defect report
report
code defect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010108813.8A
Other languages
Chinese (zh)
Inventor
陈馨慧
李子强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern University of Science and Technology
Original Assignee
Southern University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern University of Science and Technology
Priority to CN202010108813.8A
Publication of CN111339272A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3692Test management for test results analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a code defect report retrieval method and device. The method comprises the following steps: acquiring a code to be retrieved; analyzing the code to be retrieved based on code keywords and corpus rules to obtain an item description tag; and reading a code defect report from a specified retrieval database based on the item description tag. The device comprises: a collection module for acquiring the code to be retrieved; an analysis module for analyzing the code to be retrieved based on the code keywords and the corpus rules to obtain the item description tag; and an extraction module for reading the code defect report from the specified retrieval database based on the item description tag. By identifying the code, the embodiment of the invention determines a suitable item description tag and uses it to read code defect reports from the specified retrieval database; compared with simple keyword retrieval, this gives better-targeted results and higher retrieval efficiency.

Description

Code defect report retrieval method and device
Technical Field
The invention relates to the technical field of data retrieval, in particular to a code defect report retrieval method and a code defect report retrieval device.
Background
With the development of open source communities and the popularization of smartphones, the demand for mobile application development keeps growing. Many automated test case generation techniques have been developed for finding crashes or defects in mobile applications.
Some application development engineers prefer to read test cases explained in natural language. Meanwhile, defect reports written in natural language contain redundant or similar content, which makes them reusable across projects in different programs.
Open source code hosting websites such as GitHub have advanced search functions; the defect reports generated during testing can be retrieved by setting several search elements. For a specific type of program, such as Android, however, the current GitHub retrieval mode cannot filter for repositories of a particular kind of application: only the development language can be added as a restricting filter condition, and the function or application scenario of a program cannot be searched, so the desired data is inconvenient to find.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention provides a code defect report retrieval device which can meet the positioning requirement under a complex safety environment.
The invention also provides a code defect report retrieval method.
In a first aspect, an embodiment of the present invention provides a code defect report retrieval method, including: acquiring a code to be retrieved; analyzing the code to be retrieved based on code keywords and corpus rules to obtain an item description tag; and reading a code defect report from a specified retrieval database based on the item description tag.
The code defect report retrieval method of the embodiment of the invention has at least the following beneficial effects: a suitable item description tag is determined by identifying the code and is then used to read code defect reports from the specified retrieval database; compared with simple keyword retrieval, this gives better-targeted results and higher retrieval efficiency.
According to another embodiment of the code defect report search method of the present invention, setting the specified search library includes: acquiring a defect report from an open source community; matching and marking a defect report according to the functional keywords and the corpus rules, wherein the functional keywords are associated with the code keywords; and collecting the defect reports of the finished marks to obtain a retrieval database. The defect report can be obtained in the largest range through the open source community, and the content of the defect report can be accurately positioned through the functional keywords and the corpus rules, so that the subsequent retrieval is facilitated.
According to another embodiment of the present invention, a code defect report retrieval method further includes: acquiring a retrieval keyword; correspondingly, based on the item description label and the search key, reading the corresponding code defect report from the specified search library. By adding the search key as the search parameter, the accuracy of the search can be further improved.
According to another embodiment of the code defect report search method of the present invention, the search database processes the corresponding code defect report according to a preset text similarity algorithm, and outputs the result according to the obtained similarity score. The text similarity serves as a recommendation basis, and a user can conveniently check the retrieval result.
According to another embodiment of the code defect report search method of the present invention, the text similarity calculation method specifically includes: calculating the similarity between different code defect reports through an overlap coefficient and/or an n-gram algorithm based on at least one of the sub-tags contained in the retrieval key words and/or the item description tags; wherein the sub-tags of the item description tag include: XML file name, view name and resource name; the sub-tags of the search key include: XML file name, view name, resource name, report length, report state, correlation mark and reply number; correspondingly, calculating to obtain the similarity score of the single code defect report according to the similarity of the sub-labels and a preset mathematical formula. Through a specific text similarity algorithm and a specific sub-label, the retrieval precision can be improved by combining specific actual operation.
In a second aspect, an embodiment of the present invention provides a code defect report retrieval apparatus, including: a collection module for acquiring a code to be retrieved; an analysis module for analyzing the code to be retrieved based on code keywords and corpus rules to obtain an item description tag; and an extraction module for reading a code defect report from a specified retrieval database based on the item description tag.
According to another embodiment of the present invention, a code defect report retrieval apparatus further includes: a search database setting module for: acquiring a defect report from an open source community; matching and marking a defect report according to the functional keywords and the corpus rules, wherein the functional keywords are associated with the code keywords; and collecting the defect reports of the finished marks to obtain a retrieval database.
In the code defect report retrieval apparatus according to another embodiment of the invention, the collection module is further used for acquiring a retrieval keyword; correspondingly, the extraction module reads the corresponding code defect report from the specified retrieval database based on the item description tag and the retrieval keyword.
According to another embodiment of the code defect report search device of the present invention, the search database processes the corresponding code defect report according to a preset text similarity algorithm, and outputs the corresponding code defect report according to the obtained similarity score.
According to another embodiment of the code defect report retrieving apparatus of the present invention, the text similarity calculation method specifically includes: calculating the similarity between different code defect reports through an overlap coefficient and/or an n-gram algorithm based on at least one of the sub-tags contained in the retrieval key words and/or the item description tags; wherein the sub-tags of the item description tag include: XML file name, view name and resource name; the sub-tags of the search key include: XML file name, view name, resource name, report length, report state, correlation mark and reply number; correspondingly, calculating to obtain the similarity score of the single code defect report according to the similarity of the sub-labels and a preset mathematical formula.
Drawings
FIG. 1 is a flowchart illustrating an embodiment of a code bug report retrieval method according to the present invention;
FIG. 2 is a schematic diagram of the connection of an embodiment of a code defect report retrieval apparatus according to the present invention.
FIG. 3 is a diagram illustrating an embodiment of a code bug report retrieval framework in accordance with the present invention.
Detailed Description
The concept and technical effects of the present invention will be clearly and completely described below in conjunction with the embodiments to fully understand the objects, features and effects of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and those skilled in the art can obtain other embodiments without inventive effort based on the embodiments of the present invention, and all embodiments are within the protection scope of the present invention.
In the description of the present invention, if an orientation description is referred to, for example, the orientations or positional relationships indicated by "upper", "lower", "front", "rear", "left", "right", etc. are based on the orientations or positional relationships shown in the drawings, only for convenience of describing the present invention and simplifying the description, but not for indicating or implying that the method or the element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. If a feature is referred to as being "disposed," "secured," "connected," or "mounted" to another feature, it can be directly disposed, secured, or connected to the other feature or indirectly disposed, secured, connected, or mounted to the other feature.
In the description of the embodiments of the present invention, "a number of" means one or more and "a plurality of" means two or more; "greater than", "less than" and "exceeding" are understood as excluding the stated number, while "above", "below" and "within" are understood as including it. References to "first" or "second" serve only to distinguish features and do not indicate or imply relative importance, the number of the indicated features, or their order.
Example 1.
Referring to fig. 1, a code defect report retrieval method in an embodiment of the present invention is shown, including:
S1, acquiring a code to be retrieved;
S2, analyzing the code to be retrieved based on code keywords and corpus rules to obtain an item description tag;
S3, reading a code defect report from the specified retrieval database based on the item description tag.
A specific device is provided to execute the method. The user inputs several sections of code, which are analyzed on the basis of code keywords and corpus rules. The code keywords are specific characters that vary with the programming language, and the corpus rules are ordering rules (comparable to subject-predicate-object order in natural language). The code to be retrieved is traversed using the specific characters and the corresponding ordering rules: when a statement contains the specific characters and their order conforms to an ordering rule, the statement is considered to match a certain pattern, and the item description tags represent the different patterns. The item description tag serves as the key information for retrieval and is used to read, from the specified retrieval database, the code defect reports that match it.
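The traversal described above can be illustrated with a minimal Python sketch. The rule table, the tag names and the helper functions are hypothetical: the patent does not list concrete code keywords or ordering rules, so the Android-style entries here are assumptions chosen only to show a statement being matched when its keywords appear in the required order.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternRule:
    tag: str         # item description tag emitted when the rule matches
    keywords: tuple  # code keywords that must appear in this order


# Hypothetical rules for Android/Java code (not taken from the patent).
RULES = (
    PatternRule("view_lookup", ("findViewById", "R.id.")),
    PatternRule("layout_inflate", ("setContentView", "R.layout.")),
    PatternRule("resource_read", ("getResources", "getString")),
)


def matches_in_order(statement: str, keywords: tuple) -> bool:
    """Return True if every keyword occurs in the statement in the given order."""
    position = 0
    for keyword in keywords:
        found = statement.find(keyword, position)
        if found < 0:
            return False
        position = found + len(keyword)
    return True


def item_description_tags(code: str) -> list:
    """Traverse the code statement by statement and collect the matching tags."""
    tags = []
    for statement in re.split(r"[;\n]", code):
        for rule in RULES:
            if matches_in_order(statement, rule.keywords) and rule.tag not in tags:
                tags.append(rule.tag)
    return tags


sample = """
setContentView(R.layout.activity_main);
Button b = (Button) findViewById(R.id.submit_button);
"""
print(item_description_tags(sample))
```

A statement that contains the keywords in the wrong order does not match, which is the role the ordering rule plays in the description.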
The code defect report retrieval method of the embodiment of the invention has at least the following beneficial effects: a suitable item description tag is determined by identifying the code and is then used to read code defect reports from the specified retrieval database; compared with simple keyword retrieval, this gives better-targeted results and higher retrieval efficiency.
According to another embodiment of the code defect report search method of the present invention, setting the specified search library includes: acquiring a defect report from an open source community; matching and marking a defect report according to the functional keywords and the corpus rules, wherein the functional keywords are associated with the code keywords; and collecting the defect reports of the finished marks to obtain a retrieval database.
First, a database for storing defect reports is established. Then all issue reports are obtained by a crawler from Android program projects in the open source community (other program types can also be used). The reports are then screened and preprocessed to obtain the corresponding metadata (for example the title, author, number of follow-up comments, labels, open/closed state, body text, and the SHA of the associated commit). Finally, after corpus processing, the reports are put into the database. In addition, for each Android program project in the database (that is, the set of files associated with the defect reports, such as source code files, which can also be obtained by the crawler), a description file is generated for retrieval and for matching against subsequently input keywords or item description tags.
Here, the metadata are the concrete data from which the sub-tags are later defined. The corpus processing further refines the metadata to reduce interference from data that resembles metadata but is not, that is, recognized data that occurs only by chance without forming a code statement that conforms to the corpus rules. The description file accompanies the corresponding Android program project and contains several sub-tags and a specific text description; it supports keyword-based matching and explains the content of the project in natural language. The functional keywords are words associated with the code keywords. The difference is that code keywords are specific characters that conform to the rules of a programming language, whereas functional keywords include both the code keywords and natural-language words that describe a function. For example, for the function "read", the corresponding code keywords are formal programming statements consisting of certain characters in a specified order, while the functional keyword is not limited to one programming language: it points to the concrete programming statements that implement the function and to the character sets corresponding to those statements. The open source community allows defect reports to be obtained over the widest range, and the functional keywords and corpus rules allow the content of the defect reports to be located accurately, which facilitates subsequent retrieval.
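As a rough illustration of the marking and collection steps, the Python sketch below tags already-downloaded reports and indexes them by tag. The `DefectReport` fields, the `FUNCTIONAL_KEYWORDS` table and both function names are assumptions made for this example; crawling is left out, and the corpus rule is simplified to case-insensitive substring matching.

```python
from dataclasses import dataclass, field


@dataclass
class DefectReport:
    title: str
    body: str
    state: str = "open"   # "open" or "closed"
    comments: int = 0     # number of follow-up comments
    tags: set = field(default_factory=set)


# Hypothetical table: tag -> functional keywords (code keywords plus
# natural-language words describing the same function).
FUNCTIONAL_KEYWORDS = {
    "view_lookup": ("findViewById", "find view", "lookup view"),
    "resource_read": ("getString", "read resource", "load string"),
}


def mark_report(report: DefectReport) -> DefectReport:
    """Attach every tag whose functional keywords occur in the report text."""
    text = f"{report.title}\n{report.body}".lower()
    for tag, keywords in FUNCTIONAL_KEYWORDS.items():
        if any(keyword.lower() in text for keyword in keywords):
            report.tags.add(tag)
    return report


def build_retrieval_database(reports: list) -> dict:
    """Index marked reports by tag; reports without any tag are left out."""
    index = {}
    for report in map(mark_report, reports):
        for tag in report.tags:
            index.setdefault(tag, []).append(report)
    return index


reports = [
    DefectReport("Crash in findViewById", "NullPointerException after rotation"),
    DefectReport("Typo in README", "spelling"),
]
database = build_retrieval_database(reports)
print(sorted(database))
```

A real pipeline would persist the index and the metadata in a database; the in-memory dictionary only shows the shape of the result.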
According to another embodiment of the present invention, a code defect report retrieval method further includes: acquiring a retrieval keyword; correspondingly, based on the item description label and the search key, reading the corresponding code defect report from the specified search library. By adding the search key as the search parameter, the accuracy of the search can be further improved.
The retrieval keywords include the functional keywords described above, character keywords, and other characters. The other characters are non-standard, that is fuzzy, characters that are treated as equivalent to functional keywords or character keywords. For example, for the functional keyword "get", the corresponding other words may be "crawl", "fid" or "whole"; for the character keyword "char", the corresponding other words may be "cher", "cehr" and the like. The aim is to further improve the success rate of retrieval, because people who describe defects in natural language may use personalized wording owing to their region, education level or verbal habits.
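One possible way to map such fuzzy input onto standard keywords is sketched below in Python. The vocabulary, the synonym table and the cutoff are assumptions for illustration; the patent does not say how equivalence is decided. The sketch uses the standard-library `difflib.get_close_matches` for misspellings and an explicit table for synonyms.

```python
import difflib

# Hypothetical canonical vocabulary and synonym table (illustrative only).
CANONICAL_KEYWORDS = ["get", "read", "char", "string"]
SYNONYMS = {"crawl": "get", "fetch": "get"}


def normalize_keyword(word: str, cutoff: float = 0.7) -> str:
    """Map a fuzzy keyword to a canonical one, or return it unchanged."""
    word = word.lower()
    if word in SYNONYMS:
        return SYNONYMS[word]
    close = difflib.get_close_matches(word, CANONICAL_KEYWORDS, n=1, cutoff=cutoff)
    return close[0] if close else word


for raw in ("cher", "cehr", "crawl", "zzz"):
    print(raw, "->", normalize_keyword(raw))
```

String similarity alone cannot relate words such as "crawl" and "get", which is why the sketch keeps a separate synonym table.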
According to another embodiment of the code defect report search method of the present invention, the search database processes the corresponding code defect report according to a preset text similarity algorithm, and outputs the result according to the obtained similarity score.
As open source development continues, similar code defect reports keep accumulating, and the number of code defect reports that satisfy a retrieval rule can be very large. A reasonable recommendation mode is then needed to offer suitable documents to the searcher; using text similarity as the basis for recommendation makes it convenient for the user to review the retrieval results. High-scoring documents are generally ranked first.
According to another embodiment of the code defect report search method of the present invention, the text similarity calculation method specifically includes: calculating the similarity between different code defect reports through an overlap coefficient and/or an n-gram algorithm based on at least one of the sub-tags contained in the retrieval key words and/or the item description tags; wherein the sub-tags of the item description tag include: XML file name, view name and resource name; the sub-tags of the search key include: XML file name, view name, resource name, report length, report state, correlation mark and reply number; correspondingly, calculating to obtain the similarity score of the single code defect report according to the similarity of the sub-labels and a preset mathematical formula. Through a specific text similarity algorithm and a specific sub-label, the retrieval precision can be improved by combining specific actual operation.
Processing for similarity of character strings in the defect report:
Overlap coefficient
$$\mathrm{overlap}(X, Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)}$$
where X and Y are the word sets obtained by segmenting the two character strings (generally, X is the search keyword and Y is a corpus entry in the database), $|X \cap Y|$ is the number of words common to the two sets, and $\min(|X|, |Y|)$ is the size of the smaller of the two sets. Dividing the former by the latter gives a score between 0 and 1, which is the overlap coefficient.
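The overlap coefficient defined above can be computed directly on token sets. The following Python sketch assumes the strings have already been segmented into words; the function name and the empty-set convention (returning 0.0) are choices made for this example.

```python
def overlap_coefficient(x_tokens, y_tokens) -> float:
    """|X ∩ Y| / min(|X|, |Y|) on the sets of tokens; 0.0 if either set is empty."""
    x, y = set(x_tokens), set(y_tokens)
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


query = ["login", "button", "crash"]
report = ["app", "crash", "on", "login"]
print(overlap_coefficient(query, report))  # two shared words out of min(3, 4)
```

Because the denominator is the smaller set, a short query that is fully contained in a long report scores 1.0.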
Processing for similarity of character strings in the defect report:
N-gram
The N-gram is a concept from computational linguistics and probability theory: a contiguous sequence of N items taken from a given piece of text or speech, whose statistics capture the association between items. The items are typically drawn from a text or corpus. The N-gram model rests on one assumption: the occurrence of the n-th word depends only on the preceding n-1 words and on no other words (the same kind of Markov assumption used in hidden Markov models). The probability of an entire sentence is then the product of the conditional probabilities of its words, each of which can be estimated by counting occurrences in the corpus.
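Written out, the N-gram assumption gives the standard factorization below; the symbols $w_i$ (the i-th word of a sentence of m words) and $\mathrm{count}(\cdot)$ (corpus frequency) are notation introduced here and do not appear in the patent.

```latex
P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P\left(w_i \mid w_{i-n+1}, \dots, w_{i-1}\right),
\qquad
P\left(w_i \mid w_{i-n+1}, \dots, w_{i-1}\right) \approx
\frac{\mathrm{count}(w_{i-n+1}, \dots, w_{i})}{\mathrm{count}(w_{i-n+1}, \dots, w_{i-1})}
```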
In this example n takes 3. Sentence (string) similarity defined based on an N-Gram model (i.e., algorithm) is a fuzzy matching approach, and similarity is measured by the "difference" between two well-defined sentences. The calculation of the N-Gram similarity refers to that the original sentence is segmented according to the length N to obtain word segments, namely all the substrings with the length of N in the original sentence. For two sentences S and T, the similarity of the two sentences can be defined from the number of common substrings. For example:
the degree of similarity between [ "hello", "world" ], [ 'world', 'hello' ] is 0.3;
the degree of similarity between [ "hello", "world" ], [ 'hello', 'the', 'world' ] is 0.667.
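A minimal Python sketch of substring-based N-gram similarity with n = 3 follows. The patent does not state how the count of common substrings is normalized, so the Dice-style normalization used here is an assumption, and the sketch is not intended to reproduce the example values quoted above.

```python
def char_ngrams(text: str, n: int = 3) -> list:
    """All contiguous substrings of length n in the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_similarity(s: str, t: str, n: int = 3) -> float:
    """Share of common n-grams: 2 * |G(s) ∩ G(t)| / (|G(s)| + |G(t)|)."""
    grams_s, grams_t = set(char_ngrams(s, n)), set(char_ngrams(t, n))
    if not grams_s or not grams_t:
        return 0.0  # strings shorter than n have no n-grams
    return 2 * len(grams_s & grams_t) / (len(grams_s) + len(grams_t))


print(ngram_similarity("hello", "help"))
```

Because the comparison works on substrings rather than whole words, small misspellings still leave most n-grams in common, which is what makes the approach a fuzzy match.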
Processing for overall similarity of documents:
Overall similarity, a weighted sum of the form
$$\mathrm{sim}(C_1, C_2) = \sum_{i} w_i \, h(c_{1,i}, c_{2,i}), \qquad c_{1,i} \in C_1,\ c_{2,i} \in C_2$$
where the overall similarity is formed by weighting the similarities of the different sub-descriptions (sub-tags), h is the function that calculates similarity (the overlap coefficient function or the n-gram function), $c_1$ and $c_2$ are sub-descriptions, $C_1$ and $C_2$ are the sets of sub-descriptions, and w is the weight. This function is used to compare the similarity between the defect reports of two applications. $C_1$ or $C_2$ specifically includes: XML file name, view name, resource name, report length, report state, association tag, and number of replies.
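The weighting of sub-description similarities can be sketched in Python as below. The exact formula is not recoverable from this text, so the weighted sum, the normalization by total weight, the weight values and the token-overlap function standing in for h are all assumptions made for illustration.

```python
def token_overlap(a: str, b: str) -> float:
    """Stand-in for h: overlap coefficient on underscore-separated tokens."""
    x, y = set(a.lower().split("_")), set(b.lower().split("_"))
    x.discard("")
    y.discard("")
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


def overall_similarity(desc1: dict, desc2: dict, weights: dict, h=token_overlap) -> float:
    """Weighted average of h over the sub-descriptions named in `weights`."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(
        w * h(desc1.get(name, ""), desc2.get(name, "")) for name, w in weights.items()
    )
    return weighted / total_weight


# Hypothetical weights and sub-descriptions for two Android projects.
weights = {"xml_file_name": 0.5, "view_name": 0.3, "resource_name": 0.2}
app_a = {"xml_file_name": "activity_main", "view_name": "submit_button", "resource_name": "btn_submit"}
app_b = {"xml_file_name": "activity_main", "view_name": "cancel_button", "resource_name": "btn_cancel"}
print(overall_similarity(app_a, app_b, weights))
```

Passing h as a parameter mirrors the description, where either the overlap coefficient or the n-gram function may serve as the similarity function.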
The XML file name, the view name, and the resource name are specifically used to describe 3 levels of the user interface component of the android application from large to small (of course, in the case of other programming languages, other sub-tags may be used). After the three sub-labels are combined, the combined description key words can be used for comprehensively describing the similarity between the android interface components, and therefore the similarity of the content of the defect report can be conveniently determined.
Additional sub-tags: report length, where a more detailed report is generally considered better; report state, either processed (closed) or unprocessed (open), where a processed report is generally considered more helpful; association tag, which is present when the defect report is linked to a defect solution (a bug fix), and a linked report is generally considered more helpful; and number of replies, since the number of follow-up replies to a defect report also indicates how detailed the report is.
The specific scoring principle or process includes:
the preset mathematical formula is: Score($\vec{c}$, e) [formula figure not reproduced in this text],
where $\vec{c}$ is the keyword vector (the vector form of keyword c, meaning that it is composed of several sub-descriptions), e is a defect report, W is a weight, and $W_i$ is the weight of the particular keyword i.
The score is then based on the overall similarity of the item descriptions (the overall similarity above), the report length, the report state, whether the report is linked to a fixed version (the association with a bug fix above), the number of follow-up replies, and the degree of matching between the joint description keyword and the report, which covers three aspects:
full coverage matching (all sub-descriptions of the joint description keyword appear; a complete match earns an extra weighted score); overlap matching (the degree of matching between the joint description keyword and the report is calculated with the overlap coefficient); and hot word matching (extra points are added when certain empirical hot words occur, such as "defect" or "recurrence", because a report containing these words is relatively more helpful).
These features are weighted to obtain the association score between the keyword vector $\vec{c}$ and the defect report e; finally, the results are sorted and output in order of relevance from high to low.
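The multi-feature weighting and final sorting can be sketched in Python as follows. The `Report` fields, every value in `WEIGHTS`, and the caps used for length and replies are hypothetical; the patent's actual formula and weights are not given in this text. The item-description similarity term is omitted for brevity, and reports are tokenized by whitespace.

```python
from dataclasses import dataclass


@dataclass
class Report:
    text: str
    closed: bool      # report state: processed (closed) or not
    linked_fix: bool  # association tag: linked to a bug fix
    replies: int      # number of follow-up replies


HOT_WORDS = {"defect", "recurrence"}  # the two hot words named in the description

# Hypothetical weights, one per feature.
WEIGHTS = {
    "overlap": 0.4, "full_coverage": 0.2, "hot_word": 0.1,
    "closed": 0.1, "linked_fix": 0.1, "length": 0.05, "replies": 0.05,
}


def features(keywords, report: Report) -> dict:
    """Compute each feature as a value in [0, 1]."""
    tokens = set(report.text.lower().split())
    wanted = {k.lower() for k in keywords}
    common = wanted & tokens
    return {
        "overlap": len(common) / min(len(wanted), len(tokens)) if wanted and tokens else 0.0,
        "full_coverage": 1.0 if wanted and common == wanted else 0.0,
        "hot_word": 1.0 if tokens & HOT_WORDS else 0.0,
        "closed": 1.0 if report.closed else 0.0,
        "linked_fix": 1.0 if report.linked_fix else 0.0,
        "length": min(len(tokens) / 100, 1.0),   # distinct tokens, capped at 100
        "replies": min(report.replies / 10, 1.0),  # capped at 10 replies
    }


def score(keywords, report: Report) -> float:
    """Weighted sum of the features."""
    return sum(WEIGHTS[name] * value for name, value in features(keywords, report).items())


def rank(keywords, reports: list) -> list:
    """Sort reports from high to low relevance."""
    return sorted(reports, key=lambda r: score(keywords, r), reverse=True)


keywords = ["submit_button", "crash"]
r1 = Report("crash defect when tapping submit_button", closed=True, linked_fix=True, replies=5)
r2 = Report("layout glitch near submit_button", closed=False, linked_fix=False, replies=0)
print([r.text for r in rank(keywords, [r2, r1])])
```

Keeping each feature in the range 0 to 1 makes the weights directly comparable, which is a design choice of this sketch rather than something stated in the source.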
Example 2.
An embodiment of the present invention provides a code defect report retrieval apparatus as shown in fig. 2, including:
the collection module 1 is used for acquiring a code to be retrieved; the analysis module 2 is used for analyzing the code to be retrieved based on code keywords and corpus rules to obtain an item description tag; and the extraction module 3 is used for reading the code defect report from the specified retrieval database based on the item description tag.
According to another embodiment of the present invention, a code defect report retrieval apparatus further includes: a search database setting module for: acquiring a defect report from an open source community; matching and marking a defect report according to the functional keywords and the corpus rules, wherein the functional keywords are associated with the code keywords; and collecting the defect reports of the finished marks to obtain a retrieval database.
In the code defect report retrieval apparatus according to another embodiment of the invention, the collection module is further used for acquiring a retrieval keyword; correspondingly, the extraction module reads the corresponding code defect report from the specified retrieval database based on the item description tag and the retrieval keyword.
According to another embodiment of the code defect report search device of the present invention, the search database processes the corresponding code defect report according to a preset text similarity algorithm, and outputs the corresponding code defect report according to the obtained similarity score.
According to another embodiment of the code defect report retrieving apparatus of the present invention, the text similarity calculation method specifically includes: calculating the similarity between different code defect reports through an overlap coefficient and/or an n-gram algorithm based on at least one of the sub-tags contained in the retrieval key words and/or the item description tags; wherein the sub-tags of the item description tag include: XML file name, view name and resource name; the sub-tags of the search key include: XML file name, view name, resource name, report length, report state, correlation mark and reply number; correspondingly, calculating to obtain the similarity score of the single code defect report according to the similarity of the sub-labels and a preset mathematical formula.
Example 3.
Referring to fig. 3, an embodiment of the present invention provides a code defect report retrieval framework, including:
a build system part and a system retrieval part, wherein,
the construction system comprises:
defect reports are acquired from the open source community GitHub, and a defect report database is obtained through metadata preprocessing and corpus processing. At the same time, a description file database is constructed. Its purpose is to integrate the code-related elements and to locate the corresponding documents in the defect report database; it can be understood simply as a collection of search keys formed from various combinations of keywords.
The system retrieval comprises:
retrieval is performed mainly in two ways. The first input mode is input ①, a keyword request: documents containing the input keywords are read from the defect report database;
the second input mode is input ②, a project fuzzy request: a code to be retrieved is input and then subjected to element analysis (code keywords and corpus) to obtain the corresponding keyword combination (that is, the item description file); the corresponding common description file is read from the description file database according to this combination, and documents containing the keyword combination are read from the defect report database through the common description file.
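The two retrieval paths can be sketched in Python with two small in-memory stores. The store contents, the project names, and the regular expression used for element analysis (it extracts identifiers after `R.layout.` and `R.id.`) are assumptions for illustration; a real system would query the defect report database and the description file database instead.

```python
import re

# Hypothetical defect report database: report id -> text.
DEFECT_REPORTS = {
    1: "Crash when tapping submit_button in activity_main",
    2: "Text overflows in activity_settings on small screens",
    3: "submit_button stays disabled in activity_main after rotation",
}

# Hypothetical description file database: project -> keywords and its reports.
DESCRIPTION_FILES = {
    "shop-app": {"keywords": {"activity_main", "submit_button"}, "report_ids": [1, 3]},
    "notes-app": {"keywords": {"activity_settings"}, "report_ids": [2]},
}


def keyword_request(keywords) -> list:
    """Input 1: return ids of reports whose text contains every keyword."""
    wanted = [k.lower() for k in keywords]
    return [rid for rid, text in DEFECT_REPORTS.items()
            if all(k in text.lower() for k in wanted)]


def analyze_code(code: str) -> set:
    """Element analysis (simplified): layout and view names referenced in the code."""
    return set(re.findall(r"R\.(?:layout|id)\.(\w+)", code))


def project_fuzzy_request(code: str) -> list:
    """Input 2: code -> keyword combination -> description files -> reports."""
    combination = analyze_code(code)
    hits = []
    for description in DESCRIPTION_FILES.values():
        shared = combination & description["keywords"]
        if not shared:
            continue
        for rid in description["report_ids"]:
            text = DEFECT_REPORTS[rid].lower()
            if all(k.lower() in text for k in shared) and rid not in hits:
                hits.append(rid)
    return hits


code = "setContentView(R.layout.activity_main); findViewById(R.id.submit_button);"
print(keyword_request(["submit_button"]))
print(project_fuzzy_request(code))
```

The second path needs no keywords from the user: the description files act as the bridge between the analyzed code and the stored reports.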
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention. Furthermore, the embodiments of the present invention and the features of the embodiments may be combined with each other without conflict.

Claims (10)

1. A code defect report retrieval method, comprising:
acquiring a code to be retrieved;
analyzing the code to be retrieved based on code keywords and a corpus rule to obtain an item description tag;
reading a code defect report from a specified retrieval database based on the item description tag.
2. The code defect report retrieval method of claim 1, wherein setting up the specified retrieval database comprises:
acquiring defect reports from an open source community;
matching and marking the defect reports according to functional keywords and a corpus rule, wherein the functional keywords are associated with the code keywords;
and collecting the marked defect reports to obtain the retrieval database.
3. The code defect report retrieval method of claim 2, further comprising:
acquiring a retrieval keyword; correspondingly,
reading a corresponding code defect report from the specified retrieval database based on the item description tag and the retrieval keyword.
4. The code defect report retrieval method of claim 3, wherein the retrieval database processes the corresponding code defect report according to a preset text similarity algorithm and outputs the result according to the obtained similarity score.
5. The code defect report retrieval method of claim 4, wherein the text similarity algorithm specifically comprises:
calculating the similarity between different code defect reports by means of an overlap coefficient and/or an n-gram algorithm, based on at least one of the sub-tags contained in the retrieval keyword and/or the item description tag; wherein
the sub-tags of the item description tag include: XML file name, view name and resource name;
the sub-tags of the retrieval keyword include: XML file name, view name, resource name, report length, report state, correlation mark and number of replies; correspondingly,
the similarity score of a single code defect report is calculated from the similarities of the sub-tags according to a preset mathematical formula.
6. A code defect report retrieval apparatus, comprising:
the collection module is used for acquiring a code to be retrieved;
the analysis module is used for analyzing the code to be retrieved based on code keywords and a corpus rule to obtain an item description tag;
and the extraction module is used for reading a code defect report from a specified retrieval database based on the item description tag.
7. The code defect report retrieval device of claim 6, further comprising a retrieval database setting module for:
acquiring defect reports from an open source community;
matching and marking the defect reports according to functional keywords and a corpus rule, wherein the functional keywords are associated with the code keywords;
and collecting the marked defect reports to obtain the retrieval database.
8. The code defect report retrieval device of claim 6, wherein the collection module is further configured to acquire a retrieval keyword; correspondingly,
the extraction module reads a corresponding code defect report from the specified retrieval database based on the item description tag and the retrieval keyword.
9. The code defect report retrieval device of claim 8, wherein the retrieval database processes the corresponding code defect report according to a preset text similarity algorithm and outputs the result according to the obtained similarity score.
10. The code defect report retrieval device of claim 9, wherein the text similarity algorithm specifically comprises:
calculating the similarity between different code defect reports by means of an overlap coefficient and/or an n-gram algorithm, based on at least one of the sub-tags contained in the retrieval keyword and/or the item description tag; wherein
the sub-tags of the item description tag include: XML file name, view name and resource name;
the sub-tags of the retrieval keyword include: XML file name, view name, resource name, report length, report state, correlation mark and number of replies; correspondingly,
the similarity score of a single code defect report is calculated from the similarities of the sub-tags according to a preset mathematical formula.
CN202010108813.8A 2020-02-21 2020-02-21 Code defect report retrieval method and device Pending CN111339272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010108813.8A CN111339272A (en) 2020-02-21 2020-02-21 Code defect report retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010108813.8A CN111339272A (en) 2020-02-21 2020-02-21 Code defect report retrieval method and device

Publications (1)

Publication Number Publication Date
CN111339272A CN111339272A (en) 2020-06-26

Family

ID=71181848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010108813.8A Pending CN111339272A (en) 2020-02-21 2020-02-21 Code defect report retrieval method and device

Country Status (1)

Country Link
CN (1) CN111339272A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114462399A (en) * 2020-11-09 2022-05-10 中核核电运行管理有限公司 Accurate matching method for quality defect report and state report of nuclear power plant
CN114692609A (en) * 2022-04-01 2022-07-01 南京优速网络科技有限公司 Method for implementing Chinese text error correction based on similarity

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107608732A (en) * 2017-09-13 2018-01-19 扬州大学 A kind of bug search localization methods based on bug knowledge mappings

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANH TUAN NGUYEN et al.: "Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering" *
PÉTER GYIMESI et al.: "Characterization of Source Code Defects by Data Mining Conducted on GitHub" *

Similar Documents

Publication Publication Date Title
CN110502621B (en) Question answering method, question answering device, computer equipment and storage medium
CN110399457B (en) Intelligent question answering method and system
CN110968699B (en) Logic map construction and early warning method and device based on fact recommendation
AU2019263758B2 (en) Systems and methods for generating a contextually and conversationally correct response to a query
US20200327172A1 (en) System and method for processing contract documents
JP5356197B2 (en) Word semantic relation extraction device
CN102253930B (en) A kind of method of text translation and device
CN113158653B (en) Training method, application method, device and equipment for pre-training language model
CN111814482B (en) Text key data extraction method and system and computer equipment
CN114036930A (en) Text error correction method, device, equipment and computer readable medium
CN108804592A (en) Knowledge library searching implementation method
CN114722137A (en) Security policy configuration method and device based on sensitive data identification and electronic equipment
CN115292450A (en) Data classification field knowledge base construction method based on information extraction
CN112149387A (en) Visualization method and device for financial data, computer equipment and storage medium
Ko et al. Natural language processing–driven model to extract contract change reasons and altered work items for advanced retrieval of change orders
Cheng et al. A similarity integration method based information retrieval and word embedding in bug localization
CN112380848A (en) Text generation method, device, equipment and storage medium
CN111339272A (en) Code defect report retrieval method and device
US9305103B2 (en) Method or system for semantic categorization
Souza et al. ARCTIC: metadata extraction from scientific papers in pdf using two-layer CRF
Lazemi et al. Persian plagirisim detection using CNN s
CN110688453B (en) Scene application method, system, medium and equipment based on information classification
CN106776590A (en) A kind of method and system for obtaining entry translation
CN112269852A (en) Method, system and storage medium for generating public opinion topic
CN116521133B (en) Software function safety requirement analysis method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200626