CN110442514B - Method for realizing defect repair recommendation based on learning algorithm - Google Patents
- Publication number
- CN110442514B
- Authority
- CN
- China
- Prior art keywords
- repair
- defect
- ast
- bug
- editing operation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a method for realizing defect repair recommendation based on a learning algorithm, comprising the following steps: extracting abstract syntax trees (ASTs) with GumTree from collected source code before and after bug repair to obtain the AST editing operation sequences of the code before and after bug repair; screening and filtering the AST editing operation sequences; abstracting the source code before and after bug repair with a parser, in combination with the screened and filtered AST editing operation sequences, and mapping each into a vector feature representation; training a neural network on the mapped vector feature representations to obtain a defect statement recognition model, and thereby identifying defect statements; and recommending a repair scheme for each identified defect statement based on the semantic features of the source code. Because the model's feature representation is derived from the AST editing operations between code versions through fine-grained code analysis, and defect code is located using its context, the method yields fine-grained repair recommendations and makes repairs more accurate.
Description
Technical Field
The invention belongs to the field of software maintenance, and particularly relates to a method for realizing defect repair recommendation based on a learning algorithm.
Background
Repairing software defects is a time-consuming task. As software products grow in size and complexity, defects become unavoidable. They may arise from misunderstood requirements, an unreasonable development process, or developers' lack of experience. When a developer faces a large number of defects, recommending a repair scheme based on the defective code can greatly improve the efficiency of defect repair.
Automatic program repair is a research hotspot in the field of software maintenance and has been studied intensively by researchers in China and abroad. Existing methods can be broadly divided into test-case-based automatic program repair and other types of automatic program repair. The latter evaluate the correctness of candidate patches by means of contracts based on pre-conditions and post-conditions, defect reports, and the like. When developers do not have enough time to repair all defects manually, an automatic program repair method can generate temporary patches for some defective programs, and developers can then refer to these patches and improve their quality manually. Current automatic repair methods are nevertheless limited: they place high demands on the developer's expertise and consume a great deal of time in confirming an acceptable repair or transformation. They can also have difficulty generating patches that programmers accept. For example, the article [Qi, Z., Long, F., Achour, S., and Rinard, M. An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. ISSTA '15] reports that most patches generated by deleting functional blocks, or by techniques that overfit the test cases to effect a repair, are incorrect.
Deep learning is now widely applied to defect localization, defect prediction, and defect repair. Deep-learning-based automatic defect repair commonly generates patches by learning code transformations or by learning from commit history. Although such methods can generate sufficiently correct code transformations or patches without manual selection, their repair accuracy is only about 45%, which is still relatively low, and they have difficulty repairing large volumes of defect data. Automatic repair of real defective code through bug-fixing code transformations therefore still requires further investigation.
Disclosure of Invention
The invention aims to provide a defect repair recommendation method that offers developers a defect repair scheme and improves the efficiency and quality of their defect repair.
The technical solution for realizing the purpose of the invention is as follows: a method for realizing defect repair recommendation based on a learning algorithm, comprising the following steps:
step 1, extracting abstract syntax trees (ASTs) with GumTree from the collected source code before bug repair and after bug repair, respectively, to obtain the AST editing operation sequences of the code before and after bug repair;
step 2, screening and filtering the AST editing operation sequences;
step 3, in combination with the screened and filtered AST editing operation sequences, abstracting the source code before bug repair and after bug repair with a parser and mapping each into a vector feature representation;
step 4, training a neural network on the mapped vector feature representations to obtain a defect statement recognition model, and thereby identifying defect statements;
step 5, recommending a repair scheme for the defect statements identified in step 4 based on the semantic features of the source code.
Compared with the prior art, the invention has the following notable advantages: 1) the AST editing operation sequences of the code are extracted with GumTree, which provides an accurate data source for the model and ensures that every training sample in the invention is valid; 2) an RNN encoder-decoder model is trained jointly: the input sequence is converted into a vector representation and then decoded into an output sequence, and the whole learning process is trained end to end, so that the model can simulate various AST operations and generate candidate patches, giving it a wide range of application; 3) when a repair scheme is recommended, the defect code is located from statement semantics in combination with its context, so that a fine-grained repair recommendation is obtained and the repair is more accurate; 4) working from the AST perspective makes the granularity of the recommended repair finer.
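The encoder-decoder idea in advantage 2) can be illustrated with a minimal sketch of the forward pass only. This is not the patent's model: the class name `TinyEncoderDecoder`, the vocabulary size, the hidden size, and the random untrained weights are all assumptions made for illustration, and training is omitted entirely. The sketch only shows how a token-ID sequence is folded into one hidden vector and then unrolled into an output sequence.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_step(w_in, w_hid, x, h):
    """One Elman RNN step: h' = tanh(W_in x + W_hid h)."""
    a = matvec(w_in, x)
    b = matvec(w_hid, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

class TinyEncoderDecoder:
    """Hypothetical, untrained RNN encoder-decoder over integer token IDs."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.enc_in = make_matrix(hidden_size, vocab_size, rng)
        self.enc_hid = make_matrix(hidden_size, hidden_size, rng)
        self.dec_in = make_matrix(hidden_size, vocab_size, rng)
        self.dec_hid = make_matrix(hidden_size, hidden_size, rng)
        self.out = make_matrix(vocab_size, hidden_size, rng)

    def encode(self, token_ids):
        """Fold the input sequence into a single hidden vector."""
        h = [0.0] * self.hidden_size
        for t in token_ids:
            h = rnn_step(self.enc_in, self.enc_hid, one_hot(t, self.vocab_size), h)
        return h

    def decode(self, h, start_id, max_len):
        """Greedy decoding: feed each predicted token back in as the next input."""
        out_ids, prev = [], start_id
        for _ in range(max_len):
            h = rnn_step(self.dec_in, self.dec_hid, one_hot(prev, self.vocab_size), h)
            scores = matvec(self.out, h)
            prev = max(range(self.vocab_size), key=scores.__getitem__)
            out_ids.append(prev)
        return out_ids

model = TinyEncoderDecoder(vocab_size=6, hidden_size=4)
context = model.encode([1, 3, 2, 5])      # abstracted buggy-code token IDs (made up)
decoded = model.decode(context, start_id=0, max_len=3)
print(context, decoded)
```

With untrained weights the decoded IDs are meaningless; in an end-to-end setup the same two passes would be driven by a loss over pairs of pre-repair and post-repair sequences.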
The invention is described in further detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flowchart of a method for implementing defect repair recommendation based on a learning algorithm.
FIG. 2 is a schematic diagram of the AST editing operation sequence of the pre-repair code according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the AST editing operation sequence of the repaired code according to an embodiment of the present invention.
Detailed Description
Referring to FIG. 1, the method for realizing defect repair recommendation based on a learning algorithm of the invention comprises the following steps:
step 1, extracting abstract syntax trees (ASTs) with GumTree from the collected source code before bug repair and after bug repair, respectively, to obtain the AST editing operation sequences of the code before and after bug repair;
step 2, screening and filtering the AST editing operation sequences;
step 3, in combination with the screened and filtered AST editing operation sequences, abstracting the source code before bug repair and after bug repair with a parser and mapping each into a vector feature representation;
step 4, training a neural network on the mapped vector feature representations to obtain a defect statement recognition model, and thereby identifying defect statements;
step 5, recommending a repair scheme for the defect statements identified in step 4 based on the semantic features of the source code.
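To make the notion of an AST editing operation sequence in step 1 concrete, the following is a rough stand-in, not GumTree itself. It assumes Python source instead of Java, flattens each AST into a list of node-type names with the standard `ast` module, and diffs the two lists with `difflib.SequenceMatcher`. GumTree performs genuine tree differencing, so this sketch only approximates the kind of output involved; the function names `node_types` and `edit_operations` are hypothetical.

```python
import ast
import difflib

def node_types(source):
    """Flatten a program's AST into a list of node-type names."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

def edit_operations(buggy_src, fixed_src):
    """Return (operation, removed_nodes, added_nodes) triples between two versions."""
    before, after = node_types(buggy_src), node_types(fixed_src)
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged region, not part of the edit sequence
        ops.append((tag, before[i1:i2], after[j1:j2]))
    return ops

# A made-up buggy/fixed pair: the fix adds a None check.
buggy = "def f(x):\n    return x.value\n"
fixed = "def f(x):\n    if x is None:\n        return None\n    return x.value\n"

for op in edit_operations(buggy, fixed):
    print(op)
```

Each triple names the operation (`insert`, `delete`, or `replace`) and the node types involved, which is the kind of sequence that steps 2 and 3 then filter and combine with the abstracted code.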
Further preferably, in step 1, the ASTs are extracted from the source code, and the node types of the extracted ASTs include:
(1) method call and class instance creation nodes;
(2) method declaration, type declaration, and enumeration declaration nodes;
(3) control flow nodes, including while statements, catch statements, if statements, and throw statements.
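The node-type selection above can be sketched as follows, under the assumption that Python's `ast` node classes stand in for the Java node types named in the text. The mapping is approximate: Python uses `Call` for both method calls and instance creation, has no enumeration declaration node, and uses `Raise` and `ExceptHandler` where Java has throw and catch. The `SELECTED` table and `select_nodes` are hypothetical names.

```python
import ast

# Approximate Python analogs of the three node groups listed in the text.
SELECTED = {
    "call": (ast.Call,),
    "declaration": (ast.FunctionDef, ast.ClassDef),
    "control_flow": (ast.While, ast.If, ast.Raise, ast.ExceptHandler),
}

def select_nodes(source):
    """Return (group, node_type, line) for every AST node of a selected type."""
    picked = []
    for node in ast.walk(ast.parse(source)):
        for group, types in SELECTED.items():
            if isinstance(node, types):
                picked.append((group, type(node).__name__, node.lineno))
    return sorted(picked, key=lambda item: item[2])

sample = (
    "class Account:\n"
    "    def withdraw(self, amount):\n"
    "        if amount > self.balance:\n"
    "            raise ValueError('insufficient funds')\n"
    "        self.balance -= amount\n"
)

for entry in select_nodes(sample):
    print(entry)
```

Restricting extraction to these groups keeps the editing operation sequences focused on calls, declarations, and control flow rather than on every expression node.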
Further preferably, the screening and filtering in step 2 is specifically: filtering out AST editing operation sequences that contain syntax errors and AST editing operation sequences whose occurrence frequency is lower than a set threshold.
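A minimal sketch of this filtering step, assuming each editing operation sequence is a list of strings and that syntax-error detection is supplied as a predicate. The names `filter_sequences`, `min_count`, and `is_valid`, the `"ERROR"` marker, and the sample data are all hypothetical.

```python
from collections import Counter

def filter_sequences(sequences, min_count, is_valid=lambda seq: True):
    """Keep sequences that pass is_valid and occur at least min_count times."""
    valid = [tuple(seq) for seq in sequences if is_valid(seq)]
    counts = Counter(valid)
    return [seq for seq in valid if counts[seq] >= min_count]

# Made-up sequences; "ERROR" marks a sequence from code that failed to parse.
sequences = [
    ["insert:If", "insert:Return"],
    ["insert:If", "insert:Return"],
    ["insert:If", "insert:Return"],
    ["update:Call"],
    ["ERROR", "insert:If"],
]

kept = filter_sequences(sequences, min_count=3, is_valid=lambda s: "ERROR" not in s)
print(kept)
```

Here the rare `update:Call` sequence and the sequence marked as a syntax error are dropped, and the three occurrences of the frequent pattern are retained.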
Further, in step 3, the source code before bug repair and after bug repair is abstracted with a parser and mapped into vector feature representations, specifically:
step 3-1, generating a token stream from the source code with a lexer;
step 3-2, feeding the token stream to the parser, generating a unique ID for each identifier/literal in the source code, and performing the mapping.
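Steps 3-1 and 3-2 can be sketched with a small regular-expression lexer for Java-like code. This is a simplification under stated assumptions: a real implementation would use a full Java lexer and parser, the keyword set below is deliberately incomplete, and the `ID_n` / `LIT_n` naming scheme is an assumption, since the text only says that each identifier/literal receives a unique ID.

```python
import re

# Deliberately incomplete keyword set, for illustration only.
JAVA_KEYWORDS = {
    "if", "else", "while", "for", "return", "new", "null", "try", "catch",
    "throw", "int", "boolean", "void", "class", "public", "private", "static",
}

# String literals, numbers, identifiers/keywords, a few two-character
# operators, then any other single non-space character.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"|\d+(?:\.\d+)?|[A-Za-z_]\w*|==|!=|<=|>=|&&|\|\||\S'
)

def lex(source):
    """Step 3-1: turn source text into a token stream."""
    return TOKEN_RE.findall(source)

def abstract(tokens):
    """Step 3-2: replace each identifier/literal with a stable unique ID."""
    mapping, counters, out = {}, {"ID": 0, "LIT": 0}, []
    for tok in tokens:
        first = tok[0]
        is_name = first.isalpha() or first == "_"
        is_literal = first.isdigit() or first == '"'
        if tok in JAVA_KEYWORDS or not (is_name or is_literal):
            out.append(tok)  # keywords, operators, and punctuation are kept
            continue
        if tok not in mapping:
            kind = "ID" if is_name else "LIT"
            counters[kind] += 1
            mapping[tok] = f"{kind}_{counters[kind]}"
        out.append(mapping[tok])
    return out, mapping

tokens = lex("if (count > 10) return count + limit;")
abstracted, mapping = abstract(tokens)
print(" ".join(abstracted))
print(mapping)
```

Because the same identifier always maps to the same ID within a snippet, the abstracted form keeps the code's structure while shrinking the vocabulary that the neural network has to learn.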
Further, in step 5, a repair scheme is recommended for the defect statement based on the semantic features of the code, specifically:
according to the context of the defect statement in the source code, and in combination with the fine-grained repair modes of Table 1, a corresponding repair scheme is recommended: repair the problem of X in the defect statement, or repair the problem of Y in the defect statement, where X and Y are each any one of the numbered repair modes in Table 1;
TABLE 1 Repair modes for bugs
[table not reproduced in this text]
The present invention will be described in further detail with reference to examples.
Examples
Referring to FIG. 1, the method for realizing defect repair recommendation based on a learning algorithm of the invention comprises the following steps:
1. Code before bug repair (buggy files) and after bug repair (fixed files) is collected from GitHub, and abstract syntax trees (ASTs) are extracted from each with GumTree to obtain the AST editing operation sequences of the code before and after bug repair. The AST editing operation sequence of the pre-repair code extracted in this embodiment is shown in FIG. 2, and that of the repaired code is shown in FIG. 3.
2. The AST editing operation sequences are screened and filtered: sequences that contain syntax errors and sequences whose occurrence frequency is lower than a set threshold are filtered out (in this embodiment, the retained sequences occur more than 3 times).
3. In combination with the screened and filtered AST editing operation sequences, the source code before and after bug repair is abstracted with a Java parser and mapped into vector feature representations, as shown in Table 2 below:
TABLE 2 Code mapping results before and after repair
4. A neural network is trained on the mapped vector feature representations to obtain a defect statement recognition model, which identifies defect statements as shown in Table 3 below:
TABLE 3 Input/output examples of the defect statement recognition model
5. Based on the semantic features of the source code, a repair scheme is recommended for the defect statement identified in step 4 above. The repair scheme in this embodiment is as follows:
a defect of this kind exists: (2) if, within the body of (9) reference; a modification is suggested.
Because the model's feature representation is derived from the AST editing operations between code versions through fine-grained code analysis, and defect code is located using its context, the method yields fine-grained repair recommendations and makes repairs more accurate.
Claims (4)
1. A method for realizing defect repair recommendation based on a learning algorithm, characterized by comprising the following steps:
step 1, extracting abstract syntax trees (ASTs) with GumTree from the collected source code before bug repair and after bug repair, respectively, to obtain the AST editing operation sequences of the code before and after bug repair; wherein the ASTs are extracted from the source code, and the node types of the extracted ASTs comprise:
(1) method call and class instance creation nodes;
(2) method declaration, type declaration, and enumeration declaration nodes;
(3) control flow nodes, including while statements, catch statements, if statements, and throw statements;
step 2, screening and filtering the AST editing operation sequences;
step 3, in combination with the screened and filtered AST editing operation sequences, abstracting the source code before bug repair and after bug repair with a parser and mapping each into a vector feature representation; wherein abstracting the source code before and after bug repair with the parser and mapping it into vector feature representations specifically comprises:
step 3-1, generating a token stream from the source code with a lexer;
step 3-2, feeding the token stream to the parser, generating a unique ID for each identifier/literal in the source code, and performing the mapping;
step 4, training a neural network on the mapped vector feature representations to obtain a defect statement recognition model, and thereby identifying defect statements;
step 5, recommending a repair scheme for the defect statements identified in step 4 based on the semantic features of the source code.
2. The method for realizing defect repair recommendation based on a learning algorithm according to claim 1, wherein the screening and filtering in step 2 specifically comprises: filtering out AST editing operation sequences that contain syntax errors and AST editing operation sequences whose occurrence frequency is lower than a set threshold.
3. The method for realizing defect repair recommendation based on a learning algorithm according to claim 1, wherein the neural network in step 4 is specifically a recurrent neural network (RNN).
4. The method for realizing defect repair recommendation based on a learning algorithm according to claim 1, wherein recommending a repair scheme for the defect statement based on the semantic features of the code in step 5 is specifically:
according to the context of the defect statement in the source code, and in combination with the fine-grained repair modes of Table 1, a corresponding repair scheme is recommended: repair the problem of X in the defect statement, or repair the problem of Y in the defect statement, where X and Y are each any one of the numbered repair modes in Table 1;
TABLE 1 Repair modes for bugs
[table not reproduced in this text]
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910623765.3A CN110442514B (en) | 2019-07-11 | 2019-07-11 | Method for realizing defect repair recommendation based on learning algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910623765.3A CN110442514B (en) | 2019-07-11 | 2019-07-11 | Method for realizing defect repair recommendation based on learning algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110442514A CN110442514A (en) | 2019-11-12 |
CN110442514B (en) | 2024-01-12 |
Family
ID=68430178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910623765.3A Active CN110442514B (en) | 2019-07-11 | 2019-07-11 | Method for realizing defect repair recommendation based on learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110442514B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111459491B (en) * | 2020-03-17 | 2021-11-05 | 南京航空航天大学 | Code recommendation method based on tree neural network |
CN111897946B (en) * | 2020-07-08 | 2023-09-19 | 扬州大学 | Vulnerability patch recommendation method, vulnerability patch recommendation system, computer equipment and storage medium |
CN114416421B (en) * | 2022-01-24 | 2024-05-31 | 北京航空航天大学 | Automatic positioning and repairing method for code defects |
CN115951892A (en) * | 2022-11-08 | 2023-04-11 | 北京交通大学 | Program patch generating method based on expression |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105045719A (en) * | 2015-08-24 | 2015-11-11 | 中国科学院软件研究所 | Method and device for predicting regression test failure on basis of repair deficiency change |
CN109299007A (en) * | 2018-09-18 | 2019-02-01 | 哈尔滨工程大学 | A kind of defect repair person's auto recommending method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104412327B (en) * | 2013-01-02 | 2019-02-12 | 默思股份有限公司 | Built-in self-test and prosthetic device and method |
EP3452924A4 (en) * | 2016-04-27 | 2020-01-01 | Coda Project, Inc. | System, method, and apparatus for operating a unified document surface workspace |
CN106445804B (en) * | 2016-08-24 | 2019-04-05 | 北京奇虎测腾安全技术有限公司 | A kind of source code cloud detection system and method based on serializing intermediate representation |
Also Published As
Publication number | Publication date |
---|---|
CN110442514A (en) | 2019-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110442514B (en) | Method for realizing defect repair recommendation based on learning algorithm | |
Dinella et al. | Hoppity: Learning graph transformations to detect and fix bugs in programs | |
CN109783079A (en) | A kind of code annotation generation method based on program analysis and Recognition with Recurrent Neural Network | |
CN110502361A (en) | Fine granularity defect positioning method towards bug report | |
CN107678971B (en) | Code taste driven code defect prediction method based on clone and coupling detection | |
US12106095B2 (en) | Deep learning-based java program internal annotation generation method and system | |
CN109446221A (en) | A kind of interactive data method for surveying based on semantic analysis | |
CN117215935A (en) | Software defect prediction method based on multidimensional code joint graph representation | |
CN110442527A (en) | Automation restorative procedure towards bug report | |
CN116483730A (en) | Service system automatic test method based on domestic software and hardware and open source test tool | |
CN114547619A (en) | Vulnerability repairing system and method based on tree | |
CN115437952A (en) | Statement level software defect detection method based on deep learning | |
CN117238276B (en) | Analysis correction system based on intelligent voice data recognition | |
CN108228232B (en) | Automatic repairing method for circulation problem in program | |
CN106383734A (en) | Method for extracting detailed design from codes | |
CN109508204B (en) | Front-end code quality detection method and device | |
CN117193778A (en) | Code examination method and system based on language model | |
CN116820557A (en) | Code abstract generation method integrating node characteristics of abstract syntax tree | |
CN114064472B (en) | Automatic software defect repairing acceleration method based on code representation | |
CN116069337A (en) | Code defect automatic repair method combining repair template and deep learning | |
CN113867714B (en) | Automatic code generation method adapting to multiple languages | |
CN115237469A (en) | Multi-mode architecture reverse analysis method based on cloud service source code | |
CN115719057A (en) | Log analysis method | |
CN113360766A (en) | Java method name recommendation method based on seq2seq model | |
CN113377962A (en) | Intelligent process simulation method based on image recognition and natural language processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||