CN117453566A - Code defect repairing method, device, electronic equipment and storage medium - Google Patents

Publication number
CN117453566A
Authority
CN
China
Prior art keywords
code
target
defect
expert
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311558286.0A
Other languages
Chinese (zh)
Inventor
吴冕冠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311558286.0A
Publication of CN117453566A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/366 Software debugging using diagnostics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3628 Software debugging of optimised code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a code defect repair method, which relates to the field of artificial intelligence. The method comprises the following steps: detecting code defects existing in the target code; if the code defects are detected, matching the code defects in a knowledge base by utilizing a decision tree model to obtain a first target expert rule, wherein the knowledge base comprises N expert rules, each expert rule indicates repair information of the corresponding class of code defects, and N is an integer greater than or equal to 1; if the code defect is not detected, carrying out code intention understanding on the target code by using a generative model, and obtaining a second target expert rule based on the code intention understanding result when the code intention understanding result represents that the code defect exists; repairing the code defect based on the first target expert rule or the second target expert rule. The disclosure also provides a code defect repair device, an electronic device and a storage medium.

Description

Code defect repairing method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to a code defect repair method, apparatus, electronic device, and storage medium.
Background
Code defects are flaws and problems in code that cause the end product to fail to meet the software requirements and functional requirements, and that therefore need to be repaired. In the related art, defects are usually repaired manually, so the efficiency and accuracy of repair are strongly affected by human factors, and the experience accumulated in repairing code defects is not reused.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a code defect repair method, apparatus, electronic device, and storage medium.
In one aspect of the disclosed embodiments, a method for repairing a code defect is provided, including: detecting code defects existing in the target code; if the code defects are detected, matching the code defects in a knowledge base by utilizing a decision tree model to obtain a first target expert rule, wherein the knowledge base comprises N expert rules, each expert rule indicates repair information of the corresponding class of code defects, and N is an integer greater than or equal to 1; if the code defect is not detected, carrying out code intention understanding on the target code by using a generative model, and obtaining a second target expert rule based on the code intention understanding result when the code intention understanding result represents that the code defect exists; repairing the code defect based on the first target expert rule or the second target expert rule.
According to an embodiment of the present disclosure, if the code defect is detected, after obtaining the first target expert rule, the method further includes: inputting the target code, the code defect and the first target expert rule into the generative model to obtain a third target expert rule output by the generative model; wherein repairing the code defect based on the first target expert rule or the second target expert rule comprises: and repairing the code defect by taking the third target expert rule as the new first target expert rule.
According to an embodiment of the present disclosure, obtaining the third target expert rule output by the generative model includes: causing the generative model to perform code intent understanding on the target code, the code defect, and the first target expert rule, so as to simulate repair of the code defect based on the first target expert rule; and when the result of the simulated repair does not meet a preset condition, causing the generative model to determine the third target expert rule from the N expert rules, wherein the third target expert rule is different from the first target expert rule.
According to an embodiment of the present disclosure, after detecting a code defect existing in the target code, if the code defect is detected and matching the code defect in the knowledge base by using the decision tree model does not yield a first target expert rule, the method further includes: inputting the target code and the code defect into the generative model to obtain the first target expert rule output by the generative model.
According to an embodiment of the present disclosure, before the code intent understanding of the target code using the generative model, the method further comprises: obtaining N expert rules and M1 first code samples with code defects, wherein each first code sample has a corresponding code defect label, and M1 is an integer greater than or equal to 1; and inputting the N expert rules and the M1 first code samples into the pre-trained generative model for fine-tuning.
According to an embodiment of the present disclosure, when the code intent understanding result characterizes the presence of the code defect, deriving a second target expert rule based on the code intent understanding result comprises: when the code intent understanding result characterizes the presence of the code defect, causing the fine-tuned generative model to generate the second target expert rule based on the code intent understanding result.
According to an embodiment of the present disclosure, after repairing the code defect based on the second target expert rule, the method further comprises: and if the N expert rules do not comprise the second target expert rules, updating the knowledge base, wherein the updated knowledge base comprises the N expert rules and the second target expert rules.
According to an embodiment of the present disclosure, when the code intent understanding result characterizes the presence of the code defect, deriving a second target expert rule based on the code intent understanding result comprises: and when the code intention understanding result represents that the code defect exists, inputting N expert rules into the generative model to obtain the second target expert rule.
In accordance with an embodiment of the present disclosure, before matching the code defect against the N expert rules using the decision tree model, the method further comprises: obtaining N expert rules and M2 second code samples with code defects, wherein each second code sample has a corresponding code defect label, and M2 is an integer greater than or equal to 1; and training the decision tree model based on the N expert rules and the M2 second code samples.
According to an embodiment of the present disclosure, repairing the code defect based on the first target expert rule includes: inputting the target code, the code defect, and the first target expert rule into the generative model; and causing the generative model to repair the code defect of the target code based on the first target expert rule.
Another aspect of an embodiment of the present disclosure provides a code defect repair apparatus, including: the defect detection module is used for detecting code defects existing in the target code; the first rule module is used for matching the code defects in a knowledge base by utilizing a decision tree model if the code defects are detected, so as to obtain a first target expert rule, wherein the knowledge base comprises N expert rules, each expert rule indicates repair information of the corresponding category code defects, and N is an integer greater than or equal to 1; the second rule module is used for carrying out code intention understanding on the target code by utilizing a generative model if the code defect is not detected, and obtaining a second target expert rule based on the code intention understanding result when the code intention understanding result represents that the code defect exists; and the defect repairing module is used for repairing the code defects based on the first target expert rules or the second target expert rules.
Another aspect of an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
Another aspect of the disclosed embodiments also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the method as described above.
Another aspect of the disclosed embodiments also provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
One or more of the above embodiments have the following advantages: both a decision tree model and a generative model are provided, combining the decision tree model's fast matching against the knowledge base with the generative model's code intent understanding capability based on natural language processing. Expert rules that provide repair information are quickly output for the code defects of the target code, so that automatic code repair is completed, the efficiency of code defect repair is improved, and the experience gained in code repair can be accumulated and reused.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a code defect repair method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a code defect repair method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of obtaining a third target expert rule in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure;
FIG. 9 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure;
FIG. 11 schematically illustrates a block diagram of a code defect repair apparatus according to an embodiment of the present disclosure; and
FIG. 12 schematically illustrates a block diagram of an electronic device adapted to implement a code defect repair method according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions like "at least one of A, B and C, etc." are used, the expressions should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Fig. 1 schematically illustrates an application scenario diagram of a code defect repair method according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example in which embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
The server 105 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud computing, network service, and middleware service.
In some embodiments, the user may issue training or fine-tuning instructions for the decision tree model and the generative model through a front-end page provided by the terminal devices 101, 102, 103, and the instructions may be executed locally or at the server 105. The user may also input target code via the front-end page provided by the terminal devices 101, 102, 103 and issue code defect detection and repair instructions, which may be executed locally or at the server 105.
It should be noted that, the method for repairing a code defect provided by the embodiment of the present invention may be performed by the terminal devices 101, 102, 103 or the server 105. Accordingly, the code defect repair apparatus provided by the embodiment of the present invention may be generally disposed in the terminal devices 101, 102, 103 or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The code defect repair method of the embodiment of the present disclosure will be described in detail below by way of fig. 2 to 10 based on the scenario described in fig. 1.
Fig. 2 schematically illustrates a flow chart of a code defect repair method according to an embodiment of the present disclosure.
As shown in fig. 2, this embodiment includes:
In operation S210, a code defect of the target code is detected.
The target code is the code to be checked, which may be a function or a code block and may contain logic errors or potential security vulnerabilities. For example, syntax errors, logic errors, functional errors, and the like may be regarded as code defects. For example, static analysis is performed on a section of a Java program to detect whether it contains a defect such as a null pointer exception.
In operation S220, it is determined whether a code defect is detected.
Detecting a code defect means determining that a defect or error exists in the target code, for example when the static analysis result reveals an uninitialized variable or a memory leak.
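As a rough illustration of operations S210 and S220, the following minimal sketch runs a static check over Python source with the standard `ast` module. The check itself (flagging `open()` calls that are not managed by a `with` statement), the function name `detect_unmanaged_open`, and the defect record layout are assumptions made for this example; the disclosure does not specify a particular analyzer or defect format.

```python
import ast

def detect_unmanaged_open(source):
    """Return one defect record per open() call not used as a `with` context."""
    tree = ast.parse(source)

    # Collect the call nodes that appear directly as `with <call> as name:`.
    managed = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.With, ast.AsyncWith)):
            for item in node.items:
                managed.add(id(item.context_expr))

    # Any other plain open(...) call is reported as a potential unclosed file.
    defects = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"
                and id(node) not in managed):
            defects.append({"type": "unclosed_file", "line": node.lineno})
    return defects

SAMPLE = (
    "f = open('a.txt')\n"
    "data = f.read()\n"
    "with open('b.txt') as g:\n"
    "    g.read()\n"
)
defects = detect_unmanaged_open(SAMPLE)
defect_detected = bool(defects)  # the S220 decision in this sketch
```

An empty result corresponds to the "no code defect detected" branch, in which the generative model would be consulted instead.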
In operation S230, if a code defect is detected, matching the code defect in a knowledge base by using a decision tree model to obtain a first target expert rule, wherein the knowledge base includes N expert rules, each expert rule indicates repair information of a corresponding class of code defect, and N is an integer greater than or equal to 1;
The decision tree model is constructed with a machine learning algorithm such as ID3 or C4.5. It represents the decision process as a series of decision nodes and leaf nodes. In code defect matching, the decision tree model determines which entry in the knowledge base best matches the defect, based on the content of the given target code or the characteristics of the code defect.
The knowledge base is a database storing expert knowledge or rules. For example, the decision tree model is used to match a detected buffer overflow defect with the rules in the knowledge base to obtain repair information, i.e., the first target expert rule. The first target expert rule is an expert rule that guides the repair process and provides guidance and advice for a specific defect type in the target code.
Expert rules may include requirements on code format, naming conventions, error handling, and the like. The repair information may include a prompt describing how to repair the defect, a computer-executable code statement, repair guidance described in natural language, or the like. For example, if a memory leak is detected in the target code, the decision tree model matches it with the memory leak repair rule in the knowledge base and obtains the corresponding expert rule, such as releasing memory space that is no longer used.
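To make the matching step more concrete, here is a minimal sketch of an ID3-style decision tree over categorical defect features, trained on labelled samples and then used to look up a rule identifier in a knowledge base. The feature names (`category`, `symptom`), the rule identifiers, and the sample data are all hypothetical; the disclosure names ID3 only as one possible algorithm and does not define the feature set.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def build_id3(rows, labels, features):
    # Leaf: every sample maps to one rule, or no feature is left to split on.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]

    def gain(feature):
        remainder = 0.0
        for value in set(r[feature] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[feature] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    best = max(features, key=gain)  # split on the highest information gain
    node = {"feature": best, "children": {}}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node["children"][value] = build_id3(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [f for f in features if f != best],
        )
    return node

def match_rule(tree, defect):
    # Walk decision nodes until a leaf (rule id) or an unseen value (None).
    while isinstance(tree, dict):
        tree = tree["children"].get(defect.get(tree["feature"]))
    return tree

# Hypothetical knowledge base and labelled defect samples.
KNOWLEDGE_BASE = {
    "R1": "Release memory that is no longer used.",
    "R2": "Check the reference for null before dereferencing it.",
    "R3": "Initialize the variable before its first use.",
}
SAMPLES = [
    ({"category": "memory", "symptom": "leak"}, "R1"),
    ({"category": "memory", "symptom": "null_deref"}, "R2"),
    ({"category": "data_flow", "symptom": "uninitialized"}, "R3"),
    ({"category": "memory", "symptom": "leak"}, "R1"),
]
TREE = build_id3([s[0] for s in SAMPLES], [s[1] for s in SAMPLES],
                 ["category", "symptom"])
first_rule_id = match_rule(TREE, {"category": "memory", "symptom": "leak"})
unmatched = match_rule(TREE, {"category": "memory", "symptom": "overflow"})
```

In this sketch a `None` result stands for the case in which the decision tree yields no first target expert rule, which is where the embodiment that falls back to the generative model would apply.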
In operation S240, the code defect is repaired based on the first target expert rule.
The first target expert rule is obtained from the matching result of the decision tree model and is used to guide the repair of the code defect. For example, according to the matched expert rule, a memory release error in the code is corrected, e.g., a missing memory release statement is added, so that the memory leak is repaired.
In operation S250, if no code defect is detected, code intent understanding is performed on the target code by using the generative model, and when the code intent understanding result indicates that a code defect exists, a second target expert rule is obtained based on the code intent understanding result.
The second target expert rule may come from the N expert rules or may be a new rule generated by the generative model. For example, the generative model's interpretation of the code defect is extracted and matched against the expert rules in the knowledge base to obtain the best-matching second target expert rule. As another example, the generative model generates a new rule outside the knowledge base as the second target expert rule, based on the knowledge learned during the training phase.
The generative model is a machine learning model used to generate new data or representations, such as a natural language processing model based on the GPT (Generative Pre-trained Transformer) architecture that can generate new content, e.g., ChatGPT or Claude. Code intent understanding refers to understanding and inferring the purpose and function of the code. For example, the generative model is used to analyze code in which no defect was detected and to infer the likely intent of the code, such as determining that a piece of code performs a file reading operation.
The code intent understanding capability of the generative model supplements the detection means of operation S210: potential code defects are found from the perspective of natural language processing, which improves the efficiency of code defect repair.
In operation S260, the code defect is repaired based on the second target expert rule.
The second target expert rule is an expert rule output by the generative model and is used to repair the code defect. For example, faulty logic in the code is corrected according to the rule output by the generative model, so as to repair defects caused by logic errors.
According to the embodiment of the disclosure, both a decision tree model and a generative model are provided, combining the decision tree model's fast matching against the knowledge base with the generative model's code intent understanding capability based on natural language processing. Expert rules that provide repair information are quickly output for the code defects of the target code, so that automatic code repair is completed, the efficiency of code defect repair is improved, and the experience gained in code repair can be accumulated and reused.
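The control flow of operations S210 to S260 can be sketched as a small dispatcher. Everything below is a stand-in under stated assumptions: the detector, the rule lookup, the "generative model" (here a trivial string check), and the repair step (which only annotates the code) are hypothetical placeholders passed in as callables, not the components of the disclosure.

```python
def repair_flow(target_code, detect, match_rule, understand_intent, apply_rule):
    """Return (possibly repaired code, rule used or None)."""
    defect = detect(target_code)                        # S210
    if defect is not None:                              # S220: defect detected
        rule = match_rule(defect)                       # S230: first target rule
    else:
        finding = understand_intent(target_code)        # S250: generative model
        if finding is None:                             # no defect inferred
            return target_code, None
        defect, rule = finding                          # second target rule
    if rule is None:
        return target_code, None
    return apply_rule(target_code, defect, rule), rule  # S240 / S260

# Hypothetical stand-ins used only to exercise the control flow.
RULES = {"unclosed_file": "Close the file after use."}

def toy_detect(code):
    return "unclosed_file" if "open(" in code and ".close()" not in code else None

def toy_understand_intent(code):
    # Placeholder for a generative model's code intent understanding.
    if "/ count" in code and "count == 0" not in code:
        return "possible_zero_division", "Guard the divisor against zero."
    return None

def toy_apply_rule(code, defect, rule):
    # Annotates instead of repairing, to keep the sketch short.
    return code + f"# TODO({defect}): {rule}\n"

repaired_a, rule_a = repair_flow("f = open('log.txt')\n", toy_detect, RULES.get,
                                 toy_understand_intent, toy_apply_rule)
repaired_b, rule_b = repair_flow("avg = total / count\n", toy_detect, RULES.get,
                                 toy_understand_intent, toy_apply_rule)
repaired_c, rule_c = repair_flow("x = 1\n", toy_detect, RULES.get,
                                 toy_understand_intent, toy_apply_rule)
```

Passing the components as callables keeps the branch structure visible while leaving the choice of detector, decision tree, and generative model open.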
Fig. 3 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure.
As shown in fig. 3, the rest of this embodiment is the same as that of fig. 2 except for operation S310 and operation S320, and will not be described again here. If a code defect is detected, after the first target expert rule is obtained, the embodiment includes:
in operation S310, inputting the object code, the code defect, and the first object expert rule into the generative model to obtain a third object expert rule output by the generative model;
A prompt may also be provided in operation S310 to assist the output of the generative model. For example, the prompt may include a textual description of the role of the target code, the business scenario, and the output requirements, together with the target code, the code defect, and the first target expert rule.
In operation S320, the code defect is repaired using the third target expert rule as a new first target expert rule, which is the same as or different from the first target expert rule. Operation S320 is one of the embodiments of operation S240.
For example, the third target expert rule may be the first target expert rule, or may be a modification or addition to the first target expert rule, or may be an entirely new repair rule.
According to embodiments of the present disclosure, the code intent understanding capability of the generative model is used to further evaluate the repair capability of the first target expert rule, which may improve the repair effect. The new expert rule allows code defects to be repaired more accurately and comprehensively, thereby improving code quality.
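One way the input to the generative model in operation S310 might be assembled is sketched below. The section titles, default texts, and the function name `build_repair_prompt` are assumptions for illustration; the disclosure only lists the kinds of information a prompt may contain and does not prescribe a format or a model API.

```python
def build_repair_prompt(target_code, defect, first_rule,
                        role="You are a code repair assistant.",
                        scenario="Unspecified business scenario.",
                        output_requirements="Return the expert rule to apply."):
    """Assemble a plain-text prompt from the pieces named for operation S310."""
    sections = [
        ("Role", role),
        ("Business scenario", scenario),
        ("Output requirements", output_requirements),
        ("Target code", target_code),
        ("Code defect", defect),
        ("First target expert rule", first_rule),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)

prompt = build_repair_prompt(
    target_code="String s = null;\nint n = s.length();",
    defect="null pointer dereference",
    first_rule="Check the reference for null before dereferencing it.",
)
```

The resulting string would then be sent to whatever generative model is in use; that call is deliberately left out because it depends on the chosen model.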
Fig. 4 schematically illustrates a flowchart of obtaining a third target expert rule according to an embodiment of the disclosure.
As shown in fig. 4, this embodiment is one of the embodiments of operation S310, including:
In operation S410, the generative model is caused to perform code intent understanding on the target code, the code defect, and the first target expert rule, and to simulate repair of the code defect based on the first target expert rule.
The generative model understands and reasons about the target code, the code defect, and the first target expert rule, and simulates repairing the code defect to obtain a repair result, which serves as the basis for further decisions.
In operation S420, when the result of the simulated repair does not meet the preset condition, the generative model is caused to determine a third target expert rule from the N expert rules, the third target expert rule being different from the first target expert rule.
The preset conditions constrain the requirements of the code in terms of correctness, performance and the like, for example, the repaired code needs to be able to be compiled, generate correct results and the like.
According to the embodiment of the disclosure, the reasoning and understanding capabilities of the generative model can be utilized, and when the result of simulation repair does not meet the preset condition, a new repair rule is selected from a plurality of expert rules, so that the repair effect and adaptability are further improved.
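The simulate-then-reselect logic of operations S410 and S420 can be illustrated with a deterministic stand-in. In this sketch each expert rule is a hypothetical Python function that rewrites source text, and the preset condition is simply "the repaired code compiles", checked with the built-in `compile()`. In the disclosure the simulation and the selection are performed by the generative model, so this is an analogy for the control flow rather than the described implementation.

```python
def meets_preset_condition(code):
    """Preset condition assumed here: the repaired code must compile."""
    try:
        compile(code, "<simulated-repair>", "exec")
        return True
    except SyntaxError:
        return False

def strip_trailing_whitespace(code):
    return "\n".join(line.rstrip() for line in code.splitlines()) + "\n"

def add_missing_colon(code):
    fixed = []
    for line in code.splitlines():
        stripped = line.rstrip()
        starts_block = stripped.lstrip().startswith(("def ", "if ", "for ", "while "))
        if starts_block and not stripped.endswith(":"):
            stripped += ":"
        fixed.append(stripped)
    return "\n".join(fixed) + "\n"

# Hypothetical knowledge base of N = 2 executable expert rules.
RULES = {
    "strip_whitespace": strip_trailing_whitespace,
    "add_colon": add_missing_colon,
}

def choose_third_rule(code, first_rule):
    # S410: simulate the repair with the first target expert rule.
    if meets_preset_condition(RULES[first_rule](code)):
        return first_rule
    # S420: otherwise pick a different rule whose simulated repair passes.
    for name, repair in RULES.items():
        if name != first_rule and meets_preset_condition(repair(code)):
            return name
    return None

BROKEN = "def area(w, h)\n    return w * h\n"
third_rule = choose_third_rule(BROKEN, "strip_whitespace")
repaired = RULES[third_rule](BROKEN)
```

A stricter preset condition, such as passing a test suite or a performance bound, could replace `meets_preset_condition` without changing the selection loop.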
In some embodiments, the large model understands the target code and judges whether it has a defect. If so, the large model's interpretation of the code defect is extracted (for example, a problem with closing a file after it is opened, or a null pointer) and matched against the rules of the code repair expert system to obtain the best-matching expert rule. The problem code block and the corresponding expert rule are then used as input to the code repair expert system to obtain the corresponding code repair result.
In other embodiments, the generative model may generate a new repair rule, in addition to the N expert rules, as a third target expert rule for automatically repairing the code defect based on the input target code, the code defect, and the first target expert rule.
Fig. 5 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure.
As shown in fig. 5, the rest of this embodiment is the same as that of fig. 2 except for operations S510 to S530, and will not be described again here. Before code intent understanding of the target code with the generative model, this embodiment includes:
In operation S510, N expert rules and M1 first code samples with code defects are obtained, each first code sample having a corresponding code defect label, M1 being an integer greater than or equal to 1;
The first code sample refers to a code sample with a code defect label for training a generative model, for example, a section of code containing an error grammar may be used as the first code sample. Code defect labels refer to labels on marked code samples for identifying defects or problems in the code, which may be generated, for example, by marking the location and type of defects in the code sample. Expert rules and code samples containing defects are obtained for subsequent training and fine tuning.
In operation S520, N expert rules and M1 first code samples are input into the pre-trained generative model for fine tuning.
The method has the advantages that the generated model can better understand and generate the target codes with the code intention through learning expert rules and defect samples, and how to repair the code defects by using expert rules, so that the generation effect and accuracy are further improved.
The fine tuning process is a process of applying a pre-trained generative model to a generative task. The purpose of the fine tuning is to enable the generative model to have text generating capability for the code repair scene, namely, response information can be generated according to the code defects input by a user. The fine tuning process can be performed by means of adjusting the super parameters of the model, modifying the loss function and the like, and finally the performance in the code defect repairing task is improved, so that the target expert rule with better repairing effect can be determined from N expert rules.
According to the embodiment of the disclosure, through learning expert rules and defect samples, the generated model can accurately detect and correct defects in codes, and the quality and reliability of the codes are improved.
In operation S530, when the code intent understanding result characterizes the presence of a code defect, the fine-tuned generative model is caused to generate a second target expert rule based on the code intent understanding result. Operation S530 is one of the embodiments of operation S250.
The second target expert rule may be from the knowledge base, or may be a correction or supplement to any expert rule in the knowledge base, or may be a completely new repair rule.
According to the embodiment of the disclosure, the fine-tuned generative model can be used to handle the defects, which improves the accuracy and efficiency of code defect repair. The large model is optimized in a targeted manner based on the knowledge of the expert system (with problem code as input and the corresponding expert rule as output), which improves its accuracy in identifying problem code.
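The pairing of problem code (input) with its expert rule (output) can be illustrated with a minimal sketch. The record layout, the `DefectSample` class, the `rule_id` key, and the JSON Lines serialization are assumptions made for illustration; the disclosure does not prescribe a data format or a training framework.

```python
import json
from dataclasses import dataclass


@dataclass
class DefectSample:
    """A first code sample together with its code defect label (hypothetical layout)."""
    code: str          # code snippet containing the defect
    defect_label: str  # e.g. type and location of the defect
    rule_id: str       # hypothetical key of the expert rule that repairs it


def build_finetune_records(samples, expert_rules):
    """Pair each defective sample (input) with its expert rule (output)."""
    records = []
    for s in samples:
        rule_text = expert_rules.get(s.rule_id)
        if rule_text is None:
            continue  # skip samples whose rule is not among the N expert rules
        prompt = f"Defect: {s.defect_label}\nCode:\n{s.code}"
        records.append({"input": prompt, "output": rule_text})
    return records


def to_jsonl(records):
    """Serialize the records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


expert_rules = {
    "R2": "If a Java program does not close a file correctly, "
          "close the file in the finally block.",
}
samples = [
    DefectSample("reader = new FileReader(path);", "resource leak: file not closed", "R2"),
    DefectSample("int x = a[10];", "out-of-range access", "R9"),  # no such rule, skipped
]
records = build_finetune_records(samples, expert_rules)
print(to_jsonl(records))
```

Records of this shape could then be passed to whatever fine-tuning procedure is used for the pre-trained generative model.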
In some embodiments, after the code defect is repaired based on the second target expert rule, the method further includes: if the N expert rules do not include the second target expert rule, updating the knowledge base, where the updated knowledge base includes the N expert rules and the second target expert rule.
According to an embodiment of the present disclosure, the second target expert rule is added to the knowledge base so that it can be used in subsequent code defect repair. Updating the knowledge base keeps the expert rules current, so that subsequent code defect repair is more accurate and efficient.
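A minimal sketch of this update step follows. It assumes the knowledge base is a plain list of rule strings and that two rules count as the same when they match after whitespace and case normalization; both are illustrative assumptions, not details given in the disclosure.

```python
def normalize(rule):
    """Collapse whitespace and case so trivially different copies compare equal."""
    return " ".join(rule.split()).lower()


def update_knowledge_base(knowledge_base, second_target_rule):
    """Return a knowledge base that also contains the rule if it was not included."""
    existing = {normalize(r) for r in knowledge_base}
    if normalize(second_target_rule) in existing:
        return list(knowledge_base)  # rule already among the N expert rules
    return list(knowledge_base) + [second_target_rule]


kb = ["Close the file in the finally block."]
kb = update_knowledge_base(kb, "Add a boundary check before indexing the array.")
kb = update_knowledge_base(kb, "close the file  in the finally block.")  # duplicate, ignored
print(len(kb))
```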
Fig. 6 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure.
As shown in fig. 6, the rest of this embodiment is the same as that of fig. 2 except for operation S610, and will not be described here again. This embodiment includes:
in operation S610, when the code intention understanding result indicates that the code defect exists, N expert rules are input to the generative model, resulting in a second target expert rule. Operation S610 is one of the embodiments of operation S250.
Unlike the embodiment of fig. 5, this embodiment does not use the N expert rules to fine-tune the generative model in advance, which saves the time needed for fine-tuning. Instead, the N expert rules are used as prompt words, and the generative model outputs a second target expert rule after performing intent understanding on the target code, the code defect, and the N expert rules. In this embodiment, the second target expert rule may come from the knowledge base, may be a correction or supplement to any expert rule in the knowledge base, or may be a completely new repair rule.
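Using the N expert rules as prompt words can be sketched as follows. The prompt wording and the function name `build_rule_prompt` are hypothetical; the disclosure does not specify a prompt template, and the call to the generative model itself is left out.

```python
def build_rule_prompt(target_code, expert_rules):
    """Assemble a prompt that places the N expert rules next to the target code."""
    numbered = "\n".join(
        f"{i}. {rule}" for i, rule in enumerate(expert_rules, start=1)
    )
    return (
        "Expert rules:\n"
        f"{numbered}\n\n"
        "Target code:\n"
        f"{target_code}\n\n"
        "If the code has a defect, output the expert rule that should be used "
        "to repair it. You may select a rule above, correct or supplement one, "
        "or propose a new rule."
    )


rules = [
    "If a Java program opens a file but does not close it, the program has a code defect.",
    "If the file is not closed correctly, close the file in the finally block.",
]
prompt = build_rule_prompt("reader = new FileReader(path);", rules)
print(prompt)
```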
Fig. 7 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure.
As shown in fig. 7, this embodiment is the same as that of fig. 2 except for operations S710 to S720, and the common parts are not described again here. Before the code defects are matched with the N expert rules using the decision tree model, this embodiment includes:
in operation S710, N expert rules and M2 second code samples having code defects are obtained, each second code sample having a corresponding code defect tag, M2 being an integer greater than or equal to 1;
in operation S720, a decision tree model is trained based on the N expert rules and the M2 second code samples.
In the training process, the decision tree model correlates the expert rules and the features in the second code sample with the corresponding code defect labels, so that the new code sample can be classified and judged.
In the training process, expert rules are first acquired from experts, and second code samples with code defect labels are collected; these samples consist of defective parts of actual code. Next, feature extraction is performed on each second code sample; for example, features such as the number of code lines, function call relations, and variable usage are extracted from the code. The purpose of feature extraction is to transform the code samples into numerical features that can be processed by machine learning algorithms. The extracted features are then paired with the corresponding code defect labels to form a training data set, in which each sample contains a set of features and a label indicating whether the sample has a code defect. Finally, a decision tree algorithm is used to train a model on the prepared training data set. The decision tree algorithm builds a decision tree based on the expert rules and the features of the code samples: during training, it gradually builds the nodes and branches of the tree according to the features and labels of the samples, thereby producing a decision tree model for repairing code defects.
According to the embodiment of the disclosure, the detection efficiency and accuracy of the code defects can be improved. By training the decision tree model using expert rules and a second code sample with code defects, features and patterns of code defects can be effectively captured, making the detection process more automated and reliable.
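The feature extraction step can be sketched with the Python standard library alone. The sketch assumes, purely for illustration, that the second code samples are Python snippets so that the `ast` module can parse them; the three features and the label names are hypothetical choices, not the feature set of the disclosure.

```python
import ast


def extract_features(source):
    """Turn a code sample into a numeric feature vector.

    Features: [line count, number of function calls, number of distinct names].
    """
    tree = ast.parse(source)
    n_lines = len(source.strip().splitlines())
    n_calls = sum(isinstance(node, ast.Call) for node in ast.walk(tree))
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return [n_lines, n_calls, len(names)]


def build_dataset(labeled_samples):
    """Pair each feature vector with its code defect label."""
    X = [extract_features(code) for code, _ in labeled_samples]
    y = [label for _, label in labeled_samples]
    return X, y


labeled_samples = [
    ("f = open(path)\ndata = f.read()\n", "file_not_closed"),
    ("f = open(path)\ndata = f.read()\nf.close()\n", "no_defect"),
]
X, y = build_dataset(labeled_samples)
print(X, y)
```

The resulting `X` and `y` have the shape expected by a decision tree learner such as scikit-learn's `DecisionTreeClassifier.fit`.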
Fig. 8 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure.
As shown in fig. 8, the rest of this embodiment is the same as that of fig. 2 except for operations S810 to S820, and will not be described again here. Operations S810 to S820 are one embodiment of operation S240, and specifically include:
in operation S810, the target code, the code defect, and the first target expert rule are input into the generative model;
for example, the target code, the code defect, and the first target expert rule serve as inputs: the target code and the code defect describe the specific problem to be repaired, and the first target expert rule provides guidance for the repair process. The generative model uses this input, together with what it has learned, to generate a specific repair scheme.
In operation S820, the generative model is caused to repair the code defect of the target code based on the first target expert rule.
Referring to the first target expert rule and the code intent understanding result, the generative model generates repair code, or provides a repair suggestion and executes it.
By taking the target code, the code defect, and the expert rule as input, the generative model can automatically generate repair code or provide repair suggestions, thereby reducing the workload of manual repair.
Fig. 9 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure.
As shown in fig. 9, the rest of this embodiment is the same as that of fig. 2 except for operations S910 to S920, and will not be described again here. This embodiment includes:
in operation S910, the code defect is matched in the knowledge base by using the decision tree model, and no first target expert rule is obtained; that is, the decision tree model matches the defect features of the target code against the defects in the knowledge base and determines that there is no target expert rule.
In operation S920, the target code and the code defect are input to the generative model, and a first target expert rule output by the generative model is obtained.
The generative model generates specific expert rules according to the input target codes and the code defect characteristics so as to guide the repairing process. The output of the generated model cooperates with the matching result of the decision tree model, and more specific and targeted repair suggestions are efficiently provided.
For example, the decision tree model matches the features of the code defect against the defects in the knowledge base to look for a suitable repair method. For a boundary-check error, for instance, the decision tree model may fail to match a target expert rule. The target code and the defect are then input to the generative model, which outputs a suggestion to add boundary-check code to the image processing function to avoid out-of-range access.
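The fallback from the decision tree model to the generative model can be sketched as below. Both models are represented by injected callables; `toy_matcher` and `toy_generator` are hypothetical stand-ins for illustration, not real model interfaces.

```python
def select_first_target_rule(target_code, defect, match_in_knowledge_base, generate_rule):
    """Try the decision tree match first; fall back to the generative model."""
    rule = match_in_knowledge_base(defect)           # corresponds to operation S910
    if rule is not None:
        return rule, "decision_tree"
    return generate_rule(target_code, defect), "generative_model"  # operation S920


knowledge_base = {"unclosed_file": "Close the file in the finally block."}


def toy_matcher(defect):
    """Stand-in for the decision tree model: a dictionary lookup."""
    return knowledge_base.get(defect)


def toy_generator(target_code, defect):
    """Stand-in for the generative model: returns a canned suggestion."""
    return f"Add a check that prevents the '{defect}' defect."


rule, source = select_first_target_rule(
    "pixel = image[row][col]", "boundary_check_error", toy_matcher, toy_generator
)
print(source, "->", rule)
```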
Fig. 10 schematically illustrates a flow chart of a method of code defect repair according to another embodiment of the present disclosure.
As shown in fig. 10, this embodiment includes:
in operation S1001, expert experience and knowledge data (such as the type and content of each error report and how the error was resolved) are acquired;
in operation S1002, the obtained expert experience and knowledge are converted into formalized expert rules, which specifically include knowledge representation, knowledge storage, knowledge reasoning, and the like, to generate a knowledge base. And forming an expert system based on the decision tree model obtained by training and the knowledge base.
Illustratively, the code repair expert system is constructed as follows:
1. Knowledge data (such as the types and contents of various errors and how they were resolved) is collected; this is the main data asset of the expert system. Take as an example a Java program that does not properly close a file.
Defect code:
in the above code, if an exception is thrown while the reader is used, the file will not be properly closed, which is a code defect.
Correctly repaired code:
the correct solution is to close the file in the finally block.
2. The data acquired in the first step is converted into rules that capture expert experience. For example, for the problem in the first step, in which a Java program reads a file without closing it correctly, the problem and its solution can be converted into the following two expert rules:
rule 1: if a Java program opens a file but does not close the file at the end of the program, the program has a code defect.
Rule 2: if a Java program has a code defect in which a file is not closed correctly, closing the file in the finally block may repair the defect.
3. The code defects, the corresponding expert rules, and the corresponding solutions are used as training data to train the decision tree model of the code repair expert system. Taking the scikit-learn library for Python as an example, training a decision tree on sample data such as code defects, expert rules, and repair schemes proceeds roughly as follows:
from sklearn.model_selection import train_test_split
from sklearn import tree
# Suppose X is the feature data (i.e., pairs of a code defect and its corresponding expert rule, which can be converted to vectors with an NLP technique such as word2vec), and y is the target data (i.e., the repair scheme)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Next, create a decision tree model and train it with the training data:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
# The model can then be used to predict new data:
new_data = [...]  # new feature data
print(clf.predict(new_data))
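How the text of a code defect and its expert rule might become the feature data X can be sketched as follows. Note that this sketch swaps in a plain bag-of-words count for the word2vec technique named above, so that it runs with the standard library only; the sample texts and helper names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def build_vocabulary(texts):
    """Map every token seen in the texts to a fixed column index."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Bag-of-words vector: one count per vocabulary token, in index order."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocab]


# Each text joins a code defect description with its expert rule.
pairs = [
    "file opened but not closed | close the file in the finally block",
    "array index out of range | add a boundary check",
]
vocab = build_vocabulary(pairs)
X = [vectorize(p, vocab) for p in pairs]
print(len(vocab), X[0][vocab["file"]])
```

Vectors built this way could be passed as `X` to `train_test_split` and `clf.fit` in the listing above.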
In operation S1003, training data for the generative model is obtained through internet data crawling and from third-party code hosting repositories such as GitHub and Gitee; in the training phase, the problem code is used as input and the corresponding expert rule as output.
In operation S1004, the generative model is trained, for example pre-trained and fine-tuned, using the training data to obtain a trained generative model.
In operation S1005, a code of a defect to be repaired is determined.
In operation S1006, the code of the defect to be repaired is scanned using the static code detection tool.
In operation S1007, it is determined whether an abnormality is detected. If detected, operation S1008 is performed, and if not, operation S1012 is performed.
If an anomaly is detected, it is input to the expert system in operation S1008.
In operation S1009, it is determined whether there is a matching rule based on the output of the decision tree model. If yes, operation S1010 is performed, and if no, operation S1011 is performed.
In operation S1010, if there is a matching rule, i.e., a first target expert rule, the code is repaired based on the matching expert rule. And automatically testing the repaired code by using an automatic testing tool.
In operation S1011, the generative model understands the code intention of the target code, and determines whether or not the current code has a defect. If so, operation S1012 is performed.
In operation S1012, the generative model's explanation of the code defect is extracted and matched against the rules of the code repair expert system, and the expert rule that best matches the explanation is obtained, namely the second target expert rule. Operation S1010 is then performed, and the repaired code is automatically tested using an automated test tool.
In operation S1013, it is determined whether or not the automated test passes.
In operation S1014, after the automated test passes, the result, i.e., the repaired target code, is output. Otherwise, if the test does not pass, matching is performed again or the code is handled manually.
According to the embodiment of the disclosure, automatic code repair is completed based on the expert system or the large generative model, so that the experience assets for code repair can be accumulated and reused, and the efficiency of code defect repair is greatly improved. The method is applicable across multiple fields and languages: code in different languages can be repaired using similar steps, so the method generalizes well.
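The control flow of operations S1005 to S1014 can be sketched as one function with injected callables. This is a sketch under stated assumptions: code in which the static scan finds nothing is passed to the generative model's intent understanding, as in the second rule module described for the method of fig. 2, and every `demo_*` function is a hypothetical stand-in for the corresponding tool or model.

```python
def repair_pipeline(code, scan, match_rule, understand, best_rule_for,
                    apply_rule, run_tests):
    """Detect, select a rule, repair, and test; return (code, status)."""
    defect = scan(code)                                   # S1006 to S1007
    rule = match_rule(defect) if defect is not None else None  # S1008 to S1009
    if rule is None:
        explanation = understand(code)                    # S1011
        if explanation is None:
            return code, "no_defect_found"
        rule = best_rule_for(explanation)                 # S1012
    repaired = apply_rule(code, rule)                     # S1010
    if run_tests(repaired):                               # S1013
        return repaired, "repaired"                       # S1014
    return code, "needs_rematch_or_manual_processing"


RULES = {"unclosed_file": "close the file in the finally block"}


def demo_scan(code):
    """Toy static check: an open() call without any close() call."""
    return "unclosed_file" if "open(" in code and "close(" not in code else None


def demo_match(defect):
    return RULES.get(defect)


def demo_understand(code):
    return None  # the toy model reports no hidden defect


def demo_best_rule(explanation):
    return RULES["unclosed_file"]


def demo_apply(code, rule):
    return code + "\nf.close()  # applied rule: " + rule


def demo_tests(code):
    return "close(" in code


result, status = repair_pipeline("f = open(p)", demo_scan, demo_match,
                                 demo_understand, demo_best_rule,
                                 demo_apply, demo_tests)
print(status)
```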
Based on the code defect repairing method, the disclosure also provides a code defect repairing device. The device will be described in detail below with reference to fig. 11.
Fig. 11 schematically illustrates a block diagram of a code defect repair apparatus according to an embodiment of the present disclosure.
As shown in fig. 11, the code defect repair apparatus 1100 of this embodiment includes a defect detection module 1110, a first rule module 1120, a second rule module 1130, and a defect repair module 1140.
The defect detection module 1110 may perform operation S210 for detecting a code defect existing in the target code;
the first rule module 1120 may execute operation S220, configured to match the code defect in a knowledge base by using a decision tree model if the code defect is detected, to obtain a first target expert rule, where the knowledge base includes N expert rules, each expert rule indicates repair information of the corresponding class of code defect, and N is an integer greater than or equal to 1;
the second rule module 1130 may perform operation S230 for performing code intention understanding on the target code using the generative model if the code defect is not detected, and obtaining a second target expert rule based on the code intention understanding result when the code intention understanding result indicates that the code defect exists;
the defect repair module 1140 may perform operation S240 for repairing a code defect based on the first target expert rule or the second target expert rule.
In some embodiments, the first rule module 1120 may further perform operations S310 to S320, and operations S410 to S420, which are not described herein.
In some embodiments, the code defect repair apparatus 1100 includes a fine-tuning module that may perform operations S510 to S520, and in some embodiments, the second rule module 1130 may perform operation S530 or S610, which are not described herein.
In some embodiments, the code defect repair apparatus 1100 includes a decision tree module, which may perform operations S710 to S720, which are not described herein.
In some embodiments, the defect repair module 1140 may perform operations S810-S820, which are not described herein.
In some embodiments, the first rule module 1120 may further perform operations S910 to S920, which are not described herein.
Note that the code defect repair apparatus 1100 includes modules for performing the respective steps of any one of the embodiments described above with reference to fig. 2 to 10. The implementation manner, the solved technical problems, the realized functions and the realized technical effects of each module/unit/sub-unit and the like in the apparatus part embodiment are the same as or similar to the implementation manner, the solved technical problems, the realized functions and the realized technical effects of each corresponding step in the method part embodiment, and are not repeated herein.
Any of the plurality of modules of defect detection module 1110, first rule module 1120, second rule module 1130, and defect repair module 1140 may be combined in one module, or any of the plurality of modules may be split into a plurality of modules, according to embodiments of the present disclosure. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module.
According to embodiments of the present disclosure, at least one of the defect detection module 1110, the first rule module 1120, the second rule module 1130, and the defect repair module 1140 may be implemented, at least in part, as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system on package, or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging circuits, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the defect detection module 1110, the first rule module 1120, the second rule module 1130, and the defect repair module 1140 may be implemented, at least in part, as a computer program module that, when executed, performs the corresponding functions.
Fig. 12 schematically illustrates a block diagram of an electronic device adapted to implement a code defect repair method according to an embodiment of the disclosure.
As shown in fig. 12, an electronic device 1200 according to an embodiment of the present disclosure includes a processor 1201, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. The processor 1201 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 1201 may also include on-board memory for caching purposes. The processor 1201 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
In the RAM 1203, various programs and data required for the operation of the electronic apparatus 1200 are stored. The processor 1201, the ROM 1202, and the RAM 1203 are connected to each other through a bus 1204. The processor 1201 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1202 and/or RAM 1203. Note that the program may be stored in one or more memories other than the ROM 1202 and the RAM 1203. The processor 1201 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in one or more memories.
According to an embodiment of the disclosure, the electronic device 1200 may also include an input/output (I/O) interface 1205, the input/output (I/O) interface 1205 also being connected to the bus 1204. The electronic device 1200 may also include one or more of the following components connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output portion 1207 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. The drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 1210 so that a computer program read out therefrom is installed into the storage section 1208 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 1202 and/or the RAM 1203 and/or one or more memories other than the ROM 1202 and the RAM 1203 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1201. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program can also be transmitted, distributed over a network medium in the form of signals, and downloaded and installed via a communication portion 1209, and/or from a removable medium 1211. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1209, and/or installed from the removable media 1211. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1201. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
According to embodiments of the present disclosure, program code for performing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (13)

1. A method of code defect repair, comprising:
detecting code defects existing in the target code;
if the code defects are detected, matching the code defects in a knowledge base by utilizing a decision tree model to obtain a first target expert rule, wherein the knowledge base comprises N expert rules, each expert rule indicates repair information of the corresponding class of code defects, and N is an integer greater than or equal to 1;
if the code defect is not detected, carrying out code intention understanding on the target code by using a generative model, and obtaining a second target expert rule based on the code intention understanding result when the code intention understanding result represents that the code defect exists;
repairing the code defect based on the first target expert rule or the second target expert rule.
2. The method of claim 1, wherein if the code defect is detected, after deriving the first target expert rule, the method further comprises:
inputting the target code, the code defect and the first target expert rule into the generative model to obtain a third target expert rule output by the generative model;
wherein repairing the code defect based on the first target expert rule or the second target expert rule comprises:
and repairing the code defect by taking the third target expert rule as the new first target expert rule.
3. The method of claim 2, wherein obtaining the third target expert rule of the generative model output comprises:
causing the generative model to perform code intent understanding on the target code, the code defect, and the first target expert rule to simulate repair of the code defect based on the first target expert rule;
and when the simulated repair result does not meet a preset condition, causing the generative model to determine the third target expert rule from the N expert rules, wherein the third target expert rule is different from the first target expert rule.
4. The method of claim 1, wherein,
after detecting code defects existing in the target code, if the code defect is detected and matching the code defect in the knowledge base by utilizing the decision tree model yields no first target expert rule, the method further comprises:
inputting the target code and the code defect into the generative model to obtain the first target expert rule output by the generative model.
5. The method of claim 1, wherein prior to code intent understanding of the object code with a generative model, the method further comprises:
obtaining N expert rules and M1 first code samples with code defects, wherein each first code sample is provided with a corresponding code defect label, and M1 is an integer greater than or equal to 1;
inputting the N expert rules and the M1 first code samples into the pre-trained generative model for fine-tuning.
6. The method of claim 5, wherein deriving a second target expert rule based on the code intent understanding result when the code intent understanding result characterizes the presence of the code defect comprises:
when the code intent understanding result characterizes the presence of the code defect, causing the fine-tuned generative model to generate the second target expert rule based on the code intent understanding result.
7. The method of claim 6, wherein after repairing the code defect based on the second target expert rule, the method further comprises:
and if the N expert rules do not comprise the second target expert rules, updating the knowledge base, wherein the updated knowledge base comprises the N expert rules and the second target expert rules.
8. The method of claim 1, wherein deriving a second target expert rule based on the code intent understanding result when the code intent understanding result characterizes the presence of the code defect comprises:
and when the code intention understanding result represents that the code defect exists, inputting N expert rules into the generative model to obtain the second target expert rule.
9. The method of claim 1, wherein prior to matching the code flaws with the N expert rules using a decision tree model, the method further comprises:
obtaining N expert rules and M2 second code samples with code defects, wherein each second code sample is provided with a corresponding code defect label, and M2 is an integer greater than or equal to 1;
training the decision tree model based on the N expert rules and the M2 second code samples.
10. The method of claim 1, wherein repairing the code defect based on the first target expert rule comprises:
inputting the target code, the code defect, and the first target expert rule into the generative model;
causing the generative model to repair the code defect of the target code based on the first target expert rule.
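The repair step of claim 10 might look like the sketch below, which combines the three inputs into one prompt and hands it to a generative model. The prompt wording and the `generate` callable are assumptions; `stub_model` is a stand-in that returns a canned answer, not a real model interface.

```python
from typing import Callable


def build_repair_prompt(target_code: str, code_defect: str, expert_rule: str) -> str:
    """Combine the three inputs named in the claim into a single prompt."""
    return (
        "Repair the defect in the code below by following the expert rule.\n"
        f"Defect: {code_defect}\n"
        f"Expert rule: {expert_rule}\n"
        f"Code:\n{target_code}\n"
    )


def repair_defect(
    target_code: str,
    code_defect: str,
    expert_rule: str,
    generate: Callable[[str], str],
) -> str:
    """Ask the generative model (any prompt-to-text callable) for repaired code."""
    return generate(build_repair_prompt(target_code, code_defect, expert_rule))


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real generative model."""
    return "if (s != null) { s.length(); }"


prompt = build_repair_prompt(
    "s.length();", "null_dereference", "Check the reference for null before use."
)
repaired = repair_defect(
    "s.length();", "null_dereference", "Check the reference for null before use.", stub_model
)
```

Passing the model as a callable keeps the sketch independent of any particular model provider.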
11. A code defect repair apparatus comprising:
a defect detection module, configured to detect a code defect present in target code;
a first rule module, configured to, if the code defect is detected, match the code defect in a knowledge base by using a decision tree model to obtain a first target expert rule, wherein the knowledge base comprises N expert rules, each expert rule indicates repair information for a corresponding category of code defect, and N is an integer greater than or equal to 1;
a second rule module, configured to, if the code defect is not detected, perform code intent understanding on the target code by using a generative model, and obtain a second target expert rule based on the code intent understanding result when the code intent understanding result characterizes the presence of the code defect; and
a defect repair module, configured to repair the code defect based on the first target expert rule or the second target expert rule.
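The control flow among the four modules of claim 11 can be sketched as a single dispatch function. Each module is modelled as a plain callable, and the stub implementations are hypothetical placeholders; in particular, representing the code intent understanding result as "a defect description or `None`" is an assumption of this sketch.

```python
from typing import Callable, Optional


def repair_target_code(
    target_code: str,
    detect_defect: Callable[[str], Optional[str]],      # defect detection module
    match_rule: Callable[[str], str],                   # first rule module
    understand_intent: Callable[[str], Optional[str]],  # second rule module, step 1
    generate_rule: Callable[[str], str],                # second rule module, step 2
    repair: Callable[[str, str, str], str],             # defect repair module
) -> str:
    defect = detect_defect(target_code)
    if defect is not None:
        # Defect detected: match it against the knowledge base.
        rule = match_rule(defect)  # first target expert rule
    else:
        # No defect detected: fall back to code intent understanding.
        defect = understand_intent(target_code)
        if defect is None:
            return target_code  # neither path found a defect
        rule = generate_rule(defect)  # second target expert rule
    return repair(target_code, defect, rule)


# Hypothetical stubs standing in for the real modules.
def detect_defect(code):
    return "null_dereference" if ".length()" in code else None

def match_rule(defect):
    return "add a null check"

def understand_intent(code):
    return "off_by_one" if "<= len" in code else None

def generate_rule(defect):
    return "use a strict upper bound"

def repair(code, defect, rule):
    return f"{code}  # repaired: {defect} via {rule}"


modules = (detect_defect, match_rule, understand_intent, generate_rule, repair)
via_first_rule = repair_target_code("s.length()", *modules)
via_second_rule = repair_target_code("while i <= len(xs):", *modules)
unchanged = repair_target_code("x = 1", *modules)
```

The first call takes the decision tree path, the second takes the generative model path, and the third returns the code untouched because neither path reports a defect.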
12. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-9.
13. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1 to 9.
CN202311558286.0A 2023-11-21 2023-11-21 Code defect repairing method, device, electronic equipment and storage medium Pending CN117453566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311558286.0A CN117453566A (en) 2023-11-21 2023-11-21 Code defect repairing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311558286.0A CN117453566A (en) 2023-11-21 2023-11-21 Code defect repairing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117453566A true CN117453566A (en) 2024-01-26

Family

ID=89579927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311558286.0A Pending CN117453566A (en) 2023-11-21 2023-11-21 Code defect repairing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117453566A (en)

Similar Documents

Publication Publication Date Title
US10830817B2 (en) Touchless testing platform
US20190228319A1 (en) Data-driven automatic code review
US10067983B2 (en) Analyzing tickets using discourse cues in communication logs
US11327874B1 (en) System, method, and computer program for orchestrating automatic software testing
CN110705255B (en) Method and device for detecting association relation between sentences
US10769057B2 (en) Identifying potential errors in code using machine learning
US11176019B2 (en) Automated breakpoint creation
CN110490304B (en) Data processing method and device
US10489728B1 (en) Generating and publishing a problem ticket
CN113869789A (en) Risk monitoring method and device, computer equipment and storage medium
CN113591998A (en) Method, device, equipment and storage medium for training and using classification model
CN116560631B (en) Method and device for generating machine learning model code
US20100305986A1 (en) Using Service Exposure Criteria
US20230297784A1 (en) Automated decision modelling from text
CN115587029A (en) Patch detection method and device, electronic equipment and computer readable medium
CN117453566A (en) Code defect repairing method, device, electronic equipment and storage medium
Wang et al. Multi-type source code defect detection based on TextCNN
CN114117445A (en) Vulnerability classification method, device, equipment and medium
CN112115212A (en) Parameter identification method and device and electronic equipment
Kikuma et al. Automatic test case generation method for large scale communication node software
CN117290856B (en) Intelligent test management system based on software automation test technology
US20230342553A1 (en) Attribute and rating co-extraction
Zeng et al. Type analysis and automatic static detection of infeasible paths
US20230359824A1 (en) Feature crossing for machine learning
US20230394327A1 (en) Generating datasets for scenario-based training and testing of machine learning systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination