WO2023010916A1 - Automatic software repair method, system, electronic device and storage medium - Google Patents


Info

Publication number
WO2023010916A1
Authority
WO
WIPO (PCT)
Prior art keywords
patch
code
abstract syntax
result
vector
Prior art date
Application number
PCT/CN2022/091008
Other languages
English (en)
French (fr)
Inventor
程圣宇
朱琪豪
孙泽宇
肖元安
张文杰
熊英飞
张路
曹继承
彭星海
Original Assignee
中兴通讯股份有限公司
北京大学
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation) and 北京大学 (Peking University)
Publication of WO2023010916A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/70Software maintenance or management
    • G06F8/71Version control; Configuration management

Definitions

  • the embodiments of the present application relate to the computer field, and in particular to a software automatic repair method, system, electronic equipment, and storage medium.
  • ABSF Automatic software repair
  • An embodiment of the present application provides an automatic software repair method, including: obtaining a software defect code; generating, according to the grammatical features of the software defect code and a trained patch template generation model, a patch template conforming to the syntax of the language used by the software defect code; filling the patch template to generate a patch for the software defect code; and using the patch to repair the software defect code.
  • An embodiment of the present application provides an automatic software repair system, including: an acquisition module, configured to acquire a software defect code; a template generation module, configured to generate, according to the grammatical features of the software defect code and a trained patch template generation model, a patch template conforming to the syntax of the language used by the software defect code; a patch generation module, configured to fill the patch template and generate a patch for the software defect code; and a repair module, configured to use the patch to repair the software defect code.
  • An embodiment of the present application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above automatic software repair method.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program, which implements the above automatic software repair method when executed by a processor.
  • Fig. 1 is a flowchart of a software automatic repair method provided according to an embodiment of the present application
  • Fig. 2 is the extended grammatical rule of software automatic repair provided according to an embodiment of the present application
  • Fig. 3 is a schematic diagram of a patch template generation model provided according to an embodiment of the present application.
  • Fig. 4 is a schematic diagram of a proposer/decision maker structure provided according to an embodiment of the present application.
  • Fig. 5 is a schematic diagram of implementing a software automatic repair method provided according to an embodiment of the present application.
  • Fig. 6 is a schematic diagram of a software automatic repair system provided according to an embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of an electronic device provided according to an embodiment of the present application.
  • the main purpose of the embodiments of the present application is to provide an automatic software repair method, system, electronic device, and storage medium, which can generate highly adaptable patches for defect codes in different programming languages and improve the repair capability of automatic software repair.
  • the embodiment of the present application relates to an automatic software repair method, as shown in Figure 1; the method specifically includes:
  • Step 101: obtain the software defect code;
  • Step 102: generate, according to the grammatical features of the software defect code and the trained patch template generation model, a patch template conforming to the syntax of the language used by the software defect code;
  • Step 103: fill the patch template to generate a patch for the software defect code;
  • Step 104: repair the software defect code with the patch.
  • the automatic software repair method of this embodiment is applied to electronic devices, such as computers, mobile phones, and tablets, which realize automatic software repair by deploying an automatic software repair system.
  • the automatic software repair method of this embodiment generates, according to the grammatical features of the software defect code and the trained patch template generation model, a patch template conforming to the grammar of the language used by the software defect code, fills the patch template, and generates a patch for the software defect code. This keeps the generated patch consistent with the syntax of the software defect code: no matter what language the software defect code is written in, an adapted patch can be generated for repair, improving the repair capability of automatic software repair.
  • in step 101, the electronic device acquires a software defect code, that is, a faulty source code.
  • the software defect code can be obtained by the electronic device according to code test results, or by reading input provided by a technician.
  • in step 102, the electronic device generates a patch template conforming to the syntax of the language used by the software defect code according to the grammatical features of the software defect code and the trained patch template generation model.
  • the electronic device can use basic deep learning technology to self-learn the syntax of the software defect code from the obtained software defect code, call the trained patch template generation model, and generate a patch template conforming to the grammar of the software defect code.
  • before step 102, the electronic device also determines the defect function in the software defect code based on defect localization technology; parses the defect function to generate a first abstract syntax tree (Abstract Syntax Tree, "AST" for short); obtains, from the feature vector, label, and adjacency matrix of each node in the first abstract syntax tree, the preorder traversal sequence of the first abstract syntax tree, the label vector of each node, and the first abstract syntax graph; obtains the extended program grammar from the preset grammar rules and grammatical features; calls the word embedding method on the rule sequence of the extended program grammar to generate each rule sequence embedding vector and a program patch; constructs a second abstract syntax tree from the program patch; and obtains a second abstract syntax graph and a second abstract syntax tree path from the second abstract syntax tree.
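As a rough illustration of this preprocessing step, the sketch below parses a function into an AST, collects the preorder node sequence, and stores the tree edges as an adjacency matrix. Python stands in for the patent's unspecified host language, plain parent-child edges stand in for the patent's graph construction, and node type names stand in for the node strings; it is a simplified sketch, not the patent's implementation.

```python
import ast
import numpy as np

def preorder_and_adjacency(source):
    # Parse the defect function (Python stands in for the host language HL)
    tree = ast.parse(source)
    labels, edges = [], []

    def walk(node, parent):
        idx = len(labels)
        labels.append(type(node).__name__)   # stand-in for the node's string
        if parent is not None:
            edges.append((parent, idx))
        for child in ast.iter_child_nodes(node):
            walk(child, idx)

    walk(tree, None)                          # preorder traversal
    adj = np.zeros((len(labels), len(labels)), dtype=int)
    for a, b in edges:                        # store the graph as an adjacency matrix
        adj[a, b] = adj[b, a] = 1
    return labels, adj

labels, adj = preorder_and_adjacency("def f(x):\n    return x + 1\n")
```

In practice each label would then be vectorized by word embedding to form the feature-vector sequence described above.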
  • AST Abstract Syntax Tree
  • Patch template generation model including: Code Encoder, Patch Encoder, Abstract Syntax Tree Path Encoder and Extended Syntax Decoder.
  • Generating the patch template of the software defect code specifically includes: inputting the preorder traversal sequence, each node label vector, and the first abstract syntax graph into the code encoder to obtain the code encoding result; inputting the code encoding result, each rule sequence embedding vector, each rule sequence encoding vector, and the second abstract syntax graph into the patch encoder to obtain the patch encoding result; inputting the code encoding result, the patch encoding result, and the second abstract syntax tree path into the abstract syntax tree path encoder to obtain the abstract syntax tree path encoding result; inputting the abstract syntax tree path encoding result into the extended syntax decoder to select the optimal rule sequence; and generating a patch template according to the optimal rule sequence.
  • the optimal rule sequence is not a complete patch sequence, but a sequence segment in the complete patch sequence.
  • the patch template generation model needs to encode this sequence segment again: through iterative operation of the code encoder, patch encoder, abstract syntax tree path encoder, and extended syntax decoder, the sequence segment is extended until it cannot be extended further or its length reaches the preset requirement, at which point the iterative operation stops and a complete patch sequence is obtained.
  • the patch template generation model generates patch templates according to each complete patch sequence generated.
  • the initial input of the patch encoder can be a preset string that identifies the beginning of the patch sequence; the extended syntax decoder then generates the first sequence segment.
  • during the iterative operation, the first sequence segment is input into the patch encoder and extended: a new sequence segment is appended after the first sequence segment to obtain an extended sequence segment.
  • the extended sequence segment continues the iterative operation until a complete patch sequence is obtained.
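The iterative extension described above reduces to a simple loop. In this sketch, `extend_step` is a hypothetical stand-in for one full encoder/decoder pass that returns the next segment, or `None` when the sequence cannot be extended:

```python
def generate_patch_sequence(extend_step, max_len=64):
    # Preset string marking the beginning of the patch sequence
    seq = ["<START>"]
    while len(seq) < max_len:                # preset length requirement
        nxt = extend_step(seq)
        if nxt is None:                      # segment cannot be extended further
            break
        seq.append(nxt)                      # append the new sequence segment
    return seq

# Toy extend_step: emit two segments, then stop
seq = generate_patch_sequence(lambda s: "rule" if len(s) < 3 else None)
```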
  • each node in the first abstract syntax tree represents a character string
  • its feature vector is obtained by vectorizing the character string through word embedding technology.
  • the preorder traversal sequence is obtained by preorder traversal of the first abstract syntax tree, and each element in the sequence is a feature vector.
  • the present application also marks a label for each first abstract syntax tree node, used to represent the positional relationship between the character string represented by the node and the defective code line where the defect is located; there are four types: 1. the node is located in the defective code line; 2. the node is located in the line preceding the defective code line; 3. the node is located in the line following the defective code line; 4. the node is located in another line.
  • each first abstract syntax tree node is also converted into a node label vector through word embedding technology, and the sequence of node label vectors has the same node order as the preorder traversal sequence. Since the preorder traversal sequence and the node label vectors do not contain the structural information between the nodes of the first abstract syntax tree, this application also processes the first abstract syntax tree to obtain the first abstract syntax graph, connecting each node only with its closest left neighbors and adding no other extra edges.
  • the storage form of the first abstract syntax graph is the adjacency matrix of each node.
  • the present application also uses the stored preset grammar rules (for example, the grammar rules of the modification operations) and the grammar information (also called the native grammar) of the software defect code to obtain the extended program grammar; it then analyzes the obtained rule sequence with this grammar and calls the word embedding method to generate each rule sequence embedding vector and a program patch, where the rule sequence embedding vector encodes rule information using the rule's ID.
  • These rules are expressed as real vectors by means of embedding.
  • if the grammar rules (that is, the rule sequence) are regarded as atomic marks encoded only by their rule-definition IDs, part of the information of the rule content is lost;
  • therefore a rule sequence encoding vector is also generated according to the encoding sequence numbers in the rule sequence.
  • the abstract syntax tree of the partial patch template (that is, the second abstract syntax tree) is constructed from the repair patch, and the second abstract syntax graph and the partial abstract syntax tree path (that is, the second abstract syntax tree path) are obtained from this partial abstract syntax tree.
  • Figure 2 shows the specific composition of the extended grammar of the present application.
  • This application is not designed for a specific programming language, so the original programming language is called HL (host language).
  • NTS represents an original non-terminal symbol in the HL grammar;
  • <HLStatement> represents the non-terminal symbol for expressions in the HL grammar;
  • <HLIdentifier> represents a terminal symbol in the HL grammar.
  • the extended grammar of this application includes the following six rules: Rule 1 stipulates that a patch contains one or more modification operations. Rule 2 states that modification operations have two types, add and change. Rule 3 declares the syntax for modification operations of type add.
  • the modification operation of the add type inserts a newly generated expression before the defective line of code; that is, <HLIdentifier> can be expanded into an expression by the HL grammar or copied from the original defective function.
  • Rule 4 declares the syntax of modification operations of type change.
  • the modification operation of the change type will replace a part of the subtree of the defective code.
  • the modification operation contains two parameters: the first parameter is the position of the subtree to be replaced, represented by its position in the preorder traversal sequence of the abstract syntax tree;
  • the second parameter represents the newly generated subtree used to replace the original defective subtree, where the new subtree and the old subtree must have the same root node so that the replaced program remains grammatically correct.
  • this application proposes a copy operation, which can copy an expression of the same type from a defective function when generating a new abstract syntax tree.
  • Rule 5 declares the grammar of the operation that can be used to extend arbitrary nonterminals in the HL grammar.
  • the copy operation has a parameter that specifies the position of the abstract syntax subtree to be copied, again indicated by its position in the preorder traversal sequence; the root node of the subtree to be copied must have the same node type as the non-terminal being expanded, to ensure syntactic correctness after copying.
  • Rule 6 declares that a terminal symbol can be converted into a special placeholder in the patch template. When the model judges that a terminal symbol should be expanded into a project-specific identifier, the placeholder can be used to replace its position in the patch. At the same time, the terminal symbol can also be replaced by a common identifier in the vocabulary. In the implementation of this application, identifiers that appear more than 100 times in the training set are added to the vocabulary.
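One plausible way to hold such an extended grammar in memory is as a list of productions. The non-terminal and rule names below are illustrative stand-ins, not the patent's exact symbols:

```python
# Illustrative productions (lhs, rhs) echoing the six extended-grammar rules
EXTENDED_GRAMMAR = [
    ("Patch", ("ModifyOp", "Patch")),          # Rule 1: one or more operations
    ("Patch", ("ModifyOp",)),
    ("ModifyOp", ("Add",)),                    # Rule 2: type add or type change
    ("ModifyOp", ("Change",)),
    ("Add", ("HLStatement",)),                 # Rule 3: insert before the defect line
    ("Change", ("SubtreePos", "HLStatement")), # Rule 4: replace a defective subtree
    ("HLStatement", ("Copy", "SubtreePos")),   # Rule 5: copy a subtree of the same type
    ("HLIdentifier", ("Placeholder",)),        # Rule 6: placeholder for an identifier
]

def candidate_rules(nonterminal):
    # Only rules whose left side matches the node type may expand that node
    return [i for i, (lhs, _) in enumerate(EXTENDED_GRAMMAR) if lhs == nonterminal]
```

A decoder can then restrict its choice at each step to `candidate_rules(node_type)`, which is how grammatical correctness of the generated patch is preserved.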
  • Fig. 3 is a schematic diagram of the patch template generation model constructed with the neural network algorithm adopted in the present application.
  • the patch template generation model of this application consists of four components: the code encoder, used to process the abstract syntax tree of the input defect function; the patch encoder (AST encoder), used to process the abstract syntax tree of the partially generated patch; the abstract syntax tree path encoder (tree path encoder), used to process the abstract syntax tree path from the root node to the node being expanded; and the extended syntax decoder (expanded syntax decoder), used to output, from the hidden-layer input, the probability of each grammar rule being selected.
  • the code encoder includes: a first self-attention layer, a first gating layer, and a first graph convolution layer. Inputting the preorder traversal sequence, each node label vector, and the first abstract syntax graph into the code encoder to obtain the code encoding result includes: obtaining the position feature vector of each node according to the preorder traversal sequence; obtaining the first query vector, first key-value vector, and first weight vector according to the preorder traversal sequence and the position feature vectors; inputting the first query vector, first key-value vector, and first weight vector into the first self-attention layer to obtain the first self-attention result; inputting the first self-attention result and the node label vectors into the first gating layer to obtain the first gating result; inputting the first gating result and the first abstract syntax graph into the first graph convolution layer to obtain the first graph convolution result; and assigning the first graph convolution result to the first query vector, first key-value vector, and first weight vector, iterating over the first self-attention layer, the first gating layer, and the first graph convolution layer to obtain the code encoding result.
  • the composition of the first self-attention layer may be a self-attention neuron, and the neuron first needs to use a position feature vector to represent the position information of each node.
  • the calculation formula of the position feature vector is as follows:
  • PE(i, 2j) = sin(i / 10000^(2j/step)); PE(i, 2j+1) = cos(i / 10000^(2j/step))
  • where step is the dimension of the preset word embedding vector (the word embedding vector is the vector obtained after the feature vector is processed by word embedding), i means that the word is the i-th member of its sequence, and j indexes the j-th dimension of the word embedding vector.
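Assuming the standard sinusoidal encoding (which matches the variables step, i, and j defined above), the position feature vectors can be computed as:

```python
import numpy as np

def position_features(seq_len, step):
    # step: dimension of the word embedding vector; row i is the
    # position feature vector of the i-th member of the sequence
    pe = np.zeros((seq_len, step))
    pos = np.arange(seq_len)[:, None].astype(float)
    div = np.power(10000.0, 2 * (np.arange(step) // 2) / step)
    pe[:, 0::2] = np.sin(pos / div[0::2])   # even dimensions j
    pe[:, 1::2] = np.cos(pos / div[1::2])   # odd dimensions j
    return pe

pe = position_features(seq_len=4, step=8)
```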
  • the code encoder fuses the position feature vector with the three input vectors to obtain a first query vector (Q), a first key-value vector (K) and a first weight vector (V).
  • the position feature vector is fused with the same input vector to obtain Q, K, and V with the same values.
  • the self-attention neuron calculates the input Q, K, and V based on the multi-head attention mechanism.
  • the calculation process of a single head is as follows:
  • head = softmax(Q K^T / sqrt(d_k)) V
  • where d_k = d/H, d is the dimension of the word embedding vector, H is the number of heads of the self-attention neuron, and T denotes the transposition operation.
  • the result calculated by the self-attention layer is the first self-attention result.
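In NumPy, the single-head computation softmax(Q K^T / sqrt(d_k)) V reads as follows:

```python
import numpy as np

def single_head_attention(Q, K, V, d_k):
    # Scaled dot-product attention for one head
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # softmax over the keys
    return w @ V

out = single_head_attention(np.eye(2), np.eye(2), np.eye(2), d_k=2)
```

The multi-head mechanism runs H such heads in parallel on slices of dimension d_k = d/H and concatenates their outputs.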
  • the code encoder inputs each node label vector of the first self-attention result into the first gating layer, wherein the first gating layer may be composed of gating neurons.
  • the gating neuron has three input parameters, the query vector q and two vectors c 1 and c 2 , where q and c 1 are assigned by the first self-attention result, and c 2 is assigned by the node label vector.
  • the calculation process of the gated neuron is as follows: for the i-th member of the sequence, a weight is calculated from the query vector q for each of the corresponding vectors c 1 and c 2 , and the output is the weighted combination of c 1 and c 2 .
  • the code encoder inputs the first gating result and the first abstract syntax graph into the first graph convolution layer to obtain the first graph convolution result, wherein the first graph convolution layer may be composed of graph convolution neurons.
  • the calculation process of this neuron can be expressed as:
  • u'_{r_s} = Σ_{r_p ∈ G} A[r_s, r_p] · W_g · u_{r_p}
  • where A is the regularized adjacency matrix of the first abstract syntax graph G; r_s and r_p denote any nodes in the graph G; u_p denotes the feature vector of the corresponding node, whose initial value is the output of the previous neuron; and W_g is a learnable weight matrix of the graph convolution network, with an arbitrary initial value.
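A minimal sketch of one such graph-convolution step, with A the adjacency matrix, U the stacked node feature vectors u_p, and W_g the learnable weight matrix (row normalisation stands in for whatever regularisation the patent applies):

```python
import numpy as np

def graph_conv(U, A, W_g):
    # Row-normalise the adjacency matrix, then aggregate each node's
    # neighbour features and project them with the learnable weights W_g
    A_norm = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
    return A_norm @ U @ W_g

A = np.array([[0.0, 1.0], [1.0, 0.0]])   # two connected nodes
U = np.array([[1.0, 0.0], [0.0, 1.0]])   # initial node features
out = graph_conv(U, A, np.eye(2))
```

With these toy inputs each node simply takes on its neighbour's features, which illustrates how the layer propagates structural information along the graph edges.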
  • the code encoder assigns the first graph convolution result to Q, K, and V, and iterates over the first self-attention layer, first gating layer, and first graph convolution layer to obtain the code encoding result.
  • the code encoder can treat a first self-attention layer, a first gating layer, and a first graph convolution layer as one group; by stacking N1 groups, N1 iterative calculations are realized.
  • the patch encoder includes: a second self-attention layer, a second gating layer, a natural language attention layer, and a second graph convolution layer. Inputting the code encoding result, each rule sequence embedding vector, each rule sequence encoding vector, and the second abstract syntax graph into the patch encoder to obtain the patch encoding result includes: obtaining the position feature vector of each node according to the preorder traversal sequence; obtaining the second query vector, second key-value vector, and second weight vector according to the rule sequence embedding vectors and position feature vectors; inputting the second query vector, second key-value vector, and second weight vector into the second self-attention layer to obtain the second self-attention result; inputting the second self-attention result and each rule sequence encoding vector into the second gating layer to obtain the second gating result; inputting the code encoding result and the second gating result into the natural language attention layer to obtain the natural language attention result; and inputting the natural language attention result and the second abstract syntax graph into the second graph convolution layer to obtain the patch encoding result.
  • the position feature vector calculated by the patch encoder is the same as the position feature vector in the code encoder; the second query vector, second key-value vector, and second weight vector are calculated in the same way as the first query vector, first key-value vector, and first weight vector, except that the preorder traversal sequence in the calculation is replaced by the rule sequence embedding vectors.
  • the second self-attention layer can consist of the same self-attention neurons as the code encoder.
  • the second gating layer can consist of the same gating neurons as the code encoder.
  • the natural language attention layer can be composed of the same self-attention neurons as the code encoder
  • the second graph convolutional layer can be composed of the same graph convolutional neurons as the code encoder.
  • consider the sequence of grammar rules r1, r2, ..., rP used to generate a partial AST in the decoding step, where P denotes the length of the sequence. These grammar rules can likewise be expressed as real vectors r1, r2, ..., rP by means of embedding. For a grammar rule i: a --> b1 ... bK, a is the parent node and b1 ... bK are the child nodes, which can be terminal or non-terminal symbols; the index i is the ID of the rule.
  • ri is the table-lookup embedding of rule ri;
  • ri(c) is the content-encoded rule representation, which additionally encodes the node information of a and its children.
  • layer normalization is also performed.
  • the patch encoder can treat a second self-attention layer, a second gating layer, a natural language attention layer, and a second graph convolution layer as one group; by stacking N2 groups, N2 iterative calculations are realized.
  • the abstract syntax tree path encoder includes: a patch attention layer, a code attention layer, and a fully connected layer. Inputting the code encoding result, the patch encoding result, and the second abstract syntax tree path into the abstract syntax tree path encoder to obtain the abstract syntax tree path encoding result includes: inputting the patch encoding result and the second abstract syntax tree path into the patch attention layer to obtain the patch attention result; inputting the code encoding result and the patch attention result into the code attention layer to obtain the code attention result; and inputting the code attention result into the fully connected layer, assigning the output of the fully connected layer to the second abstract syntax tree path, and iterating over the patch attention layer, code attention layer, and fully connected layer to obtain the abstract syntax tree path encoding result.
  • the patch attention layer can be composed of patch attention neurons, which are the same as the self-attention neurons of the code encoder; the code attention layer can be composed of code attention neurons, which are likewise the same as the self-attention neurons of the code encoder.
  • the abstract syntax tree path encoder can be a patch attention neuron, a code attention neuron, and a fully connected neuron as a group. By setting N 3 groups, N 3 iteration calculations can be realized.
  • the abstract syntax tree path encoder combines the generated patch information with the defect code description, and combines it with the corresponding abstract syntax tree path information.
  • the abstract syntax tree path refers to the depth traversal sequence from the root node to the syntax tree node to be expanded. Similar to the abstract syntax tree reader, in the abstract syntax tree path encoder we use multiple modules with the same structure (each module contains multiple sublayers). Residual connections and layer normalization are used between each sublayer.
  • the Abstract Syntax Tree Path Encoder takes as query input the non-terminal nodes to be expanded. A query node is represented as a path from the root node to the node to be expanded.
  • An abstract syntax tree attention sublayer is applied on the output of the patch encoder and features are extracted.
  • Q is computed from the query q i (path)
  • K and V are computed from the code features output by the code encoder.
  • the abstract syntax tree path encoder further incorporates features from the input description into the decoder. This combination is also achieved through an attention sublayer, where Q is computed from the output features of the abstract syntax tree attention sublayer, and K and V from the output of the code encoder.
  • the extended syntax decoder includes: a native rule proposer, a copy rule proposer, a defective subtree proposer, and a decision maker. Inputting the abstract syntax tree path encoding result into the extended syntax decoder, selecting the best rule sequence, and generating a patch template includes: inputting the abstract syntax tree path encoding result into the native rule proposer, copy rule proposer, and defective subtree proposer respectively to obtain the selection probabilities of the extension rules, where the native rule proposer generates the selection probabilities of the predefined extension rules, the copy rule proposer selects a subtree, and the defective subtree proposer selects the position of the defective subtree; inputting the abstract syntax tree path encoding result, the selection probabilities of the extension rules, the subtree of the first abstract syntax tree, and the position of the defective subtree into the decision maker to obtain the probability of the best rule; and obtaining the best rule sequence according to the probability of the best rule.
  • each proposer will give multiple alternative grammar rules, and give the estimated probability p of each grammar rule.
  • for proposer 1 there may be selection 1-1, selection 1-2, ..., selection 1-m, corresponding to p 1-1 , p 1-2 , ..., p 1-m , and so on
  • up to proposer N, with selection N-1, selection N-2, ..., selection N-m corresponding to p N-1 , p N-2 , ..., p N-m ; based on the node type of the syntax tree node being expanded, the decision maker gives the probability q of each proposer.
  • the probability of each grammar rule is finally calculated as p*q.
  • each proposer contains a logical component: for rules that are included in the proposer but cannot be used to expand the current syntax tree node (for example, when the left-hand node of the grammar rule has a different type from the current node), the logical component resets the probability of the corresponding grammar rule to 0.
  • this logical component thus ensures that the final probability of any inapplicable grammar rule proposed by the proposer is 0, which also guarantees the grammatical correctness of the patches generated by this application.
  • the implementation of this application contains three proposers and one decision maker.
  • the first proposer is the native rule proposer (Rule Predictor), which is used to estimate the selection probability of predefined extension rules.
  • the second proposer is the copy rule proposer, which is used to select a suitable subtree in the subtree copy operation.
  • the third proposer is the subtree proposer, which is used to select the defective subtree position when expanding the change node.
  • the decision maker outputs the selection probabilities of the three proposers, which are combined with the probabilities generated by each proposer to output the probability of the best grammar rule.
  • This application iteratively generates a complete rule sequence starting from a special initial rule.
  • this application proposes a proposer/decider structure to estimate the probability of expanding the rule at each step.
  • the function of the proposer is to provide a set of different available rules, and each rule has its corresponding probability of being selected.
  • the function of the decider is to provide the selection probability of the different proposers; for options provided by illegal proposers, the decider resets the probability to 0.
  • the final probability of a grammar rule is obtained by multiplying the probability provided by the decider with the probability provided by the proposer.
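The p*q combination can be sketched as follows; the rule IDs and the two-proposer setup are illustrative only:

```python
def best_rule(proposals, decider_probs):
    # proposals: one {rule_id: p} dict per proposer; rules a proposer
    # cannot legally apply already carry p = 0 from its logical component.
    # decider_probs: selection probability q of each proposer.
    scores = {}
    for props, q in zip(proposals, decider_probs):
        for rule, p in props.items():
            scores[rule] = max(scores.get(rule, 0.0), p * q)  # final prob = p * q
    return max(scores, key=scores.get)

# Proposer 2's option wins: 0.8 * 0.7 = 0.56 > 0.9 * 0.3 = 0.27
choice = best_rule([{"rule_a": 0.9}, {"rule_b": 0.8}], [0.3, 0.7])
```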
  • the copy rule proposer is also used to generate the copy operation code according to the position of the corresponding subtree in the defect function after selecting the subtree of the first abstract syntax tree;
  • the defective subtree proposer is also used to generate the defective subtree code according to the position of the defective subtree after selecting it. Inputting the abstract syntax tree path encoding result, the selection probabilities of the extension rules, the subtree of the first abstract syntax tree, and the position of the defective subtree into the decision maker then includes: inputting the abstract syntax tree path encoding result, the selection probabilities of the extension rules, the copy operation code, and the defective subtree code into the decision maker.
  • in step 103, the electronic device fills the patch template with identifiers of the software defect code to generate a patch for the software defect code.
  • aiming at the deficiency that some automatic software repair techniques cannot generate project-specific identifiers, the present application proposes a technique of using placeholders in the patch template and then filling the patch template.
  • some automatic software repair techniques try to solve the problem of being unable to generate project-specific identifiers directly: the straightforward method is to let the neural network choose an appropriate identifier from the input context, but this requires the context of the entire software defect code as input to the model.
  • this application instead generates specific placeholders in the patch to stand for these project-specific identifiers; when the patch is applied to the defective program, the placeholders are instantiated as the corresponding identifiers.
  • the number of identifiers available at a given location is not large, so the placeholders have little impact on the syntactic content of the patch template.
  • in step 104, the electronic device repairs the software defect code with the patch.
  • the present application proposes an extended syntax-guided decoder that generates modification operations instead of the complete repaired code.
  • this application draws on syntax-guided decoders from the field of automatic code generation.
  • this application converts the patch into a sequence of predefined modification operations.
  • a modification operation can express a change to a small part of the program more concisely.
  • the sequence of modification operations can itself be described by a set of extended grammars that contain the original language grammar; this application therefore provides, on top of the native grammar, an extended grammar suited to modification operations, so that software defect code can be repaired with the patch.
  • the present application provides an automatic software repair method based on extended grammar rules: the software method to be repaired is converted into an abstract-syntax-tree representation, a patch template is generated by producing a sequence of grammar rules, and finally the patch template is filled in accordingly to obtain a repair plan for the software, helping developers fix defects that arise during development.
  • this application proposes an extended syntax-guided decoder that generates modification operations instead of the complete repaired code, drawing on syntax-guided decoders from the field of automatic code generation.
  • the decoder treats code generation as the expansion of the code's abstract syntax tree and estimates the probability of the next grammar-rule choice from the partially generated tree.
  • the decoder thus guarantees that generated patches satisfy the grammar of the corresponding language.
  • this application converts the patch into a sequence of predefined modification operations, which represent changes to a small part of the program more concisely.
  • the sequence of modification operations can itself be described by a set of extended grammars containing the original language grammar; the present application therefore provides, on top of the original grammar, an extended grammar suited to modification operations.
  • the first abstract syntax tree may also be traversed in-order or post-order, with each vector sequence using the corresponding traversal order.
  • the automatic software repair method of the present application was used in defect-repair experiments and achieved a comparatively high repair rate.
  • (1) first, the training set required for training the model is obtained.
  • this application crawled commit records for Java code created between March 2011 and March 2018 from GitHub repositories, and used keyword filtering to select repair-related commits that modify only a single code snippet.
  • the final training data set contains 103,585 training samples, of which 80% are used as the training set and 20% as the validation set.
  • (2) the experiments are validated on the 395 defects of the commonly used Defects4J v1.2 data set and the additional 420 defects of Defects4J v2.0.
  • the defect-localization method used in the experiments is the Ochiai algorithm based on test-case coverage, which is common in automatic software repair research; each defect is given 5 hours for patch validation.
  • TBar and SimFix are the two best-performing automatic software repair techniques on Defects4J v1.2.
  • the table lists the total number of defects each of the three techniques repairs on the two test data sets: this application repairs 11 more defects than TBar on Defects4J v1.2 and more than twice as many on Defects4J v2.0. These results show that this application has stronger repair ability and better generalization than existing techniques.
  • the implementation of the present application also relates to an automatic software repair system, as shown in Figure 6, including:
  • the template generation module 602, used to generate, according to the grammatical features of the software defect code and the trained patch template generation model, a patch template conforming to the grammar of the language used by the software defect code;
  • the patch generation module 603, used to fill the patch template and generate the patch for the software defect code;
  • the repair module 604, used to repair the software defect code with the patch.
  • in one example, before generating a patch template conforming to the grammar of the language used by the software defect code according to the grammatical features of the software defect code and the trained patch template generation model, the method further includes: determining the defective function in the software defect code based on defect localization techniques; parsing the defective function to generate a first abstract syntax tree; obtaining the pre-order traversal sequence of the first abstract syntax tree, the label vector of each node, and a first abstract syntax graph according to the feature vector, label, and adjacency matrix of each node in the first abstract syntax tree; obtaining an extended program grammar according to preset grammar rules and the grammatical features; invoking a word embedding method on the rule sequences of the extended program grammar to generate each rule-sequence embedding vector and a program patch; generating each rule-sequence encoding vector according to the encoded sequence numbers of the rule sequences; generating a second abstract syntax tree according to the program patch; and obtaining a second abstract syntax graph and a second abstract-syntax-tree path according to the second abstract syntax tree. The patch template generation model includes: a code encoder, a patch encoder, an abstract-syntax-tree path encoder, and an extended syntax decoder.
  • the code encoder includes: a first self-attention layer, a first gating layer, and a first graph convolution layer. Inputting the pre-order traversal sequence, the node label vectors, and the first abstract syntax graph into the code encoder to obtain the code encoding result includes: obtaining the position feature vector of each node according to the pre-order traversal sequence; obtaining a first query vector, a first key vector, and a first value vector according to the pre-order traversal sequence and the position feature vectors; inputting the first query, key, and value vectors into the first self-attention layer to obtain a first self-attention result; inputting the first self-attention result and each node label vector into the first gating layer to obtain a first gating result; inputting the first gating result and the first abstract syntax graph into the first graph convolution layer to obtain a first graph convolution result; and assigning the first graph convolution result to the first query, key, and value vectors, iterating over the first self-attention layer, the first gating layer, and the first graph convolution layer to obtain the code encoding result.
  • the patch encoder includes: a second self-attention layer, a second gating layer, a natural-language attention layer, and a second graph convolution layer. Inputting the code encoding result, the rule-sequence embedding vectors, the rule-sequence encoding vectors, and the second abstract syntax graph into the patch encoder to obtain the patch encoding result includes: obtaining the position feature vector of each node according to the pre-order traversal sequence; obtaining a second query vector, a second key vector, and a second value vector according to the rule-sequence embedding vectors and the position feature vectors; inputting the second query, key, and value vectors into the second self-attention layer to obtain a second self-attention result; inputting the second self-attention result and each rule-sequence encoding vector into the second gating layer to obtain a second gating result; inputting the code encoding result and the second gating result into the natural-language attention layer to obtain a natural-language attention result; inputting the natural-language attention result and the second abstract syntax graph into the second graph convolution layer to obtain a second graph convolution result; and assigning the second graph convolution result to the second query, key, and value vectors, iterating over the four layers to obtain the patch encoding result.
  • the abstract-syntax-tree path encoder includes: a patch attention layer, a code attention layer, and a fully connected layer. Inputting the code encoding result, the patch encoding result, and the second abstract-syntax-tree path into the abstract-syntax-tree path encoder to obtain the abstract-syntax-tree path encoding result includes: inputting the patch encoding result and the second abstract-syntax-tree path into the patch attention layer to obtain a patch attention result; inputting the code encoding result and the patch attention result into the code attention layer to obtain a code attention result; and inputting the code attention result into the fully connected layer, assigning the output of the fully connected layer to the second abstract-syntax-tree path, iterating over the patch attention layer, the code attention layer, and the fully connected layer to obtain the abstract-syntax-tree path encoding result.
  • the extended syntax decoder includes: a native rule proposer, a copy rule proposer, a defect subtree proposer, and a decider. Inputting the abstract-syntax-tree path encoding result into the extended syntax decoder, selecting the best rule sequence, and generating the patch template includes: inputting the abstract-syntax-tree path encoding result into the native rule proposer, the copy rule proposer, and the defect subtree proposer respectively to obtain the expansion-rule selection probabilities, where the native rule proposer generates selection probabilities for the predefined expansion rules, the copy rule proposer selects a subtree, and the defect subtree proposer selects the position of the defective subtree; inputting the abstract-syntax-tree path encoding result, the expansion-rule selection probabilities, the subtree of the first abstract syntax tree, and the defective subtree position into the decider to obtain the probability of the best rule; and obtaining the best rule sequence according to the probabilities of the best rules.
  • the copy rule proposer is also used, after selecting a subtree of the first abstract syntax tree, to generate a copy-operation encoding according to the subtree's position in the defective function;
  • the defect subtree proposer is also used, after selecting the position of the defective subtree, to generate a defective-subtree encoding according to that position;
  • inputting the abstract-syntax-tree path encoding result, the expansion-rule selection probabilities, the subtree of the first abstract syntax tree, and the defective subtree position into the decider includes: inputting the abstract-syntax-tree path encoding result, the expansion-rule selection probabilities, the copy-operation encoding, and the defective-subtree encoding into the decider.
  • the embodiment of the present application also relates to an electronic device, as shown in FIG. 7, including: at least one processor 701; and a memory 702 communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor 701 to perform any one of the foregoing method embodiments.
  • the memory 702 and the processor 701 are connected by a bus, and the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors 701 and various circuits of the memory 702 together.
  • the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
  • the bus interface provides an interface between the bus and the transceivers.
  • a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
  • the information processed by the processor 701 is transmitted on the wireless medium through the antenna, and further, the antenna also receives the information and transmits the information to the processor 701 .
  • the processor 701 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions; the memory 702 may be used to store information used by the processor when performing operations.
  • Embodiments of the present application relate to a computer-readable storage medium storing a computer program.
  • the above method embodiments are implemented when the computer program is executed by the processor.
  • the program is stored in a storage medium and includes several instructions to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

An automatic software repair method, system, electronic device, and storage medium. The automatic software repair method includes: obtaining software defect code (101); generating, according to the grammatical features of the software defect code and a trained patch template generation model, a patch template conforming to the grammar of the language used by the software defect code (102); filling the patch template to generate a patch for the software defect code (103); and repairing the software defect code with the patch (104).

Description

Automatic software repair method, system, electronic device, and storage medium
Cross-reference
This application is based on, and claims priority to, Chinese patent application No. 202110904041.3 filed on August 6, 2021, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present application relate to the field of computers, and in particular to an automatic software repair method, system, electronic device, and storage medium.
Background
Automatic software repair (Automatic Bug Fixing, "ABF") refers, strictly speaking, to a program that fixes bugs in target software by automatically generating correct repair packages without human intervention.
Traditional automatic software repair techniques mainly mine pre-stored patch templates from code repositories, or perform greedy or random search over the defective code, to generate software patches that pass the test cases. Because pre-stored patch templates are limited while programming languages are diverse, such templates generalize poorly and cannot fit all defective code; in particular, when handling new software defects, it is difficult to obtain a suitable patch from pre-stored templates and thus to complete automatic repair of the defective code.
Summary
An embodiment of the present application provides an automatic software repair method, including: obtaining software defect code; generating, according to the grammatical features of the software defect code and a trained patch template generation model, a patch template conforming to the grammar of the language used by the software defect code; filling the patch template to generate a patch for the software defect code; and repairing the software defect code with the patch.
An embodiment of the present application provides an automatic software repair system, including: an acquisition module, used to acquire software defect code; a template generation module, used to generate, according to the grammatical features of the software defect code and a trained patch template generation model, a patch template conforming to the grammar of the language used by the software defect code; a patch generation module, used to fill the patch template and generate a patch for the software defect code; and a repair module, used to repair the software defect code with the patch.
An embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the automatic software repair method described above.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the automatic software repair method described above.
Brief description of the drawings
One or more embodiments are illustrated by the figures in the corresponding drawings; these illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures are not drawn to scale.
Figure 1 is a flowchart of an automatic software repair method according to an embodiment of the present application;
Figure 2 shows the extended grammar rules for automatic software repair according to an embodiment of the present application;
Figure 3 is a schematic diagram of a patch template generation model according to an embodiment of the present application;
Figure 4 is a schematic diagram of a proposer/decider structure according to an embodiment of the present application;
Figure 5 is a schematic diagram of an implementation of the automatic software repair method according to an embodiment of the present application;
Figure 6 is a schematic diagram of an automatic software repair system according to an embodiment of the present application;
Figure 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the drawings. Those of ordinary skill in the art will appreciate, however, that many technical details are given in the embodiments to help the reader better understand the application; the claimed technical solutions can still be realized without these details and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description and does not limit specific implementations of the application; the embodiments may be combined with and refer to one another where not contradictory.
The main purpose of the embodiments of the present application is to propose an automatic software repair method, system, electronic device, and storage medium that can generate well-adapted patches for defective code written in different programming languages, improving the repair capability of automatic software repair.
An embodiment of the present application relates to an automatic software repair method, as shown in Figure 1, which specifically includes:
Step 101: obtaining software defect code;
Step 102: generating, according to the grammatical features of the software defect code and a trained patch template generation model, a patch template conforming to the grammar of the language used by the software defect code;
Step 103: filling the patch template to generate a patch for the software defect code;
Step 104: repairing the software defect code with the patch.
The automatic software repair method of this embodiment is applied in an electronic device, e.g., a computer, mobile phone, or tablet, by developing an automatic software repair system that performs the repair automatically. By generating a patch template conforming to the grammar of the language used by the defective code, according to the code's grammatical features and a trained patch template generation model, and then filling the template to produce a patch, the generated patch matches the grammar of the defective code: whatever language the defective code is written in, an adapted patch can be generated to repair it, improving the repair capability of automatic software repair.
Implementation details of the automatic software repair method of this embodiment are described below; the following details are provided only to ease understanding and are not required for implementing this solution.
In step 101, the electronic device obtains the software defect code, i.e., the faulty source code.
Specifically, the software defect code may be obtained by the electronic device from code-test results, or read from input provided by a technician.
In step 102, the electronic device generates, according to the grammatical features of the software defect code and the trained patch template generation model, a patch template conforming to the grammar of the language used by the software defect code.
Specifically, the electronic device may, based on deep-learning techniques, learn the grammar of the obtained software defect code from the code itself and invoke the trained patch template generation model to generate a patch template conforming to that grammar.
In one example, before step 102 the electronic device further: determines the defective function in the software defect code based on defect-localization techniques; parses the defective function to generate a first abstract syntax tree (AST); obtains the pre-order traversal sequence of the first AST, the label vector of each node, and a first abstract syntax graph according to the feature vector, label, and adjacency matrix of each node in the first AST; obtains an extended program grammar according to preset grammar rules and the grammatical features; invokes a word-embedding method on the rule sequences of the extended program grammar to generate each rule-sequence embedding vector and a program patch; generates each rule-sequence encoding vector according to the encoded sequence numbers of the rule sequences; generates a second abstract syntax tree according to the program patch; and obtains a second abstract syntax graph and a second abstract-syntax-tree path according to the second AST.
The patch template generation model includes: a code encoder, a patch encoder, an abstract-syntax-tree path encoder, and an extended syntax decoder. Generating the patch template for the software defect code specifically includes: inputting the pre-order traversal sequence, the node label vectors, and the first abstract syntax graph into the code encoder to obtain a code encoding result; inputting the code encoding result together with the rule-sequence embedding vectors, the rule-sequence encoding vectors, and the second abstract syntax graph into the patch encoder to obtain a patch encoding result; inputting the code encoding result, the patch encoding result, and the second abstract-syntax-tree path into the abstract-syntax-tree path encoder to obtain an abstract-syntax-tree path encoding result; inputting that result into the extended syntax decoder to select the best rule sequence; and generating the patch template according to the best rule sequence.
Here, the best rule sequence is not a complete patch sequence but a segment of one. The patch template generation model feeds this segment through further iterations of the code encoder, patch encoder, abstract-syntax-tree path encoder, and extended syntax decoder, extending the segment until it can no longer be extended or its length reaches a preset requirement, at which point iteration stops and a complete patch sequence is obtained. The model then generates the patch template from the generated complete patch sequences.
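The iterative extension of a partial rule sequence into a complete patch sequence, as described above, can be sketched as follows; the `extend` callback stands in for one pass through the encoders and decoder, and all names are illustrative assumptions rather than the patent's concrete interface:

```python
def generate_sequence(extend, start="<SOS>", max_len=50):
    """Iteratively extend a partial rule sequence until the model can no
    longer extend it (extend returns None) or the length bound is reached."""
    seq = [start]  # a preset string marks the start of the patch sequence
    while len(seq) < max_len:
        nxt = extend(seq)          # one encoder/decoder pass on the segment
        if nxt is None:            # segment cannot be extended further
            break
        seq.append(nxt)            # append the newly selected rule
    return seq
```

For example, a stub model that emits three rules and then stops yields the sequence `<SOS>, r1, r2, r3`.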
Specifically, in each generation of a complete patch sequence, the initial input to the patch encoder may be a preset string that marks the start of the patch sequence. After the extended syntax decoder generates the first segment, that segment is fed back into the patch encoder during iteration; the iterative computation extends it by appending a new segment, and the extended segment is iterated further until a complete patch sequence is obtained.
Each node of the first abstract syntax tree represents a string, and its feature vector is obtained by vectorizing that string with word-embedding techniques. The pre-order traversal sequence is obtained by pre-order traversal of the first AST, each element of the sequence being a feature vector. This application also marks each first-AST node with a label characterizing the positional relationship between the node's string and the defective code line, with four types: (1) the node lies within the defective line; (2) the node lies on the line before the defective line; (3) the node lies on the line after the defective line; (4) the node lies on another line. The label of each node is likewise converted into a node label vector via word embedding, and the node-label-vector sequence has the same node order as the pre-order traversal sequence. Because the pre-order sequence and the node label vectors do not capture the structural relations among the nodes of the first AST, this application further processes the first AST into a first abstract syntax graph, connecting each node only to its nearest left neighbor without adding any extra edges. The first abstract syntax graph is stored as the adjacency matrix of its nodes.
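The pre-order traversal and the left-neighbor syntax graph described above can be sketched minimally as follows, assuming (as an illustration, not the patent's concrete encoding) that an AST is a `(label, children)` tuple and that the graph keeps the tree edges plus one edge from each child to its nearest left sibling:

```python
def preorder(tree):
    """Pre-order traversal of an AST given as (label, [children]) tuples."""
    label, children = tree
    seq = [label]
    for child in children:
        seq += preorder(child)
    return seq

def syntax_graph(tree):
    """Edge list indexed by pre-order position: parent-child edges plus an
    edge from each node's nearest left sibling (no other extra edges)."""
    edges, counter = [], [0]
    def walk(node, parent):
        idx = counter[0]
        counter[0] += 1
        if parent is not None:
            edges.append((parent, idx))
        prev = None
        for child in node[1]:
            cidx = walk(child, idx)
            if prev is not None:
                edges.append((prev, cidx))  # nearest left neighbour
            prev = cidx
        return idx
    walk(tree, None)
    return edges
```

On the tree `("root", [("a", []), ("b", [])])` this yields the traversal `root, a, b` and edges `(0,1), (0,2), (1,2)`, the last one being the left-neighbor link.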
This application also uses stored preset grammar rules, e.g., the grammar rules of the modification operations and the grammar of the software defect code (also called the native grammar), to obtain the extended program grammar. Rule sequences parsed with this grammar are passed to a word-embedding method to generate each rule-sequence embedding vector and the program patch; a rule-sequence embedding vector encodes rule information using the rule's ID, representing the rules as real-valued vectors through embedding. Some automatic software repair methods treat grammar rules (i.e., rule sequences) as atomic tokens via rule-definition encodings, which in fact loses part of the information in the rule content. To mitigate this problem, we enhance the rule representations with encodings of the rule definitions, encoding the rule content as a vector, and generate rule-sequence encoding vectors from the encoded sequence numbers in the rule sequence. The abstract syntax tree of the partial patch template (the second abstract syntax tree) is constructed from the repair patch, and the second abstract syntax graph and the partial AST path (the second abstract-syntax-tree path) are obtained from it.
Figure 2 gives the concrete composition of the extended grammar of this application. The design does not target a specific programming language, so the native language is called HL (host language); NTS denotes the original non-terminals of the HL grammar, <HLStatement> denotes the non-terminal for statements, and <HLIdentifier> denotes the terminals of the HL grammar. The extended grammar comprises the following six rules. Rule 1 stipulates that a patch contains one or more modification operations. Rule 2 stipulates that a modification operation has two types, add and/or change. Rule 3 declares the syntax of the add operation: it inserts a newly generated statement before the defective code line, i.e., <HLIdentifier> can be expanded by the HL grammar into an expression, or an expression can be copied from the original defective function. Rule 4 declares the syntax of the change operation: it replaces part of the defective code's subtree and takes two parameters. The first parameter is the position of the subtree being replaced, expressed by its position in the pre-order traversal of the abstract syntax tree. The second parameter is the newly generated subtree that replaces the original defective subtree; the new subtree and the old subtree must share the same root node so that the program remains grammatically correct after replacement. In both operation types the model must generate new abstract syntax trees. In practice, although programs before and after replacement differ to varying degrees, large parts of them are identical; exploiting this property, the application proposes a copy operation which, when generating a new abstract syntax tree, copies an expression of the same type from the defective function. Rule 5 declares the syntax of this operation, which may be used to expand any non-terminal of the HL grammar. The copy operation takes one parameter giving the position of the abstract syntax tree to copy, again expressed via its position in the pre-order traversal sequence; the root node of the copied subtree must have the same node type as the non-terminal to preserve grammatical correctness after copying. Rule 6 declares that within a patch template a terminal can be converted into a special placeholder: when the model judges that a terminal should be expanded into a project-specific identifier, a placeholder can take its place in the patch; a terminal may also be replaced by a common identifier from the vocabulary. In the implementation of this application, identifiers occurring more than 100 times in the training set are added to the vocabulary.
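The modification-operation representation defined by these rules can be illustrated with a small data-structure sketch; the tuple shapes and field names below are illustrative assumptions, not the patent's concrete encoding:

```python
# A patch as a sequence of modification operations (rules 1-2):
patch = [
    ("change",                                  # rule 4: replace a subtree
     {"subtree_pos": 7,                         # position in pre-order order
      "new_subtree": ("if",
                      [("copy", {"src_pos": 3}),        # rule 5: copy subtree
                       ("placeholder", {})])}),          # rule 6: placeholder
    ("add",                                     # rule 3: insert before line
     {"stmt": ("return", [("placeholder", {})])}),
]

def well_formed(ops):
    """Rules 1-2: a patch is one or more add/change operations."""
    return len(ops) >= 1 and all(kind in ("add", "change") for kind, _ in ops)
```

A real decoder would emit such operations as a grammar-rule sequence; the check above only mirrors the top-level structure imposed by rules 1 and 2.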
Figure 3 is a schematic diagram of the patch template generation model built with the neural-network algorithm adopted in this application. The model comprises four components: a code encoder, which processes the abstract syntax tree of the input defective function; a patch encoder (AST encoder), which processes the abstract syntax tree of the half-generated patch; a tree path encoder, which processes the AST path from the root node to the node being expanded; and an expanded syntax decoder, which outputs, from the hidden-layer input, the probability of each grammar rule being selected.
In one example, the code encoder includes: a first self-attention layer, a first gating layer, and a first graph convolution layer. Inputting the pre-order traversal sequence, the node label vectors, and the first abstract syntax graph into the code encoder to obtain the code encoding result includes: obtaining the position feature vector of each node according to the pre-order traversal sequence; obtaining a first query vector, a first key vector, and a first value vector according to the pre-order traversal sequence and the position feature vectors; inputting the first query, key, and value vectors into the first self-attention layer to obtain a first self-attention result; inputting the first self-attention result and each node label vector into the first gating layer to obtain a first gating result; inputting the first gating result and the first abstract syntax graph into the first graph convolution layer to obtain a first graph convolution result; and assigning the first graph convolution result to the first query, key, and value vectors, iterating over the first self-attention layer, the first gating layer, and the first graph convolution layer to obtain the code encoding result.
As shown in Figure 3, the first self-attention layer may be composed of self-attention neurons; such a neuron first uses a position feature vector to characterize the position information of each node. The position feature vector is computed as follows (reconstructed from the equation images in the source, following the standard sinusoidal form consistent with the surrounding definitions):

$\mathrm{PE}_{(pos,\,2j)} = \sin\!\big(pos / 10000^{2j/step}\big)$

$\mathrm{PE}_{(pos,\,2j+1)} = \cos\!\big(pos / 10000^{2j/step}\big)$

where $pos = i + step$, $step$ is the preset dimension of the word-embedding vector (the vector obtained from the feature vector after word embedding), $i$ means the word is the $i$-th member of its sequence, and $j$ indexes the $j$-th dimension of the word-embedding vector. The code encoder fuses the position feature vectors with the three input vectors to obtain the first query vector (Q), the first key vector (K), and the first value vector (V). In this embodiment the position feature vector is fused with the same input vector, giving Q, K, and V equal values.
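The sinusoidal position features above can be sketched in plain Python; per the description, `step` doubles as the embedding dimension and offsets the position (`pos = i + step`), which is this text's convention rather than the usual `pos = i`:

```python
import math

def position_features(seq_len, step):
    """Sinusoidal position features: dimension 2j holds
    sin(pos / 10000**(2j/step)) and 2j+1 the matching cosine,
    with pos = i + step for the i-th token."""
    feats = []
    for i in range(seq_len):
        pos = i + step
        row = [0.0] * step
        for j in range(0, step, 2):
            angle = pos / (10000 ** (j / step))
            row[j] = math.sin(angle)
            if j + 1 < step:
                row[j + 1] = math.cos(angle)
        feats.append(row)
    return feats
```

Every entry lies in [-1, 1], and adjacent dimensions form sine/cosine pairs at the same frequency.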
The self-attention neuron computes over the input Q, K, V using a multi-head attention mechanism; a single head computes (reconstructed from the equation image, the standard scaled dot-product form):

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\big(QK^{T}/\sqrt{d_k}\big)\,V$

where $d_k = d/H$, $d$ is the dimension of the word-embedding vector, $H$ is the number of heads of the self-attention neuron, and $T$ denotes transposition. The result computed by the self-attention layer is the first self-attention result.
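A single attention head following the formula above can be sketched in plain, unbatched Python (an illustration of the computation, not the model's optimized implementation):

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention for one head:
    softmax(Q K^T / sqrt(d_k)) V, with rows as vectors."""
    d_k = len(Q[0])
    out = []
    for q in Q:
        # similarity of the query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        m = max(scores)                       # stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # weighted combination of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With a single key, the softmax weight is 1 and the output equals the value row, a quick sanity check on the implementation.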
The code encoder inputs the first self-attention result and each node label vector into the first gating layer, which may be composed of gating neurons. A gating neuron takes three input parameters: a query vector q and two vectors c1 and c2, where q and c1 are assigned from the first self-attention result and c2 from the node label vectors. [The gating computation is given in the source as equation images (PCTCN2022091008-appb-000004 through appb-000009), which did not survive extraction; per the surrounding text, they define weights α computed for the corresponding vectors, with the feature vectors of c1 and c2 after a first fully connected layer and after a second fully connected layer, i denoting the i-th member of the sequence, and the gated output combining these features weighted by α.]
The code encoder inputs the first gating result and the first abstract syntax graph into the first graph convolution layer to obtain the first graph convolution result; this layer may be composed of graph-convolution neurons, whose computation can be written (reconstructed from the equation image, consistent with a standard graph-convolution step) as:

$y_{p} = \sum_{r_s \in G} A_{p,s}\, W_g\, u_{s}$

where $A$ is the normalized adjacency matrix of the first abstract syntax graph $G$, $r_s$ and $r_p$ denote any nodes of $G$, $u_s$ is the feature vector of the corresponding node (initialized to the previous neuron's output, i.e., the node's $h_i$ vector), and $W_g$ is a weight matrix of the graph convolution network that can be learned by the neural network, with arbitrary initial values.
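One graph-convolution step of the form above can be sketched with dense matrices (an illustrative, dependency-free version; real implementations use sparse adjacency and batched tensors):

```python
def graph_conv(A, U, Wg):
    """One graph-convolution step y = A . U . Wg, where A is the normalized
    adjacency matrix of the syntax graph and U stacks the node vectors."""
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))]
                for i in range(len(X))]
    return matmul(matmul(A, U), Wg)
```

With identity adjacency and identity weights, each node keeps its own features, which checks the wiring.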
The code encoder assigns the first graph convolution result to Q, K, and V, and iterates over the first self-attention layer, the first gating layer, and the first graph convolution layer to obtain the code encoding result. Treating one first self-attention layer, one first gating layer, and one first graph convolution layer as a group, the code encoder realizes N1 iterations by stacking N1 groups.
In one example, the patch encoder includes: a second self-attention layer, a second gating layer, a natural-language attention layer, and a second graph convolution layer. Inputting the code encoding result together with the rule-sequence embedding vectors, the rule-sequence encoding vectors, and the second abstract syntax graph into the patch encoder to obtain the patch encoding result includes: obtaining the position feature vector of each node according to the pre-order traversal sequence; obtaining a second query vector, a second key vector, and a second value vector according to the rule-sequence embedding vectors and the position feature vectors; inputting the second query, key, and value vectors into the second self-attention layer to obtain a second self-attention result; inputting the second self-attention result and each rule-sequence encoding vector into the second gating layer to obtain a second gating result; inputting the code encoding result and the second gating result into the natural-language attention layer to obtain a natural-language attention result; inputting the natural-language attention result and the second abstract syntax graph into the second graph convolution layer to obtain a second graph convolution result; and assigning the second graph convolution result to the second query, key, and value vectors, iterating over the second self-attention layer, the second gating layer, the natural-language attention layer, and the second graph convolution layer to obtain the patch encoding result.
The position feature vectors computed by the patch encoder are the same as those of the code encoder, and the second query, key, and value vectors are computed exactly as the first ones, with the pre-order traversal sequence replaced by the rule-sequence embedding vectors. The second self-attention layer may consist of the same self-attention neurons as the code encoder, the second gating layer of the same gating neurons, the natural-language attention layer of the same self-attention neurons, and the second graph convolution layer of the same graph-convolution neurons.
In one example, between the second gating layer and the natural-language attention layer there is a grammar-rule sequence r1, r2, …, rP used to generate the partial AST in the decoding step, where P is the length of the sequence. These grammar rules can also be represented as real-valued vectors r1, r2, …, rP via embedding; for grammar rule i: a → b1…bK, a is the parent node and b1…bK are its child nodes, which may be terminals or non-terminals, and the index i is the rule's ID. We encode the rule content as a vector r(c) using a fully connected layer whose input consists of the vectors of a and b1…bK; in particular, this sequence is padded to the maximum length. The rule-definition features y1(rule), …, yP(rule) are then computed by another fully connected layer (reconstructed from the equation image; the concatenation form is an inference from the surrounding text):

$y_i^{(rule)} = \mathrm{FC}\big(\,[\,r_i \,;\; r_i^{(c)} \,;\; a\,]\,\big)$

where $r_i$ is the table-lookup embedding of rule $r_i$, $r_i^{(c)}$ is the content-encoded rule representation, and the parent-node information $a$ is encoded once more. Layer normalization is applied after this step.
The patch encoder may treat one second self-attention layer, one second gating layer, one natural-language attention layer, and one second graph convolution layer as a group, realizing N2 iterations by stacking N2 groups.
In one example, the abstract-syntax-tree path encoder includes: a patch attention layer, a code attention layer, and a fully connected layer. Inputting the code encoding result, the patch encoding result, and the second abstract-syntax-tree path into the abstract-syntax-tree path encoder to obtain the abstract-syntax-tree path encoding result includes: inputting the patch encoding result and the second abstract-syntax-tree path into the patch attention layer to obtain a patch attention result; inputting the code encoding result and the patch attention result into the code attention layer to obtain a code attention result; and inputting the code attention result into the fully connected layer, assigning the output of the fully connected layer to the second abstract-syntax-tree path, iterating over the patch attention layer, the code attention layer, and the fully connected layer to obtain the abstract-syntax-tree path encoding result.
The patch attention layer may consist of patch-attention neurons identical to the code encoder's self-attention neurons, and the code attention layer of code-attention neurons likewise identical to them. The abstract-syntax-tree path encoder may treat one patch-attention neuron, one code-attention neuron, and one fully connected neuron as a group, realizing N3 iterations by stacking N3 groups.
The abstract-syntax-tree path encoder combines the generated patch information with the defective-code description and with the corresponding AST path information. The AST path is the depth-first traversal sequence from the root node to the syntax-tree node to be expanded. As with the AST reader, the path encoder uses multiple identically structured modules (each containing several sub-layers), with residual connections and layer normalization between sub-layers. The path encoder takes the non-terminal node to be expanded as the query input, representing the query node as the path from the root to that node; the nodes on this path are represented as real-valued vectors and passed through a fully connected layer whose output is qi(path). Two attention sub-layers with the same structure as in the code encoder then combine the outputs of the code encoder and the patch encoder.
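Extracting the root-to-node path that this encoder consumes can be sketched as follows, assuming (as an illustrative representation) ASTs as `(label, children)` tuples and a unique target label:

```python
def path_to_node(tree, target):
    """Depth-first path of labels from the root to the node to expand;
    returns [] if the target label is not in the tree."""
    label, children = tree
    if label == target:
        return [label]
    for child in children:
        sub = path_to_node(child, target)
        if sub:                       # target found in this subtree
            return [label] + sub
    return []
```

Each label on the returned path would then be embedded and fed through the fully connected layer that produces qi(path).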
An AST attention sub-layer is applied on the output of the patch encoder to extract features; in this layer, Q is computed from the query qi(path), while K and V are computed from the code features output by the code encoder. The abstract-syntax-tree path encoder further integrates features from the input description into the decoder, again through an attention sub-layer, where Q is computed from the output features of the AST attention sub-layer, and K and V from the output of the code encoder. Finally, two fully connected layers are used, the first with a GELU activation function, and features are extracted for prediction.
In one example, the extended syntax decoder includes: a native rule proposer, a copy rule proposer, a defect subtree proposer, and a decider. Inputting the abstract-syntax-tree path encoding result into the extended syntax decoder, selecting the best rule sequence, and generating the patch template includes: inputting the abstract-syntax-tree path encoding result into the native rule proposer, the copy rule proposer, and the defect subtree proposer respectively to obtain the expansion-rule selection probabilities, where the native rule proposer generates selection probabilities for the predefined expansion rules, the copy rule proposer selects a subtree, and the defect subtree proposer selects the position of the defective subtree; inputting the abstract-syntax-tree path encoding result, the expansion-rule selection probabilities, the subtree of the first abstract syntax tree, and the defective subtree position into the decider to obtain the probability of the best rule; and obtaining the best rule sequence according to the probabilities of the best rules.
Figure 4 is a schematic diagram of the proposer/decider structure proposed in this application. When expanding the abstract syntax tree, each proposer gives several candidate grammar rules, each with an estimated probability p: proposer 1 may offer choices 1-1, 1-2, …, 1-m with probabilities p1-1, p1-2, …, p1-m, and so on up to proposer N with choices N-1, …, N-m and probabilities pN-1, …, pN-m. Based on the node type of the syntax tree being expanded, the decider gives each proposer a probability q: q1 for proposer 1 through qN for proposer N. The probability of each grammar rule is finally computed as p*q.
Each proposer contains a logic component: for rules contained in that proposer that cannot be used to expand the current syntax-tree node (e.g., because the left-hand node of the grammar rule has a different type from the current node), the component resets the corresponding rule's probability to 0.
A similar logic component exists in the decider: for proposers that cannot be used for the corresponding node, it likewise resets the corresponding probability to 0, so that the final probability of any grammar rule proposed by that proposer is 0. This also guarantees the grammatical correctness of the patches generated by this application.
The implementation of this application contains three proposers and one decider. The first proposer is the native rule proposer (Rule Predictor), which estimates the selection probabilities of the predefined expansion rules. The second is the copy rule proposer, which selects a suitable subtree for the subtree-copy operation. The third is the subtree proposer, which selects the position of the defective subtree when expanding a change node. Finally, the decider outputs the selection probabilities of the three proposers, combines them with the probabilities generated by each proposer, and outputs the probability of the best grammar rule; this application iteratively generates the complete rule sequence starting from a special start rule.
Compared with code generation, such a decoder is hard to transfer directly and simply to generating sequences of modification operations: the extended grammar contains special non-terminals with different expansion rules, and the modification operations must satisfy grammatical constraints that the original decoders cannot enforce. This application therefore proposes a proposer/decider structure to estimate the probability of each expansion rule at every step. The function of a proposer is to provide a set of available rules, each rule with its corresponding selection probability; the function of the decider is to provide the selection probabilities of the different proposers, and for options provided by illegal proposers the decider sets the probability to 0, so that the final probability of a grammar rule is obtained by multiplying the decider's probability with the proposer's probability.
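The proposer/decider combination described above (final rule probability = decider probability x proposer probability, with illegal options masked to 0) can be sketched as follows; the dictionary layout and the `legal` predicate are illustrative assumptions:

```python
def combine(proposals, decider_q, legal):
    """Combine proposer distributions with the decider's gate.

    proposals: {proposer_name: {rule: p}} per-proposer rule probabilities.
    decider_q: {proposer_name: q} decider probability per proposer.
    legal(name, rule): False for (proposer, rule) pairs that cannot expand
    the current node; their contribution is masked to 0.
    Returns a renormalized {rule: probability} distribution.
    """
    scores = {}
    for name, rules in proposals.items():
        q = decider_q.get(name, 0.0)
        for rule, p in rules.items():
            contrib = q * p if legal(name, rule) else 0.0
            scores[rule] = scores.get(rule, 0.0) + contrib
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()} if z else scores
```

For instance, with a native proposer gated at 0.8 and a copy proposer at 0.2, masking one native rule leaves the remaining legal rules sharing the renormalized mass.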
将代码生成看作代码抽象语法树的扩展过程,根据生成的部分抽象语法树去估计下一步语法规则选择的概率,采用该解码器能够使得生成的补丁一定能够满足对应语言的语法。
在一个例子中,拷贝规则提案器,还用于在选择第一抽象语法树的子树后,根据子树对应在缺陷函数中的位置,生成拷贝操作编码;缺陷子树提案器,还用于在选择具有缺陷的子树位置后,根据具有缺陷的子树位置,生成缺陷子树编码;将抽象语法树路径编码结果、扩展规则选择概率、第一抽象语法树的子树和具有缺陷的子树位置输入决策器,包括:将抽象语法树路径编码结果、扩展规则选择概率、拷贝操作编码、缺陷子树编码输入决策器。
在步骤103中,电子设备将软件缺陷代码的标识符,填充补丁模板,生成软件缺陷代码的补丁。
具体地,本申请针对一些软件自动修复技术无法生成项目特定的标识符的不足,提出了在补丁模板中使用占位符,再填充补丁模板的技术。一些软件自动修复技术解决无法生成项目特定的标识符的直接的方法是让神经网络从输入的上下文中选择合适的标识符,但这需要将整个软件缺陷代码的上下文当作模型的输入,目前没有神经网络能够处理如此庞大的输入。本申请提出在补丁中生成一些特定的占位符来代替这些项目特定的标识符,在补丁被应用于缺陷程序上时,这些占位符会被实例化为对应的标识符,通过考虑程序中的类型约束等,对于一个位置的可用标识符的数量不会太多,因此占位符不会对补丁模板的语法内容产生太多影响。
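The placeholder-instantiation idea above — placeholders generated in the patch template and later instantiated with the type-compatible identifiers visible at the defect site — can be sketched as follows; the `<ph:type>` token syntax and the candidate table are illustrative assumptions, not the patent's concrete format:

```python
import itertools
import re

def instantiate(template, candidates):
    """Yield every concrete patch obtained by replacing each <ph:type>
    placeholder with one of the identifiers available for that type."""
    holes = re.findall(r"<ph:(\w+)>", template)
    if not holes:
        yield template
        return
    for combo in itertools.product(*(candidates[t] for t in holes)):
        patch, values = template, iter(combo)
        for t in holes:
            # fill placeholders left to right, one occurrence at a time
            patch = patch.replace("<ph:%s>" % t, next(values), 1)
        yield patch
```

Because type constraints keep the candidate sets small, the enumeration stays manageable: two `int` identifiers for two placeholders give only four candidate patches.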
在步骤104中,电子设备用补丁修复软件缺陷代码。
具体地,本申请提出采用扩展语法制导的解码器来生成修改操作而不是完整的修复后的代码。本申请借鉴了代码自动生成领域中的语法制导的解码器,同时,针对重复生成复杂表达式的问题,本申请将补丁转化为预定义修改操作的序列,修改操作能够较为简洁地表示程序小部分上的修改。为了能够让解码器能够依照语法生成对应的修改操作,注意到修改操作的序列同样能被一组包含原本语言语法的扩展语法所描述。因此,本申请在原生语法的基础上给出了 一种适用于修改操作的扩展语法,从而可以用补丁修复软件缺陷代码。
如图5所示,本申请提供一种基于扩展语法规则的软件自动修复方法,将待修复软件方法转化为抽象语法树的表达方式,同时利用生成语法规则序列的方法生成补丁模板,最后相应地填充补丁模板,从而获取待修复软件的修复方案,帮助开发人员修复开发过程中出现的软件缺陷。本申请针对现有基于深度学习的软件自动修复技术可能生成语法不正确以及补丁表达形式不够简洁的不足,提出采用扩展语法制导的解码器来生成修改操作而不是完整的修复后的代码。本申请借鉴了代码自动生成领域中的语法制导的解码器,该解码器将代码生成看作代码抽象语法树的扩展过程,根据生成的部分抽象语法树去估计下一步语法规则选择的概率,采用该解码器能够使得生成的补丁一定能够满足对应语言的语法。同时,针对重复生成复杂表达式的问题,本申请将补丁转化为预定义修改操作的序列,修改操作能够较为简洁地表示程序小部分上的修改。为了能够让解码器能够依照语法生成对应的修改操作,注意到修改操作的序列同样能被一组包含原本语言语法的扩展语法所描述。因此,本申请在原有语法的基础上给出了一种适用于修改操作的扩展语法。
在一个例子中,也可以以中序或者后序对第一抽象语法树进行遍历,并将各向量序列也对应使用对应的遍历顺序。
在一个例子中,本申请的软件自动修复方法被用于进行缺陷修复实验,具体有较高的修复率。
(1)首先需要获取训练模型所需的训练集。本申请从Github代码仓库中爬取了创建时间在2011年3月到2018年3月之间的Java语法的提交记录,并采用关键词筛选的办法从中筛选出和修复相关的代码提交记录并且仅修改一个代码片段的提交记录。最终的训练数据集共包含103585条训练数据,其中80%作为训练集,20%作为验证集。
(2)本申请实验的验证采用常用的缺陷数据集Defects4J v1.2的395个缺陷和Defects4J v2.0的额外的420个缺陷。实验中采用的缺陷定位方法是在软件自动修复研究常用的基于测试样例覆盖情况的Ochiai算法。每个缺陷给予5个小时的补丁验证时间。
(3)下表列出了本申请在实验数据上的修复结果。
                 Defects4J v1.2    Defects4J v2.0
  TBar                 42                 8
  SimFix               34                 2
  This application     53                19
TBar和Simfix是两种在Defects4J v1.2性能表现最好的两个软件自动修复技术,表格中分别列出了三项技术在两个测试数据集上修复的缺陷总数。从表格中可以看出,本申请在Defects4J v1.2上比TBar多修复了11个缺陷,在Defects4J v2.0上多修复了一倍的缺陷。这些结果说明本申请能够比现有的技术具有更强的修复能力,同时具有更好的泛化性。
本申请实施方式还涉及一种软件自动修复系统,如图6所示,包括:
获取模块601,用于获取软件缺陷代码;
模板生成模块602,用于根据软件缺陷代码的语法特征和训练好的补丁模板生成模型,生成符合软件缺陷代码所使用语言的语法的补丁模板;
补丁生成模块603,填充补丁模板,生成软件缺陷代码的补丁;
修复模块604,用于用补丁修复软件缺陷代码。
在一个例子中,在根据软件缺陷代码的语法特征和训练好的补丁模板生成模型,生成符合软件缺陷代码所使用语言的语法的补丁模板前,方法还包括:基于缺陷定位技术,在软件缺陷代码中确定缺陷函数;解析缺陷函数,生成第一抽象语法树;根据第一抽象语法树中每一节点的特征向量、标签及邻接矩阵,得到第一抽象语法树的前序遍历序列、各节点标签向量及第一抽象语法图;根据预设的语法规则和语法特征,得到扩展程序语法;根据扩展程序语法的规则序列,调用词嵌入方法,生成各规则序列嵌入向量和程序补丁;根据规则序列的编码序号,生成各规则序列编码向量;根据程序补丁,生成第二抽象语法树;根据第二抽象语法树,得到第二抽象语法图及第二抽象语法树路径;补丁模板生成模型,包括:代码编码器、补丁编码器、抽象语法树路径编码器和扩展语法解码器;生成符合软件缺陷代码所使用语言的语法的补丁模板,包括:将前序遍历序列、各节点标签向量及第一抽象语法图输入代码编码器,得到代码编 码结果;将代码编码结果和各规则序列嵌入向量、各规则序列编码向量、第二抽象语法图输入补丁编码器,得到补丁编码结果;将代码编码结果、代码编码结果和第二抽象语法树路径输入抽象语法树路径编码器,得到抽象语法树路径编码结果;将抽象语法树路径编码结果输入扩展语法解码器,选取最佳规则序列;根据最佳规则序列,生成补丁模板。
在一个例子中,代码编码器,包括:第一自注意力层、第一门控层和第一图卷积层;将前序遍历序列、各节点标签向量及第一抽象语法图输入代码编码器,得到代码编码结果,包括:根据前序遍历序列,获取各节点的位置特征向量;根据前序遍历序列与位置特征向量,获取第一问询向量、第一键值向量及第一权值向量;将第一问询向量、第一键值向量及第一权值向量输入第一自注意力层,得到第一自注意力结果;将第一自注意力结果各节点标签向量输入第一门控层,得到第一门控结果;将第一门控结果与第一抽象语法图输入第一图卷积层,得到第一图卷积结果;将第一图卷积结果赋值给第一问询向量、第一键值向量及第一权值向量,对第一自注意力层、第一门控层和第一图卷积层进行迭代计算,得到代码编码结果。
在一个例子中,补丁编码器,包括:第二自注意力层、第二门控层、自然语言注意力层和第二图卷积层;将代码编码结果和各规则序列嵌入向量、各规则序列编码向量、第二抽象语法图输入补丁编码器,得到补丁编码结果,包括:根据前序遍历序列,获取各节点的位置特征向量;根据规则序列嵌入向量与位置特征向量,获取第二问询向量、第二键值向量及第二权值向量;将第二问询向量、第二键值向量及第二权值向量输入第二自注意力层,得到第二自注意力结果;将第二自注意力结果和各规则序列编码向量输入第二门控层,得到第二门控结果;将代码编码结果及第二门控结果输入自然语言注意力层,得到自然语言注意力结果;将自然语言注意力结果和第二抽象语法图输入第二图卷积层,得到第二图卷积结果;将第二图卷积结果赋值给第二问询向量、第二键值向量及第二权值向量,对第二自注意力层、第二门控层、自然语言注意力层和第二图卷积层进行迭代计算,得到补丁编码结果。
在一个例子中,抽象语法树路径编码器,包括:补丁注意力层、代码注意力层和全连接层;将代码编码结果、代码编码结果和第二抽象语法树路径输入 抽象语法树路径编码器,得到抽象语法树路径编码结果,包括:将补丁编码结果与第二抽象语法树路径输入补丁注意力层,得到补丁注意力结果;将代码编码结果与补丁注意力结果输入代码注意力层,得到代码注意力结果;将代码注意力结果输入全连接层,将全连接层输出结果赋值给第二抽象语法树路径,对补丁注意力层、代码注意力层和全连接层进行迭代计算,得到抽象语法树路径编码结果。
在一个例子中,扩展语法解码器,包括:原生规则提案器、拷贝规则提案器、缺陷子树提案器和决策器;将抽象语法树路径编码结果输入扩展语法解码器,选取最佳规则序列,生成补丁模板,包括:将抽象语法树路径编码结果分别输入原生规则提案器、拷贝规则提案器及缺陷子树提案器,得到扩展规则选择概率;其中,原生规则提案器用于生成预定义的扩展规则的选择概率,拷贝规则提案器用于选择子树,缺陷子树提案器选择具有缺陷的子树位置;将抽象语法树路径编码结果、扩展规则选择概率、第一抽象语法树的子树和具有缺陷的子树位置输入决策器,获取最佳规则的概率;根据最佳规则的概率,得到最佳规则序列。
在一个例子中,拷贝规则提案器,还用于在选择第一抽象语法树的子树后,根据子树对应在缺陷函数中的位置,生成拷贝操作编码;缺陷子树提案器,还用于在选择具有缺陷的子树位置后,根据具有缺陷的子树位置,生成缺陷子树编码;将抽象语法树路径编码结果、扩展规则选择概率、第一抽象语法树的子树和具有缺陷的子树位置输入决策器,包括:将抽象语法树路径编码结果、扩展规则选择概率、拷贝操作编码、缺陷子树编码输入决策器。
本申请的实施例还涉及一种电子设备,如图7所示,包括:至少一个处理器701;与至少一个处理器通信连接的存储器702;其中,存储器702存储有可被至少一个处理器701执行的指令,指令被至少一个处理器701执行上述的任一方法实施例。
其中,存储器702和处理器701采用总线方式连接,总线可以包括任意数量的互联的总线和桥,总线将一个或多个处理器701和存储器702的各种电路连接在一起。总线还可以将诸如外围设备、稳压器和功率管理电路等之类的各 种其他电路连接在一起,这些都是本领域所公知的,因此,本文不再对其进行进一步描述。总线接口在总线和收发机之间提供接口。收发机可以是一个元件,也可以是多个元件,比如多个接收器和发送器,提供用于在传输介质上与各种其他装置通信的单元。经处理器701处理的信息通过天线在无线介质上进行传输,进一步,天线还接收信息并将信息传送给处理器701。
处理器701负责管理总线和通常的处理,还可以提供各种功能,包括定时,外围接口,电压调节、电源管理以及其他控制功能。而存储器702可以被用于存储处理器在执行操作时所使用的信息。
本申请的实施例涉及一种计算机可读存储介质,存储有计算机程序。计算机程序被处理器执行时实现上述方法实施例。
即,本领域技术人员可以理解,实现上述实施例方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
本领域的普通技术人员可以理解,上述各实施方式是实现本申请的具体实施例,而在实际应用中,可以在形式上和细节上对其作各种改变,而不偏离本申请的精神和范围。

Claims (10)

  1. An automatic software repair method, comprising:
    obtaining software defect code;
    generating, according to grammatical features of the software defect code and a trained patch template generation model, a patch template conforming to the grammar of the language used by the software defect code;
    filling the patch template to generate a patch for the software defect code;
    repairing the software defect code with the patch.
  2. 根据权利要求1所述的软件自动修复方法,其中,在所述根据所述软件缺陷代码的语法特征和训练好的补丁模板生成模型,生成符合所述软件缺陷代码所使用语言的语法的补丁模板前,所述方法还包括:
    基于缺陷定位技术,在所述软件缺陷代码中确定缺陷函数;
    解析所述缺陷函数,生成第一抽象语法树;
    根据所述第一抽象语法树中每一节点的特征向量、标签及邻接矩阵,得到第一抽象语法树的前序遍历序列、各节点标签向量及第一抽象语法图;
    根据预设的语法规则和所述语法信息,得到扩展程序语法;
    根据所述扩展程序语法的规则序列,调用词嵌入方法,生成各规则序列嵌入向量和程序补丁;
    根据所述规则序列的编码序号,生成各规则序列编码向量;
    根据所述程序补丁,生成第二抽象语法树;
    根据所述第二抽象语法树,得到第二抽象语法图及第二抽象语法树路径;
    所述补丁模板生成模型,包括:代码编码器、补丁编码器、抽象语法树路径编码器和扩展语法解码器;
    所述生成符合所述软件缺陷代码所使用语言的语法的补丁模板,包括:
    将所述前序遍历序列、各所述节点标签向量及所述第一抽象语法图输入所述代码编码器,得到代码编码结果;
    将所述代码编码结果和各所述规则序列嵌入向量、各所述规则序列编码向量、所述第二抽象语法图输入所述补丁编码器,得到补丁编码结果;
    将所述代码编码结果、所述代码编码结果和所述第二抽象语法树路径输入 所述抽象语法树路径编码器,得到抽象语法树路径编码结果;
    将所述抽象语法树路径编码结果输入所述扩展语法解码器,选取最佳规则序列;
    根据所述最佳规则序列,生成所述补丁模板。
  3. 根据权利要求2所述的软件自动修复方法,其中,所述代码编码器,包括:第一自注意力层、第一门控层和第一图卷积层;
    所述将所述前序遍历序列、各所述节点标签向量及所述第一抽象语法图输入所述代码编码器,得到代码编码结果,包括:
    根据所述前序遍历序列,获取各节点的位置特征向量;
    根据前序遍历序列与所述位置特征向量,获取第一问询向量、第一键值向量及第一权值向量;
    将所述第一问询向量、所述第一键值向量及所述第一权值向量输入所述第一自注意力层,得到第一自注意力结果;
    将所述第一自注意力结果各所述节点标签向量输入所述第一门控层,得到第一门控结果;
    将所述第一门控结果与所述第一抽象语法图输入所述第一图卷积层,得到第一图卷积结果;
    将所述第一图卷积结果赋值给所述第一问询向量、所述第一键值向量及所述第一权值向量,对所述第一自注意力层、第一门控层和第一图卷积层进行迭代计算,得到所述代码编码结果。
  4. 根据权利要求2或3所述的软件自动修复方法,其中,所述补丁编码器,包括:第二自注意力层、第二门控层、自然语言注意力层和第二图卷积层;
    所述将所述代码编码结果和各所述规则序列嵌入向量、各所述规则序列编码向量、所述第二抽象语法图输入所述补丁编码器,得到补丁编码结果,包括:
    根据所述前序遍历序列,获取各节点的位置特征向量;
    根据所述规则序列嵌入向量与所述位置特征向量,获取第二问询向量、第二键值向量及第二权值向量;
    将所述第二问询向量、所述第二键值向量及所述第二权值向量输入所述第二自注意力层,得到第二自注意力结果;
    将所述第二自注意力结果和各所述规则序列编码向量输入所述第二门控层,得到第二门控结果;
    将所述代码编码结果及所述第二门控结果输入所述自然语言注意力层,得到自然语言注意力结果;
    将所述自然语言注意力结果和所述第二抽象语法图输入所述第二图卷积层,得到第二图卷积结果;
    将所述第二图卷积结果赋值给所述第二问询向量、所述第二键值向量及所述第二权值向量,对所述第二自注意力层、第二门控层、所述自然语言注意力层和第二图卷积层进行迭代计算,得到所述补丁编码结果。
  5. 根据权利要求2至4中任意一项所述的软件自动修复方法,其中,所述抽象语法树路径编码器,包括:补丁注意力层、代码注意力层和全连接层;
    所述将所述代码编码结果、所述代码编码结果和所述第二抽象语法树路径输入所述抽象语法树路径编码器,得到抽象语法树路径编码结果,包括:
    将所述补丁编码结果与所述第二抽象语法树路径输入所述补丁注意力层,得到补丁注意力结果;
    将所述代码编码结果与所述补丁注意力结果输入所述代码注意力层,得到代码注意力结果;
    将所述代码注意力结果输入所述全连接层,将所述全连接层输出结果赋值给所述第二抽象语法树路径,对所述补丁注意力层、所述代码注意力层和所述全连接层进行迭代计算,得到所述抽象语法树路径编码结果。
  6. The automatic software repair method according to any one of claims 2 to 5, wherein the extended grammar decoder comprises: a native rule proposer, a copy rule proposer, a defect subtree proposer, and a decider;
    the inputting the abstract syntax tree path encoding result into the extended grammar decoder to select an optimal rule sequence comprises:
    inputting the abstract syntax tree path encoding result into the native rule proposer, the copy rule proposer, and the defect subtree proposer respectively to obtain expansion rule selection probabilities, wherein the native rule proposer is configured to generate selection probabilities of predefined expansion rules, the copy rule proposer is configured to select subtrees of the first abstract syntax tree, and the defect subtree proposer is configured to select the position of the defective subtree;
    inputting the abstract syntax tree path encoding result, the expansion rule selection probabilities, the subtrees of the first abstract syntax tree, and the position of the defective subtree into the decider to obtain the probability of the optimal rule; and
    obtaining the optimal rule sequence according to the probability of the optimal rule.
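Claim 6's decider must turn three proposers' outputs into a single rule choice. One common way to realize this (an assumption for illustration, not the patent's stated formula) is a gated mixture: each proposer emits a distribution over its own candidates, and the decider weights and concatenates them into one distribution over all expansion rules:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decide(native_logits, copy_logits, defect_logits, mix_logits):
    # mix_logits would be derived from the AST path encoding result in the claim;
    # here it is a free parameter of the sketch.
    mix = softmax(mix_logits)                 # one weight per proposer
    probs = np.concatenate([
        mix[0] * softmax(native_logits),      # predefined expansion rules
        mix[1] * softmax(copy_logits),        # copy a subtree of the first AST
        mix[2] * softmax(defect_logits),      # mark a defective subtree position
    ])
    return probs                              # sums to 1 over all candidates

native = np.array([2.0, 0.1, 0.1])            # 3 predefined rules (toy logits)
copy_ = np.array([0.5, 0.5])                  # 2 copyable subtrees
defect = np.array([1.0])                      # 1 defective-subtree position
probs = decide(native, copy_, defect, np.array([1.0, 0.0, 0.0]))
best = int(np.argmax(probs))                  # index of the optimal rule
```

Repeating the argmax step once per decoding position yields the optimal rule sequence of the claim.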
  7. The automatic software repair method according to claim 6, wherein the copy rule proposer is further configured to, after selecting a subtree of the first abstract syntax tree, generate a copy operation encoding according to the position of the subtree in the defective function;
    the defect subtree proposer is further configured to, after selecting the position of the defective subtree, generate a defect subtree encoding according to the position of the defective subtree; and
    the inputting the abstract syntax tree path encoding result, the expansion rule selection probabilities, the subtrees of the first abstract syntax tree, and the position of the defective subtree into the decider comprises:
    inputting the abstract syntax tree path encoding result, the expansion rule selection probabilities, the copy operation encoding, and the defect subtree encoding into the decider.
  8. An automatic software repair system, comprising:
    an acquisition module, configured to acquire software defect code;
    a template generation module, configured to generate, according to the grammatical features of the software defect code and a trained patch template generation model, a patch template conforming to the syntax of the language used by the software defect code;
    a patch generation module, configured to fill the patch template and generate a patch for the software defect code; and
    a repair module, configured to repair the software defect code with the patch.
  9. An electronic device, comprising:
    at least one processor; and a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the automatic software repair method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the automatic software repair method according to any one of claims 1 to 7.
PCT/CN2022/091008 2021-08-06 2022-05-05 Automatic software repair method, system, electronic device and storage medium WO2023010916A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110904041.3A 2021-08-06 2021-08-06 Automatic software repair method, system, electronic device and storage medium
CN202110904041.3 2021-08-06

Publications (1)

Publication Number Publication Date
WO2023010916A1 (zh) 2023-02-09

Family

ID=85155138

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091008 WO2023010916A1 (zh) 2021-08-06 2022-05-05 Automatic software repair method, system, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN115934147A (zh)
WO (1) WO2023010916A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116094973A (zh) * 2023-03-06 2023-05-09 深圳市华曦达科技股份有限公司 Test method and apparatus for a wide area network management protocol of customer premises equipment
CN117009127A (zh) * 2023-08-23 2023-11-07 航电所(成都)科技有限公司 Software upgrade method and system for a cloud system of a thermal power plant

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056940B (zh) * 2023-10-12 2024-01-16 中关村科学城城市大脑股份有限公司 Server system vulnerability repair method and apparatus, electronic device, and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090307660A1 (en) * 2008-10-14 2009-12-10 Edss, Inc. Ticc-paradigm to build verified parallel software for multi-core chips
CN105446712A (zh) * 2014-08-08 2016-03-30 阿里巴巴集团控股有限公司 Application program defect patching method and apparatus
US20180165182A1 (en) * 2016-12-09 2018-06-14 Fujitsu Limited Automated software program repair
CN109376092A (zh) * 2018-11-26 2019-02-22 扬州大学 Automatic analysis method for software defect causes oriented to defect patch code
CN110597735A (zh) * 2019-09-25 2019-12-20 北京航空航天大学 Software defect prediction method based on deep learning of open-source software defect features
CN112181428A (zh) * 2020-09-28 2021-01-05 北京航空航天大学 Abstract-syntax-tree-based open-source software defect data classification method and system
CN112463424A (zh) * 2020-11-13 2021-03-09 扬州大学 Graph-based end-to-end program repair method

Also Published As

Publication number Publication date
CN115934147A (zh) 2023-04-07

Similar Documents

Publication Publication Date Title
WO2023010916A1 (zh) Automatic software repair method, system, electronic device and storage medium
US9928040B2 (en) Source code generation, completion, checking, correction
CN113064586B (zh) Code completion method based on an abstract-syntax-tree augmented graph model
CN114218932B (zh) Aviation fault text summary generation method and apparatus based on a fault causality graph
US20220180198A1 (en) Training method, storage medium, and training device
CN112597063A (zh) Method, apparatus, and storage medium for defect code localization
US11487522B1 (en) Training and/or using neural network model to generate target source code from lower-level representation
CN113342318A (zh) Fine-grained automatic code generation method and system based on multi-view code features
CN110807335A (zh) Machine-learning-based translation method, apparatus, device, and storage medium
CN116745758A (zh) Intelligent query editor using neural-network-based machine learning
CN112764738A (zh) Automatic code generation method and system based on multi-view program features
CN115543437B (zh) Code comment generation method and system
CN113741886A (zh) Graph-based statement-level program repair method and system
CN112446221B (zh) Translation evaluation method, apparatus, system, and computer storage medium
CN116822464A (zh) Text error correction method, system, device, and storage medium
CN115686923B (zh) Automatic repair method and system for software source code defects
CN115495085A (zh) Generation method and apparatus based on deep-learning fine-grained code templates
CN115906854A (zh) Multi-level adversarial training method for a cross-lingual named entity recognition model
JPH07319682A (ja) Software discovery system
US20220100640A1 (en) Generating test input values for functional components based on test coverage analysis
US20220180197A1 (en) Training method, storage medium, and training device
CN114881011B (zh) Multi-channel Chinese text correction method, apparatus, computer device, and storage medium
US12008826B2 (en) Method and apparatus for customized deep learning-based text correction
CN117573084B (zh) Code completion method based on layer-by-layer fusion of abstract syntax trees
CN117273027B (zh) Automatic machine translation post-verification method based on translation error correction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22851636; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)