CN110008344B - Method for automatically marking data structure label on code - Google Patents

Method for automatically marking data structure label on code

Info

Publication number
CN110008344B
CN110008344B (application CN201910304797.7A)
Authority
CN
China
Prior art keywords
node
code
nodes
vector
codes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910304797.7A
Other languages
Chinese (zh)
Other versions
CN110008344A (en)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongsenyunlian Chengdu Technology Co ltd
Original Assignee
Zhongsenyunlian Chengdu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongsenyunlian Chengdu Technology Co ltd filed Critical Zhongsenyunlian Chengdu Technology Co ltd
Priority to CN201910304797.7A priority Critical patent/CN110008344B/en
Priority to CN202011019000.8A priority patent/CN112148879B/en
Publication of CN110008344A publication Critical patent/CN110008344A/en
Application granted granted Critical
Publication of CN110008344B publication Critical patent/CN110008344B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method for automatically labeling code with data structure tags, belonging to the field of natural language processing under artificial intelligence. The method comprises the following steps: converting the code into an abstract syntax tree using a lexical analyzer and a syntax analyzer; modeling the abstract syntax tree and encoding each node on the tree from bottom to top using an attention mechanism and residual blocks, obtaining an encoding of the whole tree; and finally labeling the code with data structure tags through a classifier in the model. The method can automatically tag code with data structure labels and reduces the workload of manually tagging code.

Description

Method for automatically marking data structure label on code
Technical Field
The invention belongs to the field of natural language processing under artificial intelligence, and particularly relates to a method for automatically marking a data structure label on a code.
Background
With the popularization of the internet, a large amount of high-quality code has appeared online, but much of it carries no data structure labels, making it inconvenient to query and learn from; manually tagging massive amounts of code with data structure labels is unrealistic.
Disclosure of Invention
The invention provides a method for automatically labeling code with data structure tags. The method converts code into an abstract syntax tree using a lexical analyzer and a syntax analyzer, performs word embedding on each word, and then encodes each node sequentially from bottom to top on the tree using residual blocks and an attention mechanism. This finally yields an encoding of the root node that contains the syntactic and semantic expressions of all child nodes as well as the semantic expression of the root node itself; the expression of the root node is then used for classification.
The invention relates to a method for automatically labeling code with data structures, which comprises the following steps:
Step 1: code annotated with data structure labels is collected from web pages using crawler technology.
Step 2: because different languages have different grammars, a different lexical analyzer is required for each language. The lexical analyzer replaces variables of different types in the code with corresponding words: numbers such as 1 and 1.1 are replaced with Num, all variable names are replaced with Name, and all character strings are replaced with Str. The lexical analyzer does not replace the keywords of the language.
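For illustration, a minimal sketch of this token normalization, assuming Python source code and Python's standard tokenize and keyword modules (the patent does not name a specific lexical analyzer), is:

import io
import keyword
import tokenize

def normalize_code(source):
    """Replace literals and identifiers with placeholder words:
    numbers -> Num, strings -> Str, variable names -> Name;
    language keywords are left unchanged."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER:
            out.append("Num")
        elif tok.type == tokenize.STRING:
            out.append("Str")
        elif tok.type == tokenize.NAME:
            # keep keywords such as "if" or "for"; abstract everything else
            out.append(tok.string if keyword.iskeyword(tok.string) else "Name")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
    return out

print(normalize_code('a = b + 1.1\ns = "hello"'))
# ['Name', '=', 'Name', '+', 'Num', 'Name', '=', 'Str']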
Step 3: a corresponding syntax analyzer is used for each language; the syntax analyzer converts the lexically analyzed code into an abstract syntax tree.
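As a concrete example of this step, the embodiment in Fig. 2 uses Python's ast toolkit; the following snippet parses the code a = b + c into the abstract syntax tree described there:

import ast

tree = ast.parse("a = b + c")
print(ast.dump(tree))
# Module(body=[Assign(targets=[Name(id='a', ctx=Store())],
#        value=BinOp(left=Name(id='b', ctx=Load()), op=Add(),
#                    right=Name(id='c', ctx=Load())))], ...)   (output abbreviated)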
Step 4: word embedding is performed on the words generated by lexical and syntactic analysis, e.g. on Num, Name, the root node Module, and the assignment operation Assign.
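A minimal sketch of this embedding step, assuming PyTorch; the node vocabulary and the dimension 128 below are illustrative assumptions, not values fixed by the patent:

import torch
import torch.nn as nn

# one integer id per node word produced by lexical and syntactic analysis
vocab = {"Module": 0, "Assign": 1, "Name": 2, "Store": 3, "BinOp": 4,
         "Add": 5, "Load": 6, "Num": 7, "Str": 8}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=128)

# e is the embedded encoding of a node, e in R^embedding_size
e = embedding(torch.tensor(vocab["Assign"]))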
Step 5: the same residual block Reb is used to apply a nonlinear transformation to the embedded encoding of each node, yielding a new semantic encoding:
e′ = Reb_q(e) = LN(W_2·ReLU(W_1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^{embedding_size}, embedding_size is the dimension of each node embedding, W_1 ∈ R^{d_i×embedding_size}, W_2 ∈ R^{embedding_size×d_i}, d_i is a hyper-parameter, ReLU is the ReLU activation function, LN is layer normalization, and Reb is a residual block.
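A hedged PyTorch sketch of the residual block Reb defined above; the layer sizes follow the definitions of embedding_size and the hyper-parameter d_i:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Reb(e) = LN(W2 · ReLU(W1 · e) + e)."""
    def __init__(self, embedding_size, d_i):
        super().__init__()
        self.w1 = nn.Linear(embedding_size, d_i, bias=False)  # W1 in R^{d_i x embedding_size}
        self.w2 = nn.Linear(d_i, embedding_size, bias=False)  # W2 in R^{embedding_size x d_i}
        self.ln = nn.LayerNorm(embedding_size)                # LN, layer normalization

    def forward(self, e):
        return self.ln(self.w2(torch.relu(self.w1(e))) + e)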
Step 6: non-leaf nodes are encoded on the tree from bottom to top; an attention mechanism is used to compute the semantic expression of the child nodes under the current node that is most relevant to the current node:
V_c = A·H^T
A = softmax(score(Q, H))
[The definition of the score function is given as an image in the original document.]
where Q is a matrix formed by stacking n copies of the current node's vector after residual-block transformation, H is a matrix formed by stacking the residual-block-transformed vectors of the n child nodes under the current node, and the score function computes the similarity between the current node's expression and each child node's expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
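A sketch of this attention step; since the exact score function is an image in the original, a dot-product score, one of the standard choices, is assumed here:

import torch

def child_attention(q, h):
    """q: (d,) current-node vector after the residual block;
    h: (n, d) child-node vectors after the residual block.
    Returns V_c, the attention summary of the children."""
    scores = h @ q                     # score(Q, H), dot-product variant (assumption)
    a = torch.softmax(scores, dim=0)   # A = softmax(score(Q, H))
    return a @ h                       # V_c = A · H^T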
The attention vector is then fused with the current node's vector to form a new vector expression of the current node; this expression contains both the node's own semantic expression and the semantic expression of all its child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e′ is the vector encoding of the current node, V_c is the attention vector, Reb_q and Reb_c are residual blocks, b is a bias value, ReLU is the ReLU activation function, and e″ is the vector encoding obtained by fusing the residual-block encoding of the current node vector e′ with the residual-block encoding of the attention vector V_c.
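A sketch of the fusion step, reusing the ResidualBlock sketch above for Reb_q and Reb_c (two separately parameterized blocks, as the subscripts suggest):

import torch
import torch.nn as nn

class NodeFusion(nn.Module):
    """e'' = ReLU(Reb_q(e') + Reb_c(V_c) + b)."""
    def __init__(self, embedding_size, d_i):
        super().__init__()
        self.reb_q = ResidualBlock(embedding_size, d_i)  # encodes the node vector e'
        self.reb_c = ResidualBlock(embedding_size, d_i)  # encodes the attention vector V_c
        self.b = nn.Parameter(torch.zeros(embedding_size))

    def forward(self, e_prime, v_c):
        return torch.relu(self.reb_q(e_prime) + self.reb_c(v_c) + self.b)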
Step 7: following the formulas above, the expression of each node is computed on the tree from bottom to top, and finally the expression of the root node is used for classification. Because a piece of code may belong to several categories, multiple sigmoid classifiers are used to obtain multiple data structure labels:
y_i = sigmoid(W_2·ReLU(W_1·e′_r) + b)
where e′_r is the semantic expression of the root node, W_1 and W_2 are parameters, b is a bias value, ReLU is the ReLU activation function, and sigmoid is the sigmoid function.
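A sketch of this multi-label head, with one sigmoid output per data-structure tag (num_labels is an assumed parameter):

import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """y_i = sigmoid(W2 · ReLU(W1 · e'_r) + b), one output per label."""
    def __init__(self, embedding_size, d_i, num_labels):
        super().__init__()
        self.w1 = nn.Linear(embedding_size, d_i)
        self.w2 = nn.Linear(d_i, num_labels)  # the bias b lives in this layer

    def forward(self, e_root):
        return torch.sigmoid(self.w2(torch.relu(self.w1(e_root))))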
Step 8: train the model. The overall model is trained with a large amount of code labeled with data structures. First, a lexical analyzer performs lexical analysis on the code, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str. A syntax analyzer then converts the lexically analyzed code into an abstract syntax tree. Each node in the abstract syntax tree is embedded, i.e. mapped to its corresponding real-valued vector, and a residual block applies a nonlinear transformation to each node's embedded encoding to obtain a new semantic encoding, as in the following formula:
e′ = Reb_q(e) = LN(W_2·ReLU(W_1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^{embedding_size}, embedding_size is the dimension of each node embedding, W_1 ∈ R^{d_i×embedding_size}, W_2 ∈ R^{embedding_size×d_i}, d_i is a hyper-parameter, ReLU is the ReLU activation function, LN is layer normalization, and Reb is a residual block.
Non-leaf nodes are encoded on the tree from bottom to top; an attention mechanism computes the semantic expression of the child nodes under the current node that is most relevant to the current node:
V_c = A·H^T
A = softmax(score(Q, H))
[The definition of the score function is given as an image in the original document.]
where Q is a matrix formed by stacking n copies of the current node's vector after residual-block transformation, H is a matrix formed by stacking the residual-block-transformed vectors of the n child nodes under the current node, and the score function computes the similarity between the current node's expression and each child node's expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The attention vector is then fused with the current node's vector to form a new vector expression of the current node; this expression contains both the node's own semantic expression and the semantic expression of all its child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node's vector e′ with the attention vector V_c.
Finally, the encoding of the root node is used for classification; because the code may belong to several categories, multiple sigmoid classifiers are used to obtain multiple data structure labels:
y_i = sigmoid(W_2·ReLU(W_1·e′_r) + b)
where e′_r is the semantic expression of the root node, ReLU is the ReLU activation function, and sigmoid is the sigmoid function.
Passing the root node's encoding through the sigmoid functions yields predicted probabilities that differ from the true probabilities, producing a loss value. Each parameter is then updated through backward gradient propagation, which trains the model.
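A hedged sketch of one training step under these definitions: binary cross-entropy measures the gap between predicted and true label probabilities, and back-propagation updates the parameters. Here encoder and head are assumed names for the tree encoder producing e'_r and the MultiLabelHead sketched above:

import torch

loss_fn = torch.nn.BCELoss()
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()))

def train_step(tree, true_labels):
    e_root = encoder(tree)               # bottom-up encoding of the AST root
    probs = head(e_root)                 # predicted probability per label
    loss = loss_fn(probs, true_labels)   # predicted vs. true probabilities
    optimizer.zero_grad()
    loss.backward()                      # backward gradient propagation
    optimizer.step()                     # update every parameter
    return loss.item()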
Step 9: predict new code with the trained model. Given a new piece of code, a lexical analyzer performs lexical analysis on it, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str. A syntax analyzer converts the lexically analyzed code into an abstract syntax tree. Each node in the abstract syntax tree, such as Num and Name, is embedded, i.e. mapped to its corresponding real-valued vector; a residual block encodes each node's vector to obtain a new encoding; and the attention mechanism encodes each node from bottom to top. Finally the encoding of the root node is used for classification. Because multiple sigmoid classifiers are used, several data structures may be predicted: if a classifier predicts a label with probability above 50%, the code belongs to that category. A stricter threshold may also be set, e.g. a prediction probability above 70%, before the code is considered to belong to the category.
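The thresholding in step 9 can be sketched as follows; label_names is an assumed list of data-structure tags aligned with the classifier outputs:

def predict_labels(probs, label_names, threshold=0.5):
    # keep every tag whose predicted probability exceeds the threshold;
    # use threshold=0.7 for the stricter setting mentioned above
    return [name for name, p in zip(label_names, probs.tolist()) if p > threshold]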
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Fig. 2 is a diagram of the abstract syntax tree of the code a = b + c.
Fig. 3 is a diagram of abstract syntax tree coding.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
FIG. 1 shows a flow diagram of a method for automatically tagging code with data structures, comprising:
firstly, crawler technology is used to collect a large amount of code labeled with data structures from blogs, forums, and other web sources;
secondly, a lexical analyzer performs lexical analysis on the code, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str; the lexical analyzer does not replace the keywords of the language;
thirdly, a syntax analyzer parses the code and converts it into an abstract syntax tree;
fourthly, each node in the abstract syntax tree is encoded with a residual block, so that each node obtains a new residual-block encoding; an attention mechanism then encodes each node sequentially from bottom to top on the tree, fusing the information of all of a node's child nodes with the current node, layer by layer, until the root node of the tree has been encoded;
and fifthly, the model is trained: the overall model is trained with a large amount of code labeled with data structures. First, a lexical analyzer performs lexical analysis on the code, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str; a syntax analyzer converts the lexically analyzed code into an abstract syntax tree; each node in the abstract syntax tree is embedded, i.e. mapped to its corresponding real-valued vector; and a residual block applies a nonlinear transformation to each node's embedded encoding to obtain a new semantic encoding, as in the following formula:
e′ = Reb_q(e) = LN(W_2·ReLU(W_1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^{embedding_size}, embedding_size is the dimension of each node embedding, W_1 ∈ R^{d_i×embedding_size}, W_2 ∈ R^{embedding_size×d_i}, d_i is a hyper-parameter, ReLU is the ReLU activation function, LN is layer normalization, and Reb is a residual block.
Non-leaf nodes are encoded on the tree from bottom to top; an attention mechanism computes the semantic expression of the child nodes under the current node that is most relevant to the current node:
V_c = A·H^T
A = softmax(score(Q, H))
[The definition of the score function is given as an image in the original document.]
where Q is a matrix formed by stacking n copies of the current node's vector after residual-block transformation, H is a matrix formed by stacking the residual-block-transformed vectors of the n child nodes under the current node, and the score function computes the similarity between the current node's expression and each child node's expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The attention vector is then fused with the current node's vector to form a new vector expression of the current node; this expression contains both the node's own semantic expression and the semantic expression of all its child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node's vector e′ with the attention vector V_c.
Finally, the encoding of the root node is used for classification; because the code may belong to several categories, multiple sigmoid classifiers are used to obtain multiple data structure labels:
y_i = sigmoid(W_2·ReLU(W_1·e′_r) + b)
where e′_r is the semantic expression of the root node, ReLU is the ReLU activation function, and sigmoid is the sigmoid function.
Passing the root node's encoding through the sigmoid functions yields predicted probabilities that differ from the true probabilities, producing a loss value. Each parameter is then updated through backward gradient propagation, which trains the model.
Sixthly, new code is predicted with the trained model: given a new piece of code, a lexical analyzer performs lexical analysis on it, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str; a syntax analyzer converts the lexically analyzed code into an abstract syntax tree; each node in the abstract syntax tree, such as Num and Name, is embedded, i.e. mapped to its corresponding real-valued vector; a residual block encodes each node's vector to obtain a new encoding; the attention mechanism encodes each node sequentially from bottom to top; and finally the encoding of the root node is used for classification. Because multiple sigmoid classifiers are used, several data structures may be predicted: if a classifier predicts a label with probability above 50%, the code belongs to that category, or a stricter threshold may be set, e.g. a prediction probability above 70%, before the code is considered to belong to the category.
Fig. 2 shows a schematic diagram of the abstract syntax tree of the code a = b + c, which contains the node names Module, Assign, Name, Store, BinOp, Load and Add. In order: Module is the root node, the start of all code; Assign is the assignment symbol, specifically the = in the code a = b + c; Name is an abstract name for variable names (which variable is not specified, but from the code the variables are a, b and c); Store is the storage symbol, indicating that the value computed by b + c is assigned to a and stored in memory; BinOp is a binary operation such as addition, subtraction, multiplication or division; Load is the loading symbol, which loads the value of a variable; Add is the addition symbol, which adds the values of two variables.
In Fig. 2, the abstract syntax tree of the code a = b + c is built as follows. The Module root node is given first, and under it hang as many nodes as there are lines of code; here there is only one line. Since a = b + c mainly performs an assignment, an Assign node is placed under the Module root node. The Assign node has a left subtree and a right subtree: the left subtree represents the variable being assigned, i.e. the variable on the left of the equal sign in a = b + c, and the right subtree represents the expression on the right of the equal sign. The left child of the Assign node is a Name node, and the child under that Name is a Store node, indicating that the value of the right-hand side of the equation is assigned to the left-hand side and stored in memory.
The right child of the Assign node is a BinOp binary operation symbol, indicating that an addition, subtraction, multiplication or division follows below. Under the BinOp symbol is an Add addition symbol, indicating that the right side of the equal sign is an addition. The Name variable symbol on the left of the Add symbol is the variable b in the code a = b + c; the child node below it is a Load symbol, meaning that the value in variable b is needed for the computation. The Name variable symbol on the right of the Add symbol is the variable c in the code a = b + c; below it is likewise a Load symbol, meaning that the value in variable c is needed for the computation.
To summarize the abstract syntax tree: Module is the root node and Assign is the assignment node. The BinOp binary operation on the right is evaluated first: one Name variable symbol represents the variable b, whose value is taken out by Load, and another Name variable symbol represents the variable c, whose value is taken out by Load. The Add addition symbol adds the value in variable b to the value in variable c. After the addition, the Assign symbol assigns the computed value to the Name variable symbol on the left, specifically the variable a in the code a = b + c; the Store symbol stores the value assigned to a, finally placing it in memory.
Fig. 3 shows a schematic diagram of encoding the abstract syntax tree, described from bottom to top. The lowermost leaf nodes are Name Embedding, Add Embedding and Name Embedding, where Embedding means that the node is embedded, i.e. converted into a real-valued vector. These three nodes are then further encoded with a residual block to obtain their semantic encodings. Next, the BinOp binary operation node is embedded, converting it into its corresponding real-valued vector, and the BinOp node is encoded with the residual block, as in the following equation:
e′ = Reb_q(e) = LN(W_2·ReLU(W_1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^{embedding_size}, embedding_size is the dimension of each node embedding, W_1 ∈ R^{d_i×embedding_size}, W_2 ∈ R^{embedding_size×d_i}, d_i is a hyper-parameter, ReLU is the ReLU activation function, LN is layer normalization, and Reb is a residual block.
After this encoding, an attention mechanism is used to compute the semantic expression of the child nodes under the current BinOp node that is most relevant to the current BinOp node, as in the following equations:
V_c = A·H^T
A = softmax(score(Q, H))
[The definition of the score function is given as an image in the original document.]
where Q is a matrix formed by stacking n copies of the current node's vector after residual-block transformation, H is a matrix formed by stacking the residual-block-transformed vectors of the n child nodes under the current node, and the score function computes the similarity between the current node's expression and each child node's expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The semantic encoding vectors of all child nodes under the BinOp node obtained by the attention mechanism are then fused with the residual-block encoding vector of the BinOp node to form a new vector expression of the current node; this expression contains both the node's own semantic expression and the semantic expression of all its child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node's vector e′ with the attention vector V_c.
The encoding vector of the BinOp binary operation node has now been obtained. Next, the left subtree of the Assign node is encoded; it contains only a Name node. The Name variable node is embedded to obtain its vector and, as above, a residual block encodes the Name embedding to obtain a new semantic expression. All child nodes of the Assign node now have encoding vectors. The Assign node itself is then embedded, converting it into a real-valued vector, and the residual block re-encodes the Assign embedding vector to obtain a new semantic expression. The attention mechanism then computes the semantic expression of the child nodes under the current Assign node that is most relevant to the current Assign node. Finally, the semantic encoding vectors of all child nodes under the Assign node obtained by the attention mechanism are fused with the residual-block encoding vector of the Assign node to form a new vector expression of the current node, containing both the node's own semantic expression and the semantic expression of all its child nodes.
In the final stage of encoding, the Module root node is embedded, converting it into its corresponding real-valued vector, and the residual block re-encodes the Module embedding to obtain a new semantic encoding. The attention mechanism then computes the semantic expression of the child nodes under the current Module node that is most relevant to the current Module node. The semantic encoding vectors of all child nodes under the Module node obtained by the attention mechanism are fused with the residual-block encoding vector of the Module node to form a new vector expression of the current node. The semantic encoding vector of the Module node, which also represents the semantic encoding of the entire piece of code, has now been obtained; it is fed into multiple sigmoid functions, each of which judges whether the code implements a particular data structure.
The technical scheme of the invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the main process of the present invention is:
Step 1: code annotated with data structure labels is collected from web pages using crawler technology.
Step 2: because different languages have different grammars, a different lexical analyzer is required for each language. The lexical analyzer replaces variables of different types in the code with corresponding words: numbers such as 1 and 1.1 are replaced with Num, all variable names with Name, and all character strings with Str; the lexical analyzer does not replace the keywords of the language.
Step 3: a corresponding syntax analyzer is used for each language; the syntax analyzer converts the lexically analyzed code into an abstract syntax tree. As shown in Fig. 2, the code a = b + c is converted to an abstract syntax tree using the Python ast toolkit.
Step 4: word embedding is performed on the words generated by lexical and syntactic analysis, e.g. on Num, Name, the root node Module, and the assignment operation Assign.
Step 5: the same residual block Reb is used to apply a nonlinear transformation to the embedded encoding of each node, yielding a new semantic encoding:
e′ = Reb_q(e) = LN(W_2·ReLU(W_1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^{embedding_size}, embedding_size is the dimension of each node embedding, W_1 ∈ R^{d_i×embedding_size}, W_2 ∈ R^{embedding_size×d_i}, d_i is a hyper-parameter, ReLU is the ReLU activation function, LN is layer normalization, and Reb is a residual block.
Step 6: non-leaf nodes are encoded on the tree from bottom to top; an attention mechanism is used to compute the semantic expression of the child nodes under the current node that is most relevant to the current node:
V_c = A·H^T
A = softmax(score(Q, H))
[The definition of the score function is given as an image in the original document.]
where Q is a matrix formed by stacking n copies of the current node's vector after residual-block transformation, H is a matrix formed by stacking the residual-block-transformed vectors of the n child nodes under the current node, and the score function computes the similarity between the current node's expression and each child node's expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The attention vector is then fused with the current node's vector to form a new vector expression of the current node; this expression contains both the node's own semantic expression and the semantic expression of all its child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node's vector e′ with the attention vector V_c.
Step 7: following the formulas above, the expression of each node is computed on the tree from bottom to top, and finally the expression of the root node is used for classification. Because a piece of code may belong to several categories, multiple sigmoid classifiers are used to obtain multiple data structure labels:
y_i = sigmoid(W_2·ReLU(W_1·e′_r) + b)
where e′_r is the semantic expression of the root node, ReLU is the ReLU activation function, and sigmoid is the sigmoid function.
Step 8: train the model. The overall model is trained with a large amount of code labeled with data structures until the model's accuracy in judging any code exceeds 50 percent. Training the model once on all code labeled with data structures constitutes one epoch. The flow of training the model on one piece of code is as follows: given a piece of code, a lexical analyzer performs lexical analysis on it, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str; a syntax analyzer converts the lexically analyzed code into an abstract syntax tree; each node in the abstract syntax tree is embedded, i.e. mapped to its corresponding real-valued vector; and a residual block applies a nonlinear transformation to each node's embedded encoding to obtain a new semantic encoding, as in the following formula:
e′ = Reb_q(e) = LN(W_2·ReLU(W_1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^{embedding_size}, embedding_size is the dimension of each node embedding, W_1 ∈ R^{d_i×embedding_size}, W_2 ∈ R^{embedding_size×d_i}, d_i is a hyper-parameter, ReLU is the ReLU activation function, LN is layer normalization, and Reb is a residual block.
Non-leaf nodes are encoded on the tree from bottom to top; an attention mechanism computes the semantic expression of the child nodes under the current node that is most relevant to the current node:
V_c = A·H^T
A = softmax(score(Q, H))
[The definition of the score function is given as an image in the original document.]
where Q is a matrix formed by stacking n copies of the current node's vector after residual-block transformation, H is a matrix formed by stacking the residual-block-transformed vectors of the n child nodes under the current node, and the score function computes the similarity between the current node's expression and each child node's expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The attention vector is then fused with the current node's vector to form a new vector expression of the current node; this expression contains both the node's own semantic expression and the semantic expression of all its child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node's vector e′ with the attention vector V_c.
Finally, the encoding of the root node is used for classification. Passing the root node's encoding through the sigmoid function yields a difference between the predicted probability and the true probability, producing a loss value; each parameter is updated through backward gradient propagation, which trains the model.
Step 9: predict new code with the trained model. Given a new piece of code, a lexical analyzer performs lexical analysis on it, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str. A syntax analyzer converts the lexically analyzed code into an abstract syntax tree. Each node in the abstract syntax tree, such as Num and Name, is embedded, i.e. mapped to its corresponding real-valued vector; a residual block encodes each node's vector to obtain a new encoding; and the attention mechanism encodes each node from bottom to top. Finally the encoding of the root node is used for classification. Because multiple sigmoid classifiers are used, several data structures may be predicted: if a classifier predicts a label with probability above 50%, the code belongs to that category. A stricter threshold may also be set, e.g. a prediction probability above 70%, before the code is considered to belong to the category.
The method for automatically labeling code with data structures provided by embodiments of the invention has been described in detail above. The principles and implementation of the invention have been explained herein; the description of the embodiments is intended only to assist in understanding the method and its core idea.

Claims (9)

1. A method for automatically tagging code with a data structure, the method comprising:
collecting a plurality of codes marked with data structures;
converting the code into an abstract syntax tree by using a lexical analyzer and a syntax analyzer;
encoding the nodes on the tree using an attention mechanism and a residual block, and labeling the code using the resulting encodings;
training a model and predicting new code with the trained model, wherein training the model comprises:
training the overall model using a large amount of code labeled with data structures, first performing lexical analysis on the code using a lexical analyzer;
converting the lexically analyzed code into an abstract syntax tree using a syntax analyzer;
embedding each node in the abstract syntax tree, i.e. finding the corresponding real-valued vector for the node;
applying a nonlinear transformation to the embedded encoding of each node using a residual block to obtain a new semantic encoding, as in the following formula:
e' = Reb_q(e) = LN(W_2·ReLU(W_1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^{embedding_size}, embedding_size is the dimension of each node embedding, W_1 ∈ R^{d_i×embedding_size}, W_2 ∈ R^{embedding_size×d_i}, d_i is a hyper-parameter, ReLU is the ReLU activation function, LN is layer normalization, and Reb is a residual block;
encoding non-leaf nodes on the tree from bottom to top, and using an attention mechanism to compute the semantic expression of the child nodes under the current node that is most relevant to the current node;
V_c = A·H^T
A = softmax(score(Q, H))
[The definition of the score function is given as an image in the original document.]
wherein Q is a matrix formed by stacking n copies of the current node's vector after residual-block transformation, H is a matrix formed by stacking the residual-block-transformed vectors of the n child nodes under the current node, the score function computes the similarity between the current node's expression and each child node's expression, the higher the similarity the higher the probability after softmax, and the score function can compute the similarity between the current node and the child nodes in three ways; V_c is the attention expression; the attention vector is then fused with the current node's vector to form a new vector expression of the current node, which contains both the node's own semantic expression and the semantic expression of all its child nodes, as in the following equation:
e" = ReLU(Reb_q(e') + Reb_c(V_c) + b)
wherein e' is the vector encoding of the current node, V_c is the attention vector, Reb_q and Reb_c are residual blocks, b is a bias value, ReLU is the ReLU activation function, and e" is the vector encoding obtained by fusing the residual-block encoding of the current node vector e' with the residual-block encoding of the attention vector V_c;
finally, using the encoding of the root node for classification, wherein, because the code may belong to a plurality of categories, a plurality of sigmoid classifiers are used to obtain a plurality of data structure labels;
y_i = sigmoid(W_2·ReLU(W_1·e'_r) + b)
wherein e'_r is the semantic expression of the root node, ReLU is the ReLU activation function, and sigmoid is the sigmoid function;
the coding of the root node has a prediction probability and a real probability difference through a sigmoid function, a loss value is generated, and each parameter is updated through reverse gradient propagation, so that the training effect is achieved.
2. The method of automatically tagging code with data structures according to claim 1, wherein said collecting a plurality of data structure tagged codes comprises:
hundreds of thousands of codes are collected from the web through a crawler technology, wherein corresponding data structures are labeled, and the data structures comprise trees, linked lists and queues.
3. The method of automatically tagging code with a data structure according to claim 1, wherein said using a lexical analyzer comprises: because different languages have different grammars, a different lexical analyzer is required for each language, and the lexical analyzer is used to replace different types of variables in the code with corresponding words.
4. The method of automatically tagging code with data structures of claim 1, wherein said using a syntax analyzer comprises: using a corresponding syntax analyzer for each language to convert the lexically analyzed code into an abstract syntax tree.
5. The method for automatically tagging a data structure to a code according to claim 1, wherein said encoding a node on a tree using an attention mechanism and a residual block comprises:
performing word embedding on words generated after lexical analysis and syntactic analysis to convert the words into real-valued vectors;
all nodes are encoded on the tree using attention mechanisms and residual blocks.
6. The method for automatically tagging code with a data structure according to claim 1, wherein said labeling the code using the resulting encodings comprises:
classifying using the encoding of the root node, wherein, because the code can belong to a plurality of categories, a plurality of sigmoid classifiers are used to obtain a plurality of data structure tags;
y_i = sigmoid(W_2·ReLU(W_1·e'_r) + b)
wherein e'_r is the semantic expression of the root node, W_1, W_2 and b are parameters to be learned, ReLU is the ReLU activation function, and sigmoid is the sigmoid function.
7. The method for automatically tagging data structures in code according to claim 1, wherein said predicting new code using a trained model comprises:
predicting new code with the trained model: given a new piece of code, performing lexical analysis on it with a lexical analyzer, replacing the numbers 1 and 1.1 with Num, all variable names with Name, and all character strings with Str; converting the lexically analyzed code into an abstract syntax tree with a syntax analyzer; embedding each node in the abstract syntax tree, such as the Num and Name nodes, i.e. finding the corresponding real-valued vector for the node; encoding each node's vector with a residual block to obtain a new encoding; encoding each node sequentially from bottom to top with an attention mechanism; and finally classifying with the encoding of the root node, wherein, because a plurality of sigmoid classifiers are used, a plurality of data structures may be judged: if a classifier predicts the probability of a label to be above 50%, the code belongs to that category, or a threshold is set, e.g. a prediction probability above 70%, before the code is considered to belong to the category.
8. The method for automatically tagging a code with a data structure according to claim 5, wherein said using residual block coding on a tree comprises:
applying a nonlinear transformation to the embedded encoding of each node using the same residual block Reb to obtain the node's new semantic encoding, as in the following formula:
e' = Reb_q(e) = LN(W_2·ReLU(W_1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^{embedding_size}, embedding_size is the dimension of each node embedding, W_1 ∈ R^{d_i×embedding_size}, W_2 ∈ R^{embedding_size×d_i}, d_i is a hyper-parameter, ReLU is the ReLU activation function, LN is layer normalization, and Reb is a residual block.
9. The method for automatically tagging code with a data structure according to claim 5, wherein said encoding on a tree using an attention mechanism comprises:
encoding non-leaf nodes on the tree from bottom to top, and using an attention mechanism to compute the semantic expression of the child nodes under the current node that is most relevant to the current node;
V_c = A·H^T
A = softmax(score(Q, H))
[The definition of the score function is given as an image in the original document.]
wherein Q is a matrix formed by stacking n copies of the current node's vector after residual-block transformation, H is a matrix formed by stacking the residual-block-transformed vectors of the n child nodes under the current node, the score function computes the similarity between the current node's expression and each child node's expression, the higher the similarity the higher the probability after softmax, and the score function can compute the similarity between the current node and the child nodes in three ways; V_c is the attention expression; the attention vector is then fused with the current node's vector to form a new vector expression of the current node, which contains both the node's own semantic expression and the semantic expression of all its child nodes, as in the following equation:
e" = ReLU(Reb_q(e') + Reb_c(V_c) + b)
wherein e" is the expression obtained by fusing the current node's vector e' with the attention vector V_c, ReLU is the ReLU activation function, and Reb is a residual block.
CN201910304797.7A 2019-04-16 2019-04-16 Method for automatically marking data structure label on code Active CN110008344B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910304797.7A CN110008344B (en) 2019-04-16 2019-04-16 Method for automatically marking data structure label on code
CN202011019000.8A CN112148879B (en) 2019-04-16 2019-04-16 Computer readable storage medium for automatically labeling code with data structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910304797.7A CN110008344B (en) 2019-04-16 2019-04-16 Method for automatically marking data structure label on code

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202011019000.8A Division CN112148879B (en) 2019-04-16 2019-04-16 Computer readable storage medium for automatically labeling code with data structure

Publications (2)

Publication Number Publication Date
CN110008344A CN110008344A (en) 2019-07-12
CN110008344B true CN110008344B (en) 2020-09-29

Family

ID=67172257

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910304797.7A Active CN110008344B (en) 2019-04-16 2019-04-16 Method for automatically marking data structure label on code
CN202011019000.8A Active CN112148879B (en) 2019-04-16 2019-04-16 Computer readable storage medium for automatically labeling code with data structure

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202011019000.8A Active CN112148879B (en) 2019-04-16 2019-04-16 Computer readable storage medium for automatically labeling code with data structure

Country Status (1)

Country Link
CN (2) CN110008344B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139054B (en) * 2021-04-21 2023-11-24 南通大学 Code programming language classification method based on Transformer
CN116661805B (en) * 2023-07-31 2023-11-14 腾讯科技(深圳)有限公司 Code representation generation method and device, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063488A (en) * 2010-12-29 2011-05-18 南京航空航天大学 Code searching method based on semantics
CN102339252A (en) * 2011-07-25 2012-02-01 大连理工大学 Static state detecting system based on XML (Extensive Makeup Language) middle model and defect mode matching
US10169208B1 (en) * 2014-11-03 2019-01-01 Charles W Moyes Similarity scoring of programs

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7590644B2 (en) * 1999-12-21 2009-09-15 International Business Machine Corporation Method and apparatus of streaming data transformation using code generator and translator
US20040068716A1 (en) * 2002-10-04 2004-04-08 Quicksilver Technology, Inc. Retargetable compiler for multiple and different hardware platforms
KR101044870B1 (en) * 2008-10-02 2011-06-28 한국전자통신연구원 Method and Apparatus for Encoding and Decoding XML Documents Using Path Code
CN101614787B (en) * 2009-07-07 2011-05-18 南京航空航天大学 Analogical electronic circuit fault diagnostic method based on M-ary-structure classifier
CN103064969A (en) * 2012-12-31 2013-04-24 武汉传神信息技术有限公司 Method for automatically creating keyword index table
US20160124723A1 (en) * 2014-10-31 2016-05-05 Weixi Ma Graphically building abstract syntax trees
US10565318B2 (en) * 2017-04-14 2020-02-18 Salesforce.Com, Inc. Neural machine translation with latent tree attention
CN107220180B (en) * 2017-06-08 2020-08-04 电子科技大学 Code classification method based on neural network language model
CN108399158B (en) * 2018-02-05 2021-05-14 华南理工大学 Attribute emotion classification method based on dependency tree and attention mechanism
CN108446540B (en) * 2018-03-19 2022-02-25 中山大学 Program code plagiarism type detection method and system based on source code multi-label graph neural network
CN108829823A (en) * 2018-06-13 2018-11-16 北京信息科技大学 A kind of file classification method
CN109033069B (en) * 2018-06-16 2022-05-17 天津大学 Microblog theme mining method based on social media user dynamic behaviors
CN109241834A (en) * 2018-07-27 2019-01-18 中山大学 A kind of group behavior recognition methods of the insertion based on hidden variable
CN110188104A (en) * 2019-05-30 2019-08-30 中森云链(成都)科技有限责任公司 A kind of Python program code method for fast searching towards K12 programming

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063488A (en) * 2010-12-29 2011-05-18 南京航空航天大学 Code searching method based on semantics
CN102339252A (en) * 2011-07-25 2012-02-01 大连理工大学 Static state detecting system based on XML (Extensive Makeup Language) middle model and defect mode matching
US10169208B1 (en) * 2014-11-03 2019-01-01 Charles W Moyes Similarity scoring of programs

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An AST-based Code Plagiarism Detection Algorithm; Jingling Zhao et al.; 2015 10th International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA); 2016-03-03; pp. 178-182 *
Optimizing the Construction of Syntax Trees for Logical Functions (优化构建逻辑函数的语法树); Zhang Mingxin; Science & Technology Vision (科技视界); 2018-06-19; pp. 78-80 *

Also Published As

Publication number Publication date
CN112148879B (en) 2023-06-23
CN112148879A (en) 2020-12-29
CN110008344A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
CN111160008B (en) Entity relationship joint extraction method and system
CN107330032B (en) Implicit discourse relation analysis method based on recurrent neural network
CN111694924A (en) Event extraction method and system
CN109960728B (en) Method and system for identifying named entities of open domain conference information
CN111897908A (en) Event extraction method and system fusing dependency information and pre-training language model
CN113822026B (en) Multi-label entity labeling method
CN113254675B (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN113868432A (en) Automatic knowledge graph construction method and system for iron and steel manufacturing enterprises
CN114911945A (en) Knowledge graph-based multi-value chain data management auxiliary decision model construction method
CN117151222B (en) Domain knowledge guided emergency case entity attribute and relation extraction method thereof, electronic equipment and storage medium
CN110008344B (en) Method for automatically marking data structure label on code
CN110781271A (en) Semi-supervised network representation learning model based on hierarchical attention mechanism
CN109614612A (en) A kind of Chinese text error correction method based on seq2seq+attention
CN114492460B (en) Event causal relationship extraction method based on derivative prompt learning
CN111340006B (en) Sign language recognition method and system
CN116245097A (en) Method for training entity recognition model, entity recognition method and corresponding device
CN117033423A (en) SQL generating method for injecting optimal mode item and historical interaction information
CN111597816A (en) Self-attention named entity recognition method, device, equipment and storage medium
CN113361259B (en) Service flow extraction method
CN117390189A (en) Neutral text generation method based on pre-classifier
CN115408506B (en) NL2SQL method combining semantic analysis and semantic component matching
CN116186241A (en) Event element extraction method and device based on semantic analysis and prompt learning, electronic equipment and storage medium
CN115186670A (en) Method and system for identifying domain named entities based on active learning
CN114154505A (en) Named entity identification method for power planning review field

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method of automatically labeling code with data structure

Effective date of registration: 20220509

Granted publication date: 20200929

Pledgee: Bank of Chengdu science and technology branch of Limited by Share Ltd.

Pledgor: ZHONGSENYUNLIAN (CHENGDU) TECHNOLOGY Co.,Ltd.

Registration number: Y2022980005318

PE01 Entry into force of the registration of the contract for pledge of patent right