CN112148879A - Computer readable storage medium for automatically labeling code with data structure - Google Patents

Computer readable storage medium for automatically labeling code with data structure

Info

Publication number
CN112148879A
CN112148879A (application CN202011019000.8A)
Authority
CN
China
Prior art keywords
node
code
vector
nodes
embedding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011019000.8A
Other languages
Chinese (zh)
Other versions
CN112148879B (en)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongsenyunlian Chengdu Technology Co ltd
Original Assignee
Zhongsenyunlian Chengdu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongsenyunlian Chengdu Technology Co ltd filed Critical Zhongsenyunlian Chengdu Technology Co ltd
Priority to CN202011019000.8A priority Critical patent/CN112148879B/en
Publication of CN112148879A publication Critical patent/CN112148879A/en
Application granted granted Critical
Publication of CN112148879B publication Critical patent/CN112148879B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/33: Querying
    • G06F16/335: Filtering based on additional data, e.g. user or group profiles
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a computer readable storage medium for automatically labeling code with data structure tags, belonging to the field of natural language processing under artificial intelligence. The method comprises the following steps: converting the code into an abstract syntax tree using a lexical analyzer and a syntax analyzer; modeling the abstract syntax tree, encoding each node on the tree from bottom to top using an attention mechanism and residual blocks to obtain an encoding of the whole tree; and finally labeling the code with data structure tags through classifiers in the model. The method can automatically label code with data structure tags, reducing the workload of manually labeling code.

Description

Computer readable storage medium for automatically labeling code with data structure
This application is a divisional application of application No. 201910304797.7, filed on April 16, 2019, entitled "A method for automatically labeling code with data structure tags".
Technical Field
The invention belongs to the field of natural language processing under artificial intelligence, and particularly relates to a computer readable storage medium for automatically labeling code with data structure tags.
Background
With the popularization of the internet, a large amount of high-quality code has appeared online, but much of it carries no data structure tags, which makes it inconvenient for users to query and learn from, and manually tagging massive amounts of code is unrealistic.
Disclosure of Invention
The invention provides a computer readable storage medium for automatically labeling code with data structure tags. A computer readable medium has a computer program stored thereon which, when executed by a processor, implements a method for automatically tagging code with data structures. The method uses a lexical analyzer and a syntax analyzer to convert the code into an abstract syntax tree, performs word embedding on each word, and sequentially encodes each node on the tree from bottom to top using residual blocks and an attention mechanism, finally obtaining the encoding of the root node, which contains both the syntactic and semantic expressions of all child nodes and the semantic expression of the root node itself. Classification is then performed using the expression of the root node; because a section of code may contain multiple data structures, multiple sigmoid classifiers are used to obtain multiple data structure tags.
The present invention is a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements a method of automatically tagging code with a data structure, comprising the following steps:
step 1: code for a number of annotated data structures is collected from web pages using crawler technology.
Step 2: because different languages have different grammars, a different lexical analyzer is required for each language. The lexical analyzer replaces variables of different types in the code with corresponding words: numbers such as 1 and 1.1 are replaced with Num, all variable names are replaced with Name, and all character strings are replaced with Str. The lexical analyzer does not replace the keywords of the language.
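As an illustration, the following minimal Python sketch performs this normalization with the standard tokenize and keyword modules. Python serves only as an example language here, and the helper name normalize_code is illustrative; the patent requires a separate lexer per language.

    import io
    import keyword
    import tokenize

    def normalize_code(source: str) -> list:
        """Replace literals and identifiers with placeholder words,
        keeping language keywords unchanged (Python-only sketch)."""
        words = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.NUMBER:      # numbers such as 1, 1.1 -> Num
                words.append("Num")
            elif tok.type == tokenize.STRING:    # string literals -> Str
                words.append("Str")
            elif tok.type == tokenize.NAME:      # keywords like `if` are kept as-is
                words.append(tok.string if keyword.iskeyword(tok.string) else "Name")
            elif tok.type == tokenize.OP:        # operators pass through
                words.append(tok.string)
        return words

    print(normalize_code("a = b + 1.1"))  # ['Name', '=', 'Name', '+', 'Num']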
Step 3: corresponding syntax analyzers are used for different languages; the syntax analyzer converts the lexically analyzed code into an abstract syntax tree.
Step 4: word embedding is performed on the words generated by lexical and syntactic analysis, such as Num, Name, the root node Module, and the assignment operation Assign.
Step 5: the same residual block Reb is used to apply a nonlinear transformation to the embedded encoding of each node, obtaining a new semantic encoding.
e′ = Reb_q(e) = LN(W2·ReLU(W1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^(embedding_size); embedding_size is the dimension of each node embedding; W1 ∈ R^(d_i×embedding_size); W2 ∈ R^(embedding_size×d_i); d_i is a hyper-parameter; ReLU is the ReLU activation function; LN is layer normalization; and Reb is the residual block.
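For concreteness, a minimal sketch of the residual block Reb defined by the formula above, assuming a PyTorch implementation (the patent specifies no framework, and the sizes here are illustrative):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """e' = Reb(e) = LN(W2 · ReLU(W1 · e) + e)."""
        def __init__(self, embedding_size: int, d_i: int):
            super().__init__()
            self.w1 = nn.Linear(embedding_size, d_i, bias=False)  # W1 ∈ R^(d_i×embedding_size)
            self.w2 = nn.Linear(d_i, embedding_size, bias=False)  # W2 ∈ R^(embedding_size×d_i)
            self.ln = nn.LayerNorm(embedding_size)                # LN, layer normalization

        def forward(self, e: torch.Tensor) -> torch.Tensor:
            return self.ln(self.w2(torch.relu(self.w1(e))) + e)

    reb = ResidualBlock(embedding_size=128, d_i=256)
    e_prime = reb(torch.randn(1, 128))  # new semantic encoding of one node, shape (1, 128)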
Step 6: non-leaf nodes are encoded on the tree from bottom to top, and an attention mechanism is used to compute, from all the child nodes under the current node, the semantic expression most relevant to the current node.
V_c = A·H^T
A = softmax(score(Q, H))
[Equation image: definition of the score function score(Q, H)]
where Q is a matrix formed by stacking n copies of the current node's vector after residual block transformation, and H is a matrix formed by stacking the vectors of the n child nodes under the current node after residual block transformation. The score function computes the similarity between the current node expression and each child node expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The attention vector and the current node vector are then fused to form a new vector expression of the current node, which contains both the semantic expression of the current node and the semantic expressions of all child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e′ is the vector encoding of the current node, V_c is the attention vector, Reb_q and Reb_c are residual blocks, b is a bias value, ReLU is the ReLU activation function, and e″ is the vector encoding obtained by fusing the residual-block-encoded current node vector with the residual-block-encoded attention vector.
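A sketch of this attention-and-fusion step, reusing the ResidualBlock class from the sketch above and again assuming PyTorch. The dot product is used as the score function here; the patent allows three scoring modes but does not fix one, so this is only one possible choice:

    import torch
    import torch.nn as nn

    class NodeAttentionFusion(nn.Module):
        """Compute V_c = softmax(score(Q, H)) · H over the child encodings,
        then fuse: e'' = ReLU(Reb_q(e') + Reb_c(V_c) + b)."""
        def __init__(self, embedding_size: int, d_i: int):
            super().__init__()
            self.reb_q = ResidualBlock(embedding_size, d_i)
            self.reb_c = ResidualBlock(embedding_size, d_i)
            self.b = nn.Parameter(torch.zeros(embedding_size))

        def forward(self, e_parent: torch.Tensor, h_children: torch.Tensor) -> torch.Tensor:
            # e_parent: (embedding_size,) -- current node encoding e'
            # h_children: (n, embedding_size) -- residual-encoded child nodes H
            q = e_parent.unsqueeze(0)             # (1, embedding_size)
            scores = q @ h_children.T             # dot-product score(Q, H), shape (1, n)
            a = torch.softmax(scores, dim=-1)     # A = softmax(score(Q, H))
            v_c = a @ h_children                  # V_c, shape (1, embedding_size)
            e_new = torch.relu(self.reb_q(q) + self.reb_c(v_c) + self.b)
            return e_new.squeeze(0)               # e'', shape (embedding_size,)

    fuse = NodeAttentionFusion(embedding_size=128, d_i=256)
    e_new = fuse(torch.randn(128), torch.randn(3, 128))  # a parent and three children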
Step 7: the expression of each node is computed on the tree from bottom to top according to the above formulas, and finally the expression of the root node is used for classification. Because the code may belong to multiple categories, multiple sigmoid classifiers are used to obtain multiple data structure tags.
y_i = sigmoid(W2·ReLU(W1·e′_r) + b)
where e′_r is the semantic expression of the root node, W1 and W2 are parameters, b is a bias value, ReLU is the ReLU activation function, and sigmoid is the sigmoid function.
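A sketch of the multi-label classification head, again assuming PyTorch; the tag set (tree, linked list, queue) follows claim 2, and the layer sizes are illustrative:

    import torch
    import torch.nn as nn

    class DataStructureClassifier(nn.Module):
        """One sigmoid output per data-structure tag over the root encoding:
        y = sigmoid(W2 · ReLU(W1 · e'_r) + b)."""
        def __init__(self, embedding_size: int, hidden: int, num_tags: int):
            super().__init__()
            self.w1 = nn.Linear(embedding_size, hidden)
            self.w2 = nn.Linear(hidden, num_tags)  # the bias b lives inside this layer

        def forward(self, e_root: torch.Tensor) -> torch.Tensor:
            return torch.sigmoid(self.w2(torch.relu(self.w1(e_root))))

    clf = DataStructureClassifier(embedding_size=128, hidden=64, num_tags=3)
    probs = clf(torch.randn(128))  # e.g. P(tree), P(linked list), P(queue)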
Step 8: train the model. The whole model is trained using a large amount of code labeled with data structures. First, the lexical analyzer performs lexical analysis on the code, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str. The syntax analyzer then converts the lexically analyzed code into an abstract syntax tree. Each node in the abstract syntax tree is embedded, i.e., mapped to its corresponding real-valued vector, and a residual block applies a nonlinear transformation to the embedded encoding of each node to obtain a new semantic encoding, as in the following formula:
e′ = Reb_q(e) = LN(W2·ReLU(W1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^(embedding_size); embedding_size is the dimension of each node embedding; W1 ∈ R^(d_i×embedding_size); W2 ∈ R^(embedding_size×d_i); d_i is a hyper-parameter; ReLU is the ReLU activation function; LN is layer normalization; and Reb is the residual block.
Non-leaf nodes are encoded on the tree from bottom to top, and an attention mechanism is used to compute, from all the child nodes under the current node, the semantic expression most relevant to the current node.
V_c = A·H^T
A = softmax(score(Q, H))
[Equation image: definition of the score function score(Q, H)]
where Q is a matrix formed by stacking n copies of the current node's vector after residual block transformation, and H is a matrix formed by stacking the vectors of the n child nodes under the current node after residual block transformation. The score function computes the similarity between the current node expression and each child node expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The attention vector and the current node vector are then fused to form a new vector expression of the current node, which contains both the semantic expression of the current node and the semantic expressions of all child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node vector with the attention vector V_c.
Finally, the encoding of the root node is used for classification; because the code may belong to multiple categories, multiple sigmoid classifiers are used to obtain multiple data structure tags.
y_i = sigmoid(W2·ReLU(W1·e′_r) + b)
where e′_r is the semantic expression of the root node, ReLU is the ReLU activation function, and sigmoid is the sigmoid function.
Passing the root node encoding through the sigmoid function yields predicted probabilities that differ from the true probabilities, producing a loss value. Each parameter is then updated through reverse gradient propagation, thereby training the model.
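A sketch of one training update, continuing the classes sketched above; the patent names no particular loss or optimizer, so binary cross-entropy and Adam are stand-ins:

    import torch
    import torch.nn as nn

    reb = ResidualBlock(embedding_size=128, d_i=256)
    clf = DataStructureClassifier(embedding_size=128, hidden=64, num_tags=3)
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(
        list(reb.parameters()) + list(clf.parameters()), lr=1e-3)

    e_root = reb(torch.randn(1, 128)).squeeze(0)  # stand-in for the root encoding e'_r
    probs = clf(e_root)                           # predicted tag probabilities
    target = torch.tensor([1.0, 0.0, 1.0])        # true multi-hot tags, e.g. tree + queue
    loss = criterion(probs, target)               # predicted vs. true probability -> loss
    loss.backward()                               # reverse gradient propagation
    optimizer.step()                              # update each parameter
    optimizer.zero_grad()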
Step 9: new code is predicted using the trained model. Given a section of new code, the lexical analyzer performs lexical analysis on it, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str. The syntax analyzer converts the lexically analyzed code into an abstract syntax tree. Each node in the abstract syntax tree, such as Num and Name, is embedded, i.e., mapped to its corresponding real-valued vector. A residual block encodes the vector of each node to obtain a new encoding, the attention mechanism encodes each node sequentially from bottom to top, and finally the encoding of the root node is used for classification. Because multiple sigmoid classifiers are used, multiple data structures can be predicted: if a classifier predicts the probability of a tag to be greater than 50%, the code belongs to that category; alternatively, a stricter threshold may be set, e.g., requiring a prediction probability above 70% before the code is considered to belong to the category.
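The thresholding rule at the end of step 9 can be sketched as follows; the tag names and probabilities are illustrative:

    TAGS = ["tree", "linked list", "queue"]  # illustrative tag set, per claim 2

    def predict_tags(probs, threshold=0.5):
        """Keep every tag whose classifier probability exceeds the threshold;
        0.5 by default, or a stricter value such as 0.7."""
        return [tag for tag, p in zip(TAGS, probs) if p > threshold]

    print(predict_tags([0.92, 0.31, 0.64]))                 # ['tree', 'queue']
    print(predict_tags([0.92, 0.31, 0.64], threshold=0.7))  # ['tree']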
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Fig. 2 is a diagram of the abstract syntax tree of the code a = b + c.
Fig. 3 is a diagram of abstract syntax tree coding.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
FIG. 1 shows a flow diagram of a method for automatically tagging code with data structures, comprising:
firstly, a crawler technology is used for collecting a large number of codes with data structures from various blogs, forums and other networks;
secondly, performing lexical analysis on the codes by using a lexical analyzer, replacing numbers such as 1, 1.1 and the like with Num, replacing all variable names with Name, and replacing all character strings with Str, wherein the lexical analyzer does not replace keywords corresponding to the language;
thirdly, using a syntax analyzer to analyze the syntax of the code and converting the code into an abstract syntax tree;
fourthly, encoding each node in the abstract syntax tree using a residual block, so that each node obtains a new residual-block encoding; then using an attention mechanism to encode each node sequentially from bottom to top on the tree, fusing the information of all the child nodes of each node with the current node, layer by layer, until the root node of the tree is encoded;
and fifthly, training the model: the whole model is trained using a large amount of code labeled with data structures. First, the lexical analyzer performs lexical analysis on the code, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str; the syntax analyzer then converts the lexically analyzed code into an abstract syntax tree; each node in the abstract syntax tree is embedded, i.e., mapped to its corresponding real-valued vector; and a residual block applies a nonlinear transformation to the embedded encoding of each node to obtain a new semantic encoding, as in the following formula:
e′ = Reb_q(e) = LN(W2·ReLU(W1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^(embedding_size); embedding_size is the dimension of each node embedding; W1 ∈ R^(d_i×embedding_size); W2 ∈ R^(embedding_size×d_i); d_i is a hyper-parameter; ReLU is the ReLU activation function; LN is layer normalization; and Reb is the residual block.
Non-leaf nodes are encoded on the tree from bottom to top, and an attention mechanism is used to compute, from all the child nodes under the current node, the semantic expression most relevant to the current node.
V_c = A·H^T
A = softmax(score(Q, H))
[Equation image: definition of the score function score(Q, H)]
where Q is a matrix formed by stacking n copies of the current node's vector after residual block transformation, and H is a matrix formed by stacking the vectors of the n child nodes under the current node after residual block transformation. The score function computes the similarity between the current node expression and each child node expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The attention vector and the current node vector are then fused to form a new vector expression of the current node, which contains both the semantic expression of the current node and the semantic expressions of all child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node vector with the attention vector V_c.
Finally, the encoding of the root node is used for classification; because the code may belong to multiple categories, multiple sigmoid classifiers are used to obtain multiple data structure tags.
y_i = sigmoid(W2·ReLU(W1·e′_r) + b)
where e′_r is the semantic expression of the root node, ReLU is the ReLU activation function, and sigmoid is the sigmoid function.
Passing the root node encoding through the sigmoid function yields predicted probabilities that differ from the true probabilities, producing a loss value. Each parameter is then updated through reverse gradient propagation, thereby training the model.
Sixthly, new code is predicted using the trained model: given a section of new code, the lexical analyzer performs lexical analysis on it, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str; the syntax analyzer converts the lexically analyzed code into an abstract syntax tree; each node in the abstract syntax tree, such as Num and Name, is embedded, i.e., mapped to its corresponding real-valued vector; a residual block encodes the vector of each node to obtain a new encoding; the attention mechanism encodes each node sequentially from bottom to top; and finally the encoding of the root node is used for classification. Because multiple sigmoid classifiers are used, multiple data structures can be predicted: if a classifier predicts the probability of a tag to be greater than 50%, the code belongs to that category; alternatively, a stricter threshold may be set, e.g., requiring a prediction probability above 70% before the code is considered to belong to the category.
Fig. 2 shows a schematic diagram of the abstract syntax tree of the code a = b + c, which includes the node names Module, Assign, Name, Store, BinOp, Load, and Add. These are introduced in turn: Module is the root node, the start of all code; Assign is the assignment symbol, specifically the = in the code a = b + c; Name is an abstract name for variable names (which variable is not specified, but from the code the variables are a, b, and c); Store is the storage symbol, indicating that the value computed by b + c is assigned to a and stored in memory; BinOp is a binary operation, such as addition, subtraction, multiplication, or division; Load is the loading symbol, which loads the value of a variable; and Add is the addition symbol, which adds the values of two variables.
In fig. 2, the abstract syntax tree of the code a = b + c is built as follows. A Module root node is given first; under it there are as many children as there are lines of code, and here there is only one line. Since a = b + c mainly performs an assignment operation, an Assign node is placed under the Module root node. The Assign node has a left subtree and a right subtree: the left subtree represents the variable being assigned, i.e., the variable on the left side of the equal sign in a = b + c, and the right subtree represents the expression on the right side of the equal sign. The left child of the Assign node is a Name node, and the child under that Name node is a Store node, which represents assigning the value of the right side of the equation to the left side and storing it in memory.
The right child of the Assign node is a BinOp binary operation symbol, indicating that an addition, subtraction, multiplication, or division may follow. Below the BinOp symbol is an Add addition symbol, indicating that the right side of the equal sign is an addition operation. The Name variable symbol on the left of the Add symbol is the variable b in the code a = b + c; the child node below it is a Load symbol, indicating that the value in the variable b is needed for the calculation. The Name variable symbol on the right of the Add symbol is the variable c in the code a = b + c; below it is a Load symbol, indicating that the value in the variable c is needed for the calculation.
In the abstract syntax tree, Module is the root node and Assign is the assignment node. The BinOp binary operation on the right is computed first: one Name variable symbol represents the variable b, whose value is taken out by Load, and another Name variable symbol represents the variable c, whose value is taken out by Load. The Add addition symbol then adds the value of b and the value of c. After the addition, the Assign symbol assigns the computed value to the Name variable symbol on the left, specifically the variable a in the code a = b + c; the Store symbol stores the value assigned to a, and finally that value is kept in memory.
Fig. 3 shows a schematic diagram of encoding the abstract syntax tree, described from bottom to top. The lowermost leaf nodes are Name Embedding, Add Embedding, and Name Embedding, where Embedding indicates that the node has been embedded, i.e., converted into a real-valued vector. These three nodes are then further encoded using a residual block to obtain their semantic encodings. Next, the BinOp binary operation node is embedded, converting it into its corresponding real-valued vector, and the BinOp node is then encoded using the residual block, as in the following equation:
e′ = Reb_q(e) = LN(W2·ReLU(W1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^(embedding_size); embedding_size is the dimension of each node embedding; W1 ∈ R^(d_i×embedding_size); W2 ∈ R^(embedding_size×d_i); d_i is a hyper-parameter; ReLU is the ReLU activation function; LN is layer normalization; and Reb is the residual block.
After encoding, an attention mechanism is used to compute, from all the child nodes under the current BinOp node, the semantic expression most relevant to the current BinOp node, as in the following equations:
V_c = A·H^T
A = softmax(score(Q, H))
[Equation image: definition of the score function score(Q, H)]
where Q is a matrix formed by stacking n copies of the current node's vector after residual block transformation, and H is a matrix formed by stacking the vectors of the n child nodes under the current node after residual block transformation. The score function computes the similarity between the current node expression and each child node expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The semantic encoding vectors of all the child nodes under the BinOp node, obtained by the attention mechanism, are then fused with the residual-block encoding vector of the BinOp node to form a new vector expression of the current node, which contains both the semantic expression of the current node and the semantic expressions of all child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node vector with the attention vector V_c.
The encoding vector of the BinOp binary operation node has now been obtained. Next, the left subtree of the Assign node is encoded. The left subtree has only a Name node: the Name variable node is embedded to obtain its encoding vector and, as above, a residual block encodes the Name embedding to obtain a new semantic expression. All child nodes of the Assign node now have encoding vectors. The Assign node itself is then embedded, converting it into a real-valued vector, and the residual block re-encodes the Assign embedding vector to obtain a new semantic expression. The attention mechanism then computes, from all the child nodes under the current Assign node, the semantic expression most relevant to the current Assign node. Finally, the semantic encoding vectors of all the child nodes under the Assign node, obtained by the attention mechanism, are fused with the residual-block encoding vector of the Assign node to form a new vector expression of the current node, which contains both the semantic expression of the current node and the semantic expressions of all child nodes.
In the final stage of encoding, the Module root node is embedded, converting it into its corresponding real-valued vector, and a residual block re-encodes the Module embedding to obtain a new semantic encoding. The attention mechanism then computes, from all the child nodes under the current Module node, the semantic expression most relevant to the current Module node. The semantic encoding vectors of all the child nodes under the Module node, obtained by the attention mechanism, are then fused with the residual-block encoding vector of the Module node to form a new vector expression of the current node. The semantic encoding vector of the Module node, which represents the semantic encoding of the entire code, is now obtained; it is input into multiple sigmoid functions, each of which judges whether the code contains a certain data structure.
The technical scheme of the invention is described in detail in the following with the accompanying drawings:
as shown in fig. 1, the main process of the present invention is:
step 1: code for a number of annotated data structures is collected from web pages using crawler technology.
Step 2: because different languages have different grammars, a different lexical analyzer is required for each language. The lexical analyzer replaces variables of different types in the code with corresponding words: numbers such as 1 and 1.1 are replaced with Num, all variable names are replaced with Name, and all character strings are replaced with Str. The lexical analyzer does not replace the keywords of the language.
Step 3: corresponding syntax analyzers are used for different languages; the syntax analyzer converts the lexically analyzed code into an abstract syntax tree. As shown in fig. 2, the code a = b + c is converted into an abstract syntax tree using the python ast toolkit.
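The tree in fig. 2 can be reproduced directly with the python ast toolkit mentioned above (the indent argument assumes Python 3.9 or later; output abbreviated):

    import ast

    # Parse "a = b + c" into the tree of fig. 2:
    # Module -> Assign -> (Name/Store on the left, BinOp/Add with two Name/Load on the right)
    tree = ast.parse("a = b + c")
    print(ast.dump(tree, indent=2))
    # Module(
    #   body=[
    #     Assign(
    #       targets=[Name(id='a', ctx=Store())],
    #       value=BinOp(
    #         left=Name(id='b', ctx=Load()),
    #         op=Add(),
    #         right=Name(id='c', ctx=Load())))],
    #   type_ignores=[])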
Step 4: word embedding is performed on the words generated by lexical and syntactic analysis, such as Num, Name, the root node Module, and the assignment operation Assign.
Step 5: the same residual block Reb is used to apply a nonlinear transformation to the embedded encoding of each node, obtaining a new semantic encoding.
e′ = Reb_q(e) = LN(W2·ReLU(W1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^(embedding_size); embedding_size is the dimension of each node embedding; W1 ∈ R^(d_i×embedding_size); W2 ∈ R^(embedding_size×d_i); d_i is a hyper-parameter; ReLU is the ReLU activation function; LN is layer normalization; and Reb is the residual block.
Step 6: non-leaf nodes are encoded on the tree from bottom to top, and an attention mechanism is used to compute, from all the child nodes under the current node, the semantic expression most relevant to the current node.
V_c = A·H^T
A = softmax(score(Q, H))
[Equation image: definition of the score function score(Q, H)]
where Q is a matrix formed by stacking n copies of the current node's vector after residual block transformation, and H is a matrix formed by stacking the vectors of the n child nodes under the current node after residual block transformation. The score function computes the similarity between the current node expression and each child node expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The attention vector and the current node vector are then fused to form a new vector expression of the current node, which contains both the semantic expression of the current node and the semantic expressions of all child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node vector with the attention vector V_c.
Step 7: the expression of each node is computed on the tree from bottom to top according to the above formulas, and finally the expression of the root node is used for classification. Because the code may belong to multiple categories, multiple sigmoid classifiers are used to obtain multiple data structure tags.
y_i = sigmoid(W2·ReLU(W1·e′_r) + b)
where e′_r is the semantic expression of the root node, ReLU is the ReLU activation function, and sigmoid is the sigmoid function.
Step 8: train the model. The whole model is trained using a large amount of code labeled with data structures, so that the accuracy of the model on any code exceeds 50%; training once on all of the labeled code constitutes one epoch. The flow for training the model on one section of code is as follows: given the code, the lexical analyzer performs lexical analysis, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str; the syntax analyzer converts the lexically analyzed code into an abstract syntax tree; each node in the abstract syntax tree is embedded, i.e., mapped to its corresponding real-valued vector; and a residual block applies a nonlinear transformation to the embedded encoding of each node to obtain a new semantic encoding, as in the following formula:
e′ = Reb_q(e) = LN(W2·ReLU(W1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^(embedding_size); embedding_size is the dimension of each node embedding; W1 ∈ R^(d_i×embedding_size); W2 ∈ R^(embedding_size×d_i); d_i is a hyper-parameter; ReLU is the ReLU activation function; LN is layer normalization; and Reb is the residual block.
Non-leaf nodes are encoded on the tree from bottom to top, and an attention mechanism is used to compute, from all the child nodes under the current node, the semantic expression most relevant to the current node.
V_c = A·H^T
A = softmax(score(Q, H))
[Equation image: definition of the score function score(Q, H)]
where Q is a matrix formed by stacking n copies of the current node's vector after residual block transformation, and H is a matrix formed by stacking the vectors of the n child nodes under the current node after residual block transformation. The score function computes the similarity between the current node expression and each child node expression; the higher the similarity, the higher the probability after softmax. The score function can compute the similarity between the current node and the child nodes in three ways. V_c is the attention expression.
The attention vector and the current node vector are then fused to form a new vector expression of the current node, which contains both the semantic expression of the current node and the semantic expressions of all child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node vector with the attention vector V_c.
Finally, the encoding of the root node is used for classification. Passing the root node encoding through the sigmoid function yields predicted probabilities that differ from the true probabilities, producing a loss value; each parameter is then updated through reverse gradient propagation, thereby training the model.
Step 9: new code is predicted using the trained model. Given a section of new code, the lexical analyzer performs lexical analysis on it, replacing numbers such as 1 and 1.1 with Num, all variable names with Name, and all character strings with Str. The syntax analyzer converts the lexically analyzed code into an abstract syntax tree. Each node in the abstract syntax tree, such as Num and Name, is embedded, i.e., mapped to its corresponding real-valued vector. A residual block encodes the vector of each node to obtain a new encoding, the attention mechanism encodes each node sequentially from bottom to top, and finally the encoding of the root node is used for classification. Because multiple sigmoid classifiers are used, multiple data structures can be predicted: if a classifier predicts the probability of a tag to be greater than 50%, the code belongs to that category; alternatively, a stricter threshold may be set, e.g., requiring a prediction probability above 70% before the code is considered to belong to the category.
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to implement the method of automatically labeling code with a data structure of the present invention.
The method for automatically labeling code with data structures provided by the embodiments of the invention has been described in detail above. The principle and implementation of the invention are explained herein, and the description of the embodiments is only intended to help understand the method and its core idea.

Claims (9)

1. A computer-readable medium, on which a computer program is stored, which program, when executed by a processor, implements a method of automatically tagging code with a data structure, comprising:
collecting a plurality of codes marked with data structures;
converting the code into an abstract syntax tree by using a lexical analyzer and a syntax analyzer;
encoding the nodes on the tree using an attention mechanism and a residual block, and labeling the code using the encoding;
training a model and predicting new codes by using the trained model, wherein the training model comprises:
training the whole model using a large amount of code labeled with data structures, and first performing lexical analysis on the code using a lexical analyzer;
converting the codes after lexical analysis into an abstract syntax tree by using a syntax analyzer;
embedding each node in the abstract syntax tree, i.e., mapping each node to its corresponding real-valued vector;
applying a nonlinear transformation to the embedded encoding of each node using a residual block to obtain a new semantic encoding, as in the following formula:
e′ = Reb_q(e) = LN(W2·ReLU(W1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^(embedding_size); embedding_size is the dimension of each node embedding; W1 ∈ R^(d_i×embedding_size); W2 ∈ R^(embedding_size×d_i); d_i is a hyper-parameter; ReLU is the ReLU activation function; LN is layer normalization; and Reb is the residual block;
encoding non-leaf nodes on the tree from bottom to top, and using an attention mechanism to compute, from all the child nodes under the current node, the semantic expression most relevant to the current node;
V_c = A·H^T
A = softmax(score(Q, H))
[Equation image: definition of the score function score(Q, H)]
where Q is a matrix formed by stacking n copies of the current node's vector after residual block transformation, H is a matrix formed by stacking the vectors of the n child nodes under the current node after residual block transformation, the score function computes the similarity between the current node expression and each child node expression, where the higher the similarity, the higher the probability after softmax, the score function can compute the similarity between the current node and the child nodes in three ways, and V_c is the attention expression; the attention vector and the current node vector are then fused to form a new vector expression of the current node, which contains both the semantic expression of the current node and the semantic expressions of all child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e′ is the vector encoding of the current node, V_c is the attention vector, Reb_q and Reb_c are residual blocks, b is a bias value, ReLU is the ReLU activation function, and e″ is the vector encoding obtained by fusing the residual-block-encoded current node vector with the residual-block-encoded attention vector;
finally, the encoding of the root node is used for classification, and because the code may belong to multiple categories, multiple sigmoid classifiers are used to obtain multiple data structure tags;
y_i = sigmoid(W2·ReLU(W1·e′_r) + b)
where e′_r is the semantic expression of the root node, ReLU is the ReLU activation function, and sigmoid is the sigmoid function;
the coding of the root node has a prediction probability and a real probability difference through a sigmoid function, a loss value is generated, and each parameter is updated through reverse gradient propagation, so that the training effect is achieved.
2. The computer-readable medium of claim 1, wherein collecting the plurality of codes marked with data structures comprises collecting, via crawler technology, hundreds of thousands of code samples tagged with their corresponding data structures from the web, wherein the data structures comprise trees, linked lists, and queues.
3. The computer-readable medium of claim 1, wherein the lexical analysis comprises using a lexical analyzer to replace different types of variables in the code with corresponding words, wherein a different lexical analyzer is required for each language because different languages have different grammars.
4. The computer-readable medium of claim 1, wherein the syntactic analysis comprises using, for each language, a corresponding syntax analyzer to convert the lexically analyzed code into an abstract syntax tree.
5. The computer-readable medium of claim 1, wherein encoding the nodes on the tree using an attention mechanism and a residual block comprises:
performing word embedding on words generated after lexical analysis and syntactic analysis to convert the words into real-valued vectors;
all nodes are encoded on the tree using attention mechanisms and residual blocks.
6. The computer-readable medium of claim 1, wherein labeling the code using the encoding comprises:
classifying using the encoding of the root node, wherein, because the code may belong to multiple categories, multiple sigmoid classifiers are used to obtain tags of multiple data structures;
y_i = sigmoid(W2·ReLU(W1·e′_r) + b)
where e′_r is the semantic expression of the root node, W1, W2 and b are parameters to be learned, ReLU is the ReLU activation function, and sigmoid is the sigmoid function.
7. The computer-readable medium of claim 1, wherein predicting the new code using the trained model comprises:
predicting new code using the trained model: given a section of new code, performing lexical analysis on it with a lexical analyzer, replacing the numbers 1 and 1.1 with Num, all variable names with Name, and all character strings with Str; converting the lexically analyzed code into an abstract syntax tree using a syntax analyzer; embedding each node in the abstract syntax tree, such as the Num and Name nodes, i.e., mapping each node to its corresponding real-valued vector; encoding the vector of each node with a residual block to obtain a new encoding; encoding each node sequentially from bottom to top using an attention mechanism; and finally classifying using the encoding of the root node, wherein, because multiple sigmoid classifiers are used, multiple data structures can be predicted: if one classifier predicts the probability of a tag to be greater than 50%, the code belongs to that category, or a threshold is set, e.g., the prediction probability must exceed 70%, before the code is considered to belong to that category.
8. The computer-readable medium of claim 5, wherein using residual block encoding on the tree comprises:
applying a nonlinear transformation to the embedded encoding of each node using the same residual block Reb to obtain a new semantic encoding of the node, as in the following formula:
e′ = Reb_q(e) = LN(W2·ReLU(W1·e) + e)
where e is the embedded encoding of the current node, e ∈ R^(embedding_size); embedding_size is the dimension of each node embedding; W1 ∈ R^(d_i×embedding_size); W2 ∈ R^(embedding_size×d_i); d_i is a hyper-parameter; ReLU is the ReLU activation function; LN is layer normalization; and Reb is the residual block.
9. The computer-readable medium of claim 5, wherein the encoding on the tree using an attention mechanism comprises:
encoding non-leaf nodes on the tree from bottom to top, and using an attention mechanism to compute, from all the child nodes under the current node, the semantic expression most relevant to the current node;
V_c = A·H^T
A = softmax(score(Q, H))
[Equation image: definition of the score function score(Q, H)]
where Q is a matrix formed by stacking n copies of the current node's vector after residual block transformation, H is a matrix formed by stacking the vectors of the n child nodes under the current node after residual block transformation, the score function computes the similarity between the current node expression and each child node expression, where the higher the similarity, the higher the probability after softmax, the score function can compute the similarity in three ways, and V_c is the attention expression; the attention vector and the current node vector are then fused to form a new vector expression of the current node, which contains both the semantic expression of the current node and the semantic expressions of all child nodes, as in the following equation:
e″ = ReLU(Reb_q(e′) + Reb_c(V_c) + b)
where e″ is the expression obtained by fusing the current node vector with the attention vector V_c, ReLU is the ReLU activation function, and Reb is the residual block.
CN202011019000.8A 2019-04-16 2019-04-16 Computer readable storage medium for automatically labeling code with data structure Active CN112148879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011019000.8A CN112148879B (en) 2019-04-16 2019-04-16 Computer readable storage medium for automatically labeling code with data structure

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910304797.7A CN110008344B (en) 2019-04-16 2019-04-16 Method for automatically marking data structure label on code
CN202011019000.8A CN112148879B (en) 2019-04-16 2019-04-16 Computer readable storage medium for automatically labeling code with data structure

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910304797.7A Division CN110008344B (en) 2019-04-16 2019-04-16 Method for automatically marking data structure label on code

Publications (2)

Publication Number Publication Date
CN112148879A (en) 2020-12-29
CN112148879B CN112148879B (en) 2023-06-23

Family

ID=67172257

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910304797.7A Active CN110008344B (en) 2019-04-16 2019-04-16 Method for automatically marking data structure label on code
CN202011019000.8A Active CN112148879B (en) 2019-04-16 2019-04-16 Computer readable storage medium for automatically labeling code with data structure

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910304797.7A Active CN110008344B (en) 2019-04-16 2019-04-16 Method for automatically marking data structure label on code

Country Status (1)

Country Link
CN (2) CN110008344B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139054B (en) * 2021-04-21 2023-11-24 南通大学 Code programming language classification method based on Transformer
CN116661805B (en) * 2023-07-31 2023-11-14 腾讯科技(深圳)有限公司 Code representation generation method and device, storage medium and electronic equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050273772A1 (en) * 1999-12-21 2005-12-08 Nicholas Matsakis Method and apparatus of streaming data transformation using code generator and translator
CN101614787A (en) * 2009-07-07 2009-12-30 南京航空航天大学 Analogical Electronics method for diagnosing faults based on M-ary textural classification device
CN102439589A (en) * 2008-10-02 2012-05-02 韩国电子通信研究院 Method and apparatus for encoding and decoding xml documents using path code
CN103064969A (en) * 2012-12-31 2013-04-24 武汉传神信息技术有限公司 Method for automatically creating keyword index table
US20160124723A1 (en) * 2014-10-31 2016-05-05 Weixi Ma Graphically building abstract syntax trees
CN108399158A (en) * 2018-02-05 2018-08-14 华南理工大学 Attribute sensibility classification method based on dependency tree and attention mechanism
CN108446540A (en) * 2018-03-19 2018-08-24 中山大学 Program code based on source code multi-tag figure neural network plagiarizes type detection method and system
WO2018191344A1 (en) * 2017-04-14 2018-10-18 Salesforce.Com, Inc. Neural machine translation with latent tree attention
CN108829823A (en) * 2018-06-13 2018-11-16 北京信息科技大学 A kind of file classification method
CN109033069A (en) * 2018-06-16 2018-12-18 天津大学 A kind of microblogging Topics Crawling method based on Social Media user's dynamic behaviour
CN109241834A (en) * 2018-07-27 2019-01-18 中山大学 A kind of group behavior recognition methods of the insertion based on hidden variable
CN110188104A (en) * 2019-05-30 2019-08-30 中森云链(成都)科技有限责任公司 A kind of Python program code method for fast searching towards K12 programming

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040068716A1 (en) * 2002-10-04 2004-04-08 Quicksilver Technology, Inc. Retargetable compiler for multiple and different hardware platforms
CN102063488A (en) * 2010-12-29 2011-05-18 南京航空航天大学 Code searching method based on semantics
CN102339252B (en) * 2011-07-25 2014-04-23 大连理工大学 Static state detecting system based on XML (Extensive Makeup Language) middle model and defect mode matching
US10169208B1 (en) * 2014-11-03 2019-01-01 Charles W Moyes Similarity scoring of programs
CN107220180B (en) * 2017-06-08 2020-08-04 电子科技大学 Code classification method based on neural network language model


Also Published As

Publication number Publication date
CN110008344A (en) 2019-07-12
CN110008344B (en) 2020-09-29
CN112148879B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN111160008B (en) Entity relationship joint extraction method and system
CN113642330B (en) Rail transit standard entity identification method based on catalogue theme classification
CN107330032B (en) Implicit discourse relation analysis method based on recurrent neural network
CN110110054B (en) Method for acquiring question-answer pairs from unstructured text based on deep learning
CN113177124B (en) Method and system for constructing knowledge graph in vertical field
CN109960728B (en) Method and system for identifying named entities of open domain conference information
CN111897908A (en) Event extraction method and system fusing dependency information and pre-training language model
CN111694924A (en) Event extraction method and system
CN113822026B (en) Multi-label entity labeling method
CN111274790A (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN113168499A (en) Method for searching patent document
CN113254675B (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
CN113196277A (en) System for retrieving natural language documents
CN113360654B (en) Text classification method, apparatus, electronic device and readable storage medium
CN114911945A (en) Knowledge graph-based multi-value chain data management auxiliary decision model construction method
CN113468887A (en) Student information relation extraction method and system based on boundary and segment classification
CN110008344B (en) Method for automatically marking data structure label on code
CN116245097A (en) Method for training entity recognition model, entity recognition method and corresponding device
CN114492460B (en) Event causal relationship extraction method based on derivative prompt learning
CN113934909A (en) Financial event extraction method based on pre-training language and deep learning model
CN117390189A (en) Neutral text generation method based on pre-classifier
CN116186241A (en) Event element extraction method and device based on semantic analysis and prompt learning, electronic equipment and storage medium
CN115906818A (en) Grammar knowledge prediction method, grammar knowledge prediction device, electronic equipment and storage medium
CN115860002A (en) Combat task generation method and system based on event extraction
CN114528459A (en) Semantic-based webpage information extraction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant