CN112035099A - Vectorization representation method and device for nodes in abstract syntax tree - Google Patents

Vectorization representation method and device for nodes in abstract syntax tree

Info

Publication number
CN112035099A
Authority
CN
China
Prior art keywords
sequence
syntax tree
abstract syntax
processed
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010907349.9A
Other languages
Chinese (zh)
Other versions
CN112035099B (en)
Inventor
董叶豪
刘盈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd filed Critical Beijing Topsec Technology Co Ltd
Priority to CN202010907349.9A
Publication of CN112035099A
Application granted
Publication of CN112035099B
Active legal status (Current)
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/30Creation or generation of source code
    • G06F8/31Programming languages or programming paradigms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/42Syntactic analysis
    • G06F8/427Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application provides a vectorization representation method and device for nodes in an abstract syntax tree, relating to the field of computer technology. The vectorization representation method for the nodes in the abstract syntax tree comprises the following steps: first, acquiring an abstract syntax tree to be processed; then, performing breadth-first traversal on the abstract syntax tree to obtain a first sequence, and performing depth-first traversal on the abstract syntax tree to obtain a second sequence; further, generating a coding sequence to be processed from the first sequence and the second sequence; and finally, processing the coding sequence to be processed through a pre-constructed vectorization processing model to obtain a vectorization representation result for the nodes in the abstract syntax tree. In this way, all the nodes in the abstract syntax tree are fully covered, so that the nodes can be vectorized accurately.

Description

Vectorization representation method and device for nodes in abstract syntax tree
Technical Field
The present application relates to the field of computer technologies, and in particular, to a vectorization representation method and apparatus for nodes in an abstract syntax tree.
Background
An Abstract Syntax Tree (AST), or syntax tree, is a tree representation of the abstract syntactic structure of source code data written in a programming language, with each node of the tree representing a construct that appears in the source code data. Existing vectorization methods for the nodes of an abstract syntax tree usually encode only the direct child nodes of each node to obtain its vectorized representation. In practice, because such methods use only the child nodes and discard the sibling and grandchild nodes, node information is lost. Existing vectorization methods therefore cannot represent the nodes of an abstract syntax tree accurately.
Disclosure of Invention
The embodiments of the present application provide a method and an apparatus for vectorizing the nodes in an abstract syntax tree, which can fully cover all the nodes in the abstract syntax tree and thus vectorize the nodes accurately.
A first aspect of an embodiment of the present application provides a vectorization representation method for nodes in an abstract syntax tree, including:
acquiring an abstract syntax tree to be processed;
performing breadth-first traversal on the abstract syntax tree to obtain a first sequence, and performing depth-first traversal on the abstract syntax tree to obtain a second sequence;
generating a coding sequence to be processed according to the first sequence and the second sequence;
and processing the coding sequence to be processed through a pre-constructed vectorization processing model to obtain a vectorization representation result of the nodes in the abstract syntax tree.
In the implementation process, an abstract syntax tree to be processed is first obtained; then, breadth-first traversal is performed on the abstract syntax tree to obtain a first sequence, and depth-first traversal is performed on the abstract syntax tree to obtain a second sequence; further, a coding sequence to be processed is generated from the first sequence and the second sequence; finally, the coding sequence to be processed is processed through a pre-constructed vectorization processing model to obtain a vectorization representation result for the nodes in the abstract syntax tree. In this way, all the nodes in the abstract syntax tree are fully covered, so that the nodes can be vectorized accurately.
Further, the obtaining the abstract syntax tree to be processed includes:
acquiring source code data to be processed;
and analyzing the source code data to obtain the abstract syntax tree to be processed.
In the implementation process, the abstract syntax tree is obtained by parsing the source code data; when vectorizing the abstract syntax tree, the corresponding vectorized representation can be generated from nothing more than a source code file, so the method is simple and widely applicable.
Further, the generating a coding sequence to be processed according to the first sequence and the second sequence includes:
performing connection processing on the first sequence and the second sequence to obtain a connection sequence;
and coding the connecting sequence to obtain a coding sequence to be processed.
In the implementation process, by connecting the first sequence and the second sequence, the resulting connection sequence can cover both the correlations between sibling nodes and the correlations between parent and child nodes in the abstract syntax tree, capturing structural regularities among the nodes, which is favorable for accurately vectorizing the nodes in the abstract syntax tree.
Further, before the processing the to-be-processed coding sequence through the pre-constructed vectorization processing model to obtain the vectorization representation result of the node in the abstract syntax tree, the method further includes:
constructing an original processing model;
acquiring training data and preset model parameters for training the original processing model;
adjusting the original processing model through the preset model parameters to obtain an initial model;
and training the initial model through the training data to obtain a vectorization processing model.
In the implementation process, before the coding sequence to be processed is processed through the pre-constructed vectorization processing model, an original processing model needs to be constructed, and then parameter setting and training are performed on the original processing model through preset model parameters and training data, so that the vectorization processing model is obtained.
Further, the preset model parameters at least comprise a coding dimension value and a preset cost function of the coding sequence to be processed;
adjusting the original processing model through the preset model parameters to obtain an initial model, including:
setting the number of neurons of an output layer of each model unit in the original processing model as the coding dimension value to obtain an initial adjustment model;
and setting the cost function of the initial adjustment model as the preset cost function to obtain an initial model.
In the implementation process, adjusting the original processing model through the preset model parameters helps improve the accuracy of the model and, in turn, the accuracy of the vectorized representation of the abstract syntax tree.
A second aspect of the embodiments of the present application provides a device for vectorizing a node in an abstract syntax tree, where the device for vectorizing a node in an abstract syntax tree includes:
the acquisition module is used for acquiring an abstract syntax tree to be processed;
the traversal module is used for performing breadth-first traversal on the abstract syntax tree to obtain a first sequence and performing depth-first traversal on the abstract syntax tree to obtain a second sequence;
the coding module is used for generating a coding sequence to be processed according to the first sequence and the second sequence;
and the model processing module is used for processing the coding sequence to be processed through a pre-constructed vectorization processing model to obtain a vectorization representation result of the nodes in the abstract syntax tree.
In the implementation process, the acquisition module acquires an abstract syntax tree to be processed; then, the traversal module performs breadth-first traversal on the abstract syntax tree to obtain a first sequence, and performs depth-first traversal on the abstract syntax tree to obtain a second sequence; further, the coding module generates a coding sequence to be processed from the first sequence and the second sequence; finally, the model processing module processes the coding sequence to be processed through a pre-constructed vectorization processing model to obtain a vectorization representation result for the nodes in the abstract syntax tree. In this way, all the nodes in the abstract syntax tree are fully covered, so that the nodes can be vectorized accurately.
Further, the obtaining module comprises:
the acquisition submodule is used for acquiring source code data to be processed;
and the analysis submodule is used for analyzing the source code data to obtain the abstract syntax tree to be processed.
In the implementation process, the parsing submodule parses the source code data to obtain the abstract syntax tree; when vectorizing the abstract syntax tree, the corresponding vectorized representation can be generated from nothing more than a source code file obtained by the acquisition submodule, so the method is simple and widely applicable.
Further, the encoding module includes:
the connection submodule is used for performing connection processing on the first sequence and the second sequence to obtain a connection sequence;
and the coding submodule is used for coding the connection sequence to obtain a coding sequence to be processed.
In the implementation process, the connection submodule connects the first sequence and the second sequence, so that the resulting connection sequence can cover both the correlations between sibling nodes and the correlations between parent and child nodes in the abstract syntax tree, capturing structural regularities among the nodes, which is favorable for accurately vectorizing the nodes in the abstract syntax tree.
A third aspect of embodiments of the present application provides an electronic device, including a memory and a processor, where the memory is used to store a computer program, and the processor runs the computer program to make the electronic device execute the vectorization representation method for nodes in an abstract syntax tree according to any one of the first aspect of embodiments of the present application.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, which stores computer program instructions, and when the computer program instructions are read and executed by a processor, the computer program instructions perform the vectorization representation method for nodes in an abstract syntax tree according to any one of the first aspect of the embodiments of the present application.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a vectorization representation method for nodes in an abstract syntax tree according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a vectorization representation method for nodes in an abstract syntax tree according to a second embodiment of the present application;
fig. 3 is a schematic structural diagram of a vectorization representation apparatus for nodes in an abstract syntax tree according to a third embodiment of the present application;
fig. 4 is a schematic structural diagram of a vectorization representation apparatus for nodes in an abstract syntax tree according to a fourth embodiment of the present application;
fig. 5 is an expanded schematic view of an LSTM model according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Example 1
Referring to fig. 1, fig. 1 is a flowchart illustrating a vectorization representation method for nodes in an abstract syntax tree according to an embodiment of the present disclosure. The vectorization representation method of the nodes in the abstract syntax tree comprises the following steps:
and S101, acquiring an abstract syntax tree to be processed.
In this embodiment, an execution subject of the method may be an electronic device such as a computer, a server, a smart phone, a tablet computer, and the like, which is not limited in this embodiment.
In the embodiment of the present application, an Abstract Syntax Tree (AST), also called a syntax tree, is an abstract representation of the syntactic structure of source code data. The abstract syntax tree represents the syntax structure of the programming language in the form of a tree, with each node on the tree representing a construct in the source code data.
In the embodiment of the present application, the source code data to be processed may be analyzed to obtain the abstract syntax tree to be processed, or the pre-stored abstract syntax tree to be processed may be directly obtained, which is not limited in this embodiment of the present application.
S102, performing breadth-first traversal on the abstract syntax tree to obtain a first sequence, and performing depth-first traversal on the abstract syntax tree to obtain a second sequence.
In the embodiment of the present application, a breadth-first search algorithm may be used to perform the breadth-first traversal of the abstract syntax tree; specifically, the Dijkstra single-source shortest-path algorithm, the Prim minimum-spanning-tree algorithm, or the like may be used, and the embodiment of the present application is not limited in this respect.
In the embodiment of the present application, the main idea of the Breadth-First Search (BFS) algorithm, also called width-first search, is similar to a level-order traversal of a tree; traversing the abstract syntax tree with BFS captures the correlations between sibling nodes.
In the embodiment of the present application, a depth-first search algorithm may be used to perform the depth-first traversal of the abstract syntax tree, so that the correlations between parent and child nodes can be captured.
In the embodiment of the present application, the main idea of the Depth-First Search (DFS) algorithm is as follows: starting from an unvisited vertex, walk along the edges of the current vertex to an unvisited vertex; when the current vertex has no unvisited neighbors, backtrack to the previous vertex and continue probing other vertices until all vertices have been visited.
In the embodiment of the application, the first sequence and the second sequence are token sequences; a token is a character string, and a token sequence is the sequence of character strings obtained by traversing the abstract syntax tree.
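By way of illustration only, the following Python sketch shows the two traversals over a minimal, hypothetical Node type (a token string plus a list of child nodes); the actual node type depends on the parser that produced the tree:

    from collections import deque

    class Node:
        """Hypothetical AST node: a token string plus a list of child nodes."""
        def __init__(self, token, children=None):
            self.token = token
            self.children = children or []

    def bfs_tokens(root):
        """Breadth-first (level-order) traversal yielding the first sequence."""
        tokens, queue = [], deque([root])
        while queue:
            node = queue.popleft()
            tokens.append(node.token)
            queue.extend(node.children)
        return tokens

    def dfs_tokens(root):
        """Depth-first (pre-order) traversal yielding the second sequence."""
        tokens, stack = [], [root]
        while stack:
            node = stack.pop()
            tokens.append(node.token)
            stack.extend(reversed(node.children))
        return tokens

The breadth-first pass walks level by level (capturing sibling order), while the depth-first pass follows each branch to its leaves (capturing parent-child order).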
After step S102, the method further includes the following steps:
and S103, generating a coding sequence to be processed according to the first sequence and the second sequence.
In the embodiment of the application, the first sequence and the second sequence are connected to obtain a connection sequence, and then the connection sequence is encoded to obtain a coding sequence to be processed.
In this embodiment of the present application, a one-hot encoding algorithm and the like may be used when encoding the connection sequence, which is not limited in this embodiment of the present application.
In the embodiment of the application, the one-hot encoding algorithm, also called one-bit-effective encoding, uses an N-bit state register to encode N states; each state has its own register bit, and only one bit is valid at any time. One-hot encoding thus represents categorical variables as binary vectors.
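The connection and encoding steps can be sketched in Python as follows; building the token vocabulary from the single connection sequence is a simplification assumed here for brevity (in practice the vocabulary would cover the whole training corpus):

    import numpy as np

    def to_one_hot(first_seq, second_seq):
        """Concatenate the two token sequences and one-hot encode the result.

        Returns an array of shape (sequence length, M), where M is the number
        of distinct token types; each row has exactly one bit set.
        """
        connection = list(first_seq) + list(second_seq)  # connection sequence
        vocab = sorted(set(connection))                  # M token types (simplified)
        index = {tok: i for i, tok in enumerate(vocab)}
        codes = np.zeros((len(connection), len(vocab)), dtype=np.float32)
        for row, tok in enumerate(connection):
            codes[row, index[tok]] = 1.0                 # one valid bit per row
        return codes, vocab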
And S104, processing the coding sequence to be processed through a pre-constructed vectorization processing model to obtain a vectorization representation result of the nodes in the abstract syntax tree.
In the embodiment of the present application, the pre-constructed vectorization processing model is a neural network model, and may specifically be a Long Short-Term Memory network (LSTM) model, and the like, which is not limited in this embodiment of the present application.
In the embodiment of the present application, when the vectorization processing model is an LSTM model, the LSTM model is a recurrent neural network comprising an LSTM unit, and the expected output at each time step of the LSTM unit is the token of the next time step. Referring to fig. 5, fig. 5 is an expanded schematic view of an LSTM model according to an embodiment of the present application. As shown in fig. 5, after the LSTM model is expanded, the time steps of the LSTM unit include time step T1, time step T2, and time step T3. When training the LSTM, if a piece of training data is [token1, token2, token3, token4], the expected output at time step T1 of the LSTM unit is the token of the next time step (time step T2), i.e., token2; similarly, the expected output at time step T2 is token3, and the expected output at time step T3 is token4. The LSTM model is trained with the training data to obtain the trained vectorization processing model.
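The shifted next-token training target described above can be illustrated with a small helper (the function name is illustrative only, not part of the described embodiment):

    def next_token_pairs(token_seq):
        """Pair each time step's input with the next step's token as its target."""
        return list(zip(token_seq[:-1], token_seq[1:]))

    # next_token_pairs(["token1", "token2", "token3", "token4"]) yields
    # [("token1", "token2"), ("token2", "token3"), ("token3", "token4")],
    # i.e. the expected output at T1 is token2, at T2 token3, and at T3 token4.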
It can be seen that, by implementing the vectorization representation method for the nodes in the abstract syntax tree described in this embodiment, all the nodes in the abstract syntax tree can be completely covered, and thus the nodes in the abstract syntax tree can be accurately vectorized and represented.
Example 2
Referring to fig. 2, fig. 2 is a flowchart illustrating a vectorization representation method for nodes in an abstract syntax tree according to an embodiment of the present application. As shown in fig. 2, the vectorized representation method of the node in the abstract syntax tree includes:
s201, constructing an original processing model.
In this embodiment of the present application, the original processing model may specifically be a Long Short-Term Memory network (LSTM) model, and the like, which is not limited in this embodiment of the present application.
S202, obtaining training data used for training an original processing model and preset model parameters, wherein the preset model parameters at least comprise a coding dimension value and a preset cost function of a coding sequence to be processed.
S203, setting the number of neurons of the output layer of each model unit in the original processing model as a coding dimension value to obtain an initial adjustment model.
In the embodiment of the present application, the original processing model includes an original LSTM unit, and the expected output at each time step of the original LSTM unit is the token of the next time step; the number of neurons in the output layer at each time step is therefore set equal to the coding dimension value.
In the embodiment of the present application, if the coding dimension value of the coding sequence to be processed is M, the number of neurons in the output layer at each time step is correspondingly set to M.
After step S203, the following steps are also included:
and S204, setting the cost function of the initial adjustment model as a preset cost function to obtain an initial model.
In the embodiment of the present application, the preset cost function is a preset loss function, which may specifically be a cross-entropy function or the like; the embodiment of the present application is not limited in this respect.
In the embodiment of the application, a softmax layer is added after the output layer of the initial adjustment model, and a cross-entropy function is used as the cost function of the initial adjustment model.
In the embodiment of the application, the cross-entropy function, i.e., the cross-entropy loss function, has also been applied to disambiguation in computational linguistics: the true semantics of a sentence are taken as the prior information from the training set, the machine-translated semantics as the posterior information from the test set, and the cross entropy between the two is computed to guide the identification and elimination of ambiguity.
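For illustration, a PyTorch sketch of a model adjusted in this way is given below; the choice of PyTorch and the concrete values of M and N are assumptions for illustration, not part of the described embodiment (note that nn.CrossEntropyLoss already combines the softmax layer with the cross-entropy cost on the logits):

    import torch
    import torch.nn as nn

    class TokenLSTM(nn.Module):
        """LSTM adjusted as in steps S203-S204: the output layer at each time
        step has M neurons (the coding dimension value), trained with a
        softmax/cross-entropy cost."""
        def __init__(self, M, N):
            super().__init__()
            self.lstm = nn.LSTM(input_size=M, hidden_size=N, batch_first=True)
            self.out = nn.Linear(N, M)   # M output neurons per time step

        def forward(self, x):            # x: (batch, time, M) one-hot codes
            hidden, _ = self.lstm(x)     # hidden: (batch, time, N)
            return self.out(hidden)      # logits over the M token types

    model = TokenLSTM(M=128, N=64)       # illustrative sizes only
    criterion = nn.CrossEntropyLoss()    # softmax + cross entropy combined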
In the embodiment of the present application, by implementing the steps S203 to S204, the original processing model can be adjusted by presetting the model parameters, so as to obtain the initial model.
S205, training the initial model through the training data to obtain a vectorization processing model.
In the embodiment of the present application, each column vector of the connection weight matrix between the input layer of the trained vectorization processing model and the hidden layer of the LSTM unit corresponds to the vectorized representation of one token.
In this embodiment of the present application, when a one-hot encoding algorithm is used to encode the connection sequence, if there are M types of character strings (i.e., tokens) in the abstract syntax tree, the one-hot code corresponding to each token at the input layer of the vectorization processing model is an M-dimensional vector. Further, assuming the hidden layer of the LSTM unit has N neurons, the connection weight matrix between the input layer of the vectorization processing model and the hidden layer of the LSTM unit has dimensions (N, M). Because each token at the input layer is a one-hot vector, the trained vectorization processing model compresses each M-dimensional one-hot vector into an N-dimensional column vector, namely a column of the connection weight matrix.
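For illustration, reading a token's vector off the weight matrix can be sketched as follows; the helper is hypothetical, and note that torch.nn.LSTM stacks the four gate blocks, so its weight_ih_l0 has shape (4N, M) rather than (N, M):

    def token_vector(weight_ih, vocab, token):
        """Read a token's vectorized representation off the trained
        input-to-hidden connection weight matrix: the one-hot input selects
        exactly one column, so that column is the token's vector."""
        return weight_ih[:, vocab.index(token)]

    # With the TokenLSTM sketch above, weight_ih = model.lstm.weight_ih_l0
    # has shape (4N, M); slicing its first N rows gives the N-dimensional
    # column vector contributed by the input-gate block.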
After step S205, the following steps are also included:
and S206, acquiring source code data to be processed.
In the embodiments of the present application, source code data (also referred to as a source program) refers to a series of human-readable computer language instructions.
In the embodiment of the present application, the source code data may be C language source code data, and the like, which is not limited to this embodiment of the present application.
And S207, analyzing the source code data to obtain the abstract syntax tree to be processed.
In the embodiment of the application, the source code data can be parsed by a syntax parser to generate the corresponding abstract syntax tree. A parser usually appears as a component of a compiler or interpreter; it checks the syntax of the input and builds a data structure, namely the abstract syntax tree, from the input tokens.
In this embodiment of the application, when the source code data is C-language source code data, a C-language parser such as the C-language source code parser pycparser may be used to parse the source code data; the embodiment of the application is not limited in this respect.
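A minimal pycparser example is sketched below; the sample source string is illustrative, and pycparser expects already-preprocessed C code (no #include directives):

    from pycparser import c_parser

    source = "int add(int a, int b) { return a + b; }"  # illustrative sample

    parser = c_parser.CParser()
    ast = parser.parse(source)   # builds the abstract syntax tree to be processed
    ast.show()                   # prints the tree, one construct per node

    # Each node exposes its children as (name, node) pairs, which is enough
    # to drive the breadth-first and depth-first traversals described above:
    for name, child in ast.children():
        print(name, type(child).__name__)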
In the embodiment of the present application, by implementing the steps S206 to S207, the abstract syntax tree to be processed can be obtained.
After step S207, the following steps are also included:
and S208, performing breadth-first traversal on the abstract syntax tree to obtain a first sequence, and performing depth-first traversal on the abstract syntax tree to obtain a second sequence.
S209, performing connection processing on the first sequence and the second sequence to obtain a connection sequence.
S210, coding the connection sequence to obtain a coding sequence to be processed.
In the embodiment of the present application, a one-hot encoding algorithm may be adopted to encode the connection sequence.
In the embodiment of the present application, the code sequence to be processed can be generated according to the first sequence and the second sequence by implementing the steps S209 to S210.
S211, processing the coding sequence to be processed through a pre-constructed vectorization processing model to obtain a vectorization representation result of the nodes in the abstract syntax tree.
It can be seen that, by implementing the vectorization representation method for the nodes in the abstract syntax tree described in this embodiment, all the nodes in the abstract syntax tree can be completely covered, and thus the nodes in the abstract syntax tree can be accurately vectorized and represented.
Example 3
Referring to fig. 3, fig. 3 is a schematic structural diagram of a vectorization representation apparatus for a node in an abstract syntax tree according to an embodiment of the present application. As shown in fig. 3, the vectorization representation apparatus of the node in the abstract syntax tree includes:
an obtaining module 310, configured to obtain an abstract syntax tree to be processed.
And the traversal module 320 is configured to perform breadth-first traversal on the abstract syntax tree to obtain a first sequence, and perform depth-first traversal on the abstract syntax tree to obtain a second sequence.
And the encoding module 330 is configured to generate a to-be-processed encoding sequence according to the first sequence and the second sequence.
The model processing module 340 is configured to process the coding sequence to be processed through a pre-constructed vectorization processing model, so as to obtain a vectorization representation result of a node in the abstract syntax tree.
In the embodiment of the present application, for the explanation of the vectorization representation apparatus for nodes in the abstract syntax tree, reference may be made to the description in embodiment 1 or embodiment 2, and details are not repeated in this embodiment.
It can be seen that, by implementing the vectorization representation apparatus for nodes in the abstract syntax tree described in this embodiment, all the nodes in the abstract syntax tree can be fully covered, and thus, the vectorization representation can be accurately performed on the nodes in the abstract syntax tree.
Example 4
Referring to fig. 4, fig. 4 is a schematic structural diagram of a vectorization representation apparatus for a node in an abstract syntax tree according to an embodiment of the present disclosure. The vectorization representation device of the node in the abstract syntax tree shown in fig. 4 is optimized by the vectorization representation device of the node in the abstract syntax tree shown in fig. 3. As shown in fig. 4, the obtaining module 310 includes:
the obtaining submodule 311 is configured to obtain source code data to be processed.
And the parsing submodule 312 is configured to parse the source code data to obtain an abstract syntax tree to be processed.
As an alternative embodiment, the encoding module 330 includes:
the connection submodule 331 is configured to perform connection processing on the first sequence and the second sequence to obtain a connection sequence.
And the coding submodule 332 is used for coding the connection sequence to obtain a coding sequence to be processed.
As an optional implementation, the building module 350 is configured to build an original processing model before the to-be-processed coding sequence is processed through a pre-built vectorization processing model to obtain a vectorization representation result of a node in the abstract syntax tree.
The parameter obtaining module 360 is configured to obtain training data for training the raw processing model and preset model parameters.
And an adjusting module 370, configured to adjust the original processing model by presetting model parameters, so as to obtain an initial model.
And the training module 380 is configured to train the initial model through the training data to obtain a vectorization processing model.
In the embodiment of the application, the preset model parameters at least include a coding dimension value and a preset cost function of the coding sequence to be processed.
As an alternative embodiment, the adjusting module 370 includes:
the first setting submodule 371 is configured to set the number of neurons in the output layer of each model unit in the original processing model as a coding dimension value, so as to obtain an initial adjustment model.
And the second setting submodule 372 is configured to set the cost function of the initial adjustment model as the preset cost function to obtain the initial model.
In the embodiment of the present application, for the explanation of the vectorization representation apparatus for nodes in the abstract syntax tree, reference may be made to the description in embodiment 1 or embodiment 2, and details are not repeated in this embodiment.
It can be seen that, by implementing the vectorization representation apparatus for nodes in the abstract syntax tree described in this embodiment, all the nodes in the abstract syntax tree can be fully covered, and thus, the vectorization representation can be accurately performed on the nodes in the abstract syntax tree.
An embodiment of the present application provides an electronic device, including a memory and a processor, where the memory is used to store a computer program, and the processor runs the computer program to make the electronic device execute a vectorization representation method for a node in an abstract syntax tree in any one of embodiment 1 or embodiment 2 of the present application.
An embodiment of the present application provides a computer-readable storage medium, which stores computer program instructions, and when the computer program instructions are read and executed by a processor, the computer program instructions perform a vectorization representation method for a node in an abstract syntax tree according to any one of embodiment 1 or embodiment 2 of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for vectorized representation of nodes in an abstract syntax tree, comprising:
acquiring an abstract syntax tree to be processed;
performing breadth-first traversal on the abstract syntax tree to obtain a first sequence, and performing depth-first traversal on the abstract syntax tree to obtain a second sequence;
generating a coding sequence to be processed according to the first sequence and the second sequence;
and processing the coding sequence to be processed through a pre-constructed vectorization processing model to obtain a vectorization representation result of the nodes in the abstract syntax tree.
2. The method according to claim 1, wherein the obtaining the abstract syntax tree to be processed comprises:
acquiring source code data to be processed;
and analyzing the source code data to obtain the abstract syntax tree to be processed.
3. The method according to claim 1, wherein the generating a sequence of coding to be processed according to the first sequence and the second sequence comprises:
performing connection processing on the first sequence and the second sequence to obtain a connection sequence;
and coding the connecting sequence to obtain a coding sequence to be processed.
4. The method as claimed in claim 1, wherein before the processing the coding sequence to be processed through the pre-constructed vectorization processing model to obtain the vectorization representation result of the node in the abstract syntax tree, the method further comprises:
constructing an original processing model;
acquiring training data and preset model parameters for training the original processing model;
adjusting the original processing model through the preset model parameters to obtain an initial model;
and training the initial model through the training data to obtain a vectorization processing model.
5. The method according to claim 4, wherein the predetermined model parameters at least include a coding dimension value and a predetermined cost function of the coding sequence to be processed;
adjusting the original processing model through the preset model parameters to obtain an initial model, including:
setting the number of neurons of an output layer of each model unit in the original processing model as the coding dimension value to obtain an initial adjustment model;
and setting the cost function of the initial adjustment model as the preset cost function to obtain an initial model.
6. An apparatus for vectorizing representation of a node in an abstract syntax tree, the apparatus comprising:
the acquisition module is used for acquiring an abstract syntax tree to be processed;
the traversal module is used for performing breadth-first traversal on the abstract syntax tree to obtain a first sequence and performing depth-first traversal on the abstract syntax tree to obtain a second sequence;
the coding module is used for generating a coding sequence to be processed according to the first sequence and the second sequence;
and the model processing module is used for processing the coding sequence to be processed through a pre-constructed vectorization processing model to obtain a vectorization representation result of the nodes in the abstract syntax tree.
7. The apparatus for vectorized representation of nodes in an abstract syntax tree as claimed in claim 6, wherein said retrieving module comprises:
the acquisition submodule is used for acquiring source code data to be processed;
and the analysis submodule is used for analyzing the source code data to obtain the abstract syntax tree to be processed.
8. The apparatus for vectorized representation of nodes in an abstract syntax tree as claimed in claim 6, wherein said encoding module comprises:
the connection submodule is used for performing connection processing on the first sequence and the second sequence to obtain a connection sequence;
and the coding submodule is used for coding the connection sequence to obtain a coding sequence to be processed.
9. An electronic device, comprising a memory for storing a computer program and a processor for executing the computer program to cause the electronic device to perform the vectorized representation method of nodes in an abstract syntax tree according to any one of claims 1 to 5.
10. A readable storage medium having stored thereon computer program instructions which, when read and executed by a processor, perform the method of vectorized representation of nodes in an abstract syntax tree according to any one of claims 1 to 5.
CN202010907349.9A 2020-09-01 2020-09-01 Vectorization representation method and device for nodes in abstract syntax tree Active CN112035099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010907349.9A CN112035099B (en) 2020-09-01 2020-09-01 Vectorization representation method and device for nodes in abstract syntax tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010907349.9A CN112035099B (en) 2020-09-01 2020-09-01 Vectorization representation method and device for nodes in abstract syntax tree

Publications (2)

Publication Number Publication Date
CN112035099A true CN112035099A (en) 2020-12-04
CN112035099B CN112035099B (en) 2024-03-15

Family

ID=73591046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010907349.9A Active CN112035099B (en) 2020-09-01 2020-09-01 Vectorization representation method and device for nodes in abstract syntax tree

Country Status (1)

Country Link
CN (1) CN112035099B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113797545A (en) * 2021-08-25 2021-12-17 广州三七网络科技有限公司 Game script processing method and device, computer equipment and storage medium
CN114347039A (en) * 2022-02-14 2022-04-15 北京航空航天大学杭州创新研究院 Robot control method and related device
CN117171053A (en) * 2023-11-01 2023-12-05 睿思芯科(深圳)技术有限公司 Test method, system and related equipment for vectorized programming

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516041A (en) * 2017-08-17 2017-12-26 北京安普诺信息技术有限公司 WebShell detection methods and its system based on deep neural network
CN108369500A (en) * 2015-12-14 2018-08-03 数据仓库投资有限公司 Extended field is specialized
CN109582352A (en) * 2018-10-19 2019-04-05 北京硅心科技有限公司 A kind of code completion method and system based on double AST sequences
CN110362597A (en) * 2019-06-28 2019-10-22 华为技术有限公司 A kind of structured query language SQL injection detection method and device
CN111090461A (en) * 2019-11-18 2020-05-01 中山大学 Code annotation generation method based on machine translation model
CN111562920A (en) * 2020-06-08 2020-08-21 腾讯科技(深圳)有限公司 Method and device for determining similarity of small program codes, server and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729326B (en) * 2017-09-25 2020-12-25 沈阳航空航天大学 Multi-BiRNN coding-based neural machine translation method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108369500A (en) * 2015-12-14 2018-08-03 数据仓库投资有限公司 Extended field is specialized
CN107516041A (en) * 2017-08-17 2017-12-26 北京安普诺信息技术有限公司 WebShell detection methods and its system based on deep neural network
CN109582352A (en) * 2018-10-19 2019-04-05 北京硅心科技有限公司 A kind of code completion method and system based on double AST sequences
CN110362597A (en) * 2019-06-28 2019-10-22 华为技术有限公司 A kind of structured query language SQL injection detection method and device
CN111090461A (en) * 2019-11-18 2020-05-01 中山大学 Code annotation generation method based on machine translation model
CN111562920A (en) * 2020-06-08 2020-08-21 腾讯科技(深圳)有限公司 Method and device for determining similarity of small program codes, server and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113797545A (en) * 2021-08-25 2021-12-17 广州三七网络科技有限公司 Game script processing method and device, computer equipment and storage medium
CN114347039A (en) * 2022-02-14 2022-04-15 北京航空航天大学杭州创新研究院 Robot control method and related device
CN114347039B (en) * 2022-02-14 2023-09-22 北京航空航天大学杭州创新研究院 Robot look-ahead control method and related device
CN117171053A (en) * 2023-11-01 2023-12-05 睿思芯科(深圳)技术有限公司 Test method, system and related equipment for vectorized programming
CN117171053B (en) * 2023-11-01 2024-02-20 睿思芯科(深圳)技术有限公司 Test method, system and related equipment for vectorized programming

Also Published As

Publication number Publication date
CN112035099B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
Shido et al. Automatic source code summarization with extended tree-lstm
CN107516041B (en) WebShell detection method and system based on deep neural network
Gaddy et al. What's going on in neural constituency parsers? an analysis
Locascio et al. Neural generation of regular expressions from natural language with minimal domain knowledge
CN112035099B (en) Vectorization representation method and device for nodes in abstract syntax tree
CN111680494B (en) Similar text generation method and device
CN113901799B (en) Model training method, text prediction method, model training device, text prediction device, electronic equipment and medium
Lin et al. Critical behavior from deep dynamics: a hidden dimension in natural language
CN112035165B (en) Code clone detection method and system based on isomorphic network
CN107451106A (en) Text method and device for correcting, electronic equipment
CN114936287A (en) Knowledge injection method for pre-training language model and corresponding interactive system
CN114489669A (en) Python language code fragment generation method based on graph learning
CN114201406B (en) Code detection method, system, equipment and storage medium based on open source component
CN112507337A (en) Implementation method of malicious JavaScript code detection model based on semantic analysis
CN112579469A (en) Source code defect detection method and device
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN114064117A (en) Code clone detection method and system based on byte code and neural network
CN116629211B (en) Writing method and system based on artificial intelligence
Zhang et al. Disentangled representation for long-tail senses of word sense disambiguation
CN114757181B (en) Method and device for training and extracting event of end-to-end event extraction model based on prior knowledge
WO2019163752A1 (en) Morpheme analysis learning device, morpheme analysis device, method, and program
CN114519353B (en) Model training method, emotion message generation method and device, equipment and medium
Črepinšek et al. Inferring context-free grammars for domain-specific languages
CN115114627B (en) Malicious software detection method and device
Fang et al. Adaptive Code Completion with Meta-learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant