CN109582352A - Code completion method and system based on dual AST sequences - Google Patents
Code completion method and system based on dual AST sequences Download PDF Info
- Publication number
- CN109582352A CN109582352A CN201811224521.XA CN201811224521A CN109582352A CN 109582352 A CN109582352 A CN 109582352A CN 201811224521 A CN201811224521 A CN 201811224521A CN 109582352 A CN109582352 A CN 109582352A
- Authority
- CN
- China
- Prior art keywords
- sequence
- code
- ast
- sequences
- completion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/72—Code refactoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The present invention provides a code completion method and system based on dual AST sequences, comprising: a source code processing step, which parses source code into an abstract syntax tree; an AST-to-binary-tree step, which converts the abstract syntax tree into two different sequences simultaneously; a model training step, which inputs the two different sequences into an LSTM model to train a language model; and a prediction and completion step, which completes code according to the trained language model. The present invention converts the abstract syntax tree (AST) of the program code to be learned into two sequences simultaneously (e.g., a preorder sequence and an inorder sequence), and uses the information of both sequences at the same time to train a single LSTM model. The LSTM trained by the method of the invention achieves higher accuracy. The technical solution of the present invention is simple and fast, and can improve both the accuracy and the efficiency of code recommendation.
Description
Technical field
The present invention relates to the field of computer software engineering, and more particularly to a code completion method and system based on dual AST sequences.
Background technique
A program typically has structure at different levels, and the structure at each level corresponds to a particular stage of program analysis, so program information at different levels of abstraction can be obtained from different analysis stages. Many programs, such as those written in C, C++, C#, or Java, must be compiled before they can run, and the various techniques used in compilation are also commonly applied to program analysis tasks; the general compilation process is shown in Fig. 1. Typically, lexical analysis, syntax analysis, and semantic analysis yield the lexical, grammatical, and semantic information of a program, which can loosely be understood as the "literal" information and the "structural" information of the program. Both are clearly very important for understanding the function of a program.
Much recent research applies deep learning models to program analysis. The intuitive approach is to use a recurrent neural network (RNN) to build a language model over the program source code (or the word sequence of the source program, i.e., the token sequence). However, such an approach uses only the lowest-level program information, namely lexical information, to analyze the program. Unlike natural language, the structural information of a program carries more essential information, and modeling the source code or token sequence directly cannot reflect the program's own information well. In other words, analyzing a program using only lexical information is incomplete and does not make full use of all aspects of the program source code. Furthermore, different tasks are sensitive to different kinds of program information, and some program analysis tasks are more effective when using more abstract program information. Program classification, for example, is more sensitive to program structure, because program structure reflects program function: if a user-defined identifier i in a program is renamed to iii, the structure of the program is completely unchanged, i.e., the function of the program does not change.
Most programmers reuse code through frameworks or library APIs during software development. However, it is nearly impossible for a programmer to remember all APIs, because the number of existing APIs is enormous. Code auto-completion has therefore become an indispensable component of modern integrated development environments (IDEs). According to statistics, code completion is one of the ten most frequently used commands by developers. A code completion mechanism attempts to complete the remainder of the program as the programmer types. Intelligent code completion can accelerate the software development process by eliminating input errors and recommending suitable APIs.
Currently, one code generation method is to convert code into an AST (Abstract Syntax Tree), then convert the abstract syntax tree into a token (identifier) sequence, and train an LSTM on the resulting AST sequence data. However, according to the basic theory of data structures, a single sequence (e.g., only the preorder sequence) cannot unambiguously describe the original AST tree structure. That is, when an AST is converted into a single sequence, much of the tree-structure information is lost (i.e., relying on one sequence alone, the original AST cannot be reconstructed). To preserve all the information of a syntax tree, at least two sequences must be used simultaneously (e.g., the preorder sequence and the inorder sequence together can fully preserve the information of a tree).
Summary of the invention
To solve the above problems, the present invention converts the abstract syntax tree (AST) of the program code to be learned into two sequences simultaneously (e.g., a preorder sequence and an inorder sequence), and uses the information of both sequences at the same time to train a single LSTM model.
Specifically, the present invention provides a code completion method based on dual AST sequences, comprising:
a source code processing step, which parses the source code using an abstract syntax tree;
a sequence generation step, which converts the abstract syntax tree into two different sequences simultaneously;
a model training step, which inputs the two different sequences into an LSTM model to train a language model;
a prediction and completion step, which completes code according to the trained language model.
Preferably, in the source code processing step, the source code is parsed into different forms to obtain the classes, method lists, and code identifiers of the code.
Preferably, the sequence generation step comprises: obtaining the preorder sequence and the inorder sequence by preorder traversal and inorder traversal, and concatenating the preorder sequence and the inorder sequence as the input of the subsequent LSTM network.
Preferably, the sequence generation step further comprises: obtaining the inorder sequence and the postorder sequence by inorder traversal and postorder traversal, and concatenating the inorder sequence and the postorder sequence as the input of the subsequent LSTM network.
Preferably, the LSTM model is a stacked LSTM model, and the LSTM model is located in the hidden layer of an RNN model.
Preferably, in the prediction and completion step, a partial code fragment is input into the trained language model, which outputs recommended code elements according to the context.
According to another aspect of the present invention, a code completion system based on dual AST sequences is also provided, comprising the following sequentially connected modules:
a source code processing module, which parses the source code using an abstract syntax tree;
a sequence generation module, which converts the abstract syntax tree into two different sequences simultaneously;
a model training module, which inputs the two different sequences into an LSTM model to train a language model;
a prediction and completion module, for completing code according to the trained language model.
Preferably, the source code processing module parses the source code into different forms to obtain the classes, method lists, and code identifiers of the code.
Preferably, the sequence generation module obtains the preorder sequence and the inorder sequence by preorder traversal and inorder traversal, and concatenates the preorder sequence and the inorder sequence as the input of the subsequent LSTM network.
Preferably, the sequence generation module further obtains the inorder sequence and the postorder sequence by inorder traversal and postorder traversal, and concatenates the inorder sequence and the postorder sequence as the input of the subsequent LSTM network.
The LSTM trained by the method of the invention achieves higher accuracy. The technical solution of the present invention is simple and fast, and can improve both the accuracy and the efficiency of code recommendation.
Detailed description of the invention
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numbers refer to the same parts. In the drawings:
Fig. 1 is the compilation flow chart of an existing program.
Fig. 2 is the flow chart of the code completion method based on dual AST sequences according to the present invention.
Fig. 3 is the structural diagram of the code completion system based on dual AST sequences according to the present invention.
Fig. 4 is a schematic diagram of the experimental training results for the "preorder + inorder" concatenated input sequence of the present invention.
Fig. 5 is a schematic diagram of the experimental training results for the "inorder + postorder" concatenated input sequence of the present invention.
Fig. 6 is a schematic diagram of the experimental training results for four different LSTM input sequences.
Specific embodiment
Illustrative embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although the drawings show illustrative embodiments of the invention, it should be understood that the invention may be realized in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the present invention will be thoroughly understood, and so that its scope is fully conveyed to those skilled in the art.
The present invention serializes the AST (Abstract Syntax Tree) and models the serialization result, so that the structural information of the program can be analyzed with an LSTM-family model (long short-term memory network, a kind of recurrent neural network), thereby completing the program classification task. In other words, the present invention improves on the basis of performing the program classification task with an LSTM-family model: the lexical-level input (token sequence) of the original language model is replaced with the AST serialization result, so that analysis is based mainly on the structural information of the program, and ideal results are achieved.
An RNN (recurrent neural network) is a common artificial neural network suited to processing sequential input; its output can be a sequence of a different length (even length 1). Early RNNs could not handle long-term dependencies, i.e., an RNN would "forget" information. The LSTM (long short-term memory network) is a variant of the RNN that solves the long-term dependency problem; it has a certain memory capability and is suited to processing and predicting time series with long intervals and delays between events. However, whatever kind of RNN is used, the input must be a sequence, so if the structure of a program is to be analyzed with an LSTM, the serialization must embody the structural information of the program — here the present invention uses the AST.
The structure of a program is usually a tree, while the LSTM is a sequence model with a linear structure. According to the knowledge of data structures:
1. A multiway tree and a binary tree correspond one-to-one.
2. A binary tree corresponds one-to-one with the pair of its inorder traversal sequence and its preorder (or postorder) traversal sequence; in other words, the inorder sequence together with the preorder (or postorder) sequence uniquely determines a binary tree.
It follows that the source code corresponds one-to-one with the concatenation of the preorder (or postorder) traversal sequence and the inorder traversal sequence of the binary tree converted from the AST. The present invention therefore proposes two serialization modes and tests both:
First: convert the AST into a binary tree, obtain the preorder sequence and the inorder sequence by preorder and inorder traversal, and concatenate the preorder sequence and the inorder sequence as the input of the network;
Second: convert the AST into a binary tree, obtain the inorder sequence and the postorder sequence by inorder and postorder traversal, and concatenate the inorder sequence and the postorder sequence as the input of the network.
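The two serialization modes above can be sketched as follows. This is a hedged illustration: the AST node type, the toy input, and the use of the standard left-child/right-sibling encoding for the multiway-to-binary conversion are assumptions, since the patent does not fix a concrete encoding.

```python
class ASTNode:
    def __init__(self, label, children=()):
        self.label, self.children = label, list(children)

class BTNode:
    def __init__(self, label):
        self.label, self.left, self.right = label, None, None

def to_binary(ast):
    """Left-child/right-sibling encoding: multiway tree <-> binary tree."""
    bt = BTNode(ast.label)
    prev = None
    for child in ast.children:
        b = to_binary(child)
        if prev is None:
            bt.left = b        # first child becomes the left child
        else:
            prev.right = b     # later children chain as right siblings
        prev = b
    return bt

def pre(t):  return [] if t is None else [t.label] + pre(t.left) + pre(t.right)
def ino(t):  return [] if t is None else ino(t.left) + [t.label] + ino(t.right)
def post(t): return [] if t is None else post(t.left) + post(t.right) + [t.label]

def serialize(ast, mode="pre+in"):
    bt = to_binary(ast)
    if mode == "pre+in":               # first mode: preorder + inorder
        return pre(bt) + ino(bt)
    return ino(bt) + post(bt)          # second mode: inorder + postorder

# A toy AST for the statement `x = a + b` (hypothetical node labels):
ast = ASTNode("Assign", [ASTNode("x"), ASTNode("Add", [ASTNode("a"), ASTNode("b")])])
seq = serialize(ast, "pre+in")         # concatenated input for the LSTM network
```

Because each half of `seq` is a lossless traversal pair of the binary tree, the concatenated sequence corresponds one-to-one with the original AST, which is exactly the property the two modes rely on.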
Fig. 2 is the flow chart of the code completion method based on dual AST sequences according to the present invention, which includes the following steps:
S1, source code processing step: the source code is analyzed using an abstract syntax tree. In this step, the source code is parsed into different forms to obtain the classes, method lists, code identifiers, etc. of the code.
An abstract syntax tree (AST), or simply syntax tree, is a tree representation of the abstract syntactic structure of source code, in particular the source code of a programming language. The counterpart of the abstract syntax tree is the concrete syntax tree, commonly called the parse tree. Generally, during the translation and compilation of source code, the parser creates the parse tree. Once the AST is created, some information may be added in subsequent processing stages, such as the semantic analysis stage.
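As a small illustration of what "parsing source code into an abstract syntax tree" yields (the patent's embodiment uses the pycparser tool for C/C++; here Python's standard-library `ast` module stands in as an assumed substitute):

```python
import ast

source = "x = a + b"
tree = ast.parse(source)

# Walk the AST and collect node type names — the node labels that are
# later serialized into traversal sequences.
labels = [type(n).__name__ for n in ast.walk(tree)]
assert labels[0] == "Module"                    # root of a parsed module
assert "Assign" in labels and "BinOp" in labels # structure of `x = a + b`
```

The same idea applies with pycparser, whose `c_parser.CParser` likewise produces a tree of typed nodes that can be traversed and labeled.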
Then, the abstract syntax tree is converted into a binary tree.
S2, sequence generation step: the abstract syntax tree (AST) is converted into two different sequences simultaneously. Specifically, the present invention uses two conversion modes, as follows:
First: the preorder sequence and the inorder sequence are obtained by preorder and inorder traversal, and the preorder sequence and the inorder sequence are concatenated as the input of the subsequent LSTM network;
Second: the inorder sequence and the postorder sequence are obtained by inorder and postorder traversal, and the inorder sequence and the postorder sequence are concatenated as the input of the subsequent LSTM network.
S3, model training step: the two different sequences are input into an LSTM model to train a language model. The two sequences obtained after parsing in step S2 are used for a recurrent neural network language model based on long short-term memory (LSTM). The LSTM model is a stacked LSTM model located in the hidden layer of an RNN model.
S4, prediction and completion step: code is completed according to the trained language model. In this step, a partial code fragment is input into the trained language model, which outputs recommended code elements based on the context.
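A hedged PyTorch sketch of steps S3-S4 follows. This is an assumed realization, not the patent's exact network: the vocabulary size, embedding dimension, layer count, and class names are illustrative, and the patent only states that a stacked LSTM sits in the hidden layer of an RNN-style model reading the concatenated AST sequence.

```python
import torch
import torch.nn as nn

class DualASTSeqModel(nn.Module):
    """Stacked LSTM over a concatenated AST traversal sequence (sketch)."""
    def __init__(self, vocab_size, embed_dim=300, hidden_size=300,
                 num_layers=2, num_classes=104):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # node id -> vector
        self.lstm = nn.LSTM(embed_dim, hidden_size,
                            num_layers=num_layers, batch_first=True)  # stacked LSTM
        self.out = nn.Linear(hidden_size, num_classes)     # logits over the classes

    def forward(self, seq):                    # seq: (batch, time) node ids
        h, _ = self.lstm(self.embed(seq))      # h: (batch, time, hidden)
        return self.out(h[:, -1, :])           # read out at the last time step

model = DualASTSeqModel(vocab_size=50)
batch = torch.randint(0, 50, (4, 12))          # 4 toy concatenated sequences, length 12
logits = model(batch)                          # (4, 104)
pred = logits.softmax(dim=-1).argmax(dim=-1)   # predicted class per sequence
```

Training would minimize cross entropy between `logits` and the class labels (e.g., with `nn.CrossEntropyLoss` and an optimizer), matching the loss the embodiment describes.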
As shown in Fig. 3, according to another aspect of the present invention, a code completion system 100 based on dual AST sequences is also provided, comprising the following sequentially connected modules:
a source code processing module 110, which parses the source code using an abstract syntax tree; preferably, the source code processing module parses the source code into different forms to obtain the classes, method lists, and code identifiers of the code;
a sequence generation module 120, which converts the abstract syntax tree (AST) into two different sequences simultaneously;
a model training module 130, which inputs the two different sequences into an LSTM model to train a language model;
a prediction and completion module 140, for completing code according to the trained language model.
In the specific embodiment described with respect to Fig. 2, the dataset used by the present invention is the 104 classes of C/C++ program source code from POJ (the Peking University Online Judge system). Each class corresponds to one problem of the system and contains 500 source files submitted by students that fully meet the problem requirements; the class of each file is labeled 1-104. In the preprocessing part, the AST of the C/C++ program is first obtained through the pycparser tool, then the AST is converted into a binary tree in a unified way, the preorder/inorder/postorder sequence is obtained by preorder/inorder/postorder traversal (each traversal result is a sequence of the nodes in the AST), and the input sequence is obtained by concatenating sequences: preorder sequence + inorder sequence, or inorder sequence + postorder sequence. As described above, whichever concatenated input sequence is used, it corresponds one-to-one with the source program. In the network part, the input sequence first passes through an embedding layer; each node is converted into a one-hot vector (whose length is the vocabulary size), and each one-hot vector then serves as the actual input of each time step. After the one-hot input of the last node of the input sequence, a predicted output vector is obtained; the result passes through a softmax layer (not drawn in the figure), and the index of its maximum entry (the vector length is the number of program classes, i.e., 104) is taken as the predicted class of the input sequence. During training, the label is vectorized into a one-hot vector (length 104) in which only the entry of the corresponding class is 1; the loss function used is cross entropy.
In Fig. 2, xi denotes the input of the current time step, i.e., the embedded one-hot vector of an AST node; hi denotes the hidden-layer state of the current time step; and y denotes the current output. Since this is a program classification task, the present invention only needs to output the corresponding predicted program class at the time step when the last input has been read in; in practice, y also passes through a softmax layer, after which argmax is taken to determine the corresponding class.
Experiments and results
To demonstrate the effect of the model, the present invention also ran two groups of control experiments. The first is the original-version experiment, in which the input sequence is the result of lexical analysis of the program, i.e., the token sequence/word sequence of the source code is used directly as input. The second uses a traversal-sequence combination that is not in one-to-one correspondence with the program as input; for simplicity, the present invention uses the depth-first traversal node sequence of the AST as this input.
Experimental setup (hyperparameters):
Hidden size=300
Batch size=22
Learning rate=1e-4
Experimental results: the training processes of the two concatenated input sequences are shown in Figs. 4 and 5. The experiment with "preorder + inorder" as input finally converges to a prediction accuracy of 90.08% (Fig. 4), and the experiment with "inorder + postorder" as input finally converges to an accuracy of 88.43% (Fig. 5).
The results of the other two groups of control experiments are shown in Fig. 6, where ast2bt_iap denotes the inorder sequence concatenated with the postorder sequence as input, ast2bt_pai denotes the preorder sequence concatenated with the inorder sequence as input, ast_dfs denotes the depth-first traversal sequence of the AST as input, and src_code denotes the token sequence of the source code used directly as input. Table 1 below lists the test accuracy obtained for each of the four kinds of model input.
Table 1
Model input | Test accuracy |
---|---|
AST to binary tree: preorder + inorder | 90.08% |
AST to binary tree: inorder + postorder | 88.43% |
Depth-first traversal sequence of the AST | 88.60% |
Token sequence of the source code | 91.87% |
It can be seen that even superficial structural information of the program can achieve very good results in the program classification task. Among the methods, ast_dfs (depth-first traversal sequence as input) clearly converges slowly. This is because the depth-first traversal sequence does not correspond one-to-one with the source program, so discriminative information about the program is lost; the learning speed of the model is therefore naturally slower, and its accuracy is also lower than that of the other methods. The convergence speed of src_code is fast, partly because the input sequence is shorter (less than half the length of the concatenated-sequence input modes), so the model can learn faster, and partly because the LSTM model itself can extract more abstract information about the program, so even the model using the dual concatenated AST sequences as input does not yet exploit the structural information of the program fully. The key point is that even using incomplete structural information of the program (the dual concatenated AST sequences cannot completely reflect the structure of the entire program, but such serialization must be accepted for a sequence model like the LSTM), good results can be obtained in the program classification task, differing from the source-code-input experiment by less than 2%, which already demonstrates the capability of the model.
On the whole, the model of the invention models the structural information of the program well. Furthermore, the model can also be applied to other tasks and processed similarly, as long as a complete AST structure can be obtained. On the other hand, if a more structured network were used, with the original AST or a corresponding data structure that more completely reflects the program structure as input, the experimental results might be even better.
It should be understood that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual device, or other equipment. Various general-purpose devices may also be used with the teachings herein. As described above, the structure required to construct such devices is obvious. In addition, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be realized in various programming languages, and the above description of a specific language is given to disclose the best mode of carrying out the invention.
In the description provided here, numerous specific details are set forth. It should be appreciated, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the specific embodiments are hereby expressly incorporated into the specific embodiments, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from the embodiment. The modules, units, or components in an embodiment can be combined into one module, unit, or component, and furthermore they can be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, any combination may be used to combine all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to realize some or all of the functions of some or all of the components of the device according to embodiments of the invention. The invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing some or all of the methods described herein. Such programs implementing the invention may be stored on computer-readable media, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.
The above are only preferred specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that can easily be conceived by anyone skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. a kind of code completion method based on double AST sequences characterized by comprising
Source code processing step uses abstract syntax tree analysis source code;
Above-mentioned abstract syntax tree is converted to two different sequences by sequence generation step simultaneously;
Model training step, by described two different sequence inputting LSTM models, train language model;
Completion step is predicted, according to the language model completion code trained.
2. the code completion method according to claim 1 based on double AST sequences, it is characterised in that:
In source code processing step, the source code is resolved to different form, to obtain class, the method list, generation of code
Code identifier.
3. the code completion method according to claim 1 based on double AST sequences, it is characterised in that:
The sequence generation step includes: to obtain preamble sequence and middle sequence sequence by preamble traversal and inorder traversal, before splicing
Sequence sequence and middle sequence sequence, as the input of subsequent LSTM network.
4. the code completion method according to claim 3 based on double AST sequences, it is characterised in that:
The sequence generation step further comprises: middle sequence sequence and postorder sequence are obtained by inorder traversal and postorder traversal,
Sequence sequence and postorder sequence in splicing, as the input of subsequent LSTM network.
5. the code completion method according to claim 1 or 2 based on double AST sequences, it is characterised in that:
The LSTM model is the LSTM model of stack.
6. The code completion method based on dual AST sequences according to claim 1, characterized in that:
in the prediction and completion step, a partial code fragment is input into the trained language model, which then outputs recommended code elements based on the context.
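The prediction and completion step of claim 6 can be sketched with a bigram count model standing in for the trained stacked LSTM (an assumption for brevity; the interface is the same: given a partial token sequence, rank candidate next tokens). The toy corpus is illustrative:

```python
from collections import Counter, defaultdict

# Tiny illustrative token corpus; a real system would train on
# the concatenated AST sequences from the preceding claims.
corpus = [
    ['for', 'i', 'in', 'range', '(', 'n', ')', ':'],
    ['for', 'j', 'in', 'range', '(', 'm', ')', ':'],
    ['if', 'i', 'in', 'seen', ':'],
]

# Count next-token frequencies conditioned on the previous token.
bigrams = defaultdict(Counter)
for tokens in corpus:
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1

def recommend(context, k=2):
    """Return the k most likely next tokens given the partial fragment."""
    last = context[-1]
    return [tok for tok, _ in bigrams[last].most_common(k)]

print(recommend(['for', 'i', 'in']))  # ['range', 'seen']
```

A trained LSTM would replace the `bigrams` lookup with a forward pass over the whole context, but the recommendation interface is unchanged.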
7. A code completion system based on dual AST sequences, characterized by comprising the following modules connected in sequence:
a source code processing module, which parses the source code using an abstract syntax tree;
a sequence generation module, which converts the above abstract syntax tree into two different sequences;
a model training module, which inputs the two different sequences into an LSTM model to train a language model;
a prediction and completion module, which completes code according to the trained language model.
8. The code completion system based on dual AST sequences according to claim 7, characterized in that:
the source code processing module parses the source code into different forms to obtain the classes, method lists, and code identifiers of the code.
9. The code completion system based on dual AST sequences according to claim 7, characterized in that:
the sequence generation module obtains a preorder sequence and an inorder sequence by preorder traversal and inorder traversal, and concatenates the preorder sequence and the inorder sequence as the input of the subsequent LSTM network.
10. The code completion system based on dual AST sequences according to claim 9, characterized in that:
the sequence generation module further obtains an inorder sequence and a postorder sequence by inorder traversal and postorder traversal, and concatenates the inorder sequence and the postorder sequence as the input of the subsequent LSTM network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811224521.XA CN109582352A (en) | 2018-10-19 | 2018-10-19 | A kind of code completion method and system based on double AST sequences |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109582352A true CN109582352A (en) | 2019-04-05 |
Family
ID=65920215
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811224521.XA Pending CN109582352A (en) | 2018-10-19 | 2018-10-19 | A kind of code completion method and system based on double AST sequences |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109582352A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070277163A1 (en) * | 2006-05-24 | 2007-11-29 | Syver, Llc | Method and tool for automatic verification of software protocols |
CN102185930A (en) * | 2011-06-09 | 2011-09-14 | 北京理工大学 | Method for detecting SQL (structured query language) injection vulnerability |
US9928040B2 (en) * | 2013-11-12 | 2018-03-27 | Microsoft Technology Licensing, Llc | Source code generation, completion, checking, correction |
US20180088937A1 (en) * | 2016-09-29 | 2018-03-29 | Microsoft Technology Licensing, Llc | Code refactoring mechanism for asynchronous code optimization using topological sorting |
CN108388425A (en) * | 2018-03-20 | 2018-08-10 | 北京大学 | A method of based on LSTM auto-complete codes |
CN108563433A (en) * | 2018-03-20 | 2018-09-21 | 北京大学 | A kind of device based on LSTM auto-complete codes |
CN108595165A (en) * | 2018-04-25 | 2018-09-28 | 清华大学 | A kind of code completion method, apparatus and storage medium based on code intermediate representation |
Non-Patent Citations (3)
Title |
---|
DRCRYPTO: "Determining whether a preorder + postorder traversal pair identifies a unique tree, and outputting an inorder sequence", https://blog.csdn.net/u011240016/article/details/53193754 * |
JIAN LI et al.: "Code Completion with Neural Attention and Pointer Networks", Int'l Joint Conf. on Artificial Intelligence (IJCAI) * |
VESELIN RAYCHEV et al.: "Code Completion with Statistical Language Models", ACM * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110223553B (en) * | 2019-05-20 | 2021-08-10 | 北京师范大学 | Method and system for predicting answer information |
CN110223553A (en) * | 2019-05-20 | 2019-09-10 | 北京师范大学 | A kind of prediction technique and system of answering information |
CN111966817A (en) * | 2020-07-24 | 2020-11-20 | 复旦大学 | API recommendation method based on deep learning and code context structure and text information |
CN111966817B (en) * | 2020-07-24 | 2022-05-20 | 复旦大学 | API recommendation method based on deep learning and code context structure and text information |
CN112035099A (en) * | 2020-09-01 | 2020-12-04 | 北京天融信网络安全技术有限公司 | Vectorization representation method and device for nodes in abstract syntax tree |
CN112035099B (en) * | 2020-09-01 | 2024-03-15 | 北京天融信网络安全技术有限公司 | Vectorization representation method and device for nodes in abstract syntax tree |
CN112905188A (en) * | 2021-02-05 | 2021-06-04 | 中国海洋大学 | Code translation method and system based on generation type countermeasure GAN network |
CN112860362B (en) * | 2021-02-05 | 2022-10-04 | 达而观数据(成都)有限公司 | Visual debugging method and system for robot automation process |
CN112860362A (en) * | 2021-02-05 | 2021-05-28 | 达而观数据(成都)有限公司 | Visual debugging method and system for robot automation process |
CN113010182A (en) * | 2021-03-25 | 2021-06-22 | 北京百度网讯科技有限公司 | Method and device for generating upgrade file and electronic equipment |
CN113076089A (en) * | 2021-04-15 | 2021-07-06 | 南京大学 | API completion method based on object type |
CN113076089B (en) * | 2021-04-15 | 2023-11-21 | 南京大学 | API (application program interface) completion method based on object type |
CN113064586A (en) * | 2021-05-12 | 2021-07-02 | 南京大学 | Code completion method based on abstract syntax tree augmented graph model |
CN113064586B (en) * | 2021-05-12 | 2022-04-22 | 南京大学 | Code completion method based on abstract syntax tree augmented graph model |
CN117573084A (en) * | 2023-08-02 | 2024-02-20 | 广东工业大学 | Code complement method based on layer-by-layer fusion abstract syntax tree |
CN117573084B (en) * | 2023-08-02 | 2024-04-12 | 广东工业大学 | Code complement method based on layer-by-layer fusion abstract syntax tree |
CN117573085A (en) * | 2023-10-17 | 2024-02-20 | 广东工业大学 | Code complement method based on hierarchical structure characteristics and sequence characteristics |
CN117573085B (en) * | 2023-10-17 | 2024-04-09 | 广东工业大学 | Code complement method based on hierarchical structure characteristics and sequence characteristics |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109582352A (en) | A kind of code completion method and system based on double AST sequences | |
CN108388425A (en) | A method of based on LSTM auto-complete codes | |
Biallas et al. | Arcade. PLC: A verification platform for programmable logic controllers | |
Chakraborty et al. | On multi-modal learning of editing source code | |
Marre et al. | Test sequences generation from lustre descriptions: Gatel | |
Medeiros et al. | DEKANT: a static analysis tool that learns to detect web application vulnerabilities | |
CN108595341B (en) | Automatic example generation method and system | |
WO2019075390A1 (en) | Blackbox matching engine | |
CN108563433A (en) | A kind of device based on LSTM auto-complete codes | |
WO2019051426A1 (en) | Pruning engine | |
CN109614103A (en) | A kind of code completion method and system based on character | |
WO2018226598A1 (en) | Method and system for arbitrary-granularity execution clone detection | |
CN109492402A (en) | A kind of intelligent contract safe evaluating method of rule-based engine | |
CN106682343A (en) | Method for formally verifying adjacent matrixes on basis of diagrams | |
CN114911711A (en) | Code defect analysis method and device, electronic equipment and storage medium | |
Shrestha et al. | DeepFuzzSL: Generating models with deep learning to find bugs in the Simulink toolchain | |
CN107194065A (en) | A kind of method for being checked in PCB design and setting binding occurrence | |
CN108563561B (en) | Program implicit constraint extraction method and system | |
CN106775913A (en) | A kind of object code controlling stream graph generation method | |
Xu et al. | Dsmith: Compiler fuzzing through generative deep learning model with attention | |
Meffert | Supporting design patterns with annotations | |
Hashtroudi et al. | Automated test case generation using code models and domain adaptation | |
Ribeiro et al. | Gpt-3-powered type error debugging: Investigating the use of large language models for code repair | |
US7543274B2 (en) | System and method for deriving a process-based specification | |
KR102421274B1 (en) | voice control system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||