WO2021017025A1 - Method for automatically generating python codes from natural language - Google Patents

Method for automatically generating python codes from natural language

Info

Publication number
WO2021017025A1
Authority
WO
WIPO (PCT)
Prior art keywords
natural language
abstract syntax
syntax tree
generator
discriminator
Prior art date
Application number
PCT/CN2019/099733
Other languages
French (fr)
Chinese (zh)
Inventor
祝亚兵 (Zhu Yabing)
张岩峰 (Zhang Yanfeng)
Original Assignee
东北大学 (Northeastern University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学 (Northeastern University)
Publication of WO2021017025A1 publication Critical patent/WO2021017025A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/30 - Creation or generation of source code
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/40 - Transformation of program code
    • G06F 8/41 - Compilation
    • G06F 8/43 - Checking; Contextual analysis
    • G06F 8/436 - Semantic checking

Definitions

  • The invention belongs to the technical field of natural language processing and specifically relates to a method for automatically generating Python code from natural language.
  • Semantic parsing is a class of tasks in natural language processing that studies how to convert a given natural language description into a logical representation that a computer can understand and execute, such as SQL, Python, or Java.
  • The traditional approach designs fixed templates based on the characteristics of the programming language and then uses pattern matching to parse the natural language description into instances of those templates.
  • In view of these problems, the present invention proposes a method for automatically generating Python code from natural language.
  • The invention aims to use a discriminator to improve how well the generator produces program fragments from a natural language description, and to learn the relationship between the distributions of natural language and programming language.
  • The steps of the method are as follows:
  • Step 1: Use the generator of the GAN to generate the abstract syntax tree of a program fragment from the natural language description.
  • The generator is an Encoder-Decoder deep learning framework.
  • The Encoder encodes the natural language description sequence.
  • The Decoder decodes the semantics of the natural language description into the abstract syntax tree of a program fragment, based on the Encoder's encoding.
  • Step 1.1: Use a bidirectional LSTM network as the Encoder to encode the natural language description sequence.
  • Step 1.1.1: Encode the natural language description sequence from left to right and from right to left to obtain the forward and backward hidden vectors h_i→ and h_i← of each character.
  • Step 1.1.2: Concatenate the two hidden vectors into h_i = [h_i→; h_i←], the encoding vector of each natural language character, and save each character's encoding vector for later use by the Decoder.
  • Step 1.1.3: Use the hidden vector of the last character as the initial state h_end of the Decoder.
  • Step 1.2: Use a unidirectional LSTM network as the Decoder, and decode the natural language semantics encoded by the Encoder into the abstract syntax tree of the program.
  • This step introduces the grammar rules of the programming language into the generation process.
  • The abstract syntax tree is generated in depth-first traversal order.
  • Each generation step is the application of a context-free grammar production.
  • The grammar rules provide prior knowledge for generating the abstract syntax tree and shrink the search space.
  • Step 1.2.1: Use h_end from 1.1.3 as the initial state of the Decoder, use the attention mechanism to compute the content vector of h_end, and feed that content vector to the LSTM as input.
  • Step 1.2.2: Apply Softmax to the LSTM output of 1.2.1 for multi-class classification; the classes correspond to actions that build the abstract syntax tree.
  • Step 1.2.3: The actions produced in 1.2.2 fall into two types: generating leaf nodes and generating non-leaf nodes.
  • An action that generates a non-leaf node is a context-free grammar expansion, while an action that generates a leaf node emits a concrete character, i.e. a sequence character of the program fragment, which can either be copied from the natural language description sequence or generated by the model.
  • Step 1.2.4: Apply the actions of 1.2.3 in depth-first traversal order to construct the abstract syntax tree.
  • Step 1.2.5: Feed the output of 1.2.4 back as the input of 1.2.1 and repeat 1.2.1 through 1.2.4 until a complete abstract syntax tree is obtained, i.e. the abstract syntax tree of the program fragment whose semantics correspond to the natural language description.
  • Step 1.2.6: Parse the abstract syntax tree into a program fragment.
  • Step 2: Use the GAN's discriminator to judge whether the semantics of the abstract syntax tree produced by the generator are consistent with the semantics of the given natural language description; this also imposes a strong semantic constraint on the generator.
  • The data for training the discriminator falls into three types: (A) a natural language description from the training data and the abstract syntax tree of its corresponding program;
  • (B) a given natural language description and the abstract syntax tree produced by the generator;
  • (C) a natural language description sequence and the abstract syntax tree of an unrelated program.
  • Training data of type A is labeled consistent, while training data of types B and C is labeled inconsistent.
  • Step 2.1: Encode the natural language description sequence with the Encoder of the GAN generator; this step only needs the final semantic vector.
  • Step 2.2: Use a tree LSTM network to encode the abstract syntax tree bottom-up, all the way to its root node, whose vector is the semantic vector of the abstract syntax tree.
  • Step 2.3: Multiply the natural language semantic vector of 2.1 with the semantic vector of the abstract syntax tree of 2.2.
  • Step 2.4: Repeat 2.1 through 2.3 for training data B and training data C of step 2.
  • Step 2.5: Run binary classification on the training data pairs of 2.4 to judge, in all three cases, whether the semantics of the natural language and of the program's abstract syntax tree are consistent.
  • Step 3: Train GANCoder, i.e. train the generator and the discriminator of the GAN together. During optimization the generator and the discriminator are optimized alternately. Before joint training, the generator and the discriminator are pre-trained separately, and then trained together adversarially.
  • The model GANCoder produced by this method contains two parts, a generator and a discriminator, where the generator is responsible for generating programming language program fragments from natural language, and the discriminator identifies the program fragments produced by the generator.
  • During training, the generator and the discriminator play an adversarial game and improve each other; in the end the discriminator cannot tell whether a program fragment comes from the original training set or was produced by the generator.
  • The present invention trains a code generation system through generative adversarial optimization; given a user's natural language description of a function, the system can generate a piece of program code with that function.
  • Compared with traditional optimization methods, adversarial game training lets the generator learn the language models of natural language and programming language more effectively.
  • Figure 1 is a semantic parser based on the Encoder-Decoder model.
  • Figure 2 is the abstract syntax tree corresponding to a Python program.
  • Figure 3 is the overall framework of the GANCoder of the present invention.
  • Figure 4 is a diagram of the framework of the GANCoder generator.
  • Figure 5 shows the use of a tree LSTM network to encode the abstract syntax tree.
  • The proposed GANCoder system is, overall, a generative adversarial network comprising two parts, a generator and a discriminator, as shown in Figure 3.
  • The generator is an Encoder-Decoder model.
  • The Encoder, a bidirectional LSTM network, encodes the natural language description sequence, while the Decoder, a unidirectional LSTM network, decodes the semantics encoded by the Encoder into the program's abstract syntax tree; the discriminator is mainly responsible for judging whether the semantics of the natural language description and of the abstract syntax tree are consistent.
  • The natural language description is encoded with the generator's Encoder, while the abstract syntax tree is encoded with a tree LSTM network.
  • The program's abstract syntax tree is encoded bottom-up, and the encoding vector of its root node is the semantic vector of the abstract syntax tree.
  • Step 1: Use the generator of the GAN to generate the abstract syntax tree of a program fragment from the natural language description.
  • The generator is an Encoder-Decoder deep learning model, as shown in Figure 4.
  • The left side of the figure is the Encoder, a bidirectional LSTM network responsible for encoding the natural language description sequence; the right side is the Decoder, a unidirectional LSTM network that decodes the semantics of the natural language description into the abstract syntax tree of a program fragment, based on the Encoder's encoding.
  • Step 1.1: Use a bidirectional LSTM network as the Encoder to encode the natural language description sequence.
  • The two directions in the Encoder of Figure 4 represent the encoding order of the LSTM network.
  • Step 1.1.1: Encode the natural language description sequence from left to right and from right to left to obtain the forward and backward hidden vectors h_i→ and h_i← of each character, as shown by the two encoding directions of the LSTM network in the Encoder of Figure 4.
  • Step 1.1.2: Concatenate the two hidden vectors of 1.1.1 into h_i = [h_i→; h_i←], the encoding vector of each natural language character, and save each character's encoding vector for later use by the Decoder.
  • Step 1.1.3: Use the hidden vector of the last character as the initial state h_end of the Decoder.
  • Step 1.2: Use a unidirectional LSTM network as the Decoder, and decode the natural language semantics encoded by the Encoder into the abstract syntax tree of the program.
  • This step introduces the grammar rules of the programming language into the code generation process.
  • The abstract syntax tree is generated in depth-first traversal order.
  • Each generation step is the application of a context-free grammar production.
  • The grammar rules provide prior knowledge for generating the abstract syntax tree and shrink the search space.
  • Step 1.2.1: As shown in Figure 4, the Decoder takes h_end from 1.1.3 as its initial state, uses the attention mechanism to compute the content vector C1 of h_end, and feeds that content vector to the LSTM as input.
  • Step 1.2.2: Apply Softmax to the LSTM output for multi-class classification; the classes correspond to actions that build the abstract syntax tree, matching the nodes of the abstract syntax tree on the right of Figure 2.
  • Step 1.2.3: The actions predicted in 1.2.2 fall into two types, generating leaf nodes and generating non-leaf nodes, i.e. the leaf and non-leaf nodes of the abstract syntax tree in Figure 2.
  • An action that generates a non-leaf node is a context-free grammar expansion, each corresponding to one context-free grammar rule; an action that generates a leaf node emits a concrete character, i.e. a sequence character of the program fragment, which can either be copied from the natural language description sequence or generated by the model.
  • Step 1.2.4: Apply the actions predicted in 1.2.3 in depth-first traversal order to construct the abstract syntax tree.
  • The order indicated by the solid arrows over the nodes of the abstract syntax tree in Figure 2 is the order in which each node of the tree is built.
  • Step 1.2.5: Feed the output of 1.2.4 back as the input of 1.2.1. As shown in Figure 2, the information of the previous node is passed to the next node; this information includes the state of the previous step, indicated by the solid arrows, and the information of the parent node, conveyed by the dotted arrows. Then repeat 1.2.1 through 1.2.4 until a complete abstract syntax tree is obtained, i.e. the abstract syntax tree of the program fragment whose semantics correspond to the natural language description.
  • Step 1.2.6: Parse the complete abstract syntax tree into a program fragment.
  • Step 2: Use the GAN's discriminator to judge whether the semantics of the abstract syntax tree produced by the generator are consistent with the semantics of the given natural language description; this also imposes a strong semantic constraint on the generator.
  • The data for training the discriminator falls into three types: (1) a natural language description from the training data and the abstract syntax tree of its corresponding program; (2) a given natural language description and the abstract syntax tree produced by the generator; (3) a natural language description sequence and the abstract syntax tree of an unrelated program. Type 1 is labeled consistent, while types 2 and 3 are labeled inconsistent.
  • Step 2.1: Encode the natural language description sequence with the Encoder of the GAN generator; this step only needs the final semantic vector. The Encoder structure is shown in Figure 4.
  • Step 2.2: Use a tree LSTM network, as shown in Figure 5, to encode the abstract syntax tree bottom-up.
  • The child nodes of the abstract syntax tree are the input to the encoding of their parent node, and encoding proceeds up to the root node of the abstract syntax tree, whose vector is the semantic vector corresponding to the tree.
  • Step 2.3: Multiply the natural language semantic vector of 2.1 with the semantic vector of the abstract syntax tree of 2.2.
  • Step 2.4: Repeat 2.1 through 2.3 for training data 2 and training data 3 of step 2.
  • Step 2.5: Run binary classification on the training data pairs of 2.4 to judge, in all three cases, whether the semantics of the natural language and of the program's abstract syntax tree are consistent.
  • Step 3: Train GANCoder, i.e. train the generator and the discriminator of the GAN together. During optimization they are optimized alternately. Before joint training, the generator and the discriminator are pre-trained, and then trained together adversarially; as shown in Figure 3, the discriminator's signal is fed back to the generator.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

A method for automatically generating Python code from a natural language, belonging to the technical field of natural language processing. The steps of the method are as follows: step 1: generating, by means of the generator of a GAN, the abstract syntax tree of a program fragment from a natural language description; step 2: judging, by means of the discriminator of the GAN, whether the semantics of the abstract syntax tree generated by the generator are consistent with the semantics of the given natural language description; and step 3: training the generator and the discriminator of the GAN together. The method trains a code generation system through generative adversarial optimization; given a user's natural language description of a function, the system can generate program code with that function. Compared with traditional optimization methods, adversarial game training with a generative adversarial network lets the generator learn the language models of natural languages and programming languages more effectively.

Description

A Method for Automatically Generating Python Code from Natural Language
Technical Field
The invention belongs to the technical field of natural language processing and specifically relates to a method for automatically generating Python code from natural language.
Background Art
Semantic parsing is a class of tasks in natural language processing that studies how to convert a given natural language description into a logical representation that a computer can understand and execute, such as SQL, Python, or Java. The traditional approach designs fixed templates based on the characteristics of the programming language and then uses pattern matching to parse the natural language description into instances of those templates. With the development of deep learning, frameworks such as Encoder-Decoder have also been introduced into semantic parsing, for example translating the natural language description sequence directly into a programming language sequence with machine translation methods, or introducing the grammar of the programming language during code generation, first generating the program's abstract syntax tree and then converting that tree into program code. However, when such an Encoder-Decoder model handles the conversion from natural language to programming language, the Encoder and the Decoder each process a different language; because they use different neural networks, and because of the depth of the networks, the semantics of the natural language description are gradually lost during program code generation, so a training model with a strong semantic constraint is lacking.
Summary of the Invention
In view of the above problems, the present invention proposes a method for automatically generating Python code from natural language. The invention aims to use a discriminator to improve how well the generator produces program fragments from a natural language description, and to learn the relationship between the distributions of natural language and programming language.
The technical solution of the present invention is as follows:
A method for automatically generating Python code from natural language, with the following steps:
Step 1: Use the generator of the GAN to generate the abstract syntax tree of a program fragment from the natural language description.
The generator is an Encoder-Decoder deep learning framework. The Encoder encodes the natural language description sequence, and the Decoder decodes the semantics of the natural language description into the abstract syntax tree of a program fragment, based on the Encoder's encoding.
Step 1.1: Use a bidirectional LSTM network as the Encoder to encode the natural language description sequence.
Step 1.1.1: Encode the natural language description sequence from left to right and from right to left to obtain the forward and backward hidden vectors h_i→ and h_i← of each character.
Step 1.1.2: Concatenate the two hidden vectors into h_i = [h_i→; h_i←], the encoding vector of each natural language character, and save each character's encoding vector for later use by the Decoder.
Step 1.1.3: Use the hidden vector of the last character as the initial state h_end of the Decoder.
Step 1.2: Use a unidirectional LSTM network as the Decoder, and decode the natural language semantics encoded by the Encoder into the abstract syntax tree of the program.
This step introduces the grammar rules of the programming language into the generation process. The abstract syntax tree is generated in depth-first traversal order, and each generation step is the application of a context-free grammar production. The grammar rules provide prior knowledge for generating the abstract syntax tree and shrink the search space.
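For illustration (not part of the claimed method), a few productions from Python's abstract grammar (ASDL) show the kind of prior knowledge involved; each decoding step selects one such production for a non-leaf node or emits a terminal token for a leaf:

```python
# Illustrative context-free productions from Python's abstract grammar
# (ASDL); the decoder's action space is built from rules of this kind.
PRODUCTIONS = [
    "stmt -> Assign(expr* targets, expr value)",
    "stmt -> Return(expr? value)",
    "expr -> Call(expr func, expr* args, keyword* keywords)",
    "expr -> Name(identifier id, expr_context ctx)",
]
```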
Step 1.2.1: Use h_end from 1.1.3 as the initial state of the Decoder, use the attention mechanism to compute the content vector of h_end, and feed that content vector to the LSTM as input.
Step 1.2.2: Apply Softmax to the LSTM output of 1.2.1 for multi-class classification; the classes correspond to actions that build the abstract syntax tree.
Step 1.2.3: The actions produced in 1.2.2 fall into two types: generating leaf nodes and generating non-leaf nodes.
An action that generates a non-leaf node is a context-free grammar expansion, while an action that generates a leaf node emits a concrete character, i.e. a sequence character of the program fragment, which can either be copied from the natural language description sequence or generated by the model.
Step 1.2.4: Apply the actions of 1.2.3 in depth-first traversal order to construct the abstract syntax tree.
Step 1.2.5: Feed the output of 1.2.4 back as the input of 1.2.1 and repeat 1.2.1 through 1.2.4 until a complete abstract syntax tree is obtained, i.e. the abstract syntax tree of the program fragment whose semantics correspond to the natural language description.
Step 1.2.6: Parse the abstract syntax tree into a program fragment.
Step 2: Use the GAN's discriminator to judge whether the semantics of the abstract syntax tree produced by the generator are consistent with the semantics of the given natural language description; this also imposes a strong semantic constraint on the generator. The data for training the discriminator falls into three types: (A) a natural language description from the training data and the abstract syntax tree of its corresponding program; (B) a given natural language description and the abstract syntax tree produced by the generator; (C) a natural language description sequence and the abstract syntax tree of an unrelated program. Training data of type A is labeled consistent, while training data of types B and C is labeled inconsistent.
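A minimal sketch of assembling these three pair types, assuming a dataset of (description, AST) pairs and a trained generator (function and variable names are illustrative):

```python
import random

def make_discriminator_pairs(dataset, generator):
    """dataset: list of (nl_description, gold_ast) pairs."""
    pairs = []
    for nl, gold_ast in dataset:
        pairs.append((nl, gold_ast, 1))           # type A: consistent
        pairs.append((nl, generator(nl), 0))      # type B: generated
        _, unrelated = random.choice(dataset)     # type C: unrelated program
        pairs.append((nl, unrelated, 0))          # (in practice, exclude the
    return pairs                                  #  matching program itself)
```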
Step 2.1: Encode the natural language description sequence with the Encoder of the GAN generator; this step only needs the final semantic vector.
Step 2.2: Use a tree LSTM network to encode the abstract syntax tree bottom-up, all the way to its root node, whose vector is the semantic vector corresponding to the abstract syntax tree.
Step 2.3: Multiply the natural language semantic vector of 2.1 with the semantic vector of the abstract syntax tree of 2.2.
Step 2.4: Repeat 2.1 through 2.3 for training data B and training data C of step 2.
Step 2.5: Run binary classification on the training data pairs of 2.4 to judge, in all three cases, whether the semantics of the natural language and of the program's abstract syntax tree are consistent.
Step 3: Train GANCoder, i.e. train the generator and the discriminator of the GAN together. During optimization the generator and the discriminator are optimized alternately. Before joint training, the generator and the discriminator are pre-trained separately, and then trained together adversarially.
Further, the model GANCoder produced by this method contains two parts, a generator and a discriminator, where the generator is responsible for generating programming language program fragments from natural language, and the discriminator identifies the program fragments produced by the generator. During training, the generator and the discriminator play an adversarial game and improve each other; in the end the discriminator cannot tell whether a program fragment comes from the original training set or was produced by the generator.
The beneficial effects of the present invention:
The present invention trains a code generation system through generative adversarial optimization; given a user's natural language description of a function, the system can generate a piece of program code with that function. Compared with traditional optimization methods, adversarial game training with a generative adversarial network lets the generator learn the language models of natural language and programming language more effectively.
Description of the Drawings
Figure 1 is a semantic parser based on the Encoder-Decoder model.
Figure 2 is the abstract syntax tree corresponding to a Python program.
Figure 3 is the overall framework of the GANCoder of the present invention.
Figure 4 is a diagram of the framework of the GANCoder generator.
Figure 5 shows the use of a tree LSTM network to encode the abstract syntax tree.
Detailed Description of the Embodiments
The specific embodiments of the present invention are described in detail below in conjunction with the technical solution and the drawings.
In the method for automatically generating Python code from natural language, the proposed GANCoder system is, overall, a generative adversarial network comprising two parts, a generator and a discriminator, as shown in Figure 3. The generator is an Encoder-Decoder model, as shown in Figure 4: the Encoder, a bidirectional LSTM network, encodes the natural language description sequence, while the Decoder, a unidirectional LSTM network, decodes the semantics encoded by the Encoder into the program's abstract syntax tree. The discriminator is mainly responsible for judging whether the semantics of the natural language description and of the abstract syntax tree are consistent; the natural language description is encoded with the generator's Encoder, while the abstract syntax tree is encoded with a tree LSTM network, shown in Figure 5, which encodes the program's abstract syntax tree bottom-up, the encoding vector of the root node being the semantic vector of the abstract syntax tree.
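A minimal structural sketch of this composition in Python (class and method names are assumptions for illustration; the patent does not prescribe an API):

```python
class GANCoder:
    """Composition sketch: generator = BiLSTM Encoder + LSTM Decoder,
    discriminator = NL encoder + tree LSTM + consistency score."""
    def __init__(self, encoder, decoder, tree_lstm, score_head):
        self.encoder = encoder          # bidirectional LSTM (step 1.1)
        self.decoder = decoder          # unidirectional LSTM (step 1.2)
        self.tree_lstm = tree_lstm      # bottom-up AST encoder (step 2.2)
        self.score_head = score_head    # consistency prediction (steps 2.3-2.5)

    def generate(self, nl_tokens):
        enc_outputs, h_end = self.encoder(nl_tokens)
        return self.decoder(enc_outputs, h_end)      # an abstract syntax tree

    def discriminate(self, nl_tokens, ast):
        _, nl_vec = self.encoder(nl_tokens)          # step 2.1: NL semantic vector
        ast_vec = self.tree_lstm(ast)                # root vector of the AST
        return self.score_head(nl_vec, ast_vec)     # P(semantics consistent)
```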
Step 1: Use the generator of the GAN to generate the abstract syntax tree of a program fragment from the natural language description.
The generator is an Encoder-Decoder deep learning model, as shown in Figure 4. The left side of the figure is the Encoder, a bidirectional LSTM network responsible for encoding the natural language description sequence; the right side is the Decoder, a unidirectional LSTM network that decodes the semantics of the natural language description into the abstract syntax tree of a program fragment, based on the Encoder's encoding.
Step 1.1: Use a bidirectional LSTM network as the Encoder to encode the natural language description sequence. The two directions in the Encoder of Figure 4 represent the encoding order of the LSTM network.
Step 1.1.1: Encode the natural language description sequence from left to right and from right to left to obtain the forward and backward hidden vectors h_i→ and h_i← of each character, as shown by the two encoding directions of the LSTM network in the Encoder of Figure 4.
Step 1.1.2: Concatenate the two hidden vectors of 1.1.1 into h_i = [h_i→; h_i←], the encoding vector of each natural language character, and save each character's encoding vector for later use by the Decoder.
Step 1.1.3: Use the hidden vector of the last character as the initial state h_end of the Decoder.
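Steps 1.1.1 through 1.1.3 map directly onto a standard bidirectional LSTM; a minimal PyTorch sketch, assuming token IDs as input and illustrative dimensions:

```python
import torch
import torch.nn as nn

class NLEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One pass left-to-right and one right-to-left (step 1.1.1).
        self.lstm = nn.LSTM(emb_dim, hid_dim, bidirectional=True,
                            batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        emb = self.embed(token_ids)
        # outputs[:, i, :] is the concatenation h_i = [h_i->; h_i<-]
        # of step 1.1.2, saved for the Decoder's attention.
        outputs, _ = self.lstm(emb)
        h_end = outputs[:, -1, :]       # initial Decoder state (step 1.1.3)
        return outputs, h_end
```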
Step 1.2: Use a unidirectional LSTM network as the Decoder, and decode the natural language semantics encoded by the Encoder into the abstract syntax tree of the program.
This step introduces the grammar rules of the programming language into the code generation process. The abstract syntax tree is generated in depth-first traversal order, and each generation step is the application of a context-free grammar production. The grammar rules provide prior knowledge for generating the abstract syntax tree and shrink the search space.
Step 1.2.1: As shown in Figure 4, the Decoder takes h_end from 1.1.3 as its initial state, uses the attention mechanism to compute the content vector C1 of h_end, and feeds that content vector to the LSTM as input.
Step 1.2.2: Apply Softmax to the LSTM output for multi-class classification; the classes correspond to actions that build the abstract syntax tree, matching the nodes of the abstract syntax tree on the right of Figure 2.
Step 1.2.3: The actions predicted in 1.2.2 fall into two types, generating leaf nodes and generating non-leaf nodes, i.e. the leaf and non-leaf nodes of the abstract syntax tree in Figure 2. An action that generates a non-leaf node is a context-free grammar expansion, each corresponding to one context-free grammar rule; an action that generates a leaf node emits a concrete character, i.e. a sequence character of the program fragment, which can either be copied from the natural language description sequence or generated by the model.
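A sketch of such an action head (an assumed implementation: one class per grammar rule, plus generate-from-vocabulary and pointer-style copy scores for leaf tokens; it assumes matching encoder/decoder hidden sizes, and all dimensions are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionHead(nn.Module):
    def __init__(self, hid_dim, n_rules, vocab_size):
        super().__init__()
        self.rule_head = nn.Linear(hid_dim, n_rules)    # non-leaf: apply rule
        self.gen_head = nn.Linear(hid_dim, vocab_size)  # leaf: generate token
        self.copy_proj = nn.Linear(hid_dim, hid_dim)    # leaf: copy from NL

    def forward(self, dec_state, enc_outputs):
        # dec_state: (batch, hid_dim); enc_outputs: (batch, seq_len, hid_dim)
        rule_logp = F.log_softmax(self.rule_head(dec_state), dim=-1)
        gen_logp = F.log_softmax(self.gen_head(dec_state), dim=-1)
        # Score each NL position for copying its character/token.
        copy_scores = torch.bmm(self.copy_proj(dec_state).unsqueeze(1),
                                enc_outputs.transpose(1, 2)).squeeze(1)
        copy_logp = F.log_softmax(copy_scores, dim=-1)
        return rule_logp, gen_logp, copy_logp
```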
Step 1.2.4: Apply the actions predicted in 1.2.3 in depth-first traversal order to construct the abstract syntax tree. The order indicated by the solid arrows over the nodes of the abstract syntax tree in Figure 2 is the order in which each node of the tree is built.
Step 1.2.5: Feed the output of 1.2.4 back as the input of 1.2.1. As shown in Figure 2, the information of the previous node is passed to the next node; this information includes the state of the previous step, indicated by the solid arrows, and the information of the parent node, conveyed by the dotted arrows. Then repeat 1.2.1 through 1.2.4 until a complete abstract syntax tree is obtained, i.e. the abstract syntax tree of the program fragment whose semantics correspond to the natural language description.
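The depth-first construction order can be illustrated with a self-contained toy (the real system chooses productions with the LSTM and Softmax of 1.2.1-1.2.2 rather than this fixed table, and the grammar here is an invented miniature):

```python
# Tiny assumed grammar; symbols without productions are leaf tokens.
GRAMMAR = {
    "stmt":   [["assign"]],
    "assign": [["name", "expr"]],
    "expr":   [["name"], ["call"]],
}

def expand(symbol, depth=0):
    """Depth-first expansion: non-leaf symbols apply a production
    (step 1.2.4), leaf symbols emit a token placeholder."""
    if symbol not in GRAMMAR:
        print("  " * depth + f"token<{symbol}>")   # leaf: token action
        return
    print("  " * depth + symbol)
    for child in GRAMMAR[symbol][0]:               # model would pick the rule
        expand(child, depth + 1)                   # children left to right

expand("stmt")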
Step 1.2.6: Parse the complete abstract syntax tree into a program fragment.
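In isolation, this last step corresponds to unparsing. Python's standard library demonstrates the AST-to-source mapping on a toy tree (ast.unparse requires Python 3.9 or later; the model would of course build the tree itself rather than parse existing source):

```python
import ast

tree = ast.parse("x = sorted(my_list)")   # a one-line program fragment
print(ast.dump(tree, indent=2))           # the kind of tree the decoder builds
print(ast.unparse(tree))                  # step 1.2.6: tree -> program fragment
```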
Step 2: Use the GAN's discriminator to judge whether the semantics of the abstract syntax tree produced by the generator are consistent with the semantics of the given natural language description; this also imposes a strong semantic constraint on the generator. The data for training the discriminator falls into three types: (1) a natural language description from the training data and the abstract syntax tree of its corresponding program; (2) a given natural language description and the abstract syntax tree produced by the generator; (3) a natural language description sequence and the abstract syntax tree of an unrelated program. Type 1 is labeled consistent, while types 2 and 3 are labeled inconsistent.
Step 2.1: Encode the natural language description sequence with the Encoder of the GAN generator; this step only needs the final semantic vector. The Encoder structure is shown in Figure 4.
Step 2.2: Use a tree LSTM network, as shown in Figure 5, to encode the abstract syntax tree bottom-up. The child nodes of the abstract syntax tree are the input to the encoding of their parent node, and encoding proceeds up to the root node of the abstract syntax tree, whose vector is the semantic vector corresponding to the tree.
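A child-sum tree-LSTM cell is one common way to realize this bottom-up encoding; the sketch below is an assumption, since the patent specifies a tree LSTM but not a particular variant:

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim + hid_dim, 3 * hid_dim)  # input/output/update
        self.f = nn.Linear(in_dim + hid_dim, hid_dim)        # per-child forget gate

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) node embedding; child_h, child_c: (n_children, hid_dim)
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.iou(torch.cat([x, h_sum])), 3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f(torch.cat(
            [x.expand(child_h.size(0), -1), child_h], dim=1)))
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c   # applied bottom-up; h at the root is the AST's semantic vector
```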
Step 2.3: Multiply the natural language semantic vector of 2.1 with the semantic vector of the abstract syntax tree of 2.2.
Step 2.4: Repeat 2.1 through 2.3 for training data 2 and training data 3 of step 2.
Step 2.5: Run binary classification on the training data pairs of 2.4 to judge, in all three cases, whether the semantics of the natural language and of the program's abstract syntax tree are consistent.
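A minimal sketch of this scoring and prediction (reading the "vector multiplication" of step 2.3 as an element-wise product is an assumption, and the linear-plus-sigmoid head is illustrative):

```python
import torch
import torch.nn as nn

class ConsistencyHead(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.classify = nn.Linear(dim, 1)

    def forward(self, nl_vec, ast_vec):
        fused = nl_vec * ast_vec                     # step 2.3: vector multiplication
        return torch.sigmoid(self.classify(fused))  # step 2.5: P(consistent)

# Binary labels: type-1 pairs -> 1 (consistent), types 2 and 3 -> 0.
loss_fn = nn.BCELoss()
```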
Step 3: Train GANCoder, i.e. train the generator and the discriminator of the GAN together. During optimization the generator and the discriminator are optimized alternately. Before joint training, the generator and the discriminator are pre-trained, and then trained together adversarially; as shown in Figure 3, the discriminator's signal is fed back to the generator.
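An assumed outline of this schedule (the pre-training routines are hypothetical helpers, and the differentiability of the discrete AST generation is glossed over; in practice the generator update would need a policy-gradient style estimator):

```python
import torch
import torch.nn.functional as F

def bce(pred, target_value):
    return F.binary_cross_entropy(pred, torch.full_like(pred, target_value))

def train_gancoder(gen, disc, data, pretrain_gen, pretrain_disc, epochs=10):
    pretrain_gen(gen, data)       # supervised pre-training (hypothetical helpers)
    pretrain_disc(disc, data)
    g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
    d_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
    for _ in range(epochs):
        # data is assumed to yield (description, gold AST, unrelated AST) triples.
        for nl, gold_ast, unrelated_ast in data:
            # Discriminator step: type 1 consistent, types 2 and 3 not.
            fake_ast = gen.generate(nl)
            d_loss = (bce(disc(nl, gold_ast), 1.0)
                      + bce(disc(nl, fake_ast), 0.0)
                      + bce(disc(nl, unrelated_ast), 0.0))
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()
            # Generator step: the discriminator's score is fed back (Figure 3).
            g_loss = bce(disc(nl, gen.generate(nl)), 1.0)
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```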

Claims (4)

  1. A method for automatically generating Python code from natural language, characterized in that the steps are as follows:
    Step 1: use the generator of the GAN to generate the abstract syntax tree of a program fragment from the natural language description;
    Step 1.1: use a bidirectional LSTM network as the Encoder to encode the natural language description sequence;
    Step 1.1.1: encode the natural language description sequence from left to right and from right to left to obtain the forward and backward hidden vectors h_i→ and h_i← of each character;
    Step 1.1.2: concatenate the two hidden vectors into h_i = [h_i→; h_i←], the encoding vector of each natural language character, and save each character's encoding vector for later use by the Decoder;
    Step 1.1.3: use the hidden vector of the last character as the initial state h_end of the Decoder;
    Step 1.2: use a unidirectional LSTM network as the Decoder, and decode the natural language semantics encoded by the Encoder into the abstract syntax tree of the program;
    Step 2: divide the data for training the discriminator into three types: (A) a natural language description from the training data and the abstract syntax tree of its corresponding program; (B) a given natural language description and the abstract syntax tree produced by the generator; (C) a natural language description sequence and the abstract syntax tree of an unrelated program;
    label training data of type A consistent, and training data of types B and C inconsistent;
    Step 2.1: encode the natural language description sequence with the Encoder of the GAN generator;
    Step 2.2: use a tree LSTM network to encode the abstract syntax tree bottom-up, all the way to the root node of the abstract syntax tree;
    Step 2.3: multiply the natural language semantic vector of 2.1 with the semantic vector of the abstract syntax tree of 2.2;
    Step 2.4: repeat 2.1 through 2.3 for training data B and training data C of step 2;
    Step 2.5: run binary classification on the training data pairs of 2.4 to judge, in all three cases, whether the semantics of the natural language and of the program's abstract syntax tree are consistent;
    Step 3: train the generator and the discriminator of the GAN together, optimizing the generator and the discriminator alternately.
  2. The method for automatically generating Python code from natural language according to claim 1, characterized in that step 1.2 is specifically as follows:
    Step 1.2.1: use the initial state h_end from 1.1.3 as the initial state of the Decoder, use the attention mechanism to compute the content vector of h_end, and feed that content vector to the LSTM as input;
    Step 1.2.2: apply Softmax to the LSTM output of 1.2.1 for multi-class classification, the classes corresponding to actions that build the abstract syntax tree;
    Step 1.2.3: the actions produced in 1.2.2 fall into two types, generating leaf nodes and generating non-leaf nodes;
    Step 1.2.4: apply the actions of 1.2.3 in depth-first traversal order to construct the abstract syntax tree;
    Step 1.2.5: feed the output of 1.2.4 back as the input of 1.2.1 and repeat 1.2.1 through 1.2.4 until a complete abstract syntax tree is obtained, i.e. the abstract syntax tree of the program fragment whose semantics correspond to the natural language description;
    Step 1.2.6: parse the abstract syntax tree into a program fragment.
  3. The method for automatically generating Python code from natural language according to claim 1 or 2, characterized in that, in step 3, before the generator and the discriminator are trained together, they are each pre-trained separately and then trained together adversarially.
  4. The model generated by the method for automatically generating Python code from natural language according to any one of claims 1 to 3, comprising two parts, a generator and a discriminator, where the generator is responsible for generating programming language program fragments from natural language, and the discriminator identifies the program fragments produced by the generator; during training, the generator and the discriminator play an adversarial game and improve each other, until finally the discriminator cannot tell whether a program fragment comes from the original training set or was produced by the generator.
PCT/CN2019/099733 2019-07-29 2019-08-08 Method for automatically generating python codes from natural language WO2021017025A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910689490.3 2019-07-29
CN201910689490.3A CN110489102B (en) 2019-07-29 2019-07-29 Method for automatically generating Python code from natural language

Publications (1)

Publication Number Publication Date
WO2021017025A1 true WO2021017025A1 (en) 2021-02-04

Family

ID=68548396

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/099733 WO2021017025A1 (en) 2019-07-29 2019-08-08 Method for automatically generating python codes from natural language

Country Status (2)

Country Link
CN (1) CN110489102B (en)
WO (1) WO2021017025A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112987653B (en) * 2019-12-17 2022-04-15 深圳市恒控科技有限公司 Method and device for converting Chinese program into G code
CN111443904B (en) * 2020-03-12 2023-04-07 清华大学深圳国际研究生院 Method for generating executable code and computer readable storage medium
CN111639153A (en) * 2020-04-24 2020-09-08 平安国际智慧城市科技股份有限公司 Query method and device based on legal knowledge graph, electronic equipment and medium
CN112255962A (en) * 2020-10-30 2021-01-22 浙江佳乐科仪股份有限公司 PLC programming system based on artificial intelligence
CN112905188A (en) * 2021-02-05 2021-06-04 中国海洋大学 Code translation method and system based on generation type countermeasure GAN network
CN113126973A (en) * 2021-04-30 2021-07-16 南京工业大学 Code generation method based on gated attention and interactive LSTM
CN113849162B (en) * 2021-09-28 2024-04-02 哈尔滨工业大学 Code generation method combining model driving and deep neural network
CN114860241B (en) * 2022-07-07 2022-09-23 中国海洋大学 Code abstract syntax tree generation method based on generation countermeasure network
CN116400901A (en) * 2023-04-12 2023-07-07 上海计算机软件技术开发中心 Python code automatic generation method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170323636A1 (en) * 2016-05-05 2017-11-09 Conduent Business Services, Llc Semantic parsing using deep neural networks for predicting canonical forms
CN109359293A (en) * 2018-09-13 2019-02-19 内蒙古大学 Mongolian name entity recognition method neural network based and its identifying system
CN109783809A (en) * 2018-12-22 2019-05-21 昆明理工大学 A method of alignment sentence is extracted from Laos-Chinese chapter grade alignment corpus
CN109799990A (en) * 2017-11-16 2019-05-24 中标软件有限公司 Source code annotates automatic generation method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446540B (en) * 2018-03-19 2022-02-25 中山大学 Program code plagiarism type detection method and system based on source code multi-label graph neural network
CN108388425B (en) * 2018-03-20 2021-02-19 北京大学 Method for automatically completing codes based on LSTM
CN108733359B (en) * 2018-06-14 2020-12-25 北京航空航天大学 Automatic generation method of software program
RU2697648C2 (en) * 2018-10-05 2019-08-15 Общество с ограниченной ответственностью "Алгоритм" Traffic classification system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170323636A1 (en) * 2016-05-05 2017-11-09 Conduent Business Services, Llc Semantic parsing using deep neural networks for predicting canonical forms
CN109799990A (en) * 2017-11-16 2019-05-24 中标软件有限公司 Source code annotates automatic generation method and system
CN109359293A (en) * 2018-09-13 2019-02-19 内蒙古大学 Mongolian name entity recognition method neural network based and its identifying system
CN109783809A (en) * 2018-12-22 2019-05-21 昆明理工大学 A method of alignment sentence is extracted from Laos-Chinese chapter grade alignment corpus

Also Published As

Publication number Publication date
CN110489102A (en) 2019-11-22
CN110489102B (en) 2021-06-18

Similar Documents

Publication Publication Date Title
WO2021017025A1 (en) Method for automatically generating python codes from natural language
CN112613303B (en) Knowledge distillation-based cross-modal image aesthetic quality evaluation method
CN107632981B (en) Neural machine translation method introducing source language chunk information coding
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN109359297B (en) Relationship extraction method and system
CN108563433B (en) Device based on LSTM automatic completion code
CN109508459A (en) A method of extracting theme and key message from news
CN111382574B (en) Semantic parsing system combining syntax under virtual reality and augmented reality scenes
CN113657123A (en) Mongolian aspect level emotion analysis method based on target template guidance and relation head coding
CN110084323A (en) End-to-end semanteme resolution system and training method
CN115543437A (en) Code annotation generation method and system
CN114489669A (en) Python language code fragment generation method based on graph learning
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN108363685B (en) Self-media data text representation method based on recursive variation self-coding model
CN112732264A (en) Automatic code conversion method between high-level programming languages
CN115906857A (en) Chinese medicine text named entity recognition method based on vocabulary enhancement
CN108733359B (en) Automatic generation method of software program
CN115826988A (en) Java method annotation instant automatic updating method based on data flow analysis and attention mechanism
CN116362265A (en) Text translation method, device, equipment and storage medium
CN116483314A (en) Automatic intelligent activity diagram generation method
CN114757181B (en) Method and device for training and extracting event of end-to-end event extraction model based on prior knowledge
CN112434143B (en) Dialog method, storage medium and system based on hidden state constraint of GRU (generalized regression Unit)
CN113536741B (en) Method and device for converting Chinese natural language into database language
CN114881010A (en) Chinese grammar error correction method based on Transformer and multitask learning
CN113486647A (en) Semantic parsing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19939997; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19939997; Country of ref document: EP; Kind code of ref document: A1)