CN115373691A - Intelligent programming language program translation method and system based on neural network - Google Patents

Intelligent programming language program translation method and system based on neural network

Info

Publication number
CN115373691A
CN115373691A (application CN202210850684.9A)
Authority
CN
China
Prior art keywords
model
program
translation
language
reverse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210850684.9A
Other languages
Chinese (zh)
Inventor
郭崎
文渊博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202210850684.9A
Publication of CN115373691A
Legal status: Pending

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F 8/00 Arrangements for software engineering › G06F 8/40 Transformation of program code › G06F 8/51 Source to source
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F 8/00 Arrangements for software engineering › G06F 8/40 Transformation of program code › G06F 8/41 Compilation › G06F 8/42 Syntactic analysis › G06F 8/427 Parsing
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N 3/00 Computing arrangements based on biological models › G06N 3/02 Neural networks › G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides a neural-network-based method and system for translating programs into an intelligent programming language, comprising the following steps: constructing a forward model that translates a source language program into a target language program and a reverse model that translates the target language program back into the source language program; training the forward model and the reverse model by back-translation over a source language program library and a target language program library to obtain a forward translation model and a reverse translation model; and inputting a source language program to be translated into the forward translation model, feeding the resulting candidate translations into a reordering model to obtain a score for each candidate, and selecting the highest-scoring candidate as the translation result of the source language program.

Description

Intelligent programming language program translation method and system based on neural network
Technical Field
The invention relates to the technical field of program code conversion, program generation, program translation and natural language processing, in particular to an intelligent programming language program translation method and system based on a neural network.
Background
Intelligent chips are an important hardware carrier of deep learning algorithms and are widely used in different fields. As an important component of the system software of intelligent computing systems, intelligent programming languages dedicated to such systems are emerging, such as the CUDA programming language and the BANG language. Owing to the rapid development of deep learning algorithms, developers need to write user programs in an intelligent programming language in order to deploy custom algorithms on intelligent chips. However, these programming languages are difficult to program in and place high demands on programmers. Because intelligent chips contain enormous arrays of arithmetic units, data and task parallelism is pervasive, and programmers must follow a parallel programming model when using these languages. In addition, unlike storage structures such as caches on general-purpose processors, which are transparent to the programmer, the different types of storage space in an intelligent programming language must be explicitly declared and managed by the programmer. These difficulties clearly add to the challenge of developing user programs on intelligent chips.
User programs in traditional programming languages, which are based on a serial programming model, are readily available for two reasons: (1) a large amount of well-developed and optimized legacy code already exists for the serial programming model; (2) the serial programming model matches programmers' habits, so users can easily write user programs in a traditional programming language. Therefore, if user programs written in a traditional programming language could be migrated to user programs in an intelligent programming language, the cost of developing user programs on intelligent chips would be greatly reduced.
Considerable work has been done on translation between natural languages (e.g., English to Chinese), and some work also exists on translation between programming languages (e.g., C to CUDA). In the prior art, TransCoder uses a neural network for programming-language translation and can automatically translate user programs written in C, Python, Java, and the like. It consists of a basic Transformer architecture with an encoder and a decoder, each with 6 independent layers. A single model is shared across multiple programming languages and is trained with three unsupervised training tasks.
The first task is cross-lingual pre-training. It resembles the training method of a masked language model (MLM): some tokens of the input sequence are masked at random, and the model is trained to predict them from the surrounding context. During training, data from the different languages is alternated so that the model learns higher-quality representations of each language.
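The masking step can be sketched as follows. This is a minimal illustration under assumptions: the real TransCoder masks subword tokens inside a Transformer training loop, whereas here plain whitespace tokens and a `<mask>` symbol are used for clarity.

```python
import random

MASK = "<mask>"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """MLM-style corruption: each token is independently replaced by a
    mask symbol with probability mask_prob; the model must then predict
    the original tokens at the masked positions from context."""
    rng = random.Random(seed)
    corrupted, masked_positions = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            masked_positions.append(i)
        else:
            corrupted.append(tok)
    return corrupted, masked_positions

tokens = "int main ( ) { return 0 ; }".split()
corrupted, masked_positions = mask_tokens(tokens, mask_prob=0.3)
```

The prediction targets are exactly the tokens at `masked_positions`; all other positions are left untouched.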
The second task is denoising auto-encoding. After masked-language-model pre-training, the encoder has learned token representations, but the translation task requires the encoder and decoder to work together, and at this point the decoder has not yet learned how to decode. TransCoder therefore introduces a denoising auto-encoding task to train the decoder: noise is added to the input sequence (e.g., random masking, deleting a token, or shuffling the order), and the model must predict the original sequence. This brings the decoder into training and makes the model more robust to input noise.
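The deletion and shuffling corruptions can be sketched as below (masking is shown in the earlier MLM sketch). The window-based local shuffle is an assumption modeled on common denoising-autoencoder implementations; the patent does not fix the exact noise functions.

```python
import random

def add_noise(tokens, drop_prob=0.1, window=3, seed=0):
    """Denoising-autoencoder corruption: randomly delete tokens, then
    locally shuffle the survivors by sorting on jittered positions, so
    each token moves at most `window` positions. The decoder is trained
    to reconstruct the original sequence from this noisy input."""
    rng = random.Random(seed)
    # Random deletion; keep at least one token so the input is non-empty.
    kept = [t for t in tokens if rng.random() > drop_prob] or tokens[:1]
    # Local shuffle: add uniform jitter to each index, then sort.
    keys = [i + rng.uniform(0, window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

src = "for ( int i = 0 ; i < n ; i ++ )".split()
noisy = add_noise(src)
```

The (noisy input, original sequence) pair is then used as a supervised example for the encoder-decoder.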
The third task is back-translation. After the first two tasks the model can already produce preliminary translations, but their quality is poor, because the model has not yet learned to translate from one specified language into another. Back-translation is introduced to solve this problem; it is a standard remedy when supervised parallel corpora are lacking and constitutes a weakly supervised learning scheme. Back-translation uses two models: a source-to-target translation model and a target-to-source translation model. First, target-language data is fed to the target-to-source model, producing a noisy source-language corpus; the resulting (noisy source, target) pairs can then be treated approximately as supervised corpora, so the source-to-target direction can be trained in a weakly supervised way. During training, both directions are trained simultaneously until convergence.
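One round of the back-translation loop described above can be sketched with toy stand-ins. The `ToyTranslator` class and its tag-prefixing "translation" are purely illustrative assumptions; a real model is a Transformer encoder-decoder trained by gradient descent.

```python
class ToyTranslator:
    """Stand-in for a neural translation model (assumption: real models
    are Transformer encoder-decoders; here translation just tags the
    input so the data flow is visible)."""
    def __init__(self, tag):
        self.tag = tag
        self.training_pairs = []

    def translate(self, program):
        return f"{self.tag}:{program}"

    def train_on(self, src, tgt):
        # A real model would take a gradient step on the (src, tgt) pair.
        self.training_pairs.append((src, tgt))

def back_translation_round(fwd, rev, target_corpus):
    """One back-translation round in the target->source->target direction:
    each real target-language program is translated into a noisy synthetic
    source program, and the pair serves as weak supervision for the
    forward model."""
    for tgt in target_corpus:
        synthetic_src = rev.translate(tgt)   # noisy source-language program
        fwd.train_on(synthetic_src, tgt)     # approximate supervised pair

fwd = ToyTranslator("cuda")   # source -> target direction
rev = ToyTranslator("c")      # target -> source direction
back_translation_round(fwd, rev, ["__global__ void k() {}"])
```

The symmetric source→target→source round is obtained by swapping the roles of `fwd` and `rev`; in training, both directions are alternated until convergence.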
The following mainly addresses the problems and disadvantages of the prior-art TransCoder.
TransCoder is designed mainly for traditional serial programming languages such as C, Java, and Python. Translating between these languages is comparatively easy because they are semantically similar (all describe serial programs) and differ only in syntax (different keywords, such as void in C or def in Python). However, a traditional programming language such as C (hereinafter used as the representative of traditional programming languages) is a serial programming model that describes complex operations with scalar code in nested loop structures. An intelligent programming language such as CUDA (hereinafter used as the representative of intelligent programming languages) is a parallel programming model that requires data to be processed in parallel. The C and CUDA programming models therefore differ greatly in both syntax and semantics. At the syntax level, the CUDA programming model has special keywords such as the parallel variable threadIdx. At the semantic level, nested loop structures in a C program must be translated into parallel variables in the CUDA program. These problems all increase the difficulty of automatically translating C programs into CUDA programs.
The existing technology, TransCoder, does not analyze program semantics, in particular parallel semantics. When faced with translating C into CUDA, its performance is therefore insufficient: it cannot reliably detect loop structures in a serial C program and translate them into the corresponding CUDA program.
Disclosure of Invention
The invention aims to solve the problem that the prior art ignores the semantic differences between intelligent programming languages and traditional programming languages. It provides a metric that models the parallel semantics of a program, together with a method and system for reordering generated programs. By introducing parallel-semantic modeling and selecting the best solution among the generated CUDA programs with an additional reordering model, the shortcomings of the prior art can be overcome.
The prior-art TransCoder can be regarded as providing a translation model that takes a C program as input and outputs a set of candidate CUDA programs. The new reordering model introduced by the invention then scores these candidates according to the parallel-semantic modeling metric, so that the best CUDA program can be selected.
Specifically, the invention provides an intelligent programming language program translation method based on a neural network, which comprises the following steps:
step 1, constructing a forward model for translating a source language program into a target language program and constructing a reverse model for translating the target language program into the source language program; training the forward model and the reverse model through reverse translation according to a source language program library and a target language program library to obtain a forward translation model and a reverse translation model;
Step 2, inputting the source language program to be translated into the forward translation model, inputting the obtained multiple candidate results into a reordering model to obtain scores for the candidate results, and selecting the candidate result with the highest score as the translation result of the source language program.
The intelligent programming language program translation method based on the neural network, wherein the step 1 comprises the following steps:
selecting a program from the source language program library as a first training target and inputting it into the forward model to obtain a first target language program; inputting the first target language program into the reverse model to obtain a first source language program; constructing a first loss from the first source language program and the first training target, and training the forward model and the reverse model;
selecting a program from the target language program library as a second training target and inputting it into the reverse model to obtain a second source language program; inputting the second source language program into the forward model to obtain a second target language program; constructing a second loss from the second target language program and the second training target, and training the forward model and the reverse model;
until the first loss and the second loss converge, saving the current forward model and reverse model as the forward translation model and the reverse translation model, respectively.
the intelligent programming language program translation method based on the neural network is characterized in that the training of the reordering model comprises the following steps:
selecting a program x from the target language program library and inputting it into the reverse translation model to generate an intermediate source program; inputting the intermediate source program into the forward translation model to obtain multiple candidate results; and normalizing the reordering model's predicted score for each candidate result to obtain a predicted output distribution;
taking the program x as the reference, computing a correct score for each candidate result to form a target distribution, and training the reordering model by minimizing the divergence between the predicted output distribution and the target distribution.
The intelligent programming language program translation method based on the neural network is characterized in that the source language is C, Python, or Java, and the target language is CUDA or BANG.
The invention also provides an intelligent programming language program translation system based on the neural network, which comprises the following components:
the initial module is used for constructing a forward model for translating the source language program into the target language program and a reverse model for translating the target language program into the source language program; training the forward model and the reverse model through reverse translation according to a source language program library and a target language program library to obtain a forward translation model and a reverse translation model;
and the translation module is used for inputting the source language program to be translated into the forward translation model, inputting the obtained multiple candidate results into the reordering model to obtain the scores of the candidate results, and selecting the candidate result with the highest score as the translation result of the source language program.
The intelligent programming language program translation system based on the neural network is characterized in that the initial module is used for:
selecting a program from the source language program library as a first training target and inputting it into the forward model to obtain a first target language program; inputting the first target language program into the reverse model to obtain a first source language program; constructing a first loss from the first source language program and the first training target, and training the forward model and the reverse model;
selecting a program from the target language program library as a second training target and inputting it into the reverse model to obtain a second source language program; inputting the second source language program into the forward model to obtain a second target language program; constructing a second loss from the second target language program and the second training target, and training the forward model and the reverse model;
until the first loss and the second loss converge, saving the current forward model and reverse model as the forward translation model and the reverse translation model, respectively.
the intelligent programming language program translation system based on the neural network, wherein the training of the reordering model comprises the following steps:
selecting a program x from the target language program library and inputting it into the reverse translation model to generate an intermediate source program; inputting the intermediate source program into the forward translation model to obtain multiple candidate results; and normalizing the reordering model's predicted score for each candidate result to obtain a predicted output distribution;
taking the program x as the reference, computing a correct score for each candidate result to form a target distribution, and training the reordering model by minimizing the divergence between the predicted output distribution and the target distribution.
The intelligent programming language program translation system based on the neural network is characterized in that the source language is C, Python, or Java, and the target language is CUDA or BANG.
The invention also provides a storage medium for storing a program that executes any of the above neural-network-based intelligent programming language program translation methods.
The invention also provides a client used for the intelligent programming language program translation system based on the neural network.
According to the scheme, the invention has the advantages that:
the method can realize automatic translation from the program of the traditional programming language to the program of the intelligent programming language, and can improve the machine translation evaluation index BLEU (bilingual evaluation integrity) from 72.21 to 74.00 by 1.79 compared with the prior art. The compiling passing rate of the generated intelligent programming language program is increased from 83.8 percent to 92.8 percent. In addition, compared with the original program of the traditional programming language, the generated program of the intelligent programming language can be improved by 347 times at the highest running speed. The method can also be used for assisting a programmer to write the user program, and the development efficiency of writing the intelligent programming language program by the programmer is improved by 3.8 times to the maximum.
Drawings
FIG. 1 is a general training flow diagram of the present invention;
FIG. 2 is a flow chart of the first and second phases of the training process of the present invention;
FIG. 3 is a flow chart of a third stage training of the present invention;
FIG. 4 is a flowchart illustrating a fourth stage of training according to the present invention;
FIG. 5 is an overall flow chart of the translation process of the present invention.
Detailed Description
In order to achieve the technical effects, the invention mainly comprises the following key technical points:
Key point 1: parallel semantic modeling. Technical effect: programs written in the intelligent programming language can be scored, and the parallel semantics of generated programs can be evaluated;
Key point 2: a reordering model (or the combination of a translation model and a reordering model). Technical effect: reordering can exploit key point 1 (parallel semantic modeling) to improve the translation of traditional-programming-language programs into intelligent-programming-language programs.
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
Parallel semantic modeling:
based on BLEU, the invention provides an evaluation index ParaBLEU:
ParaBLEU = α·BLEU + β·BLEU_weight + (γ·Match_ast + δ·Match_df) × SIM_CUDAkeywords × SIM_loops × SIM_parallel
In the formula, BLEU_weight is a weighted BLEU; Match denotes a matching degree. AST is the abbreviation of abstract syntax tree, and Match_ast is the degree to which the hypothesis program and the reference program match on the abstract syntax tree. df is the abbreviation of dataflow, and Match_df is the degree to which the hypothesis program and the reference program match on the data flow. SIM is the abbreviation of similarity. CUDAkeywords are CUDA-related keywords, and SIM_CUDAkeywords is the similarity between the hypothesis program and the reference program in CUDA-related keywords. Loops are loop structures, an important component of CUDA programs, and SIM_loops is the similarity between the hypothesis program and the reference program in loop structure. Parallel semantics express which loop axes can be parallelized and which cannot, and SIM_parallel is the similarity between the hypothesis program and the reference program in parallel semantics. α, β, γ, δ are preset weights, typically 0.25 each.
The index takes into account keyword similarity, loop structure similarity and parallel semantic similarity of the intelligent programming language.
The keyword matching degree distinguishes CUDA code from C code at the syntax level, for example by checking whether CUDA-specific keywords such as __global__ appear. The loop-structure similarity checks how well the loop structures match. The parallel-semantic similarity goes further and judges, on top of loop-structure similarity, whether the original serial program has been correctly translated into a parallel program. In particular, the invention uses similarity distances rather than matching scores to evaluate loop-structure similarity and parallel-semantic similarity: a similarity distance evaluates parallel-semantic similarity better and penalizes mismatched parallel semantics.
The generated programs can be better scored through the evaluation index ParaBLEU.
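The combination of the component scores can be sketched as a plain function. This is a sketch under assumptions: the individual scorers (BLEU, weighted BLEU, AST/data-flow match, and the three similarities) are taken as precomputed values in [0, 1], since the patent does not specify their implementations here.

```python
def para_bleu(bleu, bleu_weight, match_ast, match_df,
              sim_keywords, sim_loops, sim_parallel,
              alpha=0.25, beta=0.25, gamma=0.25, delta=0.25):
    """ParaBLEU = alpha*BLEU + beta*BLEU_weight
                  + (gamma*Match_ast + delta*Match_df)
                  * SIM_CUDAkeywords * SIM_loops * SIM_parallel
    All component scores are assumed to lie in [0, 1]."""
    structural = gamma * match_ast + delta * match_df
    return (alpha * bleu + beta * bleu_weight
            + structural * sim_keywords * sim_loops * sim_parallel)

# A hypothesis that matches the reference except in parallel semantics
# (sim_parallel = 0) loses the entire structural term:
good = para_bleu(0.8, 0.8, 0.9, 0.9, 1.0, 1.0, 1.0)   # 0.85
bad = para_bleu(0.8, 0.8, 0.9, 0.9, 1.0, 1.0, 0.0)    # 0.40
```

Because the three similarities multiply the structural term, a mismatch in any one of them (keywords, loops, or parallel semantics) suppresses that whole term, which is how the metric penalizes incorrectly parallelized translations.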
Reordering model (or combination of translation and reordering models):
brief flow chart during training:
since C is typical of conventional programming languages, C is taken as an example and is used as a source language for program translation in the present embodiment. Since the CUDA language is typical of the intelligent programming language, the CUDA language is taken as an example, and is used as a target language for program translation in the embodiment.
As shown in FIG. 1, the training of the present invention is divided into four phases: pre-training, noise reduction self-coding, reverse translation and discriminant reordering.
First stage - pre-training: as shown in FIG. 2 (upper part), programs written in C and CUDA are collected as a C monolingual corpus and a CUDA monolingual corpus, respectively. A basic language model (LM) is then obtained by pre-training. The language model lets the computer "learn" the C language and the CUDA language, but at this stage no relationship between the two has been established; it is the subsequent translation model that lets the computer learn the relationship between C and CUDA.
Second stage - denoising auto-encoding: as shown in FIG. 2 (lower part). This stage trains the decoder of the translation model; its main purpose is to make the model more robust so that it can cope with input noise.
Third stage - back-translation: as shown in FIG. 3, in back-translation training the source-to-target conversion model C → CUDA and the target-to-source conversion model CUDA → C are trained simultaneously. Because program pairs consisting of source- and target-language programs are lacking, the translation model generally cannot be trained by supervised learning. The invention therefore uses back-translation: for an input program in the C monolingual corpus, a corresponding CUDA program is generated, and that CUDA program is then translated back into a C program. The original input C program serves as the reference, the C program produced by the C → CUDA → C round trip serves as the hypothesis, and the translation model (MT model) is trained by minimizing the difference between the two. The CUDA → C → CUDA direction is handled in the same way.
Fourth stage - reordering model: as shown in FIG. 4, after the back-translation training the translation model can already convert C to CUDA, but the quality of the generated CUDA programs does not yet meet requirements. The discriminative reordering training process relies on supervised data for evaluating the parallel semantics of programs. The invention therefore reuses the trained reverse model and, as shown in the figure, generates the reordering model's training data during the CUDA → C → CUDA back-translation process.
First, given CUDA code x as input, C code u is generated by the CUDA → C translation model; in this step the beam search width is set to 1, i.e., only one result is generated. Then u is fed into the C → CUDA translation model with a beam search width of 50, producing candidate results w_i, i ∈ [1, 50].
The invention takes these N candidates as input to the reordering model D, which can be a simple MLP network with the structure MLP → tanh (activation function) → MLP. The reordering model predicts a ParaBLEU score for each candidate result w_i, and normalization yields the output distribution p_D of model D. Meanwhile, using the same method with the original CUDA input x as the reference and each w_i as a hypothesis, the target distribution p for training model D is obtained. The reordering model D is trained by minimizing the Kullback-Leibler divergence between p_D and p. The whole training process is shown in the formulas:
p_D(w_i) = exp(D(u, w_i)) / Σ_{j=1}^{50} exp(D(u, w_j))
score'_i = (score_i - min_j score_j) / (max_j score_j - min_j score_j)
p(w_i) = exp(score'_i / τ) / Σ_{j=1}^{50} exp(score'_j / τ)
L = KL(p ‖ p_D) = Σ_i p(w_i) · log(p(w_i) / p_D(w_i))
In the formulas, D is the reordering model; u is the input to the C → CUDA translation model; w_i, i ∈ [1, 50] are the 50 solutions from the beam search; exp is the exponential function. The first formula gives the distribution p_D of the reordering model: the denominator is the sum of the exponentials over the 50 beam-search solutions and the numerator is the exponential of each solution's score, representing the probability of the current solution w_i. In the second formula, max and min denote the maximum and minimum, forming a normalization function that maps score into [0, 1]. The third formula, analogous to the first, computes the target distribution p from the normalized score of formula 2, where τ is a temperature parameter that can take the value 0.5. The fourth and last formula is the loss, L: its left side is the loss of the reordering model, and its right side is the Kullback-Leibler (KL) divergence, which fits the distribution of the reordering model D to the target distribution computed from score.
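The training objective can be sketched in plain Python as below. This is a reconstruction under assumptions: the KL direction (target p against model distribution p_D) and the exact min-max normalization are inferred from the description, and the MLP scorer itself is replaced by raw score lists.

```python
import math

def softmax(xs, tau=1.0):
    """Numerically stable softmax with temperature tau."""
    m = max(x / tau for x in xs)
    exps = [math.exp(x / tau - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def minmax(scores):
    """Min-max normalize scores into [0, 1] (the second formula)."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def rerank_loss(model_scores, parableu_scores, tau=0.5):
    """KL divergence between the target distribution p (softmax of
    min-max-normalized ParaBLEU scores at temperature tau) and the
    reordering model's distribution p_D (softmax of its raw scores)."""
    p_d = softmax(model_scores)
    p = softmax(minmax(parableu_scores), tau=tau)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, p_d) if pi > 0)

# Three candidates: the model's raw scores vs. their ParaBLEU scores.
loss = rerank_loss([1.0, 0.5, 0.1], [0.9, 0.4, 0.2])
```

Minimizing this loss pushes the reordering model's score distribution toward the ParaBLEU-derived target distribution; the loss is zero exactly when the two distributions coincide.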
After the training is completed, the invention obtains a translation model and a reordering model.
As shown in FIG. 5, at inference time, given a program x in the source language, the translation model first generates a set of candidate solutions by beam search, and the reordering model then picks the solution with the highest ParaBLEU score from the candidate set as the final translation result.
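The inference flow can be sketched as follows; the `ToyBeamModel` and the keyword-based reranker stand in for the trained translation and reordering models and are purely illustrative assumptions.

```python
def translate_program(src_program, translation_model, rerank_score, beam=50):
    """Inference: beam-search a candidate set with the translation model,
    then return the candidate the reordering model scores highest."""
    candidates = translation_model.beam_search(src_program, beam=beam)
    return max(candidates, key=rerank_score)

class ToyBeamModel:
    """Stand-in translation model; a real model would decode `beam`
    candidate CUDA programs with beam search."""
    def beam_search(self, src, beam=50):
        return ["__global__ void add(int *a) { }",
                "void add(int *a) { }"][:beam]

# Hypothetical reranker: prefer candidates containing CUDA keywords,
# as a crude proxy for the trained ParaBLEU-based reordering model.
best = translate_program("void add(int *a) { }", ToyBeamModel(),
                         lambda c: 1.0 if "__global__" in c else 0.0)
```

In the real system, `rerank_score` is the trained reordering model D, so the candidate with the highest predicted ParaBLEU is returned.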
The following are system examples corresponding to the above method examples, and this embodiment can be implemented in cooperation with the above embodiments. The related technical details mentioned in the above embodiments are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the above-described embodiments.
The invention also provides an intelligent programming language program translation system based on the neural network, which comprises the following steps:
the initial module is used for constructing a forward model for translating the source language program into the target language program and a reverse model for translating the target language program into the source language program; training the forward model and the reverse model through reverse translation according to a source language program library and a target language program library to obtain a forward translation model and a reverse translation model;
and the translation module is used for inputting the source language program to be translated into the forward translation model, inputting the obtained multiple candidate results into the reordering model to obtain the scores of the candidate results, and selecting the candidate result with the highest score as the translation result of the source language program.
The intelligent programming language program translation system based on the neural network is characterized in that the initial module is used for:
selecting a program from the source language program library as a first training target and inputting it into the forward model to obtain a first target language program; inputting the first target language program into the reverse model to obtain a first source language program; constructing a first loss from the first source language program and the first training target, and training the forward model and the reverse model;
selecting a program from the target language program library as a second training target and inputting it into the reverse model to obtain a second source language program; inputting the second source language program into the forward model to obtain a second target language program; constructing a second loss from the second target language program and the second training target, and training the forward model and the reverse model;
until the first loss and the second loss converge, saving the current forward model and reverse model as the forward translation model and the reverse translation model, respectively.
the intelligent programming language program translation system based on the neural network, wherein the training of the reordering model comprises the following steps:
selecting a program x from the target language program library to input the reverse translation model, generating an intermediate source program, inputting the intermediate source program into the forward translation model to obtain a plurality of candidate results, and normalizing the prediction scores of the reordering model for each candidate result to obtain output distribution;
taking the program x as the reference, obtaining a correctness score for each candidate result and normalizing these scores to obtain a reference distribution, and training the reordering model by minimizing the divergence between the output distribution and the reference distribution.
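A minimal sketch of the reordering-model training objective described above, assuming the normalization is a softmax and the divergence is KL divergence (the patent fixes neither choice); the raw scores below are hypothetical:

```python
import math

def softmax(scores):
    # Normalize raw scores into a probability distribution (numerically stable).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    # KL(p || q); assumes q is strictly positive wherever p is.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical reordering-model prediction scores for three candidates,
# normalized into the model's output distribution.
predicted = softmax([2.0, 0.5, -1.0])
# Hypothetical correctness scores against the reference program x,
# normalized into the reference distribution.
reference = softmax([3.0, 0.0, 0.0])

# Training would minimize this divergence by gradient descent.
loss = kl_divergence(reference, predicted)
```

Minimizing this loss pushes the reordering model's output distribution toward the distribution induced by the correctness scores.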
The intelligent programming language program translation system based on the neural network, wherein the source language is C, Python or Java, and the target language is CUDA or BANG.
The invention also provides a storage medium storing a program for executing any one of the above neural network based intelligent programming language program translation methods.
The invention also provides a client for use with the above neural network based intelligent programming language program translation system.

Claims (10)

1. An intelligent programming language program translation method based on a neural network is characterized by comprising the following steps:
step 1, constructing a forward model for translating a source language program into a target language program and constructing a reverse model for translating the target language program into the source language program; training the forward model and the reverse model through reverse translation according to a source language program library and a target language program library to obtain a forward translation model and a reverse translation model;
and step 2, inputting the source language program to be translated into the forward translation model, inputting the obtained multiple candidate results into a reordering model to obtain the scores of the candidate results, and selecting the candidate result with the highest score as the translation result of the source language program.
2. The neural network-based intelligent programming language program translation method of claim 1, wherein the step 1 comprises:
selecting a program from the source language program library as a first training target and inputting it into the forward model to obtain a first target language, inputting the first target language into the reverse model to obtain a first source language, constructing a first loss according to the first source language and the first training target, and training the forward model and the reverse model;
selecting a program from the target language program library as a second training target and inputting it into the reverse model to obtain a second source language, inputting the second source language into the forward model to obtain a second target language, constructing a second loss according to the second target language and the second training target, and training the forward model and the reverse model;
and, once the first loss and the second loss converge, saving the current forward model and the reverse model as the forward translation model and the reverse translation model, respectively.
3. The intelligent programming language program translation method based on neural network as claimed in claim 1, wherein the training of the reordering model comprises:
selecting a program x from the target language program library and inputting it into the reverse translation model to generate an intermediate source program, inputting the intermediate source program into the forward translation model to obtain a plurality of candidate results, and normalizing the prediction scores of the reordering model for the candidate results to obtain an output distribution;
taking the program x as the reference, obtaining a correctness score for each candidate result and normalizing these scores to obtain a reference distribution, and training the reordering model by minimizing the divergence between the output distribution and the reference distribution.
4. The method of claim 1, wherein the source language is C, Python or Java, and the target language is CUDA or BANG.
5. An intelligent programming language program translation system based on a neural network, comprising:
the initialization module is used for constructing a forward model for translating the source language program into the target language program and a reverse model for translating the target language program into the source language program; and training the forward model and the reverse model through reverse translation according to a source language program library and a target language program library to obtain a forward translation model and a reverse translation model;
and the translation module is used for inputting the source language program to be translated into the forward translation model, inputting the obtained multiple candidate results into the reordering model to obtain the scores of the candidate results, and selecting the candidate result with the highest score as the translation result of the source language program.
6. The intelligent programming language program translation system based on neural network as claimed in claim 5, wherein the initialization module is configured to:
selecting a program from the source language program library as a first training target and inputting it into the forward model to obtain a first target language, inputting the first target language into the reverse model to obtain a first source language, constructing a first loss according to the first source language and the first training target, and training the forward model and the reverse model;
selecting a program from the target language program library as a second training target and inputting it into the reverse model to obtain a second source language, inputting the second source language into the forward model to obtain a second target language, constructing a second loss according to the second target language and the second training target, and training the forward model and the reverse model;
and, once the first loss and the second loss converge, saving the current forward model and the reverse model as the forward translation model and the reverse translation model, respectively.
7. The intelligent programming language program translation system based on neural network as claimed in claim 5, wherein the training of the reordering model comprises:
selecting a program x from the target language program library and inputting it into the reverse translation model to generate an intermediate source program, inputting the intermediate source program into the forward translation model to obtain a plurality of candidate results, and normalizing the prediction scores of the reordering model for the candidate results to obtain an output distribution;
taking the program x as the reference, obtaining a correctness score for each candidate result and normalizing these scores to obtain a reference distribution, and training the reordering model by minimizing the divergence between the output distribution and the reference distribution.
8. The intelligent programming language program translation system based on neural network as claimed in claim 5, wherein the source language is the C, Python or Java language and the target language is the CUDA language or BANG language.
9. A storage medium storing a program for executing the neural network based intelligent programming language program translation method according to any one of claims 1 to 4.
10. A client for use in the neural network based intelligent programming language program translation system of any one of claims 5 to 8.
CN202210850684.9A 2022-07-19 2022-07-19 Intelligent programming language program translation method and system based on neural network Pending CN115373691A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210850684.9A CN115373691A (en) 2022-07-19 2022-07-19 Intelligent programming language program translation method and system based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210850684.9A CN115373691A (en) 2022-07-19 2022-07-19 Intelligent programming language program translation method and system based on neural network

Publications (1)

Publication Number Publication Date
CN115373691A true CN115373691A (en) 2022-11-22

Family

ID=84061600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210850684.9A Pending CN115373691A (en) 2022-07-19 2022-07-19 Intelligent programming language program translation method and system based on neural network

Country Status (1)

Country Link
CN (1) CN115373691A (en)

Similar Documents

Publication Publication Date Title
Wan et al. Improving automatic source code summarization via deep reinforcement learning
Kamath et al. A survey on semantic parsing
Dabre et al. A survey of multilingual neural machine translation
Yamada et al. Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia
US11972365B2 (en) Question responding apparatus, question responding method and program
JP4694121B2 (en) Statistical method and apparatus for learning translation relationships between phrases
Zettlemoyer et al. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars
US8504354B2 (en) Parallel fragment extraction from noisy parallel corpora
Kwiatkowksi et al. Inducing probabilistic CCG grammars from logical form with higher-order unification
Cao et al. Unsupervised dual paraphrasing for two-stage semantic parsing
Wang et al. Compilable neural code generation with compiler feedback
CN109299479B (en) Method for integrating translation memory into neural machine translation through gating mechanism
Yang et al. Comformer: Code comment generation via transformer and fusion method-based hybrid code representation
Shin et al. A survey of automatic code generation from natural language
Drozdov et al. Unsupervised parsing with S-DIORA: Single tree encoding for deep inside-outside recursive autoencoders
Yang et al. Towards bidirectional hierarchical representations for attention-based neural machine translation
CN113190219A (en) Code annotation generation method based on recurrent neural network model
Chen et al. Plotcoder: Hierarchical decoding for synthesizing visualization code in programmatic context
CN114742069A (en) Code similarity detection method and device
CN115437626A (en) OCL statement automatic generation method and device based on natural language
Ma et al. Mulcs: Towards a unified deep representation for multilingual code search
CN116627487A (en) Automatic generation method and system for source code annotation based on word level retrieval
Nie et al. Deep generation of Coq lemma names using elaborated terms
CN115373691A (en) Intelligent programming language program translation method and system based on neural network
Bhatnagar et al. Neural machine translation of Hindi and English

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination