CN113705251A

CN113705251A - Training method of machine translation model, language translation method and equipment

Info

Publication number: CN113705251A
Application number: CN202110356556.4A
Authority: CN
Inventors: 涂兆鹏; 刘洋; 史树明; 王硕
Original assignee: Tsinghua University; Tencent Technology Shenzhen Co Ltd
Current assignee: Tsinghua University; Tencent Technology Shenzhen Co Ltd
Priority date: 2021-04-01
Filing date: 2021-04-01
Publication date: 2021-11-26
Anticipated expiration: 2041-04-01
Also published as: CN113705251B

Abstract

The embodiment of the application discloses a training method of a machine translation model, a language translation method and equipment, and relates to the field of machine translation of natural language processing. The method comprises the following steps: the data in the first bilingual parallel database are divided into source language data and target language data, the initial machine translation model is finely adjusted through the source language data to obtain a finely adjusted machine translation model, the finely adjusted machine translation model is applied to perform translation tasks, the influence of language coverage deviation existing among different languages on the machine translation model can be eliminated, the performance of the machine translation model obtained through training by the method is improved, and a translated text with high quality and high fidelity can be obtained by applying the model.

Description

Training method of machine translation model, language translation method and equipment

Technical Field

The invention relates to the field of artificial intelligence natural language processing, in particular to a training method of a machine translation model, a language translation method and a device.

Background

Neural machine translation has risen rapidly in recent years. Compared with statistical machine translation, neural machine translation is relatively simple in model, and mainly comprises two parts, namely an encoder and a decoder. The encoder represents the source language as a high-dimensional vector after a series of neural network transformations. The decoder is responsible for re-decoding (translating) this high-dimensional vector into the target language.

The training of the neural machine translation model is not open to large-scale and high-quality bilingual parallel data. Bilingual parallel data is usually obtained by a human translator for translation, and huge human resources and time cost are consumed for constructing large-scale bilingual parallel data.

However, there is a significant difference in the overlaid content in bilingual parallel data that originates from different languages, which is referred to as language overlay bias. The fidelity of the translated text translated by the neural machine translation model is closely related to the language coverage deviation. Therefore, the existence of language coverage deviation in the bilingual parallel data will affect the performance of the neural machine translation model obtained by training the bilingual parallel data.

Disclosure of Invention

The embodiment of the application provides a training method of a machine translation model, a language translation method and equipment, and the method for training the machine translation model can eliminate the influence of language coverage deviation on the machine translation model caused by bilingual parallel sentence pairs originated from different languages in a bilingual parallel database, so that the performance of the machine translation model trained by the method is improved.

In a first aspect, an embodiment of the present application provides a method for training a machine translation model, including: acquiring a bilingual parallel database, wherein the bilingual parallel database comprises a plurality of groups of bilingual parallel sentence pairs, the bilingual parallel sentence pairs are content-aligned data formed by source language data and target language data, and the bilingual parallel database comprises a first bilingual parallel database;

dividing a plurality of groups of bilingual parallel sentence pairs in the first bilingual parallel database into source language-derived data and target language-derived data, wherein the target language data in the bilingual parallel sentence pairs belonging to the source language-derived data are obtained by translating on the basis of the source language data, and the source language data in the bilingual parallel sentence pairs belonging to the target language data are obtained by translating on the basis of the target language data;

training a first machine translation model from the source language data, the first machine translation model for translating the source language to the target language.

Wherein the dividing the plurality of bilingual parallel sentence pairs in the first bilingual parallel database into source language-derived data and target language-derived data comprises:

acquiring a parallel sentence pair to be processed from the first bilingual parallel database;

and determining the data type of the parallel sentence pair to be processed according to the content covered by the parallel sentence pair to be processed, wherein the data type comprises the source language data and the target language data.

The determining the data type of the parallel sentence pair to be processed according to the content covered by the parallel sentence pair to be processed includes:

determining a first probability that the source language data in the parallel sentence pair to be processed is from the source language according to the source language data in the parallel sentence pair to be processed;

determining a second probability that the target language data in the parallel sentence pair to be processed is from the target language according to the content of the target language data in the parallel sentence pair to be processed;

and determining the data type of the parallel sentence pair to be processed according to the deviation between the first probability and the second probability.

Wherein the determining the data type of the to-be-processed parallel sentence pair according to the deviation between the first probability and the second probability specifically includes:

determining the score of the to-be-processed parallel sentence pair according to the deviation between the first probability and the second probability, wherein the score is used for determining the data type of the to-be-processed parallel sentence pair;

when the score is larger than a target threshold value, determining that the parallel sentence pair to be processed belongs to source language data;

and when the score is smaller than a target threshold value, determining that the parallel sentence pair to be processed belongs to the data from the target language.

Determining a first probability that the source language data in the to-be-processed parallel sentence pair is from the source language according to the source language data in the to-be-processed parallel sentence pair, wherein the determining comprises: inputting the source language data in the parallel sentence pair to be processed into a first language model, and determining the first probability, wherein the first language model is used for determining the probability of the source language data in the parallel sentence pair to be processed appearing in the source language, and the first language model is obtained by training a source language single language database;

determining a second probability that the target language data in the parallel sentence pair to be processed is originated from the target language according to the content of the target language data in the parallel sentence pair to be processed, including: and inputting the target language data in the parallel sentence pair to be processed into a second language model, and determining the second probability to be processed, wherein the second language model is used for determining the probability of the target language data in the parallel sentence pair to be processed appearing in the target language, and the second language model is obtained through training of a target language single-language database.

Wherein prior to training the first machine translation model with the source language-derived data, the method further comprises:

and training an initial machine translation model by using the bilingual parallel database to obtain the first machine translation model.

acquiring a single-language database, wherein the single-language database comprises a plurality of original texts with target languages;

inputting each original text in the single-language database into the second machine translation model to obtain a source language translation text corresponding to each original text, wherein the second machine translation model is obtained by training the first bilingual parallel database and is used for translating a target language into a source language;

and adding a plurality of groups of pseudo parallel sentence pairs consisting of source language translation texts respectively corresponding to the original texts with the target languages and the original texts with the target languages into the bilingual parallel database.

acquiring a monolingual database, wherein the monolingual database comprises a plurality of original texts with the languages as source languages;

inputting each original text in the single-language database into the third machine translation model to obtain a target language translation text corresponding to each original text, wherein the third machine translation model is obtained by training the first bilingual parallel database and is used for translating a source language into a target language;

and adding a plurality of groups of pseudo parallel sentence pairs consisting of target language translation texts respectively corresponding to the original texts with the plurality of languages as the source languages and the original texts with the plurality of languages as the source languages into the bilingual parallel database.

In a second aspect, an embodiment of the present application provides a language translation method, including:

receiving data to be translated, wherein the data to be translated is source language data;

inputting data to be translated into a machine translation model to obtain target language data corresponding to the data to be translated, wherein the machine translation model is a first machine translation model obtained by training through the method provided in the first aspect or the various optional implementation manners of the first aspect.

In a third aspect, an embodiment of the present application provides a training apparatus for a machine translation model, including:

the device comprises an acquisition unit, a first bilingual parallel database and a second bilingual parallel database, wherein the acquisition unit is used for acquiring the bilingual parallel database which comprises a plurality of groups of bilingual parallel sentence pairs, the bilingual parallel sentence pairs are content-aligned data formed by source language data and target language data, and the bilingual parallel database comprises the first bilingual parallel database;

a dividing unit, configured to divide a plurality of bilingual parallel sentence pairs in the first bilingual parallel database into source language data and target language data, where target language data in the bilingual parallel sentence pair belonging to the source language data is translated based on the source language data, and source language data in the bilingual parallel sentence pair belonging to the target language data is translated based on the target language data;

a training unit to train a first machine translation model from the source language data, the first machine translation model to translate the source language to the target language.

In a fourth aspect, an embodiment of the present application provides a language translation apparatus, including:

the receiving unit is used for receiving data to be translated, and the data to be translated is source language data;

a translation unit, configured to translate the data to be translated into target language data corresponding to the data, where the translation unit includes a machine translation model, and the machine translation model is a first machine translation model obtained by training through a training method of a machine model provided in the first aspect or various optional implementation manners of the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer device, including: one or more processors, one or more memories, the one or more memories being respectively coupled with the one or more processors; the one or more memories are for storing computer program code comprising computer instructions;

the processor is used for calling the computer instruction to execute: a method of training a machine model as provided in the first aspect or in various alternative implementations of the first aspect.

In a sixth aspect, an embodiment of the present application provides a computer device, including: one or more processors, one or more memories, the one or more memories being respectively coupled with the one or more processors; the one or more memories are for storing computer program code comprising computer instructions;

the processor is used for calling the computer instruction to execute: a language translation method as provided in the second aspect.

In a seventh aspect, this application embodiment provides a computer-readable storage medium, which stores a computer program, the computer program comprising program instructions that, when executed by a processor, perform the method for training a machine translation model as provided in the first aspect or in the various alternative implementations of the first aspect.

In an eighth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, the computer program including program instructions, which, when executed by a processor, perform the language translation method according to the second aspect.

In a ninth aspect, embodiments of the present application provide a computer program product or a computer program, which includes computer instructions, which are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the first aspect or the various alternative implementations of the first aspect.

In a tenth aspect, embodiments of the present application provide a computer program product or a computer program, which includes computer instructions, which are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the language translation method provided by the second aspect.

According to the training method of the machine translation model, data in a first bilingual parallel database are divided into source language data and target language data, the initial machine translation model is finely adjusted through the source language data to obtain the finely adjusted machine translation model, the finely adjusted machine translation model is applied to carry out translation tasks, the influence of language coverage deviation existing among different languages on the machine translation model can be eliminated, the performance of the machine translation model obtained through training by the method is improved, and a translated text with high translated text quality and high fidelity can be obtained through the model.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1A is a schematic structural diagram of a computer system according to an embodiment of the present disclosure;

FIG. 1B is a flowchart illustrating a method for training a machine translation model according to an embodiment of the present disclosure;

FIG. 2 is a diagram illustrating the result of partition accuracy obtained by applying a method for partitioning data types according to an embodiment of the present application;

3-5 are diagrams of translation expressions of several machine models obtained by training partitioned data according to the embodiment of the present application;

FIG. 6 is a flow chart illustrating another method for training a machine translation model according to an embodiment of the present disclosure;

FIG. 7 is a diagram illustrating a translation expression result of a model obtained by training according to a training method for a machine translation model provided in an embodiment of the present application;

FIG. 8 is a flow chart illustrating another method for training a machine translation model according to an embodiment of the present disclosure;

FIG. 9 is a flowchart illustrating another method for training a machine translation model according to an embodiment of the present disclosure;

FIG. 10 is a diagram illustrating the results of quality translations obtained using several machine translation models provided by embodiments of the present application;

FIG. 11A is a schematic structural diagram of a training apparatus for a machine translation model according to an embodiment of the present disclosure;

fig. 11B is a schematic structural diagram of a language translation apparatus according to an embodiment of the present application;

fig. 12 is a schematic structural diagram of a language translation device provided in an embodiment of the present application;

fig. 13 is a schematic structural diagram of a training apparatus for a machine translation model according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

First, for convenience of understanding, before describing the training method of the machine translation model provided in the embodiments of the present application, the following describes related terms related to the embodiments of the present application.

DL: deep Learning, a branch of machine Learning, is an algorithm that attempts to perform high-level abstraction on data using multiple processing layers that contain complex structures or consist of multiple nonlinear transformations.

NN: neural Network, a deep learning model simulating biological Neural Network structure and function in the field of machine learning and cognitive science.

DNN: deep Neural Network, a Neural Network with a deeper Network structure, and a core model in Deep learning.

NMT: neural Machine Translation, the latest generation of Machine Translation technology based on Neural networks.

BLEU: the standard method for evaluating machine translation, the higher the value, the better the effect.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

Machine Learning (ML) is a multi-domain cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Therefore, the research in this field will involve natural language, i.e. the language that people use everyday, so it is closely related to the research of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robotic question and answer, knowledge mapping, and the like.

Machine translation refers to a translation mode in which a sentence in one natural language (source language) is translated into a sentence in another natural language (target language) by a computer. In general, machine translation is the translation of a sentence in a source language into a sentence in a target language by a trained machine translation model. Illustratively, the source language is Chinese, the target language is English, and the sentences in the source language are "I is a student". ", the sentence is translated into" iamastudent "by a machine translation model. The machine translation model can be trained by a large number of bilingual parallel sentence pairs.

The bilingual parallel sentence pair is content-aligned data formed by source language data and corresponding target language data, wherein the content alignment means that the content of the source language data and the content of the target language data have translation relationship and consistent meaning expression.

In the embodiment of the present application, bilingual parallel sentence pairs can be divided into two categories, namely source language data and target language data.

The source language data refers to a bilingual parallel sentence pair with aligned contents, which is formed by firstly generating a text by an author of a source language and then translating the text into a target language by a human translator. That is, target language data belonging to a bilingual parallel sentence pair derived from source language data is translated based on the source language data.

Wherein originating from the target language data means that text is first produced by an author of the target language and then translated by a human translator to the source language in a reverse translation direction to form content-aligned pairs of bilingual parallel sentences. That is, the source language data belonging to the bilingual parallel sentence pair derived from the target language data is translated based on the target language data.

Language coverage bias, the content covered by data from different languages has a significant difference, and this difference is called language coverage bias. For example, the data derived from chinese may include contents having chinese character such as "yunnan, zhang san, lie si, mitten crab, babysbreath" and the like, and the data derived from english may include contents having english character such as "California, National basetball Association, Birmingham" and the like.

With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.

The machine translation model obtained by training through the training method of the machine translation model provided by the embodiment of the application can be used in the following scenes:

(1) machine translation

In the application scenario, the machine translation model trained by the method provided by the embodiment of the application can be used in applications supporting translation functions, such as an electronic dictionary application, an electronic book application, a web browsing application, a social application, a graph recognition application, and the like. And when the application program receives the data to be translated, the machine translation model outputs a translation result according to the input data to be translated. Illustratively, the data to be translated includes at least one of a text type, a picture type, an audio type, and a video type of content.

(2) Dialogue question-answer

In the application scenario, the machine translation model trained by the method provided by the embodiment of the application can be applied to intelligent equipment such as an intelligent terminal or an intelligent home. Taking a virtual assistant set in an intelligent terminal as an example, the automatic answering function of the virtual assistant is obtained by training a machine translation model provided by the embodiment of the application and is realized by the translation model. The user provides translation-related problems for the virtual assistant, when the virtual assistant receives the problems input by the user (the problems input by the user can be realized in a voice or text input mode), the machine translation model divides the input problems and outputs translation results, and the intelligent equipment converts the translation results into a voice or text mode and feeds the voice or text mode back to the user through the virtual assistant.

The foregoing is only described with two scenarios as an example, and the method provided in the embodiment of the present application may also be used in other application scenarios, such as text summarization extraction, and the embodiment of the present application does not limit a specific application scenario.

The training method and the language translation method of the machine translation model provided by the embodiment of the application can be applied to computer equipment with strong data processing capacity. In some embodiments, the training method and the language translation method of the machine translation model provided by the embodiments of the present application may be applied to a personal computer, a workstation, or a server, that is, machine translation and training of the machine translation model may be implemented by the personal computer, the workstation, or the server.

Referring to fig. 1A, fig. 1A is a schematic structural diagram of a computer system, where the computer system includes a database 10, a training device 11, and an execution device 12, and the execution device 12 may include a first device 110 and a second device 120.

The database 10 includes a bilingual parallel database and a single-language database of different languages, and the data therein may be used as sample data for training a machine translation model.

The training device 11 may be a server, a workstation, a personal computer, or the like, and is used for training the machine translation model by the data acquired from the database 10. Specifically, the training device 11 may obtain a first bilingual parallel database, divide a plurality of bilingual parallel sentence pairs in the first bilingual parallel database into data derived from a source language and data derived from a target language, and train a first machine translation model through the data derived from the source language, where the first machine translation model is used to translate the source language into the target language. In some embodiments, the training apparatus may further translate the monolingual data in the monolingual database acquired from the database 10 through the machine translation model trained by the first bilingual parallel database to obtain a pseudo parallel sentence pair, and add the obtained pseudo parallel sentence pair to the database 10.

The executing device 12 may store the machine translation model trained by the training device 11, and the first device 110 in the executing device 12 may transmit with the server through the communication network.

The first device 110 has an application program installed therein, which supports a translation function, and the application program may be an electronic dictionary application program, an electronic book reading application program, a web browsing application program, a social contact application program, or the like. The first device 110 may be a terminal device such as a smart phone, a smart watch, a tablet computer, a notebook computer, and an intelligent robot.

The second device 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, content distribution network, and a big data and artificial intelligence platform. In some embodiments, the second device 120 is a backend server for applications in the first device 110.

In some embodiments, the language translation method provided by the embodiment of the present application may be executed by the second device 120. After the first device 110 obtains the data to be translated, the data to be translated may be sent to the second device 120 through the communication network, the second device 120 executes the method of language translation after receiving the data to be translated to obtain a translation result, further, the second device 120 sends the translation result to the first device 110, and the first device 110 displays the translation result through an application program.

In other embodiments, the first device 110 and the second device 120 may be the same device, and the language translation method provided in this embodiment may also be executed by this device, which is not limited in this embodiment.

It should be understood that, in some embodiments, the training device 11 and the executing device 12 may be the same device, and the device may execute the training work and the language translation work of the machine translation model, and the functions of the device are not limited in the embodiments of the present application.

The scheme provided by the embodiment of the application relates to an artificial intelligence natural language processing technology, and is specifically explained by the following embodiment:

please refer to fig. 1B, which is a flowchart illustrating a method for training a machine translation model according to an embodiment of the present disclosure. The training method of the machine translation model may be executed by the training device 11 in fig. 1A, and the embodiment of the present application is described by taking the training device 11 in fig. 1A as an example, as shown in fig. 1B, the training method of the machine translation model includes, but is not limited to, the following steps:

and S1, acquiring bilingual parallel databases, wherein the bilingual parallel databases comprise a first bilingual parallel database.

The bilingual parallel database includes a plurality of bilingual parallel sentence pairs. Bilingual parallel sentence pairs are content-aligned data composed of source language data and target language data. That is, in a bilingual parallel database, each piece of source language data has target language data corresponding to it.

In some embodiments, the bilingual parallel database is a first bilingual parallel database that includes a plurality of bilingual parallel sentence pairs derived from source language data or from target language data, i.e., one of the language data in each bilingual parallel sentence pair is translated by a human translator.

In other embodiments, the bilingual parallel database includes a first bilingual parallel database and a pseudo parallel sentence pair, and the pseudo parallel sentence pair is a sentence pair formed by translating the first language original text in the monolingual database and translating the second language text obtained by translating the first language original text through the machine translation model.

And S2, dividing the bilingual parallel sentence pairs in the first bilingual parallel database into source language data and target language data.

In one implementation, S2 may include, but is not limited to, the following steps:

and S21, acquiring the parallel sentence pairs to be processed from the first bilingual parallel database.

And S22, determining the data type of the parallel sentence pair to be processed according to the content covered by the parallel sentence pair to be processed, wherein the data type comprises source language data and target language data.

The content distribution covered by different languages is different, namely the content distribution covered by the source language data and the target language data is different in the parallel sentence pair to be processed, so that the data type of the parallel sentence pair to be processed can be determined through the deviation between the content distribution covered by the source language data and the target language data.

In a specific embodiment, determining the data type of the to-be-processed parallel sentence pair according to the content covered by the to-be-processed parallel sentence pair may include the following processes:

s221, training a first language model through a source language single language database, wherein the first language model is used for determining the probability of source language data in bilingual parallel sentence pairs appearing in the source language.

S222, inputting the source language data in the parallel sentence pair to be processed into the first language model to obtain a first probability that the source language data in the parallel sentence pair to be processed is from the source language.

And S223, training a second language model through the target language single-language database, wherein the second language model is used for determining the probability of the target language data in the bilingual parallel sentence pair appearing in the target language.

S224, inputting the target language data in the parallel sentence pair to be processed into a second language model to obtain a second probability that the target language data in the parallel sentence pair to be processed is from the target language.

And S225, determining the data type of the parallel sentence pair to be processed according to the deviation between the first probability and the second probability.

In some implementations, S225 may include the following processes: determining the score of the parallel sentence pair to be processed according to the deviation between the first probability and the second probability, wherein the score is used for determining the data type of the parallel sentence pair to be processed; when the score is larger than a target threshold value, determining that the parallel sentence pair to be processed belongs to source language data; and when the score is smaller than a target threshold value, determining that the parallel sentence pair to be processed belongs to the data from the target language.

In other embodiments, S225 may be implemented by other ways, such as comparing the first probability with the second probability, determining that the parallel sentence pair to be processed belongs to data originated from the source language when the first probability is greater than the second probability, and determining that the parallel sentence pair to be processed belongs to data originated from the target language when the first probability is less than the second probability.

The principle of detecting the data type of the parallel sentence pair to be processed according to the content distribution covered by the different languages in S221-S224 is described below. In particular, use may be made of

Representing the distribution of content covered by the source language,

representing the distribution of content covered by the target language. Given a pair of parallel sentences to be processed<x,y>The probability that it is covered by the source or target language can be expressed as:

a score can be used to quantify the difference between the two probabilities:

wherein

Due to the fact that

And

language-only, but parallel sentence-pairs to be processed<x,y>Irrelevant, so c is a constant when the source and target languages are given. It can be found that parallel sentence pairs with higher score values have a higher probability of belonging to data originating from the source language and parallel sentence pairs with lower score values have a higher probability of belonging to data originating from the target language. To depict

And

the language model based on the self-attention mechanism can be trained by using a source language single-language database and a target language single-language database respectively: first language model

And a second language model

And using probabilities of the language model to estimate

And

specifically, let

The score may be expressed as:

after the training of the language model is finished, the value of c can be determined through a small-scale bilingual parallel database with known data types.

It should be understood that the first language model and the second language model may be the above-mentioned language model based on the attention-free mechanism, and may also be a language model based on any other architecture, such as a language model based on a recurrent neural network, a language model based on a convolutional neural network.

The language model is obtained by training a large amount of sample data by a method of maximum likelihood estimation. The principle of determining the probability of a language data originating from the language by means of a language model is that a language model is usually constructed as a probability distribution p(s) of a string s, where p(s) actually reflects the probability of s appearing as a sentence. Probability here refers to the likelihood that this combination, which constitutes a string, appears in the corpus. Assuming that the corpus is from human language, this probability can be considered as the probability of whether an input sentence is in human language. The language model may include an input layer, a projection layer, a hidden layer, and an output layer. The string s is composed of a number of words, each word is represented in the input layer in the form of a one-hot vector, each one-hot vector is converted into another word vector in the projection layer, connected to create a matrix e, the matrix is then flattened and further converted into hidden vectors through the hidden layer, and finally the probability distribution of the string s is calculated and output using the softmax function from the hidden vectors. The probability of the sentence (character string) s may be a product of probabilities of each word in the sentence appearing in the sentence, and a probability of a word subsequent to the word appearing in the sentence may be predicted from a word in the input sentence.

For example, the monolingual database is taken as a Chinese monolingual database for example, and the training process of the language model is explained. For example, the process of training a language model with the sentence "i have a dream" as a sample data may be: firstly, constructing a target model, wherein the target model comprises an input layer, a projection layer, a hidden layer and an output layer; then inputting 'i has a dream' into the target model, and finally outputting the probability that the model predicts the words of 'i has a dream' through an input layer, a projection layer, a hidden layer and an output layer, wherein the model is guided to maximize the probability in the training process. In this way, each piece of data in the Chinese single-language database is input into the target model, and the optimized target model is obtained after training, namely the language model after training.

Illustratively, as shown in fig. 2, the data in the english-chinese (En-Zh) parallel database, the english-japanese (En-Ja) parallel database, and the english-german (En-De) parallel database are divided by the method for determining the data type of the bilingual parallel sentence pair described in the above-mentioned S2, respectively, to obtain the division accuracy as shown in fig. 2. The method FT is an existing text classification method based on a convolutional neural network, the method Ours is the method described in S2 in this embodiment, the accuracy of dividing the data in the three databases is represented by F1 value, and the higher the F1 value is, the higher the accuracy of dividing the data is.

After the data in the bilingual database is divided, a machine translation model can be trained through the divided data. Specifically, three machine translation models may be trained according to data type: (1) model 1 trained by source target language data; (2) model 2 obtained by training from source language data; (3) and (3) obtaining a model 3 through training of the first bilingual database without distinguishing data types.

Illustratively, the data in an English-Chinese (En-Zh) parallel database, an English-Japanese (En-Ja) parallel database and an English-German (En-De) parallel database are respectively used as training samples to train the corresponding three machine translation models. And evaluating the translation performance of each model from the three aspects of the overall quality of the translated text, the translation fidelity and the translation fluency to obtain the results shown in the figures 3-5.

As shown in FIG. 3, the first column of Data Origin represents the Data type of the training sample for the machine translation model, Target represents Data from the Target language, Source represents Data from the Source language, and Both represents Data that uses all of the parallel database without distinguishing Data types. From the second column, the bold face in the BLEU value of each column represents the highest score of the column, the underlined score represents the second highest score of the column, the BLEU value represents the quality of the translation, and a higher value represents a better quality of the translation. Illustratively, En-Zh represents inter-translation between english and chinese, the second column represents the Source language of english, and the Target language of chinese, and taking the second column as an example, the score 33.2 corresponding to Target represents the quality of the translation of the model translation obtained by training with data derived from chinese, the score 36.5 corresponding to Source represents the quality of the translation of the model translation obtained by training with data derived from english, and the score 36.6 corresponding to Both represents the quality of the translation of the model translation obtained by training with all data in the En-Zh bilingual parallel database without distinguishing between data derived from chinese or from english. It can be seen from the individual BLEU values of each column in fig. 3 that the model trained using only the data from the source language is less poor in the quality of the translation than the model trained using all the data in the parallel database without distinguishing the data types, and the quality of the translation of the model trained using only the data from the source language is also the highest score, and thus it can be seen that the translation of the model trained using only the data from the source language is better in the quality of the translation.

As shown in FIG. 4, the first column of Data Origin represents the Data type of the training sample for the machine translation model, Target represents Data from the Target language, Source represents Data from the Source language, and Both represents Data that uses all of the parallel database without distinguishing Data types. Wherein, from the second column, the bold in the F-measure value of each column represents the highest score of the column, the underlined score represents the second highest score of the column, and the F-measure value identifies the fidelity of the translation, and the higher the value is, the higher the fidelity of the translation is. Taking the first bilingual parallel database as an En-Zh bilingual parallel database as an example, illustratively, En → Zh represents english as a source language, chinese as a target language, En ← Zh represents chinese as a source language, and english as a target language, and F-measure values of real words, such as noun (noun), verb (verb), and adjective or adverb (adj), are determined respectively. It can be seen from fig. 4 that the highest scores are obtained a plurality of times using only the loyalty of the translation of the model derived from the source language data training, and it can be seen that the model derived from the source language data training alone performs best in terms of the loyalty of the translation in the three models. While the model trained using the data derived from the target language performed poorly on the fidelity of the translated text (F-measure value was low with respect to the other two models), and the scores in the fourth to seventh columns as in fig. 5, the model trained using all the data in the parallel database without distinguishing the data type was low compared to the F-measure value of the model trained using only the data derived from the source language, indicating that the model trained using additional data derived from the target language failed to improve the fidelity of the translated text without distinguishing.

As shown in fig. 5, fluency of translation is measured by the confusion of the language model (i.e., PPL), with lower PPL the better the fluency. Diff refers to the relative change in the PPL for a model relative to the PPL for a "Both". "No. abs." indicates that real words are not abstracted, "count.abs." indicates that all real words are abstracted with corresponding part-of-speech tags. Taking WMT20En-Zh bilingual parallel database as an example, the fluency performances of the trained three models are shown in fig. 5, and after all real words in the text to be translated are abstracted by corresponding part-of-speech tags, the fluency of the translated text of the model trained by using the source language data has a small change of + 1.4% and + 2.2% relative to the fluency of the translated text of the model trained by using all data in the bilingual parallel database without distinguishing the data types, so that it can be shown that after all real words are abstracted by corresponding part-of-speech tags, the model trained by using only the source language data has a better performance on the fluency of the translated text.

It should be understood that bilingual parallel sentence pairs in a bilingual parallel database are all sentences in one language (a first language) translated by a human translator into sentences in another language (a second language), with the first language of the bilingual parallel sentence pair being the original language of the bilingual parallel sentence pair (i.e., which language the bilingual parallel sentence pair originates from). Because the influence of the original language of the data on the neural machine translation model is neglected by the work of predecessors, the original language information of bilingual parallel sentence pairs in most large-scale bilingual parallel databases is lost in the data construction and arrangement process. As can be seen from the above-described expressions of models trained by data of different original languages (from source language data and from target language data), since the original languages of a large number of pairs of bilingual parallel sentences in the bilingual parallel database are different, the quality, fidelity, and the like of translations translated by training machine translation models by using data of different original languages without distinction are poor, so that pairs of bilingual parallel sentences in the bilingual parallel database can be divided into data of source language and data of target language according to the original languages of the pairs of bilingual parallel sentences, and machine translation models can be further trained for the divided data, thereby improving the performance of the machine translation models.

S3, training the first machine translation model by deriving from the source language data.

In the embodiment of the present application, training the first machine translation model by source language data may be implemented by the following several implementations, and the process of S3 is described in detail below.

Implementation mode 1: and training to obtain a first machine translation model by taking source language data which is divided in the first bilingual parallel database as a training sample. Specifically, a first machine translation model is trained with source language data derived from source language data as input and target language data as output.

Implementation mode 2: the first machine translation model is trained first with a first bilingual parallel database of undivided data types, and then fine-tuned by the partitioned source language data.

Specifically, please refer to fig. 6, and fig. 6 is a flowchart illustrating implementation 2.

Wherein, the above S3 may include the following steps in fig. 6:

s301, training through a first bilingual parallel database to obtain a first machine translation model.

The bilingual parallel sentence pairs in the first bilingual parallel database of the undivided data type are used as training samples, source language data in each group of bilingual parallel sentence pairs are used as input, target language data are used as output, and an initial machine translation model, also called a first machine translation model, is obtained through training.

S302, fine-tuning the first machine translation model through source language data.

And training the first machine translation model obtained through the training in the S301 by taking the divided source language data from the source language data as input and the target language data as output to obtain the fine-tuned first machine translation model.

As can be appreciated, the fine tuning of the machine translation model is to further train the machine translation model through partial data.

The following describes training to obtain the translation performance of the first machine translation model by the method described in implementation mode 2, and for example, the first machine translation models respectively corresponding to the first machine translation models are trimmed by data from different languages, the trimmed first machine translation models are used for translation, and the translation performance of the models is characterized by applying the translation quality (BLEU value). As shown in FIG. 7, FIG. 7 shows BLEU values of a first machine model trained using a first bilingual database and refined first machine translation model in six translation directions (En-Zh, Zh-En, En-Ja, Ja-En, En-De, De-En, wherein En-Zh indicates English En as the source language and Chinese Zh as the target language). Wherein, the values in the row corresponding to Baseline represent BLEU values of a first machine model trained by a first bilingual parallel database used in six translation directions, the values in the row corresponding to Tune represent BLEU values of models obtained by fine-tuning the first machine model respectively corresponding to the BLEU values by using data from different languages in the six translation directions, and Average represents an Average value of the values in each row. It can be seen from the figure that, in the six translation directions, the BLEU value of the model after source language data fine tuning is higher than that of the first machine translation model without fine tuning, which indicates that the translation model obtained by training through the method described in the above implementation mode 2 can improve the translation quality.

Implementation mode 3: referring to fig. 8, fig. 8 is a flowchart illustrating another training method for a machine translation model according to an embodiment of the present application, where the step S3 may include the following steps in fig. 8:

s31a, obtaining a monolingual database, wherein the monolingual database comprises a plurality of original texts with target languages.

S32a, inputting the original text with each language in the single language database as the target language into a second machine translation model to obtain a source language translation text corresponding to each original text, wherein the second machine translation model is used for translating the target language into the source language.

The second machine translation model is obtained through training of the first bilingual parallel database, and training can be performed in a mode that target language data in each group of bilingual parallel sentence pairs of the first bilingual parallel database are input and corresponding source language data of the target language data are output.

In some embodiments, after obtaining the original texts in the target languages and the corresponding translated texts in the source languages of the single language database, the pseudo parallel sentence pair composed of the original text in each target language and the corresponding translated text in the source language is added with indication information, such as a tag BT (meaning reverse translation). Wherein the reverse translation represents a translation from the target language to the source language.

S33a, adding multiple groups of pseudo parallel sentence pairs consisting of the original texts of the target languages in the monolingual database and the source language translation texts respectively corresponding to the original texts into the bilingual parallel database.

And S34a, training through a bilingual parallel database to obtain a first machine translation model.

S35, fine-tuning the first machine translation model through source language data. The process may refer to the related description of S302 in implementation 1, and is not described herein again.

Implementation mode 4: referring to fig. 9, fig. 9 is a flowchart illustrating another training method for a machine translation model according to an embodiment of the present application, where the step S3 may include the following steps in fig. 9:

s31b, obtaining a monolingual database, wherein the monolingual database comprises a plurality of original texts with the source language.

S32b, inputting the original text with each language in the single language database as the source language into a third machine translation model to obtain a target language translation text corresponding to each original text, wherein the third machine translation model is used for translating the source language into the target language.

The third machine translation model is obtained through training of the first bilingual parallel database, and can be trained in a mode that source language data in each group of bilingual parallel sentence pairs of the first bilingual parallel database are input and target language data corresponding to the source language data are output.

In some embodiments, after obtaining the original texts in the source languages and the target language translation texts corresponding to the original texts in the single language database, indication information, such as a label FT (meaning forward translation), is added to the pseudo parallel sentence pair composed of the original text in each source language and the target language translation text corresponding to the original text in each source language. Wherein the forward translation represents a translation from a source language to a target language.

S33b, adding a plurality of pseudo parallel sentence pairs consisting of original texts taking a plurality of languages in the monolingual database as source languages and target language translation texts respectively corresponding to the original texts into the bilingual parallel database.

And S34b, training through a bilingual parallel database to obtain a first machine translation model.

In implementation 3 and implementation 4, multiple sets of pseudo-parallel sentence pairs are determined according to the target language monolingual database and the machine translation model obtained through the training of the first bilingual parallel database, and the multiple sets of pseudo-parallel sentence pairs are added to the bilingual parallel database, so that the initial machine translation model is trained through the bilingual parallel database to which the pseudo-parallel sentence pairs are newly added. Moreover, after the indication information is added to the pseudo parallel sentence pair, the first machine translation model is trained through the bilingual parallel database to perform better.

Illustratively, the following describes the translation performance (characterized by the translation quality BLEU value) of the machine translation model obtained by training in implementation 2, implementation 3 and implementation 4 described above when english is used as the target language or source language. As shown in FIG. 10, Monolintual in the table represents Monolingual and Bilngual represents Bilingual; x → En denotes: x is a source language, and English (En) is a target language, which represents that different source languages are translated into English; en → X denotes: english (En) is a source language, X is a target language, and represents the translation from English to a different target language; tagging denotes Tagging pseudo-parallel data pairs obtained from monolingual data, such as (BT or FT); Fine-Tune represents the Fine tuning of the initial machine translation model using data derived from the source language. Line 1 shows the translation quality (BLEU) of a first model obtained by training a first bilingual parallel database, line 2 shows the translation quality of a second model obtained by fine-tuning the first model using source language data, line 3 shows the translation quality of a third model obtained by training the first parallel database and pseudo-parallel sentences to which indication information (also referred to as labels) is not added, line 4 shows the translation quality of a fourth model obtained by training the first parallel database and pseudo-parallel sentences to which indication information (also referred to as labels) is added, line 5 shows the translation quality of a fifth model obtained by fine-tuning the third model using source language data, and line 6 shows the translation quality of a sixth model obtained by fine-tuning the fourth model using source language data. Lines 3-6 show the translation quality of a model trained from a bilingual parallel database including pseudo-parallel sentence pairs obtained by applying an english (En) monolingual database, the blu-ray value in block 101 shows the translation quality of a machine translation model trained from a bilingual parallel database including pseudo-parallel sentence pairs obtained by reversely translating the monolingual database, the blu-ray value in block 102 shows the translation quality of a machine translation model trained from a bilingual parallel database including pseudo-parallel sentence pairs obtained by forward translating the monolingual database, and Ave shows the average value. As can be seen from the figure, the translation quality of the translated text by the machine translation models (the second model, the fifth model and the sixth model) which are derived from the source language data fine tuning is higher than that of the corresponding machine translation models (the first model, the third model and the fourth model) which are not derived from the source language data fine tuning.

According to the training method of the machine translation model, data in the first bilingual parallel database are divided into source language data and target language data, the initial machine translation model is subjected to fine tuning through the source language data to obtain the fine-tuned machine translation model, the fine-tuned machine translation model is applied to perform translation tasks, the influence of language coverage deviation existing among different languages on the machine translation model can be eliminated, the performance of the machine translation model obtained through training by the method is improved, and a translated text with high quality and high fidelity can be obtained by applying the model.

A language translation method provided in an embodiment of the present application is described below, where the method may include:

the computer device receives data to be translated, which is source language data.

And the computer equipment inputs the data to be translated into a machine translation model to obtain target language data corresponding to the data to be translated. Firstly, performing word embedding processing on input data to be translated by using a machine translation model to obtain a mapping intermediate vector; fitting the mapping intermediate vectors according to the dependency relationship among the mapping intermediate vectors to obtain fitted vectors; and finally, decoding the fitted vector through a decoder in the machine translation model to obtain a decoded vector, thereby outputting a translation result. The machine translation model may be the first machine translation model obtained by training through the training method of the machine translation model shown in fig. 1B, fig. 6, fig. 8, or fig. 9. The description of the machine translation model may refer to the description of fig. 1B, or fig. 6, or fig. 8, or fig. 9, and will not be repeated here.

Referring to fig. 11A, fig. 11A is a schematic structural diagram of a training apparatus 1100 for a machine translation model according to an embodiment of the present disclosure. The training device of the machine translation model can be applied to the training apparatus 11 in fig. 1A, and the training device of the machine translation model can include:

an obtaining unit 1101, configured to obtain a bilingual parallel database, where the bilingual parallel database includes multiple sets of bilingual parallel sentence pairs, where the bilingual parallel sentence pairs are content-aligned data composed of source language data and target language data, and the bilingual parallel database includes a first bilingual parallel database;

a dividing unit 1102, configured to divide multiple sets of bilingual parallel sentence pairs in the first bilingual parallel database into source language-derived data and target language-derived data, where target language data in the bilingual parallel sentence pair belonging to the source language-derived data is translated based on the source language data, and source language data in the bilingual parallel sentence pair belonging to the target language data is translated based on the target language data;

a training unit 1103 configured to train a first machine translation model from the source language data, the first machine translation model being configured to translate the source language into the target language.

The specific functional implementation of the dividing unit 1102 can be referred to the above-mentioned related descriptions of S21 and S221-S225 in S2; the specific functional implementation of the training unit 1103 can be referred to the related description of implementation manners 1-4 in S3, and will not be described herein again.

Please refer to fig. 11B, fig. 11B is a schematic structural diagram of a language translation apparatus according to an embodiment of the present application. The language translation apparatus may be applied to the execution device 12 in fig. 1A, and may include:

a receiving unit 1201, configured to receive data to be translated, where the data to be translated is source language data;

a translation unit 1202, configured to translate data to be translated into target language data corresponding to the data, where the translation unit 1202 includes a machine translation model, and the machine translation model is a machine translation model obtained by training through a training method of any one of the machine translation models shown in fig. 1B, fig. 6, fig. 8, and fig. 9.

Referring to fig. 12, fig. 12 is a schematic structural diagram of a language translation device according to an embodiment of the present application. As shown in fig. 12, the language translation apparatus 1000 may correspond to the second apparatus 120 or the first apparatus 110 in the computer system shown in fig. 1A described above, and the language translation apparatus 1000 may include: the processor 1001, the network interface 1004, and the memory 1005, and the language translation apparatus 1000 may further include: a user interface 1003, and at least one communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display) and a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1004 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 1005 may optionally be at least one memory device located remotely from the processor 1001. As shown in fig. 12, a memory 1005, which is a kind of computer-readable storage medium, may include therein an operating system, a network communication module, a user interface module, and a device control application program.

In the language translation device 1000 shown in fig. 12, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface for providing a user with input; the processor 1001 may be configured to call the device control application stored in the memory 1005, so as to implement a translation task performed on a model obtained by training a training method of any one of the machine translation models shown in fig. 1B, fig. 6, fig. 8, or fig. 9, where the training method is not described herein again. In addition, the beneficial effects of the same method are not described in detail.

Further, here, it is to be noted that: an embodiment of the present application further provides a computer-readable storage medium, and the computer-readable storage medium stores the computer program executed by the aforementioned language translation apparatus 1000, and the computer program includes program instructions, and when the processor executes the program instructions, the language translation method can be executed, and therefore, details will not be repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in embodiments of the computer-readable storage medium according to the present invention, reference is made to the description of embodiments of the method of the present application.

Referring to fig. 13, fig. 13 is a schematic structural diagram of a training apparatus of a machine translation model according to an embodiment of the present disclosure. As shown in fig. 13, the training apparatus 2000 of the machine translation model may include: the processor 2001, the network interface 2004 and the memory 2005, and the training apparatus 2000 of the machine translation model described above may further include: a user interface 2003, and at least one communication bus 2002. The communication bus 2002 is used to implement connection communication between these components. The user interface 2003 may include a Display (Display) and a Keyboard (Keyboard), and the optional user interface 2003 may further include a standard wired interface and a standard wireless interface. The network interface 2004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 2004 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 2005 may optionally also be at least one memory device located remotely from the aforementioned processor 2001. As shown in fig. 13, the memory 2005 which is a kind of computer-readable storage medium may include therein an operating system, a network communication module, a user interface module, and a device control application program.

In the training apparatus 2000 of the machine translation model shown in fig. 13, the network interface 2004 may provide a network communication function; the user interface 2003 is mainly used for providing an input interface for a user, and may include a display screen, where the display screen may display a result obtained by an instruction executed by the processor 2001, for example, displaying a progress of model training and displaying a translation result of data to be translated; the processor 2001 may be configured to call the device control application program stored in the memory 2005 to implement the method for training the machine translation model shown in any one of fig. 1B, fig. 6, fig. 8, and fig. 9, which is not described herein again in related descriptions. In addition, the beneficial effects of the same method are not described in detail.

It should be understood that the training device 2000 for machine translation models described in this embodiment of the present application may perform the training method for machine translation models described in any one of fig. 1B, fig. 6, fig. 8, and fig. 9 above, and will not be described herein again. In addition, the beneficial effects of the same method are not described in detail.

Further, here, it is to be noted that: an embodiment of the present application further provides a computer-readable storage medium, where a computer program executed by the aforementioned training apparatus 2000 for machine translation models is stored in the computer-readable storage medium, and the computer program includes program instructions, and when the processor executes the program instructions, the method for training a machine translation model according to any one of fig. 1B, fig. 6, fig. 8, or fig. 9 can be executed, and therefore, details of related descriptions are not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of embodiments of the method of the present application.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure is only for the purpose of illustrating the preferred embodiments of the present invention, and it is therefore to be understood that the invention is not limited by the scope of the appended claims.

Claims

1. A method for training a machine translation model, comprising:

acquiring a bilingual parallel database, wherein the bilingual parallel database comprises a plurality of groups of bilingual parallel sentence pairs, the bilingual parallel sentence pairs are content-aligned data formed by source language data and target language data, and the bilingual parallel database comprises a first bilingual parallel database;

2. The method of claim 1, wherein said partitioning the plurality of bilingual parallel sentence pairs in the first bilingual parallel database into source language data and target language data comprises:

3. The method according to claim 2, wherein the determining the data type of the pair of parallel sentences to be processed according to the content covered by the pair of parallel sentences to be processed comprises:

4. The method according to claim 3, wherein the determining the data type of the to-be-processed parallel sentence pair according to the deviation between the first probability and the second probability specifically comprises:

5. The method of claim 3 or 4, wherein determining a first probability that source language data in the pair of parallel sentences to be processed is derived from the source language from source language data in the pair of parallel sentences to be processed comprises: inputting the source language data in the parallel sentence pair to be processed into a first language model, and determining the first probability, wherein the first language model is used for determining the probability of the source language data in the parallel sentence pair to be processed appearing in the source language, and the first language model is obtained by training a source language single language database;

6. The method of any of claims 1-4, wherein prior to training a first machine translation model with the source language-derived data, the method further comprises:

7. The method of any of claims 1-4, wherein prior to training a first machine translation model with the source language-derived data, the method further comprises:

8. The method of any of claims 1-4, wherein prior to training a first machine translation model with the source language-derived data, the method further comprises:

9. A method of language translation, comprising:

inputting data to be translated into a machine translation model to obtain target language data corresponding to the data to be translated, wherein the machine translation model is a first machine translation model obtained by training according to the method of any one of claims 1 to 8.

10. An apparatus for training a machine translation model, comprising:

11. A language translation apparatus, comprising:

a translation unit for translating the data to be translated into its corresponding target language data, the translation unit comprising a machine translation model, the machine translation model being a first machine translation model trained by the method of any one of claims 1-8.

12. A computer device, comprising: one or more processors, one or more memories, the one or more memories being respectively coupled with the one or more processors; the one or more memories are for storing computer program code comprising computer instructions;

the processor is used for calling the computer instruction to execute: the method of training a machine translation model of any of claims 1-8.

13. A computer device, comprising: one or more processors, one or more memories, the one or more memories being respectively coupled with the one or more processors; the one or more memories are for storing computer program code comprising computer instructions;

the processor is used for calling the computer instruction to execute: the language translation method of claim 9.

14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method of training a machine translation model according to any of claims 1-8.

15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, perform the language translation method according to claim 9.