CN114065778A - Chapter-level translation method, translation model training method and device - Google Patents

Chapter-level translation method, translation model training method and device

Info

Publication number
CN114065778A
CN114065778A
Authority
CN
China
Prior art keywords
model
training
sentence
chapter
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010763386.7A
Other languages
Chinese (zh)
Inventor
张培
张旭
陈伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN202010763386.7A priority Critical patent/CN114065778A/en
Publication of CN114065778A publication Critical patent/CN114065778A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a chapter-level translation method, a translation model training method and a device, applied to the field of machine translation. For each sentence to be translated in a chapter to be translated, a sentence representation of the sentence that contains context semantic information is obtained through a target chapter translation model, and the sentence is translated based on that sentence representation; a chapter translation result corresponding to the chapter to be translated is then obtained according to the translation results of the sentences to be translated in the chapter. The invention thereby improves the translation of chapter-level text.

Description

Chapter-level translation method, translation model training method and device
Technical Field
The invention relates to the technical field of neural machine translation, in particular to a chapter-level translation method, a translation model training method and a device.
Background
In recent years, with the introduction of the Transformer framework, NMT (neural machine translation) has developed by leaps and bounds, and translation quality has improved greatly. As more and more businesses move toward globalization, NMT may have a tremendous impact on the translation industry. Unlike traditional statistical machine translation, NMT uses neural-network-based techniques to achieve translation that is more accurate in context.
Since NMT can translate an entire sentence at a time, its output can be close to human translation. At present, for chapter-level translation, a single sentence is usually used as the translation unit, and the translation results of the individual sentences are concatenated to obtain the final chapter translation result.
Disclosure of Invention
The embodiment of the invention provides a chapter-level translation method, a translation model training method and a device, and aims to solve the technical problem that in the prior art, the chapter-level text translation effect is poor.
In a first aspect, an embodiment of the present invention provides a chapter translation method, including:
for each sentence to be translated in the chapter to be translated, obtaining, through a target chapter translation model, a sentence representation of the sentence to be translated that contains context semantic information, and translating the sentence to be translated based on the sentence representation;
and obtaining a chapter translation result corresponding to the chapter to be translated according to the translation results of the sentences to be translated in the chapter to be translated.
Optionally, the target chapter translation model is obtained by learning sentence expressions containing context semantic information in a chapter-level training corpus, where the chapter-level training corpus is a chapter-level parallel corpus and/or a chapter-level monolingual corpus.
Optionally, the obtaining the target chapter translation model by learning the sentence representation containing the context semantic information in the chapter-level training corpus includes:
for the case where the chapter-level training corpus is a chapter-level parallel corpus, jointly training a context prediction model and a neural machine translation model based on the chapter-level parallel corpus to obtain a target chapter translation model corresponding to the neural machine translation model, where the context prediction model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level parallel corpus; or
for the case where the chapter-level training corpus is a chapter-level monolingual corpus, pre-training a pre-training model based on the chapter-level monolingual corpus, and fine-tuning a target combination model according to the pre-trained model to obtain a target chapter translation model corresponding to the target combination model, where the pre-training model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level monolingual corpus, and the target combination model contains a neural machine translation model and a source-end context encoder.
Optionally, the jointly training a context prediction model and a neural machine translation model based on the chapter-level parallel corpora to obtain a target chapter translation model corresponding to the neural machine translation model includes:
the neural machine translation model and the context prediction model are jointly trained by utilizing the obtained first chapter-level parallel corpus until a trained joint model is obtained, wherein a first parallel sentence pair in the first chapter-level parallel corpus comprises a current source end sentence, a source end context sentence aiming at the current source end sentence and a target end sentence;
extracting a trained neural machine translation model from the trained joint model;
and continuing to train the trained neural machine translation model by using the obtained second chapter-level parallel corpus until a target chapter-level translation model corresponding to the neural machine translation model is obtained, wherein a second parallel sentence pair in the second chapter-level parallel corpus comprises a current source end sentence and a target end sentence aiming at the current source end sentence.
Optionally, the neural machine translation model and the context prediction model share the same source-end encoder, the neural machine translation model further includes a target-end decoder, and the context prediction model further includes a source-end context decoder.
Optionally, the jointly training the neural machine translation model and the context prediction model by using the obtained first chapter-level parallel corpus includes multiple times of jointly iterative training on the neural machine translation model and the context prediction model; wherein, any one time of the joint iterative training comprises the following steps:
encoding, by the source-end encoder, a current source-end sentence of a first parallel sentence pair in the first chapter-level parallel corpus;
decoding the coding results of the same source end coder respectively through the target end decoder and the source end context decoder so as to predict a target end sentence and a source end context sentence corresponding to the current source end sentence;
determining a joint loss gradient according to the predicted source-end context sentence, the predicted target-end sentence and the first parallel sentence pair;
updating model parameters of the neural machine translation model and model parameters of the context prediction model based on the joint loss gradient.
Optionally, before jointly training the neural machine translation model and the context prediction model using the acquired first chapter-level parallel corpus, the method further includes:
pre-training a pre-training model by using the discourse-level monolingual corpus to obtain a pre-training model after pre-training, wherein the pre-training model is used for learning sentence expressions containing context semantic information from sentences at the source end of the discourse-level monolingual corpus;
initializing the neural machine translation model and the context prediction model based on the pre-trained model.
Optionally, the fine-tuning the target combination model according to the pre-trained model after pre-training to obtain the target chapter translation model corresponding to the target combination model includes:
initializing the target combination model based on the pre-trained model after pre-training;
and training the initialized target combination model according to the discourse-level monolingual corpus to obtain a target discourse translation model corresponding to the target combination model.
In a second aspect, an embodiment of the present invention provides a translation model training method, including:
and training a context prediction model and a neural machine translation model jointly based on the chapter-level parallel corpora to obtain a target chapter translation model corresponding to the neural machine translation model, wherein the context prediction model is used for learning sentence expression containing context semantic information from source end sentences of the chapter-level parallel corpora.
Optionally, the jointly training a context prediction model and a neural machine translation model based on the chapter-level parallel corpora to obtain a target chapter translation model corresponding to the neural machine translation model includes:
the neural machine translation model and the context prediction model are jointly trained by utilizing the obtained first chapter-level parallel corpus until a trained joint model is obtained, wherein a first parallel sentence pair in the first chapter-level parallel corpus comprises a current source end sentence, a source end context sentence aiming at the current source end sentence and a target end sentence;
extracting a trained neural machine translation model from the trained joint model;
and continuing to train the trained neural machine translation model by using the obtained second chapter-level parallel corpus until a target chapter-level translation model corresponding to the neural machine translation model is obtained, wherein a second parallel sentence pair in the second chapter-level parallel corpus comprises a current source end sentence and a target end sentence aiming at the current source end sentence.
Optionally, before jointly training the neural machine translation model and the context prediction model using the acquired first chapter-level parallel corpus, the method further includes:
utilizing a chapter-level monolingual corpus to train a pre-training model to obtain a pre-training model after pre-training, wherein the pre-training model is used for learning sentence expressions containing context semantic information from sentences at the source end of the chapter-level monolingual corpus;
initializing the neural machine translation model and the context prediction model based on the pre-trained model.
In a third aspect, an embodiment of the present invention provides a translation model training method, including:
pre-training the pre-training model based on the discourse-level monolingual corpus;
and fine-tuning a target combination model according to the pre-trained pre-training model to obtain a target chapter translation model corresponding to the target combination model, wherein the pre-training model is used for learning sentence expression containing context semantic information from source end sentences of the chapter-level monolingual corpus, and the target combination model contains a neural machine translation model and a source end context encoder.
In a fourth aspect, an embodiment of the present invention provides a chapter translator, including:
the sentence translation unit is used for obtaining, for each sentence to be translated in the chapter to be translated, a sentence representation of the sentence to be translated that contains context semantic information through the target chapter translation model, and translating the sentence to be translated based on the sentence representation;
and the translation result forming unit is used for obtaining a chapter translation result corresponding to the chapter to be translated according to the translation results of the sentences to be translated in the chapter to be translated.
Optionally, the apparatus further comprises:
and the model training unit is used for obtaining the target discourse translation model by learning sentence expression containing context semantic information in the discourse-level training corpus, wherein the discourse-level training corpus is discourse-level parallel corpus and/or discourse-level monolingual corpus.
Optionally, the model training unit includes:
the first training unit is used for training a context prediction model and a neural machine translation model based on the discourse-level parallel corpus in a combined manner to obtain a target discourse translation model corresponding to the neural machine translation model aiming at the discourse-level parallel corpus, wherein the context prediction model is used for learning sentence expression containing context semantic information from source end sentences of the discourse-level parallel corpus;
or the model training unit comprises:
the second training unit is used for pre-training the pre-training model based on the chapter-level monolingual corpus, for the case where the chapter-level training corpus is a chapter-level monolingual corpus;
the model fine-tuning unit is used for fine-tuning a target combination model according to a pre-trained model after pre-training to obtain a target chapter translation model corresponding to the target combination model, wherein the pre-trained model is used for learning sentence expression containing context semantic information from source end sentences of the chapter-level monolingual corpus, and the target combination model contains a neural machine translation model and a source end context encoder.
Optionally, the first training unit includes:
a joint training subunit, configured to jointly train the neural machine translation model and the context prediction model by using the obtained first chapter-level parallel corpus until a trained joint model is obtained, where a first parallel sentence pair in the first chapter-level parallel corpus includes a current source-end sentence, a source-end context sentence for the current source-end sentence, and a target-end sentence;
the model extraction subunit is used for extracting a trained neural machine translation model from the trained combined model;
and a continuing training subunit, configured to continue training the trained neural machine translation model by using the obtained second chapter-level parallel corpus until a target chapter-level translation model corresponding to the neural machine translation model is obtained, where a second parallel sentence pair in the second chapter-level parallel corpus includes a current source end sentence and a target end sentence aiming at the current source end sentence.
Optionally, the neural machine translation model and the context prediction model share the same source-end encoder, the neural machine translation model further includes a target-end decoder, and the context prediction model further includes a source-end context decoder.
Optionally, the joint training subunit is configured to perform multiple joint iterative training on the neural machine translation model and the context prediction model; wherein, for any one time of joint iterative training, the joint training subunit is specifically configured to:
encoding, by the source-end encoder, a current source-end sentence of a first parallel sentence pair in the first chapter-level parallel corpus;
decoding the coding results of the same source end coder respectively through the target end decoder and the source end context decoder so as to predict a target end sentence and a source end context sentence corresponding to the current source end sentence;
determining a joint loss gradient according to the predicted source-end context sentence, the predicted target-end sentence and the first parallel sentence pair;
updating model parameters of the neural machine translation model and model parameters of the context prediction model based on the joint loss gradient.
Optionally, the apparatus further comprises:
the first pre-training unit is used for training a pre-training model by using the chapter-level monolingual corpus to obtain a pre-trained pre-training model, where the pre-training model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level monolingual corpus;
an initialization unit, configured to initialize the neural machine translation model and the context prediction model based on the pre-trained pre-training model.
Optionally, the model fine-tuning unit includes:
the initialization subunit is used for initializing the target combination model based on the pre-trained model after pre-training;
and the fine tuning training unit is used for training the target combination model according to the chapter-level monolingual corpus to obtain a target chapter translation model corresponding to the target combination model.
In a fifth aspect, an embodiment of the present invention provides a translation model training apparatus, including:
the first training unit is used for training a context prediction model and a neural machine translation model jointly based on the chapter-level parallel corpus to obtain a target chapter translation model corresponding to the neural machine translation model, wherein the context prediction model is used for learning sentence expression containing context semantic information from source end sentences of the chapter-level parallel corpus.
Optionally, the first training unit includes:
a joint training subunit, configured to jointly train the neural machine translation model and the context prediction model by using the obtained first chapter-level parallel corpus until a trained joint model is obtained, where a first parallel sentence pair in the first chapter-level parallel corpus includes a current source-end sentence, a source-end context sentence for the current source-end sentence, and a target-end sentence;
the model extraction subunit is used for extracting a trained neural machine translation model from the trained combined model;
and a continuing training subunit, configured to continue training the trained neural machine translation model by using the obtained second chapter-level parallel corpus until a target chapter-level translation model corresponding to the neural machine translation model is obtained, where a second parallel sentence pair in the second chapter-level parallel corpus includes a current source end sentence and a target end sentence aiming at the current source end sentence.
Optionally, the apparatus further comprises:
the first pre-training unit is used for training a pre-training model by using a chapter-level monolingual corpus to obtain a pre-trained pre-training model, where the pre-training model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level monolingual corpus;
an initialization unit, configured to initialize the neural machine translation model and the context prediction model based on the pre-trained pre-training model.
In a sixth aspect, an embodiment of the present invention provides a translation model training apparatus, including:
the initialization subunit is used for initializing the target combination model based on the pre-trained model after pre-training;
and the fine tuning training unit is used for training the target combination model according to the chapter-level monolingual corpus to obtain a target chapter translation model corresponding to the target combination model.
In a seventh aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, implement any one of the methods in the first aspect.
In an eighth aspect, an embodiment of the present invention provides an electronic device, including a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors to execute operation instructions included in the one or more programs for performing any one of the methods of the first aspect.
One or more technical solutions provided by the embodiments of the present invention at least achieve the following beneficial effects:
aiming at the sentence to be translated in the chapter to be translated, the embodiment of the invention obtains the sentence expression of the sentence to be translated, which contains context semantic information, through a target chapter translation model, and translates the sentence to be translated based on the sentence expression; because the target chapter translation model captures the context semantic information of the sentence to be translated in the chapter, the translation ambiguity can be eliminated, the translation results of the same words in different sentences of the chapter can be kept more consistent, and the translation effect of the chapter-level text can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present specification, a brief description will be given below of the embodiments or the drawings required in the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present specification, and it is obvious for a person skilled in the art to obtain other drawings based on these drawings without inventive labor.
FIG. 1 is a flow chart of a chapter translation method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a joint model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a pre-training model according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a joint training method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a target combination model according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a pre-training + fine-tuning method according to an embodiment of the present invention;
FIG. 7 is a functional block diagram of a chapter translator according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device in an embodiment of the invention.
Detailed Description
In order to solve the technical problems, the technical scheme provided by the embodiment of the invention has the following general idea: aiming at a sentence to be translated in the discourse to be translated, obtaining a sentence representation of the sentence to be translated, which contains context semantic information, through a target discourse translation model, and translating the sentence to be translated based on the sentence representation; therefore, the target chapter translation model captures the inter-sentence dependency relationship of the sentence to be translated at the chapter level so as to improve the translation effect of the chapter-level text.
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step are within the scope of the present specification.
In a first aspect, an embodiment of the present invention provides a chapter translating method, as shown in fig. 1, the chapter translating method includes the following steps S10 to S11:
s10, aiming at each sentence to be translated in the chapters to be translated, obtaining sentence expression of the sentence to be translated, which contains context semantic information, through a target chapter translation model, and translating the sentence to be translated based on the sentence expression;
S11, obtaining a chapter translation result corresponding to the chapter to be translated according to the translation results of the sentences to be translated in the chapter to be translated.
In the embodiment of the invention, the model structure of the target chapter translation model at least comprises a neural machine translation model. Training the target chapter translation model means learning, from the chapter-level training corpus, sentence representations that contain context semantic information; by learning such representations, the trained target chapter translation model can produce "contextualized" sentence representations of source-end sentences, which helps improve the chapter-level translation capability of the neural machine translation model.
In an embodiment of the present invention, the neural machine translation model in the target chapter translation model may be a translation model based on the Transformer framework, which is an encoder-decoder translation framework built on the self-attention mechanism. Of course, according to actual requirements, other types of neural machine translation models may be selected, for example other encoder-decoder translation models such as a neural machine translation model with an RNN (recurrent neural network) architecture.
It should be noted that, in the embodiment of the present invention, the training of the target chapter translation model and the chapter translation method described in the above steps S10 to S11 belong to mutually independent execution processes, and may be performed on different electronic devices, that is, the target chapter translation model trained on one electronic device may be applied to another electronic device to translate a chapter-level text.
In specific implementation, the chapter-level training corpus used for training the target chapter translation model may be a chapter-level parallel corpus or a chapter-level monolingual corpus, and the training process of the target chapter translation model differs accordingly. A training method is given below for each of the two cases:
the training method comprises the following steps: joint training method
For the case where the chapter-level training corpus is a chapter-level parallel corpus, the target chapter translation model can be obtained by a joint training method, specifically: the neural machine translation model and the context prediction model are jointly trained using the chapter-level parallel corpus to obtain a target chapter translation model corresponding to the neural machine translation model.
Specifically, the model structure of the finally obtained target chapter translation model is the same as that of the neural machine translation model. The context prediction model is used for learning sentence expression containing context semantic information from source end sentences of the discourse-level parallel corpus.
Specifically, a target chapter translation model corresponding to the neural machine translation model is obtained through a joint training method, and the method specifically comprises the following steps of A1-A3:
and A1, training a neural machine translation model and a context prediction model jointly by using the obtained first chapter-level parallel corpora until a trained joint model is obtained.
Specifically, the first chapter-level parallel corpus includes a certain number of first parallel sentence pairs, where a first parallel sentence pair includes a current source-end sentence, the source-end context sentences for that source-end sentence, and a target-end sentence. The source-end context sentences comprise the previous source-end sentence and the next source-end sentence of the current source-end sentence. For example, a first parallel sentence pair can be represented as (s_i, s_{i-1}, s_{i+1}, y_i), where s_i is the current source-end sentence, s_{i-1} is the previous source-end sentence of s_i, s_{i+1} is the next source-end sentence of s_i, and y_i is the target-end sentence of s_i.
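As an illustration only (the patent does not prescribe a data format), such context-augmented training examples might be organized as in the following sketch; the field and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FirstParallelPair:
    """One first parallel sentence pair (s_i, s_{i-1}, s_{i+1}, y_i)."""
    src: str       # s_i, current source-end sentence
    prev_src: str  # s_{i-1}, previous source-end sentence
    next_src: str  # s_{i+1}, next source-end sentence
    tgt: str       # y_i, target-end sentence

def build_first_pairs(src_doc: List[str], tgt_doc: List[str]) -> List[FirstParallelPair]:
    """Unroll one sentence-aligned document pair into first parallel sentence pairs.
    Boundary sentences get an empty string as the missing neighbour (an assumption;
    the patent does not specify how document boundaries are handled)."""
    pairs = []
    for i, (s, y) in enumerate(zip(src_doc, tgt_doc)):
        prev_s = src_doc[i - 1] if i > 0 else ""
        next_s = src_doc[i + 1] if i + 1 < len(src_doc) else ""
        pairs.append(FirstParallelPair(src=s, prev_src=prev_s, next_src=next_s, tgt=y))
    return pairs
```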
The joint model comprises a neural machine translation model and a context prediction model. In the joint training process, a target end sentence of a current source end sentence is predicted through a neural machine translation model, and a source end context sentence of the current source end sentence is predicted through a context prediction model.
Referring to fig. 2, fig. 2 is a schematic structural diagram of the joint model in an embodiment of the present invention. The joint model includes a source-end encoder and three decoders sharing that source-end encoder: specifically, the neural machine translation model and the context prediction model share the same source-end encoder, the neural machine translation model further includes a target-end decoder, and the context prediction model further includes a source-end context decoder, where the source-end context decoder of the context prediction model consists of a front decoder and a rear decoder. Training the joint model is a process of jointly training the neural machine translation model and the context prediction model, and specifically comprises multiple rounds of joint iterative training of the two models. Referring to fig. 4, fig. 4 is a schematic flow chart of the joint training method in an embodiment of the present invention; any one round of joint iterative training includes the following steps A11 to A13:
step A11, a source-end encoder encodes a current source-end sentence of a first parallel sentence pair in the first chapter-level parallel corpus, and a target-end decoder and a source-end context decoder respectively decode the encoding result of the same source-end encoder to predict a target-end sentence and a source-end context sentence corresponding to the current source-end sentence.
More specifically, as shown in fig. 2, based on the context prediction model in the embodiment of the present invention including a pre-decoder and a post-decoder, step a11 specifically includes: encoding the current source end sentence of the first parallel sentence pair through a source end encoder to obtain a source end sentence encoding vector; the source-side sentence encoding vector is decoded by a front decoder to predict a previous source-side sentence, the source-side sentence encoding vector is decoded by a rear decoder to predict a next source-side sentence, and the source-side sentence encoding vector is decoded by a target-side decoder to predict a target-side sentence.
In a specific implementation, the front decoder, the rear decoder, and the source-end encoder share the word vectors of the source-end sentences; that is, the source-end sentence is represented as word vectors and then fed to the source-end encoder, the front decoder, and the rear decoder, as shown in fig. 2.
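The structure just described (one shared source-end encoder feeding a target-end decoder, a front decoder and a rear decoder) could be sketched in PyTorch roughly as follows. This is a minimal illustration under assumed layer sizes, a shared source/target vocabulary, and omitted positional encodings and attention masks; it is not the patent's implementation:

```python
import torch.nn as nn

def _decoder(d_model, nhead, ffn, num_layers):
    layer = nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward=ffn, batch_first=True)
    return nn.TransformerDecoder(layer, num_layers)

class JointModel(nn.Module):
    """Joint model of fig. 2: one shared source-end encoder and three decoders.
    The encoder + target decoder form the NMT model; the front/rear (pre/next)
    decoders form the context prediction model. Sizes are illustrative."""
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6, ffn=2048):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, d_model)  # source word vectors, shared
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=ffn,
                                               batch_first=True)
        self.src_encoder = nn.TransformerEncoder(enc_layer, num_layers)   # shared encoder
        self.tgt_decoder = _decoder(d_model, nhead, ffn, num_layers)      # predicts y_i
        self.pre_decoder = _decoder(d_model, nhead, ffn, num_layers)      # predicts s_{i-1}
        self.next_decoder = _decoder(d_model, nhead, ffn, num_layers)     # predicts s_{i+1}
        self.out = nn.Linear(d_model, vocab_size)  # a shared vocabulary is assumed

    def forward(self, src, tgt_in, prev_in, next_in):
        # Positional encodings and causal masks are omitted for brevity.
        memory = self.src_encoder(self.src_embed(src))                    # encode s_i once
        tgt_logits = self.out(self.tgt_decoder(self.tgt_embed(tgt_in), memory))
        pre_logits = self.out(self.pre_decoder(self.src_embed(prev_in), memory))
        next_logits = self.out(self.next_decoder(self.src_embed(next_in), memory))
        return tgt_logits, pre_logits, next_logits
```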
Step A12, determining a joint loss gradient according to the predicted source-end context sentences, the predicted target-end sentence and the first parallel sentence pair.
It should be noted that M first parallel sentence pairs are used in one joint iteration process, where M is a positive integer. The calculation formula of the joint loss function is as follows:
Loss=Loss_tgt+μ*Loss_pre+λ*Loss_next
where Loss_tgt is the loss over the M target-end sentences predicted by the target-end decoder, Loss_pre is the loss over the M previous source-end sentences (corresponding to the M target-end sentences) predicted by the front decoder, and Loss_next is the loss over the M next source-end sentences predicted by the rear decoder. The joint loss gradient is determined according to this joint loss function.
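A minimal sketch of how the joint loss above might be computed from the three decoder outputs, assuming token-level cross entropy over padded batches (the padding id is an assumption):

```python
import torch.nn.functional as F

def joint_loss(tgt_logits, pre_logits, next_logits,
               tgt_ref, prev_ref, next_ref, mu=0.5, lam=0.5, pad_id=0):
    """Loss = Loss_tgt + mu * Loss_pre + lam * Loss_next, with padding ignored."""
    def ce(logits, ref):
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               ref.reshape(-1), ignore_index=pad_id)
    loss_tgt = ce(tgt_logits, tgt_ref)
    loss_pre = ce(pre_logits, prev_ref)
    loss_next = ce(next_logits, next_ref)
    return loss_tgt + mu * loss_pre + lam * loss_next
```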
And A13, updating model parameters of the neural machine translation model and the context prediction model based on the joint loss gradient. I.e. updating the model parameters of the joint model shown in fig. 2.
The joint model is trained iteratively by repeating steps A11 to A13 until a first preset end condition is met, at which point training ends and the trained joint model is obtained. Specifically, joint training may end when a preset iteration threshold is reached or when the model converges, yielding the trained joint model.
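Putting the pieces together, one pass over steps A11-A13 per batch might look like the following sketch, reusing the hypothetical JointModel and joint_loss from the sketches above; the fixed step budget stands in for the preset end condition:

```python
def joint_train(joint_model, first_corpus_loader, optimizer, mu=0.5, lam=0.5, max_steps=100_000):
    """Repeat steps A11-A13 over batches of first parallel sentence pairs.
    `first_corpus_loader` is assumed to yield padded token-id tensors."""
    for step, batch in enumerate(first_corpus_loader):
        src, prev_in, prev_ref, next_in, next_ref, tgt_in, tgt_ref = batch
        tgt_logits, pre_logits, next_logits = joint_model(src, tgt_in, prev_in, next_in)  # A11
        loss = joint_loss(tgt_logits, pre_logits, next_logits,
                          tgt_ref, prev_ref, next_ref, mu=mu, lam=lam)                    # A12
        optimizer.zero_grad()
        loss.backward()                                                                   # A13
        optimizer.step()
        if step + 1 >= max_steps:
            break
    return joint_model
```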
After the trained joint model is obtained, step A2 is performed: extracting the trained neural machine translation model from the trained joint model. Specifically, as shown in fig. 2, the front decoder and the rear decoder can be removed from the joint model, leaving an encoder-decoder network formed by the source-end encoder and the target-end decoder, which is the trained neural machine translation model.
After step A2, perform step A3: and continuously training the extracted trained neural machine translation model by using the obtained second chapter parallel corpus until a target chapter level translation model corresponding to the neural machine translation model is obtained.
Specifically, the second chapter-level parallel corpus includes a number of second parallel sentence pairs, where a second parallel sentence pair includes the current source-end sentence and the target-end sentence corresponding to the current source-end sentence; for example, a second parallel sentence pair may be expressed as (s_i, y_i), where s_i is the current source-end sentence and y_i is the target-end sentence of s_i.
Step A3 specifically includes: A31, encoding the current source-end sentence in a second parallel sentence pair with the source-end encoder; A32, decoding the encoding result with the target-end decoder to predict the target-end sentence of the current source-end sentence; A33, calculating the loss Loss_tgt of the target-end sentence predicted by the target-end decoder according to the predicted target-end sentence and the actual target-end sentence in the second parallel sentence pair; and A34, determining a translation loss gradient from the loss of the predicted target-end sentence, updating the model parameters of the neural machine translation model according to the translation loss gradient, and thereby completing one iteration over the model parameters of the neural machine translation model. Steps A31 to A34 are repeated to iteratively train the neural machine translation model until a second preset end condition is met, at which point the iterative training ends and the target chapter-level translation model is obtained.
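Steps A2-A3 could then be sketched as follows, again reusing the hypothetical JointModel above: the front and rear decoders are simply no longer used, and only the translation loss is back-propagated:

```python
import torch.nn.functional as F

def continue_training(joint_model, second_corpus_loader, optimizer, pad_id=0):
    """Stage two (A2-A3): keep only the shared source-end encoder and the target-end
    decoder of the trained JointModel and train them on (s_i, y_i) pairs."""
    for src, tgt_in, tgt_ref in second_corpus_loader:
        memory = joint_model.src_encoder(joint_model.src_embed(src))                # A31
        logits = joint_model.out(
            joint_model.tgt_decoder(joint_model.tgt_embed(tgt_in), memory))         # A32
        loss_tgt = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   tgt_ref.reshape(-1), ignore_index=pad_id)        # A33
        optimizer.zero_grad()
        loss_tgt.backward()                                                          # A34
        optimizer.step()
    return joint_model
```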
It should be noted that the context prediction model may be a skip-thought-style model based on the Transformer framework, and the neural machine translation model may be a translation model based on the Transformer framework. The neural machine translation model has been explained above and is not described again here.
In order to further improve the overall translation effect of the target chapter-level translation model, the first training method can be extended into a pre-training plus joint-training scheme, which specifically comprises the following steps:
before executing step A1, training a pre-training model by using a chapter-level monolingual corpus to obtain a pre-trained pre-training model, wherein the pre-training model is used for learning sentence expressions containing context semantic information from sentences at the source end of the chapter-level monolingual corpus; initializing the neural machine translation model and the context prediction model based on the pre-trained pre-training model.
The model structure of the pre-training model and the process of training the pre-training model are the same as or similar to those of the second training mode, and reference may be made to the related description in the second training mode below, and for the sake of brevity of the description, no further description is given here.
Training method two: the pre-training + fine-tuning method
The joint training method requires a large amount of chapter-level parallel corpus as training data. However, chapter-level parallel corpora with document boundaries are not easy to obtain in practice, whereas large amounts of chapter-level monolingual corpus are readily available as training data. When chapter-level parallel corpora cannot be obtained, the second training method can be adopted: for the case where the chapter-level training corpus is a chapter-level monolingual corpus, the target chapter-level translation model can be obtained with a pre-training plus fine-tuning method. The specific process of training method two comprises the following steps B1-B2:
step B1, pre-training the pre-training model based on the chapter-level monolingual corpus; step B2: and fine-tuning the target combination model based on the pre-trained pre-training model to obtain a target chapter translation model corresponding to the target combination model.
Specifically, in step B1, the pre-training model is pre-trained using the chapter-level monolingual corpus, and the context sentence of the current source-end sentence is predicted by the pre-training model, so that the source-end context encoder capable of capturing the dependency relationship between sentences is obtained by pre-training the pre-training model.
Specifically, the chapter-level monolingual corpus contains a certain number of monolingual training sentence pairs, which can be expressed as (s_i, s_{i-1}, s_{i+1}), where s_i is the current source-end sentence, s_{i-1} is the previous source-end sentence of s_i, and s_{i+1} is the next source-end sentence of s_i.
The pre-training process for the pre-training model includes a multiple iterative training process for the pre-training model, where, referring to fig. 6, fig. 6 is a schematic flow chart of the pre-training + fine-tuning method in the embodiment of the present invention, and any one iterative training process for the pre-training model includes the following steps B11-B13:
and step B11, predicting the source context sentence of the current source sentence by the pre-training model aiming at the current source sentence of the monolingual training sentence pair in the chapter-level monolingual corpus.
Specifically, the model structure of the pre-training model may use any one of the following two model structures:
the decoder comprises two source-end decoders (a front decoder and a rear decoder) and one source-end encoder shared by the front decoder and the rear decoder. The pre-training model is based on the same source end encoder to the current source end sentence siCoding is carried out, two source-end decoders of the pre-training model respectively decode according to the coding result of the same source-end coder to obtain a previous source-end sentence si-1And the next source sentence si+1
The second structure: the pre-training model is composed of the two encoder-decoder models shown in fig. 3 and includes a front decoder, a rear decoder, a front encoder and a rear encoder. The current source-end sentence is represented as word vectors and then passed to the two encoder-decoder models, and the two independent source-end encoders (the front encoder and the rear encoder) in the pre-training model each encode the current source-end sentence; the front decoder is used to predict the previous source-end sentence s_{i-1} of the current source-end sentence s_i, and the rear decoder is used to predict the next source-end sentence s_{i+1} of s_i. It should be noted that the word vectors of the source-end sentences are shared by the two encoder-decoder models in the pre-training model shown in fig. 3; that is, the current source-end sentence is represented as word vectors and then fed to both encoder-decoder models.
Step B12: determining a context loss function according to the source-end context sentences predicted by the pre-training model and the actual source-end context sentences in the monolingual training sentence pair, and determining the context loss gradient of this iteration according to the context loss function; step B13: adjusting the model parameters of the pre-training model according to the context loss gradient of this iteration.
It should be noted that one or more monolingual training sentence pairs may be used in one iteration of the pre-training model.
And continuously iteratively training the pre-training model by repeating the steps B11-B13 until convergence or the maximum iteration number is reached, and obtaining the pre-trained pre-training model. Wherein, the calculation formula of the context loss function is as follows:
Loss=Loss_pre+Loss_next
and Loss _ pre is the Loss of the previous source-end sentence predicted by one coder-decoder model in the pre-training model, and Loss _ next is the Loss of the next source-end sentence predicted by the other coder-decoder model in the pre-training model.
Step B2 is specifically to initialize the target combination model according to the pre-trained model after pre-training. Specifically, referring to fig. 5, the target combination model is obtained by integrating the neural machine translation model with two source-side encoders (a front encoder and a rear encoder) in the pre-trained model, wherein the neural machine translation model is a translation model based on a Transformer framework, which has been explained above and is not described herein again.
And step B3, performing fine tuning training on the initialized target combination model by using the chapter-level monolingual corpus to obtain a target chapter-level translation model corresponding to the target combination model.
Specifically, referring to fig. 5, the sum of the output of the front encoder (pre-encoder), the output of the rear encoder (next-encoder), and the word vectors of the current source-end sentence in the target combination model is used as the input of the source-end encoder in the neural machine translation model so as to fine-tune the target combination model. During fine-tuning, the word vectors and the parameters of the front encoder and the rear encoder in the target combination model continue to be optimized.
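The input combination used during fine-tuning (word vectors plus the outputs of the pre-encoder and the next-encoder, fed to the NMT source-end encoder) might be sketched as follows, assuming the hypothetical ContextPretrainModel above and an NMT model with the same attribute names as the earlier JointModel sketch:

```python
import torch.nn as nn

class TargetCombinationModel(nn.Module):
    """NMT model plus the two pre-trained source-end context encoders.
    The NMT encoder consumes word vectors + pre-encoder output + next-encoder output."""
    def __init__(self, pretrain_model, nmt_model):
        super().__init__()
        self.pre_encoder = pretrain_model.pre_branch["enc"]    # initialised from pre-training
        self.next_encoder = pretrain_model.next_branch["enc"]
        self.embed = pretrain_model.embed
        self.nmt = nmt_model                                    # Transformer NMT model

    def forward(self, src, tgt_in):
        word_vec = self.embed(src)
        ctx = self.pre_encoder(word_vec) + self.next_encoder(word_vec)
        enc_input = word_vec + ctx                  # "pre + next + word vector"
        memory = self.nmt.src_encoder(enc_input)    # NMT encoder takes the summed input
        return self.nmt.out(
            self.nmt.tgt_decoder(self.nmt.tgt_embed(tgt_in), memory))
```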
By the technical scheme, the target chapter translation model captures the context semantic information of the sentence to be translated in the chapter, so that translation ambiguity can be eliminated, translation results of the same words in different sentences can be kept more consistent, and the translation effect of the chapter-level text can be improved.
In the following, the target chapter translation models obtained by training according to the various embodiments of the present invention are evaluated on two translation tasks, Chinese-English and English-German, to verify the effectiveness of the target chapter translation model on chapter-level text; the present invention is, however, not limited to the following:
in the task of translating chinese-english, LDC (language Data Consortium) corpora including LDC2003E14, LDC2005T06, LDC2005T10 and a part of LDC2004T08 (conference recording/law/news) corpora are used, and the sizes of these corpora are 2.8M pairs of parallel sentences. 94K chapters (containing 900K sentence pairs) are selected from the corpus of 2.8M parallel sentence pairs, and the NIST06 data set in the NIS database is used as a development set, and the NIST02/NIST03/NIST04/NIST05/NIST08 is used as a test set. The development set and test set contained 588 discourse and 5833 sentence pairs. Each document contains an average of 10 sentences. The chapter-level monolingual corpus was collected for the experiment, which had a total of 25M sentences and 700K documents, each of which contained 35 sentences on average.
In the English-German translation task, the chapter-level bilingual data of WMT19 is used as the training set (39K documents and 855K sentence pairs in total). In addition, a chapter-level monolingual corpus of 410K documents (containing 10M sentences) was collected. newstest2019 is used as the development set, and newstest2017 and newstest2018 are used as test sets. The development set contains 123 documents and 2998 sentences, and the test sets contain 255 documents and 6002 sentences.
On the Chinese side, words are segmented into finer-grained subwords using byte pair encoding (BPE); on the English and German sides, BPE is likewise used to segment words into finer-grained subwords. Case-insensitive NIST BLEU (a machine translation evaluation metric based on BLEU) is used as the evaluation metric, with BLEU scores calculated using the "mteval-v11b.pl" script. Out-of-vocabulary words are all replaced with the "UNK" token.
A neural machine translation model with the Transformer architecture is used as the baseline model. During training, the hidden layer size is set to 512 and the filter size to 2048; the encoder and the decoder each have 6 layers, with 8 attention heads. The target chapter translation model is updated with the Adam algorithm, the learning rate is set to 1.0, and the learning-rate warmup steps are set to 4000. Each batch contains 4096 words per iteration. Encoding is performed with 4 TITAN XP GPUs and decoding with 2 TITAN XP GPUs; the beam-search width is set to 4 during decoding. Significance testing is performed on the translation results of the test sets.
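For reference, the stated training configuration can be collected into a simple settings object (a convenience sketch, not code from the patent):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    hidden_size: int = 512      # hidden layer size
    filter_size: int = 2048     # feed-forward (filter) size
    num_layers: int = 6         # encoder and decoder layers
    num_heads: int = 8          # attention heads
    optimizer: str = "adam"
    learning_rate: float = 1.0
    warmup_steps: int = 4000
    batch_tokens: int = 4096    # words per batch per iteration
    beam_size: int = 4          # beam-search width at decoding time
```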
The results of the joint training method are shown in Table 1. In Table 1, for the joint training method, "Pre"/"Next" indicates that only the front decoder or only the rear decoder is used. According to the experimental results on the development set, μ = 0.5 and λ = 0.5 are set for the "pre" and "next" losses, respectively (μ = 0.1, 0.5 and 1.0 were each tried on the development set, and μ = 0.5 was the most effective). The BLEU scores when using only the front decoder or only the rear decoder are similar, which shows that the previous and the next source-end sentence influence the translation of the current source-end sentence almost equally. When the previous and next source-end sentences are predicted simultaneously with the "pre + next" method (μ set to 0.5 and λ set to 0.3 according to the development-set results), BLEU improves by +0.84 over the baseline model, better than using only "pre" or only "next".
TABLE 1 BLEU score comparison for Chinese-English translation. Marked entries in the table indicate a statistically significant improvement over the baseline model (p < 0.01).
The 25M-sentence chapter-level monolingual corpus is used to train the pre-training model, which is then fine-tuned on two parallel corpora of different scales: the 900K parallel corpus and the 2.8M parallel corpus. Sentences in the 900K corpus show strong contextual relevance, whereas in the 2.8M corpus not all chapters have definite document boundaries. When the pre-training model is trained on the chapter-level monolingual corpus, the sentences within each document are not randomly shuffled, so the original sentence order of each document is maintained. The results are shown in Tables 1 and 2. Similarly to joint training, the pre-training model can be trained with a single encoder, either the pre-encoder or the next-encoder, giving the "pre" or "next" result; of course, both encoders can also be trained simultaneously, corresponding to the "pre + next" result. As shown in Tables 1 and 2, "pre" and "next" of the pre-training + fine-tuning method achieve large improvements over both baselines: without using any context information, the improvements over the Transformer baseline model are +0.93 and +1.28 BLEU points, respectively. In Table 2, marked entries indicate a statistically significant improvement over the baseline model (p < 0.01).
Table 2 BLEU score comparison on Chinese-English translation for the pre-training + fine-tuning method.
The above experimental data show that the target chapter translation model obtained by either joint training or pre-training plus fine-tuning clearly improves the translation effect. Furthermore, when the joint model is initialized with the pre-trained model before joint training, the highest BLEU score is obtained, +1.14 above the Transformer baseline, as can be seen from Table 1.
As shown in Table 3, the improvements over the baseline model on English-German translation are 0.81 and 0.92 BLEU points, respectively.
TABLE 3 BLEU score comparison on English-German translation
The experimental data show that having the target chapter translation model predict the preceding and following source-end sentences in the chapter from the current source-end sentence improves the model's translation performance.
Experiments also show that encoding the current source-end sentence with two independent encoders in the pre-training model, used respectively for predicting the previous and the next source-end sentence, yields a better translation effect. The corresponding BLEU results are shown in Table 4 below.
TABLE 4 Comparison of two encoders and a single encoder in the pre-training model
The setting in which the pre-training model encodes the current source-end sentence with two independent encoders is compared with the setting in which a single shared encoder encodes it; the results are shown in Table 4. Clearly, the non-shared-encoder model outperforms the shared-encoder model, showing that two independent encoders capture the dependency between the current sentence and its surrounding sentences better than a single encoder. This is because the current sentence's dependency on the previous sentence differs from its dependency on the next sentence.
The results in Table 5 show that using only the pre-trained word vectors as input already yields BLEU improvements of +0.34 and +0.66 on the two parallel corpora of different sizes. When the sum of the pre-trained word vectors, the output of the pre-encoder, and the output of the next-encoder is used as the input to the encoder of the neural machine translation model, BLEU further improves by +0.5 and +0.62 compared with using only the word vectors as input.
Table 5 BLEU scores for the target combination model on two different scale corpora.
In Table 5, "pre + next + word vector" means that the outputs of the pre-encoder and the next-encoder together with the word vectors of the pre-training model are taken as the input of the target combination model; "word vector" means that only the word vectors of the pre-training model are used as input.
The translation method based on the target chapter translation model provided by the embodiment of the invention can improve translation quality by eliminating translation ambiguity (example one in Table 6) or by making translations more consistent (example two in Table 6). In example one, the source word can be translated as either "weak" or "fragile"; because the target chapter translation model has the exact context semantic information, the word is translated correctly. In example two, the target chapter translation model translates the "findings" in both sentences to the same translation, "detected", because the "cases" mentioned in the second sentence can be inferred from the meanings of "drugs" and "police" in the first sentence. In addition, 5 documents (48 sentences in total) were randomly extracted from the test set for translation analysis.
TABLE 6 comparison of translation results for the reference model and the target chapter translation model
In a second aspect, based on the same inventive concept, an embodiment of the present invention provides a translation model training method, including the following steps: and training a context prediction model and a neural machine translation model jointly based on the chapter-level parallel corpora to obtain a target chapter translation model corresponding to the neural machine translation model, wherein the context prediction model is used for learning sentence expression containing context semantic information from source end sentences of the chapter-level parallel corpora.
In a specific embodiment, the jointly training a context prediction model and a neural machine translation model based on the chapter-level parallel corpus to obtain a target chapter translation model corresponding to the neural machine translation model includes: the neural machine translation model and the context prediction model are jointly trained by utilizing the obtained first chapter-level parallel corpus until a trained joint model is obtained, wherein a first parallel sentence pair in the first chapter-level parallel corpus comprises a current source end sentence, a source end context sentence aiming at the current source end sentence and a target end sentence; extracting a trained neural machine translation model from the trained joint model;
In a specific embodiment, the trained neural machine translation model is further trained by using the obtained second chapter-level parallel corpus until a target chapter translation model corresponding to the neural machine translation model is obtained, where a second parallel sentence pair in the second chapter-level parallel corpus includes a current source-end sentence and a target-end sentence corresponding to the current source-end sentence.
In a specific embodiment, before jointly training the neural machine translation model and the context prediction model by using the acquired first chapter-level parallel corpus, the method further includes: training a pre-training model by using a chapter-level monolingual corpus to obtain a pre-trained model, where the pre-training model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level monolingual corpus; and initializing the neural machine translation model and the context prediction model based on the pre-trained model.
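The following outline is a non-authoritative sketch of the training flow described in this aspect; the helper functions, the corpus objects, the joint_model.nmt_model attribute and the use of load_state_dict(strict=False) for partial initialization are assumptions introduced purely for illustration.

```python
def train_chapter_translation_model(pretrain_model, nmt_model, ctx_model,
                                    mono_corpus, first_corpus, second_corpus,
                                    pretrain_fn, joint_train_fn, continue_train_fn):
    """Hypothetical end-to-end flow: pre-train, initialize, jointly train, continue training."""
    # Pre-train on the chapter-level monolingual corpus so the pre-training model
    # learns sentence representations containing context semantic information.
    pretrain_fn(pretrain_model, mono_corpus)

    # Initialize the NMT model and the context prediction model from the pre-trained
    # parameters; strict=False keeps layers with no counterpart (e.g. the target-end
    # decoder) at their random initialization.
    nmt_model.load_state_dict(pretrain_model.state_dict(), strict=False)
    ctx_model.load_state_dict(pretrain_model.state_dict(), strict=False)

    # Stage 1: joint training on the first chapter-level parallel corpus (current
    # source-end sentence, its source-end context sentence, target-end sentence).
    joint_model = joint_train_fn(nmt_model, ctx_model, first_corpus)

    # Stage 2: extract the trained NMT model from the joint model and continue
    # training it on the second chapter-level parallel corpus.
    return continue_train_fn(joint_model.nmt_model, second_corpus)
```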
In a third aspect, based on the same inventive concept, an embodiment of the present invention provides a translation model training method, including the following steps: pre-training a pre-training model based on a chapter-level monolingual corpus; and fine-tuning a target combination model according to the pre-trained model to obtain a target chapter translation model corresponding to the target combination model, where the pre-training model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level monolingual corpus, and the target combination model contains a neural machine translation model and a source-end context encoder.
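Likewise, a minimal sketch of this pre-train-then-fine-tune aspect is given below; the attribute names word_emb, pre_encoder and next_encoder, the optimizer settings and the loss interface of the combination model are all hypothetical.

```python
import torch

def finetune_target_combination_model(pretrained, combo_model, corpus, lr=1e-4):
    """Hypothetical fine-tuning of the target combination model from a pre-trained model."""
    # Copy the pre-trained word vectors and context encoders into the source-end
    # context encoder part of the combination model (attribute names are illustrative).
    combo_model.word_emb.load_state_dict(pretrained.word_emb.state_dict())
    combo_model.pre_encoder.load_state_dict(pretrained.pre_encoder.state_dict())
    combo_model.next_encoder.load_state_dict(pretrained.next_encoder.state_dict())

    optimizer = torch.optim.Adam(combo_model.parameters(), lr=lr)
    combo_model.train()
    for batch in corpus:
        loss = combo_model(batch)   # translation loss returned by the model (assumed interface)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return combo_model
```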
In a fourth aspect, based on the same inventive concept, an embodiment of the present invention provides a chapter translator, as shown in fig. 7, including:
The sentence translation unit 701 is configured to: for each sentence to be translated in the chapter to be translated, obtain, through the target chapter translation model, a sentence representation of the sentence to be translated that contains context semantic information, and translate the sentence to be translated based on the sentence representation;
The translation result forming unit 702 is configured to obtain a chapter translation result corresponding to the chapter to be translated according to the translation result of each sentence to be translated in the chapter to be translated.
In a specific embodiment, the apparatus further comprises:
a model training unit, configured to obtain the target chapter translation model by learning sentence representations containing context semantic information from the chapter-level training corpus, where the chapter-level training corpus is a chapter-level parallel corpus and/or a chapter-level monolingual corpus.
In a specific embodiment, the model training unit includes:
a first training unit, configured to, when the chapter-level training corpus is a chapter-level parallel corpus, jointly train a context prediction model and a neural machine translation model based on the chapter-level parallel corpus to obtain a target chapter translation model corresponding to the neural machine translation model, where the context prediction model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level parallel corpus;
or the model training unit includes:
a second training unit, configured to, when the chapter-level training corpus is a chapter-level monolingual corpus, pre-train the pre-training model based on the chapter-level monolingual corpus;
a model fine-tuning unit, configured to fine-tune a target combination model according to the pre-trained model to obtain a target chapter translation model corresponding to the target combination model, where the pre-training model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level monolingual corpus, and the target combination model contains a neural machine translation model and a source-end context encoder.
In a specific embodiment, the first training unit includes:
a joint training subunit, configured to jointly train the neural machine translation model and the context prediction model by using the obtained first chapter-level parallel corpus until a trained joint model is obtained, where a first parallel sentence pair in the first chapter-level parallel corpus includes a current source-end sentence, a source-end context sentence for the current source-end sentence, and a target-end sentence;
a model extraction subunit, configured to extract a trained neural machine translation model from the trained joint model;
a continued training subunit, configured to continue training the trained neural machine translation model by using the obtained second chapter-level parallel corpus until a target chapter translation model corresponding to the neural machine translation model is obtained, where a second parallel sentence pair in the second chapter-level parallel corpus includes a current source-end sentence and a target-end sentence corresponding to the current source-end sentence.
In a specific embodiment, the neural machine translation model and the context prediction model share the same source-end encoder; the neural machine translation model further includes a target-end decoder, and the context prediction model further includes a source-end context decoder.
In a specific embodiment, the joint training subunit is configured to perform a plurality of joint training iterations on the neural machine translation model and the context prediction model; in any one of the joint training iterations, the joint training subunit is specifically configured to:
encode, by the source-end encoder, a current source-end sentence of a first parallel sentence pair in the first chapter-level parallel corpus;
decode the encoding result of the same source-end encoder through the target-end decoder and the source-end context decoder respectively, so as to predict a target-end sentence and a source-end context sentence corresponding to the current source-end sentence;
determine a joint loss gradient according to the predicted source-end context sentence, the predicted target-end sentence, and the first parallel sentence pair;
update the model parameters of the neural machine translation model and the model parameters of the context prediction model based on the joint loss gradient.
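A minimal sketch of one such joint iteration is given below, assuming teacher forcing, a tunable weighting of the two losses, and decoders that take the shared encoder output as memory; none of these details are specified by the embodiment itself.

```python
import torch.nn.functional as F

def joint_training_step(src_encoder, tgt_decoder, ctx_decoder, optimizer,
                        src_ids, tgt_ids, ctx_ids, ctx_weight=1.0):
    """One joint iteration on a first parallel sentence pair (batch-first tensors)."""
    memory = src_encoder(src_ids)                       # shared source-end encoding, computed once

    # Both decoders read the same encoding result (teacher forcing is assumed).
    tgt_logits = tgt_decoder(tgt_ids[:, :-1], memory)   # predict the target-end sentence
    ctx_logits = ctx_decoder(ctx_ids[:, :-1], memory)   # predict the source-end context sentence

    # Joint loss over both predictions against the first parallel sentence pair.
    tgt_loss = F.cross_entropy(tgt_logits.transpose(1, 2), tgt_ids[:, 1:])
    ctx_loss = F.cross_entropy(ctx_logits.transpose(1, 2), ctx_ids[:, 1:])
    loss = tgt_loss + ctx_weight * ctx_loss

    optimizer.zero_grad()
    loss.backward()                                     # joint loss gradient
    optimizer.step()                                    # updates both models' parameters
    return loss.item()
```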
In a specific embodiment, the apparatus further comprises:
a first pre-training unit, configured to train a pre-training model by using the chapter-level monolingual corpus to obtain a pre-trained model, where the pre-training model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level monolingual corpus;
an initialization unit, configured to initialize the neural machine translation model and the context prediction model based on the pre-trained model.
In a specific embodiment, the model fine tuning unit includes:
an initialization subunit, configured to initialize the target combination model based on the pre-trained model;
a fine-tuning training unit, configured to train the target combination model according to the chapter-level monolingual corpus to obtain a target chapter translation model corresponding to the target combination model.
In a fifth aspect, based on the same inventive concept, an embodiment of the present invention provides a translation model training apparatus, including:
a first training unit, configured to jointly train a context prediction model and a neural machine translation model based on the chapter-level parallel corpus to obtain a target chapter translation model corresponding to the neural machine translation model, where the context prediction model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level parallel corpus.
In a specific embodiment, the first training unit includes:
a joint training subunit, configured to jointly train the neural machine translation model and the context prediction model by using the obtained first chapter-level parallel corpus until a trained joint model is obtained, where a first parallel sentence pair in the first chapter-level parallel corpus includes a current source-end sentence, a source-end context sentence for the current source-end sentence, and a target-end sentence;
a model extraction subunit, configured to extract a trained neural machine translation model from the trained joint model;
a continued training subunit, configured to continue training the trained neural machine translation model by using the obtained second chapter-level parallel corpus until a target chapter translation model corresponding to the neural machine translation model is obtained, where a second parallel sentence pair in the second chapter-level parallel corpus includes a current source-end sentence and a target-end sentence corresponding to the current source-end sentence.
In a specific embodiment, the apparatus further comprises:
a first pre-training unit, configured to train a pre-training model by using a chapter-level monolingual corpus to obtain a pre-trained model, where the pre-training model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level monolingual corpus;
an initialization unit, configured to initialize the neural machine translation model and the context prediction model based on the pre-trained model.
In a sixth aspect, based on the same inventive concept, an embodiment of the present invention provides a translation model training apparatus, including:
an initialization subunit, configured to initialize a target combination model based on a pre-trained model;
a fine-tuning training unit, configured to train the target combination model according to a chapter-level monolingual corpus to obtain a target chapter translation model corresponding to the target combination model.
With regard to the above-mentioned apparatuses, the specific functions of each unit have been described in detail in the foregoing method embodiments and will not be repeated here; for the specific implementation process, reference may be made to the embodiment of the chapter translation method provided in the first aspect.
In a seventh aspect, based on the same inventive concept as the foregoing chapter translation method embodiments, an embodiment of the present specification further provides an electronic device. Fig. 8 is a block diagram illustrating an electronic device 800 according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, exercise equipment, a personal digital assistant, or the like.
Referring to fig. 8, device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power component 806 provides power to the various components of device 800. Power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800; the sensor assembly 814 may also detect a change in the position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
Communications component 816 is configured to facilitate communications between device 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
There is also provided a non-transitory computer-readable storage medium having instructions stored thereon which, when executed by a processor of a mobile terminal, enable the device 800 to perform the translation model training method or the chapter translation method according to any of the foregoing embodiments.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present invention is defined only by the appended claims, which are not intended to limit the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for translating chapters is characterized by comprising the following steps:
for each sentence to be translated in the chapter to be translated, obtaining, through a target chapter translation model, a sentence representation of the sentence to be translated that contains context semantic information, and translating the sentence to be translated based on the sentence representation;
and obtaining a chapter translation result corresponding to the chapter to be translated according to the translation result of each sentence to be translated in the chapter to be translated.
2. The method of claim 1, wherein the target chapter translation model is obtained by learning sentence representations containing context semantic information from a chapter-level training corpus, wherein the chapter-level training corpus is a chapter-level parallel corpus and/or a chapter-level monolingual corpus.
3. The method of claim 2, wherein obtaining the target chapter translation model by learning sentence representations containing context semantic information from the chapter-level training corpus comprises:
when the chapter-level training corpus is a chapter-level parallel corpus, jointly training a context prediction model and a neural machine translation model based on the chapter-level parallel corpus to obtain a target chapter translation model corresponding to the neural machine translation model, wherein the context prediction model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level parallel corpus; or
when the chapter-level training corpus is a chapter-level monolingual corpus, pre-training a pre-training model based on the chapter-level monolingual corpus, and fine-tuning a target combination model according to the pre-trained model to obtain a target chapter translation model corresponding to the target combination model, wherein the pre-training model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level monolingual corpus, and the target combination model contains a neural machine translation model and a source-end context encoder.
4. A translation model training method is characterized by comprising the following steps:
jointly training a context prediction model and a neural machine translation model based on chapter-level parallel corpora to obtain a target chapter translation model corresponding to the neural machine translation model, wherein the context prediction model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level parallel corpora.
5. A translation model training method is characterized by comprising the following steps:
pre-training a pre-training model based on a chapter-level monolingual corpus;
and fine-tuning a target combination model according to the pre-trained model to obtain a target chapter translation model corresponding to the target combination model, wherein the pre-training model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level monolingual corpus, and the target combination model contains a neural machine translation model and a source-end context encoder.
6. A chapter translator, comprising:
a sentence translation unit, configured to obtain, for each sentence to be translated in the chapter to be translated, a sentence representation of the sentence to be translated that contains context semantic information through the target chapter translation model, and to translate the sentence to be translated based on the sentence representation;
and a translation result forming unit, configured to obtain a chapter translation result corresponding to the chapter to be translated according to the translation result of each sentence to be translated in the chapter to be translated.
7. A translation model training apparatus, comprising:
a first training unit, configured to jointly train a context prediction model and a neural machine translation model based on a chapter-level parallel corpus to obtain a target chapter translation model corresponding to the neural machine translation model, wherein the context prediction model is used for learning sentence representations containing context semantic information from source-end sentences of the chapter-level parallel corpus.
8. A translation model training apparatus, comprising:
an initialization subunit, configured to initialize a target combination model based on a pre-trained model;
and a fine-tuning training unit, configured to train the target combination model according to a chapter-level monolingual corpus to obtain a target chapter translation model corresponding to the target combination model.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 5.
10. An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors to perform the method according to any one of claims 1 to 5.
CN202010763386.7A 2020-07-31 2020-07-31 Chapter-level translation method, translation model training method and device Pending CN114065778A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010763386.7A CN114065778A (en) 2020-07-31 2020-07-31 Chapter-level translation method, translation model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010763386.7A CN114065778A (en) 2020-07-31 2020-07-31 Chapter-level translation method, translation model training method and device

Publications (1)

Publication Number Publication Date
CN114065778A true CN114065778A (en) 2022-02-18

Family

ID=80227923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010763386.7A Pending CN114065778A (en) 2020-07-31 2020-07-31 Chapter-level translation method, translation model training method and device

Country Status (1)

Country Link
CN (1) CN114065778A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580439A (en) * 2022-02-22 2022-06-03 北京百度网讯科技有限公司 Translation model training method, translation device, translation equipment and storage medium
CN114580439B (en) * 2022-02-22 2023-04-18 北京百度网讯科技有限公司 Translation model training method, translation device, translation equipment and storage medium
CN114742077A (en) * 2022-04-15 2022-07-12 中国电子科技集团公司第十研究所 Generation method of domain parallel corpus and training method of translation model
CN115114939A (en) * 2022-04-28 2022-09-27 腾讯科技(深圳)有限公司 Translation model training method, sentence translation method, device, equipment and program
CN115114939B (en) * 2022-04-28 2024-03-22 腾讯科技(深圳)有限公司 Training method of translation model, sentence translation method, sentence translation device, sentence translation equipment and sentence translation program
CN114707520A (en) * 2022-06-06 2022-07-05 天津大学 Session-oriented semantic dependency analysis method and device
CN114707520B (en) * 2022-06-06 2022-09-13 天津大学 Session-oriented semantic dependency analysis method and device
CN116882423A (en) * 2023-09-06 2023-10-13 中国科学院自动化研究所 Text translation method, device, electronic equipment and storage medium
CN116882423B (en) * 2023-09-06 2023-11-17 中国科学院自动化研究所 Text translation method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN114065778A (en) Chapter-level translation method, translation model training method and device
CN107291690B (en) Punctuation adding method and device and punctuation adding device
CN107221330B (en) Punctuation adding method and device and punctuation adding device
CN111524521B (en) Voiceprint extraction model training method, voiceprint recognition method, voiceprint extraction model training device and voiceprint recognition device
CN110210310B (en) Video processing method and device for video processing
CN111612070B (en) Image description generation method and device based on scene graph
CN107274903B (en) Text processing method and device for text processing
CN111126079B (en) Neural network model compression method, device and storage medium for machine translation
CN111242303B (en) Network training method and device, and image processing method and device
CN108304412B (en) Cross-language search method and device for cross-language search
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN113362812B (en) Voice recognition method and device and electronic equipment
CN108628819B (en) Processing method and device for processing
CN112183119A (en) Machine translation method, device and storage medium
CN113673261A (en) Data generation method and device and readable storage medium
CN108733657B (en) Attention parameter correction method and device in neural machine translation and electronic equipment
CN112036195A (en) Machine translation method, device and storage medium
CN111832322A (en) Statement translation method and device, electronic equipment and storage medium
CN112199963A (en) Text processing method and device and text processing device
CN112035651A (en) Sentence completion method and device and computer-readable storage medium
CN109979435B (en) Data processing method and device for data processing
CN109977424B (en) Training method and device for machine translation model
CN110781674A (en) Information processing method and device, computer equipment and storage medium
CN113326706A (en) Cross-language retrieval method and device and electronic equipment
CN113343720A (en) Subtitle translation method and device for subtitle translation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination