CN114580446A - Neural machine translation method and device based on document context

Neural machine translation method and device based on document context

Info

Publication number
CN114580446A
CN114580446A (application CN202210254752.5A)
Authority
CN
China
Prior art keywords: context, machine translation, sentence, neural machine, source language
Prior art date
Legal status
Pending
Application number
CN202210254752.5A
Other languages
Chinese (zh)
Inventor
张磊
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202210254752.5A
Publication of CN114580446A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language
    • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The present disclosure provides a neural machine translation method based on document context, which can be applied in the field of artificial intelligence or other fields. The neural machine translation method comprises: obtaining a source language sentence; selecting a plurality of candidate auxiliary sentences from the context of the source language sentence; combining the source language sentence with each of the candidate auxiliary sentences and inputting the combinations into a context scorer to obtain a plurality of sets of scores; determining a target auxiliary sentence from the plurality of candidate auxiliary sentences according to the plurality of sets of scores, wherein the target auxiliary sentence is used for assisting in translating the source language sentence; and performing neural machine translation using the source language sentence and the target auxiliary sentence. The present disclosure also provides a system, device, storage medium and program product for document context-based neural machine translation.

Description

Neural machine translation method and device based on document context
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a neural machine translation method, apparatus, device, medium, and program product based on document context.
Background
Although Neural Machine Translation (NMT) has developed considerably in recent years, the standard NMT model does not consider the associations between individual sentences when translating an entire article, but instead translates each sentence in the article independently. Many document-level NMT (DocNMT) methods have therefore been proposed. These methods use attention-based neural networks to extract textual information between the sentences of a source- or target-language document to improve the translation quality of the document. However, most existing DocNMT methods do not consider that different source language sentences require textual information to be extracted from different contexts during translation; as a result, useful context information is easily lost, or the document translation results are inaccurate due to interference from redundant context information.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a document context-based neural machine translation method, apparatus, device, medium, and program product.
According to a first aspect of the present disclosure, there is provided a document context-based neural machine translation method, comprising: obtaining a source language sentence; selecting a plurality of candidate auxiliary sentences from the context of the source language sentence; combining the source language sentence with each of the candidate auxiliary sentences and inputting the combinations into a context scorer to obtain a plurality of sets of scores; determining a target auxiliary sentence from the plurality of candidate auxiliary sentences according to the plurality of sets of scores, wherein the target auxiliary sentence is used for assisting in translating the source language sentence; and performing neural machine translation using the source language sentence and the target auxiliary sentence.
According to an embodiment of the present disclosure, combining the source language sentence with the plurality of candidate auxiliary sentences and inputting the combinations into the context scorer comprises: connecting the source language sentence with each of the candidate auxiliary sentences using special symbolic marks to form a plurality of instances; and inputting the instances into the context scorer, the context scorer comprising two stages of Transformer encoders, wherein the output of the first-stage Transformer encoder serves as the input of the second-stage Transformer encoder.
According to an embodiment of the present disclosure, combining the source language sentence with the plurality of candidate auxiliary sentences and inputting the combinations into the context scorer further comprises: connecting the source language sentence with an empty text sentence using a special symbolic mark to form a first instance; and inputting the first instance into the context scorer.
According to an embodiment of the present disclosure, combining the source language sentence with the plurality of candidate auxiliary sentences and inputting the combinations into the context scorer to obtain the plurality of sets of scores comprises: calculating the attention score of each group with a scoring function according to the output of the second-stage Transformer encoder; and normalizing the attention scores of each group to obtain an attention probability distribution.
According to an embodiment of the present disclosure, determining the target auxiliary sentence from the plurality of candidate auxiliary sentences according to the plurality of sets of scores comprises: determining, from among the plurality of candidate auxiliary sentences, a candidate auxiliary sentence whose probability is higher than a first probability as the target auxiliary sentence, wherein the first probability is the attention probability corresponding to the first instance.
According to an embodiment of the present disclosure, before combining the source language sentence with a plurality of candidate auxiliary sentences respectively and inputting the combined source language sentence to the context scorer, the method further comprises: performing reinforcement learning training on the context scorer and the neural machine translation model; the neural machine translation model comprises two document-level neural machine translation models sharing training parameters.
According to the embodiment of the disclosure, the training of the context scorer and the neural machine translation model for reinforcement learning comprises the following steps: respectively inputting any one of the candidate auxiliary sentences and the current target auxiliary sentence into two document-level neural machine translation models; translating by using a document-level neural machine translation model, and respectively obtaining a first reward value and a second reward value according to a translation result; and performing gradient calculation according to the first reward value and the second reward value, and feeding the gradient values back to the context scorer and the neural machine translation model for training respectively.
According to an embodiment of the present disclosure, translating using a document-level neural machine translation model includes: evaluating the translation effect using the BLEU algorithm score.
According to the embodiment of the disclosure, the training of the context scorer and the neural machine translation model for reinforcement learning comprises the following steps: in reinforcement learning training, a negative log-likelihood function is used as a loss function between a source language sentence and a translated target language sentence.
According to the embodiment of the disclosure, the reinforcement learning training of the context scorer and the neural machine translation model comprises: combining the negative log-likelihood function used to train the neural machine translation model with the reinforcement learning objective function through a balance factor to obtain a combined loss function.
A second aspect of the present disclosure provides a system for document context-based neural machine translation, comprising: the obtaining module is used for obtaining a source language sentence; a selecting module for selecting a plurality of candidate auxiliary sentences from the context of the source language sentence; the scoring module is used for respectively combining the source language sentences with the candidate auxiliary sentences and inputting the combination into the context scorer to obtain a plurality of groups of scores; a determining module, configured to determine a target auxiliary sentence from the plurality of candidate auxiliary sentences according to the plurality of sets of scores, where the target auxiliary sentence is used to assist in translating the source language sentence; and the translation module is used for performing neural machine translation by using the source language sentence and the target auxiliary sentence.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the document context based neural machine translation method described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described document context-based neural machine translation method.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described document context-based neural machine translation method.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a document context based neural machine translation method, apparatus, device, medium, and program product, in accordance with embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a document context based neural machine translation method in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a first method of combining a source language sentence with a plurality of candidate auxiliary sentences and inputting the combinations into the context scorer, according to an embodiment of the present disclosure;
FIG. 4 schematically shows a structural diagram of a context scorer according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a second method of combining a source language sentence with a plurality of candidate auxiliary sentences and inputting the combinations into the context scorer, according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method for combining a source language sentence with a plurality of candidate auxiliary sentences and inputting the combination to a context scorer to obtain a plurality of sets of scores according to an embodiment of the disclosure;
FIG. 7 schematically illustrates a flow diagram of a reinforcement learning training process in accordance with an embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow chart of a method of reinforcement learning training of the context scorer and the neural machine translation model, in accordance with an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a system for document context based neural machine translation, in accordance with an embodiment of the present disclosure; and
fig. 10 schematically shows a block diagram of an electronic device adapted to implement the above described method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
Based on the defects of the traditional neural machine translation method, the embodiment of the disclosure provides a neural machine translation method, a device, equipment, a medium and a program product based on document context, which are applied to the field of artificial intelligence and can effectively improve the accuracy of a translation result.
Fig. 1 schematically illustrates an exemplary system architecture 100 that may be applied to a document context-based neural machine translation method in accordance with an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the neural machine translation method based on document context provided by the embodiments of the present disclosure may be generally executed by the terminal devices 101, 102, 103 and the server 105. Accordingly, the document context based neural machine translation system provided by the embodiments of the present disclosure may be generally disposed in the terminal devices 101, 102, 103 and the server 105. The document context based neural machine translation method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the document context based neural machine translation system provided by the embodiments of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The document context-based neural machine translation method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 10 based on the scenario described in fig. 1.
FIG. 2 schematically illustrates a flow diagram of a document context based neural machine translation method in accordance with an embodiment of the present disclosure.
As shown in FIG. 2, the document context based neural machine translation method 200 may include operations S210-S250.
In operation S210, a source language sentence is obtained.
Machine translation is a technique for implementing natural language translation using a computer; neural machine translation is a new generation of translation technology that uses a single neural network to maximize the performance of machine translation, i.e., to translate a source language sentence X = {x_1, …, x_I} into a target language sentence Y = {y_1, …, y_T}.
In operation S220, a plurality of candidate auxiliary sentences are selected from the context in which the source language sentence is located.
A plurality of candidate auxiliary sentences are selected from the context according to the position of the current source language sentence in the document; for example, the candidate auxiliary sentences may be the preceding n sentences, the following m sentences, or even the whole document. A window-based selection is sketched below.
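For illustration only, the window-based candidate selection described above might look as follows in Python; the function name and the window parameters are illustrative assumptions rather than part of the claimed method:

```python
def select_candidates(document, index, n_prev=2, m_next=2):
    """Collect candidate auxiliary sentences around the sentence at `index`.

    `document` is a list of sentences; the window sizes are illustrative
    hyper-parameters (the method also allows taking the whole document).
    """
    preceding = document[max(0, index - n_prev):index]
    following = document[index + 1:index + 1 + m_next]
    return preceding + following


# Usage: candidates for the third sentence of a five-sentence document.
doc = ["s0", "s1", "s2", "s3", "s4"]
print(select_candidates(doc, 2))  # ['s0', 's1', 's3', 's4']
```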
In operation S230, the source language sentence is respectively combined with a plurality of candidate auxiliary sentences and input to the context scorer, so as to obtain a plurality of sets of scores.
The context scorer scores the reference value of each candidate auxiliary sentence according to the current source language sentence X.
In operation S240, a target auxiliary sentence for assisting in translating the source language sentence is determined from the plurality of candidate auxiliary sentences according to the plurality of sets of scores.
A probability-first strategy is used to select the useful context sentences, i.e., the target auxiliary sentences, based on the reference value scores.
In operation S250, neural machine translation is performed using the source language sentence and the target auxiliary sentence.
And inputting the source language sentence and the selected target auxiliary sentence into the DocNMT model to obtain a translation result.
The method can dynamically select the context sentences, so that when the document is translated, the information in the context sentences can be fully utilized, the problem of insufficient or redundant context information is avoided, and the accuracy of the translation result is improved.
FIG. 3 schematically illustrates a flow chart of a first method of combining a source language sentence with a plurality of candidate auxiliary sentences and inputting the combinations into the context scorer, according to an embodiment of the present disclosure.
As shown in FIG. 3, combining the source language sentence with the plurality of candidate auxiliary sentences and inputting the combinations into the context scorer may include operations S2311 to S2312.
In operation S2311, a source language sentence is connected with a plurality of candidate auxiliary sentences, respectively, using special symbolic labels to construct a plurality of instances.
The boundaries of the source language sentence are marked with special symbols, for example by adding a document tag at the beginning of the source sentence as an additional mark; meanwhile, the source language sentence and each candidate auxiliary sentence are connected by a special symbolic mark placed between the two sentences, forming one instance.
In operation S2312, the plurality of instances are input into a context scorer, the context scorer comprising two stages of Transformer encoders, wherein the output of the first-stage Transformer encoder serves as the input of the second-stage Transformer encoder.
FIG. 4 shows the structure of the context scorer of an embodiment of the present disclosure, which mainly includes two stages of Transformer encoders: an L1-layer Transformer encoder and an L2-layer Transformer encoder. The two stages are identical in structure but do not share weights, i.e., the trainable parameters of each stage have their own values, gradients, and updates. Each Transformer encoder layer is divided into two sub-layers: a self-attention layer (Self-Attention) and a feed-forward neural network layer (Feed-Forward). The input of each Transformer encoder layer first flows through the self-attention layer, which helps the encoder attend to other words in the input sentence when encoding a specific word; the output of the self-attention layer is then fed to the feed-forward neural network layer, and exactly the same feed-forward network is applied independently to the word at each position of an input instance. For one input instance, after the L1-layer Transformer encoder outputs the hidden state of the context sentence, this hidden state is used as the input of the L2-layer Transformer encoder. The role of the L2-layer Transformer encoder is to model the association information between context sentences.
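For illustration, the two-stage encoder described above can be sketched with PyTorch's built-in Transformer modules; all dimensions, layer counts and names below are illustrative assumptions, not the disclosed implementation:

```python
import torch
import torch.nn as nn

class ContextScorerEncoder(nn.Module):
    """Sketch of the two-stage Transformer encoder described above.

    Stage 1 encodes each (source, candidate) instance independently; the
    hidden state at the leading "<DCS>" position summarizes the instance.
    Stage 2 runs self-attention over those summaries to model associations
    between context sentences. The two stages do not share weights.
    """

    def __init__(self, vocab_size=32000, d_model=512, n_heads=8,
                 l1_layers=6, l2_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        def make_layer():
            return nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=2048, batch_first=True)

        self.stage1 = nn.TransformerEncoder(make_layer(), num_layers=l1_layers)
        self.stage2 = nn.TransformerEncoder(make_layer(), num_layers=l2_layers)

    def forward(self, instances):
        # instances: (num_instances, seq_len) token ids, "<DCS>" at position 0.
        h = self.stage1(self.embed(instances))   # per-instance encoding
        dcs = h[:, 0, :]                         # "<DCS>" hidden states
        # Treat the instance summaries as one sequence so that stage 2 can
        # model dependencies between the candidate context sentences.
        return self.stage2(dcs.unsqueeze(0)).squeeze(0)  # (num_instances, d)

# Usage: 4 instances (3 candidates plus the empty one), 16 tokens each.
enc = ContextScorerEncoder()
print(enc(torch.randint(0, 32000, (4, 16))).shape)  # torch.Size([4, 512])
```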
FIG. 5 schematically illustrates a flow chart of a second method of combining a source language sentence with a plurality of candidate auxiliary sentences and inputting the combinations into the context scorer, according to an embodiment of the present disclosure.
As shown in FIG. 5, this method of combining the source language sentence with the plurality of candidate auxiliary sentences and inputting the combinations into the context scorer may include operations S2321-S2322.
In operation S2321, a first instance is constructed by concatenating the source language sentence with the empty text sentence using the special notation.
In operation S2322, the first instance is input into the context scorer.
As shown in FIG. 4, "NON" represents a special empty text sentence; it is combined with the source language sentence into a special instance that represents translating the source language sentence without context, and it is used to assist the selection policy in making decisions.
FIG. 6 schematically illustrates a flow chart of a method for combining a source language sentence with a plurality of candidate auxiliary sentences and inputting the combination to a context scorer to obtain a plurality of sets of scores according to an embodiment of the disclosure.
As shown in FIG. 6, obtaining multiple sets of scores based on combining the source language sentence with multiple candidate auxiliary sentences and inputting the combination to the context scorer may include operations S2331-S2332.
In operation S2331, an attention score for each group is calculated using a scoring function based on the output of the second-stage Transformer encoder.
In operation S2332, the attention scores of each group are normalized to obtain an attention probability distribution.
The output of the second-stage Transformer encoder is input into a two-layer MLP to obtain a score for the degree of association between the context sentence and the sentence to be translated. The score computed by the scoring function is the attention score of each group; the higher the attention score, the greater the influence of the encoder hidden state on the next translated word. The scores of all the instances, formed by connecting the source language sentence with each candidate auxiliary sentence and with the empty text sentence, respectively, are then normalized to obtain the attention probability distribution.
On the basis of the above embodiment, determining the target assist sentence from the plurality of candidate assist sentences according to the plurality of sets of scores includes: determining a candidate sentence having a probability higher than the first probability as a target sentence from among the plurality of candidate sentences; wherein the first probability is the attention probability corresponding to the first instance.
A probability-first selection strategy is adopted to determine the context sentences useful for subsequent translation according to the obtained attention probability distribution. Specifically, the context sentences whose probability is higher than the probability corresponding to the empty-text-sentence instance may be selected. If the probability corresponding to the empty-text-sentence instance is the highest, the number of context sentences is set to 0, indicating that no context sentence can assist the translation of the current source language sentence. As the source language sentence changes, the number and positions of the selected context sentences also change dynamically. The number of context sentences ranges over [0, |S|], and it may further be set to a fixed value, i.e., the fixed number of context sentences with the highest probabilities is selected. In the present disclosure, the candidate context sentences are scored and selected through the context scorer and the selection strategy, so that context sentences containing useful information are obtained and the problem of insufficient or redundant context information is alleviated; a sketch of this selection strategy is given below.
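A minimal sketch of the probability-first selection, assuming (purely as a convention of this example) that the empty-text instance occupies index 0 of the probability vector:

```python
import torch

def probability_first_select(p_select, candidates, k_fixed=None):
    """Select context sentences whose attention probability exceeds that of
    the empty-text instance, assumed to sit at index 0 of `p_select`.

    If `k_fixed` is given, the k most probable candidates are selected
    instead, mirroring the fixed-size variant described above.
    """
    if k_fixed is not None:
        top = torch.topk(p_select[1:], k=min(k_fixed, len(candidates)))
        return [candidates[i] for i in top.indices.tolist()]
    p_empty = p_select[0]
    return [c for i, c in enumerate(candidates) if p_select[i + 1] > p_empty]

# Usage: the empty instance gets 0.30, so only the candidate with
# probability 0.40 is selected; if the empty instance were the most
# probable, the selection would be empty.
p = torch.tensor([0.30, 0.40, 0.05, 0.25])
print(probability_first_select(p, ["prev2", "prev1", "next1"]))  # ['prev2']
```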
On the basis of the above embodiment, before combining the source language sentence with a plurality of candidate auxiliary sentences and inputting the source language sentence to the context scorer, the method further includes: performing reinforcement learning training on the context scorer and the neural machine translation model; wherein the neural machine translation model comprises two document-level neural machine translation models sharing training parameters.
In the stage of training the model, a policy gradient method is used to train the context scorer and the DocNMT model, and the reinforcement learning training process is shown in FIG. 7. The context scorer and the DocNMT model can generate sentences closer to standard translation results through reinforcement learning training.
Fig. 8 schematically illustrates a flow chart of a method of training a context scorer, a neural machine translation model for reinforcement learning, in accordance with an embodiment of the present disclosure.
As shown in FIG. 8, the reinforcement learning training of the context scorer and the neural machine translation model may include operations S2301-S2303.
In operation S2301, any one of the candidate auxiliary sentences and the current target auxiliary sentence are input into two document-level neural machine translation models, respectively.
In operation S2302, a document-level neural machine translation model is used for translation, and a first reward value and a second reward value are respectively obtained according to the translation results.
In operation S2303, a gradient calculation is performed according to the first reward value and the second reward value, and the gradient values are fed back to the context scorer and the neural machine translation model for training, respectively.
In each step of reinforcement learning, the current signal and stimulus affect the subsequent signals and stimuli, and the reward mechanism of reinforcement learning feeds back a reward value (Reward) according to the obtained translation result, so that the translation quality evolves in a favorable direction. By integrating the reinforcement learning mechanism into model training, the translation results become more accurate and reliable.
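For illustration, "two document-level models sharing training parameters" can be realized as a single module invoked twice, so that the gradients derived from the two reward values flow into one set of weights; the linear layer below is only a toy stand-in for the DocNMT model:

```python
import torch
import torch.nn as nn

doc_nmt = nn.Linear(16, 16)  # toy stand-in for the shared DocNMT model

src = torch.randn(1, 16)
out_best = doc_nmt(src)           # pass with the current target auxiliary sentence
out_sampled = doc_nmt(src * 0.5)  # pass with a sampled candidate auxiliary sentence

# Both outputs depend on the same parameters, so backpropagating the two
# (reward-weighted) losses together accumulates gradients into one weight set.
(out_best.sum() + out_sampled.sum()).backward()
print(doc_nmt.weight.grad.shape)  # torch.Size([16, 16])
```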
On the basis of the above embodiment, translating using the document-level neural machine translation model includes: evaluating the translation effect using the BLEU algorithm score.
The BLEU algorithm is a benchmark for evaluating machine translation at the present stage. It compares the translation to be evaluated with the provided reference translation: the more N-grams (a type of statistical language model, including unigrams, bigrams, trigrams, 4-grams, and so on) co-occur between the translation to be evaluated and the reference translation, the more similar the two are, and the higher the quality of the machine translation result.
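As an illustration of such an evaluation, the sketch below uses the third-party sacrebleu package; the choice of library is an assumption of this example, since the disclosure does not prescribe a particular BLEU implementation:

```python
# pip install sacrebleu
import sacrebleu

hypotheses = ["the cat sat on the mat"]           # system output
references = [["the cat is sitting on the mat"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(round(bleu.score, 2))  # corpus-level BLEU on a 0-100 scale
```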
On the basis of the above embodiment, in the reinforcement learning training, a negative log-likelihood function is used as a loss function between the source language sentence and the translated target language sentence.
The goal of training the document-level translation model is to minimize the negative log-likelihood; the loss function is, for example:

L_mle(θ) = −∑_{t=1}^{T} log P(y_t | y_{<t}, X, Z; θ)

where X = {x_1, …, x_I} is the source language sentence, Y = {y_1, …, y_T} is the target language sentence obtained by translation, Z is a subset of the candidate context sentence set S, x_I is the I-th word in X, y_T is the T-th word in Y, y_t is the target language word translated at time t, θ is the set of model parameters, and T is the number of words in the target sequence Y.
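A minimal PyTorch sketch of this negative log-likelihood objective; the tensor shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def mle_loss(logits, target):
    """Negative log-likelihood of the loss above, averaged over the T target
    positions; `logits` holds unnormalized decoder outputs of shape
    (T, vocab) and `target` holds the reference token ids of shape (T,)."""
    return F.cross_entropy(logits, target)

# Usage with dummy tensors: T = 5 target positions, vocabulary of 100 words.
logits = torch.randn(5, 100, requires_grad=True)
target = torch.randint(0, 100, (5,))
mle_loss(logits, target).backward()
```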
Training loss functions for the document-level translation model include, for example, a cross-entropy loss function, a semantic loss function, and a word loss function; the cross-entropy loss function is the cross-entropy loss between the target translation and the final translation of the current source language sentence, and the goal of training is to minimize this cross-entropy loss function.
On the basis of the above embodiment, the training of the context scorer and the neural machine translation model for reinforcement learning includes:
and combining the negative log-likelihood function trained by the neural machine translation model with the reinforcement learning objective function through balance factors to obtain a combined loss function.
Specifically, the combined loss function L(θ) is, for example:

L(θ) = α · L_mle + (1 − α) · L_rl

where L_mle is the training objective function of the neural machine translation model, L_rl is the reinforcement learning objective function, and α is the balance factor.
The model training objective and the reinforcement learning objective are balanced through the value of α, so that the benefit of the model is maximized.
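In code, the combination itself is a single line; the value of the balance factor used below is an illustrative assumption:

```python
def combined_loss(l_mle, l_rl, alpha=0.5):
    """L(theta) = alpha * L_mle + (1 - alpha) * L_rl, with alpha the
    balance factor; its value here is illustrative, not prescribed."""
    return alpha * l_mle + (1 - alpha) * l_rl

print(combined_loss(2.0, 1.0, alpha=0.7))  # 1.7
```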
The present disclosure is further illustrated by the following detailed description. The neural machine translation method and system described above are specifically described in the following examples. However, the following examples are merely illustrative of the present disclosure, and the scope of the present disclosure is not limited thereto.
The DocNMT model uses inter-sentence information Z to translate a source language sentence X = {x_1, …, x_I} into a target language sentence Y = {y_1, …, y_T}, where Z is a subset of the candidate context sentence set S. The information Z is primarily derived from the context sentences of the document in which the source language sentence is located. The training criterion of the model is to minimize the negative log-likelihood, with the loss function given in equation (1):

L_mle(θ) = −∑_{t=1}^{T} log P(y_t | y_{<t}, X, Z; θ)    (1)
the method of the embodiment translates the sentence X in a document and mainly comprises three steps: in the first step, text that is helpful in translating statement X is selected from the context using a selection module. The core component of the selection module is a context scorer that scores the reference value of each candidate context sentence according to the current translated sentence X. In a second step, a probabilistic-first policy is used to select useful context statements based on the score. And thirdly, inputting the selected context statement into the DocNMT model to obtain a translation result.
The structure of the context scorer is shown in Fig. 4. Representations of the sentences must be obtained first, after which scoring is performed. First, the source language sentence and a context sentence are connected to form an instance: a "<DCS>" mark is added at the beginning, and a "<SEP>" mark is added between the two sentences. The instance is input into the L1-layer encoder. Through self-attention, the "<DCS>" mark can effectively encode the information of a context sentence with respect to the source language document. Here "NON" represents a special empty text sentence that is used to help the selection policy make its decision.
For a candidate context sentence z ∈ S, after the L1-layer Transformer encoder outputs the hidden state of "<DCS>", that state is taken as the input of the L2-layer Transformer encoder; the output of the L2-layer Transformer encoder is denoted h_z.
Finally, a two-layer linear scoring network is adopted to compute the degree of association between the context and the sentence to be translated, as shown in equation (2):

Score_z = σ(W_2(W_1 · h_z + b_1) + b_2)    (2)

where h_z is the hidden representation output by the L2-layer encoder for instance z, σ denotes the logistic sigmoid function, and W_1, W_2 and b_1, b_2 are the weights and biases of the two linear layers.
Considering the sampling operation in the training process, all the scores over the candidate set S are normalized to obtain a probability distribution P_select, as shown in equation (3):

P_select = softmax([Score_1; …; Score_|S|])    (3)

where [a; b] denotes the concatenation of the values a and b into a vector, and |S| is the number of sentences in the candidate set S.
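For illustration, equations (2) and (3) can be realized as the following scoring head; the hidden sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ScoringHead(nn.Module):
    """Two-layer linear scoring network of equation (2), followed by the
    softmax normalization of equation (3)."""

    def __init__(self, d_model=512, d_hidden=256):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_hidden)  # W1, b1
        self.w2 = nn.Linear(d_hidden, 1)        # W2, b2

    def forward(self, h):
        # h: (num_instances, d_model) L2-stage outputs, one row per instance.
        scores = torch.sigmoid(self.w2(self.w1(h))).squeeze(-1)  # eq. (2)
        return torch.softmax(scores, dim=-1)                     # eq. (3)

# Usage with random encodings for |S| = 4 instances (incl. the empty one).
head = ScoringHead()
p_select = head(torch.randn(4, 512))
print(float(p_select.sum()))  # 1.0
```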
The embodiment adopts a probability-first selection strategy. According to P_select, useful context sentences can be obtained and used in the translation task. To select context sentences dynamically, a special empty-sentence marker "<NON>" is added to the set of candidate sentences, which represents translating the source language sentence without context. Finally, the context sentences whose probability is higher than the probability of "<NON>" are selected. If the probability of "<NON>" is the highest, the number of context sentences is set to 0. As the translated sentence changes, the selected context sentences also change dynamically. The number of context sentences ranges over [0, |S|]. The number of context sentences may also be set to a fixed value, i.e., the fixed number of context sentences with the highest probabilities is selected.
In the stage of training the model, the embodiment adopts a policy gradient method to train the selection module and the DocNMT model. Fig. 7 illustrates the reinforcement learning training process.
The present embodiment initializes the DocNMT model using the parameters of a standard sentence-level NMT model. For the selection module, the present embodiment initializes the context scorer as a binary classification task, without taking the dependencies between context sentences into account. The two DocNMT models share training parameters.
The initialization process of the context scorer is divided into two steps.
First, the present embodiment creates pseudo labels for the candidate context sentences. Each context sentence is labeled as 1 or 0, and Score_z in equation (2) represents the probability of predicting label 1. The pseudo labels are generated by another DocNMT model trained with random context sentences: the different candidate context sentences are input into that DocNMT model, and a context sentence whose BLEU score is higher than that of "<NON>" is labeled 1, while a context sentence whose BLEU score is lower than that of "<NON>" is labeled 0.
Second, this embodiment trains the context scorer to predict the pseudo labels. The parameters of the embedding layer are shared with the original DocNMT model, and the scorer is trained by minimizing the cross-entropy loss.
The present embodiment employs a reward mechanism that can assess translation quality and is sensitive to context changes for the reinforcement training. For decoding time step t, the present embodiment computes the cost g_t of generating the accurate target word y_t, as shown in equation (4):

g_t = [log P(ŷ_t^(1)) − log P(y_t)] + [1 − (P(ŷ_t^(1)) − P(ŷ_t^(2))) / P(ŷ_t^(1))]    (4)

where the first two terms compute the difference in log probability between the accurate target word y_t and the most probable word ŷ_t^(1) in the predicted probability distribution, and the last term is a normalized form reflecting the gap between the most probable word ŷ_t^(1) and the second most probable word ŷ_t^(2); a greater gap indicates greater confidence in the prediction.
This embodiment obtains the average cost of generating the accurate sentence Y = {y_1, …, y_T}, and then applies a monotonically decreasing function f to obtain the final reward value, which ranges from 0 to 1, as shown in equation (5):

r(Y) = f((1/T) ∑_{t=1}^{T} g_t)    (5)

where f is a monotonically decreasing function mapping the average cost into the range (0, 1].
a larger prize value indicates a greater likelihood of generating an accurate target word. Therefore, selection of the contextual statement should be encouraged. Conversely, if the reward value is low, selection of the contextual statement is discouraged.
The goal of the embodiment's reinforcement training of the model is to minimize the negative expected reward. A single sample u drawn from the policy P is used in the loss function as an approximation, as shown in equation (6):

L_rl = −E_{u∼P}[r(u)] ≈ −r(u),  u ∼ P    (6)
this embodiment introduces a baseline reward r (u ') in the training to reduce the gradient variation, where u' is obtained by the inference algorithm in the test phase, and the final gradient is evaluated by equation (7):
Figure BDA0003546889030000144
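For illustration, a surrogate loss whose autograd gradient matches equation (7) can be written as follows; the numeric values in the usage are illustrative:

```python
import torch

def rl_loss_with_baseline(log_p_u, r_u, r_u_prime):
    """Surrogate loss whose gradient is -(r(u) - r(u')) * grad log P(u),
    matching equation (7); the baseline reward r(u') comes from the
    test-phase inference output and carries no gradient."""
    return -(r_u - r_u_prime) * log_p_u

# Usage: a sampled translation u with log P(u) = -1.2, reward 0.7, and
# baseline reward 0.5 from greedy inference.
log_p = torch.tensor(-1.2, requires_grad=True)
rl_loss_with_baseline(log_p, 0.7, 0.5).backward()
print(log_p.grad)  # tensor(-0.2000): raising log P(u) lowers the loss
```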
the parameters of the context scorer and the DocNMT model are denoted by ω and θ, respectively, in this example. For each of the source language sentences X, a set of context sentences Z is selected by the selection strategy mentioned above*Then according to P in formula (3)selectAnother set of context statements of the same size is sampled
Figure BDA0003546889030000145
Two sets of context statements are input into the same DocNMT model, and the reward value r (Z) can be calculated separately*) And
Figure BDA0003546889030000146
thus, the gradient of the context scorer can be calculated by equation (8):
Figure BDA0003546889030000147
wherein the content of the first and second substances,
Figure BDA0003546889030000148
is from PselectSampling out of
Figure BDA00035468890300001410
The probability of (c). Basic prize value r (Z)*) Is derived by the current best strategy, this method encourages the model to find more useful contexts with reward values higher than the currently selected best context.
For the DocNMT model, the present embodiment combines the MLE objective function (see equation (1)) and the reinforcement learning objective function (see equation (6)), as shown in equation (9):

L(θ) = α · L_mle + (1 − α) · L_rl    (9)

where α is a balance factor.
The present embodiment introduces the reinforcement learning objective function into the DocNMT model, thereby enabling the model to better utilize the already selected context. The reinforcement learning gradient of the DocNMT model is calculated by equation (10):

∇_θ L_rl ≈ −(r(Z̃) − r(Z*)) ∇_θ log P(ũ | X, Z̃; θ)    (10)

where ũ is the translation result obtained by the current DocNMT model with the sampled context Z̃.
Therefore, the DocNMT model and the text selection module can generate sentences closer to standard translation results through reinforcement learning training.
The method adopts a dynamic selection approach to select context sentences of variable size from the document and applies them to document-level translation; the candidate context sentences are scored and selected through the context scorer and the selection strategy, so that context sentences containing useful information are obtained; and the context scorer and the DocNMT model are trained by reinforcement learning with a policy gradient method. Compared with the traditional DocNMT model, the method can extract the useful information in a document and provides good assistance for accurately translating the sentences in the document. When the disclosed neural machine translation method is applied to the training of a DocNMT model in the financial field, the translation performance of that model can be improved.
FIG. 9 schematically illustrates a block diagram of a system for document context based neural machine translation, in accordance with an embodiment of the present disclosure.
As shown in FIG. 9, the system 900 for document context based neural machine translation includes: an obtaining module 910, a selecting module 920, a scoring module 930, a determining module 940 and a translating module 950.
An obtaining module 910, configured to obtain a source language sentence; according to an embodiment of the present disclosure, the obtaining module 910 may be configured to perform the step S210 described above with reference to fig. 2, for example, and is not described herein again.
A selecting module 920 is used for selecting a plurality of candidate auxiliary sentences from the context of the source language sentence. According to an embodiment of the present disclosure, the selecting module 920 may be configured to perform the step S220 described above with reference to fig. 2, for example, and is not described herein again.
A scoring module 930, configured to combine the source language sentence with multiple candidate auxiliary sentences respectively and input the combined source language sentence to the context scorer, so as to obtain multiple sets of scores. According to an embodiment of the present disclosure, the scoring module 930 may be configured to perform the step S230 described above with reference to fig. 2, for example, and will not be described herein again.
A determining module 940, configured to determine a target auxiliary sentence from the plurality of candidate auxiliary sentences according to the plurality of sets of scores, where the target auxiliary sentence is used for assisting in translating the source language sentence. According to an embodiment of the present disclosure, the determining module 940 may be configured to perform the step S240 described above with reference to fig. 2, for example, and is not described herein again.
A translation module 950 for performing neural machine translation using the source language sentence and the target auxiliary sentence. According to an embodiment of the present disclosure, the translation module 950 may be configured to perform the step S250 described above with reference to fig. 2, for example, and is not described herein again.
It should be noted that any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any plurality of the obtaining module 910, the selecting module 920, the scoring module 930, the determining module 940 and the translating module 950 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 910, the selecting module 920, the scoring module 930, the determining module 940 and the translating module 950 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware and firmware, or implemented by a suitable combination of any several of them. Alternatively, at least one of the obtaining module 910, the selecting module 920, the scoring module 930, the determining module 940 and the translating module 950 may be at least partially implemented as a computer program module, which may perform a corresponding function when executed.
Fig. 10 schematically shows a block diagram of an electronic device adapted to implement the above described method according to an embodiment of the present disclosure. The electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic apparatus 1000 described in this embodiment includes: a processor 1001 which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. Processor 1001 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1001 may also include onboard memory for caching purposes. The processor 1001 may include a single processing unit or multiple processing units for performing different actions of a method flow according to embodiments of the present disclosure.
In the RAM 1003, various programs and data necessary for the operation of the system 1000 are stored. The processor 1001, ROM1002, and RAM 1003 are connected to each other by a bus 1004. The processor 1001 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM1002 and/or the RAM 1003. Note that the program may also be stored in one or more memories other than the ROM1002 and the RAM 1003. The processor 1001 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in one or more memories.
Electronic device 1000 may also include an input/output (I/O) interface 1005, the input/output (I/O) interface 1005 also being connected to bus 1004, according to an embodiment of the present disclosure. The system 1000 may also include one or more of the following components connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The driver 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is mounted into the storage section 1008 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 1009 and/or installed from the removable medium 1011. The computer program performs the above-described functions defined in the system of the embodiment of the present disclosure when executed by the processor 1001. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The embodiments of the present disclosure also provide a computer-readable storage medium, which may be included in the device/apparatus/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer readable storage medium carries one or more programs which, when executed, implement a document context based neural machine translation method in accordance with an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM1002 and/or the RAM 1003 described above and/or one or more memories other than the ROM1002 and the RAM 1003.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the neural machine translation method based on the document context provided by the embodiment of the disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed through the communication section 1009, and/or installed from the removable medium 1011. The program code contained in the computer program may be transmitted using any suitable medium, including but not limited to: wireless, wired, or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that each functional module in each embodiment of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented in the form of a software functional module and sold or used as a separate product, the integrated module may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solutions of the present disclosure that substantially contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, such combinations may be made without departing from the spirit and teaching of the present disclosure, and all such combinations fall within the scope of the present disclosure.
While the disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.

Claims (14)

1. A neural machine translation method based on document context, comprising:
obtaining a source language sentence;
selecting a plurality of candidate auxiliary sentences from the context of the source language sentence;
combining the source language sentence with each of the plurality of candidate auxiliary sentences respectively and inputting the combinations into a context scorer to obtain a plurality of sets of scores;
determining a target auxiliary sentence from the plurality of candidate auxiliary sentences according to the plurality of sets of scores, the target auxiliary sentence being used for assisting in translating the source language sentence;
performing neural machine translation using the source language sentence and the target auxiliary sentence.
2. The document context-based neural machine translation method of claim 1, wherein said combining the source language sentence with the plurality of candidate auxiliary sentences respectively and inputting the combinations into a context scorer comprises:
connecting the source language sentence with each of the candidate auxiliary sentences by using a special symbol mark to form a plurality of instances;
inputting the plurality of instances into the context scorer, the context scorer comprising a two-layer Transformer encoder, wherein the output of the first-layer Transformer encoder is the input to the second-layer Transformer encoder.
3. The document context-based neural machine translation method of claim 2, wherein said combining the source language sentence with the plurality of candidate auxiliary sentences respectively and inputting the combinations into a context scorer further comprises:
connecting the source language sentence with an empty text sentence by using a special symbol mark to form a first instance;
inputting the first instance into the context scorer.
4. The document context-based neural machine translation method of claim 3, wherein said combining the source language sentence with the plurality of candidate auxiliary sentences respectively and inputting the combinations into the context scorer to obtain a plurality of sets of scores comprises:
calculating an attention score for each set by applying a scoring function to the output of the second-layer Transformer encoder;
and normalizing the attention scores of the sets to obtain an attention probability distribution.
5. The document context-based neural machine translation method of claim 4, wherein said determining a target auxiliary sentence from said plurality of candidate auxiliary sentences according to said plurality of sets of scores comprises:
determining, from the plurality of candidate auxiliary sentences, a candidate auxiliary sentence having an attention probability higher than a first probability as the target auxiliary sentence, wherein the first probability is the attention probability corresponding to the first instance.
6. The document context-based neural machine translation method of claim 1, wherein said combining the source language sentence with the plurality of candidate auxiliary sentences respectively and inputting the combinations into a context scorer further comprises:
performing reinforcement learning training on the context scorer and a neural machine translation model, wherein the neural machine translation model comprises two document-level neural machine translation models that share training parameters.
7. The document context-based neural machine translation method of claim 6, wherein said performing reinforcement learning training on the context scorer and the neural machine translation model comprises:
inputting any one of the candidate auxiliary sentences and the current target auxiliary sentence into the two document-level neural machine translation models respectively;
translating by using the document-level neural machine translation models, and obtaining a first reward value and a second reward value respectively according to the translation results;
and performing gradient calculation according to the first reward value and the second reward value, and feeding the gradient values back to the context scorer and the neural machine translation model respectively for training.
8. The document context-based neural machine translation method of claim 7, wherein said translating using the document-level neural machine translation model comprises:
evaluating the translation quality by using the BLEU score.
9. The document context-based neural machine translation method of claim 6, wherein said performing reinforcement learning training on the context scorer and the neural machine translation model comprises:
using, in the reinforcement learning training, a negative log-likelihood function as the loss function between the source language sentence and the translated target language sentence.
10. The document context-based neural machine translation method of claim 9, wherein said performing reinforcement learning training on the context scorer and the neural machine translation model comprises:
combining the negative log-likelihood function of the neural machine translation model training with a reinforcement learning objective function through a balance factor to obtain a combined loss function.
11. A system for neural machine translation based on document context, comprising:
an obtaining module, configured to obtain a source language sentence;
a selecting module, configured to select a plurality of candidate auxiliary sentences from a context in which the source language sentence is located;
a scoring module, configured to combine the source language sentence with the plurality of candidate auxiliary sentences respectively and input the combinations into a context scorer to obtain a plurality of sets of scores;
a determining module for determining a target auxiliary sentence from the plurality of candidate auxiliary sentences according to the plurality of sets of scores, the target auxiliary sentence being used for assisting in translating the source language sentence;
and a translation module, configured to perform neural machine translation using the source language sentence and the target auxiliary sentence.
12. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 10.
13. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 10.
14. A computer program product comprising a computer program which, when executed by a processor, carries out the method according to any one of claims 1 to 10.
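
The claims above recite the scoring pipeline only at the level of steps. As a non-authoritative illustration, the following Python (PyTorch) sketch shows one way the context scorer of claims 2 through 4 might be realized; the separator and padding ids, the model dimensions, the mean pooling, and the linear scoring head are all assumptions, since the patent does not disclose them.

```python
import torch
import torch.nn as nn

def build_instances(src_ids, cand_ids_list, sep_id, pad_id):
    # Row 0 joins the source to an empty sentence (the "first instance" of
    # claim 3); rows 1..K join the source to each candidate auxiliary sentence,
    # connected by a special symbol mark (claim 2).
    rows = [src_ids + [sep_id]] + [src_ids + [sep_id] + c for c in cand_ids_list]
    max_len = max(len(r) for r in rows)
    return torch.tensor([r + [pad_id] * (max_len - len(r)) for r in rows])

class ContextScorer(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=nhead, batch_first=True)
        # Two stacked layers: the first layer's output is the second layer's input.
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d_model, 1)  # assumed form of the scoring function

    def forward(self, instances):
        h = self.encoder(self.embed(instances))   # (num_instances, seq_len, d_model)
        pooled = h.mean(dim=1)                    # pooling choice is assumed
        scores = self.score(pooled).squeeze(-1)   # one attention score per set
        return torch.softmax(scores, dim=0)       # normalized probability distribution
```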
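Continuing the sketch, the selection rule of claim 5 can be read as a simple threshold test: a candidate becomes a target auxiliary sentence only if it receives more attention probability than the empty-sentence first instance. The token ids below are illustrative only.

```python
# Hypothetical token ids; sep_id and pad_id values are illustrative only.
instances = build_instances(src_ids=[5, 6, 7],
                            cand_ids_list=[[8, 9], [10, 11, 12]],
                            sep_id=1, pad_id=0)
scorer = ContextScorer(vocab_size=1000)
probs = scorer(instances)

# Claim 5: candidates whose attention probability exceeds that of the
# empty-sentence first instance (probs[0]) become target auxiliary sentences.
first_prob = probs[0]
selected = [i - 1 for i in range(1, probs.size(0)) if probs[i] > first_prob]
```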
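For the reinforcement learning training of claims 6 through 8, a hedged sketch of a single update step follows. The `translate` callable and its argument names are hypothetical stand-ins for the document-level neural machine translation model (claim 6's two models share parameters, so one callable serves both roles), and the REINFORCE-style difference-of-rewards formulation is an assumption; the claims specify only that two BLEU-based reward values are obtained and their gradients fed back.

```python
import torch
import sacrebleu

def rl_step(probs, idx, candidate, current_target, source, reference, translate):
    hyp_cand = translate(source, context=candidate)       # sampled candidate context
    hyp_curr = translate(source, context=current_target)  # current target context
    # First and second reward values (claim 7): sentence-level BLEU scores
    # of the two translation results against the reference (claim 8).
    r1 = sacrebleu.sentence_bleu(hyp_cand, [reference]).score
    r2 = sacrebleu.sentence_bleu(hyp_curr, [reference]).score
    # Surrogate loss: the reward difference weights the scorer's
    # log-probability of the sampled candidate; calling backward() on it
    # yields the gradients fed back for training (claim 7).
    return -(r1 - r2) * torch.log(probs[idx])
```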
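Finally, a sketch of the combined objective of claims 9 and 10. The cross-entropy realization of the negative log-likelihood and the convex weighting by the balance factor are assumed details; the patent discloses only that the two terms are joined through a balance factor.

```python
import torch.nn.functional as F

def combined_loss(logits, target_ids, rl_loss, balance=0.5):
    # Claim 9: negative log-likelihood of the reference target sentence under
    # the model's translation distribution (cross-entropy over the vocabulary).
    nll = F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))
    # Claim 10: the NLL and the reinforcement-learning objective are combined
    # through a balance factor; the value 0.5 is an assumption, not disclosed.
    return balance * nll + (1.0 - balance) * rl_loss
```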
CN202210254752.5A 2022-03-15 2022-03-15 Neural machine translation method and device based on document context Pending CN114580446A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210254752.5A CN114580446A (en) 2022-03-15 2022-03-15 Neural machine translation method and device based on document context

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210254752.5A CN114580446A (en) 2022-03-15 2022-03-15 Neural machine translation method and device based on document context

Publications (1)

Publication Number Publication Date
CN114580446A true CN114580446A (en) 2022-06-03

Family

ID=81774526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210254752.5A Pending CN114580446A (en) 2022-03-15 2022-03-15 Neural machine translation method and device based on document context

Country Status (1)

Country Link
CN (1) CN114580446A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306704A (en) * 2023-05-17 2023-06-23 鹏城实验室 Chapter-level text machine translation method, system, equipment and medium
CN116306704B (en) * 2023-05-17 2023-08-22 鹏城实验室 Chapter-level text machine translation method, system, equipment and medium

Similar Documents

Publication Publication Date Title
US20200410396A1 (en) Implicit bridging of machine learning tasks
US10592607B2 (en) Iterative alternating neural attention for machine reading
GB2587913A (en) System and method for language translation
JP7335300B2 (en) Knowledge pre-trained model training method, apparatus and electronic equipment
US11954594B1 (en) Training recurrent neural networks to generate sequences
US11636272B2 (en) Hybrid natural language understanding
US20220043982A1 (en) Toxic vector mapping across languages
CN110678882A (en) Selecting answer spans from electronic documents using machine learning
Wen et al. Jointly modeling intent identification and slot filling with contextual and hierarchical information
CN111460224B (en) Comment data quality labeling method, comment data quality labeling device, comment data quality labeling equipment and storage medium
CN114580446A (en) Neural machine translation method and device based on document context
CN113609873A (en) Translation model training method, device and medium
Tran et al. Hierarchical transformer encoders for vietnamese spelling correction
US20200142962A1 (en) Systems and methods for content filtering of publications
Chakraborty et al. A bengali-sylheti rule-based dialect translation system: Proposal and preliminary system
Xie et al. Automatic chinese spelling checking and correction based on character-based pre-trained contextual representations
US11842165B2 (en) Context-based image tag translation
Duan et al. Pinyin as a feature of neural machine translation for Chinese speech recognition error correction
Bai et al. A public Chinese dataset for language model adaptation
US20240078379A1 (en) Attention neural networks with n-grammer layers
CN116579327B (en) Text error correction model training method, text error correction method, device and storage medium
US11663251B2 (en) Question answering approach to semantic parsing of mathematical formulas
US10210153B1 (en) Semiotic class normalization
US20240152540A1 (en) Rapid adaptation to contemporary text datasets
Li et al. Improving Machine Translation and Summarization with the Sinkhorn Divergence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination