CN109271646B - Text translation method and device, readable storage medium and computer equipment - Google Patents

Text translation method and device, readable storage medium and computer equipment

Info

Publication number
CN109271646B
CN109271646B (application CN201811026196.6A)
Authority
CN
China
Prior art keywords
layer
neural network
vector sequence
output
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811026196.6A
Other languages
Chinese (zh)
Other versions
CN109271646A (en)
Inventor
Zhaopeng Tu (涂兆鹏)
Zi-Yi Dou (窦子轶)
Xing Wang (王星)
Shuming Shi (史树明)
Tong Zhang (张潼)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201811026196.6A priority Critical patent/CN109271646B/en
Priority to CN202010164964.5A priority patent/CN111382584B/en
Publication of CN109271646A publication Critical patent/CN109271646A/en
Application granted granted Critical
Publication of CN109271646B publication Critical patent/CN109271646B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a text translation method, a text translation device, a computer readable storage medium and a computer device, wherein the method comprises the following steps: acquiring a word sequence of a source text; performing semantic coding on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model to obtain a source end fusion vector sequence; the source end fusion vector sequence fuses source end vector sequences output by each layer of neural network; decoding the source end fusion vector sequence through a decoder of the machine translation model according to a word vector of a target word output by the machine translation model in the previous time to obtain a current target end vector sequence; determining a target word output by the machine translation model at the current time according to the target end vector sequence; and generating a target text corresponding to the source text according to each target word output by the machine translation model. The scheme provided by the application can improve the text translation quality.

Description

Text translation method and device, readable storage medium and computer equipment
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a text translation method and apparatus, a computer-readable storage medium, and a computer device.
Background
With the development of machine learning techniques, machine translation technology has emerged. In the machine translation field, deep neural networks are generally used for text translation, such as recurrent neural networks, convolutional neural networks, and self-attention networks. Conventionally, a neural network model performs text translation with an encoding-decoding framework, that is, a framework composed of an encoder built from a multi-layer neural network and a decoder built from a multi-layer neural network.
However, when a conventional neural network translation framework is used for text translation, only the information from the topmost neural network layer in the encoder and the decoder is typically utilized, so the accuracy of text translation is not high.
Disclosure of Invention
In view of the above, it is necessary to provide a text translation method, a text translation apparatus, a computer-readable storage medium, and a computer device for solving the technical problem of low accuracy of text translation.
A method of text translation, comprising:
acquiring a word sequence of a source text;
performing semantic coding on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model to obtain a source end fusion vector sequence; the source end fusion vector sequence fuses source end vector sequences output by each layer of neural network;
decoding the source end fusion vector sequence through a decoder of the machine translation model according to a word vector of a target word output by the machine translation model in the previous time to obtain a current target end vector sequence;
determining a target word output by the machine translation model at the current time according to the target end vector sequence;
and generating a target text corresponding to the source text according to each target word output by the machine translation model.
A text translation apparatus, the apparatus comprising:
the acquisition module is used for acquiring a word sequence of a source text;
the encoding module is used for carrying out semantic encoding on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model to obtain a source end fusion vector sequence; the source end fusion vector sequence fuses source end vector sequences output by each layer of neural network;
the decoding module is used for decoding the source end fusion vector sequence according to the word vector of the target word output by the machine translation model in the previous time through a decoder of the machine translation model to obtain a current target end vector sequence;
the determining module is used for determining a target word output by the machine translation model at the current time according to the target end vector sequence;
and the generating module is used for generating a target text corresponding to the source text according to each target word output by the machine translation model.
A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the text translation method.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the text translation method.
According to the text translation method and apparatus, the computer-readable storage medium, and the computer device above, the word sequence of the source text is semantically encoded layer by layer through the multilayer neural network of the encoder in the machine translation model, and the source-end vector sequences output by the neural networks of each layer are fused to obtain a source-end fusion vector sequence. By fusing the source-end vector sequences output by each layer of the neural network, the information of each hidden layer in the machine translation model can be combined to learn a better hidden-layer representation. The decoder of the machine translation model then decodes the source-end fusion vector sequence according to the word vector of the target word output by the machine translation model at the previous time to obtain the current target-end vector sequence, and the target word output by the machine translation model at the current time is determined from that sequence. A target text corresponding to the source text is generated from the target words output by the machine translation model. Therefore, when the encoding-decoding framework performs text translation on the source text, the semantic and syntactic information of each layer can be fused, each hidden layer in the machine translation model is learned more fully, the loss of effective information during translation is reduced, and the accuracy of text translation is greatly improved.
Drawings
FIG. 1 is a diagram of an application environment of a text translation method in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for text translation, according to one embodiment;
FIG. 3 is a flow diagram that illustrates the translation of text by a machine translation model that includes an encoder-decoder architecture, under an embodiment;
FIG. 4 is a block diagram illustrating semantic encoding of a word sequence by a multi-layer neural network of an encoder in one embodiment;
fig. 5 is a schematic flowchart illustrating a step of decoding a source-side fused vector sequence layer by layer according to a word vector of a target word output by a machine translation model at a previous time to obtain a target-side fused vector sequence, through a multi-layer neural network of a decoder in the machine translation model in one embodiment;
FIG. 6 is a block diagram illustrating a decoding process performed on a source-side fused vector sequence by a multi-layer neural network of a decoder in an embodiment;
FIG. 7 is a schematic diagram of a dense connection network in one embodiment;
FIG. 8 is a block diagram illustrating linear fusion of the outputs of the neural networks in each layer according to an embodiment;
FIG. 9 is a block diagram illustrating recursive fusion of the outputs of the neural networks in each layer according to an embodiment;
FIG. 10 is a block diagram illustrating a hierarchical fusion of the outputs of the neural networks in each layer according to an embodiment;
FIG. 11 is a block diagram illustrating an exemplary embodiment of a cross-layer attention mechanism fusion of outputs of neural networks;
FIG. 12 is a flowchart illustrating the training steps of the machine translation model in one embodiment;
FIG. 13 is a flowchart illustrating a method of text translation in one embodiment;
FIG. 14 is a block diagram showing the construction of a text translation apparatus according to an embodiment;
FIG. 15 is a block diagram showing the construction of a text translation apparatus according to another embodiment;
FIG. 16 is a block diagram showing a configuration of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
FIG. 1 is a diagram of an application environment of the text translation method in one embodiment. Referring to fig. 1, the text translation method is applied to a text translation system. The text translation system includes a terminal 110 and a server 120. The terminal 110 may send the source text or a word sequence obtained by segmenting the source text to the server 120, and the server 120 executes a text translation method to obtain a target text, and then returns the target text to the terminal 110. The terminal 110 may also perform a text translation method after acquiring the source text. The terminal 110 and the server 120 are connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers.
As shown in FIG. 2, in one embodiment, a method of text translation is provided. The embodiment is mainly illustrated by applying the method to a computer device, which may be the terminal 110 or the server 120 in fig. 1. Referring to fig. 2, the text translation method specifically includes the following steps:
s202, acquiring a word sequence of the source text.
Wherein the source text is the initial text to be translated. The source text may specifically be a sentence, a paragraph, a chapter, or the like, and may be a Chinese text, an English text, or the like. The word sequence of the source text is the sequence formed by the words obtained after word segmentation of the source text. For a Chinese source text, a dictionary-based or statistics-based word segmentation method may be used; for an English source text, the words may be split on spaces.
Specifically, the computer device can directly acquire the source text, and perform word segmentation processing on the source text to obtain a corresponding word sequence. The computer device may also receive word sequences generated by other computer devices from the source text. After the computer equipment acquires the word sequence of the source text, the acquired word sequence can be processed in the next step through an input layer of a pre-trained machine translation model.
In one embodiment, the computer device may convert the discrete word sequence into a continuous sequence of spatial representation vectors through a word embedding (word embedding) process after obtaining the word sequence of the source text. And inputting the spatial expression vector sequence into an input layer of a pre-trained machine translation model so as to carry out the next processing.
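The word embedding step above can be sketched as a simple lookup table that maps each discrete word to a continuous vector. This is a minimal illustration, not the patent's implementation; the vocabulary, dimensions, and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical vocabulary and embedding table (illustrative only).
vocab = {"<pad>": 0, "machine": 1, "translation": 2, "model": 3}
embed_dim = 8
embedding_table = rng.standard_normal((len(vocab), embed_dim))

def embed(word_sequence):
    """Map a discrete word sequence to a continuous spatial
    representation vector sequence via table lookup."""
    ids = [vocab[w] for w in word_sequence]
    return embedding_table[ids]  # shape: (sequence length, embed_dim)

vectors = embed(["machine", "translation", "model"])
print(vectors.shape)  # (3, 8)
```

In a trained model the table entries are learned parameters; here they are random, which is enough to show the discrete-to-continuous conversion.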
S204, semantic coding is carried out on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model, and a source end fusion vector sequence is obtained; the source end fusion vector sequence fuses the source end vector sequences output by each layer of neural network.
Wherein the machine translation model is a pre-trained machine learning model. The machine translation model may employ a neural-network-based Sequence-to-Sequence framework, that is, a framework including an Encoder-Decoder structure. The Encoder-Decoder structure converts an input sequence into another output sequence: the encoder converts the input sequence into a vector sequence, and the decoder generates the output sequence element by element over time from that vector sequence. The encoder and the decoder may adopt the same type of neural network model or different types. For example, the encoder and decoder may both be CNN (Convolutional Neural Network) models or both be RNN (Recurrent Neural Network) models; alternatively, they may adopt different neural network models, such as an RNN model for the encoder and a CNN model for the decoder.
In one embodiment, after the word sequence of the source text is input into the machine translation model, the word sequence can be semantically encoded layer by layer through a multilayer neural network of an encoder in the machine translation model, so as to obtain a source end fusion vector sequence. The source end fusion vector sequence fuses the source end vector sequences output by each layer of neural network. The decoder decodes and converts the source end fusion vector sequence into a target end vector sequence, and an output layer of the machine translation model can convert the target end vector sequence into an output sequence so as to obtain a translation text of the source text, namely a target text corresponding to the source text.
In one embodiment, the multilayer neural network of the encoder in the machine translation model can perform semantic coding on the word sequence layer by layer respectively to obtain a source end vector sequence output by each layer of neural network. Specifically, the computer device may input a spatial representation vector sequence corresponding to a word sequence of the source text to a first layer of neural network in a multilayer neural network of the encoder, perform semantic coding processing on the spatial representation vector sequence through the first layer of neural network, and output a source end vector sequence corresponding to the first layer of neural network. And then the source end vector sequence output by the first layer of neural network is used as the input of the second layer of neural network, and semantic coding processing is carried out through the second layer of neural network to obtain the source end vector sequence output by the second layer of neural network. And repeating the steps until a source end vector sequence output by the last layer of neural network is obtained. The machine translation model can fuse the source end vector sequences output by each layer of neural network to obtain a source end fusion vector sequence.
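The layer-by-layer encoding described above, where each layer's output feeds the next layer and every layer's output is retained for later fusion, can be sketched as follows. The per-layer transform here is a toy linear map plus tanh, standing in for whatever layer type (RNN, CNN, self-attention) the encoder actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_layer_by_layer(x, num_layers=3):
    """Toy encoder stack: the output of layer k is the input of layer
    k+1, and every layer's source-end vector sequence is kept so the
    model can fuse them afterwards."""
    d = x.shape[-1]
    weights = [rng.standard_normal((d, d)) * 0.1 for _ in range(num_layers)]
    outputs, h = [], x
    for w in weights:
        h = np.tanh(h @ w)   # source-end vector sequence of this layer
        outputs.append(h)
    return outputs           # one (m, d) array per layer

src = rng.standard_normal((4, 8))          # m = 4 words, d = 8 dims
layer_outputs = encode_layer_by_layer(src)
print(len(layer_outputs), layer_outputs[-1].shape)  # 3 (4, 8)
```

The key point is that `outputs` holds all intermediate hidden-layer representations, not only the topmost one.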
In one embodiment, when semantic coding is performed on the word sequence layer by layer, the multilayer neural network of the encoder in the machine translation model may take the source-end vector sequence output by the preceding-layer neural network, directly or after processing, as the input of the current-layer neural network, and obtain the source-end vector sequence output by the current-layer neural network after the current layer performs semantic coding. Here, the preceding layer refers to the layer before the current layer. This is repeated until the source-end vector sequence output by the last-layer neural network is obtained, and that sequence is taken as the source-end fusion vector sequence.
In one embodiment, the machine translation model may fuse the source-end vector sequences output by the multilayer neural network using, for example, linear superposition or a fusion network.
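One of the simplest fusion manners mentioned, linear superposition, can be sketched as a normalised weighted sum of the per-layer outputs. The uniform default weights are an illustrative assumption; in practice the weights would typically be learned.

```python
import numpy as np

def linear_fusion(layer_outputs, weights=None):
    """Fuse per-layer vector sequences by a weighted linear
    superposition: H = sum_k w_k * H_k, with weights normalised
    to sum to 1 (uniform by default)."""
    k = len(layer_outputs)
    if weights is None:
        weights = np.full(k, 1.0 / k)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * h for w, h in zip(weights, layer_outputs))

# Dummy layer outputs: layer k's sequence is all (k+1)s.
layers = [np.ones((4, 8)) * (i + 1) for i in range(3)]
fused = linear_fusion(layers)
print(fused[0, 0])  # 2.0 — the uniform average of 1, 2 and 3
```

Other fusion manners in the figures (recursive, hierarchical, cross-layer attention) would replace this weighted sum with a small fusion network.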
The semantic coding of the word sequence is a process of converting the word sequence into a vector. The source end vector sequence is a vector obtained by inputting a word sequence of a source text into a hidden layer in a multilayer neural network of an encoder and performing source end layer conversion processing on the word sequence through the hidden layer. The hidden layer is a term in the neural network model, is an intermediate layer relative to the input layer and the output layer, and comprises model parameters obtained by training the neural network model. The hidden layer of the encoder is here an intermediate layer with respect to the input layer of the encoder and the output layer of the encoder. The hidden layer of the encoder may be a plurality of neural network layers.
The following describes, by way of example, the process of performing semantic coding on a word sequence layer by layer through the multilayer neural network of the encoder in the machine translation model to obtain a source-end fusion vector sequence. Taking a word sequence of length m as an example: first, the computer device may perform word segmentation on the source text to obtain the word sequence x = (x1, x2, ..., xm). The word sequence is converted into a continuous spatial representation vector sequence by word embedding, denoted p = (p1, p2, ..., pm). The multilayer neural network of the encoder then performs semantic coding on the spatial representation vectors layer by layer to obtain the source-end fusion vector sequence, denoted H = (h1, h2, ..., hm).
And S206, decoding the source end fusion vector sequence through a decoder of the machine translation model according to the word vector of the target word output by the machine translation model in the previous time to obtain the current target end vector sequence.
Where decoding is the process of converting the vector sequence input to the decoder into a target output sequence. The target end vector sequence is a vector sequence obtained by inputting a source end fusion vector corresponding to a word vector of each word in the word sequence into a hidden layer of a decoder. Both the source-side fused vector sequence and the target-side vector sequence may represent semantic information and syntactic information of the word sequence. The hidden layer of the decoder is here an intermediate layer with respect to the input layer of the decoder and the output layer of the decoder. The hidden layer of the decoder may include a plurality of neural network layers.
Specifically, when the machine translation model generates the current target word, the previously output target word may be obtained. The decoder then decodes the source-end vector sequence output by the encoder according to the word vector of the target word output by the machine translation model at the previous time, to obtain the current target-end vector sequence. The target-end vector sequence not only contains the semantic information of each word in the word sequence of the source text, but also incorporates the semantic information of the previously output target word, so that the finally generated target text is more coherent and the translation result more accurate.
In one embodiment, the machine translation model may obtain the historically output target words before generating the current target word, and generate the current target-end vector sequence according to those historically output target words, the target-end vector sequence previously output by the decoder, and the current content vector. The current content vector is obtained by weighted summation over the source-end vector sequence according to the current attention distribution weight vector.
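The content-vector computation just described, a weighted sum of the source-end vector sequence under an attention distribution, can be sketched directly. The toy vectors and weights below are illustrative, not from the patent.

```python
import numpy as np

def content_vector(source_vectors, attention_weights):
    """Weighted sum of the source-end vector sequence under the
    current attention distribution: c = sum_j a_j * h_j, where the
    weights a_j are normalised to sum to 1."""
    a = np.asarray(attention_weights, dtype=float)
    a = a / a.sum()
    return a @ source_vectors  # (m,) @ (m, d) -> (d,)

H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])    # m = 3 source positions, d = 2
c = content_vector(H, [0.5, 0.25, 0.25])
# c = 0.5*[1,0] + 0.25*[0,1] + 0.25*[1,1] = [0.75, 0.5]
```

At each decoding step the attention weights change, so the content vector summarises a different part of the source sentence each time.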
In one embodiment, the machine translation model may obtain the last output target word when the current target-end vector sequence is to be generated. The machine translation model can perform word embedding processing on the target words output last time, and convert the target words into word vectors represented by real numbers.
In one embodiment, step S206 specifically includes: decoding the source end fusion vector sequence layer by layer through a multilayer neural network of a decoder in the machine translation model according to a word vector of a target word output by the machine translation model in the previous time to obtain a current target end fusion vector sequence; the target end fusion vector sequence fuses the target end vector sequences output by each layer of neural network. Further, the machine translation model can determine the target words output by the machine translation model at the current moment according to the target end fusion vector sequence at the current moment.
In one embodiment, the multi-layer neural network of the decoder in the machine translation model can decode the source-end fusion vector sequence layer by layer respectively to obtain a target-end vector sequence output by each layer of neural network. Specifically, the computer device may input the source-side fused vector sequence to a first layer of neural network in a multilayer neural network of the decoder, decode the source-side fused vector sequence through the first layer of neural network, and output a target-side vector sequence corresponding to the first layer of neural network. And then taking the target end vector sequence output by the first layer of neural network as the input of the second layer of neural network, and carrying out layer transformation processing through the second layer of neural network to obtain the target end vector sequence output by the second layer of neural network. And repeating the steps until a target end vector sequence output by the last layer of neural network is obtained. The machine translation model can fuse the target end vector sequences output by each layer of neural network to obtain a target end fusion vector sequence.
In one embodiment, when the multilayer neural network of the decoder in the machine translation model decodes the source-end fusion vector sequence layer by layer, the target-end vector sequence output by the preceding-layer neural network may be used, directly or after processing, as the input of the current-layer neural network, and the target-end vector sequence output by the current-layer neural network is obtained after the current layer performs decoding. This is repeated until the target-end vector sequence output by the last-layer neural network is obtained, and that sequence is taken as the target-end fusion vector sequence.
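The decoder-side variant described here, where each layer transforms the preceding layer's target-end output and the last layer's output serves as the fused result, mirrors the encoder loop. A minimal sketch with a toy per-layer transform (assumed, not the patent's actual layer type):

```python
import numpy as np

rng = np.random.default_rng(1)

def decode_layer_by_layer(fused_source, num_layers=3):
    """Toy decoder stack: each layer transforms the previous layer's
    target-end vector sequence; in this variant the last layer's
    output is taken directly as the target-end fused sequence."""
    d = fused_source.shape[-1]
    h = fused_source
    for _ in range(num_layers):
        w = rng.standard_normal((d, d)) * 0.1
        h = np.tanh(h @ w)  # target-end vector sequence of this layer
    return h                # last layer's output as the fused result

H = rng.standard_normal((4, 8))  # source-end fusion vector sequence
target_seq = decode_layer_by_layer(H)
print(target_seq.shape)  # (4, 8)
```

A real decoder would also condition each step on the previously output target word and the attention content vector, which are omitted here for brevity.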
In one embodiment, the machine translation model may fuse the target-end vector sequences output by the multilayer neural network of the decoder using, for example, linear superposition or a fusion network. In one embodiment, the fusion manner used by the decoder and that used by the encoder in the machine translation model may be the same or different.
In the above embodiment, the source-end fusion vector sequence is decoded layer by layer through the multilayer neural network of the decoder in the machine translation model to obtain a target-end fusion vector sequence that fuses the output of each layer of the neural network. By fusing the target-end vector sequences output by each layer, the information of each hidden layer in the machine translation model can be combined during machine translation processing to learn a better hidden-layer representation and greatly improve translation accuracy.
It is to be understood that the "current time" used herein describes the time at which the decoder of the machine translation model decodes and outputs the current target word, and the "previous time" describes the time at which the decoder decodes and outputs the previous target word. For example, if the previous time is the (i-1)-th time, the target word output by the machine translation model is y_{i-1}; at the i-th time, the target word output is y_i. Moreover, the times are relative: for example, once the machine translation model outputs the target word of the time i+1 following the current time i, the time i+1 becomes the new current time and the time i becomes the new previous time.
And S208, determining the target word output by the machine translation model at the current time according to the target end vector sequence.
Specifically, when the current target word to be output is generated by the machine translation model, the current output probability sequence can be calculated and obtained through the output layer of the machine translation model according to the target end vector sequence decoded by the decoder. The current output probability sequence output by the machine translation model is a sequence formed by the probabilities that each candidate word is the target word output at the current time in the output end word set. Further, the machine translation model may select a candidate word corresponding to the maximum probability in the output probability sequence as the current target word. And repeating the decoding steps in sequence until the end word is output.
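The output step above, turning the target-end vectors into a probability over candidate words and picking the maximum, can be sketched with a softmax plus argmax. The candidate word list and scores below are hypothetical.

```python
import numpy as np

def pick_target_word(logits, candidates):
    """Convert output-layer scores into an output probability sequence
    with a numerically stable softmax, then select the candidate word
    with the maximum probability."""
    z = np.asarray(logits, dtype=float)
    probs = np.exp(z - z.max())
    probs = probs / probs.sum()
    return candidates[int(np.argmax(probs))], probs

words = ["hello", "world", "<eos>"]
word, probs = pick_target_word([2.0, 0.5, 0.1], words)
print(word)  # hello
```

Decoding would repeat this step per time, stopping when the end word (here `<eos>`) is selected.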
And S210, generating a target text corresponding to the source text according to each target word output by the machine translation model.
Specifically, the computer device may splice the output target words according to the sequence of outputting the target words through an output layer of the machine translation model, so as to generate a target text corresponding to the source text. In one embodiment, the target text is a different language text than the source text.
Referring to FIG. 3, FIG. 3 illustrates a flow diagram for text translation by a machine translation model including an encoder-decoder architecture in one embodiment. Firstly, inputting source text (namely input sentences) into an encoder of a machine translation model, and outputting a source end fusion vector sequence through an encoder module. And then inputting the source end fusion vector sequence and the word vector of the target word output last time into an attention module, and performing attention mechanism processing on the source end fusion vector sequence through the attention module to obtain a current source end content vector, namely a current moment source end context. And inputting the source end context at the current moment into a decoder of the machine translation model, decoding the source end context at the current moment through a decoder module, and outputting the target word at the current moment.
According to the text translation method, through the multilayer neural network of the encoder in the machine translation model, semantic coding is carried out on word sequences of a source text layer by layer, source end vector sequences output by the neural networks of each layer are fused, and a source end fusion vector sequence is obtained. Therefore, by fusing the source end vector sequences output by each layer of neural network, the information of each hidden layer in the machine translation model can be fused so as to learn better hidden layer representation. And decoding the source end fusion vector sequence by a decoder of the machine translation model according to the word vector of the target word output by the machine translation model in the previous time to obtain a current target end vector sequence, and determining the target word output by the machine translation model in the current time according to the target end vector sequence. And generating a target text corresponding to the source text according to each target word output by the machine translation model. Therefore, when the encoding-decoding framework is used for performing text translation on the source text, the semantic and syntax information of each layer can be fused, each hidden layer in the machine translation model can be more fully learned, the loss of effective information in model processing is reduced, and the accuracy of text translation is greatly improved.
In an embodiment, the step S204, that is, performing semantic coding on the word sequence layer by layer through a multilayer neural network of an encoder in the machine translation model to obtain the source-end fusion vector sequence specifically includes the following steps: inputting a word sequence to a first layer of neural network in a multilayer neural network of an encoder in a machine translation model to obtain a source end vector sequence output by the first layer of neural network; respectively acquiring first self-attention distribution weight vectors corresponding to each word in a word sequence in a current-layer neural network from a second-layer neural network in the multi-layer neural network; respectively calculating to obtain source end vectors corresponding to all words in the neural network of the current layer and the words in the word sequence according to a first self-attention distribution weight vector corresponding to all the words in the word sequence and a source end vector sequence output by a neural network of the previous layer; splicing source end vectors corresponding to each word in the word sequence and in the neural network of the current layer to obtain a source end vector sequence output by the neural network of the current layer until source end vector sequences output by the neural networks of all layers are obtained; and fusing the source end vector sequence to obtain a source end fusion vector sequence.
Specifically, each layer of neural network of the encoder in the machine translation model can process the input of each layer of neural network through a self-attention mechanism processing mode (also called source end layer transformation), so as to obtain the output of each layer of neural network. The computer device may use the word sequence of the source text as an input to a first layer neural network in a multi-layer neural network of an encoder in the machine translation model. And performing source-end layer transformation processing on the word sequence according to a preset initial value or a randomly generated initial value to obtain a source-end vector sequence output by the first-layer neural network. The source layer transformation processing is a processing process of processing input data through a neural network in an encoder to obtain output data. In the present application, the source-side layer transformation processing performed on the input data by each layer of neural network in the encoder is the same, and is explained in detail below.
Further, in a current-layer neural network from a second-layer neural network in the multi-layer neural network, first self-attention-distribution weight vectors corresponding to words in a word sequence in the current-layer neural network are respectively obtained. The first self-attention distribution weight corresponding to each word is obtained after attention mechanism processing is carried out according to the source end vector sequence of the previous layer.
The machine translation model can respectively calculate and obtain source end vectors corresponding to all the words in the current layer of neural network and the words in the word sequence according to the first self-attention distribution weight vector corresponding to all the words in the word sequence and the source end vector sequence output by the previous layer of neural network.
Referring to fig. 4, fig. 4 is a schematic structural diagram illustrating semantic encoding processing performed on a word sequence by a multi-layer neural network of an encoder in one embodiment. Each box in fig. 4 represents a hidden layer vector, i.e., a source end vector, in the neural network corresponding to a word. For each source end vector in each layer of neural network, the neural network can obtain a first self-attention distribution weight vector corresponding to the source end vector, and a source end vector sequence corresponding to the previous layer of neural network is subjected to weighted summation calculation according to the first self-attention distribution weight vector to obtain a source end vector of the current layer.
For example, for the j-th source-end vector in the i-th layer neural network, the encoder may perform attention mechanism processing on the source-end vector sequence output by the previous layer, that is, the (i-1)-th layer neural network, to obtain a first self-attention distribution weight vector, which may be represented as $\alpha_{i,j} = \{\alpha_1, \alpha_2, \ldots, \alpha_J\}$. The source-end vector sequence output by the (i-1)-th layer neural network may be represented as $H = \{h_1, h_2, \ldots, h_J\}$, and the j-th source-end vector in the i-th layer neural network can be calculated by the following formula:

$$h_j^i = \sum_{k=1}^{J} \alpha_k h_k$$

Accordingly, every source-end vector in each layer of the neural network can be calculated by this formula.
In one embodiment, each source-end vector in each layer of the neural network can be calculated by the following formula:

$$\mathrm{ATT}(Q, K, V) = \mathrm{softmax}(e) \cdot V$$

where Q, K, and V are obtained by linearly transforming the input of the current layer (i.e., the output of the previous-layer neural network) with three different learnable parameter matrices; that is, Q is the request (query) vector sequence, K is the key vector sequence, and V is the value vector sequence. Further, the machine translation model may use the dot product to model the logical similarity e between the request and each key-value pair:

$$e = \frac{Q K^{T}}{\sqrt{d}}$$

where $K^{T}$ denotes the transpose of the key matrix and d is the dimension of the model's hidden-layer vectors. Softmax is a normalization function.
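As a concrete illustration, the attention computation described by the two formulas above can be sketched in a few lines of NumPy. This is a minimal sketch, not the patent's actual implementation: the sequence length, hidden size, and random parameter matrices are all assumed values for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable normalization function
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H_prev, Wq, Wk, Wv):
    """One source-end layer transformation: Q, K, V are linear
    transforms of the previous layer's output H_prev (J x d)."""
    Q, K, V = H_prev @ Wq, H_prev @ Wk, H_prev @ Wv
    d = Q.shape[-1]
    e = Q @ K.T / np.sqrt(d)        # logical similarity e = QK^T / sqrt(d)
    alpha = softmax(e, axis=-1)     # self-attention distribution weights
    return alpha @ V                # weighted sum over the previous layer

rng = np.random.default_rng(0)
J, d = 5, 8                         # 5 words, hidden size 8 (assumed)
H = rng.normal(size=(J, d))         # previous layer's source-end vector sequence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H_next = self_attention(H, Wq, Wk, Wv)   # current layer's source-end vector sequence
```

Each row of `alpha` sums to one, so each output vector is a convex combination of the value vectors, matching the weighted-summation description above.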
Further, the machine translation model can splice source end vectors corresponding to each word in the current layer neural network and in the word sequence to obtain a source end vector sequence output by the current layer neural network. For each layer of neural network in the multilayer neural network of the encoder, the source end vector sequences respectively output by each layer of neural network are obtained by adopting the method, and the machine translation model can fuse the source end vector sequences corresponding to each layer of neural network, so that the source end fusion vector sequence is obtained.
It can be understood that the first-layer neural network performs source-end layer transformation processing on the word sequence in the same manner as the other layers; for the previous-layer data that this processing would require, an initial value may be preset empirically or generated randomly before the next processing is performed.
In the above embodiment, for each layer of neural network in the multilayer neural network, the source end vectors corresponding to the words in the current layer of neural network and in the word sequence are respectively calculated according to the first self-attention distribution weight vector corresponding to each word in the word sequence and the source end vector sequence output by the previous layer of neural network, so as to obtain the source end vector sequence, so that each layer of neural network can utilize the result output by the previous layer of neural network, and the transitivity of the neural network in the data processing process is ensured. And source end vector sequences output by each layer of neural network are fused, so that the expression of each hidden layer in a machine translation model can be fully learned, and important information can be prevented from being lost.
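Putting the layer-by-layer encoding and fusion steps of this embodiment together, the encoder loop might be sketched as follows. This is a hypothetical sketch: the `layer_transform` stand-in, layer count, and the simple averaging fusion are assumptions, not the model's trained layers (the patent also describes linear superposition, dense connection, and recursive fusion variants below).

```python
import numpy as np

def layer_transform(H, W):
    # stand-in for one source-end layer transformation (assumed form):
    # attention-style weighting of the previous layer's output
    scores = H @ W @ H.T / np.sqrt(H.shape[-1])
    alpha = np.exp(scores - scores.max(axis=-1, keepdims=True))
    alpha /= alpha.sum(axis=-1, keepdims=True)
    return alpha @ H

rng = np.random.default_rng(1)
J, d, n_layers = 4, 6, 3
H = rng.normal(size=(J, d))          # word-sequence embeddings (layer-1 input)
params = [rng.normal(size=(d, d)) for _ in range(n_layers)]

outputs = []                          # source-end vector sequence of every layer
for W in params:
    H = layer_transform(H, W)         # current layer consumes the previous layer's output
    outputs.append(H)

# fuse the per-layer sequences into the source-end fusion vector sequence
H_fused = np.mean(outputs, axis=0)    # simple average, for illustration only
```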
In one embodiment, the step of decoding the source-end fusion vector sequence layer by layer through a multilayer neural network of a decoder in the machine translation model according to a word vector of a target word output at a previous time of the machine translation model to obtain the target-end fusion vector sequence specifically includes the following steps:
s502, inputting word vectors of target words output by the machine translation model history into a first layer of neural network in a multilayer neural network of a decoder in the machine translation model to obtain a target end vector sequence output by the first layer of neural network at the current time.
Specifically, the target words decoded by the decoder are performed sequentially, that is, one target word is output at each time. When the decoder decodes the current time, the machine translation model can take the word vector of the target word output historically as the input of a first layer of neural network in the multilayer neural network of the decoder in the machine translation model, and the target end layer transformation processing is carried out on the input data through the first layer of neural network according to a preset initial value or a randomly generated initial value to obtain a target end vector sequence output by the current first layer of neural network. The target layer transformation processing is a processing process of processing input data through a neural network in a decoder to obtain output data. In the present application, the processing manner of the target-side layer transformation performed on the input data by each layer of neural network in the decoder is the same, and is explained in detail below.
It can be understood that, as the number of the target words decoded by the decoder increases, the target words input to the first layer of neural network in the multi-layer neural network of the decoder correspondingly increase in each decoding process.
S504, second self-attention distribution weight vectors corresponding to the current target words are respectively obtained from a current layer of neural network from a second layer of neural network in the multilayer neural network.
Specifically, in a current-layer neural network from a second-layer neural network in the multi-layer neural network of the decoder, second self-attention-distribution weight vectors corresponding to words in a word sequence in the current-layer neural network are respectively obtained. And obtaining the second self-attention distribution weight corresponding to each word after the attention mechanism processing is carried out according to the target end vector sequence of the previous layer.
S506, according to the second self-attention distribution weight vector corresponding to each target word and the target end vector sequence corresponding to the previous layer of neural network, respectively calculating to obtain the first target end vector corresponding to each target word in the current layer of neural network.
Specifically, when the machine translation model outputs the target word each time, the machine translation model may respectively calculate and obtain a first target end vector corresponding to each target word in the current neural network and the current neural network according to the second self-attention-assignment weight vector corresponding to each target word and the target end vector sequence corresponding to the previous neural network.
In one embodiment, since the machine translation model generates the target words in time sequence, in the decoding process, for each layer of neural network, the hidden layer vector corresponding to each target word is obtained by performing weighted summation only according to the hidden layer vector of the previous layer of neural network corresponding to the historical target word.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram illustrating the decoding process performed by the multilayer neural network of the decoder on the source-end fusion vector sequence in one embodiment. Each box in FIG. 6 represents a hidden-layer vector corresponding to a word in a neural network; the upper portion of FIG. 6 shows two adjacent layers of the decoder's multilayer neural network. For example, for the j-th target-end vector in the i-th layer neural network, the decoder may perform attention mechanism processing on the target-end vector sequence output by the previous layer, that is, the (i-1)-th layer neural network, to obtain a second self-attention distribution weight vector, which may be represented as $\alpha_{i,j} = \{\alpha_1, \alpha_2, \ldots, \alpha_j\}$ (the second self-attention distribution weight vector has dimension j). The target-end vector sequence output by the (i-1)-th layer neural network may be expressed as $H = \{h_1, h_2, \ldots, h_j\}$, and the j-th target-end vector in the i-th layer neural network is calculated by the following formula:

$$h_j^i = \sum_{k=1}^{j} \alpha_k h_k$$

That is, the target-end vectors corresponding to the first j target words in the previous-layer neural network are weighted and summed to obtain the first target-end vector corresponding to the j-th target word of the current layer. Accordingly, every target-end vector in each layer of the neural network can be calculated by this formula.
In one embodiment, each target-end vector in each layer of the neural network can be calculated by the following formula:

$$\mathrm{ATT}(Q, K, V) = \mathrm{softmax}(e) \cdot V$$

where Q, K, and V are obtained by linearly transforming the input of the current layer (i.e., the output of the previous-layer neural network) with three different learnable parameter matrices; that is, Q is the request (query) vector sequence, K is the key vector sequence, and V is the value vector sequence. Further, the machine translation model may use the dot product to model the logical similarity e between the request and each key-value pair:

$$e = \frac{Q K^{T}}{\sqrt{d}}$$

where $K^{T}$ denotes the transpose of the key matrix and d is the dimension of the model's hidden-layer vectors. The Softmax function is a normalized exponential function.
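Because the decoder generates target words in time order, the self-attention sum at position j runs over positions 1..j only. The causal weighting can be sketched as follows; the uniform weights here are a simplifying assumption (real weights come from the softmax of query-key similarities), used only to make the "history-only" summation explicit.

```python
import numpy as np

def causal_self_attention(H_prev):
    """Each target position j attends only to positions 1..j of the
    previous decoder layer (uniform-weight sketch for illustration)."""
    j_max, d = H_prev.shape
    out = np.zeros_like(H_prev)
    for j in range(j_max):
        alpha = np.full(j + 1, 1.0 / (j + 1))   # weights over history, sum to 1
        out[j] = alpha @ H_prev[: j + 1]        # weighted sum of h_1..h_j only
    return out

H = np.arange(12, dtype=float).reshape(4, 3)    # 4 target words, d = 3
T = causal_self_attention(H)
# T[0] equals H[0]: the first position can only see itself
```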
S510, respectively acquiring attention distribution weight vectors corresponding to the current target words in a current-layer neural network from the second-layer neural network in the multilayer neural network; the attention distribution weight vector corresponds to the source-end fusion vector sequence.
Specifically, in each current-layer neural network from the second-layer neural network onward in the multilayer neural network, the attention distribution weight vectors corresponding to the current target words are respectively obtained. The attention distribution weight vector here refers to the encoding-decoding attention distribution weight vector, and it corresponds to the source-end fusion vector sequence. It will be appreciated that the attention distribution weights differ at different decoding time steps.
In one embodiment, the machine translation model may calculate the attention allocation weight according to a previous target-side vector corresponding to a current layer neural network of the decoder and the source-side vector.
And S510, respectively calculating to obtain second target end vectors corresponding to the target words in the current neural network of the current layer according to the attention distribution weight vectors corresponding to the target words and the source end fusion vector sequence.
Specifically, the machine translation model may obtain, according to the attention distribution weight vector and the source end fusion vector sequence corresponding to each target word, a second target end vector corresponding to each target word in the current-level neural network by calculation.
Referring to FIG. 6, the second target-end vector corresponding to each target word in each layer of the neural network may be obtained by a weighted summation over the source-end fusion vector sequence, where the weighting coefficients are the respective values in the attention distribution weight vector, and these weights differ at different decoding time steps.
For example, for any target word in each layer of the neural network, the machine translation model may obtain the attention distribution weight vector corresponding to that target word in the current layer, which may be represented as $\alpha_{i,j} = \{\alpha_1, \alpha_2, \ldots, \alpha_J\}$. The source-end fusion vector sequence may be represented as $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_J\}$, and the second target-end vector corresponding to the target word can be calculated by the following formula:

$$s_j^i = \sum_{k=1}^{J} \alpha_k \tilde{h}_k$$

Accordingly, the second target-end vector in each layer of the neural network can be calculated by the above formula.
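The encoding-decoding attention step reduces to a weighted sum over the source-end fusion vector sequence. A minimal sketch, with the sequence length, hidden size, and random weights assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
J, d = 6, 4
H_fused = rng.normal(size=(J, d))        # source-end fusion vector sequence
alpha = rng.random(J)
alpha /= alpha.sum()                      # attention distribution weights sum to 1

s = alpha @ H_fused                       # second target-end vector for this word
```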
And S512, fusing the first target end vector and the second target end vector to obtain a current target end vector.
Specifically, the machine translation model may fuse the first target-end vector and the second target-end vector through a two-layer neural network using a ReLU (Rectified Linear Unit) or another activation function, obtaining the current target-end vector. Correspondingly, the target-end vectors corresponding to the target words in each layer of the neural network can be obtained in the same way.
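The fusion of the two target-end vectors through a two-layer ReLU network might be sketched as follows. The concatenation of the two inputs and the layer widths are assumptions for illustration; the patent does not fix these details.

```python
import numpy as np

def fuse(t1, t2, W1, b1, W2, b2):
    x = np.concatenate([t1, t2])          # combine the two target-end vectors
    h = np.maximum(0.0, x @ W1 + b1)      # hidden layer with ReLU activation
    return h @ W2 + b2                    # current target-end vector

rng = np.random.default_rng(3)
d = 4
t1, t2 = rng.normal(size=d), rng.normal(size=d)  # first / second target-end vectors
W1, b1 = rng.normal(size=(2 * d, d)), np.zeros(d)
W2, b2 = rng.normal(size=(d, d)), np.zeros(d)
t = fuse(t1, t2, W1, b1, W2, b2)          # fused current target-end vector
```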
And S514, splicing target end vectors corresponding to each target word in the current-level neural network to obtain a current-level target end vector sequence output by the current-level neural network until target end vector sequences output by the current-level neural networks are obtained.
Specifically, the machine translation model may sequentially splice the target-end vectors corresponding to each target word in the current-layer neural network to obtain the target-end vector sequence output by the current-layer neural network. For each layer of the multilayer neural network of the decoder, the above steps S504 to S514 may be repeated until the target-end vector sequences respectively output by each layer of the neural network are obtained.
And S516, fusing the target end vector sequence to obtain a target end fusion vector sequence.
Specifically, the machine translation model may fuse the target end vector sequences output by each layer of neural network to obtain a target end fusion vector sequence. The machine translation model can adopt a linear superposition fusion processing mode or a fusion network processing mode and other fusion modes to perform fusion processing on the target end fusion vector output by the multilayer neural network.
Further, when the current target word to be output is generated by the machine translation model, the current output probability sequence can be calculated through the output layer of the machine translation model according to the target end fusion vector sequence obtained by decoding by the decoder. And then selecting the candidate word corresponding to the maximum probability in the output probability sequence as the current target word. It is understood that the steps S502 to S516 and the step of generating the current target word according to the target-side fused vector sequence may be repeated for each time point decoded by the decoder until the machine translation model outputs the stop word.
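Selecting the candidate word with the maximum probability from the output probability sequence is a greedy argmax over the output layer's softmax. A sketch with an assumed toy vocabulary and made-up scores:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

vocab = ["<eos>", "the", "cat", "sat"]        # assumed candidate words
logits = np.array([0.1, 2.0, 0.5, -1.0])      # output-layer scores at this step (made up)
probs = softmax(logits)                        # current output probability sequence
target_word = vocab[int(np.argmax(probs))]     # candidate with maximum probability
```

In the full decoding loop, this selection repeats once per time step until the chosen word is the stop word (`"<eos>"` here).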
It can be understood that the way of the first-layer neural network performing the target-end-layer transformation processing on the input data of the first-layer neural network is the same as that of the other-layer neural networks, and the initial value may be preset empirically or randomly generated for the corresponding data of the previous-layer neural network required for the processing of the first-layer neural network, so as to perform the next processing.
In the foregoing embodiment, the decoder performs a self-attention mechanism process on the word vectors of the target words output historically to obtain a first target-end vector, and performs an attention mechanism process on the source-end fusion vector sequence to obtain a second target-end vector. And then fusing the first target end vector and the second target end vector to obtain the current target end vector, wherein the target end vector can be fused with the hidden layer vector representation of the source end, and also integrates the semantic information of the target words output historically, so that the finally generated target text is more coherent, and the translation accuracy is higher.
In one embodiment, the multi-layer neural network is a dense connection network; the method comprises the following steps of carrying out semantic coding on a word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model to obtain a source end fusion vector sequence: inputting a word sequence to a first layer of neural network in a multilayer neural network of an encoder in a machine translation model to obtain a source end vector sequence output by the first layer of neural network; acquiring preamble source end vector sequences respectively corresponding to all preamble layer neural networks of a current layer from a second layer neural network in a multi-layer neural network; linearly overlapping the vector sequence of the preorder source end to obtain a comprehensive vector sequence of the preorder source end; calculating to obtain a source end vector sequence output by the current layer of neural network according to the comprehensive preorder source end vector sequence and a source end vector sequence corresponding to the previous layer of neural network until a source end vector sequence output by the last layer of neural network in the multilayer neural network is obtained; and taking the source end vector sequence output by the last layer of neural network as a source end fusion vector sequence.
In particular, the multilayer neural network in the encoder is a dense connection network. In one embodiment, each dense connection in the encoder is a residual connection. Referring to FIG. 7, FIG. 7 shows a schematic diagram of the structure of a dense connection network in one embodiment. The machine translation model can take the source-end vector sequences respectively output by all the preceding-layer neural networks of the current layer as the input of the current-layer neural network.
The computer equipment can input the word sequence of the source text into a first layer of neural network in a multilayer neural network of an encoder in a machine translation model, and obtains a source end vector sequence output by the first layer of neural network through source end layer transformation processing of the first layer of neural network. The source layer transform processing may specifically adopt the self-attention mechanism processing manner in the above embodiment.
Further, preamble source end vector sequences respectively corresponding to all preamble layer neural networks of a current layer are obtained from a current layer neural network from a second layer neural network in the multilayer neural network. The machine translation model can perform layer transformation processing on a source end vector sequence output by a previous layer of neural network, and linearly superimpose a vector sequence obtained after the source end layer transformation processing and all pre-sequence source end vector sequences of a current layer to obtain a source end vector sequence corresponding to the current layer. Correspondingly, executing the steps for each layer of the multilayer neural network in the encoder until a source end vector sequence output by the last layer of the multilayer neural network is obtained. And taking the source end vector sequence output by the last layer of neural network as a source end fusion vector sequence.
In one embodiment, the fusion calculation may be performed by the following formula:

$$H^{l} = \mathrm{Layer}(H^{l-1}) + \sum_{i=1}^{l-1} H^{i}$$

where $H^{l}$ denotes the fusion vector sequence output by the l-th layer neural network; $H^{l-1}$ denotes the source-end vector sequence output by the (l-1)-th layer neural network; $H^{i}$ denotes the source-end vector sequence output by the i-th layer neural network; Layer denotes the layer transformation function; and $\sum_{i=1}^{l-1} H^{i}$ is the linear superposition of the vector sequences output by the first-layer through (l-1)-th-layer neural networks.
In an embodiment, the layer transformation function refers to a function that the current layer neural network performs layer transformation processing on the input of the current layer, and may be a self-attention mechanism processing mode corresponding to the source end in the foregoing embodiment.
It can be understood that, for the decoder, the multilayer neural network in the decoder may also adopt a dense connection network, and in the process of decoding the source-end fusion vector sequence layer by the multilayer neural network in the decoder to obtain the target-end fusion vector sequence fusing the target-end vector sequences output by the neural networks of each layer, the above-mentioned fusion mode corresponding to the encoder may be adopted, or a fusion mode different from the encoder may be adopted.
In the above embodiment, the source end vector sequence of the current layer is obtained by performing linear superposition on the preamble source end sequence corresponding to the preamble layer neural network of the current layer and combining the source end vector sequence corresponding to the previous layer neural network, and so on until the source end vector sequence output by the last layer neural network in the multilayer neural network is obtained. Therefore, the output of the preamble layer is used as the input of the current layer in a fusion mode, so that the source end vector sequence output by the last layer of neural network well fuses the output of each layer of neural network. And moreover, the difficulty of data processing can be effectively reduced by adopting a linear superposition mode, and the network execution efficiency is improved.
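The dense-connection rule above, H^l = Layer(H^(l-1)) + sum of all preceding layers' outputs, can be sketched as follows. The `layer_transform` stub, shapes, and layer count are assumptions standing in for the model's trained layers.

```python
import numpy as np

def layer_transform(H):
    # stand-in for the source-end layer transformation (assumed form)
    return np.tanh(H)

rng = np.random.default_rng(4)
H = rng.normal(size=(4, 6))             # first layer's source-end vector sequence
layer_outputs = [H]
n_layers = 4
for _ in range(1, n_layers):
    # current layer = transform of previous layer's output, plus the
    # linear superposition of all preceding layers' outputs
    H = layer_transform(layer_outputs[-1]) + np.sum(layer_outputs, axis=0)
    layer_outputs.append(H)

H_fused = layer_outputs[-1]             # last layer's output is the fusion sequence
```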
In one embodiment, the step of performing semantic coding on the word sequence layer by layer through the multilayer neural network of the encoder in the machine translation model to obtain the source-end fusion vector sequence includes: performing semantic coding on the word sequence layer by layer through the multilayer neural network of the encoder in the machine translation model to obtain a source-end vector sequence output by each layer of the neural network; and performing linear superposition on the source-end vector sequences output by each layer of the neural network to obtain the source-end fusion vector sequence.
Specifically, the computer device may input the word sequence of the source text into a first layer neural network in a multilayer neural network of an encoder in the machine translation model, and obtain a source end vector sequence output by the first layer neural network through source end layer transformation processing of the first layer neural network. The source layer transform processing may specifically adopt the self-attention mechanism processing manner in the above embodiment. And taking the source end vector sequence output by the first layer of neural network as the input of the second layer of neural network, and obtaining the source end vector sequence output by the second layer of neural network through the source end layer transformation processing of the second layer of neural network. And each layer of neural network sequentially executes the steps, the output of the previous layer of neural network is used as the input of the current layer of neural network, and the output of the current layer of neural network is obtained through source end layer conversion processing of the current layer of neural network.
Further, referring to fig. 8, fig. 8 is a schematic structural diagram illustrating linear fusion of outputs of each layer of neural network in one embodiment. The machine translation model can linearly superpose the source end vector sequences output by each layer of neural network to obtain a source end fusion vector sequence. The linear superposition may specifically be an operation of performing weighted summation, and the weighting coefficient may be a trained model parameter.
In one embodiment, the linear superposition calculation is performed by the following formula:

$$\tilde{H} = \sum_{i=1}^{l} W_i H^{i}$$

where $\tilde{H}$ denotes the source-end fusion vector sequence; $H^{i}$ denotes the source-end vector sequence output by the i-th layer neural network; $W_i$ denotes the weighting coefficient corresponding to the i-th layer neural network; and $\sum_{i=1}^{l} W_i H^{i}$ is the linear superposition of the vector sequences output by the first-layer through l-th-layer neural networks.
It can be understood that, for the decoder, in the process that the multilayer neural network in the decoder decodes the source-end fusion vector sequence layer by layer to obtain the target-end fusion vector sequence fusing the target-end vector sequences output by the respective layers of neural networks, the above-mentioned fusion mode corresponding to the encoder may be adopted, or a fusion mode different from the encoder may be adopted.
In the embodiment, the source end fusion vector sequence is obtained by fusing the source end vector sequences output by the neural networks of each layer in a linear fusion mode, semantic and syntax information of each layer can be fused, each hidden layer representation in a machine translation model can be more fully learned, loss of effective information in the text translation process is reduced, and the accuracy of text translation is greatly improved. And moreover, the difficulty of data processing can be effectively reduced by adopting a linear superposition mode, and the network execution efficiency is improved.
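The weighted linear superposition over all layer outputs is a one-liner once the per-layer sequences are collected. A sketch with assumed scalar per-layer weights (in the model these would be trained parameters):

```python
import numpy as np

rng = np.random.default_rng(5)
# source-end vector sequences output by three encoder layers (assumed shapes)
layer_outputs = [rng.normal(size=(4, 6)) for _ in range(3)]
w = np.array([0.2, 0.3, 0.5])            # trained weighting coefficients (assumed values)

# source-end fusion vector sequence: sum of W_i * H^i over all layers
H_fused = sum(w_i * H_i for w_i, H_i in zip(w, layer_outputs))
```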
In one embodiment, the step of performing semantic coding on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model to obtain a source end fusion vector sequence specifically includes the following steps: inputting a word sequence to a first layer of neural network in a multilayer neural network of an encoder in a machine translation model to obtain a source end vector sequence output by the first layer of neural network; semantic coding is carried out on a source end vector sequence output by a first layer of neural network through a second layer of neural network in a multilayer neural network of an encoder in a machine translation model, and a source end vector sequence output by the second layer of neural network is obtained; carrying out network fusion processing on the source end vector sequence output by the first layer of neural network and the source end vector sequence output by the second layer of neural network to obtain a network fusion vector sequence corresponding to the second layer of neural network; acquiring a network fusion vector sequence corresponding to a previous layer of neural network from a current layer of neural network from a third layer of neural network in the multilayer neural network; performing network fusion processing on a source end vector sequence output by a current layer of neural network and a network fusion vector sequence corresponding to a previous layer of neural network to obtain a network fusion vector sequence corresponding to the current layer of neural network until a network fusion vector sequence corresponding to the last layer of neural network in the multilayer neural network is obtained; and taking the network fusion vector sequence corresponding to the last layer of neural network as a source end fusion vector sequence.
Referring to fig. 9, fig. 9 is a schematic structural diagram illustrating a structure of fusing outputs of neural networks of respective layers in a recursive fusion manner in an embodiment. The computer equipment can input the word sequence of the source text into a first layer of neural network in a multilayer neural network of an encoder in a machine translation model, and obtains a source end vector sequence output by the first layer of neural network through source end layer transformation processing of the first layer of neural network. The source layer transform processing may specifically adopt the self-attention mechanism processing manner in the above embodiment. And taking the source end vector sequence output by the first layer of neural network as the input of the second layer of neural network, and obtaining the source end vector sequence output by the second layer of neural network through the source end layer transformation processing of the second layer of neural network. And each layer of neural network sequentially executes the steps, the output of the previous layer of neural network is used as the input of the current layer of neural network, and the output of the current layer of neural network is obtained through source end layer conversion processing of the current layer of neural network.
On the other hand, the machine translation model can fuse the source end vector sequence output by the first layer of neural network and the source end vector sequence output by the second layer of neural network through two layers of neural networks by adopting a ReLU or other activation functions to obtain a network fusion vector sequence corresponding to the second layer of neural network. And then carrying out network fusion processing on the source end vector sequence output by the third layer of neural network and the network fusion vector sequence corresponding to the second layer of neural network to obtain the network fusion vector sequence corresponding to the third layer of neural network. And sequentially calculating until a network fusion vector sequence corresponding to the last layer of neural network in the multilayer neural network is obtained. And taking the network fusion vector sequence corresponding to the last layer of neural network as a source end fusion vector sequence.
In one embodiment, the recursive fusion calculation is performed by the following formula:
$$\hat{H}^{l} = \mathrm{AGG}\left(\hat{H}^{l-1}, H^{l}\right), \quad \hat{H}^{1} = H^{1}$$

wherein AGG represents the network fusion processing; $\hat{H}^{l}$ represents the fusion vector sequence corresponding to the l-th layer neural network; $\hat{H}^{l-1}$ represents the fusion vector sequence corresponding to the (l-1)-th layer neural network; and $H^{l}$ represents the vector sequence output by the l-th layer neural network.
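As a concrete illustration, the recursive fusion above can be sketched in a few lines of numpy. The `agg` function here (averaging the input sequences, then applying a ReLU projection) is only one plausible realization of the abstract AGG operation — the embodiment mentions "a ReLU or other activation functions" but leaves AGG open — and `w`, `b` are stand-in parameters rather than the model's actual weights:

```python
import numpy as np

def agg(seqs, w, b):
    """Fusion function AGG: average the input vector sequences, then apply
    a ReLU projection (one plausible choice; the patent leaves AGG abstract)."""
    return np.maximum(0.0, np.mean(seqs, axis=0) @ w + b)

def recursive_fusion(layer_outputs, w, b):
    """H_hat^l = AGG(H_hat^{l-1}, H^l), starting from H_hat^1 = H^1."""
    fused = layer_outputs[0]
    for h in layer_outputs[1:]:
        fused = agg([fused, h], w, b)
    return fused

rng = np.random.default_rng(0)
seq_len, d, n_layers = 3, 4, 6
outputs = [rng.normal(size=(seq_len, d)) for _ in range(n_layers)]
w, b = rng.normal(size=(d, d)) * 0.1, np.zeros(d)
fused = recursive_fusion(outputs, w, b)
print(fused.shape)  # (3, 4)
```

Note that the fused sequence keeps the same shape as each per-layer output, so it can serve directly as the source end fusion vector sequence handed to the decoder.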
It can be understood that, for the decoder, in the process that the multilayer neural network in the decoder decodes the source-end fusion vector sequence layer by layer to obtain the target-end fusion vector sequence fusing the target-end vector sequences output by the respective layers of neural networks, the above-mentioned fusion mode corresponding to the encoder may be adopted, or a fusion mode different from the encoder may be adopted.
In the embodiment, the source end fusion vector sequence is obtained by fusing the source end vector sequences output by the neural networks of each layer in a recursive fusion mode, so that the semantic and syntax information of each layer can be fused, the hidden layer representations in the machine translation model can be more fully learned, the loss of effective information in the text translation process is reduced, and the accuracy of text translation is greatly improved.
In one embodiment, the step of performing semantic coding on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model to obtain a source end fusion vector sequence includes:
calculating to obtain a source end fusion vector sequence by the following formula:
$$\hat{H}^{2i} = \mathrm{AGG}\left(\hat{H}^{2(i-1)}, H^{2i-1}, H^{2i}\right)$$

wherein AGG represents the network fusion processing; $\hat{H}^{2i}$ represents the source end fusion vector sequence corresponding to the 2i-th layer neural network; $\hat{H}^{2(i-1)}$ represents the source end fusion vector sequence corresponding to the 2(i-1)-th layer neural network; $H^{2i-1}$ represents the source end vector sequence output by the (2i-1)-th layer neural network; and $H^{2i}$ represents the source end vector sequence output by the 2i-th layer neural network.
Specifically, referring to fig. 10, fig. 10 is a schematic structural diagram illustrating a hierarchical fusion of outputs of neural networks of respective layers in an embodiment. The computer equipment can input the word sequence of the source text into a first layer of neural network in a multilayer neural network of an encoder in a machine translation model, and obtains a source end vector sequence output by the first layer of neural network through source end layer transformation processing of the first layer of neural network. The source layer transform processing may specifically adopt the self-attention mechanism processing manner in the above embodiment. And taking the source end vector sequence output by the first layer of neural network as the input of the second layer of neural network, and obtaining the source end vector sequence output by the second layer of neural network through the source end layer transformation processing of the second layer of neural network.
On the other hand, the machine translation model can perform network fusion processing on the source end vector sequence output by the first layer of neural network and the source end vector sequence output by the second layer of neural network to obtain a network fusion vector sequence corresponding to the second layer of neural network. And the network fusion vector sequence corresponding to the second layer of neural network is used as the input of the third layer of neural network, and the source end vector sequence output by the third layer of neural network is obtained through the source end layer transformation processing of the third layer of neural network. And taking the source end vector sequence output by the third layer of neural network as the input of the fourth layer of neural network, and obtaining the source end vector sequence output by the fourth layer of neural network through the source end layer transformation processing of the fourth layer of neural network.
And the machine translation model carries out network fusion processing on the source end vector sequence output by the third layer of neural network, the source end vector sequence output by the fourth layer of neural network and the network fusion vector sequence corresponding to the second layer of neural network to obtain the network fusion vector sequence corresponding to the fourth layer of neural network. And sequentially processing layer by layer until a network fusion vector sequence corresponding to the last layer of neural network is obtained. And taking the network fusion vector sequence corresponding to the last layer of neural network as a source end fusion vector sequence.
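The hierarchical (every-two-layers) fusion can be sketched similarly. Here `layer_transform` is a toy stand-in for a real self-attention layer, and `agg` (average plus ReLU projection) is again an assumed realization of the abstract AGG; as described above, the fused sequence of each pair is fed in as the input of the next pair of layers:

```python
import numpy as np

def layer_transform(x, w):
    """Stand-in for one encoder layer (the real model uses self-attention)."""
    return np.tanh(x @ w)

def agg(seqs, w, b):
    """Fusion function AGG: average then ReLU-project (one plausible choice)."""
    return np.maximum(0.0, np.mean(seqs, axis=0) @ w + b)

def hierarchical_encode(x, layer_ws, w_agg, b_agg):
    """Every two layers: H_hat^{2i} = AGG(H_hat^{2(i-1)}, H^{2i-1}, H^{2i});
    the fused sequence becomes the input of layer 2i+1. The first pair fuses
    only its two layer outputs, since no earlier fused sequence exists."""
    fused = None
    for i in range(0, len(layer_ws), 2):
        h_odd = layer_transform(x if fused is None else fused, layer_ws[i])
        h_even = layer_transform(h_odd, layer_ws[i + 1])
        seqs = [h_odd, h_even] if fused is None else [fused, h_odd, h_even]
        fused = agg(seqs, w_agg, b_agg)
    return fused

rng = np.random.default_rng(1)
seq_len, d = 3, 4
x = rng.normal(size=(seq_len, d))
layer_ws = [rng.normal(size=(d, d)) * 0.5 for _ in range(4)]  # 4 layers = 2 pairs
w_agg, b_agg = rng.normal(size=(d, d)) * 0.1, np.zeros(d)
out = hierarchical_encode(x, layer_ws, w_agg, b_agg)
print(out.shape)  # (3, 4)
```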
It can be understood that, for the decoder, in the process that the multilayer neural network in the decoder decodes the source-end fusion vector sequence layer by layer to obtain the target-end fusion vector sequence fusing the target-end vector sequences output by the respective layers of neural networks, the above-mentioned fusion mode corresponding to the encoder may be adopted, or a fusion mode different from the encoder may be adopted.
In the embodiment, the source end fusion vector sequence is obtained by fusing the source end vector sequences output by the neural networks of each layer in a hierarchical fusion mode, semantic and syntax information of each layer can be fused, each hidden layer representation in a machine translation model can be more fully learned, loss of effective information in the text translation process is reduced, and the accuracy of text translation is greatly improved.
In one embodiment, the step of performing semantic coding on the word sequence layer by layer through the multilayer neural network of the encoder in the machine translation model to obtain the source end fusion vector sequence includes: performing semantic coding on the word sequence layer by layer through the multilayer neural network of the encoder in the machine translation model to obtain the source end vector sequence output by each layer of neural network; and performing cross-layer attention mechanism fusion on the source end vector sequences output by each layer of neural network to obtain the source end fusion vector sequence. Wherein, the cross-layer attention mechanism fusion calculation is carried out through the following formula:
$$C^{l} = \mathrm{AGG}\left(\hat{H}^{l-1}, \hat{H}^{l-2}, \ldots, \hat{H}^{1}\right)$$

wherein

$$\hat{H}^{l-k} = \mathrm{ATT}\left(Q^{l}, K^{l-k}, V^{l-k}\right), \quad k = 1, \ldots, l-1$$

wherein AGG represents the network fusion processing; ATT represents the attention mechanism processing; $C^{l}$ represents the source end fusion vector sequence corresponding to the l-th layer neural network; Q, K and V respectively represent vector sequences obtained by linearly transforming the input of the corresponding layer neural network with three different learnable parameter matrixes; and $\hat{H}^{l-k}$ represents the content vector obtained after the attention mechanism processing is performed on the (l-k)-th layer.
Referring to fig. 11, fig. 11 is a schematic diagram illustrating a structure of applying cross-layer attention mechanism fusion to outputs of neural networks of respective layers in one embodiment. Specifically, the computer device may input the word sequence of the source text into a first layer neural network in a multilayer neural network of an encoder in the machine translation model, and obtain a source end vector sequence output by the first layer neural network through source end layer transformation processing of the first layer neural network. The source layer transform processing may specifically adopt the self-attention mechanism processing manner in the above embodiment. And taking the source end vector sequence output by the first layer of neural network as the input of the second layer of neural network, and obtaining the source end vector sequence output by the second layer of neural network through the source end layer transformation processing of the second layer of neural network. And each layer of neural network sequentially executes the steps, the output of the previous layer of neural network is used as the input of the current layer of neural network, and the output of the current layer of neural network is obtained through source end layer conversion processing of the current layer of neural network.
Further, the machine translation model can perform cross-layer attention mechanism processing on the source end vector sequences output by each layer of neural network to obtain content vectors corresponding to each layer of neural network. And then carrying out network fusion processing on the content vectors corresponding to the neural networks of each layer to obtain a source end fusion vector sequence.
The cross-layer attention mechanism processing of the source end vector sequence output by each layer of neural network can be calculated by adopting the following formula:
$$\hat{H}^{l-k} = \mathrm{ATT}\left(Q^{l}, K^{l-k}, V^{l-k}\right)$$

wherein Q, K and V are obtained by linearly transforming the input of the corresponding layer (i.e., the output of the previous layer neural network) with three different learnable parameter matrixes: Q is a request (query) vector sequence, K is a key vector sequence, and V is a value vector sequence. Accordingly, $Q^{l}$ refers to the request vector sequence obtained by linearly transforming the input of the l-th layer neural network with a learnable parameter matrix; $K^{l-k}$ refers to the key vector sequence obtained by linearly transforming the input of the (l-k)-th layer neural network with a learnable parameter matrix; and $V^{l-k}$ refers to the value vector sequence obtained by linearly transforming the input of the (l-k)-th layer neural network with a learnable parameter matrix.
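A minimal sketch of the cross-layer attention fusion, assuming standard scaled dot-product attention for ATT and an average-plus-ReLU projection for AGG (both simplifications of the abstract operators above); all weight matrices are stand-ins rather than trained parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def att(q_in, kv_in, wq, wk, wv):
    """ATT: queries from the l-th layer's input, keys/values from the
    (l-k)-th layer's input (scaled dot-product attention)."""
    q, k, v = q_in @ wq, kv_in @ wk, kv_in @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def cross_layer_fusion(layer_inputs, wq, wk, wv, w_agg, b_agg):
    """Content vectors H_hat^{l-k} = ATT(Q^l, K^{l-k}, V^{l-k}) for every
    earlier layer, then AGG over them (here: average + ReLU projection,
    one plausible realization of the abstract AGG)."""
    top = layer_inputs[-1]
    contents = [att(top, low, wq, wk, wv) for low in layer_inputs[:-1]]
    return np.maximum(0.0, np.mean(contents, axis=0) @ w_agg + b_agg)

rng = np.random.default_rng(3)
seq_len, d = 3, 4
layer_inputs = [rng.normal(size=(seq_len, d)) for _ in range(6)]
wq, wk, wv = (rng.normal(size=(d, d)) * 0.5 for _ in range(3))
w_agg, b_agg = rng.normal(size=(d, d)) * 0.1, np.zeros(d)
fused = cross_layer_fusion(layer_inputs, wq, wk, wv, w_agg, b_agg)
print(fused.shape)  # (3, 4)
```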
It can be understood that, for the decoder, in the process that the multilayer neural network in the decoder decodes the source-end fusion vector sequence layer by layer to obtain the target-end fusion vector sequence fusing the target-end vector sequences output by the respective layers of neural networks, the above-mentioned fusion mode corresponding to the encoder may be adopted, or a fusion mode different from the encoder may be adopted.
In the embodiment, the source end fusion vector sequence is obtained by fusing the source end vector sequences output by the neural networks of each layer in a cross-layer attention mechanism fusion mode, semantic and syntax information of each layer can be fused, each hidden layer representation in a machine translation model can be more fully learned, loss of effective information in a text translation process is reduced, and accuracy of text translation is greatly improved.
In one embodiment, the step of training the machine translation model specifically comprises the steps of:
s1202, a sample word sequence and a reference output probability sequence of the training samples in the sample set are obtained.
In particular, a sample set is a collection of large amounts of training data that are needed for model training. The sample set comprises a sample word sequence corresponding to each sample and a corresponding reference output probability sequence. The sample data in the sample set may be obtained from a plurality of public data sets. And the candidate word corresponding to the maximum probability in the reference output probability sequence is the reference target word.
And S1204, inputting the sample word sequence into a machine translation model for training to obtain a prediction output probability sequence.
Specifically, the computer device may input the sample word sequence into the machine translation model, and execute the text translation method through the machine translation model to obtain the prediction output probability sequence. The computer device may adjust the model parameters in the direction of reducing the difference between the prediction output probability sequence and the reference output probability sequence. In this way, sample word sequences are continuously input to obtain prediction output probability sequences, and the model parameters are adjusted according to the difference between the prediction output probability sequence and the reference output probability sequence, so as to train the machine translation model.
And S1206, constructing a maximum likelihood function according to the reference output probability sequence and the prediction output probability sequence.
Wherein the maximum likelihood function is used to evaluate the degree of difference between the reference output probability sequence and the prediction output probability sequence. The maximum likelihood function drives the training toward a good machine translation model, so that the model generates target text with correct grammar and fluent wording.
In one embodiment, the maximum likelihood function may be constructed by the following equation:
$$\hat{\theta} = \arg\max_{\theta} \sum_{n=1}^{N} \log P\left(y^{n} \mid x^{n}; \theta\right)$$

wherein argmax is a function indicating the finding of the parameter with the maximum value, i.e., finding $\theta$ such that $\log P(y^{n} \mid x^{n}; \theta)$ reaches the maximum value; $(x^{n}, y^{n})$ is a training sample pair in model training, i.e., a sample word sequence and a reference output probability sequence; log is the logarithmic function; $\sum(\cdot)$ is the summation operation; and $\theta$ is the model parameter.
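A toy illustration of the quantity being maximized — the log probability the model assigns to the reference target words. The `pred` and `ref` values below are fabricated example data, not model outputs:

```python
import numpy as np

def log_likelihood(pred_probs, ref_ids):
    """Sum over target positions of log P(y_t | x; theta): the log
    probability the model assigns to the reference target words."""
    return float(np.sum(np.log(pred_probs[np.arange(len(ref_ids)), ref_ids])))

# toy example: 3 target positions over a vocabulary of 4 candidate words
pred = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.2, 0.6, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.7]])
ref = [0, 1, 3]  # reference target word ids (the highest-probability candidates)
print(round(log_likelihood(pred, ref), 4))  # -1.2242
```

Training adjusts the model parameters so that this sum, taken over all training samples, grows as large as possible.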
And S1208, constructing a layer difference function according to differences among the vector sequences corresponding to the neural networks in different layers.
Wherein the layer difference function is used to evaluate the degree of difference between different layers in the neural network. In particular, the machine translation model may construct layer difference functions based on differences between outputs of different layers one by one. In one embodiment, the machine translation model may construct a layer difference function by differences between adjacent layers.
In one embodiment, the machine translation model can respectively obtain vector sequences corresponding to different layers of convolutional neural networks; constructing a difference function according to the difference between vector sequences corresponding to different layers of convolutional neural networks; and carrying out regularization processing on the difference function to obtain a layer difference function.
Specifically, the machine translation model may calculate the difference between the vector sequences corresponding to different layers of the convolutional neural network by calculating the average of the squared cosine distances of the vectors between the hidden layers. The following formula may be referenced:
$$D\left(H^{l}, H^{l'}\right) = \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} \left(1 - \cos\left(s_{i}, t_{j}\right)\right)^{2}$$

wherein $H^{l} = \{s_{1}, s_{2}, \ldots, s_{N}\}$ and $H^{l'} = \{t_{1}, t_{2}, \ldots, t_{N}\}$; $s_{i}$ and $t_{j}$ respectively represent hidden vectors in different layers of the neural network.
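This difference measure can be sketched as follows, reading "squared cosine distance" as the square of (1 − cosine similarity) — an assumption, since the text does not pin down the convention. Under this reading, identical layers score 0 and mutually orthogonal layers score 1:

```python
import numpy as np

def layer_difference(h1, h2):
    """D(H^l, H^{l'}): average squared cosine distance over all vector pairs
    of the two layers, taking 'cosine distance' to mean 1 - cosine similarity
    (an assumption; the text leaves the exact convention open)."""
    a = h1 / np.linalg.norm(h1, axis=-1, keepdims=True)
    b = h2 / np.linalg.norm(h2, axis=-1, keepdims=True)
    return float(np.mean((1.0 - a @ b.T) ** 2))

h = np.array([[1.0, 0.0]])
print(layer_difference(h, h))                       # 0.0 (identical layers)
print(layer_difference(h, np.array([[0.0, 1.0]])))  # 1.0 (orthogonal layers)
```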
In one embodiment, the computer device may further input the vector sequences output by the neural networks of different layers to a new neural network discriminator, and learn the similarity between the layers through the neural network discriminator, so as to obtain the difference between the different layers.
Further, after the machine translation model constructs the difference function according to the difference between the vector sequences corresponding to different layers of convolutional neural networks, the difference function can be regularized to obtain a layer difference function. Reference is made to the following formula:
$$J_{\mathrm{diversity}} = \sum_{l=1}^{L-1} D\left(H^{l}, H^{l+1}\right)$$
by carrying out regularization processing on the difference function, the difference between layers can be enhanced, so that the machine translation model can better fuse the output of each layer of neural network, more fully learn each hidden layer representation in the machine translation model, and reduce the loss of effective information in the text translation process.
S1210, taking the weighted sum function of the maximum likelihood function and the layer difference function as the target function of the machine translation model.
Specifically, the machine translation model may obtain the objective function of each sample by the following formula: $J = J_{\mathrm{likelihood}} + \lambda J_{\mathrm{diversity}}$, wherein λ is a weighting coefficient; the weighted sum of the maximum likelihood function and the layer difference function gives the final objective function.
In one embodiment, the machine translation model may construct a layer difference function from the differences between all of the different layers. Alternatively, to reduce the computational effort, the machine translation model may construct a layer difference function from the differences between adjacent layers only. An objective function is then constructed from the layer difference function and the maximum likelihood function.
In one embodiment, the objective function is represented by the following formula:
$$J(\theta, \gamma) = \arg\max_{\theta, \gamma} \left\{ \log P\left(y \mid x; \theta\right) + \lambda \sum_{l=1}^{L-1} D\left(H^{l}, H^{l+1}; \gamma\right) \right\}$$

$$D\left(H^{l}, H^{l+1}\right) = \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} \left(1 - \cos\left(s_{i}, t_{j}\right)\right)^{2}$$

wherein [x, y] is a training sample pair in model training; $H^{l} = \{s_{1}, s_{2}, \ldots, s_{N}\}$ and $H^{l+1} = \{t_{1}, t_{2}, \ldots, t_{N}\}$; $s_{i}$ and $t_{j}$ respectively represent hidden vectors in different layers of the neural network; $D(H^{l}, H^{l+1})$ represents the difference value between the vector sequences corresponding to adjacent layers of the neural network; λ is a hyperparameter; and θ and γ are respectively model parameters.
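A sketch of the weighted-sum objective, combining the log likelihood with a diversity term summed over adjacent layer pairs, under the same squared-cosine-distance reading as above. The exact regularizer form, the λ value, and all example data are assumptions:

```python
import numpy as np

def layer_diff(h1, h2):
    """D(H^l, H^{l+1}): squared-cosine-distance reading (an assumption)."""
    a = h1 / np.linalg.norm(h1, axis=-1, keepdims=True)
    b = h2 / np.linalg.norm(h2, axis=-1, keepdims=True)
    return float(np.mean((1.0 - a @ b.T) ** 2))

def objective(pred_probs, ref_ids, layer_seqs, lam=0.1):
    """J = J_likelihood + lambda * J_diversity, where J_diversity sums the
    layer difference over adjacent layer pairs (a sketch, not the patent's
    definitive training procedure)."""
    ll = float(np.sum(np.log(pred_probs[np.arange(len(ref_ids)), ref_ids])))
    diversity = sum(layer_diff(layer_seqs[i], layer_seqs[i + 1])
                    for i in range(len(layer_seqs) - 1))
    return ll + lam * diversity

rng = np.random.default_rng(4)
pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])  # toy prediction probs
layers = [rng.normal(size=(2, 4)) for _ in range(3)]  # toy hidden sequences
j = objective(pred, [0, 1], layers)
```

Because the diversity term is non-negative, a positive λ can only raise the objective, rewarding parameter settings whose adjacent layers differ.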
And S1212, taking the model parameter when the target function is maximized as the model parameter of the machine translation model, returning to the step of inputting the sample word sequence into the machine translation model for training to obtain the predicted output probability sequence, and continuing training until the training stopping condition is met.
Wherein the training stop condition is the condition for ending the model training. The training stop condition may be that a preset number of iterations is reached, or that the performance index of the machine translation model after the model parameters are adjusted reaches a preset index.
Specifically, for the target function corresponding to each sample sequence, the model parameter when the target function is the maximum is taken as the model parameter of the machine translation model, and then the next sample sequence is predicted on the basis of the model parameter so as to continue training the model parameter until the training stopping condition is met.
In one embodiment, each of the at least five fusion modes provided by the present application can be implemented by training the machine translation model with the corresponding method. The various embodiments provided by the present application can be used in all mainstream neural machine translation systems and are applicable to translation tasks of all languages. The various embodiments of the present application deliver a marked improvement in translation quality in text translation.
In this embodiment, in the process of training the model, both the maximum likelihood and the differences between different layers of the neural network are considered in the training objective, so that the trained machine translation model can learn the hidden layer representations more fully, the loss of effective information in the text translation process is reduced, the accuracy of text translation is greatly improved, and the quality of the target text is higher.
As shown in fig. 13, in a specific embodiment, the text translation method specifically includes the following steps:
s1302, a word sequence of the source text is obtained.
And S1304, inputting a word sequence to a first layer of neural network in a multilayer neural network of an encoder in the machine translation model, and obtaining a source end vector sequence output by the first layer of neural network.
S1306, in a current layer neural network from a second layer neural network in the multilayer neural network, respectively obtaining first self-attention distribution weight vectors corresponding to each word in a word sequence in the current layer neural network.
S1308, respectively calculating to obtain source end vectors corresponding to each word in the neural network of the current layer and the word sequence according to the first self-attention distribution weight vector corresponding to each word in the word sequence and the source end vector sequence output by the neural network of the previous layer.
S1310, splicing the source end vectors corresponding to the words in the neural network of the current layer and in the word sequence to obtain a source end vector sequence output by the neural network of the current layer until source end vector sequences output by the neural networks of all layers are obtained.
And S1312, fusing the source end vector sequence to obtain a source end fusion vector sequence. The source end fusion vector sequence fuses the source end vector sequences output by each layer of neural network.
S1314, inputting the word vectors of the target words historically output by the machine translation model into the first layer neural network in the multilayer neural network of the decoder in the machine translation model, to obtain the target end vector sequence currently output by the first layer neural network.
S1316, obtaining second self-attention distribution weight vectors corresponding to the current target word from a second layer of neural networks in the multi-layer neural networks.
S1318, calculating a first target end vector corresponding to each target word in the current layer of neural network according to the second self-attention-assignment weight vector corresponding to each target word and the target end vector sequence corresponding to the previous layer of neural network.
S1320, respectively obtaining attention distribution weight vectors corresponding to the current target words in the current layer of neural network from the second layer of neural network in the multilayer neural network; the attention classification weight vector corresponds to the source end fusion vector sequence.
And S1322, respectively calculating to obtain a second target end vector corresponding to each target word in the current neural network of the current layer according to the attention distribution weight vector corresponding to each target word and the source end fusion vector sequence.
And S1324, fusing the first target end vector and the second target end vector to obtain a current target end vector.
S1326, splicing the target end vectors corresponding to each target word in the current layer neural network to obtain the target end vector sequence output by the current layer neural network, until the target end vector sequences currently output by each layer of neural network are obtained.
And S1328, fusing the target end vector sequence to obtain a target end fusion vector sequence.
And S1330, determining the target words output by the machine translation model at the current moment according to the target end fusion vector sequence at the current moment.
And S1332, generating a target text corresponding to the source text according to each target word output by the machine translation model.
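Steps S1316 to S1324 — the two attention passes inside one decoder layer — can be sketched as follows. Additive fusion of the two target end vectors is an assumption (the patent leaves the fusion operator open), and all weight matrices are illustrative stand-ins:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(q_in, kv_in, wq, wk, wv):
    """Scaled dot-product attention with learnable Q/K/V projections."""
    q, k, v = q_in @ wq, kv_in @ wk, kv_in @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def decoder_layer(tgt, src_fused, ws_self, ws_cross):
    """One decoder layer following S1316-S1324: a first target end vector
    from self-attention over the target words so far, a second target end
    vector from attention over the source end fusion vector sequence, and
    the two fused here by simple addition (an assumed fusion operator)."""
    t1 = attention(tgt, tgt, *ws_self)         # first target end vector
    t2 = attention(tgt, src_fused, *ws_cross)  # second target end vector
    return t1 + t2

rng = np.random.default_rng(5)
d = 4
tgt = rng.normal(size=(2, d))        # target words generated so far
src_fused = rng.normal(size=(5, d))  # source end fusion vector sequence
ws_self = [rng.normal(size=(d, d)) * 0.5 for _ in range(3)]
ws_cross = [rng.normal(size=(d, d)) * 0.5 for _ in range(3)]
out = decoder_layer(tgt, src_fused, ws_self, ws_cross)
print(out.shape)  # (2, 4)
```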
According to the text translation method, through the multilayer neural network of the encoder in the machine translation model, semantic coding is carried out on word sequences of a source text layer by layer, source end vector sequences output by the neural networks of each layer are fused, and a source end fusion vector sequence is obtained. Therefore, by fusing the source end vector sequences output by each layer of neural network, the information of each hidden layer in the machine translation model can be fused so as to learn better hidden layer representation. And decoding the source end fusion vector sequence by a decoder of the machine translation model according to the word vector of the target word output by the machine translation model in the previous time to obtain a current target end vector sequence, and determining the target word output by the machine translation model in the current time according to the target end vector sequence. And generating a target text corresponding to the source text according to each target word output by the machine translation model. Therefore, when the encoding-decoding framework is used for performing text translation on the source text, the semantic and syntax information of each layer can be fused, each hidden layer in the machine translation model can be more fully learned, the loss of effective information in the text translation process is reduced, and the accuracy of text translation is greatly improved.
FIG. 13 is a flowchart illustrating a method for text translation in one embodiment. It should be understood that, although the steps in the flowchart of fig. 13 are shown in order as indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least a portion of the steps in fig. 13 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
As shown in fig. 14, in one embodiment, there is provided a text translation apparatus 1400 comprising: an acquisition module 1401, an encoding module 1402, a decoding module 1403, a determination module 1404, and a generation module 1405.
An obtaining module 1401, configured to obtain a word sequence of a source text.
The encoding module 1402 is configured to perform semantic encoding on the word sequence layer by layer through a multilayer neural network of an encoder in the machine translation model to obtain a source end fusion vector sequence; the source end fusion vector sequence fuses the source end vector sequences output by each layer of neural network.
The decoding module 1403 is configured to decode, by a decoder of the machine translation model, the source-end fusion vector sequence according to the word vector of the target word output by the machine translation model in the previous time, so as to obtain a current target-end vector sequence.
And a determining module 1404, configured to determine, according to the target end vector sequence, a target word currently output by the machine translation model.
And the generating module 1405 is configured to generate a target text corresponding to the source text according to each target word output by the machine translation model.
In one embodiment, the encoding module 1402 is further configured to input the word sequence to a first layer neural network in a multi-layer neural network of an encoder in the machine translation model, and obtain a source end vector sequence output by the first layer neural network; respectively acquiring first self-attention distribution weight vectors corresponding to each word in a word sequence in a current-layer neural network from a second-layer neural network in the multi-layer neural network; respectively calculating to obtain source end vectors corresponding to all words in the neural network of the current layer and the words in the word sequence according to a first self-attention distribution weight vector corresponding to all the words in the word sequence and a source end vector sequence output by a neural network of the previous layer; splicing source end vectors corresponding to each word in the word sequence and in the neural network of the current layer to obtain a source end vector sequence output by the neural network of the current layer until source end vector sequences output by the neural networks of all layers are obtained; and fusing the source end vector sequence to obtain a source end fusion vector sequence.
In one embodiment, the decoding module 1403 is further configured to decode the source-end fusion vector sequence layer by layer through a multilayer neural network of a decoder in the machine translation model according to the word vector of the target word output by the machine translation model in the previous time, so as to obtain a current target-end fusion vector sequence; the target end fusion vector sequence fuses the target end vector sequences output by each layer of neural network. The determining module 1404 is further configured to determine a target word output by the machine translation model at the current time according to the target-side fusion vector sequence at the current time.
In one embodiment, the decoding module 1403 is further configured to input the word vector of the target word output historically by the machine translation model into a first layer neural network in a multilayer neural network of a decoder in the machine translation model, to obtain a target end vector sequence output by the current first layer neural network; respectively acquiring second self-attention distribution weight vectors corresponding to the current target words from a second layer of neural network in the multilayer neural network; respectively calculating to obtain first target end vectors corresponding to the target words in the current-layer neural network and the current-layer neural network according to the second self-attention distribution weight vector corresponding to each target word and the target end vector sequence corresponding to the previous-layer neural network; respectively acquiring attention distribution weight vectors corresponding to current target words in a current layer of neural network from a second layer of neural network in the multilayer neural network; the attention classification weight vector corresponds to the source end fusion vector sequence; respectively calculating to obtain second target end vectors corresponding to the target words in the current neural network of the current layer according to the attention distribution weight vectors corresponding to the target words and the source end fusion vector sequence; fusing the first target end vector and the second target end vector to obtain a current target end vector; splicing target end vectors corresponding to each target word in the current-level neural network to obtain a current-level target end vector sequence output by the current-level neural network until target end vector sequences output by each level of neural network at the current level are obtained; and fusing the target end vector sequence to obtain a target end fusion vector sequence.
In one embodiment, the encoding module 1402 is further configured to input the word sequence to a first layer neural network in a multi-layer neural network of an encoder in the machine translation model, to obtain a source end vector sequence output by the first layer neural network; acquire, starting from a second layer neural network in the multilayer neural network, preceding source end vector sequences respectively corresponding to all preceding-layer neural networks of a current layer; linearly superpose the preceding source end vector sequences to obtain a comprehensive preceding source end vector sequence; calculate a source end vector sequence output by the current layer neural network according to the comprehensive preceding source end vector sequence and the source end vector sequence corresponding to the previous layer neural network, until a source end vector sequence output by the last layer neural network in the multilayer neural network is obtained; and take the source end vector sequence output by the last layer neural network as the source end fusion vector sequence.
In one embodiment, the encoding module 1402 is further configured to perform semantic encoding on the word sequence layer by layer through a multilayer neural network of an encoder in the machine translation model, to obtain a source end vector sequence output by each layer of neural network;
carrying out linear superposition on the source end vector sequences output by each layer of neural network to obtain a source end fusion vector sequence; wherein the linear superposition calculation is performed by the following formula:

$$\tilde{H} = \sum_{i=1}^{l} W_i H_i$$

wherein $\tilde{H}$ represents the source end fusion vector sequence obtained by linearly superposing the vector sequences output by the first-layer to $l$-th-layer neural networks; $H_i$ represents the source end vector sequence output by the $i$-th-layer neural network; and $W_i$ represents the weighting coefficient corresponding to the $i$-th-layer neural network.
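The linear superposition above reduces to a position-wise weighted sum, which a minimal pure-Python sketch can make concrete (names and shapes are assumptions; in a real model the coefficients $W_i$ would be learned and the sequences would be tensors):

```python
# Linear fusion: the source end fusion vector sequence is the weighted
# sum, position by position, of the source end vector sequences H_1..H_l
# output by the encoder layers, with one coefficient W_i per layer.

def linear_fusion(layer_outputs, weights):
    """layer_outputs: list of l sequences, each a list of word vectors.
    weights: one scalar coefficient W_i per layer.
    Returns sum_i W_i * H_i computed element-wise."""
    seq_len, dim = len(layer_outputs[0]), len(layer_outputs[0][0])
    fused = [[0.0] * dim for _ in range(seq_len)]
    for H_i, W_i in zip(layer_outputs, weights):
        for pos in range(seq_len):
            for d in range(dim):
                fused[pos][d] += W_i * H_i[pos][d]
    return fused
```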
In one embodiment, the encoding module 1402 is further configured to input the word sequence to a first layer neural network in a multi-layer neural network of an encoder in the machine translation model, to obtain a source end vector sequence output by the first layer neural network; perform semantic coding on the source end vector sequence output by the first layer neural network through a second layer neural network in the multilayer neural network of the encoder, to obtain a source end vector sequence output by the second layer neural network; perform network fusion processing on the source end vector sequence output by the first layer neural network and the source end vector sequence output by the second layer neural network, to obtain a network fusion vector sequence corresponding to the second layer neural network; acquire, at a current layer neural network starting from a third layer neural network in the multilayer neural network, the network fusion vector sequence corresponding to the previous layer neural network; perform network fusion processing on the source end vector sequence output by the current layer neural network and the network fusion vector sequence corresponding to the previous layer neural network, to obtain a network fusion vector sequence corresponding to the current layer neural network, until a network fusion vector sequence corresponding to the last layer neural network in the multilayer neural network is obtained; and take the network fusion vector sequence corresponding to the last layer neural network as the source end fusion vector sequence.
In one embodiment, the encoding module 1402 is further configured to calculate a source-end fusion vector sequence by the following formula:

$$\hat{H}_{2i} = \mathrm{AGG}\left(\hat{H}_{2(i-1)},\ H_{2i-1},\ H_{2i}\right)$$

wherein AGG represents network fusion processing; $\hat{H}_{2i}$ represents the source end fusion vector sequence corresponding to the $2i$-th-layer neural network; $\hat{H}_{2(i-1)}$ represents the source end fusion vector sequence corresponding to the $2(i-1)$-th-layer neural network; $H_{2i-1}$ represents the source end vector sequence output by the $(2i-1)$-th-layer neural network; and $H_{2i}$ represents the source end vector sequence output by the $2i$-th-layer neural network.
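This hierarchical scheme can be sketched as follows, with AGG assumed here to be a simple element-wise mean of its input sequences (a stand-in for the learned network fusion processing), and layer outputs stored in a 0-based Python list:

```python
# Hierarchical fusion: the fused sequence at layer 2i combines the fused
# sequence at layer 2(i-1) with the raw outputs of layers 2i-1 and 2i,
# i.e. fused = AGG(fused_prev, H_odd, H_even), pair by pair.

def agg(*seqs):
    # element-wise mean over the input vector sequences (assumed AGG)
    n, seq_len, dim = len(seqs), len(seqs[0]), len(seqs[0][0])
    return [[sum(s[pos][d] for s in seqs) / n for d in range(dim)]
            for pos in range(seq_len)]

def hierarchical_fusion(layer_outputs):
    """layer_outputs: H_1..H_L (0-based list), L assumed even."""
    fused = agg(layer_outputs[0], layer_outputs[1])  # seed: first pair of layers
    for i in range(1, len(layer_outputs) // 2):
        h_odd, h_even = layer_outputs[2 * i], layer_outputs[2 * i + 1]
        fused = agg(fused, h_odd, h_even)  # AGG(Ĥ_{2(i-1)}, H_{2i-1}, H_{2i})
    return fused
```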
In one embodiment, the encoding module 1402 is further configured to perform semantic encoding on the word sequence layer by layer through a multilayer neural network of an encoder in the machine translation model, to obtain a source end vector sequence output by each layer of neural network;
performing cross-layer attention mechanism fusion on the source end vector sequences output by each layer of neural network to obtain a source end fusion vector sequence; wherein the cross-layer attention mechanism fusion calculation is carried out through the following formulas:

$$C^{l} = \mathrm{AGG}\left(\hat{H}^{l-1}, \hat{H}^{l-2}, \ldots, \hat{H}^{l-k}, \ldots\right)$$

$$\hat{H}^{l-k} = \mathrm{ATT}\left(Q, K^{l-k}, V^{l-k}\right)$$

wherein AGG represents network fusion processing; ATT represents attention mechanism processing; $C^{l}$ represents the source end fusion vector sequence corresponding to the $l$-th-layer neural network; $Q$, $K$, and $V$ respectively represent vector sequences obtained by linearly transforming the input of the current-layer neural network with three different learnable parameter matrices; and $\hat{H}^{l-k}$ represents the content vector obtained after attention mechanism processing is performed on the $(l-k)$-th layer.
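A hedged sketch of the cross-layer attention mechanism fusion: here ATT is taken to be single-query softmax dot-product attention over an earlier layer's vector sequence, and AGG is taken to be a simple mean of the resulting content vectors. Both are illustrative assumptions; the patented model uses learnable parameter matrices for Q, K, V and a learned aggregation.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    # one-query dot-product attention over one layer's vector sequence (ATT)
    scores = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(scores, values)) for d in range(dim)]

def cross_layer_fusion(query, earlier_layers):
    # one content vector per earlier layer, then aggregate (AGG = mean)
    contents = [attend(query, layer, layer) for layer in earlier_layers]
    dim, n = len(contents[0]), len(contents)
    return [sum(c[d] for c in contents) / n for d in range(dim)]
```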
Referring to fig. 15, in an embodiment, the text translation apparatus 1400 further includes a training module 1406, where the training module 1406 is configured to obtain a sample word sequence of a training sample in the sample set and a reference output probability sequence; inputting the sample word sequence into a machine translation model for training to obtain a predicted output probability sequence; constructing a maximum likelihood function according to the reference output probability sequence and the prediction output probability sequence; constructing layer difference functions according to differences among vector sequences corresponding to different layers of neural networks; taking the weighted sum function of the maximum likelihood function and the layer difference function as a target function of a machine translation model; and taking the model parameter when the target function is maximized as the model parameter of the machine translation model, returning to the step of inputting the sample word sequence into the machine translation model for training to obtain the predicted output probability sequence, and continuing training until the training stopping condition is met.
In one embodiment, the training module 1406 is further configured to obtain vector sequences corresponding to different layers of convolutional neural networks, respectively; constructing a difference function according to the difference between vector sequences corresponding to different layers of convolutional neural networks; and carrying out regularization processing on the difference function to obtain a layer difference function.
In one embodiment, the training module 1406 is further operable to represent the objective function by the following formulas:

$$(\hat{\theta}, \hat{\gamma}) = \mathop{\arg\max}_{\theta, \gamma}\left\{\log P(y \mid x; \theta) + \lambda \sum_{l=1}^{L-1} D\left(H_l, H_{l+1}; \theta, \gamma\right)\right\}$$

$$D\left(H_l, H_{l+1}\right) = \frac{1}{N} \sum_{i=1}^{N} d\left(s_i, t_i\right)$$

wherein $[x, y]$ is a training sample pair in model training; $P(y \mid x; \theta)$ represents a maximum likelihood function; $H_l = \{s_1, s_2, \ldots, s_N\}$ and $H_{l+1} = \{t_1, t_2, \ldots, t_N\}$; $s_i$ and $t_i$ respectively represent implicit vectors in different layers of neural networks; $d(\cdot, \cdot)$ denotes a difference measure between two implicit vectors; $D(H_l, H_{l+1})$ represents the difference value between vector sequences corresponding to adjacent layers of convolutional neural networks; $L$ represents the total number of layers of the multilayer neural network, and $N$ represents the total number of implicit vectors in each layer of the neural network; $\lambda$ is a hyperparameter; $\theta$, $\gamma$ are model parameters, respectively.
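The objective can be sketched numerically under stated assumptions: the per-vector difference measure $d$ is taken here to be Euclidean distance (the patent leaves the concrete measure open), and the log-likelihood is supplied as a precomputed scalar.

```python
import math

def layer_difference(H_l, H_next):
    # D(H_l, H_{l+1}): mean difference between corresponding implicit
    # vectors s_i and t_i of two adjacent layers (assumed: Euclidean)
    dists = [math.dist(s, t) for s, t in zip(H_l, H_next)]
    return sum(dists) / len(dists)

def objective(log_likelihood, layer_sequences, lam):
    # maximum likelihood term plus lambda-weighted sum of layer
    # difference terms over all adjacent layer pairs l = 1..L-1
    reg = sum(layer_difference(layer_sequences[l], layer_sequences[l + 1])
              for l in range(len(layer_sequences) - 1))
    return log_likelihood + lam * reg
```

Training would then seek the model parameters maximizing this value, as described above.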
The text translation device performs semantic coding on the word sequence of the source text layer by layer through the multilayer neural network of the encoder in the machine translation model, and fuses the source end vector sequences output by each layer of neural network to obtain a source end fusion vector sequence. By fusing the source end vector sequences output by each layer of neural network, the information of each hidden layer in the machine translation model can be combined so as to learn a better hidden-layer representation. The decoder of the machine translation model then decodes the source end fusion vector sequence according to the word vector of the target word output by the machine translation model at the previous time, to obtain a current target end vector sequence, and the target word output by the machine translation model at the current time is determined according to the target end vector sequence. A target text corresponding to the source text is generated according to each target word output by the machine translation model. Therefore, when the encoding-decoding framework is used to translate the source text, the semantic and syntactic information of each layer can be fused, each hidden layer in the machine translation model can be learned more fully, the loss of effective information in the translation process is reduced, and the accuracy of text translation is greatly improved.
FIG. 16 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the terminal 110 or the server 120 in fig. 1. As shown in fig. 16, the computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement a text translation method. The internal memory may also have stored therein a computer program that, when executed by the processor, causes the processor to perform a text translation method.
Those skilled in the art will appreciate that the architecture shown in fig. 16 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, the text translation apparatus provided in the present application may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 16. The memory of the computer device may store various program modules constituting the text translation apparatus, such as an acquisition module, an encoding module, a decoding module, a determination module, and a generation module shown in fig. 14. The respective program modules constitute computer programs that cause the processors to execute the steps in the text translation methods of the embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 16 may execute step S202 by the acquisition module in the text translation apparatus shown in fig. 14. The computer device may perform step S204 by the encoding module. The computer device may perform step S206 through the decoding module. The computer device may perform step S208 by the determination module. The computer device may perform step S210 through the generation module.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the text translation method described above. Here, the steps of the text translation method may be steps in the text translation method of each of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, in which a computer program is stored, which, when executed by a processor, causes the processor to perform the steps of the above-described text translation method. Here, the steps of the text translation method may be steps in the text translation method of each of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (26)

1. A method of text translation, comprising:
acquiring a word sequence of a source text;
performing semantic coding on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model to obtain a source end fusion vector sequence; the source end fusion vector sequence fuses a source end vector sequence output by the multilayer neural network;
decoding the source end fusion vector sequence through a decoder of the machine translation model according to a word vector of a target word output by the machine translation model in the previous time to obtain a current target end vector sequence;
determining a target word output by the machine translation model at the current time according to the target end vector sequence;
generating a target text corresponding to the source text according to each target word output by the machine translation model;
the fusion mode for fusing the source end vector sequence output by the multilayer neural network comprises the following steps:
fusing the source end vector sequence output by the multilayer neural network by adopting any one of a linear fusion mode, a recursive fusion mode, a hierarchical fusion mode or a cross-layer attention mechanism fusion mode to obtain a source end fusion vector sequence, or,
and calculating to obtain the output of the current layer according to the input by taking the source end vector sequences respectively output by all the preorder layer neural networks of the current layer in the multilayer neural network as the input of the current layer until the output of the last layer of neural network is obtained, namely the source end fusion vector sequence.
2. The method of claim 1, wherein decoding, by the decoder of the machine translation model, the source-side fused vector sequence according to the word vector of the target word output by the machine translation model in the previous time to obtain a current target-side vector sequence, comprises:
decoding the source end fusion vector sequence layer by layer through a multilayer neural network of a decoder in the machine translation model according to a word vector of a target word output by the machine translation model in the previous time to obtain a current target end fusion vector sequence; the target end fusion vector sequence fuses a target end vector sequence output by the multilayer neural network;
the determining the target word output by the machine translation model at the current moment according to the target end vector sequence includes:
and determining the target words output by the machine translation model at the current moment according to the target end fusion vector sequence at the current moment.
3. The method of claim 2, wherein the decoding, by a multi-layer neural network of a decoder in the machine translation model, the source-end fusion vector sequence layer by layer according to a word vector of a target word output from a previous time of the machine translation model to obtain a target-end fusion vector sequence comprises:
inputting word vectors of target words output historically by the machine translation model into a first layer of neural network in a multilayer neural network of a decoder in the machine translation model to obtain a target end vector sequence output by the first layer of neural network at the current time;
respectively acquiring second self-attention distribution weight vectors corresponding to the current target words in a current layer of neural network from a second layer of neural network in the multilayer neural network;
respectively calculating to obtain a first target end vector corresponding to each target word in the current-layer neural network at the current time according to a second self-attention distribution weight vector corresponding to each target word and a target end vector sequence corresponding to the preorder-layer neural network;
respectively acquiring attention distribution weight vectors corresponding to current target words in a current layer of neural network from a second layer of neural network in the multilayer neural network; the attention distribution weight vectors correspond to the source end fusion vector sequence;
respectively calculating to obtain second target end vectors corresponding to the target words in the current neural network of the current layer according to the attention distribution weight vectors corresponding to the target words and the source end fusion vector sequence;
fusing the first target end vector and the second target end vector to obtain a current target end vector;
splicing target end vectors corresponding to each target word in the current-level neural network to obtain a current-level target end vector sequence output by the current-level neural network until target end vector sequences output by each level of neural network at the current level are obtained;
and acquiring a target end fusion vector sequence which is output by the multi-layer neural network in a fusion manner.
4. The method of claim 1, wherein the semantic encoding the word sequence layer by layer through a multi-layer neural network of an encoder in a machine translation model to obtain a source-end fusion vector sequence comprises:
inputting the word sequence to a first layer of neural network in a multilayer neural network of an encoder in a machine translation model to obtain a source end vector sequence output by the first layer of neural network;
respectively acquiring first self-attention distribution weight vectors corresponding to each word in the word sequence in a current layer neural network from a second layer neural network in the multilayer neural network;
respectively calculating to obtain source end vectors corresponding to all words in the word sequence and in the current layer neural network according to a first self-attention distribution weight vector corresponding to all the words in the word sequence and a source end vector sequence output by a preorder layer neural network;
splicing source end vectors corresponding to words in the word sequence and in the neural network of the current layer to obtain a source end vector sequence output by the neural network of the current layer until source end vector sequences output by the neural networks of all layers are obtained;
and acquiring a source end fusion vector sequence which fuses the source end vector sequences output by the multilayer neural network.
5. The method of any one of claims 1 to 3, wherein the multi-layer neural network is a dense connection network; the semantic coding is carried out on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model, and the obtaining of the source end fusion vector sequence comprises the following steps:
inputting the word sequence to a first layer of neural network in a multilayer neural network of an encoder in a machine translation model to obtain a source end vector sequence output by the first layer of neural network;
obtaining, starting from a second layer neural network in the multilayer neural network, preceding source end vector sequences respectively corresponding to all preceding-layer neural networks of a current layer;
linearly superposing the preceding source end vector sequences to obtain a comprehensive preceding source end vector sequence;
calculating to obtain a source end vector sequence output by the current layer of neural network according to the comprehensive preceding source end vector sequence and a source end vector sequence corresponding to the previous layer of neural network until a source end vector sequence output by the last layer of neural network in the multilayer neural network is obtained;
and taking the source end vector sequence output by the last layer of neural network as a source end fusion vector sequence.
6. The method according to any one of claims 1 to 3, wherein the semantic encoding the word sequence layer by layer through a multi-layer neural network of an encoder in a machine translation model to obtain a source-end fusion vector sequence comprises:
semantic coding is carried out on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model, and a source end vector sequence output by each layer of neural network is obtained;
carrying out linear superposition on the source end vector sequences output by each layer of neural network to obtain a source end fusion vector sequence; wherein the linear superposition calculation is performed by the following formula:

$$\tilde{H} = \sum_{i=1}^{l} W_i H_i$$

wherein $\tilde{H}$ represents the source end fusion vector sequence obtained by linearly superposing the vector sequences output by the first-layer to $l$-th-layer neural networks; $H_i$ represents the source end vector sequence output by the $i$-th-layer neural network; and $W_i$ represents the weighting coefficient corresponding to the $i$-th-layer neural network.
7. The method according to any one of claims 1 to 3, wherein the semantic encoding the word sequence layer by layer through a multi-layer neural network of an encoder in a machine translation model to obtain a source-end fusion vector sequence comprises:
inputting the word sequence to a first layer of neural network in a multilayer neural network of an encoder in a machine translation model to obtain a source end vector sequence output by the first layer of neural network;
semantic coding is carried out on a source end vector sequence output by a first layer of neural network through a second layer of neural network in a multilayer neural network of an encoder in the machine translation model, and a source end vector sequence output by the second layer of neural network is obtained;
carrying out network fusion processing on a source end vector sequence output by a first layer of neural network and a source end vector sequence output by a second layer of neural network to obtain a network fusion vector sequence corresponding to the second layer of neural network;
acquiring, at a current layer of neural network starting from a third layer of neural network in the multilayer neural network, a network fusion vector sequence corresponding to a previous layer of neural network;
performing network fusion processing on a source end vector sequence output by a current layer of neural network and a network fusion vector sequence corresponding to the previous layer of neural network to obtain a network fusion vector sequence corresponding to the current layer of neural network until a network fusion vector sequence corresponding to the last layer of neural network in the multilayer neural network is obtained;
and taking the network fusion vector sequence corresponding to the last layer of neural network as a source end fusion vector sequence.
8. The method according to any one of claims 1 to 3, wherein the semantic encoding the word sequence layer by layer through a multi-layer neural network of an encoder in a machine translation model to obtain a source-end fusion vector sequence comprises:
calculating to obtain a source end fusion vector sequence by the following formula:

$$\hat{H}_{2i} = \mathrm{AGG}\left(\hat{H}_{2(i-1)},\ H_{2i-1},\ H_{2i}\right)$$

wherein AGG represents network fusion processing; $\hat{H}_{2i}$ represents the source end fusion vector sequence corresponding to the $2i$-th-layer neural network; $\hat{H}_{2(i-1)}$ represents the source end fusion vector sequence corresponding to the $2(i-1)$-th-layer neural network; $H_{2i-1}$ represents the source end vector sequence output by the $(2i-1)$-th-layer neural network; and $H_{2i}$ represents the source end vector sequence output by the $2i$-th-layer neural network.
9. The method according to any one of claims 1 to 3, wherein the semantic encoding the word sequence layer by layer through a multi-layer neural network of an encoder in a machine translation model to obtain a source-end fusion vector sequence comprises:
semantic coding is carried out on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model, and a source end vector sequence output by each layer of neural network is obtained;
performing cross-layer attention mechanism fusion on the source end vector sequences output by each layer of neural network to obtain a source end fusion vector sequence; wherein the cross-layer attention mechanism fusion calculation is carried out through the following formulas:

$$C^{l} = \mathrm{AGG}\left(\hat{H}^{l-1}, \hat{H}^{l-2}, \ldots, \hat{H}^{l-k}, \ldots\right)$$

$$\hat{H}^{l-k} = \mathrm{ATT}\left(Q, K^{l-k}, V^{l-k}\right)$$

wherein AGG represents network fusion processing; ATT represents attention mechanism processing; $C^{l}$ represents the source end fusion vector sequence corresponding to the $l$-th-layer neural network; $Q$, $K$, and $V$ respectively represent vector sequences obtained by linearly transforming the input of the current-layer neural network with three different learnable parameter matrices; and $\hat{H}^{l-k}$ represents the content vector obtained after attention mechanism processing is performed on the $(l-k)$-th layer.
10. The method of claim 1, wherein the step of training the machine translation model comprises:
acquiring a sample word sequence and a reference output probability sequence of training samples in a sample set;
inputting the sample word sequence into a machine translation model for training to obtain a predicted output probability sequence;
constructing a maximum likelihood function according to the reference output probability sequence and the prediction output probability sequence;
constructing layer difference functions according to differences among vector sequences corresponding to different layers of neural networks;
taking a weighted sum function of the maximum likelihood function and the layer difference function as a target function of the machine translation model;
and taking the model parameter when the target function is maximized as the model parameter of a machine translation model, returning to the step of inputting the sample word sequence into the machine translation model for training to obtain a predicted output probability sequence, and continuing training until the training is stopped when the training stopping condition is met.
11. The method according to claim 10, wherein constructing the layer difference function according to the differences between the vector sequences corresponding to the neural networks of different layers comprises:
respectively obtaining vector sequences corresponding to different layers of convolutional neural networks;
constructing a difference function according to the difference between vector sequences corresponding to different layers of convolutional neural networks;
and carrying out regularization processing on the difference function to obtain a layer difference function.
12. The method of claim 11, wherein said applying a weighted sum function of the maximum likelihood function and the layer difference function as an objective function of the machine translation model comprises:
the objective function is represented by the following formula:
$$(\hat{\theta}, \hat{\gamma}) = \mathop{\arg\max}_{\theta, \gamma}\left\{\log P(y \mid x; \theta) + \lambda \sum_{l=1}^{L-1} D\left(H_l, H_{l+1}; \theta, \gamma\right)\right\}$$

$$D\left(H_l, H_{l+1}\right) = \frac{1}{N} \sum_{i=1}^{N} d\left(s_i, t_i\right)$$

wherein $[x, y]$ is a training sample pair in model training; $P(y \mid x; \theta)$ represents a maximum likelihood function; $H_l = \{s_1, s_2, \ldots, s_N\}$, $H_{l+1} = \{t_1, t_2, \ldots, t_N\}$; $s_i$ and $t_i$ respectively represent implicit vectors in different layers of neural networks; $d(\cdot, \cdot)$ denotes a difference measure between two implicit vectors; $D(H_l, H_{l+1})$ represents the difference value between vector sequences corresponding to adjacent layers of convolutional neural networks; $L$ represents the total number of layers of the multilayer neural network, and $N$ represents the total number of implicit vectors in each layer of the neural network; $\lambda$ is a hyperparameter; $\theta$, $\gamma$ are model parameters, respectively.
13. A text translation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a word sequence of a source text;
the encoding module is used for carrying out semantic encoding on the word sequence layer by layer through a multilayer neural network of an encoder in a machine translation model to obtain a source end fusion vector sequence; the source end fusion vector sequence fuses a source end vector sequence output by the multilayer neural network;
the decoding module is used for decoding the source end fusion vector sequence according to the word vector of the target word output by the machine translation model in the previous time through a decoder of the machine translation model to obtain a current target end vector sequence;
the determining module is used for determining a target word output by the machine translation model at the current time according to the target end vector sequence;
the generating module is used for generating a target text corresponding to the source text according to each target word output by the machine translation model;
the fusion mode for fusing the source end vector sequence output by the multilayer neural network comprises the following steps:
the method comprises the steps of adopting any one of a linear fusion mode, a recursive fusion mode, a hierarchical fusion mode or a cross-layer attention mechanism fusion mode to fuse source end vector sequences output by a multi-layer neural network to obtain a source end fusion vector sequence, or obtaining the output of a current layer according to input calculation by taking source end vector sequences respectively output by all pre-layer neural networks of the current layer in the multi-layer neural network as the input of the current layer until the output of the last layer of neural network is obtained, namely the source end fusion vector sequence.
14. The apparatus according to claim 13, wherein the decoding module is specifically configured to decode, layer by layer, the source end fusion vector sequence through a multilayer neural network of the decoder in the machine translation model according to the word vector of the target word output by the machine translation model at the previous time step, to obtain a current target end fusion vector sequence; the target end fusion vector sequence fuses the target end vector sequences output by the multilayer neural network;
the determining module is specifically configured to determine a target word output by the machine translation model at the current time according to the target-end fusion vector sequence at the current time.
15. The apparatus according to claim 14, wherein the decoding module is specifically configured to input the word vector of each target word previously output by the machine translation model into a first layer neural network in the multilayer neural network of the decoder in the machine translation model, to obtain a target end vector sequence currently output by the first layer neural network; from a second layer neural network in the multilayer neural network onwards, respectively acquire second self-attention distribution weight vectors currently corresponding to each target word in the current layer neural network; respectively calculate, according to the second self-attention distribution weight vector corresponding to each target word and the target end vector sequence corresponding to the preceding-layer neural network, a first target end vector currently corresponding to each target word in the current layer neural network; from the second layer neural network in the multilayer neural network onwards, respectively acquire attention distribution weight vectors currently corresponding to each target word in the current layer neural network, the attention distribution weight vectors corresponding to the source end fusion vector sequence; respectively calculate, according to the attention distribution weight vector corresponding to each target word and the source end fusion vector sequence, a second target end vector currently corresponding to each target word in the current layer neural network; fuse the first target end vector and the second target end vector to obtain a current target end vector; splice the target end vectors corresponding to each target word in the current layer neural network to obtain a target end vector sequence currently output by the current layer neural network, until the target end vector sequences currently output by each layer neural network are obtained; and acquire a target end fusion vector sequence which fuses the target end vector sequences output by the multilayer neural network.
16. The apparatus according to claim 13, wherein the encoding module is specifically configured to input the word sequence to a first layer neural network in the multilayer neural network of the encoder in the machine translation model, to obtain a source end vector sequence output by the first layer neural network; from a second layer neural network in the multilayer neural network onwards, respectively acquire first self-attention distribution weight vectors corresponding to each word in the word sequence in the current layer neural network; respectively calculate, according to the first self-attention distribution weight vector corresponding to each word in the word sequence and the source end vector sequence output by the preceding-layer neural network, a source end vector corresponding to each word in the word sequence in the current layer neural network; splice the source end vectors corresponding to the words in the word sequence in the current layer neural network to obtain a source end vector sequence output by the current layer neural network, until the source end vector sequences output by each layer neural network are obtained; and acquire a source end fusion vector sequence which fuses the source end vector sequences output by the multilayer neural network.
17. The apparatus of any one of claims 13 to 15, wherein the multilayer neural network is a densely connected network; the encoding module is specifically configured to input the word sequence to a first layer neural network in the multilayer neural network of the encoder in the machine translation model, to obtain a source end vector sequence output by the first layer neural network; from a second layer neural network in the multilayer neural network onwards, obtain the preceding source end vector sequences respectively corresponding to all preceding-layer neural networks of the current layer; linearly superpose the preceding source end vector sequences to obtain a comprehensive preceding source end vector sequence; calculate, according to the comprehensive preceding source end vector sequence and the source end vector sequence corresponding to the previous layer neural network, a source end vector sequence output by the current layer neural network, until a source end vector sequence output by the last layer neural network in the multilayer neural network is obtained; and take the source end vector sequence output by the last layer neural network as the source end fusion vector sequence.
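A minimal sketch of the densely connected encoding of claim 17, assuming each layer is a simple callable. The toy layers and the use of a plain element-wise sum as the "linear superposition" are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def dense_connection_encode(word_vectors, layers):
    """Densely connected stack: from the second layer on, each layer
    receives the linear superposition of all preceding layers' outputs
    (this sum includes the immediately previous layer's sequence)."""
    outputs = [layers[0](word_vectors)]          # first-layer source sequence
    for layer in layers[1:]:
        comprehensive = np.sum(outputs, axis=0)  # comprehensive preceding sequence
        outputs.append(layer(comprehensive))
    return outputs[-1]                           # last layer acts as the fusion sequence

# Toy layers (hypothetical stand-ins for real neural network layers):
layers = [lambda x: x, lambda x: x + 1.0, lambda x: x * 2.0]
print(dense_connection_encode(np.array([1.0, 2.0]), layers))  # [ 6. 10.]
```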
18. The apparatus according to any one of claims 13 to 15, wherein the encoding module is specifically configured to perform semantic encoding on the word sequence layer by layer through the multilayer neural network of the encoder in the machine translation model, to obtain a source end vector sequence output by each layer of the neural network; and linearly superpose the source end vector sequences output by each layer of the neural network to obtain the source end fusion vector sequence; wherein the linear superposition is calculated by the following formula:
$\tilde{H} = \sum_{i=1}^{l} W_i H_i$

wherein $\tilde{H}$ represents the source end fusion vector sequence; $H_i$ represents the source end vector sequence output by the i-th layer neural network; $W_i$ represents the weighting coefficient corresponding to the i-th layer neural network; and the formula linearly superposes the vector sequences output by the first-layer to l-th-layer neural networks.
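Under the claim-18 reading that the fused sequence is a weighted sum of per-layer outputs, a numeric sketch follows; the fixed toy weights are assumptions (in the model the coefficients W_i would be learned).

```python
import numpy as np

def linear_fusion(layer_outputs, weights):
    """Linear fusion mode: H_fused = sum_i W_i * H_i over the
    per-layer source end vector sequences."""
    fused = np.zeros_like(layer_outputs[0])
    for w, h in zip(weights, layer_outputs):
        fused = fused + w * h
    return fused

rng = np.random.default_rng(0)
H = [rng.standard_normal((4, 2)) for _ in range(3)]  # 3 layers, 4 tokens, dim 2
W = [0.2, 0.3, 0.5]                                  # toy weighting coefficients
print(linear_fusion(H, W).shape)  # (4, 2)
```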
19. The apparatus according to any one of claims 13 to 15, wherein the encoding module is specifically configured to input the word sequence to a first layer neural network in the multilayer neural network of the encoder in the machine translation model, to obtain a source end vector sequence output by the first layer neural network; perform semantic encoding on the source end vector sequence output by the first layer neural network through a second layer neural network in the multilayer neural network of the encoder in the machine translation model, to obtain a source end vector sequence output by the second layer neural network; perform network fusion processing on the source end vector sequence output by the first layer neural network and the source end vector sequence output by the second layer neural network, to obtain a network fusion vector sequence corresponding to the second layer neural network; from a third layer neural network in the multilayer neural network onwards, acquire in the current layer neural network the network fusion vector sequence corresponding to the previous layer neural network; perform network fusion processing on the source end vector sequence output by the current layer neural network and the network fusion vector sequence corresponding to the previous layer neural network, to obtain a network fusion vector sequence corresponding to the current layer neural network, until a network fusion vector sequence corresponding to the last layer neural network in the multilayer neural network is obtained; and take the network fusion vector sequence corresponding to the last layer neural network as the source end fusion vector sequence.
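The recursion of claim 19 can be sketched as follows; the element-wise mean standing in for the learned network fusion processing is an assumption made purely for illustration.

```python
import numpy as np

def network_fusion(a, b):
    # Stand-in for the learned fusion of two vector sequences.
    return (a + b) / 2.0

def recursive_fusion(layer_outputs):
    """Claim-19 style recursion: fuse layer 1 with layer 2, then fuse
    the running result with each subsequent layer's output in turn;
    the final running result is the source end fusion vector sequence."""
    fused = network_fusion(layer_outputs[0], layer_outputs[1])
    for h in layer_outputs[2:]:
        fused = network_fusion(fused, h)
    return fused

H = [np.full((2, 2), float(i + 1)) for i in range(3)]  # 3 constant toy layers
print(recursive_fusion(H)[0, 0])  # ((1+2)/2 + 3)/2 = 2.25
```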
20. The apparatus according to any of claims 13 to 15, wherein the encoding module obtains the source-end fusion vector sequence by calculating according to the following formula:
$\tilde{H}_{2i} = \mathrm{AGG}\big(\tilde{H}_{2(i-1)}, H_{2i-1}, H_{2i}\big)$

wherein AGG represents the fusion network processing; $\tilde{H}_{2i}$ represents the source end fusion vector sequence corresponding to the 2i-th layer neural network; $\tilde{H}_{2(i-1)}$ represents the source end fusion vector sequence corresponding to the 2(i-1)-th layer neural network; $H_{2i-1}$ represents the source end vector sequence output by the (2i-1)-th layer neural network; and $H_{2i}$ represents the source end vector sequence output by the 2i-th layer neural network.
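Reading the formula of claim 20 as pairwise bottom-up fusion over an even number of layers, a numeric sketch is given below; the element-wise mean standing in for the learned fusion network AGG is an assumption.

```python
import numpy as np

def agg(*seqs):
    # Stand-in for the learned fusion network AGG: element-wise mean.
    return sum(seqs) / len(seqs)

def hierarchical_fusion(layer_outputs):
    """Pairwise bottom-up fusion over an even number of layer outputs:
    fused_2 = AGG(H_1, H_2); fused_2i = AGG(fused_2(i-1), H_2i-1, H_2i)."""
    fused = agg(layer_outputs[0], layer_outputs[1])
    for i in range(1, len(layer_outputs) // 2):
        fused = agg(fused, layer_outputs[2 * i], layer_outputs[2 * i + 1])
    return fused

H = [np.full((3, 2), float(i + 1)) for i in range(4)]  # 4 constant toy layers
print(hierarchical_fusion(H)[0, 0])  # ((1+2)/2 + 3 + 4)/3
```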
21. The apparatus according to any one of claims 13 to 15, wherein the encoding module is specifically configured to perform semantic encoding on the word sequence layer by layer through the multilayer neural network of the encoder in the machine translation model, to obtain a source end vector sequence output by each layer of the neural network; and perform cross-layer attention mechanism fusion on the source end vector sequences output by each layer of the neural network to obtain the source end fusion vector sequence; wherein the cross-layer attention mechanism fusion is calculated by the following formulas:
$C^{l} = \mathrm{AGG}\big(H^{l}, \hat{H}^{l-1}, \ldots, \hat{H}^{1}\big)$

$\hat{H}^{l-k} = \mathrm{ATT}\big(Q, K^{l-k}, V^{l-k}\big)$

$\mathrm{ATT}(Q, K, V) = \mathrm{softmax}\big(Q K^{\top}\big) V$

wherein AGG represents the fusion network processing; ATT represents the attention mechanism processing; $C^{l}$ represents the source end fusion vector sequence corresponding to the l-th layer neural network; Q, K and V respectively represent the vector sequences obtained by linearly transforming the input of the current layer neural network with three different learnable parameter matrices; and $\hat{H}^{l-k}$ represents the content vector obtained after attention mechanism processing is performed on the (l-k)-th layer.
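A minimal numeric sketch of the cross-layer attention mechanism fusion of claim 21. Several simplifications are assumptions: the learnable Q/K/V projection matrices are omitted (queries come directly from the top layer, keys and values directly from each lower layer), and AGG is replaced by a simple average.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def att(q, k, v):
    # Scaled dot-product attention over one lower layer's sequence.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def cross_layer_attention_fusion(layer_outputs):
    """Queries from the top layer attend to every lower layer; the
    resulting content vectors are aggregated (here simply averaged
    together with the top layer itself, a stand-in for AGG)."""
    q = layer_outputs[-1]
    contents = [att(q, h, h) for h in layer_outputs[:-1]]
    return sum(contents + [q]) / (len(contents) + 1)

rng = np.random.default_rng(1)
H = [rng.standard_normal((4, 2)) for _ in range(3)]  # 3 layers, 4 tokens, dim 2
print(cross_layer_attention_fusion(H).shape)  # (4, 2)
```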
22. The apparatus of claim 13, further comprising a training module configured to obtain a sample word sequence of training samples in a sample set and a reference output probability sequence; input the sample word sequence into the machine translation model for training to obtain a predicted output probability sequence; construct a maximum likelihood function according to the reference output probability sequence and the predicted output probability sequence; construct a layer difference function according to the differences between the vector sequences corresponding to different layers of the neural network; take a weighted sum function of the maximum likelihood function and the layer difference function as the objective function of the machine translation model; and take the model parameters at which the objective function is maximized as the model parameters of the machine translation model, returning to the step of inputting the sample word sequence into the machine translation model for training to obtain a predicted output probability sequence and continuing training, until training is stopped when a training stop condition is met.
23. The apparatus of claim 22, wherein the training module is further configured to obtain the vector sequences respectively corresponding to different layers of the neural network; construct a difference function according to the differences between the vector sequences corresponding to the different layers of the neural network; and perform regularization processing on the difference function to obtain the layer difference function.
24. The apparatus of claim 23, wherein the training module is further configured to represent an objective function by the following formula:
$\{\hat{\theta}, \hat{\gamma}\} = \arg\max_{\theta, \gamma} \Big\{ \log P(y \mid x; \theta) + \lambda \sum_{l=1}^{L-1} D\big(H^{l}, H^{l+1}\big) \Big\}$

$D\big(H^{l}, H^{l+1}\big) = \frac{1}{N} \sum_{i=1}^{N} \big(1 - \cos(s_i, t_i)\big)$
wherein [x, y] is a training sample pair in model training; P(y|x; θ) represents the maximum likelihood function; H^l = {s_1, s_2, …, s_N} and H^(l+1) = {t_1, t_2, …, t_N}; s_i and t_i respectively represent implicit vectors in different layers of the neural network; D(H^l, H^(l+1)) represents the difference value between the vector sequences corresponding to adjacent layers of the neural network; L represents the total number of layers of the multilayer neural network, and N represents the total number of implicit vectors in each layer of the neural network; λ is a hyperparameter; θ and γ are model parameters.
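The weighted-sum objective of claims 22–24 can be sketched numerically. The cosine-distance form of D below is an assumption; the claim text only requires some regularized difference measure between adjacent layers' vector sequences.

```python
import numpy as np

def layer_difference(H_l, H_next):
    # D(H^l, H^(l+1)): mean (1 - cosine similarity) over the N positions
    # of two adjacent layers' vector sequences (an assumed measure).
    num = (H_l * H_next).sum(axis=1)
    den = np.linalg.norm(H_l, axis=1) * np.linalg.norm(H_next, axis=1)
    return float(np.mean(1.0 - num / den))

def objective(log_likelihood, layer_states, lam):
    """Weighted sum of the maximum-likelihood term and the layer
    difference regularizer, summed over adjacent layer pairs."""
    reg = sum(layer_difference(layer_states[l], layer_states[l + 1])
              for l in range(len(layer_states) - 1))
    return log_likelihood + lam * reg

H1 = np.array([[1.0, 0.0], [0.0, 1.0]])
print(objective(-1.0, [H1, H1], 0.1))  # identical adjacent layers -> no regularization: -1.0
```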
25. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 12.
26. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 12.
CN201811026196.6A 2018-09-04 2018-09-04 Text translation method and device, readable storage medium and computer equipment Active CN109271646B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811026196.6A CN109271646B (en) 2018-09-04 2018-09-04 Text translation method and device, readable storage medium and computer equipment
CN202010164964.5A CN111382584B (en) 2018-09-04 2018-09-04 Text translation method and device, readable storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811026196.6A CN109271646B (en) 2018-09-04 2018-09-04 Text translation method and device, readable storage medium and computer equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010164964.5A Division CN111382584B (en) 2018-09-04 2018-09-04 Text translation method and device, readable storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN109271646A CN109271646A (en) 2019-01-25
CN109271646B true CN109271646B (en) 2022-07-08

Family

ID=65187539

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010164964.5A Active CN111382584B (en) 2018-09-04 2018-09-04 Text translation method and device, readable storage medium and computer equipment
CN201811026196.6A Active CN109271646B (en) 2018-09-04 2018-09-04 Text translation method and device, readable storage medium and computer equipment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010164964.5A Active CN111382584B (en) 2018-09-04 2018-09-04 Text translation method and device, readable storage medium and computer equipment

Country Status (1)

Country Link
CN (2) CN111382584B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635269B (en) * 2019-01-31 2023-06-16 苏州大学 Post-translation editing method and device for machine translation text
CN109918684A (en) * 2019-03-05 2019-06-21 腾讯科技(深圳)有限公司 Model training method, interpretation method, relevant apparatus, equipment and storage medium
CN111368564B (en) * 2019-04-17 2022-04-08 腾讯科技(深圳)有限公司 Text processing method and device, computer readable storage medium and computer equipment
CN110175338B (en) * 2019-05-31 2023-09-26 北京金山数字娱乐科技有限公司 Data processing method and device
CN110334359B (en) * 2019-06-05 2021-06-15 华为技术有限公司 Text translation method and device
CN110427630B (en) * 2019-06-10 2023-10-13 北京捷通华声科技股份有限公司 Machine translation method, device, electronic equipment, storage medium and translation model
CN110457713B (en) * 2019-06-19 2023-07-28 腾讯科技(深圳)有限公司 Translation method, device, equipment and storage medium based on machine translation model
CN110263353B (en) * 2019-06-25 2023-10-13 北京金山数字娱乐科技有限公司 Machine translation method and device
CN110414012B (en) * 2019-07-29 2022-12-09 腾讯科技(深圳)有限公司 Artificial intelligence-based encoder construction method and related equipment
CN110472255B (en) * 2019-08-20 2021-03-02 腾讯科技(深圳)有限公司 Neural network machine translation method, model, electronic terminal, and storage medium
CN110717340B (en) * 2019-09-29 2023-11-21 百度在线网络技术(北京)有限公司 Recommendation method, recommendation device, electronic equipment and storage medium
CN110717345B (en) * 2019-10-15 2020-07-07 内蒙古工业大学 Translation realignment recurrent neural network cross-language machine translation method
CN110889450B (en) * 2019-11-27 2023-08-11 腾讯科技(深圳)有限公司 Super-parameter tuning and model construction method and device
CN111862956B (en) * 2020-07-27 2022-07-12 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN111931517B (en) * 2020-08-26 2023-12-12 腾讯科技(深圳)有限公司 Text translation method, device, electronic equipment and storage medium
CN112395832B (en) * 2020-11-17 2024-05-21 上海金桥信息股份有限公司 Text quantitative analysis and generation method and system based on sequence-to-sequence
CN112597778B (en) * 2020-12-14 2023-06-13 华为技术有限公司 Translation model training method, translation method and translation equipment
CN112883149B (en) * 2021-01-20 2024-03-26 华为技术有限公司 Natural language processing method and device
CN112990434B (en) * 2021-03-09 2023-06-20 平安科技(深圳)有限公司 Training method of machine translation model and related device
CN112733043B (en) * 2021-03-30 2021-07-23 腾讯科技(深圳)有限公司 Comment recommendation method and device
CN114564562B (en) * 2022-02-22 2024-05-14 平安科技(深圳)有限公司 Question generation method, device, equipment and storage medium based on answer guidance
CN117473170B (en) * 2023-12-27 2024-04-09 布比(北京)网络技术有限公司 Intelligent contract template recommendation method and device based on code characterization and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368476A (en) * 2017-07-25 2017-11-21 深圳市腾讯计算机系统有限公司 The method and relevant apparatus that a kind of method of translation, target information determine
CN107608973A (en) * 2016-07-12 2018-01-19 华为技术有限公司 A kind of interpretation method and device based on neutral net
CN107870902A (en) * 2016-09-26 2018-04-03 谷歌公司 Neural machine translation system
WO2018081089A1 (en) * 2016-10-26 2018-05-03 Deepmind Technologies Limited Processing text sequences using neural networks

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126507B (en) * 2016-06-22 2019-08-09 哈尔滨工业大学深圳研究生院 A kind of depth nerve interpretation method and system based on character code
US10169656B2 (en) * 2016-08-29 2019-01-01 Nec Corporation Video system using dual stage attention based recurrent neural network for future event prediction
CN106844352B (en) * 2016-12-23 2019-11-08 中国科学院自动化研究所 Word prediction method and system based on neural machine translation system
CN107357789B (en) * 2017-07-14 2020-10-02 哈尔滨工业大学 Neural machine translation method fusing multi-language coding information
CN107608943B (en) * 2017-09-08 2020-07-28 中国石油大学(华东) Image subtitle generating method and system fusing visual attention and semantic attention
CN108304388B (en) * 2017-09-12 2020-07-07 腾讯科技(深圳)有限公司 Machine translation method and device
CN107967262B (en) * 2017-11-02 2018-10-30 内蒙古工业大学 A kind of neural network illiteracy Chinese machine translation method
CN108268452A (en) * 2018-01-15 2018-07-10 东北大学 A kind of professional domain machine synchronous translation device and method based on deep learning
CN108415977B (en) * 2018-02-09 2022-02-15 华南理工大学 Deep neural network and reinforcement learning-based generative machine reading understanding method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107608973A (en) * 2016-07-12 2018-01-19 华为技术有限公司 A kind of interpretation method and device based on neutral net
CN107870902A (en) * 2016-09-26 2018-04-03 谷歌公司 Neural machine translation system
WO2018081089A1 (en) * 2016-10-26 2018-05-03 Deepmind Technologies Limited Processing text sequences using neural networks
CN107368476A (en) * 2017-07-25 2017-11-21 深圳市腾讯计算机系统有限公司 The method and relevant apparatus that a kind of method of translation, target information determine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deep Learning Research for Natural Language Processing; Xi Xuefeng et al.; Acta Automatica Sinica (《自动化学报》); 2016-10-31; Vol. 42, No. 10; pp. 1445-1465 *

Also Published As

Publication number Publication date
CN111382584B (en) 2022-07-29
CN109271646A (en) 2019-01-25
CN111382584A (en) 2020-07-07

Similar Documents

Publication Publication Date Title
CN109271646B (en) Text translation method and device, readable storage medium and computer equipment
US11853709B2 (en) Text translation method and apparatus, storage medium, and computer device
CN110162669B (en) Video classification processing method and device, computer equipment and storage medium
CN108427771B (en) Abstract text generation method and device and computer equipment
CN110534087B (en) Text prosody hierarchical structure prediction method, device, equipment and storage medium
US11113479B2 (en) Utilizing a gated self-attention memory network model for predicting a candidate answer match to a query
CN109785824B (en) Training method and device of voice translation model
KR102565274B1 (en) Automatic interpretation method and apparatus, and machine translation method and apparatus
CN110032633B (en) Multi-turn dialogue processing method, device and equipment
CN110475129B (en) Video processing method, medium, and server
WO2020140487A1 (en) Speech recognition method for human-machine interaction of smart apparatus, and system
CN108665506B (en) Image processing method, image processing device, computer storage medium and server
CN109960747B (en) Video description information generation method, video processing method and corresponding devices
EP3885966B1 (en) Method and device for generating natural language description information
EP3586276A1 (en) Sequence processing using online attention
CN104541324A (en) A speech recognition system and a method of using dynamic bayesian network models
CN109299479A (en) Translation memory is incorporated to the method for neural machine translation by door control mechanism
CN108763230B (en) Neural machine translation method using external information
CN113240115A (en) Training method for generating face change image model and related device
CN116912642A (en) Multimode emotion analysis method, device and medium based on dual-mode and multi-granularity interaction
CN109979461B (en) Voice translation method and device
WO2020040255A1 (en) Word coding device, analysis device, language model learning device, method, and program
CN112837673B (en) Speech synthesis method, device, computer equipment and medium based on artificial intelligence
JP2020057357A (en) Operating method and training method of neural network, and neural network thereof
CN113488028A (en) Speech transcription recognition training decoding method and system based on rapid skip decoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant