CN110008482B - Text processing method and device, computer readable storage medium and computer equipment


Info

Publication number
CN110008482B
Authority
CN
China
Prior art keywords
vector
layer
word
target
sentence
Prior art date
Legal status
Active
Application number
CN201910308349.4A
Other languages
Chinese (zh)
Other versions
CN110008482A (en)
Inventor
王星 (Xing Wang)
涂兆鹏 (Zhaopeng Tu)
王龙跃 (Longyue Wang)
史树明 (Shuming Shi)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010164622.3A (CN111368564B)
Priority to CN201910308349.4A (CN110008482B)
Publication of CN110008482A
Application granted
Publication of CN110008482B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a text processing method and apparatus, a computer-readable storage medium, and a computer device, wherein the method includes: acquiring an input sequence of a source text; semantically encoding the input sequence to obtain a source end vector sequence; acquiring a first weight vector corresponding to each word in the source end vector sequence; generating a target end vector for each word according to the source end vector sequence and the first weight vector corresponding to each word; obtaining a target sentence vector according to the source end vector sequence; determining a target word corresponding to each word according to the target end vector of each word and the target sentence vector; and generating a target text corresponding to the source text according to the target word corresponding to each word. With this scheme, each word can be translated using sentence-level information, improving translation accuracy.

Description

Text processing method and device, computer readable storage medium and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a text processing method and apparatus, a computer-readable storage medium, and a computer device.
Background
With the continuous development of machine learning techniques, machine translation technology has emerged, and neural machine translation is the latest generation of this technology. In current neural machine translation research and applications, an attention mechanism is generally used to select a word in the source-end sentence for decoding and translation.
However, when current neural machine translation frameworks use the attention mechanism to select an appropriate word for translation, they cannot fully take the information of the whole source-end sentence into account, so the translated text is not accurate enough. For example, for certain ambiguous words, translation errors caused by context cannot be fully ruled out.
Disclosure of Invention
Based on this, it is necessary to provide a text processing method, an apparatus, a computer-readable storage medium, and a computer device that address the technical problem of translation errors caused by failing to take context into account.
A text processing method, comprising:
acquiring an input sequence of a source text;
semantically encoding the input sequence to obtain a source end vector sequence;
acquiring a first weight vector corresponding to each word in the source end vector sequence;
generating a target end vector of each word according to the source end vector sequence and the first weight vector corresponding to each word;
obtaining a target sentence vector according to the source end vector sequence;
determining a target word corresponding to each word according to the target end vector of each word and the target sentence vector;
and generating a target text corresponding to the source text according to the target word corresponding to each word.
A text processing apparatus, the apparatus comprising:
the sequence acquisition module is used for acquiring an input sequence of a source text;
the encoding module is used for carrying out semantic encoding on the input sequence to obtain a source end vector sequence;
the weight obtaining module is used for obtaining a first weight vector corresponding to each word in the source end vector sequence;
a target end vector generating module, configured to generate a target end vector of each word according to the source end vector sequence and the first weight vector corresponding to each word;
the target sentence vector determining module is used for obtaining a target sentence vector according to the source end vector sequence;
a target word determining module, configured to determine a target word corresponding to each word according to the target end vector of each word and the target sentence vector;
and the target text generation module is used for generating a target text corresponding to the source text according to the target word corresponding to each word.
A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of any of the methods described above.
A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of any of the methods described above.
According to the above text processing method, apparatus, computer-readable storage medium, and computer device, the input sequence of the source text is acquired and semantically encoded to obtain the source end vector sequence. A first weight vector corresponding to each word in the source end vector sequence is acquired, and the target end vector of each word is generated according to the source end vector sequence and the first weight vector corresponding to each word, so that the vector of each word at the source end is converted into a vector of that word at the target end. A target sentence vector is obtained according to the source end vector sequence, so that the target sentence vector fuses the key information of each source end word and each word is associated with the words before and after it. The target word corresponding to each word is determined according to the target end vector of each word and the target sentence vector, and finally the target text corresponding to the source text is generated from the target words. This solves the problem in traditional text translation methods where the corresponding target word is determined only from the target end vector of a single word, ignoring the semantics of the other words in the sentence and making the translation inaccurate. With this scheme, each word can be translated using sentence-level information, improving translation accuracy.
Drawings
FIG. 1 is a diagram of an application environment of a text processing method in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for text processing in one embodiment;
FIG. 3 is a flow diagram that illustrates the processing of text by the translation model in one embodiment;
FIG. 4 is a flowchart illustrating the steps of obtaining a target sentence vector in one embodiment;
FIG. 5 is a flowchart illustrating the steps of obtaining a target sentence vector in one embodiment;
FIG. 6 is a flow diagram illustrating deep sentence vector modeling in one embodiment;
FIG. 7 is a flowchart illustrating the steps of generating a deep sentence vector in one embodiment;
FIG. 8 is a schematic diagram of deep sentence vector modeling in a further embodiment;
FIG. 9 is a flowchart illustrating the steps of generating shallow sentence vectors in one embodiment;
FIG. 10 is a flowchart illustrating the steps of determining a target word in one embodiment;
FIG. 11 is an architecture diagram of a neural network machine translation system, in accordance with an embodiment;
FIG. 12 is a block diagram showing a configuration of a text processing apparatus according to another embodiment;
FIG. 13 is a block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
FIG. 1 is a diagram of an application environment of a text processing method in one embodiment. Referring to fig. 1, the text processing method is applied to a text processing system. The text processing system includes a terminal 110 and a server 120. The terminal 110 and the server 120 are connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers. In this embodiment, the terminal 110 may obtain a source text to be translated, perform word segmentation on the source text, and obtain a word vector corresponding to each word of the source end, so as to obtain an input sequence corresponding to the source text. The terminal 110 performs an operation of translating the source text into the target text according to the input sequence. After the terminal 110 obtains the input sequence corresponding to the source text, the input sequence may be sent to the server 120, and the server 120 performs an operation of translating the source text into the target text according to the input sequence.
As shown in FIG. 2, in one embodiment, a method of text processing is provided. The embodiment is mainly illustrated by applying the method to the computer device in fig. 1. The computer device may be a terminal or a server. Referring to fig. 2, the text processing method specifically includes the following steps:
step 202, an input sequence of source text is obtained.
Wherein, the source text refers to the text to be translated. The source text may be a sentence, a paragraph, a chapter, or other text. The source text may be, but is not limited to, Chinese text and English text. The input sequence refers to the sequence formed by the word vectors corresponding to each word after the source text is segmented.
Specifically, the computer device obtains a text to be translated and performs word segmentation on it. The computer device then obtains the word vector corresponding to each segmented word to obtain the input sequence corresponding to the text to be translated.
In this embodiment, the computer device may perform word segmentation on the source text by a semantic word segmentation method, a character-matching word segmentation method, a statistical word segmentation method, or the like. After segmentation, the word vector corresponding to each word is determined from the word table. The word table records the word vector corresponding to each word, one word vector per word. After the computer device segments the source text, it searches the word table for the entry identical to each source-text word and uses the word vector of that entry as the word vector of the source-text word.
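As an illustration of this lookup, the following is a minimal Python sketch; the whitespace tokenizer, the toy word table, and the 4-dimensional vectors are hypothetical stand-ins, not the patent's actual segmentation method or vocabulary.

```python
import numpy as np

# Minimal sketch of building an input sequence: segment the source text, then
# look up each word's vector in the word table. The whitespace tokenizer, the
# toy word table, and the 4-dimensional vectors are hypothetical stand-ins.
word_table = {
    "i":     np.array([0.1, 0.3, 0.2, 0.5]),
    "love":  np.array([0.7, 0.1, 0.4, 0.2]),
    "cats":  np.array([0.2, 0.6, 0.1, 0.3]),
    "<unk>": np.zeros(4),  # fallback for words missing from the table
}

def build_input_sequence(source_text):
    words = source_text.lower().split()  # stand-in for a real segmenter
    return np.stack([word_table.get(w, word_table["<unk>"]) for w in words])

X = build_input_sequence("I love cats")  # shape: (number of words, vector dim)
```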
In this embodiment, the computer device may directly perform word segmentation on the source text to obtain the input sequence. Alternatively, the source text may be sent to a third-party device, which performs the word segmentation; the computer device then obtains the input sequence corresponding to the source text directly from the third-party device.
And 204, performing semantic coding on the input sequence to obtain a source end vector sequence.
Wherein, semantic encoding processes information by words: information is systematically classified according to word sense and word category, or the language material is organized and summarized in a specific linguistic form, identifying its basic arguments, supporting points, and logical structure, and the words are encoded according to their semantic features. Systematic classification by word category means classifying word information into different category systems such as sports, news, and entertainment. The source end vector sequence is the sequence obtained by semantically encoding the input sequence.
Specifically, the computer device inputs the input sequence into a multilayer neural network of an encoder, and the multilayer neural network of the encoder performs semantic coding on the input sequence layer by layer to obtain a source end vector sequence output by each layer of the neural network.
In this embodiment, the computer device inputs the input sequence into the first layer neural network of the encoder, and the first layer neural network performs semantic coding on the input sequence to obtain a source end vector sequence output by the first layer neural network. And then, the source end vector sequence output by the first layer of neural network is used as the input of the second layer of neural network, and the second layer of neural network carries out semantic coding on the source end vector sequence output by the first layer of neural network to obtain the source end vector sequence output by the second layer of neural network. Similarly, the source end vector sequence output by the current layer is used as the input of the next layer to obtain the source end vector sequence output by the next layer until the source end vector sequence output by the last layer of neural network is obtained.
In this embodiment, the computer device inputs an input sequence into a first layer neural network of the encoder, the input sequence being formed of word vectors of a plurality of words of the source text. And the first layer of neural network carries out semantic coding on the word vector of each word in the input sequence to obtain a source end vector corresponding to the word vector of each word output by the first layer of neural network. And when the first layer of neural network completes semantic coding on the word vectors of all the words in the input sequence, obtaining source end vectors corresponding to the word vectors of all the words output by the first layer of neural network, wherein the source end vectors form a source end vector sequence output by the first layer of neural network.
Step 206, a first weight vector corresponding to each word in the source end vector sequence is obtained.
Wherein, the first weight vector is obtained by performing an attention operation on the source end vector sequence according to the output of the previous layer.
Specifically, each word corresponds to a source end vector. An attention operation is performed on the source end vector input into the first layer of neural network according to a preset initial value or a randomly generated initial value, to obtain the weight vector corresponding to the source end vector output by the first layer of neural network; this weight vector is the weight vector of the word corresponding to that source end vector. Similarly, the weight vector corresponding to each word output by the first-layer neural network is obtained in the same manner. Then, from the second layer of neural network onward, an attention operation is performed on the source end vector sequence of the last layer of the encoder according to the weight vector corresponding to each word output by the previous layer, to obtain the weight vector corresponding to each word output by the current layer, until the weight vector corresponding to each word output by the last layer of neural network is obtained. The weight vector corresponding to each word output by the last layer of neural network is taken as the first weight vector corresponding to each word in the source end vector sequence.
In the present embodiment, each layer of neural network in the encoder corresponds to each layer of neural network in the decoder. Each word corresponds to a source end vector. The computer equipment obtains a source end vector sequence output by the last layer of neural network in the encoder, and inputs the source end vector sequence of the last layer of the encoder into the first layer of neural network of the decoder. And the first-layer neural network of the decoder performs attention operation on one source end vector in the source end vector sequence through a preset initial value or a randomly generated initial value, and outputs a weight vector corresponding to the source end vector, wherein the weight vector is the weight vector of a word corresponding to the source end vector. Similarly, a weight vector corresponding to each source end vector of the first-layer neural network output of the decoder can be obtained, and therefore a weight vector corresponding to each word of the first-layer neural network output of the decoder can be obtained.
Then, the computer device inputs the weight vector corresponding to each word output by the first layer neural network of the decoder into the second layer neural network of the decoder, and the source end vector sequence is input into the second layer neural network of the decoder. And performing attention operation on the source end vector sequence through the weight vector corresponding to each word output by the first layer of neural network to obtain the weight vector corresponding to each word output by the second layer of neural network of the decoder. And from the second layer of the neural network of the decoder, taking the weight vector corresponding to each word output by the neural network of the previous layer and the source end vector sequence as the input of the current layer. And performing attention operation on the source end vector sequence through the weight vector corresponding to each word output by the upper layer of neural network to obtain the weight vector corresponding to each word output by the current layer of neural network. By analogy, a weight vector corresponding to each word output by the last layer of the neural network of the decoder can be obtained, and the weight vector corresponding to each word output by the last layer of the neural network of the decoder is used as a first weight vector corresponding to each word in the source end vector sequence.
It should be noted that, in this embodiment, the source end vector sequence in each layer of the neural network of the input decoder is the source end vector sequence output by the last layer of the neural network of the encoder.
And step 208, generating a target end vector of each word according to the source end vector sequence and the first weight vector corresponding to each word.
And the target end vector is obtained by inputting the source end vector corresponding to each word in the input sequence into a hidden layer of a decoder and calculating. The hidden layer may include a plurality of neural network layers.
Specifically, the computer device performs dot product calculation on the source end vector sequence and the weight vector corresponding to each word output by the last layer of neural network of the decoder, so as to obtain a target end vector corresponding to each word.
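A minimal numerical sketch of this step, assuming a toy 3-word source sequence and an already-computed first weight vector:

```python
import numpy as np

# Sketch of step 208: the target-end vector of a word is the dot product of its
# first weight vector with the source-end vector sequence, i.e. a weighted
# combination of the source-end vectors. Values are toy examples.
H = np.array([[1.2, 1.4, 0.6],
              [0.2, 1.3, 1.6],
              [0.5, 0.2, 1.7]])   # source-end vector sequence, one row per word
w = np.array([0.7, 0.2, 0.1])    # first weight vector of one word (sums to 1)
target_end_vector = w @ H        # weighted sum of the source-end vectors
```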
And step 210, obtaining a target sentence vector according to the source end vector sequence.
The target sentence vector is obtained by processing the source end vector sequence according to a specific rule. The target sentence vector includes a shallow sentence vector and a deep sentence vector.
Specifically, the computer device processes the source-end vector sequence according to a preset rule to obtain a target sentence vector corresponding to the source-end vector sequence. And the shallow sentence vector or the deep sentence vector is obtained by processing the source end vector sequence by the computer equipment according to different preset rules.
Step 212, determining a target word corresponding to each word according to the target end vector and the target sentence vector of each word.
Step 214, generating a target text corresponding to the source text according to the target word corresponding to each word.
Wherein, the target words refer to the words obtained by translating the words in the source text. The target text refers to the translated text obtained from the source text.
Specifically, the computer device may linearly superimpose the target end vector and the target sentence vector for each word. And obtaining a word vector corresponding to the candidate word at the target end, matching the vector obtained after linear superposition with the word vector corresponding to the candidate word at the target end to obtain a target word corresponding to each word, and thus generating a target text corresponding to the source text.
In this embodiment, when the target sentence vector is a shallow sentence vector, the target end vector of each word and the shallow sentence vector may be linearly superimposed to obtain a superimposed vector, and the next processing may be performed. When the target sentence vector is a deep sentence vector, the target end vector of each word and the deep sentence vector can be linearly superposed to obtain a superposed vector and perform the next processing.
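The following sketch illustrates this superposition-and-matching step; dot-product matching against the candidate word vectors is an assumption here, and the candidate list stands in for the target-end word bank.

```python
import numpy as np

# Sketch of steps 212-214: linearly superimpose the target-end vector and the
# target sentence vector, then match the result against the candidate word
# vectors of the target end. Dot-product matching is an assumption here.
def pick_target_word(target_end_vec, sentence_vec, cand_words, cand_vecs):
    combined = target_end_vec + sentence_vec   # linear superposition
    scores = cand_vecs @ combined              # match against each candidate
    return cand_words[int(np.argmax(scores))]  # best-matching target word
```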
According to the text processing method, the input sequence of the source text is acquired and semantically encoded to obtain the source end vector sequence. A first weight vector corresponding to each word in the source end vector sequence is acquired, and the target end vector of each word is generated according to the source end vector sequence and the first weight vector corresponding to each word, so that the vector of each word at the source end is converted into a vector of that word at the target end. A target sentence vector is obtained according to the source end vector sequence, so that the target sentence vector fuses the key information of each source end word and each word is associated with the words before and after it. The target word corresponding to each word is determined according to the target end vector of each word and the target sentence vector, and finally the target text corresponding to the source text is generated from the target words. This solves the problem in traditional text translation methods where the corresponding target word is determined only from the target end vector of a single word, ignoring the semantics of the other words in the sentence and making the translation inaccurate. With this scheme, each word can be translated using sentence-level information, improving translation accuracy.
In one embodiment, the implementation process of obtaining the first weight vector corresponding to each word in the source end vector sequence and generating the target end vector of each word according to the source end vector sequence and the first weight vector corresponding to each word is as follows:
specifically, the computer device obtains a source end vector sequence output by the last layer of neural network in the encoder, and inputs the source end vector sequence of the last layer of the encoder into the first layer of neural network of the decoder. And the first-layer neural network of the decoder performs attention operation on one source end vector in the source end vector sequence through a preset initial value or a randomly generated initial value, and outputs a weight vector corresponding to the source end vector, wherein the weight vector is a first weight vector of a word corresponding to the source end vector. Similarly, a first weight vector corresponding to each source end vector of the first layer neural network output of the decoder can be obtained, and therefore a first weight vector corresponding to each word of the first layer neural network output of the decoder can be obtained. And then, the computer equipment carries out weighted summation on the first weight vector corresponding to each word output by the first layer of neural network and the source end vector sequence respectively to obtain a target end vector corresponding to each word output by the first layer of neural network of the decoder.
Then, the computer device inputs the target end vector corresponding to each word output by the first layer neural network of the decoder and the source end vector sequence into the second layer neural network of the decoder, and calculates a first weight vector corresponding to each word in the second layer neural network. And weighting and summing the first weight vector corresponding to each word in the second layer of neural network and the source end vector sequence to obtain a target end vector corresponding to each word output by the second layer of neural network. Similarly, a target end vector corresponding to each word output by the last layer of the neural network of the decoder can be obtained. And then, determining a target word corresponding to each word according to the target end vector and the target sentence vector corresponding to each word output by the last layer of neural network of the decoder.
In one embodiment, the step of generating a target end vector for each word further comprises: and determining the target end vector output at the current moment according to the target end vector output at the previous moment and the source end vector sequence.
Specifically, the decoder decodes the source end words one at a time; that is, each decoding time step yields the target word corresponding to one source end word. The computer device inputs the source end vector sequence output by the last layer of the encoder and a preset initial value into the first layer of the neural network of the decoder, to obtain the target end vector output by the first layer of the neural network of the decoder at the first moment. Then, the computer device performs an attention operation on the source end vector sequence using the target end vector output at the first moment, to obtain the target end vector output by the first layer of the decoder at the second moment. Similarly, the attention operation is performed on the source end vector sequence using the target end vector output at the previous moment, to obtain the target end vector output at the current moment, thereby obtaining the target end vector corresponding to each source end word output by the first layer of the decoder at each moment. In the same manner, the target end vector corresponding to each source end word output by the last layer of the decoder at each moment is obtained. The target word corresponding to each word is then determined according to the target end vector corresponding to each source end word output by the last layer of the decoder at each moment and the target sentence vector.
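A sketch of this time-step recurrence, assuming scaled dot-product attention as the attention operation and a hypothetical initial vector d_init:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Sketch of the time-step recurrence: the target-end vector output at the
# previous moment attends over the source-end vector sequence H to produce the
# target-end vector at the current moment. Scaled dot-product attention and
# the initial vector d_init are assumptions.
def decode_target_vectors(H, num_steps, d_init):
    d, outputs = d_init, []
    for _ in range(num_steps):
        weights = softmax(H @ d / np.sqrt(H.shape[1]))  # attention weights
        d = weights @ H                                 # current target-end vector
        outputs.append(d)
    return outputs
```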
In the text processing method, attention operation is performed on the source end vector sequence through the target end vector output at the previous moment, so that the output probability of the current moment is predicted according to the output of the previous moment, and the target end vector output at the current moment is obtained. And then the target word at the current moment can be predicted according to the target word at the previous moment, and the translation operation of the source text is completed.
FIG. 3 is a flow diagram illustrating processing of text by a neural network machine translation model in one embodiment. The source text is input into the encoder, and the encoding module semantically encodes the source text to obtain the source end vector sequence. The source end vector sequence and the word vector corresponding to the target word output at the previous moment are then input into the attention module, which applies the attention mechanism to the source end vector sequence to obtain the current source end content vector, i.e., the source end context at the current moment. The source end context at the current moment is then input into the decoder, where the decoding module decodes it and outputs the target word at the current moment.
In one embodiment, the target sentence vector is a shallow sentence vector; the obtaining of the target sentence vector according to the source end vector sequence includes: acquiring a source end vector sequence of an encoder output layer; and determining the average value of the source end vector sequence of the output layer on the corresponding dimension to generate a shallow sentence vector.
The shallow sentence vector is obtained by processing a source end vector sequence output by an encoder output layer according to a preset rule. The preset rule may be to find an average value of the source-end vector sequence in the corresponding dimension.
Specifically, the computer device obtains a source end vector sequence of the output of the encoder output layer, calculates an average value of the source end vector sequence of the output layer in each dimension, and takes the average value in each dimension as a shallow sentence vector corresponding to the source end vector sequence of the encoder output layer.
For example, the source end vector sequence output by the encoder output layer is H = [(1.2,1.4,0.6), (0.2,1.3,1.6), (0.5,0.2,1.7)], where each vector represents the source end vector obtained by semantically encoding one word. Each vector is three-dimensional. To calculate the shallow sentence vector corresponding to the source end vector sequence H, the average of the 3 vectors in the first, second, and third dimensions is computed, giving the vector h = [(1.2+0.2+0.5)/3, (1.4+1.3+0.2)/3, (0.6+1.6+1.7)/3] ≈ (0.63, 0.97, 1.3). The calculated vector h is the shallow sentence vector corresponding to the source end vector sequence H of the encoder output layer. When the vector corresponding to each word is 512-dimensional, the source end vector sequence is a sequence of 512-dimensional vectors; averaging the source end vector sequence over dimensions 1 to 512 yields a 512-dimensional vector, i.e., the shallow sentence vector is 512-dimensional. Note that in practical application scenarios the dimension of the vector corresponding to each word includes, but is not limited to, 512; it can be set according to different requirements.
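The worked example above, checked numerically as a short numpy sketch:

```python
import numpy as np

# The worked example above, checked numerically: the shallow sentence vector is
# the per-dimension average of the output layer's source-end vectors.
H = np.array([[1.2, 1.4, 0.6],
              [0.2, 1.3, 1.6],
              [0.5, 0.2, 1.7]])
h = H.mean(axis=0)  # -> approximately [0.633, 0.967, 1.3]
```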
According to the text processing method, the average value of the source end vector sequence of the encoder output layer in the corresponding dimension is solved, each average value fuses the element information of each word in the corresponding dimension, so that the obtained shallow layer sentence vector fuses the information of each word, and the information represented by a single word is converted into the information represented by the whole sentence.
In one embodiment, the target sentence vector is a shallow sentence vector; the obtaining of the target sentence vector according to the source end vector sequence includes: acquiring a source end vector sequence of an encoder output layer; and determining the maximum value of the source end vector sequence of the output layer on the corresponding dimension to generate a shallow sentence vector.
The shallow sentence vector is obtained by processing a source end vector sequence output by an encoder output layer according to a preset rule. The preset rule may be to find the maximum value of the source-end vector sequence in the corresponding dimension.
Specifically, the computer device obtains a source-end vector sequence of the output of the encoder output layer, determines a maximum value of the source-end vector sequence of the output layer in each dimension, and obtains a vector composed of the maximum values in each dimension. And taking the vector formed by the maximum values in each dimension as a shallow sentence vector corresponding to the source end vector sequence of the encoder output layer.
For example, the source end vector sequence output by the encoder output layer is H = [(1.2,1.4,0.6), (0.2,1.3,1.6), (0.5,0.2,1.7)], where each vector represents the source end vector obtained by semantically encoding one word. Each vector is three-dimensional. To obtain the shallow sentence vector corresponding to the source end vector sequence H, the maximum of the 3 vectors in the first, second, and third dimensions is computed, giving the vector h = (1.2, 1.4, 1.7). The calculated vector h is the shallow sentence vector corresponding to the source end vector sequence H of the encoder output layer. When the vector corresponding to each word is 512-dimensional, the source end vector sequence is a sequence of 512-dimensional vectors; taking the maximum of the source end vector sequence over dimensions 1 to 512 yields a 512-dimensional vector, i.e., the shallow sentence vector is 512-dimensional. Likewise, in practical application scenarios the dimension of the vector corresponding to each word includes, but is not limited to, 512.
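The max-pooled variant of the same example, as a numpy sketch:

```python
import numpy as np

# The max-pooled variant of the same example: the shallow sentence vector keeps
# the maximum of the source-end vectors in each dimension.
H = np.array([[1.2, 1.4, 0.6],
              [0.2, 1.3, 1.6],
              [0.5, 0.2, 1.7]])
h = H.max(axis=0)  # -> [1.2, 1.4, 1.7]
```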
In the text processing method, the element of each word in the corresponding dimension represents the importance degree of each word in the dimension. By determining the maximum value of the source-side vector sequence of the encoder output layer in the corresponding dimension, the most representative information in each dimension can be determined. The maximum value of the source end vector sequence of the output layer in the corresponding dimension is used as a shallow sentence vector, so that each component element of the obtained shallow sentence vector represents the most important information in each dimension, and the shallow sentence vector keeps the whole information of each word.
In one embodiment, as shown in FIG. 4, the target sentence vector is a shallow sentence vector; the obtaining of the target sentence vector according to the source end vector sequence includes:
step 402, an input sequence of an encoder input layer is obtained.
Step 404, determine the maximum value of the input sequence of the input layer in the corresponding dimension, and obtain an intermediate vector.
The intermediate vector is a vector obtained by calculating the maximum value of the input sequence in the corresponding dimension.
Specifically, the computer device obtains an input sequence of an encoder input layer, determines a maximum value of the input sequence of the encoder input layer in each dimension, obtains a vector formed by the maximum values in each dimension, and obtains an intermediate vector.
Step 406, a source-end vector sequence of the encoder output layer is obtained.
Step 408, determining a similarity vector between the intermediate vector and the source end vector sequence of the output layer.
Wherein, the similarity vector refers to the logical similarity between the query (here, the intermediate vector) and each key-value pair formed from the source end vector sequence.
Specifically, the computer device obtains the source end vector sequence of the encoder output layer and models the similarity between the intermediate vector and the source end vector sequence of the output layer by dot product, obtaining a similarity vector. For example, the similarity between the intermediate vector q and the source end vector sequence H of the output layer is obtained by

e = q · H^T / √d

where H^T is the transpose of the source end vector sequence H of the output layer, and d is a constant; when the source end vector sequence is a 512-dimensional vector sequence, d is 512.
Step 410, a weight vector corresponding to the similarity is obtained according to the similarity vector.
Wherein, the weight vector is obtained by normalizing the similarity vector. Normalization (for example, by a softmax function) transforms a K-dimensional vector containing arbitrary real numbers into another K-dimensional real vector in which each element ranges from 0 to 1 and all elements sum to 1.
Specifically, the computer device obtains a similarity vector between the intermediate vector and the source end vector sequence of the output layer, and performs normalization processing on the similarity vector to obtain a weight vector corresponding to the similarity vector.
Step 412, generating a shallow sentence vector according to the weight vector and the source end vector sequence of the output layer.
Specifically, the computer device performs dot product calculation on the weight vector and the source end vector sequence of the output layer to obtain a shallow sentence vector corresponding to the source end vector sequence of the output layer. And the shallow sentence vector corresponding to the source end vector sequence of the output layer is the target sentence vector.
For example, the similarity vector e is normalized to obtain the corresponding weight vector E, and a dot product is performed on the weight vector E and the source end vector sequence H of the output layer to obtain the shallow sentence vector g, that is, g = E · H.
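Putting steps 402 to 412 together, a minimal sketch assuming scaled dot-product similarity and softmax normalization:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Sketch of steps 402-412: q is the per-dimension maximum of the encoder input
# sequence X (the intermediate vector); e = q·H^T/sqrt(d) is its similarity to
# the output-layer sequence H; the normalized weights E pool H into the shallow
# sentence vector g. Softmax as the normalization is an assumption.
def shallow_sentence_vector(X, H):
    q = X.max(axis=0)                # intermediate vector (step 404)
    e = H @ q / np.sqrt(H.shape[1])  # similarity vector (step 408)
    E = softmax(e)                   # weight vector (step 410)
    return E @ H                     # shallow sentence vector g (step 412)
```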
According to the text processing method, the input sequence of the input layer of the encoder is obtained, the maximum value of the input sequence of the input layer on the corresponding dimension is determined, the intermediate vector is obtained, and therefore the key information of the source terminal word is extracted. And acquiring a source end vector sequence of an output layer of the encoder, and determining a similarity vector between the intermediate vector and the source end vector sequence of the output layer so as to determine the logic similarity between a source end word and the sequence subjected to semantic encoding. And obtaining a weight vector corresponding to the similarity according to the similarity vector, and generating a shallow sentence vector according to the weight vector and the source end vector sequence of the output layer, so that information of a source end word and information of the sequence subjected to semantic coding can be integrated, and key information of the word is integrated into a sentence.
In one embodiment, as shown in fig. 5, the target sentence vector is a deep sentence vector; the obtaining of the target sentence vector according to the source end vector sequence includes:
step 502, a source-end vector sequence of each layer of the encoder is obtained.
And step 504, obtaining shallow sentence vectors of each layer according to the source end vector sequence of each layer.
Specifically, the computer device obtains a source-end vector sequence output by each layer of the neural network of the encoder, and obtains the source-end vector sequence of each layer. And then, performing attention operation on the source end vector sequence output by the first layer of neural network to obtain a first layer of shallow sentence vector. And taking the shallow sentence vector of the first layer as the input of the second layer neural network, and performing attention operation on the source end vector sequence through the shallow sentence vector of the first layer to obtain the shallow sentence vector of the second layer. Similarly, from the second layer of neural network, the output of the previous layer of neural network is taken as the input of the current layer, the attention operation is performed on the source end vector sequence of the current layer, the shallow sentence vector of the current layer is obtained, and thus the shallow sentence vector of each layer is obtained.
Step 506, according to the shallow sentence vectors of each layer, deep sentence vectors are generated.
Wherein, the deep sentence vector is obtained by modeling the shallow sentence vector of each layer.
Specifically, the computer device updates the hidden state of the first layer neural network with the shallow sentence vector of the first layer as input. And then the hidden state of the first layer of neural network after being updated and the shallow sentence vector of the second layer are used as the input of the second layer of neural network, and the hidden state of the second layer of neural network is updated. Similarly, until the hidden state of the last layer of neural network is updated, the updated hidden state of the last layer of neural network is used as the deep sentence vector.
In the text processing method, the shallow sentence vector of each layer is obtained according to the source end vector sequence of each layer by obtaining the source end vector sequence of each layer of the encoder. And generating deep sentence vectors according to the shallow sentence vectors of each layer, so that the obtained deep sentence vectors are fused with the information of the shallow sentence vectors of each layer. Therefore, global information of the source text is reserved in the deep sentence vector, and the target text obtained through translation is more accurate.
In one embodiment, the generating a deep sentence vector from the shallow sentence vectors of each layer includes:
inputting the shallow sentence vector of each layer into a recurrent neural network to obtain a deep sentence vector output by an output layer of the recurrent neural network; each layer of network of the recurrent neural network corresponds to each layer of network of the encoder, the input of each layer of the recurrent neural network comprises a shallow layer sentence vector of a corresponding layer in the encoder and an implicit state vector output by a layer above the layer in the recurrent neural network, and the implicit state vector is obtained after the recurrent neural network processes the shallow layer sentence vector input by the layer above.
Wherein, a Recurrent Neural Network (RNN) is an artificial neural network in which nodes are connected directionally so as to form cycles. The internal state of such a network can exhibit dynamic temporal behavior, and an RNN can use its internal memory to process input sequences of arbitrary timing.
Specifically, the computer device inputs the shallow sentence vector of the first layer and an empirically preset initial value or a randomly generated initial value into the first layer of the recurrent neural network, and updates the implicit state vector of the first layer of the recurrent neural network. And then, the updated hidden state vector of the first layer of the recurrent neural network and the shallow sentence vector of the second layer are used as the input of the second layer of the recurrent neural network, and the hidden state vector of the second layer of the recurrent neural network is updated. Similarly, from the second layer of the recurrent neural network, the updated hidden state vector of the previous layer of the recurrent neural network and the shallow sentence vector corresponding to the current layer are used as the input of the current layer of the recurrent neural network, so that the hidden state vector of the current layer of the recurrent neural network is updated. And after the hidden state vector of the last layer of the recurrent neural network is updated, taking the updated hidden state vector of the last layer of the recurrent neural network as the deep sentence vector. The shallow sentence vectors of each layer are input into the recurrent neural network to obtain the deep sentence vectors output by the output layer of the recurrent neural network, so that the obtained deep sentence vectors are fused with the information of the shallow sentence vectors of each layer, and the global information of the source text is reserved.
As shown in FIG. 6, g_n ∈ (g_1, g_2, g_3), where g_n is the shallow sentence vector of each layer in the encoder, and the deep sentence vector is g̃. The recurrent neural network has 3 layers, so the deep sentence vector g̃ = r_3 is obtained by the recurrence r_n = g_n · r_{n-1}. First, an empirically preset initial value r_0 or a randomly generated initial value r_0 is obtained. The first-layer shallow sentence vector g_1 in the encoder and r_0 are input into the first layer of the recurrent neural network, and a weighted calculation on g_1 and r_0 yields the updated hidden state vector r_1 of the first layer of the recurrent neural network, i.e., r_1 = g_1 · r_0. Then the updated hidden state vector r_1 of the first layer of the recurrent neural network and the second-layer shallow sentence vector g_2 in the encoder are taken as the input of the second layer of the recurrent neural network, and the updated hidden state vector r_2 of the second layer is obtained. Similarly, according to the above method, the updated hidden state vector r_3 of the last layer of the recurrent neural network is obtained, and r_3 is taken as the deep sentence vector.
In one embodiment, as shown in fig. 7, the generating a deep sentence vector according to the shallow sentence vector of each layer includes:
step 702, determining a similarity vector between a shallow sentence vector of a current layer and a target end vector of a previous layer of the current layer.
Step 704, determining a weight vector between the shallow sentence vector of the current layer and the target end vector of the previous layer of the current layer according to the similarity vector.
Step 706, generate a deep sentence vector according to the weight vector and the shallow sentence vector of each layer.
Specifically, each layer of neural network in the encoder corresponds to each layer of neural network in the decoder. The computer equipment acquires shallow sentence vectors output by each layer of neural network in the encoder and the initial target end vector of the decoder. And inputting the shallow sentence vector output by the first-layer neural network of the encoder and the initial target end vector of the decoder into the first-layer neural network of the decoder, and calculating the logic similarity between the initial target end vector and the first-layer shallow sentence vector to obtain the similarity vector output by the first-layer neural network. And carrying out normalization processing on the similarity vector to obtain a weight vector output by the first layer of neural network.
Then, the computer device obtains the shallow sentence vector output by the second layer neural network in the encoder, and obtains the target end vector output by the first layer neural network in the decoder. And inputting the second-layer shallow-layer sentence vector of the encoder and the first-layer target end vector of the decoder into a first-layer neural network of the decoder, and calculating the logic similarity between the first-layer target end vector and the second-layer shallow-layer sentence vector to obtain a similarity vector output by the second-layer neural network. And carrying out normalization processing on the similarity vector to obtain a weight vector output by the second layer of neural network of the decoder. Similarly, from the second layer of neural network, the target end vector output by the previous layer of neural network and the shallow layer sentence vector of the current layer are used as the input of the current layer of neural network. And calculating and determining a similarity vector between a target end vector output by the neural network of the previous layer and a shallow sentence vector of the current layer, and then carrying out normalization processing on the similarity vector obtained by calculation of each layer to correspondingly obtain a weight vector of each layer. And then, the computer equipment performs dot product calculation on the weight vector of each layer and the shallow sentence vector corresponding to the weight vector of each layer to obtain a deep sentence vector.
In the text processing method, the logical similarity between the shallow sentence vector of the current layer and the target end vector of the previous layer of the current layer is determined by determining the similarity vector between the shallow sentence vector of the current layer and the target end vector of the previous layer of the current layer. And determining a weight vector between the shallow layer sentence vector of the current layer and the target end vector of the previous layer of the current layer according to the similarity vector, and generating a deep layer sentence vector according to the weight vector and the shallow layer sentence vector of each layer. The key information of each layer of shallow sentence vector can be integrated, the condition that the key information is lost is avoided, and the condition that translation is wrong is avoided.
In one embodiment, as shown in FIG. 8, an attention operation is performed on the source end vector sequence H_i of each layer of the encoder to form the shallow sentence vector g_i of that layer, and the shallow sentence vectors of all layers form a shallow sentence vector sequence G:

G = {g_1, ..., g_N}
g_i = Global(H_i)

After the shallow sentence vector sequence G is obtained, in the decoder the target end vector d_{i-1} is used as a query to perform an attention operation on the encoder's shallow sentence vector sequence G = {g_1, ..., g_N}, i.e., g̃ = Att(d_{i-1}, G), forming the deep sentence vector.

The specific operation for obtaining the deep sentence vector is as follows: the target end vector d_{i-1} is dot-multiplied with G to obtain a similarity vector e_i between the query and each key-value pair:

e_i = d_{i-1} · G^T / √d

where G^T is the transpose of the shallow sentence vector sequence G, and d is the dimension of the model's hidden state vector. The similarity vector e_i is then normalized to obtain a weight vector β_i, and a dot product calculation is performed on the weight vector β_i and the shallow sentence vectors g_i of the encoder layers, i.e., g̃ = β_i · G, to obtain the deep sentence vector g̃.
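A minimal sketch of this attention over the shallow sentence vector sequence, assuming softmax normalization:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Sketch of the FIG. 8 computation: the decoder's target-end vector d_prev
# queries the shallow sentence vector sequence G (one row per encoder layer):
# e = d_prev·G^T/sqrt(d), beta = softmax(e), deep sentence vector = beta·G.
def deep_sentence_vector_att(d_prev, G):
    e = G @ d_prev / np.sqrt(G.shape[1])  # similarity to each layer's g_i
    beta = softmax(e)                     # weight vector beta_i
    return beta @ G                       # deep sentence vector
```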
In one embodiment, as shown in fig. 9, the generating the shallow sentence vector of each layer according to the source end vector sequence of each layer includes:
step 902, obtain a source-end vector sequence of each layer of the encoder.
Specifically, the computer device inputs the input sequence of the source text into a first layer neural network of an encoder to perform semantic coding, and obtains a source end vector sequence output by the first layer neural network of the encoder. And inputting the source end vector sequence of the first layer into a second layer neural network of the encoder to perform semantic encoding to obtain a second layer source end vector sequence. And obtaining a source end vector sequence output by each layer of neural network of the encoder according to the same mode.
And 904, taking each layer as a current layer one by one, and determining a similarity vector between a source end vector sequence of the current layer and a shallow sentence vector of a previous layer of the current layer.
Step 906, a weight vector corresponding to the similarity is obtained according to the similarity vector.
Step 908, generating a shallow sentence vector of the current layer according to the weight vector and the source end vector sequence of the current layer.
Specifically, the computer device takes each layer of neural network as the current layer one by one in order. The method comprises the steps of obtaining a source end vector sequence of a current layer, obtaining a shallow layer sentence vector of a previous layer of the current layer, and calculating the logic similarity between the source end vector sequence of the current layer and the shallow layer sentence vector of the previous layer of the current layer to obtain a similarity vector. And then, carrying out normalization processing on the similarity vector to obtain a weight vector corresponding to the similarity. And then carrying out weighted summation on the weight vector and the source end vector sequence of the current layer to obtain a shallow sentence vector of the current layer.
In this embodiment, the first-layer neural network has no layer above it, and therefore no shallow sentence vector from a previous layer. An initial shallow sentence vector therefore needs to be set in advance. When the first-layer neural network is taken as the current layer, the initial shallow sentence vector is used as the shallow sentence vector of the layer above the first-layer neural network, and the shallow sentence vector output by the first-layer neural network is obtained by calculation.
The text processing method comprises the steps of converting the source end vector sequence of each layer into shallow sentence vectors of each layer, taking the shallow sentence vectors output by the current layer as the input of the next layer, and calculating the shallow sentence vectors of the next layer. The shallow sentence vector of each current layer is obtained by the shallow sentence vector of the previous layer and the source end vector sequence of the current layer, the transmission of the information of the source end words is ensured, and the shallow sentence vector of the current layer is integrated with the information of all layers before the current layer, so that the information of each word of the source text is integrated into the sentence.
In one embodiment, a shallow sentence vector learning operation is performed on the source-side vector sequence of each layer of the encoder. For example, attention operation is performed on the source end vector sequence of the nth layer to form a shallow sentence vector of the nth layer:
g^n = Global(H^n)

Specifically, with the shallow sentence vector g^(n-1) of the (n-1)th layer as the input of the nth layer, an attention operation is performed on the source end vector sequence H^n of the nth layer of the encoder through the shallow sentence vector of the (n-1)th layer:

g^n = Att(g^(n-1), H^n)

Further, the specific operation is to take the dot product of g^(n-1) and H^n to obtain the similarity vector e^n between the query and each key-value pair:

e^n = g^(n-1) (H^n)^T / √d

where (H^n)^T denotes the transpose of the source end vector sequence of the nth encoder layer and d is the dimension of the model hidden state vector. The similarity vector e^n is then normalized to obtain its corresponding weight vector E^n. Finally, the weight vector and the source end vector sequence H^n of the nth layer of the encoder are combined by dot product to obtain the shallow sentence vector g^n of the nth layer:

g^n = E^n · H^n
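As a minimal NumPy sketch of this layer-by-layer computation (using softmax as the normalization; the function names, variable names, and toy dimensions are illustrative assumptions, not a prescribed implementation):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def shallow_sentence_vectors(layer_states, g0):
    """layer_states: list of N arrays H^n of shape (seq_len, d);
    g0: preset initial shallow sentence vector of shape (d,)."""
    d = g0.shape[0]
    g = g0
    shallow = []
    for H in layer_states:              # n = 1 .. N
        e = (g @ H.T) / np.sqrt(d)      # similarity vector e^n, shape (seq_len,)
        E = softmax(e)                  # weight vector E^n via normalization
        g = E @ H                       # shallow sentence vector g^n of this layer
        shallow.append(g)
    return shallow

# Toy usage: a 6-layer encoder over a 5-word sentence, hidden size 8.
rng = np.random.default_rng(0)
states = [rng.normal(size=(5, 8)) for _ in range(6)]
g_layers = shallow_sentence_vectors(states, rng.normal(size=8))
```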
In one embodiment, the computer device performs an attention operation on the source end vector sequence output by the first layer neural network to obtain the shallow sentence vector of the first layer. The shallow sentence vector of the first layer is then used as the input of the second layer neural network, and an attention operation is performed on the second layer source end vector sequence through it to obtain the shallow sentence vector of the second layer. Likewise, from the second layer onward, the output of the previous layer serves as the input of the current layer, and an attention operation on the source end vector sequence of the current layer yields the shallow sentence vector of that layer; in this way the shallow sentence vector of every layer is obtained. Note that the first layer neural network has no previous layer, and therefore no shallow sentence vector from above. So when inputting the first layer source end vector sequence into the first layer neural network for the attention operation, an initial value needs to be preset or randomly generated. This initial value is input into the first layer neural network, and the attention operation on the first layer source end vector sequence is performed through it.
In one embodiment, as shown in fig. 10, the determining the target word corresponding to each word according to the target end vector of each word and the target sentence vector includes:
step 1002, obtaining a predicted word vector corresponding to each word according to the target end vector of each word and the target sentence vector.
The predicted word vector is a vector obtained by processing the target end vector and the target sentence vector corresponding to a word. Each word corresponds to one predicted word vector.
Specifically, the computer device obtains the target end vector corresponding to a word and the target sentence vector, and linearly superimposes the two to obtain the predicted word vector corresponding to that word. The predicted word vector corresponding to every other word is obtained in the same manner.
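A minimal sketch of this step is given below; treating the linear superposition as a convex combination with a mixing weight `alpha` is an assumption, since the embodiment only states that the two vectors are linearly superimposed:

```python
def predicted_word_vector(target_vec, sentence_vec, alpha=0.5):
    # Linear superposition of the word's target end vector and the
    # target sentence vector; the mixing weight alpha is an assumption.
    return alpha * target_vec + (1.0 - alpha) * sentence_vec
```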
Step 1004, obtaining word vectors of the candidate words of the target end.
Step 1006, determining similarity between the predicted word vector corresponding to each word and the word vector of the candidate word at the target end.
The candidate words refer to words in a preset word bank, and each word in the word bank corresponds to a word vector.
Specifically, the computer device obtains the word vector corresponding to each candidate word from the word bank of the target end. A predicted word vector corresponding to one word is then selected, and the similarity between the selected predicted word vector and the word vector of each candidate word at the target end is calculated. The similarity for every other predicted word vector is calculated in the same manner.
Step 1008, the candidate word with the highest similarity of the predicted word vector corresponding to each word is used as the target word corresponding to each word.
Specifically, after determining the similarity between the selected predicted word vector and the word vectors corresponding to the candidate words at the target end, the computer device determines the candidate word with the highest similarity to the selected predicted word vector, and takes the candidate word as the target word corresponding to the selected predicted word vector. And obtaining the target word corresponding to each predicted word vector in the same mode, thereby obtaining the target word corresponding to each source-end word.
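The following sketch illustrates this comparison and selection over the target end word bank; using the dot product as the similarity measure, and the names `pred_vecs`, `vocab_vecs`, and `vocab`, are illustrative assumptions, since the embodiment does not fix the similarity measure:

```python
import numpy as np

def pick_target_words(pred_vecs, vocab_vecs, vocab):
    """pred_vecs: (num_words, d); vocab_vecs: (vocab_size, d);
    vocab: list of candidate words in the target end word bank."""
    # Dot-product similarity between each predicted word vector and
    # every candidate word vector.
    sims = pred_vecs @ vocab_vecs.T        # (num_words, vocab_size)
    best = sims.argmax(axis=1)             # index of the most similar candidate
    return [vocab[i] for i in best]
```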
In this embodiment, after determining the similarities between a selected predicted word vector and the word vectors of the candidate words at the target end, the computer device may directly determine the target word corresponding to that predicted word vector, and then select the predicted word vector of another word and repeat the similarity calculation to determine its target word.
In the text processing method, the predicted word vector corresponding to each word is obtained according to the target end vector of each word and the target sentence vector. The word vectors of the candidate words of the target end are obtained, and the similarity between the predicted word vector corresponding to each word and the word vectors of the candidate words of the target end is determined. And taking the candidate word with the highest similarity of the predicted word vector corresponding to each word as the target word corresponding to each word, so as to obtain the target word corresponding to each word of the source text.
In one embodiment, the computer device may generate an output layer using a position-wise fully-connected feed-forward network, compare the obtained predicted word vector with the word vectors of all candidate words at the target end, and select the candidate word with the highest similarity as the target word corresponding to the predicted word vector, thereby obtaining the target word of the source text word corresponding to that predicted word vector.
In one embodiment, the determining the target word corresponding to each word according to the target end vector of each word and the target sentence vector includes: obtaining a word vector of a candidate word of a target end; determining the similarity between the predicted word vector corresponding to each word and the word vector of the candidate word at the target end; outputting a preset number of candidate words with high similarity, and taking the preset number of candidate words with high similarity as target words corresponding to each word.
Generating a target text corresponding to the source text according to the target word corresponding to each word, including: generating a candidate text according to a preset number of candidate words corresponding to each word; and taking the candidate text with the highest output probability as the target text corresponding to the source text.
The output probability refers to the probability of each candidate text obtained by calculation according to the preset number of candidate words corresponding to each word.
Specifically, after determining the similarity between the selected predicted word vector and the word vector of each candidate word at the target end, the computer device may sort the candidate words by similarity and select a preset number of candidate words with the highest similarity. These candidate words can be used as multiple target words corresponding to the selected predicted word vector. After each word in the source text has been assigned a preset number of candidate words, the computer device may generate a plurality of candidate texts from them. Further, the computer device calculates the output probability of each candidate text according to the candidate words corresponding to each word in the source text, determines the candidate text with the highest output probability, and takes it as the target text corresponding to the source text. The manner of calculating the output probability of the candidate texts includes, but is not limited to, beam search.
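A simplified sketch of such a search is given below. It assumes per-position candidate scores are available as log probabilities and scores each candidate text by their sum, which abstracts away the decoder's conditioning on the already generated prefix; real beam search rescores candidates at each step:

```python
import heapq

def beam_search(step_scores, beam_width=3):
    """step_scores: list over positions; each item maps a candidate word
    to its log-probability score at that position. Returns the candidate
    text with the highest summed score."""
    beams = [([], 0.0)]
    for scores in step_scores:
        expanded = [(text + [w], logp + s)
                    for text, logp in beams
                    for w, s in scores.items()]
        # Keep only the beam_width best partial texts.
        beams = heapq.nlargest(beam_width, expanded, key=lambda b: b[1])
    return max(beams, key=lambda b: b[1])

# Toy usage: two positions, each with three candidate words.
best_text, best_logp = beam_search([
    {"we": -0.1, "i": -0.9, "they": -1.2},
    {"agree": -0.3, "concur": -0.8, "object": -2.0},
])
```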
According to the text processing method, a plurality of candidate texts corresponding to the source text are obtained by determining the candidate word corresponding to each word in the source text, and the candidate text with the highest output probability is used as the target text corresponding to the source text. Therefore, the target text closest to the source text is obtained, and the translation is more accurate.
In one embodiment, the text processing method includes:
a computer device obtains an input sequence of source text.
Then, the computer device semantically encodes the input sequence to obtain a source end vector sequence.
Then, the computer device obtains a first weight vector corresponding to each word in the source end vector sequence.
Further, the computer device generates a target end vector of each word according to the source end vector sequence and the first weight vector corresponding to each word.
Next, the computer device obtains a sequence of source-side vectors for each layer of the encoder.
Further, the computer device takes each layer as the current layer one by one, and determines a similarity vector between the source end vector sequence of the current layer and the shallow sentence vector of the previous layer of the current layer.
And then, the computer equipment obtains a weight vector corresponding to the similarity according to the similarity vector.
And then, the computer equipment generates a shallow sentence vector of the current layer according to the weight vector and the source end vector sequence of the current layer.
Next, the computer device determines a similarity vector between the shallow sentence vector of the current layer and the target-side vector of the previous layer of the current layer.
Further, the computer equipment determines a weight vector between the shallow sentence vector of the current layer and the target end vector of the previous layer of the current layer according to the similarity vector.
Next, the computer device generates a deep sentence vector from the weight vector and the shallow sentence vector for each layer.
Further, the computer device obtains a predicted word vector corresponding to each word according to the target end vector of each word and the deep sentence vector.
Then, the computer device obtains a word vector of the candidate word of the target terminal.
Then, the computer device determines the similarity between the predicted word vector corresponding to each word and the word vector of the candidate word at the target end.
Further, the computer device takes the candidate word with the highest similarity of the predicted word vector corresponding to each word as the target word corresponding to each word.
And then, the computer equipment generates a target text corresponding to the source text according to the target word corresponding to each word.
In the text processing method, the source end vector sequence is obtained by obtaining the input sequence of the source text and performing semantic coding on the input sequence. And calculating the shallow sentence vector of the next layer by converting the source end vector sequence of each layer into the shallow sentence vector of each layer and taking the shallow sentence vector output by the current layer as the input of the next layer. The shallow sentence vector of each current layer is obtained by the shallow sentence vector of the previous layer and the source end vector sequence of the current layer, the transmission of the information of the source end words is ensured, and the shallow sentence vector of the current layer is integrated with the information of all layers before the current layer, so that the information of each word of the source text is integrated into the sentence.
By obtaining the first weight vector corresponding to each word in the source end vector sequence, the target end vector of each word can be generated according to the source end vector sequence and the first weight vector corresponding to each word, so that the vector of each word in the source end is converted into the vector of each word in the target end.
And determining the logic similarity between the shallow sentence vector of the current layer and the target end vector of the previous layer of the current layer by determining the similarity vector between the shallow sentence vector of the current layer and the target end vector of the previous layer of the current layer. And determining a weight vector between the shallow layer sentence vector of the current layer and the target end vector of the previous layer of the current layer according to the similarity vector, and generating a deep layer sentence vector according to the weight vector and the shallow layer sentence vector of each layer. The key information of each layer of shallow sentence vector can be integrated, the condition that the key information is lost is avoided, and the condition that translation is wrong is avoided.
And obtaining a target sentence vector according to the source end vector sequence, so that the target sentence vector fuses the key information of each word of the source end, and each word is associated with the front and rear words.
The target word corresponding to each word is determined according to the target end vector of each word and the target sentence vector, and the problem that translation is inaccurate due to the fact that the corresponding target word is determined only according to the target end vector of a single word and the semantics of each word in a sentence are ignored in the existing text translation method is solved. And finally, generating a target text corresponding to the source text according to the target word corresponding to each word. By adopting the scheme, each word can be translated by utilizing sentence information, and the translation accuracy is improved.
FIG. 11 is a diagram illustrating the architecture of a neural network machine translation system in one embodiment. Fig. 11 shows the structure of the encoder and decoder layers of the neural network machine translation model.
As shown in fig. 11, Nx on the left represents the structure of one encoder layer, which includes two sub-layers: the first sub-layer is a multi-head attention layer and the second is a forward propagation layer. The input and output of each sub-layer are connected, with the output of the current sub-layer serving as an input of the next sub-layer, and each sub-layer is followed by a normalization operation, which increases the convergence speed of the model.

Nx on the right represents the structure of one decoder layer, which comprises three sub-layers. The first sub-layer is a multi-head attention sub-layer controlled by a mask matrix and is used for modeling the generated target end sentence vectors; during training, the mask matrix ensures that each multi-head attention calculation only attends to the first t-1 words. The second sub-layer is a multi-head attention sub-layer implementing the attention mechanism between the encoder and the decoder, that is, it searches the source text for relevant semantic information, and its calculation uses dot products. The third sub-layer is a forward propagation sub-layer, computed in the same way as the forward propagation sub-layer in the encoder. The sub-layers of the decoder are likewise connected, with the output of the current sub-layer serving as an input of the next sub-layer, and each decoder sub-layer is also followed by a normalization operation to speed up model convergence.
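As an illustration of the mask matrix mentioned above, the following sketch builds a lower-triangular mask and applies it to scaled dot-product attention scores, so that each position can only attend to positions up to itself; the exact mask layout used by the model is an assumption:

```python
import numpy as np

def causal_mask(t):
    # Lower-triangular boolean mask: position i may attend only to j <= i.
    return np.tril(np.ones((t, t), dtype=bool))

def masked_attention_weights(Q, K):
    """Q, K: arrays of shape (t, d). Returns masked attention weights."""
    d = Q.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d)
    mask = causal_mask(scores.shape[0])
    scores = np.where(mask, scores, -np.inf)   # block future positions
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)   # rows sum to 1
```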
In one embodiment, the present solution was tested on the WMT2017 English machine translation task, with the following results:
[Table: BLEU results and Δ improvements of scheme models 1-5 on the WMT2017 English machine translation task; the original table image is not reproduced here.]
In the present scheme, model 1 performs a mean pooling operation on the source end vector sequence of the encoder output layer to obtain a shallow sentence vector, which serves as the target sentence vector. Model 2 performs a maximum pooling (max-pooling) operation on the source end vector sequence of the encoder output layer to obtain a shallow sentence vector, which serves as the target sentence vector. Model 3 takes the input sequence of the encoder input layer as the query and performs an attention operation on the source end vector sequence of the encoder output layer to obtain a shallow sentence vector, which serves as the target sentence vector. When the target sentence vector is a deep sentence vector, it is obtained by model 4 or model 5. Model 4 is a static way of obtaining the deep sentence vector: a recurrent neural network models the shallow sentence vectors of each layer cyclically, and the final state is taken as the deep sentence vector. Model 5 is a dynamic way of obtaining the deep sentence vector: the target end vector at each decoder time step is used as the query, and an attention operation over the shallow sentence vectors of each layer forms the target end vector at that time step. The methods provided by this embodiment are simple to model, require few computing resources, compute quickly, and efficiently use sentence information to help neural machine translation improve translation performance.
For the BLEU scores in the table above, an improvement of more than 0.5 points is generally considered significant, and Δ denotes the absolute improvement. As can be seen from the table, the Δ of each of scheme model 2, model 3, model 4, and model 5 exceeds 0.5 points, indicating that the proposed method can significantly improve translation quality.
In addition to the above embodiments, the shallow sentence vector in this embodiment may also be obtained by processing the source end vector sequence output by the encoder output layer with other machine learning methods. Likewise, the deep sentence vector may also be obtained by operating on the shallow sentence vectors of each encoder layer with other machine learning methods (e.g., a convolutional neural network).
The scheme does not specifically limit the similarity computation or the topological structure of the neural network; the neural network in the scheme can be replaced by various other model structures, such as a recurrent neural network and its variants, or by other network structures such as a convolutional neural network.
Meanwhile, it should be noted that the method provided by the present invention can be used in all mainstream neural network machine translation systems and is applicable to translation tasks between any languages.
Fig. 2-10 are flow diagrams illustrating a text processing method according to an embodiment. It should be understood that although the various steps in the flowcharts of fig. 2-10 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not bound to a strict sequence and may be performed in other orders. Moreover, at least some of the steps in fig. 2-10 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and whose order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 12, a text processing apparatus is provided. The text processing apparatus includes:
a sequence obtaining module 1202, configured to obtain an input sequence of the source text.
And the encoding module 1204 is configured to perform semantic encoding on the input sequence to obtain a source-end vector sequence.
The weight obtaining module 1206 is configured to obtain a first weight vector corresponding to each word in the source end vector sequence.
And a target end vector generating module 1208, configured to generate a target end vector of each word according to the source end vector sequence and the first weight vector corresponding to each word.
And a target sentence vector determination module 1210 configured to obtain a target sentence vector according to the source end vector sequence.
And the target word determining module 1212 is configured to determine a target word corresponding to each word according to the target end vector and the target sentence vector of each word.
And a target text generation module 1214, configured to generate a target text corresponding to the source text according to the target word corresponding to each word.
The text processing device obtains the input sequence of the source text, and obtains the source end vector sequence by semantically coding the input sequence. The method comprises the steps of obtaining a first weight vector corresponding to each word in a source end vector sequence, and generating a target end vector of each word according to the source end vector sequence and the first weight vector corresponding to each word, so that the vector of each word in the source end is converted into the vector of each word in the target end. And obtaining a target sentence vector according to the source end vector sequence, so that the target sentence vector fuses the key information of each word of the source end, and each word is associated with the front and rear words. The target word corresponding to each word is determined according to the target end vector of each word and the target sentence vector, and the problem that translation is inaccurate due to the fact that the corresponding target word is determined only according to the target end vector of a single word and the semantics of each word in a sentence are ignored in the existing text translation method is solved. And finally, generating a target text corresponding to the source text according to the target word corresponding to each word. By adopting the scheme, each word can be translated by utilizing sentence information, and the translation accuracy is improved.
In one embodiment, when the target sentence vector is a shallow sentence vector, the target sentence vector determination module 1210 is further configured to: acquiring a source end vector sequence of an encoder output layer; and determining the average value of the source end vector sequence of the output layer on the corresponding dimension to generate a shallow sentence vector. The average value of the source end vector sequence of the encoder output layer on the corresponding dimension is solved, each average value fuses the element information of each word on the corresponding dimension, and therefore the obtained shallow layer sentence vector fuses the information of each word, and the information represented by a single word is converted into the information represented by the whole sentence.
In one embodiment, when the target sentence vector is a shallow sentence vector, the target sentence vector determination module 1210 is further configured to: acquiring a source end vector sequence of an encoder output layer; and determining the maximum value of the source end vector sequence of the output layer on the corresponding dimension to generate a shallow sentence vector. In the text processing apparatus, the element of each word in the corresponding dimension represents the importance degree of each word in the dimension. By determining the maximum value of the source-side vector sequence of the encoder output layer in the corresponding dimension, the most representative information in each dimension can be determined. The maximum value of the source end vector sequence of the output layer in the corresponding dimension is used as a shallow sentence vector, so that each component element of the obtained shallow sentence vector represents the most important information in each dimension, and the shallow sentence vector keeps the whole information of each word.
In one embodiment, when the target sentence vector is a shallow sentence vector, the target sentence vector determination module 1210 is further configured to: acquiring an input sequence of an encoder input layer; determining the maximum value of the input sequence of the input layer on the corresponding dimension to obtain a middle vector; acquiring a source end vector sequence of an encoder output layer; determining a similarity vector between the intermediate vector and the source end vector sequence of the output layer; obtaining a weight vector corresponding to the similarity according to the similarity vector; and generating a shallow sentence vector according to the weight vector and the source end vector sequence of the output layer. The method comprises the steps of obtaining an input sequence of an input layer of an encoder, determining the maximum value of the input sequence of the input layer on a corresponding dimension, obtaining a middle vector, and extracting key information of a source end word. And acquiring a source end vector sequence of an output layer of the encoder, and determining a similarity vector between the intermediate vector and the source end vector sequence of the output layer so as to determine the logic similarity between a source end word and the sequence subjected to semantic encoding. And obtaining a weight vector corresponding to the similarity according to the similarity vector, and generating a shallow sentence vector according to the weight vector and the source end vector sequence of the output layer, so that information of a source end word and information of the sequence subjected to semantic coding can be integrated, and key information of the word is integrated into a sentence.
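A compact sketch of these three shallow sentence vector variants (mean pooling, max pooling, and attention with a max-pooled query) might look as follows; the function names and NumPy formulation are illustrative, not the prescribed implementation:

```python
import numpy as np

def shallow_mean(H_out):
    # Variant 1: mean over the output-layer source vectors, per dimension.
    return H_out.mean(axis=0)

def shallow_max(H_out):
    # Variant 2: max pooling, keeping the largest element per dimension.
    return H_out.max(axis=0)

def shallow_attentive(H_in, H_out):
    # Variant 3: max-pool the input-layer sequence into an intermediate
    # vector, then use it as the query for attention over the output layer.
    q = H_in.max(axis=0)
    e = (q @ H_out.T) / np.sqrt(H_out.shape[1])  # similarity vector
    w = np.exp(e - e.max()); w /= w.sum()        # normalized weight vector
    return w @ H_out                             # shallow sentence vector
```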
In one embodiment, when the target sentence vector is a deep sentence vector, the target sentence vector determination module 1210 is further configured to: acquiring a source end vector sequence of each layer of an encoder; obtaining shallow sentence vectors of each layer according to the source end vector sequence of each layer; and generating a deep sentence vector according to the shallow sentence vector of each layer. And obtaining a shallow sentence vector of each layer according to the source end vector sequence of each layer by obtaining the source end vector sequence of each layer of the encoder. And generating deep sentence vectors according to the shallow sentence vectors of each layer, so that the obtained deep sentence vectors are fused with the information of the shallow sentence vectors of each layer. Therefore, global information of the source text is reserved in the deep sentence vector, and the target text obtained through translation is more accurate.
In one embodiment, when the target sentence vector is a deep sentence vector, the target sentence vector determination module 1210 is further configured to: inputting the shallow sentence vector of each layer into a recurrent neural network to obtain a deep sentence vector output by an output layer of the recurrent neural network; each layer of network of the recurrent neural network corresponds to each layer of network of the encoder, the input of each layer of the recurrent neural network comprises a shallow layer sentence vector of a corresponding layer in the encoder and an implicit state vector output by a layer above the layer in the recurrent neural network, and the implicit state vector is obtained after the recurrent neural network processes the shallow layer sentence vector input by the layer above. The shallow sentence vectors of each layer are input into the recurrent neural network to obtain the deep sentence vectors output by the output layer of the recurrent neural network, so that the obtained deep sentence vectors are fused with the information of the shallow sentence vectors of each layer, and the global information of the source text is reserved.
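A minimal sketch of this static mode follows; a plain tanh recurrent cell stands in for the recurrent neural network, which is an assumption since the cell type is not fixed here:

```python
import numpy as np

def deep_sentence_vector_rnn(shallow_vecs, Wx, Wh, b):
    """shallow_vecs: list of per-layer shallow sentence vectors of shape (d,);
    Wx: (h, d), Wh: (h, h), b: (h,) are the cell's learned parameters."""
    # Static mode: run the recurrent cell over the per-layer shallow
    # sentence vectors and take the final hidden state as the deep vector.
    h = np.zeros(Wh.shape[0])
    for g in shallow_vecs:                  # one step per encoder layer
        h = np.tanh(Wx @ g + Wh @ h + b)
    return h
```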
In one embodiment, when the target sentence vector is a deep sentence vector, the target sentence vector determination module 1210 is further configured to: determining a similarity vector between a shallow sentence vector of a current layer and a target end vector of a previous layer of the current layer; and determining a weight vector between the shallow sentence vector of the current layer and the target end vector of the previous layer of the current layer according to the similarity vector. And generating a deep sentence vector according to the weight vector and the shallow sentence vector of each layer. In the text processing apparatus, the logical similarity between the shallow sentence vector of the current layer and the target end vector of the previous layer of the current layer is determined by determining the similarity vector between the shallow sentence vector of the current layer and the target end vector of the previous layer of the current layer. And determining a weight vector between the shallow layer sentence vector of the current layer and the target end vector of the previous layer of the current layer according to the similarity vector, and generating a deep layer sentence vector according to the weight vector and the shallow layer sentence vector of each layer. The key information of each layer of shallow sentence vector can be integrated, the condition that the key information is lost is avoided, and the condition that translation is wrong is avoided.
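By contrast, the dynamic mode described above can be sketched as attention over the stacked per-layer shallow sentence vectors, with the current target end vector as the query; again, this is an illustrative sketch rather than the prescribed implementation:

```python
import numpy as np

def deep_sentence_vector_dynamic(target_vec, shallow_vecs):
    # Dynamic mode: use the current target end (decoder) vector as the
    # query and attend over the per-layer shallow sentence vectors.
    G = np.stack(shallow_vecs)                  # (num_layers, d)
    e = (target_vec @ G.T) / np.sqrt(G.shape[1])
    w = np.exp(e - e.max()); w /= w.sum()       # normalized weight vector
    return w @ G                                # deep sentence vector
```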
In one embodiment, when the target sentence vector is a deep sentence vector, the target sentence vector determination module 1210 is further configured to: acquire a source end vector sequence of each layer of the encoder; take each layer as the current layer one by one, and determine a similarity vector between the source end vector sequence of the current layer and the shallow sentence vector of the previous layer of the current layer; obtain a weight vector corresponding to the similarity according to the similarity vector; and generate a shallow sentence vector of the current layer according to the weight vector and the source end vector sequence of the current layer. By converting the source end vector sequence of each layer into the shallow sentence vector of that layer and taking the shallow sentence vector output by the current layer as the input of the next layer, the shallow sentence vector of the next layer is calculated. Because the shallow sentence vector of each current layer is obtained from the shallow sentence vector of the previous layer and the source end vector sequence of the current layer, the transmission of source end word information is ensured, and the shallow sentence vector of the current layer integrates the information of all layers before it, so that the information of each word of the source text is fused into the sentence.
In one embodiment, the target word determination module 1212 is further configured to: obtain a predicted word vector corresponding to each word according to the target end vector of each word and the target sentence vector; obtain the word vectors of the candidate words at the target end; determine the similarity between the predicted word vector corresponding to each word and the word vector of each candidate word at the target end; and take the candidate word with the highest similarity to the predicted word vector of each word as the target word corresponding to that word. In this way, the target word corresponding to each word of the source text is obtained.
FIG. 13 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the terminal 110 (or the server 120) in fig. 1. As shown in fig. 13, the computer apparatus includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the text processing method. The internal memory may also have stored therein a computer program that, when executed by the processor, causes the processor to perform a text processing method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 13 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the text processing apparatus provided in the present application may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 13. The memory of the computer device may store various program modules constituting the text processing apparatus, such as a sequence acquisition module 1202, an encoding module 1204, a weight acquisition module 1206, a target-side vector generation module 1208, a target sentence vector determination module 1210, a target word determination module 1212, and a target text generation module 1214 shown in fig. 12. The computer program constituted by the respective program modules causes the processor to execute the steps in the text processing method of the respective embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 13 may perform the step of acquiring the input sequence of the source text by a sequence acquisition module in the text processing apparatus shown in fig. 12. The computer equipment can execute the step of obtaining the source end vector sequence by semantically coding the input sequence through the coding module. The computer device can execute the step of obtaining a first weight vector corresponding to each word in the source end vector sequence through the weight obtaining module. The computer device can execute the step of generating the target end vector of each word according to the source end vector sequence and the first weight vector corresponding to each word through the target end vector generating module. The computer device can execute the step of obtaining the target sentence vector according to the source end vector sequence through the target sentence vector determination module. The computer equipment can execute the step of determining the target word corresponding to each word according to the target end vector and the target sentence vector of each word through the target word determination module. The computer equipment can execute the step of generating the target text corresponding to the source text according to the target word corresponding to each word through the target text generation module.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the text processing method described above. Here, the steps of the text processing method may be steps in the text processing methods of the above-described respective embodiments.
In one embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the above-mentioned text processing method. Here, the steps of the text processing method may be steps in the text processing methods of the above-described respective embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (18)

1. A text processing method, comprising:
acquiring an input sequence of a source text;
semantic coding the input sequence to obtain a source end vector sequence;
acquiring a first weight vector corresponding to each word in the source end vector sequence;
generating a target end vector of each word according to the source end vector sequence and the first weight vector corresponding to each word;
obtaining a target sentence vector according to the source end vector sequence;
obtaining a predicted word vector corresponding to each word according to the target end vector of each word and the target sentence vector;
determining the similarity between the predicted word vector corresponding to each word and the word vector of the candidate word at the target end;
taking the candidate word with the highest similarity of the predicted word vector corresponding to each word as a target word corresponding to each word;
and generating a target text corresponding to the source text according to the target word corresponding to each word.
2. The method of claim 1, wherein the target sentence vector is a shallow sentence vector; the obtaining of the target sentence vector according to the source end vector sequence includes:
acquiring a source end vector sequence of an encoder output layer;
and determining the average value of the source end vector sequence of the output layer on the corresponding dimension to generate a shallow sentence vector.
3. The method of claim 1, wherein the target sentence vector is a shallow sentence vector; the obtaining of the target sentence vector according to the source end vector sequence includes:
acquiring a source end vector sequence of an encoder output layer;
and determining the maximum value of the source end vector sequence of the output layer on the corresponding dimension to generate a shallow sentence vector.
4. The method of claim 1, wherein the target sentence vector is a shallow sentence vector; the obtaining of the target sentence vector according to the source end vector sequence includes:
acquiring an input sequence of an encoder input layer;
determining the maximum value of the input sequence of the input layer on the corresponding dimension to obtain a middle vector;
acquiring a source end vector sequence of an encoder output layer;
determining a similarity vector between the intermediate vector and a source end vector sequence of the output layer;
obtaining a weight vector corresponding to the similarity according to the similarity vector;
and generating a shallow sentence vector according to the weight vector and the source end vector sequence of the output layer.
5. The method of claim 1, wherein the target sentence vector is a deep sentence vector; the obtaining of the target sentence vector according to the source end vector sequence includes:
acquiring a source end vector sequence of each layer of an encoder;
obtaining a shallow sentence vector of each layer according to the source end vector sequence of each layer;
and generating a deep sentence vector according to the shallow sentence vector of each layer.
6. The method of claim 5, wherein generating a deep sentence vector from the shallow sentence vectors of each layer comprises:
inputting the shallow sentence vector of each layer into a recurrent neural network to obtain a deep sentence vector output by an output layer of the recurrent neural network; each layer of network of the recurrent neural network corresponds to each layer of network of the encoder, the input of each layer of the recurrent neural network comprises a shallow layer sentence vector of a corresponding layer in the encoder and an implicit state vector output by a layer above the layer in the recurrent neural network, and the implicit state vector is obtained after the recurrent neural network processes the shallow layer sentence vector input by the layer above.
7. The method of claim 5, wherein generating a deep sentence vector from the shallow sentence vectors of each layer comprises:
determining a similarity vector between a shallow sentence vector of a current layer and a target end vector of a previous layer of the current layer;
determining a weight vector between a shallow sentence vector of a current layer and a target end vector of a previous layer of the current layer according to the similarity vector;
and generating a deep sentence vector according to the weight vector and the shallow sentence vector of each layer.
8. The method of claim 5, wherein the generating a shallow sentence vector for each layer according to the source end vector sequence for each layer comprises:
acquiring a source end vector sequence of each layer of an encoder;
taking each layer as a current layer one by one, and determining a similarity vector between a source end vector sequence of the current layer and a shallow sentence vector of a previous layer of the current layer;
obtaining a weight vector corresponding to the similarity according to the similarity vector;
and generating a shallow sentence vector of the current layer according to the weight vector and the source end vector sequence of the current layer.
9. A text processing apparatus, characterized in that the apparatus comprises:
the sequence acquisition module is used for acquiring an input sequence of a source text;
the encoding module is used for carrying out semantic encoding on the input sequence to obtain a source end vector sequence;
the weight obtaining module is used for obtaining a first weight vector corresponding to each word in the source end vector sequence;
a target end vector generating module, configured to generate a target end vector of each word according to the source end vector sequence and the first weight vector corresponding to each word;
the target sentence vector determining module is used for obtaining a target sentence vector according to the source end vector sequence;
the target word determining module is used for obtaining a predicted word vector corresponding to each word according to the target end vector of each word and the target sentence vector; determining the similarity between the predicted word vector corresponding to each word and the word vector of the candidate word at the target end; taking the candidate word with the highest similarity of the predicted word vector corresponding to each word as a target word corresponding to each word;
and the target text generation module is used for generating a target text corresponding to the source text according to the target word corresponding to each word.
10. The apparatus of claim 9, wherein when the target sentence vector is a shallow sentence vector, the target sentence vector determination module is further configured to obtain a source end vector sequence of an encoder output layer; and determining the average value of the source end vector sequence of the output layer on the corresponding dimension to generate a shallow sentence vector.
11. The apparatus of claim 9, wherein when the target sentence vector is a shallow sentence vector, the target sentence vector determination module is further configured to obtain a source end vector sequence of an encoder output layer; and determining the maximum value of the source end vector sequence of the output layer on the corresponding dimension to generate a shallow sentence vector.
12. The apparatus of claim 9, wherein when the target sentence vector is a shallow sentence vector, the target sentence vector determination module is further configured to obtain an input sequence of an encoder input layer; determining the maximum value of the input sequence of the input layer on the corresponding dimension to obtain a middle vector; acquiring a source end vector sequence of an encoder output layer; determining a similarity vector between the intermediate vector and a source end vector sequence of the output layer; obtaining a weight vector corresponding to the similarity according to the similarity vector; and generating a shallow sentence vector according to the weight vector and the source end vector sequence of the output layer.
13. The apparatus of claim 9, wherein when the target sentence vector is a deep sentence vector, the target sentence vector determination module is further configured to obtain a source end vector sequence of each layer of the encoder; obtain a shallow sentence vector of each layer according to the source end vector sequence of each layer; and generate a deep sentence vector according to the shallow sentence vector of each layer.
14. The apparatus according to claim 13, wherein the target sentence vector determination module is further configured to input the shallow sentence vector of each layer into a recurrent neural network, resulting in a deep sentence vector output by an output layer of the recurrent neural network; each layer of network of the recurrent neural network corresponds to each layer of network of the encoder, the input of each layer of the recurrent neural network comprises a shallow layer sentence vector of a corresponding layer in the encoder and an implicit state vector output by a layer above the layer in the recurrent neural network, and the implicit state vector is obtained after the recurrent neural network processes the shallow layer sentence vector input by the layer above.
15. The apparatus of claim 13, wherein the target sentence vector determination module is further configured to determine a similarity vector between a shallow sentence vector of a current layer and a target end vector of a previous layer of the current layer; determining a weight vector between a shallow sentence vector of a current layer and a target end vector of a previous layer of the current layer according to the similarity vector; and generating a deep sentence vector according to the weight vector and the shallow sentence vector of each layer.
16. The apparatus of claim 13, wherein the target sentence vector determination module is further configured to obtain a source end vector sequence of each layer of the encoder; take each layer as a current layer one by one, and determine a similarity vector between the source end vector sequence of the current layer and the shallow sentence vector of the previous layer of the current layer; obtain a weight vector corresponding to the similarity according to the similarity vector; and generate a shallow sentence vector of the current layer according to the weight vector and the source end vector sequence of the current layer.
17. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 8.
18. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 8.
CN201910308349.4A 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment Active CN110008482B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010164622.3A CN111368564B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment
CN201910308349.4A CN110008482B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910308349.4A CN110008482B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010164622.3A Division CN111368564B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN110008482A CN110008482A (en) 2019-07-12
CN110008482B true CN110008482B (en) 2021-03-09

Family

ID=67172480

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910308349.4A Active CN110008482B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment
CN202010164622.3A Active CN111368564B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202010164622.3A Active CN111368564B (en) 2019-04-17 2019-04-17 Text processing method and device, computer readable storage medium and computer equipment

Country Status (1)

Country Link
CN (2) CN110008482B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111142728B (en) * 2019-12-26 2022-06-03 腾讯科技(深圳)有限公司 Vehicle-mounted environment intelligent text processing method and device, electronic equipment and storage medium
CN111259113B (en) * 2020-01-15 2023-09-19 腾讯科技(深圳)有限公司 Text matching method, text matching device, computer readable storage medium and computer equipment
CN112115718A (en) * 2020-09-29 2020-12-22 腾讯科技(深圳)有限公司 Content text generation method and device and music comment text generation method
CN112395832B (en) * 2020-11-17 2024-05-21 上海金桥信息股份有限公司 Text quantitative analysis and generation method and system based on sequence-to-sequence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304388A (en) * 2017-09-12 2018-07-20 腾讯科技(深圳)有限公司 Machine translation method and device
WO2018197921A1 (en) * 2017-04-25 2018-11-01 Systran A translation system and a method thereof
CN109034378A (en) * 2018-09-04 2018-12-18 腾讯科技(深圳)有限公司 Network representation generation method, device, storage medium and the equipment of neural network
CN109145315A (en) * 2018-09-05 2019-01-04 腾讯科技(深圳)有限公司 Text interpretation method, device, storage medium and computer equipment
CN109271646A (en) * 2018-09-04 2019-01-25 腾讯科技(深圳)有限公司 Text interpretation method, device, readable storage medium storing program for executing and computer equipment
CN109543199A (en) * 2018-11-28 2019-03-29 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of text translation

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484682B (en) * 2015-08-25 2019-06-25 阿里巴巴集团控股有限公司 Machine translation method, device and electronic equipment based on statistics
CN106126507B (en) * 2016-06-22 2019-08-09 哈尔滨工业大学深圳研究生院 A kind of depth nerve interpretation method and system based on character code
CN106547735B (en) * 2016-10-25 2020-07-07 复旦大学 Construction and use method of context-aware dynamic word or word vector based on deep learning
CN108388561B (en) * 2017-02-03 2022-02-25 百度在线网络技术(北京)有限公司 Neural network machine translation method and device
KR102342066B1 (en) * 2017-06-21 2021-12-22 삼성전자주식회사 Method and apparatus for machine translation using neural network and method for learning the appartus
CN108536679B (en) * 2018-04-13 2022-05-20 腾讯科技(成都)有限公司 Named entity recognition method, device, equipment and computer readable storage medium
CN108874785B (en) * 2018-06-01 2020-11-03 清华大学 Translation processing method and system
CN109062897A (en) * 2018-07-26 2018-12-21 苏州大学 Sentence alignment method based on deep neural network
CN109062910A (en) * 2018-07-26 2018-12-21 苏州大学 Sentence alignment method based on deep neural network
CN109213995B (en) * 2018-08-02 2022-11-18 哈尔滨工程大学 Cross-language text similarity evaluation technology based on bilingual word embedding
CN109446534B (en) * 2018-09-21 2020-07-31 清华大学 Machine translation method and device
CN109492223B (en) * 2018-11-06 2020-08-04 北京邮电大学 Chinese missing pronoun completion method based on neural network reasoning
CN109543195B (en) * 2018-11-19 2022-04-12 腾讯科技(深圳)有限公司 Text translation method, information processing method and device
CN111428520B (en) * 2018-11-30 2021-11-23 腾讯科技(深圳)有限公司 Text translation method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018197921A1 (en) * 2017-04-25 2018-11-01 Systran A translation system and a method thereof
CN108304388A (en) * 2017-09-12 2018-07-20 腾讯科技(深圳)有限公司 Machine translation method and device
CN109034378A (en) * 2018-09-04 2018-12-18 腾讯科技(深圳)有限公司 Network representation generation method, device, storage medium and the equipment of neural network
CN109271646A (en) * 2018-09-04 2019-01-25 腾讯科技(深圳)有限公司 Text interpretation method, device, readable storage medium storing program for executing and computer equipment
CN109145315A (en) * 2018-09-05 2019-01-04 腾讯科技(深圳)有限公司 Text interpretation method, device, storage medium and computer equipment
CN109543199A (en) * 2018-11-28 2019-03-29 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of text translation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Local Translation Prediction with Global Sentence Representation;Jiajun Zhang;《PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI)》;20151231;第1页 *

Also Published As

Publication number Publication date
CN111368564B (en) 2022-04-08
CN111368564A (en) 2020-07-03
CN110008482A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
CN110008482B (en) Text processing method and device, computer readable storage medium and computer equipment
CN111368565B (en) Text translation method, text translation device, storage medium and computer equipment
CN110598779B (en) Abstract description generation method and device, computer equipment and storage medium
US11113479B2 (en) Utilizing a gated self-attention memory network model for predicting a candidate answer match to a query
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN110019471B (en) Generating text from structured data
JP7301922B2 (en) Semantic retrieval method, device, electronic device, storage medium and computer program
Hunt et al. Machine learning models for paraphrase identification and its applications on plagiarism detection
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
US20230244704A1 (en) Sequenced data processing method and device, and text processing method and device
CN112380837B (en) Similar sentence matching method, device, equipment and medium based on translation model
CN111985228B (en) Text keyword extraction method, text keyword extraction device, computer equipment and storage medium
CN112800757B (en) Keyword generation method, device, equipment and medium
CN110866391A (en) Title generation method, title generation device, computer readable storage medium and computer equipment
CN108536735B (en) Multi-mode vocabulary representation method and system based on multi-channel self-encoder
WO2014073206A1 (en) Information-processing device and information-processing method
CN114064852A (en) Method and device for extracting relation of natural language, electronic equipment and storage medium
CN113886550A (en) Question-answer matching method, device, equipment and storage medium based on attention mechanism
CN114881035A (en) Method, device, equipment and storage medium for augmenting training data
CN113836192B (en) Parallel corpus mining method and device, computer equipment and storage medium
CN113806646A (en) Sequence labeling system and training system of sequence labeling model
CN115409111A (en) Training method of named entity recognition model and named entity recognition method
CN111291563A (en) Word vector alignment method and training method of word vector alignment model
CN111783430A (en) Sentence pair matching rate determination method and device, computer equipment and storage medium
CN110852066B (en) Multi-language entity relation extraction method and system based on confrontation training mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant