CN109933809B - Translation method and device, and training method and device of translation model - Google Patents


Info

Publication number
CN109933809B
CN109933809B (application CN201910198990.7A)
Authority
CN
China
Prior art keywords
clause
text
coding
word
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910198990.7A
Other languages
Chinese (zh)
Other versions
CN109933809A (en)
Inventor
李长亮
王怡然
郭馨泽
唐剑波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Kingsoft Interactive Entertainment Technology Co ltd
Beijing Kingsoft Digital Entertainment Co Ltd
Original Assignee
Chengdu Kingsoft Interactive Entertainment Technology Co ltd
Beijing Kingsoft Digital Entertainment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Kingsoft Interactive Entertainment Technology Co ltd and Beijing Kingsoft Digital Entertainment Co Ltd
Priority to CN201910198990.7A
Publication of CN109933809A
Application granted
Publication of CN109933809B
Legal status: Active
Anticipated expiration


Landscapes

  • Machine Translation (AREA)

Abstract

The application provides a translation method and device and a training method and device of a translation model, wherein the method comprises the following steps: splitting the target text to obtain at least two clauses; inputting each clause into a coding layer to obtain a sentence coding vector corresponding to each clause; obtaining a text coding vector corresponding to the target text according to the sentence coding vector corresponding to each clause; and inputting the text coding vector to a decoding layer to generate a translation text corresponding to the target text. Compared with the prior art, this enhances the sentence dependency relationships of the target text, so that the translation model can achieve a better translation effect when translating long texts.

Description

Translation method and device, and training method and device of translation model
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a translation method and apparatus, a translation model training method and apparatus, a computing device, a computer readable storage medium, and a chip.
Background
In the prior art, most translation models adopt a coding layer-decoding layer (encoder-decoder) framework, in which the coding layer is responsible for compressing a source-language sentence into a coding vector in a semantic space and inputting the coding vector to the decoding layer, where the coding vector contains the main information of the source-language sentence; the decoding layer iterates over the coding vector provided by the coding layer to produce a semantically equivalent target-language sentence, i.e., the machine translation result.
In existing translation models, the long-distance dependency relationships of a text are lost in the process of encoding a long text, so that a good translation effect cannot be obtained and a large number of translation errors result.
Disclosure of Invention
In view of the above, the embodiments of the present application provide a translation method and apparatus, a translation model training method and apparatus, a computing device, a computer readable storage medium, and a chip, so as to solve the technical defects existing in the prior art.
The embodiment of the application provides a translation method, which is used for a translation model, wherein the translation model comprises an encoding layer and a decoding layer, and the method comprises the following steps:
splitting the target text to obtain at least two clauses;
inputting each clause into a coding layer to obtain a sentence coding vector corresponding to each clause;
obtaining a text coding vector corresponding to the target text according to the sentence coding vector corresponding to each clause;
and inputting the text coding vector to a decoding layer to generate a translation text corresponding to the target text.
Optionally, inputting each clause to the coding layer to obtain a sentence coding vector corresponding to each clause, including:
for the 1 st clause, inputting the clause into a coding layer to obtain a sentence coding vector corresponding to the 1 st clause;
And for other clauses except the 1 st clause, inputting the sentence code vector corresponding to the previous clause and the current clause into the coding layer to obtain the sentence code vector corresponding to the current clause.
Optionally, obtaining a text encoding vector corresponding to the target text according to the sentence encoding vector corresponding to each clause, including:
and taking the sentence coding vector corresponding to the last clause as the text coding vector corresponding to the target text.
Optionally, the target text includes N clauses, each clause including M words, where M is greater than or equal to 2, N is greater than or equal to 2, and M, N is a positive integer;
for the i-th clause, where i is a positive integer and 1 < i ≤ N;
inputting the sentence code vector corresponding to the previous clause and the current clause into the coding layer to obtain the sentence code vector corresponding to the current clause, wherein the method comprises the following steps:
S102, obtaining a coding hidden layer output corresponding to the 1st word according to the 1st word of the i-th clause input to the coding layer and the sentence coding vector corresponding to the (i-1)-th clause;
S104, obtaining the coding hidden layer output corresponding to the j-th word according to the sentence coding vector corresponding to the (i-1)-th clause, the coding hidden layer output corresponding to the (j-1)-th word, and the j-th word of the i-th clause, wherein j is more than or equal to 2 and less than or equal to M;
S106, automatically increasing j by 1, judging whether j after the automatic increase of 1 is larger than M, if yes, executing the step S108, and if not, continuously executing the step S104;
s108, outputting according to the coding hidden layers corresponding to the M words of the ith clause, and obtaining the sentence coding vector corresponding to the ith clause.
Optionally, the step S102 includes:
obtaining a corresponding word coding vector according to the 1 st word of the i-th clause input to the coding layer;
and obtaining the coding hidden layer output corresponding to the 1 st word according to the word coding vector corresponding to the 1 st word of the i-th clause and the sentence coding vector corresponding to the i-1 st clause input to the coding layer.
Optionally, the step S104 includes:
obtaining a corresponding word coding vector according to the j-th word of the i-th clause input to the coding layer;
and obtaining the coding hidden layer output corresponding to the j-th word of the i-th clause according to the word coding vector corresponding to the j-th word of the i-th clause, the sentence coding vector corresponding to the i-1 th clause input to the decoding layer and the coding hidden layer output corresponding to the j-1 th word.
Optionally, the step S108 includes:
and multiplying the coding hidden layer outputs corresponding to the M words of the ith clause by the corresponding weight coefficients respectively, and then summing to obtain the sentence coding vector corresponding to the ith clause.
Optionally, inputting the text encoding vector to the decoding layer, generating a translation text corresponding to the target text, including:
inputting the text coding vector and the initial setting word into a decoding layer to generate at least two decoding words;
and obtaining the translation text corresponding to the target text according to the at least two decoding words.
Optionally, the number of the decoding words is P, wherein P is more than or equal to 2 and P is a positive integer; inputting the text encoding vector and the initial set word to a decoding layer to generate at least two decoded words, comprising:
s202, obtaining a 1 st decoding hidden layer output according to a text coding vector and an initial set word input to a decoding layer, and obtaining a 1 st decoding word according to the 1 st decoding hidden layer output;
s204, according to the q-1 decoding word and the q-1 decoding hidden layer output, obtaining the q decoding hidden layer output, and according to the q decoding hidden layer output, obtaining the q decoding word, wherein q is a positive integer and is more than or equal to 2 and less than or equal to P;
s206, automatically increasing q by 1, judging whether q after the automatic increase of 1 is larger than P, if so, ending, and if not, continuing to execute the step S204.
The embodiment of the application provides a training method of a translation model, wherein the translation model comprises a coding layer and a decoding layer; the training method comprises the following steps:
Splitting a first text of the target corpus to obtain at least two clauses;
inputting each clause into a coding layer to obtain a sentence coding vector corresponding to each clause;
obtaining a text coding vector corresponding to the first text according to the sentence coding vector corresponding to each clause;
inputting the text coding vector and the translated second text corresponding to the first text to a decoding layer to obtain an output training translation text;
judging whether a training stopping condition is reached according to the error of the training translation text and the second text;
if yes, stopping training;
if not, continuing to execute the step of splitting the first text of the target corpus to obtain at least two clauses.
Optionally, the second text includes P tag words;
inputting the text encoding vector and the translated second text corresponding to the first text to a decoding layer to obtain an output training translation text, comprising:
s302, obtaining a 1 st decoding hidden layer output according to the text coding vector input to the decoding layer and the initial set word, and obtaining a 1 st decoding word according to the 1 st decoding hidden layer output;
s304, according to the q-1 tag word and the q-1 decoding hidden layer output, obtaining the q decoding hidden layer output, and according to the q decoding hidden layer output, obtaining the q decoding word, wherein q is a positive integer and is more than or equal to 2 and less than or equal to P;
S306, self-increasing q by 1, judging whether q after self-increasing by 1 is larger than P, if so, executing step S308, and if not, continuing to execute step S304;
s308, obtaining corresponding training translation text according to the P decoding words.
Optionally, the training stop condition includes: the error of the training translated text and the second text is less than a stability threshold.
The embodiment of the application provides a translation device which is used for a translation model, wherein the translation model comprises an encoding layer and a decoding layer, and the translation device comprises:
the first splitting module is configured to split the target text to obtain at least two clauses;
the first clause coding module is configured to input each clause into the coding layer to obtain a sentence coding vector corresponding to each clause;
the first text coding module is configured to obtain text coding vectors corresponding to the target text according to sentence coding vectors corresponding to each clause;
the first decoding module is configured to input the text encoding vector to a decoding layer and generate a translation text corresponding to the target text.
The embodiment of the application provides a training device of a translation model, wherein the translation model comprises an encoding layer and a decoding layer, and the training device comprises:
The second splitting module is configured to split the first text of the target corpus to obtain at least two clauses;
the second clause coding module is configured to input each clause into the coding layer to obtain a sentence coding vector corresponding to each clause;
the second text coding module is configured to obtain text coding vectors corresponding to the first text according to sentence coding vectors corresponding to each clause;
the second decoding module is configured to input the text coding vector and the translated second text corresponding to the first text to the decoding layer to obtain an output training translation text;
and the training module is configured to judge whether the training stopping condition is met according to the error of the training translation text and the second text, if so, stopping training, and if not, continuing to execute the second splitting module.
Embodiments of the present application provide a computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, which when executed implement the steps of the translation method or training method of a translation model as described above.
Embodiments of the present application provide a computer readable storage medium storing computer instructions that, when executed by a processor, implement the steps of a translation method or training method of a translation model as described above.
The embodiment of the application provides a chip which stores computer instructions which, when executed by the chip, implement the steps of the translation method or the training method of the translation model.
According to the translation method and device, the target text is split to obtain the clauses, each clause is coded to obtain the corresponding sentence coding vector, the text coding vector corresponding to the target text is obtained according to the sentence coding vector corresponding to each clause and is input to the decoding layer, and compared with the prior art, the sentence dependency relationship of the target text is enhanced, so that a translation model can obtain better translation effect under the condition of translating long text.
In addition, in the process of generating sentence coding vectors corresponding to each clause, the sentence coding vector corresponding to the previous clause and the current clause are input to a coding layer to obtain the sentence coding vector corresponding to the current clause, so that information attenuation of words with the target text in the later sequence in the coding process is reduced, and better translation effect is achieved by a translation model.
According to the training method of the translation model, each clause of the first text of the target corpus is input to the coding layer to obtain the corresponding sentence coding vector, the text coding vector corresponding to the first text is obtained according to the sentence coding vector corresponding to each clause, the text coding vector and the second text are input to the decoding layer to obtain the output training translation text, and the translation model is continuously trained according to the error between the training translation text and the second text, so that a translation model that reduces the attenuation of word information of the target text can be obtained, thereby achieving a better translation effect.
Drawings
FIG. 1 is a schematic architecture diagram of a computing device according to an embodiment of the application;
FIG. 2 is a flow chart of a translation method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a process for generating an encoded hidden layer output and sentence encoding vector of an encoding layer according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a translation method according to an embodiment of the present application;
FIG. 5 is a flow chart of a translation method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a process for generating a decoding hidden layer output and sentence code vector of an encoding layer according to an embodiment of the present application;
FIG. 7a is a schematic diagram of a process for generating an encoded hidden layer output and sentence encoded vectors of an encoding layer of a translation model according to another embodiment of the present application;
FIG. 7b is a schematic diagram of a process for generating a decoding hidden layer output and sentence code vectors of a decoding layer of a translation model according to another embodiment of the present application;
FIG. 8 is a flow chart of a method of training a translation model according to another embodiment of the present application;
FIG. 9 is a schematic diagram of a translation device according to another embodiment of the present application;
FIG. 10 is a schematic diagram of a training apparatus for translation models according to still another embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application may be embodied in many other forms than those herein described, and those skilled in the art will readily appreciate that the present application may be similarly embodied without departing from the spirit or essential characteristics thereof, and therefore the present application is not limited to the specific embodiments disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
First, terms related to one or more embodiments of the present application will be explained.
Translation model: the main idea is that a sentence to be translated is encoded into a code vector through an encoding layer (encoder), then the code vector is decoded by a decoding layer (decoder) to obtain a decoding vector, and then the decoding vector is translated into a corresponding translation sentence.
LSTM (Long Short-Term Memory) model: is a time recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in a time series. The LSTM model may be used to link previous information to a current task, such as using past statements to infer an understanding of the current statement.
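For illustration only, the following minimal sketch (assuming PyTorch, which the embodiments do not prescribe; all sizes and identifiers are hypothetical) shows how an LSTM carries its hidden state forward so that earlier words can influence the encoding of later ones.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

# A toy "sentence" of 4 word ids; in practice these come from a vocabulary lookup.
word_ids = torch.tensor([[3, 17, 52, 8]])
embedded = embedding(word_ids)                  # shape (1, 4, embed_dim)

# outputs holds the hidden-layer output for every word position;
# (h_n, c_n) is the state after the last word and can be passed onward.
outputs, (h_n, c_n) = lstm(embedded)
print(outputs.shape)                            # torch.Size([1, 4, 128])
```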
Coding (encoder): converting the sentence to be translated from characters into coding vectors;
decoding (decoder): the encoded vector is converted into language words of the translation sentence.
In the present application, a translation method and apparatus, a translation model training method and apparatus, a computing device, a computer-readable storage medium, and a chip are provided, and detailed descriptions are provided in the following embodiments.
Fig. 1 is a block diagram illustrating a configuration of a computing device 100 according to an embodiment of the present description. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. Processor 120 is coupled to memory 110 via bus 130 and database 150 is used to store data.
Computing device 100 also includes access device 140, access device 140 enabling computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 140 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 100, as well as other components not shown in FIG. 1, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device shown in FIG. 1 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
Wherein the processor 120 may perform the steps of the method shown in fig. 2. Fig. 2 is a schematic flow diagram illustrating a translation method for a translation model including an encoding layer and a decoding layer according to an embodiment of the present application. In this embodiment, the translation model may use an LSTM model.
The translation method of this embodiment includes the following steps 202 to 208:
202. and splitting the target text to obtain at least two clauses.
In the embodiment, the target text is split into the clauses, and then the clauses are encoded to obtain the corresponding sentence encoding vector, so that the semantic information loss of the words in the preceding sequence in the encoding process can be reduced, and the semantic information of the text can be transmitted farther backwards in the encoding process.
For example, the target text is "sand-blown years have passed over, today's air is fresh, wind and daily, on a wide traffic road, only a rich garden is seen, cabbage, tomato and emerald cucumber are grown, then fruit tree forest" can be split into 5 clauses, respectively "sand-blown years have passed over again", "now air is fresh, wind and daily", "on a wide traffic road", "only a rich garden is seen, cabbage, tomato and emerald cucumber are grown", "then fruit tree forest".
204. And inputting each clause into the coding layer to obtain a sentence coding vector corresponding to each clause.
Specifically, step 204 includes:
S2042, for the 1st clause, inputting the clause into the coding layer to obtain a sentence coding vector corresponding to the 1st clause.
S2044, for other clauses except the 1 st clause, inputting the sentence code vector corresponding to the previous clause and the current clause into the coding layer to obtain the sentence code vector corresponding to the current clause.
For convenience of explanation, referring to fig. 3, fig. 3 is a schematic diagram of a process of generating a coding hidden layer output and sentence coding vector of a coding layer. In FIG. 3, X^(i)_j represents the j-th word of the i-th clause, C^(i) represents the sentence coding vector of the i-th clause, and h^(i)_j represents the coding hidden layer output corresponding to the j-th word of the i-th clause.
The following describes steps S2042 to S2044 in detail, taking as an example a target text including N clauses, each clause including M words, where M is greater than or equal to 2, N is greater than or equal to 2, and M and N are positive integers.
Referring to fig. 4, step S2042 includes the following steps 402 to 408:
402. and obtaining the coding hidden layer output corresponding to the 1 st word according to the 1 st word input to the coding layer.
404. And obtaining the coding hidden layer output corresponding to the jth word according to the coding hidden layer output corresponding to the jth-1 word input to the coding layer and the jth word, wherein j is a positive integer and is more than or equal to 2 and less than or equal to M.
Specifically, step 404 includes: obtaining a corresponding word coding vector according to the j-th word input to the coding layer; and then obtaining the code hidden layer output corresponding to the jth word according to the code hidden layer output corresponding to the jth-1 word input to the code layer and the word code vector corresponding to the jth word.
406. And (3) automatically increasing j by 1, judging whether j after the automatic increase of 1 is greater than M, if so, executing step 408, and if not, continuing to execute step 404.
408. And outputting according to the coding hidden layers corresponding to the M words of the 1 st clause to obtain the sentence coding vector corresponding to the 1 st clause.
Step 408 includes: the coding hidden layer outputs corresponding to the M words of the 1 st clause are multiplied by the corresponding weight coefficients respectively, and then summed to obtain the sentence coding vector corresponding to the 1 st clause.
Specifically, the sentence coding vector corresponding to the 1st clause is calculated according to the following formula (1):

C^(i) = Σ_{j=1}^{M} w_j · h^(i)_j    (1)

where C^(i) represents the sentence coding vector corresponding to the i-th clause;
i represents the i-th clause (here i = 1 for the 1st clause);
j represents the j-th word of the i-th clause, and M represents the number of words included in the i-th clause;
h^(i)_j represents the coding hidden layer output corresponding to the j-th word;
w_j represents the weight coefficient corresponding to each coding hidden layer output.
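As an illustration of formula (1) only: the patent does not state how the weight coefficients w_j are produced, so the sketch below assumes a small learned scoring layer normalized with softmax; this is an assumption, not the claimed method.

```python
import torch
import torch.nn as nn

hidden_dim = 128
# Hypothetical scorer producing one weight per coding hidden layer output.
scorer = nn.Linear(hidden_dim, 1)

def sentence_coding_vector(hidden_outputs: torch.Tensor) -> torch.Tensor:
    """hidden_outputs: (M, hidden_dim) tensor of h_1 .. h_M for one clause."""
    weights = torch.softmax(scorer(hidden_outputs).squeeze(-1), dim=0)   # w_1 .. w_M
    # Formula (1): C = sum_j w_j * h_j
    return (weights.unsqueeze(-1) * hidden_outputs).sum(dim=0)
```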
Referring to fig. 5, step S2044 includes the following steps 502 to 508:
502. and obtaining the coding hidden layer output corresponding to the 1 st word according to the 1 st word of the i-th clause input to the coding layer and the sentence coding vector corresponding to the i-1 st clause.
Specifically, step 502 includes: obtaining a corresponding word coding vector according to the 1 st word of the i-th clause input to the coding layer; and then obtaining the coding hidden layer output corresponding to the 1 st word according to the word coding vector corresponding to the 1 st word of the i-th clause and the sentence coding vector corresponding to the i-1 st clause input to the coding layer.
504. And obtaining the coding hidden layer output corresponding to the j-th word according to the sentence coding vector corresponding to the (i-1)-th clause, the coding hidden layer output corresponding to the (j-1)-th word, and the j-th word of the i-th clause, wherein j is more than or equal to 2 and less than or equal to M.
Specifically, step 504 includes: obtaining a corresponding word coding vector according to the j-th word of the i-th clause input to the coding layer; and then obtaining the coding hidden layer output corresponding to the j-th word of the i-th clause according to the word coding vector corresponding to the j-th word of the i-th clause, the sentence coding vector corresponding to the i-1 th clause input to the decoding layer and the coding hidden layer output corresponding to the j-1 th word.
506. And (3) automatically increasing j by 1, judging whether j after the automatic increase of 1 is greater than M, if so, executing step 508, and if not, continuing to execute step 504.
508. And outputting according to the coding hidden layers corresponding to the M words of the ith clause, and obtaining the sentence coding vector corresponding to the ith clause.
Step 508 includes: and multiplying the coding hidden layer outputs corresponding to the M words of the ith clause by the corresponding weight coefficients respectively, and then summing to obtain the sentence coding vector corresponding to the ith clause.
Specifically, the calculation of the sentence code vector corresponding to the ith clause in step 508 is referred to the above formula (1), and will not be described herein.
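Putting steps 502 to 508 together, a clause-by-clause encoder might be sketched as follows (PyTorch assumed; feeding the previous clause's sentence coding vector in as the LSTM's initial hidden state is one plausible reading of "inputting the sentence coding vector corresponding to the previous clause into the coding layer", not the only possible implementation, and all names are hypothetical).

```python
import torch
import torch.nn as nn

class ClauseEncoder(nn.Module):
    """Illustrative sketch: encodes a text clause by clause, conditioning each
    clause on the sentence coding vector of the previous clause."""

    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.scorer = nn.Linear(hidden_dim, 1)    # yields the weight coefficients
        self.hidden_dim = hidden_dim

    def encode_clause(self, word_ids, prev_sentence_vec):
        # Previous clause's sentence coding vector used as the initial hidden
        # state (an assumption about how it is "input to the coding layer").
        h0 = prev_sentence_vec.view(1, 1, -1)
        c0 = torch.zeros_like(h0)
        outputs, _ = self.lstm(self.embedding(word_ids), (h0, c0))   # (1, M, H)
        weights = torch.softmax(self.scorer(outputs), dim=1)          # (1, M, 1)
        return (weights * outputs).sum(dim=1).squeeze(0)              # formula (1)

    def forward(self, clauses):
        # For the 1st clause there is no previous sentence coding vector; a zero
        # vector stands in for it here (an implementation convenience).
        sentence_vec = torch.zeros(self.hidden_dim)
        for word_ids in clauses:            # each: LongTensor of shape (1, M)
            sentence_vec = self.encode_clause(word_ids, sentence_vec)
        return sentence_vec                 # last clause's vector = text coding vector
```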
206. And obtaining a text coding vector corresponding to the target text according to the sentence coding vector corresponding to each clause.
In this embodiment, there are many methods for obtaining a text encoding vector corresponding to a target text according to a sentence encoding vector corresponding to each clause, for example:
first kind: multiplying sentence coding vectors corresponding to each clause by coefficients respectively to obtain text coding vectors;
second kind: obtaining text coding vectors according to sentence coding vectors corresponding to the last several clauses;
third kind: and taking the sentence coding vector corresponding to the last clause as the text coding vector corresponding to the target text.
In this embodiment, since the sentence coding vector of the previous clause is utilized in the process of generating the sentence coding vector of each subsequent clause, the sentence coding vector of the last clause already includes the semantic information contained in the preceding clauses; therefore, using the sentence coding vector of the last clause as the text coding vector of the target text can ensure the translation accuracy of the translation process.
208. And inputting the text coding vector to a decoding layer to generate a translation text corresponding to the target text.
Specifically, step 208 includes the following steps S2082 to S2084:
s2082, inputting the text coding vector and the initial setting word into a decoding layer, and generating at least two decoding words.
S2084, obtaining the translation text corresponding to the target text according to the at least two decoding words.
Specifically, referring to fig. 6, fig. 6 is a schematic diagram of a process of generating a decoding hidden layer output and sentence coding vector. In FIG. 6, h_1 represents the 1st decoding hidden layer output, Y_1 represents the 1st decoded word, C_N represents the text coding vector corresponding to the target text, and <START> represents the initially set word.
Taking the case where the number of decoded words is P as an example, where P is more than or equal to 2 and P is a positive integer, and referring to fig. 7, step S2082 includes the following steps 702 to 706:
702. And obtaining a 1 st decoding hidden layer output according to the text coding vector input to the decoding layer and the initial set word, and obtaining the 1 st decoding word according to the 1 st decoding hidden layer output.
704. According to the q-1 decoding word and the q-1 decoding hidden layer output, the q decoding hidden layer output is obtained, and according to the q decoding hidden layer output, the q decoding word is obtained, wherein q is a positive integer and q is more than or equal to 2 and less than or equal to P.
706. And (3) automatically increasing q by 1, judging whether q after the automatic increase of 1 is larger than P, if so, ending, and if not, continuing to execute step 704.
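A minimal greedy decoding loop corresponding to steps 702 to 706 might look like the sketch below (PyTorch assumed; using the text coding vector as the decoder's initial hidden state is an assumption about how it is "input to the decoding layer", and stopping at an end symbol or a maximum length stands in for the fixed count P of the text).

```python
import torch
import torch.nn as nn

def greedy_decode(text_vec, embedding, lstm_cell, out_proj,
                  start_id, end_id, max_len=50):
    """text_vec: text coding vector C_N; returns decoded word ids."""
    h = text_vec.view(1, -1)               # 1st decoding hidden state derived from C_N
    c = torch.zeros_like(h)
    word_id = torch.tensor([start_id])     # initially set word <START>
    decoded = []
    for _ in range(max_len):               # loop corresponding to steps 704 and 706
        h, c = lstm_cell(embedding(word_id), (h, c))
        word_id = out_proj(h).argmax(dim=-1)   # q-th decoded word from q-th hidden output
        if word_id.item() == end_id:
            break
        decoded.append(word_id.item())
    return decoded
```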
According to the translation method provided by the application, the target text is split to obtain the clauses, each clause is then encoded to obtain the corresponding sentence code vector, the text code vector corresponding to the target text is obtained according to the sentence code vector corresponding to each clause and is input to the decoding layer, and compared with the prior art, the sentence dependency relationship of the target text is enhanced, so that the translation model can obtain better translation effect under the condition of translating long text.
In addition, in the process of generating sentence coding vectors corresponding to each clause, the sentence coding vector corresponding to the previous clause and the current clause are input to a coding layer to obtain the sentence coding vector corresponding to the current clause, so that information attenuation of words with the target text in the later sequence in the coding process is reduced, and better translation effect is achieved by a translation model.
In order to facilitate understanding of the technical solution of the present embodiment, a specific example is schematically described below, taking the target text "I love China, I love Beijing" as an example. The target text includes 2 clauses, "I love China" and "I love Beijing", each of which includes 4 words: "I", "love", "middle", "country" and "I", "love", "north", "jing".
Referring to fig. 7a and 7b, fig. 7a is a schematic diagram of a process of generating a coding hidden layer output and sentence coding vector of a coding layer. Fig. 7b is a schematic diagram of a process for generating a decoding hidden layer output and sentence coding vectors of a decoding layer.
The translation method comprises the following steps:
1) And splitting the target text to obtain 2 clauses.
2) Inputting the 1 st clause into the coding layer to obtain a sentence coding vector C corresponding to the 1 st clause (1)
Specifically, step 2) includes: word 1X according to clause 1 (1) 1 Obtaining the coding hidden layer output h corresponding to the 1 st word (1) 1 The method comprises the steps of carrying out a first treatment on the surface of the Output h from coding hidden layer corresponding to 1 st word (1) 1 And word 2X (1) 2 Obtain word X2 (1) 2 Corresponding coded hidden layer output h (1) 2 The method comprises the steps of carrying out a first treatment on the surface of the According to word 2X (1) 2 Corresponding coded hidden layer output h (1) 2 And word 3X (1) 3 Obtain word X3 (1) 3 Corresponding coded hidden layer output h (1) 3 The method comprises the steps of carrying out a first treatment on the surface of the According to 3 rdIndividual word X (1) 3 Corresponding coded hidden layer output h (1) 3 And word 4X (1) 4 Obtain word X4 (1) 4 Corresponding coded hidden layer output h (1) 4 The method comprises the steps of carrying out a first treatment on the surface of the Finally, outputting the coding hidden layer of the 1 st to 4 th words to h (1) 1 ~h (1) 4 Respectively multiplying the weight coefficients and then summing to obtain sentence code vector C corresponding to the 1 st clause (1)
Output h for coding hidden layer corresponding to 1 st to 4 th words (1) 1 ~h (1) 4 The foregoing embodiments have been described in detail, and will not be repeated here.
3) Sentence code vector C corresponding to 1 st clause (1) Inputting the 2 nd clause into the coding layer to obtain a sentence coding vector C corresponding to the 2 nd clause (2)
Specifically, step 3) includes: word 1X according to clause 2 (2) 1 Sentence code vector C corresponding to 1 st clause (1) Obtain the 1 st word X of the 2 nd clause (2) 1 Corresponding coded hidden layer output h (2) 1 The method comprises the steps of carrying out a first treatment on the surface of the According to sentence code vector C corresponding to 1 st clause (1) Word 1X (2) 1 Corresponding coded hidden layer output h (2) 1 And word 2X of clause 2 (2) 2 Obtain word X2 (2) 2 Corresponding coded hidden layer output h (2) 2 The method comprises the steps of carrying out a first treatment on the surface of the According to sentence code vector C corresponding to 1 st clause (1) Coding hidden layer output h corresponding to the 2 nd word (2) 2 And word 3X of clause 2 (2) 3 Obtain word X3 (2) 3 Corresponding coded hidden layer output h (2) 3 The method comprises the steps of carrying out a first treatment on the surface of the According to sentence code vector C corresponding to 1 st clause (1) Word 3X (2) 3 Corresponding coded hidden layer output h (2) 3 And word 4X of clause 2 (2) 4 Obtain word X4 (2) 4 Corresponding braidingCode hidden layer output h (2) 4 The method comprises the steps of carrying out a first treatment on the surface of the Finally, outputting the coding hidden layer of the 1 st to 4 th words to h (2) 1 ~h (2) 4 Respectively multiplying the weight coefficients and then summing to obtain sentence code vector C corresponding to the 2 nd clause (2)
4) Inputting the sentence coding vector C^(2) corresponding to the 2nd clause to the decoding layer as the text coding vector, and generating 6 decoded words.
In this embodiment, the 6 decoded words are "I", "love", "China", "I", "love", "Beijing".
Specifically, the 1st decoding hidden layer output h_1 is obtained according to the text coding vector C^(2) and the initially set word <START> input to the decoding layer, and the 1st decoded word Y_1 is obtained according to h_1; the 2nd decoding hidden layer output h_2 is obtained according to the 1st decoded word Y_1 and the 1st decoding hidden layer output h_1, and the 2nd decoded word Y_2 is obtained according to h_2; the 3rd decoding hidden layer output h_3 is obtained according to the 2nd decoded word Y_2 and the 2nd decoding hidden layer output h_2, and the 3rd decoded word Y_3 is obtained according to h_3; the 4th decoding hidden layer output h_4 is obtained according to the 3rd decoded word Y_3 and the 3rd decoding hidden layer output h_3, and the 4th decoded word Y_4 is obtained according to h_4; the 5th decoding hidden layer output h_5 is obtained according to the 4th decoded word Y_4 and the 4th decoding hidden layer output h_4, and the 5th decoded word Y_5 is obtained according to h_5; the 6th decoding hidden layer output h_6 is obtained according to the 5th decoded word Y_5 and the 5th decoding hidden layer output h_5, and the 6th decoded word Y_6 is obtained according to h_6.
5) Obtaining the translation text corresponding to the target text according to the 6 decoded words.
In this embodiment, the translation text "I love China, I love Beijing" corresponding to the target text "I love China, I love Beijing" is obtained according to the 6 decoded words.
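Continuing the hypothetical sketches above (ClauseEncoder and greedy_decode are not part of the patent; word ids and sizes are made up), the worked example would correspond roughly to the following usage:

```python
import torch
import torch.nn as nn

clause_1 = torch.tensor([[11, 12, 13, 14]])   # ids standing in for "I love China"
clause_2 = torch.tensor([[11, 12, 15, 16]])   # ids standing in for "I love Beijing"

encoder = ClauseEncoder()                      # from the earlier sketch
dec_embedding = nn.Embedding(1000, 128)
dec_cell = nn.LSTMCell(128, 128)
out_proj = nn.Linear(128, 1000)                # hidden state -> target vocabulary scores

text_vec = encoder([clause_1, clause_2])       # sentence coding vector of the 2nd clause
decoded_ids = greedy_decode(text_vec, dec_embedding, dec_cell, out_proj,
                            start_id=1, end_id=2)
```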
The application also discloses a training method of the translation model, referring to fig. 8, comprising the following steps:
802. splitting the first text of the target corpus to obtain at least two clauses.
804. And inputting each clause into the coding layer to obtain a sentence coding vector corresponding to each clause.
806. And obtaining the text coding vector corresponding to the first text according to the sentence coding vector corresponding to each clause.
808. And inputting the text coding vector and the translated second text corresponding to the first text to a decoding layer to obtain an output training translation text.
Specifically, taking the example that the second text includes P tag words, step 808 includes the following steps S8082 to S8088:
s8082, obtaining a 1 st decoding hidden layer output according to the text coding vector input to the decoding layer and the initial set word, and obtaining a 1 st decoding word according to the 1 st decoding hidden layer output;
s8084, according to the q-1 tag word and the q-1 decoding hidden layer output, obtaining the q decoding hidden layer output, and according to the q decoding hidden layer output, obtaining the q decoding word, wherein q is a positive integer and q is more than or equal to 2 and less than or equal to P;
s8086, automatically increasing q by 1, judging whether q after the automatic increase of 1 is larger than P, if so, executing the step S8088, and if not, continuously executing the step S8084;
s8088, obtaining corresponding training translation text according to the P decoding words.
As can be seen from the above steps S8082 to S8088, in the training phase of the translation model, the q-th decoding hidden layer output is obtained according to the q-1 th tag word and the q-1 th decoding hidden layer output, unlike the translation phase, in which the q-th decoding hidden layer output is obtained according to the q-1 th decoding word and the q-1 th decoding hidden layer output.
For example, suppose the first text is the source sentence "I love China" and the second text is its reference translation "I love China". After receiving the sentence coding vector corresponding to the first text, the decoding layer is given the initially set word <start> and produces the 1st decoded word "You", whereas the actually correct decoded word is the tag word "I"; the 2nd decoded word "love" is then obtained according to the tag word "I" and the 1st decoding hidden layer output; and the 3rd decoded word "China" is then obtained according to the tag word "love" and the 2nd decoding hidden layer output.
Finally, the training translation text "You love China" is obtained from the decoded words and compared with the second text "I love China" to obtain the error between the two.
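The difference described above is commonly called teacher forcing: during training the decoder is fed the reference tag words instead of its own previous outputs. A minimal sketch (PyTorch assumed; the cross-entropy loss is an assumption, since the text only speaks of an "error" between the training translation text and the second text):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def training_step(text_vec, tag_ids, embedding, lstm_cell, out_proj, start_id):
    """tag_ids: LongTensor of shape (P,) holding the P tag words of the second text."""
    h = text_vec.view(1, -1)
    c = torch.zeros_like(h)
    prev_id = torch.tensor([start_id])          # initially set word <start>
    loss = torch.tensor(0.0)
    for q in range(len(tag_ids)):
        h, c = lstm_cell(embedding(prev_id), (h, c))
        logits = out_proj(h)                    # scores for the q-th decoded word
        loss = loss + criterion(logits, tag_ids[q:q + 1])
        prev_id = tag_ids[q:q + 1]              # feed the tag word, not the decoded word
    return loss / len(tag_ids)
```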
810. And judging whether a training stopping condition is met according to the error of the training translation text and the second text, if so, stopping training, and if not, continuing to execute the step 802.
Specific training stop conditions include: the error of the training translated text from the second text is less than the stability threshold.
The stability threshold may be set according to actual requirements, for example, to 10%.
According to the training method of the translation model provided by the application, each clause of the first text of the target corpus is input to the coding layer to obtain the corresponding sentence coding vector, the text coding vector corresponding to the first text is obtained according to the sentence coding vector corresponding to each clause, the text coding vector and the second text are input to the decoding layer to obtain the output training translation text, and the translation model is continuously trained according to the error between the training translation text and the second text, so that a translation model that reduces the attenuation of word information of the target text can be obtained, thereby achieving a better translation effect.
An embodiment of the present application also discloses a translation device, referring to fig. 9, for a translation model, where the translation model includes an encoding layer and a decoding layer, and the translation device includes:
the first splitting module 902 is configured to split the target text to obtain at least two clauses;
the first clause coding module 904 is configured to input each clause to the coding layer to obtain a sentence coding vector corresponding to each clause;
a first text encoding module 906 configured to obtain a text encoding vector corresponding to the target text according to the sentence encoding vector corresponding to each clause;
the first decoding module 908 is configured to input the text encoding vector to the decoding layer, generating a translated text corresponding to the target text.
Optionally, the first clause encoding module 904 is specifically configured to:
for the 1 st clause, inputting the clause into a coding layer to obtain a sentence coding vector corresponding to the 1 st clause;
and for other clauses except the 1 st clause, inputting the sentence code vector corresponding to the previous clause and the current clause into the coding layer to obtain the sentence code vector corresponding to the current clause.
Optionally, the first text encoding module 906 is specifically configured to: and taking the sentence coding vector corresponding to the last clause as the text coding vector corresponding to the target text.
Optionally, the target text includes N clauses, each clause including M words, where M is greater than or equal to 2, N is greater than or equal to 2, and M, N is a positive integer;
for the i-th clause, where i is a positive integer and 1 < i ≤ N;
the first clause encoding module 904 is specifically configured to:
S102, obtaining a coding hidden layer output corresponding to the 1st word according to the 1st word of the i-th clause input to the coding layer and the sentence coding vector corresponding to the (i-1)-th clause;
S104, obtaining the coding hidden layer output corresponding to the j-th word according to the sentence coding vector corresponding to the (i-1)-th clause, the coding hidden layer output corresponding to the (j-1)-th word, and the j-th word of the i-th clause, wherein j is more than or equal to 2 and less than or equal to M;
s106, automatically increasing j by 1, judging whether j after the automatic increase of 1 is larger than M, if yes, executing the step S108, and if not, continuously executing the step S104;
s108, outputting according to the coding hidden layers corresponding to the M words of the ith clause, and obtaining the sentence coding vector corresponding to the ith clause.
Optionally, the first clause encoding module 904 is specifically configured to:
obtaining a corresponding word coding vector according to the 1 st word of the i-th clause input to the coding layer;
and obtaining the coding hidden layer output corresponding to the 1 st word according to the word coding vector corresponding to the 1 st word of the i-th clause and the sentence coding vector corresponding to the i-1 st clause input to the coding layer.
Optionally, the first clause encoding module 904 is specifically configured to:
obtaining a corresponding word coding vector according to the j-th word of the i-th clause input to the coding layer;
and obtaining the coding hidden layer output corresponding to the j-th word of the i-th clause according to the word coding vector corresponding to the j-th word of the i-th clause, the sentence coding vector corresponding to the i-1 th clause input to the decoding layer and the coding hidden layer output corresponding to the j-1 th word.
Optionally, the first clause encoding module 904 is specifically configured to: and multiplying the coding hidden layer outputs corresponding to the M words of the ith clause by the corresponding weight coefficients respectively, and then summing to obtain the sentence coding vector corresponding to the ith clause.
Optionally, the first decoding module 908 is specifically configured to:
inputting the text coding vector and the initial setting word into a decoding layer to generate at least two decoding words;
and obtaining the translation text corresponding to the target text according to the at least two decoding words.
Optionally, the number of the decoding words is P, wherein P is more than or equal to 2 and P is a positive integer; the first decoding module 908 is specifically configured to:
s202, obtaining a 1 st decoding hidden layer output according to a text coding vector and an initial set word input to a decoding layer, and obtaining a 1 st decoding word according to the 1 st decoding hidden layer output;
S204, according to the q-1 decoding word and the q-1 decoding hidden layer output, obtaining the q decoding hidden layer output, and according to the q decoding hidden layer output, obtaining the q decoding word, wherein q is a positive integer and is more than or equal to 2 and less than or equal to P;
s206, automatically increasing q by 1, judging whether q after the automatic increase of 1 is larger than P, if so, ending, and if not, continuing to execute the step S204.
According to the translation device provided by the application, the target text is split to obtain the clauses, each clause is then encoded to obtain the corresponding sentence code vector, the text code vector corresponding to the target text is obtained according to the sentence code vector corresponding to each clause and is input to the decoding layer, and compared with the prior art, the sentence dependency relationship of the target text is enhanced, so that a translation model can obtain better translation effect under the condition of translating long text.
The above is a schematic scheme of the translation apparatus of this embodiment. It should be noted that, the technical solution of the translating device and the technical solution of the translating method belong to the same concept, and details of the technical solution of the translating device which are not described in detail can be referred to the description of the technical solution of the translating method.
The embodiment of the application also discloses a training device of the translation model, referring to fig. 10, the translation model comprises an encoding layer and a decoding layer, and the training device comprises:
A second splitting module 1002, configured to split the first text of the target corpus, to obtain at least two clauses;
a second clause coding module 1004 configured to input each clause to the coding layer to obtain a sentence coding vector corresponding to each clause;
a second text encoding module 1006 configured to obtain a text encoding vector corresponding to the first text according to the sentence encoding vector corresponding to each clause;
a second decoding module 1008 configured to input the text encoding vector and the translated second text corresponding to the first text to a decoding layer, resulting in an output training translated text;
the training module 1010 is configured to determine whether a training stopping condition is reached according to an error between the training translation text and the second text, if yes, stopping training, and if not, continuing to execute the second splitting module 1002.
Wherein the training stop conditions include: the error of the training translated text from the second text is less than the stability threshold.
Optionally, the second text includes P tag words, and the second decoding module 1008 is specifically configured to:
s302, obtaining a 1 st decoding hidden layer output according to the text coding vector input to the decoding layer and the initial set word, and obtaining a 1 st decoding word according to the 1 st decoding hidden layer output;
S304, according to the q-1 tag word and the q-1 decoding hidden layer output, obtaining the q decoding hidden layer output, and according to the q decoding hidden layer output, obtaining the q decoding word, wherein q is a positive integer and is more than or equal to 2 and less than or equal to P;
s306, self-increasing q by 1, judging whether q after self-increasing by 1 is larger than P, if so, executing step S308, and if not, continuing to execute step S304;
s308, obtaining corresponding training translation text according to the P decoding words.
According to the training device of the translation model provided by the application, each clause of the first text of the target corpus is input to the coding layer to obtain the corresponding sentence coding vector, the text coding vector corresponding to the first text is obtained according to the sentence coding vector corresponding to each clause, the text coding vector and the second text are input to the decoding layer to obtain the output training translation text, and the translation model is continuously trained according to the error between the training translation text and the second text, so that a translation model that reduces the attenuation of word information of the target text can be obtained, thereby achieving a better translation effect.
The above is a schematic scheme of the training device of the translation model of the present embodiment. It should be noted that, the technical solution of the training device of the translation model and the technical solution of the training method of the translation model belong to the same concept, and details of the technical solution of the training device of the translation model, which are not described in detail, can be referred to the description of the technical solution of the training method of the translation model.
An embodiment of the present application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of a translation method or a training method of a translation model as described above.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the foregoing translation method or training method of the translation model belong to the same concept, and details of the technical solution of the storage medium that are not described in detail may be referred to the description of the technical solution of the foregoing translation method or training method of the translation model.
The computer instructions include computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately adapted according to the requirements of legislation and patent practice in each jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
An embodiment of the present application also provides a chip storing computer instructions which, when executed by the chip, implement the steps of the translation method or training method of the translation model as described above.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the application disclosed above are intended only to assist in the explanation of the application. Alternative embodiments are not intended to be exhaustive or to limit the application to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and the full scope and equivalents thereof.

Claims (15)

1. A translation method for a translation model, the translation model comprising an encoding layer and a decoding layer, the method comprising:
splitting a target text to obtain at least two clauses, wherein the target text comprises N clauses, each clause comprises M words, M is greater than or equal to 2, N is greater than or equal to 2, and both M and N are positive integers;
for the 1st clause, inputting the clause into the coding layer to obtain a sentence coding vector corresponding to the 1st clause;
for each i-th clause other than the 1st clause, performing: S102, obtaining a coding hidden layer output corresponding to the 1st word according to the 1st word of the i-th clause input to the coding layer and the sentence coding vector corresponding to the (i-1)-th clause; S104, obtaining a coding hidden layer output corresponding to the j-th word according to the sentence coding vector corresponding to the (i-1)-th clause, the coding hidden layer output corresponding to the (j-1)-th word, and the j-th word of the i-th clause, wherein j is greater than or equal to 2 and less than or equal to M; S106, incrementing j by 1, and determining whether the incremented j is greater than M: if so, executing step S108, and if not, continuing to execute step S104; S108, obtaining a sentence coding vector corresponding to the i-th clause according to the coding hidden layer outputs corresponding to the M words of the i-th clause, wherein i is a positive integer and is greater than 1 and less than or equal to N;
obtaining a text coding vector corresponding to the target text according to the sentence coding vector corresponding to each clause;
and inputting the text coding vector to the decoding layer to generate a translation text corresponding to the target text.
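By way of illustration only, the clause-by-clause encoding of steps S102 to S108 can be sketched as follows in Python with PyTorch; the GRU cell, the softmax-based weight coefficients, the zero vector used in place of a previous sentence coding vector for the 1st clause, and all names and dimensions are assumptions for this sketch, not the claimed implementation.

import torch
import torch.nn as nn

class ClauseEncoder(nn.Module):
    # Sketch of the clause-by-clause coding layer: each clause is encoded word by word,
    # conditioned on the sentence coding vector of the previous clause (assumptions only).
    def __init__(self, vocab_size, emb_size=256, hidden_size=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.cell = nn.GRUCell(emb_size + hidden_size, hidden_size)
        self.attn = nn.Linear(hidden_size, 1)  # scores used to form the per-word weight coefficients
        self.hidden_size = hidden_size

    def encode_clause(self, clause_ids, prev_sentence_vec):
        # clause_ids: (M,) word indices of the i-th clause
        # prev_sentence_vec: (hidden_size,) sentence coding vector of the (i-1)-th clause
        h = torch.zeros(self.hidden_size)
        hiddens = []
        for word_id in clause_ids:                       # S102/S104: j = 1 .. M
            word_vec = self.embed(word_id)               # word coding vector of the j-th word
            inp = torch.cat([word_vec, prev_sentence_vec], dim=-1)
            h = self.cell(inp.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
            hiddens.append(h)                            # coding hidden layer output of the j-th word
        hiddens = torch.stack(hiddens)                   # (M, hidden_size)
        weights = torch.softmax(self.attn(hiddens).squeeze(-1), dim=0)
        return (weights.unsqueeze(-1) * hiddens).sum(dim=0)   # S108: weighted sum of hidden outputs

    def forward(self, clauses):
        # clauses: list of N tensors, each of shape (M,)
        sentence_vec = torch.zeros(self.hidden_size)     # simplification: zero vector stands in for the 1st clause's predecessor
        for clause in clauses:
            sentence_vec = self.encode_clause(clause, sentence_vec)
        return sentence_vec                              # sentence coding vector of the last clause

In this sketch the sentence coding vector of the last clause is returned as the text coding vector, which corresponds to the refinement in claim 2 below.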
2. The translation method of claim 1, wherein obtaining the text coding vector corresponding to the target text according to the sentence coding vector corresponding to each clause comprises:
taking the sentence coding vector corresponding to the last clause as the text coding vector corresponding to the target text.
3. The translation method of claim 1, wherein step S102 comprises:
obtaining a corresponding word coding vector according to the 1st word of the i-th clause input to the coding layer;
and obtaining the coding hidden layer output corresponding to the 1st word according to the word coding vector corresponding to the 1st word of the i-th clause and the sentence coding vector corresponding to the (i-1)-th clause input to the coding layer.
4. The translation method according to claim 1, wherein said step S104 comprises:
obtaining a corresponding word coding vector according to the j-th word of the i-th clause input to the coding layer;
and obtaining the coding hidden layer output corresponding to the j-th word of the i-th clause according to the word coding vector corresponding to the j-th word of the i-th clause, the sentence coding vector corresponding to the (i-1)-th clause input to the coding layer, and the coding hidden layer output corresponding to the (j-1)-th word.
5. The translation method according to claim 1, wherein said step S108 comprises:
multiplying the coding hidden layer outputs corresponding to the M words of the i-th clause by corresponding weight coefficients respectively, and summing the results to obtain the sentence coding vector corresponding to the i-th clause.
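As a purely illustrative formula (the notation h_j for the coding hidden layer output of the j-th word of the i-th clause and a_j for its weight coefficient is assumed, not taken from the claim), the sentence coding vector of claim 5 can be written in LaTeX notation as

c_i = \sum_{j=1}^{M} a_j \, h_j

where, under a common (assumed) choice, the weight coefficients are normalized so that \sum_{j=1}^{M} a_j = 1, for example via a softmax over per-word scores.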
6. The translation method of claim 1, wherein inputting the text coding vector to the decoding layer to generate the translation text corresponding to the target text comprises:
inputting the text coding vector and an initial set word into the decoding layer to generate at least two decoded words;
and obtaining the translation text corresponding to the target text according to the at least two decoded words.
7. The translation method of claim 6, wherein the number of the decoded words is P, P is greater than or equal to 2, and P is a positive integer;
and inputting the text coding vector and the initial set word to the decoding layer to generate at least two decoded words comprises:
S202, obtaining a 1st decoding hidden layer output according to the text coding vector and the initial set word input to the decoding layer, and obtaining a 1st decoded word according to the 1st decoding hidden layer output;
S204, obtaining a q-th decoding hidden layer output according to the (q-1)-th decoded word and the (q-1)-th decoding hidden layer output, and obtaining a q-th decoded word according to the q-th decoding hidden layer output, wherein q is a positive integer and is greater than or equal to 2 and less than or equal to P;
S206, incrementing q by 1, and determining whether the incremented q is greater than P: if so, ending, and if not, continuing to execute step S204.
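Purely as an illustration of steps S202 to S206, a greedy decoding loop might be sketched as follows in Python with PyTorch; the GRU cell, the argmax word selection, the maximum length, and the optional end token are assumptions for this sketch rather than the claimed implementation.

import torch
import torch.nn as nn

class GreedyDecoder(nn.Module):
    # Sketch of the decoding layer loop of steps S202 to S206 (assumptions only).
    def __init__(self, vocab_size, emb_size=256, hidden_size=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.cell = nn.GRUCell(emb_size, hidden_size)
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, text_vec, start_id, max_len=50, end_id=None):
        # text_vec: (hidden_size,) text coding vector from the coding layer
        # start_id: index of the initial set word (e.g. an assumed <s> token)
        h = text_vec.unsqueeze(0)                        # S202: condition on the text coding vector
        word_id = torch.tensor(start_id)
        decoded = []
        for _ in range(max_len):                         # S204/S206: q = 1 .. P
            h = self.cell(self.embed(word_id).unsqueeze(0), h)   # q-th decoding hidden layer output
            word_id = self.proj(h).argmax(dim=-1).squeeze(0)     # q-th decoded word
            if end_id is not None and word_id.item() == end_id:
                break
            decoded.append(word_id.item())
        return decoded                                   # indices of the decoded words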
8. A method for training a translation model, wherein the translation model comprises an encoding layer and a decoding layer,
the training method comprising:
splitting a first text of a target corpus to obtain at least two clauses, wherein the first text comprises N clauses, each clause comprises M words, M is greater than or equal to 2, N is greater than or equal to 2, and both M and N are positive integers;
for the 1st clause, inputting the clause into the coding layer to obtain a sentence coding vector corresponding to the 1st clause;
for each i-th clause other than the 1st clause, performing: S102, obtaining a coding hidden layer output corresponding to the 1st word according to the 1st word of the i-th clause input to the coding layer and the sentence coding vector corresponding to the (i-1)-th clause; S104, obtaining a coding hidden layer output corresponding to the j-th word according to the sentence coding vector corresponding to the (i-1)-th clause, the coding hidden layer output corresponding to the (j-1)-th word, and the j-th word of the i-th clause, wherein j is greater than or equal to 2 and less than or equal to M; S106, incrementing j by 1, and determining whether the incremented j is greater than M: if so, executing step S108, and if not, continuing to execute step S104; S108, obtaining a sentence coding vector corresponding to the i-th clause according to the coding hidden layer outputs corresponding to the M words of the i-th clause, wherein i is a positive integer and is greater than 1 and less than or equal to N;
obtaining a text coding vector corresponding to the first text according to the sentence coding vector corresponding to each clause;
inputting the text coding vector and a translated second text corresponding to the first text to the decoding layer to obtain an output training translation text;
judging whether a training stop condition is reached according to the error between the training translation text and the second text;
if yes, stopping training;
if not, continuing to execute the step of splitting the first text of the target corpus to obtain at least two clauses.
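A non-authoritative sketch of the training loop of claim 8 is given below in Python with PyTorch; the cross-entropy loss, the Adam optimizer, the corpus.sample and split_fn helpers, and the model interface are all assumptions introduced only for illustration.

import torch
import torch.nn as nn

def train(model, corpus, split_fn, stability_threshold=0.1, max_steps=100000):
    # Sketch of the training loop: encode the first text clause by clause, decode
    # against the second text, and stop once the error is small enough (assumptions only).
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for step in range(max_steps):
        first_text, second_text = corpus.sample()        # one parallel pair; second_text: (P,) tag-word indices
        clauses = split_fn(first_text)                   # split the first text into at least two clauses
        logits = model(clauses, second_text)             # (P, vocab_size) teacher-forced scores (see claim 9)
        loss = loss_fn(logits, second_text)              # error between training translation text and second text
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < stability_threshold:            # training stop condition (see claim 10)
            break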
9. The training method of claim 8, wherein the second text comprises P tag words;
and inputting the text coding vector and the translated second text corresponding to the first text to the decoding layer to obtain the output training translation text comprises:
S302, obtaining a 1st decoding hidden layer output according to the text coding vector and an initial set word input to the decoding layer, and obtaining a 1st decoded word according to the 1st decoding hidden layer output;
S304, obtaining a q-th decoding hidden layer output according to the (q-1)-th tag word and the (q-1)-th decoding hidden layer output, and obtaining a q-th decoded word according to the q-th decoding hidden layer output, wherein q is a positive integer and is greater than or equal to 2 and less than or equal to P;
S306, incrementing q by 1, and determining whether the incremented q is greater than P: if so, executing step S308, and if not, continuing to execute step S304;
S308, obtaining the corresponding training translation text according to the P decoded words.
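The teacher-forced decoding of steps S302 to S308 could be sketched as follows, reusing the assumed decoder components from the earlier sketch; feeding the (q-1)-th tag word of the second text, rather than the previously decoded word, into the next step is the key difference from the inference-time loop, and all names remain illustrative assumptions.

import torch

def teacher_forced_decode(decoder, text_vec, tag_words, start_id):
    # decoder: a GreedyDecoder-style module as sketched above (an assumption)
    # text_vec: (hidden_size,) text coding vector of the first text
    # tag_words: (P,) tensor of tag-word indices of the second text
    h = text_vec.unsqueeze(0)
    prev_id = torch.tensor(start_id)                     # S302: start from the initial set word
    logits = []
    for q in range(tag_words.size(0)):                   # S304/S306: q = 1 .. P
        h = decoder.cell(decoder.embed(prev_id).unsqueeze(0), h)
        logits.append(decoder.proj(h).squeeze(0))        # scores of the q-th decoded word
        prev_id = tag_words[q]                           # feed the q-th tag word into the next step
    return torch.stack(logits)                           # (P, vocab_size): the training translation text (S308)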
10. The training method of claim 8, wherein the training stop condition comprises: the error between the training translation text and the second text is less than a stability threshold.
11. A translation apparatus for a translation model, the translation model comprising an encoding layer and a decoding layer, the translation apparatus comprising:
a first splitting module configured to split the target text to obtain at least two clauses, wherein the target text comprises N clauses, each clause comprises M words, M is greater than or equal to 2, N is greater than or equal to 2, and both M and N are positive integers;
a first clause coding module configured to, for the 1st clause, input the clause into the coding layer to obtain a sentence coding vector corresponding to the 1st clause, and, for each i-th clause other than the 1st clause, perform: S102, obtaining a coding hidden layer output corresponding to the 1st word according to the 1st word of the i-th clause input to the coding layer and the sentence coding vector corresponding to the (i-1)-th clause; S104, obtaining a coding hidden layer output corresponding to the j-th word according to the sentence coding vector corresponding to the (i-1)-th clause, the coding hidden layer output corresponding to the (j-1)-th word, and the j-th word of the i-th clause, wherein j is greater than or equal to 2 and less than or equal to M; S106, incrementing j by 1, and determining whether the incremented j is greater than M: if so, executing step S108, and if not, continuing to execute step S104; S108, obtaining a sentence coding vector corresponding to the i-th clause according to the coding hidden layer outputs corresponding to the M words of the i-th clause, wherein i is a positive integer and is greater than 1 and less than or equal to N;
a first text coding module configured to obtain a text coding vector corresponding to the target text according to the sentence coding vector corresponding to each clause;
and a first decoding module configured to input the text coding vector to the decoding layer to generate a translation text corresponding to the target text.
12. A training device for a translation model, wherein the translation model includes an encoding layer and a decoding layer, the training device comprising:
a second splitting module configured to split a first text of the target corpus to obtain at least two clauses, wherein the first text comprises N clauses, each clause comprises M words, M is greater than or equal to 2, N is greater than or equal to 2, and both M and N are positive integers;
a second clause coding module configured to, for the 1st clause, input the clause into the coding layer to obtain a sentence coding vector corresponding to the 1st clause,
and, for each i-th clause other than the 1st clause, perform: S102, obtaining a coding hidden layer output corresponding to the 1st word according to the 1st word of the i-th clause input to the coding layer and the sentence coding vector corresponding to the (i-1)-th clause; S104, obtaining a coding hidden layer output corresponding to the j-th word according to the sentence coding vector corresponding to the (i-1)-th clause, the coding hidden layer output corresponding to the (j-1)-th word, and the j-th word of the i-th clause, wherein j is greater than or equal to 2 and less than or equal to M; S106, incrementing j by 1, and determining whether the incremented j is greater than M: if so, executing step S108, and if not, continuing to execute step S104; S108, obtaining a sentence coding vector corresponding to the i-th clause according to the coding hidden layer outputs corresponding to the M words of the i-th clause, wherein i is a positive integer and is greater than 1 and less than or equal to N;
a second text coding module configured to obtain a text coding vector corresponding to the first text according to the sentence coding vector corresponding to each clause;
a second decoding module configured to input the text coding vector and the translated second text corresponding to the first text to the decoding layer to obtain an output training translation text;
and a training module configured to judge whether a training stop condition is reached according to the error between the training translation text and the second text, stop training if so, and return to the second splitting module for continued execution if not.
13. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method of any of claims 1-7 or 8-10.
14. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-7 or 8-10.
15. A chip storing computer instructions, which when executed by the chip, implement the steps of the method of any one of claims 1-7 or 8-10.
CN201910198990.7A 2019-03-15 2019-03-15 Translation method and device, and training method and device of translation model Active CN109933809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910198990.7A CN109933809B (en) 2019-03-15 2019-03-15 Translation method and device, and training method and device of translation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910198990.7A CN109933809B (en) 2019-03-15 2019-03-15 Translation method and device, and training method and device of translation model

Publications (2)

Publication Number Publication Date
CN109933809A CN109933809A (en) 2019-06-25
CN109933809B true CN109933809B (en) 2023-09-15

Family

ID=66987429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910198990.7A Active CN109933809B (en) 2019-03-15 2019-03-15 Translation method and device, and training method and device of translation model

Country Status (1)

Country Link
CN (1) CN109933809B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502762B (en) * 2019-08-27 2023-07-28 北京金山数字娱乐科技有限公司 Translation platform and management method thereof
CN111680528B (en) * 2020-06-09 2023-06-30 合肥讯飞数码科技有限公司 Translation model compression method, device, equipment and storage medium
CN112597778B (en) * 2020-12-14 2023-06-13 华为技术有限公司 Translation model training method, translation method and translation equipment
CN114386391B (en) * 2022-01-11 2023-08-15 平安科技(深圳)有限公司 Sentence vector feature extraction method, device, equipment and medium based on artificial intelligence

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106598959A (en) * 2016-12-23 2017-04-26 北京金山办公软件股份有限公司 Method and system for determining intertranslation relationship of bilingual sentence pairs
CN107357789A (en) * 2017-07-14 2017-11-17 哈尔滨工业大学 Merge the neural machine translation method of multi-lingual coding information
CN107368476A (en) * 2017-07-25 2017-11-21 深圳市腾讯计算机系统有限公司 The method and relevant apparatus that a kind of method of translation, target information determine
CN107423290A (en) * 2017-04-19 2017-12-01 厦门大学 A kind of neural network machine translation model based on hierarchical structure
CN108170686A (en) * 2017-12-29 2018-06-15 科大讯飞股份有限公司 Text interpretation method and device
CN108415906A (en) * 2018-03-28 2018-08-17 中译语通科技股份有限公司 Based on field automatic identification chapter machine translation method, machine translation system
CN108460028A (en) * 2018-04-12 2018-08-28 苏州大学 Sentence weight is incorporated to the field adaptive method of neural machine translation
CN108763227A (en) * 2018-05-21 2018-11-06 电子科技大学 A kind of machine translation method based on piecemeal mechanism
CN108829722A (en) * 2018-05-08 2018-11-16 国家计算机网络与信息安全管理中心 A kind of Dual-Attention relationship classification method and system of remote supervisory
CN109359309A (en) * 2018-12-11 2019-02-19 成都金山互动娱乐科技有限公司 A kind of interpretation method and device, the training method of translation model and device
JP2019036093A (en) * 2017-08-14 2019-03-07 日本電信電話株式会社 Model learning device, conversion device, method, and program
CN109446534A (en) * 2018-09-21 2019-03-08 清华大学 Machine translation method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484681B (en) * 2015-08-25 2019-07-09 阿里巴巴集团控股有限公司 A kind of method, apparatus and electronic equipment generating candidate translation
KR102565274B1 (en) * 2016-07-07 2023-08-09 삼성전자주식회사 Automatic interpretation method and apparatus, and machine translation method and apparatus
CN107632987B (en) * 2016-07-19 2018-12-07 腾讯科技(深圳)有限公司 A kind of dialogue generation method and device
US10733380B2 (en) * 2017-05-15 2020-08-04 Thomson Reuters Enterprise Center Gmbh Neural paraphrase generator
US10380259B2 (en) * 2017-05-22 2019-08-13 International Business Machines Corporation Deep embedding for natural language content based on semantic dependencies
KR102342066B1 (en) * 2017-06-21 2021-12-22 삼성전자주식회사 Method and apparatus for machine translation using neural network and method for learning the appartus

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Context-dependent word representation for neural machine translation; Heeyoul Choi et al.; Computer Speech & Language; 2017-05-24; vol. 45; pp. 149-160 *
Exploiting Cross-Sentence Context for Neural Machine Translation; Longyue Wang et al.; https://arxiv.org/abs/1704.04347; 2017-07-23; pp. 1-6 *
Improving the Transformer Translation Model with Document-Level Context; Jiacheng Zhang et al.; https://arxiv.org/abs/1810.03581; 2018-10-08; pp. 1-10 *
Research on Mongolian-Chinese Machine Translation Based on LSTM; Liu Wanwan et al.; Computer Engineering & Science; 2018-10-15; vol. 40, no. 10; pp. 1890-1896 *
Semi-Supervised Neural Machine Translation Based on Data Selection with a Sentence-Level BLEU Metric; Ye Shaolin et al.; Pattern Recognition and Artificial Intelligence; 2017-10-15; vol. 30, no. 10; pp. 937-942 *
Research on Mongolian-Chinese Machine Translation with Part-of-Speech Tagging Based on Gated Recurrent Neural Networks; Liu Wanwan et al.; Journal of Chinese Information Processing; 2018-08-15; vol. 32, no. 8; pp. 68-74 *
Research on Document-Level Neural Machine Translation; Kuang Shaohui; China Master's Theses Full-text Database, Philosophy and Humanities; 2019-01-15; vol. 2019, no. 1; p. F085-648 *

Also Published As

Publication number Publication date
CN109933809A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN109933809B (en) Translation method and device, and training method and device of translation model
CN110534087B (en) Text prosody hierarchical structure prediction method, device, equipment and storage medium
CN109359309B (en) Translation method and device, and translation model training method and device
CN109977428B (en) Answer obtaining method and device
CN111222347B (en) Sentence translation model training method and device and sentence translation method and device
CN109710953B (en) Translation method and device, computing equipment, storage medium and chip
WO2020215551A1 (en) Chinese speech synthesizing method, apparatus and device, storage medium
CN107783960A (en) Method, apparatus and equipment for Extracting Information
CN111931518A (en) Translation model training method and device
CN111223498A (en) Intelligent emotion recognition method and device and computer readable storage medium
CN111738020B (en) Translation model training method and device
CN110457661B (en) Natural language generation method, device, equipment and storage medium
CN109902312B (en) Translation method and device, and training method and device of translation model
CN110503945A (en) A kind of training method and device of speech processes model
CN110263353B (en) Machine translation method and device
CN111951780A (en) Speech synthesis multitask model training method and related equipment
CN110598222A (en) Language processing method and device, and training method and device of language processing system
US20230127787A1 (en) Method and apparatus for converting voice timbre, method and apparatus for training model, device and medium
CN111680510B (en) Text processing method and device, computer equipment and storage medium
CN111539228B (en) Vector model training method and device and similarity determining method and device
CN114495977B (en) Speech translation and model training method, device, electronic equipment and storage medium
CN109902313B (en) Translation method and device, and translation model training method and device
CN108388549B (en) Information conversion method, information conversion device, storage medium and electronic device
CN113689866B (en) Training method and device of voice conversion model, electronic equipment and medium
CN111241830B (en) Method for generating word vector and training model for generating word

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant