CN114118108A - Translation model establishing method, translation method and corresponding device


Info

Publication number
CN114118108A
Authority
CN
China
Prior art keywords
text
character
decoder
language text
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111330368.0A
Other languages
Chinese (zh)
Inventor
陈珺
孙清清
郑行
王爱凌
赖伟达
邹泊滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111330368.0A priority Critical patent/CN114118108A/en
Publication of CN114118108A publication Critical patent/CN114118108A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/126: Character encoding
    • G06F40/20: Natural language analysis
    • G06F40/274: Converting codes to words; Guess-ahead of partial word inputs
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Abstract

The embodiments of this specification provide a method for establishing a translation model, a translation method, and corresponding apparatus. According to the embodiments, training data comprising a plurality of training samples is first obtained; an auxiliary model comprising an encoder, a text decoder and a speech decoder is then trained with the training data. The source language text of a training sample serves as the input to the encoder, which outputs a feature representation of the source language text; the text decoder predicts the target language text of the source language text from the feature representation, and the speech decoder predicts the speech index text of the source language text from the feature representation. The training targets of the auxiliary model are to minimize the difference between the text decoder's predictions and the corresponding target language texts in the training samples, and to minimize the difference between the speech decoder's predictions and the corresponding speech index texts in the training samples. Finally, the translation model is obtained from the encoder and text decoder of the trained auxiliary model.

Description

Translation model establishing method, translation method and corresponding device
Technical Field
One or more embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular, to a method for building a translation model, a translation method, and a corresponding apparatus.
Background
Translation here refers to transliteration, also known as transcription: given two languages that use different characters, text in one language is rendered in the other according to its pronunciation, so that the rendered text no longer carries semantics of its own but preserves the phonetic and written form. Translation is often used for proper nouns or technical terms such as names of people and places. For example, the Arabic form of a name (rendered as an image in the original document) is transliterated into English Latin letters as "Muhammad" and into Chinese as "Muhamoude".
Translation is now widely used in many application scenarios, so improving translation quality has become a pressing problem.
Disclosure of Invention
In view of the above, one or more embodiments of the present disclosure describe a method for building a translation model, a translation method, and a corresponding apparatus for improving translation quality.
According to a first aspect, there is provided a method of building a translation model, comprising:
acquiring training data containing a plurality of training samples, wherein each training sample comprises a source language text together with the target language text and speech index text corresponding to that source language text;
training an auxiliary model comprising an encoder, a text decoder and a speech decoder using the training data; the source language text of the training sample is used as the input of the encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are as follows: minimizing the difference between the prediction result of the text decoder and the corresponding target language text in the training sample and minimizing the difference between the prediction result of the speech decoder and the corresponding speech index text in the training sample;
and obtaining the translation model by utilizing an encoder and a text decoder in the auxiliary model obtained by training.
In one embodiment, the encoder employs a bi-directional recurrent neural network structure, and the text decoder and speech decoder employ a recurrent neural network structure.
In another embodiment, the encoder outputting the feature representation of the source language text comprises:
the encoder encodes the source language text to obtain the hidden vector representation corresponding to each character;
and converting the hidden vector representation corresponding to each character into a background variable by using a preset conversion function.
In one embodiment, the text decoder predicting the target language text of the source language text using the feature representation comprises: the text decoder transforms the prediction result for the previous character in the target language text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and maps the hidden vector representation of the current character to obtain the prediction result for the current character;

the speech decoder predicting the speech index text of the source language text using the feature representation comprises: the speech decoder transforms the prediction result for the previous character in the speech index text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text, and maps the hidden vector representation of the current character to obtain the prediction result for the current character.
In another embodiment, the auxiliary model further comprises a text attention layer and a speech attention layer;

the encoder encodes the source language text to obtain the hidden vector representation corresponding to each character;

the text attention layer applies attention-mechanism processing to the hidden vector representations to obtain the text background variable corresponding to each character in the target language text, so that the text decoder uses the text background variable corresponding to each character when making predictions;

the speech attention layer applies attention-mechanism processing to the hidden vector representations to obtain the speech background variable corresponding to each character in the speech index text, so that the speech decoder uses the speech background variable corresponding to each character when making predictions;

in this case, the translation model further includes the text attention layer.
In one embodiment, when the text attention layer determines the text background variable corresponding to the j-th character of the target language text, the attention weight α_ji applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ji; e_ji is obtained by transforming h_i together with the text decoder's hidden vector representation s_{j-1} for the (j-1)-th character of the target language text;

when the speech attention layer determines the speech background variable of the k-th character in the speech index text, the attention weight α_ki applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ki; e_ki is obtained by transforming h_i together with the speech decoder's hidden vector representation u_{k-1} for the (k-1)-th character of the speech index text.
In another embodiment, the text decoder predicting the target language text of the source language text using the feature representation comprises: the text decoder transforms the prediction result for the previous character in the target language text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and maps the hidden vector representation of the current character to obtain the prediction result for the current character;

the speech decoder predicting the speech index text of the source language text using the feature representation comprises: the speech decoder transforms the prediction result for the previous character in the speech index text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text, and maps the hidden vector representation of the current character to obtain the prediction result for the current character.
According to a second aspect, there is also provided a translation method comprising:
obtaining a source language text;
inputting the source language text into a translation model pre-established by the method described above, and obtaining the target language text output by the translation model for the source language text.
In one embodiment, the method further comprises:

retrieving from a target-language document or list using the obtained target language text to obtain a retrieval result.
According to a third aspect, there is provided an apparatus for building a translation model, comprising:
a training sample obtaining unit configured to obtain training data including a plurality of training samples, each training sample including a source language text together with the target language text and speech index text corresponding to that source language text;
an auxiliary model training unit configured to train an auxiliary model including an encoder, a text decoder, and a speech decoder using the training data; the source language text of the training sample is used as the input of the encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are as follows: minimizing the difference between the prediction result of the text decoder and the corresponding target language text in the training sample and minimizing the difference between the prediction result of the speech decoder and the corresponding speech index text in the training sample;
a translation model obtaining unit configured to obtain the translation model by using an encoder and a text decoder in the trained auxiliary model.
In one embodiment, the encoder employs a bi-directional recurrent neural network structure, and the text decoder and speech decoder employ a recurrent neural network structure.
In another embodiment, the encoder is specifically configured to encode the source language text to obtain the hidden vector representation corresponding to each character, and to convert the hidden vector representation corresponding to each character into a background variable using a preset transformation function.
In an embodiment, the text decoder is specifically configured to transform the prediction result for the previous character in the target language text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character;

the speech decoder is specifically configured to transform the prediction result for the previous character in the speech index text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character.
In another embodiment, the auxiliary model further comprises a text attention layer and a speech attention layer;

the encoder is specifically configured to encode the source language text to obtain the hidden vector representation corresponding to each character;

the text attention layer is configured to apply attention-mechanism processing to the hidden vector representations to obtain the text background variable corresponding to each character in the target language text, so that the text decoder uses the text background variable corresponding to each character when making predictions;

the speech attention layer is configured to apply attention-mechanism processing to the hidden vector representations to obtain the speech background variable corresponding to each character in the speech index text, so that the speech decoder uses the speech background variable corresponding to each character when making predictions;

in this case, the translation model further includes the text attention layer.
In one embodiment, when the text attention layer determines the text background variable corresponding to the j-th character of the target language text, the attention weight α_ji applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ji; e_ji is obtained by transforming h_i together with the text decoder's hidden vector representation s_{j-1} for the (j-1)-th character of the target language text;

when the speech attention layer determines the speech background variable of the k-th character in the speech index text, the attention weight α_ki applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ki; e_ki is obtained by transforming h_i together with the speech decoder's hidden vector representation u_{k-1} for the (k-1)-th character of the speech index text.
In another embodiment, the text decoder is specifically configured to transform the prediction result for the previous character in the target language text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character;

the speech decoder is specifically configured to transform the prediction result for the previous character in the speech index text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character.
According to a fourth aspect, there is also provided a translation apparatus comprising:
a source text acquisition unit configured to acquire a source language text;
a target text acquisition unit configured to input the source language text into a translation model and to acquire the target language text output by the translation model for the source language text, wherein the translation model is pre-established by the apparatus described above.
In one embodiment, the apparatus further comprises:

an information retrieval unit configured to retrieve from a target-language document or list using the acquired target language text, obtaining a retrieval result.
According to a fifth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
With the method and apparatus provided by the embodiments of this specification, the constructed translation model learns the mapping from source language text to target language text, while the accompanying speech decoding task helps the encoder learn a better acoustic representation of the source language text. This both speeds up training of the translation model and improves its translation quality.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 shows a flow diagram of a method of building a translation model according to one embodiment;
FIG. 2 is a schematic structural diagram of an auxiliary model provided in an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of another auxiliary model provided in the embodiments of the present disclosure;
FIG. 4 illustrates a flow diagram of a translation method according to one embodiment;
FIG. 5 is a schematic structural diagram of a translation model provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of another translation model provided by an embodiment of the present disclosure;
FIG. 7 shows a schematic block diagram of an apparatus to build a translation model according to one embodiment;
fig. 8 shows a schematic block diagram of a translation apparatus according to one embodiment.
Detailed Description
The traditional translation method is mainly rule-based: a preset mapping table maps letters or letter combinations of the source language text to letters or letter combinations of the target language. This character-level mapping, however, yields translation results of low quality. For example, in languages written with consonant-based alphabets, such as Arabic, the alphabet contains no vowels, only consonants, and reading the text requires supplying appropriate vowels from context. In this case a rule-based translation scheme can only generate isolated sequences of target-language consonant letters and cannot produce an ideal translation result.
In view of the above, the present specification provides a novel translation method, and the following detailed description is provided with embodiments.
FIG. 1 shows a flow diagram of a method of building a translation model, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, or device cluster having computing and processing capabilities.
As shown in fig. 1, the method includes:
step 101, obtaining training data including a plurality of training samples, where the training samples include source language texts and target language texts and voice index texts corresponding to the source language texts.
Step 103, training an auxiliary model comprising an encoder, a text decoder and a speech decoder by using training data; the source language text of the training sample is used as the input of an encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are: minimizing the difference between the predicted result of the text decoder and the corresponding target language text in the training samples and minimizing the difference between the predicted result of the speech decoder and the corresponding speech index text in the training samples.
Step 105, obtaining a translation model by utilizing the encoder and text decoder in the trained auxiliary model.
In the method shown in fig. 1, a source language text and a target language text are learned by constructing a translation model, and an encoder is assisted to learn acoustic representation of the source language text better by combining a speech decoding task, so that the translation quality of the translation model is improved while the training efficiency of the translation model is accelerated.
The manner in which the various steps shown in fig. 1 are performed is described below. First, the above step 101, i.e., "obtaining training data including a plurality of training samples", will be described in detail with reference to the embodiments.
In this step, parallel samples of source language text, target language text, and speech index text are obtained. That is, each training sample needs to contain a source language text and its corresponding target language text and speech index text; to ensure the quality of model training, the contents of each training sample should be accurate.
As one realizable approach, some typical source language texts may be selected, for example proper nouns such as common names of people and places, or technical terms; the target language texts are then labeled manually by experts, and the source language texts are encoded with a phonetic encoding algorithm to obtain the corresponding speech index texts. Algorithms such as Double Metaphone, Metaphone 3 or SOUNDEX may be employed; these are mature phonetic algorithms, not described in detail here, whose purpose is to assign the same index (or key) to words that sound the same or similar.
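For illustration, the sketch below builds one training triple using the Double Metaphone implementation from the third-party Metaphone package; the choice of library and the helper name are assumptions for illustration only, since this specification names only the algorithm family.

    # A minimal sketch, assuming the third-party "Metaphone" package
    # (pip install Metaphone) as the phonetic encoding algorithm.
    from metaphone import doublemetaphone

    def make_training_sample(source_text: str, target_text: str) -> dict:
        # The speech index text is derived by phonetically encoding the
        # source language text; a non-Latin source script would in practice
        # need a phonetic encoder suited to that script.
        primary_key, _secondary_key = doublemetaphone(source_text)
        return {
            "source": source_text,        # text to be transliterated
            "target": target_text,        # e.g. expert-labeled "Muhammad"
            "speech_index": primary_key,  # same key for same-sounding words
        }

    # Same-sounding spellings typically share one index (or key):
    print(doublemetaphone("Muhammad")[0], doublemetaphone("Mohamad")[0])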
However, manual labeling by experts is costly. Another realizable approach is therefore to construct the source language texts and target language texts from entries and attribute contents on encyclopedia pages. On some encyclopedia pages in low-resource languages, entries for proper nouns such as names of people and places carry the corresponding expressions in other languages in their attribute regions. For example, on an Arabic encyclopedia page, the entry for the Arabic form of the name (rendered as an image in the original document) lists the corresponding English expression "Muhammad" in its attribute region; on a Chinese encyclopedia page, the entry "Muhamoude" likewise lists the corresponding English expression "Muhammad". In this way, the correspondence between the Arabic form, "Muhammad" and "Muhamoude" can be captured from encyclopedia pages. When training the translation model, if the source and target languages are Arabic and English respectively, the Arabic form and "Muhammad" can serve as the source and target language texts of one training sample; if the source and target languages are Chinese and English respectively, "Muhamoude" and "Muhammad" can serve as the source and target language texts of one training sample. The corresponding speech index text can still be obtained by encoding the source language text with a phonetic encoding algorithm such as Double Metaphone, Metaphone 3 or SOUNDEX.
The above step 103 of training the auxiliary model including the encoder, the text decoder and the speech decoder using the training data will be described in detail with reference to the embodiments.
The purpose of this embodiment is to train the translation model, and the training of the translation model is realized by training the neural network model. However, in order to improve the training efficiency and quality, the speech decoding task is used to assist the training of the translation model in this embodiment, in which case the model including the speech decoder is referred to as an auxiliary model.
Fig. 2 is a schematic structural diagram of an auxiliary model provided in an embodiment of the present specification, where the auxiliary model mainly includes an encoder, a text decoder, and a speech decoder.
The input to the encoder is the source language text in the training sample. The source language text can be regarded as a character sequence, denoted X = (x_1, x_2, …, x_m), where x_i represents the i-th character in the source language text and m is the number of characters in the input sequence.
The encoder encodes the input sequence character by character, and the encoding results in a characteristic representation of the text in the source language.
Specifically, the encoder may encode the source language text to obtain a hidden vector representation corresponding to each character; and then converting the hidden vector representation corresponding to each character into a background variable by using a preset conversion function.
For each character, taking the i-th character as an example, the encoder transforms the input x_i and the hidden vector representation h_{i-1} of the previous, (i-1)-th, character into the hidden vector representation h_i of the i-th character:

h_i = f(x_i, h_{i-1})    (1)

where f() is the transformation function employed by the encoder in obtaining the hidden vector representation.

After the hidden vector representations of all characters are obtained, they are transformed into a background variable C:

C = q(h_1, h_2, …, h_m)    (2)

where q() is the transformation function employed by the encoder in deriving the background variable.
The encoder may adopt a conventional recurrent neural network structure such as an RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit). As a preferred embodiment, however, the encoder may employ a bidirectional recurrent neural network structure such as a BiLSTM. In that case the hidden vector representation h_i of each character depends on the subsequences before and after the character (as well as the character itself), so it encodes information from the whole sequence.
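For concreteness, the following is a minimal PyTorch sketch of such a bidirectional encoder; the framework and all layer sizes are illustrative assumptions, not prescribed by this specification.

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """Character-level bidirectional recurrent encoder (BiLSTM sketch)."""
        def __init__(self, vocab_size: int, emb_dim: int = 128, hidden_dim: int = 256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, m) character ids of the source language text.
            # Returns h_1..h_m, each encoding context from both directions:
            # shape (batch, m, 2 * hidden_dim).
            h, _ = self.rnn(self.embed(x))
            return h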
The text decoder predicts a target language text of the source language text using the feature representation output by the encoder.
As can be seen, the feature representation output by the encoder in this embodiment is the background variable C. The text decoder takes the background variable C as input and outputs a predicted character sequence, i.e. the target language text, denoted Ŷ = (ŷ_1, ŷ_2, …, ŷ_n), where ŷ_j represents the j-th character in the predicted target language text and n is the total number of characters in the target language text.
For each character, taking the j-th character as an example, the text decoder transforms the prediction result ŷ_{j-1} for the previous character, the background variable C, and the hidden vector representation s_{j-1} of the previous character into the hidden vector representation s_j of the j-th character:

s_j = g(ŷ_{j-1}, C, s_{j-1})    (3)

where g() is the transformation function employed by the text decoder in obtaining the hidden vector representation.

A Softmax operation then maps the hidden vector representation s_j of the j-th character to the prediction result ŷ_j for the j-th character; the mapping result is output in the form of conditional probabilities over the characters.
The text decoder may employ a recurrent neural network structure such as an RNN, LSTM or GRU in this embodiment.
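Continuing the PyTorch assumption, a hedged sketch of one decoding step follows; the speech decoder has the same structure, with the phonetic-symbol vocabulary in place of the target-character vocabulary.

    import torch
    import torch.nn as nn

    class TextDecoder(nn.Module):
        """One decoder step: s_j = g(y_prev, C, s_prev), then a Softmax
        over the target character vocabulary (equation (3) and after)."""
        def __init__(self, vocab_size: int, emb_dim: int = 128,
                     ctx_dim: int = 512, hidden_dim: int = 256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.cell = nn.GRUCell(emb_dim + ctx_dim, hidden_dim)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, y_prev, context, s_prev):
            # y_prev:  (batch,) ids of the previously predicted character
            # context: (batch, ctx_dim) background variable C
            # s_prev:  (batch, hidden_dim) hidden vector s_{j-1}
            s_j = self.cell(torch.cat([self.embed(y_prev), context], dim=-1),
                            s_prev)
            return torch.log_softmax(self.out(s_j), dim=-1), s_j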
The speech decoder predicts a speech index text of the source language text using the feature representation output by the encoder.
Likewise, the feature representation output by the encoder is the background variable C. The speech decoder takes the background variable C as input and outputs a predicted character sequence, i.e. the speech index text, denoted Ẑ = (ẑ_1, ẑ_2, …, ẑ_r), where ẑ_k represents the k-th character in the predicted speech index text and r is the total number of characters in the speech index text.
For each character, taking the k-th character as an example, the speech decoder transforms the prediction result ẑ_{k-1} for the previous character, the background variable C, and the hidden vector representation u_{k-1} of the previous character into the hidden vector representation u_k of the k-th character:

u_k = l(ẑ_{k-1}, C, u_{k-1})

where l() is the transformation function employed by the speech decoder in obtaining the hidden vector representation.

A Softmax operation then maps the hidden vector representation u_k of the k-th character to the prediction result ẑ_k for the k-th character; the mapping result is output in the form of conditional probabilities. It should be noted that the speech decoder maps into the speech coding space, which differs according to the specific phonetic encoding algorithm used; that algorithm should be consistent with the one used to produce the speech index texts in the training samples.
The speech decoder may employ a recurrent neural network structure such as an RNN, LSTM or GRU in this embodiment.
Suppose that in a training sample the source language text X = (x_1, x_2, …, x_m) has the corresponding target language text Y = (y_1, y_2, …, y_n) and speech index text Z = (z_1, z_2, …, z_r). During training, the goal is then to minimize the difference between Ŷ and Y and to minimize the difference between Ẑ and Z.
A loss function Loss may be designed in advance for this learning target, obtained from a first loss function and a second loss function, where the first loss function Loss_char is designed to minimize the difference between Ŷ and Y, and the second loss function Loss_phone is designed to minimize the difference between Ẑ and Z. For example:

Loss = Loss_char + λ·Loss_phone    (4)

where λ is a weighting coefficient.
That is, in each iteration the model parameters of the whole auxiliary model are updated according to the value of the loss function (which is determined by the outputs of the text decoder and the speech decoder) until a preset iteration stop condition is reached. The iteration stop condition may be, for example, that the value of the loss function falls to or below a preset threshold, or that the number of iterations reaches a preset count.
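A sketch of the joint objective in equation (4), under the same PyTorch assumption; padding handling is omitted and the value of λ is illustrative.

    import torch.nn.functional as F

    def auxiliary_loss(char_logits, char_targets, phone_logits, phone_targets,
                       lam=0.5):
        # char_logits:  (batch, n, |target vocab|); char_targets: (batch, n)
        # phone_logits: (batch, r, |phonetic vocab|); phone_targets: (batch, r)
        loss_char = F.cross_entropy(char_logits.flatten(0, 1),
                                    char_targets.flatten())
        loss_phone = F.cross_entropy(phone_logits.flatten(0, 1),
                                     phone_targets.flatten())
        return loss_char + lam * loss_phone  # equation (4)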
After the auxiliary model is trained, the actual translation model includes only the encoder and the text decoder; it need not include the speech decoder.
Fig. 3 is a schematic structural diagram of another auxiliary model provided in an embodiment of the present specification, where the auxiliary model includes a text attention layer and a speech attention layer in addition to an encoder, a text decoder, and a speech decoder.
Likewise, the input to the encoder is the source language text in the training sample. The source language text can be regarded as a character sequence, denoted X = (x_1, x_2, …, x_m), where x_i represents the i-th character in the source language text and m is the number of characters in the input sequence.
The encoder encodes the input sequence character by character, and the encoding results in a characteristic representation of the text in the source language.
Specifically, the encoder may encode the source language text to obtain a hidden vector representation corresponding to each character.
For each character, taking the i-th character as an example, the encoder transforms the input x_i and the hidden vector representation h_{i-1} of the previous, (i-1)-th, character into the hidden vector representation h_i of the i-th character, using equation (1) above.
The encoder may adopt a conventional recurrent neural network structure such as an RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit). As a preferred embodiment, however, the encoder may employ a bidirectional recurrent neural network structure such as a BiLSTM. In that case the hidden vector representation h_i of each character depends on the subsequences before and after the character (as well as the character itself), so it encodes information from the whole sequence.
Unlike in the structure shown in fig. 2, the generation of the background variable is performed by the text attention layer and the speech attention layer, respectively.
The text attention layer applies attention-mechanism processing to the hidden vector representations to obtain the text background variable C_char. The text background variable is time-varying: for each time step (i.e. for each character of the target language text), the following formula may be employed:

c_j^char = Σ_{i=1}^{m} α_{ji}·h_i    (5)

where c_j^char denotes the text background variable corresponding to the j-th character of the target language text, and α_{ji} is a weight whose values, for a given j, form a probability distribution. α_{ji} may be obtained by normalizing the attention score e_{ji} with a softmax operation.

e_{ji} can be obtained by the text attention layer by transforming the hidden vector representation h_i and the text decoder's hidden vector representation s_{j-1} for the (j-1)-th character:

e_{ji} = a(s_{j-1}, h_i)    (6)

where a() is the transformation function used to obtain e_{ji}. It can, for example, be realized with a multi-layer perceptron:

a(s_{j-1}, h_i) = V·tanh(W_s·s_{j-1} + W_h·h_i)    (7)

where V, W_s and W_h are all model parameters.
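Equation (7) is additive attention; below is a hedged PyTorch sketch of equations (5) through (7), with all dimensions assumed for illustration.

    import torch
    import torch.nn as nn

    class AdditiveAttention(nn.Module):
        """e_ji = V tanh(W_s s_{j-1} + W_h h_i); alpha = softmax(e);
        context c_j = sum_i alpha_ji * h_i (equations (5)-(7))."""
        def __init__(self, dec_dim: int, enc_dim: int, attn_dim: int = 128):
            super().__init__()
            self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)
            self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)
            self.v = nn.Linear(attn_dim, 1, bias=False)

        def forward(self, s_prev, h):
            # s_prev: (batch, dec_dim); h: (batch, m, enc_dim)
            e = self.v(torch.tanh(self.W_s(s_prev).unsqueeze(1) + self.W_h(h)))
            alpha = torch.softmax(e, dim=1)   # weights over source characters
            return (alpha * h).sum(dim=1)     # background variable c_j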
The speech attention layer applies attention-mechanism processing to the hidden vector representations to obtain the speech background variable C_phone. The speech background variable is likewise time-varying: for each time step (i.e. for each character of the speech index text), the following formula may be employed:

c_k^phone = Σ_{i=1}^{m} α_{ki}·h_i    (8)

where c_k^phone denotes the speech background variable corresponding to the k-th character of the speech index text, and α_{ki} is a weight whose values, for a given k, form a probability distribution. α_{ki} may be obtained by normalizing the attention score e_{ki} with a softmax operation.

e_{ki} can be obtained by transforming the hidden vector representation h_i and the speech decoder's hidden vector representation u_{k-1} for the (k-1)-th character:

e_{ki} = a'(u_{k-1}, h_i)    (9)

where a'() is the transformation function used by the speech attention layer to obtain e_{ki}. It can, for example, be realized with a multi-layer perceptron:

a'(u_{k-1}, h_i) = V'·tanh(W_u·u_{k-1} + W_h'·h_i)    (10)

where V', W_u and W_h' are all model parameters.
The text decoder then predicts the target language text of the source language text using the text background variables C_char output by the text attention layer. As can be seen, the features utilized by the text decoder in this embodiment are the text background variables. The text decoder takes them as input and outputs a predicted character sequence, i.e. the target language text, denoted Ŷ = (ŷ_1, ŷ_2, …, ŷ_n), where ŷ_j represents the j-th character in the predicted target language text and n is the total number of characters in the target language text.
For each character, taking the j-th character as an example, the text decoder transforms the prediction result ŷ_{j-1} for the previous character, the background variable c_j^char of the current character, and the hidden vector representation s_{j-1} of the previous character into the hidden vector representation s_j of the j-th character:

s_j = g(ŷ_{j-1}, c_j^char, s_{j-1})    (11)

where g() is the transformation function employed by the text decoder in obtaining the hidden vector representation.

A Softmax operation then maps the hidden vector representation s_j of the j-th character to the prediction result ŷ_j for the j-th character; the mapping result is output in the form of conditional probabilities over the characters.
The text decoder may employ a recurrent neural network structure such as an RNN, LSTM or GRU in this embodiment.
The speech decoder predicts the speech index text of the source language text using the speech background variable C_phone. As can be seen, the features utilized by the speech decoder in this embodiment are the speech background variables. The speech decoder takes them as input and outputs a predicted character sequence, i.e. the speech index text, denoted Ẑ = (ẑ_1, ẑ_2, …, ẑ_r), where ẑ_k represents the k-th character in the predicted speech index text and r is the total number of characters in the speech index text.
For each character, taking the k-th character as an example, the speech decoder transforms the prediction result ẑ_{k-1} for the previous character, the speech background variable c_k^phone of the current character, and the hidden vector representation u_{k-1} of the previous character into the hidden vector representation u_k of the k-th character:

u_k = l(ẑ_{k-1}, c_k^phone, u_{k-1})

where l() is the transformation function employed by the speech decoder in obtaining the hidden vector representation.

A Softmax operation then maps the hidden vector representation u_k of the k-th character to the prediction result ẑ_k for the k-th character; the mapping result is output in the form of conditional probabilities. As before, the speech decoder maps into the speech coding space, which differs according to the specific phonetic encoding algorithm used; that algorithm should be consistent with the one used to produce the speech index texts in the training samples.
The speech decoder may employ a recurrent neural network structure such as an RNN, LSTM or GRU in this embodiment.
Suppose that in a training sample the source language text X = (x_1, x_2, …, x_m) has the corresponding target language text Y = (y_1, y_2, …, y_n) and speech index text Z = (z_1, z_2, …, z_r). During training, the goal is then to minimize the difference between Ŷ and Y and to minimize the difference between Ẑ and Z.
A loss function Loss may be designed in advance for this learning target, obtained from a first loss function and a second loss function, where the first loss function Loss_char is designed to minimize the difference between Ŷ and Y, and the second loss function Loss_phone is designed to minimize the difference between Ẑ and Z, for example as shown in equation (4) above.
That is, in each iteration the model parameters of the whole auxiliary model are updated according to the value of the loss function (which is determined by the outputs of the text decoder and the speech decoder) until a preset iteration stop condition is reached. The iteration stop condition may be, for example, that the value of the loss function falls to or below a preset threshold, or that the number of iterations reaches a preset count.
After the auxiliary model is trained, the actual translation model includes only the encoder, the text attention layer and the text decoder; it need not include the speech attention layer or the speech decoder.
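Extracting the translation model then amounts to reusing the trained submodules. A sketch under the assumption that the auxiliary model exposes its components as attributes (the attribute names are hypothetical):

    import torch.nn as nn

    class TranslationModel(nn.Module):
        """Keeps only the translation path of a trained auxiliary model."""
        def __init__(self, aux_model: nn.Module):
            super().__init__()
            self.encoder = aux_model.encoder                # jointly trained
            self.text_attention = aux_model.text_attention  # omit for the Fig. 2 variant
            self.text_decoder = aux_model.text_decoder
            # aux_model.speech_attention and aux_model.speech_decoder are dropped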
After the translation model is obtained, the translation process can be performed by using the translation model. Fig. 4 shows a flow diagram of a translation method according to an embodiment, it being understood that the method may be performed by any apparatus, device, platform, cluster of devices having computing, processing capabilities. As shown in fig. 4, the method may include the steps of:
step 401: and obtaining source language text.
The source language text involved in this step is the text to be translated. The text usually includes proper nouns such as names of people and places, technical nouns, and the like.
Step 403: input the source language text into the translation model, and obtain the target language text output by the translation model for the source language text.
This embodiment needs to ensure that the source language and target language of the selected translation model match the requirements of the translation task.
Fig. 5 is a schematic structural diagram of a translation model provided in an embodiment of the present specification, and as shown in fig. 5, the translation model includes an encoder and a text decoder.
And the encoder encodes the source language text to obtain the characteristic representation of the source language text.
Specifically, the encoder may encode the source language text to obtain a hidden vector representation corresponding to each character; and then converting the hidden vector representation corresponding to each character into a background variable by using a preset conversion function.
If the input source language text is X = (x_1, x_2, …, x_m), then for each character, taking the i-th character as an example, the encoder transforms the input x_i and the hidden vector representation h_{i-1} of the previous, (i-1)-th, character into the hidden vector representation h_i of the i-th character, for example using equation (1). The representations are then transformed into the background variable C using equation (2).
The text decoder predicts a target language text of the source language text using the feature representation output by the encoder.
As before, the feature representation output by the encoder is the background variable C. The text decoder takes the background variable C as input and outputs the predicted character sequence, i.e. the target language text Ŷ. For each character, taking the j-th character as an example, the text decoder transforms the prediction result ŷ_{j-1} for the previous character, the background variable C, and the hidden vector representation s_{j-1} of the previous character into the hidden vector representation s_j of the j-th character, for example using equation (3). A Softmax operation then maps s_j to the prediction result ŷ_j for the j-th character; the mapping result is output in the form of conditional probabilities over the characters.
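Putting the trained pieces together at inference time, here is a greedy-decoding sketch; the search strategy is not specified in this embodiment, and using the mean of the hidden vectors as the transformation q() of equation (2) is an assumption for illustration.

    import torch

    def transliterate(encoder, text_decoder, src_ids, bos_id, eos_id,
                      max_len=64):
        # src_ids: (1, m) character ids of one source language text.
        h = encoder(src_ids)              # hidden vectors h_1..h_m
        context = h.mean(dim=1)           # stand-in for C = q(h_1, ..., h_m)
        s = torch.zeros(1, text_decoder.cell.hidden_size)
        y = torch.tensor([bos_id])
        out = []
        for _ in range(max_len):
            log_probs, s = text_decoder(y, context, s)  # eq. (3) + Softmax
            y = log_probs.argmax(dim=-1)                # greedy choice
            if y.item() == eos_id:
                break
            out.append(y.item())
        return out                        # predicted target character ids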
Fig. 6 is a schematic structural diagram of a translation model provided in an embodiment of the present specification, and as shown in fig. 6, the translation model includes an encoder, a text attention layer, and a text decoder.
And the encoder encodes the source language text to obtain the hidden vector representation corresponding to each character.
If the input source language text is X = (x_1, x_2, …, x_m), then for each character, taking the i-th character as an example, the encoder transforms the input x_i and the hidden vector representation h_{i-1} of the previous, (i-1)-th, character into the hidden vector representation h_i of the i-th character, for example using equation (1).

The text attention layer applies attention-mechanism processing to the hidden vector representations to obtain the text background variables C_char. These are time-varying: for each time step (i.e. for each character of the target language text), the text background variable c_j^char corresponding to the j-th character can be determined using equation (5).

The text decoder then predicts the target language text using the text background variables output by the text attention layer. It takes them as input and outputs the predicted character sequence, i.e. the target language text Ŷ. For each character, taking the j-th character as an example, the text decoder transforms the prediction result ŷ_{j-1} for the previous character, the background variable c_j^char of the current character, and the hidden vector representation s_{j-1} of the previous character into the hidden vector representation s_j of the j-th character, for example using equation (11). A Softmax operation then maps s_j to the prediction result ŷ_j; the mapping result is output in the form of conditional probabilities over the characters.
The translation method provided by the above method embodiment can be applied to various application scenarios. As a typical application scenario, the obtained target language text may be used to perform a search in a target language document or list to obtain a search result.
In one example, in the information-defense domain, certain names of people, organizations and the like are placed on a blacklist. The names on the blacklist are in languages such as English, yet they need to be recognized in texts written in low-resource languages. With the translation method of the above embodiments, the low-resource-language text can be input into the corresponding translation model as the source language text to obtain the English translation result. The English translation result is then matched against the blacklist, i.e. retrieved in the blacklist; if a match is found, the text contains a blacklisted name and can be subjected to further information-defense processing.
As another example, in some technical fields there is a need to find low-resource-language articles associated with a particular technical name when only the English text of that name is known. The English text of the technical name can be input into the corresponding translation model to obtain the translation result in the target language, and that result can then be searched for in a large collection of articles to find the articles containing the technical name.
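As a sketch of the blacklist scenario above (all names are hypothetical; transliterate_fn stands for the trained translation model wrapped as a function):

    from typing import Callable, Set

    def screen_text(src_name: str,
                    transliterate_fn: Callable[[str], str],
                    blacklist: Set[str]) -> bool:
        # True when the English transliteration of a source-language name
        # matches a blacklisted entry, i.e. retrieval in a list.
        return transliterate_fn(src_name) in blacklist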
The method can likewise be applied to other scenarios, which are not exhaustively enumerated here.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
According to an embodiment of another aspect, an apparatus for building a translation model is provided. FIG. 7 illustrates a schematic block diagram of an apparatus to build a translation model, according to one embodiment. It is to be appreciated that the apparatus can be implemented by any apparatus, device, platform, and cluster of devices having computing and processing capabilities. As shown in fig. 7, the apparatus 700 includes: a training sample acquisition unit 701, an auxiliary model training unit 702, and a translation model acquisition unit 703. The main functions of each component unit are as follows:
a training sample obtaining unit 701 configured to obtain training data including a plurality of training samples, where a training sample includes a source language text and a target language text and a speech index text corresponding to the source language text.
As one of the realizable manners, some typical source language texts may be selected, for example, proper nouns such as typical names of people and places, or some technical terms, and then the target language texts are manually labeled by experts, and the source language texts are encoded by using a speech decoding algorithm to obtain corresponding speech index texts.
As another realizable mode, a source language text and a target language text can be constructed through entries and attribute contents on an encyclopedic page, and a voice decoding algorithm is adopted to encode the source language text to obtain a corresponding voice index text.
An auxiliary model training unit 702 configured to train an auxiliary model including an encoder, a text decoder, and a speech decoder using training data; the source language text of the training sample is used as the input of an encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are: minimizing the difference between the predicted result of the text decoder and the corresponding target language text in the training samples and minimizing the difference between the predicted result of the speech decoder and the corresponding speech index text in the training samples.
As a preferred embodiment, the encoder may employ a bidirectional recurrent neural network structure, and the text decoder and the speech decoder may employ a recurrent neural network structure.
In a realizable manner, the auxiliary model mainly comprises an encoder, a text decoder and a speech decoder, as shown in fig. 2.
The encoder is used for encoding the source language text to obtain the hidden vector representation corresponding to each character; and converting the hidden vector representation corresponding to each character into a background variable by using a preset conversion function.
The text decoder is used to transform the prediction result for the previous character in the target language text, the background variable, and the hidden vector representation of the previous character into the hidden vector representation of the current character in the target language text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character.

The speech decoder is used to transform the prediction result for the previous character in the speech index text, the background variable, and the hidden vector representation of the previous character into the hidden vector representation of the current character in the speech index text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character.
As another realizable approach, as shown in fig. 3, the auxiliary model may further include a text attention layer and a speech attention layer.

The encoder is used to encode the source language text to obtain the hidden vector representation corresponding to each character.

The text attention layer is used to apply attention-mechanism processing to the hidden vector representations to obtain the text background variable corresponding to each character in the target language text, so that the text decoder can use the text background variable corresponding to each character when making predictions.

The speech attention layer is used to apply attention-mechanism processing to the hidden vector representations to obtain the speech background variable corresponding to each character in the speech index text, so that the speech decoder can use the speech background variable corresponding to each character when making predictions.

The translation model in this case further includes the text attention layer.
More specifically, the text attention layer is determining the target languageWhen the text background variable corresponding to the jth character of the language text represents h for the hidden vector corresponding to the ith character of the source language textiAttention weight α adoptedjiIs to the attention score ejiIs obtained by normalization; e.g. of the typejiFrom hiAnd the text encoder's hidden vector representation s for the j-1 th character of the target language textj-1And (5) carrying out transformation to obtain the product.
When the speech attention layer determines the speech background variable of the k-th character in the speech index text, the attention weight α_ki applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ki; e_ki is obtained by transforming h_i and the speech decoder's hidden vector representation u_(k-1) for the (k-1)-th character of the speech index text.
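As an illustration of this attention computation, a minimal PyTorch sketch follows; the additive form of the transformation that produces the attention score, and all layer names and dimensions, are assumptions rather than details taken from this specification.

    import torch
    import torch.nn as nn

    class AttentionLayer(nn.Module):
        # Computes a per-step background variable from the encoder states.
        def __init__(self, enc_dim, dec_dim, att_dim=256):
            super().__init__()
            self.W = nn.Linear(enc_dim, att_dim, bias=False)
            self.U = nn.Linear(dec_dim, att_dim, bias=False)
            self.v = nn.Linear(att_dim, 1, bias=False)

        def forward(self, h, s_prev):
            # e_ji: score of encoder state h_i against previous decoder state s_(j-1).
            e = self.v(torch.tanh(self.W(h) + self.U(s_prev).unsqueeze(1)))
            # alpha_ji: normalization of the scores over the source characters.
            alpha = torch.softmax(e, dim=1)            # (batch, src_len, 1)
            # Background variable for the current step: weighted sum of the h_i.
            c = (alpha * h).sum(dim=1)                 # (batch, enc_dim)
            return c, alpha.squeeze(-1)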
The text decoder is specifically used for converting the prediction result of the previous character in the target language text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and for mapping the hidden vector representation of the current character to obtain the prediction result of the current character.
The speech decoder is specifically used for converting the prediction result of the previous character in the speech index text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text, and for mapping the hidden vector representation of the current character to obtain the prediction result of the current character.
A translation model obtaining unit 703 configured to obtain a translation model by using an encoder and a text decoder in the trained auxiliary model.
According to an embodiment of another aspect, a translation apparatus is provided. Fig. 8 shows a schematic block diagram of a translation apparatus according to one embodiment. The apparatus can be implemented by any device, platform, or device cluster having computing and processing capabilities. As shown in fig. 8, the apparatus 800 includes a source text acquisition unit 801 and a target text acquisition unit 802, and may further include an information retrieval unit 803. The main functions of each component unit are as follows:
a source text acquisition unit 801 configured to acquire a source language text.
A target text obtaining unit 802 configured to input the source language text into the translation model, and obtain the target language text output by the translation model for the source language text.
In one implementation, the translation model mainly includes an encoder and a text decoder, as shown in fig. 5.
The encoder encodes the source language text to obtain the feature representation of the source language text, and the text decoder predicts the target language text of the source language text using the feature representation output by the encoder.
Specifically, the encoder may encode the source language text to obtain a hidden vector representation corresponding to each character, and then convert the hidden vector representations corresponding to the characters into a background variable using a preset conversion function.
Specifically, the text decoder converts the prediction result of the previous character in the target language text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character, and then maps the hidden vector representation of the current character through a Softmax operation to obtain the prediction result of the current character. The mapping result is output as a conditional probability distribution over the characters.
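As a small illustration of this final mapping step, assuming a linear projection followed by Softmax (the dimensions and tensor values below are hypothetical):

    import torch
    import torch.nn as nn

    hid_dim, vocab_size = 512, 6000          # hypothetical sizes
    out = nn.Linear(hid_dim, vocab_size)     # maps hidden vector to vocabulary scores
    hidden = torch.randn(1, hid_dim)         # current character's hidden vector (dummy)

    # Softmax turns the mapped scores into conditional probabilities over characters.
    probs = torch.softmax(out(hidden), dim=-1)
    pred = probs.argmax(dim=-1)              # greedy prediction of the current character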
As another implementable approach, the translation model includes an encoder, a text attention layer, and a text decoder, as shown in fig. 6.
The encoder encodes the source language text to obtain the hidden vector representation corresponding to each character.
The text attention layer applies an attention mechanism to the hidden vector representations to obtain the text background variables.
The text decoder then predicts a target language text of the source language text using the text background variables output by the text attention layer.
When the text attention layer determines the text background variable corresponding to the j-th character of the target language text, the attention weight α_ji applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ji; e_ji is obtained by transforming h_i and the text decoder's hidden vector representation s_(j-1) for the (j-1)-th character of the target language text.
The text decoder is used for converting the prediction result of the previous character in the target language text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and for mapping the hidden vector representation of the current character to obtain the prediction result of the current character.
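Putting these pieces together, greedy inference with the attention-based translation model could proceed roughly as in the sketch below, which reuses the hypothetical Encoder, AttentionLayer, and DecoderStep classes from the earlier sketches; the BOS/EOS ids and the maximum length are likewise assumptions.

    import torch

    def translate(encoder, attention, decoder_step, src_ids,
                  bos_id=1, eos_id=2, max_len=100):
        # Greedy decoding with encoder + text attention layer + text decoder.
        h, _ = encoder(src_ids)                       # per-character hidden vectors
        batch = src_ids.size(0)
        hidden = torch.zeros(batch, decoder_step.cell.hidden_size)
        prev = torch.full((batch,), bos_id, dtype=torch.long)
        result = []
        for _ in range(max_len):
            # Text background variable for the current step.
            c, _ = attention(h, hidden)
            logits, hidden = decoder_step(prev, c, hidden)
            prev = logits.argmax(dim=-1)              # current character prediction
            if (prev == eos_id).all():
                break
            result.append(prev)
        return torch.stack(result, dim=1) if result else prev.new_zeros(batch, 0)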
An information retrieval unit 803 configured to perform retrieval in a target language document or list using the acquired target language text to acquire a retrieval result.
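As a simple usage illustration of this unit, under the assumption that matching is a plain substring lookup (a real system might instead use an inverted index or fuzzy matching):

    def retrieve(target_text, documents):
        # Return the target-language entries that contain the translated text.
        return [doc for doc in documents if target_text in doc]

    # Example: look up a translated name in a target-language watch list.
    hits = retrieve("Zhang San", ["Zhang San - account 001", "Li Si - account 002"])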
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in fig. 1 or fig. 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor implementing the method of fig. 1 or fig. 4 when executing the executable code.
With the development of technology, computer-readable storage media are increasingly widely used, and the propagation path of computer programs is no longer limited to tangible media; programs may, for example, be downloaded directly from a network. Any combination of one or more computer-readable storage media may be employed. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this specification, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The processors described above may include one or more single-core or multi-core processors. The processor may comprise any combination of general-purpose processors and dedicated processors (e.g., image processors, application processors, baseband processors, etc.).
Computer program code for carrying out operations for aspects of the present description may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those skilled in the art will recognize that in one or more of the examples described above, the functions described in this specification can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
It is also to be understood that the terminology used in the embodiments of the specification is for the purpose of describing particular embodiments only, and is not intended to be limiting of the invention. As used in the specification examples and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to … …", depending on the context.
The foregoing further describes the embodiments, objects, technical solutions, and advantages of the present invention in detail. It should be understood that the above are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements, and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (12)

1. A method of building a translation model, comprising:
acquiring training data containing a plurality of training samples, wherein each training sample comprises a source language text, and the target language text and speech index text corresponding to the source language text;
training an auxiliary model comprising an encoder, a text decoder and a speech decoder using the training data; the source language text of the training sample is used as the input of the encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are as follows: minimizing the difference between the prediction result of the text decoder and the corresponding target language text in the training sample and minimizing the difference between the prediction result of the speech decoder and the corresponding speech index text in the training sample;
and obtaining the translation model by utilizing an encoder and a text decoder in the auxiliary model obtained by training.
2. The method of claim 1, wherein the encoder employs a bi-directional recurrent neural network structure, and the text decoder and speech decoder employ a recurrent neural network structure.
3. The method of claim 1, wherein the encoder outputting the feature representation of the source language text comprises:
the encoder encodes the source language text to obtain the hidden vector representation corresponding to each character;
and converting the hidden vector representation corresponding to each character into a background variable by using a preset conversion function.
4. The method of claim 3, wherein the text decoder predicting the target language text of the source language text using the feature representation comprises: the text decoder converts the prediction result of the previous character in the target language text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text; and maps the hidden vector representation of the current character to obtain the prediction result of the current character;
the speech decoder predicting the speech index text of the source language text using the feature representation comprises: the speech decoder converts the prediction result of the previous character in the speech index text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text; and maps the hidden vector representation of the current character to obtain the prediction result of the current character.
5. The method of claim 1, wherein the auxiliary model further comprises: a text attention layer and a speech attention layer;
the encoder encodes the source language text to obtain a hidden vector representation corresponding to each character;
the text attention layer applies an attention mechanism to the hidden vector representations to obtain the text background variable corresponding to each character in the target language text, so that the text decoder uses the text background variable corresponding to each character when predicting;
the speech attention layer applies an attention mechanism to the hidden vector representations to obtain the speech background variable corresponding to each character in the speech index text, so that the speech decoder uses the speech background variable corresponding to each character when predicting;
the translation model further includes the text attention layer.
6. The method of claim 5, wherein, when the text attention layer determines the text background variable corresponding to the j-th character of the target language text, the attention weight α_ji applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ji; said e_ji is obtained by transforming said h_i and the text decoder's hidden vector representation s_(j-1) for the (j-1)-th character of the target language text;
when the speech attention layer determines the speech background variable of the k-th character in the speech index text, the attention weight α_ki applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ki; said e_ki is obtained by transforming said h_i and the speech decoder's hidden vector representation u_(k-1) for the (k-1)-th character of the speech index text.
7. The method of claim 5, wherein the text decoder predicting the target language text of the source language text using the feature representation comprises: the text decoder converts the prediction result of the previous character in the target language text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text; and maps the hidden vector representation of the current character to obtain the prediction result of the current character;
the speech decoder predicting the speech index text of the source language text using the feature representation comprises: the speech decoder converts the prediction result of the previous character in the speech index text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text; and maps the hidden vector representation of the current character to obtain the prediction result of the current character.
8. A translation method, comprising:
obtaining a source language text;
the method comprises the steps of inputting the source language text into a translation model which is established in advance by adopting the method of any one of claims 1 to 7, and obtaining target language text which is output by the translation model aiming at the source language text.
9. The method of claim 8, further comprising:
searching in a target language document or list by using the acquired target language text to acquire a search result.
10. Apparatus for building a translation model, comprising:
a training sample obtaining unit configured to obtain training data containing a plurality of training samples, each training sample comprising a source language text, and the target language text and speech index text corresponding to the source language text;
an auxiliary model training unit configured to train an auxiliary model including an encoder, a text decoder, and a speech decoder using the training data; the source language text of the training sample is used as the input of the encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are as follows: minimizing the difference between the prediction result of the text decoder and the corresponding target language text in the training sample and minimizing the difference between the prediction result of the speech decoder and the corresponding speech index text in the training sample;
a translation model obtaining unit configured to obtain the translation model by using an encoder and a text decoder in the trained auxiliary model.
11. Translation apparatus comprising:
a source text acquisition unit configured to acquire a source language text;
a target text acquisition unit configured to input the source language text into a translation model and acquire the target language text output by the translation model for the source language text; wherein the translation model is pre-established by the apparatus of claim 10.
12. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-9.
CN202111330368.0A 2021-11-11 2021-11-11 Translation model establishing method, translation method and corresponding device Pending CN114118108A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111330368.0A CN114118108A (en) 2021-11-11 2021-11-11 Translation model establishing method, translation method and corresponding device

Publications (1)

Publication Number Publication Date
CN114118108A true CN114118108A (en) 2022-03-01

Family

ID=80378287



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination