CN114118108A - Translation model establishing method, translation method and corresponding device


Info

Publication number
CN114118108A
Authority
CN
China
Prior art keywords
text
character
decoder
language text
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111330368.0A
Other languages
Chinese (zh)
Inventor
陈珺
孙清清
郑行
王爱凌
赖伟达
邹泊滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111330368.0A priority Critical patent/CN114118108A/en
Publication of CN114118108A publication Critical patent/CN114118108A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/126: Character encoding
    • G06F40/20: Natural language analysis
    • G06F40/274: Converting codes to words; Guess-ahead of partial word inputs
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Abstract

The embodiments of this specification provide a method for establishing a translation model, a translation method, and corresponding apparatus. According to the embodiments, training data comprising a plurality of training samples is first obtained; an auxiliary model comprising an encoder, a text decoder and a speech decoder is then trained with the training data. The source language text of a training sample serves as the input to the encoder, which outputs a feature representation of the source language text; the text decoder predicts the target language text of the source language text from the feature representation, and the speech decoder predicts the speech index text of the source language text from the feature representation. The training targets of the auxiliary model are to minimize the difference between the text decoder's predictions and the corresponding target language texts in the training samples, and to minimize the difference between the speech decoder's predictions and the corresponding speech index texts in the training samples. Finally, the translation model is obtained from the encoder and text decoder of the trained auxiliary model.

Description

Translation model establishing method, translation method and corresponding device
Technical Field
One or more embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular, to a method for building a translation model, a translation method, and a corresponding apparatus.
Background
Translation here refers to transliteration, also known as transcription: given two languages that use different characters, text in one language is rendered in the other according to its pronunciation, so that the rendered text no longer carries semantics of its own but preserves the phonetic and written form. Translation is often used for proper nouns or technical terms such as names of people and places. For example, the Arabic form of a name (rendered as an image in the original document) is transliterated into English Latin letters as "Muhammad" and into Chinese as "Muhamoude".
Translation is now widely used in many application scenarios, so improving translation quality has become a pressing problem.
Disclosure of Invention
In view of the above, one or more embodiments of the present disclosure describe a method for building a translation model, a translation method, and a corresponding apparatus for improving translation quality.
According to a first aspect, there is provided a method of building a translation model, comprising:
acquiring training data containing a plurality of training samples, wherein each training sample comprises a source language text together with the target language text and speech index text corresponding to that source language text;
training an auxiliary model comprising an encoder, a text decoder and a speech decoder using the training data; the source language text of the training sample is used as the input of the encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are as follows: minimizing the difference between the prediction result of the text decoder and the corresponding target language text in the training sample and minimizing the difference between the prediction result of the speech decoder and the corresponding speech index text in the training sample;
and obtaining the translation model by utilizing an encoder and a text decoder in the auxiliary model obtained by training.
In one embodiment, the encoder employs a bi-directional recurrent neural network structure, and the text decoder and speech decoder employ a recurrent neural network structure.
In another embodiment, the encoder outputting the feature representation of the source language text comprises:
the encoder encodes the source language text to obtain the hidden vector representation corresponding to each character;
and converting the hidden vector representation corresponding to each character into a background variable by using a preset conversion function.
In one embodiment, the text decoder predicting the target language text of the source language text using the feature representation comprises: the text decoder transforms the prediction result for the previous character in the target language text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and maps the hidden vector representation of the current character to obtain the prediction result for the current character;

the speech decoder predicting the speech index text of the source language text using the feature representation comprises: the speech decoder transforms the prediction result for the previous character in the speech index text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text, and maps the hidden vector representation of the current character to obtain the prediction result for the current character.
In another embodiment, the auxiliary model further comprises a text attention layer and a speech attention layer;

the encoder encodes the source language text to obtain the hidden vector representation corresponding to each character;

the text attention layer applies attention-mechanism processing to the hidden vector representations to obtain the text background variable corresponding to each character in the target language text, so that the text decoder uses the text background variable corresponding to each character when making predictions;

the speech attention layer applies attention-mechanism processing to the hidden vector representations to obtain the speech background variable corresponding to each character in the speech index text, so that the speech decoder uses the speech background variable corresponding to each character when making predictions;

in this case, the translation model further includes the text attention layer.
In one embodiment, when the text attention layer determines the text background variable corresponding to the j-th character of the target language text, the attention weight α_ji applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ji; e_ji is obtained by transforming h_i together with the text decoder's hidden vector representation s_{j-1} for the (j-1)-th character of the target language text;

when the speech attention layer determines the speech background variable of the k-th character in the speech index text, the attention weight α_ki applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ki; e_ki is obtained by transforming h_i together with the speech decoder's hidden vector representation u_{k-1} for the (k-1)-th character of the speech index text.
In another embodiment, the text decoder predicting the target language text of the source language text using the feature representation comprises: the text decoder transforms the prediction result for the previous character in the target language text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and maps the hidden vector representation of the current character to obtain the prediction result for the current character;

the speech decoder predicting the speech index text of the source language text using the feature representation comprises: the speech decoder transforms the prediction result for the previous character in the speech index text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text, and maps the hidden vector representation of the current character to obtain the prediction result for the current character.
According to a second aspect, there is also provided a translation method comprising:
obtaining a source language text;
inputting the source language text into a translation model pre-established by the method described above, and obtaining the target language text output by the translation model for the source language text.
In one embodiment, the method further comprises:

retrieving from a target-language document or list using the obtained target language text to obtain a retrieval result.
According to a third aspect, there is provided an apparatus for building a translation model, comprising:
a training sample obtaining unit configured to obtain training data including a plurality of training samples, each training sample including a source language text together with the target language text and speech index text corresponding to that source language text;
an auxiliary model training unit configured to train an auxiliary model including an encoder, a text decoder, and a speech decoder using the training data; the source language text of the training sample is used as the input of the encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are as follows: minimizing the difference between the prediction result of the text decoder and the corresponding target language text in the training sample and minimizing the difference between the prediction result of the speech decoder and the corresponding speech index text in the training sample;
a translation model obtaining unit configured to obtain the translation model by using an encoder and a text decoder in the trained auxiliary model.
In one embodiment, the encoder employs a bi-directional recurrent neural network structure, and the text decoder and speech decoder employ a recurrent neural network structure.
In another embodiment, the encoder is specifically configured to encode the source language text to obtain the hidden vector representation corresponding to each character, and to convert the hidden vector representation corresponding to each character into a background variable using a preset transformation function.
In an embodiment, the text decoder is specifically configured to transform the prediction result for the previous character in the target language text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character;

the speech decoder is specifically configured to transform the prediction result for the previous character in the speech index text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character.
In another embodiment, the auxiliary model further comprises a text attention layer and a speech attention layer;

the encoder is specifically configured to encode the source language text to obtain the hidden vector representation corresponding to each character;

the text attention layer is configured to apply attention-mechanism processing to the hidden vector representations to obtain the text background variable corresponding to each character in the target language text, so that the text decoder uses the text background variable corresponding to each character when making predictions;

the speech attention layer is configured to apply attention-mechanism processing to the hidden vector representations to obtain the speech background variable corresponding to each character in the speech index text, so that the speech decoder uses the speech background variable corresponding to each character when making predictions;

in this case, the translation model further includes the text attention layer.
In one embodiment, when the text attention layer determines the text background variable corresponding to the j-th character of the target language text, the attention weight α_ji applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ji; e_ji is obtained by transforming h_i together with the text decoder's hidden vector representation s_{j-1} for the (j-1)-th character of the target language text;

when the speech attention layer determines the speech background variable of the k-th character in the speech index text, the attention weight α_ki applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ki; e_ki is obtained by transforming h_i together with the speech decoder's hidden vector representation u_{k-1} for the (k-1)-th character of the speech index text.
In another embodiment, the text decoder is specifically configured to transform the prediction result for the previous character in the target language text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character;

the speech decoder is specifically configured to transform the prediction result for the previous character in the speech index text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character.
According to a fourth aspect, there is also provided a translation apparatus comprising:
a source text acquisition unit configured to acquire a source language text;
a target text acquisition unit configured to input the source language text into a translation model and to acquire the target language text output by the translation model for the source language text, wherein the translation model is pre-established by the apparatus described above.
In one embodiment, the apparatus further comprises:

an information retrieval unit configured to retrieve from a target-language document or list using the acquired target language text, obtaining a retrieval result.
According to a fifth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
With the method and apparatus provided by the embodiments of this specification, the constructed translation model learns the mapping from source language text to target language text, while the accompanying speech decoding task helps the encoder learn a better acoustic representation of the source language text. This both speeds up training of the translation model and improves its translation quality.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 shows a flow diagram of a method of building a translation model according to one embodiment;
FIG. 2 is a schematic structural diagram of an auxiliary model provided in an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of another auxiliary model provided in the embodiments of the present disclosure;
FIG. 4 illustrates a flow diagram of a translation method according to one embodiment;
FIG. 5 is a schematic structural diagram of a translation model provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of another translation model provided by an embodiment of the present disclosure;
FIG. 7 shows a schematic block diagram of an apparatus to build a translation model according to one embodiment;
fig. 8 shows a schematic block diagram of a translation apparatus according to one embodiment.
Detailed Description
The traditional translation method is mainly rule-based: a preset mapping table maps letters or letter combinations of the source language text to letters or letter combinations of the target language. This character-level mapping, however, yields translation results of low quality. For example, in languages written with consonant-based alphabets, such as Arabic, the alphabet contains no vowels, only consonants, and reading the text requires supplying appropriate vowels from context. In this case a rule-based translation scheme can only generate isolated sequences of target-language consonant letters and cannot produce an ideal translation result.
In view of the above, the present specification provides a novel translation method, and the following detailed description is provided with embodiments.
FIG. 1 shows a flow diagram of a method of building a translation model, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, or device cluster having computing and processing capabilities.
As shown in fig. 1, the method includes:
step 101, obtaining training data including a plurality of training samples, where the training samples include source language texts and target language texts and voice index texts corresponding to the source language texts.
Step 103, training an auxiliary model comprising an encoder, a text decoder and a speech decoder by using training data; the source language text of the training sample is used as the input of an encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are: minimizing the difference between the predicted result of the text decoder and the corresponding target language text in the training samples and minimizing the difference between the predicted result of the speech decoder and the corresponding speech index text in the training samples.
Step 105, obtaining a translation model by utilizing the encoder and text decoder in the trained auxiliary model.
In the method shown in fig. 1, a source language text and a target language text are learned by constructing a translation model, and an encoder is assisted to learn acoustic representation of the source language text better by combining a speech decoding task, so that the translation quality of the translation model is improved while the training efficiency of the translation model is accelerated.
The manner in which the various steps shown in fig. 1 are performed is described below. First, the above step 101, i.e., "obtaining training data including a plurality of training samples", will be described in detail with reference to the embodiments.
In this step, parallel samples of source language text, target language text, and speech index text are obtained. That is, each training sample needs to contain a source language text and its corresponding target language text and speech index text; to ensure the quality of model training, the contents of each training sample should be accurate.
As one realizable approach, some typical source language texts may be selected, for example proper nouns such as common names of people and places, or technical terms; the target language texts are then labeled manually by experts, and the source language texts are encoded with a phonetic encoding algorithm to obtain the corresponding speech index texts. Algorithms such as Double Metaphone, Metaphone 3 or SOUNDEX may be employed; these are mature phonetic algorithms, not described in detail here, whose purpose is to assign the same index (or key) to words that sound the same or similar.
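For illustration, the sketch below builds one training triple using the Double Metaphone implementation from the third-party Metaphone package; the choice of library and the helper name are assumptions for illustration only, since this specification names only the algorithm family.

    # A minimal sketch, assuming the third-party "Metaphone" package
    # (pip install Metaphone) as the phonetic encoding algorithm.
    from metaphone import doublemetaphone

    def make_training_sample(source_text: str, target_text: str) -> dict:
        # The speech index text is derived by phonetically encoding the
        # source language text; a non-Latin source script would in practice
        # need a phonetic encoder suited to that script.
        primary_key, _secondary_key = doublemetaphone(source_text)
        return {
            "source": source_text,        # text to be transliterated
            "target": target_text,        # e.g. expert-labeled "Muhammad"
            "speech_index": primary_key,  # same key for same-sounding words
        }

    # Same-sounding spellings typically share one index (or key):
    print(doublemetaphone("Muhammad")[0], doublemetaphone("Mohamad")[0])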
However, manual labeling by experts is costly. Another realizable approach is therefore to construct the source language texts and target language texts from entries and attribute contents on encyclopedia pages. On some encyclopedia pages in low-resource languages, entries for proper nouns such as names of people and places carry the corresponding expressions in other languages in their attribute regions. For example, on an Arabic encyclopedia page, the entry for the Arabic form of the name (rendered as an image in the original document) lists the corresponding English expression "Muhammad" in its attribute region; on a Chinese encyclopedia page, the entry "Muhamoude" likewise lists the corresponding English expression "Muhammad". In this way, the correspondence between the Arabic form, "Muhammad" and "Muhamoude" can be captured from encyclopedia pages. When training the translation model, if the source and target languages are Arabic and English respectively, the Arabic form and "Muhammad" can serve as the source and target language texts of one training sample; if the source and target languages are Chinese and English respectively, "Muhamoude" and "Muhammad" can serve as the source and target language texts of one training sample. The corresponding speech index text can still be obtained by encoding the source language text with a phonetic encoding algorithm such as Double Metaphone, Metaphone 3 or SOUNDEX.
The above step 103 of training the auxiliary model including the encoder, the text decoder and the speech decoder using the training data will be described in detail with reference to the embodiments.
The purpose of this embodiment is to train the translation model, and the training of the translation model is realized by training the neural network model. However, in order to improve the training efficiency and quality, the speech decoding task is used to assist the training of the translation model in this embodiment, in which case the model including the speech decoder is referred to as an auxiliary model.
Fig. 2 is a schematic structural diagram of an auxiliary model provided in an embodiment of the present specification, where the auxiliary model mainly includes an encoder, a text decoder, and a speech decoder.
The input to the encoder is the source language text in the training sample. The source language text can be regarded as a character sequence, denoted X = (x_1, x_2, …, x_m), where x_i represents the i-th character in the source language text and m is the number of characters in the input sequence.
The encoder encodes the input sequence character by character, and the encoding results in a characteristic representation of the text in the source language.
Specifically, the encoder may encode the source language text to obtain a hidden vector representation corresponding to each character; and then converting the hidden vector representation corresponding to each character into a background variable by using a preset conversion function.
For each character, taking the i-th character as an example, the encoder transforms the input x_i and the hidden vector representation h_{i-1} of the previous, (i-1)-th, character into the hidden vector representation h_i of the i-th character:

h_i = f(x_i, h_{i-1})    (1)

where f() is the transformation function employed by the encoder in obtaining the hidden vector representation.

After the hidden vector representations of all characters are obtained, they are transformed into a background variable C:

C = q(h_1, h_2, …, h_m)    (2)

where q() is the transformation function employed by the encoder in deriving the background variable.
The encoder may adopt a conventional recurrent neural network structure such as an RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit). As a preferred embodiment, however, the encoder may employ a bidirectional recurrent neural network structure such as a BiLSTM. In that case the hidden vector representation h_i of each character depends on the subsequences before and after the character (as well as the character itself), so it encodes information from the whole sequence.
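For concreteness, the following is a minimal PyTorch sketch of such a bidirectional encoder; the framework and all layer sizes are illustrative assumptions, not prescribed by this specification.

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """Character-level bidirectional recurrent encoder (BiLSTM sketch)."""
        def __init__(self, vocab_size: int, emb_dim: int = 128, hidden_dim: int = 256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, m) character ids of the source language text.
            # Returns h_1..h_m, each encoding context from both directions:
            # shape (batch, m, 2 * hidden_dim).
            h, _ = self.rnn(self.embed(x))
            return h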
The text decoder predicts a target language text of the source language text using the feature representation output by the encoder.
As can be seen, the feature representation output by the encoder in this embodiment is the background variable C. The text decoder takes the background variable C as input and outputs a predicted character sequence, i.e. the target language text, denoted Ŷ = (ŷ_1, ŷ_2, …, ŷ_n), where ŷ_j represents the j-th character in the predicted target language text and n is the total number of characters in the target language text.
For each character, taking the j-th character as an example, the text decoder transforms the prediction result ŷ_{j-1} for the previous character, the background variable C, and the hidden vector representation s_{j-1} of the previous character into the hidden vector representation s_j of the j-th character:

s_j = g(ŷ_{j-1}, C, s_{j-1})    (3)

where g() is the transformation function employed by the text decoder in obtaining the hidden vector representation.

A Softmax operation then maps the hidden vector representation s_j of the j-th character to the prediction result ŷ_j for the j-th character; the mapping result is output in the form of conditional probabilities over the characters.
The text decoder may employ a recurrent neural network structure such as an RNN, LSTM or GRU in this embodiment.
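Continuing the PyTorch assumption, a hedged sketch of one decoding step follows; the speech decoder has the same structure, with the phonetic-symbol vocabulary in place of the target-character vocabulary.

    import torch
    import torch.nn as nn

    class TextDecoder(nn.Module):
        """One decoder step: s_j = g(y_prev, C, s_prev), then a Softmax
        over the target character vocabulary (equation (3) and after)."""
        def __init__(self, vocab_size: int, emb_dim: int = 128,
                     ctx_dim: int = 512, hidden_dim: int = 256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.cell = nn.GRUCell(emb_dim + ctx_dim, hidden_dim)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, y_prev, context, s_prev):
            # y_prev:  (batch,) ids of the previously predicted character
            # context: (batch, ctx_dim) background variable C
            # s_prev:  (batch, hidden_dim) hidden vector s_{j-1}
            s_j = self.cell(torch.cat([self.embed(y_prev), context], dim=-1),
                            s_prev)
            return torch.log_softmax(self.out(s_j), dim=-1), s_j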
The speech decoder predicts a speech index text of the source language text using the feature representation output by the encoder.
Likewise, the feature representation output by the encoder is the background variable C. The speech decoder takes the background variable C as input and outputs a predicted character sequence, i.e. the speech index text, denoted Ẑ = (ẑ_1, ẑ_2, …, ẑ_r), where ẑ_k represents the k-th character in the predicted speech index text and r is the total number of characters in the speech index text.
For each character, taking the k-th character as an example, the speech decoder transforms the prediction result ẑ_{k-1} for the previous character, the background variable C, and the hidden vector representation u_{k-1} of the previous character into the hidden vector representation u_k of the k-th character:

u_k = l(ẑ_{k-1}, C, u_{k-1})

where l() is the transformation function employed by the speech decoder in obtaining the hidden vector representation.

A Softmax operation then maps the hidden vector representation u_k of the k-th character to the prediction result ẑ_k for the k-th character; the mapping result is output in the form of conditional probabilities. It should be noted that the speech decoder maps into the speech coding space, which differs according to the specific phonetic encoding algorithm used; that algorithm should be consistent with the one used to produce the speech index texts in the training samples.
The speech decoder may employ a recurrent neural network structure such as an RNN, LSTM or GRU in this embodiment.
Suppose that in a training sample the source language text X = (x_1, x_2, …, x_m) has the corresponding target language text Y = (y_1, y_2, …, y_n) and speech index text Z = (z_1, z_2, …, z_r). During training, the goal is then to minimize the difference between Ŷ and Y and to minimize the difference between Ẑ and Z.
A loss function Loss may be designed in advance for this learning target, obtained from a first loss function and a second loss function, where the first loss function Loss_char is designed to minimize the difference between Ŷ and Y, and the second loss function Loss_phone is designed to minimize the difference between Ẑ and Z. For example:

Loss = Loss_char + λ·Loss_phone    (4)

where λ is a weighting coefficient.
That is, in each iteration the model parameters of the whole auxiliary model are updated according to the value of the loss function (which is determined by the outputs of the text decoder and the speech decoder) until a preset iteration stop condition is reached. The iteration stop condition may be, for example, that the value of the loss function falls to or below a preset threshold, or that the number of iterations reaches a preset count.
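A sketch of the joint objective in equation (4), under the same PyTorch assumption; padding handling is omitted and the value of λ is illustrative.

    import torch.nn.functional as F

    def auxiliary_loss(char_logits, char_targets, phone_logits, phone_targets,
                       lam=0.5):
        # char_logits:  (batch, n, |target vocab|); char_targets: (batch, n)
        # phone_logits: (batch, r, |phonetic vocab|); phone_targets: (batch, r)
        loss_char = F.cross_entropy(char_logits.flatten(0, 1),
                                    char_targets.flatten())
        loss_phone = F.cross_entropy(phone_logits.flatten(0, 1),
                                     phone_targets.flatten())
        return loss_char + lam * loss_phone  # equation (4)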
After the auxiliary model is trained, the actual translation model includes only the encoder and the text decoder; it need not include the speech decoder.
Fig. 3 is a schematic structural diagram of another auxiliary model provided in an embodiment of the present specification, where the auxiliary model includes a text attention layer and a speech attention layer in addition to an encoder, a text decoder, and a speech decoder.
Likewise, the input to the encoder is the source language text in the training sample. The source language text can be regarded as a character sequence, denoted X = (x_1, x_2, …, x_m), where x_i represents the i-th character in the source language text and m is the number of characters in the input sequence.
The encoder encodes the input sequence character by character, and the encoding results in a characteristic representation of the text in the source language.
Specifically, the encoder may encode the source language text to obtain a hidden vector representation corresponding to each character.
For each character, taking the i-th character as an example, the encoder transforms the input x_i and the hidden vector representation h_{i-1} of the previous, (i-1)-th, character into the hidden vector representation h_i of the i-th character, using equation (1) above.
The encoder may adopt a conventional recurrent neural network structure such as an RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit). As a preferred embodiment, however, the encoder may employ a bidirectional recurrent neural network structure such as a BiLSTM. In that case the hidden vector representation h_i of each character depends on the subsequences before and after the character (as well as the character itself), so it encodes information from the whole sequence.
Unlike in the structure shown in fig. 2, the generation of the background variable is performed by the text attention layer and the speech attention layer, respectively.
The text attention layer applies attention-mechanism processing to the hidden vector representations to obtain the text background variable C_char. The text background variable is time-varying: for each time step (i.e. for each character of the target language text), the following formula may be employed:

c_j^char = Σ_{i=1}^{m} α_{ji}·h_i    (5)

where c_j^char denotes the text background variable corresponding to the j-th character of the target language text, and α_{ji} is a weight whose values, for a given j, form a probability distribution. α_{ji} may be obtained by normalizing the attention score e_{ji} with a softmax operation.

e_{ji} can be obtained by the text attention layer by transforming the hidden vector representation h_i and the text decoder's hidden vector representation s_{j-1} for the (j-1)-th character:

e_{ji} = a(s_{j-1}, h_i)    (6)

where a() is the transformation function used to obtain e_{ji}. It can, for example, be realized with a multi-layer perceptron:

a(s_{j-1}, h_i) = V·tanh(W_s·s_{j-1} + W_h·h_i)    (7)

where V, W_s and W_h are all model parameters.
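Equation (7) is additive attention; below is a hedged PyTorch sketch of equations (5) through (7), with all dimensions assumed for illustration.

    import torch
    import torch.nn as nn

    class AdditiveAttention(nn.Module):
        """e_ji = V tanh(W_s s_{j-1} + W_h h_i); alpha = softmax(e);
        context c_j = sum_i alpha_ji * h_i (equations (5)-(7))."""
        def __init__(self, dec_dim: int, enc_dim: int, attn_dim: int = 128):
            super().__init__()
            self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)
            self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)
            self.v = nn.Linear(attn_dim, 1, bias=False)

        def forward(self, s_prev, h):
            # s_prev: (batch, dec_dim); h: (batch, m, enc_dim)
            e = self.v(torch.tanh(self.W_s(s_prev).unsqueeze(1) + self.W_h(h)))
            alpha = torch.softmax(e, dim=1)   # weights over source characters
            return (alpha * h).sum(dim=1)     # background variable c_j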
The speech attention layer applies attention-mechanism processing to the hidden vector representations to obtain the speech background variable C_phone. The speech background variable is likewise time-varying: for each time step (i.e. for each character of the speech index text), the following formula may be employed:

c_k^phone = Σ_{i=1}^{m} α_{ki}·h_i    (8)

where c_k^phone denotes the speech background variable corresponding to the k-th character of the speech index text, and α_{ki} is a weight whose values, for a given k, form a probability distribution. α_{ki} may be obtained by normalizing the attention score e_{ki} with a softmax operation.

e_{ki} can be obtained by transforming the hidden vector representation h_i and the speech decoder's hidden vector representation u_{k-1} for the (k-1)-th character:

e_{ki} = a'(u_{k-1}, h_i)    (9)

where a'() is the transformation function used by the speech attention layer to obtain e_{ki}. It can, for example, be realized with a multi-layer perceptron:

a'(u_{k-1}, h_i) = V'·tanh(W_u·u_{k-1} + W_h'·h_i)    (10)

where V', W_u and W_h' are all model parameters.
The text decoder then predicts the target language text of the source language text using the text background variables C_char output by the text attention layer. As can be seen, the features utilized by the text decoder in this embodiment are the text background variables. The text decoder takes them as input and outputs a predicted character sequence, i.e. the target language text, denoted Ŷ = (ŷ_1, ŷ_2, …, ŷ_n), where ŷ_j represents the j-th character in the predicted target language text and n is the total number of characters in the target language text.
For each character, taking the j-th character as an example, the text decoder transforms the prediction result ŷ_{j-1} for the previous character, the background variable c_j^char of the current character, and the hidden vector representation s_{j-1} of the previous character into the hidden vector representation s_j of the j-th character:

s_j = g(ŷ_{j-1}, c_j^char, s_{j-1})    (11)

where g() is the transformation function employed by the text decoder in obtaining the hidden vector representation.

A Softmax operation then maps the hidden vector representation s_j of the j-th character to the prediction result ŷ_j for the j-th character; the mapping result is output in the form of conditional probabilities over the characters.
The text decoder may employ a recurrent neural network structure such as an RNN, LSTM or GRU in this embodiment.
The speech decoder predicts the speech index text of the source language text using the speech background variable C_phone. As can be seen, the features utilized by the speech decoder in this embodiment are the speech background variables. The speech decoder takes them as input and outputs a predicted character sequence, i.e. the speech index text, denoted Ẑ = (ẑ_1, ẑ_2, …, ẑ_r), where ẑ_k represents the k-th character in the predicted speech index text and r is the total number of characters in the speech index text.
For each character, taking the k-th character as an example, the speech decoder transforms the prediction result ẑ_{k-1} for the previous character, the speech background variable c_k^phone of the current character, and the hidden vector representation u_{k-1} of the previous character into the hidden vector representation u_k of the k-th character:

u_k = l(ẑ_{k-1}, c_k^phone, u_{k-1})

where l() is the transformation function employed by the speech decoder in obtaining the hidden vector representation.

A Softmax operation then maps the hidden vector representation u_k of the k-th character to the prediction result ẑ_k for the k-th character; the mapping result is output in the form of conditional probabilities. As before, the speech decoder maps into the speech coding space, which differs according to the specific phonetic encoding algorithm used; that algorithm should be consistent with the one used to produce the speech index texts in the training samples.
The speech decoder may employ a recurrent neural network structure such as an RNN, LSTM or GRU in this embodiment.
Suppose that in a training sample the source language text X = (x_1, x_2, …, x_m) has the corresponding target language text Y = (y_1, y_2, …, y_n) and speech index text Z = (z_1, z_2, …, z_r). During training, the goal is then to minimize the difference between Ŷ and Y and to minimize the difference between Ẑ and Z.
A loss function Loss may be designed in advance for this learning target, obtained from a first loss function and a second loss function, where the first loss function Loss_char is designed to minimize the difference between Ŷ and Y, and the second loss function Loss_phone is designed to minimize the difference between Ẑ and Z, for example as shown in equation (4) above.
That is, in each iteration the model parameters of the whole auxiliary model are updated according to the value of the loss function (which is determined by the outputs of the text decoder and the speech decoder) until a preset iteration stop condition is reached. The iteration stop condition may be, for example, that the value of the loss function falls to or below a preset threshold, or that the number of iterations reaches a preset count.
After the auxiliary model is trained, the actual translation model includes only the encoder, the text attention layer and the text decoder; it need not include the speech attention layer or the speech decoder.
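Extracting the translation model then amounts to reusing the trained submodules. A sketch under the assumption that the auxiliary model exposes its components as attributes (the attribute names are hypothetical):

    import torch.nn as nn

    class TranslationModel(nn.Module):
        """Keeps only the translation path of a trained auxiliary model."""
        def __init__(self, aux_model: nn.Module):
            super().__init__()
            self.encoder = aux_model.encoder                # jointly trained
            self.text_attention = aux_model.text_attention  # omit for the Fig. 2 variant
            self.text_decoder = aux_model.text_decoder
            # aux_model.speech_attention and aux_model.speech_decoder are dropped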
After the translation model is obtained, the translation process can be performed by using the translation model. Fig. 4 shows a flow diagram of a translation method according to an embodiment, it being understood that the method may be performed by any apparatus, device, platform, cluster of devices having computing, processing capabilities. As shown in fig. 4, the method may include the steps of:
step 401: and obtaining source language text.
The source language text involved in this step is the text to be translated. The text usually includes proper nouns such as names of people and places, technical nouns, and the like.
Step 403: input the source language text into the translation model, and obtain the target language text output by the translation model for the source language text.
This embodiment needs to ensure that the source language and target language of the selected translation model match the requirements of the translation task.
Fig. 5 is a schematic structural diagram of a translation model provided in an embodiment of the present specification, and as shown in fig. 5, the translation model includes an encoder and a text decoder.
And the encoder encodes the source language text to obtain the characteristic representation of the source language text.
Specifically, the encoder may encode the source language text to obtain a hidden vector representation corresponding to each character; and then converting the hidden vector representation corresponding to each character into a background variable by using a preset conversion function.
If the input source language text is X = (x_1, x_2, …, x_m), then for each character, taking the i-th character as an example, the encoder transforms the input x_i and the hidden vector representation h_{i-1} of the previous, (i-1)-th, character into the hidden vector representation h_i of the i-th character, for example using equation (1). The representations are then transformed into the background variable C using equation (2).
The text decoder predicts a target language text of the source language text using the feature representation output by the encoder.
As before, the feature representation output by the encoder is the background variable C. The text decoder takes the background variable C as input and outputs the predicted character sequence, i.e. the target language text Ŷ. For each character, taking the j-th character as an example, the text decoder transforms the prediction result ŷ_{j-1} for the previous character, the background variable C, and the hidden vector representation s_{j-1} of the previous character into the hidden vector representation s_j of the j-th character, for example using equation (3). A Softmax operation then maps s_j to the prediction result ŷ_j for the j-th character; the mapping result is output in the form of conditional probabilities over the characters.
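Putting the trained pieces together at inference time, here is a greedy-decoding sketch; the search strategy is not specified in this embodiment, and using the mean of the hidden vectors as the transformation q() of equation (2) is an assumption for illustration.

    import torch

    def transliterate(encoder, text_decoder, src_ids, bos_id, eos_id,
                      max_len=64):
        # src_ids: (1, m) character ids of one source language text.
        h = encoder(src_ids)              # hidden vectors h_1..h_m
        context = h.mean(dim=1)           # stand-in for C = q(h_1, ..., h_m)
        s = torch.zeros(1, text_decoder.cell.hidden_size)
        y = torch.tensor([bos_id])
        out = []
        for _ in range(max_len):
            log_probs, s = text_decoder(y, context, s)  # eq. (3) + Softmax
            y = log_probs.argmax(dim=-1)                # greedy choice
            if y.item() == eos_id:
                break
            out.append(y.item())
        return out                        # predicted target character ids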
Fig. 6 is a schematic structural diagram of a translation model provided in an embodiment of the present specification, and as shown in fig. 6, the translation model includes an encoder, a text attention layer, and a text decoder.
And the encoder encodes the source language text to obtain the hidden vector representation corresponding to each character.
If the input source language text is X = (x_1, x_2, …, x_m), then for each character, taking the i-th character as an example, the encoder transforms the input x_i and the hidden vector representation h_{i-1} of the previous, (i-1)-th, character into the hidden vector representation h_i of the i-th character, for example using equation (1).

The text attention layer applies attention-mechanism processing to the hidden vector representations to obtain the text background variables C_char. These are time-varying: for each time step (i.e. for each character of the target language text), the text background variable c_j^char corresponding to the j-th character can be determined using equation (5).

The text decoder then predicts the target language text using the text background variables output by the text attention layer. It takes them as input and outputs the predicted character sequence, i.e. the target language text Ŷ. For each character, taking the j-th character as an example, the text decoder transforms the prediction result ŷ_{j-1} for the previous character, the background variable c_j^char of the current character, and the hidden vector representation s_{j-1} of the previous character into the hidden vector representation s_j of the j-th character, for example using equation (11). A Softmax operation then maps s_j to the prediction result ŷ_j; the mapping result is output in the form of conditional probabilities over the characters.
The translation method provided by the above method embodiment can be applied to various application scenarios. As a typical application scenario, the obtained target language text may be used to perform a search in a target language document or list to obtain a search result.
In one example, in the information-defense domain, certain names of people, organizations and the like are placed on a blacklist. The names on the blacklist are in languages such as English, yet they need to be recognized in texts written in low-resource languages. With the translation method of the above embodiments, the low-resource-language text can be input into the corresponding translation model as the source language text to obtain the English translation result. The English translation result is then matched against the blacklist, i.e. retrieved in the blacklist; if a match is found, the text contains a blacklisted name and can be subjected to further information-defense processing.
As another example, in some technical fields there is a need to find low-resource-language articles associated with a particular technical name when only the English text of that name is known. The English text of the technical name can be input into the corresponding translation model to obtain the translation result in the target language, and that result can then be searched for in a large collection of articles to find the articles containing the technical name.
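As a sketch of the blacklist scenario above (all names are hypothetical; transliterate_fn stands for the trained translation model wrapped as a function):

    from typing import Callable, Set

    def screen_text(src_name: str,
                    transliterate_fn: Callable[[str], str],
                    blacklist: Set[str]) -> bool:
        # True when the English transliteration of a source-language name
        # matches a blacklisted entry, i.e. retrieval in a list.
        return transliterate_fn(src_name) in blacklist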
The method can likewise be applied to other scenarios, which are not exhaustively enumerated here.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
According to an embodiment of another aspect, an apparatus for building a translation model is provided. FIG. 7 illustrates a schematic block diagram of an apparatus to build a translation model, according to one embodiment. It is to be appreciated that the apparatus can be implemented by any apparatus, device, platform, and cluster of devices having computing and processing capabilities. As shown in fig. 7, the apparatus 700 includes: a training sample acquisition unit 701, an auxiliary model training unit 702, and a translation model acquisition unit 703. The main functions of each component unit are as follows:
a training sample obtaining unit 701 configured to obtain training data including a plurality of training samples, where a training sample includes a source language text and a target language text and a speech index text corresponding to the source language text.
As one of the realizable manners, some typical source language texts may be selected, for example, proper nouns such as typical names of people and places, or some technical terms, and then the target language texts are manually labeled by experts, and the source language texts are encoded by using a speech decoding algorithm to obtain corresponding speech index texts.
As another realizable mode, a source language text and a target language text can be constructed through entries and attribute contents on an encyclopedic page, and a voice decoding algorithm is adopted to encode the source language text to obtain a corresponding voice index text.
An auxiliary model training unit 702 configured to train an auxiliary model including an encoder, a text decoder, and a speech decoder using training data; the source language text of the training sample is used as the input of an encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are: minimizing the difference between the predicted result of the text decoder and the corresponding target language text in the training samples and minimizing the difference between the predicted result of the speech decoder and the corresponding speech index text in the training samples.
As a preferred embodiment, the encoder may employ a bidirectional recurrent neural network structure, and the text decoder and the speech decoder may employ a recurrent neural network structure.
In a realizable manner, the auxiliary model mainly comprises an encoder, a text decoder and a speech decoder, as shown in fig. 2.
The encoder is used for encoding the source language text to obtain the hidden vector representation corresponding to each character; and converting the hidden vector representation corresponding to each character into a background variable by using a preset conversion function.
The text decoder is used to transform the prediction result for the previous character in the target language text, the background variable, and the hidden vector representation of the previous character into the hidden vector representation of the current character in the target language text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character.

The speech decoder is used to transform the prediction result for the previous character in the speech index text, the background variable, and the hidden vector representation of the previous character into the hidden vector representation of the current character in the speech index text, and to map the hidden vector representation of the current character to obtain the prediction result for the current character.
As another realizable approach, as shown in fig. 3, the auxiliary model may further include a text attention layer and a speech attention layer.

The encoder is used to encode the source language text to obtain the hidden vector representation corresponding to each character.

The text attention layer is used to apply attention-mechanism processing to the hidden vector representations to obtain the text background variable corresponding to each character in the target language text, so that the text decoder can use the text background variable corresponding to each character when making predictions.

The speech attention layer is used to apply attention-mechanism processing to the hidden vector representations to obtain the speech background variable corresponding to each character in the speech index text, so that the speech decoder can use the speech background variable corresponding to each character when making predictions.

The translation model in this case further includes the text attention layer.
More specifically, the text attention layer is determining the target languageWhen the text background variable corresponding to the jth character of the language text represents h for the hidden vector corresponding to the ith character of the source language textiAttention weight α adoptedjiIs to the attention score ejiIs obtained by normalization; e.g. of the typejiFrom hiAnd the text encoder's hidden vector representation s for the j-1 th character of the target language textj-1And (5) carrying out transformation to obtain the product.
When the speech attention layer determines the speech background variable of the k-th character in the speech index text, the attention weight α_ki applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ki; e_ki is obtained by transforming h_i and the speech decoder's hidden vector representation u_(k-1) for the (k-1)-th character of the speech index text.
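As an illustration of this attention computation, a minimal PyTorch sketch follows; the additive form of the transformation that produces the attention score, and all layer names and dimensions, are assumptions rather than details taken from this specification.

    import torch
    import torch.nn as nn

    class AttentionLayer(nn.Module):
        # Computes a per-step background variable from the encoder states.
        def __init__(self, enc_dim, dec_dim, att_dim=256):
            super().__init__()
            self.W = nn.Linear(enc_dim, att_dim, bias=False)
            self.U = nn.Linear(dec_dim, att_dim, bias=False)
            self.v = nn.Linear(att_dim, 1, bias=False)

        def forward(self, h, s_prev):
            # e_ji: score of encoder state h_i against previous decoder state s_(j-1).
            e = self.v(torch.tanh(self.W(h) + self.U(s_prev).unsqueeze(1)))
            # alpha_ji: normalization of the scores over the source characters.
            alpha = torch.softmax(e, dim=1)            # (batch, src_len, 1)
            # Background variable for the current step: weighted sum of the h_i.
            c = (alpha * h).sum(dim=1)                 # (batch, enc_dim)
            return c, alpha.squeeze(-1)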
The text decoder is specifically used for converting the prediction result of the previous character in the target language text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and for mapping the hidden vector representation of the current character to obtain the prediction result of the current character.
The speech decoder is specifically used for converting the prediction result of the previous character in the speech index text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text, and for mapping the hidden vector representation of the current character to obtain the prediction result of the current character.
A translation model obtaining unit 703 configured to obtain a translation model by using an encoder and a text decoder in the trained auxiliary model.
According to an embodiment of another aspect, a translation apparatus is provided. Fig. 8 shows a schematic block diagram of a translation apparatus according to one embodiment. The apparatus can be implemented by any device, platform, or device cluster having computing and processing capabilities. As shown in fig. 8, the apparatus 800 includes a source text acquisition unit 801 and a target text acquisition unit 802, and may further include an information retrieval unit 803. The main functions of each component unit are as follows:
a source text acquisition unit 801 configured to acquire a source language text.
A target text obtaining unit 802 configured to input the source language text into the translation model, and obtain the target language text output by the translation model for the source language text.
In one implementation, the translation model mainly includes an encoder and a text decoder, as shown in fig. 5.
The encoder encodes the source language text to obtain the feature representation of the source language text, and the text decoder predicts the target language text of the source language text using the feature representation output by the encoder.
Specifically, the encoder may encode the source language text to obtain a hidden vector representation corresponding to each character, and then convert the hidden vector representations corresponding to the characters into a background variable using a preset conversion function.
Specifically, the text decoder converts the prediction result of the previous character in the target language text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character, and then maps the hidden vector representation of the current character through a Softmax operation to obtain the prediction result of the current character. The mapping result is output as a conditional probability distribution over the characters.
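As a small illustration of this final mapping step, assuming a linear projection followed by Softmax (the dimensions and tensor values below are hypothetical):

    import torch
    import torch.nn as nn

    hid_dim, vocab_size = 512, 6000          # hypothetical sizes
    out = nn.Linear(hid_dim, vocab_size)     # maps hidden vector to vocabulary scores
    hidden = torch.randn(1, hid_dim)         # current character's hidden vector (dummy)

    # Softmax turns the mapped scores into conditional probabilities over characters.
    probs = torch.softmax(out(hidden), dim=-1)
    pred = probs.argmax(dim=-1)              # greedy prediction of the current character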
As another implementable approach, the translation model includes an encoder, a text attention layer, and a text decoder, as shown in fig. 6.
The encoder encodes the source language text to obtain the hidden vector representation corresponding to each character.
The text attention layer applies an attention mechanism to the hidden vector representations to obtain the text background variables.
The text decoder then predicts a target language text of the source language text using the text background variables output by the text attention layer.
When the text attention layer determines the text background variable corresponding to the j-th character of the target language text, the attention weight α_ji applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ji; e_ji is obtained by transforming h_i and the text decoder's hidden vector representation s_(j-1) for the (j-1)-th character of the target language text.
The text decoder is used for converting the prediction result of the previous character in the target language text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text, and for mapping the hidden vector representation of the current character to obtain the prediction result of the current character.
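Putting these pieces together, greedy inference with the attention-based translation model could proceed roughly as in the sketch below, which reuses the hypothetical Encoder, AttentionLayer, and DecoderStep classes from the earlier sketches; the BOS/EOS ids and the maximum length are likewise assumptions.

    import torch

    def translate(encoder, attention, decoder_step, src_ids,
                  bos_id=1, eos_id=2, max_len=100):
        # Greedy decoding with encoder + text attention layer + text decoder.
        h, _ = encoder(src_ids)                       # per-character hidden vectors
        batch = src_ids.size(0)
        hidden = torch.zeros(batch, decoder_step.cell.hidden_size)
        prev = torch.full((batch,), bos_id, dtype=torch.long)
        result = []
        for _ in range(max_len):
            # Text background variable for the current step.
            c, _ = attention(h, hidden)
            logits, hidden = decoder_step(prev, c, hidden)
            prev = logits.argmax(dim=-1)              # current character prediction
            if (prev == eos_id).all():
                break
            result.append(prev)
        return torch.stack(result, dim=1) if result else prev.new_zeros(batch, 0)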
An information retrieval unit 803 configured to perform retrieval in a target language document or list using the acquired target language text to acquire a retrieval result.
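As a simple usage illustration of this unit, under the assumption that matching is a plain substring lookup (a real system might instead use an inverted index or fuzzy matching):

    def retrieve(target_text, documents):
        # Return the target-language entries that contain the translated text.
        return [doc for doc in documents if target_text in doc]

    # Example: look up a translated name in a target-language watch list.
    hits = retrieve("Zhang San", ["Zhang San - account 001", "Li Si - account 002"])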
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in fig. 1 or fig. 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor implementing the method of fig. 1 or fig. 4 when executing the executable code.
With the development of technology, computer-readable storage media are increasingly widely used, and the propagation path of computer programs is no longer limited to tangible media; programs may, for example, be downloaded directly from a network. Any combination of one or more computer-readable storage media may be employed. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this specification, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The processors described above may include one or more single-core or multi-core processors. The processor may comprise any combination of general-purpose processors and dedicated processors (e.g., image processors, application processors, baseband processors, etc.).
Computer program code for carrying out operations for aspects of the present description may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those skilled in the art will recognize that in one or more of the examples described above, the functions described in this specification can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
It is also to be understood that the terminology used in the embodiments of the specification is for the purpose of describing particular embodiments only, and is not intended to be limiting of the invention. As used in the specification examples and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to … …", depending on the context.
The foregoing further describes the embodiments, objects, technical solutions, and advantages of the present invention in detail. It should be understood that the above are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements, and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (12)

1. A method of building a translation model, comprising:
acquiring training data containing a plurality of training samples, wherein each training sample comprises a source language text, and the target language text and speech index text corresponding to the source language text;
training an auxiliary model comprising an encoder, a text decoder and a speech decoder using the training data; the source language text of the training sample is used as the input of the encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are as follows: minimizing the difference between the prediction result of the text decoder and the corresponding target language text in the training sample and minimizing the difference between the prediction result of the speech decoder and the corresponding speech index text in the training sample;
and obtaining the translation model by utilizing an encoder and a text decoder in the auxiliary model obtained by training.
2. The method of claim 1, wherein the encoder employs a bi-directional recurrent neural network structure, and the text decoder and speech decoder employ a recurrent neural network structure.
3. The method of claim 1, wherein the encoder outputting the feature representation of the source language text comprises:
the encoder encodes the source language text to obtain the hidden vector representation corresponding to each character;
and converting the hidden vector representation corresponding to each character into a background variable by using a preset conversion function.
4. The method of claim 3, wherein the text decoder predicting the target language text of the source language text using the feature representation comprises: the text decoder converts the prediction result of the previous character in the target language text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text; and maps the hidden vector representation of the current character to obtain the prediction result of the current character;
the speech decoder predicting the speech index text of the source language text using the feature representation comprises: the speech decoder converts the prediction result of the previous character in the speech index text, the background variable, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text; and maps the hidden vector representation of the current character to obtain the prediction result of the current character.
5. The method of claim 1, wherein the auxiliary model further comprises: a text attention layer and a speech attention layer;
the encoder encodes the source language text to obtain a hidden vector representation corresponding to each character;
the text attention layer applies an attention mechanism to the hidden vector representations to obtain the text background variable corresponding to each character in the target language text, so that the text decoder uses the text background variable corresponding to each character when predicting;
the speech attention layer applies an attention mechanism to the hidden vector representations to obtain the speech background variable corresponding to each character in the speech index text, so that the speech decoder uses the speech background variable corresponding to each character when predicting;
the translation model further includes the text attention layer.
6. The method of claim 5, wherein, when the text attention layer determines the text background variable corresponding to the j-th character of the target language text, the attention weight α_ji applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ji; said e_ji is obtained by transforming said h_i and the text decoder's hidden vector representation s_(j-1) for the (j-1)-th character of the target language text;
when the speech attention layer determines the speech background variable of the k-th character in the speech index text, the attention weight α_ki applied to the hidden vector representation h_i corresponding to the i-th character of the source language text is obtained by normalizing the attention score e_ki; said e_ki is obtained by transforming said h_i and the speech decoder's hidden vector representation u_(k-1) for the (k-1)-th character of the speech index text.
7. The method of claim 5, wherein the text decoder predicting the target language text of the source language text using the feature representation comprises: the text decoder converts the prediction result of the previous character in the target language text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the target language text; and maps the hidden vector representation of the current character to obtain the prediction result of the current character;
the speech decoder predicting the speech index text of the source language text using the feature representation comprises: the speech decoder converts the prediction result of the previous character in the speech index text, the background variable of the current character, and the hidden vector representation of the previous character to obtain the hidden vector representation of the current character in the speech index text; and maps the hidden vector representation of the current character to obtain the prediction result of the current character.
8. A translation method, comprising:
obtaining a source language text;
the method comprises the steps of inputting the source language text into a translation model which is established in advance by adopting the method of any one of claims 1 to 7, and obtaining target language text which is output by the translation model aiming at the source language text.
9. The method of claim 8, further comprising:
searching in a target language document or list by using the acquired target language text to acquire a search result.
10. Apparatus for building a translation model, comprising:
a training sample obtaining unit configured to obtain training data containing a plurality of training samples, each training sample comprising a source language text, and the target language text and speech index text corresponding to the source language text;
an auxiliary model training unit configured to train an auxiliary model including an encoder, a text decoder, and a speech decoder using the training data; the source language text of the training sample is used as the input of the encoder, and the encoder outputs the characteristic representation of the source language text; the text decoder predicts a target language text of the source language text using the feature representation; the speech decoder predicts a speech index text of the source language text using the feature representation; the training targets of the auxiliary model are as follows: minimizing the difference between the prediction result of the text decoder and the corresponding target language text in the training sample and minimizing the difference between the prediction result of the speech decoder and the corresponding speech index text in the training sample;
a translation model obtaining unit configured to obtain the translation model by using an encoder and a text decoder in the trained auxiliary model.
11. Translation apparatus comprising:
a source text acquisition unit configured to acquire a source language text;
a target text acquisition unit configured to input the source language text into a translation model and acquire the target language text output by the translation model for the source language text; wherein the translation model is pre-established by the apparatus of claim 10.
12. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-9.
CN202111330368.0A 2021-11-11 2021-11-11 Translation model establishing method, translation method and corresponding device Pending CN114118108A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111330368.0A CN114118108A (en) 2021-11-11 2021-11-11 Translation model establishing method, translation method and corresponding device

Publications (1)

Publication Number Publication Date
CN114118108A true CN114118108A (en) 2022-03-01

Family

ID=80378287



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination