CN111428518A - Low-frequency word translation method and device

Publication number
CN111428518A
Authority
CN
China
Prior art keywords
low-frequency word, result, frequency, word
Prior art date
Legal status
Granted
Application number
CN201910020175.1A
Other languages
Chinese (zh)
Other versions
CN111428518B (en)
Inventor
张学强 (Zhang Xueqiang)
刘俊华 (Liu Junhua)
魏思 (Wei Si)
王智国 (Wang Zhiguo)
胡国平 (Hu Guoping)
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201910020175.1A
Publication of CN111428518A
Application granted
Publication of CN111428518B
Legal status: Active
Anticipated expiration

Abstract

The application discloses a low-frequency word translation method and device. In the method, after a source text to be translated is obtained, a combined representation result corresponding to each low-frequency word in the source text can be generated; for each low-frequency word, the combined representation result includes a vector representation result of the low-frequency word and/or a vector representation result of a translation of the low-frequency word, together with the vector representation result of the wildcard that replaces the low-frequency word. The source text is then translated according to the combined representation results corresponding to the low-frequency words, so as to obtain a target text. Thus, when the source text is translated, not only is the vector representation result of the wildcard that replaces each low-frequency word considered, but also the vector representation result of the low-frequency word and/or of its translation, so that the integrity of the semantic information of the source text is improved and the fluency of the translation result is improved.

Description

Low-frequency word translation method and device
Technical Field
The application relates to the technical field of machine translation, in particular to a low-frequency word translation method and device.
Background
With the continuous development of science and technology, machine translation has become an important research subject for solving the problem of communication across different language communities, and the translation quality of low-frequency words directly influences whether machine translation technology and applications can successfully be put to practical use and industrialized. Low-frequency words are a class of words that appear sparsely, or never, in a large-scale bilingual parallel corpus; in natural language processing, according to their degree of frequency, they are often called unknown words or out-of-vocabulary (OOV) words. Because low-frequency words are characterized by frequency sparsity, single fixed translations and the like, their translation has always been a key point and a difficulty in machine translation research.
In the existing low-frequency word translation method, low-frequency words in a source text are converted into wildcards, then the source text converted into the wildcards is translated to obtain a target text, and finally the wildcards in the target text are replaced by the original corresponding low-frequency words to form a final complete translation. However, although the translation method enables the low-frequency words to be translated, directly converting the low-frequency words into wildcards may cause incomplete semantic information of the source text, and further cause insufficient smoothness of the target text obtained after translation, that is, there is a problem of reduced fluency of the translated text.
Disclosure of Invention
The embodiment of the application mainly aims to provide a low-frequency word translation method and device, which can improve fluency of translation results when a text to which a low-frequency word belongs is translated.
The embodiment of the application provides a low-frequency word translation method, which comprises the following steps:
generating a combined representation result corresponding to each low-frequency word in the source text; the combined representation result comprises a vector representation result corresponding to the low-frequency word and/or a vector representation result corresponding to a translation of the low-frequency word, and a vector representation result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and translating the source text according to the combined representation result corresponding to each low-frequency word to obtain a target text.
Optionally, the translating the source text according to the combined representation result corresponding to each low-frequency word includes:
for each low-frequency word, carrying out vector fusion on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a final representation result of the low-frequency word;
and translating the source text according to the respective final characterization result of each low-frequency word.
Optionally, the performing vector fusion on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a final representation result of the low-frequency word includes:
performing weighted calculation on each vector representation result in the combined representation results corresponding to the low-frequency words to obtain weighted calculation results;
and carrying out nonlinear transformation on the weighting calculation result to obtain a final characterization result of the low-frequency word.
Optionally, the vector characterization result of the corresponding low-frequency word is generated as follows:
and generating a vector representation result of the corresponding low-frequency word by using the vector representation result of each subword of the corresponding low-frequency word.
Optionally, the generating a vector representation result of a corresponding low-frequency word by using a vector representation result of each subword of the corresponding low-frequency word includes:
reversely scanning each subword corresponding to the low-frequency word by utilizing a neural network to obtain a vector representation result of a first subword in each subword;
and taking the vector representation result of the first sub-word as the vector representation result of the corresponding low-frequency word.
Optionally, the vector characterization result of the translation of the corresponding low-frequency word is generated according to the following method:
and generating a vector representation result of the translation of the corresponding low-frequency word by using the vector representation result of each sub-word of the translation of the corresponding low-frequency word.
Optionally, the generating a vector representation result of the translation of the corresponding low-frequency word by using the vector representation result of each subword of the translation of the corresponding low-frequency word includes:
reversely scanning each sub-word of the translation of the corresponding low-frequency word by using a neural network to obtain a vector representation result of a first sub-word in each sub-word;
and taking the vector representation result of the first sub-word as the vector representation result of the translation corresponding to the low-frequency word.
Optionally, the vector representation result of the wildcard carries context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
The embodiment of the present application further provides a low frequency word translation device, including:
the characterization result generation unit is used for generating a combined characterization result corresponding to each low-frequency word in the source text; the combined representation result comprises a vector representation result corresponding to the low-frequency word and/or a vector representation result corresponding to a translation of the low-frequency word, and a vector representation result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and the low-frequency word translation unit is used for translating the source text according to the combined representation result corresponding to each low-frequency word to obtain a target text.
Optionally, the low-frequency word translation unit includes:
the vector fusion subunit is used for performing vector fusion on each vector representation result in the combined representation result corresponding to each low-frequency word to obtain a final representation result of the low-frequency word;
and the low-frequency word translation subunit is used for translating the source text according to the respective final representation result of each low-frequency word.
Optionally, the vector fusion subunit includes:
the weighting calculation subunit is configured to perform weighting calculation on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a weighting calculation result;
and the nonlinear transformation subunit is used for carrying out nonlinear transformation on the weighting calculation result to obtain a final characterization result of the low-frequency word.
Optionally, the characterization result generating unit is specifically configured to generate a vector characterization result corresponding to the low-frequency word by using the vector characterization result of each subword corresponding to the low-frequency word.
Optionally, the characterization result generating unit includes:
the first scanning sub-unit is used for reversely scanning each sub-word of the corresponding low-frequency word by utilizing the neural network to obtain a vector representation result of a first sub-word in each sub-word;
and the first generating subunit is used for taking the vector representation result of the first subword as the vector representation result of the corresponding low-frequency word.
Optionally, the characterization result generating unit is specifically configured to generate a vector characterization result of the translation corresponding to the low-frequency word by using a vector characterization result of each subword of the translation corresponding to the low-frequency word.
Optionally, the characterization result generating unit includes:
the second scanning subunit is used for reversely scanning each subword of the translation of the corresponding low-frequency word by utilizing the neural network to obtain a vector representation result of a first subword in each subword;
and the second generating subunit is used for taking the vector representation result of the first subword as the vector representation result of the translation corresponding to the low-frequency word.
Optionally, the vector representation result of the wildcard carries context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
An embodiment of the present application further provides a low frequency word translation device, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any implementation manner of the low-frequency word translation method.
An embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is enabled to execute any implementation manner of the low-frequency word translation method.
The embodiment of the present application further provides a computer program product, which when running on a terminal device, enables the terminal device to execute any implementation manner of the low-frequency word translation method.
In summary, according to the low-frequency word translation method and device provided in the embodiments of the present application, after a source text to be translated is obtained, combined representation results corresponding to each low-frequency word in the source text may be generated, and for each low-frequency word, the combined representation result corresponding to the low-frequency word includes a vector representation result of the low-frequency word and/or a vector representation result of a translated text of the low-frequency word, and a vector representation result of a wildcard after the low-frequency word is replaced with a wildcard, and then the source text is translated according to the combined representation results corresponding to each low-frequency word, so as to obtain a target text. Therefore, compared with a method for directly converting low-frequency words in a source text into wildcards and then translating the wildcards in the prior art, the method and the device for converting the low-frequency words in the source text have the advantages that when the source text is translated, not only are vector representation results of the wildcards correspondingly replaced by the low-frequency words taken into consideration, but also vector representation results of the low-frequency words and/or translated text of the low-frequency words are further taken into consideration, so that the integrity of semantic information of the source text is improved, and the fluency of the translation results is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a diagram illustrating an RNN and Attention-based translation model provided by an embodiment of the present application;
fig. 2 is a schematic flowchart of a low-frequency word translation method according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a reverse scan sub-word sequence provided by an embodiment of the present application;
fig. 4 is a schematic flowchart of translating a source text according to a combined representation result corresponding to each low-frequency word provided in the embodiment of the present application;
fig. 5 is a schematic diagram of a function image of a hyperbolic tangent function tanh provided in the embodiment of the present application;
FIG. 6 is a schematic diagram of vector fusion of low-frequency words and wildcards according to an embodiment of the present application;
FIG. 7 is a vector fusion diagram of low frequency word translations and wildcards according to an embodiment of the present application;
fig. 8 is a vector fusion diagram of low-frequency words, low-frequency word translations, and wildcards provided in an embodiment of the present application;
fig. 9 is a schematic composition diagram of a low-frequency word translation apparatus according to an embodiment of the present application.
Detailed Description
Before introducing the low-frequency word translation method provided by the embodiment of the present application, a structure and a function of a low-frequency word translation model that can be used in the embodiment of the present application are first introduced.
The low-frequency word translation model used in the embodiment of the present application may be a translation model based on a neural network and Attention mechanism (Attention), or a translation model based on Attention entirely, and the like. The embodiment of the present application does not limit the type of the Neural Network used in the translation model, and for example, the Neural Network may be a Recurrent Neural Network (RNN) or a Convolutional Neural Network (CNN).
For convenience of description, in the embodiments of the present application, a text to be translated is defined as a source text, and a translation obtained by translating the source text is defined as a target text.
The working process of the translation model is described below by taking the translation model based on RNN and Attention as an example. As shown in FIG. 1, the source text input to the model is $x = (x_1, x_2, x_3, \ldots, x_m)$ and the target text output by the model is $y = (y_1, y_2, y_3, \ldots, y_n)$, where the lengths of the source text and the target text are m and n respectively, i.e., m and n represent the numbers of sub-words contained in the source text and the target text.
The translation model includes three modules, respectively, a bidirectional RNN-based coding module (i.e., an Encoder module), an Attention module (i.e., an Attention module), and an RNN-based decoding module (i.e., a Decoder module), and the functions of each module are described below.
I. Encoder module
The Encoder module functions to compute the characterization encoding of each word in the source text under the context of the text. Specifically, for each word $x_i$ in the source text $x = (x_1, x_2, x_3, \ldots, x_m)$, a corresponding vector characterization result $v_i$ can be obtained by a word-vector table-lookup technique or the like; then, based on the vector characterization result of the $i$-th word, the characterization $f_i$ of the $i$-th word under the condition of seeing historical vocabulary information is obtained through a forward recurrent neural network, and the characterization $b_i$ of the $i$-th word under the condition of seeing future vocabulary information is obtained through a backward recurrent neural network; finally, the two are spliced together as $[f_i : b_i]$ to form the final characterization result $h_i$ of the $i$-th word, where $i = 1, 2, 3, \ldots, m$.
The recurrent neural network here may be a common RNN, or an improved structure thereof, such as a Gated Recurrent Unit (GRU) or a Long Short-Term Memory (LSTM) network.
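For concreteness, the following minimal sketch shows how such a bidirectional encoder could be assembled. It is an illustration under assumptions, not the patent's implementation: the class and parameter names (BiRNNEncoder, emb_dim, hid_dim) are hypothetical, and a GRU is chosen as the recurrent unit, as permitted above.

```python
import torch
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    """Sketch of the Encoder module: h_i = [f_i : b_i] for each source word."""

    def __init__(self, vocab_size: int, emb_dim: int = 256, hid_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # word-vector table lookup -> v_i
        self.rnn = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)

    def forward(self, src_ids: torch.Tensor) -> torch.Tensor:
        v = self.embed(src_ids)  # (batch, m, emb_dim): v_i for each word x_i
        h, _ = self.rnn(v)       # forward f_i and backward b_i, concatenated per position
        return h                 # (batch, m, 2 * hid_dim): h_i = [f_i : b_i]
```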
II. Attention module
The role of the Attention module is to calculate the information representation $c_i$ of the source text on which the $i$-th decoding moment depends. Assume that the implicit state of the RNN decoder at the previous moment is $s_{i-1}$; then $c_i$ is calculated as follows:

$$c_i = \sum_{j=1}^{m} \alpha_{ij} h_j$$

$$\alpha_{ij} = \frac{\exp\big(a(s_{i-1}, h_j)\big)}{\sum_{k=1}^{m} \exp\big(a(s_{i-1}, h_k)\big)}$$

wherein $a(s_{i-1}, h_j)$ is a general function of the variables $s_{i-1}$ and $h_j$ and can be implemented in various ways.

It can be seen that the semantic information representation $c_i$ of the source text generated at the $i$-th decoding moment is a weighted sum of the characterization results $h_j$ of the words in the source text, and the weight $\alpha_{ij}$ of each characterization result $h_j$ determines the degree of attention that the corresponding word receives at the current moment.
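As an illustrative sketch only, the additive scoring function below is one common realization of the general function $a(s_{i-1}, h_j)$; the class and parameter names are hypothetical, and other realizations (e.g., dot-product scoring) are equally valid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Sketch of the Attention module: c_i = sum_j alpha_ij * h_j."""

    def __init__(self, dec_dim: int, enc_dim: int, att_dim: int = 256):
        super().__init__()
        self.w_s = nn.Linear(dec_dim, att_dim)  # acts on the decoder state s_{i-1}
        self.w_h = nn.Linear(enc_dim, att_dim)  # acts on each encoder output h_j
        self.v = nn.Linear(att_dim, 1)

    def forward(self, s_prev: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # a(s_{i-1}, h_j): additive (Bahdanau-style) realization of the scoring function
        scores = self.v(torch.tanh(self.w_s(s_prev).unsqueeze(1) + self.w_h(h)))
        alpha = F.softmax(scores, dim=1)  # alpha_ij, normalized over source positions j
        return (alpha * h).sum(dim=1)     # context vector c_i, shape (batch, enc_dim)
```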
III. Decoder module
The Decoder module is used for generating the target text by adopting an RNN, according to the vector representation $c_i$ of the source text dynamically generated at each moment and the state $s_{i-1}$ of the Decoder module at the previous moment. The specific calculation method is as follows:

$$s_i = f(s_{i-1}, y_{i-1}, c_i)$$

$$P(y_i = V_k) = \frac{\exp\big(b_k(s_i)\big)}{\sum_{k'} \exp\big(b_{k'}(s_i)\big)}$$

wherein $f(\cdot)$ represents an RNN-based transformation function, and the RNN can be a common structure, or a GRU or LSTM structure with a gating mechanism added; $P(y_i = V_k)$ denotes the probability that $y_i$ is the $k$-th word in the target-language vocabulary, and $b_k(s_i)$ represents the transformation function associated with the $k$-th target word.

After the word-probability calculation over the target-language vocabulary is completed at each decoding moment, a final decoding sequence can be obtained by, for example, the beam search algorithm, that is, the target text $y = (y_1, y_2, y_3, \ldots, y_n)$ that maximizes the output probability $P(y \mid x)$ of the entire target text.
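The following sketch illustrates one decoding step under the same assumptions as the earlier sketches (hypothetical names, a GRU cell playing the role of $f(\cdot)$). A beam-search driver would call it repeatedly, keeping the n-best partial hypotheses instead of a single greedy choice at each step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStep(nn.Module):
    """Sketch of one decoding step: s_i = f(s_{i-1}, y_{i-1}, c_i), then P(y_i = V_k)."""

    def __init__(self, vocab_size: int, emb_dim: int = 256,
                 hid_dim: int = 256, ctx_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.GRUCell(emb_dim + ctx_dim, hid_dim)  # plays the role of f(.)
        self.out = nn.Linear(hid_dim, vocab_size)           # b_k(s_i) for every k

    def forward(self, s_prev: torch.Tensor, y_prev_ids: torch.Tensor,
                c_i: torch.Tensor):
        y_prev = self.embed(y_prev_ids)                # embedding of y_{i-1}
        s_i = self.cell(torch.cat([y_prev, c_i], dim=-1), s_prev)
        log_p = F.log_softmax(self.out(s_i), dim=-1)   # log P(y_i = V_k) over the vocabulary
        return s_i, log_p
```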
Next, a method for translating a low-frequency word provided in the embodiment of the present application will be specifically described.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
First embodiment
Referring to fig. 2, a flow diagram of a low-frequency word translation method provided in this embodiment is shown, where the method includes the following steps:
s201: and generating a combined representation result corresponding to each low-frequency word in the source text.
In this embodiment, a text to be translated is defined as a source text.
It should be noted that the present embodiment does not limit the language of the source text, for example, the source text may be a chinese text, an english text, or the like; the present embodiment also does not limit the length of the source text, for example, the source text may be a word, a sentence, a chapter-level text, etc.
It will be appreciated that one or more low frequency words may be included in the source text based on the length of the source text. The present embodiment does not limit the type of the low frequency words, and may be named entities, complex noun phrases or terms of expertise, and the like.
In this embodiment, for each low-frequency word in the source text, a corresponding combined representation result may be generated. The combined representation result may include a vector representation result of the corresponding low-frequency word and/or a vector representation result of a translation of the corresponding low-frequency word, together with the vector representation result of the wildcard UNKi obtained after the corresponding low-frequency word is replaced with the wildcard UNKi.
Specifically, each low-frequency word in the source text needs to be replaced with a wildcard UNKi. On this basis, when the source text is translated in the subsequent step S202, the vector representation result of the low-frequency word itself is considered in combination with that of its wildcard UNKi, or the vector representation result of the low-frequency word's translation in the target language (i.e., the language translated into) is considered in combination with that of its wildcard UNKi, or the vector representation results of both the low-frequency word and its translation in the target language are considered in combination with that of its wildcard UNKi. That is to say, the wildcard UNKi of the low-frequency word and the semantic information of the low-frequency word and/or its translation are comprehensively considered when translating the source text, which can effectively improve the accuracy and fluency of the translation result.
When the vector representation result of the low-frequency word's translation is needed to translate the source text, the low-frequency word must first be translated, which can be done with a strategy combining word-list lookup and model translation. Specifically, because the internal statistical regularities of bilingual low-frequency words are relatively fuzzy and their frequency in the training corpus is very low, modeling them is difficult; therefore, the translation of a low-frequency word is preferentially obtained by dictionary lookup, and if the lookup fails, it is produced by a customized low-frequency word translation model.
The dictionary searching mode can be realized by pre-constructing a word list of low-frequency words (including entities, terms, proper nouns and the like) according to specific application scenes and fields of machine translation, and the returned low-frequency word translation can be ensured to be completely correct and accord with a specific situation as long as the low-frequency words to be searched hit the word list.
However, when low-frequency word translation is performed by dictionary lookup, a low-frequency word may fail to hit the word list, depending on the scale at which the word list was constructed. In this case, a customized low-frequency word translation model may be used, and the target translation with the highest probability output by the model is taken as the translation of the low-frequency word. Specifically, a character model may be used as the low-frequency word translation model to address the shortness and low frequency of low-frequency words, because a character model raises the frequency of the modeling units in the model training data and can thus greatly improve the translation performance on low-frequency words. For example, taking "南京长江大桥" and its translation "Nanjing Yangtze River Bridge" as an example, the bilingual forms under the character model are "南 京 长 江 大 桥" and "[Nanjing] [Yangtze] [River] [Bridge]".
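A minimal sketch of this dictionary-first strategy follows; the lexicon contents, the char_model object and its translate method are hypothetical stand-ins for the pre-built low-frequency-word list and the customized character-level translation model.

```python
def translate_low_freq_word(word: str, lexicon: dict, char_model) -> str:
    """Dictionary-first translation of a single low-frequency word."""
    hit = lexicon.get(word)
    if hit is not None:
        return hit  # word-list hit: the returned translation fits the target domain
    # lookup failed: fall back to the customized character-level translation model,
    # which returns its highest-probability target translation
    return char_model.translate(list(word))  # character model sees per-character units

# hypothetical usage with a tiny domain word list
lexicon = {"南京长江大桥": "Nanjing Yangtze River Bridge"}
```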
Next, how to generate the combined characterization results corresponding to the low-frequency words in the source text is described, that is, for each low-frequency word, "how to generate the vector characterization result of the low-frequency word," how to generate the vector characterization result of the translation of the low-frequency word, "and" how to generate the vector characterization result of the wildcard corresponding to the low-frequency word. The method comprises the following specific steps:
firstly, generating a vector characterization result of a low-frequency word according to the following mode
In an implementation manner of this embodiment, for each low-frequency word, a vector characterization result of the low-frequency word may be generated by using a vector characterization result of each subword of the low-frequency word.
In this implementation, the low-frequency word may be segmented into sub-words, where the number of characters in a sub-word is not limited: a sub-word may include one character or multiple characters, and each sub-word may occur with higher or lower frequency in the source language (i.e., the language to which the source text belongs), that is, a sub-word may be a high-frequency sub-word or a low-frequency sub-word. For example, take the word "机器人" ("robot") and assume it is a low-frequency word: the two characters of "机器" ("machine") generally co-occur with a high frequency in a large-scale corpus, but the three characters "机", "器" and "人" may co-occur with a relatively low frequency, so "机器人" can be segmented into the sub-words "机器" ("machine") and "人" ("human"), where "机器" and "人" are high-frequency sub-words. On this basis, when the frequency of a low-frequency word's sub-words is high, better vector representation results of those high-frequency sub-words can be learned by the model, so that when the vector representation results of the sub-words are used to generate the vector representation result of the low-frequency word, the latter can accurately represent the semantic information of the low-frequency word.
More specifically, in this implementation, each subword of the low-frequency word may be reversely scanned by using a neural network to obtain a vector characterization result of a first subword in each subword, and the vector characterization result of the first subword is used as the vector characterization result of the low-frequency word.
In practical implementation, a Long Short-Term Memory (LSTM) network can be adopted to reversely scan the sub-word sequence of the low-frequency word, so that the first sub-word of the sequence occupies the most important information in the vector representation result of the low-frequency word. The LSTM network is good at modeling natural language in machine translation: it can convert a text of any length into a floating-point vector of a specific dimension, while memorizing the more important words in the text and retaining that memory over long spans. LSTM is a special structural type of the Recurrent Neural Network (RNN) model in which three control units, an input gate, an output gate and a forget gate, are added; for information entering the LSTM network, these three gates judge it and determine the proportion that is memorized, forgotten and output, which can effectively solve the long-distance dependence problem in neural networks.
Based on this, if the LSTM network is adopted to scan the sub-word sequence of the low-frequency word in the reverse direction, the first sub-word can be guaranteed to play the greatest role in the vector representation result of the low-frequency word, so that the translation result of the low-frequency word is guaranteed to be more smoothly linked with the context of the low-frequency word.
For example, in the schematic diagram of reverse-scanning a sub-word sequence shown in FIG. 3, take the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ and its vector characterization result $\overleftarrow{v}_{unk2}$ as an example, where $x_6$, $x_7$ and $x_8$ are the three sub-words of the low-frequency word and $v_6$, $v_7$ and $v_8$ are, in order, the vector characterization results of the three sub-words. Then $v_8$, $v_7$ and $v_6$ are input one by one into the LSTM network for encoding; the calculation formula is as follows:

$$\overleftarrow{v}_{unk2} = \mathrm{LSTM}(v_8, v_7, v_6)$$

Finally, the vector $\overleftarrow{v}_{unk2}$ output by the LSTM network is the vector characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$, in which $x_6$, as the last input, plays the greatest role.
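The reverse scan just described can be sketched as follows; the tensor shapes, and the choice of taking the last LSTM output as the word's characterization, are assumptions made for illustration.

```python
import torch
import torch.nn as nn

def reverse_scan(subword_vecs: torch.Tensor, lstm: nn.LSTM) -> torch.Tensor:
    """Scan the sub-word vectors in reverse order so that the FIRST sub-word,
    consumed last, dominates the resulting word characterization."""
    reversed_seq = torch.flip(subword_vecs, dims=[1])  # (v_8, v_7, v_6) ordering
    out, _ = lstm(reversed_seq)
    return out[:, -1, :]  # LSTM state after consuming v_6 = the word's vector

lstm = nn.LSTM(input_size=256, hidden_size=256, batch_first=True)
v_sub = torch.randn(1, 3, 256)      # stand-ins for v_6, v_7, v_8 of S_unk2(x6, x7, x8)
v_unk2 = reverse_scan(v_sub, lstm)  # vector characterization of the low-frequency word
```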
Secondly, generating a vector representation result of the translation of the low-frequency words according to the following mode
In an implementation manner of this embodiment, for each low-frequency word, a vector representation result of the translated version of the low-frequency word may be generated by using a vector representation result of each subword of the translated version of the low-frequency word. More specifically, in this implementation manner, each subword of the translation of the low-frequency word may be reversely scanned by using the neural network to obtain a vector characterization result of a first subword in each subword, and the vector characterization result of the first subword is used as the vector characterization result of the translation of the low-frequency word.
In this implementation manner, the vector representation result of the translated text of the low-frequency word may be generated in a similar manner to the above-mentioned "generating the vector representation result of the low-frequency word in the following manner", that is, replacing the above-mentioned "low-frequency word" with the "translated text of the low-frequency word", replacing each sub-word of the above-mentioned "low-frequency word" with each sub-word of the "translated text of the low-frequency word", so as to generate the vector representation result of the translated text of the low-frequency word in the above-mentioned manner, and a specific manner is not described herein again.
Thirdly, generating a vector representation result of the wildcard characters corresponding to the low-frequency words according to the following mode
In an implementation manner of this embodiment, for each low-frequency word, a vector representation result of a wildcard corresponding to the low-frequency word carries context semantic information of a sample corpus to which the low-frequency word belongs.
In this implementation, a pre-constructed low-frequency word translation model can be adopted to translate the source text. When the low-frequency word translation model is trained, a large amount of model training data must be used, and this training data contains a large number of sample texts, one or more of which may contain a low-frequency word of the source text. After such a low-frequency word is replaced with its corresponding wildcard UNKi, the word vectors of the wildcard UNKi and of the other words in the sample text to which it belongs can be initialized, and the low-frequency word translation model is trained on the basis of these word vectors. During training, the vector representation result of the wildcard UNKi is continuously updated, so that it comes to carry the semantic information of the context of the sample texts to which the wildcard UNKi belongs; after training ends, the vector representation result of the wildcard UNKi is fixed.
Based on this, when a vector representation result of the wildcard character UNKi corresponding to the low-frequency word in the source text needs to be generated, the vector representation result of the wildcard character UNKi obtained after model training is finished can be directly used.
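As a sketch of this training arrangement (the vocabulary layout and the sizes are assumptions made for illustration): the wildcards are treated as ordinary vocabulary entries whose embedding rows are updated by backpropagation during training and frozen afterwards.

```python
import torch.nn as nn

# Assumed vocabulary layout: the wildcards UNK1..UNKi are ordinary vocabulary
# entries, so their embedding rows are updated by backpropagation like any
# other word while the low-frequency word translation model is trained.
vocab = {"<pad>": 0, "<s>": 1, "</s>": 2, "UNK1": 3, "UNK2": 4}  # hypothetical ids
embed = nn.Embedding(num_embeddings=30000, embedding_dim=256)

# ... train the translation model on sample corpora whose low-frequency words
# were replaced by UNKi; the rows embed.weight[3] and embed.weight[4] absorb
# the context semantics of the sentences those wildcards appeared in ...

embed.weight.requires_grad_(False)  # after training, the UNKi vectors stay fixed
```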
S202: and translating the source text according to the combined representation result corresponding to each low-frequency word to obtain a target text.
In this embodiment, after the combined representation results corresponding to the low-frequency words are obtained in step S201, the source text may be translated based on the combined representation results corresponding to the low-frequency words and the respective vector representation results of other words (words other than the low-frequency words) in the source text, where a translated text of the source text is defined as a target text.
As for other words in the source text except for the low-frequency words, the vector representation results of the words may be directly generated by using a word vector generation method, and of course, the vector representation results of the words may also be generated in the manner of "generating the vector representation results of wildcards", that is, the vector representation results of the words are obtained in the process of training the low-frequency word translation model, and for each of the words, the vector representation results of the word carry context semantic information of the sample corpus to which the word belongs.
It should be noted that, when translating the source text, the source text may be translated by using an existing or future-appearing translation model, such as the translation model based on RNN and Attention shown in fig. 1, or the translation model based on CNN and Attention, or the translation model based on Attention, etc. In addition, the present embodiment does not limit the language of the target text, for example, if the source text is chinese, the target text may be english.
It should be noted that, a specific implementation manner of the step S202 will be specifically described in the second embodiment.
In summary, in the low-frequency word translation method provided in this embodiment, after a source text to be translated is obtained, a combined representation result corresponding to each low-frequency word in the source text may be generated, and for each low-frequency word, the combined representation result corresponding to the low-frequency word includes a vector representation result of the low-frequency word and/or a vector representation result of a translated text of the low-frequency word, and a vector representation result of a wildcard after the low-frequency word is replaced with a wildcard, and then the source text is translated according to the combined representation result corresponding to each low-frequency word, so as to obtain a target text. Therefore, compared with a method of directly converting low-frequency words in a source text into wildcards and then translating the wildcards in the prior art, the method and the device have the advantages that when the source text is translated, not only are vector representation results of the wildcards correspondingly replaced by the low-frequency words considered, but also vector representation results of the low-frequency words and/or translated text of the low-frequency words are further considered, so that the integrity of semantic information of the source text is improved, and the fluency of the translation results is further improved.
In addition, with the existing low-frequency word translation method, translation omissions may occur when there are many low-frequency words (three or more); the low-frequency word translation method adopted in this embodiment can greatly reduce the omission rate and make the context of the translated target text smoother.
Second embodiment
It should be noted that, this embodiment will describe a specific implementation manner of step S202 in the first embodiment.
Referring to fig. 4, a schematic flow chart of translating a source text according to a combined representation result corresponding to each low-frequency word provided in this embodiment is shown, where the method includes the following steps:
s401: and for each low-frequency word, carrying out vector fusion on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a final representation result of the low-frequency word.
In this embodiment, vector fusion may be performed on two or three vector characterization results in the combined characterization result of each low-frequency word, and a final characterization result of each low-frequency word is obtained through vector fusion.
For example, assume that the source text is x and the target text is y, and that the sub-word sequences contained in the source text x and the target text y are:

$$x = (x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, \ldots, x_m)$$

$$y = (y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8, y_9, \ldots, y_n)$$

Assume that, by word-frequency statistics, two low-frequency words to be processed can be obtained from the sub-word sequence of the source text x: $S_{unk1}(x_2, x_3)$ and $S_{unk2}(x_6, x_7, x_8)$. Their translations are obtained by the dictionary-lookup or model-translation means introduced in the first embodiment; assume the translation of the low-frequency word $S_{unk1}(x_2, x_3)$ is the sub-word sequence $T_{unk1}(y_2, y_3, y_4)$ in the target text y, and the translation of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ is the sub-word sequence $T_{unk2}(y_7, y_8, y_9)$ in the target text y.

When the wildcards UNKi are used to replace the low-frequency words and low-frequency word translations in the bilingual sentence pair (x, y), a new bilingual sentence pair $(\hat{x}, \hat{y})$ is obtained:

$$\hat{x} = (x_1, u_1, x_4, x_5, u_2, x_9, \ldots, x_m)$$

$$\hat{y} = (y_1, u_1, y_5, y_6, u_2, y_{10}, \ldots, y_n)$$

wherein $u_1$ is the wildcard replacing the low-frequency word $S_{unk1}(x_2, x_3)$, and $u_2$ is the wildcard replacing the low-frequency word $S_{unk2}(x_6, x_7, x_8)$.
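A small sketch of this replacement step, assuming low-frequency-word positions are given as half-open index spans (the function name and span format are hypothetical):

```python
def replace_with_wildcards(tokens: list, low_freq_spans: list) -> list:
    """Replace each low-frequency sub-word span with its wildcard UNKi,
    e.g. (x1, S_unk1, x4, x5, S_unk2, x9) -> (x1, UNK1, x4, x5, UNK2, x9)."""
    out, i, k = [], 0, 1
    spans = {start: end for start, end in low_freq_spans}  # span start -> end (exclusive)
    while i < len(tokens):
        if i in spans:
            out.append(f"UNK{k}")  # one wildcard stands in for the whole span
            k += 1
            i = spans[i]
        else:
            out.append(tokens[i])
            i += 1
    return out

x = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"]
# S_unk1 = (x2, x3) -> indices (1, 3), S_unk2 = (x6, x7, x8) -> indices (5, 8)
print(replace_with_wildcards(x, [(1, 3), (5, 8)]))
# ['x1', 'UNK1', 'x4', 'x5', 'UNK2', 'x9']
```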
At this time, there are the following three vector fusion methods:
Fusion mode 1: the vector characterization result of the low-frequency word $S_{unk1}(x_2, x_3)$ in the source text x and the vector characterization result of the wildcard $u_1$ can be fused to obtain the final characterization result of the low-frequency word $S_{unk1}(x_2, x_3)$; similarly, the vector characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ in the source text x and the vector characterization result of the wildcard $u_2$ are fused to obtain the final characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$.

Fusion mode 2: the vector characterization result of the translation $T_{unk1}(y_2, y_3, y_4)$ of the low-frequency word $S_{unk1}(x_2, x_3)$ in the source text x and the vector characterization result of the wildcard $u_1$ can be fused to obtain the final characterization result of the low-frequency word $S_{unk1}(x_2, x_3)$; similarly, the vector characterization result of the translation $T_{unk2}(y_7, y_8, y_9)$ of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ and the vector characterization result of the wildcard $u_2$ are fused to obtain the final characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$.

Fusion mode 3: the vector characterization result of the low-frequency word $S_{unk1}(x_2, x_3)$ in the source text x, the vector characterization result of its translation $T_{unk1}(y_2, y_3, y_4)$ and the vector characterization result of the wildcard $u_1$ can all be fused to obtain the final characterization result of the low-frequency word $S_{unk1}(x_2, x_3)$; similarly, the vector characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$, the vector characterization result of its translation $T_{unk2}(y_7, y_8, y_9)$ and the vector characterization result of the wildcard $u_2$ are fused to obtain the final characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$.
In an implementation manner of this embodiment, step S401 may specifically include: and for each low-frequency word, performing weighted calculation on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a weighted calculation result, and performing nonlinear transformation on the weighted calculation result to obtain a final representation result of the low-frequency word.
In this implementation, the function of "nonlinear transformation" is mainly realized by activation functions, which play a very important role in enabling artificial neural network models to learn and represent complex, nonlinear mapping relationships. Specifically, the hyperbolic tangent function tanh may be used as the activation function; its calculation formula is:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

FIG. 5 shows the function image of the hyperbolic tangent function tanh.
Based on this, for each low-frequency word in the source text, after performing weighted calculation on two or three vector representation results in the combined representation result corresponding to the low-frequency word, a hyperbolic tangent function tanh may be used to perform nonlinear transformation on the weighted calculation result, and the transformation result is used as the final representation result of the low-frequency word.
The nonlinear transformation by using the hyperbolic tangent function tanh mainly has the following two advantages:
first, when the value of input x is large or small, the slope of tanh will approach zero indefinitely. That is, if and only if the weighting calculation result corresponding to the low-frequency word is within a certain range, a larger gradient is generated after the non-linear transformation is performed on the low-frequency word through the tanh function; on the contrary, when the weighting calculation result corresponding to the low-frequency word exceeds a certain range, the tanh function does not make a large response any more because the tanh function approaches saturation, at this time, the role of the tanh function is to shield the weighting calculation result as an abnormal value, and in the training process of the low-frequency word translation model, based on the shielding role of the tanh function, certain bias influence on the output of a subsequent coding result and the update of a context vector caused by the fact that the weighting calculation result corresponding to the low-frequency word in the sample text (belonging to the model training data) is too large or too small is avoided.
Second, as can be seen from the tanh function image shown in FIG. 5, the function has the characteristics of smooth output, easy differentiation, and output centered at 0 and lying in (-1, 1). These characteristics give the tanh function high parameter-update efficiency, and the weighted calculation result corresponding to a low-frequency word is rescaled into a bounded range after the nonlinear transformation, which is particularly important in scenarios dense with matrix operations such as neural networks.
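The fusion just described (weighted calculation followed by a tanh nonlinear transformation) can be sketched as follows; the weight matrices stand in for $W_{u,unki}$, $W_{s,unki}$ and $W_{t,unki}$, all names are hypothetical, and fusion modes 1 and 2 are obtained by simply omitting the translation term or the word term.

```python
import torch
import torch.nn as nn

class VectorFusion(nn.Module):
    """Sketch of fusion mode 3: weighted sum of the wildcard, low-frequency-word
    and translation vectors, then a tanh nonlinearity (modes 1/2 drop a term)."""

    def __init__(self, dim: int):
        super().__init__()
        self.w_u = nn.Linear(dim, dim, bias=False)  # stands in for W_u,unki
        self.w_s = nn.Linear(dim, dim, bias=False)  # stands in for W_s,unki
        self.w_t = nn.Linear(dim, dim, bias=False)  # stands in for W_t,unki

    def forward(self, u_vec, s_vec=None, t_vec=None):
        acc = self.w_u(u_vec)            # wildcard term is always present
        if s_vec is not None:
            acc = acc + self.w_s(s_vec)  # the low-frequency word itself
        if t_vec is not None:
            acc = acc + self.w_t(t_vec)  # the translation of the low-frequency word
        return torch.tanh(acc)           # final characterization result V_i
```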
Therefore, for each low-frequency word in the source text, vector fusion can be performed on each vector representation result in the combined representation result corresponding to the low-frequency word by adopting a mode of firstly weighting calculation and then nonlinear transformation, so that the final representation result of the low-frequency word is obtained. Next, based on the above example, the above three vector fusion methods will be described in detail.
In fusion mode 1, as shown in the vector-fusion diagram of low-frequency words and wildcards in FIG. 6, assume that the source text x includes two low-frequency words, namely $S_{unk1}(x_2, x_3)$ and $S_{unk2}(x_6, x_7, x_8)$.

Assume that the vector characterization results of the sub-word sequence of the low-frequency word $S_{unk1}(x_2, x_3)$ are $S_{unk1}(v_2, v_3)$; using $S_{unk1}(v_2, v_3)$, the vector characterization result $\overleftarrow{v}_{unk1}$ of the low-frequency word $S_{unk1}(x_2, x_3)$ can be generated (for example, using the LSTM reverse-scan method mentioned in the first embodiment). In addition, using the method described in the first embodiment, the vector characterization result $v_{u_1}$ of the wildcard $u_1$ corresponding to the low-frequency word $S_{unk1}(x_2, x_3)$ can be obtained, i.e., the vector characterization result $v_{u_1}$ of the wildcard $u_1$ obtained by updating during the training of the low-frequency word translation model. At this time, $v_{u_1}$ and $\overleftarrow{v}_{unk1}$ can be vector-fused by weighted calculation followed by nonlinear transformation, namely:

$$V_1 = \tanh\big(W_{u,unk1}\, v_{u_1} + W_{s,unk1}\, \overleftarrow{v}_{unk1}\big)$$

wherein $W_{u,unk1}$ and $W_{s,unk1}$ are weights.

It can be seen that the final characterization result $V_1$ of the low-frequency word $S_{unk1}(x_2, x_3)$ includes not only the vector characterization result $v_{u_1}$ of its wildcard $u_1$, which carries the context semantic information of the sample corpus (belonging to the training data of the low-frequency word translation model) to which the low-frequency word $S_{unk1}(x_2, x_3)$ belongs, but also the semantic information of the low-frequency word $S_{unk1}(x_2, x_3)$ itself. Therefore, after $V_1$ is fed into the coding layer of the low-frequency word translation model, the correspondingly generated vector characterization result $h_{unk1}$ contains more accurate and more sufficient information. Further, the vector characterization result $h_{unk1}$ will play a positive role in the subsequent Attention module and Decoder module, so that more accurate low-frequency-word translations and source-text translations (i.e., the target text) can be generated.

Similarly, assume that the vector characterization results of the sub-word sequence of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ are $S_{unk2}(v_6, v_7, v_8)$; using $S_{unk2}(v_6, v_7, v_8)$, the vector characterization result $\overleftarrow{v}_{unk2}$ of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ can be generated (for example, using the LSTM reverse-scan method mentioned in the first embodiment). In addition, using the method described in the first embodiment, the vector characterization result $v_{u_2}$ of the wildcard $u_2$ corresponding to the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ can be obtained, i.e., the vector characterization result $v_{u_2}$ of the wildcard $u_2$ obtained by updating during the training of the low-frequency word translation model. At this time, $v_{u_2}$ and $\overleftarrow{v}_{unk2}$ can be vector-fused by weighted calculation followed by nonlinear transformation, namely:

$$V_2 = \tanh\big(W_{u,unk2}\, v_{u_2} + W_{s,unk2}\, \overleftarrow{v}_{unk2}\big)$$

wherein $W_{u,unk2}$ and $W_{s,unk2}$ are weights.

It can be seen that the final characterization result $V_2$ of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ includes not only the vector characterization result $v_{u_2}$ of its wildcard $u_2$, which carries the context semantic information of the sample corpus to which the low-frequency word belongs, but also the semantic information of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ itself. Therefore, after $V_2$ is fed into the coding layer of the low-frequency word translation model, the correspondingly generated vector characterization result $h_{unk2}$ contains more accurate and more sufficient information, and will likewise play a positive role in the subsequent Attention module and Decoder module, so that more accurate low-frequency-word translations and source-text translations (i.e., the target text) can be generated.
In fusion mode 2, as shown in the vector-fusion diagram of low-frequency word translations and wildcards in FIG. 7, assume again that the source text x includes the two low-frequency words $S_{unk1}(x_2, x_3)$ and $S_{unk2}(x_6, x_7, x_8)$.

Assume that the vector characterization results of the sub-word sequence of the low-frequency word translation $T_{unk1}(y_2, y_3, y_4)$ are $T_{unk1}(v_2, v_3, v_4)$; using $T_{unk1}(v_2, v_3, v_4)$, the vector characterization result $\overleftarrow{t}_{unk1}$ of the translation $T_{unk1}(y_2, y_3, y_4)$ can be generated (for example, using the LSTM reverse-scan method mentioned in the first embodiment). In addition, using the method described in the first embodiment, the vector characterization result $v_{u_1}$ of the wildcard $u_1$ corresponding to the low-frequency word $S_{unk1}(x_2, x_3)$ can be obtained, i.e., the vector characterization result $v_{u_1}$ of the wildcard $u_1$ obtained by updating during the training of the low-frequency word translation model. At this time, $v_{u_1}$ and $\overleftarrow{t}_{unk1}$ can be vector-fused by weighted calculation followed by nonlinear transformation, namely:

$$V_1 = \tanh\big(W_{u,unk1}\, v_{u_1} + W_{t,unk1}\, \overleftarrow{t}_{unk1}\big)$$

wherein $W_{u,unk1}$ and $W_{t,unk1}$ are weights.

It can be seen that the final characterization result $V_1$ of the low-frequency word $S_{unk1}(x_2, x_3)$ includes not only the vector characterization result $v_{u_1}$ of its wildcard $u_1$, which carries the context semantic information of the sample corpus (belonging to the training data of the low-frequency word translation model) to which the low-frequency word belongs, but also the semantic information of the low-frequency word translation $T_{unk1}(y_2, y_3, y_4)$ itself. Therefore, after $V_1$ is fed into the coding layer of the low-frequency word translation model, the correspondingly generated vector characterization result $h_{unk1}$ contains more accurate and more sufficient information, and will play a positive role in the subsequent Attention module and Decoder module, so that more accurate low-frequency-word translations and source-text translations (i.e., the target text) can be generated.

Similarly, assume that the vector characterization results of the sub-word sequence of the low-frequency word translation $T_{unk2}(y_7, y_8, y_9)$ are $T_{unk2}(v_7, v_8, v_9)$; using $T_{unk2}(v_7, v_8, v_9)$, the vector characterization result $\overleftarrow{t}_{unk2}$ of the translation $T_{unk2}(y_7, y_8, y_9)$ can be generated. In addition, the vector characterization result $v_{u_2}$ of the wildcard $u_2$ corresponding to the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ can be obtained as above. At this time, $v_{u_2}$ and $\overleftarrow{t}_{unk2}$ can be vector-fused by weighted calculation followed by nonlinear transformation, namely:

$$V_2 = \tanh\big(W_{u,unk2}\, v_{u_2} + W_{t,unk2}\, \overleftarrow{t}_{unk2}\big)$$

wherein $W_{u,unk2}$ and $W_{t,unk2}$ are weights.

It can be seen that the final characterization result $V_2$ of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ includes not only the vector characterization result $v_{u_2}$ of its wildcard $u_2$, which carries the context semantic information of the sample corpus to which the low-frequency word belongs, but also the semantic information of the low-frequency word translation $T_{unk2}(y_7, y_8, y_9)$ itself. Therefore, after $V_2$ is fed into the coding layer of the low-frequency word translation model, the correspondingly generated vector characterization result $h_{unk2}$ contains more accurate and more sufficient information, and will likewise play a positive role in the subsequent Attention module and Decoder module.
In fusion mode 3, as shown in the low-frequency word translation and wildcard vector fusion diagram of fig. 8, assume that the source text x includes two low-frequency words, namely S_unk1(x2, x3) and S_unk2(x6, x7, x8).

Referring to the above descriptions of fusion mode 1 and fusion mode 2, the vector characterization result s̄_unk1 of the low-frequency word S_unk1(x2, x3), the vector characterization result v̄_unk1 of its translation T_unk1(y2, y3, y4), and the vector characterization result ū_1 of the wildcard u_1 corresponding to S_unk1(x2, x3) can all be obtained. At this time, the three may be vector-fused by weighting calculation first and nonlinear transformation second, so as to obtain the final characterization result V_1 of the low-frequency word S_unk1(x2, x3), namely:

V_1 = g(W_{s,unk1} · s̄_unk1 + W_{t,unk1} · v̄_unk1 + W_{u,unk1} · ū_1)

Similarly, the vector characterization result s̄_unk2 of the low-frequency word S_unk2(x6, x7, x8), the vector characterization result v̄_unk2 of its translation T_unk2(y7, y8, y9), and the vector characterization result ū_2 of the wildcard u_2 corresponding to S_unk2(x6, x7, x8) can be obtained and fused in the same way, so as to obtain the final characterization result V_2 of the low-frequency word S_unk2(x6, x7, x8), namely:

V_2 = g(W_{s,unk2} · s̄_unk2 + W_{t,unk2} · v̄_unk2 + W_{u,unk2} · ū_2)

where, as before, g(·) denotes the nonlinear transformation and the W terms are weights.
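Under the same caveats as the previous sketch, fusion mode 3 simply extends the fusion to three inputs (the class and weight names are hypothetical, and tanh again stands in for g):

```python
import torch
import torch.nn as nn

class ThreeWayFusion(nn.Module):
    """Sketch of fusion mode 3: fuse the low-frequency word vector s̄, its
    translation vector v̄, and the wildcard vector ū in a single
    weighting-then-nonlinearity step."""

    def __init__(self, dim: int):
        super().__init__()
        self.W_s = nn.Linear(dim, dim, bias=False)  # weight for the word vector
        self.W_t = nn.Linear(dim, dim, bias=False)  # weight for the translation vector
        self.W_u = nn.Linear(dim, dim, bias=False)  # weight for the wildcard vector

    def forward(self, s_vec, t_vec, u_vec):
        # V = g(W_s · s̄ + W_t · v̄ + W_u · ū), with g assumed to be tanh
        return torch.tanh(self.W_s(s_vec) + self.W_t(t_vec) + self.W_u(u_vec))
```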
When fusion mode 3 is adopted for vector fusion, the vector representations of the low-frequency word's semantic information in both the source-language and the target-language vector space are exploited at the same time. A word that appears with low frequency in one language does not necessarily remain low-frequency at the other end of the bilingual pair. Fusion mode 3 therefore ensures that the semantic features of a low-frequency word in the two different languages are extracted in a complementary way during the training of the low-frequency word translation model, which effectively alleviates the low-frequency translation defect and improves the translation quality of source texts containing low-frequency words in actual translation.
In addition, during the training of the low-frequency word translation model, not only the vector characterization result of each low-frequency word but also the vector characterization result of its translation is provided, so the model can learn a copy mechanism, i.e., through training it learns to output a low-frequency word's translation directly within its context. For example, assume that when translating the source text, the model incorporates during encoding the vector characterization results T_unk1(v2, v3, v4) and T_unk2(v7, v8, v9) of the low-frequency word translations T_unk1(y2, y3, y4) and T_unk2(y7, y8, y9); the translations T_unk1(y2, y3, y4) and T_unk2(y7, y8, y9) can then appear directly in the target text.
In all three fusion modes, the larger a weight is, the greater the influence of its corresponding vector characterization result on the encoder's output. The weight values are adaptive parameters of the low-frequency word translation model: they are updated continuously during model training, and their final values are obtained when training is finished.
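Purely to make "adaptive parameters" concrete, here is a toy sketch under the same hypothetical PyTorch setup as above (the training signal and loss are placeholders, not anything the patent specifies): the fusion weights receive gradients like any other model parameter and keep their final values once training stops.

```python
import torch
import torch.nn as nn

# The fusion weights are ordinary trainable parameters of the model.
W_u = nn.Linear(512, 512, bias=False)
W_t = nn.Linear(512, 512, bias=False)
optimizer = torch.optim.Adam(list(W_u.parameters()) + list(W_t.parameters()), lr=1e-3)

u_vec, t_vec = torch.randn(512), torch.randn(512)
target = torch.randn(512)  # placeholder training signal

V = torch.tanh(W_u(u_vec) + W_t(t_vec))   # fusion mode 2, tanh assumed
loss = nn.functional.mse_loss(V, target)  # placeholder loss
loss.backward()
optimizer.step()  # W_u and W_t are updated; after training they are fixed
```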
S402: translating the source text according to the respective final characterization result of each low-frequency word, to obtain the target text.
In this embodiment, after the final characterization result of each low-frequency word in the source text is obtained in S401, encoding may be performed according to the final characterization results of the low-frequency words and the vector characterization results of the other words in the source text, and the source text is then translated based on the encoding result. Because the encoding result fuses the context information of the low-frequency words with the semantic information of the low-frequency words themselves and/or their translations, translating the source text based on this encoding result improves the accuracy and fluency of the translation result.
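As a hedged sketch of S402 (the function name, shapes, and the substitute-at-wildcard-position interface are illustrative assumptions, not the patent's prescribed API): each low-frequency word's final characterization result replaces the embedding at that word's wildcard position before the sequence is encoded.

```python
import torch

def build_encoder_input(token_embeddings: torch.Tensor,
                        fused_by_position: dict) -> torch.Tensor:
    """Substitute each low-frequency word's final characterization result
    into the source-side embedding sequence at its wildcard position.
    token_embeddings: (seq_len, dim); fused_by_position: {position: (dim,)}.
    """
    enc_input = token_embeddings.clone()
    for pos, fused_vec in fused_by_position.items():
        enc_input[pos] = fused_vec  # overwrite the wildcard's embedding
    return enc_input

# Hypothetical usage: a 10-token source with low-frequency words at positions 2 and 6
emb = torch.randn(10, 512)
fused = {2: torch.randn(512), 6: torch.randn(512)}
encoder_input = build_encoder_input(emb, fused)  # then fed to the encoder as usual
```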
Third embodiment
In this embodiment, a low-frequency word translation apparatus will be described; for related content, please refer to the method embodiments above.
Referring to fig. 9, a schematic composition diagram of a low frequency word translation apparatus provided in an embodiment of the present application is shown, where the apparatus 900 includes:
a representation result generating unit 901, configured to generate a combined representation result corresponding to each low-frequency word in the source text; the combined representation result comprises a vector representation result corresponding to the low-frequency word and/or a vector representation result corresponding to a translation of the low-frequency word, and a vector representation result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and the low-frequency word translation unit 902 is configured to translate the source text according to the combined representation result corresponding to each low-frequency word, so as to obtain a target text.
In an implementation manner of this embodiment, the low-frequency word translation unit 902 includes:
the vector fusion subunit is used for performing vector fusion on each vector representation result in the combined representation result corresponding to each low-frequency word to obtain a final representation result of the low-frequency word;
and the low-frequency word translation subunit is used for translating the source text according to the respective final representation result of each low-frequency word.
In an implementation manner of this embodiment, the vector fusion subunit includes:
the weighting calculation subunit is configured to perform weighting calculation on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a weighting calculation result;
and the nonlinear transformation subunit is used for carrying out nonlinear transformation on the weighting calculation result to obtain a final characterization result of the low-frequency word.
In an implementation manner of this embodiment, the characterization result generating unit 901 is specifically configured to generate a vector characterization result corresponding to a low-frequency word by using a vector characterization result of each subword corresponding to the low-frequency word.
In an implementation manner of this embodiment, the characterization result generating unit 901 includes:
the first scanning subunit is used for reversely scanning the subwords of the corresponding low-frequency word by using the neural network, to obtain the vector representation result of the first subword among the subwords;
and the first generating subunit is used for taking the vector representation result of the first subword as the vector representation result of the corresponding low-frequency word.
In an implementation manner of this embodiment, the characterization result generating unit 901 is specifically configured to generate a vector characterization result of the translated text corresponding to the low-frequency word by using a vector characterization result of each subword of the translated text corresponding to the low-frequency word.
In an implementation manner of this embodiment, the characterization result generating unit 901 includes:
the second scanning subunit is used for reversely scanning the subwords of the translation of the corresponding low-frequency word by using the neural network, to obtain the vector representation result of the first subword among the subwords;
and the second generating subunit is used for taking the vector representation result of the first subword as the vector representation result of the translation of the corresponding low-frequency word.
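For illustration only, the reverse scan performed by the first/second scanning subunits might look like the sketch below: an LSTM reads the subword vectors from last to first, and the hidden state emitted at the first subword is taken as the word's (or translation's) vector representation result. The function name and the use of a freshly constructed LSTM are assumptions; in the patent the scanning network is trained with the model.

```python
import torch
import torch.nn as nn

def reverse_scan(subword_vectors: torch.Tensor, lstm: nn.LSTM) -> torch.Tensor:
    """Reverse-scan sketch: subword_vectors has shape (num_subwords, dim).
    The LSTM consumes the reversed sequence, so its last output is aligned
    with the first subword, which becomes the vector representation result."""
    reversed_seq = torch.flip(subword_vectors, dims=[0]).unsqueeze(1)  # (L, 1, dim)
    outputs, _ = lstm(reversed_seq)                                    # (L, 1, dim)
    return outputs[-1, 0]  # hidden state aligned with the first subword

# Hypothetical usage on a 3-subword low-frequency word in a 512-dim space
lstm = nn.LSTM(input_size=512, hidden_size=512)  # untrained here, for shapes only
word_vec = reverse_scan(torch.randn(3, 512), lstm)
```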
In an implementation manner of this embodiment, the vector representation result of the wildcard carries context semantic information of a sample corpus to which the corresponding low-frequency word belongs.
Further, an embodiment of the present application further provides a low-frequency word translation device, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any one of the implementation methods of the low-frequency word translation method.
Further, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute any implementation method of the low-frequency word translation method.
Further, an embodiment of the present application further provides a computer program product, which when running on a terminal device, causes the terminal device to execute any implementation method of the above low-frequency word translation method.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps of the methods in the above embodiments can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application may be implemented, in essence or in part, in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
It is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprises a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. A method for translating low-frequency words, comprising:
generating a combined representation result corresponding to each low-frequency word in the source text; the combined representation result comprises a vector representation result corresponding to the low-frequency word and/or a vector representation result corresponding to a translation of the low-frequency word, and a vector representation result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and translating the source text according to the combined representation result corresponding to each low-frequency word to obtain a target text.
2. The method according to claim 1, wherein translating the source text according to the combined representation result corresponding to each low-frequency word comprises:
for each low-frequency word, carrying out vector fusion on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a final representation result of the low-frequency word;
and translating the source text according to the respective final characterization result of each low-frequency word.
3. The method according to claim 2, wherein performing vector fusion on each vector characterization result in the combined characterization results corresponding to the low-frequency word to obtain a final characterization result of the low-frequency word comprises:
performing weighted calculation on each vector representation result in the combined representation results corresponding to the low-frequency words to obtain weighted calculation results;
and carrying out nonlinear transformation on the weighting calculation result to obtain a final characterization result of the low-frequency word.
4. The method according to any one of claims 1 to 3, wherein the vector characterization result of the corresponding low-frequency word is generated as follows:
and generating a vector representation result of the corresponding low-frequency word by using the vector representation result of each subword of the corresponding low-frequency word.
5. The method according to claim 4, wherein the generating a vector characterization result of the corresponding low-frequency word by using the vector characterization result of each subword of the corresponding low-frequency word comprises:
reversely scanning each subword of the corresponding low-frequency word by using a neural network, to obtain a vector representation result of the first subword among the subwords;
and taking the vector representation result of the first subword as the vector representation result of the corresponding low-frequency word.
6. The method of any one of claims 1 to 3, wherein the vector characterization result of the translated version of the corresponding low-frequency word is generated as follows:
and generating a vector representation result of the translation of the corresponding low-frequency word by using the vector representation result of each sub-word of the translation of the corresponding low-frequency word.
7. The method of claim 6, wherein generating the vector characterization result of the translated version of the corresponding low-frequency word by using the vector characterization result of each subword of the translated version of the corresponding low-frequency word comprises:
reversely scanning each subword of the translation of the corresponding low-frequency word by using a neural network, to obtain a vector representation result of the first subword among the subwords;
and taking the vector representation result of the first subword as the vector representation result of the translation of the corresponding low-frequency word.
8. The method according to any one of claims 1 to 3, wherein the vector representation result of the wildcard carries the context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
9. A low-frequency word translation apparatus, comprising:
the characterization result generation unit is used for generating a combined characterization result corresponding to each low-frequency word in the source text; the combined representation result comprises a vector representation result corresponding to the low-frequency word and/or a vector representation result corresponding to a translation of the low-frequency word, and a vector representation result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and the low-frequency word translation unit is used for translating the source text according to the combined representation result corresponding to each low-frequency word to obtain a target text.
10. The apparatus of claim 9, wherein the low frequency word translation unit comprises:
the vector fusion subunit is used for performing vector fusion on each vector representation result in the combined representation result corresponding to each low-frequency word to obtain a final representation result of the low-frequency word;
and the low-frequency word translation subunit is used for translating the source text according to the respective final representation result of each low-frequency word.
11. A low-frequency word translation apparatus, comprising: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is to store one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the method of any of claims 1-8.
12. A computer-readable storage medium having stored therein instructions that, when executed on a terminal device, cause the terminal device to perform the method of any one of claims 1-8.
13. A computer program product, characterized in that it, when run on a terminal device, causes the terminal device to perform the method of any of claims 1-8.
CN201910020175.1A 2019-01-09 2019-01-09 Low-frequency word translation method and device Active CN111428518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910020175.1A CN111428518B (en) 2019-01-09 2019-01-09 Low-frequency word translation method and device

Publications (2)

Publication Number Publication Date
CN111428518A true CN111428518A (en) 2020-07-17
CN111428518B CN111428518B (en) 2023-11-21

Family

ID=71546099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910020175.1A Active CN111428518B (en) 2019-01-09 2019-01-09 Low-frequency word translation method and device

Country Status (1)

Country Link
CN (1) CN111428518B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1643511A (en) * 2002-03-11 2005-07-20 南加利福尼亚大学 Named entity translation
CN101194253A (en) * 2005-06-14 2008-06-04 微软公司 Collocation translation from monolingual and available bilingual corpora
US20070150260A1 (en) * 2005-12-05 2007-06-28 Lee Ki Y Apparatus and method for automatic translation customized for documents in restrictive domain
US20100332217A1 (en) * 2009-06-29 2010-12-30 Shalom Wintner Method for text improvement via linguistic abstractions
CN101833550A (en) * 2010-03-10 2010-09-15 无锡市百川科技有限公司 Method and device for intercepting English display information at real time and translating thereof into Chinese
CN103189859A (en) * 2010-08-26 2013-07-03 谷歌公司 Conversion of input text strings
US20140163951A1 (en) * 2012-12-07 2014-06-12 Xerox Corporation Hybrid adaptation of named entity recognition
CN104346459A (en) * 2014-11-10 2015-02-11 南京信息工程大学 Text classification feature selecting method based on term frequency and chi-square statistics
US20170103062A1 (en) * 2015-10-08 2017-04-13 Facebook, Inc. Language independent representations
JP2018055671A (en) * 2016-09-21 2018-04-05 パナソニックIpマネジメント株式会社 Paraphrase identification method, paraphrase identification device, and paraphrase identification program
CN106446230A (en) * 2016-10-08 2017-02-22 国云科技股份有限公司 Method for optimizing word classification in machine learning text
CN108228574A (en) * 2017-12-07 2018-06-29 科大讯飞股份有限公司 Text translation processing method and device
CN108170686A (en) * 2017-12-29 2018-06-15 科大讯飞股份有限公司 Text interpretation method and device
CN108228576A (en) * 2017-12-29 2018-06-29 科大讯飞股份有限公司 Text interpretation method and device
CN108763221A (en) * 2018-06-20 2018-11-06 科大讯飞股份有限公司 A kind of attribute-name characterizing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN Huaixing et al., "A Method for Extracting Translation Equivalent Pairs of Named Entities", Journal of Chinese Information Processing *
CHEN Huaixing et al., "A Method for Extracting Translation Equivalent Pairs of Named Entities", Journal of Chinese Information Processing, No. 04, 15 July 2008 (2008-07-15) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274826A (en) * 2020-01-19 2020-06-12 南京新一代人工智能研究院有限公司 Semantic information fusion-based low-frequency word translation method
CN111274826B (en) * 2020-01-19 2021-02-05 南京新一代人工智能研究院有限公司 Semantic information fusion-based low-frequency word translation method
CN112560510A (en) * 2020-12-10 2021-03-26 科大讯飞股份有限公司 Translation model training method, device, equipment and storage medium
CN112560510B (en) * 2020-12-10 2023-12-01 科大讯飞股份有限公司 Translation model training method, device, equipment and storage medium
CN113051936A (en) * 2021-03-16 2021-06-29 昆明理工大学 Method for enhancing Hanyue neural machine translation based on low-frequency word representation

Also Published As

Publication number Publication date
CN111428518B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN109086267B (en) Chinese word segmentation method based on deep learning
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
US10255275B2 (en) Method and system for generation of candidate translations
CN106502985B (en) neural network modeling method and device for generating titles
CN112733541A (en) Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN109614471B (en) Open type problem automatic generation method based on generation type countermeasure network
CN107967262A A neural-network-based Mongolian-Chinese machine translation method
Lin et al. Automatic translation of spoken English based on improved machine learning algorithm
CN111666758B (en) Chinese word segmentation method, training device and computer readable storage medium
CN110162766B (en) Word vector updating method and device
CN110532555B (en) Language evaluation generation method based on reinforcement learning
CN114757182A (en) BERT short text sentiment analysis method for improving training mode
US11475225B2 (en) Method, system, electronic device and storage medium for clarification question generation
CN111428518B (en) Low-frequency word translation method and device
WO2023134083A1 (en) Text-based sentiment classification method and apparatus, and computer device and storage medium
CN111767718A (en) Chinese grammar error correction method based on weakened grammar error feature representation
Wu et al. An effective approach of named entity recognition for cyber threat intelligence
CN114428850A (en) Text retrieval matching method and system
CN114064856A (en) XLNET-BiGRU-based text error correction method
CN111274826B (en) Semantic information fusion-based low-frequency word translation method
Mathur et al. A scaled‐down neural conversational model for chatbots
CN114218928A (en) Abstract text summarization method based on graph knowledge and theme perception
CN111353040A (en) GRU-based attribute level emotion analysis method
CN111274827B Suffix translation method based on bag-of-words multi-objective learning
CN112579739A Reading comprehension method based on ELMo embedding and gated self-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant