CN111428518B - Low-frequency word translation method and device - Google Patents


Info

Publication number
CN111428518B
CN111428518B
Authority
CN
China
Prior art keywords
low-frequency word, characterization result, word, vector
Prior art date
Legal status
Active
Application number
CN201910020175.1A
Other languages
Chinese (zh)
Other versions
CN111428518A (en)
Inventor
张学强
刘俊华
魏思
王智国
胡国平
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201910020175.1A
Publication of CN111428518A
Application granted
Publication of CN111428518B
Legal status: Active

Landscapes

  • Machine Translation (AREA)

Abstract

The application discloses a low-frequency word translation method and device. After the source text to be translated is obtained, a combination characterization result is generated for each low-frequency word in the source text. For each low-frequency word, the combination characterization result comprises the vector characterization result of the low-frequency word and/or the vector characterization result of the translation of the low-frequency word, together with the vector characterization result of the wildcard after the low-frequency word is replaced with the wildcard. The source text is then translated according to the combination characterization results corresponding to the low-frequency words, so as to obtain the target text. In this way, when the source text is translated, not only the vector characterization result of the wildcard that replaces each low-frequency word is considered, but also the vector characterization result of the low-frequency word and/or its translation, which improves the completeness of the semantic information of the source text and, in turn, the fluency of the translation result.

Description

Low-frequency word translation method and device
Technical Field
The application relates to the technical field of machine translation, in particular to a low-frequency word translation method and device.
Background
With the continuous development of technology, machine translation has become an important research topic for enabling communication between different language groups, and the quality of low-frequency word translation directly affects the progress of machine translation technology toward practical and industrial application. Low-frequency words are a class of words that occur sparsely, or never, in a large bilingual parallel corpus; depending on the degree of sparsity, they are also commonly referred to in natural language processing as unknown words (UNK) or out-of-vocabulary words (OOV). Because low-frequency words are characterized by sparse frequency and translation uniqueness, low-frequency word translation has always been a key and difficult point in machine translation research.
In the existing low-frequency word translation method, the low-frequency words in the source text are first converted into wildcards, the source text with wildcards is then translated to obtain the target text, and finally the wildcards in the target text are replaced with the originally corresponding low-frequency words, forming the final complete translation. Although this method allows the low-frequency words to be translated, directly converting a low-frequency word into a wildcard makes the semantic information of the source text incomplete, so the target text obtained after translation is not smooth enough; that is, translation fluency is reduced.
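The prior-art replace-translate-restore pipeline described above can be sketched as follows; the token names, UNK numbering scheme, and dictionary used here are illustrative, not taken from the patent:

```python
def replace_low_freq(tokens, vocab, wildcard="UNK"):
    """Replace tokens outside the known vocabulary with numbered wildcards.

    Returns the replaced token list and a mapping from each wildcard back to
    the original word, mirroring the prior-art pipeline the patent improves on.
    """
    mapping = {}
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
        else:
            marker = f"{wildcard}{len(mapping) + 1}"
            mapping[marker] = tok
            out.append(marker)
    return out, mapping


def restore_low_freq(translated_tokens, mapping, bilingual_dict):
    """Swap each wildcard in the target text back to a translation of the
    original low-frequency word (falling back to the word itself)."""
    return [bilingual_dict.get(mapping[t], mapping[t]) if t in mapping else t
            for t in translated_tokens]
```

The semantic-information loss the text criticizes is visible here: between `replace_low_freq` and `restore_low_freq`, the translation model only ever sees the bare `UNK1` marker.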
Disclosure of Invention
The embodiment of the application mainly aims to provide a low-frequency word translation method and device, which can improve the fluency of a translation result when translating a text to which a low-frequency word belongs.
The embodiment of the application provides a low-frequency word translation method, which comprises the following steps:
generating a combination characterization result corresponding to each low-frequency word in the source text; the combination characterization result comprises a vector characterization result of the corresponding low-frequency word and/or a vector characterization result of the translation of the low-frequency word, and a vector characterization result of the wildcard after the corresponding low-frequency word is replaced with the wildcard;
and translating the source text according to the combination characterization results respectively corresponding to the low-frequency words to obtain a target text.
Optionally, the translating the source text according to the combination characterization result corresponding to each low-frequency word includes:
for each low-frequency word, vector fusion is carried out on each vector characterization result in the combination characterization results corresponding to the low-frequency word, and a final characterization result of the low-frequency word is obtained;
and translating the source text according to the final characterization result of each low-frequency word.
Optionally, the vector fusion is performed on each vector characterization result in the combination characterization results corresponding to the low-frequency word, so as to obtain a final characterization result of the low-frequency word, which includes:
Weighting calculation is carried out on each vector characterization result in the combination characterization results corresponding to the low-frequency words, and a weighting calculation result is obtained;
and carrying out nonlinear transformation on the weighted calculation result to obtain a final characterization result of the low-frequency word.
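The weighting-plus-nonlinear-transformation fusion named in the two steps above can be sketched as follows; the scalar weights, matrix `W`, and bias `b` are illustrative placeholders, since the patent does not fix their values or dimensions:

```python
import numpy as np

def fuse_characterizations(vectors, weights, W, b):
    """Weighted combination of the vector characterization results in a
    combination characterization result, followed by a tanh non-linearity,
    yielding the final characterization result of a low-frequency word."""
    weighted = sum(w * v for w, v in zip(weights, vectors))  # weighting calculation
    return np.tanh(W @ weighted + b)                         # nonlinear transformation
```

tanh is chosen here because the description later shows a tanh function image (FIG. 5); any squashing non-linearity would fit the claim language.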
Optionally, the vector characterization result of the corresponding low-frequency word is generated in the following manner:
and generating a vector characterization result corresponding to the low-frequency word by using the vector characterization result corresponding to each sub word of the low-frequency word.
Optionally, the generating the vector characterization result of the corresponding low-frequency word by using the vector characterization result of each sub word of the corresponding low-frequency word includes:
reversely scanning each sub word corresponding to the low-frequency word by using the neural network to obtain a vector characterization result of a first sub word in each sub word;
and taking the vector characterization result of the first subword as the vector characterization result of the corresponding low-frequency word.
Optionally, the vector characterization result of the translated version of the corresponding low-frequency word is generated in the following manner:
and generating a vector characterization result of the translation corresponding to the low-frequency word by using the vector characterization result of each sub word of the translation corresponding to the low-frequency word.
Optionally, the generating the vector characterization result of the translated version of the corresponding low-frequency word by using the vector characterization result of each sub word of the translated version of the corresponding low-frequency word includes:
Reversely scanning each sub word of the translated version of the corresponding low-frequency word by utilizing the neural network to obtain a vector characterization result of a first sub word in each sub word;
and taking the vector characterization result of the first subword as the vector characterization result of the translation corresponding to the low-frequency word.
Optionally, the vector characterization result of the wild card carries context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
The embodiment of the application also provides a low-frequency word translation device, which comprises:
the characterization result generation unit is used for generating a combination characterization result corresponding to each low-frequency word in the source text; the combination characterization result comprises a vector characterization result of the corresponding low-frequency word and/or a vector characterization result of the translation of the low-frequency word, and a vector characterization result of the wildcard after the corresponding low-frequency word is replaced with the wildcard;
and the low-frequency word translation unit is used for translating the source text according to the combination characterization results respectively corresponding to the low-frequency words to obtain a target text.
Optionally, the low-frequency word translation unit includes:
vector fusion subunit, configured to, for each low-frequency word, perform vector fusion on each vector characterization result in the combination characterization results corresponding to the low-frequency word, so as to obtain a final characterization result of the low-frequency word;
And the low-frequency word translation subunit is used for translating the source text according to the respective final characterization result of each low-frequency word.
Optionally, the vector fusion subunit includes:
the weighting calculation subunit is used for carrying out weighting calculation on each vector characterization result in the combination characterization results corresponding to the low-frequency words to obtain a weighting calculation result;
and the nonlinear transformation subunit is used for carrying out nonlinear transformation on the weighted calculation result to obtain a final characterization result of the low-frequency word.
Optionally, the characterization result generating unit is specifically configured to generate a vector characterization result corresponding to the low-frequency word by using a vector characterization result of each sub-word corresponding to the low-frequency word.
Optionally, the characterization result generating unit includes:
the first scanning subunit is used for reversely scanning each sub-word corresponding to the low-frequency word by utilizing the neural network to obtain a vector characterization result of a first sub-word in each sub-word;
the first generation subunit is configured to use the vector representation result of the first subword as a vector representation result of the corresponding low-frequency word.
Optionally, the characterization result generating unit is specifically configured to generate a vector characterization result of the translated version of the corresponding low-frequency word by using a vector characterization result of each sub word of the translated version of the corresponding low-frequency word.
Optionally, the characterization result generating unit includes:
the second scanning subunit is used for reversely scanning each sub-word of the translation corresponding to the low-frequency word by utilizing the neural network to obtain a vector characterization result of a first sub-word in each sub-word;
and the second generation subunit is used for taking the vector characterization result of the first subword as the vector characterization result of the translation corresponding to the low-frequency word.
Optionally, the vector characterization result of the wild card carries context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
The embodiment of the application also provides low-frequency word translation equipment, comprising: a processor, a memory, and a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform any of the implementations of the low frequency word translation method described above.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores instructions, and when the instructions run on the terminal equipment, the terminal equipment is caused to execute any implementation mode of the low-frequency word translation method.
The embodiment of the application also provides a computer program product which, when run on terminal equipment, causes the terminal equipment to execute any implementation mode of the low-frequency word translation method.
In summary, according to the low-frequency word translation method and device provided by the embodiments of the application, after the source text to be translated is obtained, a combination characterization result corresponding to each low-frequency word in the source text can be generated. For each low-frequency word, the combination characterization result comprises the vector characterization result of the low-frequency word and/or of its translation, together with the vector characterization result of the wildcard after the low-frequency word is replaced with the wildcard; the source text is then translated according to the combination characterization results corresponding to the low-frequency words, so as to obtain the target text. Compared with the prior-art approach of directly converting each low-frequency word in the source text into a wildcard and then translating, the embodiments of the application consider not only the vector characterization result of the wildcard that replaces the low-frequency word but also the vector characterization result of the low-frequency word and/or its translation, thereby improving the completeness of the semantic information of the source text and, in turn, the fluency of the translation result.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a translation model based on RNN and Attention provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of a low-frequency word translation method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of reverse scanning of a subword sequence according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of translating a source text according to the combination characterization result corresponding to each low-frequency word in the embodiment of the present application;
FIG. 5 is a schematic diagram of a function image of a tanh function according to an embodiment of the present application;
FIG. 6 is a diagram illustrating vector fusion of low-frequency words and wild cards according to an embodiment of the present application;
FIG. 7 is a diagram showing vector fusion of low-frequency translation and wild card according to an embodiment of the present application;
FIG. 8 is a diagram showing vector fusion of low-frequency words, low-frequency word translations and wild cards according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a low-frequency word translation device according to an embodiment of the present application.
Detailed Description
Before introducing the low-frequency word translation method provided by the embodiment of the application, the structure and function of the low-frequency word translation model that the embodiment of the application can use are first introduced.
The low-frequency word translation model used in the embodiment of the application can be a translation model based on a neural network and Attention mechanism (Attention), or a translation model based on Attention completely, etc. The type of neural network used in the translation model is not limited in the embodiment of the present application, and may be, for example, a recurrent neural network (Recurrent Neural Network, RNN), a convolutional neural network (Convolutional Neural Network, CNN), or the like.
It should be noted that, for convenience of description, in the embodiment of the present application, a text to be translated is defined as a source text, and a translation obtained after the translation of the source text is defined as a target text.
Taking a translation model based on RNN and Attention as an example, the working process of the translation model is described below. As shown in FIG. 1, the source text input to the model is x = (x_1, x_2, x_3, ..., x_m), and the target text output by the model is y = (y_1, y_2, y_3, ..., y_n), where the lengths of the source text and the target text are m and n respectively, i.e., m and n are the numbers of subwords contained in the source text and the target text.
The translation model includes three modules: a bidirectional-RNN-based encoding module (the Encoder module), an attention module (the Attention module), and an RNN-based decoding module (the Decoder module). The function of each module is described below.
1. Encoder module
The role of the Encoder module is to compute a characterization encoding for each word of the source text in its context. Specifically, for the source text x = (x_1, x_2, x_3, ..., x_m), the vector characterization result v_i of each word x_i can be obtained by word-vector table lookup. Then, from the vector characterization result of the i-th word, a forward recurrent neural network produces the characterization f_i of the i-th word conditioned on the historical vocabulary information it has seen, and a backward recurrent neural network produces the characterization b_i of the i-th word conditioned on the future vocabulary information it has seen. Finally, the two are concatenated as [f_i : b_i] to form the final characterization result h_i of the i-th word, where i = 1, 2, 3, ..., m.
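The bidirectional encoding just described can be sketched as follows; a plain tanh RNN stands in for whatever recurrent cell an actual implementation would choose, and all weight matrices are supplied by the caller:

```python
import numpy as np

def simple_rnn(xs, Wx, Wh, h0):
    """One plain RNN pass: h_i = tanh(Wx @ x_i + Wh @ h_{i-1})."""
    h, states = h0, []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
        states.append(h)
    return states

def bi_encode(word_vectors, Wx, Wh):
    """Bidirectional encoding: run the RNN forward (f_i, history) and
    backward (b_i, future) over the word vectors v_i, then concatenate
    h_i = [f_i : b_i] as the final characterization result of word i."""
    d = Wh.shape[0]
    forward = simple_rnn(word_vectors, Wx, Wh, np.zeros(d))
    backward = simple_rnn(word_vectors[::-1], Wx, Wh, np.zeros(d))[::-1]
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]
```

Note that each h_i has twice the hidden dimension because of the concatenation, matching the [f_i : b_i] splice in the text.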
The recurrent neural network may be a common RNN, or a modified structure thereof, such as a gated recurrent unit (Gated Recurrent Unit, GRU) or a Long Short-Term Memory (LSTM). For each word in the source text, the calculation of the characterization vector of the word uses both forward historical information and backward future information, so that the information representation can be better based on the context in which the word is located.
2. Attention module
The role of the Attention module is to calculate the information characterization c_i of the source text on which the i-th decoding moment depends. Assuming that the decoder hidden state at the previous moment is s_{i-1}, c_i is calculated as follows:

c_i = Σ_{j=1..m} α_{ij} h_j, where α_{ij} = exp(e_{ij}) / Σ_{k=1..m} exp(e_{ik}) and e_{ij} = a(s_{i-1}, h_j)

Here a(s_{i-1}, h_j) is an alignment scoring function of the variables s_{i-1} and h_j, for which there are a number of possible implementations.

It can be seen that the semantic characterization c_i of the source text generated at the i-th decoding moment is a weighted sum of the characterization results h_j of the words in the source text, where the weighting coefficient α_{ij} of each characterization result h_j reflects the degree of attention that the word receives at the current moment.
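The weighted-sum computation of c_i can be sketched as follows; the additive form of the scoring function a(s_{i-1}, h_j) is only one of the many possible implementations the text allows, and the parameter names `Wa`, `Ua`, `va` are illustrative:

```python
import numpy as np

def attention_context(s_prev, H, Wa, Ua, va):
    """Additive attention: e_j = va . tanh(Wa @ s_prev + Ua @ h_j),
    alpha = softmax(e), c = sum_j alpha_j * h_j.
    Returns the context vector c and the attention weights alpha."""
    e = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in H])
    alpha = np.exp(e - e.max())   # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ np.stack(H), alpha
```

When every h_j scores equally, alpha is uniform and c is simply the average of the encoder states, which matches the weighted-sum interpretation above.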
3. Decode module
The Decoder module generates the target text with an RNN, based on the source-text characterization c_i dynamically generated at each moment and the state s_{i-1} of the Decoder module at the previous moment. The specific computation is:

s_i = f(s_{i-1}, y_{i-1}, c_i)

P(y_i = V_k) = softmax(b_k(s_i))

where f(·) is an RNN-based transformation function, which may be a plain RNN structure or a GRU or LSTM structure incorporating a gating mechanism; P(y_i = V_k) is the probability that y_i is the k-th word in the target-language vocabulary, and b_k(s_i) is a transformation function associated with the k-th target word.
After the word-probability computation over the target-language vocabulary at each decoding moment is completed, the final decoded sequence, i.e., the target text y = (y_1, y_2, y_3, ..., y_n), can be obtained through, for example, the Beam Search algorithm, so that the output probability P(y|x) of the entire target text is maximized.
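A toy version of the Beam Search step can be sketched as follows; `step_probs` stands in for the Decoder's per-step distribution P(y_i = V_k), and the beam size and length limit are illustrative:

```python
import math

def beam_search(step_probs, beam_size=2, max_len=3):
    """Keep the beam_size prefixes with the highest cumulative log-probability
    at each step; return the best full-length sequence.
    step_probs(prefix) -> {token: probability}."""
    beams = [([], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            for tok, p in step_probs(seq).items():
                candidates.append((seq + [tok], logp + math.log(p)))
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_size]
    return beams[0][0]
```

Summing log-probabilities rather than multiplying raw probabilities is the usual way of maximizing P(y|x) without underflow.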
Next, a specific description will be given of the low-frequency word translation method provided by the embodiment of the present application.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
First embodiment
Referring to fig. 2, a flow chart of a low-frequency word translation method provided in this embodiment includes the following steps:
s201: and generating a combination characterization result corresponding to each low-frequency word in the source text.
In the present embodiment, text that needs to be translated is defined as source text.
It should be noted that, the embodiment does not limit the language of the source text, for example, the source text may be a chinese text, an english text, etc.; the present embodiment also does not limit the length of the source text, for example, the source text may be words, sentences, chapter level text, and the like.
It is understood that the source text may include one or more low frequency words based on the length of the source text. The embodiment does not limit the type of the low-frequency word, and may be a named entity, a complex noun phrase, a term of art, or the like.
In this embodiment, for each low-frequency word in the source text, a corresponding combination characterization result may be generated. The combination characterization result may include the vector characterization result of the low-frequency word and/or the vector characterization result of the translation of the low-frequency word, together with the vector characterization result of the wildcard UNKi after the low-frequency word is replaced with the wildcard UNKi.
Specifically, each low-frequency word in the source text is first replaced with a wildcard UNKi. On this basis, when the source text is translated in the following step S202, the vector characterization results of the low-frequency word and of its wildcard UNKi may be considered together; or the vector characterization results of the low-frequency word's translation in the target language (i.e., the language translated into) and of the wildcard UNKi may be considered together; or the vector characterization results of the low-frequency word, its target-language translation, and the wildcard UNKi may all be considered together. That is, the wildcard UNKi of the low-frequency word and the semantic information of the low-frequency word and/or its translation are comprehensively considered when translating the source text, which can effectively improve the accuracy and fluency of the translation result.
When the vector characterization result of the low-frequency word's translation is needed for translating the source text, the low-frequency word itself must first be translated, for which a strategy combining dictionary lookup and model translation can be adopted. In particular, bilingual low-frequency words are difficult to model because their internal statistical regularities are relatively fuzzy and they occur very rarely in the training corpus. Therefore, dictionary lookup is used preferentially to translate a low-frequency word, and if the lookup fails, a customized low-frequency word translation model is used instead.
Regarding the dictionary lookup, a low-frequency word vocabulary (including entities, terms, proper nouns, and the like) can be constructed in advance according to the specific application scenario and field of the machine translation; as long as the low-frequency word to be looked up hits the vocabulary, the returned translation is guaranteed to be completely correct and appropriate to the specific context.
However, low-frequency word translation by dictionary lookup depends on the scale of the constructed vocabulary, and a lookup may fail to hit. In this case, a customized low-frequency word translation model may be used, and the target translation with the highest probability output by the model is taken as the translation of the low-frequency word. Specifically, a character-level model may be used as the low-frequency word translation model to address the fact that low-frequency words are short and rare: character-level modeling raises the frequency of the modeling units in the training data, which greatly improves translation performance on low-frequency words. For example, taking "Nanjing Yangtze Bridge" (南京长江大桥) and its translation "Nanjing Yangtze River Bridge" as an example, the bilingual pair in the character-level model is segmented into finer-grained units (the target side appearing as, for example, "Nanjing Yangtze Ri Ver Bridge").
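The dictionary-first, model-fallback strategy described above can be sketched as follows; the lexicon contents are illustrative, and `model_translate` is any callable standing in for the customized low-frequency word translation model:

```python
def translate_low_freq(word, lexicon, model_translate):
    """Dictionary-first strategy: return the lexicon hit when the low-frequency
    word is in the pre-built vocabulary, otherwise fall back to the customized
    translation model."""
    hit = lexicon.get(word)
    return hit if hit is not None else model_translate(word)
```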
Next, we describe how to generate the combination characterization result corresponding to each low-frequency word in the source text, that is, for each low-frequency word: how to generate the vector characterization result of the low-frequency word, how to generate the vector characterization result of the translation of the low-frequency word, and how to generate the vector characterization result of the wildcard corresponding to the low-frequency word. The details are as follows:
1. generating a vector characterization result of the low frequency word in the following way
In one implementation manner of this embodiment, for each low-frequency word, the vector characterization result of each sub-word of the low-frequency word may be used to generate the vector characterization result of the low-frequency word.
In this implementation, the low-frequency word may be segmented into subwords. The number of characters in a subword is not limited: a subword may contain one character or several, and each subword may itself be high-frequency or low-frequency in the source language (i.e., the language of the source text). For example, suppose the word "机器人" (robot) is a low-frequency word. The characters "机" and "器" frequently co-occur as "机器" (machine) in a large-scale corpus, but the three characters "机", "器" and "人" rarely co-occur together, so "机器人" can be split into the subwords "机器" and "人" (person), both of which are high-frequency. When the subwords of a low-frequency word are high-frequency, the model can learn good vector characterization results for them, so that the vector characterization result of the low-frequency word generated from the vector characterization results of its subwords accurately represents the semantic information of the low-frequency word.
More specifically, in this implementation manner, each sub-word of the low-frequency word may be reversely scanned by using the neural network, so as to obtain a vector characterization result of a first sub-word in each sub-word, and the vector characterization result of the first sub-word is used as the vector characterization result of the low-frequency word.
In actual implementation, a Long Short-Term Memory (LSTM) network may be used to reversely scan the sequence of the subwords of the low-frequency words, so that the first subword of the sequence of the subwords occupies important information in the vector characterization result of the low-frequency words. The LSTM network is good at modeling natural language in machine translation, can convert text with any length into floating point number vectors with specific dimensions, memorizes important words in the text, and can keep memory for a long time; the LSTM is a special structure type of a cyclic neural network (Recurrent Neural Network, RNN) model, three control units of an input gate, an output gate and a forget gate are added, and the three gates can judge information entering the LSTM network and then determine the proportion of the information to be memorized, forgotten and output, so that the problem of long-distance dependence in the neural network can be effectively solved.
Based on the above, if the LSTM network is adopted to reversely scan the sub word sequence of the low frequency word, the maximum effect of the first sub word in the vector characterization result of the low frequency word can be ensured, so that the connection between the translation result of the low frequency word and the context of the translation result is ensured to be more smooth.
For example, a schematic diagram of a reverse scan subword sequence shown in FIG. 3 is scanned with the low frequency word S unk2 (x 6 ,x 7 ,x 8 ) Vector characterization resultsFor example, where x 6 、x 7 And x 8 Three subwords for the low frequency word, < ->Vector characterization results for these three subwords in turn. Then, add +_one by one>Inputting the data into an LSTM network for coding, wherein the calculation formula is as follows:
Finally, the output vector of the LSTM network is the vector characterization result of the low-frequency word S_unk2(x6, x7, x8), in which the first sub-word x6 plays the largest role.
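As a minimal illustrative sketch (not the patent's actual implementation), the reverse scan above can be expressed with a single LSTM cell; the random weights below merely stand in for trained parameters, and the dimension is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # illustrative embedding / hidden size

# Random weights standing in for a trained LSTM's parameters.
W = {g: rng.standard_normal((d, 2 * d)) * 0.1 for g in "ifco"}
b = {g: np.zeros(d) for g in "ifco"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    z = np.concatenate([x, h])
    i = sigmoid(W["i"] @ z + b["i"])   # input gate
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate
    o = sigmoid(W["o"] @ z + b["o"])   # output gate
    g = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def reverse_scan(subword_vectors):
    """Scan v_n, ..., v_1 so the FIRST sub-word is encoded last."""
    h = np.zeros(d)
    c = np.zeros(d)
    for v in reversed(subword_vectors):
        h, c = lstm_step(v, h, c)
    return h  # vector characterization of the low-frequency word

v6, v7, v8 = (rng.standard_normal(d) for _ in range(3))
s_unk2 = reverse_scan([v6, v7, v8])  # x6 processed last -> largest influence
print(s_unk2.shape)  # (4,)
```

Because the first sub-word's vector enters the cell last, its contribution to the final hidden state is attenuated the least, matching the effect described above.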
2. Generating a vector characterization result of the translated version of the low frequency word in the following way
In one implementation of this embodiment, for each low-frequency word, the vector characterization result of the translated version of the low-frequency word may be generated from the vector characterization results of the sub-words of that translated version. More specifically, the sub-words of the translated version may be scanned in reverse order by a neural network to obtain the vector characterization result of the first sub-word among them, and the vector characterization result of the first sub-word is used as the vector characterization result of the translated version of the low-frequency word.
In this implementation, the vector characterization result of the translated version of the low-frequency word may be generated in the same manner as in method "one" above, with the "low-frequency word" replaced by the "translated version of the low-frequency word" and each sub-word of the "low-frequency word" replaced by each sub-word of the "translated version of the low-frequency word"; the specific steps are not repeated here.
3. Generating a vector characterization result of the wildcard corresponding to the low-frequency word according to the following way
In one implementation manner of this embodiment, for each low-frequency word, the vector representation result of the wildcard symbol corresponding to the low-frequency word carries the context semantic information of the sample corpus to which the low-frequency word belongs.
In this implementation, a pre-constructed low-frequency word translation model may be used to translate the source text. Training this model requires a large amount of model training data containing many sample texts, one or more of which may include a low-frequency word of the source text. After the low-frequency word is replaced by its corresponding wildcard UNKi, the word vectors of the wildcard UNKi and of the other words in the sample text to which it belongs can be initialized, and the low-frequency word translation model is trained based on these word vectors. During training, the vector characterization result of the wildcard UNKi is continuously updated, so that it carries the contextual semantic information of the sample text to which the wildcard UNKi belongs; after training ends, the vector characterization result of the wildcard UNKi is fixed.
Based on the above, when the vector characterization result of the wildcard UNKi corresponding to a low-frequency word in the source text needs to be generated, the vector characterization result of the wildcard UNKi obtained at the end of model training can be used directly.
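A hedged sketch of this lookup, with made-up embedding values: once training ends, each wildcard's vector characterization result is fixed and can simply be retrieved from a table.

```python
# Illustrative only: fixed wildcard embeddings as they might look after
# training. The values and dimension are invented for the example.
trained_embeddings = {
    "UNK1": [0.12, -0.33, 0.05],  # carries context semantics of its corpus
    "UNK2": [-0.08, 0.21, 0.44],
}

def wildcard_vector(wildcard_id):
    """Return the fixed, trained embedding of wildcard UNKi."""
    return trained_embeddings[f"UNK{wildcard_id}"]

print(wildcard_vector(2))  # [-0.08, 0.21, 0.44]
```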
S202: Translate the source text according to the combination characterization results respectively corresponding to the low-frequency words to obtain the target text.
In this embodiment, after the combination characterization results corresponding to the low-frequency words are obtained in step S201, the source text may be translated based on these combination characterization results and the vector characterization results of the other words in the source text (the words other than the low-frequency words); the translated version of the source text is defined as the target text.
For the words in the source text other than the low-frequency words, a word vector generation method may be used to generate their vector characterization results directly. Alternatively, their vector characterization results may be generated in the same manner as the wildcard vector characterization results above, i.e., obtained during the training of the low-frequency word translation model, so that for each such word, its vector characterization result carries the contextual semantic information of the sample corpus to which the word belongs.
It should be noted that, when translating the source text, an existing or future translation model may be used, such as the RNN- and Attention-based translation model shown in fig. 1, a CNN- and Attention-based translation model, or a purely Attention-based translation model. In addition, the language of the target text is not limited in this embodiment; for example, if the source text is Chinese, the target text may be English.
It should be further noted that, the specific implementation of this step S202 will be specifically described in the second embodiment.
In summary, in the low-frequency word translation method provided in this embodiment, after the source text to be translated is obtained, a combination characterization result corresponding to each low-frequency word in the source text may be generated. For each low-frequency word, the combination characterization result includes a vector characterization result of the low-frequency word and/or a vector characterization result of its translated version, together with the vector characterization result of the wildcard that replaces the low-frequency word. The source text is then translated according to the combination characterization results corresponding to the low-frequency words to obtain the target text. Therefore, compared with the prior-art approach of directly converting a low-frequency word in the source text into a wildcard and then translating the wildcard, this embodiment considers, when translating the source text, not only the vector characterization result of the wildcard that replaces the low-frequency word but also the vector characterization result of the low-frequency word and/or its translated version, which improves the completeness of the semantic information of the source text and thus the fluency of the translation result.
In addition, existing low-frequency word translation methods tend to omit translations when a sentence contains multiple low-frequency words (three or more); the low-frequency word translation method adopted in this embodiment can greatly reduce this omission rate and make the context of the translated target text more fluent.
Second embodiment
It should be noted that, in this embodiment, a specific implementation manner of step S202 in the above-mentioned first embodiment will be described.
Referring to fig. 4, a flow chart of translating a source text according to a combination characterization result corresponding to each low-frequency word provided in this embodiment is shown, and the method includes the following steps:
s401: and for each low-frequency word, vector fusion is carried out on each vector characterization result in the combination characterization results corresponding to the low-frequency word, and a final characterization result of the low-frequency word is obtained.
In this embodiment, two or three vector characterization results in the combined characterization result of each low-frequency word may be subjected to vector fusion, and a final characterization result of each low-frequency word is obtained through vector fusion.
For illustration, assume that the source text is x, the target text is y, and the sub-word sequences contained in the source text x and the target text y are:

x = (x1, x2, x3, x4, x5, x6, x7, x8, x9, ..., xm)

y = (y1, y2, y3, y4, y5, y6, y7, y8, y9, ..., yn)

Assume that word-frequency statistics identify two low-frequency words to be processed in the source text x, namely the sub-word sequences S_unk1(x2, x3) and S_unk2(x6, x7, x8). Their translations can be obtained by dictionary lookup or model translation as described in the first embodiment; suppose the translation of the low-frequency word S_unk1(x2, x3) is the sub-word sequence T_unk1(y2, y3, y4) in the target text y, and the translation of the low-frequency word S_unk2(x6, x7, x8) is the sub-word sequence T_unk2(y7, y8, y9).
When the wildcards UNKi are used to replace the low-frequency words and their translations in the bilingual sentence pair (x, y), a new bilingual sentence pair (x̂, ŷ) is obtained:

x̂ = (x1, u1, x4, x5, u2, x9, ..., xm)

ŷ = (y1, u1, y5, y6, u2, ..., yn)

where u1 is the wildcard replacing the low-frequency word S_unk1(x2, x3), and u2 is the wildcard replacing the low-frequency word S_unk2(x6, x7, x8).
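The substitution step can be sketched as follows; the token strings are placeholders standing in for actual sub-words, not part of the patent:

```python
def replace_span(tokens, span, wildcard):
    """Replace each occurrence of the contiguous sub-word span with one wildcard token."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i:i + len(span)] == list(span):
            out.append(wildcard)
            i += len(span)
        else:
            out.append(tokens[i])
            i += 1
    return out

x = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"]
x_hat = replace_span(x, ("x2", "x3"), "u1")          # S_unk1 -> u1
x_hat = replace_span(x_hat, ("x6", "x7", "x8"), "u2")  # S_unk2 -> u2
print(x_hat)  # ['x1', 'u1', 'x4', 'x5', 'u2', 'x9']
```

The same routine applied to the target side with the translation spans yields the wildcard-marked sentence pair used for training.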
At this time, there are three vector fusion approaches:
Fusion mode 1: the vector characterization result of the low-frequency word S_unk1(x2, x3) in the source text x is fused with the vector characterization result of the wildcard u1 to obtain the final characterization result of S_unk1(x2, x3); similarly, the vector characterization result of the low-frequency word S_unk2(x6, x7, x8) is fused with the vector characterization result of the wildcard u2 to obtain the final characterization result of S_unk2(x6, x7, x8).

Fusion mode 2: the vector characterization result of the translation T_unk1(y2, y3, y4) of the low-frequency word S_unk1(x2, x3) in the source text x is fused with the vector characterization result of the wildcard u1 to obtain the final characterization result of S_unk1(x2, x3); similarly, the vector characterization result of the translation T_unk2(y7, y8, y9) of the low-frequency word S_unk2(x6, x7, x8) is fused with the vector characterization result of the wildcard u2 to obtain the final characterization result of S_unk2(x6, x7, x8).

Fusion mode 3: the vector characterization result of the low-frequency word S_unk1(x2, x3), the vector characterization result of its translation T_unk1(y2, y3, y4), and the vector characterization result of the wildcard u1 are fused together to obtain the final characterization result of S_unk1(x2, x3); similarly, the vector characterization results of the low-frequency word S_unk2(x6, x7, x8), its translation T_unk2(y7, y8, y9), and the wildcard u2 are fused together to obtain the final characterization result of S_unk2(x6, x7, x8).
In one implementation manner of this embodiment, this step S401 may specifically include: and for each low-frequency word, carrying out weighted calculation on each vector characterization result in the combination characterization result corresponding to the low-frequency word to obtain a weighted calculation result, and carrying out nonlinear transformation on the weighted calculation result to obtain a final characterization result of the low-frequency word.
In this implementation, the nonlinear transformation is mainly realized by an activation function. Activation functions play a very important role in enabling an artificial neural network model to learn and represent complex nonlinear mappings; the hyperbolic tangent function tanh can be used as the activation function, with the calculation formula:

tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
the functional image of the hyperbolic tangent function tanh is shown in fig. 5.
Based on the above, for each low-frequency word in the source text, after weighting calculation is performed on two or three vector characterization results in the combination characterization results corresponding to the low-frequency word, the weighting calculation results can be subjected to nonlinear transformation by adopting a hyperbolic tangent function tanh, and the transformation results are used as final characterization results of the low-frequency word.
The nonlinear transformation by adopting the hyperbolic tangent function tanh mainly comprises the following two advantages:
first, when the value of input x is large or small, the slope of tanh will approach zero indefinitely. That is, if and only if the weighted calculation result corresponding to the low-frequency word is within a certain range, a larger gradient exists after the tan h function is subjected to nonlinear transformation; on the contrary, when the weighted calculation result corresponding to the low-frequency word exceeds a certain range, the tanh function does not respond more because the tanh function is approaching saturation, and at this time, the tanh function is used for shielding the weighted calculation result serving as an abnormal value, and in the training process of the low-frequency word translation model, the influence of certain bias on the output of the subsequent coding result and the updating of the context vector due to the fact that the weighted calculation result corresponding to the low-frequency word in the sample text (belonging to model training data) is too large or too small is avoided based on the shielding effect of the tanh function.
Second, as can be seen from the tanh function image shown in fig. 5, the function produces a smooth output, is easy to differentiate, and its output is centered on 0 and lies within (-1, 1). These characteristics make parameter updates with the tanh function efficient, and the weighted calculation result corresponding to the low-frequency word is rescaled into a fixed range after the nonlinear transformation, which is particularly important in matrix-operation-intensive settings such as neural networks.
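Both properties can be checked numerically; the short sketch below evaluates tanh and its slope 1 - tanh(z)^2 at a few points to illustrate the bounded, zero-centered output and the saturation of the gradient:

```python
import math

def tanh(z):
    # Hyperbolic tangent: (e^z - e^(-z)) / (e^z + e^(-z))
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

# Output is centered at 0 and bounded in (-1, 1); large-magnitude
# inputs saturate, which masks outlier weighted sums.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"{z:6.1f} -> {tanh(z):+.4f}")

# The derivative 1 - tanh(z)^2 vanishes as |z| grows.
print(1 - tanh(10.0) ** 2)
```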
Therefore, for each low-frequency word in the source text, each vector characterization result in the combination characterization result corresponding to the low-frequency word can be subjected to vector fusion in a mode of weighting calculation and nonlinear transformation, so that the final characterization result of the low-frequency word is obtained. Next, the above three vector fusion modes will be specifically described based on the above examples.
For the above fusion mode 1, as shown in the vector fusion diagram of low-frequency words and wildcards in fig. 6, assume that the source text x contains two low-frequency words, S_unk1(x2, x3) and S_unk2(x6, x7, x8).
Assume that the vector characterization results of the sub-word sequence of the low-frequency word S_unk1(x2, x3) are S_unk1(v2, v3); from S_unk1(v2, v3), the vector characterization result s_unk1 of the low-frequency word S_unk1(x2, x3) can be generated (e.g., by reverse scanning with the LSTM as mentioned in the first embodiment). In addition, the method described in the first embodiment can be used to obtain the vector characterization result e_u1 of the wildcard u1 corresponding to S_unk1(x2, x3), i.e., the vector characterization result of u1 updated during the training of the low-frequency word translation model. Then, by weighted calculation and nonlinear transformation, e_u1 and s_unk1 are fused:

V1 = tanh(W_u,unk1 · e_u1 + W_s,unk1 · s_unk1)

where W_u,unk1 and W_s,unk1 are weights.
It can be seen that the final characterization result V1 of the low-frequency word S_unk1(x2, x3) contains not only the contextual semantic information of the sample corpus (training data of the low-frequency word translation model) carried in the vector characterization result of its wildcard u1, but also the semantic information of the low-frequency word S_unk1(x2, x3) itself. Therefore, after V1 is fed into the encoding layer of the low-frequency word translation model, the correspondingly generated vector characterization result is more accurate and more complete in the information it contains. Furthermore, this vector characterization result plays a positive role in the subsequent Attention module and Decoder module, and thus more accurate low-frequency word translations and source text translations (i.e., target texts) can be generated.
Similarly, assume that the vector characterization results of the sub-word sequence of the low-frequency word S_unk2(x6, x7, x8) are S_unk2(v6, v7, v8); from S_unk2(v6, v7, v8), the vector characterization result s_unk2 of the low-frequency word S_unk2(x6, x7, x8) can be generated (e.g., by reverse scanning with the LSTM as mentioned in the first embodiment). In addition, the method described in the first embodiment can be used to obtain the vector characterization result e_u2 of the wildcard u2 corresponding to S_unk2(x6, x7, x8), i.e., the vector characterization result of u2 updated during the training of the low-frequency word translation model. Then, by weighted calculation and nonlinear transformation, e_u2 and s_unk2 are fused:

V2 = tanh(W_u,unk2 · e_u2 + W_s,unk2 · s_unk2)

where W_u,unk2 and W_s,unk2 are weights.
It can be seen that the final characterization result V2 of the low-frequency word S_unk2(x6, x7, x8) contains not only the contextual semantic information of the sample corpus (training data of the low-frequency word translation model) carried in the vector characterization result of its wildcard u2, but also the semantic information of the low-frequency word S_unk2(x6, x7, x8) itself. Therefore, after V2 is fed into the encoding layer of the low-frequency word translation model, the correspondingly generated vector characterization result is more accurate and more complete in the information it contains. Furthermore, this vector characterization result plays a positive role in the subsequent Attention module and Decoder module, and thus more accurate low-frequency word translations and source text translations (i.e., target texts) can be generated.
For the above fusion mode 2, as shown in the vector fusion diagram of low-frequency word translations and wildcards in fig. 7, assume again that the source text x contains two low-frequency words, S_unk1(x2, x3) and S_unk2(x6, x7, x8).
Assume that the vector characterization results of the sub-word sequence of the low-frequency word translation T_unk1(y2, y3, y4) are T_unk1(v2, v3, v4); from T_unk1(v2, v3, v4), the vector characterization result t_unk1 of the translation T_unk1(y2, y3, y4) can be generated (e.g., by reverse scanning with the LSTM as mentioned in the first embodiment). In addition, the method described in the first embodiment can be used to obtain the vector characterization result e_u1 of the wildcard u1 corresponding to the low-frequency word S_unk1(x2, x3), i.e., the vector characterization result of u1 updated during the training of the low-frequency word translation model. Then, by weighted calculation and nonlinear transformation, e_u1 and t_unk1 are fused:

V1 = tanh(W_u,unk1 · e_u1 + W_t,unk1 · t_unk1)

where W_u,unk1 and W_t,unk1 are weights.
It can be seen that the final characterization result V1 of the low-frequency word S_unk1(x2, x3) contains not only the contextual semantic information of the sample corpus (training data of the low-frequency word translation model) carried in the vector characterization result of its wildcard u1, but also the semantic information of the low-frequency word translation T_unk1(y2, y3, y4) itself. Therefore, after V1 is fed into the encoding layer of the low-frequency word translation model, the correspondingly generated vector characterization result is more accurate and more complete in the information it contains. Furthermore, this vector characterization result plays a positive role in the subsequent Attention module and Decoder module, and thus more accurate low-frequency word translations and source text translations (i.e., target texts) can be generated.
Similarly, assume that the vector characterization results of the sub-word sequence of the low-frequency word translation T_unk2(y7, y8, y9) are T_unk2(v7, v8, v9); from T_unk2(v7, v8, v9), the vector characterization result t_unk2 of the translation T_unk2(y7, y8, y9) can be generated (e.g., by reverse scanning with the LSTM as mentioned in the first embodiment). In addition, the method described in the first embodiment can be used to obtain the vector characterization result e_u2 of the wildcard u2 corresponding to the low-frequency word S_unk2(x6, x7, x8), i.e., the vector characterization result of u2 updated during the training of the low-frequency word translation model. Then, by weighted calculation and nonlinear transformation, e_u2 and t_unk2 are fused:

V2 = tanh(W_u,unk2 · e_u2 + W_t,unk2 · t_unk2)

where W_u,unk2 and W_t,unk2 are weights.
It can be seen that the final characterization result V2 of the low-frequency word S_unk2(x6, x7, x8) contains not only the contextual semantic information of the sample corpus (training data of the low-frequency word translation model) carried in the vector characterization result of its wildcard u2, but also the semantic information of the low-frequency word translation T_unk2(y7, y8, y9) itself. Therefore, after V2 is fed into the encoding layer of the low-frequency word translation model, the correspondingly generated vector characterization result is more accurate and more complete in the information it contains. Furthermore, this vector characterization result plays a positive role in the subsequent Attention module and Decoder module, and thus more accurate low-frequency word translations and source text translations (i.e., target texts) can be generated.
For the above fusion mode 3, as shown in the vector fusion diagram of low-frequency words, their translations, and wildcards in fig. 8, assume again that the source text x contains two low-frequency words, S_unk1(x2, x3) and S_unk2(x6, x7, x8).
Referring to the descriptions of fusion mode 1 and fusion mode 2 above, the vector characterization result s_unk1 of the low-frequency word S_unk1(x2, x3), the vector characterization result t_unk1 of its translation T_unk1(y2, y3, y4), and the vector characterization result e_u1 of its wildcard u1 can be obtained. Then, by weighted calculation and nonlinear transformation, the three are fused to obtain the final characterization result V1 of S_unk1(x2, x3):

V1 = tanh(W_u,unk1 · e_u1 + W_s,unk1 · s_unk1 + W_t,unk1 · t_unk1)

Similarly, the vector characterization result s_unk2 of the low-frequency word S_unk2(x6, x7, x8), the vector characterization result t_unk2 of its translation T_unk2(y7, y8, y9), and the vector characterization result e_u2 of its wildcard u2 can be obtained and fused in the same way to obtain the final characterization result V2 of S_unk2(x6, x7, x8):

V2 = tanh(W_u,unk2 · e_u2 + W_s,unk2 · s_unk2 + W_t,unk2 · t_unk2)
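The three fusion modes amount to a weighted sum of vectors followed by tanh; the sketch below is illustrative only, with random vectors and matrices standing in for the trained wildcard embedding, source-side and target-side representations, and adaptive weights:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # illustrative dimension

# Stand-ins for trained quantities: wildcard embedding e_u, source-side
# representation s_unk, translation-side representation t_unk, and the
# adaptive weight matrices learned together with the model.
e_u, s_unk, t_unk = (rng.standard_normal(d) for _ in range(3))
W_u, W_s, W_t = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

def fuse(parts):
    """Weighted sum of (weight-matrix, vector) pairs followed by tanh."""
    return np.tanh(sum(W @ v for W, v in parts))

V_mode1 = fuse([(W_u, e_u), (W_s, s_unk)])                 # wildcard + word
V_mode2 = fuse([(W_u, e_u), (W_t, t_unk)])                 # wildcard + translation
V_mode3 = fuse([(W_u, e_u), (W_s, s_unk), (W_t, t_unk)])   # all three

print(V_mode3)  # every component lies in (-1, 1)
```

The tanh keeps each fused component in (-1, 1), realizing the outlier-masking and rescaling effects discussed above.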
when vector fusion is carried out in the fusion mode 3, vector characterization of low-frequency word sense information under two vector spaces of a source language and a target language is utilized. Because the low-frequency words only appear in a certain single language with lower frequency, the characteristic of lower frequency is not necessarily satisfied at the other end of the bilingual translation. Therefore, the fusion mode 3 can ensure that semantic features of the low-frequency words under two different languages are extracted complementarily in the training process of the low-frequency word translation model, so that the low-frequency translation defect is effectively relieved, and further in actual translation, the translation quality of the source text with the low-frequency words can be improved.
In addition, during the training of the low-frequency word translation model, not only the vector characterization result of the low-frequency word but also the vector characterization result of its translation is provided, so that the model can learn a copying mechanism, i.e., a translation mechanism that outputs the low-frequency word translation directly together with the translation of its context. For example, when translating a source text, suppose the model incorporates the vector characterization results T_unk1(v2, v3, v4) and T_unk2(v7, v8, v9) of the low-frequency word translations T_unk1(y2, y3, y4) and T_unk2(y7, y8, y9) during encoding; the translations T_unk1(y2, y3, y4) and T_unk2(y7, y8, y9) then also appear directly in the target text.
In the above three fusion modes, the larger a weight is, the greater the influence of its corresponding vector characterization result on the encoder output. The weights are adaptive parameters of the low-frequency word translation model: they are updated continuously during model training and are obtained when model training ends.
S402: and translating the source text according to the final characterization result of each low-frequency word to obtain a target text.
In this embodiment, after the final characterization result of each low-frequency word in the source text is obtained through S401, encoding may be performed according to the final characterization result of each low-frequency word and the vector characterization results of other words in the source text, so as to translate the source text based on the encoding result. Because the coding result not only integrates the context information of the low-frequency word, but also integrates the semantic information of the low-frequency word and/or the translation thereof, the source text is translated based on the coding result, and the accuracy and fluency of the translation result can be improved.
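This encoding step can be sketched as substituting the fused final characterization result at each wildcard position of the encoder input; all token names and vector values below are invented placeholders:

```python
# Illustrative: build the encoder input sequence, replacing each wildcard
# position's embedding with the fused final characterization result.
def encoder_inputs(tokens, base_embeddings, fused):
    """fused maps wildcard tokens (e.g. 'u1') to final characterization vectors."""
    return [fused.get(t, base_embeddings.get(t)) for t in tokens]

base = {"x1": [0.1, 0.2], "x4": [0.3, 0.1], "x5": [0.0, 0.5], "x9": [0.2, 0.2]}
fused = {"u1": [0.9, -0.4], "u2": [-0.7, 0.6]}  # made-up fused results V1, V2
seq = encoder_inputs(["x1", "u1", "x4", "x5", "u2", "x9"], base, fused)
print(len(seq))  # 6
```

The resulting sequence mixes context-aware wildcard semantics with the low-frequency words' own (and/or their translations') semantics before encoding, which is the source of the claimed accuracy and fluency gains.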
Third embodiment
The embodiment will be described with reference to a low-frequency word translation device, and the related content is referred to the above method embodiment.
Referring to fig. 9, a schematic diagram of a low-frequency word translation device according to an embodiment of the present application is provided, where the device 900 includes:
the characterization result generating unit 901 is configured to generate a combination characterization result corresponding to each low-frequency word in the source text; the combination characterization result includes a vector characterization result of the corresponding low-frequency word and/or a vector characterization result of the translation of the corresponding low-frequency word, and a vector characterization result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and the low-frequency word translation unit 902 is configured to translate the source text according to the combination characterization results corresponding to the low-frequency words respectively, so as to obtain a target text.
In one implementation manner of this embodiment, the low-frequency word translation unit 902 includes:
vector fusion subunit, configured to, for each low-frequency word, perform vector fusion on each vector characterization result in the combination characterization results corresponding to the low-frequency word, so as to obtain a final characterization result of the low-frequency word;
and the low-frequency word translation subunit is used for translating the source text according to the respective final characterization result of each low-frequency word.
In one implementation of this embodiment, the vector fusion subunit includes:
the weighting calculation subunit is used for carrying out weighting calculation on each vector characterization result in the combination characterization results corresponding to the low-frequency words to obtain a weighting calculation result;
and the nonlinear transformation subunit is used for carrying out nonlinear transformation on the weighted calculation result to obtain a final characterization result of the low-frequency word.
In one implementation manner of this embodiment, the token result generation unit 901 is specifically configured to generate a vector token result corresponding to a low-frequency word by using a vector token result corresponding to each sub-word of the low-frequency word.
In one implementation manner of the present embodiment, the characterization result generating unit 901 includes:
the first scanning subunit is used for reversely scanning each sub-word corresponding to the low-frequency word by utilizing the neural network to obtain a vector characterization result of a first sub-word in each sub-word;
the first generation subunit is configured to use the vector representation result of the first subword as a vector representation result of the corresponding low-frequency word.
In one implementation manner of this embodiment, the token result generating unit 901 is specifically configured to generate a vector token result of a translated version of a corresponding low-frequency word by using a vector token result of each sub word of the translated version of the corresponding low-frequency word.
In one implementation manner of the present embodiment, the characterization result generating unit 901 includes:
the second scanning subunit is used for reversely scanning each sub-word of the translation corresponding to the low-frequency word by utilizing the neural network to obtain a vector characterization result of a first sub-word in each sub-word;
and the second generation subunit is used for taking the vector characterization result of the first subword as the vector characterization result of the translation corresponding to the low-frequency word.
In an implementation manner of this embodiment, the vector characterization result of the wildcard symbol carries context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
Further, an embodiment of the present application also provides a low-frequency word translation device, which comprises: a processor, a memory, and a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform any of the implementations of the low frequency word translation method described above.
Further, an embodiment of the present application also provides a computer-readable storage medium storing instructions which, when run on a terminal device, cause the terminal device to execute any implementation of the low-frequency word translation method described above.
Further, an embodiment of the present application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the low-frequency word translation method described above.
From the above description of the embodiments, it will be apparent to those skilled in the art that all or part of the steps of the example methods described above may be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
It should be noted that the embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts among the embodiments, reference may be made to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and for relevant details reference may be made to the description of the method.
It should further be noted that relational terms such as "first" and "second" are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A low-frequency word translation method, comprising:
generating a combination characterization result corresponding to each low-frequency word in a source text; wherein the combination characterization result comprises a vector characterization result corresponding to the low-frequency word and/or a vector characterization result corresponding to the translation of the low-frequency word, and a vector characterization result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
translating the source text according to the combination characterization results respectively corresponding to the low-frequency words, to obtain a target text;
wherein the vector characterization result of the wildcard carries context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
2. The method according to claim 1, wherein translating the source text according to the combination characterization results respectively corresponding to the low-frequency words comprises:
for each low-frequency word, vector fusion is carried out on each vector characterization result in the combination characterization results corresponding to the low-frequency word, and a final characterization result of the low-frequency word is obtained;
and translating the source text according to the final characterization result of each low-frequency word.
3. The method of claim 2, wherein performing vector fusion on each vector characterization result in the combination characterization result corresponding to the low-frequency word to obtain the final characterization result of the low-frequency word comprises:
performing weighted calculation on each vector characterization result in the combination characterization result corresponding to the low-frequency word, to obtain a weighted calculation result;
and carrying out nonlinear transformation on the weighted calculation result to obtain a final characterization result of the low-frequency word.
4. A method according to any one of claims 1 to 3, wherein the vector characterization result corresponding to the low-frequency word is generated in the following manner:
generating the vector characterization result corresponding to the low-frequency word by using the vector characterization result corresponding to each sub-word of the low-frequency word.
5. The method of claim 4, wherein generating the vector characterization result corresponding to the low-frequency word by using the vector characterization result corresponding to each sub-word of the low-frequency word comprises:
scanning each sub-word corresponding to the low-frequency word in reverse order by using a neural network, to obtain a vector characterization result of the first sub-word among the sub-words;
and using the vector characterization result of the first sub-word as the vector characterization result corresponding to the low-frequency word.
6. A method according to any one of claims 1 to 3, wherein the vector characterization result corresponding to the translation of the low-frequency word is generated in the following manner:
generating the vector characterization result of the translation corresponding to the low-frequency word by using the vector characterization result of each sub-word of that translation.
7. The method of claim 6, wherein generating the vector characterization result of the translation corresponding to the low-frequency word by using the vector characterization result of each sub-word of that translation comprises:
scanning each sub-word of the translation corresponding to the low-frequency word in reverse order by using a neural network, to obtain a vector characterization result of the first sub-word among the sub-words;
and using the vector characterization result of the first sub-word as the vector characterization result of the translation corresponding to the low-frequency word.
8. A low-frequency word translation device, comprising:
a characterization result generation unit, configured to generate a combination characterization result corresponding to each low-frequency word in a source text; wherein the combination characterization result comprises a vector characterization result corresponding to the low-frequency word and/or a vector characterization result corresponding to the translation of the low-frequency word, and a vector characterization result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
a low-frequency word translation unit, configured to translate the source text according to the combination characterization results respectively corresponding to the low-frequency words, to obtain a target text;
wherein the vector characterization result of the wildcard carries context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
9. The apparatus of claim 8, wherein the low frequency word translation unit comprises:
vector fusion subunit, configured to, for each low-frequency word, perform vector fusion on each vector characterization result in the combination characterization results corresponding to the low-frequency word, so as to obtain a final characterization result of the low-frequency word;
and the low-frequency word translation subunit is used for translating the source text according to the respective final characterization result of each low-frequency word.
10. A low-frequency word translation apparatus, comprising: a processor, a memory, and a system bus;
the processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the method of any of claims 1-7.
11. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein instructions, which when run on a terminal device, cause the terminal device to perform the method of any of claims 1-7.
12. A computer program product, characterized in that the computer program product, when run on a terminal device, causes the terminal device to perform the method of any of claims 1-7.
CN201910020175.1A 2019-01-09 2019-01-09 Low-frequency word translation method and device Active CN111428518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910020175.1A CN111428518B (en) 2019-01-09 2019-01-09 Low-frequency word translation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910020175.1A CN111428518B (en) 2019-01-09 2019-01-09 Low-frequency word translation method and device

Publications (2)

Publication Number Publication Date
CN111428518A CN111428518A (en) 2020-07-17
CN111428518B true CN111428518B (en) 2023-11-21

Family

ID=71546099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910020175.1A Active CN111428518B (en) 2019-01-09 2019-01-09 Low-frequency word translation method and device

Country Status (1)

Country Link
CN (1) CN111428518B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274826B * 2020-01-19 2021-02-05 Nanjing New Generation Artificial Intelligence Research Institute Co Ltd Semantic information fusion-based low-frequency word translation method
CN112560510B * 2020-12-10 2023-12-01 iFlytek Co Ltd Translation model training method, device, equipment and storage medium
CN113051936A * 2021-03-16 2021-06-29 Kunming University of Science and Technology Method for enhancing Chinese-Vietnamese neural machine translation based on low-frequency word representation

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1643511A * 2002-03-11 2005-07-20 University of Southern California Named entity translation
CN101194253A * 2005-06-14 2008-06-04 Microsoft Corp Collocation translation from monolingual and available bilingual corpora
CN101833550A * 2010-03-10 2010-09-15 Wuxi Baichuan Technology Co Ltd Method and device for intercepting English display information in real time and translating it into Chinese
CN103189859A * 2010-08-26 2013-07-03 Google Inc Conversion of input text strings
CN104346459A * 2014-11-10 2015-02-11 Nanjing University of Information Science and Technology Text classification feature selection method based on term frequency and chi-square statistics
CN106446230A * 2016-10-08 2017-02-22 G-Cloud Technology Co Ltd Method for optimizing word classification in machine learning text
JP2018055671A * 2016-09-21 2018-04-05 Panasonic IP Management Co Ltd Paraphrase identification method, paraphrase identification device, and paraphrase identification program
CN108170686A * 2017-12-29 2018-06-15 iFlytek Co Ltd Text translation method and device
CN108228574A * 2017-12-07 2018-06-29 iFlytek Co Ltd Text translation processing method and device
CN108228576A * 2017-12-29 2018-06-29 iFlytek Co Ltd Text translation method and device
CN108763221A * 2018-06-20 2018-11-06 iFlytek Co Ltd Attribute-name characterization method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7747427B2 (en) * 2005-12-05 2010-06-29 Electronics And Telecommunications Research Institute Apparatus and method for automatic translation customized for documents in restrictive domain
US20100332217A1 (en) * 2009-06-29 2010-12-30 Shalom Wintner Method for text improvement via linguistic abstractions
US20140163951A1 (en) * 2012-12-07 2014-06-12 Xerox Corporation Hybrid adaptation of named entity recognition
US9990361B2 (en) * 2015-10-08 2018-06-05 Facebook, Inc. Language independent representations

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1643511A * 2002-03-11 2005-07-20 University of Southern California Named entity translation
CN101194253A * 2005-06-14 2008-06-04 Microsoft Corp Collocation translation from monolingual and available bilingual corpora
CN101833550A * 2010-03-10 2010-09-15 Wuxi Baichuan Technology Co Ltd Method and device for intercepting English display information in real time and translating it into Chinese
CN103189859A * 2010-08-26 2013-07-03 Google Inc Conversion of input text strings
CN104346459A * 2014-11-10 2015-02-11 Nanjing University of Information Science and Technology Text classification feature selection method based on term frequency and chi-square statistics
JP2018055671A * 2016-09-21 2018-04-05 Panasonic IP Management Co Ltd Paraphrase identification method, paraphrase identification device, and paraphrase identification program
CN106446230A * 2016-10-08 2017-02-22 G-Cloud Technology Co Ltd Method for optimizing word classification in machine learning text
CN108228574A * 2017-12-07 2018-06-29 iFlytek Co Ltd Text translation processing method and device
CN108170686A * 2017-12-29 2018-06-15 iFlytek Co Ltd Text translation method and device
CN108228576A * 2017-12-29 2018-06-29 iFlytek Co Ltd Text translation method and device
CN108763221A * 2018-06-20 2018-11-06 iFlytek Co Ltd Attribute-name characterization method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An extraction method for named entity translation equivalence pairs; Chen Huaixing et al.; Journal of Chinese Information Processing; 2008-07-15 (No. 04); full text *

Also Published As

Publication number Publication date
CN111428518A (en) 2020-07-17

Similar Documents

Publication Publication Date Title
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
CN109086267B (en) Chinese word segmentation method based on deep learning
CN107967262A Neural network based Mongolian-Chinese machine translation method
CN107832306A Doc2vec-based similar entity mining method
CN110046252B (en) Medical text grading method based on attention mechanism neural network and knowledge graph
WO2019118256A1 (en) Generation of text from structured data
CN114757182A (en) BERT short text sentiment analysis method for improving training mode
CN111666758B (en) Chinese word segmentation method, training device and computer readable storage medium
CN111428518B (en) Low-frequency word translation method and device
CN110619043A (en) Automatic text abstract generation method based on dynamic word vector
CN109977220B (en) Method for reversely generating abstract based on key sentence and key word
CN114998670B (en) Multi-mode information pre-training method and system
CN112163089B (en) High-technology text classification method and system integrating named entity recognition
CN111767718A (en) Chinese grammar error correction method based on weakened grammar error feature representation
CN111368542A (en) Text language association extraction method and system based on recurrent neural network
CN115759119B (en) Financial text emotion analysis method, system, medium and equipment
Gao et al. Generating natural adversarial examples with universal perturbations for text classification
CN114064856A (en) XLNET-BiGRU-based text error correction method
Mathur et al. A scaled‐down neural conversational model for chatbots
CN111274826B (en) Semantic information fusion-based low-frequency word translation method
CN113010690B (en) Method for enhancing entity embedding based on text information
CN111274827B Suffix translation method based on bag-of-words multi-target learning
CN113392656A Neural machine translation method fusing a deliberation network and character encoding
Lin et al. A novel beam search to improve neural machine translation for English-Chinese
CN116522165A (en) Public opinion text matching system and method based on twin structure

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant