CN111428518A - Low-frequency word translation method and device

Publication number
CN111428518A
Authority
CN
China
Prior art keywords
low-frequency word, result, frequency, word
Prior art date
Legal status
Granted
Application number
CN201910020175.1A
Other languages
Chinese (zh)
Other versions
CN111428518B (en)
Inventor
张学强 (Zhang Xueqiang)
刘俊华 (Liu Junhua)
魏思 (Wei Si)
王智国 (Wang Zhiguo)
胡国平 (Hu Guoping)
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201910020175.1A
Publication of CN111428518A
Application granted
Publication of CN111428518B
Legal status: Active
Anticipated expiration

Abstract

The application discloses a low-frequency word translation method and device. In the method, after a source text to be translated is obtained, a combined representation result corresponding to each low-frequency word in the source text can be generated; for each low-frequency word, the combined representation result includes a vector representation result of the low-frequency word and/or a vector representation result of a translation of the low-frequency word, together with the vector representation result of the wildcard that replaces the low-frequency word. The source text is then translated according to the combined representation results corresponding to the low-frequency words, so as to obtain a target text. Thus, when the source text is translated, not only is the vector representation result of the wildcard that replaces each low-frequency word considered, but also the vector representation result of the low-frequency word and/or of its translation, so that the integrity of the semantic information of the source text is improved and the fluency of the translation result is improved.

Description

Low-frequency word translation method and device
Technical Field
The application relates to the technical field of machine translation, in particular to a low-frequency word translation method and device.
Background
With the continuous development of science and technology, machine translation has become an important research subject for solving the problem of communication across different language communities, and the translation quality of low-frequency words directly influences whether machine translation technology and applications can successfully be put to practical use and industrialized. Low-frequency words are a class of words that appear sparsely, or never, in a large-scale bilingual parallel corpus; in natural language processing, according to their degree of frequency, they are often called unknown words or out-of-vocabulary (OOV) words. Because low-frequency words are characterized by frequency sparsity, single fixed translations and the like, their translation has always been a key point and a difficulty in machine translation research.
In the existing low-frequency word translation method, low-frequency words in a source text are converted into wildcards, then the source text converted into the wildcards is translated to obtain a target text, and finally the wildcards in the target text are replaced by the original corresponding low-frequency words to form a final complete translation. However, although the translation method enables the low-frequency words to be translated, directly converting the low-frequency words into wildcards may cause incomplete semantic information of the source text, and further cause insufficient smoothness of the target text obtained after translation, that is, there is a problem of reduced fluency of the translated text.
Disclosure of Invention
The embodiment of the application mainly aims to provide a low-frequency word translation method and device, which can improve fluency of translation results when a text to which a low-frequency word belongs is translated.
The embodiment of the application provides a low-frequency word translation method, which comprises the following steps:
generating a combined representation result corresponding to each low-frequency word in the source text; the combined representation result comprises a vector representation result corresponding to the low-frequency word and/or a vector representation result corresponding to a translation of the low-frequency word, and a vector representation result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and translating the source text according to the combined representation result corresponding to each low-frequency word to obtain a target text.
Optionally, the translating the source text according to the combined representation result corresponding to each low-frequency word includes:
for each low-frequency word, carrying out vector fusion on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a final representation result of the low-frequency word;
and translating the source text according to the respective final characterization result of each low-frequency word.
Optionally, the performing vector fusion on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a final representation result of the low-frequency word includes:
performing weighted calculation on each vector representation result in the combined representation results corresponding to the low-frequency words to obtain weighted calculation results;
and carrying out nonlinear transformation on the weighting calculation result to obtain a final characterization result of the low-frequency word.
Optionally, the vector characterization result of the corresponding low-frequency word is generated as follows:
and generating a vector representation result of the corresponding low-frequency word by using the vector representation result of each subword of the corresponding low-frequency word.
Optionally, the generating a vector representation result of a corresponding low-frequency word by using a vector representation result of each subword of the corresponding low-frequency word includes:
reversely scanning each subword corresponding to the low-frequency word by utilizing a neural network to obtain a vector representation result of a first subword in each subword;
and taking the vector representation result of the first sub-word as the vector representation result of the corresponding low-frequency word.
Optionally, the vector characterization result of the translation of the corresponding low-frequency word is generated according to the following method:
and generating a vector representation result of the translation of the corresponding low-frequency word by using the vector representation result of each sub-word of the translation of the corresponding low-frequency word.
Optionally, the generating a vector representation result of the translation of the corresponding low-frequency word by using the vector representation result of each subword of the translation of the corresponding low-frequency word includes:
reversely scanning each sub-word of the translation of the corresponding low-frequency word by using a neural network to obtain a vector representation result of a first sub-word in each sub-word;
and taking the vector representation result of the first sub-word as the vector representation result of the translation corresponding to the low-frequency word.
Optionally, the vector representation result of the wildcard carries context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
The embodiment of the present application further provides a low frequency word translation device, including:
the characterization result generation unit is used for generating a combined characterization result corresponding to each low-frequency word in the source text; the combined representation result comprises a vector representation result corresponding to the low-frequency word and/or a vector representation result corresponding to a translation of the low-frequency word, and a vector representation result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and the low-frequency word translation unit is used for translating the source text according to the combined representation result corresponding to each low-frequency word to obtain a target text.
Optionally, the low-frequency word translation unit includes:
the vector fusion subunit is used for performing vector fusion on each vector representation result in the combined representation result corresponding to each low-frequency word to obtain a final representation result of the low-frequency word;
and the low-frequency word translation subunit is used for translating the source text according to the respective final representation result of each low-frequency word.
Optionally, the vector fusion subunit includes:
the weighting calculation subunit is configured to perform weighting calculation on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a weighting calculation result;
and the nonlinear transformation subunit is used for carrying out nonlinear transformation on the weighting calculation result to obtain a final characterization result of the low-frequency word.
Optionally, the characterization result generating unit is specifically configured to generate a vector characterization result corresponding to the low-frequency word by using the vector characterization result of each subword corresponding to the low-frequency word.
Optionally, the characterization result generating unit includes:
the first scanning sub-unit is used for reversely scanning each sub-word of the corresponding low-frequency word by utilizing the neural network to obtain a vector representation result of a first sub-word in each sub-word;
and the first generating subunit is used for taking the vector representation result of the first subword as the vector representation result of the corresponding low-frequency word.
Optionally, the characterization result generating unit is specifically configured to generate a vector characterization result of the translation corresponding to the low-frequency word by using a vector characterization result of each subword of the translation corresponding to the low-frequency word.
Optionally, the characterization result generating unit includes:
the second scanning subunit is used for reversely scanning each subword of the translation of the corresponding low-frequency word by utilizing the neural network to obtain a vector representation result of a first subword in each subword;
and the second generating subunit is used for taking the vector representation result of the first subword as the vector representation result of the translation corresponding to the low-frequency word.
Optionally, the vector representation result of the wildcard carries context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
An embodiment of the present application further provides a low frequency word translation device, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any implementation manner of the low-frequency word translation method.
An embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is enabled to execute any implementation manner of the low-frequency word translation method.
The embodiment of the present application further provides a computer program product, which when running on a terminal device, enables the terminal device to execute any implementation manner of the low-frequency word translation method.
In summary, according to the low-frequency word translation method and device provided in the embodiments of the present application, after a source text to be translated is obtained, combined representation results corresponding to each low-frequency word in the source text may be generated, and for each low-frequency word, the combined representation result corresponding to the low-frequency word includes a vector representation result of the low-frequency word and/or a vector representation result of a translated text of the low-frequency word, and a vector representation result of a wildcard after the low-frequency word is replaced with a wildcard, and then the source text is translated according to the combined representation results corresponding to each low-frequency word, so as to obtain a target text. Therefore, compared with a method for directly converting low-frequency words in a source text into wildcards and then translating the wildcards in the prior art, the method and the device for converting the low-frequency words in the source text have the advantages that when the source text is translated, not only are vector representation results of the wildcards correspondingly replaced by the low-frequency words taken into consideration, but also vector representation results of the low-frequency words and/or translated text of the low-frequency words are further taken into consideration, so that the integrity of semantic information of the source text is improved, and the fluency of the translation results is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a diagram illustrating an RNN and Attention-based translation model provided by an embodiment of the present application;
fig. 2 is a schematic flowchart of a low-frequency word translation method according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a reverse scan sub-word sequence provided by an embodiment of the present application;
fig. 4 is a schematic flowchart of translating a source text according to a combined representation result corresponding to each low-frequency word provided in the embodiment of the present application;
fig. 5 is a schematic diagram of a function image of a hyperbolic tangent function tanh provided in the embodiment of the present application;
FIG. 6 is a schematic diagram of vector fusion of low-frequency words and wildcards according to an embodiment of the present application;
FIG. 7 is a vector fusion diagram of low frequency word translations and wildcards according to an embodiment of the present application;
fig. 8 is a vector fusion diagram of low-frequency words, low-frequency word translations, and wildcards provided in an embodiment of the present application;
fig. 9 is a schematic composition diagram of a low-frequency word translation apparatus according to an embodiment of the present application.
Detailed Description
Before introducing the low-frequency word translation method provided by the embodiment of the present application, a structure and a function of a low-frequency word translation model that can be used in the embodiment of the present application are first introduced.
The low-frequency word translation model used in the embodiment of the present application may be a translation model based on a neural network and Attention mechanism (Attention), or a translation model based on Attention entirely, and the like. The embodiment of the present application does not limit the type of the Neural Network used in the translation model, and for example, the Neural Network may be a Recurrent Neural Network (RNN) or a Convolutional Neural Network (CNN).
For convenience of description, in the embodiments of the present application, a text to be translated is defined as a source text, and a translation obtained by translating the source text is defined as a target text.
The working process of the translation model is described below by taking the translation model based on RNN and Attention as an example. As shown in FIG. 1, the source text input to the model is $x = (x_1, x_2, x_3, \ldots, x_m)$ and the target text output by the model is $y = (y_1, y_2, y_3, \ldots, y_n)$, where the lengths of the source text and the target text are m and n respectively, i.e., m and n represent the numbers of sub-words contained in the source text and the target text.
The translation model includes three modules, respectively, a bidirectional RNN-based coding module (i.e., an Encoder module), an Attention module (i.e., an Attention module), and an RNN-based decoding module (i.e., a Decoder module), and the functions of each module are described below.
I. Encoder module
The Encoder module functions to compute the characterization encoding of each word in the source text under the context of the text. Specifically, for each word $x_i$ in the source text $x = (x_1, x_2, x_3, \ldots, x_m)$, a corresponding vector characterization result $v_i$ can be obtained by a word-vector table-lookup technique or the like; then, based on the vector characterization result of the $i$-th word, the characterization $f_i$ of the $i$-th word under the condition of seeing historical vocabulary information is obtained through a forward recurrent neural network, and the characterization $b_i$ of the $i$-th word under the condition of seeing future vocabulary information is obtained through a backward recurrent neural network; finally, the two are spliced together as $[f_i : b_i]$ to form the final characterization result $h_i$ of the $i$-th word, where $i = 1, 2, 3, \ldots, m$.
The recurrent neural network here may be a common RNN, or an improved structure thereof, such as a Gated Recurrent Unit (GRU) or a Long Short-Term Memory (LSTM) network.
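For concreteness, the following minimal sketch shows how such a bidirectional encoder could be assembled. It is an illustration under assumptions, not the patent's implementation: the class and parameter names (BiRNNEncoder, emb_dim, hid_dim) are hypothetical, and a GRU is chosen as the recurrent unit, as permitted above.

```python
import torch
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    """Sketch of the Encoder module: h_i = [f_i : b_i] for each source word."""

    def __init__(self, vocab_size: int, emb_dim: int = 256, hid_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # word-vector table lookup -> v_i
        self.rnn = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)

    def forward(self, src_ids: torch.Tensor) -> torch.Tensor:
        v = self.embed(src_ids)  # (batch, m, emb_dim): v_i for each word x_i
        h, _ = self.rnn(v)       # forward f_i and backward b_i, concatenated per position
        return h                 # (batch, m, 2 * hid_dim): h_i = [f_i : b_i]
```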
II. Attention module
The role of the Attention module is to calculate the information representation $c_i$ of the source text on which the $i$-th decoding moment depends. Assume that the implicit state of the RNN decoder at the previous moment is $s_{i-1}$; then $c_i$ is calculated as follows:

$$c_i = \sum_{j=1}^{m} \alpha_{ij} h_j$$

$$\alpha_{ij} = \frac{\exp\big(a(s_{i-1}, h_j)\big)}{\sum_{k=1}^{m} \exp\big(a(s_{i-1}, h_k)\big)}$$

wherein $a(s_{i-1}, h_j)$ is a general function of the variables $s_{i-1}$ and $h_j$ and can be implemented in various ways.

It can be seen that the semantic information representation $c_i$ of the source text generated at the $i$-th decoding moment is a weighted sum of the characterization results $h_j$ of the words in the source text, and the weight $\alpha_{ij}$ of each characterization result $h_j$ determines the degree of attention that the corresponding word receives at the current moment.
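As an illustrative sketch only, the additive scoring function below is one common realization of the general function $a(s_{i-1}, h_j)$; the class and parameter names are hypothetical, and other realizations (e.g., dot-product scoring) are equally valid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Sketch of the Attention module: c_i = sum_j alpha_ij * h_j."""

    def __init__(self, dec_dim: int, enc_dim: int, att_dim: int = 256):
        super().__init__()
        self.w_s = nn.Linear(dec_dim, att_dim)  # acts on the decoder state s_{i-1}
        self.w_h = nn.Linear(enc_dim, att_dim)  # acts on each encoder output h_j
        self.v = nn.Linear(att_dim, 1)

    def forward(self, s_prev: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # a(s_{i-1}, h_j): additive (Bahdanau-style) realization of the scoring function
        scores = self.v(torch.tanh(self.w_s(s_prev).unsqueeze(1) + self.w_h(h)))
        alpha = F.softmax(scores, dim=1)  # alpha_ij, normalized over source positions j
        return (alpha * h).sum(dim=1)     # context vector c_i, shape (batch, enc_dim)
```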
III. Decoder module
The Decoder module is used for generating the target text by adopting an RNN, according to the vector representation $c_i$ of the source text dynamically generated at each moment and the state $s_{i-1}$ of the Decoder module at the previous moment. The specific calculation method is as follows:

$$s_i = f(s_{i-1}, y_{i-1}, c_i)$$

$$P(y_i = V_k) = \frac{\exp\big(b_k(s_i)\big)}{\sum_{k'} \exp\big(b_{k'}(s_i)\big)}$$

wherein $f(\cdot)$ represents an RNN-based transformation function, and the RNN can be a common structure, or a GRU or LSTM structure with a gating mechanism added; $P(y_i = V_k)$ denotes the probability that $y_i$ is the $k$-th word in the target-language vocabulary, and $b_k(s_i)$ represents the transformation function associated with the $k$-th target word.

After the word-probability calculation over the target-language vocabulary is completed at each decoding moment, a final decoding sequence can be obtained by, for example, the beam search algorithm, that is, the target text $y = (y_1, y_2, y_3, \ldots, y_n)$ that maximizes the output probability $P(y \mid x)$ of the entire target text.
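The following sketch illustrates one decoding step under the same assumptions as the earlier sketches (hypothetical names, a GRU cell playing the role of $f(\cdot)$). A beam-search driver would call it repeatedly, keeping the n-best partial hypotheses instead of a single greedy choice at each step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStep(nn.Module):
    """Sketch of one decoding step: s_i = f(s_{i-1}, y_{i-1}, c_i), then P(y_i = V_k)."""

    def __init__(self, vocab_size: int, emb_dim: int = 256,
                 hid_dim: int = 256, ctx_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.GRUCell(emb_dim + ctx_dim, hid_dim)  # plays the role of f(.)
        self.out = nn.Linear(hid_dim, vocab_size)           # b_k(s_i) for every k

    def forward(self, s_prev: torch.Tensor, y_prev_ids: torch.Tensor,
                c_i: torch.Tensor):
        y_prev = self.embed(y_prev_ids)                # embedding of y_{i-1}
        s_i = self.cell(torch.cat([y_prev, c_i], dim=-1), s_prev)
        log_p = F.log_softmax(self.out(s_i), dim=-1)   # log P(y_i = V_k) over the vocabulary
        return s_i, log_p
```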
Next, a method for translating a low-frequency word provided in the embodiment of the present application will be specifically described.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
First embodiment
Referring to fig. 2, a flow diagram of a low-frequency word translation method provided in this embodiment is shown, where the method includes the following steps:
s201: and generating a combined representation result corresponding to each low-frequency word in the source text.
In this embodiment, a text to be translated is defined as a source text.
It should be noted that the present embodiment does not limit the language of the source text, for example, the source text may be a chinese text, an english text, or the like; the present embodiment also does not limit the length of the source text, for example, the source text may be a word, a sentence, a chapter-level text, etc.
It will be appreciated that one or more low frequency words may be included in the source text based on the length of the source text. The present embodiment does not limit the type of the low frequency words, and may be named entities, complex noun phrases or terms of expertise, and the like.
In this embodiment, for each low-frequency word in the source text, a corresponding combined representation result may be generated. The combined representation result may include a vector representation result of the corresponding low-frequency word and/or a vector representation result of a translation of the corresponding low-frequency word, together with the vector representation result of the wildcard UNKi obtained after the corresponding low-frequency word is replaced with the wildcard UNKi.
Specifically, each low-frequency word in the source text needs to be replaced with a wildcard UNKi. On this basis, when the source text is translated in the subsequent step S202, the vector representation result of the low-frequency word itself is considered in combination with that of its wildcard UNKi, or the vector representation result of the low-frequency word's translation in the target language (i.e., the language translated into) is considered in combination with that of its wildcard UNKi, or the vector representation results of both the low-frequency word and its translation in the target language are considered in combination with that of its wildcard UNKi. That is to say, the wildcard UNKi of the low-frequency word and the semantic information of the low-frequency word and/or its translation are comprehensively considered when translating the source text, which can effectively improve the accuracy and fluency of the translation result.
When the vector representation result of the low-frequency word's translation is needed to translate the source text, the low-frequency word must first be translated, which can be done with a strategy combining word-list lookup and model translation. Specifically, because the internal statistical regularities of bilingual low-frequency words are relatively fuzzy and their frequency in the training corpus is very low, modeling them is difficult; therefore, the translation of a low-frequency word is preferentially obtained by dictionary lookup, and if the lookup fails, it is produced by a customized low-frequency word translation model.
The dictionary searching mode can be realized by pre-constructing a word list of low-frequency words (including entities, terms, proper nouns and the like) according to specific application scenes and fields of machine translation, and the returned low-frequency word translation can be ensured to be completely correct and accord with a specific situation as long as the low-frequency words to be searched hit the word list.
However, when low-frequency word translation is performed by dictionary lookup, a low-frequency word may fail to hit the word list, depending on the scale at which the word list was constructed. In this case, a customized low-frequency word translation model may be used, and the target translation with the highest probability output by the model is taken as the translation of the low-frequency word. Specifically, a character model may be used as the low-frequency word translation model to address the shortness and low frequency of low-frequency words, because a character model raises the frequency of the modeling units in the model training data and can thus greatly improve the translation performance on low-frequency words. For example, taking "南京长江大桥" and its translation "Nanjing Yangtze River Bridge" as an example, the bilingual forms under the character model are "南 京 长 江 大 桥" and "[Nanjing] [Yangtze] [River] [Bridge]".
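A minimal sketch of this dictionary-first strategy follows; the lexicon contents, the char_model object and its translate method are hypothetical stand-ins for the pre-built low-frequency-word list and the customized character-level translation model.

```python
def translate_low_freq_word(word: str, lexicon: dict, char_model) -> str:
    """Dictionary-first translation of a single low-frequency word."""
    hit = lexicon.get(word)
    if hit is not None:
        return hit  # word-list hit: the returned translation fits the target domain
    # lookup failed: fall back to the customized character-level translation model,
    # which returns its highest-probability target translation
    return char_model.translate(list(word))  # character model sees per-character units

# hypothetical usage with a tiny domain word list
lexicon = {"南京长江大桥": "Nanjing Yangtze River Bridge"}
```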
Next, how to generate the combined characterization results corresponding to the low-frequency words in the source text is described, that is, for each low-frequency word, "how to generate the vector characterization result of the low-frequency word," how to generate the vector characterization result of the translation of the low-frequency word, "and" how to generate the vector characterization result of the wildcard corresponding to the low-frequency word. The method comprises the following specific steps:
firstly, generating a vector characterization result of a low-frequency word according to the following mode
In an implementation manner of this embodiment, for each low-frequency word, a vector characterization result of the low-frequency word may be generated by using a vector characterization result of each subword of the low-frequency word.
In this implementation, the low-frequency word may be segmented into sub-words, where the number of characters in a sub-word is not limited: a sub-word may include one character or multiple characters, and each sub-word may occur with higher or lower frequency in the source language (i.e., the language to which the source text belongs), that is, a sub-word may be a high-frequency sub-word or a low-frequency sub-word. For example, take the word "机器人" ("robot") and assume it is a low-frequency word: the two characters of "机器" ("machine") generally co-occur with a high frequency in a large-scale corpus, but the three characters "机", "器" and "人" may co-occur with a relatively low frequency, so "机器人" can be segmented into the sub-words "机器" ("machine") and "人" ("human"), where "机器" and "人" are high-frequency sub-words. On this basis, when the frequency of a low-frequency word's sub-words is high, better vector representation results of those high-frequency sub-words can be learned by the model, so that when the vector representation results of the sub-words are used to generate the vector representation result of the low-frequency word, the latter can accurately represent the semantic information of the low-frequency word.
More specifically, in this implementation, each subword of the low-frequency word may be reversely scanned by using a neural network to obtain a vector characterization result of a first subword in each subword, and the vector characterization result of the first subword is used as the vector characterization result of the low-frequency word.
In practical implementation, a Long Short-Term Memory (LSTM) network can be adopted to reversely scan the sub-word sequence of the low-frequency word, so that the first sub-word of the sequence occupies the most important information in the vector representation result of the low-frequency word. The LSTM network is good at modeling natural language in machine translation: it can convert a text of any length into a floating-point vector of a specific dimension, while memorizing the more important words in the text and retaining that memory over long spans. LSTM is a special structural type of the Recurrent Neural Network (RNN) model in which three control units, an input gate, an output gate and a forget gate, are added; for information entering the LSTM network, these three gates judge it and determine the proportion that is memorized, forgotten and output, which can effectively solve the long-distance dependence problem in neural networks.
Based on this, if the LSTM network is adopted to scan the sub-word sequence of the low-frequency word in the reverse direction, the first sub-word can be guaranteed to play the greatest role in the vector representation result of the low-frequency word, so that the translation result of the low-frequency word is guaranteed to be more smoothly linked with the context of the low-frequency word.
For example, in the schematic diagram of reverse-scanning a sub-word sequence shown in FIG. 3, take the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ and its vector characterization result $\overleftarrow{v}_{unk2}$ as an example, where $x_6$, $x_7$ and $x_8$ are the three sub-words of the low-frequency word and $v_6$, $v_7$ and $v_8$ are, in order, the vector characterization results of the three sub-words. Then $v_8$, $v_7$ and $v_6$ are input one by one into the LSTM network for encoding; the calculation formula is as follows:

$$\overleftarrow{v}_{unk2} = \mathrm{LSTM}(v_8, v_7, v_6)$$

Finally, the vector $\overleftarrow{v}_{unk2}$ output by the LSTM network is the vector characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$, in which $x_6$, as the last input, plays the greatest role.
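The reverse scan just described can be sketched as follows; the tensor shapes, and the choice of taking the last LSTM output as the word's characterization, are assumptions made for illustration.

```python
import torch
import torch.nn as nn

def reverse_scan(subword_vecs: torch.Tensor, lstm: nn.LSTM) -> torch.Tensor:
    """Scan the sub-word vectors in reverse order so that the FIRST sub-word,
    consumed last, dominates the resulting word characterization."""
    reversed_seq = torch.flip(subword_vecs, dims=[1])  # (v_8, v_7, v_6) ordering
    out, _ = lstm(reversed_seq)
    return out[:, -1, :]  # LSTM state after consuming v_6 = the word's vector

lstm = nn.LSTM(input_size=256, hidden_size=256, batch_first=True)
v_sub = torch.randn(1, 3, 256)      # stand-ins for v_6, v_7, v_8 of S_unk2(x6, x7, x8)
v_unk2 = reverse_scan(v_sub, lstm)  # vector characterization of the low-frequency word
```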
Secondly, generating a vector representation result of the translation of the low-frequency words according to the following mode
In an implementation manner of this embodiment, for each low-frequency word, a vector representation result of the translated version of the low-frequency word may be generated by using a vector representation result of each subword of the translated version of the low-frequency word. More specifically, in this implementation manner, each subword of the translation of the low-frequency word may be reversely scanned by using the neural network to obtain a vector characterization result of a first subword in each subword, and the vector characterization result of the first subword is used as the vector characterization result of the translation of the low-frequency word.
In this implementation manner, the vector representation result of the translated text of the low-frequency word may be generated in a similar manner to the above-mentioned "generating the vector representation result of the low-frequency word in the following manner", that is, replacing the above-mentioned "low-frequency word" with the "translated text of the low-frequency word", replacing each sub-word of the above-mentioned "low-frequency word" with each sub-word of the "translated text of the low-frequency word", so as to generate the vector representation result of the translated text of the low-frequency word in the above-mentioned manner, and a specific manner is not described herein again.
Thirdly, generating a vector representation result of the wildcard characters corresponding to the low-frequency words according to the following mode
In an implementation manner of this embodiment, for each low-frequency word, a vector representation result of a wildcard corresponding to the low-frequency word carries context semantic information of a sample corpus to which the low-frequency word belongs.
In this implementation, a pre-constructed low-frequency word translation model can be adopted to translate the source text. When the low-frequency word translation model is trained, a large amount of model training data must be used, and this training data contains a large number of sample texts, one or more of which may contain a low-frequency word of the source text. After such a low-frequency word is replaced with its corresponding wildcard UNKi, the word vectors of the wildcard UNKi and of the other words in the sample text to which it belongs can be initialized, and the low-frequency word translation model is trained on the basis of these word vectors. During training, the vector representation result of the wildcard UNKi is continuously updated, so that it comes to carry the semantic information of the context of the sample texts to which the wildcard UNKi belongs; after training ends, the vector representation result of the wildcard UNKi is fixed.
Based on this, when a vector representation result of the wildcard character UNKi corresponding to the low-frequency word in the source text needs to be generated, the vector representation result of the wildcard character UNKi obtained after model training is finished can be directly used.
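As a sketch of this training arrangement (the vocabulary layout and the sizes are assumptions made for illustration): the wildcards are treated as ordinary vocabulary entries whose embedding rows are updated by backpropagation during training and frozen afterwards.

```python
import torch.nn as nn

# Assumed vocabulary layout: the wildcards UNK1..UNKi are ordinary vocabulary
# entries, so their embedding rows are updated by backpropagation like any
# other word while the low-frequency word translation model is trained.
vocab = {"<pad>": 0, "<s>": 1, "</s>": 2, "UNK1": 3, "UNK2": 4}  # hypothetical ids
embed = nn.Embedding(num_embeddings=30000, embedding_dim=256)

# ... train the translation model on sample corpora whose low-frequency words
# were replaced by UNKi; the rows embed.weight[3] and embed.weight[4] absorb
# the context semantics of the sentences those wildcards appeared in ...

embed.weight.requires_grad_(False)  # after training, the UNKi vectors stay fixed
```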
S202: and translating the source text according to the combined representation result corresponding to each low-frequency word to obtain a target text.
In this embodiment, after the combined representation results corresponding to the low-frequency words are obtained in step S201, the source text may be translated based on the combined representation results corresponding to the low-frequency words and the respective vector representation results of other words (words other than the low-frequency words) in the source text, where a translated text of the source text is defined as a target text.
As for other words in the source text except for the low-frequency words, the vector representation results of the words may be directly generated by using a word vector generation method, and of course, the vector representation results of the words may also be generated in the manner of "generating the vector representation results of wildcards", that is, the vector representation results of the words are obtained in the process of training the low-frequency word translation model, and for each of the words, the vector representation results of the word carry context semantic information of the sample corpus to which the word belongs.
It should be noted that, when translating the source text, the source text may be translated by using an existing or future-appearing translation model, such as the translation model based on RNN and Attention shown in fig. 1, or the translation model based on CNN and Attention, or the translation model based on Attention, etc. In addition, the present embodiment does not limit the language of the target text, for example, if the source text is chinese, the target text may be english.
It should be noted that, a specific implementation manner of the step S202 will be specifically described in the second embodiment.
In summary, in the low-frequency word translation method provided in this embodiment, after a source text to be translated is obtained, a combined representation result corresponding to each low-frequency word in the source text may be generated, and for each low-frequency word, the combined representation result corresponding to the low-frequency word includes a vector representation result of the low-frequency word and/or a vector representation result of a translated text of the low-frequency word, and a vector representation result of a wildcard after the low-frequency word is replaced with a wildcard, and then the source text is translated according to the combined representation result corresponding to each low-frequency word, so as to obtain a target text. Therefore, compared with a method of directly converting low-frequency words in a source text into wildcards and then translating the wildcards in the prior art, the method and the device have the advantages that when the source text is translated, not only are vector representation results of the wildcards correspondingly replaced by the low-frequency words considered, but also vector representation results of the low-frequency words and/or translated text of the low-frequency words are further considered, so that the integrity of semantic information of the source text is improved, and the fluency of the translation results is further improved.
In addition, with the existing low-frequency word translation method, translation omissions may occur when there are many low-frequency words (three or more); the low-frequency word translation method adopted in this embodiment can greatly reduce the omission rate and make the context of the translated target text smoother.
Second embodiment
It should be noted that, this embodiment will describe a specific implementation manner of step S202 in the first embodiment.
Referring to fig. 4, a schematic flow chart of translating a source text according to a combined representation result corresponding to each low-frequency word provided in this embodiment is shown, where the method includes the following steps:
s401: and for each low-frequency word, carrying out vector fusion on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a final representation result of the low-frequency word.
In this embodiment, vector fusion may be performed on two or three vector characterization results in the combined characterization result of each low-frequency word, and a final characterization result of each low-frequency word is obtained through vector fusion.
For example, assume that the source text is x and the target text is y, and that the sub-word sequences contained in the source text x and the target text y are:

$$x = (x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, \ldots, x_m)$$

$$y = (y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8, y_9, \ldots, y_n)$$

Assume that, by word-frequency statistics, two low-frequency words to be processed can be obtained from the sub-word sequence of the source text x: $S_{unk1}(x_2, x_3)$ and $S_{unk2}(x_6, x_7, x_8)$. Their translations are obtained by the dictionary-lookup or model-translation means introduced in the first embodiment; assume the translation of the low-frequency word $S_{unk1}(x_2, x_3)$ is the sub-word sequence $T_{unk1}(y_2, y_3, y_4)$ in the target text y, and the translation of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ is the sub-word sequence $T_{unk2}(y_7, y_8, y_9)$ in the target text y.

When the wildcards UNKi are used to replace the low-frequency words and low-frequency word translations in the bilingual sentence pair (x, y), a new bilingual sentence pair $(\hat{x}, \hat{y})$ is obtained:

$$\hat{x} = (x_1, u_1, x_4, x_5, u_2, x_9, \ldots, x_m)$$

$$\hat{y} = (y_1, u_1, y_5, y_6, u_2, y_{10}, \ldots, y_n)$$

wherein $u_1$ is the wildcard replacing the low-frequency word $S_{unk1}(x_2, x_3)$, and $u_2$ is the wildcard replacing the low-frequency word $S_{unk2}(x_6, x_7, x_8)$.
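A small sketch of this replacement step, assuming low-frequency-word positions are given as half-open index spans (the function name and span format are hypothetical):

```python
def replace_with_wildcards(tokens: list, low_freq_spans: list) -> list:
    """Replace each low-frequency sub-word span with its wildcard UNKi,
    e.g. (x1, S_unk1, x4, x5, S_unk2, x9) -> (x1, UNK1, x4, x5, UNK2, x9)."""
    out, i, k = [], 0, 1
    spans = {start: end for start, end in low_freq_spans}  # span start -> end (exclusive)
    while i < len(tokens):
        if i in spans:
            out.append(f"UNK{k}")  # one wildcard stands in for the whole span
            k += 1
            i = spans[i]
        else:
            out.append(tokens[i])
            i += 1
    return out

x = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"]
# S_unk1 = (x2, x3) -> indices (1, 3), S_unk2 = (x6, x7, x8) -> indices (5, 8)
print(replace_with_wildcards(x, [(1, 3), (5, 8)]))
# ['x1', 'UNK1', 'x4', 'x5', 'UNK2', 'x9']
```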
At this time, there are the following three vector fusion methods:
Fusion mode 1: the vector characterization result of the low-frequency word $S_{unk1}(x_2, x_3)$ in the source text x and the vector characterization result of the wildcard $u_1$ can be fused to obtain the final characterization result of the low-frequency word $S_{unk1}(x_2, x_3)$; similarly, the vector characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ in the source text x and the vector characterization result of the wildcard $u_2$ are fused to obtain the final characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$.

Fusion mode 2: the vector characterization result of the translation $T_{unk1}(y_2, y_3, y_4)$ of the low-frequency word $S_{unk1}(x_2, x_3)$ in the source text x and the vector characterization result of the wildcard $u_1$ can be fused to obtain the final characterization result of the low-frequency word $S_{unk1}(x_2, x_3)$; similarly, the vector characterization result of the translation $T_{unk2}(y_7, y_8, y_9)$ of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ and the vector characterization result of the wildcard $u_2$ are fused to obtain the final characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$.

Fusion mode 3: the vector characterization result of the low-frequency word $S_{unk1}(x_2, x_3)$ in the source text x, the vector characterization result of its translation $T_{unk1}(y_2, y_3, y_4)$ and the vector characterization result of the wildcard $u_1$ can all be fused to obtain the final characterization result of the low-frequency word $S_{unk1}(x_2, x_3)$; similarly, the vector characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$, the vector characterization result of its translation $T_{unk2}(y_7, y_8, y_9)$ and the vector characterization result of the wildcard $u_2$ are fused to obtain the final characterization result of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$.
In an implementation manner of this embodiment, step S401 may specifically include: and for each low-frequency word, performing weighted calculation on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a weighted calculation result, and performing nonlinear transformation on the weighted calculation result to obtain a final representation result of the low-frequency word.
In this implementation, the function of "nonlinear transformation" is mainly realized by activation functions, which play a very important role in enabling artificial neural network models to learn and represent complex, nonlinear mapping relationships. Specifically, the hyperbolic tangent function tanh may be used as the activation function; its calculation formula is:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

FIG. 5 shows the function image of the hyperbolic tangent function tanh.
Based on this, for each low-frequency word in the source text, after performing weighted calculation on two or three vector representation results in the combined representation result corresponding to the low-frequency word, a hyperbolic tangent function tanh may be used to perform nonlinear transformation on the weighted calculation result, and the transformation result is used as the final representation result of the low-frequency word.
The nonlinear transformation by using the hyperbolic tangent function tanh mainly has the following two advantages:
first, when the value of input x is large or small, the slope of tanh will approach zero indefinitely. That is, if and only if the weighting calculation result corresponding to the low-frequency word is within a certain range, a larger gradient is generated after the non-linear transformation is performed on the low-frequency word through the tanh function; on the contrary, when the weighting calculation result corresponding to the low-frequency word exceeds a certain range, the tanh function does not make a large response any more because the tanh function approaches saturation, at this time, the role of the tanh function is to shield the weighting calculation result as an abnormal value, and in the training process of the low-frequency word translation model, based on the shielding role of the tanh function, certain bias influence on the output of a subsequent coding result and the update of a context vector caused by the fact that the weighting calculation result corresponding to the low-frequency word in the sample text (belonging to the model training data) is too large or too small is avoided.
Second, as can be seen from the tanh function image shown in FIG. 5, the function has the characteristics of smooth output, easy differentiation, and output centered at 0 and lying in (-1, 1). These characteristics give the tanh function high parameter-update efficiency, and the weighted calculation result corresponding to a low-frequency word is rescaled into a bounded range after the nonlinear transformation, which is particularly important in scenarios dense with matrix operations such as neural networks.
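The fusion just described (weighted calculation followed by a tanh nonlinear transformation) can be sketched as follows; the weight matrices stand in for $W_{u,unki}$, $W_{s,unki}$ and $W_{t,unki}$, all names are hypothetical, and fusion modes 1 and 2 are obtained by simply omitting the translation term or the word term.

```python
import torch
import torch.nn as nn

class VectorFusion(nn.Module):
    """Sketch of fusion mode 3: weighted sum of the wildcard, low-frequency-word
    and translation vectors, then a tanh nonlinearity (modes 1/2 drop a term)."""

    def __init__(self, dim: int):
        super().__init__()
        self.w_u = nn.Linear(dim, dim, bias=False)  # stands in for W_u,unki
        self.w_s = nn.Linear(dim, dim, bias=False)  # stands in for W_s,unki
        self.w_t = nn.Linear(dim, dim, bias=False)  # stands in for W_t,unki

    def forward(self, u_vec, s_vec=None, t_vec=None):
        acc = self.w_u(u_vec)            # wildcard term is always present
        if s_vec is not None:
            acc = acc + self.w_s(s_vec)  # the low-frequency word itself
        if t_vec is not None:
            acc = acc + self.w_t(t_vec)  # the translation of the low-frequency word
        return torch.tanh(acc)           # final characterization result V_i
```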
Therefore, for each low-frequency word in the source text, vector fusion can be performed on each vector representation result in the combined representation result corresponding to the low-frequency word by adopting a mode of firstly weighting calculation and then nonlinear transformation, so that the final representation result of the low-frequency word is obtained. Next, based on the above example, the above three vector fusion methods will be described in detail.
In fusion mode 1, as shown in the vector-fusion diagram of low-frequency words and wildcards in FIG. 6, assume that the source text x includes two low-frequency words, namely $S_{unk1}(x_2, x_3)$ and $S_{unk2}(x_6, x_7, x_8)$.

Assume that the vector characterization results of the sub-word sequence of the low-frequency word $S_{unk1}(x_2, x_3)$ are $S_{unk1}(v_2, v_3)$; using $S_{unk1}(v_2, v_3)$, the vector characterization result $\overleftarrow{v}_{unk1}$ of the low-frequency word $S_{unk1}(x_2, x_3)$ can be generated (for example, using the LSTM reverse-scan method mentioned in the first embodiment). In addition, using the method described in the first embodiment, the vector characterization result $v_{u_1}$ of the wildcard $u_1$ corresponding to the low-frequency word $S_{unk1}(x_2, x_3)$ can be obtained, i.e., the vector characterization result $v_{u_1}$ of the wildcard $u_1$ obtained by updating during the training of the low-frequency word translation model. At this time, $v_{u_1}$ and $\overleftarrow{v}_{unk1}$ can be vector-fused by weighted calculation followed by nonlinear transformation, namely:

$$V_1 = \tanh\big(W_{u,unk1}\, v_{u_1} + W_{s,unk1}\, \overleftarrow{v}_{unk1}\big)$$

wherein $W_{u,unk1}$ and $W_{s,unk1}$ are weights.

It can be seen that the final characterization result $V_1$ of the low-frequency word $S_{unk1}(x_2, x_3)$ includes not only the vector characterization result $v_{u_1}$ of its wildcard $u_1$, which carries the context semantic information of the sample corpus (belonging to the training data of the low-frequency word translation model) to which the low-frequency word $S_{unk1}(x_2, x_3)$ belongs, but also the semantic information of the low-frequency word $S_{unk1}(x_2, x_3)$ itself. Therefore, after $V_1$ is fed into the coding layer of the low-frequency word translation model, the correspondingly generated vector characterization result $h_{unk1}$ contains more accurate and more sufficient information. Further, the vector characterization result $h_{unk1}$ will play a positive role in the subsequent Attention module and Decoder module, so that more accurate low-frequency-word translations and source-text translations (i.e., the target text) can be generated.

Similarly, assume that the vector characterization results of the sub-word sequence of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ are $S_{unk2}(v_6, v_7, v_8)$; using $S_{unk2}(v_6, v_7, v_8)$, the vector characterization result $\overleftarrow{v}_{unk2}$ of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ can be generated (for example, using the LSTM reverse-scan method mentioned in the first embodiment). In addition, using the method described in the first embodiment, the vector characterization result $v_{u_2}$ of the wildcard $u_2$ corresponding to the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ can be obtained, i.e., the vector characterization result $v_{u_2}$ of the wildcard $u_2$ obtained by updating during the training of the low-frequency word translation model. At this time, $v_{u_2}$ and $\overleftarrow{v}_{unk2}$ can be vector-fused by weighted calculation followed by nonlinear transformation, namely:

$$V_2 = \tanh\big(W_{u,unk2}\, v_{u_2} + W_{s,unk2}\, \overleftarrow{v}_{unk2}\big)$$

wherein $W_{u,unk2}$ and $W_{s,unk2}$ are weights.

It can be seen that the final characterization result $V_2$ of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ includes not only the vector characterization result $v_{u_2}$ of its wildcard $u_2$, which carries the context semantic information of the sample corpus to which the low-frequency word belongs, but also the semantic information of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ itself. Therefore, after $V_2$ is fed into the coding layer of the low-frequency word translation model, the correspondingly generated vector characterization result $h_{unk2}$ contains more accurate and more sufficient information, and will likewise play a positive role in the subsequent Attention module and Decoder module, so that more accurate low-frequency-word translations and source-text translations (i.e., the target text) can be generated.
In fusion mode 2, as shown in the vector-fusion diagram of low-frequency word translations and wildcards in FIG. 7, assume again that the source text x includes the two low-frequency words $S_{unk1}(x_2, x_3)$ and $S_{unk2}(x_6, x_7, x_8)$.

Assume that the vector characterization results of the sub-word sequence of the low-frequency word translation $T_{unk1}(y_2, y_3, y_4)$ are $T_{unk1}(v_2, v_3, v_4)$; using $T_{unk1}(v_2, v_3, v_4)$, the vector characterization result $\overleftarrow{t}_{unk1}$ of the translation $T_{unk1}(y_2, y_3, y_4)$ can be generated (for example, using the LSTM reverse-scan method mentioned in the first embodiment). In addition, using the method described in the first embodiment, the vector characterization result $v_{u_1}$ of the wildcard $u_1$ corresponding to the low-frequency word $S_{unk1}(x_2, x_3)$ can be obtained, i.e., the vector characterization result $v_{u_1}$ of the wildcard $u_1$ obtained by updating during the training of the low-frequency word translation model. At this time, $v_{u_1}$ and $\overleftarrow{t}_{unk1}$ can be vector-fused by weighted calculation followed by nonlinear transformation, namely:

$$V_1 = \tanh\big(W_{u,unk1}\, v_{u_1} + W_{t,unk1}\, \overleftarrow{t}_{unk1}\big)$$

wherein $W_{u,unk1}$ and $W_{t,unk1}$ are weights.

It can be seen that the final characterization result $V_1$ of the low-frequency word $S_{unk1}(x_2, x_3)$ includes not only the vector characterization result $v_{u_1}$ of its wildcard $u_1$, which carries the context semantic information of the sample corpus (belonging to the training data of the low-frequency word translation model) to which the low-frequency word belongs, but also the semantic information of the low-frequency word translation $T_{unk1}(y_2, y_3, y_4)$ itself. Therefore, after $V_1$ is fed into the coding layer of the low-frequency word translation model, the correspondingly generated vector characterization result $h_{unk1}$ contains more accurate and more sufficient information, and will play a positive role in the subsequent Attention module and Decoder module, so that more accurate low-frequency-word translations and source-text translations (i.e., the target text) can be generated.

Similarly, assume that the vector characterization results of the sub-word sequence of the low-frequency word translation $T_{unk2}(y_7, y_8, y_9)$ are $T_{unk2}(v_7, v_8, v_9)$; using $T_{unk2}(v_7, v_8, v_9)$, the vector characterization result $\overleftarrow{t}_{unk2}$ of the translation $T_{unk2}(y_7, y_8, y_9)$ can be generated. In addition, the vector characterization result $v_{u_2}$ of the wildcard $u_2$ corresponding to the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ can be obtained as above. At this time, $v_{u_2}$ and $\overleftarrow{t}_{unk2}$ can be vector-fused by weighted calculation followed by nonlinear transformation, namely:

$$V_2 = \tanh\big(W_{u,unk2}\, v_{u_2} + W_{t,unk2}\, \overleftarrow{t}_{unk2}\big)$$

wherein $W_{u,unk2}$ and $W_{t,unk2}$ are weights.

It can be seen that the final characterization result $V_2$ of the low-frequency word $S_{unk2}(x_6, x_7, x_8)$ includes not only the vector characterization result $v_{u_2}$ of its wildcard $u_2$, which carries the context semantic information of the sample corpus to which the low-frequency word belongs, but also the semantic information of the low-frequency word translation $T_{unk2}(y_7, y_8, y_9)$ itself. Therefore, after $V_2$ is fed into the coding layer of the low-frequency word translation model, the correspondingly generated vector characterization result $h_{unk2}$ contains more accurate and more sufficient information, and will likewise play a positive role in the subsequent Attention module and Decoder module.
In fusion mode 3, as shown in the low-frequency word translation and wildcard vector fusion diagram of fig. 8, assume that the source text x includes two low-frequency words, namely S_unk1(x2, x3) and S_unk2(x6, x7, x8).

Referring to the above descriptions of fusion mode 1 and fusion mode 2, the vector characterization result s̄_unk1 of the low-frequency word S_unk1(x2, x3), the vector characterization result v̄_unk1 of its translation T_unk1(y2, y3, y4), and the vector characterization result ū_1 of the wildcard u_1 corresponding to S_unk1(x2, x3) can all be obtained. At this time, the three may be vector-fused by weighting calculation first and nonlinear transformation second, so as to obtain the final characterization result V_1 of the low-frequency word S_unk1(x2, x3), namely:

V_1 = g(W_{s,unk1} · s̄_unk1 + W_{t,unk1} · v̄_unk1 + W_{u,unk1} · ū_1)

Similarly, the vector characterization result s̄_unk2 of the low-frequency word S_unk2(x6, x7, x8), the vector characterization result v̄_unk2 of its translation T_unk2(y7, y8, y9), and the vector characterization result ū_2 of the wildcard u_2 corresponding to S_unk2(x6, x7, x8) can be obtained and fused in the same way, so as to obtain the final characterization result V_2 of the low-frequency word S_unk2(x6, x7, x8), namely:

V_2 = g(W_{s,unk2} · s̄_unk2 + W_{t,unk2} · v̄_unk2 + W_{u,unk2} · ū_2)

where, as before, g(·) denotes the nonlinear transformation and the W terms are weights.
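Under the same caveats as the previous sketch, fusion mode 3 simply extends the fusion to three inputs (the class and weight names are hypothetical, and tanh again stands in for g):

```python
import torch
import torch.nn as nn

class ThreeWayFusion(nn.Module):
    """Sketch of fusion mode 3: fuse the low-frequency word vector s̄, its
    translation vector v̄, and the wildcard vector ū in a single
    weighting-then-nonlinearity step."""

    def __init__(self, dim: int):
        super().__init__()
        self.W_s = nn.Linear(dim, dim, bias=False)  # weight for the word vector
        self.W_t = nn.Linear(dim, dim, bias=False)  # weight for the translation vector
        self.W_u = nn.Linear(dim, dim, bias=False)  # weight for the wildcard vector

    def forward(self, s_vec, t_vec, u_vec):
        # V = g(W_s · s̄ + W_t · v̄ + W_u · ū), with g assumed to be tanh
        return torch.tanh(self.W_s(s_vec) + self.W_t(t_vec) + self.W_u(u_vec))
```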
When fusion mode 3 is adopted for vector fusion, the vector representations of the low-frequency word's semantic information in both the source-language and the target-language vector space are exploited at the same time. A word that appears with low frequency in one language does not necessarily remain low-frequency at the other end of the bilingual pair. Fusion mode 3 therefore ensures that the semantic features of a low-frequency word in the two different languages are extracted in a complementary way during the training of the low-frequency word translation model, which effectively alleviates the low-frequency translation defect and improves the translation quality of source texts containing low-frequency words in actual translation.
In addition, during the training of the low-frequency word translation model, not only the vector characterization result of each low-frequency word but also the vector characterization result of its translation is provided, so the model can learn a copy mechanism, i.e., through training it learns to output a low-frequency word's translation directly within its context. For example, assume that when translating the source text, the model incorporates during encoding the vector characterization results T_unk1(v2, v3, v4) and T_unk2(v7, v8, v9) of the low-frequency word translations T_unk1(y2, y3, y4) and T_unk2(y7, y8, y9); the translations T_unk1(y2, y3, y4) and T_unk2(y7, y8, y9) can then appear directly in the target text.
In all three fusion modes, the larger a weight is, the greater the influence of its corresponding vector characterization result on the encoder's output. The weight values are adaptive parameters of the low-frequency word translation model: they are updated continuously during model training, and their final values are obtained when training is finished.
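Purely to make "adaptive parameters" concrete, here is a toy sketch under the same hypothetical PyTorch setup as above (the training signal and loss are placeholders, not anything the patent specifies): the fusion weights receive gradients like any other model parameter and keep their final values once training stops.

```python
import torch
import torch.nn as nn

# The fusion weights are ordinary trainable parameters of the model.
W_u = nn.Linear(512, 512, bias=False)
W_t = nn.Linear(512, 512, bias=False)
optimizer = torch.optim.Adam(list(W_u.parameters()) + list(W_t.parameters()), lr=1e-3)

u_vec, t_vec = torch.randn(512), torch.randn(512)
target = torch.randn(512)  # placeholder training signal

V = torch.tanh(W_u(u_vec) + W_t(t_vec))   # fusion mode 2, tanh assumed
loss = nn.functional.mse_loss(V, target)  # placeholder loss
loss.backward()
optimizer.step()  # W_u and W_t are updated; after training they are fixed
```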
S402: translating the source text according to the respective final characterization result of each low-frequency word, to obtain the target text.
In this embodiment, after the final characterization result of each low-frequency word in the source text is obtained in S401, encoding may be performed according to the final characterization results of the low-frequency words and the vector characterization results of the other words in the source text, and the source text is then translated based on the encoding result. Because the encoding result fuses the context information of the low-frequency words with the semantic information of the low-frequency words themselves and/or their translations, translating the source text based on this encoding result improves the accuracy and fluency of the translation result.
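As a hedged sketch of S402 (the function name, shapes, and the substitute-at-wildcard-position interface are illustrative assumptions, not the patent's prescribed API): each low-frequency word's final characterization result replaces the embedding at that word's wildcard position before the sequence is encoded.

```python
import torch

def build_encoder_input(token_embeddings: torch.Tensor,
                        fused_by_position: dict) -> torch.Tensor:
    """Substitute each low-frequency word's final characterization result
    into the source-side embedding sequence at its wildcard position.
    token_embeddings: (seq_len, dim); fused_by_position: {position: (dim,)}.
    """
    enc_input = token_embeddings.clone()
    for pos, fused_vec in fused_by_position.items():
        enc_input[pos] = fused_vec  # overwrite the wildcard's embedding
    return enc_input

# Hypothetical usage: a 10-token source with low-frequency words at positions 2 and 6
emb = torch.randn(10, 512)
fused = {2: torch.randn(512), 6: torch.randn(512)}
encoder_input = build_encoder_input(emb, fused)  # then fed to the encoder as usual
```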
Third embodiment
In this embodiment, a low-frequency word translation apparatus will be described; for related content, please refer to the method embodiments above.
Referring to fig. 9, a schematic composition diagram of a low frequency word translation apparatus provided in an embodiment of the present application is shown, where the apparatus 900 includes:
a representation result generating unit 901, configured to generate a combined representation result corresponding to each low-frequency word in the source text; the combined representation result comprises a vector representation result corresponding to the low-frequency word and/or a vector representation result corresponding to a translation of the low-frequency word, and a vector representation result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and the low-frequency word translation unit 902 is configured to translate the source text according to the combined representation result corresponding to each low-frequency word, so as to obtain a target text.
In an implementation manner of this embodiment, the low-frequency word translation unit 902 includes:
the vector fusion subunit is used for performing vector fusion on each vector representation result in the combined representation result corresponding to each low-frequency word to obtain a final representation result of the low-frequency word;
and the low-frequency word translation subunit is used for translating the source text according to the respective final representation result of each low-frequency word.
In an implementation manner of this embodiment, the vector fusion subunit includes:
the weighting calculation subunit is configured to perform weighting calculation on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a weighting calculation result;
and the nonlinear transformation subunit is used for carrying out nonlinear transformation on the weighting calculation result to obtain a final characterization result of the low-frequency word.
In an implementation manner of this embodiment, the characterization result generating unit 901 is specifically configured to generate a vector characterization result corresponding to a low-frequency word by using a vector characterization result of each subword corresponding to the low-frequency word.
In an implementation manner of this embodiment, the characterization result generating unit 901 includes:
the first scanning subunit is used for reversely scanning the subwords of the corresponding low-frequency word by using the neural network, to obtain the vector representation result of the first subword among the subwords;
and the first generating subunit is used for taking the vector representation result of the first subword as the vector representation result of the corresponding low-frequency word.
In an implementation manner of this embodiment, the characterization result generating unit 901 is specifically configured to generate a vector characterization result of the translated text corresponding to the low-frequency word by using a vector characterization result of each subword of the translated text corresponding to the low-frequency word.
In an implementation manner of this embodiment, the characterization result generating unit 901 includes:
the second scanning subunit is used for reversely scanning the subwords of the translation of the corresponding low-frequency word by using the neural network, to obtain the vector representation result of the first subword among the subwords;
and the second generating subunit is used for taking the vector representation result of the first subword as the vector representation result of the translation of the corresponding low-frequency word.
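For illustration only, the reverse scan performed by the first/second scanning subunits might look like the sketch below: an LSTM reads the subword vectors from last to first, and the hidden state emitted at the first subword is taken as the word's (or translation's) vector representation result. The function name and the use of a freshly constructed LSTM are assumptions; in the patent the scanning network is trained with the model.

```python
import torch
import torch.nn as nn

def reverse_scan(subword_vectors: torch.Tensor, lstm: nn.LSTM) -> torch.Tensor:
    """Reverse-scan sketch: subword_vectors has shape (num_subwords, dim).
    The LSTM consumes the reversed sequence, so its last output is aligned
    with the first subword, which becomes the vector representation result."""
    reversed_seq = torch.flip(subword_vectors, dims=[0]).unsqueeze(1)  # (L, 1, dim)
    outputs, _ = lstm(reversed_seq)                                    # (L, 1, dim)
    return outputs[-1, 0]  # hidden state aligned with the first subword

# Hypothetical usage on a 3-subword low-frequency word in a 512-dim space
lstm = nn.LSTM(input_size=512, hidden_size=512)  # untrained here, for shapes only
word_vec = reverse_scan(torch.randn(3, 512), lstm)
```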
In an implementation manner of this embodiment, the vector representation result of the wildcard carries context semantic information of a sample corpus to which the corresponding low-frequency word belongs.
Further, an embodiment of the present application further provides a low-frequency word translation device, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any one of the implementation methods of the low-frequency word translation method.
Further, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute any implementation method of the low-frequency word translation method.
Further, an embodiment of the present application further provides a computer program product, which when running on a terminal device, causes the terminal device to execute any implementation method of the above low-frequency word translation method.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps of the methods in the above embodiments can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application may be implemented, in essence or in part, in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
It is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprises a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. A method for translating low-frequency words, comprising:
generating a combined representation result corresponding to each low-frequency word in the source text; the combined representation result comprises a vector representation result corresponding to the low-frequency word and/or a vector representation result corresponding to a translation of the low-frequency word, and a vector representation result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and translating the source text according to the combined representation result corresponding to each low-frequency word to obtain a target text.
2. The method according to claim 1, wherein translating the source text according to the combined representation result corresponding to each low-frequency word comprises:
for each low-frequency word, carrying out vector fusion on each vector representation result in the combined representation result corresponding to the low-frequency word to obtain a final representation result of the low-frequency word;
and translating the source text according to the respective final characterization result of each low-frequency word.
3. The method according to claim 2, wherein performing vector fusion on each vector characterization result in the combined characterization results corresponding to the low-frequency word to obtain a final characterization result of the low-frequency word comprises:
performing weighted calculation on each vector representation result in the combined representation results corresponding to the low-frequency words to obtain weighted calculation results;
and carrying out nonlinear transformation on the weighting calculation result to obtain a final characterization result of the low-frequency word.
4. The method according to any one of claims 1 to 3, wherein the vector characterization result of the corresponding low-frequency word is generated as follows:
and generating a vector representation result of the corresponding low-frequency word by using the vector representation result of each subword of the corresponding low-frequency word.
5. The method according to claim 4, wherein the generating a vector characterization result of the corresponding low-frequency word by using the vector characterization result of each subword of the corresponding low-frequency word comprises:
reversely scanning each subword of the corresponding low-frequency word by using a neural network, to obtain a vector representation result of the first subword among the subwords;
and taking the vector representation result of the first subword as the vector representation result of the corresponding low-frequency word.
6. The method of any one of claims 1 to 3, wherein the vector characterization result of the translated version of the corresponding low-frequency word is generated as follows:
and generating a vector representation result of the translation of the corresponding low-frequency word by using the vector representation result of each sub-word of the translation of the corresponding low-frequency word.
7. The method of claim 6, wherein generating the vector characterization result of the translated version of the corresponding low-frequency word by using the vector characterization result of each subword of the translated version of the corresponding low-frequency word comprises:
reversely scanning each subword of the translation of the corresponding low-frequency word by using a neural network, to obtain a vector representation result of the first subword among the subwords;
and taking the vector representation result of the first subword as the vector representation result of the translation of the corresponding low-frequency word.
8. The method according to any one of claims 1 to 3, wherein the vector representation result of the wildcard carries the context semantic information of the sample corpus to which the corresponding low-frequency word belongs.
9. A low-frequency word translation apparatus, comprising:
the characterization result generation unit is used for generating a combined characterization result corresponding to each low-frequency word in the source text; the combined representation result comprises a vector representation result corresponding to the low-frequency word and/or a vector representation result corresponding to a translation of the low-frequency word, and a vector representation result of the wildcard after the corresponding low-frequency word is replaced by the wildcard;
and the low-frequency word translation unit is used for translating the source text according to the combined representation result corresponding to each low-frequency word to obtain a target text.
10. The apparatus of claim 9, wherein the low frequency word translation unit comprises:
the vector fusion subunit is used for performing vector fusion on each vector representation result in the combined representation result corresponding to each low-frequency word to obtain a final representation result of the low-frequency word;
and the low-frequency word translation subunit is used for translating the source text according to the respective final representation result of each low-frequency word.
11. A low-frequency word translation apparatus, comprising: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is to store one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the method of any of claims 1-8.
12. A computer-readable storage medium having stored therein instructions that, when executed on a terminal device, cause the terminal device to perform the method of any one of claims 1-8.
13. A computer program product, characterized in that it, when run on a terminal device, causes the terminal device to perform the method of any of claims 1-8.
CN201910020175.1A 2019-01-09 2019-01-09 Low-frequency word translation method and device Active CN111428518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910020175.1A CN111428518B (en) 2019-01-09 2019-01-09 Low-frequency word translation method and device

Publications (2)

Publication Number Publication Date
CN111428518A true CN111428518A (en) 2020-07-17
CN111428518B CN111428518B (en) 2023-11-21

Family

ID=71546099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910020175.1A Active CN111428518B (en) 2019-01-09 2019-01-09 Low-frequency word translation method and device

Country Status (1)

Country Link
CN (1) CN111428518B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1643511A (en) * 2002-03-11 2005-07-20 南加利福尼亚大学 Named entity translation
CN101194253A (en) * 2005-06-14 2008-06-04 微软公司 Collocation translation from monolingual and available bilingual corpora
US20070150260A1 (en) * 2005-12-05 2007-06-28 Lee Ki Y Apparatus and method for automatic translation customized for documents in restrictive domain
US20100332217A1 (en) * 2009-06-29 2010-12-30 Shalom Wintner Method for text improvement via linguistic abstractions
CN101833550A (en) * 2010-03-10 2010-09-15 无锡市百川科技有限公司 Method and device for intercepting English display information at real time and translating thereof into Chinese
CN103189859A (en) * 2010-08-26 2013-07-03 谷歌公司 Conversion of input text strings
US20140163951A1 (en) * 2012-12-07 2014-06-12 Xerox Corporation Hybrid adaptation of named entity recognition
CN104346459A (en) * 2014-11-10 2015-02-11 南京信息工程大学 Text classification feature selecting method based on term frequency and chi-square statistics
US20170103062A1 (en) * 2015-10-08 2017-04-13 Facebook, Inc. Language independent representations
JP2018055671A (en) * 2016-09-21 2018-04-05 パナソニックIpマネジメント株式会社 Paraphrase identification method, paraphrase identification device, and paraphrase identification program
CN106446230A (en) * 2016-10-08 2017-02-22 国云科技股份有限公司 Method for optimizing word classification in machine learning text
CN108228574A (en) * 2017-12-07 2018-06-29 科大讯飞股份有限公司 Text translation processing method and device
CN108170686A (en) * 2017-12-29 2018-06-15 科大讯飞股份有限公司 Text interpretation method and device
CN108228576A (en) * 2017-12-29 2018-06-29 科大讯飞股份有限公司 Text interpretation method and device
CN108763221A (en) * 2018-06-20 2018-11-06 科大讯飞股份有限公司 A kind of attribute-name characterizing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN Huaixing et al., "A Method for Extracting Translation Equivalent Pairs of Named Entities", Journal of Chinese Information Processing *
CHEN Huaixing et al., "A Method for Extracting Translation Equivalent Pairs of Named Entities", Journal of Chinese Information Processing, No. 04, 15 July 2008 (2008-07-15) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274826A (en) * 2020-01-19 2020-06-12 南京新一代人工智能研究院有限公司 Semantic information fusion-based low-frequency word translation method
CN111274826B (en) * 2020-01-19 2021-02-05 南京新一代人工智能研究院有限公司 Semantic information fusion-based low-frequency word translation method
CN112560510A (en) * 2020-12-10 2021-03-26 科大讯飞股份有限公司 Translation model training method, device, equipment and storage medium
CN112560510B (en) * 2020-12-10 2023-12-01 科大讯飞股份有限公司 Translation model training method, device, equipment and storage medium
CN113051936A (en) * 2021-03-16 2021-06-29 昆明理工大学 Method for enhancing Hanyue neural machine translation based on low-frequency word representation

Also Published As

Publication number Publication date
CN111428518B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN109086267B (en) Chinese word segmentation method based on deep learning
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
US10255275B2 (en) Method and system for generation of candidate translations
CN106502985B (en) neural network modeling method and device for generating titles
CN112733541A (en) Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN109614471B (en) Open type problem automatic generation method based on generation type countermeasure network
CN107967262A A neural-network-based Mongolian-Chinese machine translation method
Lin et al. Automatic translation of spoken English based on improved machine learning algorithm
CN111666758B (en) Chinese word segmentation method, training device and computer readable storage medium
CN110162766B (en) Word vector updating method and device
CN110532555B (en) Language evaluation generation method based on reinforcement learning
CN114757182A (en) BERT short text sentiment analysis method for improving training mode
US11475225B2 (en) Method, system, electronic device and storage medium for clarification question generation
CN111428518B (en) Low-frequency word translation method and device
WO2023134083A1 (en) Text-based sentiment classification method and apparatus, and computer device and storage medium
CN111767718A (en) Chinese grammar error correction method based on weakened grammar error feature representation
Wu et al. An effective approach of named entity recognition for cyber threat intelligence
CN114428850A (en) Text retrieval matching method and system
CN114064856A (en) XLNET-BiGRU-based text error correction method
CN111274826B (en) Semantic information fusion-based low-frequency word translation method
Mathur et al. A scaled‐down neural conversational model for chatbots
CN114218928A (en) Abstract text summarization method based on graph knowledge and theme perception
CN111353040A (en) GRU-based attribute level emotion analysis method
CN111274827B Suffix translation method based on bag-of-words multi-objective learning
CN112579739A Reading comprehension method based on ELMo embedding and gated self-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant