CN107273363A

CN107273363A - Language text translation method and system

Info

Publication number: CN107273363A
Application number: CN201710335652.4A
Authority: CN
Inventors: Liu Yang (刘洋); Zhang Jiacheng (张嘉成); Sun Maosong (孙茂松); Luan Huanbo (栾焕博); Xu Jingfang (许静芳)
Original assignee: Tsinghua University; Beijing Sogou Technology Development Co Ltd
Current assignee: Tsinghua University; Beijing Sogou Technology Development Co Ltd
Priority date: 2017-05-12
Filing date: 2017-05-12
Publication date: 2017-10-20
Anticipated expiration: 2037-05-12
Also published as: CN107273363B

Abstract

The present invention provides a language text translation method and system. The method includes: determining, according to a preset translation candidate set determination rule, the translation candidate set corresponding to a source language text, where the translation candidate set contains multiple translations of the source language text and the source language text is the text to be translated; determining a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, where the first probability distribution indicates the probability that each translation conforms to the prior knowledge model and the second probability distribution indicates the probability that each translation conforms to the translation model; and determining the translation of the source language text from the translation candidate set based on the first and second probability distributions. The invention can incorporate arbitrary prior knowledge into the translation model, thereby improving the accuracy and reliability of machine translation.

Description

Language text translation method and system

Technical field

The present invention relates to the field of machine translation technology, and in particular to a language text translation method and system.

Background

As internationalization advances, exchanges between speakers of different languages are growing day by day, and translation has become an extremely important tool in these exchanges. Machine translation is convenient, simple, and free. It therefore largely meets people's translation needs and improves the efficiency of international communication, and as a result people now demand higher correctness from it.

Machine translation can broadly be divided into rule-based machine translation and corpus-based machine translation. The key issue in corpus-based machine translation is building a complete corpus, also called high-quality training samples. High-quality training samples directly affect translation accuracy, but building them is not easy, for two reasons:

- The sample data is limited and cannot characterize the distribution of the original data well.
- Even when there is enough sample data, erroneous samples (noisy data) cannot be avoided.

A neural network trained on such samples can hardly capture the true model accurately and may even produce output that violates prior knowledge. In this situation, introducing prior knowledge becomes particularly important. Translation rules such as "do not translate anything twice, and do not omit anything" are examples of prior knowledge. Many studies have shown that incorporating prior knowledge into a neural network model to constrain it can improve the network's performance.

Attention-based neural machine translation (Attention-based NMT) is a branch of corpus-based machine translation and is also the method used in current mainstream translation systems. Its basic idea is to map the source language text directly to the target language text with an end-to-end nonlinear neural network, i.e., to build an "encoder-decoder" framework. Given a source language sentence, an encoder first maps it to a continuous, dense vector, and a decoder then converts that vector into a target language sentence. However, with this method it is difficult to integrate prior knowledge into the neural network.

Some existing techniques do integrate prior knowledge into neural networks:

- Some represent the prior knowledge with additional neural network modules.
- Others incorporate the prior knowledge by adding constraint terms to the training objective.

These techniques can significantly improve translation quality, but each has a limitation. The former also requires the correlations between different pieces of prior knowledge to be modeled, and the latter can only add a small number of simple constraint terms. As a result, it is difficult to use these techniques to incorporate arbitrary, complex prior knowledge into neural machine translation models.

Therefore, a translation method that can incorporate arbitrary prior knowledge into a neural machine translation model is urgently needed.

Summary of the invention

To solve the prior-art problem that arbitrary prior knowledge cannot be incorporated into neural translation models, the present invention provides a language text translation method and system.

In one aspect, the present invention provides a language text translation method, comprising:

determining, according to a preset translation candidate set determination rule, the translation candidate set corresponding to a source language text, where the translation candidate set contains multiple translations of the source language text and the source language text is the language text to be translated;

determining a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, where the first probability distribution indicates the probability that each translation conforms to the prior knowledge model and the second probability distribution indicates the probability that each translation conforms to the translation model; and

determining the translation of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution.

In another aspect, the present invention provides a language text translation system, comprising:

a translation candidate set module, configured to determine, according to a preset translation candidate set determination rule, the translation candidate set corresponding to a source language text, where the translation candidate set contains multiple translations of the source language text and the source language text is the language text to be translated;

a training module, configured to determine a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, where the first probability distribution indicates the probability that each translation conforms to the prior knowledge model and the second probability distribution indicates the probability that each translation conforms to the translation model; and

a translation module, configured to determine the translation of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution.

The method and system work in two steps. First, they compute the probability distributions of the prior knowledge model and of the translation model over the translation candidate set. Second, they use the difference between these two distributions as part of the training objective. This lets the machine translation model learn arbitrary prior knowledge and improves the accuracy and reliability of machine translation results.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the language text translation method provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of the language text translation system provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Fig. 1 is a schematic flowchart of the language text translation method provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:

Step 101: determine, according to a preset translation candidate set determination rule, the translation candidate set corresponding to the source language text. The translation candidate set contains multiple translations of the source language text, which is the language text to be translated.

Step 102: determine a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model. The first probability distribution indicates the probability that each translation conforms to the prior knowledge model; the second indicates the probability that each translation conforms to the translation model.

Step 103: determine the translation of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution.

Specifically, the preset translation candidate set determination rule works as follows. Translation is a sequence generation task. The source language text x contains multiple characters or words, and when a translation is generated, each previously generated character or word serves as input for generating the next one. The number of possible translations of x grows exponentially with its length, so the full candidate set cannot be computed efficiently. In practice, random sampling or beam search is therefore used to obtain multiple translations of the source language text, i.e., the translation candidate set S(x). This can be implemented with existing techniques and is not described further here.
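As a rough illustration of building S(x) by beam search, the sketch below keeps the `beam_size` highest-scoring prefixes at each decoding step and returns the best finished hypotheses. The function names, the `</s>` end marker, and `toy_model` (a hypothetical next-token table that ignores the source sentence) are stand-ins for a real NMT decoder. The patent does not specify these search details.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"
Hyp = Tuple[Tuple[str, ...], float]  # (token prefix, cumulative log-probability)


def beam_search(next_logprobs: Callable[[Tuple[str, ...]], Dict[str, float]],
                beam_size: int, max_len: int) -> List[Hyp]:
    """Return up to beam_size hypotheses, best first, to serve as S(x)."""
    beams: List[Hyp] = [((), 0.0)]
    finished: List[Hyp] = []
    for _ in range(max_len):
        # Extend every live prefix by every possible next token.
        expanded = [(prefix + (tok,), score + lp)
                    for prefix, score in beams
                    for tok, lp in next_logprobs(prefix).items()]
        # Keep only the beam_size best extensions; completed ones leave the beam.
        expanded.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for hyp in expanded[:beam_size]:
            if hyp[0][-1] == EOS:
                finished.append(hyp)
            else:
                beams.append(hyp)
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len is reached
    finished.sort(key=lambda h: h[1], reverse=True)
    return finished[:beam_size]


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical next-token model standing in for an NMT decoder."""
    table = {
        (): {"Bush": 0.9, "Sharon": 0.1},
        ("Bush",): {"held": 0.7, "had": 0.3},
        ("Bush", "held"): {"talks": 0.8, EOS: 0.2},
        ("Bush", "had"): {"lunch": 1.0},
    }
    probs = table.get(prefix, {EOS: 1.0})  # unknown prefixes end the sentence
    return {tok: math.log(p) for tok, p in probs.items()}


candidates = beam_search(toy_model, beam_size=2, max_len=5)
```

Random sampling would replace the sort-and-truncate step with drawing tokens from the next-token distribution. Either way, the result is a small, tractable subset of all translations.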

Next, the first probability distribution Q̃(y|x;γ) is determined from the translation candidate set S(x) and the preset prior knowledge model Q(y|x;γ). The second probability distribution P̃(y|x;θ) is determined from S(x) and the preset translation model P(y|x;θ). Finally, the translation y of the source language text is selected from the translation candidate set based on the first and second probability distributions.

For clarity, let the source language text x be the input and the translation y be the output; together they form a sentence pair (x, y). In practice, the same character or word can have different meanings in different contexts, and x consists of multiple characters or words in a particular order. Because of lexical ambiguity and uncertainty in word order, one source language text may correspond to multiple translations (y1, y2, y3, and so on). The translation with the highest probability among them is the optimal translation. To distinguish it from the other translations, it is called the target language text.

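For example, different prior knowledge models Q(y|x;γ) can be obtained by choosing different feature functions φ(x, y). The first probability distribution can be determined according to the following formula:

$$\tilde{Q}(y \mid x;\gamma) = \frac{\exp\big(\gamma \cdot \phi(x,y)\big)}{\sum_{y' \in S(x)} \exp\big(\gamma \cdot \phi(x,y')\big)}$$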

where x is the source language text, y is the translation being scored (the candidate target language text), y′ ranges over the translations in S(x), and γ is the preset parameter of the prior knowledge model.

The feature function φ(x, y) represents the correspondence between the source language text and a translation according to a prior knowledge database. Using a specific feature function, the prior knowledge model scores each translation y1, y2, and y3, i.e., it computes the probability that each translation conforms to the prior knowledge model. The better a translation conforms to the prior knowledge model, the higher its probability.
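Read as a formula, the first probability distribution (written Q̃(y|x;γ) in this sketch) is a softmax of the linear prior score γ·φ(x, y). It is normalized only over the candidates in S(x), which is consistent with the e⁴/(e²+e³+e⁴) arithmetic in the worked example later on. The minimal Python sketch below assumes φ(x, y) has already been evaluated into a numeric vector per candidate. The function name and the feature values in the usage line are illustrative only.

```python
import math
from typing import List, Sequence


def prior_distribution(features: Sequence[Sequence[float]],
                       gamma: Sequence[float]) -> List[float]:
    """First distribution Q~(y|x; gamma) over the candidates in S(x).

    features[i] is the feature vector phi(x, y_i) of the i-th candidate;
    gamma holds one weight per feature.
    """
    # Linear prior score gamma . phi(x, y) for every candidate.
    scores = [sum(g * f for g, f in zip(gamma, phi)) for phi in features]
    # Softmax restricted to S(x); subtracting the max avoids overflow in exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Made-up features per candidate: [matched word pairs, length penalty].
q = prior_distribution([[3.0, -0.1], [3.0, -0.5], [1.0, 0.0]], gamma=[1.0, 2.0])
```

Candidates with higher γ·φ get more probability mass, so the feature weights γ decide how strongly each piece of prior knowledge counts.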

The translation model P(y|x;θ) is the scoring model commonly used in machine translation. It can be obtained by training on a parallel corpus and represents the correspondence between source language texts x and translations y in that corpus. It is used to compute the probability that each translation conforms to the translation model. This belongs to the prior art and is not described further here.

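From the translation candidate set S(x) and the translation model P(y|x;θ), the second probability distribution can be determined by the following formula:

$$\tilde{P}(y \mid x;\theta) = \frac{P(y \mid x;\theta)^{\alpha}}{\sum_{y' \in S(x)} P(y' \mid x;\theta)^{\alpha}}$$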

where x is the source language text, y is the translation being scored (the candidate target language text), y′ ranges over the translations in S(x), θ is the parameter of the translation model, and α is a preset hyper-parameter that controls the sharpness of the second probability distribution.
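The second probability distribution can be sketched as raising each candidate's translation-model probability to the power α and renormalizing over S(x). This P^α form is our reading of the role the text gives α, so treat it as an assumption. The function name is hypothetical, and the computation is done in log space for numerical stability.

```python
import math
from typing import List, Sequence


def sharpened_distribution(model_probs: Sequence[float], alpha: float) -> List[float]:
    """Second distribution P~(y|x; theta) over S(x).

    model_probs[i] is P(y_i | x; theta) from the translation model (must be > 0);
    alpha > 1 sharpens the distribution, 0 <= alpha < 1 flattens it.
    """
    # alpha * log P, then a stable softmax, i.e. P^alpha renormalised over S(x).
    logs = [alpha * math.log(p) for p in model_probs]
    m = max(logs)
    exps = [math.exp(l - m) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]


flat = sharpened_distribution([0.2, 0.1, 0.1], alpha=0.0)   # uniform
plain = sharpened_distribution([0.2, 0.1, 0.1], alpha=1.0)  # renormalised P
sharp = sharpened_distribution([0.2, 0.1, 0.1], alpha=2.0)  # more peaked
```

S(x) is only a sample of all translations, so the raw model probabilities generally do not sum to 1. The renormalization handles that, and α = 0 gives a uniform distribution.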

The method provided by this embodiment scores the translations from two perspectives, using both the prior knowledge model and the translation model. This encourages translations that better conform to the prior knowledge model to also receive higher probability under the translation model. The target language text is then selected from the translation candidate set, which improves the performance of the translation model and the accuracy of the translation results.

On the basis of the above embodiment, determining the translation of the source language text from the translation candidate set based on the first and second probability distributions comprises:

determining a probability difference parameter value based on the first probability distribution and the second probability distribution, where the probability difference parameter indicates the difference between the two distributions; and

determining the translation of the source language text from the translation candidate set based on the probability difference parameter value.

Specifically, the method proceeds in four steps:

1. Determine the translation candidate set S(x) for the source language text x according to the preset translation candidate set determination rule.
2. Determine the first probability distribution Q̃(y|x;γ) and the second probability distribution P̃(y|x;θ) from the translation candidate set, the translation model, and the prior knowledge model.
3. Determine the probability difference parameter value between the two distributions.
4. Based on that value, select the translation y of x from S(x).

For example, a user logs into the translation system and enters, in the Chinese input field of the Chinese-to-English translation window, a source language text x meaning "many airports were all forced to close". The translation candidate set S(x) determined from x contains two translations: y1 is "Many airports were closed to close" and y2 is "Many airports were forced to close down".

The first probability distribution is determined according to the prior knowledge model:

Q(y1|x) = 0.2, i.e., the probability that sentence pair (x, y1) conforms to the prior knowledge model is 0.2. Q(y2|x) = 0.8, i.e., the probability that sentence pair (x, y2) conforms to the prior knowledge model is 0.8.

The second probability distribution is determined according to the translation model:

P(y1|x) = 0.6, i.e., the probability that sentence pair (x, y1) conforms to the translation model is 0.6. P(y2|x) = 0.4, i.e., the probability that sentence pair (x, y2) conforms to the translation model is 0.4.

The difference parameter value between the first and second probability distributions can then be determined. Based on this value, the translation model is adjusted and the two translations are scored again, giving P(y1|x) = 0.3 and P(y2|x) = 0.7.

Accordingly, the translation y of the source language text x ("many airports were all forced to close") is determined to be "Many airports were forced to close down".

As this embodiment shows, the method re-scores the translations with the translation model based on the difference parameter value between the first and second probability distributions. This raises the probability that translations conforming to prior knowledge receive in the translation model's distribution, and so yields a more accurate translation of the source language text.

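On the basis of the above embodiment, the difference parameter value between the first and second probability distributions is the KL (Kullback-Leibler) distance, which can be determined by the following formula:

$$\mathrm{KL}\big(\tilde{Q}(\cdot \mid x;\gamma) \,\big\|\, \tilde{P}(\cdot \mid x;\theta)\big) = \sum_{y \in S(x)} \tilde{Q}(y \mid x;\gamma) \log \frac{\tilde{Q}(y \mid x;\gamma)}{\tilde{P}(y \mid x;\theta)}$$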
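Below is a minimal sketch of the KL distance over the candidate set. The argument order, with the prior distribution first (KL(Q̃‖P̃)), is an assumption that follows the usual posterior-regularization setup; swapping the arguments gives the other direction. The usage lines reuse the airport example's numbers: the prior is (0.2, 0.8), and the translation model gives (0.6, 0.4) before re-scoring and (0.3, 0.7) after.

```python
import math
from typing import Sequence


def kl_divergence(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(q || p) = sum over y in S(x) of q(y) * log(q(y) / p(y))."""
    total = 0.0
    for qy, py in zip(q, p):
        if qy > 0.0:  # terms with q(y) = 0 contribute nothing
            total += qy * math.log(qy / py)
    return total


# Airport example: prior (0.2, 0.8) vs. translation model before/after re-scoring.
before = kl_divergence([0.2, 0.8], [0.6, 0.4])
after = kl_divergence([0.2, 0.8], [0.3, 0.7])
```

The value falls from about 0.335 to about 0.026, so the re-scored translation model is much closer to the prior distribution.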

On the basis of the above embodiments, determining the translation of the source language text from the translation candidate set based on the probability difference parameter value comprises:

determining a training objective based on the difference parameter value, where the training objective is used to make the translation model approach the prior knowledge model; and

determining the translation of the source language text from the translation candidate set based on the training objective and a preset reranking model.

Specifically, the method proceeds in five steps:

1. Determine the translation candidate set S(x) for the source language text x according to the preset translation candidate set determination rule.
2. Determine the first probability distribution Q̃(y|x;γ) and the second probability distribution P̃(y|x;θ) from the translation candidate set, the translation model, and the prior knowledge model.
3. Determine the probability difference parameter value between the two distributions.
4. Based on that value, determine a training objective J(θ, γ) that makes the translation model approach the prior knowledge model.
5. Select the translation y of x from S(x) based on J(θ, γ) and the preset reranking model.

Conventionally, translations are scored with the log-likelihood of the translation model P(y|x;θ) as the standard training criterion. That is, the traditional training objective is the log-likelihood function L(θ) = log P(y|x;θ), summed over all sentence pairs in the training data.

The difference parameter value between the first and second probability distributions is added to the traditional training objective, which gives a new training objective J(θ, γ). The optimal parameters θ and γ under this objective make the translation that best conforms to prior knowledge receive the highest probability in the translation model's (second) distribution. The translation model is therefore more inclined to choose, from the translation candidate set S(x), a translation that conforms to prior knowledge as the target language text y of x.

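Optionally, if the difference parameter value is the KL distance, the training objective can be determined according to the following formula:

$$J(\theta,\gamma) = \lambda_1 \sum_{n=1}^{N} \log P\big(y^{(n)} \mid x^{(n)};\theta\big) - \lambda_2 \sum_{n=1}^{N} \mathrm{KL}\Big(\tilde{Q}\big(\cdot \mid x^{(n)};\gamma\big) \,\Big\|\, \tilde{P}\big(\cdot \mid x^{(n)};\theta\big)\Big)$$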

where λ₁ and λ₂ are preset hyper-parameters that balance the terms of the training objective, and N is the number of sentence pairs in the training data.
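For a batch of N sentence pairs, the objective can be evaluated as shown below. This sketch only computes J(θ, γ). In training, J would be maximized over θ and γ (for example by gradient ascent), which is not shown. The KL direction KL(Q̃‖P̃) and the helper names are assumptions made for illustration.

```python
import math
from typing import Sequence


def kl_divergence(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(q || p) over one candidate set S(x)."""
    return sum(qy * math.log(qy / py) for qy, py in zip(q, p) if qy > 0.0)


def training_objective(log_likelihoods: Sequence[float],
                       q_dists: Sequence[Sequence[float]],
                       p_dists: Sequence[Sequence[float]],
                       lambda1: float = 1.0, lambda2: float = 1.0) -> float:
    """J = lambda1 * sum_n log P(y_n|x_n) - lambda2 * sum_n KL(Q~_n || P~_n)."""
    if not (len(log_likelihoods) == len(q_dists) == len(p_dists)):
        raise ValueError("need one log-likelihood and one (Q~, P~) pair per sentence pair")
    likelihood_term = sum(log_likelihoods)
    kl_term = sum(kl_divergence(q, p) for q, p in zip(q_dists, p_dists))
    return lambda1 * likelihood_term - lambda2 * kl_term


# Two sentence pairs: reference log-likelihoods plus Q~ and P~ over each S(x).
ll = [math.log(0.5), math.log(0.25)]
j_agree = training_objective(ll, [[0.5, 0.5], [0.9, 0.1]], [[0.5, 0.5], [0.9, 0.1]])
j_disagree = training_objective(ll, [[0.2, 0.8], [0.9, 0.1]], [[0.6, 0.4], [0.9, 0.1]])
```

When Q̃ and P̃ agree, J reduces to the (weighted) log-likelihood. Any disagreement lowers J, which pushes training toward the prior.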

After the optimal parameters θ and γ are obtained with the new training objective, the following reranking model is used to select the translation of the source language text from the translation candidate set:

$$\hat{y} = \arg\max_{y \in S(x)} \big\{ \log P(y \mid x;\theta) + \gamma \cdot \phi(x,y) \big\}$$
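The reranking step is a plain argmax over the candidate set of the translation-model log-probability plus the prior score. The sketch below assumes log P(y|x;θ) and φ(x, y) are precomputed. In the usage lines, the pre-adjustment probabilities (0.4, 0.5, 0.1) and the feature counts (4, 3, 2) come from the Bush example below. The choice γ = 1 is made here for illustration.

```python
import math
from typing import List, Sequence


def rerank(candidates: List[str], log_probs: Sequence[float],
           features: Sequence[Sequence[float]], gamma: Sequence[float]) -> str:
    """Return argmax over S(x) of log P(y|x; theta) + gamma . phi(x, y)."""
    scores = [lp + sum(g * f for g, f in zip(gamma, phi))
              for lp, phi in zip(log_probs, features)]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


cands = ["Bush held a talk with Sharon", "Bush held a talk with Bush",
         "Bush had lunch with Sharon"]
log_p = [math.log(0.4), math.log(0.5), math.log(0.1)]
phi = [[4.0], [3.0], [2.0]]
print(rerank(cands, log_p, phi, gamma=[1.0]))  # Bush held a talk with Sharon
print(rerank(cands, log_p, phi, gamma=[0.0]))  # Bush held a talk with Bush
```

With γ = 0 the reranker reduces to the translation model alone and picks the repeated-"Bush" output. A positive γ lets the prior knowledge overturn that choice.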

For example, assume the source language text x is the Chinese sentence "布什与沙龙举行了会谈" ("Bush held talks with Sharon"). The translation candidate set S(x) determined from x contains three translations: y1 is "Bush held a talk with Sharon", y2 is "Bush held a talk with Bush", and y3 is "Bush had lunch with Sharon".

Assume the feature function φ(x, y) counts how many word pairs from a bilingual word-pair set occur in the sentence pair, i.e., how many pairs have their source word in x and their target word in y. Let the word-pair set be {(布什, Bush), (举行, held), (会谈, talk), (沙龙, Sharon)}. All 4 word pairs occur in the first translation y1, so φ(x, y1) = 4. Similarly, φ(x, y2) = 3 and φ(x, y3) = 2.
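This word-pair feature is simple to compute once both sides are tokenized. The sketch below assumes the Chinese source sentence is already word-segmented; the segmentation shown is our assumption. It also assumes a pair counts at most once if its source word occurs in x and its target word occurs in y. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def word_pair_feature(source_tokens: List[str], target_tokens: List[str],
                      lexicon: Iterable[Tuple[str, str]]) -> int:
    """phi(x, y): number of lexicon pairs (s, t) with s in x and t in y.

    Each pair is counted at most once, even if a word repeats in y.
    """
    src, tgt = set(source_tokens), set(target_tokens)
    return sum(1 for s, t in lexicon if s in src and t in tgt)


x = ["布什", "与", "沙龙", "举行", "了", "会谈"]  # pre-segmented source (assumed)
lexicon = [("布什", "Bush"), ("举行", "held"), ("会谈", "talk"), ("沙龙", "Sharon")]
for y in ["Bush held a talk with Sharon", "Bush held a talk with Bush",
          "Bush had lunch with Sharon"]:
    print(y, word_pair_feature(x, y.split(), lexicon))
```

The loop prints the counts 4, 3, and 2. The repeated "Bush" in y2 does not earn an extra point, and the missing "Sharon" costs it one.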

The first probability distribution can be determined according to the prior knowledge model:

Taking γ = 1, the probability of translation y1 is:

$$\tilde{Q}(y_1 \mid x) = \frac{e^{4}}{e^{2}+e^{3}+e^{4}}$$

Similarly, Q(y2|x) = e³/(e²+e³+e⁴) and Q(y3|x) = e²/(e²+e³+e⁴). Hence Q(y1|x) = 0.67, Q(y2|x) = 0.24, and Q(y3|x) = 0.09.
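This arithmetic can be checked directly. The exponents e⁴, e³, e² imply γ = 1 with φ = (4, 3, 2), and the variable names below are ours.

```python
import math

# Feature values phi(x, y) from the example, with gamma = 1.
phi = {"y1": 4, "y2": 3, "y3": 2}
gamma = 1.0

z = sum(math.exp(gamma * v) for v in phi.values())  # e^4 + e^3 + e^2
q = {y: math.exp(gamma * v) / z for y, v in phi.items()}
print({y: round(p, 2) for y, p in q.items()})  # {'y1': 0.67, 'y2': 0.24, 'y3': 0.09}
```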

These probabilities rank the translations as follows:

- y1 best conforms to the prior knowledge model and is in fact the correct translation.
- y2 clearly violates the prior knowledge "do not translate anything twice, and do not omit anything", so its probability is lower.
- y3 departs from the meaning of the source language text, so its probability is lower still.

Assume the translation model before adjustment yields the following second probability distribution:

P(y1|x) = 0.4, P(y2|x) = 0.5, and P(y3|x) = 0.1, so the translation model would output "Bush held a talk with Bush".

Now set the preset hyper-parameters λ₁ and λ₂ both to 1. The KL distance between the two probability distributions is computed with the formula, and the new training objective J(θ, γ) is determined from it.

The translation model is then adjusted based on the training objective and the reranking model. After training, P(y1|x) = 0.6, P(y2|x) = 0.31, and P(y3|x) = 0.09. The new training objective thus raises the probability of translation y1 and lowers those of y2 and y3. Translations that better conform to prior knowledge now have higher probability in the translation model's distribution; that is, the translation model approaches the prior knowledge model.
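One way to check that the post-training numbers move the translation model toward the prior is to compute the KL distance against both translation-model distributions. The sketch below uses the example's values. The KL direction (prior first) is an assumption, and the post-training probabilities are the illustrative ones quoted above, not the output of an actual training run.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) over the candidate set; q is the prior, p the translation model."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


# Q~ from phi = (4, 3, 2) with gamma = 1, as in the example.
z = math.exp(4) + math.exp(3) + math.exp(2)
q = [math.exp(4) / z, math.exp(3) / z, math.exp(2) / z]

p_before = [0.4, 0.5, 0.1]    # translation model before adjustment
p_after = [0.6, 0.31, 0.09]   # translation model after training

kl_before = kl_divergence(q, p_before)
kl_after = kl_divergence(q, p_after)
print(f"KL before: {kl_before:.3f}, KL after: {kl_after:.3f}")
```

The distance falls from roughly 0.15 to roughly 0.01, so the −λ₂·KL term of the objective improves accordingly.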

Therefore, the target language text y finally output is "Bush held a talk with Sharon".

As this embodiment shows, the method adds to the traditional training objective the KL distance between the probability distribution under the prior knowledge model and the one under the translation model. This encourages translations that better conform to the prior knowledge model to also receive higher probability under the translation model. The result is better-optimized translation model parameters, with which the target language text is selected from the translation candidate set, improving both the performance of the translation model and the accuracy of the translation results.

Fig. 2 is a schematic structural diagram of the language text translation system provided by an embodiment of the present invention. As shown in Fig. 2, the system comprises a translation candidate set module 21, a training module 22, and a translation module 23:

- The translation candidate set module 21 determines, according to a preset translation candidate set determination rule, the translation candidate set corresponding to the source language text. The translation candidate set contains multiple translations of the source language text, which is the language text to be translated.
- The training module 22 determines a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model. The first probability distribution indicates the probability that each translation conforms to the prior knowledge model; the second indicates the probability that each translation conforms to the translation model.
- The translation module 23 determines the translation of the source language text from the translation candidate set based on the first and second probability distributions.

Note that the language text translation system implements the method embodiments above. For details of its functions, refer to those embodiments; they are not repeated here.

On the basis of the above embodiment, the translation module 23 is specifically configured to:

- determine a probability difference parameter value based on the first and second probability distributions, where the probability difference parameter indicates the difference between the two distributions; and
- determine the translation of the source language text from the translation candidate set based on the probability difference parameter value.

Optionally, the probability difference parameter is the KL distance.

On the basis of the above embodiments, the translation module 23 is specifically configured to:

- determine a training objective based on the difference parameter value, where the training objective is used to make the translation model approach the prior knowledge model; and
- determine the translation of the source language text from the translation candidate set based on the training objective and a preset reranking model.
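One possible way for modules 21 to 23 to fit together is sketched below as three small Python classes. The class names, the toy lookup tables, and the choice to have module 23 report the KL distance while selecting the output with the reranking formula are all illustrative assumptions. No training (parameter update) is performed here.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


def _softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


@dataclass
class CandidateSetModule:
    """Module 21: builds S(x); `generate` stands in for beam search or sampling."""
    generate: Callable[[str], List[str]]

    def candidates(self, x: str) -> List[str]:
        return self.generate(x)


@dataclass
class TrainingModule:
    """Module 22: first (prior) and second (translation-model) distributions."""
    phi: Callable[[str, str], List[float]]   # feature function phi(x, y)
    log_p: Callable[[str, str], float]       # log P(y | x; theta)
    gamma: List[float]                       # prior-model weights
    alpha: float = 1.0                       # sharpness of P~

    def prior_score(self, x: str, y: str) -> float:
        return sum(g * f for g, f in zip(self.gamma, self.phi(x, y)))

    def distributions(self, x: str, cands: List[str]) -> Tuple[List[float], List[float]]:
        q = _softmax([self.prior_score(x, y) for y in cands])
        p = _softmax([self.alpha * self.log_p(x, y) for y in cands])  # P^alpha, renormalised
        return q, p


@dataclass
class TranslationModule:
    """Module 23: KL distance between the distributions, then reranking."""
    training: TrainingModule

    @staticmethod
    def kl(q: Sequence[float], p: Sequence[float]) -> float:
        return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)

    def translate(self, x: str, cands: List[str]) -> Tuple[str, float]:
        q, p = self.training.distributions(x, cands)
        scores = [self.training.log_p(x, y) + self.training.prior_score(x, y) for y in cands]
        best = max(range(len(cands)), key=scores.__getitem__)
        return cands[best], self.kl(q, p)


# Toy wiring with the Bush example's numbers (lookup tables are hypothetical).
cands = ["Bush held a talk with Sharon", "Bush held a talk with Bush",
         "Bush had lunch with Sharon"]
phi_table = dict(zip(cands, [[4.0], [3.0], [2.0]]))
logp_table = dict(zip(cands, map(math.log, [0.4, 0.5, 0.1])))
trainer = TrainingModule(phi=lambda x, y: phi_table[y],
                         log_p=lambda x, y: logp_table[y], gamma=[1.0])
candidate_module = CandidateSetModule(generate=lambda x: cands)
best, kl_value = TranslationModule(trainer).translate("x", candidate_module.candidates("x"))
```

In a real system, module 22 would also update θ and γ by maximizing the training objective before module 23 reranks.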

The method and system provided by the present invention integrate prior knowledge into the translation model during the training stage. This improves the performance of the translation model, so prior knowledge is applied during translation. Arbitrary prior knowledge can be applied to machine translation without adding extra network modules, which ultimately improves the accuracy and reliability of translation results.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in those embodiments or make equivalent substitutions for some of their technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A language text translation method, characterized by comprising:

determining, according to a preset translation candidate set determination rule, a translation candidate set corresponding to a source language text, the translation candidate set comprising multiple translations of the source language text, and the source language text being the language text to be translated;

determining a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, the first probability distribution indicating the probability that each translation conforms to the prior knowledge model, and the second probability distribution indicating the probability that each translation conforms to the translation model; and

determining the translation of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution.

2. The method according to claim 1, wherein determining the translation of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution comprises:

determining a probability difference parameter value based on the first probability distribution and the second probability distribution, the probability difference parameter indicating the difference between the first probability distribution and the second probability distribution; and

determining the translation of the source language text from the translation candidate set based on the probability difference parameter value.

3. The method according to claim 2, wherein the probability difference parameter is the KL distance.

4. The method according to claim 2, wherein determining the translation of the source language text from the translation candidate set based on the probability difference parameter value comprises:

determining a training objective based on the difference parameter value, the training objective being used to make the translation model approach the prior knowledge model; and

determining the translation of the source language text from the translation candidate set based on the training objective and a preset reranking model.

5. A language text translation system, characterized by comprising:

a translation candidate set module, configured to determine, according to a preset translation candidate set determination rule, a translation candidate set corresponding to a source language text, the translation candidate set comprising multiple translations of the source language text, and the source language text being the language text to be translated;

a training module, configured to determine a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, the first probability distribution indicating the probability that each translation conforms to the prior knowledge model, and the second probability distribution indicating the probability that each translation conforms to the translation model; and

a translation module, configured to determine the translation of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution.

6. The system according to claim 5, wherein the translation module is specifically configured to:

7. The system according to claim 6, wherein the probability difference parameter is the KL distance.

8. The system according to claim 6, wherein the translation module is specifically configured to: