JP2014216029A

JP2014216029A - Device and method for evaluating word phrase of intermediate language, and device and method for machine translation

Info

Publication number: JP2014216029A
Application number: JP2014092639A
Authority: JP
Inventors: フゥ・イウェヌ; Yiwen Fu; 乃晟葛; Naisheng Ge; ジョン・ジョォングアン; Zhongguang Zheng; 遥孟; Yao Meng; 浩于; Yu Hao
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-04-26
Filing date: 2014-04-28
Publication date: 2014-11-17
Anticipated expiration: 2034-04-28
Also published as: JP6326940B2; CN104123274A; CN104123274B

Abstract

PROBLEM TO BE SOLVED: To provide a device and a method for evaluating a phrase of an intermediate language, a machine translation method, and a machine translation device. SOLUTION: A method for evaluating a phrase of an intermediate language includes the steps of: determining a first specific attribute exhibited by the phrase of the intermediate language with respect to a source language; determining a second specific attribute exhibited by the phrase of the intermediate language with respect to a target language; calculating a reliability score of the phrase of the intermediate language on the basis of the first specific attribute and the second specific attribute; and evaluating the phrase of the intermediate language on the basis of the reliability score. The phrase of the intermediate language serves as a bridge for translating a specific phrase of the source language into a phrase of the target language.

Description

The present invention relates to the field of language processing, and in particular to a method and an apparatus for evaluating phrases of an intermediate language, and to a machine translation method and apparatus.

Parallel lexical information plays an important role in cross-language applications such as machine translation and cross-language information retrieval. However, parallel lexical information is not available for every pair of languages; for languages with a very limited range of use in particular, it is quite difficult to obtain. Moreover, because new words and expressions keep appearing, it is also difficult to keep up with them using only the parallel lexical information already at hand. The concept of an intermediate language (pivot language) was introduced for this reason: an intermediate language is used to supplement parallel lexical information. The use of an intermediate language, however, faces two main obstacles: the first is ambiguity (polysemy), and the second is inconsistency.

To address ambiguity among the source language, the intermediate language, and the target language, conventional methods use information such as structured bilingual dictionaries, semantic classes, multiple intermediate languages, related frequencies, and edit distance. Conventional methods consistently assume that the translation probability between the source language and the intermediate language and the translation probability between the intermediate language and the target language reflect the ambiguity problem, and that the problem can be solved by selecting the highest probability.

Conventional methods of obtaining bilingual information through an intermediate language basically follow the process below. First, information relating the source language to the intermediate language and information relating the intermediate language to the target language are obtained, for example translation probabilities, lexicalized translation probabilities, edit distance, and semantic information. Next, the most reliable pair of a source-language phrase and a target-language phrase is selected on the basis of this information. When an intermediate-language phrase has several senses, however, conventional methods apply no special handling and still select the phrase with the highest translation probability as the final result. Such methods ignore the fact that source-language and target-language material drawn from non-parallel corpora need not carry the same meaning. As a result, when the intermediate-language phrase is ambiguous, the translation probability cannot reflect the semantic relationship between the source-language phrase and the target-language phrase.

A technique that can solve the above problem is therefore desirable.

An object of the present invention is to provide a method and an apparatus for evaluating phrases of an intermediate language, as well as a machine translation method and a machine translation apparatus.

According to one aspect of the present invention, a method for evaluating a phrase of an intermediate language is provided. The method includes: determining a first specific attribute of the intermediate-language phrase with respect to a source language; determining a second specific attribute of the intermediate-language phrase with respect to a target language; calculating a reliability score of the intermediate-language phrase on the basis of the first specific attribute and the second specific attribute; and evaluating the intermediate-language phrase on the basis of the reliability score. Here, the intermediate-language phrase is a bridge for translating a specific phrase of the source language into a phrase of the target language.

According to another aspect of the present invention, a machine translation method is provided. The method includes: obtaining a reliability score of an intermediate-language phrase by the evaluation method described above; calculating a translation score for each candidate phrase of the target language on the basis of the reliability score of the intermediate-language phrase and a machine translation score for translating the specific phrase of the source language into that candidate phrase via the intermediate-language phrase; and selecting, on the basis of the translation scores, a target-language phrase from among the candidate phrases as the translation result.

A flowchart of a method for evaluating an intermediate-language phrase in one embodiment of the present invention.
A diagram showing calculation of the reliability score based on an artificial neural network in one embodiment of the present invention.
A flowchart of a method for evaluating an intermediate-language phrase in another embodiment of the present invention.
A flowchart of a machine translation method in one embodiment of the present invention.
A block diagram of an apparatus for evaluating an intermediate-language phrase in one embodiment of the present invention.
A block diagram of an apparatus for evaluating an intermediate-language phrase in another embodiment of the present invention.
A block diagram of an apparatus for evaluating an intermediate-language phrase in another embodiment of the present invention.
A block diagram of a machine translation apparatus in one embodiment of the present invention.
A block diagram of an example computing device for implementing the method and apparatus for evaluating an intermediate-language phrase and the machine translation method and apparatus of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

In some cases there is not enough information to relate source language A directly to target language B, while information relating intermediate language E directly to source language A and information relating intermediate language E directly to target language B may exist. In such a case, one or more phrases P can be selected from intermediate language E as a bridge, and a specific phrase S in source language A can be translated into the corresponding phrase in target language B. In other words, a phrase P of intermediate language E may serve as a bridge for translating the specific phrase S of source language A into the corresponding phrase T in target language B.

FIG. 1A is a flowchart of a method 100 for evaluating a phrase P of intermediate language E in one embodiment of the present invention. FIG. 1B is a diagram showing calculation of the reliability score based on an artificial neural network in one embodiment of the present invention.

As shown in FIG. 1A, in step S102 a specific attribute of each phrase P of intermediate language E with respect to source language A (the first specific attribute) is determined. In other words, the specific attribute that phrase P exhibits in source language A is determined.

The first specific attribute may include the semantic range of each phrase P of intermediate language E in source language A (the first semantic range). The more meanings phrase P has in source language A, the larger the first semantic range. For example, the first semantic range may be the number of phrases in source language A that correspond to phrase P, or a function of that number. In some cases the number of source-language phrases corresponding to P grows exponentially, so a function of the number may be taken to make the result approximately linear. For example, the function may be a logarithm.

In step S104, a specific attribute of each phrase P of intermediate language E with respect to target language B (the second specific attribute) is determined. In other words, the specific attribute that phrase P exhibits in target language B is determined.

The second specific attribute may include the semantic range of each phrase P of intermediate language E in target language B (the second semantic range). The more meanings phrase P has in target language B, the larger the second semantic range. For example, the second semantic range may be the number of phrases in target language B that correspond to phrase P, or a function of that number. As before, the function may be a logarithm.
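
The two semantic ranges can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `semantic_range`, the toy phrase tables, and the choice of `log(1 + count)` as the logarithmic function are all assumptions made here.

```python
import math

def semantic_range(pivot_phrase, phrase_table, use_log=True):
    """Semantic range of a pivot phrase: the number of distinct phrases it
    corresponds to in the other language, optionally log-scaled.

    phrase_table maps a pivot phrase to the set of phrases aligned to it.
    log(1 + count) is an assumed choice so that a count of 0 maps to 0.
    """
    count = len(phrase_table.get(pivot_phrase, ()))
    return math.log1p(count) if use_log else float(count)

# Hypothetical toy tables (invented for illustration only).
pivot_to_source = {"flow": {"水流", "流动", "流量", "流程"}, "water flow": {"水流"}}
pivot_to_target = {"flow": {"流れ", "流量", "フロー"}, "water flow": {"水流"}}

first_range = semantic_range("flow", pivot_to_source)   # range in source language A
second_range = semantic_range("flow", pivot_to_target)  # range in target language B
```

With these toy tables, "water flow" has a smaller range than "flow" on both sides, which is the property the later example in the description relies on.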

In step S106, the reliability score of each phrase P of intermediate language E is calculated on the basis of the first specific attribute and the second specific attribute.

In one embodiment, the first specific attribute and the second specific attribute are used as features, and the reliability score of each phrase P of intermediate language E may be calculated by, for example, a regression algorithm.

Preferably, in one embodiment, the first specific attribute and the second specific attribute are used as features, and the reliability score of each phrase P of intermediate language E may be calculated by an artificial neural network (ANN) algorithm.

An artificial neural network is essentially a simple mathematical model that can be defined as a function f: X → Y. The word "network" refers to the interconnections between the neurons in the different layers of the system. A typical artificial neural network has three layers of neurons. The first layer consists of input neurons, which receive data and pass it to the second layer. The second layer, which has more neurons than the first, then passes the data to the output neurons of the third layer. More complex networks may have more layers. The parameters stored at each neuron are called weights, and they are combined with the data as it is passed along. Mathematically, the function f(x) of a neural network is a composition of a series of functions g_m(x), and each g_m(x) may in turn be defined as a composition of other functions. This function can be drawn as a network diagram such as FIG. 1B, in which the arrows indicate the dependencies between variables. In the figure, Pr(S|P) denotes the translation probability from phrase P of intermediate language E to the specific phrase of source language A, Pw(S|P) denotes the lexicalized translation probability from the phrase of intermediate language E to the specific phrase of source language A, and S(P) denotes the first semantic range and the second semantic range. Although FIG. 1B shows Pr(S|P), Pw(S|P), and S(P) together, it is also possible to use only S(P) and omit Pr(S|P) and Pw(S|P).

In artificial neural networks, the "nonlinear weighted sum" is a widely used way of composing functions, and is expressed by Equation (1).

In Equation (1), f(x) denotes the reliability score, K denotes the activation function, g_m(x) denotes the value of the m-th feature, w_m denotes the weight of the m-th feature, biasW denotes the bias weight, and biasV denotes the bias value. In one embodiment, tanh(x) is used as the activation function.
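
The equation image itself is not reproduced in this text. A form consistent with the symbol definitions given for the nonlinear weighted sum would be the following; entering the bias term as the product of biasW and biasV is an assumption made here, not something the text states.

```latex
f(x) = K\!\left( \sum_{m} w_m \, g_m(x) + \mathit{biasW} \cdot \mathit{biasV} \right),
\qquad K(z) = \tanh(z)
```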

In this way, the reliability score can be calculated by an artificial neural network. The reliability score may also be calculated by another regression method or any other suitable method.
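
As a rough sketch of a single output unit of such a network, the score can be computed as the tanh of a weighted sum of the feature values plus a bias term. The function name `reliability_score`, the default bias value of 1.0, and the example numbers are assumptions for illustration. In practice the weights would be learned from training data, which the description does not detail.

```python
import math

def reliability_score(features, weights, bias_weight, bias_value=1.0):
    """Nonlinear weighted sum with tanh activation.

    features: feature values g_m(x), e.g. the first and second semantic ranges.
    weights:  one weight w_m per feature.
    The bias enters as bias_weight * bias_value (an assumed form).
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * g for w, g in zip(weights, features)) + bias_weight * bias_value
    return math.tanh(z)

# Hypothetical feature values and weights (not taken from the patent).
score = reliability_score([1.0, 2.0], [0.5, -0.25], bias_weight=0.1)
```

Because tanh is bounded, the resulting score always lies between -1 and 1, which makes scores of different pivot phrases directly comparable.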

In one embodiment, the reliability of the intermediate language may be defined as a balance between the likelihood that phrase P can serve as a bridge for translating phrase S into the corresponding phrase T and semantic accuracy. For example, in the embodiment described above, regression with an artificial neural network is used to find the balance between this likelihood and semantic accuracy, and finally a corresponding reliability score is calculated for each phrase P in intermediate language E.

In one embodiment, the reliability score may be calculated using the first semantic range and the second semantic range as features. In another embodiment, the reliability score may be calculated using as features the first semantic range, the second semantic range, the translation probability from phrase P of intermediate language E to the specific phrase S of source language A, and/or the lexicalized translation probability from the phrase of intermediate language E to the specific phrase S of source language A. In other words, in that embodiment the first specific attribute may include, in addition to the semantic ranges, the translation probability and/or the lexicalized translation probability from phrase P of intermediate language E to the specific phrase of source language A. The translation probability Pr(S|P) and the lexicalized translation probability Pw(S|P) may be obtained by any suitable method. For example, the translation probability Pr(S|P) from phrase P of intermediate language E to the specific phrase of source language A can be calculated by Equation (2).

In Equation (2), the function N(S, P) denotes the number of co-occurrences of the specific phrase S in source language A and the phrase P in intermediate language E, and S_i ranges over all phrases in the source language.
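
The equation image for Equation (2) is missing from this text. Given the definitions of N(S, P) and S_i, the natural reading is a relative frequency, Pr(S|P) = N(S, P) / Σ_i N(S_i, P). The sketch below computes that quantity. The function name and the toy co-occurrence counts are assumptions for illustration.

```python
from collections import Counter

def translation_probability(source_phrase, pivot_phrase, cooccurrence):
    """Relative-frequency estimate of Pr(S|P).

    cooccurrence maps (source_phrase, pivot_phrase) to the co-occurrence count N(S, P).
    The denominator sums N(S_i, P) over every source phrase S_i seen with P.
    """
    total = sum(n for (_, p), n in cooccurrence.items() if p == pivot_phrase)
    if total == 0:
        return 0.0  # pivot phrase never observed
    return cooccurrence.get((source_phrase, pivot_phrase), 0) / total

# Hypothetical counts (invented for illustration).
counts = Counter({("水流", "flow"): 2, ("流量", "flow"): 6, ("水流", "water flow"): 3})
p_flow = translation_probability("水流", "flow", counts)
p_water_flow = translation_probability("水流", "water flow", counts)
```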

For example, the lexicalized translation probability Pw(S|P) from the phrase of intermediate language E to the specific phrase of source language A can be calculated by Equation (3).

In Equation (3), phrase P of intermediate language E consists of m words, each written P_j (j = 1, 2, ..., m); the specific phrase S of source language A consists of n words, each written S_i (i = 1, 2, ..., n); and the function W(S_i, P_j) denotes the probability of translating word P_j into word S_i. The calculation assumes that the word-level alignment between phrase P and the specific phrase S is known (for example, that the first word of P corresponds to the first through third words of S); Al denotes this alignment. (i, j) ∈ Al means that, when phrase P and the specific phrase S are aligned, word P_j corresponds to word S_i. The notation |...| denotes a count, so the fraction after the product sign (Π) is the reciprocal of the number of words in P that correspond to S_i; that is, it turns the sum (Σ) that follows into an arithmetic mean. The product sign (Π) runs over i from 1 to n, that is, from the first word to the last word of the specific phrase S. For each word S_i, the probabilities of translating each corresponding word P_j of phrase P into S_i are summed, and the sum is divided by the number of words P_j in P that correspond to S_i, giving one score per S_i. The product of these scores is the lexicalized translation probability from phrase P to phrase S.
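
The equation image for Equation (3) is missing, but the prose pins it down closely: Pw(S|P) = Π_{i=1..n} (1 / |{j : (i, j) ∈ Al}|) Σ_{(i, j) ∈ Al} W(S_i, P_j). A minimal sketch under that reading follows. The function name, the 0-based indices, and the decision to skip source words that have no aligned pivot word are assumptions made here.

```python
from collections import defaultdict

def lexical_translation_probability(source_words, pivot_words, alignment, word_prob):
    """Lexicalized translation probability Pw(S|P).

    alignment: set of (i, j) pairs, 0-based, meaning source word i aligns to pivot word j.
    word_prob: maps (source_word, pivot_word) to W(S_i, P_j).
    Source words with no aligned pivot word are skipped (an assumed convention).
    """
    aligned = defaultdict(list)
    for i, j in alignment:
        aligned[i].append(j)

    score = 1.0
    for i, s_word in enumerate(source_words):
        js = aligned.get(i)
        if not js:
            continue
        # Arithmetic mean of W(S_i, P_j) over the pivot words aligned to S_i.
        mean = sum(word_prob.get((s_word, pivot_words[j]), 0.0) for j in js) / len(js)
        score *= mean
    return score

# Hypothetical two-word phrases and word probabilities.
W = {("a", "x"): 0.5, ("b", "x"): 0.4, ("b", "y"): 0.8}
pw = lexical_translation_probability(["a", "b"], ["x", "y"], {(0, 0), (1, 0), (1, 1)}, W)
```

Here the first source word contributes 0.5 and the second contributes the mean of 0.4 and 0.8, so the product is about 0.3.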

Next, in step S108, each phrase P of intermediate language E is evaluated on the basis of the reliability score.

The advantages of the present invention are explained below using an example in which the source language is Chinese, the target language is Japanese, and the intermediate language is English. In one embodiment of the present invention, the phrase with the narrowest semantic range is selected from the intermediate language. Suppose, for example, that the Chinese "水流" (water flow) is to be translated into the corresponding Japanese phrase. English offers more than one possible phrase. Conventional methods select the single phrase most likely to cover the sense "水流", so "flow" is their best choice. However, the probability of translating "flow" into the Japanese "水流" is very low, so translating via "flow" ends up producing an incorrect Japanese translation. In one embodiment of the present invention, "water flow" is regarded as the most accurate intermediate-language phrase, because it covers the meaning of the Chinese "水流" and at the same time accurately reflects the meaning of the Japanese "水流". Thus, the reliability of phrase P as a bridge can be evaluated on the basis of the specific attribute of phrase P of intermediate language E in source language A and its specific attribute in target language B, so that a more reliable intermediate-language phrase can be selected as the bridge for translation.

FIG. 2 is a flowchart of a method 200 for evaluating a phrase P of intermediate language E in another embodiment of the present invention.

As shown in FIG. 2, in step S202 the specific phrase S of source language A is aligned with the phrases in the phrase database of intermediate language E to obtain at least one first phrase of intermediate language E. In other words, the specific phrase S is aligned with those phrases in the intermediate-language phrase database that may carry the meaning of S; for convenience, the phrases obtained are called first phrases.

In step S204, the part that does not correspond to the specific phrase S of source language A is removed from each of the at least one first phrase to obtain at least one first alignment phrase. In other words, only part of each first phrase may correspond to the specific phrase S, so the non-corresponding part must be removed; for convenience, the phrase that remains is called a first alignment phrase. Note that the part removed here is a portion of a single first phrase, not some of the first phrases among several.

In step S206, the phrases in the phrase database of target language B are aligned with the phrases in the phrase database of intermediate language E to obtain at least one second phrase of intermediate language E. In some cases there is language material that relates target language B to intermediate language E, but the two often do not correspond one to one, so target language B and intermediate language E need to be aligned. For convenience, a phrase of intermediate language E that can be aligned with a phrase of target language B is called a second phrase.

In step S208, the part that does not correspond to a phrase in the phrase database of target language B is removed from each of the at least one second phrase to obtain at least one second alignment phrase. Likewise, only part of each second phrase may correspond to a phrase of target language B, so the non-corresponding part must be removed; for convenience, the phrase that remains is called a second alignment phrase. Note that the part removed here is a portion of a single second phrase, not some of the second phrases among several.

In step S210, the phrases in the intersection of the at least one first alignment phrase and the at least one second alignment phrase are taken as the intermediate-language phrases to be evaluated. A phrase in the intersection corresponds to phrase S of source language A and also corresponds to a phrase of target language B, so it may serve as the intermediate phrase when phrase S is translated into the target language.

To make this easier to follow, the process is described in detail below using an example.

In this example, Chinese is source language A, the specific phrase S is the Chinese word "甲硫安酸" (meaning methionine), intermediate language E is English, and target language B is Japanese.

Aligning the Chinese "甲硫安酸" with the phrases in the English phrase database yields first phrases such as "methionine Promix, NEN, Boston, MA" and "-14C]-L-methionine". Removing from these first phrases the parts that do not correspond to "甲硫安酸", namely "Promix, NEN, Boston, MA", "-14C]-L-", and so on, yields the first alignment phrase "methionine". Only one phrase, "methionine", corresponds to "甲硫安酸" here, but for some phrases, such as "気体" (gas), there may be several corresponding English phrases, for example "air" and "gas". In that case more than one first phrase, and more than one first alignment phrase, may be obtained.

Next, the phrases in the Japanese phrase database are aligned with the phrases in the English phrase database, and the English phrase corresponding to each Japanese phrase is obtained as a second phrase; this amounts to obtaining the correspondence between Japanese phrases and English phrases. Likewise, the parts that do not correspond to the Japanese phrases are removed from the English phrases to obtain the second alignment phrases. The number of second alignment phrases may of course be larger than the number of first alignment phrases. Finally, the phrases in the intersection of the first alignment phrases and the second alignment phrases are taken as the English phrases to be evaluated, that is, the phrases for which a reliability score needs to be calculated.
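
The trimming and intersection steps (S204, S208, S210) can be sketched with the methionine example. This is an illustrative sketch only: the token boundaries, the aligned token positions, the second-phrase entries, and both function names are assumptions. A real system would take the aligned positions from a word aligner.

```python
def trim_to_aligned(tokens, aligned_indices):
    """Keep only the tokens whose positions are aligned to the other language."""
    return " ".join(tok for k, tok in enumerate(tokens) if k in aligned_indices)

def candidate_pivot_phrases(first_phrases, second_phrases):
    """Intersect the trimmed first phrases with the trimmed second phrases.

    Each entry is (tokens, aligned_indices); the indices are assumed to be
    supplied by an upstream word-alignment step.
    """
    first_aligned = {trim_to_aligned(t, a) for t, a in first_phrases}
    second_aligned = {trim_to_aligned(t, a) for t, a in second_phrases}
    return (first_aligned & second_aligned) - {""}

# First phrases from the description; tokenization and indices are hypothetical.
first_phrases = [
    (["methionine", "Promix", "NEN", "Boston", "MA"], {0}),
    (["-14C]-L-", "methionine"], {1}),
]
# Hypothetical second phrases (English phrases aligned to Japanese phrases).
second_phrases = [
    (["L-", "methionine"], {1}),
    (["amino", "acid"], {0, 1}),
]
candidates = candidate_pivot_phrases(first_phrases, second_phrases)
```

Using sets makes the intersection in step S210 a single operation and removes duplicate alignment phrases that arise when several first phrases trim down to the same string.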

In one embodiment, before the step of taking the phrases in the intersection of the first alignment phrases and the second alignment phrases as the intermediate-language phrases to be evaluated (step S210), it may further be judged whether the beginning of each first alignment phrase and the beginning of each second alignment phrase are stable. For example, it is judged whether the semantic range in source language A of the beginning of the first alignment phrase and of the beginning of the second alignment phrase exceeds a first threshold. In addition, it may be judged whether the semantic range in target language B of each beginning exceeds a second threshold. If the beginning of a first alignment phrase or of a second alignment phrase is not stable, that beginning is removed. Taking Chinese as the source language, English as the intermediate language, and Japanese as the target language, leading definite articles, indefinite articles, prepositions, adverbs, and the like often have an excessively large semantic range in both Chinese and Japanese, and may therefore be removed as unstable words.
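
A possible sketch of this stability check is shown below. Treating a leading word as unstable when either threshold is exceeded, repeating the check on successive leading words, and always keeping at least one word are all assumptions made here; the description leaves these details open. The range tables and thresholds are invented.

```python
def strip_unstable_start(tokens, source_range, target_range,
                         first_threshold, second_threshold):
    """Drop leading words whose semantic range is too large to be stable.

    source_range / target_range map a pivot word to its semantic range in the
    source / target language (hypothetical lookup tables).
    """
    start = 0
    while start < len(tokens) - 1:  # always keep at least one word (assumed)
        word = tokens[start]
        unstable = (source_range.get(word, 0) > first_threshold
                    or target_range.get(word, 0) > second_threshold)
        if not unstable:
            break
        start += 1
    return tokens[start:]

# Hypothetical semantic ranges: articles align to very many phrases.
src_range = {"the": 500, "water": 3, "flow": 40}
tgt_range = {"the": 450, "water": 4, "flow": 35}
trimmed = strip_unstable_start(["the", "water", "flow"], src_range, tgt_range, 100, 100)
```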

Next, in step S212, the first specific attribute of the to-be-evaluated phrase of intermediate language E with respect to source language A is determined.

In step S214, the second specific attribute of the to-be-evaluated phrase of intermediate language E with respect to target language B is determined.

In step S216, the reliability score of the to-be-evaluated phrase of intermediate language E is calculated based on the first specific attribute and the second specific attribute.

In step S218, the to-be-evaluated phrase of intermediate language E is evaluated based on the reliability score.

FIG. 3 is a flowchart of a machine translation method 300 according to one embodiment of the present invention.

As shown in FIG. 3, in step S302, the reliability score of phrase P of intermediate language E is obtained by the above-described method 100 or 200 for evaluating phrase P of intermediate language E.

In step S304, the translation score of a candidate phrase of target language B is calculated based on the reliability score of phrase P of intermediate language E and on the machine translation score for translating specific phrase S of source language A into that candidate phrase of target language B via the phrase of intermediate language E.

For example, the translation score of the candidate phrase of target language B can be calculated from the reliability score and the machine translation score by the CKY (Cocke-Kasami-Younger) algorithm.

For example, from among the plurality of phrases of intermediate language E, phrases whose reliability score is higher than a predetermined reliability score threshold may be selected as trusted phrases. Let the number of trusted phrases be N. Preferably, the translation score P_i of the i-th candidate phrase of target language B is calculated by the following equation (4).

Here, r_j is the reliability score of the j-th trusted phrase of intermediate language E, and T_ij is the machine translation score for translating specific phrase S of source language A into the i-th candidate phrase of target language B via the j-th trusted phrase of intermediate language E.

The machine translation score T_ij (also written simply as T) may be calculated by the following equation (5).

Here, W_i is a translation weight and F_i is a feature. Preferably, four features are used: the forward translation probability, the inverse translation probability, the forward lexicalized translation probability, and the inverse lexicalized translation probability. ln denotes taking the natural logarithm of each of the four features.
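
Equation (5) is not reproduced in this text. The definitions of W_i, F_i, and ln are consistent with a weighted sum of log-features, T = W_1 ln F_1 + ... + W_4 ln F_4, and the sketch below assumes that form; it is not confirmed by the source. The function name and the sample weights and probabilities are hypothetical.

```python
import math


def machine_translation_score(weights, features):
    """Assumed form of equation (5): sum of W_i * ln(F_i) over the features."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * math.log(f) for w, f in zip(weights, features))


# Hypothetical values for the four features: forward, inverse,
# forward lexicalized, and inverse lexicalized translation probabilities.
weights = [0.25, 0.25, 0.25, 0.25]
features = [0.5, 0.25, 0.5, 0.25]
score = machine_translation_score(weights, features)
```

Because each feature is a probability in (0, 1], every term is non-positive, so a score closer to zero indicates a better candidate under this assumed form.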

In step S306, a phrase of target language B is selected as the translation result from among the candidate phrases of target language B based on the translation score. For example, the candidate phrase with the highest translation score may be selected as the translation result.
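
Steps S304 and S306 can be sketched as follows. Equation (4) is not reproduced in this text, so the combination used here, a reliability-weighted sum P_i = r_1 T_i1 + ... + r_N T_iN over the trusted phrases, is purely an assumption and the actual equation may differ. All names and numbers are hypothetical and not on any particular scale.

```python
def select_trusted(reliability, threshold):
    """Keep intermediate-language phrases whose reliability exceeds the threshold."""
    return {phrase: r for phrase, r in reliability.items() if r > threshold}


def translation_scores(trusted, mt_scores):
    """Assumed form of equation (4): P_i = sum over j of r_j * T_ij."""
    return {
        candidate: sum(r * per_pivot.get(pivot, 0.0) for pivot, r in trusted.items())
        for candidate, per_pivot in mt_scores.items()
    }


# Hypothetical reliability scores r_j of English pivot phrases.
reliability = {"bank": 0.5, "shore": 0.25, "the": 0.125}
# Hypothetical machine translation scores T_ij: candidate -> pivot -> score.
mt_scores = {
    "ginkou": {"bank": 0.5, "shore": 0.0},
    "kishi": {"bank": 0.125, "shore": 0.5},
}

trusted = select_trusted(reliability, threshold=0.2)
scores = translation_scores(trusted, mt_scores)
# Step S306: pick the candidate with the highest translation score.
best = max(scores, key=scores.get)
```

The low-reliability pivot "the" is filtered out before scoring, so it cannot contribute to any candidate.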

Because the reliability score of the intermediate language is difficult to estimate, in one embodiment a swarm intelligence algorithm is used to tune the parameters used in the above method, for example the parameters of the artificial neural network algorithm, the parameters of the CKY algorithm, and the predetermined semantic range thresholds.

Swarm intelligence algorithms are similar to genetic algorithms. A swarm intelligence algorithm first performs initialization to obtain many system configurations, each of which exists as one individual. The information contained in each individual is sufficient to configure the entire system. Each individual is evaluated by an evaluation function, and its evaluation score directly affects its mutation probability and reproduction probability: an individual with a high evaluation value has a high reproduction rate but a low mutation rate, whereas an individual with a low evaluation value has a low reproduction rate but a high mutation rate. After several generations of reproduction, the individual with the highest evaluation value in the whole swarm is selected and used to configure the entire system.

Particle swarm optimization (PSO) is swarm-based: it moves the individuals of the swarm toward good regions according to their fitness to the environment. However, it does not apply evolution operators to the individuals. Instead, each individual is regarded as a particle without volume (i.e., a point) in a D-dimensional search space that flies through the search space at a certain velocity, and this velocity is adjusted dynamically according to the particle's own flight experience and that of its companions. The i-th particle is denoted X_i = (x_i1, x_i2, ..., x_iD), where the dimension D is the number of all parameters that need to be tuned, each parameter corresponding to one specific dimension. The best position the particle has experienced is denoted P_i = (p_i1, p_i2, ..., p_iD) and is also called pbest. The best position is the position with the best fitness value; the fitness value is obtained by substituting X_i into the objective equation and evaluating it. In the present invention, the objective equation is equation (4) above, that is, the formula for calculating the translation score. The index of the best position experienced by all particles of the swarm is denoted g, so that this position is P_g, also called gbest. The velocity of particle i is denoted V_i = (v_i1, v_i2, ..., v_iD). In each generation, the d-th dimension (1 ≤ d ≤ D) changes according to the following equations.
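
The update equations themselves are not reproduced in this text. The symbols defined around this point (inertia weight, two acceleration constants, two random values, pbest, and gbest) match the standard PSO update, which is assumed here to be the intended form:

```latex
\begin{aligned}
v_{id} &\leftarrow w\,v_{id} + c_1\,\mathrm{rand}()\,(p_{id} - x_{id}) + c_2\,\mathrm{Rand}()\,(p_{gd} - x_{id}),\\
x_{id} &\leftarrow x_{id} + v_{id}.
\end{aligned}
```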

Here, w is the inertia weight, c1 and c2 are acceleration constants, and rand() and Rand() are two random values that vary within the range [0, 1].

In addition, the velocity V_i of a particle is limited by a maximum velocity V_max. If the current acceleration of a particle causes its velocity v_id in some dimension to exceed the maximum velocity v_max,d of that dimension, the velocity in that dimension is limited to, that is, set equal to, v_max,d.

The velocity update equation has three parts: the first part is the inertia of the particle's previous behavior; the second part is the "cognition" part, which represents the particle's own thinking; and the third part is the "social" part, which represents information sharing and mutual cooperation among particles.

"Cognition" part: a random behavior that has been reinforced is more likely to recur in the future. The behavior here is "cognition", and it is assumed that obtaining correct knowledge is reinforcing; such a model assumes that the particle is motivated to reduce its error.

"Social" part: when an observer sees a model reinforcing a certain behavior, the probability that the observer performs that behavior increases. In other words, a particle's own cognition is imitated by other particles.

The PSO algorithm relies on the following psychological assumption: in the cognitive process of seeking consensus, an individual usually remembers its own beliefs while also taking into account the beliefs of other individuals. When it judges that another individual's belief is better, it adjusts adaptively.

The flow of the standard PSO algorithm is as follows: 1) initialize a swarm of particles (of swarm size m) with random positions and velocities; 2) evaluate the fitness of each particle; 3) for each particle, compare its fitness value with that of the best position pbest it has experienced and, if it is better, take the current position as the new pbest; 4) for each particle, compare its fitness value with that of the best position gbest experienced by the whole swarm and, if it is better, reset the index of gbest; 5) update the velocity and position of each particle according to the velocity and position update equations given above; 6) if the termination condition (usually a sufficiently good fitness value, or reaching a predetermined maximum number of generations Gmax) is not satisfied, return to step 2).
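
The flow above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions: the standard velocity and position update is used, fitness is maximized, termination is by generation count only, and the order of steps within a generation is slightly rearranged (move, then evaluate). The quadratic `demo_fitness` is a stand-in for the translation-score objective, and all names and default values are hypothetical.

```python
import random


def pso(fitness, dim, bounds, v_max, swarm_size=20, max_generations=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over a `dim`-dimensional space with a basic PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # 1) Initialize random positions and velocities.
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vs = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(swarm_size)]
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x) for x in xs]
    g = max(range(swarm_size), key=lambda i: pbest_fit[i])  # index of gbest
    for _ in range(max_generations):
        for i in range(swarm_size):
            for d in range(dim):
                # 5) Velocity update: inertia + cognition + social terms.
                v = (w * vs[i][d]
                     + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                     + c2 * rng.random() * (pbest[g][d] - xs[i][d]))
                # Limit the velocity to the per-dimension maximum.
                vs[i][d] = max(-v_max, min(v_max, v))
                xs[i][d] += vs[i][d]
            # 2)-3) Evaluate the particle and update its own best position.
            fit = fitness(xs[i])
            if fit > pbest_fit[i]:
                pbest_fit[i] = fit
                pbest[i] = xs[i][:]
                # 4) Reset the gbest index if this particle is now the best.
                if fit > pbest_fit[g]:
                    g = i
    return pbest[g], pbest_fit[g]


def demo_fitness(x):
    """Stand-in objective with its maximum (0.0) at (1, -2)."""
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


best_x, best_fit = pso(demo_fitness, dim=2, bounds=(-5.0, 5.0), v_max=1.0)
```

In the setting described here, each dimension would instead hold one tunable parameter (for example a semantic range threshold), and the fitness function would evaluate the resulting translation score.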

Besides swarm intelligence algorithms, other algorithms may be used, for example genetic algorithms, artificial immune systems, stochastic diffusion search, and the EM algorithm.

FIG. 4 is a block diagram of an apparatus 400 for evaluating phrase P of intermediate language E according to one embodiment of the present invention.

As shown in FIG. 4, the apparatus 400 for evaluating phrase P of intermediate language E may include a first specific attribute determination unit 412, a second specific attribute determination unit 414, a reliability score calculation unit 416, and an evaluation unit 418. Phrase P of intermediate language E is a bridge for translating specific phrase S of source language A into a phrase of target language B.

The first specific attribute determination unit 412 determines the first specific attribute of the phrase of intermediate language E with respect to source language A.

For example, the first specific attribute may include a first semantic range of the phrase of intermediate language E in source language A. The first semantic range is the number of phrases in source language A that correspond to the phrase of intermediate language E, or a function of that number; the function may be, for example, a logarithmic function. In some embodiments, the first specific attribute may further include the translation probability of phrase P of intermediate language E into specific phrase S of source language A and/or the lexicalized translation probability of phrase P of intermediate language E into specific phrase S of source language A.

The second specific attribute determination unit 414 determines the second specific attribute of phrase P of intermediate language E with respect to target language B.

For example, the second specific attribute may include a second semantic range of the phrase of intermediate language E in target language B. The second semantic range is, for example, the number of phrases in target language B that correspond to the phrase of intermediate language E, or a function of that number; the function may be, for example, a logarithmic function.
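
The first and second semantic ranges can be sketched with one helper applied to two phrase tables. This is only an illustration: the phrase tables are assumed to map each intermediate-language phrase to its aligned phrases, and the logarithmic variant uses ln(1 + count) so that a zero count stays defined, which is a choice made here and not stated in the source. All names and sample entries are hypothetical.

```python
import math


def semantic_range(pivot_phrase, phrase_table, use_log=True):
    """Count distinct phrases aligned to `pivot_phrase`, optionally log-scaled."""
    count = len(set(phrase_table.get(pivot_phrase, ())))
    if not use_log:
        return count
    # ln(1 + count) keeps the value defined when no phrase is aligned.
    return math.log(1 + count)


# Hypothetical English-to-Chinese table (romanized), with one duplicate entry.
source_table = {"bank": ["yinhang", "hean", "yinhang"]}
# Hypothetical English-to-Japanese table (romanized).
target_table = {"bank": ["ginkou", "kishi", "dote"]}

first_range = semantic_range("bank", source_table, use_log=False)
second_range = semantic_range("bank", target_table, use_log=False)
first_range_log = semantic_range("bank", source_table)
```

A phrase with a small range on both sides is less ambiguous and is therefore a more plausible bridge.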

The reliability score calculation unit 416 calculates the reliability score of phrase P of intermediate language E based on the first specific attribute and the second specific attribute.

The evaluation unit 418 evaluates phrase P of intermediate language E based on the reliability score.

FIG. 5 is a block diagram of an apparatus 400' for evaluating phrase P of intermediate language E according to another embodiment of the present invention.

As shown in FIG. 5, the apparatus 400' may include a first alignment unit 402, a first removal unit 404, a second alignment unit 406, a second removal unit 408, and an intersection determination unit 410, as well as a first specific attribute determination unit 412, a second specific attribute determination unit 414, a reliability score calculation unit 416, and an evaluation unit 418, which are the same as the components shown in FIG. 4.

The first alignment unit 402 aligns specific phrase S of source language A with the phrases in the phrase database of intermediate language E to obtain first phrases of intermediate language E.

The first removal unit 404 removes from the first phrases the parts that do not correspond to specific phrase S of source language A to obtain the first alignment phrases.

The second alignment unit 406 aligns the phrases in the phrase database of target language B with the phrases in the phrase database of intermediate language E to obtain second phrases of intermediate language E.

The second removal unit 408 removes from the second phrases the parts that do not correspond to phrases in the phrase database of target language B to obtain the second alignment phrases.

The intersection determination unit 410 takes the phrases in the intersection of the first alignment phrases and the second alignment phrases as the to-be-evaluated phrases of intermediate language E.

The first specific attribute determination unit 412 determines the first specific attribute of the to-be-evaluated phrase of intermediate language E with respect to source language A.

The second specific attribute determination unit 414 determines the second specific attribute of the to-be-evaluated phrase of intermediate language E with respect to target language B.

The reliability score calculation unit 416 calculates the reliability score of the to-be-evaluated phrase of intermediate language E based on the first specific attribute and the second specific attribute.

The evaluation unit 418 evaluates the to-be-evaluated phrase of intermediate language E based on the reliability score.

FIG. 6 is a block diagram of an apparatus 400'' for evaluating phrase P of intermediate language E according to another embodiment of the present invention.

The apparatus 400'' shown in FIG. 6 differs from the apparatus 400' shown in FIG. 5 in that the apparatus 400'' further includes a third removal unit 409.

The third removal unit 409 determines whether the starting part of each first alignment phrase and the starting part of each second alignment phrase are stable, and removes any starting part that is not stable.

For example, the third removal unit 409 determines whether the semantic range in source language A of the starting part of the first alignment phrase and of the starting part of the second alignment phrase exceeds the first threshold and, if so, removes that starting part. It may further determine whether the semantic range in target language B of each starting part exceeds the second threshold and, if so, remove that starting part. Again taking as an example the case where the source language is Chinese, the intermediate language is English, and the target language is Japanese, a leading definite article, indefinite article, preposition, adverb, or the like often has too large a semantic range in Chinese and in Japanese, and may therefore be removed as an unstable phrase.

FIG. 7 is a block diagram of a machine translation apparatus 700 according to one embodiment of the present invention.

As shown in FIG. 7, the machine translation apparatus 700 includes an apparatus 712 for evaluating phrases of intermediate language E, a translation score calculation unit 714, and a translation result selection unit 716.

The apparatus 712 for evaluating phrases of the intermediate language may be, for example, the apparatus 400 shown in FIG. 4, the apparatus 400' shown in FIG. 5, or the apparatus 400'' shown in FIG. 6.

The apparatus 712 for evaluating phrases of intermediate language E obtains the reliability score of phrase P of intermediate language E.

The translation score calculation unit 714 calculates the translation score of a candidate phrase of target language B based on the reliability score of phrase P of intermediate language E and on the machine translation score for translating specific phrase S of source language A into that candidate phrase of target language B via phrase P of intermediate language E. For example, the translation score calculation unit 714 can calculate the translation score of the candidate phrase of target language B from the reliability score and the machine translation score by the CKY algorithm.

For example, from among the plurality of phrases of intermediate language E, phrases whose reliability score is greater than a predetermined reliability score threshold may be selected as trusted phrases.

Preferably, the translation score P_i of the i-th candidate phrase of target language B is calculated by equation (4) above.

The translation result selection unit 716 may select a phrase of target language B as the translation result from among the candidate phrases of target language B based on the translation score calculated by the translation score calculation unit 714.

In the embodiments of the present invention, the polysemy of the intermediate language is analyzed by building a model, and phrases of the intermediate language that have no polysemy or very little polysemy are selected as bridges. In the embodiments, specific phrase S of source language A can be translated into a phrase of target language B via a single phrase P of intermediate language E, or via a plurality of phrases P of intermediate language E.

The principles of the present invention have been described above with reference to specific embodiments. It should be understood, however, that all or part of the apparatus, or any of the steps, according to the embodiments of the present invention may be implemented in software, firmware, hardware, or any combination thereof. When implemented in software or firmware, the program constituting the software or firmware is first installed from a storage medium or a network onto a machine having a dedicated hardware structure (for example, the general-purpose machine 800 shown in FIG. 8); once the various programs are installed, the machine can execute the various functions of the above-described units and sub-units.

As shown in FIG. 8, a central processing unit (CPU) 801 performs various kinds of processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 also stores, as needed, the data required when the CPU 801 executes the various kinds of processing. The CPU 801, the ROM 802, and the RAM 803 are connected to one another via a bus 804. An input/output interface 805 is also connected to the bus 804.

Connected to the input/output interface 805 are an input unit 806 (including a keyboard, a mouse, and the like), an output unit 807 (including a display such as a CRT or LCD, a speaker, and the like), the storage unit 808 (including a hard disk and the like), and a communication unit 809 (including a network interface card such as a LAN card, a modem, and the like). The communication unit 809 performs communication processing via a network such as the Internet. A drive 810 may also be connected to the input/output interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory may be mounted on the drive 810 as needed, so that a computer program read from it is installed into the storage unit 808.

When the above series of processing is implemented in software, the program constituting the software may be installed from a network such as the Internet or from a storage medium such as the removable medium 811.

Those skilled in the art will understand that such a storage medium is not limited to the removable medium 811 shown in FIG. 8, which stores the program and is distributed separately from the apparatus in order to provide the program to the user. Examples of the removable medium 811 include magnetic disks (including floppy (registered trademark) disks), optical disks (including CD-ROM and DVD), magneto-optical disks (including MD (registered trademark)), and semiconductor memories. Alternatively, the storage medium may be the ROM 802, a hard disk included in the storage unit 808, or the like, in which the program is stored and which is distributed to the user together with the apparatus containing it.

The present invention also relates to a program product consisting of machine-readable (for example, computer-readable) instruction code. When read and executed by a machine, the instruction code can carry out the method according to the above-described embodiments.

Furthermore, a storage medium storing the above program product consisting of machine-readable instruction code is also included in the present disclosure. Such storage media include, but are not limited to, magnetic disks (floppy disks), optical disks, magneto-optical disks, memory cards, and memory sticks.

The methods according to the above-described embodiments of the present invention are not limited to being performed in the chronological order described in the specification or shown in the drawings; they may be performed in another chronological order, in parallel, or independently. Accordingly, the order of execution of the methods described in this specification or the drawings does not limit the technical scope of the present invention.

It goes without saying that each process of the above-described methods of the present invention may also be implemented in the form of a computer-executable program stored on any of various machine-readable storage media.

The object of the present invention may also be achieved by supplying a storage medium storing the above executable program code, directly or indirectly, to a system or device, and having a computer or central processing unit (CPU) in that system or device read out and execute the program code.

In this case, as long as the system or device has the function of executing a program, embodiments of the present invention are not limited to a program, and the program may take any form, for example an object program, a program executed by an interpreter, or a script program supplied to an operating system.

The above machine-readable storage media may be various memories and storage units, semiconductor devices, disk units such as optical, magnetic, and magneto-optical disks, and any other usable information storage media.

The present invention can also be realized by a client computer downloading the computer program code according to the above-described embodiments of the present invention from a predetermined website connected to the Internet, installing it on the computer, and then executing the program.

In addition, the following appendices are disclosed with respect to the embodiments including the examples described above.

(Appendix 1)
A method for evaluating a phrase of an intermediate language, comprising:
determining a first specific attribute of the phrase of the intermediate language with respect to a source language;
determining a second specific attribute of the phrase of the intermediate language with respect to a target language;
calculating a reliability score of the phrase of the intermediate language based on the first specific attribute and the second specific attribute; and
evaluating the phrase of the intermediate language based on the reliability score,
wherein the phrase of the intermediate language is a bridge for translating a specific phrase of the source language into a phrase of the target language.

(Appendix 2)
The method according to Appendix 1, wherein
the first specific attribute includes a first semantic range of the phrase of the intermediate language in the source language, and the second specific attribute includes a second semantic range of the phrase of the intermediate language in the target language.

(Appendix 3)
The method according to Appendix 2, wherein
the first semantic range is the number of phrases in the source language that correspond to the intermediate language phrase, or a function of that number, and the second semantic range is the number of phrases in the target language that correspond to the intermediate language phrase, or a function of that number.
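The count-based semantic range of Appendix 3 can be illustrated with a minimal sketch. It assumes a hypothetical phrase table held as (intermediate phrase, other-language phrase) pairs; the function names, the log-damped variant, and the sample data are illustrative assumptions and do not come from the disclosure.

```python
import math
from collections import defaultdict

def semantic_range(pairs):
    """Map each intermediate-language phrase to the number of distinct
    phrases it corresponds to in the other language."""
    aligned = defaultdict(set)
    for pivot, other in pairs:
        aligned[pivot].add(other)
    return {pivot: len(others) for pivot, others in aligned.items()}

def log_semantic_range(count):
    """One possible 'function of the number' (an assumption): a log-damped count."""
    return math.log(1 + count)

# Hypothetical intermediate-to-source phrase pairs (illustrative data only).
pivot_source_pairs = [
    ("bank", "ginkou"),
    ("bank", "dote"),
    ("bank", "ginkou"),      # duplicate pair, counted once
    ("river bank", "dote"),
]
source_range = semantic_range(pivot_source_pairs)
```

The same function would be applied to intermediate-to-target pairs to obtain the second semantic range.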

(Appendix 4)
The method according to Appendix 1, wherein
the first specific attribute further includes:
a translation probability of the intermediate language phrase into the specific phrase of the source language, and/or a lexicalized translation probability of the intermediate language phrase into the specific phrase of the source language.

(Appendix 5)
The method according to Appendix 1, wherein
before the step of determining the first specific attribute of the intermediate language phrase with respect to the source language, the method further comprises:
aligning the specific phrase of the source language with phrases in a phrase database of the intermediate language to obtain first phrases of the intermediate language;
removing, from the first phrases, portions that do not correspond to the specific phrase of the source language to obtain first aligned phrases;
aligning phrases in a phrase database of the target language with phrases in the phrase database of the intermediate language to obtain second phrases of the intermediate language;
removing, from the second phrases, portions that do not correspond to phrases in the phrase database of the target language to obtain second aligned phrases; and
taking the phrases in the intersection of the first aligned phrases and the second aligned phrases as the intermediate language phrases to be evaluated.
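The trimming and intersection steps of Appendix 5 can be sketched as follows. The sketch assumes the alignment itself has already been computed and is represented, hypothetically, as a list of tokens plus one boolean flag per token saying whether that token has an alignment link; this representation and all names are assumptions made for illustration.

```python
def trim_unaligned(tokens, aligned_flags):
    """Keep only the intermediate-language tokens that have an alignment link."""
    return " ".join(t for t, keep in zip(tokens, aligned_flags) if keep)

def candidate_pivot_phrases(source_side, target_side):
    """Intersect the trimmed phrases obtained from the source and target sides.

    Each argument is a list of (tokens, aligned_flags) pairs for
    intermediate-language phrases.
    """
    first = {trim_unaligned(t, f) for t, f in source_side}
    second = {trim_unaligned(t, f) for t, f in target_side}
    return (first & second) - {""}   # drop phrases that were trimmed to nothing

# Hypothetical alignment results (illustrative data only).
source_side = [
    (["the", "bank", "of"], [False, True, False]),
    (["financial", "institution"], [True, True]),
]
target_side = [
    (["bank"], [True]),
    (["a", "financial", "institution"], [False, True, True]),
    (["shore"], [True]),
]
candidates = candidate_pivot_phrases(source_side, target_side)
```

Only phrases that survive trimming on both sides remain as candidates for evaluation; "shore" is dropped here because it appears on the target side alone.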

(Appendix 6)
The method according to Appendix 5, wherein
before the step of taking the phrases in the intersection of the first aligned phrases and the second aligned phrases as the intermediate language phrases to be evaluated, the method further comprises:
determining, respectively, whether the start portion of a first aligned phrase and the start portion of a second aligned phrase are stable; and
removing the start portions if the start portion of the first aligned phrase and the start portion of the second aligned phrase are unstable.

(Appendix 7)
The method according to Appendix 6, wherein
the step of determining, respectively, whether the start portion of the first aligned phrase and the start portion of the second aligned phrase are stable comprises:
determining, respectively, whether the semantic range in the source language of the start portion of the first aligned phrase and of the start portion of the second aligned phrase exceeds a first threshold; and/or
determining, respectively, whether the semantic range in the target language of the start portion of the first aligned phrase and of the start portion of the second aligned phrase exceeds a second threshold.
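The stability check of Appendices 6 and 7 might look like the sketch below. Several points are assumptions not stated in the text: the start portion is taken to be the first token, a start portion is treated as unstable when its semantic range exceeds a threshold (as would be the case for very common function words), and the threshold values and sample ranges are invented for illustration.

```python
def is_unstable(start, source_range, target_range,
                first_threshold, second_threshold):
    """Assumed rule: unstable if either semantic range exceeds its threshold."""
    return (source_range.get(start, 0) > first_threshold
            or target_range.get(start, 0) > second_threshold)

def strip_unstable_start(phrase, source_range, target_range,
                         first_threshold=50, second_threshold=50):
    """Drop the first token of a phrase when that token is judged unstable.

    Single-token phrases are left untouched so that no phrase becomes empty.
    """
    tokens = phrase.split()
    if len(tokens) > 1 and is_unstable(tokens[0], source_range, target_range,
                                       first_threshold, second_threshold):
        tokens = tokens[1:]
    return " ".join(tokens)

# Hypothetical semantic ranges per token (illustrative data only).
source_range = {"the": 400, "bank": 3}
target_range = {"the": 350, "bank": 2}

cleaned = strip_unstable_start("the bank", source_range, target_range)
unchanged = strip_unstable_start("bank loan", source_range, target_range)
```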

(Appendix 8)
The method according to Appendix 1, wherein
the reliability score of the intermediate language phrase is calculated by a regression algorithm using the first specific attribute and the second specific attribute as features.

(Appendix 9)
The method according to Appendix 8, wherein
the regression algorithm is an artificial neural network algorithm.

(Appendix 10)
The method according to Appendix 9, wherein
the reliability score f(x) is calculated by the following formula:

[Formula omitted in the source text]

where K denotes an activation function, g_m(x) denotes the value of the m-th feature, w_m denotes the weight of the m-th feature, biasW denotes a bias weight, and biasV denotes a bias value.
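The formula of Appendix 10 is not reproduced in this text. One reading that is consistent with the listed symbols is a single-neuron form, f(x) = K(sum over m of w_m * g_m(x) + biasW * biasV); the sketch below implements that assumed form. The sigmoid activation, the function names, and the sample values are likewise assumptions.

```python
import math

def sigmoid(z):
    """A common choice for the activation function K (an assumption)."""
    return 1.0 / (1.0 + math.exp(-z))

def reliability_score(features, weights, bias_w, bias_v, activation=sigmoid):
    """Assumed form: K(sum_m w_m * g_m(x) + biasW * biasV).

    features: values g_m(x), e.g. semantic ranges and translation probabilities
    weights:  weights w_m, one per feature
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * g for w, g in zip(weights, features)) + bias_w * bias_v
    return activation(z)

# Hypothetical feature values and weights (illustrative only).
score = reliability_score(features=[1.0, 2.0], weights=[0.5, -0.25],
                          bias_w=0.0, bias_v=1.0)
```

In practice the weights and bias would be learned from phrases with known reliability; the values above are placeholders chosen so the weighted sum is zero.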

(Appendix 11)
A machine translation method, comprising:
obtaining the reliability score of the intermediate language phrase by the method according to any one of Appendices 1 to 10;
calculating a translation score of a candidate phrase of the target language based on the reliability score of the intermediate language phrase and on a machine translation score for translating the specific phrase of the source language into the candidate phrase of the target language via the intermediate language phrase; and
selecting, based on the translation score, a phrase of the target language from among the candidate phrases of the target language as the translation result.

(Appendix 12)
The method according to Appendix 11, further comprising:
calculating the translation score of the candidate phrase of the target language by the CKY algorithm based on the reliability score and the machine translation score.

(Appendix 13)
The method according to Appendix 11, further comprising:
selecting, from among a plurality of phrases of the intermediate language, phrases whose reliability score is greater than a predetermined reliability score threshold as trusted phrases.

(Appendix 14)
The method according to Appendix 13, wherein
the translation score P_i of the i-th candidate phrase of the target language is calculated by the following formula:

[Formula omitted in the source text]

where r_j denotes the reliability score of the j-th trusted phrase of the intermediate language, and T_i^j denotes the machine translation score for translating the specific phrase of the source language into the i-th candidate phrase of the target language via the j-th trusted phrase of the intermediate language.
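The formula of Appendix 14 is also missing from this text. A reading consistent with the symbol definitions is a reliability-weighted sum over trusted phrases, P_i = sum over j of r_j * T_i^j; the sketch below uses that assumed form together with the threshold filter of Appendix 13 and the final selection of Appendix 11. The data structures, names, and numbers are illustrative assumptions.

```python
def translation_scores(reliability, mt_scores, threshold):
    """Assumed form: P_i = sum_j r_j * T_i^j over trusted phrases only.

    reliability: {intermediate phrase: r_j}
    mt_scores:   {intermediate phrase: {target candidate: T_i^j}}
    threshold:   phrases with r_j <= threshold are not trusted
    """
    trusted = {p: r for p, r in reliability.items() if r > threshold}
    totals = {}
    for pivot, r in trusted.items():
        for candidate, t in mt_scores.get(pivot, {}).items():
            totals[candidate] = totals.get(candidate, 0.0) + r * t
    return totals

def best_candidate(totals):
    """Select the target candidate with the highest translation score."""
    return max(totals, key=totals.get)

# Hypothetical scores (illustrative data only).
reliability = {"bank": 0.5, "shore": 0.25, "the": 0.1}
mt_scores = {
    "bank": {"A": 0.5, "B": 0.25},
    "shore": {"B": 1.0},
    "the": {"C": 1.0},
}
totals = translation_scores(reliability, mt_scores, threshold=0.2)
chosen = best_candidate(totals)
```

Candidate "C" never receives a score here because its only intermediate phrase falls below the reliability threshold.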

(Appendix 15)
A device for evaluating a phrase of an intermediate language, comprising:
a first specific attribute determining unit that determines a first specific attribute of the intermediate language phrase with respect to a source language;
a second specific attribute determining unit that determines a second specific attribute of the intermediate language phrase with respect to a target language;
a reliability score calculation unit that calculates a reliability score of the intermediate language phrase based on the first specific attribute and the second specific attribute; and
an evaluation unit that evaluates the intermediate language phrase based on the reliability score,
wherein the intermediate language phrase serves as a bridge for translating a specific phrase of the source language into a phrase of the target language.

(Appendix 16)
The device according to Appendix 15, wherein
the first specific attribute includes a first semantic range of the intermediate language phrase in the source language, and the second specific attribute includes a second semantic range of the intermediate language phrase in the target language.

(Appendix 17)
The device according to Appendix 16, wherein
the first semantic range is the number of phrases in the source language that correspond to the intermediate language phrase, or a function of that number, and the second semantic range is the number of phrases in the target language that correspond to the intermediate language phrase, or a function of that number.

(Appendix 18)
The device according to any one of Appendices 15 to 17, wherein
the first specific attribute further includes:
a translation probability of the intermediate language phrase into the specific phrase of the source language, and/or a lexicalized translation probability of the intermediate language phrase into the specific phrase of the source language.

(Appendix 19)
The device according to any one of Appendices 15 to 17, further comprising:
a first alignment unit that aligns the specific phrase of the source language with phrases in a phrase database of the intermediate language to obtain first phrases of the intermediate language;
a first removal unit that removes, from the first phrases, portions that do not correspond to the specific phrase of the source language to obtain first aligned phrases;
a second alignment unit that aligns phrases in a phrase database of the target language with phrases in the phrase database of the intermediate language to obtain second phrases of the intermediate language;
a second removal unit that removes, from the second phrases, portions that do not correspond to phrases in the phrase database of the target language to obtain second aligned phrases; and
an intersection determining unit that takes the phrases in the intersection of the first aligned phrases and the second aligned phrases as the intermediate language phrases to be evaluated.

(Appendix 20)
A machine translation device, comprising:
the device according to any one of Appendices 15 to 19, which obtains the reliability score of the intermediate language phrase;
a translation score calculation unit that calculates a translation score of a candidate phrase of the target language based on the reliability score of the intermediate language phrase and on a machine translation score for translating the specific phrase of the source language into the candidate phrase of the target language via the intermediate language phrase; and
a translation result selection unit that selects, based on the translation score, a phrase of the target language from among the candidate phrases of the target language as the translation result.

The preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and any modification that does not depart from the spirit of the present invention falls within its technical scope.

Claims

1. A method for evaluating a phrase of an intermediate language, comprising:
determining a first specific attribute of the intermediate language phrase with respect to a source language;
determining a second specific attribute of the intermediate language phrase with respect to a target language;
calculating a reliability score of the intermediate language phrase based on the first specific attribute and the second specific attribute; and
evaluating the intermediate language phrase based on the reliability score,
wherein the intermediate language phrase serves as a bridge for translating a specific phrase of the source language into a phrase of the target language.

2. The method according to claim 1, wherein
the first specific attribute includes a first semantic range of the intermediate language phrase in the source language; and
the second specific attribute includes a second semantic range of the intermediate language phrase in the target language.

3. The method according to claim 2, wherein
the first semantic range is the number of phrases in the source language that correspond to the intermediate language phrase, or a function of that number; and
the second semantic range is the number of phrases in the target language that correspond to the intermediate language phrase, or a function of that number.

4. The method according to claim 1, wherein
the first specific attribute further includes:
a translation probability of the intermediate language phrase into the specific phrase of the source language, and/or
a lexicalized translation probability of the intermediate language phrase into the specific phrase of the source language.

5. The method according to claim 1, wherein
before the step of determining the first specific attribute of the intermediate language phrase with respect to the source language, the method further comprises:
aligning the specific phrase of the source language with phrases in a phrase database of the intermediate language to obtain first phrases of the intermediate language;
removing, from the first phrases, portions that do not correspond to the specific phrase of the source language to obtain first aligned phrases;
aligning phrases in a phrase database of the target language with phrases in the phrase database of the intermediate language to obtain second phrases of the intermediate language;
removing, from the second phrases, portions that do not correspond to phrases in the phrase database of the target language to obtain second aligned phrases; and
taking the phrases in the intersection of the first aligned phrases and the second aligned phrases as the intermediate language phrases to be evaluated.

6. The method according to claim 5, wherein
before the step of taking the phrases in the intersection of the first aligned phrases and the second aligned phrases as the intermediate language phrases to be evaluated, the method further comprises:
determining, respectively, whether the start portion of a first aligned phrase and the start portion of a second aligned phrase are stable; and
removing the start portions if the start portion of the first aligned phrase and the start portion of the second aligned phrase are unstable.

7. The method according to claim 6, wherein
the step of determining, respectively, whether the start portion of the first aligned phrase and the start portion of the second aligned phrase are stable comprises:
determining, respectively, whether the semantic range in the source language of the start portion of the first aligned phrase and of the start portion of the second aligned phrase exceeds a first threshold; and/or
determining, respectively, whether the semantic range in the target language of the start portion of the first aligned phrase and of the start portion of the second aligned phrase exceeds a second threshold.

8. A machine translation method, comprising:
obtaining the reliability score of the intermediate language phrase by the method according to any one of claims 1 to 7;
calculating a translation score of a candidate phrase of the target language based on the reliability score of the intermediate language phrase and on a machine translation score for translating the specific phrase of the source language into the candidate phrase of the target language via the intermediate language phrase; and
selecting, based on the translation score, a phrase of the target language from among the candidate phrases of the target language as the translation result.

9. A device for evaluating a phrase of an intermediate language, comprising:
a first specific attribute determining unit that determines a first specific attribute of the intermediate language phrase with respect to a source language;
a second specific attribute determining unit that determines a second specific attribute of the intermediate language phrase with respect to a target language;
a reliability score calculation unit that calculates a reliability score of the intermediate language phrase based on the first specific attribute and the second specific attribute; and
an evaluation unit that evaluates the intermediate language phrase based on the reliability score,
wherein the intermediate language phrase serves as a bridge for translating a specific phrase of the source language into a phrase of the target language.

10. A machine translation device, comprising:
the device according to claim 9, which obtains the reliability score of the intermediate language phrase;
a translation score calculation unit that calculates a translation score of a candidate phrase of the target language based on the reliability score of the intermediate language phrase and on a machine translation score for translating the specific phrase of the source language into the candidate phrase of the target language via the intermediate language phrase; and
a translation result selection unit that selects, based on the translation score, a phrase of the target language from among the candidate phrases of the target language as the translation result.