JP2609173B2

JP2609173B2 - Example-driven machine translation method

Info

Publication number: JP2609173B2
Application number: JP2077471A
Authority: JP
Inventors: 英一郎隅田; 仁飯田; 秀雄幸山
Original assignee: 株式会社エイ・ティ・アール自動翻訳電話研究所
Priority date: 1990-03-26
Filing date: 1990-03-26
Publication date: 1997-05-14
Anticipated expiration: 2012-05-14
Also published as: JPH03276367A

Description

[Detailed Description of the Invention]
[Field of Industrial Application] The present invention relates to an example-driven machine translation method. More particularly, it relates to a method that uses no translation rules written by experts. Instead, examples, each consisting of a source sentence paired with its translation, are stored in a database; the example most similar to an input source sentence is retrieved from the database; and the input is translated according to the translation of the retrieved example.

[Prior Art and Problems to Be Solved by the Invention] The need for computer-based machine translation systems keeps growing, and research and development in the field is active. Recently, moreover, such systems have begun to be applied not only to technical documents but to a variety of fields, including conversation and automatic interpreting telephony. Rule-driven machine translation is the known approach for conventional computer-based translation devices. A rule-driven system relies on a large and complex database of rules, such as rules for selecting target words. Experts write these rules separately for each subject domain, and the cost of building them is a major problem when a system is created. In addition, the effect of adding a rule cannot be predicted in a simple way, which makes it difficult to improve translation quality.

Therefore, the main object of the present invention is to provide an example-driven machine translation method that does not use conventional rules, but instead stores in a database examples consisting of easily collected pairs of source sentences and their translations, and translates a source sentence using the examples stored in that database.

[Means for Solving the Problems] The present invention is an example-driven machine translation method that performs machine translation using examples, each consisting of a source sentence paired with its translation. Examples are stored in an example database without the use of translation rules, and a dictionary database is prepared in which source-language words are organized hierarchically into a tree structure according to similarity of meaning. The similarity between an input source sentence and each example in the example database is calculated as a real value by referring to the dictionary database, and the input source sentence is translated according to the translation of the example for which the calculated value is smallest.

[Operation] In the example-driven machine translation method according to the present invention, the similarity between the input source sentence and each example in the example database is calculated as a real value by referring to the dictionary database, and the input source sentence is translated according to the translation of the example for which the calculated value is smallest.

[Embodiments of the Invention] FIG. 1 is a schematic block diagram of an embodiment of the present invention. Referring to FIG. 1, input unit 1 consists of a keyboard, an OCR device, a speech recognition device, or the like, and accepts a source-language sentence. The input sentence is passed to analysis unit 2. Analysis unit 2 analyzes the sentence, extracts as its result a source-language dependency structure that depends on the input language, and passes it to example-driven conversion unit 3, which is the distinguishing feature of the invention. Example-driven conversion unit 3, described in detail below with reference to FIGS. 2 to 4, converts the source-language dependency structure into a target-language dependency structure that depends on the target language, and passes it to generation unit 4. Generation unit 4 reorders the words into target-language word order, applies conjugation and inflection, generates a target-language sentence, and passes it to output unit 5. Output unit 5 consists of a display device, a printer, a speech synthesizer, or the like.

FIGS. 2 and 3 illustrate the dependency structures that form the input and output of the example-driven conversion unit. A dependency structure is a tree that describes the structure of a sentence in the source or target language: words are represented as nodes and the dependency relations between words as links. Consider the dependency structure for the Japanese sentence 「太郎はドイツの車を買う。」 ("Taro buys a German car.") shown at the top of FIG. 2. Nodes are drawn as rectangles and links as straight lines. 「太郎」 (Taro) depends on 「買う」 (buy) through the relation 「は」, and 「車」 (car) depends on 「買う」 through the relation 「を」. In turn, 「ドイツ」 (Germany) depends on 「車」 through the relation 「の」. A node that is depended upon is called the main element, and a node that depends on it is called the subordinate element. The source-language dependency relations are used to control the conversion process. FIG. 3 shows the dependency structure for the translation "Taro buys a car made-in Germany."
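The dependency structure described above for 「太郎はドイツの車を買う。」 can be written down as a small tree. The following is only a minimal sketch: the `Node` class, its field names, and the `links` helper are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One word (node) in a dependency structure."""
    word: str
    # Relation linking this node to its main element; None for the root.
    relation: Optional[str] = None
    # Subordinate elements that depend directly on this node.
    dependents: List["Node"] = field(default_factory=list)


# 太郎はドイツの車を買う。 ("Taro buys a German car.")
tree = Node("買う", dependents=[
    Node("太郎", relation="は"),
    Node("車", relation="を", dependents=[
        Node("ドイツ", relation="の"),
    ]),
])


def links(node: Node) -> Iterator[Tuple[str, Optional[str], str]]:
    """Yield (main element, relation, subordinate element) for every link."""
    for dep in node.dependents:
        yield (node.word, dep.relation, dep.word)
        yield from links(dep)


dependency_links = list(links(tree))
```

Listing the links this way gives one (main element, relation, subordinate element) triple per line drawn in FIG. 2.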

The basic idea behind example-driven conversion unit 3 is explained here intuitively, without dependency structures, to keep the explanation simple. Example-driven conversion unit 3 uses an example database consisting of pairs of a source sentence and its translation, and a dictionary database called a thesaurus, in which source-language words are organized hierarchically into a tree structure according to similarity of meaning. Suppose the example database contains the following two examples.

包丁は切れる。 → The kitchen knife cuts.
彼女は切れる。 → She is sharp.

Comparing the following input with the Japanese side of these examples shows that it is syntactically similar to both of them.

課長は切れる。 (課長 = section chief)

Looking up the thesaurus further shows that 「課長」 (section chief) and 「包丁」 (kitchen knife) are semantically distant, whereas 「課長」 and 「彼女」 (she) are semantically close. The input sentence is therefore both syntactically and semantically similar to the Japanese side of the second example, 「彼女は切れる。」. Accordingly, the input can be translated as follows, after that example. This is the function of example-driven conversion unit 3.

The section chief is sharp.

Example-driven machine translation also makes it easy to improve translation quality simply by adding examples. Suppose now that the example database contains the following two examples.

弾丸が的に当たった。 → The bullet hit the target.
１ドルは360円に当たった。 → One dollar was equivalent to 360 yen.

In this state, for the following input the system can output the translation below by following the syntactically and semantically similar example 「弾丸が的に当たった。」.

弾丸が的に当たらなかった。 (literally, "The bullet did not hit the target.")

The bullet didn't hit the target.

Suppose, however, that the following is preferred as a better translation.

The bullet missed the target.

To produce this translation, example-driven machine translation, unlike the prior art, only requires that the following example be added to the example database. This is as useful as the learning function of kana-kanji conversion.

弾丸が的に当たらなかった。 → The bullet missed the target.

With this addition, the system can from then on output the best translation for similar inputs as well. For example, for the following input it can output the translation below.

石が窓に当たらなかった。 (literally, "The stone did not hit the window.")

The stone missed the window.

FIG. 4 is a schematic block diagram of the example-driven conversion unit shown in FIG. 1. Referring to FIG. 4, example-driven conversion unit 3 consists of conversion control unit 31 and conversion unit 32. Conversion unit 32 refers to example database 33, whose contents have been processed in advance by analysis unit 2 shown in FIG. 1, and to thesaurus 34, a dictionary database in which words are organized hierarchically into a tree structure according to similarity of meaning. Conversion control unit 31 scans the source-language dependency structure from top to bottom and from left to right, invoking conversion unit 32 for each partial dependency structure of the source language. When all nodes have been scanned, the target-language dependency structure is complete and is output to generation unit 4 shown in FIG. 1. For the dependency structures shown in FIGS. 2 and 3, for example, the nodes are scanned in the order of the numbers inside the rectangles. In other words, the source-language dependency structure is used to control the conversion process. Conversion unit 32 receives a partial dependency structure from conversion control unit 31 and returns the rewritten partial dependency structure to conversion control unit 31.
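The division of labor between conversion control unit 31 and conversion unit 32 can be sketched as a tree scan that hands each partial dependency structure to a callback. This is a hedged illustration only: the text gives the scan direction (top to bottom, left to right) but not the exact node order of FIG. 2, so the pre-order walk, the tuple encoding of the tree, and the names `scan` and `convert` are all assumptions.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical encoding: a node is (word, [(relation, child_node), ...]).
Tree = Tuple[str, list]
# A partial dependency structure: (main element, relation, subordinate element).
# Relation and subordinate element are None when the main element stands alone.
Partial = Tuple[str, Optional[str], Optional[str]]


def scan(tree: Tree, convert: Callable[[Partial], None]) -> None:
    """Walk the tree top to bottom, left to right (pre-order, an assumption),
    calling `convert` once for each partial dependency structure."""
    word, children = tree
    if not children:
        convert((word, None, None))          # main element only
    for relation, child in children:
        convert((word, relation, child[0]))  # main element, relation, subordinate
    for _, child in children:
        scan(child, convert)


# 太郎はドイツの車を買う。
tree: Tree = ("買う", [("は", ("太郎", [])),
                      ("を", ("車", [("の", ("ドイツ", []))]))])

calls: List[Partial] = []
scan(tree, calls.append)  # here the "conversion unit" just records its inputs
```

In a real system the callback would rewrite each partial structure into the target language; recording the calls simply makes the invocation order visible.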

FIGS. 5 and 6 illustrate the process performed by the example-driven conversion unit. As shown in FIGS. 5 and 6, a partial dependency structure consists of three parts: a main element, a subordinate element that depends directly on it, and the relation between the two. If there is no subordinate element, it consists of the main element alone. When the partial dependency structure has all three parts, conversion unit 32 converts the main element and the relation; when there is no subordinate element, only the main element is converted.

To retrieve from example database 33 an example similar to the input, conversion unit 32 calculates the distance between the input and each example. The most similar example is the one with the smallest distance; the partial dependency structure is rewritten according to it and returned to conversion control unit 31.

Next, the distance calculation is described. The information in a partial dependency structure input to conversion unit 32 can be expressed as a list of attribute values, and lists of attribute values are used below to explain the formulas. Let I and E denote the lists, and Ii and Ei their i-th attribute values. Typical attribute values used in translation include syntactic information such as case, negation, aspect, mood, and voice, and semantic information about nouns and verbs. The overall distance d(I, E) is calculated from the distance d(Ii, Ei) for each attribute value and the weight Wi of that attribute value by the following equation.

d(I, E) = Σi d(Ii, Ei) × Wi   (1)

Next, the distance between attribute values is described. For every attribute other than the semantic attribute, the distance is 0 or 1 according to whether or not the values match. For the semantic attribute, partial matches are allowed and a real number from 0 to 1 is assigned. A thesaurus is used to determine the distance for the semantic attribute. Briefly, a thesaurus is a dictionary database in which words are organized hierarchically into a tree structure according to similarity of meaning.
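The overall distance, a weighted sum of per-attribute distances, and the 0-or-1 rule for non-semantic attributes can be sketched as follows. This is a minimal illustration under stated assumptions: the attribute lists, the weights, and the `toy_semantic_distance` stand-in for a thesaurus lookup are all invented for the example, and the function names are not from the patent.

```python
from typing import Callable, Sequence

AttributeDistance = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Non-semantic attributes: 0 if the values match, otherwise 1."""
    return 0.0 if a == b else 1.0


def toy_semantic_distance(a: str, b: str) -> float:
    """Hypothetical stand-in for a thesaurus lookup (value between 0 and 1)."""
    return 0.0 if a == b else 0.5


def overall_distance(inp: Sequence[str], example: Sequence[str],
                     weights: Sequence[float],
                     distances: Sequence[AttributeDistance]) -> float:
    """d(I, E) = sum over i of d(Ii, Ei) * Wi."""
    return sum(dist(a, b) * w
               for a, b, w, dist in zip(inp, example, weights, distances))


# Attribute lists: (negation, case particle, noun meaning). All illustrative.
input_attrs = ("negative", "に", "窓")
example_attrs = ("negative", "に", "的")
weights = (1.0, 1.0, 0.5)                      # assumed weights
metrics = (exact_match, exact_match, toy_semantic_distance)

d = overall_distance(input_attrs, example_attrs, weights, metrics)
```

Passing the per-attribute distance functions in as a list keeps the semantic attribute, which needs the thesaurus, separate from the attributes that are compared by simple equality.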

FIG. 7 illustrates the hierarchy of the thesaurus. Referring to FIG. 7, each rectangle denotes a concept, and concepts are linked by superordinate-subordinate relations. For example, 「陳述」 (statement) and 「往来」 (coming and going) are linked as subordinate concepts of the concept 「行動」 (action), and 「滞在」 (stay) and 「発着」 (departure and arrival) are in turn linked as subordinate concepts of 「往来」. Each concept at the lowest level is linked to the set of synonyms that belong to it; for example, the lowest-level concept 「相談」 (consultation) is linked to words such as 「会議」 (conference) and 「打合わせ」 (meeting). Building a thesaurus is an established technique, and a commercially available database can be used.

As the distance for the semantic attribute, a value proportional to the position of the lowest common superordinate concept in the thesaurus is assigned. For example, in a thesaurus of (n+1) levels, the value k/n is assigned to the k-th level. Co-occurrence information has conventionally been used to calculate the semantic distance between words, but it has the drawback that the amount of co-occurrence information required becomes enormous; it is therefore not adopted, and the present invention uses a thesaurus instead.

Next, the weights of the attribute values are described. The weight of an attribute value expresses how strongly that value influences the choice of translation. Examples having the same attribute value are collected from example database 33, the frequencies of the translations among them are examined, and the weight is quantified according to equation (2) below.

This calculation is expensive, but it depends only on static frequencies in the database, so it can be carried out in advance when the system is built and incurs no cost at run time. When the system is built, a table of attribute-value weights is created from example database 33.

Next, a more concrete example is described. For simplicity, the translation into English of Japanese noun phrases of the form 「N₁のN₂」 (where N denotes a noun, here and below) is used as the example. Noun phrases of this form are frequent in Japanese. The relation 「の」 in particular has many possible English translations, and choosing among them is difficult. In fact, "N₂ of N₁", which is regarded as the default, accounts for only 20 to 40% of cases; otherwise another preposition is used or the phrase is translated without a preposition.

FIG. 8 illustrates the operation of an embodiment of the present invention. Referring to FIG. 8, for the example input 「京都での会議」, examples such as those listed as retrieved examples are retrieved. Based on these, the translation "the conference in Kyoto" is output. For this example, the translation of the noun-phrase relation 「の」 and the translation of the nouns are described below. To translate the relation 「の」, the distance between the input and each example is calculated in order to retrieve from example database 33 an example similar to the input. The input and the examples are assumed to be represented in the same data structure, namely a list of attribute values. The lists are written I and E, and their i-th attribute values Ii and Ei. The attributes needed to translate a Japanese noun phrase of the form 「N₁のN₂」 into English are as follows: for each noun, the part-of-speech subcategory (sahen noun, common noun, and so on), the presence of a prefix or suffix, and the semantic concept in the thesaurus; and for the case particle 「の」, its type (の, での, からの, and so on).

Next, the distance between the input and an example is described. The overall distance d(I, E) is calculated from the distance d(Ii, Ei) for each attribute value and the weight Wi of that attribute value by the following equation (3).

d(I, E) = Σi d(Ii, Ei) × Wi   (3)

The distance between attribute values is 0 or 1, according to whether or not the values match, for every attribute other than the semantic attribute, that is, the semantic concept in the thesaurus. For the semantic attribute, partial matches are allowed and a real number from 0 to 1 is assigned: a value proportional to the position of the lowest common superordinate concept in the thesaurus. As noted above, building a thesaurus is an established technique, and a commercially available database can be used. A feature of the present invention is that semantic closeness is quantified as a value proportional to the position of the lowest common superordinate concept.

FIG. 7 described above shows part of the thesaurus. This thesaurus has four levels in all, and, as indicated by the numbers in the rectangles representing the concepts, the values 0, 1/3, 2/3, and 3/3 are assigned to the respective levels. That is, k/n is assigned to the k-th level of (n+1) levels. As dotted line a in FIG. 7 shows, the lowest common superordinate concept of the words 「会議」 and 「滞在」 is 「行動」, so the distance between 「会議」 and 「滞在」 is 2/3. Likewise, as dotted line b shows, the lowest common superordinate concept of 「到着」 (arrival) and 「滞在」 is 「往来」, so their distance is 1/3. Although not shown in FIG. 7, the lowest common superordinate concept of 「東京」 (Tokyo) and 「京都」 (Kyoto) is the lowest-level concept 「地名」 (place name), so their distance is 0.
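The three distances worked out above (2/3, 1/3, and 0) can be reproduced with a small parent-pointer thesaurus. The sketch below is an illustration, not the patent's implementation: the root name `ROOT`, the placeholder concepts `X1` and `X2` above 「地名」, and the placement of 「相談」 under 「陳述」 are assumptions filled in only so that the tree has four levels.

```python
from fractions import Fraction
from typing import List

# Concept -> superordinate concept. Levels from the bottom: 0, 1, 2, 3.
PARENT = {
    "相談": "陳述",      # assumed placement; the text only implies it is under 行動
    "滞在": "往来",
    "発着": "往来",
    "陳述": "行動",
    "往来": "行動",
    "行動": "ROOT",      # hypothetical root name
    "地名": "X1",        # X1 and X2 are placeholders, not from the text
    "X1": "X2",
    "X2": "ROOT",
}

# Word -> lowest-level concept it belongs to.
CONCEPT_OF = {
    "会議": "相談", "打合わせ": "相談",
    "滞在": "滞在", "到着": "発着",
    "東京": "地名", "京都": "地名",
}

N = 3  # the thesaurus has n + 1 = 4 levels


def concept_chain(word: str) -> List[str]:
    """Concepts from the word's lowest-level concept up to the root."""
    concept = CONCEPT_OF[word]
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def semantic_distance(a: str, b: str) -> Fraction:
    """k/n, where k is the level of the lowest common superordinate concept."""
    chain_b = concept_chain(b)
    for k, concept in enumerate(concept_chain(a)):
        if concept in chain_b:
            return Fraction(k, N)
    raise ValueError("no common superordinate concept")


d_a = semantic_distance("会議", "滞在")   # dotted line a
d_b = semantic_distance("到着", "滞在")   # dotted line b
d_c = semantic_distance("東京", "京都")
```

Because every lowest-level concept sits at the same depth in this sketch, the index of the first shared concept in one chain is also its level k, which is what makes the k/n assignment a single lookup.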

Next, the calculation of attribute-value weights is explained by example. FIGS. 9, 10, and 11 each show an example of the distribution of translations among examples having a particular attribute value. FIG. 9 is the case in which the semantic concept of the first noun N₁ is 「地名」 (place name), FIG. 10 the case in which the case particle is 「での」, and FIG. 11 the case in which the semantic concept of the second noun N₂ is 「相談」 (consultation). In the translation column of these figures, a shorthand notation is used; for example, a translation using the preposition "in" is written "B in A".

As FIG. 9 shows, when the semantic concept of the first noun N₁ is 「地名」, a variety of prepositions such as "in" and "from" are used, or N₁ is turned into an adjective, so the correlation with the choice of translation is weak. Likewise, as FIG. 11 shows, when the semantic concept of N₂ is 「相談」, the correlation with the choice of translation is also weak.

By contrast, as FIG. 10 shows, when the case particle is 「での」 every example is translated with the preposition "in", so the correlation is stronger. This correlation is quantified by equation (4) below.

From equation (4), the weights for FIGS. 9, 10, and 11 are given by equation (5) below.
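Equations (4) and (5) are not reproduced in this text, so the following is only one plausible way to turn a translation distribution into a weight, not the patent's stated formula. It assumes the weight is the square root of the sum of squared relative frequencies of each translation pattern. That assumption at least agrees with the text in one respect: when every example uses the same translation, as with 「での」, the weight comes out as 1.0, the value used for 「での」 in the worked FIG. 8 calculation. All counts below are invented for illustration.

```python
from math import sqrt
from typing import Dict


def attribute_weight(translation_counts: Dict[str, int]) -> float:
    """Assumed formula: sqrt of the sum of squared relative frequencies.

    A single dominant translation gives a weight near 1; a flat spread over
    many translations gives a smaller weight.
    """
    total = sum(translation_counts.values())
    return sqrt(sum((count / total) ** 2
                    for count in translation_counts.values()))


# Case like FIG. 10: every example with 「での」 is translated with "in".
# The count 3 is illustrative, not taken from the figure.
w_deno = attribute_weight({"B in A": 3})

# Hypothetical spread of translations for a weakly correlated attribute value.
w_spread = attribute_weight({"B of A": 2, "B in A": 1, "B from A": 1, "A B": 1})
```

Since the weights depend only on counts in the example database, a table of them can be built once ahead of time, as the text notes.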

Once the attribute-value distances and weights are known in this way, the overall distance between the input and an example can be calculated by equation (3). Note that equation (1) is the same as equation (3), and equation (2) is the same as equation (4). The distances for the retrieved examples of FIG. 8 can thus be calculated as follows.

d(京都での会議, 東京での滞在)
  = d(京都, 東京) × 0.49 + d(での, での) × 1.0 + d(会議, 滞在) × 0.54
  = 0 × 0.49 + 0 × 1.0 + 2/3 × 0.54 ≈ 0.4

d(京都での会議, 東京の会議)
  = d(京都, 東京) × 0.49 + d(での, の) × 1.0 + d(会議, 会議) × 0.54
  = 0 × 0.49 + 1 × 1.0 + 0 × 0.54 = 1.0

Next, translation when a noun has more than one possible target word is described. For example, the Japanese word 「手」 has at least four translations: "paw", "hand", "handle", and "move". Suppose the examples for each are as follows.

(a) 猫の手 → paw
(b) 彼の手 → hand
(c) 急須の手 → handle
(d) チェスの手 → move

Using the distance calculation described above for translating the relation 「の」, example (b) can be selected for the input 「彼女の手」 and example (d) for the input 「将棋の手」, so the translation of the Japanese word 「手」 can be determined.
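Both the FIG. 8 calculation and the choice among the 「手」 examples come down to computing a weighted distance for each example and taking the minimum. The sketch below reproduces only the FIG. 8 figures, using the per-attribute distances and weights stated in the text; the 「手」 case is not reproduced because the text gives no thesaurus entries for those nouns. Variable and function names are illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple

# Weights for (N1 meaning, case particle, N2 meaning), as used in the text.
WEIGHTS: Tuple[float, float, float] = (0.49, 1.0, 0.54)

# Per-attribute distances from the input 京都での会議, as stated in the text.
CANDIDATES: Dict[str, Tuple[float, float, float]] = {
    "東京での滞在": (0.0, 0.0, 2 / 3),  # same place concept, same particle
    "東京の会議": (0.0, 1.0, 0.0),      # particle differs (での vs. の)
}


def overall_distance(attribute_distances: Sequence[float],
                     weights: Sequence[float] = WEIGHTS) -> float:
    """Weighted sum of per-attribute distances."""
    return sum(d * w for d, w in zip(attribute_distances, weights))


distances = {example: overall_distance(ds) for example, ds in CANDIDATES.items()}

# The example with the smallest distance is the one whose translation is followed.
nearest = min(distances, key=distances.get)
```

Here `nearest` is 「東京での滞在」, whose distance of about 0.36 (rounded to 0.4 in the text) is smaller than the 1.0 of 「東京の会議」.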

[Effects of the Invention] As described above, according to the present invention, the similarity between an input source sentence and each example in the example database is calculated as a real value by referring to the dictionary database, without using translation rules, and the input source sentence can be translated according to the translation of the example for which the calculated value is smallest. A translation failure in example-driven machine translation is therefore caused by the absence of a similar example, and can easily be remedied by adding an appropriate example.
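The claim that a poor translation can be remedied by adding one example can be illustrated with a deliberately tiny model. Everything here is a toy assumption: the two-attribute lists (verb, negated?), the unweighted mismatch count used as the distance, and the fallback that prefixes "didn't" when only an affirmative example is available.

```python
from typing import List, Tuple

Attributes = Tuple[str, bool]        # (verb, negated?): a toy attribute list
Example = Tuple[Attributes, str]     # (source attributes, English verb phrase)


def distance(a: Attributes, b: Attributes) -> int:
    """Number of mismatching attributes (all weights 1, an assumption)."""
    return sum(x != y for x, y in zip(a, b))


def translate(inp: Attributes, examples: List[Example]) -> str:
    """Follow the nearest example; negate compositionally if it is affirmative."""
    source, phrase = min(examples, key=lambda ex: distance(inp, ex[0]))
    if inp[1] and not source[1]:
        return "didn't " + phrase
    return phrase


examples: List[Example] = [(("当たる", False), "hit")]
inp: Attributes = ("当たる", True)   # e.g. 石が窓に当たらなかった。

before = translate(inp, examples)                 # nearest: affirmative example
examples.append((("当たる", True), "missed"))      # the single added example
after = translate(inp, examples)                  # nearest: the new example
```

No rule is changed between the two calls; only the example list grows, and the nearer example takes over.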

Example-driven machine translation also faithfully reflects the characteristics of the example database, that is, the phenomena particular to each target domain. For example, in translating into English the polite expression 「Ｎの方」, which is frequent in dialogue, 「方」 is always omitted and only "N" remains. Such a translation becomes possible simply by collecting an example database, without using rules.

Furthermore, rule-driven machine translation fails if no rule exists that matches the input exactly. Example-driven machine translation, by contrast, only needs a similar example to exist, so it can be called a method that is robust to incompleteness of the database.

Moreover, the translation of an idiomatic expression cannot be composed from the translations of its elements (for example, 「孫の手」 → "a back scratcher"). To handle this with rule-driven machine translation, a costly rule has to be added solely for the individual phenomenon, so rule-driven machine translation is not suited to translating idiomatic expressions. In example-driven machine translation it suffices to add an example, so it is well suited to translating idiomatic expressions.

Finally, because the example database consists of symmetric pairs of source-language and target-language expressions, using the pairs in reverse makes translation in the opposite direction possible without the effort of writing rules for that direction.

[Brief description of the drawings]

FIG. 1 is a schematic block diagram of an embodiment of the present invention. FIGS. 2 and 3 illustrate the dependency structures that are the input and output of the example-driven conversion unit. FIG. 4 is a schematic block diagram of the example-driven conversion unit shown in FIG. 1. FIGS. 5 and 6 illustrate the process of the example-driven conversion unit. FIG. 7 illustrates the hierarchy of the thesaurus. FIG. 8 illustrates the process of translating the noun-phrase relation 「の」 in an embodiment of the present invention. FIGS. 9, 10, and 11 illustrate the weights used in the distance calculation when translating the noun-phrase relation 「の」 in an embodiment of the present invention.

In the figures, 1 denotes the input unit, 2 the analysis unit, 3 the example-driven conversion unit, 4 the generation unit, 5 the output unit, 31 the conversion control unit, 32 the conversion unit, 33 the example database, and 34 the thesaurus.

Continued from the front page
(72) Inventor: 幸山秀雄, c/o 株式会社エイ・ティ・アール自動翻訳電話研究所, 5 Sanpeidani, Koaza Inuidani Oaza, Seika-cho, Soraku-gun, Kyoto Prefecture
(56) References cited: 実開平1-138160 (JP, U)

Claims

(57) [Claims]

1. An example-driven machine translation method for performing machine translation using examples each consisting of a pair of a source sentence and a translation corresponding to the source sentence, characterized in that: the examples are stored in an example database; a dictionary database is prepared in which source-language words are organized hierarchically into a tree structure based on similarity of meaning; in response to input of a source sentence, the similarity between the input source sentence and each example in the example database is calculated as a real value by referring to the dictionary database; and the input source sentence is translated according to the translation of the example for which the calculated value is smallest.