JPH02294779A

JPH02294779A - Machine translation system

Info

Publication number: JPH02294779A
Application number: JP1114926A
Authority: JP
Inventors: Hiroyuki Nakajima; 弘之中島; Hiroyuki Kaji; 梶　博行
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-05-10
Filing date: 1989-05-10
Publication date: 1990-12-05

Abstract

PURPOSE: To acquire linguistic knowledge automatically from translated sentence pairs by determining the sentence-level translation relation between a first-language text and the second-language text obtained by translating it and post-editing the result. CONSTITUTION: The machine translation system comprises a Japanese dictionary storage device 04, an English dictionary storage device 05, a Japanese grammar frame storage device 06, an English grammar frame storage device 07, a Japanese-English/English-Japanese conversion dictionary storage device 08, a Japanese co-occurrence relation dictionary storage device 09, an English co-occurrence relation dictionary storage device 10, a Japanese text file storage device 11, an English text file storage device 12, and so on. Given a first-language text and a second-language text that are in a translation relation, the set of translations of each first-language sentence in the first-language text is found within the second-language text. Consequently, the sentence-level translation relation can be determined automatically from the input text and the text that was post-edited as a whole after translation, so that knowledge of various levels can be registered automatically in the translation dictionaries.

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to natural language processing systems such as machine translation systems, and in particular to a machine translation system having an automatic learning function.

[Prior Art]

Several methods have been proposed for automatically learning the knowledge used in machine translation systems.

For example, Japanese Patent Application No. 63-192751 describes a method that acquires and uses knowledge of word-to-word co-occurrence relations by extracting the unambiguous ones from the intermediate representations produced by analyzing input sentences. Japanese Patent Application Laid-Open No. 62-219165 describes a method that acquires knowledge for target-word selection, such as co-occurrence relations and idioms, from fixed-form sentences that are entered into the system. Japanese Patent Application No. 63-263934 describes a method that extracts word-to-word translation relations, idioms, word-to-word co-occurrence relations, and the like from bilingual example sentences. Furthermore, the method described in "Feedback of Post-Editing Information in a Japanese-English Translation System" (37th National Convention of the Information Processing Society of Japan, pp. 976-977) learns conversion rules for Japanese-English intermediate representations and English generation rules from the post-edited English output of a Japanese-English translation system.

[Problems to Be Solved by the Invention]

However, with the method of Japanese Patent Application No. 63-192751, the co-occurrence knowledge that can be acquired concerns only the source language. Japanese Patent Application Laid-Open No. 62-219165 learns co-occurrence knowledge for target-word selection and idioms from entered fixed-form sentences, but a large set of fixed-form sentences must be prepared in advance, so learning co-occurrence relations this way is practically impossible. Japanese Patent Application No. 63-263934 and the above paper, "Feedback of Post-Editing Information in a Japanese-English Translation System", acquire linguistic knowledge from a first-language sentence and a second-language sentence whose translation relation has already been determined, and they cannot be used when second-language sentences have been split or merged during post-editing. Because post-editing is normally carried out on the text as a whole, second-language sentences are split and merged frequently.

The object of the present invention is, in the learning function of a machine translation system, to determine the sentence-level translation relation between the post-edited second-language text and the first-language text that was input, and thereby to make automatic acquisition of linguistic knowledge from translated sentence pairs possible.

[Means for Solving the Problems]

The above object is achieved by providing a step that, given a first-language text and a second-language text in a translation relation, finds within the second-language text the set of translations of each first-language sentence contained in the first-language text. This step consists of: a step of counting the number M of first-language sentences in the first-language text and the number N of second-language sentences in the second-language text; a step of finding provisional translations T(1) to T(M) of the first-language sentences S(1) to S(M); a step of finding, for m = 1 to M, the translation set tr(m) of S(m) from the translation relations between the words contained in S(m) and the words contained in the second-language sentences around T(m); and a step of adding, for n = 1 to N, each T(n) that is not contained in any of tr(1) to tr(M) to one of tr(1) to tr(M).

[Operation]

The number M of sentences in the first-language text and the number N of sentences in the second-language text, the two texts being in a translation relation, are determined. Let the set of first-language sentences be {S(1), ..., S(M)} and the set of second-language sentences be {T(1), ..., T(N)}.

Next, for m = 1 to M, the provisional translation T(F(m)) of S(m) is found using the function F(m) defined by F(m) = [m × N / M], where [ ] denotes rounding up to the next integer. Then, for m = 1 to M, the second-language sentences containing translations of the words that make up S(m) are sought within the set {T(max(F(m) - r, 1)), ..., T(min(F(m) + r, N))}, and the set of second-language sentences found is taken as tr(m). Here r is an integer satisfying 0 ≤ r ≤ N. If tr(m) = ∅, then tr(m) is set to {T(F(m))}.
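The provisional alignment and the windowed search just described can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function names, the representation of each sentence as a list of already normalized words, and the `translations` mapping (a stand-in for the conversion dictionary) are all invented for the example.

```python
def provisional_index(m: int, M: int, N: int) -> int:
    """F(m) = m * N / M rounded up, computed with integers (m is 1-based)."""
    return (m * N + M - 1) // M


def translation_set(m, source, target, translations, r):
    """Return tr(m) as a set of 1-based indices into `target`.

    source, target: lists of sentences, each a list of words (assumed layout).
    translations: dict mapping a source word to a set of target words
                  (hypothetical stand-in for the conversion dictionary).
    r: half-width of the search window around F(m).
    """
    M, N = len(source), len(target)
    f = provisional_index(m, M, N)
    lo, hi = max(f - r, 1), min(f + r, N)  # clamp the window to 1..N

    # Collect every known translation of a word in S(m).
    wanted = set()
    for word in source[m - 1]:
        wanted |= translations.get(word, set())

    # Keep the target sentences in the window that share a word with `wanted`.
    tr = {n for n in range(lo, hi + 1) if wanted & set(target[n - 1])}
    return tr or {f}  # empty result: fall back to the provisional translation
```

The integer form of the ceiling avoids floating-point rounding. Matching on exact word forms is a simplification; the description relies on the word segmentation produced by sentence analysis.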

Next, for n = 1 to N, if there is a second-language sentence T(n) that is not contained in any of tr(1) to tr(M), T(n) is added to both tr(m1) and tr(m2) by setting tr(m1) = tr(m1) ∪ {T(n)} and tr(m2) = tr(m2) ∪ {T(n)}. Here tr(m1) is the set containing T(n1), where n1 is the largest index of a sentence before T(n) for which some tr(m) containing T(n1) exists. Likewise, tr(m2) is the set containing T(n2), where n2 is the smallest index of a sentence after T(n) for which some tr(m) containing T(n2) exists. If there are several such m1 or m2, the expansion of tr(m1) and tr(m2) described above is carried out for all of them.
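The expansion step can be sketched as follows. The function name and the dictionary-of-sets representation are assumptions made for illustration. The text does not say whether sentences added during expansion themselves count as covered for later sentences; this sketch fixes the covered set before the loop, which is one possible reading.

```python
def expand_uncovered(tr: dict, N: int) -> None:
    """Add each uncovered target index n to its neighbouring sets, in place.

    tr maps a 1-based source index m to tr(m), a set of 1-based target indices.
    """
    # Assumption: coverage is evaluated once, before any expansion.
    covered = set().union(*tr.values())
    for n in range(1, N + 1):
        if n in covered:
            continue
        before = [k for k in covered if k < n]
        after = [k for k in covered if k > n]
        anchors = []
        if before:
            anchors.append(max(before))  # n1: nearest covered sentence before n
        if after:
            anchors.append(min(after))   # n2: nearest covered sentence after n
        for anchor in anchors:
            for members in tr.values():
                if anchor in members:
                    members.add(n)       # every m1 and m2 is expanded


example = {1: {1}, 2: {3}}
expand_uncovered(example, 4)
```

In the example, target sentence 2 lies between covered sentences 1 and 3 and is added to both sets, while sentence 4 has no covered successor and is added only to the set containing sentence 3.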

In this way, the translation sets tr(1) to tr(M) of S(1) to S(M) are determined.

[Embodiment]

A bidirectional Japanese-English/English-Japanese machine translation system that is one embodiment of the present invention is described below. Fig. 2 shows the hardware configuration of the embodiment, which consists of a processing device 01, an input device 02, an output device 03, a Japanese dictionary storage device 04, an English dictionary storage device 05, a Japanese case frame storage device 06, an English case frame storage device 07, a Japanese-English/English-Japanese conversion dictionary storage device 08, a Japanese co-occurrence relation dictionary storage device 09, an English co-occurrence relation dictionary storage device 10, a Japanese text file storage device 11, and an English text file storage device 12.

The Japanese dictionary in the Japanese dictionary storage device consists of records as shown in Fig. 3. Each record consists of a Japanese headword 041, a part of speech 042, a semantic code 043, and a Japanese case frame code 044, and can be retrieved using the Japanese headword 041 as the key.

The English dictionary in the English dictionary storage device consists of records as shown in Fig. 4. Each record consists of an English headword 051, a part of speech 052, a semantic code 053, and an English case frame code 054, and can be retrieved using the English headword 051 as the key.
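As a rough picture of such a dictionary, the sketch below models a record and a headword index in Python. The class and function names, the semantic code "ACT", and the sample entries are hypothetical; only the four fields and the headword key come from the description. Storing one record per case frame for an ambiguous verb is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryRecord:
    headword: str         # retrieval key (041 / 051)
    part_of_speech: str   # 042 / 052
    semantic_code: str    # 043 / 053
    case_frame_code: str  # 044 / 054; empty for words without a frame


def build_index(records):
    """Index records by headword; entries sharing a headword share one key."""
    index = {}
    for record in records:
        index.setdefault(record.headword, []).append(record)
    return index


# Hypothetical entries, invented for illustration only.
ENGLISH = build_index([
    DictionaryRecord("make", "verb", "ACT", "E1"),
    DictionaryRecord("make", "verb", "ACT", "E2"),
    DictionaryRecord("tool", "noun", "INST", ""),
])
```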

The Japanese case frames in the Japanese case frame storage device consist of records as shown in Fig. 5.

Each record consists of a Japanese case frame code name 061, a deep case 062, a surface case 063, and a semantic code 064 of the case element. Deep cases are represented by codes: A, O, and I denote the agent case, the object case, and the instrumental case, respectively. The surface case gives the Japanese particle that corresponds to the deep case.
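A case frame of this shape might be modelled as below. This is a hedged sketch: the frame name "J1" and its particle assignments are hypothetical, chosen only to show how a particle leads to a deep case. The semantic codes HUM, OBJ, and INST are the ones the description goes on to define.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseSlot:
    deep_case: str      # "A" (agent), "O" (object) or "I" (instrument)
    surface_case: str   # Japanese particle that realises the deep case
    semantic_code: str  # feature required of the filler, e.g. "HUM"


# Hypothetical frame; the code name and particles are illustrative only.
CASE_FRAMES = {
    "J1": [
        CaseSlot("A", "が", "HUM"),
        CaseSlot("O", "を", "OBJ"),
        CaseSlot("I", "で", "INST"),
    ],
}


def deep_case_for(frame_code: str, particle: str) -> Optional[str]:
    """Return the deep case that `particle` marks in the given frame, if any."""
    for slot in CASE_FRAMES.get(frame_code, []):
        if slot.surface_case == particle:
            return slot.deep_case
    return None
```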

The semantic code of a case element is a code representing the semantic feature that the case element should have; HUM, OBJ, and INST denote a human, an object that is the target of an action, and an instrument, respectively. Records can be retrieved using the Japanese case frame code name 061 as the key.

The English case frames in the English case frame storage device consist of records as shown in Fig. 6. Each record consists of an English case frame code name 071, a deep case 072, a surface case 073, and a semantic code 074 of the case element. Deep cases use the same codes as in the Japanese case frames. The surface case gives the English syntactic role or preposition that corresponds to the deep case; S denotes the subject and DO the direct object. The semantic codes of case elements are the same as those used in the Japanese case frames. Records can be retrieved using the English case frame code name 071 as the key.

The Japanese-English/English-Japanese conversion dictionary in the conversion dictionary storage device consists of records as shown in Fig. 7. Each record consists of a Japanese conceptual structure 081 and an English conceptual structure 082, and the two conceptual structures in the same record are in a translation relation. A conceptual structure may be a single word, or it may consist of a set of two words and the semantic relation between them (expressed by a deep case code). For simplicity, Fig. 7 registers only word-to-word translation relations, such as 「彼」 and "he", or 「工具」 and "tool". Records can be retrieved using either the Japanese conceptual structure 081 or the English conceptual structure 082 as the key.
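Retrieval by either side of a record suggests a two-way index. The sketch below is one way to picture it in Python; the class and method names are assumptions, and only single-word conceptual structures are handled, as in the simplified Fig. 7.

```python
from collections import defaultdict


class ConversionDictionary:
    """Two-way store of (Japanese, English) conceptual-structure pairs."""

    def __init__(self):
        self._ja_to_en = defaultdict(set)
        self._en_to_ja = defaultdict(set)

    def register(self, japanese: str, english: str) -> None:
        # One record is reachable from both keys.
        self._ja_to_en[japanese].add(english)
        self._en_to_ja[english].add(japanese)

    def english_for(self, japanese: str) -> set:
        return set(self._ja_to_en.get(japanese, ()))

    def japanese_for(self, english: str) -> set:
        return set(self._en_to_ja.get(english, ()))


conversion = ConversionDictionary()
conversion.register("彼", "he")
conversion.register("工具", "tool")
```

Using `.get` for lookups keeps a failed search from creating empty entries in the underlying `defaultdict`.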

The Japanese co-occurrence relation dictionary in the Japanese co-occurrence relation dictionary storage device consists of records as shown in Fig. 8. Each record consists of a Japanese verb 091, a Japanese noun 092, and a deep case 093, and can be retrieved using the concatenation of the Japanese verb 091 and the Japanese noun 092 as the key.

The English co-occurrence relation dictionary in the English co-occurrence relation dictionary storage device consists of records as shown in Fig. 9. Each record consists of an English verb 101, an English noun 102, and a deep case 103, and can be retrieved using the concatenation of the English verb 101 and the English noun 102 as the key.

The Japanese and English co-occurrence relation dictionaries register verb-specific co-occurrence relations that cannot be expressed by case frames, as ternary relations of verb, noun, and deep case. In analysis and generation they are used in preference to the case frames.

The Japanese text file in the Japanese text file storage device consists of records as shown in Fig. 10. Each record consists of a Japanese sentence number 111, a Japanese sentence 112, and the number 113 of the corresponding English sentence. The English text file in the English text file storage device consists of records as shown in Fig. 11. Each record consists of an English sentence number 121, an English sentence 122, and the number 123 of the corresponding Japanese sentence. In both storage devices, the sentences that make up a text are numbered in the order in which they appear in the text and are stored one sentence per record.

Next, the processing executed by the processing device 01 is described with reference to Fig. 1.

First, a process selection parameter is input from the input device 02: "translation" when the system is used for Japanese-English machine translation, or "maintenance" when it is used for maintaining the translation dictionaries (0101). Next, the Japanese text is read from the Japanese text file; that is, all Japanese sentence records stored in the Japanese text file are read (0102). Let the set of Japanese sentences read in step 0102 be {SN(1), ..., SN(M)}. The following steps 0103 to 0106 are repeated for m = 1 to M.

The Japanese sentence SN(m) is parsed with reference to the Japanese dictionary, the Japanese case frames, and the Japanese co-occurrence relation dictionary. The analysis result is expressed as a tree-structured graph in which nodes represent content words and each arc connecting a pair of semantically related words represents their semantic relation (a deep case code). If several analyses are possible, all of them are obtained. The set of analysis results for SN(m) is denoted {N(m, 1), ..., N(m, I)} (0103). The analysis results are stored in working memory in the processing device (0104). The process selection parameter input in step 0101 is checked: if it is "translation", processing proceeds to step 0106; if it is "maintenance", processing returns to the start of the loop over steps 0103 to 0106 (0105). With reference to the Japanese-English/English-Japanese conversion dictionary, the English co-occurrence relation dictionary, and the English case frames, an English sentence is generated for each of the Japanese analysis results N(m, 1), ..., N(m, I) (0106).

The process selection parameter input in step 0101 is checked again: if it is "translation", processing proceeds to step 0108; if it is "maintenance", processing proceeds to step 0111 (0107). The English translated text produced in step 0106 is output from the output device (0108). Post-editing of the English translated text is accepted (0109). The post-edited English text is written to the English text file (0110). The English text is then read from the English text file. In the translation case, this is the translated text post-edited in step 0109, which is a translation of the Japanese text read in step 0102. In the dictionary maintenance case as well, it is a translation, stored in the English text file beforehand, of the Japanese text read in step 0102 (0111).
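The analysis graph and the priority of the co-occurrence relation dictionary over the case frames can be pictured with the following sketch. It is illustrative only: `Arc`, `CooccurrenceDictionary`, and `label_arc` are hypothetical names, the case proposed by a case frame is passed in as a plain argument rather than derived, and storing a single deep case per verb-noun pair is an assumption.

```python
from typing import NamedTuple, Optional


class Arc(NamedTuple):
    head: str       # governing content word (for example a verb)
    dependent: str  # dependent content word
    relation: str   # deep case code such as "A", "O" or "I"


class CooccurrenceDictionary:
    """Verb-specific (verb, noun, deep case) relations, keyed by verb and noun."""

    def __init__(self):
        self._cases = {}

    def register(self, verb: str, noun: str, deep_case: str) -> None:
        self._cases[(verb, noun)] = deep_case

    def lookup(self, verb: str, noun: str) -> Optional[str]:
        return self._cases.get((verb, noun))


def label_arc(verb, noun, frame_case, cooccurrence):
    """Build an arc, preferring a registered co-occurrence over the frame's case."""
    return Arc(verb, noun, cooccurrence.lookup(verb, noun) or frame_case)


def content_words(analysis):
    """Collect the nodes of one analysis result, given as a set of arcs."""
    words = set()
    for arc in analysis:
        words.update((arc.head, arc.dependent))
    return words
```

An ambiguous sentence would simply yield several such arc sets, one per candidate analysis.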
Let the set of English sentences read in step 0111 be {SE(1), ..., SE(N)}. The following steps 0112 and 0113 are repeated for n = 1 to N. The English sentence SE(n) is parsed with reference to the English dictionary, the English case frames, and the English co-occurrence relation dictionary. The analysis result is expressed as a tree-structured graph in which nodes represent content words and each arc connecting a pair of semantically related words represents their semantic relation (a deep case code). If several analyses are possible, all of them are obtained. The set of analysis results for SE(n) is denoted {E(n, 1), ..., E(n, J)} (0112). The analysis results are stored in working memory in the processing device (0113).

Next, the sentence-level translation relation between the Japanese text and the English text is determined by the translation-pair determination routine described later. As explained below, the relation is expressed as tr(1) to tr(M), the sets of English translations of the Japanese sentences SN(1) to SN(M) (0114).

Using the translation relation obtained in step 0114, the correspondence between the Japanese analysis results stored in step 0104 and the English analysis results stored in step 0113 is determined, and knowledge such as word-to-word translation relations, word-to-word co-occurrence relations, and idioms is extracted by the method described in Japanese Patent Application No. 63-263934. That is, for m = 1 to M, the analysis results N(m, 1) to N(m, I) of SN(m) obtained in step 0103 are compared with the analysis results E(n, 1) to E(n, J) of each translation SE(n) of SN(m) contained in tr(m), and knowledge is extracted by comparing the pair N(m, i) and E(n, j) whose patterns match best. When tr(m) contains several English sentences, or when one English sentence is contained in several tr(m), that is, when the translation relation is not one-to-one, the method of Japanese Patent Application No. 63-263934 needs some modification. As shown later in a concrete example, this is easily done by using the information carried by connectives (or adverbs) (0115).

Based on the knowledge extracted in step 0115, the Japanese-English/English-Japanese conversion dictionary, the Japanese co-occurrence relation dictionary, and the English co-occurrence relation dictionary are updated (0116). The sentence-level translation relation obtained in step 0114 is written to the Japanese text file and the English text file (0117). This completes the flow of processing of the present invention.

Next, the translation-pair determination routine of step 0114 is described in detail with reference to Fig. 12.

First, for m = 1 to M, the candidate translation SE(F(m)) of the Japanese sentence SN(m) is found using the function F(m) defined by F(m) = [m × N / M], where [ ] denotes rounding up to the next integer. This rests on the observation that, as shown in Fig. 13, when the Cartesian product of the set of Japanese sentence numbers {1, ..., M} and the set of English sentence numbers {1, ..., N} is represented as an M × N matrix, the translation relation is likely to be a subset distributed roughly along the diagonal of the matrix (01141).

Next, for m = 1 to M, English sentences containing translations of the words that make up SN(m) are sought within the set {SE(max(F(m) - r, 1)), SE(max(F(m) - r, 1) + 1), ..., SE(min(F(m) + r, N))}, and the set of English sentences found is taken as tr(m). In other words, the translations of SN(m) are sought among the r sentences on either side of the candidate found in step 01141. Here r is an integer satisfying 0 ≤ r ≤ N; max and min are used to handle the cases where m is close to 1 or M. Word segmentation has already been performed in steps 0103 and 0112, and those results are used. Word translation relations are obtained by searching the Japanese-English/English-Japanese conversion dictionary. If tr(m) = ∅, then tr(m) is set to {SE(F(m))} (01142).

Next, for n = 1 to N, if there is an English sentence SE(n) that is not contained in any of tr(1) to tr(M), SE(n) is added to both tr(m1) and tr(m2) by setting tr(m1) = tr(m1) ∪ {SE(n)} and tr(m2) = tr(m2) ∪ {SE(n)}. Here tr(m1) is the set containing SE(n1), where n1 is the largest index of an English sentence before SE(n) for which some tr(m) containing SE(n1) exists. Likewise, tr(m2) is the set containing SE(n2), where n2 is the smallest index of an English sentence after SE(n) for which some tr(m) containing SE(n2) exists. If there are several such m1 or m2, the expansion of tr(m1) and tr(m2) is carried out for all of them (01143).

The sets tr(1) to tr(M) obtained in this way are taken as the translation sets of SN(1) to SN(M), respectively.

Next, the process of automatic-learning machine translation is explained in detail with an example, using the Japanese dictionary of Fig. 3, the English dictionary of Fig. 4, the Japanese case frames of Fig. 5, the English case frames of Fig. 6, the Japanese-English/English-Japanese conversion dictionary of Fig. 7, the Japanese co-occurrence relation dictionary of Fig. 8, the English co-occurrence relation dictionary of Fig. 9, the Japanese text file of Fig. 10, and the English text file of Fig. 11.
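Before turning to that example, step 0117 (writing the relation back to both text files) can be pictured as inverting the mapping tr. The sketch below is an illustration under assumptions: the function name and the use of sorted lists for the cross-reference fields are invented, and the record layout itself is not modelled.

```python
def link_records(tr: dict, N: int):
    """Return (ja_links, en_links) derived from tr(1) to tr(M).

    ja_links[m] lists the English sentence numbers that translate SN(m).
    en_links[n] lists the Japanese sentence numbers that SE(n) translates.
    """
    ja_links = {m: sorted(ns) for m, ns in tr.items()}
    en_links = {n: [] for n in range(1, N + 1)}
    for m in sorted(tr):           # ascending m keeps each list sorted
        for n in tr[m]:
            en_links[n].append(m)
    return ja_links, en_links


ja_links, en_links = link_records({1: {1, 2}, 2: {2, 3}}, 3)
```

Because one English sentence may belong to several tr(m), each cross-reference field has to be able to hold more than one sentence number in this reading.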
First, consider the case where "maintenance" is input as the process selection parameter. The Japanese text storage device holds only one sentence, "After I made a printer with that tool, he made a keyboard with that tool." This sentence is input as SN(1) and analyzed using the Japanese dictionary, the Japanese case frames, and the Japanese co-occurrence relation dictionary, giving the analysis result shown in Figure 14. The deep case code M represents a modification relation, and L represents a connective relation between predicates. The English text storage device holds, as the bilingual text for {SN(1)}, the two sentences ("... with the tool.", "Then, he made the keyboard with the tool."). These two sentences are input as SE(1) and SE(2), and the English text is analyzed to obtain the analysis results shown in Figures 15 and 16. Because the verb "make" has two case frames, E1 and E2, two analysis results are obtained for each sentence: Figures 15(a) and (b) for SE(1), and Figures 16(a) and (b) for SE(2).

Next, processing proceeds to the bilingual sentence determination routine. The number of Japanese sentences is M = 1 and the number of English sentences is N = 2, so F(1) = [1×2/1] = 2. The bilingual sentence candidate for SN(1) is therefore SE(2). Next, the bilingual sentence set of SN(1) is determined from the English sentences before and after SE(2), based on the bilingual relations between words. With r = 1, the search range is (SE(1), SE(2)). From the bilingual relations registered in the Japanese-English/English-Japanese conversion dictionary for the words meaning "I", "he", "make", "tool", and so on, tr(1) = {SE(1), SE(2)} is obtained. Since every English sentence is now included in tr(1), the bilingual sentence determination routine ends.

Next comes the knowledge extraction process. As noted in the explanation of step 0115, because the bilingual sentence relationship is 1:2, the graph representing the analysis result of SN(1) shown in Figure 14 must be pattern-matched against the graphs representing the analysis results of SE(1) and SE(2) shown in Figures 15 and 16, respectively. Using the bilingual relation registered in the Japanese-English/English-Japanese conversion dictionary for the connective "then", the graphs in Figures 15 and 16 can be combined into the graphs in Figure 17, so the graph in Figure 14 is pattern-matched against the graphs in Figures 17(a), (b), (c), and (d). The pattern matching uses the method described in the above-mentioned Japanese Patent Application No. 63-263934. That is, by comparing the conceptual relations in the graph of Figure 17(a), which best matches the graph of Figure 14, with those in the graph of Figure 14, the following are extracted and registered in the dictionaries: the bilingual word relation (keyboard, keyboard); the Japanese co-occurrence relations (make, keyboard, O), (make, he, A), (make, tool, I), and (make, printer, O); and the English co-occurrence relations (make, keyboard, O), (make, I, A), (make, tool, I), and (make, printer, O). For simplicity, only co-occurrence relations between predicates and case elements are extracted here. Finally, the sentence-by-sentence bilingual relations obtained in the bilingual sentence determination routine are registered in the Japanese text storage device and the English text storage device, as shown in Figures 18 and 19, respectively, and the process ends.

The example above took the case where "maintenance" is input as the process selection parameter. When "translation" is selected, the bilingual text is stored in the English text storage device through the steps of English sentence generation and post-editing, so the processing from bilingual sentence determination onward can be performed in exactly the same way as for "maintenance".

An embodiment of the present invention has been described above. According to this embodiment, the machine translation system automatically determines sentence-by-sentence bilingual relations between the Japanese text input to the system and the translated English text that has undergone manual post-editing, and from these relations it can automatically and appropriately extract bilingual relations between Japanese and English words and co-occurrence relations, and register them in the dictionaries. A machine translation system can thus be given a dictionary self-propagation function.

[Extended Example 1] As an extension of the above embodiment, in the bilingual sentence determination process, the step that detects the bilingual sentences of SN(m) from the sentences before and after the bilingual sentence candidate SE(F(m)) can be preceded by a step that determines the value of r from the magnitude relationship between the number M of Japanese sentences and the number N of English sentences, which improves the accuracy and efficiency of bilingual sentence determination. That is, if M > N, few English sentences correspond to one Japanese sentence, so r is made small; if M < N, r is made larger. For example, a step may be provided that calculates r = [N/M], where [ ] denotes rounding up to the nearest integer.

[Extended Example 2] In the above embodiment, and in the method of varying r in [Extended Example 1], when M > N the expansion operation on tr(m) (step 01143) is used often, and the accuracy of bilingual sentence determination is thought to drop somewhat. Therefore, a step is added before the bilingual sentence determination process that determines the magnitude relationship between M and N and, if M > N, reverses the direction of bilingual sentence determination from Japanese-to-English to English-to-Japanese. That is, when M > N, the problem is resolved by determining, for each of the English sentences SE(1) to SE(N), its set of Japanese bilingual sentences. That the direction of bilingual sentence determination can be reversed is clear from the symmetry of the system.

[Extended Example 3] In the above embodiment, the step (step 01142) that determines the bilingual sentences of the Japanese sentence SN(m) from the sentences around the bilingual sentence candidate SE(F(m)), on the basis of bilingual relations between words, looks up bilingual relations for all content words. Processing can instead be sped up by restricting the parts of speech considered (for example, to nouns) and detecting bilingual relations only for those words.

[Extended Example 4] In the above embodiment, paragraph management information may be added to the records of the Japanese text file and the English text file, as shown in Figures 20 and 21, and the bilingual sentence determination process may be performed in units of paragraphs rather than over the whole text, which can improve the determination accuracy.

[Extended Example 5] In the English sentence generation process and the knowledge acquisition process of the above embodiment, a step may be provided that selects an alphabetic character string enclosed in parentheses after a word in the Japanese sentence as the translation of that word, together with a step that registers the bilingual relation in the dictionary if it is not yet registered in the Japanese-English/English-Japanese conversion dictionary. This improves both translation-word selection and knowledge acquisition. For example, when the Japanese sentence "I created a program (program)." is input, "program" is selected as the translation of the Japanese word for "program", and if the bilingual relation (program, program) is not registered in the Japanese-English/English-Japanese conversion dictionary, it is registered.

[Extended Example 6] In step 01142 of the bilingual sentence determination routine of the above embodiment, the process of determining the bilingual sentence set of the Japanese sentence SN(m) from the bilingual relations of words may also be performed as follows. A threshold θ ≥ 0 is set. Let Y(n) be the number of content words in the English sentence SE(n), and let T(m, n) be the number of translations of content words of SN(m) that are included in SE(n). If the ratio R(m, n) = T(m, n) / Y(n) satisfies R(m, n) > θ, SE(n) is included in tr(m). The above embodiment corresponds to the case θ = 0 in this extended example. Since the threshold θ improves accuracy, step 01141 of the above embodiment can also be omitted in this extended example.
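The candidate selection of the embodiment and the threshold test of Extended Example 6 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the data layout (each sentence as a list of content words, the conversion dictionary as a set of word pairs), the romanized Japanese tokens, and the reading of [ ] in F(m) as rounding up are all assumptions made for illustration, and the content words given for SE(1) are likewise illustrative.

```python
import math

def candidate_index(m, M, N):
    """F(m) = [m*N/M] as a 1-based sentence number.
    Assumption: [ ] rounds up; the result is clamped to 1..N."""
    return min(N, max(1, math.ceil(m * N / M)))

def window_radius(M, N):
    """r = [N/M] with [ ] rounding up, as suggested in Extended Example 1."""
    return math.ceil(N / M)

def search_range(m, M, N, r):
    """English sentence numbers within r sentences of the candidate SE(F(m))."""
    f = candidate_index(m, M, N)
    return range(max(1, f - r), min(N, f + r) + 1)

def bilingual_sentence_set(sn_words, se_sentences, m, M, r, pairs, theta=0.0):
    """Return the numbers n for which R(m, n) = T(m, n) / Y(n) > theta.

    sn_words:     content words of the Japanese sentence SN(m)
    se_sentences: list of content-word lists, one per English sentence
    pairs:        (japanese_word, english_word) conversion-dictionary entries
    """
    N = len(se_sentences)
    # English translations of the content words of SN(m)
    translations = {e for (j, e) in pairs if j in sn_words}
    result = []
    for n in search_range(m, M, N, r):
        se_words = se_sentences[n - 1]
        y = len(se_words)                        # Y(n)
        if y == 0:
            continue
        t = len(translations & set(se_words))    # T(m, n)
        if t / y > theta:
            result.append(n)
    return result

# Hypothetical data loosely following the worked example (M = 1, N = 2).
PAIRS = {("watashi", "I"), ("kare", "he"), ("tsukuru", "make"), ("dougu", "tool")}
SN1 = ["watashi", "dougu", "purinta", "tsukuru", "kare", "kiibodo"]
SE = [["I", "make", "printer", "tool"], ["he", "make", "keyboard", "tool"]]

print(bilingual_sentence_set(SN1, SE, m=1, M=1, r=1, pairs=PAIRS))  # prints [1, 2]
```

With θ = 0 and r = 1, these assumed word lists reproduce tr(1) = {SE(1), SE(2)} from the worked example. Raising θ excludes English sentences that share too small a fraction of translated content words with SN(m).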

That is, the bilingual sentence search range for SN(m) in step 01142 may be the entire English text, (SE(1), ..., SE(N)).

[Extended Example 7] In the step of the above embodiment that extracts knowledge from bilingual sentences, it is also possible to improve the accuracy of knowledge extraction by selecting only the bilingual sentences whose relationship is one-to-one and extracting knowledge only from those.

[Modification 1] The bilingual sentence determination method of the above embodiment is based on bilingual relations between words. However, sentence sets that are in a bilingual relationship can be expected to contain roughly the same number of content words, so bilingual sentences can also be determined by finding, from the first sentence of the bilingual texts onward, sentence sets whose content-word counts match, as described below.

First, for m = 1 to M, let X(m) be the number of content words in the Japanese sentence SN(m). For n = 1 to N, let Y(n) be the number of content words in the English sentence SE(n). The bilingual sentence determination process is then performed as follows, in accordance with Figure 22.

Set SRT(1) = 1. SRT(m) is the number of the first English sentence in the bilingual sentence set tr(m) of SN(m) (0114A). Set the variable ENDN = 0. ENDN is a flag indicating the end of the loop of steps 0114C to 0114J below (0114B). For m = 1 to M (until m = M or ENDN = 1), repeat steps 0114C to 0114L below. Set the variable SUM = 0. SUM is a variable that counts the content words of the English sentences (0114C). Set n = SRT(m) (0114D). While X(m) ≥ SUM and n ≤ N, repeat the following two steps, 0114E and 0114F.

Add Y(n) to SUM (0114E). Add 1 to n (0114F). After leaving steps 0114E and 0114F, set the variable END(m) = n - 1. END(m) is the number of the last English sentence included in tr(m) (0114G). Set tr(m) = (SE(SRT(m)), ..., SE(END(m))) (0114H). Determine whether n < N; if n < N, proceed to step 0114J, and otherwise proceed to step 0114K (0114I). Set SRT(m + 1) = n (0114J). Set ENDN = 1 (0114K). Set tr(m + 1) = ... = tr(M) = tr(m) (0114L). When the above loop of steps 0114C to 0114J ends, the process ends.

[Modification 2] The bilingual sentence determination method of Modification 1 is based on the number of content words in each sentence, but a method based on the number of characters in each sentence, that is, one that finds sentence sets with matching character counts from the beginning of the bilingual texts, is also possible. It is carried out by redefining X(m) and Y(n) in Modification 1 as the number of characters in SN(m) and the number of characters in SE(n), respectively; the method of Modification 1 can then be used as is. This modification is thought to be slightly less accurate than Modification 1, but it is effective for determining bilingual sentences between relatively close languages such as English and French.

[Effects of the Invention] According to the present invention, in a bidirectional machine translation system that acquires linguistic knowledge from bilingual sentences, sentence-by-sentence bilingual relations can be determined automatically from the input text and the text that was post-edited as a whole after translation. Pattern matching of the sentence-level analysis results then makes it possible to extract bilingual relations between words, bilingual relations as idioms, co-occurrence relations, and semantic properties of words, and to register them in the dictionaries. Knowledge at various levels can thus be registered in the translation dictionaries automatically and appropriately, and the accuracy of the dictionaries improves automatically. The invention is particularly effective for batch translation of large volumes of text such as manuals.
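The counting procedure of Modification 1 (steps 0114A to 0114L), and its character-count variant in Modification 2, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation. The function names are hypothetical. The sketch also departs from the conditions as they appear in the text in two places: it accumulates English sentences while SUM < X(m) rather than while X(m) ≥ SUM, and it continues to the next Japanese sentence when n ≤ N rather than n < N. These choices are assumptions, made because with the conditions as rendered two texts with identical per-sentence counts would not align one-to-one.

```python
def align_by_counts(X, Y):
    """Align sentences by cumulative counts.

    X[m-1] is the count for Japanese sentence SN(m), Y[n-1] the count for
    English sentence SE(n). Returns a dict mapping m to the list of English
    sentence numbers in tr(m) (all numbers are 1-based).
    """
    M, N = len(X), len(Y)
    tr = {}
    srt = 1                                  # SRT(1) = 1            (0114A)
    for m in range(1, M + 1):
        total = 0                            # SUM = 0               (0114C)
        n = srt                              # n = SRT(m)            (0114D)
        # Assumption: strict comparison, see the note above.
        while total < X[m - 1] and n <= N:
            total += Y[n - 1]                # add Y(n) to SUM       (0114E)
            n += 1                           # add 1 to n            (0114F)
        end = n - 1                          # END(m) = n - 1        (0114G)
        tr[m] = list(range(srt, end + 1))    # tr(m)                 (0114H)
        if n <= N:                           # sentences remain      (0114I)
            srt = n                          # SRT(m + 1) = n        (0114J)
        else:
            # English text exhausted: ENDN = 1 (0114K), and the remaining
            # Japanese sentences share tr(m) (0114L).
            for rest in range(m + 1, M + 1):
                tr[rest] = list(tr[m])
            break
    return tr

def align_texts(ja_sentences, en_sentences, count=len):
    """Apply align_by_counts using `count` on each sentence.

    With sentences given as lists of content words, len() counts content
    words (Modification 1); with sentences given as strings, len() counts
    characters (Modification 2).
    """
    return align_by_counts([count(s) for s in ja_sentences],
                           [count(s) for s in en_sentences])

print(align_by_counts([8], [4, 4]))        # prints {1: [1, 2]}
print(align_by_counts([4, 4], [4, 4]))     # prints {1: [1], 2: [2]}
```

The first call mirrors the 1:2 situation of the worked example: a single Japanese sentence absorbs both English sentences because their combined count reaches X(1). Passing strings instead of word lists to `align_texts` switches from content-word counts to character counts without changing the procedure.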

[Brief explanation of drawings]

Figure 1 is a PAD outlining the processing of a Japanese-English/English-Japanese machine translation system according to an embodiment of the present invention. Figure 2 is a hardware configuration diagram of the Japanese-English/English-Japanese machine translation system of the embodiment. Figure 3 shows an example of the record contents of the Japanese dictionary. Figure 4 shows an example of the record contents of the English dictionary. Figure 5 shows an example of the record contents of the Japanese case frames. Figure 6 shows an example of the record contents of the English case frames. Figure 7 shows an example of the record contents of the Japanese-English/English-Japanese conversion dictionary. Figure 8 shows an example of the record contents of the Japanese co-occurrence relation dictionary. Figure 9 shows an example of the record contents of the English co-occurrence relation dictionary. Figures 10, 18, and 20 show examples of the record contents of the Japanese text file. Figures 11, 19, and 21 show examples of the record contents of the English text file. Figure 12 is a PAD showing details of the bilingual sentence determination process based on bilingual relations between words. Figure 13 is a conceptual diagram showing the Cartesian product of the set of Japanese sentence numbers and the set of English sentence numbers. Figures 14 to 17 show examples of tree-structure graphs representing the analysis results of Japanese and English sentences. Figure 22 is a PAD showing details of the bilingual sentence determination process based on the number of content words in each sentence.

01: input device, 02: processing device, 03: output device, 04: Japanese dictionary storage device, 05: English dictionary storage device, 06: Japanese case frame storage device, 07: English case frame storage device, 08: Japanese-English/English-Japanese conversion dictionary storage device, 09: Japanese co-occurrence relation dictionary storage device, 10: English co-occurrence relation dictionary storage device, 11: Japanese text file storage device, 12: English text file storage device.

Claims

[Scope of Claims]

1. A machine translation system characterized by comprising a step of determining sentence-by-sentence bilingual relations between a first language text and a second language text obtained by post-editing the translation result of the first language text.

2. The machine translation system according to claim 1, wherein the step of determining the sentence-by-sentence bilingual relations between the first language text and the second language text comprises: a substep of calculating, for each first language sentence SN in the first language text and each second language sentence SE in the second language text, the ratio of the number of translations of content words of SN that are included in SE to the number of content words included in SE; and a substep of including SE in the bilingual sentence set of SN if the ratio exceeds a certain threshold.

3. The machine translation system according to claim 1, wherein the step of determining the sentence-by-sentence bilingual relations between the first language text and the second language text comprises: a substep of determining the number M of first language sentences and the number N of second language sentences; a substep of determining, for the m-th sentence of the first language text, the second language sentence whose number is the integer given by m×N/M as the bilingual sentence candidate for the m-th first language sentence; a substep of determining, from among several sentences before and after the bilingual sentence candidate in the second language text, the set of sentences containing translations of words included in the m-th first language sentence, and taking this set as the bilingual sentence set of the m-th first language sentence; and a substep of determining whether the second language text contains a second language sentence that is not included in the bilingual sentence set of any first language sentence in the first language text, and including any such second language sentence in the bilingual sentence set of one of the first language sentences in the first language text.

4. The machine translation system according to claim 3, wherein, in the substep of determining the bilingual sentence set of a first language sentence from several sentences before and after its bilingual sentence candidate, the range over which the bilingual sentence set is sought is determined from the magnitude relationship between the number M of first language sentences and the number N of second language sentences.

5. The machine translation system according to claim 3, further comprising, before the step of determining the sentence-by-sentence bilingual relations between the first language text and the second language text, a step of determining the number M of first language sentences and the number N of second language sentences and, if M > N, reversing the direction of determination so that the bilingual sentence set of each second language sentence in the second language text is determined from among the first language sentences in the first language text.

6. The machine translation system according to claim 3, wherein, in the step of determining the bilingual sentence set of a first language sentence from among second language sentences that include translations of words included in the first language sentence, the parts of speech of the words considered are restricted.

7. The machine translation system according to claim 1, wherein the sentence-by-sentence bilingual relations are determined from paragraph-by-paragraph bilingual relations.

8. A dictionary maintenance method for a machine translation system, characterized by providing, in the machine translation system, a step of determining sentence-by-sentence bilingual relations between a first language text and a second language text obtained by manually post-editing the translation result of the first language text.

9. A dictionary maintenance method for a machine translation system, characterized by providing a step of determining sentence-by-sentence bilingual relations between a first language text and a second language text that are in a bilingual relationship.

10. A method for creating a bilingual text file, characterized by providing, in a machine translation system, a step of determining sentence-by-sentence bilingual relations between a first language text and a second language text obtained by manually post-editing the translation result of the first language text.

11. A method for creating a bilingual text file, characterized by providing a step of determining sentence-by-sentence bilingual relations between a first language text and a second language text that are in a bilingual relationship.

12. The machine translation system according to claim 1, comprising a step of, when selecting translation words during generation of the second language text, selecting a second language character string enclosed in parentheses after a word in the first language text, if present, as the translation of that word.

13. The machine translation system according to claim 12, comprising a step of, when a second language character string enclosed in parentheses is present after a word in the first language text, determining whether the character string is registered in the dictionary as a translation of the word, and registering the character string as a translation of the word if it is not registered.

14. The machine translation system according to claim 1, wherein the step of determining the sentence-by-sentence bilingual relations between the first language text and the second language text is a step of finding, from the beginning of the first language text and the second language text, pairs of a sentence set in the first language text and a sentence set in the second language text whose content-word counts match.

15. The machine translation system according to claim 1, wherein the step of determining the sentence-by-sentence bilingual relations between the first language text and the second language text is a step of finding, from the beginning of the first language text and the second language text, pairs of a sentence set in the first language text and a sentence set in the second language text whose character counts match.

16. The machine translation system according to claim 1, wherein knowledge is acquired only from sentences whose bilingual relationship is one-to-one.