JP2840258B2

JP2840258B2 - Method of creating bilingual dictionary and co-occurrence dictionary for machine translation system

Info

Publication number: JP2840258B2
Application number: JP63240971A
Authority: JP
Inventors: 博行梶; 弘之中島
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1988-09-28
Filing date: 1988-09-28
Publication date: 1998-12-24
Anticipated expiration: 2013-12-24
Also published as: JPH0290364A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial application field]

The present invention relates to a machine translation method and system having a self-propagating function for its bilingual dictionary and co-occurrence relation dictionary.

[Conventional technology]

A dictionary is an important component of a machine translation system. The dictionary contains the words of the first language (source language) and the second language (target language) together with their attribute information (part of speech, semantic code, case frame, and so on), the bilingual relations between words of the first language and words of the second language, and the co-occurrence relations of words within the first or second language. Dictionaries have conventionally been created by hand, which requires an enormous amount of labor, so an automatic creation or self-propagating function would be extremely valuable. Word co-occurrence relations are one kind of dictionary information that lends itself to automatic creation. For example, Japanese Patent Laid-Open No. 62-232076 describes a method that extracts word co-occurrence relations from sentence analysis results and accumulates them in a knowledge base. Extracting knowledge from sentences in this way is very useful because the capability of the machine translation system increases as the system is used. However, the method of Japanese Patent Laid-Open No. 62-232076 is limited to co-occurrence relations of words in the first language and cannot be applied to other dictionary information.

[Problems to be solved by the invention]

A first object of the present invention is to provide a machine translation method and system that acquire knowledge of bilingual relations between words.

A second object of the present invention is to provide a machine translation method and system that acquire knowledge of the co-occurrence relations of words in the second language without performing syntactic or semantic analysis of the second language.

[Means for solving the problem]

To achieve the first object, a first feature of the present invention is to use the bilingual dictionary to identify word-level correspondences between a sentence in the first language and its translation in the second language, and to register in the bilingual dictionary those identified bilingual relations that are not yet registered there.

To achieve the second object, a second feature of the present invention is, in addition to the processing described above, to perform syntactic and semantic analysis on the first-language sentence to extract the co-occurrence relations of the words in the sentence; to map those first-language co-occurrence relations to second-language co-occurrence relations by using the word correspondences obtained by the bilingual relation identification; and to register the resulting second-language co-occurrence relations in the second-language co-occurrence relation dictionary.

[Action]

Given a sentence in the first language and its translation in the second language, the bilingual relation identification process identifies word-level correspondences as follows. First, by referring to the first-language dictionary, it identifies that the first-language sentence consists of m words S(1), …, S(m). Similarly, by referring to the second-language dictionary, it identifies that the second-language sentence consists of n words T(1), …, T(n). (Because the present invention treats the bilingual relation between sentences as a set of bilingual relations between words, m = n is desirable, but in practice m = n does not always hold.) Once the words constituting the two sentences have been identified, the process moves on to identifying the correspondences between them. To do this, it first consults the bilingual dictionary. If the pair S(i), T(j) is contained in the bilingual dictionary, S(i) and T(j) are judged to correspond in the sentence pair being processed as well. Suppose that p correspondences (where p ≤ m and p ≤ n) have been identified in this way. If the sentence pair contains a correspondence that is not registered in the bilingual dictionary, then p < m and p < n. The process then tries to estimate correspondences between the remaining (m − p) first-language words and (n − p) second-language words. This estimation is not always possible. However, when p = m − 1 = n − 1, exactly one word remains in each language, so the two can be judged to correspond. Even when p = m − 1 = n − 1 does not hold, if m − p and n − p are small, the correspondences can often be identified by using information such as part of speech as a clue. Furthermore, when m > n, if the first-language words whose second-language counterparts have not been identified are consecutive in the sentence, treating them as a single compound word may make the correspondence estimable. The same applies when m < n. It follows that word-level correspondences can be identified even for a sentence pair that contains bilingual relations not registered in the bilingual dictionary.

The bilingual dictionary registration process registers in the bilingual dictionary those word-level correspondences identified by the bilingual relation identification process that are not yet registered there. This realizes a machine translation system that acquires bilingual relations between words from sentence pairs and accumulates them in the dictionary.

Furthermore, the first-language co-occurrence relation extraction process extracts co-occurrence relations of first-language words by selecting only the unambiguous ones from the set of word dependency relations obtained by syntactic and semantic analysis of the first-language sentence. The result is mapped to co-occurrence relations of second-language words by the co-occurrence relation mapping process and registered in the second-language co-occurrence relation dictionary by the second-language co-occurrence relation dictionary registration process. This realizes a machine translation system that acquires co-occurrence relations of second-language words from sentence pairs and accumulates them in the dictionary.

[Example]

An embodiment of the present invention is described below with reference to the drawings. In this embodiment, the first language is Japanese and the second language is English. As shown in FIG. 2, the hardware required for the machine translation system of the embodiment consists of a central processing unit 1, an input device 2, an output device 3, a dictionary storage device 4, and a text storage device 5.

The central processing unit 1 performs the processing of the present invention that acquires knowledge of bilingual relations and co-occurrence relations, as well as translation processing and the input, output, and updating of text. The input device 2 is used to enter and correct text and the output device 3 is used to display text; neither is directly related to the present invention.

The dictionary storage device 4 stores a Japanese dictionary 41, a Japanese co-occurrence relation dictionary 42, a Japanese-English bilingual dictionary 43, an English dictionary 44, and an English co-occurrence relation dictionary 45. This division into dictionaries is logical; the physical structure is not particularly limited and may, for example, store several dictionaries in integrated form. FIG. 3(a) illustrates records of the Japanese dictionary 41. A Japanese dictionary record contains a Japanese word 411 and, as its attribute information, a part of speech 412, a semantic code 413, and a case frame 414. The English dictionary 44 is not illustrated but has the same form as the Japanese dictionary. FIG. 3(b) illustrates records of the Japanese-English bilingual dictionary 43. A bilingual dictionary record is a pair consisting of a Japanese word 431 and an English word 432. FIG. 3(c) illustrates records of the English co-occurrence relation dictionary 45. An English co-occurrence relation dictionary record contains two co-occurring English words 451 and 452 and a code 453 indicating the relation between them. The Japanese co-occurrence relation dictionary 42 is not illustrated but has the same form as the English co-occurrence relation dictionary.
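The three record layouts described for FIG. 3 can be sketched as plain data classes. This is only an illustration under stated assumptions: the class names, field names, and the sample attribute values (part-of-speech label, semantic code, case frame) are hypothetical, since the text gives only the reference numerals 411 to 414, 431 to 432, and 451 to 453.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordRecord:
    """Monolingual dictionary record (Japanese dictionary 41 / English dictionary 44)."""
    word: str            # 411
    part_of_speech: str  # 412
    semantic_code: str   # 413 (value below is a made-up placeholder)
    case_frame: tuple    # 414 (empty tuple used as a placeholder)


@dataclass(frozen=True)
class BilingualRecord:
    """Japanese-English bilingual dictionary 43 record: one word pair."""
    japanese: str  # 431
    english: str   # 432


@dataclass(frozen=True)
class CooccurrenceRecord:
    """Co-occurrence relation dictionary (42 / 45) record."""
    word1: str     # 451
    word2: str     # 452
    relation: str  # 453 (relation code; the code set is not given in the text)


# The division into dictionaries is logical, so simple in-memory sets suffice here.
japanese_dictionary = {WordRecord("フアイル", "noun", "SEM-PLACEHOLDER", ())}
bilingual_dictionary = {BilingualRecord("フアイル", "file")}
english_cooccurrence_dictionary = set()

# Frozen dataclasses are hashable, so membership tests work directly.
is_registered = BilingualRecord("フアイル", "file") in bilingual_dictionary
```

Using frozen (hashable) records makes "is this pair already registered?" a set membership test, which is the check the registration steps rely on.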

The text storage device 5 stores a Japanese text file 51 and an English text file 52. In both the Japanese text and the English text, each sentence is assigned a sentence number, and sentences with the same sentence number are translations of each other. A sentence pair can therefore be retrieved by using the sentence number as a key.
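As a minimal sketch of retrieval by sentence number, the two text files can be modeled as mappings from sentence number to sentence. The variable and function names are hypothetical, and the in-memory dictionaries stand in for whatever file format the text storage device actually uses.

```python
# Hypothetical in-memory stand-ins for Japanese text file 51 and English text file 52.
japanese_text_file = {1: "文書フアイルを更新する。"}
english_text_file = {1: "update the document file."}


def get_sentence_pair(sentence_number):
    """Return the (Japanese, English) pair sharing a sentence number, or None."""
    if sentence_number in japanese_text_file and sentence_number in english_text_file:
        return japanese_text_file[sentence_number], english_text_file[sentence_number]
    return None


pair = get_sentence_pair(1)
```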

Next, with reference to FIG. 1, the process of identifying word-level correspondences between a Japanese sentence and the English sentence that is its translation, and the process of extracting co-occurrence relations of English words, are described.

The first step identifies the words that make up the Japanese sentence. First, the sentence is divided into words with reference to the Japanese dictionary (process 101). Because Japanese sentences contain no blanks marking word boundaries, somewhat complex processing is required; for example, the method described in Japanese Patent Application No. 59-162443 may be used. Next, the content words are selected from the words constituting the sentence (process 102). Suppose the sentence contains m content words, denoted S(1), …, S(m). This processing excludes function words such as particles and auxiliary verbs from correspondence identification. Function words play an important role in the processing of a machine translation system and are few in number, so their dictionary descriptions can be regarded as complete. In addition, correspondences between languages are not as simple for function words as they are for content words.

The second step identifies the words that make up the English sentence. The sentence is divided into words with reference to the English dictionary (process 103). Because word boundaries in English sentences are marked by blanks, this can be done by simple processing, apart from the need to handle inflected forms. Next, the content words are selected from the words constituting the sentence (process 104). Suppose the sentence contains n content words, denoted T(1), …, T(n).
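The second step (processes 103 and 104) can be sketched as follows. This is an illustration only: the tiny dictionary, the table of inflected forms, and the choice of which parts of speech count as content words are all assumptions, not details given in the text.

```python
# Hypothetical miniature English dictionary 44: base form -> part of speech.
ENGLISH_DICTIONARY = {
    "update": "verb",
    "the": "article",
    "document": "noun",
    "file": "noun",
}

# Hypothetical inflection table standing in for inflected-form processing.
INFLECTED_FORMS = {"updates": "update", "updated": "update", "files": "file"}

# Assumed set of content-word parts of speech; everything else is a function word.
CONTENT_PARTS_OF_SPEECH = {"noun", "verb", "adjective", "adverb"}


def identify_english_content_words(sentence):
    """Split on blanks (process 103), then keep content words (process 104)."""
    content_words = []
    for token in sentence.lower().strip(".?! ").split():
        token = token.strip(",;:")
        base_form = INFLECTED_FORMS.get(token, token)
        if ENGLISH_DICTIONARY.get(base_form) in CONTENT_PARTS_OF_SPEECH:
            content_words.append(base_form)
    return content_words


words = identify_english_content_words("update the document file.")
```

Words missing from the dictionary are silently dropped in this sketch; a fuller treatment would have to decide how to handle unknown words.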

The third step identifies correspondences between the Japanese words S(1), …, S(m) and the English words T(1), …, T(n) by referring to the bilingual dictionary. Let i be the index of a Japanese word, j the index of an English word, and σ the mapping from i to j. Let k be a register that holds the number of correspondences determined in the third step. The third step begins by setting k to its initial value 0 (process 105). Next, i is set to its initial value 1 (process 106), and j is set to its initial value 1 (process 107). Then, for a T(j) whose σ⁻¹(j) is still undetermined (process 108), it is checked whether the pair S(i), T(j) is contained in the bilingual dictionary (process 109). If the pair is contained in the bilingual dictionary, σ is defined so that i corresponds to j (process 110), and k is incremented by 1 (process 111). If the pair is not contained in the bilingual dictionary, j is incremented (process 113) until it reaches n (process 112), and the check of whether the pair S(i), T(j) is contained in the bilingual dictionary continues. Processes 107 to 113 are repeated while i is incremented (process 115) until it reaches m (process 114). This processing identifies those correspondences between the Japanese words S(1), …, S(m) and the English words T(1), …, T(n) that are contained in the bilingual dictionary.
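The loop of processes 105 to 115 can be sketched as below. The function name and the representation of the bilingual dictionary as a set of (Japanese, English) tuples are assumptions, and the indices are 0-based here whereas the description counts from 1. The register k is simply the number of entries in the returned mapping.

```python
def align_by_dictionary(source_words, target_words, bilingual_dictionary):
    """Return sigma, a dict mapping source index i to target index j."""
    sigma = {}
    used_targets = set()  # target indices j whose inverse image is already decided
    for i, source_word in enumerate(source_words):
        for j, target_word in enumerate(target_words):
            if j in used_targets:
                continue  # process 108: skip T(j) that is already matched
            if (source_word, target_word) in bilingual_dictionary:  # process 109
                sigma[i] = j          # process 110
                used_targets.add(j)
                break                 # move on to the next source word
    return sigma


source_words = ["文書", "フアイル", "更新する"]
target_words = ["update", "document", "file"]
bilingual_dictionary = {("文書", "document"), ("フアイル", "file")}

sigma = align_by_dictionary(source_words, target_words, bilingual_dictionary)
k = len(sigma)  # number of correspondences found via the dictionary
```

With this sample dictionary, 「更新する」 stays unmatched because its pair with "update" is not registered, which is the situation the fourth step is meant to resolve.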

The fourth step estimates the correspondences that could not be identified in the third step. In this embodiment, estimation is performed only in the simplest case, namely when the number m of Japanese words equals the number n of English words (process 116) and (m − 1) correspondences were identified in the third step (process 117). In this case there is exactly one i for which σ(i) is undetermined and exactly one j for which σ⁻¹(j) is undetermined, and these are located (process 118). If these are i₀ and j₀, σ(i₀) = j₀ is set (process 119) and the pair S(i₀), T(j₀) is registered in the bilingual dictionary (process 120).
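A minimal sketch of processes 116 to 120 follows, assuming σ is held as a one-to-one dict from source index to target index (0-based) and the bilingual dictionary as a set of tuples; the function name is hypothetical.

```python
def estimate_last_pair(m, n, sigma):
    """Return (i0, j0) if exactly one word is unmatched on each side, else None."""
    if m != n or len(sigma) != m - 1:   # processes 116 and 117
        return None
    matched_targets = set(sigma.values())
    i0 = next(i for i in range(m) if i not in sigma)            # process 118
    j0 = next(j for j in range(n) if j not in matched_targets)  # process 118
    return i0, j0


source_words = ["文書", "フアイル", "更新する"]
target_words = ["update", "document", "file"]
sigma = {0: 1, 1: 2}  # result assumed to come from the dictionary lookup step
bilingual_dictionary = {("文書", "document"), ("フアイル", "file")}

pair = estimate_last_pair(len(source_words), len(target_words), sigma)
if pair is not None:
    i0, j0 = pair
    sigma[i0] = j0                                                   # process 119
    bilingual_dictionary.add((source_words[i0], target_words[j0]))  # process 120
```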

The fifth step performs syntactic and semantic analysis on the Japanese sentence and extracts the co-occurrence relations of its words (process 121). That is, the dependency relations among the content words S(1), …, S(m) obtained in the first step are analyzed, and only the unambiguous ones are selected.

Suppose that l co-occurrence relations [S(i_p), S(i′_p), R_p] (p = 1, …, l) have been extracted from the Japanese sentence. Here S(i_p) and S(i′_p) are the co-occurring words and R_p is a code representing the relation between them.

The sixth step maps the co-occurrence relations of Japanese words obtained in the fifth step to co-occurrence relations of English words. Because the correspondence between Japanese and English words is represented by σ, the Japanese co-occurrence relation [S(i_p), S(i′_p), R_p] is mapped to the English co-occurrence relation [T(σ(i_p)), T(σ(i′_p)), R_p] (process 123), which is then registered in the English co-occurrence relation dictionary (process 124). This processing is repeated while the index p of the co-occurrence relation is incremented (process 126) from its initial value 1 (process 122) up to l (process 125).
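The mapping of processes 122 to 126 can be sketched as follows. The relation codes "MOD" and "OBJ" and the sample relations are made up for illustration, since the contents of FIG. 4(f) are not reproduced in the text. Skipping a relation when σ is undefined for one of its words is also an assumption; the description does not say what happens in that case.

```python
def map_cooccurrences(japanese_relations, sigma, target_words):
    """Map [S(i), S(i'), R] triples, given as (i, i', R), to (T(sigma(i)), T(sigma(i')), R)."""
    english_relations = []
    for i, i_prime, relation in japanese_relations:
        if i in sigma and i_prime in sigma:  # assumption: skip if either word is unaligned
            english_relations.append(
                (target_words[sigma[i]], target_words[sigma[i_prime]], relation)
            )
    return english_relations


# Indices refer to source words ["文書", "フアイル", "更新する"] (0-based).
target_words = ["update", "document", "file"]
sigma = {0: 1, 1: 2, 2: 0}
japanese_relations = [(0, 1, "MOD"), (1, 2, "OBJ")]  # hypothetical relation codes

english_relations = map_cooccurrences(japanese_relations, sigma, target_words)
english_cooccurrence_dictionary = set(english_relations)  # registration (process 124)
```

Note that no English parsing is involved: the English relations are obtained purely by relabeling the Japanese ones through σ, which is the point of the second object of the invention.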

The processing for acquiring bilingual relations and co-occurrence relations of words has been described above with reference to FIG. 1.

FIG. 4 shows an example in which bilingual relations and co-occurrence relations of words are acquired from a sentence pair. The sentence pair is:

・文書フアイルを更新する。

・update the document file.

This sentence pair is processed using the dictionaries shown in FIG. 3. The Japanese words obtained by the first step are shown in FIG. 4(a), and the English words obtained by the second step are shown in FIG. 4(b). The word correspondences obtained in the third step are shown in FIG. 4(c). Because the Japanese-English bilingual dictionary of FIG. 3(b) does not contain the pair 「更新する」 and "update", this correspondence is not identified in the third step. It is estimated in the fourth step, giving the word correspondences shown in FIG. 4(d). The pair 「更新する」 and "update" is registered in the Japanese-English bilingual dictionary, whose contents become those shown in FIG. 4(e). In the fifth step, the co-occurrence relations of Japanese words shown in FIG. 4(f) are obtained. In the sixth step these are mapped to the co-occurrence relations of English words shown in FIG. 4(g) and registered in the English co-occurrence relation dictionary, whose contents change from FIG. 3(c) to FIG. 4(h).

In this embodiment, the fourth step was shown only for the case in which the correspondence can be estimated most simply. The following shows that the ability to estimate correspondences can be improved with some additional devices.

For example, even when the number m of Japanese words equals the number n of English words, more than one correspondence may remain unidentified after the third step. Suppose that the Japanese-English bilingual dictionary of FIG. 3(b) does not contain the pair 「文書」 and "document". Then, for the example of FIG. 4, the result of the third step is as shown in FIG. 5(c). That is, two correspondences, 「更新する」 and "update" and 「文書」 and "document", are not identified. In such a case the correspondences can be estimated by using information such as the parts of speech of the words. 「文書」 and 「フアイル」 are both nouns, so 「文書フアイル」 can be regarded as a noun phrase. Likewise, "document" and "file" are both nouns, so "document file" can be regarded as a noun phrase. Because the correspondence between 「フアイル」 and "file" was identified in the third step, the correspondence between 「文書」 and "document" can be estimated. The result of the fourth step is then as shown in FIG. 5(d).

Next, FIG. 6 illustrates how to handle the case in which the number m of Japanese words differs from the number n of English words. The sentence pair here is:

・端末制御装置

・terminal controller

The results of the first and second steps would normally be as shown in FIG. 6(a) and (b): there are three Japanese words and two English words. In the third step, only the correspondence between 「端末」 and "terminal" is identified, giving the result of FIG. 6(c). The words 「制御」 and 「装置」, whose correspondences could not be identified, are adjacent; if they are regarded as a single compound word, the numbers of Japanese and English words become equal, so the Japanese words are regrouped as in FIG. 6(d). The result of the third step is then revised as shown in FIG. 6(e), and the fourth step yields the result of FIG. 6(f). That is, the correspondence between 「制御装置」 and "controller" can be estimated.
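The regrouping of FIG. 6(d) can be sketched as merging each run of consecutive unaligned words into one compound word. The function name and the `joiner` parameter are hypothetical; an empty joiner suits Japanese, while English runs would presumably be joined with a blank.

```python
def merge_unaligned_runs(words, aligned_indices, joiner=""):
    """Merge each run of consecutive unaligned words into a single compound word."""
    merged = []
    run = []
    for index, word in enumerate(words):
        if index in aligned_indices:
            if run:
                merged.append(joiner.join(run))
                run = []
            merged.append(word)
        else:
            run.append(word)
    if run:
        merged.append(joiner.join(run))
    return merged


japanese_words = ["端末", "制御", "装置"]
aligned = {0}  # only 「端末」 was matched to "terminal" by dictionary lookup
regrouped = merge_unaligned_runs(japanese_words, aligned)
```

After regrouping, word indices change, so the correspondences found earlier have to be recomputed or re-indexed before the single remaining pair is estimated, in line with the revision from FIG. 6(c) to FIG. 6(e).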

In this embodiment, the identification of word correspondences using the bilingual dictionary is performed in the order (1) identification of the words constituting the Japanese sentence, (2) identification of the words constituting the English sentence, and (3) lookup of pairs of Japanese and English words in the bilingual dictionary. This order is not mandatory. For example, the processing may instead be performed in the order (1) identification of the words constituting the Japanese sentence, (2) lookup of translation candidates for each Japanese word in the bilingual dictionary, and (3) search for the translation candidates in the English sentence.
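The alternative ordering can be sketched as below. The candidate table (Japanese word to list of English translation candidates) and the function name are assumptions, and for simplicity the English sentence is still given as a list of words rather than searched as raw text.

```python
def align_by_candidates(source_words, target_words, candidates):
    """For each source word, look up its candidates, then search the target sentence."""
    sigma = {}
    used_targets = set()
    for i, source_word in enumerate(source_words):
        for candidate in candidates.get(source_word, []):
            # First unused position in the English sentence holding this candidate.
            j = next(
                (j for j, target_word in enumerate(target_words)
                 if target_word == candidate and j not in used_targets),
                None,
            )
            if j is not None:
                sigma[i] = j
                used_targets.add(j)
                break
    return sigma


candidates = {"文書": ["document"], "フアイル": ["file"]}
sigma = align_by_candidates(
    ["文書", "フアイル", "更新する"], ["update", "document", "file"], candidates
)
```

One design consequence of this ordering is that lookups are driven by the dictionary entries of each Japanese word rather than by all m × n word pairs.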

Furthermore, in this embodiment every extracted co-occurrence relation of English words is registered in the co-occurrence relation dictionary. However, because English co-occurrence relations are used for selecting translations at translation time, it is not necessary to register all of them. The bilingual dictionary may be configured to include, for each Japanese word, information indicating the priority order of its translations, and co-occurrence relations consisting of first-priority translations may be excluded from registration in the English co-occurrence relation dictionary. This reduces the size of the English co-occurrence relation dictionary.
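One way to read this filter is sketched below, assuming the bilingual dictionary is held as a mapping from each Japanese word to its translations in priority order. The table contents (including "renew" as a competing translation) and the function names are hypothetical. A relation is skipped only when both English words are the first-priority translations of their Japanese counterparts, on the reasoning that translation would pick those words by default anyway.

```python
def first_priority(ranked_translations, japanese_word):
    """Return the first-priority translation of a Japanese word, or None if absent."""
    translations = ranked_translations.get(japanese_word)
    return translations[0] if translations else None


def should_register(japanese_pair, english_pair, ranked_translations):
    """False when both English words are first-priority translations."""
    (s1, s2), (t1, t2) = japanese_pair, english_pair
    return not (
        first_priority(ranked_translations, s1) == t1
        and first_priority(ranked_translations, s2) == t2
    )


# Hypothetical bilingual dictionary with translations listed in priority order.
ranked_translations = {
    "文書": ["document"],
    "フアイル": ["file"],
    "更新する": ["renew", "update"],
}

keep_update = should_register(("フアイル", "更新する"), ("file", "update"), ranked_translations)
keep_document = should_register(("文書", "フアイル"), ("document", "file"), ranked_translations)
```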

Also, in this embodiment, when a Japanese compound noun is present, the words constituting the corresponding English noun phrase are judged to have a co-occurrence relation, which is registered in the English co-occurrence relation dictionary. However, because Japanese compound nouns and English noun phrases can be identified by part-of-speech sequence patterns, such pairs may instead be registered in the Japanese-English bilingual dictionary. With this method a compound noun can be handled as a single unit in the Japanese-to-English transfer stage of translation, which reduces the load of translation processing.

[Effect of the invention]

According to the present invention, a function for learning bilingual relations and co-occurrence relations of words from sentence pairs can be realized. In a machine translation system, sentence pairs are continually available if each post-edited translation is paired with its input sentence. The present invention therefore makes it possible to realize a machine translation system with a self-propagating function.

[Brief description of the drawings]

FIG. 1 is a flowchart of the process of identifying word correspondences from a sentence pair and the process of extracting co-occurrence relations of second-language words according to an embodiment of the present invention; FIG. 2 is a hardware configuration diagram for implementing the present invention; FIG. 3 shows examples of dictionary records; FIG. 4 shows an example of processing a sentence pair; and FIGS. 5 and 6 are explanatory diagrams of variations of the word correspondence estimation process.

101-102: identification of the words constituting the first-language sentence
103-104: identification of the words constituting the second-language sentence
105-115: identification of correspondences between first-language and second-language words with reference to the bilingual dictionary
116-120: estimation of correspondences between first-language and second-language words and registration in the bilingual dictionary
121: extraction of word co-occurrence relations from the first-language sentence
122-126: mapping of first-language co-occurrence relations to the second language and registration in the second-language co-occurrence relation dictionary

Continuation of the front page

(56) References: JP-A-62-297972 (JP, A); JP-A-63-5470 (JP, A); JP-A-59-165178 (JP, A); JP-A-63-44276 (JP, A)

(58) Field surveyed (Int. Cl.⁶, DB name): G06F 17/20 - 17/28; JICST file (JOIS)

Claims

(57) [Claims]

1. A method of creating a bilingual dictionary for a machine translation system having a first-language dictionary, a second-language dictionary, and a bilingual dictionary between the first language and the second language, the method comprising: a first step of identifying, using the first-language dictionary, the words constituting a sentence in the first language; a second step of identifying, using the second-language dictionary, the words constituting a sentence in the second language that is in a bilingual relation with the sentence in the first language; a third step of determining, using the bilingual dictionary, whether a bilingual relation holds for each combination of a first-language word and a second-language word identified in the first and second steps, thereby identifying word-level correspondences; a fourth step of checking whether there are words for which no correspondence could be identified and, if there are, estimating the correspondence between those words; and a fifth step of registering the correspondence estimated in the fourth step in the bilingual dictionary.

2. The bilingual dictionary creation method for a machine translation system according to claim 1, wherein the step of estimating the correspondence between words includes processing that, when words whose correspondence is undetermined appear consecutively in the sentence of the first language or the second language, regards them as a single compound word and estimates the correspondence between words.
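The compound-word treatment can be sketched as follows. The helper name `unaligned_runs`, the list of boolean flags marking which words already have a correspondence, and the `sep` parameter are hypothetical and serve only to illustrate grouping consecutive undetermined words into one unit.

```python
def unaligned_runs(words, is_aligned, sep=" "):
    """Join each run of consecutive unaligned words into one compound word."""
    runs, current = [], []
    for word, done in zip(words, is_aligned):
        if done:
            # An aligned word ends the current run, if any.
            if current:
                runs.append(current)
                current = []
        else:
            current.append(word)
    if current:
        runs.append(current)
    return [sep.join(run) for run in runs]


# Romanized Japanese side: "kikai" and "honyaku" are both still unaligned.
src_compounds = unaligned_runs(
    ["kikai", "honyaku", "shisutemu"], [False, False, True], sep=""
)
# English side: "machine" and "translation" are both still unaligned.
tgt_compounds = unaligned_runs(
    ["machine", "translation", "system"], [False, False, True]
)
print(src_compounds, tgt_compounds)
```

Once each side yields a single compound, the two compounds can be paired in the same way as single leftover words.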

3. The bilingual dictionary creation method for a machine translation system according to claim 1, wherein the first step and the second step each include a step of selecting only content words from the identified words, and the third step is performed only on the selected content words.

4. A bilingual dictionary creation method for a machine translation system having a first-language dictionary, a second-language dictionary, and a bilingual dictionary of the first language and the second language, the method comprising: a first step of identifying, using the first-language dictionary, the words constituting a sentence of the first language; a second step of retrieving, using the bilingual dictionary, second-language translation-equivalent candidates for the first-language words identified in the first step; a third step of searching a sentence of the second language that is in a translation relationship with the sentence of the first language for the candidates obtained in the second step, thereby identifying word-level correspondences; a fourth step of checking whether any words remain whose correspondence could not be identified in the third step and, if so, estimating the correspondence between those words; and a fifth step of registering the correspondence estimated in the fourth step in the bilingual dictionary.
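This variant looks up candidates first and then searches for them in the second-language sentence. A minimal sketch follows, assuming a bilingual dictionary stored as a mapping from each first-language word to a list of candidates; the function name `align_by_candidates` and the one-leftover-per-side estimation rule are illustrative assumptions, not details taken from the claim.

```python
def align_by_candidates(src_words, tgt_words, bilingual):
    """Align by searching dictionary candidates in the target sentence.

    `bilingual` maps a first-language word to a list of candidate
    second-language words (an assumed representation).
    """
    remaining = list(tgt_words)
    aligned, unaligned_src = [], []
    for s in src_words:
        # Second step: retrieve candidates; third step: search the sentence.
        hit = next((c for c in bilingual.get(s, []) if c in remaining), None)
        if hit is None:
            unaligned_src.append(s)
        else:
            aligned.append((s, hit))
            remaining.remove(hit)
    estimated = []
    # Fourth step (assumed heuristic): one leftover word on each side.
    if len(unaligned_src) == 1 and len(remaining) == 1:
        estimated.append((unaligned_src[0], remaining[0]))
    # Fifth step: register the estimated correspondence.
    for s, t in estimated:
        bilingual.setdefault(s, []).append(t)
    return aligned, estimated


bilingual = {"ginkou": ["bank"], "iku": ["go", "come"]}
aligned, estimated = align_by_candidates(
    ["ashita", "ginkou", "iku"], ["go", "bank", "tomorrow"], bilingual
)
print(aligned, estimated)
```

Compared with testing every word combination, this approach only needs the second-language sentence as a string of words to search, which is why the sketch never consults a second-language dictionary.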

5. The bilingual dictionary creation method for a machine translation system according to claim 4, wherein the step of estimating the correspondence between words includes processing that, when words whose correspondence is undetermined appear consecutively in the sentence of the first language or the second language, estimates the correspondence between words for those consecutive words.

6. The bilingual dictionary creation method for a machine translation system according to claim 4 or 5, wherein the first step includes a step of selecting only content words from the identified words, and the second step and the third step are performed only on the selected content words.

7. A method for creating a co-occurrence relation dictionary for a machine translation system having a bilingual dictionary of a first language and a second language and a co-occurrence relation dictionary of the second language, the method comprising: a first step of identifying, using the bilingual dictionary, word-level correspondences between a sentence of the first language and a sentence of the second language that is its translation; a second step of extracting co-occurrence relations of the words contained in the sentence of the first language; a third step of mapping the first-language co-occurrence relations extracted in the second step to co-occurrence relations of second-language words, using the correspondences between first-language words and second-language words obtained in the first step; and a fourth step of registering the second-language co-occurrence relations mapped in the third step in the second-language co-occurrence relation dictionary.
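The mapping and registration steps can be sketched as below. The representation of a co-occurrence relation as a (head, relation, dependent) triple, the function name `map_cooccurrences`, and the use of a set as the second-language co-occurrence dictionary are assumptions for illustration only.

```python
def map_cooccurrences(src_cooc, alignment):
    """Map first-language co-occurrence triples onto second-language words.

    src_cooc:  iterable of (head, relation, dependent) triples (assumed form)
    alignment: dict from first-language word to its aligned second-language word
    """
    mapped = set()
    for head, rel, dep in src_cooc:
        # Only relations whose two words both have a correspondence can be mapped.
        if head in alignment and dep in alignment:
            mapped.add((alignment[head], rel, alignment[dep]))
    return mapped


# "kusuri o nomu" is rendered "take medicine", not "drink medicine".
alignment = {"nomu": "take", "kusuri": "medicine"}
src_cooc = [("nomu", "object", "kusuri"), ("nomu", "time", "maiasa")]

tgt_cooc_dictionary = set()
# Registration: add the mapped relations to the second-language dictionary.
tgt_cooc_dictionary |= map_cooccurrences(src_cooc, alignment)
print(tgt_cooc_dictionary)
```

The second relation is skipped because "maiasa" has no aligned second-language word in this example.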

8. The method for creating a co-occurrence relation dictionary for a machine translation system according to claim 7, wherein the bilingual dictionary stores priority information among the translation equivalents when one word has a plurality of translation equivalents, and the fourth step excludes from registration any co-occurrence relation obtained as a pair of first-ranked translation equivalents.
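The exclusion rule can be illustrated with a short sketch. The ordered-list representation of priority (first element is the first-ranked translation equivalent) and the names `first_choice` and `should_register` are hypothetical. One plausible reading of the rule is that a pair made up entirely of first-ranked equivalents would be produced by default selection anyway, so storing it adds no information; that rationale is an interpretation, not a statement from the claim.

```python
def first_choice(word, ranked):
    """Return the first-ranked translation equivalent of `word`, if any."""
    candidates = ranked.get(word, [])
    return candidates[0] if candidates else None


def should_register(src_pair, tgt_pair, ranked):
    """Register unless both target words are first-ranked equivalents."""
    (s1, s2), (t1, t2) = src_pair, tgt_pair
    both_first = first_choice(s1, ranked) == t1 and first_choice(s2, ranked) == t2
    return not both_first


# Assumed priority order: "drink" is the default rendering of "nomu".
ranked = {"nomu": ["drink", "take"], "kusuri": ["medicine"], "mizu": ["water"]}

keep_take = should_register(("nomu", "kusuri"), ("take", "medicine"), ranked)
keep_drink = should_register(("nomu", "mizu"), ("drink", "water"), ranked)
print(keep_take, keep_drink)
```

Here the pair ("take", "medicine") is kept because "take" is not the first-ranked equivalent of "nomu", while ("drink", "water") is excluded.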

9. The method for creating a co-occurrence relation dictionary for a machine translation system according to claim 7, wherein a co-occurrence relation of second-language words obtained in correspondence with a compound word in the sentence of the first language is registered, together with that first-language compound word, in the bilingual dictionary of the first language and the second language.