JP2006053867A

JP2006053867A - Bilingual dictionary creation method and apparatus, and computer program

Info

Publication number: JP2006053867A
Application number: JP2004236641A
Authority: JP
Inventors: Kyoki Haku; 京姫白
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2004-08-16
Filing date: 2004-08-16
Publication date: 2006-02-23

Abstract

[Problem] To enable a bilingual dictionary to be created automatically for any combination of languages.
[Solution] The method includes: a step 52 of preparing, in electronically readable form, a first dictionary for speakers of a first language, with directionality from the first language to a third language, and a second dictionary for speakers of a second language, with directionality from the second language to the third language; a step 54 of extracting, from among the entry pairs each consisting of a content-word entry of the first dictionary and a content-word entry of the second dictionary, those entry pairs whose similarity value, defined as a predetermined function of the sets of third-language translations contained in the two entries, is equal to or greater than a predetermined threshold; and a step 56 of storing the content word pairs corresponding to the extracted entry pairs in electronically readable form.
[Selected drawing] FIG. 1

Description

The present invention relates to techniques for automatically constructing electronic dictionaries of natural languages, and more particularly to a dictionary creation method and apparatus capable of automatically constructing a bilingual dictionary suitable for machine translation, regardless of the combination of languages.

In machine translation, the bilingual dictionary is the most basic and important resource. Constructing one, however, takes a long time and costs a great deal. Demand for automatic translation among many different languages is expected to grow, and because the number of language combinations grows roughly with the square of the number of languages (n languages give n(n-1)/2 pairs), how to build bilingual dictionaries is an important issue.

There are various methods for constructing electronic bilingual dictionaries. Among the conventional methods, the most interesting is the Papillon Project (Non-Patent Document 1), which focuses on building multilingual lexical data for creating large-scale, detailed dictionaries that follow consistent principles. Its primary resources are monolingual dictionaries, which are associated with one another through interlingual links to form a database for creating multilingual dictionaries. Creating multilingual dictionaries in this way requires a large number of collaborators and experts.

As research related to this project, Non-Patent Document 2 discloses an attempt to create a Japanese-French dictionary from an English-French dictionary and a Japanese-English dictionary.

Non-Patent Document 1: Christian Boitet et al., "The Papillon project: cooperatively building a multilingual lexical data-base to derive open source dictionaries & lexicons", the 2nd Workshop NLPXML-2002, pp. 93-96, 2002.
Non-Patent Document 2: Kumiko Tanaka et al., "Construction of a Bilingual Dictionary Intermediated by a Third Language", COLING-94, pp. 297-303, 1994.

Realizing machine translation among many languages, as described above, requires bilingual dictionaries for many language combinations, and the time and cost of creating them must be kept to a minimum. Moreover, creating a bilingual dictionary normally requires experts familiar with both languages, and as the combinations of languages become more varied, such experts become harder to find. A technique is therefore needed for creating bilingual dictionaries automatically, so that even workers unfamiliar with the languages concerned can efficiently create a bilingual dictionary for any combination of languages.

Although the prior art described above shows the need for creating bilingual dictionaries and some attempts to do so, it does not show how to create a bilingual dictionary automatically and efficiently for an arbitrary combination of languages. Moreover, given the nature of machine translation, it is desirable to extract as many entries as possible from the available resources, but the prior art shows no means for doing so either.

It is therefore an object of the present invention to provide a bilingual dictionary creation method and apparatus capable of automatically creating a bilingual dictionary for any combination of languages.

Another object of the present invention is to provide a bilingual dictionary creation method and apparatus capable of automatically creating, for any combination of languages, a bilingual dictionary with as many entries as possible.

A bilingual dictionary creation method according to a first aspect of the present invention automatically creates a bilingual dictionary between a first language and a second language using a third language as an intermediary. The method includes: a step of preparing, in electronically readable form, a first dictionary for speakers of the first language, with directionality from the first language to the third language, and a second dictionary for speakers of the second language, with directionality from the second language to the third language; a step of extracting, from among entry pairs each consisting of an entry for a content word in the first dictionary and an entry for a content word in the second dictionary, those entry pairs whose similarity value, defined as a predetermined function of the sets of third-language translations contained in the entries, is equal to or greater than a predetermined threshold; and a first storing step of storing the content word pairs corresponding to the entry pairs extracted in the extracting step in electronically readable form.

Experiments have confirmed that selecting first and second dictionaries with the directionality described above, and extracting from them the content word pairs corresponding to entry pairs whose similarity is equal to or greater than the threshold, yields a bilingual dictionary between the first and second languages that is more accurate and contains more headwords than one built from dictionaries with other directionalities. The creation procedure is automatic: the user only needs to specify the language combination of the bilingual dictionary. Highly accurate bilingual dictionaries can therefore be created in a short time for many language combinations. Furthermore, the method requires no expert familiar with both the first and the second language, so bilingual dictionaries can easily be created even for a very large number of language combinations.

Preferably, the preparing step includes: a step of preparing a plurality of electronically readable bilingual dictionaries, each intended for speakers of a predetermined language and having a predetermined directionality; a step of receiving a designation of the first and second languages; and a step of selecting, from the plurality of bilingual dictionaries, a dictionary pair consisting of a dictionary with directionality from the designated first language to another language and a dictionary with directionality from the designated second language to that same other language.

More preferably, the preparing step includes: a step of preparing a plurality of electronically readable bilingual dictionaries, each intended for speakers of a predetermined language and having a predetermined directionality; a step of receiving a designation of the first, second, and third languages; and a step of selecting, from the plurality of bilingual dictionaries, a dictionary pair consisting of a dictionary with directionality from the designated first language to the designated third language and a dictionary with directionality from the designated second language to the designated third language.

Still more preferably, the selecting step may select a plurality of dictionary pairs, in which case the preparing step further includes a step of selecting, from among those dictionary pairs, the one whose two dictionaries have the largest total number of entries.
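The dictionary-pair selection described in the last three paragraphs can be sketched as follows. This is a minimal Python sketch, assuming each stored dictionary is described by a small attribute record; the names `DictInfo` and `select_content_pair`, and the sample catalog with its entry counts, are hypothetical and invented for illustration. The optional `pivot` argument corresponds to the variant in which the third language is also designated, and when several pairs qualify, the one with the largest total entry count wins.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DictInfo:
    """Hypothetical attribute record for one stored bilingual dictionary."""
    name: str
    native: str   # native language of the intended users
    source: str   # language of the headwords
    target: str   # language of the translations
    entries: int  # number of entries


def select_content_pair(
    dicts: List[DictInfo], x: str, y: str, pivot: Optional[str] = None
) -> Optional[Tuple[DictInfo, DictInfo]]:
    """Return the pair (_X X⇒Z, _Y Y⇒Z) with the most entries in total, or None."""
    best = None
    for d1 in dicts:
        if d1.native != x or d1.source != x:
            continue  # need a dictionary for X speakers, looked up from X
        for d2 in dicts:
            if d2.native != y or d2.source != y:
                continue  # need a dictionary for Y speakers, looked up from Y
            z = d1.target
            if d2.target != z or z in (x, y):
                continue  # both must translate into the same third language Z
            if pivot is not None and z != pivot:
                continue  # honour a designated third language, if any
            total = d1.entries + d2.entries
            if best is None or total > best[0].entries + best[1].entries:
                best = (d1, d2)
    return best


catalog = [
    DictInfo("je_for_j", "J", "J", "E", 90000),
    DictInfo("ke_for_k", "K", "K", "E", 60000),
    DictInfo("jc_for_j", "J", "J", "C", 50000),
    DictInfo("kc_for_k", "K", "K", "C", 40000),
    DictInfo("ej_for_j", "J", "E", "J", 120000),  # wrong direction: ignored
]
pair = select_content_pair(catalog, "J", "K")
print(pair[0].name, pair[1].name)  # je_for_j ke_for_k
```

Here the English pivot wins because its two dictionaries total 150,000 entries against 90,000 for the hypothetical "C" pivot; passing `pivot="C"` forces the other pair.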

Preferably, the extracting step includes a step of calculating the similarity S₁(x, y) between each entry x in the first dictionary and each entry y in the second dictionary by the following equation.

Here, Z(x) and Z(y) denote the sets of third-language translations contained in entries x and y, respectively, and |·| denotes the number of elements of a set. The extracting step may further include a step of extracting the content word pairs corresponding to entry pairs (x, y) whose similarity S₁(x, y) is equal to or greater than a predetermined threshold.
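The equation for S₁ is not reproduced in this text, so the sketch below substitutes a Dice coefficient over Z(x) and Z(y) purely as a hypothetical stand-in; the actual S₁ may differ. The sketch shows the shape of the extracting step: every entry pair is scored from its two sets of third-language translations and kept if the score reaches the threshold. As an assumption, each dictionary is modelled as a mapping from headword to its set of pivot translations, and the sample entries are illustrative.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Dictionary = Dict[str, Set[str]]  # headword -> set Z of third-language translations


def dice(zx: Set[str], zy: Set[str]) -> float:
    """Hypothetical stand-in for S1: Dice coefficient of the two translation sets."""
    if not zx or not zy:
        return 0.0
    return 2 * len(zx & zy) / (len(zx) + len(zy))


def extract_content_pairs(
    dict1: Dictionary,
    dict2: Dictionary,
    threshold: float,
    similarity: Callable[[Set[str], Set[str]], float] = dice,
) -> List[Tuple[str, str, float]]:
    """Score every (x, y) entry pair and keep those with score >= threshold."""
    kept = []
    for (x, zx), (y, zy) in product(dict1.items(), dict2.items()):
        score = similarity(zx, zy)
        if score >= threshold:
            kept.append((x, y, score))
    return kept


ja = {"タンゴ": {"tango"}, "本": {"book", "volume"}, "帳簿": {"book", "ledger"}}
ko = {"탱고": {"tango"}, "책": {"book", "volume"}}
print(extract_content_pairs(ja, ko, threshold=0.6))
# [('タンゴ', '탱고', 1.0), ('本', '책', 1.0)]
```

The pair 帳簿/책 shares only "book" and scores 0.5, so it falls below the illustrative threshold of 0.6.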

More preferably, the predetermined threshold is a function of the number of third-language translations that the two entries of the pair (x, y) have in common.

Still more preferably, this function is monotonically decreasing in the number of common translations.

The function may be set to 0 when the number of common translations is 4 or more.
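A count-dependent threshold of this kind can be sketched as below. Only two properties come from the text: the threshold does not increase as the number of shared translations grows, and it is 0 from four shared translations upward. The values for one to three shared translations, the rule that pairs sharing nothing are never accepted, and the Dice coefficient used in place of the unreproduced S₁ are all hypothetical placeholders.

```python
from typing import Set


def threshold(common: int) -> float:
    """Hypothetical threshold as a function of the number of shared translations.

    It decreases as `common` grows and is 0 once four or more translations
    are shared, as described above. The values for 0-3 are placeholders.
    """
    if common <= 0:
        return float("inf")  # assumption: pairs sharing nothing are never accepted
    return {1: 0.8, 2: 0.5, 3: 0.3}.get(common, 0.0)


def dice(zx: Set[str], zy: Set[str]) -> float:
    """Hypothetical stand-in for S1 (the source's equation is not reproduced)."""
    return 2 * len(zx & zy) / (len(zx) + len(zy)) if zx and zy else 0.0


def accept(zx: Set[str], zy: Set[str]) -> bool:
    """Keep the entry pair if its similarity reaches the count-dependent threshold."""
    return dice(zx, zy) >= threshold(len(zx & zy))


print(accept({"book", "ledger"}, {"book", "volume"}))  # False: 1 shared, 0.5 < 0.8
print(accept({"a", "b", "c", "d", "e"}, {"a", "b", "c", "d", "f", "g"}))  # True: 4 shared
```

The design intent this illustrates is that a single shared pivot word is weak evidence and must be backed by a high similarity, while many shared pivot words are accepted on their own.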

Preferably, the bilingual dictionary creation method further includes: a step of preparing, in electronically readable form, a third dictionary for speakers of the first language, with directionality from the third language to the first language, and a fourth dictionary for speakers of the second language, with directionality from the third language to the second language; a step of creating parallel sentence pairs by extracting, from both the third-dictionary entry and the fourth-dictionary entry for the same third-language headword, third-language example sentences that satisfy a predetermined condition, together with their translations; a step of aligning the translations within each parallel sentence pair created in the creating step; a second storing step of extracting function word pairs from the chunks aligned with each other in the aligning step and storing them in electronically readable form; and a step of merging the content word pairs and the function word pairs stored in the first and second storing steps.
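The parallel-sentence step can be sketched as follows. This minimal Python sketch assumes, hypothetically, that each Z⇒X or Z⇒Y dictionary is a mapping from a third-language headword to a list of (example sentence, translation) pairs, and it takes "the example occurs verbatim under the same headword in both dictionaries" as the predetermined condition. The function name and sample data are illustrative; aligning the two translations and extracting function word pairs from the aligned chunks are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: third-language headword -> [(example sentence, translation)]
ExampleDict = Dict[str, List[Tuple[str, str]]]


def build_parallel_pairs(dict3: ExampleDict, dict4: ExampleDict) -> List[Tuple[str, str, str]]:
    """Pair the X and Y translations of the same third-language example sentence.

    dict3 plays the role of _X Z⇒X and dict4 of _Y Z⇒Y. Returns
    (example, X translation, Y translation) triples.
    """
    pairs = []
    for head in sorted(dict3.keys() & dict4.keys()):  # headwords in both
        y_by_example: Dict[str, str] = {}
        for example, y_text in dict4[head]:
            y_by_example.setdefault(example, y_text)  # keep the first Y translation
        for example, x_text in dict3[head]:
            if example in y_by_example:  # assumed condition: identical example
                pairs.append((example, x_text, y_by_example[example]))
    return pairs


e_to_j = {"of": [("a cup of tea", "一杯のお茶"), ("the top of the hill", "丘の頂上")]}
e_to_k = {"of": [("a cup of tea", "차 한 잔"), ("made of wood", "나무로 만든")],
          "tango": [("dance the tango", "탱고를 추다")]}
print(build_parallel_pairs(e_to_j, e_to_k))
# [('a cup of tea', '一杯のお茶', '차 한 잔')]
```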

A computer program according to a second aspect of the present invention, when executed by a computer, controls the computer so as to perform all the steps of any of the bilingual dictionary creation methods described above.

A bilingual dictionary creation apparatus according to a third aspect of the present invention automatically creates a bilingual dictionary between a first language and a second language using a third language as an intermediary. The apparatus includes: dictionary preparation means for preparing, in electronically readable form, a first dictionary for speakers of the first language, with directionality from the first language to the third language, and a second dictionary for speakers of the second language, with directionality from the second language to the third language; entry pair extraction means for extracting, from among entry pairs each consisting of an entry for a content word in the first dictionary and an entry for a content word in the second dictionary, those entry pairs whose similarity value, defined as a predetermined function of the sets of third-language translations contained in the entries, is equal to or greater than a predetermined threshold; and content word pair storage means for storing the content word pairs corresponding to the entry pairs extracted by the entry pair extraction means in electronically readable form.

[Properties of dictionaries]
In the prior art, little consideration is given to the properties of dictionaries. In practice, however, these properties are very important in the automatic creation of bilingual dictionaries. Typical properties are the native language a dictionary assumes and its directionality.

In this specification, the "directionality" of a dictionary means the direction in which it is consulted: from entries in which language one looks up words in which language. Directionality is written with "⇒". For example, an English-Japanese dictionary has the directionality English ⇒ Japanese and a Japanese-English dictionary has Japanese ⇒ English; a Korean-English dictionary is Korean ⇒ English and an English-Korean dictionary is English ⇒ Korean. To simplify the description, languages are abbreviated below, e.g., "J" for Japanese, "E" for English, and "K" for Korean, so that a Japanese-English dictionary is written "J⇒E" and an English-Japanese dictionary "E⇒J".

The native language assumed by a dictionary is the native language of its intended users. English-Japanese and Japanese-English dictionaries produced in Japan normally assume users whose native language is Japanese and for whom English is a foreign language (hereinafter "Japanese speakers"). By contrast, an English-Japanese dictionary produced in the United Kingdom would assume users whose native language is English ("English speakers"; other languages are named likewise below). In this specification, the abbreviation of the assumed native language is written as a small subscript before the directionality, rendered in this text as a prefix such as "_J". Thus an English-Japanese dictionary for Japanese speakers is "_J E⇒J", a Japanese-English dictionary for Japanese speakers is "_J J⇒E", a Korean-English dictionary for Korean speakers is "_K K⇒E", and an English-Korean dictionary for Korean speakers is "_K E⇒K". In general, a dictionary from language X to language Y created for speakers of language Z is written "_Z X⇒Y".
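The notation "_Z X⇒Y" can be captured by a small value type. This Python sketch is illustrative only: the class `Direction` and its `looked_up_from_native` flag are hypothetical names, the flag marking dictionaries such as _J J⇒E whose headwords are in the user's own language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Direction:
    """The notation _Z X⇒Y: a dictionary from X to Y for speakers of Z."""
    native: str  # Z: native language of the intended users
    source: str  # X: language of the headwords
    target: str  # Y: language of the translations

    def __str__(self) -> str:
        # Plain-text rendering used in this description: "_Z X⇒Y"
        return f"_{self.native} {self.source}⇒{self.target}"

    @property
    def looked_up_from_native(self) -> bool:
        # True for dictionaries such as _J J⇒E, consulted from the user's own language
        return self.native == self.source


print(Direction("J", "E", "J"))  # _J E⇒J
print(Direction("K", "K", "E").looked_up_from_native)  # True
```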

The prior art does not consider the directionality of dictionaries, yet directionality is in fact a very important concept in creating bilingual dictionaries. For example, a Japanese-English dictionary for Japanese speakers (_J J⇒E) is used mainly when a Japanese speaker writes or speaks English. In that situation the speaker naturally knows the meaning of the Japanese word to be translated into English, so the dictionary need not explain that word in detail. The exception is a Japanese word for a concept that has no English counterpart, for which an explanatory description of how to express the concept in English may be given. Likewise, when a Japanese speaker looks up the English equivalent of a Japanese word in a Japanese-English dictionary, part-of-speech (POS) information about the translation is secondary and not very important.

An English-Japanese dictionary for Japanese speakers (_J E⇒J), on the other hand, is used by Japanese speakers to learn the meanings and usage of English words. Its entries therefore often carry not only translations but also explanatory information on wording and usage, example sentences, and grammatical information such as parts of speech.

For example, as shown in FIG. 4, for the corresponding words "タンゴ" (Japanese) and "tango" (English), entry 160 for "タンゴ" in a Japanese-English dictionary and entry 162 for "tango" in an English-Japanese dictionary differ greatly in content. Entry 160 simply lists the translation, whereas entry 162 is detailed: besides the translation, it gives an explanation of the word, grammatical information such as the plural form and part of speech, and example sentences. When creating a bilingual dictionary, entries that list only translations, like entry 160, are simpler to process by machine, so an accurate bilingual dictionary can be created efficiently. Entries containing many explanations of the translations, like entry 162, include information unnecessary for creating a bilingual dictionary, which complicates the required processing and increases errors.

Therefore, in creating bilingual dictionaries for machine translation, it is necessary to consider the directionality of the dictionaries and the native language of their intended users.

Taking directionality into account, the inventor conducted an experiment to create a Japanese-Korean bilingual dictionary (hereinafter "Japanese-Korean dictionary") using English as the intermediate language. Creating the dictionary from the combination _J J⇒E and _K K⇒E gave far better results than any other combination. Because the directionality of dictionaries is believed to be universal, a creation method that takes directionality into account in this way is expected to be effective not only for Japanese-Korean dictionaries but for any combination of languages. An embodiment of such an automatic bilingual dictionary creation apparatus is described below.

[Configuration]
FIG. 1 is a block diagram of an automatic dictionary creation system 20 according to an embodiment of the present invention. Referring to FIG. 1, the automatic dictionary creation system 20 includes: a dictionary storage device 38 storing many bilingual dictionaries for various language combinations; a dictionary attribute storage device 36 storing attribute information such as the directionality of those dictionaries; and an automatic dictionary creation device 34 that receives from the user the language combination of the dictionary to be created (information 30 designating the first language X and information 32 designating the second language Y), refers to the dictionary attribute storage device 36 to select, from the dictionaries stored in the dictionary storage device 38, the optimal combination of dictionaries for the first language X and the second language Y, and automatically creates an electronic bilingual dictionary 40 between the first language X and the second language Y.

FIG. 2 shows the contents of the dictionary storage device 38. As shown in FIG. 2, the dictionary storage device 38 includes, for each of various languages, a group of dictionaries 80, 82, 84, ... between that language and other languages, intended for users whose native language it is. For example, dictionary group 80 is for Japanese speakers and includes a Japanese-English dictionary 90, an English-Japanese dictionary 92, and so on. Similarly, dictionary group 82 is for Korean speakers and includes a Korean-English dictionary 100, an English-Korean dictionary 102, and so on. Dictionary group 84 is for English speakers and includes a Japanese-English dictionary 110, an English-Japanese dictionary 112, and so on.

Dictionaries 90 and 110 are both Japanese-English dictionaries, but dictionary 90 is for Japanese speakers whereas dictionary 110 is for English speakers. Similarly, dictionaries 92 and 112 are both English-Japanese dictionaries, but dictionary 92 is for Japanese speakers and dictionary 112 is for English speakers.

FIG. 3 shows the configuration of the dictionary attribute storage device 36 shown in FIG. 1. Referring to FIG. 3, the dictionary attribute storage device 36 stores attribute information 140, 142, 144, 146, ... for a plurality of dictionaries, each following the format of attribute information 130 shown in FIG. 3. Attribute information 130 includes: the dictionary file name; the path name of the dictionary file (a URL if the file is on a network; hereinafter simply "path name"); native language type information indicating the native language of the speakers for whom the dictionary is intended; the type of the first language, in which the entries are written; the type of the second language, in which the translations are written; and the number of entries in the dictionary.
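A record in the format of attribute information 130 can be modelled as below. This is a minimal Python sketch assuming, hypothetically, that the records are stored as tab-separated lines in the field order listed above; the file layout, the class and function names, and the sample paths are illustrative only. Indexing by (native, first, second) lets a selection unit look up, say, every _J J⇒E dictionary directly.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DictAttributes:
    """One record in the format of attribute information 130."""
    file_name: str
    path: str     # local path, or a URL for a networked dictionary
    native: str   # native language type
    first: str    # first language type (language of the entries)
    second: str   # second language type (language of the translations)
    entries: int  # number of entries


def load_attributes(text: str) -> Dict[Tuple[str, str, str], List[DictAttributes]]:
    """Parse tab-separated attribute lines and index them by (native, first, second)."""
    index: Dict[Tuple[str, str, str], List[DictAttributes]] = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        name, path, native, first, second, n = row
        index[(native, first, second)].append(
            DictAttributes(name, path, native, first, second, int(n)))
    return dict(index)


sample = (
    "# name\tpath\tnative\tfirst\tsecond\tentries\n"
    "je.dic\t/dicts/je.dic\tJ\tJ\tE\t90000\n"
    "ej.dic\t/dicts/ej.dic\tJ\tE\tJ\t120000\n"
)
attrs = load_attributes(sample)
print(attrs[("J", "J", "E")][0].file_name)  # je.dic, i.e. a _J J⇒E dictionary
```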

For example, for dictionary _J J⇒E, the native language type is Japanese (J), the first language type is Japanese (J), and the second language type is English (E). For dictionary _J E⇒J, the native language type is Japanese (J), the first language type is English (E), and the second language type is Japanese (J).

Referring again to FIG. 1, the automatic dictionary creation device 34 includes: an input device 50 for receiving from the user information 30 designating the first language X and information 32 designating the second language Y; a content word dictionary selection unit 52 for selecting, with reference to the dictionary attribute storage device 36 and according to the language combination received by the input device 50, the combination of dictionaries from which the content-word entries of the bilingual dictionary are to be extracted; a content word translation extraction processing unit 54 for extracting translations of content words (content word pairs) by referring to the two dictionaries in the dictionary storage device 38 selected by the content word dictionary selection unit 52; and a content word pair storage unit 56 for electronically storing the content word pairs extracted by the content word translation extraction processing unit 54.

The automatic dictionary creation device 34 further includes: a function word dictionary selection unit 60 for selecting, in the same way as the content word dictionary selection unit 52, with reference to the dictionary attribute storage device 36 and according to the language combination received by the input device 50, the combination of dictionaries from which the function-word entries of the bilingual dictionary are to be extracted; a function word translation extraction processing unit 62 for extracting translations of function words (function word pairs) by referring to the two dictionaries in the dictionary storage device 38 selected by the function word dictionary selection unit 60; and a function word pair storage unit 64 for electronically storing the function word pairs extracted by the function word translation extraction processing unit 62.

The automatic dictionary creation device 34 further includes a merge processing unit 70 for merging the content word pairs stored in the content word pair storage unit 56 with the function word pairs stored in the function word pair storage unit 64 to create the bilingual dictionary 40 from the first language to the second language. Here, "merging" means combining the set of content word pairs and the set of function word pairs into a single set and arranging the pairs in some order, for example gojūon (a-i-u-e-o) order for Japanese, or alphabetical order for languages written in an alphabet such as English. The ordering may be based on either the first or the second language, or two merged dictionaries may be created, one ordered by each language.
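The merge can be sketched as a set union followed by a sort on one side of each pair. In this Python sketch, plain Unicode code-point order is used as a simple stand-in for gojūon or alphabetical collation (an assumption; a real system would use a language-aware collator), and the function name and sample pairs are hypothetical. `key_side` selects which language orders the result, so calling it twice yields the two dictionaries mentioned above.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (first-language word, second-language word)


def merge_pairs(content: Iterable[Pair], function: Iterable[Pair], key_side: int = 0) -> List[Pair]:
    """Union the two pair sets and sort by one side (0 = first, 1 = second language).

    Code-point order is a simple stand-in for gojuon or alphabetical collation.
    """
    merged = set(content) | set(function)
    return sorted(merged, key=lambda p: (p[key_side], p[1 - key_side]))


content = [("本", "책"), ("タンゴ", "탱고")]
function = [("の", "의"), ("本", "책")]  # the duplicate pair is collapsed
print(merge_pairs(content, function))              # [('の', '의'), ('タンゴ', '탱고'), ('本', '책')]
print(merge_pairs(content, function, key_side=1))  # [('の', '의'), ('本', '책'), ('タンゴ', '탱고')]
```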

When the first language X and the second language Y are specified, the content word dictionary selection unit 52 refers to the dictionary attribute storage device 36 and selects two dictionaries of the form _X(X⇒Z) and _Y(Y⇒Z). Here _X(X⇒Z) denotes a dictionary intended for native speakers of X that translates from X into Z. The language Z is arbitrary: any language may be used as long as the corresponding dictionaries exist in the dictionary storage device 38. In other words, the content word dictionary selection unit 52 selects two dictionaries:
- a dictionary from the first language X to the third language Z, intended for native speakers of X; and
- a dictionary from the second language Y to the third language Z, intended for native speakers of Y.

When the first language X and the second language Y are specified, the function word dictionary selection unit 60 likewise refers to the dictionary attribute storage device 36, but selects two dictionaries of the form _X(Z⇒X) and _Y(Z⇒Y). That is, the function word dictionary selection unit 60 selects two dictionaries:
- a dictionary from the third language Z to the first language X, intended for native speakers of X; and
- a dictionary from the third language Z to the second language Y, intended for native speakers of Y.
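The two selection rules differ only in which side of each dictionary is fixed to the user's language. A minimal sketch follows, assuming a hypothetical `DictInfo` record for each entry in the dictionary attribute storage device 36. The field names and entry counts are illustrative, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictInfo:
    """One record of dictionary attribute storage 36 (hypothetical layout)."""
    name: str
    user: str     # native language of the intended users
    source: str   # language of the headwords
    target: str   # language of the translations
    entries: int


def content_word_dicts(dicts: list[DictInfo], x: str, y: str):
    """Candidate pairs (_X(X=>Z), _Y(Y=>Z)) that share a pivot language Z."""
    return [(a, b) for a in dicts for b in dicts
            if a.user == a.source == x
            and b.user == b.source == y
            and a.target == b.target]


def function_word_dicts(dicts: list[DictInfo], x: str, y: str):
    """Candidate pairs (_X(Z=>X), _Y(Z=>Y)) that share a pivot language Z."""
    return [(a, b) for a in dicts for b in dicts
            if a.user == a.target == x
            and b.user == b.target == y
            and a.source == b.source]


catalog = [  # illustrative catalog; entry counts are made up
    DictInfo("K-E for Koreans", "K", "K", "E", 500),
    DictInfo("J-E for Japanese", "J", "J", "E", 300),
    DictInfo("E-K for Koreans", "K", "E", "K", 400),
    DictInfo("E-J for Japanese", "J", "E", "J", 600),
    DictInfo("E-J for English speakers", "E", "E", "J", 900),
]
print([(a.name, b.name) for a, b in content_word_dicts(catalog, "K", "J")])
print([(a.name, b.name) for a, b in function_word_dicts(catalog, "K", "J")])
```

The English-speakers' E-J dictionary is never chosen, even though its direction E⇒J is correct. This reflects the point that both the direction and the intended user matter.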

FIG. 5 is a more detailed block diagram of the content word translation extraction processing unit 54 shown in FIG. 1. Referring to FIG. 5, the content word translation extraction processing unit 54 includes a word pair extraction unit 180 and a similarity calculation unit 182.

The word pair extraction unit 180 extracts every combination of an entry in dictionary 170 with an entry in dictionary 172, the two dictionaries selected by the content word dictionary selection unit 52.

For each word pair extracted by the word pair extraction unit 180, the similarity calculation unit 182 calculates the similarity S₁ between the two entries. The calculation uses the translations in the entry from dictionary 170 and the translations in the entry from dictionary 172, according to the following equation:

$$S_1(x, y) = \frac{2\,|x \cap y|}{|x| + |y|} \qquad (1)$$

where x and y are the sets of language-Z translations contained in the language-X entry and the language-Y entry, respectively, and |·| denotes the number of elements in a set.
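Equation (1) is the Dice coefficient of the two translation sets. A minimal Python version follows. Returning 0 for two empty entries is an assumption of this sketch, since the text does not cover that case.

```python
def similarity(x: set[str], y: set[str]) -> float:
    """Equation (1): S1(x, y) = 2|x & y| / (|x| + |y|).

    x, y: sets of pivot-language (Z) translations from the X entry and the Y entry.
    """
    if not x and not y:
        return 0.0  # assumed convention for two empty entries (avoids 0/0)
    return 2 * len(x & y) / (len(x) + len(y))


print(similarity({"check", "cheque"}, {"check", "cheque"}))
print(similarity({"check", "cheque"}, {"check"}))
```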

The content word translation extraction processing unit 54 further includes a threshold table storage unit 186 and a content word pair selection unit 184.

The threshold table storage unit 186 stores a table of similarity thresholds. These thresholds were determined in advance by experiment, so that content word pairs can be appropriately selected or rejected according to the number of translations the two entries have in common.

For each word pair extracted by the word pair extraction unit 180, the content word pair selection unit 184 decides whether to select or reject the pair. It selects the pair if the similarity calculated by the similarity calculation unit 182 is equal to or greater than the threshold stored in the threshold table storage unit 186, and outputs each selected pair to the content word pair storage unit 56.

FIG. 6 shows an example of the word pairs extracted by the word pair extraction unit 180 when a Korean-Japanese bilingual dictionary is created with English as the intermediary language. In this case, _K(K⇒E) and _J(J⇒E) are selected as dictionaries 170 and 172, respectively.

Referring to FIG. 6, suppose that looking up _K(K⇒E) yields the same two English translations (check and cheque) for both Korean word 190 and Korean word 192. On the Japanese side, suppose the following translations are obtained:
- 小切手 ("cheque"): the same two translations, check and cheque;
- 防止する ("prevent"): three translations, check, prevent, and prevent from;
- 点検する ("inspect"): two translations, examine and check;
- 照合 ("collation"): one translation, check;
- 預ける ("deposit"): four translations, leave, deposit, check, and entrust.

FIG. 7 shows the similarities calculated according to equation (1) between Korean words 190 and 192 and each Japanese word. The results are grouped by the number of translations that the two entries have in common.

For example, Korean word 190 and Japanese 小切手 share two translations (check, cheque). Since each entry has exactly two translations, the similarity is 2 × 2 / (2 + 2) = 1.000. The same holds between Korean word 192 and 小切手.

By contrast, Korean word 190 and Japanese 照合する ("collate") share only one translation. Korean word 190 has two translations and 照合する has one, so the similarity is 2 × 1 / (2 + 1) = 0.667. The similarities in the other rows of FIG. 7 are calculated in the same way.
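The arithmetic above can be checked by recomputing equation (1) for Korean word 190 against every Japanese entry listed for FIG. 6. Each row holds the number of shared translations and the similarity rounded to three decimals. The translation sets are exactly those given in the text. The sort order (shared count, then similarity, both descending) is an assumption about how FIG. 7 is laid out.

```python
# Translation sets from FIG. 6 as described in the text (English pivot).
korean_190 = {"check", "cheque"}
japanese = {
    "小切手": {"check", "cheque"},
    "防止する": {"check", "prevent", "prevent from"},
    "点検する": {"examine", "check"},
    "照合": {"check"},
    "預ける": {"leave", "deposit", "check", "entrust"},
}


def similarity(x: set[str], y: set[str]) -> float:
    """Equation (1), the Dice coefficient of two translation sets."""
    return 2 * len(x & y) / (len(x) + len(y))


# (shared translations, S1 rounded to 3 places, Japanese word), highest first.
rows = sorted(((len(korean_190 & z), round(similarity(korean_190, z), 3), word)
               for word, z in japanese.items()), reverse=True)
for common, s1, word in rows:
    print(common, s1, word)
```

This yields 1.0 for 小切手 and 0.667 for 照合, matching the values worked out above. The remaining entries give 0.5, 0.4, and 0.333.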

Referring to FIG. 7, if the English translations of a Korean word K1 exactly match those of a Japanese word J1, K1 and J1 very likely correspond to each other. As FIG. 7 shows, however, the fewer translations the two words share, the less likely they are to correspond. In the example of FIG. 7, it is preferable not to adopt pairs such as those belonging to group (3).

The similarity threshold for adopting a pair as a content word pair is therefore varied according to the number of common translations. Specifically, the thresholds were determined through various experiments so that content word pairs are adopted with a precision of about 90 percent.

FIG. 8 shows, for a Japanese-Korean bilingual dictionary created with English as the intermediary, the similarity threshold τ for each number of common translations. As FIG. 8 shows, τ is a monotonically decreasing function of the number of common translations. When there are four or more common translations, τ is set to 0; that is, content word pairs with four or more common translations are adopted unconditionally.
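The threshold table can be sketched as a lookup keyed on the number of shared translations, with τ = 0 from four upward. The actual values for one to three shared translations are in FIG. 8 and are not reproduced in the text. The `placeholder` values below are therefore invented for illustration only, and the function names are hypothetical.

```python
import math


def threshold(common: int, table: dict[int, float]) -> float:
    """Similarity threshold tau for a word pair sharing `common` pivot translations.

    `table` holds tau for 1, 2 and 3 shared translations (the FIG. 8 values,
    which are not reproduced here). From 4 shared translations upward tau = 0,
    so such pairs are adopted unconditionally.
    """
    if common <= 0:
        return math.inf  # no shared translation: the pair can never be adopted
    if common >= 4:
        return 0.0
    return table[common]


def is_non_increasing(table: dict[int, float]) -> bool:
    """Check that tau never rises as the shared count grows from 1 to 4."""
    taus = [threshold(n, table) for n in range(1, 5)]
    return all(a >= b for a, b in zip(taus, taus[1:]))


# PLACEHOLDER values for illustration only; the real ones are in FIG. 8.
placeholder = {1: 0.7, 2: 0.5, 3: 0.4}
print(threshold(2, placeholder), threshold(7, placeholder))
print(is_non_increasing(placeholder))
```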

Using the thresholds stored in the threshold table storage unit 186, shown in FIG. 8, an experiment was conducted to extract Korean-Japanese content word pairs with English as the intermediary. There were 157,618 combinations of Korean and Japanese entries. These combinations drew on 28,479 of the 50,826 entries in the Korean dictionary used, and on 17,687 of the 28,310 entries in the Japanese dictionary. From these combinations, 25,703 content word pairs were extracted as entries for the Korean-Japanese bilingual dictionary, at a precision of 90 percent.

It is even more preferable to extract function word pairs as well as content word pairs. The function word translation extraction processing unit 62 shown in FIG. 1 serves this purpose. Its detailed configuration is shown in FIG. 9.

Referring to FIG. 9, the function word translation extraction processing unit 62 includes an example sentence extraction unit 220 and a matching sentence extraction unit 222.

The example sentence extraction unit 220 works on the two dictionaries 210 (_X(Z⇒X)) and 212 (_Y(Z⇒Y)) selected by the function word dictionary selection unit 60, as described above. It examines every combination of entries for the same language-Z word in the two dictionaries. From these entries it extracts all the language-Z example sentences or phrases (hereinafter simply "example sentences") together with their translations.

The matching sentence extraction unit 222 compares the language-Z example sentences extracted by unit 220 from entries for the same language-Z word. It selects the cases where the sentence from dictionary 210 and the sentence from dictionary 212 are identical or similar. Each selected pair is extracted together with the translations the two dictionaries give for it.

In this embodiment, a pair of language-Z example sentences from the two dictionaries is adopted if the sentences match exactly, or if they are similar with seven or more matching words. The translations of each adopted pair are then extracted.

Example sentences in language Z that match exactly may be extracted unconditionally. For selecting similar sentences, the threshold on the number of matching words is preferably 7 or more, and may be 8 or more.
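The acceptance rule for example-sentence pairs can be sketched as follows. The text does not define how "matching words" are counted. This sketch assumes lowercase, whitespace-split tokens compared as a multiset, so a word repeated in both sentences counts once per shared occurrence. The function names are hypothetical.

```python
from collections import Counter


def matching_word_count(a: str, b: str) -> int:
    """Words two sentences share, counted as a multiset intersection.

    ASSUMPTION: lowercase + whitespace tokenization; the patent does not say
    how matching words are counted.
    """
    shared = Counter(a.lower().split()) & Counter(b.lower().split())
    return sum(shared.values())


def accept_example_pair(a: str, b: str, min_match: int = 7) -> bool:
    """Adopt identical sentences unconditionally, else require >= min_match shared words."""
    if a == b:
        return True
    return matching_word_count(a, b) >= min_match


s1 = "the cat sat on the mat in the sun today"
s2 = "the cat sat on the rug in the sun today"
print(matching_word_count(s1, s2), accept_example_pair(s1, s2))
```

Raising `min_match` to 8 corresponds to the stricter alternative mentioned above.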

The function word translation extraction processing unit 62 further includes an alignment processing unit 224 and a function word pair selection unit 226.

The alignment processing unit 224 refers to the content word pair storage unit 56 and aligns the two translations extracted by the matching sentence extraction unit 222. The alignment is done chunk by chunk, each chunk being centered on a content word.

The function word pair selection unit 226 takes each pair of language-X and language-Y chunks associated by this alignment. It removes from them the content words stored in the content word pair storage unit 56, leaving a pair consisting only of the function words that remain in the chunks. Among the aligned chunk pairs, unit 226 adopts only those whose corresponding language-Z chunks match exactly, and rejects those whose language-Z chunks differ. It writes the adopted function word pairs to the function word pair storage unit 64.

The processing performed by the function word pair selection unit 226 is described with reference to FIG. 10. For ease of understanding, the following example extracts function word pairs between Korean and Japanese with English as the intermediary. FIG. 10 shows example sentence pairs 240 and 242.

Example sentence pair 240 shows a case where two English sentences match exactly: example 250, taken from an English-Japanese dictionary, and example 252, taken from an English-Korean dictionary. The symbol "=" at the head of pair 240 indicates that the English sentences are identical. The number "10" is the number of words in the example sentence.

For example sentence pair 240, alignment associates the Japanese chunk 私としては ("as for me") 270 with the Korean chunk 280. It also associates the Japanese chunk 自由が ("freedom" plus the subject particle が) 272 with the Korean chunk 282. Once chunks are associated in this way, subtracting the content words from each chunk leaves function words that correspond to each other. These function words can then be adopted as a function word pair.

For example sentence pair 242, the English sentence of example 260 from the English-Japanese dictionary does not exactly match that of example 262 from the English-Korean dictionary. The symbol at the head of the pair indicates that the two English sentences are similar but not identical. The following number "8" is the number of matching words in the English example sentences. In this example, chunk 290 from the English-Japanese dictionary differs from chunk 292 from the English-Korean dictionary, and the prepositions before "class" also differ.

In such a case, the function word pair selection unit 226 does not extract function word pairs from chunks whose corresponding English portions differ. It extracts function word pairs only from chunks whose corresponding English portions match exactly.
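The selection rule of unit 226 can be sketched as a filter over chunks that have already been aligned. The alignment itself (unit 224) is not modeled here.

Each hypothetical `AlignedChunk` carries the pivot-language span behind each side plus pre-tokenized chunk text. Chunks are kept only when the two pivot spans are identical. Content words, looked up in sets drawn from the content word pair storage, are then stripped out.

The Japanese and Korean tokens are illustrative inventions, not the contents of FIG. 10. Joining the leftover tokens without spaces assumes agglutinative languages such as Japanese and Korean.

```python
from typing import NamedTuple


class AlignedChunk(NamedTuple):
    """One aligned chunk pair (hypothetical representation)."""
    pivot_x: str          # pivot-language span behind the X-side chunk
    x_tokens: list[str]   # X-side chunk, pre-tokenized
    pivot_y: str          # pivot-language span behind the Y-side chunk
    y_tokens: list[str]   # Y-side chunk, pre-tokenized


def function_word_pairs(chunks, content_x: set[str], content_y: set[str]):
    """Keep chunks whose pivot spans match exactly; strip content words from each side."""
    pairs = []
    for c in chunks:
        if c.pivot_x != c.pivot_y:
            continue  # pivot portions differ, so no pair is taken from this chunk
        fx = "".join(t for t in c.x_tokens if t not in content_x)
        fy = "".join(t for t in c.y_tokens if t not in content_y)
        if fx and fy:  # both sides must leave some function words behind
            pairs.append((fx, fy))
    return pairs


# Illustrative tokens only (not taken from FIG. 10).
chunks = [
    AlignedChunk("for me", ["私", "としては"], "for me", ["나", "로서는"]),
    AlignedChunk("in class", ["授業", "で"], "at class", ["수업", "에서"]),
]
print(function_word_pairs(chunks, {"私", "授業"}, {"나", "수업"}))
```

The second chunk is skipped because its pivot spans ("in class" and "at class") differ, mirroring the preposition mismatch described for pair 242.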

[Operation]
Referring to FIGS. 1 to 10, the automatic dictionary creation system 20 configured as described above operates as follows. First, referring to FIG. 1, the user enters into the input device 50 information 30 specifying the first language X and information 32 specifying the second language Y. The input device 50 passes this information to the content word dictionary selection unit 52 and to the function word dictionary selection unit 60.

The content word dictionary selection unit 52 refers to the dictionary attribute storage device 36 and determines a language Z for which both dictionaries _X(X⇒Z) and _Y(Y⇒Z) exist. One way to choose Z is by entry count: select the Z for which the two dictionaries together have the largest number of entries.
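The entry-count criterion for choosing Z can be sketched as below. The catalog layout, a list of (user, source, target, entries) tuples, is a hypothetical stand-in for the dictionary attribute storage device 36. When several dictionaries share a direction, the sketch assumes the largest one is used. Ties go to whichever Z is encountered first.

```python
def choose_pivot(catalog, x: str, y: str):
    """Pick the pivot Z maximizing entries(_X(X=>Z)) + entries(_Y(Y=>Z)).

    catalog: iterable of (user, source, target, entries) tuples, a hypothetical
    layout for dictionary attribute storage 36. Returns None if no Z exists.
    """
    # Keep the largest dictionary for each (user, source, target) direction.
    size: dict[tuple[str, str, str], int] = {}
    for user, src, tgt, n in catalog:
        size[(user, src, tgt)] = max(n, size.get((user, src, tgt), 0))

    best_z, best_total = None, -1
    for (user, src, z), nx in size.items():
        if user == x and src == x and (y, y, z) in size:
            total = nx + size[(y, y, z)]
            if total > best_total:
                best_z, best_total = z, total
    return best_z


# Illustrative catalog; the entry counts are made up.
catalog = [("K", "K", "E", 500), ("J", "J", "E", 300),
           ("K", "K", "C", 900), ("J", "J", "C", 200),
           ("J", "J", "F", 50)]
print(choose_pivot(catalog, "K", "J"))
```

Here Z = C wins with 1,100 combined entries against 800 for E. F is ignored because no K⇒F dictionary exists.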

The function word dictionary selection unit 60, in turn, refers to the dictionary attribute storage device 36 and determines a language W for which both dictionaries _X(W⇒X) and _Y(W⇒Y) exist. W may be the same as the language Z above, or it may differ. To simplify the description, W = Z is assumed below; this entails no loss of generality.

The content word dictionary selection unit 52 passes information identifying the selected dictionaries, namely their file names and path names, to the content word translation extraction processing unit 54. On receiving this information, unit 54 accesses the two dictionaries in the dictionary storage device 38 (or on a network). It then creates content word pairs and stores them in the content word pair storage unit 56.

More specifically, the content word translation extraction processing unit 54 operates as follows. Referring to FIG. 5, the word pair extraction unit 180 extracts all word pairs from the specified dictionaries 170 and 172 and passes them to the similarity calculation unit 182.

For every word pair (x, y), the similarity calculation unit 182 calculates the similarity S₁(x, y) according to equation (1). The two sets compared are Z(x), the language-Z translations in entry x of dictionary _X(X⇒Z), and Z(y), the language-Z translations in entry y of dictionary _Y(Y⇒Z). Unit 182 passes the result to the content word pair selection unit 184.

The content word pair selection unit 184 refers to the threshold table storage unit 186. It obtains the threshold corresponding to the number of elements of Z(x) ∩ Z(y), the set of language-Z words the two entries share. It then compares the similarity S₁ received from the similarity calculation unit 182 with this threshold:
- If S₁(x, y) is greater than the threshold, unit 184 adopts the word pair (x, y) as a content word pair and writes it to the content word pair storage unit 56.
- If S₁(x, y) is less than or equal to the threshold, unit 184 rejects the pair.

This processing is repeated for every word pair extracted by the word pair extraction unit 180. As a result, the set of content word pairs between language X and language Y accumulates in the content word pair storage unit 56.
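Units 180 to 184 together can be sketched as one loop over all entry combinations. The sketch assumes each dictionary is held as a mapping from headword to its set of pivot translations. It uses the strict ">" comparison of the operation description above; the structural description and the claims state "equal to or greater than".

Pairs with no shared translation are skipped early, which does not change the result, since S₁ = 0 cannot exceed any τ ≥ 0. The τ values in `placeholder_tau` are invented for illustration, because FIG. 8 is not reproduced in the text. "K190" is a placeholder label for Korean word 190.

```python
from itertools import product


def extract_content_pairs(dict_x: dict[str, set[str]],
                          dict_y: dict[str, set[str]],
                          table: dict[int, float]) -> list[tuple[str, str]]:
    """Sketch of units 180-184: adopt (x, y) when S1(x, y) exceeds tau.

    dict_x: {X headword: set of Z translations}, from _X(X=>Z)
    dict_y: {Y headword: set of Z translations}, from _Y(Y=>Z)
    table:  tau for 1..3 shared translations (FIG. 8); tau = 0 from 4 upward.
    """
    adopted = []
    for (wx, zx), (wy, zy) in product(dict_x.items(), dict_y.items()):
        common = len(zx & zy)
        if common == 0:
            continue  # S1 = 0 cannot exceed any tau >= 0, so skip early
        tau = 0.0 if common >= 4 else table[common]
        s1 = 2 * common / (len(zx) + len(zy))  # equation (1)
        if s1 > tau:  # strict comparison, as in the operation description
            adopted.append((wx, wy))
    return adopted


# "K190" is a placeholder label for Korean word 190 (the word itself is in FIG. 6).
korean = {"K190": {"check", "cheque"}}
japanese = {
    "小切手": {"check", "cheque"},
    "防止する": {"check", "prevent", "prevent from"},
    "点検する": {"examine", "check"},
    "照合": {"check"},
    "預ける": {"leave", "deposit", "check", "entrust"},
}
placeholder_tau = {1: 0.7, 2: 0.5, 3: 0.4}  # ILLUSTRATIVE ONLY, not FIG. 8
print(extract_content_pairs(korean, japanese, placeholder_tau))
```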

Once all the content word pairs have been accumulated in the content word pair storage unit 56, the function word translation extraction processing unit 62 shown in FIG. 1 extracts function word pairs. More specifically, unit 62 operates as follows.

Referring to FIG. 9, the example sentence extraction unit 220 works on the two dictionaries 210 (_X(Z⇒X)) and 212 (_Y(Z⇒Y)) selected by the function word dictionary selection unit 60. For each language-Z word z, it extracts the example sentences from the two entries for z. It does this for every word z and passes the extracted example sentences to the matching sentence extraction unit 222.

The matching sentence extraction unit 222 examines the example sentences obtained from the two entries for word z. It determines whether any language-Z example sentences from the two entries are identical or similar to each other, and passes such sentences to the alignment processing unit 224.
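The pairing step of unit 220 can be sketched as follows. It assumes each dictionary is a mapping from a pivot headword to a list of (pivot example, translation) tuples, which is a hypothetical layout. Headwords are visited in sorted order only so that the output is deterministic. The sample entries are illustrative, not drawn from the patent.

```python
from itertools import product


def example_pairs(dict_zx, dict_zy):
    """Yield every combination of examples listed under the same pivot headword.

    dict_zx: {Z headword: [(Z example, X translation), ...]} from _X(Z=>X)
    dict_zy: {Z headword: [(Z example, Y translation), ...]} from _Y(Z=>Y)
    """
    for z in sorted(dict_zx.keys() & dict_zy.keys()):  # headwords in both dictionaries
        for ex_x, ex_y in product(dict_zx[z], dict_zy[z]):
            yield z, ex_x, ex_y


# Illustrative entries only.
dict_zx = {"free": [("I am free.", "私は自由だ。")], "cat": [("a cat", "猫")]}
dict_zy = {"free": [("I am free.", "나는 자유롭다."), ("free time", "자유 시간")]}
pairs = list(example_pairs(dict_zx, dict_zy))
identical = [p for p in pairs if p[1][0] == p[2][0]]  # pivot sentences match exactly
print(len(pairs), identical)
```

The exact-match filter on the last lines stands in for the simplest case handled by unit 222. Similar but non-identical sentences would additionally need a word-overlap test.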

The alignment processing unit 224 receives the example sentences passed from the matching sentence extraction unit 222. For each pair of language-X and language-Y sentences that translate the same language-Z example sentence, it performs alignment using the content word pairs stored in the content word pair storage unit 56. It then passes the aligned sentence pairs to the function word pair selection unit 226.

The function word pair selection unit 226 considers only those chunks of the aligned sentences whose corresponding language-Z portions match exactly. From these chunks, it removes the content words of each language by referring to the content word pair storage unit 56. It then pairs the remaining function words and passes the pairs to the function word pair storage unit 64.

This processing is repeated for every example sentence extracted by the example sentence extraction unit 220. As a result, function word pairs between language X and language Y accumulate in the function word pair storage unit 64.

Referring to FIG. 1, the merge processing unit 70 merges the many content word pairs stored in the content word pair storage unit 56 with the many function word pairs stored in the function word pair storage unit 64. The result is a single dictionary-format file, which forms the electronic bilingual dictionary 40 between language X and language Y.

As described above, in the automatic dictionary creation system 20 of this embodiment, the user specifies the two languages X and Y at the outset. The system then selects dictionaries with the directionality suited to creating content word pairs, and dictionaries with the directionality suited to creating function word pairs, and creates both kinds of pairs from them. Merging the pairs yields the electronic bilingual dictionary 40. Because the dictionaries for each process are selected to yield as many entry pairs as possible, the resulting dictionary 40 also has many entries, which makes it useful for machine translation.

In the above embodiment, the electronic bilingual dictionary is created from both content word pairs and function word pairs. However, the present invention is not limited to this embodiment; for example, a dictionary containing only content word pairs may be created. For pairs of agglutinative languages such as Japanese and Korean, the method of the embodiment extracts function word pairs with relatively high precision. For some other language combinations, however, the precision may be lower. In such cases, the user may be allowed to choose to skip function word pair extraction.

In the above embodiment, the intermediary language Z is determined by the automatic dictionary creation system 20. As described above, one criterion is to choose the language that gives the largest combined number of entries. Other criteria are also possible: prioritizing certain languages, taking the type of language into account, or considering the era or year in which the dictionaries were created. Alternatively, the user may specify the intermediary language Z explicitly instead of leaving the choice to the system 20. In that case too, if several dictionary combinations are possible, it is preferable to select the combination that maximizes the total number of entries.

Furthermore, the same intermediary language may be used both for creating content word pairs and for creating function word pairs.

Furthermore, in the above embodiment, all available dictionaries are stored in advance in the dictionary storage device 38, but the present invention is not limited to such an embodiment. For example, a dictionary may reside at a remote site and be accessed over a network. Alternatively, a dictionary stored on some storage medium may be loaded into a reader by a person, as directed by the automatic dictionary creation system 20.

[Computer Implementation]
The automatic dictionary creation system 20 of this embodiment can be implemented by computer hardware, a program executed by that hardware, and data stored in it. FIG. 11 shows the external appearance of a computer system 330, and FIG. 12 shows its internal configuration.

Referring to FIG. 11, the computer system 330 includes:
- a computer 340 having an FD (flexible disk) drive 352 and a CD-ROM (compact disc read-only memory) drive 350;
- a keyboard 346;
- a mouse 348; and
- a monitor 342.

Referring to FIG. 12, in addition to the FD drive 352 and the CD-ROM drive 350, the computer 340 includes:
- a CPU (central processing unit) 356;
- a bus 366 connected to the CPU 356, the FD drive 352, and the CD-ROM drive 350;
- a read-only memory (ROM) 358 that stores a boot-up program and the like; and
- a random access memory (RAM) 360, connected to the bus 366, that stores program instructions, system programs, work data, and the like.

The computer system 330 further includes a printer 344.

Although not shown here, the computer 340 may further include a network adapter board that provides a connection to a local area network (LAN).

A computer program causes the computer system 330 to operate as the automatic dictionary creation system 20 of this embodiment. The program is stored on a CD-ROM 362 or an FD 364, inserted into the CD-ROM drive 350 or the FD drive 352, and then transferred to the hard disk 354. Alternatively, the program may be transmitted to the computer 340 over a network (not shown) and stored on the hard disk 354. For execution, the program is loaded into the RAM 360. It may also be loaded into the RAM 360 directly from the CD-ROM 362, from the FD 364, or over the network.

This program includes instructions that cause the computer 340 to operate as the automatic dictionary creation system 20 of this embodiment. Some of the basic functions needed to carry out this method are provided by other software running on the computer 340: the operating system (OS), third-party programs, or modules of various toolkits installed on it. The program therefore need not include every function required to operate as the system 20. It need only include the instructions that achieve this operation by calling the appropriate functions or "tools" in a controlled manner, so as to obtain the desired result. The operation of the computer system 330 is well known and is not repeated here.

The embodiment disclosed herein is merely an example, and the present invention is not limited to it. The scope of the present invention is defined by each of the claims, read in light of the detailed description. It includes all modifications within the meaning and scope equivalent to the language of the claims.

FIG. 1 is a block diagram of an automatic dictionary creation system 20 according to an embodiment of the present invention.
FIG. 2 schematically shows an example configuration of the dictionary storage device 38.
FIG. 3 schematically shows an example configuration of the dictionary attribute storage device 36.
FIG. 4 illustrates the directionality of dictionaries.
FIG. 5 is a block diagram of the content word translation extraction processing unit 54.
FIG. 6 illustrates the similarity between Korean and Japanese words.
FIG. 7 illustrates the similarity between Korean and Japanese words.
FIG. 8 is a table of the Korean-Japanese similarity thresholds, which vary with the number of common translations.
FIG. 9 is a block diagram of the function word translation extraction processing unit 62.
FIG. 10 illustrates the principle of function word pair extraction by the function word translation extraction processing unit 62.
FIG. 11 is an external view of a computer system that implements the automatic dictionary creation system 20 according to an embodiment of the present invention.
FIG. 12 is a block diagram of the computer system shown in FIG. 11.

Explanation of symbols

20: automatic dictionary creation system
30: information specifying the first language
32: information specifying the second language
34: automatic dictionary creation device
36: dictionary attribute storage device
38: dictionary storage device
40: electronic bilingual dictionary
50: input device
52: content word dictionary selection unit
54: content word translation extraction processing unit
56: content word pair storage unit
60: function word dictionary selection unit
62: function word translation extraction processing unit
64: function word pair storage unit
70: merge processing unit
180: word pair extraction unit
182: similarity calculation unit
184: content word pair selection unit
186: threshold table storage unit
220: example sentence extraction unit
222: matching sentence extraction unit
224: alignment processing unit
226: function word pair selection unit

Claims

1. A bilingual dictionary creation method for automatically creating a bilingual dictionary between a first language and a second language by using a third language as an intermediary, the method comprising:
preparing, in an electronically readable form, a first dictionary for speakers of the first language having directionality from the first language to the third language, and a second dictionary for speakers of the second language having directionality from the second language to the third language;
extracting, from among entry pairs each consisting of an entry for a content word in the first dictionary and an entry for a content word in the second dictionary, each entry pair whose similarity value, defined as a predetermined function of the sets of third-language translations contained in the respective entries, is greater than or equal to a predetermined threshold; and
a first step of storing, in an electronically readable form, the content word pairs corresponding to the entry pairs extracted in the extracting step.

2. The bilingual dictionary creation method according to claim 1, wherein the preparing step comprises:
providing a plurality of electronically readable bilingual dictionaries, each intended for speakers of a predetermined language and having a predetermined directionality;
receiving a designation of the first language and the second language; and
selecting, from the plurality of bilingual dictionaries, a dictionary pair consisting of a dictionary having directionality from the first language designated in the receiving step to another language, and a dictionary having directionality from the second language designated in the receiving step to that other language.

3. The bilingual dictionary creation method according to claim 1, wherein the preparing step comprises:
providing a plurality of electronically readable bilingual dictionaries, each intended for speakers of a predetermined language and having a predetermined directionality;
receiving a designation of the first language, the second language, and the third language; and
selecting, from the plurality of bilingual dictionaries, a dictionary pair consisting of a dictionary having directionality from the first language designated in the receiving step to the third language designated in the receiving step, and a dictionary having directionality from the second language designated in the receiving step to that third language.

4. The bilingual dictionary creation method according to any one of claims 1 to 3, further comprising:
preparing, in an electronically readable form, a third dictionary for speakers of the first language having directionality from the third language to the first language, and a fourth dictionary for speakers of the second language having directionality from the third language to the second language;
creating bilingual sentence pairs by extracting third-language example sentences that satisfy a predetermined condition, together with their translations, from both the third-dictionary entry and the fourth-dictionary entry that correspond to the same third-language headword;
aligning the translated sentences within each bilingual sentence pair created in the creating step;
a second step of extracting function word pairs from the chunks aligned with each other in the aligning step and storing them in an electronically readable form; and
merging the content word pairs stored in the first step with the function word pairs stored in the second step.

5. A computer program that, when executed by a computer, causes the computer to execute all the steps of the bilingual dictionary creation method according to any one of claims 1 to 4.

6. A bilingual dictionary creation device for automatically creating a bilingual dictionary between a first language and a second language by using a third language as an intermediary, the device comprising:
dictionary preparing means for preparing, in an electronically readable form, a first dictionary for speakers of the first language having directionality from the first language to the third language, and a second dictionary for speakers of the second language having directionality from the second language to the third language;
entry pair extraction means for extracting, from among entry pairs each consisting of an entry for a content word in the first dictionary and an entry for a content word in the second dictionary, each entry pair whose similarity value, defined as a predetermined function of the sets of third-language translations contained in the respective entries, is greater than or equal to a predetermined threshold; and
content word pair storage means for storing, in an electronically readable form, the content word pairs corresponding to the entry pairs extracted by the entry pair extraction means.
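The content word extraction of claim 1 leaves both the similarity function and the threshold unspecified ("a predetermined function", "a predetermined threshold"). The Python sketch below illustrates one possible reading, assuming the Dice coefficient over the two sets of third-language translations and a threshold that depends on the number of shared translations, in the spirit of the threshold table of Fig. 8. The names `dice`, `extract_content_word_pairs`, and `THRESHOLDS`, as well as the threshold values, are hypothetical illustrations rather than the specification's actual function or table.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical threshold table: minimum similarity required, keyed by the
# number of shared third-language translations. Illustrative values only;
# they are NOT taken from the table of Fig. 8.
THRESHOLDS: Dict[int, float] = {1: 1.0, 2: 0.6, 3: 0.4}
DEFAULT_THRESHOLD = 0.3  # used when four or more translations are shared


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient of two translation sets (an assumed similarity function)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def extract_content_word_pairs(
    first: Dict[str, Set[str]],   # first-language headword -> third-language translations
    second: Dict[str, Set[str]],  # second-language headword -> third-language translations
) -> List[Tuple[str, str, float]]:
    """Return (word1, word2, score) for every entry pair whose score meets its threshold."""
    pairs = []
    for w1, t1 in first.items():
        for w2, t2 in second.items():
            common = len(t1 & t2)
            if common == 0:
                continue  # no shared translation, so never a candidate
            score = dice(t1, t2)
            if score >= THRESHOLDS.get(common, DEFAULT_THRESHOLD):
                pairs.append((w1, w2, score))
    return pairs
```

With a toy Korean dictionary {학교: {school, academy}, 사과: {apple, apology}} and a toy Japanese dictionary {学校: {school, academy, college}, 林檎: {apple}}, only 학교/学校 survives (two shared translations, score 0.8). The pair 사과/林檎 shares a single translation and scores about 0.67, below the hypothetical threshold of 1.0, which is the kind of polysemy-driven false match a count-dependent threshold is meant to filter.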
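Claims 2 and 3 select the two dictionaries from a larger collection according to the designated languages, using each dictionary's intended speakers and direction (the kind of information held in the dictionary attribute storage device 36). The following sketch assumes a simple attribute record; the `DictionaryInfo` fields and the function `select_dictionary_pairs` are hypothetical. Passing no pivot corresponds to claim 2 (any shared target language), and passing a pivot corresponds to claim 3 (a designated third language).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DictionaryInfo:
    """Attributes of one bilingual dictionary (hypothetical schema)."""
    name: str
    user_language: str    # language of the speakers the dictionary is written for
    source_language: str  # directionality: source_language -> target_language
    target_language: str


def select_dictionary_pairs(
    catalog: List[DictionaryInfo],
    first: str,
    second: str,
    pivot: Optional[str] = None,
) -> List[Tuple[DictionaryInfo, DictionaryInfo]]:
    """Pair a first->X dictionary for first-language speakers with a
    second->X dictionary for second-language speakers.

    pivot=None accepts any shared target language X (claim 2);
    otherwise X must equal the designated pivot (claim 3).
    """
    d1s = [d for d in catalog if d.user_language == first and d.source_language == first]
    d2s = [d for d in catalog if d.user_language == second and d.source_language == second]
    pairs = []
    for d1 in d1s:
        for d2 in d2s:
            if d1.target_language != d2.target_language:
                continue  # both dictionaries must point to the same language
            if pivot is not None and d1.target_language != pivot:
                continue
            if d1.target_language in (first, second):
                continue  # the intermediary must be a third language
            pairs.append((d1, d2))
    return pairs
```

For example, with Korean-English and Korean-Chinese dictionaries for Korean speakers and Japanese-English and Japanese-Chinese dictionaries for Japanese speakers, designating Korean and Japanese alone yields both the English-pivot and the Chinese-pivot pairs, whereas additionally designating English as the third language yields only the English-pivot pair. An English-Japanese dictionary written for English speakers is never selected, because the claims require each dictionary to be intended for speakers of its source language.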
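Claim 4 obtains function word pairs from example sentences that appear under the same third-language headword in both the third and fourth dictionaries, aligns their translations, and merges the result with the content word pairs. The claim does not say how alignment is done. The sketch below substitutes a deliberately simple technique: each translation is assumed to be pre-segmented into (content word, function word) chunks, and two chunks are treated as aligned when their content words already form a known content word pair. The data layout, the "identical example sentence" condition, and the function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a translated sentence is pre-segmented into
# chunks of (content_word, trailing_function_word); "" means no function word.
Chunk = Tuple[str, str]
# third-language headword -> {third-language example sentence -> chunked translation}
ExampleDict = Dict[str, Dict[str, List[Chunk]]]


def make_sentence_pairs(
    dict3: ExampleDict, dict4: ExampleDict
) -> List[Tuple[List[Chunk], List[Chunk]]]:
    """Pair the translations of identical example sentences under the same headword."""
    pairs = []
    for headword, examples1 in dict3.items():
        examples2 = dict4.get(headword, {})
        for sentence, chunks1 in examples1.items():
            # Assumed "predetermined condition": the example sentence is identical.
            if sentence in examples2:
                pairs.append((chunks1, examples2[sentence]))
    return pairs


def extract_function_word_pairs(
    sentence_pairs: List[Tuple[List[Chunk], List[Chunk]]],
    content_pairs: Set[Tuple[str, str]],
    min_count: int = 1,
) -> Dict[Tuple[str, str], int]:
    """Align chunks whose content words form a known content word pair,
    then count the function words attached to the aligned chunks."""
    counts: Counter = Counter()
    for chunks1, chunks2 in sentence_pairs:
        for cw1, fw1 in chunks1:
            for cw2, fw2 in chunks2:
                if (cw1, cw2) in content_pairs and fw1 and fw2:
                    counts[(fw1, fw2)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


def merge_dictionary(
    content_pairs: Set[Tuple[str, str]],
    function_pairs: Dict[Tuple[str, str], int],
) -> List[Tuple[str, str, str]]:
    """Merge both kinds of pairs into one list of (word1, word2, kind) entries."""
    merged = [(w1, w2, "content") for w1, w2 in sorted(content_pairs)]
    merged += [(f1, f2, "function") for f1, f2 in sorted(function_pairs)]
    return merged
```

For a shared English example sentence "I go to school." translated as 나/는 학교/에 가다 and 私/は 学校/に 行く, the known content word pairs 나/私 and 학교/学校 anchor the chunks, producing the function word pairs 는/は and 에/に. The threshold `min_count` is a placeholder for whatever selection rule the function word pair selection unit 226 applies; a real system would also need a morphological analyzer to produce the chunks.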