JPH0997256A

JPH0997256A - Dictionary registration device and machine translation device

Info

Publication number: JPH0997256A
Application number: JP7251263A
Authority: JP
Inventors: Mihoko Kitamura (北村美穂子); Hideki Yamamoto (山本秀樹); Mitsuo Shimohata (下畑光夫)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1995-09-28
Filing date: 1995-09-28
Publication date: 1997-04-08
Anticipated expiration: 2015-09-28
Also published as: JP3429612B2

Abstract

PROBLEM TO BE SOLVED: To enable document-corresponding dictionaries to be created and set automatically in units based on the characteristics of documents, by extracting the features of an input document, deciding whether a document-corresponding dictionary needs to be created, creating one if necessary, and performing registration.
SOLUTION: A document feature extraction unit 9 extracts document feature information from an input Japanese document. A document-corresponding dictionary determination unit 11 calculates the similarity between the extracted document feature information and the dictionary feature information for each document-corresponding dictionary 16, 17, ... held in a dictionary feature database 10, and identifies the document-corresponding dictionary whose dictionary feature information has the highest similarity. When several document-corresponding dictionaries share the highest similarity, the user selects one of them. Once the document-corresponding dictionary to be registered has been determined in this way (including the case where a new one is created), the dictionary name determined by the determination unit 11 is set in the dictionary registration unit 14 of a dictionary interface unit 4. When the user inputs a word to be registered, the word is registered in the document-corresponding dictionary having that dictionary name.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a dictionary creation device and a machine translation device, and more particularly to devices having functions for creating a plurality of individual dictionaries and for using them.

[0002]

2. Description of the Related Art
In a machine translation device, maintaining high translation quality (the quality of the translated text) requires dictionaries created individually by users and technical-term dictionaries, and various devices equipped with several different types of dictionaries have been proposed. Even when a machine translation device holds multiple individual dictionaries and technical-term dictionaries, achieving higher translation quality requires selecting, from among them, the dictionary best suited to the input document and using that dictionary.

[0003] Conventionally, the user of the machine translation device selected the dictionary manually. If the machine translation device could automatically select the optimal dictionary for an input document, translation work would become more efficient and higher-quality translations could be obtained.

[0004] Reference 1: Japanese Patent Laid-Open No. 6-332946.
The machine translation device described in Reference 1 proposes the following way of automatically selecting, from several types of dictionaries, the dictionary best suited to a document to be translated: context vectors described in a basic-word dictionary are extracted from the input document, and the specialized field of the input document is judged from those context vectors. Each dictionary is assigned the code of the specialized field associated with its context vectors, and the dictionary whose field code matches the specialized field (context vector) judged from the input document is selected.

[0005]

Problems to Be Solved by the Invention
However, in the dictionary selection method described in Reference 1, the field code of each specialized dictionary is fixed when the dictionary is created, so dictionaries can be selected only within those fixed specialized fields.

[0006] That is, all that can be estimated from the context vectors obtained from the input document is the technical-term field code (for example, "chemistry" or "information processing") assigned to a dictionary in advance when the individual dictionary was created; dictionaries cannot be classified or selected according to any characteristic of the input document other than that field code. For example, consider the translation of the word 構造: even within the field of "information processing", it may be translated as "structure" or as "construction" depending on the author and content of the document, and the classification "information processing" alone cannot distinguish between these translations.

[0007] In general, when a user creates a dictionary, there is already a document to be translated, and the user creates the dictionary needed for translation in order to improve the translation quality of that document or of similar documents. It is therefore desirable that dictionaries be prepared and selected not by a fixed classification such as specialized field, but by a classification based on the type and characteristics of the documents to be translated.

[0008] Accordingly, there is a need for a dictionary registration device that can create and register dictionaries so that the optimal dictionary for a document to be translated can be selected without assigning specialized field codes, and for a machine translation device that can make effective use of such dictionaries in translation.

[0009]

Means for Solving the Problems
To solve these problems, the first aspect of the present invention is a dictionary registration device that creates and registers dictionaries used in a machine translation device, characterized by having the following means.

[0010] Specifically, the dictionary registration device according to the first aspect of the present invention comprises:
(A) document feature extraction means for extracting, from an input document written in a natural language, document feature information that includes at least appearance-frequency information for the words and idioms in that input document;
(B) dictionary feature storage means for storing, for each of one or more already-created document-corresponding dictionaries, dictionary feature information in the same format as the document feature information;
(C) similarity determination means for obtaining the similarity between the document feature information extracted by the document feature extraction means and each item of dictionary feature information stored in the dictionary feature storage means, and for determining, based on the obtained similarities, whether a new document-corresponding dictionary needs to be created and, if not, which existing document-corresponding dictionary is to be used for registration; when a new document-corresponding dictionary is created, the similarity determination means stores the extracted document feature information in the dictionary feature storage means as that dictionary's feature information, and when an existing document-corresponding dictionary is chosen for registration, it updates the dictionary feature information stored in the dictionary feature storage means according to the extracted document feature information; and
(D) dictionary registration means for creating a new document-corresponding dictionary if necessary, based on the determination result of the similarity determination means, and then registering entries in the document-corresponding dictionary so determined.

[0011] With the dictionary registration device of the first aspect, document-corresponding dictionaries can be created and set automatically in units that follow the characteristics of the input documents, rather than in fixed field units.

[0012] In the second aspect of the present invention, a machine translation device having a dictionary registration configuration comprises means (A) to (D) above and further comprises the following means.

[0013] Specifically, the machine translation device of the second aspect further comprises:
(E) second document feature extraction means for extracting, from an input document to be translated, document feature information that includes at least appearance-frequency information for the words and idioms in that document;
(F) second similarity determination means for obtaining the similarity between the document feature information extracted by the second document feature extraction means and each item of dictionary feature information stored in the dictionary feature storage means, and for determining, based on the obtained similarities, one or more document-corresponding dictionaries to be used in translation processing; and
(G) translation execution means for translating the input document to be translated using the contents of the document-corresponding dictionaries determined to be used in translation processing.

[0014] Because the machine translation device of the second aspect extracts document features from the document to be translated and selects the most suitable dictionaries from a plurality of document-corresponding dictionaries organized by document characteristics, the same word can be translated differently depending on the input document, and translation quality can be improved considerably compared with conventional devices that prepare dictionaries in fixed field units.

[0015]

BEST MODE FOR CARRYING OUT THE INVENTION
An embodiment in which the present invention is applied to a Japanese-English machine translation device is described in detail below with reference to the drawings.

[0016] The Japanese-English machine translation device of this embodiment has dictionary creation and registration functions, and the configuration that implements them constitutes an embodiment of the dictionary creation device according to the present invention. That is, the machine translation device of this embodiment has two operating modes: a dictionary registration mode and a translation mode.

[0017] (A) First Embodiment
In practice, the machine translation device of the first embodiment is implemented on an information processing device such as a workstation, minicomputer, or personal computer equipped with input devices such as a keyboard and mouse, output devices such as a CRT display, liquid crystal display, or printer, and an auxiliary storage device such as a hard disk drive. When its main components are divided into functional units according to the features of this embodiment, it has the configuration shown in the functional block diagram of FIG. 1.

[0018] In FIG. 1, the machine translation device of the first embodiment comprises: a user interface unit 1, through which the user inputs documents to be translated and through which translation results are displayed to the user; a dictionary determination unit 2, which extracts the features of an input document for dictionary creation or of a document to be translated and determines whether a dictionary should be created and which dictionary to use; a dictionary interface unit 4, which issues requests such as registration to the dictionary determined by the dictionary determination unit 2 and retrieves from that dictionary the contents used for translation processing; a translation execution unit 3, which executes translation processing; and a dictionary storage unit 5, in which a plurality of dictionaries are stored.

[0019] The user interface unit 1 consists of a dictionary creation unit 6, which provides the user interface when the user registers or creates a dictionary; a document input unit 7, which takes in documents for dictionary creation or translation; and a translation result output unit 8, which presents and outputs translation results to the user. Information indicating whether the device is in the dictionary registration mode or the translation mode is, for example, taken in before the document input unit 7 takes in the input document. In the first embodiment, the document input unit 7 takes in only documents in the source language, Japanese, in the dictionary registration mode, and takes in the Japanese document to be translated in the translation mode.

[0020] The dictionary determination unit 2 consists of a document feature extraction unit 9, a dictionary feature database (dictionary feature DB) 10, and a document-corresponding dictionary determination unit 11.

[0021] In the dictionary registration mode, the document feature extraction unit 9 extracts document feature information from the source-language Japanese document. In the translation mode, it extracts document feature information from the Japanese document to be translated.

[0022] To extract document feature information, the document feature extraction unit 9 performs morphological analysis and similar processing. This analysis requires dictionary contents; for example, at least the contents of the general-purpose dictionary 19 described later are used. Alternatively, all the contents stored in the dictionary storage unit 5 may be used for the morphological analysis. The document feature extraction unit 9 may also include a built-in dictionary unit that stores the dictionary contents transferred from the dictionary storage unit 5 for this processing, so that they can be accessed directly.

[0023] The dictionary feature database 10 stores feature information (hereinafter, dictionary feature information) for each document-corresponding dictionary stored in the dictionary storage unit 5.

[0024] In both the dictionary registration mode and the translation mode, the document-corresponding dictionary determination unit 11 identifies, from the dictionary feature information in the dictionary feature database 10, the existing document-corresponding dictionary (described later) whose dictionary feature information is most similar to the document feature information extracted by the document feature extraction unit 9. In the dictionary registration mode, it further determines from that result whether a document-corresponding dictionary needs to be created for the current input document, and it updates the dictionary feature database 10 as appropriate based on the similarity determination result and related information.

[0025] FIG. 2 is an explanatory diagram of an example of document feature information. Regardless of whether the device is in the dictionary registration mode or the translation mode, the document feature information 202 in this example is basically, as shown in FIG. 2(B), the frequency distribution 2024 of words of a predetermined part of speech (for example, nouns) that appear a predetermined number of times or more (for example, five or more times) in the input Japanese (source-language) document 201 shown in FIG. 2(A). In this example, bibliographic information input along with the input document, such as the file name (filename) 2021, editor information (editor) 2022, and user information (user) 2023, also forms part of the document feature information.
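As a rough illustration of the extraction described above, the following Python sketch builds document feature information from text that has already been morphologically analysed. It assumes the analyser output is a list of (surface form, part of speech) pairs; the names `FeatureInfo` and `extract_document_features` and the sample tokens are hypothetical and are not taken from the patent or FIG. 2.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FeatureInfo:
    """Hypothetical container mirroring the fields of FIG. 2(B)."""
    filename: str
    editor: str
    user: str
    word_freq: dict = field(default_factory=dict)  # word -> frequency


def extract_document_features(tokens, filename, editor, user,
                              target_pos="noun", min_count=5):
    """Build document feature information from morphologically analysed text.

    tokens: iterable of (surface, part_of_speech) pairs, assumed to come
    from a morphological analyser that is not shown here.
    Only words of target_pos occurring at least min_count times are kept.
    """
    counts = Counter(surface for surface, pos in tokens if pos == target_pos)
    word_freq = {w: c for w, c in counts.items() if c >= min_count}
    return FeatureInfo(filename, editor, user, word_freq)


# Illustrative input only; this is not the content of FIG. 2(A).
tokens = ([("情報", "noun")] * 6 + [("技術", "noun")] * 5
          + [("研究", "noun")] * 2 + [("する", "verb")] * 9)
features = extract_document_features(tokens, "report.txt", "editor_a", "user_a")
# features.word_freq == {"情報": 6, "技術": 5}
```

The noun filter and the threshold of five are exposed as parameters because the text gives them only as examples of a "predetermined" part of speech and count.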

[0026] FIG. 3 is an explanatory diagram of an example of the dictionary feature information stored in the dictionary feature database 10. FIG. 3 shows information examples 301, 302, and 303 for the three document-corresponding dictionaries 16, 17, and 18 (described later) in the dictionary storage unit 5. As a comparison with FIG. 2 makes clear, their structure is the same as that of the document feature information.

[0027] The document-corresponding dictionary determination unit 11 described above, for example, takes the similarity between document feature information and dictionary feature information to be the number of words (nouns) common to both, and judges the document-corresponding dictionary that contains the most words belonging to the document feature information to be the most similar.

[0028] The translation execution unit 3 operates only in the translation mode and consists of a translation processing unit 12, which executes translation processing, and a dictionary unit 13, which stores the dictionary contents used for translation processing.

[0029] The translation processing unit 12 is the same as those installed in existing machine translation devices, so a detailed description is omitted. Seen in more detail, the translation processing unit 12 contains a morphological analysis unit, a syntactic analysis unit, and so on for the source-language (Japanese) document; these can be shared with the morphological analysis unit, syntactic analysis unit, and so on in the document feature extraction unit 9. The dictionary unit 13 receives and stores predetermined dictionary contents, described later, transferred from the dictionary storage unit 5.

[0030] The dictionary interface unit 4 consists of a dictionary registration unit 14 and a translation interface unit 15.

[0031] The dictionary registration unit 14 operates in the dictionary registration mode. Based on the determination result of the document-corresponding dictionary determination unit 11 for the dictionary registration document, it either has entries added to an existing document-corresponding dictionary, or has a new document-corresponding dictionary created and entries registered in it.

[0032] The translation interface unit 15 operates in the translation mode. It transfers the stored contents of the dictionary that the document-corresponding dictionary determination unit 11 has determined for the document to be translated to the dictionary unit 13 in the translation execution unit 3.

[0033] The dictionary storage unit 5 stores a plurality of document-corresponding dictionaries 16, 17, 18, ... with different features, and a general-purpose dictionary 19. Each document-corresponding dictionary 16, 17, 18, ... stores dictionary contents formed from a particular dictionary registration document consisting of a Japanese document and an English document. The general-purpose dictionary 19, on the other hand, stores dictionary contents that apply generally to many documents to be translated.

[0034] FIG. 4 is a flowchart showing the operation of the machine translation device of the first embodiment in the dictionary registration mode.

[0035] When dictionary registration mode processing starts, the document input unit 7 takes in the Japanese document, input by the user, for which a dictionary is to be created (step 402). The document feature extraction unit 9 then extracts document feature information (202) from the input Japanese document (step 403).

[0036] The document-corresponding dictionary determination unit 11 calculates the similarity between the extracted document feature information (202) and the dictionary feature information (301, 302, 303, ...) corresponding to each document-corresponding dictionary 16, 17, 18, ... in the dictionary feature database 10, and identifies from the dictionary feature database 10 the document-corresponding dictionary whose dictionary feature information has the highest similarity (step 404).

[0037] To simplify the explanation, assume the following method of calculating the similarity of feature information. As described above, the document feature information and the dictionary feature information are centered on the set of words that appear five or more times (their frequency distribution).

[0038] First, if there is dictionary feature information with the same file name as the file name in the document feature information, its similarity is taken to be infinite. If no dictionary feature information has the same file name, the similarity is the number of words (words appearing five or more times) listed in both the document feature information and the dictionary feature information. However, if the number of words listed in both is less than or equal to a predetermined number (which may be 0), the similarity is set to 0.

[0039] Within the range of information shown in FIGS. 2 and 3, dictionary feature information 301 has a similarity of 2 to document feature information 202 (情報 "information" and 技術 "technology" overlap), dictionary feature information 302 has a similarity of 3 (情報 "information", 研究 "research", and 技術 "technology" overlap), and dictionary feature information 303 has a similarity of 1 (only 技術 "technology" overlaps). The highest similarity therefore belongs to dictionary feature information 302, and the document-corresponding dictionary 17, which has dictionary feature information 302, is determined to be the dictionary with the highest similarity.
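The rule of [0038] and the worked example of [0039] can be sketched in Python as follows. This is an illustrative sketch only: feature information is reduced to a file name plus the set of words that occur five or more times, and `similarity` and `min_shared` are hypothetical names. Only the overlapping words come from the text; the file names and the `*_other` words are placeholders standing in for the unlisted contents of FIGS. 2 and 3.

```python
import math


def similarity(doc, dic, min_shared=0):
    """Similarity rule of paragraph [0038] (sketch).

    doc, dic: dicts with a "filename" and a "words" set holding the words
    that occur five or more times.
    """
    if doc["filename"] == dic["filename"]:
        return math.inf  # same file name: treated as infinitely similar
    shared = len(doc["words"] & dic["words"])
    # At or below the predetermined count (min_shared), similarity is 0.
    return shared if shared > min_shared else 0


# Worked example of [0039]. Only the overlapping words are taken from the
# text; file names and the *_other words are placeholders.
doc_202 = {"filename": "doc_201", "words": {"情報", "研究", "技術", "doc_other"}}
dictionary_features = {
    16: {"filename": "f16", "words": {"情報", "技術", "d16_other"}},          # 301
    17: {"filename": "f17", "words": {"情報", "研究", "技術", "d17_other"}},  # 302
    18: {"filename": "f18", "words": {"技術", "d18_other"}},                  # 303
}
scores = {d: similarity(doc_202, f) for d, f in dictionary_features.items()}
best = max(scores, key=scores.get)
print(scores, best)  # {16: 2, 17: 3, 18: 1} 17
```

Using a set intersection counts each shared word once, which matches the text's definition of similarity as a number of common words rather than a sum of frequencies.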

[0040] If no document-corresponding dictionary has dictionary feature information similar to the document feature information of the input document, a new document-corresponding dictionary is created in the dictionary storage unit 5 (step 406). At this stage only the framework (file) of the dictionary is formed; its contents are registered later, as described below.

[0041] If, on the other hand, several document-corresponding dictionaries share the highest similarity, one is not selected automatically. Instead, all the candidate dictionaries are presented to the user through the dictionary creation unit 6, and the user selects one (step 407).
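The branching of steps 404 to 407 can be sketched as a small Python function. This is a sketch under stated assumptions, not the patent's implementation: a similarity of 0 is taken to mean "not similar" (consistent with the threshold rule of [0038]), and `ask_user` and `create_dictionary` are hypothetical callbacks standing in for the dictionary creation unit 6 and for creating an empty dictionary file in the dictionary storage unit 5.

```python
def choose_registration_dictionary(scores, ask_user, create_dictionary):
    """Steps 404-407 as a sketch.

    scores: {dictionary_id: similarity}; 0 is taken to mean "not similar".
    ask_user(candidates) -> chosen id (stands in for dictionary creation unit 6).
    create_dictionary() -> id of a new, still empty dictionary (step 406).
    Returns (dictionary_id, created_new).
    """
    best = max(scores.values(), default=0)
    if best <= 0:
        # Step 406: no similar dictionary exists, so create an empty one.
        return create_dictionary(), True
    candidates = [d for d, s in scores.items() if s == best]
    if len(candidates) > 1:
        # Step 407: several dictionaries tie for the highest similarity.
        return ask_user(candidates), False
    return candidates[0], False


# Usage with stub callbacks; 99 is a hypothetical id for a new dictionary.
print(choose_registration_dictionary({16: 2, 17: 3, 18: 1},
                                     ask_user=lambda c: c[0],
                                     create_dictionary=lambda: 99))
# (17, False)
```

Returning the `created_new` flag lets the later feature-database update distinguish the two cases described for step 410.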

[0042] Once the document-corresponding dictionary to be registered has been determined in this way (including the case where it is newly created), the dictionary name determined by the document-corresponding dictionary determination unit 11 is set in the dictionary registration unit 14 of the dictionary interface unit 4. When the user inputs a word to be registered (step 408), the word is registered in the document-corresponding dictionary having this dictionary name (step 409).

[0043] Any specific registration method may be used. For example, the device may check whether each word in the document feature information that appears five or more times is already listed in the target document-corresponding dictionary; if it is not, the word is presented to the user to confirm whether it should be registered, and if so, its English information is taken in and registered. Words appearing fewer than five times may also be included. Words may also be registered in the dictionary independently of feature extraction, so a registered word need not belong to the part of speech used for feature extraction. For example, contents that can be stored in the dictionary may be obtained in advance from bilingual documents by a method such as the one described in Reference 2, and registered through this registration operation.
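The example registration method of [0043] might look like the following Python sketch. It assumes a dictionary is a plain mapping from Japanese source word to English translation, and that `confirm` is a hypothetical callback representing the user's confirmation and English input; neither representation is specified in the text.

```python
def registration_candidates(doc_word_freq, dictionary, min_count=5):
    """Frequent document words that the target dictionary does not yet hold.

    doc_word_freq: {word: frequency}; dictionary: {source_word: translation}.
    [0043] allows lowering min_count so that rarer words are offered too.
    """
    return [w for w, c in doc_word_freq.items()
            if c >= min_count and w not in dictionary]


def register_interactively(doc_word_freq, dictionary, confirm, min_count=5):
    """Offer each candidate to the user and store the confirmed entries.

    confirm(word) is a hypothetical callback returning the English
    translation to register, or None to skip the word.
    """
    for word in registration_candidates(doc_word_freq, dictionary, min_count):
        translation = confirm(word)
        if translation is not None:
            dictionary[word] = translation
    return dictionary


# Illustrative data: dictionary 17 before registration.
dictionary_17 = {"情報": "information"}
freq = {"情報": 6, "構造": 5, "研究": 2}
register_interactively(freq, dictionary_17,
                       confirm=lambda w: "structure" if w == "構造" else None)
# dictionary_17 is now {"情報": "information", "構造": "structure"}
```

The candidate list is computed before any entry is added, so adding entries inside the loop does not disturb the iteration.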

[0044] Reference 2: Mihoko Kitamura and Yuji Matsumoto, "Automatic Acquisition of Translation Knowledge from Bilingual Parallel Corpora" (in Japanese), IEICE Technical Report (NLC), Vol. 94, No. 32 (2), pp. 9-16.

Here, when the document-corresponding dictionary 17 has been set in the dictionary registration unit 14 of the dictionary interface unit 4 and the user registers "構造: structure" for the word 構造 in document 201, that entry is registered in the document-corresponding dictionary 17.

[0045] When the user instructs the end of registration, the dictionary feature database 10 is updated (step 410), and the series of processing in the dictionary registration mode ends. If a document-corresponding dictionary was newly created, all of the document feature information is newly registered in the dictionary feature database 10 as the dictionary feature information of the newly created dictionary. If no document-corresponding dictionary was newly created, the word information in the document feature information that does not overlap with the existing dictionary feature information of the dictionary used for registration is added to that existing dictionary feature information in the dictionary feature database 10. For example, if the document-corresponding dictionary 17 was determined as the registration target, the information on the words in the document feature information 202 shown in FIG. 2(B) other than 「研究」 (research), 「技術」 (technology), and 「情報」 (information) is added to the dictionary feature information 302 shown in FIG. 3(B).
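
The update at step 410 reduces to a set operation if, as a simplifying assumption, feature information is modeled as a set of feature words (the appearance counts the patent also keeps are ignored here). A minimal Python sketch; the function name and dictionary labels such as `"dict17"` are hypothetical.

```python
def update_feature_db(feature_db, dict_name, doc_features, newly_created):
    """Update the dictionary feature database after registration (step 410).

    feature_db    -- dict: dictionary name -> set of feature words
    doc_features  -- set of feature words extracted from the input document
    newly_created -- True if dict_name was created in this registration session
    """
    if newly_created:
        # A new dictionary takes over all of the document's features (copied).
        feature_db[dict_name] = set(doc_features)
    else:
        # An existing dictionary gains only the words it did not have yet;
        # set union adds exactly the non-overlapping ones.
        feature_db[dict_name] |= doc_features
    return feature_db
```

With the existing features {research, technology, information} for dictionary 17, merging a document whose features also contain some other word adds only that word, which matches the FIG. 2(B) / FIG. 3(B) example.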

[0046] Through the operation in the dictionary registration mode described above, dictionaries can be created and registered according to the features of documents, or entries can be added to existing dictionaries, without the user being aware of the specialized field, and a different translation of the same word can be registered freely in each such document-corresponding dictionary. For example, as translations of 「構造」, "construction" can be registered in the document-corresponding dictionary 16, "structure" in the document-corresponding dictionary 17, and "organization" in the document-corresponding dictionary 18.

[0047] FIG. 5 is a flowchart showing the operation of the translation mode in the machine translation device of the first embodiment. FIG. 6 shows an example of a document 601 to be translated, the feature information 602 of the document 601, and the translation result 603 obtained by translating the document 601.

[0048] When translation-mode processing starts, the document input unit 7 takes in the document to be translated (601) input by the user (step 502). The document feature extraction unit 9 then extracts the document feature information (602) from the document to be translated (step 503).

[0049] The document-corresponding dictionary determination unit 11 then calculates the similarity between the extracted document feature information (602) and each item of dictionary feature information (301, 302, 303, ...) in the dictionary feature database 10, and selects from the dictionary feature database 10 the dictionary feature information with the maximum similarity (step 504).

[0050] The dictionary name determined by the document-corresponding dictionary determination unit 11 is passed to the translation interface unit 15 of the dictionary interface unit 4, and the translation interface unit 15 reads the stored contents of the document-corresponding dictionary with that name into the dictionary unit 13 in the translation execution unit 3 (step 505). If no similar document-corresponding dictionary exists, the stored contents of the general-purpose dictionary 19, which has no particular features, are read into the dictionary unit 13.

[0051] For example, within the range of information shown in FIGS. 3 and 6, the dictionary feature information 301 has a similarity of 1 to the document feature information 602 (「計算機」 (computer) is shared), the dictionary feature information 302 has a similarity of 4 (「情報」 (information), 「人工知能」 (artificial intelligence), 「研究」 (research), and 「処理」 (processing) are shared), and the dictionary feature information 303 has a similarity of 0. In this case the document-corresponding dictionary 17 is selected; that is, the stored contents of the document-corresponding dictionary 17 are read into the dictionary unit 13.

[0052] If the document-corresponding dictionary does not store information unrelated to document features, such as particles and conjunctions, the stored contents of the general-purpose dictionary 19 are transferred to the dictionary unit 13 in addition to those of the determined document-corresponding dictionary.

[0053] Once the dictionary contents have been transferred to the dictionary unit 13, the translation processing unit 12 uses the dictionary unit 13 to translate the document to be translated (601) (step 506), has the translation result output unit 8 output the obtained translation result (603) (step 507), and ends the series of processing in the translation mode. When both the contents of the determined document-corresponding dictionary and the contents of the general-purpose dictionary 19 have been transferred to the dictionary unit 13 and a source-language word is described in both, the contents of the determined document-corresponding dictionary take priority.
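
The priority rule above can be read as an ordered merge: load the general-purpose dictionary 19 first, then let the selected document-corresponding dictionary overwrite any shared headwords. A minimal Python sketch, assuming each dictionary is a `dict` from source word to translation; the general dictionary's entries in the usage below are invented for illustration.

```python
def load_dictionary_unit(document_dict, general_dict):
    """Build the working dictionary (dictionary unit 13) for translation.

    General-purpose entries are copied first and then overwritten by the
    selected document-corresponding dictionary, so that when a source word
    appears in both, the document dictionary's translation wins.
    """
    merged = dict(general_dict)
    merged.update(document_dict)
    return merged
```

For instance, merging `{"構造": "structure"}` (dictionary 17) over a hypothetical general dictionary `{"構造": "construction", "計算機": "computer"}` keeps "structure" for 「構造」 while still supplying 「計算機」 from the general dictionary.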

[0054] Suppose that "construction", "structure", and "organization" are registered as translations of 「構造」 in the document-corresponding dictionaries 16, 17, and 18, respectively. In this situation, when the document to be translated 601 shown in FIG. 6(A) is input, the document-corresponding dictionary 17, whose dictionary feature information is most similar to the document feature information 602, is selected automatically, and translating with it renders 「構造」 as "structure", the most appropriate translation.

[0055] As described above, according to the first embodiment, document-corresponding dictionaries can be created and set automatically in units matched to the features of documents, rather than in fixed field units based on context vectors.

[0056] Further, according to the first embodiment, document features are extracted from the document to be translated and the most suitable dictionary is selected from multiple document-corresponding dictionaries organized in units matched to document features. The same word can therefore be translated differently depending on the input document, which further improves translation quality compared with conventional devices that prepare dictionaries in fixed field units.

[0057] For example, with a dictionary selection method based on field units, "構造: construction" and "構造: structure" both belong to the same field, "information processing", so they are registered in the same dictionary and cannot be distinguished by dictionary selection. According to the first embodiment, however, a document-corresponding dictionary can be created for each unit of document features, so each entry can be registered in a different document-corresponding dictionary and the translations can be distinguished by dictionary selection.

[0058] (B) Second Embodiment
Next, a second embodiment in which the present invention is applied to a Japanese-English machine translation device will be described in detail with reference to the drawings. FIG. 7 is a functional block diagram showing the machine translation device of the second embodiment; parts identical or corresponding to those in FIG. 1 described above are denoted by the same reference numerals.

[0059] In FIG. 7, the machine translation device of the second embodiment is provided with a registration content extraction unit 20 in the dictionary determination unit 2. In the second embodiment, in the dictionary registration mode, a bilingual document consisting of a document in Japanese (the source language) and its English counterpart is input through the document input unit 7. The bilingual document input in this way is supplied to the registration content extraction unit 20, and only the Japanese document of the bilingual pair is supplied to the document feature extraction unit 9.

[0060] The registration content extraction unit 20 functions only in the dictionary registration mode and automatically extracts, from a bilingual document consisting of a Japanese document and an English document, contents that can be registered in a dictionary. As a method of automatically obtaining such contents from a bilingual document, the method described in Document 2 above, for example, can be applied.

[0061] The units other than the registration content extraction unit 20 function in almost the same way as in the first embodiment. However, the dictionary creation unit 6, for example, presents to the user those registrable contents (bilingual word-pair information) extracted by the registration content extraction unit 20 that are not yet stored in the document-corresponding dictionary targeted for registration, and accepts the user's decision on whether to register each item as well as any corrections to it. The dictionary registration unit 14 then registers the contents extracted by the registration content extraction unit 20, as instructed by the user via the dictionary creation unit 6, in the specified document-corresponding dictionary.

[0062] Accordingly, the second embodiment also provides the following effects: (1) document-corresponding dictionaries can be created and set automatically in units matched to the features of documents rather than in fixed field units; and (2) because document features are extracted from the document to be translated and the most suitable dictionary is selected from multiple document-corresponding dictionaries organized in units matched to document features, the same word can be translated differently depending on the input document, further improving translation quality compared with conventional devices that prepare dictionaries in fixed field units.

[0063] Further, according to the second embodiment, the device automatically obtains the contents that can be registered in the dictionary, which makes it possible to provide a useful device that reduces the burden on the user.

[0064] (C) Other Embodiments
Various modifications have already been mentioned in the description of the above embodiments; in addition, other embodiments obtained by modifying the above embodiments as follows also constitute the present invention.

[0065] (1) In the above embodiments, the document feature information and the dictionary feature information are mainly sets of words that appear a predetermined number of times or more in a document, but other information may be used. For example, to reflect the length of the input document, the set of words whose appearance rate (the number of appearances divided by the total number of words in the document) is at or above a predetermined rate may be used as the document feature information and dictionary feature information. Further, not only words but also idioms may be used as elements constituting the features.
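
Variant (1) replaces the raw count threshold with a rate threshold. A minimal Python sketch, assuming the document is already segmented into a word list; the function name and the `min_rate` parameter are illustrative, not taken from the text.

```python
from collections import Counter


def rate_features(doc_words, min_rate):
    """Feature words selected by appearance rate instead of raw count.

    The rate is count / total number of words, so a long document does not
    yield more feature words merely because it is long.
    """
    total = len(doc_words)
    if total == 0:
        return set()
    counts = Counter(doc_words)
    return {w for w, n in counts.items() if n / total >= min_rate}
```

In a 10-word document where one word appears 3 times, another once, and a third 6 times, a threshold of 0.25 keeps the first and third words (rates 0.3 and 0.6) and drops the second (rate 0.1).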

[0066] (2) Similarly, the similarity between document feature information and dictionary feature information is not limited to the number of words belonging to both. For example, whether the document authors match may be converted into a value and included in the similarity. Words with a large number of appearances or a high appearance rate (important words) may also contribute a larger amount to the similarity.
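
One way to realize variant (2) is a weighted overlap score. The text gives no concrete values, so every weight, threshold, and parameter name in this Python sketch is an illustrative assumption.

```python
def weighted_similarity(doc_counts, dict_features, doc_author=None,
                        dict_authors=(), important_count=10,
                        important_weight=2, author_bonus=3):
    """Similarity variant with important-word and author terms.

    doc_counts    -- dict: feature word -> appearance count in the document
    dict_features -- set of feature words of one dictionary
    Each shared word adds 1, or `important_weight` if it appears at least
    `important_count` times; a matching document author adds `author_bonus`.
    """
    score = 0
    for word, count in doc_counts.items():
        if word in dict_features:
            score += important_weight if count >= important_count else 1
    if doc_author is not None and doc_author in dict_authors:
        score += author_bonus
    return score
```

With all weights set to 1 and no author information, this reduces to the plain shared-word count used in the first embodiment.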

[0067] (3) In the above embodiments the present invention is applied to a Japanese-English machine translation device, but it can of course also be applied to machine translation devices with other source or target languages. Even in this case, the document feature information and the dictionary feature information must be obtained from the source-language document.

[0068] (4) Furthermore, in the above embodiments the present invention is applied to a machine translation device with a single translation direction, but it can also be applied to a machine translation device with two or more translation directions. In this case, the dictionary feature database 10 must store dictionary features for each language, and in the translation mode the document feature information must be obtained from the document in the source language being translated at that time.

[0069] (5) Furthermore, in the above embodiments the dictionary feature information is stored in the dictionary feature database 10, but it may instead be stored in a feature information storage area provided in each corresponding document-corresponding dictionary.

[0070] (6) In the above embodiments a single document-corresponding dictionary is used for translation processing, but two or more document-corresponding dictionaries may be determined for use in translation processing. In this case, their priority of use may be set according to their similarity to the document features of the document to be translated.
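
Variant (6) could work as in the Python sketch below: rank the dictionaries by the same shared-word similarity, keep the best few, and consult them in that order during lookup. `max_dicts`, the name-based tie-break, the fallback to a general dictionary, and the dictionary labels are assumptions, not details from the text.

```python
def rank_dictionaries(doc_features, feature_db, max_dicts=2):
    """Choose up to `max_dicts` dictionaries, ordered by similarity.

    Dictionaries sharing no feature word are left out; the returned order
    is the priority order for use in translation.
    """
    scored = [(len(doc_features & feats), name)
              for name, feats in feature_db.items()]
    scored = [(s, name) for s, name in scored if s > 0]
    # Highest similarity first; the name breaks ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:max_dicts]]


def lookup(word, ranked_dicts, dictionaries, general):
    """Return the translation from the highest-priority dictionary holding
    `word`, falling back to the general-purpose dictionary (None if absent)."""
    for name in ranked_dicts:
        if word in dictionaries[name]:
            return dictionaries[name][word]
    return general.get(word)
```

Compared with merging into a single working dictionary, ordered lookup keeps each dictionary intact and makes the priority explicit, which fits the idea of setting a priority of use per dictionary.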

[0071]

[Effects of the Invention] As described above, according to the dictionary registration device of the first aspect of the present invention, the features of an input document are extracted, the necessity of creating a document-corresponding dictionary is determined, and, if necessary, a document-corresponding dictionary is created before the registration operation is performed. Document-corresponding dictionaries can therefore be created and set automatically in units matched to the features of documents, rather than in fixed field units based on context vectors.

[0072] Further, according to the machine translation device of the second aspect of the present invention, document features are extracted from the document to be translated and the most suitable dictionary is selected from multiple document-corresponding dictionaries organized in units matched to document features. In addition to the effects of the dictionary registration device of the first aspect, the same word can therefore be translated differently depending on the input document, which further improves translation quality compared with conventional devices that prepare dictionaries in fixed field units.

[Brief description of drawings]

FIG. 1 is a functional block diagram of the machine translation device according to the first embodiment.

FIG. 2 is an explanatory diagram of document feature information.

FIG. 3 is an explanatory diagram of dictionary feature information.

FIG. 4 is an operation flowchart in the dictionary registration mode according to the first embodiment.

FIG. 5 is an operation flowchart in the translation mode according to the first embodiment.

FIG. 6 is an explanatory diagram of a translation processing example according to the first embodiment.

FIG. 7 is a functional block diagram of the machine translation device according to the second embodiment.

[Explanation of symbols]

1 ... User interface unit, 2 ... Dictionary determination unit, 3 ... Translation execution unit, 4 ... Dictionary interface unit, 5 ... Dictionary storage unit, 6 ... Dictionary creation unit, 9 ... Document feature extraction unit, 10 ... Dictionary feature database, 11 ... Document-corresponding dictionary determination unit, 14 ... Dictionary registration unit, 15 ... Translation interface unit, 16 to 18 ... Document-corresponding dictionaries, 19 ... General-purpose dictionary.

Claims

1. A dictionary registration device that creates and registers dictionaries used in a machine translation device, comprising:
document feature extraction means for extracting, from an input document written in a natural language, document feature information including at least appearance frequency information of words and idioms in the input document;
dictionary feature storage means for storing, for each of one or more already-created document-corresponding dictionaries, dictionary feature information in a format similar to that of the document feature information;
similarity determination means for obtaining the similarity between the document feature information extracted by the document feature extraction means and each item of dictionary feature information stored in the dictionary feature storage means, determining on the basis of the obtained similarities whether a new document-corresponding dictionary needs to be created and, when none is needed, which existing document-corresponding dictionary is to be the registration target, and, when a new document-corresponding dictionary is created, storing the extracted document feature information in the dictionary feature storage means as its dictionary feature information, or, when an existing document-corresponding dictionary is determined as the registration target, updating the dictionary feature information stored in the dictionary feature storage means in accordance with the extracted document feature information; and
dictionary registration means for creating a new document-corresponding dictionary if necessary on the basis of the determination result of the similarity determination means, and then registering dictionary entries in the determined document-corresponding dictionary.

2. A machine translation device having a dictionary registration configuration, comprising:
first document feature extraction means for extracting, from an input document written in a source language, first document feature information including at least appearance frequency information of words or idioms in the input document;
dictionary feature storage means for storing, for each of one or more already-created document-corresponding dictionaries, dictionary feature information in a format similar to that of the document feature information;
first similarity determination means for obtaining the similarity between the document feature information extracted by the first document feature extraction means and each item of dictionary feature information stored in the dictionary feature storage means, determining on the basis of the obtained similarities whether a new document-corresponding dictionary needs to be created and, when none is needed, which existing document-corresponding dictionary is to be the registration target, and, when a new document-corresponding dictionary is created, storing the extracted document feature information in the dictionary feature storage means as its dictionary feature information, or, when an existing document-corresponding dictionary is determined as the registration target, updating the existing dictionary feature information stored in the dictionary feature storage means in accordance with the extracted document feature information;
dictionary registration means for creating a new document-corresponding dictionary if necessary on the basis of the determination result of the first similarity determination means, and then registering dictionary entries in the determined document-corresponding dictionary;
second document feature extraction means for extracting, from an input document to be translated, document feature information including at least appearance frequency information of words and idioms in that document;
second similarity determination means for obtaining the similarity between the document feature information extracted by the second document feature extraction means and each item of dictionary feature information stored in the dictionary feature storage means, and determining, on the basis of the obtained similarities, one or more document-corresponding dictionaries to be used for translation processing; and
means for performing translation processing of the input document to be translated using the contents of the document-corresponding dictionaries determined to be used for translation processing.

3. The machine translation device according to claim 2, wherein the same document feature extraction means is selectively used as the first and second document feature extraction means, and the same similarity determination means is selectively used as the first and second similarity determination means.