JP2002132764A

JP2002132764A - Machine translation preprocessor

Info

Publication number: JP2002132764A
Application number: JP2000328469A
Authority: JP
Inventors: Kozue Kimura; こずえ木村
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2000-10-27
Filing date: 2000-10-27
Publication date: 2002-05-10

Abstract

PROBLEM TO BE SOLVED: To enable machine translation with accuracy comparable to that obtained by substantially increasing the number of words registered in a dictionary, without changing the contents of the dictionary built into the machine translation system. SOLUTION: A machine translation preprocessor preprocesses source-language input text, before the text is input to a machine translation device, in order to improve the accuracy of machine translation. The preprocessor includes an input unit 102 for inputting source-language text; a table memory 104 for storing a preprocessing dictionary table 1041 whose entries each contain a pair of a source-language headword and a corresponding word in a predetermined language; a dictionary lookup unit 1051 for retrieving from the preprocessing dictionary table 1041 an entry whose headword matches a word contained in the input text; and a replacement unit 1052 for replacing part of the text with the corresponding word in the predetermined language contained in the retrieved entry.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a machine translation system that translates text in a source language into text in a target language, and in particular to a device that preprocesses the source-language text before machine translation in order to improve both the accuracy and the speed of the machine translation.

[0002]

[Prior Art] A machine translation system translates text written in a natural language into text in another language, mainly by computer processing. In machine translation, it is desirable that every source-language sentence to be translated be analyzed correctly and translated appropriately into the target language.

[0003] Sentence analysis refers to a source-language dictionary. When a word that is not in the dictionary appears in a source-language sentence to be translated, analysis of that part fails, and so translation of that part often fails as well.

[0004] Therefore, to perform machine translation properly, it is most important to register as much vocabulary as possible in the system's source-language dictionary and to enrich the contents of the dictionary.

[0005] However, increasing the number of words registered in the dictionary takes a great deal of labor, which is a problem when a practical machine translation system is to be supplied commercially. In particular, the larger the machine translation system and the more words already registered in the dictionary, the harder it is to increase the number of registered words while maintaining consistency with the dictionary and system built up so far. The work requires substantial human resources and consequently raises the cost of the dictionary, and hence the cost of the machine translation system.

[0006] Furthermore, when a machine translation system purchased from a third party is used, for example, the internals of the system may not be open to inspection or modification, so that increasing the number of registered words may be impossible. A machine translation system is therefore desired that, despite these constraints, can translate as accurately as if the number of words registered in the dictionary had in effect been increased. Since the contents of the dictionary built into the machine translation system itself cannot be changed, this effect has to be achieved by preprocessing the source-language text in some way before machine translation.

[0007]

[Problems to Be Solved by the Invention] One prior-art technique that raises the accuracy of machine translation by preprocessing the source-language text is the machine translation preprocessing disclosed in Japanese Patent Laying-Open No. 5-225232. However, that publication seeks to raise accuracy only indirectly, by automating the pre-editing of text for machine translation that is normally done by hand. It does not improve accuracy directly by producing the same effect as substantially increasing the number of words registered in the dictionary.

[0008] Accordingly, an object of the present invention is to provide a preprocessing device for machine translation that makes it possible to translate with the same accuracy as if the number of words registered in the dictionary had been substantially increased, without changing the contents of the dictionary built into the machine translation system.

[0009]

[Means for Solving the Problems] A machine translation preprocessing device according to one aspect of the present invention preprocesses input text in a source language, before that text is input to a machine translation device, in order to improve the accuracy of machine translation. The device includes: input means for inputting source-language text; means for storing a preprocessing dictionary table having entries that each contain a pair of a source-language headword and a corresponding word in a predetermined language; search means for searching the preprocessing dictionary table for an entry whose headword matches a word contained in the text input by the input means; and replacement means for replacing part of the text with the corresponding word in the predetermined language contained in the entry found by the search means.

[0010] Preferably, the machine translation preprocessing device further includes means for storing a morphological information table containing morphological information on source-language words, and morphological analysis means for analyzing the text with reference to the morphological information table, dividing it into morphemes, and supplying the morphemes to the search means.

[0011] Preferably, the predetermined language is a language different from the source language, and each entry of the preprocessing dictionary table contains, in addition to the pair of the source-language headword and the corresponding word in the predetermined language, semantic information on the source-language headword. The replacement means includes means for replacing part of the text with the corresponding word in the predetermined language contained in the entry found by the search means and for attaching to that corresponding word the semantic information of the associated headword. The machine translation preprocessing device further includes means for deleting the attached semantic information from the result obtained when the machine translation device translates the partially replaced text into a sentence in the predetermined language.

[0012] Preferably, in each entry of the preprocessing dictionary table, the semantic information corresponding to the source-language headword is one of the morphemes that make up that headword.

[0013] Preferably, the headword of an entry in the preprocessing dictionary table includes a combination of ideographic characters, and the semantic information corresponding to the headword may consist of only the last character of the headword.

[0014] According to one aspect of the present invention, the source language is a language that uses kanji as ideographic characters, and the semantic information consists of only the final kanji character of the corresponding headword.
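The rule in paragraphs [0013] and [0014] can be illustrated with a minimal sketch. The function names and the sample headwords are illustrative assumptions, the kanji test relies on Unicode character names rather than on anything specified in this document, and returning an empty string for a headword that does not end in a kanji is a choice made only for this sketch.

```python
import unicodedata


def is_kanji(ch: str) -> bool:
    """True if ch is a CJK ideograph according to its Unicode character name."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


def final_kanji(headword: str) -> str:
    """Return the last character of the headword if it is a kanji, else ''."""
    if headword and is_kanji(headword[-1]):
        return headword[-1]
    return ""


# Hypothetical headwords used only to exercise the function.
print(final_kanji("プリンストン大学"))  # 学
print(final_kanji("プリンストン") == "")  # True (ends in katakana, not kanji)
```

Deriving the semantic information from the headword itself means no separate semantic field has to be authored for such entries.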

[0015] According to yet another aspect, each entry of the preprocessing dictionary table contains, in addition to the pair of the source-language headword and the corresponding word in the predetermined language, word-form information on the corresponding word in the predetermined language. The replacement means includes means for replacing part of the text with the corresponding word in the predetermined language contained in the entry found by the search means and for attaching the word-form information of that corresponding word. The machine translation preprocessing device further includes means for correcting, in accordance with the attached word-form information, the word form of the corresponding word contained in the result obtained when the machine translation device translates the partially replaced text into a sentence in the predetermined language.

[0016] The predetermined language may be the same language as the source language.

[0017]

[Embodiments of the Invention] Machine translation preprocessing devices according to first to seventh embodiments of the present invention are described below. In the following description, machine translation is assumed to be translation from Japanese to English (a Japanese-English translation system), but the present invention is not limited to Japanese-English translation systems.

[First Embodiment] Referring to FIG. 1, the machine translation preprocessing device according to the first embodiment of the present invention includes: a control unit 101, constituted by the CPU (central processing unit) of a computer, for controlling each part of the device in accordance with a control program; an input unit 102 for inputting sentences written in a natural language, inputting operator instructions for pre-editing, communicating data with other information processing terminals, installing the control program, and so on; an output unit 103 for displaying or printing the input received by the input unit 102, the results of converting the source-language text by preprocessing under the control of the control unit 101, and the like; a table memory 104 for storing a pre-editing dictionary whose entries each consist of a pair of a headword used to preprocess the source-language text and its translation; a program memory 105 for storing the control program executed by the control unit 101; a buffer memory 106 used as a work area during pre-editing of the source language; and a bus 108 that interconnects the units 101, 102, 103, 104, 105, and 106 and transfers the control program and address data among them.

[0018] The control unit 101 reads the control program from the program memory 105 and executes it, thereby controlling each unit via the bus 108 and realizing the machine translation preprocessing device of this embodiment.

[0019] The input unit 102 includes input devices such as a keyboard, mouse, pen, tablet, and scanner; a character recognition device; a storage medium reader for reading information from storage media such as a CD-ROM (Compact Disc Read-Only Memory), an FD (Flexible Disk), or a DVD (Digital Video Disc); a communication device connected to a communication line; and the like.

[0020] The output unit 103 includes a display device such as a CRT (cathode-ray tube) display, an LCD (liquid crystal display), or a PD (plasma display), and a printing device such as a thermal printer or a laser printer.

[0021] The table memory 104, the program memory 105, and the buffer memory 106 include any combination of, for example: semiconductor memory such as mask ROM (Read-Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), or flash ROM; tape storage media such as magnetic tape or cassette tape; magnetic disks such as an FD or a hard disk; optical or magneto-optical disks such as a CD-ROM, MO (Magneto-Optical), or DVD; and card storage media such as IC cards (including memory cards) or optical cards.

[0022] The table memory 104 stores a pre-editing dictionary table 1041 (also referred to below as the preprocessing dictionary table 1041), in which the Japanese headwords needed for preprocessing are stored in association with their English translations.

[0023] Referring to FIG. 2, the pre-editing dictionary table 1041 stores a plurality of entries, each pairing a source-language term with a term that corresponds to its target-language translation in the machine translation system.

[0024] The program memory 105 stores a program that functions as a dictionary lookup unit 1051, which looks up the input source-language text in the preprocessing dictionary table 1041, and as a replacement unit 1052, which replaces each term for which a matching headword was found in the preprocessing dictionary table 1041 with the appropriate term according to the contents of the entry.

[0025] The buffer memory 106 includes an input sentence buffer 1061 for storing the input source-language sentence to be translated; a dictionary lookup result buffer 1062 for storing the result of looking up the input sentence in the preprocessing dictionary table 1041; and a replacement result buffer 1063 for storing the result of replacing terms in the input sentence with the appropriate terms according to the lookup result.

[0026] The storage medium 107 includes any combination of, for example: semiconductor memory such as mask ROM (Read-Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), or flash ROM; tape storage media such as magnetic tape or cassette tape; magnetic disks such as an FD or a hard disk; optical or magneto-optical disks such as a CD-ROM, MO (Magneto-Optical), or DVD; and card storage media such as IC cards (including memory cards) or optical cards. The storage medium 107 can be attached to and detached from the input unit 102 of the machine translation preprocessing device, and it fixedly carries the machine translation preprocessing program that implements the machine translation preprocessing of this embodiment.

[0027] The machine translation preprocessing device of this embodiment is realized by installing this machine translation preprocessing program into an area of the program memory 105 via the storage medium reader of the input unit 102. Alternatively, the program may be received from any computer on an external network connected via the communication device in the input unit 102 and then installed in the program memory 105. When the program is received from a communication network, a communication program for that purpose is stored in advance in the program memory 105 or the like.

[0028] Referring to FIG. 3, the control program that realizes the machine translation preprocessing device of the first embodiment has the following control structure. The Japanese sentence input to the input unit 102 is stored in the input sentence buffer 1061 by the control unit 101 (step 201).

[0029] The dictionary lookup unit 1051 checks, for each word of the Japanese sentence stored in the input sentence buffer 1061, whether a corresponding entry exists in the preprocessing dictionary table 1041 (step 202). If a corresponding entry exists, the headword and translation of that entry are stored in the dictionary lookup result buffer 1062, and the process proceeds to step 204. If no corresponding entry exists, the contents of the input sentence buffer 1061 are copied to the replacement result buffer 1063, and the process proceeds to step 205.

[0030] In step 204, the replacement unit 1052 converts each headword stored in the dictionary lookup result buffer 1062 that occurs in the Japanese sentence in the input sentence buffer 1061 into the corresponding translation stored in the dictionary lookup result buffer 1062, and stores the resulting sentence in the replacement result buffer 1063.

[0031] Next, in step 205, the sentence stored in the replacement result buffer 1063 is supplied via the output unit 103 to a machine translation system (not shown), which translates it. A translation result is thus obtained. Even a word that is not in the dictionary of the machine translation system has already been replaced with an appropriate translation by the preprocessing performed by this device, so the translation result is more appropriate than it would be without preprocessing.

[0032] As an example, consider translating the sentence 「彼はプリンストン大学に入学した。」 ("He entered Princeton University."). Assume that the dictionary of the machine translation system does not contain the term 「プリンストン大学」 (Princeton University). Without preprocessing, the word 「プリンストン」 (Princeton) is not translated properly and the desired result is not obtained. In contrast, when the contents of the preprocessing dictionary table 1041 are as shown in FIG. 2, the dictionary lookup unit 1051 stores "Princeton University" in the dictionary lookup result buffer as the translation of the term 「プリンストン大学」 in the input sentence buffer (see the upper part of FIG. 4). The replacement unit 1052 then replaces 「プリンストン大学」 in the sentence in the input sentence buffer with "Princeton University", and the sentence 「彼はPrinceton Universityに入学した。」 is stored in the replacement result buffer 1063. Supplying this sentence to the machine translation system yields the translation "He entered Princeton University".
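The lookup and replacement steps of the first embodiment (steps 202 and 204) can be sketched as follows. This is only an illustration under stated assumptions: the table is modeled as a plain mapping, lookup is a simple substring test rather than a word-level match, the function names are hypothetical, and replacing longer headwords first is a choice made for this sketch, not something the document specifies.

```python
from typing import Dict, List, Tuple

# Illustrative preprocessing dictionary table 1041 (headword -> translation).
# The single entry is the one used in the example above.
PREPROCESSING_TABLE: Dict[str, str] = {
    "プリンストン大学": "Princeton University",
}


def look_up(sentence: str, table: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step 202: collect (headword, translation) pairs whose headword occurs."""
    return [(hw, tr) for hw, tr in table.items() if hw in sentence]


def replace(sentence: str, hits: List[Tuple[str, str]]) -> str:
    """Step 204: replace each found headword with its translation.

    Longer headwords go first so that a short headword cannot break up
    a longer one that contains it (an assumption of this sketch).
    """
    for headword, translation in sorted(hits, key=lambda h: len(h[0]), reverse=True):
        sentence = sentence.replace(headword, translation)
    return sentence


def preprocess(sentence: str, table: Dict[str, str]) -> str:
    """Return the text that would be handed to the machine translation system."""
    hits = look_up(sentence, table)   # plays the role of lookup result buffer 1062
    if not hits:                      # no entry found: pass the input through unchanged
        return sentence
    return replace(sentence, hits)    # plays the role of replacement result buffer 1063


print(preprocess("彼はプリンストン大学に入学した。", PREPROCESSING_TABLE))
# 彼はPrinceton Universityに入学した。
```

Because the function only rewrites the input string, the machine translation system itself can stay a black box.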

[0033] By contrast, when the contents of the input sentence buffer 1061 were given unchanged to the same machine translation system without this preprocessing, the result was "He entered プリンストン university". Comparing the two, with the preprocessing of this embodiment the 「プリンストン大学」 portion of the original is translated properly, and a more appropriate English sentence is obtained as the translation result.

[0034] With the system of this embodiment, an appropriate translation is obtained without registering an entry that pairs 「プリンストン大学」 with "Princeton University" in the dictionary inside the machine translation system. Even if the contents of the machine translation system's dictionary cannot be changed, substantially the same effect is obtained as if the number of words registered in that dictionary had been increased. Moreover, in the machine translation preprocessing device of this embodiment, the preprocessing dictionary table 1041 can be created independently of the format of the machine translation system's dictionary, and the number of words registered in it can be set freely. Consequently, translation accuracy can be improved just as if the number of words registered in the machine translation system's dictionary had been greatly increased, without changing the internals of the machine translation system at all.

[Second Embodiment] The machine translation preprocessing device according to the second embodiment differs from the device of the first embodiment in that it performs morphological analysis on the input sentence. Referring to FIG. 5, the device of the second embodiment differs in configuration from the device of the first embodiment in three respects: the table memory 104 contains, in addition to the preprocessing dictionary table 1041, a morphological information table 1042 for morphological analysis; the program memory 105 contains, in addition to the dictionary lookup unit 1051 and the replacement unit 1052, a morphological analysis unit 1054 for performing morphological analysis on the source-language input sentence stored in the input sentence buffer 1061; and the buffer memory 106 contains, in addition to the input sentence buffer 1061, the dictionary lookup result buffer 1062, and the replacement result buffer 1063, a morphological analysis result buffer 1064 for storing the result of the morphological analysis performed by the morphological analysis unit 1054. In FIG. 5, parts identical to those in FIG. 1 bear the same reference numerals and the same names, and their functions are also the same, so their detailed description is not repeated here. Morphological analysis is well known in this field, as are methods of implementing it, so its details are not discussed here.

[0035] Referring to FIG. 6, the program that realizes the machine translation preprocessing device of the second embodiment differs from the control program of the first embodiment shown in FIG. 3 in that, in place of step 202, it includes steps 3021 and 3022 corresponding to the morphological analysis unit 1054. In FIG. 6, steps identical to those in FIG. 3 bear the same reference numerals as in FIG. 3.

[0036] In the device of the second embodiment, in step 3021 the input sentence stored in the input sentence buffer 1061 is morphologically analyzed with reference to the morphological information table 1042, and the result is stored in the morphological analysis result buffer 1064. Based on that result, the dictionary lookup unit 1051 checks in step 3022 whether a corresponding entry exists in the preprocessing dictionary table 1041. Because morphological analysis has divided the input sentence appropriately into morphemes, the dictionary lookup unit 1051 needs to search for entries only for the individual morphemes and their concatenations. Compared with a lookup that also covers inappropriate segmentations of the sentence, processing is faster, and the accuracy of the lookup, and hence of the final machine translation, is improved.

[0037] For example, consider again the example sentence 「彼はプリンストン大学に入学した。」. Morphological analysis divides this sentence into morphemes as 「彼/は/プリンストン/大学/に/入学し/た/。」. During dictionary lookup, entries need to be searched only for these words and their concatenations. As a result, entries are no longer searched with strings that cannot exist as dictionary entries, such as 「彼はプ」 or 「学に入学し」, which shows that the lookup becomes both faster and more accurate.
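The reduction in lookup candidates described above can be sketched as follows. This is a minimal illustration, not the device's actual procedure: the segmentation is hard-coded from the text instead of coming from a morphological analyzer, the function names are hypothetical, and "all substrings" is taken as the assumed search space when no segmentation is available.

```python
from typing import List, Set


def morpheme_candidates(morphemes: List[str]) -> Set[str]:
    """Return every morpheme and every concatenation of adjacent morphemes."""
    candidates: Set[str] = set()
    for start in range(len(morphemes)):
        joined = ""
        for morpheme in morphemes[start:]:
            joined += morpheme
            candidates.add(joined)
    return candidates


def substring_candidates(sentence: str) -> Set[str]:
    """Return every non-empty substring (search space without segmentation)."""
    return {
        sentence[i:j]
        for i in range(len(sentence))
        for j in range(i + 1, len(sentence) + 1)
    }


# Segmentation quoted in the text above.
morphemes = ["彼", "は", "プリンストン", "大学", "に", "入学し", "た", "。"]
sentence = "".join(morphemes)

restricted = morpheme_candidates(morphemes)
unrestricted = substring_candidates(sentence)

print("プリンストン大学" in restricted)  # True
print("彼はプ" in restricted)            # False
print(len(restricted) < len(unrestricted))  # True
```

Only strings that respect morpheme boundaries are looked up, so fragments such as 「彼はプ」 never reach the dictionary.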

[0038] As described above, the device of the second embodiment, in addition to providing the effects of the first embodiment, speeds up preprocessing and improves its accuracy. As a result, the accuracy of translation by the machine translation system can also be improved.

[Third Embodiment] FIG. 7 shows a block diagram of the machine translation preprocessing system of the third embodiment of the present invention. This device differs from the one shown in FIG. 1 in three respects: each entry of the preprocessing dictionary table 1041 contains a headword, a translation, and semantic information; the program memory 105 contains, in addition to the dictionary lookup unit 1051 and the replacement unit 1052, a post-processing unit 1053 that deletes the portion corresponding to the semantic information from the result of machine translation performed on text whose words were replaced together with the semantic information obtained by the dictionary lookup; and the buffer memory 106 contains a translation result buffer 1065 for storing the result translated by the machine translation system (not shown).

【００３９】 FIG. 8 shows an example of entries in the preprocessing dictionary table 1041 of the third embodiment. For example, the headword 「大量出血」 ("massive bleeding") is assigned the translation "hemorrhage" and, in addition, [出血] as semantic information. The same applies to the other headwords.

【００４０】 FIG. 9 shows a flowchart of the control program in the device of the third embodiment. Referring to FIG. 9, the control unit 101 stores the Japanese sentence input to the input unit 102 in the input sentence buffer 1061 (step 401).

【００４１】 The dictionary lookup unit 1051 looks up the entries of the preprocessing dictionary table 1041 for the Japanese sentence stored in the input sentence buffer 1061 (step 402). Next, step 403 determines whether the dictionary lookup succeeded. If it succeeded, the resulting headword, translation, and semantic information are stored in the dictionary lookup result buffer 1062 and control proceeds to step 404. If it failed, the input sentence is copied to the replacement result buffer 1063 and control proceeds to step 405.

【００４２】 In step 404, the replacement unit 1052 converts the headword stored in the dictionary lookup result buffer 1062, where it occurs in the Japanese sentence stored in the input sentence buffer 1061, into its translation and stores the result in the replacement result buffer 1063. At this time, the semantic information stored in the dictionary lookup result buffer 1062 is also stored in the replacement result buffer 1063, in a format that the machine translation system can interpret correctly. For example, suppose the machine translation system has the following function: when semantic information is appended to a word after the symbol 「◎」, the system translates the parts of the sentence related to the word before 「◎」 using words that fit the semantic information after 「◎」. In that case, step 404 appends the semantic information after the translation, separated by the symbol 「◎」.
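The replacement performed in step 404 might look like the following minimal Python sketch. The function name `replace_with_semantic_marker` and the tuple layout of `entry` are assumptions made for illustration; the angle brackets around the replaced span follow the worked example given later in the specification.

```python
def replace_with_semantic_marker(sentence, entry, marker="◎"):
    """Replace a headword with its translation plus marker and semantic info.

    `entry` is a hypothetical (headword, translation, semantic_info) tuple
    as it might be held in the dictionary lookup result buffer 1062.
    Returns the rewritten sentence and a flag telling whether semantic
    information was added (used later to decide on post-processing).
    """
    headword, translation, semantic = entry
    if headword not in sentence:
        return sentence, False
    annotated = "<" + translation + marker + semantic + ">"
    return sentence.replace(headword, annotated), True


entry = ("大量出血", "hemorrhage", "病気")
replaced, added = replace_with_semantic_marker("妊婦が大量出血を起こした。", entry)
```

Returning the flag alongside the sentence keeps the later decision (whether to strip the marker from the translation) independent of the sentence content.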

【００４３】 Next, the Japanese sentence stored in the replacement result buffer 1063 is passed via the output unit 103 to a machine translation system (not shown) and translated (step 405). The result is stored in the translation result buffer 1065.

【００４４】 Step 406 determines whether semantic information was added to the sentence in step 404. If it was, control proceeds to step 407; otherwise, control proceeds to step 408.

【００４５】 In step 407, post-processing deletes from the translated sentence stored in the translation result buffer 1065 the portion corresponding to the semantic information added in step 404, and the result is stored back in the translation result buffer 1065. This is the processing performed by the post-processing unit 1053 in FIG. 7. In the example above, the word following the symbol 「◎」 is deleted. The resulting sentence is output in step 408.
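The deletion in step 407 can be sketched in Python with a regular expression. This is only an illustration under the assumption that the semantic information comes out of the translator as a single word directly after the marker; the function name `strip_semantic_marker` is hypothetical.

```python
import re


def strip_semantic_marker(translated, marker="◎"):
    """Delete the marker and the single word that follows it.

    Optional whitespace around the marker is removed as well, so both
    "hemorrhage◎illness" and "hemorrhage ◎ illness" reduce to "hemorrhage".
    """
    pattern = r"\s*" + re.escape(marker) + r"\s*\w+"
    return re.sub(pattern, "", translated)


cleaned = strip_semantic_marker("A pregnant woman caused hemorrhage◎illness.")
```

A real translator might render the semantic information as several words, in which case the pattern would need to be adapted to that system's output.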

【００４６】 The operation of the machine translation preprocessing device of the third embodiment is now described using a more concrete sentence. Suppose the sentence 「妊婦が大量出血を起こした。」 ("A pregnant woman had massive bleeding.") is input, and that dictionary lookup on this sentence finds "hemorrhage" for the word 「大量出血」. The dictionary lookup result buffer 1062 then stores, in addition to the headword/translation pair 「大量出血」/"hemorrhage", the semantic information 「病気」 ("illness"). This is because, as shown in FIG. 8, the semantic information 「病気」 is attached to the pair 「大量出血」/"hemorrhage".

【００４７】 As a result of step 404, the content of the replacement result buffer 1063 becomes 「妊婦が<hemorrhage◎病気>を起こした。」. When this is given to the machine translation device, "A pregnant woman caused hemorrhage◎illness." is obtained as the translation result. This is because, as described above, the machine translation system translates the parts related to the word before the symbol 「◎」 in light of the semantic information placed after 「◎」. The post-processing unit 1053 deletes "◎illness" in step 407, so the translation finally obtained is "A pregnant woman caused hemorrhage." Compare this with "A pregnant woman set up large-scale bleeding.", the translation obtained by machine-translating 「妊婦がhemorrhageを起こした。」, in which 「大量出血」 in the input sentence was merely replaced with "hemorrhage" without adding semantic information. The result with semantic information is the more appropriate English sentence: the rendering of 「起こした」 has changed from "set up" to "caused", a more appropriate word that reflects the meaning of 「大量出血」.
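The overall control flow of FIG. 9 (steps 402 to 408) can be sketched as a small Python pipeline. Everything here is an assumption for illustration: the helper names are hypothetical, and `fake_mt` is a canned stand-in that merely returns the translation quoted in the example above, not a real machine translation system.

```python
import re

MARKER = "◎"


def preprocess(sentence, table):
    """Steps 402-404: look up a headword and replace it, adding semantics."""
    for headword, (translation, semantic) in table.items():
        if headword in sentence:
            annotated = f"<{translation}{MARKER}{semantic}>"
            return sentence.replace(headword, annotated), True
    return sentence, False  # lookup failed: sentence is passed on unchanged


def postprocess(translated):
    """Step 407: delete the marker and the word that follows it."""
    return re.sub(r"\s*" + MARKER + r"\s*\w+", "", translated)


def translate_with_preprocessing(sentence, table, translate):
    replaced, annotated = preprocess(sentence, table)
    result = translate(replaced)       # step 405: external MT system
    if annotated:                      # step 406: was semantic info added?
        result = postprocess(result)   # step 407
    return result                      # step 408: output


def fake_mt(text):
    # Canned output copied from the worked example; NOT a real translator.
    canned = {
        "妊婦が<hemorrhage◎病気>を起こした。":
            "A pregnant woman caused hemorrhage◎illness.",
    }
    return canned.get(text, text)


table = {"大量出血": ("hemorrhage", "病気")}
final = translate_with_preprocessing("妊婦が大量出血を起こした。", table, fake_mt)
```

Passing the translator in as a callable mirrors the specification's arrangement, in which the machine translation system sits outside the preprocessing device.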

【００４８】 As described above, in the device of the third embodiment, semantic information is attached to the entries of the preprocessing dictionary table, and the machine translation system is given this semantic information together with the English translation. The machine translation system can therefore use the semantic information to translate more appropriately, which improves the translation accuracy of the translation system as a whole.

[Fourth Embodiment]

The machine translation preprocessing device of the fourth embodiment has almost the same configuration as the device of the third embodiment, but differs in that a constituent morpheme of the headword is used as the semantic information in the preprocessing dictionary table 1041. That is, each headword is morphologically analyzed in advance, and one of the resulting morphemes is used as the semantic information.

【００４９】 FIG. 10 shows an example of the contents of the preprocessing dictionary table 1041 in this case. Referring to FIG. 10, the headword 「大量出血」, for example, is morphologically analyzed as 「大量/出血」 ("massive / bleeding"). Of the two morphemes 「大量」 and 「出血」, the word 「出血」, which is considered better suited to expressing the meaning, is used as the semantic information.
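Building such a table automatically might look like the sketch below. It assumes the last morpheme is the one chosen, which the specification later mentions as a typical case for kanji compounds; `toy_segment` is a hypothetical stand-in for a real morphological analyzer, and all names are illustrative.

```python
def add_morpheme_semantics(entries, segment, pick=-1):
    """Attach one morpheme of each headword as its semantic information.

    `entries` maps headword -> translation; `segment` is any callable that
    splits a headword into morphemes; `pick` selects which morpheme to use
    (the last one by default, an assumption made for this sketch).
    """
    table = {}
    for headword, translation in entries.items():
        morphemes = segment(headword)
        table[headword] = (translation, morphemes[pick])
    return table


def toy_segment(word):
    # Hard-coded segmentation for illustration only.
    known = {"大量出血": ["大量", "出血"]}
    return known.get(word, [word])


table = add_morpheme_semantics({"大量出血": "hemorrhage"}, toy_segment)
```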

【００５０】 When the preprocessing dictionary table 1041 is created in this way, pre-editing for highly accurate machine translation using semantic information is possible, as in the third embodiment, with exactly the same hardware configuration, functional block configuration, and software configuration as the device of the third embodiment.

【００５１】 For example, in the device of the third embodiment, the content of the replacement result buffer 1063 for the input sentence 「妊婦が大量出血を起こした。」 was 「妊婦が<hemorrhage◎病気>を起こした。」. In the device of this embodiment, by contrast, the content of the replacement result buffer 1063 is 「妊婦が<hemorrhage◎出血>を起こした。」. When this sentence is machine-translated, an appropriate translation is selected according to the semantic information 「出血」 ("bleeding"), and the translation result "A pregnant woman caused hemorrhage◎bleeding." is obtained. After the post-processing unit 1053 deletes the word following 「◎」, the result is "A pregnant woman caused hemorrhage.", the same translation as that produced by the device of the third embodiment.

【００５２】 Attaching semantic information to the headwords of the preprocessing dictionary table requires determining the semantic information for each headword with a large-scale thesaurus, or checking it by hand, which may raise the cost of the preprocessing dictionary table. In the device of the fourth embodiment, however, each headword is morphologically analyzed in advance and one of the resulting morphemes is used as the semantic information. In particular, for compound words made up of Japanese kanji, one of the morphemes, for example the last one, often expresses the meaning of the compound appropriately. Semantic information can therefore be attached to each headword without a large-scale thesaurus. Moreover, the operation of morphologically analyzing a headword and attaching one of the resulting words to it as semantic information can be performed automatically. The increase in the cost of the preprocessing dictionary table 1041 can therefore be suppressed.

[Fifth Embodiment]

The machine translation preprocessing device of the fifth embodiment has the same hardware configuration, functional block configuration, and software configuration as the machine translation preprocessing device of the third embodiment shown in FIG. 7; only the method of attaching semantic information to each headword of the preprocessing dictionary table differs. The configuration and functions of the device of the fifth embodiment are therefore similar to those of the machine translation preprocessing device of the fourth embodiment.

【００５３】 In the preprocessing dictionary table of the device of the fifth embodiment, the last character of each headword is used as the semantic information attached to it. For example, as shown in FIG. 11, 「血」 ("blood") is used for 「大量出血」 ("massive bleeding"), 「者」 ("person") for 「血友病患者」 ("hemophilia patient"), and 「素」 ("element") for 「血色素」 ("hemoglobin").
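Taking the last character is simple enough to express in one line of Python. The sketch below is illustrative only; the function name and the dictionary layout are assumptions, while the three headwords are the ones named for FIG. 11.

```python
def last_character_semantics(headword):
    """Use the final character of the headword as its semantic information."""
    return headword[-1]


headwords = ["大量出血", "血友病患者", "血色素"]
semantics = {word: last_character_semantics(word) for word in headwords}
```

Because Python strings are sequences of Unicode code points, indexing with `-1` returns the final kanji as a whole character.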

【００５４】 In general, in a compound word made up of Japanese kanji, the final word best expresses the meaning of the compound. Kanji are also ideographic characters, so a single character can express a meaning. Many compounds combine an adjective-like kanji with a kanji expressing the concept being modified. Thus, when a word consists of, say, two kanji, the final kanji often indicates the broader word group to which the word belongs. For example, 「出血」 (bleeding), 「止血」 (hemostasis), and 「吐血」 (vomiting blood) all belong to the group of words related to 「血」 (blood), and 「白雲」 (white cloud), 「黒雲」 (dark cloud), and 「青雲」 (blue cloud) all belong to the group related to 「雲」 (cloud). For this reason, the fifth embodiment adopts the last character of the headword as its semantic information.

【００５５】 In the fifth embodiment, the content of the replacement result buffer 1063 for the input sentence 「妊婦が大量出血を起こした。」 is 「妊婦が<hemorrhage◎血>を起こした。」. When this sentence is machine-translated, an appropriate translation is selected according to the semantic information 「血」 ("blood"), and the translation result "A pregnant woman caused hemorrhage◎blood." is obtained. After the post-processing unit 1053 deletes the word following 「◎」, the result is "A pregnant woman caused hemorrhage.", the same translation as that produced by the devices of the third and fourth embodiments.

【００５６】 When the last character of the headword is adopted as its semantic information in this way, attaching semantic information is even simpler than in the fourth embodiment. This reduces the effort required to create the preprocessing dictionary table 1041 and suppresses the increase in cost. Furthermore, with this method the semantic information need not be stored in the preprocessing dictionary table 1041 in advance; it can easily be attached at preprocessing time, when the replacement unit 1052 performs its work.

[Sixth Embodiment]

The hardware and functional block configuration of the machine translation preprocessing device of the sixth embodiment are the same as those of the third embodiment shown in FIG. 7. The device of the sixth embodiment differs from the third embodiment in that each entry of the preprocessing dictionary table 1041 consists of a headword, an English translation, and word form information for the English word. FIG. 12 shows an example of the contents of the preprocessing dictionary table 1041 in the sixth embodiment.

【００５７】 In the example shown in FIG. 12, the word form information given is information on the plural form of the English word.

【００５８】 FIG. 13 shows a flowchart of the control program of the machine translation preprocessing device of the sixth embodiment. The flowchart of FIG. 13 is obtained from the flowchart of the third embodiment shown in FIG. 9 by replacing steps 403 and 404 with steps 503 and 504, respectively, and steps 406 and 407 with steps 506 and 507, respectively. Steps 401, 402, 405, and 408 are the same as those shown in FIG. 9, so their detailed description is not repeated here.

【００５９】 Step 503 determines whether the dictionary lookup performed in step 402 succeeded. If it succeeded, the resulting headword, translation, and word form information are stored in the dictionary lookup result buffer 1062, and control proceeds to step 504. If it failed, the input sentence is copied to the replacement result buffer 1063 and control proceeds to step 405.

【００６０】 In step 504, the replacement unit 1052 replaces the headword stored in the dictionary lookup result buffer 1062, where it occurs in the Japanese sentence stored in the input sentence buffer 1061, with its translation and stores the result in the replacement result buffer 1063. At this time, so that the replaced translation can later be put into the plural form according to the word form information for the headword, the content of the replacement result buffer 1063 is modified into a format from which the machine translation system used afterward can correctly interpret plural information. Here, to let that machine translation system interpret and translate plural information correctly, a word that expresses the meaning of the headword and that the machine translation system can easily pluralize is appended after the English translation obtained by dictionary lookup, separated, as one example, by the symbol 「◎」.

【００６１】 In step 405, the content of the replacement result buffer 1063 is translated by the machine translation system.

【００６２】 Step 506 determines whether word form information was added in step 504. If it was, control proceeds to step 507; if not, control proceeds to step 408.

【００６３】 In step 507, the word form of the word corresponding to the headword is corrected on the basis of the translation of the information added to the input sentence in step 504 and the word form information obtained by the dictionary lookup in step 402. Here, as a result of machine translation, the plural form of the word added in step 504 is output immediately after the English translation corresponding to the headword, separated by the symbol 「◎」, and is stored in the translation result buffer 1065. If the word immediately after 「◎」 has been translated as a plural by the machine translation system, the word immediately before it must also be made plural. The English translation in the translation result is therefore corrected to the plural form obtained by dictionary lookup, and the result is stored back in the translation result buffer 1065.

【００６４】 In step 408 the translation result is output and processing ends. A concrete example follows. Suppose the input sentence 「３人の妊産婦が呼ばれた。」 ("Three expectant mothers were called.") is given. Dictionary lookup against the preprocessing dictionary table shown in FIG. 12 yields "parturient" as the English translation of 「妊産婦」, together with the word form information "-s" for its plural form.

【００６５】 In step 504, the replacement unit 1052 converts the sentence 「３人の妊産婦が呼ばれた。」 into 「３人の<parturient◎少女>が呼ばれた。」. The word 「少女」 ("girl") immediately after 「◎」 is the information added to the source sentence so that the word form can be corrected after translation. Suppose that giving this sentence to the machine translation system yields the translation result "Three parturient◎, girls, were called." Because 「少女」 has been translated as "girls", the word "parturient" immediately before 「◎」 must be made plural. The dictionary lookup in steps 402 and 503 has already established that the plural of "parturient" is "parturients". In step 507, therefore, post-processing corrects "parturient" in "Three parturient◎, girls, were called." to its plural form "parturients" and deletes from the translation result the portion "◎, girls" corresponding to the symbol 「◎」 added in step 504 and the word immediately after it. The result is the correct translation "Three parturients were called."
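The word form correction of step 507 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `fix_plural` and its parameters are hypothetical, the plural of the helper word ("girls") is supplied by the caller, and the word form information is assumed to be a suffix written like "-s".

```python
import re


def fix_plural(translated, target, plural_info, marker_plural, marker="◎"):
    """Pluralize `target` if the helper word after the marker came out plural.

    The marker, the helper word, and the commas around it are removed from
    the translation in either case.
    """
    suffix = plural_info.lstrip("-")  # "-s" -> "s"
    pattern = re.compile(
        re.escape(target) + re.escape(marker) + r"[\s,]*(\w+),?"
    )
    match = pattern.search(translated)
    if match is None:
        return translated  # nothing was annotated; leave the text alone
    is_plural = match.group(1).lower() == marker_plural.lower()
    corrected = target + suffix if is_plural else target
    return translated[:match.start()] + corrected + translated[match.end():]


fixed = fix_plural(
    "Three parturient◎, girls, were called.", "parturient", "-s", "girls"
)
```

If the helper word comes back singular, the target is left singular and only the marker and helper word are removed.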

【００６６】 As described above, the machine translation preprocessing device of the sixth embodiment makes it possible to correct word forms, such as the plural form of a translated word, and thereby improve translation accuracy.

[Seventh Embodiment]

The machine translation preprocessing device of the seventh embodiment has the same hardware configuration, functional block configuration, and software configuration as the device of the first embodiment shown in FIG. 1. It differs from the first embodiment in that, as shown in FIG. 14, each entry of the preprocessing dictionary table 1041 is a pair consisting of a headword in Japanese, the source language, and a paraphrase of that headword in the same language. In all other respects the two are the same.

【００６７】 "Paraphrasing" here means replacing a word that is not in the machine translation system's dictionary with a word of nearly the same meaning that is registered in that dictionary. Those skilled in the art will readily understand that applying such a replacement to the input sentence as preprocessing is likely to improve the accuracy of machine translation.

【００６８】 For example, consider a machine translation system whose dictionary lacks 「懐妊」 but contains 「妊娠」 (both meaning "pregnancy"). If the input sentence is 「王女の懐妊が伝えられた。」 ("The princess's pregnancy was reported."), translating it directly with this system fails to handle 「懐妊」 and produces a result such as "Princess' bosom 妊 was transmitted." If preprocessing first paraphrases the sentence as 「王女の妊娠が伝えられた。」, the same machine translation system produces "Princess' pregnancy was transmitted.", a more appropriate result than without paraphrasing.

【００６９】 In the device of this embodiment, the preprocessing dictionary table 1041 may register not only synonyms such as 「懐妊」/「妊娠」 but also spelling variants such as 「見積もり」/「見積り」 ("estimate") as paraphrases. Entries may also be registered that paraphrase a compound word such as 「資料収集」 ("material collection") into a combination of basic words, 「資料の収集」 ("collection of materials"). The point is that preprocessing replaces words not registered in the machine translation system's dictionary with words, or combinations of words, that are registered in it.
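A paraphrase table of this kind can be sketched in Python as a plain dictionary applied in a single pass. The table contents are taken from the examples above, but the direction of the spelling-variant pair and the function name `paraphrase` are assumptions made for illustration.

```python
import re


def paraphrase(sentence, table):
    """Replace every headword in `sentence` with its registered paraphrase.

    Longer headwords are tried first, and the substitution is done in one
    pass so that a paraphrase is never rewritten by another entry.
    """
    if not table:
        return sentence
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda m: table[m.group(0)], sentence)


paraphrase_table = {
    "懐妊": "妊娠",            # synonym
    "見積もり": "見積り",       # spelling variant (direction assumed)
    "資料収集": "資料の収集",    # compound split into basic words
}
rewritten = paraphrase("王女の懐妊が伝えられた。", paraphrase_table)
```

A single regular-expression pass is used instead of repeated `str.replace` calls because chained replacements could rewrite text that an earlier entry had just produced.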

【００７０】 According to the device of this embodiment, each entry of the preprocessing dictionary is a pair of source-language words, so the preprocessing dictionary table is easier to create than when entries pair a source-language word with its translation.

【００７１】 The embodiments disclosed here are to be considered in all respects as illustrative and not restrictive. The scope of the present invention is indicated by the claims rather than by the description above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

[Brief description of the drawings]

【図１】 FIG. 1 is a functional block diagram of a machine translation preprocessing device according to a first embodiment of the present invention.

【図２】 FIG. 2 is a diagram showing an example of the contents of the preprocessing dictionary table of the device of the first embodiment.

【図３】 FIG. 3 is a flowchart of the control program of the device of the first embodiment.

【図４】 FIG. 4 is a diagram showing the contents of each buffer in the device of the first embodiment.

【図５】 FIG. 5 is a functional block diagram of a machine translation preprocessing device according to a second embodiment of the present invention.

【図６】 FIG. 6 is a flowchart of the control program of the device of the second embodiment.

【図７】 FIG. 7 is a functional block diagram of a machine translation preprocessing device according to a third embodiment of the present invention.

【図８】 FIG. 8 is a diagram showing an example of the contents of the preprocessing dictionary table of the device of the third embodiment.

【図９】 FIG. 9 is a flowchart of the control program of the device of the third embodiment.

【図１０】 FIG. 10 is a diagram showing an example of the contents of the preprocessing dictionary table of the device of the fourth embodiment.

【図１１】 FIG. 11 is a diagram showing an example of the contents of the preprocessing dictionary table of the device of the fifth embodiment.

【図１２】 FIG. 12 is a diagram showing an example of the contents of the preprocessing dictionary table of the device of the sixth embodiment.

【図１３】 FIG. 13 is a flowchart of the control program of the device of the sixth embodiment.

【図１４】 FIG. 14 is a diagram showing an example of the contents of the preprocessing dictionary table of the device of the seventh embodiment.

[Explanation of symbols]

101 control unit, 102 input unit, 103 output unit, 104 table memory, 105 program memory, 106 buffer memory, 1041 preprocessing dictionary table, 1042 morphological information table, 1051 dictionary lookup unit, 1052 replacement unit, 1053 post-processing unit, 1054 morphological analysis unit, 1061 input sentence buffer, 1062 dictionary lookup result buffer, 1063 replacement result buffer, 1064 morphological analysis result buffer, 1065 translation result buffer.

Claims

[Claims]

1. A machine translation preprocessing device for preprocessing input text in a source language, before the input text is input to a machine translation device, in order to improve the accuracy of machine translation, comprising: input means for inputting text in the source language; means for storing a preprocessing dictionary table having entries each including a pair of a headword in the source language and a corresponding word in a predetermined language; search means for searching the preprocessing dictionary table for an entry whose headword corresponds to a word included in the text input by the input means; and replacement means for replacing a part of the text with the corresponding word in the predetermined language included in the entry found by the search means.

2. The machine translation pre-processing device according to claim 1, further comprising: means for storing a morphological information table including morphological information on words of the source language; and morphological analysis means for analyzing the text with reference to the morphological information table, dividing the text into morphemes, and supplying the morphemes to the search means.

3. The machine translation pre-processing device according to claim 1, wherein the predetermined language is a language different from the source language; each of the entries of the pre-processing dictionary table further includes, in addition to the pair of the headword of the source language and the corresponding word of the predetermined language, semantic information of the headword of the source language; the replacement means includes providing means for replacing a part of the text with the corresponding word of the predetermined language included in the entry found by the search means and for further adding the semantic information of the headword corresponding to that corresponding word; and the machine translation pre-processing device further comprises means for deleting the semantic information added by the providing means from the result obtained when the machine translation device translates the text partially replaced by the providing means into a sentence of the predetermined language.

4. The machine translation pre-processing device according to claim 2, wherein the semantic information corresponding to the headword of the source language in each of the entries of the pre-processing dictionary table is one of the morphemes constituting the corresponding headword.

5. The machine translation pre-processing device, wherein the headword of an entry of the pre-processing dictionary table includes a combination of ideographic characters, and the semantic information corresponding to the headword includes only the one character at the end of the headword.

6. The machine translation pre-processing device according to claim 5, wherein the source language is a language using kanji as ideographic characters, and the semantic information includes only the one kanji at the end of the corresponding headword.

7. The machine translation pre-processing device according to claim 1 or 2, wherein each of the entries of the pre-processing dictionary table further includes, in addition to the pair of the headword of the source language and the corresponding word of the predetermined language, word form information of the corresponding word of the predetermined language; the replacement means includes providing means for replacing a part of the text with the corresponding word of the predetermined language included in the entry found by the search means and for further adding the word form information of that corresponding word; and the machine translation pre-processing device further comprises means for correcting, in the result obtained when the machine translation device translates the text partially replaced by the providing means into a sentence of the predetermined language, the form of the corresponding word of the predetermined language according to the word form information added by the providing means.

8. The machine translation preprocessing apparatus according to claim 1, wherein the predetermined language is the same language as the source language.
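
The search and replacement steps recited in claim 1 can be illustrated with a minimal sketch. It is not the implementation described in the specification. The table entries, the function name `preprocess`, and the longest-match lookup strategy are assumptions made here for illustration. The morphological analysis of claim 2 is omitted, so the sketch scans the text character by character instead of receiving morphemes.

```python
from typing import Dict

# Hypothetical preprocessing dictionary table: source-language headword ->
# corresponding word in the predetermined language. These entries are
# invented for illustration and do not come from the specification.
PREPROCESSING_TABLE: Dict[str, str] = {
    "奈良": "Nara",
    "奈良公園": "Nara Park",
}


def preprocess(text: str, table: Dict[str, str] = PREPROCESSING_TABLE) -> str:
    """Replace every headword found in `text` with its corresponding word.

    Assumption: when several headwords match at the same position, the
    longest one wins. Text that matches no headword is copied unchanged.
    """
    max_len = max(map(len, table), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try candidate substrings starting at i, longest first.
        for n in range(min(max_len, len(text) - i), 0, -1):
            equivalent = table.get(text[i:i + n])
            if equivalent is not None:
                out.append(equivalent)  # replacement step
                i += n
                break
        else:
            # No headword starts here: keep the original character.
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, under these assumed entries, `preprocess("奈良公園に行く")` returns `"Nara Parkに行く"`, and the partially replaced text would then be passed to the machine translation device. A device following claim 2 would instead look up the morphemes produced by the morphological analysis means.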