JP2010067021A

JP2010067021A - Machine translation device and machine translation program

Info

Publication number: JP2010067021A
Application number: JP2008232931A
Authority: JP
Inventors: Yoko Odaka; 陽子小▲高▼
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2008-09-11
Filing date: 2008-09-11
Publication date: 2010-03-25
Anticipated expiration: 2028-09-11
Also published as: JP5025603B2

Abstract

PROBLEM TO BE SOLVED: To obtain an appropriate second-language translation when translating a document in which first-language sentences and non-translation-target sentences coexist, by extracting information that affects translation selection from the non-target text and using it for translation.

SOLUTION: A second language sentence analysis processing part 27 analyzes the second-language text data in document data in which the first language and the second language coexist, extracts words and phrases of predetermined parts of speech, and stores them as extracted words and phrases in a second language sentence extracted word and phrase database 28. When a translation dictionary part 31 holds a plurality of second-language words and phrases as translation candidates for a first-language word or phrase, a translation selection processing part 35 narrows the candidates down to one second-language word or phrase based on the extracted words and phrases stored in the database 28 and on the co-occurrence information or field information in a second language machine translation knowledge database 38. A translation generation processing part 32 then generates the translation, which is output to a display device through an output processing part 34.

COPYRIGHT: (C)2010,JPO&INPIT

Description

本発明は、翻訳対象の第一言語の文書を第二言語に翻訳する機械翻訳装置及び機械翻訳プログラムに関する。 The present invention relates to a machine translation device and a machine translation program for translating a document in a first language to be translated into a second language.

一般に、翻訳対象の第一言語と翻訳後の第二言語とが混在した文書を翻訳する場合、翻訳対象の文書のうち訳出言語で書かれた部分は、翻訳処理を通さずにそのままの形で出力され、翻訳対象の第一言語で書かれた部分は翻訳処理を通しその訳文が出力される。 In general, when translating a document in which the first language (the language to be translated) and the second language (the target language) are mixed, the portions written in the target language are output as they are without passing through translation processing, while the portions written in the first language are translated and their translations are output.

また、翻訳対象の第一言語の文書を第二言語に翻訳する翻訳処理においては、第１言語の語句に対して複数個の訳語が存在する場合には、翻訳辞書の中で優先順位の高いものから順に訳語を選択していく。従って、普段滅多に翻訳する機会のない分野や専門性が高い分野の文書の場合には、その分野の訳語の優先順位は低くなっていることが多く、結果として正しくない訳語が選択されることが多い。これにより、原文の表す意味とはかけ離れた意味をもつ訳文が生成されてしまう。 Further, in translation processing that translates a first-language document into the second language, when a first-language word or phrase has a plurality of translations, the translations are selected in descending order of priority in the translation dictionary. Consequently, for documents in fields that are rarely translated or that are highly specialized, the translations specific to that field often have low priority, and an incorrect translation is often selected as a result. A translated sentence whose meaning is far from that of the original is then generated.

正しい訳語を得るために、目的言語の語句間の共起情報を収容する共起辞書を備え、原文の語句に対して複数の訳語候補がある場合にその共起辞書を参照し、当該語に共起情報として定義された共起語に訳す第一言語の語句が原文書中にあるかどうかを検索し、その頻度や出現位置、共起辞書に定義された個々の共起の評価値から当該訳語の総合的な評価値を算出し、訳語の決定に用いるようにしたものがある（例えば、特許文献１参照）。 To obtain a correct translation, one known approach provides a co-occurrence dictionary holding co-occurrence information between target-language words and phrases. When a word or phrase in the original text has a plurality of translation candidates, the co-occurrence dictionary is consulted, and the original document is searched for first-language words and phrases that translate into the co-occurring words defined for each candidate. An overall evaluation value for the candidate is calculated from their frequency and position of appearance and from the evaluation value of each co-occurrence defined in the dictionary, and is used to decide the translation (for example, see Patent Document 1).

また、翻訳対象となる原文の内容と関連のある内容の既存の目的言語文書を関連文書格納部に予め格納しておき、原文の単語・句に対して複数の訳語・句候補が存在する場合に、これらの各訳語・句候補が関連文書格納部に格納された目的言語文書中に存在するかどうかを検索し、目的言語文書中に該当する訳語・句候補が存在する場合には、その訳語・句候補を他の訳語・句候補よりも優先的に原文の単語・句の訳語として採用するようにしたものがある（例えば、特許文献２参照）。 In another approach, existing target-language documents whose content is related to the original text to be translated are stored in advance in a related document storage unit. When a word or phrase in the original text has a plurality of candidate translations, the stored target-language documents are searched for each candidate, and a candidate found there is adopted as the translation in preference to the other candidates (for example, see Patent Document 2).

所定の言語間で翻訳処理の際に得た原文情報及び訳文情報の間の対応関係を示す翻訳情報を記憶手段に記憶し、第２の言語から第１の言語に翻訳処理する際には、この処理の前に第１の言語から第２の言語に翻訳処理した際に得た翻訳情報を記憶手段から取り出して翻訳処理を行うようにしたものがある（例えば、特許文献３参照）。 In yet another approach, translation information indicating the correspondence between original-text information and translated-text information obtained during translation processing between predetermined languages is stored in storage means, and when translating from the second language into the first language, the translation information obtained when previously translating from the first language into the second language is retrieved from the storage means and used for the translation processing (for example, see Patent Document 3).
特許第３０４５８３２号公報 特許第３０３４２９５号公報 特許第２８３１６４７号公報 Japanese Patent No. 3045832; Japanese Patent No. 3034295; Japanese Patent No. 2831647

しかし、特許文献１のものでは、翻訳対象とならない文章は共起情報の対象としていないので、第一言語の文と第二言語の文とが混在する文書を翻訳する際に、第一言語の文を翻訳した訳語の語句と、文書中に存在する第二言語の文の語句とが統一されないことがある。また、特許文献２や特許文献３のものにおいても、同じ翻訳対象文書に含まれる第二言語から第一言語の翻訳情報を得るわけではないので、訳語選択に役立つ情報を適切に得ることができない。 However, the approach of Patent Document 1 does not use sentences that are not translation targets as a source of co-occurrence information. When a document mixing first-language and second-language sentences is translated, the wording of the translated first-language sentences may therefore be inconsistent with the wording of the second-language sentences already in the document. Likewise, the approaches of Patent Documents 2 and 3 do not obtain translation information for the first language from the second-language text contained in the same document, so information useful for translation selection cannot be obtained appropriately.

図８は、従来例での第一言語の文と第二言語の文とが混在する文書を翻訳した一例の説明図である。図８では第一言語を英語とし、第二言語を日本語とした場合を示している。図８（ａ）に示すように、第一言語の文（英語の文）と第二言語の文（日本語の文）とが混在する文書であり、図８（ａ）に示す文書を英日翻訳すると、中央の英語部分のみ翻訳処理にかけられ、図８（ｂ）に示すような訳文が得られる。 FIG. 8 is an explanatory diagram of an example in which a conventional device translates a document mixing first-language and second-language sentences. FIG. 8 shows the case where the first language is English and the second language is Japanese. The document shown in FIG. 8(a) mixes first-language (English) sentences and second-language (Japanese) sentences. When it is translated from English to Japanese, only the central English portion is subjected to translation processing, and the translation shown in FIG. 8(b) is obtained.

ここで用いている「bank」は、訳語として、「銀行」、「土手」、「堤防」、「岸」等を持つ多義語であるが、従来の機械翻訳装置では、一般的に頻度的に高いと判断された訳語を第一訳語として定義し、翻訳時に特別な訳し分けのための情報が得られない場合には、第一訳語として定義された訳語が翻訳に使用される。そのため、翻訳辞書に「bank」の第一訳語として「銀行」が定義されていれば、原文書中から訳語を決定するための情報が得られない場合は、「bank」は「銀行」と訳される。 The word “bank” used here is polysemous, with translations such as 銀行 (financial bank), 土手 (riverbank), 堤防 (embankment), and 岸 (shore). A conventional machine translation device defines the translation judged to be most frequent in general as the first translation, and uses it whenever no information for distinguishing the senses is available at translation time. Therefore, if 銀行 is defined in the translation dictionary as the first translation of “bank”, “bank” is translated as 銀行 whenever the original document yields no information for deciding the translation.
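
The conventional behavior described above amounts to always taking the first entry of a priority-ordered candidate list. A minimal Python sketch, assuming a hypothetical in-memory dictionary `TRANSLATION_DICT` (the name and layout are illustrative and are not taken from the patent):

```python
# Hypothetical translation dictionary: candidates are listed in priority order,
# so index 0 plays the role of the "first translation".
TRANSLATION_DICT = {
    "bank": ["銀行", "土手", "堤防", "岸"],
}


def first_translation(word, dictionary=TRANSLATION_DICT):
    """Return the highest-priority candidate, or None if the word is not listed."""
    candidates = dictionary.get(word, [])
    return candidates[0] if candidates else None
```

With this layout, `first_translation("bank")` yields 銀行 regardless of the surrounding text, which is the failure mode the invention sets out to address.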

ここで、図８（ａ）の翻訳する必要のない日本語で書かれた部分に着目すると、この場合の「bank」の訳語としては「土手」、「堤防」、「岸」などの語が適切であることがわかる。このように、従来例においては、第一言語文と第二言語文とが混在する文書を翻訳する際に、第二言語の文章から共起情報や分野情報などの訳語決定に左右する情報を抽出して翻訳に利用することは行われていないので、第一言語の語句の訳語として適切な第二言語の訳語が得られないことがあった。 Looking at the Japanese portion of FIG. 8(a), which does not need to be translated, it is clear that words such as 土手 (riverbank), 堤防 (embankment), or 岸 (shore) are the appropriate translations of “bank” in this case. In the conventional example, however, information that affects translation selection, such as co-occurrence information and field information, is not extracted from the second-language text and used when translating a document mixing first-language and second-language sentences, so an appropriate second-language translation of a first-language word or phrase was sometimes not obtained.

本発明の目的は、第一言語文と翻訳非対象文とが混在する文書を翻訳する際に、翻訳非対象の文章から訳語決定に左右する情報を抽出して翻訳に利用し、適切な第二言語の訳語を得ることができる機械翻訳装置及び機械翻訳プログラムを提供することである。 An object of the present invention is to provide a machine translation device and a machine translation program that, when translating a document mixing first-language sentences and non-translation-target sentences, extract information that affects translation selection from the non-target text, use it for translation, and thereby obtain appropriate second-language translations.

本発明の機械翻訳装置は、機械翻訳プログラム及び第一言語の語句の訳語候補となる一又は複数の第二言語の語句が格納された翻訳辞書部を記憶した記憶装置と、翻訳対象の第一言語及び翻訳非対象の第二言語が混在したデータを入力する入力装置と、前記第一言語を翻訳した第二言語の訳文を表示する表示装置と、前記機械翻訳プログラムを演算実行する演算制御装置とを備えた機械翻訳装置において、前記記憶装置に形成され、第二言語の語句の少なくとも共起情報及び分野情報を格納した第二言語用機械翻訳知識データベースと、第一言語と第二言語とが混在したデータが前記入力装置より入力されたとき、第一言語の翻訳対象部分と第二言語の翻訳非対象部分とを入力する入力処理部と、前記入力処理部で入力された第二言語の翻訳非対象部分を解析し、第二言語の翻訳非対象部分から前記記憶装置に予め記憶された品詞の語句を抽出する第二言語文解析処理部と、前記第二言語文解析処理部で抽出された第二言語の語句を格納する第二言語文抽出語句データベースと、前記入力処理部により入力された第一言語の翻訳対象部分を解析する第一言語解析処理部と、前記第一言語解析処理部で解析された第一言語の語句の訳語候補となる第二言語の語句を前記翻訳辞書部から検索する翻訳辞書検索部と、前記翻訳辞書検索部で検索した結果、前記第一言語の語句の訳語候補となる第二言語の語句が前記翻訳辞書部に複数存在した場合に、前記第二言語文抽出語句データベースに格納された第二言語の語句と第二言語用機械翻訳知識データベースの共起情報又は分野情報に基づいて前記第一言語の語句の訳語候補を一つの第二言語の語句に絞り込み選択する訳語選択処理部と、前記翻訳辞書検索部で検索された第二言語の語句及び前記訳文選択処理部で選択された第二言語の語句に基づいて訳文を生成する訳文生成処理部と、前記訳文生成処理部で生成された第二言語の訳文を前記表示装置に出力する出力処理部とを備えたことを特徴とする。 A machine translation device according to the present invention comprises: a storage device storing a machine translation program and a translation dictionary unit in which one or more second-language words and phrases serving as translation candidates for first-language words and phrases are stored; an input device for inputting data in which a first language to be translated and a second language not to be translated are mixed; a display device for displaying a second-language translation of the first language; and an arithmetic control device for executing the machine translation program. The device is characterized by further comprising: a second language machine translation knowledge database that is formed in the storage device and stores at least co-occurrence information and field information for second-language words and phrases; an input processing unit that, when data mixing the first language and the second language is input from the input device, takes in the first-language translation target portion and the second-language non-target portion; a second language sentence analysis processing unit that analyzes the second-language non-target portion taken in by the input processing unit and extracts from it words and phrases of parts of speech stored in advance in the storage device; a second language sentence extraction phrase database that stores the second-language words and phrases extracted by the second language sentence analysis processing unit; a first language analysis processing unit that analyzes the first-language translation target portion taken in by the input processing unit; a translation dictionary search unit that searches the translation dictionary unit for second-language words and phrases that are translation candidates for the first-language words and phrases analyzed by the first language analysis processing unit; a translation selection processing unit that, when the search finds a plurality of second-language candidates in the translation dictionary unit for a first-language word or phrase, narrows the candidates down to one second-language word or phrase based on the second-language words and phrases stored in the second language sentence extraction phrase database and on the co-occurrence information or field information in the second language machine translation knowledge database; a translation generation processing unit that generates a translation based on the second-language words and phrases found by the translation dictionary search unit and those selected by the translation selection processing unit; and an output processing unit that outputs the second-language translation generated by the translation generation processing unit to the display device.

本発明によれば、第一言語文と第二言語文とが混在する文書を翻訳する際に、第二言語の文章から訳語決定に左右する情報を抽出して翻訳に利用し、適切な第二言語の訳語を得ることができる。 According to the present invention, when translating a document mixing first-language and second-language sentences, information that affects translation selection is extracted from the second-language text and used for translation, so that appropriate second-language translations can be obtained.

図１は本発明の実施の形態に係る機械翻訳装置のハードウエア構成を示すブロック構成図である。機械翻訳装置１１は、例えば一般的なコンピュータに機械翻訳プログラムなどのソフトウェアプログラムがインストールされ、そのソフトウェアプログラムが演算制御装置１２のプロセッサ１３において実行されることにより実現される。 FIG. 1 is a block configuration diagram showing a hardware configuration of a machine translation apparatus according to an embodiment of the present invention. The machine translation device 11 is realized, for example, by installing a software program such as a machine translation program in a general computer and executing the software program in the processor 13 of the arithmetic control device 12.

演算制御装置１２は機械翻訳に関する各種演算を行うものであり、演算制御装置１２はプロセッサ１３とメモリ１４とを有し、メモリ１４には翻訳に関する機械翻訳プログラム１５が記憶され、プロセッサ１３により処理が実行される際には作業エリア１６が用いられる。演算制御装置１２の演算結果等は出力装置１７である表示装置１８に表示出力され、また、通信制御装置１９を介して通信ネットワークに出力される。 The arithmetic control device 12 performs various computations related to machine translation and has a processor 13 and a memory 14. The memory 14 stores a machine translation program 15, and a work area 16 is used when the processor 13 executes processing. Computation results and the like of the arithmetic control device 12 are displayed on a display device 18, which serves as an output device 17, and are also output to a communication network via a communication control device 19.

入力装置２０は演算制御装置１２に情報を入力するものであり、例えば、マウス２１、キーボード２２、ディスクドライブ２３、通信制御装置１９から構成され、例えば、マウス２１やキーボード２２は表示装置１８を介して演算制御装置１２に各種指令を入力し、キーボード２２、ディスクドライブ２３、通信制御装置１９は翻訳対象の文書を入力する。 The input device 20 is used to input information to the arithmetic control device 12, and includes, for example, a mouse 21, a keyboard 22, a disk drive 23, and a communication control device 19. For example, the mouse 21 and the keyboard 22 are connected via the display device 18. Then, various commands are input to the arithmetic and control unit 12, and the keyboard 22, the disk drive 23, and the communication control unit 19 input a document to be translated.

すなわち、ディスクドライブ２３は翻訳対象の文書のファイルを記憶媒体に入出力するものであり、通信制御装置１９は機械翻訳装置１１をインターネットやＬＡＮなどの通信ネットワークに接続するものである。通信制御装置１９はＬＡＮカードやモデムなどの装置であり、通信制御装置１９を介して通信ネットワークと送受信したデータは入力信号又は出力信号として演算制御装置１２に送受信される。さらに、演算制御装置１２の演算結果や翻訳に必要な知識・規則を蓄積した翻訳辞書等を記憶するハードディスクドライブ（ＨＤＤ）２４が設けられている。 That is, the disk drive 23 reads and writes files of documents to be translated from and to a storage medium, and the communication control device 19 connects the machine translation device 11 to a communication network such as the Internet or a LAN. The communication control device 19 is a device such as a LAN card or a modem, and data exchanged with the communication network through it is passed to and from the arithmetic control device 12 as input or output signals. In addition, a hard disk drive (HDD) 24 is provided to store the computation results of the arithmetic control device 12 as well as the translation dictionary and other data accumulating the knowledge and rules needed for translation.

図２は本発明の実施の形態に係わる機械翻訳装置１１の機能ブロック図である。図２に示す演算制御装置１２内の各機能ブロックは、上述の機械翻訳プログラム１５を構成する各プログラムに対応する。すなわち、プロセッサ１３が機械翻訳プログラム１５を構成する各プログラムを実行することで、演算制御装置１２は、各機能ブロックとして機能することとなる。また、記憶装置２５の各ブロックは、演算制御装置１２内のメモリ１４及びハードディスクドライブ２４の記憶領域に対応する。 FIG. 2 is a functional block diagram of the machine translation apparatus 11 according to the embodiment of the present invention. Each functional block in the arithmetic and control unit 12 shown in FIG. 2 corresponds to each program constituting the machine translation program 15 described above. That is, when the processor 13 executes each program constituting the machine translation program 15, the arithmetic control device 12 functions as each functional block. Each block of the storage device 25 corresponds to a storage area of the memory 14 and the hard disk drive 24 in the arithmetic control device 12.

入力処理部２６は、入力装置２０から入力された第一言語の文と第二言語の文とが混在する文書データを取り込み、文書内で翻訳すべき文章部分（即ち翻訳前の第一言語で書かれた部分）と、翻訳する必要のない文章部分（即ち翻訳後の第二言語で書かれた部分）とに分けるものである。 The input processing unit 26 takes in document data in which a sentence in the first language and a sentence in the second language input from the input device 20 are mixed, and a sentence portion to be translated in the document (that is, in the first language before translation). And a sentence portion that does not need to be translated (ie, a portion written in the second language after translation).
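
The patent does not say how the input processing unit 26 tells the two languages apart. As one possible illustration for the English/Japanese case, the Python sketch below classifies each line by whether it contains Japanese script. The function name `split_by_language` and the character ranges are assumptions made for this example only.

```python
import re

# Assumption: a line containing hiragana, katakana, or CJK ideographs is
# treated as second-language (Japanese) text that needs no translation.
_JAPANESE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def split_by_language(lines):
    """Split lines into (to_translate, keep_as_is), skipping blank lines."""
    to_translate, keep_as_is = [], []
    for line in lines:
        if not line.strip():
            continue
        if _JAPANESE.search(line):
            keep_as_is.append(line)    # second language: passed through untranslated
        else:
            to_translate.append(line)  # first language: sent on for translation
    return to_translate, keep_as_is
```

Line-level classification is a simplification; a document that mixes both languages within a single sentence would need a finer-grained split.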

第二言語文解析処理部２７は、文書内の翻訳後の第二言語で書かれた部分の文章データを解析し、第二言語の文章データから予め定めた品詞の語句を抽出するものである。例えば、翻訳対象の第一言語の文章に含まれる語句の共起や分野情報を得るために、文書内の第二言語の文章から、名詞や動詞など共起や分野情報を得るための語を抽出する。もちろん、共起や分野情報を得るために役立つのであれば、抽出する語の品詞は、名詞や動詞以外の品詞の語句でも構わない。予め定めた品詞は記憶装置２５に予め記憶される。第二言語文解析処理部２７は、抽出した語を第二言語文抽出語句データベース２８へ格納する。 The second language sentence analysis processing unit 27 analyzes sentence data of a part written in the second language after translation in the document, and extracts a phrase of a predetermined part of speech from the sentence data of the second language. . For example, in order to obtain co-occurrence of words and field information contained in a sentence in the first language to be translated, words for obtaining co-occurrence and field information such as nouns and verbs from sentences in the second language in the document. Extract. Of course, as long as it is useful for obtaining co-occurrence and field information, the part of speech of the extracted word may be a part of speech other than a noun or a verb. The predetermined part of speech is stored in the storage device 25 in advance. The second language sentence analysis processing unit 27 stores the extracted word in the second language sentence extraction word / phrase database 28.
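
The extraction step can be sketched as a filter over part-of-speech-tagged tokens. The example below assumes that some morphological analyzer (not specified in the patent) has already produced `(surface, part_of_speech)` pairs. The tag names, the `TARGET_POS` set, and the de-duplication of repeated words are all illustrative assumptions.

```python
# Assumed set of predetermined parts of speech; the patent names nouns and
# verbs as examples and allows others.
TARGET_POS = {"noun", "verb"}


def extract_phrases(tagged_tokens, target_pos=TARGET_POS):
    """Keep tokens whose part of speech is in target_pos, in first-seen order."""
    seen = set()
    extracted = []
    for surface, pos in tagged_tokens:
        if pos in target_pos and surface not in seen:
            seen.add(surface)
            extracted.append(surface)
    return extracted
```

The returned list corresponds to what would be stored in the second language sentence extraction phrase database 28.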

第一言語文解析処理部２９は、入力処理部２６から入力された文書内の翻訳対象原文とされた第一言語の文について形態素解析及び辞書引きをする単位に分割し翻訳辞書検索部３０に出力する。以下、第一言語文解析処理部２９で得られた語を原語と呼ぶことにする。 The first language sentence analysis processing unit 29 divides the sentence of the first language, which is the original text to be translated in the document input from the input processing unit 26, into units for performing morphological analysis and dictionary lookup, and sends them to the translation dictionary search unit 30. Output. Hereinafter, the words obtained by the first language sentence analysis processing unit 29 are referred to as original words.

次に、翻訳辞書検索部３０は、第一言語文解析処理部２９で分割された言語と共に、入力処理部２６より渡された文書内の翻訳後の第二言語で書かれた文章を入力し、原語の対訳語を得るために、翻訳辞書部３１の中を調べる。この中に、原語と同じものを見出し語とするものが発見されれば、当該見出し語に対して定義された訳語に置き換え訳文生成処理部３２へ進む。このとき、必要に応じて第一言語用機械翻訳知識データベース３３を使用する。第一言語用機械翻訳知識データベース３３には、原語の綴りだけでなく、意味、品詞、共起、分野など、様々な種類の膨大な量の情報が蓄積されている。 Next, the translation dictionary search unit 30 receives the words segmented by the first language sentence analysis processing unit 29 together with the second-language text in the document passed from the input processing unit 26, and looks in the translation dictionary unit 31 for a translation of each original word. If an entry whose headword matches the original word is found, the original word is replaced with the translation defined for that headword and processing proceeds to the translation generation processing unit 32. The first language machine translation knowledge database 33 is used at this point as necessary. It stores a large amount of information of various kinds, including not only the spelling of original words but also meaning, part of speech, co-occurrence, and field.

翻訳辞書検索部３０にて、原語が一つの訳語に置き換えられた場合には、訳文生成処理部３２は訳語に置き換えられた文から訳文を生成し、訳文生成処理部３２で生成された訳文を翻訳出力処理部３４から出力する。 When the translation dictionary search unit 30 replaces the original word with a single translation, the translation generation processing unit 32 generates a translation from the sentence replaced with the translation, and the translation generated by the translation generation processing unit 32 Output from the translation output processing unit 34.

翻訳辞書部３１の中に、原語に相当する訳語が複数発見され、かつ第一言語用機械翻訳知識データベース３３を使用しても一つの対訳語に絞ることができない場合は、翻訳辞書検索部３０は訳語選択処理部３５を起動する。 If a plurality of translations corresponding to the original language are found in the translation dictionary unit 31 and the first language machine translation knowledge database 33 cannot be narrowed down to one translation, the translation dictionary search unit 30 Activates the translation selection processing unit 35.
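
The control flow around the translation dictionary search unit 30 can be sketched as a three-way dispatch. In the Python example below, the two callables stand in for the first-language narrowing step and for the translation selection processing unit 35. Their names and signatures, and the handling of words missing from the dictionary, are assumptions for illustration.

```python
def resolve_word(word, dictionary, narrow_with_first_language, select_with_second_language):
    """Pick one translation for `word`, escalating only when candidates remain ambiguous."""
    candidates = dictionary.get(word, [])
    if not candidates:
        return word  # Assumption: words missing from the dictionary pass through unchanged.
    if len(candidates) == 1:
        return candidates[0]  # A single translation needs no further selection.
    narrowed = narrow_with_first_language(word, candidates)
    if len(narrowed) == 1:
        return narrowed[0]  # First-language knowledge was enough.
    # Still ambiguous: hand over to second-language-based selection.
    # Assumption: fall back to the full list if narrowing returned nothing.
    return select_with_second_language(word, narrowed or candidates)
```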

訳語選択処理部３５は第二言語知識検索部３６及び対訳語決定処理部３７からなり、第二言語知識検索部３６は第二言語文抽出語句データベース２８と第二言語用機械翻訳知識データベース３８とを使って適切な訳語の確からしさを検索し、対訳語決定処理部３７は第二言語知識検索部３６の検索結果から訳語を一つに絞り込んで訳文生成処理部３２に出力する。 The translation selection processing unit 35 includes a second language knowledge search unit 36 and a parallel translation determination processing unit 37. The second language knowledge search unit 36 includes a second language sentence extraction phrase database 28, a second language machine translation knowledge database 38, The parallel word determination processing unit 37 narrows down the translation word to one from the search result of the second language knowledge search unit 36 and outputs it to the translation generation processing unit 32.

次に、訳語選択処理部３５の詳細な処理内容について説明する。いま、図８に示す第一言語（英語）の文と第二言語（日本語）の文とが混在する文書の第一言語の文を翻訳する場合を例にして説明する。まず、第二言語文解析処理部２７では、図８の文書内の第二言語（日本語）で書かれた部分の文章を解析し、第二言語（日本語）の文章から、図４に示すように、「記事」、「運河」、「高台」、「テント」、「設営」の語句を抽出し、第二言語文抽出語句データベース２８に格納しているとする。 Next, the processing of the translation selection processing unit 35 is described in detail, taking as an example the translation of the first-language sentences in the document of FIG. 8, which mixes first-language (English) and second-language (Japanese) sentences. Assume first that the second language sentence analysis processing unit 27 has analyzed the portion of the document of FIG. 8 written in the second language (Japanese), extracted from it the words 記事 (article), 運河 (canal), 高台 (high ground), テント (tent), and 設営 (setting up), as shown in FIG. 4, and stored them in the second language sentence extraction phrase database 28.

図３は機械翻訳装置１１の処理内容を示すフローチャートである。翻訳辞書検索部３０は、翻訳辞書部３１の中に原語に相当する訳語が複数発見され、かつ第一言語用機械翻訳知識データベース３３を使用しても一つの対訳語に絞ることができない場合は、この原語及び訳語候補を訳語選択処理部３５に渡す。 FIG. 3 is a flowchart showing the processing of the machine translation device 11. When a plurality of translations corresponding to an original word are found in the translation dictionary unit 31 and they cannot be narrowed down to a single translation even by using the first language machine translation knowledge database 33, the translation dictionary search unit 30 passes the original word and its translation candidates to the translation selection processing unit 35.

訳語選択処理部３５の第二言語知識検索部３６は、翻訳辞書検索部３０から複数の訳語候補がある原語及び訳語候補を入力する（Ｓ１１）。いま、複数の訳語候補がある原語は「bank」であり、訳語候補が「銀行」、「貯蔵所」、「土手」、「堤防」、「岸」、「堆積」、「層」、「州」、「浅瀬」…であるとすると、第二言語知識検索部３６は、図５に示すように、複数の訳語候補がある原語及び訳語候補を入力する。 The second language knowledge search unit 36 of the translation selection processing unit 35 receives, from the translation dictionary search unit 30, an original word that has a plurality of translation candidates together with those candidates (S11). Suppose the original word is “bank” and its candidates are 銀行 (financial bank), 貯蔵所 (repository), 土手 (riverbank), 堤防 (embankment), 岸 (shore), 堆積 (pile), 層 (layer), 州 (sandbank), 浅瀬 (shoal), and so on. The second language knowledge search unit 36 then receives the original word and candidates as shown in FIG. 5.

そして、第二言語知識検索部３６は、第二言語文抽出語句データベース２８に格納されている抽出語句を読み込む（Ｓ１２）。図８の文書の場合には、図４に示す抽出語句（「記事」、「運河」、「高台」、「テント」、「設営」）が読み込まれる。 The second language knowledge search unit 36 then reads the extracted phrases stored in the second language sentence extraction phrase database 28 (S12). For the document of FIG. 8, the extracted phrases shown in FIG. 4 are read: 記事 (article), 運河 (canal), 高台 (high ground), テント (tent), and 設営 (setting up).

次に、第二言語知識検索部３６は、抽出語句が第二言語用機械翻訳知識データベース３８に存在するかどうかを調べ（Ｓ１３）、抽出語句が第二言語用機械翻訳知識データベース３８に存在する場合には、抽出語句の共起情報を調べ（Ｓ１４）、抽出語句の分野情報を調べる（Ｓ１５）。そして、第二言語知識検索部３６は、抽出語句がまだあるかどうかを判定し（Ｓ１６）、抽出語句がまだある場合にはステップＳ１３に戻り、ステップＳ１３〜Ｓ１５を繰り返す。これにより、すべての抽出語句につき共起情報及び分野情報を調べることになる。 Next, the second language knowledge search unit 36 checks whether the extracted word / phrase exists in the second language machine translation knowledge database 38 (S13), and the extracted word / phrase exists in the second language machine translation knowledge database 38. In this case, the co-occurrence information of the extracted word / phrase is checked (S14), and the field information of the extracted word / phrase is checked (S15). Then, the second language knowledge search unit 36 determines whether or not there is an extracted word / phrase (S16). If there is still an extracted word / phrase, the process returns to step S13 and repeats steps S13 to S15. As a result, the co-occurrence information and the field information are examined for all the extracted words.

図６は第二言語用機械翻訳知識データベース３８に蓄積された語句の共起情報及び分野情報の一例の説明図である。第二言語用機械翻訳知識データベース３８には、第二言語（日本語）の語句の少なくとも共起情報及び分野情報が格納されており、例えば、「記事」については、共起情報「ニュース、報道、新聞」、分野情報として「ビジネス」が格納され、「運河」については、共起情報「堤防、土手」、分野情報「建築土木」が格納され、「高台」については、共起情報「見晴らし、土手」、分野情報「建築土木」が格納され、「テント」については、共起情報「設営」、分野情報「アウトドア、建設」が格納され、「設営」については、共起情報なし、分野情報「建築土木」が格納され、「銀行」については、共起情報「金、預金、残高」、分野情報「金融、経済」が格納され、「土手」については、共起情報「運河」、分野情報「建築土木」が格納されている場合を示している。 FIG. 6 is an explanatory diagram of an example of the co-occurrence information and field information stored in the second language machine translation knowledge database 38, which holds at least co-occurrence information and field information for second-language (Japanese) words and phrases. In the example shown, 記事 (article) has the co-occurring words ニュース, 報道, 新聞 (news, reporting, newspaper) and the field ビジネス (business); 運河 (canal) has the co-occurring words 堤防, 土手 (embankment, riverbank) and the field 建築土木 (architecture and civil engineering); 高台 (high ground) has the co-occurring words 見晴らし, 土手 (view, riverbank) and the field 建築土木; テント (tent) has the co-occurring word 設営 (setting up) and the fields アウトドア, 建設 (outdoor, construction); 設営 has no co-occurring words and the field 建築土木; 銀行 (financial bank) has the co-occurring words 金, 預金, 残高 (money, deposit, balance) and the fields 金融, 経済 (finance, economy); and 土手 has the co-occurring word 運河 and the field 建築土木.
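
The FIG. 6 example entries map naturally onto a keyed lookup structure. The Python sketch below is one hypothetical in-memory representation. The names `KNOWLEDGE_DB` and `lookup_knowledge` and the key names `cooccurrence` and `fields` are chosen for illustration and do not come from the patent.

```python
# Hypothetical stand-in for the second language machine translation knowledge
# database 38, filled with the example entries described for FIG. 6.
KNOWLEDGE_DB = {
    "記事": {"cooccurrence": ["ニュース", "報道", "新聞"], "fields": ["ビジネス"]},
    "運河": {"cooccurrence": ["堤防", "土手"], "fields": ["建築土木"]},
    "高台": {"cooccurrence": ["見晴らし", "土手"], "fields": ["建築土木"]},
    "テント": {"cooccurrence": ["設営"], "fields": ["アウトドア", "建設"]},
    "設営": {"cooccurrence": [], "fields": ["建築土木"]},
    "銀行": {"cooccurrence": ["金", "預金", "残高"], "fields": ["金融", "経済"]},
    "土手": {"cooccurrence": ["運河"], "fields": ["建築土木"]},
}


def lookup_knowledge(phrase, db=KNOWLEDGE_DB):
    """Return the entry for a phrase, or None when it is not registered (the S13 check)."""
    return db.get(phrase)
```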

第二言語知識検索部３６は、抽出語句（「記事」、「運河」、「高台」、「テント」、「設営」）について、ステップＳ１３〜Ｓ１５の処理により、共起情報及び分野情報を調べ、抽出語句に共起として登録されている訳語候補、抽出語句が含まれる分野ごとに点数を付け、第二言語用機械翻訳知識データベース３８を検索した結果としての知識検索結果を得る（Ｓ１７）。 For the extracted phrases (記事, 運河, 高台, テント, 設営), the second language knowledge search unit 36 examines the co-occurrence information and field information through steps S13 to S15, assigns points to each translation candidate registered as a co-occurring word of an extracted phrase and to each field that contains an extracted phrase, and obtains a knowledge search result from the search of the second language machine translation knowledge database 38 (S17).

例えば、共起として登録されている語句が訳語候補のどれかと同じであるとき、この共起語には共起得点として１０点加算し（Ａ）、また、抽出語句が含まれる分野に１個につき１点を加算して（Ｂ）、図７に示すような知識検索結果を得る。図７に示すように、「土手」は二つの抽出語句「運河」、「高台」の共起語であるので２０点を付与し、「堤防」は一つの抽出語句「運河」の共起語であるので１０点を付与する。また、抽出語句「記事」の分野はビジネス、抽出語句「運河」の分野は建築土木、抽出語句「高台」の分野は建築土木、抽出語句「テント」の分野はアウトドアと建設、抽出語句「設営」の分野は建築土木であるので、分野得点として「建築土木」は３点、「アウトドア」は１点、「建設」は１点、「ビジネス」は１点を付与する。共起得点や分野得点は予め記憶装置２５に記憶しておく。 For example, when a word registered as a co-occurring word is the same as one of the translation candidates, 10 points are added to that word as a co-occurrence score (A), and 1 point is added to a field for each extracted phrase it contains (B), giving a knowledge search result such as that shown in FIG. 7. As FIG. 7 shows, 土手 (riverbank) is a co-occurring word of two extracted phrases, 運河 (canal) and 高台 (high ground), so it receives 20 points, while 堤防 (embankment) is a co-occurring word of one extracted phrase, 運河, so it receives 10 points. The field of 記事 (article) is ビジネス (business), the field of 運河 is 建築土木 (architecture and civil engineering), the field of 高台 is 建築土木, the fields of テント (tent) are アウトドア (outdoor) and 建設 (construction), and the field of 設営 (setting up) is 建築土木. The field scores are therefore 3 points for 建築土木 and 1 point each for アウトドア, 建設, and ビジネス. The point values for co-occurrence and field scores are stored in the storage device 25 in advance.
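
The scoring in steps S13 to S17 can be reproduced with two counters. The Python sketch below uses the FIG. 6 example entries for the five extracted phrases and the point values given above (10 per co-occurrence match, 1 per field occurrence). The function name `score_candidates` and the dictionary layout are illustrative assumptions.

```python
from collections import Counter

CO_OCCURRENCE_POINTS = 10  # (A) per co-occurring word that is also a candidate
FIELD_POINTS = 1           # (B) per extracted phrase belonging to a field

# Example entries for the extracted phrases, as described for FIG. 6.
EXAMPLE_DB = {
    "記事": {"cooccurrence": ["ニュース", "報道", "新聞"], "fields": ["ビジネス"]},
    "運河": {"cooccurrence": ["堤防", "土手"], "fields": ["建築土木"]},
    "高台": {"cooccurrence": ["見晴らし", "土手"], "fields": ["建築土木"]},
    "テント": {"cooccurrence": ["設営"], "fields": ["アウトドア", "建設"]},
    "設営": {"cooccurrence": [], "fields": ["建築土木"]},
}


def score_candidates(extracted, candidates, knowledge_db):
    """Return (candidate_scores, field_scores) for the extracted phrases."""
    candidate_set = set(candidates)
    candidate_scores, field_scores = Counter(), Counter()
    for phrase in extracted:
        entry = knowledge_db.get(phrase)
        if entry is None:                      # S13: phrase not in the database
            continue
        for word in entry["cooccurrence"]:     # S14: co-occurrence information
            if word in candidate_set:
                candidate_scores[word] += CO_OCCURRENCE_POINTS
        for field in entry["fields"]:          # S15: field information
            field_scores[field] += FIELD_POINTS
    return candidate_scores, field_scores
```

Calling `score_candidates` with the five extracted phrases and the candidates for “bank” gives 土手 20 points and 堤防 10 points, and field scores of 3 for 建築土木 and 1 each for アウトドア, 建設, and ビジネス, matching the FIG. 7 description.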

訳語選択処理部３５の対訳語決定処理部３７は、第二言語知識検索部３６で得られた知識検索結果に基づいて、訳語候補の最高得点の語句は一つかどうかを判定し（Ｓ１８）、最高得点の語句が一つであるときは、その最高得点の語句に決定する（Ｓ１９）。一方、最高得点の語句が一つでないときは訳語候補列の順序で語句を決定する（Ｓ２０）。このようにして、訳語決定処理部３７は第二言語知識検索部３６の知識検索結果から対訳語を一つに絞り込む。 Based on the knowledge search result obtained by the second language knowledge search unit 36, the parallel translation determination processing unit 37 of the translation selection processing unit 35 determines whether exactly one translation candidate has the highest score (S18). If so, that candidate is chosen (S19). If not, the translation is chosen according to the order of the candidate list (S20). In this way, the parallel translation determination processing unit 37 narrows the translation down to one based on the knowledge search result of the second language knowledge search unit 36.
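
Steps S18 to S20 reduce to picking the top-scoring candidate with a tie-break on candidate order. The Python sketch below reads S20 as "take the earliest of the tied candidates in the candidate list", which is one interpretation of the text. The function name `choose_translation` is hypothetical.

```python
def choose_translation(candidates, scores):
    """Pick the top-scoring candidate, falling back to candidate order on ties."""
    if not candidates:
        raise ValueError("at least one translation candidate is required")
    best = max(scores.get(c, 0) for c in candidates)
    top = [c for c in candidates if scores.get(c, 0) == best]  # S18: who holds the top score?
    # S19: a unique top scorer is returned as is.
    # S20 (assumed reading): on a tie, the earliest tied candidate in list order wins.
    return top[0]
```

When no candidate scored at all, every candidate ties at zero and the first entry of the list is returned, which coincides with the conventional first-translation behavior.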

ここで、訳語候補の得点の仕方として、抽出語句の共起として登録されている訳語候補のうち共起得点が最も高い語句を求める対訳語としてもよい。あるいは、最も高い分野得点を得た分野（建築土木）に含まれる訳語候補を求める対訳語としても良い。こうして、最終的に一つの対訳語に決定する。 As for how the candidates are scored, the candidate with the highest co-occurrence score among those registered as co-occurring words of the extracted phrases may be taken as the translation. Alternatively, a candidate belonging to the field with the highest field score (建築土木, architecture and civil engineering) may be taken as the translation. In either case, a single translation is finally decided.
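
The field-based alternative can be sketched as follows: find the field with the highest field score, then return the first candidate whose own field information includes it. This Python example assumes that candidate field information is available from the same kind of knowledge entries as in FIG. 6. The function name `choose_by_field` and the behavior when nothing matches are illustrative choices.

```python
def choose_by_field(candidates, field_scores, knowledge_db):
    """Return the first candidate belonging to the top-scoring field, or None."""
    if not field_scores:
        return None
    top_field = max(field_scores, key=field_scores.get)
    for candidate in candidates:  # candidate order decides among several matches
        entry = knowledge_db.get(candidate)
        if entry is not None and top_field in entry["fields"]:
            return candidate
    return None  # Assumption: the caller falls back to another rule.
```

With the FIG. 6 entries, 銀行 belongs to 金融 and 経済 while 土手 belongs to 建築土木, so a top field of 建築土木 selects 土手.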

The above description covers the case where the first language is English and the second language is Japanese. The invention is not limited to English-to-Japanese translation, however: the first language (before translation) and the second language (after translation) may be other languages such as Chinese, Korean, or Russian.

The invention can also be applied when translating a document in which text in several languages is mixed. When three languages are mixed, it is possible to use not only the second language, which is the language of the translation, but also a third language, which is not: words of predetermined parts of speech are extracted from the third-language text, and a single translation for the first-language word is decided based on the co-occurrence information and field information of the extracted words.

In this case, a third language machine translation knowledge database is prepared in place of the second language machine translation knowledge database, a third language sentence extraction phrase database is prepared in place of the second language sentence extraction phrase database, and a third language sentence analysis processing unit is provided in place of the second language sentence analysis processing unit. The third-language words extracted by the third language sentence analysis processing unit are stored in the third language sentence extraction phrase database.

When the search by the translation dictionary search unit 30 finds that the translation dictionary unit contains a plurality of second-language words that are translation candidates for a first-language word, the translation word selection processing unit 35 narrows the candidates down to a single second-language word based on the third-language words stored in the third language sentence extraction phrase database and the co-occurrence information or field information in the third language machine translation knowledge database. The translation word selection processing unit 35 then has a third language knowledge search unit in place of the second language knowledge search unit 36.

According to the embodiment of the present invention, even when the first-language words in the document to be translated are not enough to narrow a plurality of translation candidates down to one, co-occurrence information and field information can be obtained from words in text that is not itself translated, whether that text is in the second language (the language of the translation) or in another language, and a translation that better fits the meaning of the sentence can be selected. This is effective, for example, when translating multilingual e-mail. An original e-mail and the reply to it are closely related, so the reply carries more useful information for translation selection than text registered in advance. Even a small amount of non-target text in the same document can therefore help in selecting a translation.

FIG. 1 is a block diagram showing the hardware configuration of the machine translation device according to the embodiment of the present invention.
FIG. 2 is a functional block diagram of the machine translation device according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the processing performed by the machine translation device in the embodiment of the present invention.
FIG. 4 is an explanatory diagram of an example of the extracted words stored in the second language sentence extraction phrase database in the embodiment of the present invention.
FIG. 5 is an explanatory diagram of an example of a source word that has a plurality of translation candidates, together with those candidates, as sent from the translation dictionary search unit in the embodiment of the present invention.
FIG. 6 is an explanatory diagram of an example of the co-occurrence information and field information of words stored in the second language machine translation knowledge database in the embodiment of the present invention.
FIG. 7 is an explanatory diagram of an example of the knowledge search result obtained when the translation word selection processing unit searches the second language machine translation knowledge database in the embodiment of the present invention.
FIG. 8 is an explanatory diagram of an example in which a document mixing first-language and second-language sentences is translated by a conventional method.

Explanation of symbols

11: machine translation device; 12: arithmetic control device; 13: processor; 14: memory; 15: machine translation program; 16: work area; 17: output device; 18: display device; 19: communication control device; 20: input device; 21: mouse; 22: keyboard; 23: disk drive; 24: hard disk drive; 25: storage device; 26: input processing unit; 27: second language sentence analysis processing unit; 28: second language sentence extraction phrase database; 29: first language sentence analysis processing unit; 30: translation dictionary search unit; 31: translation dictionary unit; 32: translation generation processing unit; 33: first language machine translation knowledge database; 34: translation output processing unit; 35: translation word selection processing unit; 36: second language knowledge search unit; 37: translation word determination processing unit; 38: second language machine translation knowledge database

Claims

1. A machine translation device comprising: a storage device that stores a machine translation program and a translation dictionary unit holding one or more second-language words that are translation candidates for each first-language word; an input device that inputs data in which a first language to be translated and a second language not to be translated are mixed; a display device that displays a second-language translation of the first language; and an arithmetic control device that executes the machine translation program, the machine translation device further comprising: a second language machine translation knowledge database, formed in the storage device, that stores at least co-occurrence information and field information for second-language words; an input processing unit that, when data in which the first language and the second language are mixed is input from the input device, inputs the translation target portion in the first language and the non-target portion in the second language; a second language sentence analysis processing unit that analyzes the second-language non-target portion input by the input processing unit and extracts from it words of the parts of speech stored in advance in the storage device; a second language sentence extraction phrase database that stores the second-language words extracted by the second language sentence analysis processing unit; a first language analysis processing unit that analyzes the first-language translation target portion input by the input processing unit; a translation dictionary search unit that searches the translation dictionary unit for second-language words that are translation candidates for the first-language words analyzed by the first language analysis processing unit; a translation selection processing unit that, when the search by the translation dictionary search unit finds that the translation dictionary unit contains a plurality of second-language words that are translation candidates for a first-language word, narrows the translation candidates down to a single second-language word based on the second-language words stored in the second language sentence extraction phrase database and the co-occurrence information or field information in the second language machine translation knowledge database; a translation generation processing unit that generates a translation based on the second-language words found by the translation dictionary search unit and the second-language word selected by the translation selection processing unit; and an output processing unit that outputs the second-language translation generated by the translation generation processing unit to the display device.

2. The machine translation device according to claim 1, wherein the translation selection processing unit looks up, for each second-language word stored in the second language sentence extraction phrase database, the words registered as its co-occurrence information in the second language machine translation knowledge database; when a word registered as co-occurrence information is one of the second-language translation candidates for the first-language word, adds to that word a predetermined co-occurrence score stored in advance in the storage device; and selects the word with the highest co-occurrence score as the translation.

3. The machine translation device according to claim 1, wherein the translation selection processing unit looks up the field information registered in the second language machine translation knowledge database for each second-language word stored in the second language sentence extraction phrase database; adds a predetermined field score, stored in advance in the storage device, for each item of field information registered for those second-language words; and selects a translation candidate belonging to the field with the highest field score as the translation.

4. A machine translation device comprising: a storage device that stores a machine translation program and a translation dictionary unit holding one or more second-language and third-language words that are translation candidates for each first-language word; an input device that inputs data in which a first language to be translated and a third language not to be translated are mixed; a display device that displays a second-language translation of the first language; and an arithmetic control device that executes the machine translation program, the machine translation device further comprising: a third language machine translation knowledge database, formed in the storage device, that stores at least co-occurrence information and field information for words of the third language, which is neither the translation target nor the language of the translation; an input processing unit that, when data in which the first language and the third language are mixed is input from the input device, inputs the translation target portion in the first language and the non-target portion in the third language; a third language sentence analysis processing unit that analyzes the third-language non-target portion input by the input processing unit and extracts from it words of the parts of speech stored in advance in the storage device; a third language sentence extraction phrase database that stores the third-language words extracted by the third language sentence analysis processing unit; a first language analysis processing unit that analyzes the first-language translation target portion input by the input processing unit; a translation dictionary search unit that searches the translation dictionary unit for second-language words that are translation candidates for the first-language words analyzed by the first language analysis processing unit; a translation selection processing unit that, when the search by the translation dictionary search unit finds that the translation dictionary unit contains a plurality of second-language words that are translation candidates for a first-language word, narrows the translation candidates down to a single second-language word based on the third-language words stored in the third language sentence extraction phrase database and the co-occurrence information or field information in the third language machine translation knowledge database; a translation generation processing unit that generates a translation based on the second-language words found by the translation dictionary search unit and the second-language word selected by the translation selection processing unit; and an output processing unit that outputs the second-language translation generated by the translation generation processing unit to the display device.

5. The machine translation device according to claim 4, wherein the translation selection processing unit looks up, for each third-language word stored in the third language sentence extraction phrase database, the words registered as its co-occurrence information in the third language machine translation knowledge database; when a word registered as co-occurrence information is one of the second-language translation candidates for the first-language word, adds to that word a predetermined co-occurrence score stored in advance in the storage device; and selects the word with the highest co-occurrence score as the translation.

6. The machine translation device according to claim 4, wherein the translation selection processing unit looks up the field information registered in the third language machine translation knowledge database for each third-language word stored in the third language sentence extraction phrase database; adds a predetermined field score, stored in advance in the storage device, for each item of field information registered for those third-language words; and selects a translation candidate belonging to the field with the highest field score as the translation.

7. A machine translation program for use in a machine translation device comprising: a storage device that stores the machine translation program and a translation dictionary unit holding one or more second-language words that are translation candidates for each first-language word; an input device that inputs data in which a first language to be translated and a second language not to be translated are mixed; a display device that displays a second-language translation of the first language; and an arithmetic control device that executes the machine translation program, wherein the storage device has a second language machine translation knowledge database storing at least co-occurrence information and field information for second-language words, the program causing a computer to execute: a procedure of inputting, when data in which the first language and the second language are mixed is input from the input device, the translation target portion in the first language and the non-target portion in the second language; a procedure of analyzing the input second-language non-target portion and extracting from it words of the parts of speech stored in advance in the storage device; a procedure of storing the extracted second-language words in a second language sentence extraction phrase database in the storage device; a procedure of analyzing the input first-language translation target portion and searching the translation dictionary unit for second-language words that are translation candidates for the analyzed first-language words; a procedure of, when the search finds that the translation dictionary unit contains a plurality of second-language words that are translation candidates for a first-language word, narrowing the translation candidates down to a single second-language word based on the second-language words stored in the second language sentence extraction phrase database and the co-occurrence information or field information in the second language machine translation knowledge database; a procedure of generating a translation based on the second-language words found by the search and the selected second-language word; and a procedure of outputting the generated second-language translation to the display device.

8. A machine translation program for use in a machine translation device comprising: a storage device that stores the machine translation program and a translation dictionary unit holding one or more second-language and third-language words that are translation candidates for each first-language word; an input device that inputs data in which a first language to be translated and a third language not to be translated are mixed; a display device that displays a second-language translation of the first language; and an arithmetic control device that executes the machine translation program, wherein the storage device has a third language machine translation knowledge database storing at least co-occurrence information and field information for words of the third language, which is neither the translation target nor the language of the translation, the program causing a computer to execute: a procedure of inputting, when data in which the first language and the third language are mixed is input from the input device, the translation target portion in the first language and the non-target portion in the third language; a procedure of analyzing the input third-language non-target portion and extracting from the third-language text words of the parts of speech stored in advance in the storage device; a procedure of storing the extracted third-language words in a third language sentence extraction phrase database in the storage device; a procedure of analyzing the input first-language translation target portion and searching the translation dictionary unit for second-language words that are translation candidates for the analyzed first-language words; a procedure of, when the search finds that the translation dictionary unit contains a plurality of second-language words that are translation candidates for a first-language word, narrowing the translation candidates down to a single second-language word based on the third-language words stored in the third language sentence extraction phrase database and the co-occurrence information or field information in the third language machine translation knowledge database; a procedure of generating a translation based on the second-language words found by the search and the selected second-language word; and a procedure of outputting the generated second-language translation to the display device.