JP2008225744A

JP2008225744A - Machine translation device and program

Info

Publication number: JP2008225744A
Application number: JP2007061790A
Authority: JP
Inventors: Toshiyuki Takezawa; 寿幸竹澤; Hideo Okuma; 英男大熊; Yutaka Ashikari; 豊葦苅; Toru Shimizu; 徹清水
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2007-03-12
Filing date: 2007-03-12
Publication date: 2008-09-25

Abstract

PROBLEM TO BE SOLVED: To achieve machine translation with higher accuracy than conventional methods.

SOLUTION: The machine translation device divides an accepted sentence into two or more phrases; acquires generalization information corresponding to one or more of the divided phrases; calculates the similarity between a sentence built from the acquired generalization information and one or more of the divided phrases, and each source-language sentence in two or more translation sentence sets; outputs one or more source-language sentences similar to the accepted sentence; accepts a permission instruction for one of the output source-language sentences; acquires the target-language sentence corresponding to the permitted source-language sentence; acquires the target-language phrases paired with the source-language phrases that correspond to the acquired generalization information; replaces the phrases corresponding to the generalization information in the acquired target-language sentence with the acquired target-language phrases to compose a target-language sentence; and outputs the composed sentence. Machine translation can thereby be performed with high accuracy.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a machine translation device and the like that perform machine translation with high accuracy.

Conventionally, there has been a technique in which a translation result is translated once more in the reverse direction and the result is presented, so that the system user can check whether the translation is good or bad (see Non-Patent Document 1).

There is also one method of realizing analogy-based translation, which uses a semantic dictionary and edit distance to search for similar bilingual examples (see Non-Patent Document 2).

In addition, as a method of calculating the similarity between two sentences, there is a scoring process that uses both edit distance and tf/idf (see Non-Patent Document 3).

Non-Patent Document 1: Uchimoto, K., Hayashida, N., Ishida, T., and Isahara, H., "Automatic rating of machine translatability," Proc. of MT Summit X, pp. 235-242 (2005).
Non-Patent Document 2: Sumita, E., "Example-based machine translation using DP-matching between word sequences," Proc. of ACL 2001 Workshop on Data-Driven Machine Translation, pp. 9-16 (2001).
Non-Patent Document 3: Watanabe, T. and Sumita, E., "Example-based decoding for statistical machine translation," Proc. of MT Summit IX, pp. 410-417 (2003).

However, although the conventional methods may be useful for average tendencies over large amounts of data, they have the problem that it is difficult to guarantee performance for small units such as a single sentence or utterance.

The machine translation device of the first invention comprises:
a phrase generalization information storage unit capable of storing one or more pieces of phrase generalization information, each having a phrase and generalization information, which is information that generalizes the phrase;
a phrase set storage unit storing one or more phrase sets, each pairing a source-language phrase with a target-language phrase;
a translation sentence set storage unit storing two or more translation sentence sets, each pairing a source-language sentence with a target-language sentence and having generalization information corresponding to some of the phrases in those sentences;
a sentence receiving unit that receives a sentence;
a phrase dividing unit that divides the sentence into two or more phrases;
a generalization information acquisition unit that acquires, from the phrase generalization information storage unit, generalization information corresponding to one or more of the phrases produced by the phrase dividing unit;
a similarity calculation unit that calculates the similarity between a sentence that uses the acquired generalization information together with one or more of the two or more phrases produced by the phrase dividing unit, and each source-language sentence in the two or more translation sentence sets of the translation sentence set storage unit;
a similar sentence output unit that uses the calculated similarity to output one or more source-language sentences similar to the sentence received by the sentence receiving unit;
a permission instruction receiving unit that receives input of a permission instruction, which is an instruction indicating permission for one of the source-language sentences output by the similar sentence output unit;
a target language sentence acquisition unit that acquires the target-language sentence corresponding to the source-language sentence designated by the permission instruction;
a target language phrase acquisition unit that acquires, from the phrase set storage unit, the target-language phrase paired with the source-language phrase corresponding to the acquired generalization information;
a target language sentence composition unit that composes the target-language sentence to be output by replacing the phrase corresponding to the generalization information in the acquired target-language sentence with the target-language phrase acquired by the target language phrase acquisition unit; and
an output unit that outputs the target-language sentence so composed.

With this configuration, highly accurate machine translation can be performed.

The machine translation device of the second invention comprises:
a phrase set storage unit storing one or more phrase sets, each pairing a source-language phrase with a target-language phrase;
a translation sentence set storage unit storing two or more translation sentence sets, each pairing a source-language sentence with a target-language sentence and having generalization information corresponding to some of the phrases in those sentences;
a sentence receiving unit that receives a sentence;
a phrase dividing unit that divides the sentence into two or more phrases and acquires generalization information corresponding to those phrases;
a generalization information acquisition unit that determines, among the generalization information corresponding to the two or more phrases produced by the phrase dividing unit, the generalization information that is to replace a phrase, and acquires it;
a similarity calculation unit that calculates the similarity between a sentence that uses the acquired generalization information together with one or more of the two or more phrases produced by the phrase dividing unit, and each source-language sentence in the two or more translation sentence sets of the translation sentence set storage unit;
a similar sentence output unit that uses the calculated similarity to output one or more source-language sentences similar to the sentence received by the sentence receiving unit;
a permission instruction receiving unit that receives input of a permission instruction, which is an instruction indicating permission for one of the source-language sentences output by the similar sentence output unit;
a target language sentence acquisition unit that acquires the target-language sentence corresponding to the source-language sentence designated by the permission instruction;
a target language phrase acquisition unit that acquires, from the phrase set storage unit, the target-language phrase paired with the source-language phrase corresponding to the acquired generalization information;
a target language sentence composition unit that composes the target-language sentence to be output by replacing the phrase corresponding to the generalization information in the acquired target-language sentence with the target-language phrase acquired by the target language phrase acquisition unit; and
an output unit that outputs the target-language sentence so composed.

With this configuration, highly accurate machine translation can be performed without preparing phrase generalization information in advance.

The machine translation device of the third invention is the device of the first or second invention, further comprising a voice receiving unit that receives speech and a sentence construction unit that performs speech recognition on the received speech and constructs a sentence, wherein the sentence receiving unit acquires the sentence constructed by the sentence construction unit.

With this configuration, a sentence input by voice can be machine-translated with high accuracy.

In the machine translation device of the fourth invention, which depends on any of the first to third inventions, the similarity calculation unit comprises: first generalization information number acquisition means for acquiring a first generalization information number, which is the number of pieces of generalization information acquired by the generalization information acquisition unit; second generalization information number acquisition means for acquiring a second generalization information number, which is the number of pieces of generalization information in a source-language sentence or target-language sentence of the two or more translation sentence sets in the translation sentence set storage unit; similarity calculation target sentence selection means for selecting source-language sentences whose second generalization information number matches the first generalization information number; and similarity calculation means for calculating the similarity between a sentence that uses the acquired generalization information together with one or more of the two or more phrases produced by the phrase dividing unit, and the one or more source-language sentences selected by the similarity calculation target sentence selection means.

With this configuration, highly accurate machine translation can be performed at high speed.

The machine translation device of the fifth invention is the device of the first or second invention, further comprising a re-input prompting unit that, when the permission instruction receiving unit does not receive input of a permission instruction, produces output prompting the user to re-enter or edit the sentence received by the sentence receiving unit, wherein the sentence receiving unit receives a sentence again.

With this configuration, highly accurate machine translation can be performed more reliably.

The machine translation device of the sixth invention is the device of the second invention, further comprising a re-input prompting unit that produces output prompting re-input of speech when the permission instruction receiving unit does not receive input of a permission instruction, wherein the voice receiving unit receives speech again.

With this configuration, a sentence input by voice can be machine-translated with high accuracy more reliably.

The machine translation device of the seventh invention is the device of any of the first to sixth inventions, wherein the generalization information acquisition unit acquires generalization information only for phrases that are nouns or adjectives.

With this configuration, machine translation can be performed efficiently and with high accuracy.

The machine translation device according to the present invention can perform machine translation with high accuracy.

Hereinafter, embodiments of the machine translation device and the like will be described with reference to the drawings. Components given the same reference numerals in the embodiments operate in the same way, so repeated description may be omitted.

(Embodiment 1)
This embodiment describes, for example, the following machine translation device. The device receives a sentence, divides it into phrases (morphemes, etc.), generalizes noun-like content words and the like, and searches the similar bilingual examples for similar expressions (for example, sentences containing information that generalizes noun-like content words). It replaces the noun-like content words of the retrieved similar expression with those of the received sentence and outputs the resulting sentence. When the device receives an OK instruction from the user, it acquires the target-language sentence corresponding to the similar expression, rewrites the noun-like content words in that target-language sentence to the translations of the noun-like content words of the received sentence, composes the sentence, and outputs it.
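The flow described above can be sketched in a few lines. This is a minimal illustration, not the device's implementation: the dictionaries, the single stored example, and the names `fill_slots` and `translate` are all hypothetical, retrieval of the similar example is skipped (one example is assumed to have been selected and approved already), and each generalization tag is assumed to occur at most once per sentence.

```python
# Hypothetical miniature of the confirm-then-substitute flow; all data are illustrative.
GENERALIZATION = {"東京": "地名", "京都": "地名"}   # phrase -> generalization information
PHRASE_PAIRS = {"東京": "Tokyo", "京都": "Kyoto"}   # source phrase -> target phrase

# One stored bilingual example, already segmented: (phrase, tag or None).
EXAMPLE_SOURCE = [("京都", "地名"), ("へ", None), ("行き", None), ("たい", None),
                  ("の", None), ("です", None), ("が", None)]
EXAMPLE_TARGET = [("I'd like to go to", None), ("Kyoto", "地名")]


def fill_slots(template, slot_values):
    """Replace each tagged phrase in the template with the value given for its tag."""
    return [slot_values.get(tag, phrase) if tag else phrase for phrase, tag in template]


def translate(segmented_input):
    """Return (source sentence shown for confirmation, composed target sentence)."""
    # Generalized phrases of the input, keyed by tag (assumes one phrase per tag).
    src_slots = {GENERALIZATION[p]: p for p in segmented_input if p in GENERALIZATION}
    # Target-language counterparts looked up in the phrase dictionary.
    tgt_slots = {tag: PHRASE_PAIRS[p] for tag, p in src_slots.items()}
    confirmation = "".join(fill_slots(EXAMPLE_SOURCE, src_slots))
    translation = " ".join(fill_slots(EXAMPLE_TARGET, tgt_slots))
    return confirmation, translation


confirmation, translation = translate(["東京", "へ", "行き", "たい", "ん", "です", "が"])
```

The user would be shown `confirmation` and, only after approving it, given `translation`; the approved sentence differs from the input only in wording that the stored example already covers.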

FIG. 1 is a block diagram of the machine translation device according to the present embodiment.

The machine translation device includes a phrase generalization information storage unit 101, a phrase set storage unit 102, a translation sentence set storage unit 103, a sentence receiving unit 104, a phrase dividing unit 105, a generalization information acquisition unit 106, a similarity calculation unit 107, a similar sentence output unit 108, a permission instruction receiving unit 109, a target language sentence acquisition unit 110, a target language phrase acquisition unit 111, a target language sentence composition unit 112, an output unit 113, and a re-input prompting unit 114.

The similarity calculation unit 107 includes first generalization information number acquisition means 1071, second generalization information number acquisition means 1072, similarity calculation target sentence selection means 1073, and similarity calculation means 1074.

The phrase generalization information storage unit 101 can store one or more pieces of phrase generalization information. Each piece has a phrase and generalization information, which is information that generalizes the corresponding phrase. The unit of a phrase is preferably a morpheme, but may be a word, a phrase, or the like. For example, the generalization information for the phrase 「東京」 (Tokyo) is "place name" (地名), and that for the phrase 「夏目漱石」 (Natsume Soseki) is "person name" (人名). Generalization information is, for example, a part of speech or a subdivision of a part of speech (such as "place name" or "person name"). The phrase generalization information storage unit 101 is preferably a non-volatile recording medium, but can also be realized with a volatile recording medium.

The phrase set storage unit 102 stores one or more phrase sets. A phrase set is information that pairs a source-language phrase with a target-language phrase. In other words, the phrase set storage unit 102 is a so-called term dictionary. A phrase set is, for example, 「東京，Tokyo」. The phrase set storage unit 102 is preferably a non-volatile recording medium, but can also be realized with a volatile recording medium.

The translation sentence set storage unit 103 stores two or more translation sentence sets. A translation sentence set is information that pairs a source-language sentence with a target-language sentence. Some of the phrases making up the source-language and target-language sentences of a translation sentence set normally have generalization information associated with them. For example, when the source-language sentence is 「京都へ行きたいのですが」 ("I would like to go to Kyoto"), the sentence held in the translation sentence set is 「京都［地名］／へ／行き／たい／の／です／が」, where ［地名］ is the generalization information "place name". The corresponding target-language sentence held in the set is, for example, 「I'd like to go to ／Kyoto［地名］」. The sentences in a translation sentence set preferably carry delimiter information (here, 「／」). Needless to say, the delimiter is not limited to 「／」 and may be a space, 「，」, or the like. The translation sentence set storage unit 103 is preferably a non-volatile recording medium, but can also be realized with a volatile recording medium.
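As an illustration of the stored format, the sketch below parses an annotated sentence into (phrase, tag) pairs. It assumes the delimiter is 「／」 and that a tag is written in full-width brackets directly after its phrase, as in the examples above; the function name `parse_annotated` and the regular expression are hypothetical, not part of the described device.

```python
import re

# Assumed storage format: phrases separated by "／", with an optional
# generalization tag in full-width brackets, e.g. "京都［地名］".
TAGGED = re.compile(r"^(?P<phrase>.*?)［(?P<tag>[^］]+)］$")


def parse_annotated(sentence, delimiter="／"):
    """Split a stored sentence into (phrase, tag-or-None) pairs."""
    pairs = []
    for chunk in sentence.split(delimiter):
        chunk = chunk.strip()
        if not chunk:
            continue  # ignore empty pieces caused by stray delimiters
        match = TAGGED.match(chunk)
        if match:
            pairs.append((match.group("phrase"), match.group("tag")))
        else:
            pairs.append((chunk, None))
    return pairs


source_pairs = parse_annotated("京都［地名］／へ／行き／たい／の／です／が")
target_pairs = parse_annotated("I'd like to go to ／Kyoto［地名］")
```

Keeping the tag next to the phrase in both languages is what later allows the tagged target phrase to be located and swapped without aligning the whole sentence.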

The sentence receiving unit 104 receives a sentence. The sentence receiving unit 104 may receive a sentence two or more times. It receives a sentence a second or subsequent time when the permission instruction receiving unit 109 has not received a permission instruction. A permission instruction is considered not received when, for example, a non-permission instruction is received, or no instruction is received for a predetermined time or longer. The sentence may be input by any means, such as a numeric keypad, keyboard, mouse, or menu screen. The sentence receiving unit 104 can be realized by a device driver for input means such as a numeric keypad or keyboard, control software for a menu screen, or the like.

The phrase dividing unit 105 divides a sentence into two or more phrases. It performs morphological analysis on the sentence and obtains two or more morphemes. The phrase dividing unit 105 may also divide the sentence into phrases by processing other than morphological analysis. It can be realized by, for example, a morphological analyzer (see ChaSen, URL <http://chasen.naist.jp/hiki/ChaSen/>). Morphological analyzers use various processing methods, and minor differences in their algorithms and results do not matter here. The phrase dividing unit 105 can usually be realized by an MPU, memory, and the like. Its processing procedure is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may also be realized by hardware (a dedicated circuit).

The generalization information acquisition unit 106 acquires, from the phrase generalization information storage unit 101, generalization information corresponding to one or more of the phrases produced by the phrase dividing unit 105, and places it in memory. The generalization information acquisition unit 106 can usually be realized by an MPU, memory, and the like. Its processing procedure is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may also be realized by hardware (a dedicated circuit).

The similarity calculation unit 107 calculates the similarity between a first sentence and a second sentence and places it in memory. The first sentence is a sentence that uses the generalization information acquired by the generalization information acquisition unit 106 together with one or more of the two or more phrases produced by the phrase dividing unit 105. For example, the first sentence is obtained from the sentence consisting of the phrases produced by the phrase dividing unit 105 (for example, 「東京／へ／行き／たい／ん／です／が」) by replacing each phrase that has generalization information (for example, 「東京」) with that generalization information (for example, ［地名］), giving 「［地名］／へ／行き／たい／ん／です／が」. The second sentence is each source-language sentence in the two or more translation sentence sets of the translation sentence set storage unit 103. The source-language sentences in the translation sentence sets normally also contain generalization information. Any algorithm may be used to calculate the similarity between the two sentences. A first example uses tf/idf. A second example calculates the similarity from the edit distance. A third example calculates the similarity using both the edit distance and tf/idf. A fourth example simply takes the proportion of matching phrases as the similarity.
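The construction of the first sentence and the fourth similarity example can be sketched as follows. The function names are hypothetical, and the choice of denominator (the number of phrases in the first sentence) is an assumption, since the text only says "the proportion of matching phrases".

```python
def generalize(phrases, generalization):
    """Replace each phrase that has generalization information with its tag."""
    return [f"［{generalization[p]}］" if p in generalization else p for p in phrases]


def match_ratio(first, second):
    """Share of phrases in `first` that also occur in `second` (assumed definition)."""
    if not first:
        return 0.0
    second_set = set(second)
    return sum(1 for p in first if p in second_set) / len(first)


# First sentence: the segmented input with 「東京」 generalized to ［地名］.
first = generalize(["東京", "へ", "行き", "たい", "ん", "です", "が"], {"東京": "地名"})
# Second sentence: a stored source-language example that already holds the tag.
second = ["［地名］", "へ", "行き", "たい", "の", "です", "が"]
ratio = match_ratio(first, second)
```

Because both sentences carry the tag rather than the concrete place name, 「東京」 and 「京都」 count as a match; only 「ん」 versus 「の」 differs here.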

First, the first example will be described. In the first example, the tf/idf-based similarity P_tf/idf(J_k, J_0) is calculated by the following Formula 1.

In Formula 1, J_0 is the source-language sentence, J_{0,i} is the i-th morpheme of J_0, df(J_{0,i}) is the document frequency of morpheme J_{0,i}, and N is the number of examples (the number of translation sentence sets) in the translation sentence set storage unit 103. The document frequency is, for example, the frequency of occurrence in the translation sentence set storage unit 103, or the number of web pages on which morpheme J_{0,i} appears. If J_k contains the morpheme in question, its term frequency is 1; otherwise it is 0. The score is normalized by the length |J_0| of the source-language sentence J_0.
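The definitions above can be turned into a small sketch. The exact weighting of Formula 1 is not given in this text, so the code assumes a common form consistent with those definitions: each morpheme of J_0 found in J_k contributes log(N / df) / log N, and the sum is divided by |J_0|. The function name and the fallback of df = 1 for unseen morphemes are also assumptions.

```python
import math


def tfidf_similarity(j0, jk, df, n_examples):
    """tf/idf-style similarity of stored sentence jk to input j0 (assumed weighting)."""
    if not j0 or n_examples < 2:
        return 0.0  # avoid dividing by |J_0| = 0 or by log(1) = 0
    jk_set = set(jk)
    total = 0.0
    for morpheme in j0:
        if morpheme in jk_set:  # term frequency is 1 if jk contains it, else 0
            doc_freq = max(df.get(morpheme, 1), 1)
            total += math.log(n_examples / doc_freq) / math.log(n_examples)
    return total / len(j0)  # normalize by the length of J_0


# Toy values: "a" is rare (df = 1), "b" occurs in every one of the 10 examples.
score = tfidf_similarity(["a", "b"], ["a", "c"], {"a": 1, "b": 10}, 10)
```

Under this weighting a morpheme that appears in every example contributes nothing, so function words shared by most sentences barely affect the ranking.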

Next, the second example will be described. The second example calculates the similarity using the edit distance. The edit distance dis(J_k, J_0) is expressed by the following Formula 2.

In Formula 2, k is an index no greater than the number of source-language sentences in the translation sentence set storage unit 103 for which the edit distance is calculated. I(J_k, J_0), D(J_k, J_0), and S(J_k, J_0) are the numbers of insertions, deletions, and substitutions, respectively. The method of counting insertions, deletions, and substitutions is well known, so a detailed description is omitted here.
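One well-known way to obtain the three counts is dynamic programming over phrase sequences. The sketch below assumes that Formula 2 is the plain sum dis = I + D + S with unit cost per operation, and that operations are counted for turning J_k into J_0; both points, and the function names, are assumptions rather than statements from the text.

```python
def edit_operations(jk, j0):
    """Return (insertions, deletions, substitutions) on one minimum-cost
    alignment that turns jk into j0, with unit cost per operation."""
    rows, cols = len(jk) + 1, len(j0) + 1
    # table[r][c] = (total, ins, dele, sub) for turning jk[:r] into j0[:c]
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for r in range(1, rows):
        table[r][0] = (r, 0, r, 0)  # delete every phrase of jk[:r]
    for c in range(1, cols):
        table[0][c] = (c, c, 0, 0)  # insert every phrase of j0[:c]
    for r in range(1, rows):
        for c in range(1, cols):
            if jk[r - 1] == j0[c - 1]:
                table[r][c] = table[r - 1][c - 1]  # match, no cost
                continue
            t, i, d, s = table[r - 1][c - 1]
            best = (t + 1, i, d, s + 1)  # substitution
            t, i, d, s = table[r][c - 1]
            best = min(best, (t + 1, i + 1, d, s))  # insertion
            t, i, d, s = table[r - 1][c]
            best = min(best, (t + 1, i, d + 1, s))  # deletion
            table[r][c] = best
    _, ins, dele, sub = table[-1][-1]
    return ins, dele, sub


def edit_distance(jk, j0):
    """Assumed form of Formula 2: dis = I + D + S."""
    return sum(edit_operations(jk, j0))


stored = ["［地名］", "へ", "行き", "たい", "の", "です", "が"]
query = ["［地名］", "へ", "行き", "たい", "ん", "です", "が"]
ops = edit_operations(stored, query)
```

When several alignments have the same minimum cost, the split between the three counts depends on tie-breaking, but their sum does not.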

Next, the third example will be described. The third example calculates the similarity (score) using both the edit distance and tf/idf. It is calculated using the following Formula 3.

In Formula 3, dis(J_k, J_0) could be normalized by the length |J_0| of the source-language sentence, but here |J_k + J_0| is used so that the result does not become negative. The weight α is chosen appropriately so that 0 < α < 1, and is stored.
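A sketch of such a combined score follows. Formula 3 itself is not given in this text, so the code assumes a linear interpolation, (1 - α) · (1 - dis / (|J_k| + |J_0|)) + α · P_tf/idf, and reads |J_k + J_0| as the sum of the two lengths. Which term carries α, the default α = 0.5, and the function name are all assumptions.

```python
def combined_score(dis, len_k, len_0, p_tfidf, alpha=0.5):
    """Interpolate a normalized edit-distance term with a tf/idf term (assumed form)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    total_len = len_k + len_0
    # dis never exceeds max(len_k, len_0), so this term stays within [0, 1].
    closeness = 1.0 - dis / total_len if total_len else 1.0
    return (1.0 - alpha) * closeness + alpha * p_tfidf


# One substitution between two 7-phrase sentences, with a tf/idf score of 0.8.
score = combined_score(dis=1, len_k=7, len_0=7, p_tfidf=0.8, alpha=0.5)
```

Dividing by |J_0| alone could push the first term below zero when the stored sentence is much longer than the input, which is the situation the normalization described above avoids.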

The similarity calculation unit 107 holds in advance the formulas used to calculate the similarity, such as Formulas 1 to 3. It reads the relevant formula, acquires the necessary values described above (such as I(J_k, J_0)), substitutes them into the formula to obtain the similarity, and places the similarity in memory.

The similarity calculation unit 107 may also, for example, first score each source-language sentence in the translation sentence set storage unit 103 with the tf/idf measure described above, and then calculate the similarity only for the top N sentences.
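This two-stage ranking can be sketched with the standard library. The inexpensive scorer is passed in as a function; the toy `overlap` scorer below merely stands in for the tf/idf measure, and all names are hypothetical.

```python
import heapq


def top_n_candidates(query, examples, cheap_score, n=10):
    """Keep the n examples with the highest cheap_score(query, example)."""
    return heapq.nlargest(n, examples, key=lambda ex: cheap_score(query, ex))


def overlap(query, example):
    """Toy first-pass scorer: number of distinct shared phrases."""
    return len(set(query) & set(example))


examples = [["a", "b"], ["a", "x"], ["y", "z"]]
shortlist = top_n_candidates(["a", "b"], examples, overlap, n=2)
```

The more expensive similarity, such as one involving the edit distance, then only has to be evaluated on `shortlist` rather than on every stored sentence.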

The similarity calculation unit 107 can usually be realized by an MPU, memory, and the like. Its processing procedure is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may also be realized by hardware (a dedicated circuit).

第一汎化情報数取得手段１０７１は、汎化情報取得部１０６が取得した汎化情報の数である第一汎化情報数を取得し、メモリ上に配置する。第一汎化情報数は、文受付部１０４が受け付けた文の２以上の語句のうち、対応する汎化情報が存在する語句の数であるとも言える。第一汎化情報数取得手段１０７１は、通常、ＭＰＵやメモリ等から実現され得る。第一汎化情報数取得手段１０７１の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 The first generalization information number acquisition unit 1071 acquires the first generalization information number that is the number of generalization information acquired by the generalization information acquisition unit 106 and arranges it on the memory. It can be said that the number of first generalization information is the number of words / phrases having corresponding generalization information among two or more words / phrases of the sentence received by the sentence reception unit 104. The first generalized information number acquisition unit 1071 can be usually realized by an MPU, a memory, or the like. The processing procedure of the first generalized information number acquiring unit 1071 is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).

第二汎化情報数取得手段１０７２は、第二汎化情報数を取得し、メモリ上に配置する。第二汎化情報数は、翻訳文セット格納部１０３の２組以上の翻訳文セットが有する各原言語の文または各目的言語の文が有する汎化情報の数である。通常、翻訳文セットが有する原言語の文または目的言語の情報の中に、汎化情報が含まれており、第二汎化情報数取得手段１０７２は、当該汎化情報を検査することにより、第二汎化情報数を取得する。第二汎化情報数取得手段１０７２は、通常、ＭＰＵやメモリ等から実現され得る。第二汎化情報数取得手段１０７２の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 The second generalization information number acquisition unit 1072 acquires the second generalization information number and arranges it on the memory. The number of second generalization information is the number of generalization information included in each source language sentence or each target language sentence included in two or more translation sentence sets in the translation sentence set storage unit 103. Usually, generalized information is included in the information of the source language sentence or the target language included in the translated sentence set, and the second generalized information number acquisition unit 1072 examines the generalized information, Get the number of second generalization information. The second generalized information number acquisition unit 1072 can be usually realized by an MPU, a memory, or the like. The processing procedure of the second generalized information number acquiring unit 1072 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).

類似度算出対象文選択手段１０７３は、第一汎化情報数と一致する第二汎化情報数を有する原言語の文を、翻訳文セット格納部１０３から選択し、メモリ上に配置する。類似度算出対象文選択手段１０７３は、通常、ＭＰＵやメモリ等から実現され得る。類似度算出対象文選択手段１０７３の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 The similarity calculation target sentence selecting unit 1073 selects a sentence in the source language having the number of second generalization information that matches the number of first generalization information from the translation sentence set storage unit 103 and places it on the memory. The similarity calculation target sentence selection unit 1073 can be usually realized by an MPU, a memory, or the like. The processing procedure of the similarity calculation target sentence selecting means 1073 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).

類似度算出手段１０７４は、汎化情報取得部１０６が取得した汎化情報と語句分割部１０５が分割した２以上の語句のうちの1以上の語句を用いた文と、類似度算出対象文選択手段１０７３が選択した1以上の原言語の文との類似度を算出し、メモリ上に配置する。類似度算出手段１０７４は、通常、ＭＰＵやメモリ等から実現され得る。類似度算出手段１０７４の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 The similarity calculation unit 1074 selects a sentence that uses the generalization information acquired by the generalization information acquisition unit 106 and one or more words out of two or more words divided by the word division unit 105, and a similarity calculation target sentence selection The similarity with one or more source language sentences selected by the means 1073 is calculated and placed on the memory. The similarity calculation unit 1074 can be realized typically as an MPU, a memory, or the like. The processing procedure of the similarity calculation means 1074 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).

The similar sentence output unit 108 uses the similarity calculated by the similarity calculation unit 107 to output one or more source language sentences similar to the sentence received by the sentence receiving unit 104. The similar sentence output unit 108 normally outputs the source language sentence with the highest similarity. However, it may instead output the source language sentences with the three highest similarities, or all source language sentences whose similarity is at or above a predetermined value. Here, output is a concept that includes display on a display and transmission to an external device (a display device). The similar sentence output unit 108 may or may not be regarded as including an output device such as a display. The similar sentence output unit 108 can be implemented by driver software for an output device, or by such driver software together with the output device.
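The three output policies just described (the single best sentence, the top three, or every sentence at or above a threshold) can be sketched as follows. This is a minimal illustration, not the device's actual implementation; the function name `select_similar` and its parameters are assumptions.

```python
from typing import Dict, List, Optional


def select_similar(scores: Dict[str, float], top_k: int = 1,
                   min_score: Optional[float] = None) -> List[str]:
    """Return source sentences ordered by descending similarity.

    With min_score set, every sentence at or above the threshold is returned;
    otherwise the top_k best sentences are returned (top_k=1 is the default case).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if min_score is not None:
        return [sentence for sentence, score in ranked if score >= min_score]
    return [sentence for sentence, _ in ranked[:top_k]]
```

For example, `select_similar(scores, top_k=3)` corresponds to presenting three candidates for the user to choose from with a permission instruction.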

The permission instruction receiving unit 109 receives the input of a permission instruction, which is an instruction indicating approval of one source language sentence among the one or more source language sentences output by the similar sentence output unit 108. Any input means may be used for the permission instruction, such as a numeric keypad, a keyboard, a mouse, or a menu screen. The permission instruction receiving unit 109 can be implemented by a device driver for an input means such as a numeric keypad or keyboard, by control software for a menu screen, or the like.

The target language sentence acquisition unit 110 acquires the source language sentence corresponding to the permission instruction received by the permission instruction receiving unit 109, acquires the target language sentence corresponding to that source language sentence from the translation sentence set storage unit 103, and places it in memory. The target language sentence acquisition unit 110 is usually implemented with an MPU, memory, and the like. Its processing procedure is usually implemented in software, which is recorded on a recording medium such as a ROM. It may, however, be implemented in hardware (a dedicated circuit).

The target language phrase acquisition unit 111 acquires, from the phrase set storage unit 102, the target language phrase that is paired with the source language phrase (a phrase in the received source language sentence) corresponding to the generalization information acquired by the generalization information acquisition unit 106. The target language phrase acquisition unit 111 is usually implemented with an MPU, memory, and the like. Its processing procedure is usually implemented in software, which is recorded on a recording medium such as a ROM. It may, however, be implemented in hardware (a dedicated circuit).

The target language sentence construction unit 112 replaces the phrase corresponding to the generalization information in the target language sentence acquired by the target language sentence acquisition unit 110 with the target language phrase acquired by the target language phrase acquisition unit 111, and thereby constructs the target language sentence to be output. The target language sentence construction unit 112 is usually implemented with an MPU, memory, and the like. Its processing procedure is usually implemented in software, which is recorded on a recording medium such as a ROM. It may, however, be implemented in hardware (a dedicated circuit).

The output unit 113 outputs the target language sentence constructed by the target language sentence construction unit 112. Here, output is a concept that includes display on a display, transmission to an external device (a display device), and the like. The output unit 113 may or may not be regarded as including an output device such as a display. The output unit 113 can be implemented by driver software for an output device, or by such driver software together with the output device.

The re-input prompting unit 114 produces output that prompts the user to re-enter or edit the sentence received by the sentence receiving unit 104. In response to this output, the user re-enters or edits the sentence using an input means, and the sentence receiving unit 104 then receives the re-entered or edited sentence. The content of the prompting output is not limited. The re-input prompting unit 114 is usually implemented with an MPU, memory, and the like. Its processing procedure is usually implemented in software, which is recorded on a recording medium such as a ROM. It may, however, be implemented in hardware (a dedicated circuit).

Next, the operation of the machine translation device will be described with reference to the flowcharts of FIGS. 2 to 4.

(Step S201) The sentence receiving unit 104 determines whether a sentence has been received. If a sentence has been received, the process goes to step S202; otherwise, it returns to step S201.

(Step S202) The phrase dividing unit 105 divides the sentence received in step S201 into phrases. For example, the phrase dividing unit 105 divides the sentence into a plurality of morphemes. At this point the phrase dividing unit 105 also obtains, for each phrase, part-of-speech information and information on whether the phrase is a noun-like content word. Noun-like content words are, for example, nouns and adjectives. Other parts of speech may be included, but verbs are preferably excluded. A known morphological analyzer can not only split a sentence into morphemes but also obtain part-of-speech information and the like, so a detailed description is omitted.

(Step S203) The generalization information acquisition unit 106 assigns 1 to a counter i.

(Step S204) The generalization information acquisition unit 106 determines whether an i-th phrase exists among the phrases obtained in step S202. If the i-th phrase exists, the process goes to step S205; otherwise, it goes to step S210.

(Step S205) The generalization information acquisition unit 106 determines whether the i-th phrase is a noun-like content word. If it is, the process goes to step S206; otherwise, it goes to step S209.

(Step S206) The generalization information acquisition unit 106 uses the i-th phrase as a key to acquire the generalization information corresponding to that phrase from the phrase generalization information storage unit 101.

(Step S207) The generalization information acquisition unit 106 determines whether generalization information was acquired in step S206. If it was, the process goes to step S208; otherwise, it goes to step S209.

(Step S208) The generalization information acquisition unit 106 inserts the generalization information acquired in step S206 at the position of the i-th phrase. The i-th phrase itself is stored separately in memory at this time.

(Step S209) The generalization information acquisition unit 106 increments the counter i by 1. The process returns to step S204.

(Step S210) The similarity calculation unit 107 selects, from the one or more source language sentences held in the translation sentence sets of the translation sentence set storage unit 103, the source language sentences that satisfy a predetermined condition. Here, the predetermined condition is, for example, that the sentence is among the top N (N is a natural number of 2 or more) by the tf/idf score described above. This process of selecting source language sentences is also called preliminary selection.

(Step S211) The similarity calculation unit 107 assigns 1 to a counter i.

(Step S212) The similarity calculation unit 107 determines whether an i-th sentence exists among the sentences selected in step S210. If the i-th sentence exists, the process goes to step S213; otherwise, it goes to step S215.

(Step S213) The similarity calculation unit 107 calculates the similarity between the i-th sentence and the source language sentence (including generalization information) obtained by the processing up to step S208. The i-th sentence is normally a sentence that includes generalization information. The similarity calculation process is described with reference to the flowchart of FIG. 3.

(Step S214) The similarity calculation unit 107 increments the counter i by 1. The process goes to step S212.

(Step S215) The similar sentence output unit 108 acquires sentences using the similarity calculated by the similarity calculation unit 107. For example, the similar sentence output unit 108 acquires the sentence with the highest similarity. It may instead acquire the three sentences with the highest similarity.

(Step S216) The similar sentence output unit 108 outputs the sentences acquired in step S215.

(Step S217) The permission instruction receiving unit 109 determines whether it has received the input of a permission instruction, which is an instruction indicating approval of one source language sentence among the one or more source language sentences output in step S216. If a permission instruction has been received, the process goes to step S218; otherwise, it goes to step S219.

(Step S218) The target language sentence acquisition unit 110 and related units perform the process of outputting the translated sentence, and the process ends. The translated sentence output process is described with reference to the flowchart of FIG. 4.

(Step S219) The re-input prompting unit 114 produces output that prompts the user to re-enter or edit the sentence received by the sentence receiving unit 104. The process returns to step S201.

In the flowchart of FIG. 2, the preliminary selection of translation sentence sets in step S210 may be omitted. The determination in step S205 may also be omitted.
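The generalization loop of steps S203 to S209 can be sketched in Python as follows. This is an illustration under stated assumptions: the morphological analysis of step S202 is taken as already done and supplied as (surface, part-of-speech) pairs, the part-of-speech labels and the name `generalize` are hypothetical, and the phrase generalization information storage unit 101 is modeled as a plain dictionary.

```python
from typing import Dict, List, Tuple

# Assumed part-of-speech labels for noun-like content words (nouns and adjectives).
NOUN_LIKE_POS = {"noun", "adjective"}


def generalize(morphemes: List[Tuple[str, str]],
               generalization_table: Dict[str, str]
               ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Replace noun-like content words by their generalization information.

    Returns the generalized token list and the remembered (tag, original phrase)
    pairs, which are needed later to restore the phrase in the target language.
    """
    generalized: List[str] = []
    remembered: List[Tuple[str, str]] = []
    for surface, pos in morphemes:                      # S204: next phrase, if any
        tag = None
        if pos in NOUN_LIKE_POS:                        # S205: noun-like content word?
            tag = generalization_table.get(surface)     # S206: look up by phrase
        if tag is None:                                 # S207: nothing acquired
            generalized.append(surface)
        else:                                           # S208: insert tag, keep phrase
            generalized.append(tag)
            remembered.append((tag, surface))
    return generalized, remembered
```

Keeping the remembered pairs separate from the generalized sentence mirrors step S208, where the original phrase is stored in memory apart from the sentence in which it was replaced.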

Next, the similarity calculation process of step S213 will be described with reference to the flowchart of FIG. 3.

(Step S301) The first generalization information number acquisition means 1071 of the similarity calculation unit 107 acquires the first generalization information number, which is the number of pieces of generalization information acquired by the generalization information acquisition unit 106, and places it in memory.

(Step S302) The second generalization information number acquisition means 1072 acquires the second generalization information number, which is the number of pieces of generalization information contained in the i-th source language sentence, and places it in memory.

(Step S303) The similarity calculation target sentence selection means 1073 determines whether the first generalization information number acquired in step S301 matches the second generalization information number acquired in step S302. If they match, the process goes to step S304; otherwise, it returns to the calling process.

(Step S304) The similarity calculation means 1074 acquires the tf/idf score between the i-th source language sentence (normally including generalization information) and the input source language sentence (including generalization information), using the calculation formula and method described above.

(Step S305) The similarity calculation means 1074 acquires the edit distance between the i-th source language sentence (normally including generalization information) and the input source language sentence (including generalization information), using the calculation formula and method described above.

(Step S306) The similarity calculation means 1074 reads out the information of the formula for calculating the similarity (for example, Equation 3) and calculates the similarity (score) using the tf/idf score acquired in step S304 and the edit distance acquired in step S305.

(Step S307) The similarity calculation means 1074 temporarily stores the similarity (score) calculated in step S306 in memory in association with the i-th source language sentence (the selected sentence in the translation sentence set storage unit 103). Alternatively, the similarity calculation means 1074 may temporarily store the similarity (score) in association with the target language sentence (a sentence in the translation sentence set storage unit 103) corresponding to the i-th source language sentence. The process returns to the calling process.

Needless to say, the similarity calculation method in the flowchart of FIG. 3 is only one example.
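The count-matching filter of steps S301 to S303, followed by scoring and storing of the result (S304 to S307), might be sketched as below. The bracket convention for recognizing generalization information, the function names, and the pluggable `score_fn` are all assumptions introduced for illustration; any similarity function, such as a combination of tf/idf and edit distance, could be passed in.

```python
from typing import Callable, Dict, List, Sequence


def is_generalization_tag(token: str) -> bool:
    # Assumed convention: generalization information is written in square brackets.
    return token.startswith("[") and token.endswith("]")


def count_tags(tokens: Sequence[str]) -> int:
    return sum(1 for token in tokens if is_generalization_tag(token))


def score_candidates(input_tokens: Sequence[str],
                     candidates: List[Sequence[str]],
                     score_fn: Callable[[Sequence[str], Sequence[str]], float]
                     ) -> Dict[int, float]:
    """Score only the candidates whose tag count matches that of the input."""
    first_count = count_tags(input_tokens)              # S301
    scores: Dict[int, float] = {}
    for index, candidate in enumerate(candidates):
        if count_tags(candidate) != first_count:        # S302 and S303: skip mismatch
            continue
        # S304 to S306 are delegated to score_fn; S307 stores the result by index.
        scores[index] = score_fn(candidate, input_tokens)
    return scores
```

Candidates with a different number of pieces of generalization information are skipped before any scoring is done, so the more expensive similarity calculation runs only on sentences that can be filled in consistently.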

Next, the translated sentence output process of step S218 will be described with reference to the flowchart of FIG. 4.

(Step S401) The target language sentence acquisition unit 110 acquires the source language sentence corresponding to the permission instruction received by the permission instruction receiving unit 109, acquires the target language sentence corresponding to that source language sentence from the translation sentence set storage unit 103, and places it in memory.

(Step S402) The target language phrase acquisition unit 111 assigns 1 to a counter i.

(Step S403) The target language phrase acquisition unit 111 determines whether the phrases (morphemes, words, phrases, and so on) making up the sentence acquired in step S401 include a phrase that is replaced by the i-th piece of generalization information. If such a phrase exists, the process goes to step S404; otherwise, it goes to step S408.

(Step S404) The target language phrase acquisition unit 111 acquires the phrase in the received source language sentence that corresponds to the phrase replaced by the i-th piece of generalization information, and places it in memory.

(Step S405) The target language phrase acquisition unit 111 acquires, from the phrase set storage unit 102, the target language phrase corresponding to the source language phrase acquired in step S404, and places it in memory.

(Step S406) The target language sentence construction unit 112 replaces the phrase corresponding to the generalization information in the target language sentence acquired in step S401 with the target language phrase acquired in step S405.

(Step S407) The target language phrase acquisition unit 111 increments the counter i by 1. The process returns to step S403.

(Step S408) The output unit 113 outputs the target language sentence constructed by the target language sentence construction unit 112 in step S406. The process returns to the calling process.
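The substitution loop of steps S402 to S407 can be illustrated with the following Python sketch. It assumes that the target language sentence is held as a token list in which generalization information appears as a bracketed tag, that the remembered (tag, source phrase) pairs from the generalization stage are available, and that the phrase set storage unit 102 is modeled as a dictionary. The name `build_target_sentence` and the English template in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple


def build_target_sentence(target_template: List[str],
                          remembered: List[Tuple[str, str]],
                          phrase_pairs: Dict[str, str]) -> List[str]:
    """Fill each generalization tag in the target sentence with a target phrase."""
    result = list(target_template)                      # S401: acquired target sentence
    for tag, source_phrase in remembered:               # S402 and S407: counter i
        if tag not in result:                           # S403: nothing left to replace
            break
        target_phrase = phrase_pairs[source_phrase]     # S404 and S405: paired phrase
        result[result.index(tag)] = target_phrase       # S406: replace first occurrence
    return result


# Usage example with a hypothetical template for the approved sentence.
template = ["I", "would", "like", "to", "go", "to", "[place name]", "."]
sentence = build_target_sentence(template, [("[place name]", "東京")], {"東京": "Tokyo"})
print(" ".join(sentence))
```

This sketch raises `KeyError` when a source phrase has no entry in the phrase pairs; how the device handles a missing pair is not stated in the description.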

A specific operation of the machine translation device in this embodiment is described below. The machine translation device here performs Japanese-to-English translation.

The phrase generalization information storage unit 101 holds the phrase generalization information management table shown in FIG. 5. The phrase generalization information management table stores one or more pieces of phrase generalization information. Each piece of phrase generalization information has a "phrase" and "generalization information".

The phrase set storage unit 102 holds the phrase set management table shown in FIG. 6. The phrase set management table stores one or more phrase sets. Each phrase set has a "source language" entry and a "target language" entry. The "source language" entry is a phrase in the source language (here, Japanese). The "target language" entry is a phrase in the target language (here, English).

The translation sentence set storage unit 103 holds the translation sentence set management table shown in FIG. 7. The translation sentence set management table stores one or more translation sentence sets. Each translation sentence set has a "source language sentence" and a "target language sentence". The "source language sentence" is a sentence in the source language (here, Japanese). The "target language sentence" is a sentence in the target language (here, English). Generalization information is associated with some of the phrases in each sentence of a translation sentence set (for example, [place name] for 「東京」, "Tokyo").

In this situation, suppose that the user enters the colloquial sentence 「東京に行きたいんですが」 ("I want to go to Tokyo") using a keyboard or the like.

The sentence receiving unit 104 then receives the sentence 「東京に行きたいんですが」. The phrase dividing unit 105 divides the received sentence, here into morphemes, and obtains 「東京／に／行き／たい／ん／です／が」. The morpheme delimiter "／" may be another character code (for example, a space or a comma). The phrase dividing unit 105 also outputs part-of-speech information for each morpheme. For example, it outputs that 「東京」 is a noun.

Next, the generalization information acquisition unit 106 reads out the prestored parts of speech corresponding to noun-like content words, namely "noun" and "adjective", and acquires the first phrase 「東京」, which is a noun. This phrase is a noun-like content word.

The generalization information acquisition unit 106 then searches the phrase generalization information management table of FIG. 5 using 「東京」 as the key, acquires the generalization information [place name] (［地名］ in the original Japanese) corresponding to that phrase, and places it in memory.

The generalization information acquisition unit 106 then inserts the acquired generalization information [place name] at the position of the first phrase 「東京」 and obtains 「[place name]／に／行き／たい／ん／です／が」.

Since 「東京／に／行き／たい／ん／です／が」 contains no noun-like content word (here, a noun or adjective) other than 「東京」, the generalization information acquisition unit 106 ends the process of acquiring generalization information. The generalization information acquisition unit 106 separately stores in memory the phrase 「東京」 corresponding to [place name].

Next, the similarity calculation unit 107 calculates the tf/idf score between the source language sentence 「[place name]／に／行き／たい／ん／です／が」 and each sentence in the translation sentence set management table of FIG. 7. FIG. 8 shows, as an example, the three source language sentences with the highest tf/idf scores in the translation sentence set management table (that is, N above is 3). The tf/idf score calculation method is well known and is omitted here. FIG. 8 is thus the set of preliminarily selected source language sentences.

Next, for each of the three sentences in FIG. 8, the similarity calculation unit 107 calculates the similarity to the sentence 「[place name]／に／行き／たい／ん／です／が」 as follows.

まず、類似度算出部１０７の第一汎化情報数取得手段１０７１は、文「［地名］／に／行き／たい／ん／です／が」の汎化情報の数である第一汎化情報数「１」を取得し、メモリ上に配置する。 First, the first generalization information number acquisition unit 1071 of the similarity calculation unit 107 includes first generalization information that is the number of generalization information of the sentence “[place name] / ni / go / tai / n / is / ga”. The number “1” is acquired and placed on the memory.

そして、第二汎化情報数取得手段１０７２は、図８の１番目の原言語の文が有する汎化情報の数である第二汎化情報数「１」を取得し、メモリ上に配置する。 Then, the second generalization information number acquisition unit 1072 acquires the second generalization information number “1” which is the number of generalization information included in the first source language sentence of FIG. 8 and arranges it on the memory. .

類似度算出対象文選択手段１０７３は、取得した第一汎化情報数「１」と、取得した第二汎化情報数「１」が一致する、と判断し、類似度算出手段１０７４は、１番目の原言語の文「［地名］／に／行き／たい／の／です／が」と文「［地名］／に／行き／たい／ん／です／が」のｔｆ／ｉｄｆのスコアを、上述した算出式、方法により取得する。なお、ここでは、既に取得しているｔｆ／ｉｄｆのスコアを読み込んでも良い。 The similarity calculation target sentence selection unit 1073 determines that the acquired first generalization information number “1” matches the acquired second generalization information number “1”, and the similarity calculation unit 1074 determines that 1 The tf / idf score of the sentence [[place name] / ni / go / tai / no / is / ga] and the sentence [[place name] / ni / go / tai / n / da / ga] in the second source language, Obtained by the above-described calculation formula and method. Here, the score of tf / idf that has already been acquired may be read.

そして、類似度算出手段１０７４は、１番目の原言語の文「［地名］／に／行き／たい／の／です／が」と入力された原言語の文「［地名］／に／行き／たい／ん／です／が」の編集距離を、上述した算出式、方法により取得する。 Then, the similarity calculation means 1074 inputs the sentence “[place name] / ni / go //” of the source language in which the first original sentence “[place name] / ni / go / tai / no / is / ga” is input. The edit distance of “Tai / Nan / Isuda / Ga” is acquired by the above-described calculation formula and method.

Next, the similarity calculation means 1074 reads out the information on the expression for calculating the similarity (for example, Expression 3) and calculates the similarity (score) from the obtained tf/idf score and the obtained edit distance. Here, assume that the similarity calculation means 1074 calculated, for example, "0.9" as the similarity of the source-language sentence "[place name]/に/行き/たい/の/です/が". (Is this value perhaps odd?)

Next, the second generalization information count acquisition means 1072 acquires the second generalization information count "2", which is the number of pieces of generalization information in the second source-language sentence in FIG. 8, and places it in memory. The similarity calculation target sentence selection means 1073 then determines that the acquired first generalization information count "1" does not match the acquired second generalization information count "2", and processing moves on to the third sentence.
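The count comparison performed by means 1071 to 1073 can be sketched as a simple pre-filter. This is an illustration under stated assumptions, not the device's implementation: generalized tokens are assumed to be written in square brackets, the function names are hypothetical, and the second stored sentence is an invented stand-in because the actual second sentence of FIG. 8 is not shown in this section.

```python
import re

# Assumption: a generalized token is written in square brackets, e.g. "[place name]".
PLACEHOLDER = re.compile(r"\[[^\]]+\]")


def count_generalizations(tokens):
    """Number of generalization placeholders in a tokenized sentence."""
    return sum(1 for token in tokens if PLACEHOLDER.fullmatch(token))


def select_candidates(query_tokens, stored_sentences):
    """Keep only stored sentences whose placeholder count equals the query's."""
    target = count_generalizations(query_tokens)
    return [s for s in stored_sentences if count_generalizations(s) == target]


query = "[place name]/に/行き/たい/ん/です/が".split("/")
stored = [
    "[place name]/に/行き/たい/の/です/が".split("/"),
    # Hypothetical stand-in for the second sentence of FIG. 8 (two placeholders).
    "[place name]/から/[place name]/まで/行き/たい/の/です/が".split("/"),
    "[place name]/へ/来/たい/の/です/ね".split("/"),
]

candidates = select_candidates(query, stored)
print(len(candidates))  # the two-placeholder sentence is skipped
```

Filtering on the count before scoring means that the tf/idf score and edit distance are computed only for sentences into which the input's generalized phrases can be substituted one for one.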

Then, the second generalization information count acquisition means 1072 acquires the second generalization information count "1", which is the number of pieces of generalization information in the third source-language sentence in FIG. 8, and places it in memory.

The similarity calculation target sentence selection means 1073 determines that the acquired first generalization information count "1" matches the acquired second generalization information count "1". The similarity calculation means 1074 then obtains the tf/idf score between the third source-language sentence "[place name]/へ/来/たい/の/です/ね" and the sentence "[place name]/に/行き/たい/ん/です/が" using the calculation formula and method described above. A tf/idf score that has already been obtained may be read in here instead.

The similarity calculation means 1074 then obtains the edit distance between the third source-language sentence "[place name]/へ/来/たい/の/です/ね" and the input source-language sentence "[place name]/に/行き/たい/ん/です/が" using the calculation formula and method described above.

Next, the similarity calculation means 1074 reads out the information on the expression for calculating the similarity (for example, Expression 3) and calculates the similarity (score) from the obtained tf/idf score and the obtained edit distance. Here, assume that the similarity calculation means 1074 calculated, for example, "0.55" as the similarity of the source-language sentence "[place name]/へ/来/たい/の/です/ね".

Next, using the similarities calculated by the similarity calculation unit 107 ("0.9" and "0.55"), the similar sentence output unit 108 acquires the source-language sentence "[place name]/に/行き/たい/の/です/が", which corresponds to the highest similarity "0.9", and places it in memory.

Next, the similar sentence output unit 108 changes the generalization information [place name] in the acquired sentence "[place name]/に/行き/たい/の/です/が" back to the original phrase "東京" (Tokyo), obtaining the sentence "東京/に/行き/たい/の/です/が". The original phrase "東京" was placed in memory when the generalization information was acquired, and it is read out from that memory.
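The selection and restoration steps can be sketched together. The code assumes that scores have already been computed (the values 0.9 and 0.55 are the ones used in this example), that placeholders are bracketed tokens, and that the saved original phrases are supplied in order of appearance; `pick_most_similar` and `restore_phrases` are hypothetical names.

```python
def pick_most_similar(scored_sentences):
    """Return the token list with the highest similarity score."""
    tokens, _score = max(scored_sentences, key=lambda pair: pair[1])
    return tokens


def restore_phrases(tokens, original_phrases):
    """Replace each bracketed placeholder, in order, with a saved original phrase.

    A placeholder is left unchanged if no saved phrase remains for it.
    """
    remaining = iter(original_phrases)
    return [
        next(remaining, token) if token.startswith("[") and token.endswith("]") else token
        for token in tokens
    ]


scored = [
    ("[place name]/に/行き/たい/の/です/が".split("/"), 0.9),
    ("[place name]/へ/来/たい/の/です/ね".split("/"), 0.55),
]

best = pick_most_similar(scored)
restored = restore_phrases(best, ["東京"])  # "東京" was saved during generalization
display_text = "".join(restored)            # delimiters dropped for display
print(display_text)
```

Joining the restored tokens without a separator gives the undelimited sentence that is shown to the user for confirmation.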

The similar sentence output unit 108 then outputs the sentence "東京に行きたいのですが" ("I would like to go to Tokyo") to the display. Here, as shown in FIG. 9, the similar sentence output unit 108 outputs a screen that contains not only the sentence "東京に行きたいのですが" but also a button prompting input of a permission instruction (the "YES" button) and a "NO" button for indicating non-permission.

Next, assume that the user, seeing that the displayed sentence "東京に行きたいのですが" is close to the sentence "東京に行きたいんですが" that the user entered, presses the "YES" button with an input means such as a mouse.

When the "YES" button is pressed, the permission instruction receiving unit 109 accepts the input of the permission instruction.

The target language sentence acquisition unit 110 then acquires the source-language sentence "東京に行きたいのですが" corresponding to the permission instruction accepted by the permission instruction receiving unit 109. It then acquires, from the translation sentence set management table in FIG. 7, the target-language sentence "I'd like to go to/Kyoto[place name]" corresponding to that source-language sentence (the sentence with "ID=2"), and places it in memory.

Next, the target language phrase acquisition unit 111 acquires the phrase "東京" in the accepted source-language sentence, that is, the phrase that was replaced by the first piece of generalization information [place name]. The phrase "東京" has been placed in memory. The target language phrase acquisition unit 111 then acquires, from the phrase set management table in FIG. 6, the target-language phrase "Tokyo" corresponding to the acquired source-language phrase "東京", and places it in memory.

The target language sentence construction unit 112 then replaces the phrase corresponding to the generalization information (Kyoto[place name]) in the acquired target-language sentence "I'd like to go to/Kyoto[place name]" with the acquired target-language phrase "Tokyo", obtaining "I'd like to go to/Tokyo". It then removes the delimiter "/" and obtains the sentence "I'd like to go to Tokyo".
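The construction of the target-language sentence can be sketched as follows. This is only an illustration: it assumes that a generalized token in the stored target sentence ends with a bracketed label (as in "Kyoto[place name]"), that the phrase set table can be modeled as a dictionary, and that the delimiter "/" is turned into a space for English output. The names `build_target_sentence` and `phrase_pairs` are hypothetical.

```python
import re

# Assumption: a generalized target token ends with a bracketed label,
# e.g. "Kyoto[place name]".
SLOT = re.compile(r"^.*\[[^\]]+\]$")


def build_target_sentence(template, phrase_pairs, source_phrases):
    """Fill generalized slots in a delimited target sentence.

    template       -- stored target sentence with "/" as the delimiter
    phrase_pairs   -- source phrase -> target phrase (stand-in for the phrase set table)
    source_phrases -- original source phrases, in order of appearance
    """
    remaining = list(source_phrases)
    words = []
    for token in template.split("/"):
        if SLOT.match(token) and remaining:
            words.append(phrase_pairs[remaining.pop(0)])
        else:
            words.append(token)
    # Assumption: the delimiter becomes a space in the English output.
    return " ".join(words)


phrase_pairs = {"東京": "Tokyo", "京都": "Kyoto"}
result = build_target_sentence("I'd like to go to/Kyoto[place name]", phrase_pairs, ["東京"])
print(result)
```

Because only the slot is swapped and the rest of the stored translation is reused as is, the quality of the output depends on the stored sentence pair rather than on any generated text.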

The output unit 113 then outputs the target-language sentence "I'd like to go to Tokyo" constructed by the target language sentence construction unit 112. This completes the machine translation processing.

In this specific example, if on the display of FIG. 9 the user, for example, presses the "NO" button or makes no input for a certain period of time or longer, the re-input prompting unit 114 presents a screen prompting re-input of the sentence, as shown in FIG. 10. In this case, the sentence previously entered by the user is displayed in editable form.

When the user edits the sentence and presses the "OK" button, the translation processing described above is performed again. The processing of the re-input prompting unit 114 may be performed two or more times.
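The confirm-or-re-enter cycle can be sketched as a loop around injected callbacks. All names and the `max_attempts` limit are hypothetical; the text above only states that the re-input prompt may be repeated, and the timeout case is folded into a `False` answer from `ask_permission`.

```python
def translate_interactively(sentence, find_similar, ask_permission,
                            ask_reinput, translate, max_attempts=3):
    """Repeat: show a similar sentence, translate if approved, else ask for re-input.

    find_similar(sentence)         -> candidate sentence or None
    ask_permission(candidate)      -> True for YES, False for NO or timeout
    ask_reinput(sentence)          -> edited sentence from the user
    translate(candidate, sentence) -> target-language sentence
    """
    for _ in range(max_attempts):
        candidate = find_similar(sentence)
        if candidate is not None and ask_permission(candidate):
            return translate(candidate, sentence)
        sentence = ask_reinput(sentence)  # previous input is offered for editing
    return None  # no candidate was approved within the attempt limit


# Stub callbacks: the user answers NO once, then YES.
answers = iter([False, True])
result = translate_interactively(
    "東京に行きたいんですが",
    find_similar=lambda s: "東京に行きたいのですが",
    ask_permission=lambda candidate: next(answers),
    ask_reinput=lambda s: s,
    translate=lambda candidate, s: "I'd like to go to Tokyo",
)
print(result)
```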

As described above, according to the present embodiment, machine translation can be performed with very high accuracy by selecting a sentence similar to the input sentence, presenting it to the user, and translating with the stored translation that corresponds to the sentence the user selected.

In the present embodiment, generalization information was acquired using the phrase generalization information stored in the phrase generalization information storage unit. However, when the phrase dividing unit divides a sentence into two or more phrases, it may also acquire generalization information (usually the part of speech) corresponding to each phrase. In that case, the phrase generalization information storage unit is unnecessary, and the machine translation device comprises: a phrase set storage unit that stores one or more phrase sets, each pairing a source-language phrase with a target-language phrase; a translation sentence set storage unit that stores two or more translation sentence sets, each pairing a source-language sentence with a target-language sentence and having generalization information corresponding to some of the phrases in the source-language sentence and the target-language sentence; a sentence receiving unit that accepts a sentence; a phrase dividing unit that divides the sentence into two or more phrases and acquires generalization information (usually the part of speech) corresponding to each phrase; a generalization information acquisition unit that determines and acquires, from the generalization information corresponding to the two or more phrases produced by the phrase dividing unit, the generalization information that is to replace phrases (for example, nouns and adjectives); a similarity calculation unit that calculates the similarity between a sentence that uses the generalization information acquired by the generalization information acquisition unit and one or more of the two or more phrases produced by the phrase dividing unit, and each source-language sentence in the two or more translation sentence sets in the translation sentence set storage unit; a similar sentence output unit that uses the similarity calculated by the similarity calculation unit to output one or more source-language sentences similar to the sentence accepted by the sentence receiving unit; a permission instruction receiving unit that accepts input of a permission instruction, which is an instruction indicating approval of one of the one or more source-language sentences output by the similar sentence output unit; a target language sentence acquisition unit that acquires the target-language sentence corresponding to the source-language sentence to which the accepted permission instruction applies; a target language phrase acquisition unit that acquires, from the phrase set storage unit, the target-language phrase paired with the source-language phrase corresponding to the generalization information acquired by the generalization information acquisition unit; a target language sentence construction unit that replaces the phrase corresponding to the generalization information in the target-language sentence acquired by the target language sentence acquisition unit with the target-language phrase acquired by the target language phrase acquisition unit, thereby constructing the target-language sentence to be output; and an output unit that outputs the target-language sentence constructed by the target language sentence construction unit.

In the present embodiment, many algorithms exist for calculating the similarity between two sentences, and any of them may be used. The same applies to the other embodiments in this specification.

Although the present embodiment was described with Japanese as the source language and English as the target language, it goes without saying that any source and target languages may be used. The same applies to the other embodiments in this specification.

Furthermore, the processing in the present embodiment may be implemented in software. The software may be distributed by software download or the like, or recorded on a recording medium such as a CD-ROM and distributed. The same applies to the other embodiments in this specification. The software that implements the machine translation device of the present embodiment is the following program. Namely, the program causes a computer to function as: a sentence receiving unit that accepts a sentence; a phrase dividing unit that divides the sentence into two or more phrases; a generalization information acquisition unit that acquires, from one or more pieces of phrase generalization information stored in a storage medium, generalization information corresponding to one or more of the phrases produced by the phrase dividing unit; a similarity calculation unit that calculates the similarity between a sentence that uses the generalization information acquired by the generalization information acquisition unit and one or more of the two or more phrases produced by the phrase dividing unit, and each source-language sentence in two or more translation sentence sets stored in a storage medium; a similar sentence output unit that uses the similarity calculated by the similarity calculation unit to output one or more source-language sentences similar to the sentence accepted by the sentence receiving unit; a permission instruction receiving unit that accepts input of a permission instruction, which is an instruction indicating approval of one of the one or more source-language sentences output by the similar sentence output unit; a target language sentence acquisition unit that acquires the target-language sentence corresponding to the source-language sentence to which the accepted permission instruction applies; a target language phrase acquisition unit that acquires, from one or more phrase sets stored in a storage medium, the target-language phrase paired with the source-language phrase corresponding to the generalization information acquired by the generalization information acquisition unit; a target language sentence construction unit that replaces the phrase corresponding to the generalization information in the target-language sentence acquired by the target language sentence acquisition unit with the target-language phrase acquired by the target language phrase acquisition unit, thereby constructing the target-language sentence to be output; and an output unit that outputs the target-language sentence constructed by the target language sentence construction unit.

Alternatively, the program causes a computer to function as: a sentence receiving unit that accepts a sentence; a phrase dividing unit that divides the sentence into two or more phrases and acquires generalization information corresponding to each phrase; a generalization information acquisition unit that determines and acquires, from the generalization information corresponding to the two or more phrases produced by the phrase dividing unit, the generalization information that is to replace phrases; a similarity calculation unit that calculates the similarity between a sentence that uses the generalization information acquired by the generalization information acquisition unit and one or more of the two or more phrases produced by the phrase dividing unit, and each source-language sentence in two or more translation sentence sets stored in a storage medium; a similar sentence output unit that uses the similarity calculated by the similarity calculation unit to output one or more source-language sentences similar to the sentence accepted by the sentence receiving unit; a permission instruction receiving unit that accepts input of a permission instruction, which is an instruction indicating approval of one of the one or more source-language sentences output by the similar sentence output unit; a target language sentence acquisition unit that acquires the target-language sentence corresponding to the source-language sentence to which the accepted permission instruction applies; a target language phrase acquisition unit that acquires, from one or more phrase sets stored in a storage medium, the target-language phrase paired with the source-language phrase corresponding to the generalization information acquired by the generalization information acquisition unit; a target language sentence construction unit that replaces the phrase corresponding to the generalization information in the target-language sentence acquired by the target language sentence acquisition unit with the target-language phrase acquired by the target language phrase acquisition unit, thereby constructing the target-language sentence to be output; and an output unit that outputs the target-language sentence constructed by the target language sentence construction unit.

In the above program, it is preferable that the similarity calculation unit be made to function as: a first generalization information count acquisition means that acquires a first generalization information count, which is the number of pieces of generalization information acquired by the generalization information acquisition unit; a second generalization information count acquisition means that acquires a second generalization information count, which is the number of pieces of generalization information in a source-language sentence or a target-language sentence of the two or more translation sentence sets in the translation sentence set storage unit; a similarity calculation target sentence selection means that selects the source-language sentences whose second generalization information count matches the first generalization information count; and a similarity calculation means that calculates the similarity between a sentence that uses the generalization information acquired by the generalization information acquisition unit and one or more of the two or more phrases produced by the phrase dividing unit, and the one or more source-language sentences selected by the similarity calculation target sentence selection means.

In the above program, it is also preferable that, when the permission instruction receiving unit has not accepted input of a permission instruction, the computer be made to function further as a re-input prompting unit that produces output prompting re-input or editing of the sentence accepted by the sentence receiving unit, and that the sentence receiving unit be made to accept a sentence again.

In the above program, it is also preferable that the generalization information acquisition unit be made to acquire generalization information only for phrases that are nouns or adjectives.
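Restricting generalization to nouns and adjectives can be sketched as a lookup guarded by a part-of-speech check. The part-of-speech labels, the dictionary standing in for the phrase generalization information, and the function name `acquire_generalizations` are all assumptions made for illustration.

```python
def acquire_generalizations(tagged_tokens, phrase_table,
                            allowed_pos=("noun", "adjective")):
    """Generalize only nouns and adjectives that appear in the phrase table.

    tagged_tokens -- list of (surface form, part of speech) pairs
    phrase_table  -- surface form -> generalization label (hypothetical stand-in
                     for the stored phrase generalization information)
    Returns the generalized token list and the original phrases that were replaced.
    """
    generalized, originals = [], []
    for surface, pos in tagged_tokens:
        label = phrase_table.get(surface) if pos in allowed_pos else None
        if label is None:
            generalized.append(surface)
        else:
            generalized.append(f"[{label}]")
            originals.append(surface)  # kept so the phrase can be restored later
    return generalized, originals


# Part-of-speech labels here are illustrative, not the output of a real tagger.
tagged = [("東京", "noun"), ("に", "particle"), ("行き", "verb"), ("たい", "auxiliary"),
          ("ん", "particle"), ("です", "auxiliary"), ("が", "particle")]
phrase_table = {"東京": "place name", "京都": "place name"}

generalized, originals = acquire_generalizations(tagged, phrase_table)
print("/".join(generalized))
```

Skipping the lookup for other parts of speech keeps function words such as particles from being generalized even if they happen to collide with a table entry.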

(Embodiment 2)
In the present embodiment, the sentence is input by speech, and after speech recognition processing is performed, machine translation is carried out by the same algorithm as in Embodiment 1.

FIG. 11 is a block diagram of the machine translation device in the present embodiment.

The machine translation device comprises a phrase generalization information storage unit 101, a phrase set storage unit 102, a translation sentence set storage unit 103, a speech receiving unit 1101, a sentence construction unit 1102, a sentence receiving unit 1104, a phrase dividing unit 105, a generalization information acquisition unit 106, a similarity calculation unit 107, a similar sentence output unit 108, a permission instruction receiving unit 109, a target language sentence acquisition unit 110, a target language phrase acquisition unit 111, a target language sentence construction unit 112, an output unit 113, and a re-input prompting unit 114.

The speech receiving unit 1101 accepts speech. The speech receiving unit 1101 may accept speech again. The speech input means is, for example, a microphone, but anything may be used, such as the microphone 305 or a means for reading from a recording medium. The speech receiving unit 1101 can be implemented by, for example, a device driver for an input means such as the microphone 305.

The sentence construction unit 1102 performs speech recognition processing on the speech accepted by the speech receiving unit 1101 and constructs a sentence. Any speech recognition algorithm may be used. Because speech recognition processing is a known technique, a detailed description is omitted. The sentence construction unit 1102 can usually be implemented by an MPU, memory, and the like. The processing procedure of the sentence construction unit 1102 is usually implemented in software, and the software is recorded on a recording medium such as a ROM. It may, however, be implemented in hardware (a dedicated circuit).

The sentence receiving unit 1104 accepts a sentence. Acceptance here means acquiring the sentence constructed by the sentence construction unit 1102. The sentence receiving unit 1104 serves to pass the sentence constructed by the sentence construction unit 1102 to the phrase dividing unit 105. The sentence receiving unit 1104 is, for example, the execution of software that implements the phrase dividing unit 105 or a call to a function that implements the phrase dividing unit 105. The sentence receiving unit 1104 can be implemented by an MPU, memory, software, and the like.

Next, the operation of the machine translation device is described. This machine translation device differs from the machine translation device of Embodiment 1 in that the sentence from the user is accepted by speech. That is, the speech receiving unit 1101 of this machine translation device accepts the sentence as speech, and the sentence construction unit 1102 performs speech recognition processing, obtains the character string that makes up the sentence to be translated, and places it in memory. The sentence receiving unit 1104 then passes the sentence obtained by the sentence construction unit 1102 to the phrase dividing unit 105. The subsequent translation processing is the same as in the machine translation device of Embodiment 1.
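The front end of Embodiment 2 can be sketched as a thin wrapper that runs recognition and hands the result to phrase division. The recognizer and splitter below are stubs with fixed outputs, since no particular speech recognition or segmentation algorithm is prescribed; all function names are hypothetical.

```python
def speech_front_end(audio, recognize, split_phrases):
    """Turn accepted speech into a sentence and its phrases.

    recognize(audio)        -> recognized sentence (role of unit 1102)
    split_phrases(sentence) -> list of phrases (role of unit 105)
    Passing the sentence from one to the other corresponds to unit 1104.
    """
    sentence = recognize(audio)
    return sentence, split_phrases(sentence)


def fake_recognize(audio):
    # Stub: a real recognizer would decode the audio samples.
    return "東京に行きたいんですが"


def fake_split(sentence):
    # Stub: a real system would run morphological analysis here.
    segmented = {"東京に行きたいんですが": ["東京", "に", "行き", "たい", "ん", "です", "が"]}
    return segmented[sentence]


sentence, phrases = speech_front_end(b"\x00\x01", fake_recognize, fake_split)
print(sentence, phrases)
```

From the phrase list onward, processing would continue exactly as in the text-input case, which is why only the front end differs between the two embodiments.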

The re-input prompting unit 114 normally prompts re-input of the sentence by speech. Here, "prompting re-input of the sentence" means, for example, showing on the display or outputting from a speaker a message such as "Please speak the sentence into the microphone again."

As described above, according to the present embodiment, a sentence spoken by the user can be machine translated with very high accuracy.

The software that implements the machine translation device of the present embodiment is the following program. Namely, the program causes a computer to function as: a sentence receiving unit that accepts a sentence; a phrase dividing unit that divides the sentence into two or more phrases; a generalization information acquisition unit that acquires, from one or more pieces of phrase generalization information stored in a storage medium, generalization information corresponding to one or more of the phrases produced by the phrase dividing unit; a similarity calculation unit that calculates the similarity between a sentence that uses the generalization information acquired by the generalization information acquisition unit and one or more of the two or more phrases produced by the phrase dividing unit, and each source-language sentence in two or more translation sentence sets stored in a storage medium; a similar sentence output unit that uses the similarity calculated by the similarity calculation unit to output one or more source-language sentences similar to the sentence accepted by the sentence receiving unit; a permission instruction receiving unit that accepts input of a permission instruction, which is an instruction indicating approval of one of the one or more source-language sentences output by the similar sentence output unit; a target language sentence acquisition unit that acquires the target-language sentence corresponding to the source-language sentence to which the accepted permission instruction applies; a target language phrase acquisition unit that acquires, from one or more phrase sets stored in a storage medium, the target-language phrase paired with the source-language phrase corresponding to the generalization information acquired by the generalization information acquisition unit; a target language sentence construction unit that replaces the phrase corresponding to the generalization information in the target-language sentence acquired by the target language sentence acquisition unit with the target-language phrase acquired by the target language phrase acquisition unit, thereby constructing the target-language sentence to be output; and an output unit that outputs the target-language sentence constructed by the target language sentence construction unit.

The program causes a computer to function as: a sentence reception unit that receives a sentence; a phrase dividing unit that divides the sentence into two or more phrases and acquires generalization information corresponding to those phrases; a generalization information acquisition unit that determines and acquires, from among the generalization information corresponding to the two or more phrases divided by the phrase dividing unit, the generalization information to be substituted for a phrase; a similarity calculation unit that calculates the similarity between (i) a sentence that uses the generalization information acquired by the generalization information acquisition unit together with one or more of the two or more phrases divided by the phrase dividing unit and (ii) each source language sentence of two or more translation sentence sets stored in a storage medium; a similar sentence output unit that uses the similarity calculated by the similarity calculation unit to output one or more source language sentences similar to the sentence received by the sentence reception unit; a permission instruction receiving unit that receives input of a permission instruction, which is an instruction indicating approval of one of the one or more source language sentences output by the similar sentence output unit; a target language sentence acquisition unit that acquires the target language sentence corresponding to the source language sentence designated by the permission instruction; a target language phrase acquisition unit that acquires, from one or more phrase sets stored in a storage medium, the target language phrase paired with the source language phrase corresponding to the generalization information acquired by the generalization information acquisition unit; a target language sentence constructing unit that constructs the target language sentence to be output by replacing, in the target language sentence acquired by the target language sentence acquisition unit, the phrase corresponding to the generalization information with the target language phrase acquired by the target language phrase acquisition unit; and an output unit that outputs the target language sentence constructed by the target language sentence constructing unit.
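
The last two units in this chain (target language phrase acquisition and target language sentence construction) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that generalization information appears in the stored target language sentence as a bracketed tag such as `[PLACE]` and that the phrase sets are held in a dictionary. The names `PHRASE_SETS` and `compose_target_sentence` and the tag format are hypothetical and do not come from the specification.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase sets: source language phrase -> target language phrase.
PHRASE_SETS: Dict[str, str] = {"東京": "Tokyo", "京都": "Kyoto"}


def compose_target_sentence(
    target_template: str,
    slots: List[Tuple[str, str]],
    phrase_sets: Dict[str, str],
) -> str:
    """Fill generalization tags in a target language sentence.

    `slots` lists (tag, source_phrase) pairs in the order the tags occur.
    Each tag is replaced once by the target phrase paired with the
    source phrase in `phrase_sets`.
    """
    result = target_template
    for tag, source_phrase in slots:
        target_phrase = phrase_sets[source_phrase]  # phrase acquisition
        result = result.replace(tag, target_phrase, 1)  # sentence construction
    return result


sentence = compose_target_sentence(
    "How do I get to [PLACE]?", [("[PLACE]", "東京")], PHRASE_SETS
)
print(sentence)
```

Replacing one tag occurrence at a time keeps sentences with two slots of the same kind (for example an origin and a destination) in the intended order.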

In the above program, it is preferable that the computer further function as a voice reception unit that receives voice and as a sentence composing unit that performs speech recognition processing on the voice received by the voice reception unit and composes a sentence, and that the sentence reception unit acquire the sentence composed by the sentence composing unit.

(Evaluation experiment)
An evaluation experiment using the machine translation apparatus of the present embodiment is described below.

The Japanese speech recognition performed by the sentence composing unit 1102 used ATRASR (see Gen Itoh, Yutaka Ashikari, Takatoshi Jitsuhiro, and Satoshi Nakamura, "Overview and evaluation report of the integrated speech recognition environment ATRASR," Proceedings of the Acoustical Society of Japan 2004 Autumn Meeting, Vol. I, 1-P-30, pp. 221-222 (2004)). The acoustic model was MDL-SSS (see Takatoshi Jitsuhiro, Shigeki Matsuda, Masakiyo Fujimoto, W. Herbordt, Toshiharu Horiuchi, and Satoshi Nakamura, "Evaluation of Japanese speech recognition at ATR: Japanese acoustic model," Proceedings of the Acoustical Society of Japan 2006 Spring Meeting, 1-P-21, pp. 185-186 (2006)), and the language model was a multi-class composite bigram (see Hirofumi Yamamoto and Genichiro Kikui, "Evaluation of speech recognition at ATR: corpus and language model," Proceedings of the Acoustical Society of Japan 2006 Spring Meeting, 1-P-22, pp. 187-188 (2006)).

Three test sets, shown in FIG. 12, were used. In FIG. 12, BTEC (Basic Travel Expression Corpus) is read speech of the basic travel expression corpus (see Takezawa, T., Sumita, E., Sugaya, F., Yamamoto, H., and Yamamoto, S., "Toward a broad-coverage bilingual corpus for speech translation of travel conversations in the real world," Proc. of International Conference on Language Resources and Evaluation, pp. 147-152 (2002), and Kikui, G., Sumita, E., Takezawa, T., and Yamamoto, S., "Creating corpora for speech-to-speech translation," Proc. of 8th European Conference on Speech Communication and Technology, Vol. 1, pp. 381-384 (2003)).

MAD (Machine-Aided Dialogs) consists of task-oriented dialogues conducted between Japanese speakers and English speakers through a speech translation system (see Takezawa, T. and Kikui, G., "A comparative study on human communication behaviors and linguistic characteristics for speech-to-speech translation," Proc. of International Conference on Language Resources and Evaluation, pp. 1589-1592 (2004)). The data were collected with a typist, in place of a speech recognition system, transcribing each utterance and entering it into the machine translation system.

FED (Field Experiment Data) is data selected from a monitor experiment conducted at Kansai International Airport (see Genichiro Kikui, Toshiyuki Takezawa, Masahide Mizushima, Seiichi Yamamoto, Yutaka Sasaki, Hisashi Kawai, and Satoshi Nakamura, "Monitor experiment of a spoken dialogue translation system in a real environment," Proceedings of the Acoustical Society of Japan 2005 Autumn Meeting, 1-7-10, pp. 11-12 (2006)). The average utterance length is about the same for BTEC and FED and longer for MAD. Perplexity increases in the order BTEC, MAD, FED.

Because the system is assumed to be used by end users, it was configured so that the result is displayed immediately after the user finishes speaking. Specifically, with processing time expressed as a real-time factor (RTF), the setting was RTF = 1. FIG. 13 shows information on the Japanese speech recognition results.
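
For reference, the real-time factor is conventionally the ratio of processing time to the duration of the input audio, so RTF = 1 means that recognition takes no longer than the utterance itself. The sketch below only illustrates that ratio; the function name is hypothetical and the specification does not state how the time was measured.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by input audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# A 2.5-second utterance recognized in 2.5 seconds runs at RTF = 1.
print(real_time_factor(2.5, 2.5))
```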

Two types of the generalization information described above were prepared (condition A and condition B). Both generalize by part of speech. Condition A generalizes only person names and place names. Condition B generalizes common nouns, formal nouns, sa-hen nouns, adjectival nouns, sa-hen adjectival nouns, adjectives, adjectival verbs, numerals, and adverbs. In other words, the nominal content words of condition B are common nouns, formal nouns, sa-hen nouns, adjectival nouns, sa-hen adjectival nouns, adjectives, adjectival verbs, numerals, and adverbs.

As described above, the procedure for generalizing parts of speech and matching is as follows. First, the surface expression of each item to be generalized (for example, "Tokyo") is deleted, while that information (for example, "Tokyo") is kept separately. Second, the parallel translation examples (translation sentence set storage unit 103) are searched. Third, a candidate is removed if the number of generalized items differs between the two sides of the translation pair. Fourth, the original surface expressions are restored in the candidates that contain generalized items.
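
The four steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not taken from the specification: sentences are already split into tokens, generalized items are marked with a placeholder token `[GEN]` rather than deleted outright, the count check in step three is applied between the input and each stored example, and a simple Dice overlap stands in for the similarity measure. All names (`PLACE_NAMES`, `generalize`, `dice`, `match`) are hypothetical.

```python
from typing import List, Tuple

SLOT = "[GEN]"                    # hypothetical placeholder for a generalized item
PLACE_NAMES = {"東京", "京都"}     # hypothetical dictionary (condition A style)


def generalize(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Step 1: strip surface forms of generalized items, keeping them aside."""
    kept = [t for t in tokens if t in PLACE_NAMES]
    query = [SLOT if t in PLACE_NAMES else t for t in tokens]
    return query, kept


def dice(a: List[str], b: List[str]) -> float:
    """Stand-in similarity: Dice coefficient over token sets."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def match(tokens: List[str], examples: List[List[str]], top_n: int = 30) -> List[List[str]]:
    query, kept = generalize(tokens)
    scored = []
    for example in examples:                 # Step 2: search the stored examples
        if example.count(SLOT) != len(kept):  # Step 3: drop on slot-count mismatch
            continue
        scored.append((dice(query, example), example))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    results = []
    for _, example in scored[:top_n]:
        surface = iter(kept)                 # Step 4: restore original surface forms
        results.append([next(surface) if t == SLOT else t for t in example])
    return results


examples = [
    [SLOT, "まで", "お願いします"],
    ["駅", "まで", "お願いします"],
    [SLOT, "から", SLOT, "まで"],
]
print(match(["京都", "まで", "お願いします"], examples))
```

Only the first stored example survives the count check here, and the kept surface form is put back into its slot before the candidate is returned.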

The basic travel expression corpus BTEC was used as the large-scale corpus. FIG. 14 shows an overview. The corpus is divided into subsets according to when the data were collected and which language they were originally created in; of these, BTEC1, BTEC2, BTEC3, and BTEC4 were used. BTEC1, BTEC2, and BTEC3 contain expressions for situations in which Japanese travelers go mainly to Europe and North America. BTEC4 contains expressions for situations in which travelers from the United States or Australia come to Japan. In total, about 490,000 utterance expressions in parallel-translation form are searched. The corpus is parallel across Japanese, English, and Chinese; the experiments were conducted in the Japanese-to-English direction. The number of preselected candidates, N_r, was set to 30.

The search weight parameter α was varied from 0 to 1 in steps of 0.1, and experiments were run on the three test sets of FIG. 12. The figures show, for condition A (Condition A) and condition B (Condition B), the rate at which the first-ranked candidate is an exact match when the speech recognition result is the input (Top 1 Accuracy), the rate at which an exact match is included among the top 30 candidates (Top 30 Accuracy), and the rate obtained when the correct morpheme sequence is the input (Oracle Accuracy), together with the utterance accuracy of speech recognition (Correct Utterance). FIG. 15 shows the results for BTEC, FIG. 16 for MAD, and FIG. 17 for FED.
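
As an illustration of how the Top 1 and Top 30 figures relate, the following Python sketch computes exact-match accuracy within the first k candidates and builds the grid of α values. The function name and the toy data are hypothetical; how α enters the search score is defined elsewhere in the specification and is not reproduced here.

```python
from typing import List, Sequence


def top_k_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
) -> float:
    """Fraction of inputs whose reference appears among the first k candidates."""
    hits = sum(
        1
        for candidates, reference in zip(ranked_candidates, references)
        if reference in candidates[:k]
    )
    return hits / len(references)


# Grid of weight values from 0 to 1 in steps of 0.1 (rounded to avoid
# floating-point drift).
alphas: List[float] = [round(0.1 * i, 1) for i in range(11)]

# Toy data: three inputs, each with a ranked candidate list.
ranked = [["a", "b"], ["c", "d"], ["e", "f"]]
refs = ["a", "d", "x"]
print(top_k_accuracy(ranked, refs, 1), top_k_accuracy(ranked, refs, 2))
```

Because the top-k set contains the top-1 candidate, accuracy can only stay the same or rise as k grows, which is why a small gap between Top 1 and Top 30 indicates that the correct example, when found at all, is usually ranked first.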

The BTEC experiment is a closed test. The MAD and FED experiments are open tests, although they probably include formulaic expressions such as greetings. For BTEC, generalization condition A gives better results than generalization condition B, whereas for MAD and FED generalization condition B gives better results than generalization condition A. Only for BTEC under generalization condition A does outputting up to 30 candidates from the speech recognition result improve accuracy over outputting the first candidate alone; for the other test sets and conditions there is little difference between the first-candidate value and the cumulative value up to the 30th candidate.

Next, the results of manual evaluation are presented.

The results in FIGS. 15, 16, and 17 give the proportion of cases in which the retrieved similar example exactly matches the original expression as a morpheme sequence. In practice, an expression that differs slightly can still be regarded as conveying the same meaning. The content was therefore evaluated manually for FED, the test set of real dialogue speech. Only the first-ranked candidate selected as the similar example was classified into one of the following ranks A to D. (A) means "exactly matches as a morpheme sequence." (B) means "the expression differs, but the meaning is the same." (C) means "contains some useful information." (D) means "the meaning differs and the result is not useful."

Cases in which no similar example was obtained were classified into the following two categories. (OK) means "the speech recognition result is correct for the utterance as a whole." (NG) means "the speech recognition result is incorrect."

FIG. 18 shows the manual evaluation results for generalization condition A, and FIG. 19 shows the manual evaluation results for generalization condition B.

In the real-dialogue speech test set FED, similar examples are output for about 75% of the utterances under generalization condition A, but about one-third of them are not useful. Under generalization condition B, the proportion of utterances for which a similar example is output falls to about 60%, but few of the outputs are not useful. The following is an example that was rank B under generalization condition A and improved to rank B under generalization condition B.
[Input speech] 電車[ん]もしくはバスがありますが、どちらがよろしいですか。 (There is a train [filler] or a bus; which would you prefer?)
[Generalization condition A] プリント紙は光沢と絹目がありますが、どちらがよろしいですか。 (Print paper comes in glossy and silk finishes; which would you prefer?)
[Generalization condition B] 電車とバスではどちらがよろしいですか。 (Which would you prefer, the train or the bus?)

The following is an example classified as OK, that is, no similar example was retrieved but the recognition result is correct.
[Example of OK] 二百三十円です。 (It is 230 yen.)

In summary, a method of using a large-scale corpus to retrieve expressions similar to speech recognition results was examined, and its effectiveness was demonstrated through evaluation experiments on multiple test sets consisting of read speech of travel conversation sentences and real dialogue speech.

In each of the above embodiments, each process (each function) may be realized by centralized processing on a single apparatus (system) or by distributed processing across a plurality of apparatuses.

FIG. 20 shows the external appearance of a computer that executes the program described in this specification to realize the machine translation apparatus of the embodiments described above. The embodiments can be realized by computer hardware and a computer program executed on it. FIG. 20 is an external view of the computer system 340, and FIG. 21 is a block diagram of the computer system 340.

In FIG. 20, the computer system 340 includes a computer 341 that contains an FD (Flexible Disk) drive and a CD-ROM (Compact Disk Read Only Memory) drive, a keyboard 342, a mouse 343, a monitor 344, and a microphone 345.

In FIG. 21, the computer 341 includes, in addition to the FD drive 3411 and the CD-ROM drive 3412, a CPU (Central Processing Unit) 3413; a bus 3414 connected to the CD-ROM drive 3412 and the FD drive 3411; a ROM (Read-Only Memory) 3415 for storing programs such as a boot-up program; a RAM (Random Access Memory) 3416 that is connected to the CPU 3413, temporarily stores the instructions of application programs, and provides temporary storage space; and a hard disk 3417 for storing application programs, system programs, and data. Although not shown, the computer 341 may further include a network card that provides a connection to a LAN.

A program that causes the computer system 340 to execute the functions of the machine translation apparatus of the embodiments described above may be stored on a CD-ROM 3501 or an FD 3502, inserted into the CD-ROM drive 3412 or the FD drive 3411, and then transferred to the hard disk 3417. Alternatively, the program may be sent to the computer 341 over a network (not shown) and stored on the hard disk 3417. The program is loaded into the RAM 3416 when it is executed. The program may also be loaded directly from the CD-ROM 3501, the FD 3502, or the network.

The program need not include an operating system (OS), third-party programs, or the like that cause the computer 341 to execute the functions of the machine translation apparatus of the embodiments described above. The program need only include the instructions that call the appropriate functions (modules) in a controlled manner so that the desired result is obtained. How the computer system 340 operates is well known, and a detailed description is omitted.

The program may be executed by a single computer or by a plurality of computers. That is, either centralized processing or distributed processing may be performed.

The present invention is not limited to the embodiments described above; various modifications are possible, and it goes without saying that they are also included within the scope of the present invention.

As described above, the machine translation apparatus according to the present invention has the effect of enabling highly accurate machine translation and is useful as a machine translation apparatus or the like.

FIG. 1: Block diagram of the machine translation apparatus in Embodiment 1
FIG. 2: Flowchart explaining the operation of the machine translation apparatus
FIG. 3: Flowchart explaining the operation of the similarity calculation process
FIG. 4: Flowchart explaining the operation of the translated sentence output process
FIG. 5: Diagram showing the phrase generalization information management table
FIG. 6: Diagram showing the phrase set management table
FIG. 7: Diagram showing the translation sentence set management table
FIG. 8: Diagram showing an example of the selected source language sentences
FIG. 9: Diagram showing an example of screen output
FIG. 10: Diagram showing an example of screen output
FIG. 11: Block diagram of the machine translation apparatus in Embodiment 2
FIG. 12: Diagram showing the test sets
FIG. 13: Diagram showing information on the Japanese speech recognition results
FIG. 14: Diagram showing an overview of the basic travel expression corpus
FIG. 15: Diagram showing the BTEC results
FIG. 16: Diagram showing the MAD results
FIG. 17: Diagram showing the FED results
FIG. 18: Diagram showing the manual evaluation results for generalization condition A
FIG. 19: Diagram showing the manual evaluation results for generalization condition B
FIG. 20: External view of a computer that realizes the machine translation apparatus
FIG. 21: Block diagram of a computer system that realizes the machine translation apparatus

Explanation of symbols

101 Phrase generalization information storage unit
102 Phrase set storage unit
103 Translation sentence set storage unit
104, 1104 Sentence reception unit
105 Phrase dividing unit
106 Generalization information acquisition unit
107 Similarity calculation unit
108 Similar sentence output unit
109 Permission instruction receiving unit
110 Target language sentence acquisition unit
111 Target language phrase acquisition unit
112 Target language sentence constructing unit
113 Output unit
114 Re-input prompting unit
1071 First generalization information number acquisition means
1072 Second generalization information number acquisition means
1073 Similarity calculation target sentence selection means
1074 Similarity calculation means
1101 Voice reception unit
1102 Sentence composing unit

Claims

A phrase generalization information storage unit capable of storing one or more pieces of phrase generalization information, each having a phrase and generalization information, which is information that generalizes the phrase;
A phrase set storage unit that stores at least one phrase set having a pair of a source language phrase and a target language phrase;
A translation sentence set storage unit storing two or more translation sentence sets, each being a pair of a source language sentence and a target language sentence and having generalization information corresponding to some of the phrases in that source language sentence and target language sentence;
A sentence reception unit for receiving sentences;
A phrase dividing unit that divides the sentence into two or more phrases;
A generalization information acquisition unit that acquires generalization information corresponding to one or more words divided by the word division unit from the word generalization information storage unit;
A similarity calculation unit that calculates the similarity between a sentence that uses the generalization information acquired by the generalization information acquisition unit together with one or more of the two or more phrases divided by the phrase dividing unit, and each source language sentence of the two or more translation sentence sets in the translation sentence set storage unit;
A similar sentence output unit that outputs one or more source language sentences similar to the sentence received by the sentence reception unit using the similarity calculated by the similarity calculation unit;
Among the one or more source language sentences output by the similar sentence output unit, a permission instruction receiving unit that receives an input of a permission instruction that is an instruction indicating permission for one source language sentence;
A target language sentence acquisition unit that acquires a target language sentence corresponding to a source language sentence corresponding to the permission instruction received by the permission instruction reception unit;
A target language phrase acquisition unit that acquires from the phrase set storage unit a target language phrase that is paired with a source language phrase corresponding to the generalization information acquired by the generalization information acquisition unit;
A target language sentence constructing unit that constructs the target language sentence to be output by replacing the phrase corresponding to the generalization information in the target language sentence acquired by the target language sentence acquisition unit with the target language phrase acquired by the target language phrase acquisition unit;
A machine translation apparatus comprising: an output unit that outputs a target language sentence configured by the target language sentence constructing unit.

A phrase set storage unit that stores at least one phrase set having a pair of a source language phrase and a target language phrase;
A translation sentence set storage unit storing two or more translation sentence sets, each being a pair of a source language sentence and a target language sentence and having generalization information corresponding to some of the phrases in that source language sentence and target language sentence;
A sentence reception unit for receiving sentences;
A phrase dividing unit that divides the sentence into two or more words and acquires generalization information corresponding to the words;
A generalization information acquisition unit for determining and acquiring generalization information to be replaced with a phrase from among the generalization information corresponding to two or more words divided by the phrase division unit;
A similarity calculation unit that calculates the similarity between a sentence that uses the generalization information acquired by the generalization information acquisition unit together with one or more of the two or more phrases divided by the phrase dividing unit, and each source language sentence of the two or more translation sentence sets in the translation sentence set storage unit;
A similar sentence output unit that outputs one or more source language sentences similar to the sentence received by the sentence receiving unit using the similarity calculated by the similarity calculating unit;
Among the one or more source language sentences output by the similar sentence output unit, a permission instruction receiving unit that receives an input of a permission instruction that is an instruction indicating permission for one source language sentence;
A target language sentence acquisition unit that acquires a target language sentence corresponding to a source language sentence corresponding to the permission instruction received by the permission instruction reception unit;
A target language phrase acquisition unit that acquires from the phrase set storage unit a target language phrase that is paired with a source language phrase corresponding to the generalization information acquired by the generalization information acquisition unit;
A target language sentence constructing unit that constructs the target language sentence to be output by replacing the phrase corresponding to the generalization information in the target language sentence acquired by the target language sentence acquisition unit with the target language phrase acquired by the target language phrase acquisition unit;
A machine translation apparatus comprising: an output unit that outputs a target language sentence configured by the target language sentence constructing unit.

The machine translation apparatus according to claim 1, further comprising:
A voice reception unit that receives voice; and
A sentence composing unit that performs speech recognition processing on the voice received by the voice reception unit and composes a sentence,
wherein the sentence reception unit acquires the sentence composed by the sentence composing unit.

The machine translation apparatus according to claim 1, wherein the similarity calculation unit includes:
First generalization information number acquisition means for acquiring a first generalization information number, which is the number of pieces of generalization information acquired by the generalization information acquisition unit;
Second generalization information number acquisition means for acquiring a second generalization information number, which is the number of pieces of generalization information in the source language sentence or the target language sentence of each of the two or more translation sentence sets in the translation sentence set storage unit;
Similarity calculation target sentence selection means for selecting the source language sentences whose second generalization information number matches the first generalization information number; and
Similarity calculation means for calculating the similarity between a sentence that uses the generalization information acquired by the generalization information acquisition unit together with one or more of the two or more phrases divided by the phrase dividing unit, and the one or more source language sentences selected by the similarity calculation target sentence selection means.

When the permission instruction accepting unit does not accept the input of the permission instruction,
A re-input prompting unit for performing an output for prompting re-input or editing of the sentence received by the sentence receiving unit;
The machine translation device according to claim 1, wherein the sentence receiving unit receives the sentence again.

When the permission instruction accepting unit does not accept the input of the permission instruction,
A re-input prompting unit that produces output prompting re-input of the voice is further provided;
The machine translation apparatus according to claim 2, wherein the voice receiving unit receives voice again.

The machine translation apparatus according to claim 1, wherein the generalization information acquisition unit acquires generalization information only for noun or adjective phrases.

Computer
A sentence reception unit for receiving sentences;
A phrase dividing unit that divides the sentence into two or more phrases;
A generalization information acquisition unit that acquires generalization information corresponding to one or more words divided by the word division unit from one or more word generalization information stored in a storage medium;
A sentence using generalization information acquired by the generalization information acquisition unit and one or more words out of two or more words divided by the word division unit, and two or more sets of translated sentences stored in a storage medium A similarity calculation unit for calculating the similarity with each source language sentence included in the set;
A similar sentence output unit that outputs one or more source language sentences similar to the sentence received by the sentence receiving unit using the similarity calculated by the similarity calculating unit;
Among the one or more source language sentences output by the similar sentence output unit, a permission instruction receiving unit that receives an input of a permission instruction that is an instruction indicating permission for one source language sentence;
A target language sentence acquisition unit that acquires a target language sentence corresponding to a source language sentence corresponding to the permission instruction received by the permission instruction reception unit;
A target language phrase acquisition unit that acquires a target language phrase that is paired with a source language phrase corresponding to the generalization information acquired by the generalization information acquisition unit, from one or more phrase sets stored in a storage medium;
A target language sentence composition unit that constructs the target language sentence to be output by replacing the phrase corresponding to the generalization information in the target language sentence acquired by the target language sentence acquisition unit with the target language phrase acquired by the target language phrase acquisition unit;
A program for causing the computer to function as the above units and as an output unit that outputs the target language sentence constructed by the target language sentence composition unit.
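
The flow of units recited in this claim can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the tables `GENERALIZATION`, `PHRASE_PAIRS`, and `TRANSLATION_SETS`, the English-to-German example data, the whitespace-based phrase division, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the sketch.

```python
from difflib import SequenceMatcher

# Hypothetical stored data (not from the patent).
GENERALIZATION = {"Vienna": "<place>", "Geneva": "<place>"}  # source phrase -> class label
PHRASE_PAIRS = {"Vienna": "Wien", "Geneva": "Genf"}          # source phrase -> target phrase
TRANSLATION_SETS = [                                         # (source sentence, target sentence)
    ("how do I get to <place>", "Wie komme ich nach <place>"),
    ("where is the station", "Wo ist der Bahnhof"),
    ("how much is a ticket to <place>", "Was kostet eine Fahrkarte nach <place>"),
]


def divide(sentence):
    """Phrase dividing unit (assumed here: split on whitespace)."""
    return sentence.split()


def generalize(phrases):
    """Generalization information acquisition unit.

    Returns the phrase list with known phrases replaced by their class
    label, plus a record of which source phrase each label stands for.
    For brevity the sketch assumes at most one phrase per label.
    """
    generalized, bound = [], {}
    for phrase in phrases:
        label = GENERALIZATION.get(phrase)
        if label is None:
            generalized.append(phrase)
        else:
            generalized.append(label)
            bound[label] = phrase
    return generalized, bound


def similarity(tokens, source_sentence):
    """Similarity calculation unit (assumed measure: token-level ratio)."""
    return SequenceMatcher(None, tokens, source_sentence.split()).ratio()


def similar_sentences(generalized, top_n=2):
    """Similar sentence output unit: the best-matching translation sets."""
    ranked = sorted(TRANSLATION_SETS,
                    key=lambda pair: similarity(generalized, pair[0]),
                    reverse=True)
    return ranked[:top_n]


def compose(target_sentence, bound):
    """Target language sentence composition unit: fill in target phrases."""
    for label, source_phrase in bound.items():
        target_sentence = target_sentence.replace(label, PHRASE_PAIRS[source_phrase])
    return target_sentence


def translate(sentence, choose=lambda candidates: 0):
    """Run the units in order; `choose` stands in for the permission instruction."""
    generalized, bound = generalize(divide(sentence))
    candidates = similar_sentences(generalized)
    _source, target = candidates[choose(candidates)]
    return compose(target, bound)
```

In this sketch, `choose` returns the index of the candidate the user approves; the default simply accepts the most similar sentence.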

Computer
A sentence reception unit for receiving sentences;
A phrase dividing unit that divides the sentence into two or more phrases and acquires generalization information corresponding to the phrases;
A generalization information acquisition unit that determines and acquires, from among the generalization information corresponding to the two or more phrases divided by the phrase dividing unit, the generalization information that is to replace a phrase;
A similarity calculation unit that calculates the similarity between a sentence formed from the generalization information acquired by the generalization information acquisition unit together with one or more of the two or more phrases divided by the phrase dividing unit, and each source language sentence included in two or more translation sentence sets stored in a storage medium;
A similar sentence output unit that outputs one or more source language sentences similar to the sentence received by the sentence receiving unit using the similarity calculated by the similarity calculating unit;
A permission instruction receiving unit that receives input of a permission instruction, which is an instruction indicating permission for one source language sentence among the one or more source language sentences output by the similar sentence output unit;
A target language sentence acquisition unit that acquires a target language sentence corresponding to a source language sentence corresponding to the permission instruction received by the permission instruction reception unit;
A target language phrase acquisition unit that acquires a target language phrase that is paired with a source language phrase corresponding to the generalization information acquired by the generalization information acquisition unit, from one or more phrase sets stored in a storage medium;
A target language sentence composition unit that constructs the target language sentence to be output by replacing the phrase corresponding to the generalization information in the target language sentence acquired by the target language sentence acquisition unit with the target language phrase acquired by the target language phrase acquisition unit;
A program for causing the computer to function as the above units and as an output unit that outputs the target language sentence constructed by the target language sentence composition unit.
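
In this claim the phrase dividing unit attaches generalization information to the phrases, and the generalization information acquisition unit then decides which of that information actually replaces a phrase. The claim does not state the decision criterion. The sketch below assumes one possible criterion purely for illustration: a class label is used only if it occurs in at least one stored source language sentence. The tables `CLASS_OF` and `STORED_SOURCES` and both function names are hypothetical.

```python
# Hypothetical lookup tables (not from the patent).
CLASS_OF = {"Vienna": "<place>", "ticket": "<item>", "Monday": "<day>"}
STORED_SOURCES = ["how much is a ticket to <place>", "is it open on <day>"]


def divide_with_classes(sentence):
    """Split into phrases and attach each phrase's class label (or None)."""
    return [(phrase, CLASS_OF.get(phrase)) for phrase in sentence.split()]


def choose_replacements(tagged):
    """Decide which class labels replace their phrase.

    Assumed criterion: keep a label only when some stored source
    sentence contains it; otherwise keep the original phrase.
    """
    usable = {token
              for source in STORED_SOURCES
              for token in source.split()
              if token.startswith("<")}
    return [label if label in usable else phrase for phrase, label in tagged]
```

Under this assumed criterion, "ticket" keeps its surface form because no stored sentence contains `<item>`, while "Vienna" is replaced by `<place>`.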

Computer
A voice reception unit for receiving voice;
A sentence composition unit that performs speech recognition processing on the voice received by the voice reception unit and composes a sentence;
The program according to claim 8 or 9, which causes the computer to further function as the above units, wherein the sentence reception unit acquires the sentence composed by the sentence composition unit.