JP2003308319A

JP2003308319A - Device for selecting translation, translator, program for selecting translation, and translation program

Info

Publication number: JP2003308319A
Application number: JP2002113422A
Authority: JP
Inventors: Seiki Uchimoto; 清貴内元; Satoshi Sekine; 聡関根; Maki Murata; 真樹村田; Hitoshi Isahara; 均井佐原
Original assignee: Communications Research Laboratory
Current assignee: Communications Research Laboratory
Priority date: 2002-04-16
Filing date: 2002-04-16
Publication date: 2003-10-31
Anticipated expiration: 2022-04-16
Also published as: JP3752535B2

Abstract

PROBLEM TO BE SOLVED: To select translations precisely, and to carry out machine translation precisely, for idiomatic expressions for which selecting a proper translation, or translating at all, was difficult in the prior art, without collecting a large volume of parallel translation example data.

SOLUTION: Two methods are used, independently or in combination. The first outputs the translation corresponding to a translation target word in an input text based on the character-string similarity between the input text and the parallel translation example data. The second generates learning models from the parallel translation example data, applies to the input sentence the learning model with the highest accuracy on the learning data, and outputs the translation candidate with the maximum likelihood.

COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a translation word selection device and a translation device used when translating text input in one language into another language, and to programs for these devices.

[0002]

2. Description of the Related Art

Machine translation sometimes uses a database of parallel translation data. Each entry pairs an original text written in one language (a sentence, phrase, clause, word, and so on) with a translated text obtained by translating that original into another language. Recently, databases of examples covering not only words but also the sentences and phrases that contain them (hereinafter referred to as a "parallel corpus") have come into use. A wide variety of parallel corpora built from language resources such as newspapers and dictionaries are now published on the Internet and available for use.

[0003] Translation word selection is considered one of the key technical elements of machine translation. When a parallel example corpus is used, the number and variety of examples simply grow with the amount of parallel data. One approach is therefore to collect as many kinds of parallel corpora or parallel data as possible, rather than relying on a single parallel corpus, and to perform machine translation with them. The simplest such method is example-based translation word selection. It consults the collected parallel corpora using the source-language input text. It then outputs, as the translation result, the target text paired with the source text that contains an example matching, or most similar to, the input text. Another approach is learning-based translation word selection. It applies learning data created from a parallel corpus to a learning model and outputs the most statistically probable translation word.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION

Because the example-based method consults a wide variety of parallel corpora, one might at first expect translation accuracy to improve correspondingly. However, some source-language words are polysemous and have many possible translations. For such a word, this method cannot output the correct translation word unless the parallel corpus contains an example identical or similar to the input text, so the method lacks flexibility. The learning-based method, on the other hand, preferentially outputs translation words used in statistically frequent examples. This improves accuracy for the common, frequently occurring translations of a word but lowers accuracy for infrequent ones.

[0005] Such problems often arise when the input text contains an idiomatic expression, in which a word combines with other words to form a distinctive expression. For example, consider source text containing the polysemous Japanese verb 「買う」 ("buy"). Its meaning differs between 「本を買う」 ("buy a book") and 「反感を買う」 ("incur antipathy"), and the English translation corresponding to 「買う」 differs accordingly. A Japanese-English parallel corpus probably contains many examples in which 「買う」 is rendered as "buy", as when buying things, so that translation occurs frequently. In an idiom such as 「反感を買う」, by contrast, the English rendering (involving "antipathy") is unusual, so few examples of 「買う」 with that rendering are expected.

[0006] Moreover, with either method, highly accurate translation requires collecting a large amount of parallel corpus data. Natural language admits a great deal of variation, however. Simply collecting more parallel corpora therefore only increases the computational load, and in practice accurate machine translation cannot be achieved in a short time this way.

[0007] In view of these problems, the main object of the present invention is to perform translation word selection and translation in machine translation accurately, appropriately, and quickly, without imposing an excessive load on the device.

[0008]

SUMMARY OF THE INVENTION

The present invention selects a translation word, written in a second language, for a translation target word. A translation target word is a word to be translated that appears in input text entered in a first language.

Selection relies on a parallel translation example data storage unit, which stores parallel translation example data. Each item pairs two parts:
(a) source-language example data, comprising a source-language example made of first-language text, a word contained in that example, the word's translation in the second language, and information about that translation;
(b) a target-language example, made of text translated from the source-language example into the second language.

This storage unit corresponds to the so-called parallel corpus described above. Any number of storage units, one or more, may be used; using several increases the number of examples and improves the accuracy of translation word selection. The storage unit may be a component of the translation word selection device or translation device described below. Alternatively, it may be provided in a separate device that can communicate with them.

[0009] FIG. 1 is a schematic configuration diagram of a first translation word selection device A1 according to the present invention. Its basic configuration comprises:
- an input acceptance unit 1 that accepts input text;
- an example extraction unit 2 that extracts, from the parallel translation example data storage unit C, at least one item of source-language example data containing a word corresponding to the translation target word in the accepted input text;
- a similarity detection unit 3 that detects the similarity between the input text and each source-language example, based on the input text and the extracted source-language example data;
- a similarity evaluation unit 4 that compares and evaluates the detected similarities and outputs at least the source-language example data with the highest similarity;
- a translation word output unit 5 that outputs the translation word corresponding to the translation target word in the target-language example, taken from the parallel translation example data paired with the output source-language example data.
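The flow of device A1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patent's implementation. The record type `ParallelExample` and all function names are hypothetical. Extraction here matches the target word exactly. `difflib.SequenceMatcher.ratio()` is a stand-in similarity measure, since this paragraph does not fix one.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class ParallelExample:
    # Hypothetical record layout for one item of parallel translation example data.
    source_example: str    # source-language (Japanese) example text
    source_word: str       # word in the example that may be a translation target
    translation_word: str  # that word's rendering in the target-language example
    target_example: str    # target-language (English) example text


def extract_examples(store: list[ParallelExample], target_word: str) -> list[ParallelExample]:
    # Example extraction unit 2: keep examples built around the target word.
    return [ex for ex in store if ex.source_word == target_word]


def similarity(input_text: str, source_example: str) -> float:
    # Similarity detection unit 3 (stand-in): difflib's character match ratio.
    return SequenceMatcher(None, input_text, source_example).ratio()


def select_translation(store: list[ParallelExample], input_text: str,
                       target_word: str) -> Optional[str]:
    candidates = extract_examples(store, target_word)
    if not candidates:
        return None  # no example contains the target word
    # Similarity evaluation unit 4: keep the most similar source-language example.
    best = max(candidates, key=lambda ex: similarity(input_text, ex.source_example))
    # Translation word output unit 5: report the paired translation word.
    return best.translation_word
```

Suppose the store holds the examples 「本を買う」 (translated "buy") and 「反感を買う」 (translated "incur", illustrative data). The input 「上司の反感を買った」 then shares the longer substring 「反感を買」 with the second example, so the sketch picks "incur".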

[0010] With this configuration, the device outputs, for each translation target word in the input text, the translation word taken from the source-language example most similar to the input text. Appropriate translation words can therefore be selected even for infrequent expressions such as source-language idioms. This works without an excessive amount of parallel translation example data and without a heavy computational load.

[0011] A preferred way for the similarity detection unit 3 to detect similarity is to compare the input text with the source-language example in the extracted data character by character. From the resulting differences, it computes a similarity score using at least one of two quantities:
- the proportion of the character string that matches between the input text and the source-language example;
- the number of divisions, i.e., the number of separate segments into which the matching portion is split.

[0012] To simplify later processing of the extracted examples, the example extraction unit 2 may apply sentence-ending processing to the source-language examples in the extracted data and output processed source-language examples. In that case, the similarity detection unit 3 preferably compares the input text with each processed example character by character. From the differences, it computes as the similarity at least one of two quantities:
- the ratio of the matched character string to the character string of the processed source-language example;
- the number of divisions into which the matched portion is split.
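The character-level quantities of [0011] and [0012] can be computed with Python's `difflib`. This is a minimal sketch, not the patent's method. The patent does not define sentence-ending processing here, so `process_sentence_end` is a hypothetical placeholder that only strips sentence-final punctuation. The ratio's denominator is the processed example's length, following [0012]. How the two quantities combine into a single score is left open.

```python
from difflib import SequenceMatcher

# Hypothetical sentence-ending processing: strip trailing punctuation only.
SENTENCE_FINAL = "。．.!！?？"


def process_sentence_end(example: str) -> str:
    return example.rstrip(SENTENCE_FINAL)


def match_stats(input_text: str, example: str) -> tuple[float, int, int]:
    """Compare input text with a processed example character by character.

    Returns (match_ratio, matched_chars, divisions): match_ratio is the number
    of matched characters over the length of the processed example, and
    divisions is the number of separate segments the match is split into.
    """
    processed = process_sentence_end(example)
    matcher = SequenceMatcher(None, input_text, processed, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel; drop it.
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    matched = sum(b.size for b in blocks)
    ratio = matched / len(processed) if processed else 0.0
    return ratio, matched, len(blocks)
```

Take `match_stats("本を買った", "本をすぐ買った。")` as an example. The input matches in two pieces, 「本を」 and 「買った」, so it reports 5 matched characters in 2 divisions.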

[0013] Several items of source-language example data may share the maximum similarity, as computed by the similarity detection unit 3 and evaluated by the similarity evaluation unit 4. In that case, the translation word output unit 5 selects the parallel translation example data whose source-language example has the longest character string matching the input text, or the largest number of divisions. Outputting the translation word for the target word from that data yields the translation presumed most suitable.
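The tie-breaking rule can be sketched as below. The `Scored` tuple is a hypothetical container for results already computed by units 3 and 4. The paragraph allows either the matched length or the number of divisions as the tie-breaker; comparing matched length first and divisions second is an assumption of this sketch.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    # Hypothetical per-example result from the similarity detection/evaluation units.
    similarity: float
    matched_chars: int
    divisions: int
    translation_word: str


def pick_translation(scored: list[Scored]) -> str:
    top = max(s.similarity for s in scored)
    tied = [s for s in scored if s.similarity == top]
    # Among ties, prefer more matched characters, then more divisions
    # (the order of these two keys is an assumption).
    return max(tied, key=lambda s: (s.matched_chars, s.divisions)).translation_word
```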

[0014] To simplify processing once the input text is accepted, the input acceptance unit 1 preferably extracts translation target words automatically by morphological analysis of the input text. "Morphological analysis" means analysis that, for example, divides the input text into words and assigns a part of speech to each. It uses a predetermined analysis algorithm and analysis dictionary data.
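Real morphological analysers are far more elaborate. As a toy illustration of the idea only, the sketch swaps in greedy longest-match segmentation against a small lexicon. `LEXICON`, its part-of-speech labels, and the choice of verbs as translation targets are all hypothetical assumptions.

```python
# Hypothetical toy lexicon: surface form -> part of speech.
LEXICON = {
    "本": "noun", "反感": "noun", "上司": "noun",
    "の": "particle", "を": "particle",
    "買う": "verb", "買った": "verb",
}


def segment(text: str, lexicon: dict[str, str]) -> list[tuple[str, str]]:
    """Greedy longest-match segmentation with part-of-speech lookup."""
    tokens: list[tuple[str, str]] = []
    max_len = max(map(len, lexicon))
    i = 0
    while i < len(text):
        # Try the longest dictionary entry starting at position i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in lexicon:
                tokens.append((piece, lexicon[piece]))
                i += length
                break
        else:
            # No entry matches: emit one unknown character and move on.
            tokens.append((text[i], "unknown"))
            i += 1
    return tokens


def extract_targets(text: str, lexicon: dict[str, str],
                    target_pos: frozenset = frozenset({"verb"})) -> list[str]:
    # Treat words with the chosen parts of speech as translation target words.
    return [word for word, pos in segment(text, lexicon) if pos in target_pos]
```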

[0015] The parallel translation example data may also include source-language headwords generated from the words in the source-language examples. In that case, the example extraction unit 2 extracts from storage unit C the source-language example data that contains at least the source-language headword corresponding to the translation target word. This speeds up extraction of source-language example data from storage unit C.

[0016] The parallel translation example data may further have both source-language headwords, generated from words in the source-language examples, and target-language headwords, generated from the corresponding translation words. In that case, the example extraction unit 2 extracts at least the source-language example data containing the source-language headword that corresponds to the translation target word. The translation word output unit 5 then outputs the target-language headword that belongs to the source-language example data output by the similarity evaluation unit 4 and corresponds to the source-language headword extracted by unit 2. This further speeds up processing up to the output of the translation word.
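The speed-ups in [0015] and [0016] come from two changes. Extraction becomes a lookup keyed on headwords instead of a scan. The answer is then read directly from the stored target-language headword. The sketch below shows both, assuming a hypothetical `HeadwordExample` record and again using `SequenceMatcher.ratio()` as a stand-in similarity.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class HeadwordExample:
    # Hypothetical record carrying both headwords.
    source_example: str
    source_headword: str   # source-language headword of the example
    target_headword: str   # corresponding target-language headword


def build_index(examples: list[HeadwordExample]) -> dict[str, list[HeadwordExample]]:
    # Built once, so extraction becomes a dictionary lookup instead of a full scan.
    index: dict[str, list[HeadwordExample]] = defaultdict(list)
    for ex in examples:
        index[ex.source_headword].append(ex)
    return index


def translate_by_headword(index: dict[str, list[HeadwordExample]],
                          input_text: str, target_word: str) -> Optional[str]:
    candidates = index.get(target_word, [])
    if not candidates:
        return None
    best = max(candidates,
               key=lambda ex: SequenceMatcher(None, input_text, ex.source_example).ratio())
    # Output the stored target-language headword directly.
    return best.target_headword
```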

[0017] FIG. 2 is a schematic configuration diagram of a second translation word selection device A2 according to the present invention. Its basic configuration comprises:
- an input acceptance unit 11 that accepts input text;
- a learning model generation unit 12 that generates a learning model for the translation target word in the accepted input text, using learning data created from the words in the source-language examples stored in the parallel translation example data storage unit and the parallel translation example data corresponding to those examples;
- a learning model application unit 13 that applies the generated learning model to the translation target word in the input text, computes a confidence score for every translation candidate of that word, and outputs the candidates ordered by confidence;
- a translation word output unit 14 that selects the candidate with the highest confidence and outputs it as the translation word for the translation target word.

Here, "learning data" means information created from the parallel translation examples. It includes a word input in the first language, the correct translation word to be output for it in the second language, and the attributes and features accompanying them. A "learning model" is a functional model, generated by a machine learning method, that contains parameters estimated from the learning data. The candidates may be ordered by confidence in either descending or ascending order.
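The patent does not name a machine learning method here. To show what "a confidence for every candidate, ordered by confidence" might look like, the sketch swaps in a multinomial naive Bayes classifier. It uses add-one smoothing, and its features are character bigrams of the source sentence. The feature set and class names are hypothetical choices, not the patent's.

```python
import math
from collections import Counter, defaultdict


def features(sentence: str) -> list[str]:
    # Hypothetical feature set: character bigrams of the source sentence.
    return [sentence[i:i + 2] for i in range(len(sentence) - 1)]


class NaiveBayesTranslationModel:
    """Per-target-word model mapping context features to a translation candidate."""

    def __init__(self, learning_data: list[tuple[str, str]]):
        # learning_data: (source sentence, correct translation word) pairs
        self.prior = Counter(label for _, label in learning_data)
        self.feature_counts: dict[str, Counter] = defaultdict(Counter)
        self.vocab: set[str] = set()
        for sentence, label in learning_data:
            feats = features(sentence)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.n = len(learning_data)

    def confidences(self, sentence: str) -> list[tuple[str, float]]:
        """Posterior probability for every candidate, sorted in descending order."""
        log_scores = {}
        for label, count in self.prior.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(count / self.n)
            for f in features(sentence):
                score += math.log((counts[f] + 1) / denom)
            log_scores[label] = score
        # Normalise the log scores into probabilities (log-sum-exp trick).
        top = max(log_scores.values())
        z = sum(math.exp(s - top) for s in log_scores.values())
        probs = {label: math.exp(s - top) / z for label, s in log_scores.items()}
        return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


def select_by_model(model: NaiveBayesTranslationModel, sentence: str) -> str:
    # Translation word output unit 14: the candidate with the highest confidence.
    return model.confidences(sentence)[0][0]
```

Naive Bayes is used only because it yields normalized confidences in a few lines. Any classifier that scores every candidate would fill the same role.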

[0018] With this configuration, once a certain amount of learning data has been created or collected, an appropriate learning model generated from it can be applied to the text to be translated. The device can then output the translation candidate with the highest confidence, i.e., the translation word inferred to be most appropriate. Because device A2 generates a learning model for each word (or phrase) to be translated, it can select translation words with a model suited to each word.

[0019] In particular, the learning model generation unit 12 may work per target word. For each translation target word in the input text accepted by the input acceptance unit 11, it extracts from storage unit C the parallel translation example data whose source-language examples contain that word. It then generates a learning model from the extracted data. This lets translation words be output quickly and accurately.

[0020] To increase the accuracy of the output translation words, the learning model generation unit 12 may generate a learning model for each set of learning data. For each translation target word in the input text accepted by the input acceptance unit 11, it then selects the learning model with the highest accuracy on the learning data. The learning model application unit 13 applies that highest-accuracy model to the translation target word in the input text. One or more sets of learning data may be used.
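Per-word model selection by accuracy can be sketched as follows. The two candidate trainers, a majority-class model and a nearest-example model, are hypothetical. Accuracy "on the learning data" is measured by leave-one-out. That protocol is an assumption: a nearest-example model would trivially score 100% on its own training items.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Callable

Example = tuple[str, str]              # (source sentence, correct translation word)
Model = Callable[[str], str]           # sentence -> translation word
Trainer = Callable[[list[Example]], Model]


def train_majority(data: list[Example]) -> Model:
    # Always answers with the most frequent translation in the learning data.
    label = Counter(lbl for _, lbl in data).most_common(1)[0][0]
    return lambda sentence: label


def train_nearest(data: list[Example]) -> Model:
    # Answers with the translation of the most similar learning example.
    def predict(sentence: str) -> str:
        return max(data, key=lambda ex: SequenceMatcher(None, sentence, ex[0]).ratio())[1]
    return predict


def loo_accuracy(trainer: Trainer, data: list[Example]) -> float:
    # Leave-one-out accuracy on the learning data (assumed evaluation protocol).
    hits = 0
    for i, (sentence, gold) in enumerate(data):
        model = trainer(data[:i] + data[i + 1:])
        hits += model(sentence) == gold
    return hits / len(data)


def select_model(trainers: dict[str, Trainer], data: list[Example]) -> tuple[str, Model]:
    # Keep the trainer with the best accuracy, then retrain it on all of the data.
    name = max(trainers, key=lambda n: loo_accuracy(trainers[n], data))
    return name, trainers[name](data)
```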

[0021] In device A2 as well, the input acceptance unit 11 can automatically extract translation target words from the input text by morphological analysis. This simplifies processing once the input text is accepted. Likewise, the parallel translation example data may include source-language headwords generated from words in the source-language examples. In that case, the learning model generation unit 12 extracts from storage unit C the source-language example data containing at least the source-language headword corresponding to the translation target word, which speeds up extraction.

[0022] The translation word selection device of the present invention can also combine the two devices A1 and A2 described above, which dramatically improves the accuracy of the output translation words. FIG. 3 is a schematic configuration diagram of a third translation word selection device A3. Its basic configuration comprises:
- an input acceptance unit 31 that accepts input text;
- an example extraction unit 32 that extracts from storage unit C at least one item of source-language example data containing a word corresponding to the translation target word in the input text accepted by the input acceptance unit 31;
- a similarity detection unit 33 that detects the similarity between the input text and the source-language examples, based on the input text and the source-language example data extracted by the example extraction unit;
- a similarity evaluation unit 34 that compares and evaluates the similarities detected by the similarity detection unit 33 and outputs at least the source-language example data with the highest similarity;
- a learning model generation unit 35 that generates a learning model for the translation target word in the input text accepted by the input acceptance unit 31, using learning data created from the words in the source-language examples stored in storage unit C and the corresponding parallel translation example data;
- a learning model application unit 36 that applies the model generated by unit 35 to the translation target word in the input text, computes a confidence score for every translation candidate of that word, and outputs the candidates ordered by confidence;
- a translation word output unit 37 that chooses between two results and outputs the better one as the translation word for the translation target word: the translation word for the target word in the target-language example of the parallel translation example data matching the source-language example data output by unit 34, or the highest-confidence candidate output by unit 36.

[0023] In other words, both paths start from the input text accepted by the input acceptance unit 31 and the parallel translation example data in storage unit C, and the translation word output unit 37 outputs one of two results:
- the translation word produced by the example extraction unit 32, similarity detection unit 33, and similarity evaluation unit 34 (the portion corresponding to the first device A1);
- a translation candidate produced by the learning model generation unit 35 and learning model application unit 36 (the portion corresponding to the second device A2).

The A1 portion and the A2 portion may use the same storage unit C or different ones.

[0024] In this case, it is preferable to adopt either of the following two modes.

[0025] In the first mode, the A1 portion and the A2 portion operate in parallel. If the similarity evaluation unit 34 outputs parallel translation example data with a similarity at or above a predetermined threshold, the translation word output unit 37 outputs the resulting translation word for the translation target word. If no such data is output, unit 37 instead outputs the translation word obtained from the learning model application unit 36. Parallel processing thus allows translation words to be output quickly.
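The parallel mode can be sketched with a thread pool. The callables are hypothetical stand-ins for the two portions of device A3. `example_based` is assumed to return `(best similarity, translation word)`, or `None` if nothing was extracted. `learning_based` is assumed to return its highest-confidence translation word.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical interfaces for the two halves of device A3.
ExampleBased = Callable[[str], Optional[tuple[float, str]]]
LearningBased = Callable[[str], str]


def select_parallel(example_based: ExampleBased, learning_based: LearningBased,
                    input_text: str, threshold: float) -> str:
    # Run both portions at once, then prefer the example-based answer if it is good enough.
    with ThreadPoolExecutor(max_workers=2) as pool:
        example_future = pool.submit(example_based, input_text)
        learning_future = pool.submit(learning_based, input_text)
        hit = example_future.result()
        if hit is not None and hit[0] >= threshold:
            return hit[1]  # similarity reached the threshold
        return learning_future.result()  # fall back to the learning model
```

Leaving the `with` block still waits for the learning-based task to finish. A real system that wanted an early return would need a longer-lived executor.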

[0026] In the second mode, the A1 portion operates first. Only if the similarity evaluation unit 34 outputs no parallel translation example data at or above the predetermined threshold are the learning model generation unit 35 and the learning model application unit 36 (the A2 portion) operated, followed by the translation word output unit 37. When unit 34 does find data at or above the threshold, the A2 portion need not run, which reduces the computational load. Moreover, when the A2 portion does run, parallel translation example data can be additionally collected or selected as needed, for example by using a different storage unit C.
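The sequential mode differs from the parallel one only in that the learning-based portion is invoked lazily. The sketch uses the same hypothetical callable interfaces as a parallel version would. The test checks that the fallback is never called when the example-based result clears the threshold.

```python
from typing import Callable, Optional


def select_sequential(example_based: Callable[[str], Optional[tuple[float, str]]],
                      learning_based: Callable[[str], str],
                      input_text: str, threshold: float) -> str:
    hit = example_based(input_text)
    if hit is not None and hit[0] >= threshold:
        return hit[1]  # the learning-based portion never runs
    return learning_based(input_text)  # run only when the examples fall short
```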

[0027] In either mode, the storage unit used by the example extraction unit 32 and the storage unit C used by the learning model generation unit 35 may be different units created from different language resources. This increases the number and variety of parallel translation examples and improves the accuracy of the translation words finally output.

[0028] Any of the translation word selection devices A1, A2, and A3 can also serve as the basis of a suitable translation device. In addition to the components of A1, A2, or A3, such a translation device has a translated-text output unit. This unit generates and outputs target text corresponding to the input text, based on the translation word output by that device's translation word output unit and the parallel translation example data containing that word. The device can thus not only select translation words for translation target words in the input text but also generate and output second-language target text from first-language input text.

[0029]

BEST MODE FOR CARRYING OUT THE INVENTION

An embodiment of the present invention is described below with reference to FIGS. 4 to 8.

[0030] This embodiment, whose schematic configuration is shown in FIG. 4, is a translation word selection device A3 having the third basic configuration described above. That is, it consists of a part corresponding to translation word selection device A1 having the first basic configuration, a part corresponding to translation word selection device A2 having the second basic configuration, and a part common to both. The parallel translation example data storage unit C is included in device A3 here, but data may also be collected as needed from a parallel translation example data storage unit C provided in another device connected via a communication line. In this embodiment, Japanese is used as the first language (source language) and English as the second language (target language).

[0031] First, the parallel translation example data storage unit C is described. Storage unit C is a database of Japanese-English bilingual example data, each entry of which pairs Japanese example data with English example data. The Japanese example data contain an example consisting of Japanese text (hereinafter, a "Japanese example"), a word contained in that example, the English translation of that word (hereinafter, an "English translated word"), and various information about the English translated word. The English example data contain an English example, that is, the text of the Japanese example translated into English. Each entry also includes, for each Japanese example, a Japanese headword that can serve as a translation target word and, in some cases, an English headword that can be a correct translation of that Japanese headword. Usable sources for such data include databases built from newspaper and magazine articles with attention to frequency of occurrence, Japanese-English bilingual electronic dictionary databases, and other databases available online.
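As a rough illustration of the record layout described above, the Python sketch below models one bilingual example entry and a lookup by Japanese headword. The class, field, and function names (`BilingualExample`, `examples_for`, `STORE`) are hypothetical and not taken from the patent. The sample entries pair headwords with examples by assumption, and the English example sentences are left empty rather than invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BilingualExample:
    """One Japanese-English bilingual example entry (hypothetical field names)."""
    japanese_example: str                     # Japanese example text
    japanese_headword: str                    # word that can be a translation target
    english_example: str = ""                 # English translation of the whole example
    english_headword: Optional[str] = None    # correct translation of the headword, if set
    info: dict = field(default_factory=dict)  # "various information" on the translation


def examples_for(store: list[BilingualExample], headword: str) -> list[BilingualExample]:
    """Return every entry whose Japanese headword equals the given word."""
    return [e for e in store if e.japanese_headword == headword]


# Hypothetical store; English example sentences are deliberately left empty
STORE = [
    BilingualExample("母に遠慮する", "遠慮", english_headword="feel constrained"),
    BilingualExample("献金を遠慮してもらう", "遠慮", english_headword="refrain"),
    BilingualExample("一役買う", "買う", english_headword="offer"),
]
```

Keeping the English headword optional mirrors the text: it is present only "in some cases" and may have to be set manually or automatically.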

[0032] FIG. 5 shows part of an example of Japanese-English bilingual example data. In this example, three Japanese examples containing the Japanese word 「遠慮」 (enryo, "reserve, restraint") are each paired with a corresponding English example. Here 「遠慮」 is the Japanese headword, and "feel constrained", "constraint", "refrain", and so on are the English headwords. Where English headwords corresponding to the Japanese headwords, or both the Japanese and the English headwords, have already been set, those settings can be used; otherwise they must be set manually or automatically by computer processing.

[0033] Next, the functions of translation word selection device A3 are described. Device A3 runs on a general-purpose or dedicated computer. When the internal and external devices of an ordinary computer, such as the CPU and memory, operate according to a predetermined program stored in a storage device such as an HDD, the computer functions as an example extraction unit 32, a similarity detection unit 33, and a similarity evaluation unit 34, which together act as the first translation word selection device A1; as a learning model generation unit 35 and a learning model application unit 36, which together act as the second translation word selection device A2; and as an input reception unit 31 and a translated word output unit 37, which are shared by both.

[0034] The input reception unit 31 accepts text data written in Japanese (the input text). It includes an input text processing unit 311, which performs morphological analysis on the input text and automatically extracts the translation target word from it. The translation target word may instead be specified when the input text is entered; in that case the input text processing unit 311 performs only the morphological analysis.
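The patent does not name a morphological analyzer. Purely as a stand-in, the sketch below segments the input by greedy longest match against a small hypothetical lexicon and keeps the tokens that are known Japanese headwords. The names `segment`, `extract_targets`, `LEXICON`, and `HEADWORDS` are illustrative assumptions; a real system would use a proper Japanese morphological analyzer, which also handles inflected forms.

```python
def segment(text: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match segmentation; unknown characters become 1-char tokens."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                tokens.append(piece)
                i += n
                break
    return tokens


def extract_targets(text: str, lexicon: set[str], headwords: set[str]) -> list[str]:
    """Return the tokens of the input that are known Japanese headwords."""
    return [t for t in segment(text, lexicon) if t in headwords]


# Hypothetical lexicon and headword list
LEXICON = {"選挙", "で", "一役", "買う", "母", "に", "遠慮", "する"}
HEADWORDS = {"買う", "遠慮"}
```

For example, `extract_targets("選挙で一役買う", LEXICON, HEADWORDS)` yields `["買う"]`.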

[0035] The example extraction unit 32 extracts, from the parallel translation example data storage unit C, the Japanese example data containing the translation target word obtained by the input reception unit 31. If storage unit C contains Japanese headwords, the unit refers to them to search for the matching translation target word and then performs the extraction. The example extraction unit 32 includes a Japanese example processing unit 321, which serves as the source language example processing unit and applies sentence-end processing to the Japanese example data extracted from storage unit C. For example, applying sentence-end processing to the Japanese example data of the bilingual example data shown in FIG. 5 turns 「母に遠慮する」 ("to hold back out of deference to one's mother"), 「母への遠慮」 ("deference toward one's mother"), and 「献金を遠慮してもらう」 ("to have someone refrain from donating") into 「母に遠慮」, 「母への遠慮」, and 「献金を遠慮」, respectively.

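One rule that reproduces all three sentence-end results above is to cut each example right after the last occurrence of the headword. The patent does not state the rule, so the function below (`trim_sentence_end`, a hypothetical name) is an assumption. Because it matches the headword's surface form, it would not handle inflected verbs such as 買って without lemmatization.

```python
def trim_sentence_end(example: str, headword: str) -> str:
    """Drop the text that follows the last occurrence of the headword (assumed rule)."""
    pos = example.rfind(headword)
    if pos < 0:
        return example  # headword not found as-is: leave the example untouched
    return example[:pos + len(headword)]


# The three FIG. 5 examples from the text
for ex in ["母に遠慮する", "母への遠慮", "献金を遠慮してもらう"]:
    print(ex, "->", trim_sentence_end(ex, "遠慮"))
```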
[0036] The similarity detection unit 33 compares the input text accepted by the input reception unit 31 with the Japanese example data extracted by the example extraction unit 32 and detects their similarity. Specifically, the similarity is the matching ratio between the input text and the Japanese example data, calculated by the similarity calculation unit 331 included in the similarity detection unit 33. The input text and the Japanese example data are compared character by character using dynamic programming to find their differences, and the similarity is the proportion of matched character strings. More specifically, the similarity is computed by the following formula, using, for example, the UNIX diff command:

[0037]

[Formula 1]

[0038] The Japanese example data used here are those to which the Japanese example processing unit 321 has applied sentence-end processing.
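Formula 1 itself is not reproduced in this text, so the sketch below rests on an assumption: r is taken to be the number of matched characters divided by the length of the sentence-end-processed example, so that r = 1 means every character of the example is matched, in order, in the input. Matching is a plain O(mn) longest-common-subsequence dynamic program rather than the algorithm inside UNIX diff. The sketch also counts how many contiguous segments the match splits into, the alternative measure mentioned in claim 2. Function names are hypothetical.

```python
def lcs_pairs(a: str, b: str) -> list[tuple[int, int]]:
    """Character-level longest common subsequence by dynamic programming.

    Returns the aligned index pairs (i in a, j in b) of one optimal alignment.
    """
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal alignment
    pairs, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def similarity(input_text: str, example: str) -> tuple[float, int, int]:
    """Return (r, matched characters, matched segments) under the assumed definition."""
    pairs = lcs_pairs(input_text, example)
    # A new segment starts whenever a match is not adjacent to the previous one
    segments = sum(
        1
        for k, (i, j) in enumerate(pairs)
        if k == 0 or (i - 1, j - 1) != pairs[k - 1]
    )
    r = len(pairs) / len(example) if example else 0.0
    return r, len(pairs), segments
```

Under this reading, `similarity("選挙で一役買うことになった", "一役買う")` gives r = 1.0 with a single segment, while 「母への遠慮」 against 「母に遠慮した」 matches only 3 of 5 characters in 2 segments.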

[0039] The similarity evaluation unit 34 compares the similarities detected by the similarity detection unit 33 for each item of Japanese example data compared with the input text, that is, the similarities given by the formula above. It outputs the Japanese example data with the highest similarity r, or the Japanese-English bilingual example data containing it. If several items of Japanese example data share the maximum similarity r, the one containing the longest Japanese example is output as having the highest similarity. This applies only when the portion matching the input text is longer than the Japanese headword.
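The selection rules of this paragraph can be sketched as below. The sketch assumes that each candidate example arrives already scored as a tuple of (processed example, similarity r, matched character count), and it uses the total matched count as "the portion matching the input text". The function name `select_best` is hypothetical.

```python
from typing import Optional


def select_best(
    scored: list[tuple[str, float, int]], headword: str
) -> Optional[tuple[str, float]]:
    """Pick the example to use: highest r, ties broken by the longer example.

    scored: (processed Japanese example, similarity r, matched character count).
    """
    candidates = [
        (r, len(example), example)
        for example, r, matched in scored
        if matched > len(headword)  # matched part must be longer than the headword
    ]
    if not candidates:
        return None
    # Tuples compare by r, then by example length (any remaining tie: by the string)
    r, _, example = max(candidates)
    return example, r
```

Usage: with headword 「遠慮」, an example whose only match is the headword itself (2 characters) is discarded, even if its r is maximal.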

[0040] The learning model generation unit 35 uses learning data to generate a learning model for each translation target word in the input text accepted by the input reception unit 31. The learning data are created from the words contained in the Japanese examples stored in the parallel translation example data storage unit C and from the English example data corresponding to those Japanese examples. They consist of words input in Japanese, the correct translations that should be output in English for them, and associated information such as attributes and features. In this embodiment, several known machine learning models, for example SVM (Support Vector Machine), ME (Maximum Entropy), and DL (Decision List), are used as learning models. Applying these models to each translation target word yields the probability that each correct translation is generated. Each learning model must be given features; this embodiment uses four types of information obtained from the learning data: morphological information, character n-grams, information about the best-matching Japanese example, and information about the frequency of content words and their translation candidates. The learning model generation unit 35 includes a learning model selection unit 351, which cross-validates each learning model on the learning data and selects the model with the highest accuracy.
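The selection step performed by the learning model selection unit 351 (cross-validate every model, keep the most accurate one) is sketched below. To stay self-contained, the sketch swaps in two toy learners in place of SVM, ME, and DL: a majority-class baseline and a simplified decision list that ranks "feature → translation" rules by a smoothed precision rather than the log-likelihood ratio usually used for decision lists. The feature strings such as `prev=一役` and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable

Example = tuple[frozenset, str]          # (feature set, correct translation)
Model = Callable[[frozenset], str]
Trainer = Callable[[list], Model]


def train_majority(data: list[Example]) -> Model:
    """Baseline learner: always predict the most frequent translation."""
    label = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda feats: label


def train_decision_list(data: list[Example]) -> Model:
    """Toy decision list: 'feature -> translation' rules, most reliable first."""
    counts: dict = defaultdict(Counter)
    for feats, y in data:
        for f in feats:
            counts[f][y] += 1
    rules = []
    for f, c in counts.items():
        label, hits = c.most_common(1)[0]
        # +1 in the denominator ranks rarely seen features below well-attested ones
        rules.append((hits / (sum(c.values()) + 1), f, label))
    rules.sort(reverse=True)
    default = Counter(y for _, y in data).most_common(1)[0][0]

    def predict(feats: frozenset) -> str:
        for _, f, label in rules:
            if f in feats:
                return label
        return default

    return predict


def cross_val_accuracy(trainer: Trainer, data: list[Example], k: int = 5) -> float:
    """k-fold cross-validation accuracy (fold i = every k-th item starting at i)."""
    assert k >= 2 and len(data) >= k
    correct = 0
    for fold in range(k):
        test = data[fold::k]
        train = [x for i, x in enumerate(data) if i % k != fold]
        model = trainer(train)
        correct += sum(model(feats) == y for feats, y in test)
    return correct / len(data)


def select_model(trainers: dict, data: list[Example], k: int = 5):
    """Pick the learner with the best cross-validated accuracy, retrained on all data."""
    scores = {name: cross_val_accuracy(t, data, k) for name, t in trainers.items()}
    best = max(scores, key=scores.get)
    return best, trainers[best](data), scores


# Hypothetical learning data for the target word 買う
data = ([(frozenset({"prev=一役"}), "offer")] * 4
        + [(frozenset({"prev=を", "obj=本"}), "buy")] * 6)
best, model, scores = select_model(
    {"majority": train_majority, "decision_list": train_decision_list}, data)
```

Selecting per translation target word, as the text describes, lets a different learner win for different words.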

[0041] The learning model application unit 36 applies the learning model generated by the learning model generation unit 35, specifically the model selected by the learning model selection unit 351, to the translation target word in the input text. It calculates a confidence for every translation candidate of that word and outputs the candidates ordered by confidence. With B the set of contexts and A the set of classes, the confidence is basically obtained as the score p(a, b) of the probability distribution for the event (a, b), in which context b (b ∈ B) is assigned class a (a ∈ A). For learning models that do not yield such a probability distribution, such as SVM, the probability is set, for convenience, to 1 for the best class and 0 for all other classes.
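The ordering step and the 1/0 convention for models without probabilities can be sketched in a few lines. The candidate names and confidence values in the usage are hypothetical, chosen only to show the ordering.

```python
def rank_candidates(confidence: dict[str, float]) -> list[tuple[str, float]]:
    """Order translation candidates by descending confidence p."""
    return sorted(confidence.items(), key=lambda kv: kv[1], reverse=True)


def one_hot_confidence(predicted: str, candidates: list[str]) -> dict[str, float]:
    """Convention for models without probabilities (e.g. SVM): 1 for the predicted class, 0 otherwise."""
    return {c: 1.0 if c == predicted else 0.0 for c in candidates}


# Hypothetical confidences for the target word 買う
ranked = rank_candidates({"buy": 0.2, "offer": 0.7, "purchase": 0.1})
svm_style = rank_candidates(one_hot_confidence("offer", ["buy", "offer", "purchase"]))
```

With the one-hot convention, only the top candidate is meaningful; the remaining candidates tie at 0 and keep their input order, because Python's sort is stable.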

[0042] The translated word output unit 37 outputs the translated word corresponding to the translation target word in the input text. It selects and outputs a translated word obtained from either the route of translation word selection device A1 or the route of device A2: either the translated word contained in the Japanese-English bilingual example data corresponding to the Japanese example data given the highest similarity by the similarity evaluation unit 34, or the candidate with the highest confidence (score) among the translation candidates output by the learning model application unit 36. Specifically, in this embodiment a threshold is set on the similarity obtained by the similarity calculation unit 331 of the similarity detection unit 33. If the similarity of the Japanese example data output by the similarity evaluation unit 34 is equal to or greater than the threshold, the translated word corresponding to that Japanese example data is output; in this embodiment the threshold is 1. If no Japanese example data reaches the threshold, the candidate with the highest confidence among those output by the learning model application unit 36 is output. When the input reception unit 31 accepts the input text, the A1 and A2 routes may run simultaneously, or the A1 route may run first and the A2 route may run only when no Japanese example data reaches the threshold.
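The sequential variant described at the end of this paragraph (A1 first, A2 only as a fallback) can be sketched as follows. Passing each route as a zero-argument callable makes the saving explicit: the model route is never invoked when the example route clears the threshold. The function name and the route signatures are assumptions for illustration.

```python
from typing import Callable, Optional

THRESHOLD = 1.0  # threshold used in this embodiment


def choose_translation(
    example_route: Callable[[], Optional[tuple[str, float]]],
    model_route: Callable[[], list[tuple[str, float]]],
    threshold: float = THRESHOLD,
) -> str:
    """Sequential mode: try the example-based route (A1), fall back to the learned model (A2)."""
    hit = example_route()  # (translated word, similarity r), or None if no example matched
    if hit is not None and hit[1] >= threshold:
        return hit[0]      # A2 is never invoked, saving its cost
    ranked = model_route()  # candidates already sorted by descending confidence p
    return ranked[0][0]
```

The parallel variant would simply compute both routes up front and apply the same threshold test.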

[0043] An example of using translation word selection device A3 of this embodiment is described below with reference to the flowcharts of FIGS. 6 and 7, which show its operating procedure. The description applies device A3 to the Japanese translation task of the second word sense disambiguation contest, SENSEVAL (hereinafter "SENSEVAL-2"; held in 2001 by the SENSEVAL-2 Organization Committee), in which the inventors of the present invention participated under the participant name CRL-NYU. A version of device A3 prior to the improvements described here was entered in that contest and received a very high evaluation.

[0044] As a premise, the Japanese-English bilingual example data (320 Japanese headwords, with about 20 examples per headword) are those of the SENSEVAL-2 Japanese translation task, distributed before the contest. Test data consisting of 30 occurrences each of 40 words selected from these headwords (20 nouns and 20 verbs) were used, so 1,200 Japanese word instances were to be translated in total. Contest participants were also allowed to use bilingual dictionaries obtained from language resources other than the given bilingual example data, as well as Japanese-English bilingual example data based on various newspaper articles. To evaluate the correctness of the finally output translations fairly, accuracy was measured against predetermined input texts, translation target words, and correct translations.

[0045] To simplify the explanation, the following describes the mode in which processing starts on the route of translation word selection device A1 and moves to the route of device A2 only when the A1 route outputs no translated word; as noted above, both routes may also run simultaneously. First, the input reception unit 31 accepts an input text, for example a Japanese text containing the idiom 「一役買う」 (ichiyaku kau, "to play a part, to lend a hand") (FIG. 6; step S1). The input text processing unit 311 then extracts the translation target word, for example <買う> (kau, "buy"), by morphological analysis of the input text (step S2). Next, the example extraction unit 32 searches the parallel translation example data storage unit C for the extracted translation target word <買う> and extracts the Japanese example data containing it (step S3), and the Japanese example processing unit 321 applies sentence-end processing to each Japanese example in the extracted data (step S4). The similarity calculation unit 331 of the similarity detection unit 33 then calculates the similarity r between the input text and each processed Japanese example according to Formula 1 (step S5). Next, the number of Japanese examples with the maximum similarity r is checked (step S6). If there is exactly one (step S6; Y), the similarity evaluation unit 34 outputs the Japanese example data containing that example (step S7). If there is more than one (step S6; N), the Japanese example data containing the example with the longest matching character string is selected (step S6a) and output as having the highest similarity (step S7). Assume in this case that the Japanese example with the highest similarity r contains the expression matching the input text (「一役買う」), and that the English example paired with it in the Japanese-English bilingual example data contains the English translation corresponding to the translation target word (<to offer to help>). The similarity of the output Japanese example data is then compared with a predetermined threshold, for example 1 (step S8). If the similarity is equal to or greater than the threshold (step S8; Y), the translated word output unit 37 outputs the English translation, for example <offer>, corresponding to the translation target word <買う> (step S9). Since the English expression corresponding to the Japanese idiom 「一役買う」 is "to offer to help", if the correct English translation of the target word 「買う」 is given as "offer", the English translation output in step S9 is correct.

[0046] If, on the other hand, no Japanese example data reaches the threshold (1) in step S8 (step S8; N), that is, if no Japanese example data contains an example identical or similar to the Japanese example containing the translation target word in the input text, processing moves to the route of translation word selection device A2. In this case, the learning model generation unit 35 first searches a Japanese-English parallel translation example data storage unit C different from the one used on the A1 route, based on the translation target word in the input text accepted by the input reception unit 31, and extracts the Japanese example data containing the corresponding word (FIG. 7; step S11). It then applies the learning data to each Japanese example in the extracted data to generate learning models, each based on one of SVM, DL, and ME (step S12). The learning model selection unit 351 cross-validates each generated model on the learning data and selects the one with the highest accuracy (step S13). The learning model application unit 36 applies the selected model to the translation target word in the input text, calculates the confidence p for every corresponding translation candidate (step S14), and outputs the candidates ordered by p, for example in descending order (step S15). Finally, the translated word output unit 37 selects and outputs the candidate with the highest confidence p (step S16). If the output candidate matches the correct English translation given in advance, that English translation is correct.

[0047] For reference, FIG. 8 tabulates the results obtained by translation word selection devices A1 and A2 in the SENSEVAL-2 contest. The table gives, as accuracy, the rate at which the English translation output for each translation target word given in the contest (20 nouns and 20 verbs) was correct. Of the 1,200 translation target instances given in total, device A1 was applied to 100, with an accuracy of 91.0%, and device A2 was applied to the remaining 1,100, with an accuracy of 60.9%. For comparison, the table also shows the combined result of devices A1 and A2 (A1 + A2). These results indicate that it is appropriate to use translation word selection device A3 of this embodiment, which applies device A2 to the translation target words that device A1 cannot handle with good accuracy. In other words, the route using device A1, based on character-string similarity, suits examples for which little learning data is generally available, such as those containing idiomatic expressions, or equivalently, examples backed by few Japanese-English bilingual example data. Whenever that route gives poor accuracy, the route of device A2, which applies learning data and a learning model to obtain a confidence, takes over, so highly accurate translation selection is possible overall, for idiomatic as well as ordinary expressions.
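FIG. 8 itself is not reproduced here. As a rough consistency check only, and assuming that the combined A1 + A2 figure is an item-weighted average over all 1,200 instances, the two reported accuracies would combine as follows (the value in the actual table may differ slightly through rounding):

$$
\mathrm{Acc}_{A1+A2} = \frac{100 \times 0.910 + 1100 \times 0.609}{1200} = \frac{91.0 + 669.9}{1200} = \frac{760.9}{1200} \approx 0.634
$$

That is, roughly 63.4% overall.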

[0048] The present invention is not limited to the embodiment described above. For example, translation word selection devices A1 and A2 may each be used on their own, and a translation device can be constructed by adding a translated sentence output unit that generates and outputs a target text corresponding to the input text, based on the translated word output by the translated word output unit. Likewise, the specific configuration of each part is not limited to the above embodiment, and various modifications are possible without departing from the spirit of the present invention.

[0049]

EFFECTS OF THE INVENTION
As described in detail above, the present invention makes highly accurate translation selection and machine translation possible without a large amount of manual effort and without overloading the computer, that is, without collecting a large volume of parallel translation example data. In particular, the method based on character-string similarity and the method applying learning data and a learning model can be used separately, or combined so that each complements the other. This enables highly accurate translation selection and machine translation both for commonly used natural-language expressions and for infrequent idiomatic expressions.

[Brief description of drawings]

FIG. 1 is a schematic functional configuration diagram of a translation word selection device corresponding to a first aspect of the present invention.

FIG. 2 is a schematic functional configuration diagram of a translation word selection device corresponding to a second aspect of the present invention.

FIG. 3 is a schematic functional configuration diagram of a translation word selection device corresponding to a third aspect of the present invention.

FIG. 4 is a schematic functional configuration diagram of a translation word selection device according to an embodiment of the present invention.

FIG. 5 is a diagram showing an example of Japanese-English bilingual example data used in the embodiment.

FIG. 6 is a schematic flowchart showing the operating procedure of the embodiment.

FIG. 7 is a schematic flowchart showing the operating procedure of the embodiment (continued).

FIG. 8 is a table of translation selection results in the SENSEVAL-2 contest, to which the present invention was applied.

[Explanation of symbols]

A1, A2, A3 ... Translation word selection device
C ... Parallel translation example data storage unit
1, 11, 21, 31 ... Input reception unit
2, 32 ... Example extraction unit
3, 33 ... Similarity detection unit
4, 34 ... Similarity evaluation unit
5, 14, 37 ... Translated word output unit
12, 35 ... Learning model generation unit
13, 36 ... Learning model application unit
311 ... Input text processing unit
321 ... Source language example processing unit (Japanese example processing unit)
331 ... Similarity calculation unit
351 ... Learning model selection unit

Continuation of front page
(72) Inventor: Maki Murata, c/o Communications Research Laboratory, 4-2-1 Nukui-Kitamachi, Koganei-shi, Tokyo
(72) Inventor: Hitoshi Isahara, c/o Communications Research Laboratory, 4-2-1 Nukui-Kitamachi, Koganei-shi, Tokyo
F-terms (reference): 5B091 AA05 CA02 CA22 CC01 EA02

Claims

[Claims]

1. A translation word selection device that uses a parallel translation example data storage unit storing parallel translation example data, each item of which pairs source language example data, comprising a source language example consisting of text in a first language, a word contained in it, a translation of that word in a second language, and information about that translation, with target language example data, comprising a target language example consisting of text obtained by translating the source language example into the second language, and that selects a translated word, written in the second language, for a translation target word contained in an input text entered in the first language, the device comprising: an input reception unit that accepts the input text; an example extraction unit that extracts from the parallel translation example data storage unit at least one item of source language example data containing a word corresponding to the translation target word in the input text accepted by the input reception unit; a similarity detection unit that detects, based on the input text and the source language example data extracted by the example extraction unit, the similarity between the input text and each source language example; a similarity evaluation unit that comparatively evaluates the similarities detected by the similarity detection unit and outputs at least the source language example data having the highest similarity; and a translated word output unit that outputs the translated word corresponding to the translation target word in the target language example contained in the parallel translation example data corresponding to the source language example data output by the similarity evaluation unit.

2. The translation word selection device according to claim 1, wherein the similarity detection unit has a similarity calculation unit that compares the input text with the source language example included in the extracted source language example data character by character and, on the basis of the resulting difference, calculates the similarity using at least one of the proportion of the character string that matches between the input text and the source language example, or the number of divisions, i.e., the number of separate pieces into which the matched portion is split.
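The character-by-character comparison in claim 2 can be approximated with Python's standard `difflib`. In this sketch (an assumption, since the claims do not name a diff algorithm), the match ratio is the number of matched characters divided by the length of the source language example, following the wording of claim 22, and the number of divisions is the count of non-empty matching blocks; `char_diff_similarity` is a hypothetical name.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def char_diff_similarity(input_text: str, source_example: str) -> tuple[float, int]:
    """Return (match_ratio, divisions) from a character-level diff (illustrative only)."""
    matcher = SequenceMatcher(None, input_text, source_example, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel block; drop it
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    matched = sum(b.size for b in blocks)
    # Ratio of matched characters to the length of the source language example
    ratio = matched / len(source_example) if source_example else 0.0
    return ratio, len(blocks)
```

For example, comparing 早めに手を打つ with 手を打つ gives `(1.0, 1)`, while comparing ボールを強く打つ with ボールを打つ gives `(1.0, 2)`: both inputs cover the whole example, but the inserted 強く splits the match into two pieces, which the division count exposes.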

3. The translation word selection device according to claim 2, wherein the example extraction unit has a source language example processing unit that applies sentence-end processing to the source language example included in the extracted source language example data and outputs a processed source language example, and the similarity calculation unit of the similarity detection unit compares the input text with the processed source language example character by character and, on the basis of the resulting difference, calculates as the similarity at least one of the ratio of the matched character string to the character string of the processed source language example, or the number of divisions into which the matched portion is split.

4. The translation word selection device according to claim 3, wherein, when the calculation by the similarity calculation unit of the similarity detection unit and the evaluation by the similarity evaluation unit yield a plurality of source language example data items with the maximum similarity, the translation word output unit outputs the translation word corresponding to the translation target word in the parallel translation example data containing the source language example for which the ratio of the character string matching the input text, or the number of divisions, as computed from the character-by-character difference, is the largest.
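The tie-breaking rule of claim 4 can be sketched as a two-stage selection. This assumes each extracted example has already been scored with a primary similarity and a separate match ratio; the tuple layout and the name `pick_with_tiebreak` are hypothetical, and only the match-ratio branch of the claim is shown.

```python
from __future__ import annotations

import math


def pick_with_tiebreak(scored: list[tuple[str, float, float]]) -> str:
    """scored: (translation word, similarity, match ratio) for each extracted example."""
    best_sim = max(sim for _, sim, _ in scored)
    # Keep every example whose similarity equals the maximum (within float tolerance)
    tied = [item for item in scored if math.isclose(item[1], best_sim)]
    # Break the tie with the match ratio from the character-by-character diff
    return max(tied, key=lambda item: item[2])[0]
```

Using `math.isclose` rather than `==` avoids treating two similarities as different merely because of floating-point rounding.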

5. The translation word selection device according to claim 1, 2, 3 or 4, wherein the input receiving unit has an input text processing unit for automatically extracting a translation target word by morphological analysis of the input text.

6. The translation word selection device according to any one of claims 1 to 5, wherein the parallel translation example data includes a source language headword generated from a word contained in the source language example, and the example extraction unit extracts, from the parallel translation example data storage unit, source language example data that includes at least the source language headword corresponding to the translation target word.

7. The translation word selection device according to any one of claims 1 to 5, wherein the parallel translation example data includes a source language headword generated from a word contained in the source language example and a target language headword generated from the corresponding translation word; the example extraction unit extracts at least the source language example data including the source language headword corresponding to the translation target word; and the translation word output unit outputs the target language headword corresponding to the source language headword, extracted by the example extraction unit, that is included in the source language example data output by the similarity evaluation unit.
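Headword-based extraction (claims 6 and 7) amounts to indexing examples by a normalized key. The sketch below assumes the headwords have already been generated; the `HeadwordExample` record, the helper names, and the sample data are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HeadwordExample:
    """Parallel example carrying headwords (hypothetical record layout)."""
    source_example: str
    source_headword: str  # headword derived from the word in the source example
    target_headword: str  # headword derived from the corresponding translation word


def build_headword_index(examples: list[HeadwordExample]) -> dict[str, list[HeadwordExample]]:
    """Group examples by source language headword so extraction is a single lookup."""
    index: dict[str, list[HeadwordExample]] = defaultdict(list)
    for ex in examples:
        index[ex.source_headword].append(ex)
    return index


def extract_by_headword(index: dict[str, list[HeadwordExample]], headword: str) -> list[HeadwordExample]:
    """Example extraction keyed on the translation target word's headword."""
    return list(index.get(headword, []))


examples = [
    HeadwordExample("ボールを打った", "打つ", "hit"),
    HeadwordExample("手を打つ", "打つ", "take measures"),
    HeadwordExample("駅まで走った", "走る", "run"),
]
index = build_headword_index(examples)
```

Here the inflected form 打った in the first example still matches the headword 打つ. Producing headwords, for example by morphological analysis, is outside this sketch.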

8. A translation word selection device that uses a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, and that selects a translation word, written in the second language, for a translation target word, i.e., a word to be translated contained in input text entered in the first language, the device comprising: an input receiving unit that receives input of the input text; a learning model generation unit that generates a learning model corresponding to the translation target word in the input text received by the input receiving unit, using learning data created from the words contained in the source language examples stored in the parallel translation example data storage unit and the parallel translation example data corresponding to those source language examples; a learning model application unit that applies the learning model generated by the learning model generation unit to the translation target word in the input text, computes a certainty factor for every translation word candidate of the translation target word, and outputs the translation word candidates in order of certainty factor; and a translation word output unit that outputs, as the translation word corresponding to the translation target word, the translation word candidate with the highest certainty factor among those output by the learning model application unit.
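The claims do not fix the kind of learning model, so the following is only a hedged stand-in for claim 8: a multinomial naive Bayes classifier over character bigrams, whose normalized posterior probabilities play the role of the certainty factors. The feature choice, smoothing, class names, and learning data are all assumptions made for illustration.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def char_bigrams(text: str) -> list[str]:
    """Overlapping character bigrams used as context features (an assumption)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class NaiveBayesSelector:
    """Hypothetical stand-in learning model: multinomial naive Bayes, add-one smoothing."""

    def __init__(self, learning_data: list[tuple[str, str]]):
        # learning_data: (source language example, translation word used in it)
        self.class_counts = Counter(label for _, label in learning_data)
        self.feature_counts: dict[str, Counter] = defaultdict(Counter)
        self.vocab: set[str] = set()
        for text, label in learning_data:
            feats = char_bigrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.total = sum(self.class_counts.values())

    def rank(self, input_text: str) -> list[tuple[str, float]]:
        """Return (translation word candidate, certainty factor) pairs, best first."""
        # Features never seen in the learning data carry no evidence; skip them
        feats = [f for f in char_bigrams(input_text) if f in self.vocab]
        log_scores = {}
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / self.total)
            for f in feats:
                score += math.log((counts[f] + 1) / denom)
            log_scores[label] = score
        if not log_scores:
            return []
        # Normalize log scores into posterior probabilities (the certainty factors)
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(weights.values())
        return sorted(((label, w / z) for label, w in weights.items()),
                      key=lambda pair: pair[1], reverse=True)


learning_data = [
    ("ボールを打つ", "hit"),
    ("球を打つ", "hit"),
    ("手を打つ", "take measures"),
    ("早めに手を打つ", "take measures"),
]
model = NaiveBayesSelector(learning_data)
ranked = model.rank("先に手を打つ")
best_translation = ranked[0][0]  # the candidate with the highest certainty factor
```

Any classifier that yields a score per candidate would fit the claim's structure equally well; naive Bayes is used here only because it is short and fully self-contained.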

9. The translation word selection device according to claim 8, wherein, for each translation target word in the input text received by the input receiving unit, the learning model generation unit extracts from the parallel translation example data storage unit the parallel translation example data corresponding to the source language examples that contain that word, and generates a learning model based on the extracted parallel translation example data.

10. The translation word selection device according to claim 8 or 9, wherein the learning model generation unit generates a learning model corresponding to each set of learning data; the device further comprises a learning model selection unit that selects, for each translation target word in the input text received by the input receiving unit, the learning model with the highest accuracy on the learning data; and the learning model application unit applies the learning model selected by the learning model selection unit to the translation target word in the input text.
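Claim 10's model selection can be sketched as picking, per translation target word, whichever candidate model scores best on the learning data. As the claim is worded, accuracy is measured on the learning data itself; a held-out split would be a common alternative. The "model" interface and the two toy models are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Sequence

# A "model" here is any function mapping input text to a translation word (hypothetical interface)
Model = Callable[[str], str]


def accuracy(model: Model, data: Sequence[tuple[str, str]]) -> float:
    """Fraction of (text, correct translation word) pairs the model gets right."""
    if not data:
        return 0.0
    return sum(model(text) == gold for text, gold in data) / len(data)


def select_model(models: dict[str, Model], data: Sequence[tuple[str, str]]) -> str:
    """Name of the candidate model with the highest accuracy on the given data."""
    return max(models, key=lambda name: accuracy(models[name], data))


learning_data = [
    ("ボールを打つ", "hit"),
    ("球を打つ", "hit"),
    ("手を打つ", "take measures"),
    ("早めに手を打つ", "take measures"),
]
models = {
    "always_hit": lambda text: "hit",
    "keyword_te": lambda text: "take measures" if "手" in text else "hit",
}
chosen = select_model(models, learning_data)
```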

11. The translation word selection device according to claim 8, 9 or 10, wherein the input reception unit has an input text processing unit for automatically extracting a translation target word from the input text by morphological analysis.

12. The translation word selection device according to any one of claims 8 to 11, wherein the parallel translation example data includes a source language headword generated from a word contained in the source language example, and the learning model generation unit extracts, from the parallel translation example data storage unit, source language example data that includes at least the source language headword corresponding to the translation target word.

13. A translation word selection device that uses a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, and that selects a translation word, written in the second language, for a translation target word, i.e., a word to be translated contained in input text entered in the first language, the device comprising: an input receiving unit that receives input of the input text; an example extraction unit that extracts, from the parallel translation example data storage unit, at least one item of source language example data containing a word corresponding to the translation target word in the input text received by the input receiving unit; a similarity detection unit that detects the similarity between the input text and each source language example on the basis of the input text and the source language example data extracted by the example extraction unit; a similarity evaluation unit that comparatively evaluates the similarities of the source language examples detected by the similarity detection unit and outputs at least the source language example data with the highest similarity; a learning model generation unit that generates a learning model corresponding to the translation target word in the input text received by the input receiving unit, using learning data created from the words contained in the source language examples stored in the parallel translation example data storage unit and the parallel translation example data corresponding to those source language examples; a learning model application unit that applies the learning model generated by the learning model generation unit to the translation target word in the input text, computes a certainty factor for every translation word candidate of the translation target word, and outputs the translation word candidates in order of certainty factor; and a translation word output unit that selects the most suitable one from among the translation word corresponding to the translation target word in the target language example included in the parallel translation example data corresponding to the source language example data output by the similarity evaluation unit and the translation word candidates output by the learning model application unit, and outputs it as the translation word corresponding to the translation target word.

14. The translation word selection device according to claim 13, wherein the translation word output unit outputs the translation word corresponding to the translation target word obtained from the output of the similarity evaluation unit when the similarity evaluation unit outputs parallel translation example data with a similarity equal to or greater than a predetermined threshold, and outputs the translation word corresponding to the translation target word obtained from the output of the learning model application unit when the similarity evaluation unit outputs no parallel translation example data with a similarity equal to or greater than the predetermined threshold.
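The threshold rule of claims 14 and 15 can be sketched as an example-first lookup with a learning-model fallback. In this illustration the learning model is passed as a zero-argument callable so that it runs only when no example clears the threshold; the function name, argument layout, and threshold value are assumptions.

```python
from __future__ import annotations

from typing import Callable, Optional


def combined_select(example_result: Optional[tuple[str, float]],
                    rank_with_model: Callable[[], list[tuple[str, float]]],
                    threshold: float) -> Optional[str]:
    """example_result: (translation word, similarity) from the similarity evaluation, or None."""
    if example_result is not None and example_result[1] >= threshold:
        return example_result[0]  # a sufficiently similar example exists: use its translation
    # Otherwise fall back to the learning model; it is only invoked on this path (cf. claim 15)
    ranked = rank_with_model()
    return ranked[0][0] if ranked else None
```

Deferring the model call mirrors claim 15, where the learning model units are operated only when the similarity evaluation produces no example above the threshold.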

15. The translation word selection device according to claim 13, wherein the learning model generation unit, the learning model application unit, and the translation word output unit are operated when the similarity evaluation unit outputs no parallel translation example data with a similarity equal to or greater than a predetermined threshold.

16. The translation word selection device according to claim 13, 14 or 15, wherein the parallel translation example data storage unit used by the example extraction unit and the parallel translation example data storage unit used by the learning model generation unit are different storage units created from different language resources.

17. A translation device that uses a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, and that outputs, from input text entered in the first language, a target text that is its translation in the second language, the device comprising: an input receiving unit that receives input of the input text; an example extraction unit that extracts, from the parallel translation example data storage unit, at least one item of source language example data containing a word corresponding to each translation target word in the input text received by the input receiving unit; a similarity detection unit that detects the similarity between the input text and each source language example on the basis of the input text and the source language example data extracted by the example extraction unit; a similarity evaluation unit that comparatively evaluates the similarities of the source language examples detected by the similarity detection unit and outputs at least the source language example data with the highest similarity; a translation word output unit that outputs the translation word corresponding to the translation target word in the target language example included in the parallel translation example data corresponding to the source language example data output by the similarity evaluation unit; and a translated text output unit that generates and outputs a target text corresponding to the input text on the basis of the translation word output by the translation word output unit and the parallel translation example data containing that translation word.

18. A translation device that uses a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, and that outputs, from input text entered in the first language, a target text that is its translation in the second language, the device comprising: an input receiving unit that receives input of the input text; a learning model generation unit that generates a learning model corresponding to the translation target word in the input text received by the input receiving unit, using learning data created from the words contained in the source language examples stored in the parallel translation example data storage unit and the parallel translation example data corresponding to those source language examples; a learning model application unit that applies the learning model generated by the learning model generation unit to the translation target word in the input text, computes a certainty factor for every translation word candidate of the translation target word, and outputs the translation word candidates in order of certainty factor; a translation word output unit that outputs, as the translation word corresponding to the translation target word, the translation word candidate with the highest certainty factor among those output by the learning model application unit; and a translated text output unit that generates and outputs a target text corresponding to the input text on the basis of the translation word output by the translation word output unit and the parallel translation example data containing that translation word.

19. A translation device that uses a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, and that outputs, from input text entered in the first language, a target text that is its translation in the second language, the device comprising: an input receiving unit that receives input of the input text; an example extraction unit that extracts, from the parallel translation example data storage unit, at least one item of source language example data containing a word corresponding to the translation target word in the input text received by the input receiving unit; a similarity detection unit that detects the similarity between the input text and each source language example on the basis of the input text and the source language example data extracted by the example extraction unit; a similarity evaluation unit that comparatively evaluates the similarities of the source language examples detected by the similarity detection unit and outputs at least the source language example data with the highest similarity; a learning model generation unit that generates a learning model corresponding to the translation target word in the input text received by the input receiving unit, using learning data created from the words contained in the source language examples stored in the parallel translation example data storage unit and the parallel translation example data corresponding to those source language examples; a learning model application unit that applies the learning model generated by the learning model generation unit to the translation target word in the input text, computes a certainty factor for every translation word candidate of the translation target word, and outputs the translation word candidates in order of certainty factor; a translation word output unit that selects, from among the translation word corresponding to the translation target word in the target language example included in the parallel translation example data corresponding to the source language example data output by the similarity evaluation unit and the translation word candidates output by the learning model application unit, the translation word corresponding to the translation target word, and outputs it; and a translated text output unit that generates and outputs a target text corresponding to the input text on the basis of the translation word output by the translation word output unit and the parallel translation example data containing that translation word.

20. A translation word selection program that causes a computer, using a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, to select a translation word, written in the second language, for a translation target word, i.e., a word to be translated contained in input text entered in the first language, the program causing the computer to execute: an input receiving step of receiving input of the input text; an example extraction step of extracting, from the parallel translation example data storage unit, at least one item of source language example data containing a word corresponding to the translation target word in the received input text; a similarity detection step of detecting the similarity between the input text and each source language example on the basis of the input text and the extracted source language example data; a similarity evaluation step of comparatively evaluating the detected similarities of the source language examples and outputting at least the source language example data with the highest similarity; and a translation word output step of outputting the translation word corresponding to the translation target word in the target language example included in the parallel translation example data corresponding to the output source language example data.

21. The translation word selection program according to claim 20, wherein the similarity detection step includes a similarity calculation step of comparing the input text with the source language example included in the extracted source language example data character by character and, on the basis of the resulting difference, calculating the similarity using at least one of the proportion of the character string that matches between the input text and the source language example, or the number of divisions into which the matched portion is split.

22. The translation word selection program according to claim 21, wherein the example extraction step includes a source language example processing step of applying sentence-end processing to the source language example included in the source language example data extracted in the example extraction step and outputting a processed source language example, and the similarity calculation step compares the input text with the processed source language example character by character and, on the basis of the resulting difference, calculates as the similarity at least one of the ratio of the matched character string to the character string of the processed source language example, or the number of divisions into which the matched portion is split.

23. The translation word selection program according to claim 22, wherein, in the translation word output step, when the calculation in the similarity calculation step and the evaluation in the similarity evaluation step yield a plurality of source language example data items with the highest similarity, the translation word corresponding to the translation target word is output from the parallel translation example data containing the source language example for which the ratio of the character string matching the input text, or the number of divisions, as computed from the character-by-character difference, is the largest.

24. A translation word selection program that causes a computer, using a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, to select a translation word, written in the second language, for a translation target word, i.e., a word to be translated contained in input text entered in the first language, the program causing the computer to execute: an input receiving step of receiving input of the input text; a learning model generation step of generating a learning model corresponding to the translation target word in the received input text, using learning data created from the words contained in the source language examples stored in the parallel translation example data storage unit and the parallel translation example data corresponding to those source language examples; a learning model application step of applying the generated learning model to the translation target word in the input text, computing a certainty factor for every translation word candidate of the translation target word, and outputting the translation word candidates in order of certainty factor; and a translation word output step of selecting the translation word candidate with the highest certainty factor from the output candidates and outputting it as the translation word corresponding to the translation target word.

25. The translation word selection program according to claim 24, wherein, in the learning model generation step, for each translation target word in the input text received in the input receiving step, the parallel translation example data corresponding to the source language examples that contain that word is extracted from the parallel translation example data storage unit, and a learning model is generated based on the extracted parallel translation example data.

26. The translation word selection program according to claim 24, wherein, in the learning model generation step, a learning model is generated corresponding to each set of learning data; the program further comprises a learning model selection step of selecting, for each translation target word in the input text received in the input receiving step, the learning model with the highest accuracy on the learning data; and the learning model application step applies the learning model selected in the learning model selection step to the translation target word in the input text.

27. A translation word selection program that causes a computer, using a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, to select a translation word, written in the second language, for a translation target word, i.e., a word to be translated contained in input text entered in the first language, the program causing the computer to execute: an input receiving step of receiving input of the input text; an example extraction step of extracting, from the parallel translation example data storage unit, at least one item of source language example data containing a word corresponding to the translation target word in the received input text; a similarity detection step of detecting the similarity between the input text and each source language example on the basis of the input text and the extracted source language example data; a similarity evaluation step of comparatively evaluating the detected similarities of the source language examples and outputting at least the source language example data with the highest similarity; a learning model generation step of generating a learning model corresponding to the translation target word in the received input text, using learning data created from the words contained in the source language examples stored in the parallel translation example data storage unit and the parallel translation example data corresponding to those source language examples; a learning model application step of applying the generated learning model to the translation target word in the input text, computing a certainty factor for every translation word candidate of the translation target word, and outputting the translation word candidates in order of certainty factor; and a translation word output step of selecting the most suitable one from among the translation word corresponding to the translation target word in the target language example included in the parallel translation example data corresponding to the source language example data output in the similarity evaluation step and the translation word candidates output in the learning model application step, and outputting it as the translation word corresponding to the translation target word.

28. The translation word selection program according to claim 27, wherein, in the translation word output step, the translation word corresponding to the translation target word obtained from the output of the similarity evaluation step is output when parallel translation example data with a similarity equal to or greater than a predetermined threshold is output in the similarity evaluation step, and the translation word corresponding to the translation target word obtained from the output of the learning model application step is output when no parallel translation example data with a similarity equal to or greater than the predetermined threshold is output in the similarity evaluation step.

29. The translation word selection program according to claim 28, wherein the learning model generation step, the learning model application step, and the translation word output step are executed when no parallel translation example data with a similarity equal to or greater than a predetermined threshold is output in the similarity evaluation step.

30. A translation program that causes a computer, using a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, to output, from input text entered in the first language, a target text that is its translation in the second language, the program causing the computer to execute: an input receiving step of receiving input of the input text; an example extraction step of extracting, from the parallel translation example data storage unit, at least one item of source language example data containing a word corresponding to each translation target word in the received input text; a similarity detection step of detecting the similarity between the input text and each source language example on the basis of the input text and the extracted source language example data; a similarity evaluation step of comparatively evaluating the detected similarities of the source language examples and outputting at least the source language example data with the highest similarity; a translation word output step of outputting the translation word corresponding to the translation target word in the target language example included in the parallel translation example data corresponding to the output source language example data; and a translated text output step of generating and outputting a target text corresponding to the input text on the basis of the output translation word and the parallel translation example data containing that translation word.

31. A translation program that causes a computer, using a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, to output, from input text entered in the first language, a target text that is its translation in the second language, the program causing the computer to execute: an input receiving step of receiving input of the input text; a learning model generation step of generating a learning model corresponding to the translation target word in the received input text, using learning data created from the words contained in the source language examples stored in the parallel translation example data storage unit and the parallel translation example data corresponding to those source language examples; a learning model application step of applying the generated learning model to the translation target word in the input text, computing a certainty factor for every translation word candidate of the translation target word, and outputting the translation word candidates in order of certainty factor; a translation word output step of selecting the translation word candidate with the highest certainty factor from the output candidates and outputting it as the translation word corresponding to the translation target word; and a translated text output step of generating and outputting a target text corresponding to the input text on the basis of the output translation word and the parallel translation example data containing that translation word.

32. A translation program that causes a computer, using a parallel translation example data storage unit storing parallel translation example data, in which source language example data (comprising a source language example consisting of text in a first language, a word contained in that example, a translation word for that word in a second language, and information about the translation word) is paired with a target language example consisting of text obtained by translating the source language example into the second language, to output, from input text entered in the first language, a target text that is its translation in the second language, the program causing the computer to execute: an input receiving step of receiving input of the input text; an example extraction step of extracting, from the parallel translation example data storage unit, at least one item of source language example data containing a word corresponding to each translation target word in the received input text; a similarity detection step of detecting the similarity between the input text and each source language example on the basis of the input text and the extracted source language example data; a similarity evaluation step of comparatively evaluating the detected similarities of the source language examples and outputting at least the source language example data with the highest similarity; a learning model generation step of generating a learning model corresponding to the translation target word in the received input text, using learning data created from the words contained in the source language examples stored in the parallel translation example data storage unit and the parallel translation example data corresponding to those source language examples; a learning model application step of applying the generated learning model to the translation target word in the input text, computing a certainty factor for every translation word candidate of the translation target word, and outputting the translation word candidates in order of certainty factor; a translation word output step of selecting the most suitable one from among the translation word corresponding to the translation target word in the target language example included in the parallel translation example data corresponding to the source language example data output in the similarity evaluation step and the translation word candidates output in the learning model application step, and outputting it as the translation word corresponding to the translation target word; and a translated text output step of generating and outputting a target text corresponding to the input text on the basis of the output translation word and the parallel translation example data containing that translation word.