JP2013186673A - Machine translation device and machine translation program

Info

Publication number: JP2013186673A
Application number: JP2012050964A
Authority: JP
Inventors: Naoto Kato; Taro Miyazaki
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2012-03-07
Filing date: 2012-03-07
Publication date: 2013-09-19

Abstract

PROBLEM TO BE SOLVED: To realize highly accurate translation.
SOLUTION: A machine translation device that machine-translates a source language into a target language using examples includes example-based machine translation means that, for each word contained in source-language input data, performs example-based translation using prestored clause- and phrase-level translation example data in the target language corresponding to the source language, and partial translation synthesis means that synthesizes the partial translations of one or more words translated by the example-based machine translation means.

Description

The present invention relates to a machine translation device and a machine translation program, and more particularly to a machine translation device and a machine translation program for realizing highly accurate translation.

Conventionally known techniques include machine-translating a source language into a target language using examples, and statistically determining, for instance, the word most likely to follow a given word and using that information to machine-translate the source language into the target language (see, for example, Patent Documents 1 to 5).

For example, Patent Document 1 discloses a machine translation device that obtains example translation candidates, in which an input sentence is translated into the target language based on target-language examples corresponding to source-language examples that match or are similar to the input sentence, together with a first likelihood representing the plausibility of each example translation candidate; translates the input sentence into the target language by a process different from the above; generates translation-word candidates, namely those candidates among the translation results for each word of the input sentence whose second likelihood is equal to or greater than a first threshold; lowers the first likelihood by a predetermined value for each example translation candidate that contains a translated word absent from the translation-word candidates; and selects the example translation candidate with the highest first likelihood.

Patent Document 2 discloses an example-based machine translation system that matches fragments of a source language (SL) input against SL fragments of examples in an example base; identifies all matched blocks in the SL input as term blocks in the SL input that matched one or more SL fragments in the examples; selects a block combination of matched blocks to cover one or more fragments of the SL input; identifies, for each block in the selected block combination, the example associated with that block; aligns the target language (TL) portion of the identified example to the SL portion of the identified example that matches one or more fragments of the SL input; and provides translation output based on the aligned portion.

Patent Document 3 discloses a machine translation device that creates a plurality of subtree groups from the syntax tree of a source-language text; for each subtree in those groups, searches an example database for the group of examples whose source-language syntax tree matches the subtree; calculates the translation probability of each retrieved example from its frequency of occurrence within the partial example group whose context similarity is equal to or greater than that of the example; selects the maximum-likelihood subtree group based on these translation probabilities; and generates target-language text based on the selected subtree group and the examples retrieved for its subtrees.

Patent Document 4 discloses a phrase-based statistical machine translation method that acquires an input sentence to be translated; searches a pre-built phrase table, using phrase fuzzy matching, for the identical or most similar bilingual phrase pair for each phrase in the input sentence; obtains an accurate translation of each phrase by correcting the most similar bilingual phrase pair; detects all target-language translations of the input sentence based on these bilingual phrase pairs and a pre-built language model; and uses a statistical model to select and output the highest-scoring translation as the correct target-language translation of the input sentence.

Furthermore, Patent Document 5 discloses a statistical machine translation device that includes means for determining a vector of probabilities representing the class membership of a source sentence, each element of the vector representing the probability that the source sentence belongs to one of a predetermined set of classes; a plurality of class-specific statistical sub-decoders, one for each class in the predetermined set, each statistically trained on the training data of its class and each outputting, for each word or word sequence in the source sentence, the probability of a translated word or word sequence in the target language; and means for estimating the most likely translation hypothesis of the source sentence in the target language according to the probabilities of possible target-language word sequences, where those probabilities are calculated by interpolating the probabilities output by the sub-decoders according to the probability vector for each target-language word or word sequence.

Patent Document 1: JP 2008-305167 A
Patent Document 2: JP 2008-262587 A
Patent Document 3: JP 2006-252290 A
Patent Document 4: JP 2010-061645 A
Patent Document 5: JP 2009-294747 A

However, the conventional techniques described above, both example-based translation and statistical machine translation, operate on whole sentences; none operates only on partial word strings such as clauses or phrases. In other words, when example-based translation and statistical machine translation are each performed sentence by sentence, each translation result covers the whole sentence, so translation results for partial word strings cannot be merged. One could conceivably use the conventional techniques by inputting clauses or phrases instead of sentences, translating them by example-based or statistical machine translation, and integrating the results into one sentence. But because it is unclear how long the input clauses or phrases should be, highly accurate translation could not be achieved.

The present invention has been made in view of the above problems, and its object is to provide a machine translation device and a machine translation program for realizing highly accurate translation.

To solve the above problems, the present invention adopts means having the following features.

The present invention is a machine translation device that machine-translates a source language into a target language using examples, characterized by comprising: example-based machine translation means that, for each word contained in source-language input data, performs example-based translation using prestored clause- and phrase-level translation example data in the target language corresponding to the source language; and partial translation synthesis means that synthesizes the partial translations of one or more words translated by the example-based machine translation means.

The present invention is also a machine translation program for causing a computer to function as each of the means of the machine translation device described above.

Applying the components or expressions of the present invention, or any combination of its components, to a method, device, system, computer program, recording medium, data structure, or the like is also effective as an aspect of the present invention.

According to the present invention, highly accurate translation can be realized.

FIG. 1 is a diagram showing an example of the functional configuration of the machine translation device in this embodiment.
FIG. 2 is a flowchart showing an example of the machine translation processing procedure in this embodiment.
FIG. 3 is a diagram for explaining an example of example translation using a CYK table.
FIG. 4 is a flowchart showing an example of the example translation procedure using a CYK table.
FIG. 5 is a diagram showing a specific example of the example translation technique using a CYK table.
FIG. 6 is a diagram showing an example of translation example data that includes appearance frequencies.
FIG. 7 is a diagram showing an example of the generation screen of a sign language CG translation system.

<About the present invention>
The present invention efficiently translates word strings in an input sentence that can be translated using, for example, long examples; translates the word strings that could not be translated this way by statistical machine translation; and merges the results of example-based translation and statistical machine translation to achieve highly accurate translation. Preferred embodiments of the machine translation device and machine translation program of the present invention having these features are described in detail below with reference to the drawings.

In recent years, attention has turned not only to translation between spoken languages but also to machine translation from a source language into sign language. Sign language is an important means of communication for people with hearing impairments. In particular, for deaf people who lost their hearing congenitally or in early childhood, sign language is their native language and is easier to understand than Japanese. However, little information is provided in sign language. Although there have been several studies on automatic translation from Japanese into sign language, the vocabularies they handle are small, and many of the systems are experimental. The following description therefore uses translation from Japanese into sign language as an example of machine translation, but the present invention is not limited to this. In the following description, translating into sign language means converting into a word-by-word transcription of sign language motions, but the present invention is not limited to this either.

<Basic concept of language translation>
First, the basic concept of language translation is explained. Language translation is conversion from one character string to another. Language translation techniques are broadly classified into rule translation (rule-based translation), example translation (example-based translation), and statistical translation (statistical machine translation).

Because sign language is a visual language, conversion to motion video is ultimately required, but it is difficult to convert a Japanese character string directly into sign language video. In this embodiment, therefore, a Japanese input sentence in the source language is first converted into a sign language word string (a transcription of sign language motions as a word string). The sign language video corresponding to each sign language word is then extracted based on the converted sign language word string, and the extracted videos are concatenated to output the sign language video corresponding to the Japanese input sentence. The sign language video may be, for example, motion video rendered in CG (Computer Graphics) or live-action motion video.

Rule translation requires translation knowledge built by hand. Building that knowledge requires linguistic insight into the language being translated, but sign language (Japanese Sign Language) is not yet understood well enough to support machine translation development. At present, therefore, it is not realistic to develop a translation system quickly using rule translation alone. Statistical translation, by contrast, has the advantage of requiring no knowledge of the target language. However, it requires a large parallel corpus; corpora of 1 million to 10 million sentences are currently used. Existing Japanese-sign language parallel corpora are nowhere near that large: even relatively large ones contain only about 20,000 to 30,000 sentences. Example translation, which is likewise corpus-based, uses the parallel sentences directly, so it can achieve a reasonable degree of translation accuracy even with a corpus smaller than statistical translation needs.

Furthermore, if the translation target is narrowed to a specific kind of information (for example, weather information), the linguistic phenomena that appear are limited to some extent. Weather information in particular contains many formulaic expressions at the clause and phrase level. Unlike statistical translation, example translation does require linguistic insight into the translation target, but not in as much detail as rule translation.

This embodiment therefore takes example translation as the primary method and uses statistical translation alongside it. Specifically, example translation by exact match is first performed in units of clauses (phrases). Because this translation requires an exact match with an example, some clauses (phrases) match no example. Those are translated by statistical translation.

<Machine translation device: functional configuration example>
An example of the functional configuration of the machine translation device in this embodiment is described with reference to the drawings. FIG. 1 is a diagram showing an example of the functional configuration of the machine translation device in this embodiment. The machine translation device 10 shown in FIG. 1 comprises morphological analysis means 11, example-based machine translation means 12, clause/phrase-unit translation example storage means 13, example-based partial translation storage means 14, untranslated word storage means 15, statistical machine translation means 16, and partial translation synthesis means 17.

The morphological analysis means 11 receives the document data that the user wants translated into sign language, analyzes the morphemes contained in the input sentence, and, based on the analysis result, splits the input sentence into words. Here, morphological analysis means segmenting the input text data into meaningful words; for example, parts of speech and content may be determined using a preset corpus or the like, but the present invention is not limited to this.

In this embodiment, ChaSen (http://chasen.naist.jp/) or the like can be used for morphological analysis, but the present invention is not limited to this. The morphological analysis means 11 outputs the word string of the sentence obtained from the analysis to the example-based machine translation means 12.

The example-based machine translation means 12 performs example-based translation on the word string obtained by the morphological analysis means 11, using the translation examples stored in advance in the clause/phrase-unit translation example storage means 13. The example-based translation technique of this embodiment is described later. The example-based machine translation means 12 outputs the translation results for the words it was able to translate to the example-based partial translation storage means 14. If there are word strings it could not translate, it outputs them to the untranslated word storage means 15.

The clause/phrase-unit translation example storage means 13 stores preset translation examples in units of clauses and phrases. The example-based partial translation storage means 14 stores the translation results (for example, translated sign language word strings) of the example-based machine translation means 12. The untranslated word storage means 15 stores the words that the example-based machine translation means 12 could not translate.

The statistical machine translation means 16 performs statistical machine translation using, for example, a translation dictionary machine-learned from the translation examples stored in the clause/phrase-unit translation example storage means 13. The translation dictionary is not limited to this; a translation dictionary statistically generated by general machine learning, or the like, can also be used. The statistical machine translation means 16 outputs its translation results to the partial translation synthesis means 17.

The partial translation synthesis means 17 combines the translation results (sign language word strings) for partial word strings obtained from the example-based partial translation storage means 14 with those obtained from the statistical machine translation means 16, in the order (word order) corresponding to the word string of the input sentence, and outputs the combined translation result as output data. In this way, this embodiment can realize highly accurate translation.
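The synthesis step can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the publication: the `PartialTranslation` record, its `start` field (index of the first source word a partial translation covers), and the `synthesize` function are hypothetical names, and the statistical translation output is a placeholder string rather than a real result.

```python
from dataclasses import dataclass

@dataclass
class PartialTranslation:
    start: int    # index of the first source word covered (hypothetical field)
    signs: str    # sign language word string for that span
    origin: str   # "example" or "smt", kept only for inspection

def synthesize(example_parts, smt_parts):
    """Merge partial translations from both translators in source word order."""
    ordered = sorted(example_parts + smt_parts, key=lambda p: p.start)
    return [p.signs for p in ordered]

# Word indices follow the 11-word segmentation 九州/と/沖縄/は/夕方/から/雷雨/と/なり/そう/です.
example_parts = [
    PartialTranslation(0, "九州Ｎ沖縄", "example"),
    PartialTranslation(6, "雷雨夢Ｎ", "example"),
]
# Placeholder: the publication does not give the statistical output for this span.
smt_parts = [PartialTranslation(4, "<SMT output for 夕方から>", "smt")]

merged = synthesize(example_parts, smt_parts)
```

Sorting by the start index is enough here because the two translators cover disjoint spans of the same input sentence.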

<Machine translation processing procedure in this embodiment>
The machine translation processing procedure of this embodiment is described using a flowchart. FIG. 2 is a flowchart showing an example of the machine translation processing procedure in this embodiment.

In FIG. 2, when source-language data (for example, Japanese text data) is input (S01), the machine translation process performs morphological analysis on the input text data and splits it into words (S02).

Next, the machine translation process performs example-based machine translation on the split words (S03) and determines whether any of them remain untranslated (S04). If there are untranslated words (NO in S04), it performs statistical machine translation (S05).

If there are no untranslated words in S04 (YES in S04), or after S05 completes, the machine translation process combines the partially translated words (for example, sign language words) in the order corresponding to the input data (S07) and outputs the combined result as the final translation result (for example, a sign language word string) (S08). The output data in S08 is the sign language word string described above, but the present invention is not limited to this; for example, motion video data corresponding to each sign language word may be prepared in advance, the motion video data for the sign language words concatenated to generate sign language video, and the generated video output.

The machine translation process then determines whether to end processing (S09). If not (NO in S09), it returns to S01, accepts other data as input, and performs the processing described above. If processing is to end, for example because of an end instruction from the user or because there is no more data to input (YES in S09), the machine translation process ends.
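The control flow of steps S01 to S09 can be sketched as a simple loop. This is a hedged outline only: `run_translation` and its four callable parameters are hypothetical stand-ins for the means of FIG. 1, and the demo at the end uses toy stubs rather than real analyzers or translators.

```python
def run_translation(inputs, analyze, translate_by_example, translate_by_smt, synthesize):
    """Process each input as in S01 to S09; the callables are hypothetical stand-ins."""
    outputs = []
    for text in inputs:                          # S01: next input; the loop ends when none remain (S09)
        words = analyze(text)                    # S02: morphological analysis
        translated, untranslated = translate_by_example(words)   # S03
        if untranslated:                         # S04: do untranslated words remain?
            translated = translated + translate_by_smt(untranslated)   # S05
        outputs.append(synthesize(translated))   # S07 and S08: combine and output
    return outputs

# Toy stubs: partial translations are (start_index, text) pairs.
demo = run_translation(
    ["a b c"],
    analyze=str.split,
    translate_by_example=lambda ws: ([(0, "A")], [(1, ws[1:])]),
    translate_by_smt=lambda spans: [(start, "+".join(ws)) for start, ws in spans],
    synthesize=lambda parts: [text for _, text in sorted(parts)],
)
```

Passing the stages in as callables keeps the loop independent of any particular analyzer or translation engine.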

<Specific example>
A specific example of machine translation in this embodiment follows. Consider translating the Japanese text data 「九州と沖縄は夕方から雷雨となりそうです」 ("Kyushu and Okinawa are likely to have thunderstorms from the evening"). Assume that the clause/phrase-unit translation example storage means 13 already stores the data shown in [Example 1] below.
[Example 1] (clause/phrase units)
「九州と」 ⇔ 「九州Ｎ」 ("Kyushu and" ⇔ "Kyushu N")
「沖縄は」 ⇔ 「沖縄」 ("Okinawa [topic]" ⇔ "Okinawa")
「九州と沖縄は」 ⇔ 「九州Ｎ沖縄」 ("Kyushu and Okinawa [topic]" ⇔ "Kyushu N Okinawa")
「雷雨となりそうです」 ⇔ 「雷雨夢Ｎ」 ("is likely to turn to thunderstorms" ⇔ "thunderstorm dream N")
「雷雨となり」 ⇔ 「雷雨」 ("turning to thunderstorms" ⇔ "thunderstorm")
「そうです」 ⇔ 「夢Ｎ」 ("is likely" ⇔ "dream N")
Here the left side of the arrow (⇔) is the Japanese and the right side is its sign language rendering. As an example, the sign language notation follows the "Japanese-Sign Language Dictionary" (Akihiko Yonekawa (supervising editor), Japan Institute for Sign Language Studies (ed.), Japanese Federation of the Deaf Publishing Bureau, 2006) published by the Japanese Federation of the Deaf, but the notation is not limited to this. The "N" in the sign language notation denotes a nod, which is a non-manual movement, and marks a boundary between meaningful units.

The machine translation device 10 of this embodiment first applies the morphological analysis means 11 to the input sentence 「九州と沖縄は夕方から雷雨となりそうです」 (hereinafter "Japanese sentence 1" where needed) and splits it into words. Specifically, the morphological analysis splits Japanese sentence 1 into 11 words: 「九州／と／沖縄／は／夕方／から／雷雨／と／なり／そう／です」 (where "／" marks a word boundary).

Next, the example-based machine translation means 12 performs example translation; that is, it applies clause and phrase examples within the input sentence and translates the parts that can be translated. For example, applying [Example 1] to Japanese sentence 1, the Japanese clauses 「九州と沖縄は」 and 「雷雨となりそうです」 can be translated as 「九州Ｎ沖縄」 and 「雷雨夢Ｎ」, respectively.

However, 「夕方から」 ("from the evening"), which is not among the examples, cannot be translated by example translation. In this embodiment, word strings that could not be translated by example are therefore translated by statistical translation. Because Japanese-to-sign-language translation involves little word reordering, statistical translation can also be expected to translate accurately.
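The walk-through of Japanese sentence 1 can be reproduced with a small sketch. Note the assumption: the greedy longest-match scan used here is a simplification chosen for illustration, not the procedure of the publication (which refers to a CYK table); `translate_by_example` and the span tuple layout are hypothetical. The example pairs and the 11-word segmentation are taken from the text above.

```python
EXAMPLES = {
    "九州と": "九州Ｎ",
    "沖縄は": "沖縄",
    "九州と沖縄は": "九州Ｎ沖縄",
    "雷雨となりそうです": "雷雨夢Ｎ",
    "雷雨となり": "雷雨",
    "そうです": "夢Ｎ",
}

def translate_by_example(words, examples):
    """Return (start, end, translation or None) spans covering the word list."""
    spans, i, n = [], 0, len(words)
    while i < n:
        # Try the longest word run starting at i first (exact match only).
        for j in range(n, i, -1):
            key = "".join(words[i:j])
            if key in examples:
                spans.append((i, j, examples[key]))
                i = j
                break
        else:
            # No example starts at i: extend or open an untranslated span.
            if spans and spans[-1][2] is None and spans[-1][1] == i:
                spans[-1] = (spans[-1][0], i + 1, None)
            else:
                spans.append((i, i + 1, None))
            i += 1
    return spans

words = "九州/と/沖縄/は/夕方/から/雷雨/と/なり/そう/です".split("/")
spans = translate_by_example(words, EXAMPLES)
# Spans with translation None (here words 4 to 5, 夕方から) would go to statistical translation.
```

The untranslated span is kept with its word positions so that a later synthesis step can place its statistical translation back in input order.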

<Specific example of example translation in the example-based machine translation means 12>
Next, as a specific example of example translation in the example-based machine translation means 12, the examples used and the translation technique are described. Examples are normally split into clause and phrase units by hand. The split takes into account both the clause and phrase boundaries on the Japanese side and the boundaries between semantic units on the sign language side. Boundaries on the sign language side are judged mainly by nods (N). Where a boundary on the sign language side is hard to judge, no split is forced.

In example translation, longer examples can be expected to give more accurate translations. In this embodiment, therefore, all combinations of the clauses (phrases) obtained by dividing a sentence are generated automatically and added to the examples. For instance, suppose that the input sentence 「きょうは雷雨となるでしょう」 ("There will be thunderstorms today"; hereinafter "Japanese sentence 2" where necessary) has been divided by hand into clause and phrase units to give [Example 2] below. Automatically generating all combinations of [Example 2] then gives [Example 3], and in this embodiment all of these are added to the examples.
[Example 2] (manual clause (phrase) division)
「きょうは」 ("today") ⇔ 「今日Ｎ」 (TODAY N)
「雷雨となる」 ("there are thunderstorms") ⇔ 「雷雨」 (THUNDERSTORM)
「でしょう」 ("probably will") ⇔ 「夢Ｎ」 (DREAM N)
[Example 3] (added examples)
「きょうは雷雨となる」 ⇔ 「今日Ｎ雷雨」
「雷雨となるでしょう」 ⇔ 「雷雨夢Ｎ」
When [Example 2] and [Example 3] are used, the example translation result for Japanese sentence 2 is 「今日Ｎ雷雨夢Ｎ」. With this technique, in this embodiment about 10,000 clause (phrase) unit examples can be obtained from about 3,500 parallel sentences of, for example, weather information.
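The automatic generation of [Example 3] from [Example 2] can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that "all combinations" means every contiguous run of two or more clauses, and that the run covering the whole sentence is skipped because the full sentence pair is already in the corpus. The function name `expand_examples` is hypothetical.

```python
def expand_examples(segments):
    """Return pairs for every contiguous run of two or more segments.

    `segments` is a list of (japanese, sign_gloss) pairs in sentence order.
    The run covering the whole sentence is skipped (assumption: the full
    sentence pair is already part of the parallel corpus).
    """
    n = len(segments)
    added = []
    for length in range(2, n):                 # run lengths 2 .. n-1
        for start in range(0, n - length + 1):
            run = segments[start:start + length]
            source = "".join(ja for ja, _ in run)
            target = "".join(sign for _, sign in run)
            added.append((source, target))
    return added


# [Example 2] from the text
example2 = [("きょうは", "今日Ｎ"), ("雷雨となる", "雷雨"), ("でしょう", "夢Ｎ")]
example3 = expand_examples(example2)
for source, target in example3:
    print(source, "⇔", target)
```

For the three clauses of Japanese sentence 2 this yields the two added pairs listed in [Example 3].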

In this embodiment, example translation by the example-based machine translation means 12 can be performed by, for example, filling in a CYK (Cocke-Younger-Kasami) table, although it is not limited to this.

<Example translation using the CYK table>
An example of example translation using the CYK table is now described. FIG. 3 is a diagram for explaining example translation using the CYK table, and FIG. 4 is a flowchart showing one example of the example translation procedure using the CYK table. The layout of the CYK table 20 shown in FIG. 3 is only an example, and the invention is not limited to it. The example of FIG. 4 also covers the part in which example-based translation is performed using clause- and phrase-unit translation examples and statistical machine translation is applied to the untranslated words.

For example, suppose that morphological analysis of some input data yields "w_1, w_2, ..., w_i, ..., w_j, ..., w_n" (where each "w" denotes a word and "," marks a word boundary; where necessary the commas are disregarded and the sequence is treated as one sentence, that is, a word string). No syntactic analysis is performed in this example.

In the example translation process, examples that match the input are found first. Each word string "w_i, ..., w_j" of the input Japanese sentence is collated with the clause- and phrase-unit parallel corpus (the examples), and on a match the corresponding sign language word string is stored in storage area t(j-i+1, j) of the CYK table 20. That is, as shown in FIG. 3(a), storage area t(j-i+1, j) of the CYK table 20 holds the translation result for the word string "w_i, ..., w_j".

When there are several translation results, all of them are registered in the CYK table 20. That is, as shown in FIG. 3(b), the translation result of "w_i, ..., w_{i+k-1}" is stored in the corresponding storage area of the CYK table 20 as "W_i1, ..., W_j1", and the translation result of "w_{i+k}, ..., w_j" is stored in its corresponding storage area as "W_i2, ..., W_j2". At registration, however, each result is checked against the results already registered, and duplicates are not registered.

In the translation processing of the example-based machine translation means 12, the translation result for a Japanese word string "w_i, ..., w_j" can be obtained by, for example, concatenating the translation results of two consecutive partial word strings. In this embodiment, therefore, when several partial word strings have translation results as described above, those results are concatenated and registered in the corresponding storage area of the CYK table 20. For example, the translation result of "w_i, ..., w_j", t(j-i+1, j), is obtained by concatenating the translation result of "w_i, ..., w_{i+k-1}", t(k, i+k-1), and the translation result of "w_{i+k}, ..., w_j", t(j-i+1-k, j), shown in FIG. 3(b), provided that neither is "empty" (here "empty" means that no translation result is stored in that storage area). The pairs to be concatenated are taken in turn, as with the two storage areas marked (1) and the two marked (2) in FIG. 3(b), so that partial translations are built up from successive pairs of corresponding partial word strings.
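The concatenation step can be sketched as below. This is an illustrative sketch under stated assumptions: the table is a Python dict keyed by (length, end position), matching the convention t(j-i+1, j) for the word string w_i, ..., w_j with 1-based positions; each cell holds a list of translations; and, for brevity, each clause of Japanese sentence 2 is treated as a single "word". The name `combine` is hypothetical.

```python
def combine(table, n):
    """Fill cells by concatenating the translations of two adjacent spans.

    `table` maps (length, end) to a list of translations, i.e. the cell
    t(j-i+1, j) for the span w_i..w_j. Shorter spans are processed first
    so that longer spans can reuse their results.
    """
    for length in range(2, n + 1):
        for end in range(length, n + 1):
            for k in range(1, length):                  # length of the left part
                left = table.get((k, end - length + k), [])   # t(k, i+k-1)
                right = table.get((length - k, end), [])      # t(j-i+1-k, j)
                for a in left:
                    for b in right:
                        cell = table.setdefault((length, end), [])
                        if a + b not in cell:           # do not register duplicates
                            cell.append(a + b)
    return table


# Cells as they would look after matching the three clauses of [Example 2]
table = {(1, 1): ["今日Ｎ"], (1, 2): ["雷雨"], (1, 3): ["夢Ｎ"]}
combine(table, 3)
print(table[(3, 3)])
```

Both split points of the full span produce the same string here, so the duplicate check leaves a single entry in t(3, 3).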

In the translation processing of the example-based machine translation means 12, the translation result for the whole sentence is registered in t(n, n). If t(n, n) is "empty", the sentence as a whole could not be translated by example translation, and partial translations are sought instead. To obtain partial translations it suffices to find the storage areas of the CYK table 20 that are not "empty". As shown in FIG. 3(c), examining the table from the top row downward yields the longer partial translations first.

Suppose now that, for some word string "w_p, ..., w_i, ..., w_j, ..., w_q", t(j-i+1, j) is not "empty"; its content is then the translation of the partial word string "w_i, ..., w_j". In that case the word strings before and after it, "w_p, ..., w_{i-1}" and "w_{j+1}, ..., w_q", are word strings that this example translation did not cover. Repeating the same processing on each of these two word strings yields, as shown in FIG. 3(c), the word strings that were example-translated and those that were not.
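The recursive search for the longest partial translations described above might look like the following sketch. It assumes the same dict-based table keyed by (length, end position) with 1-based positions, scans longer spans first, and simply takes the first translation stored in a cell; the function name `extract` and the toy table are hypothetical.

```python
def extract(table, p, q):
    """Split w_p..w_q into translated and untranslated segments.

    Returns a list of (start, end, translation) in input order; translation
    is None for a run of words that example translation did not cover.
    """
    if p > q:
        return []
    for length in range(q - p + 1, 0, -1):          # longest spans first
        for end in range(p + length - 1, q + 1):
            found = table.get((length, end))
            if found:
                start = end - length + 1
                return (extract(table, p, start - 1)        # words before the match
                        + [(start, end, found[0])]
                        + extract(table, end + 1, q))       # words after the match
    return [(p, q, None)]                            # nothing matched in w_p..w_q


# Toy table for a six-word input: w_1..w_2 and w_4..w_6 have translations
toy = {(2, 2): ["A"], (3, 6): ["B"]}
print(extract(toy, 1, 6))
```

The untranslated runs returned with `None` are the word strings that would be handed to statistical machine translation.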

The above processing is now described concretely using the translation procedure of FIG. 4. First, when source-language text data (for example, a Japanese sentence) is input, the morphological analysis means 11 divides it into words "w_1, ..., w_n" (S11). Next, the translation process generates all partial word strings "w_i, ..., w_j" (i ≤ j; i = 1, ..., n; j = 1, ..., n) from the words obtained in S11.

Specifically, the translation process sets i = 1 (S12), sets j = i (S13), and determines whether the word string "w_i, ..., w_j" is among the translation examples (S14). That is, the word string "w_i, ..., w_j" is collated with the source-language-side word string "w_i, ..., w_j" of the examples ("w_i, ..., w_j" ⇔ "W_I, ..., W_J") stored in the clause/phrase unit translation example storage means 13.

If the word string "w_i, ..., w_j" is among the translation examples (YES in S14), the translation process stores the target-side word string "W_I, ..., W_J" of that example in the CYK table 20 of the example-based partial translation storage means 14 as the translation result t(i, j) = W_I, ..., W_J (S15).

If the word string is not among the translation examples (NO in S14), the translation process adds 1 to j (S16) and determines whether j is greater than n (j > n?) (S17); if j is not greater than n (NO in S17), it returns to S14. In other words, the words of the input word string are appended one at a time, each resulting word string is collated with the examples, and any corresponding translation result is stored in the corresponding storage area of the CYK table 20.

If j is greater than n (YES in S17), the translation process adds 1 to i (S18) and determines whether i is greater than n (i > n?) (S19); if i is not greater than n (NO in S19), it returns to S13. In other words, the starting word of the input word string is shifted one word at a time while collation with the examples is repeated, and any corresponding translation result is stored in the corresponding storage area of the CYK table 20.

If i is greater than n (YES in S19), the translation process, as its next stage, again sets i = 1 (S20) and j = i (S21), and sets a new variable k to 0 (S22).

Next, for t(i, j) (i = 1, ..., n; j = 1, ..., n), the translation process generates k = 1, ..., j-i and, referring to the example-based partial translation storage means 14, determines whether t(k, i+k-1) and t(j-i+1-k, j) are both not "empty" (S23). If neither is "empty" (YES in S23), the translation results stored in those storage areas of the CYK table 20, t(k, i+k-1) = W_I1, ..., W_J1 and t(j-i+1-k, j) = W_I2, ..., W_J2, are concatenated to obtain t(j-i+1, j) = W_I1, ..., W_J1, W_I2, ..., W_J2, which is additionally registered in the example-based partial translation storage means 14 (S24).

If, in S23, at least one of the two storage areas is "empty" (NO in S23), the translation process adds 1 to k (S25) and determines whether k is greater than j-1 (k > j-1?) (S26). If k is not greater than j-1 (NO in S26), it returns to S23. In other words, the process checks whether a translation result is stored in each storage area of the CYK table 20 and, when translation results exist in both storage areas of a pair to be concatenated, concatenates them and additionally registers (stores) the result in the corresponding storage area.

If k is greater than j-1 (YES in S26), the translation process adds 1 to j (S27). It then determines whether j is greater than n (j > n?) (S28); if j is not greater than n (NO in S28), it returns to S22. If j is greater than n (YES in S28), it adds 1 to i (S29) and determines whether i is greater than n (i > n?) (S30). If i is not greater than n (NO in S30), it returns to S21. In other words, the concatenation processing described above is applied to every storage area of the CYK table 20, moving from one storage area to the next until both i and j exceed n.

If i is greater than n (YES in S30), the translation process determines whether storage area t(n, n) of the CYK table 20 is not "empty" (S31).

If t(n, n) is not "empty" (YES in S31), the sentence as a whole has been translated by example translation, so the translation process outputs the translation result (sign language word string) stored in t(n, n) and ends (S32).

If t(n, n) is "empty" (NO in S31), the translation process sets the variables p = 1 and q = n (S33), sets k = q-p-1 (S34), sets h = 0 (S35), and determines whether t(k+1, p+h+k) is not "empty" (S36). If t(k+1, p+h+k) is "empty" (NO in S36), it adds 1 to h (S37) and determines whether h is at most q-p-k (h ≤ q-p-k?) (S38); if h is not at most q-p-k (NO in S38), it returns to S36. If h is at most q-p-k (YES in S38), it subtracts 1 from k (S39) and determines whether k is less than 0 (k < 0?) (S40). If k is not less than 0 (NO in S40), it returns to S35. In other words, as shown in FIG. 3(c), translation results are extracted starting from the top of the CYK table 20 (the word strings with the most concatenated words).

If k is less than 0 (YES in S40), no part of the sentence could be translated by example translation, so the whole sentence is translated by statistical machine translation (S41).

If, in S36, t(k+1, p+h+k) is not "empty" (YES in S36), the translation process takes out the translation result stored in t(k+1, p+h+k) (S42), obtains the partial translation results and untranslated words for w_p, ..., w_{p+h-1} and for w_{p+h+k+1}, ..., w_q (S43), and translates the word strings that could not be translated by example-based translation using statistical machine translation (S44).

Thereafter, the translation result (sign language word string) obtained in S41 and S44 is output (S45). When the result of S44 is output, the result translated by example-based translation and the result translated by statistical machine translation are combined, the combined translation result is output, and the process ends.

FIG. 5 shows a specific example of the example translation technique using the CYK table. FIG. 5(a) shows an example of the translation example data, FIG. 5(b) an example of the CYK table, FIG. 5(c) specific example translation results, and FIG. 5(d) a specific partial translation.

Whereas the CYK table 20 of FIG. 3 assigns i and j with respect to the words, the CYK table of FIG. 5(b) assigns i and j with respect to the storage areas of the table. That is, the CYK table 20 of FIG. 3 is expressed as "translation result of w_i, ..., w_j = t(j-i+1, j)" (hereinafter, equation (1)), whereas the CYK table of FIG. 5 is expressed as "t(i, j) = translation result of w_{j-i+1}, ..., w_j" (hereinafter, equation (2)); the two can be converted into each other.

Specifically, putting x = j-i+1 and y = j in equation (1) gives j = y and i = j-x+1 = y-x+1. The left-hand side of equation (1) therefore becomes "translation result of w_i, ..., w_j = translation result of w_{y-x+1}, ..., w_y" (hereinafter, expression (3)), and the right-hand side becomes "t(j-i+1, j) = t(x, y)" (hereinafter, expression (4)). Renaming x as i and y as j then gives "expression (3) = translation result of w_{j-i+1}, ..., w_j" and "expression (4) = t(i, j)", which is equation (2). In other words, in this embodiment i and j may be assigned to the CYK table either with respect to the words or with respect to the storage areas of the table.
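The equivalence of the two indexing conventions can be checked with a short sketch. The helper names `cell_for_span` and `span_for_cell` are hypothetical; positions are 1-based as in the text.

```python
def cell_for_span(i, j):
    """Equation (1): the span w_i..w_j is stored in cell t(j-i+1, j)."""
    return (j - i + 1, j)


def span_for_cell(x, y):
    """Equation (2): cell t(x, y) holds the span w_{y-x+1}..w_y."""
    return (y - x + 1, y)


# Round trip over every span of an 11-word sentence
n = 11
round_trip_ok = all(
    span_for_cell(*cell_for_span(i, j)) == (i, j)
    for i in range(1, n + 1)
    for j in range(i, n + 1)
)
print(round_trip_ok)
print(cell_for_span(7, 11), span_for_cell(2, 4))
```

The first index of a cell is thus the length of the span and the second is the position of its last word.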

The specific example of the example translation technique using the CYK table shown in FIG. 5 corresponds to Japanese sentence 1 described above, 「九州と沖縄は夕方から雷雨となりそうです」 ("Kyushu and Okinawa are likely to have thunderstorms from the evening"). Morphological analysis divides Japanese sentence 1 into 「九州／と／沖縄／は／夕方／から／雷雨／と／なり／そう／です」, and generating all partial word strings from these words gives the following.
「九州」
「九州／と」
「九州／と／沖縄」
「九州／と／沖縄／は」
...
「と」
「と／沖縄」
「と／沖縄／は」
...
「そう／です」
「です」
Next, each word string generated in this way is collated with the source-language-side word strings of the translation examples stored in the clause/phrase unit translation example storage means 13. The translation examples are assumed to be given as clause- and phrase-unit examples as shown in FIG. 5(a).

In this embodiment, when collation finds a match, the target-side word string of the example is stored in the corresponding storage area of the CYK table held in the example-based partial translation storage means 14. Collating the partial word strings with the Japanese side of the translation examples, 「九州／と」, 「沖縄／は」, 「雷雨／と／なり／そう／です」, 「雷雨／と／なり」 and 「そう／です」 match, so "t(2, 2) = 九州Ｎ", "t(2, 4) = 沖縄", "t(5, 11) = 雷雨夢Ｎ", "t(3, 9) = 雷雨" and "t(2, 11) = 夢Ｎ" shown in FIG. 5(c) are registered (FIG. 5(b)).
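The collation step for Japanese sentence 1 can be reproduced with the sketch below. The example dictionary is reconstructed only from the five matches listed in the text; FIG. 5(a) itself may contain further entries. The table uses the FIG. 5 convention, with cells keyed by (length, end position), and the function name `lookup` is hypothetical.

```python
words = ["九州", "と", "沖縄", "は", "夕方", "から",
         "雷雨", "と", "なり", "そう", "です"]

# Assumption: only the examples named in the text are included here.
examples = {
    ("九州", "と"): "九州Ｎ",
    ("沖縄", "は"): "沖縄",
    ("雷雨", "と", "なり", "そう", "です"): "雷雨夢Ｎ",
    ("雷雨", "と", "なり"): "雷雨",
    ("そう", "です"): "夢Ｎ",
}


def lookup(words, examples):
    """Collate every partial word string w_i..w_j with the examples."""
    n = len(words)
    table = {}
    for i in range(1, n + 1):            # first word of the span
        for j in range(i, n + 1):        # last word of the span
            key = tuple(words[i - 1:j])
            if key in examples:
                table[(j - i + 1, j)] = [examples[key]]
    return table


table = lookup(words, examples)
for cell, value in sorted(table.items()):
    print(cell, value)
```

With 11 words there are 66 partial word strings to collate, of which five match.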

Next, the CYK table of the example-based partial translation storage means 14 is consulted to determine whether t(k, k) and t(4-k, 4) are both non-empty. The result is:
when k = 1, t(1, 1) = empty and t(3, 4) = empty;
when k = 2, t(2, 2) = 九州Ｎ and t(2, 4) = 沖縄;
when k = 3, t(3, 3) = empty and t(1, 4) = empty.
Thus, when k = 2, both storage areas to be concatenated are non-empty.

Accordingly, since t(k, k) and t(4-k, 4) are both non-empty for k = 2, the translation results stored in them, "t(2, 2) = 九州Ｎ" and "t(2, 4) = 沖縄", are concatenated, and the partial translation "t(4, 4) = 九州Ｎ沖縄" shown in FIG. 5(d) is additionally registered in the CYK table of the example-based partial translation storage means 14 (FIG. 5(b)). If the same translation result is already stored, however, it is not registered again. For example, "t(3, 9) = 雷雨" and "t(2, 11) = 夢Ｎ" are both non-empty, but their concatenation 「雷雨夢Ｎ」 is already stored in t(5, 11), so no partial translation is added in that case.

Next, it is determined whether t(11, 11) is not empty. In the example of FIG. 5 it is empty. If t(11, 11) were not empty, the translation result stored in it would be output and the process would end.

Next, for the input sentence "w_1, ..., w_i, ..., w_j, ..., w_11", the following processing is performed with i = 1+h and j = 1+h+k (k = 9, 8, ..., 0; h = 0, 1, ..., 10-k). It is determined whether t(k+1, 1+h+k) in the example-based partial translation storage means 14, which holds the translation results of the word strings, is not empty. For example, when k = 4, h = 0, 1, ..., 6, and the results are "t(5, 5) = empty" for h = 0, "t(5, 6) = empty" for h = 1, "t(5, 7) = empty" for h = 2, "t(5, 8) = empty" for h = 3, "t(5, 9) = empty" for h = 4, "t(5, 10) = empty" for h = 5, and "t(5, 11) = 雷雨夢Ｎ" for h = 6. The storage area is therefore non-empty only when h = 6.

In this embodiment, when no translation result at all is stored in the CYK table of the example-based partial translation storage means 14 (that is, when every t(i, j) is empty), statistical machine translation is applied to the whole sentence. This handles the case in which example-based translation yields not even a partial translation. In that case the whole sentence is translated by the statistical machine translation means 16. The statistical machine translation means 16 performs statistical machine translation using a translation dictionary trained in advance on, for example, the translation examples stored in the clause/phrase unit translation example storage means 13.

Next, since t(5, 11) in the CYK table of the example-based partial translation storage means 14 is not "empty", the translation result 「雷雨夢Ｎ」 stored in t(5, 11) is taken out.

Next, in the example of FIG. 5, for the word string "w_1, ..., w_6" that precedes the word string "w_7, ..., w_11" for which a translation result was obtained in the input sentence "w_1, ..., w_i, ..., w_j, ..., w_11", the same method is repeated recursively with p = 1 and q = 6. If a non-empty storage area of the CYK table is found, its translation result is stored as a partial translation. In the example of FIG. 5, the partial translation 「九州Ｎ沖縄」 of 「九州／と／沖縄／は」 is stored in the corresponding storage area of the CYK table held in the example-based partial translation storage means 14. Words that could not be translated are stored in the untranslated word storage means 15; in the example of FIG. 5 these are 「夕方」 ("evening") and 「から」 ("from").

Next, in this embodiment, the words 「夕方」 and 「から」 stored in the untranslated word storage means 15 are arranged in order into the word string 「夕方／から」, each such word string is translated by statistical machine translation, and the translation result 「夕がたから」 is obtained as a partial translation.

Next, in this embodiment, the partial translation results described above are output in the order of the input data; that is, 「九州Ｎ沖縄夕がたから雷雨夢Ｎ」 is output as the translation result (sign language word string).
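The final synthesis of partial translations in input order can be sketched as follows. This is only an illustration: `stub_smt` is a hypothetical stand-in for the statistical machine translation means 16 that returns the result quoted in the text, and the segment list is the outcome described above for Japanese sentence 1 (1-based positions, `None` marking words left untranslated by example translation).

```python
def synthesize(words, segments, smt):
    """Join partial translations in input order.

    `segments` is a list of (start, end, translation); a translation of
    None means the words w_start..w_end go to statistical machine translation.
    """
    parts = []
    for start, end, translation in segments:
        if translation is None:
            translation = smt(words[start - 1:end])
        parts.append(translation)
    return "".join(parts)


def stub_smt(untranslated):
    # Hypothetical stand-in: returns the SMT output quoted in the text.
    return {"夕方/から": "夕がたから"}["/".join(untranslated)]


words = ["九州", "と", "沖縄", "は", "夕方", "から",
         "雷雨", "と", "なり", "そう", "です"]
segments = [(1, 4, "九州Ｎ沖縄"), (5, 6, None), (7, 11, "雷雨夢Ｎ")]

result = synthesize(words, segments, stub_smt)
print(result)
```

Keeping the segments sorted by start position is what preserves the input order in the output.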

<Statistical translation in statistical machine translation means 16>
Next, statistical translation in the statistical machine translation means 16 is described concretely. In this embodiment, for the Japanese partial word strings that could not be translated in the example-based machine translation performed by the example-based machine translation means 12, statistical translation is performed on each partial word string separately. The partial word strings in this case are not necessarily linguistic units such as clauses or phrases.

In statistical translation, the clause- and phrase-unit examples used for example translation are used to train the translation model, and sentence-unit sign language sentences are used to train the language model. Statistical translation in the statistical machine translation means 16 can be performed using a given translation model (for example, GIZA++), decoder (for example, Moses), language model (for example, SRILM) and the like, but the present invention is not limited to these.

<Weighting>
The example translation technique of this embodiment uses the translation example data stored in the clause/phrase unit translation example storage means 13 described above, and several translation results may be extracted for a single word. To handle this, an appearance frequency is set in advance for each translation result in the translation example data. At example translation time this frequency serves as a weight: when several translation results are extracted, the one with the highest appearance frequency (the heaviest weight) is output, which gives a more accurate translation result.

FIG. 6 shows an example of translation example data that includes appearance frequencies. In the Japanese-sign language parallel data shown in FIG. 6, an appearance frequency is set in advance for each entry (the numbers on the left side of FIG. 6). For example, when the translation example data contains both 「沖縄は」 ⇔ 「沖縄Ｎ」 and 「沖縄は」 ⇔ 「沖縄」, as shown in FIG. 6, two translation results are extracted for the word string 「沖縄は」 to be translated, and 「沖縄」, which has the higher appearance frequency, is output as the translation result (sign language word).
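Selecting the translation with the highest appearance frequency can be sketched as below. The counts are hypothetical placeholders, since the actual values of FIG. 6 are not reproduced in the text; only their ordering (「沖縄」 more frequent than 「沖縄Ｎ」) follows the description. The function name `best_translation` is likewise an assumption.

```python
# Hypothetical frequencies; only their relative order follows the text.
weighted_examples = {
    "沖縄は": {"沖縄Ｎ": 1, "沖縄": 3},
}


def best_translation(source, weighted_examples):
    """Return the candidate with the highest appearance frequency, if any."""
    candidates = weighted_examples.get(source)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


print(best_translation("沖縄は", weighted_examples))
```

A source string with no stored candidates returns `None`, leaving it to the caller to fall back on another translation route.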

<Outline of the sign language CG translation system>
In this embodiment, a sign language video corresponding to the input data can be output using the translation result (sign language word string) synthesized by the partial translation synthesizing means 17.

FIG. 7 shows an example of a generation screen of the sign language CG translation system. The screen 30 shown in FIG. 7 comprises a sign language video display area 31, a Japanese input area 32, a translation candidate display area 33, a sign language word display area 34, and a motion video concatenation display area 35.

To generate the motion for a sign language translation, the morphological analysis described above is applied to, for example, a Japanese sentence entered in the Japanese input area 32, each word of the analysis result is converted using the translation example data, a translation dictionary and the like, and the motion video preset for each sign language word is joined to the next by linear interpolation, producing a smooth motion video for the input data.
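Joining per-word motion clips by linear interpolation might be sketched as follows. The data representation is an assumption made purely for illustration: each clip is a list of poses, and each pose is a flat list of floats (for example joint angles). The text does not describe the actual motion data format, and the names `interpolate` and `join_clips` are hypothetical.

```python
def interpolate(pose_a, pose_b, steps):
    """Return `steps` in-between poses on the straight line from pose_a to pose_b."""
    frames = []
    for s in range(1, steps + 1):
        t = s / (steps + 1)                      # 0 < t < 1, endpoints excluded
        frames.append([a + (b - a) * t for a, b in zip(pose_a, pose_b)])
    return frames


def join_clips(clips, steps):
    """Concatenate motion clips, bridging each gap with interpolated poses."""
    out = list(clips[0])
    for clip in clips[1:]:
        out.extend(interpolate(out[-1], clip[0], steps))
        out.extend(clip)
    return out


# Two one-pose "clips" with a single value per pose, bridged by three frames
print(join_clips([[[0.0]], [[4.0]]], 3))
```

The number of in-between frames controls how quickly the model moves from the end of one sign to the start of the next.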

The generated video is displayed in the sign language video display area 31 and provided to the user. The text (input data) corresponding to the sign language video can also be displayed in the sign language video display area 31, as shown in FIG. 6. The image displayed in the sign language video display area 31 is composed of motion data acquired by optical motion capture and a human-body CG model with a skeletal structure. Because the CG is rendered using the video content description language TVML (TV program Making Language), the motion can be edited easily.

<Execution program>
Here, the machine translation apparatus 10 described above can be configured by a computer including, for example, a CPU (Central Processing Unit), a volatile storage medium such as a RAM (Random Access Memory), a nonvolatile storage medium such as a ROM (Read Only Memory), input devices such as a mouse, a keyboard, and a pointing device, a display device for displaying images and data, and an interface device for communicating with the outside.

Accordingly, each of the above-described functions of the machine translation apparatus can be realized by causing the CPU to execute a program describing these functions. These programs can also be stored on and distributed via a recording medium such as a magnetic disk (floppy disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory.

In other words, the machine translation processing can be realized by generating an execution program (machine translation program) for causing a computer to execute the processing of each configuration described above, and installing that program on, for example, a general-purpose personal computer or a server. The processing performed by the execution program of the present invention can realize, for example, each of the processes described above.

As described above, according to the present invention, highly accurate translation can be realized. Specifically, the present invention can provide machine translation that combines example-based translation and statistical machine translation. In the present invention, word strings in the input sentence that can be translated using long examples are translated efficiently, and word strings that could not be translated in this way are translated by statistical machine translation, thereby combining example-based translation with statistical machine translation.
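The combination described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: a greedy longest-span-first lookup stands in for the example-based translation means (which, per the claims, may use a CYK table), and a plain word dictionary stands in for the statistical machine translation means. The function name `hybrid_translate` and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def hybrid_translate(
    words: List[str],
    examples: Dict[Tuple[str, ...], List[str]],  # source span -> target words (assumed format)
    fallback: Dict[str, List[str]],              # stand-in for statistical MT (assumption)
) -> List[str]:
    n = len(words)
    parts: List[Optional[List[str]]] = [None] * n  # partial translation stored at span start
    covered = [False] * n

    # Example-based pass: try the longest spans first so long examples win.
    for length in range(n, 0, -1):
        for start in range(0, n - length + 1):
            end = start + length
            if any(covered[start:end]):
                continue
            key = tuple(words[start:end])
            if key in examples:
                parts[start] = examples[key]
                for i in range(start, end):
                    covered[i] = True

    # Fallback pass: words no example covered go to the stand-in translator;
    # a word unknown there as well is passed through unchanged.
    for i, w in enumerate(words):
        if not covered[i]:
            parts[i] = fallback.get(w, [w])

    # Synthesis: concatenate the partial translations in input word order.
    out: List[str] = []
    for part in parts:
        if part:
            out.extend(part)
    return out


# Usage with toy romanized tokens and made-up sign glosses.
result = hybrid_translate(
    ["kyou", "wa", "hare", "desu"],
    {("kyou", "wa"): ["TODAY"]},
    {"hare": ["SUNNY"]},
)
```

Because partial translations are stored at the start position of the span they cover, the final concatenation follows the input word order, which matches the synthesis step only for language pairs where that order is preserved.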

Note that the present invention automates the translation of proper nouns when translating Japanese into sign language, and can be widely applied to natural language processing used for supporting sign language interpretation and for conversion to CG. Accordingly, the input data is not limited to the weather information described above; the invention can also be applied to, for example, text with many fixed expressions, such as news scripts and live sports commentary, and to any other natural language text.

Although the machine translation described above shows an example of translation from Japanese into sign language, the present invention is not limited to this, and can also be applied, for example, to pairs of source and target languages that share the same word order (for example, Japanese and Korean, or English and French).

Although the preferred embodiment of the present invention has been described in detail above, the present invention is not limited to that specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

DESCRIPTION OF SYMBOLS
10 Machine translation apparatus
11 Morphological analysis means
12 Example-based machine translation means
13 Clause/phrase-unit translation example storage means
14 Example-based partial translation storage means
15 Untranslated word storage means
16 Statistical machine translation means
17 Partial translation synthesis means
20 CYK table
30 Screen
31 Sign language video display area
32 Japanese input area
33 Translation candidate display area
34 Sign language word display area
35 Motion video link display area

Claims

A machine translation apparatus that machine-translates a source language into a target language using examples, the apparatus comprising:
example-based machine translation means for performing example-based translation on each word included in input data of the source language, using prestored translation example data in clause or phrase units of the target language corresponding to the source language; and
partial translation synthesis means for synthesizing the partial translations of one or a plurality of words translated by the example-based machine translation means.

The machine translation apparatus according to claim 1, further comprising statistical machine translation means for performing statistical machine translation, using a translation dictionary machine-learned in advance, on words that could not be translated by the example-based machine translation means,
wherein the partial translation synthesis means synthesizes the partial translations produced by the example-based machine translation means and the partial translations produced by the statistical machine translation means in accordance with the word order of the input data.

The machine translation apparatus according to claim 1, wherein the example-based machine translation means performs partial translation using each word included in the input data by means of a CYK table.

The machine translation apparatus according to claim 1, wherein, when the target language is sign language, the partial translation synthesis means converts the sign language word string obtained by the synthesis into a sign language video using video data stored in advance for each sign language word, and outputs the sign language video.

A machine translation program for causing a computer to function as each of the means of the machine translation apparatus according to any one of claims 1 to 4.