JP2017199363A

JP2017199363A - Machine translation device and computer program for machine translation

Info

Publication number: JP2017199363A
Application number: JP2017077021A
Authority: JP
Inventors: Masao Uchiyama
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2016-04-21
Filing date: 2017-04-07
Publication date: 2017-11-02

Abstract

PROBLEM TO BE SOLVED: To provide a machine translation device that appropriately translates a text differently according to information extending beyond the text itself.
SOLUTION: A machine translation device 230 includes: a grammar type determination unit 282 that specifies the grammar type of a text; a grammar-type-specific tag assigning unit 286 that inserts a first tag and a second tag corresponding to the grammar type at the head and end positions of the text, respectively; and a phrase-based machine translation unit 288 that receives as input the text to which the first and second tags have been assigned. A plurality of predetermined grammar types are defined. The grammar type determination unit 282 selects the first tag and the second tag according to the kind of grammar type.
SELECTED DRAWING: Figure 6

Description

The present invention relates to a machine translation device and, more particularly, to a machine translation device and a computer program for machine translation that accurately reflect differences between source texts in the translated text, enabling highly accurate translation.

There are various types of statistical machine translation. One approach regarded as particularly promising is phrase-based statistical machine translation (PBSMT). PBSMT divides the source text into sequences of a few words called phrases, translates each sequence into a phrase of the target language, and then reorders the translated phrases (Non-Patent Document 1). A "phrase" here differs from the linguistic notion and simply means a sequence of words. Phrase-level translation can be learned automatically from parallel data. For example, in English-Japanese translation, "Hello !" can be associated automatically with "こんにちは！" or "もしもし。", among others. In the following description, the source language is Japanese and the target language is English, but the same applies to other language pairs.
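The phrase-by-phrase idea above can be illustrated with a toy sketch. It assumes a hypothetical phrase table that maps each source phrase to a single target phrase, and it segments the input greedily by longest match. Real PBSMT scores many candidate segmentations and translations and also reorders the translated phrases. This sketch skips both steps and keeps the source order.

```python
from typing import Dict, List, Tuple

# Hypothetical toy phrase table (source phrase -> one target phrase).
# A real PBSMT phrase table holds many scored candidates per source phrase.
PHRASE_TABLE: Dict[Tuple[str, ...], str] = {
    ("Hello", "!"): "こんにちは！",
    ("Hello",): "こんにちは",
    ("!",): "！",
}


def translate_monotone(words: List[str],
                       table: Dict[Tuple[str, ...], str],
                       max_len: int = 3) -> List[str]:
    """Greedy longest-match phrase segmentation followed by table lookup.

    Translated phrases stay in source order; the reordering step of real
    PBSMT is omitted. Words not covered by the table are copied through.
    """
    output: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in table:
                output.append(table[phrase])
                i += n
                break
        else:
            output.append(words[i])  # uncovered word: pass it through
            i += 1
    return output
```

For example, `translate_monotone(["Hello", "!"], PHRASE_TABLE)` uses the two-word phrase as a single unit rather than translating word by word.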

PBSMT is fast and can translate with high accuracy, particularly between languages with similar structures. A more recent development reorders the words of the source text in advance so that their order approaches that of the target language, and then applies PBSMT. This has made highly accurate translation possible even between languages whose word orders differ greatly, such as English and Japanese, or Chinese and Japanese. This technique of changing the word order before translating is called "pre-ordering". A pre-ordering method is described in Patent Document 1, listed below.

Training a PBSMT system creates a phrase table. The phrase table contains a large number of phrase pairs. A phrase pair is a pair of phrases in two languages that are translations of each other.

The phrase table is learned from a parallel corpus containing a large number of translation pairs. A translation pair, such as the sentence pair 30 shown in FIG. 1, consists of two sentences in different languages, each a translation of the other. The main part of PBSMT training is to extract correspondences between phrases (sequences of words) in each such pair and to create phrase pairs from them.
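The extraction step can be sketched with the consistency criterion commonly used in the PBSMT literature. A phrase pair is kept when at least one word-alignment link falls inside it and no link connects a word inside it to a word outside it. The sentence pair and the word alignment below are hypothetical. This simplified version also skips the usual step of expanding pairs over unaligned target words at their edges.

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) links


def extract_phrase_pairs(src: List[str], tgt: List[str],
                         alignment: Alignment,
                         max_len: int = 3) -> Set[Tuple[str, str]]:
    """Extract phrase pairs consistent with a word alignment (simplified)."""
    pairs: Set[Tuple[str, str]] = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            # Target positions linked to the source span [s1, s2].
            linked = [t for (s, t) in alignment if s1 <= s <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Reject if a target word in [t1, t2] links outside [s1, s2].
            if any((t1 <= t <= t2) and not (s1 <= s <= s2) for (s, t) in alignment):
                continue
            pairs.add((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


src = ["これ", "は", "ペン", "です"]
tgt = ["This", "is", "a", "pen"]
links = {(0, 0), (2, 3), (3, 1)}  # これ-This, ペン-pen, です-is (hypothetical)
phrase_pairs = extract_phrase_pairs(src, tgt, links)
```

Among other pairs, this yields ("これ", "This") and the multi-word pair ("ペン です", "is a pen"). No pair links これ to "is".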

PBSMT translation accuracy is known to improve with the following practice. During training, symbols marking the beginning and end of the sentence are inserted into both the source and target sentences of each translation pair in the parallel corpus, and the same symbols are inserted into the source sentence at translation time. For example, referring to FIG. 1, in the translation pair 32, the beginning-of-sentence tags <s> 40 and 44 are added at the beginning of the source sentence and of the target sentence, respectively. The end-of-sentence tags </s> 42 and 46 are added at the end of each. Training on pairs tagged with <s> and </s> in this way, and adding the same tags to the source sentence at translation time, improves the translation performance of PBSMT, for the following reason.

In PBSMT training, the tags at the beginning and end of a sentence are each treated as a single word. As a result, phrases can be aligned more accurately. In the example above, when "これ" ("this") appears at the beginning of the source (Japanese) sentence, the source phrase "<s> これ" is aligned with "<s> This" in the target (English) sentence. When "これ" appears elsewhere in the sentence, it is aligned with "this". In other words, even the same phrase pair can be handled differently depending on where its phrases appear in the sentence pair. Adding tags as auxiliary information about word position therefore makes phrase alignment more appropriate.
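The effect of treating the tags as ordinary words can be seen in a small sketch. After tagging, the bigram ("<s>", "これ") exists only when これ is sentence-initial, so a phrase extractor sees "<s> これ" as a different source phrase from a bare "これ". The two sentence pairs are hypothetical examples.

```python
from typing import List, Tuple


def tag_pair(src: List[str], tgt: List[str]) -> Tuple[List[str], List[str]]:
    """Add <s> and </s> to both sides of a sentence pair as ordinary tokens."""
    return ["<s>"] + list(src) + ["</s>"], ["<s>"] + list(tgt) + ["</s>"]


def bigrams(words: List[str]) -> List[Tuple[str, str]]:
    """All adjacent word pairs, including pairs that contain a tag."""
    return [(words[i], words[i + 1]) for i in range(len(words) - 1)]


# これ at the start of the sentence vs. in the middle (hypothetical pairs).
initial_src, initial_tgt = tag_pair(["これ", "は", "ペン", "です"], ["This", "is", "a", "pen"])
medial_src, medial_tgt = tag_pair(["ペン", "は", "これ", "です"], ["The", "pen", "is", "this"])
```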

Patent Document 1: JP 2013-250605 A

Non-Patent Document 1: Taro Watanabe, Kenji Imamura, Hideto Kazawa, Graham Neubig, Toshiaki Nakazawa, Machine Translation (Natural Language Processing Series 4), Corona Publishing, ISBN 978-4-339-02754-9.
Non-Patent Document 2: Andrew Finch, Eiichiro Sumita, "Dynamic Model Interpolation for Statistical Machine Translation," Proceedings of the Third Workshop on Statistical Machine Translation, pp. 208-215, 2008.
Non-Patent Document 3: I. Sutskever, O. Vinyals, Q. V. Le, "Sequence to Sequence Learning with Neural Networks," Advances in Neural Information Processing Systems, pp. 3104-3112, 2014.

As described above, PBSMT can translate quickly and accurately. However, there is still room for improvement. One problem is that information beyond the span of a phrase is hard to bring into the translation, even with beginning- and end-of-sentence tags. Specific problems are listed below.

(1) It is difficult to translate differently according to the grammatical type of the source text.

Conventional PBSMT has difficulty producing appropriately different translations for source texts of different grammatical types. The reasons are considered to be as follows.

In Japanese-English translation, consider translating the noun phrase 60 "監査の結果" ("the result of the audit") into English, as shown in the upper part of FIG. 2. PBSMT first applies word-order conversion 62 to put the source text into an order close to English, yielding the word string 64 "結果の監査". Tag assignment 66 then adds the start tag <s> at the beginning and the end tag </s> at the end of the word string 64, yielding the word string 68. When PBSMT translation 70 is applied to the word string 68, the adverbial phrase 72 "As a result of the audit" may be produced as the translation of the noun phrase 60. That is, a noun phrase may be translated into an adverbial phrase rather than a noun phrase.

The lower part of FIG. 2 shows a similar example: translating the question 80 "Webサーバーのサービスは動作中か？" ("Is the Web server service running?") into English. Word-order conversion 82 turns the question 80 into the word string 84 "の Web サーバーサービスは動作中か？". Tag assignment 86 on this word string yields the word string 88 "<s> の Web サーバーサービスは動作中か？ </s>". Applying PBSMT translation 90 to the word string 88 yields the translation 92 "the web server service running ?", which is neither a question nor a declarative sentence.
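The upper example of FIG. 2 can be traced with a toy sketch. Its single hypothetical rule swaps the two sides of one "の" (A の B becomes B の A) and then adds <s>/</s>. This is not the pre-ordering method of Patent Document 1. It reproduces only word strings 64 and 68 and does not handle the question in the lower example.

```python
from typing import List


def toy_preorder(words: List[str]) -> List[str]:
    """Toy rule: rewrite 'A の B' as 'B の A' when exactly one 'の' occurs."""
    if words.count("の") != 1:
        return list(words)
    k = words.index("の")
    return words[k + 1:] + ["の"] + words[:k]


def with_boundary_tags(words: List[str]) -> List[str]:
    """Add the start tag <s> and the end tag </s> around a word list."""
    return ["<s>"] + list(words) + ["</s>"]


reordered = toy_preorder(["監査", "の", "結果"])  # corresponds to word string 64
tagged = with_boundary_tags(reordered)            # corresponds to word string 68
```

Nothing in `tagged` records that the input was a noun phrase, which is why the translation can come out as an adverbial phrase.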

These problems arise for the following reason.

In Japanese-English translation, suppose a tagged translation pair used for training is the following question:

<s> Ｗｅｂサーバーのサービスは動作中か？ </s>
<s> Is the Web server service running ? </s>

The training data may equally contain the following declarative pair:

<s> Ｗｅｂサーバーのサービスは動作中です。 </s>
<s> The Web server service is running . </s>

The two Japanese sentences differ only in their final words (か？ versus です。).

After word-order conversion, the pairs become:

<s> のＷｅｂサーバーサービスは動作中か？ </s>
<s> Is the Web server service running ? </s>

<s> のＷｅｂサーバーサービスは動作中です。 </s>
<s> The Web server service is running . </s>

The difference in surface form is again very small. As a result, the phrase table cannot be learned appropriately from such parallel data. Specifically, the identical Japanese phrase "<s> のＷｅｂサーバー" corresponds to "<s> Is the Web server service" in one pair and to "<s> The Web server service is" in the other. Within the span of this phrase, there is no way to decide which translation of "<s> のＷｅｂサーバー" to choose. Consequently the more frequent declarative translation is always selected, and the question is mistranslated.
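Why the majority reading wins can be shown with the relative-frequency estimate commonly used for phrase translation probabilities, φ(e|f) = count(f, e) / count(f). The 9-to-1 ratio of declarative to question occurrences below is hypothetical. With no information beyond the phrase, the highest-probability target is always the declarative one.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def phrase_probabilities(pairs: List[Pair]) -> Dict[Pair, float]:
    """Relative-frequency estimate phi(e|f) = count(f, e) / count(f)."""
    joint = Counter(pairs)
    marginal = Counter(f for f, _ in pairs)
    return {(f, e): c / marginal[f] for (f, e), c in joint.items()}


def best_target(table: Dict[Pair, float], f: str) -> str:
    """Pick the target phrase with the highest phi(e|f) for source phrase f."""
    candidates = {e: p for (src, e), p in table.items() if src == f}
    return max(candidates, key=candidates.get)


SRC = "<s> のＷｅｂサーバー"
DECL = "<s> The Web server service is"
QUES = "<s> Is the Web server service"
# Hypothetical counts: declarative sentences outnumber questions 9 to 1.
extracted = [(SRC, DECL)] * 9 + [(SRC, QUES)]
table = phrase_probabilities(extracted)
```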

Non-Patent Document 2 discloses one proposal for solving such problems. As the model used in PBSMT, it proposes linearly interpolating a model trained on translation pairs that are questions with a model trained on translation pairs that are not questions.
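The general form of such a linear interpolation is shown below. The symbols are introduced here for illustration: $p_{\mathrm{Q}}$ is the model trained on questions, $p_{\bar{\mathrm{Q}}}$ is the model trained on all other pairs, and $\lambda$ is the interpolation weight. How Non-Patent Document 2 sets $\lambda$ is not reproduced here.

```latex
p(e \mid f) = \lambda \, p_{\mathrm{Q}}(e \mid f) + (1 - \lambda) \, p_{\bar{\mathrm{Q}}}(e \mid f),
\qquad 0 \le \lambda \le 1
```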

Another possible approach is to build separate translation engines for questions and for noun phrases, so that questions are translated as questions and noun phrases as noun phrases. A typical translation apparatus of this kind is shown in FIG. 3.

Referring to FIG. 3, a parallel corpus 110 is prepared. A model learning unit 114 uses the parallel corpus 110 to train a plurality of models 112 for translation, one per grammar type. The model learning unit 114 divides the translation pairs of the parallel corpus into a plurality of sub-corpora 130 according to their grammar types (question, declarative sentence, noun phrase, and so on). Using these sub-corpora 130, it then performs PBSMT training 132 in the conventional manner to build the plurality of translation models 112. Each model 112 includes a phrase table and is suited to translating one specific grammar type. Incorporating each model into a separate machine translation device yields a translation engine suited to each grammar type. For example, incorporating the noun-phrase model into the machine translation device 120 makes that device a translation engine dedicated to noun phrases.
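The corpus-splitting step in FIG. 3 can be sketched as follows. The classifier here is a hypothetical stand-in that looks only at the sentence ending, and the three translation pairs are illustrative. Each sub-corpus is smaller than the whole corpus, so each model would be trained on only a fraction of the data.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def partition_by_grammar_type(corpus: List[Pair],
                              classify: Callable[[str], str]) -> Dict[str, List[Pair]]:
    """Split a parallel corpus into sub-corpora keyed by the source's grammar type."""
    parts: Dict[str, List[Pair]] = defaultdict(list)
    for src, tgt in corpus:
        parts[classify(src)].append((src, tgt))
    return dict(parts)


def classify_by_ending(sentence: str) -> str:
    """Hypothetical classifier using only the final character."""
    if sentence.endswith("？"):
        return "question"
    if sentence.endswith("。"):
        return "declarative"
    return "noun_phrase"


corpus = [
    ("Ｗｅｂサーバーのサービスは動作中か？", "Is the Web server service running ?"),
    ("Ｗｅｂサーバーのサービスは動作中です。", "The Web server service is running ."),
    ("監査の結果", "the result of the audit"),
]
parts = partition_by_grammar_type(corpus, classify_by_ending)
```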

At translation time, the appropriate engine is chosen from the per-grammar-type translation engines according to the grammar type of the input sentence 118. For example, the noun-phrase machine translation device 120 includes the following units:
- a morphological analysis unit 140 that performs morphological analysis on the input sentence 118;
- a syntactic analysis unit 142 that parses the morpheme sequence output by the morphological analysis unit 140, for use in pre-ordering the source text;
- a pre-ordering unit 144 that uses the parse result from the syntactic analysis unit 142 to reorder the words of the input sentence 118 into an order close to English;
- a tag assignment unit 146 that adds the start tag <s> at the beginning and the end tag </s> at the end of the reordered input sentence 118;
- a PBSMT device 148 that performs PBSMT on the reordered and tagged input sentence 118 and outputs a translation 122.

FIG. 4 shows, in flowchart form, the control structure of a computer program that implements the tag assignment unit 146 shown in FIG. 3. Referring to FIG. 4, this program is called from a parent routine with the input sentence (word string) as an argument. The program includes step 160 and step 162. Step 160 declares a variable STR that stores a character string. Step 162 concatenates the start tag <s>, the input sentence (word string), and the end tag </s> in this order and stores the result in STR. It then returns control to the parent routine with STR as the return value.
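The two steps of FIG. 4 map directly onto a short function. This sketch assumes that the word string is space-delimited and that the tags are joined to it with single spaces. The flowchart itself does not specify a separator.

```python
def add_sentence_tags(word_string: str) -> str:
    """Python rendering of the FIG. 4 flowchart (tag assignment unit 146)."""
    # Step 160: declare the variable (STR in FIG. 4) that holds the tagged string.
    # Step 162: concatenate <s>, the input word string, and </s> in this order,
    # then return the result to the calling (parent) routine.
    tagged = f"<s> {word_string} </s>"
    return tagged
```

For example, `add_sentence_tags("結果 の 監査")` returns `"<s> 結果 の 監査 </s>"`. The same tags are added whatever the input's grammatical type.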

With this method, a translation engine dedicated to noun phrases, for example, can be built. Training on a parallel corpus consisting only of titles of scientific and technical papers and patent documents, for instance, makes it possible to create a translation engine dedicated to titles.

However, this method requires dividing the parallel corpus 110 by sentence type to build the translation engines. As a result, less parallel data is available to train each engine. The amount of parallel training data is known to strongly affect translation accuracy, so engines built per grammar type translate less accurately. Moreover, operating several translation engines raises operating costs.

Conventional technology not only fails to translate appropriately by grammar type but also cannot translate differently according to the situation. For example, English "Hello" should be translated as "こんにちは" in a face-to-face conversation but as "もしもし" in a telephone conversation. Conventional PBSMT cannot make this distinction unless the approach shown in FIG. 3 is taken.

Conventional technology also cannot translate differently according to the speaker. In medical translation, for example, the same sentence may need different translations depending on whether a patient or a nurse is speaking. When "薬を飲みます" ("take the medicine") is translated into English, the subject must be "I" if the patient is speaking, but "You" if the nurse is speaking. Conventional PBSMT cannot make this speaker-dependent distinction either unless the approach shown in FIG. 3 is taken.

Furthermore, conventional PBSMT cannot translate differently according to context. Consider translating Japanese "はい" into English. In response to "あなたはりんごが好きですか？" ("Do you like apples?"), "はい" should be translated as "Yes". In response to "あなたはりんごがすきじゃないですか？" ("Don't you like apples?"), "はい" must be translated as "No". Conventional PBSMT cannot make such context-dependent distinctions. Even the approach of FIG. 3 is practically impossible to apply here, because the number of possible contexts is unbounded.

The difficulty of translating differently by grammar type, situation, speaker, and context ultimately means that accurate translation lacks the information it needs from beyond the source text. It may be possible to feed such information into a machine translation device, but doing so through complex processing that raises translation cost is undesirable.

The problems above also arise with translation methods other than PBSMT. For example, the same problems exist in machine translation using LSTM (Long Short-Term Memory) networks (see Non-Patent Document 3).

There is therefore a need for a machine translation device that can translate a source text differently, as appropriate, according to information that extends beyond the source text.

A machine translation device according to a first aspect of the present invention includes three components. Meta information specifying means specifies meta information about a translation. Meta-information-dependent tag insertion means inserts a tag corresponding to the specified meta information at a predetermined position in the source text. Machine translation means receives the tagged source text as input. A plurality of predetermined types of meta information are defined, and the meta-information-dependent tag insertion means selects the tag according to the type of meta information.

Preferably, the meta-information-dependent tag insertion means includes range-specifying tag insertion means. To mark the range of the source text that is to be translated using the meta information, this means inserts a first tag corresponding to the meta information at the start position of that range and a second tag at its end position.
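A minimal sketch of range-specifying tag insertion follows. The tag pair is chosen by the kind of meta information, and its two tags are placed around a range of the source text. The <NP>/<SQ> tags follow the first embodiment described later. The dictionary keys and the end-exclusive range convention are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Tag pairs per kind of meta information; the keys are illustrative names.
RANGE_TAGS: Dict[str, Tuple[str, str]] = {
    "noun_phrase": ("<NP>", "</NP>"),
    "question": ("<SQ>", "</SQ>"),
}


def insert_range_tags(words: List[str], start: int, end: int, meta: str,
                      tags: Dict[str, Tuple[str, str]] = RANGE_TAGS) -> List[str]:
    """Insert the first tag before words[start] and the second after words[end - 1]."""
    if not 0 <= start <= end <= len(words):
        raise ValueError("invalid range")
    first, second = tags[meta]  # tag choice depends on the kind of meta information
    return words[:start] + [first] + words[start:end] + [second] + words[end:]
```

Passing `start=0` and `end=len(words)` tags the whole sentence, which is the case used in the embodiments.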

Preferably, the meta information specifying means includes three parts. Morphological analysis means performs morphological analysis on the source text. Syntactic analysis means parses the morphologically analyzed source text. Grammar type output means outputs information indicating the grammar type of the source text, obtained from the parse result, as meta information of the source text.
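The output of the grammar type output means is a label such as "question" or "noun_phrase". The sketch below produces such a label with a crude surface heuristic on already-segmented Japanese morphemes, as a stand-in for the morphological and syntactic analysis described above. The rules and labels are assumptions and would misclassify many real inputs.

```python
from typing import List


def guess_grammar_type(morphemes: List[str]) -> str:
    """Crude stand-in for the grammar type obtained from a syntactic parse.

    Looks only at the final morphemes of a segmented Japanese sentence.
    """
    if not morphemes:
        return "other"
    last = morphemes[-1]
    if last in ("？", "?", "か"):
        return "question"
    if last in ("。", "."):
        # A sentence-final particle か before the period also marks a question.
        if len(morphemes) >= 2 and morphemes[-2] == "か":
            return "question"
        return "declarative"
    # No sentence-final punctuation: treat the input as a noun phrase.
    return "noun_phrase"
```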

More preferably, meta information about the source text is attached to the source text. The meta information specifying means then includes meta information separation means, which separates the attached meta information from the source text and passes it to the meta-information-dependent tag insertion means.
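The document does not say how meta information is attached to the source text. The sketch below assumes a hypothetical bracketed prefix such as `[speaker=nurse|scene=hospital]`. It separates that prefix into a dictionary and returns the remaining source text unchanged.

```python
import re
from typing import Dict, Tuple

# Hypothetical format: "[key=value|key=value] source text"
_META_PREFIX = re.compile(r"^\[([^\]]*)\]\s*")


def separate_meta(line: str) -> Tuple[Dict[str, str], str]:
    """Split an input line into (meta information, source text)."""
    match = _META_PREFIX.match(line)
    if not match:
        return {}, line  # no attached meta information
    meta: Dict[str, str] = {}
    for item in match.group(1).split("|"):
        if "=" in item:
            key, value = item.split("=", 1)
            meta[key.strip()] = value.strip()
    return meta, line[match.end():]
```

The returned dictionary would then be handed to the tag insertion step, and the source text to translation.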

Still more preferably, the meta information is selected from the group consisting of:
- the grammar type of the source text;
- scene information about the situation in which the source text is uttered;
- speaker information about the speaker who utters the source text;
- the grammar type of the preceding source text, that is, a sentence translated by the machine translation means before the current source text;
- language information specifying the target language.

More preferably, the machine translation means is phrase-based machine translation means.

Still more preferably, the meta information specifying means includes means for specifying the target language of the source text as meta information. The meta-information-dependent tag insertion means then includes means for inserting a tag indicating that target language at a predetermined position in the source text.
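A sketch of target-language tagging follows. It assumes that the predetermined position is the beginning of the source text and that the tag has the form `<2xx>`. Both the tag format and the set of supported languages are hypothetical, since the document fixes neither.

```python
from typing import List

SUPPORTED_TARGETS = {"en", "ja", "zh"}  # hypothetical set of target languages


def add_target_language_tag(words: List[str], target_lang: str) -> List[str]:
    """Prepend a tag naming the target language (tag format is an assumption)."""
    if target_lang not in SUPPORTED_TARGETS:
        raise ValueError(f"unsupported target language: {target_lang}")
    return [f"<2{target_lang}>"] + list(words)
```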

A computer program according to a second aspect of the present invention causes a computer to function as any of the machine translation devices described above.

FIG. 1 is a schematic diagram showing tag assignment in conventional PBSMT.
FIG. 2 is a schematic diagram showing how a mistranslation arises in conventional PBSMT.
FIG. 3 is a block diagram showing a method of building a separate translation engine for each grammar type with conventional PBSMT.
FIG. 4 is a flowchart outlining the control structure of a program that adds a start tag and an end tag at the beginning and end of an input sentence during translation in conventional PBSMT.
FIG. 5 is a diagram showing the translation process in PBSMT according to the first embodiment of the present invention.
FIG. 6 is a block diagram showing the functional configuration of the PBSMT system according to the first embodiment and of its learning processing unit.
FIG. 7 is a flowchart outlining the control structure of a program that adds tags at the beginning and end of an input sentence during translation in the PBSMT system according to the first embodiment.
FIG. 8 is a block diagram showing the functional configuration of the PBSMT system according to the second embodiment and of its learning processing unit.
FIG. 9 is a block diagram showing the functional configuration of the PBSMT system according to the third embodiment and of its learning processing unit.
FIG. 10 is a block diagram showing the functional configuration of the translation system according to the fourth embodiment and of its learning processing unit.
FIG. 11 is a flowchart showing the control structure of a program that implements the tag assignment unit of the learning processing unit shown in FIG. 10.
FIG. 12 is a schematic diagram explaining how the neural network (NN) learning unit shown in FIG. 10 trains the NN.
FIG. 13 is a diagram showing the appearance of a computer that implements the translation system and its learning processing unit according to each embodiment.
FIG. 14 is a block diagram showing the hardware configuration of the computer whose appearance is shown in FIG. 13.

In the following description and drawings, identical components are given identical reference numerals, and their detailed description is not repeated.

[Basic Concept]
In each embodiment described below, meta information extending beyond the span of a phrase is attached to the source text, and the translator consults it at translation time to choose an appropriate translation. In the following embodiments, the meta information takes the form of tags attached to the source text. Several kinds of tags are prepared, and different tags are attached according to the grammatical type of the source text (first embodiment), the scene or speaker (second embodiment), the preceding context (third embodiment), or the target language (fourth embodiment). This allows the translation to vary appropriately. The same tagging must also be applied during training, when the translation model, including the phrase table, is learned.

なお、以下の第１〜第３の実施の形態では、入力文に対して事前並替を行った後、入力文の文頭と文末に、メタ情報を表すタグを付している。事前並替とは、翻訳に先立って、原文の語順を翻訳先の言語の語順に近い語順に変換することをいう。事前並替により、統計的翻訳装置では翻訳精度が高まることが知られている（非特許文献１の１５５ページ〜１５９ページ）。しかし本発明はそのような実施の形態には限定されない。例えば事前並替を行わないようなＰＢＳＭＴにおいて、上記したメタ情報を用いることも可能である。また、ＰＢＳＭＴに対してメタ情報を適用することにより最も大きな効果が得られるが、言語モデルの構築にメタ情報が利用されることになるため、ＰＢＳＭＴ以外の一般の統計的翻訳装置に対しても効果があると考えられる。第４の実施の形態は、ＰＢＳＭＴを使用したものではなく、いわゆるディープニューラルネットワーク（ＤＮＮ）の一種であるＬＳＴＭを使用したSequence-to-Sequence型の翻訳を行うシステムである。以下の実施の形態では、入力文の文法タイプ、話者又は相手に関する情報、場面に関する情報、文脈に関する情報、及び翻訳先の言語を特定する情報等をメタ情報として用いている。しかし、メタ情報はこれらには限定されず、翻訳に有用な情報であればどのような情報を用いてもよい。 In the following first to third embodiments, after performing pre-ordering on the input sentence, tags representing meta information are attached to the beginning and end of the input sentence. Prior rearrangement means that prior to translation, the word order of the original text is converted into a word order that is close to the word order of the language of the translation destination. It is known that the translation accuracy is improved in the statistical translation device by the prior rearrangement (pages 155 to 159 of Non-Patent Document 1). However, the present invention is not limited to such an embodiment. For example, the above-described meta information can be used in PBSMT that does not perform pre-ordering. Moreover, the greatest effect can be obtained by applying meta information to PBSMT. However, since meta information is used to construct a language model, it can be applied to general statistical translation devices other than PBSMT. It is considered effective. The fourth embodiment is a system that performs Sequence-to-Sequence type translation using LSTM, which is a kind of so-called deep neural network (DNN), instead of using PBSMT. 
The following embodiments use as meta information the grammatical type of the input sentence, information about the speaker or the other party, information about the scene, information about the context, and information specifying the target language. The meta information is not limited to these, however; any information useful for translation may be used.

Non-Patent Document 1 introduces several approaches to pre-ordering: writing reordering rules by hand, learning a reordering model from a corpus, and automatically learning the parser used for reordering itself. Any of these approaches may be used in the first to third embodiments described below. Although all of the first to third embodiments perform pre-ordering, using meta information can be expected to improve translation accuracy over translation without meta information even when pre-ordering is not performed.

[First Embodiment]
The PBSMT system according to the first embodiment performs PBSMT and uses multiple kinds of tags, as meta information, to represent the grammatical type of the input. During training, if the source side of a parallel sentence pair is a noun phrase, a start tag <NP> is attached to the beginning and an end tag </NP> to the end of the pre-ordered word string, and PBSMT is trained on the result. If the source side is a question, the start tag <SQ> and the end tag </SQ> are attached to the beginning and end of the pre-ordered word string instead. At translation time, the pre-ordered input sentence receives, exactly as during training, the tags for the grammatical type obtained from syntactic parsing, and PBSMT is then applied.

For example, referring to the upper part of FIG. 5, consider the case where the input to translation is the noun phrase 60 「監査の結果」 ("the result of the audit"; literally "audit 's result"). Word-order conversion 62 yields the word string 64 「結果の監査」 (literally "result of audit"). The grammatical-type tagging process 180 described above is applied to word string 64. If the tag for noun phrases is <NP>, the word string 182 「<NP> 結果の監査 </NP>」 is obtained. Applying PBSMT translation 184 to word string 182 yields the word string 186 "results of the audit" as the translation.

A similar example appears in the lower part of FIG. 5. Consider the case where the input is the question 80 「Webサーバーのサービスは動作中か？」 ("Is the Web server's service running?"). Applying word-order conversion 82 to question 80 yields the word string 84. The grammatical-type tagging process 190 is then applied to this word string, here using <SQ> as the tag for questions. The result is the word string 192 with the tag <SQ> at its beginning and the tag </SQ> at its end. Applying PBSMT translation 194 to word string 192 yields the translation 196 "Is the web server service running?"

When pre-ordering is performed, the nodes whose positions are to be exchanged must be selected in the parse tree of the source sentence and then swapped. Pre-ordering therefore requires the source sentence to be parsed beforehand, and the grammatical type of the source sentence is obtained as a by-product of that parse. The following embodiment uses this grammatical type to decide which kind of tag to attach.
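The node-swapping step can be pictured with a toy sketch. It assumes a hypothetical, deliberately flat parse tree in which a reordering rule or model (not shown) has already marked the nodes to swap with a `swap` flag; `Node` and `reordered_words` are illustrative names, not part of the described system, and real pre-orderers such as those surveyed in Non-Patent Document 1 decide which nodes to swap by hand-written rules or learned models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy parse-tree node: a leaf carries a word, an inner node carries children."""
    label: str
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    swap: bool = False  # hypothetical flag: True if this node's children should be reversed


def reordered_words(node: Node) -> List[str]:
    """Read off the leaves, reversing the children of every node flagged for swapping."""
    if node.word is not None:
        return [node.word]
    kids = list(reversed(node.children)) if node.swap else node.children
    words: List[str] = []
    for child in kids:
        words.extend(reordered_words(child))
    return words


# 監査 の 結果 ("the result of the audit") as a flat noun-phrase node marked for swapping.
phrase = Node("NP", swap=True, children=[
    Node("N", word="監査"), Node("P", word="の"), Node("N", word="結果"),
])
print(" ".join(reordered_words(phrase)))  # 結果 の 監査
```

The same parse that drives the swap also carries the root label (here "NP"), which is why the grammatical type comes essentially for free once pre-ordering is in place.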

<Configuration>
Referring to FIG. 6, the PBSMT system 210 according to this embodiment includes: a learning processing unit 224 that uses the parallel data in the parallel corpus 220 as training data, applies the grammatical-type tagging described above, learns a statistical translation model including a phrase table, and outputs it to the model storage unit 222; and a machine translation device 230 that, given an input sentence 226, performs PBSMT using the translation model stored in the model storage unit 222 and outputs a translation 228.

The learning processing unit 224, which uses grammatical types, includes: a parallel-sentence reading unit 250 that reads the sentence pairs in the parallel corpus 220 and splits each pair into a source sentence and a target sentence; a source-sentence processing unit 252 that, for the source sentence of each pair output by the reading unit 250, identifies its grammatical type and attaches tags according to that type; a target-sentence processing unit 254 that attaches tags to the target sentence of each pair in the conventional way and outputs it; a training-data storage unit 256 that stores, as training data for the model, pairs consisting of the grammar-tagged source sentence output by the source-sentence processing unit 252 and the conventionally tagged target sentence output by the target-sentence processing unit 254; and a model learning unit 258 that learns the statistical translation model from the training data in the training-data storage unit 256 in the conventional way and stores it in the model storage unit 222. The model learning unit 258 itself functions as in the conventional system, but because the source sentences carry grammatical-type tags, the model stored in the model storage unit 222 differs from a conventional one.
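How one training pair might look once both sides are tagged can be sketched as follows. This is an illustration under stated assumptions: the source words are taken to be already pre-ordered and classified, the tag inventory contains only the two tags named in the description, and `<s>`/`</s>` are assumed here as the "conventional" target-side boundary tags, which the description does not spell out. `make_training_pair` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Hypothetical tag inventory; the description only names <NP> and <SQ>.
GRAMMAR_TAGS: Dict[str, Tuple[str, str]] = {
    "NP": ("<NP>", "</NP>"),
    "SQ": ("<SQ>", "</SQ>"),
}
# Assumed conventional sentence-boundary tags for the target side.
TARGET_TAGS = ("<s>", "</s>")


def make_training_pair(src_words: List[str], grammar_type: str,
                       tgt_words: List[str]) -> Tuple[str, str]:
    """Pair a grammar-type-tagged (pre-ordered) source with a conventionally tagged target."""
    start, end = GRAMMAR_TAGS[grammar_type]
    src = " ".join([start, *src_words, end])
    tgt = " ".join([TARGET_TAGS[0], *tgt_words, TARGET_TAGS[1]])
    return src, tgt


pair = make_training_pair(["結果", "の", "監査"], "NP", ["results", "of", "the", "audit"])
print(pair)
```

Only the source side changes relative to conventional training, which matches the point above that the model learning unit itself can stay unchanged.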

The source-sentence processing unit 252 includes: a morphological analysis unit 260 that performs morphological analysis on the source sentence received from the reading unit 250 and outputs a morpheme sequence; a grammatical-type determination unit 262 that parses the morpheme sequence output by the morphological analysis unit 260, determines the grammatical type at the same time, and outputs the parse result and the grammatical type separately; a pre-ordering unit 264 that uses the parse result from the determination unit 262 to reorder the words of the input source sentence, prior to translation, into an order close to that of the target language, and outputs the result; and a grammatical-type tagging unit 266 that attaches, to the beginning and end of the reordered word string output by the pre-ordering unit 264, the start and end tags corresponding to the grammatical type received from the determination unit 262, and outputs the tagged word string to the training-data storage unit 256.

FIG. 7 is a flowchart showing an example of the control structure of a computer program that implements the grammatical-type tagging unit 266 of FIG. 6. Referring to FIG. 7, this program is called from a parent routine with an input word string and a grammatical type as arguments, and returns the word string with a start tag for that grammatical type at its beginning and the corresponding end tag at its end. The program includes: step 160, declaring a string variable STR used for string manipulation; step 300, selecting the start and end tags corresponding to the grammatical type received as an argument; and step 302, setting STR to the concatenation of the start tag selected in step 300, the input word string received as an argument, and the end tag selected in step 300, and ending the routine with STR as the return value.

To select the start and end tags for the grammatical type in step 300, the correspondence between grammatical types and tags may be written directly into this routine; alternatively, a separate lookup table may be stored in memory and the start and end tags read from it using the grammatical type as the key.
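The routine of FIG. 7, in its lookup-table variant, might be sketched in Python as follows. The table contents beyond NP and SQ, and the function name `add_grammar_type_tags`, are assumptions for illustration; step 160 (declaring STR) has no counterpart because Python needs no variable declaration.

```python
from typing import Dict, Tuple

# Lookup table keyed by grammatical type (the alternative suggested for step 300).
# Only NP and SQ appear in the description; other types would be added the same way.
TAG_TABLE: Dict[str, Tuple[str, str]] = {
    "NP": ("<NP>", "</NP>"),
    "SQ": ("<SQ>", "</SQ>"),
}


def add_grammar_type_tags(word_string: str, grammar_type: str) -> str:
    """Wrap a pre-ordered word string in the start/end tags for its grammatical type."""
    # Step 300: select the start and end tags for this grammatical type.
    start_tag, end_tag = TAG_TABLE[grammar_type]
    # Step 302: concatenate start tag, input word string and end tag, and return the result.
    return f"{start_tag} {word_string} {end_tag}"


print(add_grammar_type_tags("結果 の 監査", "NP"))  # <NP> 結果 の 監査 </NP>
```

As written, an unknown grammatical type raises `KeyError`; a working system would need a policy for types missing from the table (for example a generic default tag pair), which the description does not specify.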

The target-sentence processing unit 254 is the same as the one used in conventional PBSMT. It includes a tagging unit 274 that receives the target side of each sentence pair from the reading unit 250, attaches the same start and end tags as in the conventional system to the beginning and end of the word string forming the target sentence, and outputs the result to the training-data storage unit 256.

The machine translation device 230 includes: a morphological analysis unit 280 that performs morphological analysis on the input sentence 226; a grammatical-type determination unit 282 that parses the morpheme sequence output by the morphological analysis unit 280, determines the grammatical type of the input sentence 226 from the parse result, and outputs the parse result and the grammatical type; a pre-ordering unit 284 that receives the parse result from the determination unit 282 and, prior to translation, outputs the word string obtained by reordering the words into an order close to that of the target language; a grammatical-type tagging unit 286 that receives the word string output by the pre-ordering unit 284 and the grammatical type output by the determination unit 282, and attaches a start tag for that type to the beginning of the word string and the corresponding end tag to its end; and a PBSMT device 288 that takes the tagged word string output by the tagging unit 286 as input, performs PBSMT, and outputs the translation 228.
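The data flow through units 280 to 288 can be summarized as a chain of injected stages. The sketch below is a hypothetical wiring, not the patented implementation: every stage is a placeholder callable, and the stubs in the usage example (whitespace splitting, a fixed "NP" type, list reversal as "pre-ordering", and an echoing "decoder") exist only to show the order in which data passes between units.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class TaggedTranslationPipeline:
    """Hypothetical wiring of units 280-288; every stage is an injected callable."""
    morph_analyze: Callable[[str], List[str]]                   # unit 280
    parse_and_classify: Callable[[List[str]], Tuple[Any, str]]  # unit 282: (parse, grammar type)
    preorder: Callable[[Any], List[str]]                        # unit 284
    add_tags: Callable[[List[str], str], List[str]]             # unit 286
    decode: Callable[[List[str]], str]                          # unit 288 (PBSMT decoder)

    def translate(self, sentence: str) -> str:
        morphemes = self.morph_analyze(sentence)
        parse, grammar_type = self.parse_and_classify(morphemes)
        reordered = self.preorder(parse)
        tagged = self.add_tags(reordered, grammar_type)
        return self.decode(tagged)


# Stub stages for illustration only: the "decoder" just echoes its tagged input.
pipeline = TaggedTranslationPipeline(
    morph_analyze=lambda s: s.split(),
    parse_and_classify=lambda morphemes: (morphemes, "NP"),
    preorder=lambda parse: list(reversed(parse)),
    add_tags=lambda words, t: [f"<{t}>", *words, f"</{t}>"],
    decode=lambda tokens: " ".join(tokens),
)
print(pipeline.translate("監査 の 結果"))  # <NP> 結果 の 監査 </NP>
```

Keeping the decoder as an opaque stage mirrors the observation that the PBSMT device itself needs no modification; only the stages in front of it change.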

The grammatical-type tagging unit 286 has the same function as the grammatical-type tagging unit 266 and, in this embodiment, the same configuration.

<Operation>
The PBSMT system 210 configured as shown in FIGS. 6 and 7 operates as follows. Its operation falls broadly into two phases: a training phase, in which the model stored in the model storage unit 222 is learned, and a test or translation phase performed by the machine translation device 230. Some training methods learn the model directly from the training data; others learn the model from the training data and then optimize the weights of the features given to the model. This embodiment is effective with either method.

A large number of parallel sentence pairs are stored in the parallel corpus 220 in advance. All of the pairs prepared here are assumed to be phrase-aligned already.

The reading unit 250 reads each sentence pair from the parallel corpus 220 in turn, passes the source sentence to the morphological analysis unit 260 of the source-sentence processing unit 252, and passes the target sentence to the tagging unit 274 of the target-sentence processing unit 254.

The morphological analysis unit 260 performs morphological analysis on the source sentence received from the reading unit 250 and outputs a morpheme sequence. The grammatical-type determination unit 262 parses this morpheme sequence, determines the grammatical type at the same time, and outputs the parse result and the grammatical type separately. The pre-ordering unit 264 uses the parse result to reorder the words of the source sentence, prior to translation, into an order close to that of the target language. The grammatical-type tagging unit 266 attaches start and end tags corresponding to the grammatical type received from the determination unit 262 to the beginning and end of the reordered word string output by the pre-ordering unit 264, and outputs the result to the training-data storage unit 256.

The tagging unit 274 of the target-sentence processing unit 254 attaches the same start and end tags as in the conventional system to the beginning and end of the word string forming the target sentence of each pair, and outputs the result to the training-data storage unit 256.

The training-data storage unit 256 stores, as a pair, the source sentence tagged by grammatical type, output by the tagging unit 266, and the conventionally tagged target sentence, output by the tagging unit 274. The model learning unit 258 trains the model on the training data stored in the training-data storage unit 256 and stores its parameters in the model storage unit 222.

At translation time, the machine translation device 230 operates as follows.

The model storage unit 222 holding the trained model is connected to the machine translation device 230 so that the device can refer to it. This connection may be realized by storing the model on the hard disk of the computer implementing the machine translation device 230 and loading it into memory so that the computer's CPU can read it, or by connecting the computer to the model storage unit 222 over a network and storing the model in the computer's internal storage via that network.

In response to receiving the input sentence 226, the morphological analysis unit 280 performs morphological analysis on it and passes the resulting morpheme sequence to the grammatical-type determination unit 282. The morphological analysis by unit 280 may be triggered by receiving a specific code after the input sentence 226, or it may start, independently of the input of sentence 226, in response to a command from the user instructing that translation begin.

The grammatical-type determination unit 282 parses the morpheme sequence output by the morphological analysis unit 280 in the same way as the determination unit 262, uses the result to determine the grammatical type of the input sentence 226, passes the parse result to the pre-ordering unit 284, and passes the grammatical type to the grammatical-type tagging unit 286.

Using the parse result of the input sentence 226 received from the determination unit 282, the pre-ordering unit 284 converts, prior to translation, the order of the words forming the input sentence 226 into an order close to that of the target language, and passes the result to the grammatical-type tagging unit 286.

The grammatical-type tagging unit 286 attaches a start tag corresponding to the grammatical type received from the determination unit 282 to the beginning of the reordered word string received from the pre-ordering unit 284, and the corresponding end tag to its end. It then passes the tagged word string to the PBSMT device 288 as the source text to be translated.

The PBSMT device 288 performs PBSMT on the word string received from the tagging unit 286 as the source text, referring to the model stored in the model storage unit 222, and outputs the translation 228.

<Effects of This Embodiment>
In the PBSMT system 210 according to the first embodiment, tags that differ by grammatical type are attached to the beginning and end of each sentence. PBSMT treats these tags as words that can form part of a phrase, so the same phrase can be distinguished according to whether it occurs at the beginning of a sentence or in the middle. Because the tags also distinguish declarative sentences from questions, phrase pairs extracted from declarative sentences and those extracted from questions contain different tags. The phrase table can therefore be learned accurately, and translation accuracy improves. Moreover, this requires no change at all to the PBSMT device itself, so the accuracy of machine translation can be improved with a simple configuration.
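Why boundary tags separate otherwise identical phrases can be seen on the source side alone. The sketch below is only a crude stand-in: real phrase extraction works on word-aligned sentence pairs with consistency constraints, whereas this just enumerates contiguous source n-grams. The token strings are hand-segmented toys (not actual pre-ordered output), and `<S>` is a hypothetical tag for declarative sentences.

```python
from typing import List, Set


def source_ngrams(tokens: List[str], max_len: int = 2) -> Set[str]:
    """All contiguous n-grams of length 1..max_len: a crude stand-in for source-side phrase keys."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


# Toy, hand-segmented strings. "<S>" is a hypothetical declarative tag; <SQ> marks questions.
declarative = ["<S>", "サービス", "は", "動作", "中", "</S>"]
question = ["<SQ>", "サービス", "は", "動作", "中", "か", "</SQ>"]

only_in_declarative = source_ngrams(declarative) - source_ngrams(question)
print(sorted(only_in_declarative))
```

Interior n-grams such as "サービス は" are shared by both sentences, but the sentence-initial "<S> サービス" and "<SQ> サービス" become distinct keys, so statistics gathered from questions and from declarative sentences are kept apart at the sentence edges.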

In the first embodiment described above, parsing is required for pre-ordering, and the grammatical type obtained from the parse is used to choose the tags. The present invention is not limited to such an embodiment, however. When pre-ordering is not performed, a classifier capable of determining the grammatical type of the source sentence may be built separately by machine learning and used instead.
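A separately trained grammatical-type classifier could in principle be very small. The sketch below is a toy under stated assumptions, not a recommended model: it learns feature-label counts from a handful of hypothetical labelled sentences, using only sentence-final characters as features, and predicts by majority vote. The label "S" (declarative), the training examples, and the default "NP" are all illustrative choices.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def features(sentence: str) -> List[str]:
    """Sentence-final character features (a deliberately tiny, hypothetical feature set)."""
    s = sentence.strip()
    return [f"last1={s[-1:]}", f"last2={s[-2:]}"]


def train(examples: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each feature co-occurs with each grammatical-type label."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for sentence, label in examples:
        for f in features(sentence):
            table[f][label] += 1
    return table


def classify(table: Dict[str, Counter], sentence: str, default: str = "NP") -> str:
    """Predict the label with the most feature votes; fall back to `default` if no feature was seen."""
    votes: Counter = Counter()
    for f in features(sentence):
        votes.update(table.get(f, Counter()))
    return votes.most_common(1)[0][0] if votes else default


# Hypothetical labelled examples; "S" stands for a declarative sentence.
examples = [
    ("監査の結果", "NP"),
    ("Webサーバーのサービスは動作中か？", "SQ"),
    ("サービスは動作中です。", "S"),
    ("ログの一覧", "NP"),
    ("ファイルは存在しますか？", "SQ"),
]
model = train(examples)
print(classify(model, "ディスクは正常か？"))  # SQ
```

Any classifier with the same interface (sentence in, grammatical type out) could replace this one without changing the tagging step that consumes its output.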

[Second Embodiment]
In the first embodiment, tags that differ by grammatical type are attached to the source sentence as meta information, using the grammatical type obtained by parsing the source sentence at training and translation time. The present invention is not limited to such an embodiment, however. For example, meta information may be attached to the source sentence in advance and tags representing it added. The second embodiment relates to such a translation system. This embodiment also uses PBSMT as the machine translation method.

<Configuration>
FIG. 8 shows the functional configuration of the PBSMT system 320 according to the second embodiment. Referring to FIG. 8, the PBSMT system 320 includes: a learning processing unit 340 that uses meta information, learning a model for PBSMT from a meta-information-annotated parallel corpus 240 consisting of sentence pairs with meta information attached, and storing the model parameters in a model storage unit 342; and a machine translation device 348 that uses the model parameters stored in the model storage unit 342 to machine-translate an input sentence 344 with meta information and outputs a translation 346.

The learning processing unit 340 includes: the parallel-sentence reading unit 250; a source-sentence processing unit 360 that receives the source side of each sentence pair from the reading unit 250, attaches tags according to its meta information, and outputs it; the same target-sentence processing unit 254 as in FIG. 6, which receives the target side of each pair from the reading unit 250, attaches conventional tags, and outputs it; a training-data storage unit 362 that stores, in association with each other, the source word string tagged according to its meta information, output by the source-sentence processing unit 360, and the conventionally tagged target sentence output by the target-sentence processing unit 254; and a model learning unit 364 that learns the model for PBSMT from the training data stored in the training-data storage unit 362 and stores the model parameters in the model storage unit 342.

The source-sentence processing unit 360 includes: the same morphological analysis unit 260 as in the first embodiment; a parsing unit 372 that performs the same parsing as the grammatical-type determination unit 262 of the first embodiment; the pre-ordering unit 264; a meta-information separation unit 370 that separates the meta information from the source sentence received from the reading unit 250; and a tagging unit 374 that attaches tags corresponding to the meta information received from the separation unit 370 to the beginning and end of the pre-ordered word string received from the pre-ordering unit 264, and outputs the result to the training-data storage unit 362.

Possible meta information includes information identifying the speaker or indicating the speaker's gender, age, or occupation; information identifying the other party or indicating that party's gender, age, or occupation; and information indicating the relationship between the speaker and the other party. Information indicating the scene may include, for example, whether the exchange is face-to-face, by telephone, or by video conference. By attaching meta information in advance to each sentence pair stored in the meta-information-annotated parallel corpus 240, a statistical model can be learned over word strings that include the meta information.

The machine translation device 348 includes: the morphological analysis unit 280, which receives the word-string portion of the input sentence 344 with meta information; a parsing unit 382 that parses the morpheme sequence output by the morphological analysis unit 280; the pre-ordering unit 284, which uses the parse result from the parsing unit 382 to reorder the word string forming the input sentence into an order close to that of the target language; a meta-information separation unit 380 that separates the meta information from the input sentence 344; a meta-information tagging unit 384 that attaches tags of a kind corresponding to the meta information output by the separation unit 380 to the beginning and end of the pre-ordered word string received from the pre-ordering unit 284, and outputs the result; and the PBSMT device 288, which takes the word string tagged by the meta-information tagging unit 384 as input, performs PBSMT using the machine translation model based on the model parameters stored in the model storage unit 342, and outputs the translation 346.

<Operation>
In the first embodiment shown in FIG. 6, tags for each grammatical type are attached to the word string at training time, using the grammatical type determined by the grammatical-type determination unit 262. In the second embodiment, by contrast, at training time the meta-information separation unit 370 separates the meta information from sentence pairs to which it was attached in advance, and the tagging unit 374 attaches tags that differ according to that meta information to the word string. By deciding in advance what to use as meta information and attaching it to the training sentence pairs, a model for machine translation using meta information can be learned efficiently.

The same applies at translation time: meta information is attached to the input sentence 344. The meta-information separation unit 380 separates this meta information and passes it to the meta-information tagging unit 384, which attaches tags that differ according to the meta information to the source word string and inputs the result to the PBSMT device 288. By attaching to the input sentence 344 the same kinds of meta information used during training, a translation 346 appropriate to that meta information is obtained.
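The separation and tagging steps (units 370/374 at training time, 380/384 at translation time) can be sketched as below. The input format is entirely an assumption, since the description does not say how meta information is attached: here a line is taken to be hypothetical `key=value;key=value` pairs, a tab, then the sentence. The tag spelling and the choice of one nested tag pair per meta-information item are likewise illustrative.

```python
from typing import Dict, List, Tuple


def separate_meta(line: str) -> Tuple[Dict[str, str], str]:
    """Split a hypothetical 'key=value;key=value<TAB>sentence' line (format is an assumption)."""
    meta_part, sentence = line.split("\t", 1)
    meta = dict(item.split("=", 1) for item in meta_part.split(";") if item)
    return meta, sentence


def add_meta_tags(words: List[str], meta: Dict[str, str]) -> List[str]:
    """Wrap the (pre-ordered) words in one start/end tag pair per meta-information item."""
    items = sorted(meta.items())  # fixed order so identical meta information gives identical tags
    starts = [f"<{k}={v}>" for k, v in items]
    ends = [f"</{k}={v}>" for k, v in reversed(items)]  # close in reverse order (properly nested)
    return starts + words + ends


meta, sentence = separate_meta("scene=phone;relation=customer\tもしもし")
print(add_meta_tags([sentence], meta))
```

Sorting the items keeps the tag sequence deterministic, so the same meta information always produces the same tokens at training and translation time, which is what lets the learned phrase table match.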

As with the grammatical type obtained from parsing, when the meta information is obtained by analyzing the source sentence itself, it need not be attached in advance to the sentence pairs in the parallel corpus 220 used for training or to the input sentence 344 at translation time.

[Third Embodiment]
In the first embodiment, tags are selected based on grammatical-type information determined from the parse of the source sentence. In the second embodiment, tags are selected based on meta information attached to the source sentence in advance or obtained by analyzing it. In the third embodiment described below, the grammatical type of the immediately preceding sentence is stored as context information, playing the role of meta information, and the source sentence is given tags that differ according to this context information. This mechanism makes it possible to translate the same source sentence differently depending on its context.

<Configuration>
Referring to FIG. 9, the PBSMT system 400 according to the third embodiment includes a learning processing unit 412 and a machine translation device 416. The learning processing unit 412 trains a machine translation model using the bilingual sentences in the parallel corpus 220 and stores the model parameters and the like in a model storage unit 410. The machine translation device 416 performs PBSMT on an input sentence 226 using the translation model defined by the model parameters and the like stored in the model storage unit 410, and outputs a translated sentence 414.

The learning processing unit 412 includes the following units:

- the same bilingual sentence reading unit 250 as in FIG. 6;
- a source sentence processing unit 440, which receives the source side of each bilingual pair from the bilingual sentence reading unit 250 and attaches a start tag and an end tag at the beginning and end of the source sentence, both tags differing depending on the context of that sentence;
- the same translated sentence processing unit 254 as in FIG. 6, which attaches tags to the target side of each bilingual pair supplied by the bilingual sentence reading unit 250;
- a learning data storage unit 442, which stores, as learning data, the tagged source word string output by the source sentence processing unit 440 paired with the conventionally tagged translated sentence output by the translated sentence processing unit 254;
- a model learning unit 444, which trains a PBSMT model using the learning data stored in the learning data storage unit 442 and stores the model parameters and the like in the model storage unit 410.

The source sentence processing unit 440 includes the following units:

- the morphological analysis unit 260;
- the syntax analysis unit 372;
- the pre-ordering unit 264;
- a context information storage unit 450, which stores, as context information, whether the source sentence being processed is a negative question, based on the parsing result of the syntax analysis unit 372;
- a previous-sentence context information storage unit 452, which, after each sentence is processed, takes over (shifts in) the context information held in the context information storage unit 450 and outputs it as the context information of the preceding source sentence;
- a tag assigning unit 454, which attaches tags to the pre-ordered source word string output by the pre-ordering unit 264 and outputs the result to the learning data storage unit 442. These tags differ according to the previous sentence's context information stored in the previous-sentence context information storage unit 452.

The machine translation device 416 includes the following units:

- the morphological analysis unit 280, syntax analysis unit 382 and pre-ordering unit 284 shown in FIG. 8;
- a context information storage unit 470, which stores context information, obtained from the output of the syntax analysis unit 382, indicating whether the input sentence 226 is a negative question;
- a previous-sentence context information storage unit 472, which, each time the machine translation device 416 processes a sentence, shifts in the context information stored in the context information storage unit 470 and stores and outputs it as the previous sentence's context information;
- a tag assigning unit 474, which attaches tags at the beginning and end of the word string of the input sentence 226 as reordered by the pre-ordering unit 284. These tags differ according to the previous sentence's context information stored in the previous-sentence context information storage unit 472;
- the PBSMT device 288, which takes the tagged word string output by the tag assigning unit 474 as input, performs PBSMT with reference to the translation model parameters and the like stored in the model storage unit 410, and outputs the translated sentence 414.

<Operation>
The PBSMT system 400 operates as follows.

During model training, the bilingual sentence reading unit 250 takes the bilingual sentence pairs out of the parallel corpus 220 one at a time. It gives the source sentence to the morphological analysis unit 260 of the source sentence processing unit 440, and the translated sentence to the tag assigning unit 274 of the translated sentence processing unit 254. In this embodiment, the tags attached to a source sentence depend on its context. The bilingual sentences stored in the parallel corpus 220 are therefore ordered, and the bilingual sentence reading unit 250 must read them in that order.

The morphological analysis unit 260 performs morphological analysis on the source sentence, and the syntax analysis unit 372 parses it; the parsing result is given to the pre-ordering unit 264. From the parsing result, the syntax analysis unit 372 also outputs context information indicating whether the sentence is a negative question, and the context information storage unit 450 stores it. The pre-ordering unit 264 reorders the source word string using the parsing result and gives the reordered word string to the tag assigning unit 454. The tag assigning unit 454 attaches a start tag at the beginning of that word string and an end tag at its end, and outputs the result to the learning data storage unit 442. Both tags depend on the previous sentence's context information stored in the previous-sentence context information storage unit 452. When the first sentence is processed, nothing is yet stored in the previous-sentence context information storage unit 452, so the previous sentence is assumed to have been a declarative sentence.

Sentences from completely different documents may be processed in succession. In that case, the context information obtained from the last sentence of one document should not be used as context information for the first sentence of the next. The context information stored in the previous-sentence context information storage unit 452 must therefore be cleared whenever the document changes.
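A minimal sketch of how the training-side context tags might be produced follows. It assumes each sentence has already been pre-ordered and judged by the parser. The `Sentence` record, the tag strings `<prev-negq>` and `<prev-decl>`, and the function names are hypothetical. What the sketch shows is the one-sentence delay between storage units 450 and 452, and the reset at every document boundary.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Sentence:
    words: list[str]             # pre-ordered source word string
    is_negative_question: bool   # stand-in for the parser's judgement


def context_tags(prev_is_negative_question: bool) -> tuple[str, str]:
    """Hypothetical start/end tags chosen by the previous sentence's context."""
    if prev_is_negative_question:
        return "<prev-negq>", "</prev-negq>"
    return "<prev-decl>", "</prev-decl>"


def tag_corpus(documents: Iterable[list[Sentence]]) -> Iterator[list[str]]:
    """Yield tagged source word strings; sentences must arrive in document order."""
    for document in documents:
        previous = False  # cleared per document: predecessor assumed declarative
        for sentence in document:
            current = sentence.is_negative_question  # context information (unit 450)
            start, end = context_tags(previous)      # previous-sentence context (unit 452)
            yield [start, *sentence.words, end]
            previous = current                       # shift once the sentence is done
```

For example, if one document ends with a negative question, the first sentence of the next document is still tagged `<prev-decl>`, because `previous` is reset at the document boundary.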

The translated sentence processing unit 254 attaches conventional tags to the target side of each bilingual pair and outputs it. The learning data storage unit 442 stores two word strings in association with each other. The first is the pre-ordered source word string output by the tag assigning unit 454, which carries tags reflecting the previous sentence's context. The second is the translated word string output by the tag assigning unit 274.

Once learning data becomes available in the learning data storage unit 442, the model learning unit 444 uses it to train the translation model. The model parameters and the like of the trained model are stored in the model storage unit 410.

When translating an input sentence 226, the machine translation device 416 operates as follows. Because context information is also used at translation time, input sentences 226 must be given to the machine translation device 416 in the order in which they appear in the document.

Given an input sentence 226, the morphological analysis unit 280 performs morphological analysis and gives the resulting morpheme string to the syntax analysis unit 382. The syntax analysis unit 382 parses the string and outputs the parsing result to the pre-ordering unit 284. The parsing result includes information indicating whether the sentence is a negative question, and the context information storage unit 470 stores this information. Before translation, the pre-ordering unit 284 uses the parsing result to reorder the words of the input sentence 226 so that their order approximates the word order of the target language. It then gives the reordered string to the tag assigning unit 474. The tag assigning unit 474 reads the previous sentence's context information from the previous-sentence context information storage unit 472. It attaches a start tag at the beginning of the input word string and an end tag at its end, both depending on that context information, and outputs the result.

The PBSMT device 288 applies the translation model to the tagged word string output by the tag assigning unit 474, and outputs the translated sentence 414. The model consists of the model parameters and the like stored in the model storage unit 410. When output of the translated sentence 414 is complete, the context information stored in the context information storage unit 470 is shifted into the previous-sentence context information storage unit 472. There it becomes available as the previous sentence's context information.
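The translation-side bookkeeping might look like the following sketch. In it, the previous-sentence register is updated only after the current translation has been output. Two plain functions are passed in as stand-ins: one for the PBSMT device 288, and one for the negative-question detection by the syntax analysis unit 382. The class name and tag strings are hypothetical. The `reset` method is an assumption carried over from the training-side rule; the text describes clearing context only for training.

```python
from __future__ import annotations

from typing import Callable


class ContextualTranslator:
    """Sketch of the context handling in machine translation device 416."""

    def __init__(self, engine: Callable[[list[str]], str],
                 is_negative_question: Callable[[list[str]], bool]) -> None:
        self.engine = engine                              # stand-in for PBSMT device 288
        self.is_negative_question = is_negative_question  # stand-in for the parser
        self.current = False    # context information storage unit 470
        self.previous = False   # previous-sentence context information storage unit 472

    def reset(self) -> None:
        """Assumed: clear context at a document boundary, as on the training side."""
        self.current = self.previous = False

    def translate(self, words: list[str]) -> str:
        self.current = self.is_negative_question(words)
        label = "prev-negq" if self.previous else "prev-decl"
        output = self.engine([f"<{label}>", *words, f"</{label}>"])
        self.previous = self.current  # shift only after the output is complete
        return output


# Usage with an echo "engine" so the chosen tags are visible.
t = ContextualTranslator(" ".join, lambda w: "not" in w and w[-1] == "?")
print(t.translate(["are", "you", "not", "coming", "?"]))  # <prev-decl> are you not coming ? </prev-decl>
print(t.translate(["yes"]))                                # <prev-negq> yes </prev-negq>
```

Because `previous` is updated only after `engine` returns, the tag attached to a sentence always reflects the sentence before it, never the sentence itself.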

<Effects of the present embodiment>
According to the present embodiment, the translation phase stores context information in the previous-sentence context information storage unit 472. This information indicates, for example, whether the previous source sentence was a negative question. Tags corresponding to this context information are attached to the word string input to the PBSMT device 288. The system can therefore translate a sentence appropriately for its context, for instance differently depending on whether the preceding sentence was a negative question.

In the third embodiment, the only context information used is whether the previous sentence is a negative question. The present invention, however, is not limited to such an embodiment. A series of sentences preceding a given sentence may be classified as a group, and a tag corresponding to the resulting class may be attached to the following sentence. Usable classes include affirmative/negative and question/declarative, either alone or in combination.
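One possible reading of this generalization is sketched below. A fixed-size window of preceding sentences is reduced to a combined polarity/mood class, and that class selects the tags for the next sentence. The window size, the majority-vote rule (ties count as affirmative/declarative) and the tag names are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    negative: bool
    question: bool


def classify(window: deque[Features]) -> str:
    """Combine polarity and mood into one class by majority vote (ties -> aff/decl)."""
    n = len(window)
    negative = 2 * sum(f.negative for f in window) > n
    question = 2 * sum(f.question for f in window) > n
    return f"{'neg' if negative else 'aff'}-{'q' if question else 'decl'}"


def tag_with_window(sentences: list[list[str]], features: list[Features],
                    k: int = 3) -> list[list[str]]:
    """Tag each sentence by the class of the k sentences preceding it."""
    window: deque[Features] = deque(maxlen=k)  # oldest entries drop out automatically
    tagged = []
    for words, feats in zip(sentences, features):
        label = classify(window)               # an empty window yields "aff-decl"
        tagged.append([f"<{label}>", *words, f"</{label}>"])
        window.append(feats)
    return tagged
```

A `deque` with `maxlen` keeps only the most recent `k` sentences, so the class is always computed over the immediately preceding context.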

Each of the embodiments above uses PBSMT as the machine translation engine, but the present invention is not limited to such embodiments. Other machine translation methods can obtain the same effects, provided they use a model learned by statistically processing parallel sentences.

In the first to third embodiments, tags representing meta information are attached at the beginning and end of the source sentence. The present invention, however, is not limited to attaching meta information at those positions. It suffices to attach tags so that the portion to be translated differently according to the meta information can be identified. In that case, the portion that must be translated differently plays the role of the source sentence. The fourth embodiment described below is an example of this.

[Fourth Embodiment]
<Configuration>
As the fourth embodiment below illustrates, when a new tag is encountered after an earlier one, the range affected by the earlier tag's meta information can be regarded as ended. The end tag for that meta information can therefore be omitted. Likewise, the end of the sentence being translated can be treated as the end of the range affected by the meta information, so the end tag can be omitted there as well.
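The scope rule for omitted end tags can be expressed as a small resolver that assigns each ordinary token to the tag whose range covers it. This is a hypothetical sketch. Tags are assumed to be single tokens of the form `<name>` and `</name>`, and `<EOS>` is treated as the end of the sentence.

```python
from __future__ import annotations

import re
from typing import Optional

START_TAG = re.compile(r"<([^/>][^>]*)>")
END_TAG = re.compile(r"</([^>]+)>")
EOS = "<EOS>"


def resolve_scopes(tokens: list[str]) -> list[tuple[Optional[str], str]]:
    """Pair each ordinary token with the meta tag whose range covers it."""
    active: Optional[str] = None
    covered = []
    for token in tokens:
        if token == EOS:
            break                                  # end of sentence closes any open range
        if (m := START_TAG.fullmatch(token)):
            active = m.group(1)                    # a new tag ends the previous range
        elif END_TAG.fullmatch(token):
            active = None                          # explicit end tag, if present
        else:
            covered.append((active, token))
    return covered


print(resolve_scopes(["<ja>", "a", "b", "<en>", "c", "</en>", "d", "<EOS>"]))
# [('ja', 'a'), ('ja', 'b'), ('en', 'c'), (None, 'd')]
```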

In the first embodiment, a tag is selected based on grammar type information determined from the result of parsing the source sentence. In the second embodiment, a tag is selected based on meta information that is either attached to the source sentence in advance or obtained by analyzing it. In the third embodiment, the grammar type of the previous sentence is stored as context information, playing the role of meta information, and different tags are assigned to the source sentence according to it. In the fourth embodiment described below, a tag specifying the target language is attached to the beginning of the source sentence as meta information. During model training, one side of each bilingual pair receives meta information specifying the language of the other side. At translation time, meta information specifying the target language is attached to the beginning of the input source sentence. In this way, a single model can translate into multiple languages.

Referring to FIG. 10, the translation system 500 according to this embodiment includes three components:

- a multilingual parallel corpus 510 containing bilingual pairs for many language combinations;
- a learning processing unit 512, which reads each bilingual pair from the multilingual parallel corpus 510 and trains an NN that performs sequence-to-sequence translation;
- an NN parameter storage unit 514, which stores the parameters of the NN trained by the learning processing unit 512.

In this embodiment, the bilingual pairs stored in the multilingual parallel corpus 510 are assumed to carry no information indicating their languages. The NN used here has the same configuration as the LSTM-based network described in Non-Patent Document 3.

The learning processing unit 512 includes the following units:

- a bilingual sentence reading unit 540, which reads each bilingual pair from the multilingual parallel corpus 510;
- a first sentence processing unit 542, which attaches a tag indicating the language of the second sentence to the beginning of the first sentence of the pair read by the bilingual sentence reading unit 540, and outputs the result;
- a second sentence processing unit 544, which attaches a tag indicating the language of the first sentence to the beginning of the second sentence of the pair, and outputs the result;
- a learning data generation unit 546, which pairs the first sentence output by the first sentence processing unit 542 with the second sentence output by the second sentence processing unit 544 to generate and output learning data;
- a learning data storage unit 548, which stores the output learning data;
- an NN learning unit 550, which trains an NN 552 using each bilingual pair stored in the learning data storage unit 548.

In this embodiment, each bilingual pair stored in the multilingual parallel corpus 510 consists of a sentence in one language paired with a sentence in another. The present invention, however, is not limited to such an embodiment. For example, the corpus may instead collect translation groups, each consisting of mutually equivalent sentences in three or more languages. In that case, the bilingual sentence reading unit 540 may select any two sentences from a translation group as a bilingual pair. It then gives them to the first sentence processing unit 542 and the second sentence processing unit 544. Accordingly, this embodiment uses at least two kinds of tags, one for each language of the sentences stored in the multilingual parallel corpus 510.
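For a corpus of translation groups, pair selection could be done exhaustively or by sampling, as in this sketch. The text does not specify whether the reading unit 540 enumerates all pairs or samples them, so both variants are shown. The function names are hypothetical. Direction does not matter at this stage, because the learning data generation unit 546 later generates both directions.

```python
from __future__ import annotations

import random
from itertools import combinations


def all_pairs(group: list[str]) -> list[tuple[str, str]]:
    """Every unordered pair of sentences in one translation group."""
    return list(combinations(group, 2))


def sample_pair(group: list[str], rng: random.Random) -> tuple[str, str]:
    """Pick any two distinct sentences of the group as one bilingual pair."""
    first, second = rng.sample(group, 2)
    return first, second


group = ["猫が好きです。", "I like cats.", "J'aime les chats."]
print(len(all_pairs(group)))  # 3
```

A group of n languages yields n(n-1)/2 unordered pairs, so exhaustive enumeration grows quickly with the number of languages; sampling keeps the amount of data per group fixed.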

The first sentence processing unit 542 includes a language identification unit 580. This unit identifies the language of the first sentence of the pair read by the bilingual sentence reading unit 540 and outputs information specifying that language. Similarly, the second sentence processing unit 544 includes a language identification unit 590, which identifies the language of the second sentence and outputs information specifying it. The first sentence processing unit 542 further includes a tag assigning unit 582. This unit attaches to the first sentence of the pair a tag indicating the language of the second sentence, as output by the language identification unit 590, and outputs the result. Likewise, the second sentence processing unit 544 further includes a tag assigning unit 592. This unit attaches to the second sentence of the pair a tag indicating the language of the first sentence, as output by the language identification unit 580, and outputs the result.

The tag assigning units 582 and 592 shown in FIG. 10 have the same structure, and in this embodiment both are implemented by a computer program. Referring to FIG. 11, the program implementing, for example, the tag assigning unit 582 consists of the following steps:

- step 630: declare a variable STR;
- step 632: read the first sentence of the pair being processed from the memory location where the output of the bilingual sentence reading unit 540 is stored;
- step 634: read a language tag indicating the language of the second sentence from the memory location where the output of the language identification unit 590 is stored;
- step 636: assign to STR the string formed by concatenating the language tag read in step 634, the first sentence read in step 632, and the end-of-sentence symbol <EOS>;
- step 638: output the contents of STR to the learning data generation unit 546.

The program implementing the tag assigning unit 592 is obtained by swapping "first sentence" and "second sentence" in FIG. 11.
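The steps of FIG. 11 map onto a few lines of Python, sketched below with the memory-location reads replaced by function arguments. The tag format `<2xx>` and the single space used as separator are assumptions. The text specifies only the order: language tag, sentence, `<EOS>`.

```python
EOS = "<EOS>"


def assign_tag(sentence: str, other_language_tag: str) -> str:
    """Steps 630-638 of FIG. 11, with the memory reads replaced by arguments."""
    # Step 630: the variable STR (named `tagged` here).
    # Steps 632/634: the sentence and the other side's language tag are passed
    # in rather than read from memory locations.
    # Step 636: language tag + sentence + end-of-sentence symbol.
    tagged = " ".join([other_language_tag, sentence, EOS])
    # Step 638: hand the result to the learning data generation unit (the caller).
    return tagged


# Unit 582: first sentence, tagged with the second sentence's language.
print(assign_tag("猫が好きです 。", "<2en>"))  # <2en> 猫が好きです 。 <EOS>
# Unit 592: the same routine with the roles of the two sentences swapped.
print(assign_tag("I like cats .", "<2ja>"))     # <2ja> I like cats . <EOS>
```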

Referring to FIG. 10, the learning data generation unit 546 generates learning data from the outputs of the first sentence processing unit 542 and the second sentence processing unit 544. Specifically, it generates two learning examples and stores both in the learning data storage unit 548. In the first, the output of the first sentence processing unit 542 is the source and the output of the second sentence processing unit 544 is the translation. In the second, the roles are reversed. For each bilingual pair processed, the learning data generation unit 546 thus prepares learning data in both directions for the language combination of its first and second sentences.
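A sketch of the bidirectional example generation follows, assuming a hypothetical tag format `<2xx>` derived from a language code. As described above, each side carries the tag of the other side's language, and the tagged pair is emitted once in each direction.

```python
from __future__ import annotations

EOS = "<EOS>"


def lang_tag(lang: str) -> str:
    """Hypothetical tag format for a language code."""
    return f"<2{lang}>"


def bidirectional_examples(first: str, second: str,
                           first_lang: str, second_lang: str) -> list[tuple[str, str]]:
    """Return (source, target) examples in both directions for one bilingual pair."""
    tagged_first = f"{lang_tag(second_lang)} {first} {EOS}"   # output of unit 542
    tagged_second = f"{lang_tag(first_lang)} {second} {EOS}"  # output of unit 544
    return [(tagged_first, tagged_second), (tagged_second, tagged_first)]


for src, tgt in bidirectional_examples("猫が好き", "I like cats", "ja", "en"):
    print(src, "=>", tgt)
```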

The NN learning unit 550 trains the NN 552 using the learning data stored in the learning data storage unit 548. This training can be performed in the same way as in the technique described in Non-Patent Document 3.

Training as described in Non-Patent Document 3 proceeds roughly as follows. Suppose the source sentence of a training pair consists of the words A, B and C, and the translated sentence of the words W, X, Y and Z, each followed by the end-of-sentence symbol <EOS>. Referring to FIG. 12, the words A, B and C of the input sentence are first given to the NN in turn. When the symbol <EOS> marking the end of the input sentence is given, the NN is trained by error backpropagation with the first word W of the translated sentence as the teacher signal. The words W, X, Y and Z of the translated sentence are then given as inputs, and the NN is trained with the respective next words X, Y, Z and the end-of-sentence symbol <EOS> as teacher signals. This processing is performed for all bilingual pairs.
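The input/teacher-signal pairs described above can be enumerated as below. This sketch only lists which token is fed at each step and which word, if any, serves as the teacher signal. It performs no training, and the function name is hypothetical. Source steps are shown without a teacher signal, as in the usual sequence-to-sequence formulation.

```python
from __future__ import annotations

from typing import Optional

EOS = "<EOS>"


def training_steps(source: list[str], target: list[str]) -> list[tuple[str, Optional[str]]]:
    """(input token, teacher signal) for each step; None means no target word."""
    src = source + [EOS]
    tgt = target + [EOS]
    steps: list[tuple[str, Optional[str]]] = [(tok, None) for tok in src[:-1]]
    steps.append((EOS, tgt[0]))            # at the source <EOS>, predict the first word
    steps.extend(zip(tgt[:-1], tgt[1:]))   # each target word predicts the next one
    return steps


print(training_steps(["A", "B", "C"], ["W", "X", "Y", "Z"]))
```

In this embodiment the source would begin with the language tag, as in `training_steps(["<2en>", "A", "B", "C"], ["W", "X", "Y", "Z"])`, which adds one more source step and otherwise leaves the schedule unchanged.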

The training that the NN learning unit 550 performs on the NN 552 in this embodiment is exactly the same. The only difference is that each input sentence begins with a tag indicating the language of the other side of the bilingual pair.

Returning to FIG. 10, the NN-based machine translation device 518 performs NN-based machine translation on an input sentence 516 and outputs a translated sentence 520. The machine translation device 518 includes the following components:

- an input/output device 600, which interactively accepts user input to determine the target language;
- a target language storage unit 602, which stores a tag indicating the target language entered via the input/output device 600;
- a tag assigning unit 604, connected to the target language storage unit 602, which receives the input sentence 516 and outputs it with the tag read from the target language storage unit 602 at its beginning and the end-of-sentence symbol <EOS> at its end;
- an NN-based translation engine 606, which translates the tagged input sentence 516 output by the tag assigning unit 604 and outputs the translated sentence 520. The engine uses an NN with the same configuration as the NN 552 and the parameters stored in the NN parameter storage unit 514.

In this embodiment, the user specifies the target language via the input/output device 600, but the present invention is not limited to such an embodiment. For example, the target language may be the user interface language selected on the device (a smartphone, computer, etc.) in which the machine translation device 518 is incorporated.
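The choice between an explicit user selection and the device's UI language could be expressed as below. The locale string format (`ja_JP` or `ja-JP`), the `<2xx>` tag format and the function name are assumptions for illustration.

```python
from __future__ import annotations

from typing import Optional


def target_language_tag(user_choice: Optional[str], ui_locale: str) -> str:
    """Prefer the language chosen by the user; otherwise use the UI language."""
    language = user_choice or ui_locale.replace("-", "_").split("_")[0]
    return f"<2{language.lower()}>"


print(target_language_tag(None, "ja_JP"))   # <2ja>
print(target_language_tag("EN", "ja-JP"))   # <2en>
```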

<Operation>
The translation system 500 configured as described above operates as follows. Its operation has two phases: the NN learning phase, and the translation phase performed by the machine translation device 518.

During learning, the learning processing unit 512 of the translation system 500 operates as follows. The multilingual parallel corpus 510 stores in advance a large number of bilingual pairs covering many language combinations. The bilingual sentence reading unit 540 takes the pairs out of the multilingual parallel corpus 510 in turn. It gives the first sentence of each pair to the first sentence processing unit 542 and the second sentence to the second sentence processing unit 544. The language identification unit 580 of the first sentence processing unit 542 identifies the language of the first sentence and stores a tag indicating that language in a predetermined memory location. The language identification unit 590 of the second sentence processing unit 544 does the same for the second sentence. On receiving the first sentence of the pair from the bilingual sentence reading unit 540, the tag assigning unit 582 reads from its memory location the tag of the language identified by the language identification unit 590. It attaches that tag to the beginning of the first sentence and the end-of-sentence symbol <EOS> to its end, and outputs the result to the learning data generation unit 546. Likewise, on receiving the second sentence of the pair, the tag assigning unit 592 reads the tag indicating the language of the first sentence from its memory location. It attaches that tag to the beginning of the second sentence and <EOS> to its end, and gives the result to the learning data generation unit 546.

The learning data generation unit 546 generates two kinds of learning data and stores them in the learning data storage unit 548. In the first, the first sentence received from the tag assigning unit 582 is the source and the second sentence received from the tag assigning unit 592 is the translation. In the second, the roles are reversed. In this way, the bilingual sentence reading unit 540, the first sentence processing unit 542, the second sentence processing unit 544 and the learning data generation unit 546 together generate a large amount of learning data and accumulate it in the learning data storage unit 548.

When enough learning data has accumulated in the learning data storage unit 548, the NN learning unit 550 trains the NN 552 on it, using the learning method described above. Training ends when the parameters of the NN 552 satisfy a predetermined end condition. The parameters that define the function of the NN 552 at that point are stored in the NN parameter storage unit 514. These parameters are then set in the NN included in the translation engine 606.
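The text does not say what the end condition is. A common choice, assumed here purely for illustration, is to stop when the loss stops improving by more than a tolerance, with a cap on the number of epochs. In the sketch below, a one-parameter toy model trained by gradient descent stands in for NN 552. The names `train_until` and `sgd_step` and the threshold values are hypothetical.

```python
from typing import Any, Callable, Sequence, Tuple

StepFn = Callable[[Any, Sequence], Tuple[Any, float]]


def train_until(step: StepFn, params: Any, data: Sequence,
                max_epochs: int = 100, tol: float = 1e-10) -> Tuple[Any, int]:
    """Run epochs until an assumed end condition holds: the loss improves by
    less than `tol`, or `max_epochs` epochs have been run."""
    prev_loss = float("inf")
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        params, loss = step(params, data)
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return params, epoch


def sgd_step(w: float, data: Sequence, lr: float = 0.05) -> Tuple[float, float]:
    """One gradient-descent epoch on the toy model y = w * x (a stand-in for NN 552)."""
    n = len(data)
    grad = sum(2 * x * (w * x - y) for x, y in data) / n
    w = w - lr * grad
    loss = sum((w * x - y) ** 2 for x, y in data) / n
    return w, loss
```

As a usage example, `train_until(sgd_step, 0.0, [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])` converges to w close to 2 well before the 100-epoch cap. The returned parameters are what would be written to the NN parameter storage unit 514.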

At translation time, the machine translation device 518 operates as follows. Before translation, the user operates the input/output device 600 to specify the target language. The target language storage unit 602 stores a tag indicating the specified language.

When an input sentence 516 is given to the machine translation device 518 and translation is requested, the tag assigning unit 604 reads the tag indicating the target language from the target language storage unit 602 and adds it to the head of the input sentence 516. The tag assigning unit 604 also adds the end-of-sentence symbol <EOS> to the end of the input sentence 516, and then inputs the sentence to the translation engine 606.
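The translation-time preprocessing can be sketched as a small store plus a tagging function. This is a minimal sketch that assumes the same hypothetical `<2xx>` tag format used during learning. It matters that the two formats agree, because the NN only recognizes tags it saw in training. `TargetLanguageStore` and `prepare_input` are illustrative names, not components named in the text.

```python
from typing import List, Optional

EOS = "<EOS>"


class TargetLanguageStore:
    """Hypothetical stand-in for the target language storage unit 602."""

    def __init__(self) -> None:
        self._tag: Optional[str] = None

    def set_language(self, lang: str) -> None:
        # Called when the user specifies the target language via I/O device 600.
        # The "<2xx>" tag format is an assumption.
        self._tag = f"<2{lang}>"

    def tag(self) -> str:
        if self._tag is None:
            raise RuntimeError("target language has not been specified")
        return self._tag


def prepare_input(words: List[str], store: TargetLanguageStore) -> List[str]:
    """Mimic tag assigning unit 604: target-language tag at the head, <EOS> at the end."""
    return [store.tag()] + list(words) + [EOS]
```

After `store.set_language("en")`, `prepare_input(["kore", "wa", "pen", "desu"], store)` returns `['<2en>', 'kore', 'wa', 'pen', 'desu', '<EOS>']`. Switching to `"fr"` changes only the leading tag. The translation engine itself stays the same.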

The translation engine 606 feeds each word of the input sentence 516, in order, to the NN in which the learned parameters have been set. When the end-of-sentence symbol <EOS> at the end of the input sentence 516 is fed in, the word the NN outputs becomes the first word of the translated sentence. From then on, each word obtained in this way is fed back to the NN as its next input, and the successive outputs are concatenated to build the translated word string for the input sentence 516. When the NN outputs <EOS>, the translation engine 606 ends the translation. It then concatenates the words obtained so far and outputs them as the translated sentence 520.
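The feed-back loop above is greedy decoding. The sketch below treats the trained NN as a hypothetical `step(state, word) -> (state, predicted_word)` function. To make it runnable without a real network, a lookup-table mock stands in for the NN. The mock keys on the whole tagged source, so it also shows how the leading tag alone selects the output language. The interface and all names are assumptions.

```python
from typing import Dict, List, Tuple

EOS = "<EOS>"


def greedy_translate(step, tagged_input: List[str], max_len: int = 50) -> List[str]:
    """Greedy decoding as described for translation engine 606.

    `step(state, word) -> (state, predicted_word)` is a hypothetical interface
    to the trained NN."""
    if not tagged_input or tagged_input[-1] != EOS:
        raise ValueError("input must end with <EOS>")
    state, predicted = None, None
    # Feed every input word; the prediction made on the final <EOS>
    # is the first word of the translation.
    for word in tagged_input:
        state, predicted = step(state, word)
    output: List[str] = []
    # Feed each predicted word back in until the NN emits <EOS>.
    while predicted != EOS and len(output) < max_len:
        output.append(predicted)
        state, predicted = step(state, predicted)
    return output


def make_mock_step(table: Dict[Tuple[str, ...], List[str]]):
    """Toy stand-in for the NN: looks up the whole tagged source in a table."""

    def step(state, word):
        if state is None:
            state = {"src": [], "target": None, "emitted": 0}
        if state["target"] is None:  # still reading the input sentence
            if word != EOS:
                state["src"].append(word)
                return state, None
            state["target"] = table.get(tuple(state["src"]), []) + [EOS]
        else:  # `word` is the output word just fed back
            state["emitted"] += 1
        return state, state["target"][state["emitted"]]

    return step
```

With a table mapping `("<2en>", "kore", "wa", "pen", "desu")` to `["this", "is", "a", "pen"]`, `" ".join(greedy_translate(step, [...]))` produces `"this is a pen"`. This corresponds to concatenating the word string into the translated sentence 520. The `max_len` cap is a safeguard added in this sketch; the text itself stops only on <EOS>.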

<Effects of the present embodiment>
In the translation system 500 of the fourth embodiment, a different tag is added to the head of each sentence depending on the target language. The NN serving as the translation engine treats these tags as input words like any other. An NN trained on parallel translations in several languages using such tags can therefore translate between all of those languages by itself. Suppose several languages share common properties. Then translation accuracy for one of them can be expected to improve even if the corpus contains few parallel translations in that language, because the NN also learns from parallel translations of the other languages that share those properties. Moreover, the configuration of the NN translation engine itself need not change at all. The only addition is a preprocessing step, at learning time and at translation time, that adds a tag indicating the target language to the head of each sentence. Machine translation accuracy can therefore be improved with a simple configuration.

In the fourth embodiment described above, the language identifying units identify the language of both the first and second sentences of each parallel translation during learning. If each parallel translation stored in the multilingual parallel corpus 510 already carries information identifying its languages, no language identifying unit is needed. The tag indicating the target language can then be determined from the attached information. Alternatively, the multilingual parallel corpus 510 may be preprocessed so that each sentence of a parallel translation is prefixed with a tag indicating the language of its paired sentence.
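The metadata-based alternative can be sketched as a one-pass preprocessing of the corpus. It assumes each record already carries its languages under the hypothetical keys `lang1` and `lang2`, with `text1` and `text2` holding the sentences, and reuses the illustrative `<2xx>` tag format. As in the preprocessing variant described above, only the head tag is added here.

```python
from typing import Dict, List, Tuple


def lang_tag(lang: str) -> str:
    # Hypothetical tag format; any reserved token per language would do.
    return f"<2{lang}>"


def preprocess_corpus(records: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Prefix each sentence with the tag of its partner sentence's language.

    The language metadata is read from the (assumed) keys 'lang1'/'lang2',
    so no language identifying unit is required."""
    tagged = []
    for rec in records:
        first = f"{lang_tag(rec['lang2'])} {rec['text1']}"
        second = f"{lang_tag(rec['lang1'])} {rec['text2']}"
        tagged.append((first, second))
    return tagged
```

For a record `{"text1": "kore wa pen desu", "lang1": "ja", "text2": "this is a pen", "lang2": "en"}` the result is `("<2en> kore wa pen desu", "<2ja> this is a pen")`.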

In the embodiment described above, an LSTM-based NN is used as the translation engine. The present invention, however, is not limited to that embodiment. An RNN built from cells other than LSTM can be trained in the same way, so effects similar to those of the fourth embodiment can be expected.

[Implementation by computer]
The machine translation system, the learning processing unit, and the machine translation device according to the above embodiments can be implemented with computer hardware and a computer program executed on it. FIG. 13 shows the external appearance of the computer system 930, and FIG. 14 shows its internal configuration.

Referring to FIG. 13, the computer system 930 includes a computer 940 having a memory port 952 and a DVD (Digital Versatile Disc) drive 950, a keyboard 946, a mouse 948, and a monitor 942.

Referring to FIG. 14, in addition to the memory port 952 and the DVD drive 950, the computer 940 includes:
- a CPU (Central Processing Unit) 956;
- a bus 966 connected to the CPU 956, the memory port 952, and the DVD drive 950;
- a read-only memory (ROM) 958 that stores a boot-up program and the like; and
- a random access memory (RAM) 960, connected to the bus 966, that stores program instructions, a system program, work data, and the like.

The computer system 930 further includes a network interface (I/F) 944 that connects to a network enabling communication with other terminals. The network I/F 944 may be connected to the Internet 968.

A computer program causes the computer system 930 to function as the functional units of the machine translation system, the learning processing unit, or the machine translation device of each embodiment above. The program is stored on a DVD 962 loaded into the DVD drive 950 or in a removable memory 964 attached to the memory port 952, and is then transferred to a hard disk 954. Alternatively, the program may be transmitted to the computer 940 through the network I/F 944 and stored on the hard disk 954. The program is loaded into the RAM 960 at execution time. It may also be loaded into the RAM 960 directly from the DVD 962, from the removable memory 964, or via the network I/F 944.

This program includes instructions for causing the computer 940 to function as each functional unit of the machine translation system, the learning processing unit, or the machine translation device according to the above embodiments. Some of the basic functions needed for this operation are provided by the operating system (OS) or third-party programs running on the computer 940, or by modules of programming toolkits installed on it. The program therefore need not include every function required to realize the machine translation system, the learning processing unit, or the machine translation device of this embodiment. It need only include the instructions that realize those functions by calling, in a controlled manner that yields the desired result, the appropriate functions or the appropriate program tools in a programming toolkit. The operation of the computer system 930 is well known and is not repeated here.

In the above embodiments, the various corpora are stored on the hard disk 954 and loaded into the RAM 960 as needed. Model parameters for translation and the like are all held in the RAM 960. The finally optimized model parameters and the like are written from the RAM 960 to the hard disk 954, the DVD 962, or the removable memory 964. Alternatively, the model parameters may be transmitted to, or received from, another device via the network I/F 944.

The embodiments disclosed herein are merely examples, and the present invention is not limited to them. The scope of the present invention is defined by the claims, read in light of the detailed description of the invention. It includes all modifications within the meaning and scope equivalent to the wording of the claims.

62, 82 Word order conversion
64, 68, 84, 88, 182, 192 Word string
66, 86, 180, 190 Tag assignment process
110, 220 Parallel corpus
114, 258, 364, 444 Model learning unit
118, 226, 344, 516 Input sentence
120, 230, 348, 416, 518 Machine translation device
122, 228, 346, 414, 520 Translated sentence
140, 260, 280 Morphological analysis unit
142, 372, 382 Syntax analysis unit
144, 264, 284 Pre-reordering unit
146, 274, 374, 454, 474, 582, 592, 604 Tag assigning unit
148, 288 PBSMT device
184, 194 Translation by PBSMT
210, 320, 400 PBSMT system
222, 342, 410 Model storage unit
240 Parallel corpus with meta information
250, 540 Parallel sentence reading unit
252, 360, 440 Source sentence processing unit
254 Translated sentence processing unit
256, 362, 442 Learning data storage unit
262, 282 Grammar type determination unit (syntax analysis unit)
266, 286 Grammar-type-specific tag assigning unit
224, 340, 412, 512 Learning processing unit
370, 380 Meta information separation unit
384 Meta-information-specific tag assigning unit
450, 470 Context information storage unit
452, 472 Preceding-sentence context information storage unit
510 Multilingual parallel corpus
514 NN parameter storage unit
542 First sentence processing unit
544 Second sentence processing unit
546 Learning data generation unit
548 Learning data storage unit
550 NN learning unit
552 NN
602 Target language storage unit
606 Translation engine by NN

Claims

A machine translation device comprising:
meta information specifying means for specifying meta information about a translation;
meta information corresponding tag insertion means for inserting a tag corresponding to the meta information specified by the meta information specifying means at a predetermined position of an original text to be translated; and
machine translation means for receiving as input the original text to which the tag has been attached,
wherein a plurality of predetermined types are defined as the meta information, and the meta information corresponding tag insertion means selects the tag according to the type of the meta information.

The machine translation device according to claim 1, wherein the meta information corresponding tag insertion means includes range specifying tag insertion means that specifies a range of the original text in which translation using the meta information is to be performed, and inserts a first tag and a second tag corresponding to the meta information at the start position and the end position of the range, respectively.

The machine translation device according to claim 2, wherein the meta information specifying means includes:
morphological analysis means for morphologically analyzing the original text;
syntax analysis means for syntactically analyzing the original text that has been morphologically analyzed by the morphological analysis means; and
grammar type output means for outputting, as the meta information of the original text, information indicating the grammatical type of the original text obtained from the result of the syntactic analysis by the syntax analysis means.

The machine translation device described above, wherein the original text is accompanied by the meta information related to its translation, and
the meta information specifying means includes meta information separating means for separating the meta information attached to the original text from the original text and supplying it to the meta information corresponding tag insertion means.

The machine translation device according to claim 1, wherein the meta information is selected from the group consisting of: the grammatical type of the original text; scene information about the scene in which the original text is uttered; speaker information about the speaker who utters the original text; and the grammatical type of a preceding text translated by the machine translation means before the original text.

The machine translation device according to claim 1, wherein the machine translation means is phrase-based machine translation means.

The machine translation device according to claim 1, wherein the meta information specifying means includes means for specifying, as the meta information, the target language into which the original text is to be translated, and
the meta information corresponding tag insertion means includes means for inserting a tag indicating the target language specified by the meta information at a predetermined position of the original text.

A computer program that causes a computer to function as the machine translation device according to any one of claims 1 to 7.