JP2014170296A

JP2014170296A - Word order rearranging device, translation device, translation model learning device, method, and program

Info

Publication number: JP2014170296A
Application number: JP2013040796A
Authority: JP
Inventors: Katsuto Sudo; 克仁須藤; Masaaki Nagata; 昌明永田; Sho Hoshino; 翔星野; Yusuke Miyao; 祐介宮尾
Original assignee: Nippon Telegraph and Telephone Corp; Research Organization of Information and Systems
Current assignee: Nippon Telegraph and Telephone Corp; Research Organization of Information and Systems
Priority date: 2013-03-01
Filing date: 2013-03-01
Publication date: 2014-09-18
Anticipated expiration: 2033-03-01
Also published as: JP5800206B2

Abstract

PROBLEM TO BE SOLVED: To accurately rearrange a word order of an input sentence. SOLUTION: A language analysis section 30 classifies each element in a clause of an input sentence into two or more kinds of elements, and a rearrangement section 40 rearranges an order of two or more kinds of the classified elements in accordance with a rearrangement rule determined in advance for every clause, so that a word order of the input sentence is rearranged.

Description

The present invention relates to a word order rearrangement device, a translation device, a translation model learning device, a method, and a program, and more particularly to a word order rearrangement device, a translation device, a translation model learning device, a method, and a program for rearranging the word order of an input sentence.

Machine translation from language A to language B divides broadly into two tasks: translating phrases of language A into phrases of language B, and arranging the translated language B phrases in an order appropriate for language B. In statistical translation, which is widely used in this field, phrase translation and phrase reordering are modeled statistically from the correspondences between language A phrases and language B phrases estimated from a large number of parallel sentences. For an input sentence in language A, a language B translation composed of plausible phrase translations and phrase reorderings is then searched for on the basis of those statistical models.

Exhaustively searching all translation candidates is generally very difficult computationally. Machine translation with a practical amount of computation is therefore realized by limiting the number of translation candidates for each phrase and by constraining the reordering distance of phrases to a fixed range.

However, depending on the combination of language A and language B, corresponding phrases may appear in greatly different orders. Translating accurately between such languages requires translation processing that allows a sufficiently large reordering distance, so an increase in the amount of computation is unavoidable.

As a technique for dealing with this problem, there is a technique called "pre-ordering", which rearranges the phrases of language A before translation so that their order approaches the order of the corresponding phrases of language B (Patent Document 1, Non-Patent Document 1).
The techniques of Non-Patent Document 2 and Non-Patent Document 3 apply the same idea to Japanese-to-English translation: syntactic analysis is used to estimate the dependency structure of Japanese clauses, and the order of the clauses is changed to bring it closer to English word order.

Patent Document 1: JP 2011-175500 A

Non-Patent Document 1: Michael Collins, Philipp Koehn, Ivona Kucerova, "Clause Restructuring for Statistical Machine Translation", In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 531-540, 2005.
Non-Patent Document 2: Mamoru Komachi, Yuji Matsumoto, Masaaki Nagata, "Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure", In Proceedings of International Workshop on Spoken Language Translation (IWSLT 2006), 2006.
Non-Patent Document 3: Katsuhito Sudoh, Kevin Duh, Hajime Tsukada, Masaaki Nagata, Xianchao Wu, Takuya Matsuzaki and Jun'ichi Tsujii, "NTT-UT Statistical Machine Translation in NTCIR-9 PatentMT", In Proceedings of NTCIR-9, 2011.

Non-Patent Document 1 targets translation from German to English, and Patent Document 1 targets translation from English to Japanese. Both use rules that rearrange the phrases of the input language (language A) so that they approach the order of the corresponding phrases in the target language (language B). By using syntactic analysis on the language A side together with suitable rules, these techniques can perform the reordering quite accurately. On the other hand, the required rules differ whenever language A or language B differs, so new rules have to be defined for each language pair.

In addition to the dependency structure, Non-Patent Document 2 uses predicate-argument structure analysis, a technique that estimates the relations between a predicate and its subject and object, to identify the subject and object, and applies rules that move them so that the sentence follows the English order of subject, verb, object. Non-Patent Document 3 estimates the subject and object from the dependency structure and particles, and uses rearrangement rules similar to those of Non-Patent Document 2. These methods can be expected to arrange clauses in an order close to English, but they do not change the word order inside a clause. The phrase 「東京に着いた」 (literally "Tokyo-at arrived") therefore only becomes 「着いた東京に」 ("arrived Tokyo-at"), and the position of the particle 「に」 still differs from that of the preposition "at" in the corresponding English "arrived at Tokyo".
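The limitation described above can be illustrated with a minimal Python sketch. The clause segmentation and the simple "predicate clause first" rule are assumptions made for illustration; they are not the actual rules of Non-Patent Documents 2 or 3.

```python
# Minimal sketch: clause-level reordering that leaves the order inside each clause intact.
# The segmentation and the "predicate first" rule are simplifying assumptions.

def reorder_clauses_only(clauses, predicate_index):
    """Move the predicate clause to the front; tokens inside each clause keep their order."""
    predicate = clauses[predicate_index]
    others = [c for i, c in enumerate(clauses) if i != predicate_index]
    return [predicate] + others

# 「東京に着いた」 split into two clauses: [東京 に] [着いた]
clauses = [["東京", "に"], ["着いた"]]
reordered = reorder_clauses_only(clauses, predicate_index=1)
tokens = [tok for clause in reordered for tok in clause]
print(" ".join(tokens))  # 着いた 東京 に
```

The particle 「に」 still follows 「東京」, whereas the English preposition "at" precedes "Tokyo".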

The present invention has been made to solve the above problems, and its object is to provide a word order rearrangement device, a translation device, a translation model learning device, a method, and a program that can rearrange the word order of an input sentence with high accuracy.

To achieve the above object, the word order rearrangement device according to the first invention includes: a syntax analysis unit that, for each clause of an input sentence, classifies each element in the clause into two or more kinds of elements; and a rearrangement unit that rearranges the word order of the input sentence by rearranging, for each clause and in accordance with a predetermined rearrangement rule, the order of the two or more kinds of elements classified by the syntax analysis unit.

The word order rearrangement method according to the second invention is a method for a word order rearrangement device that includes a syntax analysis unit and a rearrangement unit. The syntax analysis unit classifies, for each clause of an input sentence, each element in the clause into two or more kinds of elements. The rearrangement unit rearranges the order of the clauses of the input sentence, which is written in Japanese, based on the result of dependency analysis for each clause of the input sentence and the result of predicate-argument structure analysis for each predicate of the input sentence, and rearranges the order of the two or more kinds of elements classified by the syntax analysis unit.

According to the first and second inventions, the syntax analysis unit classifies the elements in each clause of the input sentence into two or more kinds of elements, and the rearrangement unit rearranges the order of the classified two or more kinds of elements.

Thus, according to the first and second inventions, each element in a clause of the input sentence is classified into two or more kinds of elements and the order of the classified kinds is rearranged, so that the word order of the input sentence can be rearranged with high accuracy.
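The two units described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the kind labels ("content", "function"), the lookup set used for classification, and the rule format (a list giving the desired order of kinds) are hypothetical and are not specified in this form by the text.

```python
# Sketch of a syntax analysis unit and a rearrangement unit.
# Kind labels, the function-word set and the rule format are assumptions.

def classify_elements(clause, function_words):
    """Simplified syntax analysis unit: tag each element of a clause with a kind."""
    return [(tok, "function" if tok in function_words else "content") for tok in clause]

def rearrange_clause(tagged_clause, kind_order):
    """Rearrangement unit: stable sort of elements by the rank of their kind in the rule."""
    rank = {kind: i for i, kind in enumerate(kind_order)}
    return [tok for tok, kind in sorted(tagged_clause, key=lambda e: rank[e[1]])]

def rearrange_sentence(clauses, function_words, kind_order):
    """Apply the per-clause rule to every clause and concatenate the results."""
    out = []
    for clause in clauses:
        out.extend(rearrange_clause(classify_elements(clause, function_words), kind_order))
    return out

clauses = [["着いた"], ["東京", "に"]]
result = rearrange_sentence(clauses, function_words={"に"},
                            kind_order=["function", "content"])
print(" ".join(result))  # 着いた に 東京
```

With this rule the particle 「に」 moves in front of 「東京」, which matches the position of "at" in "arrived at Tokyo".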

The word order rearrangement device according to the third invention rearranges the word order of an input sentence written in Japanese into a word order close to that of a sentence written in a specific language different from Japanese. The device includes: a syntax analysis unit that, for each clause of the input sentence, classifies each element in the clause into two or more kinds of elements; and a rearrangement unit that rearranges the word order of the input sentence. The rearrangement unit first rearranges the order of the clauses of the input sentence in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language. Then, for each clause of the rearranged input sentence, it rearranges the order of the two or more kinds of elements classified by the syntax analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more kinds of elements in a clause of a sentence written in Japanese into the order of those elements in a clause of a sentence written in the specific language.

The word order rearrangement method according to the fourth invention is a method for a word order rearrangement device that includes a syntax analysis unit and a rearrangement unit and that rearranges the word order of an input sentence written in Japanese into a word order close to that of a sentence written in a specific language different from Japanese. The syntax analysis unit classifies, for each clause of the input sentence, each element in the clause into two or more kinds of elements. The rearrangement unit rearranges the order of the clauses of the input sentence in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language. Then, for each clause of the rearranged input sentence, it rearranges the order of the two or more kinds of elements classified by the syntax analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more kinds of elements in a clause of a sentence written in Japanese into the order of those elements in a clause of a sentence written in the specific language, thereby rearranging the word order of the input sentence.

According to the third and fourth inventions, the syntax analysis unit classifies each element in each clause of the input sentence written in Japanese into two or more kinds of elements, and the rearrangement unit rearranges the order of the clauses of the input sentence in accordance with the predetermined clause rearrangement rule and rearranges the order of the classified two or more kinds of elements in accordance with the predetermined element rearrangement rule.

Thus, according to the third and fourth inventions, each element in each clause of the input sentence written in Japanese is classified into two or more kinds of elements, the order of the clauses of the input sentence is rearranged in accordance with the predetermined clause rearrangement rule, and the order of the classified two or more kinds of elements is rearranged in accordance with the predetermined element rearrangement rule, so that the word order of the input sentence can be rearranged with high accuracy.
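The two-stage procedure (clause order first, then element order inside each clause) might look like the following Python sketch. The concrete rules are assumptions chosen for illustration: clauses are linearized as "subject dependents, head, other dependents" using hypothetical dependency heads and subject flags, and function words are placed before content words inside every clause. The tokenization, with 「着いた」 as a single content word, is also illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Clause:
    content: List[str]                                  # content words of the clause
    function: List[str] = field(default_factory=list)   # function words of the clause
    head: Optional[int] = None                          # index of the clause depended on; None = root
    is_subject: bool = False                            # assumed output of predicate-argument analysis

def clause_order(clauses):
    """Hypothetical clause rule: subject dependents, then the head, then other dependents."""
    children = {i: [] for i in range(len(clauses))}
    root = 0
    for i, c in enumerate(clauses):
        if c.head is None:
            root = i
        else:
            children[c.head].append(i)

    def visit(i):
        subjects = [k for k in children[i] if clauses[k].is_subject]
        others = [k for k in children[i] if not clauses[k].is_subject]
        order = []
        for k in subjects:
            order.extend(visit(k))
        order.append(i)
        for k in others:
            order.extend(visit(k))
        return order

    return visit(root)

def element_order(clause):
    """Hypothetical element rule: function words before content words."""
    return clause.function + clause.content

def preorder(clauses):
    """Stage 1 reorders clauses; stage 2 reorders elements inside each clause."""
    return [tok for i in clause_order(clauses) for tok in element_order(clauses[i])]

# 「彼は東京に着いた」: [彼 は] [東京 に] [着いた]
sentence = [
    Clause(["彼"], ["は"], head=2, is_subject=True),
    Clause(["東京"], ["に"], head=2),
    Clause(["着いた"]),
]
preordered = preorder(sentence)
print(" ".join(preordered))  # は 彼 着いた に 東京
```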

The translation device according to the fifth invention translates an input sentence written in Japanese into a sentence written in a specific language different from Japanese. The device includes: a syntax analysis unit that, for each clause of the input sentence, classifies each element in the clause into two or more kinds of elements; a rearrangement unit that rearranges the word order of the input sentence; and a translation unit. The rearrangement unit rearranges the order of the clauses of the input sentence in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language. Then, for each clause of the rearranged input sentence, it rearranges the order of the two or more kinds of elements classified by the syntax analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more kinds of elements in a clause of a sentence written in Japanese into the order of those elements in a clause of a sentence written in the specific language. The translation unit translates the input sentence whose element order has been rearranged by the rearrangement unit into a sentence written in the specific language, based on a plurality of kinds of translation models and a weight for each of the plurality of kinds of translation models.

The translation method according to the sixth invention is a method for a translation device that includes a syntax analysis unit, a rearrangement unit, and a translation unit and that translates an input sentence written in Japanese into a sentence written in a specific language different from Japanese. The syntax analysis unit classifies, for each clause of the input sentence, each element in the clause into two or more kinds of elements. The rearrangement unit rearranges the order of the clauses of the input sentence in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language. Then, for each clause of the rearranged input sentence, it rearranges the order of the two or more kinds of elements classified by the syntax analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more kinds of elements in a clause of a sentence written in Japanese into the order of those elements in a clause of a sentence written in the specific language, thereby rearranging the word order of the input sentence. The translation unit translates the input sentence whose element order has been rearranged by the rearrangement unit into a sentence written in the specific language, based on a plurality of kinds of translation models and a weight for each of the plurality of kinds of translation models.

According to the fifth and sixth inventions, the syntax analysis unit classifies each element in each clause of the input sentence into two or more kinds of elements; the rearrangement unit rearranges the order of the clauses of the input sentence in accordance with the predetermined clause rearrangement rule and rearranges the order of the classified two or more kinds of elements in accordance with the predetermined element rearrangement rule; and the translation unit translates the input sentence into a sentence written in the specific language different from Japanese.

Thus, according to the fifth and sixth inventions, each element in each clause of the input sentence is classified into two or more kinds of elements, the order of the clauses of the input sentence is rearranged in accordance with the predetermined clause rearrangement rule, the order of the classified two or more kinds of elements is rearranged in accordance with the predetermined element rearrangement rule, and the input sentence is translated into a sentence written in the specific language, so that the input sentence can be translated with high accuracy.
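How a translation unit might combine several models with per-model weights can be sketched as below. The text only states that a plurality of kinds of translation models and a weight for each are used; the weighted sum of log-scores shown here is the combination commonly used in statistical machine translation and is an assumption, as are the model names and the toy score values.

```python
# Sketch: choose the candidate translation with the highest weighted sum of model scores.
# Model names ("phrase_table", "language_model") and all numbers are made-up toy values.

def weighted_score(model_scores, weights):
    """Weighted sum of per-model log-scores for one candidate translation."""
    return sum(weights[name] * score for name, score in model_scores.items())

def select_translation(candidates, weights):
    """Return the text of the candidate with the highest weighted score."""
    return max(candidates, key=lambda c: weighted_score(c["scores"], weights))["text"]

candidates = [
    {"text": "arrived at Tokyo", "scores": {"phrase_table": -1.5, "language_model": -0.5}},
    {"text": "arrived Tokyo at", "scores": {"phrase_table": -1.0, "language_model": -2.0}},
]
weights = {"phrase_table": 1.0, "language_model": 1.0}
best = select_translation(candidates, weights)
print(best)  # arrived at Tokyo
```

Changing the weights changes which candidate wins, which is why the weights themselves are learned together with the models.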

The translation model learning device according to the seventh invention learns, from a set of parallel translation data prepared in advance, each item of which is a pair of a sentence or phrase written in Japanese and a sentence or phrase written in a specific language different from Japanese, a plurality of kinds of translation models for translating a sentence written in Japanese into a sentence written in the specific language, together with a weight for each of the plurality of kinds of translation models. The device includes a learning data language analysis unit, a learning data rearrangement unit, a translation model learning unit, and a model weight learning unit. The learning data language analysis unit classifies, for each clause of the Japanese sentence or phrase of each item of parallel translation data in the set, each element in the clause into two or more kinds of elements. The learning data rearrangement unit rearranges the clause order of the Japanese sentence or phrase of each item in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language. Then, for each clause of the rearranged sentence or phrase, it rearranges the order of the two or more kinds of elements classified by the learning data language analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more kinds of elements in a clause of the sentence or phrase into the order of those elements in a clause of a sentence or phrase written in the specific language, thereby rearranging the word order of the sentence or phrase. The translation model learning unit learns the plurality of kinds of translation models based on the Japanese sentence or phrase of each item whose element order has been rearranged by the learning data rearrangement unit and the sentence or phrase of each item written in the specific language. The model weight learning unit learns the weight for each of the plurality of kinds of translation models based on the rearranged Japanese sentence or phrase of each item, the sentence or phrase of each item written in the specific language, and the plurality of kinds of translation models learned by the translation model learning unit.

The translation model learning method according to the eighth invention is a method for a translation model learning device that includes a learning data language analysis unit, a learning data rearrangement unit, a translation model learning unit, and a model weight learning unit. From a set of parallel translation data prepared in advance, each item of which is a pair of a sentence or phrase written in Japanese and a sentence or phrase written in a specific language different from Japanese, the device learns a plurality of kinds of translation models for translating a sentence written in Japanese into a sentence written in the specific language, together with a weight for each of the plurality of kinds of translation models. The learning data language analysis unit classifies, for each clause of the Japanese sentence or phrase of each item of parallel translation data in the set, each element in the clause into two or more kinds of elements. The learning data rearrangement unit rearranges the clause order of the Japanese sentence or phrase of each item in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language. Then, for each clause of the rearranged sentence or phrase, it rearranges the order of the two or more kinds of elements classified by the learning data language analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more kinds of elements in a clause of the sentence or phrase into the order of those elements in a clause of a sentence or phrase written in the specific language, thereby rearranging the word order of the sentence or phrase. The translation model learning unit learns the plurality of kinds of translation models based on the Japanese sentence or phrase of each item whose element order has been rearranged by the learning data rearrangement unit and the sentence or phrase of each item written in the specific language. The model weight learning unit learns the weight for each of the plurality of kinds of translation models based on the rearranged Japanese sentence or phrase of each item, the sentence or phrase of each item written in the specific language, and the plurality of kinds of translation models learned by the translation model learning unit.

According to the seventh and eighth inventions, the learning data language analysis unit classifies each element in each clause of the Japanese sentence or phrase of each item of parallel translation data into two or more kinds of elements. The learning data rearrangement unit rearranges the clause order of the Japanese sentence or phrase of each item in accordance with the predetermined clause rearrangement rule and rearranges the order of the classified two or more kinds of elements in accordance with the predetermined element rearrangement rule. The translation model learning unit learns the plurality of kinds of translation models based on the rearranged Japanese sentence or phrase of each item and the sentence or phrase of each item written in the specific language, and the model weight learning unit learns the weight for each of the plurality of kinds of translation models.

Thus, according to the seventh and eighth inventions, the word order of a sentence or phrase written in Japanese is rearranged by classifying each element in each of its clauses into two or more kinds of elements, rearranging the clause order in accordance with the predetermined clause rearrangement rule, and rearranging the order of the classified two or more kinds of elements in accordance with the predetermined element rearrangement rule. The plurality of kinds of translation models and the weight for each of them are then learned from the rearranged Japanese sentence or phrase and the sentence or phrase written in the specific language in the corresponding parallel translation data, so that translation models capable of accurate translation can be learned.
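The preparation of training data described above can be sketched in Python as follows. The point illustrated is only that the Japanese side of every parallel pair is pre-ordered before model learning while the target side is left unchanged. The function names, the data layout, and the stand-in rule `function_first` are hypothetical, and model learning itself is not shown.

```python
# Sketch: pre-order the Japanese side of each parallel pair before model training.
# Each Japanese sentence is a list of clauses given as (content_words, function_words);
# the clauses are assumed to be already in the rearranged clause order.

def function_first(clauses):
    """Stand-in element rule (assumption): function words before content words."""
    return [tok for content, function in clauses for tok in function + content]

def preorder_parallel_data(pairs, preorder):
    """Return (pre-ordered Japanese tokens, target tokens) for every parallel pair."""
    return [(preorder(ja_clauses), target_tokens) for ja_clauses, target_tokens in pairs]

pairs = [
    ([(["着いた"], []), (["東京"], ["に"])], ["arrived", "at", "Tokyo"]),
]
training_pairs = preorder_parallel_data(pairs, function_first)
print(training_pairs)
```

The resulting pairs would then be passed to whatever procedure learns the translation models and their weights.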

In the first invention, each element in a clause may be classified as either a content word or a function word.
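A classification into content words and function words could be sketched as below. Treating particles (助詞) and auxiliary verbs (助動詞) as function words is an assumption made for this example, and the part-of-speech tags are presumed to come from a morphological analyzer that is not shown.

```python
# Sketch: split the elements of one clause into content words and function words.
# The set of function-word POS tags is an assumption made for illustration.

FUNCTION_POS = {"助詞", "助動詞"}  # particle, auxiliary verb

def split_content_function(clause):
    """clause is a list of (surface, pos) pairs; returns (content_words, function_words)."""
    content = [surface for surface, pos in clause if pos not in FUNCTION_POS]
    function = [surface for surface, pos in clause if pos in FUNCTION_POS]
    return content, function

clause = [("東京", "名詞"), ("に", "助詞")]
content, function = split_content_function(clause)
print(content, function)  # ['東京'] ['に']
```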

In the first invention, whether to rearrange the order of the two or more kinds of elements of a clause may be determined for each clause based on the dependency relations between clauses obtained from the result of dependency analysis of the input sentence, and the order of the two or more kinds of elements may be rearranged accordingly.

In the first invention, whether to rearrange the order of the two or more kinds of elements of a clause may be determined for each clause based on the composition of the elements of at least one of the clauses that depend on that clause (dependency sources) and the clause on which it depends (dependency destination), as obtained from the result of dependency analysis of the input sentence, and the order of the two or more kinds of elements may be rearranged accordingly.

In the first invention, whether to rearrange the order of the two or more kinds of elements of a clause may be determined for each clause based on both the dependency relations between clauses obtained from the result of dependency analysis of the input sentence and the composition of the elements of at least one of the dependency source and dependency destination clauses of that clause, and the order of the two or more kinds of elements may be rearranged accordingly.
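A per-clause decision of this kind might be sketched as follows. The condition used here is purely hypothetical: a clause is reordered only if it has function words, depends on another clause, and that destination clause contains content words. The text leaves the concrete conditions open.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Clause:
    content: List[str]
    function: List[str] = field(default_factory=list)
    head: Optional[int] = None  # index of the dependency destination; None for the root

def should_reorder(index, clauses):
    """Hypothetical condition combining a dependency relation and clause composition."""
    clause = clauses[index]
    if not clause.function or clause.head is None:
        return False
    return bool(clauses[clause.head].content)

def rearrange(clauses):
    """Put function words first only in clauses for which should_reorder holds."""
    out = []
    for i, c in enumerate(clauses):
        if should_reorder(i, clauses):
            out.extend(c.function + c.content)
        else:
            out.extend(c.content + c.function)
    return out

# Clauses already in rearranged clause order: [着い た] [東京 に], the second depending on the first
clauses = [Clause(["着い"], ["た"]), Clause(["東京"], ["に"], head=0)]
print(" ".join(rearrange(clauses)))  # 着い た に 東京
```

In this sketch the root clause keeps its internal order, while the dependent clause moves its particle to the front.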

In the first invention, the order of the two or more kinds of elements of each clause may be rearranged so that the function words of the clause are placed at the head of the clause.

The program of the present invention is a program for causing a computer to function as each of the units constituting the word order rearrangement device, the translation device, and the translation model learning device described above.

As described above, according to the word order rearrangement device, method, and program of the present invention, each element in a clause of the input sentence is classified into two or more kinds of elements and the order of the classified two or more kinds of elements is rearranged, so that the word order of the input sentence can be rearranged with high accuracy.

また、本発明の翻訳装置、方法、及びプログラムによれば、入力文の各文節内の各要素を２種類以上の要素に分類し、予め定められた文節並べ替え規則に従って、入力文の文節の順序を並べ替え、予め定められた要素並べ替え規則に従って、分類された２種類以上の要素の順序を並べ替え、入力文を特定言語で記述された文に翻訳することにより、入力文を精度良く翻訳を行うことが出来る。 Further, according to the translation apparatus, method, and program of the present invention, each element in each clause of the input sentence is classified into two or more types of elements, and the phrase of the input sentence is determined according to a predetermined phrase rearrangement rule. Rearranging the order, rearranging the order of two or more classified elements according to a predetermined element rearrangement rule, and translating the input sentence into a sentence written in a specific language, the input sentence is accurately Can translate.

また、本発明の翻訳モデル学習装置、方法、及びプログラムによれば、日本語で記述された文又は語句の各文節内の各要素を２種類以上の要素に分類し、予め定められた文節並べ替え規則に従って、日本語で記述された文又は語句の文節の順序を並べ替え、予め定められた要素並べ替え規則に従って、分類された２種類以上の要素の順序を並べ替えることによって語順を並べ替え、語順を並べ替えた日本語で記述された文又は語句と、対応する対訳データに含まれる特定言語で記述された文又は語句とに基づいて、複数種類の翻訳モデルを学習し、語順を並べ替えた日本語で記述された文又は語句と、対応する対訳データに含まれる特定言語で記述された文又は語句と、複数種類の翻訳モデルとに基づいて、複数種類の翻訳モデルの各々に対する重みを学習することにより、精度良く翻訳を行うことが出来る翻訳モデルを学習することが出来る。 Further, according to the translation model learning apparatus, method, and program of the present invention, each element in each phrase of a sentence or phrase described in Japanese is classified into two or more kinds of elements, and a predetermined phrase arrangement is performed. Rearrange the order of the sentences or phrases in Japanese according to the replacement rules, and rearrange the word order by rearranging the order of two or more types of classified elements according to the predetermined element rearrangement rules Based on sentences or phrases written in Japanese with rearranged word order and sentences or phrases written in a specific language included in the corresponding parallel translation data, learn multiple types of translation models and arrange the word order For each of multiple types of translation models based on the sentences or phrases written in Japanese, the sentences or phrases described in a specific language included in the corresponding parallel translation data, and multiple types of translation models. That by learning the weights, it is possible to learn accurately the translation can be carried out the translation model.

Schematic diagram showing the configuration of the translation device according to the embodiment of the present invention.
Schematic diagram showing the configuration of the translation model learning device according to the embodiment of the present invention.
Flowchart showing the translation model learning processing routine in the translation model learning device according to the embodiment of the present invention.
Flowchart showing the translation processing routine in the translation device according to the embodiment of the present invention.
Diagram showing a KNP analysis result.
Diagram showing the interpretation of the KNP analysis result.
Diagram showing an example in which clauses are rearranged according to the clause rearrangement rules.
Diagram showing an example in which the elements within clauses are rearranged.
Diagram showing the word segmentation result for the English training data.
Diagram showing the rearrangement result for the Japanese training data.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Outline of the invention>
The embodiment of the present invention provides rearrangement rules, applied to Japanese parsing results, for pre-ordering in Japanese-to-English translation, and realizes Japanese-to-English machine translation based on them. The rules not only reorder clauses but also reorder the words within a clause, for example by moving a function word such as a particle to the head of its clause so that it matches the position of an English preposition. This allows Japanese to be rearranged into a word order closer to English than the prior art achieves. In the embodiment, a word that carries meaning on its own is called a "content word", and a word that is used only attached to a content word and that expresses the syntactic or semantic role of that content word is called a "function word".

<Configuration of translation device>
The translation device according to the embodiment of the present invention is described first. As shown in FIG. 1, the translation device 100 can be configured as a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the translation processing routine described later. Functionally, the translation device 100 includes an input unit 10, a calculation unit 20, and an output unit 70, as shown in FIG. 1.

The input unit 10 receives a Japanese input sentence from an input device such as a keyboard. As preprocessing, tags such as HTML and XML tags are removed and the notation is normalized, so that the sentence is received already converted into an appropriate input format. The input unit 10 may also accept input from an external source via a network or the like.
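The preprocessing described above could look roughly like the following sketch. The regular expression, the choice of NFKC as the normalization, and the function name `preprocess` are all assumptions made for illustration; the source does not specify how tags are detected or which normalization is applied, and this sketch does not record the removed tags for later restoration.

```python
import re
import unicodedata

# Assumed, simplistic tag pattern: anything between "<" and ">".
TAG_PATTERN = re.compile(r"<[^>]+>")


def preprocess(raw: str) -> str:
    """Strip markup tags and normalize the notation of one input sentence."""
    without_tags = TAG_PATTERN.sub("", raw)
    # NFKC folds full-width alphanumerics to half-width (an assumed choice).
    normalized = unicodedata.normalize("NFKC", without_tags)
    return normalized.strip()
```

A real system would also need to remember the removed tags so that the output unit can restore them.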

The calculation unit 20 includes a language analysis unit 30, a rearrangement unit 40, a translation unit 42, and a model storage unit 44.

The language analysis unit 30 includes a morphological analysis unit 32, a syntax analysis unit 34, and a predicate-argument structure analysis unit 36. It performs morphological analysis, syntax analysis, and predicate-argument structure analysis on the Japanese input sentence received by the input unit 10.

The morphological analysis unit 32 performs morphological analysis (word segmentation and part-of-speech identification) on the Japanese input sentence received by the input unit 10, using a known morphological analyzer (JUMAN, MeCab, etc.).

The syntax analysis unit 34 parses the Japanese sentence that has been morphologically analyzed by the morphological analysis unit 32. Because the present embodiment uses the dependency structure between Japanese clauses, parsing is performed with a known dependency analyzer (KNP, CaboCha, etc.). These dependency analyzers analyze the dependencies between clauses and also classify each word, as an element within a clause, as a "content word" or a "function word". "Content word" and "function word" are an example of the two or more types of elements. Some clauses may contain only words classified as content words, and others only words classified as function words.

The predicate-argument structure analysis unit 36 performs predicate-argument structure analysis on the Japanese sentence parsed by the syntax analysis unit 34, using a known predicate-argument structure analyzer (KNP, SynCha, etc.). Predicate-argument structure analysis identifies, for each predicate (a verb, an adjective, or a noun expressing an action), its arguments: in Japanese, the subject, called the "ga case"; the object, called the "wo case"; and the phrase expressing the target, called the "ni case". With this analysis, a dependency between clauses provides not only its source and destination but also the syntactic role that the dependency expresses.

The rearrangement unit 40 uses the Japanese dependency structure and predicate-argument structure obtained by the language analysis unit 30 to rearrange the clauses of the Japanese sentence, and the words within those clauses, so that the result resembles the word order of an English sentence. Specifically, it first rearranges the clauses according to the predetermined clause rearrangement rules shown below, which bring the order of the clauses of a Japanese sentence to that of an English sentence. It then rearranges the words of each clause according to a predetermined word rearrangement rule, which brings the order of the "content word" and "function word" within a Japanese clause to their order in English. The word rearrangement rule is an example of the element rearrangement rule.

In English the predicate is placed immediately after the subject, so predicate clauses are rearranged according to the following clause rearrangement rules (1) to (3), which are the same as those of Non-Patent Document 2.
(1) Move the predicate clause to immediately after the subject clause (ga case).
(2) If there is no subject clause, move the predicate clause to immediately before whichever of the object clause (wo case) and the target clause (ni case) comes first.
(3) If the predicate clause is a clause in the verb continuative form, move it to immediately after the word it depends on (the modified word).

If no subject clause, object clause, or target clause is obtained, the following clause rearrangement rule (4) is used as an exception.
(4) If none of the subject clause, object clause, and target clause exists, move the predicate clause to the second position from the end of the sentence.
Because the dependency structure is hierarchical, the predicate clause that is the dependency destination at each level of the hierarchy is moved according to the clause rearrangement rules above.
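Rules (1), (2), and (4) can be illustrated with the following sketch for a single, flat dependency level that contains one predicate clause. The `Clause` type, the role labels, and the function name are hypothetical and introduced only for this example. Rule (3) and the level-by-level application over the dependency hierarchy are omitted because they need the full dependency tree.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    text: str
    role: str  # assumed labels: "subj" (ga), "obj" (wo), "target" (ni), "pred", "other"


def reorder_clauses(clauses):
    """Apply rules (1), (2) and (4) to one flat level with a single predicate."""
    pred = next((c for c in clauses if c.role == "pred"), None)
    if pred is None:
        return list(clauses)
    rest = [c for c in clauses if c is not pred]
    roles = [c.role for c in rest]
    if "subj" in roles:
        # Rule (1): immediately after the subject clause.
        pos = roles.index("subj") + 1
    else:
        arg_positions = [i for i, r in enumerate(roles) if r in ("obj", "target")]
        if arg_positions:
            # Rule (2): immediately before the first object or target clause.
            pos = arg_positions[0]
        else:
            # Rule (4): second position counted from the end of the sentence.
            pos = max(len(rest) - 1, 0)
    return rest[:pos] + [pred] + rest[pos:]
```

For example, the clause sequence 彼は / 本を / 読んだ (subject, object, predicate) becomes 彼は / 読んだ / 本を under rule (1).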

Finally, the words (content words and function words) of each clause are rearranged within that clause according to the following word rearrangement rule (5).
(5) Move the function word to the head of the clause.
Whether to perform the rearrangement according to word rearrangement rule (5) is determined based on the dependency relations between clauses, the composition of the elements of the clause itself, or the composition of the elements of the clauses that depend on it.

Specifically, with regard to the dependency relations between clauses: when the function word in the predicate clause of the whole sentence is an auxiliary verb or a sentence-final particle, it need not be moved to the head of the clause (before the verb). Therefore, if the clause is the predicate clause of the whole sentence (it has no dependency destination), it is determined that rearrangement based on word rearrangement rule (5) is not performed.

With regard to the composition of the elements of the clause itself: when the function word of a subject clause is the case particle "が" (ga) or "は" (wa), it is determined that rearrangement based on word rearrangement rule (5) is not performed.

With regard to the composition of the elements of the clauses that depend on the clause in question (its dependency-source clauses): if a dependency-source clause contains the case particle "が" (ga) or "は" (wa), or contains the case particle "を" (wo), the clause in question is a predicate clause, and it is therefore determined that rearrangement based on word rearrangement rule (5) is not performed. Whether to perform rearrangement based on the word rearrangement rule may also be determined based on the composition of the elements of the clause on which the clause in question depends.

Whether to rearrange the elements within a clause may also be determined by combining the conditions based on each of the following: the dependency relations between clauses, the composition of the elements of the clause itself, the composition of the elements of the clauses that depend on it, and the composition of the elements of the clause on which it depends.
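The conditions above can be combined in many ways. The sketch below shows one possible combination, in which word rearrangement rule (5) is applied unless any of the three exclusion conditions holds. The `ClauseElements` type, its field names, and the function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field

SUBJECT_PARTICLES = {"が", "は"}
PREDICATE_MARKING_PARTICLES = {"が", "は", "を"}


@dataclass
class ClauseElements:
    content: list                 # content words, in original order
    function: list                # function words, in original order
    is_subject: bool = False      # True for a ga-case (subject) clause
    has_destination: bool = True  # False for the predicate clause of the whole sentence
    source_particles: set = field(default_factory=set)  # case particles in dependency-source clauses


def should_front_function_words(clause):
    """Decide whether rule (5) applies (one assumed combination of the conditions)."""
    if not clause.function:
        return False
    if not clause.has_destination:
        return False  # predicate clause of the whole sentence
    if clause.is_subject and SUBJECT_PARTICLES & set(clause.function):
        return False  # subject clause marked with "が" or "は"
    if PREDICATE_MARKING_PARTICLES & clause.source_particles:
        return False  # a dependent carries "が", "は" or "を", so this is a predicate clause
    return True


def reorder_words(clause):
    """Rule (5): move the function words to the head of the clause when allowed."""
    if should_front_function_words(clause):
        return clause.function + clause.content
    return clause.content + clause.function
```

With these assumptions, a clause such as 東京 + に is reordered to に + 東京, mirroring the English order "to Tokyo", while a subject clause such as 彼 + は is left unchanged.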

The translation unit 42 translates the Japanese input sentence rearranged by the rearrangement unit 40 using a known machine translator. From among a plurality of translation candidate sentences, it selects the candidate with the best translation score, based on the translation models stored in the model storage unit 44 and the weight for each translation model, and outputs it to the output unit 70. The translation itself can be realized with the known statistical machine translation technique of Non-Patent Document 4 (Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical Phrase-based Translation. In Proc. HLT-NAACL, pages 263-270.), and a detailed description is omitted.
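As a rough illustration of candidate selection, the sketch below scores each candidate by a weighted sum of per-model scores and picks the highest. The model names, the use of log-probability-like scores where higher is better, and both function names are assumptions for this example and are not taken from the source.

```python
def translation_score(model_scores, weights):
    """Weighted sum of the individual translation model scores."""
    return sum(weights[name] * score for name, score in model_scores.items())


def select_best(candidates, weights):
    """Return the text of the candidate with the highest translation score.

    `candidates` is a list of (text, {model_name: score}) pairs.
    """
    best_text, _ = max(candidates, key=lambda c: translation_score(c[1], weights))
    return best_text
```

Usage, with made-up scores:

```python
weights = {"phrase": 1.0, "reorder": 0.5, "lm": 0.5}
candidates = [
    ("he a book read", {"phrase": -2.0, "reorder": -1.0, "lm": -3.0}),
    ("he read a book", {"phrase": -1.0, "reorder": -2.0, "lm": -3.0}),
]
best = select_best(candidates, weights)
```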

The model storage unit 44 stores translation models and per-model weights identical to those stored in the model storage unit 262 of the translation model learning device 200 described later.

The output unit 70 restores or adds the XML tags, HTML tags, and the like that were removed in the preprocessing before input to the English sentence produced by the translation unit 42, and outputs the result externally via an output device, a network, or the like.

<Configuration of translation model learning device>
Next, the configuration of the translation model learning device according to the embodiment of the present invention is described. As shown in FIG. 2, the translation model learning device 200 can be configured as a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the translation model learning processing routine described later. Functionally, the translation model learning device 200 includes an input unit 210, a calculation unit 220, and an output unit 270, as shown in FIG. 2.

The input unit 210 receives, from an input device such as a keyboard, a learning parallel corpus, which is a set of sentence pairs and phrase pairs (parallel translation data) in which Japanese and English are translations of each other. The input unit 210 may also accept input from an external source via a network or the like.

The calculation unit 220 includes a learning parallel corpus 250, a model learning unit 252, and a model storage unit 262.

The learning parallel corpus 250 stores the learning parallel corpus received by the input unit 210. The learning parallel corpus is a text file in which each Japanese sentence and its English translation are written at the same line number.
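A line-aligned corpus of this kind can be paired up as in the sketch below. It assumes, for illustration only, that the Japanese side and the English side are available as two separate blocks of text with one sentence per line. The source does not specify the file layout beyond matching line numbers, and the function name is hypothetical.

```python
def pair_lines(ja_text: str, en_text: str):
    """Pair Japanese and English sentences that sit on the same line number."""
    ja_lines = ja_text.splitlines()
    en_lines = en_text.splitlines()
    if len(ja_lines) != len(en_lines):
        raise ValueError("both sides must have the same number of lines")
    return list(zip(ja_lines, en_lines))
```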

The model learning unit 252 includes a learning data language analysis unit 254, a learning data rearrangement unit 256, a model parallel corpus 257, a translation model learning unit 258, a translation unit 259, and a model weight learning unit 260. It learns statistical translation models and model weights from the learning parallel corpus stored in the learning parallel corpus 250, and stores them in the model storage unit 262.

The learning data language analysis unit 254 performs language analysis of the Japanese and the English in each item of parallel translation data in the learning parallel corpus. For Japanese, it performs the same language analysis as the language analysis unit 30 of the translation device 100 (morphological analysis, syntax analysis, and predicate-argument structure analysis). For English, it identifies word boundaries. Because English is normally already separated into words, it is sufficient to split off sentence-final punctuation and the like, although any other word segmentation method commonly used in the field may be used instead.
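The minimal English segmentation mentioned above, splitting only the sentence-final punctuation from the last word, might be sketched as follows. The set of punctuation characters and the function name are assumptions. Abbreviations and sentence-internal punctuation are deliberately not handled.

```python
import re


def split_english(sentence: str):
    """Split on whitespace and detach sentence-final punctuation from the last word."""
    tokens = sentence.split()
    if tokens:
        match = re.fullmatch(r"(.+?)([.!?]+)", tokens[-1])
        if match:
            tokens[-1:] = [match.group(1), match.group(2)]
    return tokens
```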

For each item of parallel translation data analyzed by the learning data language analysis unit 254, the learning data rearrangement unit 256 rearranges the word order of the Japanese sentence or phrase by rearranging its clauses and the words within them, using the same rearrangement rules as the rearrangement unit 40 of the translation device 100. It then stores each rearranged Japanese sentence or phrase, together with the English sentence or phrase (segmented into words) from the corresponding parallel translation data, in the model parallel corpus 257.

The model parallel corpus 257 stores, as the model parallel corpus, the set of parallel translation data that pairs each Japanese sentence or phrase rearranged by the learning data rearrangement unit 256 with the English sentence or phrase (segmented into words) from the corresponding parallel translation data.

The translation model learning unit 258 learns a plurality of statistical translation models from the set of parallel translation data stored in the model parallel corpus 257, and stores them in the model storage unit 262. Statistical translation models include the "phrase translation model", the "phrase reordering model", and the "language model" used in statistical machine translation techniques such as that of Non-Patent Document 4, and the combination can be chosen as appropriate to constrain the machine translation. Learning methods such as that of Non-Patent Document 4 are widely known, but the present embodiment is not limited to any particular method.

For example, the translation model learning unit 258 learns a statistical phrase translation model and a statistical phrase reordering model from the parallel translation data stored in the model parallel corpus 257, which consists of Japanese sentences or phrases and English sentences or phrases. It also learns a language model from the English sentences or phrases stored in the model parallel corpus 257.

For each Japanese sentence or phrase stored in the model parallel corpus, which serves as weight learning data, the translation unit 259 creates a plurality of translation candidates using a known machine translator together with the translation models and the per-model weights stored in the model storage unit 262. For example, it calculates a translation score (for example, the weighted sum of the scores of the individual translation models) using the weights stored in the model storage unit 262, and extracts as translation candidates only those whose translation score exceeds a certain value.

Based on the translation candidates for the weight learning data and the English sentences or phrases (reference translations) stored in the model parallel corpus 257 that correspond to that data, the model weight learning unit 260 calculates a translation evaluation measure (for example, a BLEU score) for each translation candidate extracted by the translation unit 259.

Then, based on the translation evaluation measure of each translation candidate extracted by the translation unit 259, the model weight learning unit 260 optimizes the weight for each translation model so that better translation candidates receive higher translation scores (for example, the weighted sum of the scores of the individual translation models) when those scores are calculated with the weights. It stores the resulting weights in the model storage unit 262.

The extraction of translation candidates by the translation unit 259 and the learning of weights by the model weight learning unit 260 are repeated until the weight for each translation model converges.
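The alternation between candidate extraction and weight learning can be sketched as a simple loop. The convergence test (largest absolute change in any weight below a tolerance), the iteration cap, and the two callback parameters `decode` and `optimize` are assumptions made for this illustration. The source states only that the two steps repeat until the weights converge.

```python
def learn_weights(weights, decode, optimize, tol=1e-4, max_iter=100):
    """Alternate decoding and weight optimization until the weights stop changing.

    decode(weights) returns translation candidates produced with the current weights.
    optimize(candidates, weights) returns an updated weight dictionary.
    """
    for _ in range(max_iter):
        candidates = decode(weights)
        new_weights = optimize(candidates, weights)
        delta = max(abs(new_weights[k] - weights[k]) for k in weights)
        weights = new_weights
        if delta < tol:
            break
    return weights
```

In practice `decode` would wrap the machine translator and `optimize` would use the translation evaluation measure of each candidate; both are left abstract here.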

The output unit 270 outputs the translation models and model weights stored in the model storage unit 262.

<Operation of translation model learning device>
Next, the operation of the translation model learning device 200 according to the embodiment of the present invention is described. First, a learning parallel corpus, which is a set of sentence pairs and phrase pairs (parallel translation data) in which Japanese and English are translations of each other, is input through the input unit 210 and stored in the learning parallel corpus 250. The CPU then executes the program stored in the ROM of the translation model learning device 200, whereby the translation model learning processing routine shown in FIG. 3 is executed.

First, in step S200, the learning parallel corpus stored in the learning parallel corpus 250 is read.

Next, in step S202, morphological analysis (word segmentation and part-of-speech identification) is performed on each Japanese sentence or phrase in the learning parallel corpus read in step S200, using a known morphological analyzer (JUMAN, MeCab, etc.).

Next, in step S204, for each Japanese sentence or phrase morphologically analyzed in step S202, a known dependency analyzer (KNP, CaboCha, etc.) is used to analyze the dependencies between clauses and to classify each word within a clause as a "content word" or a "function word".

Next, in step S206, predicate-argument structure analysis is performed on each Japanese sentence or phrase parsed in step S204, using a known predicate-argument structure analyzer (KNP, SynCha, etc.). For each predicate clause (a verb, an adjective, or a noun expressing an action), this identifies the subject clause, called the "ga case", the object clause, called the "wo case", and the target clause, called the "ni case".

Next, in step S207, word boundaries are identified for each English sentence or phrase in the learning parallel corpus read in step S200.

Next, in step S208, the word order of each Japanese sentence or phrase analyzed in step S206 is rearranged by first rearranging its clauses with the clause rearrangement rules and then rearranging the words within each clause with the word rearrangement rule. Each rearranged Japanese sentence or phrase is stored in the model parallel corpus 257 together with the English sentence or phrase from the corresponding parallel translation data (as obtained in step S207).

Next, in step S212, a statistical phrase translation model and a statistical phrase reordering model are learned from the set of parallel translation data stored in the model parallel corpus 257, which consists of Japanese sentences or phrases and English sentences or phrases, and a language model is learned from the English sentences or phrases stored in the model parallel corpus 257. The learned translation models are stored in the model storage unit 262, together with an initial value of the weight for each translation model.

Next, in step S214, for each Japanese sentence or phrase whose word order was rearranged in step S208, a plurality of translation candidates are created based on the translation score, using a known machine translator, the translation models learned in step S212, and the per-model weights stored in the model storage unit 262.

In step S216, a translation evaluation measure is calculated for each of the translation candidates created in step S214, based on the corresponding English sentence or phrase (reference translation) stored in the model parallel corpus 257.

Then, based on the translation evaluation measure and the translation score of each translation candidate, the weight for each translation model is optimized, and the learned weights are stored in the model storage unit 262.

Next, in step S218, it is determined whether the weights for the translation models learned in step S216 have converged. If they have converged, the process ends. If not, the process returns to step S214.

<Operation of the Translation Device>
Next, the operation of the translation device 100 according to the embodiment of the present invention will be described. First, the plurality of translation models learned by the translation model learning device 200 and the weights for each of the translation models are input through the input unit 10 and stored in the model storage unit 44. Then, when a sentence written in Japanese is input through the input unit 10, the CPU executes the program stored in the ROM of the translation device 100, whereby the translation processing routine shown in FIG. 4 is executed.

First, in step S100, an input sentence written in Japanese is received.

Next, in step S102, morphological analysis (word segmentation and part-of-speech identification) is performed on the Japanese input sentence received in step S100, using a known morphological analyzer (JUMAN, MeCab, etc.).

Next, in step S104, the input sentence morphologically analyzed in step S102 is parsed using a known dependency parser (KNP, CaboCha, etc.) to analyze the dependencies between clauses, and each word within a clause is classified as a "content word" or a "function word".

Next, in step S106, predicate-argument structure analysis is performed on the input sentence parsed in step S104, using a known predicate-argument structure analyzer (KNP, SynCha, etc.), to identify the subject clause, the object clause, and the target clause for each predicate clause.

Next, in step S108, the word order of the input sentence analyzed in step S106 is rearranged by first reordering the clauses according to the clause rearrangement rules and then reordering the words within each clause according to the word rearrangement rules.

Next, in step S112, the reordered input sentence obtained in step S108 is translated using a known machine translator, and, from among a plurality of translation candidate sentences, the candidate with the best translation score is selected based on the translation models stored in the model storage unit 44 and the weights for each of the translation models.

Next, in step S114, the translation result selected in step S112 is output, and the process ends.

<Example of the Translation Model Learning Device 200>
Next, an example in which statistical translation models were learned from a Japanese-English parallel corpus of about 3 million sentences will be described.

The learning data language analysis unit 254 of the translation model learning device 200 performs language analysis of Japanese and English. The Japanese analysis is the same processing as that of the language analysis unit 30 of the translation device 100 and yields the same kind of result. For English, only word segmentation is performed, using the word segmentation program bundled with Moses. An example of the English word segmentation result is shown in FIG. 9.

The learning data rearrangement unit 256 of the translation model learning device 200 applies, to the Japanese side of the parallel corpus, the same rearrangement processing as the rearrangement unit 40 of the translation device 100. The result of this processing is shown in FIG. 10.

As described above, the Japanese-English parallel corpus is rewritten into a set of reordered Japanese word strings and a set of English word strings. Each set of word strings is a text file that stores one sentence per line, with words separated by half-width spaces as in the word segmentation example above. In translation model learning, the training program provided with Moses creates "phrase translation model DB: phrase-table.gz" and "phrase reordering model DB: reordering-table.wbe-msd-bidirectional-fe.gz" from the text files representing the sets of Japanese and English word strings.
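
The file layout described above (one sentence per line, words separated by half-width spaces, with the Japanese and English files aligned line by line) can be sketched as follows. The function name and the file paths passed to it are hypothetical; the embodiment does not name the corpus files.

```python
from typing import Iterable, List, Tuple


def write_parallel_corpus(
    pairs: Iterable[Tuple[List[str], List[str]]],
    ja_path: str,
    en_path: str,
) -> None:
    """Write reordered Japanese and English word strings to two aligned files.

    Line i of ja_path and line i of en_path form one parallel sentence pair.
    """
    with open(ja_path, "w", encoding="utf-8") as ja_file, \
            open(en_path, "w", encoding="utf-8") as en_file:
        for ja_words, en_words in pairs:
            # Words are joined with a single half-width (ASCII) space.
            ja_file.write(" ".join(ja_words) + "\n")
            en_file.write(" ".join(en_words) + "\n")
```

Keeping the two files line-aligned is what lets a training program pair each Japanese sentence with its English counterpart without any extra index.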

Further, "language model DB: ja.5gram.arpa.gz" is created from the text file of the set of English word strings using SRILM, a known language model training program. In this example, a word 5-gram language model is created.

Furthermore, the weights for each model are learned by a known method called Minimum Error Rate Training (MERT) (Non-Patent Document 5), which determines the optimal values of the weights among the models, and are written to the translation program configuration file together with the information on the model DBs described above.

<Example of the Translation Device 100>
An example in which the Japanese sentence "データ保存装置１０がデータ収集装置２０に接続される。" ("The data storage device 10 is connected to the data collection device 20.") is input to a computer terminal on which the translation device 100 is implemented is described below.

In this example, the known Japanese morphological analysis software JUMAN and the known parsing software KNP, which includes predicate-argument structure analysis, are used for Japanese language analysis. To perform the morphological analysis, parsing, and predicate-argument structure analysis steps in one pass, the Japanese sentence supplied from the input unit 10 is fed to JUMAN via the standard input of the computer terminal, and JUMAN's output is passed directly to KNP as its input. The KNP analysis result is output as shown in FIG. 5, in which part of the information not used in this example has been omitted.

A line-initial "*" marks the beginning of a clause, and a line-initial "+" indicates that the word on the following line is a content word. The interpretation of this analysis result is shown in FIG. 6.
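
A minimal sketch of reading such output follows, using only the convention stated above: "*" opens a clause, "+" marks the next word as a content word, and every other word is treated as a function word. The sample lines are a simplified, hypothetical stand-in for FIG. 5, not actual KNP output, which carries many more fields per line.

```python
from typing import List, Tuple

Word = Tuple[str, str]  # (surface form, "content" or "function")


def parse_clauses(lines: List[str]) -> List[List[Word]]:
    """Group words into clauses and label each word as content or function."""
    clauses: List[List[Word]] = []
    next_is_content = False
    for raw in lines:
        line = raw.strip()
        if not line or line == "EOS":
            continue
        if line.startswith("*"):
            clauses.append([])  # a new clause begins
            next_is_content = False
        elif line.startswith("+"):
            next_is_content = True  # the following word is a content word
        else:
            surface = line.split()[0]  # first field is assumed to be the surface form
            kind = "content" if next_is_content else "function"
            if not clauses:
                clauses.append([])
            clauses[-1].append((surface, kind))
            next_is_content = False
    return clauses


# Hypothetical, heavily simplified sample in the style described above.
SAMPLE = [
    "* 1D",
    "+ 1D",
    "装置 名詞",
    "が 助詞",
    "* -1D",
    "+ -1D",
    "接続 名詞",
    "される 動詞",
    "EOS",
]
```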

The rearrangement unit 40 of the translation device 100 reorders the clauses of the Japanese sentence according to the clause rearrangement rules. In this example, the predicate clause of the above Japanese sentence is moved to immediately after the subject (ga-case) clause. The full stop, however, is a symbol marking the end of the sentence and is therefore left at the end. The result of this reordering is shown in FIG. 7. Then, the function words within each clause are moved to the head of the clause, except for the particles "は" and "が" of the subject clause, giving the result shown in FIG. 8.
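
The two rules applied in this example can be sketched as follows. The `Clause` type, the role labels, and the word segmentation of the example sentence are assumptions made for illustration (FIGS. 5 to 8 are not reproduced here), so the exact tokens may differ from the actual analysis.

```python
from dataclasses import dataclass
from typing import List

SUBJECT_PARTICLES = {"は", "が"}


@dataclass
class Clause:
    content: List[str]   # content words, in original order
    function: List[str]  # function words, in original order
    role: str            # "subject", "predicate", "punct", or "other"


def reorder_clauses(clauses: List[Clause]) -> List[Clause]:
    """Move the predicate clause to just after the subject clause.

    Sentence-final punctuation stays at the end of the sentence.
    """
    body = [c for c in clauses if c.role != "punct"]
    tail = [c for c in clauses if c.role == "punct"]
    subject = next((c for c in body if c.role == "subject"), None)
    predicate = next((c for c in body if c.role == "predicate"), None)
    if subject is None or predicate is None:
        return body + tail
    body = [c for c in body if c is not predicate]
    index = next(i for i, c in enumerate(body) if c is subject)
    body.insert(index + 1, predicate)
    return body + tail


def reorder_words(clause: Clause) -> List[str]:
    """Put function words at the head of the clause.

    The particles は and が of a subject clause are left after the content words.
    """
    keep = clause.role == "subject"
    moved = [w for w in clause.function if not (keep and w in SUBJECT_PARTICLES)]
    kept = [w for w in clause.function if keep and w in SUBJECT_PARTICLES]
    return moved + clause.content + kept


def rearrange(clauses: List[Clause]) -> List[str]:
    words: List[str] = []
    for clause in reorder_clauses(clauses):
        words.extend(reorder_words(clause))
    return words


# Assumed segmentation of the example sentence.
EXAMPLE = [
    Clause(["データ", "保存", "装置", "１０"], ["が"], "subject"),
    Clause(["データ", "収集", "装置", "２０"], ["に"], "other"),
    Clause(["接続"], ["される"], "predicate"),
    Clause(["。"], [], "punct"),
]
```

Under these assumptions, `rearrange(EXAMPLE)` yields the word sequence データ 保存 装置 １０ が される 接続 に データ 収集 装置 ２０ 。, which follows the English order "data storage device 10 / is connected / to data collection device 20".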

When the clause "データ保存装置１０が" is itself modified by another clause, as in the input sentence "図１に示すデータ保存装置１０がデータ収集装置２０に接続される" ("The data storage device 10 shown in FIG. 1 is connected to the data collection device 20"), hierarchical processing is required. First, based on the inter-clause relation "図１に示す" → "データ保存装置１０が", the modified clause is reordered as "データ保存装置１０「示すに図１」が" according to the following rules: the clause rearrangement rule that, when a predicate clause is a clause in the verb continuative form, the predicate clause is moved to immediately after the word it modifies; the clause rearrangement rule that, when there is no subject clause, the predicate clause is moved to immediately before whichever of the object clause (wo-case) and the target clause (ni-case) comes first; and the word rearrangement rules. The clauses are then reordered according to the clause rearrangement rules described above.

The translation unit 42 of the translation device 100 translates the Japanese sentence reordered by the rearrangement unit 40 into English. In this example, the known statistical translation software Moses was used, together with the statistical translation models (phrase translation model, phrase reordering model, and language model) learned by the translation model learning unit 258 of the translation model learning device 200 and the weights for each model. In this example, the Moses output "the data storage device 10 is connected to a data collecting device 20." is sent to the output unit 70.

The machine translation program implemented by the method of the above example was found to show higher translation performance than both a machine translation program built with the conventional technique and a machine translation program based on the conventional head-final reordering method. In an experiment using statistical models trained on a Japanese-English parallel corpus of about 3 million sentences, the BLEU score, the most commonly used evaluation metric in the field, was 0.2956 for the conventional machine translation program without pre-ordering, whereas the machine translation program of this example achieved 0.3170.

As described above, the translation device according to the embodiment of the present invention classifies each word in each clause of the input sentence as a "content word" or a "function word", reorders the clauses of the input sentence according to predetermined clause rearrangement rules, and reorders the "content words" and "function words" according to predetermined word rearrangement rules, thereby rearranging the word order. By translating the reordered input sentence into a sentence written in the target language, the input sentence can be translated accurately.

Further, the translation model learning device according to the embodiment of the present invention classifies each word in each clause of a Japanese sentence or phrase as a "content word" or a "function word", reorders the clauses of the Japanese sentence or phrase according to predetermined clause rearrangement rules, and reorders the "content words" and "function words" according to predetermined word rearrangement rules, thereby rearranging the word order. It learns a plurality of types of translation models from the reordered Japanese sentences or phrases and the sentences or phrases written in the target language contained in the corresponding parallel translation data. It then learns a weight for each of the plurality of types of translation models from the reordered Japanese sentences or phrases, the sentences or phrases written in the specific language contained in the corresponding parallel translation data, and the plurality of types of translation models. In this way, translation models capable of accurate translation can be learned.

In addition, in Japanese-to-English translation, the Japanese word order can be rearranged into a word order close to that of English. The word order difference between the two languages therefore becomes very small, which makes translation easier.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, although the case where morphological analysis, parsing, and predicate-argument structure analysis are performed separately has been described, the invention is not limited to this; morphological analysis and parsing may be performed simultaneously, and parsing and predicate-argument structure analysis may also be performed simultaneously.

Also, although the case where the Japanese word order is rearranged to resemble the English word order has been described, the invention is not limited to this; for any two different languages, the word order of the first language may be rearranged to resemble that of the second language.

Description of Symbols

10 Input unit
20 Computation unit
30 Language analysis unit
32 Morphological analysis unit
34 Syntax analysis unit
36 Predicate-argument structure analysis unit
40 Rearrangement unit
42 Translation unit
44 Model storage unit
70 Output unit
100 Translation device
200 Translation model learning device
210 Input unit
220 Computation unit
250 Learning parallel corpus
252 Model learning unit
254 Learning data language analysis unit
256 Learning data rearrangement unit
257 Model parallel corpus
258 Translation model learning unit
259 Translation unit
260 Model weight learning unit
262 Model storage unit
270 Output unit

Claims

A word order rearrangement device comprising:
a syntax analysis unit that, for each clause of an input sentence, classifies each element in the clause into two or more types of elements; and
a rearrangement unit that rearranges the word order of the input sentence by rearranging, for each clause, the order of the two or more types of elements classified by the syntax analysis unit in accordance with a predetermined rearrangement rule.

The word order rearrangement device according to claim 1, wherein the syntax analysis unit classifies each element in the clause as a content word or a function word.

The word order rearrangement device according to claim 1, wherein the rearrangement unit rearranges, for each clause, the order of the two or more types of elements of the clause based on a dependency relationship between clauses obtained from a result of dependency analysis on the input sentence.

The word order rearrangement device according to claim 1 or 2, wherein the rearrangement unit determines, for each clause, whether to rearrange the order of the two or more types of elements of the clause based on the element composition of at least one of a dependent clause and a head clause of the clause, obtained from a result of dependency analysis on the input sentence, and rearranges the order of the two or more types of elements accordingly.

The word order rearrangement device according to claim 1 or 2, wherein the rearrangement unit determines, for each clause, whether to rearrange the order of the two or more types of elements of the clause based on a dependency relationship between clauses obtained from a result of dependency analysis on the input sentence and on the element composition of at least one of a dependent clause and a head clause of the clause, and rearranges the order of the two or more types of elements accordingly.

The word order rearrangement device according to claim 2, wherein the rearrangement unit rearranges, for each clause, the order of the two or more types of elements of the clause so that the function word of the clause is placed at the head of the clause.

A word order rearrangement device that rearranges the word order of an input sentence written in Japanese into a word order close to that of a sentence written in a specific language different from Japanese, the device comprising:
a syntax analysis unit that, for each clause of the input sentence, classifies each element in the clause into two or more types of elements; and
a rearrangement unit that rearranges the order of the clauses of the input sentence in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language, and that, for each clause of the input sentence thus rearranged, rearranges the order of the two or more types of elements classified by the syntax analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more types of elements in a clause of a sentence written in Japanese into the order of the two or more types of elements in a clause of a sentence written in the specific language, thereby rearranging the word order of the input sentence.

A translation device that translates an input sentence written in Japanese into a sentence written in a specific language different from Japanese, the device comprising:
a syntax analysis unit that, for each clause of the input sentence, classifies each element in the clause into two or more types of elements;
a rearrangement unit that rearranges the order of the clauses of the input sentence in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language, and that, for each clause of the input sentence thus rearranged, rearranges the order of the two or more types of elements classified by the syntax analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more types of elements in a clause of a sentence written in Japanese into the order of the two or more types of elements in a clause of a sentence written in the specific language, thereby rearranging the word order of the input sentence; and
a translation unit that translates the input sentence in which the order of the elements has been rearranged by the rearrangement unit into a sentence written in the specific language, based on a plurality of types of translation models and a weight for each of the plurality of types of translation models.

A translation model learning device that learns, based on a set of parallel translation data prepared in advance, each being a pair of a sentence or phrase written in Japanese and a sentence or phrase written in a specific language different from Japanese, a plurality of types of translation models for translating a sentence written in Japanese into a sentence written in the specific language and a weight for each of the plurality of types of translation models, the device comprising:
a learning data language analysis unit that, for each clause of the Japanese sentence or phrase of each parallel translation data in the set, classifies each element in the clause into two or more types of elements;
a learning data rearrangement unit that rearranges the order of the clauses of the Japanese sentence or phrase of each parallel translation data in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language, and that, for each clause of the sentence or phrase thus rearranged, rearranges the order of the two or more types of elements classified by the learning data language analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more types of elements in a clause of the sentence or phrase into the order of the two or more types of elements in a clause of a sentence or phrase written in the specific language, thereby rearranging the word order of the sentence or phrase;
a translation model learning unit that learns the plurality of types of translation models based on the Japanese sentence or phrase of each parallel translation data in which the order of the elements has been rearranged by the learning data rearrangement unit and the sentence or phrase written in the specific language of each parallel translation data; and
a model weight learning unit that learns the weight for each of the plurality of types of translation models based on the Japanese sentence or phrase of each parallel translation data in which the order of the elements has been rearranged by the learning data rearrangement unit, the sentence or phrase written in the specific language of each parallel translation data, and the plurality of types of translation models learned by the translation model learning unit.

A word order rearrangement method for a word order rearrangement device including a syntax analysis unit and a rearrangement unit, wherein
the syntax analysis unit classifies, for each clause of an input sentence, each element in the clause into two or more types of elements, and
the rearrangement unit rearranges the order of the clauses of the input sentence written in Japanese based on a dependency analysis result for each clause of the input sentence and a predicate-argument structure analysis result for each predicate of the input sentence, and rearranges the order of the two or more types of elements classified by the syntax analysis unit.

A word order rearrangement method for a word order rearrangement device that includes a syntax analysis unit and a rearrangement unit and that rearranges the word order of an input sentence written in Japanese into a word order close to that of a sentence written in a specific language different from Japanese, wherein
the syntax analysis unit classifies, for each clause of the input sentence, each element in the clause into two or more types of elements, and
the rearrangement unit rearranges the order of the clauses of the input sentence in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language, and, for each clause of the input sentence thus rearranged, rearranges the order of the two or more types of elements classified by the syntax analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more types of elements in a clause of a sentence written in Japanese into the order of the two or more types of elements in a clause of a sentence written in the specific language, thereby rearranging the word order of the input sentence.

A translation method for a translation device that includes a syntax analysis unit, a rearrangement unit, and a translation unit and that translates an input sentence written in Japanese into a sentence written in a specific language different from Japanese, wherein
the syntax analysis unit classifies, for each clause of the input sentence, each element in the clause into two or more types of elements,
the rearrangement unit rearranges the order of the clauses of the input sentence in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language, and, for each clause of the input sentence thus rearranged, rearranges the order of the two or more types of elements classified by the syntax analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more types of elements in a clause of a sentence written in Japanese into the order of the two or more types of elements in a clause of a sentence written in the specific language, thereby rearranging the word order of the input sentence, and
the translation unit translates the input sentence in which the order of the elements has been rearranged by the rearrangement unit into a sentence written in the specific language, based on a plurality of types of translation models and a weight for each of the plurality of types of translation models.

A translation model learning method for a translation model learning device that includes a learning data language analysis unit, a learning data rearrangement unit, a translation model learning unit, and a model weight learning unit and that learns, based on a set of parallel translation data, each being a pair of a sentence or phrase written in Japanese and a sentence or phrase written in a specific language different from Japanese, a plurality of types of translation models for translating a sentence written in Japanese into a sentence written in the specific language and a weight for each of the plurality of types of translation models, wherein
the learning data language analysis unit classifies, for each clause of the Japanese sentence or phrase of each parallel translation data in the set, each element in the clause into two or more types of elements,
the learning data rearrangement unit rearranges the order of the clauses of the Japanese sentence or phrase of each parallel translation data in accordance with a predetermined clause rearrangement rule for rearranging the clause order of a sentence written in Japanese into the clause order of a sentence written in the specific language, and, for each clause of the sentence or phrase thus rearranged, rearranges the order of the two or more types of elements classified by the learning data language analysis unit in accordance with a predetermined element rearrangement rule for rearranging the order of the two or more types of elements in a clause of the sentence or phrase into the order of the two or more types of elements in a clause of a sentence or phrase written in the specific language, thereby rearranging the word order of the sentence or phrase,
the translation model learning unit learns the plurality of types of translation models based on the Japanese sentence or phrase of each parallel translation data in which the order of the elements has been rearranged by the learning data rearrangement unit and the sentence or phrase written in the specific language of each parallel translation data, and
the model weight learning unit learns the weight for each of the plurality of types of translation models based on the Japanese sentence or phrase of each parallel translation data in which the order of the elements has been rearranged by the learning data rearrangement unit, the sentence or phrase written in the specific language of each parallel translation data, and the plurality of types of translation models learned by the translation model learning unit.

A program for causing a computer to function as each unit constituting the word order rearrangement device according to any one of claims 1 to 7, the translation device according to claim 8, or the translation model learning device according to claim 9.