JP2016164707A

JP2016164707A - Automatic translation device and translation model learning device

Info

Publication number: JP2016164707A
Application number: JP2015044418A
Authority: JP
Inventors: 富士 秀 (Hide Fuji); 内山 将夫 (Masao Uchiyama); 隅田 英一郎 (Eiichiro Sumida)
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2015-03-06
Filing date: 2015-03-06
Publication date: 2016-09-08
Anticipated expiration: 2035-03-06
Also published as: JP6952967B2

Abstract

PROBLEM TO BE SOLVED: To increase the accuracy of automatic translation of long sentences that can be divided into structural parts.
SOLUTION: A claim translation part 328 includes: a pattern search part 500 which identifies the pattern of a syntactically analyzed English claim 334 and divides the claim into structural parts; a Japanese pattern generation part 504 which identifies the Japanese pattern associated with the English pattern; an input sentence division part 502 and a structural part kind determination part 510 which identify the correspondence between the structural parts of the English and Japanese patterns and the grammatical characteristic of each structural part; a statistical automatic translation machine 524 which generates a Japanese translation of the word string of each structural part of the English claim 334, using whichever of a noun phrase translation model 320 and a verb phrase translation model 322 was trained for word strings having that part's grammatical characteristic; and a Japanese pattern substitution part 526 which substitutes the Japanese translation of each structural part of the English claim 334 into the corresponding structural part of the Japanese pattern, thereby generating a Japanese claim 326.
SELECTED DRAWING: Figure 9

Description

TECHNICAL FIELD The present invention relates to automatic translation technology, and in particular to technology for handling sentences that follow certain rules yet are long and difficult to translate, such as the claims of a patent application (hereinafter "claims").

To check the technical content of a patent application written in a foreign language, or to learn the scope of its rights, it is sometimes necessary to understand claims written in that language. If only a few documents are involved and the reader knows the language of the documents, reading the originals may pose little difficulty. However, when the documents are numerous or the reader knows little of the language, reading every claim is unrealistic. In such cases, readers often try to grasp the content by automatically translating each claim into their native language.

It is well known, however, that the quality of automatically translated claims is low. By convention, a claim states many limitations of the invention in a single sentence, so claim sentences tend to be long. Moreover, in technical fields where many inventors compete fiercely in development, the difference from the prior art sometimes cannot be made clear unless a relatively large number of limitations (constituent features) are added to the claim. As a result, claims are often sentences of a length that hardly ever occurs in ordinary documents. Although automatic translation technology has advanced greatly in recent years and its accuracy has improved, highly accurate translation still cannot be expected when the source text is long, as it is for claims.

A proposal for solving this problem is made in Patent Document 1, cited below. The technique described in Patent Document 1 is based on the idea of dividing a long claim into its constituent elements (hereinafter "structural parts") and translating each structural part separately. It exploits the fact that, particularly in recent years, claims can generally be divided into a plurality of structural parts and that those parts are separated by specific delimiter patterns. The structural parts of a claim are the subject matter of the invention, one or more constituent features, and a transition phrase inserted between the subject matter and the constituent features. The order of these parts differs by language. In Japanese, the one or more constituent features come first, followed by the transition phrase, with the subject matter of the invention at the end. In English the order is reversed: the subject matter generally comes first, followed by the transition phrase and then the one or more constituent features.

In Japanese, the transition phrases mainly used are 「含む（ことを特徴とする）」 ("including"), 「を備える（ことを特徴とする）」 ("comprising"), and 「からなる（ことを特徴とする）」 ("consisting of"). In English, "comprising:", "including:", "consisting of:", and the like are used.

Constituent features are mainly separated by 「、」 followed by a line break in Japanese, and by ";" in English.

By detecting such delimiter patterns in a claim, the structure of the claim can be determined.
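As a minimal sketch of delimiter-based splitting, an English claim can be divided at its transition phrase and at semicolons. The function name `split_english_claim`, the regular expression, and the handling of a leading "and" are illustrative assumptions; they are not taken from Patent Document 1 or from this application.

```python
import re

# Transition phrases named in the text; the regex itself is an assumption.
TRANSITION = re.compile(r"\b(comprising|including|consisting of):")

def split_english_claim(claim: str) -> dict:
    """Split an English claim into subject, transition phrase, and elements."""
    match = TRANSITION.search(claim)
    if match is None:
        raise ValueError("no transition phrase found")
    # English order: subject matter, then transition phrase, then elements.
    subject = claim[:match.start()].strip()
    body = claim[match.end():]
    elements = []
    for part in body.split(";"):  # ';' separates elements in English claims
        text = part.strip().rstrip(".").strip()
        text = re.sub(r"^and\s+", "", text)  # drop the "and" before the last element
        if text:
            elements.append(text)
    return {"subject": subject, "transition": match.group(1), "elements": elements}
```

A real system would need many more delimiter patterns than this; the sketch only shows how the order "subject, transition phrase, elements" can be recovered once the transition phrase is found.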

FIG. 1 shows the rough processing flow of the technique disclosed in Patent Document 1. Referring to FIG. 1, the claim translation procedure 30 of Patent Document 1 is computer-implemented and includes: step 40 of performing morphological analysis and syntactic analysis on the text of a claim to generate syntactic information; step 42 of detecting the delimiter patterns of structural parts in the result of the syntactic analysis and dividing the claim text into its constituent elements; and step 44 of identifying, for each structural part, the character string in the text that gives the name of that structural part and attaching to that string a tag indicating that it is a structural part name. As a result of the syntactic analysis, a tree-like syntactic structure is formed among the structural parts. In Patent Document 1, this syntactic structure is expressed in the form of tags attached to the structural parts and to the words and other units within them.

The first embodiment of Patent Document 1 assumes that the target claim is in English. In English, the text for each structural part usually begins with a word corresponding to the name of that part, followed in most cases by a description of the part. This makes processing such as step 44 easy. The descriptions of structural parts are also often patterned. Patent Document 1 calls a description sentence patterned in this way an explanation pattern sentence.

The translation procedure 30 of Patent Document 1 further includes: step 46 of identifying the explanation pattern sentences contained in the structural part being processed, dividing that structural part into individual pattern sentences, and attaching to each a tag indicating that it is an explanation pattern sentence; step 48 of translating the name of the structural part being processed into the target language by automatic translation; step 50 of changing, for each explanation pattern sentence, the sentence structure into one suitable for translation; step 52 of translating each explanation sentence whose structure was changed in step 50 into the target language by automatic translation; and step 54 of displaying the structural part names and pattern sentences thus translated into the target language as a tree in accordance with the tags.

The change of sentence structure in step 50 converts an explanation sentence into a form containing a subject and a predicate. For example, an English relative pronoun is omitted and the structural part name is inserted in its place, or the noun or noun phrase that the explanation pattern sentence modifies is inserted. Patent Document 1 states that conversion patterns are set in advance for this and other sentence conversions. This conversion appears to be intended to match the word order of explanation pattern sentences to that of ordinary sentences.

Referring to FIG. 2, the automatic translation technique of Patent Document 1 more specifically performs the following processing. Assume that English claim 70 is to be translated. In Patent Document 1, the English claim is decomposed into a plurality of structural parts by finding English delimiter patterns, and a tree structure is formed among them (process 72). As noted above, attending to the delimiter patterns in an English claim makes it possible to determine how the claim is structured, and the claim can be decomposed into structural parts accordingly.

In the procedure of Patent Document 1, process 72 yields a tree-structured English pattern 74. In the example shown in FIG. 2, English pattern 74 includes a subject 90, a transition phrase 92, and a description 94. Automatically translating the subject 90, the transition phrase 92, and the description 94 separately (process 76) generates a tree-structured Japanese pattern 78. Japanese pattern 78 has the same structure as English pattern 74, with the text converted into Japanese; that is, it includes a Japanese translation 100 of the subject, a Japanese translation 102 of the transition phrase, and a Japanese translation 104 of the description. According to Patent Document 1, instead of generating a single-sentence Japanese claim from Japanese pattern 78, the pattern is displayed as is in tree form.

According to Patent Document 1, because this processing requires only that individual structural parts be translated automatically, the translation accuracy of the constituent explanation pattern sentences is higher than with conventional approaches. Displaying the translated text as a tree is also said to make the content of the claim easier to understand.

Patent Document 1: JP2014-199476 (in particular FIGS. 5, 7, and 8 and paragraphs 31, 41, and 51-77)

Admittedly, the technique of Patent Document 1 is likely to yield more accurate translations than translating an entire claim as a whole. However, when automatic translation is used, problems remain to be solved in the technique of Patent Document 1.

The biggest problem is the change of sentence structure in the descriptions. The change is needed because a description is not an ordinary sentence but takes a form that modifies a structural part. If conversion patterns for sentence structure were prepared in advance and every explanation pattern sentence were converted perfectly, translating each description with existing automatic translation technology might cause no problem. However, the variety of sentences that can occur in claims is practically unlimited, and it is impossible to prepare conversion rules for all of them in advance. Moreover, Patent Document 1 gives no clear disclosure of what conversion patterns should be prepared. The likelihood that a description will be converted into a sentence structure suitable for automatic translation is therefore by no means high. Even when the conversion fails and the description does not form a sentence, the automatic translation device translates each description as though it were an ordinary sentence. The resulting translation is then very unlikely to be understandable. In particular, when a structural part is a noun phrase, translating it as a sentence severely degrades the translation.

For example, referring to FIG. 3, consider an English claim 120 that has two noun phrases, 1 and 2, corresponding to descriptions. In the Japanese claim 122 that the present inventors obtained by automatic translation in accordance with Patent Document 1, the translations of noun phrase 1 and noun phrase 2 are almost incomprehensible. As FIG. 3 shows, when processing follows Patent Document 1 and each noun phrase or the like is translated as a sentence even though the sentence structure conversion has not succeeded, the resulting translation is hard to understand and the purpose of automatic translation is not served.

This problem arises not only in translating patent claims but also in translating other long sentences that are written according to specific rules and consist of a plurality of structural parts. It can arise, for example, in laws and regulations, standard terms and conditions, and treaties, as well as in instruction manuals for various machines, electronic devices, and software.

Accordingly, an object of the present invention is to provide an automatic translation device that can improve the accuracy of automatic translation of long sentences that are written in a specific format and can be divided into a plurality of structural parts.

An automatic translation device according to a first aspect of the present invention translates a sentence in a first language into a sentence in a second language different from the first language. The device includes: dividing means for identifying the pattern of the first-language sentence and dividing the sentence into structural parts; pattern identifying means for identifying a second-language sentence pattern associated in advance with the first-language sentence pattern identified by the dividing means; correspondence identifying means for identifying, between the first-language sentence pattern divided by the dividing means and the second-language pattern identified by the pattern identifying means, the correspondence between structural parts and the grammatical characteristic of each structural part; translation means for performing, for each first-language structural part, automatic translation of the word string constituting that part using a model for translation from the first language into the second language, thereby generating a second-language translation; and substitution means for substituting, for each first-language structural part, the second-language translation obtained by the translation means into one of the structural parts of the second-language sentence pattern identified by the pattern identifying means, in accordance with the correspondence identified by the correspondence identifying means, thereby generating a second-language sentence that is a translation of the first-language sentence.

The first-language sentence may be, for example, a claim of a patent application, a law or regulation, standard terms and conditions, a treaty, or an instruction manual for a machine, electronic device, or software.

Preferably, the automatic translation device further includes word reordering means provided between the correspondence identifying means and the translation means. The word reordering means receives the first-language structural parts whose correspondence and grammatical characteristics have been identified by the correspondence identifying means, rearranges, for each such part, the word string constituting the part into the order in which the translations of its words appear in the second language, and supplies the result to the translation means.

More preferably, the word reordering means includes: parsing means for parsing each first-language structural part to generate a first-language parse tree; conversion means for converting the first-language parse tree generated by the parsing means into a second-language parse tree in accordance with conversion rules prepared in advance; and rearranging means for rearranging and outputting the words of the first-language structural part in accordance with the order in which words appear in the second-language parse tree obtained by the conversion.
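The reordering step can be pictured with a toy sketch. It assumes a parse tree is already available as nested tuples and uses a single hypothetical conversion rule (reverse the children of `VP` and `PP` nodes to approximate Japanese head-final order). The tree encoding, the rule set `SWAP_LABELS`, and the function names are illustrative assumptions, not the conversion rules of this application.

```python
from typing import Union

# A tree is either a leaf word (str) or a tuple: (label, child, child, ...).
Tree = Union[str, tuple]

# Hypothetical conversion rules: labels whose children are reversed.
SWAP_LABELS = {"VP", "PP"}

def convert(tree: Tree) -> Tree:
    """Convert a source-language parse tree by applying the rules recursively."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    converted = [convert(child) for child in children]
    if label in SWAP_LABELS:
        converted.reverse()
    return (label, *converted)

def leaves(tree: Tree) -> list:
    """Read the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, *children = tree
    return [word for child in children for word in leaves(child)]

def reorder(tree: Tree) -> list:
    """Return the source words in the order given by the converted tree."""
    return leaves(convert(tree))
```

Under these assumed rules, the verb phrase "detects the light" is reordered to "the light detects", which matches the verb-final order of its Japanese counterpart.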

The translation means may include grammatical-characteristic-specific translation means that, for each first-language structural part, performs automatic translation of the word string constituting the part using a model for translation from the first language into the second language that has been optimized in advance for translating word strings having the grammatical characteristic of that part, thereby generating a second-language translation.

Still more preferably, the pattern identifying means includes: dividing means for dividing the first-language sentence into a plurality of structural parts at predetermined delimiter patterns contained in the sentence; grammatical characteristic determining means for determining the grammatical characteristic of each structural part produced by the dividing means; and means for identifying the first-language sentence pattern from the order of appearance of the structural parts produced by the dividing means and the grammatical characteristic determined for each part by the grammatical characteristic determining means.

The first-language sentence may be a claim of a patent application written in the first language.

A translation model learning device according to a second aspect of the present invention trains a model for translation that is used when a first-language sentence having a specific grammatical characteristic is translated by statistical translation into a sentence in a second language different from the first language. The device includes: collecting means for collecting a plurality of parallel translation pairs, each consisting of a first-language sentence having the specific grammatical characteristic and its translation in the second language; and statistical learning means for training, using the collected pairs as training data, the statistical models needed to perform statistical translation from first-language sentences having the specific grammatical characteristic into second-language sentences.
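The collection step of the second aspect can be sketched as grouping parallel pairs by grammatical characteristic, so that each group can then be passed to a statistical learner. The triple format `(source, target, characteristic)`, the labels `"NP"` and `"VP"`, and the function name are assumptions made for illustration; the statistical training itself is not shown.

```python
from collections import defaultdict

def collect_by_characteristic(pairs):
    """Group (source, target, characteristic) triples into one corpus per characteristic."""
    corpora = defaultdict(list)
    for source, target, characteristic in pairs:
        corpora[characteristic].append((source, target))
    # Return a plain dict so that a missing characteristic raises KeyError.
    return dict(corpora)
```

Each resulting corpus would serve as the training data for one model, for example a noun-phrase corpus for the noun phrase model.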

FIG. 1 is a flowchart showing the procedure of a conventional automatic translation process for patent application claims.
FIG. 2 is a diagram schematically showing how text is translated in the conventional automatic translation process for patent application claims.
FIG. 3 is a schematic diagram for explaining a problem in the conventional automatic translation process for patent application claims.
FIG. 4 is a schematic diagram for explaining the learning process required for automatic translation in a first embodiment of the present invention.
FIG. 5 is a schematic diagram for explaining how text is translated in the automatic claim translation process according to the first embodiment of the present invention.
FIG. 6 is a block diagram showing the overall configuration of an automatic translation system according to the first embodiment of the present invention.
FIG. 7 is a block diagram showing the schematic configuration of a structural part translation model learning unit that trains the various models for automatic translation in the first embodiment of the present invention.
FIG. 8 is a block diagram showing the schematic configuration of a parsing model learning unit that trains models for parsing noun phrases and verb phrases, respectively, in the first embodiment of the present invention.
FIG. 9 is a block diagram showing the schematic configuration of a claim translation unit that automatically translates the claims of an English patent application into Japanese in the first embodiment of the present invention.
FIG. 10 is a block diagram showing the overall configuration of an automatic translation system according to a second embodiment of the present invention.
FIG. 11 is a block diagram showing the schematic configuration of a structural part translation model learning unit that trains the various models for automatic translation in the second embodiment of the present invention.
FIG. 12 is a block diagram showing the schematic configuration of a claim translation unit that automatically translates the claims of an English patent application into Japanese in the second embodiment of the present invention.
FIG. 13 is a diagram showing the external appearance of a computer system serving as the hardware platform that implements the automatic translation system according to each embodiment of the present invention.
FIG. 14 is a block diagram showing the internal hardware configuration of the computer system shown in FIG. 13.

In the following description and drawings, identical parts are given identical reference numerals, and detailed descriptions of them are not repeated. The following description concerns translating the English claims of a patent application into Japanese. However, the embodiments are not limited to that case; they can also be applied, for example, to translating Japanese claims into English. They can further be applied to texts considered as difficult to translate as patent claims, such as laws and regulations, standard terms and conditions, and treaties, as well as instruction manuals for various machines, electronic devices, and software.

[First Embodiment]
<Basic concept>
The automatic claim translation system according to the first embodiment described below employs statistical machine translation. Referring to FIG. 4, patent application claims, whether English or Japanese, vary widely but are written with their structural parts arranged according to one of a certain set of patterns. In this embodiment, such an arrangement pattern of the structural parts of a claim is called a claim pattern. It is similar to the conventional English pattern 74 shown in FIG. 2 and can be represented as a tree structure. An English claim pattern is called an English pattern, and a Japanese claim pattern is called a Japanese pattern. By comparing actual English claims with their corresponding Japanese claims, English patterns can be associated with Japanese patterns. A claim written according to a given English pattern can thus be translated into a Japanese claim that conforms to the Japanese pattern corresponding to that English pattern.

These claim patterns are obtained by classifying a plurality of claims. In each claim pattern, each structural part is annotated with its grammatical characteristic. For example, if the character string or word string forming a structural part is a noun phrase, the part is marked as a noun phrase; if it is a verb phrase, it is marked as a verb phrase. Two claim patterns whose tree structures look the same are nevertheless different claim patterns if the grammatical characteristic of some structural part differs. Note that these claim patterns are distinct from the "delimiter patterns" used in Patent Document 1.
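One way to picture why two patterns with the same tree shape can still differ is to store the grammatical characteristic as part of each structural part, so that pattern equality depends on it. The class names, field names, and labels below (`"NP"`, `"VP"`, `"FIXED"`) are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Part:
    role: str            # e.g. "subject", "transition", "element"
    characteristic: str  # e.g. "NP" (noun phrase) or "VP" (verb phrase)

@dataclass(frozen=True)
class ClaimPattern:
    parts: tuple  # ordered tuple of Part objects

    def shape(self) -> tuple:
        """The arrangement of roles alone, ignoring grammatical characteristics."""
        return tuple(part.role for part in self.parts)
```

Because the dataclasses are frozen, pattern objects are hashable and could serve as dictionary keys, for example when looking up the Japanese pattern paired with an English pattern.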

By comparing the claim pattern of an English claim with that of the corresponding Japanese claim, the English pattern and the Japanese pattern can be associated with each other. Structural parts can likewise be associated between corresponding claim patterns. Such pairs of claim patterns are stored in advance as pattern classification data and are used when training the models for translation, as described below. In the following embodiments, English patterns and Japanese patterns are assumed to correspond one to one.

The following embodiments assume that an English claim is translated into a Japanese claim. The English pattern is determined by decomposing the word string of the English claim, which is in the source language, into a plurality of structural parts and determining the grammatical characteristic of each part. This also yields the text constituting each structural part. As in Patent Document 1, delimiter patterns in the claim can be used for the decomposition. Transition phrases in particular are specific to claim language and have few variations, so they are relatively easy to identify. Once the transition phrase is identified, the structural part corresponding to the subject matter of the invention lies on one side of it, before or after depending on the language, and the one or more structural parts corresponding to the description lie on the other side. These structural parts can be separated by finding specific delimiter patterns, or by building a statistical model for separation in advance and identifying the positions most likely to be boundaries.

In this embodiment, a separate translation model is trained for each grammatical characteristic of the structural components (for example, noun phrase or verb phrase), and each structural component is translated automatically using the model that matches its grammatical characteristic. That is, a structural component consisting of a noun phrase is translated with a translation model trained in advance on noun phrases only, and a structural component consisting of a verb phrase is translated with a translation model trained in advance on verb phrases only. As a result, a more readily understandable Japanese translation is obtained for each structural component.

The original English pattern is associated with a Japanese pattern, and which structural component of the English pattern corresponds to which structural component of the Japanese pattern is also recorded in advance. Using this correspondence, the Japanese translation of each structural component is substituted into the Japanese pattern that corresponds to the original English pattern, at the position of the structural component corresponding to the original one. This yields the Japanese translation of the claim.

Here, the term "translation model", as in "noun phrase translation model", refers to the whole set of models used by a statistical translation device: the translation model proper from the source language to the target language, the language model of the target language, and the phrase table and similar data generated in the course of training them. In the following embodiment, the source language is English and the target language is Japanese. Noun phrases and verb phrases are adopted as the grammatical characteristics.

FIG. 4 schematically shows the training process for the noun phrase translation model 210 and the verb phrase translation model 214. Here, the English patterns and Japanese patterns have been collected in advance, and the correspondence between them has been established. As stated above, this correspondence is one-to-one: once the English pattern is determined, the corresponding Japanese pattern is determined, and the correspondence between structural components is likewise determined one-to-one.
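As a non-limiting sketch, a pattern pair and its one-to-one component correspondence could be represented as below. The class name `PatternPair`, the role labels, and the assumption that each role occurs once per pattern are illustrative choices, not details given in the embodiment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PatternPair:
    english: tuple   # component roles in English order
    japanese: tuple  # the same roles in Japanese order

    def correspondence(self):
        """Map each English component index to the index of its Japanese counterpart.

        Assumes every role appears exactly once in each pattern.
        """
        return {i: self.japanese.index(role) for i, role in enumerate(self.english)}

# Example pair: English "subject, transition, description" is paired with
# Japanese "description, transition, subject".
pair = PatternPair(
    english=("subject", "transition", "description"),
    japanese=("description", "transition", "subject"),
)
mapping = pair.correspondence()
```

Because each English index maps to a distinct Japanese index, determining the English pattern fixes both the Japanese pattern and the component correspondence.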

Referring to FIG. 4, a plurality of claim translation pairs 170 are prepared for training. Each translation pair 170 includes an English claim 172 and a Japanese claim 174 that is a translation of that English claim. Many published application documents stand in a translation relationship to one another, for example a Japanese application that was translated into English and filed in the United States. The translation pairs 170 may be selected from such sets of published applications.

The English pattern of the English claim 172 is determined, the claim is decomposed into structural components according to that English pattern, and the grammatical characteristic of each structural component is classified according to the English pattern (process 176). Similarly, the Japanese claim 174 is decomposed into structural components according to the Japanese pattern corresponding to the English pattern, and the grammatical characteristic of each structural component is classified according to the Japanese pattern (process 180). This processing can be carried out manually, or by machine if a collection of patterns is prepared in machine-readable form.

As a result of this pattern classification, for example, the English claim 172 is determined to match the English pattern 178, and the Japanese claim 174 is determined to match the Japanese pattern 182. The structural components of the English pattern and the Japanese pattern are associated in advance, so the structural components can be paired (process 184). In the example shown in FIG. 4, the subject of the English claim is paired with the subject of the Japanese claim, the transitional phrase of the English claim with the transitional phrase of the Japanese claim, and the description of the English claim with the description of the Japanese claim.

In this embodiment, attention is paid to the grammatical characteristics of the structural components paired in this way, and each pair of structural components is sorted into a separate set according to its grammatical characteristic (grammatical type). That is, each pair is classified into the set 198 of noun phrase pairs if its text is a noun phrase, and into the set 200 of verb phrase pairs if its text is a verb phrase (process 196). The resulting set 198 contains many pairs of a noun phrase from an English claim and a noun phrase from a Japanese claim, and the set 200 contains many pairs of a verb phrase from an English claim and a verb phrase from a Japanese claim.

Finally, the noun phrase translation model 210 is trained using the set 198 of noun phrase pairs as training data, and the verb phrase translation model 214 is trained using the set 200 of verb phrase pairs as training data.
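The sorting of component pairs into per-type training sets (process 196) can be illustrated with the following minimal sketch. The input format, a list of `(type, English text, Japanese text)` tuples, and the example phrases are assumptions for illustration; the actual training of each statistical model is outside the scope of the sketch.

```python
from collections import defaultdict

def group_pairs_by_type(aligned_components):
    """Sort (phrase_type, english_text, japanese_text) tuples into one set per type."""
    groups = defaultdict(list)
    for phrase_type, english_text, japanese_text in aligned_components:
        groups[phrase_type].append((english_text, japanese_text))
    return dict(groups)

# Hypothetical aligned components from one claim translation pair.
aligned = [
    ("NP", "a translation device", "翻訳装置"),
    ("VP", "comprising", "備える"),
    ("NP", "a storage unit", "記憶部"),
]
training_sets = group_pairs_by_type(aligned)
# training_sets["NP"] would feed the noun phrase translation model,
# training_sets["VP"] the verb phrase translation model.
```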

Referring to FIG. 5, translation from English to Japanese proceeds as follows. The English claim 70 is matched against the English patterns to identify its claim pattern. In this example, the English pattern 74 is assumed to have been identified. The English pattern 74 here includes three structural components: a subject, a transitional phrase, and a description. Each has an area for storing text. These areas store, respectively, the English text corresponding to the subject, the English text corresponding to the transitional phrase, and the English text corresponding to the description, all obtained by decomposing the English claim 70 into structural components (process 72).

Next, the Japanese pattern associated with the identified English pattern is identified and generated (process 76). Specifically, information specifying the structure of the Japanese pattern 78 paired with the identified English pattern 74 is read from the claim pattern storage unit, and an area for storing this new Japanese pattern is allocated in the storage area. In this example, the Japanese pattern 78 that is read out includes a description, a transitional phrase, and a subject, arranged in that order, and each has an area in the storage device for storing the corresponding text. When the Japanese pattern is generated, these areas are empty.

Next, the texts of the subject, the transitional phrase, and the description of the English pattern are each translated automatically using a model selected according to its grammatical characteristic (process 240). For example, the subject is a noun phrase and is therefore translated using the noun phrase translation model. Among the descriptions, those that are verb phrases are translated using the verb phrase translation model, and those that are noun phrases are translated using the noun phrase translation model. The transitional phrase is a verb phrase and is therefore translated using the verb phrase translation model. The translation of the subject is stored in the subject area of the Japanese pattern 78, the translation of the transitional phrase in its transitional phrase area, and the translation of the description in its description area. The result is the translated Japanese pattern 242. Concatenating the texts of the areas of the Japanese pattern 242 in the order of those areas yields the Japanese claim that is the translation of the English claim 70.
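The flow of FIG. 5 (fill the English pattern, generate an empty Japanese pattern, translate each component with the model for its type, substitute, and concatenate) might be sketched as follows. The `Slot` class and the stub translators, which merely tag their input, are hypothetical stand-ins for the statistical models and are not part of the embodiment.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    role: str         # "subject", "transition" or "description"
    phrase_type: str  # "NP" or "VP"
    text: str = ""

# Stub translators standing in for the noun phrase and verb phrase models.
def translate_np(text):
    return f"<NP:{text}>"

def translate_vp(text):
    return f"<VP:{text}>"

MODELS = {"NP": translate_np, "VP": translate_vp}

def translate_claim(english_slots, japanese_order):
    # Generate the Japanese pattern with empty text areas.
    japanese = {role: "" for role in japanese_order}
    # Translate each component with the model matching its grammatical type
    # and store the result in the corresponding Japanese area.
    for slot in english_slots:
        japanese[slot.role] = MODELS[slot.phrase_type](slot.text)
    # Concatenate the areas in Japanese pattern order.
    return "".join(japanese[role] for role in japanese_order)

english = [
    Slot("subject", "NP", "A device"),
    Slot("transition", "VP", "comprising"),
    Slot("description", "NP", "a sensor"),
]
output = translate_claim(english, ["description", "transition", "subject"])
```

With the stubs above, `output` shows the components emitted in Japanese pattern order, description first and subject last.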

<Configuration>
Referring to FIG. 6, the claim translation system 300 according to this embodiment, which implements the automatic translation processing described above, translates English claims into Japanese claims. The claim translation system 300 includes: a noun phrase translation model 320 used for English-to-Japanese translation of noun phrases; a verb phrase translation model 322 used for English-to-Japanese translation of verb phrases; a structural component translation model learning unit 318 for training these models; a parsing unit 332 that parses the English claim 324 using the verb phrase parsing model 312 and the noun phrase parsing model 314 and outputs the parsed English claim 334; and a claim translation unit 328 that uses the noun phrase translation model 320 and the verb phrase translation model 322 to translate the input parsed English claim 334 and output the Japanese claim 326.

To train the noun phrase translation model 320 and the verb phrase translation model 322, the structural component translation model learning unit 318 uses: a claim parallel corpus 310 storing a large number of English-Japanese claim translation pairs; the verb phrase parsing model 312 for parsing English verb phrases; the noun phrase parsing model 314 for parsing English noun phrases; pattern classification data 316 storing a plurality of pairs of an English claim pattern and its corresponding Japanese pattern; and parse tree conversion rules 330 for the pre-ordering of English structural components described later. The verb phrase parsing model 312 is a parsing model trained using only English verb phrases as training data. The noun phrase parsing model 314 is a parsing model trained using only English noun phrases as training data.

Referring to FIG. 7, the structural component translation model learning unit 318 includes: a pattern classification unit 350 that reads claim translation pairs in order from the claim parallel corpus 310 and matches the English claim of each pair against the pattern classification data 316, thereby reading out the matching English pattern and the Japanese pattern paired with it; a structural component classification unit 352 that, based on the claim read from the claim parallel corpus 310 and the English pattern read by the pattern classification unit 350, divides the English claim into structural components, classifies them into noun phrases and verb phrases, and outputs them; a noun phrase data storage unit 354 that stores the English noun phrases output by the structural component classification unit 352; and a verb phrase data storage unit 356 that stores the verb phrases output by the structural component classification unit 352.

The structural component translation model learning unit 318 further includes: a noun phrase pre-ordering unit 358 that performs, on each item of noun phrase data stored in the noun phrase data storage unit 354, a reordering of the words of the English claim called pre-ordering; a noun phrase training data storage unit 362 for storing the pre-ordered noun phrase data output by the noun phrase pre-ordering unit 358; and a noun phrase model learning unit 364 for training the noun phrase translation model 320 using, as training data, the pairs of pre-ordered English claim text and Japanese claim text stored in the noun phrase training data storage unit 362.

The structural component translation model learning unit 318 further includes: a verb phrase pre-ordering unit 360 that pre-orders the words of the English claim for each item of verb phrase data stored in the verb phrase data storage unit 356; a verb phrase training data storage unit 366 for storing the pre-ordered verb phrase data output by the verb phrase pre-ordering unit 360; and a verb phrase model learning unit 368 for training the verb phrase translation model 322 using, as training data, the pairs of pre-ordered English claim text and Japanese claim text stored in the verb phrase training data storage unit 366.

The noun phrase pre-ordering unit 358 includes: a parsing unit 380 that parses each noun phrase stored in the noun phrase data storage unit 354 using the noun phrase parsing model 314 and outputs a parse tree; a tree structure conversion unit 382 that uses the conversion rules stored in the parse tree conversion rules 330 to transform the parse tree output by the parsing unit 380 into a parse tree with the same structure as the corresponding Japanese parse tree; and a word reordering unit 384 that reorders the English words in the order defined by the converted parse tree output by the tree structure conversion unit 382 and stores them in the noun phrase training data storage unit 362. This embodiment adopts pre-ordering, in which the words of the source (English) text are reordered before translation, but this is only one example, and other reordering methods can be used. For example, so-called post-ordering can be adopted, in which the words are translated in source order and the translated words are then reordered into target-language word order.

The verb phrase pre-ordering unit 360 includes: a parsing unit 400 that parses each verb phrase stored in the verb phrase data storage unit 356 using the verb phrase parsing model 312 and outputs a parse tree; a tree structure conversion unit 402 that uses the conversion rules stored in the parse tree conversion rules 330 to transform the parse tree output by the parsing unit 400 into a parse tree with the same structure as the corresponding Japanese parse tree; and a word reordering unit 404 that reorders the words of the English verb phrase in the order defined by the converted parse tree output by the tree structure conversion unit 402 and stores them in the verb phrase training data storage unit 366.
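Pre-ordering by parse tree transformation can be illustrated with the toy sketch below. Parse trees are represented as nested tuples `(label, child, ...)` with words as strings. The single rule used here, reversing the children of `VP` and `PP` nodes to approximate head-final Japanese order, is an assumption made for illustration and is not the content of the parse tree conversion rules 330.

```python
def reorder(tree, swap_labels=frozenset({"VP", "PP"})):
    """Return a copy of the tree with the children of selected nodes reversed."""
    if isinstance(tree, str):          # a word (leaf)
        return tree
    label, *children = tree
    children = [reorder(child, swap_labels) for child in children]
    if label in swap_labels:           # illustrative head-final rule
        children.reverse()
    return (label, *children)

def leaves(tree):
    """Read the words off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, *children = tree
    return [word for child in children for word in leaves(child)]

# "stores the pattern in memory" as a hand-written parse tree.
tree = ("VP",
        ("V", "stores"),
        ("NP", ("DT", "the"), ("NN", "pattern")),
        ("PP", ("IN", "in"), ("NP", ("NN", "memory"))))
preordered = leaves(reorder(tree))
```

Under this toy rule the English words come out as "memory in the pattern stores", which is closer to Japanese word order and is the kind of word string passed on for training and translation.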

In this embodiment, the verb phrase parsing model 312 and the noun phrase parsing model 314 are models trained to parse English verb phrases and English noun phrases, respectively. They are trained in advance by a parsing model learning unit 442 configured as shown in FIG. 8.

Referring to FIG. 8, the parsing model learning unit 442 includes: a noun phrase extraction and decomposition unit 460 that extracts noun phrases from a source language corpus 440 storing a large volume of English documents, extracting each noun phrase together with the various words and phrases obtained by decomposing it (hereinafter collectively referred to simply as "noun phrases"); a noun phrase parsing training data storage unit 462 for storing the noun phrases extracted by the noun phrase extraction and decomposition unit 460; and a noun phrase parsing model learning unit 464 for training the noun phrase parsing model 314 using the noun phrase parsing training data stored in the noun phrase parsing training data storage unit 462. The noun phrase parsing model 314 is a statistical model trained so that, given an English noun phrase, it outputs the parse tree with the highest likelihood as the parse tree of that noun phrase.

The parsing model learning unit 442 further includes: a verb phrase extraction unit 470 for extracting verb phrases from the source language corpus 440; a verb phrase parsing training data storage unit 472 for storing the verb phrases extracted by the verb phrase extraction unit 470; and a verb phrase parsing model learning unit 474 for training the verb phrase parsing model 312 using the verb phrase parsing training data stored in the verb phrase parsing training data storage unit 472. The verb phrase parsing model 312 is a statistical model trained so that, given an English verb phrase, it outputs the parse tree with the highest likelihood as the parse tree of that verb phrase.

Referring to FIG. 9, the claim translation unit 328 shown in FIG. 6 includes: a pattern search unit 500 that receives the input parsed English claim 334, searches the pattern classification data 316, and reads out the pattern pair whose English pattern matches the parsed English claim 334; a Japanese pattern generation unit 504 that takes the Japanese pattern from the pattern pair read out by the pattern search unit 500 and generates an empty Japanese pattern; and a Japanese pattern storage unit 506 for storing the Japanese pattern generated by the Japanese pattern generation unit 504.

The claim translation unit 328 further includes: an input sentence division unit 502 that receives the parsed English claim 334, annotated with its English pattern, from the pattern search unit 500, divides it into structural components according to the structure of the English pattern, and outputs them; a structural component storage unit 508 for storing the structural components output by the input sentence division unit 502 in association with the corresponding parts of the English pattern; a structural component type determination unit 510 that reads the structural components stored in the structural component storage unit 508 in order, determines from the English pattern whether each is a noun phrase or a verb phrase, and outputs a determination signal; a selector 512 that selects either the verb phrase parsing model 312 or the noun phrase parsing model 314 in response to the value of the determination signal output by the structural component type determination unit 510; a selector 522 that likewise selects either the verb phrase translation model 322 or the noun phrase translation model 320 in response to the value of that determination signal; and a parsing unit 514 that reads from the structural component storage unit 508 the structural component whose type has been determined by the structural component type determination unit 510, parses it using the parsing model selected by the selector 512, and outputs a parse tree.
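The role of the determination signal, which drives both the selector 512 and the selector 522, may be pictured with the following small sketch. The string values `"NP"` and `"VP"` for the signal and the use of model names as placeholders for the models themselves are assumptions made for illustration.

```python
# Placeholder names stand in for the actual parsing and translation models.
MODELS_BY_TYPE = {
    "NP": {"parsing": "noun phrase parsing model 314",
           "translation": "noun phrase translation model 320"},
    "VP": {"parsing": "verb phrase parsing model 312",
           "translation": "verb phrase translation model 322"},
}

def select_models(determination_signal):
    """Return (parsing model, translation model) for a signal of "NP" or "VP"."""
    entry = MODELS_BY_TYPE[determination_signal]
    return entry["parsing"], entry["translation"]

parsing_model, translation_model = select_models("VP")
```

Driving both selections from one signal keeps the parsing model and the translation model for a given structural component matched to the same grammatical type.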

The claim translation unit 328 further includes a parse tree conversion unit 518 that converts the parse tree output by the parsing unit 514 into the corresponding Japanese parse tree according to the conversion rules stored in the parse tree conversion rules 330.

The claim translation unit 328 further includes: a word reordering unit 520 that outputs an English word string in which the English words are reordered according to the order in which each word appears in the Japanese parse tree output by the parse tree conversion unit 518; a statistical automatic translation machine 524 that receives the English word string output by the word reordering unit 520 and performs statistical automatic translation from English to Japanese using the translation model selected by the selector 522 (the verb phrase translation model 322 when the input is a verb phrase, and the noun phrase translation model 320 when it is a noun phrase); and a Japanese pattern substitution unit 526 that substitutes the Japanese text output by the statistical automatic translation machine 524 into the area, in the Japanese pattern stored in the Japanese pattern storage unit 506, that corresponds to the structural component being translated.

<Operation>
The claim translation system 300 configured as described above operates as follows. Its operation is broadly divided into three stages. The first stage is the training of the verb phrase parsing model 312 and the noun phrase parsing model 314 shown in FIG. 6. The second stage is the training of the noun phrase translation model 320 and the verb phrase translation model 322 shown in FIG. 6. Once these two stages are complete, in the third stage the parsing unit 332 and the claim translation unit 328 can translate the English claim 324 into Japanese and output the Japanese claim 326. The pattern classification data 316 and the parse tree conversion rules 330 are assumed to have been prepared in advance by some means before these stages. Each stage is described in turn below.

<Stage 1: Training the parsing models>
Referring to FIG. 8, an English source language corpus 440 is prepared in advance. Each sentence stored in the source language corpus 440 has been morphologically analyzed and parsed in advance and is annotated with its parse tree.

The noun phrase extraction and decomposition unit 460 extracts from the source language corpus 440 each noun phrase, the words contained in it, and the partial parse tree showing its syntactic structure, and accumulates them in the noun phrase parsing training data storage unit 462. The noun phrase parsing model learning unit 464 reads the information accumulated in the noun phrase parsing training data storage unit 462 and trains the noun phrase parsing model 314. As a result of this training, the noun phrase parsing model 314 can be used to compute and output, given an English noun phrase, the parse tree with the highest likelihood.

Meanwhile, the verb phrase extraction unit 470 extracts from the source language corpus 440 each verb phrase and the partial parse tree showing its syntactic structure, and accumulates them in the verb phrase parsing training data storage unit 472. The verb phrase parsing model learning unit 474 reads the verb phrases accumulated in the verb phrase parsing training data storage unit 472 and trains the verb phrase parsing model 312. As a result of this training, the verb phrase parsing model 312 can be used to compute and output, given an English verb phrase, the parse tree with the highest likelihood.
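Extraction of phrases and their partial parse trees from a parsed corpus might look like the sketch below. The nested-tuple tree format `(label, child, ...)`, the label names `NP` and `VP`, and the function names are assumptions for illustration; the embodiment does not fix a tree representation.

```python
def extract_phrases(tree, label):
    """Collect every subtree whose root has the given label, including nested ones."""
    if isinstance(tree, str):   # a word has no subtrees
        return []
    node_label, *children = tree
    found = [tree] if node_label == label else []
    for child in children:
        found.extend(extract_phrases(child, label))
    return found

def words(tree):
    """Return the words covered by a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, *children = tree
    return [word for child in children for word in words(child)]

# A hand-written parse tree for "the unit stores patterns".
sentence = ("S",
            ("NP", ("DT", "the"), ("NN", "unit")),
            ("VP", ("VBZ", "stores"), ("NP", ("NNS", "patterns"))))
noun_phrases = extract_phrases(sentence, "NP")
verb_phrases = extract_phrases(sentence, "VP")
```

Each extracted subtree, together with its words, would serve as one training example for the corresponding parsing model.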

<Stage 2: Training the translation models>
Referring to FIG. 7, the pattern classification unit 350 reads a Japanese-English claim translation pair from the claim parallel corpus 310 and identifies the pattern of the English claim by matching it against the patterns contained in the pattern classification data 316. The pattern classification unit 350 reads the identified English pattern from the pattern classification data 316. It then divides the English claim into structural components according to the English pattern and substitutes them into the text areas of the corresponding structural components of the English pattern. For each structural component of the English pattern whose text has been filled in, the structural component classification unit 352 accumulates it in the noun phrase data storage unit 354 if it is a noun phrase, and in the verb phrase data storage unit 356 if it is a verb phrase. The noun phrase pre-ordering unit 358 and the verb phrase pre-ordering unit 360 can operate once this processing has finished for all claim translation pairs stored in the claim parallel corpus 310, or in parallel with it.

The parsing unit 380 of the noun phrase pre-ordering unit 358 reads the noun phrase data accumulated in the noun phrase data storage unit 354 in order. The parsing unit 380 parses each noun phrase it reads using the noun phrase parsing model 314 and outputs the parse tree to the tree structure conversion unit 382. Given a parse tree, the tree structure conversion unit 382 uses one of the tree structure conversion rules stored in the parse tree conversion rules 330 to convert the parse tree of the English claim's noun phrase into the structure of the corresponding Japanese parse tree, and passes it to the word reordering unit 384. The word reordering unit 384 reorders the English words in the order determined by this Japanese parse tree and accumulates them in the noun phrase training data storage unit 362. The noun phrase model learning unit 364 trains the noun phrase translation model 320 using the noun phrase training data accumulated in this way in the noun phrase training data storage unit 362.

The syntax analysis unit 400 of the verb phrase pre-ordering unit 360 reads the verb phrase data stored in the verb phrase data storage unit 356 in order. The syntax analysis unit 400 parses each verb phrase it reads using the verb phrase syntax analysis model 312 and outputs the resulting parse tree to the tree structure conversion unit 402. Given a parse tree, the tree structure conversion unit 402 uses one of the tree structure conversion rules stored in the parse tree conversion rules 330 to convert the parse tree of the verb phrase of the English claim into the structure of the corresponding Japanese parse tree, and passes it to the word rearrangement unit 404. The word rearrangement unit 404 rearranges the English words in the order determined by this Japanese parse tree and stores them in the verb phrase learning data storage unit 366. The verb phrase model learning unit 368 trains the verb phrase translation model 322 using the verb phrase learning data accumulated in the verb phrase learning data storage unit 366 in this way.
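The three pre-ordering steps (parse, convert the tree, read the words off in the new order) can be sketched as follows. This is a minimal sketch, not the specification's method: the tuple-based tree encoding, the function names, and the single conversion rule used here (reverse the children of `VP` and `PP` nodes to approximate head-final Japanese order) are all assumptions made for illustration. The actual rules held in the parse tree conversion rules 330 are not given in this section.

```python
from typing import FrozenSet, List, Tuple, Union

# A node is (label, children); a leaf is (label, word). Hypothetical encoding.
Tree = Tuple[str, Union[str, list]]


def convert_tree(tree: Tree, swap_labels: FrozenSet[str] = frozenset({"VP", "PP"})) -> Tree:
    """Return a tree in target-language shape while keeping the source words."""
    label, body = tree
    if isinstance(body, str):  # leaf: nothing to reorder
        return tree
    children = [convert_tree(child, swap_labels) for child in body]
    if label in swap_labels:  # assumed rule: make these phrases head-final
        children.reverse()
    return (label, children)


def leaves(tree: Tree) -> List[str]:
    """Read the words off the tree from left to right."""
    _, body = tree
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]


# Hand-built parse tree for the verb phrase "displays the image".
english_tree = (
    "VP",
    [("V", "displays"), ("NP", [("DT", "the"), ("NN", "image")])],
)
preordered_words = leaves(convert_tree(english_tree))
```

In this toy case the verb moves after its object, giving the word order `the image displays`, which is closer to Japanese word order while still consisting of English words.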

When processing has been completed in this way for all the claim translation pairs stored in the claim parallel translation corpus 310, English-to-Japanese translation of noun phrases and verb phrases becomes possible using the noun phrase translation model 320 and the verb phrase translation model 322.

<Stage 3: Claim translation>
When translating a claim, the claim translation unit 328 operates as follows. Prior to the processing by the claim translation unit 328, the syntax analysis unit 332 shown in FIG. 6 parses the English claim 324. As a result, the parsed English claim 334 is input to the claim translation unit 328.

Referring to FIG. 9, when the parsed English claim 334 is given to the claim translation unit 328, the pattern search unit 500 collates the patterns stored in the pattern classification data 316 with the parsed English claim 334 and searches for a matching English pattern. The pattern search unit 500 reads from the pattern classification data 316 the English pattern that best matches the English of the parsed English claim 334, together with the corresponding Japanese pattern. It gives the Japanese pattern to the Japanese pattern generation unit 504, and gives the English pattern and the parsed English claim 334 to the input sentence division unit 502.

Based on the given Japanese pattern, the Japanese pattern generation unit 504 creates, in the Japanese pattern storage unit 506, a text storage area for each structural part of that Japanese pattern. Each of these areas is empty at this point.

The input sentence division unit 502 divides the parsed English claim 334 according to the English pattern read from the pattern classification data 316, assigns each piece to the text storage area of the corresponding structural part, and stores the result in the structural part storage unit 508. Once these structural parts are stored in the structural part storage unit 508, the structural part type determination unit 510 reads one of them, determines whether it is a noun phrase or a verb phrase, and gives a determination signal to the selector 512 and the selector 522. In response to this signal, the selector 512 selects the verb phrase syntax analysis model 312 if the structural part is a verb phrase and the noun phrase syntax analysis model 314 if it is a noun phrase, and connects the selected model to the syntax analysis unit 514. The selector 522 selects the verb phrase translation model 322 if the structural part is a verb phrase and the noun phrase translation model 320 if it is a noun phrase, and connects the selected model to the statistical automatic translator 524.
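The behaviour of the two selectors amounts to a lookup keyed by the type of the structural part. The sketch below illustrates only that idea. The labels `"NP"` and `"VP"`, the registry `MODELS`, the function `select_models`, and the string names standing in for the models are hypothetical; the reference numerals in the names simply mirror the text.

```python
from typing import Dict, Tuple

# (syntax analysis model, translation model) for one structural part type.
Resources = Tuple[str, str]

# Hypothetical registry; the strings stand in for the actual trained models.
MODELS: Dict[str, Resources] = {
    "VP": ("verb_phrase_syntax_analysis_model_312", "verb_phrase_translation_model_322"),
    "NP": ("noun_phrase_syntax_analysis_model_314", "noun_phrase_translation_model_320"),
}


def select_models(kind: str) -> Resources:
    """Play the role of selectors 512 and 522 for one structural part."""
    try:
        return MODELS[kind]
    except KeyError:
        raise ValueError(f"unsupported structural part type: {kind!r}") from None


# The determination signal for a verb phrase selects models 312 and 322.
parsing_model, translation_model = select_models("VP")
```

Because both selectors are driven by the same determination signal, a single lookup returning both models keeps the parser and the translator consistent for each part.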

The syntax analysis unit 514 receives a structural part from the structural part type determination unit 510 and parses it using the verb phrase syntax analysis model 312 or the noun phrase syntax analysis model 314, whichever is connected via the selector 512. The resulting parse tree is given from the syntax analysis unit 514 to the parse tree conversion unit 518. In this analysis, the syntax analysis unit 514 uses the noun phrase syntax analysis model 314 if the structural part is a noun phrase and the verb phrase syntax analysis model 312 if it is a verb phrase. The verb phrase syntax analysis model 312 was trained by the syntax analysis model learning unit 442 shown in FIG. 8 with verb phrases as learning data, and the noun phrase syntax analysis model 314 was trained by the syntax analysis model learning unit 442 with noun phrases as learning data. The syntax analysis performed with these models is therefore optimal for the type of each structural part.

On receiving the parse tree from the syntax analysis unit 514, the parse tree conversion unit 518 refers to the parse tree conversion rules 330 and selects, from among the conversion rules, the one most suitable for that parse tree. Using the selected conversion rule, the parse tree conversion unit 518 converts the parse tree given by the syntax analysis unit 514 into a parse tree representing the syntactic structure of the Japanese that should be the translation of the English represented by the original tree. In the converted parse tree, the tree structure is that of a Japanese sentence, but the words assigned to the leaves are English words. The parse tree conversion unit 518 gives the converted parse tree to the word rearrangement unit 520.

The word rearrangement unit 520 uses the converted parse tree given by the parse tree conversion unit 518 to rearrange the English words appearing in that tree into the order determined by the tree structure, and gives the resulting English word string to the statistical automatic translator 524 as input.

The statistical automatic translator 524 performs statistical translation of this English word string from English to Japanese using the translation model connected by the selector 522, and gives the resulting Japanese word string to the Japanese pattern substitution unit 526. The translation model used is the verb phrase translation model 322 when the structural part being translated is a verb phrase, and the noun phrase translation model 320 when it is a noun phrase. In the English-to-Japanese translation of each structural part, noun phrases are therefore more likely to be translated correctly as noun phrases, and verb phrases as verb phrases.

For the Japanese word string output by the statistical automatic translator 524, the Japanese pattern substitution unit 526 uses the correspondence information between the original English structural part and the structural parts of the Japanese pattern to assign the translation result to the text storage area of the corresponding structural part of the Japanese pattern stored in the Japanese pattern storage unit 506.

When the above processing has been performed on all the structural parts stored in the structural part storage unit 508, each structural part of the Japanese pattern stored in the Japanese pattern storage unit 506 holds the word string obtained by translating the corresponding English structural part into Japanese. Each translation was produced using the verb phrase translation model 322, optimized for verb phrases, if the original English structural part was a verb phrase, and using the noun phrase translation model 320, optimized for noun phrases, if it was a noun phrase. The individual structural parts making up the Japanese claim are therefore expected to be accurate translations that reflect the type of the original English structural parts. In addition, the structural parts in the Japanese pattern storage unit 506 are arranged in the order in which they appear in a Japanese claim. Therefore, by reading the structural parts stored in the Japanese pattern storage unit 506 in order from the beginning and displaying them appropriately, the Japanese claim 326, an accurate translation of the English claim 324, can be obtained.
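The substitution and read-out described above can be sketched as filling ordered slots. This is an illustrative sketch under assumptions: the slot identifiers (`J1`, `E1`, and so on), the function `fill_japanese_pattern`, and the sample Japanese strings are invented here, and any fixed text that a real Japanese pattern may carry between slots is ignored.

```python
from typing import Dict, List


def fill_japanese_pattern(
    slot_order: List[str],
    correspondence: Dict[str, str],
    translations: Dict[str, str],
) -> str:
    """Assign each translated part to its Japanese slot, then read slots in order.

    slot_order:     Japanese slot ids in the order they appear in the claim.
    correspondence: English part id -> Japanese slot id.
    translations:   English part id -> translated Japanese word string.
    """
    # Empty text areas, one per structural part of the Japanese pattern.
    slots = {slot: "" for slot in slot_order}
    for part_id, text in translations.items():
        slots[correspondence[part_id]] = text
    # Reading from the beginning yields the Japanese claim text.
    return "".join(slots[slot] for slot in slot_order)


# English order: E1 (the device) then E2 (what it comprises);
# the Japanese pattern places the modifier first.
japanese_claim = fill_japanese_pattern(
    slot_order=["J1", "J2"],
    correspondence={"E1": "J2", "E2": "J1"},
    translations={"E1": "装置", "E2": "表示部を備える"},
)
```

The reordering between languages is carried entirely by the correspondence table, so the translator itself only ever handles one short structural part at a time.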

[Second Embodiment]
In the first embodiment described above, when the structural part translation model learning unit 318 shown in FIG. 7 trains the translation models, the noun phrase pre-ordering unit 358 and the verb phrase pre-ordering unit 360 pre-order the noun phrases and verb phrases. Likewise, when the claim translation unit 328 shown in FIG. 9 translates an English claim, the syntax analysis unit 514, the parse tree conversion unit 518, and the word rearrangement unit 520 pre-order the words. Pre-ordering words during both training and translation in this way raises the efficiency of training and the accuracy of translation, so it is desirable. However, the present invention is not limited to such an embodiment. Any configuration is acceptable as long as separate translation models are trained in advance according to the grammatical characteristics of the structural parts of the English pattern (their type, such as verb phrase or noun phrase) and, at translation time, automatic translation is performed using the translation model appropriate to the grammatical characteristics of the structural part being translated. The second embodiment omits pre-ordering.

FIG. 10 shows, in block diagram form, the overall configuration of a claim translation system 550 according to the second embodiment. The claim translation system 550 differs from the claim translation system 300 shown in FIG. 6 in that it does not use the verb phrase syntax analysis model 312, the noun phrase syntax analysis model 314, or the parse tree conversion rules 330, and in that, in place of the structural part translation model learning unit 318 and the claim translation unit 328 of FIG. 6, it includes a structural part translation model learning unit 570 and a claim translation unit 572, neither of which uses pre-ordering. Note that the verb phrase syntax analysis model 312 and the noun phrase syntax analysis model 314 are used in the syntax analysis by the syntax analysis unit 332, as in the first embodiment.

FIG. 11 shows a schematic block diagram of the structural part translation model learning unit 570. The structural part translation model learning unit 570 differs from the structural part translation model learning unit 318 shown in FIG. 7 in that it does not include the noun phrase data storage unit 354, the verb phrase data storage unit 356, or the noun phrase pre-ordering unit 358 and verb phrase pre-ordering unit 360 used for pre-ordering. Instead, the noun phrases output from the structural part classification unit 352 are stored directly in the noun phrase learning data storage unit 362, and the verb phrases are stored directly in the verb phrase learning data storage unit 366. In other respects, each part of the structural part translation model learning unit 570 is the same as the corresponding part of the structural part translation model learning unit 318.

FIG. 12 shows a schematic block diagram of the claim translation unit 572 shown in FIG. 10. The claim translation unit 572 differs from the claim translation unit 328 of FIG. 9 in that it does not include the selector 512, the syntax analysis unit 514, the parse tree conversion rules 330, the parse tree conversion unit 518, or the word rearrangement unit 520, which are needed for pre-ordering, and in that the output of the structural part storage unit 508 is given directly to the statistical automatic translator 524. In other respects, each part of the claim translation unit 572 has the same configuration as the corresponding part of the claim translation unit 328.

The second embodiment differs from the first in that pre-ordering is performed neither when training the translation models nor when translating. Even with this configuration, claims can be translated more accurately than with conventional methods. In particular, for translation between two languages whose word orders are relatively close, a sufficient improvement in translation accuracy can be expected without pre-ordering.

<Modification>
The first embodiment uses pre-ordering and the second does not. The present invention is not limited to these embodiments; for example, whether to perform pre-ordering may be made selectable. This makes it possible to translate with or without pre-ordering depending on the languages involved.
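One way to picture a selectable pre-ordering step is an optional stage in the per-part translation routine. The sketch below is an assumption-laden illustration: `translate_part`, `toy_preorder`, and `toy_translate` are hypothetical names, and the toy functions merely stand in for the word rearrangement and statistical translation stages.

```python
from typing import Callable, List, Optional

Words = List[str]


def translate_part(
    words: Words,
    translate: Callable[[Words], str],
    preorder: Optional[Callable[[Words], Words]] = None,
) -> str:
    """Translate one structural part, pre-ordering first only if requested."""
    if preorder is not None:  # first-embodiment style
        words = preorder(words)
    return translate(words)  # second-embodiment style when preorder is None


def toy_preorder(words: Words) -> Words:
    """Stand-in for pre-ordering: simply reverses the words."""
    return list(reversed(words))


def toy_translate(words: Words) -> str:
    """Stand-in for the statistical translator: joins the words unchanged."""
    return " ".join(words)


without_preordering = translate_part(["displays", "image"], toy_translate)
with_preordering = translate_part(["displays", "image"], toy_translate, toy_preorder)
```

If such a switch were used, the same choice would presumably need to be made when training the translation model, since both embodiments apply pre-ordering consistently at training and translation time.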

In the above embodiments, a verb phrase translation model and a noun phrase translation model are used as translation models. However, the present invention is not limited to such embodiments, and other models, such as a model for translating adverb phrases, can also be used.

[Realization by computer]
The claim translation system 300 according to the first embodiment, the claim translation system 550 according to the second embodiment, and the other modifications can be realized by computer hardware and a computer program executed on that hardware. FIG. 13 shows the external appearance of this computer system 930, and FIG. 14 shows its internal configuration.

Referring to FIG. 13, the computer system 930 includes a computer 940 having a memory port 952 and a DVD (Digital Versatile Disc) drive 950, a keyboard 946, a mouse 948, and a monitor 942.

Referring to FIG. 14, in addition to the memory port 952 and the DVD drive 950, the computer 940 includes a CPU (central processing unit) 956; a bus 966 connected to the CPU 956, the memory port 952, and the DVD drive 950; a read-only memory (ROM) 958 that stores a boot program and the like; a random access memory (RAM) 960 that is connected to the bus 966 and stores program instructions, system programs, work data, and the like; and a hard disk 954. The computer system 930 further includes a network interface (I/F) 944 that provides a connection to a network 968 enabling communication with other terminals.

A computer program that causes the computer system 930 to function as the functional units of the claim translation system 300 and the claim translation system 550 according to the above embodiments is stored on a DVD 962 or a removable memory 964 loaded into the DVD drive 950 or the memory port 952, and is then transferred to the hard disk 954. Alternatively, the program may be transmitted to the computer 940 through the network 968 and stored on the hard disk 954. The program is loaded into the RAM 960 when executed. The program may also be loaded into the RAM 960 directly from the DVD 962, from the removable memory 964, or via the network 968.

This program includes an instruction sequence consisting of a plurality of instructions that cause the computer 940 to function as the functional units of the claim translation system 300 and the claim translation system 550 according to the above embodiments. Some of the basic functions needed to make the computer 940 perform these operations are provided by the operating system running on the computer 940, by third-party programs, or by various dynamically linkable programming toolkits or program libraries installed on the computer 940. The program itself therefore need not include all of the object code required to realize the system and method of this embodiment. It need only include those instructions that realize the functions of the system described above by dynamically calling, at run time and in a controlled manner that produces the desired result, the appropriate functions or the appropriate programs in a programming toolkit or program library. Of course, the program alone may provide all of the necessary functions.

The functional units of the claim translation system 300 and the claim translation system 550 may also be distributed across separate computers, or across separate computers located in different regions and connected via a network.

The embodiments disclosed here are merely examples, and the present invention is not limited to the embodiments described above. The scope of the present invention is indicated by each claim in the claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of those claims.

300, 550 Claim translation system
310 Claim parallel translation corpus
312 Verb phrase syntax analysis model
314 Noun phrase syntax analysis model
316 Pattern classification data
318, 570 Structural part translation model learning unit
320 Noun phrase translation model
322 Verb phrase translation model
328, 572 Claim translation unit
330 Parse tree conversion rules
358 Noun phrase pre-ordering unit
360 Verb phrase pre-ordering unit
442 Syntax analysis model learning unit
524 Statistical automatic translator

Claims

An automatic translation device that translates a sentence in a first language into a sentence in a second language different from the first language, the device including:
dividing means for identifying a pattern of the sentence in the first language and dividing the sentence in the first language into structural parts;
pattern specifying means for specifying a pattern of a sentence in the second language that is associated in advance with the pattern of the sentence in the first language identified by the dividing means;
correspondence specifying means for specifying, between the pattern of the sentence in the first language divided by the dividing means and the pattern of the sentence in the second language specified by the pattern specifying means, the correspondence between structural parts and the grammatical characteristics of each structural part;
translation means for performing, for each of the structural parts in the first language, automatic translation of the word string constituting the structural part using a model for translation from the first language to the second language, thereby generating a translation in the second language; and
substituting means for generating a sentence in the second language that is a translation of the sentence in the first language by substituting, for each of the structural parts in the first language, the translation in the second language obtained by the translation means into one of the structural parts of the pattern of the sentence in the second language specified by the pattern specifying means, in accordance with the correspondence of the structural parts specified by the correspondence specifying means.

The automatic translation device according to claim 1, further including word rearrangement means that is provided between the correspondence specifying means and the translation means, receives a structural part of the first language whose correspondence and grammatical characteristics have been specified by the correspondence specifying means, rearranges the words of the word string constituting the structural part into the order in which their respective translations appear in the second language, and gives the result to the translation means.

The automatic translation device according to claim 2, wherein the word rearrangement means includes:
syntax analysis means for generating a parse tree of the first language by performing, on each of the structural parts of the first language, syntax analysis according to the grammatical characteristics of that structural part;
conversion means for converting the parse tree of the first language generated by the syntax analysis means into a parse tree of the second language according to conversion rules prepared in advance; and
rearrangement means for rearranging and outputting the words of the structural part of the first language according to the order in which the words appear in the parse tree of the second language obtained by the conversion by the conversion means.

For each of the structural parts in the first language, the translation means is pre-optimized for translation of a word string having a grammatical characteristic of the structural part with respect to a word string constituting the structural part. 4. A grammatical characteristic translation unit that performs automatic translation using a model for translation from a first language to the second language to generate a translation of the second language. The automatic translation apparatus in any one of.

The automatic translation device according to any one of claims 1 to 4, wherein the pattern specifying means includes:
dividing means for dividing the sentence in the first language into a plurality of structural parts according to a predetermined delimiter pattern included in the sentence in the first language;
grammatical characteristic determining means for determining the grammatical characteristics of each structural part divided by the dividing means; and
means for specifying the pattern of the sentence in the first language according to the order of appearance of the structural parts divided by the dividing means and the grammatical characteristics determined for each structural part by the grammatical characteristic determining means.

A translation model learning device for training a translation model used when a sentence of a first language having a specific grammatical characteristic is translated by statistical translation into a sentence of a second language different from the first language, the device including:
collecting means for collecting a plurality of translation pairs, each consisting of a sentence of the first language having the specific grammatical characteristic and its translation in the second language; and
statistical learning means for training, using the plurality of translation pairs collected by the collecting means as learning data, a statistical model needed for statistical translation from sentences of the first language having the specific grammatical characteristic into sentences of the second language.