JP6952967B2

JP6952967B2 - Automatic translator

Info

Publication number: JP6952967B2
Application number: JP2015044418A
Authority: JP
Inventors: 富士　秀; 内山　将夫; 隅田　英一郎
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2015-03-06
Filing date: 2015-03-06
Publication date: 2021-10-27
Anticipated expiration: 2035-03-06
Also published as: JP2016164707A

Description

The present invention relates to automatic translation technology, and in particular to a technique for handling sentences that follow certain rules yet are long and difficult to translate, such as the claims of a patent application (hereinafter "claims").

To check the technical content of a patent application written in a foreign language, or to learn the scope of its rights, it is sometimes necessary to understand the meaning of claims written in that language. If only a few documents are involved and the reader knows the language of the documents, reading the original text may not pose much of a problem. However, when there are a large number of target documents, or the reader has little knowledge of their language, reading every claim of the target documents is impractical. In such cases, readers often try to grasp the content by machine-translating each claim into their native language.

However, it is well known that the quality of automatically translated claims is low. It is customary to state many limitations of an invention in a single sentence, so claim sentences tend to be long. Moreover, in technical fields where many inventors compete fiercely in development, the difference from the prior art sometimes cannot be made clear unless a relatively large number of limitations (constituent elements) are added to the claim. As a result, claims often reach a length that almost never occurs in ordinary documents. Although automatic translation technology has advanced greatly in recent years and its accuracy has improved, highly accurate translation still cannot be expected when the source text is long, as claims are.

A proposal for solving this problem is made in Patent Document 1, cited below. The technique described in Patent Document 1 is basically based on the idea of dividing a long claim into its components (hereinafter "structural parts") and translating each structural part separately. This exploits the fact that, particularly in recent years, claims are generally written as a plurality of structural parts, and that these structural parts are separated by specific delimiter patterns. The structural parts of a claim are the subject matter of the invention, one or more constituent elements, and a transitional phrase inserted between the subject matter and the constituent elements. The order in which these appear depends on the language. In Japanese, the one or more constituent elements come first, followed by the transitional phrase, and the subject matter of the invention comes last. In English the order is reversed: the subject matter comes first, followed by the transitional phrase and then the one or more constituent elements.

In Japanese, the transitional phrases mainly used are 「含む（ことを特徴とする）」, 「を備える（ことを特徴とする）」, and 「からなる（ことを特徴とする）」 (roughly "including", "comprising", and "consisting of", each optionally followed by "characterized by"). In English, "comprising:", "including:", "consisting of:", and the like are used.

The one or more constituent elements are mainly separated from one another by 「、」 followed by a line break in Japanese, and by ";" in English.

By detecting such delimiter patterns in a claim, the structure of the claim can be determined.
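The delimiter-based structure detection described above can be illustrated with a minimal sketch. It assumes an English claim, uses only the transitional phrases and the ";" delimiter named in the text, and is not the method of Patent Document 1 or of the present invention. The function name `split_english_claim` and the regular expression are hypothetical.

```python
import re

# Transitional phrases named in the text; the regex itself is an illustrative assumption.
TRANSITION = re.compile(r"\b(comprising|including|consisting of)\s*:", re.IGNORECASE)

def split_english_claim(claim: str) -> dict:
    """Split an English claim into subject, transitional phrase, and elements."""
    m = TRANSITION.search(claim)
    if m is None:
        raise ValueError("no transitional phrase found")
    subject = claim[:m.start()].strip()          # text before the transitional phrase
    body = claim[m.end():].strip().rstrip(".")   # text after it, without the final period
    elements = [e.strip() for e in body.split(";") if e.strip()]
    return {"subject": subject, "transition": m.group(1).lower(), "elements": elements}

claim = ("An apparatus comprising: a sensor that detects light; "
         "and a controller that adjusts a lamp.")
parts = split_english_claim(claim)
```

For a Japanese claim the same idea would apply in reverse order, with the elements first and the subject last, using 「、」 plus a line break as the element delimiter.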

FIG. 1 shows the rough flow of processing in the technique disclosed in Patent Document 1. Referring to FIG. 1, the claim translation procedure 30 of Patent Document 1 is carried out on a computer and includes: step 40 of performing morphological analysis and syntactic analysis on the claim text to generate syntactic information; step 42 of detecting delimiter patterns of structural parts in the result of the syntactic analysis and dividing the claim text into its components; and step 44 of identifying, for each structural part, the character string in the text that gives the name of that structural part and attaching to it a tag indicating that it is a structural part name. As a result of the syntactic analysis, a tree-like syntactic structure is formed among the structural parts. In Patent Document 1, this syntactic structure is expressed in the form of tags attached to the structural parts and to the words and other items inside them.

The first embodiment of Patent Document 1 assumes that the target claim is in English. In English, a word corresponding to the name of a structural part usually appears at the beginning of the text for that part, followed in most cases by a description of the part. This makes processing such as step 44 easy. The descriptions of structural parts also often follow set patterns. Patent Document 1 calls a description that follows such a pattern an explanatory pattern sentence.

The translation procedure 30 of Patent Document 1 further includes: step 46 of identifying the explanatory pattern sentences contained in the structural part being processed, dividing that structural part into individual pattern sentences, and attaching to each a tag indicating that it is an explanatory pattern sentence; step 48 of translating the name of the structural part being processed into the target language by automatic translation; step 50 of changing, for each explanatory pattern sentence, the sentence structure into one suited to translation; step 52 of translating into the target language, by automatic translation, each explanatory sentence whose structure was changed in step 50; and step 54 of displaying the structural part names and pattern sentences thus translated into the target language as a tree in accordance with the tags.

The change of sentence structure in step 50 converts an explanatory sentence into a form that contains a subject and a predicate. For example, an English relative pronoun is omitted and the structural part name is inserted in its place, or the noun or noun phrase that the explanatory pattern sentence modifies is inserted. Patent Document 1 states that conversion patterns are set in advance for sentence conversions, including this one. This conversion appears intended to bring the explanatory pattern sentence into the word order of an ordinary sentence.

Referring to FIG. 2, the automatic translation method of Patent Document 1 more specifically performs the following processing. Assume that English claim 70 is to be translated. In Patent Document 1, the English claim is decomposed into a plurality of structural parts by finding English delimiter patterns, and a tree structure is formed among them (process 72). As described above, attending to the delimiter patterns in an English claim makes it possible to determine how the claim is structured, and the claim can be decomposed into structural parts accordingly.

In the procedure of Patent Document 1, process 72 yields a tree-structured English pattern 74. In the example shown in FIG. 2, English pattern 74 includes a subject 90, a transitional phrase 92, and a description 94. Automatically translating the subject 90, the transitional phrase 92, and the description 94 separately (process 76) generates a tree-structured Japanese pattern 78. Japanese pattern 78 has the same structure as English pattern 74, with the text converted into Japanese. That is, Japanese pattern 78 includes a Japanese translation 100 of the subject, a Japanese translation 102 of the transitional phrase, and a Japanese translation 104 of the description. According to Patent Document 1, Japanese pattern 78 is displayed as is in tree form, rather than being used to generate a single-sentence Japanese claim.

According to Patent Document 1, because such processing reduces the task to automatically translating individual structural parts, the translation accuracy of the constituent explanatory pattern sentences improves over conventional techniques. Displaying the translated text as a tree is also said to make the content of the claim easier to understand.

Patent Document 1: JP-A-2014-199476 (in particular FIGS. 5, 7, and 8 and paragraphs 31, 41, and 51-77)

Admittedly, the technique of Patent Document 1 is likely to give more accurate translations than translating the entire claim in one piece. However, when automatic translation is used, the technique of Patent Document 1 still leaves problems to be solved.

The biggest problem is the change of sentence structure applied to the descriptions. This change is needed because the descriptions are not ordinary sentences but take a form that modifies each structural part. If conversion patterns for sentence structure were prepared in advance and every explanatory pattern sentence were converted completely, translating each description with existing automatic translation technology might cause no problem. However, the variety of sentences that can appear in claims is virtually unlimited, and it is impossible to prepare sentence-structure conversion rules for all of them in advance. Moreover, Patent Document 1 does not clearly disclose what conversion patterns should be prepared. The likelihood that a description is converted into a sentence structure suited to automatic translation is therefore far from high. Even when the conversion fails and a description does not have the form of a sentence, the automatic translation apparatus translates it on the assumption that it is an ordinary sentence. The resulting translation is then very unlikely to be comprehensible. In particular, when a structural part is a noun phrase, translating it as if it were a sentence has a severe adverse effect on the translation.

For example, referring to FIG. 3, consider an English claim 120 having two noun phrases 1 and 2 that correspond to descriptions. In the Japanese claim 122 that the present inventors obtained by automatic translation in accordance with Patent Document 1, the translations of noun phrase 1 and noun phrase 2 are almost incomprehensible. As FIG. 3 shows, when processing follows Patent Document 1 and each noun phrase or the like is translated as a sentence even though its sentence structure has not been converted successfully, the resulting translation is difficult to understand and the purpose of automatic translation is not achieved.

This problem arises not only in translating the claims of patent applications but also in translating other sentences that are similarly long, are written according to specific rules, and consist of a plurality of structural parts. Examples include laws and regulations, terms and conditions, and treaties, as well as user manuals for various machines, electronic devices, and software.

An object of the present invention is therefore to provide an automatic translation apparatus that can improve the accuracy of automatic translation of long sentences that are written according to a specific format and can be divided into a plurality of structural parts.

An automatic translation apparatus according to a first aspect of the present invention translates a sentence in a first language into a sentence in a second language different from the first language. The apparatus includes: dividing means for identifying the pattern of the first-language sentence and dividing the sentence into structural parts; pattern specifying means for specifying the second-language sentence pattern that is associated in advance with the first-language sentence pattern identified by the dividing means; correspondence specifying means for specifying, between the first-language sentence pattern divided by the dividing means and the second-language pattern specified by the pattern specifying means, the correspondence between structural parts and the grammatical characteristic of each structural part; translation means for generating, for each first-language structural part, a second-language translation by automatically translating the word sequence constituting that part using a model for translation from the first language to the second language; and substitution means for generating the second-language sentence that is the translation of the first-language sentence by substituting, for each first-language structural part, the second-language translation obtained by the translation means into one of the structural parts of the second-language sentence pattern specified by the pattern specifying means, in accordance with the correspondence between structural parts specified by the correspondence specifying means.

The first-language sentence may be, for example, a claim of a patent application, a law or regulation, terms and conditions, a treaty, or a user manual for any of various machines, electronic devices, or software.

Preferably, the automatic translation apparatus further includes word reordering means provided between the correspondence specifying means and the translation means. The word reordering means receives the first-language structural parts whose correspondence and grammatical characteristics have been specified by the correspondence specifying means, rearranges, for each of them, the word sequence constituting the part into the order in which the translations of its words appear in the second language, and supplies the result to the translation means.

More preferably, the word reordering means includes: parsing means for parsing each first-language structural part to generate a first-language parse tree; conversion means for converting the first-language parse tree generated by the parsing means into a second-language parse tree in accordance with conversion rules prepared in advance; and rearranging means for rearranging and outputting the words of the first-language structural part in the order in which words appear in the second-language parse tree obtained by the conversion means.
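The three stages of the word reordering means (parse, convert the tree, read the words off in the new order) can be sketched on a toy example. The sketch assumes the parse tree is already given as nested tuples, and the single conversion rule used here (reverse the children of `VP` and `PP` nodes to approximate head-final order) is a hypothetical stand-in for the "conversion rules prepared in advance", which the text does not specify.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf word, or (label, child, child, ...)

# Hypothetical conversion rules: node labels whose children are reversed
# to approximate head-final (Japanese-like) word order.
REVERSE_CHILDREN = {"VP", "PP"}

def convert(tree: Tree) -> Tree:
    """Convert a source-language parse tree into target-language order."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [convert(c) for c in children]
    if label in REVERSE_CHILDREN:
        children.reverse()
    return (label, *children)

def leaves(tree: Tree) -> list:
    """Return the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]

# Toy parse of "detects light from the lamp".
source_tree = ("VP", "detects", ("NP", "light"),
               ("PP", "from", ("NP", "the", "lamp")))
reordered = leaves(convert(source_tree))
```

The reordered English words are still English; they are only arranged in an order closer to that of the target language before being handed to the translation means.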

The translation means may include grammatical-characteristic-specific translation means for generating, for each first-language structural part, a second-language translation by automatically translating the word sequence constituting that part using a first-to-second-language translation model optimized in advance for translating word sequences that have the grammatical characteristic of that part.

Still more preferably, the pattern specifying means includes: dividing means for dividing the first-language sentence into a plurality of structural parts at predetermined delimiter patterns contained in the sentence; grammatical characteristic determining means for determining the grammatical characteristic of each structural part produced by the dividing means; and means for identifying the first-language sentence pattern from the order of appearance of the structural parts produced by the dividing means and the grammatical characteristic determined for each structural part by the grammatical characteristic determining means.

The first-language sentence may be a claim of a patent application written in the first language.

A translation model training apparatus according to a second aspect of the present invention trains a translation model used when a first-language sentence having a specific grammatical characteristic is translated by statistical translation into a sentence in a second language different from the first language. The apparatus includes: collecting means for collecting a plurality of translation pairs, each consisting of a first-language sentence having the specific grammatical characteristic and its translation in the second language; and statistical learning means for training, using the translation pairs collected by the collecting means as training data, the statistical model needed to perform statistical translation from first-language sentences having the specific grammatical characteristic into second-language sentences.
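The collecting step of the second aspect can be pictured as grouping translation pairs by grammatical characteristic, so that each group can serve as training data for its own model. The sketch below covers only that grouping; the statistical training itself is omitted. The function name, the labels `"NP"` and `"VP"`, and the sample pairs are all illustrative assumptions.

```python
from collections import defaultdict

def collect_by_characteristic(aligned_parts):
    """Group (characteristic, source, target) triples into one corpus per characteristic."""
    corpora = defaultdict(list)
    for characteristic, src, tgt in aligned_parts:
        corpora[characteristic].append((src, tgt))
    return dict(corpora)

# Hypothetical aligned structural parts taken from English/Japanese claim pairs.
aligned_parts = [
    ("NP", "a light sensor", "光センサ"),
    ("VP", "detects light", "光を検出する"),
    ("NP", "a controller", "制御部"),
]
corpora = collect_by_characteristic(aligned_parts)
```

Each value in `corpora` would then be passed to a separate training run, yielding one model set per grammatical characteristic.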

FIG. 1 is a flowchart showing the procedure of conventional automatic translation of patent claims.
FIG. 2 is a diagram schematically showing how text is translated in conventional automatic translation of patent claims.
FIG. 3 is a schematic diagram illustrating a problem in conventional automatic translation of patent claims.
FIG. 4 is a schematic diagram illustrating the training process required for automatic translation in the first embodiment of the present invention.
FIG. 5 is a schematic diagram illustrating how text is translated in the automatic claim translation according to the first embodiment of the present invention.
FIG. 6 is a block diagram showing the overall configuration of the automatic translation system according to the first embodiment of the present invention.
FIG. 7 is a block diagram showing the schematic configuration of the structural part translation model training unit, which trains the various models for automatic translation in the first embodiment of the present invention.
FIG. 8 is a block diagram showing the schematic configuration of the parsing model training unit, which trains the models for parsing noun phrases and verb phrases in the first embodiment of the present invention.
FIG. 9 is a block diagram showing the schematic configuration of the claim translation unit, which automatically translates the claims of an English patent application into Japanese in the first embodiment of the present invention.
FIG. 10 is a block diagram showing the overall configuration of the automatic translation system according to the second embodiment of the present invention.
FIG. 11 is a block diagram showing the schematic configuration of the structural part translation model training unit, which trains the various models for automatic translation in the second embodiment of the present invention.
FIG. 12 is a block diagram showing the schematic configuration of the claim translation unit, which automatically translates the claims of an English patent application into Japanese in the second embodiment of the present invention.
FIG. 13 is a diagram showing the external appearance of a computer system that serves as the hardware platform for the automatic translation system according to each embodiment of the present invention.
FIG. 14 is a block diagram showing the internal hardware configuration of the computer system shown in FIG. 13.

In the following description and drawings, identical parts are given identical reference numbers, and detailed descriptions of them are therefore not repeated. The following description concerns translating the English claims of a patent application into Japanese. However, the embodiments below are not limited to that case. For example, they also apply to translating Japanese claims into English. They further apply to texts that, like patent claims, are considered difficult to translate: laws and regulations, terms and conditions, treaties, and user manuals for various machines, electronic devices, and software.

[First Embodiment]
<Basic idea>
The automatic claim translation system according to the first embodiment described below employs statistical machine translation. Referring to FIG. 4, the claims of a patent application, whether in English or in Japanese, are written with their structural parts arranged according to one of a fixed set of patterns, although many variations exist. In this embodiment, such an arrangement pattern of the structural parts of a claim is called a claim pattern. A claim pattern is similar to the conventional English pattern 74 shown in FIG. 2 and can be represented as a tree. An English claim pattern is called an English pattern, and a Japanese claim pattern a Japanese pattern. By comparing actual English claims with the corresponding Japanese claims, English patterns can be associated with Japanese patterns. A claim written according to a given English pattern can thus be translated into a Japanese claim that matches the Japanese pattern corresponding to that English pattern.

These claim patterns are obtained by classifying a plurality of claims. In each claim pattern, every structural part is annotated with its grammatical characteristic. For example, if the character string or word sequence forming a structural part is a noun phrase, the part is marked as a noun phrase; if it is a verb phrase, it is marked as a verb phrase. Two claim patterns whose tree structures look the same are nevertheless distinct patterns if the grammatical characteristic of any structural part differs. Note that these claim patterns are different from the "delimiter patterns" used in Patent Document 1.
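The point that two patterns with the same layout are still different patterns when a grammatical characteristic differs can be made concrete with a small data structure. This is only one possible representation, not one given in the source; the class names `Slot` and `ClaimPattern` and the labels `"NP"`, `"VP"`, and `"-"` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    role: str            # e.g. "subject", "transition", "element"
    characteristic: str  # e.g. "NP" or "VP"; "-" where none applies (illustrative labels)

@dataclass(frozen=True)
class ClaimPattern:
    slots: tuple  # ordered Slot objects, in order of appearance

p1 = ClaimPattern((Slot("subject", "NP"), Slot("transition", "-"), Slot("element", "NP")))
p2 = ClaimPattern((Slot("subject", "NP"), Slot("transition", "-"), Slot("element", "VP")))

# Same layout of roles, but a different grammatical characteristic in one slot.
same_shape = [s.role for s in p1.slots] == [s.role for s in p2.slots]
distinct = p1 != p2

# Frozen dataclasses are hashable, so a pattern can key a lookup table.
pattern_table = {p1: "hypothetical Japanese pattern A", p2: "hypothetical Japanese pattern B"}
```

Because equality compares both role and characteristic, `p1` and `p2` map to separate entries even though their trees would look alike.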

By comparing the claim pattern of an English claim with the claim pattern of the corresponding Japanese claim, the English pattern and the Japanese pattern can be associated with each other. The structural parts of the two associated patterns can also be put in correspondence. Such pairs of claim patterns are stored in advance as pattern classification data and are used when training the translation models, as described below. In the following embodiments, English patterns and Japanese patterns are assumed to correspond one to one.

The following embodiments assume that an English claim is translated into a Japanese claim. The English pattern is determined by decomposing the word sequence of the source-language English claim into structural parts and determining the grammatical characteristic of each part. This also yields the text that makes up each structural part. As in Patent Document 1, the delimiter patterns in the claim can be used for the decomposition. The transitional phrase in particular is specific to claim language and has few variations, so it can be identified relatively easily. Once the transitional phrase has been identified, the structural part corresponding to the subject matter of the invention lies on one side of it, which side depending on the language, and the one or more structural parts corresponding to the description lie on the other. These structural parts can be separated by finding specific delimiter patterns, or by preparing a statistical model for separation in advance and identifying the positions most likely to be boundaries.

本実施の形態では、構造部品の文法特性（例えば名詞句、動詞句の種別等）ごとに、別々の翻訳用モデルの学習を行なっておき、各構造部品の文法特性に応じたモデルを用いて構造部品ごとに自動翻訳を行なう。すなわち、名詞句からなる構造部品を翻訳するときは、名詞句のみを用いて予め学習した翻訳用のモデルを使用する。動詞句からなる構造部品を翻訳するときは、動詞句のみを用いて予め学習した翻訳用のモデルを使用する。この結果、各構造部品についてより理解しやすい日本語訳が得られる。 In this embodiment, separate translation models are learned for each grammatical characteristic of structural parts (for example, noun phrase, verb phrase type, etc.), and a model corresponding to the grammatical characteristics of each structural component is used. Automatic translation is performed for each structural part. That is, when translating a structural component consisting of a noun phrase, a pre-learned translation model using only the noun phrase is used. When translating a structural component consisting of a verb phrase, a pre-learned translation model using only the verb phrase is used. As a result, a Japanese translation that is easier to understand for each structural part can be obtained.

もとの英語パターンは、ある日本語パターンと関係付けられており、英語パターンのどの構造部品が日本語パターンのどの構造部品に対応しているかも予め記録されている。この対応関係を用い、各構造部品の日本語訳を、元の英語パターンに対応する日本語パターンの、元の構造部品に対応する構造部品の位置に代入することで、クレームの日本語訳を得ることができる。 The original English pattern is associated with a Japanese pattern, and which structural part of the English pattern corresponds to which structural part of the Japanese pattern is also recorded in advance. Using this correspondence, the Japanese translation of the claim is obtained by substituting the Japanese translation of each structural part into the position, within the Japanese pattern corresponding to the original English pattern, of the structural part that corresponds to the original structural part.

なお、ここでいう「翻訳用モデル」とは、統計的翻訳装置で使用される、原言語から目的言語への翻訳モデル、目的言語の言語モデル、及びそれらの学習の過程で生成される句テーブル等からなる１組のモデルを指す。以下の実施の形態では、原言語は英語であり、目的言語は日本語である。また、以下の実施の形態では、文法特性として名詞句と動詞句とを採用する。 The "translation model" here refers to a set of models used in a statistical translation device, consisting of a translation model from the source (original) language to the target language, a language model of the target language, a phrase table generated in the course of training them, and the like. In the following embodiments, the source language is English and the target language is Japanese. Noun phrases and verb phrases are adopted as the grammatical characteristics.

図４に、名詞句翻訳用モデル２１０と動詞句翻訳用モデル２１４との学習過程を模式的に示す。なお、ここでは、英語パターンと日本語パターンとについては予め収集されており、それらの間の対応関係も確立されている。この対応関係は、前述したとおり１対１である。すなわち、英語パターンが決まればそれに対応する日本語パターンも決まり、構造部品の対応関係も同様に１対１で決まる。 FIG. 4 schematically shows the learning process of the noun phrase translation model 210 and the verb phrase translation model 214. Here, the English pattern and the Japanese pattern are collected in advance, and the correspondence between them is also established. This correspondence is one-to-one as described above. That is, once the English pattern is determined, the corresponding Japanese pattern is also determined, and the correspondence between the structural parts is also determined on a one-to-one basis.

図４を参照して、学習用の、クレームの対訳１７０を複数個準備する。各対訳１７０は、英語クレーム１７２と、その英語のクレームの翻訳である日本語クレーム１７４とを含む。例えば日本出願を英語に訳して米国に出願したもののように、互いに対訳に相当する関係となっている出願書類で、しかも公開されているものは多数存在する。対訳１７０は、それら公開された出願の組から選択すればよい。 With reference to FIG. 4, a plurality of claim translations 170 for learning are prepared. Each translation 170 includes an English claim 172 and a Japanese claim 174, which is a translation of the English claim. For example, there are many application documents that have a bilingual relationship with each other, such as a Japanese application translated into English and filed in the United States. The bilingual translation 170 may be selected from the set of published applications.

英語クレーム１７２の英語パターンを判定し、その英語パターンにしたがってクレームを構造部品に分解し、各構造部品の文法特性を英語パターンにしたがって分類する（処理１７６）。同様に日本語クレーム１７４は英語パターンに対応する日本語パターンにしたがって構造部品に分解し、各構造部品の文法特性を日本語パターンにしたがって分類する（処理１８０）。この処理は、人手でも実行できるし、機械可読な形式のパターン集を準備しておくことで機械でも実行できる。 The English pattern of the English claim 172 is determined, the claim is decomposed into structural parts according to the English pattern, and the grammatical characteristics of each structural part are classified according to the English pattern (process 176). Similarly, Japanese claim 174 is decomposed into structural parts according to the Japanese pattern corresponding to the English pattern, and the grammatical characteristics of each structural part are classified according to the Japanese pattern (process 180). This process can be executed manually, or can be executed by a machine by preparing a machine-readable pattern collection.

このパターン分類の結果、例えば、英語クレーム１７２は英語パターン１７８に合致すると判定され、日本語クレーム１７４は日本語パターン１８２に合致すると判定される。英語パターンと日本語パターンとの構造部品は予め対応付けられており、各構造部品を対にすることができる（処理１８４）。例えば図４に示す例では、英語クレームの主題は日本語クレームの主題に対応付けられる。英語クレームの移行句は日本語クレームの移行句に対応付けられる。英語クレームの説明は、日本語クレームの説明に対応付けられる。 As a result of this pattern classification, for example, the English claim 172 is determined to match the English pattern 178, and the Japanese claim 174 is determined to match the Japanese pattern 182. The structural parts of the English pattern and the Japanese pattern are associated in advance, and each structural part can be paired (process 184). For example, in the example shown in FIG. 4, the subject of the English claim is associated with the subject of the Japanese claim. The transition phrase of the English claim is associated with the transition phrase of the Japanese claim. The explanation of the English claim is associated with the explanation of the Japanese claim.

このようにして組み合わされた構造部品のうち、本実施の形態では、構造部品の文法特性に着目し、各構造部品の対を文法特性（文法的な種別）ごとに別々の集合に分類する。すなわち、構造部品の対のテキストが名詞句であれば名詞句対の集合１９８に、動詞句であれば動詞句対の集合２００に、各構造部品の対を分類する（処理１９６）。このようにして得られた名詞句対の集合１９８は、英語クレームの名詞句と日本語クレームの名詞句との対を多数含み、動詞句対の集合２００は、英語クレームの動詞句と日本語クレームの動詞句との対を多数含む。 Among the structural parts combined in this way, in the present embodiment, attention is paid to the grammatical characteristics of the structural parts, and the pairs of the structural parts are classified into separate sets for each grammatical characteristic (grammatical type). That is, if the text of the pair of structural parts is a noun phrase, it is classified into a set of noun phrase pairs 198, and if it is a verb phrase, it is classified into a set of verb phrase pairs 200 (process 196). The set of noun phrase pairs 198 thus obtained contains many pairs of noun phrases of English claims and noun phrases of Japanese claims, and the set of verb phrase pairs 200 includes verb phrases of English claims and Japanese. Includes many pairs with claim verb phrases.

最後に、名詞句対の集合１９８を学習データとして名詞句翻訳用モデル２１０の学習を、動詞句対の集合２００を学習データとして動詞句翻訳用モデル２１４の学習を、それぞれ行なう。 Finally, the noun phrase translation model 210 is trained using the noun phrase pair set 198 as training data, and the verb phrase translation model 214 is trained using the verb phrase pair set 200 as training data.
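As a rough illustration of processes 184 and 196, the sketch below groups aligned structural-part pairs into separate training sets by grammatical type. The input layout (a list of `(type, English text, Japanese text)` tuples per claim) and the function name `group_pairs_by_type` are assumptions made for this example; the training of the translation models themselves is not shown.

```python
from collections import defaultdict


def group_pairs_by_type(aligned_claims):
    """Collect (English, Japanese) structural-part pairs into one set per grammatical type.

    aligned_claims: iterable of claims, each a list of
    (part_type, english_text, japanese_text) tuples, with part_type "NP" or "VP".
    """
    sets = defaultdict(list)
    for parts in aligned_claims:
        for part_type, english, japanese in parts:
            sets[part_type].append((english, japanese))
    return dict(sets)


# Toy aligned data (hypothetical, for illustration only).
corpus = [
    [("NP", "A translation device", "翻訳装置"),
     ("VP", "comprising", "を含む"),
     ("NP", "a parser", "構文解析部")],
]
training_sets = group_pairs_by_type(corpus)
noun_phrase_pairs = training_sets["NP"]   # would feed the noun phrase translation model
verb_phrase_pairs = training_sets["VP"]   # would feed the verb phrase translation model
```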

図５を参照して、英語から日本語への翻訳時には以下の様な処理を行なう。英語クレーム７０を英語パターンと照合することにより、英語クレーム７０のクレームパターンを特定する。この例では英語パターン７４が特定されたものとする。英語パターン７４は、この例では、３個の構造部品、すなわち主題と移行句と説明とを含む。これらはいずれもテキストを格納する領域を持つ。これら領域には、英語クレーム７０を構造部品に分解して得られる、主題に対応する英語テキスト、移行句に対応する英語テキスト、及び説明に対応する英語テキストがそれぞれ格納される（処理７２）。 With reference to FIG. 5, the following processing is performed when translating from English to Japanese. By collating the English claim 70 with the English pattern, the claim pattern of the English claim 70 is specified. In this example, it is assumed that the English pattern 74 is specified. The English pattern 74 includes three structural parts in this example: the subject, the transition phrase, and the description. Each of these has an area for storing text. In these areas, the English text corresponding to the subject, the English text corresponding to the transition phrase, and the English text corresponding to the explanation, which are obtained by decomposing the English claim 70 into structural parts, are stored (process 72).

次に、特定された英語パターンと対応付けられている日本語パターンが特定され、生成される（処理７６）。具体的には、特定された英語パターン７４と対になった日本語パターン７８の構造を特定する情報がクレームパターンの記憶部から読み出され、この新たな日本語パターンを記憶する領域が新たに記憶領域に確保される。この例では、読み出された日本語パターン７８は、説明と移行句と主題とを含む。これらはこの順番で配置されており、いずれも、記憶装置内に、対応のテキストを格納する領域を持つ。日本語パターンの生成時には、これら領域は空である。 Next, the Japanese pattern associated with the specified English pattern is identified and generated (process 76). Specifically, information specifying the structure of the Japanese pattern 78 paired with the specified English pattern 74 is read from the claim pattern storage unit, and a new area for storing this new Japanese pattern is newly created. Reserved in the storage area. In this example, the read Japanese pattern 78 includes a description, a transitional phrase, and a subject. These are arranged in this order, and each has an area in the storage device for storing the corresponding text. These areas are empty when the Japanese pattern is generated.

続いて、英語パターンの主題、移行句、及び説明のテキストを、それぞれの文法特性に応じて選択したモデルを使用して自動翻訳する（処理２４０）。例えば主題は名詞句であるから名詞句翻訳用モデルを用いて翻訳される。説明のうち、動詞句であるものは動詞句用翻訳モデルを用いて翻訳され、名詞句であるものは名詞句翻訳用モデルを用いて翻訳される。移行句は動詞句であるから動詞句翻訳用モデルを用いて翻訳される。主題を翻訳した結果は、日本語パターン７８の主題の領域に、移行句を翻訳した結果は日本語パターン７８の移行句の領域に、説明を翻訳した結果は日本語パターンの説明の領域に、それぞれ格納される。この結果、翻訳後の日本語パターン２４２が得られる。得られた日本語パターン２４２の各領域のテキストを、日本語パターン２４２の各領域の順番に連結することで英語クレーム７０の翻訳である日本語のクレームが得られる。 Subsequently, the subject, transitional phrase, and explanatory texts of the English pattern are automatically translated using the model selected according to each grammatical characteristic (process 240). For example, since the subject is a noun phrase, it is translated using a noun phrase translation model. Of the explanations, those that are verb phrases are translated using the verb phrase translation model, and those that are noun phrases are translated using the noun phrase translation model. Since the transition phrase is a verb phrase, it is translated using a verb phrase translation model. The result of translating the subject is in the area of the subject of the Japanese pattern 78, the result of translating the transition phrase is in the area of the transition phrase of the Japanese pattern 78, and the result of translating the explanation is in the area of the explanation of the Japanese pattern. Each is stored. As a result, the translated Japanese pattern 242 is obtained. By concatenating the texts in each area of the obtained Japanese pattern 242 in the order of each area of the Japanese pattern 242, a Japanese claim which is a translation of the English claim 70 can be obtained.
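The translation-time flow of FIG. 5 (processes 72, 76, and 240) can be sketched as follows. The translators here are stand-in callables rather than real statistical models, and the names `translate_claim`, `english_types`, and `japanese_order` are hypothetical; the sketch only shows how each part is routed to a model by its grammatical type and how the results are concatenated in the order given by the Japanese pattern.

```python
from typing import Callable, Dict, List

Translator = Callable[[str], str]


def translate_claim(parts: Dict[str, str],
                    english_types: Dict[str, str],
                    japanese_order: List[str],
                    models: Dict[str, Translator]) -> str:
    """Translate each structural part with the model for its type, then assemble.

    parts:          slot name -> English text (the filled English pattern)
    english_types:  slot name -> grammatical type ("NP" or "VP")
    japanese_order: slot names in the order of the paired Japanese pattern
    models:         grammatical type -> translator callable
    """
    # The Japanese pattern is created with empty text areas.
    japanese_slots = {slot: "" for slot in japanese_order}
    for slot, text in parts.items():
        translator = models[english_types[slot]]   # select the model by grammatical type
        japanese_slots[slot] = translator(text)
    # Concatenate the areas in the order defined by the Japanese pattern.
    return "".join(japanese_slots[slot] for slot in japanese_order)


# Stub translators that only tag their input (assumptions, not real models).
stub_models = {"NP": lambda s: f"[NP:{s}]", "VP": lambda s: f"[VP:{s}]"}
result = translate_claim(
    parts={"subject": "A device", "transition": "comprising", "description": "a parser"},
    english_types={"subject": "NP", "transition": "VP", "description": "NP"},
    japanese_order=["description", "transition", "subject"],
    models=stub_models,
)
```

With the stubs above, the description comes first and the subject last in `result`, mirroring the reversed order of the Japanese pattern 78 in the example.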

＜構成＞
図６を参照して、上記した自動翻訳処理を実現する、本実施の形態に係るクレーム翻訳システム３００は、英語クレームを日本語クレームに翻訳するためのものである。クレーム翻訳システム３００は、名詞句の英日翻訳を行なう際に用いられる名詞句翻訳用モデル３２０と、動詞句の英日翻訳を行なう際に用いられる動詞句翻訳用モデル３２２と、これらの学習を行なうための構造部品翻訳用モデル学習部３１８と、動詞句構文解析用モデル３１２及び名詞句構文解析用モデル３１４を用いて英語クレーム３２４の構文解析を行い、構文解析後の英語クレーム３３４を出力するための構文解析部３３２と、名詞句翻訳用モデル３２０及び動詞句翻訳用モデル３２２を使用して、入力される構文解析後の英語クレーム３３４を翻訳して日本語クレーム３２６を出力するためのクレーム翻訳部３２８とを含む。 <Structure>
With reference to FIG. 6, the claim translation system 300 according to the present embodiment, which realizes the automatic translation process described above, translates English claims into Japanese claims. The claim translation system 300 includes: a noun phrase translation model 320 used for English-Japanese translation of noun phrases; a verb phrase translation model 322 used for English-Japanese translation of verb phrases; a structural component translation model learning unit 318 for training these models; a parsing unit 332 that parses an English claim 324 using a verb phrase parsing model 312 and a noun phrase parsing model 314 and outputs a parsed English claim 334; and a claim translation unit 328 that translates the input parsed English claim 334 using the noun phrase translation model 320 and the verb phrase translation model 322 and outputs a Japanese claim 326.

構造部品翻訳用モデル学習部３１８は、名詞句翻訳用モデル３２０及び動詞句翻訳用モデル３２２の学習を行なうために、英語と日本語とのクレームの対訳を多数記憶したクレーム対訳コーパス３１０と、英語の動詞句の構文解析を行なうための動詞句構文解析用モデル３１２と、英語の名詞句の構文解析を行なうための名詞句構文解析用モデル３１４と、クレームに関する英語パターンとそれに対応する日本語パターンとの対を複数個格納したパターン分類データ３１６と、英語の構造部品について後述する事前並べ替えをするための構文解析木変換規則３３０とを用いる。動詞句構文解析用モデル３１２は、英語の動詞句のみを学習データとして学習した構文解析用モデルである。名詞句構文解析用モデル３１４は、英語の名詞句のみを学習データとして学習した構文解析用モデルである。 To train the noun phrase translation model 320 and the verb phrase translation model 322, the structural component translation model learning unit 318 uses: a claim translation corpus 310 storing a large number of English-Japanese claim translation pairs; the verb phrase parsing model 312 for parsing English verb phrases; the noun phrase parsing model 314 for parsing English noun phrases; pattern classification data 316 storing a plurality of pairs, each consisting of an English claim pattern and its corresponding Japanese pattern; and a parsing tree conversion rule 330 used for the pre-sorting of English structural parts described later. The verb phrase parsing model 312 is a parsing model trained using only English verb phrases as training data. The noun phrase parsing model 314 is a parsing model trained using only English noun phrases as training data.

図７を参照して、構造部品翻訳用モデル学習部３１８は、クレーム対訳コーパス３１０からクレーム対訳を順番に読出し、そのうちの英語クレームをパターン分類データ３１６と照合することにより、合致する英語パターンと、その英語パターンと対になった日本語パターンを読み出すためのパターン分類部３５０と、クレーム対訳コーパス３１０から読み出されたクレーム及びパターン分類部３５０により読み出された英語パターンに基づいて、英語クレームを構造部品に分類し、名詞句と動詞句とに分類して出力する構造部品分類部３５２と、構造部品分類部３５２が出力する英語の名詞句を格納する名詞句データ記憶部３５４と、構造部品分類部３５２が出力する動詞句を格納する動詞句データ記憶部３５６とを含む。 With reference to FIG. 7, the structural component translation model learning unit 318 includes: a pattern classification unit 350 that reads claim translation pairs in order from the claim translation corpus 310 and, by collating the English claim of each pair with the pattern classification data 316, reads out the matching English pattern and the Japanese pattern paired with it; a structural component classification unit 352 that, based on the claim read from the claim translation corpus 310 and the English pattern read by the pattern classification unit 350, divides the English claim into structural parts, classifies them into noun phrases and verb phrases, and outputs them; a noun phrase data storage unit 354 that stores the English noun phrases output by the structural component classification unit 352; and a verb phrase data storage unit 356 that stores the verb phrases output by the structural component classification unit 352.

構造部品翻訳用モデル学習部３１８はさらに、名詞句データ記憶部３５４に記憶された名詞句データの各々に対して、事前並べ替えと呼ばれる、英語クレームの単語の並べ替え処理を行なう名詞句事前並べ替え部３５８と、名詞句事前並べ替え部３５８の出力する、事前並べ替え後の名詞句データを記憶するための名詞句用学習データ記憶部３６２と、名詞句用学習データ記憶部３６２に記憶された事前並べ替え後の英語クレームと日本語クレームとの対を学習データとし名詞句翻訳用モデル３２０の学習を行なうための名詞句用モデル学習部３６４とを含む。 The structural component translation model learning unit 318 further includes: a noun phrase pre-sorting unit 358 that performs, on each item of noun phrase data stored in the noun phrase data storage unit 354, a reordering of the words of the English claim called pre-sorting; a noun phrase learning data storage unit 362 for storing the pre-sorted noun phrase data output by the noun phrase pre-sorting unit 358; and a noun phrase model learning unit 364 for training the noun phrase translation model 320 using, as training data, the pairs of pre-sorted English claims and Japanese claims stored in the noun phrase learning data storage unit 362.

構造部品翻訳用モデル学習部３１８はさらに、動詞句データ記憶部３５６に記憶された動詞句データの各々に対して、英語クレームの単語の事前並べ替えを行なう動詞句事前並べ替え部３６０と、動詞句事前並べ替え部３６０の出力する、事前並べ替え後の動詞句データを記憶するための動詞句用学習データ記憶部３６６と、動詞句用学習データ記憶部３６６に記憶された事前並べ替え後の英語クレームと日本語クレームとの対を学習データとして動詞句翻訳用モデル３２２の学習を行なうための動詞句用モデル学習部３６８とを含む。 The model learning unit 318 for structural component translation further includes a verb phrase pre-sorting unit 360 that pre-sorts words in English claims for each of the verb phrase data stored in the verb phrase data storage unit 356, and a verb. After pre-sorting, the verb phrase learning data storage unit 366 for storing the pre-sorted verb phrase data output by the phrase pre-sorting unit 360 and the verb phrase learning data storage unit 366 for storing the pre-sorted verb phrase data. It includes a verb phrase model learning unit 368 for learning the verb phrase translation model 322 using a pair of an English claim and a Japanese claim as learning data.

名詞句事前並べ替え部３５８は、名詞句データ記憶部３５４に記憶された名詞句の各々について名詞句構文解析用モデル３１４を用いて構文解析を行ない、構文解析木を出力するための構文解析部３８０と、構文解析部３８０が出力する構文解析木を、構文解析木変換規則３３０内に記憶された変換規則を用いて、対応する日本語の構文解析木と同じ構造の構文解析木に変形するための木構造変換部３８２と、木構造変換部３８２により出力された変換後の構文解析木により規定される順番で英語の単語を並べ替えて名詞句用学習データ記憶部３６２に格納するための単語並べ替え部３８４とを含む。なお、この実施の形態では、翻訳前に原文（英語）の単語の並べ替えを行う事前並べ替えを採用しているが、これは一例であって、他の並べ替えの手法を用いることもできる。例えば、原文の単語の順序にしたがって単語の翻訳をした後、翻訳後の単語を目的言語の語順となるように並べ替えを行う、いわゆる事後並べ替えの手法を採用することもできる。 The noun phrase pre-sorting unit 358 includes: a parsing unit 380 that parses each noun phrase stored in the noun phrase data storage unit 354 using the noun phrase parsing model 314 and outputs a parsing tree; a tree structure conversion unit 382 that transforms the parsing tree output by the parsing unit 380, using the conversion rules stored in the parsing tree conversion rule 330, into a parsing tree with the same structure as the corresponding Japanese parsing tree; and a word rearrangement unit 384 that rearranges the English words in the order specified by the converted parsing tree output by the tree structure conversion unit 382 and stores them in the noun phrase learning data storage unit 362. This embodiment adopts pre-sorting, in which the words of the original text (English) are reordered before translation, but this is only an example and other reordering methods can also be used. For example, a so-called post-sorting method can be adopted, in which the words are translated in the order of the original text and the translated words are then rearranged into the word order of the target language.
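Pre-sorting by tree transformation can be illustrated with a toy example. The sketch below represents a parsing tree as nested tuples and applies a single hypothetical rule, reversing the children of `VP` and `PP` nodes to approximate head-final Japanese order. The rule table `REVERSE_CHILDREN` is an assumption for illustration; the actual contents of the parsing tree conversion rule 330 are not given in this section.

```python
# Hypothetical rule table: node labels whose children are reversed (an assumption).
REVERSE_CHILDREN = {"VP", "PP"}


def is_leaf(tree) -> bool:
    """A leaf is a (part-of-speech, word) tuple."""
    return len(tree) == 2 and isinstance(tree[1], str)


def reorder(tree):
    """Return a new tree with the children of selected nodes reversed, recursively."""
    if is_leaf(tree):
        return tree
    label, *children = tree
    new_children = [reorder(child) for child in children]
    if label in REVERSE_CHILDREN:
        new_children.reverse()
    return (label, *new_children)


def leaves(tree):
    """Read the words off a tree from left to right."""
    if is_leaf(tree):
        return [tree[1]]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


english_tree = ("VP",
                ("VB", "stores"),
                ("NP", ("DT", "the"), ("NN", "text")),
                ("PP", ("IN", "in"), ("NP", ("NN", "memory"))))
presorted_words = leaves(reorder(english_tree))
```

The reordered word sequence places the verb last and the preposition after its noun phrase, which is closer to Japanese word order than the original English sequence.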

動詞句事前並べ替え部３６０は、動詞句データ記憶部３５６に記憶された動詞句の各々について動詞句構文解析用モデル３１２を用いて構文解析を行ない、構文解析木を出力するための構文解析部４００と、構文解析部４００が出力する構文解析木を、構文解析木変換規則３３０内に記憶された変換規則を用いて、対応する日本語の構文解析木と同じ構造の構文解析木に変形するための木構造変換部４０２と、木構造変換部４０２により出力された変換後の構文解析木により規定される順番で英語の動詞句内の単語を並べ替えて動詞句用学習データ記憶部３６６に格納するための単語並べ替え部４０４とを含む。 The verb phrase pre-sorting unit 360 performs parsing for each of the verb phrases stored in the verb phrase data storage unit 356 using the verb phrase parsing model 312, and is a parsing unit for outputting a parsing tree. The parsing tree 400 and the parsing tree output by the parsing unit 400 are transformed into a parsing tree having the same structure as the corresponding Japanese parsing tree by using the conversion rule stored in the parsing tree conversion rule 330. To the verb phrase learning data storage unit 366, the words in the English verb phrase are rearranged in the order specified by the converted parsing tree output by the tree structure conversion unit 402 and the tree structure conversion unit 402. Includes a word sort unit 404 for storage.

本実施の形態では、動詞句構文解析用モデル３１２及び名詞句構文解析用モデル３１４はそれぞれ、英語の動詞句と英語の名詞句の構文解析を行なうように学習を行なったモデルである。これらの学習は、事前に図８に示すような構成の構文解析用モデル学習部４４２によって行なう。 In the present embodiment, the verb phrase syntax analysis model 312 and the noun phrase syntax analysis model 314 are models that have been trained to perform syntactic analysis of an English verb phrase and an English noun phrase, respectively. These learnings are performed in advance by the parsing model learning unit 442 having the configuration shown in FIG.

図８を参照して、構文解析用モデル学習部４４２は、大量の英語文書を記憶した原言語コーパス４４０から名詞句を抽出し、当該名詞句と、当該名詞句を分解して得られる様々な単語及び句（以下、これらをまとめて単に「名詞句」と呼ぶ）とを抽出するための名詞句抽出・分解部４６０と、名詞句抽出・分解部４６０により抽出された名詞句を記憶するための名詞句構文解析学習データ記憶部４６２と、名詞句構文解析学習データ記憶部４６２に記憶された名詞句の構文解析学習データを用いて名詞句構文解析用モデル３１４の学習を行なうための名詞句構文解析用モデル学習部４６４とを含む。名詞句構文解析用モデル３１４は統計的モデルであり、英語の名詞句が与えられると、その名詞句の構文解析木として最も尤度が高い構文解析木を出力するように学習を行なう。 With reference to FIG. 8, the parsing model learning unit 442 includes: a noun phrase extraction/decomposition unit 460 for extracting noun phrases from an original language corpus 440 storing a large number of English documents, together with the various words and phrases obtained by decomposing each noun phrase (hereinafter collectively referred to simply as "noun phrases"); a noun phrase parsing learning data storage unit 462 for storing the noun phrases extracted by the noun phrase extraction/decomposition unit 460; and a noun phrase syntax analysis model learning unit 464 for training the noun phrase parsing model 314 using the noun phrase parsing learning data stored in the noun phrase parsing learning data storage unit 462. The noun phrase parsing model 314 is a statistical model, trained so that, given an English noun phrase, it outputs the parsing tree with the highest likelihood for that noun phrase.

構文解析用モデル学習部４４２はさらに、原言語コーパス４４０から動詞句を抽出するための動詞句抽出部４７０と、動詞句抽出部４７０により抽出された動詞句を格納するための動詞句構文解析学習データ記憶部４７２と、動詞句構文解析学習データ記憶部４７２に記憶された動詞句の構文解析学習データを用いて動詞句構文解析用モデル３１２の学習を行なうための動詞句構文解析用モデル学習部４７４とを含む。動詞句構文解析用モデル３１２は統計的モデルであり、英語の動詞句が与えられると、その動詞句の構文解析木として最も尤度が高い構文解析木を出力するように学習を行なう。 The parsing model learning unit 442 further includes: a verb phrase extraction unit 470 for extracting verb phrases from the original language corpus 440; a verb phrase parsing learning data storage unit 472 for storing the verb phrases extracted by the verb phrase extraction unit 470; and a verb phrase syntax analysis model learning unit 474 for training the verb phrase parsing model 312 using the verb phrase parsing learning data stored in the verb phrase parsing learning data storage unit 472. The verb phrase parsing model 312 is a statistical model, trained so that, given an English verb phrase, it outputs the parsing tree with the highest likelihood for that verb phrase.

図９を参照して、図６に示すクレーム翻訳部３２８は、入力される構文解析後の英語クレーム３３４を受けて、パターン分類データ３１６を検索し、構文解析後の英語クレーム３３４と合致する英語パターンを持つパターン対を読み出すパターン検索部５００と、パターン検索部５００により読み出されたパターン対から、日本語パターンを取り出し、空の日本語パターンを生成する日本語パターン生成部５０４と、日本語パターン生成部５０４が生成した日本語パターンを記憶するための日本語パターン記憶部５０６とを含む。 With reference to FIG. 9, the claim translation unit 328 shown in FIG. 6 receives the input English claim 334 after parsing, searches the pattern classification data 316, and matches the English claim 334 after parsing. A pattern search unit 500 that reads out a pattern pair having a pattern, a Japanese pattern generation unit 504 that extracts a Japanese pattern from the pattern pair read by the pattern search unit 500, and generates an empty Japanese pattern, and Japanese. It includes a Japanese pattern storage unit 506 for storing the Japanese pattern generated by the pattern generation unit 504.

クレーム翻訳部３２８はさらに、英語パターンが付された構文解析後の英語クレーム３３４をパターン検索部５００から受け、構文解析後の英語クレーム３３４を、英語パターンの構造にあわせて構造部品に分割して出力する入力文分割部５０２と、入力文分割部５０２が出力する構造部品を英語パターンの対応する部分と関連付けて記憶するための構造部品記憶部５０８と、構造部品記憶部５０８に記憶された構造部品を順番に読出し、読み出した構造部品が名詞句か動詞句かを英語パターンから特定して判定信号を出力する構造部品種別判定部５１０と、構造部品種別判定部５１０の出力する判定信号の値に応答して、動詞句構文解析用モデル３１２及び名詞句構文解析用モデル３１４のいずれかを選択するセレクタ５１２と、同じく構造部品種別判定部５１０の出力する判定信号の値に応答して、動詞句翻訳用モデル３２２及び名詞句翻訳用モデル３２０のいずれかを選択するセレクタ５２２と、構造部品種別判定部５１０により種別が判定された構造部品を構造部品記憶部５０８から読出し、セレクタ５１２の選択した構文解析用モデルを用いて構文解析を行ない、構文解析木を出力する構文解析部５１４とを含む。 The claim translation unit 328 further includes: an input sentence dividing unit 502 that receives the parsed English claim 334, with its English pattern attached, from the pattern search unit 500, divides it into structural parts according to the structure of the English pattern, and outputs them; a structural part storage unit 508 for storing the structural parts output by the input sentence dividing unit 502 in association with the corresponding parts of the English pattern; a structural part type determination unit 510 that reads the structural parts stored in the structural part storage unit 508 in order, determines from the English pattern whether each is a noun phrase or a verb phrase, and outputs a determination signal; a selector 512 that selects either the verb phrase parsing model 312 or the noun phrase parsing model 314 in response to the value of the determination signal; a selector 522 that likewise selects either the verb phrase translation model 322 or the noun phrase translation model 320 in response to the value of the determination signal; and a parsing unit 514 that reads from the structural part storage unit 508 the structural part whose type has been determined, parses it using the parsing model selected by the selector 512, and outputs a parsing tree.

クレーム翻訳部３２８はさらに、構文解析部５１４の出力する構文解析木を、構文解析木変換規則３３０に記憶された構文解析木変換規則にしたがって、対応する日本語の構文解析木に変換する構文解析木変換部５１８とを含む。 The claim translation unit 328 further converts the parsing tree output by the parsing unit 514 into the corresponding Japanese parsing tree according to the parsing tree conversion rule stored in the parsing tree conversion rule 330. Includes a tree conversion unit 518.

クレーム翻訳部３２８はさらに、構文解析木変換部５１８が出力した日本語の構文解析で各単語が現れる順番にしたがって、英語の単語を並べ替えた英語単語列を出力する単語並べ替え部５２０と、単語並べ替え部５２０が出力する英語単語列を受け、セレクタ５２２により選択された翻訳用モデル（対象の入力文が動詞句のときは動詞句翻訳用モデル３２２、名詞句のときは名詞句翻訳用モデル３２０）を用いて英語から日本語への統計的自動翻訳を行なう統計的自動翻訳機５２４と、統計的自動翻訳機５２４の出力する日本語文を、日本語パターン記憶部５０６の記憶する日本語パターンの、翻訳対象となっている構造部品に対応する領域に代入する日本語パターン代入部５２６とを含む。 The claim translation unit 328 further includes: a word rearrangement unit 520 that outputs an English word string in which the English words are rearranged according to the order in which they appear in the Japanese parsing tree output by the parsing tree conversion unit 518; a statistical automatic translator 524 that receives the English word string output by the word rearrangement unit 520 and performs statistical automatic translation from English to Japanese using the translation model selected by the selector 522 (the verb phrase translation model 322 when the input is a verb phrase, the noun phrase translation model 320 when it is a noun phrase); and a Japanese pattern substitution unit 526 that substitutes the Japanese text output by the statistical automatic translator 524 into the area, within the Japanese pattern stored in the Japanese pattern storage unit 506, corresponding to the structural part being translated.

＜動作＞
以上に構成を説明したクレーム翻訳システム３００は以下のように動作する。クレーム翻訳システム３００の動作は、大きく分けて３段階に別れる。第１段階は図６に示す動詞句構文解析用モデル３１２及び名詞句構文解析用モデル３１４の学習である。第２段階は、図６に示す名詞句翻訳用モデル３２０及び動詞句翻訳用モデル３２２の学習である。これら２つの処理が終了すると、第３段階として構文解析部３３２及びクレーム翻訳部３２８は英語クレーム３２４を日本語に翻訳して日本語クレーム３２６を出力可能になる。なお、これら処理に先立ち、パターン分類データ３１６及び構文解析木変換規則３３０は何らかの手段で予め準備されているものとする。以下、これらについて順番に説明する。 <Operation>
The claim translation system 300 whose configuration has been described above operates as follows. The operation of the claim translation system 300 is roughly divided into three stages. The first step is the learning of the verb phrase parsing model 312 and the noun phrase parsing model 314 shown in FIG. The second step is the learning of the noun phrase translation model 320 and the verb phrase translation model 322 shown in FIG. When these two processes are completed, the parsing unit 332 and the claim translation unit 328 can translate the English claim 324 into Japanese and output the Japanese claim 326 as the third step. Prior to these processes, it is assumed that the pattern classification data 316 and the parsing tree conversion rule 330 are prepared in advance by some means. Hereinafter, these will be described in order.

＜第１段階:構文解析用モデルの学習＞
図８を参照して、予め英語の原言語コーパス４４０が準備される。原言語コーパス４４０に記憶された各文に対しては、予め形態素解析及び構文解析が行なわれ、構文解析木の情報が付されている。 <Phase 1: Learning a model for parsing>
With reference to FIG. 8, an English original language corpus 440 is prepared in advance. For each sentence stored in the original language corpus 440, morphological analysis and syntactic analysis are performed in advance, and information on the syntactic analysis tree is attached.

名詞句抽出・分解部４６０は、原言語コーパス４４０から、名詞句と、その名詞句に含まれる単語と、その名詞句の構文構造を示す部分解析木を抽出し、名詞句構文解析学習データ記憶部４６２に蓄積する。名詞句構文解析用モデル学習部４６４は、名詞句構文解析学習データ記憶部４６２に蓄積された情報を読出し、名詞句構文解析用モデル３１４の学習を行なう。この学習の結果、名詞句構文解析用モデル３１４は、英語の名詞句が与えられたときに、その構文解析木として最も尤度の高いものを出力するような計算に利用できるようになる。 The noun phrase extraction / decomposition unit 460 extracts a noun phrase, a word contained in the noun phrase, and a partial analysis tree showing the syntactic structure of the noun phrase from the original language corpus 440, and stores the noun phrase parsing learning data. Accumulate in unit 462. The noun phrase syntax analysis model learning unit 464 reads the information stored in the noun phrase syntax analysis learning data storage unit 462 and learns the noun phrase syntax analysis model 314. As a result of this learning, the noun phrase parsing model 314 can be used for calculations that output the most probable syntactic analysis tree given an English noun phrase.

一方、動詞句抽出部４７０は、原言語コーパス４４０から、動詞句と、その動詞句の構文構造を示す部分解析木を抽出し、動詞句構文解析学習データ記憶部４７２に蓄積する。動詞句構文解析用モデル学習部４７４は、動詞句構文解析学習データ記憶部４７２に蓄積された動詞句を読出し、動詞句構文解析用モデル３１２の学習を行なう。この学習の結果、動詞句構文解析用モデル３１２は、英語の動詞句が与えられたときに、その構文解析木として最も尤度の高いものを出力するような計算に利用できるようになる。 On the other hand, the verb phrase extraction unit 470 extracts a verb phrase and a partial analysis tree showing the syntactic structure of the verb phrase from the original language corpus 440, and stores the verb phrase in the verb phrase parsing learning data storage unit 472. The verb phrase syntax analysis model learning unit 474 reads the verb phrase stored in the verb phrase syntax analysis learning data storage unit 472 and learns the verb phrase syntax analysis model 312. As a result of this learning, the verb phrase parsing model 312 can be used for calculations that output the most probable syntactic analysis tree given an English verb phrase.
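The extraction of phrases and their partial parsing trees from a pre-parsed corpus can be sketched as a tree walk. The nested-tuple tree format, the label names `NP` and `VP`, and the function `extract_phrases` are assumptions for illustration; the training of the statistical parsing models on the extracted data is not shown.

```python
def is_leaf(tree) -> bool:
    """A leaf is a (part-of-speech, word) tuple."""
    return len(tree) == 2 and isinstance(tree[1], str)


def leaves(tree):
    """Read the words off a tree from left to right."""
    if is_leaf(tree):
        return [tree[1]]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def extract_phrases(tree, target_label):
    """Yield (words, subtree) for every subtree labeled target_label, nested ones included."""
    if is_leaf(tree):
        return
    if tree[0] == target_label:
        yield leaves(tree), tree
    for child in tree[1:]:
        yield from extract_phrases(child, target_label)


# One pre-parsed sentence from a hypothetical corpus.
sentence = ("S",
            ("NP", ("DT", "the"), ("NN", "parser")),
            ("VP", ("VBZ", "reads"),
                   ("NP", ("DT", "a"), ("NN", "claim"))))

noun_phrase_data = list(extract_phrases(sentence, "NP"))   # training data for the NP parser
verb_phrase_data = list(extract_phrases(sentence, "VP"))   # training data for the VP parser
```

Each extracted item pairs the word sequence with its partial parsing tree, which is the kind of information the learning data storage units 462 and 472 are described as holding.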

＜第２段階:翻訳用モデルの学習＞
図７を参照して、パターン分類部３５０は、クレーム対訳コーパス３１０からクレームの日英対訳を読出し、パターン分類データ３１６に含まれるパターンと照合することにより、英語クレームのパターンを特定する。パターン分類部３５０は、特定した英語のパターンをパターン分類データ３１６から読み出す。パターン分類部３５０は、英語クレームを英語パターンにしたがって構造部品に分割しその英語パターンの各構造部品のテキスト領域に代入する。構造部品分類部３５２は、テキストが代入された英語パターンの各構造部品について、それが名詞句なら名詞句データ記憶部３５４に、動詞句なら動詞句データ記憶部３５６に、それぞれ蓄積する。クレーム対訳コーパス３１０に記憶された全てのクレーム対訳についてこの処理が終了すれば、又はこの処理と並行して、名詞句事前並べ替え部３５８及び動詞句事前並べ替え部３６０が動作できる。 <Second stage: Learning of translation model>
With reference to FIG. 7, the pattern classification unit 350 reads a Japanese-English claim translation pair from the claim translation corpus 310 and identifies the pattern of the English claim by collating it with the patterns included in the pattern classification data 316. The pattern classification unit 350 reads the identified English pattern from the pattern classification data 316, divides the English claim into structural parts according to that pattern, and assigns each part to the text area of the corresponding structural part of the English pattern. For each structural part of the English pattern to which text has been assigned, the structural component classification unit 352 stores it in the noun phrase data storage unit 354 if it is a noun phrase and in the verb phrase data storage unit 356 if it is a verb phrase. Once this process has been completed for all claim translation pairs stored in the claim translation corpus 310, or in parallel with it, the noun phrase pre-sorting unit 358 and the verb phrase pre-sorting unit 360 can operate.

名詞句事前並べ替え部３５８の構文解析部３８０は、名詞句データ記憶部３５４に蓄積された名詞句データを順番に読み出す。構文解析部３８０は、読出した名詞句に対し、名詞句構文解析用モデル３１４を利用して構文解析することにより、構文解析木を木構造変換部３８２に出力する。木構造変換部３８２は、構文解析木が与えられると、構文解析木変換規則３３０に格納された木構造変換規則のいずれかを用いて英語クレームの名詞句の構文解析木を対応の日本語の構文解析木の構造に変換し、単語並べ替え部３８４に与える。単語並べ替え部３８４は、この日本語の構文解析木により定まる順番で、英語の単語を並べ替えて名詞句用学習データ記憶部３６２に蓄積する。名詞句用モデル学習部３６４は、このようにして名詞句用学習データ記憶部３６２に蓄積された名詞句用学習データを用いて名詞句翻訳用モデル３２０の学習を行なう。 The parsing unit 380 of the noun phrase pre-sorting unit 358 reads the noun phrase data stored in the noun phrase data storage unit 354 in order. The parsing unit 380 parses each noun phrase it reads using the noun phrase parsing model 314 and outputs the resulting parsing tree to the tree structure conversion unit 382. Given a parsing tree, the tree structure conversion unit 382 uses one of the tree structure conversion rules stored in the parsing tree conversion rule 330 to convert the parsing tree of the noun phrase of the English claim into the structure of the corresponding Japanese parsing tree, and passes it to the word rearrangement unit 384. The word rearrangement unit 384 rearranges the English words in the order determined by this Japanese parsing tree and stores them in the noun phrase learning data storage unit 362. The noun phrase model learning unit 364 trains the noun phrase translation model 320 using the noun phrase learning data accumulated in the noun phrase learning data storage unit 362 in this way.

The parsing unit 400 of the verb phrase pre-sorting unit 360 reads the verb phrase data stored in the verb phrase data storage unit 356 in order. The parsing unit 400 parses each verb phrase it reads using the verb phrase parsing model 312 and outputs the resulting parse tree to the tree structure conversion unit 402. Given a parse tree, the tree structure conversion unit 402 uses one of the tree structure conversion rules stored in the parse tree conversion rules 330 to convert the parse tree of the verb phrase of the English claim into the structure of the corresponding Japanese parse tree, and gives the result to the word rearrangement unit 404. The word rearrangement unit 404 rearranges the English words in the order determined by this Japanese parse tree and stores them in the verb phrase learning data storage unit 366. The verb phrase model learning unit 368 trains the verb phrase translation model 322 using the verb phrase learning data accumulated in this way in the verb phrase learning data storage unit 366.

When the processing of all bilingual claim pairs stored in the claim translation corpus 310 has been completed as described above, noun phrases and verb phrases can be translated from English to Japanese using the noun phrase translation model 320 and the verb phrase translation model 322.

＜Third stage: Translation of claims＞
When a claim is translated, the claim translation unit 328 operates as follows. Before the claim translation unit 328 begins processing, the parsing unit 332 shown in FIG. 6 parses the English claim 324. As a result, the parsed English claim 334 is input to the claim translation unit 328.

With reference to FIG. 9, when the parsed English claim 334 is given to the claim translation unit 328, the pattern search unit 500 collates the patterns stored in the pattern classification data 316 with the parsed English claim 334 and searches for a matching English pattern. The pattern search unit 500 reads the English pattern that best matches the parsed English claim 334, together with the corresponding Japanese pattern, from the pattern classification data 316. It gives the Japanese pattern to the Japanese pattern generation unit 504, and the English pattern and the parsed English claim 334 to the input sentence division unit 502.

Based on the given Japanese pattern, the Japanese pattern generation unit 504 creates, in the Japanese pattern storage unit 506, a text storage area for each structural component of the Japanese pattern. Each of these areas is empty at this point.

The input sentence division unit 502 divides the parsed English claim 334 according to the English pattern read from the pattern classification data 316, assigns each part to the text storage area of the corresponding structural component, and stores the result in the structural component storage unit 508. Once these structural components are stored in the structural component storage unit 508, the structural component type determination unit 510 reads one of them, determines whether it is a noun phrase or a verb phrase, and gives a determination signal to the selector 512 and the selector 522. In response to this determination signal, the selector 512 selects the verb phrase parsing model 312 if the structural component is a verb phrase, or the noun phrase parsing model 314 if it is a noun phrase, and connects the selected model to the parsing unit 514. Likewise, the selector 522 selects the verb phrase translation model 322 if the structural component is a verb phrase, or the noun phrase translation model 320 if it is a noun phrase, and connects the selected model to the statistical automatic translator 524.
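The role of the two selectors can be sketched as a pair of lookups keyed by the determination signal. This is only an illustration under assumed names: the string identifiers stand in for the models 312, 314, 320, and 322, and the labels "NP" and "VP" are not taken from the specification.

```python
from typing import Dict, Tuple

# Stand-ins for the parsing models connected by selector 512 (assumed identifiers).
PARSING_MODELS: Dict[str, str] = {
    "VP": "verb_phrase_parsing_model_312",
    "NP": "noun_phrase_parsing_model_314",
}

# Stand-ins for the translation models connected by selector 522 (assumed identifiers).
TRANSLATION_MODELS: Dict[str, str] = {
    "VP": "verb_phrase_translation_model_322",
    "NP": "noun_phrase_translation_model_320",
}


def select_models(kind: str) -> Tuple[str, str]:
    """Return (parsing model, translation model) for one structural component.

    `kind` plays the role of the determination signal issued by the
    structural component type determination unit 510.
    """
    if kind not in PARSING_MODELS:
        raise ValueError(f"unknown component type: {kind}")
    return PARSING_MODELS[kind], TRANSLATION_MODELS[kind]


if __name__ == "__main__":
    print(select_models("NP"))
    print(select_models("VP"))
```

Driving both lookups from the same signal keeps the parsing model and the translation model consistent for each component.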

The parsing unit 514 receives the structural component from the structural component type determination unit 510 and parses it using the verb phrase parsing model 312 or the noun phrase parsing model 314 connected via the selector 512. The resulting parse tree is given from the parsing unit 514 to the parse tree conversion unit 518. In this parsing, the parsing unit 514 uses the noun phrase parsing model 314 if the target structural component is a noun phrase, and the verb phrase parsing model 312 if it is a verb phrase. The verb phrase parsing model 312 was trained by the parsing model learning unit 442 shown in FIG. 8 using verb phrases as training data, and the noun phrase parsing model 314 was trained by the parsing model learning unit 442 using noun phrases as training data. Therefore, the parsing performed with the noun phrase parsing model 314 is optimal for the type of structural component.

When the parse tree conversion unit 518 receives a parse tree from the parsing unit 514, it refers to the parse tree conversion rules 330 and selects the rule best suited to that parse tree. Using the selected conversion rule, the parse tree conversion unit 518 converts the parse tree given by the parsing unit 514 into a parse tree that shows the syntactic structure of the Japanese that should translate the English represented by the original tree. In the converted parse tree, the structure of the tree is that of a Japanese sentence, but the words assigned to the leaves are English words. The parse tree conversion unit 518 gives the converted parse tree to the word rearrangement unit 520.

The word rearrangement unit 520 uses the converted parse tree given by the parse tree conversion unit 518 to rearrange the English words appearing in that tree into the order determined by the tree's structure, and gives the resulting English word string to the statistical automatic translator 524 as input.
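The conversion and rearrangement steps can be illustrated with a toy example. The single rule used here (reverse the children of nodes labeled "VP" or "PP" to approximate head-final order) is an assumption chosen for brevity and is not one of the conversion rules 330; the tree encoding as `(label, children)` tuples is likewise hypothetical.

```python
from typing import List, Tuple, Union

# A tree is either a leaf word (str) or a (label, children) tuple.
Tree = Union[str, Tuple[str, list]]

# Hypothetical rule set: node labels whose children are reversed.
HEAD_FINAL_LABELS = {"VP", "PP"}


def convert_tree(tree: Tree) -> Tree:
    """Return a tree with target-language structure but source-language leaves."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    converted = [convert_tree(child) for child in children]
    if label in HEAD_FINAL_LABELS:
        converted.reverse()
    return (label, converted)


def leaves(tree: Tree) -> List[str]:
    """Read the leaf words from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    words: List[str] = []
    for child in children:
        words.extend(leaves(child))
    return words


def presort(tree: Tree) -> List[str]:
    """Rearrange source words into the order given by the converted tree."""
    return leaves(convert_tree(tree))


if __name__ == "__main__":
    vp = ("VP", [("V", ["stores"]),
                 ("NP", [("DT", ["the"]), ("NN", ["data"])])])
    print(presort(vp))  # ['the', 'data', 'stores']
```

The output is still English, only reordered, which matches the description of a tree whose structure is Japanese while its leaves remain English words.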

The statistical automatic translator 524 performs statistical translation of this English word string from English to Japanese using the translation model connected by the selector 522, and gives the resulting Japanese word string to the Japanese pattern substitution unit 526. The translation model used at this time is the verb phrase translation model 322 when the structural component being translated is a verb phrase, and the noun phrase translation model 320 when it is a noun phrase. Therefore, in the English-to-Japanese translation of each structural component, a noun phrase is more likely to be translated correctly as a noun phrase, and a verb phrase as a verb phrase.

For the Japanese word string output by the statistical automatic translator 524, the Japanese pattern substitution unit 526 uses the correspondence information between the original English structural component and the structural components of the Japanese pattern to assign the translation result to the text storage area of the corresponding structural component of the Japanese pattern stored in the Japanese pattern storage unit 506.

Once the above processing has been performed on all structural components stored in the structural component storage unit 508, each structural component of the Japanese pattern stored in the Japanese pattern storage unit 506 holds the word string obtained by translating the corresponding English structural component into Japanese. Each of these translations was produced using the verb phrase translation model 322, which is optimized for verb phrases, if the original English structural component was a verb phrase, and using the noun phrase translation model 320, which is optimized for noun phrases, if it was a noun phrase. Therefore, the individual structural components that make up the Japanese claim can be expected to be accurate translations that reflect the type of the original English structural component. Furthermore, the structural components in the Japanese pattern storage unit 506 are arranged in the order in which they appear in a Japanese claim. Therefore, by reading out the structural components stored in the Japanese pattern storage unit 506 in order from the beginning and displaying them appropriately, a Japanese claim 326 that is an accurate translation of the English claim 324 can be obtained.
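The substitution and read-out steps can be sketched as filling named slots and then joining them in target-language order. The slot names, the correspondence table, and the bracketed placeholder strings (used in place of real Japanese output) are all assumptions for illustration.

```python
from typing import Dict, List


def assemble_claim(japanese_slots: List[str],
                   correspondence: Dict[str, str],
                   translations: Dict[str, str],
                   separator: str = " ") -> str:
    """Fill the Japanese pattern slots and read them out in order.

    japanese_slots: slot names in the order they appear in a Japanese claim
    correspondence: English slot name -> Japanese slot name
    translations:   English slot name -> translated text for that component
    """
    # Empty text areas, one per structural component of the Japanese pattern
    # (stand-in for the Japanese pattern storage unit 506).
    storage = {slot: "" for slot in japanese_slots}
    for english_slot, translated in translations.items():
        storage[correspondence[english_slot]] = translated
    return separator.join(storage[slot] for slot in japanese_slots)


if __name__ == "__main__":
    # In this toy pattern the Japanese order places the elements before the preamble.
    slots = ["ja_elements", "ja_preamble"]
    mapping = {"en_preamble": "ja_preamble", "en_elements": "ja_elements"}
    results = {
        "en_preamble": "[ja: a device]",
        "en_elements": "[ja: including a storage unit]",
    }
    print(assemble_claim(slots, mapping, results))
```

Because the output order comes from the Japanese pattern rather than from the order in which components were translated, the components can be processed in any sequence.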

[Second Embodiment]
In the first embodiment described above, when the structural component translation model learning unit 318 shown in FIG. 7 trains the translation models, the noun phrase pre-sorting unit 358 and the verb phrase pre-sorting unit 360 pre-sort the noun phrases and verb phrases. Likewise, when the claim translation unit 328 shown in FIG. 9 translates an English claim, the parsing unit 514, the parse tree conversion unit 518, and the word rearrangement unit 520 pre-sort the words. Pre-sorting the words both at training time and at translation time in this way makes training more efficient and translation more accurate, so such word rearrangement is desirable. However, the present invention is not limited to such an embodiment. Any configuration may be used in which separate translation models are trained in advance according to the grammatical characteristics of the structural components of the English pattern (their type, such as verb phrase or noun phrase), and in which, at translation time, automatic translation is performed using the translation model appropriate to the grammatical characteristics of the structural component being translated. The second embodiment omits pre-sorting.

FIG. 10 shows, in block diagram form, the overall configuration of a claim translation system 550 according to the second embodiment. The claim translation system 550 differs from the claim translation system 300 shown in FIG. 6 in that it does not use the verb phrase parsing model 312, the noun phrase parsing model 314, or the parse tree conversion rules 330, and in that, in place of the structural component translation model learning unit 318 and the claim translation unit 328 of FIG. 6, it includes a structural component translation model learning unit 570 and a claim translation unit 572, neither of which uses pre-sorting. Note that, as in the first embodiment, the verb phrase parsing model 312 and the noun phrase parsing model 314 are used in the parsing performed by the parsing unit 332.

FIG. 11 shows a schematic block diagram of the structural component translation model learning unit 570. It differs from the structural component translation model learning unit 318 shown in FIG. 7 in that it does not include the noun phrase data storage unit 354, the verb phrase data storage unit 356, or the noun phrase pre-sorting unit 358 and verb phrase pre-sorting unit 360 used for pre-sorting. Instead, the noun phrases output from the structural component classification unit 352 are stored directly in the noun phrase learning data storage unit 362, and the verb phrases are stored directly in the verb phrase learning data storage unit 366. In other respects, each part of the structural component translation model learning unit 570 is the same as the corresponding part of the structural component translation model learning unit 318.

FIG. 12 shows a schematic block diagram of the claim translation unit 572 shown in FIG. 10. The claim translation unit 572 differs from the claim translation unit 328 of FIG. 9 in that it does not include the selector 512, the parsing unit 514, the parse tree conversion rules 330, the parse tree conversion unit 518, or the word rearrangement unit 520, which are needed for pre-sorting, and in that the output of the structural component storage unit 508 is given directly to the statistical automatic translator 524. In other respects, each part of the claim translation unit 572 has the same configuration as the corresponding part of the claim translation unit 328.

The second embodiment differs from the first embodiment in that pre-sorting is performed neither when the translation models are trained nor at translation time. Even with this configuration, claims can be translated more accurately than with conventional techniques. In particular, for translation between two languages whose word orders are relatively close, a sufficient improvement in translation accuracy can be expected without pre-sorting.

＜Modifications＞
The first embodiment uses pre-sorting, and the second embodiment does not. The present invention is not limited to these embodiments; for example, it may be made selectable whether or not pre-sorting is performed. This makes it possible to perform translation with or without pre-sorting, depending on the languages involved.
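One way to picture the selectable pre-sorting is a per-language-pair switch consulted before the words reach the translator. The table below, its language codes, and its values are purely illustrative assumptions, as is the `reorder` callable that stands in for the pre-sorting stage.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical switch: which language pairs use pre-sorting (illustrative values only).
PRESORT_ENABLED: Dict[Tuple[str, str], bool] = {
    ("en", "ja"): True,   # word orders differ substantially
    ("en", "fr"): False,  # word orders are comparatively close
}


def prepare_source_words(words: List[str],
                         pair: Tuple[str, str],
                         reorder: Callable[[List[str]], List[str]]) -> List[str]:
    """Apply pre-sorting only when it is enabled for the language pair."""
    if PRESORT_ENABLED.get(pair, False):
        return reorder(words)
    return list(words)


if __name__ == "__main__":
    # A trivial stand-in for the real pre-sorting stage.
    def reverse_words(ws: List[str]) -> List[str]:
        return list(reversed(ws))

    print(prepare_source_words(["stores", "data"], ("en", "ja"), reverse_words))
    print(prepare_source_words(["stores", "data"], ("en", "fr"), reverse_words))
```

Pairs missing from the table fall back to no pre-sorting in this sketch, which corresponds to the behavior of the second embodiment.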

In the above embodiments, a model for verb phrase translation and a model for noun phrase translation are used as the translation models. However, the present invention is not limited to such embodiments, and other models, such as a model for adverbial phrase translation, can also be used.

[Implementation by Computer]
The claim translation system 300 according to the first embodiment, the claim translation system 550 according to the second embodiment, and the other modifications can be implemented by computer hardware and a computer program executed on that hardware. FIG. 13 shows the appearance of the computer system 930, and FIG. 14 shows its internal configuration.

With reference to FIG. 13, the computer system 930 includes a computer 940 having a memory port 952 and a DVD (Digital Versatile Disc) drive 950, a keyboard 946, a mouse 948, and a monitor 942.

With reference to FIG. 14, in addition to the memory port 952 and the DVD drive 950, the computer 940 includes a CPU (central processing unit) 956; a bus 966 connected to the CPU 956, the memory port 952, and the DVD drive 950; a read-only memory (ROM) 958 that stores a boot program and the like; a random access memory (RAM) 960 that is connected to the bus 966 and stores program instructions, system programs, work data, and the like; and a hard disk 954. The computer system 930 further includes a network interface (I/F) 944 that provides a connection to a network 968 enabling communication with other terminals.

A computer program for causing the computer system 930 to function as each functional unit of the claim translation system 300 and the claim translation system 550 according to the above embodiments is stored on a DVD 962 or a removable memory 964 loaded into the DVD drive 950 or the memory port 952, and is further transferred to the hard disk 954. Alternatively, the program may be transmitted to the computer 940 through the network 968 and stored on the hard disk 954. The program is loaded into the RAM 960 at execution time. The program may also be loaded into the RAM 960 directly from the DVD 962, from the removable memory 964, or via the network 968.

This program includes an instruction sequence consisting of a plurality of instructions for causing the computer 940 to function as each functional unit of the claim translation system 300 and the claim translation system 550 according to the above embodiments. Some of the basic functions needed to cause the computer 940 to perform this operation are provided by the operating system or third-party programs running on the computer 940, or by various dynamically linkable programming toolkits or program libraries installed on the computer 940. Therefore, the program itself need not include all of the object code required to implement the system and method of this embodiment. The program need only include those instructions that implement the functions of the system described above by dynamically calling, at run time and in a controlled manner that yields the desired result, the appropriate functions or the appropriate programs in a programming toolkit or program library. Of course, the program alone may provide all the necessary functions.

In addition, the functional units of the claim translation system 300 and the claim translation system 550 may be distributed across separate computers for processing, including separate computers located in different regions and connected via a network.

The embodiments disclosed here are merely examples, and the present invention is not limited to the embodiments described above. The scope of the present invention is indicated by each claim, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of the claims.

300, 550 Claim translation system
310 Claim translation corpus
312 Verb phrase parsing model
314 Noun phrase parsing model
316 Pattern classification data
318, 570 Structural component translation model learning unit
320 Noun phrase translation model
322 Verb phrase translation model
328, 572 Claim translation unit
330 Parse tree conversion rules
358 Noun phrase pre-sorting unit
360 Verb phrase pre-sorting unit
442 Parsing model learning unit
524 Statistical automatic translator

Claims

An automatic translation device for translating a sentence in a first language into a sentence in a second language different from the first language, comprising:
dividing means for identifying the pattern of the sentence in the first language and dividing the sentence in the first language into structural components;
correspondence storage means for storing a correspondence between patterns of sentences in the first language and their structural components and patterns of sentences in the second language and their structural components, together with the grammatical characteristics of each structural component, wherein the patterns of sentences in the first language and the patterns of sentences in the second language are each expressed in a tree structure;
pattern specifying means for specifying, by referring to the correspondence storage means, the pattern of the sentence in the second language that is associated in advance with the pattern of the sentence in the first language identified by the dividing means;
correspondence identification means for identifying, using the correspondence stored in the correspondence storage means, the correspondence between the structural components of the pattern of the sentence in the first language divided by the dividing means and those of the pattern of the second language specified by the pattern specifying means, and the grammatical characteristics of each structural component;
translation means for generating, for each structural component of the first language, a translation in the second language by automatically translating the word string constituting the structural component using a model for translation from the first language to the second language that accords with the grammatical characteristics of the structural component; and
substitution means for generating the sentence in the second language, which is a translation of the sentence in the first language, by substituting, for each structural component of the first language, the translation in the second language obtained by the translation means into one of the structural components of the pattern of the sentence in the second language specified by the pattern specifying means, in accordance with the correspondence between structural components identified by the correspondence identification means,
wherein the dividing means includes:
sentence dividing means for dividing the sentence in the first language into a plurality of structural components by detecting, based on the result of parsing the sentence in the first language, predetermined delimiter patterns that are contained in the sentence in the first language and used to delimit its structural components;
grammatical characteristic determining means for determining the grammatical characteristics of each structural component divided by the sentence dividing means, based on the result of parsing that structural component; and
means for identifying the pattern of the sentence in the first language based on the order of appearance of the structural components divided by the sentence dividing means and on the grammatical characteristics determined for each structural component by the grammatical characteristic determining means.

The automatic translation device according to claim 1, further comprising word rearrangement means that is provided between the correspondence identification means and the translation means, receives the structural components of the first language whose correspondence and grammatical characteristics have been identified by the correspondence identification means, rearranges, for each of those structural components, the word string constituting the structural component into the order in which the translations of its words appear in the second language, and gives the result to the translation means.

The automatic translation device according to claim 2, wherein the word rearrangement means includes:
parsing means for generating a parse tree of the first language by parsing each structural component of the first language in accordance with the grammatical characteristics of the structural component;
conversion means for converting the parse tree of the first language generated by the parsing means into a parse tree of the second language according to conversion rules prepared in advance; and
rearrangement means for rearranging and outputting the words of the structural component of the first language according to the order of appearance of the words in the parse tree of the second language obtained through the conversion by the conversion means.

The automatic translation device according to any one of claims 1 to 3, wherein the translation means includes grammatical-characteristic-specific translation means for generating, for each structural component of the first language, a translation in the second language by automatically translating the word string constituting the structural component using a model for translation from the first language to the second language that has been optimized in advance for translating word strings having the grammatical characteristics of that structural component.