JP4971844B2

JP4971844B2 - Example database creation device, example database creation program, translation device, and translation program

Info

Publication number: JP4971844B2
Application number: JP2007068088A
Authority: JP
Inventors: 功雄後藤 (Isao Goto); 英輝田中 (Hideki Tanaka)
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2007-03-16
Filing date: 2007-03-16
Publication date: 2012-07-11
Anticipated expiration: 2027-03-16
Also published as: JP2008233955A

Description

The present invention relates to an example database creation device and an example database creation program that create an example database used for translation, and to a translation device and a translation program that perform translation using this example database.

Conventional translation systems that translate a first language (for example, Japanese) into a second language (for example, English) using examples (actual instances of how words and expressions are used) fall into two groups. Some select, sentence by sentence, the example most similar to the first-language input sentence and edit its second-language translation (see, for example, Non-Patent Document 1). Others generate a second-language sentence by combining the second-language words of partial examples that exactly match parts of the first-language input sentence (see, for example, Patent Document 1).

Some example-based translation systems also split a first-language input sentence before translation (see, for example, Non-Patent Document 2). Others split and translate an input sentence with the second language as the source language (see, for example, Patent Documents 2 and 3).
Patent Document 1: JP 2006-2552290 A
Patent Document 2: JP 2004-110583 A
Patent Document 3: JP H10-312382 A
Non-Patent Document 1: Sumita, "Example-based machine translation using DP-matching between word sequences," 39th ACL Workshop on DDMT, pp. 1-8, 2001.
Non-Patent Document 2: Kim et al., "Automatic splitting of long Japanese sentences into short sentences and subject supplementation for Japanese-English machine translation," IPSJ Journal, Vol. 35, No. 6, pp. 1018-1028, 1994.

However, conventional systems that use sentence-level second-language examples run into difficulty when the first-language input is a long sentence, such as a complex sentence. A long first-language sentence carries a great deal of content, so the example database used by the system is unlikely to contain an example that is similar in all of that content. When no such example exists, translation accuracy can drop.

As a result, when the input sentence is long, the translation system disclosed in Non-Patent Document 1 can handle only a narrow range of sentence variations and therefore cannot produce natural translations.

For example, a survey of Japanese-to-English news translations produced by professional translators found that a single Japanese news sentence is often translated into multiple English news sentences. In other words, the unit of a natural English news sentence is shorter than that of a Japanese news sentence. Therefore, to translate long Japanese news sentences into natural English news, it is more appropriate to translate one Japanese sentence into one or more English sentences.

Further, the translation system disclosed in Patent Document 1 generates a sentence by combining partial expressions taken from English sentences, so when the input is a long Japanese sentence, it cannot translate that one Japanese sentence into multiple English sentences. Moreover, because this system generates the structure of the English sentence itself, it may produce unnatural English. Sentence structure here means a pattern such as SVOC: the system builds an English sentence by joining fragments such as nouns, verbs, and adjectives, and in doing so also generates a structure, but the result may not be grammatical or natural English. Consequently, the translation system of Patent Document 1 also cannot produce natural translations.

Furthermore, when the input is a long Japanese sentence, the translation system disclosed in Non-Patent Document 2 splits it without considering the units of the English sentences that will form the translation. Such splitting does not necessarily produce units appropriate for English, and unless the Japanese input is split into appropriate units, natural translation is again impossible.

In addition, the translation system disclosed in Patent Document 2 uses English as the source language and does not split its examples. When the input is a long Japanese sentence, the range of translatable sentence variations is therefore narrow, and natural translation is again impossible. The translation system disclosed in Patent Document 3 uses English-dependent rules in its splitting method, so it cannot split long Japanese sentences and, once more, cannot produce natural translations.

Accordingly, an object of the present invention is to solve the above problems by providing two things. The first is an example database creation device and an example database creation program that, when one sentence in a first language (for example, Japanese) has been translated into multiple sentences in a second language (for example, English), create an example database in which each part of the Japanese sentence corresponds to one English sentence. The second is a translation device and a translation program that translate a first-language sentence into natural second-language sentences.

To solve the above problems, the example database creation device according to claim 1 creates an example database by adding one-sentence-aligned translation data to a bilingual database that stores translation data obtained by translating first-language sentences into second-language sentences. When one first-language sentence has been translated into multiple second-language sentences, the one-sentence-aligned translation data makes each part of that first-language sentence correspond to one second-language sentence. The device comprises translation data discrimination means, quoted expression discrimination means, quoted expression separation means, second-language multiple-sentence discrimination means, first-language expression specifying means, first-language division means, and divided first-language/second-language addition means.

With this configuration, the quoted expression discrimination means of the example database creation device first determines whether a first-language sentence contains a quoted expression, that is, an expression indicating a quotation. It makes this determination using quoted expression patterns that capture quoted expressions frequently appearing in the first language. When a quoted expression is found, the quoted expression separation means separates it from the first-language sentence and discards it. The second-language multiple-sentence discrimination means then treats the first-language sentence with the quoted expression removed as the body text. It determines whether this body text has been translated into multiple second-language sentences, using sentence-break patterns that capture sentence boundaries according to the grammar of the second language.
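The two pattern-based checks described above can be sketched as follows. This is a minimal illustration under stated assumptions. The patent does not disclose its actual quoted expression patterns or sentence-break patterns, so `QUOTE_PATTERNS` (here matching Japanese reporting frames such as "Xによりますと、", "according to X, ...") and `SENTENCE_BREAK` are hypothetical placeholders, as are all function names.

```python
from __future__ import annotations

import re

# Hypothetical quoted-expression patterns: Japanese reporting frames such as
# "Xによりますと、" / "Xによると、" ("according to X, ..."). Not from the patent.
QUOTE_PATTERNS = [
    re.compile(r"^[^、。]*によりますと、"),
    re.compile(r"^[^、。]*によると、"),
]

# Hypothetical English sentence-break pattern: whitespace that follows
# ".", "!" or "?" and precedes an uppercase letter or a double quote.
SENTENCE_BREAK = re.compile(r'(?<=[.!?])\s+(?=[A-Z"])')


def strip_quoted_expression(sentence: str) -> tuple[str, str | None]:
    """Return (body_text, quoted_expression); the latter is None if absent."""
    for pattern in QUOTE_PATTERNS:
        match = pattern.search(sentence)
        if match:
            return sentence[match.end():], match.group(0)
    return sentence, None


def split_english_sentences(text: str) -> list[str]:
    """Split a second-language (English) translation at sentence breaks."""
    return [s for s in SENTENCE_BREAK.split(text.strip()) if s]


def is_multi_sentence_translation(english: str) -> bool:
    """True if the translation consists of more than one sentence."""
    return len(split_english_sentences(english)) > 1
```

A regular-expression splitter like this will mis-split abbreviations such as "Mr. Smith". A practical system would need a more careful second-language sentence-break pattern.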

Next, suppose the second-language multiple-sentence discrimination means determines that the body text (the first-language sentence with the quoted expression removed) has been translated into multiple second-language sentences. The first-language expression specifying means then identifies which words of the body text correspond to the words in those second-language sentences. It does so using bilingual dictionary data in which first-language words are associated with second-language words.
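A minimal sketch of dictionary-based word correspondence follows. It assumes the body text has already been segmented into words and that the bilingual dictionary maps each first-language word to a set of lowercase second-language words. The function name and data layout are hypothetical, since the patent does not specify how matching is performed.

```python
def align_with_dictionary(ja_words, en_sentences, bilingual_dict):
    """For each English sentence, return the ascending indices of body-text
    words whose dictionary translations occur in that sentence.

    bilingual_dict (hypothetical layout): Japanese word -> set of
    lowercase English words.
    """
    alignments = []
    for sentence in en_sentences:
        # Crude tokenisation: lowercase and strip surrounding punctuation.
        en_tokens = {t.lower().strip(".,!?\"'") for t in sentence.split()}
        hits = [i for i, word in enumerate(ja_words)
                if bilingual_dict.get(word, set()) & en_tokens]
        alignments.append(hits)
    return alignments
```

For example, with the body text `["台風", "が", "接近", "し", "、", "今夜", "上陸", "する"]` and the translations "A typhoon is approaching." and "It will make landfall tonight.", a dictionary covering 台風, 接近, 今夜 and 上陸 aligns words 0 and 2 with the first sentence and words 5 and 6 with the second.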

Instead of the bilingual dictionary data, the first-language expression specifying means may use either or both of two statistical resources. One is log-likelihood ratio data, computed for the association between first-language words and second-language words. The other is word correspondence probabilities, that is, the probability that a first-language word and a second-language word correspond. The first-language expression specifying means may further use syntactic structure data for the first-language sentence and for the second-language sentence.
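The patent names log-likelihood ratio data as an alternative to the dictionary but gives no formula. One common choice, assumed here, is Dunning's G² statistic computed from a 2×2 table of sentence-pair co-occurrence counts. Higher values indicate a stronger association between a first-language word and a second-language word.

```python
import math


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 for a 2x2 co-occurrence table over sentence pairs.

    k11: pairs containing both the first- and second-language word
    k12: pairs containing only the first-language word
    k21: pairs containing only the second-language word
    k22: pairs containing neither
    """
    def h(*counts: int) -> float:
        # Sum of c * log(c / N) over nonzero counts, N = sum(counts).
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)

    return 2.0 * (h(k11, k12, k21, k22)
                  - h(k11 + k12, k21 + k22)   # row sums
                  - h(k11 + k21, k12 + k22))  # column sums
```

The statistic is zero when the two words occur independently and grows as their co-occurrence departs from independence.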

The first-language division means then divides the body text so that each part corresponds to one second-language sentence. The division follows the correspondence, identified by the first-language expression specifying means, between the words of the multiple second-language sentences and the words of the body text. Finally, the divided first-language/second-language addition means adds the resulting one-sentence-aligned translation data to the bilingual database. In this data, each part of the body text produced by the first-language division means corresponds to one second-language sentence.
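The division step can be sketched as below. It assumes per-sentence alignments given as lists of body-text word indices, one list per second-language sentence. The rule of cutting after the last Japanese comma (、) between two sentences' aligned words is a hypothetical heuristic, not taken from the patent. The function returns `None` when the alignments cross and no contiguous split exists.

```python
def split_by_alignment(ja_words, alignments, delimiter="、"):
    """Cut ja_words into one contiguous segment per second-language sentence.

    alignments[i] lists the body-text word indices aligned to sentence i.
    Returns a list of word lists, or None if some sentence has no aligned
    word or the alignments are not monotone.
    """
    if not alignments or any(not hits for hits in alignments):
        return None
    segments, start = [], 0
    for i, hits in enumerate(alignments):
        if min(hits) < start:
            return None  # crossing alignments: no monotone split exists
        if i == len(alignments) - 1:
            end = len(ja_words)  # the last segment takes the remainder
        else:
            end = max(hits) + 1
            next_first = min(alignments[i + 1])
            # Prefer cutting just after the last delimiter lying between
            # this sentence's words and the next sentence's first word.
            for j in range(end, next_first):
                if ja_words[j] == delimiter:
                    end = j + 1
        segment = ja_words[start:end]
        if segment and segment[-1] == delimiter:
            segment = segment[:-1]  # drop the trailing comma
        segments.append(segment)
        start = end
    return segments
```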

The example database creation device according to claim 2 is the device according to claim 1, wherein the first-language division means adds to each part of the body text a preset word that serves as the subject or topic of that part.

With this configuration, the first-language division means adds a subject or topic to each part of the body text, so that each part can stand as a complete sentence.
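As an illustration of claim 2, the following sketch prefixes a preset topic to every divided part after the first that lacks a topic or subject particle. The preset words (これ は, "this") and the particle check are hypothetical. The patent states only that a preset subject or topic word is added.

```python
PRESET_TOPIC = ("これ", "は")  # hypothetical preset topic words ("this")
TOPIC_PARTICLES = {"は", "が"}  # hypothetical check for an existing subject/topic


def add_preset_topic(segments, topic=PRESET_TOPIC):
    """Prefix the preset topic to each segment after the first that has
    no topic/subject particle, so that it can stand as a sentence."""
    result = [list(segments[0])] if segments else []
    for seg in segments[1:]:
        if TOPIC_PARTICLES.isdisjoint(seg):
            seg = list(topic) + list(seg)
        result.append(list(seg))
    return result
```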

The translation device according to claim 3 translates an input first-language sentence into a second-language sentence. It comprises an example database, quoted expression discrimination means, quoted expression separation means, first-language clause/parallel-phrase discrimination means, per-unit translation means, maximum-score translation result selection means, translation means, and translation result output means.

With this configuration, the translation device includes the example database created by the example database creation device according to claim 1. The quoted expression discrimination means determines whether a first-language sentence contains a quoted expression, using quoted expression patterns that capture quoted expressions frequently appearing in the first language. When a quoted expression is found, the quoted expression separation means separates it from the first-language sentence. The first-language clause/parallel-phrase discrimination means then determines whether the sentence with the quoted expression removed contains clauses or parallel phrases. It uses clause/parallel-phrase patterns that capture clauses and parallel phrases according to the grammar of the first language.
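Clause and parallel-phrase detection can be approximated with regular expressions over the first-language sentence. The boundary pattern below (a conjunctive ending or parallel marker followed by 、) is a hypothetical stand-in for the patent's clause/parallel-phrase patterns, which are not disclosed.

```python
from __future__ import annotations

import re

# Hypothetical clause / parallel-phrase boundary pattern for Japanese:
# a conjunctive ending or parallel marker immediately followed by "、".
CLAUSE_BOUNDARY = re.compile(r"(?:し|て|で|が|ため|ほか|一方)、")


def clause_boundaries(sentence: str) -> list[int]:
    """Return character offsets just after each matched boundary."""
    return [m.end() for m in CLAUSE_BOUNDARY.finditer(sentence)]


def has_clause_or_parallel_phrase(sentence: str) -> bool:
    """True if the sentence should be divided into translation units."""
    return bool(clause_boundaries(sentence))
```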

Some sentences are determined by the first-language clause/parallel-phrase discrimination means to contain clauses or parallel phrases. The first-language translation unit division means divides each such sentence into translation units, that is, the units to be translated into the second language. The per-unit translation means translates each translation unit and computes, for each unit, a score indicating how closely it matches the data in the example database. The maximum-score translation result selection means then selects the maximum-score translation result, the one for which the total of these per-unit scores is largest.

Sentences determined to contain no clauses or parallel phrases are instead translated by the translation means using the example database. Finally, the translation result output means outputs either the maximum-score translation result or the translation result produced by the translation means. When the quoted expression separation means has separated a quoted expression, the output also includes a translation of that quoted expression, produced using bilingual dictionary data in which first-language words are associated with second-language words.
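The selection of the maximum-score translation result can be sketched as an exhaustive search over candidate clause boundaries, given as character offsets into the sentence. `score_unit` is a hypothetical caller-supplied function standing in for the per-unit translation means. It should return the match score of one unit against the example database.

```python
from itertools import combinations


def best_division(sentence, boundaries, score_unit):
    """Try every subset of candidate boundaries, split the sentence there,
    and return (total_score, units) with the highest summed unit score.

    boundaries: ascending character offsets of candidate cut points.
    score_unit: hypothetical function mapping one unit to its score.
    """
    best = None
    for k in range(len(boundaries) + 1):
        for cut in combinations(boundaries, k):  # preserves ascending order
            edges = [0, *cut, len(sentence)]
            units = [sentence[a:b] for a, b in zip(edges, edges[1:])]
            total = sum(score_unit(u) for u in units)
            if best is None or total > best[0]:
                best = (total, units)
    return best
```

Enumeration is exponential in the number of boundaries. Because the total is a sum of per-unit scores, a dynamic program over boundary positions would give the same result with a quadratic number of `score_unit` calls.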

The translation device according to claim 4 is the translation device according to claim 3, wherein the first-language translation unit division means adds to each translation unit a preset word that serves as the subject or topic of that translation unit.

With this configuration, the first-language translation unit division means adds a subject or topic to each translation unit so that the unit stands as a complete sentence. This allows the unit to be matched with the second-language sentences contained in the example database.

The translation device according to claim 5 is the translation device according to claim 2 or 3, wherein the per-unit translation means comprises example data acquisition means, example data selection means, editing means, and translation candidate output means.

With this configuration, the example data acquisition means retrieves first-language sentences from the example database as example data. A sentence qualifies when its predicate matches the predicate of the translation unit or satisfies a preset degree of similarity to it. The example data selection means then computes, for each retrieved example, a distance representing how closely its syntactic structure matches that of the translation unit. It selects a predetermined number of examples in ascending order of distance.
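Example retrieval and selection can be sketched as a predicate filter followed by a top-n selection. For simplicity this sketch accepts only exact predicate matches, although the patent also allows a preset similarity threshold. The tuple layout, `n`, and the `distance` argument are assumptions. Any word-sequence distance, such as a weighted edit distance, can be plugged in.

```python
import heapq


def retrieve_examples(unit_words, unit_predicate, examples, distance, n=3):
    """Return up to n examples closest to the translation unit.

    examples: iterable of (ja_words, predicate, en_sentence) tuples
              (hypothetical layout).
    distance: function (unit_words, example_words) -> float.
    """
    # Example data acquisition: keep examples whose predicate matches.
    candidates = [ex for ex in examples if ex[1] == unit_predicate]
    # Example data selection: the n candidates with the smallest distance.
    return heapq.nsmallest(n, candidates,
                           key=lambda ex: distance(unit_words, ex[0]))
```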

This distance is computed as an edit distance that is minimal when the example data and the translation unit are identical. When the example data is edited into the translation unit, a deletion cost is added for each deleted word, a substitution cost for each substituted word, and an insertion cost for each inserted word.
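The distance described here corresponds to a weighted edit (Levenshtein) distance over word sequences. A minimal dynamic-programming sketch follows. The uniform per-operation costs are placeholder assumptions, since the patent does not state its cost values.

```python
def weighted_edit_distance(example, unit, del_cost=1.0, ins_cost=1.0,
                           sub_cost=1.0):
    """Minimum cost of editing the example word sequence into the
    translation unit; identical sequences give 0."""
    m, n = len(example), len(unit)
    # dp[i][j]: cost of turning example[:i] into unit[:j].
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * del_cost
    for j in range(1, n + 1):
        dp[0][j] = j * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = example[i - 1] == unit[j - 1]
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,                         # delete a word
                dp[i][j - 1] + ins_cost,                         # insert a word
                dp[i - 1][j - 1] + (0.0 if same else sub_cost),  # keep/substitute
            )
    return dp[m][n]
```

Per-word costs (for example, higher costs for content words than for particles) could be added by making the cost arguments functions of the words involved.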

The editing means edits the example data selected by the example data selection means so that its expression becomes identical to that of the translation unit, computing an editing cost according to preset editing rules. It also edits the second-language expression of the example data using first-language/second-language correspondence information, in which correspondences between first-language words and second-language words are preset. The resulting second-language sentence is taken as a translation candidate.

The editing cost covers the substitution, deletion, or insertion of words in the second-language sentence corresponding to the example data, following the edits made to the example data.
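The editing step can be sketched for the simplest case, in which the example and the translation unit have the same length and differ only by word substitutions. The one-to-one dictionary `ja_en` stands in for the first-language/second-language correspondence information, and the unit cost per substitution is an assumption. Deletions and insertions, which the patent also covers, are omitted here.

```python
def edit_translation(example_ja, example_en, unit_ja, ja_en, sub_cost=1.0):
    """Adapt an example translation to a same-length translation unit.

    Wherever the Japanese words differ, the English word for the old
    Japanese word is replaced by the English word for the new one.
    Returns (candidate_words, edit_cost), or None when the lengths differ
    or a needed dictionary entry is missing.
    """
    if len(example_ja) != len(unit_ja):
        return None
    en = list(example_en)
    cost = 0.0
    for old, new in zip(example_ja, unit_ja):
        if old == new:
            continue
        old_en, new_en = ja_en.get(old), ja_en.get(new)
        if old_en is None or new_en is None or old_en not in en:
            return None
        en[en.index(old_en)] = new_en  # replace the first occurrence
        cost += sub_cost
    return en, cost
```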

The translation candidate output means then outputs a score and a translation candidate for each translation unit. Both are based on the distance computed by the example data selection means and the editing cost computed by the editing means.
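The patent states only that the score is derived from the distance and the editing cost. One hypothetical way to combine them, assumed here purely for illustration, is the inverse of a weighted sum. Under this scheme an exact match scores 1.0, and larger distances or editing costs give lower scores. The weights `alpha` and `beta` are placeholders.

```python
def unit_score(distance: float, edit_cost: float,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical score in (0, 1]; 1.0 means an exact match."""
    return 1.0 / (1.0 + alpha * distance + beta * edit_cost)


def best_candidate(scored):
    """scored: list of (candidate, distance, edit_cost) tuples.
    Return (candidate, score) for the highest-scoring candidate."""
    return max(((cand, unit_score(d, e)) for cand, d, e in scored),
               key=lambda pair: pair[1])
```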

The example database creation program according to claim 6 creates an example database by adding one-sentence-aligned translation data to a bilingual database that stores translation data obtained by translating first-language sentences into second-language sentences. When one first-language sentence has been translated into multiple second-language sentences, this data makes each part of that sentence correspond to one second-language sentence. To do so, the program causes a computer to function as translation data discrimination means, quoted expression discrimination means, quoted expression separation means, second-language multiple-sentence discrimination means, first-language expression specifying means, first-language division means, and divided first-language/second-language addition means.

With this configuration, the quoted expression discrimination means of the example database creation program determines whether a first-language sentence contains a quoted expression, using quoted expression patterns that capture quoted expressions frequently appearing in the first language. When a quoted expression is found, the quoted expression separation means separates it from the sentence and discards it. The second-language multiple-sentence discrimination means then determines whether the remaining body text has been translated into multiple second-language sentences, using sentence-break patterns that capture sentence boundaries according to the grammar of the second language. If it has, the first-language expression specifying means identifies which words of the body text correspond to the words in those second-language sentences. It does so using bilingual dictionary data in which first-language words are associated with second-language words.

The first-language division means then divides the body text (the sentence with multiple translations, with its quoted expression removed) so that each part corresponds to one second-language sentence. The division follows the word correspondence identified by the first-language expression specifying means. Finally, the divided first-language/second-language addition means adds to the bilingual database the one-sentence-aligned translation data, in which each part of the divided body text corresponds to one second-language sentence.

The translation program according to claim 7 translates an input first-language sentence into a second-language sentence. It causes a computer that has the example database created by the database creation device according to claim 1 to function as quoted expression discrimination means, quoted expression separation means, first-language clause/parallel-phrase discrimination means, per-unit translation means, maximum-score translation result selection means, translation means, and translation result output means.

With this configuration, the quoted expression discrimination means of the translation program determines whether a first-language sentence contains a quoted expression, using quoted expression patterns that capture quoted expressions frequently appearing in the first language. If one is found, the quoted expression separation means separates it from the sentence. The first-language clause/parallel-phrase discrimination means then determines whether the remaining sentence contains clauses or parallel phrases, using clause/parallel-phrase patterns based on the grammar of the first language.

Sentences determined to contain clauses or parallel phrases are divided by the first-language translation unit division means into translation units, the units to be translated into the second language. The per-unit translation means translates each unit and computes a score indicating how closely it matches the data in the example database. The maximum-score translation result selection means selects the translation result with the largest total of these per-unit scores. Sentences determined to contain no clauses or parallel phrases are translated by the translation means using the example database. Finally, the translation result output means outputs either the maximum-score translation result or the translation result produced by the translation means. When a quoted expression has been separated, the output also includes its translation, produced using bilingual dictionary data in which first-language words are associated with second-language words.

According to the inventions of claims 1 and 6, the quoted expression is removed from the first-language sentence before the sentence is aligned with the second-language sentences. As a result, when one sentence in the first language (for example, Japanese) has been translated into multiple sentences in the second language (for example, English), an example database can be created in which each part of the first-language sentence corresponds to one second-language sentence.

According to the invention of claim 2, a subject or topic is added to each part of the divided body text so that each part stands as a complete sentence. This allows each part to be matched with the second-language sentences contained in the example database.

According to the inventions of claims 3 and 7, when the input first-language sentence contains a quoted expression or multiple clauses or parallel phrases, these are properly separated and divided. The example database, in which first-language and second-language sentences correspond one to one, can therefore be used effectively, and the first-language sentence can be translated into natural second-language sentences.

According to the invention of claim 4, a subject or topic is added to each divided translation unit so that the unit stands as a complete sentence, which allows it to be matched with a second-language sentence contained in the example database.

According to the invention of claim 5, for each divided translation unit, a distance is calculated by adding a deletion cost and a substitution cost to the edit distance needed for the translation unit to take the same syntactic structure as example data stored in the example database. In addition, an editing cost is calculated for editing the second-language sentence corresponding to the example data, by substituting, deleting, or inserting words, so that it corresponds to the translation unit. A score and a translation candidate are output for each translation unit from this distance and editing cost, so the example database can be used appropriately.
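Claim 5 builds its distance on an edit distance with separate deletion and substitution costs. As a generic illustration only (the patent's exact cost definitions are not reproduced here), a word-level weighted Levenshtein distance looks like this; the function name, the default costs, and the example word sequences are assumptions.

```python
from typing import Sequence


def weighted_edit_distance(
    src: Sequence[str],
    tgt: Sequence[str],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
    sub_cost: float = 1.0,
) -> float:
    """Minimum cost of turning word sequence `src` into `tgt` using
    insertions, deletions and substitutions (hypothetical default costs)."""
    n, m = len(src), len(tgt)
    # d[i][j] = cost of converting src[:i] into tgt[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost
    for j in range(1, m + 1):
        d[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = src[i - 1] == tgt[j - 1]
            d[i][j] = min(
                d[i - 1][j] + del_cost,  # delete src[i-1]
                d[i][j - 1] + ins_cost,  # insert tgt[j-1]
                d[i - 1][j - 1] + (0.0 if same else sub_cost),  # match or substitute
            )
    return d[n][m]


unit = ["東北", "で", "雪", "が", "降る"]     # hypothetical translation unit
example = ["関東", "で", "雨", "が", "降る"]  # hypothetical example data
print(weighted_edit_distance(unit, example))                # 2.0
print(weighted_edit_distance(unit, example, sub_cost=3.0))  # 4.0 (delete + insert is cheaper)
```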

Next, embodiments of the present invention are described in detail with reference to the drawings as appropriate.
First, the configuration and operation of the example database creation device are described, followed by the configuration and operation of the translation device, with specific examples as appropriate.
(Configuration of example database creation device)
FIG. 1 is a block diagram of the example database creation device. As shown in FIG. 1, the example database creation device 1 works on a parallel translation database 2 that stores translation data in which first-language sentences have been translated into second-language sentences. Where one first-language sentence has been translated into several second-language sentences, the device creates an example database 4 by adding to the parallel translation database 2 one-sentence-aligned translation data, in which each part of the first-language sentence corresponds to one second-language sentence. The device 1 comprises translation data determination means 3, quoted expression determination means 5, quoted expression pattern storage means 7, quoted expression separation means 9, second language multiple sentence determination means 11, sentence delimiter pattern storage means 13, first language expression identification means 15, bilingual dictionary data storage means 17, first language division means 19, and divided first language corresponding second language adding means 21.

In this embodiment, the example database 4 created by the example database creation device 1 is described with Japanese as the first language and English as the second language. However, any pair of languages may be used, provided that a parallel translation database 2 exists for them and the quoted expression pattern storage means 7, the sentence delimiter pattern storage means 13, and the bilingual dictionary data storage means 17 can be prepared.

The parallel translation database 2 consists of Japanese and English news articles annotated with article-level and sentence-level alignments; here, articles actually used in news programs and the like are adopted.

The translation data determination means 3 determines, for the translation data stored in the parallel translation database 2, whether one Japanese sentence has been translated into multiple English sentences. This determination simply checks whether the English text corresponding to one Japanese sentence contains two or more periods. Consequently, if the English contains a period that marks an abbreviation, the translation data determination means 3 judges that English text to be multiple sentences.
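The naive check used by the translation data determination means 3, and the false positive it produces on abbreviations, can be illustrated in a few lines; the function name is hypothetical.

```python
def is_multi_sentence_naive(english: str) -> bool:
    """Hypothetical helper: treat the English side as multiple sentences
    if it contains two or more periods (abbreviations are not excluded)."""
    return english.count(".") >= 2


print(is_multi_sentence_naive("Heavy rain is forecast. An advisory was issued."))  # True
print(is_multi_sentence_naive("Mr. Abe arrived."))  # True: the abbreviation is miscounted
print(is_multi_sentence_naive("18 trains have been cancelled so far."))  # False
```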

The example database creation device 1 then processes the Japanese sentences identified by the translation data determination means 3 one at a time, together with the corresponding English sentences.

The quoted expression determination means 5 uses the quoted expression patterns stored in the quoted expression pattern storage means 7 to determine whether a sentence that the translation data determination means 3 has found to be translated into multiple English sentences (hereinafter, a multi-translation sentence) contains a quoted expression, that is, an expression indicating a citation.

The quoted expression pattern storage means 7 stores quoted expression patterns, i.e. patterns abstracted from quoted expressions that frequently appear in Japanese, and consists of an ordinary recording medium such as a hard disk. Examples of stored patterns are 「…によりますと」 ("according to …"), 「一般的には…といわれています。」 ("it is generally said that …"), and 「…によりますと、…ということです。」 ("according to …, it is reported that …").

The quoted expression separation means 9 separates the quoted expression identified by the quoted expression determination means 5 from the multi-translation sentence.
A specific example of identification by the quoted expression determination means 5 followed by separation by the quoted expression separation means 9 is described below.
Suppose the multi-translation sentence is 「JRによりますと、東海道・山陽新幹線のダイヤの乱れはきょう一杯続く見込みだということです。」 ("According to JR, disruption of the Tokaido and Sanyo Shinkansen schedules is expected to continue throughout today.") and the corresponding English is "The Japan Railway Company says the Tokaido Sanyo Shinkansen services will be disrupted until the last train tonight."

In this case, once the quoted expression determination means 5 has identified the quoted expression and the quoted expression separation means 9 has separated it, the quoted expression is "The Japan Railway Company says", and the multi-translation sentence with the quoted expression removed (hereinafter also simply called the body text) is "the Tokaido Sanyo Shinkansen services will be disrupted until the last train tonight."
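A minimal sketch of pattern-based quote separation, assuming the patterns are written as regular expressions with a named `body` group (the patent does not say how patterns are encoded). `QUOTE_PATTERNS` and `separate_quote` are hypothetical names; the returned frame is the sentence with the body removed.

```python
import re
from typing import Optional, Tuple

# Hypothetical regex encodings of the quoted-expression patterns listed above,
# most specific first; a real pattern store would hold many more.
QUOTE_PATTERNS = [
    re.compile(r"^(?P<src>.+?)によりますと、?(?P<body>.+?)ということです。$"),
    re.compile(r"^一般的には(?P<body>.+?)といわれています。$"),
    re.compile(r"^(?P<src>.+?)によりますと、?(?P<body>.+)$"),
]


def separate_quote(sentence: str) -> Tuple[Optional[str], str]:
    """Return (quoted-expression frame, body); frame is None if nothing matches."""
    for pattern in QUOTE_PATTERNS:
        m = pattern.match(sentence)
        if m:
            frame = sentence[: m.start("body")] + sentence[m.end("body"):]
            return frame, m.group("body")
    return None, sentence


frame, body = separate_quote(
    "JRによりますと、東海道・山陽新幹線のダイヤの乱れはきょう一杯続く見込みだということです。"
)
print(frame)  # JRによりますと、ということです。
print(body)   # 東海道・山陽新幹線のダイヤの乱れはきょう一杯続く見込みだ
```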

The second language multiple sentence determination means 11 uses the sentence delimiter patterns stored in the sentence delimiter pattern storage means 13 to determine whether the body text (the multi-translation sentence from which the quoted expression has been separated by the quoted expression separation means 9) consists of multiple English sentences. If the second language multiple sentence determination means 11 determines that the body text does not consist of multiple English sentences, the body text is discarded; that is, no further processing is performed on it.

Although the second language multiple sentence determination means 11 discards body text judged not to consist of multiple English sentences, the quoted expression has at least been separated from the multi-translation sentence, so this result may instead be output to the parallel translation database 2 to enrich the translation data. In that case, the original translation data, one Japanese multi-translation sentence paired with multiple English sentences, is split into two pairs: the Japanese and English quoted expressions, and one Japanese sentence with one English sentence.

The sentence delimiter pattern storage means 13 stores sentence delimiter patterns, i.e. patterns describing sentence boundaries according to English grammar, and consists of an ordinary recording medium such as a hard disk. The stored patterns identify sentence-final periods while excluding periods that mark abbreviations, such as those in "Mr.", "Mt.", and "Dr.".
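The role of the sentence delimiter patterns can be sketched as a whitespace tokenizer that ends a sentence at a token-final period unless the token is a listed abbreviation. The abbreviation set contains only the three examples from the text; a real pattern store would be larger, and the function name is hypothetical.

```python
from typing import List

# Abbreviations whose period does not end a sentence (only the three examples
# named in the text; a hypothetical, incomplete list).
ABBREVIATIONS = {"Mr.", "Mt.", "Dr."}


def split_sentences(text: str) -> List[str]:
    """Split English text at sentence-final periods, skipping abbreviations."""
    sentences: List[str] = []
    current: List[str] = []
    for token in text.split():
        current.append(token)
        if token.endswith(".") and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a final period
        sentences.append(" ".join(current))
    return sentences


body = "the Tokaido Sanyo Shinkansen services will be disrupted until the last train tonight."
print(len(split_sentences(body)) > 1)  # False: a single sentence, so it would be discarded
```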

When the second language multiple sentence determination means 11 determines that the body text consists of multiple English sentences, the first language expression identification means 15 identifies (aligns) which word of the Japanese body text each word in those English sentences corresponds to. It does so using the bilingual dictionary data stored in the bilingual dictionary data storage means 17 and syntax data obtained on the relationship between the syntactic structures of Japanese and English sentences.

The syntax data represents groupings of words into bunsetsu (Japanese phrase units) and phrases. For example, the Japanese phrase 「気象庁」 corresponds to the English "The Meteorological Agency"; when this phrase appears in the sentence "The Meteorological Agency has issued a heavy rain advisory", it is reflected as brackets delimiting the phrase: "(The Meteorological Agency) has issued a heavy rain advisory".

In this embodiment, the first language expression identification means 15 uses the bilingual dictionary data to identify which word in the body text each word in the English sentences corresponds to. Alternatively, the identification may use log-likelihood ratio data computed for the association between Japanese and English words, or word correspondence probabilities computed for the probability that a Japanese word corresponds to an English word.
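The text does not define its log-likelihood ratio data. One widely used form is Dunning's G² statistic over a 2×2 table of co-occurrence counts in aligned sentence pairs; the sketch below assumes that form, and the function name and count layout are hypothetical.

```python
import math


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 statistic for a 2x2 co-occurrence table (assumed form).

    k11: aligned sentence pairs containing both the Japanese and the English word
    k12: pairs containing the Japanese word only
    k21: pairs containing the English word only
    k22: pairs containing neither
    """
    total = k11 + k12 + k21 + k22
    row = (k11 + k12, k21 + k22)
    col = (k11 + k21, k12 + k22)
    g2 = 0.0
    for observed, r, c in ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1)):
        if observed > 0:  # 0 * log(0) is taken as 0
            expected = row[r] * col[c] / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Strongly associated pair vs. a weakly associated pair (hypothetical counts).
print(log_likelihood_ratio(10, 0, 0, 90))
print(log_likelihood_ratio(5, 5, 5, 85))
```

A larger value indicates that the two words co-occur more often than independence would predict; a value near zero indicates no association.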

Here, an example of the correspondences identified by the first language expression identification means 15 is described with reference to FIG. 7.
As shown in FIG. 7, the English (from later tonight) is aligned with the Japanese (これから, "from now on"), (until tomorrow) with (あすにかけても, "through tomorrow"), (Heavy rain) with (強い雨が降る, "heavy rain falls"), (is forecast) with (恐れがあり, "there is a risk"), (The Meteorological Agency) with (気象庁は, "the Meteorological Agency" marked as topic), (a heavy rain) with (雨に, "against the rain"), (advisory) with (警戒するよう, "to be on guard"), and (has issued) with (呼びかけています, "is calling").

As the example in FIG. 7 shows, the Japanese (各地で, "in various places") and (今後の, "future") are not aligned with any English words. This occurs either because the English is not a literal translation of the Japanese and these parts are unimportant to the content, or because they were omitted in translation.

The bilingual dictionary data storage means 17 stores bilingual dictionary data consisting of word-for-word translations between Japanese and English words, and consists of an ordinary recording medium such as a hard disk.

The first language division means 19 divides the body text according to the correspondences, identified by the first language expression identification means 15, between words in the English sentences and words in the body text, so that each part of the body text corresponds to one English sentence. It includes a subject/topic adding means 19a.

When the body text is divided into parts, the subject/topic adding means 19a adds a Japanese subject or topic to each part to match the corresponding English sentence. The term 提題 (topic) can carry various meanings, such as a proposition to be proven, a theme, a thesis, or an assertion; here it means the bunsetsu ending with the particle 「は」 that is closest to the beginning of the sentence.
Next, an example in which the first language division means 19 divides the body text shown in FIG. 7 is described.

「（これから）（あすにかけても）（各地で）（強い雨が降る）（恐れがあり）、（気象庁は）（今後の）（雨に）（警戒するように）（呼びかけています）。」 ("There is a risk of heavy rain in various places from now through tomorrow, and the Meteorological Agency is urging vigilance against further rain.") is divided into two parts: 「（これから）（あすにかけても）（各地で）（強い雨が降る）（恐れがあり）、」 and 「（気象庁は）（今後の）（雨に）（警戒するように）（呼びかけています）。」. The English sentence corresponding to the first part is "Heavy rain is forecast from later tonight until tomorrow", and the English sentence corresponding to the second part is "The Meteorological Agency has issued a heavy rain advisory".
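One simple way to turn such word alignments into a division is sketched below, assuming each Japanese bunsetsu is labeled with the index of the English sentence it aligns to (or `None` if unaligned, like 各地で and 今後の) and that each English sentence covers a contiguous span. Unaligned chunks inherit the label of the preceding aligned chunk. The function and the labels are hypothetical, and the sketch does not handle the non-contiguous case with an added topic shown in the next example.

```python
from typing import List, Optional, Sequence, Tuple


def split_by_alignment(
    chunks: Sequence[str], labels: Sequence[Optional[int]]
) -> List[Tuple[int, str]]:
    """Group consecutive bunsetsu by the English sentence they align to.

    labels[i] is the index of the English sentence aligned with chunks[i],
    or None if the chunk is unaligned (hypothetical alignment output).
    """
    first = next((lab for lab in labels if lab is not None), None)
    if first is None:
        raise ValueError("no aligned chunks")
    filled: List[int] = []
    last = first  # leading unaligned chunks join the first aligned part
    for lab in labels:
        if lab is not None:
            last = lab
        filled.append(last)  # unaligned chunks join the preceding part
    segments: List[Tuple[int, List[str]]] = []
    for chunk, lab in zip(chunks, filled):
        if segments and segments[-1][0] == lab:
            segments[-1][1].append(chunk)
        else:
            segments.append((lab, [chunk]))
    return [(lab, "".join(parts)) for lab, parts in segments]


chunks = ["これから", "あすにかけても", "各地で", "強い雨が降る", "恐れがあり、",
          "気象庁は", "今後の", "雨に", "警戒するように", "呼びかけています。"]
labels = [0, 0, None, 0, 0, 1, None, 1, 1, 1]
for sentence_index, part in split_by_alignment(chunks, labels):
    print(sentence_index, part)
```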

「東海道・山陽新幹線は台風のため、三回にわたって運転を見合わせた影響で、これまでに十八本の列車が運休するなどダイヤが大幅に乱れています。」 ("Because the Tokaido and Sanyo Shinkansen lines suspended operations three times due to the typhoon, their schedules are severely disrupted, with 18 trains cancelled so far.") is divided into three parts: 「東海道・山陽新幹線はダイヤが大幅に乱れています。」, 「東海道・山陽新幹線は台風のため、三回にわたって運転を見合わせた影響で、」, and 「これまでに十八本の列車が運休するなど」. The English sentence corresponding to the first part is "Tokaido Sanyo Shinkansen train services have been disrupted.", the one corresponding to the second part is "The shinkansen bullet trains had to suspend operations three times today due to the typhoon.", and the one corresponding to the third part is "18 trains have been cancelled so far."

In these English sentences, "The shinkansen bullet trains" corresponds to the subject added by the subject/topic adding means 19a.

The divided first language corresponding second language adding means 21 adds to the parallel translation database 2, as one-sentence-aligned translation data, the parts of the body text divided by the first language division means 19 together with the English sentences, each of which corresponds to one of those parts.

With this example database creation device 1, when it is determined that one Japanese sentence in the translation data stored in the parallel translation database 2 has been translated into multiple English sentences, the Japanese quoted expression is removed before the sentence is aligned with the English sentences. This makes it possible to create an example database 4 in which each part of such a Japanese sentence corresponds to one English sentence.

(Operation of example database creation device)
Next, the operation of the example database creation device 1 is described with reference to the flowchart shown in FIG. 2 (see also FIG. 1 as appropriate).
First, the example database creation device 1 uses the translation data determination means 3 to determine whether the translation data stored in the parallel translation database 2 contains a Japanese sentence translated into multiple English sentences (step S1). If there is no such translation data (step S1, No), the device 1 ends the operation. If there is (step S1, Yes), the quoted expression determination means 5 determines whether that Japanese sentence (the multi-translation sentence) contains a quoted expression (step S2).

Next, if the quoted expression determination means 5 determines that the multi-translation sentence contains a quoted expression (step S2, Yes), the example database creation device 1 uses the quoted expression separation means 9 to separate the quoted expression from the multi-translation sentence (step S3). If no quoted expression is found (step S2, No), step S3 is skipped and the process proceeds to step S4.

The example database creation device 1 then uses the second language multiple sentence determination means 11 to determine whether the body text (the multi-translation sentence from which the quoted expression has been separated by the quoted expression separation means 9) consists of multiple English sentences (step S4). If it does not (step S4, No), the device 1 ends the operation. If it does (step S4, Yes), the first language expression identification means 15 identifies which expression (word) in the Japanese body text each word in the English sentences corresponds to (step S5).

The example database creation device 1 then uses the first language division means 19 to divide the body text, according to the correspondences between words in the English sentences and words in the body text identified by the first language expression identification means 15, so that each part of the body text corresponds to one English sentence (step S6). Finally, the divided first language corresponding second language adding means 21 adds the parts of the body text and the English sentences, each corresponding to one part, to the parallel translation database 2 as one-sentence-aligned translation data, yielding the example database 4 (step S7).

(Configuration of translation device)
Next, a translation device that uses the example database 4 created by the example database creation device 1 is described with reference to FIG. 3, which is a block diagram of the translation device.
As shown in FIG. 3, the translation device 31 translates an input first-language sentence into a second-language sentence. It comprises the quoted expression determination means 5, the quoted expression pattern storage means 7, the quoted expression separation means 9, first language clause / parallel phrase determination means 33, clause / parallel phrase pattern storage means 35, first language translation unit division means 37, unit-by-unit translation means 39, translation means 41, maximum score translation result selection means 43, translation result output means 45, and the bilingual dictionary data storage means 17. Components identical to those of the example database creation device 1 shown in FIG. 1 are given the same reference numerals, and their description is omitted.

In this translation device 31, the input first language is Japanese and the target second language is English. However, any pair of languages may be used, provided that an example database 4 exists for them and the bilingual dictionary data storage means 17 can be prepared.

As an example of a Japanese sentence input to the translation device 31 (hereinafter, the input sentence), this description uses 「気象庁によりますと低気圧が日本付近を通過するためこれからあすにかけても北日本の太平洋側にまとまった雪が降る恐れがあるということです。」 ("According to the Meteorological Agency, as a low-pressure system passes near Japan, heavy snow may fall on the Pacific side of northern Japan from now through tomorrow.").

The first language clause / parallel phrase determination means 33 uses the clause / parallel phrase patterns stored in the clause / parallel phrase pattern storage means 35 to determine whether the Japanese body text, from which the quoted expression has been separated by the quoted expression separation means 9, contains a clause or a parallel phrase. A clause or parallel phrase is identified when part of the body text matches a clause / parallel phrase pattern. The first language clause / parallel phrase determination means 33 outputs body text judged to contain a clause or parallel phrase to the first language translation unit division means 37, and body text not so judged to the translation means 41.

The body text input to the first language clause / parallel phrase determination means 33, i.e. the input sentence with the quoted expression removed, is 「低気圧が日本付近を通過するためこれからあすにかけても北日本の太平洋側を中心にまとまった雪が降る恐れがある」. The quoted expression separated from the input sentence by the quoted expression separation means 9 is 「気象庁によりますとということです」 (the frame "according to the Meteorological Agency … it is reported").

The clause / parallel phrase pattern storage means 35 stores clause / parallel phrase patterns, i.e. patterns describing clauses and parallel phrases according to Japanese grammar, and consists of an ordinary recording medium such as a hard disk. For clauses (more precisely, clause ends), examples of stored patterns are "verb + が、" and "verb + て、".

For example, in the body text 「関東では雨が降ってきていますが、東北では雨は降っていません。」 ("It has started raining in Kanto, but it is not raining in Tohoku."), the "verb + が、" pattern matches the part 「降ってきていますが、」, so the text is determined to contain a clause end.
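Matching a clause-end pattern such as "verb + が、" requires part-of-speech information. The sketch below assumes pre-tokenized input in the form of `(surface, part-of-speech)` pairs, such as a morphological analyzer might produce; the token list and tags are hand-written for illustration, the function name is hypothetical, and the auxiliary verb ます is treated as part of the predicate.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (surface form, part of speech)

# Clause-end particles from the "verb + が、" / "verb + て、" patterns.
CLAUSE_END_PARTICLES = {"が", "て"}
# Assumption: an auxiliary verb that ends a predicate counts as "verb" here.
PREDICATE_POS = {"動詞", "助動詞"}


def split_at_clause_ends(tokens: Sequence[Token]) -> List[str]:
    """Split after every predicate + が/て + 、 sequence."""
    clauses, start = [], 0
    for i in range(1, len(tokens) - 1):
        surface, pos = tokens[i]
        if (
            tokens[i - 1][1] in PREDICATE_POS
            and pos == "助詞"
            and surface in CLAUSE_END_PARTICLES
            and tokens[i + 1][0] == "、"
        ):
            clauses.append("".join(s for s, _ in tokens[start : i + 2]))
            start = i + 2  # next clause begins after the comma
    clauses.append("".join(s for s, _ in tokens[start:]))
    return clauses


# Hand-written tokenization of 「関東では雨が降ってきていますが、東北では雨は降っていません。」
tokens = [
    ("関東", "名詞"), ("で", "助詞"), ("は", "助詞"), ("雨", "名詞"), ("が", "助詞"),
    ("降っ", "動詞"), ("て", "助詞"), ("き", "動詞"), ("て", "助詞"), ("い", "動詞"),
    ("ます", "助動詞"), ("が", "助詞"), ("、", "記号"),
    ("東北", "名詞"), ("で", "助詞"), ("は", "助詞"), ("雨", "名詞"), ("は", "助詞"),
    ("降っ", "動詞"), ("て", "助詞"), ("い", "動詞"), ("ませ", "助動詞"), ("ん", "助動詞"),
    ("。", "記号"),
]
print(split_at_clause_ends(tokens))
```

Note that 「雨が」 is not split because the preceding token is a noun, and 「降って」 is not split because it is not followed by 「、」.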

For parallel phrases, an example pattern is "one or more occurrences of 'noun phrase + が + numeral + ヶ所、' followed by one 'noun phrase + が + numeral + ヶ所'" (ヶ所 being a counter for places).
For example, applying this parallel phrase pattern to the body text 「このほか、道路の損壊が二ヶ所、流された橋が一ヶ所、山崩れが五ヶ所などとなっています。」 ("In addition, there is road damage at two places, one washed-away bridge, and landslides at five places.") identifies three parallel phrases: 「道路の損壊が二ヶ所」, 「流された橋が一ヶ所」, and 「山崩れが五ヶ所」.
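The parallel-phrase pattern can be approximated by a regular expression over raw text. This sketch is a simplification: it accepts any run of characters other than 、 and 。 as the noun phrase, requires at least two items, and does not check that the items are joined by 、. The names are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical approximation of "noun phrase + が + numeral + ヶ所";
# kanji and Arabic numerals are accepted.
PARALLEL_ITEM = re.compile(r"([^、。]+?)が([〇一二三四五六七八九十百\d]+)ヶ所")


def find_parallel_phrases(text: str) -> List[Tuple[str, str]]:
    """Return (noun phrase, count) items; fewer than two items means no parallel phrase."""
    items = PARALLEL_ITEM.findall(text)
    return items if len(items) >= 2 else []


text = "このほか、道路の損壊が二ヶ所、流された橋が一ヶ所、山崩れが五ヶ所などとなっています。"
for noun_phrase, count in find_parallel_phrases(text):
    print(noun_phrase, count)
```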

The first language translation unit division means 37 divides body text that the first language clause / parallel phrase determination means 33 has judged to contain a clause or parallel phrase into various translation units and outputs them to the unit-by-unit translation means 39. It includes a subject/topic adding means 37a. Each translation unit output by the first language translation unit division means 37 is translated by the unit-by-unit translation means 39, which calculates a score for that unit (described in detail later); the results are then output to the maximum score translation result selection means 43, which selects the translation candidate with the maximum score (described in detail later).

Accordingly, for body text that the first language clause / parallel phrase determination means 33 has judged to contain clauses or parallel phrases, the first language translation unit division means 37 outputs translation units for every combination into which the text can be divided at those clauses or parallel phrases.

For example, when the body text contains only multiple clauses, the first language translation unit division means 37 divides it so that each clause is a translation unit. When it contains three or more clauses (clauses A, B and C), it also divides it so that clauses A and B form one translation unit and clause C another, or so that clause A forms one translation unit and clauses B and C another. Likewise, when the body text contains no clause but multiple parallel phrases, the division means 37 divides it so that each parallel phrase is a translation unit; when it contains three or more parallel phrases (phrases A, B and C), it also divides it so that phrases A and B form one translation unit and phrase C another, or so that phrase A forms one translation unit and phrases B and C another.
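Enumerating every division of consecutive clauses (or parallel phrases) amounts to choosing cut points between them: n parts have n - 1 gaps, giving 2^(n-1) contiguous groupings. The sketch below assumes the undivided text is excluded by default (`min_units=2`), which yields exactly the three divisions listed above for clauses A, B and C; the function name is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def contiguous_segmentations(parts: Sequence[str], min_units: int = 2) -> List[List[str]]:
    """All ways to group consecutive clauses/phrases into at least `min_units` units."""
    n = len(parts)
    results: List[List[str]] = []
    # Choosing k cut points among the n - 1 gaps gives k + 1 units.
    for k in range(max(min_units, 1) - 1, n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            results.append(["".join(parts[a:b]) for a, b in zip(bounds, bounds[1:])])
    return results


for units in contiguous_segmentations(["A", "B", "C"]):
    print(units)
```

Each resulting list of units would then be translated unit by unit and compared by total score.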

Alternatively, instead of outputting every combination allowed by the clauses and parallel phrases, when the body text contains both multiple clauses and multiple parallel phrases, the first language translation unit division means 37 may divide it by treating the parallel phrases of one clause together as a single translation unit while treating each parallel phrase of another clause as a separate translation unit, or by splitting a clause into per-phrase translation units only when its number of parallel phrases exceeds a predetermined number.

Here is an example of translation units.
For example, 「低気圧が日本付近を通過するためこれからあすにかけても北日本の太平洋側を中心にまとまった雪が降る恐れがある」 ("Because a low-pressure system is passing near Japan, there is a risk of heavy snow from now through tomorrow as well, mainly on the Pacific side of northern Japan") is divided into two translation units: 「低気圧が日本付近を通過するため」 ("because a low-pressure system is passing near Japan") and 「これからあすにかけても北日本の太平洋側を中心にまとまった雪が降る恐れがある」 ("there is a risk of heavy snow from now through tomorrow as well, mainly on the Pacific side of northern Japan").

When the text is divided into translation units, the subject/topic adding means 37a adds a Japanese subject or topic to each translation unit so that the unit corresponds to a single English sentence.

The unit-by-unit translation means 39 uses the example database 4 to translate each translation unit output from the first language translation unit dividing means 37, and outputs the score of each translation unit, together with the translation candidate that is the translation result for that unit, to the maximum score translation result selection means 43. FIG. 4 shows the detailed configuration of the unit-by-unit translation means 39. As shown in FIG. 4, the unit-by-unit translation means 39 comprises example data acquisition means 39a, example data selection means 39b, editing means 39c, and translation candidate output means 39d.

The example data acquisition means 39a retrieves, as example data, Japanese sentences whose predicate either matches the predicate contained in the translation unit output from the first language translation unit dividing means 37 or satisfies a preset similarity to it. These sentences are taken from the translation data or the single-sentence correspondence translation data stored in the example database 4. The preset similarity measures the semantic closeness between two predicates; for example, the similarity value can be computed as the probability that the two Japanese predicates correspond to the same English expression in the example database 4. A predicate satisfies the preset similarity when its similarity value is greater than a preset threshold.
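One way to turn "the probability that two Japanese predicates correspond to the same English" into a number is sketched below. This is an assumption, not the patent's stated formula: it estimates P(English | Japanese predicate) from aligned predicate pairs and scores two predicates by the chance that independently drawn translations coincide. The alignment pairs and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def translation_distributions(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(english | japanese predicate) from aligned (ja, en) predicate pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for ja, en in pairs:
        counts[ja][en] += 1
    return {
        ja: {en: n / sum(c.values()) for en, n in c.items()}
        for ja, c in counts.items()
    }


def predicate_similarity(p1: str, p2: str, dist: Dict[str, Dict[str, float]]) -> float:
    """Probability that p1 and p2, each translated independently, yield the same English."""
    d1, d2 = dist.get(p1, {}), dist.get(p2, {})
    return sum(prob * d2.get(en, 0.0) for en, prob in d1.items())


def is_similar(p1: str, p2: str, dist: Dict[str, Dict[str, float]], threshold: float) -> bool:
    """Match, or similarity strictly greater than the preset threshold."""
    return p1 == p2 or predicate_similarity(p1, p2, dist) > threshold


if __name__ == "__main__":
    # Hypothetical alignments extracted from an example database.
    pairs = [("通過する", "pass"), ("通過する", "pass"), ("通る", "pass"), ("通る", "go through")]
    dist = translation_distributions(pairs)
    print(predicate_similarity("通過する", "通る", dist))
```

With these toy alignments, 通過する always maps to "pass" and 通る maps to "pass" half the time, so the similarity is 0.5; with a threshold of 0.3 the two predicates count as similar.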

For example, when 「低気圧が日本付近を通過するため」 ("because a low-pressure system passes near Japan") is input to the example data acquisition means 39a as a translation unit, the example 「低気圧が沖縄付近を通過する」 ("a low-pressure system passes near Okinawa"), whose corresponding English sentence is "A low air pressure system passes close to Okinawa.", is retrieved from the example database 4 as example data containing a predicate that matches or is similar to the predicate 通過する ("pass") in this translation unit.

In this embodiment, the example data acquisition means 39a focuses on the predicate contained in the Japanese sentence. In addition, if the predicate of the English sentence in the translation data or single-sentence correspondence translation data stored in the example database 4 corresponds to some part of the Japanese sentence other than its predicate, only the portion that matches the translation unit is acquired for that part.

For example, when 「発達中の低気圧が関東の南の海上を進んでいるため、」 ("because a developing low-pressure system is moving over the sea south of the Kanto region") is input to the example data acquisition means 39a as a translation unit, only "is developing" is retrieved from the example database 4. It is taken from the English sentence "A low-pressure system is developing off the Kanto coast" of translation data or single-sentence correspondence translation data that contains a part matching or similar to 発達中 ("developing") in this translation unit.

The example data selection means 39b calculates the distance between each item of example data acquired by the example data acquisition means 39a and the translation unit, and selects a predetermined number of example data items in ascending order of distance (that is, the top-ranked items when ranked from the smallest distance).

This distance measures how similar the example data and the translation unit are: it reflects how closely their syntactic structures match and also how similar their content (words) is. The smaller the distance, the more similar they are.

The distance calculated by the example data selection means 39b is defined as an edit distance that is minimal when the example data and the translation unit have the same syntactic structure (including the case where they have the same structure once the word order is rearranged). For example, if the Japanese sentence of the example data consists of subject + complement 1 + complement 2 + predicate, a translation unit consisting of subject + complement 1 + complement 2 + predicate has the same syntactic structure, and so does a translation unit consisting of subject + complement 2 + complement 1 + predicate.

Two costs are added to this edit distance: a deletion cost, applied when the example data (Japanese sentence) contains Japanese words that were dropped when it was translated into its English sentence, and a substitution cost, which depends on whether the linguistic semantic attributes of the words match or whether their meanings are close in a thesaurus (a vocabulary classified and arranged by semantic similarity, such as a classified vocabulary table).

As an example of the deletion cost, when the Japanese sentence of the example data is compared with the English sentence associated with it, the deletion cost of a Japanese word that is omitted in the English sentence is set to 0.

As an example of the substitution cost, when a word can simply be replaced by a word of close meaning in the thesaurus, such as 未来 and 将来 (both "future") or 嫌い ("dislike") and 苦手 ("be poor at, not care for"), the substitution cost is low. Conversely, when a word must be replaced by one of distant meaning (for example, an antonym), such as 未来 ("future") and 過去 ("past") or 嫌い ("dislike") and 好き ("like"), the substitution cost is high.

As another example of the substitution cost, 埼玉県 (Saitama Prefecture) and 神奈川県 (Kanagawa Prefecture) are alike in that both are place names, so the substitution cost between them is low.
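The distance described above behaves like a weighted Levenshtein distance over word sequences. The sketch below is a simplified illustration under stated assumptions: the thesaurus categories, the set of words assumed to be omitted in English (the particles が and を), and the cost values 0.2 and 1.0 are all hypothetical, and the reordering tolerance for same-structure sentences is not modeled. A real thesaurus would also assign higher costs to antonyms than to unrelated words.

```python
from typing import Callable, Sequence

# Hypothetical resources for illustration only.
THESAURUS = {"沖縄": "place", "日本": "place", "埼玉県": "place", "神奈川県": "place"}
OMITTED_IN_ENGLISH = {"が", "を"}  # words assumed to be dropped on the English side


def delete_cost(word: str) -> float:
    # Deleting a word that the English translation omits anyway costs nothing.
    return 0.0 if word in OMITTED_IN_ENGLISH else 1.0


def insert_cost(word: str) -> float:
    return 1.0


def substitute_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    category = THESAURUS.get(a)
    if category is not None and category == THESAURUS.get(b):
        return 0.2  # same semantic attribute, e.g. two place names
    return 1.0


def weighted_edit_distance(example: Sequence[str], unit: Sequence[str],
                           delete: Callable[[str], float] = delete_cost,
                           insert: Callable[[str], float] = insert_cost,
                           substitute: Callable[[str, str], float] = substitute_cost) -> float:
    """Dynamic-programming edit distance from example words to translation-unit words."""
    n, m = len(example), len(unit)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + delete(example[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + insert(unit[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + delete(example[i - 1]),
                d[i][j - 1] + insert(unit[j - 1]),
                d[i - 1][j - 1] + substitute(example[i - 1], unit[j - 1]),
            )
    return d[n][m]
```

For the example 低気圧/が/沖縄/付近/を/通過する against the unit 低気圧/が/日本/付近/を/通過する/ため, the cheapest alignment substitutes 沖縄 with 日本 (0.2) and inserts ため (1.0), for a distance of 1.2.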

The example data (Japanese sentences) selected by the example data selection means 39b are output to the editing means 39c together with their corresponding English sentences, and the calculated distances are passed through the editing means 39c to the translation candidate output means 39d.

The editing means 39c edits the example data (Japanese sentence) selected by the example data selection means 39b so that it becomes identical to the translation unit. In accordance with this edit, and using Japanese-English word correspondence information that defines correspondences between Japanese and English words in advance (preset editing rules), it substitutes, deletes, or inserts words in the English sentence associated with that example data. It calculates and outputs the editing cost of these operations, and outputs the resulting English sentence as a translation candidate. The cost of substituting a word in the English sentence is called the substitution cost, the cost of deleting a word the deletion cost, and the cost of inserting another word the insertion cost.

The Japanese-English word correspondence information defines the reliability (a value of at most 1) with which a Japanese word (Japanese expression) corresponds to an English word (English expression) when the former is replaced by the latter. For example, this information defines that the Japanese word 沖縄 corresponds to the English word "Okinawa" with a reliability of 0.9, and to the English word "system" with a reliability of 0.4.

When a word in the English sentence associated with the example data is substituted or deleted, the editing cost is calculated from the reliability defined in the Japanese-English word correspondence information: the higher the reliability, the smaller the editing cost, and the lower the reliability, the larger the editing cost.

When a word is inserted into the English sentence associated with the example data, the editing cost is calculated as follows. First, the syntactic structure of the English sentence is analyzed, and only positions at which the modification relation of the inserted word would be correct are kept as candidate insertion positions. If the analysis leaves more than one candidate position, the best one is chosen using previously recorded statistical information such as a language model.
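Choosing among the syntactically valid insertion positions with a language model can be sketched as follows. The patent only says "statistical information such as a language model"; the add-one smoothed bigram model, the toy training corpus, and the function names here are assumptions for illustration, and the candidate positions stand in for the output of the syntactic analysis.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def train_bigrams(corpus: Sequence[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Count bigrams and unigrams over sentences padded with <s> and </s>."""
    bigrams: Counter = Counter()
    unigrams: Counter = Counter()
    for sentence in corpus:
        padded = ["<s>"] + list(sentence) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return bigrams, unigrams


def log_prob(tokens: Sequence[str], bigrams: Counter, unigrams: Counter) -> float:
    """Add-one smoothed bigram log probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def best_insertion(sentence: List[str], word: str, candidate_positions: List[int],
                   bigrams: Counter, unigrams: Counter) -> int:
    """Pick the syntactically allowed position whose result the language model prefers."""
    def inserted(pos: int) -> List[str]:
        return sentence[:pos] + [word] + sentence[pos:]
    return max(candidate_positions,
               key=lambda pos: log_prob(inserted(pos), bigrams, unigrams))


if __name__ == "__main__":
    bigrams, unigrams = train_bigrams([["heavy", "snow", "will", "fall"]])
    print(best_insertion(["snow", "will", "fall"], "heavy", [0, 1, 3], bigrams, unigrams))
```

With this one-sentence corpus, inserting "heavy" at position 0 reproduces only seen bigrams, so position 0 wins over positions 1 and 3.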

The insertion cost grows as the number of inserted words increases. The editing cost is the sum of the deletion cost, the substitution cost, and the insertion cost.

The following describes an example in which the editing means 39c edits the example data to match the translation unit, together with an example of calculating the editing cost when the English sentence associated with the example data is edited in accordance with that edit.

Suppose the translation unit input to the editing means 39c is 「低気圧が日本付近を通過するため」 ("because a low-pressure system passes near Japan") and the example data is 「低気圧が沖縄付近を通過する」 ("a low-pressure system passes near Okinawa"). To match 日本 ("Japan") in the translation unit, 沖縄 ("Okinawa") in the example data is replaced with 日本. When the English sentence associated with the example data is edited accordingly using the Japanese-English word correspondence information, "A low air pressure system passes close to Okinawa." becomes "A low air pressure system passes close to Japan."

In this case, the editing means 39c edits the example data (Japanese sentence) by replacing 沖縄 with 日本 and, in accordance with this edit, replaces "Okinawa" with "Japan" in the associated English sentence. In such an edit, the English expression that is replaced and the editing cost are both determined by how reliably the Japanese word 沖縄 corresponds to each English word (English expression) in the Japanese-English word correspondence information. In this embodiment, the editing cost is defined as 1 - reliability.

For example, if the Japanese-English word correspondence information associates 沖縄 with the English expression "Okinawa" at a reliability of 0.9, the editing cost is 1 - 0.9 = 0.1. With the translation unit 「低気圧が日本付近を通過するため」 and the example data 「低気圧が沖縄付近を通過する」, 沖縄 in the example data is replaced with 日本 to match the translation unit. Because the associated English sentence is edited in step using the correspondence information, "Okinawa", the word associated with 沖縄, becomes "Japan", the English expression for 日本, and the edited English sentence (translation candidate) is "A low air pressure system passes close to Japan.", as described above.

If, on the other hand, the correspondence information associates 沖縄 with the English expression "system" at a reliability of 0.4, the editing cost is 1 - 0.4 = 0.6. With the same translation unit and example data, 沖縄 is again replaced with 日本, but now "system", the word associated with 沖縄, becomes "Japan", and the edited English sentence (translation candidate) is "A low air pressure Japan passes close to Okinawa." Its higher editing cost gives this incorrect candidate a lower score.
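The two worked cases above can be reproduced with a small sketch. The correspondence table below holds only the two reliabilities quoted in the text; the function names, whitespace tokenization, and the linear per-word insertion cost in `total_edit_cost` are assumptions (the text says only that the insertion cost grows with the number of inserted words and that the editing cost is the sum of the three costs).

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical Japanese-English word correspondence information (reliability <= 1).
JA_TO_EN: Dict[str, Dict[str, float]] = {"沖縄": {"Okinawa": 0.9, "system": 0.4}}


def substitution_candidates(english: List[str], ja_word: str, replacement: str,
                            ja_to_en: Dict[str, Dict[str, float]] = JA_TO_EN
                            ) -> List[Tuple[float, str]]:
    """Replace the English word associated with ja_word; cost = 1 - reliability."""
    candidates: List[Tuple[float, str]] = []
    for en_word, reliability in ja_to_en.get(ja_word, {}).items():
        if en_word not in english:
            continue  # this correspondence does not occur in the example sentence
        i = english.index(en_word)
        edited = english[:i] + [replacement] + english[i + 1:]
        candidates.append((1.0 - reliability, " ".join(edited)))
    return sorted(candidates)  # cheapest edit first


def total_edit_cost(substitution: Sequence[float], deletion: Sequence[float],
                    inserted_words: int, per_word_insert_cost: float = 1.0) -> float:
    """Sum of substitution, deletion and insertion costs (insertion assumed linear in words)."""
    return sum(substitution) + sum(deletion) + inserted_words * per_word_insert_cost


if __name__ == "__main__":
    tokens = "A low air pressure system passes close to Okinawa".split()
    for cost, sentence in substitution_candidates(tokens, "沖縄", "Japan"):
        print(round(cost, 2), sentence)
```

This prints the cost-0.1 candidate "A low air pressure system passes close to Japan" before the cost-0.6 candidate "A low air pressure Japan passes close to Okinawa".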

The translation candidate output means 39d outputs a score and a translation candidate for each translation unit, based on the distance calculated by the example data selection means 39b and the editing cost calculated by the editing means 39c. The score of a translation unit is the sum of that distance and that editing cost, multiplied by -1.
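Written as a formula, with symbols introduced here only for illustration (u a translation unit, x a selected example, d the distance from the example data selection means 39b, c the editing cost from the editing means 39c), the per-unit score is:

```latex
\mathrm{score}(u) = -\bigl( d(u, x) + c(u, x) \bigr)
```

A smaller distance and a cheaper edit therefore yield a higher (less negative) score.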

Next, with reference to FIG. 8, an example of the series of translation steps performed by the first language translation unit dividing means 37 and the unit-by-unit translation means 39 will be described.
In this example, the translation device 31 translates a Japanese input sentence into English. As shown in FIG. 8, the input sentence is 「（気象庁によりますと）低気圧が日本付近を通過するためこれからあすにかけて北日本の太平洋側を中心にまとまった雪が降る恐れがある（ということです。）」 ("(According to the Japan Meteorological Agency,) because a low-pressure system is passing near Japan, there is a risk of heavy snow from now through tomorrow, mainly on the Pacific side of northern Japan."). The parenthesized portions （気象庁によりますと） ("according to the Japan Meteorological Agency") and （ということです。） (a sentence-final hearsay ending) are quoted expressions and are assumed to have been separated by the quoted expression separating means 9.

Further, assume that the first language clause/parallel phrase discriminating means 33 has determined that the sentence remaining after the quoted expressions are separated, "because a low-pressure system is passing near Japan, there is a risk of heavy snow from now through tomorrow, mainly on the Pacific side of northern Japan", contains clauses or parallel phrases. In this case, the first language translation unit dividing means 37 divides the input sentence into two translation units: "because a low-pressure system is passing near Japan" and "there is a risk of heavy snow from now through tomorrow, mainly on the Pacific side of northern Japan". For these two units, the example data acquisition means 39a of the unit-by-unit translation means 39 retrieves the example data (Japanese sentences of similar examples) 「低気圧が沖縄付近を通過する」 ("a low-pressure system passes near Okinawa") and 「あすは北日本で大雪が降る恐れがある」 ("there is a risk of heavy snow in northern Japan tomorrow"), respectively.

The English sentences associated with these example data (the English sentences of the similar examples) are "A low air pressure system passes close to Okinawa" and "Heavy snow will fall in northern Japan tomorrow", respectively.

After passing through the example data selection means 39b and the editing means 39c of the unit-by-unit translation means 39, these become "A low air pressure system passes close to Japan" and "Heavy snow will fall along the Pacific Ocean in northern Japan from later tonight until tomorrow". We now return to FIG. 3.

The translation means 41 uses the example database 4 to translate body text that the first language clause/parallel phrase discriminating means 33 did not determine to contain clauses or parallel phrases, and outputs the result to the translation result output means 45.

The maximum score translation result selection means 43 selects the combination of translation units whose total score, the sum of the per-unit scores output by the unit-by-unit translation means 39, is largest, and outputs the corresponding combination of translation candidates to the translation result output means 45 as the maximum score translation result.
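The selection step amounts to an argmax over candidate segmentations of the summed unit scores. The sketch below assumes each segmentation has already been scored unit by unit; the `UnitResult` alias and `select_max_score` name are hypothetical, and the per-unit values in the test are invented so that the totals match the 5-versus-8 comparison in the example that follows.

```python
from typing import List, Tuple

UnitResult = Tuple[float, str]  # (score of one translation unit, its translation candidate)


def select_max_score(segmentations: List[List[UnitResult]]) -> Tuple[float, List[str]]:
    """Return the total score and candidates of the highest-scoring segmentation."""
    best_total = float("-inf")
    best: List[str] = []
    for units in segmentations:
        total = sum(score for score, _ in units)
        if total > best_total:
            best_total, best = total, [candidate for _, candidate in units]
    return best_total, best
```

On ties, this sketch keeps the segmentation that appears first in the input list; the text does not specify a tie-breaking rule.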

The following shows, for the input sentence described above, examples of the total scores (sums of per-unit scores) computed by the maximum score translation result selection means 43 and the corresponding combinations of translation candidates.
For the translation candidate with score = 5 (the result when the sentence is not divided into clauses), "because a low-pressure system is passing near Japan, there is a risk of heavy snow from now through tomorrow as well, mainly on the Pacific side of northern Japan" is translated as "Heavy snow will fall along the Pacific Ocean in northern Japan from later tonight until tomorrow."

For the translation candidate with score = 8 (the result when the sentence is divided into two clauses), "because a low-pressure system is passing near Japan" is translated as "A low air pressure system passes close to Japan.", and "there is a risk of heavy snow from now through tomorrow as well, mainly on the Pacific side of northern Japan" is translated as "Heavy snow will fall along the Pacific Ocean in northern Japan from later tonight until tomorrow."

In this case, the result of dividing into two clauses (score = 8) scores higher than the result without division (score = 5), so the maximum score translation result selection means 43 outputs "A low air pressure system passes close to Japan." and "Heavy snow will fall along the Pacific Ocean in northern Japan from later tonight until tomorrow." as the maximum score translation result.

The translation result output means 45 outputs the translation of the quoted expression separated by the quoted expression separating means 9, obtained using the bilingual dictionary data stored in the bilingual dictionary data storage means 17, together with either the translation result output from the translation means 41 or the maximum score translation result selected by the maximum score translation result selection means 43. If the input sentence given to the translation device 31 contains no quoted expression, the quoted expression separating means 9 naturally outputs none, so no translation of a quoted expression is output.

For example, suppose the translation of the quoted expression is "The Meteorological Agency says" (for 気象庁によりますと), and the maximum score translation result selection means 43 has output "A low air pressure system passes close to Japan." and "Heavy snow will fall along the Pacific Ocean in northern Japan from later tonight until tomorrow." as described above. The translation result output means 45 then outputs "The Meteorological Agency says a low air pressure system passes close to Japan." and "The Meteorological Agency says heavy snow will fall along the Pacific Ocean in northern Japan from later tonight until tomorrow."

With the translation device 31, when a Japanese sentence containing a quoted expression or multiple clauses or parallel phrases is input, the quoted expression is separated and the sentence is divided appropriately. The example database 4, in which Japanese and English sentences correspond one-to-one, can therefore be used effectively, and the Japanese sentence can be translated into natural English.

(Operation of the translation device)
Next, the operation of the translation device 31 will be described with reference to the flowchart in FIG. 5 (and FIG. 3 as appropriate).
First, the translation device 31 receives a Japanese sentence as the input sentence (step S11), and the quoted expression discriminating means 5 determines whether the input sentence contains a quoted expression (step S12).

Next, if the quoted expression discriminating means 5 determines that a quoted expression is included (step S12, Yes), the quoted expression separating means 9 of the translation device 31 separates that quoted expression from the input sentence (step S13). If no quoted expression is found (step S12, No), step S13 is skipped and processing proceeds to step S14.

The translation device 31 then uses the first language clause/parallel phrase discriminating means 33 to determine whether the input sentence, with any quoted expression separated, contains clauses or parallel phrases (step S14). If it does (step S14, Yes), the first language translation unit dividing means 37 divides it into translation units (step S15), the unit-by-unit translation means 39 translates each translation unit (step S16), and the maximum score translation result selection means 43 selects the maximum score translation result, for which the sum of the translation-unit scores is largest, and outputs it to the translation result output means 45 (step S17).

If the input sentence is not determined to contain clauses or parallel phrases (step S14, No), the translation means 41 translates the input sentence, with any quoted expression separated, as it is, and outputs the result to the translation result output means 45 (step S18).

Finally, the translation result output means 45 of the translation device 31 outputs the maximum score translation result from the maximum score translation result selection means 43 or the translation result from the translation means 41, together with, when the quoted expression separating means 9 has output a quoted expression, the translation of that expression obtained by referring to the bilingual dictionary data storage means 17 (step S19).

Next, the processing in the unit-by-unit translation means 39 of the translation device 31 will be described with reference to the flowchart in FIG. 6 (and FIG. 4 as appropriate).
First, a translation unit is input from the first language translation unit dividing means 37 to the unit-by-unit translation means 39 (step S21). Next, the example data acquisition means 39a of the unit-by-unit translation means 39 retrieves from the example database 4 the example data whose predicate matches or is similar to the predicate contained in the translation unit (step S22).

The example data selection means 39b of the unit-by-unit translation means 39 then calculates the distance between each item of example data acquired by the example data acquisition means 39a and the input translation unit, and selects the example data with small distances (step S23). The editing means 39c then outputs, for the selected example data, the editing cost for the translation unit and the translation candidate obtained by editing the English sentence associated with the example data (step S24).

Finally, the translation candidate output means 39d of the unit-by-unit translation means 39 calculates the score of each translation unit from the distance calculated by the example data selection means 39b and the editing cost calculated by the editing means 39c, and outputs this score together with the translation result for each translation unit (step S25).
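Steps S22 to S25 can be read as a small pipeline. The sketch below is a hypothetical arrangement, not the patent's code: retrieval, distance, and editing are passed in as callables (standing for the means 39a, 39b and 39c), `top_k` stands for the predetermined number of selected examples, and the score follows the rule given earlier, -(distance + editing cost).

```python
from typing import Callable, List, Tuple


def translate_unit(
    unit: str,
    acquire: Callable[[str], List[str]],               # step S22: retrieve candidate examples
    distance: Callable[[str, str], float],              # step S23: example-to-unit distance
    edit: Callable[[str, str], Tuple[float, str]],      # step S24: (editing cost, edited English)
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Return (score, translation candidate) pairs for one translation unit, best first."""
    examples = acquire(unit)
    ranked = sorted(((distance(ex, unit), ex) for ex in examples), key=lambda pair: pair[0])
    candidates: List[Tuple[float, str]] = []
    for dist, ex in ranked[:top_k]:
        cost, english = edit(ex, unit)
        candidates.append((-(dist + cost), english))     # step S25: score = -(distance + cost)
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates
```

Passing the components in as functions keeps the sketch independent of any particular example database or distance measure.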

An embodiment of the present invention has been described above, but the present invention is not limited to this embodiment. For example, although this embodiment has been described in terms of the example database creation device 1 and the translation device 31, the processing of each means in these devices can also be implemented as an example database creation program and a translation program written in a computer language so as to be executable.

FIG. 1 is a block diagram of the example database creation device according to the embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the example database creation device shown in FIG. 1.
FIG. 3 is a block diagram of the translation device according to the embodiment of the present invention.
FIG. 4 is a block diagram of the unit-by-unit translation means of the translation device shown in FIG. 3.
FIG. 5 is a flowchart showing the operation of the translation device shown in FIG. 3.
FIG. 6 is a flowchart showing the operation of the unit-by-unit translation means shown in FIG. 4.
FIG. 7 is a diagram showing an example of correspondence in the example database creation device.
FIG. 8 is a diagram showing an example of translation processing in the translation device.

Explanation of symbols

1 Example database creation device
2 Bilingual database
3 Translation data discrimination means
4 Example database
5 Quoted expression discrimination means
7 Quoted expression pattern storage means
9 Quoted expression separation means
11 Second-language multiple-sentence discrimination means
13 Sentence delimiter pattern storage means
15 First-language expression specifying means
17 Bilingual dictionary data storage means
19 First-language dividing means
19a, 37a Subject/topic adding means
21 Divided-first-language corresponding second-language adding means
31 Translation device
33 First-language clause/parallel phrase discrimination means
35 Clause/parallel phrase pattern storage means
37 First-language translation unit dividing means
39 Unit-by-unit translation means
39a Example data acquisition means
39b Example data selection means
39c Editing means
39d Translation candidate output means
41 Translation means
43 Maximum score translation result selection means
45 Translation result output means

Claims

An example database creation device that creates an example database from a bilingual database storing translation data in which sentences in a first language are translated into sentences in a second language, by adding to the bilingual database, when one sentence in the first language has been translated into a plurality of sentences in the second language, single-sentence correspondence translation data in which each part of that first-language sentence corresponds to one sentence in the second language, the device comprising:
quoted expression discrimination means for determining whether a sentence in the first language contains a quoted expression representing a quotation, using quoted expression patterns obtained by patterning quoted expressions that frequently appear in the first language;
quoted expression separation means for separating the quoted expression from the sentence in the first language and discarding it when the quoted expression discrimination means determines that the quoted expression is contained;
second-language multiple-sentence discrimination means for treating the sentence in the first language from which the quoted expression has been separated by the quoted expression separation means as a body text, and determining whether the body text has been translated into a plurality of sentences in the second language, using sentence delimiter patterns obtained by patterning sentence delimiters according to the grammar of the second language;
first-language expression specifying means for specifying, when the second-language multiple-sentence discrimination means determines that the body text has been translated into a plurality of sentences in the second language, which words in the body text correspond to the words contained in those second-language sentences, using bilingual dictionary data in which first-language words are associated with second-language words;
first-language dividing means for dividing the body text, according to the correspondence between the words in the plurality of second-language sentences and the words in the body text specified by the first-language expression specifying means, so that each part of the body text corresponds to one sentence in the second language; and
divided-first-language corresponding second-language adding means for adding to the bilingual database single-sentence correspondence translation data in which each part of the body text divided by the first-language dividing means corresponds to one sentence in the second language.
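The database-creation flow claimed above can be sketched on toy romanized Japanese tokens. Everything concrete here is an assumption for illustration, not taken from the patent: the quoted-expression suffixes in `QUOTE_PATTERNS`, the English sentence-delimiter regex `SENT_DELIM`, the dictionary format, and the hypothetical helpers `strip_quote`, `split_pair`, and `build_example_db`. The division rule is also assumed: each cut is placed right after the last source word aligned to a target sentence, and words whose translation appears in several target sentences are ignored as ambiguous.

```python
import re

# Hypothetical quoted-expression patterns, given as token suffixes
# (romanized Japanese, e.g. "... to nobeta" = "..., [he] said").
QUOTE_PATTERNS = [["to", "nobeta"], ["to", "hanashita"]]

# Hypothetical sentence-delimiter pattern for the second language (English).
SENT_DELIM = re.compile(r"(?<=[.!?])\s+")


def strip_quote(tokens):
    """Remove (and discard) a trailing quoted expression, if one matches."""
    for pat in QUOTE_PATTERNS:
        if len(tokens) > len(pat) and tokens[-len(pat):] == pat:
            return tokens[:-len(pat)]
    return list(tokens)


def split_pair(src_tokens, tgt_text, dictionary):
    """Split one source sentence so that each part pairs with one target sentence.

    Returns a list of (source_part, target_sentence) pairs, or None when the
    target is a single sentence or no clean, in-order split is found.
    """
    body = strip_quote(src_tokens)
    sentences = [s for s in SENT_DELIM.split(tgt_text.strip()) if s]
    if len(sentences) < 2:
        return None  # not translated into several sentences

    sent_words = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    hits = [[] for _ in sentences]
    for i, tok in enumerate(body):
        trans = [t.lower() for t in dictionary.get(tok, ())]
        matched = [k for k, words in enumerate(sent_words)
                   if any(t in words for t in trans)]
        if len(matched) == 1:  # skip unaligned and ambiguous words
            hits[matched[0]].append(i)
    if any(not h for h in hits):
        return None  # some target sentence has no aligned source word

    spans = [(min(h), max(h)) for h in hits]
    if any(prev[1] >= nxt[0] for prev, nxt in zip(spans, spans[1:])):
        return None  # aligned regions overlap or are out of order

    # Cut right after the last aligned word of every sentence but the final one.
    bounds = [0] + [hi + 1 for _, hi in spans[:-1]] + [len(body)]
    parts = [body[a:b] for a, b in zip(bounds, bounds[1:])]
    return list(zip(parts, sentences))


def build_example_db(bilingual, dictionary):
    """Return the bilingual pairs plus any single-sentence pairs derived from them."""
    db = list(bilingual)
    for src, tgt in bilingual:
        pairs = split_pair(src, tgt, dictionary)
        if pairs:
            db.extend(pairs)
    return db
```

For the toy source `kare wa tokyo ni iki kaigi ni deta to nobeta` paired with "He went to Tokyo. He attended a meeting.", the quoted expression `to nobeta` is dropped and the body splits into `kare wa tokyo ni iki` and `kaigi ni deta`. Note that `kare` ("he") is skipped as ambiguous because "he" occurs in both English sentences, so the second part ends up with no subject.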

The example database creation device according to claim 1, wherein the first-language dividing means adds, to each part of the body text, a preset word serving as the subject or topic of that part.
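Adding a preset subject or topic to each divided part can be sketched as below. This is a hypothetical illustration: the particle set `TOPIC_PARTICLES`, the placeholder `<subj>`, and the rule that the word is added only to parts lacking a subject or topic particle are assumptions, not details given in the claim.

```python
# Hypothetical romanized particles marking a subject ("ga") or topic ("wa").
TOPIC_PARTICLES = {"wa", "ga"}


def add_subject(part, preset=("<subj>", "wa")):
    """Prepend a preset subject/topic word to a part that lacks one.

    ASSUMPTION: the word is added only when the part contains no
    subject/topic particle; "<subj>" is an illustrative placeholder.
    """
    if any(tok in TOPIC_PARTICLES for tok in part):
        return list(part)
    return [*preset, *part]
```

Applied to a part such as `kaigi ni deta`, which has no subject after division, this yields `<subj> wa kaigi ni deta`, so the first-language part, like its English counterpart, has an explicit subject.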

A translation device that translates an input sentence in a first language into a sentence in a second language, comprising:
an example database created by the example database creation device according to claim 1;
quoted expression discrimination means for determining whether the sentence in the first language contains a quoted expression representing a quotation, using quoted expression patterns obtained by patterning quoted expressions that frequently appear in the first language;
quoted expression separation means for separating the quoted expression from the corresponding sentence in the first language when the quoted expression discrimination means determines that the quoted expression is contained;
first-language clause/parallel phrase discrimination means for determining whether the sentence in the first language from which the quoted expression has been separated by the quoted expression separation means contains a clause or a parallel phrase, using clause/parallel phrase patterns obtained by patterning clauses and parallel phrases of sentences according to the grammar of the first language;
first-language translation unit dividing means for dividing a sentence in the first language that the first-language clause/parallel phrase discrimination means has determined to contain a clause or a parallel phrase into translation units, each being a unit to be translated into the second language;
unit-by-unit translation means for translating each translation unit divided by the first-language translation unit dividing means while calculating, for each translation unit, a score indicating the degree of match between the translation unit and the data contained in the example database;
maximum score translation result selection means for selecting the maximum score translation result, namely the translation result that maximizes the total of the scores calculated for the individual translation units when they are translated by the unit-by-unit translation means;
translation means for translating, using the example database, a sentence in the first language that the first-language clause/parallel phrase discrimination means has determined not to contain a clause or a parallel phrase; and
translation result output means for outputting the maximum score translation result selected by the maximum score translation result selection means or the translation result produced by the translation means, together with, when the quoted expression has been separated by the quoted expression separation means, a translation of the quoted expression obtained using bilingual dictionary data in which first-language words are associated with second-language words.
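The translation flow claimed above, dividing at clause boundaries and then keeping the result with the largest total score, can be sketched as follows. This is a minimal sketch under stated assumptions: the clause pattern is reduced to a hypothetical set of clause-ending tokens `clause_ends`, the per-unit and whole-sentence translators are passed in as hypothetical callables, and quoted-expression handling is omitted.

```python
def split_units(tokens, clause_ends):
    """Cut a sentence after every token in `clause_ends` (hypothetical clause pattern)."""
    units, cur = [], []
    for tok in tokens:
        cur.append(tok)
        if tok in clause_ends:
            units.append(cur)
            cur = []
    if cur:
        units.append(cur)
    return units


def max_score_translation(units, translate_unit):
    """Pick one candidate per unit so that the summed score is maximal.

    `translate_unit(unit)` returns a non-empty list of (score, text) pairs.
    Because unit scores are simply added, taking each unit's best candidate
    maximizes the total.
    """
    picks = [max(translate_unit(u), key=lambda c: c[0]) for u in units]
    total = sum(score for score, _ in picks)
    return total, " ".join(text for _, text in picks)


def translate(tokens, clause_ends, translate_unit, translate_whole):
    """Translate unit by unit when clause boundaries are found, else as a whole."""
    units = split_units(tokens, clause_ends)
    if len(units) < 2:
        return translate_whole(tokens)  # no clause or parallel phrase detected
    return max_score_translation(units, translate_unit)[1]
```

The per-unit maximum is sufficient here only because the sketch assumes the units are scored independently; if a scoring scheme coupled adjacent units, a search over combinations would be needed instead.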

The translation device according to claim 3, wherein the first-language translation unit dividing means adds, to each translation unit, a preset word serving as the subject or topic of that translation unit.

The translation device according to claim 3, wherein the unit-by-unit translation means comprises:
example data acquisition means for acquiring from the example database, as example data, sentences in the first language containing a predicate that matches, or has been set in advance for, the predicate contained in the translation unit;
example data selection means for calculating a distance representing the degree of similarity between the syntactic structure of the example data acquired by the example data acquisition means and that of the translation unit, and selecting a predetermined number of example data in ascending order of distance;
editing means for calculating, according to preset editing rules, the editing cost of editing the example data selected by the example data selection means so that the example data and the translation unit have the same expression, and for taking the second-language sentences of the edited example data as translation candidates; and
translation candidate output means for outputting, for each translation unit, a score calculated from the distance calculated by the example data selection means and the editing cost calculated by the editing means, together with the translation candidates.
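The acquisition, selection, and editing steps claimed above can be sketched as follows. Several techniques are swapped in and should be read as assumptions: a word-level Levenshtein distance stands in for the syntactic-structure distance, the predicate is assumed to be the last token (Japanese being head-final), the only editing rule is a dictionary-driven word substitution with a hypothetical cost `SUBST_COST`, and the score is assumed to be `-(distance + cost)`.

```python
import re

SUBST_COST = 1.0  # hypothetical editing rule: cost of one word substitution


def token_distance(a, b):
    """Word-level Levenshtein distance (a stand-in for the syntactic distance)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def edit_example(ex_src, ex_tgt, unit, dictionary):
    """Edit an example toward `unit`; return (cost, edited target) or None."""
    if len(ex_src) != len(unit):
        return None  # this toy rule set only supports word substitution
    tgt, cost = ex_tgt, 0.0
    for old, new in zip(ex_src, unit):
        if old == new:
            continue
        old_t, new_t = dictionary.get(old), dictionary.get(new)
        if not old_t or not new_t:
            return None  # no dictionary entry, so the edit cannot be made
        pattern = rf"\b{re.escape(old_t[0])}\b"
        if not re.search(pattern, tgt):
            return None  # the example's translation does not contain the word
        tgt = re.sub(pattern, lambda m: new_t[0], tgt, count=1)
        cost += SUBST_COST
    return cost, tgt


def translate_unit(unit, examples, dictionary, k=3):
    """Acquire, select, and edit examples; return (score, text) pairs, best first."""
    predicate = unit[-1]  # ASSUMPTION: head-final language, predicate comes last
    acquired = [(s, t) for s, t in examples if s and s[-1] == predicate]
    selected = sorted(acquired, key=lambda e: token_distance(e[0], unit))[:k]
    results = []
    for src, tgt in selected:
        edited = edit_example(src, tgt, unit, dictionary)
        if edited is None:
            continue
        cost, text = edited
        dist = token_distance(src, unit)
        results.append((-(dist + cost), text))  # ASSUMPTION: score = -(distance + cost)
    return sorted(results, reverse=True)
```

With the toy example "kare wa osaka ni iki" / "He went to Osaka." in the database, the unit `kare wa tokyo ni iki` is translated by substituting "Tokyo" for "Osaka" at a distance of 1 and an editing cost of 1.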

An example database creation program for creating an example database from a bilingual database storing translation data in which sentences in a first language are translated into sentences in a second language, by adding to the bilingual database, when one sentence in the first language has been translated into a plurality of sentences in the second language, single-sentence correspondence translation data in which each part of that first-language sentence corresponds to one sentence in the second language, the program causing a computer to function as:
quoted expression discrimination means for determining whether a sentence in the first language contains a quoted expression representing a quotation, using quoted expression patterns obtained by patterning quoted expressions that frequently appear in the first language;
quoted expression separation means for separating the quoted expression from the sentence in the first language and discarding it when the quoted expression discrimination means determines that the quoted expression is contained;
second-language multiple-sentence discrimination means for treating the sentence in the first language from which the quoted expression has been separated by the quoted expression separation means as a body text, and determining whether the body text has been translated into a plurality of sentences in the second language, using sentence delimiter patterns obtained by patterning sentence delimiters according to the grammar of the second language;
first-language expression specifying means for specifying, when the second-language multiple-sentence discrimination means determines that the body text has been translated into a plurality of sentences in the second language, which words in the body text correspond to the words contained in those second-language sentences, using bilingual dictionary data in which first-language words are associated with second-language words;
first-language dividing means for dividing the body text, according to the correspondence between the words in the plurality of second-language sentences and the words in the body text specified by the first-language expression specifying means, so that each part of the body text corresponds to one sentence in the second language; and
divided-first-language corresponding second-language adding means for adding to the bilingual database single-sentence correspondence translation data in which each part of the body text divided by the first-language dividing means corresponds to one sentence in the second language.

A translation program for translating an input sentence in a first language into a sentence in a second language, the program causing a computer having an example database created by the example database creation device according to claim 1 to function as:
quoted expression discrimination means for determining whether the sentence in the first language contains a quoted expression representing a quotation, using quoted expression patterns obtained by patterning quoted expressions that frequently appear in the first language;
quoted expression separation means for separating the quoted expression from the corresponding sentence in the first language when the quoted expression discrimination means determines that the quoted expression is contained;
first-language clause/parallel phrase discrimination means for determining whether the sentence in the first language from which the quoted expression has been separated by the quoted expression separation means contains a clause or a parallel phrase, using clause/parallel phrase patterns obtained by patterning clauses and parallel phrases of sentences according to the grammar of the first language;
first-language translation unit dividing means for dividing a sentence in the first language that the first-language clause/parallel phrase discrimination means has determined to contain a clause or a parallel phrase into translation units, each being a unit to be translated into the second language;
unit-by-unit translation means for translating each translation unit divided by the first-language translation unit dividing means while calculating, for each translation unit, a score indicating the degree of match between the translation unit and the data contained in the example database;
maximum score translation result selection means for selecting the maximum score translation result, namely the translation result that maximizes the total of the scores calculated for the individual translation units when they are translated by the unit-by-unit translation means;
translation means for translating, using the example database, a sentence in the first language that the first-language clause/parallel phrase discrimination means has determined not to contain a clause or a parallel phrase; and
translation result output means for outputting the maximum score translation result selected by the maximum score translation result selection means or the translation result produced by the translation means, together with, when the quoted expression has been separated by the quoted expression separation means, a translation of the quoted expression obtained using bilingual dictionary data in which first-language words are associated with second-language words.