JP2010244385A

JP2010244385A - Machine translation device, machine translation method, and program

Info

Publication number: JP2010244385A
Application number: JP2009093718A
Authority: JP
Inventors: Kaneyasu Jo; 金安徐; Seiya Osada; 誠也長田; Ryosuke Isotani; 亮輔磯谷
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-04-08
Filing date: 2009-04-08
Publication date: 2010-10-28

Abstract

PROBLEM TO BE SOLVED: To perform machine translation without having to prepare accurate translation rules for grammatical phenomena that are difficult or impossible to describe fully as translation rules.

SOLUTION: Vocabulary exhibiting a grammatical phenomenon that is difficult or impossible to describe fully as a translation rule is registered in advance as a specific pattern in a specific pattern DB 103. When an input sentence in a first language contains such a specific pattern, second-language vocabulary or syntax information corresponding to the analysis result of the specific pattern is acquired from a translation DB 108. An error detection/proofreading unit 107 then uses this vocabulary or syntax information, together with statistical co-occurrence information stored in a second-language statistical model 104, to detect errors in the second-language translation obtained by translating the first-language sentence, and proofreads that translation.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to translation technology, and in particular to a technique for machine-translating a first language into a second language using a translation dictionary or translation rules.

To improve the accuracy of rule-based machine translation, techniques that perform semi-automatic or fully automatic editing based on pre-editing rules or post-editing rules have been proposed, for example in Patent Documents 1 to 3.

As shown in FIG. 15, the machine translation post-editing support device described in Patent Document 1 comprises a display unit, a display control unit, a text storage unit, an input control unit, an input unit, an automatic post-editing processing unit, and an automatic post-editing rule storage unit.
Using automatic post-editing rules, this device distinguishes, based on the result from an automatic post-edited sentence determination unit, between sentences determined to be automatically post-edited and sentences not so determined, and shows the two kinds differently on the display unit.

As shown in FIG. 16, the automatic text pre-editing device for machine translation described in Patent Document 2 comprises a text input unit, a text buffer that stores the input text, a morphological analysis unit, a dictionary, a pre-editing rule group, a control processing unit that detects predetermined search patterns from the rules stored in the pre-editing rule group and performs predetermined processing to pre-edit the text into a form suitable for machine translation, a text buffer, and an output unit.
In Patent Document 2, an automatic pre-editing device is realized by applying the pre-editing rule group to a machine translation system.

The machine translation rule generation device described in Patent Document 3 comprises: an input unit that receives, as a bilingual example, an expression in a first natural language and its translation in a second natural language; conversion rule identification means that identifies the conversion rule converting the first-natural-language expression of the input example into the second natural language; first natural language expression extraction means that extracts, from a first-natural-language corpus, expressions satisfying the first-natural-language conditions of the identified conversion rule; a second natural language expression generation unit that applies the identified conversion rule to the extracted expressions to generate candidate second-natural-language expressions; second natural language expression verification means that verifies the validity of the generated expressions using a second-natural-language corpus and outputs the valid ones; and rule addition means that adds, as a new translation rule, each pair consisting of an expression extracted from the first-natural-language corpus and the second-natural-language expression output by the verification means.
In Patent Document 3, translation rules are thus acquired automatically from bilingual examples.

Patent Document 1: Japanese Patent Laid-Open No. 7-28818 (JP-A-7-28818)
Patent Document 2: Japanese Patent Laid-Open No. 06-139274
Patent Document 3: Japanese Patent No. 003329371

Martin, S., Liermann, J. and Ney, H., 1998, "Algorithms for bigram and trigram word clustering", Speech Communication, 24 (1998), 19-37.

However, with such conventional techniques, a rule-based machine translation device faces a high cost for creating accurate translation rules for grammatical phenomena that are difficult or impossible to describe fully as pre-editing rules, post-editing rules, or translation rules.
The reason is that, under the conventional techniques, a large amount of comprehensive bilingual example data must be collected in order to create pre-editing rules, post-editing rules, and translation rules for such phenomena.

In other words, for grammatical phenomena that are difficult or impossible to describe fully as translation rules, the number of translation rules is enormous, so general-purpose pre-editing rules, post-editing rules, and translation rules cannot be obtained without highly comprehensive example data aligned between the two languages.
However, collecting aligned bilingual examples that comprehensively reflect such grammatical phenomena is costly.

Taking a Japanese-Chinese machine translation system as an example, countless Chinese generation rules are needed to generate Chinese attributives (定語) from Japanese adnominal modification, Chinese adverbials (状語) from Japanese adverbial modification, and Chinese complements (補語) from Japanese. In particular, to build a high-precision Japanese-Chinese translation system, fine-grained Chinese generation rules must be written on the basis of stricter grammatical analysis, both for the generation rules governing word order when several attributives, adverbials, or complements occur together, and for deciding whether to generate the Chinese structural particles 「的」, 「地」, and 「得」 that accompany each sentence component. Creating these rules, however, requires manual abstraction of grammatical phenomena and manual rule writing, so that an enormous number of conversion rules must be created to cover complicated grammatical phenomena, which is costly.

The present invention is intended to solve this problem, and its object is to provide a machine translation technique that can perform machine translation without requiring accurate translation rules for grammatical phenomena that are difficult or impossible to describe fully as translation rules.

To achieve this object, a machine translation device according to the present invention comprises: a morphological/syntactic analysis unit that performs morphological and syntactic analysis on an input sentence expressed in a first language; a translation database consisting of a translation dictionary or translation rules used to translate the first language into a second language; a second-language generation unit that refers to the analysis result of the morphological/syntactic analysis unit and to the translation database and generates a second-language translation as the translation result for the input sentence; a specific pattern database that stores specific vocabulary used in the first language as specific patterns; a specific pattern detection unit that detects, in the analysis result of the morphological/syntactic analysis unit, a specific pattern stored in the specific pattern database and acquires from the translation database the second-language vocabulary or syntax information corresponding to the analysis result of the detected specific pattern; a second-language statistical model that stores statistical co-occurrence information on the co-occurrence of vocabulary used in the second language; and an error detection/proofreading unit that uses the second-language vocabulary or syntax information obtained by the specific pattern detection unit and the statistical co-occurrence information stored in the second-language statistical model to detect errors in the second-language translation generated by the second-language generation unit and to proofread that translation.

A machine translation method according to the present invention is used in a machine translation device that translates a first language into a second language, and comprises: a morphological/syntactic analysis step in which a morphological/syntactic analysis unit performs morphological and syntactic analysis on an input sentence expressed in the first language; a second-language generation step in which a second-language generation unit refers to the analysis result of the morphological/syntactic analysis unit and to a translation database consisting of a translation dictionary or translation rules used to translate the first language into the second language, and generates a second-language translation as the translation result for the input sentence; a specific pattern detection step in which a specific pattern detection unit detects, in the analysis result of the morphological/syntactic analysis unit, a specific pattern stored in a specific pattern database that stores specific vocabulary used in the first language as specific patterns, and acquires from the translation database the second-language vocabulary or syntax information corresponding to the analysis result of the detected specific pattern; and an error detection/proofreading step in which an error detection/proofreading unit uses the second-language vocabulary or syntax information obtained by the specific pattern detection unit and the statistical co-occurrence information stored in a second-language statistical model, which stores statistical co-occurrence information on the co-occurrence of vocabulary used in the second language, to detect errors in the second-language translation generated by the second-language generation unit and to proofread that translation.

A program according to the present invention causes a computer of a machine translation device that translates a first language into a second language to execute: a morphological/syntactic analysis step in which a morphological/syntactic analysis unit performs morphological and syntactic analysis on an input sentence expressed in the first language; a second-language generation step in which a second-language generation unit refers to the analysis result of the morphological/syntactic analysis unit and to a translation database consisting of a translation dictionary or translation rules used to translate the first language into the second language, and generates a second-language translation as the translation result for the input sentence; a specific pattern detection step in which a specific pattern detection unit detects, in the analysis result of the morphological/syntactic analysis unit, a specific pattern stored in a specific pattern database that stores specific vocabulary used in the first language as specific patterns, and acquires from the translation database the second-language vocabulary or syntax information corresponding to the analysis result of the detected specific pattern; and an error detection/proofreading step in which an error detection/proofreading unit uses the second-language vocabulary or syntax information obtained by the specific pattern detection unit and the statistical co-occurrence information stored in a second-language statistical model, which stores statistical co-occurrence information on the co-occurrence of vocabulary used in the second language, to detect errors in the second-language translation generated by the second-language generation unit and to proofread that translation.

According to the present invention, a rule-based machine translation device that translates from a first language into a second language can perform machine translation without requiring accurate translation rules for grammatical phenomena that are difficult or impossible to describe fully as translation rules.
This makes it possible to reduce the cost of creating translation rules for such grammatical phenomena.

FIG. 1 is a block diagram showing the configuration of the machine translation device according to the first embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an example of Japanese syntactic analysis.
FIG. 3 is an explanatory diagram showing the specific pattern DB used in the machine translation device according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing the machine translation process according to the first embodiment of the present invention.
FIG. 5 is a block diagram showing the configuration of the machine translation device according to the second embodiment of the present invention.
FIG. 6 is an explanatory diagram showing the specific pattern DB used in the machine translation device according to the second embodiment of the present invention.
FIG. 7 is an explanatory diagram showing a configuration example of the specific pattern DB used in the machine translation device according to the second embodiment of the present invention.
FIG. 8 is a flowchart showing the machine translation process according to the second embodiment of the present invention.
FIG. 9 is a block diagram showing the configuration of the machine translation device according to the third embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the machine translation device according to the fourth embodiment of the present invention.
FIG. 11 is a block diagram showing the configuration of the machine translation device according to the first example of the present invention.
FIG. 12 is a flowchart showing the machine translation process according to the first example of the present invention.
FIG. 13 is a block diagram showing the configuration of the machine translation device according to the second example of the present invention.
FIG. 14 is a flowchart showing the machine translation process according to the second example of the present invention.
FIG. 15 is a block diagram showing the configuration of a conventional machine translation post-editing support device.
FIG. 16 is a block diagram showing the configuration of a conventional automatic pre-editing system for machine translation.

Next, embodiments of the present invention will be described with reference to the drawings.
[First Embodiment]
First, a machine translation device according to the first embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the machine translation device according to the first embodiment of the present invention.

The machine translation device 100 as a whole consists of an information processing device such as a computer, and has the function of machine-translating an input first language into a second language.
As its main functional units, the machine translation device 100 is provided with a first-language input unit 101, a morphological/syntactic analysis unit 102, a specific pattern database (hereinafter, specific pattern DB) 103, a second-language statistical model 104, a specific pattern detection unit 105, a second-language generation unit 106, an error detection/proofreading unit 107, a translation database (hereinafter, translation DB) 108, and a second-language output unit 109.

The first-language input unit 101 consists of an operation input device such as a keyboard, or an input interface circuit that acquires data from an external device (not shown), and has the function of inputting the first-language sentence to be translated.

The morphological/syntactic analysis unit 102 has the function of performing morphological analysis and syntactic analysis on the first-language input sentence entered through the first-language input unit 101.
FIG. 2 is an explanatory diagram showing an example of Japanese syntactic analysis. It shows an example parse of the Japanese input sentence 「一番近いレストランの駐車所は満員です」 ("The parking lot of the nearest restaurant is full"). The input sentence is decomposed into words consisting of morphemes, and the part of speech of each word and the way the words combine are shown.
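
As an illustration only, the analysis result for the example sentence could be held in a structure like the following Python sketch. The `Token` class, the part-of-speech labels, and the head indices are assumptions made for this example; they do not reproduce the content of FIG. 2, and a real analyzer may segment or attach the words differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    surface: str  # written form of the morpheme
    pos: str      # part-of-speech label (labels here are illustrative)
    head: int     # index of the token this one depends on, -1 for the root


# Hand-written analysis of the example sentence (hypothetical, not from FIG. 2).
ANALYSIS: List[Token] = [
    Token("一番", "adverb", 1),
    Token("近い", "adjective", 2),
    Token("レストラン", "noun", 4),
    Token("の", "case particle", 2),
    Token("駐車所", "noun", 6),
    Token("は", "topic particle", 4),
    Token("満員", "noun", -1),
    Token("です", "auxiliary verb", 6),
]


def dependents(tokens: List[Token], head_index: int) -> List[str]:
    """Return the surfaces of all tokens that depend directly on head_index."""
    return [t.surface for t in tokens if t.head == head_index]
```

Keeping the surface form, the part of speech, and the dependency head on each token is enough for the later units to match patterns on part of speech plus surface form and to follow dependency relations.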

The specific pattern DB 103 is a database that stores specific patterns, each indicating specific vocabulary used in the first language, described in terms of the first-language information assigned by the morphological/syntactic analysis unit 102 as its analysis result. It is held in a storage unit consisting of a storage device such as a hard disk.

The second-language statistical model 104 is a statistical model that stores statistical co-occurrence information on the co-occurrence of vocabulary used in the second language, and is held in the storage unit. It can be built from a second-language monolingual corpus using statistical methods such as N-grams, decision trees, SVMs (Support Vector Machines), maximum entropy, HMMs (Hidden Markov Models), or Bayesian learning, although it is of course not limited to these methods. As statistical co-occurrence information, the second-language statistical model 104 stores information on one or more of the following aspects of second-language vocabulary or syntax: surface form, base form, conjugated form, part of speech, case frame, tense, voice, aspect, semantic class, or co-occurrence patterns having a dependency relation.
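
To make the N-gram option concrete, the following Python sketch builds a bigram model over surface forms from a tokenized monolingual corpus. It is a minimal sketch under stated assumptions: the function names, the add-one smoothing, and the three-sentence toy corpus are all invented for illustration and are not specified in the source.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Model = Tuple[Counter, Counter, int]


def train_bigram(corpus: Sequence[Sequence[str]]) -> Model:
    """Count contexts and bigrams over sentences padded with <s> and </s>."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in corpus:
        padded = ["<s>"] + list(sentence) + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent pairs
    # Predictable vocabulary: every word plus the end-of-sentence marker.
    vocab = {w for s in corpus for w in s} | {"</s>"}
    return contexts, bigrams, len(vocab)


def log_prob(model: Model, prev: str, word: str) -> float:
    """Add-one smoothed log P(word | prev); the smoothing choice is an assumption."""
    contexts, bigrams, vocab_size = model
    return math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))


# Toy corpus, invented for this example.
TOY_CORPUS: List[List[str]] = [
    ["我", "的", "朋友"],
    ["他", "的", "工作"],
    ["我", "的", "名字"],
]
```

With this toy corpus, `log_prob(train_bigram(TOY_CORPUS), "我", "的")` is higher than the score for any word never seen after 「我」, which is the kind of co-occurrence evidence the model is meant to supply.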

The specific pattern detection unit 105 has the function of detecting, in the analysis result obtained by the morphological/syntactic analysis unit 102, any specific pattern stored in the specific pattern DB 103, and the function of acquiring from the translation DB 108 the second-language vocabulary or syntax information corresponding to the analysis result of the detected specific pattern.
The second-language generation unit 106 has the function of translating the first-language input sentence into the second language by referring to the analysis result obtained by the morphological/syntactic analysis unit 102 and to the translation DB 108, and of generating a second-language translation as the translation result.

The error detection/proofreading unit 107 has the function of detecting errors in the second-language translation generated by the second-language generation unit 106, using the second-language vocabulary or syntax information obtained by the specific pattern detection unit 105 and the statistical co-occurrence information stored in the second-language statistical model 104, and the function of automatically correcting those errors. As its error detection processing, the error detection/proofreading unit 107 uses the second-language statistical model to perform any one of the following on the second-language translation generated by the second-language generation unit: detection of unnecessary components, detection of missing components, or detection of word order errors.
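
One way to picture the three kinds of error detection is to enumerate the corrections each would imply and let the statistical model choose among them. The Python sketch below is an assumption made for illustration, not the procedure of the source: it proposes deleting a token (unnecessary component), inserting a word supplied by the specific pattern detection unit (missing component), and swapping adjacent tokens (word order error). The function name and the candidate labels are hypothetical.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, Tuple[str, ...]]


def correction_candidates(
    draft: Sequence[str], inserts: Sequence[str]
) -> List[Candidate]:
    """Enumerate single-edit variants of a draft translation.

    draft   -- tokens of the second-language translation
    inserts -- second-language words that may be missing (hypothetical input)
    """
    tokens = tuple(draft)
    candidates: List[Candidate] = [("keep", tokens)]
    # Unnecessary component: drop one token.
    for i in range(len(tokens)):
        candidates.append(("delete", tokens[:i] + tokens[i + 1:]))
    # Missing component: add one supplied word at any position.
    for i in range(len(tokens) + 1):
        for word in inserts:
            candidates.append(("insert", tokens[:i] + (word,) + tokens[i:]))
    # Word order error: swap one pair of adjacent tokens.
    for i in range(len(tokens) - 1):
        swapped = tokens[:i] + (tokens[i + 1], tokens[i]) + tokens[i + 2:]
        candidates.append(("swap", swapped))
    return candidates
```

For simplicity this sketch considers every position; a practical version would presumably restrict edits to the neighbourhood of the detected specific pattern.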

In addition, based on the training features and feature quantities of the statistical model registered as the second-language statistical model 104, the error detection/proofreading unit 107 can extract the corresponding components from the second-language translation, shape them into a format the model accepts, and then apply the model to estimate the optimal solution.
For example, when processing is performed with a second-language statistical model 104 built by taking surface forms, parts of speech, semantic classes, patterns having dependency relations, and the like as training features or feature quantities and weighting them by co-occurrence measures such as frequency, probability, or information content, error detection and proofreading can be carried out on the second-language translation with a feature extraction method and algorithm suited to the training features and feature quantities of that model.

When an N-gram language model or a class N-gram model built from a second-language monolingual corpus is used as the second-language statistical model 104, one of the simplest ways to detect whether the second-language translation contains an unnecessary or missing component associated with a specific pattern, or a word order error among sentence components, is as follows: among the generated sentences formed by combining the second-language translation with the second-language vocabulary or syntax information acquired by the specific pattern detection unit 105, the one that maximizes the global statistical score under the second-language statistical model 104 is taken as the optimal solution. The sentence occurrence probability may be calculated from co-occurrence information on the surface forms of the words in the generated sentence, or from co-occurrence information on part-of-speech classes. In particular, when N-grams are used, a forward N-gram model and a backward N-gram model may be used together.
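
The selection rule described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `BigramLM` class, the add-k smoothing with k = 0.1, the unweighted sum of forward and backward log probabilities, and the toy corpus are all invented for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


class BigramLM:
    """Add-k smoothed bigram model; reverse=True gives a backward model."""

    def __init__(self, corpus: Sequence[Sequence[str]], reverse: bool = False,
                 k: float = 0.1) -> None:
        self.reverse = reverse
        self.k = k
        self.contexts: Counter = Counter()
        self.pairs: Counter = Counter()
        vocab = set()
        for sentence in corpus:
            seq = self._pad(sentence)
            self.contexts.update(seq[:-1])
            self.pairs.update(zip(seq, seq[1:]))
            vocab.update(seq[1:])
        self.vocab_size = len(vocab)

    def _pad(self, sentence: Sequence[str]) -> List[str]:
        words = list(reversed(sentence)) if self.reverse else list(sentence)
        return ["<s>"] + words + ["</s>"]

    def log_prob(self, sentence: Sequence[str]) -> float:
        """Log probability of the whole sentence under the bigram model."""
        seq = self._pad(sentence)
        return sum(
            math.log((self.pairs[(a, b)] + self.k)
                     / (self.contexts[a] + self.k * self.vocab_size))
            for a, b in zip(seq, seq[1:])
        )


def best_candidate(candidates: Sequence[Sequence[str]],
                   forward: BigramLM, backward: BigramLM) -> Sequence[str]:
    """Pick the candidate with the highest combined forward+backward score."""
    return max(candidates,
               key=lambda c: forward.log_prob(c) + backward.log_prob(c))


# Toy corpus, invented for this example.
CORPUS = [["我", "的", "朋友"], ["他", "的", "朋友"],
          ["我", "的", "名字"], ["他", "的", "名字"]]
FORWARD = BigramLM(CORPUS)
BACKWARD = BigramLM(CORPUS, reverse=True)
```

Calling `best_candidate([["我", "朋友"], ["我", "的", "朋友"]], FORWARD, BACKWARD)` selects the variant containing 「的」. Note that an unnormalized sentence probability favours shorter candidates, so the outcome is sensitive to the smoothing constant; a practical system would likely need some form of length normalization, which is not shown here.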

Alternatively, when a model built with an SVM from positive and negative training examples is used as the second-language statistical model 104, the error detection/proofreading unit 107 can extract from the second-language translation the items matching the training features used to build the SVM model, shape them into a fixed format, and use the SVM model to detect unnecessary sentence components, missing sentence components, word order errors, and the like in the second-language translation, and then proofread it.
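
The step of extracting features and shaping them into a fixed format might look like the Python sketch below. The feature template (surrounding surface forms and part-of-speech tags in a small window), the feature-name format, and the tag names are assumptions for illustration; the source does not specify them, and the classifier itself is not shown.

```python
from typing import Dict, Sequence


def extract_features(tokens: Sequence[str], pos_tags: Sequence[str],
                     index: int, window: int = 2) -> Dict[str, int]:
    """Build a sparse binary feature map describing the context of one position.

    Each feature name encodes the relative offset and either the surface
    form (w) or the part of speech (p) found there; positions outside the
    sentence are filled with a <pad> marker.
    """
    features: Dict[str, int] = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the target position itself is what is being judged
        j = index + offset
        inside = 0 <= j < len(tokens)
        word = tokens[j] if inside else "<pad>"
        tag = pos_tags[j] if inside else "<pad>"
        features[f"w[{offset:+d}]={word}"] = 1
        features[f"p[{offset:+d}]={tag}"] = 1
    return features
```

The same function would have to be used both when building the training examples and when judging a new translation, so that the features presented to the model always share one format.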

The translation DB 108 contains translation rules, a translation dictionary, or both. The translation rules are those generally used in a rule-based translation system: analysis rules on the first-language side, generation rules on the second-language side, and correspondence rules between the first and second languages. The translation dictionary is a dictionary of the kind generally used in a rule-based translation system, in which first-language vocabulary is matched with second-language vocabulary.

The second-language output unit 109 consists of a screen display device such as an LCD or PDP, or an output interface circuit that outputs data to an external device (not shown). When the specific pattern detection unit 105 detects no specific pattern, it outputs the second-language translation from the second-language generation unit 106 as is; when the specific pattern detection unit 105 detects a specific pattern, it outputs the result of the error detection/proofreading unit 107 proofreading the second-language translation from the second-language generation unit 106.
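
The control flow implied by this description can be sketched in Python as below. The four callables stand in for units 102, 106, 105, and 107; their names and signatures are assumptions made for this example, not an interface defined in the source.

```python
from typing import Callable, List


def translate(
    sentence: str,
    analyze: Callable[[str], list],                      # stands in for unit 102
    generate: Callable[[list], List[str]],               # stands in for unit 106
    detect: Callable[[list], list],                      # stands in for unit 105
    proofread: Callable[[List[str], list], List[str]],   # stands in for unit 107
) -> List[str]:
    """Return the draft translation, proofread only if a pattern was detected."""
    analysis = analyze(sentence)
    draft = generate(analysis)
    hits = detect(analysis)
    if not hits:
        # No specific pattern: output the generated translation unchanged.
        return draft
    # Specific pattern found: output the proofread translation instead.
    return proofread(draft, hits)
```

Passing the units in as functions keeps the sketch independent of how each unit is actually implemented.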

FIG. 3 is an explanatory diagram showing the specific pattern DB used in the machine translation device according to the first embodiment of the present invention. This example assumes a Japanese-Chinese machine translation system and shows a specific pattern database built from Japanese monolingual information only. Specific patterns consisting of part-of-speech information and surface form are recorded here, such as the Japanese case particle 「の」, the formal noun 「の」, and the auxiliary verb 「だ」.

For example, the case particle 「の」 has two Chinese translations depending on context: an empty translation (nothing is generated) and the Chinese structural particle 「的」. However, the Japanese and Chinese grammatical phenomena that would allow these translations to be distinguished reliably are extremely complex, so it is generally difficult to write strict translation rules for them.
The formal noun 「の」 and the auxiliary verb 「だ」 pose the same problem. The formal noun 「の」 has two Chinese translations, 「的」 and 「的東西」. The auxiliary verb 「だ」 can be translated as 「在」, 「叫」, 「是」, 「有」, and so on, and choosing among these is likewise extremely complex.
Because the rules for translating such vocabulary into Chinese are complex and generally cannot be written exhaustively, the present invention records these items in the specific pattern DB 103, and the specific pattern detection unit 105 detects whether a specific pattern is contained in the input sentence.

The specific pattern detection unit 105 may also work as follows. From the analysis result of the first-language input sentence obtained by the morphological/syntactic analysis unit 102, it detects whether any specific pattern stored in the specific pattern DB 103 is included. For each detected specific pattern, it acquires information on the adjacent vocabulary in the first-language sentence, or syntactic information that has a dependency relation with the detected pattern. It then obtains the corresponding second-language vocabulary and syntactic information from the translation dictionary and translation rules, so that the error detection/proofreading unit 107 can use them.

The processing of the specific pattern detection unit 105 is now described with reference to FIG. 3. First, pattern matching between the analysis result of the morphological/syntactic analysis unit 102 and the specific pattern DB 103 shown in FIG. 3 detects whether a specific pattern is contained in the Japanese input sentence of the Japanese-to-Chinese translation system.
Next, when a specific pattern shown in FIG. 3 is detected, the error detection/proofreading unit 107 looks up the Japanese-Chinese translation dictionary in the translation DB 108, using the part of speech and surface form of the detected pattern as search keys, and acquires the Chinese translation candidates for that pattern.
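The matching and dictionary lookup described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Morpheme` type, the function names, and the in-memory sets and dictionaries are assumptions standing in for the specific pattern DB 103 and the translation DB 108, and the part-of-speech labels are invented English names. The candidate lists are the ones named in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    surface: str  # surface form as written in the input sentence
    pos: str      # part of speech assigned by the analyzer (labels are assumed)


# Assumed stand-in for the specific pattern DB of FIG. 3: (part of speech, surface) pairs.
SPECIFIC_PATTERNS = {
    ("case particle", "の"),
    ("formal noun", "の"),
    ("auxiliary verb", "だ"),
}

# Assumed stand-in for the Japanese-Chinese dictionary in the translation DB.
# "Φ" marks the "no translation" candidate, as in the description.
JA_ZH_DICTIONARY = {
    ("case particle", "の"): ["Φ", "的"],
    ("formal noun", "の"): ["的", "的東西"],
    ("auxiliary verb", "だ"): ["在", "叫", "是", "有"],
}


def detect_specific_patterns(morphemes):
    """Return (index, morpheme) pairs whose POS and surface match a registered pattern."""
    return [
        (i, m) for i, m in enumerate(morphemes)
        if (m.pos, m.surface) in SPECIFIC_PATTERNS
    ]


def lookup_candidates(morpheme):
    """Look up Chinese candidates, using POS and surface form as the search key."""
    return JA_ZH_DICTIONARY.get((morpheme.pos, morpheme.surface), [])


sentence = [
    Morpheme("私", "pronoun"),
    Morpheme("の", "case particle"),
    Morpheme("本", "noun"),
]
hits = detect_specific_patterns(sentence)
candidates = lookup_candidates(hits[0][1])
```

Keying both tables on the (part of speech, surface form) pair keeps the two uses of 「の」 apart, which is the distinction the description relies on.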

Subsequently, the error detection/proofreading unit 107 treats these translation candidates as the targets of error detection and proofreading in the Chinese translation obtained by the second language generation unit 106. Accuracy can be improved further by also acquiring vocabulary that has a dependency relation with the Japanese word of the specific pattern and using the corresponding Chinese vocabulary in the error detection and proofreading.
For example, using the part of speech and surface form of the Japanese case particle 「の」 shown in FIG. 3 as search keys, the symbol 「Φ」, which denotes "no translation", and the Chinese structural particle 「的」 can be obtained from the Japanese-Chinese translation dictionary. With these as the specific targets on the Chinese generation side, the appropriateness of each candidate in the Chinese translation can be judged using the Chinese statistical model.
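One way to picture the appropriateness judgment is a toy bigram model that scores each candidate in its context. The sketch below is an assumption made for illustration only: the description does not specify the form of the statistical model, the counts are invented, and the raw sum of bigram counts is a deliberately crude score (a real model would normalize for length).

```python
from collections import Counter

# Invented bigram counts standing in for the second-language statistical model 104.
BIGRAM_COUNTS = Counter({
    ("我", "的"): 50,
    ("的", "书"): 40,
    ("我", "书"): 2,
})


def score(tokens):
    """Sum of bigram counts over adjacent token pairs (higher means more plausible)."""
    return sum(BIGRAM_COUNTS[pair] for pair in zip(tokens, tokens[1:]))


def best_candidate(left, right, candidates):
    """Pick the candidate that scores highest between the left and right context.

    The candidate "Φ" means "generate nothing" at this position.
    """
    def realize(candidate):
        middle = [] if candidate == "Φ" else [candidate]
        return left + middle + right

    return max(candidates, key=lambda c: score(realize(c)))


choice = best_candidate(["我"], ["书"], ["Φ", "的"])
```

With these invented counts the sequence containing 「的」 scores 90 against 2 for the empty candidate, so 「的」 is selected.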

Among the functional units of the machine translation device 100, the morphological/syntactic analysis unit 102, the specific pattern detection unit 105, the second language generation unit 106, and the error detection/proofreading unit 107 may each be realized by a dedicated information processing circuit, or by an arithmetic processing unit that has a CPU and its peripheral circuits and implements the various processing units by reading a program from a storage unit (not shown) and executing it.

[Operation of the First Embodiment]
Next, the operation of the machine translation device according to the first embodiment of the present invention is described in detail with reference to FIG. 1 and FIG. 4. FIG. 4 is a flowchart showing the machine translation processing according to the first embodiment.

First, the first language input unit 101 receives the first-language input sentence to be translated (step S11). The morphological/syntactic analysis unit 102 then performs morphological analysis, word segmentation, or similar processing on the input sentence, parses the sentence using the resulting morpheme information, obtains the words that stand in dependency relations within the sentence, and stores the analysis result (step S12).

Subsequently, the specific pattern detection unit 105 searches the morphological/syntactic analysis result obtained by the morphological/syntactic analysis unit 102 for specific patterns registered in the specific pattern DB 103 (step S13). If no specific pattern is detected in the input sentence, the process proceeds to step S14.

On the other hand, when a specific pattern is detected in the input sentence in step S13, the specific pattern detection unit 105 uses the analysis result corresponding to the detected pattern to acquire second-language vocabulary and syntactic information from the translation rules or translation dictionary of the machine translation system, stores the detected pattern together with that second-language vocabulary and syntactic information, and proceeds to step S14.
At this point, as described above, the specific pattern detection unit 105 may also acquire information on the vocabulary adjacent to the specific pattern in the first-language sentence, or syntactic information that has a dependency relation with the detected pattern, and obtain the corresponding second-language vocabulary and syntactic information from the translation dictionary and translation rules.

Next, using the morphological/syntactic analysis result obtained by the morphological/syntactic analysis unit 102 together with the translation dictionary and translation rules for translating from the first language into the second language, the second language generation unit 106 translates the first-language input sentence into the second language and generates the resulting second-language translation (step S14).
After this, if no specific pattern was detected in step S13, the process proceeds to step S16: the second language output unit 109 outputs the second-language translation generated in step S14 (step S16), and the series of machine translation processing ends.

On the other hand, if a specific pattern was detected in step S13, the error detection/proofreading unit 107 detects errors in the second-language translation generated in step S14 and corrects them (step S15), using the second-language vocabulary and syntactic information stored in step S13 for that pattern together with the statistical co-occurrence information of the second-language statistical model 104. The second language output unit 109 then outputs the proofread second-language translation, and additionally the proofreading result (step S16), and the series of machine translation processing ends.
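The control flow of steps S11 to S16 can be summarized in a short sketch. Everything here is a hypothetical illustration: the `translate` function and the four toy callables are invented stand-ins for units 102, 105, 106, and 107, and the toy proofreader simply hard-codes the repair that a statistical model would be expected to choose.

```python
def translate(sentence, analyze, detect, generate, proofread):
    """Run steps S12 to S16 for one input sentence (receiving it corresponds to S11)."""
    analysis = analyze(sentence)   # S12: morphological / syntactic analysis
    hits = detect(analysis)        # S13: look for registered specific patterns
    draft = generate(analysis)     # S14: rule-based generation of the translation
    if not hits:
        return draft               # S16: output the draft unchanged
    return proofread(draft, hits)  # S15, then S16: proofread and output


# Toy stand-ins, invented for illustration only.
def toy_analyze(sentence):
    return sentence.split()


def toy_detect(tokens):
    return [t for t in tokens if t == "の"]


def toy_generate(tokens):
    lexicon = {"私": "我", "本": "书", "の": ""}
    translated = [lexicon.get(t, t) for t in tokens]
    return [t for t in translated if t]  # drop empty translations


def toy_proofread(draft, hits):
    # Pretend the statistical model asked for 「的」 after the first word.
    return draft[:1] + ["的"] + draft[1:]


with_pattern = translate("私 の 本", toy_analyze, toy_detect, toy_generate, toy_proofread)
without_pattern = translate("本", toy_analyze, toy_detect, toy_generate, toy_proofread)
```

Note that generation (S14) always runs; detection in S13 only decides whether the proofreading step S15 is applied before output.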

[Effects of the First Embodiment]
As described above, in this embodiment, vocabulary exhibiting grammatical phenomena that are difficult or impossible to describe fully as translation rules is registered in advance in the specific pattern DB 103 as specific patterns. When such a pattern is contained in the first-language input sentence, the second-language vocabulary or syntactic information corresponding to the analysis result of that pattern is acquired from the translation DB 108. Using this information together with the statistical co-occurrence information stored in the second-language statistical model 104, the error detection/proofreading unit 107 detects errors in the second-language translation obtained by translating the first language and proofreads that translation. Machine translation can therefore be performed without requiring accurate translation rules for grammatical phenomena that are difficult or impossible to describe fully as translation rules.

This makes it possible to reduce the cost of creating translation rules for such grammatical phenomena. For example, when this embodiment is implemented in a Japanese-to-Chinese translation system with the specific pattern DB example shown in FIG. 3, there is no longer any need to write translation rules for selecting the translations of the Japanese case particle 「の」, the Japanese formal noun 「の」, and the Japanese auxiliary verb 「だ」.

In this embodiment, the second-language statistical model may be a model that stores statistical co-occurrence information on one or more of the following aspects of second-language vocabulary or syntactic information: surface form, base form, inflected form, part of speech, case frame, tense, voice, aspect, semantic class, or co-occurrence patterns having a dependency relation.

In this embodiment, the error detection/proofreading unit may also use the second-language statistical model to perform, on the second-language translation generated by the second language generation unit, any one of the following error detection processes: detection of unnecessary components, detection of missing components, or detection of word-order errors. It may then automatically correct the errors found.

In this embodiment, when the second language is Chinese, first-language syntactic components corresponding to Chinese attributive (定語) components may be used as the specific patterns stored in the specific pattern database. The error detection/proofreading unit may then perform, on the attributive components of the Chinese translation generated by the second language generation unit, error detection for attributive word order and for the structural particle 「的」, followed by automatic correction of the detected errors.

In this embodiment, when the second language is Chinese, first-language syntactic components corresponding to Chinese adverbial (状語) components may be used as the specific patterns stored in the specific pattern database. The error detection/proofreading unit may then perform, on the adverbial components of the Chinese translation generated by the second language generation unit, error detection for adverbial word order and for the structural particle 「地」, followed by automatic correction of the detected errors.

In this embodiment, when the second language is Chinese, first-language syntactic components corresponding to Chinese complement (補語) components may be used as the specific patterns stored in the specific pattern database. The error detection/proofreading unit may then perform, on the complement components of the Chinese translation generated by the second language generation unit, error detection for complement word order and for the structural particle 「得」, followed by automatic correction of the detected errors.

In this embodiment, when the second language is Chinese, first-language syntactic components corresponding to Chinese measure word (量詞) components may be used as the specific patterns stored in the specific pattern database. The error detection/proofreading unit may then perform, on the measure word components of the Chinese translation generated by the second language generation unit, error detection for Chinese measure words, followed by automatic correction of the detected errors.

In this embodiment, when the second language is Chinese, first-language syntactic components corresponding to the Chinese vocabulary and syntactic information that express voice and aspect may be used as the specific patterns stored in the specific pattern database. The error detection/proofreading unit may then perform, on the components of the Chinese translation generated by the second language generation unit that express voice and aspect, error detection for those components, followed by automatic correction of the detected errors.

In this embodiment, when the second language is Chinese, first-language syntactic components corresponding to Chinese prepositions (介詞) may be used as the specific patterns stored in the specific pattern database. The error detection/proofreading unit may then perform, on the preposition components of the Chinese translation generated by the second language generation unit, error detection for Chinese prepositions, followed by automatic correction of the detected errors.

[Second Embodiment]
Next, a machine translation device according to a second embodiment of the present invention is described with reference to FIG. 5. FIG. 5 is a block diagram showing the configuration of the machine translation device according to the second embodiment; parts that are the same as or equivalent to those in FIG. 1 carry the same reference numerals.

As shown in FIG. 5, this embodiment includes a specific pattern DB 201 in place of the specific pattern DB 103 of the machine translation device according to the first embodiment shown in FIG. 1. The rest of the configuration is the same as in the first embodiment shown in FIG. 1, and a detailed description is omitted here.
The specific pattern DB 201 is a database in which each specific pattern, representing specific vocabulary used in the first language, is described as a pair: first-language-side information assigned as an analysis result by the morphological/syntactic analysis unit 102, and the information on the second-language-side specific target that corresponds to that first-language-side information.

FIG. 6 is an explanatory diagram showing the specific pattern DB used in the machine translation device according to the second embodiment of the present invention. This example assumes a Japanese-to-Chinese machine translation system; FIG. 6 shows a table that associates information on which translation rules were used in that system with Chinese structural particles.

For example, the pattern at search ID 0 in FIG. 6 means that, as an analysis rule on the Japanese side, the translation rule "incorporation of Japanese adnominal modification" is applied when morphological/syntactic analysis is performed on the Japanese input sentence. The Chinese-side generation information for search ID 0 in FIG. 6 means that an attributive (定語) slot is generated. The Chinese structural particle 「的」 associated with search ID 0 in FIG. 6 means that 「的」 is the specific target whose appropriateness is to be judged in the Chinese translation result.

Note that a slot is a constituent of a case frame used in syntactic analysis and is also called a case slot. Syntactic analysis based on case grammar includes methods that focus on the linguistic structures (cases) required by a verb. Such analysis uses a tree-structured case frame describing which cases exist and what properties they have, and parses the sentence by fitting the morphemes obtained through morphological analysis, or phrases combining them, into the case slots.
An attributive (定語) is one of the modifying components among the Chinese grammatical components and corresponds to a Japanese adnominal modifier. Both the Chinese attributive and the Japanese adnominal modifier have the structure "modifier (noun, pronoun, adjective) + modified element (noun, noun phrase)".

Search ID 0 in FIG. 6 means that, when the translation rule "incorporation of Japanese adnominal modification" is applied in the analysis on the Japanese side and the Chinese generation side corresponding to this rule generates a Chinese attributive slot, the appropriateness of the Chinese structural particle 「的」 is judged for the Chinese translation of the Japanese input sentence.
Similarly, the condition for judging the appropriateness of the Chinese structural particle 「地」 is given by the specific pattern at search ID 1 in FIG. 6, and the condition for the Chinese structural particle 「得」 by the specific pattern at search ID 2 in FIG. 6.
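The table of FIG. 6 can be pictured as rows keyed by the applied analysis rule and the generated slot. The sketch below is an assumption about its shape, not a reproduction of the figure: only the rule name for search ID 0 is given in the text, so the rule names for IDs 1 and 2 are explicit placeholders, and their slot types are inferred from the statement that 「地」 accompanies adverbials and 「得」 accompanies complements.

```python
from typing import NamedTuple, Optional


class PatternEntry(NamedTuple):
    search_id: int
    source_rule: str  # analysis rule applied on the Japanese side
    target_slot: str  # slot generated on the Chinese side
    particle: str     # structural particle whose appropriateness is judged


# Hypothetical rendering of FIG. 6. Entries 1 and 2 use placeholder rule names
# because the description does not state them.
PATTERN_DB = [
    PatternEntry(0, "incorporation of Japanese adnominal modification", "attributive", "的"),
    PatternEntry(1, "<rule for ID 1, not stated in the text>", "adverbial", "地"),
    PatternEntry(2, "<rule for ID 2, not stated in the text>", "complement", "得"),
]


def particle_to_check(applied_rule: str, generated_slot: str) -> Optional[str]:
    """Return the particle to verify when both sides of an entry match, else None."""
    for entry in PATTERN_DB:
        if entry.source_rule == applied_rule and entry.target_slot == generated_slot:
            return entry.particle
    return None


target = particle_to_check(
    "incorporation of Japanese adnominal modification", "attributive"
)
```

Requiring both the source-side rule and the target-side slot to match mirrors the "and" condition in the description of search ID 0.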

FIG. 7 is an explanatory diagram showing a configuration example of the specific pattern DB used in the machine translation device according to the second embodiment of the present invention. This example shows a configuration of the specific pattern DB 201 for checking the word order of the Chinese translation in a Japanese-to-Chinese machine translation system.

As with FIG. 6, take search ID 0 in FIG. 7 as an example. As a whole, search ID 0 means that, when the translation rule "incorporation of Japanese adnominal modification" is applied in the analysis on the Japanese side, the Chinese generation side corresponding to this rule generates a Chinese attributive slot, and there are two or more Chinese attributive slots, the appropriateness of the generated word order of the Chinese attributive components is judged for the Chinese translation of the Japanese input sentence.
Similarly, the condition for judging the appropriateness of Chinese adverbials is shown at search ID 1 in FIG. 7, and the condition for Chinese complements at search ID 2 in FIG. 7.

Here, attributive (定語), adverbial (状語), and complement (補語) are Chinese syntactic elements. In general, Chinese syntactic components are described in terms of subject, predicate, object, attributive, adverbial, and complement. In machine translation, this information is attached to the Chinese case frame.
The Chinese structural particle 「的」 accompanies attributives, 「地」 accompanies adverbials, and 「得」 accompanies complements. Because these structural particles may or may not appear depending on the Chinese context, writing strict translation rules for them in machine translation requires a great deal of cost.

As described above, specific targets for error detection and proofreading are selected from the Chinese translation, and the Chinese statistical model is used to judge the appropriateness of each such target within the Chinese translation. This allows error detection and proofreading such as checks for unnecessary components, missing components, and word order.
When error detection and proofreading are needed for other components of the Chinese translation, the corresponding first-language vocabulary information, translation rules, and similar information can be described in the specific pattern DB 201 and the present invention applied in the same way.

For example, to check the appropriateness of measure words (量詞) in the Chinese translation, a part-of-speech pattern consisting of a Japanese pronoun and a Japanese noun may be associated with the Chinese measure word.
To check the appropriateness of the particles that express voice and aspect in Chinese, a table may be created that maps the Japanese particles or auxiliary verbs expressing voice and aspect to the corresponding Chinese particles.
To check the appropriateness of Chinese prepositions (介詞), a table may be created that maps Japanese case particles, adverbial particles, and the like to Chinese prepositions.

[Operation of the Second Embodiment]
Next, the operation of the second embodiment of the present invention is described in detail with reference to FIG. 5 and FIG. 8. FIG. 8 is a flowchart showing the machine translation processing according to the second embodiment; parts that are the same as or equivalent to those in FIG. 4 carry the same reference numerals.
As shown in FIG. 8, the machine translation processing of this embodiment differs from FIG. 4 in that step S21 replaces step S13 and step S22 is added between step S14 and step S15.

First, the first language input unit 101 receives the first-language input sentence to be translated (step S11). The morphological/syntactic analysis unit 102 then performs morphological analysis, word segmentation, or similar processing on the input sentence, parses the sentence using the resulting morpheme information, obtains the words that stand in dependency relations within the sentence, and stores the analysis result (step S12). The analysis rules applied at each stage of the morphological/syntactic analysis are also stored as part of the analysis result.

Subsequently, the specific pattern detection unit 105 searches the morphological/syntactic analysis result obtained by the morphological/syntactic analysis unit 102 and detects, as specific pattern candidates, every item that matches the first-language-side information of a specific pattern registered in the specific pattern DB 201 (step S21).
Next, using the morphological/syntactic analysis result obtained by the morphological/syntactic analysis unit 102 together with the translation dictionary and translation rules for translating from the first language into the second language, the second language generation unit 106 translates the first-language input sentence into the second language and generates the resulting second-language translation (step S14).

After this, the specific pattern detection unit 105 detects, in the second-language translation obtained in step S14, specific patterns consisting of the second-language-side information registered in the specific pattern DB 201. Among the patterns found, it detects those that match the specific pattern candidates obtained in step S21 and selects the second-language-side information corresponding to such a pattern as the specific target of error detection and proofreading (step S22).

If no specific pattern is detected in step S22, the process proceeds to step S16: the second language output unit 109 outputs the second-language translation generated in step S14 (step S16), and the series of machine translation processing ends.
On the other hand, if a specific pattern is detected in step S22, errors in the second-language translation generated in step S14 are detected and corrected (step S15), using the specific pattern, its second-language specific target, and the statistical co-occurrence information stored in the second-language statistical model 104. The second language output unit 109 then outputs the proofread second-language translation, and additionally the proofreading result (step S16), and the series of machine translation processing ends.
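The two-stage selection of steps S21 and S22 can be sketched as two filters over the pattern entries. The dictionary keys, rule names, and function names below are hypothetical, chosen only to illustrate that an entry becomes a proofreading target when its first-language side matched the analysis and its second-language side also appears in the generated translation.

```python
def select_targets(applied_rules, generated_slots, pattern_db):
    """Return proofreading targets for entries that match on both language sides."""
    # S21: keep every entry whose first-language side matches the analysis result.
    candidates = [e for e in pattern_db if e["source_rule"] in applied_rules]
    # S22: of those, keep entries whose second-language side appears in the translation.
    return [e["target"] for e in candidates if e["target_slot"] in generated_slots]


def finish(draft, targets, proofread):
    """S15 and S16: proofread only when S22 selected at least one target."""
    return proofread(draft, targets) if targets else draft


# Toy pattern entries; the rule names are invented labels.
TOY_DB = [
    {"source_rule": "adnominal", "target_slot": "attributive", "target": "的"},
    {"source_rule": "adverbial-rule", "target_slot": "adverbial", "target": "地"},
]

both_sides = select_targets({"adnominal"}, {"attributive"}, TOY_DB)
source_only = select_targets({"adnominal"}, {"complement"}, TOY_DB)
unchanged = finish(["我", "书"], source_only, lambda d, t: d + t)
```

In the second call the source side matches but the generated translation has no attributive slot, so nothing is selected and the draft is output as is.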

［第２の実施形態の効果］
このように、本実施形態では、翻訳規則として記述しにくいまたは記述しきれない文法現象を持つ語彙を、特定パターンとして第１言語側の情報と第２言語側の情報との組として、予め特定パターンＤＢ１０３に登録しておき、この特定パターンの第１言語側の情報が第１言語の入力文に含まれおり、かつこの特定パターンの第２言語側の情報が、入力文を翻訳して得られた第２言語翻訳文に含まれている場合、当該特定パターンおよび当該第２言語側の情報に対応する第２言語翻訳文からなる特定対象と、第２言語統計的モデルに格納されている統計的共起情報とを用いて、第１言語を翻訳して得られた第２言語翻訳文の誤りを検出し、当該第２言語翻訳文を校正するようにしたので、翻訳規則として記述しにくいまたは記述しきれない文法現象に対する正確な翻訳規則を必要とすることなく、機械翻訳を行うことができる。 [Effects of Second Embodiment]
Thus, in this embodiment, a vocabulary item exhibiting a grammatical phenomenon that is difficult or impossible to describe fully as a translation rule is registered in advance in the specific pattern DB 103 as a specific pattern, that is, as a pair of first-language-side information and second-language-side information. When the first-language-side information of a specific pattern is contained in the input sentence in the first language, and the second-language-side information of that pattern is contained in the second language translation sentence obtained by translating the input sentence, errors in the second language translation sentence are detected and the sentence is proofread using the specific target (the part of the second language translation sentence corresponding to the specific pattern and its second-language-side information) and the statistical co-occurrence information stored in the second language statistical model. Machine translation can therefore be performed without requiring accurate translation rules for grammatical phenomena that are difficult or impossible to describe fully as translation rules.

また、本実施形態では、特定パターンが第１言語側の情報と第２言語側の情報との組で記述された特定パターンＤＢ１０３を用いるようにしたので、特定パターンの第１言語側の情報と対応する第２言語側の情報を適切に設定することができ、誤り検出および校正を精度よく行うことが可能となる。
例えば、本実施形態において、図６と図７に示す特定パターンＤＢの例を用いた場合、下記の日本語例文１の翻訳結果の構造助詞「的」の欠落、日本語例文２の翻訳結果の構造助詞「地」の欠落、および日本語例文３の状語成分「東京」の誤りを検出して自動的に正しい結果に校正できるようになる。 In addition, since this embodiment uses the specific pattern DB 103, in which each specific pattern is described as a pair of first-language-side information and second-language-side information, the second-language-side information corresponding to the first-language-side information of a specific pattern can be set appropriately, and error detection and proofreading can be performed with high accuracy.
For example, when the specific pattern DB examples shown in FIGS. 6 and 7 are used in this embodiment, the missing structural particle “的” in the translation result of Japanese example sentence 1 below, the missing structural particle “地” in the translation result of Japanese example sentence 2, and the error in the adverbial component “東京” (Tokyo) in Japanese example sentence 3 can be detected and automatically corrected.

例文（用例中の括弧内は誤り箇所を示す）
日本語例文１：「いちばん近いレストランの駐車場は満員です。」
中国語翻訳結果：「最近飯店的停車場満員。」
中国語正解：「最近（的）飯店的停車場満員。」

日本語例文２：「何をぼんやり考えているのか？」
中国語翻訳結果：「呆呆在考慮什幺？」
中国語正解：「呆呆（地）在考慮什幺？」

日本語例文３：「Ｍ航空Ａ便東京行きはただ今から１番ゲートで搭乗を開始します。」
中国語翻訳結果：「去往Ｍ航空Ａ航班東京従現在開始在一号登機口登機。」
中国語正解：「去往（東京）Ｍ航空Ａ航班従現在開始在一号登機口登機。」 Example sentences (parentheses in the examples mark the error locations)
Japanese example sentence 1: “The parking lot of the nearest restaurant is full.”
Chinese translation result: “最近飯店的停車場満員。”
Chinese correct answer: “最近（的）飯店的停車場満員。”

Japanese example sentence 2: “What are you absent-mindedly thinking about?”
Chinese translation result: “呆呆在考慮什幺？”
Chinese correct answer: “呆呆（地）在考慮什幺？”

Japanese example sentence 3: “M Airlines Flight A bound for Tokyo will now begin boarding at Gate 1.”
Chinese translation result: “去往Ｍ航空Ａ航班東京従現在開始在一号登機口登機。”
Chinese correct answer: “去往（東京）Ｍ航空Ａ航班従現在開始在一号登機口登機。”

［第３の実施形態］
次に、図９を参照して、本発明の第３の実施形態にかかる機械翻訳装置について説明する。図９は、本発明の第３の実施形態にかかる機械翻訳装置の構成を示すブロック図であり、図１と同じまたは同等部分には同一符号を付してある。 [Third Embodiment]
Next, a machine translation apparatus according to a third embodiment of the present invention will be described with reference to FIG. 9. FIG. 9 is a block diagram showing the configuration of the machine translation apparatus according to the third embodiment of the present invention. Parts that are the same as or equivalent to those in FIG. 1 are given the same reference numerals.

図９に示すように、本実施形態は、図１に示した第１の実施形態にかかる機械翻訳装置の構成要素と同じであるが、各機能部の接続関係を換えて構成してある。具体的には、特定パターン検出部１０５での特定パターンの検出結果に応じて、第２言語生成部１０６により、誤り検出・校正部１０７での誤り検出・構成処理を行うか否か決定している。その他の点では、第１の実施形態と同様である。
このような接続関係であっても、第１の実施形態と同様の作用効果を得ることができる。 As shown in FIG. 9, this embodiment has the same components as the machine translation apparatus according to the first embodiment shown in FIG. 1, but the connections among the functional units are changed. Specifically, the second language generation unit 106 decides, according to the specific pattern detection result of the specific pattern detection unit 105, whether the error detection / proofreading unit 107 performs error detection / proofreading processing. In other respects, it is the same as the first embodiment.
Even with such a connection relationship, the same effects as those of the first embodiment can be obtained.

［第４の実施形態］
次に、図１０を参照して、本発明の第４の実施形態にかかる機械翻訳装置について説明する。図１０は、本発明の第４の実施形態にかかる機械翻訳装置の構成を示すブロック図であり、図５と同じまたは同等部分には同一符号を付してある。 [Fourth Embodiment]
Next, a machine translation apparatus according to the fourth embodiment of the present invention will be described with reference to FIG. 10. FIG. 10 is a block diagram showing the configuration of the machine translation apparatus according to the fourth embodiment of the present invention. Parts that are the same as or equivalent to those in FIG. 5 are given the same reference numerals.

図１０に示すように、本実施形態は、図５に示した第２の実施形態にかかる機械翻訳装置の構成要素と同じであるが、各機能部の接続関係を換えて構成してある。具体的には、特定パターン検出部１０５での特定パターンの検出結果に応じて、第２言語生成部１０６により、誤り検出・校正部１０７での誤り検出・構成処理を行うか否か決定している。また、形態素・構文解析部１０２と特定パターン検出部１０５との間に、訳語選択部４０１が追加されている。その他の点では、第１の実施形態と同様である。 As shown in FIG. 10, this embodiment has the same components as the machine translation apparatus according to the second embodiment shown in FIG. 5, but the connections among the functional units are changed. Specifically, the second language generation unit 106 decides, according to the specific pattern detection result of the specific pattern detection unit 105, whether the error detection / proofreading unit 107 performs error detection / proofreading processing. In addition, a translation word selection unit 401 is added between the morpheme / syntax analysis unit 102 and the specific pattern detection unit 105. In other respects, it is the same as the first embodiment.

訳語選択部４０１は、形態素・構文解析部１０２で得られた第１言語の入力文に対する形態素・構文解析結果を用いて、翻訳ＤＢ１０８の翻訳辞書から各形態素の訳語候補を取得する機能と、この訳語候補を翻訳ＤＢ１０８の翻訳規則や訳語選択処理用規則に適用して、各形態素の最適な訳語候補を取得する機能とを有している。
これにより、特定パターン検出部１０５において、第２言語生成部１０６で生成された第２言語翻訳文から、特定パターンＤＢ１０３に登録されている第２言語側の情報からなる特定パターンを検出する際、高い精度で特定パターンを検索でき、より正確に誤り検出・校正の特定対象を選択することが可能となる。 The translation word selection unit 401 has a function of acquiring translation candidates for each morpheme from the translation dictionary of the translation DB 108, using the morpheme / syntax analysis result for the first-language input sentence obtained by the morpheme / syntax analysis unit 102, and a function of applying the translation rules and translation-word selection rules of the translation DB 108 to these candidates to obtain the optimal translation candidate for each morpheme.
Accordingly, when the specific pattern detection unit 105 detects, in the second language translation sentence generated by the second language generation unit 106, a specific pattern made up of the second-language-side information registered in the specific pattern DB 103, the specific pattern can be found with high accuracy, and the specific target for error detection / proofreading can be selected more accurately.

次に、図１１を参照して、本発明の第１の実施例について説明する。図１１は、本発明の第１の実施例にかかる機械翻訳装置の構成を示すブロック図であり、図１と同じまたは同等部分には同一符号を付してある。 Next, a first example of the present invention will be described with reference to FIG. 11. FIG. 11 is a block diagram showing the configuration of the machine translation apparatus according to the first example of the present invention. Parts that are the same as or equivalent to those in FIG. 1 are given the same reference numerals.

本実施例は、前述した第１および第３の実施形態に対応するものである。図１１に示すように、本実施例は、図１に示した第１の実施形態にかかる機械翻訳装置のうち、第１言語入力部１０１、特定パターンＤＢ１０３、第２言語統計的モデル１０４、第２言語生成部１０６、および第２言語出力部１０９に代えて、それぞれ日本語入力部５０１、特定パターンＤＢ５００、中国語統計的モデル５０２、中国語生成部５０３、および中国語出力部５０４を備えている。これらは、第１の実施形態の第１言語および第２言語を日本語および中国語に特化したものであり、実質的には第１の実施形態の構成要素と同等である。 This example corresponds to the first and third embodiments described above. As shown in FIG. 11, in place of the first language input unit 101, the specific pattern DB 103, the second language statistical model 104, the second language generation unit 106, and the second language output unit 109 of the machine translation apparatus according to the first embodiment shown in FIG. 1, this example includes a Japanese input unit 501, a specific pattern DB 500, a Chinese statistical model 502, a Chinese generation unit 503, and a Chinese output unit 504, respectively. These specialize the first and second languages of the first embodiment to Japanese and Chinese and are substantially equivalent to the components of the first embodiment.

図１２は、本発明の第１の実施例にかかる機械翻訳処理を示すフローチャートであり、前述した図４と同じまたは同等部分には同一符号を付してある。ここでは、日本語で入力された「それは、三列と五列の間にある」という入力文を中国語へ機械翻訳する場合を例として説明する。 FIG. 12 is a flowchart showing the machine translation process according to the first example of the present invention. Parts that are the same as or equivalent to those in FIG. 4 described above are given the same reference numerals. Here, machine translation into Chinese of the Japanese input sentence “それは、三列と五列の間にある” (“It is between the third and fifth rows”) is described as an example.

日本語入力部５０１で入力された入力文を（ステップＳ５１）、形態素・構文解析部１０２で解析した場合（ステップＳ１２）、「それ／は／、／三／列／と／五／列／の／間／に／ある」という形態素解析結果が得られる。形態素解析で得られた各形態素は独自の属性値を持つ。属性値とは、原形、品詞、表記、活用形、意味分類、態、相等の情報からなる。
例えば、例文の形態素「の」と「ある」は以下属性を有する。
表記 | 仮名 | 固有部 | 原型 | 品詞 | …
の | ノ | の | の | 格助詞 | …
ある | アル | ある | ある | 動詞 | …
When the input sentence entered through the Japanese input unit 501 (step S51) is analyzed by the morpheme / syntax analysis unit 102 (step S12), the morpheme analysis result “それ / は / 、 / 三 / 列 / と / 五 / 列 / の / 間 / に / ある” is obtained. Each morpheme obtained by the morpheme analysis has its own attribute values. The attribute values consist of information such as base form, part of speech, notation, conjugation form, semantic class, voice, and aspect.
For example, the morphemes “の” and “ある” in the example sentence have the following attributes.
Notation | Kana | Proper part | Base form | Part of speech | …
の | ノ | の | の | case particle | …
ある | アル | ある | ある | verb | …

また、翻訳ＤＢ１０８の翻訳辞書は、各形態素の中国語生成ブロックに、その形態素の訳語、品詞、意味分類、用言の場合の格フレーム、態、相情報等の情報を含んでいる。中国語の格フレームの構成成分は、中国語生成用格フレームの要素として、主語、述語、目的語、定語、状語、補語等の文構造情報が記述されている。 In addition, the translation dictionary of the translation DB 108 contains, in the Chinese generation block of each morpheme, information such as the translation of the morpheme, its part of speech, its semantic class and, for predicates, the case frame, voice, and aspect information. As the components of a Chinese case frame, sentence structure information such as subject, predicate, object, attributive, adverbial, and complement is described as elements of the case frame for Chinese generation.

形態素・構文解析部１０２は、前述した形態素解析結果を用いて、構文解析規則と合わせて入力文の構文解析処理を行う（ステップＳ１２）。構文解析処理を行った結果、入力文中係り受け関係を有する語彙間の係り受け関係を取得できる。
特定パターン検出部１０５は、形態素・構文解析部１０２での形態素解析・構文解析処理の結果を用いて、特定パターンＤＢ５００に登録された特定パターンと照合して特定パターン検出処理を行う（ステップＳ５２）。 The morpheme / syntax analyzer 102 performs syntax analysis of the input sentence together with the syntax analysis rules using the morpheme analysis result described above (step S12). As a result of the parsing process, it is possible to obtain a dependency relationship between words having a dependency relationship in the input sentence.
The specific pattern detection unit 105 uses the result of the morpheme analysis / syntax analysis process in the morpheme / syntax analysis unit 102 to perform a specific pattern detection process by collating with the specific pattern registered in the specific pattern DB 500 (step S52). .

例えば、入力文「それは、三列と五列の間にある」の形態素・構文解析処理の結果には、格助詞「の」が含まれる。特定パターン検出部１０５において、図３に示す日中翻訳用特定パターンＤＢを用いて入力文とのパターンマッチングを行うと、検索ＩＤ０番の格助詞「の」が検出される。すると、検出された格助詞「の」に対して、翻訳ＤＢ１０８の翻訳辞書から、格助詞「の」の訳語候補である「訳語なし」と示す記号「Φ」と中国語構造助詞「的」が取得される。 For example, the result of the morpheme / syntax analysis of the input sentence “それは、三列と五列の間にある” contains the case particle “の”. When the specific pattern detection unit 105 performs pattern matching against the input sentence using the specific pattern DB for Japanese-Chinese translation shown in FIG. 3, the case particle “の” with search ID 0 is detected. Then, for the detected case particle “の”, its translation candidates are acquired from the translation dictionary of the translation DB 108: the symbol “Φ”, which indicates “no translation”, and the Chinese structural particle “的”.

また、特定パターン検出部１０５は、入力文の形態素・構文解析処理の結果から、特定パターンＤＢ５００に記述された特定パターンを検出した場合、検出した特定パターンと隣接する形態素の情報を、入力文の形態素・構文解析処理の結果から取得する。
例えば、例文の形態素・構文解析結果から、格助詞「の」と隣接する語彙「列」と「間」の情報を切り出して取得できる。また、「列」、「の」、「間」と対応する中国語側の情報を日中翻訳辞書の辞書引き処理から取得できる。 Further, when the specific pattern detection unit 105 detects a specific pattern described in the specific pattern DB 500 in the result of the morpheme / syntax analysis of the input sentence, it acquires information on the morphemes adjacent to the detected specific pattern from that analysis result.
For example, information on the words “列” and “間”, which are adjacent to the case particle “の”, can be extracted from the morpheme / syntax analysis result of the example sentence. The Chinese-side information corresponding to “列”, “の”, and “間” can also be obtained by dictionary lookup in the Japanese-Chinese translation dictionary.

これにより、日本語語彙の表記や品詞などの属性情報と、それに対応する中国語側の語彙の表記と品詞などの属性情報を取得できる。例えば、格助詞「の」の前の語彙「列」の日本語の品詞である「助数詞」と対応する中国語の品詞である「量詞」、格助詞「の」の後ろの語彙「間」の日本語の品詞である「名詞」と対応する中国語の品詞である「名詞」を取得できる。したがって、日本語「列＋の＋間」と対応する中国語の生成語彙の品詞情報として、「量詞＋Φ＋名詞」または「量詞＋的＋名詞」のようなパターンを、入力文の形態素・構文解析処理の結果から取得できる。 As a result, attribute information such as the notation and part of speech of the Japanese words, and the corresponding attribute information such as the notation and part of speech of the Chinese-side words, can be acquired. For example, for the word “列” preceding the case particle “の”, the Japanese part of speech “助数詞” (counter) and the corresponding Chinese part of speech “量詞” (measure word) are obtained; for the word “間” following the case particle “の”, the Japanese part of speech “名詞” (noun) and the corresponding Chinese part of speech “名詞” (noun) are obtained. Therefore, as the part-of-speech information of the Chinese words generated for the Japanese “列＋の＋間”, patterns such as “量詞＋Φ＋名詞” (measure word + Φ + noun) or “量詞＋的＋名詞” (measure word + 的 + noun) can be obtained from the morpheme / syntax analysis result of the input sentence.
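How the two candidate patterns arise from the neighbouring parts of speech and the particle's translation candidates can be illustrated with a short sketch. The table names `JA_TO_ZH_POS` and `PARTICLE_CANDIDATES` and the function `candidate_patterns` are hypothetical; the actual formats of the translation DB 108 and the specific pattern DB are not given in this text.

```python
from typing import List

# Hypothetical lookup tables holding only the entries used in the example.
JA_TO_ZH_POS = {"助数詞": "量詞", "名詞": "名詞"}
PARTICLE_CANDIDATES = {"の": ["Φ", "的"]}  # Φ means "no translation"


def candidate_patterns(prev_pos: str, particle: str, next_pos: str) -> List[str]:
    """Return one Chinese part-of-speech pattern per translation candidate."""
    left = JA_TO_ZH_POS[prev_pos]
    right = JA_TO_ZH_POS[next_pos]
    return [f"{left}+{cand}+{right}" for cand in PARTICLE_CANDIDATES[particle]]


# 列 (助数詞) + の + 間 (名詞)
patterns = candidate_patterns("助数詞", "の", "名詞")
```

Here `patterns` holds the two strings `量詞+Φ+名詞` and `量詞+的+名詞`, matching the patterns named in the paragraph above.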

次に、中国語生成部５０３は、形態素・構文解析した結果と、日中翻訳辞書および翻訳規則を用いて、日本語入力文の中国語、すなわち中国語翻訳文を生成する（ステップＳ５３）。ここでの中国語翻訳文には、中国語品詞などの属性情報が各中国語形態素に付与されている。
ここで、特定パターン検出部１０５で特定パターンが検出されなかった場合、誤り検出・校正処理を行わず、中国語生成部５０３で得られた中国語翻訳文を整形して、中国語出力部５０４から中国語を出力する（ステップＳ５４）。 Next, the Chinese generation unit 503 generates the Chinese for the Japanese input sentence, that is, a Chinese translation sentence, using the morpheme / syntax analysis result together with the Japanese-Chinese translation dictionary and translation rules (step S53). In this Chinese translation sentence, attribute information such as the Chinese part of speech is attached to each Chinese morpheme.
Here, when no specific pattern is detected by the specific pattern detection unit 105, error detection / proofreading is not performed; the Chinese translation sentence obtained by the Chinese generation unit 503 is formatted, and the Chinese is output from the Chinese output unit 504 (step S54).

一方、特定パターン検出部１０５で例文のように特定パターンが検出された場合、誤り検出・校正部１０７は、特定パターンに基づき特定した中国語翻訳文内の特定対象に対して、中国語統計的共起情報を格納している中国語統計的モデル５０２を用いて、中国語翻訳文から誤りを検出して校正処理を行い（ステップＳ１５）、校正された結果を中国語出力部５０４から出力する（ステップＳ５４）。 On the other hand, when a specific pattern is detected by the specific pattern detection unit 105, as in the example sentence, the error detection / proofreading unit 107 uses the Chinese statistical model 502, which stores Chinese statistical co-occurrence information, to detect errors in the specific target identified within the Chinese translation sentence on the basis of the specific pattern and to proofread the sentence (step S15), and the proofread result is output from the Chinese output unit 504 (step S54).

誤り検出・校正部１０７における、具体的な誤り検出と校正処理について、中国語Ｎ−ｇｒａｍモデルで日本語例文に対する処理例を説明する。
前述したように、格助詞「の」の訳語が「訳語なし」の記号「Φ」を訳語候補として取得された場合、この記号「Φ」に対して、誤り検出・構成処理を行う際には、空文字に変換すればよい。「量詞＋Φ＋名詞」の例で説明すると、「量詞＋名詞」のように変換すればよい。 A specific error detection and proofreading process in the error detection / proofreading unit 107 will be described with respect to a Japanese example sentence using a Chinese N-gram model.
As described above, when the symbol “Φ” meaning “no translation” is acquired as a translation candidate for the case particle “の”, it suffices to convert this symbol to an empty string when performing error detection / proofreading. In the “量詞＋Φ＋名詞” (measure word + Φ + noun) example, the pattern is converted to “量詞＋名詞” (measure word + noun).

日本語入力文「それは、三列と五列の間にある」で説明すると、例えば、中国語生成部５０３で得られた中国語翻訳文が「那／代名詞個／量詞在／介詞三／数詞列／量詞和／連詞五／数詞列／量詞的／構造助詞之間／名詞」となったものとする。ここで、中国語統計的モデル５０２の一例として、中国語単言語コーパスで単語表記と品詞情報で構築されたＮ−ｇｒａｍモデルを用いる場合、中国語構造助詞「的」の適切性を判定するためのアルゴリズムは、次のような計算方法が使用できる。 Taking the Japanese input sentence “それは、三列と五列の間にある” as an example, suppose that the Chinese translation sentence obtained by the Chinese generation unit 503 is “那/代名詞 個/量詞 在/介詞 三/数詞 列/量詞 和/連詞 五/数詞 列/量詞 的/構造助詞 之間/名詞” (part-of-speech tags: 代名詞 pronoun, 量詞 measure word, 介詞 preposition, 数詞 numeral, 連詞 conjunction, 構造助詞 structural particle, 名詞 noun). When an N-gram model built from a Chinese monolingual corpus using word notation and part-of-speech information is used as an example of the Chinese statistical model 502, the following calculation methods can be used as the algorithm for judging the appropriateness of the Chinese structural particle “的”.

まず、中国語特定対象ｗ_deの適切性を、３−ｇｒａｍで近似計算される生成確率を利用して判定する方法がある。なお、以下のｗは、語彙の表記、または表記と品詞の組を示す。
例えば、翻訳結果を文１＝ｗ₀，ｗ₁，ｗ₂，…，ｗ_i-1，ｗ_i，ｗ_i+1，…ｗ_nとし、これにｗ_deの位置を考慮したものを、文２＝ｗ₀，ｗ₁，ｗ₂，…，ｗ_i-1，ｗ_de，ｗ_i，ｗ_i+1，…ｗ_nと仮定する。 First, there is a method that judges the appropriateness of the Chinese specific target w_de using generation probabilities approximated by a 3-gram model. In the following, w denotes the notation of a word, or a pair of notation and part of speech.
For example, let the translation result be sentence 1 = w_0, w_1, w_2, ..., w_{i-1}, w_i, w_{i+1}, ..., w_n, and let sentence 2 = w_0, w_1, w_2, ..., w_{i-1}, w_de, w_i, w_{i+1}, ..., w_n be the same sentence with w_de placed at the position under consideration.

ここで、文１の生成確率を３−ｇｒａｍで近似すると、次の式（１）となり、同様に文２の生成確率を３−ｇｒａｍで近似すると、次の式（２）となる。

Here, when the generation probability of sentence 1 is approximated by 3-gram, the following expression (1) is obtained. Similarly, when the generation probability of sentence 2 is approximated by 3-gram, expression (2) is obtained.

よって、（ｗ_i-1，ｗ_de，ｗ_i）のうちｗ_deの適切性を判断できる計算式は、次の式（３）で表すことができる。

Therefore, the formula for judging the appropriateness of w_de in (w_{i-1}, w_de, w_i) can be expressed by the following equation (3).

また、文＝ｗ₀，ｗ₁，…，ｗ_nの生成確率については、３−ｇｒａｍを利用した次の式（４）に示すような計算方法を用いてもよい。

Alternatively, the generation probability of a sentence = w_0, w_1, ..., w_n may be calculated using a 3-gram model as shown in the following equation (4).
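Equations (1) to (4) are not reproduced in this text. As a hedged reconstruction under the standard trigram assumption (each word depends only on the two preceding words), and not necessarily the exact form printed in the original, the sentence probability and the comparison between sentence 2 and sentence 1 could be written as follows. The product bounds and the treatment of the sentence-initial context are assumptions.

```latex
% Sketch only: standard trigram approximation; w_{-1} and w_{-2} are taken
% to be sentence-start padding (an assumption).
\begin{align*}
P(\text{sentence}) &\approx \prod_{k=0}^{n} P(w_k \mid w_{k-2}, w_{k-1}) \\[4pt]
\frac{P(\text{sentence 2})}{P(\text{sentence 1})} &\approx
\frac{P(w_{de} \mid w_{i-2}, w_{i-1})\, P(w_i \mid w_{i-1}, w_{de})\, P(w_{i+1} \mid w_{de}, w_i)}
     {P(w_i \mid w_{i-2}, w_{i-1})\, P(w_{i+1} \mid w_{i-1}, w_i)}
\end{align*}
```

All other trigram factors are shared by the two sentences and cancel. When w_i is the last word and no end-of-sentence token is modelled, the factors involving w_{i+1} drop out, and the comparison reduces to P(w_de | w_{i-2}, w_{i-1}) · P(w_i | w_{i-1}, w_de) against P(w_i | w_{i-2}, w_{i-1}).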

また、中国語生成部５０３で得られた中国語翻訳文のうち、特定パターン検出部１０５で、中国語の語彙、構文情報である「量詞＋名詞」と「量詞＋的＋名詞」との２つのパターンが、翻訳結果の特定対象として特定された場合、「量詞＋的＋名詞」のパターンに一致する部分として「列／量詞的／構造助詞之間／名詞」が取得される。 Of the Chinese translations obtained by the Chinese generation unit 503, the specific pattern detection unit 105 uses two words, “quantifier + noun” and “quantifier + target + noun”, which are Chinese vocabulary and syntax information. When one pattern is specified as the target of the translation result, “sequence / quantitative / structural particle noma / noun” is acquired as a part that matches the pattern of “quantifier + target + noun”.
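Locating the specific target in the tagged translation amounts to matching a short pattern against a token sequence. The sketch below is one possible way to do this, assuming tokens are (surface, part of speech) pairs and that a pattern element may name either a surface form such as 的 or a part of speech such as 量詞; `find_pattern` is a hypothetical helper, not a function from the patent.

```python
from typing import List, Optional, Tuple

PHI = "Φ"  # "no translation" marker
Token = Tuple[str, str]  # (surface, part of speech)


def find_pattern(tokens: List[Token], pattern: List[str]) -> Optional[int]:
    """Return the start index of the first window matching the pattern, else None.

    A pattern element matches a token if it equals the token's surface form
    or its part of speech; Φ elements are dropped before matching.
    """
    wanted = [p for p in pattern if p != PHI]
    for start in range(len(tokens) - len(wanted) + 1):
        window = tokens[start:start + len(wanted)]
        if all(p in tok for p, tok in zip(wanted, window)):
            return start
    return None


translation = [("那", "代名詞"), ("個", "量詞"), ("在", "介詞"), ("三", "数詞"),
               ("列", "量詞"), ("和", "連詞"), ("五", "数詞"), ("列", "量詞"),
               ("的", "構造助詞"), ("之間", "名詞")]
with_de = find_pattern(translation, ["量詞", "的", "名詞"])
without_de = find_pattern(translation, ["量詞", PHI, "名詞"])
```

For this translation only the pattern containing 的 matches, at the window 列/量詞 的/構造助詞 之間/名詞.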

この場合には、品詞付き中国語Ｎ−ｇｒａｍ言語統計的モデルを用いて、式（１）〜式（３）で示されたアルゴリズムで計算すると、以下の２つの条件付き確率値を計算して比較すればよい。
Ｐ１＝Ｐ（的／構造助詞｜五／数詞列／量詞）・Ｐ（之間／名詞｜列／量詞的／構造助詞）
Ｐ２＝Ｐ（之間／名詞｜五／数詞列／量詞） In this case, using the part-of-speech-tagged Chinese N-gram statistical language model and the algorithm given by equations (1) to (3), it suffices to calculate and compare the following two conditional probability values.
P1 = P(的/構造助詞 | 五/数詞 列/量詞) · P(之間/名詞 | 列/量詞 的/構造助詞)
P2 = P(之間/名詞 | 五/数詞 列/量詞)

ここで、中国語３−ｇｒａｍで確率値を計算して、
Ｐ１＝８．３３３３６ｅ−００５
Ｐ２＝１．２５０００ｅ−００１
のような結果が得られた場合、Ｐ１よりＰ２の確率値が高いため、翻訳結果から（列／量詞的／構造助詞之間／名詞）の「的／構造助詞」という誤りが検出できる。
この際に、Ｐ２のパターン「五／数詞列／量詞之間／名詞」を切り出して、翻訳結果文の「列／量詞的／構造助詞之間／名詞」と置換すれば、中国語翻訳文の校正結果である「那／代名詞個／量詞在／介詞三／数詞列／量詞和／連詞五／数詞列／量詞之間／名詞」となる。 Here, the probability value is calculated in Chinese 3-gram,
P1 = 8.33336e-005
P2 = 1.25000e-001
If results such as these are obtained, P2 is higher than P1, so the erroneous “的/構造助詞” (structural particle) in (列/量詞 的/構造助詞 之間/名詞) can be detected in the translation result.
In this case, if the P2 pattern “五/数詞 列/量詞 之間/名詞” is cut out and substituted for the part “列/量詞 的/構造助詞 之間/名詞” of the translation result sentence, the proofread Chinese translation “那/代名詞 個/量詞 在/介詞 三/数詞 列/量詞 和/連詞 五/数詞 列/量詞 之間/名詞” is obtained.
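The comparison and the resulting correction can be sketched in a few lines. The probability values are the ones quoted in the text; the function `proofread` and the choice to delete the token by index are assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface, part of speech)

P1 = 8.33336e-05  # variant with 的, value quoted in the text
P2 = 1.25000e-01  # variant without 的, value quoted in the text


def proofread(tokens: List[Token], index: int,
              p_with: float, p_without: float) -> List[Token]:
    """Drop the token at `index` when the variant without it is more probable."""
    if p_without > p_with:
        return tokens[:index] + tokens[index + 1:]
    return list(tokens)


translation = [("那", "代名詞"), ("個", "量詞"), ("在", "介詞"), ("三", "数詞"),
               ("列", "量詞"), ("和", "連詞"), ("五", "数詞"), ("列", "量詞"),
               ("的", "構造助詞"), ("之間", "名詞")]
corrected = proofread(translation, 8, P1, P2)  # index 8 is 的/構造助詞
```

Joining the surface forms of `corrected` gives 那個在三列和五列之間, the proofread sentence stated above.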

また、Ｐ１の確率値を計算する際、誤りを含んでいる（列／量詞的／構造助詞之間／名詞）との３−ｇｒａｍが検出されない場合、Ｐ（之間／名詞｜列／量詞的／構造助詞）の値は２−ｇｒａｍと１−ｇｒａｍでスムージングによる補間処理で近似確率を計算することができる。
具体的に、Ｎ−ｇｒａｍモデルのスムージング方法として、可算スムージング、線形補間、バックオフ・スムージング、ウィトン・ベル・スムージング、ウン・カウント法などが挙げられる。 In addition, when the 3-gram containing the error (列/量詞 的/構造助詞 之間/名詞) is not found while calculating the probability value P1, an approximate value of P(之間/名詞 | 列/量詞 的/構造助詞) can be calculated from 2-gram and 1-gram probabilities by smoothing-based interpolation.
Specifically, smoothing methods for N-gram models include additive smoothing, linear interpolation, back-off smoothing, Witten-Bell smoothing, and the “un-count” method (ウン・カウント法).
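Of the listed methods, linear interpolation is the simplest to illustrate: an unseen 3-gram still receives probability mass from its 2-gram and 1-gram estimates. The sketch below assumes fixed interpolation weights and a two-sentence toy corpus, both hypothetical; the class name `InterpolatedTrigram` is likewise an illustration and not part of the patent.

```python
from collections import Counter
from typing import Iterable, List


class InterpolatedTrigram:
    """Linear interpolation of 3-gram, 2-gram and 1-gram relative frequencies."""

    def __init__(self, sentences: Iterable[List[str]], weights=(0.6, 0.3, 0.1)):
        self.l3, self.l2, self.l1 = weights  # hypothetical weights summing to 1
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.ctx1, self.ctx2 = Counter(), Counter()
        self.total = 0
        for sent in sentences:
            toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
            for k in range(2, len(toks)):
                u, v, w = toks[k - 2], toks[k - 1], toks[k]
                self.tri[(u, v, w)] += 1
                self.ctx2[(u, v)] += 1
                self.bi[(v, w)] += 1
                self.ctx1[v] += 1
                self.uni[w] += 1
                self.total += 1

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        # Relative frequency, defined as 0 when the context was never seen.
        return num / den if den else 0.0

    def prob(self, u: str, v: str, w: str) -> float:
        """Interpolated estimate of P(w | u, v)."""
        return (self.l3 * self._ratio(self.tri[(u, v, w)], self.ctx2[(u, v)])
                + self.l2 * self._ratio(self.bi[(v, w)], self.ctx1[v])
                + self.l1 * self._ratio(self.uni[w], self.total))


corpus = [["五", "列", "之間"], ["三", "列", "的", "人"]]  # toy corpus, hypothetical
model = InterpolatedTrigram(corpus)
unseen = model.prob("列", "的", "之間")  # 3-gram and 2-gram never observed
seen = model.prob("五", "列", "之間")    # 3-gram observed once
```

In practice the weights would be tuned on held-out data, and the other methods named above (back-off or Witten-Bell smoothing) redistribute probability mass differently.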

また、誤り検出・校正部１０７での処理において、式（４）に示すアルゴリズムを使う際は、特定パターン検出部１０５により検出された中国語の語彙および構文情報と、中国語生成部５０３で得られた中国語翻訳文とで共起するすべての文生起確率を計算して、生起確率が最大となるものを正解とすることができる。
例文では、格助詞「の」の２つの訳語候補を取得して、中国語翻訳文について「的」が生成される場合と、「的」が生成されない場合のすべての文生起確率を計算して、最大となるものを校正結果とすればよい。誤り箇所を検出したい場合、文生起確率が最大となる生成文と中国語翻訳文との差分を求めればよい。 In addition, when the algorithm of equation (4) is used in the processing of the error detection / proofreading unit 107, the occurrence probability can be calculated for every sentence formed by combining the Chinese vocabulary and syntax information detected by the specific pattern detection unit 105 with the Chinese translation sentence obtained by the Chinese generation unit 503, and the sentence with the highest occurrence probability can be taken as the correct answer.
In the example sentence, the two translation candidates for the case particle “の” are acquired, the sentence occurrence probabilities are calculated for all cases in which “的” is generated in the Chinese translation sentence and for the case in which it is not, and the sentence with the highest probability is taken as the proofreading result. To locate the error, it suffices to take the difference between the generated sentence with the highest sentence occurrence probability and the Chinese translation sentence.

例えば、日本語入力文に対する中国語翻訳文が「那／代名詞個／量詞在／介詞三／数詞列／量詞和／連詞五／数詞列／量詞的／構造助詞之間／名詞」となった場合、以下の文の生起確率を近似計算することができる。
■「的／構造助詞」を含まない文：
文３「那／代名詞個／量詞在／介詞三／数詞列／量詞和／連詞五／数詞列／量詞之間／名詞」
■「的／構造助詞」を含む文：
文４「那／代名詞個／量詞在／介詞三／数詞列／量詞和／連詞五／数詞列／量詞的／構造助詞之間／名詞」
文５「那／代名詞的／構造助詞個／量詞在／介詞三／数詞列／量詞和／連詞五／数詞列／量詞之間／名詞」
文６「那／代名詞個／量詞的／構造助詞在／介詞三／数詞列／量詞和／連詞五／数詞列／量詞之間／名詞」
文７「那／代名詞個／量詞在／介詞的／構造助詞三／数詞列／量詞和／連詞五／数詞列／量詞之間／名詞」
文８「那／代名詞個／量詞在／介詞三／数詞的／構造助詞列／量詞和／連詞五／数詞列／量詞之間／名詞」
文９「那／代名詞個／量詞在／介詞三／数詞列／量詞的／構造助詞和／連詞五／数詞列／量詞之間／名詞」
文１０「那／代名詞個／量詞在／介詞三／数詞列／量詞和／連詞的／構造助詞五／数詞列／量詞之間／名詞」
文１１「那／代名詞個／量詞在／介詞三／数詞列／量詞和／連詞五／数詞的／構造助詞列／量詞之間／名詞」
文１２「那／代名詞個／量詞在／介詞三／数詞列／量詞和／連詞五／数詞列／量詞的／構造助詞之間／名詞」
文１３「那／代名詞個／量詞在／介詞三／数詞列／量詞和／連詞五／数詞列／量詞之間／名詞的／構造助詞」 For example, when the Chinese translation sentence for the Japanese input sentence is “那/代名詞 個/量詞 在/介詞 三/数詞 列/量詞 和/連詞 五/数詞 列/量詞 的/構造助詞 之間/名詞”, the occurrence probabilities of the following sentences can be approximated.
■ Sentence that does not contain “的/構造助詞” (structural particle):
Sentence 3: “那/代名詞 個/量詞 在/介詞 三/数詞 列/量詞 和/連詞 五/数詞 列/量詞 之間/名詞”
■ Sentences that contain “的/構造助詞”:
Sentence 4: “那/代名詞 個/量詞 在/介詞 三/数詞 列/量詞 和/連詞 五/数詞 列/量詞 的/構造助詞 之間/名詞”
Sentence 5: “那/代名詞 的/構造助詞 個/量詞 在/介詞 三/数詞 列/量詞 和/連詞 五/数詞 列/量詞 之間/名詞”
Sentence 6: “那/代名詞 個/量詞 的/構造助詞 在/介詞 三/数詞 列/量詞 和/連詞 五/数詞 列/量詞 之間/名詞”
Sentence 7: “那/代名詞 個/量詞 在/介詞 的/構造助詞 三/数詞 列/量詞 和/連詞 五/数詞 列/量詞 之間/名詞”
Sentence 8: “那/代名詞 個/量詞 在/介詞 三/数詞 的/構造助詞 列/量詞 和/連詞 五/数詞 列/量詞 之間/名詞”
Sentence 9: “那/代名詞 個/量詞 在/介詞 三/数詞 列/量詞 的/構造助詞 和/連詞 五/数詞 列/量詞 之間/名詞”
Sentence 10: “那/代名詞 個/量詞 在/介詞 三/数詞 列/量詞 和/連詞 的/構造助詞 五/数詞 列/量詞 之間/名詞”
Sentence 11: “那/代名詞 個/量詞 在/介詞 三/数詞 列/量詞 和/連詞 五/数詞 的/構造助詞 列/量詞 之間/名詞”
Sentence 12: “那/代名詞 個/量詞 在/介詞 三/数詞 列/量詞 和/連詞 五/数詞 列/量詞 的/構造助詞 之間/名詞”
Sentence 13: “那/代名詞 個/量詞 在/介詞 三/数詞 列/量詞 和/連詞 五/数詞 列/量詞 之間/名詞 的/構造助詞”
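The candidate sentences above follow a simple rule: take the sentence without 的 and insert 的 after each token in turn. A minimal sketch of that enumeration, using surface forms only and a hypothetical helper `insertion_candidates`, could look like this.

```python
from typing import List


def insertion_candidates(base: List[str], particle: str) -> List[List[str]]:
    """Return the base sentence plus every variant with `particle` inserted
    after one of its tokens (never before the first token)."""
    variants = [list(base)]
    for pos in range(1, len(base) + 1):
        variants.append(base[:pos] + [particle] + base[pos:])
    return variants


base = ["那", "個", "在", "三", "列", "和", "五", "列", "之間"]  # sentence 3
candidates = insertion_candidates(base, "的")
```

This yields ten distinct token sequences: `candidates[0]` is sentence 3, `candidates[1]` is sentence 5, and `candidates[9]` is sentence 13. `candidates[8]` is the token sequence that appears in the list as both sentence 4 and sentence 12.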

これらの文３〜文１３の文生起確率を中国語３−ｇｒａｍを用いて計算して、確率値が最大となるものを最適な翻訳結果とすることができる。ここでの文生起確率を計算する際、前述したＮ−ｇｒａｍモデルのスムージング方法を用いることができる。
例文に対して、Ｗｉｔｔｅｎ−Ｂｅｌｌｄｉｓｃｏｕｎｔでバックオフ・スムージング平滑化処理での文生起確率の次のような計算結果が得られたものとする。
文３の文生起確率：１．４９２８６ｅ−００８
文４の文生起確率：１．５１６７７ｅ−０１１
文５の文生起確率：１．０１７４４ｅ−０１０
文６の文生起確率：５．６７２７０ｅ−０１１
文７の文生起確率：１．０５３７１ｅ−０１０
文８の文生起確率：３．２０４９４ｅ−０１０
文９の文生起確率：１．５３９６４ｅ−０１１
文１０の文生起確率：８．８７８１０ｅ−０１０
文１１の文生起確率：３．０７９９９ｅ−０１０
文１２の文生起確率：１．４９２５２ｅ−０１１
文１３の文生起確率：１．０３９４９ｅ−００９ The sentence occurrence probabilities of these sentences 3 to 13 are calculated using Chinese 3-gram, and the sentence with the maximum probability value can be set as the optimum translation result. When calculating the sentence occurrence probability here, the above-described smoothing method of the N-gram model can be used.
Suppose that, for the example sentence, the following sentence occurrence probabilities are obtained using back-off smoothing with Witten-Bell discounting.
Sentence occurrence probability of sentence 3: 1.49286e-008
Sentence occurrence probability of sentence 4: 1.51677e-011
Sentence occurrence probability of sentence 5: 1.01744e-010
Sentence occurrence probability of sentence 6: 5.67270e-011
Sentence occurrence probability of sentence 7: 1.05371e-010
Sentence occurrence probability of sentence 8: 3.20494e-010
Sentence occurrence probability of sentence 9: 1.53964e-011
Sentence occurrence probability of sentence 10: 8.87810e-010
Sentence occurrence probability of sentence 11: 3.07999e-010
Sentence occurrence probability of sentence 12: 1.49252e-011
Sentence occurrence probability of sentence 13: 1.03949e-009

この場合には、確率値をソートして文３の結果が最大となるため、文３が正解とすることができる。また、中国語構造助詞「的」の誤りを検出したいときに、文３と中国語翻訳文との差分から求められる。 In this case, sorting the probability values shows that sentence 3 has the highest value, so sentence 3 can be taken as the correct answer. When the error in the Chinese structural particle “的” is to be detected, it can be found from the difference between sentence 3 and the Chinese translation sentence.
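Selecting the most probable sentence and then locating the error by difference can be sketched as follows. The probabilities are the values quoted in the text; the helper `removed_tokens`, built on Python's standard `difflib`, is an assumption about how the difference might be taken.

```python
import difflib

# Sentence occurrence probabilities quoted in the text, keyed by sentence number.
PROBS = {3: 1.49286e-08, 4: 1.51677e-11, 5: 1.01744e-10, 6: 5.67270e-11,
         7: 1.05371e-10, 8: 3.20494e-10, 9: 1.53964e-11, 10: 8.87810e-10,
         11: 3.07999e-10, 12: 1.49252e-11, 13: 1.03949e-09}

best = max(PROBS, key=PROBS.get)  # sentence number with the highest probability

translation = ["那", "個", "在", "三", "列", "和", "五", "列", "的", "之間"]
sentence3 = ["那", "個", "在", "三", "列", "和", "五", "列", "之間"]


def removed_tokens(old, new):
    """List (index, token) pairs present in `old` but absent from `new`."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i, old[i])
            for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
            if tag in ("delete", "replace")
            for i in range(i1, i2)]


errors = removed_tokens(translation, sentence3)
```

With these inputs `best` is 3 and `errors` contains the single entry for 的 at index 8 of the translation.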

また、中国語品詞を単語クラスとして、中国語単言語コーパスで構築されたＣｌａｓｓＮ−ｇｒａｍモデルを用いることができる。
モデルの構築方法は、非特許文献１に記載されたＣｌａｓｓｂｉｇｒａｍモデルやＣｌａｓｓｔｒｉｇｒａｍモデルを使用できる。 Also, a Class N-gram model constructed with a Chinese monolingual corpus can be used with the Chinese part of speech as the word class.
As a model construction method, a Class bigram model or a Class trigram model described in Non-Patent Document 1 can be used.
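Non-Patent Document 1 is not reproduced here. Assuming the usual class-based factorization, in which each word w_k belongs to exactly one class c_k (here its Chinese part of speech), class bigram and class trigram models approximate the word-level conditional probabilities as follows. This is a sketch of the general technique, not necessarily the exact formulation used in the cited document.

```latex
\begin{align*}
P(w_k \mid w_{k-1}) &\approx P(c_k \mid c_{k-1})\, P(w_k \mid c_k) \\
P(w_k \mid w_{k-2}, w_{k-1}) &\approx P(c_k \mid c_{k-2}, c_{k-1})\, P(w_k \mid c_k)
\end{align*}
```

Because the class vocabulary is much smaller than the word vocabulary, the class-sequence probabilities can be estimated from fewer examples than word-level trigrams.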

ＣｌａｓｓＮ−ｇｒａｍモデルを用いて、第２言語翻訳文から対象語彙の誤りの検出および校正処理を行うアルゴリズムは、非特許文献１に記録されたＣｌａｓｓｂｉｇｒａｍモデルやＣｌａｓｓｔｒｉｇｒａｍモデルを用いて、第２言語翻訳文のうち、「的」の有無を考慮した文生成確率の最大となるものを最もらしい校正結果と推定すればよい。
また、中国語統計的モデルは、決定リスト、ＳＶＭ、最大エントロピー、ＨＭＭ、ベイズ学習のいずれかの学習手法で構築されてもよい。無論、これらの学習手法には限定されない。 As the algorithm for detecting and proofreading errors in the target vocabulary of the second language translation sentence using a class N-gram model, the class bigram model or class trigram model described in Non-Patent Document 1 can be used to estimate, as the most plausible proofreading result, the second language translation sentence variant that maximizes the sentence generation probability when the presence or absence of “的” is taken into account.
The Chinese statistical model may be constructed by any learning method of decision list, SVM, maximum entropy, HMM, and Bayesian learning. Of course, it is not limited to these learning methods.

このようにして、日本語入力文「それは、三列と五列の間にある」の中国語翻訳文が「那個在三列和五列的之間」となった場合、した処理によって、「那個在三列和五列之間」との正しい結果が得られる。
また、実施例１と同様に、中国語翻訳文の中から、図３に示す形式名詞の訳語および助動詞「だ」の訳語選択問題を解決することもできる。 In this way, when the Chinese translation sentence for the Japanese input sentence “それは、三列と五列の間にある” is “那個在三列和五列的之間”, the processing described above yields the correct result “那個在三列和五列之間”.
Similarly to the first example, the translation-selection problems for the formal nouns and the auxiliary verb “だ” shown in FIG. 3 can also be solved within the Chinese translation sentence.

次に、図１３を参照して、本発明の第２の実施例について説明する。図１３は、本発明の第２の実施例にかかる機械翻訳装置の構成を示すブロック図であり、図５と同じまたは同等部分には同一符号を付してある。 Next, a second example of the present invention will be described with reference to FIG. 13. FIG. 13 is a block diagram showing the configuration of the machine translation apparatus according to the second example of the present invention. Parts that are the same as or equivalent to those in FIG. 5 are given the same reference numerals.

本実施例は、前述した第２および第４の実施形態に対応するものである。図１３に示すように、本実施例は、図１０に示した第４の実施形態にかる機械翻訳装置のうち、第１言語入力部１０１、特定パターンＤＢ１０３、第２言語統計的モデル１０４、第２言語生成部１０６、および第２言語出力部１０９に代えて、それぞれ日本語入力部６０２、特定パターンＤＢ６０１、中国語統計的モデル６０３、中国語生成部６０４、および中国語出力部６０５を備えている。これらは、第４の実施形態の第１言語および第２言語を日本語および中国語に特化したものであり、実質的には第４の実施形態の構成要素と同等である。 This example corresponds to the second and fourth embodiments described above. As shown in FIG. 13, in place of the first language input unit 101, the specific pattern DB 103, the second language statistical model 104, the second language generation unit 106, and the second language output unit 109 of the machine translation apparatus according to the fourth embodiment shown in FIG. 10, this example includes a Japanese input unit 602, a specific pattern DB 601, a Chinese statistical model 603, a Chinese generation unit 604, and a Chinese output unit 605, respectively. These specialize the first and second languages of the fourth embodiment to Japanese and Chinese and are substantially equivalent to the components of the fourth embodiment.

FIG. 14 is a flowchart showing the machine translation processing according to the second example of the present invention; steps identical or equivalent to those in the previously described flowchart are given the same reference numerals.
Here, machine translation of the following Japanese input sentences into Chinese is described as an example.
「いちばん近いレストランの駐車場は満員です」 ("The parking lot of the nearest restaurant is full")
「何をぼんやり考えているのか」 ("What are you absent-mindedly thinking about?")
「Ｍ航空Ａ便東京行きはただ今から１番ゲートで搭乗を開始します」 ("M Airlines flight A to Tokyo will now begin boarding at Gate 1")

When the input sentences entered through the Japanese input unit 501 (step S61) are analyzed by the morpheme/syntax analysis unit 102 (step S12), the following morphological analysis results are obtained:
「いちばん／近い／レストラン／の／駐車場／は／満員／です」
「何／を／ぼんやり／考え／て／いる／の／か」
「Ｍ航空／Ａ便／東京／行き／は／ただ今／から／一番／ゲート／で／搭乗／を／開始／し／ます」
As described earlier, each morpheme obtained by morphological analysis carries its own attribute values.

The morpheme/syntax analysis unit 102 then parses the input sentence by applying syntax analysis rules to the morphological analysis result described above. Parsing yields the dependency relations between those words of the input sentence that stand in such a relation.
For example, FIG. 2 shows the result of analyzing the input sentence 「いちばん近いレストランの駐車場は満員です」 according to context-free grammar rules.
In addition, during Japanese morphological and syntactic analysis, the analysis rule applied at each stage can be attached to the morphemes, or to the items in a dependency relation, to which it was applied.

Next, the translation selection unit 401 uses the morpheme/syntax analysis result for the input sentence and the translation dictionary in the translation DB 108 to obtain translation candidates for each morpheme of the input sentence and stores them in the storage unit (step S62). At this point, the translation selection rules in the translation DB 108 are applied to obtain the best translation candidate for each morpheme, which is also stored in the storage unit. The translation selection rule applied during Chinese translation selection is likewise stored in the storage unit, attached to the Chinese translation of the word concerned as part of the analysis result.

Subsequently, the specific pattern detection unit 105 matches the morpheme/syntax analysis result for the input sentence, together with the translation selection result for each Japanese morpheme from the translation selection unit 401, against the specific patterns registered in the specific pattern DB 601. When a specific pattern is detected, it is stored in the storage unit as a candidate (step S63).
Next, the Chinese generation unit 604 generates the Chinese for the Japanese input sentence, that is, the Chinese translation, using the morpheme/syntax analysis result, the Japanese-Chinese translation dictionary, and the translation rules (step S64). In this Chinese translation, attribute information such as the Chinese part of speech is attached to each Chinese morpheme.

After that, the specific pattern detection unit 105 detects, in the obtained Chinese translation, specific patterns consisting of information that indicates the Chinese generation rules registered in the specific pattern DB 601. Among the patterns found, it selects those that match the specific patterns stored earlier as candidates and designates the corresponding parts of the Chinese translation as the targets of error detection and proofreading.
If the specific pattern detection unit 105 detects no specific pattern, error detection and proofreading are skipped; the Chinese translation obtained by the Chinese generation unit 604 is formatted and output from the Chinese output unit 605 (step S65).

If, on the other hand, the specific pattern detection unit 105 detects a specific pattern, the error detection/proofreading unit 107 uses the Chinese statistical model 603, which stores Chinese statistical co-occurrence information, to detect and correct errors in the targets identified in the Chinese translation on the basis of that pattern (step S15). The corrected result is output from the Chinese output unit 605 (step S65).
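The branch between steps S15 and S65 can be sketched as follows. This is a minimal illustration, not the apparatus itself: the function names, the token segmentation, and the hand-written `drop_particle_before_zhijian` proofreader (which stands in for the statistical model 603) are all assumptions introduced here.

```python
from typing import Callable, List


def output_translation(
    translation: List[str],
    targets: List[str],
    proofread: Callable[[List[str], List[str]], List[str]],
) -> str:
    """Return the final Chinese output for an already generated translation."""
    if not targets:
        # No specific pattern detected: skip proofreading and output as is (S65).
        return "".join(translation)
    # Specific pattern detected: proofread the targets (S15), then output (S65).
    return "".join(proofread(translation, targets))


def drop_particle_before_zhijian(tokens: List[str], targets: List[str]) -> List[str]:
    """Hypothetical stand-in proofreader: drop a target particle right before 之間."""
    kept = []
    for i, token in enumerate(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if token in targets and following == "之間":
            continue
        kept.append(token)
    return kept


raw = ["那個", "在", "三列", "和", "五列", "的", "之間"]
print(output_translation(raw, ["的"], drop_particle_before_zhijian))
print(output_translation(raw, [], drop_particle_before_zhijian))
```

Passing an empty target list models the case in which no specific pattern was detected, so the generated translation is output unchanged.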

In the specific pattern detection unit 105 described above, the morpheme/syntax analysis result for the input sentence 「いちばん近いレストランの駐車場は満員です」, for example, is matched against the entries of the specific pattern DB shown in FIG. 6. From the analysis rule applied to 「近い」 + 「レストラン」 in the input sentence, the label 「連体修飾の取り込み」 (incorporation of an adnominal modifier) is obtained. In Japanese-Chinese machine translation, the Chinese generation rule that corresponds to Japanese adnominal modification generally produces an attributive (定語) slot. The pattern with search ID 0 in FIG. 6 is therefore detected for the input sentence, the Chinese structural particle 「的」 attached to that pattern is taken as the target, and the appropriateness of 「的」 in the Chinese translation can then be judged.
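The matching step can be pictured with a small sketch. The record layout below is an assumption made for illustration (FIG. 6 may store other fields); only the rule label, the attributive slot, the particle 「的」, and search ID 0 come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpecificPattern:
    search_id: int
    source_rule: str      # analysis rule label on the Japanese side
    target_slot: str      # Chinese generation rule (slot) it corresponds to
    target_particle: str  # Chinese word whose appropriateness is checked later


@dataclass(frozen=True)
class Morpheme:
    surface: str
    translation: str
    applied_rule: Optional[str] = None  # rule attached during analysis, if any


# Hypothetical one-entry pattern DB modelled on the search ID 0 example.
PATTERN_DB = [SpecificPattern(0, "連体修飾の取り込み", "定語スロット", "的")]


def detect_candidates(morphemes: List[Morpheme],
                      pattern_db: List[SpecificPattern] = PATTERN_DB) -> List[SpecificPattern]:
    """Return the patterns whose source-side rule was applied to some morpheme."""
    applied = {m.applied_rule for m in morphemes if m.applied_rule}
    return [p for p in pattern_db if p.source_rule in applied]


analysis = [
    Morpheme("近い", "近", "連体修飾の取り込み"),
    Morpheme("レストラン", "飯店"),
]
print([p.target_particle for p in detect_candidates(analysis)])
```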

To ensure detection accuracy, the translation 「近」 of 「近い」 and the translation 「飯店」 of 「レストラン」 are used together with their parts of speech to decide whether 「的」 should be inserted between 「近」 and 「飯店」 in the generated Chinese. This reduces lexical ambiguity on the Chinese side.
For example, suppose the Chinese translation of the input sentence 「いちばん近いレストランの駐車場は満員です」 comes out as 「最近飯店的停車場満員」. Read back into Japanese, this result means 「最近、レストランの駐車場は満員です」 ("Recently, the restaurant's parking lot has been full"). In this incorrect Japanese sentence, 「最近」 is a temporal noun or an adverb, and the corresponding Chinese word 「最近」 is a temporal adverb. Performing error detection with the correspondence between words and part-of-speech information in the two languages taken into account therefore improves detection accuracy.

One example of the Chinese statistical model 603 is an N-gram model built from a monolingual Chinese corpus over surface forms or part-of-speech information. In that case, the calculation of formulas (1) to (3) or formula (4) described above can be used as the algorithm for judging the appropriateness of the Chinese structural particle 「的」.
Alternatively, a class N-gram model built from a monolingual Chinese corpus, with Chinese parts of speech as the word classes, can be used.
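As a rough illustration of how an N-gram model can decide whether 「的」 belongs between two words, the sketch below compares the sentence with and without the particle under an add-one smoothed bigram model. It does not reproduce formulas (1) to (4); the token segmentation, the counts, and the vocabulary size are made-up toy values.

```python
import math
from collections import Counter


def bigram_logprob(tokens, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a token sequence."""
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def choose_with_or_without(tokens, index, particle, unigrams, bigrams, vocab_size):
    """Return the better-scoring variant: unchanged, or with `particle` before tokens[index]."""
    without = list(tokens)
    with_particle = tokens[:index] + [particle] + tokens[index:]
    return max(
        (without, with_particle),
        key=lambda seq: bigram_logprob(seq, unigrams, bigrams, vocab_size),
    )


# Toy counts (assumptions, not corpus statistics).
UNIGRAMS = Counter({"最": 50, "近": 40, "的": 100, "飯店": 30})
BIGRAMS = Counter({("最", "近"): 20, ("近", "的"): 30, ("的", "飯店"): 30})

best = choose_with_or_without(["最", "近", "飯店"], 2, "的", UNIGRAMS, BIGRAMS, vocab_size=10)
print("".join(best))
```

With these toy counts the variant containing 「的」 scores higher. With no evidence at all (empty counts) the shorter, unchanged sequence wins, which is one reason a real system would restrict the comparison to detected targets.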

The model can be built using the class bigram model or class trigram model described in Non-Patent Document 1. To detect and correct word-order errors in a Chinese translation with a class N-gram model, the class bigram or class trigram model described in the non-patent document is used, and the candidate that maximizes the sentence generation probability, with the word order of the Chinese translation taken into account, is taken as the most plausible corrected result.
The Chinese statistical model 603 may also be built with any of the following learning methods: decision lists, SVM, maximum entropy, HMM, or Bayesian learning. It is, of course, not limited to these methods.
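For orientation, the usual class bigram factorization and the selection rule can be written as follows. This is the standard textbook form, given here as an assumption; whether Non-Patent Document 1 uses exactly this form is not stated in the text. Here $c_i$ denotes the class (part of speech) of word $w_i$, $c_0$ is a sentence-start class, and $\mathcal{C}$ is the set of candidate corrected sentences.

```latex
P(w_1, \dots, w_n) \approx \prod_{i=1}^{n} P(c_i \mid c_{i-1}) \, P(w_i \mid c_i),
\qquad
\hat{W} = \operatorname*{arg\,max}_{W \in \mathcal{C}} P(W)
```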

Likewise, when a Chinese adverbial (状語) is generated, the appropriateness of the corresponding structural particle 「地」 can be judged and corrected in the same way as above, and when a Chinese complement (補語) is generated, the same can be done for the corresponding structural particle 「得」.
When the Chinese translation contains two or more attributives, adverbials, or complements, the generated Chinese word order can be checked in the same way. For example, with the specific pattern DB shown in FIG. 7, if the Chinese translation of the Japanese example 「Ｍ航空Ａ便東京行きはただ今から１番ゲートで搭乗を開始します」 comes out as 「去往Ｍ航空Ａ航班東京従現在開始在一号登機口登機」, the algorithm described above can produce the correct Chinese translation 「去往東京Ｍ航空Ａ航班従現在開始在一号登機口登機」.
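The word-order check can be sketched as an exhaustive search over the orderings of the movable components, keeping the ordering with the highest sentence score. The sketch uses a plain word bigram table rather than a class N-gram model, and the segmentation, probabilities, and floor value are toy assumptions.

```python
import math
from itertools import permutations


def sequence_score(tokens, bigram_prob, floor=1e-6):
    """Sum of log bigram probabilities; unseen pairs get a small floor value."""
    return sum(math.log(bigram_prob.get(pair, floor)) for pair in zip(tokens, tokens[1:]))


def best_order(prefix, movable, suffix, bigram_prob):
    """Try every ordering of `movable` between prefix and suffix; keep the best one."""
    candidates = (prefix + list(p) + suffix for p in permutations(movable))
    return max(candidates, key=lambda toks: sequence_score(toks, bigram_prob))


# Toy probabilities (assumptions) that favour 去往 東京 Ｍ航空 Ａ航班.
TOY_BIGRAMS = {
    ("去往", "東京"): 0.3,
    ("東京", "Ｍ航空"): 0.2,
    ("Ｍ航空", "Ａ航班"): 0.4,
    ("Ａ航班", "従現在開始"): 0.2,
}

fixed = best_order(
    ["去往"],
    ["Ｍ航空", "Ａ航班", "東京"],
    ["従現在開始", "在一号登機口", "登機"],
    TOY_BIGRAMS,
)
print("".join(fixed))
```

Enumerating permutations is only practical for a handful of components, which fits the case of a few attributives, adverbials, or complements in one sentence.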

As in the second embodiment, translation quality can be greatly improved by performing error detection and proofreading on the Chinese translation with components such as the Chinese subject, predicate, object, attributive, adverbial, and complement considered together.
Also as in the second embodiment, if a table that maps Japanese particles or auxiliary verbs expressing voice and aspect (態相) information to the Chinese elements expressing the same information is described in the specific pattern DB 201, errors in the elements that express Chinese voice and aspect can be detected and corrected in the translation of the Japanese sentence.

For example, assume a Japanese-Chinese machine translation system and the Japanese example sentence 「風邪を引いていると思います。」 ("I think I have caught a cold."). If errors in the generated Chinese are detected using information from the Chinese statistical model alone, incorrect elements such as 「得感冒」 or 「得過感冒」 are likely to be estimated.
By contrast, suppose the three candidate Chinese aspect particles that correspond to the Japanese aspect marker 「ている」, namely 「着」, 「了」, and 「在」, are taken as the error detection targets for the Chinese translation. Then, when the Chinese translation of the Japanese sentence comes out as 「我認為得感冒。」, choosing the combination of a candidate (「着」, 「了」, or 「在」) with 「我認為得感冒。」 that maximizes the sentence generation probability yields the correct translation 「我認為得了感冒。」.
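The restriction to table-supplied candidates can be sketched as below. The table entry follows the 「ている」 example in the text; the insertion position, the segmentation, and the toy bigram scores are assumptions introduced for illustration.

```python
# Japanese aspect marker -> candidate Chinese aspect particles (from the example above).
ASPECT_TABLE = {"ている": ["着", "了", "在"]}

# Toy bigram log-scores (assumptions); unseen pairs get UNSEEN.
TOY_BIGRAM_LOGPROB = {("得", "了"): -1.0, ("了", "感冒"): -1.0}
UNSEEN = -5.0


def score(tokens):
    """Sentence score as the sum of toy bigram log-scores."""
    return sum(TOY_BIGRAM_LOGPROB.get(pair, UNSEEN) for pair in zip(tokens, tokens[1:]))


def proofread_aspect(tokens, position, japanese_marker):
    """Insert each table candidate at `position` and keep the best-scoring sentence."""
    candidates = ASPECT_TABLE[japanese_marker]
    options = [tokens[:position] + [c] + tokens[position:] for c in candidates]
    return max(options, key=score)


corrected = proofread_aspect(["我", "認為", "得", "感冒", "。"], 3, "ている")
print("".join(corrected))
```

Because only the three table candidates are scored, elements outside the table such as 「過」 are never proposed, which is the point of combining the pattern DB with the statistical model.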

As described above, the present invention can greatly improve the translation accuracy of a machine translation system.

The present invention has been described above with reference to embodiments and examples, but it is not limited to them. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

The machine translation apparatus, machine translation method, and program for translating a first language into a second language according to the present invention are suited to machine translation systems and similar applications: they address the high cost of post-editing the output of a conventional machine translation system and of writing translation rules, while at the same time greatly improving the translation quality of the apparatus.
As described above, the present invention mitigates the problem that errors in machine translation output are hard to correct and greatly improves translation quality, so that highly accurate translation results can be provided to the user.

Reference numerals: 100: machine translation apparatus; 101: first language input unit; 102: morpheme/syntax analysis unit; 103: specific pattern DB (first language only); 104: second language statistical model; 105: specific pattern detection unit; 106: second language generation unit; 107: error detection/proofreading unit; 108: translation DB; 109: second language output unit; 201: specific pattern DB (two languages); 401: translation selection unit; 501: Japanese input unit; 502: Chinese statistical model; 503: Chinese generation unit; 504: Chinese output unit; 601: specific pattern DB (Japanese-Chinese); 602: Japanese input unit; 603: Chinese statistical model; 604: Chinese generation unit; 605: Chinese output unit.

Claims

A machine translation apparatus comprising:
a morpheme/syntax analysis unit that performs morphological and syntactic analysis on an input sentence expressed in a first language;
a translation database comprising a translation dictionary or translation rules used to translate the first language into a second language;
a second language generation unit that refers to the analysis result of the morpheme/syntax analysis unit and to the translation database and generates a second language translation as the translation result corresponding to the input sentence;
a specific pattern database that stores specific vocabulary used in the first language as specific patterns;
a specific pattern detection unit that detects, in the analysis result of the morpheme/syntax analysis unit, a specific pattern stored in the specific pattern database and acquires from the translation database the second language vocabulary or syntax information corresponding to the analysis result of the detected specific pattern;
a second language statistical model that stores statistical co-occurrence information on the co-occurrence of vocabulary used in the second language; and
an error detection/proofreading unit that, using the second language vocabulary or syntax information acquired by the specific pattern detection unit and the statistical co-occurrence information stored in the second language statistical model, detects errors in the second language translation generated by the second language generation unit and proofreads that translation.

The machine translation apparatus according to claim 1,
wherein each specific pattern stored in the specific pattern database is described as a pair consisting of first language information given as an analysis result of the morpheme/syntax analysis unit and the second language information corresponding to that first language information.

The machine translation apparatus according to claim 1,
wherein the second language statistical model stores, for the vocabulary or syntax information of the second language, statistical co-occurrence information on at least one of the following: surface form, base form, inflection, part of speech, case frame, tense, voice, aspect, semantic class, or co-occurrence patterns having a dependency relation.

The machine translation apparatus according to claim 1,
wherein the error detection/proofreading unit uses the second language statistical model to perform, on the second language translation generated by the second language generation unit, any of detection of unnecessary components, detection of missing components, and detection of word-order errors, together with automatic correction of the errors found.

The machine translation apparatus according to claim 1,
wherein the second language is Chinese, the specific patterns stored in the specific pattern database include first language syntactic components corresponding to the Chinese attributive (定語) component, and the error detection/proofreading unit performs, on the attributive components of the Chinese translation generated by the second language generation unit, error detection for the word order of the attributives and for the structural particle 「的」, and automatic correction of the errors found.

The machine translation apparatus according to claim 1,
wherein the second language is Chinese, the specific patterns stored in the specific pattern database include first language syntactic components corresponding to the Chinese adverbial (状語) component, and the error detection/proofreading unit performs, on the adverbial components of the Chinese translation generated by the second language generation unit, error detection for the word order of the adverbials and for the structural particle 「地」, and automatic correction of the errors found.

The machine translation apparatus according to claim 1,
wherein the second language is Chinese, the specific patterns stored in the specific pattern database include first language syntactic components corresponding to the Chinese complement (補語) component, and the error detection/proofreading unit performs, on the complement components of the Chinese translation generated by the second language generation unit, error detection for the word order of the complements and for the structural particle 「得」, and automatic correction of the errors found.

The machine translation apparatus according to claim 1,
wherein the second language is Chinese, the specific patterns stored in the specific pattern database include first language syntactic components corresponding to the Chinese measure word (量詞) component, and the error detection/proofreading unit performs, on the measure word components of the Chinese translation generated by the second language generation unit, error detection for the Chinese measure words and automatic correction of the errors found.

The machine translation apparatus according to claim 1,
wherein the second language is Chinese, the specific patterns stored in the specific pattern database include first language syntactic components corresponding to the vocabulary and syntax information that expresses Chinese modal information, and the error detection/proofreading unit performs, on the components of the Chinese translation generated by the second language generation unit that express Chinese modal information, error detection for those components and automatic correction of the errors found.

The machine translation apparatus according to claim 1,
wherein the second language is Chinese, the specific patterns stored in the specific pattern database include first language syntactic components corresponding to the Chinese preposition (介詞), and the error detection/proofreading unit performs, on the preposition components of the Chinese translation generated by the second language generation unit, error detection for the Chinese prepositions and automatic correction of the errors found.

A machine translation method used in a machine translation apparatus that translates a first language into a second language, the method comprising:
a morpheme/syntax analysis step in which a morpheme/syntax analysis unit performs morphological and syntactic analysis on an input sentence expressed in the first language;
a second language generation step in which a second language generation unit refers to the analysis result of the morpheme/syntax analysis unit and to a translation database comprising a translation dictionary or translation rules used to translate the first language into the second language, and generates a second language translation as the translation result corresponding to the input sentence;
a specific pattern detection step in which a specific pattern detection unit detects, in the analysis result of the morpheme/syntax analysis unit, a specific pattern stored in a specific pattern database that stores specific vocabulary used in the first language as specific patterns, and acquires from the translation database the second language vocabulary or syntax information corresponding to the analysis result of the detected specific pattern; and
an error detection/proofreading step in which an error detection/proofreading unit, using the second language vocabulary or syntax information acquired by the specific pattern detection unit and the statistical co-occurrence information stored in a second language statistical model that stores statistical co-occurrence information on the co-occurrence of vocabulary used in the second language, detects errors in the second language translation generated by the second language generation unit and proofreads that translation.

A program that causes a computer of a machine translation apparatus that translates a first language into a second language to execute:
a morpheme/syntax analysis step in which a morpheme/syntax analysis unit performs morphological and syntactic analysis on an input sentence expressed in the first language;
a second language generation step in which a second language generation unit refers to the analysis result of the morpheme/syntax analysis unit and to a translation database comprising a translation dictionary or translation rules used to translate the first language into the second language, and generates a second language translation as the translation result corresponding to the input sentence;
a specific pattern detection step in which a specific pattern detection unit detects, in the analysis result of the morpheme/syntax analysis unit, a specific pattern stored in a specific pattern database that stores specific vocabulary used in the first language as specific patterns, and acquires from the translation database the second language vocabulary or syntax information corresponding to the analysis result of the detected specific pattern; and
an error detection/proofreading step in which an error detection/proofreading unit, using the second language vocabulary or syntax information acquired by the specific pattern detection unit and the statistical co-occurrence information stored in a second language statistical model that stores statistical co-occurrence information on the co-occurrence of vocabulary used in the second language, detects errors in the second language translation generated by the second language generation unit and proofreads that translation.