JP2004086919A

JP2004086919A - Mechanical translation system

Info

Publication number: JP2004086919A
Application number: JP2003356566A
Authority: JP
Inventors: Tetsuo Ueda (上田 哲夫); Katsuo Koga (古賀 勝夫)
Original assignee: CROSS LANGUAGE Inc
Current assignee: CROSS LANGUAGE Inc
Priority date: 2000-10-24
Filing date: 2003-10-16
Publication date: 2004-03-18

Abstract

PROBLEM TO BE SOLVED: To provide a mechanical translation system equipped with a word dictionary that allows a user to freely customize "the way of translation".
SOLUTION: This mechanical translation system is provided with: a word dictionary 9 in which, for every word, a headword, the equivalent term in translation when present, the grammatical attributes, and analysis information showing the relation with other words are registered; a grammar database 10 for storing the essential syntactic grammar; a morpheme analytic function part 3 for inputting a sentence and collating it with the word dictionary to divide it into morphemes; a syntax analytic function part 4 for extracting the head from the word group of the morphemes divided by the morpheme analytic function part 3, trying to generate syntax trees from the analysis information of the head, and judging conformity with the syntactic features of each word, thereby determining an optimum syntax tree; an equivalent term determining function part 5 for determining the equivalent term of each word in the syntax tree determined by the syntax analytic function part 4; and a translated sentence generating function part 6 for applying the equivalent term to each word of the syntax tree to generate a translated sentence.
COPYRIGHT: (C)2004,JPO

Description

The present invention relates to a machine translation system that automatically translates text written in one language into another language. In particular, it relates to a machine translation system in which the word dictionary stores, together with each word, the relationship between that word and other words (analysis information), and which performs well-grounded automatic translation using that analysis information, centered on the head, based on Head-Driven Phrase Structure Grammar (HPSG) theory.

Fifteen years have passed since commercial machine translation software first appeared in Japan. Many low-cost machine translation products are now available, and their use is spreading. The recent Internet boom in particular has been a tailwind: translation software is spreading rapidly and is among the best-selling categories of business application software.

However, current machine translation software is used almost entirely as a "speed-reading tool" for grasping the rough meaning of a whole text. As the "tool that supports the production of practical translations" that was originally hoped for, it is still far from mature, and its adoption has lagged.

In recent years, internationalization and openness have advanced rapidly around the world, and in Japan too there are more and more situations in which "practical translations" must be produced in large volume. What is awaited, therefore, is new machine translation software that supports "translation" in the true sense: software that produces not a rendering from which only the rough meaning can be understood, as existing machine translation software does, but a translation that is faithful to the original and conveys its meaning correctly to third parties.

In general, translation software consists of a "word dictionary", a "grammar database", and a "program".

The "word dictionary" holds, for each word, the part of speech, the translation equivalents, and various other information needed to analyze the source sentence and generate the target sentence. The "grammar database" describes, as rules, which parts of speech appear in which order and what syntactic structure they form for a sequence of words to constitute a sentence. The "program" takes the source text as input and, sentence by sentence, uses the word dictionary and the grammar database to determine the structure of the sentence (the "analysis" phase) and then generates the target sentence from that structure (the "generation" phase).

In the word dictionary of conventional machine translation software, the headword (spelling), translation equivalents, part of speech, and the like were registered for each word, whereas the grammar database registered various syntax patterns (grammar rules) that consisted solely of the ways in which parts of speech combine.

The configuration of a conventional machine translation system and the flow of its processing are described here.

FIG. 12 shows the configuration of a conventional machine translation system and the flow of its processing. The conventional machine translation system 11 includes a standard input unit 12, a morphological analysis function unit 13, a syntax analysis function unit 14, a translated word determination function unit 15, a generation tree generation/transformation function unit 16, a translated sentence generation function unit 17, and a standard output unit 18.

The machine translation system 11 also has a word dictionary 19 and a grammar database 20.

The word dictionary 19 describes the headword (spelling), part of speech (noun, verb, adjective, and so on), and translation equivalents of each word.

The grammar database 20 describes syntax rules (grammar rules) concerning parts of speech. A syntax rule concerning parts of speech is a rule that associates a syntax pattern with a sequence or occurrence pattern of parts of speech. The number of grammar rules runs to, for example, 2,000 to 3,000.

In the conventional machine translation system 11, an English character string is input through the standard input unit 12 and passed to the morphological analysis function unit 13.

The morphological analysis function unit 13 refers to the word dictionary 19 to extract words from the English character string and passes them, together with their part-of-speech information (noun, verb, adjective, adverb, and so on), to the syntax analysis function unit 14.

The syntax analysis function unit 14 determines a syntax tree from the part-of-speech sequence of the words in the English character string and from the grammar database 20.

For example, for the sentence "This is a pen.", the morphological analysis function unit 13 extracts "This" (noun), "is" (verb), "a" (article), and "pen" (noun), and a syntax pattern matching this part-of-speech sequence, as illustrated in FIG. 13, is retrieved from the grammar database 20. As described above, the grammar database 20 stores syntax patterns for a large number of part-of-speech sequences; the pattern that matches the part-of-speech sequence of the input English character string is found by pattern matching, and a syntax tree is built from it.
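The pattern-matching lookup described above can be pictured with a minimal sketch. The rule table, the pattern string, and the function name `match_syntax_pattern` are illustrative assumptions, not the data format of any actual product; a real grammar database of this kind would hold thousands of rules.

```python
# Hypothetical miniature of a conventional grammar database: each rule maps a
# part-of-speech sequence to a syntax pattern (the pattern notation is invented).
GRAMMAR_RULES = {
    ("noun", "verb", "article", "noun"): "S -> NP(noun) VP(verb NP(article noun))",
}

# Hypothetical word dictionary holding only headword -> part of speech.
WORD_DICTIONARY = {"this": "noun", "is": "verb", "a": "article", "pen": "noun"}


def match_syntax_pattern(sentence):
    """Return the syntax pattern whose part-of-speech sequence matches, or None."""
    words = sentence.rstrip(".").split()
    pos_sequence = tuple(WORD_DICTIONARY[w.lower()] for w in words)
    return GRAMMAR_RULES.get(pos_sequence)


pattern = match_syntax_pattern("This is a pen.")
print(pattern)
```

A sentence whose part-of-speech sequence is not covered by any rule simply returns `None`, which is the "omission in the grammar rules" failure mode of the conventional approach.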

The syntax analysis function unit 14 sends the syntax tree (syntax analysis data) obtained in this way to the translated word determination function unit 15.

The translated word determination function unit 15 performs semantic processing to determine the translation equivalents that correspond to the syntax tree, and outputs the syntax analysis data and the translated word data to the generation tree generation/transformation function unit 16.

The generation tree generation/transformation function unit 16 refers to the grammar database 20, generates a syntax tree for Japanese generation in accordance with specific translation rules, and outputs it to the translated sentence generation function unit 17.

The translated sentence generation function unit 17 translates each word of the syntax tree for Japanese generation into Japanese and outputs the Japanese translation data (a Japanese character string) through the standard output unit 18.

In machine translation by the conventional system described above, what decided the outcome of syntax analysis was the set of grammar rules registered in the grammar database.

If the grammar rules had a gap, the optimal syntax tree naturally could not be found. Even when the relevant rule was present, there was no means of identifying the optimal solution among the many solutions produced by rule-based analysis. If the "correct interpretation" was not chosen as the first solution, such a sentence always produced a wrong result. In that case, however the user specified words in the word dictionary, the grammar database itself did not produce an appropriate syntax tree, so an appropriate translation could not be obtained.

In other words, in conventional machine translation systems, both "analysis of the English sentence" and "generation of the translation" depended on the "grammatical categories" and "grammar rules" built into the system in advance. The grammar rules governed the main part of analysis and generation, and the only thing a user could change (customize) to obtain a desired translation was the word-level dictionary description, for example to make a particular translation equivalent appear. The range of what could be handled was therefore inherently limited.

The limits of translation by the conventional machine translation described above are explained here with a concrete example.

Consider translating the English character string "Time flies like an arrow." into Japanese.

In this English character string, both "flies" (to fly) and "like" (to be fond of) can be verbs, so either of these two words can be the predicate of the whole sentence.

When "flies" (to fly) is taken as the predicate, "Time flies like an arrow" has the syntax tree shown in FIG. 9.

In the syntax tree of FIG. 9, the predicate "flies" takes "Time" as its subject, and "like" and what follows it are interpreted as a prepositional phrase. The preposition "like" takes a word that it governs (here called its object), and that object is understood to be "an arrow".

Under this syntax tree, the input English character string "Time flies like an arrow." is translated as "時間は矢のように飛ぶ。" (a literal rendering: time flies in the manner of an arrow).

On the other hand, when "like" (to be fond of) is taken as the predicate, "Time flies like an arrow" has the syntax tree shown in FIG. 10.

In the syntax tree of FIG. 10, the head "like" takes "Time flies" (時間ハエ, a kind of fly) as its subject and "an arrow" as its object.

Under this syntax tree, the input English character string "Time flies like an arrow." is translated as "時間ハエは矢を好む。" (time flies are fond of arrows).

A conventional machine translation system could not select, from the two syntax trees described above, the one the user wanted.
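The ambiguity can be made concrete with a small sketch. The dictionary below is a hypothetical part-of-speech-only dictionary restricted to the readings discussed in the text; the names `POS_DICTIONARY` and `candidate_predicates` are assumptions made for illustration.

```python
# Hypothetical part-of-speech-only dictionary, limited to the readings
# discussed in the text ("flies" and "like" may each be a verb).
POS_DICTIONARY = {
    "time": {"noun"},
    "flies": {"noun", "verb"},
    "like": {"preposition", "verb"},
    "an": {"article"},
    "arrow": {"noun"},
}

# The two translations given in the text, keyed by the word taken as predicate.
READINGS = {
    "flies": "時間は矢のように飛ぶ。",
    "like": "時間ハエは矢を好む。",
}


def candidate_predicates(sentence):
    """List every word that the dictionary allows to act as the predicate."""
    words = sentence.rstrip(".").split()
    return [w for w in words if "verb" in POS_DICTIONARY[w.lower()]]


candidates = candidate_predicates("Time flies like an arrow.")
for word in candidates:
    print(word, "->", READINGS[word])
```

With part-of-speech information alone, both candidates survive and nothing in the dictionary lets the user express a preference between them.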

The first problem to be solved by the present invention is to provide a machine translation system having a word dictionary with which the user can freely customize the "way of translating".

The second problem to be solved by the present invention is to provide a machine translation system that can determine the optimal syntax tree from among the several syntax trees that the grammar permits.

The machine translation system according to claim 1 of the present application comprises:
a word dictionary in which, for each word, a headword, a translation equivalent where one exists, grammatical attributes, and analysis information indicating the relationship with other words are registered;
a grammar database that stores the principal syntactic grammar;
a morphological analysis function unit that takes a sentence as input and decomposes it into morphemes by collating it with the word dictionary;
a syntax analysis function unit that extracts a head from the words among the morphemes decomposed by the morphological analysis function unit and determines a syntax tree by selecting, from the analysis information of the head, what matches the syntactic features of the words of the preceding and following morphemes;
a translated word determination function unit that determines the translation equivalent of each word in the syntax tree determined by the syntax analysis function unit; and
a translated sentence generation function unit that generates a translation by applying the translation equivalents to the words of the syntax tree.

The machine translation system according to claim 2 of the present application is the system of claim 1, wherein:
generation information describing a special translation rule that applies when a given word satisfies a user-specified condition is registered in the word dictionary; and
the system has a generation tree generation/transformation function unit that transforms, according to the translation rule, the syntax tree determined by the syntax analysis function unit.

The machine translation system according to claim 3 of the present application is the system of claim 1 or 2, wherein:
the word dictionary registers, as analysis information of a word, the semantic attributes of the words it relates to when that word is the head, and semantic attributes are registered in the analysis information of words that can be related to a head.

The machine translation system according to claim 4 of the present application is the system of any one of claims 1 to 3, further comprising:
dictionary registration means that lets the user register and update at least one of the semantic attributes, the analysis information, and the generation information of a word.

The machine translation system according to claim 5 of the present application is the system of any one of claims 1 to 4, wherein:
the word dictionary is one in which, when a word has a plurality of semantic attributes, items of analysis information, or items of generation information, the user specifies through the dictionary registration means the priority of the semantic attribute, analysis information, or generation information to be applied.

The machine translation system according to claim 6 of the present application is the system of any one of claims 1 to 4, wherein:
when a word has a plurality of semantic attributes, items of analysis information, or items of generation information, the syntax analysis function unit retrieves from the word dictionary the semantic attribute, analysis information, or generation information that was applied most recently.

According to the present invention, as a preliminary step toward a so-called "self-learning machine translation system" (a system that, when translating a sentence closely resembling but subtly different from a previously translated one, generates the translation with reference to the earlier one), the conventional dictionary description (word, part of speech, translation equivalent) has been extended so that, in addition to conventional word-level dictionary registration, analysis information and generation information are described in the word dictionary for each word.

As a result, the user can write arbitrary rules into the analysis information and generation information of the word dictionary, and can customize syntax analysis and translation generation over a much wider range, including set phrases.

Because each such customization applies only to the translation of the word for which it is defined, the user can customize translation flexibly and in fine detail, word by word.

In addition, by including the semantic attributes of words in the analysis information, the head and the words related to it can be selected correctly, which makes it possible to select the correct syntax tree from among the several syntax trees that are grammatically possible.
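As a rough sketch of how semantic attributes might narrow down grammatically possible trees: the attribute names ("abstract", "animal", "human") and the constraints assigned to each head below are invented for illustration and are not taken from the text, which does not specify concrete attribute values at this point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    head: str          # word taken as the head of the sentence
    subject: str       # subject phrase under this reading
    subject_attr: str  # hypothetical semantic attribute of the subject


# Hypothetical analysis information: the semantic attributes each head
# accepts for its subject. A user could edit this table to change the result.
ACCEPTED_SUBJECT_ATTRS = {
    "flies": {"abstract", "animal", "vehicle"},
    "like": {"human"},
}

CANDIDATES = [
    Reading(head="flies", subject="Time", subject_attr="abstract"),
    Reading(head="like", subject="Time flies", subject_attr="animal"),
]


def select_readings(candidates):
    """Keep only readings whose subject fits the head's semantic constraint."""
    return [
        r for r in candidates
        if r.subject_attr in ACCEPTED_SUBJECT_ATTRS[r.head]
    ]


selected = select_readings(CANDIDATES)
print([r.head for r in selected])
```

Under these assumed constraints only the "flies" reading survives; adding "animal" to the entry for "like" would let the other reading through, which is the kind of per-word customization the text describes.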

Furthermore, a front-end processor for kanji conversion equipped with a word dictionary that registers semantic attributes together with each word, and that registers the semantic attributes of the words that function words and predicates can take, can select the correct kanji from among multiple homophones on the basis of the relationship between the semantic attributes of the words.

The machine translation system according to the present invention is described concretely below with reference to the drawings.

FIG. 1 is a block diagram of one embodiment of the machine translation system according to the present invention. The present invention is not limited to translation between particular languages, but for ease of understanding an example of translating from English into Japanese is shown here. Accordingly, the words "English" and "Japanese" in the description below are to be read as the source language and the target language, as appropriate to the languages being translated.

In terms of configuration, the machine translation system according to the present invention has almost the same components as a conventional machine translation system. However, the content registered in its dictionary differs greatly from that of a conventional dictionary, and accordingly its methods of syntax analysis and of generation tree generation/transformation differ greatly from those of a conventional machine translation system.

The machine translation system 1 of FIG. 1 includes a standard input unit 2, a morphological analysis function unit 3, a syntax analysis function unit 4, a translated word determination function unit 5, a generation tree generation/transformation function unit 6, a translated sentence generation function unit 7, and a standard output unit 8.

The machine translation system 1 also has a word dictionary 9 and a grammar database 10.

The standard input unit 2 and the standard output unit 8 may be any known input means and output means.

The word dictionary 9 of the present invention is a dictionary in which, for each word, the following are registered: a headword, a translation equivalent where one exists, grammatical attributes, analysis information indicating the relationship with other words, and generation information describing a special translation rule that applies when a given condition is satisfied.

"Grammatical attributes" means information such as part of speech, number, person, and case. "Analysis information indicating the relationship with other words" means information describing, for example, what kind of word a given word requires as its subject or what kind of word it requires as its complement. "Generation information" means information describing how to translate when the words of the input sentence satisfy a certain arrangement condition, so that the translation reads naturally as Japanese.

The headword, translation equivalents, grammatical attributes, and the like of each word are stored in a part of the word dictionary 9 called the content part; the analysis information is stored in the analysis binary part of the word dictionary 9; and the generation information is stored in the generation binary part of the word dictionary 9.
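One way to picture an entry spanning these three parts is the following sketch. The class names, field names, and the sample values for "like" are assumptions made for illustration; the text does not specify the actual storage layout of the content part or of the two binary parts.

```python
from dataclasses import dataclass, field


@dataclass
class ContentPart:
    """Headword, translation equivalents, and grammatical attributes."""
    headword: str
    translations: list
    grammatical_attributes: dict


@dataclass
class WordEntry:
    """One dictionary entry; the list layouts are hypothetical."""
    content: ContentPart
    # Analysis information: what the word requires of related words.
    analysis_info: list = field(default_factory=list)
    # Generation information: special translation rules, if any.
    generation_info: list = field(default_factory=list)


# Hypothetical entry for the preposition "like".
like_preposition = WordEntry(
    content=ContentPart(
        headword="like",
        translations=["のように"],
        grammatical_attributes={"part_of_speech": "preposition"},
    ),
    analysis_info=[{"role": "object", "part_of_speech": "noun"}],
)

print(like_preposition.content.headword, like_preposition.analysis_info)
```

Keeping the analysis and generation information on the entry itself, rather than in global grammar rules, is what lets a customization stay local to one word.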

The grammar database 10 of the present invention is a dictionary that stores the grammar of the principal syntactic structures, such as sentence patterns. Whereas the grammar database of a conventional machine translation system stored a vast number of detailed grammar rules for each part of speech and word form (for example, 2,000 to 3,000 rules), the grammar database 10 of the present invention stores only a few dozen rules, such as the basic sentence patterns.

The morphological analysis function unit 3 is means for decomposing the input character string (sentence) into quotation marks, parentheses, and dashes (collectively called block data) and words. The morphological analysis function unit 3 recognizes block data in the input character string, splits the string at the block data and at spaces, and extracts the words.

In this specification, block data and words are collectively called "morphemes".

The morphological analysis function unit 3 receives an English character string from the standard input unit 2 and, as described above, creates from it a block data list consisting of the quotation marks, parentheses, and dashes. It then extracts the word strings that remain when the block data are removed from the English character string, looks up each word in the content part of the word dictionary 9, and creates word data for each word. The morphological analysis function unit 3 combines the word data with the block data list created first and outputs them to the syntax analysis function unit 4 as morpheme data.
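The split into a block data list and word data could look roughly like the sketch below. The regular expression, the position-based block list, and the name `analyze_morphemes` are assumptions for illustration; the dictionary here is a plain mapping standing in for the content part.

```python
import re

# Marks treated as block data in this sketch: double quote, parentheses, dashes.
BLOCK_MARKS = {'"', "(", ")", "-", "\u2014"}

# A word (optionally with internal apostrophes or hyphens) or a single block mark.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*|[\"()\u2014-]")


def analyze_morphemes(text, dictionary):
    """Split text into a block data list and word data (hypothetical layout)."""
    blocks = []  # (token position, mark)
    words = []   # one record per word, with its dictionary entry if found
    for position, token in enumerate(TOKEN_RE.findall(text)):
        if token in BLOCK_MARKS:
            blocks.append((position, token))
        else:
            words.append({
                "position": position,
                "surface": token,
                "entry": dictionary.get(token.lower()),
            })
    return {"blocks": blocks, "words": words}


result = analyze_morphemes('He said "time flies" (often).', {"time": "noun"})
print(result["blocks"])
print([w["surface"] for w in result["words"]])
```

Recording token positions in the block list is one simple way to let a later stage regroup the words enclosed by each pair of marks into a single phrase.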

The syntax analysis function unit 4 is means for determining the optimal syntax tree (the sentence structure, that is, a tree structure expressing the relationships among the words) from the morpheme data.

The syntax analysis function unit 4 receives the morpheme data from the morphological analysis function unit 3 and, referring to the analysis binary part of the word dictionary 9, converts all the word data in the morpheme data into phrase structure data. Here, a "phrase" is a group of several words that together function in the same way as a noun, adjective, adverb, or the like. When converting to phrases, the syntax analysis function unit 4 follows the block data list contained in the morpheme data received from the morphological analysis function unit 3 and analyzes the sentence so that each designated block (a part enclosed in quotation marks, parentheses, or dashes) is grouped as a single phrase.

Next, the syntax analysis function unit 4 performs syntax analysis using the phrase structure data, the analysis binary part of the word dictionary 9, and the information in the grammar database 10, creates syntax analysis data for the whole sentence (data representing the syntax tree), and outputs it to the translated word determination function unit 5.

This syntax analysis is a process that, centered on the head of the sentence (often the verb of the sentence), determines the optimal syntax using the analysis information registered for the head word, such as its grammatical relationships (part of speech, number, person, and so on) with the subject, predicate, and complements, and semantic attributes (described later). This syntax analysis is explained again later with a concrete example.

The translated word determination function unit 5 is means for determining, for each word, the translation equivalent that fits the syntax analysis data.

The translated word determination function unit 5 receives the syntax analysis data from the syntax analysis function unit 4 and decides which translation equivalent to adopt on the basis of the structure of the whole sentence and the translation selection information of each word. When the sentence has more than one interpretation, it calculates a weight for each interpretation and selects the one with the smallest weight as the solution. It then outputs the syntax analysis data and the translated word data to the generation tree generation/transformation function unit 6.
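The selection rule stated here (smallest weight wins) reduces to a minimum search. The weight values below are invented, and the text does not say how weights are calculated, so this sketch covers only the final selection step.

```python
# Hypothetical interpretations with already-computed weights.
interpretations = [
    {"head": "flies", "translation": "時間は矢のように飛ぶ。", "weight": 3},
    {"head": "like", "translation": "時間ハエは矢を好む。", "weight": 7},
]


def select_interpretation(candidates):
    """Return the interpretation with the smallest weight.

    On a tie, min() keeps the earliest candidate in the list.
    """
    return min(candidates, key=lambda c: c["weight"])


best = select_interpretation(interpretations)
print(best["translation"])
```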

The generation tree generation/transformation function unit 6 is means for transforming the original syntax tree, when specific conditions are met, so as to produce a generation tree for Japanese translation (a syntax tree for Japanese generation) that yields a translation that reads naturally as Japanese.

Specifically, the generation tree generation/transformation function unit 6 receives the syntax analysis data and the translated word data from the translated word determination function unit 5. Based on the arrangement, form, and so on of the words they contain, and following the generation information described in the generation binary part of the word dictionary 9, it creates a syntax tree for Japanese generation, or transforms the original syntax tree so that more readily understandable Japanese is generated. After transformation, the generation tree generation/transformation function unit 6 outputs the syntax tree for Japanese generation to the translated sentence generation function unit 7.

The translated sentence generation function unit 7 is means for applying the translation equivalents to the syntax tree for Japanese generation and outputting Japanese translation data (a Japanese character string).

Specifically, the translated sentence generation function unit 7 receives the syntax tree for Japanese generation from the generation tree generation/transformation function unit 6, creates a Japanese character string in accordance with the translation equivalents described in the word dictionary 9, and outputs it to the standard output unit 8.

The translated sentence generation function unit 7 is assumed to satisfy the following requirements.
Information that has no one-to-one correspondence with English (for example, modalities such as permission and obligation) is recorded in the content part of the word dictionary 9 in the form of supplementary information for Japanese (supplementary generation information).

A conjugation table for conjugating words is also described in the generation binary part of the word dictionary 9, and conjugating words are conjugated according to this table. The table also describes how a conjugating word changes depending on the additional information mentioned above.

The generation binary part of the word dictionary 9 also describes, according to the parent-child relationships in the generation tree, the words to be attached to each word (such as particles attached to nouns). The words to be attached to a word or phrase are added according to this data.

Of the constituent means of the machine translation system 1, the generation tree generation/transformation function unit 6 can be omitted depending on the purpose of the system. For example, it can be omitted in a simple system intended only for literal or rough draft translation.

Although no means for customizing the dictionary is shown for the machine translation system 1 above, a system to which a dictionary registration means for customizing the word dictionary 9 has been added is also included in the present invention.

The analysis binary part of the word dictionary 9 may either describe only the grammatical attributes of the words to which a head relates, or additionally describe the semantic categories (semantic attributes) of those words. The word dictionary 9 in each form, and the translation method using it, are described below with specific examples.

First, an outline of HPSG theory. HPSG stands for Head-Driven Phrase Structure Grammar. As the name suggests, the central concept of the theory is the head, the word at the center of a sentence or phrase. In HPSG, every phrase or sentence has a central word, its head, and the properties of the phrase are described in the word that serves as its head.

The concept of the head is explained below using the sentence "I go." as an example. This example covers the case in which semantic attribute information is not used in determining the syntax.

The sentence "I go." is simply a subject followed by a verb, but it holds as a sentence only when the head, the word "go", has the property of taking a subject and "I" satisfies the condition on that subject. In the present invention, the word dictionary 9 records, along with the word "go", that "go" is a word that can be a head and that it has the property of taking a subject. This property and the condition on the subject constitute the analysis information of "go".

Similarly, "I see you." holds as a sentence only when the head, the word "see", has both the property of taking a subject and the property of taking an object, "I" satisfies the condition on the subject, and "you" satisfies the condition on the object. In this case, the analysis information of "see" (that it takes a subject and an object, together with the conditions on each) is recorded along with the word "see" in the word dictionary 9.

To state concretely the subject condition that "I" in "I go." must satisfy, "syntactic features" are defined for each word in the word dictionary 9. Syntactic features cover the properties of the elements that make up the syntax, such as a word's grammatical attributes and analysis information. They take the form of a set of features and their values, and are defined, for example, as shown in FIG. 2.

In FIG. 2, "part of speech" on the left-hand side is the feature (name), and "noun, verb, ..." on the right-hand side are the feature values ("{}" means that the feature takes exactly one of the elements inside the braces as its value). Based on this definition, the syntactic features of each word can be described as in FIG. 3.
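The feature-and-value format can be pictured with a small sketch. The Python fragment below is only an illustration and not the patent's actual dictionary format: the feature names (`pos`, `person`, `number`, `subject_phrase`), the permitted value sets, and the helper `validate` are all assumptions introduced here, since FIG. 2 and FIG. 3 are not reproduced in the text.

```python
# Hypothetical feature definitions in the spirit of FIG. 2: each feature
# takes exactly one value from its set (the "{}" notation in the text).
FEATURE_DEFINITIONS = {
    "pos": {"noun", "verb", "adjective", "preposition"},
    "person": {"first", "second", "third"},
    "number": {"singular", "plural"},
}

# Hypothetical word dictionary in the spirit of FIG. 3. The nested
# "subject_phrase" entry is the analysis information of "go".
WORD_DICTIONARY = {
    "I": {"pos": "noun", "person": "first", "number": "singular"},
    "go": {
        "pos": "verb",
        "subject_phrase": {"pos": "noun", "person": "first", "number": "singular"},
    },
}


def validate(features):
    """Return True if every feature holds one permitted value."""
    for name, value in features.items():
        if name == "subject_phrase":
            # The subject condition is itself a set of features.
            if not isinstance(value, dict) or not validate(value):
                return False
        elif not isinstance(value, str) or value not in FEATURE_DEFINITIONS.get(name, set()):
            return False
    return True
```

Keeping the subject condition as a nested feature set means the same check applies to a word's own attributes and to the conditions it places on other words.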

FIG. 3 shows a "word dictionary" defining the syntactic features of each word in "I go." When a word has several syntactic features, they are separated by slashes.

The "subject phrase = <...>" entry for "go" is the analysis information of "go". It is a syntactic feature whose role is to specify the condition on the word's subject. For "goes", the third-person singular present form of "go", the value of "subject phrase = <...>" changes from "person = first person" for "go" to "person = third person".

For the string "I go" to hold as a sentence, the condition in the "subject phrase = <...>" feature of "go" must match the syntactic feature values of "I". To make this restriction explicit, a grammar rule such as that in FIG. 4 is defined.

In FIG. 4, "new phrase → phrase 1 phrase 2" in expression (1) indicates that the newly created phrase is composed of phrase 1 followed by phrase 2, and "phrase 2 [subject phrase = <subject condition>]" indicates that the value of the "subject phrase" feature of phrase 2 is represented by the variable "subject condition".

The following explains how this grammar rule is applied. Fitting the string "I go" to the rule gives phrase 1 = "I" and phrase 2 = "go". The analysis information of "go" is described in the dictionary in the form of syntactic features, as shown in FIG. 5.

Applying FIG. 5 to "phrase 2 [subject phrase = <subject condition>]" in the grammar rule gives FIG. 6. The part of the rule from "(2) if" onward states the conditions on each phrase. "Phrase 1 = subject condition" expresses the condition that the syntactic features of phrase 1 and the syntactic features assigned to the variable "subject condition" match without contradiction.

For the sentence "I go", the two feature sets in FIG. 7 match with no contradiction, so the condition is satisfied and the grammar rule applies.

The part of the rule from "(3) then" onward states what phrase is created by applying the rule. "New phrase: subject = phrase 1" indicates that the "subject" of the new phrase is phrase 1, and "new phrase: head = phrase 2" indicates that the "head" of the new phrase is phrase 2. In other words, in the newly created phrase (the phrase corresponding to the whole string "I go"), the "subject" feature takes the content of phrase 1 = "I" as is, and the "head" feature takes the content of phrase 2 = "go" as is. "I go" is therefore created as a phrase structured as shown in FIG. 8.
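The three parts of the rule (the phrase pattern, the "if" condition, and the "then" result) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the dictionary keys (`word`, `pos`, `person`, `number`, `subject_phrase`) and function names are hypothetical, and the matching step is a plain check that every feature in the subject condition has the same value in phrase 1, not full feature-structure unification as used in HPSG implementations.

```python
def matches(features, condition):
    """True if every feature in the condition has the same value in features."""
    return all(features.get(name) == value for name, value in condition.items())


def apply_subject_head_rule(phrase1, phrase2):
    """Combine phrase1 (candidate subject) with phrase2 (candidate head).

    Returns the new phrase, or None when the rule does not apply.
    """
    # (1) phrase 2 [subject phrase = <subject condition>]
    condition = phrase2.get("subject_phrase")
    # (2) if: phrase 1 must match the subject condition.
    if condition is None or not matches(phrase1, condition):
        return None
    # (3) then: the new phrase has subject = phrase 1 and head = phrase 2.
    return {"subject": phrase1, "head": phrase2}


# Hypothetical dictionary entries for illustration only.
PRONOUN_I = {"word": "I", "pos": "noun", "person": "first", "number": "singular"}
GO = {
    "word": "go",
    "pos": "verb",
    "subject_phrase": {"pos": "noun", "person": "first", "number": "singular"},
}
GOES = {
    "word": "goes",
    "pos": "verb",
    "subject_phrase": {"pos": "noun", "person": "third", "number": "singular"},
}

phrase = apply_subject_head_rule(PRONOUN_I, GO)      # rule applies
rejected = apply_subject_head_rule(PRONOUN_I, GOES)  # person mismatch
```

Because the condition lives in the entry for the head word, changing how "go" combines with its subject means editing that one entry, with no change to the rule itself.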

Thus, in the example sentence "I go.", "go" takes a subject consisting of a first-person singular noun and "I" is a first-person singular noun, so "I" becomes the subject of "go" and the syntax "subject + verb" is determined.

In the syntax determination process of the above example, only the grammatical attributes of words serve as criteria. Next, syntax analysis that also uses semantic attributes as criteria is explained using the earlier example "Time flies like an arrow."

As noted earlier, in "Time flies like an arrow." either "flies" (to fly) or "like" (to like) can be the head.

When "flies" (to fly) is taken as the head, "Time flies like an arrow" yields the syntax tree shown in FIG. 9, as described earlier.

In the syntax tree of FIG. 9, the head "flies" takes "Time" as its subject, and "like" and what follows are interpreted as a prepositional phrase. Within the prepositional phrase, "like" takes "an arrow" as its object. In this case, "Time flies like an arrow." is interpreted as 「時間は矢のように飛ぶ。」 (a literal translation: time flies in the manner of an arrow).

On the other hand, when "like" (to like) is taken as the head, "Time flies like an arrow" yields the syntax tree shown in FIG. 10.

In the syntax tree of FIG. 10, the head "like" takes "Time flies" (時間ハエ, as if "time flies" were a kind of fly) as its subject and "an arrow" as its object. In this case, "Time flies like an arrow." is interpreted as 「時間ハエは矢を好む。」 (time flies are fond of arrows).

Conventional machine translation systems could not select the best of these two syntax trees, because both are grammatically possible. In the present invention, however, the word dictionary 9 can register, as analysis information of "like", that when "like" is taken as the head meaning "to like", its subject has a semantic attribute denoting a person. Likewise, it can register, as analysis information of "flies", that when "flies" is taken as a noun (flies, the insects), it has a semantic attribute denoting an insect.

Consequently, if the subject is "flies" (the insects, with a semantic attribute denoting an insect) as in the syntax tree of FIG. 10, the semantic attribute of the subject "flies" does not match the semantic attribute required by the head "like". By contrast, when "flies" is taken as the head meaning "to fly", as in the syntax tree of FIG. 9, no such mismatch arises. The machine translation system 1 of the present invention therefore rejects the syntax tree of FIG. 10 and selects that of FIG. 9.
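The selection between the two candidate trees can be sketched as a filter over semantic attributes. The Python fragment below is a minimal illustration, not the system's actual procedure: the tree representation, the table names, and the attribute labels ("person", "insect", "time") are assumptions made for this example, and each tree is reduced to its head and the head noun of its subject.

```python
# Hypothetical analysis information: semantic attribute a head verb
# requires of its subject. "flies" (to fly) has no entry, so it imposes
# no semantic restriction in this sketch.
REQUIRED_SUBJECT_SEMANTICS = {"like": "person"}

# Hypothetical semantic attributes of nouns used as subjects.
NOUN_SEMANTICS = {"flies": "insect", "Time": "time"}

# The two grammatically possible parses, reduced to head and subject noun.
CANDIDATE_TREES = [
    {"name": "FIG. 9", "head": "flies", "subject_noun": "Time"},
    {"name": "FIG. 10", "head": "like", "subject_noun": "flies"},
]


def is_semantically_consistent(tree):
    """True if the subject satisfies the head's semantic requirement, if any."""
    required = REQUIRED_SUBJECT_SEMANTICS.get(tree["head"])
    if required is None:
        return True
    return NOUN_SEMANTICS.get(tree["subject_noun"]) == required


def select_trees(candidates):
    """Keep only the candidate trees with no semantic mismatch."""
    return [tree for tree in candidates if is_semantically_consistent(tree)]


selected = [tree["name"] for tree in select_trees(CANDIDATE_TREES)]
```

In this sketch the FIG. 10 reading is discarded because "flies" carries "insect" while "like" asks for "person", which mirrors the reasoning in the text.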

Thus, whereas conventional machine translation systems have no means of judging which syntax tree is best, the machine translation system according to the present invention can determine the appropriate syntax tree.

Finally, the generation tree generation/transformation function of the present invention is described.

HPSG is primarily a theory for syntax analysis, but the applicant also uses its framework to develop modules other than syntax analysis. This not only allows finer-grained syntax analysis than conventional machine translation systems, but also allows the dictionary to describe context information for translated word determination (for example, "modified by a particular modifier") and source-target correspondence information for generation tree generation/transformation (for example, "an adjective in the source sentence but an adverb in the translation"). As a result, translation behavior can be configured in finer detail than in conventional machine translation systems.

The machine translation system 1 of this embodiment performs generation tree generation/transformation, under certain conditions, in order to translate into natural Japanese.

For example, consider the sentence "He is a good swimmer."

FIG. 11 shows generation tree generation/transformation and translated sentence generation for this example sentence.

A conventional machine translation system treats "he" as the subject, "is" as the verb, and "a good swimmer" as the complement, and, generating on the pattern "subject is complement", produces a literal-sounding translation such as 「彼は、良い泳ぎ手である。」.

Based on HPSG theory, the machine translation system 1 of the present invention parses the sentence with "is" as the head, "he" as the subject, and "a good swimmer" as the complement.

Next, suppose the generation binary part of the word dictionary 9 records the following generation information for "is": "when the noun of the complement is a noun that can become a verb (swimmer) and is modified by an adjective (good), translate the noun with a verb expression, nominalize it by attaching のが, and translate the whole with the がが construction", and suppose the verb translation 泳ぐ (to swim) is registered in the dictionary description of "swimmer". The machine translation system 1 then generates or converts a generation tree (syntax tree for Japanese generation) suited to the verb expression of the noun and the adverb expression of the adjective, as shown in FIG. 11.

The machine translation system 1 then applies translated words to this generation tree and generates the translation 「彼は泳ぐのがうまい」 (he is good at swimming).
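The effect of this generation information can be sketched as a conditional rewrite. The Python fragment below is a rough illustration under stated assumptions, not the system's generation procedure: the dictionary layout, the keys `verb_translation` and `transformed_translation`, the extra entry for "teacher", and the function `translate_copula` are all hypothetical, and the output is assembled by string concatenation instead of by building and walking a generation tree.

```python
# Hypothetical dictionary entries. "verb_translation" marks a noun that
# can be rendered as a verb; "transformed_translation" is the form of the
# adjective used in the transformed construction.
DICTIONARY = {
    "he": {"translation": "彼"},
    "good": {"translation": "良い", "transformed_translation": "うまい"},
    "swimmer": {"translation": "泳ぎ手", "verb_translation": "泳ぐ"},
    "teacher": {"translation": "先生"},
}


def translate_copula(subject, adjective, noun):
    """Translate "<subject> is a <adjective> <noun>." into Japanese."""
    subject_ja = DICTIONARY[subject]["translation"]
    adjective_entry = DICTIONARY[adjective]
    noun_entry = DICTIONARY[noun]
    if "verb_translation" in noun_entry:
        # Condition met: complement noun has a verb translation and is
        # modified by an adjective, so use verb + のが + adjective.
        verb = noun_entry["verb_translation"]
        predicate = adjective_entry.get(
            "transformed_translation", adjective_entry["translation"]
        )
        return subject_ja + "は" + verb + "のが" + predicate
    # Otherwise fall back to the literal "subject is complement" pattern.
    return (
        subject_ja + "は" + adjective_entry["translation"]
        + noun_entry["translation"] + "である"
    )


natural = translate_copula("he", "good", "swimmer")
literal = translate_copula("he", "good", "teacher")
```

The transformed form is produced only when the registered condition holds, so nouns without a verb translation still receive the literal pattern.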

By recording generation information of this kind in detail in the word dictionary 9, the unnaturalness of conventional literal-style translations can be overcome and more natural Japanese translations can be generated.

As described above, the machine translation system according to the present invention records in the word dictionary, for each word, the headword, translated word, grammatical attributes, analysis information, generation information, and so on (the syntactic features).

For this reason, the machine translation system according to the present invention can readily be provided with a "dictionary registration means" that lets the user freely customize translation.

That is, when the semantic attributes, analysis information, and generation information of a word in the word dictionary are registered or updated through the dictionary registration means, the word behaves as having the semantic attributes, analysis information, and generation information the user specified.

A major advantage here is that the analysis information a user registers for a word is applied only to the translation of that word.

In a conventional machine translation system, grammar rules must be defined in order to prescribe how translation is done. Once defined, however, a grammar rule applies to every word, not only to the word it was intended for, with the drawback that an unwanted translation can be applied in unexpected places. By contrast, with the method of the present invention, in which analysis information is defined per word, the analysis information is applied only to the translation of that word, enabling fine-grained user customization.

Language is arguably best characterized as words combining with other words to build up meaning, yet translation software based on such a lexical theory had not previously appeared. The HPSG-based method of the present invention is characterized in that most of the syntactic information used to grasp sentence structure can be described in the lexicon, that is, in the dictionary, and not in grammar rules. For translated word selection and translated sentence generation as well, more detailed generation rules than before can be described in the dictionary to improve accuracy. Consequently, even when the translation software fails to produce a correct translation, the user can obtain the desired analysis result, translated words, and translated sentence simply by correcting the dictionary. The range the user can customize is thus far greater than before, and a machine translation system with a strong "learning effect" can be realized.

In short, in the machine translation system of the present invention, developed on the basis of HPSG theory, most syntactic information is defined as lexical properties and can be described in the dictionary. This extends conventional dictionary description (the description of analysis and generation information) and allows not only conventional word-by-word dictionary registration but also broader registration, including fixed expressions, so that a machine translation system can be realized in which the user obtains the desired translation simply by changing the dictionary entries.

The dictionary registration means can also be configured so that, when a word has multiple semantic attributes, multiple pieces of analysis information, or multiple pieces of generation information, the user can specify the order of priority in which they are applied.

Alternatively, when a word has multiple semantic attributes, pieces of analysis information, or pieces of generation information, the one most recently applied can automatically be given priority.
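Both prioritization modes, an order given by the user and automatic preference for the most recently applied entry, can be sketched with an ordered list. The class below is a hypothetical illustration in Python. The name `PrioritizedEntries` and its methods are assumptions introduced here, and the text does not say how the priority is actually stored.

```python
class PrioritizedEntries:
    """Candidate attributes for one word, highest priority first."""

    def __init__(self, candidates):
        self.candidates = list(candidates)

    def set_priority(self, ordered):
        """Apply an explicit, user-specified priority order."""
        if sorted(ordered) != sorted(self.candidates):
            raise ValueError("order must contain exactly the registered candidates")
        self.candidates = list(ordered)

    def mark_applied(self, candidate):
        """Move the candidate that was just applied to the front."""
        self.candidates.remove(candidate)
        self.candidates.insert(0, candidate)

    def best(self):
        """Return the candidate that would be tried first."""
        return self.candidates[0]


# Hypothetical semantic attributes registered for a single word.
entries = PrioritizedEntries(["person", "organization", "thing"])
entries.mark_applied("thing")          # automatic: most recently applied first
after_use = list(entries.candidates)
entries.set_priority(["organization", "person", "thing"])  # explicit order
after_user_order = entries.best()
```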

This concludes the description of the machine translation system of the present invention. The following describes a front-end processor for kanji conversion that applies the features of the word dictionary described above.

In general, front-end processors for kanji conversion have difficulty converting homophones.

Conventional front-end processors for kanji conversion had a word dictionary in which the part of speech and headword of each word were registered, plus a context analysis dictionary and program for converting homophones.

By contrast, the front-end processor for kanji conversion according to the present invention is characterized in that the word dictionary registers not only the words themselves but also the semantic attributes of each word and the semantic attributes of the words that adjuncts and predicates can take.

For example, consider the sentence 「貴社の記者は、汽車で帰社した。」 (your company's reporter returned to the office by train). Here the reading きしゃ (kisha) must be converted into four different kanji words.

Words used attached to other words, such as の in 貴社の, は in 記者は, で in 汽車で, and した in 帰社した, are called "adjuncts" here. An adjunct indicates the attributes of the word it is attached to. 〜の attaches to a noun indicating the party something belongs to, and that noun has the semantic attribute of a person, organization, thing, and so on. 〜は attaches to a noun indicating the subject, and that noun has the semantic attribute of a person, organization, thing, and so on. 〜で attaches to a noun indicating a means, and that noun has the semantic attribute of a thing. 〜した attaches to a verb indicating an action.

Meanwhile, 貴社 (your company) is a noun with the semantic attribute "organization", 記者 (reporter) is a noun with the semantic attribute "human", and 汽車 (train) is a noun with the semantic attribute "vehicle". 帰社 (returning to the office) denotes an action and takes a subject with the semantic attribute "human".

In the example sentence, 帰社した is the predicate and takes a subject with the semantic attribute "human". The subject きしゃは therefore becomes 記者は, the means of the action becomes 汽車で, and the party to which the reporter belongs becomes 貴社の.
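The reasoning for this example can be traced in a short Python sketch. It is deliberately simplified and rests on assumptions: the candidate table, the `ADJUNCT_SEMANTICS` mapping, and the function `convert` are hypothetical, the input is already segmented into reading and adjunct pairs, and の and で are each mapped directly to a single semantic attribute, whereas the text allows the broader sets of attributes listed for each adjunct.

```python
# Hypothetical homophone candidates for the reading きしゃ.
CANDIDATES = {
    "きしゃ": [
        {"kanji": "貴社", "pos": "noun", "semantic": "organization"},
        {"kanji": "記者", "pos": "noun", "semantic": "human"},
        {"kanji": "汽車", "pos": "noun", "semantic": "vehicle"},
        {"kanji": "帰社", "pos": "verb", "subject_semantic": "human"},
    ]
}

# Simplifying assumption: one semantic attribute per adjunct.
ADJUNCT_SEMANTICS = {"の": "organization", "で": "vehicle"}


def convert(segments):
    """Convert (reading, adjunct) pairs into a kanji string."""
    # First find the predicate: the segment whose adjunct marks an action.
    predicate = None
    for reading, adjunct in segments:
        if adjunct == "した":
            predicate = next(c for c in CANDIDATES[reading] if c["pos"] == "verb")
    if predicate is None:
        raise ValueError("no predicate found")
    converted = []
    for reading, adjunct in segments:
        if adjunct == "した":
            chosen = predicate
        else:
            # The subject must satisfy the predicate's requirement; other
            # adjuncts use the attribute registered for them.
            if adjunct == "は":
                wanted = predicate["subject_semantic"]
            else:
                wanted = ADJUNCT_SEMANTICS[adjunct]
            chosen = next(
                c for c in CANDIDATES[reading]
                if c["pos"] == "noun" and c["semantic"] == wanted
            )
        converted.append(chosen["kanji"] + adjunct)
    return "".join(converted)


sentence = convert([("きしゃ", "の"), ("きしゃ", "は"), ("きしゃ", "で"), ("きしゃ", "した")])
```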

As described above, the word dictionary for kanji conversion according to the present invention registers, along with each word, its semantic attributes and other generation information for kanji conversion. As with the word dictionary of the machine translation system described earlier, user customization to obtain a particular kanji conversion can therefore be realized easily.

That is, by providing a dictionary registration means that lets the user register and update a word's semantic attributes or the generation information for a particular usage, a word can be given any desired semantic attribute, and conversion to a particular kanji can be made to occur for a particular adjunct or predicate.

As with the machine translation system, when a word has multiple semantic attributes and pieces of generation information, the dictionary registration means can be used to specify the order of priority in which they are applied. Alternatively, the most recently applied semantic attribute and generation information can be applied preferentially.

One of the applicant's aims is to provide a machine translation system capable of effective "example sentence translation".

"Example sentence translation" was first proposed more than ten years ago, but it began to appear in commercial translation software only in the last two years or so. In the "example sentence translation" developed by the applicant, the results of past translations are stored in a database as pairs of source sentence and translated sentence, and when exactly the same sentence appears again, the stored translation is used.

However, since the probability that exactly the same sentence will appear is generally low, refinements have been made, such as allowing example sentences to be registered with variable expressions in part, so that a stored translation can be used even when part of the sentence differs. With this technique, for sentences that conventional translation technology centered on grammar rules cannot handle correctly, or when a smoother, more appropriate Japanese expression is wanted in place of the literal-style output of machine translation, the user can register source-translation pairs sentence by sentence in the example sentence database, and so gains a means of customizing the translation system in addition to the dictionary.
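Example lookup with variable expressions can be sketched as pattern matching over stored sentence pairs. The Python fragment below is a hypothetical illustration: the `{X}` slot syntax, the sample pairs, the word table, and the function names are assumptions made for this example, and the text does not describe the actual matching method.

```python
import re

# Hypothetical example database: (source, translation) pairs. A slot such
# as {X} marks a variable part that may differ between sentences.
EXAMPLES = [
    ("Time flies like an arrow.", "光陰矢のごとし。"),
    ("This is a {X}.", "これは{X}です。"),
]

# Hypothetical word translations used to fill variable slots.
WORDS = {"pen": "ペン", "book": "本"}


def compile_pattern(source):
    """Turn a source example into a regex with one named group per slot."""
    parts = re.split(r"\{(\w+)\}", source)
    # Odd-numbered parts are slot names; even-numbered parts are literal text.
    regex = "".join(
        "(?P<" + part + ">.+?)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    return re.compile(regex)


def translate_by_example(sentence):
    """Return the stored translation for a matching example, or None."""
    for source, target in EXAMPLES:
        match = compile_pattern(source).fullmatch(sentence)
        if match is None:
            continue
        filled = target
        for name, value in match.groupdict().items():
            if value not in WORDS:
                break  # slot content cannot be translated; try the next example
            filled = filled.replace("{" + name + "}", WORDS[value])
        else:
            return filled
    return None
```

A sentence with no matching example yields `None`, which in a combined system would be the point at which ordinary rule-based translation takes over.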

Systems that store sentences a translator has translated in the past in an example sentence database, search the database for sentences that match or resemble the sentence to be translated, display them, and so support human translation by reference to their translations have been commercialized under the name "translation memory".

The applicant has incorporated "translation memory" into machine translation, fusing "example sentence translation" with conventional rule-based translation, and has for several years built this in under the name "integrated translation software". Many improvements remain, including matching technology, similar-sentence retrieval, extension of examples with more flexible variable expressions, and user-friendly registration methods, but the applicant is convinced that this is one of the pillars that will break through the limits of current machine translation software.

For several years the applicant has been working to move from conventional translation technology centered on grammar rules to the development of a machine translation system based on HPSG (Head-driven Phrase Structure Grammar) theory, a recent linguistic theory and a representative lexical theory. The aim is to build "next-generation translation technology" that both raises translation accuracy and dramatically widens the scope of user customization.

FIG. 1 is a block diagram of an embodiment of the machine translation system according to the present invention.
FIG. 2 is a diagram showing the definition of syntactic features.
FIG. 3 is a diagram showing an example description of the syntactic features of each word.
FIG. 4 is a diagram showing an example of a grammar rule.
FIG. 5 is a diagram showing an example description of the syntactic features of "go".
FIG. 6 is a diagram showing the subject condition in the grammar rule.
FIG. 7 is a diagram showing the subject condition for "I go".
FIG. 8 is a diagram showing the subject and head of "I go".
FIG. 9 is a diagram showing the syntax tree when "flies" in "Time flies like an arrow." is taken as the predicate.
FIG. 10 is a diagram showing the syntax tree when "like" in "Time flies like an arrow." is taken as the predicate.
FIG. 11 is a diagram showing generation tree generation/transformation.
FIG. 12 is a block diagram showing the configuration of a conventional machine translation system.
FIG. 13 is a diagram showing an example of a syntax tree produced by a conventional machine translation system.

Explanation of reference numerals

1 Machine translation system
2 Standard input unit
3 Morphological analysis function unit
4 Syntax analysis function unit
5 Translated word determination function unit
6 Generation tree generation/transformation function unit
7 Translated sentence generation function unit
8 Standard output unit
9 Word dictionary
10 Grammar database

Claims

1. A machine translation system comprising:
a word dictionary in which, for each word, a headword, a translation where one exists, grammatical attributes, and analysis information indicating the relationship with other words are registered;
a grammar database that stores the main syntactic grammars;
a morphological analysis function unit that receives an input sentence, collates it with the word dictionary, and decomposes it into morphemes;
a syntax analysis function unit that extracts a head from the word group of the morphemes decomposed by the morphological analysis function unit and determines a syntax tree by selecting, from the analysis information of the head, the information that matches the syntactic features of the words in the preceding and following morphemes;
a translated word determination function unit that determines a translated word for each word in the syntax tree determined by the syntax analysis function unit; and
a translated sentence generation function unit that generates a translated sentence by applying the translated words to the words of the syntax tree.
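
As an illustrative, non-limiting sketch of how the units recited above could cooperate, the following Python fragment runs the sentence "I go" through dictionary lookup, head selection, and translation output. The class `Entry`, the fields `left` and `right`, the function names, the sample translations, and the head-last output order are all assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entry:
    headword: str
    translation: Optional[str]   # None when no translation is registered
    attribute: str               # grammatical attribute (here: part of speech)
    # Analysis information (hypothetical encoding): attributes this word
    # expects of the words before and after it when it acts as the head.
    left: List[str] = field(default_factory=list)
    right: List[str] = field(default_factory=list)


# Hypothetical two-word dictionary for the example sentence "I go".
WORD_DICTIONARY = {
    "i": Entry("I", "私", "pronoun"),
    "go": Entry("go", "行く", "verb", left=["pronoun"]),
}


def morphological_analysis(sentence: str) -> List[Entry]:
    """Collate each whitespace-separated token with the word dictionary."""
    return [WORD_DICTIONARY[token.lower()] for token in sentence.split()]


def syntax_analysis(morphemes: List[Entry]) -> Tuple[Entry, List[Entry]]:
    """Return (head, dependents) for the first word whose analysis
    information matches the attributes of the surrounding morphemes."""
    for i, candidate in enumerate(morphemes):
        before = [m.attribute for m in morphemes[:i]]
        after = [m.attribute for m in morphemes[i + 1:]]
        has_requirements = bool(candidate.left or candidate.right)
        if has_requirements and candidate.left == before and candidate.right == after:
            return candidate, morphemes[:i] + morphemes[i + 1:]
    raise ValueError("no head whose analysis information fits the sentence")


def generate_translation(head: Entry, dependents: List[Entry]) -> str:
    """Apply the registered translations; the head is emitted last here."""
    words = [d.translation or d.headword for d in dependents]
    words.append(head.translation or head.headword)
    return " ".join(words)


def translate(sentence: str) -> str:
    head, dependents = syntax_analysis(morphological_analysis(sentence))
    return generate_translation(head, dependents)
```

In this sketch the parse is driven entirely by the information stored with "go" in the dictionary, so changing that entry changes the analysis without touching the code, which is the kind of customization the dictionary-centered design is meant to allow.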

2. The machine translation system according to claim 1, wherein generation information describing a special translation rule to be applied when a predetermined word satisfies a user-specified condition is registered in the word dictionary,
the system further comprising a generation tree generation / transformation function unit that transforms the syntax tree determined by the syntax analysis function unit in accordance with the translation rule.
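
By way of a hedged example only, generation information of this kind could be modeled as a pair of a condition and a tree transformation stored under a word. In the Python sketch below, `Node`, `GenerationRule`, `GENERATION_INFO`, `register_rule`, `build_generation_tree`, and the "have breakfast" rule are hypothetical names and data chosen for illustration; they do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    word: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class GenerationRule:
    condition: Callable[[Node], bool]   # user-specified condition
    transform: Callable[[Node], Node]   # special translation rule


# Generation information, keyed by the word it is registered under.
GENERATION_INFO: Dict[str, List[GenerationRule]] = {}


def register_rule(word: str, rule: GenerationRule) -> None:
    GENERATION_INFO.setdefault(word, []).append(rule)


def build_generation_tree(node: Node) -> Node:
    """Rebuild the tree bottom-up, applying the first rule registered for
    a node's word whose condition holds; other nodes are copied unchanged."""
    rebuilt = Node(node.word, [build_generation_tree(c) for c in node.children])
    for rule in GENERATION_INFO.get(rebuilt.word, []):
        if rule.condition(rebuilt):
            return rule.transform(rebuilt)
    return rebuilt


# Hypothetical user rule: treat "have" as "eat" when it governs "breakfast".
register_rule("have", GenerationRule(
    condition=lambda n: any(c.word == "breakfast" for c in n.children),
    transform=lambda n: Node("eat", n.children),
))

meal = build_generation_tree(Node("have", [Node("I"), Node("breakfast")]))
car = build_generation_tree(Node("have", [Node("I"), Node("car")]))
```

Here `meal` is rooted at "eat" because the condition is satisfied, while `car` keeps "have" because no registered rule applies.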

3. The machine translation system according to claim 1, wherein, in the word dictionary, the semantic attribute of the target word to which a word relates when that word becomes the head is registered as the analysis information of the word, and a semantic attribute is registered in the analysis information of each word that can be a target of the head.
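
A minimal sketch of this two-sided registration, assuming a simple table layout, is given below. The tables `HEAD_EXPECTS` and `SEMANTIC_ATTRIBUTE`, the role name "object", the attribute labels, and the function `accepts` are hypothetical and serve only to show a head's expectation being checked against a target's own attribute.

```python
from typing import Dict

# Analysis information of heads: semantic attribute expected of each target.
HEAD_EXPECTS: Dict[str, Dict[str, str]] = {
    "eat": {"object": "food"},
    "drive": {"object": "vehicle"},
}

# Analysis information of words that can be targets: their own attribute.
SEMANTIC_ATTRIBUTE: Dict[str, str] = {
    "apple": "food",
    "car": "vehicle",
}


def accepts(head: str, role: str, target: str) -> bool:
    """True when the attribute the head expects for this role equals the
    attribute registered for the target word."""
    expected = HEAD_EXPECTS.get(head, {}).get(role)
    return expected is not None and SEMANTIC_ATTRIBUTE.get(target) == expected
```

With these tables, `accepts("eat", "object", "apple")` holds and `accepts("eat", "object", "car")` does not, so a parse that attaches "car" as the object of "eat" could be rejected.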

4. The machine translation system according to claim 1, further comprising a dictionary registration unit that allows a user to register or update at least one of the semantic attribute, the analysis information, and the generation information of a word.

5. The machine translation system according to any one of claims 1 to 4, wherein, when a word has a plurality of semantic attributes, pieces of analysis information, or pieces of generation information, the word dictionary allows the user to designate, via the dictionary registration unit, the priority of the semantic attribute, analysis information, or generation information to be applied.

6. The machine translation system according to any one of claims 1 to 4, wherein, when a word has a plurality of semantic attributes, pieces of analysis information, or pieces of generation information, the syntax analysis function unit searches the word dictionary for the most recently applied semantic attribute, analysis information, or generation information.
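
The two ordering policies recited in the last two claims, user-designated priority and preference for the most recently applied entry, can be sketched as follows. This is one possible reading under stated assumptions: the class `CandidateList` and its methods are hypothetical, and "most recently applied" is modeled here as moving the applied entry to the front of the search order.

```python
from typing import Dict, List


class CandidateList:
    """Alternative entries (semantic attributes, analysis information, or
    generation information) registered for a single word."""

    def __init__(self, candidates: List[str]) -> None:
        self._order = list(candidates)

    def set_priority(self, priorities: Dict[str, int]) -> None:
        """User-designated priority: a lower number is tried first.
        Entries without a priority keep their relative order at the end,
        because list.sort is stable."""
        self._order.sort(key=lambda c: priorities.get(c, float("inf")))

    def mark_applied(self, candidate: str) -> None:
        """Most-recently-applied policy: move the entry to the front."""
        self._order.remove(candidate)
        self._order.insert(0, candidate)

    def search_order(self) -> List[str]:
        return list(self._order)


# Hypothetical alternatives for one ambiguous word.
entries = CandidateList(["noun", "verb", "preposition"])
entries.set_priority({"verb": 1, "noun": 2})
by_priority = entries.search_order()
entries.mark_applied("preposition")
by_recency = entries.search_order()
```

After `set_priority` the search order is verb, noun, preposition; after `mark_applied("preposition")` it becomes preposition, verb, noun.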