
JP4321336B2 - Natural language processing system, natural language processing method, and computer program


Publication number: JP4321336B2
Application number: JP2004120082A
Authority: JP
Inventors: 博増市; 智子大熊; 真広瀬; 大悟杉原; 宏樹吉村
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2004-04-15
Filing date: 2004-04-15
Publication date: 2009-08-26
Anticipated expiration: 2024-04-15
Also published as: JP2005301868A

Description

The present invention relates to a natural language processing system, a natural language processing method, and a computer program for mathematically handling the natural languages that humans use in everyday communication, and in particular to a natural language processing system, a natural language processing method, and a computer program that support the proofreading or revision of natural language sentences.

More specifically, the present invention relates to a natural language processing system, a natural language processing method, and a computer program that support the proofreading or revision of word order and other unnatural expressions that tend to appear in text written in a language other than the writer's native language, and in particular to a system, method, and program that support the proofreading or revision of text that contains no spelling or grammatical errors yet reads as unnatural to a native speaker.

Languages that humans use in everyday communication, such as Japanese and English, are called “natural languages”. Many natural languages arose spontaneously and have evolved along with the history of humankind, peoples, and societies. People can of course also communicate through gestures, but natural language enables the most natural and sophisticated communication.

Natural language is inherently abstract and highly ambiguous, but it can be processed by computer if sentences are handled mathematically. As a result, automated processing makes possible a variety of natural language applications and services, such as machine translation, dialogue systems, search systems, question answering systems, and text proofreading systems.

For example, when writing in a language that is not one's native language, it is unavoidable, even with the utmost care, that some sentences will be inappropriate from the viewpoint of a native speaker. That is, writers frequently misspell words, make grammatical errors, or write sentences that are grammatically correct but unnatural. In such cases, a text proofreading system can be used to detect spelling and grammatical errors and correct the text. Spell checkers and syntax checkers are already in practical use.

Grammatical errors in text can basically be detected by parsing. Parsing is the process of identifying the dependency relationships between words on the basis of grammar rules.

For example, by including in the grammar rule table information on erroneous sentences containing erroneous phrase structures, in addition to the correct phrase structure rules, a sentence containing grammatical errors can be parsed in the same way as a correct sentence (see, for example, Patent Document 1).

In another approach, a dictionary for generating corrected sentences and an information stack that stores information on errors in a learner's input sentence are provided. Grammatically erroneous words and phrases are detected in the learner's input sentence and corrected, so that the input sentence is turned into a grammatically correct one (see, for example, Patent Document 2). For example, when the sentence “I want too room.” is input, it can be corrected to “I want two room”.

Patent Document 1: JP-A-2-226364
Patent Document 2: JP-A-5-204299

As described above, there is a problem in that, when writing in a non-native language, it is unavoidable, even with the utmost care, that some sentences will be inappropriate from the viewpoint of a native speaker.

Even in a non-native language, a writer who has mastered it beyond a certain level can avoid such grammatical mistakes or carelessness relatively easily. Furthermore, a text proofreading system such as those described above reduces word-level spelling errors and, at the syntax level, helps the writer avoid sentences that are clearly ungrammatical.

For example, it is possible to write rules that identify ungrammatical sentences according to the criterion of whether or not a sentence is grammatical, and a grammar checker can in fact be realized by accumulating such rules (see, for example, Patent Documents 1 and 2).

In contrast, it is extremely difficult for a non-native speaker to avoid writing sentences that are free of grammatical errors yet unnatural to a native speaker. It is also difficult for a natural language processing system to identify sentences that are grammatically correct but unnatural. This is because it is hard to describe declaratively, as rules, whether a native speaker will find a sentence natural. Moreover, even if it could be determined whether a sentence feels natural to a native speaker, having the system present a more natural sentence to the writer is harder still.

For example, the following two English sentences (1) and (2) have almost the same meaning, and both are natural English sentences.

(1) It is unlikely that John will be elected.
(2) It is improbable that John will be elected.

In contrast, the following sentence (3) is natural to a native speaker of English, whereas sentence (4) is unnatural.

(3) John is unlikely to be elected.
(4) John is improbable to be elected.

A writer whose native language is not English may well write sentence (4), given that (1), (2), and (3) are all natural sentences. However, sentence (4) is unnatural to a native speaker of English, and it is difficult for a system to point out that (4) should be corrected to (2).

The present invention has been made in view of the technical problems described above, and its main object is to provide an excellent natural language processing system, natural language processing method, and computer program that can suitably support the proofreading or revision of natural language sentences.

A further object of the present invention is to provide an excellent natural language processing system, natural language processing method, and computer program that can suitably support the proofreading or revision of word order and other unnatural expressions that tend to appear in text written in a language other than the writer's native language.

A further object of the present invention is to provide an excellent natural language processing system, natural language processing method, and computer program that can automatically correct, into a more natural sentence, a sentence that cannot be said to contain a clear grammatical error and therefore cannot be handled by conventional techniques that support proofreading by accumulating rules for identifying grammatical errors.

The present invention is a natural language processing system that supports the proofreading or revision of text that contains no spelling or grammatical errors yet reads as unnatural to a native speaker. FIG. 1 schematically shows its system configuration. That is, the natural language processing system according to the present invention comprises: semantic analysis means for performing semantic analysis on a natural language sentence to be proofread; analysis result holding means for holding the semantic analysis result; generation means for generating a plurality of natural language sentences from the semantic analysis result; generation result holding means for holding the generation results; and selection means for selecting the most natural sentence from among the plurality of natural language sentences held in the generation result holding means.
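The arrangement of the five means described above can be pictured as a small pipeline. The following is only an illustrative sketch: the class name `ProofreadingPipeline`, its field names, and the use of plain callables for the analysis, generation, and selection steps are assumptions made for this example and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProofreadingPipeline:
    """Illustrative wiring of the five means (all names are hypothetical)."""

    analyze: Callable[[str], dict]         # semantic analysis means
    generate: Callable[[dict], List[str]]  # generation means
    select: Callable[[List[str]], str]     # selection means
    # Holding means: keep the intermediate results of the last run.
    analysis_result: dict = field(default_factory=dict)
    generated: List[str] = field(default_factory=list)

    def run(self, sentence: str) -> str:
        # 1. Analyze the input sentence and hold the result.
        self.analysis_result = self.analyze(sentence)
        # 2. Generate candidate sentences from the held result and hold them.
        self.generated = self.generate(self.analysis_result)
        # 3. Select the candidate judged most natural.
        return self.select(self.generated)
```

The three callables are injected so that any analyzer, generator, and selector with these shapes could be plugged in; the sketch says nothing about how each step is actually implemented.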

Here, a syntactic and semantic analysis system based on Lexical Functional Grammar (LFG) theory, for example, can be applied to the semantic analysis of the natural language sentence to be proofread and to the generation of natural language sentences from the semantic analysis result.

In LFG theory, a natural language sentence is analyzed, and the semantic content of the sentence is expressed as the analysis result in a nested structure (matrix structure) of attribute-value pairs called a functional structure (f-structure). The f-structure is an abstraction of the semantic content of the input sentence. The most distinctive feature of LFG is that, whatever the language, sentences that express the same meaning can yield f-structures with the same structure. The language universality of f-structures is described in detail in, for example, Dalrymple, M., “Syntax and Semantics: Lexical Functional Grammar” (Academic Press (2001)).

For example, the language universality of f-structures makes it possible to realize a machine translation technique that treats the f-structure as an intermediate language. That is, by using the f-structure as an intermediate language, a mutual machine translation system among a plurality of languages can be realized, as shown in FIG. 2(a). In this case, a natural language sentence written in source language A is first analyzed on the basis of LFG theory, and an f-structure for language A is obtained as the analysis result. Next, the f-structure for language A is converted into an f-structure for target language B. Given the language universality of f-structures, this conversion can be regarded as a relatively easy process. Finally, the resulting f-structure for language B is converted into a natural language sentence in language B.

For machine translation techniques that treat the f-structure as an intermediate language, see, for example, Anette Frank, “From Parallel Grammar Development towards Machine Translation” (In Proceedings of MT Summit VII, pp. 134-142 (1999)). The conversion from f-structure to natural language sentence is described in detail in, for example, Ronald Kaplan and Jurgen Wedekind, “LFG generation produces context-free languages” (In Proceedings of the 18th International Conference on Computational Linguistics, pp. 425-431 (2000)). An example of a system that realizes both the conversion from natural language sentence to f-structure and the reverse conversion from f-structure to natural language sentence is the LFG system called XLE, which is described in detail in John Maxwell and Ronald Kaplan, “A Method for Disjunctive Constraint Satisfaction” (In Current Issues in Parsing Technology, pp. 173-190, Kluwer (1991)) and John Maxwell and Ronald Kaplan, “The interface between phrasal and functional constraints” (Computational Linguistics, 19(4), pp. 571-590 (1993)).

The present invention uses the language-universal nature of the f-structure, which does not depend on differences in the surface description of natural language sentences, not for machine translation but for supporting the proofreading of sentences. That is, as shown in FIG. 2(b), the sentence to be proofread is converted into an f-structure by analysis based on LFG theory, and the f-structure is then converted back into natural language sentences in the same language. Because the f-structure is an abstract structure, it preserves the meaning of the sentence but not its form of expression. What is called the “meaning of the sentence” here is the information generally known as predicate-argument structure, that is, information about what the predicate of the sentence is and what its subject, object, and so on are. Performing the above processing therefore produces a plurality of sentences that have the same semantic content but differ in expression.

By performing this conversion from natural language sentence to f-structure and the reverse conversion from f-structure to natural language sentence within the same language, it is possible, for example, to take English sentence (4) above as input and obtain English sentences (4) and (2) as output. The important point is that, whereas judging from English sentence (4) alone whether it is unnatural is an extremely difficult task, once the two English sentences (2) and (4) are given, selecting the more natural of the two becomes a relatively easy one.

When this round-trip conversion within the same language produces, as candidates, a plurality of sentences that have the same semantic content but differ in expression, the sentence with the more natural expression can be selected from among them by, for example, comparison with a corpus.

When the plurality of candidate sentences are compared with the corpus, the evaluation can be based on their parsing results rather than on a direct comparison of the raw sentences. If the evaluation were based on the semantic analysis results, only the meaning-bearing parts of the sentences would be extracted, and differences in surface description would be abstracted away. The parsing results, by contrast, retain this surface information, so that the naturalness of the expression, such as word order, can be suitably judged.
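As a rough illustration of comparing candidates with a corpus through a structure that keeps word order, the sketch below replaces the parse result with a much simpler stand-in: the sequence of word categories in each sentence. The lexicon, the function names, the toy corpus, and the scoring rule (count of corpus sentences with the same category sequence) are all assumptions for this example, not the method of the specification.

```python
from collections import Counter

# Tiny hand-written lexicon (hypothetical). A real system would obtain
# categories from a parser rather than from a lookup table.
LEXICON = {
    "food": "N", "milk": "N", "snow": "N", "summer": "N", "winter": "N",
    "spoils": "V", "melts": "V", "sours": "V",
    "quickly": "ADV", "slowly": "ADV",
    "in": "P", "the": "D",
}


def category_sequence(sentence):
    """Map a sentence to its ordered tuple of word categories."""
    words = sentence.lower().rstrip(".").split()
    return tuple(LEXICON[word] for word in words)


def most_natural(candidates, corpus):
    """Return the candidate whose category sequence is most frequent in the corpus."""
    counts = Counter(category_sequence(s) for s in corpus)
    return max(candidates, key=lambda s: counts[category_sequence(s)])


# Toy data, invented for illustration only.
corpus = [
    "Milk sours quickly in the summer.",
    "Snow melts slowly in the winter.",
    "In the winter snow melts slowly.",
]
candidates = [
    "Food quickly spoils in the summer.",
    "Food spoils quickly in the summer.",
    "Quickly food spoils in the summer.",
]
best = most_natural(candidates, corpus)
```

With this toy corpus, `best` is the candidate whose word order matches two of the three corpus sentences. Because the stand-in keeps word order, it can tell apart candidates that share the same meaning, which a comparison at the level of semantic analysis results could not do.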

A second aspect of the present invention is a computer program written in a computer-readable format so as to execute, on a computer system, processing for supporting the proofreading of text, the program comprising: a semantic analysis step of performing semantic analysis on a natural language sentence to be proofread; a semantic analysis result holding step of holding the semantic analysis result; and a sentence generation step of generating one or more natural language sentences from the semantic analysis result, wherein proofreading of the natural language sentence to be proofread is supported on the basis of the natural language sentences generated in the sentence generation step.

The computer program according to the second aspect of the present invention defines a computer program written in a computer-readable format so as to realize predetermined processing on a computer system. In other words, installing the computer program according to the second aspect of the present invention on a computer system brings about cooperative action on the computer system, so that the same effects as those of the natural language processing system according to the first aspect of the present invention can be obtained.

According to the natural language processing system of the present invention, a sentence that cannot be said to contain a clear grammatical error, and therefore cannot be handled by conventional techniques that support proofreading by accumulating rules for identifying grammatical errors, can be automatically corrected into a more natural sentence, without a proofreading result that differs in meaning from the original input sentence being output.

The natural language processing system according to the present invention also enables proofreading toward natural expressions, which could not be achieved with the prior art, and can be used to further refine the output of a machine translation system.

Still other objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments of the present invention described below and the accompanying drawings.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 3 schematically shows the configuration of a text proofreading system 10 according to an embodiment of the present invention. As shown in the figure, the text proofreading system 10 includes a proofreading target sentence input unit 11, a semantic analysis unit 12, a semantic analysis result holding unit 13, a sentence generation unit 14, a generation result holding unit 15, a corpus holding unit 16, a parsing unit 17, and a selection unit 18. The illustrated text proofreading system 10 can be configured as a dedicated hardware device, but it can also be realized by installing a text proofreading application on a general-purpose computer system such as a personal computer.

The proofreading target sentence input unit 11 includes a user interface that receives from the user the English sentence to be proofread.

The semantic analysis unit 12 performs LFG-based semantic analysis on the English sentence received by the proofreading target sentence input unit 11 and obtains the corresponding f-structure.

The LFG-based semantic analysis processing is described here. FIG. 4 schematically shows the functional configuration of a syntactic and semantic analysis system 200 based on LFG grammar theory.

The morphological analysis unit 202 has morphological rules 202A and a morpheme dictionary 202B for a specific language such as Japanese, segments the input sentence into morphemes, which are the smallest units of meaning, and identifies their parts of speech. However, if the proofreading target sentence input unit 11 can accept the morphological analysis result of the English sentence to be proofread as input, the morphological analysis unit 202 can be omitted. A Japanese morphological analysis system such as “ChaSen” can be applied as the morphological analysis system, for example, but the gist of the present invention is not limited to this. For the ChaSen morphological analysis system, see, for example, Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita, Yoshitaka Hirano, Hiroshi Matsuda, Kazuma Takaoka, and Masayuki Asahara, “Japanese Morphological Analysis System ChaSen version 2.2.1 Manual” (Nara Institute of Science and Technology, 2000).

The morphological analysis result is then input to the syntactic and semantic analysis unit 203. The syntactic and semantic analysis unit 203 has grammar rules 203A and dictionaries such as a case frame dictionary 203B. It analyzes the phrase structure on the basis of the grammar rules and analyzes the semantic structure, which expresses the meaning conveyed by the sentence, on the basis of the senses of the words in the sentence and the semantic relationships between words. The case frame dictionary describes the relationships between a verb and the other constituents of the sentence, such as the subject, and makes it possible to extract the semantic relationships between a predicate and the words that depend on it.

As the result of parsing, a “c-structure (constituent structure)” is obtained, which represents as a tree the phrase structure of a sentence composed of words, morphemes, and the like. In addition, an “f-structure (functional structure)” is output as the result of analyzing the input sentence semantically and functionally, on the basis of case structure such as subject and object, with respect to properties such as whether it is interrogative, past tense, or polite.

The c-structure represents the structure of the words and phrases in a sentence in tree form and is defined in terms of syntactic categories. For example, phonological interpretation for generating a phoneme string can be performed on the basis of the c-structure. The f-structure, on the other hand, expresses grammatical functions explicitly and consists of grammatical function names, semantic forms, and feature symbols. By referring to such an f-structure, an understanding of the sentence in terms of subject, object, complement, and adjunct can be obtained. The f-structure is a set of features associated with the nodes of the c-structure and is expressed as an attribute-value matrix. In the f-structure, only the meaning-bearing parts of the sentence, which do not depend on differences in the surface description of the natural language sentence, are extracted, so that language-universal properties can be described. In the c-structure, by contrast, differences in surface description are retained, and it includes the elements needed to judge the naturalness of the expression, such as word order.

LFG is described in detail in, for example, R. M. Kaplan and J. Bresnan, “Lexical-Functional Grammar: A Formal System for Grammatical Representation” (The MIT Press, Cambridge (1982). Reprinted in Formal Issues in Lexical-Functional Grammar, pp. 29-130. CSLI Publications, Stanford University (1995)).

FIG. 5 shows the f-structure output by the semantic analysis unit 12 when the proofreading target sentence input unit 11 receives the following English sentence (5).

(5) Food quickly spoils in the summer.

The f-structure is expressed as an attribute-value matrix. Within each pair of brackets [ ], the left side is the name of a feature (attribute) and the right side is the value of that feature (attribute value). The example in FIG. 5 shows that the head (predicate, PRED(icate)) of English sentence (5) is “spoil”, that its subject (SUBJ(ect)) is “food”, that the tense is present (PRES(ent)), and that “quickly” and “in the summer” modify “spoil” as optional modifiers (ADJUNCT). If semantic analysis by the semantic analysis unit 12 fails, the input sentence is deemed to be grammatically defective and the user is notified accordingly.
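To make the attribute-value matrix concrete, the f-structure described for sentence (5) can be written as nested key-value pairs. This is only a sketch: the attribute names PRED, SUBJ, and ADJUNCT come from the description above, while the attribute name `TENSE`, the internal analysis of “in the summer” (`OBJ`, `SPEC`), and the helper `predicates` are assumptions for illustration and do not reproduce FIG. 5.

```python
# Hypothetical encoding of the f-structure for sentence (5) as nested
# attribute-value pairs. TENSE, OBJ and SPEC are assumed attribute names.
f_structure = {
    "PRED": "spoil",
    "TENSE": "PRES",
    "SUBJ": {"PRED": "food"},
    "ADJUNCT": [
        {"PRED": "quickly"},
        {"PRED": "in", "OBJ": {"PRED": "summer", "SPEC": "the"}},
    ],
}


def predicates(fs):
    """Collect every PRED value in the structure, depth first."""
    found = []
    if isinstance(fs, dict):
        for key, value in fs.items():
            if key == "PRED":
                found.append(value)
            else:
                found.extend(predicates(value))
    elif isinstance(fs, list):
        for item in fs:
            found.extend(predicates(item))
    return found
```

Note that nothing in this structure records where “quickly” or “in the summer” stood in the original sentence, which is why several differently ordered sentences can correspond to the same f-structure.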

The semantic analysis result holding unit 13 holds the above f-structure, obtained from the semantic analysis unit 12 as the result of semantic analysis based on LFG theory, inside the computer that constitutes the text proofreading system 10.

The sentence generation unit 14 applies LFG-based English sentence generation to the f-structure held in the semantic analysis result holding unit 13 and obtains one or more corresponding English sentences. For details of LFG-based natural language sentence generation, see the aforementioned Ronald Kaplan and Jurgen Wedekind, “LFG generation produces context-free languages” (In Proceedings of the 18th International Conference on Computational Linguistics, pp. 425-431 (2000)).

Examples of the plurality of English sentences generated from the f-structure shown in FIG. 5 are given below. It is assumed here that one of the sentences generated by the sentence generation unit 14 is identical to the input English sentence (5).

(5) Food quickly spoils in the summer.
(6) Food spoils quickly in the summer.
(7) Food spoils in the summer quickly.
(8) Quickly food spoils in the summer.
(9) In the summer food quickly spoils.
(10) In the summer food spoils quickly.
...

The generation result holding unit 15 holds each English sentence generated by the sentence generation unit 14 inside the computer constituting the sentence proofreading system 10.

The corpus holding unit 16 holds a corpus, that is, a large-scale or exhaustive collection of texts and utterances serving as language material. In the present embodiment, English sentences written by native speakers of English are held as language material inside the computer constituting the sentence proofreading system 10.

The syntax analysis unit 17 parses the English sentences held in the generation result holding unit 15 and the corpus holding unit 16. Parsing is the process of analyzing the structure of a sentence, such as its phrase structure, on the basis of grammar rules. Because grammar rules are tree-structured, the parsing result is expressed as a tree in which the constituents of the sentence are grouped into progressively higher-level phrases and the individual morphemes are joined on the basis of dependency relations and the like. The structure obtained by parsing is generally called a "syntax tree".

In the present embodiment, parsing based on LFG grammar theory is performed, and the syntax tree handled is the c-structure, which represents the phrase structure of a sentence composed of words, morphemes, and the like as a tree (described above). FIGS. 6 to 8 show the syntax trees corresponding to the English sentences (5) and (6) above and the English sentence (11) below.

(11) Raw food spoilt quickly in the last summer.

The selection unit 18 compares the syntax trees, obtained from the syntax analysis unit 17, of the English sentences held in the corpus holding unit 16 with the syntax trees of the English sentences held in the generation result holding unit 15, and selects natural English sentences from among those held in the generation result holding unit 15. Specifically, among the syntax trees of the sentences held in the generation result holding unit 15, those for which many similar sentences exist in the corpus holding unit 16 are selected as the more natural English sentences. The similarity of syntax trees can be calculated on the basis of, for example, a (well-known) distance calculation between syntax trees.

The selection unit 18 performs selection on the basis of parsing results for the following reason. In a semantic analysis result, only the meaning-bearing parts of a sentence are extracted and differences in surface description are abstracted away, whereas a parsing result still retains this information, so the naturalness of the wording, such as word order, can be judged appropriately.

FIG. 9 shows, in flowchart form, the English sentence selection algorithm executed in the selection unit 18.

Here, let X be the set of syntax trees corresponding to the English sentences generated by the sentence generation unit 14, and let Y be the set of syntax trees corresponding to the English sentences in the corpus holding unit 16. Among the elements of X, let x_k be the syntax tree corresponding to the sentence input to the proofreading target sentence input unit 11 (that is, the original sentence) (step S1).

Let m be the number of elements of the set X. If m is 1 (step S2), the proofreading target sentence corresponding to x_k is presented to the user as is (step S6).

If, on the other hand, m is 2 or more (step S2), then for each element x_n (1 ≤ n ≤ m) of X, the number N(x_n) of elements of the corpus syntax tree set Y that include x_n is calculated (step S3).

Here, one syntax tree "including" another means that the two syntax trees are similar; whether one includes the other can be determined on the basis of, for example, a (well-known) distance calculation between tree structures. The clearest cases in which one syntax tree includes another are when the latter is completely contained in the former and when the two trees are identical. For example, the syntax tree shown in FIG. 8 has a structure that completely contains the syntax tree shown in FIG. 7 (see FIG. 10), so the tree of FIG. 8 is judged to include the tree of FIG. 7. Note that verbs differing only in tense are treated as equal.
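The "complete containment" case can be sketched as an ordered tree embedding test. This is a simplified stand-in, not the distance-based comparison mentioned above: trees are nested tuples `(label, child, ...)` with words as string leaves, and the tree shapes, the category labels, and the `LEMMA` table used to ignore tense are all hypothetical.

```python
# Hypothetical lemma table so that verbs differing only in tense compare equal.
LEMMA = {"spoils": "spoil", "spoilt": "spoil"}


def norm(word):
    w = word.lower()
    return LEMMA.get(w, w)


def includes(a, b):
    """Return True if tree b can be obtained from tree a by deleting subtrees of a."""
    if isinstance(a, str) or isinstance(b, str):
        # Leaves are words; compare them after tense normalization.
        return isinstance(a, str) and isinstance(b, str) and norm(a) == norm(b)
    if a[0] != b[0]:
        return False
    rest = iter(a[1:])
    # Each child of b must match some later child of a, preserving left-to-right order.
    return all(any(includes(x, y) for x in rest) for y in b[1:])


# Assumed shapes standing in for the trees of FIG. 7 and FIG. 8.
fig7 = ("S", ("NP", ("N", "food")),
        ("VP", ("V", "spoils"), ("ADV", "quickly"),
         ("PP", ("P", "in"), ("NP", ("D", "the"), ("N", "summer")))))
fig8 = ("S", ("NP", ("A", "raw"), ("N", "food")),
        ("VP", ("V", "spoilt"), ("ADV", "quickly"),
         ("PP", ("P", "in"), ("NP", ("D", "the"), ("A", "last"), ("N", "summer")))))
```

Under these assumptions `includes(fig8, fig7)` holds while `includes(fig7, fig8)` does not, mirroring the one-directional relation described for FIG. 10.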

The number N(x_n) calculated in step S3 is, in short, the number of sentences in the corpus whose expression is similar to that of the English sentence being processed. That is, if N(x_n) is large, expressions similar to the English sentence underlying the syntax tree x_n appear frequently in English written by native speakers, and the sentence is therefore presumed to be natural.

In step S4, N(x_n) is judged by two criteria, N(x_n)/N(x_k) > T_a and N(x_n) > T_b. That is, the number of corpus sentences similar in expression to the generated sentence is compared, via the threshold T_a, with the corresponding number for the original input sentence, and the number itself is also checked against the predetermined threshold T_b.

If there exist x_n satisfying both N(x_n)/N(x_k) > T_a and N(x_n) > T_b, the sentences corresponding to those x_n are output as proofreading results in descending order of N(x_n) (step S5).

T_a and T_b are preset thresholds: T_a is a positive real number greater than 1, and T_b is a natural number greater than 1. If no x_n satisfies the above constraints, the proofreading target sentence corresponding to x_k is presented to the user as is (step S6).
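Steps S1 to S6 can be summarized in a short Python sketch. It assumes the counts N(x) have already been computed and are passed in as a dict; the function name `select`, the sentence labels, and the numbers in the usage example are illustrative only. The ratio test is written as N(x_n) > T_a * N(x_k), an assumption made here to avoid dividing by zero when N(x_k) is 0.

```python
def select(candidates, original, counts, t_a, t_b):
    """Return proofreading results for candidate set X, given original x_k and counts N."""
    if len(candidates) == 1:          # step S2: only the original was generated
        return [original]             # step S6
    n_k = counts[original]
    # Step S4: keep candidates that beat the original by factor T_a and exceed T_b.
    passing = [x for x in candidates
               if counts[x] > t_a * n_k and counts[x] > t_b]
    if not passing:
        return [original]             # step S6: nothing is clearly more natural
    # Step S5: most frequently attested structure first.
    return sorted(passing, key=lambda x: counts[x], reverse=True)


# Invented counts for illustration; they are not taken from the specification.
counts = {"(5)": 2, "(6)": 40, "(7)": 3, "(10)": 12}
results = select(list(counts), "(5)", counts, t_a=2.0, t_b=5)
```

Since T_a is greater than 1, the original sentence can never pass its own ratio test, so it is returned only through the fallback of step S6.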

The above algorithm basically selects, as the proofreading result, a sentence for which many sentences with a similar syntactic structure exist in the sentence set of the corpus holding unit 16. In the present embodiment, selection was restricted to syntax trees in a complete inclusion relation. A more robust system can be realized by using, for example, the various syntax tree comparison methods proposed in Tetsuro Takahashi, Kentaro Inui, and Yuji Matsumoto, "On Methods for Evaluating the Syntactic Similarity of Texts" (Information Processing Society of Japan Research Report, 2002-NL-150, pp. 163-170 (2002)).

Alternatively, the most natural English sentence can be selected by applying a machine learning method to the selection process based on syntax tree comparison. In this case, the machine learning method is trained on the corpus to select natural sentences, and natural sentences are then selected on the basis of the learning result. Machine learning takes training data as input and, by means of statistical processing, outputs rules that account for the characteristics of the data.

Examples of machine learning methods include Support Vector Machine and Maximum Entropy. Support Vector Machine, for example, is a nonparametric pattern classifier that performs linear discrimination using the separating hyperplane obtained as the optimal solution of learning; when it is inappropriate to separate the training data linearly, it maps the training data nonlinearly from the original pattern space into a higher-dimensional pattern space, constructs a separating hyperplane in that space, and performs linear discrimination there. For details of these machine learning methods, see Fabrizio Sebastiani, "Machine Learning in Automated Text Categorization" (ACM Computing Surveys, Vol. 34, No. 1, pp. 1-47, 2002) and the references cited therein.

With the above configuration, the English sentence input to the proofreading target sentence input unit 11 can be converted into other expressions that have the same meaning and are grammatically correct, and the most natural sentence can then be selected from among them. As a result, a sentence that can hardly be said to contain a clear grammatical error, and that therefore cannot be handled by conventional techniques that support proofreading by accumulating rules for identifying grammatical errors, can be corrected automatically into a more natural sentence, and no proofreading result that differs semantically from the original input sentence is output.

In the case of English sentence (5) above, placing the adverb "quickly" between the subject and the verb cannot be called a grammatical error, but native speakers of English rarely write such a sentence. The above method can therefore be expected to select the more natural English sentence (6) through comparison with the English sentences written by native speakers in the corpus. In English, moreover, an ADJUNCT expressing time, such as "in the summer", is naturally placed at the end of the sentence, and on this point too English sentence (6) would be selected.

Similarly, FIG. 11 shows the f-structure obtained by the LFG-based semantic analysis in the semantic analysis unit 12 when the input English sentence is (12) below.

(12) Ann faxed the news to him.

In this case, by applying LFG-based English sentence generation to the f-structure, the sentence generation unit 14 generates the following English sentence (13) as a corresponding English sentence, together with the original input sentence (12).

(13) Ann faxed him the news.

Here, English sentence (12) cannot be called grammatically incorrect, but when the indirect object is a pronoun, a sentence in which the indirect object immediately follows the verb is more natural. English sentence (13) can therefore be expected to be selected as the proofreading result through comparison of syntax trees with the English sentences written by native speakers in the corpus.

FIG. 12 shows the f-structure obtained by the LFG-based semantic analysis in the semantic analysis unit 12 when the input English sentence is (14) below.

(14) John gave a hard time to Bob.

In this case, by applying LFG-based English sentence generation to the f-structure, the sentence generation unit 14 generates the following English sentence (15) as a corresponding English sentence, together with the original input sentence (14).

(15) John gave Bob a hard time.

In this case too, English sentence (14) is grammatically correct but unnatural, and English sentence (15) is selected as the proofreading result through comparison of syntax trees with the English sentences written by native speakers in the corpus.

FIGS. 13 to 15 show the f-structures obtained by the LFG-based semantic analysis in the semantic analysis unit 12 when the aforementioned English sentence (4) and English sentences (16) and (17), respectively, are input.

(4) John is improbable to be elected.
(16) Vote for you.
(17) He is the God that is in Jerusalem.

From the f-structure shown in FIG. 13, the aforementioned English sentence (2) is generated together with English sentence (4); from the f-structure shown in FIG. 14, English sentence (18) is generated together with English sentence (16); and from the f-structure shown in FIG. 15, English sentences (19) and (20) are generated together with English sentence (17), in each case by applying LFG-based English sentence generation to the f-structure. Through comparison of syntax trees with the English sentences written by native speakers in the corpus, English sentences (2), (18), and (19), respectively, can be expected to be selected as the most natural proofreading results.

(2) It is improbable that John will be elected.
(18) Vote for yourself.
(19) He is the God who is in Jerusalem.
(20) He is the God which is in Jerusalem.

The embodiment of the present invention described above assumes that the corpus holding unit 16 holds enough English sentences to select a proofreading result. If only a small number of English sentences are held, a thesaurus may be used to preferentially retrieve from the corpus holding unit 16 example sentences containing words similar to those in the English sentence being proofread (see FIG. 16). In this case, the selection unit 18 can select sentences after replacing words in the sentences with higher-level concept words. For example, the above selection algorithm may be executed by the selection unit 18 after replacing "John" with a marker such as "person name".
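The replacement of words by higher-level concept markers can be sketched as a simple lookup applied before comparison. The table `THESAURUS`, the marker string `<PERSON>`, and the function `abstract` are hypothetical; a real thesaurus dictionary would be far larger and is not specified here.

```python
# Hypothetical thesaurus fragment mapping words to a higher-level concept marker.
THESAURUS = {"john": "<PERSON>", "ann": "<PERSON>", "bob": "<PERSON>"}


def abstract(tokens):
    """Replace each token that has a thesaurus entry with its concept marker."""
    return [THESAURUS.get(t.lower(), t) for t in tokens]


a = abstract("John gave Bob a hard time".split())
b = abstract("Ann gave John a hard time".split())
```

After abstraction the two token sequences are identical, so a corpus sentence about different people can still count as evidence for the same construction.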

As described above, the English sentence selection algorithm executed in the selection unit 18 judges or evaluates the naturalness of each generated English sentence by comparing the number of corpus sentences similar in expression to it with the corresponding number for the original input sentence, using the threshold T_a, and by checking that number itself against the predetermined threshold T_b. If all the sentences held in the corpus holding unit 16 are sufficiently reliable, the thresholds T_a and T_b can be set to small values (close to 1). For example, when the sentence proofreading support system of the present invention is used to proofread English sentences for an English-language paper, only English sentences actually used in papers in the same field are held in the corpus holding unit 16. The selection result of the selection unit 18 can then be adopted as sufficiently reliable even when the number of matching syntax trees is small.

In the embodiment described above, the corpus holding unit 16 holds, as language material, a large number of natural language sentences whose expression is natural, or which were written by native speakers; the syntax trees of these corpus sentences are compared with those of the sentences generated from the semantic analysis result of the sentence being proofread, and sentences whose syntax trees are similar (in an inclusion relation) are selected, thereby selecting sentences that are natural from a native speaker's point of view. Conversely, the corpus holding unit 16 may hold natural language sentences whose expression is unnatural, or which native speakers have judged to be unnatural, and sentences natural from a native speaker's point of view may be selected by choosing sentences whose syntax trees are not similar to those of the corpus sentences. Combining these methods makes it possible to select sentences that are still more natural from a native speaker's point of view.

When syntactic structures are equal, the semantic analysis results (f-structures) obtained by further abstracting them are basically equal as well. Accordingly, semantic analysis may be applied in advance, in addition to parsing, to the sentence set in the corpus holding unit 16; sentences whose f-structure is equal to the one held in the semantic analysis result holding unit 13 are first retrieved from the corpus holding unit 16, and the syntax tree comparison algorithm described above is then run on the retrieved sentences only (see FIG. 17). This makes processing more efficient.
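One way to picture this pre-filtering is an index keyed by a canonical form of each f-structure, so that only sentences with an equal f-structure reach the more expensive tree comparison. The sketch below is an assumption throughout: f-structures are modeled as nested dicts and lists, and `canon`, `build_index`, and `lookup` are hypothetical helper names.

```python
from collections import defaultdict


def canon(fs):
    """Convert a nested f-structure into a hashable, order-independent key."""
    if isinstance(fs, dict):
        return tuple(sorted((k, canon(v)) for k, v in fs.items()))
    if isinstance(fs, list):
        # Modifier collections are unordered, so sort their canonical forms.
        return tuple(sorted((canon(v) for v in fs), key=repr))
    return fs


def build_index(corpus):
    """corpus: iterable of (sentence, f_structure) pairs analyzed in advance."""
    index = defaultdict(list)
    for sentence, fs in corpus:
        index[canon(fs)].append(sentence)
    return index


def lookup(index, fs):
    """Return the corpus sentences whose f-structure equals fs."""
    return index.get(canon(fs), [])


fs_a = {"PRED": "spoil", "SUBJ": {"PRED": "food"},
        "ADJUNCT": [{"PRED": "quickly"}, {"PRED": "in the summer"}]}
fs_b = {"PRED": "spoil", "SUBJ": {"PRED": "food"},
        "ADJUNCT": [{"PRED": "in the summer"}, {"PRED": "quickly"}]}
fs_c = {"PRED": "fax", "SUBJ": {"PRED": "Ann"}}
index = build_index([("Food spoils quickly in the summer.", fs_a),
                     ("In the summer food spoils quickly.", fs_b),
                     ("Ann faxed him the news.", fs_c)])
hits = lookup(index, fs_a)
```

Only the sentences in `hits` would then be passed to the syntax tree comparison.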

The embodiment of the present invention described above assumes that the semantic analysis unit 12 yields a unique semantic analysis result (f-structure) for each input sentence to be proofread; in practice, however, multiple semantic analysis results may be obtained for one sentence. Even in that case, the sentence proofreading process based on sentence generation and syntax tree comparison described above can be performed for each semantic analysis result.

Alternatively, the semantic analysis results may first be narrowed down to one, after which the sentence proofreading process based on sentence generation and syntax tree comparison described above is performed (see FIG. 18). Support Vector Machine or another machine learning method can be applied to this narrowing. On narrowing down semantic analysis results, see, for example, Yoshimura, Masuichi, Ohkuma, and Sugihara, "Selection of f-structures Based on Support Vector Machine" (Information Processing Society of Japan Research Report 2003-NL-158, pp. 75-80 (2003)).

FIG. 19 shows a configuration example of a sentence proofreading system 10 according to another embodiment of the present invention. The system 10 shown in the figure includes a semantic analysis result changing unit 19 that changes the f-structure, which is the semantic analysis result of the sentence being proofread obtained by the semantic analysis unit 12, on the basis of user input via a user interface or some other automated process.

The semantic analysis result changing unit 19 can delete or change a specified attribute and its value in the f-structure. This makes it possible to alter the natural language sentences that are generated, for example by increasing or decreasing the number of generated English sentences, in accordance with the user's wishes. For example, when a sentence written in a non-native language is to be proofread and the writer is unsure of its tense, the tense attribute and its value are deleted from the f-structure. The sentence generation unit 14 then generates English sentences without being constrained by tense, widening the range of choices available to the selection unit 18.

For example, FIG. 20 shows the f-structure output when the semantic analysis unit 12 performs semantic analysis based on LFG theory on the following English sentence (21).

(21) He is owning a house.

Here, the semantic analysis result changing unit 19 presents a prepared question pattern to the user, such as "Use progressive-form information when generating proofreading candidates? (Yes / No)", and changes the f-structure accordingly. If the user (being unsure whether sentence (21) is natural with respect to tense) answers "No", processing proceeds after the information that sentence (21) is in the progressive form (that is, the "ASPECT PROG(ress)" part) has been deleted from the f-structure in FIG. 20. As a result, sentences such as the following English sentence (22) are generated.

(22) He owns a house.

In practice, "own" is rarely used in the present progressive in English, so English sentence (22) is adopted as the proofreading result.
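The deletion performed by the semantic analysis result changing unit 19 amounts to removing one attribute-value pair before generation. The following Python sketch assumes a dict representation of the f-structure; the contents of `fig20` are a guess at what FIG. 20 might contain (only the ASPECT PROG entry is named in the text), and `drop_attribute` is a hypothetical helper.

```python
def drop_attribute(fs, name):
    """Return a copy of the f-structure without the top-level attribute `name`."""
    return {k: v for k, v in fs.items() if k != name}


# Assumed stand-in for the f-structure of sentence (21).
fig20 = {
    "PRED": "own",
    "SUBJ": {"PRED": "he"},
    "OBJ": {"PRED": "house"},
    "TENSE": "PRES",
    "ASPECT": "PROG",
}

# The user answered "No" to using progressive-form information.
relaxed = drop_attribute(fig20, "ASPECT")
```

Generation from `relaxed` is no longer constrained to the progressive form, so a candidate such as (22) can appear among the generated sentences.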

In each of the embodiments described above, the sentence proofreading system generates sentences using all the information in the f-structure corresponding to the input sentence. Because the amount of computation grows exponentially with the number of morphemes in the sentence, computational cost can become a problem for long sentences. One solution is to divide the input sentence appropriately and perform the same processing as above on each resulting element. For example, the input sentence may be divided on the basis of the morphological analysis result, with the same processing performed on each element. Alternatively, the same processing can be performed on each part of the f-structure.

For example, after the input sentence has been converted by morphological analysis into a morpheme sequence annotated with part-of-speech information, predetermined division rules are applied one by one in order of priority, the division position is determined using the rule that matches, and the input sentence is divided into the elements before and after that position. Performing semantic analysis on each element obtained in this way yields the same result as dividing the semantic analysis result.

FIG. 21 shows examples of division rules. A division rule is described as the morpheme information of one morpheme, or of two or more consecutive morphemes, that determines a division position in the input sentence. In the example shown in the figure, the rules and their priorities are as follows.

Priority 1, division rule A: a position where a class-A conjunctive particle is followed by a comma.
Priority 2, division rule B: a position where a class-B conjunctive particle is followed by a comma.
Priority 3, division rule C: a position where a conjugated continuative form is followed by a comma.
Priority 4, division rule D: a position where a case particle is followed by a comma.
Priority 5, division rule E: a position where a noun, a binding particle, and a comma occur in succession.
Priority 6, division rule F: a position where an adverbial noun is followed by a comma.
Priority 7, division rule G: a position where a comma appears.
Priority 8, division rule A': a position where a class-A conjunctive particle appears.
Priority 9, division rule B': a position where a class-B conjunctive particle appears.
Priority 10, division rule C': a position where a conjugated continuative form appears.
Priority 11, division rule D': a position where a case particle appears.
Priority 12, division rule E': a position where a binding particle appears.
Priority 13, division rule F': a position where an adverbial noun appears.
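The priority-ordered application of division rules can be sketched as follows. Only four of the thirteen rules are included, the part-of-speech tag names are hypothetical, and two details are assumptions not stated in the text: the cut is placed immediately after the matched morphemes, and the first occurrence of the highest-priority matching rule is used. The tagging of the sample sentence, including treating "ので" as a class-A conjunctive particle, is purely illustrative.

```python
# Subset of the division rules, highest priority first (tag names are assumed).
RULES = [
    ("A", ["conj_particle_A", "comma"]),
    ("B", ["conj_particle_B", "comma"]),
    ("G", ["comma"]),
    ("A'", ["conj_particle_A"]),
]


def split_once(morphemes):
    """Split a list of (surface, pos) pairs at the best-priority rule match."""
    tags = [pos for _, pos in morphemes]
    for name, pattern in RULES:
        n = len(pattern)
        for i in range(len(tags) - n + 1):
            cut = i + n
            # Ignore a match at the very end: there would be nothing to split off.
            if tags[i:cut] == pattern and cut < len(morphemes):
                return name, morphemes[:cut], morphemes[cut:]
    return None, morphemes, []


sentence = [("雨", "noun"), ("が", "case_particle"), ("降った", "verb"),
            ("ので", "conj_particle_A"), ("、", "comma"),
            ("試合", "noun"), ("は", "binding_particle"), ("中止", "noun"),
            ("に", "case_particle"), ("なった", "verb"), ("。", "period")]
rule, head, tail = split_once(sentence)
```

Each of `head` and `tail` could then be analyzed separately, which keeps the number of morphemes per generation step small.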

Also, for example, when the outermost matrix structure of the f-structure is a coordinate structure, that is, when the sentence is a compound sentence, generation may be performed separately for each f-structure corresponding to a simple sentence, producing a sentence for each. It is also possible to delete nested ADJUNCTs (optional modifiers) and perform generation using only the main constituents of the sentence.

以上、特定の実施形態を参照しながら、本発明について詳解してきた。しかしながら、本発明の要旨を逸脱しない範囲で当業者が該実施形態の修正や代用を成し得ることは自明である。 The present invention has been described in detail above with reference to specific embodiments. However, it is obvious that those skilled in the art can make modifications and substitutions of the embodiment without departing from the gist of the present invention.

In this specification, semantic analysis means based on LFG (Lexical Functional Grammar) theory has been described concretely as an example of how to obtain a semantic analysis result for an input natural language sentence. However, the gist of the present invention is not limited to this. It is clear that the same effect can be expected from any theory or technique that can convert the structure of a natural language sentence into an abstract semantic representation and convert that representation back into a natural language sentence. One example of such a theory other than LFG is the case grammar proposed by C. J. Fillmore.

In addition, although the proofreading support system according to the present invention has been described in this specification with English sentences as the target, it is clear that the same effect can be obtained for languages other than English, provided that semantic analysis and sentence generation from the semantic analysis result are possible.

In short, the present invention has been disclosed by way of example, and the contents of this specification should not be interpreted restrictively. To determine the gist of the present invention, the claims section at the beginning should be taken into consideration.

FIG. 1 is a diagram schematically showing the functional configuration of a natural language processing system according to the present invention.
FIG. 2 is a diagram for explaining a system that uses the language-universal nature of the f-structure, which does not depend on differences in the surface form of natural language sentences, for machine translation and for sentence proofreading support.
FIG. 3 is a diagram schematically showing the configuration of the text composition system 10 according to an embodiment of the present invention.
FIG. 4 is a diagram schematically showing the functional configuration of the syntactic and semantic analysis processing system 200 based on LFG grammar theory.
FIG. 5 is a diagram showing a configuration example of an f-structure output by the semantic analysis unit 12.
FIG. 6 is a diagram showing a configuration example of a syntax tree.
FIG. 7 is a diagram showing a configuration example of a syntax tree.
FIG. 8 is a diagram showing a configuration example of a syntax tree.
FIG. 9 is a flowchart showing the English sentence selection algorithm executed in the selection unit 18.
FIG. 10 is a diagram for explaining that the syntax tree shown in FIG. 8 has a structure that completely contains the syntax tree shown in FIG. 7.
FIG. 11 is a diagram showing a configuration example of an f-structure output by the semantic analysis unit 12.
FIG. 12 is a diagram showing a configuration example of an f-structure output by the semantic analysis unit 12.
FIG. 13 is a diagram showing a configuration example of an f-structure output by the semantic analysis unit 12.
FIG. 14 is a diagram showing a configuration example of an f-structure output by the semantic analysis unit 12.
FIG. 15 is a diagram showing a configuration example of an f-structure output by the semantic analysis unit 12.
FIG. 16 is a diagram showing a configuration example of the sentence proofreading system 10 including a thesaurus dictionary.
FIG. 17 is a diagram showing a configuration example of the sentence proofreading system 10 that performs proofreading using the syntax analysis results of those sentences in the corpus whose semantic analysis results have the same structure.
FIG. 18 is a diagram showing a configuration example of the sentence proofreading system 10 that, when a plurality of semantic analysis results are obtained for one input sentence to be proofread, narrows them down to one before performing proofreading.
FIG. 19 is a diagram showing a configuration example of the sentence proofreading system 10 that performs proofreading after changing the semantic analysis result of the sentence to be proofread.
FIG. 20 is a diagram showing a configuration example of an f-structure output by the semantic analysis unit 12.
FIG. 21 is a diagram showing an example of the division rules that determine the division positions of an input sentence.

Explanation of symbols

10 ... Text composition system
11 ... Proofreading target sentence input unit
12 ... Semantic analysis unit
13 ... Semantic analysis result holding unit
14 ... Sentence generation unit
15 ... Generation result holding unit
16 ... Corpus holding unit
17 ... Syntax analysis unit
18 ... Selection unit

Claims

A natural language processing system that supports proofreading of sentences, comprising:
semantic analysis means for performing semantic analysis on a natural language sentence to be proofread;
semantic analysis result holding means for holding the semantic analysis result obtained by the semantic analysis means;
sentence generation means for generating one or more natural language sentences from the semantic analysis result obtained by the semantic analysis means;
generation result holding means for holding the generation result obtained by the sentence generation means;
corpus holding means for holding a large number of natural language sentences;
syntax analysis means for parsing a natural language sentence; and
selection means for comparing the natural language sentences held in the generation result holding means with the natural language sentences held in the corpus holding means on the basis of the syntax analysis results obtained for each by the syntax analysis means, and for selecting, as a natural sentence, from among the two or more natural language sentences held in the generation result holding means, a sentence for which many of the natural language sentences held in the corpus holding means have a similar syntax analysis result,
wherein the semantic analysis means performs semantic analysis based on Lexical Functional Grammar (LFG) theory and outputs the semantic content of the natural language sentence to be proofread as a functional structure (f-structure) consisting of a nested structure of attribute-attribute value pairs, and
the sentence generation means generates a natural language sentence from the semantic analysis result described as a functional structure, obtained by the semantic analysis processing based on the Lexical Functional Grammar theory in the semantic analysis means.

The syntax analysis means outputs the structure of the words and phrases in a sentence as a syntax tree expressed in tree-structure form,
The selection means calculates a similarity based on a distance calculation between the syntax trees of the natural language sentences held in the generation result holding means and the syntax trees of the natural language sentences held in the corpus holding means, and selects, as a natural sentence, a sentence for which many of the natural language sentences held in the corpus holding means have a similar syntax analysis result,
The natural language processing system according to claim 1.

A thesaurus dictionary is further provided,
The selection means replaces a word in the sentence to be proofread with a higher-level concept word (hypernym) based on the thesaurus dictionary, and then selects a sentence,
The natural language processing system according to claim 1.

Means is further provided for obtaining semantic analysis results for the natural language sentences held in the corpus holding means,
The selection means selects a sentence based on the comparison, taking as targets those sentences in the corpus holding means whose semantic analysis result is equal to the semantic analysis result, held in the semantic analysis result holding means, for the sentence to be proofread,
The natural language processing system according to claim 2.

The semantic analysis means outputs a relationship between a feature name (attribute) and a feature value (attribute value) included in a natural language sentence to be analyzed as a semantic analysis result,
Semantic analysis result changing means is further provided for changing or deleting the name and the value of a predetermined feature included in the semantic analysis result held in the semantic analysis result holding means,
The natural language processing system according to claim 1.

Semantic analysis result dividing means is further provided that either divides the semantic analysis result held in the semantic analysis result holding means into two or more, or divides the sentence to be proofread and performs semantic analysis on it, thereby obtaining two or more semantic analysis results,
The sentence generation means performs a sentence generation process for each divided semantic analysis result.
The natural language processing system according to claim 1.

A natural language processing method for supporting proofreading of sentences on a natural language processing system constructed using a computer that functions as semantic analysis means, semantic analysis result holding means, sentence generation means, generation result holding means, corpus holding means, syntax analysis means, and selection means, the method comprising:
a semantic analysis step in which the semantic analysis means performs semantic analysis on a natural language sentence to be proofread;
a semantic analysis result holding step in which the semantic analysis result holding means holds the semantic analysis result obtained by the semantic analysis means;
a sentence generation step in which the sentence generation means generates one or more natural language sentences from the semantic analysis result obtained by the semantic analysis means;
a generation result holding step in which the generation result holding means holds the generation result obtained by the sentence generation means;
a corpus holding step in which the corpus holding means holds a large number of natural language sentences;
a syntax analysis step in which the syntax analysis means parses a natural language sentence; and
a selection step in which the selection means compares the natural language sentences held in the generation result holding means with the natural language sentences held in the corpus holding means on the basis of the syntax analysis results obtained for each by the syntax analysis means, and selects, as a natural sentence, from among the two or more natural language sentences held in the generation result holding means, a sentence for which many of the natural language sentences held in the corpus holding means have a similar syntax analysis result,
wherein, in the semantic analysis step, semantic analysis based on Lexical Functional Grammar (LFG) theory is performed and the semantic content of the natural language sentence to be proofread is output as a functional structure (f-structure) consisting of a nested structure of attribute-attribute value pairs, and
in the sentence generation step, a natural language sentence is generated from the semantic analysis result described as a functional structure, obtained by the semantic analysis processing based on the Lexical Functional Grammar theory in the semantic analysis step.

In the syntax analysis step, the structure of the words and phrases in a sentence is output as a syntax tree expressed in tree-structure form,
In the selection step, a similarity is calculated based on a distance calculation between the syntax trees of the natural language sentences held in the generation result holding means and the syntax trees of the natural language sentences held in the corpus holding means, and a sentence for which many of the natural language sentences held in the corpus holding means have a similar syntax analysis result is selected as a natural sentence,
The natural language processing method according to claim 7.

In the selection step, a word in the sentence to be proofread is replaced with a higher-level concept word (hypernym) based on a thesaurus dictionary, and a sentence is then selected,
The natural language processing method according to claim 7.

The method further comprises a step of obtaining semantic analysis results for the natural language sentences serving as the corpus,
In the selection step, a sentence is selected based on a comparison of syntax trees, taking as targets those sentences in the corpus whose semantic analysis result is equal to the semantic analysis result for the sentence to be proofread,
The natural language processing method according to claim 8.

In the semantic analysis step, a relationship between a feature name (attribute) and a feature value (attribute value) included in a natural language sentence to be analyzed is output as a semantic analysis result,
The computer also functions as a semantic analysis result changing means,
The method further comprises a semantic analysis result changing step in which the semantic analysis result changing means changes the semantic analysis result obtained in the semantic analysis step,
The natural language processing method according to claim 7.

The computer also functions as a semantic analysis result dividing means,
The method further comprises a semantic analysis result dividing step in which the semantic analysis result dividing means either divides the semantic analysis result obtained in the semantic analysis step into two or more, or divides the sentence to be proofread and performs semantic analysis on it, thereby obtaining two or more semantic analysis results,
In the sentence generation step, a sentence generation process is performed for each divided semantic analysis result.
The natural language processing method according to claim 7.

A computer program described in a computer-readable format so that processing for supporting proofreading of sentences is executed on a computer, the computer program causing the computer to function as semantic analysis means, semantic analysis result holding means, sentence generation means, generation result holding means, corpus holding means, syntax analysis means, and selection means, wherein
the semantic analysis means performs, on a natural language sentence to be proofread, semantic analysis based on Lexical Functional Grammar (LFG) theory, and outputs the semantic content of the natural language sentence as a functional structure (f-structure) consisting of a nested structure of attribute-attribute value pairs,
the semantic analysis result holding means holds the semantic analysis result obtained by the semantic analysis means,
the sentence generation means generates one or more natural language sentences from the semantic analysis result described as a functional structure, obtained by the semantic analysis processing based on the Lexical Functional Grammar theory in the semantic analysis means,
the generation result holding means holds the generation result obtained by the sentence generation means,
the corpus holding means holds a large number of natural language sentences,
the syntax analysis means parses a natural language sentence, and
the selection means compares the natural language sentences held in the generation result holding means with the natural language sentences held in the corpus holding means on the basis of the syntax analysis results obtained for each by the syntax analysis means, and selects, as a natural sentence, from among the two or more natural language sentences held in the generation result holding means, a sentence for which many of the natural language sentences held in the corpus holding means have a similar syntax analysis result.

The syntax analysis means outputs the structure of the words and phrases in a sentence as a syntax tree expressed in tree-structure form,
The selection means calculates a similarity based on a distance calculation between the syntax trees of the natural language sentences held in the generation result holding means and the syntax trees of the natural language sentences held in the corpus holding means, and selects, as a natural sentence, a sentence for which many of the natural language sentences held in the corpus holding means have a similar syntax analysis result,
The computer program according to claim 13.

The selection means selects a sentence after replacing a word in a sentence to be proofread with a higher-order concept word based on a thesaurus dictionary.
The computer program according to claim 13.

The computer program according to claim 14, wherein
the computer program further causes the computer to function as means for obtaining semantic analysis results for the natural language sentences held in the corpus holding means, and
the selection means selects a sentence based on the comparison, taking as targets those sentences in the corpus holding means whose semantic analysis result is equal to the semantic analysis result, held in the semantic analysis result holding means, for the sentence to be proofread.

The semantic analysis means outputs a relationship between a feature name (attribute) and a feature value (attribute value) included in a natural language sentence to be analyzed as a semantic analysis result,
The computer program further causes the computer to function as semantic analysis result changing means for changing or deleting the name and the value of a predetermined feature included in the semantic analysis result held in the semantic analysis result holding means,
The computer program according to claim 13.

The computer program according to claim 13, wherein
the computer program further causes the computer to function as semantic analysis result dividing means,
the semantic analysis result dividing means either divides the semantic analysis result held in the semantic analysis result holding means into two or more, or divides the sentence to be proofread and performs semantic analysis on it, thereby obtaining two or more semantic analysis results, and
the sentence generation means performs sentence generation processing for each of the divided semantic analysis results.