JP6235373B2

JP6235373B2 - Language analysis method and system

Info

Publication number: JP6235373B2
Application number: JP2014036496A
Authority: JP
Inventors: 亮永田
Original assignee: Edulab
Current assignee: Edulab
Priority date: 2013-03-02
Filing date: 2014-02-27
Publication date: 2017-11-22
Anticipated expiration: 2034-02-27
Also published as: JP2014209317A

Description

The present invention relates to a language analysis method for automatically analyzing the correctness of English sentences, and more particularly to a language analysis method and system for automatically analyzing, by computer, the correctness of prepositions in English sentences.

When a computer automatically analyzes the correctness of freely composed English sentences, for purposes such as educational support, it must grasp the meaning the sentences are intended to express, and syntactic analysis and semantic analysis are used for this. First, a sentence is decomposed into words (morphemes), and syntactic and semantic information is attached to each word by consulting a dictionary (morphological analysis). The phrase structure or dependency structure of the sentence is then analyzed mechanically according to predetermined rules. The meaning the sentence is intended to express can be analyzed by establishing semantic consistency between verbs and nouns from the resulting parse tree and the case frame information for the verb.

For example, Patent Document 1 discloses a method in which a computer automatically performs syntactic analysis: given the word string of an English sentence that has undergone morphological analysis, it examines the words in order from the beginning of the sentence, determines their syntactic and semantic relations, and decides the phrase structure or dependency structure of the sentence. The document describes in particular how syntactic analysis determines the correct attachment site in an English sentence containing a prepositional phrase that could attach to several words. This allows the meaning the sentence is intended to express to be grasped accurately.

The usage of prepositions, which are English words (morphemes), is complex, and choosing the appropriate preposition for a given context is difficult. For example, “He will go back Japan.” is an error in which the preposition of “He will go back to Japan.” has been dropped; because the usage is idiomatic, the error is relatively easy to recognize. In “I walked with my dog in the morning.”, on the other hand, the preposition “with” is unnecessary, but the reason is hard to explain. The explanation would be something like: “‘walk with a dog’ evokes the image of walking alongside a dog in the manner of a dog, so when the meaning is to take the dog for a walk, ‘walk a dog’ is natural.” In such cases, whether the presence or absence of a preposition is correct can depend on the meaning the sentence is intended to express.

Syntactic analysis and semantic analysis can likewise be used when a computer automatically analyzes the correctness of prepositions in such English sentences. Various corpora have been developed in recent years, and Non-Patent Document 1 states that preposition errors do not occur at random but show tendencies that depend on the writer's native language. A corpus of English sentences written by native speakers of a specific language other than English should therefore reflect the preposition error tendencies peculiar to that language. Furthermore, by using a method for automatically generating case frames from a corpus by computer, such as that described in Non-Patent Document 2, it should be possible to analyze the correctness of prepositions in English sentences automatically by computer.

Patent Document 1: JP 2005-134691 A

Non-Patent Document 1: Alla Rozovskaya and Dan Roth, "Algorithm Selection and Model Adaptation for ESL Correction Tasks", Proc. of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 924-933, Portland, Oregon, June 19-24, 2011.
Non-Patent Document 2: D. Kawahara and S. Kurohashi, "Acquiring reliable predicate-argument structures from raw corpora for case frame compilation", Proc. of LREC, pp. 1389-1393, 2010.

As described above, if preposition errors show tendencies that depend on the native language, then it should be possible to automatically analyze the correctness of English sentences written by native speakers of a specific language other than English. This would be done by comparing case frames obtained from a corpus of English sentences written by native speakers of a language other than English with case frames obtained from a corpus of English sentences written by native speakers of English.

The present invention has been made in view of the above circumstances, and its object is to provide a language analysis method for automatically analyzing, by computer, the correctness of prepositions in English sentences.

The language analysis method according to the present invention is a computer-implemented language analysis method for automatically analyzing the correctness of prepositions in English sentences, characterized by including: (1) a step of obtaining case frames from each of a reference English corpus consisting of English sentences written by native speakers of English and a specific English corpus consisting of English sentences written by native speakers of a specific language other than English, each case frame consisting of a set of a verb and, for the surface cases taken by the verb, a basic case, which includes a basic case indicator corresponding to the type of case and a basic case element corresponding to the word that fills the case, and a preposition case, which includes a preposition case indicator corresponding to a preposition and a preposition case element corresponding to the word to which the preposition is attached; (2) a step of integrating, among the case frames from the reference English corpus, those that share the verb, the basic case, and the preposition case indicators of the preposition case into a single case frame whose preposition case elements are the union of the originals; (3) a step of, for each case frame from the specific English corpus that is not present among the case frames from the reference English corpus, changing the preposition case indicator to one corresponding to a preposition that native speakers of the specific language are statistically likely to confuse with the preposition corresponding to that indicator and, if the changed frame matches one of the case frames from the reference English corpus, treating the frame as an error case frame and adding to it, as a union, the preposition case elements of the matching case frame from the reference English corpus; and (4) a step of judging, by means of the error case frames, the correctness of prepositions in English sentences written by native speakers of the specific language.

According to this invention, error case frames can be created using a reference English corpus consisting of English sentences written by native speakers of English and a specific English corpus consisting of English sentences written by native speakers of a specific language other than English, so that the correctness of prepositions in English sentences written by native speakers of that specific language can be analyzed automatically by computer.

In the above invention, (3) the preposition that native speakers of the specific language are statistically likely to confuse with the preposition corresponding to the preposition case indicator may be one of the English prepositions that also correspond to the word of the specific language corresponding to that preposition. According to this invention, the correctness of prepositions in English sentences as described above can be analyzed automatically by computer with higher accuracy.

The language analysis system according to the present invention is a computer-based language analysis system for automatically analyzing the correctness of prepositions in English sentences, characterized by including: (1) means for obtaining case frames from each of a reference English corpus consisting of English sentences written by native speakers of English and a specific English corpus consisting of English sentences written by native speakers of a specific language other than English, each case frame consisting of a set of a verb and, for the surface cases taken by the verb, a basic case, which includes a basic case indicator corresponding to the type of case and a basic case element corresponding to the word that fills the case, and a preposition case, which includes a preposition case indicator corresponding to a preposition and a preposition case element corresponding to the word to which the preposition is attached; (2) means for integrating, among the case frames from the reference English corpus, those that share the verb, the basic case, and the preposition case indicators of the preposition case into a single case frame whose preposition case elements are the union of the originals; and (3) means for, for each case frame from the specific English corpus that is not present among the case frames from the reference English corpus, changing the preposition case indicator to one corresponding to a preposition that native speakers of the specific language are statistically likely to confuse with the preposition corresponding to that indicator and, if the changed frame matches one of the case frames from the reference English corpus, treating the frame as an error case frame and adding to it, as a union, the preposition case elements of the matching case frame from the reference English corpus.

According to this invention, error case frames can be created using a reference English corpus consisting of English sentences written by native speakers of English and a specific English corpus consisting of English sentences written by native speakers of a specific language other than English, so that a computer can be made to analyze automatically the correctness of prepositions in English sentences written by native speakers of that specific language.

The above invention may further include (4) means for judging, by means of the error case frames, the correctness of prepositions in English sentences written by native speakers of the specific language. According to this invention, the correctness of prepositions in English sentences as described above can be analyzed automatically by computer.

In the above invention, (3) the preposition that native speakers of the specific language are statistically likely to confuse with the preposition corresponding to the preposition case indicator may be one of the English prepositions that also correspond to the word of the specific language corresponding to that preposition. According to this invention, the correctness of prepositions in English sentences as described above can be analyzed automatically by computer with higher accuracy.

A diagram showing the system configuration of the present invention.
A diagram showing an error case frame.
A flow diagram of error case frame generation, which is the main part of the method of the present invention.
A diagram showing a case frame.
An explanatory diagram of the integration of case frames.
An explanatory diagram of prepositions that are statistically likely to be confused.
A diagram concerning the determination of correction information for a case frame.
A diagram concerning the determination of correction information for a case frame.
A diagram concerning the determination of correction information for a case frame.

The details of a language analysis method for automatically analyzing, by computer, the correctness of prepositions in English sentences, and of a system for that purpose, according to one embodiment of the present invention are described with reference to FIGS. 1 to 7.

As shown in FIG. 1, the language analysis system 1 mainly includes a central control unit 30, which plays the central role in the analysis processing, and an error case frame creation unit 32 and a correctness determination unit 34, which are processing programs that carry out various processes together with the central control unit 30. The central control unit 30 is connected so that it can refer, as needed, to the information held in a corpus unit 100, namely a native speaker corpus (reference English corpus) 102 consisting of English sentences written by native speakers of English and a non-native speaker corpus (specific English corpus) 104 consisting of English sentences written by native speakers of a specific language other than English. The corpus unit 100 may be located outside the language analysis system 1 and be accessible via a network line or the like. In addition, an input device 51 such as a keyboard or scanner, for entering the text data of the English sentences whose prepositions are to be judged, and a printer 52, a display device (monitor) 53, and the like, for outputting the analysis results, are connected via an input/output interface unit 50.

First, the structure of the error case frame 10 created by the error case frame creation unit 32 is described.

As shown in FIG. 2, the error case frame 10 always includes the verb 17 that forms the core of the sentence. In addition to a verb column 11, the slot in which the verb 17 is written, it consists of a basic case column 12, the slot in which basic cases are written, a preposition case column 14, the slot in which preposition cases are written, and a feedback message column 16, in which an explanation of the preposition error is written. In other words, the basic case column 12 and the preposition case column 14 record which surface cases the verb 17 takes in English sentences.

The case indicators 18 in the basic case column 12 and the preposition case column 14, such as “Subj:”, “Prt:”, “Prep_do:”, and “Prep_with:”, are labels that denote the type of case, such as the nominative. The case elements 19 listed beside these case indicators 18, such as “PERSON”, “back”, “tokyo”, and “japan”, are the words to which the case indicators 18 are assigned. “PERSON”, which denotes a person, and the braces {} are explained later. In the following, unless otherwise noted, a “case” refers to a case indicator 18 together with its case element 19.

Here, the basic case column 12 consists of one or more cases, and at least the following three case indicators 18 are considered for it: “Subj:” (subject, the nominative case), “Prt:” (particle), and “Com:” (complement). “Subj:” is mandatory.

The preposition case column 14 likewise consists of one or more cases and describes the prepositions the verb can take. Specifically, a case indicator 18 in the preposition case column 14 is written as “Prep_x”, where x is the preposition. For example, the preposition “to” is written as “Prep_to”. For convenience, “Prep_do”, which denotes the direct object of the verb, and “Prep_io”, which denotes the indirect object, are also included among the preposition cases. This makes it possible to handle situations where a preposition has been omitted or where a preposition is unnecessary.

Furthermore, in the preposition case column 14, a case that contains an error is marked with “*” to show that it carries error information. In FIG. 2, for example, “*Prep_do:{tokyo, japan}” is the erroneous case: taking “tokyo” or “japan” as “Prep_do”, that is, as a direct object, is an error, and some preposition is required. Correction information is written after the erroneous case using “→”. For “*Prep_do:{tokyo, japan}”, the correction information indicates that “Prep_to:” is the correct preposition case.

Two notations are defined for use in both the basic case column 12 and the preposition case column 14. One is the optional case, written in parentheses “()”, as in “(Prt:back)”. The other applies when there are several case elements 19: the elements are separated by commas and enclosed in braces, as in “*Prep_do:{tokyo, japan}”. When defining an error case frame 10 that does not depend on a particular verb, “ALL” may be entered in the verb column 11, meaning that the error case frame 10 applies whatever the verb 17 is. This allows error case frames 10 for different verbs that have the same cases and the same error to be written together. Similarly, case indicators 18 and case elements 19 can be given a notation that does not depend on a particular item, so that error case frames 10 can be written together.
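The notation described above can be mirrored in a small data structure. The following is a minimal sketch, not the patent's implementation: the class names `Case` and `ErrorCaseFrame`, their fields, and the one-line bracketed text layout produced by `render` are assumptions made for illustration; only the “*”, “→”, “()” and “{}” conventions come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    indicator: str                    # case indicator, e.g. "Subj", "Prt", "Prep_to"
    elements: frozenset               # case elements (assumed non-empty)
    optional: bool = False            # optional case, written in parentheses
    erroneous: bool = False           # erroneous case, written with a leading "*"
    correction: Optional[str] = None  # indicator that should replace an erroneous one

    def render(self) -> str:
        items = sorted(self.elements)
        # Several case elements are comma-separated and enclosed in braces.
        body = items[0] if len(items) == 1 else "{" + ", ".join(items) + "}"
        text = f"{self.indicator}:{body}"
        if self.erroneous:
            text = "*" + text
            if self.correction:
                text += f" → {self.correction}"
        return f"({text})" if self.optional else text


@dataclass
class ErrorCaseFrame:
    verb: str                 # a verb, or "ALL" for a verb-independent frame
    basic_cases: List[Case]
    prep_cases: List[Case]
    feedback: str = ""        # feedback message, written by an operator

    def render(self) -> str:
        parts = [c.render() for c in self.basic_cases + self.prep_cases]
        return f"{self.verb} [" + " ".join(parts) + "]"


frame = ErrorCaseFrame(
    verb="go",
    basic_cases=[
        Case("Subj", frozenset({"PERSON"})),
        Case("Prt", frozenset({"back"}), optional=True),
    ],
    prep_cases=[
        Case("Prep_do", frozenset({"tokyo", "japan"}),
             erroneous=True, correction="Prep_to"),
    ],
)
print(frame.render())
```

Storing the case elements as a set keeps the union operation used later for integration trivial; the rendering order inside the braces is alphabetical here only to make the output deterministic.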

The feedback message column 16 is where an explanation of the preposition error is written; it is filled in mainly by a human operator who interprets the error case frame 10 described later. The explanation can be used, for example, as feedback to learners during error detection and correction.

Next, the method by which the error case frame creation unit 32 creates the error case frame 10 is described with reference to FIG. 3.

The basic idea for generating error case frames 10 is to treat case frames that are present in the non-native speaker corpus 104 but absent from the native speaker corpus 102 as error case frames 10. On its own, however, this would also extract correct case frames as error case frames 10. The following method is therefore adopted.

(1) Generation of case frames from the corpora
First, as preprocessing, each sentence in the native speaker corpus 102 and in the non-native speaker corpus 104 is parsed (FIG. 3, S1). In this analysis it is preferable to exclude in advance sentences that are unsuitable for creating the error case frames 10 described later, for example sentences whose length in tokens is at or above a predetermined value and sentences containing a predetermined number or more of commas. Where appropriate, this filtering may be applied only to the native speaker corpus (reference English corpus) 102, which is larger than the non-native speaker corpus (specific English corpus) 104.
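The sentence filter in this preprocessing step might look like the following minimal sketch. The function name `is_usable`, the whitespace tokenization, and the threshold values 30 and 2 are assumptions for illustration; the description only says that the thresholds are predetermined.

```python
def is_usable(sentence: str, max_tokens: int = 30, max_commas: int = 2) -> bool:
    """Return True if the sentence is kept for case frame extraction.

    A sentence is dropped when its token count is at or above max_tokens,
    or when it contains max_commas or more commas (both thresholds are
    hypothetical values).
    """
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    return len(tokens) < max_tokens and sentence.count(",") < max_commas


corpus = [
    "He went to the market with his family .",
    "Well , yes , but no .",
]
kept = [s for s in corpus if is_usable(s)]
print(kept)
```

Long or comma-heavy sentences are the ones a parser is most likely to misanalyze, which is presumably why they are excluded before case frames are built.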

Next, as shown in FIG. 4, a case frame is generated from the parsing result by filling the slots of the case frame 10a, namely the verb column 11, the basic case column 12, and the preposition case column 14 (FIG. 3, S2). For example, for the English sentence “He will go back Japan with his son.” shown in FIG. 4(a), the verb “go” is placed in the verb column 11 and the other cases are placed in their corresponding positions, as shown in FIG. 4(b).

Here, each case element 19 is the head of the corresponding noun phrase, converted to lowercase and to its base form. For example, “Japan” becomes “japan”. However, because the suffix “-ing” can influence the choice of preposition, words ending in “-ing” are not reduced to their base form. Some words are replaced with a special word that represents their meaning; such special words are written in capital letters only. For example, “he” and “his son” are replaced with “PERSON”, which indicates a person. This replacement can be performed automatically by simple dictionary lookup. When processing the non-native speaker corpus (specific English corpus) 104, it is preferable to correct spelling errors beforehand with a spell checker.
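The normalization of case elements can be sketched as follows. This is an illustration under assumptions: the function name, the tiny `SEMANTIC_CLASSES` dictionary, and the idea that the caller supplies a lemma from some external lemmatizer are all hypothetical, and the suffix test is a plain string check, so nouns such as “morning” also keep their surface form.

```python
# Hypothetical dictionary mapping base forms to special semantic words.
SEMANTIC_CLASSES = {"he": "PERSON", "she": "PERSON", "son": "PERSON"}


def normalize_case_element(head: str, lemma: str) -> str:
    """Turn the head word of a noun phrase into a case element.

    head  -- surface form of the head word, e.g. "Japan"
    lemma -- its base form, assumed to come from an external lemmatizer
    """
    word = head.lower()
    # Words ending in "-ing" are not reduced to their base form.
    base = word if word.endswith("ing") else lemma.lower()
    # Replace selected words by a special word written in capitals.
    return SEMANTIC_CLASSES.get(base, base)


print(normalize_case_element("Japan", "japan"))
print(normalize_case_element("shopping", "shop"))
print(normalize_case_element("sons", "son"))
```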

In the following, the sets of case frames 10a extracted from the native speaker corpus 102 and from the non-native speaker corpus 104 are referred to as the native speaker case frames 10b and the non-native speaker case frames 10c, respectively.

Depending on the circumstances, generation of the case frame 10a may be skipped as an exception when any of the following three conditions holds. The first is when verbs are coordinated by a conjunction, as in “go and get it”, because coordination can change how prepositions are used. The second is when the verb is “be”, “do”, or “have”; these are special verbs that are also used as auxiliaries. The third is when a case element is “it”, “this”, “that”, or “one”, or a word that does not normally function as a noun, such as “the”. “It”, “this”, “that”, and “one” are excluded because the usage of the case is thought to vary with what they refer to; the other words are excluded because they most likely result from a parsing error.

The optional preposition cases in a case frame are identified as follows: (i) objects are always obligatory cases (objects are also treated as preposition cases for convenience); (ii) a preposition case that appears to the left of the verb is always optional; and (iii) among the preposition cases that appear to the right of the verb, all are optional except the one closest to the verb. As an example of (ii), in “In the morning, he went shopping.”, the preposition case “In the morning” appears to the left of the verb and is therefore optional. As an example of (iii), in “He went to the market with his family.”, “with his family”, which is farther from the verb, is optional.
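Heuristics (i) to (iii) can be sketched as a single pass over the preposition cases of one clause. The function name and the input format (a list of indicator and token position pairs) are assumptions, as is the reading that only genuine prepositional phrases, not objects, compete for being “closest to the verb” in rule (iii).

```python
OBJECT_INDICATORS = {"Prep_do", "Prep_io"}  # objects treated as preposition cases


def mark_optional(verb_index, prep_cases):
    """Return one flag per preposition case: True if the case is optional.

    verb_index -- token position of the verb
    prep_cases -- list of (indicator, token_position) pairs
    """
    # Positions of prepositional phrases to the right of the verb.
    right = [pos for ind, pos in prep_cases
             if ind not in OBJECT_INDICATORS and pos > verb_index]
    nearest = min(right) if right else None

    flags = []
    for indicator, pos in prep_cases:
        if indicator in OBJECT_INDICATORS:
            flags.append(False)           # (i) objects are always obligatory
        elif pos < verb_index:
            flags.append(True)            # (ii) left of the verb: optional
        else:
            flags.append(pos != nearest)  # (iii) only the nearest is obligatory
    return flags


# "He went to the market with his family ." -> verb at 1, "to" at 2, "with" at 5
print(mark_optional(1, [("Prep_to", 2), ("Prep_with", 5)]))
```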

Besides these heuristics, optional cases may also be identified by comparing two case frames. For example, comparing “He went shopping.” with “He went shopping at the market.” shows that the sentence holds without “at the market”, so “at the market” can be identified as an optional case.

(2) Integration of case frames
The native speaker case frames 10b extracted from the native speaker corpus 102 are integrated (FIG. 3, S3). Two case frames 10a among the native speaker case frames 10b are integrated when (i) their verbs are identical, (ii) their basic cases are identical, and (iii) the case indicators of their preposition cases are identical; the case elements 19 in the preposition case column 14 are then combined for each case indicator 18.

As shown in FIG. 5, for example, [Prep_to:tokyo] of case frame 10b-1 and [Prep_to:japan] of case frame 10b-2 are identical except for the case elements 19 of “Prep_to:”, namely “tokyo” and “japan”. These case elements 19 are therefore written as a union using braces {}, and the two frames are integrated into [Prep_to:{tokyo, japan}], as in case frame 10b.

As noted above, integration is performed only on the native speaker case frames 10b extracted from the native speaker corpus 102. This is because the non-native speaker case frames 10c extracted from the non-native speaker corpus 104 contain both correct and erroneous case frames, and integrating them would place correct and incorrect case elements 19 in a single case frame. The non-native speaker case frames 10c may, however, be integrated only when the verb, the basic cases, and the preposition cases are all identical. During integration, the frequency of each case element 19 may also be recorded, and this frequency information may be used in generating the error case frames 10.
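The integration step, including the optional frequency count, can be sketched with a dictionary keyed on the three matching conditions. The tuple representation of a case frame and the function name `integrate_frames` are assumptions for illustration.

```python
from collections import Counter, defaultdict


def integrate_frames(frames):
    """Integrate case frames that share verb, basic cases and preposition indicators.

    frames -- iterable of (verb, basic, preps), where basic is a tuple of
              (indicator, element) pairs and preps is a dict {indicator: element}
    Returns {(verb, basic cases, preposition indicators): {indicator: Counter}}.
    """
    integrated = defaultdict(lambda: defaultdict(Counter))
    for verb, basic, preps in frames:
        # Conditions (i)-(iii): same verb, same basic cases, same indicators.
        key = (verb, frozenset(basic), frozenset(preps))
        for indicator, element in preps.items():
            # The Counter keys form the union; the counts give the frequency.
            integrated[key][indicator][element] += 1
    return integrated


frames = [
    ("go", (("Subj", "PERSON"),), {"Prep_to": "tokyo"}),
    ("go", (("Subj", "PERSON"),), {"Prep_to": "japan"}),
    ("go", (("Subj", "PERSON"),), {"Prep_to": "japan"}),
]
integrated = integrate_frames(frames)
key = ("go", frozenset({("Subj", "PERSON")}), frozenset({"Prep_to"}))
print(dict(integrated[key]["Prep_to"]))
```

The keys of each `Counter` correspond to the brace notation {tokyo, japan}, and the counts are available if frequency information is to be used later.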

(3) Acquisition of error case frame candidates
The native speaker case frames 10b and the non-native speaker case frames 10c are compared to obtain candidates for the error case frame 10 (FIG. 3, S4). Here, case frames that are present only among the non-native speaker case frames 10c extracted from the non-native speaker corpus 104 are taken as candidates for the error case frame 10.
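Candidate acquisition amounts to a set difference between the two collections of case frames. The sketch below assumes a particular reading of “present”: a non-native frame counts as present among the native speaker case frames when an integrated native frame has the same verb, basic cases, and preposition indicators, and each of its case elements appears in the corresponding union. The function name and data layout are likewise hypothetical.

```python
def find_candidates(native, non_native):
    """Return the non-native case frames that have no native counterpart.

    native     -- {(verb, basic cases, preposition indicators): {indicator: set}}
    non_native -- list of (verb, basic, preps) with preps {indicator: element}
    """
    candidates = []
    for verb, basic, preps in non_native:
        key = (verb, frozenset(basic), frozenset(preps))
        merged = native.get(key)
        covered = merged is not None and all(
            element in merged[indicator] for indicator, element in preps.items()
        )
        if not covered:
            candidates.append((verb, basic, preps))
    return candidates


native = {
    ("go", frozenset({("Subj", "PERSON")}), frozenset({"Prep_to"})):
        {"Prep_to": {"tokyo", "japan"}},
}
non_native = [
    ("go", (("Subj", "PERSON"),), {"Prep_to": "japan"}),  # also found in native data
    ("go", (("Subj", "PERSON"),), {"Prep_do": "japan"}),  # only in non-native data
]
print(find_candidates(native, non_native))
```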

(4) Determination of correction information
Correction information is determined for each candidate for the error case frame 10 (FIG. 3, S5). This is done by changing the case indicator 18 in the preposition case column 14 using an error set (confusion set), described below, that takes the influence of the native language into account. When there are multiple case indicators 18, only one is changed at a time.

Incidentally, the influence of the native language can be taken into account by using a non-native speaker corpus 104 consisting of English sentences written by speakers of that native language. For example, when the target is French speakers whose native language is French, using English sentences written by French speakers as the non-native speaker corpus 104 naturally takes the influence of the native language into account.

As stated in Non-Patent Document 1 mentioned above, preposition errors do not occur at random but follow tendencies that depend on the native language. For example, as shown in FIG. 6, the French preposition "à" corresponds to the English prepositions "at", "in", "to", and others, so French speakers can be expected to confuse these prepositions with one another. Accordingly, ["at", "in"], for example, is taken as the error set for "to". An error set of this kind is prepared for each English preposition according to the native language and is used in determining the correction information (FIG. 3, S5).

In this embodiment, the error sets are created automatically using a probability table from statistical machine translation. Intuitively, this amounts to identifying prepositions that are easily confused on the basis of the probability values.

Referring again to FIG. 6, the left column lists French words and the right column lists English words. In this example all words on both sides are prepositions, but they need not be prepositions; any corresponding words will do. The arrows in the figure indicate the English words into which each French word is likely to be translated. That is, where e denotes an English word and f a French word, an arrow is drawn for each word pair whose probability Pr(e|f) is at or above a certain value. For example, the English "to" is shown to have a high probability of being a translation of the French "à". On the other hand, the French "à" is readily translated not only into "to" but also into "at" and "in". In other words, "to" is easily confused with "at" and "in". An error set is thus created by following the arrows twice.

Finally, "Prep_do" and "Prep_io" are also added to the error set, to handle a missing preposition and an unnecessary preposition, respectively. For example, {Prep_at, Prep_in, Prep_do, Prep_io} is obtained as the error set for "to".

In FIG. 6, when there are multiple arrows to follow in the first step, as with "in", each arrow is followed and the union of the prepositions obtained is taken as the error set. Thus the error set for "in" is {Prep_to, Prep_at, Prep_of, Prep_do, Prep_io}.
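The "follow the arrows twice" construction can be sketched as follows. The probability values, the threshold, and the second French word "en" are invented for illustration (FIG. 6 and the actual translation table are not reproduced here); the function name build_error_sets is likewise hypothetical.

```python
def build_error_sets(prob_table, threshold):
    """Build an error set per English preposition.

    prob_table maps (english_word, french_word) to Pr(e|f).
    """
    # An "arrow" f -> e exists when Pr(e|f) is at or above the threshold.
    f_to_e, e_to_f = {}, {}
    for (e, f), p in prob_table.items():
        if p >= threshold:
            f_to_e.setdefault(f, set()).add(e)
            e_to_f.setdefault(e, set()).add(f)

    error_sets = {}
    for e, sources in e_to_f.items():
        confusable = set()
        for f in sources:            # first arrow: back to each French word
            confusable |= f_to_e[f]  # second arrow: on to its English translations
        confusable.discard(e)        # a preposition is not confused with itself
        error_sets[e] = {f"Prep_{w}" for w in confusable} | {"Prep_do", "Prep_io"}
    return error_sets

# Toy table; every number here is an illustrative assumption.
table = {
    ("to", "à"): 0.4, ("at", "à"): 0.3, ("in", "à"): 0.2,
    ("in", "en"): 0.5, ("of", "en"): 0.3, ("on", "en"): 0.01,
}
error_sets = build_error_sets(table, threshold=0.1)
```

With this toy table the sketch yields the two error sets quoted in the text for "to" and "in"; the pair ("on", "en") falls below the threshold and therefore contributes no arrow.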

The determination of correction information for [Prep_do:Tokyo Prep_with:PERSON], shown in FIG. 7(a), is described next. For "Prep_do:", an error set containing it is selected, and the case indicator 18 is changed to one with another preposition, for example "Prep_at" or "Prep_to:" in the case of the error set {Prep_at, Prep_in, Prep_do, Prep_io}. If it is changed to "Prep_to:", for instance, [Prep_to:Tokyo Prep_with:PERSON] is obtained. If this case frame exists among the native speaker case frames 10b, the case indicator 18 is judged to be correct and is adopted as the correction information (see FIG. 7(b)). Further, as shown in FIG. 8, the candidate for the error case frame 10, annotated with "*" indicating the correction information, is finalized as the error case frame 10.
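A minimal sketch of this search is given below. It assumes the same hypothetical frame layout as elsewhere (a pair of verb and cases, with sets of elements on the native speaker side) and tries, for each error set that contains the indicator, both the preposition that owns the set and the other members of the set. That choice of replacement candidates is one reading of the text, not a confirmed detail; the verb "go" is also an assumption.

```python
def covered(frame, native_frames):
    """True if a native speaker frame matches verb, indicators, and elements."""
    verb, cases = frame
    return any(
        verb == n_verb
        and cases.keys() == n_cases.keys()
        and all(cases[ind] in n_cases[ind] for ind in cases)
        for n_verb, n_cases in native_frames
    )

def find_corrections(candidate, native_frames, error_sets):
    """Return (wrong_indicator, corrected_indicator) pairs for a candidate."""
    verb, cases = candidate
    corrections = []
    for indicator in cases:  # only one case indicator is changed at a time
        for prep, err_set in error_sets.items():
            if indicator not in err_set:
                continue
            for new_ind in ({f"Prep_{prep}"} | err_set) - {indicator}:
                if new_ind in cases:
                    continue  # would collide with another case of the frame
                changed = {(new_ind if k == indicator else k): v
                           for k, v in cases.items()}
                pair = (indicator, new_ind)
                if covered((verb, changed), native_frames) and pair not in corrections:
                    corrections.append(pair)
    return corrections

native = [("go", {"Prep_to": {"tokyo", "japan"}, "Prep_with": {"PERSON"}})]
candidate = ("go", {"Prep_do": "tokyo", "Prep_with": "PERSON"})
error_sets = {"to": {"Prep_at", "Prep_in", "Prep_do", "Prep_io"}}
corrections = find_corrections(candidate, native, error_sets)
```

A candidate for which the list comes back empty would simply not be confirmed as an error case frame under this sketch.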

(5) Expansion of case elements
Next, to improve the coverage of the error case frames 10, the case elements 19 in the preposition case column 14 are expanded (FIG. 3, S6). The correction information described above makes it possible to identify, among the native speaker case frames 10b, the correct case frame corresponding to an error case frame 10. As shown in FIG. 7(b), for example, [Prep_to:{tokyo, japan}] in the native speaker case frames 10b corresponds to [*Prep_do:tokyo→Prep_to] in the error case frame 10 (see FIG. 8). Because of the integration process (FIG. 3, S3), the case elements 19 in the native speaker case frames 10b have been merged and are written as a union. As shown in FIG. 9, the case elements 19 of the error case frame 10 can be expanded by adding this case element information to the corresponding case of the error case frame 10. That is, as shown in FIG. 2, "japan" is added to the case elements 19 to give [*Prep_do:{tokyo, japan}→Prep_to]. To confirm that the expansion truly represents an error, the expansion is permitted only when the newly obtained error case frame 10 does not exist among the native speaker case frames 10b.
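The expansion step, including the final safety check, might look like the sketch below. The frame layout (case indicator mapped to a set of elements), the function name expand_error_frame, the verb "go", and the exact rule used to decide whether an expanded frame "exists" among the native speaker frames are assumptions for illustration.

```python
def expand_error_frame(verb, cases, wrong, right, native_frames):
    """Expand the elements under the erroneous indicator `wrong`.

    `cases` maps each case indicator to a set of elements; `right` is the
    corrected indicator recorded as correction information.
    """
    others = {k: v for k, v in cases.items() if k != wrong}

    def in_native(indicator, elem):
        # Assumed rule: a native frame with the same verb and indicators that
        # holds `elem` under `indicator` and overlaps in every other case.
        keys = others.keys() | {indicator}
        return any(
            n_verb == verb
            and n_cases.keys() == keys
            and elem in n_cases[indicator]
            and all(others[k] & n_cases[k] for k in others)
            for n_verb, n_cases in native_frames
        )

    expanded = set(cases[wrong])
    for n_verb, n_cases in native_frames:
        # Locate the correct frame that corresponds to this error frame.
        if n_verb != verb or n_cases.keys() != others.keys() | {right}:
            continue
        if not all(others[k] & n_cases[k] for k in others):
            continue
        if not cases[wrong] & n_cases[right]:
            continue
        for elem in n_cases[right]:
            # Allow the expansion only if the new error frame is not attested.
            if not in_native(wrong, elem):
                expanded.add(elem)
    return {**others, wrong: expanded}

native = [("go", {"Prep_to": {"tokyo", "japan"}, "Prep_with": {"PERSON"}})]
error_cases = {"Prep_do": {"tokyo"}, "Prep_with": {"PERSON"}}
expanded = expand_error_frame("go", error_cases, "Prep_do", "Prep_to", native)
```

Here "japan" is carried over from the integrated native speaker frame, giving the expanded slot {tokyo, japan} under "Prep_do".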

(6) Output of error case frames
The resulting error case frames 10 are output and stored in a predetermined database (FIG. 3, S7). The information on the error case frames 10 can be output in, for example, XML format, stored in the database, and used for the correctness determination described later.
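As one possible illustration of XML output, the sketch below serializes error case frames with Python's standard xml.etree.ElementTree module. The element and attribute names (error_case_frames, frame, case, element, indicator, error, correction) are a hypothetical schema; the text does not specify one.

```python
import xml.etree.ElementTree as ET

def error_frames_to_xml(error_frames):
    """Serialize (verb, cases, (wrong, right)) triples to an XML string.

    The schema used here is an assumption made for illustration.
    """
    root = ET.Element("error_case_frames")
    for verb, cases, (wrong, right) in error_frames:
        frame = ET.SubElement(root, "frame", verb=verb)
        for indicator, elements in cases.items():
            attrs = {"indicator": indicator}
            if indicator == wrong:
                # Mark the erroneous case and record its correction information.
                attrs["error"] = "true"
                attrs["correction"] = right
            case = ET.SubElement(frame, "case", attrs)
            for elem in sorted(elements):
                ET.SubElement(case, "element").text = elem
    return ET.tostring(root, encoding="unicode")

xml_text = error_frames_to_xml([
    ("go", {"Prep_do": {"tokyo", "japan"}, "Prep_with": {"PERSON"}},
     ("Prep_do", "Prep_to")),
])
```

The resulting string can be written to a file or loaded into a database; parsing it back with ET.fromstring recovers the same frames.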

As described above, the error case frames 10 can be generated automatically given only the native speaker corpus 102 and the non-native speaker corpus 104, without the time-consuming and laborious task of annotating error information. Instead of requiring error annotation, this method checks the validity of the error case frames 10 by comparing the two corpora twice (FIG. 3, S4 and S5).

It is also possible to generate the error case frames 10 using a non-native speaker corpus 104 annotated with error information. In that case, the error information is used to select the error case frames 10 and to determine the correction information.

Next, a method is described in which the correctness determination unit 34 uses the error case frames 10 described above to judge the correctness of English sentences written by native speakers of the specific non-English language to which the non-native speaker corpus 104 relates.

First, the correctness determination unit 34 parses the English sentence to be judged and generates a case frame from it. The case frame is generated in the same way as the case frames generated from the corpora, described above. Next, this case frame is checked against the error case frames 10 in the predetermined database; if there is an error case frame 10 whose verb column 17, case indicator 18, and case element 19 all match, the preposition in the English sentence is judged to be erroneous. Using the correction information of the matched error case frame 10, the preposition error can also be corrected.
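The lookup performed at judgment time can be sketched as follows. Parsing is omitted: the sketch assumes a case frame has already been generated from the sentence, and it reuses the hypothetical layout in which an error case frame is a triple of verb, cases (indicator mapped to a set of elements), and a (wrong, right) correction pair. The function name judge_frame and the verb "go" are assumptions.

```python
def judge_frame(frame, error_frames):
    """Return the correction (wrong, right) if `frame` matches an error case
    frame on verb, case indicators, and case elements; otherwise None."""
    verb, cases = frame
    for e_verb, e_cases, correction in error_frames:
        if (verb == e_verb
                and cases.keys() == e_cases.keys()
                and all(cases[k] in e_cases[k] for k in cases)):
            return correction
    return None

error_frames = [
    ("go", {"Prep_do": {"tokyo", "japan"}, "Prep_with": {"PERSON"}},
     ("Prep_do", "Prep_to")),
]
# Frame assumed to come from a sentence in which "to" is missing before "japan".
suspect = ("go", {"Prep_do": "japan", "Prep_with": "PERSON"})
fine = ("go", {"Prep_to": "japan", "Prep_with": "PERSON"})
suggestion = judge_frame(suspect, error_frames)
no_match = judge_frame(fine, error_frames)
```

A non-None result flags the preposition as erroneous and carries the replacement indicator, which is what allows the same lookup to drive correction as well as detection.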

In the correctness determination, as described above, accuracy can be improved by using a database of error case frames 10 obtained from a non-native speaker corpus 104 that matches the native language of the author of the English sentence being judged. For example, for English sentences written by Japanese speakers whose native language is Japanese, a database of error case frames 10 obtained from a non-native speaker corpus 104 of English sentences written by Japanese speakers is used. Japanese has no prepositions, but particles play the corresponding role, so error sets can be created in the same way; as in the French example above, error case frames 10 can be generated using error sets of prepositions that Japanese speakers tend to confuse. Correctness determination and error correction are then possible.

The correctness determination unit 34 may be provided in an auxiliary system separate from the language analysis system 1 that includes the error case frame creation unit 32. The error case frames 10 extracted by the error case frame creation unit 32 are made available as a database, in XML format, or the like, and are accessed from the auxiliary system containing the correctness determination unit 34, so that English sentences written by native speakers of the specific language can be judged.

According to the embodiment described above, an explanation of the error can be given as a feedback message as appropriate for the purpose. For example, the reason a particular correction candidate was selected can be presented in a form that a person can interpret intuitively.

Embodiments of the present invention and modifications based on them have been described above, but the present invention is not necessarily limited to these; those skilled in the art will be able to find various alternative embodiments and modifications without departing from the spirit of the invention or the scope of the appended claims.

10 Error case frame
10a Case frame
10b Native speaker case frame
10c Non-native speaker case frame
18 Case indicator
19 Case element

Claims

A computer-based language analysis method for automatically analyzing the correctness of prepositions in English sentences, comprising:
(1) a step of obtaining, from each of a reference English corpus consisting of English sentences by native English speakers and a specific English corpus consisting of English sentences by native speakers of a specific language other than English, case frames each consisting of a set of:
a verb; and,
for the surface cases that the verb takes,
a basic case including a basic case indicator corresponding to the case type and a basic case element corresponding to a word that is an element of the case, and
a preposition case including a preposition case indicator corresponding to a preposition and a preposition case element corresponding to a word to which the preposition is attached;
(2) a step of integrating, among the case frames from the reference English corpus, those that share the verb, the basic case, and the preposition case indicator of the preposition case into a single case frame whose preposition case elements form a union;
(3) a step of, for a case frame from the specific English corpus that does not exist among the case frames from the reference English corpus, changing the preposition case indicator to one corresponding to a preposition that native speakers of the specific language are probabilistically likely to confuse with the preposition corresponding to the preposition case indicator and, if the changed case frame matches any of the case frames from the reference English corpus, adding the preposition case elements of the matched case frame from the reference English corpus as a union to obtain an error case frame; and
(4) a step of determining, using the error case frame, whether a preposition is correct in an English sentence written by a native speaker of the specific language.

The language analysis method according to claim 1, wherein, in (3), the preposition that native speakers of the specific language are probabilistically likely to confuse with the preposition corresponding to the preposition case indicator is one of the English prepositions corresponding to a word of the specific language that corresponds to the preposition corresponding to the preposition case indicator.

A computer-based language analysis system for automatically analyzing the correctness of prepositions in English sentences, comprising:
(1) means for obtaining, from each of a reference English corpus consisting of English sentences by native English speakers and a specific English corpus consisting of English sentences by native speakers of a specific language other than English, case frames each consisting of a set of:
a verb; and,
for the surface cases that the verb takes,
a basic case including a basic case indicator corresponding to the case type and a basic case element corresponding to a word that is an element of the case, and
a preposition case including a preposition case indicator corresponding to a preposition and a preposition case element corresponding to a word to which the preposition is attached;
(2) means for integrating, among the case frames from the reference English corpus, those that share the verb, the basic case, and the preposition case indicator of the preposition case into a single case frame whose preposition case elements form a union; and
(3) means for, for a case frame from the specific English corpus that does not exist among the case frames from the reference English corpus, changing the preposition case indicator to one corresponding to a preposition that native speakers of the specific language are probabilistically likely to confuse with the preposition corresponding to the preposition case indicator and, if the changed case frame matches any of the case frames from the reference English corpus, adding the preposition case elements of the matched case frame from the reference English corpus as a union to obtain an error case frame.

(4) The language analysis system according to claim 3, further comprising: means for determining whether a preposition is correct or incorrect in an English sentence created by a native speaker of the specific language based on the error case frame.

The language analysis system according to claim 3, wherein, in (3), the preposition that native speakers of the specific language are probabilistically likely to confuse with the preposition corresponding to the preposition case indicator is one of the English prepositions corresponding to a word of the specific language that corresponds to the preposition corresponding to the preposition case indicator.