JP2014154030A

JP2014154030A - Subject-verb agreement error detection device and program for agreement error detection

Info

Publication number: JP2014154030A
Application number: JP2013024807A
Authority: JP
Inventors: Akira Nagata; 亮永田
Original assignee: JAPAN INST FOR EDUCATIONAL MEASUREMENT Inc; JAPAN INSTITUTE FOR EDUCATIONAL MEASUREMENT Inc
Current assignee: JAPAN INST FOR EDUCATIONAL MEASUREMENT Inc; JAPAN INSTITUTE FOR EDUCATIONAL MEASUREMENT Inc
Priority date: 2013-02-12
Filing date: 2013-02-12
Publication date: 2014-08-25

Abstract

PROBLEM TO BE SOLVED: To detect subject-verb agreement errors in an English input sentence more simply and accurately, without using syntax analysis.
SOLUTION: A subject-verb agreement error detection device 12 includes: input sentence analysis means 14, which performs part-of-speech analysis and phrase analysis of an English input sentence, identifies the words, phrases and clauses that make up the input sentence, and acquires part-of-speech information on the type and number of the part of speech of each word, together with phrase and clause information on the types of the phrases and clauses; subject-verb candidate extraction means 15, which uses the result of the input sentence analysis means 14 to extract, based on rules stored in advance and without performing syntax analysis, a verb in a verb phrase of the input sentence and a noun in a noun phrase located to the left of that verb as a subject candidate; and correctness/incorrectness determination means 16, which determines, based on rules stored in advance, whether the subject candidate and the verb agree, using the person and number of the verb and of the subject candidate extracted by the subject-verb candidate extraction means 15.

Description

The present invention relates to a device and a computer program that detect, in English sentences written by learners of English, agreement errors in which the person and number of the subject do not agree with the person and number of the verb. More specifically, it relates to a subject-verb agreement error detection device and an agreement error detection program that can detect such errors more accurately and simply using only part-of-speech analysis and phrase analysis, without syntax analysis.

A relatively frequent error in English sentences written by Japanese learners of English is one in which the person and number of the subject do not agree with those of the verb (hereinafter, a "subject-verb agreement error"), as in:
"The football player from Japan play for United."
A learning support device disclosed in Patent Document 1 detects such errors. That device identifies the words corresponding to the subject and the verb in an English input sentence, obtains the grammatical attributes of each word from a word dictionary, and checks whether the person of the subject agrees with that of the verb. However, Patent Document 1 does not specifically disclose how the subject and the verb of the input sentence are identified.

One known method for detecting subject-verb agreement errors is that of Chodorow, which detects such errors based on the co-occurrence probabilities of sequences of two or three words (or parts of speech) (see Non-Patent Document 1). This method can detect an agreement error only when the subject and the verb are adjacent or are separated by a single word. As the example sentence above shows, however, it is not rare in English for the subject and the verb to be separated by two or more words, and in such cases the method cannot detect the agreement error accurately. Detecting agreement errors accurately requires an accurate grasp of the relationship between the subject and the verb in the sentence. To that end, the method described in Non-Patent Document 2 performs syntax analysis on the English sentence written by the learner, obtains the subject-verb relationship from that analysis, and then detects subject-verb agreement errors.

Patent Document 1: JP-A-8-30598

Non-Patent Document 1: M. Chodorow and C. Leacock, "An unsupervised method for detecting grammatical errors," Proc. of 1st Meeting of the North America Chapter of ACL, Oct. 2000, pp. 140-147
Non-Patent Document 2: Atsuo Kawai, Kokichi Sugihara, Noboru Sugie, "ASPEC-I, a system for detecting errors in English text," IPSJ Transactions, Vol. 25, No. 6, November 1984, pp. 1072-1079

However, the method described in Non-Patent Document 2 has the following problems. Syntax analysis is a task for which accuracy is hard to guarantee. For example, when the subject consists of many words, the analysis must take the meanings and attributes of the words into account, and there is a limit to how reliably the subject and verb of an English sentence can be identified. In particular, English sentences written by learners contain not only subject-verb agreement errors but also various other errors, which makes syntax analysis even more difficult. Consequently, even with syntax analysis, subject-verb agreement errors cannot always be detected accurately. In addition, a syntax analyzer is generally developed from a large volume of English data that has been manually annotated with syntactic information, so building and using one requires a great deal of effort and cost. For these reasons, detecting subject-verb agreement errors by means of syntax analysis cannot at present be called practical.

The present invention was devised with these problems in mind. Its object is to provide a subject-verb agreement error detection device and an agreement error detection program that can detect subject-verb agreement errors in an English input sentence more simply and accurately without using syntax analysis.

To achieve this object, the present invention is chiefly a subject-verb agreement error detection device that detects agreement errors between the subject and the verb of an English input sentence, comprising:
input sentence analysis means that performs part-of-speech analysis and phrase analysis of the input sentence, identifies the words, phrases and clauses that make up the input sentence, and acquires part-of-speech information on the type and number of the part of speech of each word together with phrase and clause information on the types of the phrases and clauses; subject-verb candidate extraction means that uses the result of the input sentence analysis means to extract, based on rules stored in advance and without performing syntax analysis, a verb in a verb phrase of the input sentence and a noun in a noun phrase located to the left of that verb as a subject candidate; and correctness/incorrectness determination means that determines, based on rules stored in advance, whether the subject candidate and the verb agree, using the person and number of the verb and of the subject candidate extracted by the subject-verb candidate extraction means.

An evaluation experiment conducted by the inventor demonstrated that the technique applied in the present invention detects subject-verb agreement errors in input sentences with higher accuracy than the conventional technique that uses syntax analysis. According to the present invention, subject-verb agreement errors in an input sentence can therefore be detected more simply and accurately without syntax analysis.

FIG. 1 is a block diagram showing the schematic configuration of an English learning system equipped with the subject-verb agreement error detection device according to the present invention.
FIG. 2 is a chart showing the processing procedure of the agreement error detection device.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the schematic configuration of an English learning system equipped with the subject-verb agreement error detection device according to the present invention. In this figure, the English learning system 10 comprises an input device 11, through which English sentences written by a user who is a learner of English are entered, and a subject-verb agreement error detection device 12, which detects, in the sentences entered through the input device 11, grammatical errors in which the person and number of the subject do not agree with those of the verb. The following description assumes an input consisting of a single sentence. If English text consisting of several sentences is entered through the input device 11, it is first cut into individual sentences based on, for example, the positions of periods, and the processing described below is then applied to each sentence.

The input device 11 consists of a data input device such as a keyboard or touch panel (not shown), but is not limited to these. It may instead be a scanner device that reads English text recorded on a paper medium as image data and converts the image data into text data, or a device that can read text data of English sentences stored on a storage medium.

The agreement error detection device 12 is a computer made up of an arithmetic processing unit such as a CPU and storage devices such as memory and a hard disk, on which a program is installed that causes the computer to function as each of the means described below. Although not a limitation, the agreement error detection device 12 of this embodiment is provided on a server that can exchange data with the input devices 11 of many users over a network such as the Internet.

The agreement error detection device 12 comprises: input sentence analysis means 14, which analyzes the input sentence received from the input device 11; subject-verb candidate extraction means 15, which uses the result of the input sentence analysis means 14 to extract from the input sentence, without syntax analysis, each verb and the subject candidates corresponding to that verb; correctness/incorrectness determination means 16, which determines whether the subject and verb of the input sentence agree, using the person and number information of the verbs and subject candidates extracted by the subject-verb candidate extraction means 15; and a database 17, which stores the various data needed for the processing in means 14 to 16.

The input sentence analysis means 14 consists of a part-of-speech analysis unit 19, which performs part-of-speech analysis on each word of the input sentence, and a phrase analysis unit 20, which performs phrase analysis on the input sentence using the result of the part-of-speech analysis unit 19.

In the part-of-speech analysis unit 19, the words that make up the input sentence are extracted using the spaces in the sentence as delimiters, and the part-of-speech information of each word is determined from data stored in the database 17, namely dictionary data such as the part-of-speech names and meanings of words, and probability data on the parts of speech of adjacent words. In other words, each word is given a code (part-of-speech label) that represents its part of speech together with accompanying attributes such as person, number (singular or plural), and inflected form.

In the phrase analysis unit 20, based on the part-of-speech information obtained for each word by the part-of-speech analysis and on data stored in the database 17, namely the dictionary data and probability data on adjacent parts of speech and phrase types, the words are grouped into phrases and clauses in order from the beginning of the input sentence. In other words, the input sentence is divided into phrase units such as noun phrases, verb phrases, adjective phrases, adverb phrases and conjunction phrases, and into clause units such as noun clauses and verb clauses, and each phrase and clause unit is given a code (phrase label or clause label) that represents its type.

The processing in the part-of-speech analysis unit 19 and the phrase analysis unit 20 follows known techniques and is not an essential part of the invention, so a detailed description is omitted.

The subject-verb candidate extraction means 15 comprises an input sentence normalization processing unit 22, which normalizes the input sentence in order to improve the accuracy with which verbs and subject candidates are extracted, and an extraction processing unit 23, which extracts the verbs and subject candidates from the input sentence after it has been normalized by the input sentence normalization processing unit 22.

The input sentence normalization processing unit 22 consists of: a target selection processing unit 25, which selects, based on the type of the input sentence, the sentences in which subject-verb agreement errors are to be detected; a parallel noun analysis processing unit 26, which adjusts noun phrases containing coordinated nouns so that subject candidates can be extracted accurately; an input sentence division processing unit 27, which divides the input sentence in order to narrow the range from which verbs and subject candidates are extracted; an unnecessary phrase deletion processing unit 28, which deletes phrases that cannot be the subject in order to reduce superfluous subject candidates; and an inverted construction conversion processing unit 29, which reorders subject-verb inverted constructions beginning with "There" into the normal word order.

The target selection processing unit 25 excludes interrogative and imperative sentences and keeps only declarative sentences. That is, the agreement error detection device 12 detects subject-verb agreement errors only in input sentences that are declarative. Specifically, when the input sentence contains a question mark (?), it is judged to be an interrogative sentence and is excluded from detection. When the processing of the extraction processing unit 23 described later yields no subject candidate, the input sentence is judged to be an imperative sentence and is likewise excluded from detection.
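The two exclusion conditions above can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name `is_detection_target` and the representation of subject candidates as a list of strings are assumptions made for the example.

```python
def is_detection_target(sentence, subject_candidates):
    """Return True only when the sentence is treated as declarative.

    sentence: the raw input sentence (str).
    subject_candidates: subject candidates found later by the extraction
        step (hypothetical representation: a list of words).
    """
    if "?" in sentence:
        return False  # contains a question mark: treated as interrogative
    if not subject_candidates:
        return False  # no subject candidate: treated as imperative
    return True
```

Because the imperative check depends on the result of the extraction step, a caller would apply the question-mark test early and the empty-candidate test only after extraction.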

The parallel noun analysis processing unit 26 performs the following processing. For example, in
"bread and butter are expensive"
the nouns "bread" and "butter" in the single noun phrase "bread and butter" are each singular, but because they are coordinated, the noun phrase "bread and butter" is treated as plural, and this determines the person and number of the verb that follows it (to its right) in the sentence.
Accordingly, when nouns within a single noun phrase are coordinated by "and" or "or", the parallel noun analysis processing unit 26 assigns to each of those words a special code (part-of-speech label) meaning "parallel noun" and stores it in association with the word. In other words, the part-of-speech labels are replaced so as to adjust the number of the subject candidates and allow subject-verb agreement errors to be detected properly. As a result, whether a subject candidate is singular or plural, which governs the number of the verb, can be determined accurately when the correctness/incorrectness determination means 16 makes its determination.
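The relabeling of coordinated nouns can be sketched as below. The Penn Treebank style tags (`NN`, `NNS`, `CC`, ...) and the label name `NN_COORD` are assumptions for illustration; the source only says that a special "parallel noun" label is assigned.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # assumed Penn Treebank style noun tags
COORD_WORDS = {"and", "or"}
PARALLEL = "NN_COORD"  # hypothetical name for the "parallel noun" label


def mark_parallel_nouns(noun_phrase):
    """Relabel the nouns of one noun phrase when they are coordinated.

    noun_phrase: list of (word, tag) pairs belonging to a single noun phrase.
    Returns a new list; nouns get the PARALLEL label only when the phrase
    contains "and"/"or" and at least two nouns.
    """
    has_coord = any(word.lower() in COORD_WORDS for word, _ in noun_phrase)
    noun_count = sum(1 for _, tag in noun_phrase if tag in NOUN_TAGS)
    if not (has_coord and noun_count >= 2):
        return list(noun_phrase)
    return [(word, PARALLEL if tag in NOUN_TAGS else tag)
            for word, tag in noun_phrase]
```

For the phrase "bread and butter", both nouns receive the parallel label, which a later step can map to plural number.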

The input sentence division processing unit 27 divides the input sentence at predetermined positions according to the following rules, which are set in advance and stored in the database 17. In the description below, each part produced by the input sentence division processing unit 27 is called a segment. The extraction processing unit 23 extracts subject candidates and verbs separately for each segment.

First, the input sentence is divided immediately before certain clauses. That is, when the analysis by the phrase analysis unit 20 finds a subordinating-conjunction clause, or when the sentence contains a keyword stored in advance in the database 17, the input sentence is divided immediately before that clause or keyword. Examples of the keywords are "but", "if", "because", "since", "though", "although", "how", "what", "when", "whether", "where" and "while".

For example, when the input sentence is
"The football player from Japan play for United when he thinks that it is right."
the input sentence is divided immediately before the keyword "when" and immediately before "that", which heads a subordinating-conjunction clause, giving three segments.
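The keyword-based division can be sketched as follows. This is an illustration under stated assumptions: the sentence is already tokenized, and the positions at which the chunker found a subordinating-conjunction clause are passed in as `clause_starts` (a hypothetical parameter), since the source leaves the chunker itself to known techniques.

```python
KEYWORDS = {"but", "if", "because", "since", "though", "although",
            "how", "what", "when", "whether", "where", "while"}


def split_into_segments(tokens, clause_starts=frozenset()):
    """Split a token list immediately before each keyword or clause start.

    tokens: list of words of one sentence.
    clause_starts: token indices where a subordinating-conjunction clause
        begins, as reported by a chunker (assumed input).
    """
    segments, current = [], []
    for i, token in enumerate(tokens):
        if current and (token.lower() in KEYWORDS or i in clause_starts):
            segments.append(current)
            current = []
        current.append(token)
    if current:
        segments.append(current)
    return segments
```

Applied to the example sentence with the position of "that" supplied as a clause start, the sketch yields three segments, beginning at "The", "when" and "that" respectively.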

In addition, when the analysis result of the part-of-speech analysis unit 19 indicates that the input sentence contains a relative pronoun ("who", "which"), the input sentence is also divided immediately before the noun phrase that, among the noun phrases preceding (to the left of) the relative pronoun, is closest to it.
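A sketch of the relative-pronoun rule follows. It assumes the sentence is given as a list of `(label, words)` chunks and detects the relative pronoun by its surface form, whereas the source relies on the part-of-speech label; only the first relative pronoun is handled. The example sentence in the usage note is made up for illustration.

```python
RELATIVE_PRONOUNS = {"who", "which"}


def split_before_antecedent(chunks):
    """Split a chunk list before the noun phrase nearest to a relative pronoun.

    chunks: list of (label, words) pairs, e.g. ("NP", ["the", "player"]).
    Returns a list of segments (each a list of chunks).
    """
    for i, (_, words) in enumerate(chunks):
        if words and words[0].lower() in RELATIVE_PRONOUNS:
            # Search leftwards for the closest noun phrase; index 0 is
            # skipped because splitting there would create an empty segment.
            for j in range(i - 1, 0, -1):
                if chunks[j][0] == "NP":
                    return [chunks[:j], chunks[j:]]
            break
    return [chunks]
```

For "I know the player who plays for United", the split falls before "the player", so that "the player who plays ..." forms its own segment.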

The unnecessary phrase deletion processing unit 28 deletes unnecessary phrases according to the following rules, which are set in advance and stored in the database 17. First, based on the analysis result of the phrase analysis unit 20, all phrases other than noun phrases and verb phrases are deleted from each segment. In addition, because a noun governed by a preposition cannot be the subject, any noun phrase immediately following (to the right of) a prepositional phrase is also deleted.

For example, when the input sentence is
"The football player with glasses play for Ajax."
the phrase analysis unit 20 judges "The football player" to be a noun phrase, "with" a prepositional phrase, "glasses" a noun phrase, "play" a verb phrase, "for" a prepositional phrase, and "Ajax" a noun phrase. In this case, the unnecessary phrase deletion processing unit 28 deletes the prepositional phrase "with", the noun phrase "glasses", the prepositional phrase "for" and the noun phrase "Ajax", leaving the noun phrase "The football player" and the verb phrase "play".
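The deletion rule can be sketched as below, assuming each segment is a list of `(label, words)` chunks with the labels `NP`, `VP` and `PP` (label names are an assumption; the source only speaks of noun, verb and prepositional phrases).

```python
def remove_unneeded_phrases(chunks):
    """Keep verb phrases and those noun phrases that can be a subject.

    chunks: list of (label, words) pairs for one segment.
    A noun phrase directly after a prepositional phrase is dropped,
    as is every phrase that is neither a noun phrase nor a verb phrase.
    """
    kept = []
    previous_label = None
    for label, words in chunks:
        if label == "VP":
            kept.append((label, words))
        elif label == "NP" and previous_label != "PP":
            kept.append((label, words))
        previous_label = label
    return kept
```

For the Ajax example this leaves only the noun phrase "The football player" and the verb phrase "play".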

As an exception, a noun phrase that follows a "noun phrase + of" sequence acting as a quantifier, as in "a number of people" or "most of them", is not deleted, because the "noun phrase + of" sequence modifies that noun phrase adjectivally. Specifically, as a first exception rule, when "a number of" or "a couple of" is present, the "noun phrase + of" part is deleted and the code representing number (part-of-speech label) of the following noun phrase is replaced so that its noun is treated as plural. As a second exception rule, when any of "most of", "any of", "some of", "many of", "(a) few of", "(a) little of", "a lot of" or "(a) plenty of" is present, the "noun phrase + of" part is deleted.
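The two exception rules can be sketched as follows, again on `(label, words)` chunks. The third tuple element `forced_plural` is a hypothetical stand-in for the replaced number label, and the sketch assumes it runs before the general deletion of noun phrases that follow a preposition.

```python
FORCE_PLURAL = {"a number", "a couple"}            # first exception rule
KEEP_NUMBER = {"most", "any", "some", "many", "few", "a few", "little",
               "a little", "a lot", "plenty", "a plenty"}  # second rule


def apply_quantifier_rules(chunks):
    """Drop quantifier-like "noun phrase + of" and keep the noun phrase after it.

    chunks: list of (label, words) pairs.
    Returns a list of (label, words, forced_plural) triples.
    """
    out = []
    i = 0
    while i < len(chunks):
        label, words = chunks[i]
        head = " ".join(words).lower()
        followed_by_of_np = (
            i + 2 < len(chunks)
            and chunks[i + 1][0] == "PP"
            and [w.lower() for w in chunks[i + 1][1]] == ["of"]
            and chunks[i + 2][0] == "NP"
        )
        if label == "NP" and followed_by_of_np and (
                head in FORCE_PLURAL or head in KEEP_NUMBER):
            next_label, next_words = chunks[i + 2]
            out.append((next_label, next_words, head in FORCE_PLURAL))
            i += 3  # skip the quantifier phrase, "of", and the kept phrase
        else:
            out.append((label, words, False))
            i += 1
    return out
```

With this sketch, "a number of people play" keeps "people" marked as plural, while "most of them play" keeps "them" with its own number.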

The inverted construction conversion processing unit 29 restores the normal word order of subject-verb inverted constructions that use the existential "There", which makes it easier for the extraction processing unit 23 to extract the verb and the subject candidates. Specifically, because the analysis by the part-of-speech analysis unit 19 identifies the existential "There", its code (part-of-speech label) is detected, and when a verb phrase immediately follows "There", the phrases from "There" up to the verb phrase that comes after that verb phrase, or up to the end of the sentence when there is no further verb phrase, are rearranged in reverse order at the phrase level. For example, when the input sentence is
"There is a key."
the inverted construction conversion processing unit 29 rearranges it into
"a key is There."
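The phrase-level reversal can be sketched as below. Several points are assumptions: the existential "There" is recognized by its surface form rather than by its part-of-speech label, the reversed span is taken to stop just before the next verb phrase, and sentence-final punctuation (label `O`, a hypothetical convention) is left in place.

```python
def undo_there_inversion(chunks):
    """Reverse the phrases of a "There + verb phrase ..." construction.

    chunks: list of (label, words) pairs for one segment.
    """
    starts_with_there = (
        len(chunks) >= 2
        and [w.lower() for w in chunks[0][1]] == ["there"]
        and chunks[1][0] == "VP"
    )
    if not starts_with_there:
        return list(chunks)
    end = len(chunks)
    for i in range(2, len(chunks)):
        if chunks[i][0] in ("VP", "O"):  # next verb phrase or punctuation
            end = i
            break
    return list(reversed(chunks[:end])) + list(chunks[end:])
```

After the reversal, the noun phrase "a key" sits to the left of "is", so the ordinary "look left of the verb" rule finds it as the subject candidate.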

The extraction processing unit 23 extracts, for each segment of the input sentence normalized by the input sentence normalization processing unit 22, the verbs and the subject candidates of each verb as follows, according to rules set in advance and stored in the database 17.

First, verbs are extracted using the analysis results of the input sentence analysis means 14, that is, the part-of-speech information and phrase information. Apart from be-verbs, only present-tense verbs are subject to subject-verb agreement. Therefore, for verbs other than be-verbs, the verbs labeled as non-third-person present or third-person present are extracted. To cope with tagging errors in which the part-of-speech analysis unit 19 labels a verb as a base form when it should have been labeled as non-third-person present, verbs labeled as base form are also extracted. However, base-form verbs used together with auxiliary verbs such as "can" or "will" are excluded. A to-infinitive such as "To read" is analyzed as a verb phrase, but because the "To" is labeled as an infinitive marker in the part-of-speech information, "read" is not extracted as a verb.

For be-verbs, on the other hand, agreement errors are detected from the surface form of the word, as described later. When any of the five be-verb forms "am", "are", "is", "was" or "were" is present, the extraction processing unit 23 extracts it as a verb.
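The verb extraction rules for ordinary verbs and be-verbs can be sketched together as follows. The Penn Treebank style tags (`VBP`, `VBZ`, `VB`, `MD`, `TO`) are an assumption; the source describes the labels only in words. The sketch treats a base-form verb as excluded whenever its verb phrase contains a modal or an infinitival "to".

```python
BE_FORMS = {"am", "are", "is", "was", "were"}
PRESENT_TAGS = {"VBP", "VBZ"}  # assumed tags: non-3rd-person / 3rd-person present


def extract_verbs(verb_phrase):
    """Pick the verbs of one verb phrase that are subject to agreement checks.

    verb_phrase: list of (word, tag) pairs.
    """
    tags = [tag for _, tag in verb_phrase]
    base_form_blocked = "MD" in tags or "TO" in tags
    verbs = []
    for word, tag in verb_phrase:
        if word.lower() in BE_FORMS:
            verbs.append((word, tag))      # be-verbs: selected by surface form
        elif tag in PRESENT_TAGS:
            verbs.append((word, tag))      # present-tense verbs
        elif tag == "VB" and not base_form_blocked:
            verbs.append((word, tag))      # base form kept to absorb tagger errors
    return verbs
```

Past-tense verbs other than "was" and "were" fall through all three branches and are therefore never checked.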

As subject candidates, the nouns, pronouns and numerals in the noun phrases preceding (to the left of) each verb extracted by the above processing are extracted. To-infinitives and gerunds located before (to the left of) the verb can also be the subject, but the phrase analysis unit 20 analyzes them as verb phrases. They are therefore found by pattern matching against patterns stored in advance in the database 17 and extracted as subject candidates. For convenience, in the case of a to-infinitive the infinitive marker "To" is taken as the subject candidate, and in the case of a gerund the gerund word itself is taken as the subject candidate.
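A sketch of the basic subject-candidate extraction follows. It assumes Penn Treebank style tags and, as a simplification not stated in the source, takes only the rightmost noun, pronoun or numeral of each noun phrase as the candidate. The pattern matching for to-infinitives and gerunds is not covered.

```python
SUBJECT_TAGS = {"NN", "NNS", "NNP", "NNPS", "PRP", "CD"}  # assumed tag set


def subject_candidates(chunks, verb_index):
    """Collect subject candidates from noun phrases left of a verb phrase.

    chunks: list of (label, tagged_words) pairs, where tagged_words is a
        list of (word, tag) pairs.
    verb_index: position in chunks of the verb phrase being checked.
    """
    candidates = []
    for label, tagged_words in chunks[:verb_index]:
        if label != "NP":
            continue
        heads = [(w, t) for w, t in tagged_words if t in SUBJECT_TAGS]
        if heads:
            candidates.append(heads[-1])  # rightmost match, assumed to be the head
    return candidates
```

For the normalized segment "The football player / play", the only candidate is "player".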

The person and number of each extracted subject candidate are then determined. Common nouns and proper nouns are all treated as third person. Number information, such as singular or plural, is determined from the part-of-speech information acquired by the part-of-speech analysis unit 19 together with the adjustments to number information made by the input sentence normalization processing unit 22. When the part-of-speech information acquired by the part-of-speech analysis unit 19 shows that a subject candidate is a collective noun that can be used in its base form as either a singular or a plural noun, the candidate is determined to have both numbers, singular and plural. When there are several subject candidates for one verb, the number of each is set to plural regardless of the number information of the nouns themselves. In other words, the number information of a noun is determined from its part-of-speech label according to the correspondence rules shown in the table below. For pronouns, on the other hand, person and number are not determined, and agreement errors are detected from the surface form of the word, as described later.
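The number determination for noun candidates can be sketched as follows. The correspondence table itself is not reproduced in this text, so the tag-to-number mapping here is an assumption based on the prose; the tags, the `NN_COORD` label, the `forced_plural` flag and the sample collective nouns are all hypothetical.

```python
COLLECTIVE_NOUNS = {"staff", "family"}  # hypothetical sample dictionary entries
PLURAL_TAGS = {"NNS", "NNPS"}           # assumed plural noun tags
PARALLEL = "NN_COORD"                   # hypothetical "parallel noun" label


def possible_numbers(word, tag, n_candidates=1, forced_plural=False):
    """Return the set of numbers ("sg", "pl") a noun candidate may have.

    n_candidates: how many subject candidates the verb has in total.
    forced_plural: True when normalization marked the noun as plural,
        e.g. after "a number of".
    """
    if forced_plural or tag == PARALLEL or n_candidates > 1:
        return {"pl"}
    if word.lower() in COLLECTIVE_NOUNS:
        return {"sg", "pl"}  # usable as singular or plural in base form
    if tag in PLURAL_TAGS:
        return {"pl"}
    return {"sg"}
```

Returning a set rather than a single value lets collective nouns carry both numbers into the agreement check.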

To reduce the influence of spelling errors in the input sentence, a spell checker (not shown) may be used to correct misspelled words before the verbs and subject candidates are extracted. That is, the spell checker checks for spelling errors, and when one is found, a correction candidate is obtained and the person and number of that candidate word are determined as described above.

The correctness/incorrectness determination means 16 determines whether subject-verb agreement is correct based on the detection rules in the table below, which are stored in advance in the database 17. In the table, "○" indicates that the person and number of the subject and the verb agree, and "×" indicates that they do not. For each verb extracted by the extraction processing unit 23, it is determined whether its person and number agree with those of each corresponding subject candidate. Subject-verb agreement is judged to be erroneous only when every subject candidate for that verb fails to agree with it in person and number. In other words, an error is detected only if person and number disagree no matter which subject candidate is chosen as the subject of the verb.
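The "error only if every candidate disagrees" rule can be illustrated as follows. This is a minimal sketch under assumptions: the detection table itself is not reproduced, so each verb form is represented by a hypothetical set of (person, number) pairs it accepts, and each subject candidate by the set of pairs it may carry. The names `Feature`, `has_agreement_error`, `LIKES`, `STUDENTS` and `TEACHER` are invented for illustration.

```python
from typing import FrozenSet, Sequence, Tuple

Feature = Tuple[int, str]  # (person, number), e.g. (3, "singular")


def has_agreement_error(
    verb_accepts: FrozenSet[Feature],
    candidates: Sequence[FrozenSet[Feature]],
) -> bool:
    """Flag an error only if no subject candidate is compatible with the verb."""
    if not candidates:
        # No candidate at all: there is nothing to judge for this verb.
        return False
    # A candidate is compatible if it shares at least one feature pair
    # with the set accepted by the verb form.
    return all(verb_accepts.isdisjoint(c) for c in candidates)


# Hypothetical feature sets for the verb form "likes" and two nouns.
LIKES = frozenset({(3, "singular")})
STUDENTS = frozenset({(3, "plural")})
TEACHER = frozenset({(3, "singular")})
```

With these sets, `has_agreement_error(LIKES, [STUDENTS])` flags an error, while `has_agreement_error(LIKES, [STUDENTS, TEACHER])` does not, because one candidate could be the true subject. Judging conservatively in this way trades some missed errors for fewer false alarms when the true subject is uncertain.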

Next, the processing procedure of the agreement error detection device 12 will be described with reference to FIG. 2.

First, after an input sentence is entered, the input sentence analysis means 14 performs input sentence analysis processing, namely part-of-speech analysis and phrase analysis of the words constituting the input sentence (step S101).

Thereafter, the input sentence normalization processing unit 22 normalizes the input sentence.

Specifically, the target selection processing unit 25 first performs target exclusion processing: if the input sentence is an interrogative sentence, it is excluded from subject-verb agreement error detection (step S102).

Next, for predetermined noun phrases, the part-of-speech labels assigned to words by the part-of-speech analysis unit 19 are replaced (step S103). In this replacement processing, when nouns within a single noun phrase are coordinated by "and" or "or", the parallel noun analysis processing unit 26 changes the part-of-speech label of each of those words to a label denoting a parallel noun. In addition, when the first exception rule applies, that is, when "a number of" or "a couple of" is present, the unnecessary phrase deletion processing unit 28 deletes the "noun phrase + of" portion and changes the part-of-speech label of the following noun phrase so that the number of its noun becomes plural.
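The handling of "a number of" and "a couple of" might be sketched as below. This is an illustration under assumptions rather than the stored rule itself: phrases are modeled as `(phrase label, tokens)` pairs, part-of-speech labels are assumed to be Penn Treebank-style tags, the head noun is taken to be the last noun of the following phrase, and all function and constant names are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]          # (word, part-of-speech label)
Chunk = Tuple[str, List[Token]]  # (phrase label, tokens)

QUANTIFIER_NPS = {("a", "number"), ("a", "couple")}
PLURAL_LABEL = {"NN": "NNS", "NNP": "NNPS"}  # assumed Penn Treebank-style labels


def _words(chunk: Chunk) -> Tuple[str, ...]:
    return tuple(word.lower() for word, _ in chunk[1])


def _pluralize_last_noun(tokens: List[Token]) -> List[Token]:
    """Relabel the last noun of a phrase as plural (assumed to be the head)."""
    result = list(tokens)
    for i in range(len(result) - 1, -1, -1):
        word, label = result[i]
        if label.startswith("NN"):
            result[i] = (word, PLURAL_LABEL.get(label, label))
            break
    return result


def drop_quantifier_phrases(chunks: List[Chunk]) -> List[Chunk]:
    """Delete "a number of" / "a couple of" and mark the next noun phrase plural."""
    out: List[Chunk] = []
    i = 0
    while i < len(chunks):
        is_quantifier = (
            chunks[i][0] == "NP"
            and _words(chunks[i]) in QUANTIFIER_NPS
            and i + 2 < len(chunks)
            and _words(chunks[i + 1]) == ("of",)
            and chunks[i + 2][0] == "NP"
        )
        if is_quantifier:
            label, tokens = chunks[i + 2]
            out.append((label, _pluralize_last_noun(tokens)))
            i += 3  # skip "noun phrase + of" and the phrase just relabeled
        else:
            out.append(chunks[i])
            i += 1
    return out
```

For a learner sentence such as "A number of student are ...", the sketch leaves the noun phrase "student" labeled as plural, so that the quantifier noun "number" is no longer available as a misleading singular subject candidate.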

Thereafter, the input sentence division processing unit 27 performs input sentence division processing, which divides the input sentence into segments (step S104). If the rules described above do not allow the input sentence to be divided partway through, the whole sentence is handled as a single segment.

Then, the unnecessary phrase deletion processing unit 28 performs unnecessary phrase deletion processing, which deletes phrases other than the predetermined noun phrases and verb phrases (step S105). The deleted phrases are excluded from subject-verb agreement error detection.

Next, when there is an inverted construction of subject and verb using the existential "There", the inverted syntax conversion processing unit 29 performs word order conversion processing, which rearranges the phrases in reverse order (step S106).
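One possible reading of this phrase-level reordering is sketched below. It assumes that a segment is a list of `(phrase label, tokens)` pairs, that the inverted construction is recognized by a leading single-word phrase "There" followed by a verb phrase, and that the whole segment is then reversed; the function name and the recognition condition are assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]          # (word, part-of-speech label)
Chunk = Tuple[str, List[Token]]  # (phrase label, tokens)


def undo_there_inversion(chunks: List[Chunk]) -> List[Chunk]:
    """Reverse the phrase order of a segment that opens with existential 'There'."""
    starts_with_there = (
        len(chunks) >= 2
        and [word.lower() for word, _ in chunks[0][1]] == ["there"]
        and chunks[1][0] == "VP"
    )
    # Reversing at the phrase level keeps the words inside each phrase in order.
    return list(reversed(chunks)) if starts_with_there else list(chunks)
```

After the reversal, the noun phrase that follows the verb in "There are many books" lies to the left of the verb, so the later left-of-verb search can find it as a subject candidate.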

When the normalization of the input sentence is complete, the extraction processing unit 23 performs subject-verb candidate extraction processing, which extracts, for each segment, each verb and its corresponding subject candidates (step S107). If no subject candidate can be found before (to the left of) a verb, the target selection processing unit 25 judges the sentence to be an imperative sentence and excludes it from subject-verb agreement error detection.
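The left-of-verb search can be sketched without any syntactic parsing, as below. The sketch rests on assumptions not taken from the description: a segment is a list of `(phrase label, tokens)` pairs, labels are Penn Treebank-style tags, the agreeing verb is taken to be the first verb token of a verb phrase, and the candidate from each noun phrase is its last noun or pronoun. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]          # (word, part-of-speech label)
Chunk = Tuple[str, List[Token]]  # (phrase label, tokens)

NOUN_LABELS = {"NN", "NNS", "NNP", "NNPS", "PRP"}  # assumed label set
VERB_LABELS = {"VB", "VBP", "VBZ", "VBD"}          # assumed label set


def _head_noun(tokens: List[Token]) -> Optional[str]:
    """Return the last noun or pronoun of a phrase, taken here as its head."""
    for word, label in reversed(tokens):
        if label in NOUN_LABELS:
            return word
    return None


def extract_pairs(segment: List[Chunk]) -> List[Tuple[str, List[str]]]:
    """Pair each verb with the nouns of all noun phrases to its left."""
    pairs: List[Tuple[str, List[str]]] = []
    for i, (label, tokens) in enumerate(segment):
        if label != "VP":
            continue
        verb = next((w for w, l in tokens if l in VERB_LABELS), None)
        if verb is None:
            continue
        candidates = []
        for left_label, left_tokens in segment[:i]:
            head = _head_noun(left_tokens) if left_label == "NP" else None
            if head is not None:
                candidates.append(head)
        if candidates:
            pairs.append((verb, candidates))
        # A verb with no candidate on its left is skipped, mirroring the
        # exclusion of imperative sentences.
    return pairs
```

For the segment "The students likes music", the sketch yields the verb "likes" with the single candidate "students"; for "Open the door", it yields nothing.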

Finally, for the verbs and subject candidates extracted from each segment, the correctness/incorrectness determination means 16 performs determination processing on subject-verb agreement (step S108), and the result is output.

The steps of the input sentence normalization need not be performed in the order described above; the order may be changed as long as later processing is not affected.

The present inventor conducted an experiment to evaluate how accurately the agreement error detection device 12 detects subject-verb agreement errors in a large number of English sentences written by Japanese learners of English (university students), including sentences containing such errors. The results showed that the device detects subject-verb agreement errors more accurately than a conventional method that relies on syntactic parsing.

As described above, besides providing the agreement error detection device 12 on a server, the program may be installed on a computer owned by each user so that the computer functions as the agreement error detection device 12.

The present invention can be used not only in the English learning system 10 but also for purposes other than English learning, such as checking translations into English.

In addition, the configuration of each part of the device of the present invention is not limited to the illustrated example, and various modifications are possible as long as substantially the same effect is obtained.

DESCRIPTION OF SYMBOLS
12 Agreement error detection device
14 Input sentence analysis means
15 Subject-verb candidate extraction means
16 Correctness/incorrectness determination means
22 Input sentence normalization processing unit
23 Extraction processing unit
25 Target selection processing unit
26 Parallel noun analysis processing unit
27 Input sentence division processing unit
28 Unnecessary phrase deletion processing unit
29 Inverted syntax conversion processing unit

Claims

A subject-verb agreement error detection device for detecting an agreement error between a subject and a verb in an input sentence in English, the device comprising:
input sentence analysis means for performing part-of-speech analysis and phrase analysis of the input sentence, specifying the words, phrases and clauses constituting the input sentence, and acquiring part-of-speech information on the part-of-speech type and number of each word and phrase and clause information on the types of the phrases and clauses;
subject-verb candidate extraction means for, using the result of the input sentence analysis means and without performing syntax analysis, extracting a verb in a verb phrase of the input sentence based on rules stored in advance, and extracting a noun in a noun phrase located to the left of the verb as a subject candidate; and
correctness/incorrectness determination means for determining, from the person and number of each of the verb and the subject candidate extracted by the subject-verb candidate extraction means, whether the subject candidate and the verb agree, based on rules stored in advance.

The subject-verb agreement error detection device according to claim 1, wherein the correctness/incorrectness determination means determines that the agreement error exists in the input sentence when the extracted verb fails to agree in person and number with every subject candidate extracted for that verb.

The subject-verb agreement error detection device according to claim 1, wherein the subject-verb candidate extraction means comprises: an input sentence normalization processing unit that normalizes the input sentence to improve the accuracy of extracting the subject candidate and the verb; and an extraction processing unit that extracts the subject candidate and the verb from the input sentence normalized by the input sentence normalization processing unit.

The subject-verb agreement error detection device according to claim 3, wherein the input sentence normalization processing unit comprises: a target selection processing unit that identifies the type of the input sentence and selects the sentences to be subjected to detection; a parallel noun analysis processing unit that adjusts the number information of the words constituting a noun phrase in which nouns are coordinated; an input sentence division processing unit that divides the input sentence at predetermined positions based on the analysis result of the input sentence analysis means; an unnecessary phrase deletion processing unit that deletes from the input sentence phrases that cannot be subject candidates; and an inverted syntax conversion processing unit that, for an inverted construction of subject and verb beginning with "There", rearranges the subject candidate and the verb into the word order of a normal construction, and
wherein the extraction processing unit extracts the verb and the subject candidate for each part divided by the input sentence division processing unit.

A subject-verb agreement error detection program for causing a computer to execute processing for detecting an agreement error between a subject and a verb in an input sentence in English, the program causing the computer to function as:
input sentence analysis means for performing part-of-speech analysis and phrase analysis of the input sentence, specifying the words, phrases and clauses constituting the input sentence, and acquiring part-of-speech information on the part-of-speech type and number of each word and phrase and clause information on the types of the phrases and clauses;
subject-verb candidate extraction means for, using the result of the input sentence analysis means and without performing syntax analysis, extracting a verb in a verb phrase of the input sentence based on rules stored in advance, and extracting a noun in a noun phrase located to the left of the verb as a subject candidate; and
correctness/incorrectness determination means for determining, from the person and number of each of the verb and the subject candidate extracted by the subject-verb candidate extraction means, whether the subject candidate and the verb agree, based on rules stored in advance.