JP2008129449A

JP2008129449A - Automatic question creating device, automatic question creating method and computer program

Info

Publication number: JP2008129449A
Application number: JP2006316057A
Authority: JP
Inventors: Takeshi Masuyama; 毅司増山; Kaori Tanio; 香里谷尾
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2006-11-22
Filing date: 2006-11-22
Publication date: 2008-06-05
Anticipated expiration: 2026-11-22
Also published as: JP5230927B2

Abstract

PROBLEM TO BE SOLVED: To provide an automatic question creation device, an automatic question creation method, and a computer program capable of automatically creating fill-in-the-blank questions and the like involving complicated grammar without time and effort. SOLUTION: The automatic question creation device includes a learning section and a classification section. The learning section receives a large number of fill-in-the-blank questions as valid data and creates invalid data by changing, in each item of valid data, the position of the blank portion from the beginning of the sentence. The valid data and the invalid data are characterized by a feature list of a plurality of kinds of features, and reference data for distinguishing a valid data group from an invalid data group is created by statistical processing. The classification section performs morphological analysis on input test data, characterizes by the feature list candidate data in which each morpheme is designated as the blank portion, and outputs the candidate data determined by the reference data to belong to the valid data group. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an automatic question creation device, an automatic question creation method, and a computer program that automatically create fill-in-the-blank questions and the like.

In tests such as entrance examinations, fill-in-the-blank questions and multiple-choice questions (hereinafter "fill-in-the-blank questions and the like") are set, and being able to create them automatically would be useful to question writers. To this end, a technique has been proposed that performs morphological analysis on an input study sentence, checks each word against a dictionary to extract target words, and replaces the extracted words with blanks to create a fill-in-the-blank question (Patent Document 1). A technique has also been proposed that selects an arbitrary pair from a dictionary (corpus) of question-answer pairs, obtains incorrect answer option candidates by deleting from the corpus answers equivalent in expression to the selected answer, questions equivalent in expression to the selected question, and answers not similar to the selected answer, and thereby creates a multiple-choice question (Patent Document 2).
Patent Document 1: Japanese Patent Laid-Open No. H06-95583
Patent Document 2: Japanese Patent Laid-Open No. 2006-126242

However, the technique of Patent Document 1 registers year designations, personal names, and place names in a dictionary in advance and, when a word in the dictionary appears in the target sentence, replaces that word with a blank. It can therefore create simple fill-in-the-blank questions, such as those on dates and personal names in social studies, but it cannot create advanced fill-in-the-blank questions, such as those on foreign-language grammar.

The technique of Patent Document 2 can generate incorrect answers that resemble the correct answer and can therefore create more advanced multiple-choice questions. However, a large number of question-answer pairs must be prepared by hand in advance to build the corpus, which imposes a heavy labor burden.

An object of the present invention is therefore to provide an automatic question creation device, an automatic question creation method, and a computer program that can automatically create fill-in-the-blank questions and the like involving complicated grammar without manual effort.

(1) An automatic question creation device comprising a machine learning unit and a classification unit, wherein the machine learning unit includes: a valid data receiving unit that receives a large number of fill-in-the-blank questions, each having a blank portion, as valid data suitable for learning use; an invalid data generation unit that generates invalid data unsuitable for learning use by changing, in each item of the valid data, the position of the blank portion from the beginning of the sentence; a feature analysis unit that characterizes the valid data and the invalid data by a feature list of a plurality of types of features for characterizing data; and a reference data generation unit that generates reference data for distinguishing a valid data group, which is a set of the valid data, from an invalid data group, which is a set of the invalid data, by statistically processing the two groups; and wherein the classification unit includes: a test data receiving unit that receives input of test data; a candidate data feature analysis unit that performs morphological analysis on the test data and characterizes, by the feature list, candidate data in which each morpheme is designated as the blank portion; and an output data generation unit that sets, as output data to be output, the candidate data classified into the valid data group according to the reference data.

According to invention (1), when the valid data receiving unit receives valid data, the automatic question creation device automatically generates, by means of the reference data generation unit, reference data for distinguishing the valid data group from the invalid data group, and characterizes, by means of the candidate data feature analysis unit, the candidate data generated from the test data. The output data generation unit then automatically outputs the candidate data determined to belong to the valid data group.

Thus, according to invention (1), reference data is created automatically and appropriate output data is generated using that reference data, so that fill-in-the-blank questions and the like involving complicated grammar can be created automatically without manual effort.

In this specification, "valid" means suitable for learning use, and "invalid" means unsuitable for learning use. A "correct answer" is an answer that yields an objectively correct sentence, and an "incorrect answer" is an answer that yields a sentence containing an objective error.
The valid/invalid distinction and the correct/incorrect distinction are independent concepts.

(2) The automatic question creation device according to (1), wherein the reference data generation unit is a support vector machine and the reference data is a decision surface defined by support vectors.

According to the configuration of (2), in addition to the effect of (1), fill-in-the-blank questions and the like involving complicated grammar can be created automatically and without manual effort by a reliable means, namely a support vector machine (SVM). A support vector machine is a learning machine proposed in 1995 by V. Vapnik of AT&T within the framework of statistical learning theory; it is a learning system that uses a hypothesis space of linear functions in a high-dimensional feature space.

(3) The automatic question creation device according to (2), wherein the output data generation unit sets, as the output data, the candidate data closest to the decision surface.

According to the configuration of (3), the device outputs not candidate data that lies far from the decision surface and is obviously valid, but the valid data that lies closest to the decision surface and hence closest to the invalid data. In addition to the effects of (1) and (2), this makes it possible to create good questions for which judging whether an answer is correct is difficult.
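As a rough illustration of the selection rule in (3), the following Python sketch picks, among candidates scored as valid, the one nearest the decision surface. The function name `pick_closest_valid`, the (text, score) input format, and the convention that a positive score means "valid" are assumptions made for this example and do not come from the specification.

```python
from typing import Optional, Sequence, Tuple


def pick_closest_valid(scored: Sequence[Tuple[str, float]]) -> Optional[str]:
    """Return the valid candidate with the smallest positive decision value.

    Each entry pairs a candidate question with a signed score such as
    w.x + b; a positive score is assumed to mean "classified as valid".
    """
    valid = [(score, text) for text, score in scored if score > 0]
    if not valid:
        return None  # no candidate was classified as valid
    # The smallest positive score lies nearest the decision surface.
    return min(valid)[1]
```

Dividing every score by the norm of the weight vector would give true distances, but because that factor is the same for all candidates the ranking is unchanged.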

(4) The automatic question creation device according to any one of (1) to (3), wherein the valid data is a pair of a question sentence and an answer set in a native-language or foreign-language test.

According to the configuration of (4), because the device is based on questions actually set in native-language or foreign-language tests, appropriate fill-in-the-blank questions and the like can be created for native-language or foreign-language subjects, in addition to the effects of (1) to (3).

(5) The automatic question creation device according to any one of (1) to (4), wherein the feature analysis unit further includes: an incorrect-answer data feature analysis unit that characterizes, by the feature list and an additional feature list for incorrect answers, first incorrect-answer data generated by placing in the blank portion an incorrect answer option, other than the correct answer, that is included in the valid data, and second incorrect-answer data generated by placing in the blank portion a word that is not included in the valid data but has a predetermined relationship with the incorrect answer option; and an option reference data generation unit that generates option reference data for distinguishing a first incorrect-answer data group, which is a set of the first incorrect-answer data, from a second incorrect-answer data group, which is a set of the second incorrect-answer data, by statistically processing the two groups; and wherein the classification unit further includes: a candidate incorrect-answer data feature analysis unit that characterizes, by the feature list and the additional feature list, candidate incorrect-answer data generated by placing in the blank portion a word having a predetermined relationship with the word corresponding to the blank portion of the output data; and an option generation unit that outputs the word corresponding to the blank portion of the candidate incorrect-answer data determined, according to the option reference data, to belong to the first incorrect-answer data group.

According to the configuration of (5), the incorrect-answer data feature analysis unit characterizes the first and second incorrect-answer data by the feature list and the additional feature list for incorrect answers, and the option reference data generation unit generates the option reference data. The candidate incorrect-answer data feature analysis unit then characterizes the candidate incorrect-answer data by the feature list and the additional feature list, and the option generation unit outputs the word corresponding to the blank portion of the candidate incorrect-answer data determined to belong to the first incorrect-answer data group.

Thus, the automatic question creation device automatically generates the option reference data and automatically outputs the word corresponding to the blank portion of the candidate incorrect-answer data determined to belong to the first incorrect-answer data group. In addition to the effects of (1) to (4), fill-in-the-blank questions and the like that involve complicated grammar and include answer options can therefore be created automatically without manual effort.

(6) The automatic question creation device according to (5), wherein the option reference data generation unit is a support vector machine and the option reference data is a decision surface defined by support vectors.

According to the configuration of (6), in addition to the effect of (5), answer options can be created automatically and without manual effort by a reliable means, namely a support vector machine.

(7) The automatic question creation device according to (6), wherein the option generation unit is configured to output the word corresponding to the blank portion of the candidate incorrect-answer data closest to the decision surface.

According to the configuration of (7), the device outputs not a word from candidate data that lies far from the decision surface and obviously belongs to the first incorrect-answer data group, but the word corresponding to the blank portion of the first incorrect-answer data that lies closest to the decision surface and hence closest to the second incorrect-answer data group. In addition to the effect of (6), this makes it possible to create good questions for which judging whether an answer is correct is difficult.

(8) An automatic question creation method for automatically creating fill-in-the-blank questions, comprising: a valid data receiving step of receiving a large number of fill-in-the-blank questions, each having a blank portion, as valid data; an invalid data generation step of generating invalid data by changing, in each item of the valid data, the position of the blank portion from the beginning of the sentence; a feature analysis step of characterizing the valid data and the invalid data by a feature list of a plurality of types of features for characterizing data; a reference data generation step of generating reference data for distinguishing a valid data group, which is a set of the valid data, from an invalid data group, which is a set of the invalid data, by statistically processing the two groups; a test data receiving step of receiving input of test data; a candidate data feature analysis step of performing morphological analysis on the test data and characterizing, by the feature list, candidate data in which each morpheme is designated as the blank portion; and an output data generation step of setting, as output data to be output, the candidate data determined according to the reference data to belong to the valid data group.

According to invention (8), as with invention (1), reference data is created automatically and appropriate output data is generated using that reference data, so that fill-in-the-blank questions and the like involving complicated grammar can be created automatically without manual effort.

(9) A computer program for causing a computer to function as a device for automatically creating fill-in-the-blank questions, the program causing the computer to execute: a valid data receiving step of receiving a large number of fill-in-the-blank questions, each having a blank-equivalent portion, as valid data; an invalid data generation step of generating invalid data by changing, in each item of the valid data, the position of the blank-equivalent portion from the beginning of the sentence; a feature analysis step of characterizing the valid data and the invalid data by a feature list of a plurality of types of features for characterizing data; a reference data generation step of generating reference data for distinguishing a valid data group, which is a set of the valid data, from an invalid data group, which is a set of the invalid data, by statistically processing the two groups; a test data receiving step of receiving input of test data; a candidate data feature analysis step of performing morphological analysis on the test data and characterizing, by the feature list, candidate data in which each morpheme is designated as the blank-equivalent portion; and an output data generation step of setting, as output data to be output, the candidate data determined according to the reference data to belong to the valid data group.

According to the present invention, it is possible to provide an automatic question creation device, an automatic question creation method, and a computer program that can automatically create fill-in-the-blank questions and the like involving complicated grammar without manual effort.

(First embodiment)
FIG. 1 is a block diagram showing the schematic configuration of an automatic question creation device 10 (hereinafter "device 10") according to the first embodiment of the present invention. Device 10 has an input unit 12 that receives input of various information, a dictionary database (DB) 14, a learning result database (DB) 16, an output unit 18, and a control unit 20. Dictionary DB 14 is, for example, an English dictionary, and is configured so that control unit 20 can determine which words have a predetermined relationship with a given word. For example, "at" and "in" are words having a predetermined relationship with the English word "on": all three share the property of being prepositions that indicate place. Device 10 is, for example, a computer such as an ordinary PC or server, but may be a portable information terminal such as a mobile phone. The English dictionary in this embodiment is an English-Japanese dictionary, but an English-English dictionary or the like may be used instead.

Control unit 20 is, for example, a CPU (Central Processing Unit), that is, a central processing unit that controls device 10.

Control unit 20 includes a machine learning unit 30 and a classification unit 50. Machine learning unit 30 automatically learns from the data input through input unit 12, performs statistical processing, and generates reference data indicating the criterion for distinguishing valid data from data that is not valid (hereinafter "invalid data"). Input unit 12 and machine learning unit 30 function as learning means. Machine learning unit 30 includes a support vector machine (SVM) implemented in software. As described above, a support vector machine is a learning system that uses a hypothesis space of linear functions in a high-dimensional feature space. The learning result, which comprises the support vectors and a decision surface for discriminating between the set of valid data and the set of invalid data, is stored in learning result DB 16. Classification unit 50 creates new fill-in-the-blank questions using input English sentences (test data) and the learning result in learning result DB 16.

FIG. 2 is a block diagram showing machine learning unit 30 in detail. As shown in FIG. 2, machine learning unit 30 has an invalid data generation unit 32, a feature analysis unit 34, and a reference data generation unit 36. Invalid data generation unit 32 generates invalid data from the training data. Feature analysis unit 34 analyzes the features of valid data and invalid data. Reference data generation unit 36 computes the decision surface and support vectors that serve as the criterion for discriminating valid data from invalid data. The functions of each unit are described in detail later.

FIG. 3 is a block diagram showing classification unit 50 in detail. As shown in FIG. 3, classification unit 50 has a candidate data feature analysis unit 52 and an output data generation unit 54. Candidate data feature analysis unit 52 creates candidate data from the test data and analyzes the features of that candidate data. Output data generation unit 54 selects (generates) the data to be output from among the candidate data. The functions of each unit are described in detail later.
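The candidate-creation step performed by candidate data feature analysis unit 52 can be pictured with the following Python sketch, which blanks out each token of a test sentence in turn. It is only an illustration under stated assumptions: whitespace splitting stands in for real morphological analysis, and the function name `make_candidates` and the `( )` blank marker are hypothetical.

```python
from typing import List, Tuple


def make_candidates(sentence: str) -> List[Tuple[str, str, int]]:
    """Return (question text, answer word, 1-based blank position) per token."""
    body = sentence.strip()
    end = ""
    # Keep sentence-final punctuation out of the token list.
    if body and body[-1] in ".?!":
        body, end = body[:-1], body[-1]
    tokens = body.split()  # assumption: whitespace tokens approximate morphemes
    candidates = []
    for i, answer in enumerate(tokens):
        shown = tokens[:i] + ["( )"] + tokens[i + 1:]
        candidates.append((" ".join(shown) + end, answer, i + 1))
    return candidates
```

Each returned candidate would then be characterized by the feature list and scored against the learning result.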

FIG. 4 shows an example of the feature list used by machine learning unit 30. A feature is an attribute that characterizes input data, and a feature value is the score assigned to a feature. Some feature values are binary, taking only "0" or "1"; some are binary, taking only "0" or "-1"; and some take arbitrary numerical values.

The feature list consists of a plurality of types of features; in the example of FIG. 4, it consists of six features, A to F. Feature A is the word that occupies the blank portion of the fill-in-the-blank question; its feature value is 1 if such a word exists and 0 otherwise. The "blank portion" is the gap to be filled in a fill-in-the-blank question. In other words, it is the position, within a sentence made up of multiple words, at which the word that belongs in the gap (hereinafter the "question word") should appear.

Feature B is the part of speech of the word in the blank; its feature value is 1 if the part of speech can be determined and 0 otherwise. Feature C is the base form of the word in the blank; its feature value is 1 if the base form can be determined and 0 otherwise. Feature D indicates whether the word in the blank matches its base form; its feature value is 1 if they match and 0 if they do not. Feature E is the sentence length; its feature value is the number of words in the sentence. Feature F is the position of the blank in the sentence; its feature value is the number of words counted from the beginning of the sentence.
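The six features A to F described above might be computed roughly as in the following Python sketch. The tiny `LEXICON` table is a hypothetical stand-in for the morphological analyzer and dictionary, and the choice to set feature D to 0 when the base form is unknown is an assumption of this example, not something the specification states.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mini-lexicon standing in for a morphological analyzer:
# surface form -> (part of speech, base form).
LEXICON: Dict[str, Tuple[str, str]] = {
    "for": ("preposition", "for"),
    "taking": ("verb", "take"),
}

Feature = Tuple[Optional[str], int]  # (item, feature value)


def extract_features(tokens: List[str], blank_index: int) -> Dict[str, Feature]:
    """Compute features A-F for the word at blank_index (0-based)."""
    word = tokens[blank_index]
    pos, base = LEXICON.get(word.lower(), (None, None))
    # Assumption: an unknown base form counts as "no match" for feature D.
    matches = base is not None and word.lower() == base
    return {
        "A": (word, 1 if word else 0),
        "B": (pos, 1 if pos is not None else 0),
        "C": (base, 1 if base is not None else 0),
        "D": ("match" if matches else "mismatch", 1 if matches else 0),
        "E": ("sentence length", len(tokens)),
        "F": ("blank position", blank_index + 1),
    }
```

Calling `extract_features` on the ten-word sentence "Thank you very much for taking care of my children" with the blank on "for" yields a sentence length of 10 and a blank position of 5.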

FIG. 5 shows an example of training data. Training data is the material from which device 10 learns. In this embodiment, each training data item is the question sentence of a fill-in-the-blank question having a blank portion, together with its correct answer.

As shown in FIG. 5, an item of training data is, for example, the English sentence "Thank you very much (for) taking care of my children." In this sentence, the parenthesized word "for" is the word in the blank portion (the question word) and is the correct answer. In the test as actually set, the blank portion is left empty. This sentence is a question, with its correct answer, that was set in an actual English test, and it is therefore valid for learning use. For this reason, the question sentences and correct answers in the training data are treated as valid data in this embodiment. The training data may instead be questions and correct answers set in Japanese (native-language) tests, or in tests of foreign languages other than English, such as French.

Input unit 12 (see FIG. 1) receives a large number of such training data items. When a training data item was originally a multiple-choice question, input unit 12 also receives the options other than the correct answer. Because those options are also valid as parts of a question, they too are valid data. Input unit 12 thus functions as the valid data receiving unit.

Machine learning unit 30 (see FIG. 1) generates invalid data by changing, in each training data item, the position of the blank portion from the beginning of the sentence. Invalid data is data that is not valid for learning use: a fill-in-the-blank question created from invalid data would not deliver a sufficient learning effect for, say, students studying English, and would be unsuitable for learning. Invalid data is generated from each training data item.

FIG. 6 shows an example of invalid data. As shown in FIG. 6, invalid data generation unit 32 of machine learning unit 30 generates invalid data by changing the position, counted from the beginning of the sentence, of the blank portion in each training data item. In FIG. 6, the parenthesized part is the blank portion.
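A minimal Python sketch of this generation step follows. It assumes that every token position other than the original blank is used to produce one invalid item, which the specification does not state explicitly; the function name `make_invalid_data` and the parenthesized display format are likewise chosen only for illustration.

```python
from typing import List, Tuple


def make_invalid_data(tokens: List[str], valid_index: int) -> List[Tuple[str, str]]:
    """Return (question text, blanked word) pairs for every non-valid position.

    tokens      -- the sentence with the correct answer filled in
    valid_index -- 0-based position of the blank in the valid question
    """
    results = []
    for i, word in enumerate(tokens):
        if i == valid_index:
            continue  # this position is the valid question itself
        # Mark the moved blank with parentheses, as in the figures.
        shown = tokens[:i] + [f"({word})"] + tokens[i + 1:]
        results.append((" ".join(shown) + ".", word))
    return results
```

For the ten-word example sentence with the valid blank on "for", this produces nine invalid items, one of which places the blank on "taking".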

FIG. 7 shows an example of the cases generated from valid data and invalid data (hereinafter collectively "valid data and the like"). A case is data representing the features of an item of valid data or invalid data. Valid data is known to be valid and is given the label "1". Invalid data is known not to be valid and is given the label "-1".

Feature analysis unit 34 (see FIG. 2) performs morphological analysis on the valid data and the like, determines the features of the word in the blank portion and of the data item as a whole, and attaches a label value (hereinafter "labeling"). For example, for the valid data item whose blank word is "for", the feature list of FIG. 4 gives the following: feature A is "for" with feature value 1; feature B is "preposition" with feature value 1; feature C is "for" with feature value 1; and feature D is "match" with feature value 1. Feature E, the sentence length, has feature value 10, and feature F, the position of the blank in the sentence, has feature value 5. Features A to F and their feature values constitute the case for this valid data item, and because the item is valid data, its label is "1".

In contrast, for invalid data whose blank word is “taking”, feature A is “taking” with feature value “1”, feature B is “verb” with feature value “1”, feature C is “take” with feature value “1”, and feature D is “mismatch” with feature value “0”. Feature E is “sentence length” with feature value “10”, and feature F is “position of the blank in the sentence” with feature value “6”. Because it is invalid data, the label is “−1”.

In the same way, features are determined and labels are attached for all invalid data. The labeled cases for the valid data, etc. are then stored in the learning result DB 16.
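
As a rough sketch of how a labeled case with features A to F might be assembled: the toy lexicon below stands in for a real morphological analyzer, and the example sentence, the dictionary keys, and the reading of feature D as "surface form equals base form" are all assumptions chosen to reproduce the feature values quoted above.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a morphological analyzer: surface -> (part of speech, base form).
# The entries are hypothetical.
TOY_LEXICON: Dict[str, Tuple[str, str]] = {
    "for": ("preposition", "for"),
    "taking": ("verb", "take"),
}


def make_case(tokens: List[str], blank_index: int, label: int) -> Dict[str, object]:
    """Build one labeled case with features A-F for the word in the blank."""
    surface = tokens[blank_index]
    pos, base = TOY_LEXICON.get(surface, ("unknown", surface))
    features = {
        f"A:{surface}": 1,                    # surface form of the blank word
        f"B:{pos}": 1,                        # part of speech
        f"C:{base}": 1,                       # base form
        "D:match": int(surface == base),      # assumed: 1 if surface equals base form
        "E:sentence_length": len(tokens),     # number of tokens in the sentence
        "F:blank_position": blank_index + 1,  # 1-based position from the sentence head
    }
    return {"features": features, "label": label}


# Hypothetical 10-word sentence with "for" in 5th and "taking" in 6th position.
tokens = "Thank you very much for taking care of my dog".split()
valid_case = make_case(tokens, 4, label=1)     # blank = "for"
invalid_case = make_case(tokens, 5, label=-1)  # blank = "taking"
print(valid_case)
print(invalid_case)
```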

FIG. 8 is a diagram illustrating an example of a learning result produced by the reference data generation unit 36 (see FIG. 2), which generates the reference data. The reference data generation unit 36 is a support vector machine, that is, a classifier that estimates the label of a case whose label (valid or invalid) is unknown. After the valid data, etc. have been converted into cases, it computes the identification surface that maximizes the distance (margin) between the cases generated from valid data (hereinafter referred to as “valid cases”) and the cases generated from invalid data (hereinafter referred to as “invalid cases”). In this way, the reference data generation unit 36 learns after converting the valid data, etc. into the form of cases.

The reference data generation unit 36 takes the valid cases and invalid cases closest to the identification surface as support vectors and uses them to classify cases whose labels are unknown. In other words, by statistically processing the valid cases and invalid cases, the reference data generation unit 36 generates an identification surface, that is, identification data for distinguishing the valid data (case) group from the invalid data (case) group.

As shown in FIG. 8, the reference data generation unit 36 generates an identification surface that separates the valid case group (the set to which the valid cases belong) from the invalid case group (the set to which the invalid cases belong). The valid cases and invalid cases closest to the identification surface are taken as support vectors. In other words, the identification surface is defined by the support vectors.
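
The text states only that the reference data generation unit 36 is a support vector machine; the kernel, the solver and the numeric encoding of the features are not specified. The following is therefore a sketch under stated assumptions: a linear SVM trained by subgradient descent on the regularized hinge loss, applied to made-up two-dimensional case vectors. The function names and hyperparameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    epochs: int = 200,
    lr: float = 0.1,
    lam: float = 0.01,
) -> Tuple[List[float], float]:
    """Minimise the L2-regularised hinge loss by subgradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularisation term shrinks w on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The case lies inside the margin: move the surface away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Positive values fall on the valid side of the identification surface."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Made-up case vectors: label 1 = valid case, label -1 = invalid case.
xs = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0), (0.0, 0.0), (-1.0, 0.0), (0.0, -1.0)]
ys = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(xs, ys)
predicted = [1 if decision_value(w, b, x) > 0 else -1 for x in xs]
print(w, b, predicted)
```

In this picture the identification surface is the set of points where the decision value is zero, and the training cases with the smallest absolute decision value play the role of support vectors.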

FIG. 9 is a diagram illustrating an example of test data and related data. As shown in FIG. 9(a), the test data in this embodiment is the English sentence “Thank you very much for your help.” This test data is received by the input unit 12 (see FIG. 1); the input unit 12 thus also functions as a test data receiving unit.

The candidate data feature analysis unit 52 (see FIG. 3) of the classification unit 50 performs morphological analysis on the test data and characterizes, using the feature list, candidate data in which each morpheme is designated as the blank portion.

FIG. 9(b) is a diagram illustrating an example of candidate data. As shown in FIG. 9(b), the candidate data feature analysis unit 52 generates, from the test data, a plurality of candidate data items, each with a different morpheme as the blank portion.

図９（ｃ）は、各候補データに基づいて生成された事例の一例を示す図である。図９（ｃ）に示すように、各候補データごとに、事例が生成される。以後、候補データに基づいて生成された事例を「候補事例」と呼ぶ。なお、候補事例は妥当であるか否かは未知であるから、そのラベルは未知である。 FIG. 9C is a diagram illustrating an example of a case generated based on each candidate data. As shown in FIG. 9C, a case is generated for each candidate data. Hereinafter, a case generated based on the candidate data is referred to as a “candidate case”. Since it is unknown whether the candidate case is valid, its label is unknown.
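
Candidate generation from the test sentence can be sketched as below. The regular-expression tokenizer is only a stand-in for morphological analysis, and the function names are hypothetical.

```python
import re
from typing import List, Tuple


def generate_candidates(sentence: str) -> List[Tuple[List[str], int]]:
    """Return one (tokens, blank_index) candidate per word of the sentence."""
    # A simple regex split stands in for real morphological analysis.
    tokens = re.findall(r"[A-Za-z']+", sentence)
    return [(tokens, i) for i in range(len(tokens))]


def render(tokens: List[str], blank_index: int) -> str:
    """Replace the chosen word with an empty blank."""
    return " ".join("( )" if i == blank_index else t for i, t in enumerate(tokens))


candidates = generate_candidates("Thank you very much for your help.")
for tokens, i in candidates:
    print(render(tokens, i), "| answer:", tokens[i])
```

Each candidate would then be turned into a candidate case with the same feature list used for training, with its label left unknown.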

FIG. 10 is a diagram illustrating an example of how the output data generation unit 54 (see FIG. 3) labels candidate cases. As shown in FIG. 10, the output data generation unit 54 uses the identification surface to divide the candidate cases into those belonging to the valid case group (hereinafter referred to as “valid candidate cases”) and those belonging to the invalid case group (hereinafter referred to as “invalid candidate cases”). This amounts to labeling the candidate cases: the label of a valid candidate case is “1”, and the label of an invalid candidate case is “−1”.

The output data generation unit 54 takes candidate data corresponding to valid candidate cases as the output data. Specifically, it determines the valid candidate case closest to the identification surface and uses the candidate data corresponding to that case as the output data.
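
Assuming a linear identification surface w·x + b = 0 (the text does not fix the form of the surface), the selection rule can be sketched as picking, among the candidates on the valid side, the one with the smallest distance to the surface. The weights and candidate vectors below are made up for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def signed_distance(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed distance from x to the surface w.x + b = 0 (positive = valid side)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def pick_output(
    w: Sequence[float],
    b: float,
    candidate_cases: List[Tuple[str, Sequence[float]]],
) -> Optional[str]:
    """Return the name of the valid candidate nearest the surface, if any."""
    scored = [(signed_distance(w, b, x), name) for name, x in candidate_cases]
    valid = [(d, name) for d, name in scored if d > 0]  # label 1
    return min(valid)[1] if valid else None


# Hypothetical surface and candidate case vectors.
w, b = (3.0, 4.0), -5.0
cases = [
    ("blank=for", (3.0, 1.0)),   # distance 1.6, valid
    ("blank=very", (1.0, 1.0)),  # distance 0.4, valid and nearest
    ("blank=help", (0.0, 0.0)),  # distance -1.0, invalid
    ("blank=much", (5.0, 5.0)),  # distance 6.0, valid but far
]
print(pick_output(w, b, cases))  # blank=very
```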

装置１０は、以上のように構成されている。上述のように、装置１０は、妥当データを入力部１２（図１参照）によって受け付けると、基準データ生成部３６（図２参照）によって、自動的に妥当データ群と非妥当データ群とを区別するための識別面とサポートベクターを生成し、候補データ特徴解析部５２（図３参照）によって、テストデータに基づいて生成した候補データを特徴付ける。そして、出力データ生成部５４（図３参照）によって、妥当データ群に属すると判断した候補データを自動的に出力する。 The apparatus 10 is configured as described above. As described above, when the valid data is received by the input unit 12 (see FIG. 1), the apparatus 10 automatically distinguishes the valid data group from the invalid data group by the reference data generation unit 36 (see FIG. 2). An identification plane and a support vector are generated, and candidate data generated based on the test data is characterized by the candidate data feature analysis unit 52 (see FIG. 3). Then, the candidate data determined to belong to the valid data group is automatically output by the output data generation unit 54 (see FIG. 3).

Thus, because the apparatus 10 automatically creates the reference data and uses it to generate appropriate output data, fill-in-the-blank questions and the like involving complex grammar can be created automatically and without effort.

Because the reference data generation unit 36 uses a support vector machine, a reliable technique, fill-in-the-blank questions and the like involving complex grammar can be created automatically and without effort. Moreover, the output data generation unit 54 takes the candidate data closest to the identification surface as the output data. It therefore outputs not candidate data that lies far from the identification surface and is obviously valid, but the valid data that is nearest to the identification surface and hence nearest to the invalid data. As a result, it can create good questions for which judging whether an answer is correct is difficult.

さらに、トレーニングデータ（妥当データ）は、実際の英語テストにおいて出題された問題文と解答の組であり、装置１０は実際に出題された英語テストに基づいて出力データを生成するから、英語問題において、適切な穴埋め問題等を作成することができる。 Further, the training data (valid data) is a set of question sentences and answers given in an actual English test, and the apparatus 10 generates output data based on the English test actually given. Appropriate hole filling problems can be created.

以下、装置１０の動作例を主に、図１１及び図１２を使用して説明する。図１１及び図１２は、装置１０の動作例を示す概略フローチャートである。 Hereinafter, an operation example of the apparatus 10 will be described mainly using FIGS. 11 and 12. 11 and 12 are schematic flowcharts showing an operation example of the apparatus 10.

First, the apparatus 10 receives as training data a large number of pairs, each consisting of a fill-in-the-blank question sentence and its answer (step S10 in FIG. 11). It then generates invalid data (step S12). Next, it determines the features of the valid data (training data) and the invalid data and generates labeled cases (step S14). Because the training data (valid data) is known in advance to be valid, it is given the label “1”, indicating validity; because the invalid data is known in advance not to be valid, it is given the label “−1”, indicating invalidity.

続いて、装置１０は、ラベル付きの事例を統計処理して、識別面及びサポートベクターを算出する（ステップＳ１６）。以上が、装置１０による、学習プロセスである。次に、図１２を使用して、装置１０による分類プロセスを説明する。 Subsequently, the apparatus 10 performs statistical processing on the labeled case and calculates an identification surface and a support vector (step S16). The above is the learning process by the apparatus 10. Next, the classification process by the apparatus 10 will be described with reference to FIG.

まず、装置１０は、テストデータとして、英文の入力を受け付ける（図１２のステップＳ３０）。続いて、テストデータを形態素解析し、各形態素を空白相当部分として規定した候補データを生成する（ステップＳ３２）。続いて、候補データを素性リストによって特徴付け、候補事例を生成する（ステップＳ３４）。 First, the apparatus 10 accepts an English input as test data (step S30 in FIG. 12). Subsequently, morphological analysis is performed on the test data, and candidate data defining each morpheme as a blank equivalent part is generated (step S32). Subsequently, the candidate data is characterized by a feature list, and a candidate case is generated (step S34).

続いて、装置１０は、学習結果を参照し、各候補事例をラベル付けする（ステップＳ３６）。続いて、装置１０は、妥当候補事例のうち、最も識別面と近い候補事例を出力事例として選択する（ステップＳ３８）。続いて、装置１０は、出力事例に対応する候補データを出力する（ステップＳ４０）。 Subsequently, the device 10 refers to the learning result and labels each candidate case (step S36). Subsequently, the apparatus 10 selects a candidate case closest to the identification surface among the valid candidate cases as an output case (step S38). Subsequently, the device 10 outputs candidate data corresponding to the output case (step S40).

Through the above steps, fill-in-the-blank questions involving complex grammar can be created automatically and without effort.

(Second Embodiment)
Next, a second embodiment will be described with reference to FIGS. 13, 14, 15, 16 and 17. In addition to the functions of the apparatus 10 of the first embodiment, the apparatus 10A of the second embodiment (see FIG. 13) has a function for automatically generating valid options. As described below, when the training data (see FIG. 16) also includes options, the apparatus 10A can use those options for learning. The apparatus 10A can then generate valid options when it creates a question from test data.

図１３、図１４及び図１５は、装置１０Ａの機能ブロック図である。図１６は、トレーニングデータの一例を示す図である。図１７は、誤答データ用追加素性リストの一例を示す図である。 13, 14 and 15 are functional block diagrams of the apparatus 10A. FIG. 16 is a diagram illustrating an example of training data. FIG. 17 is a diagram illustrating an example of an additional feature list for incorrect answer data.

As shown in FIG. 13, the control unit 20A of the apparatus 10A has a machine learning unit 30A and a classification unit 50A. As shown in FIG. 14, the machine learning unit 30A has a first incorrect answer data generation unit 40, which generates data (hereinafter referred to as “first incorrect answer data”) by filling the blank with an option of the training data other than the correct answer (hereinafter referred to as a “valid option”). The machine learning unit 30A also has a second incorrect answer data generation unit 42. This unit extracts from the dictionary DB 14 words that are not included in the training data (valid data) but have a predetermined relationship with a valid option (hereinafter referred to as “invalid options”), and generates data (hereinafter referred to as “second incorrect answer data”) by filling the blank portion of the training data with them.

As described above, a valid option is an option that is included in the training data but is not the correct answer. For example, in the training data of FIG. 16, “in” is the correct option, and “into” and “on” are options that are not correct. Although they are not correct answers, they are words easily confused with “in”, which makes them suitable for students and others studying English. For this reason, these incorrect options are called valid options. In contrast, a word that has a predetermined relationship with the word in the blank portion of the training data but is not included in the training data is called an invalid option; “above” and “having” are examples. The second incorrect answer data generation unit 42 refers to the dictionary DB 14 to extract invalid options, that is, words having a predetermined relationship with “in”, and fills the blank portion of the training data with them. A predetermined relationship means, for example, that the meanings are similar or the spellings are close.
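
One way to picture the extraction of invalid options is a spelling-similarity lookup against a word list. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure; that choice, the threshold, and the tiny word list standing in for the dictionary DB 14 are all assumptions. Semantic relatedness (which would be needed to find a word such as “above”) would require a thesaurus-like resource and is not covered here.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def spelling_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


def extract_invalid_options(
    correct: str,
    training_options: Iterable[str],
    dictionary: Iterable[str],
    threshold: float = 0.4,
) -> List[str]:
    """Words related to the correct answer by spelling but absent from the training item."""
    known = set(training_options) | {correct}
    scored = [(spelling_similarity(correct, w), w) for w in dictionary if w not in known]
    related = [(s, w) for s, w in scored if s >= threshold]
    # Most similar first; ties broken alphabetically.
    return [w for s, w in sorted(related, key=lambda p: (-p[0], p[1]))]


toy_dictionary = ["in", "into", "on", "within", "having", "above", "at"]
invalid_options = extract_invalid_options("in", ["in", "into", "on"], toy_dictionary)
print(invalid_options)  # ['having', 'within']
```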

As shown in FIG. 14, the machine learning unit 30A has an incorrect answer data feature analysis unit 44. This unit characterizes the first incorrect answer data (training data whose blank portion is filled with a valid option) and the second incorrect answer data (training data whose blank portion is filled with an invalid option) using the feature list (see FIG. 4) and the additional feature list (see FIG. 17), and generates cases. The feature list is also used here because the blank portion falls at the same place within the same training data, and analyzing the features in more detail allows the differences between the cases to be extracted. Because a case generated from first incorrect answer data is known in advance to be valid, the incorrect answer data feature analysis unit 44 gives it the label “1”, indicating validity. Because a case generated from the invalid incorrect answer data (that is, the second incorrect answer data) is known in advance not to be valid, the unit gives it the label “−1”, indicating invalidity.

As shown in FIG. 17, the additional feature list for incorrect answer data consists of, for example, features G to J, each with a corresponding feature value. Feature G is “part-of-speech match with the correct answer”, feature H is “similarity of meaning to the correct answer”, feature I is “word length”, and feature J is “whether the number of meanings is at least a predetermined threshold α”.
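
The additional features G to J might be computed along the following lines. The text does not say how meaning similarity is measured or where the number of meanings comes from, so the toy sense inventory, the Jaccard overlap used for feature H, and the default α are assumptions made purely for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical lexical resource: word -> (part of speech, set of sense labels).
TOY_SENSES: Dict[str, Tuple[str, Set[str]]] = {
    "in": ("preposition", {"inside", "during", "within-time"}),
    "into": ("preposition", {"inside", "toward"}),
    "having": ("verb", {"possess"}),
}


def additional_features(option: str, correct: str, alpha: int = 2) -> Dict[str, float]:
    """Compute features G-J for an option relative to the correct answer."""
    pos_o, senses_o = TOY_SENSES[option]
    pos_c, senses_c = TOY_SENSES[correct]
    # Assumed similarity measure: Jaccard overlap of the sense sets.
    overlap = len(senses_o & senses_c) / len(senses_o | senses_c)
    return {
        "G:pos_match": float(pos_o == pos_c),             # same part of speech?
        "H:meaning_similarity": overlap,                   # similarity to correct answer
        "I:word_length": float(len(option)),               # characters in the option
        "J:many_meanings": float(len(senses_o) >= alpha),  # at least alpha senses?
    }


features_into = additional_features("into", "in")
features_having = additional_features("having", "in")
print(features_into)
print(features_having)
```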

As shown in FIG. 14, the machine learning unit 30A has an option reference data generation unit 46. This unit statistically processes the first incorrect answer case group (the set of cases generated from the first incorrect answer data) and the second incorrect answer case group (the set of cases generated from the second incorrect answer data) to generate reference data for distinguishing the two groups (hereinafter referred to as “option reference data”). The option reference data generation unit 46 is a support vector machine, and the option reference data consists of an identification surface and the support vectors that define it.

As shown in FIG. 15, the classification unit 50A has a candidate incorrect answer data feature analysis unit 60. This unit generates a plurality of candidate incorrect answer data items by filling the blank of the test data with words that have a predetermined relationship with the word in the blank portion, and characterizes them using the feature list (see FIG. 4) and the additional feature list (see FIG. 17), thereby generating candidate incorrect answer cases. Here, having a predetermined relationship with the word in the blank portion of the test data means, for example, having a similar meaning. The candidate incorrect answer data feature analysis unit 60 then labels the candidate incorrect answer cases using the identification surface.

As shown in FIG. 15, the classification unit 50A has an option generation unit 62. Using the identification surface, the option generation unit 62 divides the candidate incorrect answer cases into those belonging to the first incorrect answer case group and those belonging to the second incorrect answer case group. This amounts to labeling the candidate incorrect answer cases: a case belonging to the first incorrect answer case group has the label “1”, and a case belonging to the second incorrect answer case group has the label “−1”.

The option generation unit 62 selects a predetermined number of candidate incorrect answer cases in order of closeness to the identification surface, and outputs the words filling the blank portions of the corresponding data as incorrect-answer options.
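
A compact sketch of this top-k selection, assuming each candidate word already has a decision value from the option classifier (positive meaning the first incorrect answer case group). For a fixed linear surface, ordering by decision value is the same as ordering by distance. The scores below are invented, and restricting the choice to positively labeled candidates is an interpretation based on the stated aim of outputting first incorrect answer data.

```python
from typing import List, Sequence, Tuple


def choose_distractors(scored: Sequence[Tuple[str, float]], k: int) -> List[str]:
    """Return up to k words on the positive side, nearest the surface first."""
    plausible = [(value, word) for word, value in scored if value > 0]
    return [word for value, word in sorted(plausible)[:k]]


# Hypothetical decision values for candidate incorrect answers to a blank whose answer is "in".
scored_words = [("into", 0.3), ("on", 0.8), ("at", 1.5), ("having", -0.6), ("above", -0.2)]
distractors = choose_distractors(scored_words, k=2)
print(distractors)  # ['into', 'on']
```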

As described above, the apparatus 10A outputs not candidate data that lies far from the identification surface and clearly belongs to the first incorrect answer data group, but the first incorrect answer data that is nearest to the identification surface and hence nearest to the second incorrect answer data group. It can therefore create good questions for which judging whether an answer is correct is difficult.

[Hardware configuration of automatic problem creation apparatus 10]
FIG. 18 is a diagram illustrating a hardware configuration example of the apparatuses 10 and 10A (hereinafter referred to as the information processing apparatus 1000). The information processing apparatus 1000 includes a CPU (Central Processing Unit) 1010 constituting the respective control units (in a multiprocessor configuration, additional CPUs such as a CPU 1012 may be added), a bus line 1005, a communication I/F 1040, a main memory 1050, a BIOS (Basic Input Output System) 1060, a USB port 1090, an I/O controller 1070, input means such as a keyboard and mouse 1100, and a display device 1022.

Ｉ／Ｏコントローラ１０７０には、テープドライブ１０７２、ハードディスク１０７４、光ディスクドライブ１０７６、半導体メモリ１０７８、等の記憶手段を接続することができる。 Storage means such as a tape drive 1072, a hard disk 1074, an optical disk drive 1076, and a semiconductor memory 1078 can be connected to the I / O controller 1070.

ＢＩＯＳ１０６０は、情報処理装置１０００の起動時にＣＰＵ１０１０が実行するブートプログラムや、情報処理装置１０００のハードウェアに依存するプログラム等を格納する。 The BIOS 1060 stores a boot program executed by the CPU 1010 when the information processing apparatus 1000 is activated, a program depending on the hardware of the information processing apparatus 1000, and the like.

記憶部を構成するハードディスク１０７４は、情報処理装置１０００が機能するための各種プログラム及び本発明の機能を実行するプログラムを記憶しており、さらに必要に応じて各種データベースを構成可能である。 The hard disk 1074 constituting the storage unit stores various programs for the information processing apparatus 1000 to function and programs for executing the functions of the present invention, and can configure various databases as necessary.

光ディスクドライブ１０７６としては、例えば、ＤＶＤ−ＲＯＭドライブ、ＣＤ−ＲＯＭドライブ、ＤＶＤ−ＲＡＭドライブ、ＣＤ−ＲＡＭドライブを使用することができる。この場合は各ドライブに対応した光ディスク１０７７を使用する。光ディスク１０７７から光ディスクドライブ１０７６によりプログラムまたはデータを読み取り、Ｉ／Ｏコントローラ１０７０を介してメインメモリ１０５０またはハードディスク１０７４に提供することもできる。また、同様にテープドライブ１０７２に対応したテープメディア１０７１を主としてバックアップのために使用することもできる。 As the optical disc drive 1076, for example, a DVD-ROM drive, a CD-ROM drive, a DVD-RAM drive, or a CD-RAM drive can be used. In this case, the optical disk 1077 corresponding to each drive is used. A program or data can be read from the optical disk 1077 by the optical disk drive 1076 and provided to the main memory 1050 or the hard disk 1074 via the I / O controller 1070. Similarly, the tape medium 1071 corresponding to the tape drive 1072 can be used mainly for backup.

情報処理装置１０００に提供されるプログラムは、ハードディスク１０７４、光ディスク１０７７、またはメモリーカード等の記録媒体に格納されて提供される。このプログラムは、Ｉ／Ｏコントローラ１０７０を介して、記録媒体から読み出され、または通信Ｉ／Ｆ１０４０を介してダウンロードされることによって、情報処理装置１０００にインストールされ実行されてもよい。 The program provided to the information processing apparatus 1000 is provided by being stored in a recording medium such as the hard disk 1074, the optical disk 1077, or a memory card. The program may be installed in the information processing apparatus 1000 and executed by being read from the recording medium via the I / O controller 1070 or downloaded via the communication I / F 1040.

前述のプログラムは、内部または外部の記憶媒体に格納されてもよい。ここで、記憶部を構成する記憶媒体としては、ハードディスク１０７４、光ディスク１０７７、またはメモリカードの他に、ＭＤ等の光磁気記録媒体、テープメディア１０７１を用いることができる。また、専用通信回線やインターネットに接続されたサーバシステムに設けたハードディスク１０７４または光ディスクライブラリ等の記憶装置を記録媒体として使用し、通信回線を介してプログラムを情報処理装置１０００に提供してもよい。 The aforementioned program may be stored in an internal or external storage medium. Here, in addition to the hard disk 1074, the optical disk 1077, or the memory card, a magneto-optical recording medium such as an MD, or a tape medium 1071 can be used as a storage medium constituting the storage unit. Further, a storage device such as a hard disk 1074 or an optical disk library provided in a server system connected to a dedicated communication line or the Internet may be used as a recording medium, and the program may be provided to the information processing apparatus 1000 via the communication line.

The display device 1022 displays a screen that accepts data input from the server administrator and a screen showing the results of processing by the information processing apparatus 1000, and includes display devices such as a cathode ray tube (CRT) display and a liquid crystal display (LCD).

ここで、入力手段は、サーバ管理者による入力の受け付けを行うものであり、キーボード及びマウス１１００等により構成してよい。 Here, the input means accepts input by the server administrator, and may be constituted by a keyboard, a mouse 1100, and the like.

また、通信Ｉ／Ｆ１０４０は、情報処理装置１０００を専用ネットワークまたは公共ネットワークを介して端末と接続できるようにするためのネットワーク・アダプタである。通信Ｉ／Ｆ１０４０は、モデム、ケーブル・モデム及びイーサネット（登録商標）・アダプタを含んでよい。 The communication I / F 1040 is a network adapter that enables the information processing apparatus 1000 to be connected to a terminal via a dedicated network or a public network. The communication I / F 1040 may include a modem, a cable modem, and an Ethernet (registered trademark) adapter.

以上の例は、情報処理装置１０００について主に説明したが、コンピュータに、プログラムをインストールして、そのコンピュータを情報処理装置１０００として動作させることにより上記で説明した機能を実現することもできる。従って、本発明において一実施形態として説明した情報処理装置１０００により実現される機能は、上述の方法を当該コンピュータにより実行することにより、あるいは、上述のプログラムを当該コンピュータに導入して実行することによっても実現可能である。上述のキーボード／マウス１１００が入力部１２（図１及び図１３参照）に対応し、ＣＰＵ１０１０及びＣＰＵ１０１２が制御部２０，２０Ａ（図１及び図１３参照）に対応する。そして、表示装置１０２２が出力部１８（図１及び図１３参照）に対応する。そして、ハードディスク１０７４が、辞書ＤＢ１４及び学習結果ＤＢ１６に対応する。 In the above example, the information processing apparatus 1000 has been mainly described. However, the functions described above can also be realized by installing a program in a computer and causing the computer to operate as the information processing apparatus 1000. Therefore, the functions realized by the information processing apparatus 1000 described as an embodiment in the present invention are executed by executing the above-described method by the computer, or by introducing the above-described program into the computer and executing it. Is also feasible. The keyboard / mouse 1100 described above corresponds to the input unit 12 (see FIGS. 1 and 13), and the CPU 1010 and CPU 1012 correspond to the control units 20 and 20A (see FIGS. 1 and 13). The display device 1022 corresponds to the output unit 18 (see FIGS. 1 and 13). The hard disk 1074 corresponds to the dictionary DB 14 and the learning result DB 16.

The apparatuses 10 and 10A (see FIGS. 1 and 13) according to the embodiments of the present invention, or the methods used to control them, can also be realized by a program on a computer. The storage medium storing the program can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device), or a propagation medium. Examples of computer-readable media include semiconductor or solid-state storage devices, magnetic tape, removable flexible disks, random access memory (RAM), read-only memory (ROM), rigid magnetic disks, and optical disks. Current examples of optical disks include compact disk read-only memory (CD-ROM), compact disk read/write (CD-R/W), and DVD.

以上、本発明の実施形態を説明したが、具体例を例示したに過ぎず、特に本発明を限定しない。また、本発明の実施形態に記載された効果は、本発明から生じる最も好適な効果を列挙したに過ぎず、本発明による効果は、本発明の実施形態に記載された効果に限定されない。 As mentioned above, although embodiment of this invention was described, it only showed the specific example and does not specifically limit this invention. Further, the effects described in the embodiments of the present invention only list the most preferable effects resulting from the present invention, and the effects of the present invention are not limited to the effects described in the embodiments of the present invention.

FIG. 1 is a block diagram showing the schematic configuration of the automatic problem creation apparatus.
FIG. 2 is a block diagram showing details of the machine learning unit.
FIG. 3 is a block diagram showing details of the classification unit.
FIG. 4 is a diagram showing an example of the feature list.
FIG. 5 is a diagram showing an example of training data.
FIG. 6 is a diagram showing an example of invalid data.
FIG. 7 is a diagram showing an example of cases.
FIG. 8 is a diagram showing an example of a learning result.
FIG. 9 is a diagram showing an example of test data and related data.
FIG. 10 is a diagram showing an example of a method of labeling candidate cases.
FIG. 11 is a schematic flowchart showing an operation example of the automatic problem creation apparatus.
FIG. 12 is a schematic flowchart showing an operation example of the automatic problem creation apparatus.
FIG. 13 is a block diagram showing the schematic configuration of the automatic problem creation apparatus.
FIG. 14 is a block diagram showing details of the machine learning unit.
FIG. 15 is a block diagram showing details of the classification unit.
FIG. 16 is a diagram showing an example of training data.
FIG. 17 is a diagram showing an example of the additional feature list.
FIG. 18 is a diagram showing a hardware configuration example of the server of the automatic problem creation apparatus.

Explanation of symbols

10, 10A automatic problem creation apparatus
12 input unit
14 dictionary DB
16 learning result DB
18 output unit
20, 20A control unit
30, 30A machine learning unit
32 invalid data generation unit
34 feature analysis unit
36 reference data generation unit
40 first incorrect answer data generation unit
42 second incorrect answer data generation unit
44 incorrect answer data feature analysis unit
46 option reference data generation unit
50, 50A classification unit
52 candidate data feature analysis unit
54 output data generation unit
60 candidate incorrect answer data feature analysis unit
62 option generation unit

Claims

An automatic problem creation apparatus having a machine learning unit and a classification unit, wherein
the machine learning unit includes:
a valid data receiving unit that accepts a large number of fill-in-the-blank problems having blank portions as valid data suitable for learning use;
an invalid data generation unit that generates, from each item of the valid data, invalid data inappropriate for learning use by changing the position of the blank portion as counted from the beginning of the sentence;
a feature analysis unit that characterizes the valid data and the invalid data by a feature list of a plurality of types of features for characterizing the data; and
a reference data generation unit that generates reference data for distinguishing between a valid data group, which is the set of the valid data, and an invalid data group, which is the set of the invalid data, by statistically processing the valid data group and the invalid data group,
and
the classification unit includes:
a test data reception unit that receives input of test data;
a candidate data feature analysis unit that performs morphological analysis on the test data and characterizes, by the feature list, candidate data in which each morpheme is designated as the blank portion; and
an output data generation unit that sets, as output data to be output, the candidate data classified into the valid data group according to the reference data.
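
The blank-shifting idea in the claim above can be illustrated with a minimal sketch. It assumes a sentence has already been split into morphemes (a plain whitespace split stands in for a real morphological analyzer here), and the names `BLANK`, `blank_at`, `invalid_items`, and `candidate_items` are hypothetical, chosen only for this illustration. Feature extraction and the statistical learning step are not shown.

```python
from typing import List

BLANK = "( )"  # hypothetical marker for the blank portion


def blank_at(tokens: List[str], index: int) -> List[str]:
    """Return a copy of the sentence with the token at `index` replaced by a blank."""
    return tokens[:index] + [BLANK] + tokens[index + 1:]


def invalid_items(tokens: List[str], valid_index: int) -> List[List[str]]:
    """Invalid data: the same sentence with the blank moved to every other position."""
    return [blank_at(tokens, i) for i in range(len(tokens)) if i != valid_index]


def candidate_items(tokens: List[str]) -> List[List[str]]:
    """Candidate data for a test sentence: each token in turn becomes the blank."""
    return [blank_at(tokens, i) for i in range(len(tokens))]


# Assumed example: the original problem blanks out "since" (position 4).
sentence = "I have lived here since 2001".split()
valid = blank_at(sentence, 4)
invalid = invalid_items(sentence, 4)   # 5 items, one per other position
candidates = candidate_items(sentence)  # 6 items, one per token
```

One valid problem of n morphemes yields n - 1 invalid items under this reading, so the invalid group is produced without any manual labeling.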

The automatic problem creation apparatus according to claim 1, wherein the reference data generation unit is a support vector machine, and the reference data is an identification surface defined by support vectors.

The automatic problem creation apparatus according to claim 2, wherein the output data generation unit sets the candidate data closest to the identification surface as the output data.
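
As a hedged illustration of selecting by distance to the identification surface, the sketch below assumes a linear support vector machine whose learned surface is given by a weight vector `w` and bias `b`, with positive decision values meaning "valid". The sign convention, the linear kernel, the two-dimensional feature vectors, and all function names are assumptions made for this example and are not taken from the patent text.

```python
import math
from typing import Dict, List, Optional, Sequence


def decision_value(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Signed value f(x) = w . x + b of a linear decision function."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_surface(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Geometric distance |f(x)| / ||w|| from x to the surface f(x) = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, x, b)) / norm


def select_output(candidates: List[Dict], w: Sequence[float], b: float) -> Optional[Dict]:
    """Among candidates classified as valid, return the one closest to the surface."""
    valid = [c for c in candidates if decision_value(w, c["features"], b) > 0]
    if not valid:
        return None
    return min(valid, key=lambda c: distance_to_surface(w, c["features"], b))


# Assumed toy data: each candidate is a blanked word plus a feature vector.
candidates = [
    {"word": "lived", "features": [3.0, 1.0]},   # f = 2.0, valid side
    {"word": "since", "features": [2.0, 1.5]},   # f = 0.5, valid side, closest
    {"word": "here", "features": [0.0, 2.0]},    # f = -2.0, invalid side
]
best = select_output(candidates, w=[1.0, -1.0], b=0.0)  # the "since" candidate
```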

The automatic problem creation apparatus according to any one of claims 1 to 3, wherein the valid data is a set of a question sentence and an answer set in a test of the national language or of a foreign language.

The automatic problem creation apparatus according to claim 1, wherein
the feature analysis unit further includes:
an erroneous answer data feature analysis unit that characterizes, by the feature list and an additional feature list for erroneous answers, first erroneous answer data generated by applying to the blank portion an incorrect-answer option, other than the correct answer, included in the valid data, and second erroneous answer data generated by applying to the blank portion a word that is not included in the valid data but has a predetermined relationship with the incorrect-answer option; and
an option reference data generation unit that generates option reference data for distinguishing between a first erroneous answer data group, which is the set of the first erroneous answer data, and a second erroneous answer data group, which is the set of the second erroneous answer data, by statistically processing the first erroneous answer data group and the second erroneous answer data group,
and
the classification unit further includes:
a candidate erroneous answer data feature analysis unit that characterizes, by the feature list and the additional feature list, candidate erroneous answer data generated by applying to the blank portion a word having a predetermined relationship with the word corresponding to the blank portion of the output data; and
an option generation unit that outputs the word corresponding to the blank portion of the candidate erroneous answer data determined, based on the option reference data, to belong to the first erroneous answer data group.
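
The two kinds of erroneous answer data described in the claim above can be sketched as follows. The claim does not define the "predetermined relationship" here, so the sketch stands in a hypothetical lookup table `related` for it; the function names and the example problem are likewise assumptions for illustration only.

```python
from typing import Dict, List


def fill(tokens: List[str], blank_index: int, word: str) -> List[str]:
    """Return the sentence with `word` placed in the blank position."""
    return tokens[:blank_index] + [word] + tokens[blank_index + 1:]


def first_erroneous(tokens: List[str], blank_index: int,
                    options: List[str], answer: str) -> List[List[str]]:
    """First erroneous answer data: the problem's own incorrect options in the blank."""
    return [fill(tokens, blank_index, o) for o in options if o != answer]


def second_erroneous(tokens: List[str], blank_index: int, options: List[str],
                     answer: str, related: Dict[str, List[str]]) -> List[List[str]]:
    """Second erroneous answer data: words absent from the problem but related
    (via the assumed `related` table) to one of its incorrect options."""
    seen = set(options)  # words already in the valid data are excluded
    result = []
    for option in options:
        if option == answer:
            continue
        for word in related.get(option, []):
            if word not in seen:
                seen.add(word)
                result.append(fill(tokens, blank_index, word))
    return result


# Assumed example problem: "I have lived here ( ) 2001", correct answer "since".
tokens = "I have lived here since 2001".split()
options = ["since", "for", "during", "from"]
related = {"for": ["to", "during"], "from": ["at"]}  # hypothetical relation table
first = first_erroneous(tokens, 4, options, "since")             # for, during, from
second = second_erroneous(tokens, 4, options, "since", related)  # to, at
```

Both groups would then be characterized by the feature list and the additional feature list before the option reference data is learned; that step is omitted here.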

The automatic problem creation apparatus according to claim 5, wherein the option reference data generation unit is a support vector machine, and the option reference data is an identification surface defined by support vectors.

The automatic problem creation apparatus according to claim 6, wherein the option generation unit outputs the word corresponding to the blank portion of the candidate erroneous answer data closest to the identification surface.

A method for automatically creating a fill-in-the-blank problem, comprising:
a valid data reception step of accepting a large number of fill-in-the-blank problems having blank portions as valid data;
an invalid data generation step of generating invalid data by changing, in each item of the valid data, the position of the blank portion as counted from the beginning of the sentence;
a feature analysis step of characterizing the valid data and the invalid data by a feature list of a plurality of types of features for characterizing the data;
a reference data generation step of generating reference data for distinguishing between a valid data group, which is the set of the valid data, and an invalid data group, which is the set of the invalid data, by statistically processing the valid data group and the invalid data group;
a test data reception step of receiving input of test data;
a candidate data feature analysis step of performing morphological analysis on the test data and characterizing, by the feature list, candidate data in which each morpheme is designated as the blank portion; and
an output data generation step of setting, as output data to be output, the candidate data determined according to the reference data to belong to the valid data group.

A computer program for causing a computer to function as an apparatus for automatically creating fill-in-the-blank problems, the program causing the computer to execute:
a valid data reception step of accepting a large number of fill-in-the-blank problems having blank portions as valid data;
an invalid data generation step of generating invalid data by changing, in each item of the valid data, the position of the blank portion as counted from the beginning of the sentence;
a feature analysis step of characterizing the valid data and the invalid data by a feature list of a plurality of types of features for characterizing the data;
a reference data generation step of generating reference data for distinguishing between a valid data group, which is the set of the valid data, and an invalid data group, which is the set of the invalid data, by statistically processing the valid data group and the invalid data group;
a test data reception step of receiving input of test data;
a candidate data feature analysis step of performing morphological analysis on the test data and characterizing, by the feature list, candidate data in which each morpheme is designated as the blank portion; and
an output data generation step of setting, as output data to be output, the candidate data determined according to the reference data to belong to the valid data group.