JP2019185754A

JP2019185754A - Descriptive test scoring program and descriptive test scoring method

Info

Publication number: JP2019185754A
Application number: JP2019040212A
Authority: JP
Inventors: 弘樹須鎗; Hiroki Suyari; 康久仁森; Yasukuni Mori; 謙吾竹谷; Kengo Takeya; 浩平高井; Kohei Takai
Original assignee: Chiba University NUC
Current assignee: Chiba University NUC
Priority date: 2018-04-06
Filing date: 2019-03-06
Publication date: 2019-10-24
Anticipated expiration: 2039-03-06
Also published as: JP7268849B2

Abstract

To provide a descriptive test scoring program and a descriptive test scoring method that improve the accuracy and reliability of grading. SOLUTION: A descriptive test scoring program causes a computer to execute: a morphological analysis step of performing morphological analysis on character string information serving as the scoring reference and on character string information to be scored; an alignment step of calculating the correspondence between the morphemes of the character string information to be scored and the morphemes of the reference character string information, as analyzed in the morphological analysis step; a similarity calculation step of calculating, based on the correspondence calculated in the alignment step, the similarity between the reference character string information and the character string information to be scored; and a scoring step of calculating the score of the character string information to be scored based on the similarity information calculated in the similarity calculation step. SELECTED DRAWING: Figure 5

Description

The present invention relates to a descriptive test scoring program and a descriptive test scoring method.

Non-Patent Document 1 below is considered the closest prior art. The paper proposes a method for reducing the cost of scoring the descriptive questions scheduled to be introduced into the National Center Test for University Admissions in fiscal 2020. Taking into account that descriptive questions are scored by multiple graders, it uses already-scored answers as training data, learns a binary classification of correct versus incorrect answers, and uses the trained model as a second grader. In experiments on answer data from several hundred examinees, both an SVM and a CNN (convolutional neural network) classified the answers with close to 90% accuracy.

Terada et al., "Automatic scoring of descriptive questions using neural networks," Proceedings of the 22nd Annual Meeting of the Association for Natural Language Processing, pp. 370-373 (2016)

For the National Center Test for University Admissions, the number of examinees in recent years suggests that answer data from just under 550,000 examinees can be assumed. At that scale, the prior art has the following problems.

(1) If the method serves only as a second grader, the answers of just under 550,000 examinees must still be scored once by hand, so the cost remains large.

(2) Suppose that, of the just under 550,000 answers, 10,000 are scored by hand and learned as scored data, and the learned model is then applied to the remaining 540,000 answers. Nothing guarantees the accuracy in that case.

(3) An accuracy of 90% may sound high, but with 550,000 answers it means that about 50,000 answers are misjudged between correct and incorrect. This is a serious problem for the examinees themselves, and such a classification cannot be called reliable.

(4) Because the classification is binary (correct or incorrect), partial credit, which is common for descriptive questions, cannot be awarded. The prior art described above could be changed to multi-class classification so that partial credit can be awarded, but the accuracy would naturally drop.

The present invention has been made in view of the above problems, and its object is to provide a descriptive test scoring program and a descriptive test scoring method with improved scoring accuracy and reliability.

According to one aspect of the present invention, to solve the above problems, a descriptive test scoring program causes a computer to execute: a morphological analysis procedure that performs morphological analysis on character string information serving as the scoring reference and on character string information to be scored; an alignment procedure that calculates the correspondence between the morphemes of the character string information to be scored and the morphemes of the reference character string information, as analyzed by the morphological analysis procedure; a similarity calculation procedure that, based on the correspondence calculated by the alignment procedure, calculates information on the similarity between the reference character string information and the character string information to be scored; and a scoring procedure that calculates the score of the character string information to be scored based on the similarity information calculated by the similarity calculation procedure.

Further, in the alignment procedure, it is desirable to calculate the correspondence between the morphemes of the character string information to be scored and the morphemes of the reference character string information so that the similarity information calculated by the similarity calculation means is maximized.

Further, it is desirable to revise the character string information serving as the scoring reference based on the answer information of examinees of the descriptive test.
It is also desirable that the scoring procedure calculate the score of the character string information to be scored based on whether it contains a keyword, a word that makes the answer incorrect if included, or a word that makes the answer incorrect if included at the end of a sentence.
It is also desirable to provide an automatic scoring possibility judgment procedure that judges whether the scoring procedure can calculate the score of the character string information to be scored, and a clustering procedure that clusters the character string information for which the automatic scoring possibility judgment procedure has judged that the scoring procedure cannot calculate a score. This clarifies the features common to sentences that cannot be scored automatically under the current scoring settings, which makes it easy to set effective key phrases and NG words by hand.

According to another aspect of the present invention, it is desirable that a descriptive test scoring method comprise: a morphological analysis procedure in which a computer performs morphological analysis on character string information serving as the scoring reference and on character string information to be scored; an alignment procedure in which the computer calculates the correspondence between the morphemes of the character string information to be scored and the morphemes of the reference character string information, as analyzed by the morphological analysis procedure; a similarity calculation procedure in which the computer calculates, based on the correspondence calculated by the alignment procedure, information on the similarity between the reference character string information and the character string information to be scored; and a scoring procedure in which the computer calculates the score of the character string information to be scored based on the similarity information calculated by the similarity calculation procedure.

According to the present invention, it is possible to provide a descriptive test scoring program and a descriptive test scoring method with improved scoring accuracy and reliability.

FIG. 1 is a diagram outlining the automatic scoring method according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of alignment according to this example.
FIG. 3 is a diagram showing an example of scoring according to this example.
FIG. 4 is a diagram showing an example of morphological analysis according to this example.
FIG. 5 is a diagram showing the procedure for judging similarity according to this example.
FIG. 6 is a diagram showing a specific example of calculating similarity according to this example.
FIG. 7 is a diagram showing an example of the scoring criteria of a descriptive test.
FIG. 8 is a diagram outlining the automatic scoring method of Example 2.
FIG. 9 is a diagram showing an example of sequence alignment in Example 2.

Embodiments and examples of the present invention are described below, but the embodiments of the present invention are not limited to the embodiments and examples described here.

In the prior art, including work other than the paper cited above, most proposals use automatic scoring as a second grader. The present invention instead assumes an automatic scoring system that acts as the first grader.

Usually, scoring criteria are defined, and in most cases constraints such as keywords to be used and a character limit are also imposed on the examinees who answer. Taking this into account, scoring proceeds by the following steps (see FIG. 1).

1. Step 1
Setting the correct answer: Set the conditions imposed on the answerer and the scoring criteria. Examples: required keywords, a character limit, and conditional key phrases. A conditional key phrase is a short sentence that contains important words. Scoring against conditional key phrases checks for the important words, their order, and whether the sentence is natural. It also makes it possible to award partial credit.

2. Step 2
Preparing correct and incorrect document data: Prepare a certain amount of correct and incorrect document data. It is not necessary to prepare all of the correct document data among the answers.

3. Step 3
Following the configuration diagram of FIG. 1, the answer is input, collated against the groups of correct and incorrect sentences, checked against the required conditions of the question, and checked for the presence of conditional key phrases. An answer that can be judged correct or incorrect at this point has been scored automatically; an answer that cannot be judged is scored manually. Answers can be entered into the computer by a person, but recognizing the examinees' answer sheets automatically with character recognition software is preferable because it reduces labor and scoring time.

4. Step 4
The answers scored automatically and manually are periodically added to the groups of correct and incorrect sentences. Clustering and alignment are performed on these groups, and the results are added as new conditional key phrases.

Examples of the present invention are described below, together with explanations of the terms used.

Alignment is a technique used to judge the similarity of two, or three or more, sequences; it finds an appropriate, optimal correspondence between characters. FIG. 2 illustrates the concept of alignment.
FIG. 3 shows how the score is calculated in this example. When two sequences are placed side by side, each position p is classified as (1) a match, when the two characters at position p are identical; (2) a mismatch, when the two characters at position p differ; or (3) a gap, when a character in one sequence has no counterpart in the other sequence at position p. Based on this classification the score is calculated, for example, as +1 for a match, −1 for a mismatch, and −1 for a gap. These values are only one example, and it is preferable to vary the points added or deducted according to the importance of the character. It is also preferable to use an algorithm that arranges the sequences so as to maximize the calculated score.
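The match, mismatch, and gap scoring described above can be sketched as follows. This is a minimal illustration that assumes the two sequences are already aligned to equal length and that a gap is written as `"-"`; the function name `alignment_score` and the sample sequences are hypothetical and do not come from the patent.

```python
GAP = "-"  # assumed gap marker; any sentinel not used as a real element would do


def alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Sum per-position scores for two already-aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            total += gap        # one side has no counterpart at this position
        elif a == b:
            total += match      # identical elements
        else:
            total += mismatch   # both present but different
    return total


# Five matches, one gap, one mismatch with the default +1 / -1 / -1 weights.
example = alignment_score(list("GATTACA"), list("GA-TGCA"))
```

Passing different `match`, `mismatch`, and `gap` values is one simple way to weight positions differently, in the spirit of the remark that the points may vary with importance.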

FIG. 4 outlines morphological analysis, which is the prerequisite for alignment. A character string is divided into morphemes (the smallest meaningful units) based on part-of-speech information (for example, pronoun, case particle, proper noun, verb). Alignment is then performed on the resulting morpheme sequence: the morpheme sequence making up the character string is compared with the reference morpheme sequence, and the score of the target character string is calculated.

The procedure of this example is outlined with reference to FIG. 5. First, the reference character string and the character string to be scored are each subjected to morphological analysis and divided into morphemes (502). The morphologically analyzed reference string and target string are then aligned (503), for example with the Needleman-Wunsch algorithm. Finally, the aligned reference string and target string are compared, their similarity is judged, and a score is calculated (504). The reference string may be arranged so as to maximize the score.
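As an illustration of step 503, the sketch below implements a textbook Needleman-Wunsch global alignment over lists of tokens. It is not the patent's implementation: the token lists are split by hand for the example (a real system would obtain them from a morphological analyzer, whose segmentation may differ from this one and from FIG. 6), and the +1 / −1 / −1 weights and the `"-"` gap marker are assumptions.

```python
def needleman_wunsch(ref, tgt, match=1, mismatch=-1, gap=-1, gap_symbol="-"):
    """Globally align two token lists; return (aligned_ref, aligned_tgt, score)."""
    n, m = len(ref), len(tgt)

    def pair(i, j):
        # Score for pairing ref[i - 1] with tgt[j - 1].
        return match if ref[i - 1] == tgt[j - 1] else mismatch

    # score[i][j] is the best score for aligning ref[:i] with tgt[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # pair the two tokens
                score[i - 1][j] + gap,             # ref token against a gap
                score[i][j - 1] + gap,             # tgt token against a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_ref, aligned_tgt = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            aligned_ref.append(ref[i - 1])
            aligned_tgt.append(tgt[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_tgt.append(gap_symbol)
            i -= 1
        else:
            aligned_ref.append(gap_symbol)
            aligned_tgt.append(tgt[j - 1])
            j -= 1
    return aligned_ref[::-1], aligned_tgt[::-1], score[n][m]


# Hand-split tokens (an assumption, not analyzer output).
reference = ["私", "は", "今", "から", "東京", "に", "行く"]
target = ["私", "は", "東京", "に", "行く", "予定", "だ"]
aligned_reference, aligned_target, best_score = needleman_wunsch(reference, target)
```

With this particular segmentation the shared tokens line up and the unshared ones face gaps, which is the kind of correspondence the later similarity step works from.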

FIG. 6 shows a specific example of calculating the score for a character string to be scored. Suppose the reference string is 「私は今から東京に行く」 ("I am going to Tokyo now") and the string to be scored is 「私は東京に行く予定だ」 ("I plan to go to Tokyo") (601). The reference string and the string to be scored are each divided into morphemes (602). When the two morpheme sequences are aligned based on the parts of speech of their morphemes, the reference string becomes 「私は今から東京に行く −」 and the string to be scored becomes 「私は − 東京に行く予定だ」 (603). Roughly speaking, the morphemes of the string to be scored are arranged so that they correspond to the parts of speech of the morphemes of the reference string. Here, "−" means that the string to be scored has no morpheme of the type (for example, part of speech) found in the reference string. Conversely, "−" is also shown when the reference string has no morpheme of the type found in the string to be scored.
Comparing the reference string and the string to be scored after this alignment, 6 of the 7 words match, giving a similarity of 85.7%. This similarity could, for example, be used directly as the score of the string to be scored.
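The similarity figure above is a match ratio: the number of aligned positions whose elements agree, divided by the number of aligned positions. A minimal sketch of that arithmetic follows; the function name `match_ratio` is hypothetical, and the sequences use placeholder letters rather than the morphemes of FIG. 6, because the exact segmentation in the figure is not reproduced in the text.

```python
def match_ratio(aligned_ref, aligned_tgt, gap_symbol="-"):
    """Fraction of aligned positions where both sides hold the same non-gap element."""
    if len(aligned_ref) != len(aligned_tgt):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_ref:
        return 0.0
    matches = sum(
        1
        for a, b in zip(aligned_ref, aligned_tgt)
        if a == b and a != gap_symbol
    )
    return matches / len(aligned_ref)


# Placeholder tokens: seven aligned positions, six of which agree.
ratio = match_ratio(list("ABCDEFG"), list("AB-DEFG"))
percent = round(ratio * 100, 1)  # 6 / 7 expressed as a percentage
```

Using the ratio directly as the score, as the text suggests, is only one option; a threshold on the ratio would be another.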

FIG. 7 is an example of a descriptive question for the National Center Test for University Admissions published on May 16, 2017. In this example, the "sample correct answer" is 「景観を守るガイドラインによって、治安が維持され観光資源として活用されること。」 ("That, through guidelines that protect the landscape, public order is maintained and it is utilized as a tourism resource."), and this sample answer may be used as the reference string.

In addition, the "conditions for a correct answer" include a character-count condition, "written within 40 characters," and whether this condition is met may be reflected in the calculation of the final score. The other "conditions for a correct answer" may likewise be reflected in the calculation of the final score.

The effects of this example are as follows.
(1) The system can score as the first grader.
(2) Given the scoring flow, simulation indicates that the share of answers scored automatically is about 30% at first, but as scoring proceeds the conditional key phrases are expanded, and the share scored automatically is found to reach 100%.
(3) For answers that are scored automatically, highly accurate results can be maintained. In the simulation, accuracy was 100% for both correct and incorrect answers.
(4) Conditional key phrases and alignment make it possible to set partial credit for descriptive questions.

1. Automatic scoring system
For the proposed automatic scoring system, Section 1.1 describes the system configuration, and Section 1.2 onward describe the scoring algorithm.

1.1 System configuration
FIG. 8 shows the system configuration. Automatic scoring is performed based on scoring settings configured by hand in advance. When the automatic scoring part cannot decide whether an answer is correct or incorrect, the answer is passed to manual scoring. The input is digitized character string data, and the output is a binary result, correct or incorrect. When data with scoring results is used as test data, the answers that were passed to manual scoring are clustered and their features are visualized to support improvement of the scoring settings.
The following items are configured as scoring settings.
・Character limit (an answer outside the range is judged incorrect)
・Pre-replacement words (words replaced before scoring to unify notation)
・NG words (words that make an answer incorrect if it contains them)
・NG end words (words that make an answer incorrect if they appear at the end of the sentence)
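The four settings listed above could be held in a small structure and checked as in the sketch below. Everything here is illustrative: the names `ScoringSettings`, `normalize`, and `violates_settings` are hypothetical, and applying the character limit after replacement and ignoring trailing punctuation when testing NG end words are assumptions that the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ScoringSettings:
    min_chars: int = 0
    max_chars: int = 40
    replacements: dict = field(default_factory=dict)   # pre-replacement words
    ng_words: list = field(default_factory=list)       # incorrect if present anywhere
    ng_end_words: list = field(default_factory=list)   # incorrect if at the sentence end


def normalize(answer: str, settings: ScoringSettings) -> str:
    """Apply the pre-replacement words to unify notation before scoring."""
    for old, new in settings.replacements.items():
        answer = answer.replace(old, new)
    return answer


def violates_settings(answer: str, settings: ScoringSettings) -> bool:
    """Return True if the answer should be judged incorrect by the settings alone."""
    text = normalize(answer, settings)
    if not (settings.min_chars <= len(text) <= settings.max_chars):
        return True
    if any(word in text for word in settings.ng_words):
        return True
    # Assumption: trailing punctuation and spaces are ignored for the end check.
    body = text.rstrip("。．. ")
    return any(body.endswith(word) for word in settings.ng_end_words)


# Hypothetical settings for a 20-character question.
settings = ScoringSettings(
    max_chars=20,
    replacements={"ひと": "人"},
    ng_words=["わからない"],
    ng_end_words=["と思う"],
)
```

For instance, `violates_settings("景観を守ると思う。", settings)` is true under these settings because the answer ends with an NG end word once the final punctuation is ignored.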

1.2 Primary scoring
Based on the scoring settings, the system checks whether the answer satisfies every required condition. If even one condition is not satisfied, the answer is judged incorrect and scoring ends. If all conditions are satisfied, processing moves on to key phrase comparison. In the comparison with scored answers, the answer being scored is compared with the scored answer data accumulated together with their correct/incorrect results; if it is identical to a stored answer, the correct/incorrect decision is made immediately and scoring ends. Each answer that has been scored is added to the scored answers together with its result.
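A minimal sketch of this primary scoring flow is given below. The class `ScoredAnswerStore`, the function `primary_scoring`, and the `meets_conditions` callback are hypothetical names; the order shown (condition check first, then exact-match lookup) and the use of `None` to mean "not decided yet, continue to key phrase comparison" are assumptions made for the illustration.

```python
from typing import Callable, Optional


class ScoredAnswerStore:
    """Keeps already-scored answers together with their correct/incorrect result."""

    def __init__(self) -> None:
        self._verdicts = {}

    def lookup(self, answer: str) -> Optional[bool]:
        # Returns True/False for a known answer, None if it has not been seen.
        return self._verdicts.get(answer)

    def add(self, answer: str, is_correct: bool) -> None:
        self._verdicts[answer] = is_correct


def primary_scoring(
    answer: str,
    meets_conditions: Callable[[str], bool],
    store: ScoredAnswerStore,
) -> Optional[bool]:
    """Return False/True when decided here, or None to continue to later stages."""
    if not meets_conditions(answer):
        store.add(answer, False)   # a failed condition means incorrect
        return False
    return store.lookup(answer)    # identical to a scored answer: reuse its result


store = ScoredAnswerStore()
store.add("known correct answer", True)
decided = primary_scoring("known correct answer", lambda a: True, store)
undecided = primary_scoring("new answer", lambda a: True, store)
rejected = primary_scoring("bad answer", lambda a: False, store)
```

Answers decided in later stages, or by a human grader, would be written back with `store.add` so that identical answers are decided immediately next time.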

1.3 Key phrase comparison
The answer sentence is compared with the key phrases using a sequence alignment algorithm, the Needleman-Wunsch algorithm. For two sequences, this algorithm uses a score function to arrange the sequences so that the number of matching characters is maximized and the number of mismatches is minimized, which yields the optimal alignment. Performing sequence alignment after morphological analysis of the answer sentence and each key phrase makes it possible to extract the parts the answer shares with a key phrase and the parts it does not. FIG. 9 shows this. With this, the match rate between the answer and a key phrase can be calculated with attention to word order, the most similar sentence can be selected from among the configured key phrases, and the non-common parts can be extracted and compared.
For the extracted non-common parts, notation variants are corrected and synonyms are substituted. A simple string comparison cannot score correctly when the same word is written differently, for example in kanji versus hiragana. Likewise, words with the same meaning, such as 「人」 and 「人間」 (both meaning "person"), are written differently and so cannot be scored correctly. The system therefore judges whether two words have the same meaning by looking at their readings and semantic categories, so that scoring does not depend on notation.
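The notation-independent comparison of non-common parts might look like the sketch below. The two dictionaries are a hypothetical toy lexicon written by hand for the example; a real system would take readings from a morphological analyzer and semantic categories from a thesaurus, and the text does not say which resources are used. The function name `same_meaning` is also hypothetical.

```python
# Toy lexicon (assumption): surface form -> reading, and surface form -> category.
READINGS = {
    "人": "ひと",
    "ひと": "ひと",
    "人間": "にんげん",
    "守る": "まもる",
    "まもる": "まもる",
}
CATEGORIES = {
    "人": "human",
    "ひと": "human",
    "人間": "human",
    "守る": "protect",
    "まもる": "protect",
}


def same_meaning(word_a: str, word_b: str) -> bool:
    """Treat two words as equivalent if they share a surface form, reading, or category."""
    if word_a == word_b:
        return True
    reading_a, reading_b = READINGS.get(word_a), READINGS.get(word_b)
    if reading_a is not None and reading_a == reading_b:
        return True  # same word written differently, e.g. kanji versus hiragana
    category_a, category_b = CATEGORIES.get(word_a), CATEGORIES.get(word_b)
    return category_a is not None and category_a == category_b  # synonyms


kanji_vs_kana = same_meaning("人", "ひと")
synonyms = same_meaning("人", "人間")
unrelated = same_meaning("人", "守る")
```

Words missing from the lexicon fall back to exact comparison, so unknown vocabulary is never treated as equivalent by accident.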

1.4 Support for improving the scoring settings
When test data with scoring results is available and the scoring settings can be improved, the answers that were passed to manual scoring are clustered after the test data has been scored. Because this clarifies the features common to sentences that cannot be scored automatically under the current scoring settings, it becomes easy to set effective key phrases and NG words by hand.
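The text does not name a clustering method, so the sketch below swaps in a simple stand-in: greedy grouping by Jaccard similarity over character bigrams. The function names, the 0.5 threshold, and the placeholder answers are all assumptions chosen for illustration, not the patent's technique.

```python
def bigrams(text):
    """Set of adjacent character pairs; falls back to the text itself if too short."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def jaccard(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def cluster_answers(answers, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []  # each entry: (bigrams of the first member, list of members)
    for answer in answers:
        grams = bigrams(answer)
        for representative, members in clusters:
            if jaccard(representative, grams) >= threshold:
                members.append(answer)
                break
        else:
            clusters.append((grams, [answer]))
    return [members for _, members in clusters]


# Placeholder strings standing in for answers that were sent to manual scoring.
groups = cluster_answers(["abcd", "abce", "wxyz"])
```

Reading the members of each group side by side is one way a person could spot a recurring phrase worth adding as a key phrase or NG word.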

2. Experiments
For the experiments using the proposed automatic scoring system, Section 2.1 describes the experimental setup and Section 2.2 describes the results.

2.1 Experimental setup
Answer data from about 1,200 examinees for each of three subjects' descriptive questions in a mock examination for junior high school students is scored. The scoring settings are then improved and the data is scored again. The characteristics of the questions are as follows.
・Japanese: fill-in-the-blank format, 20 characters or fewer
・Social studies: fill-in-the-blank format, 25 characters or fewer
・Science: free-description format with no character limit
The automatic scoring rate and the scoring accuracy are calculated as evaluation metrics. The automatic scoring rate is the proportion of all answers that this system scored automatically, and the scoring accuracy is the proportion of automatically scored answers that were scored correctly.
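The two metrics defined above can be computed as in the following sketch. The function name `scoring_metrics` and the data layout are assumptions: each record pairs the system's verdict (`None` when the answer went to manual scoring) with the human verdict, and the sample records are made up for illustration, not experimental data.

```python
def scoring_metrics(results):
    """Return (automatic scoring rate, scoring accuracy) for (auto, human) verdict pairs."""
    total = len(results)
    auto_scored = [(auto, human) for auto, human in results if auto is not None]
    auto_rate = len(auto_scored) / total if total else 0.0
    correct = sum(1 for auto, human in auto_scored if auto == human)
    accuracy = correct / len(auto_scored) if auto_scored else 0.0
    return auto_rate, accuracy


# Made-up records: three answers scored automatically, two sent to manual scoring.
sample = [(True, True), (False, False), (True, False), (None, True), (None, False)]
auto_rate, accuracy = scoring_metrics(sample)
```

Note that the accuracy is taken only over automatically scored answers, so a system can trade a lower automatic scoring rate for higher accuracy by sending uncertain answers to manual scoring.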

2.2 Experimental results
Tables 1 and 2 show the experimental results.
Tables 1 and 2 show that all questions were scored with very high accuracy. They also show that improving the scoring settings raised both the automatic scoring rate and the scoring accuracy. By subject, science gave the best results. The likely reason is that the science question is close to a question answered with a single word, so the range of expression is narrow and the scoring settings were easy to configure. When a clear answer exists, the automatic scoring rate and the scoring accuracy are expected to improve as a character limit, designated words, and similar constraints narrow the range of expression.

3. Conclusion
Against the background of the introduction of descriptive questions into the Common Test for University Admissions, this example proposed a system that automatically scores descriptive questions written in Japanese. In experiments on mock-examination answer data from about 1,200 examinees in each of three subjects (Japanese, social studies, and science), the system automatically scored about 60% of the answers with accuracy close to 100%, confirming that it can reduce the cost of scoring. Compared with conventional automatic scoring methods, its advantage is considered to be that it scores automatically with high accuracy without requiring data with scoring results.
As future work, it is necessary to verify how effective the system is on sentences with a higher degree of freedom, and to improve the automatic scoring rate while maintaining 100% scoring accuracy.

The present invention is industrially applicable as a descriptive test scoring program and a descriptive test scoring method.

501 Setting of the scoring target text and the scoring reference text
502 Morphological analysis
503 Alignment
504 Similarity determination
601 Example of the scoring target text and the scoring reference text
602 Example of morphological analysis
603 Example of alignment
604 Example of similarity

Claims

A descriptive test scoring program that causes a computer to execute:
a morphological analysis procedure for performing morphological analysis on character string information serving as a scoring reference and on character string information to be scored;
an alignment procedure for calculating the correspondence between the morphemes of the character string information to be scored and the morphemes of the reference character string information, as analyzed by the morphological analysis procedure;
a similarity calculation procedure for calculating, based on the correspondence calculated by the alignment procedure, information on the similarity between the reference character string information and the character string information to be scored; and
a scoring procedure for calculating the score of the character string information to be scored based on the similarity information calculated by the similarity calculation procedure.

The descriptive test scoring program according to claim 1, wherein, in the alignment procedure, the correspondence between the morphemes of the character string information to be scored and the morphemes of the reference character string information is calculated so that the similarity information calculated by the similarity calculation means is maximized.

The descriptive test scoring program according to claim 1, wherein the character string information serving as a reference for the scoring is corrected based on answer information of a test taker of the descriptive test.

A descriptive test scoring method comprising:
a morphological analysis procedure in which a computer performs morphological analysis on character string information serving as a scoring reference and on character string information to be scored;
an alignment procedure in which the computer calculates the correspondence between the morphemes of the character string information to be scored and the morphemes of the reference character string information, as analyzed by the morphological analysis procedure;
a similarity calculation procedure in which the computer calculates, based on the correspondence calculated by the alignment procedure, information on the similarity between the reference character string information and the character string information to be scored; and
a scoring procedure in which the computer calculates the score of the character string information to be scored based on the similarity information calculated by the similarity calculation procedure.

The descriptive test scoring program according to claim 1, wherein, in the scoring procedure, the score of the character string information to be scored is calculated based on whether the character string information to be scored contains a keyword, a word that makes the answer incorrect if included, or a word that makes the answer incorrect if included at the end of a sentence.

The descriptive test scoring program according to claim 1, further comprising:
an automatic scoring possibility determination procedure for determining whether the score of the character string information to be scored can be calculated by the scoring procedure; and
a clustering procedure for clustering the character string information to be scored for which the automatic scoring possibility determination procedure has determined that the scoring procedure cannot calculate a score.