JP6190802B2 - Computing to estimate the ability value of many candidates based on item response theory - Google Patents
- Publication number
- JP6190802B2 (application JP2014264102A)
- Authority
- JP
- Japan
- Prior art keywords
- questions
- ability value
- difficulty
- difficulty level
- examinees
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Description
The present invention relates to an improvement in computing that estimates each examinee's ability value by processing, with an algorithm applying the well-known one-parameter logistic model of item response theory, scoring results that represent each examinee's correct and incorrect answers to each question, obtained by having a plurality of examinees answer a plurality of questions.
The applicant administers a practical English proficiency test divided into grades, from Grade 1 down to Grade 5. At present, each test result is judged pass or fail against prescribed criteria, and the examinee is notified of the result in that binary form. The impetus for this invention was a plan to present each examinee, in addition to the pass/fail judgment for the grade taken, with an ability value expressed on a single scale spanning Grades 1 through 5 (a value expressing the examinee's degree of mastery of practical English skills). For example, an examinee who passed the Grade 1 test might be given an ability value of 97, while an examinee who passed the Grade 5 test might be given an ability value of 43.
As a statistical probability theory for ability tests of various kinds, the one-parameter logistic (1PL) model of item response theory (IRT) is well known and representative. The IRT-1PL model presupposes a large-scale test in which a plurality of examinees answer a plurality of questions and their answers are scored as correct or incorrect. It captures, in statistical-probabilistic terms, the relationships that the greater a question's difficulty, the smaller the probability of answering it correctly, and the greater an examinee's ability value, the greater the probability of answering correctly. From these relationships it derives the famous theoretical formula below, in which the probability of answering each question correctly is expressed as a function of the examinee's ability value and the question's difficulty.
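For reference, the standard form of the 1PL logistic model referred to here as Equation 1 can be written as follows, where θ is the examinee's ability value, b_i is the difficulty of question i, and D is a scaling constant (commonly 1.7, sometimes 1); the exact constants used in the patent's Equation 1 may differ:

P_i(\theta) = \frac{1}{1 + \exp\{-D(\theta - b_i)\}}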
<< Reference 1 >> Introduction to Item Response Theory: A New Method of Analyzing Language Test Data, Kenji Otomo, Taishukan Shoten, 1996.
<< Reference 2 >> Item Response Theory, Theory Volume: The Mathematics of Testing, edited by Hideki Toyoda, Asakura Shoten, 2005.
<< Reference 3 >> Bayesian Network Technology: Modeling of Customers and Users, and Uncertainty Theory, Yoichi Motomura and Hirotoshi Iwasaki, Tokyo Denki University Press, 2006.
<< Reference 4 >> JP 2008-242637 A
<< Reference 5 >> JP 2013-521053 A (published Japanese translation of a PCT application)
<< Reference 6 >> JP 2004-177510 A
=== Conventional Method 1 ===
Suppose that 10,000 examinees take a test consisting of 100 questions, their correct and incorrect answers are scored, and the correct-answer rate of each of the 100 questions is computed. Assume here that a difficulty value calculated from past test administrations has been determined in advance for each of these 100 questions. In this case, a well-known program applying Equation 1 of the IRT-1PL model (hereinafter called the IRT system) can calculate (estimate) the ability value of each of the 10,000 examinees by processing, as its input, the per-question correct-answer rates obtained from scoring and the difficulty assigned to each question. If the difficulty values of all 100 questions are highly reliable because they reflect past administrations, then the calculated ability values of the 10,000 examinees can also be considered highly reliable.
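As a concrete illustration of Method 1, the following is a minimal sketch (not the IRT system itself) of estimating one examinee's ability value by maximum likelihood when the item difficulties are already known, using Newton-Raphson on the 1PL log-likelihood. The function names, the starting value, and the choice D = 1.7 are assumptions made for the example.

```python
import numpy as np

D = 1.7  # scaling constant (an assumption; some formulations use D = 1)

def p_correct(theta, b):
    """1PL probability of a correct answer for ability theta and difficulty b."""
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

def estimate_ability(responses, difficulties, n_iter=50, tol=1e-6):
    """Maximum-likelihood ability estimate for a single examinee.

    responses    -- 0/1 scores, one per question
    difficulties -- known difficulty values b_i, same length as responses
    """
    responses = np.asarray(responses, dtype=float)
    difficulties = np.asarray(difficulties, dtype=float)
    theta = 0.0  # start at the centre of the ability scale
    for _ in range(n_iter):
        p = p_correct(theta, difficulties)
        grad = D * np.sum(responses - p)          # d(log-likelihood)/d(theta)
        info = D ** 2 * np.sum(p * (1.0 - p))     # observed information (positive)
        step = grad / info
        theta += step                             # Newton-Raphson update
        if abs(step) < tol:
            break
    return theta

# Example: 10 questions with known difficulties and one examinee's scored answers
b = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
x = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 0])
print(round(estimate_ability(x, b), 3))
```

Repeating this estimate per examinee corresponds to the per-examinee ability calculation described above once every question's difficulty is trusted.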
=== Conventional Method 2 ===
Method 1 above presupposes that all 100 test questions are past questions with established difficulty values. In a test like the practical English proficiency test, which is held regularly on a nationwide scale on the premise that the questions used are published afterwards, it is simply not possible to compose the test entirely of past questions, so the following test method (Method 2) has been practiced.
In Method 2, the full set of 100 test questions is composed as a mixture of questions with established difficulty values (past questions) and questions that do not yet have a difficulty value (new questions). As a concrete example, suppose the test consists of 30 questions with difficulty values and 70 questions without, that 10,000 people take it, and that the ability value of each of the 10,000 examinees is to be calculated (estimated) from the scoring results.
The correct-answer rate of each of the 100 questions, obtained from the scoring results of the 10,000 examinees, is input to the IRT system, along with the difficulty value set for each of the 30 questions among the 100, and the programmed estimation computation is executed. By statistical-probabilistic computation based on Equation 1, the IRT system calculates (estimates) a difficulty value for each of the 70 new questions, calculates (estimates) the ability value of each of the 10,000 examinees, and outputs them.
The estimation algorithm itself is well known to those skilled in the art, so the details are omitted, but its principle is outlined here. FIG. 1 shows the correct-answer rates of the 100 questions, obtained from the scoring results of the 10,000 examinees, plotted on a one-dimensional correct-answer-rate axis. Of the 100 plotted per-question correct-answer rates, 30 already have difficulty values and the remaining 70 do not. Referring to how the difficulty values of those 30 questions are arranged along the correct-answer-rate axis, the IRT system computes and assigns plausible difficulty values to the remaining 70 questions on that axis.
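A minimal sketch of the principle illustrated by FIG. 1, under the simplifying assumption that provisional difficulties can be read off by monotone interpolation along the correct-answer-rate axis between the anchor questions; the real IRT system performs a statistical estimation based on Equation 1 rather than this interpolation, and all names here are illustrative.

```python
import numpy as np

def provisional_difficulty(new_rates, anchor_rates, anchor_difficulties):
    """Assign provisional difficulties to new questions by interpolating
    along the correct-answer-rate axis between anchor questions.

    new_rates           -- correct-answer rates of questions without difficulty
    anchor_rates        -- correct-answer rates of questions with known difficulty
    anchor_difficulties -- the known difficulty values of those anchor questions
    """
    # np.interp needs the x coordinates (rates) in increasing order.
    order = np.argsort(anchor_rates)
    return np.interp(new_rates,
                     np.asarray(anchor_rates)[order],
                     np.asarray(anchor_difficulties)[order])

# Example: 3 anchor questions and 2 new questions
anchors_rate = [0.85, 0.60, 0.30]
anchors_b    = [-1.2, 0.1, 1.4]
print(provisional_difficulty([0.75, 0.45], anchors_rate, anchors_b))
```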
Once all 100 questions have difficulty values as a result of the Method 2 estimation, the ability value of each of the 10,000 examinees can be calculated by the same algorithm as in Method 1. In this way the IRT system calculates (estimates) and outputs a difficulty value for each of the 70 new questions and an ability value for each of the 10,000 examinees.
=== Problem with Method 2 ===
As can be understood from the above description, the plausibility of the examinee ability values estimated by Method 2 is clearly worse than with Method 1. In Method 1 all 100 questions had established difficulty values, whereas in Method 2 only 30 of the 100 questions do; the difficulty values of the remaining 70 questions are themselves estimated from the scoring results of the 10,000 examinees, and the ability values of the 10,000 examinees are then estimated on top of that, so it is only natural that the likelihood is lower.
=== Conventional Method 3 ===
To mitigate the above problem with Method 2 at least to some extent, Method 3, which adds the processing described next, has conventionally been practiced. Starting from the computation result of Method 2, a quality index is calculated as follows for each of the 70 questions that initially had no difficulty value, and any question whose quality index fails to meet a prescribed standard is removed from the ability value estimation.
The quality index of a question is calculated as follows. The explanation proceeds on the premise that the estimation computation of Method 2 has already produced difficulty values for the 70 questions that initially had none, as well as ability values for the 10,000 examinees. The 100 questions carry identification numbers i (i = 1 to 100).
For a question i that initially had no difficulty value, the examinees' ability values θ have been estimated from the answer data of the 10,000 examinees, and based on those values and the difficulty b assigned to question i, the theoretical correct-answer rate P is calculated from Equation 1. The difference between the actual correct-answer rate of question i and the theoretical value P (this difference is the quality index of the question) is then calculated. If this difference is large, falling outside a prescribed reference range, question i is removed from the ability value estimation. A question judged in this way is called an out-of-standard question.
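The quality index computation just described can be sketched as follows. The theoretical correct-answer rate of a question is approximated here by averaging Equation 1 over the estimated ability values of all examinees, and the question is flagged when the gap to its actual correct-answer rate exceeds a threshold; the threshold value and the averaging are assumptions made for illustration, not the patent's prescribed reference range.

```python
import numpy as np

D = 1.7  # scaling constant (assumption)

def p_correct(theta, b):
    """1PL probability of a correct answer."""
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

def quality_flags(observed_rates, difficulties, abilities, max_gap=0.10):
    """Return True for questions judged out of standard.

    observed_rates -- actual correct-answer rate of each checked question
    difficulties   -- difficulty value estimated for each checked question
    abilities      -- estimated ability values of all examinees
    max_gap        -- allowed |observed - theoretical| gap (assumed threshold)
    """
    abilities = np.asarray(abilities)
    flags = []
    for rate, b in zip(observed_rates, difficulties):
        theoretical = p_correct(abilities, b).mean()  # expected rate over examinees
        flags.append(abs(rate - theoretical) > max_gap)
    return np.array(flags)

# Example: two questions checked against five examinees' estimated abilities
theta = [-1.0, -0.2, 0.3, 0.9, 1.6]
print(quality_flags(observed_rates=[0.62, 0.30],
                    difficulties=[0.0, 0.1],
                    abilities=theta))
```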
This additional processing is carried out for the 70 questions that initially had no difficulty value. The well-known IRT system mentioned above also provides a function for performing this processing. In this illustrative example, assume that 28 of the 70 questions are judged to be out-of-standard questions; the other 42 are within-standard questions.
Next, only the 72 questions consisting of the 42 judged within-standard by the quality check and the 30 that had difficulty values from the start are taken as the target. The correct-answer rates of these 72 questions, obtained from the scoring results of the 10,000 examinees, are input to the IRT system together with the difficulty values originally assigned to the 30 questions, and the same estimation computation as in Method 2 is executed. As explained for Method 2, the IRT system then outputs a difficulty value for each of the 42 questions and a recalculated ability value for each of the 10,000 examinees.
The ability values recalculated for the 10,000 examinees are affected by the removal of the 28 questions from the estimation and therefore differ from the ability values produced by the first estimation (this does not mean they differ for every single examinee), and the recalculated ability values are considered to have a higher likelihood than those of the first computation. Conventionally, presenting these recalculated ability values to the respective examinees has been regarded as the preferable approach.
=== Problem with Method 3 ===
In the conventional test procedure that presents the ability values recalculated by Method 3 as each examinee's ability value, 28 of the 100 questions (the 28 judged out-of-standard by the quality check, in the concrete example above) are removed entirely from the ability value estimation, so the fact of whether each examinee answered those 28 questions correctly or incorrectly is not reflected in that examinee's ability value at all. This is what the inventor of the present application recognized as the problem to be solved.
=== Problem to Be Solved and Objectives ===
In the following, to promote a straightforward understanding of the invention, its core is described concretely within the context explained in the Background of the Invention above.
In the case of a test administered regularly on a large scale, such as a practical English proficiency test, the fact that each examinee's correct or incorrect answers to the 28 of the 100 questions judged out-of-standard by the quality check of Method 3 bear no relation whatsoever to the ability value reported to that examinee is relatively easy to uncover: many examinees show one another their scoring results and ability values and piece the information together. This becomes a negative factor that lowers the perceived reliability of the proficiency test.
One object of the present invention is to ensure that no test question ends up being irrelevant to the estimation of ability values. Another object is to make it possible to calculate ability values with higher likelihood based on the scoring results of all test questions.
=== Premises for the Description of the Embodiment ===
FIG. 2 illustrates the progression of processing in the specific embodiment described below. The embodiment presupposes the following points, already explained as conventional Methods 2 and 3.
(a) 10,000 examinees took a test of 100 questions in total. Established difficulty values were assigned in advance to 30 of the 100 questions; the remaining 70 questions had no difficulty value.
(b) The correct-answer rates of the 100 questions obtained from the scoring results of the 10,000 examinees and the difficulty values of the 30 questions were input to the IRT system, which performed the ability value estimation and calculated difficulty values for the 70 new questions and ability values for the 10,000 examinees (conventional Method 2). This is called the first ability value calculation.
(c) The quality judgment processing was executed by the IRT system for the 70 questions that initially had no difficulty value (conventional Method 3). As a result, 28 questions were judged out-of-standard and 42 were judged within-standard. This is called the first quality judgment.
=== Processing Progression According to the Embodiment of the Invention ===
(d) The 72 questions consisting of the 30 that had difficulty values from the start and the 42 judged within-standard are taken as the target. The correct-answer rates of these 72 questions, obtained from the scoring results of the 10,000 examinees, and the difficulty values of the 30 questions are input to the IRT system, which performs the ability value estimation and calculates difficulty values for the 42 new questions and ability values for the 10,000 examinees. This is called the second ability value calculation. The calculation algorithm is the same as in the first ability value calculation and is well known.
It should be noted here that although the 42 new questions were given difficulty values by the first ability value calculation, those difficulty values are not used in the second ability value calculation. The second ability value calculation assigns fresh difficulty values to the 42 new questions.
(e) For the 42 questions thus given fresh difficulty values, the IRT system executes the quality judgment processing with the same algorithm as in conventional Method 3. Suppose that this second quality judgment finds 17 of the 42 questions out-of-standard and 25 within-standard.
(f) The 55 questions consisting of the 30 that had difficulty values from the start and the 25 judged within-standard by the second quality judgment are taken as the target. The correct-answer rates of these 55 questions, obtained from the scoring results of the 10,000 examinees, and the difficulty values of the 30 questions are input to the IRT system, which performs the ability value estimation (the third ability value calculation) and calculates difficulty values for the 25 new questions and ability values for the 10,000 examinees.
(g) For the 25 questions thus given fresh difficulty values, the IRT system executes a third quality judgment with the same algorithm as in conventional Method 3. Suppose that this third quality judgment finds 22 of the 25 questions within-standard and 3 out-of-standard.
(h) The 52 questions consisting of the 30 that had difficulty values from the start and the 22 judged within-standard by the third quality judgment are taken as the target. The correct-answer rates of these 52 questions, obtained from the scoring results of the 10,000 examinees, and the difficulty values of the 30 questions are input to the IRT system, which performs the ability value estimation (the fourth ability value calculation) and calculates difficulty values for the 22 new questions and ability values for the 10,000 examinees.
(i) For the 22 questions thus given fresh difficulty values, the IRT system executes a fourth quality judgment with the same algorithm as in conventional Method 3. Suppose that this fourth quality judgment finds all 22 questions within-standard, with no out-of-standard questions.
(j) The IRT system now performs the final ability value calculation as follows. The target is all 100 questions: the 30 that had difficulty values from the start, the 22 judged within-standard by the third and fourth quality judgments, and the 28 + 17 + 3 = 48 judged out-of-standard by the first through third quality judgments. The correct-answer rates of all 100 questions obtained from the scoring results of the 10,000 examinees, the difficulty values of the 30 questions, and the difficulty values assigned to the 22 questions by the fourth ability value calculation are input to the IRT system, which performs the ability value estimation (the final ability value calculation) and calculates difficulty values for the 48 new questions and ability values for the 10,000 examinees.
In this embodiment, the ability values of the 10,000 examinees calculated (estimated) by the final ability value calculation, carried out through the progression described above, are announced in an appropriate form.
As described above, the 70 questions that initially had no difficulty value are given difficulty values by an ability value calculation, the likelihood of those assigned difficulty values is analyzed by the quality judgment processing, questions whose likelihood fails to meet the standard are excluded, and the ability value calculation is re-run (only the difficulty values of the 30 questions with established difficulty are used as input to each ability value calculation), assigning fresh difficulty values to the remaining new questions. The quality judgment is then re-run on those; if out-of-standard questions remain, the same procedure is repeated, and once no out-of-standard question remains, the final ability value calculation is executed over all 100 questions. The final ability value calculation uses, as part of its input, the difficulty values most recently assigned to the within-standard questions that survived to the end.
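The iterative procedure summarized above can be sketched as follows. `run_irt` and `quality_check` are placeholders standing for the IRT system's estimation run and quality judgment described in this document; their interfaces are assumptions, and the stopping rule follows the embodiment (repeat until no out-of-standard question remains, then perform one final calculation over all questions).

```python
def iterative_ability_estimation(all_questions, anchor_difficulties,
                                 scoring_results, run_irt, quality_check):
    """Sketch of the embodiment's iterative loop (names and interfaces are assumptions).

    anchor_difficulties -- {question_id: difficulty} for the questions whose
                           difficulty is established in advance (the 30 anchors)
    run_irt(target_questions, fixed_difficulties, scoring_results)
        -> (estimated_difficulties, abilities) for that question set
    quality_check(checked_questions, estimated_difficulties, abilities)
        -> set of question ids judged out of standard
    """
    anchors = set(anchor_difficulties)
    candidates = set(all_questions) - anchors      # new questions still in play
    fixed = dict(anchor_difficulties)              # only anchor difficulties are reused

    while True:
        target = anchors | candidates
        est_diff, abilities = run_irt(target, fixed, scoring_results)
        rejected = quality_check(candidates, est_diff, abilities)
        if not rejected:                           # no out-of-standard questions remain
            break
        candidates -= rejected                     # drop them and re-estimate

    # Final calculation: all questions, with the surviving new questions'
    # last-estimated difficulties fixed alongside the anchor difficulties.
    final_fixed = dict(anchor_difficulties)
    final_fixed.update({q: est_diff[q] for q in candidates})
    _, final_abilities = run_irt(set(all_questions), final_fixed, scoring_results)
    return final_abilities
```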
Therefore, according to this embodiment of the invention, ability values with higher likelihood can be calculated based on the scoring results of all the test questions, as compared with conventional Method 3 described above, and no test question ends up being irrelevant to the calculation of ability values.
In the embodiment described above, the final ability value calculation is performed at the stage where the quality judgment processing finds no more out-of-standard questions. This may be changed as follows: the final ability value calculation is performed once the number of questions judged out-of-standard falls to or below a prescribed number. In the repeatedly executed quality judgment processing, the judgment criterion may also be varied appropriately, for example by gradually relaxing it. Furthermore, the difficulty values assigned to questions in the final ability value calculation are not particularly used in the present invention.
The above description used concrete numbers for clarity. Generalizing, the present invention can be regarded as a computing method specified by the following items (1) to (8).
(1) It is a method of estimating each examinee's ability value by processing, with an IRT system applying the one-parameter logistic model of item response theory, scoring results that represent each examinee's correct and incorrect answers to each question, obtained by having a plurality N of examinees answer a plurality M of questions.
(2) Of the M test questions, m questions have difficulty values assigned in advance; the other questions have no difficulty value.
(3) The correct-answer rates of the M questions obtained from the scoring results of the N examinees and the difficulty values of the m questions are input to the IRT system to perform a first ability value calculation, which calculates difficulty values for the questions without difficulty and ability values for the N examinees.
(4) A first quality judgment is executed by the IRT system on the questions without difficulty, identifying out-of-standard questions.
(5) Taking as the target the M questions minus those judged out-of-standard, the correct-answer rates of the target questions obtained from the scoring results of the N examinees and the difficulty values of the m questions are input to the IRT system to perform a second ability value calculation, which calculates difficulty values for the targeted questions without difficulty and ability values for the N examinees.
(6) A second quality judgment is executed by the IRT system on the questions without difficulty that were targeted by the second ability value calculation, identifying out-of-standard questions.
(7) If the number of questions judged out-of-standard by the quality judgment is zero or no more than a prescribed number, the final ability value calculation is performed; otherwise, steps (5) and (6) are repeated while narrowing down the target until the number of questions judged out-of-standard is zero or no more than the prescribed number.
(8) In the final ability value calculation, the target is all M questions: the m questions that had difficulty values from the start, the X questions judged within-standard by the immediately preceding quality judgment, and the questions judged out-of-standard by the respective quality judgments. The correct-answer rates of all M questions obtained from the scoring results of the N examinees, the difficulty values of the m questions, and the difficulty values assigned to the X questions by the immediately preceding ability value calculation are input to the IRT system to perform the ability value calculation and calculate the ability values of the N examinees.
Claims (1)
A computing method specified by the following items (1) to (8):
(1) It is a method of estimating each examinee's ability value by processing, with an IRT system applying the one-parameter logistic model of item response theory, scoring results that represent each examinee's correct and incorrect answers to each question, obtained by having a plurality N of examinees answer a plurality M of questions.
(2) Of the M test questions, m questions have difficulty values assigned in advance; the other questions have no difficulty value.
(3) The correct-answer rates of the M questions obtained from the scoring results of the N examinees and the difficulty values of the m questions are input to the IRT system to perform a first ability value calculation, which calculates difficulty values for the questions without difficulty and ability values for the N examinees.
(4) A first quality judgment is executed by the IRT system on the questions without difficulty, identifying out-of-standard questions.
(5) Taking as the target the M questions minus those judged out-of-standard, the correct-answer rates of the target questions obtained from the scoring results of the N examinees and the difficulty values of the m questions are input to the IRT system to perform a second ability value calculation, which calculates difficulty values for the targeted questions without difficulty and ability values for the N examinees.
(6) A second quality judgment is executed by the IRT system on the questions without difficulty that were targeted by the second ability value calculation, identifying out-of-standard questions.
(7) If the number of questions judged out-of-standard by the quality judgment is zero or no more than a prescribed number, the final ability value calculation is performed; otherwise, steps (5) and (6) are repeated while narrowing down the target until the number of questions judged out-of-standard is zero or no more than the prescribed number.
(8) In the final ability value calculation, the target is all M questions: the m questions that had difficulty values from the start, the X questions judged within-standard by the immediately preceding quality judgment, and the questions judged out-of-standard by the respective quality judgments. The correct-answer rates of all M questions obtained from the scoring results of the N examinees, the difficulty values of the m questions, and the difficulty values assigned to the X questions by the immediately preceding ability value calculation are input to the IRT system to perform the ability value calculation and calculate the ability values of the N examinees.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014264102A JP6190802B2 (en) | 2014-12-26 | 2014-12-26 | Computing to estimate the ability value of many candidates based on item response theory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014264102A JP6190802B2 (en) | 2014-12-26 | 2014-12-26 | Computing to estimate the ability value of many candidates based on item response theory |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2016126029A (en) | 2016-07-11 |
JP6190802B2 true JP6190802B2 (en) | 2017-08-30 |
Family
ID=56359329
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2014264102A Active JP6190802B2 (en) | 2014-12-26 | 2014-12-26 | Computing to estimate the ability value of many candidates based on item response theory |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP6190802B2 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106780222B * | 2017-01-12 | 2020-03-27 | 陈星 | Method and device for acquiring simulated comprehensive scores |
JP6489723B1 (en) * | 2018-06-29 | 2019-03-27 | 株式会社なるほどゼミナール | Learning support apparatus, method, and computer program |
JP7067428B2 (en) * | 2018-11-05 | 2022-05-16 | 日本電信電話株式会社 | Learning support devices, learning support methods and programs |
CN110377814A (en) * | 2019-05-31 | 2019-10-25 | 平安国际智慧城市科技股份有限公司 | Topic recommended method, device and medium |
JP7290273B2 (en) * | 2019-06-17 | 2023-06-13 | 国立大学法人 筑波大学 | Item check device for tests based on item response theory |
JP7339414B1 (en) | 2022-11-04 | 2023-09-05 | 株式会社Z会 | Proficiency level determination device, proficiency level determination method, and program |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003085296A (en) * | 2001-09-06 | 2003-03-20 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for evaluating test question and its program and storage medium with its program stored thereon |
US20080124696A1 (en) * | 2006-10-26 | 2008-05-29 | Houser Ronald L | Empirical development of learning content using educational measurement scales |
JP5029090B2 (en) * | 2007-03-26 | 2012-09-19 | Kddi株式会社 | Capability estimation system and method, program, and recording medium |
- 2014-12-26: JP application JP2014264102A, granted as patent JP6190802B2 (en), status: Active
Also Published As
Publication number | Publication date |
---|---|
JP2016126029A (en) | 2016-07-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6190802B2 (en) | Computing to estimate the ability value of many candidates based on item response theory | |
CN112508334B (en) | Personalized paper grouping method and system integrating cognition characteristics and test question text information | |
CN105279365A (en) | Method for learning exemplars for anomaly detection | |
CN106296503A (en) | Evaluation of teacher's method and system | |
CN111651677B (en) | Course content recommendation method, apparatus, computer device and storage medium | |
JP6879526B2 (en) | How to analyze the data | |
US11915615B2 (en) | Systems and methods for detecting collusion in student testing using graded scores or answers for individual questions | |
JP2018205354A (en) | Learning support device, learning support system, and program | |
CN111626372A (en) | Online teaching supervision management method and system | |
Ahmad et al. | An improved course assessment measurement for analyzing learning outcomes performance using Rasch model | |
Foster et al. | Selection tests work better than we think they do, and have for years | |
US20170206456A1 (en) | Assessment performance prediction | |
CN117079504B (en) | Wrong question data management method of big data accurate teaching and reading system | |
CN113361780A (en) | Behavior data-based crowdsourcing tester evaluation method | |
CN109934407A (en) | A kind of volunteers working intention prediction technique based on Logistic generalized linear regression model | |
CN111932160A (en) | Knowledge acquisition information processing method, knowledge acquisition information processing device, computer device, and storage medium | |
CN116562836A (en) | Method, device, electronic equipment and storage medium for multidimensional forced choice question character test | |
CN111639194A (en) | Knowledge graph query method and system based on sentence vectors | |
CN116704606A (en) | Physicochemical experiment operation behavior identification method, system, device and storage medium | |
CN115689000A (en) | Learning situation intelligent prediction method and system based on whole learning behavior flow | |
CN108417266A (en) | The determination method and system of student's cognitive state | |
CN113918825A (en) | Exercise recommendation method and device and computer storage medium | |
CN113919983A (en) | Test question portrait method, device, electronic equipment and storage medium | |
Meijer et al. | Person fit across subgroups: An achievement testing example | |
CN112765830A (en) | Cognitive diagnosis method based on learner cognitive response model |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| 2016-08-25 | A621 | Written request for application examination | JAPANESE INTERMEDIATE CODE: A621 |
| 2017-06-28 | A977 | Report on retrieval | JAPANESE INTERMEDIATE CODE: A971007 |
| | TRDD | Decision of grant or rejection written | |
| 2017-07-11 | A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01 |
| 2017-08-07 | A61 | First payment of annual fees (during grant procedure) | JAPANESE INTERMEDIATE CODE: A61 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 6190802; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 (recorded five times, once per annual fee payment) |