JP6190802B2 - Computing to estimate the ability value of many candidates based on item response theory - Google Patents
- Publication number
- JP6190802B2 (application JP2014264102A)
- Authority
- JP
- Japan
- Prior art keywords
- questions
- ability value
- difficulty
- difficulty level
- examinees
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Description
The present invention relates to an improvement in computing that estimates each examinee's ability value by processing, with an algorithm applying the well-known one-parameter logistic model of item response theory, scoring results that represent each examinee's correct and incorrect answers to each question, obtained by having a plurality of examinees answer a plurality of questions.
The applicant administers a practical English proficiency test divided into grades, from Grade 1 down to Grade 5. At present, each test result is judged pass or fail against prescribed criteria, and the examinee is notified of the result in that binary form. The impetus for this invention was a plan to present each examinee, in addition to the pass/fail judgment for the grade taken, with an ability value expressed on a single scale spanning Grades 1 through 5 (a value expressing the examinee's degree of mastery of practical English skills). For example, an examinee who passed the Grade 1 test might be given an ability value of 97, while an examinee who passed the Grade 5 test might be given an ability value of 43.
As a statistical probability theory for ability tests of various kinds, the one-parameter logistic (1PL) model of item response theory (IRT) is well known and representative. The IRT-1PL model presupposes a large-scale test in which a plurality of examinees answer a plurality of questions and their answers are scored as correct or incorrect. It captures, in statistical-probabilistic terms, the relationships that the greater a question's difficulty, the smaller the probability of answering it correctly, and the greater an examinee's ability value, the greater the probability of answering correctly. From these relationships it derives the famous theoretical formula below, in which the probability of answering each question correctly is expressed as a function of the examinee's ability value and the question's difficulty.
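For reference, the standard form of the 1PL logistic model referred to here as Equation 1 can be written as follows, where θ is the examinee's ability value, b_i is the difficulty of question i, and D is a scaling constant (commonly 1.7, sometimes 1); the exact constants used in the patent's Equation 1 may differ:

P_i(\theta) = \frac{1}{1 + \exp\{-D(\theta - b_i)\}}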
<< Reference 1 >> Introduction to Item Response Theory: A New Method of Analyzing Language Test Data, Kenji Otomo, Taishukan Shoten, 1996.
<< Reference 2 >> Item Response Theory, Theory Volume: The Mathematics of Testing, edited by Hideki Toyoda, Asakura Shoten, 2005.
<< Reference 3 >> Bayesian Network Technology: Modeling of Customers and Users, and Uncertainty Theory, Yoichi Motomura and Hirotoshi Iwasaki, Tokyo Denki University Press, 2006.
<< Reference 4 >> JP 2008-242637 A
<< Reference 5 >> JP 2013-521053 A (published Japanese translation of a PCT application)
<< Reference 6 >> JP 2004-177510 A
=== Conventional Method 1 ===
Suppose that 10,000 examinees take a test consisting of 100 questions, their correct and incorrect answers are scored, and the correct-answer rate of each of the 100 questions is computed. Assume here that a difficulty value calculated from past test administrations has been determined in advance for each of these 100 questions. In this case, a well-known program applying Equation 1 of the IRT-1PL model (hereinafter called the IRT system) can calculate (estimate) the ability value of each of the 10,000 examinees by processing, as its input, the per-question correct-answer rates obtained from scoring and the difficulty assigned to each question. If the difficulty values of all 100 questions are highly reliable because they reflect past administrations, then the calculated ability values of the 10,000 examinees can also be considered highly reliable.
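As a concrete illustration of Method 1, the following is a minimal sketch (not the IRT system itself) of estimating one examinee's ability value by maximum likelihood when the item difficulties are already known, using Newton-Raphson on the 1PL log-likelihood. The function names, the starting value, and the choice D = 1.7 are assumptions made for the example.

```python
import numpy as np

D = 1.7  # scaling constant (an assumption; some formulations use D = 1)

def p_correct(theta, b):
    """1PL probability of a correct answer for ability theta and difficulty b."""
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

def estimate_ability(responses, difficulties, n_iter=50, tol=1e-6):
    """Maximum-likelihood ability estimate for a single examinee.

    responses    -- 0/1 scores, one per question
    difficulties -- known difficulty values b_i, same length as responses
    """
    responses = np.asarray(responses, dtype=float)
    difficulties = np.asarray(difficulties, dtype=float)
    theta = 0.0  # start at the centre of the ability scale
    for _ in range(n_iter):
        p = p_correct(theta, difficulties)
        grad = D * np.sum(responses - p)          # d(log-likelihood)/d(theta)
        info = D ** 2 * np.sum(p * (1.0 - p))     # observed information (positive)
        step = grad / info
        theta += step                             # Newton-Raphson update
        if abs(step) < tol:
            break
    return theta

# Example: 10 questions with known difficulties and one examinee's scored answers
b = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
x = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 0])
print(round(estimate_ability(x, b), 3))
```

Repeating this estimate per examinee corresponds to the per-examinee ability calculation described above once every question's difficulty is trusted.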
=== Conventional Method 2 ===
Method 1 above presupposes that all 100 test questions are past questions with established difficulty values. In a test like the practical English proficiency test, which is held regularly on a nationwide scale on the premise that the questions used are published afterwards, it is simply not possible to compose the test entirely of past questions, so the following test method (Method 2) has been practiced.
In Method 2, the full set of 100 test questions is composed as a mixture of questions with established difficulty values (past questions) and questions that do not yet have a difficulty value (new questions). As a concrete example, suppose the test consists of 30 questions with difficulty values and 70 questions without, that 10,000 people take it, and that the ability value of each of the 10,000 examinees is to be calculated (estimated) from the scoring results.
The correct-answer rate of each of the 100 questions, obtained from the scoring results of the 10,000 examinees, is input to the IRT system, along with the difficulty value set for each of the 30 questions among the 100, and the programmed estimation computation is executed. By statistical-probabilistic computation based on Equation 1, the IRT system calculates (estimates) a difficulty value for each of the 70 new questions, calculates (estimates) the ability value of each of the 10,000 examinees, and outputs them.
The estimation algorithm itself is well known to those skilled in the art, so the details are omitted, but its principle is outlined here. FIG. 1 shows the correct-answer rates of the 100 questions, obtained from the scoring results of the 10,000 examinees, plotted on a one-dimensional correct-answer-rate axis. Of the 100 plotted per-question correct-answer rates, 30 already have difficulty values and the remaining 70 do not. Referring to how the difficulty values of those 30 questions are arranged along the correct-answer-rate axis, the IRT system computes and assigns plausible difficulty values to the remaining 70 questions on that axis.
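A minimal sketch of the principle illustrated by FIG. 1, under the simplifying assumption that provisional difficulties can be read off by monotone interpolation along the correct-answer-rate axis between the anchor questions; the real IRT system performs a statistical estimation based on Equation 1 rather than this interpolation, and all names here are illustrative.

```python
import numpy as np

def provisional_difficulty(new_rates, anchor_rates, anchor_difficulties):
    """Assign provisional difficulties to new questions by interpolating
    along the correct-answer-rate axis between anchor questions.

    new_rates           -- correct-answer rates of questions without difficulty
    anchor_rates        -- correct-answer rates of questions with known difficulty
    anchor_difficulties -- the known difficulty values of those anchor questions
    """
    # np.interp needs the x coordinates (rates) in increasing order.
    order = np.argsort(anchor_rates)
    return np.interp(new_rates,
                     np.asarray(anchor_rates)[order],
                     np.asarray(anchor_difficulties)[order])

# Example: 3 anchor questions and 2 new questions
anchors_rate = [0.85, 0.60, 0.30]
anchors_b    = [-1.2, 0.1, 1.4]
print(provisional_difficulty([0.75, 0.45], anchors_rate, anchors_b))
```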
Once all 100 questions have difficulty values as a result of the Method 2 estimation, the ability value of each of the 10,000 examinees can be calculated by the same algorithm as in Method 1. In this way the IRT system calculates (estimates) and outputs a difficulty value for each of the 70 new questions and an ability value for each of the 10,000 examinees.
=== Problem with Method 2 ===
As can be understood from the above description, the plausibility of the examinee ability values estimated by Method 2 is clearly worse than with Method 1. In Method 1 all 100 questions had established difficulty values, whereas in Method 2 only 30 of the 100 questions do; the difficulty values of the remaining 70 questions are themselves estimated from the scoring results of the 10,000 examinees, and the ability values of the 10,000 examinees are then estimated on top of that, so it is only natural that the likelihood is lower.
=== Conventional Method 3 ===
To mitigate the above problem with Method 2 at least to some extent, Method 3, which adds the processing described next, has conventionally been practiced. Starting from the computation result of Method 2, a quality index is calculated as follows for each of the 70 questions that initially had no difficulty value, and any question whose quality index fails to meet a prescribed standard is removed from the ability value estimation.
The quality index of a question is calculated as follows. The explanation proceeds on the premise that the estimation computation of Method 2 has already produced difficulty values for the 70 questions that initially had none, as well as ability values for the 10,000 examinees. The 100 questions carry identification numbers i (i = 1 to 100).
For a question i that initially had no difficulty value, the examinees' ability values θ have been estimated from the answer data of the 10,000 examinees, and based on those values and the difficulty b assigned to question i, the theoretical correct-answer rate P is calculated from Equation 1. The difference between the actual correct-answer rate of question i and the theoretical value P (this difference is the quality index of the question) is then calculated. If this difference is large, falling outside a prescribed reference range, question i is removed from the ability value estimation. A question judged in this way is called an out-of-standard question.
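The quality index computation just described can be sketched as follows. The theoretical correct-answer rate of a question is approximated here by averaging Equation 1 over the estimated ability values of all examinees, and the question is flagged when the gap to its actual correct-answer rate exceeds a threshold; the threshold value and the averaging are assumptions made for illustration, not the patent's prescribed reference range.

```python
import numpy as np

D = 1.7  # scaling constant (assumption)

def p_correct(theta, b):
    """1PL probability of a correct answer."""
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

def quality_flags(observed_rates, difficulties, abilities, max_gap=0.10):
    """Return True for questions judged out of standard.

    observed_rates -- actual correct-answer rate of each checked question
    difficulties   -- difficulty value estimated for each checked question
    abilities      -- estimated ability values of all examinees
    max_gap        -- allowed |observed - theoretical| gap (assumed threshold)
    """
    abilities = np.asarray(abilities)
    flags = []
    for rate, b in zip(observed_rates, difficulties):
        theoretical = p_correct(abilities, b).mean()  # expected rate over examinees
        flags.append(abs(rate - theoretical) > max_gap)
    return np.array(flags)

# Example: two questions checked against five examinees' estimated abilities
theta = [-1.0, -0.2, 0.3, 0.9, 1.6]
print(quality_flags(observed_rates=[0.62, 0.30],
                    difficulties=[0.0, 0.1],
                    abilities=theta))
```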
This additional processing is carried out for the 70 questions that initially had no difficulty value. The well-known IRT system mentioned above also provides a function for performing this processing. In this illustrative example, assume that 28 of the 70 questions are judged to be out-of-standard questions; the other 42 are within-standard questions.
Next, only the 72 questions consisting of the 42 judged within-standard by the quality check and the 30 that had difficulty values from the start are taken as the target. The correct-answer rates of these 72 questions, obtained from the scoring results of the 10,000 examinees, are input to the IRT system together with the difficulty values originally assigned to the 30 questions, and the same estimation computation as in Method 2 is executed. As explained for Method 2, the IRT system then outputs a difficulty value for each of the 42 questions and a recalculated ability value for each of the 10,000 examinees.
The ability values recalculated for the 10,000 examinees are affected by the removal of the 28 questions from the estimation and therefore differ from the ability values produced by the first estimation (this does not mean they differ for every single examinee), and the recalculated ability values are considered to have a higher likelihood than those of the first computation. Conventionally, presenting these recalculated ability values to the respective examinees has been regarded as the preferable approach.
=== Problem with Method 3 ===
In the conventional test procedure that presents the ability values recalculated by Method 3 as each examinee's ability value, 28 of the 100 questions (the 28 judged out-of-standard by the quality check, in the concrete example above) are removed entirely from the ability value estimation, so the fact of whether each examinee answered those 28 questions correctly or incorrectly is not reflected in that examinee's ability value at all. This is what the inventor of the present application recognized as the problem to be solved.
=== Problem to Be Solved and Objectives ===
In the following, to promote a straightforward understanding of the invention, its core is described concretely within the context explained in the Background of the Invention above.
In the case of a test administered regularly on a large scale, such as a practical English proficiency test, the fact that each examinee's correct or incorrect answers to the 28 of the 100 questions judged out-of-standard by the quality check of Method 3 bear no relation whatsoever to the ability value reported to that examinee is relatively easy to uncover: many examinees show one another their scoring results and ability values and piece the information together. This becomes a negative factor that lowers the perceived reliability of the proficiency test.
One object of the present invention is to ensure that no test question ends up being irrelevant to the estimation of ability values. Another object is to make it possible to calculate ability values with higher likelihood based on the scoring results of all test questions.
=== Premises for the Description of the Embodiment ===
FIG. 2 illustrates the progression of processing in the specific embodiment described below. The embodiment presupposes the following points, already explained as conventional Methods 2 and 3.
(a) 10,000 examinees took a test of 100 questions in total. Established difficulty values were assigned in advance to 30 of the 100 questions; the remaining 70 questions had no difficulty value.
(b) The correct-answer rates of the 100 questions obtained from the scoring results of the 10,000 examinees and the difficulty values of the 30 questions were input to the IRT system, which performed the ability value estimation and calculated difficulty values for the 70 new questions and ability values for the 10,000 examinees (conventional Method 2). This is called the first ability value calculation.
(c) The quality judgment processing was executed by the IRT system for the 70 questions that initially had no difficulty value (conventional Method 3). As a result, 28 questions were judged out-of-standard and 42 were judged within-standard. This is called the first quality judgment.
=== Processing Progression According to the Embodiment of the Invention ===
(d) The 72 questions consisting of the 30 that had difficulty values from the start and the 42 judged within-standard are taken as the target. The correct-answer rates of these 72 questions, obtained from the scoring results of the 10,000 examinees, and the difficulty values of the 30 questions are input to the IRT system, which performs the ability value estimation and calculates difficulty values for the 42 new questions and ability values for the 10,000 examinees. This is called the second ability value calculation. The calculation algorithm is the same as in the first ability value calculation and is well known.
It should be noted here that although the 42 new questions were given difficulty values by the first ability value calculation, those difficulty values are not used in the second ability value calculation. The second ability value calculation assigns fresh difficulty values to the 42 new questions.
(e) For the 42 questions thus given fresh difficulty values, the IRT system executes the quality judgment processing with the same algorithm as in conventional Method 3. Suppose that this second quality judgment finds 17 of the 42 questions out-of-standard and 25 within-standard.
(f) The 55 questions consisting of the 30 that had difficulty values from the start and the 25 judged within-standard by the second quality judgment are taken as the target. The correct-answer rates of these 55 questions, obtained from the scoring results of the 10,000 examinees, and the difficulty values of the 30 questions are input to the IRT system, which performs the ability value estimation (the third ability value calculation) and calculates difficulty values for the 25 new questions and ability values for the 10,000 examinees.
(g) For the 25 questions thus given fresh difficulty values, the IRT system executes a third quality judgment with the same algorithm as in conventional Method 3. Suppose that this third quality judgment finds 22 of the 25 questions within-standard and 3 out-of-standard.
(h) The 52 questions consisting of the 30 that had difficulty values from the start and the 22 judged within-standard by the third quality judgment are taken as the target. The correct-answer rates of these 52 questions, obtained from the scoring results of the 10,000 examinees, and the difficulty values of the 30 questions are input to the IRT system, which performs the ability value estimation (the fourth ability value calculation) and calculates difficulty values for the 22 new questions and ability values for the 10,000 examinees.
(i) For the 22 questions thus given fresh difficulty values, the IRT system executes a fourth quality judgment with the same algorithm as in conventional Method 3. Suppose that this fourth quality judgment finds all 22 questions within-standard, with no out-of-standard questions.
(j) The IRT system now performs the final ability value calculation as follows. The target is all 100 questions: the 30 that had difficulty values from the start, the 22 judged within-standard by the third and fourth quality judgments, and the 28 + 17 + 3 = 48 judged out-of-standard by the first through third quality judgments. The correct-answer rates of all 100 questions obtained from the scoring results of the 10,000 examinees, the difficulty values of the 30 questions, and the difficulty values assigned to the 22 questions by the fourth ability value calculation are input to the IRT system, which performs the ability value estimation (the final ability value calculation) and calculates difficulty values for the 48 new questions and ability values for the 10,000 examinees.
In this embodiment, the ability values of the 10,000 examinees calculated (estimated) by the final ability value calculation, carried out through the progression described above, are announced in an appropriate form.
As described above, the 70 questions that initially had no difficulty value are given difficulty values by an ability value calculation, the likelihood of those assigned difficulty values is analyzed by the quality judgment processing, questions whose likelihood fails to meet the standard are excluded, and the ability value calculation is re-run (only the difficulty values of the 30 questions with established difficulty are used as input to each ability value calculation), assigning fresh difficulty values to the remaining new questions. The quality judgment is then re-run on those; if out-of-standard questions remain, the same procedure is repeated, and once no out-of-standard question remains, the final ability value calculation is executed over all 100 questions. The final ability value calculation uses, as part of its input, the difficulty values most recently assigned to the within-standard questions that survived to the end.
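The iterative procedure summarized above can be sketched as follows. `run_irt` and `quality_check` are placeholders standing for the IRT system's estimation run and quality judgment described in this document; their interfaces are assumptions, and the stopping rule follows the embodiment (repeat until no out-of-standard question remains, then perform one final calculation over all questions).

```python
def iterative_ability_estimation(all_questions, anchor_difficulties,
                                 scoring_results, run_irt, quality_check):
    """Sketch of the embodiment's iterative loop (names and interfaces are assumptions).

    anchor_difficulties -- {question_id: difficulty} for the questions whose
                           difficulty is established in advance (the 30 anchors)
    run_irt(target_questions, fixed_difficulties, scoring_results)
        -> (estimated_difficulties, abilities) for that question set
    quality_check(checked_questions, estimated_difficulties, abilities)
        -> set of question ids judged out of standard
    """
    anchors = set(anchor_difficulties)
    candidates = set(all_questions) - anchors      # new questions still in play
    fixed = dict(anchor_difficulties)              # only anchor difficulties are reused

    while True:
        target = anchors | candidates
        est_diff, abilities = run_irt(target, fixed, scoring_results)
        rejected = quality_check(candidates, est_diff, abilities)
        if not rejected:                           # no out-of-standard questions remain
            break
        candidates -= rejected                     # drop them and re-estimate

    # Final calculation: all questions, with the surviving new questions'
    # last-estimated difficulties fixed alongside the anchor difficulties.
    final_fixed = dict(anchor_difficulties)
    final_fixed.update({q: est_diff[q] for q in candidates})
    _, final_abilities = run_irt(set(all_questions), final_fixed, scoring_results)
    return final_abilities
```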
Therefore, according to this embodiment of the invention, ability values with higher likelihood can be calculated based on the scoring results of all the test questions, as compared with conventional Method 3 described above, and no test question ends up being irrelevant to the calculation of ability values.
In the embodiment described above, the final ability value calculation is performed at the stage where the quality judgment processing finds no more out-of-standard questions. This may be changed as follows: the final ability value calculation is performed once the number of questions judged out-of-standard falls to or below a prescribed number. In the repeatedly executed quality judgment processing, the judgment criterion may also be varied appropriately, for example by gradually relaxing it. Furthermore, the difficulty values assigned to questions in the final ability value calculation are not particularly used in the present invention.
The above description used concrete numbers for clarity. Generalizing, the present invention can be regarded as a computing method specified by the following items (1) to (8).
(1) It is a method of estimating each examinee's ability value by processing, with an IRT system applying the one-parameter logistic model of item response theory, scoring results that represent each examinee's correct and incorrect answers to each question, obtained by having a plurality N of examinees answer a plurality M of questions.
(2) Of the M test questions, m questions have difficulty values assigned in advance; the other questions have no difficulty value.
(3) The correct-answer rates of the M questions obtained from the scoring results of the N examinees and the difficulty values of the m questions are input to the IRT system to perform a first ability value calculation, which calculates difficulty values for the questions without difficulty and ability values for the N examinees.
(4) A first quality judgment is executed by the IRT system on the questions without difficulty, identifying out-of-standard questions.
(5) Taking as the target the M questions minus those judged out-of-standard, the correct-answer rates of the target questions obtained from the scoring results of the N examinees and the difficulty values of the m questions are input to the IRT system to perform a second ability value calculation, which calculates difficulty values for the targeted questions without difficulty and ability values for the N examinees.
(6) A second quality judgment is executed by the IRT system on the questions without difficulty that were targeted by the second ability value calculation, identifying out-of-standard questions.
(7) If the number of questions judged out-of-standard by the quality judgment is zero or no more than a prescribed number, the final ability value calculation is performed; otherwise, steps (5) and (6) are repeated while narrowing down the target until the number of questions judged out-of-standard is zero or no more than the prescribed number.
(8) In the final ability value calculation, the target is all M questions: the m questions that had difficulty values from the start, the X questions judged within-standard by the immediately preceding quality judgment, and the questions judged out-of-standard by the respective quality judgments. The correct-answer rates of all M questions obtained from the scoring results of the N examinees, the difficulty values of the m questions, and the difficulty values assigned to the X questions by the immediately preceding ability value calculation are input to the IRT system to perform the ability value calculation and calculate the ability values of the N examinees.
Claims (1)
A computing method specified by the following items (1) to (8):
(1) It is a method of estimating each examinee's ability value by processing, with an IRT system applying the one-parameter logistic model of item response theory, scoring results that represent each examinee's correct and incorrect answers to each question, obtained by having a plurality N of examinees answer a plurality M of questions.
(2) Of the M test questions, m questions have difficulty values assigned in advance; the other questions have no difficulty value.
(3) The correct-answer rates of the M questions obtained from the scoring results of the N examinees and the difficulty values of the m questions are input to the IRT system to perform a first ability value calculation, which calculates difficulty values for the questions without difficulty and ability values for the N examinees.
(4) A first quality judgment is executed by the IRT system on the questions without difficulty, identifying out-of-standard questions.
(5) Taking as the target the M questions minus those judged out-of-standard, the correct-answer rates of the target questions obtained from the scoring results of the N examinees and the difficulty values of the m questions are input to the IRT system to perform a second ability value calculation, which calculates difficulty values for the targeted questions without difficulty and ability values for the N examinees.
(6) A second quality judgment is executed by the IRT system on the questions without difficulty that were targeted by the second ability value calculation, identifying out-of-standard questions.
(7) If the number of questions judged out-of-standard by the quality judgment is zero or no more than a prescribed number, the final ability value calculation is performed; otherwise, steps (5) and (6) are repeated while narrowing down the target until the number of questions judged out-of-standard is zero or no more than the prescribed number.
(8) In the final ability value calculation, the target is all M questions: the m questions that had difficulty values from the start, the X questions judged within-standard by the immediately preceding quality judgment, and the questions judged out-of-standard by the respective quality judgments. The correct-answer rates of all M questions obtained from the scoring results of the N examinees, the difficulty values of the m questions, and the difficulty values assigned to the X questions by the immediately preceding ability value calculation are input to the IRT system to perform the ability value calculation and calculate the ability values of the N examinees.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014264102A JP6190802B2 (en) | 2014-12-26 | 2014-12-26 | Computing to estimate the ability value of many candidates based on item response theory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014264102A JP6190802B2 (en) | 2014-12-26 | 2014-12-26 | Computing to estimate the ability value of many candidates based on item response theory |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2016126029A (en) | 2016-07-11 |
JP6190802B2 true JP6190802B2 (en) | 2017-08-30 |
Family
ID=56359329
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2014264102A Active JP6190802B2 (en) | 2014-12-26 | 2014-12-26 | Computing to estimate the ability value of many candidates based on item response theory |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP6190802B2 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106780222B * | 2017-01-12 | 2020-03-27 | 陈星 | Method and device for acquiring simulated comprehensive scores |
JP6489723B1 (en) * | 2018-06-29 | 2019-03-27 | 株式会社なるほどゼミナール | Learning support apparatus, method, and computer program |
JP7067428B2 (en) * | 2018-11-05 | 2022-05-16 | 日本電信電話株式会社 | Learning support devices, learning support methods and programs |
CN110377814A (en) * | 2019-05-31 | 2019-10-25 | 平安国际智慧城市科技股份有限公司 | Topic recommended method, device and medium |
JP7290273B2 (en) * | 2019-06-17 | 2023-06-13 | 国立大学法人 筑波大学 | Item check device for tests based on item response theory |
JP7339414B1 (en) | 2022-11-04 | 2023-09-05 | 株式会社Z会 | Proficiency level determination device, proficiency level determination method, and program |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003085296A (en) * | 2001-09-06 | 2003-03-20 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for evaluating test question and its program and storage medium with its program stored thereon |
US20080124696A1 (en) * | 2006-10-26 | 2008-05-29 | Houser Ronald L | Empirical development of learning content using educational measurement scales |
JP5029090B2 (en) * | 2007-03-26 | 2012-09-19 | Kddi株式会社 | Capability estimation system and method, program, and recording medium |
- 2014-12-26: JP application JP2014264102A, granted as patent JP6190802B2 (en), status: Active
Also Published As
Publication number | Publication date |
---|---|
JP2016126029A (en) | 2016-07-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6190802B2 (en) | Computing to estimate the ability value of many candidates based on item response theory | |
CN112508334B (en) | Personalized paper grouping method and system integrating cognition characteristics and test question text information | |
CN105279365A (en) | Method for learning exemplars for anomaly detection | |
CN106296503A (en) | Evaluation of teacher's method and system | |
CN111651677B (en) | Course content recommendation method, apparatus, computer device and storage medium | |
JP6879526B2 (en) | How to analyze the data | |
US11915615B2 (en) | Systems and methods for detecting collusion in student testing using graded scores or answers for individual questions | |
JP2018205354A (en) | Learning support device, learning support system, and program | |
CN111626372A (en) | Online teaching supervision management method and system | |
Ahmad et al. | An improved course assessment measurement for analyzing learning outcomes performance using Rasch model | |
Foster et al. | Selection tests work better than we think they do, and have for years | |
US20170206456A1 (en) | Assessment performance prediction | |
CN117079504B (en) | Wrong question data management method of big data accurate teaching and reading system | |
CN113361780A (en) | Behavior data-based crowdsourcing tester evaluation method | |
CN109934407A (en) | A kind of volunteers working intention prediction technique based on Logistic generalized linear regression model | |
CN111932160A (en) | Knowledge acquisition information processing method, knowledge acquisition information processing device, computer device, and storage medium | |
CN116562836A (en) | Method, device, electronic equipment and storage medium for multidimensional forced choice question character test | |
CN111639194A (en) | Knowledge graph query method and system based on sentence vectors | |
CN116704606A (en) | Physicochemical experiment operation behavior identification method, system, device and storage medium | |
CN115689000A (en) | Learning situation intelligent prediction method and system based on whole learning behavior flow | |
CN108417266A (en) | The determination method and system of student's cognitive state | |
CN113918825A (en) | Exercise recommendation method and device and computer storage medium | |
CN113919983A (en) | Test question portrait method, device, electronic equipment and storage medium | |
Meijer et al. | Person fit across subgroups: An achievement testing example | |
CN112765830A (en) | Cognitive diagnosis method based on learner cognitive response model |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| 2016-08-25 | A621 | Written request for application examination | JAPANESE INTERMEDIATE CODE: A621 |
| 2017-06-28 | A977 | Report on retrieval | JAPANESE INTERMEDIATE CODE: A971007 |
| | TRDD | Decision of grant or rejection written | |
| 2017-07-11 | A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01 |
| 2017-08-07 | A61 | First payment of annual fees (during grant procedure) | JAPANESE INTERMEDIATE CODE: A61 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 6190802; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 (recorded five times, once per annual fee payment) |