WO2020065663A1 - Methods and systems for Partial Credit Model (PCM) scoring in Classical Test Theory (CTT)

Info

Publication number: WO2020065663A1
Authority: WO (WIPO PCT)
Application number: PCT/IN2019/050688
Other languages: French (fr)
Inventor: Natarajan VENKATESA
Original Assignee: Merittrac Services Pvt. Ltd
Application filed by Merittrac Services Pvt. Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations
    • G06F 17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 7/00: Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B 7/06: Electrically-operated teaching apparatus or devices working with questions and answers of the multiple-choice answer-type, i.e. where a given question is provided with a series of answers and a choice has to be made from the answers

Abstract

Embodiments disclosed herein relate to methods and systems for managing test scoring, and more particularly to methods and systems for using the Partial Credit Model (PCM) for scoring multiple choice tests. Embodiments herein disclose methods (600) and systems (100) for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test.

Description

METHODS AND SYSTEMS FOR PARTIAL CREDIT MODEL (PCM) SCORING IN
CLASSICAL TEST THEORY (CTT)
CROSS REFERENCE TO RELATED APPLICATION
This application is based on and derives the benefit of Indian Provisional Application
201841036068, filed on 25th September, 2018, the contents of which are incorporated herein by reference.
TECHNICAL FIELD
[001] Embodiments disclosed herein relate to methods and systems for managing test scoring and more particularly to methods and systems for using Partial Credit Model (PCM) for scoring multiple choice tests.
BACKGROUND
[002] Classical Test Theory (CTT) was developed for scoring multiple choice tests. CTT comprises a plurality of component theories, such as the Theory of Validity, the Theory of Reliability, the Theory of Objectivity, the Theory of Test and Item Analysis, and so on. In the classical test theory process, only guessing was considered, and the emphasis was on penalizing the wrong option on the assumption that it was a result of guessing.
[003] CTT recognizes the correct response to an item by allotting it 1 mark, and allocates 0 or negative marks to incorrect or unanswered items. Several statistical quantities are inferred from this, such as the Total Number Right Score and the Score of Odd Numbered Items and Score of Even Numbered Items, the latter two being correlated to give the Split Half Reliability, or internal consistency. Reliabilities are estimated by different formulae contributed by different individuals and agencies over a period of time, such as Kuder Richardson 20 (KR20), Kuder Richardson 21 (KR21), Cronbach's Coefficient Alpha, Analysis of Variance (ANOVA), and several others, including Rulon's Formula, which gives a higher bound estimate compared to the lower bound estimate of KR21. In the classical test theory process, earlier only guessing was considered, and the emphasis was on penalizing the wrong option on the assumption that it was a result of guessing. However, there is a considerable loss of measurement precision, as indicated by the Standard Error of Measurement (SEM), when using these approaches.
OBJECTS
[004] The principal object of embodiments herein is to disclose methods and systems for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test.
[005] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating at least one embodiment and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
BRIEF DESCRIPTION OF FIGURES
[006] Embodiments herein are illustrated in the accompanying drawings, throughout which like reference letters indicate corresponding parts in the various figures. The embodiments herein will be better understood from the following description with reference to the drawings, in which:
[007] FIGs. 1a, 1b, and 1c depict a system for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test, according to embodiments as disclosed herein;
[008] FIGs. 2, 3a, 3b, 4a, and 4b depict example formats for representing the scores, according to embodiments as disclosed herein;
[009] FIG. 5 illustrates a device implementing an apparatus and methods for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test, according to embodiments as disclosed herein; and
[0010] FIGs. 6a, 6b and 6c are flowcharts depicting the process of using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test, according to embodiments as disclosed herein.
DETAILED DESCRIPTION
[0011] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0012] Embodiments herein may be described and illustrated in terms of blocks which carry out a described function or functions. These blocks, which may be referred to herein as units or modules or the like, are physically implemented by analog and/or digital circuits such as logic gates, integrated circuits, microprocessors, microcontrollers, memory circuits, passive electronic components, active electronic components, optical components, hardwired circuits and the like, and may optionally be driven by firmware and/or software. The circuits may, for example, be embodied in one or more semiconductor chips, or on substrate supports such as printed circuit boards and the like. The circuits constituting a block may be implemented by dedicated hardware, or by a processor (e.g., one or more programmed microprocessors and associated circuitry), or by a combination of dedicated hardware to perform some functions of the block and a processor to perform other functions of the block. Each block of the embodiments may be physically separated into two or more interacting and discrete blocks without departing from the scope of the disclosure. Likewise, the blocks of the embodiments may be physically combined into more complex blocks without departing from the scope of the disclosure.
[0013] The embodiments herein achieve methods and systems for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test. Referring now to the drawings, and more particularly to FIGs. 1a through 6c, where similar reference characters denote corresponding features consistently throughout the figures, there are shown embodiments.
[0014] Item Response Theory (IRT) is a theory of testing. IRT can assign partial credit and undertake calculations using an iterative procedure, assuming an initial value and refining it through successive approximation using a Maximum Likelihood Function. The item parameters, i.e., single parameter (item difficulty (b)), two parameter (item difficulty (b), item discrimination (a)), and three parameter (item difficulty (b), item discrimination (a), and item guessing (c)), can be formulated.
[0015] Embodiments herein have extended this concept of assigning partial credit from IRT to CTT, to examine the effect of awarding partial credit to the options of a Multiple Choice Test item (a key and a plurality of distractors, which normally represent the assumed mistakes, misconceptions and misunderstandings of the test taker).
[0016] In an example embodiment disclosed herein, the key option is credited with 4 credits; the next best option, revealed by the next highest number of choices by Higher Ability Group (HAG) test takers taking the same test, is awarded a credit of 3; the option with the next lower HAG count is credited with 2; and the last remaining option is credited with 1.
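By way of illustration, the following minimal Python sketch applies this crediting scheme to a four-option item (the function name pcm_credits and the dict-based layout are illustrative assumptions, not part of the disclosed system):

    # Illustrative sketch: assign PCM credits 4/3/2/1 to the options of a
    # four-option item, based on Higher Ability Group (HAG) option counts.
    def pcm_credits(key, hag_counts):
        """key: the correct option, e.g. 'B'; hag_counts: dict mapping each
        option to the number of HAG test takers who chose it."""
        credits = {key: 4}  # the key always receives the maximum credit
        # Rank the distractors by descending HAG count.
        distractors = sorted((o for o in hag_counts if o != key),
                             key=lambda o: hag_counts[o], reverse=True)
        for credit, option in zip((3, 2, 1), distractors):
            credits[option] = credit
        return credits

    # Example: HAG counts for an item whose key is 'B'.
    print(pcm_credits('B', {'A': 12, 'B': 80, 'C': 25, 'D': 3}))
    # {'B': 4, 'C': 3, 'A': 2, 'D': 1}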
[0017] In an example, tests of abilities such as Analytical Ability (AA), Verbal Ability (VA), Numerical Ability (NA), Quantitative Ability (QA), and Attention to Details (ATD) are administered to a group of test takers, and the results are analyzed using Classical Item Analysis (as disclosed herein). The starting point is the (A, B, C, D, X) format, denoting the choice of A, B, C, or D, or the omission of an answer, denoted by X. The (A, B, C, D, X) format is used to generate the (1, 0, X) format, where 1 denotes a correct answer (the key), 0 denotes an incorrect answer (a chosen distractor), and X denotes an omitted answer. For all CTT test and item analysis, the (1, 0, X) format is the basis for performing additional analysis. Embodiments herein use scores (such as 1, 0) merely as an example; however, it may be obvious to a person of ordinary skill in the art that any other scoring method/pattern can be used.
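The recoding from (A, B, C, D, X) to (1, 0, X) can be sketched as follows (an illustrative sketch assuming X marks an omission; the function name and list layout are assumptions):

    # Recode one candidate's (A, B, C, D, X) responses into (1, 0, X) form.
    def to_10x(responses, keys):
        """responses: e.g. ['A', 'C', 'X', 'B']; keys: the answer key per item."""
        out = []
        for resp, key in zip(responses, keys):
            if resp == 'X':
                out.append('X')      # omitted answer stays X
            elif resp == key:
                out.append(1)        # correct answer (the key)
            else:
                out.append(0)        # any distractor
        return out

    print(to_10x(['A', 'C', 'X', 'B'], ['A', 'B', 'D', 'B']))  # [1, 0, 'X', 1]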
[0018] FIGs. 1a, 1b, and 1c depict a system for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test. The system 100, as depicted, comprises a test analyzer module 101, a statistical analysis module 102, a reliability estimation module 103, an item analysis module 104, and a scoring engine 105.
[0019] The system 100 may be connected to at least one external module, such as at least one testing module 106 (as depicted in FIG. 1a), a scanner module 107 (as depicted in FIG. 1b), and a database 108 (as depicted in FIG. 1c). The testing module 106 may enable at least one user to take a test, wherein the test comprises at least one multiple choice question. The system 100 can receive data from the testing module 106 in real time, at pre-defined intervals of time (say, every 30 minutes until the end of the time assigned for the test), on pre-defined events occurring (completing a section of the test, receiving a user input), and so on.
[0020] The scanner module 107 can comprise a means to scan one or more tests, and the scanned results can be provided to the testing module 106.
[0021] The database 108 can be a location for storing data, such as a database, a file server, a data server, the Cloud, the Internet, a local server, and so on. The database 108 can comprise information related to tests and the results of tests taken by one or more users. In an embodiment herein, the database 108 can receive information from the testing module 106 and/or the scanner module 107.
[0022] Embodiments herein use the terms 'item' and 'question' interchangeably, wherein both terms refer to a multiple choice question in the test.
[0023] On receiving test results from at least one of the testing module 106 or the scanner module 107, or fetching them from the database 108, the test analyzer module 101 can calculate a total score by adding up all of a candidate's responses in the (1, 0, X) format along the horizontal (i.e., across all items for that candidate). The test analyzer module 101 further calculates the score of the odd numbered items and the score of the even numbered items.
[0024] If negative marking is applied, then the test analyzer module 101 calculates the total score as follows:

total score = score of right answers - (score for wrong answers / (n - 1))

where n = number of options available for a multiple choice question.
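Since the bracketing in the published formula is garbled, the sketch below assumes the standard negative-marking grouping, where each wrong answer costs 1/(n - 1) of a mark (function name and the four-option default are illustrative):

    # Negative-marking total score: each wrong answer costs 1/(n - 1) of a mark.
    def total_score(score_right, score_wrong, n_options=4):
        return score_right - score_wrong / (n_options - 1)

    print(total_score(30, 8))  # 30 - 8/3 = 27.33... for four-option items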
[0025] The test analyzer module 101 calculates the correlation between the score of odd numbered items and the score of the even numbered items. The test analyzer module 101 can calculate the total number of test takers who answered the question correctly, the total number of test takers who answered the question incorrectly, and the total number of test takers who have not attempted the question. From the above, the test analyzer module 101 can find the difference between the total number of test takers and the total number of users who have attempted the question (wherein the attempt can be a correct answer or an incorrect answer). The test analyzer module 101 can calculate a facility value as follows:

facility value (p) = total correct answers for a question / total answers for a question
[0026] The test analyzer module 101 can calculate an index of difficulty as follows:

index of difficulty (q) = total incorrect answers for a question / total answers for a question

or, equivalently, q = 1 - p.
[0027] The test analyzer module 101 can calculate the product of the facility value and the index of difficulty for each question and sum these products. The test analyzer module 101 can calculate, for each question, the difference between its value of q and the minimum value of q.
[0028] The test analyzer module 101 can determine a scoring weight for each question as follows:

scoring weight of a question (SW) = 1 + the difference for that question, i.e., SW = 1 + (q - min q)
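Paragraphs [0025] through [0028] can be illustrated together with a short Python sketch (the function name, input layout, and the reading of the scoring weight as 1 + (q - min q) are illustrative assumptions):

    # Per-question facility value p, index of difficulty q, sum of p*q
    # (used later by KR 20), and the scoring weight 1 + (q - min q).
    def item_stats(correct, incorrect):
        """correct/incorrect: per-question counts among attempted answers."""
        p = [c / (c + w) for c, w in zip(correct, incorrect)]  # facility value
        q = [1 - pi for pi in p]                               # index of difficulty
        sum_pq = sum(pi * qi for pi, qi in zip(p, q))
        q_min = min(q)
        weights = [1 + (qi - q_min) for qi in q]               # scoring weights
        return p, q, sum_pq, weights

    p, q, sum_pq, w = item_stats(correct=[80, 55, 30], incorrect=[20, 45, 70])
    print(p)  # [0.8, 0.55, 0.3]
    print(w)  # [1.0, 1.25, 1.5] -- the hardest question gets the largest weight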
[0029] The statistical analysis module 102 can calculate the following values for the test for a user: mean, median, mode, standard deviation, sample variance, total number of items, minimum score, and maximum score.
[0030] The reliability estimation module 103 can calculate Split Half Reliability (SHR) as the correlation between the score of the odd numbered items, and the score of the even numbered items. Having obtained the total score, the reliability estimation module 103 can find the standard deviation for the total score for a user. The reliability estimation module 103 can calculate the Standard Error of Measurement (SEM) for SHR as follows:
SEM for SHR = standard deviation * √(1 - SHR)
[0031] The reliability estimation module 103 calculates the SEM as a percentage as follows:
SEM (%) = (SEM * 100) / total number of questions
[0032] The reliability estimation module 103 calculates the Full Test Reliability (FTR) as follows:

FTR = rxx = (2 * SHR) / (1 + SHR)

[0033] The reliability estimation module 103 calculates the SEM for FTR as follows:

SEM for FTR = standard deviation * √(1 - FTR)
[0034] The reliability estimation module 103 calculates the SEM (%) as follows:
SEM (%) = (SEM for FTR * 100) / total number of questions
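A minimal sketch of the split-half, full-test, and SEM calculations of paragraphs [0030] through [0034] follows (it assumes Python 3.10+ for statistics.correlation; the function name and return layout are illustrative):

    import statistics

    def reliability_summary(odd_scores, even_scores, total_scores, num_questions):
        shr = statistics.correlation(odd_scores, even_scores)  # split-half reliability
        sd = statistics.stdev(total_scores)                    # SD of total scores
        ftr = 2 * shr / (1 + shr)                              # full test reliability
        sem_shr = sd * (1 - shr) ** 0.5                        # SEM for SHR
        sem_ftr = sd * (1 - ftr) ** 0.5                        # SEM for FTR
        return {'SHR': shr, 'FTR': ftr,
                'SEM%(SHR)': sem_shr * 100 / num_questions,
                'SEM%(FTR)': sem_ftr * 100 / num_questions}

    print(reliability_summary([10, 8, 12, 6, 9], [11, 7, 12, 5, 10],
                              [21, 15, 24, 11, 19], num_questions=40))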
[0035] The reliability estimation module 103 calculates the reliability estimate(s) of the test for the score of items with correct answers. If the test has fewer than 200 items, then calculate:

n1 = 200 / N
n2 = (200 / N) - 1

where N is the number of items in the test.
[0036] In an embodiment herein, the reliability estimation module 103 can calculate the reliability value using the Kuder-Richardson (lowest bound r) (KR 21) as follows:
KR 21 (r, reliability) = (N / (N - 1)) * (1 - (mean * (N - mean)) / (N * sample variance))

SEM(KR 21) = standard deviation * √(1 - KR 21)

SEM % (KR 21) = SEM(KR 21) * 100 / N
[0037] In an embodiment herein, the reliability estimation module 103 can calculate the reliability value using the Kuder-Richardson (lowest bound r) (KR 21) for 200 items as follows:
KR 21(200) (r, reliability) = n1 * KR 21 / (1 + (n2 * KR 21))

SEM(KR 21(200)) = standard deviation * √(1 - KR 21(200))

SEM % (KR 21(200)) = SEM(KR 21(200)) * 100 / N
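Under the formulas as reconstructed above, KR 21 and its extrapolation to 200 items can be sketched as follows (function names and the arithmetic grouping reflect one reading of the garbled originals):

    def kr21(mean, sample_variance, sd, N):
        r = (N / (N - 1)) * (1 - (mean * (N - mean)) / (N * sample_variance))
        sem = sd * (1 - r) ** 0.5
        return r, sem * 100 / N               # reliability, SEM as % of N items

    def spearman_brown_200(r, sd, N):
        n1, n2 = 200 / N, (200 / N) - 1
        r200 = n1 * r / (1 + n2 * r)          # reliability projected to 200 items
        return r200, sd * (1 - r200) ** 0.5 * 100 / N

    r, sem_pct = kr21(mean=24.5, sample_variance=36.0, sd=6.0, N=50)
    print(r, sem_pct)                         # about 0.67 and 6.93
    print(spearman_brown_200(r, sd=6.0, N=50))  # about 0.89 and 4.00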
[0038] The reliability estimation module 103 compares the calculated reliability to a pre-defined threshold (in this case, 0.94, corresponding to the ETS world standard). If the calculated reliability is equal to or greater than the pre-defined threshold, the test can be considered as satisfying the ETS world standard. If the calculated reliability is less than the pre-defined threshold, the test can be considered as not satisfying the ETS world standard.
[0039] In an embodiment herein, the reliability estimation module 103 can calculate the reliability value (highest bound r) using the Kuder-Richardson (KR 20) formula for the score of items with correct answers as follows:
KR 20 = (N / (N - 1)) * (1 - (sum of p * q) / (sample variance))

SEM(KR 20) = standard deviation * √(1 - KR 20)

SEM % (KR 20) = SEM(KR 20) * 100 / N
[0040] In an embodiment herein, the reliability estimation module 103 can calculate the reliability value for 200 items using the Kuder-Richardson (KR 20(200)) as follows:
KR 20(200) = n1 * KR 20 / (1 + (n2 * KR 20))

SEM(KR 20(200)) = standard deviation * √(1 - KR 20(200))

SEM % (KR 20(200)) = SEM(KR 20(200)) * 100 / N
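KR 20 follows the same pattern, reusing the sum of p * q computed with the item statistics earlier (again a sketch under the reconstructed formulas; names are illustrative):

    def kr20(sum_pq, sample_variance, sd, N):
        r = (N / (N - 1)) * (1 - sum_pq / sample_variance)  # Cronbach-alpha form
        sem = sd * (1 - r) ** 0.5
        return r, sem * 100 / N                             # reliability, SEM%

    print(kr20(sum_pq=8.4, sample_variance=36.0, sd=6.0, N=50))  # about (0.78, 5.6)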
[0041] KR 20 can be considered as the Cronbach coefficient alpha.
[0042] The item analysis module 104 writes all the responses, along with the number right scores (the counts of right answers), sorted in descending order. The item analysis module 104 obtains the count for each of the options of the multiple choice questions (for example, A, B, C, D, and not answered (X)) in both the HAG (Higher Ability Group) and the LAG (Lower Ability Group) and arranges the responses from the HAG and LAG groups in the example format depicted in FIG. 2. The HAG and LAG data can be taken from the raw data (HAG from the upper part and LAG from the bottom part).
[0043] In an embodiment herein, HAG can be considered as a top percentage level of test takers. For example, HAG can be considered as the top 27% of test takers.
[0044] In an embodiment herein, LAG can be considered as a bottom percentage level of test takers. For example, LAG can be considered as the bottom 27% of test takers.
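A minimal sketch of this split, assuming a list of (test taker, number right score) pairs and the 27% cut-off from the example above:

    # Split test takers into HAG (top 27%) and LAG (bottom 27%) by
    # number right score.
    def hag_lag(scores, fraction=0.27):
        """scores: list of (test_taker_id, number_right_score) pairs."""
        ranked = sorted(scores, key=lambda s: s[1], reverse=True)
        k = max(1, round(len(ranked) * fraction))
        return ranked[:k], ranked[-k:]  # (HAG, LAG)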
[0045] The item analysis module 104 arranges all the items with the answer key. In an example, consider that the test is for evaluating the analytical ability of the user and the items are arranged as in the example depicted in FIG. 3a. The value in columns A, B, C, and D is the count of the total number of candidates who responded with that option. The first row is for the HAG (the lined cells) and the second row is for the LAG (the shaded cells).
[0046] In another example, consider that the test is for evaluating the verbal ability of the user and the items are arranged as in the example depicted in FIG. 3b. The value in columns A, B, C, and D is the count of the total number of candidates who responded with that option. The first row is for the HAG (the lined cells) and the second row is for the LAG (the shaded cells).
[0047] The scoring engine 105 compares the number right scores with PCM scores with respect to reliability and error.
[0048] Considering the example depicted in FIG. 3a, the number right scores have a KR21 reliability of 0.701513, which is a lower bound estimate (any other estimate will be higher), and a standard error of measurement of 8.677179, which is also the highest error. Compared to this, the PCM reliability estimate is 0.910805, much improved over the number right scores reliability, with a standard error of measurement of 3.403708, a reduced and more acceptable value. When extrapolated to the ETS world standard number of items, i.e., 200, the reliability of the number right scores increases to 0.940006, just satisfying the ETS world standard, with an SEM% of 3.890196. At the same time, PCM has an improved reliability of 0.985523 and a reduced error of 1.371256 for 200 items, which also satisfies the ETS world standard. This is depicted in FIG. 4a.
[0049] Considering the example in FIG. 3b, the number right scores have a KR21 reliability of 0.565748018, which is a lower bound estimate (any other estimate will be higher), and a standard error of measurement of 9.703683215, which is also the highest error. Compared to this, the PCM reliability estimate is 0.844043851, much improved over the number right scores reliability, with a standard error of measurement of 3.667510978, a much reduced and more acceptable value. When extrapolated to the ETS world standard number of items, i.e., 200, the reliability of the number right scores increases to 0.912453365, only partially satisfying the ETS world standard, with an SEM% of 4.356978678. At the same time, PCM has an improved reliability of 0.977424838 and a reduced error of 1.395357881 for 200 items, which satisfies the ETS world standard well beyond the threshold and is very acceptable. This is depicted in FIG. 4b.
[0050] The modules as disclosed above can store the intermediate results and the final results in a suitable location such as a memory, the database 108, the Cloud, a data server, a file server, the Internet, a local server, and so on.
[0051] FIG. 5 illustrates a system 100 implementing an apparatus and methods for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test, according to embodiments as disclosed herein.
[0052] As depicted in the figure, the computing environment 502 comprises at least one processing unit 508, which comprises a control unit 504 and an Arithmetic Logic Unit (ALU) 506, as well as a memory 510, a storage unit 512, a plurality of networking devices 516, and a plurality of Input/Output (I/O) devices 514.
[0053] The processing unit 508 is responsible for processing the instructions of the embodiments as disclosed herein. The processing unit 508 receives commands from the control unit 504 in order to perform its processing, and any logical and arithmetic operations involved in the execution of the instructions are computed with the help of the ALU 506. The overall computing environment 502 can be composed of multiple homogeneous or heterogeneous cores, multiple CPUs of different kinds, special media, and other accelerators. Further, the plurality of processing units 508 may be located on a single chip or over multiple chips.
[0054] The processing unit 508 can comprise the test analyzer module 101, the statistical analysis module 102, the reliability estimation module 103, the item analysis module 104, and the scoring engine 105.
[0055] On receiving test results from at least one of the testing module 106 or the scanner module 107, or fetching them from the database 108, the test analyzer module 101 can calculate the total scores of all the items, the score of odd numbered items, and the score of even numbered items. The test analyzer module 101 can calculate the correlation between the score of odd numbered items and the score of the even numbered items. The test analyzer module 101 can calculate the total number of test takers who answered the question correctly, the total number of test takers who answered the question incorrectly, and the total number of test takers who have not attempted the question. From the above, the test analyzer module 101 finds the difference between the total number of test takers and the total number of users who have attempted the question (wherein the attempt can be a correct answer or an incorrect answer). The test analyzer module 101 can calculate the facility value (p). The test analyzer module 101 can calculate the index of difficulty (q). The test analyzer module 101 can calculate the product of the facility value and the index of difficulty and sum the values of the products. The test analyzer module 101 can calculate, for each question, the difference between its value of q and the minimum value of q. The test analyzer module 101 can determine the scoring weight of each question.
[0056] The statistical analysis module 102 can calculate the following statistical values for the test for a user: mean, median, mode, standard deviation, sample variance, total number of items, minimum score, and maximum score.
[0057] The reliability estimation module 103 can calculate the SHR as the correlation between the score of the odd numbered items and the score of the even numbered items. The reliability estimation module 103 can find the standard deviation for the total score for a user. The reliability estimation module 103 can calculate the Standard Error of Measurement (SEM) for the SHR, which can be in terms of a percentage. The reliability estimation module 103 can calculate the FTR. The reliability estimation module 103 can calculate the SEM for FTR, which can be in terms of a percentage. The reliability estimation module 103 can calculate the reliability estimate(s) of the test for the score of items with correct answers.
[0058] The item analysis module 104 can write all the responses, along with number right scores, sorted in descending order. The item analysis module 104 can obtain the count for the number of A, B, C, D, and X responses in both the HAG and the LAG and arrange the responses from the HAG and LAG groups in the example format depicted in FIG. 2. The item analysis module 104 can arrange all the items with the answer key. The scoring engine 105 can compare the number right scores with PCM scores with respect to reliability and error.
[0059] The scheme, comprising the instructions and code required for the implementation, is stored in either the memory 510 or the storage 512, or both. At the time of execution, the instructions may be fetched from the corresponding memory 510 or storage 512 and executed by the processing unit 508.
[0060] In the case of a hardware implementation, various networking devices 516 or external I/O devices 514 may be connected to the computing environment 502 to support the implementation through the networking unit and the I/O device unit.
[0061] In an embodiment, the computing environment 502 may be at least one of an electronic device, a server, a client device, and so on. The computing environment 502 may perform accelerating tasks during storage caching and tiering. The computing environment 502 may include an application management framework, which may include a plurality of processing modules and sub-modules. The processing modules may be stored in the storage unit 512 and may be responsible for executing the tasks for accelerating storage caching and tiering.
[0062] FIGs. 6a, 6b and 6c are flowcharts depicting the process of using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test.
[0063] On receiving test results from at least one of the testing module 106 or the scanner module 107, or fetching them from the database 108, the test analyzer module 101 calculates (601) the total scores of all the items, the score of odd numbered items, and the score of even numbered items. The test analyzer module 101 calculates (602) the correlation between the score of odd numbered items and the score of the even numbered items. The test analyzer module 101 calculates (603) the total number of test takers who answered the question correctly, the total number of test takers who answered the question incorrectly, and the total number of test takers who have not attempted the question. From the above, the test analyzer module 101 finds (604) the difference between the total number of test takers and the total number of users who have attempted the question (wherein the attempt can be a correct answer or an incorrect answer). The test analyzer module 101 calculates (605) the facility value (p). The test analyzer module 101 calculates (606) the index of difficulty (q). The test analyzer module 101 calculates (607) the product of the facility value and the index of difficulty and sums the values of the products. The test analyzer module 101 calculates (608), for each question, the difference between its value of q and the minimum value of q. The test analyzer module 101 determines (609) the scoring weight of each question.
[0064] The statistical analysis module 102 calculates (610) the following statistical values for the test for a user: mean, median, mode, standard deviation, sample variance, total number of items, minimum score, and maximum score.
[0065] The reliability estimation module 103 calculates (611) the SHR as the correlation between the score of the odd numbered items, and the score of the even numbered items. The reliability estimation module 103 finds (612) the standard deviation for the total score for a user. The reliability estimation module 103 calculates (613) the Standard Error of Measurement (SEM) for the SHR, which can be in terms of a percentage. The reliability estimation module 103 calculates (614) the FTR. The reliability estimation module 103 calculates (615) the SEM for FTR, which can be in terms of a percentage. The reliability estimation module 103 calculates (616) the reliability estimate(s) of the test for the score of items with correct answers.
[0066] The item analysis module 104 sorts (617) all the responses, along with number right scores, in descending order. The item analysis module 104 obtains (618) the count for the number of A, B, C, D, and X responses in both the HAG and the LAG and arranges the responses from the HAG and LAG groups in the example format depicted in FIG. 2. The item analysis module 104 arranges (619) all the items with the answer key. The scoring engine 105 compares (620) the number right scores with PCM scores with respect to reliability and error. The various actions in method 600 may be performed in the order presented, in a different order, or simultaneously. Further, in some embodiments, some actions listed in FIGs. 6a, 6b and 6c may be omitted.
[0067] Embodiments herein can result in a considerable reduction in the Standard Error of Measurement (SEM), i.e., an improvement in measurement precision.
[0068] Embodiments herein can be used to award partial credit to the choices of a multiple choice test item. Embodiments herein can allot the key option the maximum number of credits, give the next best option (indicated by the HAG choice count next below that of the key) one credit less than the maximum, and follow the same procedure for the remaining options.
[0069] The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the network elements. The elements shown in FIGs. 1 and 5 include blocks which can be at least one of a hardware device, or a combination of hardware device and software module.
[0070] The embodiment disclosed herein describes using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test. Therefore, it is understood that the scope of the protection is extended to such a program and in addition to a computer readable means having a message therein, such computer readable storage means contain program code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The method is implemented in at least one embodiment through or together with a software program written in, e.g., Very high speed integrated circuit Hardware Description Language (VHDL) or another programming language, or implemented by one or more VHDL modules or several software modules being executed on at least one hardware device. The hardware device can be any kind of portable device that can be programmed. The device may also include means which could be, e.g., hardware means such as an ASIC, or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. The method embodiments described herein could be implemented partly in hardware and partly in software. Alternatively, the invention may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[0071] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of embodiments and examples, those skilled in the art will recognize that the embodiments and examples disclosed herein can be practiced with modification within the spirit and scope of the embodiments as described herein.

Claims

STATEMENT OF CLAIMS

We claim:
1. A method (600) for scoring a test comprising at least one multiple choice question, the method comprising:
calculating (601), by a test analyzer module (101), total scores of all multiple choice questions present in the test, score of odd multiple choice questions present in the test, and score of even multiple choice questions present in the test;
calculating (602), by the test analyzer module (101), a correlation between the score of odd multiple choice questions and the score of the even multiple choice questions;
calculating (603), by the test analyzer module (101), total number of test takers who answered each question correctly, total number of test takers who answered each question incorrectly, and total number of test takers who have not attempted each question;
finding (604), by the test analyzer module (101), a difference between the total number of test takers and the total number of users who have attempted each question;
calculating (605), by the test analyzer module (101), a facility value for each question;
calculating (606), by the test analyzer module (101), an index of difficulty for each question;
calculating (607), by the test analyzer module (101), a sum of values of the product of the facility value and the index of difficulty for each question;
calculating (608), by the test analyzer module (101), a difference for each question between each value of the index of difficulty and a minimum value of the index of difficulty;
determining (609), by the test analyzer module (101), a scoring weight for each question;
calculating (611), by a reliability estimation module (103), a Split Half Reliability (SHR) as a correlation between the score of the odd multiple choice questions, and the score of the even multiple choice questions;
calculating (613), by the reliability estimation module (103), a Standard Error of Measurement (SEM) for the SHR;
calculating (614), by the reliability estimation module (103), a Full Test Reliability (FTR);
calculating (615), by the reliability estimation module (103), a SEM for the calculated FTR;
calculating (616), by the reliability estimation module (103), reliability estimate(s) of the test for the score of multiple choice questions with correct answers;
sorting (617), by an item analysis module (104), all responses to the multiple choice questions with number right scores in descending order;
obtaining (618), by the item analysis module (104), the count for number of possible answer options in the HAG (Higher Ability Group) and LAG (Lower Ability Group);
arranging (618), by the item analysis module (104), the responses from HAG and LAG; and
arranging (619), by the item analysis module (104), all the items with the answer key.
2. The method, as claimed in claim 1, wherein the facility value for each question is calculated (605), by the test analyzer module (101), as (total correct answers for a question)/(total answers for a question).
3. The method, as claimed in claim 1, wherein the index of difficulty for each question is calculated (606), by the test analyzer module (101), as (total incorrect answers for a question)/(total answers for a question).
4. The method, as claimed in claim 1, wherein the scoring weight for each question is determined (609), by the test analyzer module (101), as the sum of one and the difference for each question between each value of the index of difficulty and the minimum value of the index of difficulty.
5. The method, as claimed in claim 1, wherein the method further comprises calculating (610), by a statistical analysis module (102), a plurality of statistical values for the test for a user comprising mean, median, mode, standard deviation, sample variance, total number of items, minimum score, and maximum score.
6. The method, as claimed in claim 1, wherein the reliability estimates depend on the number of multiple choice questions in the test.
7. The method, as claimed in claim 1, wherein the method further comprises comparing (620), by a scoring engine (105), the number right scores with PCM scores with respect to reliability and error.
8. A system (100) for scoring a test comprising at least one multiple choice question, the system comprising:
a memory (510);
a storage (512); and
a processing unit (508) further comprising
a test analyzer module (101) configured for
calculating total scores of all multiple choice questions present in the test, score of odd multiple choice questions present in the test, and score of even multiple choice questions present in the test;
calculating a correlation between the score of odd multiple choice questions and the score of the even multiple choice questions;
calculating total number of test takers who answered each question correctly, total number of test takers who answered each question incorrectly, and total number of test takers who have not attempted each question;
finding a difference between the total number of test takers and the total number of users who have attempted each question;
calculating a facility value for each question;
calculating the index of difficulty for each question;
calculating a sum of values of the product of the facility value and the index of difficulty for each question;
calculating a difference for each question between each value of the index of difficulty and a minimum value of the index of difficulty;
determining a scoring weight for each question;
a reliability estimation module (103) configured for
calculating a Split Half Reliability (SHR) as a correlation between the score of the odd multiple choice questions, and the score of the even multiple choice questions;
calculating a Standard Error of Measurement (SEM) for the SHR;
calculating a Full Test Reliability (FTR);
calculating a SEM for the calculated FTR;
calculating reliability estimate(s) of the test for the score of multiple choice questions with correct answers;
an item analysis module (104) configured for
sorting all responses to the multiple choice questions with number right scores in descending order;
obtaining the count for number of possible answer options in the HAG (Higher Ability Group) and LAG (Lower Ability Group);
arranging the responses from HAG and LAG; and
arranging all the items with the answer key.
9. The system, as claimed in claim 8, wherein the test analyzer module (101) is further configured for calculating the facility value for each question as (total correct answers for a question)/(total answers for a question).
10. The system, as claimed in claim 8, wherein the test analyzer module (101) is configured for calculating the index of difficulty for each question as (total incorrect answers for a question)/(total answers for a question).
11. The system, as claimed in claim 8, wherein the test analyzer module (101) is configured for determining the scoring weight for each question as the sum of one and the difference for each question between each value of the index of difficulty and the minimum value of the index of difficulty.
12. The system, as claimed in claim 8, wherein the system further comprises a statistical analysis module (102) configured for calculating a plurality of statistical values for the test for a user comprising mean, median, mode, standard deviation, sample variance, total number of items, minimum score, and maximum score.
13. The system, as claimed in claim 8, wherein the reliability estimates depend on the number of multiple choice questions in the test.
14. The system, as claimed in claim 8, wherein the system (100) further comprises a scoring engine (105) configured for comparing the number right scores with PCM scores with respect to reliability and error.
PCT/IN2019/050688 2018-09-25 2019-09-19 Methods and systems for partial credit model (pcm) scoring in classical test theory (ctt) WO2020065663A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201841036068 2018-09-25
IN201841036068 2018-09-25

Publications (1)

Publication Number Publication Date
WO2020065663A1 (en) 2020-04-02

Family

ID=69951973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2019/050688 WO2020065663A1 (en) 2018-09-25 2019-09-19 Methods and systems for partial credit model (pcm) scoring in classical test theory (ctt)

Country Status (1)

Country Link
WO (1) WO2020065663A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050256663A1 (en) * 2002-09-25 2005-11-17 Susumu Fujimori Test system and control method thereof
US20080108037A1 (en) * 2006-10-19 2008-05-08 Darin Beamish Control of audience response systems during use with paper-based questions
US20150056597A1 (en) * 2013-08-22 2015-02-26 LoudCloud Systems Inc. System and method facilitating adaptive learning based on user behavioral profiles
US20150379454A1 (en) * 2014-06-27 2015-12-31 Pymetrics, Inc. Systems and Methods for Data-Driven Identification of Talent

Similar Documents

Publication Publication Date Title
Wachter et al. The future of meta-analysis
CN107730131B (en) Capability prediction and recommendation method and device for crowdsourced software developers
WO2021111670A1 (en) Annotation device and method
US7149468B2 (en) Methods for improving certainty of test-taker performance determinations for assessments with open-ended items
CN109492644A (en) A kind of matching and recognition method and terminal device of exercise image
Breytenbach et al. Communities in control of their own integrated technology development processes
CN108256699A (en) Graduation whereabouts Forecasting Methodology and system based on college student stereo data
CN109063116A (en) Data identification method, device, electronic equipment and computer readable storage medium
CN117151070B (en) Test paper question-setting method, device, equipment and computer readable storage medium
CN113887930A (en) Question-answering robot health degree evaluation method, device, equipment and storage medium
WO2021151305A1 (en) Sample analysis method, apparatus, electronic device, and medium based on missing data
CN112288337A (en) Behavior recommendation method, behavior recommendation device, behavior recommendation equipment and behavior recommendation medium
US11238410B1 (en) Methods and systems for merging outputs of candidate and job-matching artificial intelligence engines executing machine learning-based models
CN113516417A (en) Service evaluation method and device based on intelligent modeling, electronic equipment and medium
CN114862140A (en) Behavior analysis-based potential evaluation method, device, equipment and storage medium
CN113627160A (en) Text error correction method and device, electronic equipment and storage medium
CN112052310A (en) Information acquisition method, device, equipment and storage medium based on big data
CN112948705A (en) Intelligent matching method, device and medium based on policy big data
WO2020065663A1 (en) Methods and systems for partial credit model (pcm) scoring in classical test theory (ctt)
CN107845047B (en) Dynamic scoring system, method and computer readable storage medium
David et al. New Frontiers: The Origins and Content of New Work, 1940–2018
CN110032714A (en) A kind of corpus labeling feedback method and device
KR20200025282A (en) Method and system for providing online reading study
CN111652767B (en) User portrait construction method and device, computer equipment and storage medium
Shaw et al. Success in the US: Are Cambridge International Assessments Good Preparation for University Study?.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19867589

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19867589

Country of ref document: EP

Kind code of ref document: A1
