WO2020065663A1 - Methods and systems for Partial Credit Model (PCM) scoring in Classical Test Theory (CTT)

Info

Publication number: WO2020065663A1
Authority: WO (WIPO PCT)
Application number: PCT/IN2019/050688
Other languages: French (fr)
Inventor: Natarajan VENKATESA
Original Assignee: Merittrac Services Pvt. Ltd
Application filed by Merittrac Services Pvt. Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations
    • G06F 17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 7/00: Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B 7/06: Electrically-operated teaching apparatus or devices working with questions and answers of the multiple-choice answer-type, i.e. where a given question is provided with a series of answers and a choice has to be made from the answers

Abstract

Embodiments disclosed herein relate to methods and systems for managing test scoring, and more particularly to methods and systems for using the Partial Credit Model (PCM) for scoring multiple choice tests. Embodiments herein disclose methods (600) and systems (100) for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test.

Description

METHODS AND SYSTEMS FOR PARTIAL CREDIT MODEL (PCM) SCORING IN
CLASSICAL TEST THEORY (CTT)
CROSS REFERENCE TO RELATED APPLICATION
This application is based on and derives the benefit of Indian Provisional Application
201841036068, filed on 25th September, 2018, the contents of which are incorporated herein by reference.
TECHNICAL FIELD
[001] Embodiments disclosed herein relate to methods and systems for managing test scoring and more particularly to methods and systems for using Partial Credit Model (PCM) for scoring multiple choice tests.
BACKGROUND
[002] Classical Test Theory (CTT) was developed for scoring multiple choice tests. CTT comprises a plurality of component theories, such as the Theory of Validity, the Theory of Reliability, the Theory of Objectivity, the Theory of Test and Item Analysis, and so on. In the classical test theory process, only guessing was considered, and the emphasis was on penalizing the wrong option on the assumption that it was a result of guessing.
[003] CTT recognizes the correct response to an item by allotting it 1 mark, and allocates 0 or negative marks to incorrect or unanswered items. Several statistical quantities are inferred from this, such as the Total Number Right Score and the Score of Odd Numbered Items and Score of Even Numbered Items, the latter two being correlated to give the Split Half Reliability, or internal consistency. Reliabilities are estimated by different formulae contributed by different individuals and agencies over a period of time, such as Kuder Richardson 20 (KR20), Kuder Richardson 21 (KR21), Cronbach's Coefficient Alpha, Analysis of Variance (ANOVA), and several others, including Rulon's Formula, which gives a higher bound estimate compared to the lower bound estimate of KR21. In the classical test theory process, earlier only guessing was considered, and the emphasis was on penalizing the wrong option on the assumption that it was a result of guessing. However, there is a considerable loss of measurement precision, as indicated by the Standard Error of Measurement (SEM), when using these approaches.
OBJECTS
[004] The principal object of embodiments herein is to disclose methods and systems for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test.
[005] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating at least one embodiment and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
BRIEF DESCRIPTION OF FIGURES
[006] Embodiments herein are illustrated in the accompanying drawings, throughout which like reference letters indicate corresponding parts in the various figures. The embodiments herein will be better understood from the following description with reference to the drawings, in which:
[007] FIGs. 1a, 1b, and 1c depict a system for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test, according to embodiments as disclosed herein;
[008] FIGs. 2, 3a, 3b, 4a, and 4b depict example formats for representing the scores, according to embodiments as disclosed herein;
[009] FIG. 5 illustrates a device implementing an apparatus and methods for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test, according to embodiments as disclosed herein; and
[0010] FIGs. 6a, 6b and 6c are flowcharts depicting the process of using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test, according to embodiments as disclosed herein.
DETAILED DESCRIPTION
[0011] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0012] Embodiments herein may be described and illustrated in terms of blocks which carry out a described function or functions. These blocks, which may be referred to herein as units or modules or the like, are physically implemented by analog and/or digital circuits such as logic gates, integrated circuits, microprocessors, microcontrollers, memory circuits, passive electronic components, active electronic components, optical components, hardwired circuits and the like, and may optionally be driven by firmware and/or software. The circuits may, for example, be embodied in one or more semiconductor chips, or on substrate supports such as printed circuit boards and the like. The circuits constituting a block may be implemented by dedicated hardware, or by a processor (e.g., one or more programmed microprocessors and associated circuitry), or by a combination of dedicated hardware to perform some functions of the block and a processor to perform other functions of the block. Each block of the embodiments may be physically separated into two or more interacting and discrete blocks without departing from the scope of the disclosure. Likewise, the blocks of the embodiments may be physically combined into more complex blocks without departing from the scope of the disclosure.
[0013] The embodiments herein achieve methods and systems for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test. Referring now to the drawings, and more particularly to FIGs. 1a through 6c, where similar reference characters denote corresponding features consistently throughout the figures, there are shown embodiments.
[0014] Item Response Theory (IRT) is a theory of testing. IRT can assign partial credit and undertake calculations using an iterative procedure, assuming an initial value and refining it through successive approximation using a Maximum Likelihood Function. The item parameters, i.e., single parameter (item difficulty (b)), two parameter (item difficulty (b), item discrimination (a)), and three parameter (item difficulty (b), item discrimination (a), and item guessing (c)), can be formulated.
[0015] Embodiments herein have extended this concept of assigning partial credit from IRT to CTT, to examine the effect of awarding partial credit to the options of a Multiple Choice Test item (a key and a plurality of distractors, which normally represent the assumed mistakes, misconceptions and misunderstandings of the test taker).
[0016] In an example embodiment disclosed herein, the key option is credited with 4 credits; the next best option, revealed by the next highest number of choices by Higher Ability Group (HAG) test takers taking the same test, is awarded a credit of 3; the option with the next lower HAG count is credited with 2; and the last remaining option is credited with 1.
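By way of illustration, the following minimal Python sketch applies this crediting scheme to a four-option item (the function name pcm_credits and the dict-based layout are illustrative assumptions, not part of the disclosed system):

    # Illustrative sketch: assign PCM credits 4/3/2/1 to the options of a
    # four-option item, based on Higher Ability Group (HAG) option counts.
    def pcm_credits(key, hag_counts):
        """key: the correct option, e.g. 'B'; hag_counts: dict mapping each
        option to the number of HAG test takers who chose it."""
        credits = {key: 4}  # the key always receives the maximum credit
        # Rank the distractors by descending HAG count.
        distractors = sorted((o for o in hag_counts if o != key),
                             key=lambda o: hag_counts[o], reverse=True)
        for credit, option in zip((3, 2, 1), distractors):
            credits[option] = credit
        return credits

    # Example: HAG counts for an item whose key is 'B'.
    print(pcm_credits('B', {'A': 12, 'B': 80, 'C': 25, 'D': 3}))
    # {'B': 4, 'C': 3, 'A': 2, 'D': 1}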
[0017] In an example, tests of abilities such as Analytical Ability (AA), Verbal Ability (VA), Numerical Ability (NA), Quantitative Ability (QA), and Attention to Details (ATD) are administered to a group of test takers, and the results are analyzed using Classical Item Analysis (as disclosed herein). The starting point is the (A, B, C, D, X) format, denoting the choice of A, B, C, or D, or the omission of an answer, denoted by X. The (A, B, C, D, X) format is used to generate the (1, 0, X) format, where 1 denotes a correct answer (the key), 0 denotes an incorrect answer (a chosen distractor), and X denotes an omitted answer. For all CTT test and item analysis, the (1, 0, X) format is the basis for performing additional analysis. Embodiments herein use scores (such as 1, 0) merely as an example; however, it may be obvious to a person of ordinary skill in the art that any other scoring method/pattern can be used.
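The recoding from (A, B, C, D, X) to (1, 0, X) can be sketched as follows (an illustrative sketch assuming X marks an omission; the function name and list layout are assumptions):

    # Recode one candidate's (A, B, C, D, X) responses into (1, 0, X) form.
    def to_10x(responses, keys):
        """responses: e.g. ['A', 'C', 'X', 'B']; keys: the answer key per item."""
        out = []
        for resp, key in zip(responses, keys):
            if resp == 'X':
                out.append('X')      # omitted answer stays X
            elif resp == key:
                out.append(1)        # correct answer (the key)
            else:
                out.append(0)        # any distractor
        return out

    print(to_10x(['A', 'C', 'X', 'B'], ['A', 'B', 'D', 'B']))  # [1, 0, 'X', 1]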
[0018] FIGs. 1a, 1b, and 1c depict a system for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test. The system 100, as depicted, comprises a test analyzer module 101, a statistical analysis module 102, a reliability estimation module 103, an item analysis module 104, and a scoring engine 105.
[0019] The system 100 may be connected to at least one external module, such as at least one testing module 106 (as depicted in FIG. 1a), a scanner module 107 (as depicted in FIG. 1b), and a database 108 (as depicted in FIG. 1c). The testing module 106 may enable at least one user to take a test, wherein the test comprises at least one multiple choice question. The system 100 can receive data from the testing module 106 in real time, at pre-defined intervals of time (say, every 30 minutes until the end of the time assigned for the test), on pre-defined events occurring (completing a section of the test, receiving a user input), and so on.
[0020] The scanner module 107 can comprise a means to scan one or more tests, and the scanned results can be provided to the testing module 106.
[0021] The database 108 can be a location for storing data, such as a database, a file server, a data server, the Cloud, the Internet, a local server, and so on. The database 108 can comprise information related to tests and the results of tests taken by one or more users. In an embodiment herein, the database 108 can receive information from the testing module 106 and/or the scanner module 107.
[0022] Embodiments herein use the terms 'item' and 'question' interchangeably, wherein both terms refer to a multiple choice question in the test.
[0023] On receiving test results from at least one of the testing module 106 or the scanner module 107, or fetching them from the database 108, the test analyzer module 101 can calculate a total score by adding up all of a candidate's responses in the (1, 0, X) format along the horizontal (i.e., across all items for that candidate). The test analyzer module 101 further calculates the score of the odd numbered items and the score of the even numbered items.
[0024] If negative marking is applied, then the test analyzer module 101 calculates the total score as follows:

total score = score of right answers - (score for wrong answers / (n - 1))

where n = number of options available for a multiple choice question.
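Since the bracketing in the published formula is garbled, the sketch below assumes the standard negative-marking grouping, where each wrong answer costs 1/(n - 1) of a mark (function name and the four-option default are illustrative):

    # Negative-marking total score: each wrong answer costs 1/(n - 1) of a mark.
    def total_score(score_right, score_wrong, n_options=4):
        return score_right - score_wrong / (n_options - 1)

    print(total_score(30, 8))  # 30 - 8/3 = 27.33... for four-option items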
[0025] The test analyzer module 101 calculates the correlation between the score of odd numbered items and the score of the even numbered items. The test analyzer module 101 can calculate the total number of test takers who answered the question correctly, the total number of test takers who answered the question incorrectly, and the total number of test takers who have not attempted the question. From the above, the test analyzer module 101 can find the difference between the total number of test takers and the total number of users who have attempted the question (wherein the attempt can be a correct answer or an incorrect answer). The test analyzer module 101 can calculate a facility value as follows:

facility value (p) = total correct answers for a question / total answers for a question
[0026] The test analyzer module 101 can calculate an index of difficulty as follows:

index of difficulty (q) = total incorrect answers for a question / total answers for a question

or, equivalently, q = 1 - p.
[0027] The test analyzer module 101 can calculate the product of the facility value and the index of difficulty for each question and sum these products. The test analyzer module 101 can calculate, for each question, the difference between its value of q and the minimum value of q.
[0028] The test analyzer module 101 can determine a scoring weight for each question as follows:

scoring weight of a question (SW) = 1 + the difference for that question, i.e., SW = 1 + (q - min q)
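Paragraphs [0025] through [0028] can be illustrated together with a short Python sketch (the function name, input layout, and the reading of the scoring weight as 1 + (q - min q) are illustrative assumptions):

    # Per-question facility value p, index of difficulty q, sum of p*q
    # (used later by KR 20), and the scoring weight 1 + (q - min q).
    def item_stats(correct, incorrect):
        """correct/incorrect: per-question counts among attempted answers."""
        p = [c / (c + w) for c, w in zip(correct, incorrect)]  # facility value
        q = [1 - pi for pi in p]                               # index of difficulty
        sum_pq = sum(pi * qi for pi, qi in zip(p, q))
        q_min = min(q)
        weights = [1 + (qi - q_min) for qi in q]               # scoring weights
        return p, q, sum_pq, weights

    p, q, sum_pq, w = item_stats(correct=[80, 55, 30], incorrect=[20, 45, 70])
    print(p)  # [0.8, 0.55, 0.3]
    print(w)  # [1.0, 1.25, 1.5] -- the hardest question gets the largest weight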
[0029] The statistical analysis module 102 can calculate the following values for the test for a user: mean, median, mode, standard deviation, sample variance, total number of items, minimum score, and maximum score.
[0030] The reliability estimation module 103 can calculate Split Half Reliability (SHR) as the correlation between the score of the odd numbered items, and the score of the even numbered items. Having obtained the total score, the reliability estimation module 103 can find the standard deviation for the total score for a user. The reliability estimation module 103 can calculate the Standard Error of Measurement (SEM) for SHR as follows:
SEM for SHR = standard deviation * √(1 - SHR)
[0031] The reliability estimation module 103 calculates the SEM as a percentage as follows:
SEM (%) = (SEM * 100) / total number of questions
[0032] The reliability estimation module 103 calculates the Full Test Reliability (FTR) as follows:

FTR = rxx = (2 * SHR) / (1 + SHR)

[0033] The reliability estimation module 103 calculates the SEM for FTR as follows:

SEM for FTR = standard deviation * √(1 - FTR)
[0034] The reliability estimation module 103 calculates the SEM (%) as follows:
SEM (%) = (SEM for FTR * 100) / total number of questions
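A minimal sketch of the split-half, full-test, and SEM calculations of paragraphs [0030] through [0034] follows (it assumes Python 3.10+ for statistics.correlation; the function name and return layout are illustrative):

    import statistics

    def reliability_summary(odd_scores, even_scores, total_scores, num_questions):
        shr = statistics.correlation(odd_scores, even_scores)  # split-half reliability
        sd = statistics.stdev(total_scores)                    # SD of total scores
        ftr = 2 * shr / (1 + shr)                              # full test reliability
        sem_shr = sd * (1 - shr) ** 0.5                        # SEM for SHR
        sem_ftr = sd * (1 - ftr) ** 0.5                        # SEM for FTR
        return {'SHR': shr, 'FTR': ftr,
                'SEM%(SHR)': sem_shr * 100 / num_questions,
                'SEM%(FTR)': sem_ftr * 100 / num_questions}

    print(reliability_summary([10, 8, 12, 6, 9], [11, 7, 12, 5, 10],
                              [21, 15, 24, 11, 19], num_questions=40))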
[0035] The reliability estimation module 103 calculates the reliability estimate(s) of the test for the score of items with correct answers. If the test has fewer than 200 items, then calculate:

n1 = 200 / N
n2 = (200 / N) - 1

where N is the number of items in the test.
[0036] In an embodiment herein, the reliability estimation module 103 can calculate the reliability value using the Kuder-Richardson (lowest bound r) (KR 21) as follows:
KR 21 (r, reliability) = (N / (N - 1)) * (1 - (mean * (N - mean)) / (N * sample variance))

SEM(KR 21) = standard deviation * √(1 - KR 21)

SEM % (KR 21) = SEM(KR 21) * 100 / N
[0037] In an embodiment herein, the reliability estimation module 103 can calculate the reliability value using the Kuder-Richardson (lowest bound r) (KR 21) for 200 items as follows:
KR 21(200) (r, reliability) = n1 * KR 21 / (1 + (n2 * KR 21))

SEM(KR 21(200)) = standard deviation * √(1 - KR 21(200))

SEM % (KR 21(200)) = SEM(KR 21(200)) * 100 / N
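Under the formulas as reconstructed above, KR 21 and its extrapolation to 200 items can be sketched as follows (function names and the arithmetic grouping reflect one reading of the garbled originals):

    def kr21(mean, sample_variance, sd, N):
        r = (N / (N - 1)) * (1 - (mean * (N - mean)) / (N * sample_variance))
        sem = sd * (1 - r) ** 0.5
        return r, sem * 100 / N               # reliability, SEM as % of N items

    def spearman_brown_200(r, sd, N):
        n1, n2 = 200 / N, (200 / N) - 1
        r200 = n1 * r / (1 + n2 * r)          # reliability projected to 200 items
        return r200, sd * (1 - r200) ** 0.5 * 100 / N

    r, sem_pct = kr21(mean=24.5, sample_variance=36.0, sd=6.0, N=50)
    print(r, sem_pct)                         # about 0.67 and 6.93
    print(spearman_brown_200(r, sd=6.0, N=50))  # about 0.89 and 4.00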
[0038] The reliability estimation module 103 compares the calculated reliability to a pre-defined threshold (in this case, 0.94, corresponding to the ETS world standard). If the calculated reliability is equal to or greater than the pre-defined threshold, the test can be considered as satisfying the ETS world standard. If the calculated reliability is less than the pre-defined threshold, the test can be considered as not satisfying the ETS world standard.
[0039] In an embodiment herein, the reliability estimation module 103 can calculate the reliability value (highest bound r) using the Kuder-Richardson (KR 20) formula for the score of items with correct answers as follows:
KR 20 = (N / (N - 1)) * (1 - (sum of p * q) / (sample variance))

SEM(KR 20) = standard deviation * √(1 - KR 20)

SEM % (KR 20) = SEM(KR 20) * 100 / N
[0040] In an embodiment herein, the reliability estimation module 103 can calculate the reliability value for 200 items using the Kuder-Richardson (KR 20(200)) as follows:
KR 20(200) = n1 * KR 20 / (1 + (n2 * KR 20))

SEM(KR 20(200)) = standard deviation * √(1 - KR 20(200))

SEM % (KR 20(200)) = SEM(KR 20(200)) * 100 / N
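KR 20 follows the same pattern, reusing the sum of p * q computed with the item statistics earlier (again a sketch under the reconstructed formulas; names are illustrative):

    def kr20(sum_pq, sample_variance, sd, N):
        r = (N / (N - 1)) * (1 - sum_pq / sample_variance)  # Cronbach-alpha form
        sem = sd * (1 - r) ** 0.5
        return r, sem * 100 / N                             # reliability, SEM%

    print(kr20(sum_pq=8.4, sample_variance=36.0, sd=6.0, N=50))  # about (0.78, 5.6)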
[0041] KR 20 can be considered as the Cronbach coefficient alpha.
[0042] The item analysis module 104 writes all the responses, along with the number right scores (the counts of right answers), sorted in descending order. The item analysis module 104 obtains the count for each of the options of the multiple choice questions (for example, A, B, C, D, and not answered (X)) in both the HAG (Higher Ability Group) and the LAG (Lower Ability Group) and arranges the responses from the HAG and LAG groups in the example format depicted in FIG. 2. The HAG and LAG data can be taken from the raw data (HAG from the upper part and LAG from the bottom part).
[0043] In an embodiment herein, HAG can be considered as a top percentage level of test takers. For example, HAG can be considered as the top 27% of test takers.
[0044] In an embodiment herein, LAG can be considered as a bottom percentage level of test takers. For example, LAG can be considered as the bottom 27% of test takers.
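A minimal sketch of this split, assuming a list of (test taker, number right score) pairs and the 27% cut-off from the example above:

    # Split test takers into HAG (top 27%) and LAG (bottom 27%) by
    # number right score.
    def hag_lag(scores, fraction=0.27):
        """scores: list of (test_taker_id, number_right_score) pairs."""
        ranked = sorted(scores, key=lambda s: s[1], reverse=True)
        k = max(1, round(len(ranked) * fraction))
        return ranked[:k], ranked[-k:]  # (HAG, LAG)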
[0045] The item analysis module 104 arranges all the items with the answer key. In an example, consider that the test is for evaluating the analytical ability of the user and the items are arranged as in the example depicted in FIG. 3a. The value in columns A, B, C, and D is the count of the total number of candidates who responded with that option. The first row is for the HAG (the lined cells) and the second row is for the LAG (the shaded cells).
[0046] In another example, consider that the test is for evaluating the verbal ability of the user and the items are arranged as in the example depicted in FIG. 3b. The value in columns A, B, C, and D is the count of the total number of candidates who responded with that option. The first row is for the HAG (the lined cells) and the second row is for the LAG (the shaded cells).
[0047] The scoring engine 105 compares the number right scores with PCM scores with respect to reliability and error.
[0048] Considering the example depicted in FIG. 3a, the number right scores have a KR21 reliability of 0.701513, which is a lower bound estimate (any other estimate will be higher), and a standard error of measurement of 8.677179, which is also the highest error. Compared to this, the PCM reliability estimate is 0.910805, much improved over the number right scores reliability, with a standard error of measurement of 3.403708, a reduced and more acceptable value. When extrapolated to the ETS world standard number of items, i.e., 200, the reliability of the number right scores increases to 0.940006, just satisfying the ETS world standard, with an SEM% of 3.890196. At the same time, PCM has an improved reliability of 0.985523 and a reduced error of 1.371256 for 200 items, which also satisfies the ETS world standard. This is depicted in FIG. 4a.
[0049] Considering the example in FIG. 3b, the number right scores have a KR21 reliability of 0.565748018, which is a lower bound estimate (any other estimate will be higher), and a standard error of measurement of 9.703683215, which is also the highest error. Compared to this, the PCM reliability estimate is 0.844043851, much improved over the number right scores reliability, with a standard error of measurement of 3.667510978, a much reduced and more acceptable value. When extrapolated to the ETS world standard number of items, i.e., 200, the reliability of the number right scores increases to 0.912453365, only partially satisfying the ETS world standard, with an SEM% of 4.356978678. At the same time, PCM has an improved reliability of 0.977424838 and a reduced error of 1.395357881 for 200 items, which satisfies the ETS world standard well beyond the threshold and is very acceptable. This is depicted in FIG. 4b.
[0050] The modules as disclosed above can store the intermediate results and the final results in a suitable location such as a memory, the database 108, the Cloud, a data server, a file server, the Internet, a local server, and so on.
[0051] FIG. 5 illustrates a system 100 implementing an apparatus and methods for using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test, according to embodiments as disclosed herein.
[0052] As depicted in the figure, the computing environment 502 comprises at least one processing unit 508, which comprises a control unit 504 and an Arithmetic Logic Unit (ALU) 506, as well as a memory 510, a storage unit 512, a plurality of networking devices 516, and a plurality of Input/Output (I/O) devices 514.
[0053] The processing unit 508 is responsible for processing the instructions of the embodiments as disclosed herein. The processing unit 508 receives commands from the control unit 504 in order to perform its processing, and any logical and arithmetic operations involved in the execution of the instructions are computed with the help of the ALU 506. The overall computing environment 502 can be composed of multiple homogeneous or heterogeneous cores, multiple CPUs of different kinds, special media, and other accelerators. Further, the plurality of processing units 508 may be located on a single chip or over multiple chips.
[0054] The processing unit 508 can comprise the test analyzer module 101, the statistical analysis module 102, the reliability estimation module 103, the item analysis module 104, and the scoring engine 105.
[0055] On receiving test results from at least one of the testing module 106 or the scanner module 107, or fetching them from the database 108, the test analyzer module 101 can calculate the total scores of all the items, the score of odd numbered items, and the score of even numbered items. The test analyzer module 101 can calculate the correlation between the score of odd numbered items and the score of the even numbered items. The test analyzer module 101 can calculate the total number of test takers who answered the question correctly, the total number of test takers who answered the question incorrectly, and the total number of test takers who have not attempted the question. From the above, the test analyzer module 101 finds the difference between the total number of test takers and the total number of users who have attempted the question (wherein the attempt can be a correct answer or an incorrect answer). The test analyzer module 101 can calculate the facility value (p). The test analyzer module 101 can calculate the index of difficulty (q). The test analyzer module 101 can calculate the product of the facility value and the index of difficulty and sum the values of the products. The test analyzer module 101 can calculate, for each question, the difference between its value of q and the minimum value of q. The test analyzer module 101 can determine the scoring weight of each question.
[0056] The statistical analysis module 102 can calculate the following statistical values for the test for a user: mean, median, mode, standard deviation, sample variance, total number of items, minimum score, and maximum score.
[0057] The reliability estimation module 103 can calculate the SHR as the correlation between the score of the odd numbered items and the score of the even numbered items. The reliability estimation module 103 can find the standard deviation for the total score for a user. The reliability estimation module 103 can calculate the Standard Error of Measurement (SEM) for the SHR, which can be in terms of a percentage. The reliability estimation module 103 can calculate the FTR. The reliability estimation module 103 can calculate the SEM for FTR, which can be in terms of a percentage. The reliability estimation module 103 can calculate the reliability estimate(s) of the test for the score of items with correct answers.
[0058] The item analysis module 104 can write all the responses, along with number right scores, sorted in descending order. The item analysis module 104 can obtain the count for the number of A, B, C, D, and X responses in both the HAG and the LAG and arrange the responses from the HAG and LAG groups in the example format depicted in FIG. 2. The item analysis module 104 can arrange all the items with the answer key. The scoring engine 105 can compare the number right scores with PCM scores with respect to reliability and error.
[0059] The scheme, comprising the instructions and code required for the implementation, is stored in either the memory 510 or the storage 512, or both. At the time of execution, the instructions may be fetched from the corresponding memory 510 or storage 512 and executed by the processing unit 508.
[0060] In the case of a hardware implementation, various networking devices 516 or external I/O devices 514 may be connected to the computing environment 502 to support the implementation through the networking unit and the I/O device unit.
[0061] In an embodiment, the computing environment 502 may be at least one of an electronic device, a server, a client device, and so on. The computing environment 502 may perform accelerating tasks during storage caching and tiering. The computing environment 502 may include an application management framework, which may include a plurality of processing modules and sub-modules. The processing modules may be stored in the storage unit 512 and may be responsible for executing the tasks for accelerating storage caching and tiering.
[0062] FIGs. 6a, 6b and 6c are flowcharts depicting the process of using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test.
[0063] On receiving test results from at least one of the testing module 106 or the scanner module 107, or fetching them from the database 108, the test analyzer module 101 calculates (601) the total scores of all the items, the score of odd numbered items, and the score of even numbered items. The test analyzer module 101 calculates (602) the correlation between the score of odd numbered items and the score of the even numbered items. The test analyzer module 101 calculates (603) the total number of test takers who answered the question correctly, the total number of test takers who answered the question incorrectly, and the total number of test takers who have not attempted the question. From the above, the test analyzer module 101 finds (604) the difference between the total number of test takers and the total number of users who have attempted the question (wherein the attempt can be a correct answer or an incorrect answer). The test analyzer module 101 calculates (605) the facility value (p). The test analyzer module 101 calculates (606) the index of difficulty (q). The test analyzer module 101 calculates (607) the product of the facility value and the index of difficulty and sums the values of the products. The test analyzer module 101 calculates (608), for each question, the difference between its value of q and the minimum value of q. The test analyzer module 101 determines (609) the scoring weight of each question.
[0064] The statistical analysis module 102 calculates (610) the following statistical values for the test for a user: mean, median, mode, standard deviation, sample variance, total number of items, minimum score, and maximum score.
[0065] The reliability estimation module 103 calculates (611) the SHR as the correlation between the score of the odd numbered items, and the score of the even numbered items. The reliability estimation module 103 finds (612) the standard deviation for the total score for a user. The reliability estimation module 103 calculates (613) the Standard Error of Measurement (SEM) for the SHR, which can be in terms of a percentage. The reliability estimation module 103 calculates (614) the FTR. The reliability estimation module 103 calculates (615) the SEM for FTR, which can be in terms of a percentage. The reliability estimation module 103 calculates (616) the reliability estimate(s) of the test for the score of items with correct answers.
[0066] The item analysis module 104 sorts (617) all the responses, along with number right scores, in descending order. The item analysis module 104 obtains (618) the count for the number of A, B, C, D, and X responses in both the HAG and the LAG and arranges the responses from the HAG and LAG groups in the example format depicted in FIG. 2. The item analysis module 104 arranges (619) all the items with the answer key. The scoring engine 105 compares (620) the number right scores with PCM scores with respect to reliability and error. The various actions in method 600 may be performed in the order presented, in a different order, or simultaneously. Further, in some embodiments, some actions listed in FIGs. 6a, 6b and 6c may be omitted.
[0067] Embodiments herein can result in a considerable reduction in the Standard Error of Measurement (SEM), i.e., an improvement in measurement precision.
[0068] Embodiments herein can be used to award partial credit to the choices of a multiple choice test item. Embodiments herein can allot the key option the maximum number of credits, give the next best option (indicated by the HAG choice count next below that of the key) one credit less than the maximum, and follow the same procedure for the remaining options.
[0069] The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the network elements. The elements shown in FIGs. 1 and 5 include blocks which can be at least one of a hardware device, or a combination of hardware device and software module.
[0070] The embodiment disclosed herein describes using the Partial Credit Model (PCM) in Classical Test Theory (CTT) to improve the reliability estimate for the interpretation of scores in a multiple choice test. Therefore, it is understood that the scope of the protection is extended to such a program and in addition to a computer readable means having a message therein, such computer readable storage means contain program code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The method is implemented in at least one embodiment through or together with a software program written in, e.g., Very high speed integrated circuit Hardware Description Language (VHDL) or another programming language, or implemented by one or more VHDL modules or several software modules being executed on at least one hardware device. The hardware device can be any kind of portable device that can be programmed. The device may also include means which could be, e.g., hardware means such as an ASIC, or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. The method embodiments described herein could be implemented partly in hardware and partly in software. Alternatively, the invention may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[0071] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of embodiments and examples, those skilled in the art will recognize that the embodiments and examples disclosed herein can be practiced with modification within the spirit and scope of the embodiments as described herein.

Claims

STATEMENT OF CLAIMS

We claim:
1. A method (600) for scoring a test comprising at least one multiple choice question, the method comprising:
calculating (601), by a test analyzer module (101), total scores of all multiple choice questions present in the test, score of odd multiple choice questions present in the test, and score of even multiple choice questions present in the test;
calculating (602), by the test analyzer module (101), a correlation between the score of odd multiple choice questions and the score of the even multiple choice questions;
calculating (603), by the test analyzer module (101), total number of test takers who answered each question correctly, total number of test takers who answered each question incorrectly, and total number of test takers who have not attempted each question;
finding (604), by the test analyzer module (101), a difference between the total number of test takers and the total number of users who have attempted each question;
calculating (605), by the test analyzer module (101), a facility value for each question;
calculating (606), by the test analyzer module (101), an index of difficulty for each question;
calculating (607), by the test analyzer module (101), a sum of values of the product of the facility value and the index of difficulty for each question;
calculating (608), by the test analyzer module (101), a difference for each question between each value of the index of difficulty and a minimum value of the index of difficulty;
determining (609), by the test analyzer module (101), a scoring weight for each question;
calculating (611), by a reliability estimation module (103), a Split Half Reliability (SHR) as a correlation between the score of the odd multiple choice questions, and the score of the even multiple choice questions;
calculating (613), by the reliability estimation module (103), a Standard Error of Measurement (SEM) for the SHR;
calculating (614), by the reliability estimation module (103), a Full Test Reliability (FTR);
calculating (615), by the reliability estimation module (103), a SEM for the calculated FTR;
calculating (616), by the reliability estimation module (103), reliability estimate(s) of the test for the score of multiple choice questions with correct answers;
sorting (617), by an item analysis module (104), all responses to the multiple choice questions with number right scores in descending order;
obtaining (618), by the item analysis module (104), the count for number of possible answer options in the HAG (Higher Ability Group) and LAG (Lower Ability Group);
arranging (618), by the item analysis module (104), the responses from HAG and LAG; and
arranging (619), by the item analysis module (104), all the items with the answer key.
2. The method, as claimed in claim 1, wherein the facility value for each question is calculated (605), by the test analyzer module (101), as (total correct answers for a question)/(total answers for a question).
3. The method, as claimed in claim 1, wherein the index of difficulty for each question is calculated (606), by the test analyzer module (101), as (total incorrect answers for a question)/(total answers for a question).
4. The method, as claimed in claim 1, wherein the scoring weight for each question is determined (609), by the test analyzer module (101), as the sum of one and the difference for each question between each value of the index of difficulty and the minimum value of the index of difficulty.
5. The method, as claimed in claim 1, wherein the method further comprises calculating (610), by a statistical analysis module (102), a plurality of statistical values for the test for a user comprising mean, median, mode, standard deviation, sample variance, total number of items, minimum score, and maximum score.
6. The method, as claimed in claim 1, wherein the reliability estimates depend on the number of multiple choice questions in the test.
7. The method, as claimed in claim 1, wherein the method further comprises comparing (620), by a scoring engine (105), the number right scores with PCM scores with respect to reliability and error.
8. A system (100) for scoring a test comprising at least one multiple choice question, the system comprising:
a memory (510);
a storage (512); and
a processing unit (508) further comprising
a test analyzer module (101) configured for
calculating total scores of all multiple choice questions present in the test, score of odd multiple choice questions present in the test, and score of even multiple choice questions present in the test;
calculating a correlation between the score of odd multiple choice questions and the score of the even multiple choice questions;
calculating total number of test takers who answered each question correctly, total number of test takers who answered each question incorrectly, and total number of test takers who have not attempted each question;
finding a difference between the total number of test takers and the total number of users who have attempted each question;
calculating a facility value for each question;
calculating the index of difficulty for each question;
calculating a sum of values of the product of the facility value and the index of difficulty for each question;
calculating a difference for each question between each value of the index of difficulty and a minimum value of the index of difficulty;
determining a scoring weight for each question;
a reliability estimation module (103) configured for
calculating a Split Half Reliability (SHR) as a correlation between the score of the odd multiple choice questions, and the score of the even multiple choice questions;
calculating a Standard Error of Measurement (SEM) for the SHR;
calculating a Full Test Reliability (FTR);
calculating a SEM for the calculated FTR;
calculating reliability estimate(s) of the test for the score of multiple choice questions with correct answers;
an item analysis module (104) configured for
sorting all responses to the multiple choice questions with number right scores in descending order;
obtaining the count for number of possible answer options in the HAG (Higher Ability Group) and LAG (Lower Ability Group);
arranging the responses from HAG and LAG; and
arranging all the items with the answer key.
9. The system, as claimed in claim 8, wherein the test analyzer module (101) is further configured for calculating the facility value for each question as (total correct answers for a question)/(total answers for a question).
10. The system, as claimed in claim 8, wherein the test analyzer module (101) is configured for calculating the index of difficulty for each question as (total incorrect answers for a question)/(total answers for a question).
11. The system, as claimed in claim 8, wherein the test analyzer module (101) is configured for determining the scoring weight for each question as the sum of one and the difference for each question between each value of the index of difficulty and the minimum value of the index of difficulty.
12. The system, as claimed in claim 8, wherein the system further comprises a statistical analysis module (102) configured for calculating a plurality of statistical values for the test for a user comprising mean, median, mode, standard deviation, sample variance, total number of items, minimum score, and maximum score.
13. The system, as claimed in claim 8, wherein the reliability estimates depend on the number of multiple choice questions in the test.
14. The system, as claimed in claim 8, wherein the system (100) further comprises a scoring engine (105) configured for comparing the number right scores with PCM scores with respect to reliability and error.
PCT/IN2019/050688 2018-09-25 2019-09-19 Methods and systems for partial credit model (pcm) scoring in classical test theory (ctt) WO2020065663A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201841036068 2018-09-25
IN201841036068 2018-09-25

Publications (1)

Publication Number Publication Date
WO2020065663A1 (en) 2020-04-02

Family

ID=69951973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2019/050688 WO2020065663A1 (en) 2018-09-25 2019-09-19 Methods and systems for partial credit model (pcm) scoring in classical test theory (ctt)

Country Status (1)

Country Link
WO (1) WO2020065663A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050256663A1 (en) * 2002-09-25 2005-11-17 Susumu Fujimori Test system and control method thereof
US20080108037A1 (en) * 2006-10-19 2008-05-08 Darin Beamish Control of audience response systems during use with paper-based questions
US20150056597A1 (en) * 2013-08-22 2015-02-26 LoudCloud Systems Inc. System and method facilitating adaptive learning based on user behavioral profiles
US20150379454A1 (en) * 2014-06-27 2015-12-31 Pymetrics, Inc. Systems and Methods for Data-Driven Identification of Talent

Similar Documents

Publication Publication Date Title
Wachter et al. The future of meta-analysis
CN107730131B (en) Capability prediction and recommendation method and device for crowdsourced software developers
WO2021111670A1 (en) Annotation device and method
US7149468B2 (en) Methods for improving certainty of test-taker performance determinations for assessments with open-ended items
CN109492644A (en) A kind of matching and recognition method and terminal device of exercise image
Breytenbach et al. Communities in control of their own integrated technology development processes
CN108256699A (en) Graduation whereabouts Forecasting Methodology and system based on college student stereo data
CN109063116A (en) Data identification method, device, electronic equipment and computer readable storage medium
CN117151070B (en) Test paper question-setting method, device, equipment and computer readable storage medium
CN113887930A (en) Question-answering robot health degree evaluation method, device, equipment and storage medium
WO2021151305A1 (en) Sample analysis method, apparatus, electronic device, and medium based on missing data
CN112288337A (en) Behavior recommendation method, behavior recommendation device, behavior recommendation equipment and behavior recommendation medium
US11238410B1 (en) Methods and systems for merging outputs of candidate and job-matching artificial intelligence engines executing machine learning-based models
CN113516417A (en) Service evaluation method and device based on intelligent modeling, electronic equipment and medium
CN114862140A (en) Behavior analysis-based potential evaluation method, device, equipment and storage medium
CN113627160A (en) Text error correction method and device, electronic equipment and storage medium
CN112052310A (en) Information acquisition method, device, equipment and storage medium based on big data
CN112948705A (en) Intelligent matching method, device and medium based on policy big data
WO2020065663A1 (en) Methods and systems for partial credit model (pcm) scoring in classical test theory (ctt)
CN107845047B (en) Dynamic scoring system, method and computer readable storage medium
David et al. New Frontiers: The Origins and Content of New Work, 1940–2018
CN110032714A (en) A kind of corpus labeling feedback method and device
KR20200025282A (en) Method and system for providing online reading study
CN111652767B (en) User portrait construction method and device, computer equipment and storage medium
Shaw et al. Success in the US: Are Cambridge International Assessments Good Preparation for University Study?.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19867589

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19867589

Country of ref document: EP

Kind code of ref document: A1
