US20110311957A1 - Constructed response scoring mechanism - Google Patents
Constructed response scoring mechanism
- Publication number
- US20110311957A1 (US 13/221,703)
- Authority
- US
- United States
- Prior art keywords
- scoring
- response
- objects
- rubric
- responses
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B7/00—Electrically-operated teaching apparatus or devices working with questions and answers
- G09B7/02—Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
Definitions
- The representation of annotated And-Or trees is well known in the computer science art.
- The internal representation used is a set of nodes, in which each node has a list of children, each of which can be an And node, an Or node, or an assertion node.
- The resulting internal representation of the binding, assertion, and scoring trees comprises an Answer Set that includes an expert system embodying the knowledge of the scoring rubric for a particular item.
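- By way of example and not limitation, the node structure described above can be sketched in Python as follows. The class names `AssertionNode`, `AndNode`, and `OrNode`, the function `evaluate`, and the sample tree are assumptions made for illustration; the source states only that each node has a list of children that may be And nodes, Or nodes, or assertion nodes. The assertion names are borrowed from the FIG. 4 examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class AssertionNode:
    """Leaf node: refers to a named assertion whose result is already stored."""
    name: str


@dataclass
class AndNode:
    """True only when every child evaluates to true."""
    children: List["Node"] = field(default_factory=list)


@dataclass
class OrNode:
    """True when at least one child evaluates to true."""
    children: List["Node"] = field(default_factory=list)


Node = Union[AssertionNode, AndNode, OrNode]


def evaluate(node: Node, results: Dict[str, bool]) -> bool:
    """Recursively evaluate an And-Or tree against stored assertion results."""
    if isinstance(node, AssertionNode):
        return results[node.name]
    if isinstance(node, AndNode):
        return all(evaluate(child, results) for child in node.children)
    if isinstance(node, OrNode):
        return any(evaluate(child, results) for child in node.children)
    raise TypeError(f"unsupported node type: {type(node).__name__}")


# Hypothetical tree: FourObjects AND (FourGood OR FewerObjects)
tree = AndNode([
    AssertionNode("FourObjects"),
    OrNode([AssertionNode("FourGood"), AssertionNode("FewerObjects")]),
])

full = evaluate(tree, {"FourObjects": True, "FourGood": True, "FewerObjects": False})
partial = evaluate(tree, {"FourObjects": True, "FourGood": False, "FewerObjects": False})
```

Keeping the tree as plain data, separate from the stored assertion results, would let the same evaluator serve every item.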
- The scoring rubric may be written directly in the specification language, or authoring tools may be developed to help test developers specify the rubrics. In some embodiments, tools may be domain specific.
- FIG. 6 is a block diagram of an exemplary method 600 for scoring user responses according to an embodiment of the invention.
- The three-stage Answer Set is applied to the set of elements returned by the UI.
- One practical value of method 600 is that it facilitates the use of a low-level library of primitives, which can reduce or eliminate the need for any programming when defining a very broad range of new items or item types.
- Method 600 can integrate the assertion and scoring stages into one stage.
- A user response is captured as a collection of objects with attributes.
- The response is captured through a UI such as UI 200 (FIG. 2).
- A component binds the variables identified in a binding tree.
- The component is a scoring engine such as scoring engine 106 (FIG. 1).
- Results are stored for each named assertion.
- A scoring tree is evaluated, stopping when the subtree associated with a score evaluates to true.
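- By way of illustration only, the last two steps of method 600 (storing a result for each named assertion, then evaluating score conditions until one is true) might look like the Python sketch below. The mapping of assertions to score points is a hypothetical rubric, since FIG. 5 is not reproduced here; each score's subtree is stood in for by a plain Python callable, and `score_response`, `score_rules`, and the default score of zero are likewise assumptions.

```python
from typing import Callable, Dict, List, Tuple

Bindings = Dict[str, int]   # values produced by the binding stage
Results = Dict[str, bool]   # stored result of each named assertion

# Named assertions, following the FIG. 4 examples.
assertions: Dict[str, Callable[[Bindings], bool]] = {
    "FourObjects": lambda b: b["S1Count"] == 4,
    "FewerObjects": lambda b: b["S1Count"] < 4,
    "FourGood": lambda b: b["S3Count"] == 4,
}

# Hypothetical rubric: one condition per score point, highest score first.
score_rules: List[Tuple[int, Callable[[Results], bool]]] = [
    (3, lambda r: r["FourObjects"] and r["FourGood"]),
    (2, lambda r: r["FourObjects"]),
    (1, lambda r: r["FewerObjects"]),
]


def score_response(bindings: Bindings, default: int = 0) -> int:
    """Evaluate every assertion once, then stop at the first true score rule."""
    results: Results = {name: check(bindings) for name, check in assertions.items()}
    for score, condition in score_rules:
        if condition(results):
            return score
    return default  # assumed fallback when no score condition holds


full_credit = score_response({"S1Count": 4, "S3Count": 4})
partial_credit = score_response({"S1Count": 4, "S3Count": 2})
```

Because the assertions are evaluated once and stored by name, several score conditions can reuse the same result without recomputing it.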
- The disclosed invention also presents an enhanced method of “rangefinding” that refines expert systems and tests them against a broad range of student responses.
- Rangefinding is a committee process in which subject-matter experts agree on appropriate scores for sample examinee responses.
- In rangefinding, a small sample of items, often in the range of 25-100, is reviewed by committees to test the application of the scoring rubrics.
- Refinements are made to the rubric, and sample papers are selected to train scoring staff on the accurate scoring of responses to the item.
- FIG. 7 is a block diagram of an exemplary method 700 for refining a scoring rubric according to an embodiment of the invention.
- Items are field tested and are scored either in real time or after data collection.
- A sample of responses is identified for transmission to a rangefinding committee.
- The sample of responses may be selected by combining a small random sample with student responses selected to represent the work of otherwise high- or low-performing students. This may be done because otherwise high-performing students may score poorly on the item, or otherwise low-performing students may score well on the item.
- Items and corresponding scores are provided to the rangefinding committee.
- The rangefinding committee is trained in the formal specifications of the scoring rubric.
- One or more rules or principles are identified that differentiate the correct score from the incorrect scores.
- A modification to the scoring rubric, corresponding to the identified rules, is provided.
- The identified rules for modifying the scoring rubric are applied to field test responses in order to identify any unintended consequences of the new rules. In an embodiment, this may be done by identifying scores that changed under the new rules and evaluating those changes.
- A consensus on whether to fully implement the new rules is achieved based on the modification to the formal scoring rubric. In an embodiment, the consensus is achieved after the committee reviews a new sample of responses for which the revision resulted in a change of scores and determines that the changes are limited to those intended.
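- As a non-limiting illustration of the step in which revised rules are applied to field test responses, the Python sketch below rescores a set of responses under an old and a new rubric and lists only those whose scores changed. The function `changed_scores`, the two toy rubrics, and the response identifiers are assumptions introduced for this example and do not come from the source.

```python
from typing import Callable, Dict, List, Tuple

Response = Dict[str, int]            # bound values for one field test response
Rubric = Callable[[Response], int]   # maps a response to a score


def changed_scores(
    responses: Dict[str, Response], old: Rubric, new: Rubric
) -> List[Tuple[str, int, int]]:
    """Return (response id, old score, new score) for every score that changed."""
    changes = []
    for response_id, response in responses.items():
        before, after = old(response), new(response)
        if before != after:
            changes.append((response_id, before, after))
    return changes


# Hypothetical rubrics: the revision additionally requires four "good" objects.
def old_rubric(r: Response) -> int:
    return 1 if r["S1Count"] == 4 else 0


def new_rubric(r: Response) -> int:
    return 1 if r["S1Count"] == 4 and r["S3Count"] == 4 else 0


field_test = {
    "r1": {"S1Count": 4, "S3Count": 4},
    "r2": {"S1Count": 4, "S3Count": 3},
    "r3": {"S1Count": 2, "S3Count": 0},
}

changes = changed_scores(field_test, old_rubric, new_rubric)
```

The resulting list is the kind of sample a committee could review to confirm that the changes are limited to those intended.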
Abstract
A system, method, and related techniques are disclosed for scoring user responses to constructed response test items. The system includes a scoring engine for receiving a user response to a test question and evaluating the response against a scoring rubric. The scoring rubric may include a binding stage, an assertion stage, and a scoring stage. Furthermore, the system includes a database for referencing elements used by the scoring engine which may comprise objects, object sets, attributes of objects, and transformations of any elements.
Description
- This application is a divisional of U.S. application Ser. No. 12/320,631 filed Jan. 30, 2009, entitled “Constructed Response Scoring Mechanism”, which claims the benefit of U.S. Provisional Application No. 61/193,252 filed Nov. 12, 2008 and titled “Constructed Response Scoring Mechanism,” each of which is hereby incorporated by reference in its entirety for all purposes.
- Learning often happens incrementally. At first students may be able to recall, recognize or name concepts. As mastery increases, they may be able to describe concepts, the properties of concepts, or relationships among concepts. Eventually, students may be able to apply concepts to novel situations, use learned material to generate new insights, or synthesize learned material. This learning sequence is often referred to as “depth of knowledge,” and refers to the depth with which students understand the material that they are taught. The specific stages and levels of depth vary across taxonomies, but the general idea is that knowledge becomes deeper and more internalized with additional mastery, and that in turn allows more robust application of the knowledge.
- When assessing student mastery, it is often desirable to evaluate their depth of knowledge. From the perspective of test developers it can be quite difficult to develop selected response items (test questions) that measure deeper levels of knowledge. A selected response item is a test question, such as a multiple choice question, in which the correct answer is selected from a collection of choices.
- Many testing programs use constructed response items to measure content at deeper levels of knowledge. A constructed response item is an item that does not offer the examinee answer options from which to choose, but rather the examinee must construct a response.
- In a typical system, each student's response is evaluated against a scoring rubric, which describes the characteristics of a response that should receive full credit. When partial credit is to be awarded, the characteristics of responses that receive some portion of the total overall score are also enumerated. For example, an item might award three points for full credit, and individually enumerate characteristics of imperfect responses that would warrant the award of two points, one point, and zero points.
- The scoring rubric usually goes through a refinement process called rangefinding. In this process, samples of student responses (usually from a field test) are evaluated by a committee of subject matter experts with the goal of selecting sample responses exemplifying each score point to be awarded. It is not uncommon for the scoring rubrics to be refined during this process.
- Using the refined rubric, human scorers apply the scoring criteria to score each examinee's response to the item. Typically, this process is monitored and managed, giving each scorer a number of pre-scored papers to evaluate whether they continue to apply the rubric correctly, and having a proportion of scored papers independently scored by a second scorer to monitor the reliability with which scorers apply the rubric.
- The current process has several limitations. First, it is very expensive to score constructed response items by hand, requiring that each response be read by one or more qualified scorers. Furthermore, the process by which scoring rubrics are refined does not offer an opportunity for large-scale evaluation of the consequences of the refinements, risking potential unintended consequences. Additionally, the process necessarily takes time, limiting the usefulness of constructed response items in online tests. For example, adaptive online tests use the scores on items administered early in the test to select the best items to administer later. Due to current limitations, human scoring prevents using constructed response items to support adaptive testing.
- Presented is an invention directed towards systems and methods that improve the current process for administering and scoring constructed response items. These systems and methods, used separately or together, allow for the immediate scoring of constructed response items. The invention has the practical benefits of reducing the costs of administering constructed response items, providing more reliable scoring of constructed response items, broadly validating constructed response scoring rubrics, and allowing for the integration of constructed response items into computerized adaptive tests delivered online or at testing centers.
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
- Illustrative embodiments of the present invention are described in detail below with reference to the attached drawing figures, which are incorporated by reference herein and wherein:
- FIG. 1 is a block diagram of an embodiment of an exemplary system for implementing the invention.
- FIG. 2 is a screenshot of an exemplary user interface (UI) for collecting student responses according to an embodiment of the invention.
- FIG. 3 illustrates exemplary binding statements used in the binding stage according to an embodiment of the invention.
- FIG. 4 illustrates exemplary assertions according to an embodiment of the invention.
- FIG. 5 illustrates an exemplary snippet from a scoring specification for a three-point item according to an embodiment of the invention.
- FIG. 6 is a block diagram of an exemplary method for scoring user responses according to an embodiment of the invention.
- As one skilled in the art will appreciate, embodiments of the present invention may be embodied as, among other things: a method, system, or computer-program product. Accordingly, the embodiments may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. In one embodiment, the present invention takes the form of a computer-program product that includes computer-useable instructions embodied on one or more computer-readable media.
- Computer-readable media include both volatile and nonvolatile media, removable and nonremovable media, and contemplate media readable by a database, a switch, and various other network devices. Network switches, routers, and related components are conventional in nature, as are means of communicating with the same. By way of example, and not limitation, computer-readable media comprise computer-storage media and communications media.
- Computer-storage media, or machine-readable media, include media implemented in any method or technology for storing information. Examples of stored information include computer-useable instructions, data structures, program modules, and other data representations. Computer-storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD), holographic media or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage, and other magnetic storage devices. These memory components can store data momentarily, temporarily, or permanently.
- Communications media typically store computer-useable instructions—including data structures and program modules—in a modulated data signal. The term “modulated data signal” refers to a propagated signal that has one or more of its characteristics set or changed to encode information in the signal. An exemplary modulated data signal includes a carrier wave or other transport mechanism. Communications media include any information-delivery media. By way of example but not limitation, communications media include wired media, such as a wired network or direct-wired connection, and wireless media such as acoustic, infrared, radio, microwave, spread-spectrum, and other wireless media technologies. Combinations of the above are included within the scope of computer-readable media.
- FIG. 1 is a block diagram of an embodiment of an exemplary system 100 for implementing the invention. The system 100 includes components such as client 102, scoring manager 104, scoring engine 106, and primitive library 108. Each component includes a communication interface. The communication interface may be an interface that allows a component to be directly connected to any other component or to be connected to another component over network 110. Network 110 can include, for example, a local area network (LAN), a wide area network (WAN), cable system, telco system, or the Internet. In an embodiment, a component can be connected to another device via a wireless communication interface through the network 110.
- Client 102 may be or can include a desktop computer, a laptop computer or other mobile computing device, a network-enabled cellular telephone (with or without media capturing/playback capabilities), a server, a wireless email client, or other client, machine or device, or any combination of the above, to perform various tasks including Web browsing, search, electronic mail (email) and other tasks, applications and functions. Client 102 may additionally be any portable media device such as digital still camera devices, digital video cameras (with or without still image capture functionality), media players such as personal music players and personal video players, and any other portable media device, or any combination of the above.
- Scoring manager 104 is utilized for administering and scoring constructed response items for a user. The scoring manager 104 may also be utilized to generate individual expert systems to represent the scoring knowledge for a single constructed response item. The scoring manager 104 additionally may be configured to refine each expert system and test it against a broad range of student responses. In an embodiment, scoring manager 104 is a server external to client 102. In another embodiment, scoring manager 104 may be an application that resides and is executable within client 102.
- As shown, scoring engine 106 and primitive library 108 are components that reside within scoring manager 104. In other embodiments, one or more of the scoring engine 106 and primitive library 108 may be external to the scoring manager 104. The scoring engine 106 is a component that receives a user's response to a question and evaluates the response against a scoring rubric. The scoring engine 106 may include or have access to the library of primitives 108. In an embodiment, the primitive library 108 may include the calculation of distances, slopes, comparisons of strings and numbers, and other basic operations. In order to make the scoring engine 106 general so that it can support a very large range of items, the primitive library 108 may be low-level and higher order predicates may be created from the primitive library 108. In other embodiments, complex predicates may be added to the primitive library. In an embodiment, in using the primitive library 108, the language for representing a scoring rubric may enable the library functions to reference elements including, but not limited to, object sets, objects, attributes of objects, as well as transformations of any of these elements.
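- By way of example and not limitation, a low-level primitive library of the kind described above, together with higher order predicates built from it, could be sketched in Python as follows. All function names (`distance`, `slope`, `numbers_equal`, `strings_equal`, `segments_parallel`, `same_length`) and the tolerance value are assumptions chosen for illustration; the source names only the categories of operation.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


# --- Low-level primitives -------------------------------------------------
def distance(p: Point, q: Point) -> float:
    """Euclidean distance between two grid points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def slope(p: Point, q: Point) -> Optional[float]:
    """Slope of the line through p and q, or None for a vertical line."""
    if q[0] == p[0]:
        return None
    return (q[1] - p[1]) / (q[0] - p[0])


def numbers_equal(a: float, b: float, tol: float = 1e-9) -> bool:
    """Numeric comparison with a small tolerance (the tolerance is assumed)."""
    return math.isclose(a, b, abs_tol=tol)


def strings_equal(a: str, b: str) -> bool:
    """String comparison ignoring case and surrounding whitespace."""
    return a.strip().casefold() == b.strip().casefold()


# --- Higher order predicates composed from the primitives -------------------
def segments_parallel(s1: Segment, s2: Segment) -> bool:
    """Two segments are parallel when their slopes match (or both are vertical)."""
    m1, m2 = slope(*s1), slope(*s2)
    if m1 is None or m2 is None:
        return m1 is None and m2 is None
    return numbers_equal(m1, m2)


def same_length(s1: Segment, s2: Segment) -> bool:
    """Two segments have the same length when their endpoint distances match."""
    return numbers_equal(distance(*s1), distance(*s2))


parallel = segments_parallel(((0, 0), (1, 1)), ((2, 0), (4, 2)))
```

Composing predicates such as `segments_parallel` from a handful of primitives is one way a rubric author could cover new item types without new engine code.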
- FIG. 2 is a screenshot of an exemplary user interface (UI) 200 for collecting student responses according to an embodiment of the invention. The UI 200 may be presented through an application that is part of the client 102 and/or the scoring manager 104. In an embodiment, through the UI 200, the invention uses items developed for user responses collected on a Cartesian grid to illustrate points. In other embodiments, the invention can be applied to user responses to other types of items using other response modes. In an embodiment, the UI 200 is referred to as an Interactive Grid (IG). A broad range of different item types can be presented in the IG.
- The UI 200 may be used to ensure that user responses are collected with a consistent mechanism that creates and transmits a data structure to a scoring engine. A user response may comprise a set of objects, each of which may have one or more attributes. For example, the UI can produce a collection of objects that may include points, line segments connecting points, geometric objects comprised of connected line segments, and user-defined atomic objects, such as the weights 202 on the left palette in FIG. 2. Each object may be characterized by an ordered set of points. For example, the lines on the bottom-center of the weights 202 may be an example of an ordered set of points. In an embodiment, the UI 200 can return a data structure containing these objects to the scoring engine 106. In an embodiment, objects have properties that include, but are not limited to, locations, names, labels, and values.
- In another embodiment, the UI 200 can be configured to capture natural language where the object set may include elements of a semantic network derived from a parse of the text provided by the user. Alternatively, the UI 200 can be configured to capture input from an equation editor representing sequences of symbols as the initial set of objects. Moreover, in other embodiments, an application to test proficiency with a computer program may capture menu commands, keyboard input, or mouse events as the set of objects. However, this list is intended to be exemplary rather than exhaustive.
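- By way of illustration only, the grid-based response described above (a set of objects, each characterized by an ordered set of points and carrying attributes) might be represented as in the Python sketch below. The class `ResponseObject`, the function `build_payload`, the field names, and the sample values are assumptions for this example; the source does not specify the layout of the data structure returned by the UI.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class ResponseObject:
    """One object in a user response (field names are hypothetical)."""
    kind: str                      # e.g. "point", "segment", "polygon", "weight"
    points: List[Point]            # ordered set of points characterizing the object
    attributes: Dict[str, Any] = field(default_factory=dict)  # names, labels, values


def build_payload(objects: List[ResponseObject]) -> Dict[str, Any]:
    """Package the objects as a plain data structure for a scoring engine."""
    return {"objects": [asdict(obj) for obj in objects]}


# A hypothetical response: a labeled point, a segment, and a palette weight.
response = [
    ResponseObject("point", [(2, 3)], {"label": "A"}),
    ResponseObject("segment", [(0, 0), (4, 4)]),
    ResponseObject("weight", [(1, 0), (2, 0)], {"name": "weight", "value": 5}),
]

payload = build_payload(response)
```

Keeping every response mode in the same object-and-attribute shape is what would let a single scoring engine handle text, equations, or interface events as well as grid objects.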
- In an embodiment, a scoring rubric may be defined in three sequential stages: a binding stage, in which references to elements are established; an assertion stage, in which assertions about elements are evaluated and stored; and a scoring stage, in which a score is assigned based on the values of the results of the assertions. An XML-based language may be used for implementing these stages for the UI responses.
-
FIG. 3 illustrates exemplary binding statements used in the binding stage according to an embodiment of the invention. FIG. 3 presents two exemplary binding statements. The first, “SelectObjectSet,” binds a subset of the input set to the variable “S1.” This statement collects all of the objects that have at least one side (NUMBEROFSIDES GT 0). The symbol “@” is bound sequentially to each object in the input set. The second statement in FIG. 3, “Bind,” creates an additional binding, associating the symbol “S1Count” with the number of elements contained in set “S1.” The symbol “$” dereferences the previously bound variable “S1.”
- An assertion is a predicate that is either true or false. It is also the atomic unit from which scoring rubrics can be built. Each assertion can be named for later reference in the scoring stage.
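The two binding statements described for FIG. 3 can be approximated in Python as follows. This is a minimal sketch, not the XML-based language itself: the function names, the dictionary-based environment, and the operator table are hypothetical, and only the `GT` comparison is taken from the text (`LT` and `EQ` are assumed for completeness).

```python
import operator
from typing import Any, Callable, Dict, List

# Comparison keywords; only GT appears in the text, LT and EQ are assumed.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "GT": operator.gt,
    "LT": operator.lt,
    "EQ": operator.eq,
}


def select_object_set(objects: List[Dict[str, Any]], attribute: str,
                      op: str, value: Any) -> List[Dict[str, Any]]:
    """Analogue of "SelectObjectSet": each obj plays the role of "@" in turn."""
    compare = OPS[op]
    return [obj for obj in objects
            if attribute in obj and compare(obj[attribute], value)]


def run_bindings(objects: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the environment of bound variables for one response."""
    env: Dict[str, Any] = {}
    # SelectObjectSet: bind S1 to all objects with at least one side.
    env["S1"] = select_object_set(objects, "NUMBEROFSIDES", "GT", 0)
    # Bind: looking up env["S1"] stands in for dereferencing "$S1".
    env["S1Count"] = len(env["S1"])
    return env


response = [
    {"name": "p1", "NUMBEROFSIDES": 0},   # a bare point
    {"name": "t1", "NUMBEROFSIDES": 3},   # a triangle
    {"name": "q1", "NUMBEROFSIDES": 4},   # a quadrilateral
]
env = run_bindings(response)
```

Here `env["S1Count"]` counts the triangle and the quadrilateral but not the point, mirroring the NUMBEROFSIDES GT 0 condition.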
FIG. 4 illustrates exemplary assertions according to an embodiment of the invention. In FIG. 4, the first two assertions dereference the previously bound integer “S1Count” and assert that it is (respectively) equal to four and less than four. These assertions are named by the user, for example, “FourObjects” and “FewerObjects.” The third assertion references another previously bound integer, which is the count of objects in another set meeting some other set of conditions. In this example, the third assertion is named “FourGood” and is true if the value of “S3Count” is 4.
- In the scoring stage, named assertions are collected in a set of And-Or trees, one tree for each numeric score point. An exemplary snippet from a scoring specification for a three-point item appears in FIG. 5 according to an embodiment of the invention. In this case, full credit is assigned if there are four objects represented and all four meet whatever criteria were used to construct set “S3,” thus leading to the bound variable “S3Count” above.
- The representation of annotated And-Or trees is well known in the computer science art. In an embodiment, the internal representation used is a set of nodes, in which each node has a list of children, each of which can be an And node, an Or node, or an assertion node. The resulting internal representation of the binding, assertion, and scoring trees comprises an Answer Set that includes an expert system embodying the knowledge of the scoring rubric for a particular item. The scoring rubric may be written directly in the specification language or authoring tools may be developed to help test developers specify the rubrics. In some embodiments, tools may be domain specific.
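The assertion and scoring stages can be sketched in Python along the following lines. The node classes, function names, and default score of zero are assumptions for illustration. The full-credit tree follows the description of FIG. 5 (four objects, all four meeting the “S3” criteria); the trees for two points and one point are hypothetical, since the text does not give them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class AssertionNode:
    name: str  # refers to a named assertion result


@dataclass
class AndNode:
    children: List["Node"] = field(default_factory=list)


@dataclass
class OrNode:
    children: List["Node"] = field(default_factory=list)


Node = Union[AssertionNode, AndNode, OrNode]


def evaluate_assertions(env: Dict[str, int]) -> Dict[str, bool]:
    """Assertion stage: store a true/false result under each assertion name."""
    return {
        "FourObjects": env["S1Count"] == 4,
        "FewerObjects": env["S1Count"] < 4,
        "FourGood": env["S3Count"] == 4,
    }


def evaluate(node: Node, results: Dict[str, bool]) -> bool:
    """Recursively evaluate an And-Or tree against stored assertion results."""
    if isinstance(node, AssertionNode):
        return results[node.name]
    if isinstance(node, AndNode):
        return all(evaluate(child, results) for child in node.children)
    return any(evaluate(child, results) for child in node.children)


def score(trees: Dict[int, Node], results: Dict[str, bool], default: int = 0) -> int:
    """Scoring stage: try the highest score first and stop at the first true tree."""
    for points in sorted(trees, reverse=True):
        if evaluate(trees[points], results):
            return points
    return default  # assumed fallback when no tree is satisfied


trees: Dict[int, Node] = {
    3: AndNode([AssertionNode("FourObjects"), AssertionNode("FourGood")]),
    2: AssertionNode("FourObjects"),   # hypothetical partial-credit rule
    1: AssertionNode("FewerObjects"),  # hypothetical partial-credit rule
}

full_credit = score(trees, evaluate_assertions({"S1Count": 4, "S3Count": 4}))
```

Because assertions are evaluated once and stored by name, the same result can be reused in several score trees without being recomputed.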
-
FIG. 6 is a block diagram of an exemplary method 600 for scoring user responses according to an embodiment of the invention. In an embodiment, the three-stage Answer Set is applied to the set of elements returned by the UI. One practical value of method 600 is that this process facilitates the use of a low-level library of primitives, which can reduce or eliminate the need for any programming when defining a very broad range of new items or item types. In an embodiment, method 600 can integrate the assertion and scoring stages into one stage.
- At operation 602, a user response is captured as a collection of objects with attributes. In an embodiment, the response is captured through a UI such as UI 200 (FIG. 2). At operation 604, a component binds the variables identified in a binding tree. In an embodiment, the component is a scoring engine such as scoring engine 106 (FIG. 1). At operation 606, results (true or false) are stored for each named assertion. At operation 608, starting with the highest possible score, a scoring tree is evaluated, stopping when the subtree associated with a score evaluates to true.
- The disclosed invention also presents an enhanced method of “rangefinding” which refines expert systems and tests them against a broad range of student responses. Rangefinding is a committee process in which subject-matter experts agree on appropriate scores for sample examinee responses. During rangefinding, a small sample of items, often in the range of 25-100, is reviewed by committees to test the application of the scoring rubrics. During this process, refinements are made to the rubric, and sample papers are selected to train scoring staff on the accurate scoring of responses to the item.
- However, improvements are needed for enhancing the rangefinding process. The invention provides such improvements. For example, through the invention, decisions of the rangefinding committee can be expressed formally as assertions in the language used to define the scoring rubrics. Formalizing the committee results as a series of explicit rules improves the accuracy of scoring, and would likely lead to more reliable scoring even when scoring is done by human scorers. Furthermore, committee decisions can be systematically tested against the full set of field-test data to locate unintended consequences of the proposed new rules.
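Testing a proposed rule against field-test data, as described above, amounts to re-scoring every response under the old and new rubrics and listing the differences for committee review. The Python sketch below shows one way this could look; the `ScoreChange` record, the function name, and the toy rubrics are hypothetical and stand in for full Answer Sets.

```python
from typing import Any, Callable, Dict, List, NamedTuple

Rubric = Callable[[Any], int]  # maps a response to a numeric score


class ScoreChange(NamedTuple):
    response_id: str
    old_score: int
    new_score: int


def find_score_changes(responses: Dict[str, Any],
                       old_rubric: Rubric,
                       new_rubric: Rubric) -> List[ScoreChange]:
    """Re-score all field-test responses and report those whose score changed."""
    changes: List[ScoreChange] = []
    for response_id, response in responses.items():
        old, new = old_rubric(response), new_rubric(response)
        if old != new:
            changes.append(ScoreChange(response_id, old, new))
    return changes


# Toy rubrics (assumed): the proposed rule also credits more than four objects.
def old_rubric(response: Dict[str, int]) -> int:
    return 1 if response["S1Count"] == 4 else 0


def new_rubric(response: Dict[str, int]) -> int:
    return 1 if response["S1Count"] >= 4 else 0


field_test = {
    "r1": {"S1Count": 4},
    "r2": {"S1Count": 5},
    "r3": {"S1Count": 2},
}
changes = find_score_changes(field_test, old_rubric, new_rubric)
```

The resulting list is the material a committee would inspect to confirm that only the intended responses changed score.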
-
FIG. 7 is a block diagram of an exemplary method 700 for refining a scoring rubric according to an embodiment of the invention. At operation 702, items are field tested and are scored either in real time or after data collection. At operation 704, a sample of responses is identified for transmission to a rangefinding committee. In an embodiment, the sample of responses may be selected by combining a small random sample with student responses selected to represent the work of otherwise high- or low-performing students. This may be done because otherwise high-performing students may score poorly on the item, or otherwise low-performing students may score well on the item.
- At operation 706, items and corresponding scores are provided to the rangefinding committee. In an embodiment, the rangefinding committee is trained in the formal specifications of the scoring rubric. In instances where the committee reaches a consensus that a score is incorrect, at operation 708, one or more rules or principles are identified that differentiate the correct score from the incorrect scores. At operation 710, a modification to the scoring rubric, corresponding to the identified rules, is provided.
- At operation 712, the identified rules for modifying the scoring rubric are applied to field test responses in order to identify any unintended consequences of the new rules. In an embodiment, this may be done by identifying scores that changed under the new rules and evaluating those changes. At operation 714, a consensus on whether to fully implement the new rules is achieved based on the modification to the formal scoring rubric. In an embodiment, the consensus is achieved after the committee reviews a new sample of responses for which the revision resulted in a change of scores and determines that the changes are limited to those intended.
- While particular embodiments of the invention have been illustrated and described in detail herein, it should be understood that various changes and modifications might be made to the invention without departing from the scope and intent of the invention. The embodiments described herein are intended in all respects to be illustrative rather than restrictive. Alternate embodiments will become apparent to those skilled in the art to which the present invention pertains without departing from its scope.
- From the foregoing it will be seen that this invention is one well adapted to attain all the ends and objects set forth above, together with other advantages, which are obvious and inherent to the system and method. It will be understood that certain features and sub-combinations are of utility and may be employed without reference to other features and sub-combinations. This is contemplated and within the scope of the appended claims.
Claims (3)
1. A method for scoring user responses to a test question, comprising:
receiving a user response wherein the response is a collection of objects with one or more attributes;
binding one or more variables identified in a binding tree based on the collection of objects;
evaluating at least one named assertion associated with an element; and
storing the named assertion.
2. The method of claim 1, further comprising commencing the evaluation of a scoring tree by starting with the highest possible score.
3. The method of claim 2, wherein the evaluation of the scoring tree is stopped when a subtree associated with a score evaluates to true.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/221,703 US20110311957A1 (en) | 2008-11-12 | 2011-08-30 | Constructed response scoring mechanism |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US19325208P | 2008-11-12 | 2008-11-12 | |
US12/320,631 US9812026B2 (en) | 2008-11-12 | 2009-01-30 | Constructed response scoring mechanism |
US13/221,703 US20110311957A1 (en) | 2008-11-12 | 2011-08-30 | Constructed response scoring mechanism |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/320,631 Division US9812026B2 (en) | 2008-11-12 | 2009-01-30 | Constructed response scoring mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110311957A1 (en) | 2011-12-22 |
Family
ID=42165517
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/320,631 Active 2032-03-08 US9812026B2 (en) | 2008-11-12 | 2009-01-30 | Constructed response scoring mechanism |
US13/221,703 Abandoned US20110311957A1 (en) | 2008-11-12 | 2011-08-30 | Constructed response scoring mechanism |
US13/221,716 Abandoned US20110311958A1 (en) | 2008-11-12 | 2011-08-30 | Constructed response scoring mechanism |
US15/806,063 Active US10643489B2 (en) | 2008-11-12 | 2017-11-07 | Constructed response scoring mechanism |
Country Status (2)
Country | Link |
---|---|
US (4) | US9812026B2 (en) |
WO (1) | WO2010056316A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130004931A1 (en) * | 2011-06-28 | 2013-01-03 | Yigal Attali | Computer-Implemented Systems and Methods for Determining Content Analysis Metrics for Constructed Responses |
US9967211B2 (en) | 2015-05-31 | 2018-05-08 | Microsoft Technology Licensing, Llc | Metric for automatic assessment of conversational responses |
CN113326626A (en) * | 2021-06-08 | 2021-08-31 | 核电运行研究(上海)有限公司 | User-oriented system modeling simulation platform and method |
WO2024040328A1 (en) * | 2022-08-26 | 2024-02-29 | Acuity Insights Inc. | System and process for secure online testing with minimal group differences |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5991595A (en) * | 1997-03-21 | 1999-11-23 | Educational Testing Service | Computerized system for scoring constructed responses and methods for training, monitoring, and evaluating human rater's scoring of constructed responses |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6120299A (en) * | 1997-06-06 | 2000-09-19 | Educational Testing Service | System and method for interactive scoring of standardized test responses |
US8374540B2 (en) * | 2002-03-15 | 2013-02-12 | Educational Testing Service | Consolidated on-line assessment system |
US20070218450A1 (en) * | 2006-03-02 | 2007-09-20 | Vantage Technologies Knowledge Assessment, L.L.C. | System for obtaining and integrating essay scoring from multiple sources |
US20090265307A1 (en) * | 2008-04-18 | 2009-10-22 | Reisman Kenneth | System and method for automatically producing fluent textual summaries from multiple opinions |
-
2009
- 2009-01-30 US US12/320,631 patent/US9812026B2/en active Active
- 2009-11-12 WO PCT/US2009/006066 patent/WO2010056316A1/en active Application Filing
-
2011
- 2011-08-30 US US13/221,703 patent/US20110311957A1/en not_active Abandoned
- 2011-08-30 US US13/221,716 patent/US20110311958A1/en not_active Abandoned
-
2017
- 2017-11-07 US US15/806,063 patent/US10643489B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US20100120010A1 (en) | 2010-05-13 |
US10643489B2 (en) | 2020-05-05 |
US9812026B2 (en) | 2017-11-07 |
US20180225983A1 (en) | 2018-08-09 |
US20110311958A1 (en) | 2011-12-22 |
WO2010056316A1 (en) | 2010-05-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: AMERICAN INSTITUTES FOR RESEARCH, DISTRICT OF COLU Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:COHEN, JON;DVORAK, JOSEPH L;ALBRIGHT, LARRY;AND OTHERS;SIGNING DATES FROM 20060126 TO 20090130;REEL/FRAME:032768/0877 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |