WO2014069741A1 - Apparatus and method for automatic scoring - Google Patents

Apparatus and method for automatic scoring

Info

Publication number
WO2014069741A1
Authority
WO
WIPO (PCT)
Prior art keywords
evaluation
scoring
score
automatic scoring
automatic
Prior art date
Application number
PCT/KR2013/005347
Other languages
French (fr)
Korean (ko)
Inventor
윤종철
윤경아
Original Assignee
에스케이텔레콤 주식회사
Priority date
Filing date
Publication date
Application filed by 에스케이텔레콤 주식회사
Priority to CN201380031051.4A (CN104364815A)
Publication of WO2014069741A1
Priority to US14/558,154 (US20150093737A1)

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00: Electrically-operated teaching apparatus or devices working with questions and answers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/20: Education
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00: Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02: Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student

Definitions

  • The present invention relates to an automatic scoring technique for automatically scoring a user's answer through machine learning, and more particularly, to an automatic scoring apparatus and method for automatically scoring target data in consideration of the correlations between evaluation areas.
  • With the development of communication technology, it has recently become possible to take language tests and simple level tests over a network; to this end, a server device that provides a test scores the test and provides the scoring result.
  • Conventionally, grading results were provided by having a person grade the answers directly and input the grading data into a server device.
  • However, this scoring method requires a great deal of manpower for scoring and considerable time to check the scoring results, so it is difficult to provide a fast service.
  • To improve on this, automatic scoring systems have recently been developed that score answers through machine learning rather than relying on human graders.
  • Such a conventional automatic scoring system collects examiners' subjective scoring data for a number of existing answers, analyzes the items in each answer that can be evaluated by machine learning (evaluation features, or evaluation qualities), generates a scoring model based on the machine-evaluable items and the examiners' subjective scoring results through machine learning, and then performs automatic scoring by analyzing the similarity of answers against the generated scoring model.
  • However, owing to the characteristics of language pedagogy, the scoring areas are not completely mutually exclusive, and an examiner's scores for the different evaluation areas influence one another.
  • Because the conventional automatic scoring system does not reflect these characteristics, its automatic scores agree poorly with the examiner's scoring results and its accuracy suffers.
  • The present invention is proposed to address these drawbacks and aims to provide an automatic scoring apparatus and method that, when scoring target data containing a user-written answer automatically through machine learning, score the target data in consideration of the correlations between the evaluation areas.
  • The present invention also aims to provide an automatic scoring apparatus and method that generate a correlation model between evaluation areas by reflecting language pedagogical characteristics, evaluation area characteristics, the examiner's answer evaluation characteristics, and the like, and that apply the generated correlation model to compensate for errors in the scoring model of each evaluation area.
  • As a means of solving the problem, the present invention provides an automatic scoring apparatus including: an automatic scoring unit that performs automatic scoring for each evaluation area on scoring target data by applying a pre-generated scoring model for each evaluation area; and a score tuning unit that calculates a final automatic score by adjusting the per-area automatic scores of the scoring target data output from the automatic scoring unit according to a correlation model between evaluation areas.
  • The automatic scoring apparatus may further include at least one of: a scoring model generation unit that generates the scoring model for each evaluation area through machine learning using previously scored data, in which the one or more evaluation areas were evaluated for one or more answers, and one or more evaluation features extracted from those answers; and a correlation model generation unit that generates the correlation model between evaluation areas, which defines the probability of each score occurring across the one or more evaluation areas, based on the previously scored data.
  • The score tuning unit may compare the automatic scores of the evaluation areas, select an abnormal evaluation area whose scoring-correlation deviation from the other evaluation areas is larger than a preset range, and tune the automatic score of the abnormal evaluation area using the correlation model between the evaluation areas.
  • Using the correlation model, the score tuning unit may calculate the occurrence probability of each possible score of the selected abnormal evaluation area based on the automatic scores of the remaining evaluation areas other than the abnormal evaluation area, and may change the automatic score of the abnormal evaluation area to the score having the highest probability.
  • In the corresponding automatic scoring method, the tuning may include: comparing the automatic scores of the evaluation areas and selecting an abnormal evaluation area whose deviation is larger than a preset range; calculating the occurrence probability of each possible score of the selected abnormal evaluation area based on the automatic scores of the remaining evaluation areas other than the abnormal evaluation area; and changing the automatic score of the abnormal evaluation area to the score having the highest probability.
  • Before the automatic scoring is performed, the automatic scoring method may further include at least one of: generating the scoring model for each evaluation area through machine learning using previously scored data, in which the one or more evaluation areas were evaluated for one or more answers, and one or more evaluation features extracted from those answers; and generating the correlation model between evaluation areas, which defines the probability of each score occurring across the one or more evaluation areas, based on the previously scored data.
  • The present invention also provides a computer-readable recording medium on which a program for executing the above-described automatic scoring method is recorded.
  • The present invention relates to a technique for automatically evaluating an answer written by a user in one or more language areas including speaking, listening, and writing.
  • In particular, by creating a correlation model between the evaluation areas that reflects language pedagogical characteristics, evaluation area characteristics, and the examiner's answer evaluation characteristics, the implicit judgment criteria for the evaluation areas can be modeled more realistically.
  • In addition, by applying the generated correlation model between the evaluation areas, the present invention reflects the correlations that can appear between evaluation areas, thereby minimizing errors relative to the examiner's answer evaluation characteristics when performing automatic scoring through the per-area scoring models and increasing the reliability of the evaluation result.
  • FIG. 1 is a diagram illustrating an automatic scoring apparatus according to an exemplary embodiment of the present invention.
  • FIG. 2 is a diagram illustrating a method for performing automatic scoring by applying a correlation model between evaluation areas according to an embodiment of the present invention.
  • FIG. 3 is a diagram illustrating a configuration of an automatic evaluation service system to which an automatic scoring apparatus according to the present invention is applied.
  • FIG. 4 is a diagram illustrating a terminal device to which an automatic scoring method is applied according to an exemplary embodiment of the present invention.
  • FIGS. 5A to 5C are correlation tables between evaluation areas for describing a correlation model between evaluation areas according to an embodiment of the present invention.
  • FIGS. 6 to 8 are diagrams showing an example of an automatic scoring process to which a correlation model between evaluation areas is applied according to an embodiment of the present invention.
  • An "evaluation area" is a grading criterion established to standardize scoring among examiners for a specific evaluation test, and can be defined as a scoring area together with the evaluation content of that scoring area.
  • For example, in the case of a speaking evaluation for a foreign language, the evaluation areas may include fluency, language use, composition, and pronunciation.
  • Here, fluency evaluates the appropriateness of the speech rate and the degree to which natural speech is maintained without hesitation.
  • Language use evaluates the accuracy of expression and the appropriateness of vocabulary usage.
  • Composition evaluates the logical connectivity of the speech and the consistency and cohesion of its content.
  • Pronunciation evaluates the clarity and intelligibility of pronunciation. The present invention aims to implement automatic scoring for one or more such predetermined evaluation areas (see the illustrative representation below).
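As a purely illustrative aid (not part of the patent text), the evaluation areas of such a speaking test could be represented roughly as follows; the pairing of rubric numbers with particular areas and the data structure itself are assumptions, while the 0 to 5 score range follows the embodiment described later.

```python
# Illustrative only: evaluation areas (rubrics) of a foreign-language speaking test.
# The mapping of rubric numbers to specific areas is an assumption for illustration.
EVALUATION_AREAS = {
    "Rubric#1": "fluency: appropriate speech rate, natural speech without hesitation",
    "Rubric#2": "language use: accuracy of expression, appropriate vocabulary",
    "Rubric#3": "composition: logical connectivity, consistency and cohesion of content",
    "Rubric#4": "pronunciation: clarity and intelligibility",
}
SCORES = range(0, 6)  # each evaluation area is scored from 0 to 5 points
```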
  • FIG. 1 is a diagram illustrating a configuration of an automatic scoring apparatus for performing automatic scoring according to an embodiment of the present invention.
  • The automatic scoring apparatus 100 is an apparatus for automatically scoring an answer written by an examinee for a specific problem on the basis of one or more preset evaluation areas.
  • In particular, the automatic scoring apparatus 100 automatically calculates a score for each of one or more evaluation areas of the scoring target data using the per-area scoring models. Subsequently, the automatic scoring apparatus 100 compares the automatic scores of the evaluation areas produced by the per-area scoring models using the previously generated correlation model between the evaluation areas, and tunes the automatic score of any abnormal evaluation area whose score falls outside the preset range.
  • To this end, the automatic scoring apparatus 100 collects reference scoring data for one or more answers, for example, scores for one or more evaluation areas assigned directly by an examiner.
  • In addition, the automatic scoring apparatus 100 may extract one or more evaluation features from the one or more answers.
  • The automatic scoring apparatus 100 may then generate a scoring model for each evaluation area by performing machine learning using the evaluation features of each answer and the previously collected scoring data.
  • The automatic scoring apparatus 100 may automatically score newly input scoring target data for each evaluation area through the generated per-area scoring models.
  • the automatic scoring apparatus 100 may generate a correlation model between evaluation areas in advance using the scoring data.
  • the automatic scoring apparatus 100 may include a scoring model generator 110, a correlation model generator 120, an automatic scoring unit 130, and a score tuning unit 140.
  • the scoring model generator 110, the correlation model generator 120, the automatic scoring unit 130, and the score tuning unit 140 may be implemented in hardware or software, or a combination of hardware and software.
  • For example, the scoring model generator 110, the correlation model generator 120, the automatic scoring unit 130, and the score tuning unit 140 may be implemented as software configured to perform the functions described below, combined with a microprocessor that executes the software.
  • The scoring model generation unit 110 generates a scoring model for each evaluation area through machine learning using per-area scoring data for one or more answers previously scored by an examiner and one or more evaluation features extracted from the pre-scored answers.
  • Specifically, the scoring model generation unit 110 receives the evaluation features extracted from the one or more answers, that is, items that can be evaluated automatically (for example, word count, number of adjectives, grammatical errors, spelling errors, tense agreement, and similarity to a model answer).
  • It then machine-learns these evaluation features against the examiner's per-area scores for the answers, generating for each evaluation area a scoring model that defines the relationship between the evaluation features and the score of that area. In other words, the examiner's subjective evaluation criteria are modeled in terms of one or more automatically evaluable evaluation features (see the illustrative sketch below).
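The patent does not prescribe a particular learning algorithm or feature extractor, so the following is only a minimal sketch under assumptions: the simple text features and the linear least-squares fit stand in for the evaluation features the text lists (word count, number of adjectives, grammar and spelling errors, tense agreement, similarity to a model answer), which in practice would require NLP tooling.

```python
# A minimal sketch, not the patent's implementation: train one scoring model
# (here a weight vector) per evaluation area from examiner-scored answers.
import re
import numpy as np

def extract_features(answer: str, model_answer: str) -> np.ndarray:
    words = answer.split()
    model_words = set(model_answer.split())
    return np.array([
        len(words),                                        # word count
        len(re.findall(r"[.!?]", answer)),                 # rough sentence count
        sum(len(w) for w in words) / max(len(words), 1),   # average word length
        len(set(words) & model_words) / max(len(model_words), 1),  # crude similarity to model answer
        1.0,                                               # bias term
    ])

def train_area_models(answers, model_answer, examiner_scores):
    """examiner_scores: dict mapping each evaluation area to the list of scores
    the examiner gave to the corresponding answers. Returns one weight vector
    (the per-area scoring model) for every evaluation area."""
    X = np.stack([extract_features(a, model_answer) for a in answers])
    return {area: np.linalg.lstsq(X, np.asarray(scores, dtype=float), rcond=None)[0]
            for area, scores in examiner_scores.items()}
```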
  • The correlation model generation unit 120 models the correlations between evaluation areas present in the examiner's scoring data, reflecting language pedagogical characteristics, evaluation area characteristics, the examiner's answer evaluation characteristics, and the like. To this end, the correlation model generation unit 120 analyzes the correlations between evaluation areas using the previously scored data that was used to generate the per-area scoring models, and generates the correlation model.
  • For example, as shown in FIGS. 5A to 5C, the correlation model generation unit 120 may represent the characteristics by which evaluation areas influence one another's scores as an occurrence probability table over the score pairs of two evaluation areas.
  • In the illustrated example, first to fourth evaluation areas are set and each is scored in the range of 0 to 5 points; taking the fourth evaluation area (Rubric #4) as a reference, its score correlations with the other evaluation areas (Rubric #1, #2, #3) are analyzed and illustrated.
  • FIG. 5A illustrates the correlation between the first evaluation area (Rubric #1) and the fourth evaluation area (Rubric #4) as occurrence probabilities for each score pair, FIG. 5B illustrates the correlation between the second evaluation area (Rubric #2) and the fourth evaluation area (Rubric #4), and FIG. 5C illustrates the correlation between the third evaluation area (Rubric #3) and the fourth evaluation area (Rubric #4).
  • Through such a correlation model, the occurrence probability of each score pair between evaluation areas can be checked. For example, referring to FIG. 5C, when the third evaluation area (Rubric #3) is three points, the probability that the fourth evaluation area (Rubric #4) is zero points is 0%, one point 0.2%, two points 5.6%, three points 16.4%, four points 0.4%, and five points 0%. Therefore, when the third evaluation area (Rubric #3) is three points, the score of the fourth evaluation area (Rubric #4) is very likely to be three or two points.
  • Likewise, when the fourth evaluation area (Rubric #4) is three points, the probability that the third evaluation area (Rubric #3) is zero or one point is 0%, two points 2.8%, three points 16.4%, four points 6.6%, and five points 0.6%.
  • Through this correlation model between evaluation areas, an answer that receives a high score in the third evaluation area (Rubric #3) is found to be likely to receive a high score in the fourth evaluation area (Rubric #4), and an answer that receives a low score in the third evaluation area is found to be likely to receive a low score in the fourth evaluation area. This is because, linguistically, the evaluation areas of a particular answer are connected rather than independent of one another (a sketch of how such a table could be estimated follows below).
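A minimal sketch of how such pairwise occurrence probability tables (the format of FIGS. 5A to 5C) could be estimated from examiner-scored answers; the dictionary representation and the rubric names are illustrative assumptions.

```python
# Correlation model as pairwise occurrence-probability tables estimated from
# examiner scoring data (illustrative sketch).
from collections import Counter
from itertools import combinations

RUBRICS = ["Rubric#1", "Rubric#2", "Rubric#3", "Rubric#4"]
SCORES = range(6)  # 0..5 points

def build_correlation_models(scored_answers):
    """scored_answers: one dict of examiner scores per answer,
    e.g. {"Rubric#1": 4, "Rubric#2": 3, "Rubric#3": 3, "Rubric#4": 3}.
    Returns, for every rubric pair, a table (score_a, score_b) -> occurrence probability."""
    n = len(scored_answers)
    models = {}
    for a, b in combinations(RUBRICS, 2):   # four evaluation areas -> six pairwise models
        counts = Counter((ans[a], ans[b]) for ans in scored_answers)
        models[(a, b)] = {(sa, sb): counts[(sa, sb)] / n
                          for sa in SCORES for sb in SCORES}
    return models
```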
  • The automatic scoring unit 130 receives new scoring target data, that is, a test answer written by an examinee that is to be scored, and automatically calculates a score for each of one or more evaluation areas of the scoring target data using the per-area scoring models generated by the scoring model generation unit 110.
  • The score tuning unit 140 tunes the per-area automatic scores of the scoring target data output from the automatic scoring unit 130 using the correlation model between evaluation areas generated by the correlation model generation unit 120.
  • Specifically, the score tuning unit 140 compares the automatic scores of the evaluation areas, selects an abnormal evaluation area whose score deviates from the others by more than a preset range, and may adjust the automatic score of the abnormal evaluation area based on the correlation model between the selected abnormal evaluation area and the remaining evaluation areas.
  • FIG. 2 is a diagram illustrating a method for performing automatic scoring by applying a correlation model between evaluation areas in an automatic evaluation service system according to an exemplary embodiment of the present invention.
  • First, the automatic scoring apparatus 100 collects, in step 1101, one or more pieces of scoring data previously scored by examiners.
  • The one or more pieces of scoring data include the scores that each of one or more examiners assigned to one or more answers for the one or more evaluation areas.
  • Then, in step 1102, the automatic scoring apparatus 100 generates a scoring model for each evaluation area through machine learning based on the collected scoring data. More specifically, the automatic scoring apparatus 100 analyzes, from the answer corresponding to the scoring data, the evaluation features that can be evaluated automatically (for example, word count, number of adjectives, grammatical errors, spelling errors, tense agreement, and similarity to a model answer), and then generates, by machine learning for each evaluation area, a scoring model that calculates the per-area score from the analyzed evaluation features and the at least one piece of scoring data.
  • In step 1103, the automatic scoring apparatus 100 generates a correlation model between evaluation areas, as shown in FIGS. 5A to 5C, based on the per-area scoring data collected earlier.
  • the correlation model between evaluation areas is a structural representation of the correlation between two evaluation areas. For example, if four evaluation areas exist, six correlation models may be generated.
  • For example, the correlation model between the evaluation areas may be implemented in the form of a table defining the occurrence probability of each score pair of the two evaluation areas.
  • Subsequently, in step 1104, the automatic scoring apparatus 100 newly receives scoring target data, that is, an answer to the specific problem prepared by an examinee who took the test.
  • In step 1105, the automatic scoring apparatus 100 calculates an automatic score for each of the one or more evaluation areas of the scoring target data by applying the scoring model generated for each evaluation area. Specifically, one or more evaluation features are extracted from the new scoring target data, and the extracted features are input to each per-area scoring model to calculate the automatic score for that evaluation area (see the illustrative sketch below).
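Assuming per-area scoring models in the weight-vector form of the earlier training sketch, the inference step could look roughly like this; the rounding and clipping to the 0 to 5 scale are illustrative assumptions.

```python
# Illustrative sketch of step 1105: apply each evaluation area's scoring model
# to the evaluation features extracted from the new scoring target data.
import numpy as np

def auto_score(features: np.ndarray, area_models: dict) -> dict:
    return {area: int(np.clip(round(float(features @ w)), 0, 5))
            for area, w in area_models.items()}
```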
  • the automatic scoring scores for the evaluation areas calculated as described above may include errors because the correlations between the evaluation areas are not reflected.
  • the present invention further performs a process of tuning the automatic scoring result using the correlation model described below.
  • In step 1106, the automatic scoring apparatus 100 compares the per-area automatic scores calculated through the automatic scoring and selects an abnormal evaluation area whose correlation spacing from the other areas is outside the preset range.
  • Here, the correlation spacing may be defined as the difference between the scores of two evaluation areas, or as the probability that the automatic scores of the two evaluation areas occur together (see the illustrative sketch below).
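A minimal sketch of these two readings of the correlation spacing; treating unlikeliness as one minus the joint occurrence probability of the correlation model is an assumption, not something the text states.

```python
# Two possible definitions of the correlation spacing between the automatic
# scores of two evaluation areas (illustrative sketch).
def spacing_by_difference(score_a: int, score_b: int) -> int:
    """Absolute difference between the two automatic scores."""
    return abs(score_a - score_b)

def spacing_by_joint_probability(score_a: int, score_b: int, joint_table: dict) -> float:
    """How unlikely the two scores are to occur together, based on the pairwise
    occurrence-probability table of the correlation model (assumed form)."""
    return 1.0 - joint_table.get((score_a, score_b), 0.0)
```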
  • FIG. 6 is an example for explaining the automatic scoring method according to the present invention. The examinee number is information for identifying each examinee, the examiner's subjective scoring results for each examinee's answer are shown on the left, and the automatic scores calculated for the same answers using the per-area scoring models are shown on the right. In this example, scoring is performed on four evaluation areas (Rubric #1 to #4).
  • For the examinee with examinee number "20121102", the score of the first evaluation area (Rubric #1) was calculated as 4 points, the second evaluation area (Rubric #2) as 3 points, the third evaluation area (Rubric #3) as 3 points, and the fourth evaluation area (Rubric #4) as 0 points.
  • In this case, the automatically calculated score of the fourth evaluation area (Rubric #4) is 0 points, which differs greatly from the scores of the other evaluation areas, so the fourth evaluation area (Rubric #4) can be selected as an abnormal evaluation area.
  • For example, the selection of the abnormal evaluation area may be made, for each evaluation area, based on the difference between its own automatic score and the average of the automatic scores of the remaining evaluation areas. That is, an evaluation area whose score differs from the average of the remaining areas' automatic scores by more than a predetermined reference value is selected as an abnormal evaluation area.
  • The selection threshold for the abnormal evaluation area may be determined arbitrarily (see the illustrative sketch below).
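A minimal sketch of this selection rule; the threshold of 2 points is an arbitrary illustration, since the text notes that the criterion may be determined freely.

```python
def select_abnormal_areas(auto_scores: dict, threshold: float = 2.0) -> list:
    """Select evaluation areas whose automatic score differs from the mean of the
    remaining areas' automatic scores by more than the (arbitrary) threshold."""
    abnormal = []
    for area, score in auto_scores.items():
        others = [s for a, s in auto_scores.items() if a != area]
        if abs(score - sum(others) / len(others)) > threshold:
            abnormal.append(area)
    return abnormal

# FIG. 6 example (examinee 20121102): Rubric#4 = 0 vs. a mean of 3.33 for the others.
print(select_abnormal_areas({"Rubric#1": 4, "Rubric#2": 3, "Rubric#3": 3, "Rubric#4": 0}))
# -> ['Rubric#4']
```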
  • Next, the automatic scoring apparatus 100 tunes the automatic score of the selected abnormal evaluation area by applying the correlation model between the evaluation areas. Specifically, the automatic scoring apparatus 100 checks the automatic score of the selected abnormal evaluation area and the automatic scores of the remaining evaluation areas, and calculates, through the correlation model and based on the automatic scores of the remaining evaluation areas, the occurrence probability of each possible score (for example, 0 to 5 points) of the abnormal evaluation area. The automatic scoring apparatus 100 then obtains, for each possible score of the abnormal evaluation area, the sum of the occurrence probabilities with the automatic scores of the remaining evaluation areas, extracts the score with the highest probability sum, and performs score tuning by changing the automatic score of the abnormal evaluation area to that score (see the illustrative sketch below).
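A minimal sketch of the tuning step, assuming pairwise correlation models in the joint-probability form of the earlier sketch; in effect the tuned score is the candidate s that maximizes the sum, over the remaining areas j, of the occurrence probability of the pair (s, automatic score of area j).

```python
SCORES = range(6)  # candidate scores 0..5

def tune_abnormal_area(abnormal, auto_scores, pair_models):
    """Replace the abnormal area's automatic score with the candidate score whose
    summed occurrence probability with the remaining areas' scores is highest."""
    def pair_prob(area_a, s_a, area_b, s_b):
        # Pairwise tables are assumed to be stored once per unordered rubric pair.
        if (area_a, area_b) in pair_models:
            return pair_models[(area_a, area_b)].get((s_a, s_b), 0.0)
        return pair_models[(area_b, area_a)].get((s_b, s_a), 0.0)

    best = max(SCORES, key=lambda s: sum(
        pair_prob(abnormal, s, other, other_score)
        for other, other_score in auto_scores.items() if other != abnormal))
    tuned = dict(auto_scores)
    tuned[abnormal] = best
    return tuned
```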
  • In the example of FIG. 6, the fourth evaluation area is selected as the abnormal evaluation area from the automatic scoring result of the examinee with examinee number "20121102", and the automatic scores of the remaining first, second, and third evaluation areas are 4 points, 3 points, and 3 points, respectively.
  • In this case, the automatic scoring apparatus 100 checks the occurrence probability of each score (0 to 5 points) of the fourth evaluation area when the first evaluation area is 4 points, the occurrence probability of each score of the fourth evaluation area when the second evaluation area is 3 points, and the occurrence probability of each score of the fourth evaluation area when the third evaluation area is 3 points.
  • Then, for each possible score of the fourth evaluation area, the sum of these occurrence probabilities with the automatic scores of the first, second, and third evaluation areas is obtained, and the score of the fourth evaluation area for which the sum is maximal is detected.
  • When the automatic scores of the first to third evaluation areas (Rubric #1 to #3) are 4, 3, and 3, respectively, it can be seen that a score of 3 points in the fourth evaluation area (Rubric #4) has the highest occurrence probability, at 40%.
  • Accordingly, the automatic scoring apparatus 100 changes the automatic score of the fourth evaluation area, which was selected as the abnormal evaluation area, from 0 points to 3 points, as shown in FIG. 8.
  • As shown in FIG. 8, the final automatic scoring result of the automatic scoring apparatus 100 is thereby adjusted to be similar to the examiner's scoring result.
  • In this way, the automatic scoring apparatus 100 may calculate final automatic scoring result data through score tuning and provide final evaluation result information based on the calculated final automatic scoring result data to the examinee.
  • the automatic evaluation apparatus and method according to the present invention can be applied to an automatic evaluation service system based on a network.
  • FIG. 3 is a diagram illustrating a configuration of an automatic evaluation service system to which an automatic evaluation apparatus according to an exemplary embodiment of the present invention is applied.
  • The automatic evaluation service system may include a plurality of terminal devices 20 and an evaluation service server 30 including an automatic scoring device 100_1, connected through a communication network 10.
  • Each of the plurality of terminal devices 20 is a terminal capable of transmitting and receiving various data via the communication network 10 according to a user's key manipulation, and may be one of a tablet PC, a laptop computer, a personal computer (PC), a smart phone, a personal digital assistant (PDA), a smart TV, and a mobile communication terminal.
  • the terminal device 20 is a terminal for performing voice or data communication using the communication network 10, and stores a browser, a program, and a protocol for communicating with the evaluation service server 30 via the communication network 10.
  • the terminal device 20 may be any terminal as long as server-client communication with the evaluation service server 30 is possible, and is a broad concept including all communication computing devices such as notebook computers, mobile communication terminals, and PDAs. Meanwhile, the terminal device 20 is preferably manufactured in a form having a touch screen, but is not necessarily limited thereto.
  • In particular, the plurality of terminal devices 20 refer to terminals that receive the automatic scoring service, and each may be a terminal device of an examinee or a terminal device of an examiner.
  • The plurality of terminal devices 20 interoperate with the evaluation service server 30 through the communication network 10, receive a test answer from an examinee, transmit the test answer to the evaluation service server 30, and may receive from the evaluation service server 30 an automatic evaluation result for the test answer.
  • That is, a terminal device 20 may receive from the evaluation service server 30 the scoring result data automatically scored by applying the correlation model between evaluation areas, and present it to the user.
  • the evaluation service server 30 is a server device that performs an automatic evaluation on an answer transmitted from the terminal device 20 and provides the evaluation result.
  • In particular, the evaluation service server 30 may include an automatic scoring device 100_1 to which the correlation model according to the present invention is applied.
  • the automatic scoring apparatus 100_1 may provide an automatic scoring service in cooperation with a plurality of terminal apparatuses 20 through the communication network 10.
  • the automatic scoring apparatus 100_1 may collect scoring data for each evaluation area from the examiner and store the evaluation data in advance for each evaluation area in the database. At this time, the scoring data and evaluation data for each evaluation area may be directly input from the examiner or may be transmitted through the communication network 10.
  • The automatic scoring device 100_1 generates a scoring model for each evaluation area through machine learning using the collected scoring data and the evaluation features of each evaluation area, and, by comparing the scoring results across evaluation areas, may create a correlation model between evaluation areas that reflects language pedagogical characteristics, evaluation area characteristics, the examiner's answer evaluation characteristics, and the like.
  • When the automatic scoring apparatus 100_1 receives new scoring target data from a terminal device 20, it extracts evaluation features from the new scoring target data. The extracted evaluation features are then input to the generated per-area scoring models to calculate an automatic score for each evaluation area of the new scoring target data.
  • Subsequently, the automatic scoring apparatus 100_1 applies the generated correlation model between the evaluation areas and selects an abnormal evaluation area whose correlation spacing is greater than a predetermined reference value.
  • Then, the automatic scoring apparatus 100_1 calculates, using the correlation model and based on the automatic scores of the remaining evaluation areas other than the selected abnormal evaluation area, the occurrence probability of each possible score of the abnormal evaluation area, compares these occurrence probabilities, and applies the score with the highest probability as the automatic score of the selected abnormal evaluation area.
  • the automatic scoring apparatus 100_1 may provide the terminal apparatus 20 with the final automatic scoring score thus calculated. Since the detailed configuration of the automatic scoring apparatus 100_1 has been described with reference to FIGS. 1 and 2, a redundant description thereof will be omitted.
  • the automatic scoring method according to the present invention may be implemented and used in the form of a program mounted on the terminal device.
  • FIG. 4 is a diagram illustrating a terminal device having a program according to an automatic evaluation method according to an exemplary embodiment of the present invention.
  • the terminal device 40 may include a control unit 210, a communication unit 220, an input unit 230, a storage unit 240, and an output unit 250.
  • The terminal device 40 may be any user information processing device capable of installing and executing the automatic scoring program 100_2 according to the present invention and of performing the automatic scoring method according to the present invention.
  • For example, the terminal device 40 may be a tablet PC, a laptop computer, a personal computer (PC), a smart phone, a personal digital assistant (PDA), a smart TV, a mobile communication terminal, or the like.
  • the controller 210 controls the overall operation of the terminal device 40 and the operation related to the automatic scoring service execution.
  • In particular, the controller 210 executes an application for taking a test according to input test request information, and controls the output unit 250 to display a test question or the like on its screen.
  • Accordingly, the controller 210 receives and processes the answer to the test question, that is, the scoring target data, through the input unit 230, and stores the processed scoring target data in the storage unit 240.
  • Then, the controller 210 executes the automatic scoring program 100_2 to control automatic scoring of the new scoring target data.
  • In addition, the controller 210 controls the output unit 250 to present the final automatic scoring result information to the user through its screen.
  • the communication unit 220 is for transmitting and receiving data through a communication network.
  • the communication unit 220 may transmit and receive data through various communication methods as well as wired and wireless methods.
  • the communication unit 220 may transmit and receive data using one or more communication methods, and for this purpose, the communication unit 220 may include a plurality of communication modules that transmit and receive data according to different communication methods.
  • The input unit 230 may generate a user input signal corresponding to a user's request or information according to the user's operation, and may be implemented by various input means that are currently commercialized or may be commercialized in the future. For example, in addition to general input devices such as a keyboard, a mouse, a joystick, a touch screen, and a touch pad, it may include a gesture input means that detects a user's motion and generates a specific input signal.
  • the input unit 230 may transfer information input from the user to the controller 210. That is, the input unit 230 may receive an answer to a test question, that is, new scoring target data, from an evaluator.
  • the storage unit 240 stores information necessary for the operation of the terminal device 40, and in particular, may store information related to an automatic scoring service.
  • In particular, the automatic scoring program 100_2, programmed to perform the automatic scoring method according to the present invention, may be stored in the storage unit 240.
  • The storage unit 240 may include magnetic media such as a hard disk, a floppy disk, and magnetic tape; optical recording media such as a compact disc read-only memory (CD-ROM) and a digital video disc (DVD); magneto-optical media such as a floptical disk; and ROM, random access memory (RAM), and flash memory.
  • The output unit 250 is a means for allowing the user to recognize the operation result or state of the terminal device 40, and may include, for example, a display unit for visual output through a screen or a speaker for audible output.
  • a screen related to an automatic scoring service driven by the terminal device 40 may be displayed, and a screen for executing the automatic scoring service may be displayed according to a user's request.
  • In particular, the output unit 250 may display an answer to a test question input by an examinee, that is, the scoring target data, or display the automatic scores for the scoring target data on the screen.
  • With the above configuration, the terminal device 40 executes the automatic scoring program 100_2 to calculate an automatic score for each evaluation area of the user's answer, that is, the scoring target data input through the input unit 230, using the per-area scoring models. It then selects, using the correlation model between evaluation areas, an abnormal evaluation area whose score is out of a predetermined range, calculates the occurrence probability of each possible score of the abnormal evaluation area based on the automatic scores of the remaining evaluation areas, and changes the automatic score of the abnormal evaluation area to the score with the highest probability.
  • the terminal device 40 may provide the user with the automatic scoring result finally calculated as described above.
  • program instructions recorded in the automatic scoring program 100_2 may be those specially designed and configured for the present invention or may be known and available to those skilled in computer software.
  • The present invention relates to an automatic scoring apparatus and method in which, when scoring data is scored by one or more evaluation areas, a correlation model between evaluation areas is generated by reflecting language pedagogical characteristics, evaluation area characteristics, the examiner's answer evaluation characteristics, and the like, which makes it possible to model more realistically the implicit judgment criteria that examiners subjectively apply.
  • In addition, the present invention applies the generated correlation model between evaluation areas to select an abnormal evaluation area whose correlation spacing from the other evaluation areas is out of a predetermined range, and corrects its automatic score to the score most likely to occur given the automatic scores of the remaining evaluation areas.
  • Accordingly, the present invention is useful when applied to automatic scoring services, since it performs automatic scoring more similarly to the examiner's scoring by taking the scoring correlations between the evaluation areas into account, and can thereby contribute to the development of the service industry.

Abstract

The present invention relates to an apparatus and method for automatic scoring. According to the present invention: the implicit determination criteria of an examiner can be modeled realistically by generating a correlation model between evaluation regions based on language education characteristics, evaluation region characteristics, and the answer evaluation characteristics of the examiner; one or more evaluation regions of scoring target data are automatically scored by applying a pre-generated scoring model for each evaluation region; and reliable automatic scoring results can be obtained by using the correlation model between evaluation regions to tune the automatic scores of the one or more evaluation regions.

Description

Apparatus and method for automatic scoring
The present invention relates to an automatic scoring technique for automatically scoring a user's answer through machine learning, and more particularly, to an automatic scoring apparatus and method for automatically scoring target data in consideration of the correlations between evaluation areas.
The contents described in this section merely provide background information on the present embodiment and do not constitute prior art.
With the development of communication technology, it has recently become possible to take language tests and simple level tests over a network; to this end, a server device that provides a test scores the test and provides the scoring result. Conventionally, in order to grade the answers to such tests, grading results were provided by having a person grade the answers directly and input the grading data into a server device.
However, this scoring method requires a great deal of manpower for scoring and considerable time to check the scoring results, so it is difficult to provide a fast service.
To improve on this, automatic scoring systems that score answers automatically through machine learning, rather than relying on human graders, have recently been developed. Such a conventional automatic scoring system collects examiners' subjective scoring data for a number of existing answers, analyzes the items in each answer that can be evaluated by machine learning (evaluation features), generates a scoring model based on the machine-evaluable items and the examiners' subjective scoring results through machine learning, and performs automatic scoring by analyzing the similarity of answers against the generated scoring model.
However, owing to the characteristics of language pedagogy, the scoring areas are not completely mutually exclusive, and an examiner's scores for the different evaluation areas influence one another. Because the conventional automatic scoring system does not reflect these characteristics, its automatic scores agree poorly with the examiner's scoring results and its accuracy suffers.
The present invention is proposed to address these drawbacks and aims to provide an automatic scoring apparatus and method that, when scoring target data containing a user-written answer automatically through machine learning, score the target data in consideration of the correlations between the evaluation areas.
The present invention also aims to provide an automatic scoring apparatus and method that generate a correlation model between evaluation areas by reflecting language pedagogical characteristics, evaluation area characteristics, the examiner's answer evaluation characteristics, and the like, and that apply the generated correlation model to compensate for errors in the scoring model of each evaluation area.
As a means of solving the problem, the present invention provides an automatic scoring apparatus including: an automatic scoring unit that performs automatic scoring for each evaluation area on scoring target data by applying a pre-generated scoring model for each evaluation area; and a score tuning unit that calculates a final automatic score by adjusting the per-area automatic scores of the scoring target data output from the automatic scoring unit according to a correlation model between evaluation areas.
The automatic scoring apparatus according to the present invention may further include at least one of: a scoring model generation unit that generates the scoring model for each evaluation area through machine learning using previously scored data, in which the one or more evaluation areas were evaluated for one or more answers, and one or more evaluation features extracted from the answers; and a correlation model generation unit that generates the correlation model between evaluation areas, which defines the probability of each score occurring across the one or more evaluation areas, based on the previously scored data.
In the automatic scoring apparatus according to the present invention, the score tuning unit may compare the automatic scores of the evaluation areas, select an abnormal evaluation area whose scoring-correlation deviation from the other evaluation areas is larger than a preset range, and tune the automatic score of the abnormal evaluation area using the correlation model between the evaluation areas.
Also, in the automatic scoring apparatus according to the present invention, the score tuning unit may use the correlation model to calculate, based on the automatic scores of the remaining evaluation areas other than the abnormal evaluation area, the occurrence probability of each possible score of the selected abnormal evaluation area, and may change the automatic score of the abnormal evaluation area to the score having the highest probability.
In addition, as another means for solving the above-described problem, an embodiment of the present invention provides an automatic scoring method including: performing automatic scoring for each of one or more evaluation areas on scoring target data by applying a pre-generated scoring model for each evaluation area; and tuning the automatic scores of the one or more evaluation areas using a correlation model between evaluation areas.
In the automatic scoring method according to an embodiment of the present invention, the tuning may include: comparing the automatic scores of the evaluation areas and selecting an abnormal evaluation area whose deviation is larger than a preset range; calculating the occurrence probability of each possible score of the selected abnormal evaluation area based on the automatic scores of the remaining evaluation areas other than the abnormal evaluation area; and changing the automatic score of the abnormal evaluation area to the score having the highest probability.
In addition, before the automatic scoring is performed, the automatic scoring method according to an embodiment of the present invention may further include at least one of: generating the scoring model for each evaluation area through machine learning using previously scored data, in which the one or more evaluation areas were evaluated for one or more answers, and one or more evaluation features extracted from the answers; and generating the correlation model between evaluation areas, which defines the probability of each score occurring across the one or more evaluation areas, based on the previously scored data.
In addition, the present invention provides a computer-readable recording medium on which a program for executing the above-described automatic scoring method is recorded.
The present invention relates to a technique for automatically evaluating an answer written by a user in one or more language areas including speaking, listening, and writing. In particular, in evaluating one or more evaluation areas for an answer written by a test taker, a correlation model between the evaluation areas is created that reflects language pedagogical characteristics, evaluation area characteristics, the examiner's answer evaluation characteristics, and the like, so that the implicit judgment criteria for the evaluation areas can be modeled more realistically.
In addition, by applying the generated correlation model between the evaluation areas, the present invention reflects the correlations that can appear between evaluation areas, minimizes errors relative to the examiner's answer evaluation characteristics when performing automatic scoring through the per-area scoring models, and increases the reliability of the evaluation result.
FIG. 1 is a diagram illustrating an automatic scoring apparatus according to an exemplary embodiment of the present invention.
FIG. 2 is a diagram illustrating a method for performing automatic scoring by applying a correlation model between evaluation areas according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the configuration of an automatic evaluation service system to which an automatic scoring apparatus according to the present invention is applied.
FIG. 4 is a diagram illustrating a terminal device to which an automatic scoring method according to an exemplary embodiment of the present invention is applied.
FIGS. 5A to 5C are correlation tables between evaluation areas for describing a correlation model between evaluation areas according to an embodiment of the present invention.
FIGS. 6 to 8 are diagrams showing an example of an automatic scoring process to which a correlation model between evaluation areas is applied according to an embodiment of the present invention.
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, detailed descriptions of well-known functions or configurations that may obscure the subject matter of the present invention are omitted from the following description and the accompanying drawings. It should also be noted that, throughout the drawings, like elements are denoted by the same reference numerals wherever possible.
The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings; based on the principle that an inventor may appropriately define terms to describe his or her own invention in the best way, they should be interpreted with meanings and concepts consistent with the technical spirit of the present invention. Therefore, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical spirit, and it should be understood that various equivalents and modifications could replace them at the time of filing.
이하의 설명에 있어서,"평가영역"은 특정 평가 시험과 관련하여 시험관간의 채점을 정형화하기 위하여 설정된 채점 기준으로서, 채점 영역 및 그 채점 영역의 평가 내용으로 정의될 수 있다. 예를 들어, 외국어에 대한 말하기 평가의 경우, 평가영역은 유창성, 언어사용, 구성력, 발음으로 이루어진 채점 영역을 포함할 수 있다. 여기서, 유창성은 발화속도의 적절성, 머뭇거림이 없이 자연스런 발화 유지 정도를 평가하는 요소이다. 언어사용은 표현의 정확성 및 어휘 사용의 적절성을 평가하는 요소이다. 구성력은 발화의 논리적 연결성 및 발화 내용의 일관성/응집성을 평가하는 요소이다. 발음은 발음의 명확성, 이해 가능 정도를 평가하는 요소이다. 본 발명에서는 이러한 기 설정된 하나 이상의 평가영역에 대한 자동 채점을 구현하고자 한다.In the following description, " evaluation area " is a grading criterion set to standardize in-vitro scoring in relation to a specific evaluation test, and can be defined as a scoring area and evaluation contents of the scoring area. For example, in the case of a speech evaluation for a foreign language, the evaluation area may include a scoring area consisting of fluency, language use, compositional power, and pronunciation. Here, fluency is an element for evaluating the degree of natural ignition without appropriateness and hesitation. Language usage is a factor in evaluating the correctness of expression and the adequacy of vocabulary usage. Constructivity is a factor that evaluates the logical connectivity of speech and the consistency / aggregation of speech content. Pronunciation is a factor that assesses the clarity and comprehension of pronunciation. In the present invention, it is intended to implement automatic scoring of one or more predetermined evaluation areas.
우선, 본 발명의 실시 예에 따른 자동 채점 장치 및 방법에 대해 첨부된 도면을 참조하여 구체적으로 설명하기로 한다.First, an automatic scoring apparatus and method according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
도 1은 본 발명의 실시 예에 따른 자동 채점을 수행하기 위한 자동 채점 장치의 구성을 도시한 도면이다.1 is a diagram illustrating a configuration of an automatic scoring apparatus for performing automatic scoring according to an embodiment of the present invention.
도 1을 참조하면, 본 발명의 실시 예에 따른 자동 채점 장치(100)는 본 발명에 따라서 특정 문제에 대하여 피 평가자가 작성한 답안을 기 설정된 하나 이상의 평가영역을 기준으로 자동 채점하기 위한 장치이다. 특히 본 발명에 의한 자동 채점 장치(100)는, 하나 이상의 평가영역 별 채점 모델을 이용하여, 채점 대상 데이터에 대하여 하나 이상의 평가영역 별 점수를 자동으로 산출한다. 이어서 자동 채점 장치(100)는 기 생성된 평가영역 간 상관관계 모델을 이용하여 평가영역 별 채점 모델로 채점한 각 평가영역의 자동 채점 점수를 비교하여, 기 설정된 범위를 벗어나는 점수를 갖는 이상(異常) 평가영역의 자동 채점 점수를 튜닝한다.Referring to FIG. 1, the automatic scoring apparatus 100 according to an exemplary embodiment of the present invention is an apparatus for automatically scoring an answer written by an evaluator for a specific problem based on one or more preset evaluation areas. In particular, the automatic scoring apparatus 100 according to the present invention automatically calculates the scores of one or more evaluation areas for the scoring target data using one or more evaluation area scoring models. Subsequently, the automatic scoring apparatus 100 compares the automatic scoring scores of each evaluation area scored by the scoring model for each evaluation area by using the correlation model between the evaluation areas that have been previously generated, and has an abnormal score having a score outside the preset range. ) Tune the automatic scoring of the evaluation area.
이를 위하여, 자동 채점 장치(100)는 하나 이상의 답안에 대하여 기준이 되는 채점 데이터, 예를 들어, 시험관에 의해 직접 채점된 하나 이상의 평가영역에 대한 채점 데이터를 수집한다. 또한, 자동 채점 장치(100)는 상기 하나 이상의 답안으로부터 하나 이상의 평가자질을 추출할 수 있다. 그리고, 상기 자동 채점 장치(100)는 각 답안 별 평가자질과 기 채점 데이터를 이용하여 기계 학습을 수행함으로써, 평가영역 별 채점 모델을 생성할 수 있다.To this end, the automatic scoring apparatus 100 collects scoring data which is a reference for one or more answers, for example, scoring data for one or more evaluation areas that are directly scored by the examiner. In addition, the automatic scoring device 100 may extract one or more evaluation qualities from the one or more answers. In addition, the automatic scoring apparatus 100 may generate a scoring model for each evaluation area by performing machine learning using the evaluation quality for each answer and the previous scoring data.
Using the generated per-area scoring models, the automatic scoring apparatus 100 may automatically score newly input target data for each evaluation area.
In addition, the automatic scoring apparatus 100 may generate, in advance, a correlation model between evaluation areas using the scoring data.
The automatic scoring apparatus 100 may include a scoring model generator 110, a correlation model generator 120, an automatic scoring unit 130, and a score tuning unit 140. The scoring model generator 110, the correlation model generator 120, the automatic scoring unit 130, and the score tuning unit 140 may be implemented in hardware, in software, or in a combination of hardware and software. For example, they may be implemented as software programmed to perform the functions described below, combined with a microprocessor that executes the software.
The scoring model generator 110 generates a scoring model for each evaluation area through machine learning, using per-area scoring data for one or more answers previously scored by examiners and one or more evaluation features extracted from those answers.
Specifically, the scoring model generator 110 receives evaluation features extracted from one or more answers, that is, items that can be evaluated automatically (for example, word count, number of adjectives, grammatical errors, spelling errors, tense agreement, similarity to a model answer, and so on). The scoring model generator 110 then machine-learns these evaluation features together with the examiners' per-area scoring data for the answers, thereby generating a per-area scoring model that defines the relationship between the evaluation features and the score for each evaluation area. In other words, the examiners' subjective evaluation criteria are modeled on the basis of one or more automatically evaluable features.
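As a rough illustration of this step, the following Python sketch trains one scoring model per evaluation area from automatically extractable features and examiner scores, and then scores a new answer. It is only an illustrative sketch: the patent does not specify the learning algorithm, so ordinary least-squares regression stands in for it, and the feature values, score data, and area names are hypothetical.

from sklearn.linear_model import LinearRegression

# Hypothetical feature vectors extracted from three answers:
# [word count, adjective count, grammar errors, spelling errors, similarity to model answer]
features = [
    [120, 14, 2, 1, 0.82],
    [45, 3, 9, 6, 0.31],
    [98, 10, 4, 2, 0.67],
]
# Hypothetical examiner scores (0 to 5) for the same answers, one list per evaluation area.
examiner_scores = {
    "fluency": [4, 1, 3],
    "language_use": [4, 2, 3],
    "organization": [3, 1, 3],
    "pronunciation": [4, 2, 3],
}
# One model per evaluation area, learned from the features and the examiner scores.
scoring_models = {}
for area, scores in examiner_scores.items():
    model = LinearRegression()
    model.fit(features, scores)
    scoring_models[area] = model
# Scoring a new answer: predict per area, then clamp and round to the 0-5 scale used in the embodiment.
new_features = [[110, 12, 3, 1, 0.74]]
auto_scores = {
    area: int(round(min(5.0, max(0.0, model.predict(new_features)[0]))))
    for area, model in scoring_models.items()
}
print(auto_scores)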
The correlation model generator 120 models the correlations between evaluation areas found in the scoring data produced by examiners, reflecting language-pedagogical characteristics, the characteristics of the evaluation areas, the examiners' answer-evaluation characteristics, and the like. To this end, the correlation model generator 120 analyzes the correlation between evaluation areas using the previously scored data used to generate the per-area scoring models, and generates a correlation model.
For example, as shown in the accompanying FIGS. 5A to 5C, the correlation model generator 120 may define the mutual influence of scoring between evaluation areas as a table of occurrence probabilities for each pair of score ranges. In this embodiment, first to fourth evaluation areas are set and each evaluation area is scored in the range of 0 to 5; the score correlations between the fourth evaluation area (Rubric #4) and the other evaluation areas (Rubric #1, #2, #3) are analyzed as an example. Specifically, FIG. 5A shows the correlation between the first evaluation area (Rubric #1) and the fourth evaluation area (Rubric #4) as the occurrence probability of each score pair, FIG. 5B shows the correlation between the second evaluation area (Rubric #2) and the fourth evaluation area (Rubric #4) as the occurrence probability of each score pair, and FIG. 5C shows the correlation between the third evaluation area (Rubric #3) and the fourth evaluation area (Rubric #4).
Using such a correlation model, the probability of score combinations between evaluation areas can be checked. For example, referring to FIG. 5C, when the third evaluation area (Rubric #3) is 3 points, the probability that the fourth evaluation area (Rubric #4) is 0 points is 0%, 1 point is 0.2%, 2 points is 5.6%, 3 points is 16.4%, 4 points is 0.4%, and 5 points is 0%. Therefore, when the third evaluation area (Rubric #3) is 3 points, the score of the fourth evaluation area (Rubric #4) is very likely to be 3 or 2 points. Likewise, when the fourth evaluation area (Rubric #4) is 3 points, the probability that the third evaluation area (Rubric #3) is 0 or 1 point is 0%, 2 points is 2.8%, 3 points is 16.4%, 4 points is 6.6%, and 5 points is 0.6%. The correlation model between evaluation areas thus shows that an answer that receives a high score in the third evaluation area (Rubric #3) is likely to receive a high score in the fourth evaluation area (Rubric #4) as well, and an answer that receives a low score in the third evaluation area is likely to receive a low score in the fourth evaluation area. This is because the evaluation areas for a given answer are not independent of one another but are linked in language-pedagogical terms.
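A minimal sketch of how such a correlation model could be estimated from examiner scoring data is given below. The pairwise tables of score-pair probabilities mirror the per-score-range occurrence tables of FIGS. 5A to 5C in form only; the rubric names and the toy records are assumptions, not data from the drawings.

from collections import Counter
from itertools import combinations

# Hypothetical examiner scores (0 to 5) for four rubrics over a few answers.
scored_answers = [
    {"Rubric #1": 4, "Rubric #2": 3, "Rubric #3": 3, "Rubric #4": 3},
    {"Rubric #1": 2, "Rubric #2": 2, "Rubric #3": 1, "Rubric #4": 2},
    {"Rubric #1": 5, "Rubric #2": 4, "Rubric #3": 4, "Rubric #4": 4},
    {"Rubric #1": 3, "Rubric #2": 3, "Rubric #3": 3, "Rubric #4": 3},
]

def build_correlation_model(records, areas):
    # Returns {(area_a, area_b): {(score_a, score_b): occurrence probability}}.
    model = {}
    for a, b in combinations(areas, 2):  # four areas give six pairwise tables
        counts = Counter((record[a], record[b]) for record in records)
        total = sum(counts.values())
        model[(a, b)] = {pair: n / total for pair, n in counts.items()}
    return model

areas = ["Rubric #1", "Rubric #2", "Rubric #3", "Rubric #4"]
correlation_model = build_correlation_model(scored_answers, areas)
print(correlation_model[("Rubric #3", "Rubric #4")])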
The automatic scoring unit 130 receives new target data to be scored, that is, a test answer submitted by an examinee, and automatically calculates a score for each of the one or more evaluation areas using the per-area scoring models generated by the scoring model generator 110.
The score tuning unit 140 then tunes the per-area automatic scores for the target data output from the automatic scoring unit 130, using the correlation model between evaluation areas generated by the correlation model generator 120. Specifically, the score tuning unit 140 compares the per-area automatic scores, selects an anomalous evaluation area whose correlation deviation exceeds a preset range, and adjusts the automatic score of the anomalous evaluation area on the basis of the correlation model between the selected anomalous evaluation area and the remaining evaluation areas.
Next, the automatic scoring method according to an embodiment of the present invention, implemented in the automatic scoring apparatus configured as described above, will be described in detail with reference to FIG. 2.
FIG. 2 is a diagram illustrating a method for performing automatic scoring by applying a correlation model between evaluation areas in an automatic evaluation service system according to an embodiment of the present invention.
Referring to FIG. 2, the automatic scoring apparatus 100 according to an embodiment of the present invention collects, in step 1101, one or more sets of scoring data previously scored by examiners. The scoring data includes information about the scores that one or more examiners assigned to one or more answers for each of the one or more evaluation areas.
Then, in step 1102, the automatic scoring apparatus 100 generates a scoring model for each evaluation area through machine learning based on the collected scoring data. More specifically, the automatic scoring apparatus 100 analyzes, from the answers corresponding to the per-area scoring data, evaluation features that can be evaluated automatically (for example, word count, number of adjectives, grammatical errors, spelling errors, tense agreement, similarity to a model answer, and so on). The analyzed evaluation features and the scoring data are then machine-learned for each evaluation area to generate a per-area scoring model that calculates a score for each evaluation area based on the automatically evaluable features.
In addition, in step 1103, the automatic scoring apparatus 100 generates a correlation model between evaluation areas based on the collected per-area scoring data, as shown in the accompanying FIGS. 5A to 5C. A correlation model between evaluation areas structures the correlation between two evaluation areas; for example, when there are four evaluation areas, six correlation models may be generated. Here, each correlation model between two evaluation areas may be implemented in a form that defines the occurrence probability of each score pair between the two areas.
Thereafter, in step 1104, the automatic scoring apparatus 100 newly receives target data to be scored, written by an examinee for a specific question.
When new target data is input, the automatic scoring apparatus 100 calculates, in step 1105, an automatic score for each of the one or more evaluation areas by applying the per-area scoring models to the target data. Specifically, one or more evaluation features are extracted from the new target data, and the extracted features are input to the per-area scoring models to calculate an automatic score for each evaluation area.
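By way of illustration, the following sketch extracts a handful of automatically computable features from a written answer. The concrete features (word count, a crude spelling-suspect count, and vocabulary overlap with a model answer as a similarity proxy) and the helper function name are simplified stand-ins for whatever feature set an actual embodiment would use.

import re

def extract_features(answer, model_answer):
    # Tokenize both texts into lowercase word tokens.
    words = re.findall(r"[A-Za-z']+", answer.lower())
    model_words = set(re.findall(r"[A-Za-z']+", model_answer.lower()))
    word_count = len(words)
    # Naive similarity proxy: overlap of the answer's vocabulary with the model answer.
    overlap = len(set(words) & model_words) / max(1, len(model_words))
    # Very rough spelling-suspect proxy: tokens containing the same letter three times in a row.
    spelling_suspects = sum(1 for w in words if re.search(r"(.)\1\1", w))
    return [word_count, spelling_suspects, round(overlap, 3)]

print(extract_features(
    "The city librarrry is open evry day and I often study there.",
    "The library is open daily and students frequently study there.",
))  # [12, 1, 0.6] for this toy pair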
The per-area automatic scores calculated in this way may contain errors because the correlation between evaluation areas is not yet reflected. For this reason, the present invention further performs a process of tuning the automatic scoring result using the correlation model, as described below.
Specifically, in step 1106, the automatic scoring apparatus 100 compares the per-area automatic scores calculated through the automatic scoring and selects an anomalous evaluation area whose correlation deviation falls outside a preset range. Here, the correlation deviation may be defined as the score difference between two evaluation areas or as the probability that the automatic scores of two evaluation areas occur together.
FIG. 6 is an example for explaining the automatic scoring method according to the present invention. The examinee number identifies each examinee; the examiners' subjective scoring results for each examinee's answer are shown on the left, and the automatic scores calculated for the same answers using the per-area scoring models are shown on the right. Here, scoring is performed for four evaluation areas (Rubric #1 to #4).
For example, when the answer of the examinee with examinee number "20121102" shown in FIG. 6 was automatically scored using the per-area scoring models, the first evaluation area (Rubric #1) received 4 points, the second evaluation area (Rubric #2) 3 points, the third evaluation area (Rubric #3) 3 points, and the fourth evaluation area (Rubric #4) 0 points. In this case, when an anomalous evaluation area whose correlation deviation falls outside the preset range is selected according to step 1106, the fourth evaluation area (Rubric #4), whose automatic score of 0 differs greatly from the scores of the other evaluation areas, can be selected as the anomalous evaluation area. Here, the selection of the anomalous evaluation area may be made, for each evaluation area, based on the difference between its own automatic score and the average of the automatic scores of the remaining evaluation areas. That is, an evaluation area whose score differs from the average of the remaining areas' automatic scores by a preset reference value or more is selected as an anomalous evaluation area. The selection criterion δ for the anomalous evaluation area may be determined arbitrarily.
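A minimal sketch of this selection rule, assuming an arbitrarily chosen deviation threshold δ of 2.5 points, is shown below; an area is flagged when its automatic score differs from the mean of the other areas' scores by δ or more.

def select_anomalous_areas(auto_scores, delta=2.5):
    # Flag every area whose score differs from the mean of the other areas by delta or more.
    anomalous = []
    for area, score in auto_scores.items():
        others = [s for a, s in auto_scores.items() if a != area]
        if abs(score - sum(others) / len(others)) >= delta:
            anomalous.append(area)
    return anomalous

auto_scores = {"Rubric #1": 4, "Rubric #2": 3, "Rubric #3": 3, "Rubric #4": 0}
print(select_anomalous_areas(auto_scores))  # ['Rubric #4'] with delta = 2.5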
Thereafter, in step 1107, the automatic scoring apparatus 100 tunes the automatic score of the selected anomalous evaluation area by applying the correlation model between evaluation areas. Specifically, the automatic scoring apparatus 100 checks the automatic score of the selected anomalous evaluation area and the automatic scores of the remaining evaluation areas, and calculates, through the correlation model, the occurrence probability of each possible score of the anomalous evaluation area (for example, 0 to 5 points) given the automatic scores of the remaining evaluation areas. The automatic scoring apparatus 100 then obtains, for each candidate score of the anomalous evaluation area, the sum of the probabilities of the remaining areas' automatic scores occurring together with that candidate, and extracts the candidate score with the highest total probability. The automatic scoring apparatus 100 can then perform score tuning by changing the automatic score of the selected anomalous evaluation area to the score with the highest probability.
Referring to the example shown in FIG. 6, the fourth evaluation area was selected as the anomalous evaluation area from the automatic scoring result for the examinee with examinee number "20121102", and the automatic scores of the remaining first, second, and third evaluation areas were 4, 3, and 3 points, respectively. In this case, as shown in FIG. 7, the automatic scoring apparatus 100 checks the occurrence probability of each score (0 to 5 points) of the fourth evaluation area when the first evaluation area is 4 points, when the second evaluation area is 3 points, and when the third evaluation area is 3 points. Then, for each candidate score of the fourth evaluation area, the sum of the probabilities of the automatic scores of the first, second, and third evaluation areas occurring is obtained, and the score of the fourth evaluation area for which this sum is maximal is detected. Referring to the example of FIG. 7, when the automatic scores of the first to third evaluation areas (Rubric #1 to #3) are 4, 3, and 3, respectively, a score of 3 in the fourth evaluation area (Rubric #4) has the highest occurrence probability, 40%.
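The tuning step itself can be sketched as follows. The probability values in the toy table are invented for illustration and are not the figures of FIG. 7, but the selection logic follows the procedure described above: for each candidate score of the flagged area, the modelled probabilities of the other areas' scores co-occurring with that candidate are summed, and the candidate with the largest sum is kept.

# Made-up fragment of the pairwise correlation model:
# (other area, that area's automatic score, candidate score for the flagged area) -> probability
pair_probability = {
    ("Rubric #1", 4, 3): 0.15, ("Rubric #1", 4, 4): 0.10,
    ("Rubric #2", 3, 3): 0.14, ("Rubric #2", 3, 2): 0.06,
    ("Rubric #3", 3, 3): 0.16, ("Rubric #3", 3, 2): 0.05,
}

def tune_score(other_scores, score_range=range(6)):
    # For each candidate score of the flagged area, sum the probabilities of the
    # other areas' scores co-occurring with it, and keep the best candidate.
    best_score, best_total = None, -1.0
    for candidate in score_range:
        total = sum(
            pair_probability.get((area, score, candidate), 0.0)
            for area, score in other_scores.items()
        )
        if total > best_total:
            best_score, best_total = candidate, total
    return best_score

others = {"Rubric #1": 4, "Rubric #2": 3, "Rubric #3": 3}
print(tune_score(others))  # 3: the candidate with the largest summed probability in this toy table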
Accordingly, as shown in FIG. 8, the automatic scoring apparatus 100 according to the present invention changes the automatic score of the fourth evaluation area, selected as the anomalous evaluation area, from 0 to 3.
As a result, the final automatic scoring result of the automatic scoring apparatus 100 is adjusted to be similar to the result scored by the examiners, as shown in FIG. 8.
Thereafter, in step 1108, the automatic scoring apparatus 100 calculates final automatic scoring result data through the score tuning and may provide final automatic scoring result information for the calculated data to the examinee.
The automatic scoring apparatus and method according to the present invention can be applied to a network-based automatic evaluation service system.
FIG. 3 is a diagram illustrating the configuration of an automatic evaluation service system to which an automatic scoring apparatus according to an embodiment of the present invention is applied.
Referring to FIG. 3, the automatic evaluation service system may be configured with a plurality of terminal devices 20 connected through a communication network 10 and an evaluation service server 30 including an automatic scoring apparatus 100_1.
The plurality of terminal devices 20 are terminals capable of transmitting and receiving various data via the communication network 10 according to a user's key operations, and each may be any one of a tablet PC, a laptop, a personal computer (PC), a smartphone, a personal digital assistant (PDA), a smart TV, a mobile communication terminal, and the like. In addition, each terminal device 20 is a terminal that performs voice or data communication using the communication network 10, and includes a memory storing a browser, programs, and protocols for communicating with the evaluation service server 30 via the communication network 10, and a microprocessor for executing and controlling various programs. That is, the terminal device 20 may be any terminal capable of server-client communication with the evaluation service server 30, and is a broad concept covering communication computing devices such as notebook computers, mobile communication terminals, and PDAs. The terminal device 20 is preferably manufactured with a touch screen, but is not necessarily limited thereto.
In particular, the plurality of terminal devices 20 according to an embodiment of the present invention are terminals for receiving the automatic scoring service, and each may be a terminal device of an examinee or of an examiner. These terminal devices 20 interwork with the evaluation service server 30 through the communication network 10, receive a test answer from an examinee and transmit it to the evaluation service server 30, and may receive the automatic evaluation result for the test answer from the evaluation service server 30. In particular, they may receive, from the evaluation service server 30, scoring result data automatically scored by applying the correlation model between evaluation areas, and present it to the user.
The evaluation service server 30 is a server device that performs automatic evaluation of an answer transmitted from a terminal device 20 and provides the evaluation result, and may include the automatic scoring apparatus 100_1 to which the correlation model according to the present invention is applied.
The automatic scoring apparatus 100_1 may provide the automatic scoring service in cooperation with the plurality of terminal devices 20 through the communication network 10. The automatic scoring apparatus 100_1 may collect per-area scoring data from examiners and store it in advance in a database for each evaluation area. The per-area scoring data and evaluation data may be input directly by the examiners or transmitted through the communication network 10.
In addition, the automatic scoring apparatus 100_1 generates per-area scoring models through machine learning using the collected per-area scoring data and evaluation features, and also compares the scoring results of the evaluation areas to generate a correlation model between evaluation areas that reflects language-pedagogical characteristics, evaluation area characteristics, the examiners' answer-evaluation characteristics, and the like. When new target data is received from a terminal device 20, the automatic scoring apparatus 100_1 extracts evaluation features from the new target data, inputs the extracted features to the generated per-area scoring models, and calculates per-area automatic scores for the new target data. The automatic scoring apparatus 100_1 then applies the generated correlation model between evaluation areas to select an anomalous evaluation area whose correlation deviation is greater than a preset reference value. The automatic scoring apparatus 100_1 calculates, using the correlation model, the occurrence probability of each possible score of the anomalous evaluation area based on the automatic scores of the remaining evaluation areas, compares these probabilities, and applies the score with the highest probability as the automatic score of the selected anomalous evaluation area. The automatic scoring apparatus 100_1 may provide the final automatic score calculated in this way to the corresponding terminal device 20. Since the detailed configuration of the automatic scoring apparatus 100_1 has been described with reference to FIGS. 1 and 2, redundant description is omitted.
In addition, the automatic scoring method according to the present invention may be implemented and used in the form of a program installed on a terminal device.
FIG. 4 is a diagram illustrating a terminal device equipped with a program implementing the automatic evaluation method according to an embodiment of the present invention.
Referring to FIG. 4, the terminal device 40 may include a controller 210, a communication unit 220, an input unit 230, a storage unit 240, and an output unit 250. The terminal device 40 is a user information processing device capable of installing and executing the automatic scoring program 100_2 according to the present invention and thereby performing the automatic scoring method according to the present invention; any terminal capable of installing and executing programs may be used. For example, the terminal device 40 may be any one of a tablet PC, a laptop computer, a personal computer (PC), a smartphone, a personal digital assistant (PDA), a smart TV, a mobile communication terminal, and the like.
The controller 210 controls the overall operation of the terminal device 40 and operations related to execution of the automatic scoring service. In particular, when the controller 210 receives a user's test-taking request signal from the input unit 230, it executes an application for taking a test according to the input request information and controls the output unit 250 to display test questions and the like on its screen. Accordingly, the controller 210 receives and processes, through the input unit 230, information on the answers to the test questions, that is, the target data to be scored, and stores the processed target data in the storage unit 240. The controller 210 then executes the automatic scoring program 100_2 to automatically score the new data. The controller 210 also controls the output unit 250 to present the final automatic scoring result information to the user on its screen.
The communication unit 220 transmits and receives data through a communication network; it may transmit and receive data not only by wired and wireless methods but also through various other communication methods. In addition, the communication unit 220 may transmit and receive data using one or more communication methods, and for this purpose it may include a plurality of communication modules that transmit and receive data according to different communication methods.
The input unit 230 may generate a user input signal corresponding to a user's request or information according to the user's operation, and may be implemented by various input means that are currently commercialized or may be commercialized in the future, including not only general input devices such as a keyboard, a mouse, a joystick, a touch screen, and a touch pad, but also gesture input means that detect a user's motion and generate a specific input signal. The input unit 230 may transfer information input by the user to the controller 210. That is, the input unit 230 may receive from an examinee the answer to a test question, that is, new target data to be scored.
The storage unit 240 stores information necessary for the operation of the terminal device 40 and, in particular, may store information related to the automatic scoring service, including the automatic scoring program 100_2 programmed to perform the automatic scoring method according to the present invention. The storage unit 240 may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROM (Compact Disk Read Only Memory) and DVD (Digital Video Disk); magneto-optical media such as floptical disks; and ROM, RAM (Random Access Memory), and flash memory.
The output unit 250 is a means for presenting the operation results or status of the terminal device 40 so that the user can perceive them, and may include, for example, a display unit that outputs information visually on a screen and a speaker that outputs audible sound. In particular, it may display screens related to the automatic scoring service running on the terminal device 40 and, at the user's request, display a screen for executing the automatic scoring service. The output unit 250 may also display the answer to a test question input by the examinee, that is, the target data to be scored, or display the automatic score for the target data on the screen.
That is, the terminal device 40 executes the automatic scoring program 100_2 to calculate, using the per-area scoring models, per-area automatic scores for the user's answer input through the input unit 230, that is, the target data to be scored; then, using the correlation model between evaluation areas, it extracts an anomalous evaluation area whose correlation deviation falls outside a preset range, calculates the occurrence probability of each possible score of the anomalous evaluation area based on the automatic scores of the remaining evaluation areas, and changes the automatic score of the anomalous evaluation area to the score with the highest probability. The terminal device 40 may then provide the user with the final automatic scoring result calculated as described above.
Here, the program instructions recorded in the automatic scoring program 100_2 may be those specially designed and configured for the present invention, or may be those known to and usable by those skilled in the computer software art.
Meanwhile, the embodiments of the present invention disclosed in this specification and the drawings are merely specific examples presented to aid understanding and are not intended to limit the scope of the present invention. It is obvious to those of ordinary skill in the art to which the present invention pertains that other modifications based on the technical idea of the present invention can be practiced in addition to the embodiments disclosed herein.
The present invention relates to an automatic scoring apparatus and method. In scoring evaluation data for one or more evaluation areas, a correlation model between evaluation areas is generated by reflecting language-pedagogical characteristics, evaluation area characteristics, the examiners' answer-evaluation characteristics, and the like, which makes it possible to model more realistically the implicit judgment criteria that examiners apply subjectively.
In addition, the present invention applies the generated correlation model between evaluation areas to select an anomalous evaluation area whose correlation deviation falls outside a preset range and tunes its score to the score most likely to occur given the automatic scores of the remaining evaluation areas; since scoring can thereby be made more similar to the examiners' subjective scoring data, automatic evaluation performance can be improved.
The present invention is thus a useful invention that, when applied to an automatic scoring service, produces the effect of performing automatic scoring more similarly to examiners' scoring of test answers by taking the scoring correlations between evaluation areas into account, and can thereby contribute to the development of the service industry.

Claims (10)

1. An automatic scoring apparatus comprising:
an automatic scoring unit configured to perform automatic scoring of target data for each evaluation area by applying a previously generated scoring model for each evaluation area; and
a score tuning unit configured to calculate a final automatic score by tuning the per-area automatic scores for the target data output from the automatic scoring unit according to a correlation model between evaluation areas.
2. The automatic scoring apparatus of claim 1, further comprising:
a scoring model generator configured to generate the per-area scoring model through machine learning using previously scored data in which one or more evaluation areas were evaluated for one or more answers and one or more evaluation features extracted from the one or more answers.
3. The automatic scoring apparatus of claim 2, further comprising:
a correlation model generator configured to generate, based on the previously scored data, the correlation model between evaluation areas, which defines the probability of each score occurring between the one or more evaluation areas.
4. The automatic scoring apparatus of claim 1, wherein the score tuning unit
compares the per-area automatic scores, selects an anomalous evaluation area whose scoring-correlation deviation from the other evaluation areas is greater than a preset range, and tunes the automatic score of the anomalous evaluation area using the correlation model between evaluation areas.
5. The automatic scoring apparatus of claim 4, wherein the score tuning unit
calculates, using the correlation model, the occurrence probability of each possible score of the selected anomalous evaluation area based on the automatic scores of the remaining evaluation areas excluding the anomalous evaluation area, and
changes the automatic score of the anomalous evaluation area to the score having the highest probability.
6. An automatic scoring method comprising:
performing automatic scoring of target data for each of one or more evaluation areas by applying a previously generated scoring model for each evaluation area; and
tuning the automatic scores for the one or more evaluation areas using a correlation model between evaluation areas.
7. The automatic scoring method of claim 6, wherein the tuning comprises:
comparing the automatic scores between evaluation areas and selecting an anomalous evaluation area having a score whose deviation is greater than a preset range;
calculating the occurrence probability of each possible score of the selected anomalous evaluation area based on the automatic scores of the remaining evaluation areas excluding the anomalous evaluation area; and
changing the automatic score of the anomalous evaluation area to the score having the highest probability.
8. The automatic scoring method of claim 6, further comprising, before performing the automatic scoring:
generating the per-area scoring model through machine learning using previously scored data in which the one or more evaluation areas were evaluated for one or more answers and one or more evaluation features extracted from the one or more answers.
9. The automatic scoring method of claim 8,
further comprising generating, based on the previously scored data, a correlation model between evaluation areas that defines the probability of each score occurring between the one or more evaluation areas.
10. A computer-readable recording medium on which a program is recorded, the program executing:
performing automatic scoring of target data for each of one or more evaluation areas by applying a previously generated scoring model for each evaluation area; and
tuning the automatic scores for the one or more evaluation areas using a correlation model between evaluation areas.
PCT/KR2013/005347 2012-10-31 2013-06-18 Apparatus and method for automatic scoring WO2014069741A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201380031051.4A CN104364815A (en) 2012-10-31 2013-06-18 Apparatus and method for automatic scoring
US14/558,154 US20150093737A1 (en) 2012-10-31 2014-12-02 Apparatus and method for automatic scoring

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2012-0122380 2012-10-31
KR1020120122380A KR101616909B1 (en) 2012-10-31 2012-10-31 Automatic scoring apparatus and method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/558,154 Continuation US20150093737A1 (en) 2012-10-31 2014-12-02 Apparatus and method for automatic scoring

Publications (1)

Publication Number Publication Date
WO2014069741A1 true WO2014069741A1 (en) 2014-05-08

Family

ID=50627614

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2013/005347 WO2014069741A1 (en) 2012-10-31 2013-06-18 Apparatus and method for automatic scoring

Country Status (4)

Country Link
US (1) US20150093737A1 (en)
KR (1) KR101616909B1 (en)
CN (1) CN104364815A (en)
WO (1) WO2014069741A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767663A (en) * 2019-03-22 2019-05-17 河南城建学院 A kind of linear algebra test question question-setting system
CN113421643A (en) * 2021-07-09 2021-09-21 浙江大学 AI model reliability judgment method, device, equipment and storage medium

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292575A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 Data processing method and device
WO2017190281A1 (en) * 2016-05-04 2017-11-09 汤美 Method and system for online teacher lecturing evaluation
US10516641B2 (en) 2016-06-21 2019-12-24 Pearson Education, Inc. System and method for automated evaluation system routing
US10581953B1 (en) * 2017-05-31 2020-03-03 Snap Inc. Real-time content integration based on machine learned selections
GB201710877D0 (en) 2017-07-06 2017-08-23 Nokia Technologies Oy A method and an apparatus for evaluating generative machine learning model
CN107729936B (en) * 2017-10-12 2020-12-08 科大讯飞股份有限公司 Automatic error correction review method and system
US11449762B2 (en) 2018-02-20 2022-09-20 Pearson Education, Inc. Real time development of auto scoring essay models for custom created prompts
US11443140B2 (en) 2018-02-20 2022-09-13 Pearson Education, Inc. Systems and methods for automated machine learning model training for a custom authored prompt
JP7080759B2 (en) * 2018-07-19 2022-06-06 アルー株式会社 Predicted score providing device, predicted score providing method and predicted score providing program
CN109491915B (en) * 2018-11-09 2022-02-08 网易有道信息技术(杭州)有限公司 Data processing method and device, medium and computing equipment
KR20200082540A (en) 2018-12-29 2020-07-08 김만돌 In-basket for competency assessment
KR20200086601A (en) 2019-01-09 2020-07-17 김만돌 Group discussion for competency assessment
KR20200086602A (en) 2019-01-09 2020-07-17 김만돌 In-basket system for competency assessment
KR20200086600A (en) 2019-01-09 2020-07-17 김만돌 Oral presentation for competency assessment
KR20200086796A (en) 2019-01-10 2020-07-20 김만돌 Manless on-line auto in-basket system for competency assessment
KR20200086798A (en) 2019-01-10 2020-07-20 김만돌 Manless on-line auto role play system for competency assessment
KR20200086799A (en) 2019-01-10 2020-07-20 김만돌 Manless on-line auto group discussion system for competency assessment
KR20200086795A (en) 2019-01-10 2020-07-20 김만돌 Group discussion system for competency assessment
KR20200086797A (en) 2019-01-10 2020-07-20 김만돌 Manless on-line auto oral presentation system for competency assessment
KR20200086794A (en) 2019-01-10 2020-07-20 김만돌 Role play system for competency assessment
KR20200086793A (en) 2019-01-10 2020-07-20 김만돌 Oral presentation system for competency assessment
WO2020166539A1 (en) * 2019-02-15 2020-08-20 日本電気株式会社 Grading support device, grading support system, grading support method, and program recording medium
CN110648058A (en) * 2019-09-17 2020-01-03 广州光大教育软件科技股份有限公司 Reliability analysis method and system based on examination paper reading and amending results and storage medium
CN110516060B (en) * 2019-10-24 2020-02-21 支付宝(杭州)信息技术有限公司 Method for determining answers to questions and question-answering device
KR20210084915A (en) 2019-12-30 2021-07-08 부산대학교 산학협력단 Online Learning Diagnosis Subjective Automatic Scoring System Using Machine Learning Technique and Its Method
CN113128883A (en) * 2021-04-23 2021-07-16 广东电网有限责任公司 GIM file automatic scoring method, device and storage medium
CN113705873B (en) * 2021-08-18 2024-01-19 中国科学院自动化研究所 Construction method of film and television work score prediction model and score prediction method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010025202A (en) * 2000-10-14 2001-04-06 조만재 An Intellectual Valuation System
JP2004151757A (en) * 2002-10-28 2004-05-27 Ricoh Co Ltd Sentence evaluating and scoring device, program, and storage medium
KR20050042743A (en) * 2002-09-25 2005-05-10 가부시키가이샤 베네세 코포레이션 Test system and control method thereof

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060014129A1 (en) * 2001-02-09 2006-01-19 Grow.Net, Inc. System and method for processing test reports
US8380491B2 (en) * 2002-04-19 2013-02-19 Educational Testing Service System for rating constructed responses based on concepts and a model answer
US7088949B2 (en) * 2002-06-24 2006-08-08 Educational Testing Service Automated essay scoring
WO2005045786A1 (en) * 2003-10-27 2005-05-19 Educational Testing Service Automatic essay scoring system
US7657220B2 (en) * 2004-05-21 2010-02-02 Ordinate Corporation Adaptive scoring of responses to constructed response questions
WO2006093928A2 (en) * 2005-02-28 2006-09-08 Educational Testing Service Method of model scaling for an automated essay scoring system
KR100919912B1 (en) * 2005-04-05 2009-10-06 에이아이 리미티드 Systems and methods for semantic knowledge assessment, instruction, and acquisition
KR20090001485A (en) 2007-04-18 2009-01-09 주식회사 아이오시스 A self-study system through automatic marking of answers to subjective questions
JP5454357B2 (en) * 2010-05-31 2014-03-26 ソニー株式会社 Information processing apparatus and method, and program
US20120244510A1 (en) * 2011-03-22 2012-09-27 Watkins Jr Robert Todd Normalization and Cumulative Analysis of Cognitive Educational Outcome Elements and Related Interactive Report Summaries

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010025202A (en) * 2000-10-14 2001-04-06 조만재 An Intellectual Valuation System
KR20050042743A (en) * 2002-09-25 2005-05-10 가부시키가이샤 베네세 코포레이션 Test system and control method thereof
JP2004151757A (en) * 2002-10-28 2004-05-27 Ricoh Co Ltd Sentence evaluating and scoring device, program, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KANG, WON-SEOG: "Automatic Grading System for Subjective Questions Through Analyzing Question Type", THE JOURNAL OF THE KOREA CONTENTS ASSOCIATION, vol. 11, no. 2, 17 February 2011 (2011-02-17), pages 13 - 21 *
OH, JUNG SEOK ET AL.: "A Descriptive Question Marking System based on Semantic Kernels", KOREAN INSTITUTE OF INFORMATION TECHNOLOGY, vol. 3, no. 4, 1 October 2005 (2005-10-01), pages 95 - 104 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767663A (en) * 2019-03-22 2019-05-17 河南城建学院 A kind of linear algebra test question question-setting system
CN113421643A (en) * 2021-07-09 2021-09-21 浙江大学 AI model reliability judgment method, device, equipment and storage medium

Also Published As

Publication number Publication date
KR20140055442A (en) 2014-05-09
CN104364815A (en) 2015-02-18
KR101616909B1 (en) 2016-04-29
US20150093737A1 (en) 2015-04-02

Similar Documents

Publication Publication Date Title
WO2014069741A1 (en) Apparatus and method for automatic scoring
CN109523194B (en) Chinese reading ability evaluation method and device and readable storage medium
US10769958B2 (en) Generating high-level questions from sentences
Bahreini et al. Towards real-time speech emotion recognition for affective e-learning
WO2018161917A1 (en) Intelligent scoring method and apparatus, computer device, and computer-readable medium
Li et al. An automated assessment framework for atypical prosody and stereotyped idiosyncratic phrases related to autism spectrum disorder
WO2012115324A1 (en) Conversation management method, and device for executing same
CN111833853B (en) Voice processing method and device, electronic equipment and computer readable storage medium
WO2012026674A2 (en) Method, apparatus and system for learning plan analysis
CN110600033B (en) Learning condition evaluation method and device, storage medium and electronic equipment
WO2021218029A1 (en) Artificial intelligence-based interview method and apparatus, computer device, and storage medium
CN103730032A (en) Method and system for controlling multimedia data
WO2023279692A1 (en) Question-and-answer platform-based data processing method and apparatus, and related device
WO2011074772A2 (en) Grammatical error simulation device and method
WO2023106855A1 (en) Method, system and non-transitory computer-readable recording medium for supporting writing assessment
WO2017131325A1 (en) System and method for verifying and correcting knowledge base
WO2016208941A1 (en) Text preprocessing method and preprocessing system for performing same
WO2009119991A2 (en) Method and system for learning language based on sound analysis on the internet
CN109346108A (en) A kind of review of operations method and system
WO2021137534A1 (en) Method and system for learning korean pronunciation via voice analysis
CN109272983A (en) Bilingual switching device for child-parent education
WO2011049313A2 (en) Apparatus and method for processing documents to extract expressions and descriptions
KR20060087821A (en) System and its method for rating language ability in language learning stage based on l1 acquisition
CN114462428A (en) Translation evaluation method and system, electronic device and readable storage medium
CN114241835A (en) Student spoken language quality evaluation method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13852182

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13852182

Country of ref document: EP

Kind code of ref document: A1