CN112528011B - Multi-data-source-driven method, system and device for grading open-ended mathematics homework

Info

Publication number: CN112528011B
Application number: CN202011405957.6A
Authority: CN
Inventors: 张婷; 余新国; 何彬
Original assignee: Central China Normal University
Current assignee: Central China Normal University
Priority date: 2020-12-05
Filing date: 2020-12-05
Publication date: 2022-06-17
Anticipated expiration: 2040-12-05
Also published as: CN112528011A

Abstract

The invention discloses a multi-data-source-driven method, system and device for grading open-ended mathematics homework. The method comprises the following steps: S1, acquiring standard answer data for an open-ended mathematics question and obtaining from it the knowledge points the question examines; S2, acquiring historical graded-homework sample data and determining the score distribution over the knowledge points from that data; S3, acquiring the homework to be graded; and S4, converting each solution step in the homework to be graded into an operator tree, converting each solution step in the standard answer data into an operator tree, calculating the similarity of the two operator trees, and outputting a grading result according to the similarity and the score of each knowledge point. By constructing a knowledge-point-based score distribution model and an operator-tree-based scoring model for solution steps, the invention achieves multi-dimensional knowledge-attribute scoring and can improve the accuracy of computer grading of homework.

Description

Multi-data-source-driven method, system and device for grading open-ended mathematics homework

Technical Field

The invention belongs to the technical field of intelligent education, and in particular relates to a multi-data-source-driven method, system and device for grading open-ended mathematics homework.

Background

An open-ended question is one whose answers, and whose ways of reaching those answers, are varied. With the development of information technology, grading open-ended questions by computer can save a great deal of labor and material resources. How to make the computer reflect a student's homework performance objectively and accurately, however, is an important research topic. Research on automatic scoring and feedback for open-ended questions has focused on specific question types such as essays, computer programs, mathematical logic proofs, and open-ended mathematics questions.

A data-driven mathematical language processing framework has been proposed in the prior art. First, numerical features are extracted from students' answers to open-ended mathematics questions; then, cluster analysis is performed on the features extracted from many answers to discover the structures of correct, partially correct and wrong answers; finally, the teacher scores one sample in each cluster, and the remaining samples in that cluster are scored automatically. This method uses answer structure as the scoring feature and, by clustering student answer structures, achieves automatic scoring based on a single-dimensional feature. Its problem is that automatic scoring based on the single feature of answer structure does not truly understand the mathematical formulas, so the method cannot provide students with valuable feedback.

The prior art also provides a method for automatic scoring and feedback generation for one kind of open-ended mathematics question (polynomial factorization) based on a long short-term memory (LSTM) neural network. An LSTM model learns polynomial factorization from a large amount of data (pairs of question and correct answer), and the trained model is then used to score answers and generate feedback. For scoring, the trained LSTM network computes the predicted probability of each character in the student's answer, and the total predicted probability over all characters is the score of the answer; because the LSTM network is trained on correct-answer data, the predicted probability of a correct answer should be higher than that of a wrong answer. For personalized feedback generation, when the predicted probability of a character in the student's answer falls below a set threshold, the system automatically reminds the student that an error may exist at that position. This method uses a neural network to learn and understand simple mathematical operations, but it still has limitations. First, the applicable question types are limited to simple mathematical operations such as polynomial factorization; the method is not feasible for word problems whose stems are described in natural language. Second, each character in the answer is used as a scoring feature and every feature is given the same score weight. This achieves automatic scoring based on multi-dimensional character features, but it ignores an important piece of scoring information: the knowledge points a question examines have different difficulty coefficients and can correspond to different scores. The interpretability of the scoring model is therefore limited.

In summary, existing automatic scoring models for open-ended mathematics questions, which draw on student answers or standard answers as their data source, cannot effectively extract knowledge points and their difficulty-coefficient scoring features and cannot achieve multi-dimensional knowledge-attribute scoring. As a result, the scoring model has limited interpretability, the computer cannot objectively and accurately determine students' homework performance, and grading accuracy is low.

Disclosure of Invention

In view of at least one defect or need for improvement in the prior art, the invention provides a multi-data-source-driven method, system and device for grading open-ended mathematics homework, which achieve multi-dimensional knowledge-attribute scoring, enable a computer to determine students' homework performance objectively and accurately, and improve grading accuracy.

To achieve the above object, according to a first aspect of the present invention, there is provided a multi-data-source-driven method for grading open-ended mathematics homework, comprising the steps of:

S1, acquiring standard answer data of an open-ended mathematics question, wherein the standard answer data comprises a plurality of solution steps, and associating a knowledge point with each solution step;

S2, acquiring historical graded-homework sample data, wherein the sample data comprises a plurality of solution steps, each solution step is annotated with grading data, a knowledge point is associated with each solution step, and the score of each knowledge point is determined from the sample data;

S3, acquiring the homework to be graded;

S4, converting each solution step in the homework to be graded into a first operator tree, converting each solution step in the standard answer data into a second operator tree, determining the correspondence between the first operator trees and the second operator trees, calculating the similarity between each pair of corresponding first and second operator trees, and outputting a grading result according to the similarities and the score of each knowledge point.

Preferably, acquiring the standard answer data of the open-ended mathematics question comprises: acquiring multiple sets of standard answer data, each corresponding to a different solution strategy.

Preferably, the method further comprises: repeating step S4 for each set of standard answer data, outputting the grading result calculated from each set, and selecting the grading result with the highest score as the final grading result.

Preferably, the standard answer data is manually entered answer data or answer data obtained by a machine solving technique.

Preferably, obtaining answer data by the machine solving technique comprises the steps of:

using a question understanding technique to obtain a set of mathematical relations that equivalently represents the question;

performing derivation on the set of mathematical relations to obtain a plurality of solution steps;

and mapping each solution step onto a pre-constructed knowledge graph, thereby associating a knowledge point with each solution step.

Preferably, determining the score of each knowledge point from the historical graded-homework sample data comprises the following steps:

classifying the historical graded-homework sample data by knowledge point, and determining the difficulty coefficient of each knowledge point from the grading data for that knowledge point;

denoting the total score of the open-ended mathematics question as M and the number of knowledge points it includes as K, the score m_k of the k-th knowledge point is:

wherein 1 ≤ k ≤ K, and c_k is the difficulty coefficient of the k-th knowledge point.
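
The formula itself does not survive in this text. One allocation consistent with the definitions above, given here as an assumption rather than as the patent's own expression, distributes the total score in proportion to the difficulty coefficients:

```latex
m_k = M \cdot \frac{c_k}{\sum_{j=1}^{K} c_j}, \qquad 1 \le k \le K
```

Under this assumed form the scores m_1, ..., m_K sum to M, and a knowledge point with a larger difficulty coefficient receives a larger share. It requires at least one c_j to be nonzero.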

Preferably, step S4 includes the steps of:

denoting the output grading result as G, the grading result G is calculated as follows:

wherein s_k is the similarity between the first operator tree and the second operator tree.
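
The formula is not reproduced in this text. A form consistent with the surrounding definitions, stated as an assumption, weights the score m_k of each knowledge point by the similarity s_k of the corresponding solution step:

```latex
G = \sum_{k=1}^{K} m_k \, s_k
```

With every s_k in [0, 1], this assumed G ranges from 0 (no step matches) to the sum of the m_k (every step matches the standard answer exactly).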

Preferably, the similarity s_k is calculated as follows:

wherein N_s is the number of nodes in the second operator tree, and N_ol is the number of identical nodes in the first operator tree and the second operator tree.
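
The formula is missing from this text. Given that N_s counts the nodes of the standard-answer tree and N_ol counts the nodes the two trees share, a natural reading, offered as an assumption, is the ratio:

```latex
s_k = \frac{N_{ol}}{N_s}
```

This ratio is 1 when every node of the second operator tree also appears in the first, and 0 when the trees share no nodes.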

According to a second aspect of the present invention, there is provided a multi-data-source-driven system for grading open-ended mathematics homework, comprising:

a standard answer data acquisition module, configured to acquire standard answer data of an open-ended mathematics question, wherein the standard answer data comprises a plurality of solution steps, and to associate a knowledge point with each solution step;

a score distribution model building module, configured to acquire historical graded-homework sample data, wherein the sample data comprises a plurality of solution steps, each solution step is annotated with grading data, and a knowledge point is associated with each solution step, and to determine the score of each knowledge point from the sample data;

an input module, configured to acquire the homework to be graded;

a scoring model building module, configured to convert each solution step in the homework to be graded into a first operator tree, convert each solution step in the standard answer data into a second operator tree, determine the correspondence between the first operator trees and the second operator trees, and calculate the similarity between each pair of corresponding first and second operator trees;

and an output module, configured to output a grading result according to the similarities and the score of each knowledge point.

According to a third aspect of the present invention, there is provided a computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements any of the above methods when executing the computer program.

In general, compared with the prior art, the invention has the following beneficial effects: the knowledge points examined by a question are obtained from the standard answer data, and the score distribution over those knowledge points is obtained from historical graded-homework sample data, which achieves multi-dimensional knowledge-attribute scoring, enables a computer to determine students' homework performance objectively and accurately, and improves grading accuracy. This is embodied in the following aspects:

(1) A score distribution model for open-ended mathematics questions is constructed. Because the knowledge points associated with the individual steps have different difficulty coefficients and therefore correspond to different score weights, a knowledge-point-based score distribution model is constructed.

(2) A scoring model for key steps is constructed. Because mathematical formulas can be written in many forms, measuring only the visual structure of a formula cannot meet the requirements of grading; scoring the key steps of an open-ended mathematics question requires measuring the similarity of formulas at the semantic level. The operator tree, one way of representing a mathematical formula, is a model based on operator syntax. Given the close relationship between syntax and semantics in mathematical language, the invention proposes a key-step scoring model based on operator trees.

(3) A specific way of obtaining scoring features at knowledge-point granularity is provided.

Drawings

Fig. 1 is a schematic diagram of the open-ended mathematics homework grading method according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The principle of the multi-data-source-driven method for grading open-ended mathematics homework according to an embodiment of the present invention is shown in Fig. 1. The method includes steps S1 to S4.

S1, acquiring standard answer data of the open-ended mathematics question, wherein the standard answer data comprises a plurality of solution steps, and associating a knowledge point with each solution step.

An open-ended mathematics question is one for which, given the conditions of the question, there are multiple ways and methods of analyzing, reasoning and judging with learned knowledge to reach a conclusion; these different ways and methods are different solution strategies. An open-ended mathematics question can therefore have several sets of standard answer data, one for each solution strategy.

Acquiring the standard answer data and associating a knowledge point with each solution step yields the knowledge points the question examines. The process of solving a mathematics question generally includes multiple solution steps, each examining a different knowledge point.

Preferably, answer data is obtained by the machine solving technique as follows: a question understanding technique, such as a syntactic-semantic model, is used to obtain a set of mathematical relations that equivalently represents the question; derivation is performed on the set of mathematical relations to obtain a plurality of solution steps; and each solution step is mapped onto a pre-constructed knowledge graph, associating a knowledge point with each solution step.

S2, acquiring historical graded-homework sample data, wherein the sample data comprises a plurality of solution steps, each solution step is annotated with grading data, a knowledge point is associated with each solution step, and the score of each knowledge point is determined from the sample data.

The historical graded-homework sample data comprises the answers of different students to the open-ended mathematics question and the grading data for those answers. Each answer comprises a plurality of solution steps, and each solution step has been graded and is marked as correct or wrong, or with a score.

Knowledge points are associated in a manner similar to step S1: a knowledge graph may be constructed in advance by computer, in which each node represents a knowledge point, and each solution step is mapped to a node of the knowledge graph.
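
As an illustration of mapping a solution step to a knowledge-graph node, the sketch below uses ordered pattern rules. The node names, the regular expressions, and the function `tag_step` are all hypothetical; the source does not specify how the mapping is implemented.

```python
import re

# Hypothetical knowledge-graph nodes with illustrative patterns.
# Rules are tried in order, so more specific patterns come first.
RULES = [
    ("quadratic_equation", re.compile(r"x\^2")),
    ("square_root", re.compile(r"\\sqrt")),
    ("linear_equation", re.compile(r"x")),
]

def tag_step(step):
    """Return the first knowledge point whose pattern matches the step, else None."""
    for name, pattern in RULES:
        if pattern.search(step):
            return name
    return None

if __name__ == "__main__":
    for step in ["x^2 - 5x + 6 = 0", "2x + 3 = 7", "3 + 4 = 7"]:
        print(step, "->", tag_step(step))
```

A real system would need far richer rules or a learned classifier; the point here is only that each step resolves to at most one node.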

The historical graded-homework sample data is annotated with knowledge points, either manually or by pattern matching, and student scoring data is then aggregated at knowledge-point granularity to determine the score of each knowledge point. Preferably, the difficulty coefficient of a knowledge point equals the ratio of the number of students who answered it wrongly to the total number of students. If all students have mastered the knowledge point, its difficulty coefficient is 0; conversely, if no student answers it correctly, the coefficient is 1.
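
The difficulty coefficient described above can be sketched as follows. The record layout, one `(knowledge_point, is_correct)` pair per student per step, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def difficulty_coefficients(records):
    """Compute wrong-answer ratio per knowledge point.

    records: iterable of (knowledge_point, is_correct) pairs,
    one per student per graded step (an assumed layout).
    """
    wrong = defaultdict(int)
    total = defaultdict(int)
    for knowledge_point, is_correct in records:
        total[knowledge_point] += 1
        if not is_correct:
            wrong[knowledge_point] += 1
    # Ratio of wrong answers to all answers for each knowledge point.
    return {kp: wrong[kp] / total[kp] for kp in total}

if __name__ == "__main__":
    sample = [("a", True), ("a", False), ("b", False), ("b", False), ("c", True)]
    print(difficulty_coefficients(sample))
```

In the sample, every student misses knowledge point `b`, so its coefficient is 1, while `c` is answered correctly by everyone and gets 0.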

Existing score distribution models for open-ended mathematics questions are generated mainly from teachers' experience, which makes them dependent on manual work and to some degree subjective. Because the knowledge points associated with the individual solution steps have different difficulty coefficients and therefore correspond to different score weights, the embodiment of the invention provides a method for automatically building a knowledge-point-based score distribution model for open-ended mathematics questions.

Let M denote the total score of the question, K the number of knowledge points the question includes, and c_k the difficulty coefficient of knowledge point k, where 1 ≤ k ≤ K. The score m_k allocated to the corresponding key step or knowledge point can then be calculated from the following formula:
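
The formula is not reproduced in this text. A minimal sketch follows, assuming the total score is shared among the knowledge points in proportion to their difficulty coefficients; this proportional rule, the function name, and the equal-split fallback are assumptions consistent with the definitions above, not a confirmed formula.

```python
def allocate_scores(total_score, difficulty):
    """Split total_score across knowledge points in proportion to difficulty.

    difficulty: {knowledge_point: c_k}. Proportional allocation is an assumption.
    """
    denominator = sum(difficulty.values())
    if denominator == 0:
        # Assumed fallback: if every coefficient is 0, split the score equally.
        return {kp: total_score / len(difficulty) for kp in difficulty}
    return {kp: total_score * c / denominator for kp, c in difficulty.items()}

if __name__ == "__main__":
    print(allocate_scores(10, {"a": 0.2, "b": 0.3, "c": 0.5}))
```

Whatever the coefficients, the allocated scores add up to the total score, so the teacher's overall mark for the question is preserved.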

S3, acquiring the homework to be graded.

Because mathematical formulas can be written in many forms, measuring only the visual structure of a formula, as the prior art does, cannot meet the requirements of scoring the key steps of the questions studied here. Scoring the key steps of an open-ended mathematics question requires measuring the similarity of mathematical formulas at the semantic level.

Mathematical formulas can be represented in different ways. The symbol layout tree is a formula representation model based on the spatial layout of symbols; it describes the visual structure of a formula and carries no semantic information. LaTeX is an example. The operator tree, another way of representing a mathematical formula, is a model based on operator syntax.
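
To make the operator tree concrete, the sketch below builds one from an infix expression using Python's standard `ast` module. It assumes the formula has already been converted from LaTeX to Python-style infix syntax (with `**` for powers); that conversion, the tuple layout, and the function name are illustrative assumptions, not the patent's implementation.

```python
import ast

# Labels for the operators this sketch supports; others raise KeyError.
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*",
        ast.Div: "/", ast.Pow: "^", ast.USub: "neg"}

def to_operator_tree(expr):
    """Parse an infix expression into nested (label, child, ...) tuples."""
    def build(node):
        if isinstance(node, ast.BinOp):
            return (_OPS[type(node.op)], build(node.left), build(node.right))
        if isinstance(node, ast.UnaryOp):
            return (_OPS[type(node.op)], build(node.operand))
        if isinstance(node, ast.Name):
            return (node.id,)
        if isinstance(node, ast.Constant):
            return (str(node.value),)
        raise ValueError("unsupported syntax: " + ast.dump(node))
    return build(ast.parse(expr, mode="eval").body)

if __name__ == "__main__":
    print(to_operator_tree("2*x + 3"))
    print(to_operator_tree("(2*x) + 3"))
```

The two expressions in the usage example are written differently but produce the same tree, which is the property that makes operator trees more suitable than layout trees for comparing formulas.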

Given the close relationship between syntax and semantics in mathematical language, the embodiment of the invention provides a key-step scoring model based on operator trees. First, the visual-structure recognition result of a handwritten mathematical formula, for example LaTeX markup, is converted into the corresponding operator tree; that is, the operators of a solution step are represented in the computer with a tree data structure. The score of a key step is then predicted by calculating the distance between operator trees. Let s_k ∈ [0,1] denote the semantic similarity between the student's answer and the standard answer for key step or knowledge point k. The student's final score G for the question can then preferably be calculated by the following formula:
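
The formula is not reproduced in this text. A minimal sketch, assuming G is the sum of each knowledge point's score m_k weighted by its similarity s_k (an assumption consistent with the definitions above), with hypothetical names throughout:

```python
def final_score(scores, similarities):
    """Similarity-weighted sum of knowledge-point scores.

    scores: {knowledge_point: m_k}; similarities: {knowledge_point: s_k in [0, 1]}.
    A knowledge point with no similarity entry contributes 0 (assumed).
    """
    return sum(m * similarities.get(kp, 0.0) for kp, m in scores.items())

if __name__ == "__main__":
    print(final_score({"a": 2.0, "b": 3.0, "c": 5.0}, {"a": 1.0, "b": 0.5}))
```

Here the student earns full marks on `a`, half marks on `b`, and nothing on `c`, which has no matching step.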

Preferably, suppose the standard answer includes P solution steps, corresponding to P second operator trees, and the homework to be graded includes Q solution steps, corresponding to Q first operator trees. The second operator tree p_1 of the standard answer is compared in sequence with the Q first operator trees of the homework to be graded to find the first operator tree q_1 with the highest degree of matching. The second operator tree p_2 of the standard answer is then compared in sequence with the remaining Q-1 first operator trees, that is, all except q_1, to find the first operator tree q_2 with the highest degree of matching. This continues until a corresponding first operator tree has been found for every second operator tree of the standard answer. Here s_k is the similarity of a pair of corresponding operator trees (student answer tree and standard answer tree); for example, s_1 is the similarity of p_1 and q_1, and s_2 is the similarity of p_2 and q_2. Preferably, s_k can be calculated from the following formula:

wherein N_s is the number of nodes of the second operator tree, which corresponds to the standard answer, and N_ol is the number of identical nodes in the first operator tree and the second operator tree.
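
The formula is missing here. The sketch below assumes s_k is the ratio of N_ol to N_s and, because the text does not say how "identical nodes" are counted, further assumes they are counted as the multiset intersection of node labels, ignoring position. Trees are nested `(label, child, ...)` tuples, a hypothetical layout.

```python
from collections import Counter

def node_labels(tree):
    """Flatten a (label, child, ...) tuple tree into a multiset of labels."""
    counts = Counter([tree[0]])
    for child in tree[1:]:
        counts += node_labels(child)
    return counts

def similarity(student_tree, standard_tree):
    """Assumed s_k = N_ol / N_s, with N_ol as a label-multiset overlap."""
    standard = node_labels(standard_tree)
    overlap = standard & node_labels(student_tree)  # multiset intersection
    n_s = sum(standard.values())
    return sum(overlap.values()) / n_s

if __name__ == "__main__":
    standard = ("+", ("*", ("2",), ("x",)), ("3",))   # 2*x + 3
    student = ("+", ("*", ("2",), ("x",)), ("5",))    # 2*x + 5
    print(similarity(student, standard))
```

A position-aware comparison (for example, a tree edit distance) would be stricter; the label-overlap count is simply the lightest reading of the definition.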

Preferably, if P is greater than Q, some second operator trees of the standard answer will have no corresponding first operator tree in the homework to be graded; in that case the corresponding s_k is 0.
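
The matching procedure, including the case where P exceeds Q, can be sketched as a greedy loop. The similarity measure used here (label-multiset overlap divided by the standard tree's node count), the tuple tree layout, and all names are assumptions for illustration.

```python
from collections import Counter

def node_labels(tree):
    """Flatten a (label, child, ...) tuple tree into a multiset of labels."""
    counts = Counter([tree[0]])
    for child in tree[1:]:
        counts += node_labels(child)
    return counts

def similarity(student_tree, standard_tree):
    """Assumed measure: shared labels divided by standard-tree node count."""
    standard = node_labels(standard_tree)
    overlap = standard & node_labels(student_tree)
    return sum(overlap.values()) / sum(standard.values())

def match_steps(standard_trees, student_trees):
    """For each standard tree in order, take the best unused student tree.

    Returns a list of (student_index or None, s_k), one per standard tree.
    """
    remaining = list(range(len(student_trees)))
    result = []
    for std in standard_trees:
        if not remaining:
            result.append((None, 0.0))  # P > Q: no student step left, s_k = 0
            continue
        # max() keeps the earliest index on ties, matching the sequential scan.
        best = max(remaining, key=lambda i: similarity(student_trees[i], std))
        result.append((best, similarity(student_trees[best], std)))
        remaining.remove(best)
    return result

if __name__ == "__main__":
    standard = [("+", ("x",), ("1",)), ("*", ("x",), ("2",)), ("-", ("x",), ("3",))]
    student = [("*", ("x",), ("2",)), ("+", ("x",), ("1",))]
    print(match_steps(standard, student))
```

Because each student step is consumed once matched, a student who writes the steps in a different order than the standard answer is still paired correctly.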

If multiple sets of standard answer data with different solution strategies are acquired in step S1, step S4 is executed repeatedly: for each set of standard answer data, the similarities between the operator trees of its solution steps and the operator trees of the homework to be graded are calculated, so that each set of standard answer data yields a grading result, and the grading result with the highest score is selected as the final grading result.
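
Selecting the best result across several standard answers can be sketched as below. The input layout, a list of `(m_k, s_k)` pairs per solution strategy, the weighted-sum grade, and the strategy names are assumptions for illustration.

```python
def grade(pairs):
    """Assumed grade: sum of score times similarity over the matched steps."""
    return sum(m * s for m, s in pairs)

def best_grade(candidates):
    """Pick the standard answer that gives the student the highest grade.

    candidates: {strategy_name: [(m_k, s_k), ...]} (an assumed layout).
    Returns (strategy_name, grade).
    """
    totals = {name: grade(pairs) for name, pairs in candidates.items()}
    best = max(totals, key=totals.get)
    return best, totals[best]

if __name__ == "__main__":
    candidates = {
        "factoring": [(4.0, 1.0), (6.0, 0.5)],
        "quadratic_formula": [(5.0, 0.2), (5.0, 0.4)],
    }
    print(best_grade(candidates))
```

Taking the maximum means a student is graded against whichever solution strategy their work most closely follows.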

The multi-data-source-driven system for grading open-ended mathematics homework comprises:

a score distribution model building module, configured to acquire historical graded-homework sample data, wherein the sample data comprises a plurality of solution steps, each solution step is annotated with grading data, and a knowledge point is associated with each solution step, and to determine the score of each knowledge point from the sample data;

an input module, configured to acquire the homework to be graded;

a scoring model building module, configured to convert each solution step in the homework to be graded into a first operator tree, convert each solution step in the standard answer data into a second operator tree, determine the correspondence between the first operator trees and the second operator trees, and calculate the similarity between each pair of corresponding first and second operator trees;

The implementation principle and technical effect of the system are the same as those of the method, and are not described herein again.

The present embodiment further provides a computer device comprising at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to execute the grading method of any of the above embodiments, which is not described again here. In this embodiment, the types of the processor and the memory are not particularly limited; for example, the processor may be a microprocessor, a digital information processor, an on-chip programmable logic system, or the like, and the memory may be volatile memory, non-volatile memory, a combination thereof, or the like.

It should be noted that, in any of the above embodiments, the method steps are not necessarily executed in the order of their sequence numbers; unless the execution logic requires a particular order, they may be executed in any other feasible order.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A multi-data-source-driven method for grading open-ended mathematics homework, characterized by comprising the following steps:

S1, acquiring standard answer data of an open-ended mathematics question, wherein the standard answer data comprises a plurality of solution steps, and associating a knowledge point with each solution step;

S3, acquiring the homework to be graded;

S4, converting each solution step in the homework to be graded into a first operator tree, converting each solution step in the standard answer data into a second operator tree, determining the correspondence between the first operator trees and the second operator trees, calculating the similarity between each pair of corresponding first and second operator trees, and outputting a grading result according to the similarities and the score of each knowledge point;

wherein determining the score of each knowledge point from the historical graded-homework sample data comprises the following steps:

classifying the historical graded-homework sample data by knowledge point, and determining the difficulty coefficient of each knowledge point from the grading data for that knowledge point; specifically, annotating the historical graded-homework sample data with knowledge points, either manually or by pattern matching, and then aggregating student scoring data at knowledge-point granularity to determine the score of each knowledge point, wherein the difficulty coefficient of a knowledge point equals the ratio of the number of students who answered it wrongly to the total number of students; if all students have mastered the knowledge point, its difficulty coefficient is 0, and conversely, if no student answers it correctly, its difficulty coefficient is 1;

wherein 1 ≤ k ≤ K, and c_k is the difficulty coefficient of the k-th knowledge point;

step S4 includes the steps of:

denoting the output grading result as G, the grading result G is calculated as follows:

wherein s_k is the similarity between the first operator tree and the second operator tree;

the similarity s_k is calculated as follows:

2. The multi-data-source-driven method for grading open-ended mathematics homework according to claim 1, wherein acquiring the standard answer data of the open-ended mathematics question comprises: acquiring multiple sets of standard answer data, each corresponding to a different solution strategy.

3. The multi-data-source-driven method for grading open-ended mathematics homework according to claim 2, further comprising: repeating step S4 for each set of standard answer data, outputting the grading result calculated from each set, and selecting the grading result with the highest score as the final grading result.

4. The multi-data-source-driven method for grading open-ended mathematics homework according to claim 1, wherein the standard answer data is manually entered answer data or answer data obtained by a machine solving technique.

5. The multi-data-source-driven method for grading open-ended mathematics homework according to claim 4, wherein obtaining answer data by the machine solving technique comprises the steps of:

using a question understanding technique to obtain a set of mathematical relations that equivalently represents the question;

6. A multi-data-source-driven system for grading open-ended mathematics homework, comprising:

an input module, configured to acquire the homework to be graded;

an output module, configured to output a grading result according to the similarity and the score of each knowledge point;

classifying the historical graded-homework sample data by knowledge point, and determining the difficulty coefficient of each knowledge point from the grading data for that knowledge point; specifically, annotating the historical graded-homework sample data with knowledge points, either manually or by pattern matching, and then aggregating student scoring data at knowledge-point granularity to determine the score of each knowledge point, wherein the difficulty coefficient of a knowledge point equals the ratio of the number of students who answered it wrongly to the total number of students; if all students have mastered the knowledge point, its difficulty coefficient is 0, and conversely, if no student answers it correctly, its difficulty coefficient is 1;

denoting the total score of the open-ended mathematics question as M and the number of knowledge points it includes as K, the score m_k of the k-th knowledge point is:

step S4 includes the steps of:

the similarity s_k is calculated as follows:

7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 5.