CN114092288A - Personalized intelligent tutoring method for programming beginners
- Publication number
- CN114092288A (application CN202111395652.6A)
- Authority
- CN
- China
- Prior art keywords
- variable
- programming
- program
- student
- repair
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/0053—Computers, e.g. programming
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- General Health & Medical Sciences (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Human Resources & Organizations (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Entrepreneurship & Innovation (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a personalized intelligent tutoring method for programming beginners and relates to the field of intelligent education. First, each programming assignment submitted for a given question is divided at block granularity. Test samples are then fed into each assignment program to obtain its variable execution trace. The correct assignment programs are grouped into clusters according to the matching conditions, and one template is randomly selected from each cluster. For the current incorrect assignment program, the templates are selected one by one and their variable execution traces are matched against it; mapping relations are generated using the Cartesian product, and the repair cost of each mapping relation is calculated. The mapping relation whose variables match completely and whose cost is minimal is selected, the correct variables are mapped onto the corresponding variables of the incorrect program, the repairs are associated with knowledge points, and the final repair feedback is generated. Finally, a joint factor model is constructed to evaluate the student's programming learning state. The invention improves the repair rate of programming assignments.
Description
Technical Field
The invention relates to the field of intelligent education, and in particular to a personalized intelligent tutoring method for programming beginners.
Background
Intelligent programming tutoring is an important part of intelligent education. It aims to help students repair the incorrect programs they submit and to estimate their mastery of programming knowledge points.
The field of intelligent programming tutoring currently has the following problems: automatic repair support for the small-scale programs written by beginners is insufficient; repairing programming errors takes a long time; repair results are obscure and hard to understand; models for predicting the programming learning state are lacking; and existing online practice systems for programming courses provide too little personalized feedback. These problems create the need for an intelligent programming tutoring system.
In a traditional online evaluation system, students learn only whether their code is correct. Repairing incorrect code must be done by the students themselves or with a teacher's help; manual repair takes a large amount of time and is extremely inefficient. In addition, students lack a basic understanding of their own mastery of knowledge points during programming study and find it hard to obtain personalized guidance. An intelligent tutoring system for programming beginners is therefore urgently needed to assist students in programming courses: it should evaluate submitted code in real time, provide repair suggestions and the knowledge points related to the errors in incorrect programs, and estimate the student's mastery of programming knowledge points after each stage of study.
With manual repair, students waste a large amount of time repairing errors on their own, which also dampens their enthusiasm for learning to program, and the teacher-to-student ratio does not allow teachers to repair each student's errors one by one. Implementing an intelligent tutoring system for a specific course has therefore become an important part of intelligent learning at the present stage.
Disclosure of Invention
To address the problem of personalized intelligent tutoring for programming beginners and to overcome the defects of the prior art, the invention provides a personalized intelligent tutoring method for programming beginners. The method starts from the collection of learning behavior data in a programming course, clusters the correct submissions of a programming exercise by matching to obtain a template for each class, and uses an automatic repair tool to return feedback when a submission is incorrect. At the same time, it builds a learning state prediction model from the students' incorrect submissions to evaluate and predict the results of their new submissions.
The personalized intelligent tutoring method for the programming beginners comprises the following steps:
Step one: for a given question, divide the programming assignment programs submitted by the programming beginners into two classes, correct answers and incorrect answers;
Step two: divide each assignment program at block granularity, according to whether it contains selection (control) statements and loop statements;
The division rules are as follows:
i. a program with no loop statement and no selection statement is one whole block and is not divided; a selection statement that does not nest a loop statement is likewise treated as one whole block and is not divided; each such block is expressed as {L:0};
ii. a single-layer loop statement is a separate block, expressed as {L:1};
iii. if a loop statement is nested inside a selection statement, the whole is a single block, expressed as {L:1};
iv. for multi-layer nested loops, each layer is a separate block; a loop block L2 nested inside a loop block L1 is expressed as {L1:{L2:1}}.
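The division rules can be illustrated with a minimal Python sketch. It assumes Python source and the standard `ast` module purely for illustration; the patent does not state the language of the student programs, and the names `loop_blocks` and `partition` are hypothetical. Each loop becomes one nested list entry, and selection statements are treated as transparent, so a loop inside an `if` is reported at the level of the `if`.

```python
import ast

LOOPS = (ast.For, ast.While)

def loop_blocks(node):
    """Return the loop nesting below `node` as a list of nested lists.

    Every loop contributes one entry; the entry lists the loops nested
    directly inside it. Non-loop nodes (including `if`) are transparent.
    """
    blocks = []
    for child in ast.iter_child_nodes(node):
        if isinstance(child, LOOPS):
            blocks.append(loop_blocks(child))   # a loop is its own block
        else:
            blocks.extend(loop_blocks(child))   # look through other statements
    return blocks

def partition(source):
    """Block structure of a whole program (hypothetical helper)."""
    return loop_blocks(ast.parse(source))

print(partition("for i in range(2):\n    for j in range(2):\n        pass\n"))
```

An empty list corresponds to the {L:0} case, `[[]]` to a single loop block, and `[[[]]]` to one loop nested in another.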
Step three: feed the test samples for the question into each correct assignment program to obtain the variable values within each block, and combine these values into the variable execution trace of that assignment;
a test sample comprises a correct input for the question and the corresponding output;
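As a rough illustration of what a variable execution trace could look like, the sketch below records, for one Python function, the sequence of values each local variable takes while a test input is run. It relies on the standard `sys.settrace` hook; the function names are hypothetical, block boundaries are ignored, and repeated assignments of an unchanged value are not recorded, so this is a simplification rather than the patent's tracing procedure.

```python
import sys

def variable_trace(func, *args):
    """Run func(*args); return (result, {variable: [successive values]})."""
    trace = {}
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None                      # do not trace other functions
        if event in ("line", "return"):
            for name, value in frame.f_locals.items():
                values = trace.setdefault(name, [])
                if not values or values[-1] != value:
                    values.append(value)     # record only changes of value
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trace

def sum_to(n):
    # Example student program: sum of 1..n.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

result, trace = variable_trace(sum_to, 3)
print(result, trace)
```

Immutable values such as integers are assumed here; mutable values would need to be copied before being stored.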
Step four: select the variable execution traces one by one and judge whether the matching conditions are met; assignment programs whose traces match are grouped into one class, and an unmatched assignment program forms a class of its own;
two variable execution traces match only if all of the following conditions hold:
i. the number of blocks in the two variable execution traces is the same, and the selection statements and loop structure statements are the same;
ii. the number of variables in the two variable execution traces is the same;
iii. the variables of the two variable execution traces correspond one to one.
Specifically:
first, two variable execution traces are selected at random and checked against the matching conditions; if they match, the two assignment programs are placed in one class; otherwise, each of the two assignment programs forms its own class;
then a third variable execution trace is selected; if the first two assignment programs matched, the third trace is matched against either of them, and if the matching conditions are met it joins that class, otherwise it forms a class of its own;
if the first two assignment programs did not match, the third trace is matched against each of them in turn; if it matches one, it joins that class; otherwise each of the three assignment programs forms its own class;
the next variable execution trace is selected in turn, and the process is repeated until the variable execution traces of all assignments have been matched;
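The incremental clustering described above can be sketched as follows. The sketch assumes each program is already reduced to a dictionary from variable name to its value sequence, and it checks only conditions ii and iii (same number of variables, one-to-one correspondence of value sequences); the block-structure check of condition i is omitted. All names are hypothetical.

```python
from collections import Counter

def traces_match(trace_a, trace_b):
    """True if the variables of both traces pair up one-to-one
    with identical value sequences (variable names may differ)."""
    if len(trace_a) != len(trace_b):
        return False
    seqs_a = Counter(tuple(v) for v in trace_a.values())
    seqs_b = Counter(tuple(v) for v in trace_b.values())
    return seqs_a == seqs_b

def cluster(programs):
    """programs: {program_name: trace}. Returns a list of clusters."""
    clusters = []
    for name, trace in programs.items():
        for members in clusters:
            # Compare against one representative of each existing class.
            if traces_match(programs[members[0]], trace):
                members.append(name)
                break
        else:
            clusters.append([name])          # no class matched: new class
    return clusters

programs = {
    "a": {"s": [0, 1, 3], "i": [1, 2]},
    "b": {"total": [0, 1, 3], "k": [1, 2]},
    "c": {"p": [1, 2, 6], "i": [1, 2, 3]},
}
print(cluster(programs))
```

Comparing against a single representative per class mirrors the description, in which a new trace is matched against any one member of an existing class.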
Step five: randomly select one assignment program from each class as a template, forming the template set used for local repair of incorrect programs;
Step six: for each incorrect assignment program, obtain its variable execution trace using the test samples;
Step seven: match the variable execution trace of the current incorrect assignment program S against the correct variable execution traces selected one by one from the template set, to obtain the matched correct variables and the unmatched erroneous variables of program S;
a correct variable is a variable in the trace of the incorrect assignment program that matches a variable in the correct trace, i.e., a variable of the incorrect program that contains no error;
Step eight: generate the set of mapping relations by taking the Cartesian product of the erroneous variables of program S and all variables in the correct variable execution trace from the template set;
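A minimal sketch of step eight, assuming the erroneous variables and the template variables are given as lists of names: `itertools.product` enumerates every way of assigning each erroneous variable to some template variable. The function name and the dictionary representation of a mapping relation are illustrative assumptions.

```python
from itertools import product

def candidate_mappings(error_vars, template_vars):
    """Every assignment of each erroneous variable to a template variable."""
    return [dict(zip(error_vars, choice))
            for choice in product(template_vars, repeat=len(error_vars))]

mappings = candidate_mappings(["s", "i"], ["total", "k"])
print(len(mappings), mappings[0])
```

The number of candidates grows as the number of template variables raised to the number of erroneous variables, which is why the later steps rank candidates by repair cost.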
Step nine: calculate the repair cost of each mapping relation in the set of mapping relations.
The repair cost is the amount of editing required to change the erroneous code fragment a into b;
the invention converts the program fragments into abstract syntax trees and, based on the tree structure, computes the tree edit distance Distance(a, b) as the repair cost:
$$\mathrm{Distance}(a, b) = S_q + I_q + D_r$$
where $S_q$ is the number of node modification operations between the two abstract syntax trees, $I_q$ is the number of node insertion operations, and $D_r$ is the number of node deletion operations;
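To make the tree edit distance concrete, the sketch below computes an ordered tree edit distance between the Python ASTs of two fragments, counting one unit per node relabel, insertion, or deletion. It uses the textbook forest recursion with memoization, which is simple but only practical for small fragments; the patent does not say which algorithm it uses, and the labeling of nodes is an assumption.

```python
import ast
from functools import lru_cache

def to_tree(node):
    """Convert an ast node into a hashable (label, children) tuple."""
    label = type(node).__name__
    if isinstance(node, ast.Name):
        label += ":" + node.id
    elif isinstance(node, ast.Constant):
        label += ":" + repr(node.value)
    return (label, tuple(to_tree(c) for c in ast.iter_child_nodes(node)))

def size(forest):
    """Number of nodes in a forest (tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    if not f:
        return size(g)                        # insert everything in g
    if not g:
        return size(f)                        # delete everything in f
    (lv, cv), (lw, cw) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + cv, g) + 1,  # delete rightmost root of f
        forest_distance(f, g[:-1] + cw) + 1,  # insert rightmost root of g
        forest_distance(cv, cw) + forest_distance(f[:-1], g[:-1])
        + (lv != lw),                         # match roots, relabel if needed
    )

def repair_cost(code_a, code_b):
    a, b = to_tree(ast.parse(code_a)), to_tree(ast.parse(code_b))
    return forest_distance((a,), (b,))

print(repair_cost("x = 0", "x = 1"))
```

Changing a constant costs one relabel, while adding a whole assignment statement costs one insertion per AST node of that statement.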
Step ten: from all repair costs, select the mapping relation whose variables match completely and whose cost is minimal, map the correct variables of the correct variable execution trace onto the corresponding variables of the incorrect program S, and use regular expressions to associate the repair operations with programming knowledge points, completing the generation of the final repair feedback.
Step eleven: construct a joint factor model to evaluate the student's programming learning state and to predict the student's result on the next question;
based on the student's answering performance, the joint factor model predicts the probability of a correct answer by linearly adding the mastery and difficulty terms of all knowledge points.
The probability $p_{ij}$ of a correct answer is computed from the following quantities:
$Y_{ij}$ denotes the probability that student $i$ answers question $j$ correctly; $\theta_i$ denotes the degree of mastery of student $i$ before answering the question; $\beta_k$ denotes the easiness of knowledge point $k$; $\gamma_k$ denotes the learning rate of knowledge point $k$; $q_{jk}$ indicates whether question $j$ contains knowledge point $k$ (1 if it does, 0 otherwise); $c_{ik}$ denotes the actual performance of student $i$ on knowledge point $k$; $T_{ik}$ denotes the number of attempts by student $i$ at knowledge point $k$, and the product $\gamma_k T_{ik}$ expresses that the student's mastery of knowledge point $k$ increases with each attempt; $K$ denotes the total number of knowledge points in the programming questions.
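The symbols above match those of the standard additive factor model from the learning-analytics literature. As a point of reference only, and as an assumption rather than the patent's own formula, that model reads:

```latex
\ln\frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_{k=1}^{K} q_{jk}\,\bigl(\beta_k + \gamma_k T_{ik}\bigr)
```

How the additional performance term $c_{ik}$ enters the patent's joint factor model cannot be recovered from the text, so it is left out of this reference form.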
The parameters $\theta_i$, $\beta$, and $\gamma$ of the joint factor model are optimized, and the student's next question is input into the optimized joint factor model to predict the result for that question.
Compared with the prior art, the invention has the following advantages:
(1) Compared with existing software repair tools, the method repairs students' programming assignments and produces both the clustering of correct programs and the repair operations for incorrect programs. It sets the granularity of program analysis at the block level, matches and repairs using variable execution traces, and repairs on the basis of semantics, so that the repair focuses on the algorithm rather than its surface form; this improves the repair rate of programming assignments.
(2) The method optimizes the algorithm for generating local repairs and reduces the time needed to obtain the repair cost; the tree edit distance is used to calculate the repair cost, which provides several options for repair in different scenarios.
(3) Existing learning state prediction models assume by default that each question contains only one knowledge point when analyzing students' answers, whereas in programming practice each question is complex and may contain several knowledge points; the method takes multiple knowledge points per question into account.
(4) Code submitted by students contains more information than answers to traditional objective questions, yet the observed variable of existing models can only be binary (correct or incorrect answer); the method introduces multiple knowledge points and code information into the learning state evaluation of programming practice.
Drawings
FIG. 1 is a schematic diagram of a programming beginner-oriented personalized intelligent tutoring method according to the present invention;
FIG. 2 is a flowchart of a programming beginner-oriented personalized intelligent tutoring method of the present invention;
FIG. 3 is a block diagram of a programming beginner-oriented personalized intelligent tutoring method of the present invention;
FIG. 4 is a time comparison of tree edit distance repair and vector distance repair used in the present invention;
FIG. 5 is a histogram of relative repair cost versus number of incorrect programs according to the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings, so that those skilled in the art can understand and practice it.
An intelligent tutoring system (ITS) is an adaptive learning support system in which, by means of artificial intelligence technology, a computer plays the role of a virtual tutor that teaches learners and provides learning guidance. As a product of the development of modern distance education towards intelligence, the ITS combines theories and methods from several disciplines, including artificial intelligence, computer science, education, behavioral science, and psychology, and aims to provide learners with guidance and help through human-computer interaction. An intelligent tutoring system can judge a student's level of mastery of the relevant knowledge points from the student's answers, and thereby help generate a personalized learning path and provide targeted tutoring suggestions.
Existing intelligent tutoring systems are mainly applied to basic subjects. Intelligent tutoring for programming courses is still lacking: what exists is mostly a simple online evaluation system with the following defects: (1) it cannot provide repair guidance for student errors; (2) it does not summarize the students' learning process, so students cannot learn how well they have mastered the programming knowledge points. To address these problems, the invention provides a personalized intelligent tutoring method for programming beginners.
As shown in FIG. 1, the method collects, verifies, and stores the students' programming data; clusters the correct assignment programs among the submissions in order to repair the incorrect assignment programs; and trains a learning state prediction model to infer the students' ability on the programming knowledge points, completing the generation of the final feedback.
As shown in FIG. 2, the method comprises the following steps:
Step one: for a given question, divide the programming assignment programs submitted by the programming beginners into two classes, correct answers and incorrect answers;
Step two: divide each assignment program at block granularity, according to whether it contains selection (control) statements and loop statements;
The division rules are as follows:
i. a program with no loop statement and no selection statement is one whole block and is not divided; a selection statement that does not nest a loop statement is likewise treated as one whole block and is not divided; each such block is expressed as {L:0};
ii. a single-layer loop statement is a separate block, expressed as {L:1};
iii. if a loop statement is nested inside a selection statement, the whole is a single block, expressed as {L:1};
iv. for multi-layer nested loops, each layer is a separate block; a loop block L2 nested inside a loop block L1 is expressed as {L1:{L2:1}}.
Step three: feed the test samples for the question into each correct assignment program to obtain the variable values within each block, and combine these values into the variable execution trace of that assignment;
a test sample comprises a correct input for the question and the corresponding output;
Step four: select the variable execution traces one by one and judge whether the matching conditions are met; assignment programs whose traces match are grouped into one class, and an unmatched assignment program forms a class of its own;
two variable execution traces match only if all of the following conditions hold:
i. the number of blocks in the two variable execution traces is the same, and the selection statements and loop structure statements are the same;
ii. the number of variables in the two variable execution traces is the same;
iii. the variables of the two variable execution traces correspond one to one; that is, for each variable in one program there is a variable in the other program that takes equal values in the same order.
Specifically:
first, two variable execution traces are selected at random and checked against the matching conditions; if they match, the two assignment programs are placed in one class; otherwise, each of the two assignment programs forms its own class;
then a third variable execution trace is selected; if the first two assignment programs matched, the third trace is matched against either of them, and if the matching conditions are met it joins that class, otherwise it forms a class of its own;
if the first two assignment programs did not match, the third trace is matched against each of them in turn; if it matches one, it joins that class; otherwise each of the three assignment programs forms its own class;
the next variable execution trace is selected in turn, and the process is repeated until the variable execution traces of all assignments have been matched;
Step five: randomly select one assignment program from each class as a template, forming the template set used for local repair of incorrect programs;
Step six: for each incorrect assignment program, obtain its variable execution trace using the test samples;
Step seven: match the variable execution trace of the current incorrect assignment program S against the correct variable execution traces selected one by one from the template set, to obtain the matched correct variables and the unmatched erroneous variables of program S;
a correct variable is a variable in the trace of the incorrect assignment program that matches a variable in the correct trace, i.e., a variable of the incorrect program that contains no error;
Step eight: generate the set of mapping relations by taking the Cartesian product of the erroneous variables of program S and all variables in the correct variable execution trace from the template set;
the repair mapping relation is then used to map the template variables onto the variables of the incorrect program, yielding the line numbers and the specific operations to be repaired.
Step nine: calculate the repair cost of each mapping relation in the set of mapping relations.
The repair cost is the amount of editing required to change the erroneous code fragment a into b. It is obtained from the abstract syntax tree (AST), and the cost of an AST change can be computed in two ways. The first computes the tree edit distance Distance(a, b) based on the tree structure and uses it as the cost:
$$\mathrm{Distance}(a, b) = S_q + I_q + D_r$$
where $S_q$ is the number of node modification operations between the two abstract syntax trees, $I_q$ is the number of node insertion operations, and $D_r$ is the number of node deletion operations;
the second converts each AST into a vector and uses the difference between the two vectors as the repair cost.
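The text does not specify how an AST is turned into a vector. One simple possibility, shown here purely as an assumption, is a bag of node types: count how often each AST node type occurs and take the sum of absolute differences between the two count vectors. The names `ast_vector` and `vector_cost` are hypothetical.

```python
import ast
from collections import Counter

def ast_vector(code):
    """Count of each AST node type in the fragment (a sparse vector)."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(code)))

def vector_cost(code_a, code_b):
    """L1 distance between the node-type count vectors of two fragments."""
    va, vb = ast_vector(code_a), ast_vector(code_b)
    return sum(abs(va[key] - vb[key]) for key in va.keys() | vb.keys())

print(vector_cost("x = 0", "x = 0\ny = 1"))
```

This encoding is cheap to compute but coarse: `vector_cost("x = 0", "x = 1")` is 0 because both fragments contain the same node types, whereas a tree edit distance that compares constants would report a difference.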
Step ten: from all repair costs, select the mapping relation whose variables match completely and whose cost is minimal, map the correct variables of the correct variable execution trace onto the corresponding variables of the incorrect program S, and use regular expressions to associate the repair operations with programming knowledge points, completing the generation of the final repair feedback.
To obtain the final repair, the repair cost must be minimized in addition to satisfying the condition that the variable matching is consistent. A subset of the repair set with minimal repair cost is therefore selected using constraint optimization: the problem is first transformed into an optimization problem under linear constraints and then handed to an off-the-shelf ILP (integer linear programming) solver to obtain the optimal result.
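For small inputs, the selection problem can be illustrated without an ILP solver. The sketch below is a brute-force stand-in, not the ILP formulation itself: it enumerates one candidate repair per student variable, rejects assignments in which two student variables map to the same template variable, and keeps the cheapest total. The data layout and function name are assumptions.

```python
from itertools import product

def cheapest_consistent_repair(candidates):
    """candidates: {student_var: {template_var: cost}}.

    Returns (total_cost, mapping) for the cheapest one-to-one mapping,
    or None if no consistent mapping exists.
    """
    names = list(candidates)
    best = None
    for choice in product(*(candidates[n].items() for n in names)):
        targets = [target for target, _ in choice]
        if len(set(targets)) != len(targets):
            continue                      # two variables share one target
        total = sum(cost for _, cost in choice)
        if best is None or total < best[0]:
            best = (total, dict(zip(names, targets)))
    return best

candidates = {
    "s": {"total": 0, "k": 3},
    "i": {"total": 1, "k": 2},
}
print(cheapest_consistent_repair(candidates))
```

An ILP solver explores the same search space under the same one-to-one constraint, but scales to programs where enumeration would be too slow.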
Step eleven: construct a joint factor model to evaluate the student's programming learning state and to predict the student's result on the next question;
the knowledge points involved in the repaired errors can be regarded as knowledge points the student has not mastered and serve as a description of the student's learning process; introducing them through the joint factor model improves the prediction performance of the model.
The parameters to be solved and the quantities entering the probability $p_{ij}$ of a correct answer are as follows:
$Y_{ij}$ denotes the probability that student $i$ answers question $j$ correctly. $\theta_i$ denotes the ability of student $i$, i.e., the degree of mastery before answering; ability is positively correlated with the probability of a correct answer: the stronger the ability, the higher the probability. $\beta_k$ denotes the easiness of knowledge point $k$; it is multiplied by $q_{jk}$, so the easiness of a knowledge point affects the result only if the question contains that knowledge point. $\gamma_k$ denotes the learning rate of knowledge point $k$, i.e., the probability that a student who has not mastered knowledge point $k$ at question $j$ has mastered it at question $j+1$. $q_{jk}$ indicates whether question $j$ contains knowledge point $k$; it is a binary variable taken from the Q-matrix, equal to 1 if the question contains knowledge point $k$ and 0 otherwise. $T_{ik}$ denotes the number of attempts by student $i$ at knowledge point $k$; the product $\gamma_k T_{ik}$ expresses that the student's mastery of knowledge point $k$ increases with each attempt. $K$ denotes the total number of knowledge points in the Q-matrix. Overall, the mastery and difficulty terms of all knowledge points are added linearly to predict the probability that the student answers correctly.
The Q-matrix represents the relation between questions and knowledge points: a value of 1 in row m, column n means that question m contains knowledge point n, and a value of 0 means that it does not. The parameter $c_{ik}$ is a new variable introduced for programming teaching. It is given directly by the programming tool and represents the actual performance of student $i$ on knowledge point $k$. If the programming tool reports the performance on the knowledge point, the value is 1 when an error occurred, to emphasize its influence on the result, and 0 otherwise; if the programming tool reports no result, the knowledge point is considered not mastered and the value is likewise 1.
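How these quantities combine into a prediction can be sketched in Python under the assumption that the model has the logistic form of the standard additive factor model, with the $c_{ik}$ term left out because its role in the formula is not recoverable from the text. The function name and argument layout are hypothetical.

```python
import math

def predict_correct(theta_i, beta, gamma, q_j, t_i):
    """Probability that student i answers question j correctly.

    theta_i: ability of the student
    beta, gamma: easiness and learning rate per knowledge point
    q_j: row of the Q-matrix for question j (0/1 per knowledge point)
    t_i: attempts of the student per knowledge point
    """
    logit = theta_i + sum(
        q * (b + g * t) for q, b, g, t in zip(q_j, beta, gamma, t_i)
    )
    return 1.0 / (1.0 + math.exp(-logit))

# Question covers only the first of two knowledge points.
print(predict_correct(0.0, [0.5, -1.0], [0.2, 0.1], [1, 0], [5, 0]))
```

Because each term is multiplied by the Q-matrix entry, knowledge points that the question does not contain have no effect on the prediction, and more attempts raise the probability whenever the learning rate is positive.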
The parameters $\theta_i$, $\beta$, and $\gamma$ of the joint factor model are optimized, and the student's next question is input into the optimized joint factor model to predict the result for that question. The specific optimization process is as follows:
(1) Initialize the parameters: initialize $\theta_i^0$, $\beta_k^0$, $\gamma_k^0$ ($i = 1, 2, \dots, N$; $k = 1, 2, \dots, K$), set $m = 0$, determine the maximum number of iterations $G$, initialize the iteration counter $g = 0$, and compute $L(\theta_i^0, \beta_k^0, \gamma_k^0)$;
(2) Let $W = L(\theta_i^0, \beta_k^0, \gamma_k^0)$ and compute the likelihood function. If $g > G$, or the convergence condition on the likelihood is satisfied and $g \neq 0$, stop; otherwise continue.
The log-likelihood function $L$ is defined for the setting in which M students answer N questions.
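The expression for the log-likelihood is not preserved in the text. For a model that predicts a correct-answer probability $p_{ij}$ for each student and question, the usual Bernoulli log-likelihood has the form below; it is given as an assumption, with $y_{ij} \in \{0, 1\}$ introduced here as a symbol for the observed outcome:

```latex
L(\theta, \beta, \gamma) = \sum_{i}\sum_{j}\Bigl[\, y_{ij}\,\ln p_{ij} + (1 - y_{ij})\,\ln\bigl(1 - p_{ij}\bigr) \Bigr]
```

A gradient step on one block of parameters then takes the generic form $\theta^{m+1} = \theta^{m} + \lambda_m \nabla_{\theta} L$, with $\lambda_m$ the step size.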
If it is satisfied withAnd isStopping the iteration, thetai *=θi m,βk *=βk mB, carrying out the following steps of; otherwise, the step length is takenmSo that:
If it is satisfied withOr | | | θi m+1-θi m||<Epsilon and betak m+1-βk m||<Epsilon, stop iteration, let thetai *=θi m+1,βk *=βk m+1Updating m to m +1, and rotating (3);
(5) holding thetai m+1,βk m+1The temperature of the molten steel is not changed,m is 0, and L is calculated for gammak mGradient of (2)If it is notStopping iteration and let gammak *=γk m(ii) a Otherwise, selecting step length lambdamTo make
(6) If it is notOr | | | γk m+1-γk m||<E, stopping the iteration,let gamma bek *=γk m+1G +1, go (2); otherwise, m is g +1, and then (3) is carried out;
stopping iteration until gradient reduction is realized;
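The alternating scheme in steps (1) to (6) can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: it assumes a Bernoulli log-likelihood over observed right/wrong answers, the logistic form logit(p) = θ_i + Σ_k q_jk (β_k + γ_k T_k), and a fixed step size in place of the per-iteration step λ_m. It maximizes the log-likelihood by gradient ascent, which is equivalent to gradient descent on its negative. All names (`prob`, `log_likelihood`, `fit`, `OBS`, `Q`) and the toy data are hypothetical.

```python
import math


def prob(q, theta, beta, gamma, i, j, t):
    """P(correct) for student i on question j, with attempt counts t[k] per knowledge point."""
    z = theta[i] + sum(q[j][k] * (beta[k] + gamma[k] * t[k]) for k in range(len(beta)))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(obs, q, theta, beta, gamma):
    total = 0.0
    for i, j, t, y in obs:
        p = prob(q, theta, beta, gamma, i, j, t)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit(obs, q, n_students, max_outer=20, max_inner=100, step=0.05, eps=1e-4):
    n_kc = len(q[0])
    theta, beta, gamma = [0.0] * n_students, [0.0] * n_kc, [0.0] * n_kc
    for _ in range(max_outer):  # outer counter g, bounded by G
        before = log_likelihood(obs, q, theta, beta, gamma)
        for _ in range(max_inner):  # update theta and beta with gamma held fixed
            g_theta, g_beta = [0.0] * n_students, [0.0] * n_kc
            for i, j, t, y in obs:
                r = y - prob(q, theta, beta, gamma, i, j, t)
                g_theta[i] += r
                for k in range(n_kc):
                    g_beta[k] += r * q[j][k]
            if max(abs(v) for v in g_theta + g_beta) < eps:
                break
            theta = [a + step * d for a, d in zip(theta, g_theta)]
            beta = [a + step * d for a, d in zip(beta, g_beta)]
        for _ in range(max_inner):  # update gamma with theta and beta held fixed
            g_gamma = [0.0] * n_kc
            for i, j, t, y in obs:
                r = y - prob(q, theta, beta, gamma, i, j, t)
                for k in range(n_kc):
                    g_gamma[k] += r * q[j][k] * t[k]
            if max(abs(v) for v in g_gamma) < eps:
                break
            gamma = [a + step * d for a, d in zip(gamma, g_gamma)]
        if abs(log_likelihood(obs, q, theta, beta, gamma) - before) < eps:
            break
    return theta, beta, gamma


# Toy data: (student, question, attempts per knowledge point, answered correctly?)
Q = [[1, 0], [0, 1]]
OBS = [
    (0, 0, [0, 0], 0), (0, 0, [1, 0], 1), (0, 1, [0, 0], 1),
    (1, 0, [0, 0], 0), (1, 0, [1, 0], 0), (1, 0, [2, 0], 1),
    (1, 1, [0, 0], 0), (1, 1, [0, 1], 1),
]
theta, beta, gamma = fit(OBS, Q, n_students=2)
```

Updating (θ, β) and γ in separate phases mirrors the split between steps (3)-(4) and (5)-(6); a line search for the step size could replace the fixed `step`.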
The personalized intelligent tutoring method for programming beginners is built on the three modules shown in Fig. 3: a programming data collection and storage module, an automatic error repair module, and a learning effect evaluation module.
Data are collected through two channels. One part is collected through the evaluation module: students register, log in and submit their assignment programs online, and the data are sent to the storage module. The evaluation module is built on the open-source online evaluation platform of Qingdao University. The other part comes from the Karaoke practice teaching platform (a website providing online practical training for programming courses). Through an interface with the Karaoke platform, data on students' answers to objective questions and their completed practical training are received periodically in a specified data format, and are stored in this module once the format has been verified.
The automatic error repair module obtains the assignment program data from the data collection and storage module and divides the submissions into correct answers and wrong answers. Correct assignments are clustered, a template is selected for each cluster, and the multiple solution approaches are presented to students as a tree diagram. Incorrect assignments are repaired against the templates, and a modification hint and the erroneous knowledge points are finally fed back to the student.
The learning effect evaluation module mainly analyzes the answer data for objective questions and practical training questions, and presents the evaluation results as visual charts.
Program matching in the automatic error repair module must support diverse forms of expression: programs whose algorithmic goal is the same but whose syntax differs should be treated as one class, so that the number of classes is minimized and later repair has fewer templates to compare against. A criterion for whether a program belongs to a class must therefore be defined first. Program analysis generally works at the granularity of functions, statements, or variables. With functions as the granularity for programming assignments, the loops and control structures of the algorithm are hard to capture, information about the process is ignored, and implementation details are lost. With statements and variables as the granularity, executing loop structures greatly increases the complexity of the analysis and makes timely repair difficult. The program is therefore divided into blocks, and analysis is performed at block granularity. The specific steps of program matching and repair are as follows:
The algorithm yields a series of trace-matched variable pairs (v2, v1), but this alone does not establish whether the variables of the two programs correspond one to one. A graph algorithm is therefore used: variables are treated as nodes, and variables with identical traces are connected by an edge. In the resulting graph the nodes fall into two groups, and a node in one group can only be connected to nodes in the other group, i.e. the graph is bipartite. Whether programs P and Q match is decided by checking whether this bipartite graph has a perfect matching, which can be solved with the Hungarian algorithm.
This trace-correspondence method emphasizes the dynamic execution of the program. It can cluster code that differs in style or in the way it is written but follows the same algorithmic idea, which lays the foundation for program repair.
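The perfect-matching check on the variable graph can be sketched as below. The sketch uses an augmenting-path search (Kuhn's algorithm) for unweighted bipartite matching as a stand-in for the Hungarian algorithm named in the text; representing each variable's trace as a tuple of values, and the function name `perfect_variable_matching`, are assumptions made for illustration.

```python
def perfect_variable_matching(traces_p, traces_q):
    """Return a one-to-one map from variables of P to variables of Q
    whose traces are identical, or None if no perfect matching exists."""
    if len(traces_p) != len(traces_q):
        return None
    # An edge (u, v) exists when the two variables have identical traces.
    edges = {u: [v for v, tq in traces_q.items() if tq == tp]
             for u, tp in traces_p.items()}
    match_of_q = {}  # variable of Q -> variable of P currently matched to it

    def try_assign(u, seen):
        for v in edges[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can move elsewhere.
            if v not in match_of_q or try_assign(match_of_q[v], seen):
                match_of_q[v] = u
                return True
        return False

    for u in traces_p:
        if not try_assign(u, set()):
            return None
    return {u: v for v, u in match_of_q.items()}


# Two programs that sum a list with differently named variables.
p_traces = {"s": (0, 1, 3, 6), "i": (0, 1, 2), "n": (3,)}
q_traces = {"total": (0, 1, 3, 6), "k": (0, 1, 2), "size": (3,)}
mapping = perfect_variable_matching(p_traces, q_traces)

# A program whose accumulator takes different values does not match.
q_wrong = {"total": (0, 1, 3, 7), "k": (0, 1, 2), "size": (3,)}
no_mapping = perfect_variable_matching(p_traces, q_wrong)
```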
2. Program clustering must be scalable. Correct samples grow dynamically, and re-clustering all historical correct samples whenever a new correct submission arrives would greatly reduce the efficiency of the algorithm. The clustering algorithm is therefore made incremental, so that it need not be restarted each time.
First, input the set of correct programs P = {p_1, p_2, p_3, …, p_n}, initialize the clustering result C = {} and the number of template classes k = 0, and loop over each program p_i in P, performing the following steps:
i. if the clustering result C is empty, let c_1 = p_i, C = {c_1: [p_i]}, k = k + 1;
ii. otherwise, loop over the template programs c_j in the clustering result C and perform the following steps:
a) if c_j matches p_i, then C[c_j] += [p_i];
b) if no template matches p_i, create a new template c_{k+1} = p_i, let C[c_{k+1}] = [p_i], k = k + 1;
Return the clustering result C.
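The incremental clustering loop can be sketched as follows. The matching test is passed in as a predicate; the stand-in used in the example (comparing a precomputed signature) and the names `add_to_clusters`, `cluster_programs` and `same_signature` are hypothetical. A new template is created only when the program matches none of the existing templates.

```python
def add_to_clusters(clusters, program, matches):
    """Add one correct program to the clustering result in place.

    clusters is a list of (template, members) pairs.
    """
    for template, members in clusters:
        if matches(template, program):
            members.append(program)
            return clusters
    # No template matched: the program becomes the template of a new class.
    clusters.append((program, [program]))
    return clusters


def cluster_programs(programs, matches):
    clusters = []
    for program in programs:
        add_to_clusters(clusters, program, matches)
    return clusters


# Stand-in for trace-based matching: programs carry a precomputed signature.
def same_signature(a, b):
    return a["sig"] == b["sig"]


submissions = [{"id": 1, "sig": "A"}, {"id": 2, "sig": "B"}, {"id": 3, "sig": "A"}]
clusters = cluster_programs(submissions, same_signature)

# A later correct submission is added without re-clustering the history.
add_to_clusters(clusters, {"id": 4, "sig": "B"}, same_signature)
```

Each new submission is compared only with one template per class, so the cost of an update grows with the number of classes rather than with the number of stored programs.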
3. Program repair relies on the clustering result. Because programs in the same cluster implement the same algorithm, the erroneous program only has to be compared with the template of each cluster, which reduces the complexity of the algorithm and improves its efficiency.
Because program matching fixed the block as the granularity of program analysis, repair proceeds from local to global: the erroneous code is first divided into blocks, a set of local repairs is generated for the expressions in each block, and the results of all blocks together form the repair set.
3.1 Determining the repair cost.
The diff tool reflects differences between lines of code through two basic change operations, adding and deleting lines, but it cannot reflect differences in the syntactic structure of the code. To quantify the size of a repair operation, program expressions are converted into abstract syntax trees (ASTs). Each node of the tree represents a construct of the programming language, and this representation hides syntactic detail. Additions, deletions and modifications of expressions can therefore be derived from the corresponding changes to the AST.
The cost of a change to the AST can be calculated in two ways:
The tree edit distance is analogous to the string edit distance: it is the minimum cost of the operations needed to transform tree a into tree b. Going from a to b may require modifying, inserting, or deleting nodes, and each unit operation is given weight 1. If the number of modifications is denoted S_q, the number of deletions D_r, and the number of insertions I_q, the tree edit distance is the sum of these operations, i.e. Distance(a, b) = S_q + I_q + D_r;
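A unit-cost tree edit distance between two expressions can be sketched with Python's `ast` module and the standard recurrence on ordered forests. This is an unoptimized illustration, not the implementation used by the method: the node labelling (type name, plus identifier or constant value) and the function names are assumptions, and only the total Distance(a, b) is returned rather than the separate counts S_q, I_q and D_r.

```python
import ast
from functools import lru_cache


def to_tree(node):
    """Convert a Python ast node into a hashable (label, children) tuple."""
    if isinstance(node, ast.Name):
        label = "Name:" + node.id
    elif isinstance(node, ast.Constant):
        label = "Constant:" + repr(node.value)
    else:
        label = type(node).__name__
    return (label, tuple(to_tree(child) for child in ast.iter_child_nodes(node)))


def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + kids_f, g) + 1,  # delete the last root of f
        forest_distance(f, g[:-1] + kids_g) + 1,  # insert the last root of g
        forest_distance(kids_f, kids_g)           # keep or modify the last roots
        + forest_distance(f[:-1], g[:-1])
        + (label_f != label_g),
    )


def tree_edit_distance(expr_a, expr_b):
    a = to_tree(ast.parse(expr_a, mode="eval").body)
    b = to_tree(ast.parse(expr_b, mode="eval").body)
    return forest_distance((a,), (b,))


d_same = tree_edit_distance("x + 1", "x + 1")
d_const = tree_edit_distance("x + 1", "x + 2")   # one node modified
d_grow = tree_edit_distance("x", "x + 1")        # three nodes inserted
```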
For AST vectorization, the position information of each node is retained as far as possible. The abstract syntax tree is essentially a binary tree; to convert it into a vector, it is split into a series of atomic subtrees that capture the structural information of the tree. Specifically, the number of node types L and the height q of the tree are obtained first; there are then at most […] kinds of atomic tree. With b_i denoting the number of occurrences of the i-th atomic tree, the vector representing the shape of the tree is […]. To add position information, each element of the shape vector is expanded to b_i dimensions, which record the heights at which that atomic tree occurs in the AST; the positional shape vector of the AST is:
This representation retains both the shape of the AST and its position information. After the ASTs have been converted into vectors, the final repair cost J is:
3.2 Local repair.
i. Input the erroneous program p, the template program c, and the test inputs I;
ii. if the block partitions of p and c differ, return "repair failure" and exit;
iii. run c on the inputs I to obtain its trace set Γ_C = {[P_C](ρ) | ρ ∈ I};
iv. run p on the inputs I to obtain its trace set Γ_P = {[P_P](ρ) | ρ ∈ I};
v. loop over all variables (l, v2) in all blocks of the program p to be repaired;
Loop over all variables v1 in the template program c and obtain the assignment expression e_C of v1;
Construct the set of mapping relations W; for each mapping relation w ∈ W, the variables of e_impl are mapped to the variables of e_C, ensuring that v1 is mapped to v2;
x. loop over each w in the set W:
a) replace the variables of e_C according to w to obtain e_repair;
b) if e_repair matches e_impl, add (w, 0) to LR;
c) otherwise, calculate the repair cost between e_repair and e_impl and add (w, cost) to LR;
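The inner loop of local repair (building e_repair from e_C under each mapping w and recording a cost) can be sketched as follows. The sketch makes several assumptions: expressions are Python source strings, every name in an expression is treated as a variable, mappings go from template variables to student variables and are one to one, and the cost function is supplied by the caller. The names `rename`, `variables` and `local_repairs` are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
from itertools import permutations


class _Rename(ast.NodeTransformer):
    """Rename variables in an expression according to a mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        new_id = self.mapping.get(node.id, node.id)
        return ast.copy_location(ast.Name(id=new_id, ctx=node.ctx), node)


def rename(expr, mapping):
    tree = _Rename(mapping).visit(ast.parse(expr, mode="eval"))
    return ast.unparse(tree.body)


def variables(expr):
    tree = ast.parse(expr, mode="eval")
    return sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})


def local_repairs(v2, e_impl, v1, e_c, student_vars, cost):
    """Return LR, a list of (w, cost) pairs, for one student variable v2."""
    free = [v for v in variables(e_c) if v != v1]
    targets = [v for v in student_vars if v != v2]
    target_ast = ast.dump(ast.parse(e_impl, mode="eval"))
    repairs = []
    for image in permutations(targets, len(free)):
        w = {v1: v2, **dict(zip(free, image))}  # v1 is always mapped to v2
        e_repair = rename(e_c, w)
        same = ast.dump(ast.parse(e_repair, mode="eval")) == target_ast
        repairs.append((w, 0 if same else cost(e_repair, e_impl)))
    return repairs


# Template assigns s = s + a[i]; the student wrote total = total + xs[k].
repairs = local_repairs(
    "total", "total + xs[k]", "s", "s + a[i]",
    ["total", "xs", "k"],
    cost=lambda e_repair, e_impl: 1,  # placeholder for an AST-based cost
)
```

In practice the placeholder cost would be replaced by the tree edit distance or the vector distance described in 3.1.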
3.3 Solving for the final repair.
There are many candidate repairs for the whole program; a complete repair is a consistent subset of the local repairs. That is, within this subset no variable mappings conflict: each variable in the erroneous program corresponds to exactly one variable in the template program, and no two variables correspond to the same variable.
i. First define the decision variables. Let […] denote the variable set of the correct program, V_impl the variable set of the erroneous program, and […] the local repair set; the domain D of the decision variables is:
ii. The feasible region to be satisfied can then be written down from the four conditions above:
iii. The final objective function is:
iv. Finding the final repair is thereby converted into a 0-1 optimization problem under linear constraints, and the optimal solution is obtained with lpsolve.
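The selection of a consistent, minimum-cost set of local repairs can be illustrated with a small exhaustive search. This replaces the 0-1 linear program and the lpsolve solver with brute-force enumeration, which is only practical for small programs; the data layout (one list of `(mapping, cost)` candidates per student variable, with mappings from template variables to student variables) and the names `consistent` and `best_repair` are assumptions.

```python
from itertools import product


def consistent(mappings):
    """Merge variable mappings; return None if they conflict."""
    merged = {}
    for w in mappings:
        for src, dst in w.items():
            if merged.setdefault(src, dst) != dst:
                return None  # one template variable mapped to two student variables
    if len(set(merged.values())) != len(merged):
        return None          # two template variables mapped to the same student variable
    return merged


def best_repair(local_repairs):
    """Pick one local repair per student variable with minimum total cost.

    local_repairs: {student_var: [(mapping, cost), ...]}
    Returns (total_cost, merged_mapping), or None if no consistent choice exists.
    """
    best = None
    keys = list(local_repairs)
    for choice in product(*(local_repairs[k] for k in keys)):
        merged = consistent([w for w, _ in choice])
        if merged is None:
            continue
        total = sum(c for _, c in choice)
        if best is None or total < best[0]:
            best = (total, merged)
    return best


candidates = {
    "total": [({"s": "total", "a": "xs", "i": "k"}, 0),
              ({"s": "total", "a": "k", "i": "xs"}, 3)],
    "k": [({"i": "k"}, 1),
          ({"i": "xs"}, 0)],
}
result = best_repair(candidates)
```

The cheapest candidate for each variable taken separately (cost 0 and cost 0) is rejected here because the two mappings disagree on the template variable i, which is the kind of conflict the consistency constraints exclude.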
4. For the repair result obtained, the repair mapping is used to map the variables back to the variables of the erroneous program, yielding the line numbers and the specific operations to be repaired. Based on the statements involved, rule-based regular expressions then map each repair operation to a knowledge point, completing the generation of the final feedback.
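Mapping a repaired statement to knowledge points with regular expressions can be sketched as below. The rule table is hypothetical: the patterns and their order are illustrative only, and the two-letter codes (LO for loops, CD for conditions, PR for print statements, LI for lists, OP for operators, EX for expressions) follow the knowledge point abbreviations used in the experimental section.

```python
import re

# Hypothetical rule table: pattern on the repaired statement -> knowledge point code.
RULES = [
    (re.compile(r"^\s*(for|while)\b"), "LO"),
    (re.compile(r"^\s*(if|elif|else)\b"), "CD"),
    (re.compile(r"\bprint\s*\("), "PR"),
    (re.compile(r"\[[^\]]*\]"), "LI"),
    (re.compile(r"[-+*/%]|==|<=|>=|<|>"), "OP"),
]


def knowledge_points(statement):
    """Return the codes of all rules whose pattern occurs in the statement."""
    return [code for pattern, code in RULES if pattern.search(statement)]


def feedback(line_no, operation, statement):
    """Build one feedback line; fall back to EX when no rule applies."""
    codes = knowledge_points(statement) or ["EX"]
    return (f"Line {line_no}: {operation} -> `{statement.strip()}` "
            f"(knowledge points: {', '.join(codes)})")


message = feedback(3, "modify", "    total = total + xs[k]")
```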
The programming learning prediction model works as follows; the joint factor model for programming is used to obtain each student's mastery of the programming knowledge points:
(1) The factors influencing answering performance fall into two main types. One type is question-related: the difficulty of the question, the knowledge points it contains, and how easy each knowledge point is to learn. The other type is student-related: the student's skill mastery before answering, the learning rate, and error factors such as the guessing probability and the mistaken-answer (slip) probability.
The difficulty and the learning rate differ for each knowledge point in a question, so both are set as parameters.
The specific implementation steps of the feedback module are as follows:
(1) a programming error repair system is built with Django, displaying the various solutions to correctly answered questions and the repair hints for students' incorrect answers;
(2) changes in student ability are displayed with a JavaScript charting library such as ECharts;
(3) the visualization results are fed back to the teacher or the student.
The results show that:
A set of collected student programming examples is given below to describe the method of the invention in further detail. The course has nine questions across its chapters, and Table 1 describes the basic characteristics of this data.
Table 1: Course chapters, questions, students and related knowledge points
As Table 1 shows, the course contains 6 chapters, each with a different number of questions, for a total of 9 questions. 1030 students took part in answering; the questions attempted are not identical across students, and the number of submissions per question also differs. There were 6885 submissions in total, with a maximum of 23 submissions for the same exercise. Together the questions involve 11 knowledge points: Constants (CO), Variables (VA), Operators (OP), Strings (ST), Expressions (EX), Lists (LI), Tuples (TU), Dictionaries (DI), Conditions (CD), Loops (LO) and Print statements (PR). The number of knowledge points differs from question to question; C-9 is a comprehensive question that contains all knowledge points.
The experimental procedure for carrying out the invention on this data set is as follows:
1. Program matching experiment
The automatic program repair tool was applied to the correct submissions for the same question. After the matching algorithm was run, pairs of matched programs were analyzed further to obtain different ways of expressing the same algorithm. As shown in Table 2, the expressions in expression set 1 and expression set 2 produce identical variable execution traces for the same input.
Table 2: matched expressions
The correct submissions for the 9 questions of the experimental course were clustered; the results are shown in Table 3. The number of clusters per question ranges from 2 to 5, and the more complex the question, the more clusters there are. Question C-9 yields 5 clusters.
Table 3: clustering results
Question | Title | Average lines of code | Number of clusters |
---|---|---|---|
C-1 | | 5 | 2 |
C-2 | Challenge list | 11 | 3 |
C-3 | Sorting | 11 | 3 |
C-4 | Changing dictionary elements | 12 | 4 |
C-5 | Calculating a numerical value | 18 | 5 |
C-6 | Data classification | 14 | 3 |
C-7 | Calculating factorial | 12 | 3 |
C-8 | Data traversal | 12 | 4 |
C-9 | Synthetic challenge | 19 | 5 |
2. Automatic program repair experiment
The repair rate and repair time were measured on the course data set for the two representations of repair cost, tree edit distance and vector distance. The repair rates do not differ. Fig. 4 compares the repair time when tree edit distance is used as the repair cost with the repair time when vector distance is used: for simple questions such as C-1 and C-2 the difference is not significant, but as the complexity of the questions increases, using vector distance as the repair cost takes noticeably less time than using tree edit distance.
To measure the complexity of the repair operations, vector distance is used as the repair cost. The repair cost is divided by the total vector value of the erroneous program to obtain the relative repair cost, which is normalized and shown as a histogram. In Fig. 5 the horizontal axis is the relative repair cost and the vertical axis is the number of erroneous programs; if the erroneous program is empty and its AST has no nodes, the relative repair cost exceeds 1.0 and may even be infinite. In the histogram, 68% of repairs have a relative repair cost below 0.2 and 25% have a relative repair cost below 0.1. Most repairs completed by the repair tool are therefore simple repairs, which indicates that the tool is practical.
3. Programming learning state prediction
Ten-fold cross-validation was performed on the data set collected by this method, and the AUC and ACC of the predictions of several models were calculated; the results are shown in Table 4.
Table 4: Prediction results
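The two metrics and the fold split can be sketched as follows. This is a minimal illustration, not the evaluation code behind Table 4: it assumes binary labels, a 0.5 decision threshold for ACC, AUC computed as the fraction of correctly ordered positive/negative pairs, and interleaved folds without shuffling. The function names are hypothetical.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """ACC: share of answers whose thresholded prediction equals the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def auc(y_true, y_prob):
    """AUC: probability that a correct answer is scored above an incorrect one."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k interleaved test folds."""
    return [list(range(i, n, k)) for i in range(k)]


labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]
acc_value = accuracy(labels, scores)
auc_value = auc(labels, scores)
folds = k_fold_indices(25, k=10)
```

For each fold, a model would be fitted on the remaining nine folds and scored on the held-out one; the reported AUC and ACC would then be averages over the ten folds.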
Matters not described in detail in this invention are within the common knowledge of those skilled in the art.
Claims (6)
1. A personalized intelligent tutoring method for programming beginners, characterized by comprising the following specific steps:
first, for a given question, the programming assignment programs submitted by the beginners are divided into two classes, correct answers and wrong answers; each assignment program is partitioned at block granularity according to whether it contains control statements and loop statements;
then, the test samples corresponding to the question are input into each assignment program to obtain the values of the variables in each block, and these values are combined to form the variable execution trace of each assignment;
for the correct answers, the corresponding variable execution traces are selected one by one and checked against the matching condition; if the condition is met, all matched assignment programs are gathered into one class, otherwise an unmatched assignment program forms a class of its own; one assignment program is randomly selected from each class as a template, forming the template set used for local repair of erroneous programs;
for a wrong answer, the variable execution traces in the current erroneous assignment program S are matched against the correct variable execution traces selected one by one from the template set, giving the matched correct variables and the unmatched erroneous variables of program S; a set of mapping relations is generated as the Cartesian product of the erroneous variables of program S and all variables in the current correct variable execution trace from the template set; the repair cost corresponding to each mapping relation is calculated; from all repair costs, the mapping relation that is fully consistent in variable matching and has the minimum cost is selected, the correct variables in the correct variable execution trace are mapped to the corresponding variables of the erroneous program S, and regular expressions are used to map the repair operations to programming knowledge points, completing the generation of the final repair feedback;
finally, a joint factor model is constructed to evaluate the student's programming learning state and to predict the student's result on the next question.
2. The method as claimed in claim 1, wherein the rules for partitioning the assignment programs at block granularity are as follows:
i. a program with no loop statement and no selection statement is a single block and is not divided; a selection statement with no nested loop statement is likewise treated as a single block and is not divided; each such block is denoted {L:0};
ii. a single-layer loop statement is a separate block, denoted {L:1};
iii. a selection statement with a loop statement nested inside it is treated as a whole as a single block, denoted {L:1};
iv. in multi-layer nested loops, each layer is a separate block; a loop block L2 nested in a loop block L1 is denoted {L1:{L2:1}}.
3. The method as claimed in claim 1, wherein two variable execution traces match when the following conditions are all satisfied:
i. the two variable execution traces have the same number of blocks, and the same control statements and loop structure statements;
ii. the two variable execution traces have the same number of variables;
iii. the variables of the two execution traces correspond one to one.
4. The personalized intelligent tutoring method for programming beginners as claimed in claim 1, wherein the correct answers are clustered as follows:
first, two variable execution traces are selected at random and checked against the matching condition; if it is met, the two matched assignment programs are placed in one class; otherwise the two corresponding assignment programs each form a class of their own;
then, a third variable execution trace is selected and matched against either of the two matched assignment programs; if the matching condition is met, it is placed in that class, and if not, it forms a class of its own;
for two unmatched assignment programs, the third variable execution trace is matched against each of them in turn; if the matching condition is met, it is placed in the matching class, otherwise the three assignment programs each form a class of their own;
the next variable execution trace is then selected in turn and the process is repeated until the variable execution traces of all assignments have been matched.
5. The personalized intelligent tutoring method for programming beginners as claimed in claim 1, wherein the repair cost is the size of the operations required to modify the erroneous code segment a into b;
the program segment is converted into an abstract syntax tree, and the tree edit distance Distance(a, b), calculated from the structure of the tree, is used as the repair cost; the calculation formula is:
Distance(a, b) = S_q + I_q + D_r
where S_q is the number of node modifications between the two abstract syntax trees, I_q is the number of node insertions, and D_r is the number of node deletions.
6. The method as claimed in claim 1, wherein the joint factor model predicts the student's answer probability from the student's answering performance by linearly adding the mastery and difficulty of all knowledge points; the probability p_ij of a correct answer is calculated as follows:
Y_ij represents the probability that student i answers question j correctly; θ_i represents the mastery of student i before answering; β_k represents the easiness of knowledge point k; γ_k represents the learning rate of knowledge point k; q_jk indicates whether question j contains knowledge point k, being 1 if it does and 0 otherwise; c_ik represents the actual performance of student i on knowledge point k; T_ik represents the number of attempts by student i at knowledge point k, and multiplying γ_k by T_ik means that each attempt raises the student's mastery of knowledge point k; K represents the total number of knowledge points in the programming questions;
the parameters θ_i, β and γ of the joint factor model are optimized, and the student's next question is input into the optimized joint factor model to predict the result on that question.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2020113337521 | 2020-11-24 | ||
CN202011333752 | 2020-11-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114092288A (en) | 2022-02-25 |
Family
ID=80303301
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111395652.6A Pending CN114092288A (en) | 2020-11-24 | 2021-11-23 | Personalized intelligent tutoring method for programming beginners |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114092288A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||