CN114092288A

CN114092288A - Personalized intelligent tutoring method for programming beginners

Info

Publication number: CN114092288A
Application number: CN202111395652.6A
Authority: CN
Inventors: 吴文峻; 梁堉; 武丽莎; 韩勇
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2020-11-24
Filing date: 2021-11-23
Publication date: 2022-02-25

Abstract

The invention discloses a personalized intelligent tutoring method for programming beginners and relates to the field of intelligent education. First, each programming assignment submitted for a given question is divided with the block as the granularity. Test samples are then input into each assignment program to obtain its variable execution trace. Correct programs are grouped into clusters according to the matching conditions, and one template is randomly selected from each cluster. For the current erroneous program, templates are selected one by one and their variable execution traces are matched against those of the program; mapping relations are generated with a Cartesian product, and the repair cost of each mapping relation is calculated. The mapping relation whose variable matching is fully consistent and whose cost is minimal is selected, the correct variables are mapped onto the corresponding variables of the erroneous program, the repair is associated with the corresponding knowledge points, and the final repair feedback is generated. Finally, a joint factor model is constructed to evaluate the student's programming learning state. The invention improves the repair rate of programming assignments.

Description

Personalized intelligent tutoring method for programming beginners

Technical Field

The invention relates to the field of intelligent education, and in particular to a personalized intelligent tutoring method for programming beginners.

Background

Intelligent programming tutoring is an important part of intelligent education. It aims to help students repair the erroneous programs they submit and to estimate their mastery of programming knowledge points.

The field of intelligent programming tutoring currently faces the following problems: automatic repair support for the small programs written by beginners is insufficient, repairing programming errors takes a long time, repair results are obscure and hard to understand, models for predicting the programming learning state are lacking, and existing online practice systems for programming courses provide too little personalized feedback. These problems create the need for an intelligent programming tutoring system.

In a traditional online judge system, students only learn whether their code is correct; further repair of erroneous code must be done by the students themselves or with a teacher's help, and manual repair takes a great deal of time and is extremely inefficient. In addition, students lack a basic understanding of their own mastery of the knowledge points during programming learning and find it hard to obtain personalized guidance. An intelligent tutoring system for programming beginners is therefore urgently needed to assist students in programming courses: it should evaluate submitted code in real time, provide repair suggestions and the error-related knowledge points for erroneous programs, and estimate how well students have learned the programming knowledge points after each stage of learning.

With manual repair, students waste a great deal of time repairing errors on their own, which also dampens their enthusiasm for learning programming, and the teacher-to-student ratio does not allow teachers to repair every student's errors one by one. Implementing an intelligent tutoring system for a specific course has therefore become an important part of intelligent learning at the present stage.

Disclosure of Invention

To address the problem of personalized intelligent tutoring for programming beginners and to overcome the shortcomings of the prior art, the invention provides a personalized intelligent tutoring method for programming beginners. Starting from the collection of learning behavior data in a programming course, the method clusters the correct submissions of a programming exercise by matching to obtain a template for each class, returns feedback through an automatic repair tool when a submission is incorrect, and builds a learning state prediction model from students' erroneous submissions to evaluate students and predict the results of their new submissions.

The personalized intelligent tutoring method for the programming beginners comprises the following steps:

Step one, for a given question, divide the assignment programs submitted by the programming beginners into two classes, correct answers and incorrect answers;

Step two, divide each assignment program with the block as the granularity, according to whether it contains selection (control) statements and loop statements;

the rules of the division are as follows:

i. a program with no loop statement and no selection statement is one whole block and is not divided; a selection statement with no nested loop statement is likewise taken as a whole block without division; each such block is expressed as {L:0};

ii. a single-layer loop statement is a separate block, expressed as {L:1};

iii. if a loop statement is nested in a selection statement, the whole is a single block, expressed as {L:1};

iv. for multi-layer nested loops, each layer is a separate block; a loop block L2 nested in a loop block L1 is expressed as {L1:{L2:1}}.

Step three, input the test samples of the question into each correct assignment program to obtain the variable values of the blocks in each program, and combine the variable values into the variable execution trace of each program;

the test samples comprise the correct inputs and outputs of the question;

Step four, select the variable execution traces one by one and judge whether the matching conditions are met; if so, group the matched assignment programs into one class; otherwise, place the unmatched program in a class of its own;

two variable execution traces match only if the following conditions are all satisfied:

i. the two traces contain the same number of blocks, with the same control statements and loop structures;

ii. the two traces contain the same number of variables;

iii. the variables of the two traces correspond one to one.

The method specifically comprises the following steps:

first, two variable execution traces are selected at random and checked against the matching conditions; if they match, the two assignment programs are placed in one class; otherwise, each program forms its own class;

then, a third variable execution trace is selected; if the first two programs matched, the third trace is compared with either of them, and it joins their class if the matching conditions are met or forms a new class if not;

if the first two programs did not match, the third trace is compared with each of them in turn; it joins the class of the program it matches, and if it matches neither, the three programs form three separate classes;

the next variable execution trace is then selected and the process is repeated until the traces of all assignment programs have been matched;

Step five, randomly select one assignment program from each class as a template, forming the template set used for local repair of erroneous programs;

Step six, for each erroneous assignment program, obtain its variable execution trace by using the test samples;

Step seven, match the variable execution trace of the current erroneous program S against the correct variable execution traces selected one by one from the template set, obtaining the matched correct variables and the unmatched erroneous variables of program S;

a correct variable is a variable in the trace of the erroneous program that matches a variable in the correct trace, that is, a variable of the erroneous program that contains no error;

Step eight, generate a set of mapping relations by taking the Cartesian product of the erroneous variables of program S and all variables in the correct variable execution traces of the template set;

Step nine, calculate the repair cost of each mapping relation in the set.

The repair cost is the size of the edit required to modify the erroneous code fragment a into b;

the invention converts each program fragment into an abstract syntax tree and, based on the tree structure, calculates the tree edit distance Distance(a, b) as the repair cost, with the following formula:

Distance(a,b)＝S_q+I_q+D_r

where S_q is the number of node modifications, I_q the number of node insertions, and D_r the number of node deletions needed to transform one abstract syntax tree into the other;

Step ten, from all the repair costs, select the mapping relation whose variable matching is fully consistent and whose cost is minimal, map the correct variables in the correct variable execution trace onto the corresponding variables of the erroneous program S, and use regular expressions to associate each repair operation with a programming knowledge point, completing the final repair feedback;

Step eleven, construct a joint factor model to evaluate the student's programming learning state and predict the student's result on the next question;

the joint factor model predicts the probability that the student answers correctly by linearly adding the mastery and the difficulty of all knowledge points, taking the student's answering performance into account.

The probability p_ij of a correct answer is calculated as follows:

Y_ij represents the probability that student i answers question j correctly; θ_i represents the mastery of student i before answering the question; β_k represents the easiness of knowledge point k; γ_k represents the learning rate of knowledge point k; q_jk indicates whether question j contains knowledge point k, taking the value 1 if it does and 0 if it does not; c_ik represents the actual performance of student i on knowledge point k; T_ik represents the number of attempts by student i on knowledge point k, and the product of γ_k and T_ik means that the student's mastery of knowledge point k increases with every attempt; K represents the total number of knowledge points in the programming questions.

The parameters θ_i, β and γ of the joint factor model are optimized, and the student's next question is input into the optimized model to predict the result of that question;

Compared with the prior art, the invention has the following advantages:

(1) Compared with existing software repair tools, the method repairs students' programming assignments and produces both the clustering of correct programs and the repair operations for erroneous programs; it sets the granularity of program analysis at the block, matches and repairs programs by the execution traces of their variables, and repairs on the basis of semantics, so that the repair focuses on the algorithm rather than on its surface expression, which improves the repair rate of programming assignments.

(2) The method optimizes the algorithm that generates local repairs and reduces the time needed to obtain the repair cost; calculating the repair cost with the tree edit distance provides a choice of cost measures for different repair scenarios.

(3) Existing learning state prediction models assume by default that each question contains only one knowledge point when analyzing students' answers, whereas in programming practice each question is complex and may contain several knowledge points; the method allows for multiple knowledge points per question.

(4) Code submitted by students carries more information than a traditional objective question, yet the observed variable of existing models can only be binary (right or wrong); the method introduces multiple knowledge points and code information into the learning state evaluation of programming practice.

Drawings

FIG. 1 is a schematic diagram of a programming beginner-oriented personalized intelligent tutoring method according to the present invention;

FIG. 2 is a flowchart of a programming beginner-oriented personalized intelligent tutoring method of the present invention;

FIG. 3 is a block diagram of a programming beginner-oriented personalized intelligent tutoring method of the present invention;

FIG. 4 is a time comparison of tree edit distance repair and vector distance repair used in the present invention;

FIG. 5 is a histogram of relative repair costs versus number of erroneous procedures according to the present invention.

Detailed Description

The present invention will be described in further detail and with reference to the accompanying drawings so that those skilled in the art can understand and practice the invention.

An intelligent tutoring system (ITS) is an adaptive learning support system in which, by means of artificial intelligence, a computer plays the role of a virtual tutor that teaches learners and provides learning guidance. As a product of the development of modern distance education toward intelligence, ITS combines the theories and methods of several disciplines, including artificial intelligence, computer science, education, behavioral science and psychology, and aims to guide and help learners through human-computer interaction. An intelligent tutoring system can judge a student's mastery of the relevant knowledge points from the student's answers, and thereby helps generate a personalized learning path and provides targeted tutoring suggestions.

Existing intelligent tutoring systems are mainly applied to basic subjects; for programming courses they are still lacking and are mostly simple online judge systems with the following shortcomings: (1) they give no repair guidance for student errors; (2) they do not summarize the student's learning process, so students cannot learn their own mastery of the programming knowledge points. To address these problems, the invention provides a personalized intelligent tutoring method for programming beginners, realizing intelligent tutoring for programming beginners;

as shown in fig. 1, the method collects, verifies and stores students' programming data; clusters the correct assignment programs among the students' submissions in order to repair the erroneous ones; and trains a learning state prediction model to infer the student's ability on the programming knowledge points, completing the final feedback generation.

As shown in fig. 2, the method comprises the following steps:

the rules of the division are as follows:

a single-layer loop statement is a separate block, expressed as {L:1};
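As an illustration of the block-division rules, the following minimal sketch labels the top-level statements of a program by loop-nesting depth. It assumes the submitted programs are written in Python and relies on the standard `ast` module; the function names `loop_depth` and `partition` and the (depth, line numbers) output format are hypothetical stand-ins for the {L:0}, {L:1} and {L1:{L2:1}} notation, and the sketch reports only the depth of a loop nest rather than one block per layer.

```python
import ast

def loop_depth(node):
    """Depth of the deepest loop nest inside node (0 if it contains no loop)."""
    own = 1 if isinstance(node, (ast.For, ast.While)) else 0
    return own + max((loop_depth(c) for c in ast.iter_child_nodes(node)), default=0)

def partition(source):
    """Split top-level statements into blocks of (loop depth, [line numbers]).

    Consecutive loop-free statements, including selection statements that
    contain no loop, are merged into one depth-0 block; every statement
    that contains a loop becomes a block of its own.
    """
    blocks = []
    for stmt in ast.parse(source).body:
        depth = loop_depth(stmt)
        if depth == 0 and blocks and blocks[-1][0] == 0:
            blocks[-1][1].append(stmt.lineno)
        else:
            blocks.append((depth, [stmt.lineno]))
    return blocks

# Hypothetical student submission used only for illustration.
SAMPLE = (
    "n = 3\n"
    "s = 0\n"
    "for i in range(n):\n"
    "    for j in range(i):\n"
    "        s += j\n"
    "if s > 0:\n"
    "    print(s)\n"
)
```

Here `partition(SAMPLE)` groups lines 1 and 2 into one loop-free block, reports the doubly nested loop starting at line 3 with depth 2, and places the trailing selection statement in a new loop-free block.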

The test samples of the question are input into each correct assignment program to obtain the variable values of the blocks in each program, and the variable values are combined into the variable execution trace of each program;

the test samples comprise the correct inputs and outputs of the question;
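One possible way to collect a variable execution trace is sketched below. It assumes each submission is available as a Python function and uses the interpreter hook `sys.settrace`; the helper name `variable_traces` and the sample function `running_sum` are hypothetical, and the sketch records one sequence per variable for the whole function rather than per block.

```python
import sys

def variable_traces(func, *args):
    """Run func(*args) and return {variable name: tuple of successive values}."""
    traces = {}

    def tracer(frame, event, arg):
        # Only watch the frame of the traced function itself.
        if frame.f_code is func.__code__ and event in ("line", "return"):
            for name, value in frame.f_locals.items():
                history = traces.setdefault(name, [])
                # Record a value only when it differs from the last one seen,
                # so the trace does not depend on how many lines the code has.
                if not history or history[-1] != value:
                    history.append(value)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return {name: tuple(values) for name, values in traces.items()}


def running_sum(n):  # hypothetical student submission
    s = 0
    for i in range(1, n + 1):
        s += i
    return s
```

For the input 3, the variable `s` takes the values 0, 1, 3, 6 in order and `i` takes 1, 2, 3; two programs whose variables produce the same sequences on every test input are candidates for the same class.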

the variables of the two variable execution traces must correspond one to one; that is, over the set of variable traces, for each variable in one program there is a variable in the other program that takes equal values in the same order.

The method specifically comprises the following steps:

a correct variable is a variable in the trace of the erroneous program that matches a variable in the correct trace, that is, a variable of the erroneous program that contains no error;

the repair mapping relation is used to map the variables onto those of the erroneous program, giving the line numbers and the specific operations to be repaired.

The repair cost is the size of the edit required to modify the erroneous code fragment a into b. It is obtained from the abstract syntax tree (AST), and changes in the AST yield a repair cost in two ways. The first is to calculate the tree edit distance Distance(a, b) based on the tree structure as the cost, with the following formula:

Distance(a,b)＝S_q+I_q+D_r

the other is to convert the ASTs into vectors and use the difference of the two vectors as the repair cost:

Step ten, from all the repair costs, select the mapping relation whose variable matching is fully consistent and whose cost is minimal, map the correct variables in the correct variable execution trace onto the corresponding variables of the erroneous program S, and use regular expressions to associate each repair operation with a programming knowledge point, completing the final repair feedback.

Besides keeping the variable matching consistent, the final repair must minimize the repair cost. A subset of the repair set with minimal repair cost is therefore selected using constrained optimization: the problem is first transformed into an optimization problem under linear constraints and then handed to an off-the-shelf ILP solver to obtain the optimal result.

The knowledge points involved in the repaired errors can be regarded as knowledge points the student has not mastered and serve as a description of the student's learning process, and a joint factor model is introduced to improve the prediction.

The parameters to be solved and the probability p_ij of a correct answer are calculated as follows:

Y_ij represents the probability that student i answers question j correctly. θ_i represents the ability of student i, that is, the student's mastery before answering the question; ability is positively correlated with the probability of a correct answer, so the stronger the ability, the higher the probability. β_k represents the easiness of knowledge point k and is multiplied by q_jk to influence the final probability, since the easiness of a knowledge point affects the result only if the question contains that knowledge point. γ_k represents the learning rate of knowledge point k, that is, the probability that a student who has not mastered knowledge point k at question j masters it at question j+1. q_jk indicates whether question j contains knowledge point k; it is obtained from the Q-matrix and is a binary variable, equal to 1 if the question contains knowledge point k and 0 if it does not. T_ik represents the number of attempts by student i on knowledge point k, and the product of γ_k and T_ik means that the student's mastery of knowledge point k increases with every attempt. K represents the number of knowledge points summarized in the Q-matrix. The final result is that the mastery and the difficulty of all knowledge points are linearly added to predict the probability that the student answers correctly.

The Q-matrix represents the relation between questions and knowledge points: a value of 1 in row m, column n means that question m contains knowledge point n, and a value of 0 means that it does not. The parameter c_ik is a new variable introduced for programming teaching; it is given directly by the programming tool and represents the actual performance of student i on knowledge point k. If the programming tool reports the performance on the knowledge point, c_ik takes the value 1 when an error occurred, to emphasize its influence on the result, and 0 otherwise; if the tool reports no result, the knowledge point is considered not mastered and c_ik is likewise 1.
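The formula itself is not legible in the text above. As a rough sketch only, the following assumes the standard additive-factor form logit(p_ij) = θ_i + Σ_k q_jk (β_k + γ_k T_ik), which agrees with the descriptions of θ_i, β_k, γ_k, q_jk and T_ik; how c_ik enters the model cannot be recovered from this text, so it is left out. The function name `predict_correct` and its argument layout are hypothetical.

```python
import math

def predict_correct(theta_i, beta, gamma, q_j, t_i):
    """Probability that student i answers question j correctly.

    theta_i : ability of student i
    beta    : easiness beta_k of each knowledge point
    gamma   : learning rate gamma_k of each knowledge point
    q_j     : row j of the Q-matrix (1 if question j contains point k, else 0)
    t_i     : number of attempts T_ik of student i on each knowledge point
    Assumed form (not confirmed by the source): a logistic function of
    theta_i + sum_k q_jk * (beta_k + gamma_k * T_ik).
    """
    logit = theta_i + sum(
        q * (b + g * t) for q, b, g, t in zip(q_j, beta, gamma, t_i)
    )
    return 1.0 / (1.0 + math.exp(-logit))
```

Under this assumed form, a knowledge point that the question does not contain (q_jk = 0) contributes nothing, and each additional attempt on a contained knowledge point raises the predicted probability when γ_k is positive.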

The parameters θ_i, β and γ of the joint factor model are optimized, the student's next question is input into the optimized model, and the result of that question is predicted; the specific optimization process comprises the following steps:

(1) Initialize the parameters: initialize θ_i^0, β_k^0, γ_k^0 (i = 1, 2, …, N; k = 1, 2, …, K), set m = 0, determine the maximum number of iterations G, initialize the iteration counter g = 0, and calculate L(θ_i^0, β_k^0, γ_k^0);

(2) Let W = L(θ_i^0, β_k^0, γ_k^0) and calculate the likelihood function

If g > G or

and g ≠ 0, stop; otherwise let

In the scenario where M students answer N questions, the log-likelihood function is:

where

(3) Let

Holding γ_k^m constant, calculate the gradient of L with respect to θ_i^m and β_k^m

If

and

stop the iteration and let θ_i^* = θ_i^m, β_k^* = β_k^m; otherwise, choose a step length λ^m such that:

(4) Let

Calculate the likelihood function L(θ_i^{m+1}, β_k^{m+1}, γ_k^m);

If

or ||θ_i^{m+1} − θ_i^m|| < ε and ||β_k^{m+1} − β_k^m|| < ε, stop the iteration and let θ_i^* = θ_i^{m+1}, β_k^* = β_k^{m+1}; otherwise, update m = m + 1 and go to (3);

(5) Holding θ_i^{m+1} and β_k^{m+1} constant,

set m = 0 and calculate the gradient of L with respect to γ_k^m

If

stop the iteration and let γ_k^* = γ_k^m; otherwise, select a step length λ^m such that

Calculate

(6) If

or ||γ_k^{m+1} − γ_k^m|| < ε, stop the iteration, let γ_k^* = γ_k^{m+1}, set g = g + 1, and go to (2); otherwise, set m = g + 1 and go to (3);

the iteration continues until the gradient descent has converged;
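The alternating scheme in steps (1) to (6) can be sketched roughly as follows. Several details are assumptions rather than the source's method: the model is taken to be the additive-factor form logit(p) = θ_i + Σ_k q_k (β_k + γ_k T_k), the step length is a fixed constant instead of being selected at each iteration, the stopping tests are simple gradient and likelihood thresholds, and c_ik is omitted. All names (`fit`, `SAMPLE_OBS`, the observation tuple layout) are hypothetical.

```python
import math

def prob(theta_i, beta, gamma, q, t):
    z = theta_i + sum(qk * (bk + gk * tk) for qk, bk, gk, tk in zip(q, beta, gamma, t))
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(obs, theta, beta, gamma):
    total = 0.0
    for i, q, t, y in obs:
        p = prob(theta[i], beta, gamma, q, t)
        total += math.log(p) if y else math.log(1.0 - p)
    return total

def fit(obs, n_students, n_points, outer=20, inner=100, step=0.05, eps=1e-6):
    """Alternating gradient ascent on the log-likelihood (assumed model form)."""
    theta = [0.0] * n_students
    beta = [0.0] * n_points
    gamma = [0.0] * n_points
    for _ in range(outer):                      # outer counter, like g
        before = log_likelihood(obs, theta, beta, gamma)
        for _ in range(inner):                  # hold gamma, update theta and beta
            g_theta = [0.0] * n_students
            g_beta = [0.0] * n_points
            for i, q, t, y in obs:
                err = y - prob(theta[i], beta, gamma, q, t)
                g_theta[i] += err
                for k in range(n_points):
                    g_beta[k] += err * q[k]
            if max(map(abs, g_theta + g_beta)) < eps:
                break
            theta = [v + step * g for v, g in zip(theta, g_theta)]
            beta = [v + step * g for v, g in zip(beta, g_beta)]
        for _ in range(inner):                  # hold theta and beta, update gamma
            g_gamma = [0.0] * n_points
            for i, q, t, y in obs:
                err = y - prob(theta[i], beta, gamma, q, t)
                for k in range(n_points):
                    g_gamma[k] += err * q[k] * t[k]
            if max(map(abs, g_gamma)) < eps:
                break
            gamma = [v + step * g for v, g in zip(gamma, g_gamma)]
        if abs(log_likelihood(obs, theta, beta, gamma) - before) < eps:
            break
    return theta, beta, gamma

# Made-up observations: (student index, Q-matrix row, attempts per point, correct?)
SAMPLE_OBS = [
    (0, [1, 0], [0, 0], 0), (0, [1, 0], [1, 0], 1), (0, [1, 1], [2, 0], 1),
    (1, [0, 1], [0, 0], 0), (1, [0, 1], [0, 1], 0), (1, [1, 1], [0, 2], 1),
]
```

Calling `fit(SAMPLE_OBS, 2, 2)` returns the fitted θ, β and γ lists; the fixed small step keeps each update an ascent step on this toy data, whereas a line search as described in steps (3) and (5) would be the more robust choice.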

The method is based on the three modules shown in fig. 3: a programming data collection and storage module, an automatic error repair module and a learning effect evaluation module;

data are collected through two channels. One part is collected by the evaluation module: students register, log in and submit assignment programs online, and the data are sent to the storage module; the evaluation module is built on the open-source online judge platform of Qingdao University. The other part comes from the Karaoke practice teaching platform (a website providing online practical training for programming courses): through an interface with the Karaoke platform, data on students' answers to objective questions and their completed practical training are received periodically in a specified format and stored in the module once the format has been verified.

The automatic error repair module obtains the assignment program data from the data collection and storage module and divides them into correct answers and wrong answers. Correct programs are clustered and a template is selected for each class, so that the multiple solution approaches to a question can be shown to students as a tree diagram; erroneous programs are repaired against the templates, and modification hints and the erroneous knowledge points are fed back to the student.

The learning effect evaluation module mainly analyzes the answer data of objective questions and practical training questions, and presents the evaluation results as visual charts.

Program matching in the automatic error repair module must tolerate diverse expression: programs whose algorithmic goal is the same but whose syntax differs should be regarded as one class, so that the number of classes is as small as possible and the later repair needs to compare against fewer templates. A criterion for deciding whether a program belongs to a class must therefore be defined first. The usual granularities of program analysis are functions, statements and variables. If the function is used as the granularity for programming assignments, the loops and control structure of the algorithm are hard to capture, some information about the process is ignored, and the details of the implementation are lost. If statements and variables are used as the granularity, the execution of loop structures greatly increases the complexity of the analysis and makes timely repair difficult. The program is therefore divided into blocks, and program analysis is performed with the block as the granularity. The specific steps of program matching and repair are as follows:

Step 1, program matching. Given programs P and Q and multiple groups of inputs I, both programs are partitioned into blocks, and the blocks are checked for one-to-one correspondence; if they do not correspond, the procedure exits directly. Otherwise, the variable execution traces of P and Q are obtained over the blocks just divided, and the Cartesian product of the variables of P and Q is taken as H. For each variable pair (v2, v1) in H, the execution traces of the two variables are compared, and the pair is removed from H if the traces differ.

This yields a set of trace-matched variable pairs (v2, v1), but it does not show whether the variables of the two programs correspond one to one. That question is settled with a graph algorithm: the variables are regarded as nodes, and variables with identical traces are joined by an edge. The nodes of the resulting graph fall into two groups, and each edge joins a node of one group to a node of the other, so the graph is bipartite; whether programs P and Q match is decided by searching for a perfect matching on this bipartite graph, which the Hungarian algorithm can solve.
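The matching test described above can be sketched as follows, assuming the traces of each program are already available as a mapping from variable name to a tuple of values. The sketch uses the augmenting-path form of the Hungarian method for unweighted bipartite graphs; the function name `match_variables` and the input format are hypothetical.

```python
def match_variables(traces_p, traces_q):
    """Return a one-to-one map {variable of P: variable of Q}, or None.

    traces_p, traces_q: {variable name: tuple of values} for programs P and Q.
    An edge joins v2 in P and v1 in Q when their traces are identical; the
    programs match only if the bipartite graph has a perfect matching.
    """
    if len(traces_p) != len(traces_q):
        return None
    # H: the Cartesian product of variables, filtered down to equal traces.
    candidates = {
        v2: [v1 for v1, t1 in traces_q.items() if t1 == t2]
        for v2, t2 in traces_p.items()
    }
    matched_to = {}  # variable of Q -> variable of P currently assigned to it

    def try_assign(v2, seen):
        for v1 in candidates[v2]:
            if v1 in seen:
                continue
            seen.add(v1)
            # v1 is free, or its current partner can be moved elsewhere.
            if v1 not in matched_to or try_assign(matched_to[v1], seen):
                matched_to[v1] = v2
                return True
        return False

    for v2 in traces_p:
        if not try_assign(v2, set()):
            return None  # some variable of P has no partner: no perfect matching
    return {v2: v1 for v1, v2 in matched_to.items()}
```

For example, a program with variables `s` and `i` matches one with `total` and `k` when `s` and `total` share one trace and `i` and `k` share another, regardless of the names chosen by the students.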

Judging correspondence by traces emphasizes the dynamic execution of the program; it can cluster code that differs in style or expression but follows the same algorithmic idea, and it lays the foundation for program repair.

Step 2, program clustering. Clustering must be scalable: correct samples grow dynamically, and if all historical correct samples had to be re-clustered whenever a new correct submission arrived, efficiency would drop sharply. The clustering algorithm is therefore made incremental, so that it does not have to be restarted each time.

First, input the program set P = {p_1, p_2, p_3, …, p_n} composed of correct code, initialize the clustering result C = {} and the number of template classes k = 0, then loop over each program p_i in the code set P and perform the following procedure:

i. if the clustering result C is empty, let c_1 = p_i, C = {c_1: [p_i]}, k = k + 1;

ii. otherwise, loop over the template programs c_j in the clustering result C and perform the following procedure:

a) if c_j and p_i match, then C[c_j] += [p_i];

b) otherwise, if no template matches p_i, create a new template c_{k+1} = p_i, let C[c_{k+1}] = [p_i], k = k + 1;

return the clustering result C;
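The incremental procedure above can be sketched as follows. The matching test is passed in as a function, and the toy matcher `same_traces` (which compares the multisets of variable traces) is only a hypothetical stand-in for the full block and bipartite matching test; all names are illustrative.

```python
def incremental_cluster(programs, matches, clusters=None):
    """Add programs to clusters, creating a new template when none matches.

    clusters is a list of (template, members) pairs; passing the result of an
    earlier call lets new correct submissions be added without re-clustering.
    """
    clusters = [] if clusters is None else clusters
    for p in programs:
        for template, members in clusters:
            if matches(template, p):
                members.append(p)
                break
        else:
            # No existing template matched: p becomes the template of a new class.
            clusters.append((p, [p]))
    return clusters


def same_traces(a, b):
    """Toy matcher: two programs match if their variable traces coincide."""
    return sorted(a.values()) == sorted(b.values())
```

Because each new program is compared only with the current templates, the cost of adding a submission grows with the number of classes rather than with the number of historical submissions.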

Step 3, program repair. Repair refers to the clustering result: since the programs in one class share the same algorithm, the erroneous program only needs to be compared with the template of each class, which reduces the complexity of the algorithm and improves its efficiency.

Since the granularity of program analysis is fixed as the block during program matching, repair proceeds from local to global: the erroneous code is first partitioned into blocks, a group of local repairs is generated for the expressions in each block, and the results of all blocks form the repair set.

3.1 determining the repair cost.

The diff tool reflects differences between code lines through two basic change operations, adding and deleting lines, but it cannot reflect differences in the syntactic structure of the code. To quantify the size of a repair operation, program expressions are converted into abstract syntax trees, in which each node represents a construct of the programming language; this representation hides syntactic detail. The addition, deletion and modification of expressions can therefore be obtained from the corresponding changes in the abstract syntax tree (AST).

A change in the AST can be turned into a cost in two ways:

The tree edit distance is analogous to the string edit distance: it is the minimum cost of the operations required to change tree a into tree b. Going from a to b may require modifying, inserting or deleting nodes, and each unit operation is given weight 1. If the number of modifications is denoted S_q, the number of deletions D_r and the number of insertions I_q, the tree edit distance is the sum of these operations, that is, Distance(a, b) = S_q + I_q + D_r;
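A unit-cost tree edit distance can be sketched with the standard forest recurrence, as below. This is an illustration only: trees are represented as nested tuples `(label, (child, child, ...))`, which is an assumed encoding rather than the AST format of the method, and the sketch returns only the total S_q + I_q + D_r, not the three counts separately.

```python
from functools import lru_cache

def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Minimum number of unit-cost edits turning ordered forest f1 into f2."""
    if not f1 and not f2:
        return 0
    if not f1:
        return size(f2)            # insert every node of f2
    if not f2:
        return size(f1)            # delete every node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    rest1, rest2 = f1[:-1], f2[:-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(rest1 + kids1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, rest2 + kids2) + 1
    # Match the two rightmost roots, modifying the label if it differs.
    modify = (forest_distance(kids1, kids2)
              + forest_distance(rest1, rest2)
              + (label1 != label2))
    return min(delete, insert, modify)

def tree_edit_distance(a, b):
    return forest_distance((a,), (b,))
```

With this encoding, changing the operator of `x + 1` to obtain `x - 1` costs one modification, while turning the bare variable `x` into `x + 1` costs two insertions.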

For AST vectorization, the position information of each node is retained as far as possible. The abstract syntax tree is essentially a binary tree; to convert it into a vector, it is split into a series of atomic subtrees that capture the structural information of the tree. Specifically, the number of node types L and the height q of the tree are obtained first; the corresponding atomic trees then number at most

kinds. With b_i denoting the count of the i-th atomic tree, the vector representing the form of the tree is

To add position information, each element of the form vector is expanded to b_i dimensions, which record the heights at which the atomic tree occurs in the AST, and the position-form vector of the AST is:

Such a representation retains both the form of the AST and its position information. After the ASTs are converted into vectors, the final repair cost J is:

3.2 Local repair.

i. Input the error program p, the template program c and the test inputs I;

ii. If the block partitioning results of p and c differ, return "repair failure" and exit;

iii. Run c on the inputs I to obtain its track set Γ_C = {[[P_C]](ρ) | ρ ∈ I};

iv. Run p on the inputs I to obtain its track set Γ_P = {[[P_P]](ρ) | ρ ∈ I};

v. Loop over all variables (l, v2) in all blocks of the program p to be repaired;

vi. Initialize the local repair set LR;

vii. Obtain the assignment expression e_impl of v2;

viii. Loop over all variables v1 in the template program c and obtain the assignment expression e_C of v1;

ix. Construct the set of mapping relations W; for each mapping relation w ∈ W, all variables of e_impl are mapped to e_C, ensuring that v1 is mapped to v2;

x. Traverse each w in the set W:

a) Apply w to e_C to obtain e_repair;

b) If e_repair matches e_impl, add (w, 0) to LR;

c) Otherwise, calculate the repair cost from e_repair to e_impl and add (w, cost) to LR;
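The inner loop of the local repair (constructing W, substituting, and recording the cost) can be sketched as follows. This is a minimal illustration under several assumptions: expressions are Python expressions, each mapping w is stored as a dictionary from template variables to error-program variables with v1 mapped to v2, and the cost is a crude stand-in that counts differing AST node labels rather than the tree edit distance or vector distance of the invention. All function names are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
from collections import Counter
from itertools import product


def variables(expr):
    """Sorted names of the variables used in a Python expression."""
    tree = ast.parse(expr, mode="eval")
    return sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})


class Rename(ast.NodeTransformer):
    """Rename variables according to a mapping {old_name: new_name}."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        new_id = self.mapping.get(node.id, node.id)
        return ast.copy_location(ast.Name(id=new_id, ctx=node.ctx), node)


def apply_mapping(expr, mapping):
    """Apply w to e_C, returning e_repair as source text."""
    tree = Rename(mapping).visit(ast.parse(expr, mode="eval"))
    return ast.unparse(tree)


def labels(expr):
    """Multiset of AST node labels (stand-in for a real tree representation)."""
    out = Counter()
    for n in ast.walk(ast.parse(expr, mode="eval")):
        if isinstance(n, ast.expr_context):
            continue
        name = type(n).__name__
        if isinstance(n, ast.Name):
            name += ":" + n.id
        elif isinstance(n, ast.Constant):
            name += ":" + repr(n.value)
        out[name] += 1
    return out


def stand_in_cost(e_repair, e_impl):
    """Assumed cost: number of node labels that differ, at least 1."""
    a, b = labels(e_repair), labels(e_impl)
    return max(1, sum(((a - b) + (b - a)).values()))


def local_repairs(e_impl, e_c, v1, v2, impl_vars):
    """Return LR, the list of (w, cost) pairs for one pair (v1, v2)."""
    others = [v for v in variables(e_c) if v != v1]
    target = ast.dump(ast.parse(e_impl, mode="eval"))
    lr = []
    # Cartesian product: every other template variable may map to any
    # variable of the error program.
    for choice in product(impl_vars, repeat=len(others)):
        w = {v1: v2, **dict(zip(others, choice))}
        e_repair = apply_mapping(e_c, w)
        if ast.dump(ast.parse(e_repair, mode="eval")) == target:
            lr.append((w, 0))
        else:
            lr.append((w, stand_in_cost(e_repair, e_impl)))
    return lr
```

With the error expression `total - i` and the template expression `s + k`, calling `local_repairs("total - i", "s + k", "s", "total", ["total", "i"])` enumerates two mappings, and the one sending `k` to `i` has the lower cost because only the operator has to change.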

3.3 Solving for the final repair.

There are many options for repairing the whole program; a complete repair is a consistent subset of the local repairs. That is, within this subset no conflict occurs between any of the variable mappings, each variable in the error program corresponds to exactly one variable in the template program, and no two variables correspond to the same variable.

i. First determine the decision variables. Let

denote the set of variables of the correct program, let V_impl denote the set of variables of the error program, and let

denote the local repair set; the domain D of the decision variables is:

ii. The feasible region to be satisfied can be listed according to the four conditions above:

iii. The final objective function to be obtained is as follows:

iv. The problem of finding the final repair is thereby converted into a 0-1 optimization problem under linear constraints, and the optimal solution is obtained by using lpsolve.
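The selection of a consistent, minimum-cost set of local repairs can be illustrated with a small sketch. It deliberately replaces the 0-1 linear program and lpsolve with an exhaustive search, which is only practical for a handful of variables, and it assumes two consistency conditions taken from the description above: the chosen mappings must not conflict, and no two template variables may map to the same error-program variable. The data layout (a dictionary from each error-program variable to its candidate `(mapping, cost)` pairs) and the function names are hypothetical.

```python
from itertools import product


def merge_consistent(mappings):
    """Merge mappings {template_var: error_var}; return None on a conflict."""
    merged = {}
    for w in mappings:
        for src, dst in w.items():
            if merged.setdefault(src, dst) != dst:
                return None  # same template variable mapped two ways
    if len(set(merged.values())) != len(merged):
        return None  # two template variables share one error variable
    return merged


def solve_final_repair(local_repairs):
    """Pick one local repair per error variable with minimum total cost.

    local_repairs: {error_var: [(mapping, cost), ...]}
    Returns (merged_mapping, total_cost), or None if nothing is consistent.
    """
    best = None
    keys = sorted(local_repairs)
    # Each combination corresponds to one 0-1 assignment that selects
    # exactly one candidate per error-program variable.
    for combo in product(*(local_repairs[k] for k in keys)):
        merged = merge_consistent([w for w, _ in combo])
        if merged is None:
            continue
        total = sum(cost for _, cost in combo)
        if best is None or total < best[1]:
            best = (merged, total)
    return best
```

In a real system the same constraints would be written as linear constraints over binary variables and handed to an integer programming solver; the exhaustive loop here only makes the feasibility conditions and the objective explicit.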

Step 4: For the obtained repair result, the repair mapping relation is used to map the variables onto the variables of the error program, yielding the line numbers and the specific operations that need to be repaired. Based on the operation statements, a rule-based regular expression method maps each repair operation to a knowledge point, completing the final feedback generation.
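The rule-based mapping from a repaired statement to knowledge points might look like the sketch below. The knowledge point codes (LO, CD, PR, LI, OP) are those used for the course data set; the regular expressions themselves and the function name `knowledge_points` are hypothetical examples, not the rules actually used by the invention, and only a subset of the knowledge points is covered.

```python
import re

# Hypothetical rules: (knowledge point code, pattern over a repaired statement).
RULES = [
    ("LO", re.compile(r"^\s*(for|while)\b")),             # loops
    ("CD", re.compile(r"^\s*(if|elif|else)\b")),          # conditions
    ("PR", re.compile(r"\bprint\s*\(")),                  # print statements
    ("LI", re.compile(r"\[.*\]|\.append\(")),             # lists
    ("OP", re.compile(r"[+\-*/%]|==|!=|<=|>=|<|>")),      # operators
]


def knowledge_points(statement):
    """Return the codes of all knowledge points whose rule matches."""
    return [code for code, pattern in RULES if pattern.search(statement)]
```

For instance, a repair on the line `total = total + nums[i]` would be attributed to lists and operators, so the feedback shown to the student can name those knowledge points along with the line number.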

The steps of the programming learning prediction model are as follows; the combined factor model for programming is used to obtain each student's mastery of the programming knowledge points:

(1) The factors influencing answering performance fall into two categories. One category is question-related and comprises the question difficulty, the knowledge points contained, and the ease of learning each knowledge point. The other category is the characteristics of the student, including the skill mastery before answering, the learning rate, and error factors such as the probability of guessing and the probability of a careless wrong answer.

The difficulty and the learning rate differ from one knowledge point to another within a question, so the difficulty and the learning rate of each knowledge point are set as parameters.
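How such factors might combine into a prediction can be sketched as follows. The sketch assumes an additive-factor-style logistic form: the log-odds of a correct answer is the student's prior mastery plus, for every knowledge point contained in the question, its ease and its learning rate multiplied by the number of earlier attempts. This form is an assumption made for illustration and is not claimed to be the exact formula of the combined factor model; error factors such as guessing are left out, and the function and argument names are hypothetical.

```python
import math


def predict_correct(theta_i, beta, gamma, q_j, attempts_i):
    """Assumed probability that student i answers question j correctly.

    theta_i    -- prior mastery of student i
    beta       -- ease of each knowledge point k
    gamma      -- learning rate of each knowledge point k
    q_j        -- 1 if question j contains knowledge point k, else 0
    attempts_i -- number of earlier attempts of student i at knowledge point k
    """
    z = theta_i + sum(
        q * (b + g * t) for q, b, g, t in zip(q_j, beta, gamma, attempts_i)
    )
    return 1.0 / (1.0 + math.exp(-z))
```

Under this assumed form, a positive learning rate makes the predicted probability rise with every additional attempt at a knowledge point contained in the question, while knowledge points with q_jk = 0 have no influence.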

The specific implementation steps of the feedback module are as follows:

(1) A programming error repair system is built with the Django framework; it displays the various correct solutions to each question and repair hints for the students' incorrect submissions;

(2) The change in student ability is displayed using a JavaScript chart library such as ECharts;

(3) The visualization results are fed back to the teacher or the student.

Experimental results:

A set of collected student programming examples is given below to further illustrate the method of the present invention. The course has nine chapter questions, and Table 1 describes the basic characteristics of this data.

Table 1: Course chapters, questions, students and related knowledge points

As described in Table 1, the course contains 6 sections, each with a different number of questions, for a total of 9 questions. A total of 1030 students participated in answering. The questions attempted are not exactly the same for every student, and the number of submissions differs from question to question; there are 6885 submissions in total, and the maximum number of submissions for the same exercise is 23. All questions together involve 11 knowledge points: Constants (CO), Variables (VA), Operators (OP), Strings (ST), Expressions (EX), Lists (LI), Tuples (TU), Dictionaries (DI), Conditions (CD), Loops (LO) and Print statements (PR). The number of knowledge points differs from question to question; C-9 is a comprehensive question and contains all knowledge points.

The experimental procedure for carrying out the invention on this data set is as follows:

1. A program matching experiment was performed.

The matching algorithm of the automatic program repair tool was applied to the correct submissions for the same question, and each pair of matched programs was then analyzed further to obtain different ways of expressing the same algorithm. As shown in Table 2, for the same input the expressions in expression set 1 and expression set 2 produce consistent variable execution tracks.

Table 2: matched expressions

The correct submissions for the 9 questions of the experimental course were clustered, and the results are shown in Table 3. The number of clusters per question lies between 2 and 5, and the more complex the question, the larger the number of clusters. Question C-9 has 5 clusters.

Table 3: Clustering results

Question	Title	Average number of code lines	Number of clusters
C-1	String splicing	5	2
C-2	Challenge list	11	3
C-3	Sorting	11	3
C-4	Changing dictionary elements	12	4
C-5	Calculating a numerical value	18	5
C-6	Data classification	14	3
C-7	Calculating factorial	12	3
C-8	Data traversal	12	4
C-9	Comprehensive challenge	19	5
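The clustering of correct submissions by their variable execution tracks can be sketched as follows. The sketch assumes that two programs match when their variables can be put in one-to-one correspondence with identical value sequences, which is checked here by comparing the sorted collections of value sequences. It also takes the first member of each cluster as the representative, whereas the method selects the template at random. The data layout and the function names are hypothetical.

```python
def tracks_match(track_a, track_b):
    """True if the variables of both programs pair up with equal value traces.

    Each track maps a variable name to the tuple of values it takes on the
    test input; variable names themselves are ignored.
    """
    return sorted(map(repr, track_a.values())) == sorted(map(repr, track_b.values()))


def cluster(tracks):
    """Greedily group program ids whose variable execution tracks match.

    tracks: {program_id: {variable_name: tuple_of_values}}
    Returns a list of clusters, each a list of program ids.
    """
    clusters = []
    for program_id, track in tracks.items():
        for members in clusters:
            # Compare against the representative (first member) of the cluster.
            if tracks_match(tracks[members[0]], track):
                members.append(program_id)
                break
        else:
            clusters.append([program_id])  # unmatched program forms its own class
    return clusters
```

Two submissions that accumulate a sum under different variable names, such as `s`/`i` and `total`/`k`, fall into the same cluster because their value sequences coincide, while a submission that computes the result in a single step forms a cluster of its own.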

2. An automatic program repair experiment was performed.

The repair rate and the repair time of the two repair cost representations, tree edit distance and vector distance, were tested on the course data set. The results show no difference in repair rate. Fig. 4 compares the repair time when the tree edit distance is used as the repair cost with the repair time when the vector distance is used as the repair cost. For simple questions such as C-1 and C-2, the difference in repair time between the two is not obvious; as the complexity of the questions increases, the repair time with the vector distance as the repair cost becomes clearly lower than that of the repair algorithm using the tree edit distance.

In order to measure the complexity of the repair operations, the vector distance is used as the repair cost. The repair cost is divided by the total vector value of the error program to obtain the relative repair cost, the relative repair cost is normalized, and the result is shown as a histogram. As shown in Fig. 5, the horizontal axis of the histogram represents the relative repair cost and the vertical axis represents the number of error programs. If the error program is empty and its AST size is 0, the relative repair cost is greater than 1.0 and may even be infinite. In the histogram, 68% of the repairs have a relative repair cost of less than 0.2, and 25% of the repairs have a relative repair cost of less than 0.1. Most of the repairs completed by the repair tool are therefore simple repairs, which shows that the tool is highly practical.
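The relative repair cost and the shares quoted from the histogram can be computed as in the following sketch. It assumes that the "total vector value" of the error program is the sum of the elements of its AST vector and that an empty program has an all-zero or empty vector; both the function names and this reading are assumptions.

```python
import math


def relative_repair_cost(repair_cost, error_vector):
    """Repair cost divided by the total vector value of the error program."""
    total = sum(error_vector)
    if total == 0:
        return math.inf  # empty error program: the ratio is unbounded
    return repair_cost / total


def share_below(relative_costs, threshold):
    """Fraction of repairs whose relative cost is below the threshold."""
    return sum(cost < threshold for cost in relative_costs) / len(relative_costs)
```

Given the list of relative costs of all repaired programs, `share_below(costs, 0.2)` and `share_below(costs, 0.1)` give the kind of proportions reported for the histogram.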

3. Programming learning state prediction

Ten-fold cross-validation was performed on the collected data set, and the AUC and accuracy (ACC) of the predictions of several models were calculated; the results are shown in Table 4.

Table 4: Prediction results

Matters not described in detail in the present invention belong to techniques well known to those skilled in the art.

Claims

1. A personalized intelligent tutoring method for programming beginners, characterized by comprising the following specific steps:

firstly, aiming at a certain question, dividing a programming operation program submitted by each beginner into two types according to correct answers and wrong answers; dividing each operation program by taking a block as granularity according to whether each operation program contains a control statement and a cycle statement;

then, inputting the test sample corresponding to the question into each operation program to obtain the variable value of the block in each operation program, and combining the variable values to form a variable execution track corresponding to each operation;

aiming at the correct answer, selecting the variable execution tracks corresponding to the correct answer one by one, judging whether the matching condition is met, if so, gathering all matched operation programs into one type, otherwise, independently classifying unmatched operation programs; randomly selecting an operation program from each class as a template to form a template set for repairing the local part of the error program;

aiming at the error answer, matching the variable execution track in the current error operation program S with the correct variable execution track selected one by one from the template set to obtain the matched correct variable and the unmatched error variable in the program S; generating a mapping relation set by adopting a Cartesian product between error variables of the program S and all variables in the current correct variable execution track in the template set; calculating the repair cost corresponding to each mapping relation; selecting a mapping relation which meets the requirements of complete consistency of variable matching and minimum cost value from all the repair costs, corresponding a correct variable in a correct variable execution track to a variable corresponding to an error program S, and corresponding the repair operation to a programming knowledge point by using a regular expression to complete final repair feedback generation;

and finally, constructing a combined factor model, realizing the evaluation of the programming learning state of the student, and predicting the answer result of the student on the next question.

2. The method as claimed in claim 1, wherein the rules for partitioning the plurality of task programs with block granularity are as follows:

a single-layer loop statement, as a separate block, denoted as { L:1 };

3. The method as claimed in claim 1, wherein the matching condition of the two variable execution tracks is satisfied simultaneously as follows:

iii. the execution tracks of the variables in the two programs are in one-to-one correspondence.

4. The method for personalized intelligent tutoring for programming beginners as claimed in claim 1, wherein said process of clustering the correct answers is as follows:

and sequentially selecting the next variable execution track, and repeating the process until the variable execution tracks of all the operations are matched.

5. The method for personalized intelligent tutoring for programming beginners as claimed in claim 1, wherein said repair cost is the size of the operations required to modify the erroneous code segment a into b;

the program segments are converted into abstract syntax trees, and the edit distance Distance(a, b) of the trees, computed from the tree structure, is used as the repair cost; the calculation formula is as follows:

Distance(a, b) = S_q + I_q + D_r

where S_q is the number of node modification operations between the two abstract syntax trees, I_q is the number of node insertion operations between the two abstract syntax trees, and D_r is the number of node deletion operations between the two abstract syntax trees.

6. The method as claimed in claim 1, wherein the combined factor model predicts the probability of a student's answer by linearly adding the mastery and the difficulty of all knowledge points according to the student's answering performance; the probability p_ij of a correct answer is calculated as follows:

Y_ij represents the probability of student i answering question j; θ_i represents the mastery degree of student i before answering the question; β_k represents the ease of knowledge point k; γ_k represents the learning rate of knowledge point k; q_jk indicates whether question j contains knowledge point k, taking the value 1 if it does and 0 otherwise; c_ik represents the actual performance of student i on knowledge point k; T_ik represents the number of attempts by student i at knowledge point k; the product of γ_k and T_ik expresses that the student's mastery of knowledge point k increases with every attempt; K represents the total number of knowledge points in the programming questions;

the parameters θ_i, β and γ of the combined factor model are optimized by fitting, and the student's next question is input into the optimized combined factor model to predict the answer result for that question.