CN114092288A - Personalized intelligent tutoring method for programming beginners - Google Patents

Publication number
CN114092288A
Authority
CN
China
Prior art keywords
variable
programming
program
student
repair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111395652.6A
Other languages
Chinese (zh)
Inventor
吴文峻
梁堉
武丽莎
韩勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Publication of CN114092288A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/0053 Computers, e.g. programming

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a personalized intelligent tutoring method for programming beginners and relates to the field of intelligent education. First, each programming assignment submitted for a given question is divided into blocks. Test cases are then fed to each assignment program to obtain its variable execution trace. The correct assignment programs are grouped into clusters according to the matching conditions, and one template is selected at random from each cluster. For the current incorrect assignment program, the templates are selected one by one and the program's variable execution trace is matched against each template; mapping relations are generated with a Cartesian product, and the repair cost of each mapping relation is calculated. The mapping relation in which the variables match completely and the cost is minimal is selected, the correct variables are mapped to the corresponding variables of the incorrect program, the repair is associated with the corresponding knowledge points, and the final repair feedback is generated. Finally, a joint factor model is constructed to evaluate the student's programming learning state. The invention improves the repair rate of programming assignments.

Description

Personalized intelligent tutoring method for programming beginners
Technical Field
The invention relates to the field of intelligent education, and in particular to a personalized intelligent tutoring method for programming beginners.
Background
Intelligent tutoring for programming is an important part of intelligent education. It aims to help students repair the incorrect programs they submit and to estimate their mastery of programming knowledge points.
Currently, the field of intelligent tutoring for programming has the following problems: automatic repair support for the small programs written by beginners is insufficient, repairing programming errors takes a long time, repair results are obscure and hard to understand, there is no model for predicting the programming learning state, and existing online practice systems for programming courses provide little personalized feedback. These problems create the need for an intelligent tutoring system for programming.
In a traditional online judge system, students learn only whether their code is correct; repairing the incorrect code must be done by the students on their own or with a teacher's help, and manual repair takes a great deal of time and is very inefficient. In addition, students have little sense of how well they have mastered the knowledge points during programming study and find it difficult to obtain personalized guidance. An intelligent tutoring system for programming beginners is therefore urgently needed to assist students in learning programming courses: it should evaluate submitted code in real time, provide repair suggestions and the knowledge points related to the errors in incorrect programs, and estimate the students' mastery of the programming knowledge points after each stage of study.
With manual repair, students waste a great deal of time fixing errors themselves, which also dampens their enthusiasm for learning programming, and the ratio of students to teachers makes it impossible for teachers to repair every student's errors one by one. Implementing an intelligent tutoring system for a specific course has therefore become an important topic in intelligent learning.
Disclosure of Invention
To address the problem of personalized intelligent tutoring for programming beginners and to overcome the shortcomings of the prior art, the invention provides a personalized intelligent tutoring method for programming beginners. Starting from the collection of learning behavior data in a programming course, the method matches and clusters the correct submissions to a programming exercise to obtain a template for each class, uses an automatic repair tool to return feedback when a submission is incorrect, and builds a learning state prediction model from the students' incorrect submissions to evaluate the students and predict the results of their new submissions.
The personalized intelligent tutoring method for the programming beginners comprises the following steps:
Step one, for a given question, dividing the programming assignment programs submitted by the programming beginners into two classes, correct answers and incorrect answers;
Step two, dividing each assignment program into blocks according to whether it contains selection (control) statements and loop statements;
the division rules are as follows:
i. if there is no loop statement and no selection statement, the code forms one whole block and is not divided; a selection statement with no nested loop statement is likewise treated as part of a whole block and is not divided; each such block is expressed as {L:0};
ii. a single-layer loop statement forms a separate block, expressed as {L:1};
iii. if a loop statement is nested in a selection statement, the whole forms a single block, expressed as {L:1};
iv. for multi-layer nested loops, each layer forms a separate block; a loop block L2 nested in a loop block L1 is denoted {L1:{L2:1}}.
Step three, inputting the test cases for the question into each correct assignment program to obtain the variable values in each block of the program, and combining these values to form the variable execution trace of each assignment;
the test cases comprise the correct inputs and outputs for the question;
Step four, selecting the variable execution traces one by one and judging whether the matching conditions are met; if so, grouping all matched assignment programs into one class; otherwise, placing each unmatched assignment program in a class of its own;
two variable execution traces match only if the following conditions are all satisfied:
i. the two variable execution traces have the same number of blocks and the same selection and loop structure;
ii. the two variable execution traces have the same number of variables;
iii. the variables of the two variable execution traces correspond to one another one-to-one.
Specifically:
first, two variable execution traces are selected at random and checked against the matching conditions; if they match, the two assignment programs are placed in one class; otherwise, each of the two assignment programs forms its own class;
then, a third variable execution trace is selected and matched against either one of the two matched assignment programs; if the matching conditions are met, the third program is added to that class; otherwise, it forms a class of its own;
if the first two assignment programs did not match, the third variable execution trace is matched against each of them in turn; if it matches one of them, it is added to that class; otherwise, the three assignment programs form three separate classes;
the next variable execution trace is selected in turn and the process is repeated until the variable execution traces of all assignments have been matched;
Step five, randomly selecting one assignment program from each class as a template, forming the template set used for local repair of incorrect programs;
Step six, for each incorrect assignment program, obtaining its variable execution trace using the test cases;
Step seven, matching the variable execution trace of the current incorrect assignment program S against the correct variable execution traces selected one by one from the template set, to obtain the matched correct variables and the unmatched error variables of program S;
a correct variable is a variable in the variable execution trace of the incorrect assignment program that matches a variable of the correct trace, namely a variable of the incorrect program that contains no error;
Step eight, generating a set of mapping relations as the Cartesian product of the error variables of program S and all variables in the correct variable execution trace from the template set;
Step nine, calculating the repair cost of each mapping relation in the set of mapping relations.
The repair cost is the amount of editing required to modify the incorrect code segment a into the code segment b;
the invention converts the program fragments into abstract syntax trees and, based on the tree structure, calculates the tree edit distance Distance(a, b) as the repair cost, using the following formula:
Distance(a, b) = S_q + I_q + D_r
where S_q is the number of node modifications between the two abstract syntax trees, I_q is the number of node insertions, and D_r is the number of node deletions;
Step ten, selecting, from all the repair costs, the mapping relation in which the variables match completely and the cost is minimal, mapping the correct variables of the correct variable execution trace to the corresponding variables of the incorrect program S, and using regular expressions to associate the repair operations with programming knowledge points, thereby completing the generation of the final repair feedback.
Step eleven, constructing a joint factor model to evaluate the student's programming learning state and to predict the student's result on the next question;
based on the student's answering performance, the joint factor model predicts the probability of a correct answer by linearly adding the mastery and the difficulty of all knowledge points.
The probability p_ij of a correct answer is calculated as follows:
[Formula image BDA0003370234230000031]
Y_ij represents the probability that student i answers question j correctly; θ_i represents student i's mastery of the question before answering; β_k represents the ease of knowledge point k; γ_k represents the learning rate of knowledge point k; q_jk indicates whether question j contains knowledge point k (1 if it does, 0 if it does not); c_ik represents the actual performance of student i on knowledge point k; T_ik represents the number of attempts by student i at knowledge point k; multiplying γ_k by T_ik expresses that each attempt raises the student's mastery of knowledge point k; K represents the total number of knowledge points in the programming questions.
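The formula itself survives here only as an image reference. As a hedged sketch, the definitions above resemble the Additive Factor Model; assuming that form, and leaving out c_ik because the definitions do not say how it enters, the log-odds of a correct answer could be written as:

```latex
\ln\frac{p_{ij}}{1-p_{ij}} \;=\; \theta_i \;+\; \sum_{k=1}^{K} q_{jk}\left(\beta_k + \gamma_k T_{ik}\right)
```

Under this assumed form, a knowledge point contributes to the prediction only when q_jk = 1, and each additional attempt T_ik shifts the log-odds by γ_k, which agrees with the verbal description of the parameters.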
The parameters θ_i, β and γ of the joint factor model are optimized, and the student's next question is input into the optimized joint factor model to predict the result for that question;
Compared with the prior art, the invention has the following advantages:
(1) Compared with existing software repair tools, the method repairs students' programming assignments and produces both the clustering of correct programs and the repair operations for incorrect programs; it sets the granularity of program analysis at the block, matches and repairs using variable execution traces, and repairs on the basis of semantics, so that repair focuses on the algorithm rather than on the form of expression, which improves the repair rate of programming assignments.
(2) The method optimizes the algorithm for generating local repairs and reduces the time needed to obtain the repair cost; using the tree edit distance to calculate the repair cost provides a choice of options for repair in different scenarios.
(3) Existing learning state prediction models assume that each question contains only one knowledge point when analyzing students' answers, whereas in programming practice each question is complex and may contain several knowledge points; the method takes multiple knowledge points per question into account.
(4) Code submitted by students carries more information than answers to traditional objective questions, yet the observed variables of existing models can only be binary (correct or incorrect); the method introduces multiple knowledge points and code information into the evaluation of the learning state in programming practice.
Drawings
FIG. 1 is a schematic diagram of a programming beginner-oriented personalized intelligent tutoring method according to the present invention;
FIG. 2 is a flowchart of a programming beginner-oriented personalized intelligent tutoring method of the present invention;
FIG. 3 is a block diagram of a programming beginner-oriented personalized intelligent tutoring method of the present invention;
FIG. 4 is a time comparison of tree edit distance repair and vector distance repair used in the present invention;
FIG. 5 is a histogram of relative repair cost versus number of incorrect programs according to the present invention.
Detailed Description
The present invention will be described in further detail and with reference to the accompanying drawings so that those skilled in the art can understand and practice the invention.
An intelligent tutoring system (ITS) is an adaptive learning support system that uses artificial intelligence to let a computer act as a virtual tutor, teaching knowledge to learners and providing learning guidance. As a product of the development of modern distance education toward intelligence, ITS combines the theories and methods of several disciplines, including artificial intelligence, computer science, education, behavioral science and psychology, and aims to provide learners with guidance and help through human-computer interaction. An intelligent tutoring system can judge a student's level of mastery of the relevant knowledge points from the student's answers, and thereby help generate a personalized learning path and provide targeted tutoring advice.
Existing intelligent tutoring systems are mainly applied to basic subjects; for programming courses they are still lacking, and what exists is mostly a simple online judge system with the following shortcomings: (1) it cannot give repair guidance for students' errors; (2) it does not summarize the students' learning process, so students cannot learn how well they have mastered the programming knowledge points. To address these problems, the invention provides a personalized intelligent tutoring method for programming beginners, which realizes intelligent tutoring for programming beginners.
As shown in fig. 1, the personalized intelligent tutoring method for programming beginners collects, verifies and stores the students' programming data; clusters the correct assignment programs among the students' submissions in order to repair the incorrect assignment programs; and trains a learning state prediction model to infer the students' ability on the programming knowledge points, completing the generation of the final feedback.
As shown in fig. 2, the method comprises the following steps:
Step one, for a given question, dividing the programming assignment programs submitted by the programming beginners into two classes, correct answers and incorrect answers;
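Step one can be sketched as follows. This is a minimal illustration, assuming each submission is available as a Python callable and each test case is an (arguments, expected output) pair; the names `split_submissions`, `submissions` and `tests` are hypothetical and not part of the patent.

```python
def split_submissions(submissions, tests):
    """Partition submissions into (correct, incorrect) lists of names.

    submissions: dict mapping a submission name to a callable (assumed form).
    tests: list of (args, expected_output) pairs for the question.
    """
    correct, incorrect = [], []
    for name, program in submissions.items():
        try:
            passed = all(program(*args) == expected for args, expected in tests)
        except Exception:  # a crashing submission counts as incorrect
            passed = False
        (correct if passed else incorrect).append(name)
    return correct, incorrect


# Question (assumed for illustration): return the sum 0 + 1 + ... + (n - 1).
tests = [((3,), 3), ((4,), 6)]
submissions = {
    "alice": lambda n: sum(range(n)),
    "bob": lambda n: sum(range(n + 1)),      # off-by-one error
    "carol": lambda n: n * (n - 1) // 2,
}
correct, incorrect = split_submissions(submissions, tests)
```

Here `alice` and `carol` reach the same result by different algorithms, which is the situation the later clustering step is meant to handle.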
Step two, dividing each assignment program into blocks according to whether it contains selection (control) statements and loop statements;
the division rules are as follows:
i. if there is no loop statement and no selection statement, the code forms one whole block and is not divided; a selection statement with no nested loop statement is likewise treated as part of a whole block and is not divided; each such block is expressed as {L:0};
ii. a single-layer loop statement forms a separate block, expressed as {L:1};
iii. if a loop statement is nested in a selection statement, the whole forms a single block, expressed as {L:1};
iv. for multi-layer nested loops, each layer forms a separate block; a loop block L2 nested in a loop block L1 is denoted {L1:{L2:1}}.
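The division rules can be illustrated with a short sketch. It assumes student programs written in Python and uses the standard `ast` module as a stand-in for whatever parser the method actually uses; the function names and the choice to merge consecutive loop-free statements into one {L:0} block are assumptions made for the example.

```python
import ast

LOOPS = (ast.For, ast.While)


def loop_depth(node):
    """Maximum loop nesting depth inside (and including) node."""
    inner = max((loop_depth(c) for c in ast.iter_child_nodes(node)), default=0)
    return inner + (1 if isinstance(node, LOOPS) else 0)


def block_label(depth):
    """Label in the style of the rules: {L:0}, {L:1}, {L1:{L2:1}}, ..."""
    if depth <= 1:
        return {"L": depth}
    label = 1
    for level in range(depth, 0, -1):
        label = {f"L{level}": label}
    return label


def split_blocks(source):
    """Split top-level statements into blocks according to rules i to iv."""
    blocks = []
    for stmt in ast.parse(source).body:
        depth = loop_depth(stmt)
        if depth == 0 and blocks and blocks[-1] == {"L": 0}:
            continue  # rule i: consecutive loop-free code stays in one block
        blocks.append(block_label(depth))
    return blocks


example = """
n = 5
total = 0
for i in range(n):
    total += i
if total > 3:
    print(total)
for i in range(2):
    for j in range(2):
        pass
"""
blocks = split_blocks(example)
```

A selection statement that contains a loop has `loop_depth` 1 and is therefore labeled {L:1} as a whole, which corresponds to rule iii.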
Step three, inputting the test cases for the question into each correct assignment program to obtain the variable values in each block of the program, and combining these values to form the variable execution trace of each assignment;
the test cases comprise the correct inputs and outputs for the question;
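One way to obtain a variable execution trace for a Python function is the standard `sys.settrace` hook. The sketch below is illustrative only: it records a new entry whenever a local variable's value changes, so repeated assignments of an unchanged value are collapsed, and it assumes simple immutable values. The helper name `variable_trace` is hypothetical.

```python
import sys


def variable_trace(func, *args):
    """Run func(*args); return (result, {variable name: list of successive values})."""
    trace = {}
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event in ("line", "return"):
            for name, value in frame.f_locals.items():
                history = trace.setdefault(name, [])
                if not history or history[-1] != value:
                    history.append(value)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, trace


def sum_to(n):
    total = 0
    for i in range(n):
        total += i
    return total


result, trace = variable_trace(sum_to, 4)
```

For the test input 4, the trace of `total` is the sequence 0, 1, 3, 6 and the trace of `i` is 0, 1, 2, 3; two programs that compute the same sums with differently named variables would produce the same sequences.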
Step four, selecting the variable execution traces one by one and judging whether the matching conditions are met; if so, grouping all matched assignment programs into one class; otherwise, placing each unmatched assignment program in a class of its own;
two variable execution traces match only if the following conditions are all satisfied:
i. the two variable execution traces have the same number of blocks and the same selection and loop structure;
ii. the two variable execution traces have the same number of variables;
iii. the variables of the two variable execution traces correspond to one another one-to-one; that is, for each variable of one program there is a variable of the other program that takes equal values in the same order.
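The three matching conditions can be checked as in the following sketch. The dictionary layout (`"blocks"` for the block structure, `"vars"` for the per-variable value sequences) is an assumed representation. Because "takes equal values in the same order" is an equivalence between variables, a one-to-one correspondence exists exactly when both programs have the same multiset of value sequences, which is what the `Counter` comparison tests.

```python
from collections import Counter


def traces_match(p, q):
    """Return True if two variable execution traces satisfy conditions i to iii."""
    if p["blocks"] != q["blocks"]:            # condition i: same block structure
        return False
    if len(p["vars"]) != len(q["vars"]):      # condition ii: same number of variables
        return False
    # condition iii: one-to-one correspondence of variables with equal value sequences
    sequences_p = Counter(tuple(v) for v in p["vars"].values())
    sequences_q = Counter(tuple(v) for v in q["vars"].values())
    return sequences_p == sequences_q


shape = [{"L": 0}, {"L": 1}]
prog_p = {"blocks": shape, "vars": {"s": [0, 1, 3], "i": [0, 1, 2]}}
prog_q = {"blocks": shape, "vars": {"k": [0, 1, 2], "acc": [0, 1, 3]}}
prog_r = {"blocks": shape, "vars": {"k": [0, 1, 2], "acc": [0, 1, 4]}}
```

`prog_p` and `prog_q` match although their variable names differ, while `prog_r` does not match because one of its variables takes a different value.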
Specifically:
first, two variable execution traces are selected at random and checked against the matching conditions; if they match, the two assignment programs are placed in one class; otherwise, each of the two assignment programs forms its own class;
then, a third variable execution trace is selected and matched against either one of the two matched assignment programs; if the matching conditions are met, the third program is added to that class; otherwise, it forms a class of its own;
if the first two assignment programs did not match, the third variable execution trace is matched against each of them in turn; if it matches one of them, it is added to that class; otherwise, the three assignment programs form three separate classes;
the next variable execution trace is selected in turn and the process is repeated until the variable execution traces of all assignments have been matched;
Step five, randomly selecting one assignment program from each class as a template, forming the template set used for local repair of incorrect programs;
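The incremental clustering and the random template selection can be sketched as below. The example compares each new program with one representative per class, which is sufficient when matching behaves as an equivalence relation; the `same_traces` predicate and the data layout are simplified assumptions for illustration.

```python
import random


def cluster_and_pick_templates(programs, matches, seed=0):
    """Group programs into classes of mutually matching traces and pick one template per class."""
    clusters = []
    for program in programs:
        for cluster in clusters:
            if matches(program, cluster[0]):   # compare with any member of the class
                cluster.append(program)
                break
        else:
            clusters.append([program])         # no class matched: start a new one
    rng = random.Random(seed)                  # seeded only to make the example repeatable
    templates = [rng.choice(cluster) for cluster in clusters]
    return clusters, templates


def same_traces(a, b):
    """Simplified matching: equal multisets of variable value sequences."""
    return sorted(map(tuple, a["vars"].values())) == sorted(map(tuple, b["vars"].values()))


programs = [
    {"name": "p1", "vars": {"s": [0, 1, 3], "i": [0, 1, 2]}},
    {"name": "p2", "vars": {"acc": [0, 1, 3], "k": [0, 1, 2]}},
    {"name": "p3", "vars": {"total": [0, 3], "n": [3]}},
]
clusters, templates = cluster_and_pick_templates(programs, same_traces)
cluster_names = [[p["name"] for p in c] for c in clusters]
```

In this example `p1` and `p2` fall into one class and `p3` into another, so the template set contains two programs.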
Step six, for each incorrect assignment program, obtaining its variable execution trace using the test cases;
Step seven, matching the variable execution trace of the current incorrect assignment program S against the correct variable execution traces selected one by one from the template set, to obtain the matched correct variables and the unmatched error variables of program S;
a correct variable is a variable in the variable execution trace of the incorrect assignment program that matches a variable of the correct trace, namely a variable of the incorrect program that contains no error;
Step eight, generating a set of mapping relations as the Cartesian product of the error variables of program S and all variables in the correct variable execution trace from the template set;
the repair mapping relation is then used to map the template variables to the variables of the incorrect program, which yields the line numbers and the specific operations to be repaired.
Step nine, calculating the repair cost of each mapping relation in the set of mapping relations.
The repair cost is the amount of editing required to modify the incorrect code segment a into the code segment b. It is obtained from the abstract syntax tree (AST), and the change between ASTs can be costed in two ways. One calculates the tree edit distance Distance(a, b) on the tree structure as the cost, using the following formula:
Distance(a, b) = S_q + I_q + D_r
where S_q is the number of node modifications between the two abstract syntax trees, I_q is the number of node insertions, and D_r is the number of node deletions;
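The tree edit distance can be sketched with the standard recursive formulation for ordered trees with unit costs, memoized with `functools.lru_cache`. This is an illustration under assumptions: it works on Python ASTs from the standard `ast` module, charges one unit for each node modification, insertion and deletion, and is only practical for small fragments. The function names are hypothetical.

```python
import ast
from functools import lru_cache


def to_tree(node):
    """Convert a Python AST node into a hashable (label, children) tuple."""
    label = type(node).__name__
    if isinstance(node, ast.Name):
        label += ":" + node.id
    elif isinstance(node, ast.Constant):
        label += ":" + repr(node.value)
    return (label, tuple(to_tree(child) for child in ast.iter_child_nodes(node)))


def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_edit(f1, f2):
    """Return (S, I, D) with minimal S + I + D for turning forest f1 into f2."""
    if not f1:
        return (0, size(f2), 0)        # insert everything that remains in f2
    if not f2:
        return (0, 0, size(f1))        # delete everything that remains in f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    s, i, d = forest_edit(f1[:-1] + kids1, f2)
    delete = (s, i, d + 1)             # delete the root of the last tree of f1
    s, i, d = forest_edit(f1, f2[:-1] + kids2)
    insert = (s, i + 1, d)             # insert the root of the last tree of f2
    s1, i1, d1 = forest_edit(kids1, kids2)
    s2, i2, d2 = forest_edit(f1[:-1], f2[:-1])
    modify = (s1 + s2 + int(label1 != label2), i1 + i2, d1 + d2)
    return min(delete, insert, modify, key=sum)


def repair_cost(code_a, code_b):
    """Distance(a, b) = S + I + D between the ASTs of two code fragments."""
    tree_a = to_tree(ast.parse(code_a))
    tree_b = to_tree(ast.parse(code_b))
    return sum(forest_edit((tree_a,), (tree_b,)))
```

For instance, `repair_cost("x = a + b", "x = a - b")` is 1, because the two trees differ only in the operator node.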
the other converts the ASTs into vectors and uses the difference between the two vectors as the repair cost:
[Formula image BDA0003370234230000061]
Step ten, selecting, from all the repair costs, the mapping relation in which the variables match completely and the cost is minimal, mapping the correct variables of the correct variable execution trace to the corresponding variables of the incorrect program S, and using regular expressions to associate the repair operations with programming knowledge points, thereby completing the generation of the final repair feedback.
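Associating a repair operation with knowledge points through regular expressions might look like the sketch below. The patterns and the knowledge point names are hypothetical examples chosen for illustration; the text does not list the actual expressions.

```python
import re

# Hypothetical pattern table: (regular expression, knowledge point name).
KNOWLEDGE_PATTERNS = [
    (re.compile(r"\b(for|while)\b"), "loops"),
    (re.compile(r"\b(if|else)\b"), "selection"),
    (re.compile(r"\[[^\]]*\]"), "array indexing"),
    (re.compile(r"[-+*/%]"), "arithmetic expressions"),
]


def knowledge_points(repair_text):
    """Return the knowledge points whose pattern occurs in a repair description."""
    return [name for pattern, name in KNOWLEDGE_PATTERNS if pattern.search(repair_text)]


loop_fix = knowledge_points("change `i < n` to `i <= n` in the for loop condition")
index_fix = knowledge_points("replace a[i] with a[i - 1]")
```

The first repair is tagged with the loop knowledge point, and the second with array indexing and arithmetic expressions.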
To obtain the final repair, the repair cost must be minimized in addition to satisfying the condition that the variable matching is consistent. A subset of the repair set with minimal repair cost is therefore selected using constraint optimization: the problem is first transformed into an optimization problem under linear constraints, which is then handed to an off-the-shelf ILP solver to obtain the optimal result.
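For very small instances the selection can be illustrated without a solver. The sketch below replaces the ILP solver with an exhaustive search over one-to-one assignments of error variables to template variables, which expresses the same objective (minimal total cost under a consistency constraint) but does not scale; the cost table is made-up example data.

```python
from itertools import permutations


def best_mapping(error_vars, template_vars, cost):
    """Exhaustively pick the one-to-one mapping with minimal total repair cost.

    cost[(e, t)] is the repair cost of mapping error variable e to template variable t.
    """
    best, best_cost = None, float("inf")
    for targets in permutations(template_vars, len(error_vars)):
        total = sum(cost[(e, t)] for e, t in zip(error_vars, targets))
        if total < best_cost:
            best, best_cost = dict(zip(error_vars, targets)), total
    return best, best_cost


cost = {
    ("x", "a"): 3, ("x", "b"): 1, ("x", "c"): 2,
    ("y", "a"): 0, ("y", "b"): 0, ("y", "c"): 4,
}
mapping, total_cost = best_mapping(["x", "y"], ["a", "b", "c"], cost)
```

In this example `y` could be mapped to `b` at zero cost, but `b` is the cheapest target for `x`, so the minimal consistent choice maps `x` to `b` and `y` to `a`.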
Step eleven, constructing a joint factor model to evaluate the student's programming learning state and to predict the student's result on the next question;
the knowledge points involved in the repaired errors can be regarded as knowledge points the student has not mastered and used as a description of the student's learning process; introducing them into the joint factor model improves the model's predictions.
The parameters to be solved and the probability p_ij of a correct answer are given by the following formula:
[Formula image BDA0003370234230000071]
Y_ij represents the probability that student i answers question j correctly; θ_i represents the ability of student i, that is, the student's mastery of the question before answering; ability is positively correlated with the probability of a correct answer: the stronger the ability, the higher the probability; β_k represents the ease of knowledge point k and is multiplied by q_jk in its effect on the final probability, since the ease of a knowledge point influences the result only if the question contains that knowledge point; γ_k represents the learning rate of knowledge point k, that is, the probability that a student who has not mastered knowledge point k at the j-th question masters it at the (j+1)-th question; q_jk indicates whether question j contains knowledge point k; it is taken from the Q-matrix and is a binary variable, equal to 1 if the question contains knowledge point k and 0 otherwise; T_ik represents the number of attempts by student i at knowledge point k, and multiplying γ_k by T_ik expresses that each attempt raises the student's mastery of knowledge point k; K represents the total number of knowledge points in the Q-matrix. The final result is that the mastery and the difficulty of all knowledge points are added linearly to predict the probability that the student answers correctly.
The Q-matrix represents the relation between questions and knowledge points: a value of 1 in row m, column n indicates that question m contains knowledge point n, and a value of 0 indicates that it does not. The parameter c_ik is a new variable introduced for programming teaching; it is given directly by the programming tool and represents the actual performance of student i on knowledge point k. If the programming tool has reported the performance on the knowledge point, the value is 1 when an error occurred, to emphasize its influence on the result, and 0 otherwise; if the programming tool reports no result, the knowledge point is considered not mastered and the value is correspondingly 1.
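A prediction under these definitions can be sketched as follows. The exact formula is only available as an image, so the code assumes the Additive Factor Model form that the definitions resemble (log-odds equal to θ_i plus the sum over k of q_jk (β_k + γ_k T_ik)) and omits c_ik, whose placement the text does not specify. The function name and the example values are hypothetical.

```python
import math


def predict_correct(theta_i, beta, gamma, q_j, attempts_i):
    """Probability that student i answers question j correctly (assumed AFM form).

    theta_i: ability of student i
    beta, gamma: per-knowledge-point ease and learning rate
    q_j: row j of the Q-matrix (1 if the question contains the knowledge point)
    attempts_i: T_ik, number of attempts by student i at each knowledge point
    """
    logit = theta_i + sum(
        q * (b + g * t) for q, b, g, t in zip(q_j, beta, gamma, attempts_i)
    )
    return 1.0 / (1.0 + math.exp(-logit))


q_matrix = [
    [1, 0],   # question 0 contains knowledge point 0 only
    [1, 1],   # question 1 contains both knowledge points
]
beta, gamma = [0.2, -0.5], [0.3, 0.1]
few_attempts = predict_correct(0.0, beta, gamma, q_matrix[1], [0, 0])
more_attempts = predict_correct(0.0, beta, gamma, q_matrix[1], [3, 2])
```

With positive learning rates, more attempts at the knowledge points contained in a question raise the predicted probability, in line with the description of γ_k and T_ik.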
The parameters θ_i, β and γ of the joint factor model are optimized, and the student's next question is input into the optimized joint factor model to predict the result for that question; the specific optimization process comprises the following steps:
(1) Initialize the parameters: initialize θ_i^0, β_k^0, γ_k^0 (i = 1, 2, …, N; k = 1, 2, …, K), set m = 0, determine the maximum number of iterations G, initialize the iteration counter g = 0, and calculate L(θ_i^0, β_k^0, γ_k^0);
(2) Let W = L(θ_i^0, β_k^0, γ_k^0) and calculate the likelihood function
[Formula image BDA0003370234230000072]
If g > G, or
[Formula image BDA0003370234230000073]
and g ≠ 0, stop; otherwise let
[Formula image BDA0003370234230000074]
In the scenario in which M students answer N questions, the log-likelihood function is:
[Formula image BDA0003370234230000075]
where
[Formula image BDA0003370234230000081]
(3) Let
[Formula image BDA0003370234230000082]
Holding γ_k^m constant, calculate the gradient of L with respect to θ_i^m and β_k^m:
[Formula image BDA0003370234230000083]
If
[Formula image BDA0003370234230000084]
and
[Formula image BDA0003370234230000085]
stop the iteration and let θ_i* = θ_i^m, β_k* = β_k^m; otherwise, choose the step size λ_m such that:
[Formula image BDA0003370234230000086]
(4) Let
[Formula image BDA0003370234230000087]
Calculate the likelihood function L(θ_i^(m+1), β_k^(m+1), γ_k^m);
If
[Formula image BDA0003370234230000088]
or ||θ_i^(m+1) − θ_i^m|| < ε and ||β_k^(m+1) − β_k^m|| < ε, stop the iteration and let θ_i* = θ_i^(m+1), β_k* = β_k^(m+1); otherwise, update m to m + 1 and go to (3);
(5) Holding θ_i^(m+1) and β_k^(m+1) constant, let
[Formula image BDA0003370234230000089]
set m = 0, and calculate the gradient of L with respect to γ_k^m:
[Formula image BDA00033702342300000810]
If
[Formula image BDA00033702342300000811]
stop the iteration and let γ_k* = γ_k^m; otherwise, choose the step size λ_m such that
[Formula image BDA00033702342300000812]
[Formula image BDA00033702342300000813]
Calculate
[Formula image BDA00033702342300000814]
(6) If
[Formula image BDA00033702342300000815]
or ||γ_k^(m+1) − γ_k^m|| < ε, stop the iteration, let γ_k* = γ_k^(m+1) and g = g + 1, and go to (2); otherwise, let m = g + 1 and go to (3);
The process is repeated until the gradient-based iteration has converged.
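The alternating scheme (update θ and β with γ held fixed, then update γ with θ and β held fixed, and stop when the likelihood no longer improves) can be sketched in plain Python. Several points are assumptions: the model uses the Additive Factor Model form without c_ik, a fixed step size replaces the selection of λ_m, gradients are averaged over the observations, and the stopping rule is simplified to a tolerance on the log-likelihood. All names and data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_joint_factor(obs, n_students, n_points, outer=30, inner=20, step=0.1, eps=1e-6):
    """Alternating gradient ascent on the log-likelihood.

    obs: list of (student index, Q-matrix row, attempts row, correct as 0/1).
    Returns (theta, beta, gamma, log-likelihood history).
    """
    theta = [0.0] * n_students
    beta = [0.0] * n_points
    gamma = [0.0] * n_points

    def prob(i, q, t):
        z = theta[i] + sum(q[k] * (beta[k] + gamma[k] * t[k]) for k in range(n_points))
        return sigmoid(z)

    def log_likelihood():
        return sum(
            math.log(prob(i, q, t)) if y else math.log(1.0 - prob(i, q, t))
            for i, q, t, y in obs
        )

    def ascend(update_gamma):
        """One averaged-gradient step on a single parameter block."""
        errors = [(i, q, t, y - prob(i, q, t)) for i, q, t, y in obs]  # evaluated before updating
        for i, q, t, err in errors:
            for k in range(n_points):
                if update_gamma:
                    gamma[k] += step * err * q[k] * t[k] / len(obs)
                else:
                    beta[k] += step * err * q[k] / len(obs)
            if not update_gamma:
                theta[i] += step * err / len(obs)

    history = [log_likelihood()]
    for _ in range(outer):
        for _ in range(inner):
            ascend(update_gamma=False)   # theta and beta step, gamma fixed
        for _ in range(inner):
            ascend(update_gamma=True)    # gamma step, theta and beta fixed
        history.append(log_likelihood())
        if abs(history[-1] - history[-2]) < eps:
            break
    return theta, beta, gamma, history


observations = [
    (0, [1, 0], [0, 0], 1),
    (0, [1, 0], [1, 0], 1),
    (0, [0, 1], [0, 0], 1),
    (1, [1, 0], [0, 0], 0),
    (1, [1, 0], [1, 0], 0),
    (1, [0, 1], [0, 1], 1),
]
theta, beta, gamma, history = fit_joint_factor(observations, n_students=2, n_points=2)
```

The gradient of the log-likelihood with respect to each parameter is the sum of (observed result minus predicted probability) times that parameter's coefficient, which is what `ascend` accumulates for each block.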
The personalized intelligent tutoring method for programming beginners is built on the three modules shown in fig. 3: a programming data collection and storage module, an automatic error repair module, and a learning effect evaluation module.
Data are collected through two channels. In the first, students register and log in to the evaluation module and submit their homework programs online, and the data are sent to the storage module; the evaluation module is built on the open-source online judge platform of Qingdao University. In the second, data come from the Karaoke practice teaching platform (a website providing online practical training for programming courses): through an interface with that platform, data on students' answers to objective questions and their completed practical training are received periodically in a specific data format and are stored in the module once the format is confirmed to be correct.
The automatic error repair module obtains the homework program data from the data collection and storage module and divides them into correct answers and wrong answers. Correct programs are clustered, a template is selected for each class, and the multiple solution methods are shown to students as a tree diagram; erroneous programs are repaired against the templates, and a modification hint and the knowledge points involved in the error are fed back to the student.
The learning effect evaluation module mainly analyzes the answer data of objective questions and practical training questions and presents the evaluation results as visual charts.
Program matching in the automatic error repair module must tolerate diverse ways of expressing the same solution: programs whose algorithmic goals are the same but whose syntax differs should be treated as one class, so that the number of classes is minimized and fewer templates need to be compared during later repair. A criterion for whether a program belongs to a class must therefore be defined first. Program analysis is generally performed at the granularity of functions, statements, or variables. If functions are used as the granularity for programming homework, the loops and control structure of the algorithm are hard to capture, some information about the execution process is ignored, and details of the implementation are lost. If statements and variables are used as the granularity, executing loop structures greatly increases the complexity of the analysis, and timely repair is hard to achieve. The program is therefore divided into blocks, and analysis is performed at block granularity. The specific steps of program matching and repair are as follows:
Step 1: program matching. Given programs P and Q and multiple groups of input I, partition both programs into blocks and check whether their blocks correspond one to one; if not, exit directly. Otherwise, obtain the variable execution trajectories of P and Q over the blocks just obtained, and take the Cartesian product of the variables of P and Q as H. For each variable pair (v2, v1) in this Cartesian product, check whether the execution trajectories of the two variables are the same; if not, remove the pair from H.
This yields a set of trajectory-matched variable pairs (v2, v1), but it does not by itself establish whether the variables of the two programs correspond one to one. A graph algorithm is used for that: variables are treated as nodes, and variables with the same trajectory are connected by edges. The resulting graph is bipartite, that is, its nodes fall into two groups and a node in one group can only be connected to nodes in the other. Whether programs P and Q match is decided by checking whether this bipartite graph has a perfect matching, which can be done with the Hungarian algorithm.
Because it judges correspondence by trajectories, this method emphasizes the dynamic execution of the program. It can cluster programs that differ in coding style or in how they are written but share the same algorithmic idea, which lays the foundation for program repair.
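The variable-matching check of step 1 can be illustrated with a minimal sketch. It assumes that block partitioning has already succeeded and that each trajectory is encoded as a tuple of the values a variable takes over all test inputs; the function name `match_variables` and this encoding are illustrative assumptions, not part of the described method. The perfect matching is found with the augmenting-path search used for unweighted bipartite graphs.

```python
def match_variables(traces_p, traces_q):
    """Return a one-to-one variable mapping from P to Q, or None if none exists.

    traces_p / traces_q map a variable name to its execution trajectory,
    assumed here to be a tuple of the values it takes over all test inputs.
    """
    if len(traces_p) != len(traces_q):
        return None  # a perfect matching needs equally many variables

    # H: pairs from the Cartesian product whose trajectories are identical
    candidates = {
        vp: [vq for vq, tq in traces_q.items() if tq == tp]
        for vp, tp in traces_p.items()
    }

    owner = {}  # variable of Q -> variable of P currently matched to it

    def try_assign(vp, seen):
        # Search for an augmenting path that starts at vp
        for vq in candidates[vp]:
            if vq in seen:
                continue
            seen.add(vq)
            if vq not in owner or try_assign(owner[vq], seen):
                owner[vq] = vp
                return True
        return False

    for vp in traces_p:
        if not try_assign(vp, set()):
            return None  # vp cannot be matched, so P and Q do not match
    return {vp: vq for vq, vp in owner.items()}


mapping = match_variables(
    {"s": (0, 1, 3), "i": (0, 1, 2)},
    {"total": (0, 1, 3), "k": (0, 1, 2)},
)
```

Here `mapping` pairs `s` with `total` and `i` with `k`; if any trajectory differed, the function would return `None` and the two programs would not be placed in the same class.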
Step 2: program clustering. Clustering must be scalable, because correct samples keep arriving; if every new correct submission required re-clustering all historical correct samples, efficiency would drop sharply. The clustering algorithm is therefore made incremental, so that it does not have to restart each time.
First, input the program set $P = \{p_1, p_2, p_3, \ldots, p_n\}$ consisting of correct code, initialize the clustering result $C = \{\}$ and the number of template classes $k = 0$, and then loop over each program $p_i$ in $P$, performing the following procedure:
i. If the clustering result C is empty, let $c_1 = p_i$, $C = \{c_1 : [p_i]\}$, $k = k + 1$;
ii. Otherwise, loop over the template programs $c_j$ in the clustering result C and perform the following:
a) if $c_j$ matches $p_i$, append $p_i$ to $C[c_j]$;
b) if no template matches $p_i$, create a new template $c_{k+1} = p_i$, let $C[c_{k+1}] = [p_i]$, $k = k + 1$;
Return the clustering result C.
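The incremental procedure above can be sketched as follows. The sketch assumes that the first program of each cluster serves as its template and that `matches` is any predicate implementing the program matching of step 1; both function names are hypothetical.

```python
def add_to_clusters(clusters, program, matches):
    """Place one correct program into an existing cluster or start a new one.

    clusters is a list of lists; the first program of each list is the template.
    matches(template, program) is any predicate implementing program matching.
    """
    for members in clusters:
        if matches(members[0], program):
            members.append(program)
            return clusters
    clusters.append([program])  # no template matched: program becomes a new template
    return clusters


def cluster(programs, matches):
    """Cluster a batch of correct programs by repeated incremental insertion."""
    clusters = []
    for p in programs:
        add_to_clusters(clusters, p, matches)
    return clusters


# Toy predicate for illustration only: two "programs" match if they use the same characters.
same_letters = lambda a, b: sorted(a) == sorted(b)
result = cluster(["ab", "ba", "c"], same_letters)
add_to_clusters(result, "c", same_letters)  # a later submission is added without re-clustering
```

Because a new submission is only compared against one template per cluster, earlier submissions never need to be re-clustered.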
Step 3: program repair. Repair relies on the clustering result. Because clustered programs in the same class implement the same algorithm, the erroneous program only needs to be compared with the template of each class, which reduces the complexity of the algorithm and improves its efficiency.
Since program matching fixed the analysis granularity at the block level, repair proceeds from local to global: the erroneous code is first partitioned into blocks, a group of local repairs is generated for the expressions in each block, and the results of all blocks form the repair set.
3.1 Determining the repair cost.
The diff tool expresses differences between code through two basic line-level change operations, adding and deleting lines, but it cannot reflect differences in syntactic structure. To quantify the size of a repair operation, program expressions are converted into abstract syntax trees (ASTs). Each node of the tree represents a construct of the programming language, and this representation hides syntactic details. Additions, deletions and modifications of expressions can therefore be obtained from the corresponding changes to the AST.
The cost of a change to the AST can be calculated in two ways:
Tree edit distance is analogous to string edit distance: it is the minimum operation cost required to change tree a into tree b. Going from a to b may require modifying, inserting or deleting nodes, and each unit operation is given weight 1. If the number of modification operations is denoted $S_q$, the number of deletion operations $D_r$, and the number of insertion operations $I_q$, the tree edit distance is the sum of these operations, i.e. $\mathrm{Distance}(a, b) = S_q + I_q + D_r$.
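A unit-cost tree distance can be sketched as below. This is a simplified, restricted variant in which insertions and deletions apply to whole subtrees, so it gives an upper bound on the general tree edit distance rather than the exact value. Trees are encoded as `(label, [children])` tuples, and the names `size` and `distance` are assumptions made for this illustration.

```python
def size(tree):
    """Number of nodes in a (label, [children]) tree."""
    _, children = tree
    return 1 + sum(size(c) for c in children)


def distance(a, b):
    """Top-down edit distance with unit cost per modified, inserted or deleted node."""
    (label_a, kids_a), (label_b, kids_b) = a, b
    relabel = 0 if label_a == label_b else 1
    n, m = len(kids_a), len(kids_b)
    # d[i][j]: cost of turning the first i children of a into the first j children of b
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + size(kids_a[i - 1])      # delete a whole subtree
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + size(kids_b[j - 1])      # insert a whole subtree
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + size(kids_a[i - 1]),
                d[i][j - 1] + size(kids_b[j - 1]),
                d[i - 1][j - 1] + distance(kids_a[i - 1], kids_b[j - 1]),
            )
    return relabel + d[n][m]


x, one, two = ("x", []), ("1", []), ("2", [])
cost = distance(("Add", [x, one]), ("Add", [x, two]))  # x + 1 versus x + 2
```

In this example one leaf is modified, so `cost` is 1. The function returns only the total; it does not report the three operation counts separately.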
In AST vectorization, the position information of each node is retained as far as possible. The abstract syntax tree is essentially a binary tree; to convert it into a vector, the tree is split into a series of atomic trees that capture its structural information. Specifically, the number of node types L and the height q of the tree are obtained first; there are then at most
Figure BDA0003370234230000101
kinds of atomic tree. Let $b_i$ denote the number of occurrences of the i-th atomic tree; the vector representing the shape of the tree is then
Figure BDA0003370234230000102
To add position information, each element of the shape vector is expanded to $b_i$ dimensions, which record the heights of the corresponding atomic trees in the AST. The position-shape vector of the AST is:
Figure BDA0003370234230000103
This representation retains both the shape of the AST and its position information. After the ASTs are converted into vectors, the final repair cost J is:
Figure BDA0003370234230000111
3.2 Local repair.
i. Input the erroneous program p, the template program c and the test inputs I;
ii. If the block partitions of p and c differ, return 'repair failure' and exit;
iii. Run c on the inputs I to obtain its trajectory set $\Gamma_C = \{ [\![P_C]\!](\rho) \mid \rho \in I \}$;
iv. Run p on the inputs I to obtain its trajectory set $\Gamma_P = \{ [\![P_P]\!](\rho) \mid \rho \in I \}$;
v. Loop over all variables (l, v2) in all blocks of the program p to be repaired;
vi. Initialize the local repair set
Figure BDA0003370234230000112
vii. Get the assignment expression of v2
Figure BDA0003370234230000113
viii. Loop over all variables v1 in the template program c and get the assignment expression of v1
Figure BDA0003370234230000114
ix. Construct the set of mapping relations W; each mapping $w \in W$ maps all variables of $e_{impl}$ to variables of $e_C$ and ensures that v1 is mapped to v2;
x. Loop over each w in W:
a) apply w to $e_C$ to obtain $e_{repair}$;
b) if $e_{repair}$ matches $e_{impl}$, add (w, 0) to LR;
c) otherwise, compute the repair cost between $e_{repair}$ and $e_{impl}$ and add (w, cost) to LR.
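Steps ix and x can be sketched for a single pair of assignment expressions as follows. Several simplifications are assumed: expressions are Python source strings, a mapping w is encoded as a dictionary from template variable names to names in the erroneous program, "matching" is reduced to syntactic equality of the ASTs, and the repair cost is passed in as a function. All function names are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
import itertools


def names(expr):
    """Variable names used in a Python expression, in sorted order."""
    tree = ast.parse(expr, mode="eval")
    return sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})


class _Rename(ast.NodeTransformer):
    """Rename variables simultaneously according to a mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        return ast.Name(id=self.mapping.get(node.id, node.id), ctx=node.ctx)


def substitute(expr, mapping):
    """Apply a variable mapping w to an expression and return the new source."""
    return ast.unparse(_Rename(mapping).visit(ast.parse(expr, mode="eval")))


def same_ast(e1, e2):
    return ast.dump(ast.parse(e1, mode="eval")) == ast.dump(ast.parse(e2, mode="eval"))


def local_repairs(e_impl, e_c, v2, v1, impl_vars, cost):
    """Enumerate mappings with v1 -> v2 and collect (w, cost) pairs for one expression."""
    others = [v for v in names(e_c) if v != v1]
    targets = [v for v in impl_vars if v != v2]
    repairs = []
    for image in itertools.permutations(targets, len(others)):
        w = {v1: v2, **dict(zip(others, image))}
        e_repair = substitute(e_c, w)
        repairs.append((w, 0 if same_ast(e_repair, e_impl) else cost(e_repair, e_impl)))
    return repairs


unit_cost = lambda a, b: 1  # placeholder; a repair cost from 3.1 would be used instead
lr = local_repairs("s - i", "total + k", "s", "total", ["s", "i"], unit_cost)
```

In the example, the template expression `total + k` becomes `s + i` under the only admissible mapping, which differs from the student's `s - i`, so the single local repair carries a non-zero cost.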
3.3 Solving for the final repair.
There are many candidate repairs for the whole program; a complete repair is a consistent subset of the local repairs. That is, within this subset no two variable mappings conflict: each variable in the erroneous program corresponds to exactly one variable in the template program, and no two variables correspond to the same variable.
i. First determine the decision variables. Let
Figure BDA0003370234230000115
denote the variable set of the correct program, $V_{impl}$ the variable set of the erroneous program, and
Figure BDA0003370234230000118
the local repair set. The domain D of the decision variables is:
Figure BDA0003370234230000116
ii. The feasibility constraints to be satisfied can be listed according to the four conditions above:
Figure BDA0003370234230000117
Figure BDA0003370234230000121
Figure BDA0003370234230000122
Figure BDA0003370234230000123
iii. The final objective function is as follows:
Figure BDA0003370234230000124
iv. The problem of finding the final repair is thus converted into an optimization problem under 0-1 linear constraints, and an optimal solution is obtained with lpsolve.
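For small programs, the selection of a consistent minimum-cost repair can be illustrated by exhaustive search instead of a 0-1 linear program. The sketch below is a stand-in for the lpsolve formulation, not a reproduction of it. It assumes that local repairs are grouped by the variable of the erroneous program they repair and that each mapping is a dictionary from template variables to variables of the erroneous program; `merge` and `best_repair` are hypothetical names.

```python
import itertools


def merge(mappings):
    """Combine variable mappings; return None if they conflict or are not one-to-one."""
    merged = {}
    for w in mappings:
        for src, dst in w.items():
            if merged.setdefault(src, dst) != dst:
                return None  # one template variable mapped to two different variables
    if len(set(merged.values())) != len(merged):
        return None  # two template variables mapped to the same variable
    return merged


def best_repair(local_repairs):
    """Pick one local repair per variable so that mappings agree and total cost is minimal."""
    best = None
    groups = [local_repairs[v] for v in local_repairs]
    for choice in itertools.product(*groups):
        merged = merge(w for w, _ in choice)
        if merged is None:
            continue
        total = sum(cost for _, cost in choice)
        if best is None or total < best[0]:
            best = (total, merged)
    return best


candidates = {
    "s": [({"total": "s", "k": "i"}, 0)],
    "i": [({"k": "i"}, 1), ({"k": "s"}, 0)],
}
solution = best_repair(candidates)
```

The cheaper candidate for `i` is rejected because it maps `k` to `s`, which conflicts with the repair chosen for `s`; the search therefore returns total cost 1 with the mapping `total -> s`, `k -> i`. The search is exponential in the number of variables, which is why an integer-programming solver is the practical choice for larger programs.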
Step 4: generating feedback. For the repair result obtained, the repair mapping is used to map the variables back to the variables of the erroneous program, yielding the line numbers to be repaired and the specific operations. Based on the statement being operated on, each repair operation is then mapped to a knowledge point by rule-based regular expressions, completing the generation of the final feedback.
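A rule table of this kind might look like the sketch below. The knowledge point names are those listed for the course data set, but the regular expressions themselves and the function name `knowledge_points` are illustrative assumptions; the actual rules are not given in the text.

```python
import re

# Hypothetical rules: each pattern is tried against the repaired statement.
RULES = [
    (re.compile(r"\b(for|while)\b"), "Loops (LO)"),
    (re.compile(r"\b(if|elif|else)\b"), "Conditions (CD)"),
    (re.compile(r"\bprint\s*\("), "Print statements (PR)"),
    (re.compile(r"[+\-*/%]|==|!=|<=|>=|<|>"), "Operators (OP)"),
]


def knowledge_points(statement):
    """Return the knowledge points whose rule matches the repaired statement."""
    return [point for pattern, point in RULES if pattern.search(statement)]


hints = knowledge_points("for i in range(n): print(i)")
```

For this statement the sketch reports the loop and print-statement knowledge points, which could then be attached to the line number in the feedback shown to the student.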
The steps of the programming learning prediction model are as follows; a joint factor model designed for programming is used to obtain each student's mastery of the programming knowledge points:
(1) The factors influencing answering performance fall into two categories. One is question-related and includes the question difficulty, the knowledge points involved, and how easy each knowledge point is to learn. The other consists of student characteristics, including the student's skill mastery before answering, the learning rate, and error factors such as the probability of guessing and the probability of answering wrongly by mistake.
Because the knowledge points within a question differ in difficulty and learning rate, the difficulty and learning rate of each knowledge point are set as parameters.
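To illustrate how per-knowledge-point difficulty and learning-rate parameters enter a prediction, the sketch below uses the standard additive factor model, in which the log-odds of a correct answer are the student's mastery plus, for every knowledge point in the question, its ease and its learning rate times the number of earlier attempts. This is an assumed stand-in: the exact formula of the joint factor model appears only as an image in this document, and the guessing and mistake factors mentioned above are not modeled here. The function name and argument layout are hypothetical.

```python
import math


def predict_correct(theta_i, beta, gamma, q_j, attempts_i):
    """Additive-factor-style probability that student i answers question j correctly.

    theta_i    : mastery of student i before answering
    beta       : ease of each knowledge point k
    gamma      : learning rate of each knowledge point k
    q_j        : 1 if question j contains knowledge point k, else 0
    attempts_i : number of earlier attempts of student i on each knowledge point k
    """
    logit = theta_i + sum(
        q * (b + g * t) for q, b, g, t in zip(q_j, beta, gamma, attempts_i)
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Two knowledge points; the question only involves the first one.
before = predict_correct(0.0, [0.0, -1.0], [0.3, 0.1], [1, 0], [0, 5])
after = predict_correct(0.0, [0.0, -1.0], [0.3, 0.1], [1, 0], [4, 5])
```

With a positive learning rate, the predicted probability rises as the number of attempts on the relevant knowledge point grows, so `after` exceeds `before`.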
The feedback module is implemented as follows:
(1) a programming error repair system is built with the Django framework to display the multiple solutions to questions answered correctly and the repair hints for students' wrong answers;
(2) changes in student ability are displayed with a JavaScript chart library such as ECharts;
(3) the visualization results are fed back to the teacher or the student.
Experimental results:
A set of collected student programming data is used below to describe the method of the present invention in further detail. The course has nine exercises spread over its chapters; Table 1 describes the basic characteristics of this data.
Table 1: Course chapters, exercises, students and related knowledge points
Figure BDA0003370234230000125
Figure BDA0003370234230000131
As Table 1 shows, the course contains 6 chapters with differing numbers of exercises, 9 exercises in total. A total of 1030 students took part; the exercises attempted differ from student to student, as does the number of submissions per exercise. There are 6885 submissions in total, and the maximum number of submissions for the same exercise is 23. The exercises involve 11 knowledge points: Constants (CO), Variables (VA), Operators (OP), Strings (ST), Expressions (EX), Lists (LI), Tuples (TU), Dictionaries (DI), Conditions (CD), Loops (LO) and Print statements (PR). The number of knowledge points varies by exercise; C-9 is a comprehensive exercise that covers all of them.
The experimental procedure for applying the invention to this data set is as follows:
1. Program matching experiment.
The automatic program repair tool was applied to the correct submissions for the same exercise. After running the matching algorithm, each pair of matched programs was analyzed further to obtain different ways of expressing the same algorithm. As shown in Table 2, for the same input the expressions in expression set 1 and expression set 2 produce identical variable execution trajectories.
Table 2: Matched expressions
Figure BDA0003370234230000132
The correct submissions for the 9 exercises in the experimental course were clustered; the results are shown in Table 3. The number of clusters per exercise ranges from 2 to 5, and more complex exercises tend to have more clusters. Exercise C-9 has 5 clusters.
Table 3: Clustering results
Exercise | Title | Average lines of code | Number of clusters
C-1 | String splicing | 5 | 2
C-2 | Challenge list | 11 | 3
C-3 | Sorting | 11 | 3
C-4 | Changing dictionary elements | 12 | 4
C-5 | Calculating a numerical value | 18 | 5
C-6 | Data classification | 14 | 3
C-7 | Calculating factorial | 12 | 3
C-8 | Data traversal | 12 | 4
C-9 | Synthetic challenge | 19 | 5
2. Automatic program repair experiment.
The repair rate and repair time of the two repair-cost representations, tree edit distance and vector distance, were measured on the course data set. The repair rates show no difference. Fig. 4 compares the repair time when tree edit distance is used as the repair cost with the repair time when vector distance is used. For simple exercises such as C-1 and C-2 the difference in repair time is not significant; as exercise complexity increases, the repair time with vector distance as the cost becomes clearly lower than with tree edit distance.
To measure the complexity of the repair operations, vector distance is used as the repair cost. The repair cost is divided by the total vector value of the erroneous program to obtain the relative repair cost, which is normalized and shown as a histogram. In the histogram of fig. 5, the horizontal axis is the relative repair cost and the vertical axis is the number of erroneous programs. If the erroneous program is empty, so that its AST has 0 nodes, the relative repair cost exceeds 1.0 and may even be infinite. In the histogram, 68% of repairs have a relative repair cost below 0.2 and 25% have a relative repair cost below 0.1. Most repairs completed by the repair tool are therefore simple ones, which indicates that the tool is practical.
3. Programming learning state prediction.
Ten-fold cross-validation was performed on the data set collected by the method, and the AUC and ACC of the predictions of several models were calculated; the results are shown in Table 4.
Table 4: Prediction results
Figure BDA0003370234230000141
Figure BDA0003370234230000151
Matters not described in detail in the present invention are within the common knowledge of those skilled in the art.

Claims (6)

1. A personalized intelligent tutoring method for programming beginners, characterized by comprising the following specific steps:
first, for a given question, dividing the programming homework programs submitted by the beginners into two types, correct answers and wrong answers; and partitioning each homework program at block granularity according to whether it contains control statements and loop statements;
then, inputting the test samples corresponding to the question into each homework program to obtain the variable values in the blocks of each homework program, and combining the variable values into the variable execution trajectory corresponding to each homework program;
for the correct answers, selecting the corresponding variable execution trajectories one by one and judging whether the matching conditions are met; if so, grouping all matched homework programs into one class, and otherwise placing each unmatched homework program in a class of its own; and randomly selecting one homework program from each class as a template, the templates forming a template set used for local repair of erroneous programs;
for a wrong answer, matching the variable execution trajectories of the current erroneous homework program S against the correct variable execution trajectories selected one by one from the template set, to obtain the matched correct variables and the unmatched erroneous variables in program S; generating a set of mapping relations as the Cartesian product of the erroneous variables of program S and all variables in the current correct variable execution trajectory from the template set; calculating the repair cost corresponding to each mapping relation; selecting, from all repair costs, the mapping relation under which the variable matching is fully consistent and the cost is minimal; mapping the correct variables in the correct variable execution trajectory to the corresponding variables of the erroneous program S; and mapping the repair operations to programming knowledge points by means of regular expressions, to complete the generation of the final repair feedback;
and finally, constructing a joint factor model to evaluate each student's programming learning state and predict the student's result on the next question.
2. The method as claimed in claim 1, wherein the rules for partitioning the homework programs at block granularity are as follows:
i. a program containing no loop statement and no selection statement is treated as a single block and is not divided; a selection statement with no nested loop statement is likewise treated as a single block and is not divided; each such block is denoted {L:0};
ii. a single-layer loop statement is a separate block, denoted {L:1};
iii. if a loop statement is nested in a selection statement, the whole is treated as a single block, denoted {L:1};
iv. for multi-layer nested loops, each layer is a separate block; a loop block L2 nested in a loop block L1 is denoted {L1:{L2:1}}.
3. The method as claimed in claim 1, wherein two variable execution trajectories match when the following conditions are satisfied simultaneously:
i. the number of blocks in the two variable execution trajectories is the same, and the control statements and loop structure statements are the same;
ii. the number of variables in the two variable execution trajectories is the same;
iii. the execution trajectories of the variables in the two programs correspond one to one.
4. The personalized intelligent tutoring method for programming beginners as claimed in claim 1, wherein the process of clustering the correct answers is as follows:
first, two variable execution trajectories are selected at random and it is judged whether they meet the matching conditions; if so, the two matched homework programs are grouped into one class; otherwise, the two corresponding homework programs each form a class of their own;
then, a third variable execution trajectory is selected; in the case of two matched homework programs, it is matched against either of them, and if the matching conditions are met the third program is placed in that class, and otherwise it forms a class of its own;
in the case of two unmatched homework programs, the third variable execution trajectory is matched against each of them in turn; if the matching conditions are met with one of them, the third program is placed in that class, and otherwise the three homework programs each form a class of their own;
the next variable execution trajectory is then selected in turn, and the process is repeated until the variable execution trajectories of all homework programs have been matched.
5. The personalized intelligent tutoring method for programming beginners as claimed in claim 1, wherein the repair cost is the size of the actions that must be performed to modify the erroneous code segment a into b;
the program segments are converted into abstract syntax trees, and the tree edit distance Distance(a, b), computed from the tree structure, is used as the repair cost, according to the formula:
$\mathrm{Distance}(a, b) = S_q + I_q + D_r$
where $S_q$ is the number of node modification operations between the two abstract syntax trees, $I_q$ is the number of node insertion operations, and $D_r$ is the number of node deletion operations.
6. The method as claimed in claim 1, wherein the joint factor model predicts the probability that a student answers correctly from the student's answering performance, by linearly adding the mastery and the difficulty of all knowledge points; the probability $p_{ij}$ of a correct answer is calculated as follows:
Figure FDA0003370234220000021
$Y_{ij}$ represents the probability of student i answering question j; $\theta_i$ represents student i's mastery before answering the question; $\beta_k$ represents the ease of knowledge point k; $\gamma_k$ represents the learning rate of knowledge point k; $q_{jk}$ indicates whether question j contains knowledge point k, taking the value 1 if it does and 0 if not; $c_{ik}$ represents the actual performance of student i on knowledge point k; $T_{ik}$ represents the number of attempts by student i on knowledge point k; the product of $\gamma_k$ and $T_{ik}$ means that each attempt raises the student's mastery of knowledge point k; K represents the total number of knowledge points in the programming questions;
by optimizing the parameters $\theta_i$, $\beta$ and $\gamma$ of the joint factor model and inputting the student's next question into the optimized joint factor model, the result of that question is predicted.
CN202111395652.6A 2020-11-24 2021-11-23 Personalized intelligent tutoring method for programming beginners Pending CN114092288A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020113337521 2020-11-24
CN202011333752 2020-11-24

Publications (1)

Publication Number Publication Date
CN114092288A true CN114092288A (en) 2022-02-25

Family

ID=80303301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111395652.6A Pending CN114092288A (en) 2020-11-24 2021-11-23 Personalized intelligent tutoring method for programming beginners

Country Status (1)

Country Link
CN (1) CN114092288A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106095663A (en) * 2016-05-26 2016-11-09 西安交通大学 Program based on hierarchical model returns location of mistake method
CN109408114A (en) * 2018-08-20 2019-03-01 哈尔滨工业大学 A kind of program error automatic correcting method, device, electronic equipment and storage medium
KR101984954B1 (en) * 2019-02-08 2019-05-31 주식회사 소풍앤컴퍼니 Coding learning support device
CN109919500A (en) * 2019-03-13 2019-06-21 中南大学 Auto-Evaluation System with the error feedback function based on ontology
CN110349477A (en) * 2019-07-16 2019-10-18 湖南酷得网络科技有限公司 A kind of misprogrammed restorative procedure, system and server based on history learning behavior

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination