CN114997461B - Time-sensitive answer correctness prediction method combining learning and forgetting - Google Patents

Time-sensitive answer correctness prediction method combining learning and forgetting

Info

Publication number
CN114997461B
Authority
CN
China
Prior art keywords
answer
student
time
layer
knowledge
Prior art date
Legal status
Active
Application number
CN202210374206.5A
Other languages
Chinese (zh)
Other versions
CN114997461A (en)
Inventor
马海平
王菁源
张海峰
张兴义
Current Assignee
Anhui University
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Original Assignee
Anhui University
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date
Filing date
Publication date
Application filed by Anhui University and Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority to CN202210374206.5A
Publication of CN114997461A
Application granted
Publication of CN114997461B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education

Abstract

The invention discloses a time-sensitive answer correctness prediction method combining learning and forgetting, which comprises the following steps: 1, acquiring students' historical answer records and performing serialization preprocessing; 2, fitting students' knowledge states in continuous time with a continuous-time long short-term memory network and predicting the correctness of students' answers; and 3, training the neural network parameters to obtain a trained answer correctness prediction model for predicting students' answer correctness. The invention can predict students' answer correctness end to end and can model a student's knowledge state at any moment, thereby providing effective assistance to intelligent tutoring systems and teachers.

Description

Time-sensitive answer correctness prediction method combining learning and forgetting
Technical Field
The invention relates to the field of cognitive modeling, and in particular to a time-sensitive answer correctness prediction method combining learning and forgetting.
Background
In recent years, the rapid development of intelligent tutoring systems has accumulated a large number of student practice records, providing a new data-driven paradigm for computer-aided education: cognitive modeling. The purpose of cognitive modeling is to discover a student's knowledge level or learning ability, and its results can benefit a wide range of intelligent educational applications, such as student performance prediction and personalized course recommendation.
Within cognitive modeling, much work aims to track changes in students' knowledge levels, taking into account the dynamics of the learning process. Existing methods can be divided into two categories: (1) conventional models, represented by Bayesian Knowledge Tracing (BKT) and factorization models; and (2) sequence models based on deep neural networks, such as Deep Knowledge Tracing (DKT) and the Dynamic Key-Value Memory Network (DKVMN). Among these, deep knowledge tracing was the first method to fit students' knowledge states with a recurrent neural network and infer performance on the current question from their historical learning records.
A long-standing research challenge in cognitive modeling is how to naturally integrate a forgetting mechanism into the learning process. Some researchers have incorporated forgetting factors into cognitive models of students, improving both the accuracy of answer-result prediction and the ability to capture forgetting. However, most of these methods rely on manually designed forgetting features (e.g., DKT+Forgetting counts how many times a student has practiced questions on a knowledge point and feeds the counts into the model as features) or on simplified process assumptions (e.g., fixed and discrete learning intervals), which greatly limit performance and the flexibility of downstream applications. There is still a lack of a realistic cognitive modeling approach that balances the learning and forgetting processes, such that forgetting occurs over continuous time and a student's question-answering performance also changes over time. The modeling style of the neural Hawkes process resembles the description of memory laws in cognitive psychology; the continuous-time long short-term memory network in the neural Hawkes process can be used in an exploratory way to fit the mutually dependent learning and forgetting processes in continuous time, which can improve the ability to predict students' answer correctness and provide intelligent tutoring systems and teachers with references related to students' memory capacity.
Disclosure of Invention
The invention aims to solve the problems in the prior art by providing a time-sensitive answer correctness prediction method combining learning and forgetting, so as to fully and realistically model how a student's knowledge state changes under the mutual influence of learning and forgetting. The method obtains a student's knowledge mastery at any moment, realizes end-to-end prediction of students' answer correctness, improves the accuracy of answer-result prediction, and provides effective assistance to intelligent tutoring systems and teachers.
To solve the above technical problems, the invention adopts the following technical scheme:
The invention relates to a time-sensitive answer correctness prediction method combining learning and forgetting, characterized by comprising the following steps:
Step 1, acquire students' historical answer records and perform serialization preprocessing:
Let the student set be $\mathcal{S}$, the question set be $\mathcal{Q}$, and the knowledge concept set be $\mathcal{K}$, where the student set $\mathcal{S}$ contains $L$ students, the question set $\mathcal{Q}$ contains $M$ questions, and the knowledge concept set $\mathcal{K}$ contains $N$ knowledge points; let $s$ denote any student in $\mathcal{S}$, $q$ any question in $\mathcal{Q}$, and $k$ any knowledge point in $\mathcal{K}$; the questions in $\mathcal{Q}$ are numbered $1, \ldots, M$ and the knowledge points in $\mathcal{K}$ are numbered $1, \ldots, N$;
Represent the history of any student $s$ as an answer sequence $R^s = \{(t^s_1, q^s_1, k^s_1, a^s_1), \ldots, (t^s_{n_s}, q^s_{n_s}, k^s_{n_s}, a^s_{n_s})\}$, where $t^s_i$ is the time of the $i$-th answer of student $s$, $q^s_i$ is the number of the question answered the $i$-th time, $k^s_i$ is the number of the knowledge concept examined by question $q^s_i$, and $a^s_i$ represents the result of student $s$ on question $q^s_i$ at the $i$-th answer: $a^s_i = 1$ indicates a correct answer and $a^s_i = 0$ an incorrect answer, with $i = 1, 2, \ldots, n_s$, where $n_s$ is the number of answers of student $s$;
Step 2, construct a knowledge-state-fitting and answer-correctness-prediction neural network, comprising: a learning part represented by a continuous-time long short-term memory network, a forgetting part represented by the continuous-time long short-term memory network, and an answer prediction module;
The learning part represented by the continuous-time long short-term memory network comprises: a one-hot encoding embedding layer, four single-layer fully-connected feedforward neural networks, two activation functions, and a cell information calculation layer;
The forgetting part represented by the continuous-time long short-term memory network comprises: three single-layer fully-connected feedforward neural networks, two activation functions, a memory decay layer, and a knowledge state acquisition layer;
The answer prediction module comprises: two one-hot encoding embedding layers, a multi-layer perceptron layer, and an activation function;
Step 2.1, the learning part represented by the continuous-time long short-term memory network:
Step 2.1.1, the one-hot encoding embedding layer computes the interaction embedding $x^s_i$ of student $s$ answering at time $t^s_i$ using formula (1):
$x^s_i = A^{\top} r^s_i$ (1)
In formula (1), $A$ is an embedding matrix to be trained with $A \in \mathbb{R}^{2N \times m}$, where $m$ is the embedding dimension; $r^s_i \in \{0,1\}^{2N}$ is the one-hot encoding of the answer performance of student $s$ at time $t^s_i$, whose only nonzero position $j$ is obtained from formula (2):
$j = k^s_i + a^s_i \cdot N$ (2)
so that $r^s_i(j) = 1$ with $j \le N$ (i.e., $a^s_i = 0$) indicates that student $s$ answered incorrectly on the knowledge point numbered $j \bmod N$ at time $t^s_i$, while $j > N$ (i.e., $a^s_i = 1$) indicates that student $s$ answered correctly on the knowledge point numbered $j \bmod N$, where $\bmod$ denotes the remainder;
Step 2.1.2, at time $t^s_i$, let the knowledge state of student $s$ when answering the $i$-th question be $h^s(t_i)$; splice $x^s_i$ and $h^s(t_i)$ into the $i$-th input vector $e_i = x^s_i \oplus h^s(t_i)$; then feed $e_i$ into three single-layer fully-connected feedforward neural networks, each followed by a sigmoid function, to correspondingly output the first forget gate $f^{(1)}_i$, the first input gate $g^{(1)}_i$, and the output gate $o_i$ of the $i$-th update; when $i = 1$, the initial knowledge state $h^s(t_1)$ of student $s$ is a set value;
Step 2.1.3, feed the $i$-th input vector $e_i$ into the fourth single-layer fully-connected feedforward neural network and output, through a tanh activation function, the candidate memory representation $z_i$ at time $t^s_i$, whereby the cell information calculation layer computes the memory representation $c^s_{i+1}$ using formula (3):
$c^s_{i+1} = f^{(1)}_i \odot c^s(t_i) + g^{(1)}_i \odot z_i$ (3)
In formula (3), $c^s(t_i)$ denotes the post-decay memory representation in the memory decay layer at time $t^s_i$, and $\odot$ denotes element-wise multiplication; when $i = 1$, $c^s(t_1)$ is a set value;
Step 2.2, the forgetting part represented by the continuous-time long short-term memory network:
Step 2.2.1, feed the $i$-th input vector $e_i$ into the fifth single-layer fully-connected feedforward neural network and, through a softplus activation function, obtain the forgetting coefficient $\delta_i$ of student $s$ over the time period $(t^s_i, t^s_{i+1}]$;
Step 2.2.2, feed the $i$-th input vector $e_i$ into the remaining two single-layer fully-connected feedforward neural networks, each followed by a sigmoid activation function, to correspondingly obtain the second forget gate $f^{(2)}_i$ and the second input gate $g^{(2)}_i$ of the $i$-th update;
Step 2.2.3, the memory decay layer computes the lower limit $\bar{c}^s_{i+1}$ of memory decay over the time period $(t^s_i, t^s_{i+1}]$ using formula (4):
$\bar{c}^s_{i+1} = f^{(2)}_i \odot \bar{c}^s_i + g^{(2)}_i \odot z_i$ (4)
In formula (4), $\bar{c}^s_i$ is the lower limit of memory decay over the previous time period $(t^s_{i-1}, t^s_i]$; when $i = 1$, $\bar{c}^s_1$ is a set value;
Step 2.2.4, the memory decay layer computes the memory representation $c^s(t)$ after forgetting following time $t^s_i$ using formula (5):
$c^s(t) = \bar{c}^s_{i+1} + (c^s_{i+1} - \bar{c}^s_{i+1}) \exp(-\delta_i (t - t^s_i)), \quad t \in (t^s_i, t^s_{i+1}]$ (5)
Step 2.3, obtain the hidden knowledge state:
Letting $t = t^s_{i+1}$ in formula (5) yields the post-forgetting memory representation $c^s(t_{i+1})$ at time $t^s_{i+1}$, denoted the attenuated memory representation; the knowledge state acquisition layer then computes the hidden knowledge state $h^s(t_{i+1})$ of student $s$ when answering at time $t^s_{i+1}$ using formula (6):
$h^s(t_{i+1}) = o_i \odot (2\sigma(2 c^s(t_{i+1})) - 1)$ (6)
In formula (6), $\sigma(\cdot)$ is the sigmoid activation function;
Step 2.4, the answer prediction module:
Step 2.4.1, let $e_q$ be the one-hot encoding of question $q^s_{i+1}$; the two one-hot encoding embedding layers obtain the difficulty $\beta_{i+1}$ and the discrimination $\alpha_{i+1}$ of question $q^s_{i+1}$ using formulas (7) and (8), respectively:
$\beta_{i+1} = \sigma(B_1^{\top} e_q)$ (7)
$\alpha_{i+1} = \sigma(B_2^{\top} e_q)$ (8)
In formulas (7) and (8), $\sigma(\cdot)$ is the sigmoid function and $B_1, B_2$ are two embedding matrices to be trained;
Step 2.4.2, the multi-layer perceptron layer takes the hidden knowledge state $h^s(t_{i+1})$ as the ability-level representation of student $s$ at time $t^s_{i+1}$, thereby obtaining the prediction $\hat{a}^s_{i+1}$ of the probability that student $s$ answers question $q^s_{i+1}$ correctly at the $(i+1)$-th answer using formula (9):
$\hat{a}^s_{i+1} = F(h^s(t_{i+1}), \beta_{i+1}, \alpha_{i+1})$ (9)
In formula (9), $F(\cdot)$ is a multi-layer perceptron;
Step 2.5, after assigning $i+1$ to $i$, return to step 2.1 and execute sequentially until the prediction $\hat{a}^s_{n_s}$ of the answer correctness probability of the last answer in the historical answer sequence $R^s$ of student $s$ is completed;
Step 3, construct the cross-entropy loss $\mathcal{L}$ using formula (10) and train the knowledge-state-fitting and answer-correctness-prediction neural network, thereby obtaining a trained answer correctness prediction model for predicting students' answer correctness:
$\mathcal{L} = -\sum_{s \in \mathcal{S}} \sum_{i=1}^{n_s} \left[ a^s_i \log \hat{a}^s_i + (1 - a^s_i) \log (1 - \hat{a}^s_i) \right]$ (10)
In formula (10), $\hat{a}^s_i$ is the predicted probability that student $s$ answers correctly at time $t^s_i$, and $a^s_i$ is the true correctness of the answer of student $s$ at time $t^s_i$, where $a^s_i = 0$ denotes an incorrect answer and $a^s_i = 1$ a correct answer.
The time-sensitive answer correctness prediction method combining learning and forgetting is also characterized in that the answer prediction module in step 2.4 may be set to predict answer correctness according to the following process:
Step 2.4.1, let $e_q$ be the one-hot encoding of question $q^s_{i+1}$, and obtain the difficulty $\beta_{i+1}$ and the discrimination $\alpha_{i+1}$ of question $q^s_{i+1}$ using formulas (11) and (12), respectively:
$\beta_{i+1} = \sigma(B_1^{\top} e_q)$ (11)
$\alpha_{i+1} = \sigma(B_2^{\top} e_q)$ (12)
In formulas (11) and (12), $\sigma(\cdot)$ is the sigmoid function and $B_1, B_2$ are two embedding matrices to be trained;
Step 2.4.2, obtain the ability-level representation $\theta^s_{i+1}$ of student $s$ at time $t^s_{i+1}$ in the multi-layer perceptron layer using formula (13):
$\theta^s_{i+1} = W_h h^s(t_{i+1})$ (13)
In formula (13), $W_h$ is a matrix to be trained;
Step 2.4.3, the multi-layer perceptron layer obtains the prediction $\hat{a}^s_{i+1}$ of the probability that student $s$ answers question $q^s_{i+1}$ correctly at the $(i+1)$-th answer using formula (9), with $\theta^s_{i+1}$ in place of $h^s(t_{i+1})$.
Further, the answer prediction module in step 2.4 may be set to predict answer correctness according to the following process:
Step 2.4.1, let $e_q$ be the one-hot encoding of question $q^s_{i+1}$, and obtain the difficulty $\beta_{i+1}$ and the discrimination $\alpha_{i+1}$ of question $q^s_{i+1}$ using formulas (15) and (16), respectively:
$\beta_{i+1} = \sigma(B_1^{\top} e_q)$ (15)
$\alpha_{i+1} = \sigma(B_2^{\top} e_q)$ (16)
In formulas (15) and (16), $\sigma(\cdot)$ is the sigmoid function and $B_1, B_2$ are two embedding matrices to be trained;
Step 2.4.2, obtain the ability-level representation $\theta^s_{i+1}$ of student $s$ at time $t^s_{i+1}$ in the multi-layer perceptron layer using formula (17):
$\theta^s_{i+1} = W_h h^s(t_{i+1})$ (17)
In formula (17), $W_h$ is a matrix to be trained;
Step 2.4.3, let the question-knowledge point matrix be $Q = \{Q_{mn}\}_{M \times N}$ with $1 \le m \le M$ and $1 \le n \le N$, where $Q_{mn} = 1$ if the question numbered $m$ examines the knowledge point numbered $n$, and $Q_{mn} = 0$ otherwise;
The multi-layer perceptron layer obtains the prediction $\hat{a}^s_{i+1}$ of the probability that student $s$ answers question $q^s_{i+1}$ correctly at the $(i+1)$-th answer using formula (18):
$\hat{a}^s_{i+1} = f'\left(Q_{q^s_{i+1}} \circ \alpha_{i+1} \circ (\theta^s_{i+1} - \beta_{i+1})\right)$ (18)
In formula (18), $f'(\cdot)$ denotes a multi-layer perceptron, $Q_{q^s_{i+1}}$ is the row of $Q$ corresponding to question $q^s_{i+1}$, and the symbol $\circ$ denotes multiplication of corresponding matrix positions.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses the continuous-time long short-term memory network of the neural Hawkes process to jointly model learning and forgetting, thereby obtaining students' knowledge states in continuous time. The factors influencing forgetting relate not only to a student's current knowledge mastery and learning content but also to time, making the model sensitive to time factors and allowing more realistic and complete cognitive modeling of students. The invention can therefore measure students' forgetting abilities at different times, predict students' answer correctness with high accuracy, provide intelligent tutoring systems, teachers, and other parties with valuable references on learners' states, guide targeted teaching and training, and serve as an upstream component of applications such as adaptive exercise recommendation.
2. The invention predicts students' answer performance through a couplable knowledge-mastery and question interaction function. This design effectively connects students' knowledge mastery with question information; owing to its couplability, it can produce either a scalar measure of a student's comprehensive knowledge mastery or the student's mastery of each individual knowledge point, which enhances the interpretability of the model. It can be used to visualize knowledge states, helping intelligent tutoring systems, learners, and others quickly understand a learner's comprehensive ability level and ability on specific knowledge points, and to conduct targeted training.
3. The invention models the dynamic change of students' knowledge states through the continuous-time long short-term memory network. This modeling approach combines the learning and forgetting processes so that the evolution of the knowledge state is close to the real process, thereby improving the precision of predicting students' future performance.
4. Experiments show that, compared with other advanced algorithms, the method of the invention delivers stable answer-prediction performance across different sequence lengths (i.e., the number of questions answered by each student), demonstrating good robustness.
Drawings
FIG. 1 is a diagram of a model framework corresponding to the method of the present invention.
Detailed Description
In this embodiment, referring to FIG. 1, a time-sensitive answer correctness prediction method combining learning and forgetting is performed according to the following steps:
Step 1, acquire students' historical answer records and perform serialization preprocessing:
Let the student set be $\mathcal{S}$, the question set be $\mathcal{Q}$, and the knowledge concept set be $\mathcal{K}$, where the student set $\mathcal{S}$ contains $L$ students, the question set $\mathcal{Q}$ contains $M$ questions, and the knowledge concept set $\mathcal{K}$ contains $N$ knowledge points; let $s$ denote any student in $\mathcal{S}$, $q$ any question in $\mathcal{Q}$, and $k$ any knowledge point in $\mathcal{K}$; the questions in $\mathcal{Q}$ are numbered $1, \ldots, M$ and the knowledge points in $\mathcal{K}$ are numbered $1, \ldots, N$;
Represent the history of any student $s$ as an answer sequence $R^s = \{(t^s_1, q^s_1, k^s_1, a^s_1), \ldots, (t^s_{n_s}, q^s_{n_s}, k^s_{n_s}, a^s_{n_s})\}$, where $t^s_i$ is the time of the $i$-th answer of student $s$, $q^s_i$ is the number of the question answered the $i$-th time, $k^s_i$ is the number of the knowledge concept examined by question $q^s_i$, and $a^s_i$ represents the result of the $i$-th answer: $a^s_i = 1$ indicates a correct answer and $a^s_i = 0$ an incorrect answer, with $i = 1, 2, \ldots, n_s$, where $n_s$ is the number of answers of student $s$. Because the answer sequences of different students differ in length, a maximum length ML is set: answer records longer than ML are cut into new sequences, and sequences shorter than ML are padded with 0. This embodiment uses three real datasets (two ASSISTments datasets and slepemapy.cz) and sets ML = 100. Five-fold cross training is used and the experimental results are averaged over 5 runs, with 20% of each dataset used as the test set, 10% as the validation set, and 70% as the training set.
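For illustration only, the following Python sketch shows one way to implement this serialization preprocessing; the function and field names are illustrative assumptions, not part of the patented method:

```python
# Illustrative preprocessing sketch: cut each student's time-ordered record
# into sequences of at most ML interactions and zero-pad the remainder,
# as described in step 1 of this embodiment.
from typing import List, Tuple

ML = 100  # maximum sequence length used in this embodiment

def serialize(record: List[Tuple[float, int, int, int]], ml: int = ML):
    """record: list of (t_i, q_i, k_i, a_i) tuples for one student."""
    chunks = [record[p:p + ml] for p in range(0, len(record), ml)]
    sequences = []
    for chunk in chunks:
        pad = [(0.0, 0, 0, 0)] * (ml - len(chunk))  # zero-padding
        mask = [1] * len(chunk) + [0] * len(pad)    # 1 marks a real interaction
        sequences.append((chunk + pad, mask))
    return sequences
```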
Step 2, construct a knowledge-state-fitting and answer-correctness-prediction neural network, comprising: a learning part represented by a continuous-time long short-term memory network, a forgetting part represented by the continuous-time long short-term memory network, and an answer prediction module;
The learning part represented by the continuous-time long short-term memory network comprises: a one-hot encoding embedding layer, four single-layer fully-connected feedforward neural networks, two activation functions, and a cell information calculation layer;
The forgetting part represented by the continuous-time long short-term memory network comprises: three single-layer fully-connected feedforward neural networks, two activation functions, a memory decay layer, and a knowledge state acquisition layer;
The answer prediction module comprises: two one-hot encoding embedding layers, a multi-layer perceptron layer, and an activation function;
Step 2.1, the learning part represented by the continuous-time long short-term memory network:
Step 2.1.1, the one-hot encoding embedding layer computes the interaction embedding $x^s_i$ of student $s$ answering at time $t^s_i$ using formula (1):
$x^s_i = A^{\top} r^s_i$ (1)
In formula (1), $A$ is an embedding matrix to be trained with $A \in \mathbb{R}^{2N \times m}$, where $m$ is the embedding dimension (in this embodiment, $m = 64$); $r^s_i \in \{0,1\}^{2N}$ is the one-hot encoding of the answer performance of student $s$ at time $t^s_i$, whose only nonzero position $j$ is obtained from formula (2):
$j = k^s_i + a^s_i \cdot N$ (2)
so that $r^s_i(j) = 1$ with $j \le N$ (i.e., $a^s_i = 0$) indicates that student $s$ answered incorrectly on the knowledge point numbered $j \bmod N$ at time $t^s_i$, while $j > N$ (i.e., $a^s_i = 1$) indicates that student $s$ answered correctly on the knowledge point numbered $j \bmod N$, where $\bmod$ denotes the remainder;
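As a minimal sketch of formulas (1) and (2), assuming 0-based knowledge-point numbering (the patent numbers them from 1), the one-hot product $A^{\top} r^s_i$ reduces to an embedding lookup at index $j = k + a \cdot N$:

```python
import torch
import torch.nn as nn

N, m = 100, 64              # illustrative knowledge-point count; m = 64 as in this embodiment
A = nn.Embedding(2 * N, m)  # rows play the role of the embedding matrix A of formula (1)

def interaction_embedding(k: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    """k: 0-based knowledge-point indices; a: correctness labels in {0, 1}.
    Index j = k + a*N selects the row of A, which equals A^T r for the
    one-hot interaction vector r defined by formula (2)."""
    return A(k + a * N)

x = interaction_embedding(torch.tensor([3, 17]), torch.tensor([1, 0]))  # shape (2, 64)
```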
Step 2.1.2, at time $t^s_i$, let the knowledge state of student $s$ when answering the $i$-th question be $h^s(t_i) \in \mathbb{R}^d$; splice $x^s_i$ and $h^s(t_i)$ into the $i$-th input vector $e_i = x^s_i \oplus h^s(t_i)$; then feed $e_i$ into three single-layer fully-connected feedforward neural networks, each followed by a sigmoid function, to correspondingly output the first forget gate $f^{(1)}_i$, the first input gate $g^{(1)}_i$, and the output gate $o_i$ of the $i$-th update; when $i = 1$, the initial knowledge state $h^s(t_1)$ of student $s$ is a set value. In this embodiment, $d = 64$ and $h^s(t_1)$ is set to the zero vector;
Step 2.1.3, feed the $i$-th input vector $e_i$ into the fourth single-layer fully-connected feedforward neural network and output, through a tanh activation function, the candidate memory representation $z_i$ at time $t^s_i$, whereby the cell information calculation layer computes the memory representation $c^s_{i+1}$ using formula (3):
$c^s_{i+1} = f^{(1)}_i \odot c^s(t_i) + g^{(1)}_i \odot z_i$ (3)
In formula (3), $c^s(t_i)$ denotes the post-decay memory representation in the memory decay layer at time $t^s_i$, and $\odot$ denotes element-wise multiplication; when $i = 1$, $c^s(t_1)$ is a set value, taken as the zero vector in this embodiment.
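A compact sketch of steps 2.1.2 and 2.1.3 under the notation above (layer names and sizes are illustrative, with one nn.Linear standing for each single-layer feedforward network):

```python
import torch
import torch.nn as nn

class LearningPart(nn.Module):
    """Three sigmoid gates, a tanh candidate memory, and the cell update of
    formula (3)."""
    def __init__(self, m: int = 64, d: int = 64):
        super().__init__()
        in_dim = m + d                  # e_i = x_i ⊕ h(t_i)
        self.f1 = nn.Linear(in_dim, d)  # first forget gate
        self.g1 = nn.Linear(in_dim, d)  # first input gate
        self.o = nn.Linear(in_dim, d)   # output gate
        self.z = nn.Linear(in_dim, d)   # candidate memory

    def forward(self, x_i, h_ti, c_ti):
        e_i = torch.cat([x_i, h_ti], dim=-1)
        f1 = torch.sigmoid(self.f1(e_i))
        g1 = torch.sigmoid(self.g1(e_i))
        o_i = torch.sigmoid(self.o(e_i))
        z_i = torch.tanh(self.z(e_i))
        c_next = f1 * c_ti + g1 * z_i   # formula (3)
        return e_i, z_i, o_i, c_next
```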
Step 2.2, the forgetting part represented by the continuous-time long short-term memory network:
Step 2.2.1, feed the $i$-th input vector $e_i$ into the fifth single-layer fully-connected feedforward neural network and, through a softplus activation function, obtain the forgetting coefficient $\delta_i$ of student $s$ over the time period $(t^s_i, t^s_{i+1}]$;
Step 2.2.2, feed the $i$-th input vector $e_i$ into the remaining two single-layer fully-connected feedforward neural networks, each followed by a sigmoid activation function, to correspondingly obtain the second forget gate $f^{(2)}_i$ and the second input gate $g^{(2)}_i$ of the $i$-th update;
Step 2.2.3, the memory decay layer computes the lower limit $\bar{c}^s_{i+1}$ of memory decay over the time period $(t^s_i, t^s_{i+1}]$ using formula (4):
$\bar{c}^s_{i+1} = f^{(2)}_i \odot \bar{c}^s_i + g^{(2)}_i \odot z_i$ (4)
In formula (4), $\bar{c}^s_i$ is the lower limit of memory decay over the previous time period $(t^s_{i-1}, t^s_i]$; when $i = 1$, $\bar{c}^s_1$ is a set value, taken as the zero vector in this embodiment;
Step 2.2.4, the memory decay layer computes the memory representation $c^s(t)$ after forgetting following time $t^s_i$ using formula (5):
$c^s(t) = \bar{c}^s_{i+1} + (c^s_{i+1} - \bar{c}^s_{i+1}) \exp(-\delta_i (t - t^s_i)), \quad t \in (t^s_i, t^s_{i+1}]$ (5)
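Steps 2.2.1 through 2.2.4 can be sketched the same way; the exponential decay of formula (5) is what makes the network continuous in time (again, names and sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as fn

class ForgettingPart(nn.Module):
    """Forgetting coefficient (step 2.2.1), second gate pair (step 2.2.2),
    decay lower limit of formula (4), and exponential decay of formula (5)."""
    def __init__(self, m: int = 64, d: int = 64):
        super().__init__()
        in_dim = m + d
        self.delta = nn.Linear(in_dim, d)  # forgetting-coefficient network
        self.f2 = nn.Linear(in_dim, d)     # second forget gate
        self.g2 = nn.Linear(in_dim, d)     # second input gate

    def forward(self, e_i, z_i, cbar_i, c_next, dt):
        delta_i = fn.softplus(self.delta(e_i))  # step 2.2.1
        f2 = torch.sigmoid(self.f2(e_i))        # step 2.2.2
        g2 = torch.sigmoid(self.g2(e_i))
        cbar_next = f2 * cbar_i + g2 * z_i      # formula (4)
        # formula (5), with dt = t - t_i for any t in (t_i, t_{i+1}]
        c_t = cbar_next + (c_next - cbar_next) * torch.exp(-delta_i * dt)
        return c_t, cbar_next
```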
Step 2.3, obtain the hidden knowledge state:
Letting $t = t^s_{i+1}$ in formula (5) yields the post-forgetting memory representation $c^s(t_{i+1})$ at time $t^s_{i+1}$, denoted the attenuated memory representation; the knowledge state acquisition layer then computes the hidden knowledge state $h^s(t_{i+1})$ of student $s$ when answering at time $t^s_{i+1}$ using formula (6):
$h^s(t_{i+1}) = o_i \odot (2\sigma(2 c^s(t_{i+1})) - 1)$ (6)
In formula (6), $\sigma(\cdot)$ is the sigmoid activation function.
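Formula (6) is then a one-line read-out of the decayed memory, sketched as:

```python
import torch

def hidden_state(o_i: torch.Tensor, c_t: torch.Tensor) -> torch.Tensor:
    """Formula (6): h(t) = o_i ⊙ (2σ(2c(t)) - 1), i.e. the output gate applied
    to a scaled-tanh transform of the decayed memory c(t)."""
    return o_i * (2 * torch.sigmoid(2 * c_t) - 1)
```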
Step 2.4, the answer prediction module:
Step 2.4.1, let $e_q$ be the one-hot encoding of question $q^s_{i+1}$; the two one-hot encoding embedding layers obtain the difficulty $\beta_{i+1}$ and the discrimination $\alpha_{i+1}$ of question $q^s_{i+1}$ using formulas (7) and (8), respectively:
$\beta_{i+1} = \sigma(B_1^{\top} e_q)$ (7)
$\alpha_{i+1} = \sigma(B_2^{\top} e_q)$ (8)
In formulas (7) and (8), $\sigma(\cdot)$ is the sigmoid function and $B_1, B_2$ are two embedding matrices to be trained;
Step 2.4.2, the multi-layer perceptron layer takes the hidden knowledge state $h^s(t_{i+1})$ as the ability-level representation of student $s$ at time $t^s_{i+1}$, thereby obtaining the prediction $\hat{a}^s_{i+1}$ of the probability that student $s$ answers question $q^s_{i+1}$ correctly at the $(i+1)$-th answer using formula (9):
$\hat{a}^s_{i+1} = F(h^s(t_{i+1}), \beta_{i+1}, \alpha_{i+1})$ (9)
In formula (9), $F(\cdot)$ is a multi-layer perceptron; in this embodiment, $F(\cdot)$ is a three-layer fully-connected neural network.
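A sketch of step 2.4 under the assumption that $F(\cdot)$ receives the concatenation of the knowledge state, difficulty, and discrimination (one plausible reading of formula (9); the layer widths are illustrative):

```python
import torch
import torch.nn as nn

class AnswerPrediction(nn.Module):
    """Question difficulty/discrimination embeddings (formulas (7)-(8)) fed
    with the hidden knowledge state into a three-layer MLP F (formula (9))."""
    def __init__(self, M: int = 1000, d: int = 64):
        super().__init__()
        self.B1 = nn.Embedding(M, d)  # difficulty embedding matrix B1
        self.B2 = nn.Embedding(M, d)  # discrimination embedding matrix B2
        self.F = nn.Sequential(       # three-layer fully-connected network
            nn.Linear(3 * d, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, h_next, q_next):
        beta = torch.sigmoid(self.B1(q_next))   # formula (7)
        alpha = torch.sigmoid(self.B2(q_next))  # formula (8)
        return self.F(torch.cat([h_next, beta, alpha], dim=-1)).squeeze(-1)
```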
Step 2.5, after assigning $i+1$ to $i$, return to step 2.1 and execute sequentially until the prediction $\hat{a}^s_{n_s}$ of the answer correctness probability of the last answer in the historical answer sequence $R^s$ of student $s$ is completed;
Step 3, construct the cross-entropy loss $\mathcal{L}$ using formula (10) and train the knowledge-state-fitting and answer-correctness-prediction neural network, thereby obtaining a trained answer correctness prediction model for predicting students' answer correctness; in this embodiment, an Adam optimizer is used:
$\mathcal{L} = -\sum_{s \in \mathcal{S}} \sum_{i=1}^{n_s} \left[ a^s_i \log \hat{a}^s_i + (1 - a^s_i) \log (1 - \hat{a}^s_i) \right]$ (10)
In formula (10), $\hat{a}^s_i$ is the predicted probability that student $s$ answers correctly at time $t^s_i$, and $a^s_i$ is the true correctness of the answer of student $s$ at time $t^s_i$, where 0 denotes an incorrect answer and 1 a correct answer.
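A minimal training sketch for step 3; the batch layout and field names are assumptions, not from the patent:

```python
import torch

def train(model: torch.nn.Module, loader, epochs: int = 30, lr: float = 1e-3):
    """Optimize the formula-(10) cross entropy with Adam; batches are assumed
    to be dicts holding labels "a" and a padding mask "mask"."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    bce = torch.nn.BCELoss()
    for _ in range(epochs):
        for batch in loader:
            pred = model(batch)          # predicted probabilities â
            mask = batch["mask"].bool()  # drop zero-padded steps
            loss = bce(pred[mask], batch["a"].float()[mask])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```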
In specific implementation, the answer prediction module in step 2.4 may also predict answer correctness according to the following process:
Step 2.4.1, let $e_q$ be the one-hot encoding of question $q^s_{i+1}$, and obtain the difficulty $\beta_{i+1}$ and the discrimination $\alpha_{i+1}$ of question $q^s_{i+1}$ using formulas (11) and (12), respectively:
$\beta_{i+1} = \sigma(B_1^{\top} e_q)$ (11)
$\alpha_{i+1} = \sigma(B_2^{\top} e_q)$ (12)
In formulas (11) and (12), $\sigma(\cdot)$ is the sigmoid function and $B_1, B_2$ are two embedding matrices to be trained;
Step 2.4.2, obtain the ability-level representation $\theta^s_{i+1}$ of student $s$ at time $t^s_{i+1}$ in the multi-layer perceptron layer using formula (13):
$\theta^s_{i+1} = W_h h^s(t_{i+1})$ (13)
In formula (13), $W_h$ is a matrix to be trained;
Step 2.4.3, the multi-layer perceptron layer obtains the prediction $\hat{a}^s_{i+1}$ of the probability that student $s$ answers question $q^s_{i+1}$ correctly at the $(i+1)$-th answer using formula (9), with $\theta^s_{i+1}$ in place of $h^s(t_{i+1})$.
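The only structural change in this variant is the ability projection of formula (13), sketched below before reusing the prediction MLP of formula (9); the output size is an illustrative assumption:

```python
import torch
import torch.nn as nn

class AbilityProjection(nn.Module):
    """Formula (13): project the hidden knowledge state h(t_{i+1}) with a
    trainable matrix W_h to the ability representation θ; out_dim = 1
    (a scalar comprehensive ability) is assumed here."""
    def __init__(self, d: int = 64, out_dim: int = 1):
        super().__init__()
        self.W_h = nn.Linear(d, out_dim, bias=False)

    def forward(self, h_next: torch.Tensor) -> torch.Tensor:
        return self.W_h(h_next)  # θ of formula (13)
```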
In specific implementation, the answer prediction module in step 2.4 may also predict answer correctness according to the following process:
Step 2.4.1, let $e_q$ be the one-hot encoding of question $q^s_{i+1}$, and obtain the difficulty $\beta_{i+1}$ and the discrimination $\alpha_{i+1}$ of question $q^s_{i+1}$ using formulas (15) and (16), respectively:
$\beta_{i+1} = \sigma(B_1^{\top} e_q)$ (15)
$\alpha_{i+1} = \sigma(B_2^{\top} e_q)$ (16)
In formulas (15) and (16), $\sigma(\cdot)$ is the sigmoid function and $B_1, B_2$ are two embedding matrices to be trained;
Step 2.4.2, obtain the ability-level representation $\theta^s_{i+1}$ of student $s$ at time $t^s_{i+1}$ in the multi-layer perceptron layer using formula (17):
$\theta^s_{i+1} = W_h h^s(t_{i+1})$ (17)
In formula (17), $W_h$ is a matrix to be trained;
Step 2.4.3, let the question-knowledge point matrix be $Q = \{Q_{mn}\}_{M \times N}$ with $1 \le m \le M$ and $1 \le n \le N$, where $Q_{mn} = 1$ if the question numbered $m$ examines the knowledge point numbered $n$, and $Q_{mn} = 0$ otherwise; the multi-layer perceptron layer then obtains the prediction $\hat{a}^s_{i+1}$ of the probability that student $s$ answers question $q^s_{i+1}$ correctly at the $(i+1)$-th answer using formula (18):
$\hat{a}^s_{i+1} = f'\left(Q_{q^s_{i+1}} \circ \alpha_{i+1} \circ (\theta^s_{i+1} - \beta_{i+1})\right)$ (18)
In formula (18), $f'(\cdot)$ denotes a multi-layer perceptron, $Q_{q^s_{i+1}}$ is the row of $Q$ corresponding to question $q^s_{i+1}$, and the symbol $\circ$ denotes multiplication of corresponding matrix positions; in this embodiment, $f'(\cdot)$ is a three-layer fully-connected neural network.
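A sketch of the formula-(18) variant, assuming $\theta$, $\beta$, and $\alpha$ are N-dimensional (one value per knowledge point) so that the Q-matrix row can mask them element-wise:

```python
import torch
import torch.nn as nn

class QMatrixPrediction(nn.Module):
    """Formula (18): mask the discrimination-weighted ability/difficulty gap
    with the question-knowledge point row Q_q, then apply the MLP f'."""
    def __init__(self, Q: torch.Tensor, N: int = 100):
        super().__init__()
        self.register_buffer("Q", Q.float())  # {0,1}^{M×N} question-knowledge matrix
        self.f = nn.Sequential(               # three-layer MLP f'(·)
            nn.Linear(N, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, theta, beta, alpha, q_next):
        x = self.Q[q_next] * alpha * (theta - beta)  # element-wise, as in (18)
        return self.f(x).squeeze(-1)
```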
Examples
To verify the effectiveness of the method of the invention, three public datasets widely used in the educational field were selected (two ASSISTments datasets and slepemapy.cz). For all three datasets the maximum sequence length is set to 100; student sequences exceeding the maximum length are cut into multiple pieces, and sequences shorter than the maximum length are padded with 0. Sequences with fewer than 5 interactions are removed to ensure that each sequence has sufficient data for training.
This embodiment adopts accuracy (ACC) and the area under the ROC curve (AUC) as evaluation criteria.
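The two criteria can be computed, for instance, with scikit-learn on the flattened, unpadded predictions:

```python
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(y_true, y_prob):
    """y_true: 0/1 correctness labels; y_prob: predicted probabilities."""
    acc = accuracy_score(y_true, [p >= 0.5 for p in y_prob])
    auc = roc_auc_score(y_true, y_prob)
    return acc, auc
```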
Five methods were selected in this embodiment for comparison with the method of the invention: DKT, DKT_V, DKT+Forgetting, AKT, and HawkesKT. CT-NCM denotes the method of the invention, and CT-NCM_IRT and CT-NCM_NCD are the two extensions of step 2.4 described above. The experimental results are shown in Table 1.
Table 1 experimental results of student answer predictions performed on three data sets by the method of the present invention and other comparison algorithms
As shown in Table 1, CT-NCM and its two variants obtain excellent results on the three public datasets, and CT-NCM achieves the best results on all three. The experiments thus demonstrate that the invention predicts the correctness of students' answers with high accuracy and reliability.

Claims (3)

1. A time-sensitive answer correctness prediction method combining learning and forgetting, characterized by comprising the following steps:
Step 1, acquire students' historical answer records and perform serialization preprocessing:
Let the student set be $\mathcal{S}$, the question set be $\mathcal{Q}$, and the knowledge concept set be $\mathcal{K}$, where the student set $\mathcal{S}$ contains $L$ students, the question set $\mathcal{Q}$ contains $M$ questions, and the knowledge concept set $\mathcal{K}$ contains $N$ knowledge points; let $s$ denote any student in $\mathcal{S}$, $q$ any question in $\mathcal{Q}$, and $k$ any knowledge point in $\mathcal{K}$; the questions in $\mathcal{Q}$ are numbered $1, \ldots, M$ and the knowledge points in $\mathcal{K}$ are numbered $1, \ldots, N$;
Represent the history of any student $s$ as an answer sequence $R^s = \{(t^s_1, q^s_1, k^s_1, a^s_1), \ldots, (t^s_{n_s}, q^s_{n_s}, k^s_{n_s}, a^s_{n_s})\}$, where $t^s_i$ is the time of the $i$-th answer of student $s$, $q^s_i$ is the number of the question answered the $i$-th time, $k^s_i$ is the number of the knowledge concept examined by question $q^s_i$, and $a^s_i$ represents the result of the $i$-th answer: $a^s_i = 1$ indicates a correct answer and $a^s_i = 0$ an incorrect answer, with $i = 1, 2, \ldots, n_s$, where $n_s$ is the number of answers of student $s$;
Step 2, construct a knowledge-state-fitting and answer-correctness-prediction neural network, comprising: a learning part represented by a continuous-time long short-term memory network, a forgetting part represented by the continuous-time long short-term memory network, and an answer prediction module;
The learning part represented by the continuous-time long short-term memory network comprises: a one-hot encoding embedding layer, four single-layer fully-connected feedforward neural networks, two activation functions, and a cell information calculation layer;
The forgetting part represented by the continuous-time long short-term memory network comprises: three single-layer fully-connected feedforward neural networks, two activation functions, a memory decay layer, and a knowledge state acquisition layer;
The answer prediction module comprises: two one-hot encoding embedding layers, a multi-layer perceptron layer, and an activation function;
Step 2.1, the learning part represented by the continuous-time long short-term memory network:
Step 2.1.1, the one-hot encoding embedding layer computes the interaction embedding $x^s_i$ of student $s$ answering at time $t^s_i$ using formula (1):
$x^s_i = A^{\top} r^s_i$ (1)
In formula (1), $A$ is an embedding matrix to be trained with $A \in \mathbb{R}^{2N \times m}$, where $m$ is the embedding dimension; $r^s_i \in \{0,1\}^{2N}$ is the one-hot encoding of the answer performance of student $s$ at time $t^s_i$, whose only nonzero position $j$ is obtained from formula (2):
$j = k^s_i + a^s_i \cdot N$ (2)
so that $r^s_i(j) = 1$ with $j \le N$ (i.e., $a^s_i = 0$) indicates that student $s$ answered incorrectly on the knowledge point numbered $j \bmod N$ at time $t^s_i$, while $j > N$ (i.e., $a^s_i = 1$) indicates that student $s$ answered correctly on the knowledge point numbered $j \bmod N$, where $\bmod$ denotes the remainder;
Step 2.1.2, at time $t^s_i$, let the knowledge state of student $s$ when answering the $i$-th question be $h^s(t_i)$; splice $x^s_i$ and $h^s(t_i)$ into the $i$-th input vector $e_i = x^s_i \oplus h^s(t_i)$; then feed $e_i$ into three single-layer fully-connected feedforward neural networks, each followed by a sigmoid function, to correspondingly output the first forget gate $f^{(1)}_i$, the first input gate $g^{(1)}_i$, and the output gate $o_i$ of the $i$-th update; when $i = 1$, the initial knowledge state $h^s(t_1)$ of student $s$ is a set value;
Step 2.1.3, feed the $i$-th input vector $e_i$ into the fourth single-layer fully-connected feedforward neural network and output, through a tanh activation function, the candidate memory representation $z_i$ at time $t^s_i$, whereby the cell information calculation layer computes the memory representation $c^s_{i+1}$ using formula (3):
$c^s_{i+1} = f^{(1)}_i \odot c^s(t_i) + g^{(1)}_i \odot z_i$ (3)
In formula (3), $c^s(t_i)$ denotes the post-decay memory representation in the memory decay layer at time $t^s_i$, and $\odot$ denotes element-wise multiplication; when $i = 1$, $c^s(t_1)$ is a set value;
Step 2.2, the forgetting part represented by the continuous-time long short-term memory network:
Step 2.2.1, feed the $i$-th input vector $e_i$ into the fifth single-layer fully-connected feedforward neural network and, through a softplus activation function, obtain the forgetting coefficient $\delta_i$ of student $s$ over the time period $(t^s_i, t^s_{i+1}]$;
Step 2.2.2, feed the $i$-th input vector $e_i$ into the remaining two single-layer fully-connected feedforward neural networks, each followed by a sigmoid activation function, to correspondingly obtain the second forget gate $f^{(2)}_i$ and the second input gate $g^{(2)}_i$ of the $i$-th update;
Step 2.2.3, the memory decay layer computes the lower limit $\bar{c}^s_{i+1}$ of memory decay over the time period $(t^s_i, t^s_{i+1}]$ using formula (4):
$\bar{c}^s_{i+1} = f^{(2)}_i \odot \bar{c}^s_i + g^{(2)}_i \odot z_i$ (4)
In formula (4), $\bar{c}^s_i$ is the lower limit of memory decay over the previous time period $(t^s_{i-1}, t^s_i]$; when $i = 1$, $\bar{c}^s_1$ is a set value;
Step 2.2.4, the memory decay layer computes the memory representation $c^s(t)$ after forgetting following time $t^s_i$ using formula (5):
$c^s(t) = \bar{c}^s_{i+1} + (c^s_{i+1} - \bar{c}^s_{i+1}) \exp(-\delta_i (t - t^s_i)), \quad t \in (t^s_i, t^s_{i+1}]$ (5)
Step 2.3, obtain the hidden knowledge state:
Letting $t = t^s_{i+1}$ in formula (5) yields the post-forgetting memory representation $c^s(t_{i+1})$ at time $t^s_{i+1}$, denoted the attenuated memory representation; the knowledge state acquisition layer then computes the hidden knowledge state $h^s(t_{i+1})$ of student $s$ when answering at time $t^s_{i+1}$ using formula (6):
$h^s(t_{i+1}) = o_i \odot (2\sigma(2 c^s(t_{i+1})) - 1)$ (6)
In formula (6), $\sigma(\cdot)$ is the sigmoid activation function;
Step 2.4, the answer prediction module:
Step 2.4.1, let $e_q$ be the one-hot encoding of question $q^s_{i+1}$; the two one-hot encoding embedding layers obtain the difficulty $\beta_{i+1}$ and the discrimination $\alpha_{i+1}$ of question $q^s_{i+1}$ using formulas (7) and (8), respectively:
$\beta_{i+1} = \sigma(B_1^{\top} e_q)$ (7)
$\alpha_{i+1} = \sigma(B_2^{\top} e_q)$ (8)
In formulas (7) and (8), $\sigma(\cdot)$ is the sigmoid function and $B_1, B_2$ are two embedding matrices to be trained;
Step 2.4.2, the multi-layer perceptron layer takes the hidden knowledge state $h^s(t_{i+1})$ as the ability-level representation of student $s$ at time $t^s_{i+1}$, thereby obtaining the prediction $\hat{a}^s_{i+1}$ of the probability that student $s$ answers question $q^s_{i+1}$ correctly at the $(i+1)$-th answer using formula (9):
$\hat{a}^s_{i+1} = F(h^s(t_{i+1}), \beta_{i+1}, \alpha_{i+1})$ (9)
In formula (9), $F(\cdot)$ is a multi-layer perceptron;
Step 2.5, after assigning $i+1$ to $i$, return to step 2.1 and execute sequentially until the prediction $\hat{a}^s_{n_s}$ of the answer correctness probability of the last answer in the historical answer sequence $R^s$ of student $s$ is completed;
Step 3, construct the cross-entropy loss $\mathcal{L}$ using formula (10) and train the knowledge-state-fitting and answer-correctness-prediction neural network, thereby obtaining a trained answer correctness prediction model for predicting students' answer correctness:
$\mathcal{L} = -\sum_{s \in \mathcal{S}} \sum_{i=1}^{n_s} \left[ a^s_i \log \hat{a}^s_i + (1 - a^s_i) \log (1 - \hat{a}^s_i) \right]$ (10)
In formula (10), $\hat{a}^s_i$ is the predicted probability that student $s$ answers correctly at time $t^s_i$, and $a^s_i$ is the true correctness of the answer of student $s$ at time $t^s_i$, where $a^s_i = 0$ denotes an incorrect answer and $a^s_i = 1$ a correct answer.
2. The time-sensitive answer correctness prediction method combining learning and forgetting according to claim 1, characterized in that the answer prediction module in step 2.4 predicts answer correctness according to the following process:
Step 2.4.1, let $e_q$ be the one-hot encoding of question $q^s_{i+1}$, and obtain the difficulty $\beta_{i+1}$ and the discrimination $\alpha_{i+1}$ of question $q^s_{i+1}$ using formulas (11) and (12), respectively:
$\beta_{i+1} = \sigma(B_1^{\top} e_q)$ (11)
$\alpha_{i+1} = \sigma(B_2^{\top} e_q)$ (12)
In formulas (11) and (12), $\sigma(\cdot)$ is the sigmoid function and $B_1, B_2$ are two embedding matrices to be trained;
Step 2.4.2, obtain the ability-level representation $\theta^s_{i+1}$ of student $s$ at time $t^s_{i+1}$ in the multi-layer perceptron layer using formula (13):
$\theta^s_{i+1} = W_h h^s(t_{i+1})$ (13)
In formula (13), $W_h$ is a matrix to be trained;
Step 2.4.3, the multi-layer perceptron layer obtains the prediction $\hat{a}^s_{i+1}$ of the probability that student $s$ answers question $q^s_{i+1}$ correctly at the $(i+1)$-th answer using formula (9), with $\theta^s_{i+1}$ in place of $h^s(t_{i+1})$.
3. The time-sensitive answer correctness prediction method combining learning and forgetting according to claim 1, characterized in that the answer prediction module in step 2.4 predicts answer correctness according to the following process:
Step 2.4.1, let $e_q$ be the one-hot encoding of question $q^s_{i+1}$, and obtain the difficulty $\beta_{i+1}$ and the discrimination $\alpha_{i+1}$ of question $q^s_{i+1}$ using formulas (15) and (16), respectively:
$\beta_{i+1} = \sigma(B_1^{\top} e_q)$ (15)
$\alpha_{i+1} = \sigma(B_2^{\top} e_q)$ (16)
In formulas (15) and (16), $\sigma(\cdot)$ is the sigmoid function and $B_1, B_2$ are two embedding matrices to be trained;
Step 2.4.2, obtain the ability-level representation $\theta^s_{i+1}$ of student $s$ at time $t^s_{i+1}$ in the multi-layer perceptron layer using formula (17):
$\theta^s_{i+1} = W_h h^s(t_{i+1})$ (17)
In formula (17), $W_h$ is a matrix to be trained;
Step 2.4.3, let the question-knowledge point matrix be $Q = \{Q_{mn}\}_{M \times N}$ with $1 \le m \le M$ and $1 \le n \le N$, where $Q_{mn} = 1$ if the question numbered $m$ examines the knowledge point numbered $n$, and $Q_{mn} = 0$ otherwise;
The multi-layer perceptron layer obtains the prediction $\hat{a}^s_{i+1}$ of the probability that student $s$ answers question $q^s_{i+1}$ correctly at the $(i+1)$-th answer using formula (18):
$\hat{a}^s_{i+1} = f'\left(Q_{q^s_{i+1}} \circ \alpha_{i+1} \circ (\theta^s_{i+1} - \beta_{i+1})\right)$ (18)
In formula (18), $f'(\cdot)$ denotes a multi-layer perceptron, $Q_{q^s_{i+1}}$ is the row of $Q$ corresponding to question $q^s_{i+1}$, and the symbol $\circ$ denotes multiplication of corresponding matrix positions.
CN202210374206.5A 2022-04-11 2022-04-11 Time-sensitive answer correctness prediction method combining learning and forgetting Active CN114997461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210374206.5A CN114997461B (en) 2022-04-11 2022-04-11 Time-sensitive answer correctness prediction method combining learning and forgetting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210374206.5A CN114997461B (en) 2022-04-11 2022-04-11 Time-sensitive answer correctness prediction method combining learning and forgetting

Publications (2)

Publication Number Publication Date
CN114997461A CN114997461A (en) 2022-09-02
CN114997461B (en) 2024-05-28

Family

ID=83023373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210374206.5A Active CN114997461B (en) 2022-04-11 2022-04-11 Time-sensitive answer correctness prediction method combining learning and forgetting

Country Status (1)

Country Link
CN (1) CN114997461B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116166998B (en) * 2023-04-25 2023-07-07 合肥师范学院 Student performance prediction method combining global and local features

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110022125A (en) * 2009-08-27 2011-03-07 김정민 Repeat learning method by applying estimation and management of memory maintainment deadline
JP2017134184A (en) * 2016-01-26 2017-08-03 株式会社ウォーカー Learning support system having continuous evaluation function of learner and teaching material
CN112800323A (en) * 2021-01-13 2021-05-14 中国科学技术大学 Intelligent teaching system based on deep learning
CN113033808A (en) * 2021-03-08 2021-06-25 西北大学 Deep embedded knowledge tracking method based on exercise difficulty and student ability
CN113793239A (en) * 2021-08-13 2021-12-14 华南理工大学 Personalized knowledge tracking method and system fusing learning behavior characteristics
CN113947262A (en) * 2021-11-25 2022-01-18 陕西师范大学 Knowledge tracking method based on different composition learning fusion learning participation state

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
多知识点知识追踪模型与可视化研究 (Multi-knowledge-point knowledge tracing model and visualization research); 徐墨客, 吴文峻, 周萱, 蒲彦均; 电化教育研究 (e-Education Research); 2018-09-21 (No. 10); full text *

Also Published As

Publication number Publication date
CN114997461A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN113033808B (en) Deep embedded knowledge tracking method based on problem difficulty and student capability
CN112116092B (en) Interpretable knowledge level tracking method, system and storage medium
CN110991645A (en) Self-adaptive learning method, system and storage medium based on knowledge model
CN114911975B (en) Knowledge tracking method based on graph attention network
CN110428010A (en) Knowledge method for tracing
CN112257966B (en) Model processing method and device, electronic equipment and storage medium
CN108228674B (en) DKT-based information processing method and device
CN112085168A (en) Knowledge tracking method and system based on dynamic key value gating circulation network
CN114021722A (en) Attention knowledge tracking method integrating cognitive portrayal
CN114385801A (en) Knowledge tracking method and system based on hierarchical refinement LSTM network
CN110377707B (en) Cognitive diagnosis method based on depth item reaction theory
CN114997461B (en) Time-sensitive answer correctness prediction method combining learning and forgetting
CN115455186A (en) Learning situation analysis method based on multiple models
Sani et al. Artificial intelligence approaches in student modeling: Half decade review (2010-2015)
CN114861754A (en) Knowledge tracking method and system based on external attention mechanism
CN117540104A (en) Learning group difference evaluation method and system based on graph neural network
CN114676903A (en) Online prediction method and system based on time perception and cognitive diagnosis
CN114490980A (en) Associated knowledge tracking method, device and equipment
Huang et al. Response speed enhanced fine-grained knowledge tracing: A multi-task learning perspective
CN118171231A (en) Multi-dimensional feature fused dynamic graph neurocognitive diagnosis method
CN116402134A (en) Knowledge tracking method and system based on behavior perception
CN115795015A (en) Comprehensive knowledge tracking method for enhancing test question difficulty
CN108921434A (en) A method of user capability prediction is completed by human-computer interaction
CN114925610A (en) Learner knowledge structure and level modeling method, system, equipment and terminal
CN114707775A (en) Knowledge tracking model training method, tracking method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant