CN115688863A - Depth knowledge tracking method based on residual connection and student near-condition feature fusion - Google Patents
Abstract
The invention relates to a deep knowledge tracking method based on residual connection and fusion of students' recent-learning features, and belongs to the technical field of knowledge tracking. The invention improves on the Deep Knowledge Tracking (DKT) model, which applies a Recurrent Neural Network (RNN) to a learner's time-ordered record of knowledge-point answers, together with whether each answer was correct, in order to predict the student's future answering performance. In this method, residual connections are added to the DKT model to alleviate the degradation of network information, and the student's recent learning state is reinforced using the student's recent learning data. This effectively improves the accuracy of the model's predictions and offers a feasible scheme for the development of the knowledge tracking field.
Description
Technical Field
The invention relates to a deep knowledge tracking method based on residual connection and fusion of students' recent-learning features, and belongs to the technical field of knowledge tracking.
Background
Knowledge tracking is one of the important applications of artificial intelligence in education. Its main research task is to build a model that predicts a student's future question-answering performance from the student's past learning data. In recent years, Deep Knowledge Tracking (DKT), built on a Recurrent Neural Network (RNN), has been shown to outperform earlier knowledge tracking methods and conventional approaches. The recurrent neural network offers striking advantages for predicting students' question-answering performance in real time, yet observation and experiment show that DKT still has substantial room for improvement.
With the advent of the big-data era, artificial intelligence is applied ever more widely in education: the data students generate while learning are stored in large quantities, and the data-processing capability of computers has grown greatly. The development of educational data mining and educational data analysis has driven progress in learning prediction. Deep knowledge tracking builds an RNN-based model on students' past learning data to predict their future question-answering performance. An RNN is highly effective on sequential data, mining the temporal and semantic information within it, and this capability is used to predict future performance from a student's past learning records. This supports personalized teaching by teachers while also improving students' learning efficiency.
Disclosure of Invention
The invention aims to provide a deep knowledge tracking method based on residual connection and fusion of students' recent-learning features, in order to solve a problem of existing knowledge tracking methods: because the reasonableness of the predicted values for the other knowledge points is ignored, errors in the future prediction of those knowledge points are aggravated.
The technical scheme of the invention is as follows. In the deep knowledge tracking method based on residual connection and fusion of students' recent-learning features, the required data are one-hot encoded and used as input to a recurrent neural network, and each piece of data is fed into the network to generate a prediction list. A CUT_STEP value is set: each time a student's sequence has passed CUT_STEP pieces of data through the recurrent neural network, those CUT_STEP pieces of original data are multiplied by weights and summed, put through a fully connected layer, and the layer's output is added to the hidden state as supplementary information before the computation continues, until an initial prediction matrix for the input data is obtained. Each row of the initial prediction matrix is then concatenated with the sum of the student's recent learning data, i.e., the student's recent learning state is reinforced, and the final prediction matrix is obtained through a linear layer. The required predicted values are read from the prediction matrix, the loss between predicted and true values is computed, and the model is optimized according to the loss until it is approximately optimal.
The method comprises the following specific steps:
Step1: Data preprocessing. Each piece of original data is processed into one-hot form, and the pieces are combined into a one-hot matrix. The specific steps are:
Step1.1: Data reading and cleaning. The required features are read from the students' question-answering records to form a new data structure, data. The required features are the time-sequence id, the student id, the knowledge-point id, and whether the question was answered correctly.
Step1.2: Count the distinct knowledge points and form a list of knowledge-point ids, K = {k_1, k_2, ..., k_l}.
Step1.3: Form a dictionary from the knowledge-point list K, of the form {knowledge-point id: position of the knowledge point in the list formed in Step1.2}; in mathematical notation, dict = {k: e}, where k ∈ K, K is the knowledge-point list formed in Step1.2, and e ∈ {0, 1, 2, ..., l-1}.
Step1.4: Extract each student's knowledge-point list and the corresponding answered-correctly list from data to form the sequence sequences.
Step1.5: Convert sequences into one-hot form and unify the data, i.e., each student's data are arranged into blocks of MAX_STEP rows, with each one-hot row corresponding to one question the student answered.
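The preprocessing of Step1 can be sketched in Python as follows. Each one-hot row has length 2l: position s is set for a correct answer and position s + l for a wrong one, as described in the patent; the helper names and the toy records are illustrative assumptions.

```python
# Sketch of Step1: build the knowledge-point list K, the position
# dictionary {k: e}, and one-hot rows of length 2*l. Names are illustrative.

def build_dict(records):
    """records: list of (student_id, skill_id, correct) in time order."""
    skills = sorted({skill for _, skill, _ in records})   # list K
    return skills, {k: e for e, k in enumerate(skills)}   # dict {k: e}

def encode_one_hot(skill, correct, skill_pos, l):
    vec = [0] * (2 * l)
    pos = skill_pos[skill]
    vec[pos if correct else pos + l] = 1    # correct -> s, wrong -> s + l
    return vec

records = [(1, "frac", 1), (1, "dec", 0), (1, "frac", 0)]  # toy data
skills, skill_pos = build_dict(records)
l = len(skills)
matrix = [encode_one_hot(s, c, skill_pos, l) for _, s, c in records]
```

In the embodiment l = 123, so each row has length 246; the toy example above uses l = 2 purely to keep the vectors readable.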
Step2: Read the training data in batches from the data processed in Step1.
Step3: Predict whether the student answers the next question correctly using the recurrent neural network, i.e., feed the data read in Step2 into the recurrent neural network to generate the original prediction data. The formula is:
h_t = tanh(W_x·x_t + b_x + W_h·h_(t-1) + b_h) (1)
where h_t is the hidden state at time t, x_t is the input at time t, and h_(t-1) is the hidden state at time t-1.
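Equation (1) can be checked with a scalar toy version of the recurrent cell; the weight values below are arbitrary illustrative assumptions, and in the actual model W_x, W_h are matrices rather than scalars.

```python
import math

# Scalar sketch of equation (1): h_t = tanh(W_x*x_t + b_x + W_h*h_{t-1} + b_h).
W_x, b_x, W_h, b_h = 0.5, 0.1, 0.8, -0.2   # arbitrary illustrative weights

def rnn_cell(x_t, h_prev):
    return math.tanh(W_x * x_t + b_x + W_h * h_prev + b_h)

h = 0.0                      # initial hidden state
for x in [1.0, 0.0, 1.0]:    # toy input sequence
    h = rnn_cell(x, h)       # h_t feeds the next cycle as h_{t-1}
```

Because tanh saturates, the hidden state always stays in (-1, 1), which is what lets it serve both as output and as the recurrent state.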
Step4: Using the original prediction data generated by the recurrent neural network, reinforce the student's recent learning state with the student's recent learning data, and make the final prediction.
If a student performed well on earlier questions but performs poorly on recent ones, there is most likely a problem with the student's recent learning state, and it will affect the current question-answering performance. The student's recent learning state is judged from the student's past learning data. Experiments show that concatenating the sum of the student's recent learning data gives the best results.
Step4.1: Concatenate the original prediction data generated by Step3 with the sum of the student's recent exercise data to form the new data o_t. The formula is:
o_t = [h_t ; x_(t-n+1) + x_(t-n+2) + ... + x_t]
where o_t is the data obtained by concatenating the original prediction data with the sum of the student's recent exercise data, h_t is the original prediction data, x_i is the student's one-hot exercise data at time i, and n is the number of recent exercise records to be concatenated.
Step4.2: Put the concatenated data o_t into the fully connected layer fc_1 to adjust the prediction dimension. The expression for fc_1 is:
y_t = W_fc1·o_t + b_fc1
where o_t and y_t are the input and output of the linear layer fc_1, W_fc1 is the weight, and b_fc1 is the bias.
Step4.3: Put the output vector of Step4.2 into an activation function that constrains every output element to lie between 0 and 1, giving the final prediction result. The conversion is expressed as:
p_t = 1 / (1 + e^(-y_t))
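Steps 4.1 to 4.3 can be sketched as follows. The dimensions and random weights are toy assumptions (the embodiment uses hidden size 10 and input size 246), and the sigmoid is assumed as the 0-to-1 activation.

```python
import math, random

random.seed(0)
HIDDEN, INPUT, N_RECENT = 4, 6, 3          # toy sizes, not the patent's 10/246

def splice(h_t, recent_xs):
    """Step4.1: o_t = [h_t ; sum of the last n one-hot inputs]."""
    summed = [sum(col) for col in zip(*recent_xs)]
    return h_t + summed                     # list concatenation

def linear(o, W, b):
    """Step4.2: fully connected layer fc_1 adjusting the dimension."""
    return [sum(w * x for w, x in zip(row, o)) + bi for row, bi in zip(W, b)]

def sigmoid(v):
    """Step4.3: squash every element into (0, 1)."""
    return [1 / (1 + math.exp(-z)) for z in v]

h_t = [random.uniform(-1, 1) for _ in range(HIDDEN)]
recent = [[random.choice([0, 1]) for _ in range(INPUT)] for _ in range(N_RECENT)]
W = [[random.uniform(-0.1, 0.1) for _ in range(HIDDEN + INPUT)] for _ in range(INPUT)]
b = [0.0] * INPUT
pred = sigmoid(linear(splice(h_t, recent), W, b))   # one probability per skill
```

The fully connected layer maps the spliced vector of length HIDDEN + INPUT back down to one probability per knowledge point.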
Step5: Add residual connections on the basis of the recurrent neural network of Step3 to improve prediction accuracy. Specifically:
Because the predictions of the recurrent neural network pursue the result of the next knowledge-point answer too strongly and ignore the reasonableness of the predicted values for the other knowledge points, errors in later predictions of those knowledge points are aggravated. A CUT_STEP value (1 < CUT_STEP < MAX_STEP) is therefore set to segment the MAX_STEP rows of a student's one-hot matrix; CUT_STEP is the segment length over which data are aggregated. The CUT_STEP pieces of data in a segment are summed, transformed, and added to the hidden layer, and the next cycle begins. This is the residual connection, realized in the following steps:
Step5.1: Each time a student's sequence has passed CUT_STEP pieces of data through the recurrent neural network, the previous CUT_STEP pieces of data are multiplied by the weights W and summed. Using the residual idea, the previous CUT_STEP pieces of data are added according to their importance (weights) to obtain a vector containing the information of those pieces, where the weights are W = {W_1, W_2, ..., W_C} and C is shorthand for CUT_STEP.
Step5.2: The vector containing the previous CUT_STEP pieces of data generated in Step5.1 is put, as input, into the fully connected layer fc_2, which converts the result of Step5.1 into a form that can be added directly to the hidden layer. The expression for fc_2 is:
z_t = W_fc2·v_t + b_fc2
where v_t and z_t are the input and output of the linear layer fc_2, W_fc2 is the weight, and b_fc2 is the bias.
Step5.3: The output of fc_2 is then added to the hidden layer and the computation continues; after the next CUT_STEP pieces of data pass through the RNN they are again summed, and the above steps repeat. The adjusted hidden layer at time t is expressed as:
h'_t = h_t + fc_2( Σ_(j=1..C) W_j·s_d(t-C+j) )
where C is shorthand for CUT_STEP, s_hd is the one-hot data of the d-th student, W denotes the weights assigned to the s_hd participating in the computation, and the summation means that {s_d(t-C+1), ..., s_dt} are multiplied by W and then summed.
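The residual connection of Step5 can be sketched as follows; CUT_STEP, the weights W_j, and fc_2 follow the description above, while the toy sizes and random values are illustrative assumptions.

```python
import random

random.seed(1)
CUT_STEP, INPUT, HIDDEN = 3, 5, 2          # toy sizes; the embodiment uses CUT_STEP=10

def residual_summary(last_inputs, weights):
    """Step5.1: weighted sum of the last CUT_STEP one-hot inputs."""
    out = [0.0] * len(last_inputs[0])
    for w, x in zip(weights, last_inputs):
        out = [o + w * xi for o, xi in zip(out, x)]
    return out

def fc2(v, W, b):
    """Step5.2: map the summary to hidden-layer size so it can be added."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

weights = [0.2, 0.3, 0.5]                  # W = {W_1, ..., W_C}
last = [[random.choice([0, 1]) for _ in range(INPUT)] for _ in range(CUT_STEP)]
W2 = [[random.uniform(-0.1, 0.1) for _ in range(INPUT)] for _ in range(HIDDEN)]
b2 = [0.0] * HIDDEN
h = [0.1, -0.2]                            # current hidden state
# Step5.3: add the transformed summary to the hidden state and continue
h = [hi + di for hi, di in zip(h, fc2(residual_summary(last, weights), W2, b2))]
```

The adjusted hidden state then re-enters the RNN, and the process repeats every CUT_STEP inputs.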
Step6: Read the true values of whether the students answered correctly and the model's predicted values. Specifically:
Since the dataset gives no explicit labels, the scheme adopted is: the (i+1)-th piece of a student's data is the label for the prediction list generated by the i-th piece. The prediction list generated from the i-th piece predicts the student's next-step performance on all knowledge points, while the (i+1)-th piece is the one-hot encoding of whether the student answered a particular knowledge point correctly, so the (i+1)-th piece can locate a predicted value in the prediction list generated by the i-th piece. The specific steps are:
Step6.1: From the input one-hot matrix, find the positions of all knowledge points the student answered, except the first (i.e., the one at position zero), to form a position list.
Step6.2: Find the corresponding predicted values from the prediction list according to the position list.
Step6.3: From the input one-hot matrix, find the list of true values consisting of whether each knowledge point the student answered, except the first (i.e., the one at position zero), was answered correctly.
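Step6 shifts the sequence by one: record i+1 supplies the label for the prediction made from record i. A sketch, with the toy matrix and names assumed for illustration:

```python
# One-hot matrix with 2*l columns: first l = answered correctly,
# last l = answered wrongly. Recover positions and true labels for
# records 1..end, then gather the matching predictions.
l = 3
one_hot = [
    [0, 1, 0, 0, 0, 0],   # skill 1 correct (first record, never a label)
    [0, 0, 0, 1, 0, 0],   # skill 0 wrong
    [0, 0, 1, 0, 0, 0],   # skill 2 correct
]
positions, truths = [], []
for row in one_hot[1:]:                    # skip the first record (Step6.1/6.3)
    hot = row.index(1)
    positions.append(hot % l)              # which knowledge point was answered
    truths.append(1 if hot < l else 0)     # correct iff the 1 is in the first half

# prediction lists produced from records 0..end-1, one row of l probabilities each
preds = [[0.9, 0.2, 0.4], [0.3, 0.8, 0.6]]
picked = [preds[i][p] for i, p in enumerate(positions)]   # Step6.2
```

Only the predicted value for the knowledge point actually answered next is kept; the other entries of each prediction row are discarded.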
Step7: Calculate the prediction error and optimize the network to obtain a model that can predict whether a student's next question will be answered correctly. The specific steps are:
Step7.1: Calculate the loss value with a loss function from the true and predicted values read in Step6.
Step7.2: Optimize the network with an optimizer according to the loss value.
Step7.3: Since a single optimization pass is not enough, the number of epochs is set as required, determining how many times the above steps are repeated.
Step7.4: Obtain a trained model that can predict whether a student's next questions will be answered correctly.
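Step7 compares the gathered predictions to the labels with a loss function; the patent does not name the loss, so binary cross-entropy, the usual choice for DKT-style models, is assumed here.

```python
import math

def bce(preds, truths):
    """Binary cross-entropy between predicted probabilities and 0/1 labels."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(preds, truths)) / len(preds)

# toy values: a better-calibrated model yields a lower loss,
# which is what the optimizer of Step7.2 drives down over the epochs
loss_good = bce([0.9, 0.1], [1, 0])
loss_bad = bce([0.4, 0.6], [1, 0])
```

An optimizer then updates the network weights to reduce this loss, and the whole pass is repeated for the configured number of epochs.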
In the prior art, for time-sequenced knowledge-point answers and the corresponding records of whether each answer was correct, a recurrent neural network is used to predict whether the student's next answer will be correct. Because every prediction covers the student's next-step answers on all knowledge points while the student actually answers only one question next, it was found that the predictions pursue the next knowledge-point answer too strongly and ignore the reasonableness of the predicted values for the other knowledge points, so errors in the future prediction of those knowledge points are aggravated. It was also found that if a student performed well on earlier questions but performs poorly recently, there is most likely a problem with the student's recent learning state, and it will affect the current question-answering performance. The invention therefore reinforces the student's recent learning state with the student's recent learning data, to good effect. The invention also uses residual connections, effectively resolving the problem that the predictions pursue the next knowledge-point answer excessively and ignore the reasonableness of the other knowledge points' predicted values, which would aggravate later prediction errors.
The invention has the following beneficial effects. The invention improves on the Deep Knowledge Tracking (DKT) model, which applies a Recurrent Neural Network (RNN) to a learner's time-ordered record of knowledge-point answers, together with whether each answer was correct, to predict the student's future answering performance. In this method, residual connections are added to the DKT model to alleviate the degradation of network information, and the student's recent learning state is reinforced using the student's recent learning data. This effectively improves the accuracy of the model's predictions and offers a feasible scheme for the development of the knowledge tracking field.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a model schematic of the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
Example 1: As shown in FIG. 2, this example uses the public 2009-2010 ASSISTments dataset. Each student's data are processed and then fed into the constructed model to train it. The overall process is: process the data into a form that can be input to the model; read training data in batches from the processed data; make a preliminary prediction of whether the student answers the next exercise correctly using the recurrent neural network; reinforce the student's recent learning state with the student's recent learning data and make the final prediction; add residual connections on the basis of the recurrent neural network to improve prediction accuracy; read the true and predicted values; and calculate the prediction error and optimize the network to obtain a model that can predict whether the student's next exercise will be answered correctly.
As shown in FIG. 1, the deep knowledge tracking method based on residual connection and fusion of students' recent-learning features specifically includes the following steps.
Step1: Data preprocessing. Each piece of original data is processed into one-hot form so that it can be input directly into the model. The specific steps are:
Step1.1: Data reading and cleaning.
The required features are read from the students' question-answering records to form a new data structure, data. The four required features are the time-sequence number, the student number, the knowledge-point number, and whether the answer was correct. A knowledge point is something the student needs to master; each knowledge point can correspond to several questions that help the student master it, and each knowledge point may appear several times in one student's data.
Step1.2: Count the distinct knowledge points and form a list of knowledge-point ids, K = {k_1, k_2, ..., k_l}. The length of the list is 123. The most important role of K is to fix a position for each knowledge point, so that in the subsequent one-hot matrix the knowledge point can be identified purely from the position of the 1 in the row.
Step1.3: Form a dictionary from the knowledge-point list K of Step1.2, of the form {knowledge-point number: position of the knowledge point in the list formed in Step1.2}; in mathematical notation, dict = {k: e}, where k ∈ K, e ∈ {0, 1, 2, ..., 122}, K is the knowledge-point list generated in Step1.2, and 122 is the length of that list minus 1. This prepares for forming the one-hot encoding later.
Step1.4: Extract each student's knowledge-point list and the corresponding answered-correctly list from data to form a sequence. The specific steps are:
Step1.4.1: Extract all data of each student from data, S_r = {s_r1, s_r2, ..., s_rt}, where S_r denotes the unprocessed data of all students (distinguished from S_h in Step1.5.4), and s_rt is a ragged two-dimensional array (ragged because each student answers a different number of questions) containing all data of the t-th student.
Step1.4.2: From each student's data, extract the student's knowledge-point list S_Ki = {q_ij | 0 ≤ i < b, j ∈ N*} and the corresponding answered-correctly list S_Ai = {a_ij | 0 ≤ i < b, j ∈ N*}, where correct is recorded as 1 and wrong as 0. S_K refers to the students' knowledge-point lists, i to the i-th student, and S_Ki to the i-th student's knowledge-point list; q_ij is the j-th question the i-th student answered, in chronological order; b is the length of the S_r list of Step1.4.1, i.e., the number of students, of which the dataset contains 4163. The range of j depends on how many questions the corresponding student answered (each student answers a different number, so the range of j differs per student). S_A refers to the students' answered-correctly lists, and S_Ai to the i-th student's answered-correctly list.
Step1.4.3: The knowledge-point lists of all students and the corresponding answered-correctly lists form the sequence sequences.
Step1.5: Convert sequences into one-hot form, each one-hot row corresponding to one question a student answered. The specific steps are:
Step1.5.1: Read each student's knowledge-point list (its length is denoted M) and answered-correctly list from sequences.
Step1.5.2: M is the length of the student's knowledge-point list (students answer different numbers of questions, so M varies per student), and L is twice the number of all knowledge points, i.e., L = 246. MAX_STEP is set to 50 as required. If a student's data count M is an integer multiple of 50, the student forms a zero matrix of size M*246; otherwise, if M % 50 = P, the student forms a zero matrix of size M_ch*246, where M_ch is:
M_ch = M + (50 - P) (1)
MAX_STEP serves to unify the shape of the students' data. Because each student answers a different number of questions, the lengths M of the students' knowledge-point lists differ, so the one-hot matrices formed by the students differ, while the data input to the model must all have the same shape. MAX_STEP is therefore set to correct each student's one-hot matrix: the length of each one-hot matrix is made an integer multiple of MAX_STEP, with the shortfall padded with zeros, after which the data are further unified into the input pattern required by the model.
Step1.5.3: Using q_ij (i.e., the j-th question the i-th student answered) from the student knowledge-point list S_Ki = {q_ij | 0 ≤ i < b, j ∈ N*} generated in Step1.4.2, look up, via the dictionary dict formed in Step1.3, the position s of question q_ij in the knowledge-point list formed in Step1.2.
If the corresponding answer in the answered-correctly list S_Ai of Step1.4.2 is correct, the s-th position of the j-th row of the i-th student's zero matrix is set to 1; otherwise the (s+l)-th position of the j-th row is set to 1, where l is the length of the knowledge-point list formed in Step1.2. Repeat until the zero matrices of all students have become the corresponding one-hot matrices.
Step1.5.4: Next, each student's one-hot matrix of size M*246 (or M_ch*246) is reshaped into X*50*246, where X is generated automatically from MAX_STEP and M; that is, one student's one-hot data are divided into several blocks according to MAX_STEP, each block being treated as the one-hot data of one student. Repeating this for all students, the blocks of all students, divided by MAX_STEP, compose the final train_data. The data in train_data can be represented as S_h = {s_h1, s_h2, ..., s_hY}; S_h has shape Y*50*246, where Y is the sum of all the X values, 1 ≤ i ≤ Y, and s_hi = {s_i1, s_i2, ..., s_im}, where s_hi is the i-th block of one-hot student data with shape 50*246 and 1 ≤ m ≤ 50.
The above steps convert the data into a one-hot matrix of shape Y*50*246.
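The padding and splitting of Steps 1.5.2 to 1.5.4 can be sketched as follows, with MAX_STEP = 50 and L = 246 as in the embodiment; the helper names are illustrative.

```python
MAX_STEP, L = 50, 246

def padded_len(M, max_step=MAX_STEP):
    """Equation (1): M_ch = M + (max_step - P) when P = M % max_step != 0."""
    P = M % max_step
    return M if P == 0 else M + (max_step - P)

def split_blocks(matrix, max_step=MAX_STEP, width=L):
    """Pad with zero rows, then cut into X blocks of shape max_step*width."""
    rows = [row[:] for row in matrix]
    rows += [[0] * width for _ in range(padded_len(len(matrix)) - len(matrix))]
    return [rows[i:i + max_step] for i in range(0, len(rows), max_step)]

# a student with M = 123 answered rows becomes M_ch = 150 rows, i.e. X = 3 blocks
blocks = split_blocks([[0] * L for _ in range(123)])
```

The blocks of all students, concatenated, form the Y*50*246 tensor described above.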
Step2: Training data are read in batches from the data processed in Step1. The batch size is set to 64, which determines how many blocks of one-hot student data are loaded at a time; the resulting dimension is 64*50*246.
Step3: Predict whether the student answers the next question correctly using a recurrent neural network, generating the original prediction data. The specific steps are:
The data read in Step2 are fed into the recurrent neural network. Because a student's question-answering data are time-sequenced, i.e., the questions a student answered earlier influence the probability of answering the next question correctly, an RNN, with its memory and parameter-sharing characteristics, is chosen as the basis of the model. input_size is set to 246, the hidden-layer size (hidden_size) to 10, and the number of recurrent layers to 1; the activation function is 'tanh', and the hidden-layer parameters are randomly initialized. h_t, the hidden state at time t, serves both as the output of the recurrent neural network at time t and as the hidden state entering the next cycle:
h_t = tanh(W_x·x_t + b_x + W_h·h_(t-1) + b_h) (2)
where h_t is the hidden state at time t, x_t is the input at time t, and h_(t-1) is the hidden state at time t-1.
Step4: Using the original prediction data generated by the recurrent neural network, reinforce the student's recent learning state with the student's recent learning data and make the final prediction. The specific steps are:
If a student performed well on earlier questions but performs poorly on recent ones, there is most likely a problem with the student's recent learning state, and it will affect the current question-answering performance. The student's recent learning state is judged from the student's past learning data. Experiments show that concatenating the sum of the student's recent learning data gives the best results.
Step4.1: Concatenate the original prediction data generated by Step3 with the sum of the student's recent exercise data to form the new data o_t. The formula is:
o_t = [h_t ; x_(t-n+1) + x_(t-n+2) + ... + x_t]
where o_t is the data obtained by concatenating the original prediction data with the sum of the student's recent exercise data, h_t is the original prediction data, x_i is the student's one-hot exercise data at time i, and n is the number of recent exercise records to be concatenated.
Step4.2: Put the concatenated data o_t into the fully connected layer fc_1 to adjust the prediction dimension. The expression for fc_1 is:
y_t = W_fc1·o_t + b_fc1
where o_t and y_t are the input and output of the linear layer fc_1, W_fc1 is the weight, and b_fc1 is the bias.
Step4.3: Put the output vector of Step4.2 into an activation function that constrains every output element to lie between 0 and 1, giving the final prediction result. The conversion is expressed as:
p_t = 1 / (1 + e^(-y_t))
Step5: Add residual connections on the basis of the recurrent neural network of Step3 to improve prediction accuracy. The specific steps are:
The predictions of the recurrent neural network pursue the result of the next knowledge-point answer too strongly and ignore the reasonableness of the predicted values for the other knowledge points, which aggravates errors in later predictions of those knowledge points. CUT_STEP is therefore set to 10 and used to segment the MAX_STEP rows of a student's one-hot matrix; CUT_STEP is the segment length over which data are aggregated. The 10 pieces of data in a segment are summed, transformed, and added to the hidden layer, and the next cycle begins. This is the residual connection, realized in the following steps:
Step5.1: Each time a student's sequence has passed 10 pieces of data through the recurrent neural network, the previous 10 pieces of data are multiplied by the weights W and summed (because each piece of data has a different importance). Using the residual idea, the previous 10 pieces of data are added according to their importance (weights) to obtain a vector containing the information of those 10 pieces, where the weights are W = {W_1, W_2, ..., W_C} and C is shorthand for CUT_STEP.
Step5.2: The vector containing the previous 10 pieces of data generated in Step5.1 is put, as input, into the fully connected layer fc_2; the input size of fc_2 is 246 and its output is the hidden-layer size 10. The main role of fc_2 is to convert the result of Step5.1 into a form that can be added directly to the hidden layer. The expression for fc_2 is:
z_t = W_fc2·v_t + b_fc2
where v_t and z_t are the input and output of the linear layer fc_2, W_fc2 is the weight, and b_fc2 is the bias.
Step5.3: The output of fc_2 is then added to the hidden layer and the computation continues; after the next 10 pieces of data pass through the RNN, the previous 10 pieces are again summed, and the above steps repeat. The adjusted hidden layer at time t is expressed as:
h'_t = h_t + fc_2( Σ_(j=1..C) W_j·s_d(t-C+j) )
where C is shorthand for CUT_STEP, s_hd is the one-hot data of the d-th student, W denotes the weights assigned to the s_hd participating in the computation, and the summation means that {s_d(t-C+1), ..., s_dt} are multiplied by W and then summed.
Step6: Read the true values of whether the students answered correctly and the model's predicted values. The specific steps are:
Since the dataset gives no explicit labels, the scheme adopted is: the (i+1)-th piece of a student's data is the label for the prediction list generated by the i-th piece. The prediction list generated from the i-th piece predicts the student's next-step performance on all knowledge points, while the (i+1)-th piece is the one-hot encoding of whether the student answered a particular knowledge point correctly, so the (i+1)-th piece can locate a predicted value in the prediction list generated by the i-th piece. The specific steps are:
Step6.1: From the input one-hot matrix, find the positions of all knowledge points the student answered, except the first (i.e., the one at position zero), to form a position list. The first is excluded because the prediction for the second record is generated from the first record, the prediction for the third from the second, and so on; clearly no prediction is generated for the first record, so its true value and predicted value cannot be compared when calculating the loss, and it is therefore removed.
Step6.2: Find the corresponding predicted values from the prediction list according to the position list found in Step6.1, forming a prediction list. (Each prediction covers the probability of the student answering every knowledge point correctly next, but the student answers only one knowledge point next, so only the predicted value of that knowledge point is needed; the other predicted points are discarded.)
Step6.3: From the input one-hot matrix, find the list of true values consisting of whether each knowledge point the student answered, except the first (i.e., the one at position zero), was answered correctly.
Step7: calculating the prediction error and optimizing the network to obtain a model that can predict whether a student will answer the next question correctly, with the following specific steps:
Step7.1: computing a loss value with the loss function from the true values and predicted values read in Step6.
Step7.2: optimizing the network with an optimizer according to the loss value.
Step7.3: setting the number of epochs as required, here Epoch = 70; this determines how many times the above steps are repeated.
Step7.4: obtaining a trained model that can predict whether a student will answer the next question correctly.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, it is not limited to those embodiments, and various changes can be made without departing from its spirit and scope.
Claims (7)
1. A depth knowledge tracking method based on residual connection and student near-state feature fusion, characterized by comprising the following steps:
step1: data preprocessing, namely processing each piece of original data into a one-hot data form;
step2: reading training data in batches from the data processed in Step 1;
step3: predicting whether the student answers the next exercise correctly by using a recurrent neural network to generate original prediction data;
step4: looping over the original prediction data, enhancing the student's recent learning state with the student's recent learning data, and making the final prediction;
step5: adding residual connection on the basis of the recurrent neural network in Step 3;
step6: reading the true values of the students' answers and the predicted values of those answers produced by the model;
step7: calculating the prediction error and optimizing the network to obtain a model capable of predicting whether the student answers the next question correctly.
2. The depth knowledge tracking method based on residual connection and student near-state feature fusion according to claim 1, wherein Step1 is specifically:
step1.1: data reading and cleaning, namely reading required characteristics from question making records of students to form a new data form data, wherein the required characteristics comprise time sequence id, student id, knowledge point id and question making errors;
step1.2: calculating the types of the knowledge points and forming a list K = { K) consisting of knowledge points id 1 ,k 2 ,...,k l };
Step1.3: and forming a dictionary according to the knowledge point list K, wherein the dictionary is in a form of { knowledge point id: the position of a knowledge point in the knowledge point list formed by step1.2, and the mathematical notation is given as fact = { k: e, wherein K belongs to K, K is a knowledge point list formed in Step1.2, and e belongs to {0,1,2., l-1};
step1.4: extracting a knowledge point list of each student and a corresponding list of correct answers from the data to form a sequence;
step1.5: sequences are converted into a one-hot coding form, and data are processed into a unified form, namely each student has MAX _ STEP data, and each one-hot data corresponds to a question made by the student.
3. The depth knowledge tracking method based on residual connection and student near-state feature fusion according to claim 1, wherein Step3 is specifically:
putting the data read in Step2 into the recurrent neural network to generate the original prediction data, specifically:
h_t = tanh(W_x·x_t + b_x + W_h·h_{t-1} + b_h) (1)
wherein h_t is the hidden state at time t, x_t is the input at time t, and h_{t-1} is the hidden state at time t-1.
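Equation (1) is the standard tanh RNN cell; a plain-Python sketch (list-based vectors and illustrative dimensions, not the patent's implementation):

```python
import math

def rnn_step(x_t, h_prev, W_x, b_x, W_h, b_h):
    """One application of equation (1):
    h_t = tanh(W_x·x_t + b_x + W_h·h_{t-1} + b_h).
    W_x is hidden×input, W_h is hidden×hidden; vectors are plain lists."""
    h_t = []
    for j in range(len(h_prev)):
        z = b_x[j] + b_h[j]
        z += sum(W_x[j][i] * x for i, x in enumerate(x_t))    # input term
        z += sum(W_h[j][k] * h for k, h in enumerate(h_prev)) # recurrent term
        h_t.append(math.tanh(z))
    return h_t
```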
4. The depth knowledge tracking method based on residual connection and student near-state feature fusion according to claim 1, wherein Step4 is specifically:
step4.1: splicing the original prediction data generated in Step3 with the sum of the student's recent exercise data to form new data o_t, with the formula:
o_t = [h_t ; Σ_{i=t-n+1}^{t} x_i]
wherein o_t is the data obtained by splicing the original prediction data with the sum of the student's recent exercise data, h_t is the original prediction data, x_i is the student's one-hot exercise record at time i, and n is the number of recent exercise records to be spliced;
step4.2: putting the spliced data o_t into the fully connected layer fc_1 to adjust the prediction dimension, fc_1 being expressed as:
y_fc1 = A_fc1·o_fc1 + b_fc1
wherein o_fc1 and y_fc1 are the input and output of the linear layer fc_1, A_fc1 is a weight, and b_fc1 is a bias;
step4.3: putting the output vector of Step4.2 into an activation function that constrains each element of the output to lie between 0 and 1, obtaining the final prediction result, the numerical conversion being expressed as:
y_t = 1 / (1 + e^(-y_fc1))
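Step4 amounts to concatenate, project, squash; a sketch consistent with the symbol definitions above (the sigmoid and the elementwise summation of the n recent one-hot rows are assumptions, since the original formula images are not available):

```python
import math

def fuse_and_predict(h_t, recent_x, A, b):
    """Step4 sketch: o_t = h_t spliced with the elementwise sum of the n most
    recent one-hot exercise rows (Step4.1); the linear layer with weight A and
    bias b adjusts the dimension (Step4.2); the sigmoid keeps every output
    element between 0 and 1 (Step4.3)."""
    summed = [sum(col) for col in zip(*recent_x)]   # sum of recent exercises
    o_t = list(h_t) + summed                        # splice (concatenate)
    logits = [sum(a * o for a, o in zip(row, o_t)) + b_j
              for row, b_j in zip(A, b)]            # fc_1
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]
```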
5. The depth knowledge tracking method based on residual connection and student near-state feature fusion according to claim 1, wherein Step5 is specifically:
step5.1: every time CUT_STEP records of a student have been computed in the recurrent neural network, multiplying the previous CUT_STEP records by the weight W and adding them; using the residual idea, the previous CUT_STEP records are added according to their importance to obtain a vector containing the information of the previous CUT_STEP records, wherein the weight W = {w_1, w_2, ..., w_C} and C is short for CUT_STEP;
step5.2: putting the vector containing the previous CUT_STEP records' information generated in Step5.1 into the fully connected layer fc_2 as input, converting the result of Step5.1 into a form that can be added directly to the hidden layer, fc_2 being expressed as:
y_fc2 = A_fc2·x_fc2 + b_fc2
wherein x_fc2 and y_fc2 are the input and output of the linear layer fc_2, A_fc2 is a weight, and b_fc2 is a bias;
step5.3: adding the output of fc_2 to the hidden layer and continuing the computation, then running the next CUT_STEP records through the RNN, adding their information in the same way, and repeating these steps; the expression of the hidden layer at time t then becomes:
h_t = tanh(W_x·x_t + b_x + W_h·h_{t-1} + b_h) + y_fc2
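Step5.1's weighted residual over the previous CUT_STEP hidden states can be sketched as follows (names and shapes are illustrative; fc_2 would then project the result back to the hidden size before it is added to the hidden state):

```python
def residual_vector(hidden_states, W):
    """Weight the most recent CUT_STEP hidden states by W = {w_1, ..., w_C}
    and sum them into a single vector carrying their information (Step5.1)."""
    C = len(W)                                   # C is short for CUT_STEP
    acc = [0.0] * len(hidden_states[0])
    for w, h in zip(W, hidden_states[-C:]):      # previous CUT_STEP states
        acc = [a + w * v for a, v in zip(acc, h)]
    return acc
```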
6. The depth knowledge tracking method based on residual connection and student near-state feature fusion according to claim 1, wherein Step6 is specifically:
step6.1: finding out the positions of all knowledge points, except the first knowledge point, made by the student from the input one-hot matrix to form a position list;
step6.2: finding out a corresponding predicted value from the prediction list according to the found position list;
step6.3: and finding out a real value list consisting of whether all knowledge points except the first knowledge point are correct or not from the input one-hot matrix.
7. The depth knowledge tracking method based on residual connection and student near-state feature fusion according to claim 1, wherein Step7 is specifically:
step7.1: calculating a loss value by using a loss function according to the real value and the predicted value read in Step 6;
step7.2: optimizing the network by using an optimizer according to the loss value;
step7.3: defining the number of epochs as required, which determines how many times the above optimization steps are repeated;
step7.4: and obtaining a trained model which can predict whether the next questions of the students are correct or not.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211354947.3A CN115688863A (en) | 2022-11-01 | 2022-11-01 | Depth knowledge tracking method based on residual connection and student near-condition feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115688863A true CN115688863A (en) | 2023-02-03 |
Family
ID=85047127
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211354947.3A Pending CN115688863A (en) | 2022-11-01 | 2022-11-01 | Depth knowledge tracking method based on residual connection and student near-condition feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115688863A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117474094A (en) * | 2023-12-22 | 2024-01-30 | 云南师范大学 | Knowledge tracking method based on fusion domain features of Transformer |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110428010B (en) | Knowledge tracking method | |
CN111460249B (en) | Personalized learning resource recommendation method based on learner preference modeling | |
CN111582694B (en) | Learning evaluation method and device | |
CN113610235B (en) | Adaptive learning support device and method based on depth knowledge tracking | |
CN113344053B (en) | Knowledge tracking method based on examination question different composition representation and learner embedding | |
CN112116092B (en) | Interpretable knowledge level tracking method, system and storage medium | |
CN113033808A (en) | Deep embedded knowledge tracking method based on exercise difficulty and student ability | |
CN110490320B (en) | Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm | |
CN113268611B (en) | Learning path optimization method based on deep knowledge tracking and reinforcement learning | |
CN114817568B (en) | Knowledge hypergraph link prediction method combining attention mechanism and convolutional neural network | |
CN112949929B (en) | Knowledge tracking method and system based on collaborative embedded enhanced topic representation | |
CN114021722A (en) | Attention knowledge tracking method integrating cognitive portrayal | |
CN114299349A (en) | Crowd-sourced image learning method based on multi-expert system and knowledge distillation | |
CN113361791A (en) | Student score prediction method based on graph convolution | |
CN115688863A (en) | Depth knowledge tracking method based on residual connection and student near-condition feature fusion | |
CN115328971A (en) | Knowledge tracking modeling method and system based on double-graph neural network | |
CN114861754A (en) | Knowledge tracking method and system based on external attention mechanism | |
CN115544158A (en) | Multi-knowledge-point dynamic knowledge tracking method applied to intelligent education system | |
CN116992151A (en) | Online course recommendation method based on double-tower graph convolution neural network | |
CN117474094B (en) | Knowledge tracking method based on fusion domain features of Transformer | |
CN117349362A (en) | Dynamic knowledge cognitive hierarchy mining method, system, equipment and terminal | |
CN117011098A (en) | Prediction method for learning ability of students based on MKVMN model | |
CN116777695A (en) | Time sequence convolution knowledge tracking method for fusion project reaction | |
CN114943278A (en) | Continuous online group incentive method and device based on reinforcement learning and storage medium | |
CN113989080B (en) | Learner representation method and system based on deep knowledge-project joint tracking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||