CN114266340A - Knowledge query network model introducing self-attention mechanism - Google Patents
Knowledge query network model introducing self-attention mechanism
- Publication number
- CN114266340A
- Authority
- CN
- China
- Prior art keywords
- knowledge
- model
- self
- student
- attention mechanism
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
To address the shortcomings of the knowledge query network and the self-attentive knowledge tracking model, a knowledge query network model introducing a self-attention mechanism is designed. The invention aims to combine the respective advantages of the knowledge query network and the self-attention mechanism: introducing self-attention into the knowledge query network preserves the model's ability to model sequences while strengthening the associations between different positions of a single sequence when computing its representation, so that more accurate internal key features of a student's historical answer records can be obtained. In addition, a regularization term corresponding to the reconstruction error is added to the model's loss function to enhance the consistency of model predictions and thereby address the existing reconstruction error.
Description
Technical Field
The invention belongs to the field of intelligent education and is applied to a knowledge tracking task.
Background
First, terminology: 1. Knowledge Tracking (KT): modeling a learner's answer records to obtain the learner's knowledge state and to predict the probability that the learner answers the next question correctly.
2. Knowledge Query Network (KQN): researchers proposed the knowledge query network in 2019 to solve the knowledge tracking task. It uses neural networks to encode the student's historical interaction sequence up to the current time step, and the KCs contained in the question at the next time step, into a knowledge state vector and a skill vector of the same dimension, and then defines the interaction between the student's knowledge state and the KC as the dot product of the two vectors.
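The dot-product query step described above can be sketched as follows (a minimal illustration; the function name and the toy vectors are made up for the example and are not taken from the patent):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def knowledge_query(knowledge_state, skill_vector):
    """Predict P(correct answer) as the sigmoid of the dot product between
    the d-dimensional knowledge state vector and skill vector."""
    return sigmoid(np.dot(knowledge_state, skill_vector))

# Toy vectors: when knowledge state and skill are well aligned,
# the predicted probability exceeds 0.5.
ks = np.array([0.5, 0.8, -0.1])
sk = np.array([0.6, 0.7, 0.0])
p = knowledge_query(ks, sk)
```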
3. Long short-term memory network (LSTM): the LSTM model is an improvement proposed by researchers to address the vanishing- and exploding-gradient problems of RNNs. "Gates" are added to the original RNN model to control the flow of information, which avoids gradient vanishing and explosion to a certain extent and allows long-range dependencies in a sequence to be captured.
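The gating just described can be illustrated with a single LSTM step (a sketch only; the stacked parameter layout and shapes are assumptions for the example, not the patent's implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: the forget (f), input (i), and output (o) gates
    control how much information flows through the cell state c, whose
    additive update path is what mitigates vanishing/exploding gradients.
    W, U, b stack the parameters of all four transforms."""
    z = W @ x + U @ h_prev + b          # [4d] pre-activations
    d = h_prev.shape[0]
    f = sigmoid(z[0*d:1*d])             # forget gate
    i = sigmoid(z[1*d:2*d])             # input gate
    o = sigmoid(z[2*d:3*d])             # output gate
    g = np.tanh(z[3*d:4*d])             # candidate cell update
    c = f * c_prev + i * g              # gated cell state (additive path)
    h = o * np.tanh(c)                  # new hidden state
    return h, c

rng = np.random.default_rng(1)
dx, dh = 3, 4
h, c = lstm_step(rng.normal(size=dx), np.zeros(dh), np.zeros(dh),
                 rng.normal(size=(4 * dh, dx)), rng.normal(size=(4 * dh, dh)),
                 np.zeros(4 * dh))
```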
4. Self-attention mechanism: derived from studies of human vision. In cognitive science, because of information-processing bottlenecks, humans selectively attend to part of the available information while ignoring the rest. This idea was later applied to image processing and natural language processing with good results, and recently the self-attention mechanism has been introduced into the knowledge tracking task, where it aims to better focus on the parts of the learning history sequence that matter most for prediction.
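The mechanism can be sketched as scaled dot-product self-attention over a sequence of hidden states (a minimal illustration; the projection matrices and shapes are assumptions for the example):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H, Wq, Wk, Wv):
    """Scaled dot-product self-attention over hidden states H of shape
    [T, d]: every position is re-expressed as a relevance-weighted
    combination of all positions in the same sequence."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # [T, T] pairwise position scores
    A = softmax(scores, axis=-1)        # attention weights, rows sum to 1
    return A @ V

rng = np.random.default_rng(0)
T, d = 5, 4
H = rng.normal(size=(T, d))
W = [rng.normal(size=(d, d)) for _ in range(3)]
out = self_attention(H, *W)
```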
5. Reconstruction error: one of the major problems of the knowledge query network model. When a student correctly answers a question involving a certain skill, the model's predicted probability that the student can answer a question involving that skill at the current time step decreases, and vice versa.
6. Knowledge Component (KC): a KC can be broadly understood as a knowledge point, a knowledge concept, a principle, a fact, or a skill.
Secondly, the prior art: 1. (1) Bayesian Knowledge Tracking (BKT): the BKT model represents the student's knowledge state as a set of latent binary variables and updates them with a Hidden Markov Model (HMM) based on observable variables such as whether the student answers questions correctly. Although BKT and its extensions have been very successful in the KT domain, they still have significant problems. First, representing a learner's knowledge state as a set of binary variables does not match real-world learning processes; second, because BKT models each KC separately, it can capture neither the relationships between different KCs nor undefined KCs. (2) Deep Knowledge Tracking (DKT): in 2015 the DKT model introduced deep neural networks into the knowledge tracking task for the first time, using an LSTM to model student sequences and achieving good results, but its interpretability has always been questioned. (3) Knowledge Query Network (KQN): it encodes the student's historical interaction sequence up to the current time step, and the KCs contained in the next time step's question, into a knowledge state vector and a skill vector of the same dimension using neural networks, then defines the interaction between the student's knowledge state and the KC as the dot product of the two vectors; like DKT, it suffers from the reconstruction error.
2. Self-attentive knowledge tracking (SAKT): it uses a Transformer structure in the KT field to replace the RNN used by the original DKT model, which alleviates the RNN's long-term dependence problem and greatly improves prediction performance; however, SAKT loses the RNN's ability to model sequences.
Thirdly, the technical problems are as follows: 1. Although the knowledge query network improves the interpretability of the interaction between students and KCs to some extent, its prediction performance is inferior to self-attentive knowledge tracking, because the long-term dependence problem of the LSTM limits its performance; moreover, like DKT, it suffers from reconstruction errors. 2. Self-attentive knowledge tracking uses the more advanced Transformer structure but loses the RNN's sequence-modeling ability; since a student's learning is continuous, the model's sequence-modeling ability cannot be neglected.
Disclosure of Invention
1. To address the shortcomings of the knowledge query network and the self-attentive knowledge tracking model, the invention aims to combine their advantages: a self-attention mechanism is introduced into the knowledge query network model to obtain more accurate internal key features of the student's historical interaction sequence while retaining the recurrent modeling capacity of the long short-term memory network; a regularization term is also introduced into the loss function to enhance the consistency of model predictions and address the reconstruction error of the knowledge query network;
2. The technical innovations of the invention are as follows: (1) a deep knowledge tracking model is proposed that adds a self-attention mechanism to the knowledge query network; the positional information provided by the long short-term memory network is used to model the ordering of the student's historical interaction sequence, preserving the model's sequence-modeling capability, while the self-attention mechanism relates different positions of a single sequence when computing its representation, yielding more accurate internal key features of the student's historical answer records; fusing the advantages of the two improves prediction performance. (2) A regularization term corresponding to the reconstruction problem is introduced into the model's loss function to enhance prediction consistency and thereby address the reconstruction error of KQN;
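A loss with such a reconstruction regularizer can be sketched as follows (a hypothetical form: binary cross-entropy on the next answer plus a weighted cross-entropy term on the answer just observed; the weight `lam` and the exact form are assumptions, not the patent's specification):

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy between predicted probabilities and labels."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

def regularized_loss(pred_next, y_next, pred_current, y_current, lam=0.1):
    """Prediction loss on the next answer plus a reconstruction term that
    penalizes the model for contradicting the answer it has just observed
    (the lambda weight and this exact form are assumptions)."""
    return bce(pred_next, y_next) + lam * bce(pred_current, y_current)

loss = regularized_loss(
    np.array([0.8, 0.3]), np.array([1.0, 0.0]),   # next-step predictions
    np.array([0.9, 0.2]), np.array([1.0, 0.0]))   # current-step predictions
```

The added term pushes the model's current-step prediction toward consistency with the observed answer, which is exactly the behavior the reconstruction error violates.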
drawings
FIG. 1 is a diagram of a knowledge query network architecture incorporating a self-attention mechanism.
Detailed Description
The accompanying drawing shows the structure of the model of the invention at time t. The model consists of three parts: a knowledge state encoder, a skill encoder, and a knowledge state query. At time t, the knowledge state encoder first feeds the student's historical interaction tuple x_t into the LSTM layer to obtain the hidden state h_t, then feeds h_t into the attention layer to obtain a_t, and finally transforms a_t into the d-dimensional knowledge state vector KS_t. The skill encoder embeds the skill q_(t+1) contained in the next time step t+1 into a skill vector S_(t+1), also of dimension d, through a multilayer perceptron (MLP). The two vectors are then passed to the knowledge state query component, which describes the interaction between the student's knowledge state and the KC contained in the question as the dot product of the two vectors; finally, the dot product is passed through the sigmoid function to obtain the predicted probability that the student at the current time step can correctly answer the question at the next time step.
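Putting the three components together, the forward pass at time t can be sketched as below (a simplified stand-in: a plain tanh recurrence replaces the LSTM, the skill MLP is a single layer, and all weights are random placeholders; none of this is the patent's actual parameterization):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class KQNWithAttention:
    """Sketch of the forward pass: interactions -> recurrent hidden
    states h_t -> attention output a_t -> knowledge state KS_t; next
    skill q_(t+1) -> MLP -> skill vector S_(t+1); dot product ->
    sigmoid -> prediction."""
    def __init__(self, n_inter, n_skill, d, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.normal(0, 0.1, (n_inter, d))   # interaction embedding
        self.Wh = rng.normal(0, 0.1, (d, d))         # recurrence (LSTM stand-in)
        self.Wq = rng.normal(0, 0.1, (d, d))
        self.Wk = rng.normal(0, 0.1, (d, d))
        self.Wv = rng.normal(0, 0.1, (d, d))
        self.Wo = rng.normal(0, 0.1, (d, d))         # maps a_t to KS_t
        self.Ws = rng.normal(0, 0.1, (n_skill, d))   # one-layer skill "MLP"

    def forward(self, interactions, next_skill):
        d = self.Wh.shape[0]
        h, H = np.zeros(d), []
        for x in interactions:                       # interaction ids
            h = np.tanh(self.Wx[x] + h @ self.Wh)    # recurrent hidden state
            H.append(h)
        H = np.stack(H)
        Q, K, V = H @ self.Wq, H @ self.Wk, H @ self.Wv
        A = softmax(Q @ K.T / np.sqrt(d)) @ V        # attention over positions
        ks = np.tanh(A[-1] @ self.Wo)                # knowledge state KS_t
        s = np.tanh(self.Ws[next_skill])             # skill vector S_(t+1)
        return sigmoid(ks @ s)                       # P(correct at t+1)

model = KQNWithAttention(n_inter=20, n_skill=10, d=8)
p = model.forward([3, 7, 12], next_skill=4)
```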
Claims (5)
1. A deep knowledge tracking model of a knowledge query network introducing a self-attention mechanism, characterized in that: the knowledge state encoder first uses the positional information provided by a long short-term memory network to model the ordering of the student's interaction sequence; a self-attention mechanism then relates different positions of the single sequence when computing its representation, obtaining more accurate internal key features of the student's historical answer records, and the output of the attention layer is encoded into a knowledge state vector. Finally, the model takes the dot product of the skill vector produced by the skill encoder's multilayer perceptron and the knowledge state vector produced by the knowledge state encoder to simulate the interaction between the knowledge state and the knowledge point, and feeds the result into a sigmoid function to obtain the probability that the student answers the next question correctly. A regularization term corresponding to the reconstruction problem is introduced into the loss function to enhance the consistency of model predictions and thereby address the reconstruction error.
2. The knowledge query network model introducing a self-attention mechanism as claimed in claim 1, wherein: the knowledge state encoder retains the long short-term memory network's ability to model sequences while using the self-attention mechanism to automatically focus on the answer records in the student's historical interaction sequence that have the greatest influence on the prediction, extracting more accurate features of the student's knowledge state.
3. The knowledge query network model introducing a self-attention mechanism as claimed in claim 1, wherein: the regularization term added for the reconstruction problem regularizes the original model by taking into account, in addition to the prediction loss, the loss on the interaction between the current student knowledge state and the skill.
4. The knowledge query network model introducing a self-attention mechanism as claimed in claim 1, wherein: the student interaction sequence input to the knowledge state encoder and the skills input to the skill encoder are encoded as one-hot vectors.
5. The knowledge query network model introducing a self-attention mechanism as claimed in claim 1, wherein: taking the dot product of the knowledge state vector output by the knowledge state encoder and the skill vector output by the skill encoder matches the real-world situation in which students answer questions based on their own knowledge state and the question itself.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111560167.XA CN114266340A (en) | 2021-12-20 | 2021-12-20 | Knowledge query network model introducing self-attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114266340A true CN114266340A (en) | 2022-04-01 |
Family
ID=80827932
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111560167.XA Pending CN114266340A (en) | 2021-12-20 | 2021-12-20 | Knowledge query network model introducing self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114266340A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116341990A (en) * | 2023-05-29 | 2023-06-27 | 中交第四航务工程勘察设计院有限公司 | Knowledge management evaluation method and system for infrastructure engineering |
CN116341990B (en) * | 2023-05-29 | 2023-08-04 | 中交第四航务工程勘察设计院有限公司 | Knowledge management evaluation method and system for infrastructure engineering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||