CN114911975A - Knowledge tracking method based on graph attention network - Google Patents

Knowledge tracking method based on graph attention network

Info

Publication number
CN114911975A
Authority
CN
China
Prior art keywords
knowledge
student
graph
obtaining
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210479195.7A
Other languages
Chinese (zh)
Other versions
CN114911975B (en)
Inventor
张井合
李林昊
李英双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinhua Hangda Beidou Application Technology Co ltd
Original Assignee
Jinhua Hangda Beidou Application Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinhua Hangda Beidou Application Technology Co., Ltd.
Priority to CN202210479195.7A
Publication of CN114911975A
Application granted
Publication of CN114911975B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a knowledge tracking method based on a graph attention network, and belongs to the technical field of knowledge tracking. The knowledge tracking method based on the graph attention network comprises the following steps: S1: constructing a knowledge graph; S2: obtaining the input of the model; S3: obtaining an embedded representation of each exercise; S4: obtaining the final node representations; S5: acquiring the knowledge state of a student at time t; S6: acquiring the prediction probability of the student; S7: constructing a joint training objective. Aiming at the defects of existing knowledge tracking methods, the invention studies how to effectively mine deep-level relation information in a knowledge graph: it constructs the knowledge graph from the exercises and the knowledge points they contain, creatively applies the graph attention network to the knowledge tracking field, assigns different weights to different adjacent nodes, and learns the embedded representation of exercises from deep-level features; an LSTM then captures the change of the students' knowledge state during answering, so that the students' future answering performance can be accurately predicted.

Description

Knowledge tracking method based on graph attention network
Technical Field
The invention relates to the technical field of knowledge tracking, in particular to a knowledge tracking method based on a graph attention network.
Background
In intelligent tutoring systems and large-scale open online course platforms such as MOOC platforms, Udemy, and Lynda, knowledge tracking is an indispensable task: it aims to capture the change of a student's knowledge state during answering and to predict the student's future answering performance. Research on knowledge tracking is of great significance for realizing intelligent education and accomplishing personalized education tasks. Specifically, after a learner on an online platform studies a knowledge point, the platform provides corresponding exercises to verify whether the learner has fully mastered it. The knowledge tracking task feeds the sequence of exercises the learner has already answered into a model for training; the model captures the change of the learner's knowledge mastery over the sequence, and when a new exercise arrives, it predicts whether the learner can answer it correctly according to the learner's mastery of the knowledge relevant to that exercise.
Existing KT methods typically build prediction models that take one-hot encodings of exercises or knowledge points as input. Item Response Theory (IRT), a classical psychometric approach to modeling student performance, is also a traditional knowledge tracking method. The deep knowledge tracking model (DKT) introduced deep learning into the knowledge tracking field and greatly improved performance over traditional methods, and subsequent scholars have proposed various novel frameworks based on deep learning. DKVMN uses a static key matrix to store the embeddings of knowledge points and a dynamic value matrix to store and update the learner's mastery of them. The convolutional knowledge tracking model (CKT) considers differences in students' prior knowledge and learning rates and introduces a convolutional neural network to model student individuality. In recent years, deep-learning-based knowledge tracking models such as EERNN, EKT, and SKVMN have been a research focus. Deep models show strong performance thanks to their feature extraction capability; in a time-series task in particular, the knowledge state used for prediction can be obtained from each simple input through complex transformations. However, existing knowledge tracking methods have the following defects:
(1) Existing knowledge tracking methods often cannot effectively mine the deep-level information in exercises and knowledge points, that is, the relations between exercises, between exercises and knowledge points, and between knowledge points; as a result, models based on exercises or knowledge points alone struggle to show their advantages in the knowledge tracking task.
(2) Existing knowledge tracking methods can capture a student's knowledge state, but they do not make full use of the student's deep-level information when predicting future answering performance, so the models cannot achieve good results in the knowledge tracking task.
Disclosure of Invention
The present invention is directed to a knowledge tracking method based on a graph attention network, which is provided to overcome the above-mentioned shortcomings of the prior art.
The invention provides a knowledge tracking method based on a graph attention network, which comprises the following steps:
s1: based on exercise collection
Figure BDA0003626901180000021
Set of knowledge points
Figure BDA0003626901180000022
Constructing a knowledge graph G ═ t i ,t j ,b ij }
Wherein b is ij ∈{0,1},t i ∈{q j ,c m },j∈|Q|,m∈|C|;
S2: based on the exercise set Q = {q_1, q_2, ..., q_|Q|}, obtain the model input x_t;
S3: from the one-hot encoding of each exercise, obtain its embedded representation e_t by matrix multiplication;
S4: apply a self-attention mechanism to all graph nodes and, based on the knowledge graph G = {t_i, t_j, b_ij}, obtain the relation coefficients v_ij between nodes; GAT applies a multi-head attention layer to the weight computation of each central node and, based on v_ij, obtains the final node representation t̃_i;
S5: use an LSTM to capture the change of the student's knowledge state over time and, based on the input x_t, acquire the student's knowledge state h_t at time t;
S6: from the exercise's embedded representation, obtain a difficulty representation vector d_t through a difficulty analysis layer; in the IRT model, obtain the student ability θ based on the knowledge state h_t at time t, the item discrimination coefficient α based on the input x_t, and the final exercise difficulty β based on d_t; then obtain the prediction probability p_t that the student answers the current exercise correctly, based on α, θ, and β;
S7: based on the knowledge graph, evaluate the relations between graph nodes by inner products, b̂_ij = σ(t̃_i · t̃_j), and evaluate the embedded representations of the graph nodes by their local proximity; based on b̂_ij and b_ij, obtain a first loss function L_1 = −Σ_{b_ij ∈ b} (b_ij · log b̂_ij + (1 − b_ij) · log(1 − b̂_ij)); use the cross-entropy loss between the prediction probability p_t of answering the current exercise and the true result r_t as the objective function and, based on p_t and r_t, obtain a second loss function L_2 = −Σ_t (r_t · log p_t + (1 − r_t) · log(1 − p_t)); construct the joint training objective L = λ_1 L_1 + λ_2 L_2, where λ_1, λ_2 are trade-off coefficients controlling the local proximity loss of the graph nodes and the student performance prediction loss.
Further, the exercise set Q = {q_1, q_2, ..., q_|Q|} and the knowledge point set C = {c_1, c_2, ..., c_|C|} are merged to obtain a set T = {t_1, t_2, ..., t_|T|}, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|; based on T, the knowledge graph G = {t_i, t_j, b_ij} is constructed, where b_ij ∈ {0, 1}: b_ij = 1 indicates that an edge exists between node i and node j, and b_ij = 0 indicates that it does not.
Further, step S2 specifically includes: at time t, the model input x_t consists of two parts: the first part is the exercise q_t of dimension N, and the second part is r_t. If the student answers q_t correctly then r_t = 1, otherwise r_t = 0. If the student answers correctly, r_{t,1} is concatenated after q_t; if wrongly, r_{t,0} is concatenated after q_t, i.e. x_t = [q_t ⊕ r_{t,1}] for a correct answer and x_t = [q_t ⊕ r_{t,0}] for a wrong one, where r_{t,1} and r_{t,0} are both vectors of dimension N: in r_{t,1} the position corresponding to the exercise number is 1 and the rest are 0, while r_{t,0} is the all-zero vector; x_t is therefore a vector of dimension 2N.
Further, step S3 includes: the embedded representation of the exercise is e_t = W_e · q_t, where W_e is a trainable parameter matrix and N represents the embedding dimension.
Further, step S4 includes: a self-attention mechanism is applied to all graph nodes, a shared weight matrix W_t is applied to adjacent nodes, and the nonlinear activation function LeakyReLU then yields the relation coefficient v_ij between nodes: v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j]), where || denotes the concatenation operation and · denotes the inner product operation.
Further, step S4 also includes: a normalization operation on v_ij yields the attention weights of the nodes, a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k ∈ N_i} exp(v_ik), where softmax denotes the activation function, N_i denotes the set of adjacent nodes of node t_i, and a_ij denotes the attention weight between node t_i and its adjacent node t_j. GAT applies a multi-head attention layer to the weight computation of each central node: K independent attention mechanisms yield K different attention surfaces, and a nonlinear activation function then yields the final node representation t̃_i = σ((1/K) Σ_{h=1..K} Σ_{j ∈ N_i} a_ij^h · W^h · t_j^h), where K denotes the number of attention heads, σ denotes the sigmoid activation function, and t_j^h denotes the node representation at the h-th head.
Further, step S5 specifically includes: the LSTM acquires the student's knowledge state h_t at time t through a forget gate f_t, an input gate i_t, an output gate o_t, and a cell state C_t. The forget gate simulates the student's knowledge forgetting rate during learning: at time t, f_t = σ(W_f · [h_{t−1}, x_t] + b_f), where h_{t−1} is the student's knowledge state at time t−1 and W_f, b_f are trainable parameters. The input gate simulates the process by which the student updates his knowledge state when facing an exercise: i_t = σ(W_i · [h_{t−1}, x_t] + b_i) represents the simulated learning of new knowledge, C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C) represents the simulated change in the degree of mastery of old knowledge, and C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t combines the two into a new cell state, where tanh represents the hyperbolic tangent activation function, W_i, b_i, W_C, b_C are trainable parameters, C_{t−1} represents the cell state at time t−1, and ⊙ denotes element-wise multiplication of vectors. The output gate simulates the change of the student's knowledge state given the currently learned knowledge and the forgetting of historical knowledge; at time t, o_t = σ(W_o · [h_{t−1}, x_t] + b_o) and h_t = o_t ⊙ tanh(C_t), where W_o, b_o are trainable parameters, h_{t−1} represents the student's knowledge state at time t−1, and σ(·) denotes the sigmoid activation function.
Further, step S6 includes: the student ability is obtained by θ = tanh(W_θ · h_t + b_θ) and the item discrimination coefficient by α = σ(W_α · x_t + b_α), where σ and tanh both denote activation functions and W_θ, b_θ, W_α, b_α are trainable parameters.
Further, step S6 also includes: a difficulty representation vector d_t is obtained through the difficulty analysis layer, d_t = W_d · e_t + b_d, where e_t is the embedded representation of the exercise and W_d, b_d are trainable parameters; a matrix transformation is applied to d_t and the final exercise difficulty is obtained through the tanh activation function, β = tanh(W_β · d_t + b_β), where W_β, b_β are trainable parameters; the prediction probability that the student answers the current exercise correctly is then p_t = σ(W_p · [α(θ − β)] + b_p), where W_p, b_p are trainable parameters.
Further, step S7 includes: the relations between graph nodes are evaluated by inner products, b̂_ij = σ(t̃_i · t̃_j), which converts the weights between nodes into values in [0, 1], where i, j ∈ [1, ..., |T|] and σ denotes the sigmoid activation function.
The knowledge tracking method based on the graph attention network has the following beneficial effects:
the invention establishes a new knowledge tracking method based on a graph attention network. In the method, a graph attention network is used for capturing node representation containing deep level information, the relation of problem-problem, knowledge point-knowledge point and problem-knowledge point is captured based on a constructed knowledge graph, a new problem representation containing the deep level information is obtained, an LSTM network is used for simulating the change of the knowledge state of a student in the process of answering a problem sequence, and finally the future performance of the student is predicted from the three aspects of the capability of the student, the difficulty of the problem and the item distinction degree by combining with the improved IRT. Experimental analysis is carried out on 6 public data sets, and the knowledge tracking method based on the graph attention network is proved to be capable of remarkably improving the accuracy rate of a knowledge tracking task; aiming at the defects of the existing knowledge tracking method, the method researches how to effectively mine deep-level relation information in a knowledge graph, constructs the knowledge graph based on exercises and knowledge points contained in the exercises, creatively applies the graph attention network to the knowledge tracking field, distributes different weights for different adjacent nodes, learns the embedded representation of the exercises from deep-level characteristics, then uses LSTM to capture the knowledge state change of students in the exercise answering process, and further accurately predicts the future exercise answering performance of the students.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention. In the drawings, like reference numerals are used to indicate like elements. The drawings in the following description are directed to some, but not all embodiments of the invention. For a person skilled in the art, other figures can be derived from these figures without inventive effort.
FIG. 1 is a schematic diagram of a knowledge tracking method based on a graph attention network according to the present invention;
FIG. 2 is a schematic diagram of an LSTM long-short term memory neural network in the knowledge tracking method based on the graph attention network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
Please refer to fig. 1-2. The embodiment of the invention provides a knowledge tracking method based on a graph attention network, which comprises the following steps:
s1: based on exercise collection
Figure BDA0003626901180000071
Set of knowledge points
Figure BDA0003626901180000072
Constructing a knowledge graph G ═ t i ,t j ,b ij }
Wherein b is ij ∈{0,1},t i ∈{q j ,c m },j∈|Q|,m∈|C|;
S2: based on the exercise set Q = {q_1, q_2, ..., q_|Q|}, obtain the model input x_t;
S3: from the one-hot encoding of each exercise, obtain its embedded representation e_t by matrix multiplication;
S4: apply a self-attention mechanism to all graph nodes and, based on the knowledge graph G = {t_i, t_j, b_ij}, obtain the relation coefficients v_ij between nodes; GAT applies a multi-head attention layer to the weight computation of each central node and, based on v_ij, obtains the final node representation t̃_i;
S5: use an LSTM to capture the change of the student's knowledge state over time and, based on the input x_t, acquire the student's knowledge state h_t at time t;
S6: from the exercise's embedded representation, obtain a difficulty representation vector d_t through a difficulty analysis layer; in the IRT model, obtain the student ability θ based on the knowledge state h_t at time t, the item discrimination coefficient α based on the input x_t, and the final exercise difficulty β based on d_t; then obtain the prediction probability p_t that the student answers the current exercise correctly, based on α, θ, and β;
S7: based on the knowledge graph, evaluate the relations between graph nodes by inner products, b̂_ij = σ(t̃_i · t̃_j), and evaluate the embedded representations of the graph nodes by their local proximity; based on b̂_ij and b_ij, obtain a first loss function L_1 = −Σ_{b_ij ∈ b} (b_ij · log b̂_ij + (1 − b_ij) · log(1 − b̂_ij)); use the cross-entropy loss between the prediction probability p_t of answering the current exercise and the true result r_t as the objective function and, based on p_t and r_t, obtain a second loss function L_2 = −Σ_t (r_t · log p_t + (1 − r_t) · log(1 − p_t)); construct the joint training objective L = λ_1 L_1 + λ_2 L_2, where λ_1, λ_2 are trade-off coefficients controlling the local proximity loss of the graph nodes and the student performance prediction loss.
The invention aims to provide a knowledge tracking method and system based on a graph attention network that improve the performance of knowledge tracking and can better help learners make personalized plans. Aiming at the defects of existing knowledge tracking methods, the invention studies how to effectively mine deep-level relation information in a knowledge graph: it constructs the knowledge graph from the exercises and the knowledge points they contain and, inspired by the graph attention network (GAT), assigns different weights to different adjacent nodes to learn the embedded representation of exercises from deep-level features. In addition, the invention introduces and improves the cognitive-psychology theory IRT, predicting the students' future performance from the three aspects of exercise difficulty, student ability, and item discrimination, further improving model performance.
The exercises in the original data set are first preprocessed per student; the numbers of all exercises and knowledge points in the data set are then counted, and each exercise and knowledge point is numbered to facilitate model training. For each student, the answer sequence is recorded; each answer sequence consists of three lines of data: the first line gives the number of exercises the student answered, the second line the numbers of the answered exercises, and the third line whether the student answered each exercise correctly, 1 for a correct answer and 0 for a wrong one.
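For illustration, the sketch below reads this three-line-per-student format; it is a minimal interpretation of the description above, and the comma separator, file layout, and all names are assumptions rather than details given in the patent.

```python
# Minimal sketch: parse the three-line-per-student answer sequences described
# above. The function name, file path, and comma separator are hypothetical.
def load_answer_sequences(path):
    """Yield (exercise_ids, results) per student; results are 1 (correct) or 0 (wrong)."""
    with open(path) as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    for i in range(0, len(lines), 3):
        n = int(lines[i])                                           # line 1: number of answers
        exercises = [int(v) for v in lines[i + 1].split(",")][:n]   # line 2: exercise numbers
        results = [int(v) for v in lines[i + 2].split(",")][:n]     # line 3: correctness flags
        yield exercises, results
```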
Step S1 specifically includes: the exercise set Q = {q_1, q_2, ..., q_|Q|} and the knowledge point set C = {c_1, c_2, ..., c_|C|} are merged to obtain a set T = {t_1, t_2, ..., t_|T|}, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|; based on T, the knowledge graph G = {t_i, t_j, b_ij} is constructed, where b_ij ∈ {0, 1}: b_ij = 1 indicates that an edge exists between node i and node j, and b_ij = 0 indicates that it does not.
In a real online education scenario, students usually face multiple exercises and multiple knowledge points in the process of answering questions, wherein one exercise may include multiple knowledge points, and one knowledge point may correspond to multiple exercises. Besides, there is also an association relationship between the exercises and knowledge points.
The knowledge graph is established to explore the relations between exercises, between knowledge points, and between exercises and knowledge points. Suppose S = {s_1, s_2, ..., s_|S|} is a student set of length |S|, Q = {q_1, q_2, ..., q_|Q|} is an exercise set of length |Q|, and C = {c_1, c_2, ..., c_|C|} is a knowledge point set of length |C|. Each student s_i independently completes exercises in Q, and the knowledge points contained in any q_i belong to the knowledge point set C. Considering the size of the data sets and for simplicity, the exercise set and the knowledge point set are merged into a set T = {t_1, t_2, ..., t_|T|}, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|. A knowledge graph G = {t_i, t_j, b_ij} is then defined, where b_ij ∈ {0, 1}: b_ij = 1 indicates that an edge exists between node i and node j, and b_ij = 0 that it does not. The adjacency matrix of the knowledge graph is defined as b ∈ R^{|T|×|T|}.
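A minimal sketch of this construction is given below, assuming each exercise is annotated with the knowledge points it contains; the mapping q_to_c and all names are hypothetical, and exercises take node ids 0..|Q|−1 while knowledge points take |Q|..|Q|+|C|−1.

```python
import numpy as np

# Minimal sketch: adjacency matrix b of the knowledge graph G = {t_i, t_j, b_ij}.
def build_adjacency(num_q, num_c, q_to_c):
    t = num_q + num_c                          # |T| = |Q| + |C|
    b = np.zeros((t, t), dtype=np.float32)
    for j, kps in q_to_c.items():              # q_to_c: exercise id -> knowledge point ids
        for m in kps:
            b[j, num_q + m] = 1.0              # edge between exercise q_j and knowledge point c_m
            b[num_q + m, j] = 1.0              # the graph is undirected
    return b
```

Exercise-exercise and knowledge point-knowledge point relations then become reachable as second-order neighborhoods through shared nodes, which is what the stacked attention layers described later exploit.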
Step S2 specifically includes: at time t, the model input x_t consists of two parts: the first part is the exercise q_t of dimension N, and the second part is r_t. If the student answers q_t correctly then r_t = 1, otherwise r_t = 0. If the student answers correctly, r_{t,1} is concatenated after q_t; if wrongly, r_{t,0} is concatenated after q_t, i.e. x_t = [q_t ⊕ r_{t,1}] for a correct answer and x_t = [q_t ⊕ r_{t,0}] for a wrong one, where r_{t,1} and r_{t,0} are both vectors of dimension N: in r_{t,1} the position corresponding to the exercise number is 1 and the rest are 0, while r_{t,0} is the all-zero vector; x_t is therefore a vector of dimension 2N.
Relevant elements are extracted from the raw data and the input to the model is constructed. The first step has already collected the answer sequence of each student, but the information expressed by the numbers alone is limited. Therefore, one-hot encoding is used to process each exercise and its answer result. At time t, the model input x_t consists of two parts. The first part is the exercise q_t of dimension N, covering N different exercises: it is a one-hot code in which only the position corresponding to the exercise number is 1 and the rest are 0. The second part is r_t: if student s_i answers exercise q_t correctly, then r_t = 1, otherwise r_t = 0. For better training, r_t is converted into a representation of the same form as q_t, i.e. a one-hot encoding. If the student answers correctly, r_{t,1} is concatenated after q_t; if wrongly, r_{t,0} is concatenated: x_t = [q_t ⊕ r_{t,1}] for a correct answer and x_t = [q_t ⊕ r_{t,0}] for a wrong one, where r_{t,1} and r_{t,0} are both vectors of dimension N; in r_{t,1} the position corresponding to the exercise number is 1 and the rest are 0, while r_{t,0} is the all-zero vector. Thus x_t is a vector of dimension 2N.
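The 2N-dimensional input can be encoded exactly as described; the following is a minimal NumPy sketch in which the function and argument names are assumptions.

```python
import numpy as np

# Minimal sketch: build x_t = [q_t ⊕ r_{t,1}] or [q_t ⊕ r_{t,0}].
def encode_input(q_id, correct, num_q):
    q_t = np.zeros(num_q, dtype=np.float32)
    q_t[q_id] = 1.0                      # one-hot exercise code q_t
    r_vec = np.zeros(num_q, dtype=np.float32)
    if correct:
        r_vec[q_id] = 1.0                # r_{t,1}: 1 at the exercise position
    # for a wrong answer, r_vec stays r_{t,0}, the all-zero vector
    return np.concatenate([q_t, r_vec])  # x_t, a vector of dimension 2N
```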
Step S3 includes: the embedded representation of the exercise is e_t = W_e · q_t, where W_e is a trainable parameter matrix and N represents the embedding dimension.
The input data of the knowledge tracking task comprises exercises and the corresponding student performance, while the deep-level information contained in exercises and knowledge points is independent of student performance. Therefore, the method extracts the exercise part of the student's historical answer sequence and obtains the embedded representation of each exercise by matrix multiplication on its one-hot code. The embedding of the exercise is represented as e_t = W_e · q_t, where W_e is a trainable parameter, N represents the embedding dimension, and e_t represents the exercise after embedding.
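Read as a trainable matrix multiplication, the embedding step can be sketched as follows (PyTorch; sizes and symbol names follow the reconstruction above and are assumptions). Since q_t is one-hot, the product simply selects one row of W_e, so an embedding-table lookup is equivalent.

```python
import torch
import torch.nn as nn

# Minimal sketch: e_t = W_e · q_t on a one-hot exercise code.
num_q = 100                                            # N exercises, embedding dimension N
W_e = nn.Parameter(torch.randn(num_q, num_q) * 0.01)   # trainable embedding matrix
q_t = torch.zeros(num_q)
q_t[7] = 1.0                                           # one-hot code of exercise no. 7
e_t = q_t @ W_e                                        # embedded representation of the exercise
```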
Step S4 includes: a self-attention mechanism is applied to all graph nodes, a shared weight matrix W_t is applied to adjacent nodes, and the nonlinear activation function LeakyReLU then yields the relation coefficient v_ij between nodes: v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j]), where || denotes the concatenation operation and · denotes the inner product operation.
Step S4 further includes: a normalization operation on v_ij yields the attention weights of the nodes, a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k ∈ N_i} exp(v_ik), where softmax denotes the activation function, N_i denotes the set of adjacent nodes of node t_i, and a_ij denotes the attention weight between node t_i and its adjacent node t_j. GAT applies a multi-head attention layer to the weight computation of each central node: K independent attention mechanisms yield K different attention surfaces, and a nonlinear activation function then yields the final node representation t̃_i = σ((1/K) Σ_{h=1..K} Σ_{j ∈ N_i} a_ij^h · W^h · t_j^h), where K denotes the number of attention heads, σ denotes the sigmoid activation function, and t_j^h denotes the node representation at the h-th head.
Given the knowledge graph G = {t_i, t_j, b_ij}, to compute the attention coefficients between nodes, a self-attention mechanism is first used on all graph nodes; a shared weight matrix W_t is then applied to adjacent nodes, and the nonlinear activation function LeakyReLU yields the relation coefficient v_ij between nodes: v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j]), where || denotes the concatenation operation and · denotes the inner product operation. A normalization operation on v_ij yields the attention weight of the node: a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k ∈ N_i} exp(v_ik), where softmax denotes the activation function, N_i denotes the set of adjacent nodes of node t_i, and a_ij denotes the attention weight between node t_i and its adjacent node t_j. In addition, to learn the attention weights stably, GAT applies a multi-head attention layer to the weight computation of each central node: K independent attention mechanisms yield K different attention surfaces, and a nonlinear activation function then yields the final node representation t̃_i. The present invention sets 8 independent attention mechanisms. The specific process is t̃_i = σ((1/K) Σ_{h=1..K} Σ_{j ∈ N_i} a_ij^h · W^h · t_j^h), where K denotes the number of attention heads, σ denotes the sigmoid activation function, and t_j^h denotes the node representation at the h-th head. Furthermore, to obtain comprehensive node representations, the invention stacks 3 graph attention layers for training: after 3 graph attention layers, each central node obtains information from its 3rd-order neighborhood.
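A minimal single-layer sketch of this attention computation is shown below (PyTorch, dense adjacency assumed to include self-loops so every node attends to itself; head averaging followed by a sigmoid mirrors the formula above, while the class and variable names are assumptions).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of one multi-head graph attention layer: relation coefficients
# v_ij via LeakyReLU, softmax over the neighborhood N_i, average of K heads, σ.
class GraphAttentionLayer(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.W_t = nn.ModuleList(nn.Linear(dim, dim, bias=False) for _ in range(heads))
        self.attn = nn.ModuleList(nn.Linear(2 * dim, 1, bias=False) for _ in range(heads))

    def forward(self, t, adj):                          # t: [|T|, dim]; adj: [|T|, |T|]
        outs = []
        for W, a in zip(self.W_t, self.attn):
            h = W(t)                                    # W_t · t_i for every node
            n = h.size(0)
            pairs = torch.cat([h.repeat_interleave(n, dim=0),   # [W_t·t_i || W_t·t_j]
                               h.repeat(n, 1)], dim=-1)
            v = F.leaky_relu(a(pairs)).view(n, n)       # relation coefficients v_ij
            v = v.masked_fill(adj == 0, float("-inf"))  # attend only to neighbors N_i
            alpha = F.softmax(v, dim=-1)                # attention weights a_ij
            outs.append(alpha @ h)
        return torch.sigmoid(torch.stack(outs).mean(dim=0))  # average K heads, then σ
```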
Step S5 specifically includes: the LSTM acquires the student's knowledge state h_t at time t through a forget gate f_t, an input gate i_t, an output gate o_t, and a cell state C_t. The forget gate simulates the student's knowledge forgetting rate during learning: at time t, f_t = σ(W_f · [h_{t−1}, x_t] + b_f), where h_{t−1} is the student's knowledge state at time t−1 and W_f, b_f are trainable parameters. The input gate simulates the process by which the student updates his knowledge state when facing an exercise: i_t = σ(W_i · [h_{t−1}, x_t] + b_i) represents the simulated learning of new knowledge, C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C) represents the simulated change in the degree of mastery of old knowledge, and C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t combines the two into a new cell state, where tanh represents the hyperbolic tangent activation function, W_i, b_i, W_C, b_C are trainable parameters, C_{t−1} represents the cell state at time t−1, and ⊙ denotes element-wise multiplication of vectors. The output gate simulates the change of the student's knowledge state given the currently learned knowledge and the forgetting of historical knowledge; at time t, o_t = σ(W_o · [h_{t−1}, x_t] + b_o) and h_t = o_t ⊙ tanh(C_t), where W_o, b_o are trainable parameters, h_{t−1} represents the student's knowledge state at time t−1, and σ(·) denotes the sigmoid activation function.
An LSTM is employed to capture the student's knowledge state over time; it exhibits excellent performance in the deep knowledge tracking task. The LSTM acquires the student's knowledge state h_t through three gates and one state: the forget gate f_t, the input gate i_t, the output gate o_t, and the cell state C_t. The cell state C_t passes historical information to each cell, addressing the difficulty RNNs have in capturing long-term dependencies.
Forget gate: in a real scenario, students gradually forget some previously learned knowledge over time. The forget gate uses a scalar in [0, 1] to simulate the student's knowledge forgetting rate during learning; the formula at time t is f_t = σ(W_f · [h_{t−1}, x_t] + b_f).
an input gate: the input gate simulates the process of a student updating the knowledge state of the student when facing a problem, wherein the process comprises learning of new knowledge and reviewing of old knowledge. i.e. i t A learning process is shown that models new knowledge,
Figure BDA0003626901180000131
representing simulated old knowledge learning processDegree of change, C t Indicating that the two are combined to form a new cell state. This process receives new input and updates the current cell state. The formula at time t is as follows:
i t =σ(W i ·[h t-1 ,x t ]+b i )
Figure BDA0003626901180000132
Figure BDA0003626901180000133
where tanh represents the hyperbolic tangent activation function,
Figure BDA0003626901180000134
Figure BDA0003626901180000135
are trainable parameters. C t-1 Represents the state of the cells at time t-1,
Figure BDA0003626901180000136
the representative vector is multiplied by the corresponding elements.
Output gate: the output gate simulates the change of the student's knowledge state according to the currently learned knowledge and the forgetting of historical knowledge, and outputs the student's current knowledge state h_t. The formulas at time t are: o_t = σ(W_o · [h_{t−1}, x_t] + b_o); h_t = o_t ⊙ tanh(C_t), where W_o, b_o are trainable parameters, h_{t−1} represents the student's knowledge state at time t−1, and σ(·) denotes the sigmoid activation function.
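For concreteness, one step of these gate equations can be sketched as follows; the parameter dictionary p and the shapes are assumptions, and in practice torch.nn.LSTM implements the same recurrences.

```python
import torch

# Minimal sketch of the gate equations above for one time step.
def lstm_step(x_t, h_prev, c_prev, p):
    z = torch.cat([h_prev, x_t], dim=-1)              # [h_{t-1}, x_t]
    f_t = torch.sigmoid(p["W_f"] @ z + p["b_f"])      # forget gate: knowledge forgetting
    i_t = torch.sigmoid(p["W_i"] @ z + p["b_i"])      # input gate: learning new knowledge
    c_tilde = torch.tanh(p["W_c"] @ z + p["b_c"])     # change in mastery of old knowledge
    c_t = f_t * c_prev + i_t * c_tilde                # new cell state C_t
    o_t = torch.sigmoid(p["W_o"] @ z + p["b_o"])      # output gate
    h_t = o_t * torch.tanh(c_t)                       # knowledge state h_t at time t
    return h_t, c_t
```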
Step S6 includes: the student ability is obtained by θ = tanh(W_θ · h_t + b_θ) and the item discrimination coefficient by α = σ(W_α · x_t + b_α), where σ and tanh both denote activation functions and W_θ, b_θ, W_α, b_α are trainable parameters.
Step S6 further includes: a difficulty representation vector d_t is obtained through the difficulty analysis layer, d_t = W_d · e_t + b_d, where e_t is the embedded representation of the exercise and W_d, b_d are trainable parameters; a matrix transformation is applied to d_t and the final exercise difficulty is obtained through the tanh activation function, β = tanh(W_β · d_t + b_β), where W_β, b_β are trainable parameters; the prediction probability that the student answers the current exercise correctly is then p_t = σ(W_p · [α(θ − β)] + b_p), where W_p, b_p are trainable parameters.
IRT is a classical theory in psychometrics that can quantify student ability and has shown strong performance in knowledge tracking. The IRT model adopted by the invention mainly contains three parameters: the item discrimination coefficient α, the student ability θ, and the exercise difficulty β. In the knowledge tracking task, the student's ability is based on the student's knowledge state: the better the knowledge state, the stronger the ability. The item discrimination is closely related to the exercise itself and to the students' answering situation: if students' answers differ greatly on a certain exercise, that exercise distinguishes student ability well. The invention defines the student ability θ and the item discrimination coefficient α as θ = tanh(W_θ · h_t + b_θ) and α = σ(W_α · x_t + b_α), where σ and tanh both represent activation functions, W_θ, b_θ, W_α, b_α are trainable parameters, and h_t represents the student's knowledge state at time t.
Since the difficulty of an exercise is related only to the exercise itself, the model first obtains a difficulty representation vector d_t through the difficulty analysis layer, then applies a matrix transformation to d_t and obtains the final exercise difficulty β through the tanh activation function: d_t = W_d · e_t + b_d and β = tanh(W_β · d_t + b_β).
the invention expands each parameter to the multidimensional space to represent each parameter from a plurality of different characteristics so as to predict the future answer performance of students more accurately. The prediction process is as follows:
p t =σ(W p ·[α(θ-β)]+b p )
wherein
Figure BDA0003626901180000153
Are all trainable parameters, p t ∈[0,1]Representing the student at time tThe probability of answering the problem correctly. This model is defined as p t ∈[0,0.5]And if so, judging that the student answers wrongly, and otherwise, judging that the student answers correctly.
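The prediction layer described above can be sketched as a small module (PyTorch; the latent size k and all class and attribute names are assumptions, not symbols from the patent).

```python
import torch
import torch.nn as nn

# Minimal sketch: θ from h_t, α from x_t, β from the exercise embedding e_t
# via the difficulty layer, then p_t = σ(W_p · [α(θ − β)] + b_p).
class IRTPredictor(nn.Module):
    def __init__(self, hidden_dim, input_dim, emb_dim, k=32):
        super().__init__()
        self.ability = nn.Linear(hidden_dim, k)   # θ = tanh(W_θ·h_t + b_θ)
        self.discrim = nn.Linear(input_dim, k)    # α = σ(W_α·x_t + b_α)
        self.diff = nn.Linear(emb_dim, k)         # difficulty analysis layer: d_t
        self.beta = nn.Linear(k, k)               # β = tanh(W_β·d_t + b_β)
        self.out = nn.Linear(k, 1)                # p_t, the prediction probability

    def forward(self, h_t, x_t, e_t):
        theta = torch.tanh(self.ability(h_t))
        alpha = torch.sigmoid(self.discrim(x_t))
        beta = torch.tanh(self.beta(self.diff(e_t)))
        return torch.sigmoid(self.out(alpha * (theta - beta)))
```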
Step S7 includes: the relations between graph nodes are evaluated by inner products, b̂_ij = σ(t̃_i · t̃_j), which converts the weights between nodes into values in [0, 1], where i, j ∈ [1, ..., |T|] and σ denotes the sigmoid activation function.
To train the model better, the invention establishes a joint training framework so that the model parameters reach their optimum. The relations between graph nodes are evaluated by inner products, b̂_ij = σ(t̃_i · t̃_j), where i, j ∈ [1, ..., |T|] and σ represents the sigmoid activation function; this converts the weights between nodes into values in [0, 1]. To make the learned node representations closer to the true result, the invention defines the local proximity of graph nodes to evaluate their embedded representations, with the loss function L_1 = −Σ_{b_ij ∈ b} (b_ij · log b̂_ij + (1 − b_ij) · log(1 − b̂_ij)).
in order to enable the model to more accurately model the knowledge state of the student and further predict the future answer performance of the student, the invention uses the prediction probability p of the student for answering the current exercise t With the true result r t The cross entropy loss function between them is used as the objective function, and the specific loss function is defined as follows:
Figure BDA0003626901180000162
then, aiming at the training targets of the two aspects, the invention constructs a joint training target which is specifically expressed as follows:
L=λ 1 L 12 L 2
wherein λ 1 ,λ 2 And (3) representing a weighing coefficient for controlling the local proximity loss of the nodes in the graph and the performance prediction loss of the students.
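As a sketch, the two losses and the joint objective combine as below (PyTorch); λ_1 = λ_2 = 0.5 is a placeholder choice, not a value given in the patent.

```python
import torch.nn.functional as F

# Minimal sketch of the joint objective L = λ_1·L_1 + λ_2·L_2 described above.
# b and r_t are expected as float tensors with values in {0, 1}.
def joint_loss(b_hat, b, p_t, r_t, lam1=0.5, lam2=0.5):
    l1 = F.binary_cross_entropy(b_hat, b)     # L_1: local proximity of graph nodes
    l2 = F.binary_cross_entropy(p_t, r_t)     # L_2: student performance prediction
    return lam1 * l1 + lam2 * l2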
The invention carried out a large number of experiments to search for suitable hyperparameters. Specifically, each data set is randomly split into an 80% training set and a 20% test set, the latter used to evaluate model performance and to stop training early. All experiments use 5-fold cross-validation, and the performance of every model is the average of 5 runs. The model is trained with the Adam optimizer; the maximum number of training epochs is set to 200, the maximum gradient norm for clipping to 5.0, and the learning rate range to [0.001, 0.01]. The weight matrices and biases in the network are initialized from a normal distribution with mean 0 and standard deviation 0.01. The training batch size is generally set to 64 but depends on the size of the data set; for example, the ASSIST2012 data set is large, so its batch size is set to 32. To prevent overfitting, a Dropout layer with parameter 0.5 is added to the model during training.
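Under the stated settings, the training loop can be sketched as follows; the tiny linear model and random batches are placeholders for the assembled network and the real data pipeline, and a Dropout layer with rate 0.5 would sit inside the real model.

```python
import torch

# Minimal, self-contained sketch of the stated training setup.
model = torch.nn.Linear(20, 1)                                          # placeholder network
data = [(torch.randn(64, 20), torch.rand(64, 1)) for _ in range(10)]    # batch size 64
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)              # lr chosen from [0.001, 0.01]
for epoch in range(200):                                                # at most 200 training epochs
    for x, y in data:
        optimizer.zero_grad()
        loss = torch.nn.functional.binary_cross_entropy(torch.sigmoid(model(x)), y)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)         # max gradient norm 5.0
        optimizer.step()
```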
To verify the advantages of the invention in solving the knowledge tracking task, this embodiment carried out experiments on 6 public data sets: ASS09-up, ASSIST2012, Statics2011, Synthetic, AICFE-math, and AICFE-phy. Four recent knowledge tracking models are compared: the hidden-Markov-based Bayesian knowledge tracking model (BKT), the factorization-based knowledge tracing machine (KTM), the deep knowledge tracking model (DKT), and the dynamic key-value memory network model (DKVMN). The AUC (Area Under Curve) metric measures the results: it is the area enclosed under the ROC curve and the coordinate axes, and the closer its value is to 1, the better the model and the closer it is to reality. Table 1 shows the comparison of the 5 methods (the 4 baselines and the proposed method) on the 6 public data sets; the results show that the knowledge tracking model based on the graph attention network proposed by the invention is significantly superior to the prior art.
Table 1: Performance comparison of the 5 methods on the 6 data sets
The above-described aspects may be implemented individually or in various combinations, and such variations are within the scope of the present invention.
It should be noted that in the description of the present application, the terms "upper end", "lower end" and "bottom end" indicating the orientation or positional relationship are based on the orientation or positional relationship shown in the drawings, or the orientation or positional relationship which the product of the application is conventionally placed in use, and are used only for convenience of describing the present application and for simplicity of description, and do not indicate or imply that the device referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Finally, it should be noted that: the above examples are only for illustrating the technical solutions of the present invention, and are not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A knowledge tracking method based on a graph attention network is characterized by comprising the following steps:
s1: based on exercise collection
Q = {q_1, q_2, ..., q_|Q|} and knowledge point set C = {c_1, c_2, ..., c_|C|}, constructing a knowledge graph G = {t_i, t_j, b_ij}, where b_ij ∈ {0, 1}, t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|;
S2: based on the exercise set Q = {q_1, q_2, ..., q_|Q|}, obtaining the model input x_t;
S3: from the one-hot encoding of each exercise, obtaining its embedded representation e_t by matrix multiplication;
S4: applying a self-attention mechanism to all graph nodes and, based on the knowledge graph G = {t_i, t_j, b_ij}, obtaining the relation coefficients v_ij between nodes; GAT applies a multi-head attention layer to the weight computation of each central node and, based on v_ij, obtains the final node representation t̃_i;
S5: using an LSTM to capture the change of the student's knowledge state over time and, based on the input x_t, acquiring the student's knowledge state h_t at time t;
S6: from the exercise's embedded representation, obtaining a difficulty representation vector d_t through a difficulty analysis layer; in the IRT model, obtaining the student ability θ based on the knowledge state h_t at time t, the item discrimination coefficient α based on the input x_t, and the final exercise difficulty β based on d_t; then obtaining the prediction probability p_t that the student answers the current exercise correctly, based on the item discrimination coefficient α, the student ability θ, and the final exercise difficulty β;
S7: based on the knowledge graph, evaluating the relations between graph nodes by inner products, b̂_ij = σ(t̃_i · t̃_j), and evaluating the embedded representations of the graph nodes by their local proximity; based on b̂_ij and b_ij, obtaining a first loss function L_1 = −Σ_{b_ij ∈ b} (b_ij · log b̂_ij + (1 − b_ij) · log(1 − b̂_ij)); using the cross-entropy loss between the prediction probability p_t of answering the current exercise and the true result r_t as the objective function and, based on p_t and r_t, obtaining a second loss function L_2 = −Σ_t (r_t · log p_t + (1 − r_t) · log(1 − p_t)); and constructing a joint training objective L = λ_1 L_1 + λ_2 L_2, where λ_1, λ_2 represent trade-off coefficients controlling the local proximity loss of the graph nodes and the student performance prediction loss.
2. The knowledge tracking method based on a graph attention network as claimed in claim 1, wherein step S1 specifically includes: merging the exercise set Q = {q_1, q_2, ..., q_|Q|} and the knowledge point set C = {c_1, c_2, ..., c_|C|} to obtain a set T = {t_1, t_2, ..., t_|T|}, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|; and, based on T, constructing the knowledge graph G = {t_i, t_j, b_ij}, where b_ij ∈ {0, 1}: b_ij = 1 indicates that an edge exists between node i and node j, and b_ij = 0 indicates that it does not.
3. The knowledge tracking method based on a graph attention network as claimed in claim 2, wherein step S2 specifically includes: at time t, the model input x_t consists of two parts: the first part is the exercise q_t of dimension N, and the second part is r_t; if the student answers q_t correctly then r_t = 1, otherwise r_t = 0; if the student answers correctly, r_{t,1} is concatenated after q_t, and if wrongly, r_{t,0} is concatenated after q_t, i.e. x_t = [q_t ⊕ r_{t,1}] for a correct answer and x_t = [q_t ⊕ r_{t,0}] for a wrong one, where r_{t,1} and r_{t,0} are both vectors of dimension N: in r_{t,1} the position corresponding to the exercise number is 1 and the rest are 0, while r_{t,0} is the all-zero vector; x_t is therefore a vector of dimension 2N.
4. The knowledge tracking method based on a graph attention network as claimed in claim 2 or 3, wherein step S3 includes: the embedded representation of the exercise is e_t = W_e · q_t, where W_e is a trainable parameter matrix and N represents the embedding dimension.
5. The knowledge tracking method based on a graph attention network as claimed in claim 2 or 3, wherein step S4 includes: a self-attention mechanism is applied to all graph nodes, a shared weight matrix W_t is applied to adjacent nodes, and the nonlinear activation function LeakyReLU then yields the relation coefficient v_ij between nodes: v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j]), where || denotes the concatenation operation and · denotes the inner product operation.
6. The knowledge tracking method based on a graph attention network as claimed in claim 5, wherein step S4 further includes: a normalization operation on v_ij yields the attention weights of the nodes, a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k ∈ N_i} exp(v_ik), where softmax denotes the activation function, N_i denotes the set of adjacent nodes of node t_i, and a_ij denotes the attention weight between node t_i and its adjacent node t_j; GAT applies a multi-head attention layer to the weight computation of each central node: K independent attention mechanisms yield K different attention surfaces, and a nonlinear activation function then yields the final node representation t̃_i = σ((1/K) Σ_{h=1..K} Σ_{j ∈ N_i} a_ij^h · W^h · t_j^h), where K denotes the number of attention heads, σ denotes the sigmoid activation function, and t_j^h denotes the node representation at the h-th head.
7. The knowledge tracking method based on a graph attention network as claimed in claim 2 or 3, wherein step S5 specifically includes: the LSTM acquires the student's knowledge state h_t at time t through a forget gate f_t, an input gate i_t, an output gate o_t, and a cell state C_t; the forget gate simulates the student's knowledge forgetting rate during learning, f_t = σ(W_f · [h_{t−1}, x_t] + b_f), where h_{t−1} is the student's knowledge state at time t−1 and W_f, b_f are trainable parameters; the input gate simulates the process by which the student updates his knowledge state when facing an exercise, where i_t = σ(W_i · [h_{t−1}, x_t] + b_i) represents the simulated learning of new knowledge, C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C) represents the simulated change in the degree of mastery of old knowledge, and C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t combines the two into a new cell state, where tanh represents the hyperbolic tangent activation function, W_i, b_i, W_C, b_C are trainable parameters, C_{t−1} represents the cell state at time t−1, and ⊙ denotes element-wise multiplication of vectors; the output gate simulates the change of the student's knowledge state given the currently learned knowledge and the forgetting of historical knowledge, with o_t = σ(W_o · [h_{t−1}, x_t] + b_o) and h_t = o_t ⊙ tanh(C_t) at time t, where W_o, b_o are trainable parameters, h_{t−1} represents the student's knowledge state at time t−1, and σ(·) denotes the sigmoid activation function.
8. The knowledge tracking method based on a graph attention network as claimed in claim 2 or 3, wherein step S6 includes: the student ability is obtained by θ = tanh(W_θ · h_t + b_θ) and the item discrimination coefficient by α = σ(W_α · x_t + b_α), where σ and tanh both denote activation functions and W_θ, b_θ, W_α, b_α are trainable parameters.
9. The knowledge tracking method based on a graph attention network as claimed in claim 8, wherein step S6 further includes: a difficulty representation vector d_t is obtained through the difficulty analysis layer, d_t = W_d · e_t + b_d, where e_t is the embedded representation of the exercise and W_d, b_d are trainable parameters; a matrix transformation is applied to d_t and the final exercise difficulty is obtained through the tanh activation function, β = tanh(W_β · d_t + b_β), where W_β, b_β are trainable parameters; the prediction probability that the student answers the current exercise correctly is then obtained as p_t = σ(W_p · [α(θ − β)] + b_p), where W_p, b_p are trainable parameters.
10. The knowledge tracking method based on a graph attention network as claimed in claim 2 or 3, wherein step S7 includes: the relations between graph nodes are evaluated by inner products, b̂_ij = σ(t̃_i · t̃_j), which converts the weights between nodes into values in [0, 1], where i, j ∈ [1, ..., |T|] and σ denotes the sigmoid activation function.
CN202210479195.7A 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network Active CN114911975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210479195.7A CN114911975B (en) 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network


Publications (2)

Publication Number Publication Date
CN114911975A true CN114911975A (en) 2022-08-16
CN114911975B (en) 2024-04-05

Family

ID=82765831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210479195.7A Active CN114911975B (en) 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network

Country Status (1)

Country Link
CN (1) CN114911975B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210390873A1 (en) * 2020-06-11 2021-12-16 Act, Inc. Deep knowledge tracing with transformers
CN112116092A (en) * 2020-08-11 2020-12-22 浙江师范大学 Interpretable knowledge level tracking method, system and storage medium
CN113033808A (en) * 2021-03-08 2021-06-25 西北大学 Deep embedded knowledge tracking method based on exercise difficulty and student ability
CN114021722A (en) * 2021-10-30 2022-02-08 华中师范大学 Attention knowledge tracking method integrating cognitive portrayal
CN114385801A (en) * 2021-12-27 2022-04-22 河北工业大学 Knowledge tracking method and system based on hierarchical refinement LSTM network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马骁睿; 徐圆; 朱群雄: "A personalized exercise recommendation method combining deep knowledge tracing" (一种结合深度知识追踪的个性化习题推荐方法), Journal of Chinese Computer Systems (小型微型计算机系统), no. 05, 15 May 2020 (2020-05-15) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116151329A (en) * 2023-04-23 2023-05-23 山东师范大学 Student knowledge state tracking method and system based on inverse fact graph learning
CN116151329B (en) * 2023-04-23 2023-07-18 山东师范大学 Student knowledge state tracking method and system based on inverse fact graph learning
CN116166998A (en) * 2023-04-25 2023-05-26 合肥师范学院 Student performance prediction method combining global and local features
CN116976434A (en) * 2023-07-05 2023-10-31 长江大学 Knowledge point diffusion representation-based knowledge tracking method and storage medium
CN116976434B (en) * 2023-07-05 2024-02-20 长江大学 Knowledge point diffusion representation-based knowledge tracking method and storage medium
CN117077737A (en) * 2023-08-22 2023-11-17 长江大学 Knowledge tracking system for dynamic collaboration of knowledge points
CN117077737B (en) * 2023-08-22 2024-03-15 长江大学 Knowledge tracking system for dynamic collaboration of knowledge points
CN117057422A (en) * 2023-08-23 2023-11-14 长江大学 Knowledge tracking system for global knowledge convergence sensing
CN117057422B (en) * 2023-08-23 2024-04-02 长江大学 Knowledge tracking system for global knowledge convergence sensing

Also Published As

Publication number Publication date
CN114911975B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
CN114911975A (en) Knowledge tracking method based on graph attention network
CN108257052B (en) Online student knowledge assessment method and system
CN110148318A A kind of digital assistant system, information interacting method and information processing method
CN110321361A (en) Examination question based on improved LSTM neural network model recommends determination method
CN113033808A (en) Deep embedded knowledge tracking method based on exercise difficulty and student ability
CN113344053B (en) Knowledge tracking method based on examination question different composition representation and learner embedding
CN109840595A (en) A kind of knowledge method for tracing based on group study behavior feature
Ge et al. A teaching quality evaluation model for preschool teachers based on deep learning
CN114201684A (en) Knowledge graph-based adaptive learning resource recommendation method and system
CN115545160B (en) Knowledge tracking method and system for multi-learning behavior collaboration
CN114385801A (en) Knowledge tracking method and system based on hierarchical refinement LSTM network
CN114021722A (en) Attention knowledge tracking method integrating cognitive portrayal
CN115455186A (en) Learning situation analysis method based on multiple models
CN109559576A (en) A kind of children companion robot and its early teaching system self-learning method
Sha et al. Neural knowledge tracing
CN113378581B (en) Knowledge tracking method and system based on multivariate concept attention model
CN114117033B (en) Knowledge tracking method and system
CN114997461B (en) Time-sensitive answer correctness prediction method combining learning and forgetting
Jin et al. An adaptive BP neural network model for teaching quality evaluation in colleges and universities
CN114742292A (en) Knowledge tracking process-oriented two-state co-evolution method for predicting future performance of students
CN113360669B (en) Knowledge tracking method based on gating graph convolution time sequence neural network
CN114925218A (en) Learner knowledge cognitive structure dynamic mining method based on adaptive graph
Zhang et al. Neural Attentive Knowledge Tracing Model for Student Performance Prediction
Du et al. The innovation of ideological and political education integrating artificial intelligence big data with the support of wireless network
CN113935869A (en) Student subjective and objective factor combined score prediction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant