CN114911975B - Knowledge tracking method based on graph attention network - Google Patents

Knowledge tracking method based on graph attention network

Info

Publication number
CN114911975B
CN114911975B
Authority
CN
China
Prior art keywords
knowledge
graph
student
obtaining
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210479195.7A
Other languages
Chinese (zh)
Other versions
CN114911975A (en
Inventor
张井合
李林昊
李英双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinhua Hangda Beidou Application Technology Co ltd
Original Assignee
Jinhua Hangda Beidou Application Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinhua Hangda Beidou Application Technology Co ltd filed Critical Jinhua Hangda Beidou Application Technology Co ltd
Priority to CN202210479195.7A priority Critical patent/CN114911975B/en
Publication of CN114911975A publication Critical patent/CN114911975A/en
Application granted granted Critical
Publication of CN114911975B publication Critical patent/CN114911975B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a knowledge tracking method based on a graph attention network, and belongs to the technical field of knowledge tracking. The knowledge tracking method based on the graph attention network comprises the following steps: S1: constructing a knowledge graph; S2: acquiring the input of the model; S3: obtaining an embedded representation of the problem; S4: obtaining a final node representation; S5: acquiring the knowledge state of a student at time t; S6: obtaining the prediction probability for the student; S7: constructing a joint training objective. Aiming at the shortcomings of existing knowledge tracking methods, the invention studies how to effectively mine the deep relational information in a knowledge graph: it constructs a knowledge graph based on problems and the knowledge points they contain, creatively applies the graph attention network to the knowledge tracking field, assigns different weights to different adjacent nodes, learns embedded representations of problems from deep features, and then uses an LSTM to capture the changes in students' knowledge states during the answering process, thereby accurately predicting students' future answering performance.

Description

Knowledge tracking method based on graph attention network
Technical Field
The invention relates to the technical field of knowledge tracking, in particular to a knowledge tracking method based on a graph attention network.
Background
In MOOC, Udemy, Lynda, and other intelligent teaching systems and large-scale open online course platforms, knowledge tracking is an indispensable task: it aims to capture the changes in students' knowledge states during the answering process and to predict students' future answering performance. Research on knowledge tracking is of great significance for realizing intelligent education and completing personalized tutoring tasks. Specifically, after a learner on an online platform studies a knowledge point, the platform may present corresponding problems to check whether the learner has fully grasped that knowledge point. The knowledge tracking task feeds the sequence of problems the learner has answered on the platform into a model for training; the model captures the changes in the learner's degree of knowledge mastery over the sequence, and when a new problem arrives, the model predicts whether the learner can answer it correctly according to the learner's mastery of the knowledge related to that problem.
Existing KT methods typically construct predictive models that take the one-hot encodings of problems or knowledge points as input. Item response theory (IRT) integrates methods from cognitive science and psychometrics for studying student performance and is a traditional knowledge tracking method. The deep knowledge tracing model (DKT) introduced deep learning into the knowledge tracking field and offers a significant performance improvement over traditional approaches. Subsequent scholars have proposed a variety of novel frameworks based on deep learning methods. DKVMN uses a static key matrix to store knowledge point embeddings and a dynamic value matrix to store and update the learner's mastery of knowledge points. The convolutional knowledge tracing model (CKT) takes into account differences in students' prior knowledge and learning rates, introducing a convolutional neural network to model student personalization. In addition, knowledge tracking models based on deep learning have become a research hotspot in recent years, for example EERNN, EKT, and SKVMN. Deep models show strong performance thanks to their powerful feature extraction capability, especially on time-series tasks: each simple input to the model is transformed through complex changes into a knowledge state used for prediction. However, existing knowledge tracking methods have the following disadvantages:
(1) Existing knowledge tracking research methods often cannot effectively mine the deep information in problems and knowledge points, that is, they cannot mine the problem-problem, problem-knowledge point, and knowledge point-knowledge point relations, so models based on problems or knowledge points are difficult to advance in knowledge tracking tasks.
(2) Existing knowledge tracking methods can capture students' knowledge states but cannot fully utilize deep information when predicting students' future answering performance, so the models cannot obtain good results in knowledge tracking tasks.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a knowledge tracking method based on a graph attention network.
The invention provides a knowledge tracking method based on a graph attention network, which comprises the following steps:
S1: Based on the problem set Q and the knowledge point set C, construct a knowledge graph G = {t_i, t_j, b_ij}, where b_ij ∈ {0,1}, t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|;
S2: Based on the problem set Q, acquire the input x_t of the model;
S3: Multiply the one-hot encoding of each problem by a matrix to obtain the embedded representation e_t of the problem;
S4: Apply a self-attention mechanism to all graph nodes and, based on the knowledge graph G = {t_i, t_j, b_ij}, obtain the relationship coefficient v_ij between nodes; the GAT applies multi-head attention layers to the weight calculation of each central node and, based on the relationship coefficient v_ij, obtains the final node representation t_i';
S5: Use an LSTM to obtain how the student's knowledge state changes over time and, based on the input x_t, acquire the student's knowledge state h_t at time t;
S6: Based on the embedded representation of the problem, obtain the difficulty representation vector d_t through a difficulty analysis layer; in the IRT model, obtain the student ability θ based on the student's knowledge state h_t at time t, acquire the item discrimination coefficient α based on the input x_t, and obtain the final problem difficulty β based on the difficulty representation vector d_t; based on the item discrimination coefficient α, the student ability θ, and the final problem difficulty β, obtain the prediction probability p_t that the student answers the current problem correctly;
S7: Based on the knowledge graph, use inner products to evaluate the relationships b̂_ij between graph nodes and evaluate the embedded representations of the graph nodes through their local proximity; based on b̂_ij and b_ij, acquire the first loss function L_1; take the cross-entropy loss between the prediction probability p_t that the student answers the current problem correctly and the true result r_t as the objective function and, based on p_t and r_t, acquire the second loss function L_2; construct the joint training objective L = λ_1·L_1 + λ_2·L_2, where λ_1, λ_2 denote the trade-off coefficients controlling the local proximity loss of the nodes in the graph and the student performance prediction loss.
Further, the problem set Q and the knowledge point set C are merged to obtain a set T = Q ∪ C, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|; based on the set T, the knowledge graph G = {t_i, t_j, b_ij} is constructed, where b_ij ∈ {0,1}; when b_ij = 1, there is an edge between node i and node j, and vice versa.
Further, step S2 specifically includes: at time t, the input x_t of the model consists of two parts: the first part is the problem q_t with dimension N, and the second part is r_t. If the student answers q_t correctly, then r_t = 1, otherwise r_t = 0. If the student answers the problem correctly, r_{t,1} is spliced after q_t; if the answer is wrong, r_{t,0} is spliced after q_t, i.e. x_t = [q_t || r_{t,1}] or x_t = [q_t || r_{t,0}], where r_{t,1} and r_{t,0} are vectors of dimension N: r_{t,1} has a 1 at the corresponding question number position and 0 elsewhere, while r_{t,0} is the all-zero vector; x_t is a vector of dimension 2N.
Further, step S3 includes: the embedded representation of the problem is e_t = W_e · q_t, where W_e is a trainable parameter matrix and N represents the embedding dimension.
Further, step S4 includes: a self-attention mechanism is applied to all graph nodes, a shared weight matrix W_t is then applied to adjacent nodes, and the nonlinear activation function LeakyReLU is used to obtain the relationship coefficient v_ij between nodes: v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j]), where || represents the concatenation operation and · represents the inner product operation.
Further, step S4 further includes: v_ij is normalized to obtain the attention weight of the node, a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k∈N_i} exp(v_ik), where softmax represents the activation function, N_i represents the set of nodes adjacent to node t_i, and a_ij represents the attention weight of node t_i with respect to its adjacent node t_j; the GAT applies multi-head attention layers to the weight calculation of each central node, obtains K different attention heads by setting K independent attention mechanisms, and uses a nonlinear activation function to obtain the final node representation t_i^(h+1) = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N_i} a_ij^k · W^k · t_j^(h)), where K represents the number of multi-head attention layers, σ represents the sigmoid activation function, and t_i^(h) represents the node representation at the h-th layer.
Further, step S5 specifically includes: the LSTM acquires the student's knowledge state h_t at time t through a forget gate f_t, an input gate i_t, an output gate o_t, and a cell state C_t. The forget gate simulates the student's knowledge forgetting rate during learning; at time t, f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where h_{t-1} is the student's knowledge state at time t-1 and W_f, b_f are trainable parameters. The input gate simulates the process by which a student updates the knowledge state when facing a problem, where i_t = σ(W_i · [h_{t-1}, x_t] + b_i) simulates the learning of new knowledge, C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C) simulates the change in the degree of mastery of old knowledge, and C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t combines the two to form the new cell state, where tanh represents the hyperbolic tangent activation function, W_i, b_i, W_C, b_C are trainable parameters, C_{t-1} denotes the cell state at time t-1, and ⊙ denotes element-wise multiplication of vectors. The output gate simulates how the student's knowledge state changes according to the currently learned knowledge and historical knowledge forgetting; at time t, o_t = σ(W_o · [h_{t-1}, x_t] + b_o) and h_t = o_t ⊙ tanh(C_t), where W_o, b_o are trainable parameters, h_{t-1} represents the student's knowledge state at time t-1, and σ(·) represents the sigmoid activation function.
Further, step S6 includes: the student ability θ is obtained through the formula θ = tanh(W_θ · h_t + b_θ), and the item discrimination coefficient α is obtained through the formula α = σ(W_α · x_t + b_α), where σ and tanh represent activation functions and W_θ, b_θ, W_α, b_α are trainable parameters.
Further, step S6 further includes: the difficulty representation vector d_t is obtained from the embedded representation e_t of the problem through the difficulty analysis layer, whose parameters are trainable; a matrix transformation is applied to d_t to obtain the final problem difficulty β = tanh(W_β · d_t + b_β), where W_β, b_β are trainable parameters; the prediction probability that the student answers the current problem correctly is obtained as p_t = σ(W_p · [α(θ - β)] + b_p), where W_p, b_p are trainable parameters.
Further, step S7 includes: inner products are used to evaluate the relationships between graph nodes, b̂_ij = σ(t_i' · t_j'), which converts the weight values between nodes into values in [0,1], where i, j ∈ [1, ..., |T|] and σ represents the sigmoid activation function.
The knowledge tracking method based on the graph attention network has the following beneficial effects:
The invention establishes a new knowledge tracking method based on a graph attention network. In the method, node representations containing deep information are first captured with the graph attention network: based on the constructed knowledge graph, the problem-problem, knowledge point-knowledge point, and problem-knowledge point relations are captured, and new problem representations containing deep information are obtained. An LSTM network then simulates the changes in a student's own knowledge state while answering the problem sequence, and finally, combined with an improved IRT, the student's future performance is predicted from the three aspects of student ability, problem difficulty, and item discrimination. Experimental analysis on 6 public data sets demonstrates that the knowledge tracking method based on the graph attention network can significantly improve the accuracy of the knowledge tracking task. Aiming at the shortcomings of existing knowledge tracking methods, the invention studies how to effectively mine the deep relational information in a knowledge graph: it constructs a knowledge graph based on problems and the knowledge points they contain, creatively applies the graph attention network to the knowledge tracking field, assigns different weights to different adjacent nodes, learns embedded representations of problems from deep features, and then uses the LSTM to capture the changes in students' knowledge states during the answering process, thereby accurately predicting students' future answering performance.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention. In the drawings, like reference numerals are used to identify like elements. The drawings, which are included in the description, illustrate some, but not all embodiments of the invention. Other figures can be derived from these figures by one of ordinary skill in the art without undue effort.
FIG. 1 is a schematic diagram of a knowledge tracking method based on a graph attention network according to the present invention;
FIG. 2 is a schematic diagram of the LSTM long short-term memory neural network in the knowledge tracking method based on a graph attention network according to the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be arbitrarily combined with each other.
Please refer to FIG. 1 and FIG. 2. The knowledge tracking method based on the graph attention network provided by the embodiment of the invention comprises the following steps:
S1: Based on the problem set Q and the knowledge point set C, construct a knowledge graph G = {t_i, t_j, b_ij}, where b_ij ∈ {0,1}, t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|;
S2: Based on the problem set Q, acquire the input x_t of the model;
S3: Multiply the one-hot encoding of each problem by a matrix to obtain the embedded representation e_t of the problem;
S4: Apply a self-attention mechanism to all graph nodes and, based on the knowledge graph G = {t_i, t_j, b_ij}, obtain the relationship coefficient v_ij between nodes; the GAT applies multi-head attention layers to the weight calculation of each central node and, based on the relationship coefficient v_ij, obtains the final node representation t_i';
S5: Use an LSTM to obtain how the student's knowledge state changes over time and, based on the input x_t, acquire the student's knowledge state h_t at time t;
S6: Based on the embedded representation of the problem, obtain the difficulty representation vector d_t through a difficulty analysis layer; in the IRT model, obtain the student ability θ based on the student's knowledge state h_t at time t, acquire the item discrimination coefficient α based on the input x_t, and obtain the final problem difficulty β based on the difficulty representation vector d_t; based on the item discrimination coefficient α, the student ability θ, and the final problem difficulty β, obtain the prediction probability p_t that the student answers the current problem correctly;
S7: Based on the knowledge graph, use inner products to evaluate the relationships b̂_ij between graph nodes and evaluate the embedded representations of the graph nodes through their local proximity; based on b̂_ij and b_ij, acquire the first loss function L_1; take the cross-entropy loss between the prediction probability p_t that the student answers the current problem correctly and the true result r_t as the objective function and, based on p_t and r_t, acquire the second loss function L_2; construct the joint training objective L = λ_1·L_1 + λ_2·L_2, where λ_1, λ_2 denote the trade-off coefficients controlling the local proximity loss of the nodes in the graph and the student performance prediction loss.
The invention aims to provide a knowledge tracking method and system based on a graph attention network, which can improve the performance of knowledge tracking and better help learners make personalized plans. Aiming at the shortcomings of existing knowledge tracking methods, the invention studies how to effectively mine the deep relational information in a knowledge graph and constructs a knowledge graph based on problems and the knowledge points they contain. Inspired by the graph attention network (GAT), the invention creatively applies the graph attention network to the knowledge tracking field, assigns different weights to different adjacent nodes, and learns embedded representations of problems from deep features, providing deeper information for the model and thus accurately capturing students' knowledge states; an LSTM then captures the changes in students' knowledge states during the answering process, so as to accurately predict students' future answering performance. In addition, the invention introduces and improves IRT, a theory from cognitive-educational psychology, and predicts students' future performance from the three aspects of problem difficulty, student ability, and item discrimination, further improving the performance of the model.
The problems in the original data set are preprocessed per student: the number of all problems and knowledge points in the data set is counted, and each problem and knowledge point is numbered to facilitate model training. For each student, the answer sequence is collected; each answer sequence comprises three rows of data, where the first row records the number of problems the student answered, the second row records the numbers of the problems the student answered, and the third row records whether the student answered each problem correctly, with 1 for a correct answer and 0 otherwise.
Step S1 specifically includes: the problem set Q and the knowledge point set C are merged to obtain a set T = Q ∪ C, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|; based on the set T, the knowledge graph G = {t_i, t_j, b_ij} is constructed, where b_ij ∈ {0,1}; when b_ij = 1, there is an edge between node i and node j, and vice versa.
In a real online education scenario, students usually face a plurality of problems and a plurality of knowledge points in the answering process: one problem can comprise a plurality of knowledge points, and one knowledge point can also correspond to a plurality of problems. In addition, there are associations between the problems and the knowledge points.
The invention establishes a knowledge graph to explore the relationships between problems, between knowledge points, and between problems and knowledge points. Assume that S is a student set of length |S|, Q is a problem set of length |Q|, and C is a knowledge point set of length |C|; each student s_i independently completes problems in Q, and the knowledge points contained in any problem q_i all belong to the knowledge point set C. Considering the size of the data sets and for simplicity, the problem set and the knowledge point set are merged to obtain a set T = Q ∪ C, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|. The knowledge graph is then defined as G = {t_i, t_j, b_ij}, where b_ij ∈ {0,1}; when b_ij = 1, there is an edge between node i and node j, and the adjacency matrix of the knowledge graph is defined as B ∈ {0,1}^{|T|×|T|}.
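As an illustrative sketch of this construction (the patent gives no reference code, and the rule that an edge links each problem to the knowledge points it contains is an assumption consistent with the description above), the adjacency matrix B could be assembled as follows:

```python
import numpy as np

def build_adjacency(num_problems, num_concepts, problem_to_concepts):
    """Assemble the adjacency matrix B of the knowledge graph G = {t_i, t_j, b_ij}.

    Nodes 0..|Q|-1 are problems and nodes |Q|..|T|-1 are knowledge points,
    so |T| = |Q| + |C|.  b_ij = 1 when an edge connects node i and node j
    (here: a problem and a knowledge point it contains); edges are
    treated as undirected, hence the symmetric assignment.
    """
    num_nodes = num_problems + num_concepts
    B = np.zeros((num_nodes, num_nodes), dtype=np.int8)
    for q, concepts in problem_to_concepts.items():
        for c in concepts:
            j = num_problems + c        # concept IDs are offset by |Q|
            B[q, j] = B[j, q] = 1       # symmetric: undirected edge
    return B

# Hypothetical toy mapping: problem 0 covers concepts 0 and 1, problem 1 covers concept 1.
B = build_adjacency(num_problems=2, num_concepts=2, problem_to_concepts={0: [0, 1], 1: [1]})
```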
Step S2 specifically includes: at time t, the input x_t of the model consists of two parts: the first part is the problem q_t with dimension N, and the second part is r_t; if the student answers q_t correctly, then r_t = 1, otherwise r_t = 0. If the student answers the problem correctly, r_{t,1} is spliced after q_t; if the answer is wrong, r_{t,0} is spliced after q_t, i.e. x_t = [q_t || r_{t,1}] or x_t = [q_t || r_{t,0}], where r_{t,1} and r_{t,0} are vectors of dimension N: r_{t,1} has a 1 at the corresponding question number position and 0 elsewhere, while r_{t,0} is the all-zero vector; x_t is a vector of dimension 2N.
Relevant elements are extracted from the raw data to construct the input to the model. The first step has already collected each student's answer sequence, but the information expressed by a single number is limited; therefore, each problem and its answer situation are processed in one-hot encoding format. At time t, the input x_t of the model consists of two parts. The first part is the problem q_t with dimension N, drawn from N different problems; it is a one-hot encoding in which only the position corresponding to the question number is 1 and the remaining positions are 0. The second part is r_t: if student s_i can correctly answer problem q_t, then r_t = 1, otherwise r_t = 0. For better training, r_t is converted into a representation of the same form as q_t, i.e. a one-hot encoding. If the student answers the problem correctly, r_{t,1} is spliced after q_t; if the answer is wrong, r_{t,0} is spliced after q_t.
Here r_{t,1} and r_{t,0} are vectors of dimension N: r_{t,1} has a 1 at the corresponding question number position and 0 elsewhere, while r_{t,0} is the all-zero vector; x_t is a vector of dimension 2N.
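A minimal sketch of this encoding, assuming zero-based problem IDs (the function name and signature are illustrative, not taken from the patent):

```python
import numpy as np

def encode_interaction(q_id, correct, num_problems):
    """Encode one interaction (q_t, r_t) as the 2N-dimensional input x_t.

    First N entries: one-hot problem q_t.  Last N entries: r_{t,1}
    (one-hot at the same question position) if the answer is correct,
    otherwise r_{t,0} (all zeros).
    """
    x = np.zeros(2 * num_problems, dtype=np.float32)
    x[q_id] = 1.0                          # q_t
    if correct:
        x[num_problems + q_id] = 1.0       # r_{t,1}
    return x                               # incorrect answer: second half stays r_{t,0}

x_t = encode_interaction(q_id=3, correct=True, num_problems=10)   # dimension 2N = 20
```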
Step S3 includes: the embedded representation of the problem is e_t = W_e · q_t, where W_e is a trainable parameter matrix and N represents the embedding dimension.
The input data of the knowledge tracking task comprises two parts, the problems and the corresponding student performance, and the invention needs to mine the deep information contained in the problems and knowledge points, which is independent of student performance. Therefore, the invention extracts the problem parts of the students' historical answer sequences and obtains the embedded representation of the problems by multiplying the one-hot encoding of each problem by a matrix. The embedding of the problem is expressed as follows:
e_t = W_e · q_t, where W_e is a trainable parameter matrix, N represents the embedding dimension, and e_t is the representation of the problem after embedding.
Step S4 includes: a self-attention mechanism is applied to all graph nodes, a shared weight matrix W_t is then applied to adjacent nodes, and the nonlinear activation function LeakyReLU is used to obtain the relationship coefficient v_ij between nodes: v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j]), where || represents the concatenation operation and · represents the inner product operation.
Step S4 further includes: v_ij is normalized to obtain the attention weight of the node, a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k∈N_i} exp(v_ik), where softmax represents the activation function, N_i represents the set of nodes adjacent to node t_i, and a_ij represents the attention weight of node t_i with respect to its adjacent node t_j; the GAT applies multi-head attention layers to the weight calculation of each central node, obtains K different attention heads by setting K independent attention mechanisms, and uses a nonlinear activation function to obtain the final node representation t_i^(h+1) = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N_i} a_ij^k · W^k · t_j^(h)), where K represents the number of multi-head attention layers, σ represents the sigmoid activation function, and t_i^(h) represents the node representation at the h-th layer.
Given a knowledge graph G = {t_i, t_j, b_ij}, to calculate the attention coefficients between nodes, the self-attention mechanism is first applied to all graph nodes; the shared weight matrix W_t is then applied to adjacent nodes, and the nonlinear activation function LeakyReLU is used to obtain the relationship coefficient v_ij between nodes. v_ij is calculated as follows:
v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j])
where || represents the concatenation operation and · represents the inner product operation. v_ij is normalized to obtain the attention weight of the node, with the specific formula:
a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k∈N_i} exp(v_ik)
where softmax represents the activation function, N_i represents the set of nodes adjacent to node t_i, and a_ij represents the attention weight of node t_i with respect to its adjacent node t_j. In addition, in order to learn the attention weights stably, the GAT applies multi-head attention layers to the weight calculation of each central node, obtains K different attention heads by setting K independent attention mechanisms, and uses a nonlinear activation function to obtain the final node representation. The invention sets 8 independent attention mechanisms. The specific process is as follows:
t_i^(h+1) = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N_i} a_ij^k · W^k · t_j^(h))
where K denotes the number of multi-head attention layers, σ denotes the sigmoid activation function, and t_i^(h) denotes the node representation at the h-th layer. In addition, to obtain more comprehensive node representations, the invention stacks 3 graph attention layers for training; after training through the 3-layer stack, each central node can obtain information from its 3rd-order (3-hop) neighboring nodes.
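A minimal PyTorch sketch of one such multi-head graph attention layer, as one possible reading of the formulas above (an assumption-laden illustration, not the patent's reference implementation; it averages the K heads as in the formula and assumes the adjacency matrix B includes self-loops so every node has at least one neighbor):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """One multi-head graph attention layer: computes v_ij, a_ij, and the
    head-averaged node update t_i^(h+1) = sigma((1/K) sum_k sum_j a_ij^k W^k t_j^(h))."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.W_t = nn.ModuleList([nn.Linear(dim, dim, bias=False) for _ in range(num_heads)])
        self.attn = nn.ModuleList([nn.Linear(2 * dim, 1, bias=False) for _ in range(num_heads)])

    def forward(self, T, B):          # T: [|T|, dim] node features, B: [|T|, |T|] adjacency
        n = T.size(0)
        outs = []
        for W_t, attn in zip(self.W_t, self.attn):
            Wt = W_t(T)                                          # W_t . t_i for every node
            pairs = torch.cat([Wt.unsqueeze(1).expand(n, n, -1),
                               Wt.unsqueeze(0).expand(n, n, -1)], dim=-1)
            v = F.leaky_relu(attn(pairs).squeeze(-1))            # v_ij for all node pairs
            v = v.masked_fill(B == 0, float('-inf'))             # restrict attention to neighbors N_i
            a = F.softmax(v, dim=-1)                             # a_ij
            outs.append(a @ Wt)                                  # sum_j a_ij (W_t . t_j)
        return torch.sigmoid(torch.stack(outs).mean(dim=0))      # average K heads, apply sigma
```

Stacking three such layers, as the description sets out, lets each node aggregate information from its 3-hop neighborhood.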
Step S5 specifically includes: the LSTM acquires the student's knowledge state h_t at time t through a forget gate f_t, an input gate i_t, an output gate o_t, and a cell state C_t. The forget gate simulates the student's knowledge forgetting rate during learning; at time t, f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where h_{t-1} is the student's knowledge state at time t-1 and W_f, b_f are trainable parameters. The input gate simulates the process by which a student updates the knowledge state when facing a problem, where i_t = σ(W_i · [h_{t-1}, x_t] + b_i) simulates the learning of new knowledge, C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C) simulates the change in the degree of mastery of old knowledge, and C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t combines the two to form the new cell state, where tanh represents the hyperbolic tangent activation function, W_i, b_i, W_C, b_C are trainable parameters, C_{t-1} denotes the cell state at time t-1, and ⊙ denotes element-wise multiplication of vectors. The output gate simulates how the student's knowledge state changes according to the currently learned knowledge and historical knowledge forgetting; at time t, o_t = σ(W_o · [h_{t-1}, x_t] + b_o) and h_t = o_t ⊙ tanh(C_t), where W_o, b_o are trainable parameters, h_{t-1} represents the student's knowledge state at time t-1, and σ(·) represents the sigmoid activation function.
An LSTM is employed to obtain the change in students' knowledge states over time, and it exhibits excellent performance in deep knowledge tracking tasks. The LSTM obtains the student's knowledge state h_t through three gates and one state: the forget gate f_t, the input gate i_t, the output gate o_t, and the cell state C_t. The cell state C_t passes historical information to each unit, which solves the problem that RNNs have difficulty capturing long-term dependencies.
Forget gate: in a real scenario, a student gradually forgets some previously learned knowledge over time. The forget gate uses a scalar in [0,1] to simulate the student's knowledge forgetting rate during learning; at time t, the formula is:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
Input gate: the input gate simulates the process by which a student updates the knowledge state when facing a problem, including the learning of new knowledge and the review of old knowledge. i_t simulates the learning of new knowledge, C̃_t simulates the change in the degree of mastery of old knowledge, and C_t combines the two to form the new cell state. This process receives new inputs and updates the current cell state. At time t, the formulas are:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
where tanh represents the hyperbolic tangent activation function and W_i, b_i, W_C, b_C are trainable parameters; C_{t-1} denotes the cell state at time t-1, and ⊙ denotes element-wise multiplication of vectors.
Output gate: the output gate simulates how the student's knowledge state changes according to the currently learned knowledge and historical knowledge forgetting, and outputs the student's current knowledge state h_t. At time t, the formulas are:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t ⊙ tanh(C_t)
where W_o, b_o are trainable parameters; h_{t-1} represents the student's knowledge state at time t-1, and σ(·) represents the sigmoid activation function.
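Since these are the standard LSTM gate equations, the knowledge-state tracker can be sketched directly with PyTorch's built-in LSTM (a minimal illustration; class and variable names are assumptions):

```python
import torch
import torch.nn as nn

class KnowledgeStateTracker(nn.Module):
    """Track the student's knowledge state h_t over an interaction sequence.

    nn.LSTM implements exactly the forget/input/output gate and cell-state
    updates written out above, so h_seq[:, t] corresponds to h_t.
    """
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, x_seq):          # x_seq: [batch, seq_len, 2N] inputs x_t
        h_seq, _ = self.lstm(x_seq)
        return h_seq                   # knowledge states h_1 .. h_T
```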
Step S6 includes: the student ability θ is obtained through the formula θ = tanh(W_θ · h_t + b_θ), and the item discrimination coefficient α is obtained through the formula α = σ(W_α · x_t + b_α), where σ and tanh represent activation functions and W_θ, b_θ, W_α, b_α are trainable parameters.
Step S6 further includes: the difficulty representation vector d_t is obtained from the embedded representation e_t of the problem through the difficulty analysis layer, whose parameters are trainable; a matrix transformation is applied to d_t to obtain the final problem difficulty β = tanh(W_β · d_t + b_β), where W_β, b_β are trainable parameters; the prediction probability that the student answers the current problem correctly is obtained as p_t = σ(W_p · [α(θ - β)] + b_p), where W_p, b_p are trainable parameters.
IRT, as a classical theory in statistical psychology, can quantify students' abilities and has shown powerful performance in knowledge tracking. The IRT model adopted by the invention mainly comprises three parameters: the item discrimination coefficient α, the student ability θ, and the problem difficulty β. In the knowledge tracking task, the student's ability is based on the student's knowledge state: the better the knowledge state, the stronger the ability. The item discrimination is closely related to the problem itself and to the students' answering situations; if students' answers to a given problem differ greatly, the problem discriminates well between students. The invention defines the student ability θ and the item discrimination coefficient α as follows:
θ = tanh(W_θ · h_t + b_θ)
α = σ(W_α · x_t + b_α)
where σ and tanh each represent an activation function and W_θ, b_θ, W_α, b_α are trainable parameters; h_t represents the student's knowledge state at time t.
Since the problem difficulty is related only to the problem itself, the model first obtains the difficulty representation vector d_t through the difficulty analysis layer, then applies a matrix transformation to d_t and obtains the final problem difficulty β through the tanh activation function. β is expressed as follows:
β = tanh(W_β · d_t + b_β)
The invention expands these parameters into a multidimensional space so that each parameter is represented by a plurality of different features, in order to predict students' future answering performance more accurately. The prediction process is as follows:
p_t = σ(W_p · [α(θ - β)] + b_p)
where W_p, b_p are trainable parameters and p_t ∈ [0,1] represents the probability that the student correctly answers the problem at time t. In this model, p_t ∈ [0, 0.5] is judged as an incorrect answer; otherwise the student is judged to have answered correctly.
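A minimal sketch of this improved-IRT prediction head (layer dimensions and the class name are assumptions; the computations follow the definitions of θ, α, β, and p_t above):

```python
import torch
import torch.nn as nn

class IRTPredictor(nn.Module):
    """Predict p_t from ability theta, discrimination alpha, and difficulty beta."""

    def __init__(self, hidden_dim, input_dim, diff_dim, param_dim):
        super().__init__()
        self.ability = nn.Linear(hidden_dim, param_dim)    # W_theta, b_theta
        self.discrim = nn.Linear(input_dim, param_dim)     # W_alpha, b_alpha
        self.difficulty = nn.Linear(diff_dim, param_dim)   # W_beta, b_beta
        self.out = nn.Linear(param_dim, 1)                 # W_p, b_p

    def forward(self, h_t, x_t, d_t):
        theta = torch.tanh(self.ability(h_t))              # student ability
        alpha = torch.sigmoid(self.discrim(x_t))           # item discrimination
        beta = torch.tanh(self.difficulty(d_t))            # final problem difficulty
        return torch.sigmoid(self.out(alpha * (theta - beta)))   # p_t in [0, 1]
```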
Step S7 includes: inner products are used to evaluate the relationships between graph nodes, b̂_ij = σ(t_i' · t_j'), which converts the weight values between nodes into values in [0,1], where i, j ∈ [1, ..., |T|] and σ represents the sigmoid activation function.
To train the model better, the invention establishes a joint training framework to optimize the model parameters. Inner products are used to evaluate the relationship between graph nodes, expressed as follows:
b̂_ij = σ(t_i' · t_j')
where i, j ∈ [1, ..., |T|] and σ represents the sigmoid activation function, converting the weight values between nodes into values in [0,1]. To make the learned node representations closer to the real result, the invention defines the local proximity of the graph nodes to evaluate their embedded representations, with the loss function shown below:
L_1 = -Σ_{i,j} [ b_ij · log b̂_ij + (1 - b_ij) · log(1 - b̂_ij) ]
To enable the model to model students' knowledge states more accurately and thus predict students' future answering performance, the invention uses the cross-entropy loss between the prediction probability p_t that the student answers the current problem correctly and the true result r_t as the objective function; the specific loss function is defined as:
L_2 = -Σ_t [ r_t · log p_t + (1 - r_t) · log(1 - p_t) ]
For these two training objectives, the invention constructs a joint training objective, expressed as:
L = λ_1·L_1 + λ_2·L_2
where λ_1, λ_2 denote the trade-off coefficients controlling the local proximity loss of the nodes in the graph and the student performance prediction loss.
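A compact sketch of the joint objective (the function name and default weights are placeholders; the patent treats λ_1, λ_2 as tunable trade-off coefficients):

```python
import torch.nn.functional as F

def joint_loss(b_hat, B, p, r, lambda1=1.0, lambda2=1.0):
    """L = lambda1 * L1 + lambda2 * L2 from step S7.

    b_hat: sigmoid inner products between final node representations;
    B: 0/1 adjacency labels b_ij; p: predicted probabilities p_t;
    r: 0/1 true results r_t.
    """
    L1 = F.binary_cross_entropy(b_hat, B.float())   # local proximity loss on graph edges
    L2 = F.binary_cross_entropy(p, r.float())       # student performance prediction loss
    return lambda1 * L1 + lambda2 * L2
```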
The invention carried out a large number of experiments to find suitable hyperparameters. Specifically, each data set is randomly split into 80% for training and 20% for testing, where the test set is used to evaluate model performance and to stop training early. All experiments use 5-fold cross-validation, and all models are evaluated on the average of 5 runs. The invention trains the model with the Adam optimizer, sets the maximum number of training epochs to 200, sets the maximum gradient norm for clipping to 5.0, and sets the learning rate within [0.001, 0.01]; the weight matrices and biases in the network are initialized from a normal distribution with mean 0 and standard deviation 0.01. The training batch size is typically set to 64 but is adjusted to the size of the data set; for example, the ASSIST2012 data set is large, so its training batch size is set to 32. To prevent overfitting, Dropout layers are added to the model, with the rate set to 0.5 during training.
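The setup above can be summarized in a single configuration sketch (the key names are illustrative, not taken from the patent's code):

```python
# Hypothetical consolidation of the experimental setup described above.
train_config = {
    "optimizer": "Adam",
    "max_epochs": 200,
    "grad_clip_norm": 5.0,
    "learning_rate_range": (0.001, 0.01),       # searched within this interval
    "weight_init": {"mean": 0.0, "std": 0.01},  # normal initialization of weights and biases
    "batch_size": 64,                           # 32 for the large ASSIST2012 data set
    "dropout": 0.5,
    "train_test_split": (0.8, 0.2),
    "cross_validation_folds": 5,
}
```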
To verify the advantages of the invention in solving the knowledge tracking task, this embodiment conducts experiments on 6 public data sets, namely ASS09-up, ASSIST2012, Statics2011, Synthetic, AICFE-math, and AICFE-phy. The method is also compared with 4 recent knowledge tracking models, namely the hidden-Markov-based Bayesian knowledge tracing model (BKT), the knowledge tracing machines model (KTM), the deep knowledge tracing model (DKT), and the dynamic key-value memory network model (DKVMN). The AUC (Area Under Curve) metric is used to measure the models' results; it is the area enclosed under the ROC curve, and the closer the AUC value is to 1, the better the model performs and the closer its predictions are to reality. Table 1 shows the comparison of the 5 methods on the 6 public data sets; from the results, the knowledge tracking model based on the graph attention network provided by the invention is clearly superior to prior art schemes.
Table 1. Performance comparison of the 5 methods on the 6 data sets
The embodiments described above may be implemented alone or in various combinations, and such variations are within the scope of the present invention.
It should be noted that, in the description of the present application, terms indicating orientation or positional relationships such as "upper end," "lower end," and "bottom end" are based on the orientations or positional relationships shown in the drawings, or the orientations or positional relationships in which the product of the application is conventionally placed in use, and are merely for convenience of describing the present application and simplifying the description; they do not indicate or imply that the device referred to must have a specific orientation or be configured and operated in a specific orientation, and therefore should not be construed as limiting the present application. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting. Although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A knowledge tracking method based on a graph attention network, comprising the steps of:
S1: Based on the problem set Q and the knowledge point set C, constructing a knowledge graph G = {t_i, t_j, b_ij}, where b_ij ∈ {0,1}, t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|;
S2: Based on the problem set Q, acquiring the input x_t of the model;
S3: Multiplying the one-hot encoding of each problem by a matrix to obtain the embedded representation e_t of the problem;
S4: Applying a self-attention mechanism to all graph nodes and, based on the knowledge graph G = {t_i, t_j, b_ij}, obtaining the relationship coefficient v_ij between nodes; the GAT applying multi-head attention layers to the weight calculation of each central node and, based on the relationship coefficient v_ij, obtaining the final node representation t_i';
S5: Using an LSTM to obtain how the student's knowledge state changes over time and, based on the input x_t, acquiring the student's knowledge state h_t at time t;
S6: Based on the embedded representation of the problem, obtaining the difficulty representation vector d_t through a difficulty analysis layer; in the IRT model, obtaining the student ability θ based on the student's knowledge state h_t at time t, acquiring the item discrimination coefficient α based on the input x_t, and obtaining the final problem difficulty β based on the difficulty representation vector d_t; based on the item discrimination coefficient α, the student ability θ, and the final problem difficulty β, obtaining the prediction probability p_t that the student answers the current problem correctly;
S7: Based on the knowledge graph, using inner products to evaluate the relationships b̂_ij between graph nodes and evaluating the embedded representations of the graph nodes through their local proximity; based on b̂_ij and b_ij, acquiring the first loss function L_1; taking the cross-entropy loss between the prediction probability p_t that the student answers the current problem correctly and the true result r_t as the objective function and, based on p_t and r_t, acquiring the second loss function L_2; constructing the joint training objective L = λ_1·L_1 + λ_2·L_2, where λ_1, λ_2 denote the trade-off coefficients controlling the local proximity loss of the nodes in the graph and the student performance prediction loss.
2. The knowledge tracking method based on a graph attention network according to claim 1, wherein step S1 specifically includes: the problem set Q and the knowledge point set C are merged to obtain a set T = Q ∪ C, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|; based on the set T, the knowledge graph G = {t_i, t_j, b_ij} is constructed, where b_ij ∈ {0,1}; when b_ij = 1, there is an edge between node i and node j, and vice versa.
3. The knowledge tracking method based on a graph attention network according to claim 2, wherein step S2 specifically includes: at time t, the input x_t of the model consists of two parts: the first part is the problem q_t with dimension N, and the second part is r_t; if the student answers q_t correctly, then r_t = 1, otherwise r_t = 0; if the student answers the problem correctly, r_{t,1} is spliced after q_t; if the answer is wrong, r_{t,0} is spliced after q_t, i.e. x_t = [q_t || r_{t,1}] or x_t = [q_t || r_{t,0}], where r_{t,1} and r_{t,0} are vectors of dimension N: r_{t,1} has a 1 at the corresponding question number position and 0 elsewhere, while r_{t,0} is the all-zero vector; x_t is a vector of dimension 2N.
4. The knowledge tracking method based on a graph attention network according to claim 2 or 3, wherein step S3 includes: the embedded representation of the problem is e_t = W_e · q_t, where W_e is a trainable parameter matrix and N represents the embedding dimension.
5. The knowledge tracking method based on a graph attention network according to claim 2 or 3, wherein step S4 includes: a self-attention mechanism is applied to all graph nodes, a shared weight matrix W_t is then applied to adjacent nodes, and the nonlinear activation function LeakyReLU is used to obtain the relationship coefficient v_ij between nodes: v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j]), where || represents the concatenation operation and · represents the inner product operation.
6. The knowledge tracking method based on a graph attention network according to claim 5, wherein step S4 further includes: v_ij is normalized to obtain the attention weight of the node, a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k∈N_i} exp(v_ik), where softmax represents the activation function, N_i represents the set of nodes adjacent to node t_i, and a_ij represents the attention weight of node t_i with respect to its adjacent node t_j; the GAT applies multi-head attention layers to the weight calculation of each central node, obtains K different attention heads by setting K independent attention mechanisms, and uses a nonlinear activation function to obtain the final node representation t_i^(h+1) = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N_i} a_ij^k · W^k · t_j^(h)), where K represents the number of multi-head attention layers, σ represents the sigmoid activation function, and t_i^(h) represents the node representation at the h-th layer.
7. The knowledge tracking method based on a graph attention network according to claim 2 or 3, wherein step S5 specifically includes: the LSTM acquires the student's knowledge state h_t at time t through a forget gate f_t, an input gate i_t, an output gate o_t, and a cell state C_t. The forget gate simulates the student's knowledge forgetting rate during learning; at time t, f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where h_{t-1} is the student's knowledge state at time t-1 and W_f, b_f are trainable parameters. The input gate simulates the process by which a student updates the knowledge state when facing a problem, where i_t = σ(W_i · [h_{t-1}, x_t] + b_i) simulates the learning of new knowledge, C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C) simulates the change in the degree of mastery of old knowledge, and C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t combines the two to form the new cell state, where tanh represents the hyperbolic tangent activation function, W_i, b_i, W_C, b_C are trainable parameters, C_{t-1} denotes the cell state at time t-1, and ⊙ denotes element-wise multiplication of vectors. The output gate simulates how the student's knowledge state changes according to the currently learned knowledge and historical knowledge forgetting; at time t, o_t = σ(W_o · [h_{t-1}, x_t] + b_o) and h_t = o_t ⊙ tanh(C_t), where W_o, b_o are trainable parameters, h_{t-1} represents the student's knowledge state at time t-1, and σ(·) represents the sigmoid activation function.
8. The knowledge tracking method based on a graph attention network according to claim 2 or 3, wherein step S6 includes: the student ability θ is obtained through the formula θ = tanh(W_θ · h_t + b_θ), and the item discrimination coefficient α is obtained through the formula α = σ(W_α · x_t + b_α), where σ and tanh represent activation functions and W_θ, b_θ, W_α, b_α are trainable parameters.
9. The knowledge tracking method based on a graph attention network according to claim 8, wherein step S6 further includes: the difficulty representation vector d_t is obtained from the embedded representation e_t of the problem through the difficulty analysis layer, whose parameters are trainable; a matrix transformation is applied to d_t to obtain the final problem difficulty β = tanh(W_β · d_t + b_β), where W_β, b_β are trainable parameters; the prediction probability that the student answers the current problem correctly is obtained as p_t = σ(W_p · [α(θ - β)] + b_p), where W_p, b_p are trainable parameters.
10. The knowledge tracking method based on a graph attention network according to claim 2 or 3, wherein step S7 includes: inner products are used to evaluate the relationships between graph nodes, b̂_ij = σ(t_i' · t_j'), which converts the weight values between nodes into values in [0,1], where i, j ∈ [1, ..., |T|] and σ represents the sigmoid activation function.
CN202210479195.7A 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network Active CN114911975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210479195.7A CN114911975B (en) 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210479195.7A CN114911975B (en) 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network

Publications (2)

Publication Number Publication Date
CN114911975A CN114911975A (en) 2022-08-16
CN114911975B true CN114911975B (en) 2024-04-05

Family

ID=82765831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210479195.7A Active CN114911975B (en) 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network

Country Status (1)

Country Link
CN (1) CN114911975B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116151329B (en) * 2023-04-23 2023-07-18 山东师范大学 Student knowledge state tracking method and system based on inverse fact graph learning
CN116166998B (en) * 2023-04-25 2023-07-07 合肥师范学院 Student performance prediction method combining global and local features
CN116976434B (en) * 2023-07-05 2024-02-20 长江大学 Knowledge point diffusion representation-based knowledge tracking method and storage medium
CN117077737B (en) * 2023-08-22 2024-03-15 长江大学 Knowledge tracking system for dynamic collaboration of knowledge points
CN117057422B (en) * 2023-08-23 2024-04-02 长江大学 Knowledge tracking system for global knowledge convergence sensing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116092A (en) * 2020-08-11 2020-12-22 浙江师范大学 Interpretable knowledge level tracking method, system and storage medium
CN113033808A (en) * 2021-03-08 2021-06-25 西北大学 Deep embedded knowledge tracking method based on exercise difficulty and student ability
CN114021722A (en) * 2021-10-30 2022-02-08 华中师范大学 Attention knowledge tracking method integrating cognitive portrayal
CN114385801A (en) * 2021-12-27 2022-04-22 河北工业大学 Knowledge tracking method and system based on hierarchical refinement LSTM network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11631338B2 (en) * 2020-06-11 2023-04-18 Act, Inc. Deep knowledge tracing with transformers

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116092A (en) * 2020-08-11 2020-12-22 浙江师范大学 Interpretable knowledge level tracking method, system and storage medium
CN113033808A (en) * 2021-03-08 2021-06-25 西北大学 Deep embedded knowledge tracking method based on exercise difficulty and student ability
CN114021722A (en) * 2021-10-30 2022-02-08 华中师范大学 Attention knowledge tracking method integrating cognitive portrayal
CN114385801A (en) * 2021-12-27 2022-04-22 河北工业大学 Knowledge tracking method and system based on hierarchical refinement LSTM network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马骁睿; 徐圆; 朱群雄. A personalized exercise recommendation method combining deep knowledge tracing. 小型微型计算机系统 (Journal of Chinese Computer Systems), 2020, (05), full text. *

Also Published As

Publication number Publication date
CN114911975A (en) 2022-08-16

Similar Documents

Publication Publication Date Title
CN114911975B (en) Knowledge tracking method based on graph attention network
CN109829541A (en) Deep neural network incremental training method and system based on learning automaton
CN108257052B (en) Online student knowledge assessment method and system
CN110321361A (en) Test question recommendation and judgment method based on improved LSTM neural network model
CN109840595B (en) Knowledge tracking method based on group learning behavior characteristics
CN113344053B (en) Knowledge tracking method based on examination question different composition representation and learner embedding
CN113591988B (en) Knowledge cognitive structure analysis method, system, computer equipment, medium and terminal
CN114385801A (en) Knowledge tracking method and system based on hierarchical refinement LSTM network
CN112085168A (en) Knowledge tracking method and system based on dynamic key value gating circulation network
CN114372137A (en) Dynamic perception test question recommendation method and system integrating depth knowledge tracking
CN114861754A (en) Knowledge tracking method and system based on external attention mechanism
CN114021722A (en) Attention knowledge tracking method integrating cognitive portrayal
CN113361685A (en) Knowledge tracking method and system based on learner knowledge state evolution expression
CN115329096A (en) Interactive knowledge tracking method based on graph neural network
CN115545160A (en) Knowledge tracking method and system based on multi-learning behavior cooperation
CN115544158A (en) Multi-knowledge-point dynamic knowledge tracking method applied to intelligent education system
CN117540104A (en) Learning group difference evaluation method and system based on graph neural network
Yunusov et al. Shapley values to explain machine learning models of school student’s academic performance during COVID-19
CN114997461B (en) Time-sensitive answer correctness prediction method combining learning and forgetting
CN117094859A (en) Learning path recommendation method and system combining graph neural network and multi-layer perceptron
CN114117033B (en) Knowledge tracking method and system
Ma et al. Dtkt: An improved deep temporal convolutional network for knowledge tracing
CN111898803A (en) Exercise prediction method, system, equipment and storage medium
CN114742292A (en) Knowledge tracking process-oriented two-state co-evolution method for predicting future performance of students
CN112906293A (en) Machine teaching method and system based on review mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant