CN114911975B - Knowledge tracking method based on graph attention network - Google Patents

Knowledge tracking method based on graph attention network

Info

Publication number
CN114911975B
CN114911975B
Authority
CN
China
Prior art keywords
knowledge
graph
student
obtaining
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210479195.7A
Other languages
Chinese (zh)
Other versions
CN114911975A (en
Inventor
张井合
李林昊
李英双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinhua Hangda Beidou Application Technology Co ltd
Original Assignee
Jinhua Hangda Beidou Application Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinhua Hangda Beidou Application Technology Co ltd filed Critical Jinhua Hangda Beidou Application Technology Co ltd
Priority to CN202210479195.7A priority Critical patent/CN114911975B/en
Publication of CN114911975A publication Critical patent/CN114911975A/en
Application granted granted Critical
Publication of CN114911975B publication Critical patent/CN114911975B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a knowledge tracking method based on a graph attention network, and belongs to the technical field of knowledge tracking. The knowledge tracking method based on the graph attention network comprises the following steps: S1: constructing a knowledge graph; S2: acquiring the input of the model; S3: obtaining an embedded representation of the problem; S4: obtaining a final node representation; S5: acquiring the knowledge state of a student at time t; S6: obtaining the prediction probability for the student; S7: constructing a joint training objective. Aiming at the shortcomings of existing knowledge tracking methods, the invention studies how to effectively mine the deep relational information in a knowledge graph: it constructs a knowledge graph based on problems and the knowledge points they contain, creatively applies the graph attention network to the knowledge tracking field, assigns different weights to different adjacent nodes, learns embedded representations of problems from deep features, and then uses an LSTM to capture the changes in students' knowledge states during the answering process, thereby accurately predicting students' future answering performance.

Description

Knowledge tracking method based on graph attention network
Technical Field
The invention relates to the technical field of knowledge tracking, in particular to a knowledge tracking method based on a graph attention network.
Background
In MOOC, Udemy, Lynda, and other intelligent teaching systems and large-scale open online course platforms, knowledge tracking is an indispensable task: it aims to capture the changes in students' knowledge states during the answering process and to predict students' future answering performance. Research on knowledge tracking is of great significance for realizing intelligent education and completing personalized tutoring tasks. Specifically, after a learner on an online platform studies a knowledge point, the platform may present corresponding problems to check whether the learner has fully grasped that knowledge point. The knowledge tracking task feeds the sequence of problems the learner has answered on the platform into a model for training; the model captures the changes in the learner's degree of knowledge mastery over the sequence, and when a new problem arrives, the model predicts whether the learner can answer it correctly according to the learner's mastery of the knowledge related to that problem.
Existing KT methods typically construct predictive models that take the one-hot encodings of problems or knowledge points as input. Item response theory (IRT) integrates methods from cognitive science and psychometrics for studying student performance and is a traditional knowledge tracking method. The deep knowledge tracing model (DKT) introduced deep learning into the knowledge tracking field and offers a significant performance improvement over traditional approaches. Subsequent scholars have proposed a variety of novel frameworks based on deep learning methods. DKVMN uses a static key matrix to store knowledge point embeddings and a dynamic value matrix to store and update the learner's mastery of knowledge points. The convolutional knowledge tracing model (CKT) takes into account differences in students' prior knowledge and learning rates, introducing a convolutional neural network to model student personalization. In addition, knowledge tracking models based on deep learning have become a research hotspot in recent years, for example EERNN, EKT, and SKVMN. Deep models show strong performance thanks to their powerful feature extraction capability, especially on time-series tasks: each simple input to the model is transformed through complex changes into a knowledge state used for prediction. However, existing knowledge tracking methods have the following disadvantages:
(1) Existing knowledge tracking research methods often cannot effectively mine the deep information in problems and knowledge points, that is, they cannot mine the problem-problem, problem-knowledge point, and knowledge point-knowledge point relations, so models based on problems or knowledge points are difficult to advance in knowledge tracking tasks.
(2) Existing knowledge tracking methods can capture students' knowledge states but cannot fully utilize deep information when predicting students' future answering performance, so the models cannot obtain good results in knowledge tracking tasks.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a knowledge tracking method based on a graph attention network.
The invention provides a knowledge tracking method based on a graph attention network, which comprises the following steps:
S1: Based on the problem set Q and the knowledge point set C, construct a knowledge graph G = {t_i, t_j, b_ij}, where b_ij ∈ {0,1}, t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|;
S2: Based on the problem set Q, acquire the input x_t of the model;
S3: Multiply the one-hot encoding of each problem by a matrix to obtain the embedded representation e_t of the problem;
S4: Apply a self-attention mechanism to all graph nodes and, based on the knowledge graph G = {t_i, t_j, b_ij}, obtain the relationship coefficient v_ij between nodes; the GAT applies multi-head attention layers to the weight calculation of each central node and, based on the relationship coefficient v_ij, obtains the final node representation t_i';
S5: Use an LSTM to obtain how the student's knowledge state changes over time and, based on the input x_t, acquire the student's knowledge state h_t at time t;
S6: Based on the embedded representation of the problem, obtain the difficulty representation vector d_t through a difficulty analysis layer; in the IRT model, obtain the student ability θ based on the student's knowledge state h_t at time t, acquire the item discrimination coefficient α based on the input x_t, and obtain the final problem difficulty β based on the difficulty representation vector d_t; based on the item discrimination coefficient α, the student ability θ, and the final problem difficulty β, obtain the prediction probability p_t that the student answers the current problem correctly;
S7: Based on the knowledge graph, use inner products to evaluate the relationships b̂_ij between graph nodes and evaluate the embedded representations of the graph nodes through their local proximity; based on b̂_ij and b_ij, acquire the first loss function L_1; take the cross-entropy loss between the prediction probability p_t that the student answers the current problem correctly and the true result r_t as the objective function and, based on p_t and r_t, acquire the second loss function L_2; construct the joint training objective L = λ_1·L_1 + λ_2·L_2, where λ_1, λ_2 denote the trade-off coefficients controlling the local proximity loss of the nodes in the graph and the student performance prediction loss.
Further, the problem set Q and the knowledge point set C are merged to obtain a set T = Q ∪ C, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|; based on the set T, the knowledge graph G = {t_i, t_j, b_ij} is constructed, where b_ij ∈ {0,1}; when b_ij = 1, there is an edge between node i and node j, and vice versa.
Further, step S2 specifically includes: at time t, the input x_t of the model consists of two parts: the first part is the problem q_t with dimension N, and the second part is r_t. If the student answers q_t correctly, then r_t = 1, otherwise r_t = 0. If the student answers the problem correctly, r_{t,1} is spliced after q_t; if the answer is wrong, r_{t,0} is spliced after q_t, i.e. x_t = [q_t || r_{t,1}] or x_t = [q_t || r_{t,0}], where r_{t,1} and r_{t,0} are vectors of dimension N: r_{t,1} has a 1 at the corresponding question number position and 0 elsewhere, while r_{t,0} is the all-zero vector; x_t is a vector of dimension 2N.
Further, step S3 includes: the embedded representation of the problem is e_t = W_e · q_t, where W_e is a trainable parameter matrix and N represents the embedding dimension.
Further, step S4 includes: a self-attention mechanism is applied to all graph nodes, a shared weight matrix W_t is then applied to adjacent nodes, and the nonlinear activation function LeakyReLU is used to obtain the relationship coefficient v_ij between nodes: v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j]), where || represents the concatenation operation and · represents the inner product operation.
Further, step S4 further includes: v_ij is normalized to obtain the attention weight of the node, a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k∈N_i} exp(v_ik), where softmax represents the activation function, N_i represents the set of nodes adjacent to node t_i, and a_ij represents the attention weight of node t_i with respect to its adjacent node t_j; the GAT applies multi-head attention layers to the weight calculation of each central node, obtains K different attention heads by setting K independent attention mechanisms, and uses a nonlinear activation function to obtain the final node representation t_i^(h+1) = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N_i} a_ij^k · W^k · t_j^(h)), where K represents the number of multi-head attention layers, σ represents the sigmoid activation function, and t_i^(h) represents the node representation at the h-th layer.
Further, step S5 specifically includes: the LSTM acquires the student's knowledge state h_t at time t through a forget gate f_t, an input gate i_t, an output gate o_t, and a cell state C_t. The forget gate simulates the student's knowledge forgetting rate during learning; at time t, f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where h_{t-1} is the student's knowledge state at time t-1 and W_f, b_f are trainable parameters. The input gate simulates the process by which a student updates the knowledge state when facing a problem, where i_t = σ(W_i · [h_{t-1}, x_t] + b_i) simulates the learning of new knowledge, C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C) simulates the change in the degree of mastery of old knowledge, and C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t combines the two to form the new cell state, where tanh represents the hyperbolic tangent activation function, W_i, b_i, W_C, b_C are trainable parameters, C_{t-1} denotes the cell state at time t-1, and ⊙ denotes element-wise multiplication of vectors. The output gate simulates how the student's knowledge state changes according to the currently learned knowledge and historical knowledge forgetting; at time t, o_t = σ(W_o · [h_{t-1}, x_t] + b_o) and h_t = o_t ⊙ tanh(C_t), where W_o, b_o are trainable parameters, h_{t-1} represents the student's knowledge state at time t-1, and σ(·) represents the sigmoid activation function.
Further, step S6 includes: the student ability θ is obtained through the formula θ = tanh(W_θ · h_t + b_θ), and the item discrimination coefficient α is obtained through the formula α = σ(W_α · x_t + b_α), where σ and tanh represent activation functions and W_θ, b_θ, W_α, b_α are trainable parameters.
Further, step S6 further includes: the difficulty representation vector d_t is obtained from the embedded representation e_t of the problem through the difficulty analysis layer, whose parameters are trainable; a matrix transformation is applied to d_t to obtain the final problem difficulty β = tanh(W_β · d_t + b_β), where W_β, b_β are trainable parameters; the prediction probability that the student answers the current problem correctly is obtained as p_t = σ(W_p · [α(θ - β)] + b_p), where W_p, b_p are trainable parameters.
Further, step S7 includes: inner products are used to evaluate the relationships between graph nodes, b̂_ij = σ(t_i' · t_j'), which converts the weight values between nodes into values in [0,1], where i, j ∈ [1, ..., |T|] and σ represents the sigmoid activation function.
The knowledge tracking method based on the graph attention network has the following beneficial effects:
The invention establishes a new knowledge tracking method based on a graph attention network. In the method, node representations containing deep information are first captured with the graph attention network: based on the constructed knowledge graph, the problem-problem, knowledge point-knowledge point, and problem-knowledge point relations are captured, and new problem representations containing deep information are obtained. An LSTM network then simulates the changes in a student's own knowledge state while answering the problem sequence, and finally, combined with an improved IRT, the student's future performance is predicted from the three aspects of student ability, problem difficulty, and item discrimination. Experimental analysis on 6 public data sets demonstrates that the knowledge tracking method based on the graph attention network can significantly improve the accuracy of the knowledge tracking task. Aiming at the shortcomings of existing knowledge tracking methods, the invention studies how to effectively mine the deep relational information in a knowledge graph: it constructs a knowledge graph based on problems and the knowledge points they contain, creatively applies the graph attention network to the knowledge tracking field, assigns different weights to different adjacent nodes, learns embedded representations of problems from deep features, and then uses the LSTM to capture the changes in students' knowledge states during the answering process, thereby accurately predicting students' future answering performance.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention. In the drawings, like reference numerals are used to identify like elements. The drawings, which are included in the description, illustrate some, but not all embodiments of the invention. Other figures can be derived from these figures by one of ordinary skill in the art without undue effort.
FIG. 1 is a schematic diagram of a knowledge tracking method based on a graph attention network according to the present invention;
FIG. 2 is a schematic diagram of the LSTM long short-term memory neural network in the knowledge tracking method based on a graph attention network according to the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be arbitrarily combined with each other.
Please refer to FIG. 1 and FIG. 2. The knowledge tracking method based on the graph attention network provided by the embodiment of the invention comprises the following steps:
S1: Based on the problem set Q and the knowledge point set C, construct a knowledge graph G = {t_i, t_j, b_ij}, where b_ij ∈ {0,1}, t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|;
S2: Based on the problem set Q, acquire the input x_t of the model;
S3: Multiply the one-hot encoding of each problem by a matrix to obtain the embedded representation e_t of the problem;
S4: Apply a self-attention mechanism to all graph nodes and, based on the knowledge graph G = {t_i, t_j, b_ij}, obtain the relationship coefficient v_ij between nodes; the GAT applies multi-head attention layers to the weight calculation of each central node and, based on the relationship coefficient v_ij, obtains the final node representation t_i';
S5: Use an LSTM to obtain how the student's knowledge state changes over time and, based on the input x_t, acquire the student's knowledge state h_t at time t;
S6: Based on the embedded representation of the problem, obtain the difficulty representation vector d_t through a difficulty analysis layer; in the IRT model, obtain the student ability θ based on the student's knowledge state h_t at time t, acquire the item discrimination coefficient α based on the input x_t, and obtain the final problem difficulty β based on the difficulty representation vector d_t; based on the item discrimination coefficient α, the student ability θ, and the final problem difficulty β, obtain the prediction probability p_t that the student answers the current problem correctly;
S7: Based on the knowledge graph, use inner products to evaluate the relationships b̂_ij between graph nodes and evaluate the embedded representations of the graph nodes through their local proximity; based on b̂_ij and b_ij, acquire the first loss function L_1; take the cross-entropy loss between the prediction probability p_t that the student answers the current problem correctly and the true result r_t as the objective function and, based on p_t and r_t, acquire the second loss function L_2; construct the joint training objective L = λ_1·L_1 + λ_2·L_2, where λ_1, λ_2 denote the trade-off coefficients controlling the local proximity loss of the nodes in the graph and the student performance prediction loss.
The invention aims to provide a knowledge tracking method and system based on a graph attention network, which can improve the performance of knowledge tracking and better help learners make personalized plans. Aiming at the shortcomings of existing knowledge tracking methods, the invention studies how to effectively mine the deep relational information in a knowledge graph and constructs a knowledge graph based on problems and the knowledge points they contain. Inspired by the graph attention network (GAT), the invention creatively applies the graph attention network to the knowledge tracking field, assigns different weights to different adjacent nodes, and learns embedded representations of problems from deep features, providing deeper information for the model and thus accurately capturing students' knowledge states; an LSTM then captures the changes in students' knowledge states during the answering process, so as to accurately predict students' future answering performance. In addition, the invention introduces and improves IRT, a theory from cognitive-educational psychology, and predicts students' future performance from the three aspects of problem difficulty, student ability, and item discrimination, further improving the performance of the model.
The problems in the original data set are preprocessed per student: the number of all problems and knowledge points in the data set is counted, and each problem and knowledge point is numbered to facilitate model training. For each student, the answer sequence is collected; each answer sequence comprises three rows of data, where the first row records the number of problems the student answered, the second row records the numbers of the problems the student answered, and the third row records whether the student answered each problem correctly, with 1 for a correct answer and 0 otherwise.
Step S1 specifically includes: the problem set Q and the knowledge point set C are merged to obtain a set T = Q ∪ C, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|; based on the set T, the knowledge graph G = {t_i, t_j, b_ij} is constructed, where b_ij ∈ {0,1}; when b_ij = 1, there is an edge between node i and node j, and vice versa.
In a real online education scenario, students usually face a plurality of problems and a plurality of knowledge points in the answering process: one problem can comprise a plurality of knowledge points, and one knowledge point can also correspond to a plurality of problems. In addition, there are associations between the problems and the knowledge points.
The invention establishes a knowledge graph to explore the relationships between problems, between knowledge points, and between problems and knowledge points. Assume that S is a student set of length |S|, Q is a problem set of length |Q|, and C is a knowledge point set of length |C|; each student s_i independently completes problems in Q, and the knowledge points contained in any problem q_i all belong to the knowledge point set C. Considering the size of the data sets and for simplicity, the problem set and the knowledge point set are merged to obtain a set T = Q ∪ C, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|. The knowledge graph is then defined as G = {t_i, t_j, b_ij}, where b_ij ∈ {0,1}; when b_ij = 1, there is an edge between node i and node j, and the adjacency matrix of the knowledge graph is defined as B ∈ {0,1}^{|T|×|T|}.
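As an illustrative sketch of this construction (the patent gives no reference code, and the rule that an edge links each problem to the knowledge points it contains is an assumption consistent with the description above), the adjacency matrix B could be assembled as follows:

```python
import numpy as np

def build_adjacency(num_problems, num_concepts, problem_to_concepts):
    """Assemble the adjacency matrix B of the knowledge graph G = {t_i, t_j, b_ij}.

    Nodes 0..|Q|-1 are problems and nodes |Q|..|T|-1 are knowledge points,
    so |T| = |Q| + |C|.  b_ij = 1 when an edge connects node i and node j
    (here: a problem and a knowledge point it contains); edges are
    treated as undirected, hence the symmetric assignment.
    """
    num_nodes = num_problems + num_concepts
    B = np.zeros((num_nodes, num_nodes), dtype=np.int8)
    for q, concepts in problem_to_concepts.items():
        for c in concepts:
            j = num_problems + c        # concept IDs are offset by |Q|
            B[q, j] = B[j, q] = 1       # symmetric: undirected edge
    return B

# Hypothetical toy mapping: problem 0 covers concepts 0 and 1, problem 1 covers concept 1.
B = build_adjacency(num_problems=2, num_concepts=2, problem_to_concepts={0: [0, 1], 1: [1]})
```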
Step S2 specifically includes: at time t, the input x_t of the model consists of two parts: the first part is the problem q_t with dimension N, and the second part is r_t; if the student answers q_t correctly, then r_t = 1, otherwise r_t = 0. If the student answers the problem correctly, r_{t,1} is spliced after q_t; if the answer is wrong, r_{t,0} is spliced after q_t, i.e. x_t = [q_t || r_{t,1}] or x_t = [q_t || r_{t,0}], where r_{t,1} and r_{t,0} are vectors of dimension N: r_{t,1} has a 1 at the corresponding question number position and 0 elsewhere, while r_{t,0} is the all-zero vector; x_t is a vector of dimension 2N.
Relevant elements are extracted from the raw data to construct the input to the model. The first step has already collected each student's answer sequence, but the information expressed by a single number is limited; therefore, each problem and its answer situation are processed in one-hot encoding format. At time t, the input x_t of the model consists of two parts. The first part is the problem q_t with dimension N, drawn from N different problems; it is a one-hot encoding in which only the position corresponding to the question number is 1 and the remaining positions are 0. The second part is r_t: if student s_i can correctly answer problem q_t, then r_t = 1, otherwise r_t = 0. For better training, r_t is converted into a representation of the same form as q_t, i.e. a one-hot encoding. If the student answers the problem correctly, r_{t,1} is spliced after q_t; if the answer is wrong, r_{t,0} is spliced after q_t.
Here r_{t,1} and r_{t,0} are vectors of dimension N: r_{t,1} has a 1 at the corresponding question number position and 0 elsewhere, while r_{t,0} is the all-zero vector; x_t is a vector of dimension 2N.
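A minimal sketch of this encoding, assuming zero-based problem IDs (the function name and signature are illustrative, not taken from the patent):

```python
import numpy as np

def encode_interaction(q_id, correct, num_problems):
    """Encode one interaction (q_t, r_t) as the 2N-dimensional input x_t.

    First N entries: one-hot problem q_t.  Last N entries: r_{t,1}
    (one-hot at the same question position) if the answer is correct,
    otherwise r_{t,0} (all zeros).
    """
    x = np.zeros(2 * num_problems, dtype=np.float32)
    x[q_id] = 1.0                          # q_t
    if correct:
        x[num_problems + q_id] = 1.0       # r_{t,1}
    return x                               # incorrect answer: second half stays r_{t,0}

x_t = encode_interaction(q_id=3, correct=True, num_problems=10)   # dimension 2N = 20
```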
Step S3 includes: the embedded representation of the problem is e_t = W_e · q_t, where W_e is a trainable parameter matrix and N represents the embedding dimension.
The input data of the knowledge tracking task comprises two parts, the problems and the corresponding student performance, and the invention needs to mine the deep information contained in the problems and knowledge points, which is independent of student performance. Therefore, the invention extracts the problem parts of the students' historical answer sequences and obtains the embedded representation of the problems by multiplying the one-hot encoding of each problem by a matrix. The embedding of the problem is expressed as follows:
e_t = W_e · q_t, where W_e is a trainable parameter matrix, N represents the embedding dimension, and e_t is the representation of the problem after embedding.
Step S4 includes: a self-attention mechanism is applied to all graph nodes, a shared weight matrix W_t is then applied to adjacent nodes, and the nonlinear activation function LeakyReLU is used to obtain the relationship coefficient v_ij between nodes: v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j]), where || represents the concatenation operation and · represents the inner product operation.
Step S4 further includes: v_ij is normalized to obtain the attention weight of the node, a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k∈N_i} exp(v_ik), where softmax represents the activation function, N_i represents the set of nodes adjacent to node t_i, and a_ij represents the attention weight of node t_i with respect to its adjacent node t_j; the GAT applies multi-head attention layers to the weight calculation of each central node, obtains K different attention heads by setting K independent attention mechanisms, and uses a nonlinear activation function to obtain the final node representation t_i^(h+1) = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N_i} a_ij^k · W^k · t_j^(h)), where K represents the number of multi-head attention layers, σ represents the sigmoid activation function, and t_i^(h) represents the node representation at the h-th layer.
Given a knowledge graph G = {t_i, t_j, b_ij}, to calculate the attention coefficients between nodes, the self-attention mechanism is first applied to all graph nodes; the shared weight matrix W_t is then applied to adjacent nodes, and the nonlinear activation function LeakyReLU is used to obtain the relationship coefficient v_ij between nodes. v_ij is calculated as follows:
v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j])
where || represents the concatenation operation and · represents the inner product operation. v_ij is normalized to obtain the attention weight of the node, with the specific formula:
a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k∈N_i} exp(v_ik)
where softmax represents the activation function, N_i represents the set of nodes adjacent to node t_i, and a_ij represents the attention weight of node t_i with respect to its adjacent node t_j. In addition, in order to learn the attention weights stably, the GAT applies multi-head attention layers to the weight calculation of each central node, obtains K different attention heads by setting K independent attention mechanisms, and uses a nonlinear activation function to obtain the final node representation. The invention sets 8 independent attention mechanisms. The specific process is as follows:
t_i^(h+1) = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N_i} a_ij^k · W^k · t_j^(h))
where K denotes the number of multi-head attention layers, σ denotes the sigmoid activation function, and t_i^(h) denotes the node representation at the h-th layer. In addition, to obtain more comprehensive node representations, the invention stacks 3 graph attention layers for training; after training through the 3-layer stack, each central node can obtain information from its 3rd-order (3-hop) neighboring nodes.
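A minimal PyTorch sketch of one such multi-head graph attention layer, as one possible reading of the formulas above (an assumption-laden illustration, not the patent's reference implementation; it averages the K heads as in the formula and assumes the adjacency matrix B includes self-loops so every node has at least one neighbor):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """One multi-head graph attention layer: computes v_ij, a_ij, and the
    head-averaged node update t_i^(h+1) = sigma((1/K) sum_k sum_j a_ij^k W^k t_j^(h))."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.W_t = nn.ModuleList([nn.Linear(dim, dim, bias=False) for _ in range(num_heads)])
        self.attn = nn.ModuleList([nn.Linear(2 * dim, 1, bias=False) for _ in range(num_heads)])

    def forward(self, T, B):          # T: [|T|, dim] node features, B: [|T|, |T|] adjacency
        n = T.size(0)
        outs = []
        for W_t, attn in zip(self.W_t, self.attn):
            Wt = W_t(T)                                          # W_t . t_i for every node
            pairs = torch.cat([Wt.unsqueeze(1).expand(n, n, -1),
                               Wt.unsqueeze(0).expand(n, n, -1)], dim=-1)
            v = F.leaky_relu(attn(pairs).squeeze(-1))            # v_ij for all node pairs
            v = v.masked_fill(B == 0, float('-inf'))             # restrict attention to neighbors N_i
            a = F.softmax(v, dim=-1)                             # a_ij
            outs.append(a @ Wt)                                  # sum_j a_ij (W_t . t_j)
        return torch.sigmoid(torch.stack(outs).mean(dim=0))      # average K heads, apply sigma
```

Stacking three such layers, as the description sets out, lets each node aggregate information from its 3-hop neighborhood.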
Step S5 specifically includes: the LSTM acquires the student's knowledge state h_t at time t through a forget gate f_t, an input gate i_t, an output gate o_t, and a cell state C_t. The forget gate simulates the student's knowledge forgetting rate during learning; at time t, f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where h_{t-1} is the student's knowledge state at time t-1 and W_f, b_f are trainable parameters. The input gate simulates the process by which a student updates the knowledge state when facing a problem, where i_t = σ(W_i · [h_{t-1}, x_t] + b_i) simulates the learning of new knowledge, C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C) simulates the change in the degree of mastery of old knowledge, and C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t combines the two to form the new cell state, where tanh represents the hyperbolic tangent activation function, W_i, b_i, W_C, b_C are trainable parameters, C_{t-1} denotes the cell state at time t-1, and ⊙ denotes element-wise multiplication of vectors. The output gate simulates how the student's knowledge state changes according to the currently learned knowledge and historical knowledge forgetting; at time t, o_t = σ(W_o · [h_{t-1}, x_t] + b_o) and h_t = o_t ⊙ tanh(C_t), where W_o, b_o are trainable parameters, h_{t-1} represents the student's knowledge state at time t-1, and σ(·) represents the sigmoid activation function.
An LSTM is employed to obtain the change in students' knowledge states over time, and it exhibits excellent performance in deep knowledge tracking tasks. The LSTM obtains the student's knowledge state h_t through three gates and one state: the forget gate f_t, the input gate i_t, the output gate o_t, and the cell state C_t. The cell state C_t passes historical information to each unit, which solves the problem that RNNs have difficulty capturing long-term dependencies.
Forget gate: in a real scenario, a student gradually forgets some previously learned knowledge over time. The forget gate uses a scalar in [0,1] to simulate the student's knowledge forgetting rate during learning; at time t, the formula is:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
Input gate: the input gate simulates the process by which a student updates the knowledge state when facing a problem, including the learning of new knowledge and the review of old knowledge. i_t simulates the learning of new knowledge, C̃_t simulates the change in the degree of mastery of old knowledge, and C_t combines the two to form the new cell state. This process receives new inputs and updates the current cell state. At time t, the formulas are:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
where tanh represents the hyperbolic tangent activation function and W_i, b_i, W_C, b_C are trainable parameters; C_{t-1} denotes the cell state at time t-1, and ⊙ denotes element-wise multiplication of vectors.
Output gate: the output gate simulates how the student's knowledge state changes according to the currently learned knowledge and historical knowledge forgetting, and outputs the student's current knowledge state h_t. At time t, the formulas are:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t ⊙ tanh(C_t)
where W_o, b_o are trainable parameters; h_{t-1} represents the student's knowledge state at time t-1, and σ(·) represents the sigmoid activation function.
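Since these are the standard LSTM gate equations, the knowledge-state tracker can be sketched directly with PyTorch's built-in LSTM (a minimal illustration; class and variable names are assumptions):

```python
import torch
import torch.nn as nn

class KnowledgeStateTracker(nn.Module):
    """Track the student's knowledge state h_t over an interaction sequence.

    nn.LSTM implements exactly the forget/input/output gate and cell-state
    updates written out above, so h_seq[:, t] corresponds to h_t.
    """
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, x_seq):          # x_seq: [batch, seq_len, 2N] inputs x_t
        h_seq, _ = self.lstm(x_seq)
        return h_seq                   # knowledge states h_1 .. h_T
```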
Step S6 includes: the student ability θ is obtained through the formula θ = tanh(W_θ · h_t + b_θ), and the item discrimination coefficient α is obtained through the formula α = σ(W_α · x_t + b_α), where σ and tanh represent activation functions and W_θ, b_θ, W_α, b_α are trainable parameters.
Step S6 further includes: the difficulty representation vector d_t is obtained from the embedded representation e_t of the problem through the difficulty analysis layer, whose parameters are trainable; a matrix transformation is applied to d_t to obtain the final problem difficulty β = tanh(W_β · d_t + b_β), where W_β, b_β are trainable parameters; the prediction probability that the student answers the current problem correctly is obtained as p_t = σ(W_p · [α(θ - β)] + b_p), where W_p, b_p are trainable parameters.
IRT, as a classical theory in statistical psychology, can quantify students' abilities and has shown powerful performance in knowledge tracking. The IRT model adopted by the invention mainly comprises three parameters: the item discrimination coefficient α, the student ability θ, and the problem difficulty β. In the knowledge tracking task, the student's ability is based on the student's knowledge state: the better the knowledge state, the stronger the ability. The item discrimination is closely related to the problem itself and to the students' answering situations; if students' answers to a given problem differ greatly, the problem discriminates well between students. The invention defines the student ability θ and the item discrimination coefficient α as follows:
θ = tanh(W_θ · h_t + b_θ)
α = σ(W_α · x_t + b_α)
where σ and tanh each represent an activation function and W_θ, b_θ, W_α, b_α are trainable parameters; h_t represents the student's knowledge state at time t.
Since the problem difficulty is related only to the problem itself, the model first obtains the difficulty representation vector d_t through the difficulty analysis layer, then applies a matrix transformation to d_t and obtains the final problem difficulty β through the tanh activation function. β is expressed as follows:
β = tanh(W_β · d_t + b_β)
The invention expands these parameters into a multidimensional space so that each parameter is represented by a plurality of different features, in order to predict students' future answering performance more accurately. The prediction process is as follows:
p_t = σ(W_p · [α(θ - β)] + b_p)
where W_p, b_p are trainable parameters and p_t ∈ [0,1] represents the probability that the student correctly answers the problem at time t. In this model, p_t ∈ [0, 0.5] is judged as an incorrect answer; otherwise the student is judged to have answered correctly.
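A minimal sketch of this improved-IRT prediction head (layer dimensions and the class name are assumptions; the computations follow the definitions of θ, α, β, and p_t above):

```python
import torch
import torch.nn as nn

class IRTPredictor(nn.Module):
    """Predict p_t from ability theta, discrimination alpha, and difficulty beta."""

    def __init__(self, hidden_dim, input_dim, diff_dim, param_dim):
        super().__init__()
        self.ability = nn.Linear(hidden_dim, param_dim)    # W_theta, b_theta
        self.discrim = nn.Linear(input_dim, param_dim)     # W_alpha, b_alpha
        self.difficulty = nn.Linear(diff_dim, param_dim)   # W_beta, b_beta
        self.out = nn.Linear(param_dim, 1)                 # W_p, b_p

    def forward(self, h_t, x_t, d_t):
        theta = torch.tanh(self.ability(h_t))              # student ability
        alpha = torch.sigmoid(self.discrim(x_t))           # item discrimination
        beta = torch.tanh(self.difficulty(d_t))            # final problem difficulty
        return torch.sigmoid(self.out(alpha * (theta - beta)))   # p_t in [0, 1]
```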
Step S7 includes: inner products are used to evaluate the relationships between graph nodes, b̂_ij = σ(t_i' · t_j'), which converts the weight values between nodes into values in [0,1], where i, j ∈ [1, ..., |T|] and σ represents the sigmoid activation function.
To train the model better, the invention establishes a joint training framework to optimize the model parameters. Inner products are used to evaluate the relationship between graph nodes, expressed as follows:
b̂_ij = σ(t_i' · t_j')
where i, j ∈ [1, ..., |T|] and σ represents the sigmoid activation function, converting the weight values between nodes into values in [0,1]. To make the learned node representations closer to the real result, the invention defines the local proximity of the graph nodes to evaluate their embedded representations, with the loss function shown below:
L_1 = -Σ_{i,j} [ b_ij · log b̂_ij + (1 - b_ij) · log(1 - b̂_ij) ]
To enable the model to model students' knowledge states more accurately and thus predict students' future answering performance, the invention uses the cross-entropy loss between the prediction probability p_t that the student answers the current problem correctly and the true result r_t as the objective function; the specific loss function is defined as:
L_2 = -Σ_t [ r_t · log p_t + (1 - r_t) · log(1 - p_t) ]
For these two training objectives, the invention constructs a joint training objective, expressed as:
L = λ_1·L_1 + λ_2·L_2
where λ_1, λ_2 denote the trade-off coefficients controlling the local proximity loss of the nodes in the graph and the student performance prediction loss.
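A compact sketch of the joint objective (the function name and default weights are placeholders; the patent treats λ_1, λ_2 as tunable trade-off coefficients):

```python
import torch.nn.functional as F

def joint_loss(b_hat, B, p, r, lambda1=1.0, lambda2=1.0):
    """L = lambda1 * L1 + lambda2 * L2 from step S7.

    b_hat: sigmoid inner products between final node representations;
    B: 0/1 adjacency labels b_ij; p: predicted probabilities p_t;
    r: 0/1 true results r_t.
    """
    L1 = F.binary_cross_entropy(b_hat, B.float())   # local proximity loss on graph edges
    L2 = F.binary_cross_entropy(p, r.float())       # student performance prediction loss
    return lambda1 * L1 + lambda2 * L2
```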
The invention carried out a large number of experiments to find suitable hyperparameters. Specifically, each data set is randomly split into 80% for training and 20% for testing, where the test set is used to evaluate model performance and to stop training early. All experiments use 5-fold cross-validation, and all models are evaluated on the average of 5 runs. The invention trains the model with the Adam optimizer, sets the maximum number of training epochs to 200, sets the maximum gradient norm for clipping to 5.0, and sets the learning rate within [0.001, 0.01]; the weight matrices and biases in the network are initialized from a normal distribution with mean 0 and standard deviation 0.01. The training batch size is typically set to 64 but is adjusted to the size of the data set; for example, the ASSIST2012 data set is large, so its training batch size is set to 32. To prevent overfitting, Dropout layers are added to the model, with the rate set to 0.5 during training.
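The setup above can be summarized in a single configuration sketch (the key names are illustrative, not taken from the patent's code):

```python
# Hypothetical consolidation of the experimental setup described above.
train_config = {
    "optimizer": "Adam",
    "max_epochs": 200,
    "grad_clip_norm": 5.0,
    "learning_rate_range": (0.001, 0.01),       # searched within this interval
    "weight_init": {"mean": 0.0, "std": 0.01},  # normal initialization of weights and biases
    "batch_size": 64,                           # 32 for the large ASSIST2012 data set
    "dropout": 0.5,
    "train_test_split": (0.8, 0.2),
    "cross_validation_folds": 5,
}
```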
To verify the advantages of the invention in solving the knowledge tracking task, this embodiment conducts experiments on 6 public data sets, namely ASS09-up, ASSIST2012, Statics2011, Synthetic, AICFE-math, and AICFE-phy. The method is also compared with 4 recent knowledge tracking models, namely the hidden-Markov-based Bayesian knowledge tracing model (BKT), the knowledge tracing machines model (KTM), the deep knowledge tracing model (DKT), and the dynamic key-value memory network model (DKVMN). The AUC (Area Under Curve) metric is used to measure the models' results; it is the area enclosed under the ROC curve, and the closer the AUC value is to 1, the better the model performs and the closer its predictions are to reality. Table 1 shows the comparison of the 5 methods on the 6 public data sets; from the results, the knowledge tracking model based on the graph attention network provided by the invention is clearly superior to prior art schemes.
Table 1. Performance comparison of the 5 methods on the 6 data sets
The embodiments described above may be implemented alone or in various combinations, and such variations are within the scope of the present invention.
It should be noted that, in the description of the present application, terms indicating orientation or positional relationships such as "upper end," "lower end," and "bottom end" are based on the orientations or positional relationships shown in the drawings, or the orientations or positional relationships in which the product of the application is conventionally placed in use, and are merely for convenience of describing the present application and simplifying the description; they do not indicate or imply that the device referred to must have a specific orientation or be configured and operated in a specific orientation, and therefore should not be construed as limiting the present application. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting. Although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A knowledge tracking method based on a graph attention network, comprising the steps of:
S1: Based on the problem set Q and the knowledge point set C, constructing a knowledge graph G = {t_i, t_j, b_ij}, where b_ij ∈ {0,1}, t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|;
S2: Based on the problem set Q, acquiring the input x_t of the model;
S3: Multiplying the one-hot encoding of each problem by a matrix to obtain the embedded representation e_t of the problem;
S4: Applying a self-attention mechanism to all graph nodes and, based on the knowledge graph G = {t_i, t_j, b_ij}, obtaining the relationship coefficient v_ij between nodes; the GAT applying multi-head attention layers to the weight calculation of each central node and, based on the relationship coefficient v_ij, obtaining the final node representation t_i';
S5: Using an LSTM to obtain how the student's knowledge state changes over time and, based on the input x_t, acquiring the student's knowledge state h_t at time t;
S6: Based on the embedded representation of the problem, obtaining the difficulty representation vector d_t through a difficulty analysis layer; in the IRT model, obtaining the student ability θ based on the student's knowledge state h_t at time t, acquiring the item discrimination coefficient α based on the input x_t, and obtaining the final problem difficulty β based on the difficulty representation vector d_t; based on the item discrimination coefficient α, the student ability θ, and the final problem difficulty β, obtaining the prediction probability p_t that the student answers the current problem correctly;
S7: Based on the knowledge graph, using inner products to evaluate the relationships b̂_ij between graph nodes and evaluating the embedded representations of the graph nodes through their local proximity; based on b̂_ij and b_ij, acquiring the first loss function L_1; taking the cross-entropy loss between the prediction probability p_t that the student answers the current problem correctly and the true result r_t as the objective function and, based on p_t and r_t, acquiring the second loss function L_2; constructing the joint training objective L = λ_1·L_1 + λ_2·L_2, where λ_1, λ_2 denote the trade-off coefficients controlling the local proximity loss of the nodes in the graph and the student performance prediction loss.
2. The knowledge tracking method based on a graph attention network according to claim 1, wherein step S1 specifically includes: the problem set Q and the knowledge point set C are merged to obtain a set T = Q ∪ C, where t_i ∈ {q_j, c_m}, j ∈ |Q|, m ∈ |C|, and |T| = |Q| + |C|; based on the set T, the knowledge graph G = {t_i, t_j, b_ij} is constructed, where b_ij ∈ {0,1}; when b_ij = 1, there is an edge between node i and node j, and vice versa.
3. The knowledge tracking method based on a graph attention network according to claim 2, wherein step S2 specifically includes: at time t, the input x_t of the model consists of two parts: the first part is the problem q_t with dimension N, and the second part is r_t; if the student answers q_t correctly, then r_t = 1, otherwise r_t = 0; if the student answers the problem correctly, r_{t,1} is spliced after q_t; if the answer is wrong, r_{t,0} is spliced after q_t, i.e. x_t = [q_t || r_{t,1}] or x_t = [q_t || r_{t,0}], where r_{t,1} and r_{t,0} are vectors of dimension N: r_{t,1} has a 1 at the corresponding question number position and 0 elsewhere, while r_{t,0} is the all-zero vector; x_t is a vector of dimension 2N.
4. The knowledge tracking method based on a graph attention network according to claim 2 or 3, wherein step S3 includes: the embedded representation of the problem is e_t = W_e · q_t, where W_e is a trainable parameter matrix and N represents the embedding dimension.
5. The knowledge tracking method based on a graph attention network according to claim 2 or 3, wherein step S4 includes: a self-attention mechanism is applied to all graph nodes, a shared weight matrix W_t is then applied to adjacent nodes, and the nonlinear activation function LeakyReLU is used to obtain the relationship coefficient v_ij between nodes: v_ij = LeakyReLU(W · [W_t · t_i || W_t · t_j]), where || represents the concatenation operation and · represents the inner product operation.
6. The knowledge tracking method based on a graph attention network according to claim 5, wherein step S4 further includes: v_ij is normalized to obtain the attention weight of the node, a_ij = softmax(v_ij) = exp(v_ij) / Σ_{k∈N_i} exp(v_ik), where softmax represents the activation function, N_i represents the set of nodes adjacent to node t_i, and a_ij represents the attention weight of node t_i with respect to its adjacent node t_j; the GAT applies multi-head attention layers to the weight calculation of each central node, obtains K different attention heads by setting K independent attention mechanisms, and uses a nonlinear activation function to obtain the final node representation t_i^(h+1) = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N_i} a_ij^k · W^k · t_j^(h)), where K represents the number of multi-head attention layers, σ represents the sigmoid activation function, and t_i^(h) represents the node representation at the h-th layer.
7. The knowledge tracking method based on a graph attention network according to claim 2 or 3, wherein step S5 specifically includes: the LSTM acquires the student's knowledge state h_t at time t through a forget gate f_t, an input gate i_t, an output gate o_t, and a cell state C_t. The forget gate simulates the student's knowledge forgetting rate during learning; at time t, f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where h_{t-1} is the student's knowledge state at time t-1 and W_f, b_f are trainable parameters. The input gate simulates the process by which a student updates the knowledge state when facing a problem, where i_t = σ(W_i · [h_{t-1}, x_t] + b_i) simulates the learning of new knowledge, C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C) simulates the change in the degree of mastery of old knowledge, and C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t combines the two to form the new cell state, where tanh represents the hyperbolic tangent activation function, W_i, b_i, W_C, b_C are trainable parameters, C_{t-1} denotes the cell state at time t-1, and ⊙ denotes element-wise multiplication of vectors. The output gate simulates how the student's knowledge state changes according to the currently learned knowledge and historical knowledge forgetting; at time t, o_t = σ(W_o · [h_{t-1}, x_t] + b_o) and h_t = o_t ⊙ tanh(C_t), where W_o, b_o are trainable parameters, h_{t-1} represents the student's knowledge state at time t-1, and σ(·) represents the sigmoid activation function.
8. The knowledge tracking method based on a graph attention network according to claim 2 or 3, wherein step S6 includes: the student ability θ is obtained through the formula θ = tanh(W_θ · h_t + b_θ), and the item discrimination coefficient α is obtained through the formula α = σ(W_α · x_t + b_α), where σ and tanh represent activation functions and W_θ, b_θ, W_α, b_α are trainable parameters.
9. The knowledge tracking method based on a graph attention network according to claim 8, wherein step S6 further includes: the difficulty representation vector d_t is obtained from the embedded representation e_t of the problem through the difficulty analysis layer, whose parameters are trainable; a matrix transformation is applied to d_t to obtain the final problem difficulty β = tanh(W_β · d_t + b_β), where W_β, b_β are trainable parameters; the prediction probability that the student answers the current problem correctly is obtained as p_t = σ(W_p · [α(θ - β)] + b_p), where W_p, b_p are trainable parameters.
10. The knowledge tracking method based on a graph attention network according to claim 2 or 3, wherein step S7 includes: inner products are used to evaluate the relationships between graph nodes, b̂_ij = σ(t_i' · t_j'), which converts the weight values between nodes into values in [0,1], where i, j ∈ [1, ..., |T|] and σ represents the sigmoid activation function.
CN202210479195.7A 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network Active CN114911975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210479195.7A CN114911975B (en) 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210479195.7A CN114911975B (en) 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network

Publications (2)

Publication Number Publication Date
CN114911975A CN114911975A (en) 2022-08-16
CN114911975B true CN114911975B (en) 2024-04-05

Family

ID=82765831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210479195.7A Active CN114911975B (en) 2022-05-05 2022-05-05 Knowledge tracking method based on graph attention network

Country Status (1)

Country Link
CN (1) CN114911975B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116151329B (en) * 2023-04-23 2023-07-18 山东师范大学 Student knowledge state tracking method and system based on inverse fact graph learning
CN116166998B (en) * 2023-04-25 2023-07-07 合肥师范学院 Student performance prediction method combining global and local features
CN116976434B (en) * 2023-07-05 2024-02-20 长江大学 Knowledge point diffusion representation-based knowledge tracking method and storage medium
CN117077737B (en) * 2023-08-22 2024-03-15 长江大学 Knowledge tracking system for dynamic collaboration of knowledge points
CN117057422B (en) * 2023-08-23 2024-04-02 长江大学 Knowledge tracking system for global knowledge convergence sensing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116092A (en) * 2020-08-11 2020-12-22 浙江师范大学 Interpretable knowledge level tracking method, system and storage medium
CN113033808A (en) * 2021-03-08 2021-06-25 西北大学 Deep embedded knowledge tracking method based on exercise difficulty and student ability
CN114021722A (en) * 2021-10-30 2022-02-08 华中师范大学 Attention knowledge tracking method integrating cognitive portrayal
CN114385801A (en) * 2021-12-27 2022-04-22 河北工业大学 Knowledge tracking method and system based on hierarchical refinement LSTM network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11631338B2 (en) * 2020-06-11 2023-04-18 Act, Inc. Deep knowledge tracing with transformers

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116092A (en) * 2020-08-11 2020-12-22 浙江师范大学 Interpretable knowledge level tracking method, system and storage medium
CN113033808A (en) * 2021-03-08 2021-06-25 西北大学 Deep embedded knowledge tracking method based on exercise difficulty and student ability
CN114021722A (en) * 2021-10-30 2022-02-08 华中师范大学 Attention knowledge tracking method integrating cognitive portrayal
CN114385801A (en) * 2021-12-27 2022-04-22 河北工业大学 Knowledge tracking method and system based on hierarchical refinement LSTM network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马骁睿; 徐圆; 朱群雄. A personalized exercise recommendation method combining deep knowledge tracing. 小型微型计算机系统 (Journal of Chinese Computer Systems), 2020, (05), full text. *

Also Published As

Publication number Publication date
CN114911975A (en) 2022-08-16

Similar Documents

Publication Publication Date Title
CN114911975B (en) Knowledge tracking method based on graph attention network
CN109829541A (en) Deep neural network incremental training method and system based on learning automaton
CN108257052B (en) Online student knowledge assessment method and system
CN110321361A (en) Test question recommendation and judgment method based on improved LSTM neural network model
CN109840595B (en) Knowledge tracking method based on group learning behavior characteristics
CN113344053B (en) Knowledge tracking method based on examination question different composition representation and learner embedding
CN113591988B (en) Knowledge cognitive structure analysis method, system, computer equipment, medium and terminal
CN114385801A (en) Knowledge tracking method and system based on hierarchical refinement LSTM network
CN112085168A (en) Knowledge tracking method and system based on dynamic key value gating circulation network
CN114372137A (en) Dynamic perception test question recommendation method and system integrating depth knowledge tracking
CN114861754A (en) Knowledge tracking method and system based on external attention mechanism
CN114021722A (en) Attention knowledge tracking method integrating cognitive portrayal
CN113361685A (en) Knowledge tracking method and system based on learner knowledge state evolution expression
CN115329096A (en) Interactive knowledge tracking method based on graph neural network
CN115545160A (en) Knowledge tracking method and system based on multi-learning behavior cooperation
CN115544158A (en) Multi-knowledge-point dynamic knowledge tracking method applied to intelligent education system
CN117540104A (en) Learning group difference evaluation method and system based on graph neural network
Yunusov et al. Shapley values to explain machine learning models of school student’s academic performance during COVID-19
CN114997461B (en) Time-sensitive answer correctness prediction method combining learning and forgetting
CN117094859A (en) Learning path recommendation method and system combining graph neural network and multi-layer perceptron
CN114117033B (en) Knowledge tracking method and system
Ma et al. Dtkt: An improved deep temporal convolutional network for knowledge tracing
CN111898803A (en) Exercise prediction method, system, equipment and storage medium
CN114742292A (en) Knowledge tracking process-oriented two-state co-evolution method for predicting future performance of students
CN112906293A (en) Machine teaching method and system based on review mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant