CN116796041B - Learning path recommendation method, system, device and medium based on knowledge tracking - Google Patents

Learning path recommendation method, system, device and medium based on knowledge tracking

Info

Publication number
CN116796041B
CN116796041B (application CN202310546269.9A)
Authority
CN
China
Prior art keywords
knowledge
vector
learning
target object
moment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310546269.9A
Other languages
Chinese (zh)
Other versions
CN116796041A (en)
Inventor
陈展轩
吴正洋
汤庸
张广涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN202310546269.9A priority Critical patent/CN116796041B/en
Publication of CN116796041A publication Critical patent/CN116796041A/en
Application granted granted Critical
Publication of CN116796041B publication Critical patent/CN116796041B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9035Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a learning path recommendation method, system, device and medium based on knowledge tracking, which can be widely applied in the technical field of educational information processing. According to the invention, a first vector, obtained by embedding the learning records at preset moments, is input into a time convolution network, which enlarges the sequence learning range of the knowledge tracking process. During knowledge tracking, a third vector representation obtained through a graph attention network is combined with the embedded vector of the question recommended at the current moment to predict the target object's answer probability at the next moment, so that the dynamically changing knowledge state of the target object can be extracted. The third vector representation is then input into a reinforcement learning model as the historical knowledge mastering state vector, so that a higher-quality learning path can be obtained.

Description

Learning path recommendation method, system, device and medium based on knowledge tracking
Technical Field
The invention relates to the technical field of educational information processing, and in particular to a learning path recommendation method, system, device and medium based on knowledge tracking.
Background
In the related art, existing methods adopt a knowledge tracking model based on a recurrent neural network (RNN) to obtain students' knowledge states; their low prediction accuracy degrades the quality of learning path recommendation. For example, existing models do not set a target knowledge state for prediction but only push the knowledge state upward, which does not match the actual teaching situation: the mastery level a learner is expected to reach on different knowledge points is ignored. Meanwhile, existing models consider only the difference in mastery before and after the currently recommended knowledge point as the reward, so the recommendation is judged only at a local moment; the global situation and the learning target are not considered, which again departs from the actual teaching situation and leaves the recommendation quality unsatisfactory.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides a learning path recommending method, a system, a device and a medium based on knowledge tracking, which can effectively improve the quality of learning path recommendation.
In one aspect, an embodiment of the present invention provides a learning path recommendation method based on knowledge tracking, including the following steps:
acquiring questions and knowledge skills as entities, and constructing a knowledge learning graph, wherein the knowledge learning graph is used for representing the relationships between entity nodes;
acquiring an input sequence, and embedding it to obtain a first vector, wherein each element of the input sequence represents a learning record at a preset moment, and the learning record comprises a learning question and the target object's answer to that question;
inputting the first vector into a time convolution network to obtain a second vector representation, wherein the time convolution network is used for processing the vector of each element in the input sequence over time;
inputting the second vector representation into a graph attention network to update and reason over the knowledge learning graph, the graph attention network learning the association relations between nodes through a self-attention mechanism so as to update the feature vectors of the nodes, and finally average-pooling the knowledge skill nodes to obtain a third vector representation, wherein the graph attention network is used for processing the association dependencies of each element in the input sequence;
predicting the question prediction probability of the target object at the next moment according to the third vector representation and the embedded vector of the question recommended at the current moment;
and inputting the third vector representation as a historical knowledge mastering state vector into a reinforcement learning model to obtain a personalized learning path of the target object.
In some embodiments, the inputting the third vector representation as a historical knowledge mastering state vector into a reinforcement learning model to obtain a personalized learning path of the target object includes:
adding the question and the question prediction probability into the historical input sequence, and re-processing through the embedding, the time convolution network and the graph attention mechanism to obtain the knowledge mastering state vector at the current moment, wherein the time node corresponding to the historical knowledge mastering state vector is located before the current moment;
and inputting the knowledge mastering state vector at the current moment into the reinforcement learning model to obtain the personalized learning path of the target object.
In some embodiments, the inputting the knowledge mastering state vector at the current time into the reinforcement learning model to obtain the personalized learning path of the target object includes:
inputting the knowledge mastering state vector at the current moment into the reinforcement learning model, and comparing the knowledge mastering state vector at the current moment with a target knowledge mastering state vector to obtain a first difference;
and when the first difference meets a preset state, determining that the reinforcement learning model training is completed, and generating a personalized learning path of the target object.
In some embodiments, when the reinforcement learning model is trained, the stability of the knowledge mastering state of the target object is measured through a global change stability condition;
the expression formula of the global change stability condition is as follows:
F_t = X(s_1, s_2, …, s_t)
F_t represents the global change stability condition; X represents converting the states into a tensor and then calculating a gap value through a linear transformation, the gap value being used to measure the stability of the state change; s_1, s_2, …, s_t represent the knowledge mastering state vectors at all moments.
In some embodiments, when the reinforcement learning model is trained, the difference between the mastery degrees of the knowledge skill associated with the question recommended at the current moment, before and after the recommendation, is compared through a second difference; wherein the second difference is expressed by the following formula:
A_t represents the second difference; p_{k_i,t} represents the answer probability at the current moment t of the knowledge skill k_i associated with the question q_t recommended at the current moment t; p_{k_i,t-1} represents the answer probability of the knowledge skill k_i at the previous moment; n_{k_i,t} represents the number of times the knowledge skill k_i associated with the recommended question q_t has been recommended.
In some embodiments, the reward function expression of the reinforcement learning model is as follows:
r_t = Y(C_t, F_t, A_t)
r_t denotes the reward; C_t denotes the first difference; F_t denotes the global change stability condition; A_t denotes the second difference; Y(C_t, F_t, A_t) denotes mapping C_t, F_t and A_t to [0,1] through a multi-layer perceptron.
In some embodiments, predicting the topic prediction probability of the target object at the next time according to the third vector representation and the vector with the embedded recommended topic at the current time includes:
concatenating the third vector representation with the embedded vector of the question recommended at the current moment, and inputting the result into a residual network to obtain first hidden output data;
inputting the first hidden output data into a dense block to obtain second hidden output data;
and inputting the second hidden output data into a full-connection layer, and obtaining the question prediction probability of the target object at the next moment through a sigmoid activation function.
On the other hand, an embodiment of the invention provides a learning path recommendation system based on knowledge tracking, which comprises:
a first module, used for acquiring questions and knowledge skills as entities and constructing a knowledge learning graph, wherein the knowledge learning graph is used for representing the relations between entity nodes;
the second module is used for acquiring an input sequence, embedding the input sequence to obtain a first vector, wherein elements of the input sequence are used for representing a learning record at a preset moment, and the learning record comprises learning questions and knowledge answers of target objects to the learning questions;
a third module, configured to input the first vector into a time convolution network to obtain a second vector representation, where the time convolution network is configured to process a vector of each element in the input sequence over time;
a fourth module, configured to input the second vector representation into a graph attention network to update and reason over the knowledge learning graph, the graph attention network learning the association relations between nodes through a self-attention mechanism so as to update the feature vectors of the nodes, and finally average-pooling the knowledge skill nodes to obtain a third vector representation, where the graph attention network is configured to process the association dependencies of each element in the input sequence;
a fifth module, configured to predict a topic prediction probability of the target object at a next time according to the third vector representation and a vector with the recommendation topic embedded at the current time;
and a sixth module, configured to input the third vector representation as a historical knowledge mastering state vector into a reinforcement learning model, and obtain a personalized learning path of the target object.
On the other hand, the embodiment of the invention provides a learning path recommending device based on knowledge tracking, which comprises the following components:
at least one memory for storing a program;
at least one processor for loading the program to perform the learning path recommendation method based on knowledge tracking.
In another aspect, an embodiment of the present invention provides a computer storage medium in which a computer-executable program is stored, where the computer-executable program is used to implement the learning path recommendation method based on knowledge tracking when executed by a processor.
The learning path recommending method based on knowledge tracking provided by the embodiment of the invention has the following beneficial effects:
according to the method, the first vector after embedding the learning record representing the preset moment is input into the time convolution network, so that the sequence learning range of the knowledge tracking process can be enlarged, and when the knowledge is tracked, the third vector obtained through the graph meaning network represents the question prediction probability of the vector prediction target object after embedding the recommended questions in combination with the current moment at the next moment, so that the knowledge state of dynamic change of the target object can be extracted and obtained, and then the third vector is used as a historical knowledge grasping state vector to be input into the reinforcement learning model, so that a learning path with higher quality can be obtained.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The invention is further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flowchart of a learning path recommendation method based on knowledge tracking according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a knowledge tracking model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a data processing of a reinforcement learning model according to an embodiment of the present invention;
FIG. 4 is a training schematic diagram of a reinforcement learning model according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that references to orientation descriptions such as upper, lower, front, rear, left, right, etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of description of the present invention and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more and "a plurality" means two or more; greater than, less than, exceeding and the like are understood to exclude the stated number, while above, below, within and the like are understood to include the stated number. The terms "first" and "second" are used only to distinguish technical features and should not be construed as indicating or implying relative importance, implicitly indicating the number of the indicated technical features, or implicitly indicating the precedence of the indicated technical features.
In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present invention can be reasonably determined by a person skilled in the art in combination with the specific contents of the technical scheme.
In the description of the present invention, the descriptions of the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Before proceeding with the description of specific embodiments, the terms involved in the embodiments of the present application are explained as follows:
Knowledge tracking: a basic and key task supporting intelligent education service applications. It aims to monitor the continuously developing knowledge state of students, so as to provide an optimal and adaptive learning experience for each student, allocate learning time reasonably, and improve teaching quality and efficiency. Knowledge tracking adopts a series of machine learning methods oriented to sequence modeling in order to dynamically predict students' knowledge states from learning interaction data, and is currently widely applied in intelligent education systems.
Time convolution knowledge tracking: time-convolution sequence processing is added to knowledge tracking, so that the predicted probability of answering the next question correctly is more accurate.
Reinforcement learning: a family of algorithms that model the interaction between an agent and an environment, which maps very naturally onto the education process: the learner can be regarded as the agent, the learning resources as the environment, the learner's selection of learning resources as the agent's actions on the environment, and the learning effect obtained after the learner studies the relevant knowledge as the environment's reward for the agent's action; recommending an optimal learning path for a target learner is then the process of the agent obtaining the action sequence with the highest return in the environment.
Learning path recommendation: a learning path is a sequence of path nodes constructed for a target learner, guiding the learner to complete a given learning objective within a specified time.
Learning path recommendation with reinforcement learning based on knowledge tracking uses knowledge tracking to predict the student's knowledge state, which serves as the state input of reinforcement learning; according to the state and the configured reward policy, reinforcement learning recommends exercises with high reward, forming a training closed loop, and the finally recommended exercise sequence is the recommended learning path. Existing methods adopt an RNN-based knowledge tracking model to obtain students' knowledge states, and their low prediction accuracy degrades the quality of learning path recommendation; moreover, existing models are neither specific nor comprehensive in setting the reinforcement learning strategy, which departs from the actual teaching situation. For example, existing models do not set a target knowledge state for prediction but only push the knowledge state upward, ignoring the mastery level a learner is expected to reach on different knowledge points. Meanwhile, existing models consider only the difference in mastery before and after the currently recommended knowledge point as the reward, so the recommendation is judged only at a local moment; the global situation and the learning target are not considered, which again departs from the actual teaching situation and leaves the recommendation quality unsatisfactory.
Based on this, referring to fig. 1, an embodiment of the present invention provides a learning path recommendation method based on knowledge tracking, and the embodiment may be applied to a server, a processor or a cloud corresponding to an education platform. During application, the method of the present embodiment includes, but is not limited to, the following steps:
step S110, acquiring questions and knowledge skills as entities, and constructing a knowledge learning graph, wherein the knowledge learning graph is used for representing the relations between entity nodes;
step S120, an input sequence is obtained, and a first vector is obtained by embedding, wherein elements of the input sequence are used for representing learning records at preset moments, and the learning records comprise learning questions and knowledge answers of target objects to the learning questions;
step S130, inputting the first vector into a time convolution network to obtain a second vector representation, wherein the time convolution network is used for processing the vector of each element in the input sequence over time;
step S140, inputting the second vector representation into a graph attention network to update and reason over the knowledge learning graph, the graph attention network learning the association relations between nodes through a self-attention mechanism so as to update the feature vectors of the nodes, and finally average-pooling the knowledge skill nodes to obtain a third vector representation, wherein the graph attention network is used for processing the association dependencies of each element in the input sequence;
step S150, predicting the question prediction probability of the target object at the next moment according to the third vector representation and the vector after the recommendation question at the current moment is embedded;
and step S160, inputting the third vector representation as a historical knowledge mastering state vector into a reinforcement learning model to obtain a personalized learning path of the target object.
In the embodiment of the present application, a knowledge learning graph G_kl is first constructed so that the graph attention network can act on all graph nodes using a self-attention mechanism. Let V_kl and E_kl denote the entities and association relations involved in knowledge learning, respectively. The entities emphasized in this embodiment are the knowledge questions Q and the related knowledge skills SK, i.e. V_kl = Q ∪ SK, where the knowledge skills SK are mapped from the knowledge questions Q. Second, E_kl = Q × SK expresses the relations between the elements of the question set and of the knowledge skill set. The knowledge learning graph is finally defined as G_kl = (V_kl, E_kl).
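As an illustration only (not part of the claimed method), the construction of G_kl can be sketched in a few lines of Python; the question-to-skill mapping q2sk used below is a hypothetical input, since the text only states that the knowledge skills SK are mapped from the questions Q.

# Minimal sketch: assembling G_kl = (V_kl, E_kl) from a question -> skills mapping.
from collections import defaultdict

def build_knowledge_graph(q2sk):
    """q2sk: hypothetical mapping question id -> list of associated skill ids."""
    questions = set(q2sk)                                     # entity set Q
    skills = {k for ks in q2sk.values() for k in ks}          # entity set SK
    nodes = questions | skills                                # V_kl = Q ∪ SK
    edges = {(q, k) for q, ks in q2sk.items() for k in ks}    # E_kl ⊆ Q × SK
    adjacency = defaultdict(set)
    for q, k in edges:                                        # undirected question-skill links
        adjacency[q].add(k)
        adjacency[k].add(q)
    return nodes, edges, adjacency

# Example: question "q1" exercises skills "k1" and "k2", question "q2" exercises "k2".
nodes, edges, adj = build_knowledge_graph({"q1": ["k1", "k2"], "q2": ["k2"]})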
In order to construct the learning path recommendation model, all input data must first be vectorized. For example, the input sequence is acquired as seq = {x_1, x_2, …, x_{t-1}}, where x_i denotes the learning record at moment i, x_i = {q_i, a_{s,i}}, consisting of the knowledge question q_i ∈ Q recorded at moment i (Q denotes the complete set of knowledge questions) and learner s's answer a_{s,i} ∈ [0,1]. Assuming |Q| denotes the total number of knowledge questions, a knowledge question q_i can be expressed as a vector e_i ∈ R^d (d is the vector dimension), and the vector space of the whole question set is E ∈ R^{|Q|×d}; the answer a_{s,i} is likewise expressed as a vector, and the learning record x_i is expressed as the concatenation of the knowledge question vector and the answer vector. In order to reflect the temporal change, this embodiment further defines a time-sequence change vector, as specified in formula (1) and formula (2):
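A minimal sketch of this record embedding, assuming PyTorch and illustrative dimensions (the module name and sizes are not taken from the patent): each record x_i = (q_i, a_{s,i}) becomes the first vector by concatenating a learned question embedding with an answer embedding.

import torch
import torch.nn as nn

class RecordEmbedding(nn.Module):
    def __init__(self, num_questions, dim=64):
        super().__init__()
        self.q_emb = nn.Embedding(num_questions, dim)   # question vector e_i
        self.a_emb = nn.Embedding(2, dim)               # answer a_{s,i} in {0, 1}

    def forward(self, questions, answers):              # (batch, time) integer tensors
        # Concatenating question and answer embeddings gives the first vector x_i.
        return torch.cat([self.q_emb(questions), self.a_emb(answers)], dim=-1)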
where f_q2e denotes the knowledge question transfer function, which may be implemented by an MLP (multilayer perceptron) or the like. Next, the time convolution network (TCN, Temporal Convolutional Network) is responsible for processing the learning records {x_1, x_2, …, x_{t-1}} as a time sequence to obtain the final second vector representation. The TCN mainly uses causal convolution and dilated convolution to observe longer input sequences and capture long-term temporal dependencies. Causal convolution ensures that the output at moment t depends only on inputs before moment t, preventing leakage of future information. At the same time, capturing longer dependencies would normally require stacking many layers linearly; introducing dilated convolution into the TCN allows the convolutional network to use fewer layers while obtaining a larger receptive field. The graph attention network (GAT, Graph Attention Network) is responsible for processing the association dependencies of the learning records {x_1, x_2, …, x_{t-1}}; the GAT uses a multi-head attention layer for the weight calculation of each central node so as to update the node feature vectors, and finally the knowledge skill nodes are average-pooled to obtain the final third vector representation.
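The sequence side of the encoder could look roughly as follows, a sketch under the assumption of PyTorch; layer counts, dilations and dimensions are illustrative, and the graph attention step is only stood in for by a mean pooling.

import torch
import torch.nn as nn

class CausalConvBlock(nn.Module):
    def __init__(self, dim, kernel_size=3, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation          # pad the past only -> causal
        self.conv = nn.Conv1d(dim, dim, kernel_size, dilation=dilation)

    def forward(self, x):                                     # x: (batch, dim, time)
        x = nn.functional.pad(x, (self.left_pad, 0))          # no access to future steps
        return torch.relu(self.conv(x))

class TemporalEncoder(nn.Module):
    def __init__(self, dim=128):                              # matches the record-embedding width above
        super().__init__()
        self.tcn = nn.Sequential(CausalConvBlock(dim, dilation=1),
                                 CausalConvBlock(dim, dilation=2),
                                 CausalConvBlock(dim, dilation=4))   # growing receptive field

    def forward(self, first_vectors):                         # (batch, time, dim) record embeddings
        h = self.tcn(first_vectors.transpose(1, 2)).transpose(1, 2)  # second vector per time step
        # Placeholder for the graph attention network: mean pooling stands in for the
        # averaged knowledge-skill node features that form the third vector representation.
        third_vector = h.mean(dim=1)
        return h, third_vector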
In this embodiment, as shown in fig. 2, the knowledge tracking model includes an encoder and a decoder. The processing by the time convolution network and the graph attention network described above completes the encoder, which models the stochastic process of the learner's historical learning interactions and outputs the encoder feature vector. In the decoder, the third vector representation output by the encoder is combined, through a residual network (ResNet), with the embedded vector of the question q_t recommended at the current moment t; the residual network yields the first hidden output data of the decoder, the first hidden output data is input into a dense block to obtain second hidden output data, and the second hidden output data is input into a fully connected layer followed by a sigmoid activation function to obtain the prediction for the student at the future moment, namely the probability p_t of answering question q_t correctly, projected as the probability value p(a_t = 1). This probability serves as the feedback learning criterion, and the third vector representation also serves as an input of the reinforcement learning model. Residual networks and dense blocks have proven to be effective ways of training deep networks: they allow information to be transmitted across layers, alleviating the vanishing-gradient problem, and they carry lower-level features to higher levels, improving prediction accuracy.
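A sketch of the decoder head described above, again assuming PyTorch; the widths of the residual layer, dense block and output layer are assumptions, not values taken from the patent.

import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    def __init__(self, state_dim=128, q_dim=64, hidden=128):
        super().__init__()
        self.res_in = nn.Linear(state_dim + q_dim, hidden)
        self.res_hidden = nn.Linear(hidden, hidden)              # residual branch
        self.dense = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())   # dense block
        self.out = nn.Linear(hidden, 1)                          # fully connected output layer

    def forward(self, third_vec, q_emb):                         # (batch, state_dim), (batch, q_dim)
        z = torch.relu(self.res_in(torch.cat([third_vec, q_emb], dim=-1)))
        h1 = z + torch.relu(self.res_hidden(z))                  # first hidden output (residual)
        h2 = self.dense(h1)                                      # second hidden output (dense block)
        return torch.sigmoid(self.out(h2)).squeeze(-1)           # p_t = p(a_t = 1) for question q_t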
In this embodiment, after the question prediction probability at the next moment is obtained, the third vector obtained after re-processing through the embedding, the time convolution network and the graph attention mechanism is used as the input of the reinforcement learning model to obtain the knowledge question q_{t+1} recommended at the next moment t+1, finally forming the personalized learning path. Specifically, as shown in fig. 3, the question and the question prediction probability are added into the historical input sequence, and re-processing through the embedding, the time convolution network and the graph attention mechanism yields the knowledge mastering state vector s_t at the current moment, where the time node corresponding to the historical knowledge mastering state vector is located before the current moment. The knowledge mastering state vector at the current moment is then input into the reinforcement learning model to obtain the personalized learning path of the target object. It can be understood that when the knowledge mastering state vector at the current moment is input into the reinforcement learning model, it can be compared with the target knowledge mastering state vector to obtain a first difference; when the first difference satisfies a preset state, it is determined that the training of the reinforcement learning model is completed, and the personalized learning path of the target object is generated.
It can be appreciated that the reinforcement learning model adds the question prediction probability p_t generated at moment t, together with the question q_t, into the historical sequence; after re-processing through the embedding, the time convolution network and the graph attention mechanism, the knowledge mastering state vector s_t at moment t is obtained as the input state of reinforcement learning. Before training starts, a target knowledge mastering state vector G is preset, representing the mastery level the learner is expected to reach for each knowledge skill k_i. After s_t is generated, s_t is compared with G to obtain the first difference, as shown in equation (3):
where C_t represents the difference between the current mastering state and the target mastering state of the knowledge skills k_i. If |C_t − ε| = 0, i.e. the first difference satisfies the preset state, the training of the reinforcement learning model is complete and the generated question sequence is the recommended learning path; otherwise, the reinforcement learning network needs further training. As shown in fig. 4, during training all historical states s_1, s_2, …, s_t need to be considered as the global change stability condition F_t, which measures whether the state changes steadily and thus helps reach the goal. The expression of the global change stability condition F_t is shown in formula (4):
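Since equation (3) itself is not reproduced in the text, the following sketch uses an assumed form of the first difference, a mean absolute gap between s_t and G over all knowledge skills.

import torch

def first_difference(s_t: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """s_t, target: (num_skills,) mastery probabilities in [0, 1]."""
    return (s_t - target).abs().mean()          # assumed form of C_t

def target_reached(s_t, target, eps=0.05):
    # Training stops once C_t falls to the tolerance eps (the text's |C_t - ε| = 0 condition).
    return first_difference(s_t, target).item() <= eps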
F_t = X(s_1, s_2, …, s_t)    formula (4)
F_t represents the global change stability condition; X represents converting the states into a tensor and then calculating a gap value through a linear transformation, the gap value being used to measure the stability of the state change; s_1, s_2, …, s_t represent the knowledge mastering state vectors at all moments.
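A sketch of how X could be realised (the linear layer and the use of step-to-step deltas are assumptions; the text only says the states are converted into a tensor and a gap value is computed through a linear transformation).

import torch
import torch.nn as nn

class GlobalStability(nn.Module):
    def __init__(self, num_skills):
        super().__init__()
        self.score = nn.Linear(num_skills, 1)      # linear transformation producing the gap value

    def forward(self, state_history):              # list of (num_skills,) tensors s_1 .. s_t
        states = torch.stack(state_history)        # convert the states into a (t, num_skills) tensor
        if states.shape[0] < 2:                    # nothing to compare yet
            return torch.zeros(())
        deltas = (states[1:] - states[:-1]).abs().mean(dim=0)   # average step-to-step change
        return self.score(deltas).squeeze()        # scalar stability value F_t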
In the present embodiment, in order to compare the mastery degrees of the knowledge skill k_i associated with the question q_t recommended at moment t before and after the recommendation, this embodiment uses a second difference A_t, whose expression is shown in formula (5):
where A_t represents the second difference; p_{k_i,t} represents the answer probability at the current moment t of the knowledge skill k_i associated with the recommended question q_t; p_{k_i,t-1} represents the answer probability of the knowledge skill k_i at the previous moment; n_{k_i,t} represents the number of times the knowledge skill k_i associated with the recommended question q_t has been recommended.
In this embodiment, A_t is set to prevent the model from repeatedly recommending high-reward questions related to the same knowledge skill k_i, thereby improving the diversity of the recommended results.
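Formula (5) is not reproduced in the text, so the sketch below uses an assumed combination of the three quantities: the mastery gain on k_i damped by how often k_i has already been recommended.

def second_difference(p_now: float, p_prev: float, times_recommended: int) -> float:
    """p_now, p_prev: answer probabilities of skill k_i at moments t and t-1;
    times_recommended: n_{k_i,t}, how often k_i has already been recommended."""
    gain = p_now - p_prev                        # improvement in mastery of k_i
    return gain / max(times_recommended, 1)      # damp repeated recommendations of the same skill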
For the reinforcement learning recommendation model of this embodiment, the reward is defined as the return obtained by recommending an appropriate question; therefore, this embodiment combines equations (3), (4) and (5) to design the reward function shown in equation (6):
r_t = Y(C_t, F_t, A_t)    formula (6)
where r_t denotes the reward, C_t the first difference, F_t the global change stability condition, A_t the second difference, and Y(C_t, F_t, A_t) denotes mapping C_t, F_t and A_t to [0,1] through a multi-layer perceptron. As shown in fig. 4, s_t is input into the Actor and Critic networks, and r_t is input into the Critic network, which is responsible for evaluating the quality of the policy produced by the Actor at the previous moment. The Actor executes an action according to the state and the policy, selects the related knowledge skill k_i from equation (3), and predicts the personalized recommended question q_{t+1}; at the same time, question q_{t+1} is input into the knowledge tracking network to predict the answer probability p_{t+1}, thereby forming a training closed loop.
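A sketch of the reward head Y (assuming PyTorch; the hidden width is illustrative): a small multi-layer perceptron with a sigmoid output maps the three signals C_t, F_t and A_t to a reward in [0, 1].

import torch
import torch.nn as nn

class RewardHead(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, c_t: float, f_t: float, a_t: float) -> torch.Tensor:
        feats = torch.tensor([[c_t, f_t, a_t]], dtype=torch.float32)
        return self.mlp(feats).squeeze()          # r_t in [0, 1], fed to the Critic network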
The termination condition of the reinforcement learning model is that the state s_t generated at moment t reaches the initially set target knowledge mastering state vector G, i.e. |C_t − ε| = 0; the sequence of questions recommended before moment t is then the personalized learning path LP recommended by the model in this embodiment, which can be expressed by formula (7):
LP = {q_1, q_2, …, q_t | q_i ∈ Q}    formula (7)
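Putting the pieces together, the recommendation loop and its termination condition can be sketched as follows; actor and knowledge_tracker are hypothetical objects standing in for the Actor network and the knowledge tracking model, with assumed method names.

def recommend_learning_path(actor, knowledge_tracker, target, max_steps=100, eps=0.05):
    history, path = [], []
    for _ in range(max_steps):
        s_t = knowledge_tracker.current_state(history)        # knowledge mastering state s_t
        if (s_t - target).abs().mean().item() <= eps:         # assumed reading of |C_t - ε| = 0
            break                                             # target mastery reached
        q_next = actor.select_question(s_t)                   # personalized recommendation q_{t+1}
        p_next = knowledge_tracker.predict(history, q_next)   # predicted answer probability p_{t+1}
        history.append((q_next, p_next))                      # closes the training loop
        path.append(q_next)
    return path                                               # LP = {q_1, q_2, ..., q_t}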
Based on the above, experiments were designed for the above embodiments. Specifically, the datasets used are ASSIST09, ASSIST12 and EdNet, and the baseline methods include RA and GA. For the comparison index, a target G must first be set; when each knowledge skill k_i in the state s_{t+1} generated at moment t+1 reaches the predicted value set in the target, the student is regarded as having mastered the target, and the length of the personalized learning path recommended up to that point serves as an evaluation index. To observe whether the knowledge skills k_i related to the recommended questions are sufficiently diverse, the number of knowledge skills k_i associated with the recommended questions is counted as an evaluation index of diversity.
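The two evaluation measures can be computed straightforwardly; q2sk below is the same hypothetical question-to-skill mapping used earlier.

def path_length(path):
    return len(path)                                          # shorter paths to the target are better

def skill_diversity(path, q2sk):
    # Number of distinct knowledge skills k_i touched by the recommended questions.
    return len({k for q in path for k in q2sk.get(q, [])})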
The specific experimental steps are as follows:
step one, the number of questions |Q|, the number of knowledge skills |SK|, and a knowledge learning diagram is constructed, wherein the entity v kl =Q∪SK,ε kl Representing the relation related to knowledge learning, and finally defining a knowledge learning graph as follows: g kl =(v klkl )。
Step two, vectorize the original input sequence seq = {x_1, x_2, …, x_{t-1}} and concatenate each question vector with the corresponding answer vector. The time convolution network processes the learning records {x_1, x_2, …, x_{t-1}} as a time sequence to obtain the corresponding vector representation; the graph attention network processes the association dependencies of the learning records {x_1, x_2, …, x_{t-1}}, uses a multi-head attention layer for the weight calculation of each central node so as to update the node feature vectors, and finally average-pools the knowledge skill nodes to obtain the final vector representation s_{t-1}.
Step three, through the residual network (ResNet), combine the encoder's feature vector output s_{t-1} with the embedded vector of the question q_t recommended at moment t to obtain the first hidden output data; then obtain the decoder's second hidden output data through the dense block; finally, through the fully connected layer and the sigmoid activation function, obtain the probability prediction of the student's future performance, i.e. the probability p_t of answering question q_t correctly.
Step four, add the question q_t at moment t together with its prediction probability p_t into the historical input sequence, and after re-processing through the embedding, the time convolution network and the graph attention mechanism obtain the knowledge mastering state vector s_t at moment t as the input state of reinforcement learning.
Step five, the reinforcement learning model predicts the relevant knowledge skill k_i according to the policy and outputs the personalized recommended question q_{t+1}; at the same time, question q_{t+1} is input into the knowledge tracking network to predict the answer probability p_{t+1}, thereby forming a training closed loop and optimizing the reinforcement learning strategy.
It can be seen that this embodiment considers the knowledge tracking part and the reinforcement learning part as a whole. Combining knowledge tracking with the time convolution network enlarges the sequence learning range, extracts the dynamically changing knowledge states of students, and predicts the probability of answering questions correctly more accurately; the reinforcement learning part is given a target mastery level to be reached, making it more targeted and practically meaningful. Moreover, whereas the reward setting of existing reinforcement learning approaches is too simple, considering only the difference in mastery before and after the recommended question, this embodiment composes the reward from the overall target, the state of each round in the recommendation process, and the difference in mastery of the knowledge skills related to the recommended question before and after the recommendation; considering both local and global aspects makes the recommendation more accurate and reasonable.
An embodiment of the invention provides a learning path recommendation system based on knowledge tracking, which comprises:
a first module, used for acquiring questions and knowledge skills as entities and constructing a knowledge learning graph, wherein the knowledge learning graph is used for representing the relations between entity nodes;
the second module is used for acquiring an input sequence and embedding the input sequence to obtain a first vector, wherein elements of the input sequence are used for representing learning records at preset moments, and the learning records comprise learning questions and knowledge answers of target objects to the learning questions;
a third module for inputting the first vector into a time convolution network to obtain a second vector representation, wherein the time convolution network is used for processing the vector of each element in the input sequence over time;
a fourth module, configured to input the second vector representation into a graph attention network to update and reason over the knowledge learning graph, the graph attention network learning the association relations between nodes through a self-attention mechanism so as to update the feature vectors of the nodes, and finally average-pooling the knowledge skill nodes to obtain a third vector representation, where the graph attention network is configured to process the association dependencies of each element in the input sequence;
a fifth module, configured to predict, according to the third vector representation and the vector after the recommendation topic at the current time is embedded, a topic prediction probability of the target object at the next time;
and a sixth module, configured to input the third vector representation as a historical knowledge mastering state vector into the reinforcement learning model, and obtain a personalized learning path of the target object.
The content of the method embodiment of the invention is suitable for the system embodiment, the specific function of the system embodiment is the same as that of the method embodiment, and the achieved beneficial effects are the same as those of the method.
The embodiment of the invention provides a learning path recommending device based on knowledge tracking, which comprises the following components:
at least one memory for storing a program;
at least one processor for loading the program to perform the learning path recommendation method based on knowledge tracking shown in fig. 1.
The content of the method embodiment of the invention is suitable for the device embodiment, the specific function of the device embodiment is the same as that of the method embodiment, and the achieved beneficial effects are the same as those of the method.
An embodiment of the present invention provides a computer storage medium in which a computer-executable program is stored, which when executed by a processor is configured to implement the learning path recommendation method based on knowledge tracking shown in fig. 1.
The content of the method embodiment of the invention is applicable to the storage medium embodiment, the specific function of the storage medium embodiment is the same as that of the method embodiment, and the achieved beneficial effects are the same as those of the method.
Furthermore, embodiments of the present invention provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the learning path recommendation method based on knowledge tracking shown in fig. 1.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of one of ordinary skill in the art without departing from the spirit of the present invention. Furthermore, embodiments of the invention and features of the embodiments may be combined with each other without conflict.

Claims (7)

1. The learning path recommending method based on knowledge tracking is characterized by comprising the following steps of:
acquiring questions and knowledge skills as entities, and constructing a knowledge learning graph, wherein the knowledge learning graph is used for representing the relationships between entity nodes;
acquiring an input sequence, and embedding to obtain a first vector, wherein elements of the input sequence are used for representing a learning record at a preset moment, and the learning record comprises learning questions and knowledge answers of target objects to the learning questions;
inputting the first vector into a time convolution network to obtain a second vector representation, wherein the time convolution network is used for processing the vector of each element in the input sequence over time;
inputting the second vector representation into a graph attention network to update and reason over the knowledge learning graph, the graph attention network learning the association relations between nodes through a self-attention mechanism so as to update the feature vectors of the nodes, and finally average-pooling the knowledge skill nodes to obtain a third vector representation, wherein the graph attention network is used for processing the association dependencies of each element in the input sequence;
predicting the question prediction probability of the target object at the next moment according to the third vector representation and the vector with the embedded recommended questions at the current moment;
after obtaining the question prediction probability of the next moment, inputting the third vector representation as a historical knowledge mastering state vector into a reinforcement learning model to obtain a personalized learning path of the target object;
the predicting the question prediction probability of the target object at the next moment according to the third vector representation and the vector with the embedded recommended questions at the current moment comprises the following steps:
concatenating the third vector representation with the embedded vector of the question recommended at the current moment, and inputting the result into a residual network to obtain first hidden output data;
inputting the first hidden output data into a dense block to obtain second hidden output data;
inputting the second hidden output data into a full-connection layer and obtaining the question prediction probability of the target object at the next moment through a sigmoid activation function;
inputting the third vector representation as a historical knowledge mastering state vector into a reinforcement learning model to obtain a personalized learning path of the target object, wherein the personalized learning path comprises the following steps:
adding the question and the question prediction probability into the historical input sequence, and re-processing through the embedding, the time convolution network and the graph attention mechanism to obtain the knowledge mastering state vector at the current moment, wherein the time node corresponding to the historical knowledge mastering state vector is located before the current moment;
inputting the knowledge mastering state vector at the current moment into the reinforcement learning model to obtain a personalized learning path of the target object;
the step of inputting the knowledge mastering state vector at the current moment into the reinforcement learning model to obtain the personalized learning path of the target object, comprises the following steps:
inputting the knowledge mastering state vector at the current moment into the reinforcement learning model, and comparing the knowledge mastering state vector at the current moment with a target knowledge mastering state vector to obtain a first difference;
and when the first difference meets a preset state, determining that the reinforcement learning model training is completed, and generating a personalized learning path of the target object.
2. The learning path recommendation method based on knowledge tracking according to claim 1, wherein the stability of knowledge mastering state of the target object is measured by global change stability condition when the reinforcement learning model is trained;
the expression formula of the global change stability condition is as follows:
F_t = X(s_1, s_2, …, s_t)
where F_t represents the global change stability condition; X represents converting the states into a tensor and then calculating a gap value through a linear transformation, the gap value being used to measure the stability of the state change; s_1, s_2, …, s_t represent the knowledge mastering state vectors at all moments.
3. The learning path recommendation method based on knowledge tracking according to claim 2, wherein, when the reinforcement learning model is trained, the difference between the mastery degrees of the knowledge skill associated with the question recommended at the current moment, before and after the recommendation, is compared through a second difference; wherein the second difference is expressed by the following formula:
where A_t represents the second difference; p_{k_i,t} represents the answer probability at the current moment t of the knowledge skill k_i associated with the question q_t recommended at the current moment t; p_{k_i,t-1} represents the answer probability of the knowledge skill k_i at the previous moment; n_{k_i,t} represents the number of times the knowledge skill k_i associated with the recommended question q_t has been recommended.
4. The learning path recommendation method based on knowledge tracking according to claim 3, wherein the reward function of the reinforcement learning model is expressed by the following formula:
r_t = Y(C_t, F_t, A_t)
where r_t denotes the reward; C_t denotes the first difference; F_t denotes the global change stability condition; A_t denotes the second difference; and Y(C_t, F_t, A_t) denotes mapping C_t, F_t and A_t to [0,1] through a multi-layer perceptron.
5. A learning path recommendation system based on knowledge tracking, comprising:
a first module, used for acquiring questions and knowledge skills as entities and constructing a knowledge learning graph, wherein the knowledge learning graph is used for representing the relations between entity nodes;
the second module is used for acquiring an input sequence, embedding the input sequence to obtain a first vector, wherein elements of the input sequence are used for representing a learning record at a preset moment, and the learning record comprises learning questions and knowledge answers of target objects to the learning questions;
a third module, configured to input the first vector into a time convolution network to obtain a second vector representation, where the time convolution network is configured to process a vector of each element in the input sequence over time;
a fourth module, configured to input the second vector representation into a graph attention network to update and reason over the knowledge learning graph, the graph attention network learning the association relations between nodes through a self-attention mechanism so as to update the feature vectors of the nodes, and finally average-pooling the knowledge skill nodes to obtain a third vector representation, where the graph attention network is configured to process the association dependencies of each element in the input sequence;
a fifth module, configured to predict a topic prediction probability of the target object at a next time according to the third vector representation and a vector with the recommendation topic embedded at the current time;
a sixth module, configured to input the third vector representation as a historical knowledge grasping state vector into a reinforcement learning model after obtaining a topic prediction probability at a next moment, so as to obtain a personalized learning path of the target object;
the predicting the question prediction probability of the target object at the next moment according to the third vector representation and the vector with the embedded recommended questions at the current moment comprises the following steps:
concatenating the third vector representation with the embedded vector of the question recommended at the current moment, and inputting the result into a residual network to obtain first hidden output data;
inputting the first hidden output data into a dense block to obtain second hidden output data;
inputting the second hidden output data into a full-connection layer and obtaining the question prediction probability of the target object at the next moment through a sigmoid activation function;
inputting the third vector representation as a historical knowledge mastering state vector into a reinforcement learning model to obtain a personalized learning path of the target object, wherein the personalized learning path comprises the following steps:
adding the question and the question prediction probability into the historical input sequence, and re-processing through the embedding, the time convolution network and the graph attention mechanism to obtain the knowledge mastering state vector at the current moment, wherein the time node corresponding to the historical knowledge mastering state vector is located before the current moment;
inputting the knowledge mastering state vector at the current moment into the reinforcement learning model to obtain a personalized learning path of the target object;
the step of inputting the knowledge mastering state vector at the current moment into the reinforcement learning model to obtain the personalized learning path of the target object, comprises the following steps:
inputting the knowledge mastering state vector at the current moment into the reinforcement learning model, and comparing the knowledge mastering state vector at the current moment with a target knowledge mastering state vector to obtain a first difference;
and when the first difference meets a preset state, determining that the reinforcement learning model training is completed, and generating a personalized learning path of the target object.
6. A learning path recommendation device based on knowledge tracking, comprising:
at least one memory for storing a program;
at least one processor for loading the program to perform the learning path recommendation method based on knowledge tracking according to any one of claims 1-4.
7. A computer storage medium, in which a computer-executable program is stored, wherein the computer-executable program, when executed by a processor, is used to implement the learning path recommendation method based on knowledge tracking according to any one of claims 1-4.
CN202310546269.9A 2023-05-15 2023-05-15 Learning path recommendation method, system, device and medium based on knowledge tracking Active CN116796041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310546269.9A CN116796041B (en) 2023-05-15 2023-05-15 Learning path recommendation method, system, device and medium based on knowledge tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310546269.9A CN116796041B (en) 2023-05-15 2023-05-15 Learning path recommendation method, system, device and medium based on knowledge tracking

Publications (2)

Publication Number Publication Date
CN116796041A CN116796041A (en) 2023-09-22
CN116796041B true CN116796041B (en) 2024-04-02

Family

ID=88048910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310546269.9A Active CN116796041B (en) 2023-05-15 2023-05-15 Learning path recommendation method, system, device and medium based on knowledge tracking

Country Status (1)

Country Link
CN (1) CN116796041B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101465058A (en) * 2009-01-05 2009-06-24 天津大学 Method for making decision of inductive unit of intelligent dynamic route inductive system
CN113268611A (en) * 2021-06-24 2021-08-17 北京邮电大学 Learning path optimization method based on deep knowledge tracking and reinforcement learning
CN114372137A (en) * 2022-01-11 2022-04-19 重庆邮电大学 Dynamic perception test question recommendation method and system integrating depth knowledge tracking
CN115249072A (en) * 2022-05-16 2022-10-28 西安交通大学 Reinforced learning path planning method based on generation of confrontation user model
CN115329959A (en) * 2022-07-19 2022-11-11 华中师范大学 Learning target recommendation method based on double-flow knowledge embedded network
CN115329096A (en) * 2022-08-19 2022-11-11 上海伯禹教育科技有限公司 Interactive knowledge tracking method based on graph neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11631338B2 (en) * 2020-06-11 2023-04-18 Act, Inc. Deep knowledge tracing with transformers
US20220261668A1 (en) * 2021-02-12 2022-08-18 Tempus Labs, Inc. Artificial intelligence engine for directed hypothesis generation and ranking

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101465058A (en) * 2009-01-05 2009-06-24 天津大学 Method for making decision of inductive unit of intelligent dynamic route inductive system
CN113268611A (en) * 2021-06-24 2021-08-17 北京邮电大学 Learning path optimization method based on deep knowledge tracking and reinforcement learning
CN114372137A (en) * 2022-01-11 2022-04-19 重庆邮电大学 Dynamic perception test question recommendation method and system integrating depth knowledge tracking
CN115249072A (en) * 2022-05-16 2022-10-28 西安交通大学 Reinforced learning path planning method based on generation of confrontation user model
CN115329959A (en) * 2022-07-19 2022-11-11 华中师范大学 Learning target recommendation method based on double-flow knowledge embedded network
CN115329096A (en) * 2022-08-19 2022-11-11 上海伯禹教育科技有限公司 Interactive knowledge tracking method based on graph neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Exercise recommendation based on knowledge concept prediction; Zhengyang Wu; Knowledge-Based Systems; Vol. 210; full text *
Learning from Interpretable Analysis: Attention-Based Knowledge Tracing; Yong T; Artificial Intelligence in Education; full text *
Survey of research on personalized learning recommendation; Wu Zhengyang; Journal of Frontiers of Computer Science and Technology; full text *
Survey of data-driven personalized adaptive learning research; Zhu Jia; Zhang Lijun; Liang Wanying; Journal of South China Normal University (Natural Science Edition) (No. 04); full text *

Also Published As

Publication number Publication date
CN116796041A (en) 2023-09-22

Similar Documents

Publication Publication Date Title
Yang et al. Study on student performance estimation, student progress analysis, and student potential prediction based on data mining
CN111813921B (en) Topic recommendation method, electronic device and computer-readable storage medium
Gan et al. Knowledge structure enhanced graph representation learning model for attentive knowledge tracing
CN112116092B (en) Interpretable knowledge level tracking method, system and storage medium
Cai et al. Learning path recommendation based on knowledge tracing model and reinforcement learning
CN114896899B (en) Multi-agent distributed decision method and system based on information interaction
CN115545160B (en) Knowledge tracking method and system for multi-learning behavior collaboration
CN114429212A (en) Intelligent learning knowledge ability tracking method, electronic device and storage medium
JP5029090B2 (en) Capability estimation system and method, program, and recording medium
Lin Learning information recommendation based on text vector model and support vector machine
CN116796041B (en) Learning path recommendation method, system, device and medium based on knowledge tracking
JP6832410B1 (en) Learning effect estimation device, learning effect estimation method, program
CN111898803A (en) Exercise prediction method, system, equipment and storage medium
Kondo et al. Modeling of learning process based on Bayesian networks
CN114707775B (en) Knowledge tracking model training method, tracking method, device, equipment and medium
CN116595245A (en) Hierarchical reinforcement learning-based lesson admiring course recommendation system method
CN115391710A (en) Social media account influence evaluation method and device based on graph neural network
CN111882124B (en) Homogeneous platform development effect prediction method based on generation confrontation simulation learning
Wang et al. POEM: a personalized online education scheme based on reinforcement learning
US20230138245A1 (en) Skill visualization device, skill visualization method, and skill visualization program
Gao et al. Improving Knowledge Learning Through Modelling Students’ Practice-Based Cognitive Processes
Keurulainen et al. Amortised Design Optimization for Item Response Theory
CN116166998B (en) Student performance prediction method combining global and local features
CN115952838B (en) Self-adaptive learning recommendation system-based generation method and system
CN117474077B (en) Auxiliary decision making method and device based on OAR model and reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant