CN116680477A - Personalized problem recommendation method based on reinforcement learning - Google Patents


Info

Publication number: CN116680477A
Authority: CN (China)
Prior art keywords: learner, model, reinforcement learning, personalized, network
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN202310703313.2A
Other languages: Chinese (zh)
Inventors: 张天成, 李季, 李捷, 张馨艺, 于明鹤, 于戈
Original assignee: 东北大学
Application filed by 东北大学; priority to CN202310703313.2A
Publication of CN116680477A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442 Recurrent networks, e.g. Hopfield networks, characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/092 Reinforcement learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a personalized problem recommendation method based on reinforcement learning, and relates to the technical field of education data mining. The method first obtains the learning record of the learner, estimates the learner's potential knowledge level through a knowledge tracking model, and uses this potential knowledge level as part of the learner's features, making the feature modeling of the learner more accurate. A reinforcement learning algorithm then deletes, from the problem records, the unsatisfactory problems that the learner selected by mistake, thereby improving the recommendation accuracy. Finally, problems are recommended to the learner through a personalized recommendation model. The invention combines personalized recommendation, knowledge tracking and reinforcement learning algorithms, considers the potential knowledge level of the learner, removes the influence caused by mistakenly selected problems in the learning process, and has important theoretical and practical application value.

Description

Personalized problem recommendation method based on reinforcement learning
Technical Field
The invention relates to the technical field of education data mining, in particular to a personalized problem recommendation method based on reinforcement learning.
Background
The development of emerging information and communication technologies such as mobile communications, the Internet of Things, cloud computing, big data and artificial intelligence is changing how people think, produce, live and learn. Education is now developing toward networked, digital, personalized, ubiquitous and intelligent forms, and many new education modes, such as mobile learning, ubiquitous learning, smart learning and blended learning, are emerging.
In recent years, online learning has emerged as a personalized learning mode. Thanks to its convenience, openness and rich learning resources, it has attracted a large number of learners. In the new generation of Internet-based learning environments, learning time is more flexible, learning methods are more varied, and learning resources are more abundant; learners can arrange their own learning time, learning mode and learning resources according to their situation and learning goals.
However, unlike a conventional classroom, an online education platform cannot supervise and guide learners in real time, which gives rise to the problems of 'information overload' and 'knowledge disorientation'. These problems mainly manifest as follows: when facing a large number of high-quality learning resources, learners often need a great deal of time to find the resources they are interested in, do not know how to plan their learning, and sometimes cannot complete their learning effectively even after spending considerable time. This can reduce learning efficiency, learning quality and learning enthusiasm, and increase the risk of learning failure. These problems have drawn the attention of many educators and researchers, and how to use computers, in place of teachers, to guide and assist learners has gradually become a popular research direction.
To address the difficulty online learners have in finding problem resources of interest among massive learning resources, a feasible personalized problem recommendation method is needed so that the learning efficiency of learners can be greatly improved. Designing such a method requires considering the following three issues:
first, how to accurately construct the learner's features.
A conventional personalized recommendation model, whether based on matrix factorization, a recurrent neural network or an attention mechanism, models the learner's features from the learner's problem records when solving the problem recommendation task and does not consider the learner's performance on the exercises, which can cause the following issue: suppose learner i and learner j have essentially the same problem records but different performance on those problems, for example learner i answered most of the exercises correctly while learner j answered most of them incorrectly; the exercises they select at the next moment are then likely to be different.
It can be seen that building the learner's features only from the problems the learner has done is not accurate enough. How to take the learner's potential knowledge level into account when modeling the learner is a primary concern.
Second, how to remove the influence of mistakenly selected problems in the learning process.
A learner often selects unsatisfactory problems, for example problems whose difficulty or category is not what the learner wanted, but the problem records do not contain the learner's satisfaction with each problem, so these mistakenly selected problems become interference terms when modeling the learner's interest features. Although researchers have tried to distinguish the importance of problems by assigning different attention coefficients to each of the learner's historical problems through an attention mechanism, the influence of these mistakenly selected problems still cannot be completely eliminated. How to remove this influence is a necessary issue to consider.
Third, how to recommend problems accurately.
After considering the learner's potential knowledge level and removing the influence of mistakenly selected problems, the final task is to recommend problems to the learner accurately. Which personalized recommendation algorithm to choose is therefore an important issue to consider.
Handling the problems encountered in online education with reinforcement-learning-related algorithms is a research hotspot in current educational data mining. By combining a knowledge tracking model, a personalized recommendation model and a reinforcement learning model, the learner's potential knowledge level can be considered and the influence of mistakenly selected problems removed, effectively alleviating the information-overload problem in online education. Personalized problem recommendation through reinforcement learning is thus a promising way to improve learners' learning efficiency.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a personalized problem recommendation method based on reinforcement learning that combines a knowledge tracking model and a personalized recommendation model, so as to solve the practical problem that learners in online education have difficulty finding learning resources of interest.
In order to solve the above technical problems, the invention adopts the following technical scheme:
A personalized problem recommendation method based on reinforcement learning comprises the following steps:
step 1: calculating the potential knowledge level of the learner by using a knowledge tracking model, and adding it to the feature construction of the personalized recommendation model and the state representation of the problem record modification model;
step 2: constructing and training a personalized recommendation model for problem recommendation;
step 3: designing and training a problem record modification model based on the Deep Q-Learning algorithm of reinforcement learning to remove disliked or unsatisfactory problems selected by mistake during the learning process;
step 4: performing joint training on the personalized recommendation model and the problem record modification model;
step 5: modifying the learner's problem records by using the problem record modification model obtained after the joint training in step 4, and recommending problems to the learner by using the personalized recommendation model obtained after the joint training in step 4, so as to obtain a problem recommendation list.
Further, in the step 1, the knowledge tracking model is the deep knowledge tracking model DKT; the DKT model uses a long short-term memory network LSTM to exploit the temporal relations in the learner's historical learning record and predicts the learner's performance on questions at the next moment; the DKT model first converts the learner's historical results into one-hot vectors through one-hot encoding, inputs the one-hot vectors into the LSTM network, extracts features through the LSTM layer, feeds the extracted features into a hidden layer, and then outputs the prediction result from the output layer; the output of the DKT model represents the probability that the learner answers each question correctly, i.e., the learner's performance on the next answer; the output of the LSTM layer is taken as the potential knowledge level of the learner and added to the feature construction of the personalized recommendation model and the state representation of the problem record modification model. The input of the DKT model is the exercise record of the learner; the exercise record of learner i at time t consists of the question selected by learner i at time t and the answer result of learner i at time t; the problem record contains only the exercises that learner i chose to study, whereas the exercise record also records learner i's answer results.
Further, the personalized recommendation model in the step 2 comprises three parts, namely an Embedding layer, a GRU layer and a fully connected layer; the Embedding layer is used for mapping the one-hot vectors of the problem record made by the learner into a low-dimensional vector space for encoding; the GRU layer is a gated recurrent unit layer, an improved recurrent neural network, used for extracting the sequence features of the problem record; the fully connected layer is used for calculating, from the learner's features, the probability that the learner selects each problem, and problems are recommended to the learner according to the magnitude of the selection probabilities.
Further, the specific method of the step 2 is as follows:
step 2-1: through the Embedding layer, mapping the one-hot vectors of the problem record E_i made by learner i into a low-dimensional vector space for encoding, the output at time t being the low-dimensional vector e_t^i;
step 2-2: extracting the sequence features of the problem record through the GRU layer;
the update gate of the GRU determines how much of the state information from the previous moment and the current moment continues to be passed into the future, and is calculated as follows:

z_t = σ(W_z · [h_{t-1}, e_t^i])

wherein e_t^i is the low-dimensional vector representation of the problem done by learner i at time t, h_{t-1} is the hidden state information at time t-1, W_z is the weight coefficient of the update gate, and σ(·) is the sigmoid activation function;
the reset gate of the GRU layer determines how much of the state information of the previous moment is to be forgotten, and is calculated as follows:

r_t = σ(W_r · [h_{t-1}, e_t^i])

wherein W_r is the weight coefficient of the reset gate;
the current memory content is calculated as follows:

h̃_t = tanh(W_h · [r_t * h_{t-1}, e_t^i])

wherein W_h is another weight coefficient, the element-wise product r_t * h_{t-1} of the reset gate r_t and the hidden state information h_{t-1} determines how much information from the previous moment is kept, and * is an operator denoting the element-wise product;
the final memory of the current time step is calculated as follows:

h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t

wherein (1 - z_t) * h_{t-1} is the amount of information from the previous moment that is retained in the final memory at the current moment, and z_t * h̃_t is the amount of the current memory content retained in the final memory at the current moment; the resulting h_t is the sequence feature of the learner's problem record;
step 2-3: through the fully connected layer, calculating from the learner's features the probability that the learner selects each problem, as shown in the following formula:

y = softmax(W_j · [K_i, h_t] + b_j)

wherein W_j is the weight coefficient of the fully connected layer, b_j is the bias coefficient of the fully connected layer, and K_i is the potential knowledge level of learner i calculated by the DKT model; [K_i, h_t] is the concatenation of the potential knowledge level K_i of learner i with the sequence feature h_t of the learner's problem record obtained by the GRU layer; softmax(·) is an activation function that limits the output values to between 0 and 1;
step 2-4: the personalized recommendation model is trained and updated using cross entropy as the loss function, calculated as follows:

L = -(1/M) Σ_{i=1}^{M} p_i · log(q_i)

wherein M is the number of learners, p_i is the true probability distribution of the problem selected by learner i at the next moment, and q_i is the predicted probability distribution, given by the personalized recommendation model, of the problem selected by learner i at the next moment;
the cross entropy loss function is an index that measures the difference between the true probability distribution p and the model's predicted probability distribution q;
step 2-5: sorting the probabilities of learner i selecting each problem, as calculated by the personalized recommendation model, in descending order, and recommending the first K problems to learner i to form the problem recommendation list.
Further, the problem record modification model in the step 3 adopts reinforcement-learning-related algorithms, including the action representation, the state representation, the reward function of the model and the reinforcement learning algorithm, specifically as follows:
in order to delete problems that the learner dislikes or is dissatisfied with in the learning process, the action a_t of each step takes only two values: a_t = 0 means that the problem is deleted from the problem record, and a_t = 1 means that the problem is kept in the problem record;
the state of the learner is represented by the following formula:

S = [k_1, k_2, ..., k_N, p_1, p_2, ..., p_N]

wherein k_1, k_2, ..., k_N represent the potential knowledge level of the learner, which for the i-th learner is the vector K_i given by the knowledge tracking model; p_1, p_2, ..., p_N are the low-dimensional vector representation of the learner's problem record together with a position identifier whose function is to record the position being modified;
the reward function of the reinforcement learning module is given by the personalized recommendation model, in the following form:

Reward = p(e_target | E_i') - p(e_target | E_i)

wherein e_target is the problem actually selected by the learner at the next moment, p(e_target | E_i') is the probability of selecting the target problem based on the modified problem record E_i', and p(e_target | E_i) is the probability of selecting the target problem based on the original problem record E_i; the reinforcement learning module adopts a per-round update strategy and obtains the reward only after the modification of one learner's entire learning record has been completed; at all other times the reward is 0;
the reinforcement learning algorithm adopts the deep Q network algorithm DQN, which combines a neural network with the Q-learning algorithm of traditional reinforcement learning;
the reinforcement learning module takes the square of the difference between the true value and the predicted value as the loss function and trains and updates the parameters of the DQN model; the specific formula of the loss function is:

L(θ) = ( y_t - Q_θ(s_t, a_t) )^2, with y_t = r_t + γ · max_{a'} Q_{θ'}(s_{t+1}, a')

wherein Q_θ(s_t, a_t) is the predicted value of the reward for selecting action a_t in state s_t, calculated by the prediction Q network whose network parameters are θ; y_t is the true value of the reward obtainable by selecting action a_t in state s_t, in which the term max_{a'} Q_{θ'}(s_{t+1}, a'), calculated by the target Q network whose network parameters are θ', is the maximum reward value obtainable in the next state s_{t+1}, γ is the discount factor, and r_t is the currently obtainable reward value, given by the reward function;
the gradient of the loss function is:

∇_θ L(θ) = -2 · ( y_t - Q_θ(s_t, a_t) ) · ∇_θ Q_θ(s_t, a_t)

and the network parameters are updated according to gradient descent.
Further, the specific procedure of modifying the learner's problem record in the step 3 is as follows:
step 3-1: initializing the model, including initializing the parameters of the prediction Q network and the target Q network; initializing the experience replay pool, whose capacity is N; initializing the set of learner-modified problem records as the empty set; setting the learner index i = 1 and the time t = 0;
step 3-2: obtaining the problem record E_i of learner i and the initial state s_0;
step 3-3: taking the feature vector φ(s_t) of the state s_t as the input of the prediction Q network, and obtaining the Q value corresponding to each action in the current state;
step 3-4: selecting the action a_t from the current Q values by adopting an ε-greedy strategy;
step 3-5: if a_t = 0, deleting the problem at the current position from E_i;
step 3-6: executing the current action a_t in state s_t, and obtaining the next state s_{t+1} and the reward r_t;
step 3-7: storing the quadruple {s_t, a_t, r_t, s_{t+1}} in the experience replay pool;
step 3-8: updating the state, s_t = s_{t+1};
step 3-9: sampling m samples {s_j, a_j, r_j, s_{j+1}}, j = 1, 2, ..., m, from the experience replay pool, and calculating the current target Q value y_j = r_j + γ · max_{a'} Q_{θ'}(s_{j+1}, a');
step 3-10: updating the parameters of the prediction Q network using the mean squared error loss function L(θ) = (1/m) Σ_{j=1}^{m} ( y_j - Q_θ(s_j, a_j) )^2;
step 3-11: after every C steps, updating the parameters of the target Q network to the current parameter values of the prediction Q network;
step 3-12: judging whether the time has reached the set value T; if not, returning to step 3-3; if so, executing the next step;
step 3-13: denoting the modified problem record E_i as E_i' and adding E_i' to the set of learner-modified problem records;
step 3-14: judging whether the problem records of all learners have been modified; if not, returning to step 3-2 and continuing with the modification of the next learner's problem record; if so, ending.
Further, the process of the joint training in the step 4 is specifically as follows:
step 4-1: initializing the parameters of the personalized recommendation model α = α_0, the parameters of the knowledge tracking model β = β_0, and the parameters of the reinforcement learning module θ = θ_0;
step 4-2: training the knowledge tracking model using the learners' exercise records;
step 4-3: training the personalized recommendation model using the learners' problem records together with the knowledge tracking model;
step 4-4: fixing the parameters of the personalized recommendation model at α = α_1 and the parameters of the knowledge tracking model at β = β_1, and pre-training the reinforcement learning module; the specific method is as follows:
step 4-4-1: the reinforcement learning algorithm selects an action at each step of the problem record;
step 4-4-2: calculating the reward function Reward according to the selected actions;
step 4-4-3: updating the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
step 4-4-4: cyclically executing steps 4-4-1 to 4-4-3 until all problem records have been processed;
step 4-4-5: repeating steps 4-4-1 to 4-4-4 until the parameters of the reinforcement learning module reach their optimum;
step 4-5: keeping the parameters of the knowledge tracking model fixed at β = β_1, and performing joint training on the personalized recommendation model and the reinforcement learning module; the specific method is as follows:
step 4-5-1: the reinforcement learning algorithm selects an action at each step of the problem record;
step 4-5-2: calculating the reward function Reward according to the selected actions;
step 4-5-3: updating the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
step 4-5-4: cyclically executing steps 4-5-1 to 4-5-3 until all problem records have been processed;
step 4-5-5: updating the parameters of the recommendation model according to the loss function of the recommendation model;
step 4-5-6: repeatedly executing steps 4-5-1 to 4-5-5 until the parameters of the personalized recommendation model and the reinforcement learning module reach their optimum.
The beneficial effects of adopting the above technical scheme are as follows: in the personalized problem recommendation method based on reinforcement learning according to the invention, the learning record of the learner is first obtained, the potential knowledge level of the learner is estimated by the knowledge tracking model, and this potential knowledge level is used as part of the learner's features, making the feature modeling of the learner more accurate. Then, the method deletes from the problem records, through a reinforcement learning algorithm, the unsatisfactory problems that the learner selected by mistake, thereby improving the recommendation accuracy. Finally, problems are recommended to the learner through the personalized recommendation model. The method combines personalized recommendation, knowledge tracking and reinforcement learning algorithms, considers the potential knowledge level of the learner, removes the influence caused by mistakenly selected problems in the learning process, and has important theoretical and practical application value.
Drawings
FIG. 1 is a diagram of a personalized problem recommendation model provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a personalized problem recommendation method based on reinforcement learning according to an embodiment of the present invention;
fig. 3 is a block diagram of a knowledge tracking model DKT provided by an embodiment of the present invention;
FIG. 4 is a block diagram of the long short-term memory network LSTM provided by an embodiment of the present invention;
FIG. 5 is a block diagram of a personalized recommendation model provided by an embodiment of the present invention;
fig. 6 is a block diagram of a deep Q network DQN provided by an embodiment of the invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
The embodiment provides a personalized problem recommendation method based on reinforcement learning. As shown in fig. 1, the model constructed by the method of the embodiment is composed of three parts: a knowledge tracking model, a personalized recommendation model and a problem record modification model. The knowledge tracking model calculates the potential knowledge level of the learner and adds it to the feature construction of the personalized recommendation model and the state representation of the problem record modification model. The personalized recommendation model provides the reward function for the problem record modification model and recommends problems to the learner. The problem record modification model modifies the learner's historical problem record and evaluates and updates the modifications according to the reward function provided by the personalized recommendation model, thereby improving the accuracy of problem recommendation. The flow of the method is shown in fig. 2, and the specific method is as follows.
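As a non-limiting illustration, the interplay of the three components can be sketched in Python as follows; the component interfaces (knowledge_level, modify, top_k) and all names are assumptions made only for illustration and are not part of the claimed method:

from typing import Sequence

def recommend_for_learners(exercise_records: Sequence, problem_records: Sequence,
                           kt_model, rec_model, modifier, top_k: int = 10):
    # Step 1: the knowledge tracking model estimates each learner's potential knowledge level.
    knowledge_levels = [kt_model.knowledge_level(x) for x in exercise_records]
    # The problem record modification model removes mistakenly selected problems.
    cleaned_records = [modifier.modify(e, k)
                       for e, k in zip(problem_records, knowledge_levels)]
    # The personalized recommendation model scores candidate problems; the top-K form the list.
    return [rec_model.top_k(e, k, k_items=top_k)
            for e, k in zip(cleaned_records, knowledge_levels)]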
Step 1: the potential knowledge level of the learner can be calculated by using the knowledge tracking model and added to the characteristic construction of the personalized recommendation model and the state representation of the problem record modification model.
The knowledge tracking model employed in this embodiment is the deep knowledge tracking model (Deep Knowledge Tracing, DKT). The DKT model uses a recurrent neural network or a long short-term memory network LSTM to exploit the temporal relations in the learner's historical learning record and predicts the learner's performance on questions at the next moment. The recurrent neural network in this embodiment is a long short-term memory network LSTM. The DKT model first converts the learner's historical results into one-hot vectors through one-hot encoding, inputs the one-hot vectors into the LSTM network, extracts features through the LSTM layer, feeds the features into a hidden layer, and then outputs the prediction result from the output layer; the output of DKT represents the probability that the learner answers each question correctly, i.e., the learner's performance on the next answer.
The structure of the DKT model is shown in fig. 3. The model is a knowledge tracking model based on a Long Short-Term Memory (LSTM) network, and the potential knowledge level of the learner can be determined from the learner's performance in the learning record. The input of the DKT model is the exercise record of learner i; the entry of the exercise record at time t consists of the number of the question selected by learner i at time t and learner i's result on that question at time t, where 1 indicates that the question was answered correctly and 0 indicates that it was answered incorrectly. Each entry is first converted into a one-hot vector by one-hot encoding and then input into the LSTM network.
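As a non-limiting illustration, the one-hot encoding of a single exercise-record entry can be sketched as follows; the 2N index layout (first block for incorrect answers, second block for correct answers) is a common DKT convention and is an assumption here, since the text above only states that the (question, result) pair is one-hot encoded:

import numpy as np

def one_hot_exercise(question_id: int, correct: int, num_questions: int) -> np.ndarray:
    # Encode one (question, 0/1 result) pair as a 2*num_questions one-hot vector.
    x = np.zeros(2 * num_questions, dtype=np.float32)
    x[question_id + correct * num_questions] = 1.0
    return x

# Example: if learner i answered question 3 correctly out of 100 questions,
# one_hot_exercise(3, 1, 100) has a single 1 at index 103.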
The LSTM network is an improved recurrent neural network, which can solve the problem that RNNs cannot handle long-distance dependence, and the LSTM structure is shown in figure 4.
Different from an ordinary recurrent neural network, the long short-term memory network introduces a memory (cell) state and uses three gating units in each neuron to control the stored information, so that the memory state of the neuron always preserves information about the whole long sequence.
The forget gate in the LSTM network is responsible for controlling how much of the state at the previous moment is retained, and is calculated as follows:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

wherein W_f is the weight matrix of the forget gate, x_t is the input of the forget gate at time t, here the exercise record of learner i at time t, [h_{t-1}, x_t] denotes the concatenation of the two vectors, h_{t-1} is the output at time t-1, b_f is the bias term of the forget gate, and σ(·) is the sigmoid activation function.
The input gate in the LSTM network is responsible for controlling how much of the current input enters the long-term state, and is calculated as follows:

I_t = σ(W_I · [h_{t-1}, x_t] + b_I)

wherein W_I is the weight matrix of the input gate and b_I is the bias term of the input gate.
The cell state of the current input is represented as follows:

C̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)

wherein W_c is the weight matrix of the cell state, b_c is the bias term of the cell state, and tanh is the activation function.
Combining the above three formulas with the cell state C_{t-1} at the previous moment, the cell state at the current moment is obtained as shown in the following formula:

C_t = f_t * C_{t-1} + I_t * C̃_t

where * is an operator denoting the element-wise product.
The output gate in the LSTM network is responsible for controlling whether the long-term state is taken as the current output, which is expressed as follows:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)

wherein W_o is the weight matrix of the output gate and b_o is the bias term of the output gate.
Finally, the output state is obtained by the following formula:

h_t = o_t * tanh(C_t)
the DKT model can comprehensively consider the exercise performance of the learner for a long time and the recent exercise performance, therebyThe potential knowledge level of the learner is determined. And wherein the design of the forgetting gate conforms to the feature that the learner will decrease over time, with a gradual decrease in the level of mastery of previously learned knowledge. The present embodiment marks the output of the LSTM layer as the knowledge level at the potential N knowledge points of learner i asWhich is used as part of the learner's profile to enhance the performance of the recommendation.
Step 2: a personalized recommendation model is built and trained; it comprises three parts, namely an Embedding layer, a GRU layer and a fully connected layer. The Embedding layer is used for mapping the one-hot vectors of the problem record made by learner i into a low-dimensional vector space for encoding; the GRU layer is a gated recurrent unit layer, an improved recurrent neural network model, used for extracting the sequence features of the problem record; the fully connected layer calculates, from the features of learner i, the probability that the learner selects each problem, and problems are recommended to the learner according to the magnitude of the selection probabilities. The personalized recommendation model has two functions: first, providing the reward function for the problem record modification model, and second, recommending problems to the learner. The structure of the personalized recommendation model is shown in fig. 5, and the specific method is as follows.
Step 2-1: through the Embedding layer, the one-hot vectors of the problem record E_i made by learner i are mapped into a low-dimensional vector space for encoding; the output at time t is the low-dimensional vector e_t^i.
Step 2-2: the sequence features of the problem record are extracted through the GRU layer.
The GRU layer has only two gates, the update gate and the reset gate. The GRU layer calculates the outputs of the reset gate and the update gate from the input at the current moment and the hidden state of the network at the previous moment, calculates the candidate hidden state from the input at the current moment and the output of the reset gate, obtains the final hidden state from the candidate hidden state and the output of the update gate, and obtains the output at the current moment from the hidden state.
The update gate of the GRU determines how much of the state information from the previous moment and the current moment continues to be passed into the future, and is calculated as follows:

z_t = σ(W_z · [h_{t-1}, e_t^i])

wherein e_t^i is the low-dimensional vector representation of the problem done by learner i at time t, h_{t-1} is the hidden state information at time t-1, W_z is the weight coefficient of the update gate, and σ(·) is the sigmoid activation function.
The reset gate of the GRU layer determines how much of the state information of the previous moment is to be forgotten, and is calculated as follows:

r_t = σ(W_r · [h_{t-1}, e_t^i])

wherein W_r is the weight coefficient of the reset gate.
The current memory content is calculated as follows:

h̃_t = tanh(W_h · [r_t * h_{t-1}, e_t^i])

wherein W_h is another weight coefficient, the element-wise product r_t * h_{t-1} of the reset gate r_t and the hidden state information h_{t-1} determines how much information from the previous moment is kept, and * is an operator denoting the element-wise product.
The final memory of the current time step is calculated as follows:

h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t

wherein (1 - z_t) * h_{t-1} is the amount of information from the previous moment that is retained in the final memory at the current moment, and z_t * h̃_t is the amount of the current memory content retained in the final memory at the current moment. The resulting h_t is the sequence feature of the learner's problem record.
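As a non-limiting illustration, the gate equations above can be transcribed directly into code as follows; the omission of bias terms matches the formulas as reconstructed and is an assumption, and nn.GRU / nn.GRUCell in PyTorch implement an equivalent (bias-included) cell:

import torch
import torch.nn as nn

class GRUCellFromEquations(nn.Module):
    # Direct transcription of the update-gate / reset-gate equations above.
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.W_z = nn.Linear(input_size + hidden_size, hidden_size, bias=False)
        self.W_r = nn.Linear(input_size + hidden_size, hidden_size, bias=False)
        self.W_h = nn.Linear(input_size + hidden_size, hidden_size, bias=False)

    def forward(self, e_t, h_prev):
        concat = torch.cat([h_prev, e_t], dim=-1)
        z_t = torch.sigmoid(self.W_z(concat))                 # update gate
        r_t = torch.sigmoid(self.W_r(concat))                 # reset gate
        h_tilde = torch.tanh(self.W_h(torch.cat([r_t * h_prev, e_t], dim=-1)))  # candidate memory
        return (1.0 - z_t) * h_prev + z_t * h_tilde           # final memory h_t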
Step 2-3: through the fully connected layer, the probability that the learner selects each problem is calculated from the learner's features, as shown in the following formula:

y = softmax(W_j · [K_i, h_t] + b_j)

wherein W_j is the weight coefficient of the fully connected layer, b_j is the bias coefficient of the fully connected layer, and K_i is the potential knowledge level of learner i calculated by the DKT model; [K_i, h_t] is the concatenation of the potential knowledge level K_i of learner i with the sequence feature h_t of the learner's problem record obtained by the GRU layer; softmax(·) is an activation function that limits the output values to between 0 and 1.
Step 2-4: the personalized recommendation model is trained and updated using cross entropy as the loss function, calculated as follows:

L = -(1/M) Σ_{i=1}^{M} p_i · log(q_i)

wherein M is the number of learners, p_i is the true probability distribution of the problem selected by learner i at the next moment, and q_i is the predicted probability distribution, given by the personalized recommendation model, of the problem selected by learner i at the next moment. The cross entropy loss function is an index that measures the difference between the true probability distribution p and the model's predicted probability distribution q.
Step 2-5: the probabilities of learner i selecting each problem, as calculated by the personalized recommendation model, are sorted in descending order, and the first K problems are recommended to learner i to form the problem recommendation list.
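As a non-limiting illustration, the Embedding + GRU + fully connected recommendation model and the top-K recommendation of step 2-5 could be sketched as follows; all dimensions, the use of nn.GRU (equivalent to unrolling the cell above) and the function names are assumptions for illustration only:

import torch
import torch.nn as nn

class PersonalizedRecommender(nn.Module):
    def __init__(self, num_problems: int, embed_dim: int = 64,
                 hidden_size: int = 128, knowledge_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(num_problems, embed_dim)            # Embedding layer
        self.gru = nn.GRU(embed_dim, hidden_size, batch_first=True)   # GRU layer
        self.fc = nn.Linear(hidden_size + knowledge_dim, num_problems)

    def forward(self, problem_ids, knowledge_level):
        e = self.embed(problem_ids)              # (batch, T, embed_dim)
        h, _ = self.gru(e)                       # sequence features of the problem record
        h_t = h[:, -1, :]                        # final hidden state h_t
        logits = self.fc(torch.cat([knowledge_level, h_t], dim=-1))   # [K_i, h_t]
        return torch.softmax(logits, dim=-1)     # selection probability of every problem

def top_k_recommendation(model, problem_ids, knowledge_level, k: int = 10):
    # Step 2-5: sort the selection probabilities and keep the first K problems.
    with torch.no_grad():
        probs = model(problem_ids, knowledge_level)
    return torch.topk(probs, k, dim=-1).indices

# Training (step 2-4) would minimize the cross entropy between the predicted distribution
# and the distribution of the problem actually selected next, e.g. via nn.CrossEntropyLoss
# applied to the pre-softmax logits.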
Step 3: a problem record modification model is constructed and trained to remove the disliked or unsatisfactory problems that the learner selected by mistake during the learning process, so that problems can be recommended to the learner more accurately. Because the problem record modification model adopts reinforcement-learning-related algorithms, its action representation, state representation, reward function and reinforcement learning algorithm are described in detail below, following the general development flow of reinforcement learning.
(1) Action representation
The problem record modification model is used to delete the problems that the learner dislikes or is dissatisfied with, so the action a_t of each step takes only two values: a_t = 0 means that the problem is deleted from the problem record, and a_t = 1 means that the problem is retained in the problem record.
(2) State representation
The state of the learner is represented by the following formula:

S = [k_1, k_2, ..., k_N, p_1, p_2, ..., p_N]

wherein k_1, k_2, ..., k_N represent the potential knowledge level of the learner, given by the knowledge tracking model; p_1, p_2, ..., p_N are the low-dimensional vector representation of the learner's problem record together with a position identifier whose function is to record the position being modified.
(3) Reward function
The reward function of the reinforcement learning module is given by the personalized recommendation model, in the following form:

Reward = p(e_target | E_i') - p(e_target | E_i)

wherein e_target is the problem actually selected by the learner at the next moment, p(e_target | E_i') is the probability of selecting the target problem based on the modified problem record E_i', and p(e_target | E_i) is the probability of selecting the target problem based on the original problem record E_i. The reinforcement learning module adopts a per-round update strategy: the reward is obtained only after the modification of one learner's entire learning record has been completed, and is 0 at all other times.
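As a non-limiting illustration, and under the assumption (stated above) that the reward is the change in the target problem's predicted selection probability, the per-round reward could be computed as follows; the recommender interface is the sketch from step 2:

import torch

def episode_reward(rec_model, original_ids, modified_ids, knowledge_level, target_problem):
    # Reward given only at the end of an episode; intermediate steps receive 0.
    with torch.no_grad():
        p_modified = rec_model(modified_ids, knowledge_level)[0, target_problem]
        p_original = rec_model(original_ids, knowledge_level)[0, target_problem]
    return (p_modified - p_original).item()   # p(e_target | E_i') - p(e_target | E_i)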
(4) Reinforcement learning algorithm
The present embodiment employs the Deep Q Network (DQN) algorithm, which combines a neural network with the Q-learning algorithm of traditional reinforcement learning. The structure of the DQN is shown in fig. 6.
The reinforcement learning module takes the square of the difference between the true value and the predicted value as the loss function and trains and updates the parameters of the DQN model; the specific formula of the loss function is:

L(θ) = ( y_t - Q_θ(s_t, a_t) )^2, with y_t = r_t + γ · max_{a'} Q_{θ'}(s_{t+1}, a')

wherein Q_θ(s_t, a_t) is the predicted value of the reward for selecting action a_t in state s_t, calculated by the prediction Q network whose network parameters are θ; y_t is the true value of the reward obtainable by selecting action a_t in state s_t, in which the term max_{a'} Q_{θ'}(s_{t+1}, a'), calculated by the target Q network whose network parameters are θ', is the maximum reward value obtainable in the next state s_{t+1}, γ is the discount factor, and r_t is the currently obtainable reward value, given by the reward function.
The gradient of the loss function is shown in the following equation, and the network parameters are updated according to gradient descent:

∇_θ L(θ) = -2 · ( y_t - Q_θ(s_t, a_t) ) · ∇_θ Q_θ(s_t, a_t)
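As a non-limiting illustration, this loss could be computed in PyTorch roughly as follows; the discount factor value and the batch layout are assumptions for illustration:

import torch
import torch.nn as nn

def dqn_loss(pred_net: nn.Module, target_net: nn.Module, batch, gamma: float = 0.99):
    s_t, a_t, r_t, s_next = batch                                    # tensors for one sampled batch
    q_pred = pred_net(s_t).gather(1, a_t.unsqueeze(1)).squeeze(1)    # Q_theta(s_t, a_t)
    with torch.no_grad():                                            # target network: no gradient
        q_next = target_net(s_next).max(dim=1).values
    y_t = r_t + gamma * q_next                                       # "true" value from the target Q network
    return torch.mean((y_t - q_pred) ** 2)                           # squared error, averaged over the batch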
The specific procedure of the learner problem record modification is as follows:
step 3-1: initializing the model, including initializing the parameters of the prediction Q network and the target Q network; initializing the experience replay pool, whose capacity is N; initializing the set of learner-modified problem records as the empty set; setting the learner index i = 1 and the time t = 0;
step 3-2: obtaining the problem record E_i of learner i and the initial state s_0;
step 3-3: taking the feature vector φ(s_t) of the state s_t as the input of the prediction Q network, and obtaining the Q value corresponding to each action in the current state;
step 3-4: selecting the action a_t from the current Q values by adopting an ε-greedy strategy;
step 3-5: if a_t = 0, deleting the problem at the current position from E_i;
step 3-6: executing the current action a_t in state s_t, and obtaining the next state s_{t+1} and the reward r_t;
step 3-7: storing the quadruple {s_t, a_t, r_t, s_{t+1}} in the experience replay pool;
step 3-8: updating the state, s_t = s_{t+1};
step 3-9: sampling m samples {s_j, a_j, r_j, s_{j+1}}, j = 1, 2, ..., m, from the experience replay pool, and calculating the current target Q value y_j = r_j + γ · max_{a'} Q_{θ'}(s_{j+1}, a');
step 3-10: updating the parameters of the prediction Q network using the mean squared error loss function L(θ) = (1/m) Σ_{j=1}^{m} ( y_j - Q_θ(s_j, a_j) )^2;
step 3-11: after every C steps, updating the parameters of the target Q network to the current parameter values of the prediction Q network;
step 3-12: judging whether the time has reached the set value T; if not, returning to step 3-3; if so, executing the next step;
step 3-13: denoting the modified problem record E_i as E_i' and adding E_i' to the set of learner-modified problem records;
step 3-14: judging whether the problem records of all learners have been modified; if not, returning to step 3-2 and continuing with the modification of the next learner's problem record; if so, ending.
Step 4: joint training is carried out on the personalized recommendation model and the problem record modification model to obtain the optimal model parameters and improve the accuracy of problem recommendation. The joint training process of the personalized problem recommendation model based on the reinforcement learning algorithm provided in this embodiment is specifically as follows.
step 4-1: initializing the parameters of the personalized recommendation model α = α_0, the parameters of the knowledge tracking model β = β_0, and the parameters of the reinforcement learning module θ = θ_0;
step 4-2: training the knowledge tracking model using the learners' exercise records;
step 4-3: training the personalized recommendation model using the learners' problem records together with the knowledge tracking model;
step 4-4: fixing the parameters of the personalized recommendation model at α = α_1 and the parameters of the knowledge tracking model at β = β_1, and pre-training the reinforcement learning module; the specific method is as follows:
step 4-4-1: the reinforcement learning algorithm selects an action at each step of the problem record;
step 4-4-2: calculating the reward function Reward according to the selected actions;
step 4-4-3: updating the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
step 4-4-4: cyclically executing steps 4-4-1 to 4-4-3 until all problem records have been processed;
step 4-4-5: repeating steps 4-4-1 to 4-4-4 until the parameters of the reinforcement learning module reach their optimum;
step 4-5: keeping the parameters of the knowledge tracking model fixed at β = β_1, and performing joint training on the personalized recommendation model and the reinforcement learning module; the specific method is as follows:
step 4-5-1: the reinforcement learning algorithm selects an action at each step of the problem record;
step 4-5-2: calculating the reward function Reward according to the selected actions;
step 4-5-3: updating the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
step 4-5-4: cyclically executing steps 4-5-1 to 4-5-3 until all problem records have been processed;
step 4-5-5: updating the parameters of the recommendation model according to the loss function of the recommendation model;
step 4-5-6: repeatedly executing steps 4-5-1 to 4-5-5 until the parameters of the personalized recommendation model and the reinforcement learning module reach their optimum.
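As a non-limiting illustration, the joint training of steps 4-1 to 4-5-6 could be organised roughly as follows; the helper functions train_kt and train_recommender and all hyperparameters are assumed names introduced only for illustration, and modify_records is the sketch above:

def joint_training(kt_model, rec_model, pred_net, target_net, env,
                   exercise_records, problem_records, optimizers, epochs: int = 10):
    # Steps 4-2 and 4-3: train the knowledge tracking model, then the recommender.
    train_kt(kt_model, exercise_records, optimizers["kt"])
    train_recommender(rec_model, kt_model, problem_records, optimizers["rec"])

    # Step 4-4: freeze recommender and tracker, pre-train the reinforcement learning module.
    for p in list(rec_model.parameters()) + list(kt_model.parameters()):
        p.requires_grad_(False)
    modify_records(pred_net, target_net, env, problem_records, optimizers["rl"], episode_len=50)

    # Step 4-5: keep the tracker frozen, alternate RL updates and recommender updates
    # until both the recommender and the RL module have converged.
    for p in rec_model.parameters():
        p.requires_grad_(True)
    for _ in range(epochs):
        modify_records(pred_net, target_net, env, problem_records, optimizers["rl"], episode_len=50)
        train_recommender(rec_model, kt_model, problem_records, optimizers["rec"])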
Step 5: the learner's problem records are modified using the problem record modification model obtained after the joint training in step 4, and problems are recommended to the learner using the personalized recommendation model obtained after the joint training in step 4, so as to obtain the problem recommendation list.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions, which are defined by the scope of the appended claims.

Claims (7)

1. A personalized problem recommendation method based on reinforcement learning, characterized in that the method comprises the following steps:
step 1: calculating the potential knowledge level of the learner by using a knowledge tracking model, and adding it to the feature construction of the personalized recommendation model and the state representation of the problem record modification model;
step 2: constructing and training a personalized recommendation model for problem recommendation;
step 3: designing and training a problem record modification model based on the Deep Q-Learning algorithm of reinforcement learning to remove disliked or unsatisfactory problems selected by mistake during the learning process;
step 4: performing joint training on the personalized recommendation model and the problem record modification model;
step 5: modifying the learner's problem records by using the problem record modification model obtained after the joint training in step 4, and recommending problems to the learner by using the personalized recommendation model obtained after the joint training in step 4, so as to obtain a problem recommendation list.
2. The reinforcement learning-based personalized problem recommendation method of claim 1, wherein: in the step 1, the knowledge tracking model is the deep knowledge tracking model DKT; the DKT model uses a long short-term memory network LSTM to exploit the temporal relations in the learner's historical learning record and predicts the learner's performance on questions at the next moment; the DKT model first converts the learner's historical results into one-hot vectors through one-hot encoding, inputs the one-hot vectors into the LSTM network, extracts features through the LSTM layer, feeds the extracted features into a hidden layer, and then outputs the prediction result from the output layer; the output of the DKT model represents the probability that the learner answers each question correctly, i.e., the learner's performance on the next answer; the output of the LSTM layer is taken as the potential knowledge level of the learner and added to the feature construction of the personalized recommendation model and the state representation of the problem record modification model; the input of the DKT model is the exercise record of the learner; the exercise record of learner i at time t consists of the question selected by learner i at time t and the answer result of learner i at time t; the problem record contains only the exercises that learner i chose to study, whereas the exercise record also records learner i's answer results.
3. The reinforcement learning-based personalized problem recommendation method of claim 1, wherein: the personalized recommendation model in the step 2 comprises three parts, namely an Embedding layer, a GRU layer and a fully connected layer; the Embedding layer is used for mapping the one-hot vectors of the problem record made by the learner into a low-dimensional vector space for encoding; the GRU layer is a gated recurrent unit layer, an improved recurrent neural network, used for extracting the sequence features of the problem record; the fully connected layer is used for calculating, from the learner's features, the probability that the learner selects each problem, and problems are recommended to the learner according to the magnitude of the selection probabilities.
4. The reinforcement learning-based personalized problem recommendation method of claim 3, wherein: the specific method of the step 2 is as follows:
step 2-1: through the Embedding layer, mapping the one-hot vectors of the problem record E_i made by learner i into a low-dimensional vector space for encoding, the output at time t being the low-dimensional vector e_t^i;
step 2-2: extracting the sequence features of the problem record through the GRU layer;
the update gate of the GRU determines how much of the state information from the previous moment and the current moment continues to be passed into the future, and is calculated as follows:

z_t = σ(W_z · [h_{t-1}, e_t^i])

wherein e_t^i is the low-dimensional vector representation of the problem done by learner i at time t, h_{t-1} is the hidden state information at time t-1, W_z is the weight coefficient of the update gate, and σ(·) is the sigmoid activation function;
the reset gate of the GRU layer determines how much of the state information of the previous moment is to be forgotten, and is calculated as follows:

r_t = σ(W_r · [h_{t-1}, e_t^i])

wherein W_r is the weight coefficient of the reset gate;
the current memory content is calculated as follows:

h̃_t = tanh(W_h · [r_t * h_{t-1}, e_t^i])

wherein W_h is another weight coefficient, the element-wise product r_t * h_{t-1} of the reset gate r_t and the hidden state information h_{t-1} determines how much information from the previous moment is kept, and * is an operator denoting the element-wise product;
the final memory of the current time step is calculated as follows:

h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t

wherein (1 - z_t) * h_{t-1} is the amount of information from the previous moment that is retained in the final memory at the current moment, and z_t * h̃_t is the amount of the current memory content retained in the final memory at the current moment; the resulting h_t is the sequence feature of the learner's problem record;
step 2-3: through the fully connected layer, calculating from the learner's features the probability that the learner selects each problem, as shown in the following formula:

y = softmax(W_j · [K_i, h_t] + b_j)

wherein W_j is the weight coefficient of the fully connected layer, b_j is the bias coefficient of the fully connected layer, and K_i is the potential knowledge level of learner i calculated by the DKT model; [K_i, h_t] is the concatenation of the potential knowledge level K_i of learner i with the sequence feature h_t of the learner's problem record obtained by the GRU layer; softmax(·) is an activation function that limits the output values to between 0 and 1;
step 2-4: the personalized recommendation model is trained and updated using cross entropy as the loss function, calculated as follows:

L = -(1/M) Σ_{i=1}^{M} p_i · log(q_i)

wherein M is the number of learners, p_i is the true probability distribution of the problem selected by learner i at the next moment, and q_i is the predicted probability distribution, given by the personalized recommendation model, of the problem selected by learner i at the next moment;
the cross entropy loss function is an index that measures the difference between the true probability distribution p and the model's predicted probability distribution q;
step 2-5: sorting the probabilities of learner i selecting each problem, as calculated by the personalized recommendation model, in descending order, and recommending the first K problems to learner i to form the problem recommendation list.
5. The reinforcement learning-based personalized problem recommendation method of claim 4, wherein: the problem record modification model in the step 3 adopts reinforcement-learning-related algorithms, comprising the action representation, the state representation, the reward function of the model and the reinforcement learning algorithm, specifically as follows:
in order to delete problems that the learner dislikes or is dissatisfied with in the learning process, the action a_t of each step takes only two values: a_t = 0 means that the problem is deleted from the problem record, and a_t = 1 means that the problem is retained in the problem record;
the state of the learner is represented by the following formula:

S = [k_1, k_2, ..., k_N, p_1, p_2, ..., p_N]

wherein k_1, k_2, ..., k_N represent the potential knowledge level of the learner, which for the i-th learner is the vector K_i given by the knowledge tracking model; p_1, p_2, ..., p_N are the low-dimensional vector representation of the learner's problem record together with a position identifier whose function is to record the position being modified;
the reward function of the reinforcement learning module is given by the personalized recommendation model, in the following form:

Reward = p(e_target | E_i') - p(e_target | E_i)

wherein e_target is the problem actually selected by the learner at the next moment, p(e_target | E_i') is the probability of selecting the target problem based on the modified problem record E_i', and p(e_target | E_i) is the probability of selecting the target problem based on the original problem record E_i; the reinforcement learning module adopts a per-round update strategy and obtains the reward only after the modification of one learner's entire learning record has been completed; at all other times the reward is 0;
the reinforcement learning algorithm adopts the deep Q network algorithm DQN, which combines a neural network with the Q-learning algorithm of traditional reinforcement learning;
the reinforcement learning module takes the square of the difference between the true value and the predicted value as the loss function and trains and updates the parameters of the DQN model; the specific formula of the loss function is:

L(θ) = ( y_t - Q_θ(s_t, a_t) )^2, with y_t = r_t + γ · max_{a'} Q_{θ'}(s_{t+1}, a')

wherein Q_θ(s_t, a_t) is the predicted value of the reward for selecting action a_t in state s_t, calculated by the prediction Q network whose network parameters are θ; y_t is the true value of the reward obtainable by selecting action a_t in state s_t, in which the term max_{a'} Q_{θ'}(s_{t+1}, a'), calculated by the target Q network whose network parameters are θ', is the maximum reward value obtainable in the next state s_{t+1}, γ is the discount factor, and r_t is the currently obtainable reward value, given by the reward function;
the gradient of the loss function is:

∇_θ L(θ) = -2 · ( y_t - Q_θ(s_t, a_t) ) · ∇_θ Q_θ(s_t, a_t)

and the network parameters are updated according to gradient descent.
6. The reinforcement learning-based personalized problem recommendation method of claim 5, wherein: the specific process of modifying the problem records of the learner in the step 3 is as follows:
step 3-1: initializing the model, including initializing the parameters of the prediction Q network and the target Q network; initializing the experience replay pool with capacity N; initializing the set Ê of learner-modified problem records as empty; setting the learner index i = 1 and the time t = 0;
step 3-2: obtaining the problem record E_i of learner i and the initial state s_0;
step 3-3: taking the feature vector φ(s_t) of state s_t as the input of the prediction Q network to obtain the Q value of each action in the current state;
step 3-4: selecting the action a_t from the current Q values using an ε-greedy strategy;
step 3-5: if a_t = 0, deleting the problem currently being considered from the problem record E_i;
step 3-6: executing the current action a_t in state s_t to obtain the next state s_{t+1} and the reward r_t;
step 3-7: storing the quadruple {s_t, a_t, r_t, s_{t+1}} in the experience replay pool;
step 3-8: updating the state, s_t = s_{t+1};
step 3-9: sampling m samples {s_j, a_j, r_j, s_{j+1}}, j = 1, 2, …, m, from the experience replay pool and calculating the current target Q value y_j;
step 3-10: updating the parameters of the prediction Q network using the mean squared error loss (1/m) Σ_j (y_j − Q_θ(s_j, a_j))²;
step 3-11: every C steps, updating the parameters of the target Q network to the current parameter values of the prediction Q network;
step 3-12: judging whether the time has reached the set value T; if not, returning to step 3-3; if so, executing the next step;
step 3-13: denoting the modified problem record of learner i as Ê_i and adding Ê_i to the set Ê of learner-modified problem records;
step 3-14: judging whether the problem records of all learners have been modified; if not, returning to step 3-2 and continuing with the problem record of the next learner; if so, ending this step.
7. The reinforcement learning-based personalized problem recommendation method of claim 6, wherein: the joint training process of step 4 is specifically as follows (an illustrative sketch appears after step 4-5-6):
step 4-1: initializing the parameter α = α_0 of the personalized recommendation model, the parameter β = β_0 of the knowledge tracking model, and the parameter θ = θ_0 of the reinforcement learning module;
step 4-2: training the knowledge tracking model with the learners' exercise records;
step 4-3: training the personalized recommendation model with the learners' problem records and the knowledge tracking model;
step 4-4: fixing the parameter α = α_1 of the personalized recommendation model and the parameter β = β_1 of the knowledge tracking model, and pre-training the reinforcement learning module; the specific method is as follows:
step 4-4-1: the reinforcement learning algorithm selects an action at each step over the problem record E_i;
step 4-4-2: calculating the reward function Reward according to the selected action;
step 4-4-3: updating the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
step 4-4-4: cyclically executing steps 4-4-1 to 4-4-3 until all problem records have been processed;
step 4-4-5: repeating steps 4-4-1 to 4-4-4 until the parameters of the reinforcement learning module reach their optimum;
step 4-5: fixing the parameter β = β_1 of the knowledge tracking model and jointly training the personalized recommendation model and the reinforcement learning module; the specific method is as follows:
step 4-5-1: the reinforcement learning algorithm selects an action at each step over the problem record E_i;
step 4-5-2: calculating the reward function Reward according to the selected action;
step 4-5-3: updating the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
step 4-5-4: cyclically executing steps 4-5-1 to 4-5-3 until all problem records have been processed;
step 4-5-5: updating the parameters of the recommendation model according to the loss function of the recommendation model;
step 4-5-6: repeatedly executing steps 4-5-1 to 4-5-5 until the parameters of the personalized recommendation model and the reinforcement learning module reach their optimum.
CN202310703313.2A 2023-06-14 2023-06-14 Personalized problem recommendation method based on reinforcement learning Pending CN116680477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310703313.2A CN116680477A (en) 2023-06-14 2023-06-14 Personalized problem recommendation method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310703313.2A CN116680477A (en) 2023-06-14 2023-06-14 Personalized problem recommendation method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN116680477A true CN116680477A (en) 2023-09-01

Family

ID=87787013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310703313.2A Pending CN116680477A (en) 2023-06-14 2023-06-14 Personalized problem recommendation method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN116680477A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116720007A (en) * 2023-08-11 2023-09-08 河北工业大学 Online learning resource recommendation method based on multidimensional learner state and joint rewards
CN116720007B (en) * 2023-08-11 2023-11-28 河北工业大学 Online learning resource recommendation method based on multidimensional learner state and joint rewards

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination