CN115238169A - MOOC interpretable recommendation method, terminal device and storage medium - Google Patents

MOOC interpretable recommendation method, terminal device and storage medium

Info

Publication number
CN115238169A
CN115238169A (application CN202210666129.0A)
Authority
CN
China
Prior art keywords
course
path
learner
representing
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210666129.0A
Other languages
Chinese (zh)
Inventor
林元国
林凡
张志宏
张伟
游环宇
柳蕴轩
陈鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202210666129.0A
Publication of CN115238169A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention relates to a MOOC interpretable recommendation method, a terminal device and a storage medium, wherein the method comprises the following steps: constructing a triple data set from the entities in learners' historical course-selection records and the relations between those entities; constructing a knowledge graph based on the triple data set and vectorizing the entities and relations in the knowledge graph through a TransE model, wherein learners and courses are represented by a coarse-grained course representation method and a fine-grained concept representation method, respectively; constructing a learning-path inference model based on a self-supervised reinforcement learning method to guide the recommendation agent in inferring learning paths from a learner to a target course on the knowledge graph; training the learning-path inference model with an actor-critic algorithm; and inferring the learning path from the learner to the target course through the trained model. The invention can model both explicit information and implicit feedback in the knowledge graph, and can produce interpretable MOOC recommendations through deep reinforcement learning.

Description

MOOC interpretable recommendation method, terminal device and storage medium
Technical Field
The invention relates to the field of MOOC recommendation, and in particular to a MOOC interpretable recommendation method, a terminal device and a storage medium.
Background
Existing popular interpretable recommendation methods are not suitable for the MOOC (Massive Open Online Course) recommendation scenario, because the online education environment is usually constrained by two complex conditions. First, the prerequisite relations between courses: the precedence relationship between courses is an important consideration when recommending a course. Generally, a recommendation should include prerequisite courses, because the learner may lack the knowledge points taught in them and needs to master those knowledge points to better understand the courses being taken. Second, the knowledge structure of the learner: it is well known that a learner's knowledge structure evolves during the learning process. In this situation, any one-size-fits-all recommendation strategy is suboptimal, because learners reach different completion states on recommended courses depending on their learning abilities, so the recommendation strategy must take the learner's cognitive level into account. These complex constraints make interpretable MOOC recommendation difficult to implement.
Disclosure of Invention
In order to solve the above problems, the present invention provides a MOOC interpretable recommendation method, a terminal device and a storage medium.
The specific scheme is as follows:
A MOOC interpretable recommendation method, comprising the following steps:
S1: collecting learners' historical course-selection records, extracting learners, courses, course concepts and subject classifications from the records as entities, extracting the relationships among the entities, and constructing a triple data set based on those relationships;
S2: constructing a knowledge graph based on the triple data set, and vectorizing the entities and relations in the knowledge graph through a TransE model, wherein the learner's embedded vector is represented by a coarse-grained course representation method, and the course's embedded vector is represented by a fine-grained concept representation method;
S3: constructing a learning-path inference model based on a self-supervised reinforcement learning method to guide the recommendation agent in learning-path inference from the learner to the target course on the knowledge graph;
the learning-path inference model comprises an actor network and a path discriminator, wherein the actor network first generates a path, the path discriminator then distinguishes expert demonstration paths from generated paths, and the actor network tries to deceive the path discriminator by imitating the expert demonstration paths;
S4: training the learning-path inference model with an actor-critic algorithm, wherein the actor network learns a path-inference policy according to the value function of the critic network, and the critic network updates the value function in a single step using the temporal-difference method;
S5: inferring the learning path from the learner to the target course through the trained learning-path inference model.
Further, the learner's embedded vector is represented by the coarse-grained course representation method as follows: after the historical course-selection records of learner $u$ are sorted in chronological order, the learner's embedded vector is expressed as:

$$u = \{c_1, \ldots, c_t, \ldots, c_{t_u}\}$$

where $c_t$ denotes the course taken by learner $u$ at time $t$, and $1, \ldots, t, \ldots, t_u$ indicates time from earliest to most recent.
Further, the embedded vector of the course is represented as $c_t$:

$$c_t = \{(k, w) \mid (k_i, w_j),\; n > i > 0,\; j > 0\}$$

where $n$ denotes the number of course concepts contained in the course, $i$ the index of a course concept, $j$ the index of a word contained in a course concept, $k_i$ the embedded vector of the $i$-th course concept, $w_j$ the embedded vector of the $j$-th word in concept $k_i$, $k$ the embedded vectors of all concepts in the course, and $w$ the embedded vectors of all words in the course.
Further, in step S3 the path-inference problem is expressed as a Markov decision process, and the agent recommends a learning path from the learner to the target course by performing multi-hop path inference on the knowledge graph. The initial state is set to $s_0 = u$ and the state at time $t$ is

$$s_t = (u, r_1, e_1, \ldots, r_t, e_t)$$

According to state $s_t$, the agent performs an action following the policy $\pi(a_t \mid s_t)$ to predict a feasible output edge of entity $e_t$; the action space is

$$\mathcal{A}_t = \{(r, e) \mid (e_t, r, e) \in \mathcal{G},\; e \in \mathcal{E}\}$$

where $\mathcal{E}$ denotes the set of entities and $\mathcal{G}$ denotes the knowledge graph. The terminal reward $R_{e,T}$ measures whether the agent has generated a multi-hop path that starts with learner $u$ and terminates at the target course $c_{t_u+1}$; here $u$ denotes the learner, $r$ a relation, $e$ an entity, $r_t$ the relation vector at time $t$, $e_t$ the entity vector at time $t$, $t \in \{1, 2, \ldots, T\}$, and $\mathcal{R}$ denotes the reward function.
Further, potential output edges are preserved by weighting action paths in the Markov decision process, and the weight of each edge in a path is set as:

$$w_{(e_t, r_t, e_{t+1})} = \lVert v - v' \rVert_1$$

where $w_{(e_t, r_t, e_{t+1})}$ denotes the weight of the triple $(e_t, r_t, e_{t+1})$, $v$ denotes the sum of the head-entity vector $e_t$ and the vector of the relation $r_t$ between head entity $e_t$ and tail entity $e_{t+1}$, $v'$ denotes the tail-entity vector $e_{t+1}$, and $\lVert \cdot \rVert_1$ denotes the L1 norm.
Further, the expert demonstration paths are obtained as follows: for all learners $u$ and target courses $c_{t_u+1}$, the shortest path between learner $u$ and target course $c_{t_u+1}$ is generated on the weighted graph using Dijkstra's algorithm based on the weighted action paths, yielding a set of demonstration paths $\mathcal{P}^*$; the expert demonstration paths are then obtained by random sampling from $\mathcal{P}^*$.
Further, the path discriminator $D_p(s_t, a_t)$ relates state $s_t$ to action $a_t$ at time $t$ and is defined as:

$$h_d = \tanh(W_{d,s}\, s_t)$$
$$D_p(s_t, a_t) = \sigma\!\left(a_{p,t}\, W_{d,a}\, h_d\right)$$

where $h_d$ is an intermediate variable, $s_t \in \mathbb{R}^{d_s}$ is the embedded vector of state $s_t$, $a_{p,t} \in \mathbb{R}^{d_d}$ is the embedded vector of action $a_t$ in the discriminator $D_p$, $\tanh(\cdot)$ denotes the hyperbolic tangent function, $\sigma(\cdot)$ the logistic sigmoid function, $W_{d,s} \in \mathbb{R}^{d_d \times d_s}$ and $W_{d,a} \in \mathbb{R}^{d_d \times d_d}$ are learned parameters, $d_a$ denotes the dimension of action embeddings in the actor network, $d_s$ the dimension of state embeddings, and $d_d$ the dimension of action embeddings in the path discriminator.
A MOOC interpretable recommendation terminal device, comprising a processor, a memory and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above method of the embodiments of the present invention.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method of the embodiments of the present invention.
By adopting the above technical scheme, the invention can model explicit information and implicit feedback in the knowledge graph, and can also produce interpretable MOOC recommendations through deep reinforcement learning.
Drawings
Fig. 1 is a flowchart illustrating a first embodiment of the present invention.
Fig. 2 is a general framework diagram of the first embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the drawings and the detailed description.
The first embodiment is as follows:
An embodiment of the present invention provides a MOOC interpretable recommendation method, as shown in Figs. 1 and 2, the method including the following steps:
s1: the method comprises the steps of collecting historical course selection records of learners, extracting learners, courses, course concepts and subject classifications from the historical course selection records as entities, extracting relationships among the entities, and constructing a triple data set based on the relationships among the entities.
S2: and constructing a knowledge graph based on the triple data set, and performing vectorization representation on the entities in the knowledge graph through a TransE model to obtain the embedded vectors of the entities.
The knowledge graph can be constructed from triple data (h, r, t), which semantically associates entities: h and t denote the head node and tail node of a relation (in other words, the head entity and tail entity), and r denotes the relation, for example <learner - takes - course>.
It should be noted that, for the triple data in the constructed knowledge graph to be usable in practice, the vocabulary text in it must be converted into numerical form for subsequent computation. Specifically, a mapping of the vocabulary text into vector space can be obtained through distributed representation learning on the knowledge graph, that is, each piece of text corresponds to a vector in the vector space, with entities corresponding to entity vectors and relations to relation vectors. This embodiment uses a TransE model to vectorize the entities $e_i$ and relations $r$ in the knowledge graph, obtaining an embedded vector $e \in \mathbb{R}^{d_E}$ for each entity and an embedded vector $r \in \mathbb{R}^{d_E}$ for each relation, where $d_E$ denotes the dimension of the vectors.
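As an illustration of this vectorization step, the following is a minimal sketch of TransE training with margin-based ranking over corrupted triples; the entity and relation names, the dimension $d_E$, the margin and the learning rate are illustrative assumptions, not values fixed by this embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)
d_E = 64  # embedding dimension d_E (illustrative value)

# Hypothetical entity/relation vocabularies extracted from course-selection records.
entities = {"learner_1": 0, "course_ML": 1, "concept_gradient": 2}
relations = {"takes": 0, "contains_concept": 1}

E = rng.normal(scale=0.1, size=(len(entities), d_E))  # entity embeddings
R = rng.normal(scale=0.1, size=(len(relations), d_E))  # relation embeddings

def transe_score(h, r, t):
    """TransE plausibility ||h + r - t||_1: smaller means more plausible."""
    return np.abs(E[h] + R[r] - E[t]).sum()

def sgd_step(h, r, t, t_neg, margin=1.0, lr=0.01):
    """One margin-ranking SGD step on a true triple vs. a corrupted tail."""
    loss = margin + transe_score(h, r, t) - transe_score(h, r, t_neg)
    if loss > 0:  # only update when the margin is violated
        g_pos = np.sign(E[h] + R[r] - E[t])
        g_neg = np.sign(E[h] + R[r] - E[t_neg])
        E[h] -= lr * (g_pos - g_neg)
        R[r] -= lr * (g_pos - g_neg)
        E[t] += lr * g_pos
        E[t_neg] -= lr * g_neg

sgd_step(entities["learner_1"], relations["takes"],
         entities["course_ML"], entities["concept_gradient"])
```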
To capture the learner's chronological preference for courses, this embodiment models the learner's learning behavior from the interaction sequence using a coarse-grained course representation. In addition, a fine-grained concept representation method is used to capture the learner's knowledge state, treating concepts as attribute-level information of courses. In this way, the learner's chronological preferences can be combined with attribute-level preferences to better model the learner's knowledge structure while mining latent factors in the prerequisite relationships between courses, since similar courses generally share some of the same course concepts.
(1) The learner is encoded by a course representation method of coarse granularity.
In knowledge-graph-based course recommendation, let $U$ denote the set of learners and $C$ the set of courses. Given a historical course-selection record $C_u$, the recommendation task aims to find, for a specific learner $u \in U$, a learning path that leads to the corresponding target course $c_{t_u+1}$. To this end, the historical course-selection record of learner $u$ can be formalized by sorting it in chronological order as

$$C_u = \{c_1, \ldots, c_t, \ldots, c_{t_u}\}$$

where $c_t \in C$ denotes the course taken by learner $u$ at time $t$ and $t_u$ denotes the number of courses the learner has selected. Thus, the coarse-grained course representation can encode a learner by expressing the learner's embedded vector as:

$$u = \{c_1, \ldots, c_t, \ldots, c_{t_u}\}$$
in this way, the learner's embedded vector models the learner's learning behavior through a coarse-grained curriculum representation approach, thereby capturing the learner's chronological preferences for the curriculum.
Since the coarse-grained course representation cannot interpret the hidden vector of each course, it is difficult to infer the learner's knowledge level from the course-selection history alone. For this reason, this embodiment also proposes a fine-grained concept representation method to address this challenge.
(2) The course is encoded by a fine-grained conceptual representation.
It is well known that a learner's knowledge structure is made up of many knowledge points, including curriculum concepts. In addition, each course contains multiple concepts, and similar courses often have the same concepts between them. In this case, we can capture the learner's knowledge level using semantic representations of the lesson concepts.
More precisely, the fine-grained concept representation captures the learner's knowledge state through the concepts of the taken courses, i.e., $\{k_1, \ldots, k_i\}$, where $k_i$ is the embedded vector of a course concept in a taken course and serves as attribute-level information of that course. Typically, a course-concept embedding consists of a series of word vectors. Formally, from a series of concept embeddings, a course embedding can be composed of a set of vector pairs:

$$c_t = \{(k, w) \mid (k_i, w_j),\; n > i > 0,\; j > 0\}$$

where $n$ denotes the number of course concepts contained in the course, $i$ the index of a course concept, $j$ the index of a word contained in a course concept, $k_i$ the embedded vector of the $i$-th course concept, $w_j$ the embedded vector of the $j$-th word in concept $k_i$, $k$ the embedded vectors of all concepts in the course, and $w$ the embedded vectors of all words in the course.
In particular, several similar courses may be connected through one or more shared course concepts. This connectivity can reveal the latent factors underlying the prerequisite relationships between courses. For example, the course "Genetics" and its prerequisite course "Cell Biology" share some of the same course concepts, such as "gene" and "cell". In this case, if the learner chooses to take "Genetics" or "Cell Biology", this semantic awareness enables the multi-scale representation learning method to capture the learner's knowledge level and interests. The method therefore enriches the semantic interaction information in the knowledge graph and benefits path inference for MOOC recommendation.
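The following sketch illustrates this fine-grained representation under the assumption that a concept embedding $k_i$ is pooled from its word vectors $w_j$; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical word vectors; in practice these would be pretrained embeddings.
word_vecs = {w: rng.normal(size=32) for w in ["gene", "cell", "protein"]}

def concept_vec(words):
    """A concept embedding k_i composed from its word vectors w_j (mean pooled)."""
    return np.mean([word_vecs[w] for w in words], axis=0)

# Two courses represented by their concept embeddings, per c_t = {(k, w) | ...}.
genetics = {"gene": concept_vec(["gene"]), "cell": concept_vec(["cell"])}
cell_bio = {"cell": concept_vec(["cell"]), "protein": concept_vec(["protein"])}

# Shared concepts connect similar courses and hint at prerequisite relations.
shared = genetics.keys() & cell_bio.keys()  # {'cell'}
```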
S3: constructing a learning-path inference model (the self-supervision module in Fig. 2) based on a self-supervised reinforcement learning method, the model being used to guide the recommendation agent in learning-path inference from the learner to the target course on the knowledge graph.
Step S3 serves to provide interpretable recommendations to the learner. To this end, this embodiment proposes a self-supervised deep reinforcement learning method to guide path inference on the knowledge graph constructed by the multi-scale representation method. Specifically, starting from a learner in the observed historical course-selection record $C_u$, the recommendation agent performs multi-hop path inference on the knowledge graph, so that the recommended courses both fit the learner's knowledge structure and satisfy the constraints of the prerequisite relations between courses. The learning-path inference model helps the recommendation agent distinguish the strengths of different paths in the knowledge graph, so as to infer learner preferences and find reasonable demonstrations for accurate recommendation.
(1) Markov decision process
In this embodiment the path-inference problem is expressed as a Markov Decision Process (MDP). The agent attempts to recommend an appropriate course for the learner by performing multi-hop path inference on the knowledge graph. Formally, the MDP can be defined as a 5-tuple

$$(\mathcal{S}, \mathcal{A}, P, \mathcal{R}, \gamma)$$

where $\mathcal{S}$ denotes the state space, $\mathcal{A}$ the action space, $P$ the state-transition probability, $\mathcal{R}$ the reward function of the environment, and $\gamma$ the discount factor of the reward.

State: $s_t \in \mathcal{S}$ denotes the agent's search state in the knowledge graph at time $t$. Here, the path-discovery process is assumed to follow the multi-hop relation between learner $u$ and the target course $c_{t_u+1}$, i.e., the initial state is $s_0 = u$ and the other states are $s_t = (u, r_1, e_1, \ldots, r_t, e_t)$.
To enhance the agent's path-reasoning capability and obtain higher recommendation accuracy, course concepts are introduced as auxiliary information to increase path connectivity.
Actions: according to state $s_t$, the agent performs an action following the policy $\pi(a_t \mid s_t)$ to predict a feasible output edge of entity $e_t$ (excluding entities already searched). The size of the action space must be controlled here, because some entities in the knowledge graph have very large out-degrees. This embodiment therefore uses weighted actions to preserve potential output edges, so that the policy can be adjusted to infer learner preferences. Formally, the action space can be defined as

$$\mathcal{A}_t = \{(r, e) \mid (e_t, r, e) \in \mathcal{G},\; e \in \mathcal{E}\}$$

where $\mathcal{E}$ denotes the set of entities and $\mathcal{G}$ denotes the knowledge graph.
Reward: $R_{e,T}$ denotes the terminal reward, which measures whether the agent has generated a multi-hop path starting with learner $u$ and ending at the target course $c_{t_u+1}$. Formally, the terminal reward at the final time $T$ can be defined as

$$R_{e,T} = \mathbb{1}\left(e_T = c_{t_u+1}\right)$$

where $\mathbb{1}(\cdot)$ is the indicator function of path discovery, i.e., it equals 1 when $e_T = c_{t_u+1}$ and 0 otherwise.
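A minimal sketch of this MDP environment, assuming the knowledge graph is stored as a simple adjacency map, is given below; a trained agent would sample actions from the policy $\pi(a_t \mid s_t)$ rather than greedily take the first edge as this toy episode does.

```python
# Hypothetical adjacency-map knowledge graph: {entity: [(relation, entity), ...]}.
graph = {
    "learner_u": [("takes", "course_A")],
    "course_A": [("contains_concept", "concept_x")],
    "concept_x": [("concept_of", "course_B")],
}

def action_space(e_t, visited):
    """Feasible output edges of entity e_t, excluding already-searched entities."""
    return [(r, e) for r, e in graph.get(e_t, []) if e not in visited]

def terminal_reward(e_T, target_course):
    """R_{e,T}: 1 if the multi-hop path terminates at the target course, else 0."""
    return 1.0 if e_T == target_course else 0.0

# Example: one inference episode of at most 3 hops starting from the learner.
state, visited, path = "learner_u", {"learner_u"}, ["learner_u"]
for _ in range(3):
    actions = action_space(state, visited)
    if not actions:
        break
    _, state = actions[0]  # a real agent samples from pi(a_t | s_t) instead
    visited.add(state)
    path.append(state)
print(path, terminal_reward(path[-1], "course_B"))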
(2) Self-supervision module
The self-supervision module in this embodiment serves two functions. One is the weighted action path, which helps the recommendation agent (i.e., the actor) distinguish the strengths of different paths in the knowledge graph so as to infer the learner's preferences. The other is a path discriminator based on inverse reinforcement learning, which obtains reasonable demonstration paths to achieve accurate recommendation. The implementation details of these two functions are described below.
1) Weighted action paths
Some studies assume that shorter paths make recommendations easier to interpret, and therefore infer paths on an unweighted graph using the minimum number of multi-hop relations. However, this approach does not adequately mine the dependencies between entities and the overall semantics of paths, which may lead to unreasonable inference. As an alternative, a weighting operation based on the similarity of relations between entities can learn the dependencies between entities and distinguish the strengths of different paths. For any given triple $(e_t, r_t, e_{t+1})$, in which head entity $e_t$ and tail entity $e_{t+1}$ are connected by relation $r_t$, the weight of each edge in a path can be defined as follows:

$$w_{(e_t, r_t, e_{t+1})} = \lVert v - v' \rVert_1$$

where $w_{(e_t, r_t, e_{t+1})}$ denotes the weight of the triple $(e_t, r_t, e_{t+1})$, $v$ denotes the sum of the head-entity vector $e_t$ and the relation vector $r_t$, and $v'$ denotes the tail-entity vector $e_{t+1}$. The smaller the weight of an edge in the path, the stronger the dependency between the two entities on that edge, since they are closer in vector space.
Based on the weighted action paths, Dijkstra's algorithm is used to generate the shortest path between learner $u$ and target course $c_{t_u+1}$ on the weighted graph. This process is repeated for all learners $u$ and target courses $c_{t_u+1}$ to obtain a set of demonstration paths $\mathcal{P}^*$, where each path in $\mathcal{P}^*$ minimizes the total edge weight $w_{(e_t, r_t, e_{t+1})}$ over its triples. In this way, the recommendation agent uses the weighted action paths to adjust the policy and efficiently infer the learner's preferences, since the path weights explore the overall semantics of paths in the observed interactions.
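Demonstration-path generation might be sketched as follows, using networkx for Dijkstra's algorithm (a tooling assumption; the embodiment only specifies the algorithm itself) and TransE-style embeddings for the edge weights; the graph and vectors are illustrative.

```python
import numpy as np
import networkx as nx

def edge_weight(h_vec, r_vec, t_vec):
    """w = ||(e_t + r_t) - e_{t+1}||_1: smaller means stronger dependency."""
    return float(np.abs(h_vec + r_vec - t_vec).sum())

rng = np.random.default_rng(3)
# Hypothetical embeddings; in the method these come from the trained TransE model.
vec = {n: rng.normal(size=8) for n in ["learner_u", "course_A", "course_B"]}
r_takes = rng.normal(size=8)

G = nx.DiGraph()
G.add_edge("learner_u", "course_A",
           weight=edge_weight(vec["learner_u"], r_takes, vec["course_A"]))
G.add_edge("course_A", "course_B",
           weight=edge_weight(vec["course_A"], r_takes, vec["course_B"]))

# The minimum-total-weight path serves as one demonstration path in P*.
demo = nx.dijkstra_path(G, "learner_u", "course_B", weight="weight")
```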
2) Path discriminator based on inverse reinforcement learning
In this embodiment, generative adversarial imitation learning is adopted to obtain reasonable demonstration paths that conform to predefined meta-paths. It uses expert demonstration paths and reward signals to incentivize the policy, achieving accurate recommendation. In this way, the recommendation agent can recommend courses matching the learner's knowledge level and interests while strengthening its reasoning ability.
Specifically, the actor network competes and cooperates with the path discriminator $D_p$: the actor network first generates a path, the path discriminator then distinguishes expert demonstration paths from generated paths, and the actor network tries to fool the path discriminator by imitating the expert demonstration paths. Formally, the path discriminator $D_p(s_t, a_t)$ relates state $s_t$ to action $a_t$ at time $t$ and can be defined as follows:

$$h_d = \tanh(W_{d,s}\, s_t)$$
$$D_p(s_t, a_t) = \sigma\!\left(a_{p,t}\, W_{d,a}\, h_d\right)$$

where $h_d$ is an intermediate variable, $s_t \in \mathbb{R}^{d_s}$ is the embedded vector of state $s_t$, $a_{p,t} \in \mathbb{R}^{d_d}$ is the embedded vector of action $a_t$ in the discriminator $D_p$, $\tanh(\cdot)$ denotes the hyperbolic tangent function, $\sigma(\cdot)$ the logistic sigmoid function, $W_{d,s} \in \mathbb{R}^{d_d \times d_s}$ and $W_{d,a} \in \mathbb{R}^{d_d \times d_d}$ are learned parameters, $d_a$ denotes the dimension of action embeddings in the actor network, $d_s$ the dimension of state embeddings, and $d_d$ the dimension of action embeddings in the path discriminator.
The path discriminator $D_p(s_t, a_t)$ is used to compute the probability that $(s_t, a_t)$ comes from an observed demonstration path. In general, this is achieved by minimizing the following classification loss function $\mathcal{L}_D$:

$$\mathcal{L}_D = -\mathbb{E}_{(s^*_t, a^*_t) \sim \mathcal{P}^*}\left[\log D_p(s^*_t, a^*_t)\right] - \mathbb{E}_{(s_t, a_t) \sim \pi_\theta}\left[\log\left(1 - D_p(s_t, a_t)\right)\right]$$

where the action $a^*_t$ and state $s^*_t$ are determined by an expert demonstration path, which is randomly sampled from the observed demonstration paths $\mathcal{P}^*$.
When the actor network generates a pair $(s_t, a_t)$ similar to the observed demonstrations, the reward of the path discriminator $R_{p,t}$ can be obtained as follows:

$$R_{p,t} = \log D_p(s_t, a_t) - \log\left(1 - D_p(s_t, a_t)\right)$$
To update the policy smoothly so as to find paths approximating the observed demonstrations, the aggregate reward $R_t$ is defined as a linear combination of the path-discovery reward and the path-discriminator reward:

$$R_t = \lambda R_{e,T} + (1 - \lambda) R_{p,t}$$

where $\lambda \in [0, 1]$ is a scaling coefficient balancing the path-discovery reward $R_{e,T}$ and the path-discriminator reward $R_{p,t}$.
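A sketch of the path discriminator and its rewards, assuming PyTorch and the tanh/sigmoid structure defined above, follows; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

d_s, d_d = 128, 64  # state and discriminator action-embedding dimensions

class PathDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.W_ds = nn.Linear(d_s, d_d, bias=False)
        self.W_da = nn.Linear(d_d, d_d, bias=False)

    def forward(self, s_t, a_pt):
        h_d = torch.tanh(self.W_ds(s_t))         # intermediate variable h_d
        logit = (a_pt * self.W_da(h_d)).sum(-1)  # score of the pair (s_t, a_t)
        return torch.sigmoid(logit)              # D_p(s_t, a_t) in (0, 1)

def discriminator_reward(d_prob, eps=1e-8):
    """R_{p,t} = log D_p - log(1 - D_p)."""
    return torch.log(d_prob + eps) - torch.log(1 - d_prob + eps)

def total_reward(r_terminal, r_disc, lam=0.5):
    """R_t = lambda * R_{e,T} + (1 - lambda) * R_{p,t}."""
    return lam * r_terminal + (1 - lam) * r_disc
```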
S4: training the learning-path inference model with an actor-critic algorithm: the actor network learns a path-inference policy according to the value function of the critic network, and the critic network updates the value function in a single step using the temporal-difference method.
(1) Actor: the actor network aims to compute the probability $\pi_\theta(a_t \mid s_t)$ of each action in state $s_t$, so as to learn a path-inference policy. It effectively guides path inference using the weighted action paths and the expert path discriminator. In this embodiment, a multi-layer fully connected neural network is used to train the actor network $\pi_\theta(a_t, s_t)$:

$$h_\theta = \mathrm{ReLU}(W_{\theta,s}\, s_t)$$
$$\pi_\theta(a_t \mid s_t) = \mathrm{softmax}\left(a_{\theta,t}\, \mathrm{ReLU}(W_{\theta,a}\, h_\theta)\right)$$

where $\mathrm{ReLU}(\cdot)$ denotes the activation function, $a_{\theta,t} \in \mathbb{R}^{d_a}$ denotes the embedded vector of action $a_t$ in the actor network, $W_{\theta,s} \in \mathbb{R}^{d_h \times d_s}$ and $W_{\theta,a} \in \mathbb{R}^{d_a \times d_h}$ are the actor-network parameters to be learned, $d_h$ denotes the dimension of the hidden layer, $d_s$ the dimension of state embeddings, and $d_a$ the dimension of action embeddings. Here, the actor network is optimized by the policy-gradient method. For each sampled trajectory, the gradient can be computed as follows:

$$\nabla_\theta J(\theta) \propto \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q_\phi(s_t, a_t)\right]$$

where the symbol $\propto$ means "proportional to" and $Q_\phi(s_t, a_t)$ denotes the value of action $a_t$ in state $s_t$. Thus, the actor network can be learned by minimizing the loss function:

$$\mathcal{L}_A = -\mathbb{E}_{\pi_\theta}\left[\log \pi_\theta(a_t \mid s_t)\, Q_\phi(s_t, a_t)\right]$$

where $\mathbb{E}_{\pi_\theta}[\cdot]$ denotes the expected value of the variables under the actor-network policy $\pi_\theta$.
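The actor network and its policy-gradient loss might be sketched as follows, again assuming PyTorch; the layer shapes are an assumption consistent with the formulas above.

```python
import torch
import torch.nn as nn

d_s, d_h, d_a = 128, 256, 64  # illustrative dimensions

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.W_s = nn.Linear(d_s, d_h, bias=False)
        self.W_a = nn.Linear(d_h, d_a, bias=False)

    def forward(self, s_t, action_embs):
        h = torch.relu(self.W_s(s_t))       # h_theta = ReLU(W_{theta,s} s_t)
        scores = action_embs @ self.W_a(h)  # one score per candidate action
        return torch.softmax(scores, dim=-1)  # pi_theta(a_t | s_t)

def actor_loss(log_probs, q_values):
    """L_A = -E[log pi(a_t | s_t) * Q(s_t, a_t)] over sampled steps."""
    return -(log_probs * q_values.detach()).mean()
```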
(2) Critic: the critic network evaluates an action-value function to assess each action in the MDP environment. It can model the rewards of path discovery and of the path discriminator to effectively guide the actor network. The critic network computes the action value $Q_\phi$ in state $s_t$:

$$h_\phi = \mathrm{ReLU}(W_{\phi,s}\, s_t)$$
$$Q_\phi(s_t, a_t) = a_{\phi,t}\, \mathrm{ReLU}(W_{\phi,a}\, h_\phi)$$

where $a_{\phi,t}$ denotes the embedded vector of action $a_t$ in the critic network, and $W_{\phi,s}$ and $W_{\phi,a}$ are the critic-network parameters to be learned.
The critic network is trained by the temporal-difference method, which updates the target $q_t$ in a single step according to the Bellman equation:

$$q_t = R_t + \beta\, Q_\phi(s_{t+1}, a_{t+1})$$

where $\beta \in [0, 1]$ is the decay factor of the action-value function $Q_\phi(s_{t+1}, a_{t+1})$. The critic network can thus be learned by minimizing the temporal-difference error:

$$\mathcal{L}_C = \mathbb{E}\left[\left(q_t - Q_\phi(s_t, a_t)\right)^2\right]$$
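A matching critic sketch with the single-step temporal-difference target $q_t = R_t + \beta Q_\phi(s_{t+1}, a_{t+1})$, under the same PyTorch assumption:

```python
import torch
import torch.nn as nn

d_s, d_h, d_a = 128, 256, 64  # illustrative dimensions

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.W_s = nn.Linear(d_s, d_h, bias=False)
        self.W_a = nn.Linear(d_h, d_a, bias=False)

    def forward(self, s_t, a_t):
        h = torch.relu(self.W_s(s_t))                   # h_phi = ReLU(W_{phi,s} s_t)
        return (a_t * torch.relu(self.W_a(h))).sum(-1)  # Q_phi(s_t, a_t)

def td_loss(critic, s_t, a_t, r_t, s_next, a_next, beta=0.99):
    """Single-step TD error against the Bellman target q_t."""
    with torch.no_grad():
        q_target = r_t + beta * critic(s_next, a_next)
    return (q_target - critic(s_t, a_t)).pow(2).mean()
```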
by minimizing the total loss function, we jointly optimize the path arbiter D p (s t ,a t ) Executor network pi θ And critic network Q φ . Thus, the objective function of the learning path inference model may be defined as follows:
Figure BDA0003693010880000133
s5: and reasoning the learning path from the learner to the target course through the trained learning path reasoning model.
For the knowledge-graph environment, a multi-scale representation learning method is adopted to enhance the semantic representations and relations of the knowledge graph. More specifically, the coarse-grained course representation models the learner's learning behavior through user-course interactions, while the fine-grained concept representation captures the learner's knowledge state, i.e., a series of course concepts $k_1, \ldots, k_i$, treated as attribute-level information of the taken courses. In this way, latent relationships between courses can be learned well.
This embodiment recommends target courses matching the learner's knowledge level and interests through a self-supervised reinforcement learning method: the recommendation agent starts from a learner, performs multi-hop path inference on the knowledge graph, and finally recommends suitable courses in the knowledge graph to the learner. The self-supervision module of the method serves two functions: the path discriminator based on inverse reinforcement learning obtains reasonable demonstration paths to achieve accurate recommendation, and the weighted action paths help the recommendation agent distinguish the strengths of different paths in the knowledge graph so as to infer the learner's preferences.
This embodiment trains the learning-path inference model with the actor-critic algorithm. It uses the reward signals (one reward $R_{e,T}$ for path discovery and another $R_{p,t}$ for path discrimination) to incentivize path inference for policy evaluation in MOOC recommendation.
In summary, this embodiment can model explicit information (e.g., the learner's learning behavior) and implicit feedback (e.g., the learner's knowledge level) in the knowledge graph, and can also produce interpretable MOOC recommendations through deep reinforcement learning.
The second embodiment is as follows:
the invention also provides a mu lesson interpretable recommendation terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method embodiment of the first embodiment of the invention.
Further, as an executable solution, the MOOC interpretable recommendation terminal device may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The MOOC interpretable recommendation terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the above composition is only an example of the MOOC interpretable recommendation terminal device and does not constitute a limitation on it; it may include more or fewer components, combine certain components, or use different components; for example, the device may further include input-output devices, network access devices, buses, etc., which are not limited in this embodiment of the invention.
Further, as an executable solution, the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the MOOC interpretable recommendation terminal device, connecting the various parts of the whole device through various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the MOOC interpretable recommendation terminal device by running or executing the computer program and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created according to use of the device. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card, at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method of the embodiments of the invention.
If the modules/units integrated in the MOOC interpretable recommendation terminal device are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments of the invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A MOOC interpretable recommendation method, comprising the steps of:
S1: collecting learners' historical course-selection records, extracting learners, courses, course concepts and subject classifications from the records as entities, extracting the relationships among the entities, and constructing a triple data set based on those relationships;
S2: constructing a knowledge graph based on the triple data set, and vectorizing the entities and relations in the knowledge graph through a TransE model, wherein the learner's embedded vector is represented by a coarse-grained course representation method, and the course's embedded vector is represented by a fine-grained concept representation method;
S3: constructing a learning-path inference model based on a self-supervised reinforcement learning method to guide the recommendation agent in learning-path inference from the learner to the target course on the knowledge graph;
the learning-path inference model comprises an actor network and a path discriminator, wherein the actor network first generates a path, the path discriminator then distinguishes expert demonstration paths from generated paths, and the actor network tries to deceive the path discriminator by imitating the expert demonstration paths;
S4: training the learning-path inference model with an actor-critic algorithm, wherein the actor network learns a path-inference policy according to the value function of the critic network, and the critic network updates the value function in a single step using the temporal-difference method;
S5: inferring the learning path from the learner to the target course through the trained learning-path inference model.
2. The MOOC interpretable recommendation method according to claim 1, wherein the learner's embedded vector is represented by the coarse-grained course representation method as follows: after the historical course-selection records of learner $u$ are sorted in chronological order, the learner's embedded vector is expressed as:

$$u = \{c_1, \ldots, c_t, \ldots, c_{t_u}\}$$

where $c_t$ denotes the course taken by learner $u$ at time $t$, and $1, \ldots, t, \ldots, t_u$ indicates time from earliest to most recent.
3. The MOOC interpretable recommendation method according to claim 1, wherein the course's embedded vector is represented by the fine-grained concept representation method as follows: the embedded vector of the course is expressed as $c_t$:

$$c_t = \{(k, w) \mid (k_i, w_j),\; n > i > 0,\; j > 0\}$$

where $n$ denotes the number of course concepts contained in the course, $i$ the index of a course concept, $j$ the index of a word contained in a course concept, $k_i$ the embedded vector of the $i$-th course concept, $w_j$ the embedded vector of the $j$-th word in concept $k_i$, $k$ the embedded vectors of all concepts in the course, and $w$ the embedded vectors of all words in the course.
4. The MOOC interpretable recommendation method according to claim 1, wherein in step S3 the path-inference problem is expressed as a Markov decision process, and the agent recommends a learning path from the learner to the target course by performing multi-hop path inference on the knowledge graph; in the Markov decision process the initial state is set to $s_0 = u$ and the state at time $t$ is

$$s_t = (u, r_1, e_1, \ldots, r_t, e_t)$$

According to state $s_t$, the agent performs an action following the policy $\pi(a_t \mid s_t)$ to predict a feasible output edge of entity $e_t$, and the action space is

$$\mathcal{A}_t = \{(r, e) \mid (e_t, r, e) \in \mathcal{G},\; e \in \mathcal{E}\}$$

where $\mathcal{E}$ denotes the set of entities and $\mathcal{G}$ denotes the knowledge graph; the terminal reward $R_{e,T}$ measures whether the agent has generated a multi-hop path starting with learner $u$ and terminating at the target course $c_{t_u+1}$; where $u$ denotes the learner, $r$ a relation, $e$ an entity, $r_t$ the relation vector at time $t$, $e_t$ the entity vector at time $t$, $t \in \{1, 2, \ldots, T\}$, and $\mathcal{R}$ denotes the reward function.
5. The MOOC interpretable recommendation method according to claim 1, wherein potential output edges are preserved by weighting action paths in the Markov decision process, and the weight of each edge in a path is set as:

$$w_{(e_t, r_t, e_{t+1})} = \lVert v - v' \rVert_1$$

where $w_{(e_t, r_t, e_{t+1})}$ denotes the weight of the triple $(e_t, r_t, e_{t+1})$, $v$ denotes the sum of the head-entity vector $e_t$ and the vector of the relation $r_t$ between head entity $e_t$ and tail entity $e_{t+1}$, $v'$ denotes the tail-entity vector $e_{t+1}$, and $\lVert \cdot \rVert_1$ denotes the L1 norm.
6. The MOOC interpretable recommendation method according to claim 1, wherein the expert demonstration paths are obtained as follows: for all learners $u$ and target courses $c_{t_u+1}$, the shortest path between learner $u$ and target course $c_{t_u+1}$ is generated on the weighted graph using Dijkstra's algorithm based on the weighted action paths, yielding a set of demonstration paths $\mathcal{P}^*$; the expert demonstration paths are obtained by random sampling from $\mathcal{P}^*$.
7. The MOOC interpretable recommendation method according to claim 1, wherein the path discriminator $D_p(s_t, a_t)$ relates state $s_t$ to action $a_t$ at time $t$ and is defined as:

$$h_d = \tanh(W_{d,s}\, s_t)$$
$$D_p(s_t, a_t) = \sigma\!\left(a_{p,t}\, W_{d,a}\, h_d\right)$$

where $h_d$ is an intermediate variable, $s_t \in \mathbb{R}^{d_s}$ is the embedded vector of state $s_t$, $a_{p,t} \in \mathbb{R}^{d_d}$ is the embedded vector of action $a_t$ in the discriminator $D_p$, $\tanh(\cdot)$ denotes the hyperbolic tangent function, $\sigma(\cdot)$ the logistic sigmoid function, $W_{d,s} \in \mathbb{R}^{d_d \times d_s}$ and $W_{d,a} \in \mathbb{R}^{d_d \times d_d}$ are learned parameters, $d_a$ denotes the dimension of action embeddings in the actor network, $d_s$ the dimension of state embeddings, and $d_d$ the dimension of action embeddings in the path discriminator.
8. A MOOC interpretable recommendation terminal device, characterized in that it comprises a processor, a memory, and a computer program stored in the memory and running on the processor, the processor implementing the steps of the method according to any one of claims 1 to 7 when executing the computer program.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202210666129.0A 2022-06-14 2022-06-14 MOOC interpretable recommendation method, terminal device and storage medium Pending CN115238169A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210666129.0A CN115238169A (en) 2022-06-14 2022-06-14 MOOC interpretable recommendation method, terminal device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210666129.0A CN115238169A (en) 2022-06-14 2022-06-14 MOOC interpretable recommendation method, terminal device and storage medium

Publications (1)

Publication Number Publication Date
CN115238169A true CN115238169A (en) 2022-10-25

Family

ID=83669551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210666129.0A Pending CN115238169A (en) 2022-06-14 2022-06-14 MOOC interpretable recommendation method, terminal device and storage medium

Country Status (1)

Country Link
CN (1) CN115238169A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115577185A (en) * 2022-11-15 2023-01-06 Hunan Normal University MOOC recommendation method and device based on hybrid reasoning and neutrosophic group decision-making
CN115658877A (en) * 2022-12-27 2023-01-31 神州医疗科技股份有限公司 Medicine recommendation method and device based on reinforcement learning, electronic equipment and medium

Similar Documents

Publication Publication Date Title
Huo et al. Knowledge modeling via contextualized representations for LSTM-based personalized exercise recommendation
CN111460249B (en) Personalized learning resource recommendation method based on learner preference modeling
CN111582694B (en) Learning evaluation method and device
CN116134454A (en) Method and system for training neural network models using knowledge distillation
CN115238169A (en) 2022-10-25 MOOC interpretable recommendation method, terminal device and storage medium
CN109032591B (en) Crowdsourcing software developer recommendation method based on meta-learning
CN110889450B (en) Super-parameter tuning and model construction method and device
CN112221159A (en) Virtual item recommendation method and device and computer readable storage medium
Govindarajan et al. Dynamic learning path prediction—A learning analytics solution
CN111428448A (en) Text generation method and device, computer equipment and readable storage medium
El Gourari et al. The Implementation of Deep Reinforcement Learning in E‐Learning and Distance Learning: Remote Practical Work
Habib Hands-on Q-learning with python: Practical Q-learning with openai gym, Keras, and tensorflow
CN115330142A (en) Training method of joint capacity model, capacity requirement matching method and device
CN116186409A (en) Diversified problem recommendation method, system and equipment combining difficulty and weak knowledge points
CN112819024A (en) Model processing method, user data processing method and device and computer equipment
Adnan et al. Improving m-learners’ performance through deep learning techniques by leveraging features weights
KR102624135B1 (en) Artificial intelligence-based non-face-to-face programming training automation platform service provision method, device and system for enterprises
Ge et al. Deep reinforcement learning navigation via decision transformer in autonomous driving
CN117876090A (en) Risk identification method, electronic device, storage medium, and program product
Ciaburro Keras reinforcement learning projects: 9 projects exploring popular reinforcement learning techniques to build self-learning agents
CN114358988B (en) Teaching mode pushing method and device based on AI technology
Jiang et al. Learning analytics in a blended computer education course
CN112825147B (en) Learning path planning method, device, equipment and storage medium
Houlsby Efficient Bayesian active learning and matrix modelling
CN112907004B (en) Learning planning method, device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination