CN115238169A - MOOC interpretable recommendation method, terminal device and storage medium - Google Patents
MOOC interpretable recommendation method, terminal device and storage medium
- Publication number
- CN115238169A (application CN202210666129.0A)
- Authority
- CN
- China
- Prior art keywords
- course
- path
- learner
- representing
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Abstract
The invention relates to a MOOC interpretable recommendation method, a terminal device and a storage medium, wherein the method comprises the following steps: constructing a triple data set from the entities in learners' historical course-selection records and the relations between those entities; constructing a knowledge graph based on the triple data set and vectorizing its entities and relations through a TransE model, wherein learners and courses are represented by a coarse-grained course representation method and a fine-grained concept representation method, respectively; constructing a learning path inference model based on a self-supervised reinforcement learning method, which guides a recommendation agent to perform learning-path reasoning from a learner to a target course on the knowledge graph; training the learning path inference model with an actor-critic algorithm; and inferring the learning path from the learner to the target course through the trained learning path inference model. The invention can construct both explicit information and implicit feedback in the knowledge graph, and can make interpretable MOOC recommendations through deep reinforcement learning.
Description
Technical Field
The invention relates to the field of MOOC recommendation, and in particular to a MOOC interpretable recommendation method, a terminal device and a storage medium.
Background
Existing popular interpretable recommendation methods are not suitable for the MOOC (Massive Open Online Course) recommendation scenario, because the online education environment is usually constrained by two complex conditions. First, the prerequisite relationships between courses. The precedence relationship between courses is an important consideration when recommending them. Generally, a course recommendation should include prerequisite courses, because the learner may lack the knowledge points of those prerequisites and needs to master them to better understand the recommended course. Second, the knowledge structure of the learner. It is well known that a learner's knowledge structure evolves during the learning process. In this case, any one-size-fits-all recommendation strategy is suboptimal, because learners will reach different completion states for recommended courses depending on their learning abilities; the recommendation strategy therefore needs to take the learner's cognitive level into account. These complex constraints make an interpretable MOOC recommendation method difficult to implement.
Disclosure of Invention
In order to solve the above problems, the present invention provides a MOOC interpretable recommendation method, a terminal device and a storage medium.
The specific scheme is as follows:
A MOOC interpretable recommendation method, comprising the steps of:
s1: collecting historical course selection records of learners, extracting learners, courses, course concepts and subject classifications from the historical course selection records as entities, extracting the relationships among the entities, and constructing a triple data set based on the relationships among the entities;
s2: constructing a knowledge graph based on the triple data set, and vectorizing the entities and relations in the knowledge graph through a TransE model, wherein the learner's embedded vector is represented through a coarse-grained course representation method and the course's embedded vector through a fine-grained concept representation method;
s3: constructing a learning path inference model based on a self-supervised reinforcement learning method, which guides a recommendation agent to perform learning-path reasoning from the learner to a target course on the knowledge graph;
the learning path inference model comprises an actor network and a path discriminator, wherein the actor network first generates a path, the path discriminator then distinguishes an expert demonstration path from the generated path, and the actor network attempts to fool the path discriminator by imitating the expert demonstration path;
s4: training the learning path inference model with an actor-critic algorithm, wherein the actor network learns a path-inference policy from the value function of the critic network, and the critic network updates the value function in single steps using a temporal-difference method;
s5: and reasoning the learning path from the learner to the target course through the trained learning path reasoning model.
Further, the learner's embedded vector is represented by the coarse-grained course representation method as follows: after learner u's historical course-selection records are sorted in chronological order, the learner's embedded vector is expressed as u = (c_1, ..., c_t, ..., c_{t_u}), where c_t denotes the course taken by learner u at time t, and the indices 1, ..., t, ..., t_u run from the earliest time to the most recent.
Further, the embedded vector of a course is represented as c_t:

c_t = {(k, w) | (k_i, w_j), n > i > 0, j > 0}

where n denotes the number of course concepts contained in the course, i the index of a course concept, j the index of a word contained in a course concept, k_i the embedded vector of the i-th course concept of the course, w_j the embedded vector of the j-th word in concept k_i, k the embedded vectors of all concepts in the course, and w the embedded vectors of all words in the course.
Further, in step S3, the path inference problem is formulated as a Markov decision process, and the agent recommends a learning path from the learner to the target course by performing multi-hop path reasoning on the knowledge graph. The initial state is set as s_0 = u, and the state at time t is determined by learner u and the entity reached at time t. According to state s_t, the agent performs the associated action under its policy to predict a feasible outgoing edge of entity e_t; the action space is A_t = {(r_{t+1}, e_{t+1}) | (e_t, r_{t+1}, e_{t+1}) ∈ G, e_{t+1} ∈ ε}, where ε denotes the set of entities and G denotes the knowledge graph. A terminal reward R_{e,T} measures whether the agent has generated a multi-hop path starting from learner u and terminating at the target course. Here u denotes the learner, r a relation and e an entity; r_t denotes the relation vector at time t, e_t the entity vector at time t, and R the reward function.
Further, potential outgoing edges are retained through weighted action paths in the Markov decision process, and the weight of each edge in a path is set as:

w(e_t, r_t, e_{t+1}) = ||e_t + r_t - e_{t+1}||_1

where w(e_t, r_t, e_{t+1}) denotes the weight of the edge for the triplet (e_t, r_t, e_{t+1}), e_t denotes the vector of the head entity, r_t the vector of the relation between head entity e_t and tail entity e_{t+1}, e_{t+1} the vector of the tail entity, and ||·||_1 the L1 norm.
Further, the expert demonstration paths are obtained as follows: for every learner u and target course, Dijkstra's algorithm is applied on the graph weighted by the action-path weights to generate the shortest path between learner u and the target course, yielding a series of demonstration paths; an expert demonstration path is then obtained by random sampling from these demonstration paths.
Further, the path discriminator D_p(s_t, a_t) denotes the probability of action a_t at time t given state s_t. It is computed from an intermediate variable derived from the embedded vector of state s_t and the embedded vector a_{p,t} of the action in discriminator D_p, using the hyperbolic tangent function tanh(·), the logistic sigmoid function σ(·) and learned parameter matrices, where d_a denotes the dimension of action embeddings in the actor network, d_s the dimension of state embeddings, and d_d the dimension of action embeddings in the path discriminator.
A MOOC interpretable recommendation terminal device, comprising a processor, a memory and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of the embodiments of the present invention described above.
A computer-readable storage medium in which a computer program is stored, the computer program, when executed by a processor, carrying out the steps of the method of the embodiments of the invention described above.
By adopting the above technical scheme, the invention can construct both explicit information and implicit feedback in the knowledge graph, and can make interpretable MOOC recommendations through deep reinforcement learning.
Drawings
Fig. 1 is a flowchart illustrating a first embodiment of the present invention.
Fig. 2 is a general framework diagram of the first embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the drawings and the detailed description.
Embodiment one:
An embodiment of the present invention provides a MOOC interpretable recommendation method, as shown in fig. 1 and 2, comprising the following steps:
s1: the method comprises the steps of collecting historical course selection records of learners, extracting learners, courses, course concepts and subject classifications from the historical course selection records as entities, extracting relationships among the entities, and constructing a triple data set based on the relationships among the entities.
S2: and constructing a knowledge graph based on the triple data set, and performing vectorization representation on the entities in the knowledge graph through a TransE model to obtain the embedded vectors of the entities.
The knowledge graph is constructed from triple data (h, r, t), which semantically associates entities: h and t denote the head node and tail node of a relationship (in other words, the head entity and tail entity), and r denotes the relationship itself, such as <learner - selects - course>.
It should be noted that, to actually use the triple data in the constructed knowledge graph, the vocabulary text in it must be converted to numbers for subsequent computation. Specifically, distributed representation learning on the knowledge graph yields a mapping of the vocabulary text into a vector space, i.e. a corresponding vector for each text item, where each entity corresponds to an entity vector and each relation to a relation vector. This embodiment uses the TransE model to vectorize each entity e_i and relation r in the knowledge graph, obtaining an embedded vector for each entity and each relation, where d_E denotes the dimension of the vectors.
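The translation principle behind TransE can be sketched as follows. This is an assumption-level illustration with toy three-dimensional vectors, not the patent's training code: a triple (h, r, t) is considered plausible when the head vector translated by the relation vector lands near the tail vector, i.e. h + r ≈ t under the L1 norm.

```python
def l1_distance(x, y):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def transe_score(head, relation, tail):
    """Lower score = more plausible triple: ||h + r - t||_1."""
    translated = [h + r for h, r in zip(head, relation)]
    return l1_distance(translated, tail)

# Toy embeddings for a <learner - selects - course> triple (illustrative values).
learner = [0.1, 0.2, 0.3]
selects = [0.4, 0.1, -0.1]
course = [0.5, 0.3, 0.2]   # close to learner + selects

print(transe_score(learner, selects, course))  # near-zero: plausible triple
```

Training would adjust the embeddings so observed triples score low and corrupted triples score high; only the scoring function is shown here.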
To capture the learner's time-series preference for courses, this embodiment models the learner's learning behavior from the interaction sequence using a coarse-grained course representation. In addition, a fine-grained concept representation is used to grasp the learner's knowledge state, treating concepts as attribute-level information of a course. In this way, the learner's time-series preferences can be combined with attribute-level preferences to better model the learner's knowledge structure, while also mining the latent factors behind prerequisite relationships between courses, since similar courses generally share some of the same course concepts.
(1) The learner is encoded by a course representation method of coarse granularity.
In knowledge-graph-based course recommendation, let U denote the set of learners and C the set of courses. Given a historical course-selection record C_u, the recommendation task aims to find a reasoning path from a specific learner u ∈ U to the corresponding target course in C. To this end, learner u's historical course-selection record is sorted in chronological order and formalized as C_u = (c_1, ..., c_t, ..., c_{t_u}), where c_t ∈ C denotes the course taken by learner u at time t, and t_u denotes the number of courses the learner has selected. A coarse-grained course representation can thus encode the learner by expressing the learner's embedded vector as u = (c_1, ..., c_t, ..., c_{t_u}). In this way, the learner's embedded vector models the learner's learning behavior through the coarse-grained course representation, thereby capturing the learner's chronological preferences for courses.
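The coarse-grained encoding step can be sketched as below. The record format and course vectors are hypothetical; the essential operations are sorting the course-selection records chronologically and representing the learner as the ordered sequence of the selected courses' embedded vectors.

```python
def encode_learner(records, course_vectors):
    """records: list of (timestamp, course_id); course_vectors: id -> vector.

    Returns the learner's coarse-grained representation: the course
    embedding vectors ordered from the earliest selection to the latest.
    """
    ordered = sorted(records, key=lambda rec: rec[0])  # far past -> recent
    return [course_vectors[course_id] for _, course_id in ordered]

# Toy data: two courses taken in two semesters (illustrative values).
course_vectors = {
    "cell_biology": [0.2, 0.7],
    "genetics": [0.3, 0.8],
}
records = [(20220105, "genetics"), (20210901, "cell_biology")]

learner_u = encode_learner(records, course_vectors)
print(learner_u)  # [[0.2, 0.7], [0.3, 0.8]] — earliest course first
```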
Since the coarse-grained course representation cannot interpret the hidden vector of each course, it is difficult to infer the learner's knowledge level from the course-selection history alone. For this reason, a fine-grained concept representation method is also proposed in this embodiment to address the challenge.
(2) The course is encoded by a fine-grained conceptual representation.
It is well known that a learner's knowledge structure is made up of many knowledge points, including course concepts. In addition, each course contains multiple concepts, and similar courses often share the same concepts. In this case, the semantic representations of course concepts can be used to capture the learner's knowledge level.
More precisely, the fine-grained concept representation captures the learner's knowledge state through the concepts of the selected courses, i.e. {k_1, ..., k_i}, where k_i denotes the embedded vector of a course concept in a taken course and serves as attribute-level information of that course. Typically, a course-concept embedding consists of a series of word vectors. Formally, from a series of concept embeddings, a course embedding can be composed of a set of vector pairs:
c t ={(k,w)|(k i ,w j ),n>i>0,j>0}
where n denotes the number of course concepts contained in the course, i the index of a course concept, j the index of a word contained in a course concept, k_i the embedded vector of the i-th course concept of the course, w_j the embedded vector of the j-th word in concept k_i, k the embedded vectors of all concepts in the course, and w the embedded vectors of all words in the course.
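The pair-set structure above can be illustrated with a short sketch. The concept names, word vectors and flat list-of-pairs layout are all illustrative assumptions; the point is only that a course embedding c_t collects one (k_i, w_j) pair per concept-word combination.

```python
def course_embedding(concept_vectors, concept_words):
    """concept_vectors: concept -> k_i; concept_words: concept -> [w_j, ...].

    Returns the set of (concept vector, word vector) pairs making up c_t.
    """
    pairs = []
    for concept, k_i in concept_vectors.items():
        for w_j in concept_words[concept]:
            pairs.append((k_i, w_j))
    return pairs

# Toy vectors for two concepts; "cell" contains two words (hypothetical).
concept_vectors = {"gene": [0.9, 0.1], "cell": [0.2, 0.8]}
concept_words = {"gene": [[0.8, 0.2]], "cell": [[0.1, 0.9], [0.3, 0.7]]}

c_t = course_embedding(concept_vectors, concept_words)
print(len(c_t))  # 3 concept-word vector pairs
```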
In particular, multiple similar courses may be linked by one or more shared course concepts. This connectivity can reveal the underlying factors of the prerequisite relationships between courses. For example, the course "Genetics" and its prerequisite "Cell Biology" share course concepts such as "gene" and "cell". In this case, if the learner takes the "Genetics" or "Cell Biology" course, this semantic awareness enables the multi-scale representation learning method to capture the learner's knowledge level and interests. The method thereby enriches the semantic-interaction information available in the knowledge graph, which benefits the path reasoning of the MOOC recommendation.
S3: constructing a learning path inference model (the self-supervision module in fig. 2) based on a self-supervised reinforcement learning method; the model guides the recommendation agent to perform learning-path reasoning from the learner to the target course on the knowledge graph.
Step S3 provides interpretable recommendations to the learner. To this end, this embodiment proposes a self-supervised deep reinforcement learning method that guides path reasoning on the knowledge graph constructed by the multi-scale representation method. Specifically, starting from a learner in the observed historical course-selection records C_u, the recommendation agent performs multi-hop path reasoning on the knowledge graph, so that the recommended course both conforms to the learner's knowledge structure and satisfies the constraints of the prerequisite relationships between courses. The learning path inference model helps the recommendation agent differentiate the strengths of different paths in the knowledge graph to infer the learner's preferences, and finds reasonable demonstrations to achieve accurate recommendations.
(1) Markov decision process
The path inference problem is expressed in this embodiment as a Markov Decision Process (MDP). The agent attempts to recommend an appropriate course for the learner by performing multi-hop path reasoning on the knowledge graph. Formally, the MDP is defined as a 5-tuple (S, A, P, R, γ), where S denotes the state space, A the action space, P the state-transition probability, R the reward function in the environment, and γ the discount factor of the reward.
State: s_t ∈ S represents the agent's search state in the knowledge graph at time t. The path-discovery process is assumed to capture the multi-hop relationship between learner u and the target course, i.e. the initial state is s_0 = u, and every other state is determined by learner u and the entity reached at time t. To enhance the agent's path-reasoning ability and obtain higher recommendation accuracy, course concepts are introduced as auxiliary information to increase path connectivity.
Action: according to state s_t, the agent performs an action a_t under its policy to predict a feasible outgoing edge of entity e_t (excluding already-searched entities). The size of the action space must be controlled here, since some entities have a very large out-degree in the knowledge graph. This embodiment therefore uses weighted actions to retain potential outgoing edges, so that the policy can be adjusted to infer learner preferences. Formally, the action space is defined as A_t = {(r_{t+1}, e_{t+1}) | (e_t, r_{t+1}, e_{t+1}) ∈ G, e_{t+1} ∈ ε}, where ε denotes the set of entities and G the knowledge graph.
Reward: R_{e,T} denotes the terminal reward, which measures whether the agent has generated a multi-hop path that starts at learner u and ends at the target course. Formally, the terminal reward at the final time T is defined via the indicator function of path discovery: it equals 1 when the terminal entity of the path is the target course, and 0 otherwise.
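The terminal reward described above reduces to a simple indicator on the last entity of the agent's path. A minimal sketch (entity names are illustrative):

```python
def terminal_reward(path, target_course):
    """R_{e,T}: 1 if the multi-hop path ends at the target course, else 0.

    path: list of entities visited, starting at the learner node.
    """
    return 1.0 if path and path[-1] == target_course else 0.0

path = ["learner_u", "cell_biology", "concept:gene", "genetics"]
print(terminal_reward(path, "genetics"))   # 1.0: path ends at the target
print(terminal_reward(path, "astronomy"))  # 0.0: wrong terminal entity
```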
(2) Self-supervision module
The self-supervision module in this embodiment includes two functions: one is a weighted action path that helps the recommending agent (i.e., actor) to differentiate the strengths of the different paths in the knowledge-graph to infer the learner's preferences. The other is a path discriminator based on inverse reinforcement learning, which can obtain reasonable demonstration paths to realize accurate recommendation. Details of the implementation of these two functions are described below.
1) Weighted action paths
Some studies assume that shorter paths make recommendations easier to interpret, and then use the minimum number of multi-hop relations to infer paths on an unweighted graph. However, this approach does not adequately mine the dependencies between entities and the overall semantics of paths, which may lead to unreasonable reasoning. As an alternative, a weighting operation based on the similarity of relations between entities can learn the dependencies between entities and distinguish the strengths of different paths. For any given triplet (e_t, r_t, e_{t+1}), in which head entity e_t and tail entity e_{t+1} are connected by relation r_t, the weight of each edge in a path is defined as:

w(e_t, r_t, e_{t+1}) = ||e_t + r_t - e_{t+1}||_1

where w(e_t, r_t, e_{t+1}) denotes the edge weight of the triplet, e_t the head-entity vector, r_t the relation vector, e_{t+1} the tail-entity vector, and ||·||_1 the L1 norm. The smaller the weight of an edge in a path, the stronger the dependency between the two entities on the path, since they are closer in vector space.
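The edge-weighting rule can be sketched as follows with toy vectors (the candidate entities and their embeddings are hypothetical): candidate outgoing edges are ranked by ||e_t + r_t - e_{t+1}||_1, and the smallest-weight edge marks the strongest dependency.

```python
def edge_weight(head, relation, tail):
    """||e_t + r_t - e_{t+1}||_1 — smaller weight = stronger dependency."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))

# Rank two candidate outgoing edges of one entity (illustrative vectors):
# each candidate is (head vector, relation vector, tail vector).
candidates = {
    "genetics": ([0.1, 0.2], [0.3, 0.1], [0.4, 0.3]),   # h + r ≈ t
    "astronomy": ([0.1, 0.2], [0.3, 0.1], [0.9, 0.9]),  # far in vector space
}
ranked = sorted(candidates, key=lambda e: edge_weight(*candidates[e]))
print(ranked[0])  # "genetics" — smallest weight, strongest dependency
```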
Based on the weighted action paths, Dijkstra's algorithm is used on the weighted graph to generate the shortest path between learner u and the target course. This process is repeated for all learners u and target courses to obtain a series of demonstration paths, in which each step follows the edge with the minimum weight among the candidate triplets (e_t, r_t, e_{t+1}). In this way, the recommendation agent uses the weighted action paths to adjust its policy and efficiently infer the learner's preferences, since the path weights explore the overall semantics of paths in the observed interactions.
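Demonstration-path generation can be sketched with a standard heap-based Dijkstra over an edge-weighted toy graph (node names and weights are illustrative, not from the patent):

```python
import heapq

def dijkstra_path(graph, source, target):
    """graph: node -> list of (neighbor, weight). Returns the minimum-total-
    weight path from source to target, or None if target is unreachable."""
    heap = [(0.0, source, [source])]
    visited = set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(heap, (dist + weight, neighbor, path + [neighbor]))
    return None

# Toy weighted knowledge graph: smaller edge weight = stronger dependency.
graph = {
    "learner_u": [("cell_biology", 0.2), ("astronomy", 0.9)],
    "cell_biology": [("concept:gene", 0.1)],
    "concept:gene": [("genetics", 0.1)],
    "astronomy": [("genetics", 0.8)],
}
demo = dijkstra_path(graph, "learner_u", "genetics")
print(demo)  # ['learner_u', 'cell_biology', 'concept:gene', 'genetics']
```

Repeating this for every (learner, target course) pair would give the pool of demonstration paths from which expert demonstrations are sampled.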
2) Path discriminator based on inverse reinforcement learning
This embodiment adopts generative adversarial imitation learning to obtain reasonable demonstration paths that conform to predefined meta-paths. It uses expert demonstration paths and reward signals to incentivize the policy to achieve accurate recommendations. In this way, the recommendation agent can recommend courses that match the learner's knowledge level and interests while enhancing its reasoning ability.
Specifically, the actor network and the path discriminator D_p interact adversarially: the actor network first generates a path, the path discriminator then distinguishes the expert demonstration path from the generated path, and the actor network attempts to fool the path discriminator by imitating the expert demonstration path. Formally, the path discriminator D_p(s_t, a_t) denotes the probability of action a_t at time t given state s_t, defined as follows.
where the intermediate variable is derived from the embedded vector of state s_t; a_{p,t} denotes the embedded vector of the action in discriminator D_p; tanh(·) denotes the hyperbolic tangent function; σ(·) denotes the logistic sigmoid function; the weight matrices are learned parameters; and d_a denotes the dimension of action embeddings in the actor network, d_s the dimension of state embeddings, and d_d the dimension of action embeddings in the path discriminator.
The path discriminator D_p(s_t, a_t) is used to compute the probability that (s_t, a_t) comes from the observed demonstration paths. In general, this is achieved by minimizing a classification loss over expert state-action pairs, where the expert action and state are determined by an expert demonstration path randomly sampled from the observed demonstration paths.
When the actor network generates a pair (s_t, a_t) similar to the observed demonstrations, the reward of the path discriminator, R_{p,t}, is obtained as follows.

R_{p,t} = log D_p(s_t, a_t) - log(1 - D_p(s_t, a_t))
To smoothly update the policy toward paths that approximate the observed demonstrations, the aggregate reward R_t is defined as a linear combination of the path-discovery reward and the path-discriminator reward:

R_t = λR_{e,T} + (1 - λ)R_{p,t}

where λ ∈ [0, 1] is a scaling factor that balances the path-discovery reward R_{e,T} and the path-discriminator reward R_{p,t}.
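The two reward formulas can be combined as in the sketch below (the probability and λ values are illustrative assumptions): the discriminator reward grows as the generated state-action pair looks more like an expert demonstration, and the aggregate reward mixes it with the terminal path-discovery reward.

```python
import math

def discriminator_reward(d_prob):
    """R_{p,t} = log D_p(s_t, a_t) - log(1 - D_p(s_t, a_t))."""
    return math.log(d_prob) - math.log(1.0 - d_prob)

def aggregate_reward(r_terminal, r_discriminator, lam=0.5):
    """R_t = λ·R_{e,T} + (1 - λ)·R_{p,t}."""
    return lam * r_terminal + (1.0 - lam) * r_discriminator

print(discriminator_reward(0.5))  # 0.0: discriminator cannot tell them apart
# A path that both reaches the target (R_{e,T}=1) and fools the discriminator:
print(aggregate_reward(1.0, discriminator_reward(0.9), lam=0.5))
```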
S4: training the learning path inference model with an actor-critic algorithm, wherein the actor network learns a path-inference policy from the value function of the critic network, and the critic network updates the value function in single steps using a temporal-difference method.
(1) Actor: the actor network aims to compute the probability of each action in state s_t so as to learn a path-inference policy. It uses the weighted action paths and the expert path discriminator to effectively guide path reasoning. In this embodiment, a multi-layer fully connected neural network is used to implement the actor network π_θ(a_t, s_t):

h_θ = ReLU(W_{θ,s} s_t)

where ReLU(·) denotes the activation function, a_{θ,t} denotes the embedded vector of action a_t in the actor network, W_{θ,s} and W_{θ,a} are the actor-network parameters to be learned, d_h denotes the dimension of the hidden layer, d_s the dimension of state embeddings, and d_a the dimension of action embeddings. The actor network is optimized by a policy-gradient method, with the gradient computed for each sampled trajectory. Here the symbol ∝ denotes "proportional to" and Q_φ(s_t, a_t) denotes the value of action a_t in state s_t; the actor network can thus be learned by minimizing a loss function whose expectation E_{π_θ} is taken under the actor-network policy π_θ.
(2) Critic: the critic network evaluates an action-value function to assess each action in the MDP environment. It models the rewards of path discovery and the path discriminator to effectively guide the actor network. The critic network computes the action value Q_φ in state s_t:

h_φ = ReLU(W_{φ,s} s_t)

Q_φ(s_t, a_t) = a_{φ,t} ReLU(W_{φ,a} h_φ)

where a_{φ,t} denotes the embedded vector of action a_t in the critic network, and W_{φ,s} and W_{φ,a} are the critic-network parameters to be learned.
The critic network is trained by a temporal-difference method, which updates the target q_t in a single step according to the Bellman equation.

where β ∈ [0, 1] is the decay factor of the action-value function Q_φ(s_{t+1}, a). The critic network can thus be learned by minimizing the temporal-difference error:
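The single-step update can be sketched as follows. The exact target formula is not reproduced in the text above, so the standard one-step Bellman target q_t = R_t + β·Q_φ(s_{t+1}, a_{t+1}) is assumed here, with an illustrative tabular-style update in place of the neural critic:

```python
def td_target(reward, q_next, beta=0.9):
    """Assumed one-step Bellman target: q_t = R_t + β·Q(s_{t+1}, a_{t+1})."""
    return reward + beta * q_next

def td_error(q_current, reward, q_next, beta=0.9):
    """TD error; the critic minimizes its square, (q_t - Q(s_t, a_t))^2."""
    return td_target(reward, q_next, beta) - q_current

# One illustrative update step with learning rate alpha (toy numbers).
q_sa, alpha = 0.2, 0.1
delta = td_error(q_sa, reward=1.0, q_next=0.5, beta=0.9)
q_sa += alpha * delta
print(round(q_sa, 3))  # 0.325
```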
By minimizing the total loss function, the path discriminator D_p(s_t, a_t), the actor network π_θ and the critic network Q_φ are jointly optimized; the objective function of the learning path inference model is thus defined as the combination of their losses.
s5: and reasoning the learning path from the learner to the target course through the trained learning path reasoning model.
For the knowledge-graph environment, a multi-scale representation learning method is adopted to enhance the semantic representations and relations of the knowledge graph. More specifically, the coarse-grained course representation models a learner's learning behavior through user-course interactions, while the fine-grained concept representation captures the learner's knowledge state, i.e. a series of course concepts {k_1, ..., k_i}, regarded as attribute-level information of the taken courses. In this way, the latent relationships between courses can be well learned.
This embodiment recommends target courses matched to a learner's knowledge level and interests through a self-supervised reinforcement learning method: the recommendation agent starts from a learner node, performs multi-hop path reasoning over the knowledge graph, and finally recommends a suitable course in the knowledge graph to the learner. The self-supervision module in the method serves two functions. First, the path discriminator based on inverse reinforcement learning obtains reasonable demonstration paths to achieve accurate recommendation. Second, weighting action paths helps the recommendation agent differentiate the strengths of different paths in the knowledge graph to infer the learner's preferences.
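The agent's multi-hop reasoning can be illustrated on a toy knowledge graph (all node and relation names below are hypothetical, and in the actual method a trained policy, not a fixed walk, chooses the edges):

```python
import random

# Hypothetical knowledge graph: entity -> list of (relation, entity) outgoing edges
graph = {
    "learner_u": [("enrolled", "course_A")],
    "course_A": [("has_concept", "concept_x")],
    "concept_x": [("concept_of", "course_B")],
    "course_B": [],  # candidate course to recommend
}

def multi_hop_path(graph, start, max_hops, rng=None):
    """Walk up to max_hops edges from a learner node, recording the reasoning path."""
    rng = rng or random.Random(0)
    path, node = [start], start
    for _ in range(max_hops):
        edges = graph.get(node, [])
        if not edges:
            break  # no outgoing edge: the path terminates here
        relation, node = rng.choice(edges)
        path += [relation, node]
    return path

reasoning_path = multi_hop_path(graph, "learner_u", 3)
# Each node above has at most one edge, so the walk is deterministic:
# ["learner_u", "enrolled", "course_A", "has_concept", "concept_x", "concept_of", "course_B"]
```

Such a path doubles as the recommendation's explanation, e.g. "course_B is recommended because it shares concept_x with course_A, which learner_u has taken."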
This embodiment utilizes the actor-critic algorithm to train the learning path inference model. It uses two reward signals (a reward R_{e,T} for path discovery and another reward R_{p,t} for path discrimination) to drive path reasoning and policy evaluation for MOOC recommendation.
In summary, this embodiment constructs both explicit information (e.g., learners' learning behaviors) and implicit feedback (e.g., learners' knowledge levels) in the knowledge graph, and makes interpretable MOOC recommendations through deep reinforcement learning.
Example two:
the invention also provides a MOOC interpretable recommendation terminal device, which comprises a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of the first embodiment of the invention.
Further, as an executable solution, the MOOC interpretable recommendation terminal device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The MOOC interpretable recommendation terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the composition described above is only an example of the MOOC interpretable recommendation terminal device and does not constitute a limitation on it; the device may include more or fewer components, combine certain components, or use different components. For example, it may further include input/output devices, network access devices, buses, and the like, which are not limited in this embodiment of the invention.
Further, as an executable solution, the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the MOOC interpretable recommendation terminal device, connecting the parts of the whole device through various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the MOOC interpretable recommendation terminal device by running or executing the computer program and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly comprise a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function, while the data storage area may store data created according to use of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method embodiment of the invention.
If the modules/units integrated in the MOOC interpretable recommendation terminal device are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (9)
1. A MOOC interpretable recommendation method, comprising the steps of:
S1: collecting historical course-selection records of learners, extracting learners, courses, course concepts, and subject classifications from the historical course-selection records as entities, extracting the relationships among the entities, and constructing a triple dataset based on these relationships;
S2: constructing a knowledge graph based on the triple dataset, and vectorizing the entities and relations in the knowledge graph through a TransE model, wherein the learner's embedding vector is represented by a coarse-grained course representation method and the course's embedding vector is represented by a fine-grained concept representation method;
S3: constructing a learning path inference model based on a self-supervised reinforcement learning method, and guiding the recommendation agent to perform learning path reasoning from the learner to the target course on the knowledge graph;
the learning path inference model comprises an actor network and a path discriminator, wherein the actor network first generates a path, the path discriminator then distinguishes expert demonstration paths from the generated paths, and the actor network tries to deceive the path discriminator by imitating the expert demonstration paths;
S4: training the learning path inference model with an actor-critic algorithm, wherein the actor network learns a path-reasoning policy according to the value function of the critic network, and the critic network updates the value function in a single step using the temporal-difference method;
S5: inferring the learning path from the learner to the target course through the trained learning path inference model.
2. The MOOC interpretable recommendation method according to claim 1, wherein the learner's embedding vector is represented by the coarse-grained course representation method as follows: after sorting learner u's historical course-selection records in chronological order, the learner's embedding vector is expressed as the chronologically ordered sequence of the courses taken, wherein the t-th element represents the course taken by learner u at time t, and 1, ..., t, ..., t_u indicates times from earliest to most recent.
3. The MOOC interpretable recommendation method according to claim 1, wherein the course's embedding vector is represented by the fine-grained concept representation method as follows: the embedding vector of the course is expressed as c_t:
c_t = {(k, w) | (k_i, w_j), n > i > 0, j > 0}
wherein n represents the number of course concepts contained in the course, i represents the index of a course concept, j represents the index of a word contained in a course concept, k_i represents the embedding vector of the i-th course concept of the course, w_j represents the embedding vector of the j-th word in concept k_i, k represents the embedding vectors of all concepts in the course, and w represents the embedding vectors of all words in the course.
4. The MOOC interpretable recommendation method according to claim 1, wherein step S3 expresses the path-reasoning problem as a Markov decision process, and the agent recommends a learning path from the learner to the target course by performing multi-hop path reasoning on the knowledge graph; in the Markov decision process, the initial state s_0 is determined by the learner, and the state s_t at time t records the learner together with the entity and relation reached so far; according to state s_t, the agent performs an action according to the policy to predict the feasible outgoing edges of entity e_t, the action space consisting of the outgoing edges of e_t in the knowledge graph, where ε represents the set of entities; a terminal reward R_{e,T} measures whether the agent has generated a multi-hop path starting from learner u and terminating at the target course; wherein u represents the learner, r represents a relation, e represents an entity, r_t represents the relation vector at time t, e_t represents the entity vector at time t, and R represents the reward function.
5. The MOOC interpretable recommendation method according to claim 1, wherein potential outgoing edges are retained by weighting the action paths in the Markov decision process, the weight of each edge in a path being set as:
6. The MOOC interpretable recommendation method according to claim 1, wherein the expert demonstration paths are acquired as follows: for every learner u and target course, Dijkstra's algorithm is used on the weighted graph, based on the weighted action paths, to generate the shortest path between learner u and the target course, yielding a set of demonstration paths; expert demonstration paths are then obtained by random sampling from the demonstration paths.
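The demonstration-path construction in the claim above relies on Dijkstra's algorithm over the weighted action graph; a self-contained standard-library sketch (the graph shape and weights are illustrative assumptions):

```python
import heapq

def dijkstra_path(wgraph, src, dst):
    """Shortest path from src to dst; wgraph maps node -> list of (neighbor, weight)."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]  # min-heap of (distance, node)
    done = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue  # stale entry
        done.add(u)
        if u == dst:
            break
        for v, w in wgraph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if dst != src and dst not in prev:
        return None  # unreachable
    # Reconstruct the path by walking predecessors back to the source
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```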
7. The MOOC interpretable recommendation method according to claim 1, wherein the path discriminator D_p(s_t, a_t) indicates how strongly action a_t at time t relates to state s_t, and is specifically defined as:
wherein an intermediate hidden variable is computed from the embedding vector of state s_t; a_{p,t} is the embedding vector of action a_t in discriminator D_p; tanh(·) denotes the hyperbolic tangent function; σ(·) denotes the logistic sigmoid function; the weight matrices are all learned parameters; d_a denotes the dimension of action embeddings in the actor network; d_s denotes the dimension of state embeddings; and d_d denotes the dimension of action embeddings in the path discriminator.
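A structural sketch of the discriminator score (the exact wiring of the tanh and sigmoid layers around the learned matrices is an assumption reconstructed from the functions and parameters listed in the claim):

```python
import math

def sigmoid(x):
    """Logistic sigmoid sigma(x)."""
    return 1.0 / (1.0 + math.exp(-x))

def path_discriminator(s_t, a_p_t, W_d):
    """D_p(s_t, a_t): a score in (0, 1) for how well action a_t fits state s_t."""
    # Intermediate variable: tanh of a linear map of the state embedding
    h = [math.tanh(sum(w * x for w, x in zip(row, s_t))) for row in W_d]
    # Sigmoid of the inner product with the discriminator's action embedding a_{p,t}
    return sigmoid(sum(a * hi for a, hi in zip(a_p_t, h)))

score = path_discriminator([1.0, -1.0], [0.5, 0.5], [[0.3, 0.1], [0.2, 0.4]])
```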
8. A MOOC interpretable recommendation terminal device, characterized by comprising a processor, a memory, and a computer program stored in the memory and runnable on the processor, the processor implementing the steps of the method according to any one of claims 1 to 7 when executing the computer program.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210666129.0A CN115238169A (en) | 2022-06-14 | 2022-06-14 | Mu course interpretable recommendation method, terminal device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115238169A true CN115238169A (en) | 2022-10-25 |
Family
ID=83669551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210666129.0A Pending CN115238169A (en) | 2022-06-14 | 2022-06-14 | Mu course interpretable recommendation method, terminal device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115238169A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115577185A (en) * | 2022-11-15 | 2023-01-06 | 湖南师范大学 | Muting course recommendation method and device based on mixed reasoning and mesopic group decision |
CN115658877A (en) * | 2022-12-27 | 2023-01-31 | 神州医疗科技股份有限公司 | Medicine recommendation method and device based on reinforcement learning, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||