CN115238169A - MOOC interpretable recommendation method, terminal device and storage medium - Google Patents
MOOC interpretable recommendation method, terminal device and storage medium
- Publication number
- CN115238169A (application CN202210666129.0A)
- Authority
- CN
- China
- Prior art keywords
- course
- path
- learner
- representing
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Abstract
The invention relates to a MOOC interpretable recommendation method, a terminal device and a storage medium, wherein the method comprises the following steps: constructing a triple data set from the entities in learners' historical course-selection records and the relations between those entities; constructing a knowledge graph based on the triple data set and vectorizing its entities and relations through a TransE model, wherein learners and courses are represented by a coarse-grained course representation method and a fine-grained concept representation method, respectively; constructing a learning path inference model based on a self-supervised reinforcement learning method, which guides a recommendation agent to perform learning-path reasoning from a learner to a target course on the knowledge graph; training the learning path inference model with an actor-critic algorithm; and inferring the learning path from the learner to the target course through the trained learning path inference model. The invention can construct both explicit information and implicit feedback in the knowledge graph, and can make interpretable MOOC recommendations through deep reinforcement learning.
Description
Technical Field
The invention relates to the field of MOOC recommendation, and in particular to a MOOC interpretable recommendation method, a terminal device and a storage medium.
Background
Existing popular interpretable recommendation methods are not suitable for the MOOC (Massive Open Online Course) recommendation scenario, because the online education environment is usually constrained by two complex conditions. First, the prerequisite relationships between courses. The precedence relationship between courses is an important consideration when recommending them. Generally, a course recommendation should include prerequisite courses, because the learner may lack the knowledge points of those prerequisites and needs to master them to better understand the recommended course. Second, the knowledge structure of the learner. It is well known that a learner's knowledge structure evolves during the learning process. In this case, any one-size-fits-all recommendation strategy is suboptimal, because learners will reach different completion states for recommended courses depending on their learning abilities; the recommendation strategy therefore needs to take the learner's cognitive level into account. These complex constraints make an interpretable MOOC recommendation method difficult to implement.
Disclosure of Invention
In order to solve the above problems, the present invention provides a MOOC interpretable recommendation method, a terminal device and a storage medium.
The specific scheme is as follows:
A MOOC interpretable recommendation method, comprising the steps of:
s1: collecting historical course selection records of learners, extracting learners, courses, course concepts and subject classifications from the historical course selection records as entities, extracting the relationships among the entities, and constructing a triple data set based on the relationships among the entities;
s2: constructing a knowledge graph based on the triple data set, and vectorizing the entities and relations in the knowledge graph through a TransE model, wherein the learner's embedded vector is represented through a coarse-grained course representation method and the course's embedded vector through a fine-grained concept representation method;
s3: constructing a learning path inference model based on a self-supervised reinforcement learning method, which guides a recommendation agent to perform learning-path reasoning from the learner to a target course on the knowledge graph;
the learning path inference model comprises an actor network and a path discriminator, wherein the actor network first generates a path, the path discriminator then distinguishes an expert demonstration path from the generated path, and the actor network attempts to fool the path discriminator by imitating the expert demonstration path;
s4: training the learning path inference model with an actor-critic algorithm, wherein the actor network learns a path-inference policy from the value function of the critic network, and the critic network updates the value function in single steps using a temporal-difference method;
s5: and reasoning the learning path from the learner to the target course through the trained learning path reasoning model.
Further, the learner's embedded vector is represented by the coarse-grained course representation method as follows: after learner u's historical course-selection records are sorted in chronological order, the learner's embedded vector is expressed as u = (c_1, ..., c_t, ..., c_{t_u}), where c_t denotes the course taken by learner u at time t, and the indices 1, ..., t, ..., t_u run from the earliest time to the most recent.
Further, the embedded vector of a course is represented as c_t:

c_t = {(k, w) | (k_i, w_j), n > i > 0, j > 0}

where n denotes the number of course concepts contained in the course, i the index of a course concept, j the index of a word contained in a course concept, k_i the embedded vector of the i-th course concept of the course, w_j the embedded vector of the j-th word in concept k_i, k the embedded vectors of all concepts in the course, and w the embedded vectors of all words in the course.
Further, in step S3, the path inference problem is formulated as a Markov decision process, and the agent recommends a learning path from the learner to the target course by performing multi-hop path reasoning on the knowledge graph. The initial state is set as s_0 = u, and the state at time t is determined by learner u and the entity reached at time t. According to state s_t, the agent performs the associated action under its policy to predict a feasible outgoing edge of entity e_t; the action space is A_t = {(r_{t+1}, e_{t+1}) | (e_t, r_{t+1}, e_{t+1}) ∈ G, e_{t+1} ∈ ε}, where ε denotes the set of entities and G denotes the knowledge graph. A terminal reward R_{e,T} measures whether the agent has generated a multi-hop path starting from learner u and terminating at the target course. Here u denotes the learner, r a relation and e an entity; r_t denotes the relation vector at time t, e_t the entity vector at time t, and R the reward function.
Further, potential outgoing edges are retained through weighted action paths in the Markov decision process, and the weight of each edge in a path is set as:

w(e_t, r_t, e_{t+1}) = ||e_t + r_t - e_{t+1}||_1

where w(e_t, r_t, e_{t+1}) denotes the weight of the edge for the triplet (e_t, r_t, e_{t+1}), e_t denotes the vector of the head entity, r_t the vector of the relation between head entity e_t and tail entity e_{t+1}, e_{t+1} the vector of the tail entity, and ||·||_1 the L1 norm.
Further, the expert demonstration paths are obtained as follows: for every learner u and target course, Dijkstra's algorithm is applied on the graph weighted by the action-path weights to generate the shortest path between learner u and the target course, yielding a series of demonstration paths; an expert demonstration path is then obtained by random sampling from these demonstration paths.
Further, the path discriminator D_p(s_t, a_t) denotes the probability of action a_t at time t given state s_t. It is computed from an intermediate variable derived from the embedded vector of state s_t and the embedded vector a_{p,t} of the action in discriminator D_p, using the hyperbolic tangent function tanh(·), the logistic sigmoid function σ(·) and learned parameter matrices, where d_a denotes the dimension of action embeddings in the actor network, d_s the dimension of state embeddings, and d_d the dimension of action embeddings in the path discriminator.
A MOOC interpretable recommendation terminal device, comprising a processor, a memory and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of the embodiments of the present invention described above.
A computer-readable storage medium in which a computer program is stored, the computer program, when executed by a processor, carrying out the steps of the method of the embodiments of the invention described above.
By adopting the above technical scheme, the invention can construct both explicit information and implicit feedback in the knowledge graph, and can make interpretable MOOC recommendations through deep reinforcement learning.
Drawings
Fig. 1 is a flowchart illustrating a first embodiment of the present invention.
Fig. 2 is a general framework diagram of the first embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the drawings and the detailed description.
Embodiment one:
An embodiment of the present invention provides a MOOC interpretable recommendation method, as shown in fig. 1 and 2, comprising the following steps:
s1: the method comprises the steps of collecting historical course selection records of learners, extracting learners, courses, course concepts and subject classifications from the historical course selection records as entities, extracting relationships among the entities, and constructing a triple data set based on the relationships among the entities.
S2: and constructing a knowledge graph based on the triple data set, and performing vectorization representation on the entities in the knowledge graph through a TransE model to obtain the embedded vectors of the entities.
The knowledge graph is constructed from triple data (h, r, t), which semantically associates entities: h and t denote the head node and tail node of a relationship (in other words, the head entity and tail entity), and r denotes the relationship itself, such as <learner - selects - course>.
It should be noted that, to actually use the triple data in the constructed knowledge graph, the vocabulary text in it must be converted to numbers for subsequent computation. Specifically, distributed representation learning on the knowledge graph yields a mapping of the vocabulary text into a vector space, i.e. a corresponding vector for each text item, where each entity corresponds to an entity vector and each relation to a relation vector. This embodiment uses the TransE model to vectorize each entity e_i and relation r in the knowledge graph, obtaining an embedded vector for each entity and each relation, where d_E denotes the dimension of the vectors.
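The translation principle behind TransE can be sketched as follows. This is an assumption-level illustration with toy three-dimensional vectors, not the patent's training code: a triple (h, r, t) is considered plausible when the head vector translated by the relation vector lands near the tail vector, i.e. h + r ≈ t under the L1 norm.

```python
def l1_distance(x, y):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def transe_score(head, relation, tail):
    """Lower score = more plausible triple: ||h + r - t||_1."""
    translated = [h + r for h, r in zip(head, relation)]
    return l1_distance(translated, tail)

# Toy embeddings for a <learner - selects - course> triple (illustrative values).
learner = [0.1, 0.2, 0.3]
selects = [0.4, 0.1, -0.1]
course = [0.5, 0.3, 0.2]   # close to learner + selects

print(transe_score(learner, selects, course))  # near-zero: plausible triple
```

Training would adjust the embeddings so observed triples score low and corrupted triples score high; only the scoring function is shown here.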
To capture the learner's time-series preference for courses, this embodiment models the learner's learning behavior from the interaction sequence using a coarse-grained course representation. In addition, a fine-grained concept representation is used to grasp the learner's knowledge state, treating concepts as attribute-level information of a course. In this way, the learner's time-series preferences can be combined with attribute-level preferences to better model the learner's knowledge structure, while also mining the latent factors behind prerequisite relationships between courses, since similar courses generally share some of the same course concepts.
(1) The learner is encoded by a course representation method of coarse granularity.
In knowledge-graph-based course recommendation, let U denote the set of learners and C the set of courses. Given a historical course-selection record C_u, the recommendation task aims to find a reasoning path from a specific learner u ∈ U to the corresponding target course in C. To this end, learner u's historical course-selection record is sorted in chronological order and formalized as C_u = (c_1, ..., c_t, ..., c_{t_u}), where c_t ∈ C denotes the course taken by learner u at time t, and t_u denotes the number of courses the learner has selected. A coarse-grained course representation can thus encode the learner by expressing the learner's embedded vector as u = (c_1, ..., c_t, ..., c_{t_u}). In this way, the learner's embedded vector models the learner's learning behavior through the coarse-grained course representation, thereby capturing the learner's chronological preferences for courses.
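The coarse-grained encoding step can be sketched as below. The record format and course vectors are hypothetical; the essential operations are sorting the course-selection records chronologically and representing the learner as the ordered sequence of the selected courses' embedded vectors.

```python
def encode_learner(records, course_vectors):
    """records: list of (timestamp, course_id); course_vectors: id -> vector.

    Returns the learner's coarse-grained representation: the course
    embedding vectors ordered from the earliest selection to the latest.
    """
    ordered = sorted(records, key=lambda rec: rec[0])  # far past -> recent
    return [course_vectors[course_id] for _, course_id in ordered]

# Toy data: two courses taken in two semesters (illustrative values).
course_vectors = {
    "cell_biology": [0.2, 0.7],
    "genetics": [0.3, 0.8],
}
records = [(20220105, "genetics"), (20210901, "cell_biology")]

learner_u = encode_learner(records, course_vectors)
print(learner_u)  # [[0.2, 0.7], [0.3, 0.8]] — earliest course first
```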
Since the coarse-grained course representation cannot interpret the hidden vector of each course, it is difficult to infer the learner's knowledge level from the course-selection history alone. For this reason, a fine-grained concept representation method is also proposed in this embodiment to address the challenge.
(2) The course is encoded by a fine-grained conceptual representation.
It is well known that a learner's knowledge structure is made up of many knowledge points, including course concepts. In addition, each course contains multiple concepts, and similar courses often share the same concepts. In this case, the semantic representations of course concepts can be used to capture the learner's knowledge level.
More precisely, the fine-grained concept representation captures the learner's knowledge state through the concepts of the selected courses, i.e. {k_1, ..., k_i}, where k_i denotes the embedded vector of a course concept in a taken course and serves as attribute-level information of that course. Typically, a course-concept embedding consists of a series of word vectors. Formally, from a series of concept embeddings, a course embedding can be composed of a set of vector pairs:
c t ={(k,w)|(k i ,w j ),n>i>0,j>0}
where n denotes the number of course concepts contained in the course, i the index of a course concept, j the index of a word contained in a course concept, k_i the embedded vector of the i-th course concept of the course, w_j the embedded vector of the j-th word in concept k_i, k the embedded vectors of all concepts in the course, and w the embedded vectors of all words in the course.
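The pair-set structure above can be illustrated with a short sketch. The concept names, word vectors and flat list-of-pairs layout are all illustrative assumptions; the point is only that a course embedding c_t collects one (k_i, w_j) pair per concept-word combination.

```python
def course_embedding(concept_vectors, concept_words):
    """concept_vectors: concept -> k_i; concept_words: concept -> [w_j, ...].

    Returns the set of (concept vector, word vector) pairs making up c_t.
    """
    pairs = []
    for concept, k_i in concept_vectors.items():
        for w_j in concept_words[concept]:
            pairs.append((k_i, w_j))
    return pairs

# Toy vectors for two concepts; "cell" contains two words (hypothetical).
concept_vectors = {"gene": [0.9, 0.1], "cell": [0.2, 0.8]}
concept_words = {"gene": [[0.8, 0.2]], "cell": [[0.1, 0.9], [0.3, 0.7]]}

c_t = course_embedding(concept_vectors, concept_words)
print(len(c_t))  # 3 concept-word vector pairs
```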
In particular, multiple similar courses may be linked by one or more shared course concepts. This connectivity can reveal the underlying factors of the prerequisite relationships between courses. For example, the course "Genetics" and its prerequisite "Cell Biology" share course concepts such as "gene" and "cell". In this case, if the learner takes the "Genetics" or "Cell Biology" course, this semantic awareness enables the multi-scale representation learning method to capture the learner's knowledge level and interests. The method thereby enriches the semantic-interaction information available in the knowledge graph, which benefits the path reasoning of the MOOC recommendation.
S3: constructing a learning path inference model (the self-supervision module in fig. 2) based on a self-supervised reinforcement learning method; the model guides the recommendation agent to perform learning-path reasoning from the learner to the target course on the knowledge graph.
Step S3 provides interpretable recommendations to the learner. To this end, this embodiment proposes a self-supervised deep reinforcement learning method that guides path reasoning on the knowledge graph constructed by the multi-scale representation method. Specifically, starting from a learner in the observed historical course-selection records C_u, the recommendation agent performs multi-hop path reasoning on the knowledge graph, so that the recommended course both conforms to the learner's knowledge structure and satisfies the constraints of the prerequisite relationships between courses. The learning path inference model helps the recommendation agent differentiate the strengths of different paths in the knowledge graph to infer the learner's preferences, and finds reasonable demonstrations to achieve accurate recommendations.
(1) Markov decision process
The path inference problem is expressed in this embodiment as a Markov Decision Process (MDP). The agent attempts to recommend an appropriate course for the learner by performing multi-hop path reasoning on the knowledge graph. Formally, the MDP is defined as a 5-tuple (S, A, P, R, γ), where S denotes the state space, A the action space, P the state-transition probability, R the reward function in the environment, and γ the discount factor of the reward.
State: s_t ∈ S represents the agent's search state in the knowledge graph at time t. The path-discovery process is assumed to capture the multi-hop relationship between learner u and the target course, i.e. the initial state is s_0 = u, and every other state is determined by learner u and the entity reached at time t. To enhance the agent's path-reasoning ability and obtain higher recommendation accuracy, course concepts are introduced as auxiliary information to increase path connectivity.
Action: according to state s_t, the agent performs an action a_t under its policy to predict a feasible outgoing edge of entity e_t (excluding already-searched entities). The size of the action space must be controlled here, since some entities have a very large out-degree in the knowledge graph. This embodiment therefore uses weighted actions to retain potential outgoing edges, so that the policy can be adjusted to infer learner preferences. Formally, the action space is defined as A_t = {(r_{t+1}, e_{t+1}) | (e_t, r_{t+1}, e_{t+1}) ∈ G, e_{t+1} ∈ ε}, where ε denotes the set of entities and G the knowledge graph.
Reward: R_{e,T} denotes the terminal reward, which measures whether the agent has generated a multi-hop path that starts at learner u and ends at the target course. Formally, the terminal reward at the final time T is defined via the indicator function of path discovery: it equals 1 when the terminal entity of the path is the target course, and 0 otherwise.
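The terminal reward described above reduces to a simple indicator on the last entity of the agent's path. A minimal sketch (entity names are illustrative):

```python
def terminal_reward(path, target_course):
    """R_{e,T}: 1 if the multi-hop path ends at the target course, else 0.

    path: list of entities visited, starting at the learner node.
    """
    return 1.0 if path and path[-1] == target_course else 0.0

path = ["learner_u", "cell_biology", "concept:gene", "genetics"]
print(terminal_reward(path, "genetics"))   # 1.0: path ends at the target
print(terminal_reward(path, "astronomy"))  # 0.0: wrong terminal entity
```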
(2) Self-supervision module
The self-supervision module in this embodiment includes two functions: one is a weighted action path that helps the recommending agent (i.e., actor) to differentiate the strengths of the different paths in the knowledge-graph to infer the learner's preferences. The other is a path discriminator based on inverse reinforcement learning, which can obtain reasonable demonstration paths to realize accurate recommendation. Details of the implementation of these two functions are described below.
1) Weighted action paths
Some studies assume that shorter paths make recommendations easier to interpret, and then use the minimum number of multi-hop relations to infer paths on an unweighted graph. However, this approach does not adequately mine the dependencies between entities and the overall semantics of paths, which may lead to unreasonable reasoning. As an alternative, a weighting operation based on the similarity of relations between entities can learn the dependencies between entities and distinguish the strengths of different paths. For any given triplet (e_t, r_t, e_{t+1}), in which head entity e_t and tail entity e_{t+1} are connected by relation r_t, the weight of each edge in a path is defined as:

w(e_t, r_t, e_{t+1}) = ||e_t + r_t - e_{t+1}||_1

where w(e_t, r_t, e_{t+1}) denotes the edge weight of the triplet, e_t the head-entity vector, r_t the relation vector, e_{t+1} the tail-entity vector, and ||·||_1 the L1 norm. The smaller the weight of an edge in a path, the stronger the dependency between the two entities on the path, since they are closer in vector space.
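The edge-weighting rule can be sketched as follows with toy vectors (the candidate entities and their embeddings are hypothetical): candidate outgoing edges are ranked by ||e_t + r_t - e_{t+1}||_1, and the smallest-weight edge marks the strongest dependency.

```python
def edge_weight(head, relation, tail):
    """||e_t + r_t - e_{t+1}||_1 — smaller weight = stronger dependency."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))

# Rank two candidate outgoing edges of one entity (illustrative vectors):
# each candidate is (head vector, relation vector, tail vector).
candidates = {
    "genetics": ([0.1, 0.2], [0.3, 0.1], [0.4, 0.3]),   # h + r ≈ t
    "astronomy": ([0.1, 0.2], [0.3, 0.1], [0.9, 0.9]),  # far in vector space
}
ranked = sorted(candidates, key=lambda e: edge_weight(*candidates[e]))
print(ranked[0])  # "genetics" — smallest weight, strongest dependency
```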
Based on the weighted action paths, Dijkstra's algorithm is used on the weighted graph to generate the shortest path between learner u and the target course. This process is repeated for all learners u and target courses to obtain a series of demonstration paths, in which each step follows the edge with the minimum weight among the candidate triplets (e_t, r_t, e_{t+1}). In this way, the recommendation agent uses the weighted action paths to adjust its policy and efficiently infer the learner's preferences, since the path weights explore the overall semantics of paths in the observed interactions.
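Demonstration-path generation can be sketched with a standard heap-based Dijkstra over an edge-weighted toy graph (node names and weights are illustrative, not from the patent):

```python
import heapq

def dijkstra_path(graph, source, target):
    """graph: node -> list of (neighbor, weight). Returns the minimum-total-
    weight path from source to target, or None if target is unreachable."""
    heap = [(0.0, source, [source])]
    visited = set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(heap, (dist + weight, neighbor, path + [neighbor]))
    return None

# Toy weighted knowledge graph: smaller edge weight = stronger dependency.
graph = {
    "learner_u": [("cell_biology", 0.2), ("astronomy", 0.9)],
    "cell_biology": [("concept:gene", 0.1)],
    "concept:gene": [("genetics", 0.1)],
    "astronomy": [("genetics", 0.8)],
}
demo = dijkstra_path(graph, "learner_u", "genetics")
print(demo)  # ['learner_u', 'cell_biology', 'concept:gene', 'genetics']
```

Repeating this for every (learner, target course) pair would give the pool of demonstration paths from which expert demonstrations are sampled.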
2) Path discriminator based on inverse reinforcement learning
This embodiment adopts generative adversarial imitation learning to obtain reasonable demonstration paths that conform to predefined meta-paths. It uses expert demonstration paths and reward signals to incentivize the policy to achieve accurate recommendations. In this way, the recommendation agent can recommend courses that match the learner's knowledge level and interests while enhancing its reasoning ability.
Specifically, the actor network and the path discriminator D_p interact adversarially: the actor network first generates a path, the path discriminator then distinguishes the expert demonstration path from the generated path, and the actor network attempts to fool the path discriminator by imitating the expert demonstration path. Formally, the path discriminator D_p(s_t, a_t) denotes the probability of action a_t at time t given state s_t, defined as follows.
where the intermediate variable is derived from the embedded vector of state s_t; a_{p,t} denotes the embedded vector of the action in discriminator D_p; tanh(·) denotes the hyperbolic tangent function; σ(·) denotes the logistic sigmoid function; the weight matrices are learned parameters; and d_a denotes the dimension of action embeddings in the actor network, d_s the dimension of state embeddings, and d_d the dimension of action embeddings in the path discriminator.
The path discriminator D_p(s_t, a_t) is used to compute the probability that (s_t, a_t) comes from the observed demonstration paths. In general, this is achieved by minimizing a classification loss over expert state-action pairs, where the expert action and state are determined by an expert demonstration path randomly sampled from the observed demonstration paths.
When the actor network generates a pair (s_t, a_t) similar to the observed demonstrations, the reward of the path discriminator, R_{p,t}, is obtained as follows.

R_{p,t} = log D_p(s_t, a_t) - log(1 - D_p(s_t, a_t))
To smoothly update the policy toward paths that approximate the observed demonstrations, the aggregate reward R_t is defined as a linear combination of the path-discovery reward and the path-discriminator reward:

R_t = λR_{e,T} + (1 - λ)R_{p,t}

where λ ∈ [0, 1] is a scaling factor that balances the path-discovery reward R_{e,T} and the path-discriminator reward R_{p,t}.
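The two reward formulas can be combined as in the sketch below (the probability and λ values are illustrative assumptions): the discriminator reward grows as the generated state-action pair looks more like an expert demonstration, and the aggregate reward mixes it with the terminal path-discovery reward.

```python
import math

def discriminator_reward(d_prob):
    """R_{p,t} = log D_p(s_t, a_t) - log(1 - D_p(s_t, a_t))."""
    return math.log(d_prob) - math.log(1.0 - d_prob)

def aggregate_reward(r_terminal, r_discriminator, lam=0.5):
    """R_t = λ·R_{e,T} + (1 - λ)·R_{p,t}."""
    return lam * r_terminal + (1.0 - lam) * r_discriminator

print(discriminator_reward(0.5))  # 0.0: discriminator cannot tell them apart
# A path that both reaches the target (R_{e,T}=1) and fools the discriminator:
print(aggregate_reward(1.0, discriminator_reward(0.9), lam=0.5))
```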
S4: training the learning path inference model with an actor-critic algorithm, wherein the actor network learns a path-inference policy from the value function of the critic network, and the critic network updates the value function in single steps using a temporal-difference method.
(1) Actor: the actor network aims to compute the probability of each action in state s_t so as to learn a path-inference policy. It uses the weighted action paths and the expert path discriminator to effectively guide path reasoning. In this embodiment, a multi-layer fully connected neural network is used to implement the actor network π_θ(a_t, s_t):

h_θ = ReLU(W_{θ,s} s_t)

where ReLU(·) denotes the activation function, a_{θ,t} denotes the embedded vector of action a_t in the actor network, W_{θ,s} and W_{θ,a} are the actor-network parameters to be learned, d_h denotes the dimension of the hidden layer, d_s the dimension of state embeddings, and d_a the dimension of action embeddings. The actor network is optimized by a policy-gradient method, with the gradient computed for each sampled trajectory. Here the symbol ∝ denotes "proportional to" and Q_φ(s_t, a_t) denotes the value of action a_t in state s_t; the actor network can thus be learned by minimizing a loss function whose expectation E_{π_θ} is taken under the actor-network policy π_θ.
(2) Critic: the critic network evaluates an action-value function to assess each action in the MDP environment. It models the rewards of path discovery and the path discriminator to effectively guide the actor network. The critic network computes the action value Q_φ in state s_t:

h_φ = ReLU(W_{φ,s} s_t)

Q_φ(s_t, a_t) = a_{φ,t} ReLU(W_{φ,a} h_φ)

where a_{φ,t} denotes the embedded vector of action a_t in the critic network, and W_{φ,s} and W_{φ,a} are the critic-network parameters to be learned.
The critic network is trained by a temporal-difference method, which updates the target q_t in a single step according to the Bellman equation.

where β ∈ [0, 1] is the decay factor of the action-value function Q_φ(s_{t+1}, a). The critic network can thus be learned by minimizing the temporal-difference error:
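The single-step update can be sketched as follows. The exact target formula is not reproduced in the text above, so the standard one-step Bellman target q_t = R_t + β·Q_φ(s_{t+1}, a_{t+1}) is assumed here, with an illustrative tabular-style update in place of the neural critic:

```python
def td_target(reward, q_next, beta=0.9):
    """Assumed one-step Bellman target: q_t = R_t + β·Q(s_{t+1}, a_{t+1})."""
    return reward + beta * q_next

def td_error(q_current, reward, q_next, beta=0.9):
    """TD error; the critic minimizes its square, (q_t - Q(s_t, a_t))^2."""
    return td_target(reward, q_next, beta) - q_current

# One illustrative update step with learning rate alpha (toy numbers).
q_sa, alpha = 0.2, 0.1
delta = td_error(q_sa, reward=1.0, q_next=0.5, beta=0.9)
q_sa += alpha * delta
print(round(q_sa, 3))  # 0.325
```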
By minimizing the total loss function, the path discriminator D_p(s_t, a_t), the actor network π_θ and the critic network Q_φ are jointly optimized; the objective function of the learning path inference model is thus defined as the combination of their losses.
s5: and reasoning the learning path from the learner to the target course through the trained learning path reasoning model.
For the knowledge-graph environment, a multi-scale representation learning method is adopted to enhance the semantic representations and relations of the knowledge graph. More specifically, the coarse-grained course representation models a learner's learning behavior through user-course interactions, while the fine-grained concept representation captures the learner's knowledge state, i.e. a series of course concepts {k_1, ..., k_i}, regarded as attribute-level information of the taken courses. In this way, the latent relationships between courses can be well learned.
This embodiment recommends target courses matched to a learner's knowledge level and interests through a self-supervised reinforcement learning method: the recommendation agent starts from a learner node, performs multi-hop path reasoning over the knowledge graph, and finally recommends a suitable course in the knowledge graph to the learner. The self-supervision module in the method serves two functions. First, the path discriminator based on inverse reinforcement learning obtains reasonable demonstration paths to achieve accurate recommendation. Second, weighting action paths helps the recommendation agent differentiate the strengths of different paths in the knowledge graph to infer the learner's preferences.
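The agent's multi-hop reasoning can be illustrated on a toy knowledge graph (all node and relation names below are hypothetical, and in the actual method a trained policy, not a fixed walk, chooses the edges):

```python
import random

# Hypothetical knowledge graph: entity -> list of (relation, entity) outgoing edges
graph = {
    "learner_u": [("enrolled", "course_A")],
    "course_A": [("has_concept", "concept_x")],
    "concept_x": [("concept_of", "course_B")],
    "course_B": [],  # candidate course to recommend
}

def multi_hop_path(graph, start, max_hops, rng=None):
    """Walk up to max_hops edges from a learner node, recording the reasoning path."""
    rng = rng or random.Random(0)
    path, node = [start], start
    for _ in range(max_hops):
        edges = graph.get(node, [])
        if not edges:
            break  # no outgoing edge: the path terminates here
        relation, node = rng.choice(edges)
        path += [relation, node]
    return path

reasoning_path = multi_hop_path(graph, "learner_u", 3)
# Each node above has at most one edge, so the walk is deterministic:
# ["learner_u", "enrolled", "course_A", "has_concept", "concept_x", "concept_of", "course_B"]
```

Such a path doubles as the recommendation's explanation, e.g. "course_B is recommended because it shares concept_x with course_A, which learner_u has taken."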
This embodiment utilizes the actor-critic algorithm to train the learning path inference model. It uses two reward signals (a reward R_{e,T} for path discovery and another reward R_{p,t} for path discrimination) to drive path reasoning and policy evaluation for MOOC recommendation.
In summary, this embodiment constructs both explicit information (e.g., learners' learning behaviors) and implicit feedback (e.g., learners' knowledge levels) in the knowledge graph, and makes interpretable MOOC recommendations through deep reinforcement learning.
Example two:
the invention also provides a MOOC interpretable recommendation terminal device, which comprises a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of the first embodiment of the invention.
Further, as an executable solution, the MOOC interpretable recommendation terminal device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The MOOC interpretable recommendation terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the composition described above is only an example of the MOOC interpretable recommendation terminal device and does not constitute a limitation on it; the device may include more or fewer components, combine certain components, or use different components. For example, it may further include input/output devices, network access devices, buses, and the like, which are not limited in this embodiment of the invention.
Further, as an executable solution, the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the MOOC interpretable recommendation terminal device, connecting the parts of the whole device through various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the MOOC interpretable recommendation terminal device by running or executing the computer program and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly comprise a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function, while the data storage area may store data created according to use of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method embodiment of the invention.
If the modules/units integrated in the MOOC interpretable recommendation terminal device are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (9)
1. A MOOC interpretable recommendation method, comprising the steps of:
S1: collecting historical course-selection records of learners, extracting learners, courses, course concepts, and subject classifications from the historical course-selection records as entities, extracting the relationships among the entities, and constructing a triple dataset based on these relationships;
S2: constructing a knowledge graph based on the triple dataset, and vectorizing the entities and relations in the knowledge graph through a TransE model, wherein the learner's embedding vector is represented by a coarse-grained course representation method and the course's embedding vector is represented by a fine-grained concept representation method;
S3: constructing a learning path inference model based on a self-supervised reinforcement learning method, and guiding the recommendation agent to perform learning path reasoning from the learner to the target course on the knowledge graph;
the learning path inference model comprises an actor network and a path discriminator, wherein the actor network first generates a path, the path discriminator then distinguishes expert demonstration paths from the generated paths, and the actor network tries to deceive the path discriminator by imitating the expert demonstration paths;
S4: training the learning path inference model with an actor-critic algorithm, wherein the actor network learns a path-reasoning policy according to the value function of the critic network, and the critic network updates the value function in a single step using the temporal-difference method;
S5: inferring the learning path from the learner to the target course through the trained learning path inference model.
2. The MOOC interpretable recommendation method according to claim 1, wherein the learner's embedding vector is represented by the coarse-grained course representation method as follows: after sorting learner u's historical course-selection records in chronological order, the learner's embedding vector is expressed as the chronologically ordered sequence of the courses taken, wherein the t-th element represents the course taken by learner u at time t, and 1, ..., t, ..., t_u indicates times from earliest to most recent.
3. The MOOC interpretable recommendation method according to claim 1, wherein the course's embedding vector is represented by the fine-grained concept representation method as follows: the embedding vector of the course is expressed as c_t:
c_t = {(k, w) | (k_i, w_j), n > i > 0, j > 0}
wherein n represents the number of course concepts contained in the course, i represents the index of a course concept, j represents the index of a word contained in a course concept, k_i represents the embedding vector of the i-th course concept of the course, w_j represents the embedding vector of the j-th word in concept k_i, k represents the embedding vectors of all concepts in the course, and w represents the embedding vectors of all words in the course.
4. The MOOC interpretable recommendation method according to claim 1, wherein step S3 expresses the path-reasoning problem as a Markov decision process, and the agent recommends a learning path from the learner to the target course by performing multi-hop path reasoning on the knowledge graph; in the Markov decision process, the initial state s_0 is determined by the learner, and the state s_t at time t records the learner together with the entity and relation reached so far; according to state s_t, the agent performs an action according to the policy to predict the feasible outgoing edges of entity e_t, the action space consisting of the outgoing edges of e_t in the knowledge graph, where ε represents the set of entities; a terminal reward R_{e,T} measures whether the agent has generated a multi-hop path starting from learner u and terminating at the target course; wherein u represents the learner, r represents a relation, e represents an entity, r_t represents the relation vector at time t, e_t represents the entity vector at time t, and R represents the reward function.
5. The MOOC interpretable recommendation method according to claim 1, wherein potential outgoing edges are retained by weighting the action paths in the Markov decision process, the weight of each edge in a path being set as:
6. The MOOC interpretable recommendation method according to claim 1, wherein the expert demonstration paths are acquired as follows: for every learner u and target course, Dijkstra's algorithm is used on the weighted graph, based on the weighted action paths, to generate the shortest path between learner u and the target course, yielding a set of demonstration paths; expert demonstration paths are then obtained by random sampling from the demonstration paths.
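The demonstration-path construction in the claim above relies on Dijkstra's algorithm over the weighted action graph; a self-contained standard-library sketch (the graph shape and weights are illustrative assumptions):

```python
import heapq

def dijkstra_path(wgraph, src, dst):
    """Shortest path from src to dst; wgraph maps node -> list of (neighbor, weight)."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]  # min-heap of (distance, node)
    done = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue  # stale entry
        done.add(u)
        if u == dst:
            break
        for v, w in wgraph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if dst != src and dst not in prev:
        return None  # unreachable
    # Reconstruct the path by walking predecessors back to the source
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```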
7. The MOOC interpretable recommendation method according to claim 1, wherein the path discriminator D_p(s_t, a_t) indicates how strongly action a_t at time t relates to state s_t, and is specifically defined as:
wherein an intermediate hidden variable is computed from the embedding vector of state s_t; a_{p,t} is the embedding vector of action a_t in discriminator D_p; tanh(·) denotes the hyperbolic tangent function; σ(·) denotes the logistic sigmoid function; the weight matrices are all learned parameters; d_a denotes the dimension of action embeddings in the actor network; d_s denotes the dimension of state embeddings; and d_d denotes the dimension of action embeddings in the path discriminator.
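A structural sketch of the discriminator score (the exact wiring of the tanh and sigmoid layers around the learned matrices is an assumption reconstructed from the functions and parameters listed in the claim):

```python
import math

def sigmoid(x):
    """Logistic sigmoid sigma(x)."""
    return 1.0 / (1.0 + math.exp(-x))

def path_discriminator(s_t, a_p_t, W_d):
    """D_p(s_t, a_t): a score in (0, 1) for how well action a_t fits state s_t."""
    # Intermediate variable: tanh of a linear map of the state embedding
    h = [math.tanh(sum(w * x for w, x in zip(row, s_t))) for row in W_d]
    # Sigmoid of the inner product with the discriminator's action embedding a_{p,t}
    return sigmoid(sum(a * hi for a, hi in zip(a_p_t, h)))

score = path_discriminator([1.0, -1.0], [0.5, 0.5], [[0.3, 0.1], [0.2, 0.4]])
```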
8. A MOOC interpretable recommendation terminal device, characterized by comprising a processor, a memory, and a computer program stored in the memory and runnable on the processor, the processor implementing the steps of the method according to any one of claims 1 to 7 when executing the computer program.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210666129.0A CN115238169A (en) | 2022-06-14 | 2022-06-14 | Mu course interpretable recommendation method, terminal device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115238169A true CN115238169A (en) | 2022-10-25 |
Family
ID=83669551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210666129.0A Pending CN115238169A (en) | 2022-06-14 | 2022-06-14 | Mu course interpretable recommendation method, terminal device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115238169A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115577185A (en) * | 2022-11-15 | 2023-01-06 | 湖南师范大学 | Muting course recommendation method and device based on mixed reasoning and mesopic group decision |
CN115658877A (en) * | 2022-12-27 | 2023-01-31 | 神州医疗科技股份有限公司 | Medicine recommendation method and device based on reinforcement learning, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||