CN116595245A - Hierarchical reinforcement learning-based MOOC course recommendation system method - Google Patents
Hierarchical reinforcement learning-based MOOC course recommendation system method
- Publication number
- CN116595245A CN116595245A CN202310405646.7A CN202310405646A CN116595245A CN 116595245 A CN116595245 A CN 116595245A CN 202310405646 A CN202310405646 A CN 202310405646A CN 116595245 A CN116595245 A CN 116595245A
- Authority
- CN
- China
- Prior art keywords
- course
- reinforcement learning
- level
- recommendation
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
Abstract
The application provides a method for a MOOC course recommendation system based on hierarchical reinforcement learning, belonging to the field of reinforcement learning in education. The reinforcement learning data module processes students' historic course selection information using hierarchical reinforcement learning and judges through two levels of agents whether a student's course selection information needs to be modified, greatly reducing the interference of irrelevant courses on the recommendation result and providing a better data processing approach for the course recommendation system. The reinforcement learning recommendation module uses hierarchical reinforcement learning to perform course recommendation: the course recommendation task is divided into two parts, and two levels of agents jointly recommend and predict the target course, which largely avoids the problem of excessively high recommendation probability for courses in popular domains, improves the accuracy of the recommendation system, and offers a new line of thought and method for MOOC course recommendation systems.
Description
Technical Field
The application relates to the field of reinforcement learning in education, and in particular to a MOOC course recommendation system method based on hierarchical reinforcement learning.
Background
In daily life we face long lists of items in online stores, and the longer the list, the harder the choice. A recommendation system is a tool and algorithm whose idea is to help users find items of interest by predicting their preferences or their ratings of the items.
Deep reinforcement learning (DRL) combines deep learning with reinforcement learning to train an agent; the agent in DRL can learn actively from users' real-time feedback to infer dynamic user preferences.
Disclosure of Invention
In order to solve the above problems, the present application provides a MOOC course recommendation system based on hierarchical reinforcement learning, which uses hierarchical reinforcement learning on the data to modify students' input course registration history and then uses hierarchical reinforcement learning to recommend new courses, improving students' learning efficiency and interest. The MOOC course recommendation system based on hierarchical reinforcement learning mainly comprises the following steps:
1. data preprocessing: the method is based on a public data set, wherein the mousse data set (MOOCCubeX) comprises 4, 216 courses, 230263 videos, 358265 exercises and 2.96 hundred million original behavior data of more than 2330294 students, each student data comprises student IDs, schools, course registration sequences and the like, students with rich course registration experience (more than or equal to 5) are screened, meanwhile, behaviors such as student data, video watching of the students, course comments and the like are integrated, and information such as course numbers (IDs), domains/subjects to which the courses belong, course resources (videos, exercises) and the like are integrated.
2. Hierarchical reinforcement learning on the data: first, the students' course registration information is used as the dataset, the last course registered by each student is used as the prediction target, and a basic recommendation model with an attention mechanism (NAIS) is trained. Data-level hierarchical reinforcement learning is then formulated as a two-stage Markov decision process (MDP), and the course registration information of students whose target the NAIS model predicts incorrectly is modified.
First, define the high-level reinforcement learning, which consists mainly of actions, states and rewards. An action is what the agent feeds back to the state; the state is the environment the agent interacts with, and it is updated according to the agent's actions; the reward is a signal indicating whether the performed action was reasonable: the environment feeds a reward back for the agent's action according to the reward function, and the agent updates its policy using the reward.
The high-level agent's action is whether to modify the course information; it is binary, 0 meaning do not modify and 1 meaning modify. The state is the cosine similarity between the embedding vectors of the student's historic registered courses and the target course; the embedding vectors are provided by the pre-trained basic recommendation model (NAIS), so the state effectively represents the relevance of the i-th historic course to the target course:

s_i = cos θ_i = (B_i · B_m) / (||B_i|| ||B_m||)

where θ_i is the angle between the i-th historic course embedding vector and the target course embedding vector, and B_i and B_m are the i-th historic course embedding vector and the target course embedding vector, respectively.
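The high-level state defined above is the standard cosine similarity between two embedding vectors; a minimal plain-Python sketch, with no particular embedding model assumed:

```python
import math

def cosine_similarity(b_i, b_m):
    """cos(theta_i) between the i-th historic course embedding b_i and the
    target course embedding b_m, as used for the high-level state."""
    dot = sum(x * y for x, y in zip(b_i, b_m))
    norm_i = math.sqrt(sum(x * x for x in b_i))
    norm_m = math.sqrt(sum(x * x for x in b_m))
    return dot / (norm_i * norm_m)
```

Identical vectors give a similarity of 1.0 (highly relevant course); orthogonal vectors give 0.0 (irrelevant course).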
The high-level reward is obtained from the low-level reward: if the high-level task chooses to modify the course information, it calls the low-level task and receives the same reward after the last low-level action has been performed.
The low-level agent judges whether each course should be deleted from the historic course information; it likewise consists mainly of an action a_i, a state s_i and a reward R. The low-level state s_i adds the average of the features of the first two historic courses on top of the high-level state; the low-level action is also binary, 0 meaning do not delete the current course and 1 meaning delete it:

P(a_i = 1 | s_i) = σ(γ(α s_i + b))

where a_i is the low-level action, s_i is the low-level state, γ, α and b are parameters to be learned (the number of state features is related to the dimension of the network's hidden layer), γ(α s_i + b) is the embedded value of the input state, and σ is the sigmoid function, which outputs the value in probabilistic form (between 0 and 1).
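With scalar parameters for illustration, the action probability σ(γ(αs_i + b)) can be sketched as below; in the actual model α would be a learned weight over the full state feature vector, so these scalar stand-ins are an assumption.

```python
import math
import random

def delete_probability(s_i, alpha, gamma, b):
    """P(a_i = 1 | s_i): probability that the low-level agent deletes the
    i-th course; gamma, alpha and b stand in for learned parameters."""
    return 1.0 / (1.0 + math.exp(-gamma * (alpha * s_i + b)))

def sample_action(s_i, alpha=1.0, gamma=1.0, b=0.0, rng=random):
    """Sample the binary action (1 = delete, 0 = keep)."""
    return 1 if rng.random() < delete_probability(s_i, alpha, gamma, b) else 0
```

A zero pre-activation gives probability 0.5, and larger inputs push the probability toward 1, as expected of a sigmoid output.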
The low-level reward is set as a delayed reward: after the low-level agent has performed every low-level action, the modified state S_i′ (the course information) is input into the NAIS model to obtain the model's probability of a correct prediction p(y | S_i′, c_l), from which the reward is derived:

R(a_i, s_i) = 0, if y = 0
R(a_i, s_i) = p(y | S_i′, c_l) − p(y | S_i, c_l), if y = 1

where y is the high-level agent's action, S_i is the low-level state (the course information) before modification and S_i′ the state after modification. When y = 0 the reward R(a_i, s_i) is likewise zero; when y = 1 the reward R(a_i, s_i) is the difference between the probability of a correct prediction after modification and before modification.
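The delayed reward above reduces to a simple difference of prediction probabilities; a sketch under the assumption that the NAIS correct-prediction probabilities are available as plain numbers:

```python
def low_level_reward(y, p_before, p_after):
    """Delayed low-level reward.
    y: high-level action (0 = course information left unchanged, 1 = modified).
    p_before / p_after: NAIS probability of correctly predicting the target
    course before / after the modification."""
    if y == 0:
        return 0.0             # unchanged input: zero reward
    return p_after - p_before  # improvement in prediction probability
```

A modification that raises the NAIS accuracy yields a positive reward, and a harmful modification yields a negative one.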
3. Hierarchical reinforcement learning recommendation: the hierarchical reinforcement learning divides course recommendation into two tasks, recommending the domain and recommending the course.
The goal of the high-level agent is to recommend an appropriate domain/discipline. Its state S_i includes the student profile, the student's historic courses, the student's historic behaviors, and recently selected domain information. Specifically, the state S_i is fed into a fully connected layer to obtain the i-th high-level action embedding vector A_i; the formula of the high-level action is:

A_i = t(αS_i + B_i)

where A_i is the high-level agent's action embedding vector, t is the tanh function, S_i is the state, α is a weight to be learned and B_i is a bias term. Cosine similarity is then computed between the high-level action and all candidate courses in the domains, and the domain of the most similar course is taken directly as the domain predicted by the high-level action.
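The high-level action and the nearest-domain lookup can be sketched as follows; the element-wise scalar weight is an illustrative simplification of the fully connected layer, and the candidate list format is assumed.

```python
import math

def high_level_action(state, weight, bias):
    """A_i = tanh(weight * S_i + B_i), element-wise; weight and bias stand
    in for the fully connected layer's learned parameters."""
    return [math.tanh(weight * s + b) for s, b in zip(state, bias)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(a * a for a in v)))

def predict_domain(action_vec, candidates):
    """candidates: list of (domain, course_embedding) pairs; the domain of
    the most cosine-similar candidate course becomes the prediction."""
    return max(candidates, key=lambda c: cosine(action_vec, c[1]))[0]
```

For example, an action vector pointing along a mathematics course's embedding would select the "math" domain over an orthogonal "art" candidate.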
The reward R of the high-level reinforcement learning is whether the course domain in the training set is consistent with the high-level agent's action: if consistent, positive feedback is obtained, otherwise negative feedback.
The goal of the low-level agent is to recommend the most appropriate course. Its state s_i^l includes the student profile, the student's historic courses, recent course information, and the domain constraint, which is also the agent's objective, namely generating a suitable recommended course within the constrained domain. The state s_i^l is likewise fed into the fully connected layer to obtain the i-th low-level action embedding vector A_i^l. The low-level reward R_l is measured by course learning time, course evaluation and course richness; the low-level reward formula is:

R_l = α R_i^t + β R_i^e + γ R_i^r + b_i

where R_i^t is the reward obtained with the student's learning time on course i as the evaluation index; R_i^e is the student's evaluation of course i (a positive evaluation yields a high reward, a negative evaluation a negative reward); R_i^r is the reward for the richness of course i (the richer the course content (videos, exercises, forum, etc.), the higher the reward); α, β and γ are the weights of the different rewards, and b_i is a bias value.
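The weighted low-level reward is a plain linear combination; the default weights below are arbitrary illustrative values, not tuned ones from the application.

```python
def course_reward(r_time, r_eval, r_rich,
                  alpha=0.4, beta=0.4, gamma=0.2, b_i=0.0):
    """R_l = alpha*R_t + beta*R_e + gamma*R_r + b_i, combining learning-time,
    student-evaluation and course-richness rewards with illustrative weights."""
    return alpha * r_time + beta * r_eval + gamma * r_rich + b_i
```

With all component rewards at 1 and the weights summing to 1, the combined reward is 1; negative evaluations pull the total down.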
The prediction results of the two levels of agents are fed back to the environment; the environment rewards the agents according to the reward function, both levels of agents update their policies accordingly, and after training the final result, the predicted target course, is obtained.
In this application, based on the foregoing solution, a hierarchical reinforcement learning course recommendation device is described, including the following modules:
the reinforcement learning data module, which generates suitable user data and course data: user course information meeting the course recommendation requirement (namely, with rich course items) is screened beforehand and the rest is discarded; the screened user course information is then processed by the hierarchical reinforcement learning algorithm, and the result is used to construct the dataset;
and the reinforcement learning recommendation module, which learns by interacting with the environment through two levels of agents and trains the two different levels of agents through reasonably designed rewards and behaviors; the low-level agent predicts courses under the domain constraint imposed by the high level, yielding the course recommendation result.
The application provides a method for establishing the recommendation system, comprising: after the dataset is constructed, training the two different levels of agents with hierarchical reinforcement learning recommendation; the agents are trained according to feedback from the environment, and the course recommendation result is obtained after training ends.
Drawings
Fig. 1 is a training flow diagram of the hierarchical reinforcement learning lesson recommendation method of the present application.
Fig. 2 is a test flow chart of the hierarchical reinforcement learning lesson recommendation method of the present application.
Detailed Description
The embodiments described below are only a part of the embodiments of the present application and should not be construed as limiting it to these examples.
S101: data preprocessing: student data and behaviors such as video watching and course comments are integrated, together with the course ID, the domain/discipline to which each course belongs, and course resources (videos, exercises).
S102: training on the reinforcement learning data: first, a basic recommendation model (NAIS) is pre-trained with the screened student course information; the screened student course information is then processed with hierarchical reinforcement learning. The high-level agent first obtains the student's course selection data and judges whether the student's course selection history should be modified; if so, the low-level agent judges one by one whether each specific course is retained. The modified result is input into the pre-trained basic recommendation model (NAIS); the reward is obtained from the difference in the NAIS model's accuracy before and after modification and is fed back to the data-level hierarchical reinforcement learning.
S103: training the reinforcement learning recommendation: the high-level agent uses its current policy to send the generated domain to the low-level agent; the low-level agent then generates a recommended course under that domain constraint and inputs it into the environment; the environment computes rewards via the reward function and feeds them back to the two levels of agents, which update their own policies with the rewards before the next round of training. After training ends, the recommended course generated by the reinforcement learning recommendation is the final result.
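The control flow of one S103 training episode can be sketched as below; the toy agents and environment are hypothetical stand-ins for the learned policies and reward functions, illustrating only how the two levels interact.

```python
import random

class HighAgent:
    """Toy high-level agent: picks a domain at random (stand-in for the
    learned high-level policy)."""
    def __init__(self, domains):
        self.domains = domains
    def act(self, state):
        return random.choice(self.domains)
    def update(self, state, domain, reward):
        pass  # a real agent would update its policy here

class LowAgent:
    """Toy low-level agent: picks a course within the constrained domain."""
    def __init__(self, courses):
        self.courses = courses  # maps domain -> list of course ids
    def act(self, state, domain):
        return random.choice(self.courses[domain])
    def update(self, state, domain, course, reward):
        pass

class Env:
    """Toy environment rewarding matches against the training target."""
    def __init__(self, target_domain, target_course):
        self.target = (target_domain, target_course)
    def reset(self):
        return {}
    def feedback(self, domain, course):
        return (1.0 if domain == self.target[0] else -1.0,
                1.0 if course == self.target[1] else -1.0)

def train_episode(high_agent, low_agent, env):
    state = env.reset()
    domain = high_agent.act(state)               # high level picks a domain
    course = low_agent.act(state, domain)        # low level picks a course in it
    r_high, r_low = env.feedback(domain, course) # environment rewards both levels
    high_agent.update(state, domain, r_high)
    low_agent.update(state, domain, course, r_low)
    return domain, course
```

Repeating `train_episode` with real learned policies and the reward functions described above would constitute the training loop of S103.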
Fig. 1 is a training flowchart of the hierarchical reinforcement learning lesson recommendation method of the present application, showing the interaction flow between the reinforcement learning data module and the reinforcement learning recommendation module during training.
Fig. 2 is a test flow chart of the hierarchical reinforcement learning lesson recommendation method of the present application, showing how the system obtains test results during the test.
While the preferred embodiment of the present application has been described in detail, the application is not limited to this embodiment; those skilled in the art can make various equivalent modifications and substitutions without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to fall within the scope of the present application as defined in the appended claims.
Claims (4)
1. A method for a MOOC course recommendation system based on hierarchical reinforcement learning, characterized by comprising the following steps:
s1: data preprocessing: the data is obtained from a mousing course data set (MOOCCubeX), wherein necessary data is course information, the field/discipline to which the course belongs and a student file, wherein the necessary data in the student file is student number (id), the course registration order can be added with other data on the basis of necessity to build a more perfect data set (such as school information, course content and the like), and more detailed data is provided for a recommendation system.
S2: hierarchical reinforcement learning on the data: hierarchical reinforcement learning is established to process the data, and two different agents judge whether to prune the students' course registration information: the high-level agent judges whether the student's course registration information should be modified, and the low-level agent judges which specific courses of the student are deleted from the registration information, so as to reduce their interference with the recommendation result; the processed student course registration information is then used as the dataset.
S3: hierarchical reinforcement learning recommendation: the model is trained with two different levels of agents, wherein the high-level agent is responsible for predicting the domain/discipline to which the recommended course belongs, and the low-level agent receives the domain/discipline and predicts the course under the constraint of the predicted domain/discipline, which largely avoids the problem of excessively high recommendation probability for courses in popular domains and significantly improves the accuracy of the system's recommendation.
2. The step S2 of the method for a MOOC course recommendation system based on hierarchical reinforcement learning according to claim 1, characterized in that: the hierarchical reinforcement learning on the data divides the data processing task into two parts and requires a pre-trained basic recommendation model, wherein the high-level reinforcement learning is responsible for judging whether a target student's course selection information needs to be modified, and the low-level reinforcement learning judges, once the high level has judged that modification is needed, whether each specific course of the target student should be deleted; the pruned course selection history is input into the basic recommendation model, the reward is obtained from the difference in the basic recommendation model's accuracy before and after modification and is fed back to the data-level hierarchical reinforcement learning, and the modified data are used for the training of the hierarchical reinforcement learning recommendation.
3. The step S3 of the method for a MOOC course recommendation system based on hierarchical reinforcement learning according to claim 1, characterized in that: the hierarchical reinforcement learning recommendation divides the course recommendation task into two parts, wherein the high-level reinforcement learning is responsible for generating the domain to which the predicted course belongs and the low-level reinforcement learning is responsible for generating the predicted target course. First, the high-level reward and action are designed: the high-level agent interacts with the environment and obtains the domain of the predicted course from the current state. The low-level rewards and actions are then designed on that basis: the low-level agent predicts under the constraint of the high-level prediction result and feeds the recommendation result back to the environment; the environment feeds rewards back to the high-level and low-level agents through two reward functions, and the agents update their own policies until training ends.
4. A method for a MOOC course recommendation system based on hierarchical reinforcement learning, comprising:
the reinforcement learning data module, which generates suitable user data and course data: before the reinforcement learning on the data is carried out, user course information meeting the course recommendation requirement (namely, with rich course items) is screened and the rest is discarded; the needed user and course information is then obtained through the hierarchical reinforcement learning algorithm to construct the dataset;
and the reinforcement learning recommendation module, which learns by interacting with the environment through two levels of agents and trains the two different levels of agents through reasonably designed rewards and behaviors; the low-level agent predicts the course under the domain constraint imposed by the high level, yielding the course recommendation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310405646.7A CN116595245A (en) | 2023-04-17 | 2023-04-17 | Hierarchical reinforcement learning-based MOOC course recommendation system method
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310405646.7A CN116595245A (en) | 2023-04-17 | 2023-04-17 | Hierarchical reinforcement learning-based MOOC course recommendation system method
Publications (1)
Publication Number | Publication Date |
---|---|
CN116595245A true CN116595245A (en) | 2023-08-15 |
Family
ID=87592707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310405646.7A Pending CN116595245A (en) | 2023-04-17 | 2023-04-17 | Hierarchical reinforcement learning-based MOOC course recommendation system method
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116595245A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117151346A (en) * | 2023-10-30 | 2023-12-01 | 中国民航大学 | Civil aviation specialty teaching training system based on wisdom study |
CN117151346B (en) * | 2023-10-30 | 2024-02-09 | 中国民航大学 | Civil aviation specialty teaching training system based on wisdom study |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wan et al. | A hybrid e-learning recommendation approach based on learners’ influence propagation | |
Al-Hmouz et al. | Modeling and simulation of an adaptive neuro-fuzzy inference system (ANFIS) for mobile learning | |
CN111291266A (en) | Artificial intelligence based recommendation method and device, electronic equipment and storage medium | |
Gan et al. | Knowledge structure enhanced graph representation learning model for attentive knowledge tracing | |
CN110580314A (en) | Course recommendation method and system based on graph convolution neural network and dynamic weight | |
CN113268611B (en) | Learning path optimization method based on deep knowledge tracking and reinforcement learning | |
CN108228674A (en) | A kind of information processing method and device based on DKT | |
CN116595245A (en) | Hierarchical reinforcement learning-based lesson admiring course recommendation system method | |
CN115545160B (en) | Knowledge tracking method and system for multi-learning behavior collaboration | |
CN112800323A (en) | Intelligent teaching system based on deep learning | |
CN115329959A (en) | Learning target recommendation method based on double-flow knowledge embedded network | |
CN115115389A (en) | Express customer loss prediction method based on value subdivision and integrated prediction | |
Brigui-Chtioui et al. | Intelligent digital learning: Agent-based recommender system | |
Bhaskar et al. | Genetic algorithm based adaptive learning scheme generation for context aware e-learning | |
CN115168721A (en) | User interest recommendation method and system integrating collaborative transformation and temporal perception | |
CN113934840B (en) | Covering heuristic quantity sensing exercise recommendation method | |
Yang et al. | Research on students’ adaptive learning system based on deep learning model | |
CN117035074A (en) | Multi-modal knowledge generation method and device based on feedback reinforcement | |
Zhang et al. | Develop academic question recommender based on Bayesian network for personalizing student’s practice | |
CN113609337A (en) | Pre-training method, device, equipment and medium of graph neural network | |
CN117056595A (en) | Interactive project recommendation method and device and computer readable storage medium | |
Sarwar et al. | Ontology Based e-Learning Systems: A Step towards Adaptive Content Recommendation | |
CN114971066A (en) | Knowledge tracking method and system integrating forgetting factor and learning ability | |
CN113360635B (en) | Intelligent teaching method and system based on self-attention and pre-training mechanism | |
CN115952838B (en) | Self-adaptive learning recommendation system-based generation method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||