CN116595245A - Hierarchical reinforcement learning-based MOOC course recommendation system method - Google Patents

Hierarchical reinforcement learning-based MOOC course recommendation system method

Info

Publication number
CN116595245A
Authority
CN
China
Prior art keywords
course
reinforcement learning
level
recommendation
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310405646.7A
Other languages
Chinese (zh)
Inventor
燕博文
黄先开
韩龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University filed Critical Beijing Technology and Business University
Priority to CN202310405646.7A
Publication of CN116595245A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Educational Technology (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Marketing (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application provides a MOOC course recommendation system method based on hierarchical reinforcement learning, belonging to the field of reinforcement learning for education. The reinforcement learning data module processes students' historical course selection information with hierarchical reinforcement learning, and two levels of agents judge whether a student's target course selection information needs to be modified, which greatly reduces the interference of irrelevant courses on the recommendation results and provides a better data processing approach for the course recommendation system. The reinforcement learning recommendation module performs course recommendation with hierarchical reinforcement learning: the course recommendation task is divided into two parts, and two levels of agents jointly recommend and predict the target course, which largely avoids over-recommending courses from popular fields, improves the accuracy of the recommendation system, and provides a new idea and method for MOOC course recommendation systems.

Description

Hierarchical reinforcement learning-based MOOC course recommendation system method
Technical Field
The application relates to the field of reinforcement learning for education, and in particular to a MOOC course recommendation system method based on hierarchical reinforcement learning.
Background
In daily life we face long lists of items in online stores, and the more items a list contains, the harder it is to choose. A recommendation system is a set of tools and algorithms whose idea is to help users find items of interest by predicting their preferences or their ratings of the items.
Deep reinforcement learning (DRL) combines deep learning with reinforcement learning to train an agent; in contrast to conventional recommendation methods, the agent in deep reinforcement learning can learn actively from users' real-time feedback to infer dynamic user preferences.
Disclosure of Invention
In order to solve the above problems, the present application provides a MOOC course recommendation system method based on hierarchical reinforcement learning. Hierarchical reinforcement learning on the data is first used to modify students' input histories according to their course registration information, and hierarchical reinforcement learning is then used to recommend new courses, so as to improve students' learning efficiency and interest. The MOOC course recommendation system method based on hierarchical reinforcement learning mainly comprises the following steps:
1. Data preprocessing: the method is based on a public dataset, the MOOC dataset MOOCCubeX, which comprises 4,216 courses, 230,263 videos, 358,265 exercises, and about 296 million raw behavior records from more than 2,330,294 students; each student record includes the student ID, school, course registration sequence, and so on. Students with rich course registration experience (at least 5 registered courses) are screened, and the student data is integrated with behaviors such as video watching and course comments, together with information such as the course ID, the field/discipline to which the course belongs, and course resources (videos, exercises).
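As a rough illustration of this screening step, the following minimal Python sketch filters the registration records; the table layout and column names (student_id, course_id) are assumptions for illustration, not the actual MOOCCubeX schema.

    import pandas as pd

    # Keep only students with rich course registration experience,
    # i.e. at least min_courses distinct registered courses.
    def screen_students(registrations: pd.DataFrame, min_courses: int = 5) -> pd.DataFrame:
        counts = registrations.groupby("student_id")["course_id"].nunique()
        eligible = counts[counts >= min_courses].index
        return registrations[registrations["student_id"].isin(eligible)]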
2. Hierarchical reinforcement learning on data: first, the students' course registration information is used as the dataset, the last course registered by each student is used as the prediction target, and a basic recommendation model with an attention mechanism (NAIS) is trained; the data-level hierarchical reinforcement learning is then formulated as a two-level Markov decision process (MDP), and the course registration information of students for whom the NAIS model predicts incorrectly is modified.
High-level reinforcement learning is defined first, mainly in terms of action, state, and reward. The action is what the agent feeds back to the environment given the state; the state is the environment the agent interacts with, and is updated according to the agent's actions; the reward is a signal indicating whether the performed action is reasonable: the environment feeds a reward back to the agent's action according to the reward function, and the agent updates its policy by means of the reward.
The high-level agent's action is whether to modify the course information, and is set as binary: 0 means no modification and 1 means modification. The state is the cosine similarity between the embedding vectors of the student's historical registered courses and the target course; the embedding vectors are provided by the pre-trained basic recommendation model (NAIS), so the state effectively represents the relevance of the i-th history course to the target course:

cos θ_i = (B_i · B_m) / (‖B_i‖ ‖B_m‖)

where θ_i is the angle between the i-th history course embedding vector B_i and the target course embedding vector B_m.
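As a minimal sketch, this state can be computed as follows, assuming the NAIS embeddings are available as NumPy arrays (the function and argument names are illustrative):

    import numpy as np

    # Cosine similarity between each history-course embedding B_i
    # (the rows of history_emb) and the target-course embedding B_m.
    def high_level_state(history_emb: np.ndarray, target_emb: np.ndarray) -> np.ndarray:
        norms = np.linalg.norm(history_emb, axis=1) * np.linalg.norm(target_emb)
        return history_emb @ target_emb / norms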
The high-level reward is obtained from the low-level reward: if the high-level task chooses to modify the course information, it calls the low-level task and receives the same reward after the last low-level action is performed.
The low-level agent judges whether the current course should be deleted from the history course information, and likewise mainly comprises an action a_i, a state s_i, and a reward R. The low-level state s_i augments the high-level state with the average of the features of the preceding two history courses; the low-level action is also binary, where 0 means the current course is not deleted and 1 means it is deleted:

P(a_i = 1 | s_i) = σ(γ(αs_i + b))

where a_i is the low-level action, s_i is the low-level state, and γ, α, and b are parameters to be learned; the number of state features is related to the dimension of the network's hidden layer, γ(αs_i + b) is an embedded value of the input state, and σ is the sigmoid function, which outputs the value as a probability (between 0 and 1).
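A minimal sketch of this deletion policy follows, under the assumption that α maps the state to a hidden layer and γ projects the hidden layer to a scalar logit (the exact shapes are not specified in the text):

    import numpy as np

    # P(a_i = 1 | s_i) = sigmoid(gamma(alpha s_i + b)):
    # the probability of deleting the current course.
    def delete_probability(s_i: np.ndarray, alpha: np.ndarray,
                           b: np.ndarray, gamma: np.ndarray) -> float:
        hidden = alpha @ s_i + b             # embedded value of the input state
        logit = float(gamma @ hidden)
        return 1.0 / (1.0 + np.exp(-logit))  # probability between 0 and 1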
The low-level reward is set as a delayed reward: after the low-level agent has performed all low-level actions, the modified state S′_i (course information) is input into the NAIS model to obtain the predicted-correct probability p(y | S_i, c_l), from which the reward is computed. The reward formula is:

R(a_i, s_i) = p(y | S′_i, c_l) − p(y | S_i, c_l) when y = 1; R(a_i, s_i) = 0 when y = 0

where y is the action of the high-level agent, S_i is the low-level state (course information) before modification, and S′_i is the state after modification: when y = 0 the reward R(a_i, s_i) is likewise zero, and when y = 1 the reward R(a_i, s_i) is the difference between the predicted-correct probability after modification and the predicted-correct probability before modification.
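Once the NAIS probabilities before and after modification are available, the reward computation is straightforward; a sketch with illustrative argument names:

    # Delayed low-level reward: zero when the high-level action y is 0
    # (no modification); otherwise the change in the NAIS predicted-correct
    # probability caused by the deletions.
    def low_level_reward(y: int, p_before: float, p_after: float) -> float:
        return (p_after - p_before) if y == 1 else 0.0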
3. Hierarchical reinforcement learning recommendation: hierarchical reinforcement learning divides course recommendation into two sequential tasks, recommending first the field and then the course.
The goal of the high-level agent is to recommend an appropriate field/discipline, where the state S_i includes the student profile, the student's history courses, the student's historical behaviors, and recently selected field information. Specifically, the state S_i is added to a fully connected layer to obtain the i-th high-level action embedding vector A_i; the formula of the high-level action is:

A_i = tanh(αS_i + B_i)

where A_i is the action embedding vector of the high-level agent, tanh is the activation function, S_i is the state, and α and B_i are parameters to be learned. Cosine similarity is then computed between the high-level action and all candidate courses across the fields, and the field of the most similar course is taken directly as the high-level action's predicted field.
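A minimal sketch of this nearest-candidate field prediction, assuming the candidate course embeddings and their field labels are available (names are illustrative):

    import numpy as np

    # Take the field of the candidate course whose embedding is most
    # cosine-similar to the high-level action embedding A_i.
    def predict_field(A_i: np.ndarray, course_embs: np.ndarray,
                      course_fields: list) -> str:
        sims = course_embs @ A_i / (
            np.linalg.norm(course_embs, axis=1) * np.linalg.norm(A_i))
        return course_fields[int(np.argmax(sims))]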
The high-level reward R is determined by whether the course field in the training set is consistent with the high-level agent's action: if so, positive feedback is obtained; otherwise, negative feedback is obtained.
The goal of the low-level agent is to recommend the most appropriate course. The low-level state includes the student profile, the student's history courses, recent course information, and the field constraint from the high level; generating an appropriate recommended course is likewise the agent's objective. The low-level state is also added to a fully connected layer to obtain the i-th low-level action embedding vector. The low-level reward R_l is measured by the course learning time, the course evaluation, and the course richness; the low-level reward formula is:

R_l = α·R_i^t + β·R_i^e + γ·R_i^r + b_i

where R_i^t is the reward obtained by taking the student's learning time on course i as an evaluation index; R_i^e is the student's evaluation of course i, where a positive evaluation obtains a high reward and a negative evaluation obtains a negative reward; R_i^r is the reward for the richness of course i, where the richer the course content (videos, exercises, forum, etc.), the higher the reward obtained; α, β, and γ are the weights of the different rewards, and b_i is a bias value.
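The weighted combination itself is a one-liner; a sketch with all inputs reduced to scalars for illustration:

    # R_l = alpha*r_time + beta*r_eval + gamma*r_rich + b_i: weighted sum
    # of the learning-time, evaluation, and richness rewards plus a bias.
    def recommend_reward(r_time: float, r_eval: float, r_rich: float,
                         alpha: float, beta: float, gamma: float,
                         b_i: float) -> float:
        return alpha * r_time + beta * r_eval + gamma * r_rich + b_i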
The prediction results of the two levels of agents are fed back to the environment; the environment issues rewards according to the reward functions, and the agents update their policies accordingly. After training, the final result, namely the predicted target course, is obtained.
Based on the foregoing solution, this application also describes a hierarchical reinforcement learning course recommendation device, comprising the following modules:
a reinforcement learning data module, used for generating suitable user data and course data: user course information that meets the course recommendation requirement (namely, rich course items) is screened first and the rest is discarded; the screened user course information is then processed by the hierarchical reinforcement learning algorithm, and the result is used to construct the dataset;
and a reinforcement learning recommendation module, used for learning interactively with the environment through two levels of agents and training the two different levels of agents through reasonably designed rewards and behaviors, wherein the low-level agent predicts the course under the field constraint imposed by the high level, yielding the course recommendation system result.
The application provides a method for establishing the recommendation system, comprising the following steps: after the dataset is constructed, the two different levels of agents are trained with the hierarchical reinforcement learning recommendation; the agents are trained according to feedback from the environment, and the course recommendation result is obtained after training ends.
Drawings
Fig. 1 is a training flow diagram of the hierarchical reinforcement learning MOOC course recommendation method of the present application.

Fig. 2 is a test flow diagram of the hierarchical reinforcement learning MOOC course recommendation method of the present application.
Detailed Description
The embodiments described below are only some, and not all, of the embodiments of the present application, and should not be construed as limiting it.
S101: data preprocessing: students with rich course registration experience are screened, and at the same time the student data is integrated with behaviors such as video watching and course comments, together with information such as the course ID, the field/discipline to which the course belongs, and course resources (videos, exercises).
S102: training the reinforcement learning data module: a basic recommendation model (NAIS) is first pre-trained with the screened student course information, and the screened information is then processed with hierarchical reinforcement learning. The high-level agent first receives a student's course selection data and judges whether the course selection history should be modified; if so, the low-level agent judges course by course whether each specific course is retained; the modified result is then input into the pre-trained NAIS model, the reward is obtained from the difference in the NAIS model's accuracy before and after modification, and the reward is fed back to the data-level hierarchical reinforcement learning.
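A schematic sketch of this S102 loop is given below; the agent and model interfaces (high_agent, low_agent, nais) and the sequence fields are hypothetical placeholders, not an actual API:

    # For each student sequence: the high-level agent decides whether to
    # modify; if so, the low-level agent keeps or deletes each course, and
    # the change in NAIS predicted-correct probability is the shared reward.
    for seq in student_sequences:
        y = high_agent.act(seq)                   # 1: modify, 0: keep as is
        if y == 1:
            kept = [c for c in seq.courses if low_agent.act(seq, c) == 0]
            reward = (nais.correct_prob(kept, seq.target)
                      - nais.correct_prob(seq.courses, seq.target))
            low_agent.update(reward)
            high_agent.update(reward)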
S103: training the reinforcement learning recommendation module: the high-level agent uses its current policy to send the generated field to the low-level agent; the low-level agent then generates a recommended course under that field constraint and inputs it into the environment; the environment computes rewards through the reward functions and feeds them back to the two levels of agents; the agents update their own policies with the rewards and proceed to the next training round. After training ends, the recommended course generated by the reinforcement learning recommendation is the final result.
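A schematic sketch of one S103 training round follows; env, high_agent, and low_agent are hypothetical placeholders for the environment and the two levels of agents:

    # One training episode: the high level proposes a field, the low level
    # recommends a course within it, and both agents are updated from the
    # rewards returned by the environment's two reward functions.
    for episode in range(num_episodes):
        state = env.reset()
        field = high_agent.act(state)
        course = low_agent.act(state, field)
        r_high, r_low = env.step(field, course)
        high_agent.update(state, field, r_high)
        low_agent.update(state, field, course, r_low)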
Fig. 1 is a training flowchart of the hierarchical reinforcement learning MOOC course recommendation method of the present application, showing the interaction flow between the reinforcement learning data module and the reinforcement learning recommendation module during training.

Fig. 2 is a test flowchart of the hierarchical reinforcement learning MOOC course recommendation method of the present application, showing how the system obtains test results during testing.
While the preferred embodiments of the present application have been described in detail, the application is not limited to these embodiments; various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and such equivalent modifications and substitutions are intended to fall within the scope of the present application as defined by the appended claims.

Claims (4)

1. A MOOC course recommendation system method based on hierarchical reinforcement learning, characterized by comprising the following steps:
s1: data preprocessing: the data is obtained from a mousing course data set (MOOCCubeX), wherein necessary data is course information, the field/discipline to which the course belongs and a student file, wherein the necessary data in the student file is student number (id), the course registration order can be added with other data on the basis of necessity to build a more perfect data set (such as school information, course content and the like), and more detailed data is provided for a recommendation system.
S2: hierarchical reinforcement learning on data: the data processing is established as hierarchical reinforcement learning, and two different agents judge whether to prune a student's course registration information, wherein the high-level agent judges whether the student's course registration information should be modified, and the low-level agent judges which specific courses are deleted from the registration information, so as to reduce the interference of irrelevant courses on the recommendation results; the processed course registration information is then used as the dataset.
S3: hierarchical reinforcement learning recommendation: two different levels of agents are used to train the model, wherein the high-level agent is responsible for predicting the field/discipline to which the recommended course belongs, and the low-level agent receives the field/discipline and predicts the course under the constraint of the predicted field/discipline, which largely avoids over-recommending courses from popular fields and significantly improves the accuracy of the system's recommendations.
2. The MOOC course recommendation system method based on hierarchical reinforcement learning according to claim 1, characterized in that, in step S2: the hierarchical reinforcement learning on data divides the data processing task into two parts and requires a pre-trained basic recommendation model, wherein the high-level reinforcement learning is responsible for judging whether a target student's course selection information needs to be modified, and the low-level reinforcement learning judges whether each specific course of the target student should be deleted once the high level has judged that modification is needed; the pruned course selection history is input into the basic recommendation model, the reward is obtained from the difference in the basic recommendation model's accuracy before and after modification and fed back to the data-level hierarchical reinforcement learning, and the modified data is used for training the hierarchical reinforcement learning recommendation.
3. The MOOC course recommendation system method based on hierarchical reinforcement learning according to claim 1, characterized in that, in step S3: the hierarchical reinforcement learning recommendation divides the course recommendation task into two parts, wherein the high-level reinforcement learning is responsible for generating the field to which the predicted course belongs, and the low-level reinforcement learning is responsible for generating the predicted target course; the high-level reward and action are designed first, the high-level agent interacting with the environment and obtaining the field to which the predicted course belongs from the current state; the low-level reward and action are then designed on this basis, the low-level agent predicts under the constraint of the high-level result and feeds the recommendation back to the environment, and the environment feeds rewards back to the high-level and low-level agents through two reward functions, whereby both agents update their own policies until training ends.
4. A MOOC course recommendation system based on hierarchical reinforcement learning, characterized by comprising:
the reinforcement learning data module is used for generating proper user data and course data, screening user course information meeting the course recommendation requirement (namely rich course items) before reinforcement learning data is carried out, and the rest is abandoned, and then obtaining the needed user and course information through a hierarchical reinforcement learning algorithm to construct a data set.
a reinforcement learning recommendation module, used for learning interactively with the environment through two levels of agents and training the two different levels of agents through reasonably designed rewards and behaviors, wherein the low-level agent predicts the course under the field constraint imposed by the high level, yielding the course recommendation system result.
CN202310405646.7A 2023-04-17 2023-04-17 Hierarchical reinforcement learning-based MOOC course recommendation system method Pending CN116595245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310405646.7A CN116595245A (en) 2023-04-17 Hierarchical reinforcement learning-based MOOC course recommendation system method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310405646.7A CN116595245A (en) 2023-04-17 Hierarchical reinforcement learning-based MOOC course recommendation system method

Publications (1)

Publication Number Publication Date
CN116595245A true CN116595245A (en) 2023-08-15

Family

ID=87592707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310405646.7A Pending CN116595245A (en) Hierarchical reinforcement learning-based MOOC course recommendation system method

Country Status (1)

Country Link
CN (1) CN116595245A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117151346A (en) * 2023-10-30 2023-12-01 中国民航大学 Civil aviation professional teaching and training system based on smart learning
CN117151346B (en) * 2023-10-30 2024-02-09 中国民航大学 Civil aviation professional teaching and training system based on smart learning

Similar Documents

Publication Publication Date Title
Wan et al. A hybrid e-learning recommendation approach based on learners’ influence propagation
Al-Hmouz et al. Modeling and simulation of an adaptive neuro-fuzzy inference system (ANFIS) for mobile learning
CN111291266A (en) Artificial intelligence based recommendation method and device, electronic equipment and storage medium
Gan et al. Knowledge structure enhanced graph representation learning model for attentive knowledge tracing
CN110580314A (en) Course recommendation method and system based on graph convolution neural network and dynamic weight
CN113268611B (en) Learning path optimization method based on deep knowledge tracking and reinforcement learning
CN108228674A (en) A kind of information processing method and device based on DKT
CN116595245A (en) Hierarchical reinforcement learning-based MOOC course recommendation system method
CN115545160B (en) Knowledge tracking method and system for multi-learning behavior collaboration
CN112800323A (en) Intelligent teaching system based on deep learning
CN115329959A (en) Learning target recommendation method based on double-flow knowledge embedded network
CN115115389A (en) Express customer loss prediction method based on value subdivision and integrated prediction
Brigui-Chtioui et al. Intelligent digital learning: Agent-based recommender system
Bhaskar et al. Genetic algorithm based adaptive learning scheme generation for context aware e-learning
CN115168721A (en) User interest recommendation method and system integrating collaborative transformation and temporal perception
CN113934840B (en) Covering heuristic quantity sensing exercise recommendation method
Yang et al. Research on students’ adaptive learning system based on deep learning model
CN117035074A (en) Multi-modal knowledge generation method and device based on feedback reinforcement
Zhang et al. Develop academic question recommender based on Bayesian network for personalizing student’s practice
CN113609337A (en) Pre-training method, device, equipment and medium of graph neural network
CN117056595A (en) Interactive project recommendation method and device and computer readable storage medium
Sarwar et al. Ontology Based e-Learning Systems: A Step towards Adaptive Content Recommendation
CN114971066A (en) Knowledge tracking method and system integrating forgetting factor and learning ability
CN113360635B (en) Intelligent teaching method and system based on self-attention and pre-training mechanism
CN115952838B (en) Self-adaptive learning recommendation system-based generation method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination