CN116595245A - Hierarchical reinforcement learning-based MOOC course recommendation system method - Google Patents
Hierarchical reinforcement learning-based MOOC course recommendation system method
- Publication number
- CN116595245A CN116595245A CN202310405646.7A CN202310405646A CN116595245A CN 116595245 A CN116595245 A CN 116595245A CN 202310405646 A CN202310405646 A CN 202310405646A CN 116595245 A CN116595245 A CN 116595245A
- Authority
- CN
- China
- Prior art keywords
- course
- reinforcement learning
- level
- recommendation
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
Abstract
The application provides a method for a MOOC course recommendation system based on hierarchical reinforcement learning, belonging to the field of reinforcement learning in education. The reinforcement learning data module processes students' historic course selection information using hierarchical reinforcement learning and judges through two levels of agents whether a student's course selection information needs to be modified, greatly reducing the interference of irrelevant courses on the recommendation result and providing a better data processing approach for the course recommendation system. The reinforcement learning recommendation module uses hierarchical reinforcement learning to perform course recommendation: the course recommendation task is divided into two parts, and two levels of agents jointly recommend and predict the target course, which largely avoids the problem of excessively high recommendation probability for courses in popular domains, improves the accuracy of the recommendation system, and offers a new line of thought and method for MOOC course recommendation systems.
Description
Technical Field
The application relates to the field of reinforcement learning in education, and in particular to a MOOC course recommendation system method based on hierarchical reinforcement learning.
Background
In daily life we face long lists of items in online stores, and the longer the list, the harder the choice. A recommendation system is a tool and algorithm whose idea is to help users find items of interest by predicting their preferences or their ratings of the items.
Deep reinforcement learning (DRL) combines deep learning with reinforcement learning to train an agent; the agent in DRL can learn actively from users' real-time feedback to infer dynamic user preferences.
Disclosure of Invention
In order to solve the above problems, the present application provides a MOOC course recommendation system based on hierarchical reinforcement learning, which uses hierarchical reinforcement learning on the data to modify students' input course registration history and then uses hierarchical reinforcement learning to recommend new courses, improving students' learning efficiency and interest. The MOOC course recommendation system based on hierarchical reinforcement learning mainly comprises the following steps:
1. data preprocessing: the method is based on a public data set, wherein the mousse data set (MOOCCubeX) comprises 4, 216 courses, 230263 videos, 358265 exercises and 2.96 hundred million original behavior data of more than 2330294 students, each student data comprises student IDs, schools, course registration sequences and the like, students with rich course registration experience (more than or equal to 5) are screened, meanwhile, behaviors such as student data, video watching of the students, course comments and the like are integrated, and information such as course numbers (IDs), domains/subjects to which the courses belong, course resources (videos, exercises) and the like are integrated.
2. Hierarchical reinforcement learning on the data: first, the students' course registration information is used as the dataset, the last course registered by each student is used as the prediction target, and a basic recommendation model with an attention mechanism (NAIS) is trained. Data-level hierarchical reinforcement learning is then formulated as a two-stage Markov decision process (MDP), and the course registration information of students whose target the NAIS model predicts incorrectly is modified.
First, define the high-level reinforcement learning, which consists mainly of actions, states and rewards. An action is what the agent feeds back to the state; the state is the environment the agent interacts with, and it is updated according to the agent's actions; the reward is a signal indicating whether the performed action was reasonable: the environment feeds a reward back for the agent's action according to the reward function, and the agent updates its policy using the reward.
The high-level agent's action is whether to modify the course information; it is binary, 0 meaning do not modify and 1 meaning modify. The state is the cosine similarity between the embedding vectors of the student's historic registered courses and the target course; the embedding vectors are provided by the pre-trained basic recommendation model (NAIS), so the state effectively represents the relevance of the i-th historic course to the target course:

s_i = cos θ_i = (B_i · B_m) / (||B_i|| ||B_m||)

where θ_i is the angle between the i-th historic course embedding vector and the target course embedding vector, and B_i and B_m are the i-th historic course embedding vector and the target course embedding vector, respectively.
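The high-level state defined above is the standard cosine similarity between two embedding vectors; a minimal plain-Python sketch, with no particular embedding model assumed:

```python
import math

def cosine_similarity(b_i, b_m):
    """cos(theta_i) between the i-th historic course embedding b_i and the
    target course embedding b_m, as used for the high-level state."""
    dot = sum(x * y for x, y in zip(b_i, b_m))
    norm_i = math.sqrt(sum(x * x for x in b_i))
    norm_m = math.sqrt(sum(x * x for x in b_m))
    return dot / (norm_i * norm_m)
```

Identical vectors give a similarity of 1.0 (highly relevant course); orthogonal vectors give 0.0 (irrelevant course).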
The high-level reward is obtained from the low-level reward: if the high-level task chooses to modify the course information, it calls the low-level task and receives the same reward after the last low-level action has been performed.
The low-level agent judges whether each course should be deleted from the historic course information; it likewise consists mainly of an action a_i, a state s_i and a reward R. The low-level state s_i adds the average of the features of the first two historic courses on top of the high-level state; the low-level action is also binary, 0 meaning do not delete the current course and 1 meaning delete it:

P(a_i = 1 | s_i) = σ(γ(α s_i + b))

where a_i is the low-level action, s_i is the low-level state, γ, α and b are parameters to be learned (the number of state features is related to the dimension of the network's hidden layer), γ(α s_i + b) is the embedded value of the input state, and σ is the sigmoid function, which outputs the value in probabilistic form (between 0 and 1).
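With scalar parameters for illustration, the action probability σ(γ(αs_i + b)) can be sketched as below; in the actual model α would be a learned weight over the full state feature vector, so these scalar stand-ins are an assumption.

```python
import math
import random

def delete_probability(s_i, alpha, gamma, b):
    """P(a_i = 1 | s_i): probability that the low-level agent deletes the
    i-th course; gamma, alpha and b stand in for learned parameters."""
    return 1.0 / (1.0 + math.exp(-gamma * (alpha * s_i + b)))

def sample_action(s_i, alpha=1.0, gamma=1.0, b=0.0, rng=random):
    """Sample the binary action (1 = delete, 0 = keep)."""
    return 1 if rng.random() < delete_probability(s_i, alpha, gamma, b) else 0
```

A zero pre-activation gives probability 0.5, and larger inputs push the probability toward 1, as expected of a sigmoid output.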
The low-level reward is set as a delayed reward: after the low-level agent has performed every low-level action, the modified state S_i′ (the course information) is input into the NAIS model to obtain the model's probability of a correct prediction p(y | S_i′, c_l), from which the reward is derived:

R(a_i, s_i) = 0, if y = 0
R(a_i, s_i) = p(y | S_i′, c_l) − p(y | S_i, c_l), if y = 1

where y is the high-level agent's action, S_i is the low-level state (the course information) before modification and S_i′ the state after modification. When y = 0 the reward R(a_i, s_i) is likewise zero; when y = 1 the reward R(a_i, s_i) is the difference between the probability of a correct prediction after modification and before modification.
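The delayed reward above reduces to a simple difference of prediction probabilities; a sketch under the assumption that the NAIS correct-prediction probabilities are available as plain numbers:

```python
def low_level_reward(y, p_before, p_after):
    """Delayed low-level reward.
    y: high-level action (0 = course information left unchanged, 1 = modified).
    p_before / p_after: NAIS probability of correctly predicting the target
    course before / after the modification."""
    if y == 0:
        return 0.0             # unchanged input: zero reward
    return p_after - p_before  # improvement in prediction probability
```

A modification that raises the NAIS accuracy yields a positive reward, and a harmful modification yields a negative one.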
3. Hierarchical reinforcement learning recommendation: the hierarchical reinforcement learning divides course recommendation into two tasks, recommending the domain and recommending the course.
The goal of the high-level agent is to recommend an appropriate domain/discipline. Its state S_i includes the student profile, the student's historic courses, the student's historic behaviors, and recently selected domain information. Specifically, the state S_i is fed into a fully connected layer to obtain the i-th high-level action embedding vector A_i; the formula of the high-level action is:

A_i = t(αS_i + B_i)

where A_i is the high-level agent's action embedding vector, t is the tanh function, S_i is the state, α is a weight to be learned and B_i is a bias term. Cosine similarity is then computed between the high-level action and all candidate courses in the domains, and the domain of the most similar course is taken directly as the domain predicted by the high-level action.
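The high-level action and the nearest-domain lookup can be sketched as follows; the element-wise scalar weight is an illustrative simplification of the fully connected layer, and the candidate list format is assumed.

```python
import math

def high_level_action(state, weight, bias):
    """A_i = tanh(weight * S_i + B_i), element-wise; weight and bias stand
    in for the fully connected layer's learned parameters."""
    return [math.tanh(weight * s + b) for s, b in zip(state, bias)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(a * a for a in v)))

def predict_domain(action_vec, candidates):
    """candidates: list of (domain, course_embedding) pairs; the domain of
    the most cosine-similar candidate course becomes the prediction."""
    return max(candidates, key=lambda c: cosine(action_vec, c[1]))[0]
```

For example, an action vector pointing along a mathematics course's embedding would select the "math" domain over an orthogonal "art" candidate.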
The reward R of the high-level reinforcement learning is whether the course domain in the training set is consistent with the high-level agent's action: if consistent, positive feedback is obtained, otherwise negative feedback.
The goal of the low-level agent is to recommend the most appropriate course. Its state s_i^l includes the student profile, the student's historic courses, recent course information, and the domain constraint, which is also the agent's objective, namely generating a suitable recommended course within the constrained domain. The state s_i^l is likewise fed into the fully connected layer to obtain the i-th low-level action embedding vector A_i^l. The low-level reward R_l is measured by course learning time, course evaluation and course richness; the low-level reward formula is:

R_l = α R_i^t + β R_i^e + γ R_i^r + b_i

where R_i^t is the reward obtained with the student's learning time on course i as the evaluation index; R_i^e is the student's evaluation of course i (a positive evaluation yields a high reward, a negative evaluation a negative reward); R_i^r is the reward for the richness of course i (the richer the course content (videos, exercises, forum, etc.), the higher the reward); α, β and γ are the weights of the different rewards, and b_i is a bias value.
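The weighted low-level reward is a plain linear combination; the default weights below are arbitrary illustrative values, not tuned ones from the application.

```python
def course_reward(r_time, r_eval, r_rich,
                  alpha=0.4, beta=0.4, gamma=0.2, b_i=0.0):
    """R_l = alpha*R_t + beta*R_e + gamma*R_r + b_i, combining learning-time,
    student-evaluation and course-richness rewards with illustrative weights."""
    return alpha * r_time + beta * r_eval + gamma * r_rich + b_i
```

With all component rewards at 1 and the weights summing to 1, the combined reward is 1; negative evaluations pull the total down.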
The prediction results of the two levels of agents are fed back to the environment; the environment rewards the agents according to the reward function, both levels of agents update their policies accordingly, and after training the final result, the predicted target course, is obtained.
In this application, based on the foregoing solution, a hierarchical reinforcement learning course recommendation device is described, including the following modules:
the reinforcement learning data module, which generates suitable user data and course data: user course information meeting the course recommendation requirement (namely, with rich course items) is screened beforehand and the rest is discarded; the screened user course information is then processed by the hierarchical reinforcement learning algorithm, and the result is used to construct the dataset;
and the reinforcement learning recommendation module, which learns by interacting with the environment through two levels of agents and trains the two different levels of agents through reasonably designed rewards and behaviors; the low-level agent predicts courses under the domain constraint imposed by the high level, yielding the course recommendation result.
The application provides a method for establishing the recommendation system, comprising: after the dataset is constructed, training the two different levels of agents with hierarchical reinforcement learning recommendation; the agents are trained according to feedback from the environment, and the course recommendation result is obtained after training ends.
Drawings
Fig. 1 is a training flow diagram of the hierarchical reinforcement learning lesson recommendation method of the present application.
Fig. 2 is a test flow chart of the hierarchical reinforcement learning lesson recommendation method of the present application.
Detailed Description
The embodiments described below are only a part of the embodiments of the present application and should not be construed as limiting it to these examples.
S101: data preprocessing: student data and behaviors such as video watching and course comments are integrated, together with the course ID, the domain/discipline to which each course belongs, and course resources (videos, exercises).
S102: training on the reinforcement learning data: first, a basic recommendation model (NAIS) is pre-trained with the screened student course information; the screened student course information is then processed with hierarchical reinforcement learning. The high-level agent first obtains the student's course selection data and judges whether the student's course selection history should be modified; if so, the low-level agent judges one by one whether each specific course is retained. The modified result is input into the pre-trained basic recommendation model (NAIS); the reward is obtained from the difference in the NAIS model's accuracy before and after modification and is fed back to the data-level hierarchical reinforcement learning.
S103: training the reinforcement learning recommendation: the high-level agent uses its current policy to send the generated domain to the low-level agent; the low-level agent then generates a recommended course under that domain constraint and inputs it into the environment; the environment computes rewards via the reward function and feeds them back to the two levels of agents, which update their own policies with the rewards before the next round of training. After training ends, the recommended course generated by the reinforcement learning recommendation is the final result.
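The control flow of one S103 training episode can be sketched as below; the toy agents and environment are hypothetical stand-ins for the learned policies and reward functions, illustrating only how the two levels interact.

```python
import random

class HighAgent:
    """Toy high-level agent: picks a domain at random (stand-in for the
    learned high-level policy)."""
    def __init__(self, domains):
        self.domains = domains
    def act(self, state):
        return random.choice(self.domains)
    def update(self, state, domain, reward):
        pass  # a real agent would update its policy here

class LowAgent:
    """Toy low-level agent: picks a course within the constrained domain."""
    def __init__(self, courses):
        self.courses = courses  # maps domain -> list of course ids
    def act(self, state, domain):
        return random.choice(self.courses[domain])
    def update(self, state, domain, course, reward):
        pass

class Env:
    """Toy environment rewarding matches against the training target."""
    def __init__(self, target_domain, target_course):
        self.target = (target_domain, target_course)
    def reset(self):
        return {}
    def feedback(self, domain, course):
        return (1.0 if domain == self.target[0] else -1.0,
                1.0 if course == self.target[1] else -1.0)

def train_episode(high_agent, low_agent, env):
    state = env.reset()
    domain = high_agent.act(state)               # high level picks a domain
    course = low_agent.act(state, domain)        # low level picks a course in it
    r_high, r_low = env.feedback(domain, course) # environment rewards both levels
    high_agent.update(state, domain, r_high)
    low_agent.update(state, domain, course, r_low)
    return domain, course
```

Repeating `train_episode` with real learned policies and the reward functions described above would constitute the training loop of S103.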
Fig. 1 is a training flowchart of the hierarchical reinforcement learning lesson recommendation method of the present application, showing the interaction flow between the reinforcement learning data module and the reinforcement learning recommendation module during training.
Fig. 2 is a test flow chart of the hierarchical reinforcement learning lesson recommendation method of the present application, showing how the system obtains test results during the test.
While the preferred embodiment of the present application has been described in detail, the application is not limited to this embodiment; those skilled in the art can make various equivalent modifications and substitutions without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to fall within the scope of the present application as defined in the appended claims.
Claims (4)
1. A method for a MOOC course recommendation system based on hierarchical reinforcement learning, characterized by comprising the following steps:
s1: data preprocessing: the data is obtained from a mousing course data set (MOOCCubeX), wherein necessary data is course information, the field/discipline to which the course belongs and a student file, wherein the necessary data in the student file is student number (id), the course registration order can be added with other data on the basis of necessity to build a more perfect data set (such as school information, course content and the like), and more detailed data is provided for a recommendation system.
S2: hierarchical reinforcement learning on the data: hierarchical reinforcement learning is established to process the data, and two different agents judge whether to prune the students' course registration information: the high-level agent judges whether the student's course registration information should be modified, and the low-level agent judges which specific courses of the student are deleted from the registration information, so as to reduce their interference with the recommendation result; the processed student course registration information is then used as the dataset.
S3: hierarchical reinforcement learning recommendation: the model is trained with two different levels of agents, wherein the high-level agent is responsible for predicting the domain/discipline to which the recommended course belongs, and the low-level agent receives the domain/discipline and predicts the course under the constraint of the predicted domain/discipline, which largely avoids the problem of excessively high recommendation probability for courses in popular domains and significantly improves the accuracy of the system's recommendation.
2. The step S2 of the method for a MOOC course recommendation system based on hierarchical reinforcement learning according to claim 1, characterized in that: the hierarchical reinforcement learning on the data divides the data processing task into two parts and requires a pre-trained basic recommendation model, wherein the high-level reinforcement learning is responsible for judging whether a target student's course selection information needs to be modified, and the low-level reinforcement learning judges, once the high level has judged that modification is needed, whether each specific course of the target student should be deleted; the pruned course selection history is input into the basic recommendation model, the reward is obtained from the difference in the basic recommendation model's accuracy before and after modification and is fed back to the data-level hierarchical reinforcement learning, and the modified data are used for the training of the hierarchical reinforcement learning recommendation.
3. The step S3 of the method for a MOOC course recommendation system based on hierarchical reinforcement learning according to claim 1, characterized in that: the hierarchical reinforcement learning recommendation divides the course recommendation task into two parts, wherein the high-level reinforcement learning is responsible for generating the domain to which the predicted course belongs and the low-level reinforcement learning is responsible for generating the predicted target course. First, the high-level reward and action are designed: the high-level agent interacts with the environment and obtains the domain of the predicted course from the current state. The low-level rewards and actions are then designed on that basis: the low-level agent predicts under the constraint of the high-level prediction result and feeds the recommendation result back to the environment; the environment feeds rewards back to the high-level and low-level agents through two reward functions, and the agents update their own policies until training ends.
4. A method for a MOOC course recommendation system based on hierarchical reinforcement learning, comprising:
the reinforcement learning data module, which generates suitable user data and course data: before the reinforcement learning on the data is carried out, user course information meeting the course recommendation requirement (namely, with rich course items) is screened and the rest is discarded; the needed user and course information is then obtained through the hierarchical reinforcement learning algorithm to construct the dataset;
and the reinforcement learning recommendation module, which learns by interacting with the environment through two levels of agents and trains the two different levels of agents through reasonably designed rewards and behaviors; the low-level agent predicts the course under the domain constraint imposed by the high level, yielding the course recommendation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310405646.7A CN116595245A (en) | 2023-04-17 | 2023-04-17 | Hierarchical reinforcement learning-based MOOC course recommendation system method
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310405646.7A CN116595245A (en) | 2023-04-17 | 2023-04-17 | Hierarchical reinforcement learning-based MOOC course recommendation system method
Publications (1)
Publication Number | Publication Date |
---|---|
CN116595245A true CN116595245A (en) | 2023-08-15 |
Family
ID=87592707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310405646.7A Pending CN116595245A (en) | 2023-04-17 | 2023-04-17 | Hierarchical reinforcement learning-based MOOC course recommendation system method
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116595245A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117151346A (en) * | 2023-10-30 | 2023-12-01 | 中国民航大学 | Civil aviation specialty teaching training system based on wisdom study |
CN117151346B (en) * | 2023-10-30 | 2024-02-09 | 中国民航大学 | Civil aviation specialty teaching training system based on wisdom study |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wan et al. | A hybrid e-learning recommendation approach based on learners’ influence propagation | |
Al-Hmouz et al. | Modeling and simulation of an adaptive neuro-fuzzy inference system (ANFIS) for mobile learning | |
CN111291266A (en) | Artificial intelligence based recommendation method and device, electronic equipment and storage medium | |
Gan et al. | Knowledge structure enhanced graph representation learning model for attentive knowledge tracing | |
CN110580314A (en) | Course recommendation method and system based on graph convolution neural network and dynamic weight | |
CN113268611B (en) | Learning path optimization method based on deep knowledge tracking and reinforcement learning | |
CN108228674A (en) | A kind of information processing method and device based on DKT | |
CN116595245A (en) | Hierarchical reinforcement learning-based lesson admiring course recommendation system method | |
CN115545160B (en) | Knowledge tracking method and system for multi-learning behavior collaboration | |
CN112800323A (en) | Intelligent teaching system based on deep learning | |
CN115329959A (en) | Learning target recommendation method based on double-flow knowledge embedded network | |
CN115115389A (en) | Express customer loss prediction method based on value subdivision and integrated prediction | |
Brigui-Chtioui et al. | Intelligent digital learning: Agent-based recommender system | |
Bhaskar et al. | Genetic algorithm based adaptive learning scheme generation for context aware e-learning | |
CN115168721A (en) | User interest recommendation method and system integrating collaborative transformation and temporal perception | |
CN113934840B (en) | Covering heuristic quantity sensing exercise recommendation method | |
Yang et al. | Research on students’ adaptive learning system based on deep learning model | |
CN117035074A (en) | Multi-modal knowledge generation method and device based on feedback reinforcement | |
Zhang et al. | Develop academic question recommender based on Bayesian network for personalizing student’s practice | |
CN113609337A (en) | Pre-training method, device, equipment and medium of graph neural network | |
CN117056595A (en) | Interactive project recommendation method and device and computer readable storage medium | |
Sarwar et al. | Ontology Based e-Learning Systems: A Step towards Adaptive Content Recommendation | |
CN114971066A (en) | Knowledge tracking method and system integrating forgetting factor and learning ability | |
CN113360635B (en) | Intelligent teaching method and system based on self-attention and pre-training mechanism | |
CN115952838B (en) | Self-adaptive learning recommendation system-based generation method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||